<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>gtuhl: startup technology</title>
	
	<link>http://blog.gtuhl.com</link>
	<description>Development, IT, Gadgets, and Startups</description>
	<lastBuildDate>Mon, 04 Jan 2010 01:28:14 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/gtuhl" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="gtuhl" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">gtuhl</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Looking Forward to Big Changes in 2010</title>
		<link>http://blog.gtuhl.com/2010/01/03/looking-forward-to-big-changes-in-2010/</link>
		<comments>http://blog.gtuhl.com/2010/01/03/looking-forward-to-big-changes-in-2010/#comments</comments>
		<pubDate>Mon, 04 Jan 2010 01:28:14 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[startups]]></category>
		<category><![CDATA[atlanta]]></category>
		<category><![CDATA[career]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=518</guid>
		<description><![CDATA[It has been quite a year &#8211; a good year considering the broad financial turmoil of 2009.  Many goals have been accomplished but perhaps the greatest accomplishments for me have been professionally at WTC &#8211; a company I have had an active hand in since its earliest days.
Over the last year the number of [...]]]></description>
			<content:encoded><![CDATA[<p>It has been quite a year &#8211; a good year considering the broad financial turmoil of 2009.  Many goals have been accomplished but perhaps the greatest accomplishments for me have been professionally at WTC &#8211; a company I have had an active hand in since its earliest days.</p>
<p>Over the last year the number of customers, the amount of revenue, and the degree of stability (both financially and technically) has increased dramatically.  It is a completely different company than 12 months ago with serious traction.  My role in that progression was in my opinion minor and I chalk the success up to the absolutely stellar team that makes the company run now.  I could not be more satisfied or proud of what has been accomplished and could not be happier or more complementary about the team that has gathered around it.</p>
<p>Those facts make it difficult and potentially confusing to mention my decision to step way from an active role in the company.  I will still be involved as an adviser, supporter, and proud evangelist but will no longer be actively contributing to the technical development or operations of the company.  This change is being made slowly over the next month or two to ensure the transition is smooth and I genuinely believe the company is going to be a great example of success for Atlanta.  </p>
<p>What is next for me?</p>
<p>I am joining another startup in Atlanta working on an utterly different product.  The founders consist of guys I worked with for years in the undergraduate computer labs at Georgia Tech and the technology is around areas I am very passionate about &#8211; those that I was focused on as a student before jumping at a chance to help take WTC off the ground as an early engineer.  This opportunity combined with the strides made in the last year to put WTC in a stable, growing position are the reasons for this decision.  </p>
<p>Needless to say 2010 should be interesting, I am looking forward to it.  I’ll be trying to keep this blog a bit more active as well.  I had a few posts in 2009 that really took off and hope to give it more attention in 2010.</p>
<img src="http://feeds.feedburner.com/~r/gtuhl/~4/sPsgS_kjrJg" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2010/01/03/looking-forward-to-big-changes-in-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vetting a Web Dev Firm</title>
		<link>http://blog.gtuhl.com/2009/10/13/vetting-a-web-dev-firm/</link>
		<comments>http://blog.gtuhl.com/2009/10/13/vetting-a-web-dev-firm/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 00:28:33 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[agencies]]></category>
		<category><![CDATA[design]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=502</guid>
		<description><![CDATA[There are an awful lot of web design/development firms or agencies out there and a disheartening percentage are just terrible at what they do.  It is common to find firms that fail at both the design AND development though often more common to find a place with decent designs that is utterly incapable of [...]]]></description>
			<content:encoded><![CDATA[<p>There are an awful lot of web design/development firms or agencies out there and a disheartening percentage are just terrible at what they do.  It is common to find firms that fail at both the design AND development though often more common to find a place with decent designs that is utterly incapable of writing code, while claiming they can code just fine.  They just can&#8217;t implement their designs unless the implementation is done via huge images or flash.</p>
<p>Here are a few questions for vetting firms with the goal of purging the truly unqualified ones.  These questions are based on actual experiences where the lack of understanding was so complete that I felt embarrassed for them.  They can come in handy when looking for a firm to assist with some work.</p>
<h2>Filtering the Worst</h2>
<h3>In what country is your coding done?</h3>
<p>Generally I want to hear USA (point being &#8211; same country I am in).  There are exceptions and I don&#8217;t want to get into my broader feelings on offshoring but that is what I need to hear.</p>
<h3>What is DNS?  Have you ever updated a DNS record?</h3>
<p>Don&#8217;t care at all about getting a technically correct explanation here.  I just want to get a feeling that they understand domains don&#8217;t magically point at the right place and that they could handle making simple adjustments when I don&#8217;t control the domain.</p>
<h3>Using only notepad, textedit, or similar, write a page that has the following:</h3>
<ul>
<li> a green div</li>
<li>a button underneath that div</li>
<li>some Javascript that toggles the green div from visible to not visible when the button is clicked</li>
</ul>
<p>You would likely be amazed at how many agencies this would blow the minds of. </p>
<h3>Do you know the difference between GET and POST?  When would you generally use each of those?</h3>
<p>Want to find general understanding here.  Not concerned about a complete, technically correct answer but just want to hear vague familiarity.</p>
<h3>Do you know what SSH, SCP, or SFTP are?</h3>
<p>Them being familiar at all with just one of these is sufficient.  A similar question is how would you get files on a server without FTP?</p>
<h3>What are steps you could take to speed up the load time of a page?</h3>
<p>At a minimum, hoping for an understanding that reducing the size of pictures/images helps.  Beyond that mentions of caching, combining/compressing Javascript, not using every Javascript framework in existence, content distribution networks, or whatever else are icing on the cake.</p>
<h3>What is the difference between server side and client side code?  What would you personally prefer for each?</h3>
<p>Basic blocks here but there are firms with paying customers that could not answer this question well.</p>
<h2>Follow Up Questions</h2>
<p>Only if the above completely trivial questions can be answered with some confidence should you consider asking follow up questions.  Some ideas:</p>
<ul>
<li>How do you structure your code/css?</li>
<li>What do you use for version control?</li>
<li>Let me see some stuff you have worked on.</li>
<li>What libraries do you commonly use?</li>
</ul>
<p>I am not kidding when I say that I know of professional agencies that cannot answer a single question from that first set satisfactorily.  Don&#8217;t assume the basics.  You have to start from nothing when considering a web dev firm or else you might find yourself in a painful situation down the road.  There are some excellent firms out there that can really get good work done so don&#8217;t take this as a blanket statement covering them all.  It is just fair warning that you have to be careful when looking for someone to work on your stuff.</p>
<img src="http://feeds.feedburner.com/~r/gtuhl/~4/qKrJgyY_sto" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/10/13/vetting-a-web-dev-firm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Monitoring PSUs in Arch Linux Dell Servers</title>
		<link>http://blog.gtuhl.com/2009/08/22/monitoring-psus-in-arch-linux-dell-servers/</link>
		<comments>http://blog.gtuhl.com/2009/08/22/monitoring-psus-in-arch-linux-dell-servers/#comments</comments>
		<pubDate>Sat, 22 Aug 2009 12:27:40 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[SysAdmin]]></category>
		<category><![CDATA[archlinux]]></category>
		<category><![CDATA[dell]]></category>
		<category><![CDATA[ipmi]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[zabbix]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=492</guid>
		<description><![CDATA[We currently use Arch Linux exclusively for servers.  Much of our equipment comes from Dell and one of the gotchas of using a non-Redhat, non-SUSE linux distribution with their servers is you cannot just drop in their Open Manage tools to monitor everything.  As a side note, despite the bloat of Open Manage, it actually [...]]]></description>
			<content:encoded><![CDATA[<p>We currently use Arch Linux exclusively for servers.  Much of our equipment comes from Dell and one of the gotchas of using a non-Redhat, non-SUSE linux distribution with their servers is you cannot just drop in their Open Manage tools to monitor everything.  As a side note, despite the bloat of Open Manage, it actually isn&#8217;t a bad set of tools once you get it installed (on an rpm-based distribution) &#8211; the command line utilities you get with it are pretty decent.  The GUI stuff is largely worthless in my biased opinion.</p>
<p>In any case, Open Manage is a big pile of rpms with lots of dependencies so it wouldn&#8217;t be easy to transfer to Arch.  I posted awhile back about <a href="http://blog.gtuhl.com/2009/03/11/monitoring-dell-perc5-and-perc6-disks-in-arch-linux/">using LSI&#8217;s Megaraid CLI tool to monitor Dell raid arrays</a> but what about everything else?  One big item that was really haunting me was power supplies.  I had no monitoring on those things so if I went too long between datacenter visits to check for amber lights on the fronts of the servers, we could have a double PSU failure on a really important server and be in a lot of trouble.  Server PSUs fail A LOT so don&#8217;t discount the importance of monitoring them.  My personal experience might be unusual, but i&#8217;ve had more PSUs fail than hard drives.</p>
<p>Yesterday I read up on <a href="http://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface">Intelligent Platform Management Interface (IPMI)</a> and saw that Dell has supported it for awhile.  I suspect their Open Manage tools are simply a proprietary wrapper around this.</p>
<p>So you just need some other software that will let you access this stuff. Some quick searches revealed ipmitools and freeipmi being popular.  I went with <a href="http://www.gnu.org/software/freeipmi/">freeipmi</a> because it is a very active project with recent updates (the Arch Linux AUR had them both but both PKGBUILDs were broken).</p>
<p>Installing it is straight forward:  download tarball, unpack, ./configure, make, make install.  This will place a bunch of command line tools in /usr/local/sbin/.  You&#8217;ll probably want to script the install process if you have to setup lots of servers.  Or, don&#8217;t be lazy like me and bundle up a PKGBUILD for the Arch AUR and just use yaourt for the installs.</p>
<p>You can check out the README and man pages for information about all the various commands but I am just using one: ipmi-sel.  This prints out the contents of the &#8220;System Event Log&#8221; and seems to correspond very nicely with any messages that appear on the front LCD of the server.  I removed, replaced, and removed again a PSU in a server and saw this perfectly parseable and useful output:</p>
<pre>28:21-Aug-2009 09:00:36:Power Supply Status <img src='http://blog.gtuhl.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> resence detected
29:21-Aug-2009 09:00:37:Power Supply PS Redundancy:Redundancy Lost
30:21-Aug-2009 09:04:56:Power Supply Status <img src='http://blog.gtuhl.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> resence detected
31:21-Aug-2009 09:04:57:Power Supply PS Redundancy:Fully Redundant (formerly "Redundancy Regained")
32:21-Aug-2009 09:05:16:Power Supply Status <img src='http://blog.gtuhl.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> resence detected
33:21-Aug-2009 09:05:17:Power Supply PS Redundancy:Redundancy Lost</pre>
<p>Other events I have seen in this log are memory DIMM failures and the case being open &#8211; and those are conveniently the only other alerts i&#8217;ve seen on the LCDs before.  You can clear out the SEL and have a few other options just with that one ipmi-sel tool &#8211; read the man page for more information.</p>
<p>For monitoring PSU failure I am using this script.  I am sure there are tighter approaches but this one works fine:</p>
<p><code>/usr/local/sbin/ipmi-sel | grep "PS Redundancy:" | tail -n1 | grep "Redundancy Lost" | wc -l</code></p>
<p>That will return a 1 if the last &#8220;PS Redundancy&#8221; related item in the log is a failure, and 0 otherwise.  You can then easily snap that into Zabbix or whatever monitoring software you prefer.  I did a <a href="http://blog.gtuhl.com/2009/02/21/monitoring-postgresql-tps-with-zabbix/">post with more detail on adding items to Zabbix</a> awhile back that might be helpful if you are not familiar with the process.</p>
<img src="http://feeds.feedburner.com/~r/gtuhl/~4/Xd9GlTjJ8ps" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/08/22/monitoring-psus-in-arch-linux-dell-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WeTheCitizens IT</title>
		<link>http://blog.gtuhl.com/2009/08/15/wethecitizens-it/</link>
		<comments>http://blog.gtuhl.com/2009/08/15/wethecitizens-it/#comments</comments>
		<pubDate>Sat, 15 Aug 2009 13:47:09 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[startups]]></category>
		<category><![CDATA[it]]></category>
		<category><![CDATA[wethecitizens]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=437</guid>
		<description><![CDATA[I threw up a quick post at the company blog about our tech setup.  Check it out if interested.
]]></description>
			<content:encoded><![CDATA[<p>I threw up a quick post at the company blog about our tech setup.  <a href="http://www.wildfireplatform.com/2009/08/wethecitizens-it/">Check it out</a> if interested.</p>
<img src="http://feeds.feedburner.com/~r/gtuhl/~4/BhSK0VNj_Lw" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/08/15/wethecitizens-it/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Tips and Tricks</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/</link>
		<comments>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 22:00:03 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[SysAdmin]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420</guid>
		<description><![CDATA[Here&#8217;s a dozen tips for working with a PostgreSQL database.  It is a sophisticated and powerful piece of software and just knowing a few rules of thumb before diving in can be a huge help.  If you want more detail read the amazing documention.  My list of tips was very long so [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a dozen tips for working with a PostgreSQL database.  It is a sophisticated and powerful piece of software and just knowing a few rules of thumb before diving in can be a huge help.  If you want more detail <a href="http://www.postgresql.org/docs/8.3/interactive/">read the amazing documention</a>.  My list of tips was very long so I just chopped off a dozen for this post.</p>
<h2>#1: Don&#8217;t do sequential scans &#8211; use indexes</h2>
<p>Do SELECTs against indexes.  Sequential scans will devastate your IO and in most cases should be avoided.  I read on a bogus guide somewhere that needing indexes was a sign of a bad database schema.  That is complete BS.</p>
<p>The only time indexes can hurt is when doing bulk inserts.  I <a href="http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/">did a post</a> about dropping/recreating to temporarily boost bulk insert performance awhile back.</p>
<p>Most of the time it doesn&#8217;t matter though.  Throw down as many indexes as you need.</p>
<h2>#2: Index all foreign keys</h2>
<p>There are certain rare cases where this doesn&#8217;t help, but as your database grows in size this becomes increasingly important.  Say you have a &#8220;people&#8221; table.  You then have an &#8220;emails&#8221; table with a person_id column that is a FK to the people table.  As an example say you have 10 million rows in the emails table.</p>
<p>Now delete a row from that people table.  PostgreSQL must scan the emails table and ensure that no email rows reference the person row you just deleted.  If that person_id column on emails is not indexed you just kicked off a sequential scan of 10 million rows with your deletion attempt.  Index that column and the pain goes away.  My general rule of thumb is to index every single FK.  You can always drop the indexes later if they aren&#8217;t needed.</p>
<p>Here&#8217;s a handy query for discovering all the FKs pointed at a table in a database.  You can then shoot through the results and verify that each table has indexes for those FKs.</p>
<pre>
select t.constraint_name, t.table_name, t.constraint_type,
   c.table_name, c.column_name
from information_schema.table_constraints t,
   information_schema.constraint_column_usage c
where t.constraint_name = c.constraint_name
   and t.constraint_type = 'FOREIGN KEY'
   and c.table_name = 'MY_TABLE_NAME';
</pre>
<h2>#3: Don&#8217;t be afraid to selectively de-normalize</h2>
<p>It is only a matter of time before you realize the <a href="http://en.wikipedia.org/wiki/Database_normalization#Normal_forms">normal forms</a> you learned about in school can seriously hurt your database performance as size gets huge.  De-normalize (usually meaning duplicate a value in two tables) when it provides a big performance increase.  You must be mindful of the maintenance concerns with de-normalization.  The maintenance can be done in code, with triggers or similar in the DB, or sometimes the data is static enough that it isn&#8217;t a huge concern.</p>
<p><strong>Edit:</strong> I hedged this advice from the original post (originally was a blanket &#8220;de-normalize all the time&#8221; that could burn someone new to databases).  You should know the normal forms and understand them.  My point was that you shouldn&#8217;t feel that wavering from them with de-normalization makes your database a bad one &#8211; it is simply a reality of nontrivial production databases.  But, if you don&#8217;t understand them, and/or don&#8217;t understand when and why de-normalization can help blindly ignoring the normal forms is only going to get you into more trouble.  So learn them and understand them but then don&#8217;t be afraid to selectively de-normalize where it can give you big boosts of performance.  A common scenario that is often very helpful is de-normalizing to eliminate a join or multiple reads.</p>
<h2>#4: Avoid joins on huge tables</h2>
<p>As your tables get bigger and bigger (many millions of rows) joins cannot be done in memory and spill over to disk.  Once that happens your performance is gone.  For smaller tables it doesn&#8217;t matter but as tables get bigger avoid joining when you can.  Try using unions, subselects, or denormalization instead.  If you have two big tables that are 1-to-1 that you are joining a lot combine them into one wide table.</p>
<h2>#5: Don&#8217;t do unanchored text searches or full text searching</h2>
<p>That is don&#8217;t try something like this with wildcards on both ends:<br />
<code>select * from people where first_name like '%joe%'</code></p>
<p>The reason is that no index can be used so PostgreSQL will be doing a sequential scan no matter what.  If you truly need full text searching use Lucene or PostgreSQL 8.3+ full text searching capabilities instead of standard indexes.</p>
<p>If you are in a pinch and can get by with the left side being anchored you can create a special index using varchar_pattern_ops like this:</p>
<p><code>create index idx_people_first_name on people (first_name varchar_pattern_ops);</code></p>
<p>That index can be used by this query (no wildcard on left side):<br />
<code>select * from people where first_name like 'joe%';</code></p>
<p>But be aware that same index cannot be used by a query using the standard &#8220;=&#8221; operator so you have to create two indexes to get both (one with varchar_pattern_ops, one without).  Yes this is messy and its why you should use Lucene, PostgreSQL 8.3+ full text searching, or something similar to do text searching.</p>
<h2>#6: Use the COPY command for bulk data loading</h2>
<p>It is orders of magnitude faster.  <a href="http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/">Check a previous post of mine</a> if you want numbers.</p>
<h2>#7: Use multi-column indexes</h2>
<p>For common queries with compound WHERE clauses create a multi-column index for maximum performance.  An example:<br />
<code>create index idx_people_full_name on people (first_name, last_name);</code></p>
<p>Now you can do queries like this and hit that index:<br />
<code>select * from people where first_name = 'joe' and last_name= 'smith';</code></p>
<p>Additionally, any WHERE clause using just first_name can take full advantage of this index (because first_name is the first column) so don&#8217;t waste disk space creating redundant single column indexes if you are already covered by a multi-column.</p>
<p>Finally, you can align a multi-column index with a common ORDER BY clause to make that query instant.  Using the above index a query against people ending in <code>order by first_name, last_name</code> would be super fast.</p>
<p>As a final note PostgreSQL can combine multiple single-column indexes so multi-column indexes on every combination are not necessary.  But, a single multi-column is going to be faster so they can be helpful for queries you run a lot.  You can read more about combining of multiple indexes <a href="http://www.postgresql.org/docs/8.2/interactive/indexes-bitmap-scans.html">here</a>.</p>
<h2>#8: Don&#8217;t bother indexing evenly distributed booleans or enum columns</h2>
<p>In PostgreSQL indexes do not store data, they are instead pointers to the data.  This means that after scanning the index a random read must be done for each matching record to get the data.  Random IO is a lot more expensive than sequential IO (though SSDs are changing that) so if the PostgreSQL planner (it does sampling of your data in the background) knows you have a very evenly distributed column (a boolean split 50/50 for instance) it won&#8217;t use the index at all.  It will instead make the assumption that a straight up sequential scan will be faster than index scanning for half the rows and having to do a ton of random IO.  Indexes on these types of columns are a waste of disk space and you are slowing down your inserts for no benefit.</p>
<h2>#9: Use EXPLAIN ANALYZE to benchmark and compare queries</h2>
<p>Using EXPLAIN on its own will give you the planner&#8217;s guess at how it will run your query.  This is pretty useful if you have a very huge query that is going to take a long time and you want to tune it without having to actually run it after each tweak.  The complete EXPLAIN ANALYZE will run the query, show the plan predicted, show what actually happened, and how long it took.</p>
<p>Be sure to ANALYZE after significant table modifications so that the planner can do its sampling and adjust its plans to account for any index changes.</p>
<h2>#10: Use expression indexes &#8211; they are awesome</h2>
<p>In PostgreSQL you can index an expression.  Here is a simple example:<br />
<code>create index idx_people_lower_first_name on people (lower(first_name));</code></p>
<p>Now you can run queries like this and use the index &#8211; no need to create a separate column with lowercase values:<br />
<code>select * from people where lower(first_name) = 'joe';</code></p>
<p>You can similarly index combinations of your data-containing columns:</p>
<p><code>create index idx_people_lower_full_name on people (lower(coalesce(first_name, '')) || lower(coalesce(last_name, '')));</code></p>
<h2>#11: Indexes can have WHERE clauses</h2>
<p>You can create conditional indexes that only apply for certain rows.  A useful scenario is creating a unique constraint on a column or set of columns only where certain conditions apply:</p>
<p><code>create unique index idx_people_person_id on people (person_id) where deleted = false;</code></p>
<p>That&#8217;s a simple example that feels a bit cleaner than changing the ids of deleted records to avoid conflicting with a unique constraint.  Anything can be placed in that WHERE clause, they aren&#8217;t limited in any way.</p>
<h2>#12: Don&#8217;t use the database if it isn&#8217;t a good fit</h2>
<p>Databases only scale so far alone.  At some point you are going to have to explore memcached, terracotta, or something similar to continue to scale.  The NoSQL (over)excitement going on right now offers alternatives as well for certain scenarios.  The main point is that at some point your database will just be one component in a set of services and technologies that provide the persistence layer for your application.</p>
<img src="http://feeds.feedburner.com/~r/gtuhl/~4/X7da3Em23g0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Why does a Windows Server OS exist?</title>
		<link>http://blog.gtuhl.com/2009/08/03/why-does-a-windows-server-os-exist/</link>
		<comments>http://blog.gtuhl.com/2009/08/03/why-does-a-windows-server-os-exist/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 21:35:06 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[SysAdmin]]></category>
		<category><![CDATA[archlinux]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=408</guid>
		<description><![CDATA[Awhile back I had to fight again with the single Windows server at WeTheCitizens to get a Blackberry reset.  The RIM server software only runs on Windows, else that final machine would be running Arch Linux like everything else.
It was a nightmare to get working &#8211; the RIM software has literally a half dozen [...]]]></description>
			<content:encoded><![CDATA[<p>Awhile back I had to fight again with the single Windows server at <a href="http://www.wildfireplatform.com/">WeTheCitizens</a> to get a Blackberry reset.  The RIM server software only runs on Windows, else that final machine would be running <a href="http://www.archlinux.org/">Arch Linux</a> like everything else.</p>
<p>It was a nightmare to get working &#8211; the RIM software has literally a half dozen separate flaky services all running at the same time and tripping on one another that have to align perfectly.  The worst part though was having to work with a Window server.</p>
<p>That leads me to my question: why does anyone use Windows on servers any more?  How is it manageable at all?  For any Windows admins out there, would love to get some perspective on why people use it.</p>
<ul>
<li>How do you manage updates?  Especially with lots of machines, it seems borderline impossible to keep them all updated without downtime considering the mandatory restart needed for <strong>everything</strong>.  Linux only needs a restart for some kernel modifications (i.e. barely ever).  I have machines in production with 700 days of uptime.</li>
<li>What do you do for remote access?  Does an admin really have to login with a GUI remote desktop to each server?  Is there a VPN endpoint that has to be used or are all the servers left open?  How are bulk operations configured and run in that kind of mess?  With Linux I can whip up a bash script in minutes to do whatever I need done to as many machines as I wish.</li>
<li>Is there a fast approach to setting up new servers on a variety of hardware?  I can install Arch Linux in less than 10 minutes and I am not even doing anything advanced like PXE boots or disk imaging.</li>
<li>Windows fails in large environments.  The latest big example is The London Stock Exchange <a href="http://blogs.computerworld.com/london_stock_exchange_to_abandon_failed_windows_platform">completely abandoning Windows</a> after substantial investment of money and time to set it up on that platform.  As an older example I know for awhile (maybe even still today) <a href="http://www.theregister.co.uk/2001/12/12/microsoft_hotmail_still_runs/">Hotmail ran on FreeBSD</a> after Microsoft purchased it because they couldn&#8217;t make it stable on Windows.</li>
<li>Aside from the second rate Microsoft products that are decisively outperformed by Linux alternatives (IIS, MSSQL, Exchange) is there any decent server or DB software that doesn&#8217;t run better on Linux than on Windows?  I am thinking things like Apache, PostgreSQL, Tomcat (and the Java VM in general), Mongrel, JBoss, and innumerable others.</li>
</ul>
<p>This is really just my frustrated attempt at understanding why people put themselves through the utter pain of running Windows servers.  Why would you not use Linux?  As long as there are people using Windows it means crapware like Blackberry Server has no incentive to port to Linux and that means I have to suffer at work.  We&#8217;re 4:1 ratio for iPhones to BBs so maybe the best option is to get those final BB users switched over to a better phone.</p>
<p>If you are starting a company don&#8217;t even bother with Windows on the server.  Their new Bizspark program may seem like a good idea on the surface but I view it as a trap.  You get to use an inferior product for a few years and then must pay through the nose at the end or rebuild everything with the better technology you should have used from the beginning.</p>
<img src="http://feeds.feedburner.com/~r/gtuhl/~4/JLWbsgMZhKw" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/08/03/why-does-a-windows-server-os-exist/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Cheap Home Network Storage</title>
		<link>http://blog.gtuhl.com/2009/07/03/cheap-home-network-storage/</link>
		<comments>http://blog.gtuhl.com/2009/07/03/cheap-home-network-storage/#comments</comments>
		<pubDate>Fri, 03 Jul 2009 11:12:14 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Gadgets]]></category>
		<category><![CDATA[archlinux]]></category>
		<category><![CDATA[home]]></category>
		<category><![CDATA[networking]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=297</guid>
		<description><![CDATA[Ever since I installed an SSD in my Macbook Pro I have been finding ways to move more and more stuff off of my laptop hard drive.  I wanted to move a bunch of rarely needed stuff to a network share at my house including most of my music library.  Ideally I also [...]]]></description>
			<content:encoded><![CDATA[<p>Ever since I <a href="http://blog.gtuhl.com/2009/03/26/ocz-vertex-ssd-in-a-17-santa-rosa-macbook-pro/">installed an SSD in my Macbook Pro</a> I have been finding ways to move more and more stuff off of my laptop hard drive.  I wanted to move a bunch of rarely needed stuff to a network share at my house including most of my music library.  Ideally I also wanted to be able to play the network stored music from my laptop when on my home wifi.</p>
<p>There are a lot of network attached storage options out there now but they are a bit expensive and have arbitrary limitation mixtures in the software.  Plus Lindsey was not keen on me spending more money on gadgets.</p>
<p>Instead I looked at the random gear I had around the house and was able to quickly get raided network storage setup for no cost that is more flexible than most of what you can buy.  Nothing new here, this is a pretty familiar setup.  It just worked out well for me so wanted to share.</p>
<p>I had the following to work with:</p>
<ul>
<li>An old Dell laptop purchased about 4-5 years ago.  2GB of ram, I don&#8217;t know any of the other specs.</li>
<li>The cheap WRT54GS Linksys router I was already using at the house.</li>
<li>A couple 500 GB external hard drives.</li>
</ul>
<p>That isn&#8217;t much but I imagine many people have a similar combination just lying around.  Most of my extra gear (including the last desktop I owned) got absorbed when the company was starting up.</p>
<p>Here&#8217;s the steps to set this up for yourself.</p>
<h2>[1] Put DD-WRT on the router</h2>
<p>The big items I wanted were reserved DHCP (so I could reserve an IP for the new &#8220;server&#8221;) and VPN (so I could connect remotely to my home network).  DD-WRT offers both of these along with loads of additional features.  It makes the stock Linksys firmware look like garbage in comparison.</p>
<p>My WRT54GS is an especially cheap router with a tiny amount of memory so I had to install the pared down &#8220;micro&#8221; version of DD-WRT but it still works great.</p>
<p>For help and information on DD-WRT <a href="http://dd-wrt.com/dd-wrtv3/index.php">go visit their site</a>.  You can enter your specific router model number and get the exact files you need along with specific instructions.  Its really easy to do.</p>
<h2>[2] Put Arch Linux on the laptop</h2>
<p>Arch Linux is a preference thing but I do believe it is the best Linux distribution.  I use it on all of our servers at work now and am glad Sean told me about it.  Since in this setup I am creating a dedicated server out of the old laptop and don&#8217;t plan to ever use it directly Arch is a perfect fit &#8211; just do a base install.  No need for x windows or any of that cruft.  Arch makes installing things so incredibly easy.  In 95% of the cases the steps are to grab the software with pacman, add it to the DAEMONS array in rc.conf (if its something that should start on its own on reboot), and then start it.</p>
<p>Obviously you can use a different Linux distribution if you would prefer but subsequent content below assumes Arch.</p>
<p>To get Arch or to view their fantastic wiki and documentation go to <a href="http://archlinux.org">archlinux.org</a>.</p>
<p>At this point I gave the Arch server&#8217;s mac address an IP reservation in DD-WRT.  That&#8217;s optional but I would highly recommend either doing that or using a static IP.</p>
<h2>[3] Connect external drives to laptop and raid them</h2>
<p>SSH into the Arch server and plug the drives in.  The faster the hard drive connection the better but honestly for stuff you are going over a network for it doesn&#8217;t really matter.  Your laptop hard drive is probably going to be <code>/dev/sda</code> and your plugged in external drives are probably going to be <code>/dev/sdb</code> and <code>/dev/sdc</code>.  I am making that assumption in the following, you can <code>ls -l /dev/sd*</code> to verify what is showing up.  On my machine the external drives showed up as &#8220;storage&#8221; (for the group permission) while the laptop drive was &#8220;disk&#8221;.</p>
<p>Install mdadm with <code>pacman -Sy mdadm</code>.  This is for the software raid.  Also install parted with <code>pacman -Sy parted</code>.  This is what I use for partitioning instead of fdisk.  For more information about mdadm read the <a href="http://en.wikipedia.org/wiki/Mdadm">wikipedia page</a> which has good tutorials and more information in the external links section.</p>
<p>Now, assuming <code>/dev/sdb</code> and <code>/dev/sdc</code> for the external drives as mentioned above do the following.  This is all done as root.</p>
<ul>
<li>Partition the first drive.  Type <code>parted /dev/sdb</code> and you will get the parted prompt.</li>
<ul style="padding-left: 30px">
<li>From the parted prompt type <code>mklabel gpt</code>.</li>
<li>Type <code>mkpart primary 0 100%</code>.</li>
<li>Type <code>quit</code> to leave the parted prompt.</li>
</ul>
<li>Now partition the second external drive in the same way (just do <code>parted /dev/sdc</code> and repeat the steps).</li>
<li>Create your raid array with <code>mdadm --create --verbose /dev/md0 --level=raid1 --raid-devices=2 /dev/sdb1 /dev/sdc1</code>.  This is creating a raid 1 with 2 drives.</li>
<li>Now format your new raid array with <code>mkfs.ext3 -b 4096 /dev/md0</code>.  This will format with the ext3 file system but you could use whatever you wanted instead just as well.</li>
<li>Create a directory wherever you want for mounting the new raided disk.  I used <code>/var/raid_array</code>.  That would be <code>cd var</code> and <code>mkdir raid_array</code>.</li>
<li>Mount the new disk with <code>mount /dev/md0 /var/raid_array</code>.  The raided disk should be usable now.  Do a <code>df -h</code> to make sure it mounted and verify the usable space is what you expected. </li>
<li>Edit <code>/etc/fstab</code> and add a line containing <code>/dev/md0 /var/raid_array auto defaults 0 1</code> to ensure the new disk gets mounted on reboot.  You can obviously change those mount parameters to whatever values you prefer.</li>
<li>If you want, do some benchmarking and consider bumping up readahead on your new raid array.  See my <a href="http://blog.gtuhl.com/2009/05/13/dell-md1120-perc6e-performance/">last post</a> for information about that and how important it can be.  BUT, if these are USB-connected external drives no need to waste time tuning.  With software raid at least the OS knows there are two drives involved and so will tune up your readhead to 512.  Just wanted to mention it as generally when setting up a raid array (or any decent hard drive) you want to be aware of that setting.</li>
</ul>
<p>In the above I was assuming the external drives were the same size.  If they are not then in the &#8220;mkpart&#8221; command use a fixed MB amount or smaller percentage for the second number instead of 100%.  You could then separately partition any remaining space so it is at least usable if not in the raid.</p>
<p>The above uses raid 1 but mdadm can handle a lot of different configurations, just tweak the &#8220;&#8211;level&#8221; and &#8220;&#8211;raid-devices&#8221; parameters as needed.</p>
<p>At this point you have a raid array usable at <code>/var/raid_array</code> on the laptop &#8220;server&#8221;.</p>
<h2>[4] Setup samba</h2>
<p>Your new raid array wouldn&#8217;t be much use if you couldn&#8217;t easily access it.  Use samba to set it up as a network share that all of your home computers can see or use.  I&#8217;d recommend checking out the consistently awesome <a href="http://wiki.archlinux.org/index.php/Samba">Arch Wiki</a> for samba instructions.  As with all things Arch, installation and configuration is very straight forward and cruft free.  Your mileage may vary on other distributions.</p>
<h2>[5] Finish up</h2>
<p>From here you can setup whatever you like.  Here is what I did.</p>
<ul>
<li>Setup an iTunes share so my music on the network just shows up when I am at home.  You can do this by setting up a daapd server, check out the <a href="http://en.wikipedia.org/wiki/Digital_Audio_Access_Protocol">wikipedia article about daap</a> for an overview of what it is.  There is unfortunately not an Arch Wiki article for this so here are the steps if interested.</li>
<ul style="padding-left: 30px">
<li>Grab the needed software with <code>pacman -Sy mt-daapd</code> and <code>pacman -Sy dbus</code>.</li>
<li>Edit <code>/etc/mt-daapd/mt-daapd.conf</code>.  Set the &#8220;mp3_dir&#8221; to the root location of your music and the &#8220;servername&#8221; to your server name.  You will probably want to add more extensions to &#8220;extensions&#8221;.  I added &#8220;.m4a,.m4p,.aac&#8221;.</li>
<li>Add dbus, avahi-daemon, and mt-daapd (in that specific order) to the DAEMONS array in rc.conf.</li>
<li>Start them up in that order with <code>/etc/rc.d/dbus start</code>, <code>/etc/rc.d/avahi-daemon start</code>, and <code>/etc/rc.d/mt-daapd start</code>.</li>
<li>That is it.  Your music should show up in iTunes as a shared library with whatever name you set as &#8220;servername&#8221; in mt-daapd.conf.</li>
</ul>
<li>Setup a backup script that pushes select important items up to s3 each week.  I used s3sync to do this but you can use whatever you like.  s3sync is straight forward and easy to use in command line scripts.  <a href="http://s3sync.net/wiki">Here is the s3sync site</a> and <a href="http://blog.eberly.org/2006/10/09/how-automate-your-backup-to-amazon-s3-using-s3sync/">here is a blog post that describes usage well</a>.</li>
<li>Setup a basic monitoring script, particularly to watch the status of your raid.  You can use <code>mdadm --detail /dev/md0</code> to get the status of your drives and write a script to parse out the pieces you want.  Since this is software raid with random external hard drives you aren&#8217;t going to get a nice red LED telling you when a drive is dead.  Have it check each morning and email you if things are busted up.  If you are sending email in your scripts do a <code>pacman -Sy postfix</code> first.</li>
</ul>
<h2>Conclusion</h2>
<p>The above may look like a lot of steps but it really is more straight forward than it seems.  With Arch especially, getting things setup is a snap.  This is a cheap solution and isn&#8217;t going to be a screamer on performance but it offers a lot of advantages:</p>
<ul>
<li>Cost $0 if you have an old laptop and a couple hard drives.</li>
<li>Even though its a cheap solution you still have complete features (raid, networked, iTunes share, plus whatever else you install on the server).</li>
<li>Can be easily expanded with additional drives, software raid in general offers a lot of flexibility and easy monitoring.</li>
<li>Laptops actually make decent home servers because they have batteries in them &#8211; power outtages aren&#8217;t a big deal.  On the other hand, your external hard drives will still die so maybe not a huge advantage.</li>
<li>You have a actual machine on top of the drives so you can easily add whatever server software you want.  You can use it as an SSH tunnel, web server, whatever.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/gtuhl/~4/zjNepRNmlzQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/07/03/cheap-home-network-storage/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Dell MD1120 + Perc6/E Performance</title>
		<link>http://blog.gtuhl.com/2009/05/13/dell-md1120-perc6e-performance/</link>
		<comments>http://blog.gtuhl.com/2009/05/13/dell-md1120-perc6e-performance/#comments</comments>
		<pubDate>Wed, 13 May 2009 12:33:44 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[SysAdmin]]></category>
		<category><![CDATA[dell]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[raid]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=318</guid>
		<description><![CDATA[The Hardware
We recently ordered one of Dell&#8217;s MD1120 units and a Perc 6/E raid card with 512MB battery-backed cache to beef up our production database.
Dell&#8217;s raid controllers are rebranded models manufacturered by other companies and they have been hit or miss.  They&#8217;ve done some horrible things (including advertising raid1-concatenated as raid10 a long while [...]]]></description>
			<content:encoded><![CDATA[<h2>The Hardware</h2>
<p>We recently ordered one of Dell&#8217;s <a href="http://www1.ap.dell.com/content/products/productdetails.aspx/storage_powervault_md1120?c=my&#038;l=en&#038;s=bsd">MD1120</a> units and a Perc 6/E raid card with 512MB battery-backed cache to beef up our production database.</p>
<p>Dell&#8217;s raid controllers are rebranded models manufacturered by other companies and they have been hit or miss.  They&#8217;ve done some horrible things (including advertising raid1-concatenated as raid10 a long while back) but my impression from reading online and from my own benchmarking are that these Perc6 cards are decent (but not exceptional).  You still get the lockin aspect &#8211; Dell won&#8217;t support your machine if there is a non-Dell raid card in it and the MD1xxx units supposedly only connect to Perc5 and Perc6 cards.</p>
<p>The MD1120 itself is a pretty cool unit.  Its only 2U and packs a lot of drives.  We ordered one with 24x 73GB 15K SAS drives.  No SSDs, I am amazed by SSD numbers but figure we can wait a few more years before shelling out the cash to fill an array with them.  I want more data on their reliability in a 24&#215;7 high IO server environment.  Here&#8217;s a picture of the new guy.</p>
<p><center><br />
<img src="http://blog.gtuhl.com/wp-content/md1120.jpg" width="550px"/><br />
</center></p>
<p>
&nbsp;&nbsp;
</p>
<p>The next time we have the need and budget to purchase a new database server from the ground up I plan to go whitebox but in this case we were looking for a relatively inexpensive way to get more capacity out of our existing Dell server and this seemed like a good option.  I just had to ignore their storage tech guy who wanted me to buy a Gigabit SAN unit and screw our performance.  So just ignore the tech guys that are part of the sales process and do your own research and benchmarking.</p>
<p>After benchmarking this Perc6+MD1120 combination extensively and putting it in production I am reasonably happy with its performance.  Going to share those numbers now as it is sometimes hard to track down data on these things.</p>
<h2>The Benchmark</h2>
<p>Bunch of notes about the testing environment and configurations for anyone interested.  If just want numbers skip past these.</p>
<ul>
<li>Perc 6/E upgraded to latest 6.2.0-0013 firmware and connected to a new PowerEdge 1950 with 2x Xeon E5410s and 8GB RAM.</li>
<li>MD1120 connected directly to Perc 6/E.</li>
<li>All hardware raid configured with 64kb stripes, write back enabled, read ahead disabled (Dell hardware read ahead isn&#8217;t good).</li>
<li>Server running latest opensuse.  Did this purely to make it easier to upgrade firmware, get Dell support etc.  If you call Dell and are using opensuse just lie and say you are running Suse 10 &#8211; everything will work and they will never know the difference.</li>
<li>All tests were run 3 times and the middle run was recorded.</li>
<li>xfs mount options were just <code>noatime</code> and ext3 mount options were <code>noatime,data=writeback</code>.</li>
<li>xfs file system params were <code>-b size=4096 -d su=64k,sw=X</code> where X was the appropriate value for the configuration involved.  ext3 params were <code>-b 4096 -E stride=16,stripe-width=192</code>.</li>
<li>dd params were &#8220;bs=8k, count=2000000&#8243; ensuring a file 2 times size of RAM to bypass OS cache.</li>
<li>The bonnie++ random seeks/second is the most important number for DB performance.</li>
<li>I did a ton of tests with the first configuration and then settled into a groove of just testing the bits that seemed to matter.  Hence the odd distribution of tests by config.</li>
</ul>
<h2>The dd and bonnie++ 1.02 results on opensuse</h2>
<p>The distinct raid configurations are color coded and numbered.  The winning individual tests are bolded.</p>
<table border="1" bordercolor="gray" cellpadding="3" cellspacing="0">
<tr bgcolor="#cccccc">
<th colspan="2">Record#</th>
<th colspan="3" align="center">Raid Level</th>
<th colspan="3" align="center">Linux Params</th>
<th colspan="3" align="center">Results</th>
</tr>
<tr bgcolor="#cccccc">
<th>Test</th>
<th>Config</th>
<th>HW</th>
<th>SW</th>
<th>Total</th>
<th>File System</th>
<th>Read Ahead</th>
<th>Sched.</th>
<th>dd Write MB/s</th>
<th>dd Read MB/s</th>
<th>bonnie++ seeks/s</th>
</tr>
<tr>
<td>1</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>540</td>
<td>519</td>
<td>787.4</td>
</tr>
<tr>
<td>2</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>noop</td>
<td>471</td>
<td>439</td>
<td>811.6</td>
</tr>
<tr>
<td>3</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>deadline</td>
<td>494</td>
<td>429</td>
<td>812.1</td>
</tr>
<tr>
<td>4</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>544</td>
<td>836</td>
<td>802.7</td>
</tr>
<tr>
<td>5</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>noop</td>
<td>474</td>
<td>837</td>
<td>809.4</td>
</tr>
<tr>
<td>6</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>deadline</td>
<td>492</td>
<td>791</td>
<td>808.4</td>
</tr>
<tr>
<td>7</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>533</td>
<td>853</td>
<td>805.9</td>
</tr>
<tr>
<td>8</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>536</td>
<td>976</td>
<td>806.3</td>
</tr>
<tr>
<td><strong>9</strong></td>
<td bgcolor="#a8d255"><strong>1</strong></td>
<td><strong>24disk&nbsp;raid10</strong></td>
<td><strong>None</strong></td>
<td><strong>10</strong></td>
<td><strong>xfs</strong></td>
<td><strong>32768</strong></td>
<td><strong>cfq</strong></td>
<td><strong>543</strong></td>
<td><strong>1035</strong></td>
<td><strong>808.6</strong></td>
</tr>
<tr>
<td>10</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>32768</td>
<td>cfq</td>
<td>332</td>
<td>602</td>
<td>695.2</td>
</tr>
<tr>
<td>11</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>4096</td>
<td>cfq</td>
<td>339</td>
<td>929</td>
<td>743</td>
</tr>
<tr>
<td>12</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>4096</td>
<td>noop</td>
<td>356</td>
<td>925</td>
<td>765.4</td>
</tr>
<tr>
<td>13</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>4096</td>
<td>deadline</td>
<td>342</td>
<td>909</td>
<td>712.9</td>
</tr>
<tr>
<td>14</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>566</td>
<td>572</td>
<td>780.4</td>
</tr>
<tr>
<td>15</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>noop</td>
<td>561</td>
<td>567</td>
<td>788.9</td>
</tr>
<tr>
<td>16</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>deadline</td>
<td>552</td>
<td>571</td>
<td>786.6</td>
</tr>
<tr>
<td>17</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>566</td>
<td>623</td>
<td>778</td>
</tr>
<tr>
<td>18</td>
<td bgcolor="#5727ef">3</td>
<td>2&#215;12disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>560</td>
<td>507</td>
<td>535.9</td>
</tr>
<tr>
<td><strong>19</strong></td>
<td bgcolor="#5727ef"><strong>3</strong></td>
<td><strong>2&#215;12disk&nbsp;raid10</strong></td>
<td><strong>raid0</strong></td>
<td><strong>100</strong></td>
<td><strong>xfs</strong></td>
<td><strong>4096</strong></td>
<td><strong>cfq</strong></td>
<td><strong>560</strong></td>
<td><strong>955</strong></td>
<td><strong>816</strong></td>
</tr>
<tr>
<td>20</td>
<td bgcolor="#5727ef">3</td>
<td>2&#215;12disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>558</td>
<td>857</td>
<td>817.5</td>
</tr>
<tr>
<td>21</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>436</td>
<td>478</td>
<td>415.9</td>
</tr>
<tr>
<td>22</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>440</td>
<td>1038</td>
<td>666</td>
</tr>
<tr>
<td>23</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>437</td>
<td>1054</td>
<td>670</td>
</tr>
<tr>
<td>24</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>8192</td>
<td>noop</td>
<td>434</td>
<td>1058</td>
<td>651.7</td>
</tr>
<tr>
<td>25</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>8192</td>
<td>deadline</td>
<td>435</td>
<td>1044</td>
<td>666.1</td>
</tr>
<tr>
<td>26</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>437</td>
<td>1083</td>
<td>667.3</td>
</tr>
<tr>
<td>27</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>424</td>
<td>391</td>
<td>670.2</td>
</tr>
<tr>
<td>28</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>426</td>
<td>1038</td>
<td>669.6</td>
</tr>
<tr>
<td>29</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>424</td>
<td>1052</td>
<td>669.9</td>
</tr>
<tr>
<td>30</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>424</td>
<td>1082</td>
<td>657.5</td>
</tr>
<tr>
<td>31</td>
<td bgcolor="#fdfa8e">6</td>
<td>3&#215;8disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>557</td>
<td>530</td>
<td>621.4</td>
</tr>
<tr>
<td>32</td>
<td bgcolor="#fdfa8e">6</td>
<td>3&#215;8disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>555</td>
<td>936</td>
<td>820.6</td>
</tr>
<tr>
<td>33</td>
<td bgcolor="#fdfa8e">6</td>
<td>3&#215;8disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>560</td>
<td>902</td>
<td>817.7</td>
</tr>
<tr>
<td><strong>34</strong></td>
<td bgcolor="#fdfa8e"><strong>6</strong></td>
<td><strong>3&#215;8disk&nbsp;raid10</strong></td>
<td><strong>raid0</strong></td>
<td><strong>100</td>
<td><strong>xfs</strong></td>
<td><strong>16384</strong></td>
<td><strong>cfq</strong></td>
<td><strong>555</strong></td>
<td><strong>1041</strong></td>
<td><strong>815.5</strong></td>
</tr>
<tr>
<td>35</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>367</td>
<td>573</td>
<td>817.5</td>
</tr>
<tr>
<td>36</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>360</td>
<td>964</td>
<td>814.7</td>
</tr>
<tr>
<td>37</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>358</td>
<td>994</td>
<td>816.3</td>
</tr>
<tr>
<td>38</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>377</td>
<td>1049</td>
<td>818.7</td>
</tr>
<tr>
<td>39</td>
<td bgcolor="#e68d82">8</td>
<td>12&#215;2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>549</td>
<td>408</td>
<td>598</td>
</tr>
<tr>
<td>40</td>
<td bgcolor="#e68d82">8</td>
<td>12&#215;2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>549</td>
<td>714</td>
<td>578.5</td>
</tr>
<tr>
<td>41</td>
<td bgcolor="#e68d82">8</td>
<td>12&#215;2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>546</td>
<td>643</td>
<td>563.8</td>
</tr>
<tr>
<td>42</td>
<td bgcolor="#e68d82">8</td>
<td>12&#215;2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>546</td>
<td>861</td>
<td>549.9</td>
</tr>
<tr>
<td>43</td>
<td bgcolor="#d3e8b1">9</td>
<td>24disk&nbsp;jbod</td>
<td>raid0</td>
<td>0</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>743</td>
<td>1054</td>
<td>687.6</td>
</tr>
<tr>
<td>44</td>
<td bgcolor="#6ec3fd">10</td>
<td>24disk&nbsp;raid0</td>
<td>None</td>
<td>0</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>773</td>
<td>1094</td>
<td>671.8</td>
</tr>
</table>
<p>
&nbsp;&nbsp;
</p>
<p>Observations:</p>
<ul>
<li>Raid10 is the best option.  Winning configurations are pure hardware raid10 with loads of readahead (test #9) and software-striped raid 10 for &#8220;raid 100&#8243; (tests #19 and #34).</li>
<li>I wasn&#8217;t impressed with raid6 or raid60 and raid0 isn&#8217;t a realistic option so that is why those setups aren&#8217;t as heavily hit in the above configurations.</li>
<li>The IO scheduler didn&#8217;t really make much difference.  CFQ seemed to be just as good or better so stuck with it (its the default).</li>
<li>readahead makes a <strong>huge</strong> difference.  Linux defaults this to 256 per drive and Linux sees a hardware raid array as 1 drive.  You absolutely must increase that 256 default value to at least 4096 in my opinion.  I increased it as high as 32768 for the pure raid10 config and performance didn&#8217;t suffer in the seeks/sec or as reported by pg_bench while sequential read speeds increased dramatically.</li>
<li>xfs is faster than ext3 in all tests I compared them in.  Included just a few ext3 numbers above.  But, don&#8217;t use XFS (or ext3 with data=writeback) unless you have <strong>both</strong> a battery-backed cache on your raid controller and are connected to a UPS for main power (and ideally you should be monitoring the health of the BBU on the raid controller to ensure the battery isn&#8217;t dead).  You could lose data if this advice is ignored.</li>
</ul>
<h2>The bonnie++ 1.03e results on Arch Linux</h2>
<p>Next I installed Arch Linux which happens to come with bonnie++ 1.03e.  I did a couple bonnie++ runs just to make sure the new OS didn&#8217;t mess anything up and was shocked to see dramatically better random seeks/second numbers.  Sequential speeds were virtually the same, but seeks/second was massively faster.  Here is a table showing some of the configs (I didn&#8217;t retest them all &#8211; was running out of time) with bonnie++ 1.03e seeks/second numbers.  I am going to credit the newer version of bonnie++ for this difference.  I retested enough configurations to feel good about the trends seen when testing with opensuse and bonnie++ 1.02 still holding.  Open to hearing other possible explanations for the performance increase.  I was glad to see these numbers as I was disappointed with the apparent 800ish ceiling I was seeing in the first batch of tests.</p>
<p>Here are the configs I retested (middle of 3 runs again).  The Test# matches the row in the above table.  The new 1.03e score is listed next to the old 1.02 score.</p>
<table border="1" bordercolor="gray" cellpadding="3" cellspacing="0">
<tr bgcolor="#cccccc">
<th>Test#</th>
<th>Config#</th>
<th>bonnie++ 1.02 seeks/sec
<th>bonnie++ 1.03e seeks/sec</th>
</tr>
<tr>
<td>1</td>
<td bgcolor="#a8d255">1</td>
<td align="right">787.4</td>
<td align="right">1613</td>
</tr>
<tr>
<td>4</td>
<td bgcolor="#a8d255">1</td>
<td align="right">802.7</td>
<td align="right">1652</td>
</tr>
<tr>
<td>5</td>
<td bgcolor="#a8d255">1</td>
<td align="right">809.4</td>
<td align="right">1639</td>
</tr>
<tr>
<td>6</td>
<td bgcolor="#a8d255">1</td>
<td align="right">808.4</td>
<td align="right">1688</td>
</tr>
<tr>
<td>7</td>
<td bgcolor="#a8d255">1</td>
<td align="right">805.9</td>
<td align="right">1684</td>
</tr>
<tr>
<td>8</td>
<td bgcolor="#a8d255">1</td>
<td align="right">806.3</td>
<td align="right">1697</td>
</tr>
<tr>
<td>9</td>
<td bgcolor="#a8d255">1</td>
<td align="right">808.6</td>
<td align="right">1717</td>
</tr>
<tr>
<td>19</td>
<td bgcolor="#5727ef">3</td>
<td align="right">816</td>
<td align="right">1662</td>
</tr>
<tr>
<td>26</td>
<td bgcolor="#bd550c">4</td>
<td align="right">667.3</td>
<td align="right">1056</td>
</tr>
<tr>
<td>30</td>
<td bgcolor="#6f5900">5</td>
<td align="right">657.5</td>
<td align="right">1168</td>
</tr>
<tr>
<td>34</td>
<td bgcolor="#fdfa8e">6</td>
<td align="right">815.5</td>
<td align="right">1705</td>
</tr>
<tr>
<td>38</td>
<td bgcolor="#86a3ff">7</td>
<td align="right">818.7</td>
<td align="right">1560</td>
</tr>
<tr>
<td>44</td>
<td bgcolor="#6ec3fd">10</td>
<td align="right">671.8</td>
<td align="right">1175</td>
</tr>
</table>
<p>
&nbsp;&nbsp;
</p>
<p>Observations:</p>
<p>Higher numbers across the board.  No big new insights.  Raid 10 still wins, raid 6/60 still substantially slower.  At this point that pure raid 10 config (test #9) that scored a 1717 is looking pretty nice.  The pure software raid 10 (test #38) fell behind the hardware version further.  Biggest take away from this is to be absolutely certain that when you are benchmarking disks the OS and tools are identical.</p>
<h2>The pgbench results</h2>
<p>Finally, I took the 3 fastest configurations and did pgbench runs with those.  I was running out of time so I took what looked like the winner (test #9) and additionally tuned readahead and schedules a bit to ensure I got the best combination.  pgbench isn&#8217;t perfect and there are people who dislike it but it gives us another number to compare and consider along with the raw dd/bonnie++ numbers already known.  Keep in mind I am using a PostgreSQL install and pgbench on this new server &#8211; not the actual production server.  Doing the benchmarks on the actual final server just wasn&#8217;t an option.  The only significant difference between the benchmark server and the production server is that the latter has 4 times the RAM and has had nontrivial postgresql.conf tuning so I can only assume these numbers would improve a good bit.</p>
<p>I just tweaked a few things in the postgresql.conf file on the benchmark server.  The non default values:</p>
<ul>
<li><code>shared_buffers = 2048MB</code></li>
<li><code>checkpoint_segments = 10</code></li>
<li><code>effective_cache_size = 4096MB</code></li>
<li><code>max_connections = 500</code></li>
<li><code>work_mem = 20MB</code></li>
<li><code>maintenance_work_mem = 128MB</code></li>
<li><code>synchronous_commits = off</code></li>
<li><code>random_page_cost = 2.0</code></li>
</ul>
<p>Note that I did have to increase SHMMAX to get shared_buffers + connections that high.  See my <a href="http://blog.gtuhl.com/2009/03/08/postgresql-setup-basics/">PostgreSQL setup post</a> for more information about that.</p>
<p>Between each run I dropped the &#8220;test&#8221; database and recreated it.  I initialized with these commands.  Note the different scale factors to get nontrivial data amounts &#8211; scale factor 1000 is pretty large but was necessary before I saw the disks working constantly during the benchmarks.</p>
<p><code>pgbench -i -s 100 -U postgres -d test</code><br />
<code>pgbench -i -s 1000 -U postgres -d test</code></p>
<p>Then I ran the tests specifying 40 clients and 10,000 transactions per client in the params.  <a href="http://developer.postgresql.org/pgdocs/postgres/pgbench.html">Read more about the pgbench tool</a> for the test differences and what the transactions involve.</p>
<p>Here are the pgbench results.  Numbers are transactions/second.  Values listed are again the middle of 3 runs.  I would have liked to do these pgbench runs on more configs but I again was running out of time and these suckers took a long time to run with all the dropping and re-initializing.</p>
<table border="1" bordercolor="gray" cellpadding="3" cellspacing="0">
<tr bgcolor="#cccccc">
<th>Test#</th>
<th>Config#</th>
<th>TPC-B s=100</th>
<th>SELECT s=100</th>
<th>TPC-B s=1000</th>
<th>SELECT s=1000</th>
</tr>
<tr>
<td>4</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1580</td>
<td align="right">10885</td>
<td align="right">1224</td>
<td align="right">3655</td>
</tr>
<tr>
<td>5</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1591</td>
<td align="right">10667</td>
<td align="right">1270</td>
<td align="right">3499</td>
</tr>
<tr>
<td>6</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1567</td>
<td align="right">10647</td>
<td align="right">1267</td>
<td align="right">3048</td>
</tr>
<tr>
<td>7</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1551</td>
<td align="right">10656</td>
<td align="right">1202</td>
<td align="right">3478</td>
</tr>
<tr>
<td>8</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1553</td>
<td align="right">10644</td>
<td align="right">1209</td>
<td align="right">3386</td>
</tr>
<tr>
<td>9</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1606</td>
<td align="right">10759</td>
<td align="right">1296</td>
<td align="right">3548</td>
</tr>
<tr>
<td>19</td>
<td bgcolor="#5727ef">3</td>
<td align="right">1581</td>
<td align="right">10743</td>
<td align="right">1311</td>
<td align="right">3269</td>
</tr>
<tr>
<td>34</td>
<td bgcolor="#fdfa8e">6</td>
<td align="right">1563</td>
<td align="right">10677</td>
<td align="right">1323</td>
<td align="right">3156</td>
</tr>
</table>
<p>
&nbsp;&nbsp;
</p>
<p>Observations:</p>
<p>These pgbench tests were mostly a wash.  But that isn&#8217;t a big surprise considering I only compared the 3 best configurations that were close in the dd/bonnie++ tests.  Really wish there had been time to get some other configs in that table to see the difference.  The #9 config did well with a top 3 number in all 4 tests.</p>
<h2>Conclusion</h2>
<p>I went with setup #9.  Pure hardware raid 10.  It won almost all tests (with the striped raid 10s being the only real competitors) and pure hardware raid is super easy to configure, maintain, and monitor.</p>
<p>More generally this MD1120 performs pretty well, especially for the relatively low price.  As a quick note on price buying one of these MD1120s as configured above along with a PowerEdge 1950 with 16GB RAM and a Perc6/E card to connect them would cost less than $15k (about $540/mo for a $1 36mo lease) depending on the configuration and the deal you got.   You could probably even shave a few thousand off that number if you got a real good deal.</p>
<p>If you do buy stuff from Dell be sure to get in touch with a small business sales team.  They can offer nontrivial discounts on the price you can finagle in their online shopping cart, you get to talk to the same people every time you place an order, it gives you a contact if you have questions or run into issues with technical support, and it just generally is the way to go.</p>
<p>This MD1120+Perc6/E combo is connected to our existing DB server now and all I will say is performance is excellent.  I am seeing zero query backup, barely any IO wait reported by vmstat, and hugely impressive random IO spikes when load gets heavy (though we haven&#8217;t gotten close to maxing out so who knows how high it could go).  And, thanks to all this testing, I feel really good about having the optimal configuration for the hardware out there doing work.</p>
<img src="http://feeds.feedburner.com/~r/gtuhl/~4/ivNCNs4C0ns" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/05/13/dell-md1120-perc6e-performance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Bulk Data Loading With PostgreSQL</title>
		<link>http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/</link>
		<comments>http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/#comments</comments>
		<pubDate>Sat, 18 Apr 2009 13:18:22 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[SysAdmin]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=299</guid>
		<description><![CDATA[Bulk data loading is usually a niche task &#8211; something that doesn&#8217;t have to be done too often but when it does it feels painful because it takes so long and if you mess something up it has to be done again.  After loading dozens of 7 and 8 digit record count files into [...]]]></description>
			<content:encoded><![CDATA[<p>Bulk data loading is usually a niche task &#8211; something that doesn&#8217;t have to be done too often but when it does it feels painful because it takes so long and if you mess something up it has to be done again.  After loading dozens of 7 and 8 digit record count files into PostgreSQL databases I think it would be helpful to at least share the basic stuff that can make the process go a lot faster.  The PostgreSQL documentation of course has everything you need for bulk data loading if you would rather read the official docs <a href="http://www.postgresql.org/docs/8.3/interactive/populate.html">here</a> but I consistently hear cases where the advice there isn&#8217;t used so I wanted to show some numbers.  That link covers more techniques, all this post really adds is some numbers to back up the basics.</p>
<p>For this post I am working with 600k records being loaded into a table that looks like this.  The data plus delimiters is about 60MB.</p>
<pre>
                 Table "public.people"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 id          | character varying(36)       | not null
 first_name  | character varying(255)      |
 last_name   | character varying(255)      |
 middle_name | character varying(255)      |
 client_id   | character varying(36)       |
 birth_date  | timestamp without time zone |
Indexes:
    "people_pkey" PRIMARY KEY, btree (id)
    "idx_people_client_id" btree (client_id)
    "idx_people_name_search" gin (to_tsvector('english'::regconfig,
      (((COALESCE(first_name,
      ''::character varying)::text || ' '::text) ||
      COALESCE(middle_name, ''::character varying)::text) || ' '::text) ||
      COALESCE(last_name, ''::character varying)::text))
</pre>
<p></p>
<p></p>
<p>A few points on this particular table and config:</p>
<ul>
<li>The data is a subset of a real production table made up of entries people provided on themselves so the distribution is realistic and balanced.</li>
<li>The primary key id column is a UUID generated by code.</li>
<li>A very basic index on the client_id column.</li>
<li>A more complex expression gin index on the name fields to allow full text searching.  Full text searching was added in PostgreSQL 8.3 and it&#8217;s pretty powerful.  Read more about that <a href="http://www.postgresql.org/docs/8.3/interactive/textsearch.html">here</a> if interested.</li>
<li>Overall it is a relatively small, simple table but not trivially so and its large enough to get decent timings from bulk load operations for comparison purposes.</li>
<li>Performing tests against a clean DB, empty table, and freshly created indexes so we have a best case scenario.</li>
</ul>
<p>So, let&#8217;s say you have that empty table and want to load 600k records in.  Here is a sequence of methods going from worst to best.  I am doing a VACUUM ANALZYE before and after each operation.  All timings rounded up to nearest second.  We are comparing <a href="http://www.postgresql.org/docs/current/static/sql-insert.html">INSERT</a> and <a href="http://www.postgresql.org/docs/8.3/static/sql-copy.html">COPY</a>.</p>
<h2>First Setup: leave indexes and pkey in place.</h2>
<p><strong>INSERT</strong>: 6 minutes, 5 seconds</p>
<p><strong>COPY</strong>: 3 minutes, 11 seconds. </p>
<h2>Second Setup: drop/recreate indexes, don&#8217;t touch pkey</h2>
<p>When timed it took 1 second to drop indexes and 13 seconds to rebuild them.  I have added that overhead into these numbers.</p>
<p><strong>INSERT</strong>: 3 minutes, 19 seconds</p>
<p><strong>COPY</strong>: 1 minute, 5 seconds</p>
<h2>Third Setup: drop/recreate indexes and pkey</h2>
<p>When timed it took 1 second to drop indexes and 15 seconds to rebuild them (includes pkey).  I have added that overhead into these numbers.</p>
<p><strong>INSERT</strong>: 2 minutes, 49 seconds</p>
<p><strong>COPY</strong>: 21 seconds</p>
<h2>Conclusion</h2>
<p>With that final setup, dropping/recreating all indexes and using the COPY command, I was able to load 600k records in 21 seconds.  That includes the drop/recreate overhead.  The actual COPY command only took 5 seconds.</p>
<p>General notes related to the tests done above:</p>
<ul>
<li>The COPY command is magical.  It completely blows away INSERT for bulk load performance.</li>
<li>Dropping and recreating indexes is far faster than leaving indexes in place and forcing thousands or millions of individual updates to them.</li>
<li>Do a VACUUM ANALYZE before and after bulk loading data to be safe.  If you dropped indexes be sure to recreate them before doing the ANALYZE.</li>
<li>If you don&#8217;t think the speed improvement shown is significant keep in mind we are working with a narrow 6 column table, with just 3 indexes, and not very much data.  The savings found by dropping and recreating indexes becomes enormous as your table height and width increases and/or your number of indexes goes up.  Similarly if you are only working with thousands or low tens of thousands of rows you can just do whatever is easiest since the size is so trivial.</li>
</ul>
<p>Some other notes worth mentioning:</p>
<ul>
<li>The same rules apply for updates.  ALL indexes slow down updates, not just the ones on the column you are updating so if you need to update an entire table dropping and recreating the indexes will provide a massive speed increase.</li>
<li>If you are doing a bulk update and using a field on the updated table for lookups you are generally better off dropping all indexes <strong>except</strong> the one on your lookup column.  Say for each person in my people table I am looking up the specific person by id and setting their client_id.  I would want to leave the index/pkey on the id column so those lookups were fast.  You still pay a penalty at write time because the index/pkey must be updated but not having the index would mean a sequential scan is done on the entire table for each update.</li>
<li>Though incredibly fast the COPY command is <strong>very</strong> picky about its input.  It is an all or nothing operation and if there is even a slight issue with your data it will fail.  You will want to VACUUM your table after a failed COPY to recover space.  Thankfully if it does fail it prints out the exact line number and reason.</li>
<li>When specifying a file for the COPY command you must use an absolute path and the file (and all directories in its path) must be accessible by the postgres user because it is the server that goes and reads the file not the client.  I find it easiest to just toss my input files into /tmp.</li>
<li>Consider partitioning your data to allow drop/add of indexes on certain partitions preventing an entire table from having awful read performance while data is being loaded.  Partitioning is a whole separate topic but <a href="http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html">read the docs</a>, be sure constraint_exclusion is on in your postgresql.conf, and be sure your check constraint is in the where clause of all queries if you take this route.  Partitions and significant bulk data loading fit together really really well.</li>
<li>Seriously read the <a href="http://www.postgresql.org/docs/8.3/interactive/populate.html">PostgreSQL documentation on this stuff</a> if you are bulk loading data.  Aside from the additional SQL-level recommendations it covers the tuning options for your DB that can make a very significant difference.</li>
</ul>
<p>Finally, you may find that expression index for full text searching in my people table above looks strange.  I hope to do a post on indexing specifically that explains in detail.  In my opinion tuning database indexes is a blast and made even greater by the amazing capabilities of PostgreSQL.</p>
<img src="http://feeds.feedburner.com/~r/gtuhl/~4/Y_5OHmR1tZQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Connection Pooling: PgBouncer or JDBC</title>
		<link>http://blog.gtuhl.com/2009/04/04/connection-pooling-pgbouncer-or-jdbc/</link>
		<comments>http://blog.gtuhl.com/2009/04/04/connection-pooling-pgbouncer-or-jdbc/#comments</comments>
		<pubDate>Sat, 04 Apr 2009 16:09:53 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[SysAdmin]]></category>
		<category><![CDATA[jdbc]]></category>
		<category><![CDATA[pgbouncer]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=276</guid>
		<description><![CDATA[Why Use Connection Pooling
With recent customer additions we have had to deal with lots of scaling challenges the last few months.  One puny fraction of that effort has been around tuning our database connection pools.  In very short, if your database server is overwhelmed with concurrent connections it can get bogged down in [...]]]></description>
			<content:encoded><![CDATA[<h2>Why Use Connection Pooling</h2>
<p>With recent customer additions we have had to deal with lots of scaling challenges the last few months.  One puny fraction of that effort has been around tuning our database connection pools.  In very short, if your database server is overwhelmed with concurrent connections it can get bogged down in context switching and resource/io waiting such that overall throughput or transactions per second can drop.  In the best case things just get slow.  In the worst case things completely lock up because the server just can&#8217;t service all of those connections or your client code starts throwing exceptions because the database has reached it maximum configured connection count.</p>
<p>One way to control the maximum number of connections hitting your database at once is via connection pooling.  This can ensure a consistent load that your database can dispatch quickly to keep a better overall throughput going.  You can also match your aggregate pool sizes to the maximum connection count configured in your database server to ensure your client code never asks for more than the DB can give.  The other HUGE advantage to connection pooling is that you are essentially caching the overhead of obtaining new connections.  To the database there is a set number of connections made and held open &#8211; the pooling software then distributes those pooled connections to clients that need them and when the clients are done the connection remains open and waiting to be used again.  This avoids a surprisingly high amount of overhead.</p>
<p>There is definitely a trade off.  If your application needs to have a certain number of connections simply because you have too many concurrent users it is time to tune your code/queries/indexes (<strong>HUGE</strong> gains can always be found there), push stuff into distributed caches, or buy new database hardware so you can service more connections.   There are recommendations online of a connection count as low as 2-4 per core.  We&#8217;ve settled in around 30 per core and that seems to be a sweet spot for stability and performance.  With 8 cores we can sustain well over 1000 transactions per second on a single PostgreSQL server (running on decent but not exceptional hardware).</p>
<p>We use JDBC connection pooling with <a href="http://www.mchange.com/projects/c3p0/index.html">c3p0</a>.  Each of our application servers has its own pool.  We ensure that the sum of those pool sizes matches the connections that the database allows and also equals our desired connections per core count.</p>
<p>We explored using <a href="https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer">PgBouncer</a> instead.  This differs in that the connection pooling happens on the server &#8211; clients just ask for as many connections as they need talking to PgBouncer instead of directly to PostgreSQL.</p>
<p>You cannot use both JDBC pooling and PgBouncer or other server-side pooling.  Or more correctly you can but there is no point.  Say you set PgBouncer to maintain a pool of 500 connections.  If you have JDBC pooling in the mix that just means you have to tune your individual client JDBC pools such that the sum of their pools doesn&#8217;t exceed 500.  If that is the case, why use PgBouncer?  Just use PostgreSQL and set it to handle 500 connections.  Instead you would drop JDBC pooling and just have your clients ask for whatever they wanted while PgBouncer did the pooling and connection management.</p>
<h2>Comparison</h2>
<p>In deciding to move to PgBouncer or stick with c3p0 I ran a few tests.  Here are those results.  The goal here was to compare the overhead of obtaining connections <strong>not</strong> to determine the overall best solution by throughput.  That would require significantly different and more involved tests.  We were considering a move to PgBouncer and just wanted to consider any difference in connection overhead.  These tests were done with a pool size of 50 and with client code connecting to a separate database server on the same subnet and physical switch.</p>
<p><strong>First Test</strong><br />
Ran 500 queries with each query being a simple <code>select 1</code>.  This simple query means most of the time is connection overhead so its a good test to decide between pooling solutions.</p>
<table>
<tr>
<td>c3p0 ComboPooledDataSource</td>
<td>.99 s</td>
</tr>
<tr>
<td>JDBC no pooling and no PgBouncer</td>
<td>6.41 s</td>
</tr>
<tr>
<td>JDBC no pooling but with PgBouncer</td>
<td>3.99 s</td>
</tr>
</table>
<p></p>
<p><strong>Second Test</strong><br />
Ran 500 queries where the query is selected sequentially from an array of 5 distinct queries varying from very simple to very complex.  Highly imperfect test, but trying to see how connection overhead factors into a more normal load. </p>
<table>
<tr>
<td>c3p0 ComboPooledDataSource</td>
<td>122.18 s</td>
</tr>
<tr>
<td>JDBC no pooling no PgBouncer</td>
<td>139.71 s</td>
</tr>
<tr>
<td>JDBC no pooling but with PgBouncer</td>
<td>129.81 s</td>
</tr>
</table>
<p></p>
<p>So c3p0 definitely offers the fastest solution.  Additionally PgBouncer definitely makes a difference over not using connection pooling (and thus having to obtain a new connection with each query).  This makes some sense as c3p0 is closer to the client code and keeps the pooled connections local.  With PgBouncer the clients have to hit the network to queue up with the server.</p>
<h2>Conclusion</h2>
<p>We stuck with c3p0.  But in my opinion where PgBouncer really shines is on the administrative side.  As we continue to add more servers a move to PgBouncer or something similar seems likely.  It (and <a href="http://pgpool.projects.postgresql.org/">PgPool-II</a>, another solution) offer other capabilities as well.  PgPool-II even offers built-in load balancing of reads against replicated copies of a DB. </p>
<p>Say you have 10 application servers and 1 database server configured for 500 maximum connections.  You set the individual application server pool sizes to 50.  Now you need to add a new application server.  You would have to tune all 10 of your old servers dropping their pool sizes down to make room for the new one.  This problem gets more and more severe as you add more servers.</p>
<p>Also, tuning individual pool sizes per client means there is a chance that one server is starving while others have pooled connections sitting idle &#8211; the connections aren&#8217;t shared.  Probably not a big issue if you are load balancing user connections to your application servers evenly but still a possibility.</p>
<p>As you get more and more machines the server-layer pooling with PgBouncer definitely makes more sense.  It also is obviously client language neutral so if you are running in an environment that doesn&#8217;t have mature pooling solutions its a great option.</p>
<img src="http://feeds.feedburner.com/~r/gtuhl/~4/POxpvcstt9Y" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/04/04/connection-pooling-pgbouncer-or-jdbc/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss><!-- Dynamic Page Served (once) in 0.718 seconds --><!-- Cached page served by WP-Cache -->
