<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0" xml:lang="en" xml:base="http://www.bitquill.net/blog/wp-atom.php">
	<title type="text">b i t q u i l l</title>
	<subtitle type="text">Search for Terrestrial Intelligence</subtitle>

	<updated>2009-09-03T13:41:23Z</updated>
	<generator uri="http://wordpress.org/" version="2.7.1">WordPress</generator>

	<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog" />
	<id>http://www.bitquill.net/blog/?feed=atom</id>
	

			<link rel="self" href="http://feeds.feedburner.com/bitquill-all" type="application/atom+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[Mobile OCR input: &#8220;Fully automatic&#8221; and reality]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=119" />
		<id>http://www.bitquill.net/blog/?p=119</id>
		<updated>2009-09-03T13:41:23Z</updated>
		<published>2009-09-01T18:06:08Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Sci &amp; Tech" /><category scheme="http://www.bitquill.net/blog" term="Android" /><category scheme="http://www.bitquill.net/blog" term="Computer Science" /><category scheme="http://www.bitquill.net/blog" term="Development" /><category scheme="http://www.bitquill.net/blog" term="Machine learning" /><category scheme="http://www.bitquill.net/blog" term="Mobile devices" /><category scheme="http://www.bitquill.net/blog" term="Research" /><category scheme="http://www.bitquill.net/blog" term="User interfaces" />		<summary type="html"><![CDATA[Recently I&#8217;ve been toying around with WordSnap OCR (project page, source code, app on Android Market), an app for OCR-based camera input on Android. In the process, I found out a few things about &#8220;smart&#8221; versus &#8220;fast&#8221;.
At least in data mining, &#8220;fully automatic&#8221; is an often unquestioned holy grail.  There are certainly several valid reasons for [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=119">&lt;p&gt;Recently I&amp;#8217;ve been toying around with &lt;a title="WordSnap OCR" href="http://www.bitquill.net/trac/wiki/Android/OCR"&gt;WordSnap OCR&lt;/a&gt; (&lt;a title="WordSnap OCR - Wiki" href="http://www.bitquill.net/trac/wiki/Android/OCR"&gt;project page&lt;/a&gt;, &lt;a title="WordSnap OCR - Google Code" href="http://code.google.com/p/wordsnap-ocr/source/browse/#svn/trunk"&gt;source code&lt;/a&gt;, &lt;a title="WordSnap OCR - Cyrket" href="http://www.cyrket.com/package/net.bitquill.ocr"&gt;app on Android Market&lt;/a&gt;), an app for OCR-based camera input on Android. In the process, I found out a few things about &lt;strong&gt;&amp;#8220;smart&amp;#8221; versus &amp;#8220;fast&amp;#8221;&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;At least in data mining, &amp;#8220;fully automatic&amp;#8221; is an often unquestioned holy grail.  There are certainly several valid reasons for this, for example if you&amp;#8217;re trying to scan huge collections of books (like &lt;a title="Google Books" href="http://books.google.com/"&gt;this&lt;/a&gt;) or index images from your daily life (like &lt;a title="Evernote" href="http://evernote.com/"&gt;this&lt;/a&gt;).  &lt;strong&gt;In these cases, you use all the available processing power to make as few errors as possible&lt;/strong&gt; (i.e., maximize accuracy).&lt;/p&gt;
&lt;p&gt;However, if the user is sitting right in front of your program, watching your algorithms and their output, things are a little different. &lt;strong&gt;No matter how smart your algorithm is, some errors will occur.&lt;/strong&gt; This tends to annoy users. In that sense, actively involved users are a liability. However, they can also be an asset: since they&amp;#8217;re sitting there anyway, waiting for results, you may as well get them &lt;em&gt;really&lt;/em&gt; involved.&lt;strong&gt; If you have cheap but intelligent labor ready and willing, use it!&lt;/strong&gt; The results will be better or, at the very least, no worse.  &lt;strong&gt;Also, users tend to remember the failures.&lt;/strong&gt; So, even if end results were similar &lt;em&gt;on average&lt;/em&gt;, allowing users to correct failures as early as possible will make them happier.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Instead of making algorithms as smart as possible, the goal now is to make them as fast as possible, so that they produce near-realtime results that don&amp;#8217;t have to be perfect; they just shouldn&amp;#8217;t be total garbage.&lt;/strong&gt; When I started playing with the idea for WordSnap, I was thinking about how to make the algorithms as smart as possible.  However, for the reasons above, I soon changed tactics.&lt;/p&gt;
&lt;p&gt;The rest of this post describes some of the successful design decisions but, more importantly, the failures in the balance between &amp;#8220;automatic&amp;#8221; and &amp;#8220;realtime guidance&amp;#8221;. The story begins with the following example image:&lt;/p&gt;
&lt;p&gt;&lt;img class="size-full wp-image-149 alignnone" title="Original grayscale image" src="http://www.bitquill.net/blog/wp-content/uploads/2009/08/skew_original.png" alt="Original image" width="512" height="384" /&gt;&lt;/p&gt;
&lt;p&gt;Incidentally, this image was the inspiration for WordSnap: I wanted to look up &amp;#8220;inimical&amp;#8221; but I was too lazy to type. Also, for the record, WordSnap uses camera preview frames, which are semi-planar YUV data at HVGA resolution (480×320). This image is a downsampled (512×384) version of a full-resolution photograph taken with the G1 camera (2048×1536); most experiments here were performed before WordSnap existed in any usable form. Finally, I should point out that OCR isn&amp;#8217;t really my area; what I describe below is based on common sense rather than knowledge of prior art, although just before writing this post I did try a quick review of the literature.&lt;/p&gt;
&lt;h3&gt;Binarization&lt;/h3&gt;
&lt;p&gt;A basic operation for OCR is binarization: mapping grayscale intensities between 0 and 255 to just two values, black (0) and white (1).  Only then can we start talking about shapes (lines, words, characters, etc).  One of the most widely used binarization algorithms is &lt;a title="Otsu's method - Wikipedia" href="http://en.wikipedia.org/wiki/Otsu's_method"&gt;Otsu&amp;#8217;s method&lt;/a&gt;.  It picks a single, global threshold that minimizes the within-class (black/white) variance or, equivalently, maximizes the between-class variance. This is very simple to implement, very fast and works well for flatbed scans, which have uniform illumination.&lt;/p&gt;
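&lt;p&gt;To make this concrete, here is a minimal sketch of Otsu&amp;#8217;s method in plain Python (not WordSnap&amp;#8217;s or ZXing&amp;#8217;s code; the function names are made up for illustration). It assumes a flat list of 8-bit intensities, tries all 256 candidate thresholds, and keeps the one with the largest between-class variance:&lt;/p&gt;
&lt;pre&gt;
```python
def otsu_threshold(pixels):
    """Return a global Otsu threshold t for 8-bit intensities (0..255).

    Values in 0..t form the black class, values above t the white class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(v * count for v, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w_black = 0    # number of pixels with value in 0..t
    sum_black = 0  # sum of those pixel values
    for t in range(256):
        w_black += hist[t]
        sum_black += t * hist[t]
        w_white = total - w_black
        if w_black == 0 or w_white == 0:
            continue  # one class is empty: not a real split
        mu_black = sum_black / w_black
        mu_white = (sum_all - sum_black) / w_white
        # Between-class variance, scaled by total**2 (which does not change
        # the argmax); maximizing it minimizes the within-class variance.
        between = w_black * w_white * (mu_black - mu_white) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize_global(pixels, t):
    """Map each pixel to 1 (white) if it is brighter than t, else 0 (black)."""
    return [1 if p > t else 0 for p in pixels]
```
&lt;/pre&gt;
&lt;p&gt;For example, with pixels = [10, 12, 11, 200, 210, 205] the threshold comes out as 12, and binarize_global maps the three dark values to 0 and the three bright ones to 1. A single pass over a 256-bin histogram is what makes the method so cheap.&lt;/p&gt;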
&lt;p&gt;&lt;strong&gt;However, camera images are not uniformly illuminated.&lt;/strong&gt; The example image may look fine to human eyes, but it turns out that even for this image no global threshold is suitable (click on image for &lt;a title="Global thresholding - Animation" href="http://www.bitquill.net/blog/wp-content/uploads/2009/08/skew_bin_global.gif"&gt;animation showing various global thresholds&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2009/08/skew_bin_global.gif"&gt;&lt;img class="alignnone size-full wp-image-157" title="Binarization with global threshold" src="http://www.bitquill.net/blog/wp-content/uploads/2009/08/skew_bin_global_static.png" alt="Binarization with global threshold" width="512" height="384" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you looked at the animation carefully, you might have noticed that at some point, at least the word of interest (&amp;#8220;inimical&amp;#8221;) is correctly binarized in this picture.  However, if the lighting gradient were steeper, this would not be possible. Incidentally, &lt;a title="ZXing - Google Code" href="http://code.google.com/p/zxing/"&gt;ZXing&lt;/a&gt; uses Otsu&amp;#8217;s method for binarization, because it is fast. So, if you wondered why barcode scanning sometimes fails, now you know one likely reason.&lt;/p&gt;
&lt;p&gt;So, a slightly smarter approach is needed: instead of using one global threshold,&lt;strong&gt; the threshold should be determined individually for each pixel (i,j)&lt;/strong&gt;. A natural threshold t(i,j) is the mean intensity μ&lt;sub&gt;w&lt;/sub&gt;(i,j) of pixels within a w×w neighborhood around pixel (i,j).  The key operation here is mean filtering: convolving the original image with a w×w matrix with constant entries 1/w&lt;sup&gt;2&lt;/sup&gt;.&lt;/p&gt;
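&lt;p&gt;A direct, unoptimized sketch of this rule in Python follows (the function name is hypothetical; the image is assumed to be a list of rows of 0..255 integers, with windows clipped at the borders). Pixels at least as bright as their local mean become white (1), so flat background stays white:&lt;/p&gt;
&lt;pre&gt;
```python
def local_mean_threshold(img, w):
    """Binarize img (list of rows of 0..255 ints) against its local mean.

    t(i, j) is the mean of the w x w window centred on (i, j), clipped at
    the image borders, so border windows hold fewer pixels. This is the
    naive O(w^2 m n) version.
    """
    r = w // 2
    m, n = len(img), len(img[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            total, count = 0, 0
            for y in range(max(i - r, 0), min(i + r, m - 1) + 1):
                for x in range(max(j - r, 0), min(j + r, n - 1) + 1):
                    total += img[y][x]
                    count += 1
            # p >= total / count, rewritten as p * count >= total (no division)
            out[i][j] = 1 if img[i][j] * count >= total else 0
    return out
```
&lt;/pre&gt;
&lt;p&gt;On a 3×3 patch of 200s with a single 50 in the middle, only the centre pixel falls below its local mean and comes out black.&lt;/p&gt;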
&lt;p&gt;The problem is that, using pure Java running on Dalvik, mean filtering is prohibitively slow.  First, Dalvik is fully interpreted (no JIT, yet). Furthermore, the fact that Java bytes are always signed doesn&amp;#8217;t help: casting to int and masking off the 24 most significant bits almost doubles running time.&lt;/p&gt;
&lt;table border="0" cellspacing="3" cellpadding="2"&gt;
&lt;tbody&gt;
&lt;tr style="background-color: #dddddd"&gt;
&lt;th&gt; Method&lt;/th&gt;
&lt;th colspan="3" align="center"&gt;Dalvik (msec)&lt;/th&gt;
&lt;th colspan="3" align="center"&gt;JNI (msec)&lt;/th&gt;
&lt;th align="center"&gt; Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Naïve&lt;/td&gt;
&lt;td align="right"&gt;109,882&lt;/td&gt;
&lt;td&gt;±&lt;/td&gt;
&lt;td&gt;4,813&lt;/td&gt;
&lt;td align="right"&gt;1,712&lt;/td&gt;
&lt;td&gt;±&lt;/td&gt;
&lt;td&gt;261&lt;/td&gt;
&lt;td align="center"&gt;64×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sliding&lt;/td&gt;
&lt;td align="right"&gt;2,435&lt;/td&gt;
&lt;td&gt;±&lt;/td&gt;
&lt;td&gt;141&lt;/td&gt;
&lt;td align="right"&gt;71&lt;/td&gt;
&lt;td&gt;±&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td align="center"&gt;34×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;JNI to the rescue. The table above shows running times and speedups for two implementations.  The naïve approach uses a triple nested loop and has complexity O(w&lt;sup&gt;2&lt;/sup&gt;mn), where m and n are the image height and width, respectively (m = 384, n = 512 in this example).  The 1-D equivalent would simply be:&lt;/p&gt;
&lt;pre&gt;def window_sums_naive(a, r):
    out = []
    for i in range(len(a)):
        s = 0
        for j in range(max(i - r, 0), min(i + r, len(a) - 1) + 1):
            s += a[j]
        out.append(s)
    return out&lt;/pre&gt;
&lt;p&gt;where w = 2r+1 is the window size. The second implementation updates the sums incrementally, based on the values of adjacent windows, so the complexity drops to O(mn), independent of w. An interesting aside is the relative performance of two implementations for sliding window sums.  The first checks border conditions inside each iteration:&lt;/p&gt;
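&lt;pre&gt;def window_sums_sliding(a, r):
    n = len(a)
    s = sum(a[0:r + 1])
    out = [s]
    for i in range(1, n):
        if i &amp;gt; r:
            s -= a[i - r - 1]  # drop the element leaving the window
        if i &amp;lt; n - r:
            s += a[i + r]      # add the element entering the window
        out.append(s)
    return out&lt;/pre&gt;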
&lt;p&gt;The second moves the border condition checks outside the loop which, if you think about it for a second, amounts to:&lt;/p&gt;
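&lt;pre&gt;def window_sums_split(a, r):
    n = len(a)  # assumes n &amp;gt; 2r
    s = sum(a[0:r + 1])
    out = [s]
    for i in range(1, r + 1):      # left border: window only grows
        s += a[i + r]
        out.append(s)
    for i in range(r + 1, n - r):  # interior: window slides
        s -= a[i - r - 1]
        s += a[i + r]
        out.append(s)
    for i in range(n - r, n):      # right border: window only shrinks
        s -= a[i - r - 1]
        out.append(s)
    return out&lt;/pre&gt;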
&lt;p&gt;Of these two, the &lt;em&gt;first&lt;/em&gt; one is faster, at least on a laptop running Sun&amp;#8217;s JVM with JIT (I didn&amp;#8217;t time Dalvik or JNI).&lt;strong&gt; I&amp;#8217;m guessing that the second one messes up loop unrolling&lt;/strong&gt;, but I haven&amp;#8217;t checked my guess.&lt;/p&gt;
&lt;p&gt;It turns out that there is a very similar approach in the literature, called &lt;em&gt;Sauvola&amp;#8217;s method&lt;/em&gt;. Furthermore, there are &lt;a href="http://www.dfki.uni-kl.de/~shafait/papers/Shafait-efficient-binarization-SPIE08.pdf"&gt;efficient methods&lt;/a&gt; to compute it, using integral images. These are simply the 2-D generalization of partial sums. In 1-D, if partial sums are pre-computed, each window sum can be computed in O(1) time using the simple observation that sum(i..j) = sum(1..j) - sum(1..i-1).&lt;/p&gt;
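&lt;p&gt;The integral-image idea can be sketched in a few lines of Python (helper names are hypothetical, and this is not the code from the linked paper). One pass builds a table of 2-D partial sums padded with a row and a column of zeros; after that, any rectangular window sum takes four lookups:&lt;/p&gt;
&lt;pre&gt;
```python
def integral_image(img):
    """S[i][j] = sum of img[y][x] over y in 0..i-1, x in 0..j-1 (zero-padded)."""
    m, n = len(img), len(img[0])
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        row_sum = 0  # running sum of img[i][0..j]
        for j in range(n):
            row_sum += img[i][j]
            S[i + 1][j + 1] = S[i][j + 1] + row_sum
    return S


def window_sum(S, top, left, bottom, right):
    """Sum of img over rows top..bottom and columns left..right (inclusive), in O(1).

    The 2-D version of sum(i..j) = sum(1..j) - sum(1..i-1).
    """
    return (S[bottom + 1][right + 1] - S[top][right + 1]
            - S[bottom + 1][left] + S[top][left])
```
&lt;/pre&gt;
&lt;p&gt;For img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]], the bottom-right 2×2 window (rows 1..2, columns 1..2) sums to 28. The catch, discussed next, is the extra (m+1)×(n+1) buffer of integers.&lt;/p&gt;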
&lt;p&gt;Sauvola&amp;#8217;s method also computes the local standard deviation σ&lt;sub&gt;w&lt;/sub&gt;(i,j), and uses a relative threshold t(i,j) = μ&lt;sub&gt;w&lt;/sub&gt;(i,j)(1 + λ(σ&lt;sub&gt;w&lt;/sub&gt;(i,j)/R - 1)), where R is the dynamic range of the standard deviation (typically R = 128).  &lt;strong&gt;WordSnap uses the global standard deviation and an additive threshold&lt;/strong&gt; t(i,j) = μ&lt;sub&gt;w&lt;/sub&gt;(i,j) + λσ&lt;sub&gt;global&lt;/sub&gt;, but after doing a contrast stretch of the original image (i.e., linearly mapping minimum intensity to 0 and maximum to 255).  Floating-point math and 64-bit integer arithmetic are much more expensive, hence the additive threshold.  Furthermore, WordSnap does not use integral images because the same runtime can be achieved without allocating a large buffer. &lt;strong&gt;Memory allocation on a mobile device is not cheap:&lt;/strong&gt; the time needed to allocate a 480×320 buffer of 32-bit integers (about 600KB total) varies significantly depending on how much system memory is available, whether the garbage collector is triggered and so on, but on average it&amp;#8217;s about half a second on the G1. Even though most buffers can be allocated once, &lt;strong&gt;startup time is important&lt;/strong&gt; for this application: if it takes more than 2-3 seconds to start scanning, the user might as well have typed the result.&lt;/p&gt;
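&lt;p&gt;Below is a rough Python sketch of the additive scheme described above, assembled from this paragraph only; it is not WordSnap&amp;#8217;s source. Everything stays in integer arithmetic, the local means come from separable running sums (no integral image), the default w = 21 follows the 21×21 neighborhood mentioned in the conclusion below, and all names are hypothetical. The post gives neither the value nor the sign of λ; the sketch assumes a small negative λ, written as a percentage, so that flat background (whose pixels equal their local mean) stays white:&lt;/p&gt;
&lt;pre&gt;
```python
import math


def contrast_stretch(img):
    """Linearly map the minimum intensity to 0 and the maximum to 255 (integer math)."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def box_sums_1d(a, r):
    """Window sums of radius r, clipped at the ends, via a running sum."""
    n = len(a)
    s = sum(a[0:r + 1])
    out = [s]
    for i in range(1, n):
        if i > r:
            s -= a[i - r - 1]  # element leaving the window
        if n - r > i:
            s += a[i + r]      # element entering the window
        out.append(s)
    return out


def binarize_additive(img, w=21, lam_percent=-20):
    """Return a 0/1 image: 1 (white) where p >= local mean + lambda * global std dev.

    lam_percent is lambda in percent; its value and sign are assumptions
    for illustration, not taken from WordSnap.
    """
    img = contrast_stretch(img)
    m, n = len(img), len(img[0])
    r = w // 2

    # Separable box filter: horizontal window sums, then vertical sums of those.
    rows = [box_sums_1d(row, r) for row in img]
    cols = [box_sums_1d([rows[i][j] for i in range(m)], r) for j in range(n)]
    count_x = box_sums_1d([1] * n, r)  # window width at each column
    count_y = box_sums_1d([1] * m, r)  # window height at each row

    # Global mean and standard deviation, integer arithmetic only.
    pixels = [p for row in img for p in row]
    mean = sum(pixels) // len(pixels)
    sigma = math.isqrt(sum((p - mean) ** 2 for p in pixels) // len(pixels))
    offset = sigma * lam_percent // 100

    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            count = count_x[j] * count_y[i]
            # p >= window_sum / count + offset, rewritten without division.
            if img[i][j] * count >= cols[j][i] + offset * count:
                out[i][j] = 1
    return out
```
&lt;/pre&gt;
&lt;p&gt;On a 3×3 patch of 200s with a single 50 in the middle, binarize_additive(img, w=3) marks only the centre pixel black, while a perfectly flat image comes out all white.&lt;/p&gt;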
&lt;p&gt;Anyway, here is the final result of locally adaptive thresholding:&lt;/p&gt;
&lt;p&gt;&lt;img class="alignnone size-full wp-image-153" title="Binarization with local mean filter" src="http://www.bitquill.net/blog/wp-content/uploads/2009/08/skew_bin.png" alt="Binarization with local mean filter" width="512" height="384" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-family: 'Lucida Grande', Tahoma, Arial, sans-serif; font-weight: bold; font-size: 1em"&gt;Conclusion:&lt;/span&gt; In this case we needed the slightly smarter approach, so we invested the time to implement it efficiently. WordSnap currently uses a 21×21 neighborhood.  Altogether, binarization takes under 100ms.&lt;/p&gt;
&lt;h3&gt;Skew&lt;/h3&gt;
&lt;p&gt;Another problem is that the orientation of the text lines may not be aligned with image edges.  This is called skew and makes recognition much harder.&lt;/p&gt;
&lt;p&gt;Initially, I set out to find a way to correct for skew.  After a few searches on Google, I came across the &lt;a title="Hough transform - Wikipedia" href="http://en.wikipedia.org/wiki/Hough_transform"&gt;Hough transform&lt;/a&gt;.  The idea is simple.  Say you want to detect a curve described by a set of parameters. E.g., for a line, those would be the distance ρ of the line from the origin and the angle θ of its normal. For each black pixel, find the parameter values for all possible curves to which this pixel may belong. For a line, that&amp;#8217;s all angles θ from 0 to 180 degrees, and all distances ρ from 0 to sqrt(m&lt;sup&gt;2&lt;/sup&gt;+n&lt;sup&gt;2&lt;/sup&gt;).  Then, compute the density distribution of parameter tuples.  If a line (ρ&lt;sub&gt;0&lt;/sub&gt;,θ&lt;sub&gt;0&lt;/sub&gt;) is present in the image, then the parameter density distribution should have a local maximum at (ρ&lt;sub&gt;0&lt;/sub&gt;,θ&lt;sub&gt;0&lt;/sub&gt;).&lt;/p&gt;
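&lt;p&gt;A bare-bones Python sketch of the voting step, assuming the usual normal parametrization ρ = x·cos θ + y·sin θ with one-degree θ buckets; the function name is hypothetical and this is not the OpenCV-based script mentioned at the end of the post. A dictionary keyed by (ρ, θ) stands in for the bucketized accumulator, which also sidesteps negative ρ values:&lt;/p&gt;
&lt;pre&gt;
```python
import math
from collections import Counter


def hough_peak(black_pixels):
    """Return (rho, theta_degrees, votes) for the most-voted line.

    Each black pixel (x, y) votes once per one-degree angle theta in [0, 180)
    for the line rho = x*cos(theta) + y*sin(theta), with rho rounded to the
    nearest pixel. theta is the angle of the line's normal, so rho can be
    negative; a Counter keyed by (rho, theta) avoids sizing an array for that.
    """
    trig = [(math.cos(math.radians(t)), math.sin(math.radians(t)))
            for t in range(180)]
    votes = Counter()
    for x, y in black_pixels:
        for theta, (c, s) in enumerate(trig):
            votes[(round(x * c + y * s), theta)] += 1
    (rho, theta), count = votes.most_common(1)[0]
    return rho, theta, count
```
&lt;/pre&gt;
&lt;p&gt;Under this parametrization a horizontal row of pixels votes for θ = 90°: hough_peak([(x, 5) for x in range(100)]) returns (5, 90, 100). Every black pixel costs 180 trigonometric votes, which is the kind of expense discussed below.&lt;/p&gt;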
&lt;p&gt;If we apply this approach to our example image, the first maximum is detected at an angle of 20 degrees. Here is the image counter-rotated by that amount:&lt;/p&gt;
&lt;p&gt;&lt;img class="size-full wp-image-150 alignnone" title="Skew correction using Hough transform" src="http://www.bitquill.net/blog/wp-content/uploads/2009/08/skew_bin_rot.png" alt="After rotating by angle detected using Hough transform" width="512" height="384" /&gt;&lt;/p&gt;
&lt;p&gt;Success!  However, computing the Hough transform is too slow!  Typical implementations bucketize the parameter space. This would require a buffer of about 180×580 32-bit integers (for a 480×320 image), or about 410KB. In addition, it would require trigonometric operations or lookups to find the buckets for each pixel, not to mention counter-rotation. There are obvious optimizations one can try, such as computing histograms at multiple resolutions to progressively prune the parameter space.  Still, the cost implied by back-of-the-envelope calculations put me off from even trying to implement this on the phone. Instead, why not just enlist the user:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2009/09/finder.png"&gt;&lt;img class="alignnone size-full wp-image-181" title="Finder alignment guides" src="http://www.bitquill.net/blog/wp-content/uploads/2009/09/finder.png" alt="Finder alignment guides" width="480" height="320" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-family: 'Lucida Grande', Tahoma, Arial, sans-serif; font-weight: bold; font-size: 1em"&gt;Conclusion:&lt;/span&gt; Simple approach with help from user wins, and&lt;strong&gt; the computer doesn&amp;#8217;t even have to do &lt;em&gt;anything&lt;/em&gt; to solve the problem!&lt;/strong&gt; Incidentally, the guideline width is determined by the size of typical newsprint text at the smallest distance that the G1&amp;#8217;s camera can focus.&lt;/p&gt;
&lt;h3&gt;Font size&lt;/h3&gt;
&lt;p&gt;Next, we need to detect individual words.  The approach WordSnap uses is to &lt;a title="Dilation - Mathematical morphology - Wikipedia" href="http://en.wikipedia.org/wiki/Mathematical_morphology#Dilation"&gt;dilate&lt;/a&gt; the binary image with a rectangular structuring element (in the following image, of size 7×7), and then expand a rectangle (shown in green) until it covers the connected component which, presumably, is one word.&lt;/p&gt;
&lt;p&gt;&lt;img class="alignnone size-full wp-image-154" title="Dilation with 7x7 rectangle" src="http://www.bitquill.net/blog/wp-content/uploads/2009/08/skew_bin_rot_dilate3.png" alt="Dilation with 7x7 rectangle" width="512" height="384" /&gt;&lt;/p&gt;
&lt;p&gt;However, the size of the structuring element should really depend on the inter-word spacing, which in turn depends on the typeface as well as the distance of the camera from the text.  For example, if we use a 5×5 element, we would get the following:&lt;/p&gt;
&lt;p&gt;&lt;img class="alignnone size-full wp-image-155" title="Dilation with 5x5 rectangular element" src="http://www.bitquill.net/blog/wp-content/uploads/2009/08/skew_bin_rot_dilate2.png" alt="Dilation with 5x5 rectangular element" width="512" height="384" /&gt;&lt;/p&gt;
&lt;p&gt;I briefly toyed with two ideas for font size detection.  The first is to do a Fourier transform.  Presumably the first spatial frequency mode would correspond to inter-word and/or inter-line spacing and the second mode to inter-character spacing. But that assumes we apply Fourier to a &amp;#8220;large enough&amp;#8221; portion of the image, and things start becoming complicated.  Not to mention computationally expensive.&lt;/p&gt;
&lt;p&gt;The second approach (which also appears to be the most common) is to do hierarchical grouping: first expand rectangles to cover individual letters (or, sometimes, ligatures), then compute a histogram of horizontal distances and re-group into word rectangles, and so on.  This is also non-trivial.&lt;/p&gt;
&lt;p&gt;Instead, WordSnap uses a fixed dilation radius.  The implementation is optimized to allow near-realtime annotation of the detected word extent.  This video should give you an idea:&lt;br /&gt;
&lt;object width="425" height="344" data="http://www.youtube.com/v/GhUOWbOmn6s&amp;amp;hl=en&amp;amp;fs=1&amp;amp;rel=0" type="application/x-shockwave-flash"&gt;&lt;param name="allowFullScreen" value="true" /&gt;&lt;param name="allowscriptaccess" value="always" /&gt;&lt;param name="src" value="http://www.youtube.com/v/GhUOWbOmn6s&amp;amp;hl=en&amp;amp;fs=1&amp;amp;rel=0" /&gt;&lt;param name="allowfullscreen" value="true" /&gt;&lt;/object&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-family: 'Lucida Grande', Tahoma, Arial, sans-serif; font-weight: bold; font-size: 1em"&gt;Conclusion:&lt;/span&gt; Simple wins again, but this time we have to do &lt;em&gt;something&lt;/em&gt; (and let the user help with the rest). But,&lt;strong&gt; instead of trying to be &lt;em&gt;smart&lt;/em&gt; and find the best parameters given the camera position, we try to be &lt;em&gt;fast&lt;/em&gt;: fix the parameters and let the user find the camera position that works given the parameters. &lt;/strong&gt;WordSnap uses a 5×5 rectangular structuring element, although you can change that to 3×3 or 7×7 in the preferenfces screen. Altogether, word extent detection takes about 150-200ms, although it could be significantly optimized, if necessary, by using only JNI only, instead of a mix of pure Java and JNI calls.&lt;/p&gt;
&lt;hr /&gt;&lt;p&gt;I&amp;#8217;m now looking into the possibility of moving OCR into the &amp;#8220;live&amp;#8221; loop: as you move the camera, the phone shows not only the word extent rectangle, but also the recognized word, perhaps as a hyperlink to Google, or along with Google Translate results.  Then I can justifiably use the buzzword of the day, &lt;strong&gt;&amp;#8220;augmented reality&amp;#8221;&lt;/strong&gt;!  It looks like it might just be possible, but let me get back to you in a week or two.  :)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Postscript:&lt;/strong&gt; Some of the papers referenced were pointed out to me by &lt;a title="Hideaki Goto - Homepage" href="http://www.sc.isc.tohoku.ac.jp/~hgot/"&gt;Hideaki Goto&lt;/a&gt;, who started and maintains &lt;a title="WeOCR - Homepage" href="http://weocr.ocrgrid.org/"&gt;WeOCR&lt;/a&gt;. Also, skew detection and correction experiments are based on this &lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2009/09/test_skew.txt"&gt;quick-n-dirty Python script&lt;/a&gt; (needs &lt;a title="OpenCV - Homepage" href="http://opencv.willowgarage.com/wiki/"&gt;OpenCV&lt;/a&gt; and it ain&amp;#8217;t pretty!). &lt;em&gt;Update (9/2):&lt;/em&gt; Fixed really stupid mistake in parametrization of line.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/iObzk_-dbgo" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=119#comments" thr:count="1" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=119" thr:count="1" />
		<thr:total>1</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[My godfather is a Markov model]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=66" />
		<id>http://www.bitquill.net/blog/?p=66</id>
		<updated>2009-05-07T06:31:17Z</updated>
		<published>2009-05-07T06:31:17Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Life bits" /><category scheme="http://www.bitquill.net/blog" term="Web" />		<summary type="html"><![CDATA[Before the memory is completely lost in the dust of time, I&#8217;d like to document how I ended up with this domain name. It all started last summer, when I decided to start a personal site. Of course, both my first and last names were already taken, even in TLDs I&#8217;d never heard of before. [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=66">&lt;p&gt;Before the memory is completely lost in the dust of time, I&amp;#8217;d like to document how I ended up with this domain name. It all started last summer, when I decided to start a personal site. Of course, both my first and last names were already taken, even in TLDs I&amp;#8217;d never heard of before.  But using my name would have been &lt;em&gt;too&lt;/em&gt; easy anyway.  Challenge is good.&lt;/p&gt;
&lt;p&gt;Politically correct and totally un-sarcastic as I am, I originally wanted to go with some combination of &amp;#8220;principled anarchy&amp;#8221;.  Now, &lt;em&gt;that&lt;/em&gt; was available! Apparently, nobody wanted to touch it with a ten-foot pole, not even cybersquatters, which kind of gave me a hint.  Wouldn&amp;#8217;t want to, say, end up on a three-letter-agency watchlist, at least not while in the US on an H-1B.  They might not share my sense of humor.&lt;/p&gt;
&lt;p&gt;So, armed with online thesauri, dictionaries, the &lt;a title="Internet Anagram Server / I, Rearrangement Servant" href="http://wordsmith.org/anagram/"&gt;internet anagram server&lt;/a&gt;, and things like that, I set out on a name quest.  I don&amp;#8217;t remember anymore what I tried; &amp;#8220;coredump&amp;#8221; (which, in case you didn&amp;#8217;t know, has &amp;#8220;code rump&amp;#8221; as an anagram—still available, if you&amp;#8217;re interested), &amp;#8220;segfault&amp;#8221;, &amp;#8220;brainfart&amp;#8221;, &amp;#8220;farout&amp;#8221;, and pretty much anything else I could think of: all taken.   Even &lt;a title="Top 10 Worst Domain Names - Dreamhost Unofficial Blog" href="http://blog.dreamhosters.com/2006/07/26/top-10-worst-domain-names/"&gt;these names&lt;/a&gt; as well as &lt;a title="20 More Unfortunate Domain Names - Dreamhost Unofficial Blog" href="http://blog.dreamhosters.com/2007/01/26/20-more-unfortunate-domain-names/"&gt;these&lt;/a&gt; are taken (thank god!).&lt;/p&gt;
&lt;p&gt;At some point I was naïve enough to hope that a Tolkien name would be free.  No luck of course, anything semi-pronounceable was taken.  You&amp;#8217;d have to go as far as, say, &amp;#8220;gulduin&amp;#8221; (which, by the way, means &amp;#8220;magic river&amp;#8221; in Elvish) to find something available. Good luck getting people to remember &lt;em&gt;that&lt;/em&gt;!  Oh well, at least I had a reason to actually read some of the Silmarillion; if you&amp;#8217;ve tried this and you&amp;#8217;re not a religiously devoted Tolkien fan, you know what I&amp;#8217;m talking about.&lt;/p&gt;
&lt;p&gt;After the first week of searching, I think I even got temporarily banned from Yahoo! whois search. In desperation, I finally turned to one of many domain name generators.  I asked omniscient Google to give me one and, as always, &lt;a title="Domain Name Generator and Search - MakeWords.com" href="http://www.makewords.com/"&gt;it obliged&lt;/a&gt;.  By now I had decided that I wanted a name as free of any connotations as possible (say, like Google or Slashdot, not like Facebook or YouTube).  I went through things like &amp;#8220;fractors&amp;#8221;, &amp;#8220;naphead&amp;#8221;, &amp;#8220;magnarchy&amp;#8221;, &amp;#8220;aniarchy&amp;#8221;, &amp;#8220;mallock&amp;#8221;, &amp;#8220;hexndex&amp;#8221;, &amp;#8220;squilt&amp;#8221;, &amp;#8220;terable&amp;#8221;, and so on. It&amp;#8217;s amazing how several weeks of searching in frustration temper one&amp;#8217;s standards of quality. Anyway, one day &amp;#8220;bitquill&amp;#8221; popped up: neutral, inoffensive, bland, unusual, and a composite which is short and almost pronounceable!  I couldn&amp;#8217;t ask for much more, so I registered it.  &lt;/p&gt;
&lt;p&gt;That, and &amp;#8220;clusterhack&amp;#8221;.  Sorry.  I couldn&amp;#8217;t resist.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/6xMKLp5_S7w" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=66#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=66" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[Hello Android]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=65" />
		<id>http://www.bitquill.net/blog/?p=65</id>
		<updated>2009-05-09T23:17:21Z</updated>
		<published>2009-05-06T16:20:05Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Sci &amp; Tech" /><category scheme="http://www.bitquill.net/blog" term="Android" /><category scheme="http://www.bitquill.net/blog" term="Development" />		<summary type="html"><![CDATA[After blabbering about Android, I decided to get my hands a little dirty and actually write some code. For various reasons, I won&#8217;t describe the app (it was a &#8220;weekend hack&#8221; anyway), but hopefully my first impressions will be clear even without a specific context.
Overall, the Android APIs are quite impressive, even though some edges [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=65">&lt;p&gt;After blabbering about Android, I decided to get my hands a little dirty and actually write some code. For various reasons, I won&amp;#8217;t describe the app (it was a &amp;#8220;weekend hack&amp;#8221; anyway), but hopefully my first impressions will be clear even without a specific context.&lt;/p&gt;
&lt;p&gt;Overall, the Android APIs are quite impressive, even though some edges are still rough.  It was reasonably easy to get up to speed, even though my prior experience with mobile application frameworks was zero.  The toughest part was getting used to the heavily event-based programming style, as well as the idea that your code may be interrupted, killed and restarted at any time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Activity lifecycle.&lt;/strong&gt; Although Android supports multitasking and concurrency, on a mobile device with limited memory and no swap it&amp;#8217;s likely that the O/S will have to kill some or all of your tasks to reclaim resources needed by higher-priority, user-visible processes (e.g., an incoming phone call).  If you have non-persistent or external state, such as open database connections or separate threads that fetch data in the background, things may get a little tricky. Although Android has auxiliary features such as managed cursors and dialogs, you still need to know they exist and use them properly.&lt;/p&gt;
&lt;p&gt;However, even things like &lt;a title="Configuration Changes - android.app.Activity Javadoc" href="http://code.google.com/android/reference/android/app/Activity.html#ConfigurationChanges"&gt;screen orientation changes&lt;/a&gt; are handled by terminating and restarting any affected activities. At first, while spending a couple of hours figuring out why my app was crashing when I opened the keyboard, I bitched about this. Apparently, I wasn&amp;#8217;t the only one who was confused. To my surprise, I found that many Android Market apps crash when the screen is rotated.  Some Market apps even come with grave-sounding warnings that, e.g., &amp;#8220;the life counter [sic] resets on screen orientation change =/ Will fix for new version.&amp;#8221; Luckily, I also found numerous good posts about orientation changes, such as &lt;a title="Rotational Forces, Part Two - Mark Murphy on AndroidGuys blog" href="http://androidguys.com/?p=2642"&gt;this&lt;/a&gt; or &lt;a title="Rotational Forces, Part Three - Mark Murphy on AndroidGuys blog" href="http://androidguys.com/?p=2723"&gt;this&lt;/a&gt; (the &lt;a title="Mark Murphy on AndroidGuys" href="http://androidguys.com/?author=20"&gt;series by Mark Murphy&lt;/a&gt; is pretty good, by the way), as well as a &lt;a title="Faster screen orientation changes - Android Developers Blog" href="http://android-developers.blogspot.com/2009/02/faster-screen-orientation-change.html"&gt;post on the official blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In retrospect, handling orientation changes in this way is a good thing: it forces app developers to be prepared. After I fixed my code to handle orientation changes gracefully, I found that I was also ready to handle other sources of interruption properly: when a call came in while I was testing my app, everything worked out beautifully.&lt;/p&gt;
&lt;p&gt;Now, whenever I download an app, I perform the following test: I flip the keyboard open while the app is running a background operation, even if I don&amp;#8217;t need to type anything. If the app crashes or gets into an inconsistent state (something that happens surprisingly often), that&amp;#8217;s a strong indication that the code is not very robust.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Event handling.&lt;/strong&gt; For APIs that are so heavily event-based, one of my gripes was that some (but not all) event handlers are based on inheritance rather than &lt;a title="Delegation pattern - Wikipedia" href="http://en.wikipedia.org/wiki/Delegation_pattern"&gt;delegation&lt;/a&gt;. These design choices are probably due to &lt;a title="Prefer virtual over interface - Designing for Performance - Android documentation" href="http://developer.android.com/guide/practices/design/performance.html#prefer_virtual"&gt;performance reasons&lt;/a&gt; that may be specific to Dalvik, the Android VM, which was itself motivated partly by &lt;a title="Dalvik: how Google routed around Sun’s IP-based licensing restrictions on Java ME - Stefano's Linotype" href="http://www.betaversion.org/~stefano/linotype/news/110/"&gt;non-technical reasons&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, inheritance sometimes complicates things. For example, Android supports managed cursors and dialogs via methods in the base Activity class. On more than one occasion I found that managed threads would also be nice. Implementing them requires hooking into the activity lifecycle events (and has, on occasion, been &lt;a title="A Simple [sic] Android App and a Threading Bug - OCIWeb" href="http://www.ociweb.com/jnb/jnbJan2009.html"&gt;over-engineered to death&lt;/a&gt;). Because there are several Activity subclasses (e.g., ListActivity, PreferenceActivity, etc.), there is no simple way to extend them all at once. If lifecycle events were handled via delegates, it would be possible to implement a background-thread manager as, say, an activity &lt;a title="Decorator pattern - Wikipedia" href="http://en.wikipedia.org/wiki/Decorator_pattern"&gt;decorator&lt;/a&gt; that can be added to any activity instance.&lt;/p&gt;
&lt;p&gt;The delegation-based event model was introduced in Java 1.1 precisely &lt;a title="Java AWT: Delegation Event Model - Sun.com" href="http://java.sun.com/j2se/1.3/docs/guide/awt/designspec/events.html"&gt;to address such shortcomings&lt;/a&gt; of the inheritance-based model. But, given the performance constraints of current mobile devices, I probably shouldn&amp;#8217;t complain too much. Still, some API design choices seem a bit arbitrary, perhaps even Microsoft-esque: why would performance be an issue for lifecycle events (which are presumably rare, yet their handlers use inheritance) but not for click events (which are presumably more frequent, yet their handlers use delegation)?&lt;/p&gt;
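&lt;p&gt;To make the delegate-based alternative concrete, here is a minimal, language-neutral sketch, written in Python for brevity rather than Java. &lt;tt&gt;LifecycleDispatcher&lt;/tt&gt; is a hypothetical stand-in for an activity that forwards lifecycle events to registered delegates. &lt;tt&gt;ManagedThreads&lt;/tt&gt; is an equally hypothetical reusable delegate that stops its workers when the activity is destroyed. Neither is part of the Android API.&lt;/p&gt;
&lt;pre&gt;
```python
import threading


class LifecycleDispatcher:
    """Hypothetical stand-in for an activity that forwards lifecycle
    events to registered delegates instead of relying on subclassing."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def dispatch(self, event):
        # For event "destroy", call listener.on_destroy() on every
        # delegate that defines it; other delegates are skipped.
        for listener in self._listeners:
            handler = getattr(listener, "on_" + event, None)
            if handler is not None:
                handler()


class ManagedThreads:
    """Hypothetical reusable delegate: background workers that are
    stopped automatically when the host activity is destroyed."""

    def __init__(self):
        self._workers = []

    def start(self, target):
        # The worker receives an Event it should poll to know when to quit.
        stop = threading.Event()
        worker = threading.Thread(target=target, args=(stop,), daemon=True)
        self._workers.append((worker, stop))
        worker.start()
        return worker

    def on_destroy(self):
        # Signal every worker, then wait briefly for each one to exit.
        for worker, stop in self._workers:
            stop.set()
            worker.join(timeout=1.0)
        self._workers.clear()


def poll(stop):
    # Example background task: keep working until asked to stop.
    while not stop.is_set():
        stop.wait(0.01)


activity = LifecycleDispatcher()
threads = ManagedThreads()
activity.add_listener(threads)
worker = threads.start(poll)
activity.dispatch("destroy")  # the delegate stops the worker
```
&lt;/pre&gt;
&lt;p&gt;Because the thread manager is just another listener, in such a design the same object could be attached to a ListActivity or a PreferenceActivity alike, with no need for a common subclass.&lt;/p&gt;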
&lt;p&gt;&lt;strong&gt;Data sync and caching.&lt;/strong&gt; Another gripe was the lack of &lt;a title="android.content.AbstractSyncableContentProvider - Git" href="http://android.git.kernel.org/?p=platform/frameworks/base.git;a=blob;f=core/java/android/content/AbstractSyncableContentProvider.java"&gt;syncable content providers&lt;/a&gt;, something I&amp;#8217;ve mentioned before. Also, content providers aren&amp;#8217;t really appropriate for network-hosted data. The requirement that content providers use an integer primary key (row ID) is reasonable for local databases and simplifies the APIs, but requires some book-keeping when that&amp;#8217;s not the &amp;#8220;natural&amp;#8221; primary key.&lt;/p&gt;
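&lt;p&gt;The book-keeping can be as simple as a two-way mapping between natural keys and synthetic integer IDs. The sketch below is in Python for brevity, and &lt;tt&gt;RowIdMap&lt;/tt&gt; is a hypothetical name. It keeps the mapping in memory. In a real provider, the mapping would more likely live in a SQLite table with an integer &lt;tt&gt;_id&lt;/tt&gt; column (the name Android&amp;#8217;s cursor adapters expect) and a unique column for the natural key.&lt;/p&gt;
&lt;pre&gt;
```python
class RowIdMap:
    """Hypothetical helper that assigns stable integer row IDs to records
    whose natural key is something else, such as a URL."""

    def __init__(self):
        self._next_id = 1
        self._id_by_key = {}
        self._key_by_id = {}

    def row_id(self, natural_key):
        # Reuse the existing ID for a known key, or assign the next integer.
        if natural_key not in self._id_by_key:
            self._id_by_key[natural_key] = self._next_id
            self._key_by_id[self._next_id] = natural_key
            self._next_id += 1
        return self._id_by_key[natural_key]

    def natural_key(self, row_id):
        # Translate a row ID handed back by the UI into the remote key.
        return self._key_by_id[row_id]


ids = RowIdMap()
first = ids.row_id("http://example.com/items/abc")
second = ids.row_id("http://example.com/items/xyz")
```
&lt;/pre&gt;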
&lt;p&gt;Ideally, I&amp;#8217;d like to see some support for caching remote data on the SD card (which would require gracefully handling card removal, and transparently fetching data either from the cache or the network). Although the core APIs provide all that is necessary to implement this from scratch, it was getting too complicated for my simple &amp;#8220;weekend hack&amp;#8221; app, so I decided to drop it.&lt;/p&gt;
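&lt;p&gt;For what it&amp;#8217;s worth, the core of such a cache is a read-through lookup with a fallback to the network. The rough Python sketch below makes two assumptions. First, network access goes through a hypothetical &lt;tt&gt;fetch(url)&lt;/tt&gt; function. Second, a missing cache directory means the card has been removed. It ignores eviction, expiry and concurrency, which is where most of the real complexity lies.&lt;/p&gt;
&lt;pre&gt;
```python
import hashlib
import os


class CachingFetcher:
    """Read-through cache sketch: serve from the cache directory when it
    is available, otherwise go to the network via a hypothetical
    fetch(url) function that returns bytes."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir  # e.g. a directory on the SD card
        self.fetch = fetch

    def _path(self, url):
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, name)

    def get(self, url):
        # A removed card simply shows up as a missing directory.
        if not os.path.isdir(self.cache_dir):
            return self.fetch(url)
        path = self._path(url)
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError:
            pass  # cache miss, or the card went away just now
        data = self.fetch(url)
        tmp = path + ".tmp"
        try:
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, path)  # never leave a half-written entry
        except OSError:
            pass  # caching is best-effort; the data is still returned
        return data
```
&lt;/pre&gt;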
&lt;p&gt;I hope that, in the near future, porting web apps to mobile devices will become easier with the support for &lt;a title="Offline web applications - HTML5 (Working Draft) - W3C" href="http://www.w3.org/TR/html5/offline.html"&gt;offline applications&lt;/a&gt; and &lt;a title="Structured client-side storage - HTML5 (Working draft) - W3C" href="http://www.w3.org/TR/html5/structured.html"&gt;client-side storage&lt;/a&gt; in HTML5, as well as the proposed &lt;a title="Geolocation API specification (Editor's draft) - W3C" href="http://dev.w3.org/geo/api/spec-source.html"&gt;geolocation APIs&lt;/a&gt; (all of which are already &lt;a title="Gears as a bleeding-edge HTML 5 implementation" href="http://almaer.com/blog/gears-as-a-bleeding-edge-html-5-implementation"&gt;part of Google Gears&lt;/a&gt;). An application manifest might include &amp;#8220;web activities&amp;#8221; that translate intents into HTTP POST requests, while granting device access permissions to those activities (see, e.g., promising hacks such as &lt;a title="OilCan: GreaseMonkey on steroids for Android" href="http://www.jsharkey.org/blog/2008/12/15/oilcan-greasemonkey-on-steroids-for-android/"&gt;OilCan&lt;/a&gt;). Porting might then involve little more than writing a new stylesheet. Perhaps that&amp;#8217;s where Palm is going with its &lt;a title="Palm WebOS Developer Site (retrieved 2/24)" href="http://developer.palm.com/"&gt;WebOS&lt;/a&gt;, which apparently supports both &amp;#8220;native application&amp;#8221; and &amp;#8220;web application&amp;#8221; models, but information is rather thin at the moment.&lt;/p&gt;
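&lt;p&gt;As a rough illustration of what translating intents into HTTP POST requests could mean, the Python sketch below encodes an intent&amp;#8217;s action and extras as a form-encoded POST body. The function name, the endpoint URL and the field layout are all hypothetical. This is one conceivable mapping, not an existing Android feature, and permissions are not modeled.&lt;/p&gt;
&lt;pre&gt;
```python
import urllib.parse
import urllib.request


def intent_to_post(endpoint, action, extras):
    """Hypothetical mapping of an intent (an action string plus key/value
    extras) onto an HTTP POST request aimed at a web activity."""
    fields = dict(extras)
    fields["action"] = action
    # Sort the fields so that the encoded body is deterministic.
    body = urllib.parse.urlencode(sorted(fields.items())).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = intent_to_post(
    "http://example.com/share",
    "android.intent.action.SEND",
    {"android.intent.extra.TEXT": "hello world"},
)
```
&lt;/pre&gt;
&lt;p&gt;Passing the resulting request to &lt;tt&gt;urllib.request.urlopen&lt;/tt&gt; would actually send it. The sketch stops short of that.&lt;/p&gt;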
&lt;p&gt;&lt;strong&gt;Epilogue.&lt;/strong&gt; My first Android app was an interesting learning experience, and not only from a technical standpoint (perhaps more on this in another post). I also found that Android is quite stable, even though I sometimes used my phone for live debugging, forcefully killing threads and processes through ADB. Let me put it this way: if it weren&amp;#8217;t for the RC33 OTA update, my phone would now have an uptime of a few months. For a piece of software that barely existed a year ago, this is impressive.&lt;/p&gt;
&lt;p&gt;There is plenty of documentation available, but at times it can take some searching to find the necessary information.  However, since Android is open-source, it&amp;#8217;s always possible to consult the source code itself (which is fairly well-written and documented).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note:&lt;/strong&gt; This post was mostly written sometime around February. Since then I haven&amp;#8217;t had time to try SDK v1.5, but I believe most of the points above are still relevant.&lt;/em&gt;&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/ZrqRCgKr41g" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=65#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=65" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[Back again&#8230;]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=71" />
		<id>http://www.bitquill.net/blog/?p=71</id>
		<updated>2009-04-07T19:40:59Z</updated>
		<published>2009-04-07T19:40:59Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Life bits" /><category scheme="http://www.bitquill.net/blog" term="Pointless" />		<summary type="html"><![CDATA[
After coming back from Seoul, New York seemed even dinkier than the last time I returned from a trip. As I was boarding the plane at Incheon, I picked up a copy of the Wall Street Journal (Asian edition). I had enough time to read almost all of it, as KAL arrived into Narita early, but Continental [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=71">&lt;p&gt;&lt;img class="size-full wp-image-73 alignnone" title="Seoul - Collage" src="http://www.bitquill.net/blog/wp-content/uploads/2009/04/korea-collage.jpg" alt="Seoul" width="630" height="340" /&gt;&lt;/p&gt;
&lt;p&gt;After coming back from Seoul, New York seemed even dinkier than &lt;a title="A dog's life - bitquill.net" href="http://www.bitquill.net/blog/?p=45"&gt;the last time I returned from a trip&lt;/a&gt;. As I was boarding the plane at Incheon, I picked up a copy of the Wall Street Journal (Asian edition). I had enough time to read almost all of it, as my KAL flight arrived at Narita early but the Continental connection was six hours late. It might as well have been called &amp;#8220;The GM Journal&amp;#8221;, since about two thirds of the stories were about GM and Chrysler, and how the US government was trying to save them from doom brought on by chronic mismanagement and exorbitant legacy costs.&lt;/p&gt;
&lt;p&gt;My wife, who has a far more sensitive nose than I do, jokes that the first thing you smell upon leaving the plane is cigarette smoke in Greece and garlic in Korea. Upon arriving at Newark (or any NYC airport, for that matter), even I can smell the mouldy carpets. When I got on the subway the next morning, the smell was even worse, and there were signs of age everywhere. I sat down right across from a poster ad by the NYC Department of Consumer Affairs that read &amp;#8220;Debt Stress? You&amp;#8217;re not alone&amp;#8221;. Someone had plastered a makeshift sticker on top, reading &amp;#8220;Kill Your Boss&amp;#8221;. After a ride on Metro North, I got into a taxi to work. It was one of those Ford relics, with a severely dented right side, a cracked windshield and a barely functioning transmission, but still street-legal. As the cab ended up triple-booked and I was the last one to get off, I got a 35-minute scenic tour of backstreets and pothole-riddled roads before finally arriving at the office.&lt;/p&gt;
&lt;p&gt;The experience was enough to make me look up the definition of &amp;#8220;&lt;a title="Developing country - Wikipedia" href="http://en.wikipedia.org/wiki/Developing_country"&gt;developing country&lt;/a&gt;&amp;#8221; on Wikipedia. Honestly, I don&amp;#8217;t get why South Korea is sometimes still listed as such (e.g., in the WSJ and, if memory serves me right, in the Economist), while the US isn&amp;#8217;t. Something tells me it&amp;#8217;s more than GM that needs patching up. Anyway, welcome back home!&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/tCTq3cqECWQ" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=71#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=71" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[&#8220;Life is pointless&#8221; ??]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=67" />
		<id>http://www.bitquill.net/blog/?p=67</id>
		<updated>2009-03-20T20:22:15Z</updated>
		<published>2009-03-20T20:22:15Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Life bits" /><category scheme="http://www.bitquill.net/blog" term="Pointless" />		<summary type="html"><![CDATA[That&#8217;s what you get when you use colorful tags like &#8220;life bits&#8221; and &#8220;pointless&#8221;, especially if you use them together:  Google thinks your website is highly relevant to the query &#8220;life is pointless&#8221;.

I&#8217;m now experimenting with a separate tumblelog to post most random thoughts but, in the meantime, here you go Google: one more post [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=67">&lt;p&gt;That&amp;#8217;s what you get when you use colorful tags like &amp;#8220;&lt;a title="Posts tagged " href="http://www.bitquill.net/blog/?tag=life-bits"&gt;life bits&lt;/a&gt;&amp;#8221; and &amp;#8220;&lt;a title="Posts tagged " href="http://www.bitquill.net/blog/?tag=pointless"&gt;pointless&lt;/a&gt;&amp;#8220;, especially if you use them together:  Google thinks your website is highly relevant to the query &amp;#8220;life is pointless&amp;#8221;.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2009/03/google-webmaster-feb09.jpg"&gt;&lt;img class="size-full wp-image-70" title="Google Webmaster Tools" src="http://www.bitquill.net/blog/wp-content/uploads/2009/03/google-webmaster-feb09.jpg" alt="Google Webmaster Tools for bitquill.net, February 2009" width="640" height="511" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;#8217;m now experimenting with a &lt;a title="Tumblr for bitquill.net" href="http://tumblr.bitquill.net/"&gt;separate tumblelog&lt;/a&gt; to post most random thoughts but, in the meantime, here you go Google: one more post about &amp;#8220;pointless life (bits)&amp;#8221;!&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/YYC3PSvLGxI" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=67#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=67" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[Revised thoughts on Android]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=63" />
		<id>http://www.bitquill.net/blog/?p=63</id>
		<updated>2009-01-30T15:24:33Z</updated>
		<published>2008-12-01T15:17:49Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Sci &amp; Tech" /><category scheme="http://www.bitquill.net/blog" term="Android" /><category scheme="http://www.bitquill.net/blog" term="Mobile devices" /><category scheme="http://www.bitquill.net/blog" term="Opinion" /><category scheme="http://www.bitquill.net/blog" term="Technology" />		<summary type="html"><![CDATA[The post I wrote a few days ago about Android is all over the place. The right elements are in that post, but my composition and conclusions are somewhat incoherent. Perhaps I have been partly infected by the conventional thinking (of, e.g., various older, big corporations) and missed the obvious.
First, in a networked environment, it [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=63">&lt;p&gt;The &lt;a title="First thoughts on Android" href="http://www.bitquill.net/blog/?p=57"&gt;post&lt;/a&gt; I wrote a few days ago about Android is all over the place. The right elements are in that post, but my composition and conclusions are somewhat incoherent. Perhaps I have been partly infected by the conventional thinking (of, e.g., various older, big corporations) and missed the obvious.&lt;/p&gt;
&lt;p&gt;First, in a networked environment, it is common &lt;em&gt;standards&lt;/em&gt;, rather than a single common software platform, that enable information sharing. So, &lt;strong&gt;Google may be doing Android for precisely the &lt;em&gt;opposite&lt;/em&gt; reason from the one I originally suggested&lt;/strong&gt;: to avoid the emergence of a single, dominant, proprietary platform. Chrome may exist for a similar reason. After all, Android serves a purpose similar to that of a browser, but for mobile devices with various sensing modalities.&lt;/p&gt;
&lt;p&gt;Second, mobile is arguably an important area, and Google probably wants to &lt;strong&gt;encourage diversity and experimentation which, as I wrote in a &lt;a title="The long tail of ideas" href="http://www.bitquill.net/blog/?p=14"&gt;previous post&lt;/a&gt;, is a prerequisite for innovation.&lt;/strong&gt; This is in contrast to the established mentality summarized by the quote I mentioned previously: to &amp;#8220;find an idea and ask yourself: is the potential market worth at least one billion dollars? If not, then walk away.&amp;#8221; In fairness, this approach is appropriate for preserving the status quo. (By the way, in the same public speech, the person who gave this advice also responded to a question about competition by saying, with commendable directness, &amp;#8220;Look: we&amp;#8217;ll all be dead some day.  But there&amp;#8217;s a lot of money to be made until then.&amp;#8221;) But for innovation of any kind, one should &amp;#8220;ask &amp;#8216;why not?&amp;#8217; instead of &amp;#8216;why should we do it?&amp;#8217;&amp;#8221; as &lt;a title="Amazon’s Jeff Bezos on strategy &amp;amp; innovation (not Kindle-related!) - Tech IT Easy" href="http://techiteasy.org/2007/11/20/amazons-jeff-bezos-on-strategy-innovation-not-kindle-related/"&gt;Jeff Bezos said&lt;/a&gt;, or &amp;#8220;innovate toward the light, not against the darkness&amp;#8221; as &lt;a title="Ray Ozzie Wants to Push Microsoft Back Into Startup Mode - Wired" href="http://www.wired.com/techbiz/people/magazine/16-12/ff_ozzie?currentPage=all"&gt;Ray Ozzie said&lt;/a&gt;.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/z3OItGAp4-w" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=63#comments" thr:count="2" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=63" thr:count="2" />
		<thr:total>2</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[On data ownership in a networked world]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=7" />
		<id>http://www.bitquill.net/blog/?p=7</id>
		<updated>2008-11-22T17:20:59Z</updated>
		<published>2008-11-21T01:02:45Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Sci &amp; Tech" /><category scheme="http://www.bitquill.net/blog" term="Accountability" /><category scheme="http://www.bitquill.net/blog" term="Authentication" /><category scheme="http://www.bitquill.net/blog" term="Data ownership" /><category scheme="http://www.bitquill.net/blog" term="Data portability" /><category scheme="http://www.bitquill.net/blog" term="Opinion" /><category scheme="http://www.bitquill.net/blog" term="Privacy" /><category scheme="http://www.bitquill.net/blog" term="Provenance" /><category scheme="http://www.bitquill.net/blog" term="Regulation" /><category scheme="http://www.bitquill.net/blog" term="Transparency" /><category scheme="http://www.bitquill.net/blog" term="Web" />		<summary type="html"><![CDATA[Every piece of content has a creator and owner (in this post, I will assume they are by default the same entity).  I do not mean ownership in the traditional sense of, e.g., stashing a piece of paper in a drawer, but in the metaphysical sense that each artifact is forever associated with one [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=7">&lt;p&gt;Every piece of content has a creator and owner (in this post, I will assume they are by default the same entity).  I do not mean ownership in the traditional sense of, e.g., stashing a piece of paper in a drawer, but in the metaphysical sense that each artifact is forever associated with one or more &amp;#8220;creators.&amp;#8221;&lt;/p&gt;
&lt;p&gt;This is certainly true of the end-products of intellectual labor, such as the article you are reading. However, it is also true of more mundane things, such as checkbook register entries or credit card activity. Whenever you pay a bill or purchase an item, you implicitly &amp;#8220;create&amp;#8221; a piece of content: the associated entry in your statement. This has two immediately identifiable &amp;#8220;creators&amp;#8221;: the payer (you) and the payee. The same is true for, e.g., your email, your IM chats, your web searches, etc. Interesting tidbit: &lt;a title="Udi Manber - Search is a Hard Problem" href="http://www.readwriteweb.com/archives/udi_manber_search_is_a_hard_problem.php"&gt;over 20% of search terms entered daily in Google are new&lt;/a&gt;, which would imply roughly 20 million new pieces of content per day, or over 7 billion per year (more than the earth&amp;#8217;s entire population), all from just one activity on one website.&lt;/p&gt;
&lt;p&gt;When I spend a few weeks working on, say, a research paper, I have certain expectations and demands about my rights as a &amp;#8220;creator.&amp;#8221; However, I give almost no thought to my rights over the trail of droppings (digital or otherwise) that I &amp;#8220;create&amp;#8221; each day by searching the web, filling up the gas tank, getting coffee, going through a toll booth, swiping my badge, and so on. &lt;strong&gt;Yet, with the increasing ease of data collection and distribution in digital form, we should rethink our attitudes towards &amp;#8220;authorship&amp;#8221;.&lt;/strong&gt;&lt;/p&gt;
&lt;h4&gt;Unique identity&lt;/h4&gt;
&lt;p&gt;People call me &amp;#8220;Spiros&amp;#8221;, my identity documents list me as &amp;#8220;Spyridon Papadimitriou&amp;#8221; and on most online sites I&amp;#8217;m registered as &lt;tt&gt;spapadim&lt;/tt&gt;. However, sometimes I&amp;#8217;m &lt;tt&gt;s_papadim&lt;/tt&gt; or &lt;tt&gt;spiros_papadimitriou&lt;/tt&gt;, and so on. Like most people, I lost track of all my accounts long ago. Conversely, I&amp;#8217;m not the only &amp;#8220;Spiros Papadimitriou&amp;#8221; in the real world. For example, I occasionally get confused with my cousin, and receive comments about my interesting architectural designs! Nor am I the only &lt;tt&gt;spapadim&lt;/tt&gt; on the net.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What is missing is a framework, with supporting mechanisms, that allows (but does not enforce) asserting and verifying which of those labels (i.e., names, userids, etc.) refer to the same entity (i.e., me).&lt;/strong&gt; Yet this is a prerequisite: how can we talk about data ownership and tackle portability, transparency and accountability if we have to jump through countless hoops just to prove identity?&lt;/p&gt;
&lt;p&gt;Some people, especially in the US, may object to, or even outright panic at, the thought of such a global identifier. In Greece, and in much of Europe, we&amp;#8217;ve had national identity cards for decades. This is fine, as long as you know they exist and what their permissible uses are; in other words, as long as transparency is ensured. Furthermore, the &lt;em&gt;illusion&lt;/em&gt; of privacy should not be confused with privacy itself. If in doubt, I suggest reading &amp;#8220;&lt;a title="Database Nation (Google Books, no preview)" href="http://books.google.com/books?id=T_7TGAAACAAJ"&gt;Database Nation&lt;/a&gt;&amp;#8221; (&lt;a title="Database Nation (official site)" href="http://www.databasenation.com/"&gt;official site&lt;/a&gt;). Its examples are largely US-centric, but the lessons are not.&lt;/p&gt;
&lt;p&gt;&lt;a title="OpenID website" href="http://openid.net/"&gt;OpenID&lt;/a&gt; (despite &lt;a title="OpenID Phishing Brainstorm (OpenID wiki)" href="http://wiki.openid.net/OpenID_Phishing_Brainstorm"&gt;some shortcomings&lt;/a&gt;) and &lt;a title="OAuth website" href="http://oauth.net/"&gt;OAuth&lt;/a&gt; are emerging as open standards for authentication and authorization. OpenID allows reuse of authentication credentials from one site on others: I can reuse, say, my Google username and password to log in to other sites (e.g., to leave a comment on this blog), without having to create yet another account from scratch. OAuth resembles Kerberos&amp;#8217;s ticket-granting service, but for the web: it permits other web services to ask for access to a subset of personal information. I could allow Facebook to access only my Google address book and not, potentially, all of my data on every Google service. OpenID and OAuth can, at least in principle, work together.&lt;/p&gt;
&lt;p&gt;Both high-profile individual developers and major companies are involved in these efforts. For example, Yahoo! &lt;a title="OpenID - Yahoo!" href="http://openid.yahoo.com/"&gt;already supports OpenID&lt;/a&gt; and &lt;a title="OAuth - Yahoo! Developer" href="http://developer.yahoo.com/oauth/"&gt;plans to support OAuth&lt;/a&gt; as well, while Google &lt;a title="OAuth - Google Accounts API" href="http://code.google.com/apis/accounts/docs/OAuth.html"&gt;supports OAuth&lt;/a&gt; directly and OpenID indirectly in &lt;a title="OpenID - Google AppSpot" href="http://openid-provider.appspot.com/"&gt;various&lt;/a&gt; &lt;a title="Google Offers OpenID Logins Via Blogger - TechCrunch" href="http://www.techcrunch.com/2008/01/18/google-offers-openid-logins-via-blogger/"&gt;ways&lt;/a&gt;. Wide adoption of these standards would be a major step forward for data portability and web interoperability. However, I suspect they fall slightly short of providing a truly permanent and global personal identity. What if, for any reason, my Yahoo! account disappears, either because I decide to shut it down or because Yahoo! goes bust?&lt;/p&gt;
&lt;p&gt;I was going to suggest a DNS-based solution, and I was surprised to find that the generic top-level domain &lt;a title=".name - Wikipedia" href="http://en.wikipedia.org/wiki/.name"&gt;&lt;tt&gt;.name&lt;/tt&gt;&lt;/a&gt; has existed since 2001 to provide URIs for personal identities. You can register for a free three-month trial on &lt;a title="FreeYourID website" href="http://www.freeyourid.com/"&gt;FreeYourID&lt;/a&gt; (after that, it&amp;#8217;s $11/year). What&amp;#8217;s more, their service already provides OpenID authentication. In principle, this should allow easy switching of authentication and authorization service providers. Just as I can keep the &amp;#8220;label&amp;#8221; for this site even if I move to a different web host, I can keep my personal &amp;#8220;label&amp;#8221; no matter who I choose to manage my personal information. So, now my universal username is &lt;tt&gt;spiros.papadimitriou.name&lt;/tt&gt;, any emails sent to &lt;tt&gt;spiros@papadimitriou.name&lt;/tt&gt; will find their way to me, you can call me on Skype using &lt;tt&gt;spiros.papadimitriou.name/call&lt;/tt&gt;, and so on.&lt;/p&gt;
&lt;p&gt;With such a unique identity tied to authorization and authentication services, the &lt;a title="Giant Global Graph - Tim Berners Lee on DIG" href="http://dig.csail.mit.edu/breadcrumbs/node/215"&gt;Giant Global Graph&lt;/a&gt; and &lt;a title="URLs are People, Too" href="http://google-code-updates.blogspot.com/2008/02/urls-are-people-too.html"&gt;its materializations&lt;/a&gt; would be one step closer to becoming really useful. If I want to use my identity to log and control access to my data, I should be able to prove my claims. Currently, FOAF and XFN allow assertion of relationships but provide no way to verify them.&lt;/p&gt;
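&lt;p&gt;One way to make such assertions verifiable is to have the asserting identity sign them, with the public key published under the identity URI. The Python sketch below uses textbook RSA with a tiny key built from two well-known Mersenne primes, purely to show the sign/verify round trip. It is insecure, and a real system would use a vetted cryptographic library and a standard signature scheme. The claim string and the second identity are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
import hashlib

# Toy textbook-RSA key from two known Mersenne primes.
# Insecure (tiny, unpadded, publicly known factors): illustration only.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(claim, n=N):
    # Hash the claim and reduce the hash modulo n so that it can be signed.
    h = hashlib.sha256(claim.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % n


def sign(claim, d=D, n=N):
    # Only the holder of the private exponent d can produce this value.
    return pow(digest(claim, n), d, n)


def verify(claim, signature, e=E, n=N):
    # Anyone with the public pair (e, n) can check the signature.
    return pow(signature, e, n) == digest(claim, n)


claim = "spiros.papadimitriou.name knows alice.example.name"
signature = sign(claim)
```
&lt;/pre&gt;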
&lt;h4&gt;Data portability&lt;/h4&gt;
&lt;p&gt;The point of this mental exercise so far is the following: &lt;strong&gt;A unique identity that can be verifiably associated with each and every data item that I produce is a prerequisite for making data ownership claims.&lt;/strong&gt; Subsequently, we need to ask what fundamental rights should be associated with data ownership.  &lt;strong&gt;The first is the right to keep &lt;em&gt;my&lt;/em&gt; information with &lt;em&gt;me&lt;/em&gt;&lt;/strong&gt; or, in other words, &amp;#8220;data portability&amp;#8221;. Just as I can freely move my money from one financial institution to another, I should be able to move any of my information from one data warehouse to another.&lt;/p&gt;
&lt;p&gt;For example, consider my web search history. I don&amp;#8217;t think I need to argue about the importance of historical information to improve search quality. If I decide for any reason to move to another search provider, I should be able to carry along all the information that&amp;#8217;s directly associated with me.  This should include my search keyword history, as well as any &lt;a title="Google SearchWiki brings custom search results - CNET" href="http://news.cnet.com/8301-17939_109-10102750-2.html"&gt;additional information I may have contributed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The actual details, however, may not be that straightforward.  Take, say, the third hit on a Google search.  Who is the &amp;#8220;creator&amp;#8221;?  Me by entering the search keywords, Google by producing the search results in response to those keywords, or the person who wrote the web page that contains them in the first place?  Similarly, when I buy gas, who is the &amp;#8220;creator&amp;#8221; of the transaction entry: me, Mobil, or American Express?&lt;/p&gt;
&lt;p&gt;Even though intuition can often be wrong, my intuitive response to the Google search example would be that both I and Google have an ownership claim on this particular search, which includes the query keywords as well as a ranking of URLs. On the other hand, the person who wrote the contents of, say, the third URL has ownership claims only on those contents, and not on the search results. Furthermore, the thousands of people who provided feedback to Google&amp;#8217;s ranking algorithms by clicking on this URL in similar searches have ownership claims on those searches, but not on mine.&lt;/p&gt;
&lt;p&gt;Finally, those two ownership claims (on keywords and on rankings) should probably not be treated the same. If they were, then, say, MS Live could effectively copy Google&amp;#8217;s rankings simply by getting many users to move their search data over. It seems reasonable to have the right to move my search history, but not the actual search results. However, I can imagine that some form of ownership claim on the rankings may be useful for other personal rights.&lt;/p&gt;
&lt;p&gt;This is a highly idealized example and I&amp;#8217;m not sure what an appropriate litmus test for ownership is, but some form of legal consensus must be in place.&lt;/p&gt;
&lt;h4&gt;Transparency&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;The second fundamental right is that I should know &lt;em&gt;who&lt;/em&gt; is using my personal information and &lt;em&gt;how&lt;/em&gt;. &lt;/strong&gt;For example, if an insurance company accesses my credit history to give me a rate quote, I can find this out. It may not be a completely painless process but it is certainly possible today, with a regulatory framework that ensures this.  Similar regulations should be instituted to cover any and all forms of access to personal information.&lt;/p&gt;
&lt;p&gt;Data access should be fully transparent to all parties involved. If an insurance company accesses my medical records, I should know this. If the government does a background check on me, I should know this too. Transparency is a prerequisite for accountability; without it, individuals have very limited power to protect themselves from improper uses of their personal information.&lt;/p&gt;
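&lt;p&gt;Technically, the core of such transparency is modest. Every access is recorded, together with who made it and why, before any data is released, and the data subject can always inspect that record. A minimal Python sketch follows. The class, field names and insurer are hypothetical, and a real system would also need the log itself to be tamper-evident.&lt;/p&gt;
&lt;pre&gt;
```python
import datetime


class AuditedRecords:
    """Hypothetical store in which every read of personal data is logged,
    so that the data subject can see who accessed what, when and why."""

    def __init__(self):
        self._records = {}
        self._log = []

    def put(self, subject, field, value):
        self._records[(subject, field)] = value

    def read(self, subject, field, accessor, purpose):
        # Log the access attempt first, then return the data.
        self._log.append({
            "subject": subject,
            "field": field,
            "accessor": accessor,
            "purpose": purpose,
            "time": datetime.datetime.now(datetime.timezone.utc),
        })
        return self._records[(subject, field)]

    def accesses_to(self, subject):
        # What the subject would see: every access to their records.
        return [entry for entry in self._log if entry["subject"] == subject]


store = AuditedRecords()
store.put("spiros", "medical history", "redacted")
store.read("spiros", "medical history", "Example Insurance Co.", "rate quote")
```
&lt;/pre&gt;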
&lt;h4&gt;Concluding remarks&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Much of the privacy research in computer science seems to assume that we can keep the existing legal and regulatory frameworks intact.&lt;/strong&gt; It is even sadder when computer scientists take this position than when lawyers do; we have no excuse for failing to understand the technical issues. &lt;strong&gt;We cannot and should not make this assumption.&lt;/strong&gt; Technical solutions should be subsidiary to new regulations. But that doesn&amp;#8217;t mean technologists cannot lead. We should work towards supporting full transparency (for individuals as well as for governments and corporations) rather than opacity, and &lt;strong&gt;I&amp;#8217;m currently in favor of a &amp;#8220;shoot first, ask questions later&amp;#8221; approach (and of helping lawmakers figure out the answers)&lt;/strong&gt;. After all, if the DRM wars have taught us anything, it&amp;#8217;s that information really wants to be free. If we accept that it is technically hard (to say the least) to prevent copying of music, movies and software, why do we still think it may be possible to prevent copying of personal information? As I pointed out in an &lt;a href="http://www.bitquill.net/blog/?p=5"&gt;older post&lt;/a&gt;, it&amp;#8217;s usually the use and not the possession of information that&amp;#8217;s the problem.&lt;/p&gt;
&lt;p&gt;My point in this post is simple: &lt;strong&gt;we should not fight the wrong war&lt;/strong&gt;. Instead, we need an easy way to make data ownership claims, and use this to enforce at least two fundamental rights: the ability to keep any personal data with us, and the ability to know who is using this data and how.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Postscript.&lt;/strong&gt; This post languished for a while as a draft (originally separated from &lt;a href="http://www.bitquill.net/blog/?p=5"&gt;this post&lt;/a&gt;, then forgotten).  Since then, a recent &lt;a title="Who Owns Your Friends? (MIT TR)" href="http://www.technologyreview.com/Infotech/20920/"&gt;MIT TR article&lt;/a&gt; has discussed some aspects of data ownership.  Even better, I have since found an excellent &lt;a title="Curating Yourself Online: What happens when your data is not your alone? (MIT TR)" href="https://www.technologyreview.com/Infotech/20936/"&gt;short piece&lt;/a&gt; in the same issue by &lt;a title="Esther Dyson (Wikipedia)" href="http://en.wikipedia.org/wiki/Esther_Dyson"&gt;Esther Dyson&lt;/a&gt;, with which I could not agree more.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update.&lt;/strong&gt; After posting this last night, I did some further Googling and found &lt;a title="How Loss of Privacy May Mean Loss of Security - Scientific American" href="http://www.sciam.com/article.cfm?id=how-loss-of-privacy-may-mean-loss-of-security"&gt;another piece by Esther Dyson&lt;/a&gt; in the Scientific American. If you&amp;#8217;ve read through my ramblings so far, then I&amp;#8217;d urge you to read her article; she&amp;#8217;s a much better writer than I am, and has apparently been thinking about these issues for almost a decade, long before many people even knew what the Internet was. I should probably follow her more closely myself, as I agree disturbingly often with what I&amp;#8217;ve read from her so far.&lt;/em&gt;&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/HRwEbB-l7B0" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=7#comments" thr:count="1" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=7" thr:count="1" />
		<thr:total>1</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[First thoughts on Android]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=57" />
		<id>http://www.bitquill.net/blog/?p=57</id>
		<updated>2009-01-30T15:24:08Z</updated>
		<published>2008-11-20T04:59:42Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Sci &amp; Tech" /><category scheme="http://www.bitquill.net/blog" term="Android" /><category scheme="http://www.bitquill.net/blog" term="Cloud computing" /><category scheme="http://www.bitquill.net/blog" term="Development" /><category scheme="http://www.bitquill.net/blog" term="Mobile devices" /><category scheme="http://www.bitquill.net/blog" term="Opinion" /><category scheme="http://www.bitquill.net/blog" term="Technology" />		<summary type="html"><![CDATA[Update: I&#8217;ll keep this post for the record, even though I&#8217;ve completely changed my mind.
I recently upgraded to a T-Mobile G1 (a.k.a. HTC Dream), running Android.  The G1 is a very nice and functional device. It&#8217;s also compact and decent looking, but perhaps not quite a fashion statement: unlike the iPhone my girlfriend got last [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=57">&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update:&lt;/strong&gt; I&amp;#8217;ll keep this post for the record, even though I&amp;#8217;ve &lt;a title="Revised thoughts on Android" href="http://www.bitquill.net/blog/?p=63"&gt;completely changed my mind&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2008/11/g1-android.jpg"&gt;&lt;img class="alignleft size-full wp-image-59" title="T-Mobile G1" src="http://www.bitquill.net/blog/wp-content/uploads/2008/11/g1-android.jpg" alt="T-Mobile G1" width="110" height="234" /&gt;&lt;/a&gt;I recently upgraded to a &lt;a title="T-Mobile G1 - Wikipedia" href="http://en.wikipedia.org/wiki/T-Mobile_G1"&gt;T-Mobile G1&lt;/a&gt; (aka. &lt;a title="High Tech Computer Corporation - Wikipedia" href="http://en.wikipedia.org/wiki/High_Tech_Computer_Corporation"&gt;HTC&lt;/a&gt; Dream), running Android.  The G1 is a very nice and functional device. It&amp;#8217;s also compact and decent looking, but perhaps not quite a fashion statement: unlike the iPhone my girlfriend got last year, which was immediately recognizable and a stare magnet, I pretty much have to slap people on the face with the G1 to make them look at it.  Also, battery life is acceptable, but just barely.  But this post is not about the G1, it&amp;#8217;s about &lt;a title="Google Android - Wikipedia" href="http://en.wikipedia.org/wiki/Google_Android"&gt;Android&lt;/a&gt;, which is Google&amp;#8217;s Linux-based, open-source mobile application platform.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;ll start on a light note, with some comments by one of the greatest entertainers out there today: &lt;a title="Developers - Youtube" href="http://www.youtube.com/watch?v=KMU0tzLwhbE"&gt;Monkey&lt;/a&gt; &lt;a title="Ballmer Monkeyboy iPod Mashup - Youtube" href="http://www.youtube.com/watch?v=FncILxajmlw"&gt;Boy&lt;/a&gt; made fun of the iPhone in January, stating that &amp;#8220;&lt;a title="Microsoft CEO Ballmer laughs at Apple iPhone - Youtube (quote at 1:13)" href="http://www.youtube.com/watch?v=C5oGaZIKYvo"&gt;Apple is selling zero phones a year&lt;/a&gt;&amp;#8221;. Now he&amp;#8217;s making similar remarks about Android, summarized by his eloquent &amp;#8220;&lt;a title="Ballmer calls Google's Android 'way behind' - ZDNet Blogs" href="http://news.zdnet.com/2424-9595_22-246795.html"&gt;blah dee blah dee blah&lt;/a&gt;&amp;#8221; argument.  Less than a year after that interview, the iPhone is &lt;a title="iPhone passes RIM, gains on Nokia - Apple 2.0 Fortune Blog" href="http://apple20.blogs.fortune.cnn.com/2008/11/07/iphone-passes-rim-gains-on-nokia/"&gt;ahead of Windows Mobile&lt;/a&gt; in worldwide market share of smartphone operating systems (7M versus 5.5M devices). Yep, this guy sure knows how to entertain, even if he makes a fool of himself and Microsoft.&lt;/p&gt;
&lt;p&gt;Furthermore, &lt;a title="Steve Ballmer going crazy - Youtube" href="http://www.youtube.com/watch?v=wvsboPUjrGc"&gt;Monkey&lt;/a&gt; &lt;a title="Adolf Balmer - Youtube" href="http://www.youtube.com/watch?v=c3XdOl5YtLg"&gt;Boy&lt;/a&gt; said that &amp;#8220;if I went to my shareholder meeting [...] and said, hey, we&amp;#8217;ve just launched a new product that has no revenue model! [...] I&amp;#8217;m not sure that my investors would take that very well. But that&amp;#8217;s kind of what Google&amp;#8217;s telling their investors about Android.&amp;#8221;  Even if this were true, perhaps no revenue model is better than a simian model.&lt;/p&gt;
&lt;p&gt;Anyway, someone from Microsoft should really know better, and quite likely he does, but can&amp;#8217;t say it out loud. There are some obvious parallels between Microsoft MS-DOS and Google Android:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Disruptive technology:&lt;/strong&gt; In the 80s, it was the personal computer.  Today, many think it is &amp;#8220;cloud computing&amp;#8221; (or &amp;#8220;services&amp;#8221;, or &amp;#8220;ubiquitous computing&amp;#8221;, or &amp;#8220;utility computing&amp;#8221;, or whatever else you want to call it).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Commodity infrastructure:&lt;/strong&gt; In the 80s, PC-compatibles became a commodity through standardization of the hardware platform and fierce competition that drove prices (and profit margins) down. Today, network infrastructure (the Internet at the core, and mobile devices on the fringes) as well as systems software (&lt;a title="LAMP (software bundle) - Wikipedia" href="http://en.wikipedia.org/wiki/LAMP_(software_bundle)"&gt;LAMP&lt;/a&gt;) are facing similar pressures.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Common software platform:&lt;/strong&gt; MS-DOS was the engine that fueled the growth of the personal computer.  For cloud computing, there is still some way to go (which Android hopes to help pave).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Revenue model:&lt;/strong&gt; Microsoft made a profit out of every PC sold. In today&amp;#8217;s networked world, profit should come from services offered over the network and accessed via a multitude of devices (including mobile phones), rather than from selling software licenses.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An executive once said that &lt;strong&gt;money is really made by controlling the middleware platform.&lt;/strong&gt; Lower levels of the stack face high competition and have low profit margins.  Higher levels of the stack (except perhaps some key applications) are too special-purpose and serve niche markets.  &lt;strong&gt;The sweet spot lies somewhere in the middle.&lt;/strong&gt; This is where MS-DOS was and where Android wants to be.&lt;/p&gt;
&lt;p&gt;Microsoft established itself by providing the platform for building applications on the &amp;#8220;revolution&amp;#8221; of its day, the personal computer. MS-DOS became the de facto standard, much more open than anything else at that time. From then on, Microsoft took a cut of the profits from every PC sold. Taiwanese &amp;#8220;PC-compatibles&amp;#8221; helped fuel Microsoft&amp;#8217;s (as well as Intel&amp;#8217;s) growth. The rest is history.&lt;/p&gt;
&lt;p&gt;In &amp;#8220;cloud&amp;#8221; computing, the ubiquitous, commodity infrastructure is the network.  This enables access to applications and information from any networked device. Even though individual components matter, &lt;strong&gt;it is common &lt;em&gt;standards&lt;/em&gt;, rather than a single, common software platform, that further enable information sharing.&lt;/strong&gt; If you believe that the future will be the same as the past, i.e., selling shrink-wrapped applications and software licenses, then Android not only has no revenue model, but also has no hope of ever coming up with one. In that case, Ballmer would be absolutely right.  But if there is a shift towards network-hosted data and applications, money can be made whenever users access those.  &lt;strong&gt;There are plenty of obvious examples that could be profitable:&lt;/strong&gt; geographically targeted advertising, smart shopping broker/assistant (see below), mobile office and add-on services, online games (&lt;a title="Location Based Games Will Rock Your World - Phandroid" href="http://phandroid.com/2008/09/29/location-based-games-will-rock-your-world/"&gt;location based&lt;/a&gt; or not), and so on. It&amp;#8217;s not clear whether Google plans to get directly involved in those (I doubt it), or just stay mostly on the back end and provide an easy-to-use &amp;#8220;cloud infrastructure&amp;#8221; for application developers.&lt;/p&gt;
&lt;p&gt;The services provided by network operators are becoming commodities. This is nothing new. A quote I liked is that &amp;#8220;&lt;a title="An opportunity for ISPs - Wiki That!" href="http://www.wikithat.com/wiki_that/2005/11/an_opportunity_.html"&gt;ISPs have nothing to offer other than price and speed&lt;/a&gt;&amp;#8221;.  I wouldn&amp;#8217;t include security among their offerings, since it is really an end-to-end service. As for devices, there is already evidence that commoditization similar to that of PC-compatibles may happen. Just one month after Android was open-sourced, &lt;a title="Running Google Android On iPhone Clones - Slashdot" href="http://mobile.slashdot.org/article.pl?sid=08/10/29/1710220"&gt;Chinese manufacturers have started deploying it&lt;/a&gt; on smartphones. Even big manufacturers are quickly getting into the game; for example, &lt;a title="Huawei Android Phone Coming Early 2009 - Phandroid" href="http://phandroid.com/2008/11/08/huawei-android-phone-coming-early-2009/"&gt;Huawei recently announced an Android phone&lt;/a&gt;. Most cellphones are already manufactured in China anyway.  The iPhone is assembled in Shenzhen, where Huawei&amp;#8217;s headquarters are also located (coincidence?). The Chinese already have a decent track record when it comes to building hardware and it&amp;#8217;s only a matter of time until they fully catch up.&lt;/p&gt;
&lt;p&gt;So, it&amp;#8217;s quite simple: &lt;strong&gt;Android wants to be for ubiquitous services what MS-DOS was for personal computers.&lt;/strong&gt; But did Microsoft in the 80s really start out by saying &amp;#8220;our revenue model is this: we&amp;#8217;ll build a huge user base &lt;a title="MS pricing strategy exposed – cheap when there's competition, but… - The Register" href="http://www.theregister.co.uk/1999/01/13/ms_pricing_strategy_exposed_cheap/"&gt;at all costs&lt;/a&gt;, which will subsequently allow us to get $200 out of each and every PC sold&amp;#8221;?  Not really.  Similarly, Google is not going to say &amp;#8220;we want to build a user base, so we can make a profit from all services hosted on the [our?] cloud and accessed via mobile devices [and set-top boxes, and cars, and...].&amp;#8221;  Such an announcement would be premature, and one of the surest ways to scare off your user base: unless Google first provides more evidence that it means no evil, the general public will tend to assume the worst.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The most interesting feature of Android is its &lt;a title="Anatomy of an Android App - Android Docs" href="http://code.google.com/android/intro/anatomy.html"&gt;component-based architecture&lt;/a&gt;&lt;/strong&gt;, as pointed out by some of the more insightful &lt;a title="Google's Android revealed: Component software for the mobile world - MobileOpportunity" href="http://mobileopportunity.blogspot.com/2007/11/googles-android-revealed-component.html"&gt;blog posts&lt;/a&gt;. Components are like iGoogle gadgets, only Android calls them &amp;#8220;activities.&amp;#8221; &lt;strong&gt;Applications themselves are built using a very browser-like metaphor&lt;/strong&gt;: a &amp;#8220;task&amp;#8221; (which is Android-speak for a running application) is simply a stack of activities, which users can navigate backwards and forwards. The platform already has a set of basic activities that handle, e.g., email URLs, map URLs, calendar URLs, Flickr URLs, Youtube URLs, photo capture, music files, and so on. Any application can seamlessly invoke any of these reusable activities, either directly or via a registry of capabilities (which, roughly speaking, are called &amp;#8220;intents&amp;#8221;). The correspondence between a task and an O/S process is not necessarily one-to-one. Processes are used behind the scenes, for security and resource isolation purposes. Activities invoked by the same task may or may not run in the same process.&lt;/p&gt;
&lt;p&gt;In addition to activities and intents, Android also supports other types of components, such as &amp;#8220;content providers&amp;#8221; (to expose data sources, such as your calendar or todo list, via a common API), &amp;#8220;services&amp;#8221; (long-running background tasks, such as a music player, which can be controlled via remote calls) and &amp;#8220;broadcast receivers&amp;#8221; (handlers for external events, such as receiving an SMS).&lt;/p&gt;
&lt;p&gt;I think that &lt;strong&gt;Google is really pushing Android because it needs a component-based platform&lt;/strong&gt;, and not so much to avoid &lt;a title=" Update On Google iPhone Voice Recognition App: Look For It On Monday - TechCrunch" href="http://www.techcrunch.com/2008/11/16/update-on-google-iphone-voice-recognition-app-look-for-it-on-monday/"&gt;the occasional snafu&lt;/a&gt;. If embraced by developers, this is the major ace up Android&amp;#8217;s sleeve.  Furthermore, the &lt;a title="Android Open Source Project" href="http://source.android.com/"&gt;open source codebase&lt;/a&gt; is the strongest indication (among several) that Google &lt;a title="Finally, proper banner ads for Android: Flash demoed on a G!" href="http://www.engadget.com/2008/11/17/finally-proper-banner-ads-for-android-flash-demoed-on-a-g1/"&gt;has no intention&lt;/a&gt; to tightly &lt;a title="Why Apple Won't Allow Adobe Flash on iPhone - Wired" href="http://blog.wired.com/gadgets/2008/11/adobe-flash-on.html"&gt;regulate application frameworks like Apple&lt;/a&gt;, or to leverage its position to attack the competition like Microsoft has done in the past.  Google wants to give itself enough leverage to realize its cloud-based services vision. If others benefit too, so much the better; Google is still too young to be &amp;#8220;&lt;a title="New York Debate Audience Can't Decide if Google Is " href="http://www.marketwatch.com/news/story/New-York-Debate-Audience-Cant/story.aspx?guid={0A2DBD98-B3B2-4F6F-899F-69A2908A809C}"&gt;evil&lt;/a&gt;&amp;#8221;.  After all, &lt;a title="The Clouds Part on HP's Computing Strategy - Wired" href="http://www.wired.com/techbiz/it/news/2008/05/portfolio_0513"&gt;as Jeff Bezos said&lt;/a&gt;, &amp;#8220;like our retail business, [there] is not going to be one winner. [...] &lt;strong&gt;Important industries are rarely made by single companies.&lt;/strong&gt;&amp;#8221; I find the comparison to retail interesting. &lt;strong&gt;In fact, it is quite likely that many &amp;#8220;cloud services&amp;#8221; themselves will also become commodities.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I&amp;#8217;d wager that really successful Android applications won&amp;#8217;t be just applications, but &lt;em&gt;components&lt;/em&gt; with content provided &lt;em&gt;over the network&lt;/em&gt;.&lt;/strong&gt; A shopping list app is nice. It was exciting in the PalmPilot era, a decade ago. But imagine a shopping list &lt;em&gt;component&lt;/em&gt;, accessible from both my laptop and my cellphone, that can automatically pull good deals from a shopping component and let a navigation component alert me that the supermarket I&amp;#8217;m about to drive by has items I need. That would be great! &lt;strong&gt;Android is built with that vision in mind&lt;/strong&gt;, even though it&amp;#8217;s not quite there yet.&lt;/p&gt;
&lt;p&gt;It&amp;#8217;s kind of disappointing, but not surprising, that many app developers do not yet think in terms of this component-based architecture. In fairness, there are already efforts, such as &lt;a title="OpenIntents.org website" href="http://www.openintents.org/en/"&gt;OpenIntents&lt;/a&gt;, to build collections of general-purpose intents. &lt;strong&gt;Furthermore, &lt;a title="Synchronization in Android - My life with Android :-)" href="http://mylifewithandroid.blogspot.com/2008/02/synchronization-in-android.html"&gt;the sync APIs are not (yet) for the faint of heart&lt;/a&gt;. Even Google-provided services could perhaps be improved.&lt;/strong&gt; For example, Google Maps does not synchronize stored locations with the web-based version. When I recently missed a highway exit on the way to work and needed to get directions, I had to pull over and re-type the full address. Nor does it expose those locations via a content provider. When I installed &lt;a title="Locale website" href="http://www.androidlocale.com/"&gt;Locale&lt;/a&gt;, I had to manually re-enter most of &amp;#8220;My Locations&amp;#8221; from the web version of Google Maps. So, there are clearly some rough edges that I&amp;#8217;m sure will be smoothed out.  After all, there have been other rough edges, such as &lt;a title="Issue 1207:  	 android appears to be watching text streams and acting upon them - Google Code" href="http://code.google.com/p/android/issues/detail?id=1207"&gt;forgotten debugging hooks&lt;/a&gt;, something I find more amusing than alarming or embarrassing, and certainly not the &amp;#8220;&lt;a title="Worst. Bug. Ever. - ZDNet Blogs" href="http://blogs.zdnet.com/Burnette/?p=680"&gt;Worst. Bug. Ever.&lt;/a&gt;&amp;#8221;&lt;/p&gt;
&lt;p&gt;Android has a lot of potential, but it still needs work and Google should move fast. The top two items on my wish list would be:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Release a &amp;#8220;signature&amp;#8221; device&lt;/strong&gt; (or two), like the Motorola Razr was a couple of years ago and the Apple iPhone was last year. The G1 is really nice, but not enough.  A device that people &lt;em&gt;desire&lt;/em&gt; may be neither a necessary nor a sufficient condition for success, but it will surely help as a vehicle to move Android forward in market share.&lt;/li&gt;
&lt;li&gt;Expand the set of available activities and content providers, and release an easy-to-use data sync service and API. &lt;strong&gt;In principle, everything that is an iGoogle gadget should also be an Android activity, &lt;em&gt;sharing the same data sources&lt;/em&gt;.&lt;/strong&gt; This is at the core of what &amp;#8220;cloud computing&amp;#8221; is about.  After all, &lt;strong&gt;you could think of Android as a glorified modern browser&lt;/strong&gt; for devices with small screens, intermittent network connectivity, location sensors, and so on.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I suspect it might not be that hard to build a Google gadget container for Android.  Google Gears is already there and some form of interaction with the local device via JavaScript is already allowed.  Many gadgets don&amp;#8217;t need that much screen real estate anyway, so this may be an interesting hack to try out.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;But not many people will buy an Android device for what it &lt;em&gt;could&lt;/em&gt; do some day.&lt;/strong&gt; Google has created a lot of positive buzz, backed by a few actual features. Now it needs some sexy devices and truly interesting apps, to really jumpstart the necessary &lt;a title="Network effect - Wikipedia" href="http://en.wikipedia.org/wiki/Network_effect"&gt;network effect&lt;/a&gt;. &lt;strong&gt;Building the smart shopping list app &lt;em&gt;should&lt;/em&gt; be as easy as building the dumb one.&lt;/strong&gt; In the longer run, Android should move beyond cell phones, to &lt;a title="Google's Android: It's not just for phones - CNet" href="http://news.cnet.com/8301-17938_105-10047551-1.html"&gt;in-car computers, set-top boxes, and so on&lt;/a&gt; (Microsoft Windows already does both &lt;a title="Microsoft Automotive Home Page" href="http://www.microsoft.com/auto/default.mspx"&gt;cars&lt;/a&gt; and &lt;a title="MSN TV - Wikipedia" href="http://en.wikipedia.org/wiki/MSN_TV"&gt;set-top boxes&lt;/a&gt;, but with limited success so far); in short, to anything that can be used to access network-hosted data and applications.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/XhZaVvpS5LU" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=57#comments" thr:count="2" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=57" thr:count="2" />
		<thr:total>2</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[Randy Pausch in CACM]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=54" />
		<id>http://www.bitquill.net/blog/?p=54</id>
		<updated>2008-10-08T14:51:30Z</updated>
		<published>2008-10-08T14:51:30Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Life bits" /><category scheme="http://www.bitquill.net/blog" term="Sci &amp; Tech" /><category scheme="http://www.bitquill.net/blog" term="Academia" /><category scheme="http://www.bitquill.net/blog" term="Computer Science" /><category scheme="http://www.bitquill.net/blog" term="Research" />		<summary type="html"><![CDATA[The September issue of CACM has a one-page, seven-question interview with Randy Pausch. It is definitely worth reading, so I&#8217;ll give you a sneak peek (unfortunately, CACM is not &#8220;open access&#8221;):
What about advice for CS teachers and professors?
That it&#8217;s time for us to start being more honest with ourselves about what our field is and [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=54">&lt;p&gt;The September issue of CACM has a one-page, seven-question &lt;a title="Wisdom from Randy Pausch - CACM" href="http://doi.acm.org/10.1145/1378727.1378735"&gt;interview with Randy Pausch&lt;/a&gt;. It is definitely worth reading, so I&amp;#8217;ll give you a sneak peek (unfortunately, &lt;a title="CACM - Front page" href="http://cacm.acm.org/"&gt;CACM&lt;/a&gt; is not &amp;#8220;open access&amp;#8221;):&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;What about advice for CS teachers and professors?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That it&amp;#8217;s time for us to start being more honest with ourselves about what our field is and how we should approach teaching it. Personally, I think that if we had named the field &amp;#8220;Information Engineering&amp;#8221; as opposed to &amp;#8220;Computer Science,&amp;#8221; we would have had a better culture for the discipline. For example, CS departments are notorious for not instilling concepts like testing and validation the way many other engineering disciplines do.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Is there anything you wish someone had told you before you began your own studies?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Just that being technically strong is only one aspect of an education.&lt;/p&gt;
&lt;p&gt;[...]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Alice has proven phenomenally successful at teaching young women, in particular, to program. What else should we be doing to get more women engaged in computer science?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Well, it&amp;#8217;s important to note that Alice works for both women &lt;em&gt;and&lt;/em&gt; men. I think female-specific &amp;#8220;approaches&amp;#8221; can be dangerous for lots of reasons, but approaches like Alice, which focus on activities like storytelling, work across gender, age, and cultural background. It&amp;#8217;s something very fundamental to want to tell stories. And Caitlin Kelleher&amp;#8217;s dissertation did a fantastic job of showing just how powerful that approach is.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The interview was conducted a few weeks before his death. I&amp;#8217;ll just say that, somehow, I suspect someone not in his position would never have said at least one of these things.  It&amp;#8217;s a sad thought, but Randy&amp;#8217;s message is, as always, positive.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/y1QvvbsHYV4" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=54#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=54" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[We&#8217;ve moved!]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=55" />
		<id>http://www.bitquill.net/blog/?p=55</id>
		<updated>2008-10-03T15:37:07Z</updated>
		<published>2008-10-03T15:35:33Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Life bits" />		<summary type="html"><![CDATA[Early this week we moved from White Plains to Manhattan.  So far, we&#8217;ve decorated the apartment using organic landscape elements, in harmony with the surrounding environment.  Here is what I mean:

On the left is the view outside the window and on the right is what you see inside.
]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=55">&lt;p&gt;Early this week we moved from White Plains to Manhattan.  So far, we&amp;#8217;ve decorated the apartment using organic landscape elements, in harmony with the surrounding environment.  Here is what I mean:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2008/10/movein-out.jpg"&gt;&lt;img class="size-full wp-image-56" title="Move in / out" src="http://www.bitquill.net/blog/wp-content/uploads/2008/10/movein-out.jpg" alt="Urban decoration" width="640" height="400" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;On the left is the view outside the window and on the right is what you see inside.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/zXb4yvYMkDU" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=55#comments" thr:count="1" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=55" thr:count="1" />
		<thr:total>1</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[NYC initiation: rental application]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=53" />
		<id>http://www.bitquill.net/blog/?p=53</id>
		<updated>2008-09-26T12:32:56Z</updated>
		<published>2008-09-26T12:32:56Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Life bits" />		<summary type="html"><![CDATA[We recently signed a lease to rent in UES. Besides the usual credit check, most places in NYC ask for a slew of personal information: bank statements (with balances and account numbers), federal tax return and W-2 copies, letter of employment stating yearly salary, and three character reference letters.  (As for the landlord, I only [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=53">&lt;p&gt;We recently signed a lease to rent in UES. Besides the usual credit check, most places in NYC ask for a slew of personal information: bank statements (with balances and account numbers), federal tax return and W-2 copies, letter of employment stating yearly salary, and three character reference letters.  (As for the landlord, I only know her name)&lt;/p&gt;
&lt;p&gt;I&amp;#8217;m told that managed buildings may skip some of these, but the apartment we found is in a condominium. Even though the landlord had already approved us, our broker prepared all the paperwork to a tee for the upcoming condo board review.&lt;/p&gt;
&lt;p&gt;He even sent us some anonymized character reference letter samples.  Some were quite amusing.  For example (emphasis mine):&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;[...] I have always found him to be serious and responsible about his works [sic] and his private life. His home life is extremely quiet, and I would think ideal for his neighbors. &lt;em&gt;Virtually all of his social gatherings are conducted in restaurants.&lt;/em&gt; He travels throughout nine months of the year and would probably be at home for only short periods of time between those trips. &lt;em&gt;And quite frankly, his time at home is usually spent resting as part of his recovery from his traveling&lt;/em&gt; and preparation for his next trip. He is just the kind of quiet, unobtrusive neighbor that I would like to have.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;I couldn&amp;#8217;t help but wonder how many boards went through letters like this one. For a moment or two, I entertained the thought of asking a friend to write a pithy one-liner instead:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Spiros = corpse – odor + money    ⇒    Spiros = dream tenant !&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;but I eventually decided that the &amp;#8220;⇒&amp;#8221; notation might be too much and dropped the idea altogether.&lt;/p&gt;
&lt;p&gt;I just hope those sample letters do not really reflect life in NYC!&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/dzfL_7V0CeQ" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=53#comments" thr:count="1" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=53" thr:count="1" />
		<thr:total>1</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[Data harvesting with MapReduce]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=17" />
		<id>http://www.bitquill.net/blog/?p=17</id>
		<updated>2008-10-21T01:38:05Z</updated>
		<published>2008-09-11T17:04:51Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Sci &amp; Tech" /><category scheme="http://www.bitquill.net/blog" term="Cloud computing" /><category scheme="http://www.bitquill.net/blog" term="Data mining" /><category scheme="http://www.bitquill.net/blog" term="Development" /><category scheme="http://www.bitquill.net/blog" term="Distributed" /><category scheme="http://www.bitquill.net/blog" term="Hadoop" /><category scheme="http://www.bitquill.net/blog" term="MapReduce" /><category scheme="http://www.bitquill.net/blog" term="Web" />		<summary type="html"><![CDATA[
(original image source)
&#8220;The combine harvester, [...] is a machine that combines the tasks of harvesting, threshing and cleaning grain crops.&#8221;  If you have acres upon acres of wheat and want to separate the grain from the chaff, a group of combines is what you really want. If you have a bonsai tree and want [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=17">&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2008/07/pack_of_harvesters.jpg"&gt;&lt;img class="size-full wp-image-19" title="pack_of_harvesters" src="http://www.bitquill.net/blog/wp-content/uploads/2008/07/pack_of_harvesters.jpg" alt="Combine harvesters" width="600" height="300" /&gt;&lt;/a&gt;&lt;br /&gt;
(original &lt;a title="Brazil's Answer to Global Hunger (BusinessWeek)" href="http://www.businessweek.com/magazine/content/08_22/b4086072681496.htm?chan=search"&gt;image source&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&amp;#8220;The &lt;a title="Combine harvester (Wikipedia)" href="http://en.wikipedia.org/wiki/Combine_harvester"&gt;combine harvester&lt;/a&gt;, [...] is a machine that combines the tasks of harvesting, threshing and cleaning grain crops.&amp;#8221;  If you have acres upon acres of wheat and want to separate the grain from the chaff, a group of combines is what you really want. If you have a bonsai tree and want to trim it, a harvester may be less than ideal.&lt;/p&gt;
&lt;p&gt;MapReduce is like a pack of harvesters, well-suited for sifting through &lt;em&gt;huge&lt;/em&gt; volumes of data residing on a distributed storage system.  However, a lot of machine learning work is more akin to trimming bonsai into elaborate patterns. Conversely, it&amp;#8217;s not uncommon to see trimmers used to harvest a wheat field. Well-established and respected researchers wrote, as recently as this year, in their paper &amp;#8220;&lt;a title="Planetary Scale Views on a Large Instant-messaging Network (ACM DL)" href="http://doi.acm.org/10.1145/1367497.1367620"&gt;Planetary Scale Views on a Large Instant-messaging Network&lt;/a&gt;&amp;#8221;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;We gathered data for 30 days of June 2006. Each day yielded about 150 gigabytes of compressed text logs (4.5 terabytes in total). Copying the data to a dedicated eight-processor server with 32 gigabytes of memory took 12 hours. Our log-parsing system employed a pipeline of four threads that parse the data in parallel, collapse the session join/leave events into sets of conversations, and save the data in a compact compressed binary format. This process compressed the data down to 45 gigabytes per day. Processing the data took an additional 4 to 5 hours per day.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Doing the math, that&amp;#8217;s &lt;em&gt;five full days&lt;/em&gt; of processing to parse and compress the data on a beast of a machine. Even more surprisingly, I found this exact quote &lt;a title="Post on " href="http://caddyshack.stanford.edu/lsna/blog/2008/03/planetary-scale-views-on-instant.html"&gt;singled out&lt;/a&gt; among all the interesting results in the paper! Let me make clear that I&amp;#8217;m not criticizing the study; in fact, both the dataset and the exploratory analysis are interesting in many ways.  However, &lt;strong&gt;it is somewhat surprising that, at least in the research community, such a statement is still treated more as a badge of honor than as an admission of masochism.&lt;/strong&gt;&lt;/p&gt;
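&lt;p&gt;The &amp;#8220;five full days&amp;#8221; figure follows directly from the numbers in the quote. A minimal Python sketch of the arithmetic, assuming the 30 daily batches are processed one after another and leaving the one-off 12-hour copy out of the total:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
DAYS_OF_LOGS = 30                # "30 days of June 2006"
PROCESS_HOURS_PER_DAY = (4, 5)   # "an additional 4 to 5 hours per day"
COPY_HOURS = 12                  # one-off copy to the eight-processor server
RAW_GB_PER_DAY = 150             # compressed text logs per day
PARSED_GB_PER_DAY = 45           # compact binary format per day


def processing_days(hours_per_day_of_logs, days_of_logs=DAYS_OF_LOGS):
    """Wall-clock days if each day of logs is processed sequentially."""
    return days_of_logs * hours_per_day_of_logs / 24


low, high = (processing_days(h) for h in PROCESS_HOURS_PER_DAY)
print(f"processing: {low:.2f} to {high:.2f} days, plus {COPY_HOURS} hours of copying")
print(f"raw input: {DAYS_OF_LOGS * RAW_GB_PER_DAY / 1000:.1f} TB")
print(f"size reduction: {RAW_GB_PER_DAY / PARSED_GB_PER_DAY:.1f}x")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At 4 hours per day of logs this gives exactly 5 days; at 5 hours per day it gives 6.25, so &amp;#8220;five full days&amp;#8221; is, if anything, on the low side.&lt;/p&gt;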
&lt;p&gt;The authors should be applauded for their effort.  &lt;strong&gt;Me, I&amp;#8217;m an impatient sod.&lt;/strong&gt; Wait one day for the results? I think I can do that. Two days, what the heck. But five? For an exploratory statistical analysis? I&amp;#8217;d be long gone before that. And what if I found a serious bug halfway down the road?  That&amp;#8217;s after more than two days of waiting, in case you weren&amp;#8217;t counting. Or what if I decided I needed a minor modification to extract some other statistic?  Wait another five days?  Call me a Matlab-spoiled brat, but forget what I said just now about waiting one day. I&amp;#8217;ve already changed my mind. A few hours, tops. But &lt;a title="Web science: what and how? (Bitquill)" href="http://www.bitquill.net/blog/?p=24"&gt;we need a lot more studies like this&lt;/a&gt;.  Consequently, we need the tools to facilitate them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hence my decision to frolic with Hadoop.&lt;/strong&gt; This post focuses on exploratory data analysis tasks: the kind I usually do with Matlab or &lt;a title="IPython frontpage" href="http://ipython.scipy.org/moin/FrontPage"&gt;IPython&lt;/a&gt;/&lt;a title="SciPy website" href="http://www.scipy.org/"&gt;SciPy&lt;/a&gt; scripts, which involve many iterations of feature extraction, data summarization, model building and validation.  This may be contrary to Hadoop&amp;#8217;s design priorities: it is not intended for quick turnaround or interactive response times with modestly large datasets.  However, it can still make life much easier.&lt;/p&gt;
&lt;h3&gt;Scale up on large datasets&lt;/h3&gt;
&lt;p&gt;We start with a very simple benchmark, which scans a 350GB text log.  Each record is one line, consisting of a comma-separated list of &lt;tt&gt;key=value&lt;/tt&gt; pairs.  The job extracts the value for a specific key using a simple regular expression and computes the histogram of the corresponding values (i.e., how many times each distinct value appears in the log).  The input consists of approximately 500M records and the chosen key is associated with about 130 distinct values.&lt;/p&gt;
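&lt;p&gt;For readers who have not written a MapReduce job, here is a minimal sketch of how this histogram could be expressed, simulated in a single Python process rather than on Hadoop. The key name &lt;tt&gt;status&lt;/tt&gt; is a hypothetical stand-in (the post does not name the key), the regular expression is an assumption about the &lt;tt&gt;key=value&lt;/tt&gt; format described above, and this is not the code used for the benchmark:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import re
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical key name; the post does not say which key was used.
KEY = "status"
# Matches "status=VALUE" at the start of a line or right after a comma.
PATTERN = re.compile(r"(?:^|,)" + re.escape(KEY) + r"=([^,]*)")


def mapper(line):
    """Emit (value, 1) for the chosen key, if the record contains it."""
    match = PATTERN.search(line.rstrip("\n"))
    if match:
        yield match.group(1), 1


def combiner(pairs):
    """Pre-aggregate map output, so at most one count per distinct value leaves a split."""
    counts = Counter()
    for value, n in pairs:
        counts[value] += n
    return counts.items()


def reducer(value, counts):
    """Sum the partial counts for one distinct value."""
    return value, sum(counts)


def run_local(lines):
    """Simulate map, combine, shuffle (sort + group) and reduce in one process.

    The whole input is treated as a single split here.
    """
    mapped = combiner(pair for line in lines for pair in mapper(line))
    shuffled = sorted(mapped, key=itemgetter(0))
    return dict(
        reducer(value, (n for _, n in group))
        for value, group in groupby(shuffled, key=itemgetter(0))
    )


if __name__ == "__main__":
    log = [
        "ts=1,status=200,bytes=512",
        "status=404,ts=2",
        "ts=3,status=200",
        "ts=4,bytes=10",
    ]
    print(run_local(log))  # {'200': 2, '404': 1}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The combiner is what keeps the shuffle cheap in this kind of job: with only about 130 distinct values, each map task would send at most about 130 partial counts to the single reducer, however many records its split contains.&lt;/p&gt;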
&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2008/07/hadoop-hist-scalability.png"&gt;&lt;img class="size-full wp-image-20" title="hadoop-hist-scalability" src="http://www.bitquill.net/blog/wp-content/uploads/2008/07/hadoop-hist-scalability.png" alt="Scalability: histogram" width="512" height="310" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The plot above shows aggregate throughput versus number of nodes. HDFS and MapReduce cluster sizes are always equal, with HDFS rebalanced before each run. The job uses a split size of 256MB (or four HDFS blocks) and one reducer.  All machines have a total of four cores (mostly Xeon, a few AMD) and one local disk.  Disks range from ridiculously slow laptop-type drives (the most common type) to ridiculously fast SAS drives. Hadoop 0.16.2 (yes, this post took a while to write) and Sun&amp;#8217;s 1.6.0_04 JRE were used in all experiments.&lt;/p&gt;
&lt;p&gt;For such an embarrassingly parallel task, scaleup is linear. No surprises here, but it&amp;#8217;s worth pointing out some numbers.  As you can see from the plot, extracting simple statistics from this 350GB dataset took less than ten minutes with 39 nodes, down from several hours on one node. Without knowing the details of how the data were processed, &lt;strong&gt;if we assume similar throughput, then the processing time for the raw instant messaging log could be reduced from roughly five days to just a few hours.&lt;/strong&gt; In fact, when parsing a document corpus (about 1TB of raw text) to extract a document-term graph, we witnessed similar scale-up, going down from well over a day on a beast of a machine to a couple of hours on the Hadoop cluster.&lt;/p&gt;
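&lt;p&gt;The extrapolation in bold is essentially one division. A rough Python sketch, assuming the IM logs (30 days at 150GB per day, about 4.5TB) could be scanned at the same aggregate rate as the 350GB benchmark. It takes &amp;#8220;less than ten minutes&amp;#8221; as exactly ten, and it ignores decompression and the heavier per-record work of the IM pipeline, so treat the result as an optimistic estimate:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Assumption: the post only says "less than ten minutes", so 10 is an upper bound.
BENCHMARK_GB = 350
BENCHMARK_MINUTES = 10
NODES = 39
IM_LOG_GB = 30 * 150          # 30 days x 150GB/day = 4.5TB


def aggregate_throughput_gb_s(dataset_gb, minutes):
    """Aggregate scan rate of the whole cluster, in GB/s."""
    return dataset_gb / (minutes * 60)


def hours_to_process(dataset_gb, throughput_gb_s):
    """Wall-clock hours at a given aggregate rate, assuming linear scaling."""
    return dataset_gb / throughput_gb_s / 3600


rate = aggregate_throughput_gb_s(BENCHMARK_GB, BENCHMARK_MINUTES)
print(f"aggregate: {rate:.2f} GB/s, per node: {rate * 1000 / NODES:.1f} MB/s")
print(f"IM log estimate: {hours_to_process(IM_LOG_GB, rate):.1f} hours")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these inputs the cluster scans about 0.58GB/s (roughly 15MB/s per node), and the 4.5TB log comes out at about 2.1 hours, consistent with &amp;#8220;a few hours.&amp;#8221;&lt;/p&gt;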
&lt;p&gt;Hadoop is also reasonably simple to program with.  Its main abstraction is natural, even if your familiarity with functional programming concepts is next to none.  Furthermore, most distributed execution details are, by default, hidden: if the code runs correctly on your laptop (with a smaller dataset, of course), then it &lt;em&gt;will&lt;/em&gt; run correctly on the cluster.&lt;/p&gt;
&lt;h3&gt;Single core performance&lt;/h3&gt;
&lt;p&gt;Linear scaleup is good, but how about absolute performance? I implemented the same simple benchmark in C++, using Boost for regex matching.  For a rough measure of sustained sequential disk throughput, I simply &lt;tt&gt;cat&lt;/tt&gt; the same large file to &lt;tt&gt;/dev/null&lt;/tt&gt;.&lt;/p&gt;
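&lt;p&gt;The &lt;tt&gt;cat&lt;/tt&gt; trick measures how fast a drive can stream a file when almost no work is done per byte. A roughly equivalent Python sketch is below; the 1MiB read size is an arbitrary choice, and the file must be much larger than RAM (or the page cache must be dropped first), otherwise the number reflects memory bandwidth rather than the disk:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import sys
import time

CHUNK_BYTES = 1024 * 1024  # 1MiB per read; an arbitrary choice for this sketch


def sequential_read_mb_s(path, chunk_bytes=CHUNK_BYTES):
    """Stream a file front to back, discard the bytes, and return MB/s (10^6 bytes)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: one read() call per chunk
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    print(f"{sequential_read_mb_s(sys.argv[1]):.1f} MB/s")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It takes the file path as its only argument, just like the &lt;tt&gt;cat&lt;/tt&gt; command it imitates.&lt;/p&gt;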
&lt;p&gt;I collected measurements from various machines I had access to: (i) a five-year-old Mini-ITX system I use with my television at home, running Linux FC8 for this experiment, (ii) a two-year-old desktop at work, again with FC8, (iii) my three-year-old ThinkPad running Windows XP and Cygwin, and (iv) a recent IBM blade running RHEL4.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2008/08/hadoop-hist-singlecore.png"&gt;&lt;img class="size-full wp-image-48" title="hadoop-hist-singlecore" src="http://www.bitquill.net/blog/wp-content/uploads/2008/08/hadoop-hist-singlecore.png" alt="Single core performance" width="512" height="311" /&gt;&lt;/a&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2008/07/hadoop-hist-singlecore.png"&gt; &lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The hand-coded version in C++ is about 40% faster on the older machines and 33% faster on the blade [Note: I&amp;#8217;m missing the C++ times for my laptop, and its drive has since crashed; I was too lazy to reload the data and rerun everything, so I simply extrapolated from single-thread Hadoop assuming a 40% improvement, which seems reasonable enough for these back-of-the-envelope calculations].  Not bad, considering that Hadoop is written in Java and also incurs additional overheads to process each file split separately.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Perhaps I&amp;#8217;m flaunting my ignorance but, surprisingly, this workload was CPU-bound and &lt;em&gt;not&lt;/em&gt; I/O-bound&lt;/strong&gt;, with the exception of my laptop, which has a really crappy 2.5&amp;#8243; drive (and Windows XP).  &lt;strong&gt;Scanning raw text logs is a rather representative workload for real-world data analysis&lt;/strong&gt; (e.g., &lt;a title="AWK (Wikipedia)" href="http://en.wikipedia.org/wiki/Awk"&gt;AWK&lt;/a&gt; was built at AT&amp;amp;T for this purpose).&lt;/p&gt;
&lt;p&gt;The blade has a really fast SAS drive (suspiciously fast, unless it runs at 15K RPM) and the results are particularly instructive.  The drive reaches 120MB/sec sustained read throughput. Stated differently, the 3GHz CPU can only spend about 24 cycles on each byte, on average, if it is to keep up with the drive&amp;#8217;s read rate.  Even on the other machines, the break-even point is between 30 and 60 cycles per byte [Note: The laptop drive seems to be an exception, but I suspect that at least part of the inefficiency is due to Cygwin].&lt;/p&gt;
&lt;p&gt;On the other hand, the benchmark throughput translates into 150-500 cycles per byte, on average.  If I get the chance, I&amp;#8217;d like to instrument the code with PAPI, validate these numbers and perhaps obtain a breakdown (into average cycles for regex state machine transition per byte, average cycles for hash update per record, etc.). I would never have expected the numbers to be this high and I still don&amp;#8217;t quite believe them. &lt;strong&gt;In any case, if we believe these measurements, &lt;em&gt;at least&lt;/em&gt; 4-6 cores are needed to handle the sequential read throughput from a single drive!&lt;/strong&gt;&lt;/p&gt;
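&lt;p&gt;The cycles-per-byte budget is just the clock rate divided by the disk&amp;#8217;s read rate, and the number of cores needed is the measured cost per byte divided by that budget. A small Python sketch of both calculations follows. Reading &amp;#8220;MB&amp;#8221; as 1,048,576 bytes reproduces the 24-cycle figure (with 10^6 bytes it would be 25), and pairing the blade&amp;#8217;s budget with the low end of the measured 150-500 cycles per byte is an assumption made only for illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math

MIB = 2 ** 20  # reading "MB/sec" as MiB/s reproduces the 24-cycle figure


def breakeven_cycles_per_byte(clock_hz, disk_bytes_per_s):
    """Cycles the CPU may spend on each byte and still keep up with the drive."""
    return clock_hz / disk_bytes_per_s


def cores_needed(work_cycles_per_byte, clock_hz, disk_bytes_per_s):
    """Whole cores required for aggregate processing to match the drive's read rate."""
    budget = breakeven_cycles_per_byte(clock_hz, disk_bytes_per_s)
    return math.ceil(work_cycles_per_byte / budget)


blade_budget = breakeven_cycles_per_byte(3e9, 120 * MIB)
print(f"blade budget: {blade_budget:.1f} cycles/byte")  # about 23.8
# Illustrative pairing (assumption): blade budget vs. the cheapest measured cost.
print(f"cores at 150 cycles/byte: {150 / blade_budget:.1f}, "
      f"rounded up: {cores_needed(150, 3e9, 120 * MIB)}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This assumes the work parallelizes perfectly across cores and ignores memory-bandwidth limits, so the result is a lower bound, in line with the &amp;#8220;at least&amp;#8221; above.&lt;/p&gt;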
&lt;p&gt;The common wisdom in algorithms and databases textbooks, as far as I remember, was that when disk I/O is involved, CPU cycles can be more or less treated as a commodity.  Perhaps this is an overstatement, but I didn&amp;#8217;t expect it to be so far off the mark.&lt;/p&gt;
&lt;p&gt;This also raises another interesting question, which was the original motivation for measuring on a broad set of machines: what would be the appropriate cost-performance balance between CPU and disk for a &lt;em&gt;purpose-built&lt;/em&gt; machine?  &lt;strong&gt;I thought one might be able to get away with a setup similar to &lt;a title="Active Disks for Large Scale Data Processing (IEEE Computer)" href="http://www.pdl.cmu.edu/PDL-FTP/Active/activedisks01.pdf"&gt;active disks&lt;/a&gt;&lt;/strong&gt;: a really cheap and power-efficient Mini-ITX board, attached to a couple of moderately priced drives.  For example, see &lt;a title="PetaBox GB1000 (Capricorn Tech)" href="http://www.capricorn-tech.com/gb1000.php"&gt;this configuration&lt;/a&gt;, which was once used in the &lt;a title="Internet Archive: WayBack Machine" href="http://www.archive.org/web/web.php"&gt;Wayback Machine&lt;/a&gt; (I just found out that the VIA-based models have apparently been withdrawn, but the pages are still there for now). &lt;strong&gt;Judging from the measurements above, this does not seem to be the case: on this kind of workload, such a board would leave its drives mostly idle.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The blades may be ridiculously expensive, perhaps even a complete waste of money for a moderately tech-savvy person. However, you can&amp;#8217;t just throw together any old motherboard and hard disk, and magically turn them into a &amp;#8220;supercomputer.&amp;#8221; This is common sense, but some of the hype might have you believe the opposite.&lt;/p&gt;
&lt;h3&gt;Performance on smaller datasets&lt;/h3&gt;
&lt;p&gt;Once the original, raw data is processed, the representation of the features relevant to the analysis task typically occupies much less space.  In this case, a bipartite graph extracted from the same 350GB text logs (the details don&amp;#8217;t really matter for this discussion) takes up about 3GB, or two orders of magnitude less space.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2008/07/hadoop-cocluster-scalability.png"&gt;&lt;img class="size-full wp-image-22" title="hadoop-cocluster-scalability" src="http://www.bitquill.net/blog/wp-content/uploads/2008/07/hadoop-cocluster-scalability.png" alt="Scalability: coclustering iteration" width="512" height="310" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The graph shows aggregate throughput for one iteration of an algorithm similar to &lt;em&gt;k&lt;/em&gt;-means clustering.  This is fundamentally very similar to computing a simple histogram.  In both cases, the output size is very small compared to the input size: the histogram has size proportional to the number of distinct values, whereas the cluster centers occupy space proportional to &lt;em&gt;k&lt;/em&gt;.  Furthermore, both computations iterate over the entire dataset and perform a hash-based group-by aggregation.  For &lt;em&gt;k&lt;/em&gt;-means, each point is &amp;#8220;hashed&amp;#8221; to its closest cluster center, and the aggregation is a vector sum (plus a count, so that each new center can be computed as a mean).&lt;/p&gt;
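&lt;p&gt;To make the analogy concrete, here is a minimal Python sketch of one plain &lt;em&gt;k&lt;/em&gt;-means iteration written as map (assign each point to its closest center), group-by (accumulate a vector sum and a count per center) and reduce (divide to get the new centers). It is ordinary &lt;em&gt;k&lt;/em&gt;-means with Euclidean distance running in a single process, not the co-clustering algorithm actually benchmarked:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math
from collections import defaultdict


def closest_center(point, centers):
    """Map-side key: index of the nearest center by Euclidean distance."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans_iteration(points, centers):
    """One k-means step: map (assign), group-by (sum and count), reduce (average)."""
    dim = len(centers[0])
    sums = defaultdict(lambda: [0.0] * dim)   # per-center vector sum
    counts = defaultdict(int)                 # per-center number of points
    for p in points:
        j = closest_center(p, centers)        # the "hash" of the group-by
        sums[j] = [s + x for s, x in zip(sums[j], p)]
        counts[j] += 1
    # Reduce: new center = mean of its points; an empty cluster keeps its old center.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
        for j in range(len(centers))
    ]


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans_iteration(points, [(0, 0), (10, 10)]))  # [[0.0, 0.5], [10.0, 10.5]]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In a MapReduce version, the current centers would be shipped to every mapper, and a combiner could pre-sum vectors per split, much like partial counts in a histogram; the output is only &lt;em&gt;k&lt;/em&gt; vectors regardless of input size.&lt;/p&gt;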
&lt;p&gt;Nothing much to say here, except that the linear scaleup tapers off after about 10-15 nodes, essentially due to lack of data: the fixed per-split overheads start dominating the total processing time. Hadoop is &lt;a title="Yahoo! Launches World's Largest Hadoop Production Application" href="http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html"&gt;not really built to process datasets of modest size&lt;/a&gt;, but fundamentally &lt;a title="The Phoenix System for MapReduce Programming" href="http://csl.stanford.edu/~christos/sw/phoenix/"&gt;I see nothing to prevent MapReduce from doing so&lt;/a&gt;.  More importantly, &lt;strong&gt;when the dataset becomes &lt;em&gt;really&lt;/em&gt; huge, I would expect Hadoop to scale almost-linearly with more nodes.&lt;/strong&gt;&lt;/p&gt;
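&lt;p&gt;Why does scaleup stop? A toy cost model helps: map tasks run in waves of at most one task per available slot, and every task pays a fixed startup overhead no matter how little data it reads. The 3GB input size comes from the text; everything else below (256MB splits, one slot per node, 10 seconds of overhead per task, 16MB/s per slot) is hypothetical, and the model ignores job startup, the shuffle and the reduce phase:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def job_seconds(dataset_mb, split_mb, nodes, slots_per_node, overhead_s, mb_per_s_per_slot):
    """Toy model: ceil(splits / slots) waves, each costing overhead + split scan time."""
    splits = math.ceil(dataset_mb / split_mb)
    waves = math.ceil(splits / (nodes * slots_per_node))
    return waves * (overhead_s + split_mb / mb_per_s_per_slot)


# Hypothetical parameters: 3GB input, 256MB splits (12 splits), 1 slot per node,
# 10 s fixed overhead per task, 16 MB/s scan rate per slot.
for nodes in (1, 2, 4, 8, 12, 16, 24, 39):
    seconds = job_seconds(3000, 256, nodes, 1, 10, 16)
    print(f"{nodes:2d} nodes: {seconds:6.1f} s, {3000 / seconds:6.1f} MB/s aggregate")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this model, throughput stops improving once every split has its own slot (12 nodes with these numbers). Beyond that point extra nodes sit idle; smaller splits would add parallelism but pay the fixed overhead more often, which is why a much larger dataset is what restores near-linear scaling.&lt;/p&gt;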
&lt;p&gt;Hadoop can clearly help pre-process the raw data quickly. Once the relevant features are extracted, they may occupy at least an order of magnitude less space. &lt;strong&gt;It &lt;em&gt;may&lt;/em&gt; be possible to get away with single-node processing on the &lt;em&gt;appropriate&lt;/em&gt; representation of the features&lt;/strong&gt;, at least for exploratory tasks.  &lt;a title="Scaling up all-pairs similarity search - WWW 2007" href="http://doi.acm.org/10.1145/1242572.1242591"&gt;Sometimes&lt;/a&gt; it may even be better to use a centralized approach.&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;My focus is on exploratory analysis of large datasets, which is a prerequisite for the design of mining algorithms.  Such tasks typically involve (i) raw data pre-processing and feature extraction stages, and (ii) model building and testing stages. Distributed data processing platforms and, in particular, Hadoop are well-suited for such tasks, especially the feature extraction stages.  In fact, tools such as &lt;a title="Interpreting the Data: Parallel Analysis with Sawzall" href="http://research.google.com/archive/sawzall-sciprog.pdf"&gt;Sawzall&lt;/a&gt; (which is akin to AWK, but on top of Google&amp;#8217;s MapReduce and protocol buffers) excel at the feature extraction and summarization stages.&lt;/p&gt;
&lt;p&gt;The original, raw data may reside in a traditional database, but more often than not they don&amp;#8217;t: packet traces, event logs, web crawls, email corpora, sales data, issue-tracking ticket logs, and so on.  Hadoop is especially well-suited for &amp;#8220;harvesting&amp;#8221; those features out of the original data. In its present form, it can also help in model building stages, if the dataset is &lt;em&gt;really&lt;/em&gt; large.&lt;/p&gt;
&lt;p&gt;In addition to reducing processing time, Hadoop is also quite easy to use. My experience is that the programming effort compares very favorably to the usual approach of writing my own, quick Python scripts for data pre-processing.  Furthermore, there are ongoing efforts for even further simplification (e.g., &lt;a title="Cascading website" href="http://www.cascading.org/"&gt;Cascading&lt;/a&gt; and &lt;a title="Pig homepage" href="http://wiki.apache.org/pig/"&gt;Pig&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;I was somewhat surprised by the CPU vs. I/O trade-offs for what I would consider real-world data processing tasks.  Perhaps I was also influenced by the original work on &lt;a title="Active Disks for Large Scale Data Processing (IEEE Computer)" href="http://www.pdl.cmu.edu/PDL-FTP/Active/activedisks01.pdf"&gt;active disks&lt;/a&gt; (one of the inspirations for MapReduce), which suggested using the disk controller to process data.  However, there is a cross-over point for the performance of active disks versus centralized processing; I was way off with my initial guess on how much CPU power it takes for a reasonably low cross-over point (which is workload-dependent, of course, and &lt;strong&gt;any results herein should be treated as indicative and not conclusive&lt;/strong&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Footnote:&lt;/strong&gt; For what it&amp;#8217;s worth, I&amp;#8217;ve put up &lt;a href="http://www.bitquill.net/trac/browser/trunk/pcc"&gt;some of the code&lt;/a&gt; (and hope to &lt;a href="http://www.bitquill.net/trac/wiki/PCC/Start"&gt;document&lt;/a&gt; it sometime). Also, thanks to &lt;a title="Stavros Harizopoulos's homepage" href="http://nms.csail.mit.edu/~stavros/"&gt;Stavros Harizopoulos&lt;/a&gt; for pointing out the simple cycles-per-byte metric.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/gwTFVl99byo" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=17#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=17" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[Bad planning]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=50" />
		<id>http://www.bitquill.net/blog/?p=50</id>
		<updated>2009-03-20T16:14:42Z</updated>
		<published>2008-09-04T19:42:31Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Life bits" /><category scheme="http://www.bitquill.net/blog" term="Pointless" /><category scheme="http://www.bitquill.net/blog" term="Humor" />		<summary type="html"><![CDATA[Some visions do not really translate into plans.  Becoming rich, established or happy, for example.  It&#8217;s like saying &#8220;I want to be a superhero!&#8221; How do you go about that?

Nope.  Not much of a plan.
]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=50">&lt;p&gt;Some visions do not really translate into plans.  Becoming rich, established or happy, for example.  It&amp;#8217;s like saying &amp;#8220;I want to be a superhero!&amp;#8221; How do you go about that?&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2008/09/how-to-be-spiderman.png"&gt;&lt;img class="size-full wp-image-51" title="How to be Spiderman" src="http://www.bitquill.net/blog/wp-content/uploads/2008/09/how-to-be-spiderman.png" alt="How to become Spiderman" width="455" height="333" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nope.  Not much of a plan.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/6o7bVeLuNXc" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=50#comments" thr:count="1" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=50" thr:count="1" />
		<thr:total>1</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[&#8220;Beyond Relational Databases&#8221;]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=32" />
		<id>http://www.bitquill.net/blog/?p=32</id>
		<updated>2008-11-22T17:22:04Z</updated>
		<published>2008-08-21T16:20:59Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Sci &amp; Tech" /><category scheme="http://www.bitquill.net/blog" term="Cloud computing" /><category scheme="http://www.bitquill.net/blog" term="Commentary" /><category scheme="http://www.bitquill.net/blog" term="Computer Science" /><category scheme="http://www.bitquill.net/blog" term="Data management" /><category scheme="http://www.bitquill.net/blog" term="Development" /><category scheme="http://www.bitquill.net/blog" term="Distributed" /><category scheme="http://www.bitquill.net/blog" term="Hadoop" /><category scheme="http://www.bitquill.net/blog" term="MapReduce" /><category scheme="http://www.bitquill.net/blog" term="Opinion" />		<summary type="html"><![CDATA[The article &#8220;Beyond Relational Databases&#8221; by Margo Seltzer in the July 2008 issue of CACM claims that &#8220;there is more to data access than SQL.&#8221;  Although this is a fairly obvious statement, the article is well-written and worth a read.  The main message is simple: bundling data storage, indexing, query execution, transaction control, and logging [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=32">&lt;p&gt;The article &amp;#8220;&lt;a title="Beyond Relational Databases (CACM)" href="http://doi.acm.org/10.1145/1364782.1364797"&gt;Beyond Relational Databases&lt;/a&gt;&amp;#8221; by &lt;a title="Margo Seltzer's homepage" href="http://www.eecs.harvard.edu/~margo/"&gt;Margo Seltzer&lt;/a&gt; in the July 2008 issue of CACM claims that &amp;#8220;there is more to data access than SQL.&amp;#8221;  Although this is a fairly obvious statement, the article is well-written and worth a read.  The main message is simple: bundling data storage, indexing, query execution, transaction control, and logging components into a monolithic system and wrapping them with a veneer of SQL is not the best solution to all data management problems. Consequently, the author makes a call for solutions based on a modular approach, using open components.&lt;strong&gt; &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;However, the article offers no concrete examples at all, so I&amp;#8217;ll venture a suggestion.&lt;/strong&gt; In a growing open source ecosystem of scalable, fault-tolerant, distributed data processing and management components, &lt;a title="MapReduce: Simplified Processing on Large Clusters (OSDI 2004)" href="http://labs.google.com/papers/mapreduce.html"&gt;MapReduce&lt;/a&gt; is emerging as a predominant elementary abstraction for distributed execution of a large class of data-intensive processing tasks. It has attracted a lot of attention, becoming both a source of &lt;a title="Pig (Apache Incubator)" href="http://incubator.apache.org/pig/"&gt;inspiration&lt;/a&gt; and a target of &lt;a title="MapReduce: A Major Step Backwards" href="http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html"&gt;polemic&lt;/a&gt; from prominent database researchers.&lt;/p&gt;
&lt;p&gt;In database terminology, &lt;strong&gt;MapReduce is an execution engine, largely unconcerned about data models and storage schemes&lt;/strong&gt;.  In the simplest case, data reside on a distributed file system (e.g., &lt;a title="The Google Filesystem (SOSP 2003)" href="http://labs.google.com/papers/gfs.html"&gt;GFS&lt;/a&gt;, &lt;a title="Hadoop Distributed Filesystem" href="http://hadoop.apache.org/core/docs/current/hdfs_design.html"&gt;HDFS&lt;/a&gt;, or &lt;a title="Kosmos Distributed Filesystem" href="http://hadoop.apache.org/core/docs/current/hdfs_design.html"&gt;KFS&lt;/a&gt;) but nothing prevents pulling data from a large data store like &lt;a title="BigTable (OSDI 2006)" href="http://labs.google.com/papers/bigtable.html"&gt;BigTable&lt;/a&gt; (or &lt;a title="HBase" href="http://hadoop.apache.org/hbase/"&gt;HBase&lt;/a&gt;, or &lt;a title="Hypertable" href="http://www.hypertable.org/"&gt;Hypertable&lt;/a&gt;), or any other storage engine, as long as it&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Provides data de-clustering and replication across many machines, and&lt;/li&gt;
&lt;li&gt;Allows computations to execute on local copies of the data.&lt;/li&gt;
&lt;/ul&gt;
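&lt;p&gt;The second requirement is what lets the scheduler move computation to the data rather than the other way around. A minimal Python sketch of greedy, locality-aware assignment, similar in spirit to what a MapReduce scheduler attempts but not Hadoop&amp;#8217;s actual algorithm; the function name and the split and host identifiers are hypothetical:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def assign_splits(split_replicas, nodes):
    """Assign every split to a node, preferring nodes that hold a local replica.

    split_replicas: dict mapping split id -> set of hosts storing a replica.
    nodes: list of worker hosts; each takes one split per round.
    Returns dict mapping split id -> (node, is_local).
    """
    if not nodes:
        raise ValueError("need at least one worker node")
    pending = dict(split_replicas)
    assignment = {}
    while pending:
        for node in nodes:
            if not pending:
                break
            # Prefer a split this node can read from its own disk.
            local = next((s for s, hosts in pending.items() if node in hosts), None)
            # Otherwise fall back to any remaining split (a remote read).
            split = local if local is not None else next(iter(pending))
            assignment[split] = (node, local is not None)
            del pending[split]
    return assignment


replicas = {"split-0": {"a", "b"}, "split-1": {"b", "c"}, "split-2": {"c", "a"}}
print(assign_splits(replicas, ["a", "b", "c"]))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because each split is replicated on two of the three hosts in this example, every split gets a data-local assignment; with fewer replicas or more skewed placement, some tasks would fall back to remote reads.&lt;/p&gt;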
&lt;p&gt;Arguably, &lt;strong&gt;MapReduce is powerful both for the features it provides and for the features it &lt;em&gt;omits&lt;/em&gt;&lt;/strong&gt;: leaving them out yields a clean and simple programming abstraction, which in turn facilitates usability, &lt;a title="Apache Hadoop Wins Terabyte Sort Benchmark" href="http://developer.yahoo.com/blogs/hadoop/2008/07/apache_hadoop_wins_terabyte_sort_benchmark.html"&gt;efficiency&lt;/a&gt; and fault-tolerance.&lt;/p&gt;
&lt;p&gt;Most of the fundamental ideas for distributed data processing are not new.  For example, a researcher involved in some of the projects mentioned once said, with notable openness and directness, that &amp;#8220;people think there is something new in all this; there isn&amp;#8217;t, it&amp;#8217;s all &lt;a title="The Gamma Database Machine Project (IEEE TKDE)" href="http://dx.doi.org/10.1109/69.50905"&gt;Gamma&lt;/a&gt;,&amp;#8221; and he&amp;#8217;s probably right.  None of the &lt;a title="The Google Filesystem (SOSP 2003)" href="http://labs.google.com/papers/gfs.html"&gt;original&lt;/a&gt; &lt;a title="MapReduce: Simplified Processing on Large Clusters (OSDI 2004)" href="http://labs.google.com/papers/mapreduce.html"&gt;Google&lt;/a&gt; &lt;a title="BigTable (OSDI 2006)" href="http://labs.google.com/papers/bigtable.html"&gt;papers&lt;/a&gt; claims any fundamental discoveries.  Focusing on &amp;#8220;academic novelty&amp;#8221; (whatever that may mean) is irrelevant.  Similarly, most of the other criticisms in the irresponsibly written and oft (mis)quoted &lt;a title="MapReduce: A Major Step Backwards" href="http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html"&gt;blog post&lt;/a&gt; and &lt;a title="MapReduce II" href="http://www.databasecolumn.com/2008/01/mapreduce-continued.html"&gt;its follow-up&lt;/a&gt; miss the point.  &lt;strong&gt;The big thing about the technologies mentioned in this post is, in fact, their promise to materialize Margo Seltzer&amp;#8217;s vision&lt;/strong&gt; on clusters of commodity hardware.&lt;/p&gt;
&lt;p&gt;Michael Stonebraker and David DeWitt do have a valid point: we should &lt;em&gt;not&lt;/em&gt; fixate on MapReduce; greater things are happening. &lt;strong&gt;So, if we are indeed witnessing the emergence of an open ecosystem for scalable, distributed data processing, what might be the other key components?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Data types:&lt;/strong&gt; In database speak, these are known as &amp;#8220;schemas.&amp;#8221; At Google, &lt;a title="Protobuf (Google Code)" href="http://code.google.com/p/protobuf/"&gt;protocol buffers&lt;/a&gt; are the underlying API for data storage and exchange.  This is also nothing radically new; in essence, it is a &lt;a title="XML Binary Characterization (W3C)" href="http://www.w3.org/XML/Binary/"&gt;binary XML&lt;/a&gt; representation, somewhere between the simple &lt;a title="An Evaluation of Binary XML Encoding Optimizations for Fast Stream Based XML Processing (ACM DL)" href="http://doi.acm.org/10.1145/988672.988719"&gt;XTalk&lt;/a&gt; protocol which underpins &lt;a title="Vinci (PDF)" href="http://www.bitquill.net/pdf/comnet02_vinci.pdf"&gt;Vinci&lt;/a&gt; and the &lt;a title="WAP Binary XML Content Format (W3C)" href="http://www.w3.org/TR/wbxml/"&gt;WBXML&lt;/a&gt; tokenized representation (both slightly predating protocol buffers and both now largely defunct).  In fact, if I had to name a major weakness in the open source versions of Google&amp;#8217;s infrastructure (Hadoop, HBase, etc.), it would be the lack of such a common data representation format.  Hadoop has &lt;tt&gt;Writable&lt;/tt&gt;, but that is much too low-level (a data-agnostic, minimalistic abstraction for lightweight, mutable, serializable objects), leading to replication of effort in many projects that rely on Hadoop (such as &lt;a title="Lucene Nutch (Apache)" href="http://lucene.apache.org/nutch/"&gt;Nutch&lt;/a&gt;, Pig, Cascading, and so on).  Interestingly, the &lt;tt&gt;rcc&lt;/tt&gt; record compiler component (which seems to have fallen into disuse) was once called &lt;a title="JIRA on rcc naming" href="https://issues.apache.org/jira/browse/HADOOP-1069"&gt;Jute&lt;/a&gt;, &lt;em&gt;possibly&lt;/em&gt; with grander plans than what came to be.  
So, I was pleasantly surprised when Google &lt;a title="Protocol Buffers (Google Open Source Blog)" href="http://google-opensource.blogspot.com/2008/07/protocol-buffers-googles-data.html"&gt;decided to open-source protocol buffers&lt;/a&gt; a few days ago, although it may now turn out to be too little, too late.&lt;/p&gt;
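&lt;p&gt;To give a flavor of what such a binary representation looks like, here is a minimal Python sketch of how protocol buffers encode a single integer field on the wire: a key (field number times 8, plus wire type 0 for varints) followed by the value as a base-128 varint, seven bits per byte, least significant group first, with the high bit marking that more bytes follow. It handles only non-negative integers and is not the protobuf library&amp;#8217;s API. By contrast, a Hadoop &lt;tt&gt;Writable&lt;/tt&gt; spells out its byte layout in its own hand-written &lt;tt&gt;write()&lt;/tt&gt; and &lt;tt&gt;readFields()&lt;/tt&gt; methods, which is the low-level, per-class approach described above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def encode_varint(value):
    """Base-128 varint: 7 bits per byte, least significant group first.

    Every byte except the last has its high bit (128) set to mean "more follows".
    Sketch only: handles non-negative integers, unlike the full wire format.
    """
    if 0 > value:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        group = value % 128      # the low 7 bits
        value //= 128            # shift right by 7 bits
        if value:
            out.append(group + 128)  # continuation bit set
        else:
            out.append(group)
            return bytes(out)


def encode_varint_field(field_number, value):
    """Key (field_number * 8 + wire type 0 for varint), then the varint value."""
    return encode_varint(field_number * 8 + 0) + encode_varint(value)


print(encode_varint(300).hex())            # ac02
print(encode_varint_field(1, 150).hex())   # 089601
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Small values take a single byte, and field names never appear on the wire, only field numbers; both sides rely on a shared schema to interpret the bytes.&lt;/p&gt;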
&lt;p&gt;&lt;strong&gt;Data access:&lt;/strong&gt; In the beginning there was BigTable, which has recently been followed by HBase and Hypertable.  It started out fairly simple: to quote the original paper, it &amp;#8220;is a sparse, distributed, persistent multidimensional sorted map.&amp;#8221;  It now underlies the datastore of &lt;a title="Google App Engine (Google Code)" href="http://code.google.com/appengine/"&gt;Google App Engine&lt;/a&gt;, which even has support for &lt;a title="Google App Engine - Datastore API - Transactions" href="http://code.google.com/appengine/docs/datastore/transactions.html"&gt;transactions&lt;/a&gt;. HBase, at least as of version 0.1, was relatively immature, but there is a flurry of development and we should expect good things soon, given the Hadoop team&amp;#8217;s excellent track record so far.  While writing this post, I remembered an HBase wish-list item that, although lower priority, I had found interesting: support for scripting languages instead of HQL. It turns out this has already been done (&lt;a title="Replace HQL with an HBase-friendly jirb or jython shell (JIRA)" href="https://issues.apache.org/jira/browse/HBASE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel"&gt;JIRA entry&lt;/a&gt; and &lt;a href="http://wiki.apache.org/hadoop/Hbase/HbaseShell"&gt;wiki&lt;/a&gt; &lt;a href="http://wiki.apache.org/hadoop/Hbase/Shell/Replacement"&gt;entries&lt;/a&gt;).  I am a fan of modern scripting languages and generally skeptical about new special-purpose languages (which is not to say that they don&amp;#8217;t have their place).&lt;/p&gt;
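&lt;p&gt;The data model in that quote is easy to mimic in memory. The Python toy below maps (row, column, timestamp) to a value. It stores only the cells that were written, which makes it sparse. It also keeps keys sorted so that row ranges can be scanned, and it returns the newest version of a cell first. The class &lt;tt&gt;ToySortedTable&lt;/tt&gt; and its methods are hypothetical; they are not the BigTable, HBase or Hypertable API. Distribution and persistence, the hard parts, are left out entirely.&lt;/p&gt;

```python
import bisect


class ToySortedTable:
    """In-memory toy of a sparse, sorted map: (row, column, timestamp) -> value.

    Only cells that were actually written are stored (sparse), and keys are
    kept in sorted order so that ranges of rows can be scanned.
    Nothing here is distributed or persistent.
    """

    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._cells = {}  # (row, column, -timestamp) -> value

    def put(self, row, column, timestamp, value):
        # Store the negated timestamp so the newest version of a cell sorts first.
        key = (row, column, -timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column):
        """Return the newest value of a cell, or None if it was never written."""
        # (row, column) sorts just before every (row, column, -timestamp) key.
        i = bisect.bisect_left(self._keys, (row, column))
        if i != len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for rows in [start_row, end_row)."""
        i = bisect.bisect_left(self._keys, (start_row,))
        for key in self._keys[i:]:
            row, column, neg_ts = key
            if row >= end_row:
                break
            yield row, column, -neg_ts, self._cells[key]
```

&lt;p&gt;Here is a usage example loosely modeled on the Webtable example from the paper. After &lt;tt&gt;put("com.cnn.www", "contents:", 3, ...)&lt;/tt&gt; and &lt;tt&gt;put("com.cnn.www", "contents:", 5, ...)&lt;/tt&gt;, a &lt;tt&gt;get&lt;/tt&gt; on that cell returns the version written at timestamp 5. A &lt;tt&gt;scan("com", "org")&lt;/tt&gt; walks all &lt;tt&gt;com.*&lt;/tt&gt; rows in key order.&lt;/p&gt;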
&lt;p&gt;&lt;strong&gt;Job and schema management:&lt;/strong&gt; &lt;a title="Pig (Yahoo! Research)" href="http://research.yahoo.com/node/90"&gt;Pig&lt;/a&gt;, from the database community, is described as a &lt;a title="Automatic Optimization of Parallel Dataflow Programs (USENIX 2008)" href="http://www.cs.cmu.edu/~olston/publications/usenix08.pdf"&gt;parallel dataflow engine&lt;/a&gt; and employs yet another special-purpose language that &lt;a title="Pig-Latin: A Not-So-Foreign Language for Data Processing (SIGMOD 2008, industrial track)" href="http://www.cs.cmu.edu/~olston/publications/sigmod08.pdf"&gt;tries to look a little like SQL&lt;/a&gt; (but it is no secret that &lt;a title="Chris Olston's blog comment on imperative vs. declarative approach" href="http://www.databasecolumn.com/2008/01/mapreduce-continued.html#comment-849"&gt;it isn&amp;#8217;t&lt;/a&gt;). &lt;a title="Cascading" href="http://www.cascading.org/"&gt;Cascading&lt;/a&gt; has received no attention in the research community, but it merits a closer look. It is based on a &amp;#8220;build system&amp;#8221; metaphor, aiming to be the equivalent of Make or Ant for distributed processing of huge datasets.  Instead of introducing a new language, it provides a clean Java API and also integrates with scripting languages that support functional programming (at the moment, Groovy).  As I have not yet used either Cascading or Pig, I will hold off on further comparisons.  It is worth noting that both projects build upon Hadoop core and do not currently integrate with other components such as HBase. Finally, &lt;a title="Interpreting the Data: Parallel Analysis with Sawzall" href="http://labs.google.com/papers/sawzall.html"&gt;Sawzall&lt;/a&gt; deserves an honorable mention, but I won&amp;#8217;t discuss it further as it is a closed technology.&lt;/p&gt;
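&lt;p&gt;To make &amp;#8220;dataflow program&amp;#8221; concrete, here is a single-machine Python sketch of the kind of pipeline both systems describe: filter, then group, then aggregate, with each stage feeding the next. This is neither Pig nor Cascading code, and the stage helpers are hypothetical. In Pig Latin, the same steps would roughly be FILTER, GROUP and FOREACH/GENERATE with COUNT. What Pig and Cascading add is planning such a chain into map and reduce phases that run on Hadoop.&lt;/p&gt;

```python
from collections import defaultdict
from functools import reduce


def filter_stage(predicate):
    """Keep only the records for which predicate(record) is true (like FILTER)."""
    def stage(records):
        return (r for r in records if predicate(r))
    return stage


def group_stage(field):
    """Group records by the value of one field (like GROUP ... BY)."""
    def stage(records):
        groups = defaultdict(list)
        for r in records:
            groups[r[field]].append(r)
        return groups.items()
    return stage


def count_stage():
    """Map each group key to the number of records in it (like COUNT)."""
    def stage(groups):
        return {key: len(members) for key, members in groups}
    return stage


def pipeline(*stages):
    """Compose stages left to right: the output of each one feeds the next."""
    return lambda records: reduce(lambda data, stage: stage(data), stages, records)
```

&lt;p&gt;For example, &lt;tt&gt;pipeline(filter_stage(lambda r: r["ms"] >= 100), group_stage("url"), count_stage())&lt;/tt&gt; builds a function that counts slow page views per URL. As in a build file, the chain is declared first and only runs when records are passed in.&lt;/p&gt;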
&lt;p&gt;&lt;strong&gt;Indexing:&lt;/strong&gt; Beyond lookups based on row keys in BigTable, general support for indexing is still largely an open topic.  I suspect that IR-style indices, such as those built by &lt;a title="Lucene Java (Apache)" href="http://lucene.apache.org/java/"&gt;Lucene&lt;/a&gt;, have much to offer (something that &lt;a title="Build a Lucene index on top of an HBase table (JIRA)" href="https://issues.apache.org/jira/browse/HBASE-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528212"&gt;has not gone unnoticed&lt;/a&gt;); more on this in another post.&lt;/p&gt;
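&lt;p&gt;The core of an IR-style index is an inverted index. This is a map from each term to the rows (documents) that contain it, so a keyword query becomes a set intersection rather than a scan over every row. The sketch below is plain Python, not Lucene, which adds analyzers, scoring, compressed postings and much more. The function names are hypothetical.&lt;/p&gt;

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(rows):
    """Map each term to the set of row keys whose text contains it."""
    index = defaultdict(set)
    for row_key, text in rows.items():
        for term in tokenize(text):
            index[term].add(row_key)
    return index


def search(index, query):
    """Return the row keys that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result = result.intersection(index.get(term, set()))
    return result
```

&lt;p&gt;Because the index is just another sorted key space (term to row keys), it could in principle be stored in a table alongside the data it indexes. That is roughly the idea behind the JIRA issue linked above.&lt;/p&gt;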
&lt;p&gt;A number of other projects are also worth keeping an eye on, such as &lt;a title="CouchDB (Apache Incubator)" href="http://incubator.apache.org/couchdb/"&gt;CouchDB&lt;/a&gt;, Amazon&amp;#8217;s &lt;a title="Amazon S3" href="http://aws.amazon.com/s3"&gt;S3&lt;/a&gt;, Facebook&amp;#8217;s &lt;a title="Hive as a contrib module (JIRA)" href="https://issues.apache.org/jira/browse/HADOOP-3601"&gt;Hive&lt;/a&gt;, and &lt;a title="JAQL homepage" href="http://www.jaql.org/"&gt;JAQL&lt;/a&gt; (and I&amp;#8217;m sure I&amp;#8217;m missing many more).  All of them, except S3, are open source.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/RNW5_jS5BOc" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=32#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=32" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[A dog&#8217;s life]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=45" />
		<id>http://www.bitquill.net/blog/?p=45</id>
		<updated>2008-08-07T23:44:20Z</updated>
		<published>2008-08-07T23:44:20Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Life bits" /><category scheme="http://www.bitquill.net/blog" term="Pointless" />		<summary type="html"><![CDATA[I recently returned from two weeks in Greece, which included four days in Santorini.  Even the dogs sunbathe, enjoying the scenery.  For the most part, I gladly followed their example.

Upon coming back to New York, somewhat to my surprise, it was the US that felt comparatively dinky.  A few years ago, it used to be [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=45">&lt;p&gt;I recently returned from two weeks in Greece, which included four days in Santorini.  Even the dogs sunbathe, enjoying the scenery.  For the most part, I gladly followed their example.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bitquill.net/blog/wp-content/uploads/2008/08/dog-oia.jpg"&gt;&lt;img class="size-full wp-image-46" title="Dog\'s life in Santorini" src="http://www.bitquill.net/blog/wp-content/uploads/2008/08/dog-oia.jpg" alt="Dog\'s life in Santorini" width="630" height="472" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Upon coming back to New York, somewhat to my surprise, it was the US that felt comparatively dinky.  A few years ago, it used to be the other way around.  However, the contrast between new developments around the 2004 Olympics and New York&amp;#8217;s crumbling infrastructure was at least noticeable this time.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/qyG7prrkjm4" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=45#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=45" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
		<entry>
		<author>
			<name>spapadim</name>
						<uri>http://www.bitquill.net/</uri>
					</author>
		<title type="html"><![CDATA[The Fall of CAPTCHAs - really?]]></title>
		<link rel="alternate" type="text/html" href="http://www.bitquill.net/blog/?p=44" />
		<id>http://www.bitquill.net/blog/?p=44</id>
		<updated>2008-11-22T17:22:28Z</updated>
		<published>2008-07-17T06:56:43Z</published>
		<category scheme="http://www.bitquill.net/blog" term="Pointless" /><category scheme="http://www.bitquill.net/blog" term="Sci &amp; Tech" /><category scheme="http://www.bitquill.net/blog" term="Commentary" /><category scheme="http://www.bitquill.net/blog" term="Computer Science" /><category scheme="http://www.bitquill.net/blog" term="Opinion" /><category scheme="http://www.bitquill.net/blog" term="Web" />		<summary type="html"><![CDATA[I recently saw a Slashdot post dramatically titled &#8220;Fallout From the Fall of CAPTCHAs&#8221;, citing an equally dramatic article about &#8220;How CAPTCHA got trashed&#8221;.  Am I missing something? Ignoring their name for a moment, CAPTCHAs are computer programs, following specific rules, and therefore they are subject to the same cat-and-mouse games that all security mechanisms [...]]]></summary>
		<content type="html" xml:base="http://www.bitquill.net/blog/?p=44">&lt;p&gt;I recently saw a Slashdot post dramatically titled &amp;#8220;&lt;a title="Fallout from the Fall of CAPTCHAs (Slashdot)" href="http://it.slashdot.org/article.pl?sid=08/07/15/2025220"&gt;Fallout From the Fall of CAPTCHAs&lt;/a&gt;&amp;#8221;, citing an equally dramatic article about &amp;#8220;&lt;a title="How CAPTCHA got trashed (Computerworld)" href="http://www.computerworld.com.au/index.php/id;489635775;fp;;fpid;"&gt;How CAPTCHA got trashed&lt;/a&gt;&amp;#8221;.  Am I missing something? Ignoring their name for a moment, &lt;strong&gt;CAPTCHAs are &lt;em&gt;computer&lt;/em&gt; programs, following specific rules, and therefore they are subject to the same cat-and-mouse games that all security mechanisms go through. Where exactly is the surprise?&lt;/strong&gt; So Google&amp;#8217;s or Yahoo&amp;#8217;s current versions were cracked.  They&amp;#8217;ll soon come up with new tricks, and still newer ones after those are cracked, and so on.&lt;/p&gt;
&lt;p&gt;In fact, one aspect of CAPTCHAs has always confused me. &lt;strong&gt;I thought that a &lt;a title="Turing test (Wikipedia)" href="http://en.wikipedia.org/wiki/Turing_test"&gt;Turing test&lt;/a&gt; is, by definition, &lt;em&gt;administered&lt;/em&gt; by a human, so a &amp;#8220;completely automated Turing test&amp;#8221; is an oxymoron, something like a &amp;#8220;liberal conservative&amp;#8221;.&lt;/strong&gt; An unbreakable authentication system based on Turing tests should rely &lt;em&gt;fully&lt;/em&gt; on &lt;a title="Human-based Computation (Wikipedia)" href="http://en.wikipedia.org/wiki/Human-based_computation"&gt;human computation&lt;/a&gt;: humans should also be the ones who generate the tests. Let humans come up with questions, using references to images, web site content, and whatever else they can think of.  Then match these questions to other humans, who can gain access to a web service by solving the riddles. Perhaps the tests should also be rated somehow, lest the simple act of logging in turn into an absurd treasure hunt. I&amp;#8217;m not exactly sure if and how this could be turned into an &lt;a title="The ESP Game" href="http://www.gwap.com/gwap/gamesPreview/espgame/"&gt;addictive game&lt;/a&gt;, but I&amp;#8217;ll leave that to the experts.  The idea is too obvious to have been missed anyway.&lt;/p&gt;
&lt;p&gt;CAPTCHAs, even in their current form, have led to numerous contributions.  A non-exhaustive list, in no particular order:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;They have a catchy name. That counts a lot. Seriously. I&amp;#8217;m not joking; if you don&amp;#8217;t believe me, repeat out loud after me: &amp;#8220;I have no idea what &amp;#8216;onomatopoeia&amp;#8217; is—I&amp;#8217;d better MSN-Live it&amp;#8221; or &amp;#8220;&amp;#8230; I&amp;#8217;d better Yahoo it.&amp;#8221;  Doesn&amp;#8217;t quite work, does it?&lt;/li&gt;
&lt;li&gt;They popularized an idea which, even if &lt;a title="USPTO 6195698: Method for selectively restricting access to computer systems (Google Patent Search)" href="http://www.google.com/patents?id=VncGAAAAEBAJ"&gt;not entirely new&lt;/a&gt;, was made accessible to webmasters the world over, and is now used daily by thousands if not millions of people.  What greater measure of success can you think of for a technology?&lt;/li&gt;
&lt;li&gt;They sowed the seeds for Luis von Ahn&amp;#8217;s &lt;a title="Human Computation (Google Video)" href="http://video.google.com/videoplay?docid=-8246463980976635143"&gt;viral talk&lt;/a&gt; on human computation, which has been given at countless universities, companies and conferences.  Although not professionally designed, the slides&amp;#8217; simplicity matches their content in a Jobs-esque way. As for delivery and timing, Steve might even learn something from this talk (although, in fairness, Steve Jobs probably doesn&amp;#8217;t get the chance to introduce the same product hundreds of times).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So is anyone really surprised that the race for smarter tests and authentication mechanisms has not ended and probably never will? (Incidentally, the lecture video above is from 2006, over three years &lt;em&gt;after&lt;/em&gt; the first CAPTCHAs were &lt;a title="EzGimpy" href="http://www.cs.sfu.ca/~mori/research/gimpy/"&gt;successfully broken&lt;/a&gt; by another computer program; see also the &lt;a title="Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA" href="http://www.cs.sfu.ca/~mori/research/papers/mori_cvpr03.pdf"&gt;CVPR 2003 paper&lt;/a&gt;.)  &lt;strong&gt;There are no silver bullets; no technology is perfect, but some are really useful.&lt;/strong&gt; Perhaps CAPTCHAs are, to some extent, victims of their own hype, which, however, is instrumental and perhaps even necessary for the wide adoption of any useful technology.  I&amp;#8217;m pretty sure we&amp;#8217;ll soon see &lt;a title="Google Patents CAPTCHA Killer?" href="http://www.blahblahtech.com/2008/01/google-patent-captcha-killer.html"&gt;more elaborate tests&lt;/a&gt;, not less elaborate ones.&lt;/p&gt;
&lt;img src="http://feeds.feedburner.com/~r/bitquill-all/~4/zobwRIY975A" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://www.bitquill.net/blog/?p=44#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://www.bitquill.net/blog/?feed=atom&amp;p=44" thr:count="0" />
		<thr:total>0</thr:total>
	</entry>
	</feed>
