<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;CEYAR38zeSp7ImA9WhBaFU0.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555</id><updated>2013-05-26T01:15:46.181+08:00</updated><category term="ibatis" /><category term="flash" /><category term="computer science" /><category term="javascript" /><category term="news" /><category term="silverlight" /><category term="web" /><category term="php" /><category term="tutorial" /><category term="gwt" /><category term="web development" /><category term="hosting" /><category term="methodology" /><category term="parsing" /><category term="open source" /><category term="algorithms" /><category term="Java" /><category term="blog" /><category term="site" /><category term="oracle" /><category term="tables" /><category term="stackoverflow" /><category term="css" /><category term="git" /><category term="ejb" /><category term="sql" /><category term="markdown" /><category term="opinion" /><category term="jpa" /><category term="web 2.0" /><category term="smarty" /><category term="spring" /><category term="browser compatibility" /><category term="html" /><category term="orm" /><category term="lombok" /><category term="windows" /><category term="tdd" /><category term="career" /><category term="performance" /><category term="review" /><category term="usability" /><category term="database" /><category term="google" /><title>C for Coding</title><subtitle type="html" /><link rel="http://schemas.google.com/g/2005#feed" 
type="application/atom+xml" href="http://www.cforcoding.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://www.cforcoding.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>73</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/CForCoding" /><feedburner:info uri="cforcoding" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;CkcNRXkyfSp7ImA9WhJQEEs.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-719267813792001122</id><published>2012-07-23T12:01:00.001+08:00</published><updated>2012-07-24T00:14:54.795+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-07-24T00:14:54.795+08:00</app:edited><title>Social is a Cancer</title><content type="html">&lt;p&gt;Two events recently have set me off on what could well end up being a rant.&lt;/p&gt;  &lt;p&gt;The first is &lt;a href="http://current.com/shows/the-gavin-newsom-show/videos/instagram-zynga-theres-a-lot-of-big-minds-chasing-small-ideas-says-allthingsds-kara-swisher/"&gt;Kara Swisher’s comments&lt;/a&gt; (&lt;a href="http://news.ycombinator.com/item?id=4244709"&gt;HN submission&lt;/a&gt; and &lt;a href="http://news.ycombinator.com/item?id=4244937"&gt;my reply&lt;/a&gt;). 
I firmly believe that the collective brain power and money being spent on social (in particular social “games”) is, well, sad.&lt;/p&gt;  &lt;p&gt;The second is the &lt;a href="http://store.steampowered.com/"&gt;Steam Summer Sale&lt;/a&gt;, an annual PC (and Mac and even Linux now I guess) game markdown event where one can pick up games as little as 2 years old for the price of iOS games. I’ve recently gotten back into playing PC games as a distraction and have bought a bunch of games in the last week.&lt;/p&gt;  &lt;p&gt;Some are old favourites like &lt;a href="http://www.civiv.com/"&gt;Civilization 4&lt;/a&gt; (especially with the superb &lt;a href="http://forums.civfanatics.com/showthread.php?t=171398"&gt;Fall From Heaven 2&lt;/a&gt; fantasy mod) and the excellent &lt;a href="http://www.rockstargames.com/grandtheftauto/"&gt;Grand Theft Auto&lt;/a&gt; series.&lt;/p&gt;  &lt;p&gt;What you notice in playing recent releases (up to 2-3 years) versus older games is one big addition: social. Here is a partial list of what I’ve had to do across various titles recently:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Log onto Ubisoft’s much-hated U-play to play Heroes of Might and Magic 6, a franchise I was once very fond of. Demand probably related to Steam was enough to cause &lt;a href="http://news.ycombinator.com/item?id=4244937"&gt;intermittent Uplay outages&lt;/a&gt; making single player games unplayable. HoMM6 wouldn’t even let me play until I’d logged on to Uplay at least once. Playing it while on Uplay leads to incessant nagging about connecting with friends, sharing my progress and so on; &lt;/li&gt;    &lt;li&gt;Playing GTA IV involves logging onto the Rockstar Social Club and then Windows Game Live, two separate registrations. 
Windows Game Live needed to install an update that it failed to on two successive occasions. It only succeeded at all because I happened to notice a background confirmation box that needed to be OKed; &lt;/li&gt;    &lt;li&gt;Diablo 3 of course only works while online; &lt;/li&gt;    &lt;li&gt;A friend bought Burnout Paradise on Steam but Steam… ran out of keys. Say what now? How is this not simply a case of Steam being able to generate their own keys? Why is this not a service? and&lt;/li&gt;    &lt;li&gt;EA, Origin and Kalypso each had their own “social” centers to log onto before you could play the game. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;I refused to buy Anno 2070 because its social platform/DRM also included limited activations based on hardware changes. Screw that, screw them and screw the horse they rode in on.&lt;/p&gt;  &lt;p&gt;Compare this to GTA 3, Vice City and San Andreas and Civ4, all of which simply &lt;em&gt;worked&lt;/em&gt;. Emancipation.&lt;/p&gt;  &lt;p&gt;Now part of this is the game industry’s obsession with piracy. Making life difficult for consumers who are paying for games is the surest way of all to create software pirates because I &lt;em&gt;guarantee&lt;/em&gt; you that the pirated versions don’t have these problems.&lt;/p&gt;  &lt;p&gt;Who really shares “achievements” (you can’t really call them that) with their friends on 17 different social platforms or shares the news on Facebook or Twitter? Who really wants to? Does this really add anything to the game? Is it in any way, shape or form more likely to sell more copies of the game or keep people playing for longer?&lt;/p&gt;  &lt;p&gt;“Social” is a cancer and it has to stop.&lt;/p&gt;  &lt;p&gt;More accurately this kneejerk obsession with social has to stop. 
Not everything is “social”, and this goes beyond games: don’t ruin your customers’ experience by foisting “social” on them and then nagging them about it when they simply want to play or use your product.&lt;/p&gt;  &lt;p&gt;And don’t get me started on “social” games. They’re simply some combination of the idea of self-expression (“look how I arranged my farm!”) and inciting compulsive behaviour that’s really not much different from being addicted to gambling. I’ve seen iPhone apps where people have clearly spent &lt;em&gt;thousands&lt;/em&gt; of real world dollars. It’s nothing more than an exercise in who spends the most real world money. There is no challenge or end result.&lt;/p&gt;  &lt;p&gt;It’s simply a constant cycle of compulsive behaviour and big data analytics to identify what works best in creating addicting behaviour.&lt;/p&gt;  &lt;p&gt;Social is a cancer. It’s killing the PC as a gaming platform. That makes me sad. It needs to be irradiated, poisoned and excised before it kills the start-up scene too. &lt;a href="http://www.huffingtonpost.com/2012/04/09/instagram-facebook-deal_n_1413256.html"&gt;Instagram is worth more than the New York Times?&lt;/a&gt; &lt;a href="http://news.ycombinator.com/item?id=3891791"&gt;Is greater than the development budget for SpaceX’s orbital launch vehicle?&lt;/a&gt; &lt;em&gt;Give me a break&lt;/em&gt;.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." 
border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/ZZvWSPtVFAY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/719267813792001122/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2012/07/social-is-cancer.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/719267813792001122?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/719267813792001122?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/ZZvWSPtVFAY/social-is-cancer.html" title="Social is a Cancer" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.cforcoding.com/2012/07/social-is-cancer.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D04MRXc8eSp7ImA9WhRWGE0.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-3590347835421261713</id><published>2012-01-06T07:26:00.001+08:00</published><updated>2012-01-06T07:26:24.971+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-06T07:26:24.971+08:00</app:edited><title>Interview Programming Problems Done Right</title><content type="html">&lt;h3&gt;Introduction&lt;/h3&gt;

&lt;p&gt;&lt;a href="http://news.ycombinator.com/item?id=3428984"&gt;Why 37signals Doesn't Hire Programmers Based on Brainteasers&lt;/a&gt; and &lt;a href="http://news.ycombinator.com/item?id=3429466"&gt;my comment on HN&lt;/a&gt; generated a lot of responses, so much so that I'm writing this post to properly explain the essence of a good (IMHO) interview programming problem.&lt;/p&gt;

&lt;h3&gt;Pascal's Triangle&lt;/h3&gt;

&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Pascal's_triangle"&gt;Pascal's Triangle&lt;/a&gt; is a shortcut for getting coefficients most often used &lt;a href="http://en.wikipedia.org/wiki/Binomial_probability"&gt;binomial probability&lt;/a&gt;. The root element is 1. Every other element is the sum of the one or two above it (diagonally left and diagonally right).&lt;/p&gt;

&lt;p&gt;There are several variations of the problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Print out the triangle to a specific row;&lt;/li&gt;
&lt;li&gt;Return a given row of the triangle;&lt;/li&gt;
&lt;li&gt;Return a given element (by row and index) of the triangle.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of them use the same basic logic. You explain this to the interviewee and ask them to solve it on paper or on a whiteboard.&lt;/p&gt;

&lt;h3&gt;Recursive Solution&lt;/h3&gt;

&lt;p&gt;The simplest version is a recursive solution, something like:&lt;/p&gt;

&lt;pre class="brush:python"&gt;
#!/usr/bin/python
def value(row, index):
  if index &lt; 0 or index &gt; row:
    return 0
  if index == 0 or index == row:
    return 1
  return value(row-1, index-1) + value(row-1, index)

def row(n):
  return [value(n, x) for x in xrange(0, n+1)]

for i in xrange(10):
  print row(i)
&lt;/pre&gt;

&lt;p&gt;If a candidate can produce this, it is at least a working solution, even though the performance for non-trivial n is prohibitive. Ideally they would be able to point this out, along with the exponential big-O running time.&lt;/p&gt;

&lt;p&gt;On my MacBook Pro (2010) this runs for n=20 in about 0.8 seconds.&lt;/p&gt;
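
&lt;p&gt;To make the exponential cost concrete, here is a rough sketch (not part of the original solution) that mirrors the recursion above but also counts how many calls it makes. The names &lt;code&gt;count_calls&lt;/code&gt; and &lt;code&gt;row_calls&lt;/code&gt; are made up for illustration, and the code is written to run under either Python 2 or Python 3.&lt;/p&gt;

```python
def count_calls(row, index):
  # Returns a pair: (value of the element, number of calls made to get it).
  # Positions outside the row count as 0 and cost a single call.
  if index not in range(row + 1):
    return 0, 1
  # The edges of the triangle are always 1 and also cost a single call.
  if index == 0 or index == row:
    return 1, 1
  left, left_calls = count_calls(row - 1, index - 1)
  right, right_calls = count_calls(row - 1, index)
  # One call for this element plus everything the two sub-calls needed.
  return left + right, left_calls + right_calls + 1

def row_calls(n):
  # Total calls needed to build row n one element at a time.
  return sum(count_calls(n, i)[1] for i in range(n + 1))
```

&lt;p&gt;Because the recursion only ever adds up 1s, an element with value v costs 2v - 1 calls. Summed over row n that comes to 2&lt;sup&gt;n+1&lt;/sup&gt; - (n+1) calls, or roughly two million for n=20.&lt;/p&gt;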

&lt;h3&gt;Memoization&lt;/h3&gt;

&lt;p&gt;Some candidates will improve this solution by identifying and caching the repeated calculations. Something like this:&lt;/p&gt;

&lt;pre class="brush:python"&gt;
import collections

values = collections.defaultdict(dict)

def value(row, index):
  result = values[row].get(index)
  if result is not None:
    return result
  if index &lt; 0 or index &gt; row:
    return 0
  if index == 0 or index == row:
    return 1
  result = value(row-1, index-1) + value(row-1, index)
  values[row][index] = result
  return result
&lt;/pre&gt;

&lt;p&gt;Real time for n=20 is 0.03 seconds. Bonus points if the candidate correctly names this technique "memoization" (rather than the more generic "caching").&lt;/p&gt;
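
&lt;p&gt;As an aside, on Python 3.2 or later the same idea can be expressed with the standard library's &lt;code&gt;functools.lru_cache&lt;/code&gt; decorator instead of a hand-rolled dictionary. This is a sketch of an alternative rather than something a candidate would be expected to produce on a whiteboard, and &lt;code&gt;cached_value&lt;/code&gt; is a hypothetical name:&lt;/p&gt;

```python
import functools

# maxsize=None lets the cache grow without bound, which matches the
# behaviour of the dictionary-based version.
@functools.lru_cache(maxsize=None)
def cached_value(row, index):
  # Positions outside the row count as 0, as in the recursive version.
  if index not in range(row + 1):
    return 0
  # The edges of the triangle are always 1.
  if index == 0 or index == row:
    return 1
  # Each distinct (row, index) pair is computed once and then reused.
  return cached_value(row - 1, index - 1) + cached_value(row - 1, index)
```

&lt;p&gt;Calling &lt;code&gt;cached_value.cache_info()&lt;/code&gt; afterwards reports the hit and miss counts, which is a quick way to see how much repeated work the cache absorbed.&lt;/p&gt;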

&lt;h3&gt;Iterative Solution&lt;/h3&gt;

&lt;p&gt;A more common optimization is to use an iterative rather than recursive solution. The simplest version of this is something like:&lt;/p&gt;

&lt;pre class="brush:python"&gt;
#!/usr/bin/python
rows = [[1]]

for row in xrange(1, 20):
  values = [1]
  prev = rows[-1]
  for index in xrange(1, row):
    values.append(prev[index-1] + prev[index])
  values.append(1)
  rows.append(values)

for row in rows:
  print row
&lt;/pre&gt;

&lt;p&gt;Performance is similar to the previous one (0.028 seconds). There are lots of subtle variations of this.&lt;/p&gt;

&lt;h3&gt;Dynamic Programming&lt;/h3&gt;

&lt;p&gt;The astute candidate will realize that you don't need to store all the rows. You only ever need the current and the previous rows. This reduces space from O(n&lt;sup&gt;2&lt;/sup&gt;) to O(n).&lt;/p&gt;

&lt;pre class="brush:python"&gt;
#!/usr/bin/python
prev = []
for row in xrange(20):
  curr = [1]
  for index in xrange(1, row):
    curr.append(prev[index-1] + prev[index])
  # Row 0 is just [1]; every later row also ends with a 1.
  if row &gt; 0:
    curr.append(1)
  print curr
  prev = curr
&lt;/pre&gt;

&lt;p&gt;This runs in 0.027 seconds.&lt;/p&gt;

&lt;h3&gt;Bonus Points&lt;/h3&gt;

&lt;p&gt;Assuming f(r, i) returns the value for row r and index i, a candidate may well point out that the triangle is symmetrical, specifically that f(r, i) == f(r, r-i), meaning you only have to calculate at most half of the triangle. This is particularly relevant if they are asked to return a specific value (e.g. f(117, 116) == f(117, 1) == 117).&lt;/p&gt;

&lt;p&gt;The second optimization one could make is that since f(r, i) == f(r-1, i-1) + f(r-1, i), to calculate f(r, i) you only need to calculate up to the i'th element of each row.&lt;/p&gt;
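
&lt;p&gt;Here is one way those two observations might be combined into a single-element lookup. Treat it as a sketch under my own assumptions rather than a reference answer: the name &lt;code&gt;element&lt;/code&gt; is made up, out-of-range positions are assumed to be 0 as in the recursive version, and the code should run under both Python 2 and Python 3.&lt;/p&gt;

```python
def element(row, index):
  # Positions outside the row count as 0, as in the recursive version.
  if index not in range(row + 1):
    return 0
  # Symmetry: f(r, i) == f(r, r-i), so work with the smaller of the two.
  k = min(index, row - index)
  # curr[j] holds f(r, j) for j = 0..k only. Row 0 is a 1 followed by 0s.
  curr = [1] + [0] * k
  for r in range(1, row + 1):
    # Go right to left so curr[j-1] is still the previous row's value.
    for j in range(k, 0, -1):
      curr[j] += curr[j - 1]
  return curr[k]
```

&lt;p&gt;With this, f(117, 116) only ever touches a two-element list. Space is proportional to the smaller of i and r-i, and the work is roughly r times that.&lt;/p&gt;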

&lt;h3&gt;Why is this a good question?&lt;/h3&gt;

&lt;p&gt;As demonstrated, the code solutions are short. They're longer in, say, C, C++ or Java than in Python, but not that much longer. The point here isn't to get a perfect solution from the interviewee (meaning that deducting points for a missing colon or a typo would be silly). The purpose of the exercise is to demonstrate that they can turn a relatively simple algorithm into code and that they have the thought process to do so. In doing so, their familiarity with their chosen language should be obvious.&lt;/p&gt;

&lt;p&gt;Also, there are several degrees of solutions. Any solution trumps no solution but the quality of the solution will give you a useful signal (IMHO).&lt;/p&gt;

&lt;h3&gt;What should a coding test tell you?&lt;/h3&gt;

&lt;p&gt;A coding test like this is a negative filter. Assuming your chosen problem is sufficiently simple, if an interviewee can't turn it into at least the outline of code in a reasonable length of time then that's a red flag. If they can do it in record time it doesn't mean they're a rock star. You're just trying to weed out candidates who should've already been screened out.&lt;/p&gt;

&lt;p&gt;What's a reasonable length of time? IMHO 5 minutes or less is fast (arguably blazing fast). Anything under 10 minutes is fine. If someone is taking more than say 15-20 minutes, that is possibly cause for concern.&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;The key qualities in a coding test are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It needs to be relatively simple. If it takes more than about 20-30 lines of Python to solve it's probably too complex;&lt;/li&gt;
&lt;li&gt;It needs to be easy to explain. If someone doesn't know what Pascal's Triangle is, that's not a problem. Explain it; and&lt;/li&gt;
&lt;li&gt;The goal is to put thought into code. That code doesn't need to be perfect. It just needs to be sufficiently expressive.&lt;/li&gt;
&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/ahSIPkRcc_U" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/3590347835421261713/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2012/01/interview-programming-problems-done.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3590347835421261713?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3590347835421261713?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/ahSIPkRcc_U/interview-programming-problems-done.html" title="Interview Programming Problems Done Right" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.cforcoding.com/2012/01/interview-programming-problems-done.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0UHQnk4cCp7ImA9Wx5WGEs.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-1517988092026765287</id><published>2010-10-01T00:47:00.001+08:00</published><updated>2010-10-01T00:47:13.738+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-10-01T00:47:13.738+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" 
term="news" /><title>Did Michael Arrington Screw Jason Calacanis and the State of California?</title><content type="html">&lt;p&gt;This week lawyer turned blogger &lt;a href="http://online.wsj.com/article/SB10001424052748703882404575519831320838198.html"&gt;Michael Arrington sold the TechCrunch network of sites to AOL&lt;/a&gt; for an undisclosed sum. That sum has been speculated to be between $25 and $40 million.&lt;/p&gt;  &lt;p&gt;The timeline of events suggests a not-so-pretty picture to this saga. But first some background.&lt;/p&gt;  &lt;h3&gt;The Crunchpad&lt;/h3&gt;  &lt;p&gt;The Crunchpad was a project to create a &lt;a href="http://techcrunch.com/2008/07/21/we-want-a-dead-simple-web-tablet-help-us-build-it/"&gt;$200 tablet started by Arrington in 2008&lt;/a&gt;. By June 2009 there had been &lt;a href="http://techcrunch.com/2009/06/03/crunchpad-the-launch-prototype/"&gt;several prototypes&lt;/a&gt; built in partnership with Fusion Garage. In November 2009, the &lt;a href="http://techcrunch.com/2009/11/30/crunchpad-end/"&gt;Crunchpad was dead&lt;/a&gt; as Fusion Garage had announced they were going it alone. In December 2009, Arrington &lt;a href="http://techcrunch.com/2009/12/04/crunchpad-litigation/"&gt;filed a lawsuit against Fusion Garage&lt;/a&gt; from which not much was heard until a &lt;a href="http://techcrunch.com/2010/09/15/status-of-crunchpad-litigation/"&gt;September 2010 update&lt;/a&gt; that included some fairly damning communications from discovery.&lt;/p&gt;  &lt;p&gt;I, like many, felt that Arrington had been screwed on this one. It was clearly his idea and Fusion Garage had seemingly seriously misjudged both the market for their “JooJoo” and Arrington’s resolve to see justice done beyond any sense of any possible restitution.&lt;/p&gt;  &lt;p&gt;No formal contract seems to exist between the two parties, which struck many as strange given that Arrington is a lawyer. 
Fusion Garage seems to have relied upon this fact too, not fully understanding or appreciating that a partnership can be inferred from one’s actions.&lt;/p&gt;  &lt;h3&gt;TechCrunch Conference&lt;/h3&gt;  &lt;p&gt;In January 2007, Arrington &lt;a href="http://techcrunch.com/2007/01/31/the-techcrunch20-conference/"&gt;announced the TechCrunch 20 conference&lt;/a&gt; as a joint venture between himself and Jason Calacanis, founder of the Silicon Alley Reporter, Weblogs Inc and Mahalo. TC20 (later TC40 and TC50) was a conference for startups held in San Francisco annually that launched some highly successful companies including Mint.com and Yammer.&lt;/p&gt;  &lt;p&gt;In May 2010, &lt;a href="http://latimesblogs.latimes.com/technology/2010/05/rip-techcrunch50-as-founders-part-ways.html"&gt;Arrington and Calacanis parted ways on TC50&lt;/a&gt;. Arrington had earlier (March) &lt;a href="http://techcrunch.com/2010/03/01/techcrunch-disrupt-ny-2010/"&gt;launched TechCrunch Disrupt&lt;/a&gt;, his own conference. Soon after the split, Calacanis went on to &lt;a href="http://calacanis.com/2010/05/11/best-launch-conference-story-yet/"&gt;launch the Launch Conference&lt;/a&gt;.&lt;/p&gt;  &lt;h3&gt;Covering the Valley… from Seattle?!&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://techcrunch.com/"&gt;TechCrunch&lt;/a&gt; is either a blog or a news organization (depending on whom you ask) that covers startups and entrepreneurs, primarily in the internet space. 
The Mecca for such companies is of course Silicon Valley, although there are nascent startup scenes in Boulder (eg TechStars), New York, Austin, Boston and other places.&lt;/p&gt;  &lt;p&gt;On May 3, 2010, &lt;a href="http://www.google.com/search?sourceid=chrome&amp;amp;ie=UTF-8&amp;amp;q=arrington+moves+seattle"&gt;Arrington announced a move to Seattle&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;My plan is to roughly split my time between Seattle and Silicon Valley … Seattle is sort of like the minor leagues of the startup world, … But to be honest the biggest reason I’ve moved is to simply mix things up in my life. Like many people I tend to get bored if I stay in one place too long – five years is the longest I’ve lived anywhere since high school. It was time for a change.&lt;/p&gt; &lt;/blockquote&gt;  &lt;h3&gt;The Tax Man&lt;/h3&gt;  &lt;p&gt;California and Washington have one important difference: Washington has no state income tax. &lt;a href="http://en.wikipedia.org/wiki/State_income_tax"&gt;California’s marginal income tax rate is 10.3% (over $1 million)&lt;/a&gt;. Why does this matter? Capital gains, such as the sale of a company, depending on the state are added to gross income.&lt;/p&gt;  &lt;p&gt;&lt;em&gt;That could be as much as $4 million!&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;There are however some problems. From &lt;a href="http://klapachlaw.com/blog/38-february-13-2010-.html"&gt;Do you want to avoid California residency?&lt;/a&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;(1) California residents pays California tax on all their income.&lt;/p&gt;    &lt;p&gt;(2) California generally taxes California-source income, …&lt;/p&gt;    &lt;p&gt;So, who is a resident? In determining residency, California law provides two presumptions. The first presumption is that a taxpayer who, in the aggregate, spends more than 9 months of a taxable year in California will be presumed to be a California resident. 
The second presumption is that an individual whose presence in California does not exceed 6 months within a taxable year and who maintains a permanent home outside California is not considered a California resident provided the taxpayer does not engage in any activity or conduct within the State other than as a seasonal visitor, tourist, or guest.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Permanent residence in Washington? Check. Splitting time between Seattle and the Valley? Check.&lt;/p&gt;  &lt;p&gt;But there is no hard and fast rule for what qualifies you as a California resident. It is a subjective test based on factors such as the location of your permanent residence, the location of your family, where you are registered to vote, how much time you spent in California and so on.&lt;/p&gt;  &lt;p&gt;Now there’s nothing wrong with minimizing one’s tax liability. The late Australian media magnate &lt;a href="http://en.wikipedia.org/wiki/Kerry_Packer#Government_inquiry_and_legal_challenges"&gt;Kerry Packer went so far as to say to a 1991 government inquiry&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Of course I am minimising my tax. And if anybody in this country doesn't minimise their tax, they want their heads read, because as a government, I can tell you you're not spending it that well that we should be donating extra!&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The interesting part is that the move implies &lt;em&gt;intent&lt;/em&gt;. If the California FTB (Franchise Tax Board) argues Arrington moved simply to avoid tax on a business that is clearly associated with, was created in and operates within the state of California, then tax is due. 
One could go so far as to argue the ethics of paying what’s owed.&lt;/p&gt;  &lt;p&gt;But if you ignore that argument and instead argue that the May move to Seattle indicated Arrington’s intent to sell to AOL (rather than simply being a happy coincidence that he moved to an income tax free state 4 months before receiving $30 million), it puts subsequent events into a whole different context.&lt;/p&gt;  &lt;h3&gt;AOL Buys TechCrunch&lt;/h3&gt;  &lt;p&gt;On 29 September 2010, &lt;a href="http://www.guardian.co.uk/media/2010/sep/29/aol-buys-techcrunch"&gt;AOL Bought TechCrunch&lt;/a&gt;. No dollar amount for the sale has been disclosed. Rumours of $25 to $40 million abound. My personal opinion is that it was in the region of $30-35 million with a substantial ($10m+) earn out over 3+ years. The basis for this opinion is, in &lt;a href="http://techcrunch.com/2010/09/28/why-we-sold-techcrunch-to-aol-and-where-we-go-from-here/"&gt;Arrington's own words&lt;/a&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;So we begin another journey. I fully intend to stay with AOL for a very, very long time. &lt;strong&gt;&lt;em&gt;And the entire team has big incentives to stay on board for at least three years.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;(emphasis added)&lt;/p&gt;  &lt;p&gt;Plus earn-outs on an acquisition are common practice.&lt;/p&gt;  &lt;p&gt;Where this gets &lt;em&gt;really&lt;/em&gt; interesting is that according to Business Insider, &lt;a href="http://www.businessinsider.com/aol-tried-to-buy-techcrunch-twice-before-2010-9"&gt;AOL Tried To Buy TechCrunch Twice Before&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;During the second run, AOL wanted to buy TechCrunch for something a little under $25 million. TechCrunch, in negotiations led by CEO Heather Harde, wanted closer to $30 million. 
&lt;strong&gt;&lt;em&gt;AOL wouldn't go there because it thought too much of TechCrunch's revenues were from a conference business it didn't entirely own itself.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;(emphasis added)&lt;/p&gt;  &lt;p&gt;Tim Armstrong has been &lt;a href="http://en.wikipedia.org/wiki/Tim_Armstrong_(executive)"&gt;CEO of AOL since March 12, 2009&lt;/a&gt;. With TechCrunch Disrupt launched in March 2010 and Armstrong purportedly having made at least one other run at TechCrunch, it’s plausible that this move had been planned six months ago.&lt;/p&gt;  &lt;h3&gt;Jason Calacanis&lt;/h3&gt;  &lt;p&gt;Calacanis is an incendiary figure, deliberately so. As such he has many detractors. He also has a large media exit to AOL (Weblogs Inc) under his belt. When the sale announcement was imminent, &lt;a href="http://www.businessinsider.com/calacanis-arrington-techcrunch-aol-2010-9"&gt;Jason Calacanis Shreds Mike Arrington, Calls Him &amp;quot;A Trainwreck,&amp;quot; &amp;quot;A Liability,&amp;quot; And &amp;quot;A Sociopath&amp;quot;&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;@&lt;a href="http://twitter.com/arrington"&gt;arrington&lt;/a&gt; told me he wouldn't sell @&lt;a href="http://twitter.com/TechCrunch"&gt;TechCrunch&lt;/a&gt; for &amp;lt;than $40M last year.TC has ~$6m in revenue/~$1.5m in profits (all TC50!)&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Rather prophetic. 
From b&lt;a href="http://www.businessinsider.com/jason-calacanis-okay-fine-heres-what-i-really-think-of-techcrunch-and-mike-arrington-2010-6"&gt;ack in June:&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.businessinsider.com/jason-calacanis-okay-fine-heres-what-i-really-think-of-techcrunch-and-mike-arrington-2010-6#the-opening-volley-1"&gt;&lt;img style="width: 550px" src="http://static.businessinsider.com/image/4c1a65147f8b9afe371b0000-590-/the-opening-volley.jpg" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;h3&gt;The Timeline&lt;/h3&gt;  &lt;table border="0" cellspacing="0" cellpadding="2" width="500"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="250"&gt;Date&lt;/td&gt;        &lt;td valign="top" width="250"&gt;Event&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;March 12, 2009&lt;/td&gt;        &lt;td valign="top" width="250"&gt;Tim Armstrongs becomes CEO of AOL&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;2009&lt;/td&gt;        &lt;td valign="top" width="250"&gt;Armstrong’s first attempt to buy TechCrunch&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;September 14-15, 2009&lt;/td&gt;        &lt;td valign="top" width="250"&gt;TechCrunch50 conference&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;March 3, 2010&lt;/td&gt;        &lt;td valign="top" width="250"&gt;Arrington announces TechCrunch Disrupt&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;May 3, 2010&lt;/td&gt;        &lt;td valign="top" width="250"&gt;Arrington announces move to Seattle&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;May 12, 2010&lt;/td&gt;        &lt;td valign="top" width="250"&gt;Arrington and Calacanis announce end to TechCrunch50 joint venture.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;June 17, 2010&lt;/td&gt;    
    &lt;td valign="top" width="250"&gt;Calacanis first (publicly) accuses Arrington of screwing him out of a piece of the pie.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="250"&gt;September 28, 1010&lt;/td&gt;        &lt;td valign="top" width="250"&gt;AOL acquires TechCrunch&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;Where once I felt sympathy for Arrington when he was (quite clearly, in my opinion) intentionally deceived by Fusion Garage, the pendulum has swung the other way.&lt;/p&gt;  &lt;p&gt;It’s hard to look at that timeline and not see the intent of Arrington splitting with a partner over something that allegedly represents the majority of TechCrunch’s revenue and then going on to launch a similar conference, building on its success.&lt;/p&gt;  &lt;p&gt;As divisive as Calacanis can be, it’s hard to argue that he wasn’t a big part of the TC50 conference at all levels: conceiving (jointly or solely) the idea for the conference, filtering and coaching startups, on-stage presentations, choosing the winners and so on.&lt;/p&gt;  &lt;p&gt;Compare this to Chris Dixon’s story: &lt;a href="http://mixergy.com/siteadvisor-chris-dixon-interview/"&gt;How SiteAdvisor Went From Startup Vision To A Career-Making Exit – with Chris Dixon&lt;/a&gt;. There are many tales like this where the founders go above and beyond what’s required to look after those who worked so hard for them.&lt;/p&gt;  &lt;p&gt;One could go so far as to call Arrington’s actions premeditated, even despicable.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." 
border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/US7xJSQ8Ta8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/1517988092026765287/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/10/did-michael-arrington-screw-jason.html#comment-form" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1517988092026765287?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1517988092026765287?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/US7xJSQ8Ta8/did-michael-arrington-screw-jason.html" title="Did Michael Arrington Screw Jason Calacanis and the State of California?" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>10</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/10/did-michael-arrington-screw-jason.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0EMR3o9fCp7ImA9Wx5REkk.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-4157958050799348852</id><published>2010-08-20T03:19:00.001+08:00</published><updated>2010-08-20T03:21:26.464+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-08-20T03:21:26.464+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="algorithms" /><title>Coin Tosses, Binomials and Dynamic Programming</title><content type="html">&lt;p&gt;Today someone asked about the &lt;a 
href="http://stackoverflow.com/questions/3519395/probability-of-outcomes-algorithm"&gt;probability of outcomes&lt;/a&gt; in relation to coin tosses on Stack Overflow. It’s an interesting question because it touches on several areas that programmers should know, from maths (probability and counting) to dynamic programming.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Dynamic_programming"&gt;Dynamic programming&lt;/a&gt; is a divide-and-conquer technique that is often overlooked by programmers. It can sometimes be hard to spot situations where it applies, but when it does apply it will typically reduce an algorithm from exponential complexity (which is impractical in all but the smallest of cases) to a polynomial solution.&lt;/p&gt;  &lt;p&gt;I will explain these concepts in one of the simplest forms: the humble coin toss.&lt;/p&gt;  &lt;h3&gt;Bernoulli Trials&lt;/h3&gt;  &lt;p&gt;A Bernoulli trial is a random event (or experiment) with exactly two possible outcomes. The probability of each outcome is known. The probability of success is &lt;em&gt;p&lt;/em&gt; and the probability of failure is &lt;em&gt;1-p &lt;/em&gt;where &lt;em&gt;0 &amp;lt; p &amp;lt; 1&lt;/em&gt;. The outcome of each trial is independent of any other trial.&lt;/p&gt;  &lt;p&gt;What constitutes success is arbitrary. For the purposes of this post, success will be defined as a coin coming up heads. To clarify the independence property, the probability of a coin coming up heads does not change regardless of any previous tosses of that (or any other) coin.&lt;/p&gt;  &lt;p&gt;Assume a fair coin (&lt;em&gt;p&lt;/em&gt; = 0.5). If you want to determine the probability of &lt;em&gt;k&lt;/em&gt; successes (heads) from &lt;em&gt;n&lt;/em&gt; trials in the general case, it is worth first recognizing that if you make &lt;em&gt;n&lt;/em&gt; coin tosses there are 2&lt;em&gt;&lt;sup&gt;n&lt;/sup&gt;&lt;/em&gt; permutations. We are not interested in permutations, however. 
For &lt;em&gt;n&lt;/em&gt; = 2, heads then tails is identical to tails then heads. If we represent 0 as tails and 1 as heads, the outcomes for the first few values of &lt;em&gt;n&lt;/em&gt; are:&lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="2" width="401"&gt;&lt;thead&gt;     &lt;tr&gt;       &lt;th rowspan="2"&gt;Number of Coins&lt;/th&gt;        &lt;th colspan="5"&gt;Number of Heads&lt;/th&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;th valign="top" width="61"&gt;0&lt;/th&gt;        &lt;th valign="top" width="60"&gt;1&lt;/th&gt;        &lt;th valign="top" width="61"&gt;2&lt;/th&gt;        &lt;th valign="top" width="63"&gt;3&lt;/th&gt;        &lt;th valign="top" width="60"&gt;4&lt;/th&gt;     &lt;/tr&gt;   &lt;/thead&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;0&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="63"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="49"&gt;         &lt;p align="center"&gt;2&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;00&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;01            &lt;br /&gt;10&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;11&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="63"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       
&lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="49"&gt;         &lt;p align="center"&gt;3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;000&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;100            &lt;br /&gt;010             &lt;br /&gt;001&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;011            &lt;br /&gt;101             &lt;br /&gt;110&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="63"&gt;         &lt;p align="center"&gt;111&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="51"&gt;         &lt;p align="center"&gt;4&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="63"&gt;         &lt;p align="center"&gt;0000&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="64"&gt;         &lt;p align="center"&gt;1000            &lt;br /&gt;0100             &lt;br /&gt;0010             &lt;br /&gt;0001&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="65"&gt;         &lt;p align="center"&gt;0011            &lt;br /&gt;0101             &lt;br /&gt;0110             &lt;br /&gt;1001             &lt;br /&gt;1010             &lt;br /&gt;1100&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="68"&gt;         &lt;p align="center"&gt;0111            &lt;br /&gt;1101             &lt;br /&gt;1011             &lt;br /&gt;1110&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="66"&gt;         &lt;p align="center"&gt;1111&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;If you ignore the actual values and reduce it to 
the number of permutations:&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="2" width="401"&gt;&lt;thead&gt;     &lt;tr&gt;       &lt;th rowspan="2"&gt;Number of Coins&lt;/th&gt;        &lt;th colspan="5"&gt;Number of Heads&lt;/th&gt;        &lt;th rowspan="2"&gt;Total&lt;/th&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;th valign="top" width="61"&gt;0&lt;/th&gt;        &lt;th valign="top" width="60"&gt;1&lt;/th&gt;        &lt;th valign="top" width="61"&gt;2&lt;/th&gt;        &lt;th valign="top" width="63"&gt;3&lt;/th&gt;        &lt;th valign="top" width="60"&gt;4&lt;/th&gt;     &lt;/tr&gt;   &lt;/thead&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="63"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;        &lt;td&gt;         &lt;p align="center"&gt;2&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="49"&gt;         &lt;p align="center"&gt;2&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;2&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="63"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p 
align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;        &lt;td&gt;         &lt;p align="center"&gt;4&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="49"&gt;         &lt;p align="center"&gt;3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="61"&gt;         &lt;p align="center"&gt;3&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="63"&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="60"&gt;         &lt;p align="center"&gt;&amp;#160;&lt;/p&gt;       &lt;/td&gt;        &lt;td&gt;         &lt;p align="center"&gt;8&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="51"&gt;         &lt;p align="center"&gt;4&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="63"&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="64"&gt;         &lt;p align="center"&gt;4&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="65"&gt;         &lt;p align="center"&gt;6&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="68"&gt;         &lt;p align="center"&gt;4&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="66"&gt;         &lt;p align="center"&gt;1&lt;/p&gt;       &lt;/td&gt;        &lt;td&gt;         &lt;p align="center"&gt;16&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;Divide the number of desired outcomes by the total number of outcomes and you have your probability.&lt;/p&gt;  &lt;h3&gt;Binomial Distribution&lt;/h3&gt;  &lt;p&gt;The above numbers are obviously not random and correspond to the coefficients of the binomial expansion:&lt;/p&gt;  &lt;p&gt;(1 + 
&lt;em&gt;x&lt;/em&gt;)&lt;sup&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;  &lt;p&gt;For example:&lt;/p&gt;  &lt;p&gt;(1 + &lt;em&gt;x&lt;/em&gt;)&lt;sup&gt;2&lt;/sup&gt; = x&lt;sup&gt;2&lt;/sup&gt; + 2x + 1 (1, 2, 1)     &lt;br /&gt;(1 + &lt;em&gt;x&lt;/em&gt;)&lt;sup&gt;3&lt;/sup&gt; = x&lt;sup&gt;3&lt;/sup&gt; + 3x&lt;sup&gt;2&lt;/sup&gt; + 3x + 1 (1, 3, 3, 1)     &lt;br /&gt;(1 + &lt;em&gt;x&lt;/em&gt;)&lt;sup&gt;4&lt;/sup&gt; = x&lt;sup&gt;4&lt;/sup&gt; + 4x&lt;sup&gt;3&lt;/sup&gt; + 6x&lt;sup&gt;2&lt;/sup&gt; + 4x + 1 (1, 4, 6, 4, 1)     &lt;br /&gt;…&lt;/p&gt;  &lt;p&gt;Any of the above coefficients can be found with factorials:&lt;/p&gt;  &lt;p&gt;&lt;em&gt;f&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;,&lt;em&gt;k&lt;/em&gt;) = &lt;sup&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sup&gt;C&lt;sub&gt;&lt;em&gt;k&lt;/em&gt;&lt;/sub&gt; = &lt;em&gt;n&lt;/em&gt;! / ((&lt;em&gt;n&lt;/em&gt;-&lt;em&gt;k&lt;/em&gt;)! &lt;em&gt;k&lt;/em&gt;!)&lt;/p&gt;  &lt;h3&gt;Binomial Probability&lt;/h3&gt;  &lt;p&gt;Since we can calculate the number of permutations for our desired outcome and we know the total number of outcomes is 2&lt;sup&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sup&gt;, then:&lt;/p&gt;  &lt;p&gt;&lt;em&gt;P&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;,&lt;em&gt;k&lt;/em&gt;) = &lt;sup&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sup&gt;C&lt;sub&gt;&lt;em&gt;k&lt;/em&gt;&lt;/sub&gt; x 2&lt;sup&gt;-&lt;em&gt;n&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;  &lt;p&gt;assuming:&lt;/p&gt;  &lt;p&gt;&lt;em&gt;P&lt;/em&gt; = the probability of &lt;em&gt;k &lt;/em&gt;coins coming up heads for &lt;em&gt;n&lt;/em&gt; coin tosses     &lt;br /&gt;&lt;em&gt;n&lt;/em&gt; = number of coin tosses     &lt;br /&gt;&lt;em&gt;k&lt;/em&gt; = number of desired successes&lt;/p&gt;  &lt;h3&gt;Unfair Coins&lt;/h3&gt;  &lt;p&gt;The above is accurate for fair coins but what about unfair coins? Assume that a given coin has a 60% chance of coming up heads. What does that do to our formula? Very little, actually. 
The above formula is simply a special case of a more general formula.&lt;/p&gt;  &lt;p&gt;&lt;em&gt;P&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;,&lt;em&gt;k&lt;/em&gt;,&lt;em&gt;p&lt;/em&gt;) = &lt;sup&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sup&gt;C&lt;sub&gt;&lt;em&gt;k&lt;/em&gt;&lt;/sub&gt; x &lt;em&gt;p&lt;/em&gt;&lt;sup&gt;&lt;em&gt;k&lt;/em&gt;&lt;/sup&gt; x (1-&lt;em&gt;p&lt;/em&gt;)&lt;sup&gt;&lt;em&gt;n&lt;/em&gt;-&lt;em&gt;k&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;  &lt;p&gt;assuming:&lt;/p&gt;  &lt;p&gt;&lt;em&gt;P&lt;/em&gt; = the probability of &lt;em&gt;k &lt;/em&gt;coins coming up heads for &lt;em&gt;n&lt;/em&gt; coin tosses     &lt;br /&gt;&lt;em&gt;n&lt;/em&gt; = number of coin tosses     &lt;br /&gt;&lt;em&gt;k&lt;/em&gt; = number of desired successes     &lt;br /&gt;&lt;em&gt;p&lt;/em&gt; = probability of a coin coming up heads&lt;/p&gt;  &lt;p&gt;Plug in &lt;em&gt;p&lt;/em&gt; = 0.5 and the formula reduces to the earlier version.&lt;/p&gt;  &lt;h3&gt;Pascal’s Triangle&lt;/h3&gt; &lt;a href="http://en.wikipedia.org/wiki/File:Pascal's_triangle_5.svg" rel="license"&gt;&lt;img style="float: left" src="http://upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Pascal's_triangle_5.svg/200px-Pascal's_triangle_5.svg.png" /&gt;&lt;/a&gt;   &lt;p&gt;Dealing with factorials is cumbersome and impractical even for modest values of &lt;em&gt;n&lt;/em&gt; (e.g., 1000! has &lt;a href="http://en.wikipedia.org/wiki/Factorial"&gt;2,568 digits&lt;/a&gt;). Fortunately there is a &lt;em&gt;much&lt;/em&gt; faster method of calculating coefficients known as &lt;a href="http://en.wikipedia.org/wiki/Pascal's_triangle"&gt;Pascal's triangle&lt;/a&gt;. &lt;/p&gt;  &lt;p&gt;If you look at the coefficients to the left, you’ll see that any coefficient is the sum of the two coefficients above it (e.g., 10 = 4 + 6). 
This leads to the following function:&lt;/p&gt;  &lt;p&gt;&lt;em&gt;C&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;,&lt;em&gt;k&lt;/em&gt;) = 1 when &lt;em&gt;k&lt;/em&gt; = 0 or &lt;em&gt;k&lt;/em&gt; = &lt;em&gt;n&lt;/em&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; = &lt;em&gt;C&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;-1,&lt;em&gt;k&lt;/em&gt;-1) + &lt;em&gt;C&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;-1,&lt;em&gt;k&lt;/em&gt;) for 0 &amp;lt; &lt;em&gt;k&lt;/em&gt; &amp;lt; &lt;em&gt;n&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;which could lead to the following naive implementation:&lt;/p&gt;  &lt;pre class="brush:java"&gt;public static int coefficient(int n, int k) {
  return k == 0 || k == n ? 1 : coefficient(n - 1, k - 1) + coefficient(n - 1, k);
}&lt;/pre&gt;

&lt;p&gt;Unfortunately this simple function has &lt;em&gt;terrible&lt;/em&gt; performance: O(2&lt;sup&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sup&gt;) in the worst case. The reason is simple: it does an awful lot of redundant calculations. There are two ways to solve this performance problem:&lt;/p&gt;

&lt;h3&gt;Memoization&lt;/h3&gt;

&lt;p&gt;Memoization is an optimization technique that avoids making repeated calls to the same function with the same arguments. This assumes a &lt;a href="http://en.wikipedia.org/wiki/Pure_function"&gt;&lt;em&gt;pure function&lt;/em&gt;&lt;/a&gt;, meaning any two calls with the same arguments will evaluate to the same result and there are no relevant side effects. One such implementation is:&lt;/p&gt;

&lt;pre class="brush:java"&gt;private static class Pair {
  public final int n;
  public final int k;

  private Pair(int n, int k) {
    this.n = n;
    this.k = k;
  }
  
  @Override
  public int hashCode() {
    return (n + 37881) * (k + 47911);
  }
  
  @Override
  public boolean equals(Object ob) {
    if (!(ob instanceof Pair)) {
      return false;
    }
    Pair p = (Pair)ob;
    return n == p.n &amp;amp;&amp;amp; k == p.k;
  }
}

// Results computed so far, keyed by (n, k). Needs java.util.Map and java.util.HashMap.
private static final Map&amp;lt;Pair, Integer&amp;gt; CACHE = new HashMap&amp;lt;Pair, Integer&amp;gt;();

public static int coefficient2(int n, int k) {
  if (k == 0 || k == n) {
    return 1;
  } else if (k == 1 || k == n - 1) {
    return n;
  }
  Pair p = new Pair(n, k);
  Integer i = CACHE.get(p);
  if (i == null) {
    i = coefficient2(n - 1, k - 1) + coefficient2(n - 1, k);
    CACHE.put(p, i);
  }
  return i;
}&lt;/pre&gt;

&lt;p&gt;To compare, calculating &lt;em&gt;C&lt;/em&gt;(34,20) took 10.7 seconds on my PC with the first method and 2.4 &lt;em&gt;milliseconds&lt;/em&gt; with the second. &lt;em&gt;C&lt;/em&gt;(340,200) still only takes a mere 21.1 ms. There probably isn’t enough time left in the universe for the first one to finish that.&lt;/p&gt;

&lt;p&gt;The second method uses O(&lt;em&gt;n&lt;/em&gt;&lt;sup&gt;2&lt;/sup&gt;) space and O(&lt;em&gt;n&lt;/em&gt;&lt;sup&gt;2&lt;/sup&gt;) time.&lt;/p&gt;
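
&lt;p&gt;To make the cost of the naive recursion concrete, here is a minimal sketch that counts its calls. The class name &lt;code&gt;CallCounter&lt;/code&gt; and the &lt;code&gt;calls&lt;/code&gt; field are illustrative additions, not part of the code above. Every leaf of the recursion returns exactly 1, so there are &lt;em&gt;C&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;,&lt;em&gt;k&lt;/em&gt;) leaves and 2 x &lt;em&gt;C&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;,&lt;em&gt;k&lt;/em&gt;) - 1 calls in total.&lt;/p&gt;

```java
// Illustrative sketch: counts how many calls the naive recursion makes.
final class CallCounter {
  static long calls = 0;

  // Same recurrence as the naive coefficient(), with a call counter added.
  static long naive(int n, int k) {
    calls++;
    if (k == 0 || k == n) {
      return 1;
    }
    return naive(n - 1, k - 1) + naive(n - 1, k);
  }

  public static void main(String[] args) {
    calls = 0;
    long c = naive(20, 10);
    // Prints: C(20,10) = 184756, calls = 369511
    System.out.println("C(20,10) = " + c + ", calls = " + calls);
  }
}
```

&lt;p&gt;By the same reasoning, &lt;em&gt;C&lt;/em&gt;(34,20) = 1,391,975,640 takes roughly 2.8 billion calls, which fits with a run time measured in seconds.&lt;/p&gt;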

&lt;h3&gt;Introducing Dynamic Programming&lt;/h3&gt;

&lt;p&gt;The key to understanding dynamic programming and applying it to a problem is identifying a &lt;a href="http://en.wikipedia.org/wiki/Recurrence_relation"&gt;recurrence relation&lt;/a&gt; that expresses the solution for a given value &lt;em&gt;n&lt;/em&gt; in terms of the solution for a lower value of &lt;em&gt;n&lt;/em&gt;. We’ve already identified this.&lt;/p&gt;

&lt;p&gt;It is worth noting that the only thing we need to calculate the values for any row of Pascal’s triangle is the previous row. Therefore, we never need to keep more than one row in memory at a time, which reduces our desired algorithm to O(&lt;em&gt;n&lt;/em&gt;) space.&lt;/p&gt;

&lt;p&gt;Consider this implementation:&lt;/p&gt;

&lt;pre class="brush:java"&gt;public static int coefficient3(int n, int k) {
  int[] work = new int[n + 1];
  work[0] = 1; // row 0, so that C(0,0) is 1
  for (int i = 1; i &amp;lt;= n; i++) {
    // go right to left so work[j - 1] still holds the previous row's value
    for (int j = i; j &amp;gt;= 0; j--) {
      if (j == 0 || j == i) {
        work[j] = 1;
      } else if (j == 1 || j == i - 1) {
        work[j] = i;
      } else {
        work[j] += work[j - 1];
      }
    }
  }
  return work[k];
}&lt;/pre&gt;

&lt;p&gt;This algorithm simply calculates one row at a time from the first until the required row is found. Only one row (i.e. &lt;code&gt;work&lt;/code&gt;) is ever kept. The inner loop satisfies the boundary conditions (arguably unnecessarily) and no recursion or more complex cache is required.&lt;/p&gt;

&lt;p&gt;This algorithm is still O(&lt;em&gt;n&lt;/em&gt;&lt;sup&gt;2&lt;/sup&gt;) time but is O(&lt;em&gt;n&lt;/em&gt;) space and in practice will be faster than the previous version. My example of &lt;em&gt;C&lt;/em&gt;(340,200) runs in 1.5 ms compared to 21.6 ms.&lt;/p&gt;
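
&lt;p&gt;One caveat: &lt;em&gt;C&lt;/em&gt;(340,200) is far larger than an &lt;code&gt;int&lt;/code&gt; (or even a &lt;code&gt;long&lt;/code&gt;) can hold, so an &lt;code&gt;int&lt;/code&gt; result overflows for inputs of that size even though the timing still reflects the work done. A minimal sketch of the same one-row idea using &lt;code&gt;java.math.BigInteger&lt;/code&gt; might look like this; the class and method names are illustrative.&lt;/p&gt;

```java
import java.math.BigInteger;
import java.util.Arrays;

// Illustrative sketch: one-row Pascal's triangle with arbitrary precision.
final class BigCoefficient {
  static BigInteger coefficient(int n, int k) {
    BigInteger[] row = new BigInteger[n + 1];
    Arrays.fill(row, BigInteger.ZERO);
    row[0] = BigInteger.ONE; // row 0 of the triangle is just "1"
    for (int i = 1; n >= i; i++) {
      // Walk right to left so row[j - 1] still holds the previous row's value.
      for (int j = i; j > 0; j--) {
        row[j] = row[j].add(row[j - 1]);
      }
    }
    return row[k];
  }

  public static void main(String[] args) {
    System.out.println(coefficient(34, 20)); // 1391975640
    // true: the value needs more than 63 bits, so it does not fit in a long
    System.out.println(coefficient(340, 200).bitLength() > 63);
  }
}
```

&lt;p&gt;The trade-off is speed: &lt;code&gt;BigInteger&lt;/code&gt; additions are slower than &lt;code&gt;int&lt;/code&gt; additions, but the answer is exact.&lt;/p&gt;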

&lt;h3&gt;Different Coins&lt;/h3&gt;

&lt;p&gt;Up until now it has been assumed that the same coin (or at least an identical coin) is used for every toss. In mathematical terms, &lt;em&gt;p&lt;/em&gt; is constant. What if it isn’t? Imagine we have two coins: the first has a 70% chance of coming up heads and the second has a 60% chance. What is the probability of getting exactly one head when tossing the pair of coins?&lt;/p&gt;

&lt;p&gt;Our formula clearly breaks down because it has been assumed until now that HT and TH are interchangeable and (more importantly) equally likely. That is no longer the case.&lt;/p&gt;

&lt;p&gt;The simplest solution traverses all permutations, finds those that satisfy the condition and works out their cumulative probability. Consider this &lt;em&gt;terrible&lt;/em&gt; implementation:&lt;/p&gt;

&lt;pre class="brush:java"&gt;private static final int SLICES = 10;
private static int[] COINS = {7, 6};

private static void runBruteForce(int... coins) {
  long start = System.nanoTime();
  int[] tally = new int[coins.length + 1];
  bruteForce(tally, coins, 0, 0);
  long end = System.nanoTime();
  int total = 0;
  for (int count : tally) {
    total += count;
  }
  for (int i = 0; i &amp;lt; tally.length; i++) {
    System.out.printf(&amp;quot;%d : %,.4f%n&amp;quot;, i, (double) tally[i] / total);
  }
  System.out.printf(&amp;quot;%nBrute force ran for %d COINS in %,.3f ms%n%n&amp;quot;,
      coins.length, (end - start) / 1000000d);
}

private static void bruteForce(int[] table, int[] coins, int index, int count) {
  if (index == coins.length) {
    table[count]++;
    return;
  }
  // coins[index] of the SLICES equally likely slices are heads...
  for (int i = 0; i &amp;lt; coins[index]; i++) {
    bruteForce(table, coins, index + 1, count + 1);
  }
  // ...and the remaining slices are tails
  for (int i = coins[index]; i &amp;lt; SLICES; i++) {
    bruteForce(table, coins, index + 1, count);
  }
}&lt;/pre&gt;

&lt;p&gt;For simplicity, this code assumes the probability of heads is an integer multiple of 0.1, so a coin with a 70% chance of heads is represented as 7 heads slices out of 10. The result:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;0 : 0.1200
1 : 0.4600
2 : 0.4200&lt;/pre&gt;

&lt;p&gt;A check by hand confirms the middle value: exactly one head comes up with probability 0.7 x 0.4 + 0.3 x 0.6 = 0.46. Of course, this algorithm is O(10&lt;sup&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sup&gt;) time. It can be improved to avoid some redundant calls but the resulting algorithm is still exponential.&lt;/p&gt;
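
&lt;p&gt;The slice-counting approach only works for probabilities that are multiples of 0.1. As a hedged alternative sketch, the heads/tails outcomes can be enumerated directly, carrying each outcome's probability along. It is still exponential (2&lt;sup&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sup&gt; outcomes), and the class and method names here are illustrative.&lt;/p&gt;

```java
// Illustrative sketch: brute force over every heads/tails outcome.
final class EnumerateOutcomes {
  // heads[i] is the probability that coin i comes up heads.
  static double[] distribution(double... heads) {
    double[] result = new double[heads.length + 1];
    walk(heads, 0, 0, 1.0, result);
    return result;
  }

  // Visits every outcome, multiplying in the probability of each toss.
  private static void walk(double[] heads, int index, int count,
                           double probability, double[] result) {
    if (index == heads.length) {
      result[count] += probability; // one complete outcome with 'count' heads
      return;
    }
    // this coin is heads
    walk(heads, index + 1, count + 1, probability * heads[index], result);
    // this coin is tails
    walk(heads, index + 1, count, probability * (1 - heads[index]), result);
  }

  public static void main(String[] args) {
    double[] d = distribution(0.7, 0.6);
    for (int k = 0; d.length > k; k++) {
      System.out.printf("%d : %.4f%n", k, d[k]);
    }
  }
}
```

&lt;p&gt;For the 70%/60% pair this puts the probability of exactly one head at 0.46.&lt;/p&gt;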

&lt;h3&gt;Dynamic Programming: Part Two&lt;/h3&gt;

&lt;p&gt;The key to dynamic programming is to be able to state the problem with a suitable recurrence relation that expresses the solution in terms of smaller values of the solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;General Problem:&lt;/strong&gt; Given &lt;em&gt;C&lt;/em&gt;, a series of &lt;em&gt;n&lt;/em&gt; coins &lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;1&lt;/em&gt;&lt;/sub&gt; to &lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sub&gt; where &lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;i&lt;/em&gt;&lt;/sub&gt; represents the probability of the &lt;em&gt;i&lt;/em&gt;-th coin coming up heads, what is the probability of &lt;em&gt;k&lt;/em&gt; heads coming up from tossing all the coins?&lt;/p&gt;

&lt;p&gt;The recurrence relation required to solve this problem isn’t necessarily obvious:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;,&lt;em&gt;k&lt;/em&gt;,&lt;em&gt;C&lt;/em&gt;,&lt;em&gt;i&lt;/em&gt;) = &lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;i&lt;/em&gt;&lt;/sub&gt; x &lt;em&gt;P&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;,&lt;em&gt;k&lt;/em&gt;-1,&lt;em&gt;C&lt;/em&gt;,&lt;em&gt;i&lt;/em&gt;+1) + (1-&lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;i&lt;/em&gt;&lt;/sub&gt;) x &lt;em&gt;P&lt;/em&gt;(&lt;em&gt;n&lt;/em&gt;,&lt;em&gt;k&lt;/em&gt;,&lt;em&gt;C&lt;/em&gt;,&lt;em&gt;i&lt;/em&gt;+1)&lt;/p&gt;

&lt;p&gt;To put it another way, if you take a subset of the coins &lt;em&gt;C&lt;/em&gt; from &lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;i&lt;/em&gt;&lt;/sub&gt; to &lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sub&gt;, the probability that there will be &lt;em&gt;k&lt;/em&gt; heads in the &lt;strong&gt;remaining&lt;/strong&gt; coins can be expressed this way: if the current coin comes up heads (at a chance of &lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;i&lt;/em&gt;&lt;/sub&gt;) then the subsequent coins need only have &lt;em&gt;k&lt;/em&gt;-1 heads. But if the current coin comes up tails (at a chance of 1-&lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;i&lt;/em&gt;&lt;/sub&gt;) then the subsequent coins must contain &lt;em&gt;k&lt;/em&gt; heads.&lt;/p&gt;

&lt;p&gt;You may need to think about that before it sinks in. Once it does sink in it may not be obvious that this helps solve the problem. The key point to remember is that each step of this relation expresses the solution in terms of one less coin and possibly one less value of &lt;em&gt;k&lt;/em&gt;. That divide-and-conquer aspect is the key to dynamic programming.&lt;/p&gt;
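
&lt;p&gt;As a sketch of how the relation translates to code, here is a top-down version that caches each (&lt;em&gt;i&lt;/em&gt;, &lt;em&gt;k&lt;/em&gt;) pair. It assumes 0-based coin indices and treats "no coins left" as the base case; the class and method names are illustrative rather than taken from the implementation later in this post.&lt;/p&gt;

```java
import java.util.Arrays;

// Illustrative sketch: the recurrence evaluated top-down with a cache.
final class TopDownCoins {
  // Probability of exactly k heads among coins[i], coins[i + 1], ...
  static double heads(double[] coins, int i, int k, double[][] cache) {
    if (0 > k) {
      return 0.0; // a negative number of heads is impossible
    }
    if (i == coins.length) {
      return k == 0 ? 1.0 : 0.0; // no coins left: only k = 0 is possible
    }
    if (!Double.isNaN(cache[i][k])) {
      return cache[i][k]; // already computed
    }
    double p = coins[i] * heads(coins, i + 1, k - 1, cache)   // current coin heads
        + (1 - coins[i]) * heads(coins, i + 1, k, cache);     // current coin tails
    cache[i][k] = p;
    return p;
  }

  static double heads(double[] coins, int k) {
    double[][] cache = new double[coins.length][k + 1];
    for (double[] row : cache) {
      Arrays.fill(row, Double.NaN); // NaN marks "not computed yet"
    }
    return heads(coins, 0, k, cache);
  }

  public static void main(String[] args) {
    double[] coins = {0.2, 0.3, 0.4};
    for (int k = 0; coins.length >= k; k++) {
      System.out.printf("%d : %.4f%n", k, heads(coins, k));
    }
  }
}
```

&lt;p&gt;Each (&lt;em&gt;i&lt;/em&gt;, &lt;em&gt;k&lt;/em&gt;) pair is computed at most once, which is what turns the exponential enumeration into a polynomial one.&lt;/p&gt;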

&lt;h3&gt;Algorithm Explained&lt;/h3&gt;

&lt;p&gt;Assume three coins with probabilities of 0.2, 0.3 and 0.4 of coming up heads. Consider the simplest case: &lt;em&gt;k&lt;/em&gt; = 0. The probability that all three coins are tails is equal to:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P&lt;/em&gt;(3,0,&lt;em&gt;C&lt;/em&gt;,1) = (1-&lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;1&lt;/em&gt;&lt;/sub&gt;) x &lt;em&gt;P&lt;/em&gt;(3,0,&lt;em&gt;C&lt;/em&gt;,2) 

  &lt;br /&gt;= (1-&lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;1&lt;/em&gt;&lt;/sub&gt;) x (1-&lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;2&lt;/em&gt;&lt;/sub&gt;) x &lt;em&gt;P&lt;/em&gt;(3,0,&lt;em&gt;C&lt;/em&gt;,3) 

  &lt;br /&gt;= (1-&lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;1&lt;/em&gt;&lt;/sub&gt;) x (1-&lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;2&lt;/em&gt;&lt;/sub&gt;) x (1-&lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;3&lt;/em&gt;&lt;/sub&gt;)&lt;/p&gt;

&lt;p&gt;which is clearly correct. It gets slightly more complicated when &lt;em&gt;k&lt;/em&gt; &amp;gt; 0. We need to remember that we now have a table for &lt;em&gt;k&lt;/em&gt; = 0. We can construct a table for &lt;em&gt;k&lt;/em&gt; = 1 in terms of that table and so on.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P&lt;/em&gt;(3,1,&lt;em&gt;C&lt;/em&gt;,1) = &lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;1&lt;/em&gt;&lt;/sub&gt; x &lt;em&gt;P&lt;/em&gt;(3,0,&lt;em&gt;C&lt;/em&gt;,2) + (1-&lt;em&gt;p&lt;/em&gt;&lt;sub&gt;&lt;em&gt;1&lt;/em&gt;&lt;/sub&gt;) x &lt;em&gt;P&lt;/em&gt;(3,1,&lt;em&gt;C&lt;/em&gt;,2) 

  &lt;br /&gt;= ...&lt;/p&gt;

&lt;p&gt;Once again, if the current coin comes up heads we need &lt;em&gt;k&lt;/em&gt;-1 heads from the remaining coins. If it doesn’t, we need &lt;em&gt;k&lt;/em&gt; heads from the remaining coins. The above relation adds those two cases together, each weighted by its probability.&lt;/p&gt;

&lt;p&gt;In the implementation these two cases (&lt;em&gt;k&lt;/em&gt; = 0 and &lt;em&gt;k&lt;/em&gt; &amp;gt; 0) tend to be treated the same, since &lt;em&gt;k&lt;/em&gt; = 0 is a special case of &lt;em&gt;k&lt;/em&gt; &amp;gt; 0 given that the probability of anything where &lt;em&gt;k&lt;/em&gt; = -1 is 0.&lt;/p&gt;

&lt;p&gt;Expressing this as a table:&lt;/p&gt;

&lt;div align="center"&gt;
  &lt;table border="0" cellspacing="0" cellpadding="2" width="500" align="center"&gt;&lt;thead&gt;
      &lt;tr&gt;
        &lt;th valign="top" width="100"&gt;&amp;#160;&lt;/th&gt;

        &lt;th valign="top" width="300" colspan="3"&gt;Coins&lt;/th&gt;

        &lt;th valign="top" width="100"&gt;&amp;#160;&lt;/th&gt;
      &lt;/tr&gt;

      &lt;tr&gt;
        &lt;th valign="top" width="100"&gt;Number of Heads&lt;/th&gt;

        &lt;th valign="top" width="100"&gt;0.200&lt;/th&gt;

        &lt;th valign="top" width="100"&gt;0.300&lt;/th&gt;

        &lt;th valign="top" width="100"&gt;0.400&lt;/th&gt;

        &lt;th valign="top" width="100"&gt;&amp;#160;&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;&lt;tbody&gt;
      &lt;tr&gt;
        &lt;td valign="top" width="100"&gt;&amp;#160;&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.000&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.000&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.000&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.000&lt;/td&gt;
      &lt;/tr&gt;

      &lt;tr&gt;
        &lt;td valign="top" width="100"&gt;0&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.336&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.420&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.600&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;1.000&lt;/td&gt;
      &lt;/tr&gt;

      &lt;tr&gt;
        &lt;td valign="top" width="100"&gt;1&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.452&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.460&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.400&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.000&lt;/td&gt;
      &lt;/tr&gt;

      &lt;tr&gt;
        &lt;td valign="top" width="100"&gt;2&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.188&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.120&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.000&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.000&lt;/td&gt;
      &lt;/tr&gt;

      &lt;tr&gt;
        &lt;td valign="top" width="100"&gt;3&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.024&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.000&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.000&lt;/td&gt;

        &lt;td valign="top" width="100"&gt;0.000&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;Remember that each cell in the table is computed from two others: the cell to its right (the tails case, weighted by 1-&lt;em&gt;p&lt;/em&gt; for that column's coin) and the cell above that one (the heads case, weighted by &lt;em&gt;p&lt;/em&gt;). The first column represents the probabilities for 0 to 3 heads from the 3 coins.&lt;/p&gt;

&lt;p&gt;Most importantly, this algorithm is O(&lt;em&gt;nk&lt;/em&gt;) time and O(&lt;em&gt;nk&lt;/em&gt;) space.&lt;/p&gt;

&lt;h3&gt;Implementation&lt;/h3&gt;

&lt;pre class="brush:java"&gt;private static void runDynamic() {
  double[] coins = {0.2, 0.3, 0.4};
  long start = System.nanoTime();
  double[] probs = dynamic(coins);
  long end = System.nanoTime();
  for (int i = 0; i &amp;lt; probs.length; i++) {
    System.out.printf(&amp;quot;%d : %,.4f%n&amp;quot;, i, probs[i]);
  }
  System.out.printf(&amp;quot;%nDynamic ran for %d coins in %,.3f ms%n%n&amp;quot;,
      coins.length, (end - start) / 1000000d);
}

private static double[] dynamic(double... coins) {
  double[][] table = new double[coins.length + 2][];
  for (int i = 0; i &amp;lt; table.length; i++) {
    table[i] = new double[coins.length + 1];
  }
  table[1][coins.length] = 1.0d; // everything else is 0.0
  for (int i = 0; i &amp;lt;= coins.length; i++) {
    for (int j = coins.length - 1; j &amp;gt;= 0; j--) {
      table[i + 1][j] = coins[j] * table[i][j + 1] +
          (1 - coins[j]) * table[i + 1][j + 1];
    }
  }
  double[] ret = new double[coins.length + 1];
  for (int i = 0; i &amp;lt; ret.length; i++) {
    ret[i] = table[i + 1][0];
  }
  return ret;
}&lt;/pre&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;0 : 0.3360
1 : 0.4520
2 : 0.1880
3 : 0.0240

Dynamic ran for 3 coins in 0.018 ms&lt;/pre&gt;
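
&lt;p&gt;If only the final distribution is needed, the full table is not strictly required. A minimal sketch, assuming we add one coin at a time and update a single array in place (the class and method names are illustrative):&lt;/p&gt;

```java
// Illustrative sketch: same distribution with one array instead of a table.
final class RollingCoins {
  static double[] distribution(double... coins) {
    double[] dist = new double[coins.length + 1];
    dist[0] = 1.0; // with no coins tossed, zero heads is certain
    int tossed = 0;
    for (double p : coins) {
      tossed++;
      // Right to left, so dist[k - 1] is still the value before this coin.
      for (int k = tossed; k > 0; k--) {
        dist[k] = p * dist[k - 1] + (1 - p) * dist[k];
      }
      dist[0] *= (1 - p); // zero heads requires this coin to be tails too
    }
    return dist;
  }

  public static void main(String[] args) {
    double[] d = distribution(0.2, 0.3, 0.4);
    for (int k = 0; d.length > k; k++) {
      System.out.printf("%d : %.4f%n", k, d[k]);
    }
  }
}
```

&lt;p&gt;This keeps the O(&lt;em&gt;n&lt;/em&gt;&lt;sup&gt;2&lt;/sup&gt;) running time for the whole distribution but needs only O(&lt;em&gt;n&lt;/em&gt;) space, the same trade made earlier for Pascal’s triangle.&lt;/p&gt;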

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;I hope this serves as a useful introduction to dynamic programming. It can be an incredibly powerful and useful technique but can require some practice to identify where and how to apply it.&lt;/p&gt;

&lt;p&gt;The last very short code segment does an awful lot. It can handle large numbers of coins, uniform or not, and deliver results in a timely fashion.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/RxNw88QAFk0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/4157958050799348852/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/08/coin-tosses-binomials-and-dynamic.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4157958050799348852?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4157958050799348852?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/RxNw88QAFk0/coin-tosses-binomials-and-dynamic.html" title="Coin Tosses, Binomials and Dynamic Programming" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/08/coin-tosses-binomials-and-dynamic.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0UHR3s5fCp7ImA9Wx5SEE4.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-8509836162958462206</id><published>2010-08-06T01:57:00.001+08:00</published><updated>2010-08-06T02:00:36.524+08:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2010-08-06T02:00:36.524+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="news" /><category scheme="http://www.blogger.com/atom/ns#" term="google" /><title>Google Wave, Microsoft and Engineers Running the Asylum</title><content type="html">&lt;p&gt;This week marked the &lt;a href="http://googleblog.blogspot.com/2010/08/update-on-google-wave.html"&gt;official death of Google Wave&lt;/a&gt; and &lt;a href="http://arstechnica.com/microsoft/news/2010/07/ballmer-and-microsoft-still-doesnt-get-the-ipad.ars"&gt;Steve Ballmer described the iPad as “just another PC form factor”&lt;/a&gt;. These seemingly unrelated events highlight the perils of both driving and not driving a company from an engineering standpoint.&lt;/p&gt;  &lt;h3&gt;Microsoft, The Meteor is Coming&lt;/h3&gt;  &lt;p&gt;For those of us who started using computers before the internet, there is a stark difference between now and 10-20 years ago in the perception of Microsoft. In the post-IBM era, Microsoft was the 800 pound gorilla in the room that crushed everything else, even Apple, WordPerfect, Lotus and Borland.&lt;/p&gt;  &lt;p&gt;As much as Bill Gates is derided in some quarters he was and is a programmer. It has probably been many years since he wrote a line of code but the fact remains that he &lt;em&gt;has&lt;/em&gt; written code and understands the process. He is a technical guy. Case in point: &lt;a href="http://www.joelonsoftware.com/items/2006/06/16.html"&gt;My First BillG Review&lt;/a&gt;.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;In those days, Microsoft was a lot less bureaucratic. Instead of the 11 or 12 layers of management they have today, I reported to Mike Conte who reported to Chris Graham who reported to Pete Higgins, who reported to Mike Maples, who reported to Bill. About 6 layers from top to bottom. 
We made fun of companies like General Motors with their eight layers of management or whatever it was.&lt;/p&gt;    &lt;p&gt;…&lt;/p&gt;    &lt;p&gt;&lt;em&gt;... and THERE WERE NOTES IN ALL THE MARGINS. ON EVERY PAGE OF THE SPEC. HE HAD READ THE WHOLE GODDAMNED THING AND WRITTEN NOTES IN THE MARGINS.&lt;/em&gt;&lt;/p&gt;    &lt;p&gt;&lt;em&gt;…&lt;/em&gt;&lt;/p&gt;    &lt;p&gt;Bill Gates was amazingly technical. He understood Variants, and COM objects, and IDispatch and why Automation is different than vtables and why this might lead to dual interfaces. He worried about date functions. He didn't meddle in software if he trusted the people who were working on it, but you couldn't bullshit him for a minute because he was a programmer. A real, actual, programmer.&lt;/p&gt;    &lt;p&gt;Watching non-programmers trying to run software companies is like watching someone who doesn't know how to surf trying to surf.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Steve Ballmer is no Bill Gates.&lt;/p&gt;  &lt;p&gt;Ballmer might be a whiz at making org charts, discussing corporate strategy, managing a division, making M&amp;amp;A deals or whatever but he’s simply not technical. He was hired as a business manager and that’s what he is.&lt;/p&gt;  &lt;p&gt;The problem is that Microsoft, in spite of everything it’s tried, is a software product company. If you don’t understand how software is created you have, in my opinion, no business running the company. It would be like me, as a programmer, running an airline.&lt;/p&gt;  &lt;p&gt;Ballmer isn’t unique or even in the minority here. My personal experience has been the majority of places I have worked haven’t understood software development nor been driven by engineering rather than the business side. 
Far more common is that programmers are treated as an unavoidable cost centre for which the perfect management metric has yet to be found.&lt;/p&gt;  &lt;p&gt;The problem here is twofold:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Microsoft customers are enterprises and PC OEMs not end users; and &lt;/li&gt;    &lt;li&gt;Singularity of purpose: Microsoft exists to sell Windows licenses. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;The first point is highlighted in another of this week’s stories, &lt;a href="http://arstechnica.com/microsoft/news/2010/08/microsofts-internal-internet-explorer-privacy-battles.ars"&gt;Inside Microsoft's internal IE8 privacy battles&lt;/a&gt;. Basically, advertising interests won out over user experience.&lt;/p&gt;  &lt;p&gt;Windows is so ridiculously convoluted because Microsoft can’t say no to companies that want ridiculous customizations. This is why you have a horrific mishmash of policies, registry settings, etc that mean it’s typically uneconomic to figure out what’s wrong (if not virtually impossible). It’s easier to simply wipe it and start again.&lt;/p&gt;  &lt;p&gt;In &lt;a href="http://www.codinghorror.com/blog/2004/06/commandos-infantry-and-police.html"&gt;Commandos, Infantry, and Police&lt;/a&gt;, Jeff Atwood references &lt;a href="http://www.cringely.com/"&gt;Robert X. Cringely&lt;/a&gt;’s book &lt;a href="http://www.amazon.com/Accidental-Empires-Silicon-Millions-Competition/dp/0887308554/ref=sr_1_1?ie=UTF8&amp;amp;s=books&amp;amp;qid=1280984194&amp;amp;sr=8-1"&gt;Accidental Empires&lt;/a&gt;, which can be used to categorize companies. Startups are commandos. They are aggressively seeking to gain something with nothing to lose. Successful startups become infantry since the army is now so large that it requires structure and discipline. Truly large companies become police that are only interested in maintaining the status quo.&lt;/p&gt;  &lt;p&gt;Microsoft’s sole purpose now is to protect its cash cow of Windows and Office. 
Its leadership is probably terrified that something will kill the golden goose. What you fear you create. To quote &lt;a href="http://www.imdb.com/title/tt0076759/quotes"&gt;Princess Leia&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;The more you tighten your grip, Tarkin, the more star systems will slip through your fingers.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This is why Ballmer describes the iPad as just another PC form factor: Microsoft just wants to sell more Windows licenses. Ballmer lacks the technical foundation to understand exactly why he’s running a dinosaur so he doesn’t comprehend that the meteor is coming.&lt;/p&gt;  &lt;h3&gt;Google, Today’s Mammals Are Tomorrow’s Dinosaurs&lt;/h3&gt;  &lt;p&gt;Google has a similar problem to Microsoft in that it has a couple of core, &lt;em&gt;hugely&lt;/em&gt; successful products (being the combined search and advertising business) and everything else has either failed, been lacklustre or is yet to bear fruit.&lt;/p&gt;  &lt;p&gt;Of course there are successful ancillary products like GMail, Google Maps, Google Docs and so on. Many of these are completely dominant in their space but all of them merely enhance the search and advertising business or simply don’t generate a considerable amount of revenue.&lt;/p&gt;  &lt;p&gt;To quote &lt;a href="http://www.bnet.com/blog/advertising-business/google-8217s-revenue-breakdown-shows-it-controls-even-more-advertising-than-you-8217d-think/4932"&gt;Google’s Revenue Breakdown Shows It Controls Even More Advertising Than You’d Think&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://www.sec.gov/Archives/edgar/data/1288776/000119312510030774/d10k.htm#toc95279_8"&gt;Google revenues last year were $23.6 billion&lt;/a&gt;, 99 percent of which came from advertising, the company said. 
Google doesn’t break down which of its services earn what, and it still &lt;a href="http://www.seroundtable.com/archives/022260.html"&gt;hasn’t disclosed share breakdowns from content on YouTube or mobile devices&lt;/a&gt;.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Now I say this not to paint Google’s non-search forays as failures (as Microsoft’s forays beyond Windows and Office almost entirely are). Google has a different business model. Google is all about people using the Web. The more they use the Web, the more money Google makes. All their successful non-search products share this theme: they lower the cost of or increase the amount or effectiveness of using the Web.&lt;/p&gt;  &lt;p&gt;Google is an unusual company in many respects. Larry and Sergey completely nailed search and will go down in history as tech pioneers for doing so. They have created a company that is largely &lt;em&gt;engineering-driven&lt;/em&gt;, which shares a couple of parallels with the Microsoft of old except that early Microsoft was all about embracing the external developer community.&lt;/p&gt;  &lt;p&gt;That can enable the company to focus on products rather than attempting to quantify programmer performance (a far more typical scenario) but it can have a downside too, which is the whole point of this post.&lt;/p&gt;  &lt;p&gt;Despite its massive engineering talent, there are areas where Google has been unable to make any headway, namely in the much-hyped social space. Google Wave’s demise foreshadows the increasingly likely (re)entry into the social space, popularly referred to as &lt;a href="http://www.helium.com/items/1911943-what-is-google-me"&gt;Google Me&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;But the engineering way of thinking left unchecked can have some serious downsides. Consider an example.&lt;/p&gt;  &lt;h3&gt;Sudoku&lt;/h3&gt;  &lt;p&gt;We programmers think differently to “normal” people. 
While we share a lot in common with practitioners of other scientific and engineering disciplines I think it’s fair to say that part of the programmer psyche is still unique. Many programmers don’t realize this and it’s hard to explain to “outsiders” but I shall try with an example.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Sudoku-by-L2G-20050714.svg" rel="license"&gt;&lt;img style="width: 200px; float: right; margin-left: 0px; margin-right: 0px" src="http://upload.wikimedia.org/wikipedia/commons/thumb/f/ff/Sudoku-by-L2G-20050714.svg/200px-Sudoku-by-L2G-20050714.svg.png" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Sudoku"&gt;Sudoku&lt;/a&gt; is a number-placement puzzle where the player tries to figure out how to place numbers in a partly filled grid such that certain rules are followed regarding the allowable repetition patterns of those numbers. Puzzles vary in difficulty based on how few numbers they reveal. A given puzzle has but one solution.&lt;/p&gt;  &lt;p&gt;Many people enjoy this as a pastime. When first faced with Sudoku I, like many programmers, reacted in a very different way. I learnt the rules, devised a program that solved any given puzzle, satisfied myself it was correct and then forgot about Sudoku having never done one of these puzzles by hand, which basically defeats the whole point.&lt;/p&gt;  &lt;p&gt;Non-programmers will probably find this bizarre. Programmers will almost certainly nod in understanding. The point here is that programming trains you to find &lt;em&gt;general solutions&lt;/em&gt;. So you can solve one Sudoku puzzle but what would be the point? There are billions of others unsolved! In terms of time investment, you’re better off solving all of them at once!&lt;/p&gt;  &lt;h3&gt;Google Wave&lt;/h3&gt;  &lt;p&gt;Google Wave was presented with much fanfare and hype at Google IO 2009. 
Spearheaded by the quite brilliant Rasmussens (who brought us Google Maps), Wave was touted as one communication platform to rule them all.&lt;/p&gt;  &lt;p&gt;Wave was designed to be low level such that any messaging paradigm could be implemented in terms of Wave. Wave could handle a version of email, Twitteresque micro-blogging, IM, even bug tracking and so on. All of these things can be implemented on top of Google Wave.&lt;/p&gt;  &lt;p&gt;Or at least that was the theory.&lt;/p&gt;  &lt;p&gt;See the pattern here? Much like the Sudoku problem, Google was seeking a &lt;em&gt;general solution&lt;/em&gt; to a set of problems.&lt;/p&gt;  &lt;p&gt;So while this may be a great engineering feat, it ignored a fairly basic problem: what is the use case for Google Wave? Who will use it? Why? For what?&lt;/p&gt;  &lt;p&gt;Those questions are perhaps unfair because many startup ventures begin with an idea and very little idea of what the end will look like. Or at least that end will change several times (dare I utter the overused “pivot” buzzword?). So you can argue that you create a platform and wait for others to find a use for it.&lt;/p&gt;  &lt;p&gt;Clearly they didn’t.&lt;/p&gt;  &lt;p&gt;Perhaps a better approach is to ask: what is the pain point? What problem is Google Wave solving?&lt;/p&gt;  &lt;p&gt;The programmer in me understands the desire to create one platform that can do everything. It’s an alluring siren but don’t lose sight of the rocks drawing ever closer.&lt;/p&gt;  &lt;p&gt;To quote Joel Spolsky from &lt;a href="http://www.joelonsoftware.com/articles/APIWar.html"&gt;How Microsoft Lost the API War&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;By the way, for those of you who follow the arcane but politically-charged world of blog syndication feed formats, you can see the same thing happening over there. 
RSS became fragmented with several different versions, inaccurate specs and lots of political fighting, and the attempt to clean everything up by creating &lt;em&gt;yet another format&lt;/em&gt; called Atom has resulted in several different versions of RSS plus one version of Atom, inaccurate specs and lots of political fighting. When you try to unify two opposing forces by creating a third alternative, you just end up with three opposing forces. You haven't unified anything and you haven't really fixed anything.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;To paraphrase, Wave didn’t unite anything. In fact it did nothing other than create yet another way we could communicate.&lt;/p&gt;  &lt;p&gt;As much as people complain about email, everyone understands the model. A user who gets on the internet for the first time quickly understands the concept of sending someone “electronic mail”. It parallels something with which they are familiar from real life.&lt;/p&gt;  &lt;h3&gt;Apple and the Click Wheel&lt;/h3&gt;  &lt;p&gt;Whatever the future holds for Apple, it (or perhaps more accurately, Steve Jobs) will be remembered as one of the most influential tech figures of the computing era to date. Apple popularized and revolutionized digital content distribution, has a brand synonymous with mobile digital music playback and &lt;a href="http://cdixon.org/2010/06/06/steve-jobs-single-handedly-restructured-the-mobile-industry/"&gt;singlehandedly restructured the telecommunications industry&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;The first iPod had a speaker behind the click wheel. Its only function was to make clicking noises to give the user auditory feedback. Such a feature makes absolutely zero engineering sense. Can you see Microsoft doing this? I can’t. I can’t even see Google doing it.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;There is no other company that completely “gets it” in terms of consumer user experience.&lt;/em&gt;&lt;/strong&gt; No one is close. 
And before you decry Apple for its lack of Flash, the walled app garden, lack of “freedom” (whatever that means today), etc… nobody cares. That includes me.&lt;/p&gt;  &lt;p&gt;The point here is that Apple isn’t run by engineers. Apple is completely focused on product and user experience. Anything can be sacrificed to that end. If Apple were run like Microsoft, the iPad would be running some variant of MacOS X. But it doesn’t. Why? Because it makes no sense to put a full desktop OS based on a pixel-perfect pointer interface (ie the mouse) onto a low-power touchscreen device. Apple isn’t concerned about selling OS licenses. They’re focused on the product.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;I’m not against programmers running companies. Not at all. Google is a great example of what can be achieved when you give talented engineers the freedom to innovate. Microsoft suffers, I believe, because its leader has no understanding of how software development works and how programmers think.&lt;/p&gt;  &lt;p&gt;But both extremes can have a downside. This week it killed Google Wave. It’s food for thought as Google ramps up its next stab at social.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." 
border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/4q5xepOfDD0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/8509836162958462206/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/08/google-wave-microsoft-and-engineers.html#comment-form" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/8509836162958462206?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/8509836162958462206?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/4q5xepOfDD0/google-wave-microsoft-and-engineers.html" title="Google Wave, Microsoft and Engineers Running the Asylum" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>7</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/08/google-wave-microsoft-and-engineers.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D08FR3oycCp7ImA9WxFaEkg.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-1960314097210791831</id><published>2010-07-16T12:50:00.001+08:00</published><updated>2010-07-16T12:50:16.498+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-07-16T12:50:16.498+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="google" /><category scheme="http://www.blogger.com/atom/ns#" term="career" /><title>My Google Interview</title><content type="html">&lt;p&gt;In early June I was contacted by a 
Google recruiter to ask if I was interested in applying for an engineering role at Google. She had found me based on my &lt;a href="http://stackoverflow.com/users/18393/cletus"&gt;Stackoverflow profile&lt;/a&gt;. The position she was recruiting for was in Mountain View, California.&lt;/p&gt;  &lt;h3&gt;Reasons Not to Apply&lt;/h3&gt;  &lt;p&gt;I had never considered applying to Google for several reasons.&lt;/p&gt;  &lt;p&gt;The first is that I’m not a US citizen and I don’t have a green card. This means I would need a visa. For most people, this means the employer needs to sponsor you for a &lt;a href="http://travel.state.gov/visa/temp/types/types_1271.html"&gt;H-1B visa&lt;/a&gt;, which anyone familiar with them will tell you is a huge pain. Most employers simply aren’t interested or at least that’s my perception. I guess the larger employers will have the scale and in-house counsel to make this viable.&lt;/p&gt;  &lt;p&gt;Many US employers don’t actually know that it’s far, far easier to employ Australians than any other nationality. In fact it’s almost as easy as employing Canadians. Much like the &lt;a href="http://travel.state.gov/visa/temp/types/types_1274.html"&gt;TN (NAFTA) visa&lt;/a&gt;, Australians have the &lt;a href="http://canberra.usembassy.gov/e3visa.html"&gt;E-3 visa&lt;/a&gt; but most people don’t know this and employers tend to assume you need a H-1B. US Employers take note: the process for employing Australian nationals is easy.&lt;/p&gt;  &lt;p&gt;The second reason I never considered applying to Google is perhaps another misperception on my part. My view was that Googlers seem to fit a particular profile. That profile is of being a graduate of a top school (think Stanford, UW or MIT), typically in their mid to late 20s. That’s not to say all fit this profile but your path is certainly a lot easier if this is you. 
Again, I’m not claiming this is the case but it certainly was my perception.&lt;/p&gt;  &lt;p&gt;The third reason is that Google uses C++. They also use Python, Java and JavaScript but C++ is my particular point of contention. It’s been nearly a decade since I’ve used C++ in anger. Personally I consider it a &lt;strong&gt;&lt;em&gt;horrible&lt;/em&gt;&lt;/strong&gt; language. I will go so far as to call it an abomination.&lt;/p&gt;  &lt;p&gt;Others have argued this far better than I could, most notably Linus Torvalds, addressing it &lt;a href="http://thread.gmane.org/gmane.comp.version-control.git/57643/focus=57918"&gt;in 2007&lt;/a&gt; and &lt;a href="http://www.realworldtech.com/forums/index.cfm?action=detail&amp;amp;id=110618&amp;amp;threadid=110549&amp;amp;roomid=2"&gt;2010&lt;/a&gt;. Suffice it to say I find C to be a far more elegant and easy-to-understand language for low-level programming.&lt;/p&gt;  &lt;p&gt;You could argue that Java and Python can be used by many (most?) Googlers and it’s a reasonable position but one that I don’t think is correct based on my limited experience. More on this later.&lt;/p&gt;  &lt;h3&gt;Reasons to Apply&lt;/h3&gt;  &lt;p&gt;The timing of being contacted was somewhat strange and somewhat timely.&lt;/p&gt;  &lt;p&gt;In April, a story blew up on &lt;a href="http://news.ycombinator.com/"&gt;Hacker News&lt;/a&gt;, &lt;a href="http://programming.reddit.com/"&gt;proggit&lt;/a&gt; and elsewhere concerning an employee leaving &lt;a href="http://www.mahalo.com/"&gt;Mahalo&lt;/a&gt; for &lt;a href="http://yahoo.com"&gt;Yahoo&lt;/a&gt;. &lt;a href="http://calacanis.com/"&gt;Jason Calacanis&lt;/a&gt; (CEO of Mahalo) said some perhaps rash things, which the employee foolishly posted to his blog (soon thereafter taken down but you can’t put the genie back in the bottle). More &lt;a href="http://www.cforcoding.com/2010/04/how-to-resign-gracefully.html"&gt;here&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Jason tweeted about this. 
Entrepreneur-turned-VC &lt;a href="http://www.bothsidesofthetable.com/"&gt;Mark Suster&lt;/a&gt; chimed in, most notably with &lt;a href="http://www.bothsidesofthetable.com/2010/04/22/never-hire-job-hoppers-never-they-make-terrible-employees/"&gt;Never Hire Job Hoppers. Never. They Make Terrible Employees&lt;/a&gt;. This predictably caused an uproar in all the usual places.&lt;/p&gt;  &lt;p&gt;My view is Mark makes some good points but it’s too hardline. Let me give you some examples of why.&lt;/p&gt;  &lt;p&gt;I started out doing contract (short term) work because that’s all I could get. This all began with doing Perl CGI programming for an ISP that I was a telephone support person for and quickly transitioned into doing full-time programming (elsewhere). This was in part due to the fact that I was studying for my undergraduate degree part-time (so didn’t qualify for graduate programs), partly due to my location (most Australian companies are headquartered in Sydney or Melbourne so programming employment opportunities are disproportionately low in population terms in Perth) and partly due to timing (the early to mid-90s were still a post-recession period).&lt;/p&gt;  &lt;p&gt;Little did I know that this would brand me to some extent a contractor for life. In many organizations contractors are viewed as some combination of second-class citizen (eg I worked at one place that didn’t give internet access to contractors), necessary evil and disloyal mercenary. Most importantly, they are expendable and first to go in tough times. This last part of it I’m fine with because I viewed it as the ultimate meritocracy: you’d stay employed as long as you were valuable.&lt;/p&gt;  &lt;p&gt;In 2001 I moved to London, England and found this anti-contractor sentiment to be even more prevalent, which surprised me no end. 
So the cycle continued.&lt;/p&gt;  &lt;p&gt;Stays at various companies varied from 3 months to 3 years (on two occasions; one ending when the company fired all contractors and half the salaried staff due to financial woes and the other ending when the company was acquired and it became clear the software’s future was limited).&lt;/p&gt;  &lt;p&gt;But in this time I’ve been screwed over more times than I can recount.&lt;/p&gt;  &lt;p&gt;I’ve had a client threaten to sue and terminate a contract leaving months unpaid simply as a tactic to avoid paying. I’ve been denied payments by a recruitment agent I was entitled to simply because it was too expensive and time consuming (for me) to pursue legal action. I’ve been thrown under the bus in a political move by a manager who had a project going south and was looking for a scapegoat so his boss wouldn’t fire or replace him. I’ve had someone promise to pay only later realizing they were hiding behind an offshore shelter and never had any intention of paying. The list goes on.&lt;/p&gt;  &lt;p&gt;All in all I’ve probably lost $50,000 to $100,000 over the years from this kind of thing so when Mark (or anyone) likens this to a lack of loyalty, it’s fair to say it pisses me off.&lt;/p&gt;  &lt;p&gt;That’s not to say I’ve never done anything on reflection I probably shouldn’t have but hey we all make mistakes. I’ve never intentionally screwed anyone over this way.&lt;/p&gt;  &lt;p&gt;Perhaps the most erudite and eloquent take on this issue came from one of my favourite bloggers, &lt;a href="http://gilesbowkett.blogspot.com/"&gt;Giles Bowkett&lt;/a&gt;, in &lt;a href="http://gilesbowkett.blogspot.com/2010/05/job-hopping.html"&gt;Job-Hopping&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Sorry, tangent. Point is, a job-hopper is like Larry King. Larry King's about to have his 8th divorce. He has all these women asking him to marry them, he marries them, it doesn't work out, he moves on to the next one. 
I don't know Larry King and I wish him the best, but to me, it sure looks like every time he gets married, he's settling. If that wasn't the way it was, he would have fought to keep at least &lt;i&gt;one&lt;/i&gt; of those marriages intact. And that's kinda what's going on if you're a job-hopper. It means the companies want you bad, but you could care less about the companies. I hate to side with the VCs here, but if they want you more than you want them, maybe it means you need to aim higher.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I’ve spent too many years writing bullshit business software, wading through pointless process (eg spending a day in meetings to remove a comma; no I’m not kidding) and doing other brain dead nonsense. I’m tired of it. I’ve had enough. I’ve reached the point that I don’t want to do it anymore. It’s time to make a change even though I’m not exactly sure what that change is yet. It’s reached the point where I’d rather work for nothing on something remotely interesting than do this one more day.&lt;/p&gt;  &lt;p&gt;So this had been percolating in my brain. To bring this back to Google, that’s why I say the timing was strange. The project I’d been working on (which was turning into another that was running out of money) was on hiatus and I was looking to do something different.&lt;/p&gt;  &lt;p&gt;So I said yes I would apply.&lt;/p&gt;  &lt;h3&gt;Telephone Interviews&lt;/h3&gt;  &lt;p&gt;Let me first say it was hard to speak to them on the phone. By some quirk of geography, Mountain View, California and Perth, Western Australia are 16 hours apart (Perth is 16 hours ahead), which really limited times when they were at work and I was awake.&lt;/p&gt;  &lt;p&gt;The recruiter told me there would be a couple of phone interviews over the next couple of weeks. 
I had one the next week that went through a couple of questions.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; for this and the on-site interviews I’m &lt;strong&gt;&lt;em&gt;not&lt;/em&gt;&lt;/strong&gt; going to reveal the exact content. For the on-sites I signed an NDA but more importantly (at least to me) I said I wouldn’t.&lt;/p&gt;  &lt;p&gt;All I’ll say is that Google’s position seems to be that they want to assess your knowledge of computer science fundamentals and problem solving ability. If you Google “google interview questions” you’ll find the kind of problems that require some kind of recursive or other divide-and-conquer type technique. This makes sense and this &lt;em&gt;theme&lt;/em&gt; gelled with my experience overall.&lt;/p&gt;  &lt;p&gt;I got through two questions even though I stumbled somewhat through the second. I was trying to visualize the solution and that was the problem. As soon as I took a piece of paper and drew a diagram the solution was obvious. That’s just me: I like to whiteboard/diagram rather than trying to mentally visualize.&lt;/p&gt;  &lt;p&gt;It took until the next week to hear back. I’d been expecting a second phone screen. As it turned out they wanted to arrange a series of on-site interviews at their office in Sydney. I took this as a good sign: apparently I didn’t require a second idiot test.&lt;/p&gt;  &lt;h3&gt;Preparation&lt;/h3&gt;  &lt;p&gt;My field guide for this was &lt;a href="http://steve-yegge.blogspot.com/"&gt;Steve Yegge’s&lt;/a&gt; &lt;a href="http://steve-yegge.blogspot.com/2008/03/get-that-job-at-google.html"&gt;Get that job at Google&lt;/a&gt;. It’s good advice, not just for Google but also for being a well-rounded programmer.&lt;/p&gt;  &lt;p&gt;Like I said, I’ve spent a lot of time writing bullshit business software. Frankly my day-to-day usage of graphs, dynamic programming and balanced trees is, well, almost nonexistent. So it was time to brush up. 
And brush up I did.&lt;/p&gt;  &lt;p&gt;I did some thinking about Steve’s post being over two years old. The one thing missing from this is &lt;em&gt;language theory&lt;/em&gt; (compilers, grammars, lexing/parsing and so on). Since Steve’s post Google has added:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;V8: a Javascript engine;&lt;/li&gt;    &lt;li&gt;Go: programming language; and&lt;/li&gt;    &lt;li&gt;Unladen Swallow: an LLVM port of CPython aimed at hugely speeding it up.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;So I concluded language theory is important but luckily I’d been doing that anyway as part of my (sadly stalled) Markdown project.&lt;/p&gt;  &lt;h3&gt;On-Site Interviews&lt;/h3&gt;  &lt;p&gt;A couple of weeks later I was flown to Sydney for 4 hours of back-to-back interviews. It’s been some years since I’ve been in Sydney so this was enjoyable anyway.&lt;/p&gt;  &lt;p&gt;&lt;img style="width: 550px; display: block; float: none; margin-left: auto; margin-right: auto" src="http://img198.imageshack.us/img198/4213/photogyzh.jpg" /&gt;&lt;/p&gt;  &lt;p&gt;I am primarily a Java developer, more by circumstance than planning. I transitioned to Java in the late 90s from doing Perl, C and C++. Since then I’ve dabbled in many languages including Javascript, C#, Haskell, Ruby, Python, PHP and others.&lt;/p&gt;  &lt;p&gt;The first problem I encountered was that at least two of my interviewers had only passing or no familiarity with Java. This made it particularly difficult as some concepts are unique to one language.&lt;/p&gt;  &lt;p&gt;But everyone knew C++. I’ve read about this before. This combined with my own experience now leads me to believe that C++ isn’t optional for any Google applicant. Not because you need to use it to work there. I have no direct experience of this. But because of “interviewer lottery”. Some at Google (it seems) do nothing but C++. You might be interviewed by one of these people.&lt;/p&gt;  &lt;p&gt;I had to write several code segments on a whiteboard. 
This I expected and was fine with. I realize this is necessary (see &lt;a href="http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html"&gt;Why Can't Programmers.. Program?&lt;/a&gt; and &lt;a href="http://www.codinghorror.com/blog/2010/02/the-nonprogramming-programmer.html"&gt;The Non-Programming Programmer&lt;/a&gt;) and have no problem with it but it depends on what kind of problem you ask. Simple is usually best.&lt;/p&gt;  &lt;p&gt;Two of the problems I had were extremely finicky to solve. In one I think the interviewer was understanding and simply wanted to determine whether I understood the relevant contract and could see what the issues were, more than whether I could code a completely correct solution (which I appreciated). Another interview got cut short before an efficient solution could be developed and I really don’t think it’s the kind of problem that lends itself to writing a code solution in 40 minutes. By this I mean I believe it would be more valuable to speak about the algorithms and problems involved as the code for an efficient solution would be quite complex.&lt;/p&gt;  &lt;p&gt;The theme of problem solving and analyzing thought processes remained constant throughout.&lt;/p&gt;  &lt;p&gt;For some reason I had expected there would be a lunch break (10-2). There wasn’t. By the end it was 3pm (it ran over time), I was mentally exhausted and I hadn’t eaten since the day before. The next day I returned home.&lt;/p&gt;  &lt;h3&gt;Aftermath&lt;/h3&gt;  &lt;p&gt;I was told the results would be reviewed and I would hear from them in two weeks.&lt;/p&gt;  &lt;p&gt;Two weeks rolled around. A time for a phone call was organized. That phone call was rather short: “strong but not strong enough for this particular position” was the crux of it.&lt;/p&gt;  &lt;p&gt;I’d be lying if I said I wasn’t disappointed. I’d actually gone into the whole thing fairly neutral to begin with in that I wasn’t mad keen on working for Google but was open to the possibility. 
But as time went on and I spoke to a few Googlers, I got more excited about it. I started to hope it would work out.&lt;/p&gt;  &lt;p&gt;I actually thought the on-sites went quite well and I did expect it to progress to the next level (whatever that would’ve been I’m not sure) but my hopes were dashed.&lt;/p&gt;  &lt;p&gt;The recruiter told me no specific feedback would be given (I had asked what the weaknesses were). The one outright bizarre thing I was told was I was “too Microsoft-centric” (direct quote). Apart from dabbling in C# I haven’t programmed for Windows since Visual Studio 6 in 2000. It made me wonder if they were looking at the right candidate. And if I’m too Microsoft-centric, what is &lt;a href="http://msmvps.com/blogs/jon_skeet/Default.aspx"&gt;Jon Skeet&lt;/a&gt;?&lt;/p&gt;  &lt;h3&gt;Feedback&lt;/h3&gt;  &lt;p&gt;I have obviously been supervised in the workplace, I’ve supervised other developers, I’ve done recruiting for companies and obviously been recruited. An important theme through all of these is the need for feedback. People need to know what they’re doing right and what they’re doing wrong.&lt;/p&gt;  &lt;p&gt;In high school in Western Australia we have (had?) a thing called the TEE (Tertiary Entrance Exams). In year 12 (the last year of high school) you have a set of exams. Your coursework for the year and your exam results are added up with equal weighting (so 50-50), massaged through a bewildering array of scaling formulae and come out as a score that is used for university admission. In Australia we don’t have the same system as the US of application essays, admissions committees looking at your extracurricular interests and so on. It’s all about that number.&lt;/p&gt;  &lt;p&gt;Anyway to prepare for our TEE exams in all my subjects we went through old exam papers. For Maths I did the exam papers for the previous &lt;em&gt;ten years&lt;/em&gt;. 
Our teacher would go through the problems and we could see what we did right and what we did wrong. This was &lt;em&gt;incredibly&lt;/em&gt; valuable. Exams may test knowledge but taking exams is also a skill. If you’ve seen how to solve 50 problems and gone through the process of doing so you’re much more likely to be able to apply that knowledge to future problems. There are only so many ways you can state an integral calculus problem on an exam paper.&lt;/p&gt;  &lt;p&gt;The Soviet Union dominated the chess world for much of the twentieth century. The Russian approach was to teach students Chess through old games.&lt;/p&gt;  &lt;p&gt;When I got to university this all changed. You couldn’t keep the exam paper. Previous exam papers were never examined. Your exam was never given back to you so you could see what you got right and what you got wrong. The exam problems were never solved in class so you could see the right and wrong ways to go about it. This was probably so lecturers could reuse papers year after year rather than writing new papers but ultimately I considered it then (and still consider it) a failing.&lt;/p&gt;  &lt;p&gt;Recently I did a Masters degree (in quantitative finance) and found it to be even worse: there were deliberate holes in your lecture notes about applying that knowledge to particular problems and you could bet those would be what shows up in the exam.&lt;/p&gt;  &lt;p&gt;Why I consider this a failing is that if you’re truly meant to learn something you should concentrate on (and be told) what you don’t know, what you got wrong.&lt;/p&gt;  &lt;p&gt;This is what I mean by feedback.&lt;/p&gt;  &lt;p&gt;I don’t even know what position I was applying for. I don’t know what the requirements were. I don’t know what my perceived strengths and weaknesses were. So while I enjoyed much of the actual process it did require a significant time investment and the value proposition for me, as a candidate, is very low. 
I’ve been told that my application may be reevaluated in a year. The recruiter told me they’d like to keep me on file. I don’t know if this means anything but I’m assuming not simply because it’s the kind of “let’s still be friends” lip service companies tend to offer candidates.&lt;/p&gt;  &lt;p&gt;One person asked if I would reapply. My inclination at this point is to say “no”. This might still be the disappointment talking (and I am disappointed) but a year is a long time. The timing worked out well this time around but the chances of it doing so a year from now are, well, slim. Taking a few days off to interview again, given the low value proposition of the process, may very well be hard to justify. At the risk of tooting my own horn, I tend to be free as I am now by choice rather than circumstance. It’s simply a question of finding something that doesn’t bore me shitless.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;This has been a long post (now over 3,000 words), worthy of the best of Steve Yegge in length if not content.&lt;/p&gt;  &lt;p&gt;My view of Google has changed. I don’t have the same view of Google being a Mecca for Ivy-shrouded twentysomethings. It does seem like they really are interested in talent in many forms.&lt;/p&gt;  &lt;p&gt;But the process does seem to be a lottery to some extent. I don’t know what the selection criteria are but it wouldn’t surprise me if 4 (or even all 5) of your 5 interviewers need to give you a thumbs up for you to go on. This may represent the “anarchic” (even haphazard) culture of Google itself. I really don’t know.&lt;/p&gt;  &lt;p&gt;I do get the impression that references from Googlers count for a lot.&lt;/p&gt;  &lt;p&gt;I’ve read that Google receives some 1,500 CVs a day and they do on-site interviews for fewer than 1% of these so perhaps I should be flattered and should just take the experience for what it was. To some extent I do. 
I’m simply not sure I feel the need to repeat it next year.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/1iq4twARxl8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/1960314097210791831/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/07/my-google-interview.html#comment-form" title="43 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1960314097210791831?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1960314097210791831?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/1iq4twARxl8/my-google-interview.html" title="My Google Interview" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>43</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/07/my-google-interview.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkMNR307fyp7ImA9WxFRFUg.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-648307402506764118</id><published>2010-04-29T23:21:00.001+08:00</published><updated>2010-04-29T23:21:36.307+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-04-29T23:21:36.307+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><title>Google’s Position on Flash is as Bad as 
Apple’s</title><content type="html">&lt;p&gt;Steve Jobs has been remarkably talkative of late. Most recently, he posted &lt;a href="http://www.apple.com/hotnews/thoughts-on-flash/"&gt;Thoughts on Flash&lt;/a&gt; and before that he &lt;a href="http://www.wired.com/gadgetlab/2010/03/steve-jobs-e-mails/"&gt;randomly (yet tersely) replied to seemingly random emails&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Jobs actually makes some interesting points in this post, though there is something hysterically funny about Apple criticizing Flash for not being “open” (as Bart would say, the &lt;a href="http://www.imdb.com/title/tt0701101/quotes"&gt;ironing is delicious&lt;/a&gt;). Once you get past that, the comments about video decoding in software, battery life, etc. all fit Jobs’ famous unwavering commitment to his product vision. The battery performance of the &lt;a href="http://www.pcworld.com/businesscenter/article/193580/ipad_battery_tests_and_application_performance.html"&gt;iPad&lt;/a&gt; and &lt;a href="http://www.anandtech.com/show/3669/apples-15inch-2010-macbook-pro-more-battery-life-tests-display-evaluated/2"&gt;MacBooks&lt;/a&gt; is almost legendary.&lt;/p&gt;  &lt;p&gt;Lack of Flash has become a rallying cry for iPhone malcontents and Android proponents of late. Barely a day goes by without it coming up on &lt;a href="http://www.cnet.com/buzz-out-loud-podcast/"&gt;Buzz Out Loud&lt;/a&gt;, particularly from Molly Wood and Jason Howell. Such people typically praise Google’s openness with its adoption of Flash.&lt;/p&gt;  &lt;p&gt;Last month, Google announced a partnership with Adobe, one effect of which is that &lt;a href="http://blog.chromium.org/2010/03/bringing-improved-support-for-adobe.html"&gt;Flash will be bundled with Chrome (and later Chrome OS)&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;This got me thinking. I don’t like Flash. I don’t want it in my browser. Flash tends to be abused (by advertisers). 
Some have &lt;a href="http://www.wired.com/epicenter/2009/08/you-deleted-your-cookies-think-again/"&gt;used Flash to restore deleted tracking cookies&lt;/a&gt;, which is a huge privacy (even security) concern.&lt;/p&gt;  &lt;p&gt;I have previously tried to remove Flash from Chrome (my preferred browser). You can remove it but then every page you go to tells you you’re missing necessary plug-ins and asks if you want to install them. Is there an option to disable this message/request? No.&lt;/p&gt;  &lt;p&gt;As of Chrome 3 extensions have come to the rescue and &lt;a href="http://www.chromeextensions.org/appearance-functioning/flashblock/"&gt;Flashblock&lt;/a&gt; is a must-have. Still, this solution is not completely satisfactory. Some Web sites will put extra Flash widgets (sometimes as small as a pixel) to defeat Flash blockers. Click on one of these regions/pixels and you’ve just run some Flash you probably didn’t mean to run.&lt;/p&gt;  &lt;p&gt;I can understand the criticisms of Apple’s position even though I personally want to see Flash die a horribly fiery death. But Google isn’t giving me a choice either. Instead of allowing me to have Flash if I wish, it’s forcing me to have Flash whether I like it or not (and, trust me, I don’t).&lt;/p&gt;  &lt;p&gt;Now that Youtube has HTML5 video, I don’t need Flash for anything. Why won’t Google give me that choice?&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." 
border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/6wrZClSd7-g" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/648307402506764118/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/04/googles-position-on-flash-is-as-bad-as.html#comment-form" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/648307402506764118?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/648307402506764118?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/6wrZClSd7-g/googles-position-on-flash-is-as-bad-as.html" title="Google’s Position on Flash is as Bad as Apple’s" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>10</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/04/googles-position-on-flash-is-as-bad-as.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkUAQ345fCp7ImA9WxFREkg.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-4041924400282650062</id><published>2010-04-26T09:39:00.001+08:00</published><updated>2010-04-26T13:04:02.024+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-04-26T13:04:02.024+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><title>How to Resign Gracefully</title><content type="html">&lt;p&gt;An engineer resigned this week from an LA startup. 
This otherwise insignificant event turned into a big story when that engineer posted the exchange with his boss on his blog. It’s a lesson in human nature and how to comport oneself in a business environment.&lt;/p&gt;  &lt;h3&gt;Background&lt;/h3&gt;  &lt;p&gt;The resignation of an engineer from &lt;a href="http://www.mahalo.com/"&gt;Mahalo&lt;/a&gt; and the subsequent email exchange between that engineer and the CEO, the high-profile &lt;a href="http://en.wikipedia.org/wiki/Jason_Calacanis"&gt;Jason Calacanis&lt;/a&gt;, became news this week. It all started with &lt;a href="http://twitter.com/Jason/status/12621363849"&gt;this tweet&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Free advice for entitled Gen Y trophy kids: if you spend 12 months at a company over and over you look like a flake.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The &lt;a href="http://ezinearticles.com/?Just-Say-No-to-Participation-Trophies&amp;amp;id=3381772"&gt;“trophy kid”&lt;/a&gt; remark refers to a previous statement by Jason about the trend of Gen-Y now getting trophies or awards for participation, basically for just showing up.&lt;/p&gt;  &lt;p&gt;In response, prominent venture capitalist and &lt;a href="http://thisweekin.com/thisweekin-venture-capital/"&gt;This Week In Venture Capital&lt;/a&gt; host &lt;a href="http://www.bothsidesofthetable.com/about-2/"&gt;Mark Suster&lt;/a&gt;&amp;#160;&lt;a href="http://twitter.com/msuster/status/12621573518"&gt;tweeted&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;.@&lt;a href="http://twitter.com/jason"&gt;jason&lt;/a&gt; @&lt;a href="http://twitter.com/tonyadam"&gt;tonyadam&lt;/a&gt; - I never hire job hoppers. Never. They never make good employees. Jason was spot on.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;and then posted the somewhat controversial &lt;a href="http://www.bothsidesofthetable.com/2010/04/22/never-hire-job-hoppers-never-they-make-terrible-employees/"&gt;Never Hire Job Hoppers. Never. 
They Make Terrible Employees&lt;/a&gt;, which was later tempered with &lt;a href="http://www.bothsidesofthetable.com/2010/04/25/job-hoppers-redux-an-employees-perspective/"&gt;Job Hoppers Redux: An Employee’s Perspective&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;It became clear that this referred to the resignation of one &lt;a href="http://www.linkedin.com/in/eculver"&gt;Evan Culver&lt;/a&gt; when he posted the email exchange on his blog (now removed). TechCrunch posted the exchange in &lt;a href="http://techcrunch.com/2010/04/24/how-not-to-handle-a-resignation-gracefully/trackback/"&gt;How Not To Handle A Resignation Gracefully&lt;/a&gt;, which has triggered a firestorm of response, much of it directed at Jason and allegedly much of it has been removed by the moderators.&lt;/p&gt;  &lt;h3&gt;The Facts&lt;/h3&gt;  &lt;p&gt;Evan’s email says:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;This isn’t an easy email to write, but as the subject suggests, this email is to inform you of my resignation from Mahalo effective in 2 weeks.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This email was sent to &lt;a href="http://www.linkedin.com/pub/jacob-m-burch/17/235/671"&gt;Jacob Burch&lt;/a&gt; (Director of Technology), &lt;a href="http://www.linkedin.com/in/jammons"&gt;Jeff Ammons&lt;/a&gt; (Developer) and Jason. 
It appears Jason was out of the office (his reply is from his Blackberry) and it’s alleged in the TechCrunch comments that Evan resigned to the CTO (Mark Jeffrey).&lt;/p&gt;  &lt;p&gt;California is an at-will employment state so barring any relevant contractual terms, no notice is required to quit and no reason is required to fire someone (barring legal issues such as discrimination).&lt;/p&gt;  &lt;p&gt;Evan’s email was polite but otherwise perfunctory.&lt;/p&gt;  &lt;p&gt;Jason addresses this issue in &lt;a href="http://thisweekin.com/thisweekin-startups/twist-49-with-sky-dayton/"&gt;This Week In Startups #49&lt;/a&gt; saying that he liked the guy and that, two weeks prior, Evan had been promoted into a management position.&lt;/p&gt;  &lt;p&gt;That being said, let me give you some advice.&lt;/p&gt;  &lt;h3&gt;Showing Up Is Not Enough&lt;/h3&gt;  &lt;p&gt;It’s about what you do, what you’ve achieved. Nobody cares if you simply showed up. This is the tragedy of the modern education system in that it rewards participation not winning. Whether it be children, employees or whoever, rewarding mere participation does them a huge disservice and creates an entitlement culture.&lt;/p&gt;  &lt;h3&gt;You Will Get Yelled At&lt;/h3&gt;  &lt;p&gt;A lot of comments on TechCrunch revolved around being treated badly. If you’re &lt;em&gt;lucky&lt;/em&gt; you have a boss that’s passionate about what they’re doing. If so, such bosses will get heated and yell because they &lt;em&gt;care&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;Getting treated badly is actually having a boss who is completely indifferent. 
At that point you’re simply a square on an org chart and a line item on a budget, utterly expendable and replaceable.&lt;/p&gt;  &lt;p&gt;This shouldn’t be taken as &lt;em&gt;carte blanche&lt;/em&gt; for employee abuse but nor should isolated incidents of being yelled at be taken for abuse.&lt;/p&gt;  &lt;h3&gt;Man Up (In Person)&lt;/h3&gt;  &lt;p&gt;Apologists will argue that in the age of modern communication, it’s OK to resign by email. Let me be absolutely clear: &lt;em&gt;it absolutely is not.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;You walk into your boss’s office and say “I’m not happy because of …” or “I’ve been offered this opportunity to do …” or whatever the case is. Give your boss a chance to respond. This isn’t about making a play for more money. It’s about respect. Even if you have no intention of staying, just by giving your boss a chance to respond, and by doing so in person, you’ve shown that person the respect they probably deserve.&lt;/p&gt;  &lt;p&gt;They’re not in the office? You wait a few days until they are. The new job can’t wait? Bullshit. Or, if true, it’s a good sign that it’s an organization you don’t want to work for because they don’t care about you.&lt;/p&gt;  &lt;p&gt;Most of all, be honest. If it’s more money you want or need, say so. If you simply don’t like it where you are or you think it’s a mistake, say so.&lt;/p&gt;  &lt;h3&gt;A Startup is not a Large Company&lt;/h3&gt;  &lt;p&gt;The vast majority of startups are small. That means that each person is much more valuable and much harder to replace. What’s more, most employees will have some kind of equity stake in the company. Contrast this with a large company where you tend to be a small cog in a very large machine and infinitely replaceable. 
You can’t compare the two experiences.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://cdixon.org/about.html"&gt;Chris Dixon&lt;/a&gt; posted &lt;a href="http://cdixon.org/2009/10/23/twelve-months-notice/"&gt;Twelve months notice&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Generally speaking, there are two approaches to relating to other people in the business world. The first approach is transactional and legalistic:&amp;#160; work is primarily an exchange of labor for money, and agreements are made via contracts.&amp;#160;&amp;#160; Enforcement is provided by organizations, especially the legal system.&amp;#160; The second approach relies on trust, verbal agreements, reputation and norms, and looks to the community to provide enforcement when necessary.&lt;/p&gt;    &lt;p&gt;In the startup world, the latter approach dominates…&lt;/p&gt;    &lt;p&gt;For this reason, if you are an employee working at a startup where the managers are honest, inclusive and fair, you should disregard everything you’ve learned about proper behavior from people outside of the startup world.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;So ignore any comments about the “at-will” issue. It’s irrelevant.&lt;/p&gt;  &lt;h3&gt;Never Ever Embarrass Your Boss&lt;/h3&gt;  &lt;p&gt;This is Evan’s biggest faux pas: posting the email exchange on his blog. Note the self satisfied:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;I should note, that instead of responding, he instead removed my email account. Real pro of him. Good thing I forwarded it to myself first :P&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Make no mistake: this is &lt;em&gt;deplorable&lt;/em&gt; behaviour. Had it remained private, which it should’ve, Jason may have calmed down and mellowed about the situation over time. 
As it stands, he would &lt;em&gt;rightfully &lt;/em&gt;be incensed because this has become a news story.&lt;/p&gt;  &lt;p&gt;Worse for Evan: any future employer will find this story on a Google search and it makes him look really bad.   &lt;br /&gt;&lt;/p&gt;  &lt;h3&gt;Barred From The Office&lt;/h3&gt;  &lt;p&gt;When someone resigns or is fired it is not uncommon to pay them for their notice period and send them home immediately. Frankly I wish more companies would do this.&lt;/p&gt;  &lt;p&gt;Employees that are fired—especially programmers and other IT people—can be a security risk as they can do a lot of damage. That rarely happens but it is an issue. What’s more common is soon-to-be former employees can be disruptive and drain the morale of the team that’s staying. It’s often better to simply tie things off cleanly.&lt;/p&gt;  &lt;p&gt;In TWIST #49, Jason also mentioned the salient point that Yahoo (the company Evan is apparently joining) is a competing company to Mahalo. They’re both search companies with Q&amp;amp;A platforms.&lt;/p&gt;  &lt;p&gt;Some tried to turn this into an issue about unlawfully withholding belongings. I can &lt;em&gt;guarantee&lt;/em&gt; you that if there was anything urgent there (eg prescription medication) that he would’ve gotten that ASAP. Otherwise his stuff would be put in a box and either couriered or delivered to the lobby for his collection in a timely manner.&lt;/p&gt;  &lt;p&gt;An employer is well within their rights to bar you from the premises.&lt;/p&gt;  &lt;h3&gt;A Final Point About Human Nature&lt;/h3&gt;  &lt;p&gt;There is a key observation you can make from the comments on this about human nature: the majority of people will start with a conclusion and then look for facts to support that conclusion.&lt;/p&gt;  &lt;p&gt;A vocal minority really doesn’t like Jason. So what? How is that relevant? You don’t like Mahalo either? How is that relevant? It isn’t. 
This story for many has simply become another opportunity to bash Jason and grind whatever axe it is you feel the need to grind.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;This story that never should’ve been a story is a good opportunity to learn a few lessons about conducting oneself in a professional manner.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/KIFF_6-jupw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/4041924400282650062/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/04/how-to-resign-gracefully.html#comment-form" title="22 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4041924400282650062?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4041924400282650062?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/KIFF_6-jupw/how-to-resign-gracefully.html" title="How to Resign Gracefully" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>22</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/04/how-to-resign-gracefully.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;CEYGQ3g9fyp7ImA9WxFSEkQ.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-971103012884909688</id><published>2010-04-13T09:53:00.001+08:00</published><updated>2010-04-15T08:42:02.667+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-04-15T08:42:02.667+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><title>The American’s Guide to the Australian Internet Filter</title><content type="html">&lt;p&gt;Australia has been in the tech news for all the wrong reasons in the last year: for trying to impose the kind of internet filter used more commonly by dictatorships and communist countries (eg China). Yet the coverage seems to miss the reasons behind it so, speaking as an Australian, I will explain.&lt;/p&gt;  &lt;h3&gt;The Boring Stuff First (yes, it’s relevant!)&lt;/h3&gt;  &lt;p&gt;Australia’s political system is a mixture of the British Westminster system and the American system. Similar to the United States, Australia has a dual-sovereignty federal system. There are six states and two territories.&lt;/p&gt;  &lt;p&gt;The House of Representatives is analogous to Congress. There are 150 seats (or electorates) much like congressional districts. Whichever party has the most seats in the House of Representatives forms government. The leader of that party becomes the Prime Minister, the head of government. In certain cases where no party can form a majority (which can happen with minor parties and independent members) a minority government can form but this isn’t terribly common and tends to be unstable.&lt;/p&gt;  &lt;p&gt;The Senate is representation for the states. Each state gets 12 senators and each territory 2, for a total of 76.&lt;/p&gt;  &lt;p&gt;The nominal head of state is the Governor-General. Officially he or she is the representative of the British monarch. 
Constitutionally, the Governor-General has several powers but the role is largely ceremonial. Even so, the Governor-General can sack the government and dissolve Parliament, &lt;em&gt;and has&lt;/em&gt; (see &lt;a href="http://en.wikipedia.org/wiki/1975_Australian_constitutional_crisis"&gt;the 1975 dismissal of Gough Whitlam&lt;/a&gt;). The Prime Minister nominates the Governor-General. The British monarch basically rubber-stamps that choice.&lt;/p&gt;  &lt;p&gt;All 150 seats in the House are contested every election, which occurs &lt;em&gt;approximately&lt;/em&gt; every three years. I say “approximately” because whereas the United States has constitutionally mandated election dates, the Prime Minister can ask the Governor-General to dissolve Parliament and call an election at any time. It is simply required to happen &lt;em&gt;at least&lt;/em&gt; every three years. Snap elections when the government is enjoying a surge in popularity to increase the majority in the House are not uncommon.&lt;/p&gt;  &lt;p&gt;The United States and a number of other countries use a voting system commonly referred to as First Past the Post. The candidate with the most votes wins, even without a majority. Some jurisdictions instead hold a runoff election between the top two candidates when nobody secures a majority.&lt;/p&gt;  &lt;p&gt;Australia uses a preferential voting system. You can say on your ballot that your preferences are:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;John Smith (Greens) &lt;/li&gt;    &lt;li&gt;Mary Miler (ALP) &lt;/li&gt;    &lt;li&gt;Gary Mason (Liberal) &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;What this means is that if no candidate gets a majority of first preferences there is an elimination system in place. The candidate with the fewest first preferences is eliminated. All votes for that candidate go to the &lt;em&gt;second preference&lt;/em&gt; of each voter and so on until someone has a majority. 
It’s a remarkably elegant system that means a vote for a minor party is not a wasted vote as it is in the US.&lt;/p&gt;  &lt;p&gt;The Senate is a little different. Half the Senate seats are contested each election. In most cases this means six seats in each state. The same preferential system is used but instead of securing a majority a candidate merely needs roughly 14% of the distributed preferences (just over one seventh when six seats are contested) to secure a seat. This means minor parties are represented much more in the Senate.&lt;/p&gt;  &lt;p&gt;As in the US, a bill has to be passed by both houses of parliament to become law. There is no powerful executive branch however. Nor is there a Senate filibuster mechanism.&lt;/p&gt;  &lt;h3&gt;It’s the Politics, Stupid&lt;/h3&gt;  &lt;p&gt;Federal Government is currently held by the Australian Labor Party (ALP), which philosophically is most like the Democrats in the United States.&lt;/p&gt;  &lt;p&gt;The Republican equivalent is made up of two parties: the Liberal Party and the National Party, which together form what is most often referred to as the Coalition. The National Party holds rural seats (that the Liberal Party does not contest) and is focused on issues that affect “the bush”.&lt;/p&gt;  &lt;p&gt;The biggest minor party is the Greens, who are focused on environmental issues.&lt;/p&gt;  &lt;p&gt;The Government by definition holds a majority of the House of Representatives, so passing bills through the HoR typically isn’t an issue. Yet it can and does happen that members cross the floor to vote with the other side, allowing the opposition to pass bills or causing government bills to fail.&lt;/p&gt;  &lt;p&gt;The Senate has to pass the same bill and here’s where it gets interesting. It is &lt;em&gt;highly unusual&lt;/em&gt; for one party to hold a Senate majority because only ~14% of the preferential vote is required to get a Senate seat. Yet this did happen for the previous Coalition government. 
The current makeup is:&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="2" width="400"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="200"&gt;&lt;strong&gt;Party&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="200"&gt;&lt;strong&gt;Seats&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="200"&gt;ALP&lt;/td&gt;        &lt;td valign="top" width="200"&gt;32&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="200"&gt;Coalition&lt;/td&gt;        &lt;td valign="top" width="200"&gt;37&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="200"&gt;Greens&lt;/td&gt;        &lt;td valign="top" width="200"&gt;5&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="200"&gt;Independent (no party)&lt;/td&gt;        &lt;td valign="top" width="200"&gt;1&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="200"&gt;Family First&lt;/td&gt;        &lt;td valign="top" width="200"&gt;1&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;In recent years the Senate has held a disproportionate amount of power because the independents and minor parties have held the balance of power.&lt;/p&gt;  &lt;p&gt;Assuming the Greens vote with the ALP, which is most often but not always the case, the votes are split 37-37 with 39 required to pass a bill (a tied vote in the 76-seat Senate fails). That means the government needs to attract both Nick Xenophon (independent) and Steve Fielding (Family First) to pass bills in a strict party line vote, which happens a lot.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Family_First_Party"&gt;Family First&lt;/a&gt; is a relatively new political force in Australia. Their politics are as the name suggests. 
The Coalition tends to be sensitive to some of the same issues.&lt;/p&gt;  &lt;p&gt;This results in a virtual quid pro quo where those holding the balance of power will vote with the Government in exchange for the Government passing legislation friendly to their pet issues. Child safety on the internet and pornography are high on that list.&lt;/p&gt;  &lt;p&gt;The previous Coalition government of John Howard had a similar phase when &lt;a href="http://en.wikipedia.org/wiki/Brian_Harradine"&gt;Brian Harradine&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Mal_Colston"&gt;Mal Colston&lt;/a&gt; held the balance of power.&lt;/p&gt;  &lt;p&gt;So first and foremost it’s a numbers game.&lt;/p&gt;  &lt;h3&gt;The Politicians’ Syllogism&lt;/h3&gt;  &lt;p&gt;ZDNet covered this in &lt;a href="http://www.zdnet.com.au/internet-filter-according-to-yes-minister-339302319.htm"&gt;Internet filter according to Yes Minister&lt;/a&gt;. Politicians—including our so-called &lt;em&gt;Minister for Broadband, Communications and the Digital Economy&lt;/em&gt;, &lt;a href="http://en.wikipedia.org/wiki/Stephen_Conroy"&gt;Stephen Conroy&lt;/a&gt;—don’t understand the internet. The logic goes something like this:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The internet contains bad things; &lt;/li&gt;    &lt;li&gt;A filter can stop bad things; &lt;/li&gt;    &lt;li&gt;Therefore we must filter the internet. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;&lt;em&gt;I am not kidding&lt;/em&gt;. This is the logic.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;Coverage on the Web typically covers the second point: that politicians generally don’t understand the internet. That explains why a handful want this but it ignores the larger issue that the numbers in the Senate are what’s driving this forward.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." 
border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/yp5bBDzm47s" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/971103012884909688/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/04/americans-guide-to-australian-internet.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/971103012884909688?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/971103012884909688?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/yp5bBDzm47s/americans-guide-to-australian-internet.html" title="The American’s Guide to the Australian Internet Filter" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/04/americans-guide-to-australian-internet.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEMBR3gzeSp7ImA9WxBVE00.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-3948045478498721299</id><published>2010-02-16T11:45:00.001+08:00</published><updated>2010-02-16T15:47:36.681+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-16T15:47:36.681+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="stackoverflow" /><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><title>Stackoverflow: Joel and Jeff want VC Money? 
Say What?</title><content type="html">&lt;p&gt;The big news today is that Stackoverflow—started by &lt;a href="http://www.joelonsoftware.com/"&gt;Joel Spolsky&lt;/a&gt; and &lt;a href="http://www.codinghorror.com/blog/"&gt;Jeff Atwood&lt;/a&gt; as a programming Q&amp;amp;A site almost 18 months ago—is &lt;a href="http://www.joelonsoftware.com/items/2010/02/14.html"&gt;now looking for VC money&lt;/a&gt;. This is huge and deeply worrying. And it raises a whole raft of questions.&lt;/p&gt;  &lt;h3&gt;Vertical Growth&lt;/h3&gt;  &lt;p&gt;Stackoverflow has grown to be probably the largest programming Q&amp;amp;A site on the internet in its short life, supplanting the “evil hyphen site”, to be just outside the top 1000 sites having &lt;a href="http://www.quantcast.com/stackoverflow.com"&gt;over 4.5 million visitors a month&lt;/a&gt;. While it continues to grow, there’s only so big it can get because there are only so many programmers.&lt;/p&gt;  &lt;p&gt;Joel says:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;In 18 months we’ve accomplish that: we’ve got 6 million unique visitors every month.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; this figure includes Superuser (1M) and Serverfault (730K).&lt;/p&gt;  &lt;p&gt;The issue of course is how to turn this traffic into revenue sufficient to cover the site’s running costs, development of the site and profit for its owners. Programmers are a hard group to monetize and you can see Joel and Jeff struggle with this when it comes to the usual method: advertising. 
See &lt;a href="http://blog.stackoverflow.com/2009/03/responsible-advertising-feed-a-programmer/"&gt;Responsible Advertising: Feed a Programmer&lt;/a&gt;, &lt;a href="http://blog.stackoverflow.com/2009/11/our-amazon-advertising-experiment/"&gt;Our Amazon Advertising Experiment&lt;/a&gt; and &lt;a href="http://blog.stackoverflow.com/summary-of-amazon-remnant-ad-experiment/"&gt;Summary of Amazon Remnant Ad Experiment&lt;/a&gt;.&lt;/p&gt;  &lt;h3&gt;Horizontal Growth&lt;/h3&gt;  &lt;p&gt;It’s natural for companies that exhaust opportunities in their home markets to look at other markets that are related somehow, fuelled by (sometimes justified) paranoia that if they stop growing they’ll die or simply the need for incessant growth.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.businessinsider.com/chart-of-the-day-microsoft-operating-income-by-division-2010-2"&gt;&lt;img style="width: 510px" title="Microsoft Operating Profits By Division" alt="Microsoft Operating Profits by Division" src="http://static.businessinsider.com/image/4b7337bc0000000000a10a91/chart-of-the-day-msft-operating-profit.gif" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Look at &lt;a href="http://www.businessinsider.com/chart-of-the-day-microsoft-operating-income-by-division-2010-2"&gt;where Microsoft's profits come from&lt;/a&gt; and you’ll see their core business is Windows and Office. Forays into gaming, music, online services, mobile communication, etc have varied from being lacklustre to haemorrhaging money pits.&lt;/p&gt;  &lt;p&gt;Google’s core business is search and advertising.&lt;/p&gt;  &lt;p&gt;It takes a rare combination of talent, timing and luck to successfully branch into new areas as Apple did with online music, portable music players and the iPhone.&lt;/p&gt;  &lt;p&gt;Joel gave a &lt;a href="http://blog.stackoverflow.com/2009/05/joel-talks-about-stack-overflow-at-google/"&gt;Google Tech Talk about Stackoverflow&lt;/a&gt; last May that’s instructive. 
A key point is that all software is social and that a given platform that works in one community that’s dropped into another may simply not work.&lt;/p&gt;  &lt;p&gt;Programmers respond to the Q&amp;amp;A format of Stackoverflow because a programmer is predisposed to formulating questions, answering them and categorizing (tagging) them. What’s more, the subject matter is sufficiently objective for there to be right and wrong answers most of the time.&lt;/p&gt;  &lt;p&gt;To put it another way: programmers talking about programming are self-organizing.&lt;/p&gt;  &lt;p&gt;Some criticize the format for making discussion hard, which misses the point entirely.&lt;/p&gt;  &lt;h3&gt;Sister Sites&lt;/h3&gt;  &lt;p&gt;Joel and Jeff’s first attempts at horizontal market growth are the sister sites: &lt;a href="http://serverfault.com/"&gt;Serverfault&lt;/a&gt; (for sysadmins) and &lt;a href="http://superuser.com/"&gt;Superuser&lt;/a&gt; (for general computer questions), which Jeff calls the League of Justice. There are also loose affiliations with &lt;a href="http://www.howtogeek.com/"&gt;How-to Geek&lt;/a&gt; and &lt;a href="http://doctype.com/"&gt;Doctype&lt;/a&gt; (from the guys behind &lt;a href="http://litmusapp.com/"&gt;Litmus&lt;/a&gt;).&lt;/p&gt;  &lt;p&gt;While a million (ish) uniques per month is nothing to sneeze at it’s clear that these sites haven’t grown like Stackoverflow has. See &lt;a href="http://www.quantcast.com/superuser.com"&gt;superuser.com&lt;/a&gt; and &lt;a href="http://www.quantcast.com/serverfault.com"&gt;serverfault.com&lt;/a&gt; (this one has started to pick up recently).&lt;/p&gt;  &lt;h3&gt;Stack Exchange&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://www.fogcreek.com/"&gt;Fog Creek&lt;/a&gt; has adapted the Stackoverflow code to create a hosted white label Q&amp;amp;A solution. 
For roughly $129/month you can have your own Q&amp;amp;A site to discuss everything from parenting issues to World of Warcraft (no joke).&lt;/p&gt;  &lt;p&gt;Such sites rely on communities and building communities takes time. Stackoverflow succeeded in part because it leveraged the existing audiences of Joel and Jeff.&lt;/p&gt;  &lt;h3&gt;Careers&lt;/h3&gt;  &lt;p&gt;This is perhaps the more controversial move and something I covered in &lt;a href="http://www.cforcoding.com/2009/12/joel-inc-stackoverflow-careers-and.html"&gt;Joel Inc., Stackoverflow Careers and Jumping Sharks&lt;/a&gt; and &lt;a href="http://www.cforcoding.com/2009/12/hard-numbers-on-stackoverflow-careers.html"&gt;Hard Numbers on Stackoverflow Careers&lt;/a&gt;. It’s something the pair have pushed repeatedly, going so far as &lt;a href="http://blog.stackoverflow.com/2010/01/careers-success-stories/"&gt;heartfelt testimonials&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;This one differs from the others in that the revenue model isn’t based on advertising: it’s based on the high cost of recruitment and the unique tie-in with Stackoverflow. My opinion is there simply aren’t enough active Stackoverflow users for this to be a real money spinner but time will tell.&lt;/p&gt;  &lt;h3&gt;Self-Funding and Control&lt;/h3&gt;  &lt;p&gt;Self-funding has huge advantages for any venture. If it’s possible it keeps control in the hands of the founders. Investors have their own agenda—being a return on that investment—which doesn’t necessarily coincide with the best long-term interests of the venture.&lt;/p&gt;  &lt;p&gt;Some argue &lt;a href="http://en.wikipedia.org/wiki/Transmeta"&gt;Transmeta&lt;/a&gt; was derailed by being forced to make a premature product launch.&lt;/p&gt;  &lt;p&gt;When you own your own venture you can do whatever you want. Well, you can’t break the law but other than that, there’s not a lot you can’t do.&lt;/p&gt;  &lt;p&gt;As soon as you have investors that changes. Investors have rights. 
Their money comes with conditions like how you can spend the company’s money, reporting requirements and so on.&lt;/p&gt;  &lt;p&gt;It gets even worse when you’re a public company and worse again when you’re a publicly listed company.&lt;/p&gt;  &lt;h3&gt;Debt and Equity&lt;/h3&gt;  &lt;p&gt;There are two basic sources of funding for a venture: debt and equity.&lt;/p&gt;  &lt;p&gt;Debt is borrowing money that you agree to repay the lender, typically at a fixed or floating rate over a given period of time. In the corporate world, there are many sources of debt: bank bills, overdrafts, commercial paper, bonds, swaps, traditional loans (secured and unsecured) and so forth. Many of these you have to be sufficiently large to have access to (eg corporate bonds are an option for the Toyotas of the world).&lt;/p&gt;  &lt;p&gt;Equity is ownership of the company. Depending on your jurisdiction there are many forms of equity: ordinary shareholders, preferential shareholders and so on. They have different rights and a different pecking order for being repaid if the company is ever wound up (and typically the debt-holders will be ahead of all of them).&lt;/p&gt;  &lt;p&gt;In between there are countless variations (eg convertible notes are a debt instrument that can be converted to equity in certain circumstances).&lt;/p&gt;  &lt;p&gt;Companies generally strive for a healthy mix of debt and equity funding options.&lt;/p&gt;  &lt;p&gt;The fallacy that many tech companies succumb to is that venture capitalists are their only source of funding. What’s more, VC funding is about the most expensive source of funding. A bank, being your typical source for a loan, will look at your plan and make a decision on your ability to repay the loan. Not your revenue but your income (being revenue minus expenses), both current and projected.&lt;/p&gt;  &lt;p&gt;VCs typically look for blue-sky potential, often in ventures that don’t even generate revenue now or in the foreseeable future. 
Still any business plan will need to answer the questions of “when” and “how” the investors will get a return.&lt;/p&gt;  &lt;h3&gt;What Does Stackoverflow Want?&lt;/h3&gt;  &lt;p&gt;This move is surprising considering Joel wrote &lt;a href="http://www.joelonsoftware.com/articles/VC.html"&gt;Fixing Venture Capital&lt;/a&gt; and &lt;a href="http://www.joelonsoftware.com/articles/fog0000000056.html"&gt;Strategy Letter I: Ben and Jerry's vs. Amazon&lt;/a&gt;. Joel is somewhat vague on their motivations, saying only:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Now we’re biting off the bigger goal of changing the way &lt;em&gt;everyone&lt;/em&gt; gets answers to their questions on the Internet, and that’s something we can’t do alone.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The infrastructure (hardware and bandwidth) is cheap (almost free) for Q&amp;amp;A. Stackoverflow.com seems to run on three Web servers based on &lt;a href="http://highscalability.com/blog/2009/8/5/stack-overflow-architecture.html"&gt;Stack Overflow Architecture&lt;/a&gt; (a little outdated but those Web servers are low RAM and single CPU, which means dirt cheap) and &lt;a href="http://blog.stackoverflow.com/2010/01/stack-overflow-network-configuration/"&gt;Stack Overflow Network Configuration&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;It’s fair to say that hardware is ludicrously cheap. Plentyoffish uses &lt;a href="http://highscalability.com/plentyoffish-architecture"&gt;less than 10 servers&lt;/a&gt; for over a billion monthly page views.&lt;/p&gt;  &lt;p&gt;Is it development? Is there some grand Q&amp;amp;A idea that’s going to take 50 man-years of development time to implement? Jeff has repeatedly said that apart from tweaking around the edges, Stackoverflow as a technology platform is basically “done”.&lt;/p&gt;  &lt;p&gt;Is it to broaden the scope of Stackoverflow? What about a Wikipedia-like platform? What about the Wikipedia content? 
Is there any money in that?&lt;/p&gt;  &lt;h3&gt;Why Venture Capital?&lt;/h3&gt;  &lt;p&gt;This points to something ridiculously large scale otherwise:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Why wouldn’t a bank fund it (based on existing income)? &lt;/li&gt;    &lt;li&gt;Why wouldn’t Fog Creek fund it? &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The last is worth mulling over. Fog Creek has ~34 employees. Joel once said for every $10,000/month Fog Creek made he hired a programmer. Fog Creek is a private company so its profits aren’t published but it would seem reasonable to assume that their revenue is in the order of $4-10 million per annum.&lt;/p&gt;  &lt;blockquote&gt;   &lt;ol&gt;     &lt;li value="value"&gt;The business itself could benefit from the publicity of getting an investment from someone who is thought of as being a savvy investor. &lt;/li&gt;      &lt;li value="value"&gt;The investor will add substantial value to the business in advice, connections, and introductions. &lt;/li&gt;   &lt;/ol&gt; &lt;/blockquote&gt;  &lt;p&gt;But he also says:&lt;/p&gt;  &lt;blockquote&gt;   &lt;ol&gt;     &lt;li value="value"&gt;The founders are not in it for their own personal aggrandizement and are happy to give up some control to make the business more successful. &lt;/li&gt;   &lt;/ol&gt; &lt;/blockquote&gt;  &lt;p&gt;Interesting. Could it be as simple as wanting to cash out?&lt;/p&gt;  &lt;p&gt;I suspect (3) and (4) are more what it’s about but without knowing what they want to do it’s largely impossible to figure out the why.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;It’s hard not to be concerned by this. The evil hyphen site became evil when they tried to take what was free content and monetize it using a subscription model. 
I don’t believe this is a likely outcome here but when you give up control, it’s a question of what your investors believe is the path to profitability that matters.&lt;/p&gt;  &lt;p&gt;Many businesses fail because they try to apply something that worked one place to another area where it simply doesn’t work. I would hate to see this happen to Stackoverflow as I’m personally a big fan of the site.&lt;/p&gt;  &lt;p&gt;There’s something to be said for leaving something that works well enough alone and turning your attention to building something else. Not everyone can or should be Microsoft or Google. Trying to be is typically a surefire way of converting success into failure.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Update:&lt;/em&gt;&lt;/strong&gt; I misspoke regarding the Stackoverflow Web server configuration. Fixed.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/g5VyzCcC2-4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/3948045478498721299/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/02/stackoverflow-joel-and-jeff-want-vc.html#comment-form" title="14 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3948045478498721299?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3948045478498721299?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/g5VyzCcC2-4/stackoverflow-joel-and-jeff-want-vc.html" title="Stackoverflow: Joel and Jeff want VC Money? Say What?" 
/><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>14</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/02/stackoverflow-joel-and-jeff-want-vc.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0cBQHs8fip7ImA9WxBWGEo.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-1738664673136658575</id><published>2010-02-11T17:04:00.001+08:00</published><updated>2010-02-11T17:04:11.576+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-11T17:04:11.576+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Markdown, Inline Parsing and Badly Formed HTML</title><content type="html">&lt;p&gt;I haven’t had much time to work on my Markdown parser lately (sadly) but I thought it was worth posting an update on where I’m at. I have been digging deep into the dark depths of inline parsing. I have &lt;a href="http://www.cforcoding.com/2010/02/markdown-block-parsing-and-road-to-hell.html"&gt;previously discussed the two modes of parsing Markdown&lt;/a&gt;, which I call block and inline.&lt;/p&gt;  &lt;p&gt;But the block parsing is done (well, I have to go back and tweak &lt;em&gt;one&lt;/em&gt; thing) so I’m onto the murky world of inline Markdown parsing.&lt;/p&gt;  &lt;h3&gt;Parsing Block Markup&lt;/h3&gt;  &lt;p&gt;Various Markdown implementations allow you to create markup blocks. There are usually quite strict requirements about how you can write these blocks. 
For example, you might need to put the start and end tags on separate lines such as:&lt;/p&gt;  &lt;pre class="brush:plain"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;one&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;two&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;  &lt;/pre&gt;

&lt;p&gt;My approach is much more forgiving. It will take this “Markdown”:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;This is a paragraph with a &amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;nested&amp;lt;/li&amp;gt;&amp;lt;/li&amp;gt;block&amp;lt;/li&amp;gt; with
some &amp;lt;hr&amp;gt;random&amp;lt;h2&amp;gt;other tags&amp;lt;/h2&amp;gt; in it&lt;/pre&gt;

&lt;p&gt;and convert it to:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;p&amp;gt;This is a paragraph&amp;lt;/p&amp;gt;

&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;nested&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;block&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;

&amp;lt;p&amp;gt;with some&amp;lt;/p&amp;gt;

&amp;lt;hr&amp;gt;

&amp;lt;p&amp;gt;random&amp;lt;/p&amp;gt;

&amp;lt;h2&amp;gt;other tags&amp;lt;/h2&amp;gt;

&amp;lt;p&amp;gt;in it&amp;lt;/p&amp;gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;This part already works.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But it gets better. It will also take that same input stream and convert it to:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;This is a paragraph

- nested
- block

with some

------

random

## other tags ##

in it&lt;/pre&gt;

&lt;p&gt;So to be clear: this will convert acceptable markup to Markdown and filter out unacceptable markup (like script tags).&lt;/p&gt;

&lt;p&gt;This will include parsing links and images into Markdown references.&lt;/p&gt;
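
&lt;p&gt;A minimal Python sketch of that second direction, under some loud assumptions: the block parser is taken to have already reduced the input to a flat list of (kind, content) pairs, and the names &lt;code&gt;render_block&lt;/code&gt; and &lt;code&gt;to_markdown&lt;/code&gt; are hypothetical. It illustrates the idea only and is not the parser’s actual code.&lt;/p&gt;

```python
def render_block(kind, content):
    """Render one already-parsed block as Markdown, or None if not allowed."""
    if kind == "p":
        return content
    if kind == "ul":
        # content is a list of item strings
        return "\n".join("- " + item for item in content)
    if kind == "hr":
        return "------"
    if kind.startswith("h") and kind[1:].isdigit():
        hashes = "#" * int(kind[1:])
        return hashes + " " + content + " " + hashes
    # Anything else (script, for instance) is treated as unacceptable markup.
    return None


def to_markdown(blocks):
    """Join the acceptable blocks with a blank line between each."""
    rendered = (render_block(kind, content) for kind, content in blocks)
    return "\n\n".join(text for text in rendered if text is not None)


# Assumed intermediate form of the example above (hypothetical structure).
blocks = [
    ("p", "This is a paragraph"),
    ("ul", ["nested", "block"]),
    ("p", "with some"),
    ("hr", None),
    ("p", "random"),
    ("h2", "other tags"),
    ("p", "in it"),
]
print(to_markdown(blocks))
```

&lt;p&gt;Keeping the rendering separate from the parsing is the design choice here: the same list of blocks could be handed to an HTML renderer or a Markdown renderer.&lt;/p&gt;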

&lt;h3&gt;Parsing Inline Markup&lt;/h3&gt;

&lt;p&gt;This is what I’m working on now. I’m still looking for a good generic way of doing this that correctly captures tag hierarchies (eg list items must be children of unordered or ordered lists). What I’m probably going to do is release a messy version of the code (being the current version) then go back and revisit it once I have a working implementation.&lt;/p&gt;

&lt;p&gt;This is a good general principle: it’s far easier to fix something that’s complete and working than it is to constantly strive for perfection in incomplete code (basically &lt;a href="http://www.folklore.org/StoryView.py?story=Real_Artists_Ship.txt"&gt;artists ship&lt;/a&gt;).&lt;/p&gt;
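
&lt;p&gt;For what it’s worth, the hierarchy constraint mentioned above (list items must be children of lists) could be captured with a simple lookup table. This is only a sketch in Python; the table &lt;code&gt;ALLOWED_PARENTS&lt;/code&gt; and the function &lt;code&gt;can_nest&lt;/code&gt; are hypothetical names, and a real table would need many more entries.&lt;/p&gt;

```python
# Hypothetical table: child tag -> set of parents it may appear under.
# Tags missing from the table are treated as unrestricted in this sketch.
ALLOWED_PARENTS = {
    "li": {"ul", "ol"},
}


def can_nest(parent, child):
    """Return True if `child` may appear directly inside `parent`."""
    allowed = ALLOWED_PARENTS.get(child)
    return allowed is None or parent in allowed


print(can_nest("ul", "li"))  # True
print(can_nest("p", "li"))   # False
```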

&lt;p&gt;One thing I’m debating is whether to require tags to be balanced, that is, whether I accept this:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;b&amp;gt;this is&amp;lt;i&amp;gt;a&amp;lt;/b&amp;gt; test&amp;lt;/i&amp;gt;&lt;/pre&gt;

&lt;p&gt;Ideally I’d like to &lt;em&gt;not&lt;/em&gt; accept this. XML/XHTML requires balanced tags but HTML either doesn’t or even if it does, most browsers are quite forgiving of this. XML treats markup essentially as a document tree whereas the HTML view is more like tags are, in certain circumstances, switches to turn behaviour on or off.&lt;/p&gt;
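
&lt;p&gt;The tree view comes down to a stack check. Here is a minimal Python sketch, assuming the tags have already been pulled out of the text as bare names, with a leading slash marking a close tag. The function name &lt;code&gt;is_balanced&lt;/code&gt; is hypothetical, and void elements such as hr are deliberately ignored.&lt;/p&gt;

```python
def is_balanced(tags):
    """Return True if every close tag matches the most recently opened tag."""
    stack = []
    for tag in tags:
        if tag.startswith("/"):
            # A close tag must match the tag on top of the stack.
            if not stack or stack.pop() != tag[1:]:
                return False
        else:
            stack.append(tag)
    # Anything still open at the end is unbalanced too.
    return not stack


# The example above, reduced to its tag names: b opens, i opens, b closes, i closes.
print(is_balanced(["b", "i", "/b", "/i"]))  # False
print(is_balanced(["b", "i", "/i", "/b"]))  # True
```

&lt;p&gt;Under the switch view no stack is needed at all, which is exactly why browsers can be so forgiving.&lt;/p&gt;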

&lt;h3&gt;Markdown Formatting&lt;/h3&gt;

&lt;p&gt;I went into this problem thinking I could parse&lt;/p&gt;

&lt;pre class="brush:plain"&gt;***this is a* test**&lt;/pre&gt;

&lt;p&gt;into&lt;/p&gt;

&lt;pre class="brush:plain"&gt;document
+- bold
   +- italic
   |  +- text: this is a 
   +- text: test&lt;/pre&gt;

&lt;p&gt;but that idea quickly falls down when you consider that this is valid Markdown:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;**this is *a** test*&lt;/pre&gt;

&lt;p&gt;which basically parses to:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;BOLD_ON
TEXT(&amp;quot;this is &amp;quot;)
ITALIC_ON
TEXT(&amp;quot;a&amp;quot;)
BOLD_OFF
TEXT(&amp;quot; test&amp;quot;)
ITALIC_OFF&lt;/pre&gt;

&lt;p&gt;Almost any Markdown parser will turn this into HTML that looks like this:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;strong&amp;gt;this is &amp;lt;em&amp;gt;a&amp;lt;/strong&amp;gt; test&amp;lt;/em&amp;gt;&lt;/pre&gt;

&lt;p&gt;That’s unfortunate because I like the document tree. But sadly the matching problem still remains because if a special sequence doesn’t have a matching close it is put into the document as a literal sequence.&lt;/p&gt;
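
&lt;p&gt;A rough Python sketch of the switch-style scan, including the literal fallback for an unmatched sequence. It is heavily simplified and is not the real Markdown emphasis rules: it only knows about single and double asterisks, greedily prefers the double, and ignores whitespace rules entirely. The function name &lt;code&gt;tokenize&lt;/code&gt; and the token names are assumptions for illustration.&lt;/p&gt;

```python
NAMES = {"**": "BOLD", "*": "ITALIC"}


def tokenize(text):
    """Turn text into (kind, value) tokens, treating * and ** as on/off switches."""
    tokens = []
    open_at = {}   # delimiter -> index of its pending ON token
    buf = []       # characters of the text run being collected
    i = 0
    while i != len(text):
        if text[i] == "*":
            delim = "**" if text.startswith("**", i) else "*"
            if buf:
                tokens.append(("TEXT", "".join(buf)))
                buf = []
            if delim in open_at:
                # The switch is on, so this occurrence turns it off.
                del open_at[delim]
                tokens.append((NAMES[delim] + "_OFF", delim))
            else:
                open_at[delim] = len(tokens)
                tokens.append((NAMES[delim] + "_ON", delim))
            i += len(delim)
        else:
            buf.append(text[i])
            i += 1
    if buf:
        tokens.append(("TEXT", "".join(buf)))
    # A switch that was never turned off becomes a literal sequence.
    for delim, index in open_at.items():
        tokens[index] = ("TEXT", delim)
    return tokens


for kind, value in tokenize("**this is *a** test*"):
    print(kind, repr(value))
```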

&lt;p&gt;This leads to some fairly pathological corner cases like:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;*this [link* google][1]

  [1]: http://google.com&lt;/pre&gt;

&lt;p&gt;which will translate to&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;em&amp;gt;this &amp;lt;a href=&amp;quot;http://google.com&amp;quot;&amp;gt;link&amp;lt;/em&amp;gt; google&amp;lt;/a&amp;gt;&lt;/pre&gt;

&lt;p&gt;and browsers will tend to break that link into two parts (where you can click on “link” or “google”).&lt;/p&gt;

&lt;p&gt;But work progresses.&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Even with a lot of the Markdown spec being parsed, my transformation times on basic documents (eg a couple of lists, a block quote and some paragraphs) are still under 60 microseconds (roughly) and that’s with some messy array manipulation and temporary object creation that I plan to revisit and clean up.&lt;/p&gt;

&lt;p&gt;At this stage I’m hoping to have something committed and available for comment within two weeks. It won’t be pretty but my goal is to get feedback earlier rather than later.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/belD0NvzMXs" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/1738664673136658575/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/02/markdown-inline-parsing-and-badly.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1738664673136658575?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1738664673136658575?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/belD0NvzMXs/markdown-inline-parsing-and-badly.html" title="Markdown, Inline Parsing and Badly Formed HTML" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>4</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/02/markdown-inline-parsing-and-badly.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0YAQ3w6fyp7ImA9WxBWE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-2778215083728630318</id><published>2010-02-05T05:32:00.001+08:00</published><updated>2010-02-05T05:32:22.217+08:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2010-02-05T05:32:22.217+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><title>Standing on the Outside</title><content type="html">&lt;p&gt;This week I read &lt;a href="http://codebetter.com/blogs/kyle.baley/archive/2010/02/02/life-outside-net-or-how-to-check-out-your-neighbours.aspx"&gt;Life outside .NET, or “How to check out your neighbours”&lt;/a&gt;. I really like posts like this. They’re instructive about the culture of a particular community.&lt;/p&gt;  &lt;p&gt;For over a decade I’ve been a Java developer (since JDK 1.0.2). Like most Java developers I have a love-hate relationship with the language, the libraries and Sun. Java didn’t invent the virtual machine but it certainly popularized it. 5-10 years ago (in particular) Java was a hotbed for the development of many technologies, concepts and frameworks.&lt;/p&gt;  &lt;p&gt;As the author notes, MVC and DI (dependency injection) are simply assumed in Javaland. It’s true. Good luck finding a non-MVC Web framework in Java out of the dozens that exist.&lt;/p&gt;  &lt;p&gt;My experience and exposure with .Net is at best peripheral. ASP.NET always struck me as somewhat &lt;em&gt;primitive &lt;/em&gt;in the sense that it’s what would’ve happened had JSP been taken to the nth-degree instead of being supplanted by Struts and all that came after. That’s not to say ASP.NET is bad or doesn’t do its job but to a Java developer it seems somehow &lt;em&gt;crude&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;Beyond the boring and irrelevant comparisons of Java vs. .Net performance, the more interesting comparison is as a proxy for decentralized vs. centralized platform progression.&lt;/p&gt;  &lt;p&gt;The &lt;em&gt;Microsoft Way&lt;/em&gt; definitely has its advantages. 
Where once Redmond was playing catch-up on Java (technically speaking), Sun’s inability to lead (and no clue where they were going if they could) has left Java largely stagnant. Java 7 is due at the end of the year but has been delayed &lt;em&gt;years&lt;/em&gt;. Thankfully it’s now getting closures if for no other reason than we can all stop bitching about it (frankly, I think some form of function pointers or delegates in “C#-speak” will be sufficient for 99% of use cases).&lt;/p&gt;  &lt;p&gt;It can be useful not to have a diaspora of Web development frameworks (even at the cost of innovation). Take a Struts developer and put them on a Wicket or Tapestry project and their experience won’t be especially applicable.&lt;/p&gt;  &lt;p&gt;It will certainly be interesting to see if Oracle can provide more leadership than Sun. Oracle was always heavily invested in Java&amp;#160; so I’m hoping Java isn’t simply collateral damage to Larry’s acquisition of Sun’s server business. Bizarrely Oracle seems committed to JavaFX of all things.&lt;/p&gt;  &lt;p&gt;For those of you unfamiliar with it, JavaFX is Sun’s “me too” Flash alternative and a prime example of Sun’s boondoggles of recent years.&lt;/p&gt;  &lt;p&gt;I for one welcome our new insect &lt;a href="http://knowyourmeme.com/memes/i-for-one-welcome-our-new-overlords"&gt;overlords&lt;/a&gt;. I’d like to remind them that as a trusted blogger, I can be helpful in rounding up others to toil in their underground sugar caves.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." 
border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/AUnT4UWQb9w" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/2778215083728630318/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/02/standing-on-outside.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/2778215083728630318?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/2778215083728630318?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/AUnT4UWQb9w/standing-on-outside.html" title="Standing on the Outside" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/02/standing-on-outside.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEIERnw6cSp7ImA9WxBWEEU.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-6936739971799482067</id><published>2010-02-02T12:53:00.000+08:00</published><updated>2010-02-02T12:55:07.219+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-02T12:55:07.219+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Markdown, Block Parsing and the Road to Hell</title><content type="html">&lt;p&gt;I thought it time to update my status on this particular undertaking, which so far has 
ended up being far more massive than originally envisioned.&lt;/p&gt;  &lt;p&gt;The overall design of the Markdown parser is that there are two parsers… &lt;em&gt;kinda&lt;/em&gt;. There is a parser to break your document into blocks and another to interpret the inline content within those blocks. As soon as I made this realization, everything just got a whole lot easier.&lt;/p&gt;  &lt;p&gt;I use the term “block” (and “inline”) because those are the terms HTML uses (“block elements” and “inline elements”). Of course HTML also gets more complex (e.g. “replaced” vs “non-replaced” elements and inline-block, floats, etc) but fundamentally you can think of a Markdown document, or any hypertext document, as consisting of block and inline elements.&lt;/p&gt;  &lt;p&gt;Markdown parsers will often talk about “blocks” and “spans” instead.&lt;/p&gt;  &lt;h3&gt;Block Parsing&lt;/h3&gt;  &lt;p&gt;The first level of parsing of Markdown is into blocks.&lt;/p&gt;  &lt;p&gt;Such a document can be viewed as a tree. The root node is the document. Every node below that is either a block or an inline node. The tree can be arbitrarily deep and there are certain rules about relationships in that tree. For instance:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Block nodes are only ever children of other block nodes (counting the root Document node as a block node); &lt;/li&gt;    &lt;li&gt;Paragraphs can only contain inline elements; &lt;/li&gt;    &lt;li&gt;List items must be children of lists; &lt;/li&gt;    &lt;li&gt;and so on. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The goal of any parser is to take an input and build a &lt;em&gt;valid syntax tree&lt;/em&gt; based on the rules defined.&lt;/p&gt;  &lt;p&gt;This part of the problem for what I’m writing is now done. This includes code blocks, paragraphs, block quotes, ordered and unordered lists, headers and horizontal rules. 
Tables I plan to return to later.&lt;/p&gt;  &lt;h3&gt;List Parsing&lt;/h3&gt;  &lt;p&gt;Today I came across &lt;a href="http://blog.stackoverflow.com/2008/06/three-markdown-gotcha/"&gt;Three Markdown Gotchas&lt;/a&gt;, which I hadn’t seen before but it opened my eyes to one particular area of difficulty I had: list processing. Go to StackOverflow, ask a question and type in:&lt;/p&gt;  &lt;pre class="brush:plain"&gt;- one
 - two
  - three
   - four&lt;/pre&gt;

&lt;p&gt;and you probably won’t get what you expect. You get this:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;one
    &amp;lt;ul&amp;gt;
      &amp;lt;li&amp;gt;two&amp;lt;/li&amp;gt;
      &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
      &amp;lt;li&amp;gt;four&amp;lt;/li&amp;gt;
    &amp;lt;/ul&amp;gt;
  &amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;Let me give you some background: Markdown has the concept of &lt;em&gt;indents&lt;/em&gt;. Based on a predefined tab width (typically 4), a single tab or 4 spaces represents one indent. That’s important because code lines are preceded by one indent. A &lt;em&gt;non-indent space&lt;/em&gt; is sometimes ignored at the beginning of a line, for example at the start of a paragraph line or the continuation of an existing one.&lt;/p&gt;
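
&lt;p&gt;As a rough illustration of that arithmetic, here is a minimal Python sketch (Python only for brevity; the name &lt;code&gt;leading_indents&lt;/code&gt; and the tab-stop handling are assumptions made for this example, not taken from any particular Markdown implementation). It splits a line’s leading whitespace into whole indents plus leftover non-indent spaces:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;def leading_indents(line, tab_width=4):
    """Return (indents, extra_spaces) for the leading whitespace of line."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # Assumption: a tab advances to the next tab stop.
            column += tab_width - column % tab_width
        else:
            break
    # Whole indents, plus whatever non-indent space is left over.
    return divmod(column, tab_width)&lt;/pre&gt;

&lt;p&gt;Under these assumptions a line starting with four spaces or one tab reports one indent, while a line starting with a single space reports zero indents and one leftover space, which is why vanilla Markdown sees no nesting in the list above.&lt;/p&gt;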

&lt;p&gt;The original Markdown “spec” says that nesting list items is done by preceding the line with one more indent than the previous line. In vanilla Markdown the above sequence would come out as:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;one&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;two&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;four&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;because none of the lines has a leading indent. That’s logical and consistent. Jeff’s point is basically that even one space should indicate intent and be interpreted as nesting. Sounds reasonable, right? Maybe. The problem is that it leads to unintended complexity.&lt;/p&gt;

&lt;p&gt;Go back to the above example and put one, two then three spaces in front of the first list item. Watch the preview pane to see how the list changes. The implied nesting changes all over the place. Logical? I think not.&lt;/p&gt;

&lt;p&gt;But it gets worse.&lt;/p&gt;

&lt;pre class="brush:plain"&gt;- one

 two
 - three

 four&lt;/pre&gt;

&lt;p&gt;comes out as&lt;/p&gt;

&lt;pre class="brush:plain"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;
    &amp;lt;p&amp;gt;one&amp;lt;/p&amp;gt;
    &amp;lt;p&amp;gt;two&amp;lt;/p&amp;gt;
    &amp;lt;ul&amp;gt;
      &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
    &amp;lt;/ul&amp;gt;
    &amp;lt;p&amp;gt;four&amp;lt;/p&amp;gt;
  &amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;Okay&lt;/em&gt;… bear in mind that there are spaces before two and four so that you continue the list item. Otherwise they would be interpreted as separate paragraphs. But what if you want four to continue the nested list item three? How much indentation do you need? It turns out that the magical number is anything from 5 to 11.&lt;/p&gt;

&lt;p&gt;But it gets worse. Put one space before one and suddenly one and three are in the same list, so four is now indented so far that it becomes a code block run-on from three. Add a second space to the front of one and for some reason it returns to the original nesting even though one is now indented more than three. Huh?&lt;/p&gt;

&lt;p&gt;I’ll leave an examination of the MarkdownSharp source code as to the reasons for this as an exercise for the reader. Suffice it to say that it all stems from the belief that one (more) space indicating nesting is somehow more intuitive.&lt;/p&gt;

&lt;h3&gt;The Road to Hell&lt;/h3&gt;

&lt;p&gt;The road to hell is paved with good intentions. It’s one of my favourite sayings. We programmers as a whole are unreasonable people. Through a combination of hubris, stubbornness and even laziness we have a tendency to throw out what’s been done before or simply make breaking changes because we prefer it, because we think others will prefer it, because we don’t appreciate that someone else may have to deal with the consequences, or simply out of ignorance as to what led to the original decisions.&lt;/p&gt;

&lt;p&gt;We all do this, myself included. It’s worst when it not only manifests itself in company culture but is &lt;em&gt;enshrined&lt;/em&gt; there. Take Microsoft as a prime example. Internet Explorer has “Favourites”. What the hell are favourites? Well, they’re bookmarks. But IE can’t call them that because Netscape called them that first and Microsoft wanted to differentiate themselves and their products. This of course led to many conversations I know I had at the time that went something like this:&lt;/p&gt;

&lt;p&gt;New user: What’s a favourite?
  &lt;br /&gt;Me: It’s a bookmark.&lt;/p&gt;

&lt;p&gt;I couldn’t help but laugh out loud when I first read C# and saw that all the things copied from Java had been renamed, sometimes with significantly worse names. Java’s final as C#’s sealed springs to mind. You can just tell that there were people dedicated to the task of finding new names for Java concepts and keywords. &lt;em&gt;It’s just sad&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Hyperbole aside, I digress.&lt;/p&gt;

&lt;p&gt;The point of all this is that:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Often things that came before you were done for a reason, whether or not you’re aware of it and whether or not you agree with it if you are;&lt;/li&gt;

  &lt;li&gt;Breaking changes have a high price, so much so that the cure is often far worse than the disease, and your delicate sensibilities be damned. Internal consistency and syntactic purity are overrated. Interestingly those overly encumbered with such sensibilities seem to have a disproportionate tendency to become Python programmers.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;List Sanity&lt;/h3&gt;

&lt;p&gt;For this reason my parser has returned to what is probably the original implementation. That is:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;A leading non-indent space is ignored before list items. That is, it implies no meaning and is discarded so there is no difference between 0 and 2 leading spaces before a list item; &lt;/li&gt;

  &lt;li&gt;Up to one leading indent (meaning one tab or 0 to 4 spaces) is consumed from each subsequent line until a new list item is hit or a line with no leading spaces is met. The subsequent list item will be a part of the same list. Text with no leading spaces will end the list and form a new paragraph; and &lt;/li&gt;

  &lt;li&gt;All lines that continue the list item are combined (with their leading tab or 0 to 4 spaces consumed) and they form a new &lt;em&gt;block context&lt;/em&gt;. That is, they are then parsed as if they were a separate input, which can contain new lists, block quotes, code segments and so on. &lt;/li&gt;
&lt;/ol&gt;
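
&lt;p&gt;The three rules above could be sketched roughly as follows in Python. This is not the parser’s actual code: the regular expression, the helper names and the decision to treat blank lines as continuations are all assumptions made for the example.&lt;/p&gt;

&lt;pre class="brush:plain"&gt;import re

TAB_WIDTH = 4
# A list marker preceded by at most three (non-indent) spaces.
MARKER = re.compile(r"^ {0,3}(?:[*+-]|\d+\.)[ \t]+")


def strip_one_indent(line):
    """Consume up to one indent: a tab, or up to four leading spaces."""
    if line.startswith("\t"):
        return line[1:]
    spaces = len(line) - len(line.lstrip(" "))
    return line[min(spaces, TAB_WIDTH):]


def split_list_items(lines):
    """Group lines into list items; return (items, remaining_lines)."""
    items = []
    for index, line in enumerate(lines):
        marker = MARKER.match(line)
        if marker:
            # Rule 1: leading non-indent space before a marker is discarded.
            items.append([line[marker.end():]])
        elif items and (line[:1] in (" ", "\t") or not line.strip()):
            # Rule 2: an indented (or, by assumption, blank) line continues
            # the current item, minus up to one indent.
            items[-1].append(strip_one_indent(line))
        else:
            # Text with no leading space ends the list.
            return items, lines[index:]
    return items, []&lt;/pre&gt;

&lt;p&gt;Each item’s lines would then be joined and handed back to the block parser as a fresh input (rule 3). In this sketch an indented &lt;code&gt;- nested&lt;/code&gt; line comes out of its parent item with one indent removed, so running the same function over the item body finds a nested list.&lt;/p&gt;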

&lt;p&gt;(3) provides a lot of consistency. It means that if you have a list item followed by a line with two indents, that second line will be a code block (the first indent marks a continued list item and the second is interpreted as a code block within the list item’s block context).&lt;/p&gt;

&lt;p&gt;To me this is supremely more logical—and easier to implement—but I guess if you’re really attached to nesting list items with a single space and figuring out that 5 to 11 spaces is the magical number of spaces to continue a nested list item then you’ll hate it. Too bad.&lt;/p&gt;

&lt;p&gt;The nested block context from (3) has one exception. If the nested block context would result in a single paragraph then that paragraph is unwrapped to being inline content of the list item. This has one important effect, which some may consider a breaking change. Namely this Markdown:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;- one
- two
- three&lt;/pre&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;pre class="brush:plain"&gt;- one

- two

- three&lt;/pre&gt;

&lt;p&gt;will both be interpreted as being:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;one&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;two&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;whereas MarkdownSharp will interpret the latter as:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;one&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;two&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;three&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;which is something &lt;a href="http://www.cforcoding.com/2010/01/markdown-musings-on-unintended.html"&gt;I've previously documented and disagreed with&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But this could be interpreted as a breaking change so I will probably add a special case for just this scenario as an option.&lt;/p&gt;
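
&lt;p&gt;The single-paragraph exception, with an opt-out, might look something like this minimal Python sketch. The tuple-based node shape, the function name and the &lt;code&gt;unwrap_single_paragraph&lt;/code&gt; flag are all hypothetical and exist only for illustration:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;def list_item_node(blocks, unwrap_single_paragraph=True):
    """Build a ("li", content) node from the blocks parsed out of an item body.

    blocks is a list of (kind, content) tuples such as ("p", "one").
    """
    is_lone_paragraph = len(blocks) == 1 and blocks[0][0] == "p"
    if unwrap_single_paragraph and is_lone_paragraph:
        # The exception: a lone paragraph becomes plain inline content.
        return ("li", blocks[0][1])
    # Anything else keeps its nested block structure.
    return ("li", blocks)&lt;/pre&gt;

&lt;p&gt;With the flag left on, both the tight and the blank-line-separated lists above produce the same item nodes; switching it off keeps the paragraph wrapper, as in the MarkdownSharp output.&lt;/p&gt;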

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;The block parsing portion is done. The code is ugly and needs to be refactored (again) but it works. I still have an issue with too many temporary objects being created (mainly because it simplified some code) and I’ll need to go back and eliminate that.&lt;/p&gt;

&lt;p&gt;What’s been interesting is that I’ve now rewritten the block parsing at least four times before it felt right. John Carmack once said he needs to write something five or six times before he gets it right. I agree with his sentiment. It takes that long to truly understand the domain, in my opinion.&lt;/p&gt;

&lt;p&gt;The inline parsing has been a completely different set of problems. I will have a follow-up post on that soon.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/oVpahk4nXXQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/6936739971799482067/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/02/markdown-block-parsing-and-road-to-hell.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/6936739971799482067?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/6936739971799482067?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/oVpahk4nXXQ/markdown-block-parsing-and-road-to-hell.html" title="Markdown, Block Parsing and the Road to Hell" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>4</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/02/markdown-block-parsing-and-road-to-hell.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUEMRnY6eCp7ImA9WxBXFkQ.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-8527414037634384027</id><published>2010-01-28T23:48:00.001+08:00</published><updated>2010-01-28T23:48:07.810+08:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2010-01-28T23:48:07.810+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><title>Java IDEs: the Blue Heeler, the Dachshund and the Labradoodle</title><content type="html">&lt;p&gt;I’ve had a frustrating week. I’m on a mission to find out why a piece of code I wrote had a “blowout” in execution time (“blowout” here means 60 microseconds instead of 15 in sustained usage just to keep things in perspective). I suspect it’s to do with temporary objects either auto-boxing/unboxing and/or temporary arrays.&lt;/p&gt;  &lt;p&gt;Java, in my opinion, has the best IDEs of any language or platform bar none. Say what you want about the language but the IDEs are, on the whole, first rate. That doesn’t mean there aren’t bumps along the road however.&lt;/p&gt;  &lt;p&gt;For the purposes of this completely biased rant I shall liken them to dog breeds.&lt;/p&gt;  &lt;h3&gt;Blue Heeler&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:ACD-blue-spud.jpg" rel="license"&gt;&lt;img style="width: 250px" src="http://img32.imageshack.us/img32/8514/blueheeler.jpg" /&gt;&lt;/a&gt;The Blue Heeler is one kind of &lt;a href="http://en.wikipedia.org/wiki/Australian_Cattle_Dog"&gt;Australian cattle dog&lt;/a&gt;. It’s used on sheep farms and cattle stations to round up livestock. It’s not the prettiest of breeds.&lt;/p&gt;  &lt;p&gt;So you won’t see these as family pets or in trendy dog parks or in your neighbourhood. But they’re smart, obedient, protective and hard-working. If you’re herding cattle? You won’t see much else. 
The Blue Heeler is a &lt;em&gt;working dog&lt;/em&gt; and a victory for utilitarianism.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.jetbrains.com/idea/"&gt;IntelliJ IDEA&lt;/a&gt; is the Blue Heeler of the Java IDE world.&lt;/p&gt;  &lt;p&gt;Ever hear anyone rave about &lt;a href="http://www.jetbrains.com/resharper/index.html"&gt;Resharper&lt;/a&gt; when talking about Visual Studio? Or even go so far as to say that Resharper is what makes Visual Studio good? Well, Resharper is adding the functionality to Visual Studio that IntelliJ has for Java.&lt;/p&gt;  &lt;p&gt;Yet all is not perfect in IntelliJ-land. The biggest problem is plugins. You certainly don’t have the range that, say, Eclipse does. But nor do you have the “plugin hell” woes so often associated with Eclipse. Open source frameworks will tend to release plugins for Eclipse and it’s up to third parties to make IntelliJ versions, which doesn’t always happen.&lt;/p&gt;  &lt;p&gt;On the bright side, you don’t actually need that many plugins because nearly everything you need is done out of the box anyway.&lt;/p&gt;  &lt;p&gt;What makes it worse is that &lt;strong&gt;&lt;em&gt;&lt;a href="http://www.jetbrains.com/"&gt;Jetbrains&lt;/a&gt; keeps breaking all the plugins&lt;/em&gt;&lt;/strong&gt;. IntelliJ 9 is relatively new but it once again broke all the plugins. I’ve lost count of the number of times a major version has done this. Seriously, can’t you guys make the plug-in architecture remotely backwards compatible? Is that breaking change you’re making &lt;em&gt;really&lt;/em&gt; necessary? Really?&lt;/p&gt;  &lt;p&gt;The &lt;a href="http://jetty.codehaus.org/jetty/"&gt;Jetty&lt;/a&gt; plugin &lt;em&gt;still&lt;/em&gt; doesn’t work, which is a reasonably big deal. 
Worse, I can’t find a profiler that works in IntelliJ to save my life, except possibly &lt;a href="http://www.ej-technologies.com/products/jprofiler/overview.html"&gt;JProfiler&lt;/a&gt; but who can justify &lt;a href="http://www.ej-technologies.com/buy/jprofiler/single"&gt;$499&lt;/a&gt; for a fixed single license of a profiler? Especially considering that the same license for the whole rest of the IDE is &lt;a href="http://www.jetbrains.com/idea/buy/index.jsp"&gt;$249&lt;/a&gt; (if you don’t want to use the &lt;a href="http://www.jetbrains.com/idea/free_java_ide.html"&gt;free version&lt;/a&gt;).&lt;/p&gt;  &lt;h3&gt;Dachshund&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Short-haired-Dachshund.jpg" rel="license"&gt;&lt;img style="width: 250px" src="http://img710.imageshack.us/img710/9428/dachshund.jpg" /&gt;&lt;/a&gt;The &lt;a href="http://en.wikipedia.org/wiki/Dachshund"&gt;Dachshund&lt;/a&gt; is a strange and impractical dog. Just look at it and you can tell it’s no product of evolution. Not without man’s intervention anyway.&lt;/p&gt;  &lt;p&gt;Yet people like them. Families own them. They are however stubborn and hard to train and they have their fair share of health problems (including spinal problems unsurprisingly).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.eclipse.org/"&gt;Eclipse&lt;/a&gt; is the Dachshund of the Java IDE world.&lt;/p&gt;  &lt;p&gt;If IntelliJ doesn’t have enough plugins then arguably Eclipse has too many. Hell, this goes so far as &lt;a href="http://stackoverflow.com/questions/185486/which-eclipse-subversion-plugin-should-i-use"&gt;having two Subversion plugins&lt;/a&gt;. A former colleague, who was an avid Eclipse fan, could never get either one to work. Sometimes they lied about checking stuff in (&lt;em&gt;big&lt;/em&gt; problem) or just conflicted with other stuff.&lt;/p&gt;  &lt;p&gt;Every time I try and use Eclipse for something I’m struck with an overwhelming sense of how awkward and unintuitive it is. 
Take Maven projects as one example. In IntelliJ or Netbeans you just open one up and it &lt;em&gt;just works&lt;/em&gt;. &lt;a href="http://www.google.com.au/search?q=eclipse+maven"&gt;Googling&lt;/a&gt; doesn’t really help either. &lt;a href="http://maven.apache.org/eclipse-plugin.html"&gt;The first link&lt;/a&gt; is seemingly out of date and it doesn’t get much better.&lt;/p&gt;  &lt;p&gt;Now I realize this is a whole Coke vs Pepsi thing. Many people are no doubt experts in Eclipse. They’re used to the “Eclipse Way” so it all makes sense (which strikes me as a form of &lt;a href="http://en.wikipedia.org/wiki/Stockholm_syndrome"&gt;Stockholm Syndrome&lt;/a&gt; but I digress…). Hell, they may even like the whole perspectives thing, which I’ve always hated.&lt;/p&gt;  &lt;p&gt;But if you tell me you’ve never had problems with Eclipse plugins you’re lying.&lt;/p&gt;  &lt;h3&gt;Labradoodle&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Labradoodle_Brown.jpg" rel="license"&gt;&lt;img style="width: 262px" src="http://upload.wikimedia.org/wikipedia/commons/7/70/Labradoodle_Brown.jpg" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The &lt;a href="http://en.wikipedia.org/wiki/Labradoodle"&gt;Labradoodle&lt;/a&gt; is a strange and relatively new dog breed. Whereas programmers with too much free time come up with &lt;a href="http://www.cise.ufl.edu/~manuel/obfuscate/pi.c"&gt;bizarre ways of calculating Pi&lt;/a&gt; and other such boondoggles, dog breeders with idle hands decide to answer a question that has plagued civilization since Aristotle’s time:&lt;/p&gt;  &lt;p&gt;What happens when you cross a Labrador Retriever with a poodle?&lt;/p&gt;  &lt;p&gt;Baseless hyperbole aside, I’m sure there was a reason. 
I just don’t know what it was.&lt;/p&gt;  &lt;p&gt;But what resulted is a friendly, energetic and not-too-bright breed that families tend to like.&lt;/p&gt;  &lt;p&gt;Well, &lt;a href="http://netbeans.org/"&gt;Netbeans&lt;/a&gt; is the Labradoodle of the Java IDE world.&lt;/p&gt;  &lt;p&gt;Netbeans does some things very well, particularly Swing development (which admittedly in today’s Web-focused world is a lot like being the best manufacturer of horse bridles and saddles).&lt;/p&gt;  &lt;p&gt;Netbeans also faces an uncertain future with Oracle’s acquisition of Sun.&lt;/p&gt;  &lt;p&gt;Netbeans at least immediately understood my Maven project. It couldn’t find classes with main() methods that were under the test directory (IntelliJ could) but otherwise it all just worked.&lt;/p&gt;  &lt;p&gt;So it was looking good to finally get a profiler running… until I came across a bug. There is at least one open bug against Netbeans that reports a problem on Windows 7. My dev machine is a 64-bit Windows 7 machine. Months after Windows 7’s release (and nearly a year after the beta version), to still have permission problems is simply unacceptable. Yet that’s what happens when I try and use the profiler.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;All this and I still have no profile of my code!&lt;/p&gt;  &lt;p&gt;Please don’t waste my time and yours by commenting or sending me a message saying I’m wrong about &amp;lt;&lt;em&gt;insert favourite IDE here&lt;/em&gt;&amp;gt;. You’re missing the point of a rant (in that largely there isn’t one).&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." 
border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/t-7YiP6nC08" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/8527414037634384027/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/java-ides-blue-heeler-dachshund-and.html#comment-form" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/8527414037634384027?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/8527414037634384027?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/t-7YiP6nC08/java-ides-blue-heeler-dachshund-and.html" title="Java IDEs: the Blue Heeler, the Dachshund and the Labradoodle" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>10</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/java-ides-blue-heeler-dachshund-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D04EQ3szfCp7ImA9WxBXE08.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-735938443884589256</id><published>2010-01-24T13:34:00.001+08:00</published><updated>2010-01-24T17:38:22.584+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-24T17:38:22.584+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="computer science" /><category 
scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Markdown and an Introduction to Parsing Expression Grammars (PEG)</title><content type="html">&lt;p&gt;Writing an &lt;a href="http://www.antlr.org/"&gt;ANTLR&lt;/a&gt; &lt;em&gt;LL(*)&lt;/em&gt; grammar for Markdown has been the itch I just can’t scratch this month. I keep going back to it whenever I have a new idea about how to approach the problem or how to solve a previous problem I’ve had. Each time I get further but I still keep hitting a wall.&lt;/p&gt;  &lt;p&gt;It’s a shame really because &lt;a href="http://www.antlr.org/works/index.html"&gt;ANTLRWorks&lt;/a&gt; is an excellent tool and ANTLR is an extremely mature product. The rewriting rules and tree grammars are extremely elegant.&lt;/p&gt;  &lt;p&gt;Over the last couple of weeks I’ve been investigating PEGs (“Parsing Expression Grammars”). I highly recommend &lt;a href="http://pdos.csail.mit.edu/papers/parsing:popl04.pdf"&gt;Parsing Expression Grammars: A Recognition-Based Syntactic Foundation&lt;/a&gt; by &lt;a href="http://www.brynosaurus.com/"&gt;Bryan Ford&lt;/a&gt;. 
PEGs are relatively new (Ford’s paper was published in 2004) whereas parsing CFGs (“Context Free Grammars”) with &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/LL_parser"&gt;LL&lt;/a&gt;&lt;/em&gt;, &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/LR_parser"&gt;LR&lt;/a&gt;&lt;/em&gt; and &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/LALR_parser"&gt;LALR&lt;/a&gt;&lt;/em&gt; parsers has a history going back decades.&lt;/p&gt;  &lt;h3&gt;Traditional Parsers&lt;/h3&gt;  &lt;p&gt;Parsing of computer and natural languages (by computers) has its roots in &lt;a href="http://en.wikipedia.org/wiki/Noam_Chomsky"&gt;Noam Chomsky&lt;/a&gt;’s work on generative grammars, particularly the &lt;a href="http://en.wikipedia.org/wiki/Chomsky_hierarchy"&gt;Chomsky hierarchy&lt;/a&gt; and the work of &lt;a href="http://en.wikipedia.org/wiki/Donald_Knuth"&gt;Donald Knuth&lt;/a&gt; (On the Translation of Languages from Left to Right [1965]) and Frank DeRemer (&lt;a href="http://portal.acm.org/citation.cfm?id=888578"&gt;Practical Translators for LR(k) Languages&lt;/a&gt; [1969]).&lt;/p&gt;  &lt;p&gt;To understand &lt;a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar"&gt;Parsing Expression Grammars&lt;/a&gt; let me first explain the basic workings of a traditional parser. The first step is lexical analysis, which turns an input stream into a series of tokens. The parser will then apply various rules to these tokens. 
There are varying techniques for dealing with ambiguities and recursive rules.&lt;/p&gt;  &lt;p&gt;As &lt;a href="http://en.wikipedia.org/wiki/Terence_Parr"&gt;Terence Parr&lt;/a&gt; (the creator of ANTLR) puts it in his (excellent) &lt;a href="http://www.amazon.com/Definitive-Antlr-Reference-Domain-Specific-Programmers/dp/0978739256"&gt;The Definitive ANTLR Reference: Building Domain-Specific Languages&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Unfortunately, ANTLR cannot generate a top-down recognizer for every grammar—&lt;em&gt;LL&lt;/em&gt; recognizers restrict the class of acceptable grammars somewhat. For example, ANTLR cannot accept left-recursive grammars such as the following (see Section 11.5, Left-Recursive Grammars, on page 274):&lt;/p&gt;    &lt;pre&gt;&lt;code&gt;/** An expression is defined to be an expression followed by '++' */
expr : expr '++'
     ;&lt;/code&gt;&lt;/pre&gt;

  &lt;p&gt;ANTLR translates this grammar to a recursive method called &lt;code&gt;expr()&lt;/code&gt; that immediately invokes itself:&lt;/p&gt;

  &lt;pre&gt;&lt;code&gt;void expr() {
  expr();
  match(&amp;quot;++&amp;quot;);
}&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is something that &lt;em&gt;LALR&lt;/em&gt; parsers handle better.&lt;/p&gt;

&lt;h3&gt;The Lexical Analysis Problem&lt;/h3&gt;

&lt;p&gt;But the big problem as far as Markdown is concerned is that tokens are not context-free. Take a natural definition for lists:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;listItem    : ORDERED inline NEWLINE
            | UNORDERED inline NEWLINE
            ;

ORDERED     : DIGIT+ '.' (' ' | '\t')+ ;
UNORDERED   : ('*' | '-' | '+') (' ' | '\t')+ ;
inline      : (~ NEWLINE)+ ;
NEWLINE     : '\r' '\n'? | '\n' ;&lt;/pre&gt;

&lt;p&gt;and this Markdown:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;1. one
2. two
3. three&lt;/pre&gt;

&lt;p&gt;will be converted into this lexical stream:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;ORDERED inline(&amp;quot;one&amp;quot;) NEWLINE
ORDERED inline(&amp;quot;two&amp;quot;) NEWLINE
ORDERED inline(&amp;quot;three&amp;quot;) NEWLINE&lt;/pre&gt;

&lt;p&gt;and then this AST (&amp;quot;Abstract Syntax Tree&amp;quot;) will result:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;document
+- listItem
|  +- ORDERED
|  +- inline (&amp;quot;one&amp;quot;)
|  +- NEWLINE
+- listItem
|  +- ORDERED
|  +- inline (&amp;quot;two&amp;quot;)
|  +- NEWLINE
+- listItem
   +- ORDERED
   +- inline (&amp;quot;three&amp;quot;)
   +- NEWLINE&lt;/pre&gt;

&lt;p&gt;Looks good, right? Wrong. It quickly falls down when the Markdown becomes pathological:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;1. 1. one
2. two
3. three&lt;/pre&gt;

&lt;p&gt;because the input stream is:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;ORDERED ORDERED inline(&amp;quot;one&amp;quot;) NEWLINE
ORDERED inline(&amp;quot;two&amp;quot;) NEWLINE
ORDERED inline(&amp;quot;three&amp;quot;) NEWLINE&lt;/pre&gt;

&lt;p&gt;assuming you can resolve the ambiguity regarding inline being able to technically match &amp;quot;1.&amp;quot; (which you can.... &lt;em&gt;kinda&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;The above will not be recognized because there is no rule that handles a pair of ORDERED tokens. Really what you want to do is not create an ORDERED token after you’ve already started a list item but at this point &lt;strong&gt;&lt;em&gt;you no longer have a context-free grammar&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;
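
&lt;p&gt;The difference is easy to see in a toy tokenizer. This Python sketch is illustrative only: the function names are made up and a real lexer would handle far more than ordered-list markers. The first function tokenizes with no knowledge of where it is in the line; the second only accepts a marker at the very start of a line.&lt;/p&gt;

&lt;pre class="brush:plain"&gt;import re

ORDERED = re.compile(r"\d+\.[ \t]+")


def naive_tokens(line):
    """Tokenize one line with no knowledge of parser state."""
    tokens = []
    position = 0
    while True:
        match = ORDERED.match(line, position)
        if match is None:
            break
        # Every "digits, dot, space" run becomes a marker token.
        tokens.append("ORDERED")
        position = match.end()
    tokens.append(("inline", line[position:]))
    tokens.append("NEWLINE")
    return tokens


def contextual_tokens(line):
    """Recognize a list marker only at the start of the line."""
    match = ORDERED.match(line)
    if match is None:
        return [("inline", line), "NEWLINE"]
    # Once the item has started, the rest of the line is inline content.
    return ["ORDERED", ("inline", line[match.end():]), "NEWLINE"]&lt;/pre&gt;

&lt;p&gt;For the pathological first line, the naive version yields two ORDERED tokens followed by the inline text, while the contextual version yields one ORDERED token and keeps the second marker as part of the inline text. The catch is that the second version is no longer a standalone lexer: it bakes a piece of parser state into tokenization.&lt;/p&gt;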

&lt;p&gt;ANTLR’s semantic and syntactic predicates make an admirable effort of dealing with these kinds of ambiguities and context sensitivities but ultimately it’s just not designed for this kind of grammar.&lt;/p&gt;

&lt;h3&gt;Enter PEG&lt;/h3&gt;

&lt;p&gt;PEG parsers take a different approach in two important ways:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;PEGs are &lt;em&gt;not&lt;/em&gt; ambiguous. Choices in a grammar like the one above can lead to ambiguities. ANTLR resolves many of these by using predicates, which are a way of saying “if it looks like a duck then it’s a duck otherwise it’s something else”. PEGs use a &lt;em&gt;prioritized choice operator&lt;/em&gt;, which basically tries the choices &lt;em&gt;in order&lt;/em&gt; until it finds one that matches. By definition this is unambiguous: the first alternative that matches wins, so the input is either recognized in exactly one way or not at all; and &lt;/li&gt;

  &lt;li&gt;PEGs better handle non-CFGs by trying to recognize tokens as part of processing a rule rather than recognizing tokens and then applying rules to them. &lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Prioritized Choice&lt;/h3&gt;

&lt;p&gt;So in PEG terms, Markdown becomes easier to describe:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;Document &amp;lt;- Line*
Line     &amp;lt;- Heading / ListItem / Inline / Empty
Heading  &amp;lt;- '#'+ WS+ Inline
ListItem &amp;lt;- (DIGIT+ '.' / '*' / '-' / '+') WS+ Inline
Inline   &amp;lt;- (!NEWLINE .)+ NEWLINE

DIGIT    &amp;lt;- [0-9]
WS       &amp;lt;- ' ' / '\t'
NEWLINE  &amp;lt;- '\r\n' / '\r' / '\n'&lt;/pre&gt;

&lt;p&gt;This is of course partial and a simplification but the important thing here is that prioritized choice resolves what would otherwise be ambiguous. This is the “else” clause I’ve been looking for.&lt;/p&gt;
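
&lt;p&gt;To see prioritized choice in action, here is a minimal sketch in Python (chosen for brevity; it is not how peg-markdown works). The helper names &lt;code&gt;lit&lt;/code&gt;, &lt;code&gt;char_in&lt;/code&gt;, &lt;code&gt;seq&lt;/code&gt;, &lt;code&gt;choice&lt;/code&gt;, &lt;code&gt;plus&lt;/code&gt; and &lt;code&gt;classify&lt;/code&gt; are made up for this illustration. Each parser takes a string and a position and returns the new position, or None if it does not match.&lt;/p&gt;

&lt;pre class="brush:python"&gt;def lit(text):
    # Match a literal string at pos.
    def parse(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return parse

def char_in(chars):
    # Match one character out of a set.
    def parse(s, pos):
        return pos + 1 if pos != len(s) and s[pos] in chars else None
    return parse

def any_char(s, pos):
    # The '.' of the grammar: any single character.
    return pos + 1 if pos != len(s) else None

def seq(*parsers):
    # Every part must match, one after the other.
    def parse(s, pos):
        for p in parsers:
            pos = p(s, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    # Prioritized choice: the FIRST alternative that matches wins.
    def parse(s, pos):
        for p in parsers:
            end = p(s, pos)
            if end is not None:
                return end
        return None
    return parse

def plus(parser):
    # One or more repetitions, greedy.
    def parse(s, pos):
        pos = parser(s, pos)
        while pos is not None:
            nxt = parser(s, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
        return None
    return parse

WS = char_in(' \t')
DIGIT = char_in('0123456789')
INLINE = plus(any_char)
HEADING = seq(plus(lit('#')), plus(WS), INLINE)
MARKER = choice(seq(plus(DIGIT), lit('.')), lit('*'), lit('-'), lit('+'))
LIST_ITEM = seq(MARKER, plus(WS), INLINE)

def classify(line):
    # Line is Heading / ListItem / Inline / Empty, tried strictly in that order.
    for name, rule in [('Heading', HEADING), ('ListItem', LIST_ITEM), ('Inline', INLINE)]:
        if rule(line, 0) == len(line):
            return name
    return 'Empty'&lt;/pre&gt;

&lt;p&gt;With this sketch, &lt;code&gt;classify('1. one')&lt;/code&gt; gives 'ListItem', while a bare &lt;code&gt;classify('1.')&lt;/code&gt; falls through to 'Inline' because the ListItem alternative fails and the next one is tried. No predicate is needed to break the tie.&lt;/p&gt;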

&lt;h3&gt;Context-Sensitive Tokenization&lt;/h3&gt;

&lt;p&gt;Markdown has lots of these issues. For example ‘###’ &lt;em&gt;might&lt;/em&gt; indicate a header, but only if that line isn’t itself the text of an underlined header (that is, the next line doesn’t consist entirely of equals signs or hyphens). ANTLR allows you to handle some of these situations by doing something like:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;HEADER : {getCharPositionInLine()==0}?=&amp;gt; '#'+ WS+ ;&lt;/pre&gt;

&lt;p&gt;but what about this Markdown?&lt;/p&gt;

&lt;pre class="brush:plain"&gt;&amp;gt; # quoted heading
&amp;gt; some text&lt;/pre&gt;

&lt;p&gt;It’s entirely possible I’m missing some key part of the puzzle here but I’m not hopeful.&lt;/p&gt;

&lt;p&gt;Ford illustrates this problem:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;...PEGs also create new possibilities for language syntax design. Consider for example a well-known problem with C++ syntax involving nested template type expressions:&lt;/p&gt;

  &lt;pre&gt;&lt;code&gt;vector&amp;lt;vector&amp;lt;float&amp;gt; &amp;gt; MyMatrix;&lt;/code&gt;&lt;/pre&gt;

  &lt;p&gt;The space between the two right angle brackets is required because the C++ scanner is oblivious to the language’s hierarchical syntax, and would otherwise interpret the &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; incorrectly as a right shift operator. &lt;strong&gt;&lt;em&gt;In a language described by a unified PEG, however, it is easy to define the language to permit a &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; sequence to be interpreted as either one token or two depending on its context:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

  &lt;pre&gt;&lt;code&gt;TemplType &amp;lt;- PrimType (LANGLE TemplType RANGLE)?
ShiftExpr &amp;lt;- PrimExpr (ShiftOper PrimExpr)*
ShiftOper &amp;lt;- LSHIFT / RSHIFT
LANGLE    &amp;lt;- ’&amp;lt;’ Spacing
RANGLE    &amp;lt;- ’&amp;gt;’ Spacing
LSHIFT    &amp;lt;- ’&amp;lt;&amp;lt;’ Spacing
RSHIFT    &amp;lt;- ’&amp;gt;&amp;gt;’ Spacing
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;(emphasis added)&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;This isn’t a new problem and I’m not the first to approach the issue of Markdown parsing with a PEG grammar. &lt;a href="http://www.ohloh.net/p/peg-markdown"&gt;peg-markdown&lt;/a&gt; is an implementation of Markdown in C using a PEG parser.&lt;/p&gt;

&lt;p&gt;My own effort is going forward despite this implementation existing for several reasons:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;I plan on having implementations in several languages; &lt;/li&gt;

  &lt;li&gt;I intend to implement various Markdown and Wiki extensions and flavours; and &lt;/li&gt;

  &lt;li&gt;Because I’m getting a kick out of it. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I learnt compiler theory in university but it was all quite theoretical with simple yet interesting examples. The practical application to a real-world problem is quite something else. Plus PEGs are only six or so years old, so they are new to me.&lt;/p&gt;

&lt;p&gt;It is my belief that PEGs are a far more natural and robust means of parsing &lt;em&gt;any&lt;/em&gt; form of Markdown, Wiki syntax, BBcode or other forum format.&lt;/p&gt;

&lt;p&gt;And that’s the direction I’m heading.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/k232pHNkPzk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/735938443884589256/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/markdown-and-introduction-to-parsing.html#comment-form" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/735938443884589256?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/735938443884589256?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/k232pHNkPzk/markdown-and-introduction-to-parsing.html" title="Markdown and an Introduction to Parsing Expression Grammars (PEG)" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>7</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/markdown-and-introduction-to-parsing.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0QCQH4yeCp7ImA9WxBXE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620</id><published>2010-01-17T11:49:00.002+08:00</published><updated>2010-01-24T13:36:01.090+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-24T13:36:01.090+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" 
term="open source" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Markdown Headings, Grief and Unknown Elements to the Rescue</title><content type="html">&lt;p&gt;Well it’s not a day to be outside (unless you’re at the beach). It’s &lt;a href="http://www.weather.com.au/wa/perth"&gt;41 degrees&lt;/a&gt; and that’s metric (none of this Imperial rubbish that only the US uses). That’s 106F in the old scale.&lt;/p&gt;  &lt;p&gt;So I’m tackling the problem of Markdown headings in my parser.&lt;/p&gt;  &lt;pre class="brush:plain"&gt;Heading 1
=========

Heading 2
---------

# Heading 1
## Heading 2
### Heading 3

Horizontal rules:

-------
*******
_______&lt;/pre&gt;

&lt;p&gt;This is really annoying for two reasons:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Ambiguous syntax: a line of hyphens could be a horizontal rule or indicate a heading depending on the context (and again we return to the point of Markdown being context-sensitive); and &lt;/li&gt;

  &lt;li&gt;From an LL perspective this is left-recursive and requires LL(*) (arbitrary lookahead) in a normal grammar. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let me explain.&lt;/p&gt;

&lt;p&gt;A subset of the grammar for Markdown might look something like this (in ANTLR-like syntax):&lt;/p&gt;

&lt;pre class="brush:plain"&gt;document  : block* ;
block     : paragraph | heading | heading1 | heading2 | codeblock ;
paragraph : inline+ END_BLOCK ;
heading   : '#'+ inline NEWLINE ;
heading1  : inline+ NEWLINE '='+ (NEWLINE | END_BLOCK) ;
heading2  : inline+ NEWLINE '-'+ (NEWLINE | END_BLOCK) ;
inline    : '*' inline+ '*'
          | '`' inline+ '`'
          | ...
          | OTHER+ ;
OTHER     : . ;
END_BLOCK : '\n' '\n'+ | EOF ;&lt;/pre&gt;

&lt;p&gt;Try and plug something like that into ANTLR and it will complain all over the place.&lt;/p&gt;

&lt;p&gt;Firstly it’s ambiguous. An input sequence like “*123*” matches two of the inline alternatives. I’m led to believe that PEG parsers can deal with this by simply trying rules in the order they appear. That would fit a lot better to this situation. ANTLR can (messily) handle it with syntactic predicates.&lt;/p&gt;

&lt;p&gt;The other problem is the grammar is left-recursive, most notably with the inline rule.&lt;/p&gt;

&lt;p&gt;Yet another problem is that this requires arbitrary lookahead (again, something ANTLR can do with its LL(*) algorithm) because the token that delineates the heading rules is right at the end of a cyclic rule.&lt;/p&gt;

&lt;p&gt;It's even worse once you start factoring in paragraphs and lists.&lt;/p&gt;

&lt;p&gt;All of this leads to a whole bunch of headaches but I thought about this long and hard (going so far as to wake up in a sweat after a lexical analysis nightmare) and came up with a much more elegant (imho) solution.&lt;/p&gt;

&lt;p&gt;Consider a lexical stream that looks like this:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;WORD(&amp;quot;Heading&amp;quot;) WHITE_SPACE(&amp;quot; &amp;quot;) WORD(&amp;quot;1&amp;quot;) ...&lt;/pre&gt;

&lt;p&gt;What's next is important because the parser doesn't yet know if this is a paragraph or a heading. But here is where I was trying to be too clever for my own good by determining the block type in the lexer. After all, it would make the parsing step easier if I could just use a stack to push/pop block elements already knowing what they are.&lt;/p&gt;

&lt;p&gt;Instead I decided to treat a stream of inline elements as an Unknown element and determine its type with a parsing action rather than a lexical rule. So the grammar simplifies somewhat to:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;document  : block* ;
block     : codeblock | unknown ;
codeblock : ('    ' .* NEWLINE)+ ;
unknown   : inline* END_BLOCK ;&lt;/pre&gt;

&lt;p&gt;Again, order of rules is useful here, meaning if it looks like a code block it is a code block, otherwise it’s an unknown block. A syntactic predicate could handle this or you could make an indent at the start of a line an INDENT token, which wouldn’t fit into the inline rule. This makes the grammar unambiguous but still requires arbitrary lookahead. It’s easier to simply make a decision based on the first token and avoid any backtracking whatsoever.&lt;/p&gt;

&lt;p&gt;So if the parsing actions come across the right token sequence within the unknown block it changes that block to a heading, otherwise when that block ends it simply defaults to being a paragraph.&lt;/p&gt;

&lt;p&gt;Think of unknown elements as being the stem cells of Markdown lexical analysis.&lt;/p&gt;
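
&lt;p&gt;A minimal sketch of that idea, in Python for brevity and heavily simplified: it works on whole lines rather than tokens and ignores code blocks, lists and ‘#’ headings. Every block starts out unknown, gets promoted to a heading if the right line turns up, and otherwise defaults to a paragraph when it ends. The names &lt;code&gt;is_run_of&lt;/code&gt; and &lt;code&gt;parse_blocks&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;pre class="brush:python"&gt;def is_run_of(line, ch):
    # True if the line is non-empty and made up only of the character ch.
    return line != '' and set(line) == {ch}

def parse_blocks(text):
    blocks = []     # finished blocks as (kind, text) pairs
    pending = []    # lines of the current block, kind still unknown
    for line in text.split('\n') + ['']:    # the extra '' flushes the last block
        if line.strip() == '':
            if pending:
                # The block ended without being promoted: it is a paragraph.
                blocks.append(('p', ' '.join(pending)))
                pending = []
        elif len(pending) == 1 and is_run_of(line, '='):
            blocks.append(('h1', pending[0]))
            pending = []
        elif len(pending) == 1 and is_run_of(line, '-'):
            blocks.append(('h2', pending[0]))
            pending = []
        elif not pending and any(is_run_of(line, c) for c in '-*_'):
            # Same hyphens, but with nothing before them: a horizontal rule.
            blocks.append(('hr', ''))
        else:
            pending.append(line)
    return blocks&lt;/pre&gt;

&lt;p&gt;Note that the line of hyphens is never classified on its own. What it means depends entirely on whether an unknown block is pending, and no backtracking is needed to decide.&lt;/p&gt;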

&lt;p&gt;Anyway that was my revelation for the week. I still need to finish my list handling, inline styling and links, which is still more than I’d like but it’s getting there.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/mLJtFVra_cI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/6693774793451367620/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html#comment-form" title="12 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/mLJtFVra_cI/markdown-headings-grief-and-unknown.html" title="Markdown Headings, Grief and Unknown Elements to the Rescue" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>12</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0QDRXo4cSp7ImA9WxBXE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-7495962090909617820</id><published>2010-01-14T23:24:00.002+08:00</published><updated>2010-01-24T13:36:14.439+08:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2010-01-24T13:36:14.439+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Markdown Musings on Unintended Consequences</title><content type="html">&lt;p&gt;It may seem lately that Markdown is my white whale to which I respond thusly… call me Ahab.&lt;/p&gt;  &lt;p&gt;One of the problems with implementing something like this is that no one can quite agree on what exactly constitutes Markdown. It gets worse when you consider Wiki syntaxes. What’s stunning is that someone (&lt;a href="http://www.cosmocode.de/en/index"&gt;CosmoCode&lt;/a&gt;) has gone so far as to create a &lt;a href="http://www.wikimatrix.org/index.php"&gt;matrix comparing them all&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;If you peruse the unit tests you find things like:&lt;/p&gt;  &lt;pre class="brush:plain"&gt;Asterisks tight:

* asterisk 1
* asterisk 2
* asterisk 3


Asterisks loose:

* asterisk 1

* asterisk 2

* asterisk 3&lt;/pre&gt;

&lt;p&gt;is converted to:&lt;/p&gt;

&lt;pre class="brush:html"&gt;&amp;lt;p&amp;gt;Asterisks tight:&amp;lt;/p&amp;gt;

&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;asterisk 1&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;asterisk 2&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;asterisk 3&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;

&amp;lt;p&amp;gt;Asterisks loose:&amp;lt;/p&amp;gt;

&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;asterisk 1&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;asterisk 2&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;asterisk 3&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;Now having gone through the code I can see why this is: two newlines are typically used as a block delimiter between paragraphs, code blocks and so forth. But I have to wonder three things:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Is this planned behaviour or simply the result of splitting the file into blocks using two or more newlines as a delimiter? &lt;/li&gt;

  &lt;li&gt;Is this behaviour desirable? &lt;/li&gt;

  &lt;li&gt;Is this behaviour reasonable? &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Of course there is a case for paragraphs being nested in list items, namely when you have two or more paragraphs or other nested block content within a list item. This is certainly something you can do (and will do) in HTML, but I’m not convinced that blank lines wrapping list content in a paragraph is anything other than an unintended consequence.&lt;/p&gt;
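
&lt;p&gt;To illustrate how that could fall out as a side effect, here is a purely hypothetical sketch in Python. It is not the logic of Markdown.pl or MarkdownSharp, and &lt;code&gt;list_items&lt;/code&gt; is a made-up name. If list text is first split on the blank-line block delimiter, and anything that produced more than one chunk is treated as block content, the paragraph wrapping follows without anyone having designed it.&lt;/p&gt;

&lt;pre class="brush:python"&gt;def list_items(list_text):
    # Hypothetical: split on the blank-line block delimiter first.
    chunks = list_text.split('\n\n')
    # More than one chunk means the items were separated by blank lines,
    # so every item ends up treated as block content (wrapped in a paragraph).
    loose = len(chunks) != 1
    items = []
    for chunk in chunks:
        for line in chunk.split('\n'):
            if line.startswith('* '):
                items.append((line[2:], loose))
    return items&lt;/pre&gt;

&lt;p&gt;Each result is a pair of the item text and a flag saying whether it would be wrapped in a paragraph. The tight list yields False for every item and the loose list yields True.&lt;/p&gt;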

&lt;p&gt;Of course there is no grammar or spec for Markdown so it’s something you can argue til the cows come home. You can also change it and still call what you do “Markdown”. It’s why there are so many Wiki syntaxes.&lt;/p&gt;

&lt;p&gt;There are other issues. For example, should you be able to start or end bold or italic styling in the middle of a word? I believe Github has taken the approach that underscores for italics can’t start or end intra-word, sensibly (as this is a common occurrence in source code).&lt;/p&gt;

&lt;p&gt;Lastly, Markdown preserves HTML. It’s my opinion that it should be replaced with Markdown where possible. What should you do with this:&lt;/p&gt;

&lt;pre class="brush:html"&gt;&amp;lt;blockquote&amp;gt;
  &amp;lt;ul id=&amp;quot;list&amp;quot;&amp;gt;
    &amp;lt;li&amp;gt;one&amp;lt;/li&amp;gt;
    &amp;lt;li&amp;gt;two&amp;lt;/li&amp;gt;
    &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
  &amp;lt;/ul&amp;gt;
&amp;lt;/blockquote&amp;gt;&lt;/pre&gt;

&lt;p&gt;In my opinion, it would make sense to convert this to:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;&amp;gt; * one
&amp;gt; * two
&amp;gt; * three&lt;/pre&gt;

&lt;p&gt;Of course you lose information in doing this (namely the id attribute) but you have to decide: are you using Markdown or HTML?&lt;/p&gt;

&lt;p&gt;Opinions will of course vary.&lt;/p&gt;

&lt;p&gt;Weighty issues indeed! But this is what I’m struggling with as I’m working on my list parsing while trying to prevent my lexer from becoming a pushdown automaton.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/G9L-okS9MWw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/7495962090909617820/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/markdown-musings-on-unintended.html#comment-form" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7495962090909617820?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7495962090909617820?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/G9L-okS9MWw/markdown-musings-on-unintended.html" title="Markdown Musings on Unintended Consequences" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>6</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/markdown-musings-on-unintended.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0QNQ3k7fCp7ImA9WxBXE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-268406792312293234</id><published>2010-01-13T23:03:00.003+08:00</published><updated>2010-01-24T13:36:32.704+08:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2010-01-24T13:36:32.704+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>More Details on JMD Markdown Parsing</title><content type="html">&lt;p&gt;I’ve reached an important milestone tonight. As &lt;a href="http://www.cforcoding.com/2010/01/jmd-markdown-and-brief-overview-of.html"&gt;previously mentioned&lt;/a&gt; I’m working on a non-regex Markdown library for Java (and other languages to follow).&lt;/p&gt;  &lt;p&gt;The goals of this project are:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;To be feature complete for standard Markdown as well as the StackOverflow/Github extensions; &lt;/li&gt;    &lt;li&gt;To add table support; &lt;/li&gt;    &lt;li&gt;To support various Wiki markdown flavours; and &lt;/li&gt;    &lt;li&gt;To convert from Markdown to HTML &lt;em&gt;and from HTML to Markdown&lt;/em&gt;. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;There look to be four steps in this process.&lt;/p&gt;  &lt;p&gt;The first step I’ve called lexical analysis but it’s part scanning and part parsing, mainly because doing it at this stage is convenient and saves me a lot of grief later. The end result of this step is a list of Tokens, which is highly memory efficient. Each Token object only requires 4 integers and for a source file 10K in size you’ll probably end up with between 2,000 and 5,000 tokens.&lt;/p&gt;  &lt;p&gt;The second step, which I’m not convinced will remain, is a rewrite step. There are a couple of awkward cases I don’t want to handle in the third step so I filter the list of tokens at this point.&lt;/p&gt;  &lt;p&gt;The third step is to take the list of tokens and to generate a Document. 
A Document is basically an &lt;a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree"&gt;Abstract Syntax Tree&lt;/a&gt; and looks a lot like a DOM.&lt;/p&gt;  &lt;p&gt;The last step is to use the &lt;a href="http://en.wikipedia.org/wiki/Visitor_pattern"&gt;Visitor pattern&lt;/a&gt; to render an HTML document.&lt;/p&gt;  &lt;p&gt;Tonight I have working code that does all four steps. It is still very much feature incomplete. Lots of inline styling doesn’t work. Neither do reference images, reference links nor any kind of list. Still it is correctly handling nested block quotes, implicit paragraphs and paragraph breaks and indented code blocks.&lt;/p&gt;  &lt;p&gt;As of right now it is converting (the Code_blocks unit test from MarkdownSharp):&lt;/p&gt;  &lt;pre class="brush:plain"&gt; code block on the first line
 
Regular text.

    code block indented by spaces

Regular text.

 the lines in this block  
 all contain trailing spaces  

Regular Text.

 code block on the last line&lt;/pre&gt;

&lt;p&gt;into this:&lt;/p&gt;

&lt;pre class="brush:html"&gt;&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;code block on the first line
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;

&amp;lt;p&amp;gt;Regular text.&amp;lt;/p&amp;gt;

&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;code block indented by spaces
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;

&amp;lt;p&amp;gt;Regular text.&amp;lt;/p&amp;gt;

&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;the lines in this block  
all contain trailing spaces  
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;

&amp;lt;p&amp;gt;Regular Text.&amp;lt;/p&amp;gt;

&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;code block on the last line&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&lt;/pre&gt;

&lt;p&gt;in &lt;strong&gt;&lt;em&gt;5 microseconds&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let me repeat that: it looped through that conversion &lt;strong&gt;&lt;em&gt;one million times in under 5 seconds… in pure Java!&lt;/em&gt;&lt;/strong&gt; To compare, my regex solution is doing this in 600-700 microseconds (that’s based on the 1.006 MarkdownSharp code; 1.009 has improved block handling, which should make a difference).&lt;/p&gt;

&lt;p&gt;Now you might look at that document and say it’s not that complicated (and you’d be right) but all the infrastructure is there. I know how I’m going to implement the rest and I can’t imagine anything (other than auto-linking) significantly affecting performance. What’s more even if it was 100 times slower I’d still be happy. I’m working on a worst case of it being 10 times slower when feature complete.&lt;/p&gt;

&lt;p&gt;So far I haven’t used a single regular expression and don’t think I’ll need to apart from maybe link validation. I’ll document more about the design in future posts (after the code is released probably) to explain many optimizations you can make to this process as well as the overall parsing strategy. So far there has been almost zero need for lookahead and backtracking, which is generally what kills your performance (without complicated techniques like &lt;a href="http://en.wikipedia.org/wiki/Memoization"&gt;memoization&lt;/a&gt;).&lt;/p&gt;
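
&lt;p&gt;As a rough illustration of the four-integer Token and the single-pass scanning described above, here is a sketch in Python rather than Java. The field layout (kind, start, end, line) and the three token kinds are assumptions made for this example; the real JMD Token may store something different.&lt;/p&gt;

&lt;pre class="brush:python"&gt;from collections import namedtuple

# Hypothetical layout: the post only says that a Token needs four integers.
# kind = token type, start/end = offsets into the source, line = line number.
Token = namedtuple('Token', 'kind start end line')

WORD, SPACE, NEWLINE = 0, 1, 2

def tokenize(text):
    # Single left-to-right pass: no regexes and no backtracking.
    tokens, pos, line = [], 0, 1
    while pos != len(text):
        start = pos
        if text[pos] == '\n':
            pos += 1
            tokens.append(Token(NEWLINE, start, pos, line))
            line += 1
        elif text[pos] in ' \t':
            while pos != len(text) and text[pos] in ' \t':
                pos += 1
            tokens.append(Token(SPACE, start, pos, line))
        else:
            while pos != len(text) and text[pos] not in ' \t\n':
                pos += 1
            tokens.append(Token(WORD, start, pos, line))
    return tokens&lt;/pre&gt;

&lt;p&gt;A token never copies any text. It only points back into the source, so &lt;code&gt;text[t.start:t.end]&lt;/code&gt; recovers the characters when a later step needs them.&lt;/p&gt;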

&lt;p&gt;Stay tuned…&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/bFNDYj-3ubQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/268406792312293234/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/more-details-on-jmd-markdown-parsing.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/268406792312293234?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/268406792312293234?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/bFNDYj-3ubQ/more-details-on-jmd-markdown-parsing.html" title="More Details on JMD Markdown Parsing" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>3</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/more-details-on-jmd-markdown-parsing.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0MESXgyeyp7ImA9WxBXE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-4066218079347763609</id><published>2010-01-11T07:19:00.003+08:00</published><updated>2010-01-24T13:36:48.693+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-24T13:36:48.693+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category 
scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="computer science" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>JMD, Markdown and a Brief Overview of Parsing and Compilers</title><content type="html">&lt;p&gt;Like most comp sci students, I did a course on compilers in university. I also did some parsing and syntax trees in a data structures course. At the time I wrote a couple of parsers, including one for simplifying boolean expressions (De Morgan’s laws, a AND true = a, etc) and another for evaluating arithmetic expressions.&lt;/p&gt;  &lt;p&gt;So I’ve been familiar with the basics and the theory of compiler design but I was by no means an expert.&lt;/p&gt;  &lt;p&gt;Recently, I &lt;a href="http://www.cforcoding.com/2010/01/announcing-jmd-java-markdown-port-of.html"&gt;launched JMD&lt;/a&gt;, a Java implementation of &lt;a href="http://code.google.com/p/markdownsharp/"&gt;MarkdownSharp&lt;/a&gt;, itself a C# port and extension of the original Perl Markdown scripts. Like the original it relies heavily on regular expressions.&lt;/p&gt;  &lt;p&gt;I like the idea of extending and improving Markdown but I’m no fan of using complicated regular expressions for that purpose. The current version is a milestone. It passes the unit tests and allows me to better build the replacement, which will be written more in the traditional compiler/translator sense.&lt;/p&gt;  &lt;h3&gt;Finite State Machines&lt;/h3&gt;  &lt;p&gt;&lt;img style="width: 200px" src="http://upload.wikimedia.org/wikipedia/commons/thumb/9/9d/DFAexample.svg/200px-DFAexample.svg.png" /&gt;A &lt;a href="http://en.wikipedia.org/wiki/Finite-state_machine"&gt;finite state machine&lt;/a&gt; (“FSM”) defines two things: a finite number of states and transitions between them. 
This abstract machine models behaviour of some kind.&lt;/p&gt;  &lt;p&gt;Often, but not always, such machines have a start state and one or more end states. FSMs are typically used for games, input processing and many other things.&lt;/p&gt;  &lt;p&gt;One important characteristic of such machines is that they typically have no memory. They merely know the current state and what transitions there are.&lt;/p&gt;  &lt;p&gt;Typically in computer science we’re more concerned with a special class of FSMs called &lt;a href="http://en.wikipedia.org/wiki/Deterministic_finite-state_machine"&gt;deterministic finite state machines&lt;/a&gt; (“DFSM”) or &lt;em&gt;deterministic finite automata&lt;/em&gt; (“DFAs”). The key difference is that transitions are deterministic, meaning that from any given state there is at most one transition for a given input symbol.&lt;/p&gt;  &lt;p&gt;Another characteristic of finite state machines is whether they are &lt;em&gt;cyclic&lt;/em&gt; or &lt;em&gt;acyclic&lt;/em&gt;. If there exists a state such that a transition can be taken and it is possible to return to that same state the FSM is cyclic, otherwise it is acyclic. By definition, cyclic FSMs are capable of processing an infinite space of inputs. Acyclic FSMs are not.&lt;/p&gt;  &lt;h3&gt;Regular Expressions&lt;/h3&gt;  &lt;p&gt;The most familiar DFA for most programmers will probably be &lt;a href="http://www.regular-expressions.info/"&gt;regular expressions&lt;/a&gt;. A regular expression (“regex”) is a shorthand way of building a DFA to process text input by specifying the optionality, cardinality, capturing and ordering of character sequences. Typically programs will determine if a given input matches a specified regex or whether or not that regex can be found anywhere in the input and possibly capture key parts of that input.&lt;/p&gt;  &lt;p&gt;Undoubtedly regexes are useful but they tend to be overused. 
Consider them a shining example of how once you have a hammer everything starts to look like a nail. In particular programmers will often try to use them to parse HTML or XML documents, which tends to be a pet peeve of Stack Overflow answerers such as myself.&lt;/p&gt;  &lt;p&gt;The reason they are a poor choice is that HTML is not a &lt;a href="http://en.wikipedia.org/wiki/Regular_language"&gt;regular language&lt;/a&gt;. What that means is that it is not possible to parse and validate HTML with a DFA. That’s because things like proper nesting of tags, ordering of opening and closing tags, etc require the machine to have some kind of memory, which DFAs don’t have.&lt;/p&gt;  &lt;p&gt;You see that in regex-based Markdown parsers where limitations are imposed to make it possible to parse Markdown, such as introducing a nesting depth limit to certain block level elements.&lt;/p&gt;  &lt;p&gt;To give you an example: if you’re looking for links, will “&amp;lt;a[ &amp;gt;]” find them? Most of the time? Yes. But not all the time. Consider the case of such expressions appearing in attributes, XML CDATA blocks, inside XML, CSS or Javascript comments, inside Javascript strings, etc. Regexes can’t detect these kinds of corner cases. They simply aren’t capable of it, at least not reliably.&lt;/p&gt;  &lt;p&gt;As it turns out a fairly simple change greatly enhances the power of DFAs.&lt;/p&gt;  &lt;h3&gt;Pushdown Automata&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Pushdown_automaton"&gt;Pushdown Automata&lt;/a&gt; (“PDAs”) make the simple change of adding a stack (hence “pushdown”), giving the machine a memory (of sorts) beyond the current state. To clarify, the machine can inspect the stack when deciding what transition to take and can manipulate the stack as part of taking that transition.&lt;/p&gt;  &lt;p&gt;This allows PDAs to process a much broader set of languages. 
A language that can be processed by a PDA is called a &lt;a href="http://en.wikipedia.org/wiki/Context-free_language"&gt;context-free language&lt;/a&gt; (“CFL”); the CFLs are a superset of the regular languages.&lt;/p&gt;  &lt;p&gt;If a PDA is deterministic (much like FSMs vs DFAs) then it is called a &lt;a href="http://en.wikipedia.org/wiki/Deterministic_pushdown_automaton"&gt;deterministic pushdown automaton&lt;/a&gt; (“DPDA”). Any languages that can be parsed by DPDAs are called &lt;a href="http://en.wikipedia.org/wiki/Deterministic_context-free_language"&gt;deterministic context-free languages&lt;/a&gt; (“DCFLs”), which are a subset of CFLs.&lt;/p&gt;  &lt;p&gt;Going back to regular expressions and HTML/XML parsing: with the addition of this stack, suddenly your parsing becomes &lt;em&gt;much&lt;/em&gt; more reliable. You can stop looking for anchors when you enter a Javascript block or a comment and so on.&lt;/p&gt;  &lt;p&gt;Many programming languages are CFLs but certainly not all.&lt;/p&gt;  &lt;h3&gt;Context-Sensitive Languages&lt;/h3&gt;  &lt;p&gt;The next broader class is the &lt;a href="http://en.wikipedia.org/wiki/Context-sensitive_language"&gt;context-sensitive languages&lt;/a&gt;. CFLs are a subset of context-sensitive languages. C++ is the traditional poster-child for hard-to-parse languages. Its grammar is also context-sensitive. Take the &lt;a href="http://stackoverflow.com/questions/1172939/is-any-part-of-c-syntax-context-sensitive/1173004#1173004"&gt;following expression&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;pre&gt;A a = B();&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;Is that a method call or object construction? You can’t tell without looking up B in the symbol table.&lt;/p&gt;

&lt;p&gt;Ruby and some other programming languages are context-sensitive. HTML/XML is also in this category and so is Markdown. For example:&lt;/p&gt;

&lt;pre&gt;&amp;gt; Block quote &amp;lt;p
&amp;gt; &amp;gt;&lt;/pre&gt;

&lt;p&gt;Is the second &amp;gt; on the second line the start of a nested block quote or the closing part of the paragraph tag at the end of the first line?&lt;/p&gt;

&lt;p&gt;Not only is this context-sensitive it’s also ambiguous. Technically we call this a &lt;em&gt;non-determinism&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Once again returning to parsing a document for anchors, context-sensitivity bridges the remaining gap. Cases like being inside a comment or not represent the kind of context-sensitivity that is unmanageable for regular languages.&lt;/p&gt;

&lt;h3&gt;Grammars, Lexers and Parsers&lt;/h3&gt;

&lt;p&gt;A &lt;a href="http://en.wikipedia.org/wiki/Formal_grammar"&gt;formal grammar&lt;/a&gt; (or just “grammar” for short) is a set of rules that describe a sequence of tokens. A token sequence is often called a sentence. The set of all sentences described by a given grammar is the language for that grammar. &lt;strong&gt;Note:&lt;/strong&gt; a context-free language is described by a context-free grammar (“CFG”), etc.&lt;/p&gt;

&lt;p&gt;Compilers, interpreters and translators are typically written in at least two parts (that are of interest to us): a lexer and a parser. Simple grammars may be implemented where the lexer and parser are combined. More complicated grammars may have many parsing steps.&lt;/p&gt;

&lt;p&gt;A lexer, also called a &lt;a href="http://en.wikipedia.org/wiki/Lexical_analysis"&gt;lexical analyzer&lt;/a&gt;, scanner or recognizer, reads a sequence of input symbols (most often a character stream) and converts them into lexemes or tokens. For example, a lexer for arithmetic expressions may convert:&lt;/p&gt;

&lt;pre&gt;4+5*7&lt;/pre&gt;

&lt;p&gt;into&lt;/p&gt;

&lt;pre&gt;NUMBER(4) OP(+) NUMBER(5) OP(*) NUMBER(7)&lt;/pre&gt;
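
&lt;p&gt;A hand-written lexer for this example can be sketched in a few lines of Java. The class name ArithmeticLexer, the helper digitAt and the textual NUMBER(...) and OP(...) output format are assumptions made purely for illustration; a real lexer would more likely return token objects.&lt;/p&gt;

```java
// Hypothetical sketch of a lexer for arithmetic expressions.
final class ArithmeticLexer {

  // True if position i is inside the input and holds a digit.
  static boolean digitAt(String input, int i) {
    if (i == input.length()) {
      return false;
    }
    return Character.isDigit(input.charAt(i));
  }

  // Converts the input into a space-separated list of tokens.
  static String tokenize(String input) {
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i != input.length()) {
      char c = input.charAt(i);
      if (Character.isWhitespace(c)) {
        i++; // whitespace separates tokens but produces none
        continue;
      }
      if (out.length() != 0) {
        out.append(' ');
      }
      if (Character.isDigit(c)) {
        int start = i;
        while (digitAt(input, i)) {
          i++; // consume the whole run of digits as one NUMBER
        }
        out.append("NUMBER(").append(input, start, i).append(')');
      } else if ("+-*/".indexOf(c) != -1) {
        out.append("OP(").append(c).append(')');
        i++;
      } else {
        throw new IllegalArgumentException("unexpected character at " + i + ": " + c);
      }
    }
    return out.toString();
  }
}
```

&lt;p&gt;The input is read once, left to right, and the only state carried along is the current position, which is why this kind of tokenizing is cheap.&lt;/p&gt;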

&lt;p&gt;A parser is a program that interprets a stream of lexemes by a set of rules. Those rules are defined in terms of lexemes and/or other rules, or even the same rule (although anything other than tail-recursion in grammar rules tends to be problematic and is usually factored out either manually because the parser won’t accept it or automatically).&lt;/p&gt;

&lt;p&gt;This distinction is somewhat artificial and a little blurred. &lt;a href="http://www.antlr.org/"&gt;ANTLR&lt;/a&gt; makes the distinction that lexer rules are &lt;em&gt;terminating rules&lt;/em&gt; and parser rules are &lt;em&gt;non-terminating&lt;/em&gt;. “Terminating” means the rule is not defined in terms of any other rules and as such can at best resolve to a lexeme.&lt;/p&gt;

&lt;h3&gt;Types of Parsers&lt;/h3&gt;

&lt;p&gt;At the top level there are two main types of parsers.&lt;/p&gt;

&lt;p&gt;The first category is &lt;a href="http://en.wikipedia.org/wiki/LL_parser"&gt;LL-parsers&lt;/a&gt;. The first L stands for “left to right”, meaning the input tokens are read from the beginning. The second L stands for “leftmost derivation”, which basically means the parser is &lt;em&gt;top-down&lt;/em&gt;. The parser attempts to match the input to a rule and in doing so will attempt to match input tokens to lexemes.&lt;/p&gt;

&lt;p&gt;LL parsers vary in their degree of lookahead. The simplest LL parsers are LL(1), meaning they look ahead one token. An LL(1) parser cannot choose between:&lt;/p&gt;

&lt;pre&gt;r : A B
  | A C
  ;&lt;/pre&gt;

&lt;p&gt;An LL(2) parser however can. LL parsers with a finite amount of lookahead are also called LL(k) parsers. Arbitrary lookahead LL parsers are called LL(*) parsers.&lt;/p&gt;
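
&lt;p&gt;What the extra token of lookahead buys can be sketched by hand for the rule r above. Everything here is hypothetical and only meant as an illustration: the class LookaheadDemo, the method names la and predict, and the convention that the token array ends with an EOF sentinel.&lt;/p&gt;

```java
// Hypothetical sketch of predicting an alternative of  r : A B | A C ;
final class LookaheadDemo {

  // LA(k): the token k positions ahead of pos (k = 1 is the current token).
  // Assumes the array ends with an "EOF" sentinel; reads past the end clamp to it.
  static String la(String[] tokens, int pos, int k) {
    return tokens[Math.min(pos + k - 1, tokens.length - 1)];
  }

  // Returns 1 for the alternative "A B" and 2 for the alternative "A C".
  static int predict(String[] tokens, int pos) {
    if (!la(tokens, pos, 1).equals("A")) {
      throw new IllegalStateException("expected A at position " + pos);
    }
    // Both alternatives start with A, so LA(1) alone cannot decide; LA(2) can.
    String second = la(tokens, pos, 2);
    if (second.equals("B")) {
      return 1;
    }
    if (second.equals("C")) {
      return 2;
    }
    throw new IllegalStateException("expected B or C after A");
  }
}
```

&lt;p&gt;With only the first token available both alternatives look identical, which is exactly the situation a single token of lookahead cannot resolve without rewriting the rule.&lt;/p&gt;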

&lt;p&gt;The other main category is &lt;a href="http://en.wikipedia.org/wiki/LR(0)_parser"&gt;LR parsers&lt;/a&gt;. The input tokens are still read from the beginning, but the R stands for “rightmost derivation” (built in reverse), which means the parser is &lt;em&gt;bottom-up&lt;/em&gt;. Instead of starting from a rule and trying to match it, the parser looks at the input and recognizes tokens, then combines those tokens into progressively larger rules until the whole input is matched.&lt;/p&gt;

&lt;p&gt;The most important subset of LR parsers are &lt;a href="http://en.wikipedia.org/wiki/LALR_parser"&gt;LALR parsers&lt;/a&gt;. Frankly it’s been too many years for me to remember the difference between LR and LALR parsers.&lt;/p&gt;

&lt;p&gt;There are various strengths and weaknesses of each approach, which are beyond the scope of this post. Generally though, LL parsers are easier to understand but LR parsers are more often used by &lt;a href="http://en.wikipedia.org/wiki/Compiler-compiler"&gt;compiler compilers&lt;/a&gt;. A compiler compiler is a tool that takes a formal grammar and creates a compiler, parser, interpreter or translator.&lt;/p&gt;

&lt;p&gt;Another class of parsers is &lt;a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar"&gt;parsing expression grammars&lt;/a&gt; (“PEGs”). I know far less about these. They’re fairly new but at least one Markdown parser, &lt;a href="http://github.com/jgm/lunamark"&gt;Lunamark&lt;/a&gt;, has been written with a PEG parser.&lt;/p&gt;

&lt;h3&gt;ANTLR&lt;/h3&gt;

&lt;p&gt;Apparently Joel and Jeff discussed this issue in &lt;a href="http://blog.stackoverflow.com/2010/01/podcast-79/"&gt;this week's Stackoverflow podcast&lt;/a&gt;. They ruminated that it would have been better had Markdown been written using a formal grammar and a tool such as bison. I agree and they provide a &lt;a href="http://code.google.com/p/markdownsharp/source/browse/trunk/MarkdownSharpTests/source/php/markdown.php#365"&gt;pretty good example of how horrifying regex parsing of non-regular languages can be&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;My own journey towards a non-regex solution first took me to &lt;a href="http://www.antlr.org/"&gt;ANTLR&lt;/a&gt; (“ANother Tool for Language Recognition”), which has a good GUI for debugging parsers. ANTLR is an LL(*) parser generator with some extensions. One of these is &lt;a href="http://en.wikipedia.org/wiki/Syntactic_predicate"&gt;syntactic predicates&lt;/a&gt;. Syntactic predicates increase the recognition power of LL parsers by resolving some ambiguities that LL parsers otherwise can’t handle.&lt;/p&gt;

&lt;p&gt;For example, consider the following Markdown:&lt;/p&gt;

&lt;pre&gt;&amp;gt; This is a *test of
&amp;gt; blockquoting*,
&amp;gt; emphasis and
&amp;gt; http://www.google.com
&amp;gt; (autolinking)
A paragraph.&lt;/pre&gt;

&lt;p&gt;When I was playing around with ANTLR this was problematic to parse. The natural way is to remove the blockquoting and then parse the remaining text, probably recursively.&lt;/p&gt;

&lt;p&gt;One “problem” with LL parsers is that they attempt to match all the rule alternatives so if you wanted to write the above as:&lt;/p&gt;

&lt;pre&gt;document : (para | quote)* ;
para     : ((~ '\n') '\n')+ ;
quote    : ('&amp;gt; ' (~ '\n')*)+ ;&lt;/pre&gt;

&lt;p&gt;you’ve actually created an ambiguous grammar because quote lines can also be matched by the para rule. Syntactic predicates seek to resolve this kind of ambiguity by saying things like “if it looks like a block quote then it’s a block quote” when choosing between possible alternatives.&lt;/p&gt;

&lt;p&gt;Another problem I ran into was how to deal with things like auto-linking URLs.&lt;/p&gt;

&lt;p&gt;Throw in limited XML/HTML parsing and it just became a hair-pulling exercise. Basically it just seemed to be the wrong tool for this particular job. Now that doesn’t mean it can’t be done. Someone more skilled than I with it no doubt could get it done. I could see the path forward and it wasn’t pretty however.&lt;/p&gt;

&lt;p&gt;It’s a shame really because I like ANTLR. &lt;a href="http://www.cs.usfca.edu/~parrt/"&gt;Terence Parr&lt;/a&gt;, the author of ANTLR and all-round language tool rock star, has written an excellent book &lt;a href="http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference"&gt;The Definitive ANTLR Reference: Building Domain-Specific Languages&lt;/a&gt; and I can’t recommend this enough.&lt;/p&gt;

&lt;h3&gt;The Future of JMD&lt;/h3&gt;

&lt;p&gt;One commenter &lt;a href="http://www.dzone.com/links/announcing_jmd_java_markdown_port_of_markdownsharp.html"&gt;asked&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;How is it better/different than MarkdownJ?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s a good question. The answer is that JMD will use a true parser rather than a hash of regexes to parse and process Markdown. It’ll also do a lot more than this but more on this later when it’s closer to fruition.&lt;/p&gt;

&lt;p&gt;The ANTLR exercise wasn’t a complete waste. I had a choice to see if LALR parsing would offer a better alternative. The ANTLR experiment did solidify in my mind how I would go about parsing Markdown.&lt;/p&gt;

&lt;p&gt;I’m convinced that a hand-coded parser is not only possible but it’s relatively straightforward.&lt;/p&gt;

&lt;p&gt;Preliminary results look extremely promising. It’s not feature-complete yet and I won’t release it until it passes a significant portion of the unit tests inherited from MarkdownSharp.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Initial results indicate it will be 50-100x faster than a regex solution.&lt;/em&gt;&lt;/strong&gt; The lexical analysis so far is being done in a single pass using virtually no memory and is taking between 10 and 20 &lt;em&gt;microseconds&lt;/em&gt; to tokenize a document about the size of one of the unit tests.&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;I hope this post has been useful in three respects:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;To give a brief overview of the field of compiler compilers;&lt;/li&gt;

  &lt;li&gt;To explain my thought process behind how to take JMD forward; and&lt;/li&gt;

  &lt;li&gt;To explain why I’m doing what I’m doing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Watch this space.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/Bl0p1b2xlp8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/4066218079347763609/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/jmd-markdown-and-brief-overview-of.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4066218079347763609?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4066218079347763609?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/Bl0p1b2xlp8/jmd-markdown-and-brief-overview-of.html" title="JMD, Markdown and a Brief Overview of Parsing and Compilers" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>5</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/jmd-markdown-and-brief-overview-of.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0MGQHY-eip7ImA9WxBXE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-5358229045943984522</id><published>2010-01-04T19:39:00.003+08:00</published><updated>2010-01-24T13:37:01.852+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-24T13:37:01.852+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category 
scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Announcing JMD: Java MarkDown (port of MarkdownSharp)</title><content type="html">&lt;p&gt;By a strange coincidence when I was looking for text editing options for another project, the Stackoverflow guys &lt;a href="http://blog.stackoverflow.com/2009/12/introducing-markdownsharp/"&gt;released MarkdownSharp&lt;/a&gt; last week, being a C# port and extension to what was originally written in Perl.&lt;/p&gt;  &lt;p&gt;A couple of days later I have &lt;a href="http://github.com/cletus/jmd"&gt;JMD&lt;/a&gt; (Java MarkDown) with the same extensions and unit tests. At this stage—and certainly while the code stabilizes and work progresses in passing all the tests—it is an almost line-for-line translation of the C# source as this makes it easier to apply patches. This isn’t the Java Way, in particular Java favours a more DI-centric approach typified by Spring rather than static configuration.&lt;/p&gt;  &lt;p&gt;Ugliness and architectural issues aside, it will do for now. You can:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Download it from the &lt;a href="http://github.com/cletus/jmd/downloads"&gt;Downloads&lt;/a&gt; page; or &lt;/li&gt;    &lt;li&gt;Retrieve it from &lt;a href="http://github.com/"&gt;Github&lt;/a&gt; at &lt;a title="git://github.com/cletus/jmd.git" href="git://github.com/cletus/jmd.git"&gt;git://github.com/cletus/jmd.git&lt;/a&gt;. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;It is built with &lt;a href="http://maven.apache.org/"&gt;Maven&lt;/a&gt; and should build out of the box (assuming correctly configured Maven). Running on my machine:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Intel Q9450 CPU (2.66GHz); &lt;/li&gt;    &lt;li&gt;8GB DDR2 RAM; &lt;/li&gt;    &lt;li&gt;Windows 7 Ultimate 64; and &lt;/li&gt;    &lt;li&gt;Intel X25-M G2 80GB SSD. 
&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The results are:&lt;/p&gt;  &lt;pre&gt;JMD test run

1   Amps_and_angle_encoding                                 OK
2   Auto_links                                              OK
3   Backslash_escapes                                       OK
4   Blockquotes_with_code_blocks                            OK
5   Code_Blocks                                             OK
6   Code_Spans                                              OK
7   Hard_wrapped_paragraphs_with_list_like_lines            OK
8   Horizontal_rules                                        OK
9   Images                                                  OK
10  Inline_HTML_Advanced                                    Mismatch
11  Inline_HTML_comments                                    OK
12  Inline_HTML_Simple                                      OK
13  Links_inline_style                                      OK
14  Links_reference_style                                   OK
15  Links_shortcut_references                               OK
16  Literal_quotes_in_titles                                OK
17  Markdown_Documentation_Basics                           OK
18  Markdown_Documentation_Syntax                           OK
19  Nested_blockquotes                                      OK
20  Ordered_and_unordered_lists                             Mismatch
21  Strong_and_em_together                                  OK
22  Tabs                                                    OK
23  Tidyness                                                OK^

Tests      : 23
OK         : 21 (^ 1 whitespace differences)
Mismatch   : 2

input string length: 475
4000 iterations in 6.301 seconds (1.575 ms per iteration)
input string length: 2356
1000 iterations in 6.390 seconds (6.390 ms per iteration)
input string length: 27737
100 iterations in 10.503 seconds (105.031 ms per iteration)
input string length: 11075
1 iteration in 0.037 seconds
input string length: 88607
1 iteration in 0.518 seconds
input string length: 354431
1 iteration in 4.992 seconds&lt;/pre&gt;

&lt;p&gt;To compare, on the same machine, these are the MarkdownSharp results in Visual Studio 2008:&lt;/p&gt;

&lt;pre&gt;MarkdownSharp v1.006 test run on \mdtest-1.1

001 Amps_and_angle_encoding                                OK
002 Auto_links                                             OK
003 Backslash_escapes                                      OK^
004 Blockquotes_with_code_blocks                           OK
005 Code_Blocks                                            OK
006 Code_Spans                                             OK
007 Hard_wrapped_paragraphs_with_list_like_lines           OK
008 Horizontal_rules                                       OK
009 Images                                                 OK
010 Inline_HTML_Advanced                                   Mismatch
011 Inline_HTML_comments                                   OK
012 Inline_HTML_Simple                                     OK
013 Links_inline_style                                     OK
014 Links_reference_style                                  OK
015 Links_shortcut_references                              OK
016 Literal_quotes_in_titles                               OK
017 Markdown_Documentation_Basics                          OK
018 Markdown_Documentation_Syntax                          OK
019 Nested_blockquotes                                     OK
020 Ordered_and_unordered_lists                            Mismatch
021 Strong_and_em_together                                 OK
022 Tabs                                                   OK
023 Tidyness                                               OK^

Tests        : 23
OK           : 21 (^ 2 whitespace differences)
Mismatch     : 2

MarkdownSharp v1.006 test run on \test-input

001 markdown-readme                                        OK
002 reality-check                                          OK

Tests        : 2
OK           : 2
Mismatch     : 0


MarkdownSharp v1.006 benchmark, takes 10 ~ 30 seconds...

input string length: 475
4000 iterations in 3827 ms (0.95675 ms per iteration)
input string length: 2356
1000 iterations in 4205 ms (4.205 ms per iteration)
input string length: 27737
100 iterations in 4736 ms (47.36 ms per iteration)
input string length: 11075
1 iteration in 23 ms
input string length: 88607
1 iteration in 191 ms
input string length: 354431
1 iteration in 1025 ms&lt;/pre&gt;

&lt;p&gt;So Java is roughly half the speed of C# in this regard (from about 1.5 times slower on the smaller inputs to nearly 5 times slower on the largest), which is more difference than I’d expect for what is essentially the same code. At this preliminary stage I can only attribute this to the .Net Regex libraries being better.&lt;/p&gt;

&lt;p&gt;JMD is released under the same permissive &lt;a href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT license&lt;/a&gt; as MarkdownSharp. Please feel free to use it, let me know what you think or to contribute.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/pOLWUSGUjoQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/5358229045943984522/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/announcing-jmd-java-markdown-port-of.html#comment-form" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/5358229045943984522?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/5358229045943984522?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/pOLWUSGUjoQ/announcing-jmd-java-markdown-port-of.html" title="Announcing JMD: Java MarkDown (port of MarkdownSharp)" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>7</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/announcing-jmd-java-markdown-port-of.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;A08FRn45fip7ImA9WxBRE0U.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-4917424370978974063</id><published>2010-01-02T07:50:00.001+08:00</published><updated>2010-01-02T07:50:17.026+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-02T07:50:17.026+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><title>Java: Why-oh-why still no multi-line strings?</title><content type="html">&lt;p&gt;Over the last year or two I’ve been doing a lot of PHP (which I really like).One of the things I use a lot is &lt;a href="http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc"&gt;heredoc syntax&lt;/a&gt; eg:&lt;/p&gt;  &lt;pre class="brush:php"&gt;$query = &amp;lt;&amp;lt;&amp;lt;END
SELECT *
FROM tablename
WHERE condition1 = $field
AND condition2 = 345
END;&lt;/pre&gt;

&lt;p&gt;This is much more convenient than, say:&lt;/p&gt;

&lt;pre class="brush:java"&gt;String query =
  &amp;quot;SELECT * &amp;quot; + // MUST remember to put a space here!
  &amp;quot;FROM tablename &amp;quot; +
  &amp;quot;WHERE condition1 = &amp;quot; + field + &amp;quot; &amp;quot; + 
  &amp;quot;AND condition2 = 345&amp;quot;;&lt;/pre&gt;

&lt;p&gt;I’ve been doing quite a bit of Java recently and it’s really starting to bug me. I don't understand why pretty much every other imperative language invented in the last 15 years can have some form of multi-line string syntax but Java &lt;em&gt;still&lt;/em&gt; doesn’t.&lt;/p&gt;

&lt;p&gt;Java 6 was released over three years ago. Java 7—thanks largely to the unexpected (yet welcome) inclusion of closures—isn’t due for nearly another year. Four years between releases.&lt;/p&gt;

&lt;p&gt;Surely Java could have gotten &lt;em&gt;something&lt;/em&gt; in that time. Even if it’s the rather ugly (imho) triple-quote syntax of Scala/Groovy it’d be better than nothing.&lt;/p&gt;

&lt;p&gt;Anyway, I just needed to get that out.&lt;/p&gt;

&lt;p&gt;Happy New Year for 2010.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/Wv0JlSp2Kyo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/4917424370978974063/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/java-why-oh-why-still-no-multi-line.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4917424370978974063?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4917424370978974063?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/Wv0JlSp2Kyo/java-why-oh-why-still-no-multi-line.html" title="Java: Why-oh-why still no multi-line strings?" 
/><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>4</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/java-why-oh-why-still-no-multi-line.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEANQXs5fSp7ImA9WxBREEQ.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-3073976543799228049</id><published>2009-12-29T21:13:00.001+08:00</published><updated>2009-12-29T21:19:50.525+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-29T21:19:50.525+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="performance" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><title>Mutability, Arrays and the Cost of Temporary Objects in Java</title><content type="html">&lt;p&gt;In his 2001 must-read book, &lt;a href="http://www.amazon.com/Effective-Java-2nd-Joshua-Bloch/dp/0321356683/ref=sr_1_1?ie=UTF8&amp;amp;s=books&amp;amp;qid=1262081713&amp;amp;sr=8-1"&gt;Effective Java&lt;/a&gt;, Joshua Bloch said in one item “Favor immutability”. &lt;a href="http://www.ibm.com/developerworks/java/library/j-jtp02183.html"&gt;Java theory and practice: To mutate or not to mutate?&lt;/a&gt; provides an excellent overview of what this means and why it matters. It states:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;An immutable object is one whose externally visible state cannot change after it is instantiated.&lt;/p&gt; &lt;/blockquote&gt;  &lt;h3&gt;A Brief Overview of Immutability&lt;/h3&gt;  &lt;p&gt;Let’s say you want to create a class to model arbitrary precision &lt;a href="http://en.wikipedia.org/wiki/Rational_number"&gt;rational numbers&lt;/a&gt; (ie fractions). A mutable version might start out:&lt;/p&gt;  &lt;pre class="brush:java"&gt;public class BigRational {
  private BigInteger numerator = BigInteger.ZERO;
  private BigInteger denominator = BigInteger.ONE;

  // constructors and so on

  public BigRational add(BigRational other) {
    if (numerator.signum() == 0) {
      numerator = other.numerator;
      denominator = other.denominator;
    } else if (other.numerator.signum() == 0) {
      // no action required
    } else if (denominator.equals(other.denominator)) {
      numerator = numerator.add(other.numerator);
    } else {
      // this could be optimized for greatest common divisor
      numerator = numerator.multiply(other.denominator).add(other.numerator.multiply(denominator));
      denominator = denominator.multiply(other.denominator);
    }
    return this;
  }

  // etc
}&lt;/pre&gt;

&lt;p&gt;This is how many classes naively start. The problem comes when this class is used as a data member or a parameter. Consider this class:&lt;/p&gt;

&lt;pre class="brush:java"&gt;public class MyClass {
  private BigRational data = new BigRational();

  public BigRational getData() {
    return data;
  }
}&lt;/pre&gt;

&lt;p&gt;Doing this will modify the internal state of the class:&lt;/p&gt;

&lt;pre class="brush:java"&gt;MyClass mc = new MyClass();
BigRational rational = mc.getData();
rational.multiply(rational);&lt;/pre&gt;

&lt;p&gt;Obviously this behaviour isn’t desirable, which leads to the practice of &lt;a href="http://www.javapractices.com/topic/TopicAction.do?Id=15"&gt;defensive copying&lt;/a&gt;, a practice familiar to any C programmer. Each time this getter is called a temporary copy is created so the internal state of the class isn’t violated.&lt;/p&gt;
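
&lt;p&gt;As a minimal sketch of defensive copying, here is a class holding a java.util.Date, which is mutable. The class name Appointment and its accessor are hypothetical and exist only for this illustration.&lt;/p&gt;

```java
import java.util.Date;

// Hypothetical class used only to illustrate defensive copying of a mutable field.
final class Appointment {
  private final Date start;

  Appointment(Date start) {
    // Copy on the way in, so later changes to the caller's Date don't leak in.
    this.start = new Date(start.getTime());
  }

  Date getStart() {
    // Copy on the way out, so callers can't modify the internal state.
    return new Date(start.getTime());
  }
}
```

&lt;p&gt;Every call to getStart() allocates a temporary object. That is the price of protecting the internal state of a class that holds mutable data.&lt;/p&gt;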

&lt;p&gt;One of the biggest early errors in Java’s design was that the &lt;a href="http://java.sun.com/javase/6/docs/api/java/util/Date.html"&gt;Date class&lt;/a&gt; is mutable. This means that any API that uses dates either has to use a non-standard date class or defensively copy date instances. Oddly, Java got String, BigInteger and BigDecimal all right as they’re all immutable. Even stranger, the later &lt;a href="http://java.sun.com/javase/6/docs/api/java/util/Calendar.html"&gt;Calendar class&lt;/a&gt; (introduced in JDK 1.1) was also made mutable.&lt;/p&gt;

&lt;p&gt;An immutable version would look something like:&lt;/p&gt;

&lt;pre class="brush:java"&gt;public class BigRational {
  private final BigInteger numerator;
  private final BigInteger denominator;

  public BigRational() {
    this(BigInteger.ZERO);
  }

  public BigRational(BigInteger integer) {
    this(integer, BigInteger.ONE);
  }

  public BigRational(BigInteger numerator, BigInteger denominator) {
    if (denominator.signum() == 0) {
      throw new IllegalArgumentException(&amp;quot;denominator cannot be zero&amp;quot;);
    }
    if (numerator.signum() == 0) {
      this.numerator = BigInteger.ZERO;
      this.denominator = BigInteger.ONE;
    } else {
      this.numerator = numerator;
      this.denominator = denominator;
    }
  }

  public BigRational multiply(BigRational other) {
    if (numerator.signum() == 0 || other.numerator.signum() == 0) {
      return new BigRational(BigInteger.ZERO);
    }
    // a/b * c/d = (a*c)/(b*d); this could be reduced by the greatest common divisor
    return new BigRational(numerator.multiply(other.numerator), denominator.multiply(other.denominator));
  }

  // etc
}&lt;/pre&gt;

&lt;h3&gt;Arrays are Mutable&lt;/h3&gt;

&lt;p&gt;The big problem with all this is that Java arrays are mutable. So for example:&lt;/p&gt;

&lt;pre class="brush:java"&gt;public void doStuff(String args[]) {
  args[0] = &amp;quot;Hello world&amp;quot;;
}

...

String arr[] = new String[] { &amp;quot;one&amp;quot;, &amp;quot;two&amp;quot;, &amp;quot;three&amp;quot; };
doStuff(arr);
System.out.println(arr[0]); // Hello world&lt;/pre&gt;

&lt;p&gt;This is one big reason why you should use &lt;a href="http://java.sun.com/javase/6/docs/api/java/util/List.html"&gt;Lists&lt;/a&gt; instead of arrays in almost all circumstances where you have a choice. Lists can be made immutable (strictly speaking, wrapped in an unmodifiable view, so nothing else should keep a reference to the underlying list):&lt;/p&gt;

&lt;pre class="brush:java"&gt;List&amp;lt;String&amp;gt; list = new ArrayList&amp;lt;String&amp;gt;();
list.add(&amp;quot;one&amp;quot;);
list.add(&amp;quot;two&amp;quot;);
list.add(&amp;quot;three&amp;quot;);
final List&amp;lt;String&amp;gt; immutableList = Collections.unmodifiableList(list);&lt;/pre&gt;

&lt;p&gt;On a side note, this rather verbose syntax would get a little easier with the &lt;a href="http://tech.puredanger.com/2009/06/02/javaone-coin/"&gt;collection literals&lt;/a&gt; proposed for Java 7, for example:&lt;/p&gt;

&lt;pre class="brush:java"&gt;List&amp;lt;String&amp;gt; list = [&amp;quot;one&amp;quot;, &amp;quot;two&amp;quot;, &amp;quot;three&amp;quot;];
final List&amp;lt;String&amp;gt; immutableList = Collections.unmodifiableList(list);&lt;/pre&gt;

&lt;h3&gt;Enum Values&lt;/h3&gt;

&lt;p&gt;Java 5 introduced &lt;a href="http://www.javapractices.com/topic/TopicAction.do?Id=1"&gt;typesafe enums&lt;/a&gt;, largely based on Joshua Bloch’s proposal (that used classes). The unofficial versions had lots of potential issues (eg having to implement readResolve() to cater to serialization creating new instances). In my opinion, Java’s enums are one of (increasingly few) significantly better language constructs in Java compared to, say, C# as Java’s enums aren’t just thinly wrapped integers (as they are in C/C++/.Net) and can also have behaviour.&lt;/p&gt;

&lt;p&gt;Java enums have a static method called values() which returns an &lt;em&gt;array&lt;/em&gt; of all instances of that enum. After the lessons of the Date class, this particular decision was nothing short of shocking. A List would have been a far more sensible choice. Internally this means the array of instances must be defensively copied each time values() is called, which forces you to write code like this repeatedly if you want to avoid the copies:&lt;/p&gt;

&lt;pre class="brush:java"&gt;public enum Season {
  SPRING, SUMMER, AUTUMN, WINTER;

  private static final List&amp;lt;Season&amp;gt; VALUES =
    Collections.unmodifiableList(
      new ArrayList&amp;lt;Season&amp;gt;(Arrays.asList(values())));

  public static List&amp;lt;Season&amp;gt; getValues() { return VALUES; }
}&lt;/pre&gt;
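
&lt;p&gt;The defensive copy is easy to observe. The following is a small sketch (the class and method names are made up for illustration) showing that each call to values() hands back a different array, and that scribbling on one copy does not affect the next.&lt;/p&gt;

```java
class ValuesCopyDemo {
  enum Season { SPRING, SUMMER, AUTUMN, WINTER }

  // Two calls to values() should yield two distinct array objects.
  static boolean eachCallReturnsNewArray() {
    Season[] first = Season.values();
    Season[] second = Season.values();
    return first != second;
  }

  // Overwriting a slot in one copy should not be visible in a later copy.
  static boolean callerMutationIsInvisible() {
    Season[] mine = Season.values();
    mine[0] = Season.WINTER;
    return Season.values()[0] == Season.SPRING;
  }

  public static void main(String[] args) {
    System.out.println(eachCallReturnsNewArray());
    System.out.println(callerMutationIsInvisible());
  }
}
```

&lt;p&gt;That safety is exactly what costs an allocation on every call.&lt;/p&gt;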

&lt;h3&gt;Is This Really Necessary?&lt;/h3&gt;

&lt;p&gt;There is nothing &lt;em&gt;inherently&lt;/em&gt; wrong with temporary objects. It’s a question of degree. Creating several hundred (or thousand) temporary objects is no big deal, but at some point there is such a thing as too many.&lt;/p&gt;

&lt;p&gt;I will demonstrate two things here:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The cost of temporary arrays; and &lt;/li&gt;

  &lt;li&gt;The correct way to generate random enums. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This post was prompted by &lt;a href="http://stackoverflow.com/questions/1972392/java-pick-a-random-value-from-an-enum"&gt;Java: Pick a random value from an enum?&lt;/a&gt; where the poster created a Random on every iteration. Some years ago this was bad because Randoms were seeded with the current time (in milliseconds or even seconds), so instances created in a short space of time wouldn’t give a particularly random distribution. That makes it worth answering the question of fairness as well.&lt;/p&gt;

&lt;p&gt;This enum will be used:&lt;/p&gt;

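&lt;pre class="brush:java"&gt;import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public enum Season {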
  SPRING, SUMMER, AUTUMN, WINTER;

  private static final List&amp;lt;Season&amp;gt; VALUES1 =
      Collections.unmodifiableList(
          new ArrayList&amp;lt;Season&amp;gt;(Arrays.asList(values())));
  private static final Season[] VALUES2 = values();
  private static final int SIZE = VALUES2.length;
  
  private static final Random RANDOM = new Random();

  public static Season random1() {
    return values()[new Random().nextInt(SIZE)];
  }

  public static Season random2() {
    return values()[RANDOM.nextInt(SIZE)];
  }

  public static Season random3() {
    return VALUES1.get(new Random().nextInt(SIZE));
  }

  public static Season random4() {
    return VALUES1.get(RANDOM.nextInt(SIZE));
  }

  public static Season random5() {
    return VALUES2[new Random().nextInt(SIZE)];
  }

  public static Season random6() {
    return VALUES2[RANDOM.nextInt(SIZE)];
  }
}&lt;/pre&gt;

&lt;p&gt;with the following test harness:&lt;/p&gt;

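&lt;pre class="brush:java"&gt;import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Temporary {
  private static final int COUNT = 30000000;

  public static void main(String[] args) {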
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    MemoryMonitor monitor = new MemoryMonitor();
    executor.scheduleAtFixedRate(monitor, 0, 10, TimeUnit.MILLISECONDS);
    int[] tally = new int[4];
    long baseline = usedMemory();
    long start = System.nanoTime();
    for (int i=0; i&amp;lt;COUNT; i++) {
      tally[Season.random1().ordinal()]++;
    }
    long end = System.nanoTime();
    executor.shutdown();
    try {
      executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    }
    long memoryUsed = monitor.peak() - baseline;
    for (Season season : Season.values()) {
      System.out.printf(&amp;quot;%s: %,d%n&amp;quot;, season, tally[season.ordinal()]);
    }
    System.out.printf(&amp;quot;%nCompleted %,d iterations in %,.3f seconds using %,d bytes%n&amp;quot;,
        COUNT, ((end - start) / 1000000) / 1000.0d, memoryUsed
    );
  }

  private static long usedMemory() {
    Runtime runtime = Runtime.getRuntime();
    return runtime.totalMemory() - runtime.freeMemory();
  }

  private static void waitForEnter() {
    try {
      new BufferedReader(new InputStreamReader(System.in)).readLine();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}&lt;/pre&gt;

&lt;p&gt;and&lt;/p&gt;

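&lt;pre class="brush:java"&gt;import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Samples used heap each time it is run by the scheduler.
public class MemoryMonitor implements Runnable {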
  private final Runtime runtime = Runtime.getRuntime();
  private final List&amp;lt;Long&amp;gt; usage = new ArrayList&amp;lt;Long&amp;gt;();

  @Override
  public void run() {
    usage.add(runtime.totalMemory() - runtime.freeMemory());
  }

  public List&amp;lt;Long&amp;gt; usage() {
    return usage;
  }

  public long peak() {
    return Collections.max(usage);
  }
}&lt;/pre&gt;

&lt;p&gt;with each method being run in turn.&lt;/p&gt;

&lt;h3&gt;The Results&lt;/h3&gt;
&lt;style type="text/css"&gt;
#rundata { border-collapse: collapse; }
#rundata td { border: 1px solid black; text-align: center; }&lt;/style&gt;

&lt;table id="rundata" border="0" cellspacing="0" cellpadding="2" width="472"&gt;&lt;tbody&gt;
    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;&lt;strong&gt;Method&lt;/strong&gt;&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;&lt;strong&gt;Run time (seconds)&lt;/strong&gt;&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;&lt;strong&gt;Peak Memory Usage (bytes)&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random1&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;9.746&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;681,288&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random2&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;5.914&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;665,592&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random3&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;5.123&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;669,408&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random4&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;1.476&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;18,376&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random5&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;4.593&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;661,368&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random6&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;1.056&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;18,376&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;From this we can draw several conclusions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Creating a Random on every invocation is fair (in distribution terms) but has a high cost in temporary objects and CPU time (roughly 1.6x to 4.3x the run time in these tests); &lt;/li&gt;

  &lt;li&gt;The garbage collector is clearly working: the temporary arrays and the temporary Random objects both contribute to memory usage, yet peak usage is roughly the same whether one or both are created on each call. This probably means that creating either is enough to trigger a GC; &lt;/li&gt;

  &lt;li&gt;Using a static copy of the enum values is 2-5x as fast; &lt;/li&gt;

  &lt;li&gt;A static array copy is about 10-30% quicker than a static List copy (4.593 vs 5.123 seconds, and 1.056 vs 1.476 seconds); and &lt;/li&gt;

  &lt;li&gt;The most optimized version uses roughly 37x less peak memory and runs about 9x quicker than the least optimized version. &lt;/li&gt;
&lt;/ol&gt;
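
&lt;p&gt;For anyone who wants to check the arithmetic behind these comparisons, here is a small sketch that recomputes a few ratios from the figures in the table. The class name and the choice of which pairs to compare are mine, not part of the original test harness.&lt;/p&gt;

```java
class ResultRatios {
  // How many times larger the first measurement is than the second.
  static double ratio(double larger, double smaller) {
    return larger / smaller;
  }

  public static void main(String[] args) {
    // Run times in seconds and peak memory in bytes, copied from the table.
    System.out.printf("new Random vs shared Random (cached array): %.1fx%n", ratio(4.593, 1.056));
    System.out.printf("values() vs cached array (shared Random): %.1fx%n", ratio(5.914, 1.056));
    System.out.printf("cached List vs cached array (shared Random): %.2fx%n", ratio(1.476, 1.056));
    System.out.printf("random1 vs random6 run time: %.1fx%n", ratio(9.746, 1.056));
    System.out.printf("random1 vs random6 peak memory: %.1fx%n", ratio(681288, 18376));
  }
}
```

&lt;p&gt;These are single runs on one machine, so treat the ratios as indicative rather than precise.&lt;/p&gt;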

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;For significant use of an enum’s values() method, it’s a no-brainer: create and use a static copy instead. It’s faster and uses far less memory. In non-trivial applications it will also mean less memory fragmentation and fewer (possibly expensive) GCs, which is a significant issue for high-usage Web applications.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/CMNblULpX3k" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/3073976543799228049/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/12/mutability-arrays-and-cost-of-temporary.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3073976543799228049?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3073976543799228049?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/CMNblULpX3k/mutability-arrays-and-cost-of-temporary.html" title="Mutability, Arrays and the Cost of Temporary Objects in Java" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>5</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/12/mutability-arrays-and-cost-of-temporary.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;CkENSXkzfyp7ImA9WxBTGEQ.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-1905431498404039281</id><published>2009-12-15T23:21:00.001+08:00</published><updated>2009-12-15T23:24:58.787+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-15T23:24:58.787+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="stackoverflow" /><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><title>Hard Numbers on Stackoverflow Careers</title><content type="html">&lt;p&gt;&lt;em&gt;This is a follow-up to &lt;/em&gt;&lt;a href="http://www.cforcoding.com/2009/12/joel-inc-stackoverflow-careers-and.html" target="_blank"&gt;&lt;em&gt;Joel Inc., Stackoverflow Careers and Jumping Sharks&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, posted late last week.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Joel posted &lt;a href="http://www.joelonsoftware.com/items/2009/12/13.html" target="_blank"&gt;Stack Stats&lt;/a&gt; this week in which he demonstrates the correlation between Stackoverflow reputation and Careers take-up.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.joelonsoftware.com/items/2009/12/13.html" target="_blank"&gt;&lt;img style="width: 426px; display: inline; margin-left: 0px; margin-right: 0px" align="left" src="http://www.joelonsoftware.com/items/2009/12/13cvs.png" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p style="clear: left"&gt;Now the exact meaning of this graph isn’t listed. Is that 30% of users with 50,000 reputation and above have submitted CVs? 
Let’s assume that it is.&lt;/p&gt;  &lt;div align="center"&gt;   &lt;table border="1" cellspacing="0" cellpadding="2" width="500" align="center"&gt;&lt;tbody&gt;       &lt;tr&gt;         &lt;td valign="top" width="125"&gt;&lt;strong&gt;Reputation&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;Percentage&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;Users in Range&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;# of CVs&lt;/strong&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;1,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;8%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;3,257&lt;/td&gt;          &lt;td valign="top" width="125"&gt;261&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;2,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;10%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;1,196&lt;/td&gt;          &lt;td valign="top" width="125"&gt;120&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;3,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;11%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;640&lt;/td&gt;          &lt;td valign="top" width="125"&gt;70&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;4,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;12%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;373&lt;/td&gt;          &lt;td valign="top" width="125"&gt;45&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;5,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;13%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;244&lt;/td&gt;          &lt;td valign="top" width="125"&gt;32&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" 
width="125"&gt;6,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;13.5%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;153&lt;/td&gt;          &lt;td valign="top" width="125"&gt;21&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;7,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;14%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;112&lt;/td&gt;          &lt;td valign="top" width="125"&gt;16&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;8,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;15%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;95&lt;/td&gt;          &lt;td valign="top" width="125"&gt;14&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;9,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;16%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;48&lt;/td&gt;          &lt;td valign="top" width="125"&gt;8&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;10,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;17%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;220&lt;/td&gt;          &lt;td valign="top" width="125"&gt;37&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;15,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;19.5%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;79&lt;/td&gt;          &lt;td valign="top" width="125"&gt;15&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;20,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;22%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;58&lt;/td&gt;          &lt;td valign="top" width="125"&gt;12&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;30,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;26%&lt;/td&gt;   
       &lt;td valign="top" width="125"&gt;28&lt;/td&gt;          &lt;td valign="top" width="125"&gt;7&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;50,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;30%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;15&lt;/td&gt;          &lt;td valign="top" width="125"&gt;5&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;10.2%&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;6,518&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;663&lt;/strong&gt;&lt;/td&gt;       &lt;/tr&gt;     &lt;/tbody&gt;&lt;/table&gt; &lt;/div&gt;  &lt;p&gt;So we have 145 employers (as of 15 Dec 2009) and 663 job seekers of the 6,518 in the sample representing a percentage take-up of 10.2%.&lt;/p&gt;  &lt;p&gt;I would guess that the vast majority of those would’ve paid $29 for 3 years so these ~700 users account for roughly $21,000 in revenue over 3 years.&lt;/p&gt;  &lt;p&gt;Is this kind of barrier—charging job seekers—really worth that kind of revenue stream?&lt;/p&gt;  &lt;p&gt;Of course the hope is both that:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The number of candidates will substantially grow; and &lt;/li&gt;    &lt;li&gt;Many (or most) of them will convert to paying $99/year in 3 years. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;Three years from now I would consider it optimistic that the users matching the profile might number 40,000 instead of 10,000 to 25,000. 
That won’t accurately reflect the natural attrition rate either (there are some users who have already become basically inactive).&lt;/p&gt;  &lt;p&gt;Considering that not all users will be looking for work at the same time (assuming Goldman Sachs’ next crackpot house of cards hasn’t come tumbling down yet), it’s hard to imagine the take-up rate being higher than 15-20% and that’s being optimistic.&lt;/p&gt;  &lt;p&gt;So if everything goes well 10,000 people are paying $99/year. $1 million a year—basically money for nothing—is nothing to sneeze at. I can’t see it happening however.&lt;/p&gt;  &lt;p&gt;Even if it does, it’s questionable whether this is the critical mass required to attract employers. I guess time will tell.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/QM7YFO6fnsI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/1905431498404039281/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/12/hard-numbers-on-stackoverflow-careers.html#comment-form" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1905431498404039281?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1905431498404039281?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/QM7YFO6fnsI/hard-numbers-on-stackoverflow-careers.html" title="Hard Numbers on Stackoverflow Careers" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image 
rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>10</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/12/hard-numbers-on-stackoverflow-careers.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEQCRX48cSp7ImA9WxBTFE4.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-166152980014672010</id><published>2009-12-10T16:06:00.001+08:00</published><updated>2009-12-10T16:06:04.079+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-10T16:06:04.079+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="stackoverflow" /><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><title>Joel Inc., Stackoverflow Careers and Jumping Sharks</title><content type="html">&lt;p&gt;Joel Spolsky is a legend in the programming world. His blog—&lt;a href="http://www.joelonsoftware.com/"&gt;Joel on Software&lt;/a&gt;—is the most popular and well-known programming blog. In mid-2008, Joel and Jeff Atwood—of &lt;a href="http://www.codinghorror.com/blog/"&gt;Coding Horror&lt;/a&gt; fame—launched &lt;a href="http://stackoverflow.com/"&gt;Stackoverflow&lt;/a&gt;, a free site for asking programming questions.&lt;/p&gt;  &lt;p&gt;Stackoverflow is clearly a success but the sister sites haven’t fared nearly as well. Recently Jeff and Joel launched &lt;a href="http://careers.stackoverflow.com/"&gt;Stackoverflow Careers&lt;/a&gt;, a site for programmers to find jobs and employers to find programmers.&lt;/p&gt;  &lt;p&gt;Stackoverflow Careers may just be a bridge too far.&lt;/p&gt;  &lt;h3&gt;Let’s Talk About… Joel&lt;/h3&gt;  &lt;p&gt;Joel on Software was the first blog I ever read. I read it before anyone really knew what a blog was. 
&lt;a href="http://www.joelonsoftware.com/uibook/chapters/fog0000000057.html"&gt;Controlling Your Environment Makes You Happy&lt;/a&gt; was one of those things I read that completely changed my perspective. &lt;a href="http://www.joelonsoftware.com/articles/APIWar.html"&gt;How Microsoft Lost the API War&lt;/a&gt; I consider to be almost prophetic in its predictions regarding the then-Longhorn now-Vista boondoggle and desktop bloodletting by Web applications.&lt;/p&gt;  &lt;p&gt;But something isn’t right in the Land of Joel.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Buzo.jpg" rel="license" target="_blank"&gt;&lt;img style="width: 320px" src="http://img187.imageshack.us/img187/2708/scubav.jpg" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In the late 90s during a brief flirtation with strenuous physical activity, I learnt to SCUBA dive. I went to one of these courses that was an evening of instruction of the evils of nitrogen, a weekend in the pool and then a weekend in the ocean. This was a &lt;a href="http://www.padi.com/scuba/"&gt;PADI&lt;/a&gt; course and is very much the consumer-grade diving education and I state that as a simple observation not a judgement or accusation. At the other end of the spectrum is &lt;a href="http://www.naui.org/"&gt;NAUI&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;PADI is all about selling you stuff—gear, courses, whatever. A friend remarked to me that PADI stood for &lt;strong&gt;Put Another Dollar In&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;NAUI on the other hand is much more highly regarded but less prolific. It is a not-for-profit organisation. Whereas some accuse PADI of dumbing down SCUBA training, nothing of the sort is levelled against NAUI. That same friend said NAUI stands for &lt;strong&gt;Not Another Untrained Idiot&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;What does this have to do with Joel? 
&lt;em&gt;Whereas Joel was once the NAUI-like font of wisdom, now it just seems like he’s trying to sell me stuff.&lt;/em&gt;&lt;/p&gt;  &lt;h3&gt;Jumping the Shark&lt;/h3&gt;  &lt;p&gt;Of course I’m not the first to articulate this. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Fonzie_jumps_the_shark.PNG" rel="license" target="_blank"&gt;&lt;img style="width: 317px" src="http://upload.wikimedia.org/wikipedia/en/5/51/Fonzie_jumps_the_shark.PNG" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In recent times Joel has taken quite a bashing, for example &lt;a href="http://stochasticgeometry.wordpress.com/2009/10/27/joel-spolsky-snake-oil-salesman/"&gt;Joel Spolsky, Snake-Oil Salesman&lt;/a&gt; and &lt;a href="http://blogs.citytechinc.com/sanderson/"&gt;Sten Anderson&lt;/a&gt;’s &lt;a href="http://blogs.citytechinc.com/sanderson/?p=284"&gt;I Heart Joel on Software&lt;/a&gt;. &lt;/p&gt;  &lt;p&gt;Sten’s comments are particularly interesting because what he says is true: all Joel’s endless talk about great programmers is thinly disguised disdain for the 99% of us that didn’t go to MIT, Stanford, UW, Yale, Harvard or UPenn.&lt;/p&gt;  &lt;p&gt;Amusingly, Jeff Atwood posted several years ago &lt;a href="http://www.codinghorror.com/blog/archives/000679.html"&gt;Has Joel Spolsky Jumped the Shark?&lt;/a&gt; going so far as to say:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;I reject this new, highly illogical Joel Spolsky. I demand the immediate return of the sage, sane, wise Joel Spolsky of years past. 
But maybe it's like wishing for a long-running television show to return to its previous glories.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I guess he got over it.&lt;/p&gt;  &lt;p&gt;Side note: Jeff was responding to &lt;a href="http://www.joelonsoftware.com/items/2006/09/01.html"&gt;Language Wars&lt;/a&gt; (emphasis added by Jeff):&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://www.fogcreek.com/FogBugz"&gt;FogBugz&lt;/a&gt;is written in Wasabi, a very advanced, functional-programming dialect of Basic with closures and lambdas and Rails-like active records that can be compiled down to VBScript, JavaScript, PHP4 or PHP5. &lt;strong&gt;Wasabi is a private, in-house language written by one of our best developers that is optimized specifically for developing FogBugz;&lt;/strong&gt; the Wasabi compiler itself is written in C#.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I admit it: I love a good rant. And not just ranting for ranting’s sake but a rant with a message, an essential kernel of truth, a pearl of wisdom. It’s hard to forget &lt;a href="http://www.zedshaw.com/"&gt;Zed Shaw&lt;/a&gt;’s now-infamous (albeit retracted) &lt;a href="http://web.archive.org/web/20080103072111/http://www.zedshaw.com/rants/rails_is_a_ghetto.html"&gt;Rails is a Ghetto&lt;/a&gt; rant of nearly two years ago. Yesterday I read &lt;a href="http://gilesbowkett.blogspot.com/"&gt;Giles Bowkett&lt;/a&gt;’s &lt;a href="http://gilesbowkett.blogspot.com/2009/12/blogs-are-godless-communist-bullshit.html"&gt;Blogs are Godless Communist Bullshit&lt;/a&gt;. It’s long but entertaining and absolutely worth reading.&lt;/p&gt;  &lt;p&gt;But is all this criticism justified?&lt;/p&gt;  &lt;p&gt;Firstly, some background.&lt;/p&gt;  &lt;h3&gt;IT Recruitment&lt;/h3&gt;  &lt;p&gt;In Europe and Australia programmers (and other IT professionals) are found in three ways:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Direct recruitment by the employer. 
This usually means big employers who have dedicated HR departments to filter out CVs, book interviews and so on. Such candidates will most likely become salaried employees of the company; &lt;/li&gt;    &lt;li&gt;Word of mouth; and &lt;/li&gt;    &lt;li&gt;Through recruitment agencies. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;In my experience recruitment agents are &lt;em&gt;loathed&lt;/em&gt; by IT workers (eg &lt;a href="http://angryaussie.wordpress.com/2006/10/30/why-is-it-recruitment-so-bad/"&gt;Why is IT recruitment so bad?&lt;/a&gt;). Most of the time they’re &lt;em&gt;utterly clueless&lt;/em&gt; (I have in all seriousness been asked “I see you have 7 years of Java experience but do you have any J2SE experience?”). Horror stories are legion. IT recruitment in London in particular is a &lt;em&gt;soul-destroying experience&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;Recruiters will fill positions on a &lt;em&gt;permanent&lt;/em&gt; (salaried) or &lt;em&gt;contract&lt;/em&gt; (paid by the hour, day, week or month) basis.&lt;/p&gt;  &lt;p&gt;The recruiter will earn a fee that is typically around 10-15% of the candidate’s annual salary upon successfully filling the position. If the employee leaves in the probationary period (typically three months) some or all of that will be refunded.&lt;/p&gt;  &lt;p&gt;With contractors the recruiter will typically earn a margin of 10-25% (or even higher) on top of the contractor’s rate either for a fixed term (eg it scales down after a year) or in perpetuity. Expat contractors typically have criminally high margins put on top of what they earn, at least initially.&lt;/p&gt;  &lt;p&gt;So recruitment is expensive.&lt;/p&gt;  &lt;p&gt;Compare that to placing ads on job boards, which will typically cost hundreds of dollars (eg &lt;a href="http://jobs.joelonsoftware.com/default.asp?pg=pgFAQ"&gt;jobs.joelonsoftware.com FAQ&lt;/a&gt; and &lt;a href="http://hiring.monster.com/recruitment/Job-Postings.aspx"&gt;Monster Job Posting&lt;/a&gt;) and last for weeks. 
One ad can potentially fill multiple positions. Employers will typically keep CVs on file and getting contacted some time after applying is not uncommon. So ads can be effective although there can be a lot of chaff.&lt;/p&gt;  &lt;p&gt;IT recruitment &lt;em&gt;is&lt;/em&gt; broken so there’s definitely room for a solution.&lt;/p&gt;  &lt;h3&gt;Stackoverflow Careers&lt;/h3&gt;  &lt;p&gt;Careers is another site hoping to capitalize on the success of Stackoverflow. Programmers routinely demonstrate the ability to self-organize, which I think explains—at least in part—its success. Computer science is also centuries old. Yes, I said “centuries old”. So before some reddit lurker points out computers were born in the mid-twentieth century, I suggest you consult the &lt;a href="http://en.wikipedia.org/wiki/Timeline_of_computing_2400_BC%E2%80%931949"&gt;Timeline of computing 2400 BC–1949&lt;/a&gt; and the work of &lt;a href="http://en.wikipedia.org/wiki/Charles_Babbage"&gt;Charles Babbage&lt;/a&gt; and others.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://commons.wikimedia.org/wiki/File:Volunteers_of_America_Soup_Kitchen_WDC.gif" rel="license" target="_blank"&gt;&lt;img style="width: 320px" src="http://img690.imageshack.us/img690/2364/soupkitchen.gif" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;The latest money-making venture is &lt;a href="http://careers.stackoverflow.com/"&gt;Stackoverflow Careers&lt;/a&gt;, heavily cross-promoted by Jeff Atwood (&lt;a href="http://blog.stackoverflow.com/2009/10/introducing-stack-overflow-careers/"&gt;Introducing Stack Overflow Careers&lt;/a&gt; and &lt;a href="http://www.codinghorror.com/blog/archives/001308.html"&gt;Stack Overflow Careers: Amplifying Your Awesome&lt;/a&gt;) and Joel (&lt;a href="http://www.joelonsoftware.com/items/2009/11/05.html"&gt;Upgrade your career&lt;/a&gt; and &lt;a href="http://www.joelonsoftware.com/items/2009/12/02.html"&gt;Programmer search engine&lt;/a&gt;) as well as echoes in the 
blogosphere.&lt;/p&gt;  &lt;p&gt;Despite the success in terms of audience size (&lt;a href="http://www.youtube.com/watch?v=NWHfY_lvKIQ"&gt;Joel in his Google Tech Talk&lt;/a&gt; claims a ~30% programmer share, which is huge if true), programmers are a hard bunch to monetize (see &lt;a href="http://blog.stackoverflow.com/2009/11/our-amazon-advertising-experiment/"&gt;Our Amazon Advertising Experiment&lt;/a&gt;). Careers is the latest incarnation.&lt;/p&gt;  &lt;p&gt;It’s free to have a public CV but having a private CV costs money (allegedly $99/year after 31st December but don’t be surprised if that changes). The private CV is searchable by employers and allows (as Jeff/Joel put it) “deep” integration with Stackoverflow.&lt;/p&gt;  &lt;p&gt;The employers are paying too anywhere from $500 for a week to $5,000 for a year (see the &lt;a href="http://careers.stackoverflow.com/faq"&gt;FAQ&lt;/a&gt;).&lt;/p&gt;  &lt;p&gt;Not cheap. So what are we getting for our money?&lt;/p&gt;  &lt;h3&gt;The Hollywood Analogy&lt;/h3&gt;  &lt;p&gt;Joel claims:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;In Hollywood, studios who need talent browse through portfolios, find two or three possible candidates, and make them great offers. And then they all try to outdo each other providing plush work environments and great benefits.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Make no mistake: you’re being sold something here. The allure of stardom is deliberate bait. Giles succinctly sums this up:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;This last part is laugh-out-loud funny. That's not how Hollywood works. I'm an actor, I've been studying acting for years, and I know award-winning actors who still have to go out on auditions like everybody else. You might wonder how a newbie like me, with nothing but Cop #3 in a student film to his credit, can claim to know award-winning, seasoned professionals. 
It's simple: because &lt;b&gt;&lt;i&gt;they have to go on auditions like everybody else&lt;/i&gt;&lt;/b&gt;&lt;i&gt;&lt;/i&gt;.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I will take one issue with what Giles said:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Robert Downey Jr. had to fight like hell to get the lead role in &lt;i&gt;Iron Man&lt;/i&gt;.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Yes, but there’s a reason for that. He had a &lt;a href="http://en.wikipedia.org/wiki/Robert_Downey,_Jr.#Substance_abuse"&gt;serious drug problem&lt;/a&gt; and any studio is going to balk at betting a billion dollar franchise on a cokehead.&lt;/p&gt;  &lt;p&gt;But I digress.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://commons.wikimedia.org/wiki/File:Hollywood_sign_053004.jpg" rel="license" target="_blank"&gt;&lt;img style="width: 320px" src="http://img138.imageshack.us/img138/3341/800pxhollywoodsign05300.jpg" /&gt;&lt;/a&gt; Here’s another difference: actors are basically the most flexible labour market in the world. They go where the work is. The film shoots for 40 weeks in Siberia? Fine, no problem. Actors go where the films are.&lt;/p&gt;  &lt;p&gt;Programmers on the other hand are not nearly as flexible. Programmers are regular workers. We have families, friends, mortgages and so on. Sure we might move from St Louis to San Francisco for a job but we also might not. I think it’s safe to say that more often than not, we’re not looking to move across country. 
Hell, we’ll even turn down a job if it’s in the &lt;em&gt;wrong part of the same city&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;Imagine how far you’d get as an actor if you said “I’d love to work on your TV show but the studio is in Burbank and the commute from Redondo Beach is a bitch so I think I’ll pass.” (Knowing LA only as a place to change planes, I make no apologies for any gross errors in LA geography I may have just made.)&lt;/p&gt;  &lt;p&gt;So where actors have a handful of job markets, programmers probably have 100 or more.&lt;/p&gt;  &lt;h3&gt;So What’s In It For Me?&lt;/h3&gt;  &lt;p&gt;From the &lt;a href="http://careers.stackoverflow.com/faq"&gt;FAQ&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;If you are seeking employment, we do require a modest annual payment to file your CV. Filing your CV makes it eligible to appear in searches by hiring managers via our private search interface. This fee allows us to ensure employers that everyone they find is actively looking for a job.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Isn’t the fact that I’ve filled out a CV and ticked a box that says I’m looking for work sufficient? Apparently not.&lt;/p&gt;  &lt;p&gt;Consider &lt;a href="http://www.joelonsoftware.com/articles/FindingGreatDevelopers.html"&gt;Finding Great Developers&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;The great software developers, indeed, the best people in every field, are quite simply &lt;em&gt;never on the market.&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;So the target market seems to be those developers who &lt;em&gt;think&lt;/em&gt; they’re great developers but actually aren’t. If they were they wouldn’t be looking. I get it: everyone is better than average.&lt;/p&gt;  &lt;p&gt;Giles sums this up:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;The number one rule of the con: you can't con an honest man … Try to get something for nothing, just because Joel Spolsky said you could? 
You're going to get burned.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I should point out that I signed up in the beta. I was under no illusions however (then again, who ever thinks they are?). The chances of an employer looking in my remote backwater are next to nil but I figured at $30, at worst I was out two lunches from &lt;a href="http://www.nandos.com.au/index.php"&gt;Nando's&lt;/a&gt;.&lt;/p&gt;  &lt;h3&gt;And If I’m A Hiring Manager?&lt;/h3&gt;  &lt;p&gt;Approximately 6,500 Stackoverflow users have 1,000 reputation or more. This is an arbitrary number choice but the point is this: integration with Stackoverflow only adds value if you’ve contributed a sufficiently large number of answers to mine. Go up to 2,000 rep and you’re down to less than 3,200 users. And so on.&lt;/p&gt;  &lt;p&gt;Let’s be optimistic and say the potential audience for whom Stackoverflow will add value to their CV is 10,000. A number of these can be eliminated as being students, retired, incapable of working (eg disability or serious prolonged injury) or simply not looking for work.&lt;/p&gt;  &lt;p&gt;Joel claims:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;But Stack Overflow Careers doesn’t have to be massive. It’s not for the 5.2 million people who visit Stack Overflow; it’s for the top 25,000 developers who participate actively.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Want to know what the &lt;a href="http://stackoverflow.com/users?page=714"&gt;25,000th user&lt;/a&gt; looks like?&lt;/p&gt;  &lt;p&gt;&lt;img style="width: 500px; display: block; float: none; margin-left: auto; margin-right: auto" src="http://easycaptures.com/fs/uploaded/443/8219048195.png" /&gt; &lt;/p&gt;  &lt;p&gt;I mean no disrespect to these people but “participate actively”?&lt;/p&gt;  &lt;p&gt;Take careful note of the language too: 25,000 from 5.2 million? Hell, you’re &lt;em&gt;already&lt;/em&gt; the top half of one percent! You’re &lt;em&gt;elite&lt;/em&gt;, positively &lt;em&gt;l33t&lt;/em&gt;! 
Uh huh.&lt;/p&gt;  &lt;h3&gt;Crunching the Numbers&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://commons.wikimedia.org/wiki/File:Accountants.jpg" rel="license" target="_blank"&gt;&lt;img style="width: 320px" src="http://img44.imageshack.us/img44/9199/accountants.jpg" /&gt;&lt;/a&gt; There are at least 100 distinct geographical job markets for an employer. If you’re lucky, 10% of the pool is accessible to you either by being in the right place or willing to relocate.&lt;/p&gt;  &lt;p&gt;Of those 10%, maybe 10% have the right skills. The importance of programming languages is definitely overstated by (typically clueless) HR departments and recruiters. It’s also true that good developers can program in anything (given sufficient time) but not all languages are interchangeable in all situations. I would consider a Java Web developer to be largely interchangeable with an ASP.NET C# Web developer (in that there is sufficient crossover to enable a sufficiently speedy transition) but I wouldn’t hire a Ruby programmer to do C programming for microcontrollers and embedded devices. The transition from unmanaged (eg C/C++) to managed (eg C#/.Net) code can be steep enough.&lt;/p&gt;  &lt;p&gt;Of this reduced pool, how many have the right experience? The more experienced you get as a developer, generally the more important domain knowledge becomes. I wouldn’t hire a mobile telephony architect to design a system for market-making options on commodities futures because you’d spend 6 months explaining bid/ask, spreads, what a future is, what an option is, in-the-money, out-of-the-money, short, long, contango, volatility, Black-Scholes… the list goes on.&lt;/p&gt;  &lt;p&gt;Of the remaining few, who has the right &lt;em&gt;amount &lt;/em&gt;of experience? 
You wouldn’t hire a fresh college grad to mentor junior developers.&lt;/p&gt;  &lt;p&gt;Now that you’ve got a short list (“short” being the operative word), consider how many are available.&lt;/p&gt;  &lt;p&gt;And you haven’t even interviewed anybody yet!&lt;/p&gt;  &lt;p&gt;So if you optimistically assume that 10,000 people sign up for Careers, chances are you’re down to &lt;em&gt;fewer than five&lt;/em&gt;. Of those, how many are &lt;em&gt;seriously &lt;/em&gt;looking? They’re paying by the year so why not have your CV out there just in case?&lt;/p&gt;  &lt;p&gt;Don’t be fooled: paying to file your CV doesn’t ensure you’re seriously looking. The &lt;em&gt;only&lt;/em&gt; thing it ensures is that you’re a revenue stream.&lt;/p&gt;  &lt;h3&gt;Critical Mass&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://commons.wikimedia.org/wiki/File:Nuclear_power.JPG" rel="license" target="_blank"&gt;&lt;img style="width: 320px" src="http://img40.imageshack.us/img40/8706/nuclearpower.jpg" /&gt;&lt;/a&gt; Matching candidates to employers is &lt;em&gt;low probability&lt;/em&gt;. The number who fit the profile is probably 1 in 1,000 &lt;em&gt;or even less&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;So of the 10 to 25 thousand relevant potential candidates, some percentage will actually be looking for work. Of that percentage, a smaller percentage will pay to be seen by employers, fewer than would sign up if the service were free (for job seekers). I expect that number to be 2,000 or fewer and that number is, in my opinion, inflated by the cheap beta registration.&lt;/p&gt;  &lt;p&gt;So an employer is going to pay big bucks, much more than a typical job ad, to reach a &lt;em&gt;much smaller&lt;/em&gt; target audience?&lt;/p&gt;  &lt;p&gt;People will pay money if they are getting value for money. 
Paying $15,000 to a recruiter to find you a programmer is &lt;em&gt;cheap&lt;/em&gt; because the recruiter is doing most of the legwork &lt;em&gt;and&lt;/em&gt; assuming a large part of the risk (in that they don’t typically get paid if you don’t find someone you like). Job ads are &lt;em&gt;cheap&lt;/em&gt; because they may reach tens or even hundreds of thousands of candidates.&lt;/p&gt;  &lt;p&gt;&lt;em&gt;It’s like Careers is charging as if it’s already a proven success.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Things like this work on the principle of &lt;em&gt;critical mass&lt;/em&gt;. Take eBay. People buy on eBay because there are things to be bought. People sell on eBay because people will buy them. Without either group the site fails. A job board is no different. People go to them because they have jobs they want. Companies advertise on them because they reach the right audience.&lt;/p&gt;  &lt;p&gt;So what job board (and let’s be honest, that’s what it is) is going to survive by restricting itself to 10 to 25 thousand candidates &lt;em&gt;globally&lt;/em&gt;? Perhaps Jeff and Joel are thinking that it will be &lt;em&gt;so&lt;/em&gt; successful that everyone else will just have to sign up anyway.&lt;/p&gt;  &lt;p&gt;Good luck with that business strategy.&lt;/p&gt;  &lt;h3&gt;Is It Legal?&lt;/h3&gt;  &lt;p&gt;I have to wonder if anyone has bothered to ask this yet. Consider &lt;a href="http://www.thenational.ae/apps/pbcs.dll/article?AID=/20090725/NATIONAL/707249768" target="_blank"&gt;Job seekers are hit by illegal fees&lt;/a&gt;. The United Arab Emirates is not the only place where it is &lt;em&gt;illegal&lt;/em&gt; to charge job seekers. 
Also, &lt;a href="http://jobseekr.com.au/2009/04/27/how-job-seekers-can-best-use-recruitment-agencies/" target="_blank"&gt;How job seekers can best use recruitment agencies&lt;/a&gt; (emphasis added):&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Recruitment agencies make their money by charging employers a fee for a permanent hire or an hourly or daily margin on a temporary placement. &lt;strong&gt;&lt;em&gt;It is illegal to charge job seekers a fee for finding them work.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;That’s for Australia. The point here is that Jeff and Joel probably need to be &lt;em&gt;very&lt;/em&gt; careful about how they define the Careers site if they don’t want to run afoul of laws set up to protect the unemployed from unscrupulous practices.&lt;/p&gt;  &lt;h3&gt;Smoke and Mirrors&lt;/h3&gt;  &lt;p&gt;From the &lt;a href="http://careers.stackoverflow.com/faq"&gt;FAQ&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;If you are seeking employment, we do require a modest annual payment to file your CV. Filing your CV makes it eligible to appear in searches by hiring managers via our private search interface. 
This fee allows us to ensure employers that everyone they find is actively looking for a job.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;We’re being sold something here.&lt;/p&gt;  &lt;p&gt;Also consider &lt;a href="http://www.joelonsoftware.com/items/2009/11/05.html"&gt;Upgrade your career&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Employers can see how good you are at communicating, …&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;OK&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;… how well you explain things, …&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;OK&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;… how well you understand the tools that you’re using, …&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Er… OK.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;… and generally, if you’re a great developer or not.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Whoa. Sorry, but the fact that I know that parsing HTML with regular expressions is retarded, that I can explain how to add a jQuery click() handler and that I know not sanitizing user input to SQL statements is idiotic doesn’t make me a great developer. It means anything from I like teaching to I’m narcissistic enough to like hearing the sound of my own voice (virtually speaking), perhaps both.&lt;/p&gt;  &lt;p&gt;And let’s not forget that &lt;em&gt;all of this can be established by simply including a URL to your Stackoverflow profile on your CV&lt;/em&gt;.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;The numbers just don’t add up on this one. My only question is how long it’ll be before that sinks in and the model changes. With so much free choice, it’s just not viable to charge job seekers, severely limit the candidate pool for employers and still charge those employers an arm and a leg for information they can get from a URL.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." 
border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/bSfjU7p6ix4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/166152980014672010/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/12/joel-inc-stackoverflow-careers-and.html#comment-form" title="53 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/166152980014672010?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/166152980014672010?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/bSfjU7p6ix4/joel-inc-stackoverflow-careers-and.html" title="Joel Inc., Stackoverflow Careers and Jumping Sharks" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>53</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/12/joel-inc-stackoverflow-careers-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEIGRnc5fip7ImA9WxBTE08.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-7272264298600108129</id><published>2009-12-09T09:34:00.003+08:00</published><updated>2009-12-09T09:35:27.926+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-09T09:35:27.926+08:00</app:edited><title>Google Wave Invites to Give Away</title><content type="html">I've got a dozen or so of these I don't really need. 
&lt;a href="http://www.cforcoding.com/2009/05/contact.html"&gt;Drop me a line&lt;/a&gt; and I'll send you one, first in first served until they run out.&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/iql-Kz-TlvU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/7272264298600108129/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/12/google-wave-invites-to-give-away.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7272264298600108129?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7272264298600108129?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/iql-Kz-TlvU/google-wave-invites-to-give-away.html" title="Google Wave Invites to Give Away" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/12/google-wave-invites-to-give-away.html</feedburner:origLink></entry></feed>
