<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;CEAEQXw8fCp7ImA9WxBbEU8.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555</id><updated>2010-03-09T16:18:20.274+08:00</updated><title>C for Coding</title><subtitle type="html" /><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://www.cforcoding.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://www.cforcoding.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>64</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/CForCoding" /><feedburner:info uri="cforcoding" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;DEMBR3gzeSp7ImA9WxBVE00.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-3948045478498721299</id><published>2010-02-16T11:45:00.001+08:00</published><updated>2010-02-16T15:47:36.681+08:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2010-02-16T15:47:36.681+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="stackoverflow" /><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><title>Stackoverflow: Joel and Jeff want VC Money? Say What?</title><content type="html">&lt;p&gt;The big news today is that Stackoverflow, started by &lt;a href="http://www.joelonsoftware.com/"&gt;Joel Spolsky&lt;/a&gt; and &lt;a href="http://www.codinghorror.com/blog/"&gt;Jeff Atwood&lt;/a&gt; as a programming Q&amp;amp;A site almost 18 months ago, is &lt;a href="http://www.joelonsoftware.com/items/2010/02/14.html"&gt;now looking for VC money&lt;/a&gt;. This is huge and deeply worrying. And it raises a whole raft of questions.&lt;/p&gt;  &lt;h3&gt;Vertical Growth&lt;/h3&gt;  &lt;p&gt;Stackoverflow has grown to be probably the largest programming Q&amp;amp;A site on the internet in its short life, supplanting the “evil hyphen site”, and now sits just outside the top 1000 sites with &lt;a href="http://www.quantcast.com/stackoverflow.com"&gt;over 4.5 million visitors a month&lt;/a&gt;. While it continues to grow, there’s only so big it can get because there are only so many programmers.&lt;/p&gt;  &lt;p&gt;Joel says:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;In 18 months we’ve accomplish that: we’ve got 6 million unique visitors every month.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; this figure includes Superuser (1M) and Serverfault (730K).&lt;/p&gt;  &lt;p&gt;The issue of course is how to turn this traffic into revenue sufficient to cover the site’s running costs, development of the site and profit for its owners. Programmers are a hard group to monetize and you can see Joel and Jeff struggle with this when it comes to the usual method: advertising. 
See &lt;a href="http://blog.stackoverflow.com/2009/03/responsible-advertising-feed-a-programmer/"&gt;Responsible Advertising: Feed a Programmer&lt;/a&gt;, &lt;a href="http://blog.stackoverflow.com/2009/11/our-amazon-advertising-experiment/"&gt;Our Amazon Advertising Experiment&lt;/a&gt; and &lt;a href="http://blog.stackoverflow.com/summary-of-amazon-remnant-ad-experiment/"&gt;Summary of Amazon Remnant Ad Experiment&lt;/a&gt;.&lt;/p&gt;  &lt;h3&gt;Horizontal Growth&lt;/h3&gt;  &lt;p&gt;It’s natural for companies that exhaust opportunities in their home markets to look at other markets that are related somehow, fuelled by (sometimes justified) paranoia that if they stop growing they’ll die or simply the need for incessant growth.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.businessinsider.com/chart-of-the-day-microsoft-operating-income-by-division-2010-2"&gt;&lt;img style="width: 510px" title="Microsoft Operating Profits By Division" alt="Microsoft Operating Profits by Division" src="http://static.businessinsider.com/image/4b7337bc0000000000a10a91/chart-of-the-day-msft-operating-profit.gif" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Look at &lt;a href="http://www.businessinsider.com/chart-of-the-day-microsoft-operating-income-by-division-2010-2"&gt;where Microsoft's profits come from&lt;/a&gt; and you’ll see their core business is Windows and Office. Forays into gaming, music, online services, mobile communication, etc have varied from being lacklustre to haemorrhaging money pits.&lt;/p&gt;  &lt;p&gt;Google’s core business is search and advertising.&lt;/p&gt;  &lt;p&gt;It takes a rare combination of talent, timing and luck to successfully branch into new areas as Apple did with online music, portable music players and the iPhone.&lt;/p&gt;  &lt;p&gt;Joel gave a &lt;a href="http://blog.stackoverflow.com/2009/05/joel-talks-about-stack-overflow-at-google/"&gt;Google Tech Talk about Stackoverflow&lt;/a&gt; last May that’s instructive. 
A key point is that all software is social and that a platform that works in one community may simply not work when dropped into another.&lt;/p&gt;  &lt;p&gt;Programmers respond to the Q&amp;amp;A format of Stackoverflow because a programmer is predisposed to formulating questions, answering them and categorizing (tagging) them. What’s more, the subject matter is sufficiently objective for there to be right and wrong answers most of the time.&lt;/p&gt;  &lt;p&gt;To put it another way: programmers talking about programming are self-organizing.&lt;/p&gt;  &lt;p&gt;Some criticize the format for making discussion hard, which misses the point entirely.&lt;/p&gt;  &lt;h3&gt;Sister Sites&lt;/h3&gt;  &lt;p&gt;Joel and Jeff’s first attempts at horizontal market growth are the sister sites: &lt;a href="http://serverfault.com/"&gt;Serverfault&lt;/a&gt; (for sysadmins) and &lt;a href="http://superuser.com/"&gt;Superuser&lt;/a&gt; (for general computer questions), which Jeff calls the League of Justice. There are also loose affiliations with &lt;a href="http://www.howtogeek.com/"&gt;How-to Geek&lt;/a&gt; and &lt;a href="http://doctype.com/"&gt;Doctype&lt;/a&gt; (from the guys behind &lt;a href="http://litmusapp.com/"&gt;Litmus&lt;/a&gt;).&lt;/p&gt;  &lt;p&gt;While a million (ish) uniques per month is nothing to sneeze at, it’s clear that these sites haven’t grown like Stackoverflow has. See &lt;a href="http://www.quantcast.com/superuser.com"&gt;superuser.com&lt;/a&gt; and &lt;a href="http://www.quantcast.com/serverfault.com"&gt;serverfault.com&lt;/a&gt; (this one has started to pick up recently).&lt;/p&gt;  &lt;h3&gt;Stack Exchange&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://www.fogcreek.com/"&gt;Fog Creek&lt;/a&gt; has adapted the Stackoverflow code to create a hosted white label Q&amp;amp;A solution. 
For roughly $129/month you can have your own Q&amp;amp;A site to discuss everything from parenting issues to World of Warcraft (no joke).&lt;/p&gt;  &lt;p&gt;Such sites rely on communities and building communities takes time. Stackoverflow succeeded in part because it leveraged the existing audiences of Joel and Jeff.&lt;/p&gt;  &lt;h3&gt;Careers&lt;/h3&gt;  &lt;p&gt;This is perhaps the more controversial move and something I covered in &lt;a href="http://www.cforcoding.com/2009/12/joel-inc-stackoverflow-careers-and.html"&gt;Joel Inc., Stackoverflow Careers and Jumping Sharks&lt;/a&gt; and &lt;a href="http://www.cforcoding.com/2009/12/hard-numbers-on-stackoverflow-careers.html"&gt;Hard Numbers on Stackoverflow Careers&lt;/a&gt;. It’s something the pair have pushed repeatedly, going so far as &lt;a href="http://blog.stackoverflow.com/2010/01/careers-success-stories/"&gt;heartfelt testimonials&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;This one differs from the others in that the revenue model isn’t based on advertising: it’s based on the high cost of recruitment and the unique tie-in with Stackoverflow. My opinion is there simply aren’t enough active Stackoverflow users for this to be a real money spinner but time will tell.&lt;/p&gt;  &lt;h3&gt;Self-Funding and Control&lt;/h3&gt;  &lt;p&gt;Self-funding has huge advantages for any venture. If it’s possible it keeps control in the hands of the founders. Investors have their own agenda—being a return on that investment—which doesn’t necessarily coincide with the best long-term interests of the venture.&lt;/p&gt;  &lt;p&gt;Some argue &lt;a href="http://en.wikipedia.org/wiki/Transmeta"&gt;Transmeta&lt;/a&gt; was derailed by being forced to make a premature product launch.&lt;/p&gt;  &lt;p&gt;When you own your own venture you can do whatever you want. Well, you can’t break the law but other than that, there’s not a lot you can’t do.&lt;/p&gt;  &lt;p&gt;As soon as you have investors that changes. Investors have rights. 
Their money comes with conditions like how you can spend the company’s money, reporting requirements and so on.&lt;/p&gt;  &lt;p&gt;It gets even worse when you’re a public company and worse again when you’re a publicly listed company.&lt;/p&gt;  &lt;h3&gt;Debt and Equity&lt;/h3&gt;  &lt;p&gt;There are two basic sources of funding for a venture: debt and equity.&lt;/p&gt;  &lt;p&gt;Debt is borrowing money that you agree to repay the lender, typically at a fixed or floating rate over a given period of time. In the corporate world, there are many sources of debt: bank bills, overdrafts, commercial paper, bonds, swaps, traditional loans (secured and unsecured) and so forth. Many of these you have to be sufficiently large to have access to (eg corporate bonds are an option for the Toyotas of the world).&lt;/p&gt;  &lt;p&gt;Equity is ownership of the company. Depending on your jurisdiction there are many forms of equity: ordinary shareholders, preferential shareholders and so on. They have different rights and a different pecking order for being repaid if the company is ever wound up (and typically the debt-holders will be ahead of all of them).&lt;/p&gt;  &lt;p&gt;In between there are countless variations (eg convertible notes are a debt instrument that can be converted to equity in certain circumstances).&lt;/p&gt;  &lt;p&gt;Companies generally strive for a healthy mix of debt and equity funding options.&lt;/p&gt;  &lt;p&gt;The fallacy that many tech companies succumb to is that venture capitalists are their only source of funding. What’s more, VC funding is about the most expensive source of funding. A bank, being your typical source for a loan, will look at your plan and make a decision on your ability to repay the loan. Not your revenue but your income (being revenue minus expenses), both current and projected.&lt;/p&gt;  &lt;p&gt;VCs typically look for blue-sky potential, often in ventures that don’t even generate revenue now or in the foreseeable future. 
Still, any business plan will need to answer the questions of “when” and “how” the investors will get a return.&lt;/p&gt;  &lt;h3&gt;What Does Stackoverflow Want?&lt;/h3&gt;  &lt;p&gt;This move is surprising considering Joel wrote &lt;a href="http://www.joelonsoftware.com/articles/VC.html"&gt;Fixing Venture Capital&lt;/a&gt; and &lt;a href="http://www.joelonsoftware.com/articles/fog0000000056.html"&gt;Strategy Letter I: Ben and Jerry's vs. Amazon&lt;/a&gt;. Joel is somewhat vague on their motivations, saying only:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Now we’re biting off the bigger goal of changing the way &lt;em&gt;everyone&lt;/em&gt; gets answers to their questions on the Internet, and that’s something we can’t do alone.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The infrastructure (hardware and bandwidth) is cheap (almost free) for Q&amp;amp;A. Going by &lt;a href="http://highscalability.com/blog/2009/8/5/stack-overflow-architecture.html"&gt;Stack Overflow Architecture&lt;/a&gt; (a little outdated but those Web servers are low RAM and single CPU, which means dirt cheap) and &lt;a href="http://blog.stackoverflow.com/2010/01/stack-overflow-network-configuration/"&gt;Stack Overflow Network Configuration&lt;/a&gt;, Stackoverflow.com seems to run on three Web servers.&lt;/p&gt;  &lt;p&gt;It’s fair to say that hardware is ludicrously cheap. Plentyoffish uses &lt;a href="http://highscalability.com/plentyoffish-architecture"&gt;less than 10 servers&lt;/a&gt; for over a billion monthly page views.&lt;/p&gt;  &lt;p&gt;Is it development? Is there some grand Q&amp;amp;A idea that’s going to take 50 man-years of development time to implement? Jeff has repeatedly said that apart from tweaking around the edges, Stackoverflow as a technology platform is basically “done”.&lt;/p&gt;  &lt;p&gt;Is it to broaden the scope of Stackoverflow? What about a Wikipedia-like platform? What about the Wikipedia content? 
Is there any money in that?&lt;/p&gt;  &lt;h3&gt;Why Venture Capital?&lt;/h3&gt;  &lt;p&gt;This points to something ridiculously large scale. Otherwise:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Why wouldn’t a bank fund it (based on existing income)? &lt;/li&gt;    &lt;li&gt;Why wouldn’t Fog Creek fund it? &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The last is worth mulling over. Fog Creek has ~34 employees. Joel once said for every $10,000/month Fog Creek made he hired a programmer. Fog Creek is a private company so its profits aren’t published but it would seem reasonable to assume that their revenue is in the order of $4-10 million per annum.&lt;/p&gt;  &lt;blockquote&gt;   &lt;ol&gt;     &lt;li value="value"&gt;The business itself could benefit from the publicity of getting an investment from someone who is thought of as being a savvy investor. &lt;/li&gt;      &lt;li value="value"&gt;The investor will add substantial value to the business in advice, connections, and introductions. &lt;/li&gt;   &lt;/ol&gt; &lt;/blockquote&gt;  &lt;p&gt;But he also says:&lt;/p&gt;  &lt;blockquote&gt;   &lt;ol&gt;     &lt;li value="value"&gt;The founders are not in it for their own personal aggrandizement and are happy to give up some control to make the business more successful. &lt;/li&gt;   &lt;/ol&gt; &lt;/blockquote&gt;  &lt;p&gt;Interesting. Could it be as simple as wanting to cash out?&lt;/p&gt;  &lt;p&gt;I suspect (3) and (4) are more what it’s about but without knowing what they want to do it’s largely impossible to figure out the why.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;It’s hard not to be concerned by this. The evil hyphen site became evil when they tried to take what was free content and monetize it using a subscription model. 
I don’t believe this is a likely outcome here but when you give up control, it’s a question of what your investors believe is the path to profitability that matters.&lt;/p&gt;  &lt;p&gt;Many businesses fail because they try to apply something that worked one place to another area where it simply doesn’t work. I would hate to see this happen to Stackoverflow as I’m personally a big fan of the site.&lt;/p&gt;  &lt;p&gt;There’s something to be said for leaving something that works well enough alone and turning your attention to building something else. Not everyone can or should be Microsoft or Google. Trying to be is typically a surefire way of converting success into failure.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Update:&lt;/em&gt;&lt;/strong&gt; I misspoke regarding the Stackoverflow Web server configuration. Fixed.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-3948045478498721299?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/AvgdWKMvdqJ01NXJYwRiKUooAiU/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/AvgdWKMvdqJ01NXJYwRiKUooAiU/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/AvgdWKMvdqJ01NXJYwRiKUooAiU/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/AvgdWKMvdqJ01NXJYwRiKUooAiU/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/g5VyzCcC2-4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/3948045478498721299/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/02/stackoverflow-joel-and-jeff-want-vc.html#comment-form" title="14 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3948045478498721299?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3948045478498721299?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/g5VyzCcC2-4/stackoverflow-joel-and-jeff-want-vc.html" title="Stackoverflow: Joel and Jeff want VC Money? Say What?" 
/><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">14</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/02/stackoverflow-joel-and-jeff-want-vc.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0cBQHs8fip7ImA9WxBWGEo.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-1738664673136658575</id><published>2010-02-11T17:04:00.001+08:00</published><updated>2010-02-11T17:04:11.576+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-11T17:04:11.576+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Markdown, Inline Parsing and Badly Formed HTML</title><content type="html">&lt;p&gt;I haven’t had much time to work on my Markdown parser lately (sadly) but I thought it was worth posting an update on where I’m at. I have been digging deep into the dark depths of inline parsing. I have &lt;a href="http://www.cforcoding.com/2010/02/markdown-block-parsing-and-road-to-hell.html"&gt;previously discussed the two modes of parsing Markdown&lt;/a&gt;, which I call block and inline.&lt;/p&gt;  &lt;p&gt;But the block parsing is done (well, I have to go back and tweak &lt;em&gt;one&lt;/em&gt; thing) so I’m onto the murky world of inline Markdown parsing.&lt;/p&gt;  &lt;h3&gt;Parsing Block Markup&lt;/h3&gt;  &lt;p&gt;Various Markdown implementations allow you to create markup blocks. There are usually quite strict requirements about how you can write these blocks. 
For example, you might need to put the start and end tags on separate lines such as:&lt;/p&gt;  &lt;pre class="brush:plain"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;one&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;two&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;  &lt;/pre&gt;

&lt;p&gt;I have a much more forgiving approach to this. My parser will take this “Markdown”:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;This is a paragraph with a &amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;nested&amp;lt;/li&amp;gt;&amp;lt;/li&amp;gt;block&amp;lt;/li&amp;gt; with
some &amp;lt;hr&amp;gt;random&amp;lt;h2&amp;gt;other tags&amp;lt;/h2&amp;gt; in it&lt;/pre&gt;

&lt;p&gt;and convert it to:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;p&amp;gt;This is a paragraph&amp;lt;/p&amp;gt;

&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;nested&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;block&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;

&amp;lt;p&amp;gt;with some&amp;lt;/p&amp;gt;

&amp;lt;hr&amp;gt;

&amp;lt;p&amp;gt;random&amp;lt;/p&amp;gt;

&amp;lt;h2&amp;gt;other tags&amp;lt;/h2&amp;gt;

&amp;lt;p&amp;gt;in it&amp;lt;/p&amp;gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;This part already works.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But it gets better. It will also take that same input stream and convert it to:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;This is a paragraph

- nested
- block

with some

------

random

## other tags ##

in it&lt;/pre&gt;

&lt;p&gt;So to be clear: this will convert acceptable markup to Markdown and filter out unacceptable markup (like script tags).&lt;/p&gt;

&lt;p&gt;This will include parsing links and images into Markdown references.&lt;/p&gt;
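
&lt;p&gt;As a rough illustration of the HTML-to-Markdown direction, here is a minimal Java sketch. It is not the parser’s actual code: the class BlockToMarkdown, the method render and the particular whitelist of tags are assumptions made for the example. Anything that isn’t on the list (a script tag, say) is dropped along with its content.&lt;/p&gt;

```java
class BlockToMarkdown {
    // Renders one already-parsed block as Markdown.
    //   tag:  lower-case element name, e.g. "p", "li", "hr", "h2"
    //   text: the block's inline content (ignored for "hr")
    // Returns "" for any tag that is not on the (assumed) whitelist.
    static String render(String tag, String text) {
        switch (tag) {
            case "p":
                return text + "\n\n";
            case "li":
                return "- " + text + "\n";
            case "hr":
                return "------\n\n";
            case "h2":
                return "## " + text + " ##\n\n";
            default:
                return ""; // e.g. "script": filtered out entirely
        }
    }
}
```

&lt;p&gt;Calling render("h2", "other tags") gives the “## other tags ##” line from the output above, and render("script", "alert(1)") gives an empty string.&lt;/p&gt;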

&lt;h3&gt;Parsing Inline Markup&lt;/h3&gt;

&lt;p&gt;This is what I’m working on now. I’m still looking for a good generic way of doing this that correctly captures tag hierarchies (eg list items must be children of unordered or ordered lists). What I’m probably going to do is release a messy version of the code (being the current version) then go back and revisit it once I have a working implementation.&lt;/p&gt;

&lt;p&gt;This is a good general principle: it’s far easier to fix something that’s complete and working than it is to constantly strive for perfection in incomplete code (basically &lt;a href="http://www.folklore.org/StoryView.py?story=Real_Artists_Ship.txt"&gt;real artists ship&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;One thing I’m debating is whether I require tags to be balanced. That means whether I accept this:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;b&amp;gt;this is&amp;lt;i&amp;gt;a&amp;lt;/b&amp;gt; test&amp;lt;/i&amp;gt;&lt;/pre&gt;

&lt;p&gt;Ideally I’d like to &lt;em&gt;not&lt;/em&gt; accept this. XML/XHTML requires tags to be properly nested. HTML either doesn’t or, even if it does, most browsers are quite forgiving about it. XML treats markup essentially as a document tree, whereas the HTML view is closer to treating tags, in certain circumstances, as switches that turn behaviour on or off.&lt;/p&gt;
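
&lt;p&gt;Whichever way I go, detecting the problem is the easy part. Here is a minimal sketch using a stack, assuming the tags have already been pulled out of the text as bare names (b, i, /b, /i). The class TagNesting and the method isProperlyNested are names made up for this example, and void elements such as hr are deliberately ignored.&lt;/p&gt;

```java
class TagNesting {
    // tags: element names in document order, with a leading "/" on closing
    // tags, e.g. "b", "i", "/b", "/i".
    // Returns true only if every closing tag matches the most recently
    // opened tag that is still open, and nothing is left open at the end.
    static boolean isProperlyNested(String... tags) {
        String[] open = new String[tags.length]; // array used as a stack
        int depth = 0;
        for (String tag : tags) {
            if (!tag.startsWith("/")) {
                open[depth++] = tag;             // push an opening tag
            } else if (depth == 0 || !open[--depth].equals(tag.substring(1))) {
                return false;                    // stray or crossed closing tag
            }
        }
        return depth == 0;
    }
}
```

&lt;p&gt;For the example above, isProperlyNested("b", "i", "/b", "/i") returns false, while the reordered isProperlyNested("b", "i", "/i", "/b") returns true.&lt;/p&gt;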

&lt;h3&gt;Markdown Formatting&lt;/h3&gt;

&lt;p&gt;I went into this problem thinking I could turn&lt;/p&gt;

&lt;pre class="brush:plain"&gt;***this is a* test**&lt;/pre&gt;

&lt;p&gt;into&lt;/p&gt;

&lt;pre class="brush:plain"&gt;document
+- bold
   +- italic
   |  +- text: this is a 
   +- text: test&lt;/pre&gt;

&lt;p&gt;but that idea quickly falls down when you consider that this is valid Markdown:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;**this is *a** test*&lt;/pre&gt;

&lt;p&gt;which basically parses to:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;BOLD_ON
TEXT(&amp;quot;this is &amp;quot;)
ITALIC_ON
TEXT(&amp;quot;a&amp;quot;)
BOLD_OFF
TEXT(&amp;quot; test&amp;quot;)
ITALIC_OFF&lt;/pre&gt;
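
&lt;p&gt;To make that token stream concrete, here is a minimal Java sketch of a tokenizer that produces it. It is not the parser’s real code: the class InlineTokens and its methods are hypothetical, it only understands * and **, and it naively treats every delimiter as a toggle, so it does nothing sensible with a delimiter that has no partner.&lt;/p&gt;

```java
class InlineTokens {
    // Scans src left to right and returns one token per line:
    // BOLD_ON / BOLD_OFF for "**", ITALIC_ON / ITALIC_OFF for "*",
    // and TEXT("...") for everything in between.
    static String tokenize(String src) {
        StringBuilder out = new StringBuilder();
        StringBuilder text = new StringBuilder();
        boolean bold = false;
        boolean italic = false;
        int i = 0;
        while (i != src.length()) {
            char c = src.charAt(i);
            if (c != '*') {
                text.append(c);   // ordinary character: keep accumulating
                i++;
                continue;
            }
            flush(text, out);     // emit pending text before the marker
            if (src.startsWith("**", i)) {
                bold = !bold;
                out.append(bold ? "BOLD_ON" : "BOLD_OFF").append('\n');
                i += 2;
            } else {
                italic = !italic;
                out.append(italic ? "ITALIC_ON" : "ITALIC_OFF").append('\n');
                i++;
            }
        }
        flush(text, out);
        return out.toString();
    }

    // Emits the accumulated text (if any) as a TEXT token and clears it.
    static void flush(StringBuilder text, StringBuilder out) {
        if (text.length() != 0) {
            out.append("TEXT(\"").append(text).append("\")\n");
            text.setLength(0);
        }
    }
}
```

&lt;p&gt;Because the output is a flat list of switches rather than a tree, the crossed bold and italic ranges are no problem at this stage. The trouble only starts when something downstream wants a hierarchy.&lt;/p&gt;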

&lt;p&gt;Almost any Markdown parser will turn that input into HTML like this:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;strong&amp;gt;this is &amp;lt;em&amp;gt;a&amp;lt;/strong&amp;gt; test&amp;lt;/em&amp;gt;&lt;/pre&gt;

&lt;p&gt;That’s unfortunate because I like the document tree. Sadly, dropping the tree doesn’t make the matching problem go away: a special sequence that doesn’t have a matching close has to be put into the document as a literal sequence.&lt;/p&gt;

&lt;p&gt;This leads to some fairly pathological corner cases like:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;*this [link* google][1]

  [1]: http://google.com&lt;/pre&gt;

&lt;p&gt;which will translate to&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;em&amp;gt;this &amp;lt;a href=&amp;quot;http://google.com&amp;quot;&amp;gt;link&amp;lt;/em&amp;gt; google&amp;lt;/a&amp;gt;&lt;/pre&gt;

&lt;p&gt;and browsers will tend to break that link into two parts (where you can click on “link” or “google”).&lt;/p&gt;

&lt;p&gt;But work progresses.&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Even with a lot of the Markdown spec being parsed, my transformation time on basic documents (eg a couple of lists, a block quote and some paragraphs) is still under 60 microseconds (roughly), and that’s with some messy array manipulation and temporary object creation that I plan to revisit and clean up.&lt;/p&gt;

&lt;p&gt;At this stage I’m hoping to have something committed and available for comment within two weeks. It won’t be pretty but my goal is to get feedback earlier rather than later.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-1738664673136658575?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/1CyFTPQ52yyIcprpBFHXV9YS2Lo/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/1CyFTPQ52yyIcprpBFHXV9YS2Lo/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/1CyFTPQ52yyIcprpBFHXV9YS2Lo/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/1CyFTPQ52yyIcprpBFHXV9YS2Lo/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/belD0NvzMXs" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/1738664673136658575/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/02/markdown-inline-parsing-and-badly.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1738664673136658575?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1738664673136658575?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/belD0NvzMXs/markdown-inline-parsing-and-badly.html" title="Markdown, Inline Parsing and Badly Formed HTML" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/02/markdown-inline-parsing-and-badly.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0YAQ3w6fyp7ImA9WxBWE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-2778215083728630318</id><published>2010-02-05T05:32:00.001+08:00</published><updated>2010-02-05T05:32:22.217+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-05T05:32:22.217+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" 
/><title>Standing on the Outside</title><content type="html">&lt;p&gt;This week I read &lt;a href="http://codebetter.com/blogs/kyle.baley/archive/2010/02/02/life-outside-net-or-how-to-check-out-your-neighbours.aspx"&gt;Life outside .NET, or “How to check out your neighbours”&lt;/a&gt;. I really like posts like this. They’re instructive about the culture of a particular community.&lt;/p&gt;  &lt;p&gt;For over a decade I’ve been a Java developer (since JDK 1.0.2). Like most Java developers I have a love-hate relationship with the language, the libraries and Sun. Java didn’t invent the virtual machine but it certainly popularized it. 5-10 years ago (in particular) Java was a hotbed for the development of many technologies, concepts and frameworks.&lt;/p&gt;  &lt;p&gt;As the author notes, MVC and DI (dependency injection) are simply assumed in Javaland. It’s true. Good luck finding a non-MVC Web framework in Java out of the dozens that exist.&lt;/p&gt;  &lt;p&gt;My experience and exposure with .NET is at best peripheral. ASP.NET always struck me as somewhat &lt;em&gt;primitive &lt;/em&gt;in the sense that it’s what would’ve happened had JSP been taken to the nth-degree instead of being supplanted by Struts and all that came after. That’s not to say ASP.NET is bad or doesn’t do its job but to a Java developer it seems somehow &lt;em&gt;crude&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;Beyond the boring and irrelevant comparisons of Java vs. .NET performance, the more interesting comparison is as a proxy for decentralized vs. centralized platform progression.&lt;/p&gt;  &lt;p&gt;The &lt;em&gt;Microsoft Way&lt;/em&gt; definitely has its advantages. Where once Redmond was playing catch-up on Java (technically speaking), Sun’s inability to lead (and no clue where they were going if they could) has left Java largely stagnant. Java 7 is due at the end of the year but has been delayed &lt;em&gt;years&lt;/em&gt;. 
Thankfully it’s now getting closures if for no other reason than we can all stop bitching about it (frankly, I think some form of function pointers or delegates in “C#-speak” will be sufficient for 99% of use cases).&lt;/p&gt;  &lt;p&gt;It can be useful not to have a diaspora of Web development frameworks (even at the cost of innovation). Take a Struts developer and put them on a Wicket or Tapestry project and their experience won’t be especially applicable.&lt;/p&gt;  &lt;p&gt;It will certainly be interesting to see if Oracle can provide more leadership than Sun. Oracle was always heavily invested in Java so I’m hoping Java isn’t simply collateral damage to Larry’s acquisition of Sun’s server business. Bizarrely Oracle seems committed to JavaFX of all things.&lt;/p&gt;  &lt;p&gt;For those of you unfamiliar with it, JavaFX is Sun’s “me too” Flash alternative and a prime example of Sun’s boondoggles of recent years.&lt;/p&gt;  &lt;p&gt;I for one welcome our new insect &lt;a href="http://knowyourmeme.com/memes/i-for-one-welcome-our-new-overlords"&gt;overlords&lt;/a&gt;. I’d like to remind them that as a trusted blogger, I can be helpful in rounding up others to toil in their underground sugar caves.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-2778215083728630318?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/7rfUHbOk2fy_KsUYE_8pNnag89E/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/7rfUHbOk2fy_KsUYE_8pNnag89E/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/7rfUHbOk2fy_KsUYE_8pNnag89E/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/7rfUHbOk2fy_KsUYE_8pNnag89E/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/AUnT4UWQb9w" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/2778215083728630318/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/02/standing-on-outside.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/2778215083728630318?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/2778215083728630318?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/AUnT4UWQb9w/standing-on-outside.html" title="Standing on the Outside" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/02/standing-on-outside.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEIERnw6cSp7ImA9WxBWEEU.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-6936739971799482067</id><published>2010-02-02T12:53:00.000+08:00</published><updated>2010-02-02T12:55:07.219+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-02-02T12:55:07.219+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" 
/><title>Markdown, Block Parsing and the Road to Hell</title><content type="html">&lt;p&gt;I thought it time to update my status on this particular undertaking, which so far has ended up being far more massive than originally envisioned.&lt;/p&gt;  &lt;p&gt;The overall design of the Markdown parser is that there are two parsers… &lt;em&gt;kinda&lt;/em&gt;. There is a parser to break your document into blocks and another to interpret the inline content within those blocks. As soon as I made this realization, everything just got a whole lot easier.&lt;/p&gt;  &lt;p&gt;I use the term “block” (and “inline”) because those are the terms HTML uses (“block elements” and “inline elements”). Of course HTML also gets more complex (e.g. “replaced” vs “non-replaced” elements and inline-block, floats, etc.) but fundamentally you can think of a Markdown document (or any hypertext document) as consisting of block and inline elements.&lt;/p&gt;  &lt;p&gt;Markdown parsers will often talk about “blocks” and “spans” instead.&lt;/p&gt;  &lt;h3&gt;Block Parsing&lt;/h3&gt;  &lt;p&gt;The first level of parsing of Markdown is into blocks.&lt;/p&gt;  &lt;p&gt;Such a document can be viewed as a tree. The root node is the document. Every node below that is either a block or an inline node. The tree can be arbitrarily deep and there are certain rules about relationships in that tree. For instance:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Block nodes are only ever children of other block nodes (counting the root Document node as a block node); &lt;/li&gt;    &lt;li&gt;Paragraphs can only contain inline elements; &lt;/li&gt;    &lt;li&gt;List items must be children of lists; &lt;/li&gt;    &lt;li&gt;and so on. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The goal of any parser is to take an input and build a &lt;em&gt;valid syntax tree&lt;/em&gt; based on the rules defined.&lt;/p&gt;  &lt;p&gt;This part of the problem for what I’m writing is now done. 
This includes code blocks, paragraphs, block quotes, ordered and unordered lists, headers and horizontal rules. Tables I plan to return to later.&lt;/p&gt;  &lt;h3&gt;List Parsing&lt;/h3&gt;  &lt;p&gt;Today I came across &lt;a href="http://blog.stackoverflow.com/2008/06/three-markdown-gotcha/"&gt;Three Markdown Gotchas&lt;/a&gt;, which I hadn’t seen before but it opened my eyes to one particular area of difficulty I had: list processing. Go to StackOverflow, ask a question and type in:&lt;/p&gt;  &lt;pre class="brush:plain"&gt;- one
 - two
  - three
   - four&lt;/pre&gt;

&lt;p&gt;and you probably won’t get what you expect. You get this:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;one
    &amp;lt;ul&amp;gt;
      &amp;lt;li&amp;gt;two&amp;lt;/li&amp;gt;
      &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
      &amp;lt;li&amp;gt;four&amp;lt;/li&amp;gt;
    &amp;lt;/ul&amp;gt;
  &amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;Let me give you some background: Markdown has the concept of &lt;em&gt;indents&lt;/em&gt;. Based on a predefined tab width (typically 4), a single tab or 4 spaces represents one indent. That’s important because code lines are preceded by one indent. A &lt;em&gt;non-indent space&lt;/em&gt; is sometimes ignored at the beginning of a line, for example at the start of a paragraph line or the continuation of an existing one.&lt;/p&gt;
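
&lt;p&gt;To make the indent arithmetic concrete, here is a minimal Python sketch. It is not taken from any Markdown implementation: &lt;code&gt;TAB_WIDTH&lt;/code&gt; and &lt;code&gt;split_indent&lt;/code&gt; are names made up for illustration, and it assumes a leading tab always completes one indent.&lt;/p&gt;

```python
TAB_WIDTH = 4  # assumed tab width; the post calls 4 the typical value


def split_indent(line):
    """Return (indents, stray_spaces, rest) for one line of Markdown."""
    indents = 0
    spaces = 0
    pos = 0
    for ch in line:
        if ch == "\t":
            # Simplifying assumption: a tab completes the current indent.
            indents += 1
            spaces = 0
        elif ch == " ":
            spaces += 1
            if spaces == TAB_WIDTH:
                indents += 1
                spaces = 0
        else:
            break
        pos += 1
    return indents, spaces, line[pos:]


print(split_indent("    code"))   # one indent, no stray spaces
print(split_indent("   - four"))  # no indent, three non-indent spaces
```

&lt;p&gt;Under this definition a line with one to three leading spaces has no indent at all, only non-indent spaces.&lt;/p&gt;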

&lt;p&gt;The original Markdown “spec” says that nesting list items is done by preceding the line with one more indent than the previous line. In vanilla Markdown the above sequence would come out as:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;one&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;two&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;four&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;because none of the lines has a leading indent. That’s logical and consistent. Jeff’s point is basically that even one space should indicate intent and be interpreted as nesting. Sounds reasonable, right? Maybe. The problem is that it leads to unintended complexity.&lt;/p&gt;

&lt;p&gt;Go back to the above example and put one, two then three spaces in front of the first list item. Watch the preview pane to see how the list changes. The implied nesting changes all over the place. Logical? I think not.&lt;/p&gt;

&lt;p&gt;But it gets worse.&lt;/p&gt;

&lt;pre class="brush:plain"&gt;- one

 two
 - three

 four&lt;/pre&gt;

&lt;p&gt;comes out as&lt;/p&gt;

&lt;pre class="brush:plain"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;
    &amp;lt;p&amp;gt;one&amp;lt;/p&amp;gt;
    &amp;lt;p&amp;gt;two&amp;lt;/p&amp;gt;
    &amp;lt;ul&amp;gt;
      &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
    &amp;lt;/ul&amp;gt;
    &amp;lt;p&amp;gt;four&amp;lt;/p&amp;gt;
  &amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;Okay&lt;/em&gt;… bear in mind that there are spaces before two and four so that you continue the list item. Otherwise they would be interpreted as separate paragraphs. But what if you want four to continue the nested list item three? How much indentation do you need? It turns out that the magical number is anything from 5 to 11.&lt;/p&gt;

&lt;p&gt;But it gets worse. Put one space before one and suddenly one and three are the same list so four is now indented so far that it becomes a code block run-on from three. Add a second space to the front of one and for some reason it returns to the original nesting even though one is now indented more than three. Huh?&lt;/p&gt;

&lt;p&gt;I’ll leave an examination of the MarkdownSharp source code for the reasons behind this as an exercise for the reader. Suffice it to say that it all stems from the notion that one (more) space indicating nesting is somehow more intuitive.&lt;/p&gt;

&lt;h3&gt;The Road to Hell&lt;/h3&gt;

&lt;p&gt;The road to hell is paved with good intentions. It’s one of my favourite sayings. We programmers as a whole are unreasonable people. Through a combination of hubris, stubbornness and even laziness we have a tendency to throw out what’s been done before or simply make breaking changes because we prefer it, we think others will prefer it, we don’t appreciate that someone else may have to deal with the consequences or simply out of ignorance as to what led to the original changes.&lt;/p&gt;

&lt;p&gt;We all do this, myself included. It’s worst when it not only manifests itself in company culture but is &lt;em&gt;enshrined&lt;/em&gt; in it. Take Microsoft as a prime example. Internet Explorer has “Favourites”. What the hell are favourites? Well, they’re bookmarks. But IE can’t call them that because Netscape called them that first and Microsoft wanted to differentiate themselves and their products. This of course led to many conversations I know I had at the time that went something like this:&lt;/p&gt;

&lt;p&gt;New user: What’s a favourite?
  &lt;br /&gt;Me: It’s a bookmark.&lt;/p&gt;

&lt;p&gt;I couldn’t help but laugh out loud when I first read C# and saw all the things copied from Java had been renamed, sometimes with significantly worse names. Java’s &lt;code&gt;final&lt;/code&gt; as C#’s &lt;code&gt;sealed&lt;/code&gt; springs to mind. You can just tell that there were people dedicated to the task of finding new names for Java concepts and keywords. &lt;em&gt;It’s just sad&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Hyperbole aside, I digress.&lt;/p&gt;

&lt;p&gt;The point of all this is that:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Often things that came before you were done for a reason, whether or not you’re aware of it and whether or not you agree with it if you are;&lt;/li&gt;

  &lt;li&gt;Breaking changes have a high price, so much so that the cure is often far worse than the disease, and your delicate sensibilities be damned. Internal consistency and syntactic purity are overrated. Interestingly those overly encumbered with such sensibilities seem to have a disproportionate tendency to become Python programmers.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;List Sanity&lt;/h3&gt;

&lt;p&gt;For this reason my parser has returned to what is probably the original implementation. That is:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;A leading non-indent space is ignored before list items. That is, it implies no meaning and is discarded so there is no difference between 0 and 2 leading spaces before a list item; &lt;/li&gt;

  &lt;li&gt;Up to one leading indent (meaning one tab or 0 to 4 spaces) is consumed from each subsequent line until a new list item is hit or a line with no leading spaces is met. The subsequent list item will be a part of the same list. Text with no leading spaces will end the list and form a new paragraph; and &lt;/li&gt;

  &lt;li&gt;All lines that continue the list item are combined (with their leading tab or 0 to 4 spaces consumed) and they form a new &lt;em&gt;block context&lt;/em&gt;. That is, they are then parsed as if they were a separate input, so the item can contain new lists, block quotes, code segments and so on. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;(3) provides a lot of consistency. It means that if you have a list item followed by a line with two indents, that second line will be a code block (the first indent marks a continued list item, the second is interpreted as a code block within the list item’s block context).&lt;/p&gt;
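
&lt;p&gt;The three rules can be sketched roughly as follows. This is an illustration in Python, not the code of the parser described in this post: &lt;code&gt;parse_blocks&lt;/code&gt;, &lt;code&gt;strip_one_indent&lt;/code&gt; and the tuple-based tree are all invented for the example, and headers, block quotes and horizontal rules are left out entirely.&lt;/p&gt;

```python
import re

TAB_WIDTH = 4                 # assumed tab width
INDENT = (" " * TAB_WIDTH, "\t")
# Rule 1: up to three non-indent spaces before a list marker are ignored.
MARKER = re.compile(r" {0,3}(?:[*+-]|\d+\.)[ \t]+")


def strip_one_indent(line):
    """Rule 2: consume at most one indent (a tab, or up to TAB_WIDTH spaces)."""
    if line.startswith("\t"):
        return line[1:]
    spaces = len(line) - len(line.lstrip(" "))
    return line[min(spaces, TAB_WIDTH):]


def parse_blocks(lines):
    """Return a list of ("list", items), ("code", text) or ("para", text)."""
    blocks, i, n = [], 0, len(lines)
    while i != n:
        line = lines[i]
        if not line.strip():                      # blank line: skip
            i += 1
        elif line.startswith(INDENT):             # one full indent: code block
            start = i
            while i != n and lines[i].startswith(INDENT):
                i += 1
            code = [strip_one_indent(ln) for ln in lines[start:i]]
            blocks.append(("code", "\n".join(code)))
        elif MARKER.match(line):                  # a run of list items
            items = []
            while i != n and MARKER.match(lines[i]):
                body = [lines[i][MARKER.match(lines[i]).end():]]
                i += 1
                # Rule 2: continue until a new item or an unindented line.
                while i != n and not MARKER.match(lines[i]) and (
                        not lines[i].strip() or lines[i][0] in " \t"):
                    body.append(strip_one_indent(lines[i]))
                    i += 1
                # Rule 3: the item body is parsed as its own block context.
                items.append(parse_blocks(body))
            blocks.append(("list", items))
        else:                                     # plain paragraph
            start = i
            while (i != n and lines[i].strip()
                   and not lines[i].startswith(INDENT)
                   and not MARKER.match(lines[i])):
                i += 1
            text = " ".join(ln.strip() for ln in lines[start:i])
            blocks.append(("para", text))
    return blocks


# One, two and three leading spaces are not an indent, so the list stays flat.
print(parse_blocks(["- one", " - two", "  - three", "   - four"]))
# Two indents after an item: the first continues the item, the second is code.
print(parse_blocks(["- one", "        code"]))
```

&lt;p&gt;Because the item body is handed back to the same function, nesting falls out of the recursion rather than needing its own rules.&lt;/p&gt;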

&lt;p&gt;To me this is supremely more logical—and easier to implement—but I guess if you’re really attached to nesting list items with a single space and figuring out that 5 to 11 spaces is the magical number of spaces to continue a nested list item then you’ll hate it. Too bad.&lt;/p&gt;

&lt;p&gt;The nested block context from (3) has one exception. If the nested block context would result in a single paragraph then that paragraph is unwrapped into inline content of the list item. This has one important effect, which some may consider a breaking change. Namely this Markdown:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;- one
- two
- three&lt;/pre&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;pre class="brush:plain"&gt;- one

- two

- three&lt;/pre&gt;

&lt;p&gt;will both be interpreted as being:&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;one&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;two&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;whereas MarkdownSharp will interpret the latter as:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;one&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;two&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;three&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;which is something &lt;a href="http://www.cforcoding.com/2010/01/markdown-musings-on-unintended.html"&gt;I've previously documented and disagreed with&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But this could be interpreted as a breaking change so I will probably add a special case for just this scenario as an option.&lt;/p&gt;
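
&lt;p&gt;One possible shape for that unwrapping step, again as a hedged Python sketch rather than the real code. The function name, the &lt;code&gt;loose&lt;/code&gt; flag (true when blank lines separated the items) and the &lt;code&gt;keep_loose_paragraphs&lt;/code&gt; option are all hypothetical:&lt;/p&gt;

```python
def unwrap_item(blocks, loose=False, keep_loose_paragraphs=False):
    """Collapse an item that holds a single paragraph into inline text.

    `blocks` is a list of (kind, text) tuples for one list item.
    `keep_loose_paragraphs` is a hypothetical compatibility option that
    keeps the paragraph wrapper for blank-line-separated ("loose") items.
    """
    single = len(blocks) == 1 and blocks[0][0] == "para"
    if single and not (loose and keep_loose_paragraphs):
        return blocks[0][1]   # inline content, no paragraph wrapper
    return blocks             # several blocks: leave the structure alone


print(unwrap_item([("para", "one")]))
print(unwrap_item([("para", "one")], loose=True, keep_loose_paragraphs=True))
print(unwrap_item([("para", "one"), ("para", "two")]))
```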

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;The block parsing portion is done. The code is ugly and needs to be refactored (again) but it works. I still have an issue with too many temporary objects being created (mainly because it simplified some code) and I’ll need to go back and eliminate that.&lt;/p&gt;

&lt;p&gt;What’s been interesting is that I’ve now rewritten the block parsing at least four times before it felt right. John Carmack once said he needs to write something five or six times before he gets it right. I agree with his sentiment. It takes that long to truly understand the domain, in my opinion.&lt;/p&gt;

&lt;p&gt;The inline parsing has been a completely different set of problems. I will have a follow-up post on that soon.&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/6936739971799482067/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/02/markdown-block-parsing-and-road-to-hell.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/6936739971799482067?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/6936739971799482067?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/oVpahk4nXXQ/markdown-block-parsing-and-road-to-hell.html" title="Markdown, Block Parsing and the Road to Hell" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/02/markdown-block-parsing-and-road-to-hell.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUEMRnY6eCp7ImA9WxBXFkQ.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-8527414037634384027</id><published>2010-01-28T23:48:00.001+08:00</published><updated>2010-01-28T23:48:07.810+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-28T23:48:07.810+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" 
/><title>Java IDEs: the Blue Heeler, the Dachshund and the Labradoodle</title><content type="html">&lt;p&gt;I’ve had a frustrating week. I’m on a mission to find out why a piece of code I wrote had a “blowout” in execution time (“blowout” here means 60 microseconds instead of 15 in sustained usage just to keep things in perspective). I suspect it’s to do with temporary objects: either auto-boxing/unboxing, temporary arrays or both.&lt;/p&gt;  &lt;p&gt;Java, in my opinion, has the best IDEs of any language or platform bar none. Say what you want about the language but the IDEs are, on the whole, first rate. That doesn’t mean there aren’t bumps along the road however.&lt;/p&gt;  &lt;p&gt;For the purposes of this completely biased rant I shall liken them to dog breeds.&lt;/p&gt;  &lt;h3&gt;Blue Heeler&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:ACD-blue-spud.jpg" rel="license"&gt;&lt;img style="width: 250px" src="http://img32.imageshack.us/img32/8514/blueheeler.jpg" /&gt;&lt;/a&gt;The Blue Heeler is one kind of &lt;a href="http://en.wikipedia.org/wiki/Australian_Cattle_Dog"&gt;Australian cattle dog&lt;/a&gt;. It’s used on sheep farms and cattle stations to round up livestock. It’s not the prettiest of breeds.&lt;/p&gt;  &lt;p&gt;So you won’t see these as family pets or in trendy dog parks or in your neighbourhood. But they’re smart, obedient, protective and hard-working. If you’re herding cattle, you won’t see much else. The Blue Heeler is a &lt;em&gt;working dog&lt;/em&gt; and a victory for utilitarianism.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.jetbrains.com/idea/"&gt;IntelliJ IDEA&lt;/a&gt; is the Blue Heeler of the Java IDE world.&lt;/p&gt;  &lt;p&gt;Ever hear anyone rave about &lt;a href="http://www.jetbrains.com/resharper/index.html"&gt;Resharper&lt;/a&gt; when talking about Visual Studio? Or even go so far as to say that Resharper is what makes Visual Studio good? 
Well, Resharper adds to Visual Studio the functionality that IntelliJ has for Java.&lt;/p&gt;  &lt;p&gt;Yet all is not perfect in IntelliJ-land. The biggest problem is plugins. You certainly don’t have the range that, say, Eclipse does. But nor do you have the “plugin hell” woes so often associated with Eclipse either. Open source frameworks will tend to release plugins for Eclipse and it’s up to third parties to make IntelliJ versions, which doesn’t always happen.&lt;/p&gt;  &lt;p&gt;On the bright side, you don’t actually need that many plugins because nearly everything you need is done out of the box anyway.&lt;/p&gt;  &lt;p&gt;What makes it worse is that &lt;strong&gt;&lt;em&gt;&lt;a href="http://www.jetbrains.com/"&gt;Jetbrains&lt;/a&gt; keeps breaking all the plugins&lt;/em&gt;&lt;/strong&gt;. IntelliJ 9 is relatively new but it once again broke all the plugins. I’ve lost count of the number of times a major version has done this. Seriously, can’t you guys make the plug-in architecture remotely backwards compatible? Is that breaking change you’re making &lt;em&gt;really&lt;/em&gt; necessary? Really?&lt;/p&gt;  &lt;p&gt;The &lt;a href="http://jetty.codehaus.org/jetty/"&gt;Jetty&lt;/a&gt; plugin &lt;em&gt;still&lt;/em&gt; doesn’t work, which is a reasonably big deal. Worse, I can’t find a profiler that works in IntelliJ to save my life, except possibly &lt;a href="http://www.ej-technologies.com/products/jprofiler/overview.html"&gt;JProfiler&lt;/a&gt; but who can justify &lt;a href="http://www.ej-technologies.com/buy/jprofiler/single"&gt;$499&lt;/a&gt; for a fixed single license of a profiler? 
Especially considering the same thing for the whole rest of the IDE is &lt;a href="http://www.jetbrains.com/idea/buy/index.jsp"&gt;$249&lt;/a&gt; (if you don’t want to use the &lt;a href="http://www.jetbrains.com/idea/free_java_ide.html"&gt;free version&lt;/a&gt;).&lt;/p&gt;  &lt;h3&gt;Dachshund&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Short-haired-Dachshund.jpg" rel="license"&gt;&lt;img style="width: 250px" src="http://img710.imageshack.us/img710/9428/dachshund.jpg" /&gt;&lt;/a&gt;The &lt;a href="http://en.wikipedia.org/wiki/Dachshund"&gt;Dachshund&lt;/a&gt; is a strange and impractical dog. Just look at it and you can tell it’s no product of evolution. Not without man’s intervention anyway.&lt;/p&gt;  &lt;p&gt;Yet people like them. Families own them. They are however stubborn and hard to train and they have their fair share of health problems (including spinal problems unsurprisingly).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.eclipse.org/"&gt;Eclipse&lt;/a&gt; is the Dachshund of the Java IDE world.&lt;/p&gt;  &lt;p&gt;If IntelliJ doesn’t have enough plugins then arguably Eclipse has too many. Hell, this goes so far as &lt;a href="http://stackoverflow.com/questions/185486/which-eclipse-subversion-plugin-should-i-use"&gt;having two Subversion plugins&lt;/a&gt;. A former colleague, who was an avid Eclipse fan, could never get either one to work. Sometimes they lied about checking stuff in (&lt;em&gt;big&lt;/em&gt; problem) or just conflicted with other stuff.&lt;/p&gt;  &lt;p&gt;Every time I try and use Eclipse for something I’m struck with an overwhelming sense of how awkward and unintuitive it is. Take Maven projects as one example. In IntelliJ or Netbeans you just open one up and it &lt;em&gt;just works&lt;/em&gt;. &lt;a href="http://www.google.com.au/search?q=eclipse+maven"&gt;Googling&lt;/a&gt; doesn’t really help either. 
&lt;a href="http://maven.apache.org/eclipse-plugin.html"&gt;The first link&lt;/a&gt; is seemingly out of date and it doesn’t get much better.&lt;/p&gt;  &lt;p&gt;Now I realize this is a whole Coke vs Pepsi thing. Many people are no doubt experts in Eclipse. They’re used to the “Eclipse Way” so it all makes sense (which strikes me as a form of &lt;a href="http://en.wikipedia.org/wiki/Stockholm_syndrome"&gt;Stockholm Syndrome&lt;/a&gt; but I digress…). Hell, they may even like the whole perspectives thing, which I’ve always hated.&lt;/p&gt;  &lt;p&gt;But if you tell me you’ve never had problems with Eclipse plugins you’re lying.&lt;/p&gt;  &lt;h3&gt;Labradoodle&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Labradoodle_Brown.jpg" rel="license"&gt;&lt;img style="width: 262px" src="http://upload.wikimedia.org/wikipedia/commons/7/70/Labradoodle_Brown.jpg" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The &lt;a href="http://en.wikipedia.org/wiki/Labradoodle"&gt;Labradoodle&lt;/a&gt; is a strange and relatively new dog breed. Whereas programmers with too much free time come out with &lt;a href="http://www.cise.ufl.edu/~manuel/obfuscate/pi.c"&gt;bizarre ways of calculating Pi&lt;/a&gt; and other such boondoggles, dog breeders with idle hands decide to answer a question that has plagued civilization since Aristotle’s time:&lt;/p&gt;  &lt;p&gt;What happens when you cross a Labrador Retriever with a poodle?&lt;/p&gt;  &lt;p&gt;Baseless hyperbolae aside, I’m sure there was a reason. 
I just don’t know what it was.&lt;/p&gt;  &lt;p&gt;But what resulted is a friendly, energetic and not-too-bright breed that families tend to like.&lt;/p&gt;  &lt;p&gt;Well &lt;a href="http://netbeans.org/"&gt;Netbeans&lt;/a&gt; is the Labradoodle of the Java IDE world.&lt;/p&gt;  &lt;p&gt;Netbeans does some things very well, particularly Swing development (which admittedly in today’s Web-focused world is a lot like being the best manufacturer of horse bridles and saddles).&lt;/p&gt;  &lt;p&gt;Netbeans also faces an uncertain future with Oracle’s acquisition of Sun.&lt;/p&gt;  &lt;p&gt;Netbeans at least immediately understood my Maven project. It couldn’t find classes with main() methods that were under the test directory (IntelliJ could) but otherwise it all just worked.&lt;/p&gt;  &lt;p&gt;So it was looking good to finally get a profiler running… until I came across a bug. There is at least one open bug against Netbeans concerning Windows 7. My dev machine is a Windows 7 64 bit machine. Months after Windows 7’s release, and nearly a year after the beta version, to still have permission problems is simply unacceptable. Yet that’s what happens when I try and use the profiler.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;All this and I still have no profile of my code!&lt;/p&gt;  &lt;p&gt;Please don’t waste my time and yours by commenting or sending me a message saying I’m wrong about &amp;lt;&lt;em&gt;insert favourite IDE here&lt;/em&gt;&amp;gt;. You’re missing the point of a rant (in that largely there isn’t one).&lt;/p&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/aiNnNVPAMiS7UWZHxq0CkR49xFk/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/aiNnNVPAMiS7UWZHxq0CkR49xFk/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/aiNnNVPAMiS7UWZHxq0CkR49xFk/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/aiNnNVPAMiS7UWZHxq0CkR49xFk/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/t-7YiP6nC08" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/8527414037634384027/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/java-ides-blue-heeler-dachshund-and.html#comment-form" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/8527414037634384027?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/8527414037634384027?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/t-7YiP6nC08/java-ides-blue-heeler-dachshund-and.html" title="Java IDEs: the Blue Heeler, the Dachshund and the Labradoodle" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">10</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/java-ides-blue-heeler-dachshund-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D04EQ3szfCp7ImA9WxBXE08.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-735938443884589256</id><published>2010-01-24T13:34:00.001+08:00</published><updated>2010-01-24T17:38:22.584+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-24T17:38:22.584+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" 
term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="computer science" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Markdown and an Introduction to Parsing Expression Grammars (PEG)</title><content type="html">&lt;p&gt;Writing an &lt;a href="http://www.antlr.org/"&gt;ANTLR&lt;/a&gt; &lt;em&gt;LL(*)&lt;/em&gt; grammar for Markdown has been the itch I just can’t scratch this month. I keep going back to it as I have a new idea about how to approach the problem or how to solve a previous problem I’ve had. Each time I get further but I still keep hitting a wall.&lt;/p&gt;  &lt;p&gt;It’s a shame really because &lt;a href="http://www.antlr.org/works/index.html"&gt;ANTLRWorks&lt;/a&gt; is an excellent tool and ANTLR is an extremely mature product. The rewriting rules and tree grammars are extremely elegant.&lt;/p&gt;  &lt;p&gt;Over the last couple of weeks I’ve been investigating PEGs (“Parsing Expression Grammars”). I highly recommend &lt;a href="http://pdos.csail.mit.edu/papers/parsing:popl04.pdf"&gt;Parsing Expression Grammars: A Recognition-Based Syntactic Foundation&lt;/a&gt; by &lt;a href="http://www.brynosaurus.com/"&gt;Bryan Ford&lt;/a&gt;. 
PEGs are relatively new (Ford’s paper was published in 2004) whereas parsing CFGs (“Context Free Grammars”) with &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/LL_parser"&gt;LL&lt;/a&gt;&lt;/em&gt;, &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/LR_parser"&gt;LR&lt;/a&gt;&lt;/em&gt; and &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/LALR_parser"&gt;LALR&lt;/a&gt;&lt;/em&gt; parsers has a history going back decades.&lt;/p&gt;  &lt;h3&gt;Traditional Parsers&lt;/h3&gt;  &lt;p&gt;Parsing of computer and natural languages (by computers) has its roots in &lt;a href="http://en.wikipedia.org/wiki/Noam_Chomsky"&gt;Noam Chomsky&lt;/a&gt;’s work on generative grammars, particularly the &lt;a href="http://en.wikipedia.org/wiki/Chomsky_hierarchy"&gt;Chomsky hierarchy&lt;/a&gt; and the work of &lt;a href="http://en.wikipedia.org/wiki/Donald_Knuth"&gt;Donald Knuth&lt;/a&gt; (On the Translation of Languages from Left to Right [1965]) and Frank DeRemer (&lt;a href="http://portal.acm.org/citation.cfm?id=888578"&gt;Practical Translators for LR(k) Languages&lt;/a&gt; [1969]).&lt;/p&gt;  &lt;p&gt;To understand &lt;a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar"&gt;Parsing Expression Grammars&lt;/a&gt; let me first explain the basic workings of a traditional parser. The first step is lexical analysis that turns an input stream into a series of tokens. The parser will then apply various rules to these tokens. 
There are varying techniques for dealing with ambiguities and recursive rules.&lt;/p&gt;  &lt;p&gt;As &lt;a href="http://en.wikipedia.org/wiki/Terence_Parr"&gt;Terence Parr&lt;/a&gt; (the creator of ANTLR) puts it in his (excellent) &lt;a href="http://www.amazon.com/Definitive-Antlr-Reference-Domain-Specific-Programmers/dp/0978739256"&gt;The Definitive ANTLR Reference: Building Domain-Specific Languages&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Unfortunately, ANTLR cannot generate a top-down recognizer for every grammar—&lt;em&gt;LL&lt;/em&gt; recognizers restrict the class of acceptable grammars somewhat. For example, ANTLR cannot accept left-recursive grammars such as the following (see Section 11.5, Left-Recursive Grammars, on page 274):&lt;/p&gt;    &lt;pre&gt;&lt;code&gt;/** An expression is defined to be an expression followed by '++' */
expr : expr '++'
     ;&lt;/code&gt;&lt;/pre&gt;

  &lt;p&gt;ANTLR translates this grammar to a recursive method called &lt;code&gt;expr()&lt;/code&gt; that immediately invokes itself:&lt;/p&gt;

  &lt;pre&gt;&lt;code&gt;void expr() {
  expr();
  match(&amp;quot;++&amp;quot;);
}&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is something that &lt;em&gt;LALR&lt;/em&gt; parsers handle better.&lt;/p&gt;
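
&lt;p&gt;To see why a top-down recognizer chokes on that rule, here is a small hand-written Python sketch. It is not ANTLR output. The first function mirrors the quoted &lt;code&gt;expr()&lt;/code&gt; method; the second shows the usual workaround of rewriting the rule as a loop, with an identifier as an assumed base case that the quoted grammar does not have:&lt;/p&gt;

```python
def expr_left_recursive(tokens, pos=0):
    # expr : expr '++'  -- calls itself before consuming a single token
    pos = expr_left_recursive(tokens, pos)
    if tokens[pos:pos + 1] != ["++"]:
        raise SyntaxError("expected '++'")
    return pos + 1


def expr_iterative(tokens, pos=0):
    # expr : ID ('++')*  -- ID is an assumed base case, not in the quote
    if pos == len(tokens) or not tokens[pos].isidentifier():
        raise SyntaxError("expected identifier")
    pos += 1
    while tokens[pos:pos + 1] == ["++"]:
        pos += 1
    return pos


try:
    expr_left_recursive(["x", "++"])
except RecursionError:
    print("the left-recursive rule never consumes any input")

print(expr_iterative(["x", "++", "++"]))  # index just past the last '++'
```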

&lt;h3&gt;The Lexical Analysis Problem&lt;/h3&gt;

&lt;p&gt;But the big problem as far as Markdown is concerned is that tokens are not context-free. Take a natural definition for lists:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;listItem    : ORDERED inline NEWLINE
            | UNORDERED inline NEWLINE
            ;

ORDERED     : DIGIT+ '.' (' ' | '\t')+ ;
UNORDERED   : ('*' | '-' | '+') (' ' | '\t')+ ;
inline      : (~ NEWLINE)+ ;
NEWLINE     : '\r' '\n'? | '\n' ;&lt;/pre&gt;

&lt;p&gt;and this Markdown:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;1. one
2. two
3. three&lt;/pre&gt;

&lt;p&gt;will be converted into this lexical stream:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;ORDERED inline(&amp;quot;one&amp;quot;) NEWLINE
ORDERED inline(&amp;quot;two&amp;quot;) NEWLINE
ORDERED inline(&amp;quot;three&amp;quot;) NEWLINE&lt;/pre&gt;

&lt;p&gt;and then this AST (&amp;quot;Abstract Syntax Tree&amp;quot;) will result:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;document
+- listItem
|  +- ORDERED
|  +- inline (&amp;quot;one&amp;quot;)
|  +- NEWLINE
+- listItem
|  +- ORDERED
|  +- inline (&amp;quot;two&amp;quot;)
|  +- NEWLINE
+- listItem
   +- ORDERED
   +- inline (&amp;quot;three&amp;quot;)
   +- NEWLINE&lt;/pre&gt;

&lt;p&gt;Looks good, right? Wrong. It quickly falls down when the Markdown becomes pathological:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;1. 1. one
2. two
3. three&lt;/pre&gt;

&lt;p&gt;because the input stream is:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;ORDERED ORDERED inline(&amp;quot;one&amp;quot;) NEWLINE
ORDERED inline(&amp;quot;two&amp;quot;) NEWLINE
ORDERED inline(&amp;quot;three&amp;quot;) NEWLINE&lt;/pre&gt;

&lt;p&gt;assuming you can resolve the ambiguity of inline technically being able to match &amp;quot;1.&amp;quot; (which you can… &lt;em&gt;kinda&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;The above will not be recognized because there is no rule that handles a pair of ORDERED tokens. Really what you want to do is not create an ORDERED token after you’ve already started a list item but at this point &lt;strong&gt;&lt;em&gt;you no longer have a context-free grammar&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;ANTLR’s semantic and syntactic predicates make an admirable effort at dealing with these kinds of ambiguities and context sensitivities but ultimately it’s just not designed for this kind of grammar.&lt;/p&gt;
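
&lt;p&gt;The difference is easy to reproduce with a toy tokenizer. The Python sketch below is hand-rolled for illustration and has nothing to do with ANTLR’s generated lexers; the &lt;code&gt;context_sensitive&lt;/code&gt; flag is an invented switch that only allows list markers at the start of a line:&lt;/p&gt;

```python
import re

# Rules are tried in order; INLINE matches one character and is merged below.
RULES = [
    ("ORDERED", re.compile(r"\d+\.[ \t]+")),
    ("UNORDERED", re.compile(r"[*+-][ \t]+")),
    ("NEWLINE", re.compile(r"\r\n?|\n")),
    ("INLINE", re.compile(r"[^\r\n]")),
]
MARKERS = ("ORDERED", "UNORDERED")


def tokenize(text, context_sensitive=False):
    tokens, pos, line_start = [], 0, True
    while pos != len(text):
        for name, pattern in RULES:
            if context_sensitive and name in MARKERS and not line_start:
                continue  # a marker only counts at the start of a line
            m = pattern.match(text, pos)
            if m:
                break
        if name == "INLINE" and tokens and tokens[-1][0] == "INLINE":
            tokens[-1] = ("INLINE", tokens[-1][1] + m.group())
        else:
            tokens.append((name, m.group()))
        line_start = name == "NEWLINE"
        pos = m.end()
    return tokens


sample = "1. 1. one\n2. two\n"
print([name for name, _ in tokenize(sample)])
print([name for name, _ in tokenize(sample, context_sensitive=True)])
```

&lt;p&gt;The first call produces the troublesome pair of ORDERED tokens; the second folds the inner &amp;quot;1. &amp;quot; into the inline text, which is exactly the context a plain lexer does not have.&lt;/p&gt;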

&lt;h3&gt;Enter PEG&lt;/h3&gt;

&lt;p&gt;PEG parsers take a different approach in two important ways:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;PEGs are &lt;em&gt;not&lt;/em&gt; ambiguous. Choices in the grammar above can lead to ambiguities. ANTLR resolves many of these by using predicates, which are a way of saying “if it looks like a duck then it’s a duck, otherwise it’s something else”. PEGs use a &lt;em&gt;prioritized choice operator&lt;/em&gt;, which basically tries the choices &lt;em&gt;in order&lt;/em&gt; until it finds one that matches. By definition this is unambiguous because the input stream will either be recognized or it won’t; and &lt;/li&gt;

  &lt;li&gt;PEGs better handle non-CFGs by trying to recognize tokens as part of processing a rule rather than recognizing tokens and then applying rules to them. &lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Prioritized Choice&lt;/h3&gt;

&lt;p&gt;So in PEG terms, Markdown becomes easier to describe:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;Document &amp;lt;- Line*
Line     &amp;lt;- Heading / ListItem / Inline / Empty
Heading  &amp;lt;- '#'+ WS+ Inline
ListItem &amp;lt;- (DIGIT+ '.' / '*' / '-' / '+') WS+ Inline
Inline   &amp;lt;- (!NEWLINE .)+ NEWLINE

DIGIT    &amp;lt;- [0-9]
WS       &amp;lt;- ' ' / '\t'
NEWLINE  &amp;lt;- '\r\n' / '\r' / '\n'&lt;/pre&gt;

&lt;p&gt;This is of course partial and a simplification but the important thing here is that prioritized choice resolves what would otherwise be ambiguous. This is the “else” clause I’ve been looking for.&lt;/p&gt;
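
&lt;p&gt;A minimal Python sketch of prioritized choice, assuming a hand-written matcher rather than a real PEG library. The function names (heading, list_item, inline, empty, parse_line) are hypothetical and only mirror the rule names in the grammar above; each alternative either returns a result or fails with None, and the first success wins.&lt;/p&gt;

```python
def heading(line):
    # Heading: one or more '#', then whitespace, then text.
    hashes = len(line) - len(line.lstrip("#"))
    rest = line[hashes:]
    text = rest.lstrip(" \t")
    if hashes > 0 and len(text) != len(rest) and text:
        return ("Heading", text)
    return None


def list_item(line):
    # ListItem: (digits '.' or a bullet character), then whitespace, then text.
    digits = len(line) - len(line.lstrip("0123456789"))
    if digits > 0 and line[digits:digits + 1] == ".":
        marker = digits + 1
    elif line[:1] in ("*", "-", "+"):
        marker = 1
    else:
        return None
    rest = line[marker:]
    text = rest.lstrip(" \t")
    if len(text) != len(rest) and text:
        return ("ListItem", text)
    return None


def inline(line):
    return ("Inline", line) if line else None


def empty(line):
    return ("Empty", "") if line == "" else None


# Line: Heading / ListItem / Inline / Empty, tried strictly in this order.
ALTERNATIVES = (heading, list_item, inline, empty)


def parse_line(line):
    for alternative in ALTERNATIVES:
        result = alternative(line)
        if result is not None:
            return result
    return None


print(parse_line("1. 1. one"))
print(parse_line("*emphasis*"))
```

&lt;p&gt;Because the list marker is recognized as part of the ListItem rule rather than by a separate lexer, the second “1.” in the pathological input is simply part of the item’s text, and a line that fails the earlier alternatives falls through to Inline.&lt;/p&gt;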

&lt;h3&gt;Context-Sensitive Tokenization&lt;/h3&gt;

&lt;p&gt;Markdown has lots of these issues. For example ‘###’ &lt;em&gt;might&lt;/em&gt; indicate a header but only if that line isn’t itself the text of a header (because the next line consists entirely of equals signs or hyphens). ANTLR allows you to handle some of these situations by doing something like:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;HEADER : {getCharPositionInLine()==0]?=&amp;gt; ‘#’+ WS+ ;&lt;/pre&gt;

&lt;p&gt;but what about this Markdown?&lt;/p&gt;

&lt;pre class="brush:plain"&gt;&amp;gt; # quoted heading
&amp;gt; some text&lt;/pre&gt;

&lt;p&gt;It’s entirely possible I’m missing some key part of the puzzle here but I’m not hopeful.&lt;/p&gt;

&lt;p&gt;Ford illustrates this problem:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;...PEGs also create new possibilities for language syntax design. Consider for example a well-known problem with C++ syntax involving nested template type expressions:&lt;/p&gt;

  &lt;pre&gt;&lt;code&gt;vector&amp;lt;vector&amp;lt;float&amp;gt; &amp;gt; MyMatrix;&lt;/code&gt;&lt;/pre&gt;

  &lt;p&gt;The space between the two right angle brackets is required because the C++ scanner is oblivious to the language’s hierarchical syntax, and would otherwise interpret the &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; incorrectly as a right shift operator. &lt;strong&gt;&lt;em&gt;In a language described by a unified PEG, however, it is easy to define the language to permit a &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; sequence to be interpreted as either one token or two depending on its context:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

  &lt;pre&gt;&lt;code&gt;TemplType &amp;lt;- PrimType (LANGLE TemplType RANGLE)?
ShiftExpr &amp;lt;- PrimExpr (ShiftOper PrimExpr)*
ShiftOper &amp;lt;- LSHIFT / RSHIFT
LANGLE    &amp;lt;- '&amp;lt;' Spacing
RANGLE    &amp;lt;- '&amp;gt;' Spacing
LSHIFT    &amp;lt;- '&amp;lt;&amp;lt;' Spacing
RSHIFT    &amp;lt;- '&amp;gt;&amp;gt;' Spacing
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;(emphasis added)&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;This isn’t a new problem and I’m not the first to approach the issue of Markdown parsing with a PEG grammar. &lt;a href="http://www.ohloh.net/p/peg-markdown"&gt;peg-markdown&lt;/a&gt; is an implementation of Markdown in C using a PEG parser.&lt;/p&gt;

&lt;p&gt;My own effort is going forward despite this implementation existing for several reasons:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;I plan on having implementations in several languages; &lt;/li&gt;

  &lt;li&gt;I intend to implement various Markdown and Wiki extensions and flavours; and &lt;/li&gt;

  &lt;li&gt;Because I’m getting a kick out of it. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I learnt compiler theory in university but it was all quite theoretical with simple yet interesting examples. The practical application to a real-world problem is quite something else. Plus PEG is only 6 or so years old so is new to me.&lt;/p&gt;

&lt;p&gt;It is my belief that PEGs are a far more natural and robust means of parsing &lt;em&gt;any&lt;/em&gt; form of Markdown, Wiki syntax, BBcode or other forum format.&lt;/p&gt;

&lt;p&gt;And that’s the direction I’m heading.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-735938443884589256?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/qBF3O2qdd53iigIVMLSZbyDRJYo/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/qBF3O2qdd53iigIVMLSZbyDRJYo/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/qBF3O2qdd53iigIVMLSZbyDRJYo/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/qBF3O2qdd53iigIVMLSZbyDRJYo/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/k232pHNkPzk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/735938443884589256/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/markdown-and-introduction-to-parsing.html#comment-form" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/735938443884589256?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/735938443884589256?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/k232pHNkPzk/markdown-and-introduction-to-parsing.html" title="Markdown and an Introduction to Parsing Expression Grammars (PEG)" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/markdown-and-introduction-to-parsing.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0QCQH4yeCp7ImA9WxBXE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-6693774793451367620</id><published>2010-01-17T11:49:00.002+08:00</published><updated>2010-01-24T13:36:01.090+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-24T13:36:01.090+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category 
scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Markdown Headings, Grief and Unknown Elements to the Rescue</title><content type="html">&lt;p&gt;Well it’s not a day to be outside (unless you’re at the beach). It’s &lt;a href="http://www.weather.com.au/wa/perth"&gt;41 degrees&lt;/a&gt; and that’s metric (none of this Imperial rubbish that only the US uses). That’s 106F in the old scale.&lt;/p&gt;  &lt;p&gt;So I’m tackling the problem of Markdown headings in my parser.&lt;/p&gt;  &lt;pre class="brush:plain"&gt;Heading 1
=========

Heading 2
---------

# Heading 1
## Heading 2
### Heading 3

Horizontal rules:

-------
*******
_______&lt;/pre&gt;

&lt;p&gt;This is really annoying for two reasons:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Ambiguous syntax: a line of hyphens could be a horizontal rule or indicate a heading depending on the context (and again we return to the point of Markdown being context-sensitive); and &lt;/li&gt;

  &lt;li&gt;From an LL perspective this is left-recursive and requires LL(*) (arbitrary lookahead) in a normal grammar. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let me explain.&lt;/p&gt;

&lt;p&gt;A subset of the grammar for Markdown might look something like this (in ANTLR-like syntax):&lt;/p&gt;

&lt;pre class="brush:plain"&gt;document  : block* ;
block     : paragraph | heading | heading1 | heading2 | codeblock ;
paragraph : inline+ END_BLOCK ;
heading   : '#'+ inline NEWLINE ;
heading1  : inline+ NEWLINE '='+ (NEWLINE | END_BLOCK) ;
heading2  : inline+ NEWLINE '-'+ (NEWLINE | END_BLOCK) ;
inline    : '*' inline+ '*'
          | '`' inline+ '`'
          | ...
          | OTHER+ ;
OTHER     : . ;
END_BLOCK : '\n' '\n'+ | EOF ;&lt;/pre&gt;

&lt;p&gt;Try and plug something like that into ANTLR and it will complain all over the place.&lt;/p&gt;

&lt;p&gt;Firstly it’s ambiguous. An input sequence like “*123*” matches two of the inline alternatives. I’m led to believe that PEG parsers can deal with this by simply trying rules in the order they appear. That would fit this situation a lot better. ANTLR can (messily) handle it with syntactic predicates.&lt;/p&gt;

&lt;p&gt;The other problem is the grammar is left-recursive, most notably with the inline rule.&lt;/p&gt;

&lt;p&gt;Yet another problem is that this requires arbitrary lookahead (again, something ANTLR can do with its LL(*) algorithm) because the token that delineates the heading rules is right at the end of a cyclic rule.&lt;/p&gt;

&lt;p&gt;It's even worse once you start factoring in paragraphs and lists.&lt;/p&gt;

&lt;p&gt;All of this leads to a whole bunch of headaches but I thought about this long and hard (going so far as to wake up in a sweat after a lexical analysis nightmare) and came up with a much more elegant (imho) solution.&lt;/p&gt;

&lt;p&gt;Consider a lexical stream that looks like this:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;WORD(&amp;quot;Heading&amp;quot;) WHITE_SPACE(&amp;quot; &amp;quot;) WORD(&amp;quot;1&amp;quot;) ...&lt;/pre&gt;

&lt;p&gt;What's next is important because the parser doesn't yet know if this is a paragraph or a heading. But here is where I was trying to be too clever for my own good by determining the block type in the lexer. After all, it would make the parsing step easier if I could just use a stack to push/pop block elements already knowing what they are.&lt;/p&gt;

&lt;p&gt;Instead I decided to treat a stream of inline elements as an Unknown element and I could just determine the type as a parsing action rather than a lexical rule. So the grammar simplifies somewhat to:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;document  : block* ;
block     : codeblock | unknown ;
codeblock : ('    ' .* NEWLINE)+ ;
unknown   : inline* END_BLOCK ;&lt;/pre&gt;

&lt;p&gt;Again, order of rules is useful here, meaning if it looks like a code block it is a code block, otherwise it’s an unknown block. A syntactic predicate could handle this or you could make an indent at the start of a line an INDENT token, which wouldn’t fit into the inline rule. This makes the grammar unambiguous but still requires arbitrary lookahead. It’s easier to simply make a decision based on the first token and avoid any backtracking whatsoever.&lt;/p&gt;

&lt;p&gt;So if the parsing actions come across the right token sequence within the unknown block they change that block to a heading, otherwise when that block ends it simply defaults to being a paragraph.&lt;/p&gt;

&lt;p&gt;Think of unknown elements as being the stem cells of Markdown lexical analysis.&lt;/p&gt;
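
&lt;p&gt;The idea can be sketched in a few lines of Python. This is a simplified illustration and not the JMD code: it works on whole lines instead of tokens, assumes blocks are separated by blank lines, ignores ‘#’ headings and horizontal rules, and the names classify_block and parse_blocks are made up for the example.&lt;/p&gt;

```python
def classify_block(lines):
    """Decide what an 'unknown' block is once all of its lines are known."""
    if len(lines) == 2 and lines[0].strip():
        underline = lines[1].strip()
        # A line made up only of '=' or '-' turns the previous line into a heading.
        if underline and set(underline) == {"="}:
            return ("h1", lines[0].strip())
        if underline and set(underline) == {"-"}:
            return ("h2", lines[0].strip())
    # Nothing special was seen, so the block defaults to a paragraph.
    return ("p", " ".join(line.strip() for line in lines))


def parse_blocks(text):
    blocks = []
    unknown = []
    # The extra empty string flushes the final block at end of input.
    for line in text.split("\n") + [""]:
        if line.strip():
            unknown.append(line)
        elif unknown:
            blocks.append(classify_block(unknown))
            unknown = []
    return blocks


source = "Heading 1\n=========\n\nSome text\nmore text\n\nHeading 2\n---------"
print(parse_blocks(source))
```

&lt;p&gt;The point is that nothing has to be decided up front: lines are collected without a type, and the type is assigned only when the block ends.&lt;/p&gt;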

&lt;p&gt;Anyway that was my revelation for the week. I still need to finish my list handling, inline styling and links, which is still more than I’d like but it’s getting there.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-6693774793451367620?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/Lhbl-tXQJvECADzHxRgtq5SMqX0/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/Lhbl-tXQJvECADzHxRgtq5SMqX0/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/Lhbl-tXQJvECADzHxRgtq5SMqX0/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/Lhbl-tXQJvECADzHxRgtq5SMqX0/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/mLJtFVra_cI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/6693774793451367620/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html#comment-form" title="12 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/6693774793451367620?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/mLJtFVra_cI/markdown-headings-grief-and-unknown.html" title="Markdown Headings, Grief and Unknown Elements to the Rescue" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">12</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/markdown-headings-grief-and-unknown.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0QDRXo4cSp7ImA9WxBXE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-7495962090909617820</id><published>2010-01-14T23:24:00.002+08:00</published><updated>2010-01-24T13:36:14.439+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-24T13:36:14.439+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" 
term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Markdown Musings on Unintended Consequences</title><content type="html">&lt;p&gt;It may seem lately that Markdown is my white whale to which I respond thusly… call me Ahab.&lt;/p&gt;  &lt;p&gt;One of the problems with implementing something like this is that no one can quite agree on what exactly constitutes Markdown. It gets worse when you consider Wiki syntaxes. What’s stunning is that someone (&lt;a href="http://www.cosmocode.de/en/index"&gt;CosmoCode&lt;/a&gt;) has gone so far as to create a &lt;a href="http://www.wikimatrix.org/index.php"&gt;matrix comparing them all&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;If you peruse the unit tests you find things like:&lt;/p&gt;  &lt;pre class="brush:plain"&gt;Asterisks tight:

* asterisk 1
* asterisk 2
* asterisk 3


Asterisks loose:

* asterisk 1

* asterisk 2

* asterisk 3&lt;/pre&gt;

&lt;p&gt;is converted to:&lt;/p&gt;

&lt;pre class="brush:html"&gt;&amp;lt;p&amp;gt;Asterisks tight:&amp;lt;/p&amp;gt;

&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;asterisk 1&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;asterisk 2&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;asterisk 3&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;

&amp;lt;p&amp;gt;Asterisks loose:&amp;lt;/p&amp;gt;

&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;asterisk 1&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;asterisk 2&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;asterisk 3&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;&lt;/pre&gt;

&lt;p&gt;Now having gone through the code I can see why this is: two newlines are typically used as a block delimiter between paragraphs, code blocks and so forth. But I have to wonder three things:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Is this planned behaviour or simply the result of splitting the file into blocks using two or more newlines as a delimiter? &lt;/li&gt;

  &lt;li&gt;Is this behaviour desirable? &lt;/li&gt;

  &lt;li&gt;Is this behaviour reasonable? &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Of course there is a case for paragraphs being nested in list items, namely that you have two or more paragraphs or other nested block content within list items. This is certainly something you can do, and will do, in HTML but I’m not so convinced that blank lines wrapping list content in a paragraph are anything other than an unintended consequence.&lt;/p&gt;
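
&lt;p&gt;Here is a small Python sketch of how that side effect could arise. It is a guess at the mechanism for illustration, not a description of how Markdown.pl or MarkdownSharp are actually written; the list_items function and its tuple output are hypothetical, and it only handles ‘*’ bullets.&lt;/p&gt;

```python
def list_items(markdown):
    """Return list items as tuples; a 'loose' item wraps its text in a ("p", ...) node."""
    # Split on blank lines first, the way a block-oriented converter might.
    chunks = [chunk for chunk in markdown.split("\n\n") if chunk.strip()]
    # If the list was cut into several chunks, every item is treated as block
    # content and picks up a paragraph wrapper along the way.
    loose = len(chunks) > 1
    items = []
    for chunk in chunks:
        for line in chunk.split("\n"):
            if line.startswith("* "):
                text = line[2:]
                items.append(("li", ("p", text)) if loose else ("li", text))
    return items


print(list_items("* asterisk 1\n* asterisk 2"))
print(list_items("* asterisk 1\n\n* asterisk 2"))
```

&lt;p&gt;The two inputs differ only by a blank line, yet the second comes back with every item wrapped in a paragraph node.&lt;/p&gt;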

&lt;p&gt;Of course there is no grammar or spec for Markdown so it’s something you can argue til the cows come home. You can also change it and still call what you do “Markdown”. It’s why there are so many Wiki syntaxes.&lt;/p&gt;

&lt;p&gt;There are other issues. For example, should you be able to start or end bold or italic styling in the middle of a word? I believe Github has taken the approach that underscores for italics can’t start or end intra-word, sensibly (as this is a common occurrence in source code).&lt;/p&gt;

&lt;p&gt;Lastly, Markdown preserves HTML. It’s my opinion that it should be replaced with Markdown where possible. What should you do with this:&lt;/p&gt;

&lt;pre class="brush:html"&gt;&amp;lt;blockquote&amp;gt;
  &amp;lt;ul id=&amp;quot;list&amp;quot;&amp;gt;
    &amp;lt;li&amp;gt;one&amp;lt;/li&amp;gt;
    &amp;lt;li&amp;gt;two&amp;lt;/li&amp;gt;
    &amp;lt;li&amp;gt;three&amp;lt;/li&amp;gt;
  &amp;lt;/ul&amp;gt;
&amp;lt;/blockquote&amp;gt;&lt;/pre&gt;

&lt;p&gt;In my opinion, it would make sense to convert this to:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;&amp;gt; 1. one
&amp;gt; 2. two
&amp;gt; 3. three&lt;/pre&gt;

&lt;p&gt;Of course you lose information in doing this (namely the id attribute) but you have to decide: are you using Markdown or HTML?&lt;/p&gt;

&lt;p&gt;Opinions will of course vary.&lt;/p&gt;

&lt;p&gt;Weighty issues indeed! But this is what I’m struggling with as I’m working on my list parsing while trying to prevent my lexer from becoming a pushdown automaton.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-7495962090909617820?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/K_8lzUR7yWe9hjLV4aopRmOQL70/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/K_8lzUR7yWe9hjLV4aopRmOQL70/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/K_8lzUR7yWe9hjLV4aopRmOQL70/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/K_8lzUR7yWe9hjLV4aopRmOQL70/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/G9L-okS9MWw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/7495962090909617820/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/markdown-musings-on-unintended.html#comment-form" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7495962090909617820?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7495962090909617820?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/G9L-okS9MWw/markdown-musings-on-unintended.html" title="Markdown Musings on Unintended Consequences" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">6</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/markdown-musings-on-unintended.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0QNQ3k7fCp7ImA9WxBXE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-268406792312293234</id><published>2010-01-13T23:03:00.003+08:00</published><updated>2010-01-24T13:36:32.704+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-24T13:36:32.704+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category 
scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>More Details on JMD Markdown Parsing</title><content type="html">&lt;p&gt;I’ve reached an important milestone tonight. As &lt;a href="http://www.cforcoding.com/2010/01/jmd-markdown-and-brief-overview-of.html"&gt;previously mentioned&lt;/a&gt; I’m working on a non-regex Markdown library for Java (and other languages to follow).&lt;/p&gt;  &lt;p&gt;The goals of this project are:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;To be feature complete for standard Markdown as well as the StackOverflow/Github extensions; &lt;/li&gt;    &lt;li&gt;To add table support; &lt;/li&gt;    &lt;li&gt;To support various Wiki markdown flavours; and &lt;/li&gt;    &lt;li&gt;To convert from Markdown to HTML &lt;em&gt;and from HTML to Markdown&lt;/em&gt;. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;There look to be four steps in this process.&lt;/p&gt;  &lt;p&gt;The first step I’ve called lexical analysis but it’s part scanning and part parsing, mainly because doing it at this stage is convenient and saves me a lot of grief later. The end result of this step is a list of Tokens, which is highly memory efficient. Each Token object requires only 4 integers and for a source file 10K in size you’ll probably end up with between 2,000 and 5,000 tokens.&lt;/p&gt;  &lt;p&gt;The second step, which I’m not convinced will remain, is a rewrite step. There are a couple of awkward cases I don’t want to handle in the third step so I filter the list of tokens at this point.&lt;/p&gt;  &lt;p&gt;The third step is to take the list of tokens and generate a Document. 
A Document is basically an &lt;a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree"&gt;Abstract Syntax Tree&lt;/a&gt; and looks a lot like a DOM.&lt;/p&gt;  &lt;p&gt;The last step is to use the &lt;a href="http://en.wikipedia.org/wiki/Visitor_pattern"&gt;Visitor pattern&lt;/a&gt; to render an HTML document.&lt;/p&gt;  &lt;p&gt;Tonight I have working code that does all four steps. It is still very much feature incomplete. Lots of inline styling doesn’t work. Neither do reference images, reference links nor any kind of list. Still it is correctly handling nested block quotes, implicit paragraphs and paragraph breaks and indented code blocks.&lt;/p&gt;  &lt;p&gt;As of right now it is converting (the Code_blocks unit test from MarkdownSharp):&lt;/p&gt;  &lt;pre class="brush:plain"&gt; code block on the first line
 
Regular text.

    code block indented by spaces

Regular text.

 the lines in this block  
 all contain trailing spaces  

Regular Text.

 code block on the last line&lt;/pre&gt;

&lt;p&gt;into this:&lt;/p&gt;

&lt;pre class="brush:html"&gt;&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;code block on the first line
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;

&amp;lt;p&amp;gt;Regular text.&amp;lt;/p&amp;gt;

&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;code block indented by spaces
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;

&amp;lt;p&amp;gt;Regular text.&amp;lt;/p&amp;gt;

&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;the lines in this block  
all contain trailing spaces  
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;

&amp;lt;p&amp;gt;Regular Text.&amp;lt;/p&amp;gt;

&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;code block on the last line&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&lt;/pre&gt;

&lt;p&gt;in &lt;strong&gt;&lt;em&gt;5 microseconds&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let me repeat that: it looped through that conversion &lt;strong&gt;&lt;em&gt;one million times in under 5 seconds… in pure Java!&lt;/em&gt;&lt;/strong&gt; To compare, my regex solution is doing this in 600-700 microseconds (that’s based on the 1.006 MarkdownSharp code; 1.009 has improved block handling, which should make a difference).&lt;/p&gt;
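
&lt;p&gt;The per-conversion figure is just total elapsed time divided by the number of iterations: 5 seconds over 1,000,000 runs is 5 microseconds each. A minimal Python sketch of that kind of measurement follows. It is not the Java harness used for the numbers above, and mean_microseconds and its convert argument are hypothetical names.&lt;/p&gt;

```python
import time


def mean_microseconds(convert, source, iterations=1_000_000):
    """Average wall-clock time of one convert(source) call, in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        convert(source)
    elapsed = time.perf_counter() - start
    # Seconds per call, scaled to microseconds.
    return elapsed / iterations * 1_000_000


# Stand-in workload; a real run would pass the Markdown converter instead.
print(mean_microseconds(str.upper, "Regular text.", iterations=100_000))
```

&lt;p&gt;Looping many times and averaging matters here because a single conversion is far too quick to time reliably on its own.&lt;/p&gt;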

&lt;p&gt;Now you might look at that document and say it’s not that complicated (and you’d be right) but all the infrastructure is there. I know how I’m going to implement the rest and I can’t imagine anything (other than auto-linking) significantly affecting performance. What’s more even if it was 100 times slower I’d still be happy. I’m working on a worst case of it being 10 times slower when feature complete.&lt;/p&gt;

&lt;p&gt;So far I haven’t used a single regular expression and don’t think I’ll need to apart from maybe link validation. I’ll document more about the design in future posts (after the code is released probably) to explain many optimizations you can make to this process as well as the overall parsing strategy. So far there has been almost zero need for lookahead and backtracking, which is generally what kills your performance (without complicated techniques like &lt;a href="http://en.wikipedia.org/wiki/Memoization"&gt;memoization&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Stay tuned…&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-268406792312293234?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/1yOZoo6bfgS_3EQ1slRUEoSgxWA/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/1yOZoo6bfgS_3EQ1slRUEoSgxWA/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/1yOZoo6bfgS_3EQ1slRUEoSgxWA/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/1yOZoo6bfgS_3EQ1slRUEoSgxWA/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/bFNDYj-3ubQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/268406792312293234/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/more-details-on-jmd-markdown-parsing.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/268406792312293234?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/268406792312293234?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/bFNDYj-3ubQ/more-details-on-jmd-markdown-parsing.html" title="More Details on JMD Markdown Parsing" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/more-details-on-jmd-markdown-parsing.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0MESXgyeyp7ImA9WxBXE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-4066218079347763609</id><published>2010-01-11T07:19:00.003+08:00</published><updated>2010-01-24T13:36:48.693+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-24T13:36:48.693+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category 
scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="computer science" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>JMD, Markdown and a Brief Overview of Parsing and Compilers</title><content type="html">&lt;p&gt;Like most comp sci students, I did a course on compilers in university. I also did some parsing and syntax trees in a data structures course. At the time I wrote a couple of parsers, including one for simplifying boolean expressions (De Morgan’s laws, a AND true = a, etc) and another for evaluating arithmetic expressions.&lt;/p&gt;  &lt;p&gt;So I’ve been familiar with the basics and the theory of compiler design but I was by no means an expert.&lt;/p&gt;  &lt;p&gt;Recently, I &lt;a href="http://www.cforcoding.com/2010/01/announcing-jmd-java-markdown-port-of.html"&gt;launched JMD&lt;/a&gt;, a Java implementation of &lt;a href="http://code.google.com/p/markdownsharp/"&gt;MarkdownSharp&lt;/a&gt;, itself a C# port and extension of the original Perl Markdown scripts. Like the original it relies heavily on regular expressions.&lt;/p&gt;  &lt;p&gt;I like the idea of extending and improving Markdown but I’m no fan of using complicated regular expressions for that purpose. The current version is a milestone. It passes the unit tests and allows me to better build the replacement, which will be written more in the traditional compiler/translator sense.&lt;/p&gt;  &lt;h3&gt;Finite State Machines&lt;/h3&gt;  &lt;p&gt;&lt;img style="width: 200px" src="http://upload.wikimedia.org/wikipedia/commons/thumb/9/9d/DFAexample.svg/200px-DFAexample.svg.png" /&gt;A &lt;a href="http://en.wikipedia.org/wiki/Finite-state_machine"&gt;finite state machine&lt;/a&gt; (“FSM”) defines two things: a finite number of states and transitions between them. This abstract machine models behaviour of some kind.&lt;/p&gt;  &lt;p&gt;Often, but not always, such machines have a start state and one or more end states. 
FSMs are typically used for games, input processing and many other things.&lt;/p&gt;  &lt;p&gt;One important characteristic of such machines is that they typically have no memory. They merely know the current state and what transitions there are.&lt;/p&gt;  &lt;p&gt;Typically in computer science we’re more concerned with a special class of FSMs called &lt;a href="http://en.wikipedia.org/wiki/Deterministic_finite-state_machine"&gt;deterministic finite state machines&lt;/a&gt; (“DFSMs”) or &lt;em&gt;deterministic finite automata&lt;/em&gt; (“DFAs”). The key difference is that transitions are deterministic, meaning that from any given state there is at most one transition for a given input symbol.&lt;/p&gt;  &lt;p&gt;Another characteristic of finite state machines is whether they are &lt;em&gt;cyclic&lt;/em&gt; or &lt;em&gt;acyclic&lt;/em&gt;. If there exists a state such that a transition can be taken and it is possible to return to that same state the FSM is cyclic, otherwise it is acyclic. By definition, cyclic FSMs are capable of processing an infinite space of inputs. Acyclic FSMs are not.&lt;/p&gt;  &lt;h3&gt;Regular Expressions&lt;/h3&gt;  &lt;p&gt;The most familiar DFA for most programmers will probably be &lt;a href="http://www.regular-expressions.info/"&gt;regular expressions&lt;/a&gt;. A regular expression (“regex”) is a shorthand way of building a DFA to process text input by specifying the optionality, cardinality, capturing and ordering of character sequences. Typically programs will determine if a given input matches a specified regex or whether or not that regex can be found anywhere in the input and possibly capture key parts of that input.&lt;/p&gt;  &lt;p&gt;Undoubtedly regexes are useful but they tend to be overused. Consider them a shining example of how once you have a hammer everything starts to look like a nail. 
In particular programmers will often try to use them to parse HTML or XML documents, which tends to be a pet peeve of Stackoverflow answerers such as myself.&lt;/p&gt;  &lt;p&gt;The reason they are a poor choice is that HTML is not a &lt;a href="http://en.wikipedia.org/wiki/Regular_language"&gt;regular language&lt;/a&gt;. What that means is that it is not possible to parse and validate HTML with a DFA. That’s because things like proper nesting of tags, ordering of opening and closing tags, etc require the machine to have some kind of memory, which DFAs don’t have.&lt;/p&gt;  &lt;p&gt;You see that in regex-based Markdown parsers where limitations are imposed to make it possible to parse Markdown, such as introducing a nesting depth limit to certain block level elements.&lt;/p&gt;  &lt;p&gt;To give you an example: if you’re looking for links, will “&amp;lt;a[ &amp;gt;]” find them? Most of the time? Yes. But not all the time. Consider the case of such expressions appearing in attributes, XML CDATA blocks, inside XML, CSS or Javascript comments, inside Javascript strings, etc. Regexes can’t detect these kinds of corner cases. They simply aren’t capable of it. Not reliably at least.&lt;/p&gt;  &lt;p&gt;As it turns out a fairly simple change greatly enhances the power of DFAs.&lt;/p&gt;  &lt;h3&gt;Pushdown Automata&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Pushdown_automaton"&gt;Pushdown Automata&lt;/a&gt; (“PDAs”) make the simple change of adding a stack (hence “pushdown”), giving the machine a memory (of sorts) beyond the current state. To clarify, the machine can inspect the stack when deciding which transition to take and can manipulate it (push or pop) as part of taking that transition.&lt;/p&gt;  &lt;p&gt;This allows PDAs to process a much broader set of languages. 
A language that can be processed by a PDA is called a &lt;a href="http://en.wikipedia.org/wiki/Context-free_language"&gt;context-free language&lt;/a&gt; (“CFL”). The CFLs are a superset of the regular languages.&lt;/p&gt;  &lt;p&gt;If a PDA is deterministic (much like FSMs vs DFAs) then it is called a &lt;a href="http://en.wikipedia.org/wiki/Deterministic_pushdown_automaton"&gt;deterministic pushdown automaton&lt;/a&gt; (“DPDA”). Any languages that can be parsed by DPDAs are called &lt;a href="http://en.wikipedia.org/wiki/Deterministic_context-free_language"&gt;deterministic context-free languages&lt;/a&gt; (“DCFLs”), which are a subset of CFLs.&lt;/p&gt;  &lt;p&gt;Going back to regular expressions and HTML/XML parsing: with the addition of this stack, suddenly your parsing becomes &lt;em&gt;much&lt;/em&gt; more reliable. You can stop looking for anchors when you enter a Javascript block or a comment and so on.&lt;/p&gt;  &lt;p&gt;Many programming languages are CFLs but certainly not all.&lt;/p&gt;  &lt;h3&gt;Context-Sensitive Languages&lt;/h3&gt;  &lt;p&gt;The next broader class of languages is called &lt;a href="http://en.wikipedia.org/wiki/Context-sensitive_language"&gt;context-sensitive languages&lt;/a&gt;. CFLs are a subset of context-sensitive languages. C++ is the traditional poster-child for hard-to-parse languages. Its grammar is also context-sensitive. Take the &lt;a href="http://stackoverflow.com/questions/1172939/is-any-part-of-c-syntax-context-sensitive/1173004#1173004"&gt;following expression&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;pre&gt;A a = B();&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;Is that a method call or object construction? You can’t tell without looking up B in the symbol table.&lt;/p&gt;

&lt;p&gt;Ruby and some other programming languages are context-sensitive. HTML/XML is also in this category and so is Markdown. For example:&lt;/p&gt;

&lt;pre&gt;&amp;gt; Block quote &amp;lt;p
&amp;gt; &amp;gt;&lt;/pre&gt;

&lt;p&gt;Is the second &amp;gt; on the second line the start of a nested block quote or the closing part of the paragraph tag at the end of the first line?&lt;/p&gt;

&lt;p&gt;Not only is this context-sensitive, it’s also ambiguous. Technically we call this &lt;em&gt;non-determinism&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Once again returning to parsing a document for anchors, context-sensitivity bridges the remaining gap. Cases like being inside a comment or not represent the kind of context-sensitivity that is unmanageable for regular languages.&lt;/p&gt;

&lt;h3&gt;Grammars, Lexers and Parsers&lt;/h3&gt;

&lt;p&gt;A &lt;a href="http://en.wikipedia.org/wiki/Formal_grammar"&gt;formal grammar&lt;/a&gt; (or just “grammar” for short) is a set of rules that describe a sequence of tokens. A token sequence is often called a sentence. The set of all sentences described by a given grammar is the language for that grammar. &lt;strong&gt;Note:&lt;/strong&gt; a context-free language is described by a context-free grammar (“CFG”), etc.&lt;/p&gt;

&lt;p&gt;Compilers, interpreters and translators are typically written in at least two parts (that are of interest to us): a lexer and a parser. Simple grammars may be implemented where the lexer and parser are combined. More complicated grammars may have many parsing steps.&lt;/p&gt;

&lt;p&gt;A lexer or &lt;a href="http://en.wikipedia.org/wiki/Lexical_analysis"&gt;lexical analyzer&lt;/a&gt; or scanner or recognizer reads a set of input tokens—most often a character stream—and converts them into lexemes or tokens. For example, a lexer for arithmetic expressions may convert:&lt;/p&gt;

&lt;pre&gt;4+5*7&lt;/pre&gt;

&lt;p&gt;into&lt;/p&gt;

&lt;pre&gt;NUMBER(4) OP(+) NUMBER(5) OP(*) NUMBER(7)&lt;/pre&gt;
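
&lt;p&gt;As a rough illustration, a single-pass lexer for that example might look something like the following. This is only a sketch: the class name ArithmeticLexer and the textual token format are assumptions made up for this example, not code from JMD.&lt;/p&gt;

```java
// Hypothetical single-pass lexer for the arithmetic example above.
final class ArithmeticLexer {

  // Returns the token stream as text, e.g. "NUMBER(4) OP(+) NUMBER(5)".
  static String tokenize(String input) {
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i != input.length()) {
      char c = input.charAt(i);
      if (Character.isWhitespace(c)) {
        i++; // whitespace separates tokens but produces none
        continue;
      }
      if (out.length() != 0) {
        out.append(' ');
      }
      if (Character.isDigit(c)) {
        int start = i;
        // consume the whole run of digits as one NUMBER token
        while (i != input.length()) {
          if (!Character.isDigit(input.charAt(i))) {
            break;
          }
          i++;
        }
        out.append("NUMBER(").append(input, start, i).append(')');
      } else if ("+-*/".indexOf(c) != -1) {
        out.append("OP(").append(c).append(')');
        i++;
      } else {
        throw new IllegalArgumentException(
            "unexpected character '" + c + "' at index " + i);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(tokenize("4+5*7"));
  }
}
```

&lt;p&gt;Note that the lexer only needs to know the current character and whether it is in the middle of a number, which is why this stage fits a finite state machine so well.&lt;/p&gt;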

&lt;p&gt;A parser is a program that interprets a stream of lexemes according to a set of rules. Those rules are defined in terms of lexemes and/or other rules, or even the same rule. Anything other than tail-recursion in grammar rules tends to be problematic, though, and is usually factored out, either manually because the parser won’t accept it or automatically.&lt;/p&gt;
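
&lt;p&gt;To make “rules defined in terms of other rules” a little more concrete, here is a minimal hand-written sketch in which each method is one grammar rule. The class name ArithmeticParser and the three-rule grammar in the comment are assumptions for illustration only. It reads characters directly (so the lexer and parser are combined) and uses loops where a grammar would otherwise recurse on the same rule.&lt;/p&gt;

```java
// Hypothetical recursive-descent evaluator; each method is one grammar rule.
//   expr   : term (('+' | '-') term)* ;
//   term   : factor (('*' | '/') factor)* ;
//   factor : NUMBER ;
final class ArithmeticParser {
  private final String input;
  private int pos = 0;

  ArithmeticParser(String input) {
    this.input = input;
  }

  static long evaluate(String input) {
    ArithmeticParser parser = new ArithmeticParser(input);
    long value = parser.expr();
    if (parser.pos != input.length()) {
      throw new IllegalArgumentException("unexpected input at index " + parser.pos);
    }
    return value;
  }

  // expr is defined in terms of the term rule
  private long expr() {
    long value = term();
    while (peek() == '+' || peek() == '-') {
      char op = input.charAt(pos++);
      long right = term();
      value = (op == '+') ? value + right : value - right;
    }
    return value;
  }

  // term is defined in terms of the factor rule, so * and / bind tighter
  private long term() {
    long value = factor();
    while (peek() == '*' || peek() == '/') {
      char op = input.charAt(pos++);
      long right = factor();
      value = (op == '*') ? value * right : value / right;
    }
    return value;
  }

  // factor resolves directly to a NUMBER lexeme
  private long factor() {
    int start = pos;
    while (Character.isDigit(peek())) {
      pos++;
    }
    if (start == pos) {
      throw new IllegalArgumentException("number expected at index " + pos);
    }
    return Long.parseLong(input.substring(start, pos));
  }

  // Returns the current character, or '\0' once the input is exhausted.
  private char peek() {
    return pos == input.length() ? '\0' : input.charAt(pos);
  }
}
```

&lt;p&gt;Calling ArithmeticParser.evaluate("4+5*7") multiplies before it adds because the expr rule hands each operand to the term rule first.&lt;/p&gt;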

&lt;p&gt;This distinction is somewhat artificial and a little blurred. &lt;a href="http://www.antlr.org/"&gt;ANTLR&lt;/a&gt; defines lexer rules as &lt;em&gt;terminating rules&lt;/em&gt; and parser rules as &lt;em&gt;non-terminating&lt;/em&gt;. “Terminating” means the rule is not defined in terms of any other rules and as such can at best resolve to a lexeme.&lt;/p&gt;

&lt;h3&gt;Types of Parsers&lt;/h3&gt;

&lt;p&gt;At the top level there are two main types of parsers.&lt;/p&gt;

&lt;p&gt;The first category is &lt;a href="http://en.wikipedia.org/wiki/LL_parser"&gt;LL parsers&lt;/a&gt;. The first L means the input tokens are read left to right (ie from the beginning). The second L means the parser produces a leftmost derivation, which basically means the parser is &lt;em&gt;top-down&lt;/em&gt;: it attempts to match the input to a rule and in doing so will attempt to match input tokens to lexemes.&lt;/p&gt;

&lt;p&gt;LL parsers vary in their degree of lookahead. The simplest LL parsers are LL(1), meaning they look ahead one token. An LL(1) parser cannot choose between:&lt;/p&gt;

&lt;pre&gt;r : A B
  | A C
  ;&lt;/pre&gt;

&lt;p&gt;An LL(2) parser however can. LL parsers with a finite amount of lookahead are also called LL(k) parsers. Arbitrary lookahead LL parsers are called LL(*) parsers.&lt;/p&gt;
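
&lt;p&gt;A toy sketch of why the extra token matters for the rule above. The class and method names are made up for this example, tokens are plain strings, and the array is treated as the complete input for rule r.&lt;/p&gt;

```java
// Hypothetical illustration of lookahead for the rule: r : A B | A C ;
final class LookaheadSketch {

  // With k = 1 only tokens[0] is visible, and both alternatives start with A,
  // so the parser has no basis for picking one.
  static String chooseWithOneToken(String[] tokens) {
    if (tokens.length == 0 || !"A".equals(tokens[0])) {
      return "no match";
    }
    return "ambiguous: A B or A C";
  }

  // With k = 2 the second token settles the choice before a rule is entered.
  static String chooseWithTwoTokens(String[] tokens) {
    if (tokens.length != 2 || !"A".equals(tokens[0])) {
      return "no match";
    }
    if ("B".equals(tokens[1])) {
      return "A B";
    }
    if ("C".equals(tokens[1])) {
      return "A C";
    }
    return "no match";
  }
}
```

&lt;p&gt;The usual alternative to more lookahead is to left-factor the rule so that the shared A is matched once and the choice is made on the token that follows it.&lt;/p&gt;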

&lt;p&gt;The other main category is &lt;a href="http://en.wikipedia.org/wiki/LR(0)_parser"&gt;LR parsers&lt;/a&gt;. These are &lt;em&gt;bottom-up&lt;/em&gt;: the input tokens are still read from the beginning but instead of starting from a rule and trying to match it, the parser looks at the input and groups the tokens it has seen so far. From those groups it then works out which rules apply and matches the input that way.&lt;/p&gt;

&lt;p&gt;The most important subset of LR parsers are &lt;a href="http://en.wikipedia.org/wiki/LALR_parser"&gt;LALR parsers&lt;/a&gt;. Frankly it’s been too many years for me to remember the difference between LR and LALR parsers.&lt;/p&gt;

&lt;p&gt;There are various strengths and weaknesses of each approach, which are beyond the scope of this post. Generally though, LL parsers are easier to understand but LR parsers are more often used by &lt;a href="http://en.wikipedia.org/wiki/Compiler-compiler"&gt;compiler compilers&lt;/a&gt;. A compiler compiler is a tool that takes a formal grammar and creates a compiler, parser, interpreter or translator.&lt;/p&gt;

&lt;p&gt;Another class of parsers is &lt;a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar"&gt;parsing expression grammars&lt;/a&gt; (“PEGs”). I know far less about these. They’re fairly new but at least one Markdown parser, &lt;a href="http://github.com/jgm/lunamark"&gt;Lunamark&lt;/a&gt;, has been written with a PEG parser.&lt;/p&gt;

&lt;h3&gt;ANTLR&lt;/h3&gt;

&lt;p&gt;Apparently Joel and Jeff discussed this issue in &lt;a href="http://blog.stackoverflow.com/2010/01/podcast-79/"&gt;this week's Stackoverflow podcast&lt;/a&gt;. They ruminated that it would have been better had Markdown been written using a formal grammar and a tool such as bison. I agree and they provide a &lt;a href="http://code.google.com/p/markdownsharp/source/browse/trunk/MarkdownSharpTests/source/php/markdown.php#365"&gt;pretty good example of how horrifying regex parsing of non-regular languages can be&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;My own journey towards a non-regex solution first took me to &lt;a href="http://www.antlr.org/"&gt;ANTLR&lt;/a&gt; (“ANother Tool for Language Recognition”), which has a good GUI for debugging parsers. ANTLR is an LL(*) parser with some extensions. One of these is &lt;a href="http://en.wikipedia.org/wiki/Syntactic_predicate"&gt;syntactic predicates&lt;/a&gt;. Syntactic predicates increase the recognition power of LL parsers by resolving some ambiguities that LL parsers otherwise can’t handle.&lt;/p&gt;

&lt;p&gt;For example, consider the following Markdown:&lt;/p&gt;

&lt;pre&gt;&amp;gt; This is a *test of
&amp;gt; blockquoting*,
&amp;gt; emphasis and
&amp;gt; http://www.google.com
&amp;gt; (autolinking)
A paragraph.&lt;/pre&gt;

&lt;p&gt;When I was playing around with ANTLR this was problematic to parse. The natural way is to remove the blockquoting and then parse the remaining text, probably recursively.&lt;/p&gt;

&lt;p&gt;One “problem” with LL parsers is that they attempt to match all the rule alternatives so if you wanted to write the above as:&lt;/p&gt;

&lt;pre&gt;document : (para | quote)* ;
para     : ((~ '\n') '\n')+ ;
quote    : ('&amp;gt; ' (~ '\n')*)+ ;&lt;/pre&gt;

&lt;p&gt;you’ve actually created an ambiguous grammar because quote lines can also be matched by the para rule. Syntactic predicates seek to resolve this kind of ambiguity by saying things like “if it looks like a block quote then it’s a block quote” when choosing between possible alternatives.&lt;/p&gt;

&lt;p&gt;Another problem I ran into was how to deal with things like auto-linking URLs.&lt;/p&gt;

&lt;p&gt;Throw in limited XML/HTML parsing and it just became a hair-pulling exercise. Basically it just seemed to be the wrong tool for this particular job. Now that doesn’t mean it can’t be done. Someone more skilled than I with it no doubt could get it done. I could see the path forward and it wasn’t pretty however.&lt;/p&gt;

&lt;p&gt;It’s a shame really because I like ANTLR. &lt;a href="http://www.cs.usfca.edu/~parrt/"&gt;Terence Parr&lt;/a&gt;, the author of ANTLR and all-round language tool rock star, has written an excellent book &lt;a href="http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference"&gt;The Definitive ANTLR Reference: Building Domain-Specific Languages&lt;/a&gt; and I can’t recommend this enough.&lt;/p&gt;

&lt;h3&gt;The Future of JMD&lt;/h3&gt;

&lt;p&gt;One commenter &lt;a href="http://www.dzone.com/links/announcing_jmd_java_markdown_port_of_markdownsharp.html"&gt;asked&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;How is it better/different than MarkdownJ?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s a good question. The answer is that JMD will use a true parser rather than a hash of regexes to parse and process Markdown. It’ll also do a lot more than this but more on that later when it’s closer to fruition.&lt;/p&gt;

&lt;p&gt;The ANTLR exercise wasn’t a complete waste. I had a choice to see if LALR parsing would offer a better alternative. The ANTLR experiment did solidify in my mind how I would go about parsing Markdown.&lt;/p&gt;

&lt;p&gt;I’m convinced that a hand-coded parser is not only possible but it’s relatively straightforward.&lt;/p&gt;

&lt;p&gt;Preliminary results look extremely promising. It’s not feature-complete yet and I won’t release it until it passes a significant portion of the unit tests inherited from MarkdownSharp.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Initial results indicate it will be 50-100x faster than a regex solution.&lt;/em&gt;&lt;/strong&gt; The lexical analysis so far is being done in a single pass using virtually no memory and is taking between 10 and 20 &lt;em&gt;microseconds&lt;/em&gt; to tokenize a document about the size of one of the unit tests.&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;I hope this post has been useful in three respects:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;To give a brief overview of the field of compiler compilers;&lt;/li&gt;

  &lt;li&gt;To explain my thought process behind how to take JMD forward; and&lt;/li&gt;

  &lt;li&gt;To explain why I’m doing what I’m doing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Watch this space.&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/4066218079347763609/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/jmd-markdown-and-brief-overview-of.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4066218079347763609?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4066218079347763609?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/Bl0p1b2xlp8/jmd-markdown-and-brief-overview-of.html" title="JMD, Markdown and a Brief Overview of Parsing and Compilers" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">5</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/jmd-markdown-and-brief-overview-of.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0MGQHY-eip7ImA9WxBXE0w.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-5358229045943984522</id><published>2010-01-04T19:39:00.003+08:00</published><updated>2010-01-24T13:37:01.852+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-24T13:37:01.852+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category scheme="http://www.blogger.com/atom/ns#" 
term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="parsing" /><category scheme="http://www.blogger.com/atom/ns#" term="markdown" /><title>Announcing JMD: Java MarkDown (port of MarkdownSharp)</title><content type="html">&lt;p&gt;By a strange coincidence when I was looking for text editing options for another project, the Stackoverflow guys &lt;a href="http://blog.stackoverflow.com/2009/12/introducing-markdownsharp/"&gt;released MarkdownSharp&lt;/a&gt; last week, being a C# port and extension to what was originally written in Perl.&lt;/p&gt;  &lt;p&gt;A couple of days later I have &lt;a href="http://github.com/cletus/jmd"&gt;JMD&lt;/a&gt; (Java MarkDown) with the same extensions and unit tests. At this stage—and certainly while the code stabilizes and work progresses in passing all the tests—it is an almost line-for-line translation of the C# source as this makes it easier to apply patches. This isn’t the Java Way, in particular Java favours a more DI-centric approach typified by Spring rather than static configuration.&lt;/p&gt;  &lt;p&gt;Ugliness and architectural issues aside, it will do for now. You can:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Download it from the &lt;a href="http://github.com/cletus/jmd/downloads"&gt;Downloads&lt;/a&gt; page; or &lt;/li&gt;    &lt;li&gt;Retrieve it from &lt;a href="http://github.com/"&gt;Github&lt;/a&gt; at &lt;a title="git://github.com/cletus/jmd.git" href="git://github.com/cletus/jmd.git"&gt;git://github.com/cletus/jmd.git&lt;/a&gt;. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;It is built with &lt;a href="http://maven.apache.org/"&gt;Maven&lt;/a&gt; and should build out of the box (assuming correctly configured Maven). Running on my machine:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Intel Q9450 CPU (2.66GHz); &lt;/li&gt;    &lt;li&gt;8GB DDR2 RAM; &lt;/li&gt;    &lt;li&gt;Windows 7 Ultimate 64; and &lt;/li&gt;    &lt;li&gt;Intel X25-M G2 80GB SSD. 
&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The results are:&lt;/p&gt;  &lt;pre&gt;JMD test run

1   Amps_and_angle_encoding                                 OK
2   Auto_links                                              OK
3   Backslash_escapes                                       OK
4   Blockquotes_with_code_blocks                            OK
5   Code_Blocks                                             OK
6   Code_Spans                                              OK
7   Hard_wrapped_paragraphs_with_list_like_lines            OK
8   Horizontal_rules                                        OK
9   Images                                                  OK
10  Inline_HTML_Advanced                                    Mismatch
11  Inline_HTML_comments                                    OK
12  Inline_HTML_Simple                                      OK
13  Links_inline_style                                      OK
14  Links_reference_style                                   OK
15  Links_shortcut_references                               OK
16  Literal_quotes_in_titles                                OK
17  Markdown_Documentation_Basics                           OK
18  Markdown_Documentation_Syntax                           OK
19  Nested_blockquotes                                      OK
20  Ordered_and_unordered_lists                             Mismatch
21  Strong_and_em_together                                  OK
22  Tabs                                                    OK
23  Tidyness                                                OK^

Tests      : 23
OK         : 21 (^ 1 whitespace differences)
Mismatch   : 2

input string length: 475
4000 iterations in 6.301 seconds (1.575 ms per iteration)
input string length: 2356
1000 iterations in 6.390 seconds (6.390 ms per iteration)
input string length: 27737
100 iterations in 10.503 seconds (105.031 ms per iteration)
input string length: 11075
1 iteration in 0.037 seconds
input string length: 88607
1 iteration in 0.518 seconds
input string length: 354431
1 iteration in 4.992 seconds&lt;/pre&gt;

&lt;p&gt;To compare, on the same machine, these are the MarkdownSharp results in Visual Studio 2008:&lt;/p&gt;

&lt;pre&gt;MarkdownSharp v1.006 test run on \mdtest-1.1

001 Amps_and_angle_encoding                                OK
002 Auto_links                                             OK
003 Backslash_escapes                                      OK^
004 Blockquotes_with_code_blocks                           OK
005 Code_Blocks                                            OK
006 Code_Spans                                             OK
007 Hard_wrapped_paragraphs_with_list_like_lines           OK
008 Horizontal_rules                                       OK
009 Images                                                 OK
010 Inline_HTML_Advanced                                   Mismatch
011 Inline_HTML_comments                                   OK
012 Inline_HTML_Simple                                     OK
013 Links_inline_style                                     OK
014 Links_reference_style                                  OK
015 Links_shortcut_references                              OK
016 Literal_quotes_in_titles                               OK
017 Markdown_Documentation_Basics                          OK
018 Markdown_Documentation_Syntax                          OK
019 Nested_blockquotes                                     OK
020 Ordered_and_unordered_lists                            Mismatch
021 Strong_and_em_together                                 OK
022 Tabs                                                   OK
023 Tidyness                                               OK^

Tests        : 23
OK           : 21 (^ 2 whitespace differences)
Mismatch     : 2

MarkdownSharp v1.006 test run on \test-input

001 markdown-readme                                        OK
002 reality-check                                          OK

Tests        : 2
OK           : 2
Mismatch     : 0


MarkdownSharp v1.006 benchmark, takes 10 ~ 30 seconds...

input string length: 475
4000 iterations in 3827 ms (0.95675 ms per iteration)
input string length: 2356
1000 iterations in 4205 ms (4.205 ms per iteration)
input string length: 27737
100 iterations in 4736 ms (47.36 ms per iteration)
input string length: 11075
1 iteration in 23 ms
input string length: 88607
1 iteration in 191 ms
input string length: 354431
1 iteration in 1025 ms&lt;/pre&gt;

&lt;p&gt;So Java is roughly half the speed of C# in this regard (about 1.5x to 2.2x slower on the repeated benchmarks and nearly 5x slower on the largest single document), which is a bigger difference than I’d expect for what is essentially the same code. At this preliminary stage I can only attribute this to the .Net Regex libraries being better.&lt;/p&gt;

&lt;p&gt;JMD is released under the same permissive &lt;a href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT license&lt;/a&gt; as MarkdownSharp. Please feel free to use it, let me know what you think or to contribute.&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/5358229045943984522/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/announcing-jmd-java-markdown-port-of.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/5358229045943984522?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/5358229045943984522?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/pOLWUSGUjoQ/announcing-jmd-java-markdown-port-of.html" title="Announcing JMD: Java MarkDown (port of MarkdownSharp)" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/announcing-jmd-java-markdown-port-of.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A08FRn45fip7ImA9WxBRE0U.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-4917424370978974063</id><published>2010-01-02T07:50:00.001+08:00</published><updated>2010-01-02T07:50:17.026+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-01-02T07:50:17.026+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" 
/><title>Java: Why-oh-why still no multi-line strings?</title><content type="html">&lt;p&gt;Over the last year or two I’ve been doing a lot of PHP (which I really like). One of the things I use a lot is &lt;a href="http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc"&gt;heredoc syntax&lt;/a&gt; eg:&lt;/p&gt;  &lt;pre class="brush:php"&gt;$query = &amp;lt;&amp;lt;&amp;lt;END
SELECT *
FROM tablename
WHERE condition1 = $field
AND condition2 = 345
END;&lt;/pre&gt;

&lt;p&gt;This is much more convenient than, say:&lt;/p&gt;

&lt;pre class="brush:java"&gt;String query =
  &amp;quot;SELECT * &amp;quot; + // MUST remember to put a space here!
  &amp;quot;FROM tablename &amp;quot; +
  &amp;quot;WHERE condition1 = &amp;quot; + field + &amp;quot; &amp;quot; + 
  &amp;quot;AND condition2 = 345&amp;quot;;&lt;/pre&gt;

&lt;p&gt;I’ve been doing quite a bit of Java recently and it’s really starting to bug me. I don't understand why pretty much every other imperative language invented in the last 15 years can have some form of multi-line string syntax but Java &lt;em&gt;still&lt;/em&gt; doesn’t.&lt;/p&gt;

&lt;p&gt;Java 6 was released over three years ago. Java 7—thanks largely to the unexpected (yet welcome) inclusion of closures—isn’t due for nearly another year. Four years between releases.&lt;/p&gt;

&lt;p&gt;Surely Java could have gotten &lt;em&gt;something&lt;/em&gt; in that time. Even if it’s the rather ugly (imho) triple-quote syntax of Scala/Groovy it’d be better than nothing.&lt;/p&gt;

&lt;p&gt;Anyway, I just needed to get that out.&lt;/p&gt;

&lt;p&gt;Happy New Year for 2010.&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/4917424370978974063/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2010/01/java-why-oh-why-still-no-multi-line.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4917424370978974063?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4917424370978974063?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/Wv0JlSp2Kyo/java-why-oh-why-still-no-multi-line.html" title="Java: Why-oh-why still no multi-line strings?" 
/><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total><feedburner:origLink>http://www.cforcoding.com/2010/01/java-why-oh-why-still-no-multi-line.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEANQXs5fSp7ImA9WxBREEQ.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-3073976543799228049</id><published>2009-12-29T21:13:00.001+08:00</published><updated>2009-12-29T21:19:50.525+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-29T21:19:50.525+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="performance" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><title>Mutability, Arrays and the Cost of Temporary Objects in Java</title><content type="html">&lt;p&gt;In his 2001 must-read book, &lt;a href="http://www.amazon.com/Effective-Java-2nd-Joshua-Bloch/dp/0321356683/ref=sr_1_1?ie=UTF8&amp;amp;s=books&amp;amp;qid=1262081713&amp;amp;sr=8-1"&gt;Effective Java&lt;/a&gt;, Joshua Bloch said in one item “Favor immutability”. &lt;a href="http://www.ibm.com/developerworks/java/library/j-jtp02183.html"&gt;Java theory and practice: To mutate or not to mutate?&lt;/a&gt; provides an excellent overview of what this means and why it matters. It states:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;An immutable object is one whose externally visible state cannot change after it is instantiated.&lt;/p&gt; &lt;/blockquote&gt;  &lt;h3&gt;A Brief Overview of Immutability&lt;/h3&gt;  &lt;p&gt;Let’s say you want to create a class to model arbitrary precision &lt;a href="http://en.wikipedia.org/wiki/Rational_number"&gt;rational numbers&lt;/a&gt; (ie fractions). A mutable version might start out:&lt;/p&gt;  &lt;pre class="brush:java"&gt;public class BigRational {
  private BigInteger numerator = BigInteger.ZERO;
  private BigInteger denominator = BigInteger.ONE;

  // constructors and so on

  public BigRational add(BigRational other) {
    if (numerator.signum() == 0) {
      numerator = other.numerator;
      denominator = other.denominator;
    } else if (other.numerator.signum() == 0) {
      // no action required
    } else if (denominator.equals(other.denominator)) {
      numerator = numerator.add(other.numerator);
    } else {
      // this could be optimized for greatest common divisor
      numerator = numerator.multiply(other.denominator).add(other.numerator.multiply(denominator));
      denominator = denominator.multiply(other.denominator);
    }
    return this;
  }

  // etc
}&lt;/pre&gt;

&lt;p&gt;This is how many classes naively start. The problem comes when this class is used as a data member or a parameter. Consider this class:&lt;/p&gt;

&lt;pre class="brush:java"&gt;public class MyClass {
  private BigRational data = new BigRational();

  public BigRational getData() {
    return data;
  }
}&lt;/pre&gt;

&lt;p&gt;A caller can then modify the internal state of MyClass through the returned reference:&lt;/p&gt;

&lt;pre class="brush:java"&gt;MyClass mc = new MyClass();
BigRational rational = mc.getData();
rational.multiply(rational);&lt;/pre&gt;

&lt;p&gt;Obviously this behaviour isn’t desirable, which leads to the practice of &lt;a href="http://www.javapractices.com/topic/TopicAction.do?Id=15"&gt;defensive copying&lt;/a&gt;, a practice familiar to any C programmer. Each time this getter is called a temporary copy is created so the internal state of the class isn’t violated.&lt;/p&gt;

&lt;p&gt;One of the biggest early errors in Java’s design was that the &lt;a href="http://java.sun.com/javase/6/docs/api/java/util/Date.html"&gt;Date class&lt;/a&gt; is mutable. This means that any API that uses dates either has to use a non-standard date class or defensively copy date instances. Oddly, Java got String, BigInteger and BigDecimal all right as they’re all immutable. Even stranger, the later &lt;a href="http://java.sun.com/javase/6/docs/api/java/util/Calendar.html"&gt;Calendar class&lt;/a&gt; (introduced in JDK 1.1) was also made mutable.&lt;/p&gt;
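
&lt;p&gt;As a rough illustration of the defensive copying a mutable Date forces on an API, here is a minimal sketch. The Event class and its field name are hypothetical, made up for this example; only java.util.Date comes from the JDK.&lt;/p&gt;

```java
import java.util.Date;

public class DefensiveCopyDemo {
  // Hypothetical holder class, used only for illustration.
  static final class Event {
    private final Date when;

    Event(Date when) {
      // Copy on the way in, so later changes by the caller are not seen here.
      this.when = new Date(when.getTime());
    }

    Date getWhen() {
      // Copy on the way out, so callers cannot mutate the field.
      return new Date(when.getTime());
    }
  }

  public static void main(String[] args) {
    Date original = new Date(0L);
    Event event = new Event(original);
    original.setTime(1000L);         // mutates only the caller's instance
    event.getWhen().setTime(2000L);  // mutates only a temporary copy
    System.out.println(event.getWhen().getTime()); // prints 0
  }
}
```

&lt;p&gt;Every call to getWhen() allocates a temporary Date, which is exactly the kind of cost the rest of this post is about.&lt;/p&gt;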

&lt;p&gt;An immutable version would look something like:&lt;/p&gt;

&lt;pre class="brush:java"&gt;public class BigRational {
  private final BigInteger numerator;
  private final BigInteger denominator;

  public BigRational() {
    this(BigInteger.ZERO);
  }

  public BigRational(BigInteger integer) {
    this(integer, BigInteger.ONE);
  }

  public BigRational(BigInteger numerator, BigInteger denominator) {
    if (denominator.signum() == 0) {
      throw new IllegalArgumentException(&amp;quot;denominator cannot be zero&amp;quot;);
    }
    if (numerator.signum() == 0) {
      this.numerator = BigInteger.ZERO;
      this.denominator = BigInteger.ONE;
    } else {
      this.numerator = numerator;
      this.denominator = denominator;
    }
  }

  public BigRational multiply(BigRational other) {
    if (numerator.signum() == 0 || other.numerator.signum() == 0) {
      return new BigRational(BigInteger.ZERO);
    } else {
      // multiply numerators and denominators; this could be reduced by the greatest common divisor
      return new BigRational(numerator.multiply(other.numerator), denominator.multiply(other.denominator));
    }
  }

  // etc
}&lt;/pre&gt;
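
&lt;p&gt;The mutable version above carries a comment that the arithmetic could be optimized for the greatest common divisor. One possible way to do that in an immutable class is sketched below. RationalSketch is a hypothetical name, not the BigRational from this post, and reducing in the constructor is just one design choice among several.&lt;/p&gt;

```java
import java.math.BigInteger;

// Hypothetical class for illustration; not the BigRational from this post.
public final class RationalSketch {
  private final BigInteger num;
  private final BigInteger den;

  public RationalSketch(long num, long den) {
    this(BigInteger.valueOf(num), BigInteger.valueOf(den));
  }

  public RationalSketch(BigInteger num, BigInteger den) {
    if (den.signum() == 0) {
      throw new IllegalArgumentException("denominator cannot be zero");
    }
    // gcd() is never negative; give it the denominator's sign so that
    // the stored denominator always ends up positive.
    BigInteger divisor = num.gcd(den);
    if (den.signum() == -1) {
      divisor = divisor.negate();
    }
    this.num = num.divide(divisor);
    this.den = den.divide(divisor);
  }

  // Returns a new instance; neither operand is modified.
  public RationalSketch add(RationalSketch other) {
    return new RationalSketch(
        num.multiply(other.den).add(other.num.multiply(den)),
        den.multiply(other.den));
  }

  @Override
  public String toString() {
    return num + "/" + den;
  }

  public static void main(String[] args) {
    RationalSketch quarter = new RationalSketch(1, 4);
    RationalSketch sum = quarter.add(quarter);
    System.out.println(sum);      // 1/2
    System.out.println(quarter);  // 1/4 (unchanged)
  }
}
```

&lt;p&gt;Because every instance is reduced on construction, zero always ends up as 0/1 and no special cases are needed in add().&lt;/p&gt;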

&lt;h3&gt;Arrays are Mutable&lt;/h3&gt;

&lt;p&gt;The big problem with all this is that Java arrays are mutable. So for example:&lt;/p&gt;

&lt;pre class="brush:java"&gt;public void doStuff(String args[]) {
  args[0] = &amp;quot;Hello world&amp;quot;;
}

...

String arr[] = new String[] { &amp;quot;one&amp;quot;, &amp;quot;two&amp;quot;, &amp;quot;three&amp;quot; };
doStuff(arr);
System.out.println(arr[0]); // Hello world&lt;/pre&gt;

&lt;p&gt;This is one big reason why you should use &lt;a href="http://java.sun.com/javase/6/docs/api/java/util/List.html"&gt;Lists&lt;/a&gt; instead of arrays in almost all circumstances where you have a choice. A List can be wrapped in an unmodifiable view, so that code holding only the wrapper cannot change it:&lt;/p&gt;

&lt;pre class="brush:java"&gt;List&amp;lt;String&amp;gt; list = new ArrayList&amp;lt;String&amp;gt;();
list.add(&amp;quot;one&amp;quot;);
list.add(&amp;quot;two&amp;quot;);
list.add(&amp;quot;three&amp;quot;);
final List&amp;lt;String&amp;gt; immutableList = Collections.unmodifiableList(list);&lt;/pre&gt;

&lt;p&gt;On a side note, this rather verbose syntax would get a little easier with the &lt;a href="http://tech.puredanger.com/2009/06/02/javaone-coin/"&gt;collection literals&lt;/a&gt; proposed for Java 7 (a Project Coin proposal that did not ship in that release), for example:&lt;/p&gt;

&lt;pre class="brush:java"&gt;List&amp;lt;String&amp;gt; list = [&amp;quot;one&amp;quot;, &amp;quot;two&amp;quot;, &amp;quot;three&amp;quot;];
final List&amp;lt;String&amp;gt; immutableList = Collections.unmodifiableList(list);&lt;/pre&gt;

&lt;h3&gt;Enum Values&lt;/h3&gt;

&lt;p&gt;Java 5 introduced &lt;a href="http://www.javapractices.com/topic/TopicAction.do?Id=1"&gt;typesafe enums&lt;/a&gt;, largely based on Joshua Bloch’s proposal (which used classes). The unofficial, hand-written versions had lots of potential issues (eg having to implement readResolve() to stop serialization creating new instances). In my opinion, enums are one of the (increasingly few) language constructs that are significantly better in Java than in, say, C#: Java’s enums aren’t just thinly wrapped integers (as they are in C/C++/.Net) and can also have behaviour.&lt;/p&gt;

&lt;p&gt;Java enums have a static method called values() which returns an &lt;em&gt;array&lt;/em&gt; of all instances of that enum. After the lessons of the Date class, this particular decision was nothing short of shocking. A List would have been a far more sensible choice. Because arrays are mutable, the array of instances must be defensively copied on every call to values(). To avoid that cost you end up writing code like this repeatedly:&lt;/p&gt;

&lt;pre class="brush:java"&gt;public enum Season {
  SPRING, SUMMER, AUTUMN, WINTER;

  private static final List&amp;lt;Season&amp;gt; VALUES =
    Collections.unmodifiableList(
      new ArrayList&amp;lt;Season&amp;gt;(Arrays.asList(values())));

  public static List&amp;lt;Season&amp;gt; getValues() { return VALUES; }
}&lt;/pre&gt;
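
&lt;p&gt;The defensive copy made by values() is easy to observe. The small demo below (EnumValuesDemo is a made-up class name) shows that two calls return distinct arrays and that scribbling on one of them does not affect later calls.&lt;/p&gt;

```java
public class EnumValuesDemo {
  enum Season { SPRING, SUMMER, AUTUMN, WINTER }

  public static void main(String[] args) {
    Season[] first = Season.values();
    Season[] second = Season.values();
    // Each call hands back its own copy of the array.
    System.out.println(first == second);    // false

    // Overwriting an element changes only this copy.
    first[0] = Season.WINTER;
    System.out.println(Season.values()[0]); // SPRING
  }
}
```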

&lt;h3&gt;Is This Really Necessary?&lt;/h3&gt;

&lt;p&gt;There is nothing &lt;em&gt;inherently&lt;/em&gt; wrong with temporary objects. It’s a question of degree. So creating several hundred (or thousand) temporary objects isn’t any big deal. At some point there is such a thing as too many.&lt;/p&gt;

&lt;p&gt;I will demonstrate two things here:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The cost of temporary arrays; and &lt;/li&gt;

  &lt;li&gt;The correct way to generate random enums. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This post was prompted by &lt;a href="http://stackoverflow.com/questions/1972392/java-pick-a-random-value-from-an-enum"&gt;Java: Pick a random value from an enum?&lt;/a&gt; where the poster created a Random on every iteration. Some years ago this was bad because Randoms were seeded with the current time (in milliseconds or even seconds), so instances created within a short space of time could start from the same seed and you wouldn’t get a particularly random distribution. It’s therefore worth answering the question of fairness as well as cost.&lt;/p&gt;
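
&lt;p&gt;To see why a shared seed matters, here is a small sketch showing that two Random instances constructed with the same seed produce the same sequence. SeedDemo and sameSequence are hypothetical names, and the seed value 42 is arbitrary.&lt;/p&gt;

```java
import java.util.Random;

public class SeedDemo {
  // Returns true if two generators built from the same seed agree
  // on the first 'count' values in the range 0..3.
  static boolean sameSequence(long seed, int count) {
    Random first = new Random(seed);
    Random second = new Random(seed);
    for (int i = 0; i != count; i++) {
      if (first.nextInt(4) != second.nextInt(4)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(sameSequence(42L, 1000)); // true
  }
}
```

&lt;p&gt;If every freshly created Random started from the same clock reading, each would return the same first value, which is the fairness concern described above.&lt;/p&gt;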

&lt;p&gt;This enum will be used:&lt;/p&gt;

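&lt;pre class="brush:java"&gt;import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public enum Season {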
  SPRING, SUMMER, AUTUMN, WINTER;

  private static final List&amp;lt;Season&amp;gt; VALUES1 =
      Collections.unmodifiableList(
          new ArrayList&amp;lt;Season&amp;gt;(Arrays.asList(values())));
  private static final Season[] VALUES2 = values();
  private static final int SIZE = VALUES2.length;
  
  private static final Random RANDOM = new Random();

  public static Season random1() {
    return values()[new Random().nextInt(SIZE)];
  }

  public static Season random2() {
    return values()[RANDOM.nextInt(SIZE)];
  }

  public static Season random3() {
    return VALUES1.get(new Random().nextInt(SIZE));
  }

  public static Season random4() {
    return VALUES1.get(RANDOM.nextInt(SIZE));
  }

  public static Season random5() {
    return VALUES2[new Random().nextInt(SIZE)];
  }

  public static Season random6() {
    return VALUES2[RANDOM.nextInt(SIZE)];
  }
}&lt;/pre&gt;

&lt;p&gt;with the following test harness:&lt;/p&gt;

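&lt;pre class="brush:java"&gt;import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Temporary {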
  private static final int COUNT = 30000000;

  public static void main(String args[]) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    MemoryMonitor monitor = new MemoryMonitor();
    executor.scheduleAtFixedRate(monitor, 0, 10, TimeUnit.MILLISECONDS);
    int[] tally = new int[4];
    long baseline = usedMemory();
    long start = System.nanoTime();
    for (int i=0; i&amp;lt;COUNT; i++) {
      tally[Season.random1().ordinal()]++;
    }
    long end = System.nanoTime();
    executor.shutdown();
    try {
      executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    }
    long memoryUsed = monitor.peak() - baseline;
    for (Season season : Season.values()) {
      System.out.printf(&amp;quot;%s: %,d%n&amp;quot;, season, tally[season.ordinal()]);
    }
    System.out.printf(&amp;quot;%nCompleted %,d iterations in %,.3f seconds using %,d bytes%n&amp;quot;,
        COUNT, ((end - start) / 1000000) / 1000.0d, memoryUsed
    );
  }

  private static long usedMemory() {
    Runtime runtime = Runtime.getRuntime();
    return runtime.totalMemory() - runtime.freeMemory();
  }

  private static void waitForEnter() {
    try {
      new BufferedReader(new InputStreamReader(System.in)).readLine();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}&lt;/pre&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;pre class="brush:java"&gt;import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MemoryMonitor implements Runnable {
  private final Runtime runtime = Runtime.getRuntime();
  private final List&amp;lt;Long&amp;gt; usage = new ArrayList&amp;lt;Long&amp;gt;();

  @Override
  public void run() {
    usage.add(runtime.totalMemory() - runtime.freeMemory());
  }

  public List&amp;lt;Long&amp;gt; usage() {
    return usage;
  }

  public long peak() {
    return Collections.max(usage);
  }
}&lt;/pre&gt;

&lt;p&gt;with each method being run in turn.&lt;/p&gt;

&lt;h3&gt;The Results&lt;/h3&gt;
&lt;style type="text/css"&gt;
#rundata { border-collapse: collapse; }
#rundata td { border: 1px solid black; text-align: center; }&lt;/style&gt;

&lt;table id="rundata" border="0" cellspacing="0" cellpadding="2" width="472"&gt;&lt;tbody&gt;
    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;&lt;strong&gt;Method&lt;/strong&gt;&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;&lt;strong&gt;Run time (seconds)&lt;/strong&gt;&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;&lt;strong&gt;Peak Memory Usage (bytes)&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random1&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;9.746&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;681,288&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random2&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;5.914&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;665,592&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random3&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;5.123&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;669,408&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random4&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;1.476&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;18,376&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random5&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;4.593&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;661,368&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td valign="top" width="88"&gt;random6&lt;/td&gt;

      &lt;td valign="top" width="146"&gt;1.056&lt;/td&gt;

      &lt;td valign="top" width="236"&gt;18,376&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;&lt;/table&gt;

&lt;p&gt;From this we can draw several conclusions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Creating a Random on every invocation is fair (in distribution terms) but has a high cost in temporary objects and CPU time (by a factor of roughly 1.6-4.3); &lt;/li&gt;

  &lt;li&gt;The garbage collector is working: the arrays and the temporary Random objects both contribute to the memory usage, yet creating both (random1) peaks only slightly higher than creating either one alone. What this probably means is that the allocations are triggering GCs; &lt;/li&gt;

  &lt;li&gt;Using a static copy of the enum values is roughly 2-5x as fast; &lt;/li&gt;

  &lt;li&gt;A static array copy is about 10-40% quicker than a static List copy; and &lt;/li&gt;

  &lt;li&gt;The most optimized version uses about 37x less memory and runs about 9x quicker than the least optimized version. &lt;/li&gt;
&lt;/ol&gt;
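
&lt;p&gt;The ratios behind these conclusions can be recomputed from the results table. The sketch below simply copies the table’s figures into arrays; ResultRatios and its helper methods are made-up names for this illustration.&lt;/p&gt;

```java
public class ResultRatios {
  // Figures copied from the results table, in order random1..random6.
  static final double[] SECONDS = { 9.746, 5.914, 5.123, 1.476, 4.593, 1.056 };
  static final long[] PEAK_BYTES = { 681288L, 665592L, 669408L, 18376L, 661368L, 18376L };

  // Run time of method a divided by run time of method b (1-based indices).
  static double timeRatio(int a, int b) {
    return SECONDS[a - 1] / SECONDS[b - 1];
  }

  // Peak memory of method a divided by peak memory of method b (1-based indices).
  static double memoryRatio(int a, int b) {
    return (double) PEAK_BYTES[a - 1] / PEAK_BYTES[b - 1];
  }

  public static void main(String[] args) {
    System.out.printf("random1 vs random6, time:   %.1fx%n", timeRatio(1, 6));
    System.out.printf("random1 vs random6, memory: %.1fx%n", memoryRatio(1, 6));
    System.out.printf("List vs array, shared Random (4 vs 6): %.2fx%n", timeRatio(4, 6));
    System.out.printf("List vs array, new Random (3 vs 5):    %.2fx%n", timeRatio(3, 5));
  }
}
```

&lt;p&gt;On these figures the slowest variant takes a little over 9 times as long and peaks at roughly 37 times the memory of the fastest one.&lt;/p&gt;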

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;For significant use of an enum’s values() method, it’s a no-brainer: create and use a static copy instead. It’s faster and uses far less memory. In non-trivial applications it will also mean less memory fragmentation and fewer (possibly expensive) GCs, which is a significant issue with high-usage Web applications.&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/3073976543799228049/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/12/mutability-arrays-and-cost-of-temporary.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3073976543799228049?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3073976543799228049?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/CMNblULpX3k/mutability-arrays-and-cost-of-temporary.html" title="Mutability, Arrays and the Cost of Temporary Objects in Java" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/12/mutability-arrays-and-cost-of-temporary.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkENSXkzfyp7ImA9WxBTGEQ.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-1905431498404039281</id><published>2009-12-15T23:21:00.001+08:00</published><updated>2009-12-15T23:24:58.787+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-15T23:24:58.787+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="stackoverflow" /><category 
scheme="http://www.blogger.com/atom/ns#" term="opinion" /><title>Hard Numbers on Stackoverflow Careers</title><content type="html">&lt;p&gt;&lt;em&gt;This is a follow-up to &lt;/em&gt;&lt;a href="http://www.cforcoding.com/2009/12/joel-inc-stackoverflow-careers-and.html" target="_blank"&gt;&lt;em&gt;Joel Inc., Stackoverflow Careers and Jumping Sharks&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, posted late last week.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Joel posted &lt;a href="http://www.joelonsoftware.com/items/2009/12/13.html" target="_blank"&gt;Stack Stats&lt;/a&gt; this week in which he demonstrates the correlation between Stackoverflow reputation and Careers take-up.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.joelonsoftware.com/items/2009/12/13.html" target="_blank"&gt;&lt;img style="width: 426px; display: inline; margin-left: 0px; margin-right: 0px" align="left" src="http://www.joelonsoftware.com/items/2009/12/13cvs.png" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p style="clear: left"&gt;Now the exact meaning of this graph isn’t listed. Is that 30% of users with 50,000 reputation and above have submitted CVs? 
Let’s assume that it is.&lt;/p&gt;  &lt;div align="center"&gt;   &lt;table border="1" cellspacing="0" cellpadding="2" width="500" align="center"&gt;&lt;tbody&gt;       &lt;tr&gt;         &lt;td valign="top" width="125"&gt;&lt;strong&gt;Reputation&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;Percentage&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;Users in Range&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;# of CVs&lt;/strong&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;1,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;8%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;3,257&lt;/td&gt;          &lt;td valign="top" width="125"&gt;261&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;2,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;10%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;1,196&lt;/td&gt;          &lt;td valign="top" width="125"&gt;120&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;3,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;11%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;640&lt;/td&gt;          &lt;td valign="top" width="125"&gt;70&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;4,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;12%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;373&lt;/td&gt;          &lt;td valign="top" width="125"&gt;45&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;5,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;13%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;244&lt;/td&gt;          &lt;td valign="top" width="125"&gt;32&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" 
width="125"&gt;6,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;13.5%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;153&lt;/td&gt;          &lt;td valign="top" width="125"&gt;21&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;7,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;14%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;112&lt;/td&gt;          &lt;td valign="top" width="125"&gt;16&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;8,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;15%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;95&lt;/td&gt;          &lt;td valign="top" width="125"&gt;14&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;9,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;16%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;48&lt;/td&gt;          &lt;td valign="top" width="125"&gt;8&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;10,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;17%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;220&lt;/td&gt;          &lt;td valign="top" width="125"&gt;37&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;15,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;19.5%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;79&lt;/td&gt;          &lt;td valign="top" width="125"&gt;15&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;20,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;22%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;58&lt;/td&gt;          &lt;td valign="top" width="125"&gt;12&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;30,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;26%&lt;/td&gt;   
       &lt;td valign="top" width="125"&gt;28&lt;/td&gt;          &lt;td valign="top" width="125"&gt;7&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;50,000&lt;/td&gt;          &lt;td valign="top" width="125"&gt;30%&lt;/td&gt;          &lt;td valign="top" width="125"&gt;15&lt;/td&gt;          &lt;td valign="top" width="125"&gt;5&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="125"&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;10.2%&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;6,518&lt;/strong&gt;&lt;/td&gt;          &lt;td valign="top" width="125"&gt;&lt;strong&gt;663&lt;/strong&gt;&lt;/td&gt;       &lt;/tr&gt;     &lt;/tbody&gt;&lt;/table&gt; &lt;/div&gt;  &lt;p&gt;So we have 145 employers (as of 15 Dec 2009) and 663 job seekers of the 6,518 in the sample representing a percentage take-up of 10.2%.&lt;/p&gt;  &lt;p&gt;I would guess that the vast majority of those would’ve paid $29 for 3 years so these ~700 users account for about $21,000 in revenue over 3 years.&lt;/p&gt;  &lt;p&gt;Is this kind of barrier—charging job seekers—really worth that kind of revenue stream?&lt;/p&gt;  &lt;p&gt;Of course the hope is both that:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The number of candidates will substantially grow; and &lt;/li&gt;    &lt;li&gt;Many (or most) of them will convert to paying $99/year in 3 years. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;Three years from now I would consider it optimistic that the users matching the profile might number 40,000 instead of 10,000 to 25,000. 
That won’t accurately reflect the natural attrition rate either (there are some users who have already become basically inactive).&lt;/p&gt;  &lt;p&gt;Considering that not all users will be looking for work at the same time (assuming Goldman Sachs’ next crackpot house of cards hasn’t come tumbling down yet), it’s hard to imagine the take-up rate being higher than 15-20% and that’s being optimistic.&lt;/p&gt;  &lt;p&gt;So if everything goes well 10,000 people are paying $99/year. $1 million a year—basically money for nothing—is nothing to sneeze at. I can’t see it happening however.&lt;/p&gt;  &lt;p&gt;Even if it does, it’s questionable whether this is the critical mass required to attract employers. I guess time will tell.&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/1905431498404039281/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/12/hard-numbers-on-stackoverflow-careers.html#comment-form" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1905431498404039281?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1905431498404039281?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/QM7YFO6fnsI/hard-numbers-on-stackoverflow-careers.html" title="Hard Numbers on Stackoverflow Careers" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">10</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/12/hard-numbers-on-stackoverflow-careers.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEQCRX48cSp7ImA9WxBTFE4.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-166152980014672010</id><published>2009-12-10T16:06:00.001+08:00</published><updated>2009-12-10T16:06:04.079+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-10T16:06:04.079+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="stackoverflow" /><category scheme="http://www.blogger.com/atom/ns#" term="opinion" 
/><title>Joel Inc., Stackoverflow Careers and Jumping Sharks</title><content type="html">&lt;p&gt;Joel Spolsky is a legend in the programming world. His blog—&lt;a href="http://www.joelonsoftware.com/"&gt;Joel on Software&lt;/a&gt;—is the most popular and well-known programming blog. In mid-2008, Joel and Jeff Atwood—of &lt;a href="http://www.codinghorror.com/blog/"&gt;Coding Horror&lt;/a&gt; fame—launched &lt;a href="http://stackoverflow.com/"&gt;Stackoverflow&lt;/a&gt;, a free site for asking programming questions.&lt;/p&gt;  &lt;p&gt;Stackoverflow is clearly a success but the sister sites haven’t fared nearly as well. Recently Jeff and Joel launched &lt;a href="http://careers.stackoverflow.com/"&gt;Stackoverflow Careers&lt;/a&gt;, a site for programmers to find jobs and employers to find programmers.&lt;/p&gt;  &lt;p&gt;Stackoverflow Careers may just be a bridge too far.&lt;/p&gt;  &lt;h3&gt;Let’s Talk About… Joel&lt;/h3&gt;  &lt;p&gt;Joel on Software was the first blog I ever read. I read it before anyone really knew what a blog was. &lt;a href="http://www.joelonsoftware.com/uibook/chapters/fog0000000057.html"&gt;Controlling Your Environment Makes You Happy&lt;/a&gt; was one of those things I read that completely changed my perspective. &lt;a href="http://www.joelonsoftware.com/articles/APIWar.html"&gt;How Microsoft Lost the API War&lt;/a&gt; I consider to be almost prophetic in its predictions regarding the then-Longhorn now-Vista boondoggle and desktop bloodletting by Web applications.&lt;/p&gt;  &lt;p&gt;But something isn’t right in the Land of Joel.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Buzo.jpg" rel="license" target="_blank"&gt;&lt;img style="width: 320px" src="http://img187.imageshack.us/img187/2708/scubav.jpg" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;In the late 90s during a brief flirtation with strenuous physical activity, I learnt to SCUBA dive. 
I went to one of these courses that was an evening of instruction of the evils of nitrogen, a weekend in the pool and then a weekend in the ocean. This was a &lt;a href="http://www.padi.com/scuba/"&gt;PADI&lt;/a&gt; course and is very much the consumer-grade diving education and I state that as a simple observation not a judgement or accusation. At the other end of the spectrum is &lt;a href="http://www.naui.org/"&gt;NAUI&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;PADI is all about selling you stuff—gear, courses, whatever. A friend remarked to me that PADI stood for &lt;strong&gt;Put Another Dollar In&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;NAUI on the other hand is much more highly regarded but less prolific. It is a not-for-profit organisation. Whereas some accuse PADI of dumbing down SCUBA training, nothing of the sort is levelled against NAUI. That same friend said NAUI stands for &lt;strong&gt;Not Another Untrained Idiot&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;What does this have to do with Joel? &lt;em&gt;Whereas Joel was once the NAUI-like font of wisdom, now it just seems like he’s trying to sell me stuff.&lt;/em&gt;&lt;/p&gt;  &lt;h3&gt;Jumping the Shark&lt;/h3&gt;  &lt;p&gt;Of course I’m not the first to articulate this. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/File:Fonzie_jumps_the_shark.PNG" rel="license" target="_blank"&gt;&lt;img style="width: 317px" src="http://upload.wikimedia.org/wikipedia/en/5/51/Fonzie_jumps_the_shark.PNG" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In recent times Joel has taken quite a bashing, for example &lt;a href="http://stochasticgeometry.wordpress.com/2009/10/27/joel-spolsky-snake-oil-salesman/"&gt;Joel Spolsky, Snake-Oil Salesman&lt;/a&gt; and &lt;a href="http://blogs.citytechinc.com/sanderson/"&gt;Sten Anderson&lt;/a&gt;’s &lt;a href="http://blogs.citytechinc.com/sanderson/?p=284"&gt;I Heart Joel on Software&lt;/a&gt;. 
&lt;/p&gt;  &lt;p&gt;Sten’s comments are particularly interesting because what he says is true: all Joel’s endless talk about great programmers is thinly disguised disdain for the 99% of us that didn’t go to MIT, Stanford, UW, Yale, Harvard or UPenn.&lt;/p&gt;  &lt;p&gt;Amusingly, Jeff Atwood posted several years ago &lt;a href="http://www.codinghorror.com/blog/archives/000679.html"&gt;Has Joel Spolsky Jumped the Shark?&lt;/a&gt; going so far as to say:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;I reject this new, highly illogical Joel Spolsky. I demand the immediate return of the sage, sane, wise Joel Spolsky of years past. But maybe it's like wishing for a long-running television show to return to its previous glories.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I guess he got over it.&lt;/p&gt;  &lt;p&gt;Side note: Jeff was responding to &lt;a href="http://www.joelonsoftware.com/items/2006/09/01.html"&gt;Language Wars&lt;/a&gt; (emphasis added by Jeff):&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://www.fogcreek.com/FogBugz"&gt;FogBugz&lt;/a&gt;is written in Wasabi, a very advanced, functional-programming dialect of Basic with closures and lambdas and Rails-like active records that can be compiled down to VBScript, JavaScript, PHP4 or PHP5. &lt;strong&gt;Wasabi is a private, in-house language written by one of our best developers that is optimized specifically for developing FogBugz;&lt;/strong&gt; the Wasabi compiler itself is written in C#.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I admit it: I love a good rant. And not just ranting for ranting’s sake but a rant with a message, an essential kernel of truth, a pearl of wisdom. It’s hard to forget &lt;a href="http://www.zedshaw.com/"&gt;Zed Shaw&lt;/a&gt;’s now-infamous (albeit retracted) &lt;a href="http://web.archive.org/web/20080103072111/http://www.zedshaw.com/rants/rails_is_a_ghetto.html"&gt;Rails is a Ghetto&lt;/a&gt; rant of nearly two years ago. 
Yesterday I read &lt;a href="http://gilesbowkett.blogspot.com/"&gt;Giles Bowkett&lt;/a&gt;’s &lt;a href="http://gilesbowkett.blogspot.com/2009/12/blogs-are-godless-communist-bullshit.html"&gt;Blogs are Godless Communist Bullshit&lt;/a&gt;. It’s long but entertaining and absolutely worth reading.&lt;/p&gt;  &lt;p&gt;But is all this criticism justified?&lt;/p&gt;  &lt;p&gt;Firstly, some background.&lt;/p&gt;  &lt;h3&gt;IT Recruitment&lt;/h3&gt;  &lt;p&gt;In Europe and Australia programmers (and other IT professionals) are found in three ways:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Direct recruitment by the employer. This usually means big employers who have dedicated HR departments to filter out CVs, book interviews and so on. Such candidates will most likely become salaried employees of the company; &lt;/li&gt;    &lt;li&gt;Word of mouth; and &lt;/li&gt;    &lt;li&gt;Through recruitment agencies. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;In my experience recruitment agents are &lt;em&gt;loathed&lt;/em&gt; by IT workers (eg &lt;a href="http://angryaussie.wordpress.com/2006/10/30/why-is-it-recruitment-so-bad/"&gt;Why is IT recruitment so bad?&lt;/a&gt;). Most of the time they’re &lt;em&gt;utterly clueless&lt;/em&gt; (I have in all seriousness been asked “I see you have 7 years of Java experience but do you have any J2SE experience?”). Horror stories are legion. IT recruitment in London in particular is a &lt;em&gt;soul-destroying experience&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;Recruiters will fill positions on a &lt;em&gt;permanent&lt;/em&gt; (salaried) or &lt;em&gt;contract&lt;/em&gt; (paid by the hour, day, week or month) basis.&lt;/p&gt;  &lt;p&gt;The recruiter will earn a fee that is typically around 10-15% of the candidate’s annual salary upon successfully filling the position. 
If the employee leaves in the probationary period (typically three months) some or all of that will be refunded.&lt;/p&gt;  &lt;p&gt;With contractors the recruiter will typically earn a margin of 10-25% (or even higher) on top of the contractor’s rate either for a fixed term (eg it scales down after a year) or in perpetuity. Expat contractors typically have criminally high margins put on top of what they earn, at least initially.&lt;/p&gt;  &lt;p&gt;So recruitment is expensive.&lt;/p&gt;  &lt;p&gt;Compare that to placing ads on job boards, which will typically cost hundreds of dollars (eg &lt;a href="http://jobs.joelonsoftware.com/default.asp?pg=pgFAQ"&gt;jobs.joelonsoftware.com FAQ&lt;/a&gt; and &lt;a href="http://hiring.monster.com/recruitment/Job-Postings.aspx"&gt;Monster Job Posting&lt;/a&gt;) and last for weeks. One ad can potentially fill multiple positions. Employers will typically keep CVs on file and getting contacted some time after applying is not uncommon. So ads can be effective although there can be a lot of chaff.&lt;/p&gt;  &lt;p&gt;IT recruitment &lt;em&gt;is&lt;/em&gt; broken so there’s definitely room for a solution.&lt;/p&gt;  &lt;h3&gt;Stackoverflow Careers&lt;/h3&gt;  &lt;p&gt;Careers is another site hoping to capitalize on the success of Stackoverflow. Programmers routinely demonstrate the ability to self-organize, which I think explains, at least in part, its success. Computer science is also a centuries-old discipline. Yes, I said “centuries old”. 
So before some reddit lurker points out computers were born in the mid-twentieth century, I suggest you consult the &lt;a href="http://en.wikipedia.org/wiki/Timeline_of_computing_2400_BC%E2%80%931949"&gt;Timeline of computing 2400 BC–1949&lt;/a&gt; and the work of &lt;a href="http://en.wikipedia.org/wiki/Charles_Babbage"&gt;Charles Babbage&lt;/a&gt; and others.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://commons.wikimedia.org/wiki/File:Volunteers_of_America_Soup_Kitchen_WDC.gif" rel="license" target="_blank"&gt;&lt;img style="width: 320px" src="http://img690.imageshack.us/img690/2364/soupkitchen.gif" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;The latest money-making venture is &lt;a href="http://careers.stackoverflow.com/"&gt;Stackoverflow Careers&lt;/a&gt;, heavily cross-promoted by Jeff Atwood (&lt;a href="http://blog.stackoverflow.com/2009/10/introducing-stack-overflow-careers/"&gt;Introducing Stack Overflow Careers&lt;/a&gt; and &lt;a href="http://www.codinghorror.com/blog/archives/001308.html"&gt;Stack Overflow Careers: Amplifying Your Awesome&lt;/a&gt;) and Joel (&lt;a href="http://www.joelonsoftware.com/items/2009/11/05.html"&gt;Upgrade your career&lt;/a&gt; and &lt;a href="http://www.joelonsoftware.com/items/2009/12/02.html"&gt;Programmer search engine&lt;/a&gt;) as well as echoes in the blogosphere.&lt;/p&gt;  &lt;p&gt;Despite the success in terms of audience size (&lt;a href="http://www.youtube.com/watch?v=NWHfY_lvKIQ"&gt;Joel in his Google Tech Talk&lt;/a&gt; claims a ~30% programmer share, which is huge if true), programmers are a hard bunch to monetize (see &lt;a href="http://blog.stackoverflow.com/2009/11/our-amazon-advertising-experiment/"&gt;Our Amazon Advertising Experiment&lt;/a&gt;). Careers is the latest incarnation.&lt;/p&gt;  &lt;p&gt;It’s free to have a public CV but having a private CV costs money (allegedly $99/year after 31st December but don’t be surprised if that changes). 
The private CV is searchable by employers and allows (as Jeff/Joel put it) “deep” integration with Stackoverflow.&lt;/p&gt;  &lt;p&gt;The employers are paying too, anywhere from $500 for a week to $5,000 for a year (see the &lt;a href="http://careers.stackoverflow.com/faq"&gt;FAQ&lt;/a&gt;).&lt;/p&gt;  &lt;p&gt;Not cheap. So what are we getting for our money?&lt;/p&gt;  &lt;h3&gt;The Hollywood Analogy&lt;/h3&gt;  &lt;p&gt;Joel claims:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;In Hollywood, studios who need talent browse through portfolios, find two or three possible candidates, and make them great offers. And then they all try to outdo each other providing plush work environments and great benefits.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Make no mistake: you’re being sold something here. The allure of stardom is deliberate bait. Giles succinctly sums this up:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;This last part is laugh-out-loud funny. That's not how Hollywood works. I'm an actor, I've been studying acting for years, and I know award-winning actors who still have to go out on auditions like everybody else. You might wonder how a newbie like me, with nothing but Cop #3 in a student film to his credit, can claim to know award-winning, seasoned professionals. It's simple: because &lt;b&gt;&lt;i&gt;they have to go on auditions like everybody else&lt;/i&gt;&lt;/b&gt;.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I will take one issue with what Giles said:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Robert Downey Jr. had to fight like hell to get the lead role in &lt;i&gt;Iron Man&lt;/i&gt;.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Yes, but there’s a reason for that. 
He had a &lt;a href="http://en.wikipedia.org/wiki/Robert_Downey,_Jr.#Substance_abuse"&gt;serious drug problem&lt;/a&gt; and any studio is going to balk at betting a billion dollar franchise on a cokehead.&lt;/p&gt;  &lt;p&gt;But I digress.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://commons.wikimedia.org/wiki/File:Hollywood_sign_053004.jpg" rel="license" target="_blank"&gt;&lt;img style="width: 320px" src="http://img138.imageshack.us/img138/3341/800pxhollywoodsign05300.jpg" /&gt;&lt;/a&gt; Here’s another difference: actors are basically the most flexible labour market in the world. They go where the work is. The film shoots for 40 weeks in Siberia? Fine, no problem. Actors go where the films are.&lt;/p&gt;  &lt;p&gt;Programmers on the other hand are not nearly as flexible. Programmers are regular workers. We have families, friends, mortgages and so on. Sure we might move from St Louis to San Francisco for a job but we also might not. I think it’s safe to say that more often than not, we’re not looking to move across country. Hell, we’ll even turn down a job if it’s in the &lt;em&gt;wrong part of the same city&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;Imagine how far you’d get as an actor if you said “I’d love to work on your TV show but the studio is in Burbank and the commute from Redondo Beach is a bitch so I think I’ll pass.” (Knowing LA only from changing planes, I make no apologies for any gross errors in LA geography I may have just made.)&lt;/p&gt;  &lt;p&gt;So instead of there being a handful of job markets for actors there are probably 100 or more for programmers.&lt;/p&gt;  &lt;h3&gt;So What’s In It For Me?&lt;/h3&gt;  &lt;p&gt;From the &lt;a href="http://careers.stackoverflow.com/faq"&gt;FAQ&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;If you are seeking employment, we do require a modest annual payment to file your CV. Filing your CV makes it eligible to appear in searches by hiring managers via our private search interface. 
This fee allows us to ensure employers that everyone they find is actively looking for a job.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Isn’t the fact that I’ve filled out a CV and ticked a box that says I’m looking for work sufficient? Apparently not.&lt;/p&gt;  &lt;p&gt;Consider &lt;a href="http://www.joelonsoftware.com/articles/FindingGreatDevelopers.html"&gt;Finding Great Developers&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;The great software developers, indeed, the best people in every field, are quite simply &lt;em&gt;never on the market.&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;So the target market seems to be those developers who &lt;em&gt;think&lt;/em&gt; they’re great developers but actually aren’t. If they were they wouldn’t be looking. I get it: everyone is better than average.&lt;/p&gt;  &lt;p&gt;Giles sums this up:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;The number one rule of the con: you can't con an honest man … Try to get something for nothing, just because Joel Spolsky said you could? You're going to get burned.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I should point out that I signed up in the beta. I was under no illusions however (then again, who ever thinks they are?). The chances of an employer looking in my remote backwater are next to nil but I figured at $30, at worst I was out two lunches from &lt;a href="http://www.nandos.com.au/index.php"&gt;Nando's&lt;/a&gt;.&lt;/p&gt;  &lt;h3&gt;And If I’m A Hiring Manager?&lt;/h3&gt;  &lt;p&gt;Approximately 6,500 Stackoverflow users have 1,000 reputation or more. This is an arbitrary number choice but the point is this: integration with Stackoverflow only adds value if you’ve contributed a sufficiently large number of answers to mine. Go up to 2,000 rep and you’re down to less than 3,200 users. And so on.&lt;/p&gt;  &lt;p&gt;Let’s be optimistic and say the potential audience for whom Stackoverflow will add value to their CV is 10,000. 
A number of these can be eliminated as being students, retired, incapable of working (eg disability or serious prolonged injury) or simply not looking for work.&lt;/p&gt;  &lt;p&gt;Joel claims:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;But Stack Overflow Careers doesn’t have to be massive. It’s not for the 5.2 million people who visit Stack Overflow; it’s for the top 25,000 developers who participate actively.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Want to know what the &lt;a href="http://stackoverflow.com/users?page=714"&gt;25,000th user&lt;/a&gt; looks like?&lt;/p&gt;  &lt;p&gt;&lt;img style="width: 500px; display: block; float: none; margin-left: auto; margin-right: auto" src="http://easycaptures.com/fs/uploaded/443/8219048195.png" /&gt; &lt;/p&gt;  &lt;p&gt;I mean no disrespect to these people but “participate actively”?&lt;/p&gt;  &lt;p&gt;Take careful note of the language too: 25,000 from 5.2 million? Hell, you’re &lt;em&gt;already&lt;/em&gt; the top half of one percent! You’re &lt;em&gt;elite&lt;/em&gt;, positively &lt;em&gt;l33t&lt;/em&gt;! Uh huh.&lt;/p&gt;  &lt;h3&gt;Crunching the Numbers&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://commons.wikimedia.org/wiki/File:Accountants.jpg" rel="license" target="_blank"&gt;&lt;img style="width: 320px" src="http://img44.imageshack.us/img44/9199/accountants.jpg" /&gt;&lt;/a&gt; There are at least 100 distinct geographical job markets for an employer. If you’re lucky 10% of the pool is accessible to you either by being in the right place or willing to relocate.&lt;/p&gt;  &lt;p&gt;Of those 10%, maybe 10% have the right skills. The importance of programming languages is definitely overstated by (typically clueless) HR departments and recruiters. It’s also true that good developers can program in anything (given sufficient time) but not all languages are interchangeable in all situations. 
I would consider a Java Web developer to be largely interchangeable with an ASP.NET C# Web developer (in that there is sufficient crossover to enable a sufficiently speedy transition) but I wouldn’t hire a Ruby programmer to do C programming for microcontrollers and embedded devices. The transition from unmanaged (eg C/C++) to managed (eg C#/.Net) code can be steep enough.&lt;/p&gt;  &lt;p&gt;Of this reduced pool, how many have the right experience? The more experienced you get as a developer, generally the more important domain knowledge becomes. I wouldn’t hire a mobile telephony architect to design a system for market-making options on commodities futures because you’d spend 6 months explaining bid/ask, spreads, what a future is, what an option is, in-the-money, out-the-money, short, long, contango, volatility, Black-Scholes… the list goes on.&lt;/p&gt;  &lt;p&gt;Of the remaining few, who has the right &lt;em&gt;amount&lt;/em&gt; of experience? You wouldn’t hire a fresh college grad to mentor junior developers.&lt;/p&gt;  &lt;p&gt;Now that you’ve got a short list (“short” being the operative word), consider how many are actually available.&lt;/p&gt;  &lt;p&gt;And you haven’t even interviewed anybody yet!&lt;/p&gt;  &lt;p&gt;So if you optimistically assume that 10,000 people sign up for Careers, chances are you’re down to &lt;em&gt;fewer than five&lt;/em&gt;. Of those, how many are &lt;em&gt;seriously&lt;/em&gt; looking? They’re paying by the year so why not have your CV out there just in case?&lt;/p&gt;  &lt;p&gt;Don’t be fooled: paying to file your CV doesn’t ensure you’re seriously looking. 
The &lt;em&gt;only&lt;/em&gt; thing it ensures is that you’re a revenue stream.&lt;/p&gt;  &lt;h3&gt;Critical Mass&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://commons.wikimedia.org/wiki/File:Nuclear_power.JPG" rel="license" target="_blank"&gt;&lt;img style="width: 320px" src="http://img40.imageshack.us/img40/8706/nuclearpower.jpg" /&gt;&lt;/a&gt; Matching candidates to employers is &lt;em&gt;low probability&lt;/em&gt;. The number who fit the profile is probably 1 in 1,000 &lt;em&gt;or even less&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;So of the 10 to 25 thousand relevant potential candidates, some percentage will actually be looking for work. Of that percentage, a smaller percentage will pay to be seen by employers, less than might otherwise be seen if the service was free (for job seekers). I expect that number to be 2,000 or less and that number is, in my opinion, inflated by the cheap beta registration.&lt;/p&gt;  &lt;p&gt;So an employer is going to pay big bucks—much more than a typical job ad—to reach a &lt;em&gt;much smaller&lt;/em&gt; target audience?&lt;/p&gt;  &lt;p&gt;People will pay money if they are getting value for money. Paying $15,000 to a recruiter to find you a programmer is &lt;em&gt;cheap&lt;/em&gt; because the recruiter is doing most of the legwork &lt;em&gt;and&lt;/em&gt; assuming a large part of the risk (in that they don’t typically get paid if you don’t find someone you like). Job ads are &lt;em&gt;cheap&lt;/em&gt; because they may reach tens or even hundreds of thousands of candidates.&lt;/p&gt;  &lt;p&gt;&lt;em&gt;It’s like Careers is charging as if it’s already a proven success.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Things like this work on the principle of &lt;em&gt;critical mass&lt;/em&gt;. Take eBay. People buy on eBay because there are things to be bought. People sell on eBay because people will buy them. Without either group the site fails. A job board is no different. People go to them because they have jobs they want. 
Companies advertise on them because they reach the right audience.&lt;/p&gt;  &lt;p&gt;So what job board (and let’s be honest, that’s what it is) is going to survive by restricting itself to 10 to 25 thousand candidates &lt;em&gt;globally&lt;/em&gt;? Perhaps Jeff and Joel are thinking that it will be &lt;em&gt;so&lt;/em&gt; successful that everyone else will just have to sign up anyway.&lt;/p&gt;  &lt;p&gt;Good luck with that business strategy.&lt;/p&gt;  &lt;h3&gt;Is It Legal?&lt;/h3&gt;  &lt;p&gt;I have to wonder if anyone has bothered to ask this yet. Consider &lt;a href="http://www.thenational.ae/apps/pbcs.dll/article?AID=/20090725/NATIONAL/707249768" target="_blank"&gt;Job seekers are hit by illegal fees&lt;/a&gt;. It is &lt;em&gt;illegal&lt;/em&gt; to charge job seekers, and not just in the United Arab Emirates. Also, &lt;a href="http://jobseekr.com.au/2009/04/27/how-job-seekers-can-best-use-recruitment-agencies/" target="_blank"&gt;How job seekers can best use recruitment agencies&lt;/a&gt; (emphasis added):&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Recruitment agencies make their money by charging employers a fee for a permanent hire or an hourly or daily margin on a temporary placement. &lt;strong&gt;&lt;em&gt;It is illegal to charge job seekers a fee for finding them work.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;That’s for Australia. The point here is that Jeff and Joel probably need to be &lt;em&gt;very&lt;/em&gt; careful about how they define the Careers site if they don’t want to run afoul of laws set up to protect the unemployed from unscrupulous practices.&lt;/p&gt;  &lt;h3&gt;Smoke and Mirrors&lt;/h3&gt;  &lt;p&gt;From the &lt;a href="http://careers.stackoverflow.com/faq"&gt;FAQ&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;If you are seeking employment, we do require a modest annual payment to file your CV. Filing your CV makes it eligible to appear in searches by hiring managers via our private search interface. 
This fee allows us to ensure employers that everyone they find is actively looking for a job.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;We’re being sold something here.&lt;/p&gt;  &lt;p&gt;Also consider &lt;a href="http://www.joelonsoftware.com/items/2009/11/05.html"&gt;Upgrade your career&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Employers can see how good you are at communicating, …&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;OK&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;… how well you explain things, …&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;OK&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;… how well you understand the tools that you’re using, …&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Er… OK.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;… and generally, if you’re a great developer or not.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Whoa. Sorry, but the fact that I know that parsing HTML with regular expressions is retarded, that I can explain how to add a jQuery click() handler and that not sanitizing user input to SQL statements is idiotic doesn’t make me a great developer. It means anything from I like teaching to I’m narcissistic enough to like hearing the sound of my own voice (virtually speaking), perhaps both.&lt;/p&gt;  &lt;p&gt;And let’s not forget that &lt;em&gt;all of this can be established by simply including a URL to your Stackoverflow profile on your CV&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;The numbers just don’t add up on this one. My only question is how long it’ll be before that sinks in and the model changes. With so much free choice, it’s just not viable to charge job seekers while severely limiting the candidate pool for employers and charging them an arm and a leg for information they can get from a URL.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." 
border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-166152980014672010?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/oJsy1SXfL6OcsQXSg7YidpM-6Ww/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/oJsy1SXfL6OcsQXSg7YidpM-6Ww/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/oJsy1SXfL6OcsQXSg7YidpM-6Ww/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/oJsy1SXfL6OcsQXSg7YidpM-6Ww/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/bSfjU7p6ix4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/166152980014672010/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/12/joel-inc-stackoverflow-careers-and.html#comment-form" title="51 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/166152980014672010?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/166152980014672010?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/bSfjU7p6ix4/joel-inc-stackoverflow-careers-and.html" title="Joel Inc., Stackoverflow Careers and Jumping Sharks" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">51</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/12/joel-inc-stackoverflow-careers-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEIGRnc5fip7ImA9WxBTE08.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-7272264298600108129</id><published>2009-12-09T09:34:00.003+08:00</published><updated>2009-12-09T09:35:27.926+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-09T09:35:27.926+08:00</app:edited><title>Google Wave Invites to Give Away</title><content type="html">I've got a dozen or so of these I don't really need. 
&lt;a href="http://www.cforcoding.com/2009/05/contact.html"&gt;Drop me a line&lt;/a&gt; and I'll send you one, first in first served until they run out.&lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-7272264298600108129?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/FgMxMIpGrxP3j1fUE9-1itO1_Bc/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/FgMxMIpGrxP3j1fUE9-1itO1_Bc/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/FgMxMIpGrxP3j1fUE9-1itO1_Bc/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/FgMxMIpGrxP3j1fUE9-1itO1_Bc/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/iql-Kz-TlvU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/7272264298600108129/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/12/google-wave-invites-to-give-away.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7272264298600108129?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7272264298600108129?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/iql-Kz-TlvU/google-wave-invites-to-give-away.html" title="Google Wave Invites to Give Away" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/12/google-wave-invites-to-give-away.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUMFRHk4fCp7ImA9WxBTEk8.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-3488586796642816981</id><published>2009-12-07T19:34:00.003+08:00</published><updated>2009-12-08T06:03:35.734+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-12-08T06:03:35.734+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="algorithms" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category 
scheme="http://www.blogger.com/atom/ns#" term="computer science" /><title>Programming Puzzles, Chess Positions and Huffman Coding</title><content type="html">&lt;p&gt;This week &lt;a href="http://stackoverflow.com/users/40410/andrew-rollings"&gt;Andrew Rollings&lt;/a&gt; asked the question     &lt;a href="http://stackoverflow.com/questions/1831386/programmer-puzzle-encoding-a-chess-board-state-throughout-a-game"&gt;ProgrammerPuzzle:         Encoding a chess board state throughout a game&lt;/a&gt; on &lt;a href="http://stackoverflow.com/"&gt;StackOverflow&lt;/a&gt;. &lt;/p&gt;  &lt;p&gt;Now I’ll admit that I love this kind of question. I’m not really such a big fan of &lt;a href="http://codegolf.com/"&gt;Code     Golf&lt;/a&gt; as that’s an exercise in writing terse, unreadable code (although some     of the solutions have been brilliant). But this Chess problem is the sort of thing that will allow a programmer to     demonstrate his or her mental acuity and problem solving ability (or the lack thereof).&lt;/p&gt;  &lt;h3&gt;The Problem&lt;/h3&gt;  &lt;blockquote&gt;&lt;p&gt;What is the most space-efficient way you can think of to encode the state of a chess game (or subset     thereof)? That is, given a chess board with the pieces arranged legally, encode both this initial state and all     subsequent legal moves taken by the players in the game.&lt;/p&gt;&lt;/blockquote&gt;  &lt;p&gt;This image illustrates the starting     Chess position. Chess occurs on an 8x8 board with each player starting with an identical set of 16 pieces consisting     of 8 pawns, 2 rooks, 2 knights, 2 bishops, 1 queen and 1 king as illustrated here:&lt;/p&gt;  &lt;p&gt;&lt;img height="250" width="250" src="http://img222.imageshack.us/img222/5970/chess.png"&gt;Positions are generally recorded as a letter for the     column followed by the number for the row so White’s queen is at d1. 
Moves are most often stored in &lt;a href="http://en.wikipedia.org/wiki/Algebraic_chess_notation"&gt;algebraic notation&lt;/a&gt;, which is unambiguous     and generally only specifies the minimal information necessary. Consider this opening:&lt;/p&gt;  &lt;p&gt;1. e4 e5 &lt;br/&gt;2. Nf3     Nc6 &lt;br/&gt;3. …&lt;/p&gt;  &lt;p&gt;which translates to:&lt;/p&gt; &lt;ol&gt;     &lt;li&gt;White moves king’s pawn from e2 to e4 (it is the only piece that can get to e4 hence “e4”);&lt;/li&gt;     &lt;li&gt;Black moves the king’s pawn from e7 to e5;&lt;/li&gt;     &lt;li&gt;White moves the knight (N) to f3;&lt;/li&gt;     &lt;li&gt;Black moves the knight to c6.&lt;/li&gt; &lt;/ol&gt; &lt;p&gt;The board looks like this:&lt;/p&gt;  &lt;p&gt;&lt;img width="250" height="250" src="http://img222.imageshack.us/img222/371/chessx.png" /&gt; An important     ability for any programmer is to be able to &lt;em&gt;correctly and unambiguously specify the problem&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;So     what’s missing or ambiguous? A lot as it turns out.&lt;/p&gt;  &lt;h3&gt;Board State vs Game State&lt;/h3&gt;  &lt;p&gt;The first thing you     need to determine is whether you’re storing the state of a game or the position of pieces on the board. Encoding     simply the positions of the pieces is one thing but the problem says “all subsequent legal moves”. The problem also     says nothing about knowing the moves up to this point. That’s actually a problem as I’ll explain.&lt;/p&gt;  &lt;h3&gt;     Castling&lt;/h3&gt;  &lt;p&gt;The game has proceeded as follows:&lt;/p&gt;  &lt;p&gt;1. e4 e5 &lt;br/&gt;2. Nf3 Nc6 &lt;br/&gt;3. Bb5 a6 &lt;br/&gt;4. Ba4 Bc5 &lt;/p&gt;  &lt;p&gt;The board looks as follows:&lt;/p&gt;  &lt;p&gt;&lt;img width="250" height="250" src="http://img163.imageshack.us/img163/371/chessx.png" /&gt; White has     the option of &lt;a href="http://en.wikipedia.org/wiki/Castling"&gt;castling&lt;/a&gt;. 
Part of the requirements for this are     that the king and the relevant rook can never have moved, so whether the king or either rook of each side has moved     will need to be stored. Obviously if they aren’t on their starting positions, they have moved; otherwise it needs to     be specified.&lt;/p&gt;  &lt;p&gt;There are several strategies that can be used for dealing with this problem.&lt;/p&gt;  &lt;p&gt;Firstly,     we could store an extra 6 bits of information (1 for each rook and king position) to indicate whether that piece     had moved. We could streamline this by only storing a bit for one of these six squares if the right piece happens to     be in it. Alternatively we could treat each unmoved piece as another piece type so instead of 6 piece types on each     side (pawn, rook, knight, bishop, queen and king) there are 8 (adding unmoved rook and unmoved king).&lt;/p&gt;  &lt;h3&gt;En     Passant&lt;/h3&gt;  &lt;p&gt;Another peculiar and often-neglected rule in Chess is &lt;a href="http://en.wikipedia.org/wiki/En_passant"&gt;En Passant&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;&lt;img height="250" width="250" src="http://img37.imageshack.us/img37/6535/chessa.png" /&gt; &lt;/p&gt;  &lt;p&gt;The game has progressed.&lt;/p&gt;  &lt;p&gt;1. e4 e5 &lt;br/&gt;2. Nf3 Nc6 &lt;br/&gt;3. Bb5 a6 &lt;br/&gt;4. Ba4 Bc5 &lt;br/&gt;5. O-O b5 &lt;br/&gt;6.     Bb3 b4 &lt;br/&gt;7. c4&lt;/p&gt;  &lt;p&gt;Black now has the option of moving his pawn on b4 to c3, taking the White pawn     on c4. This is only allowed at the first opportunity, meaning if Black passes on the option now he can’t take it next     move. So we need to store this.&lt;/p&gt;  &lt;p&gt;If we know the previous move we can definitely answer if En Passant is     possible. Alternatively we can store whether each pawn on its 4th rank has just moved there with a double move     forward. 
Or we can look at each possible En Passant position on the board and have a flag to indicate whether it’s     possible or not.&lt;/p&gt;  &lt;h3&gt;Promotion&lt;/h3&gt;  &lt;p&gt;&lt;img height="250" width="250" src="http://img689.imageshack.us/img689/5970/chess.png" /&gt;&lt;/p&gt;  &lt;p&gt;     It is White’s move. If White moves his pawn on h7 to h8 it can be promoted to any other piece (but not the king).     99% of the time it is promoted to a Queen but sometimes it isn’t, typically because that may force a stalemate when     otherwise you’d win. This is written as:&lt;/p&gt;  &lt;p&gt;56. h8=Q&lt;/p&gt;  &lt;p&gt;This is important in our problem because it means     we can’t count on there being a fixed number of pieces on each side. It is entirely possible (but incredibly     unlikely) for one side to end up with 9 queens, 10 rooks, 10 bishops or 10 knights if all 8 pawns get promoted.&lt;/p&gt;  &lt;h3&gt;Stalemate&lt;/h3&gt;  &lt;p&gt;When in a position from which you cannot win your best tactic is to try for a &lt;a href="http://en.wikipedia.org/wiki/Stalemate"&gt;stalemate&lt;/a&gt;. The most likely variant is where you cannot make a     legal move (usually because any move would put your king in check). In this case you can claim a draw. This one is     easy to cater for.&lt;/p&gt;  &lt;p&gt;The second variant is by &lt;a href="http://en.wikipedia.org/wiki/Threefold_repetition"&gt;threefold     repetition&lt;/a&gt;. If the same board position occurs three times in a game (or will occur a third time on the next     move), a draw can be claimed. The positions need not occur in any particular order (meaning it doesn’t have to be the     same sequence of moves repeated three times). This one greatly complicates the problem because you have to remember     every previous board position. 
&lt;strong&gt;&lt;em&gt;If this is a requirement of the problem the only possible solution to the         problem is to store every previous move.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Lastly, there is the &lt;a href="http://en.wikipedia.org/wiki/Fifty-move_rule"&gt;fifty move rule&lt;/a&gt;. A player can claim a draw if no pawn     has moved and no piece has been taken in the previous fifty consecutive moves so we would need to store how many     moves since a pawn was moved or a piece taken (the latest of the two). This requires 6 bits (0-63).&lt;/p&gt;  &lt;h3&gt;Whose     Turn Is It?&lt;/h3&gt;  &lt;p&gt;Of course we also need to know whose turn it is and this is a single bit of information.&lt;/p&gt;  &lt;h3&gt;Two Problems&lt;/h3&gt;  &lt;p&gt;Because of the stalemate case, the only feasible or sensible way to store the game state is to     store all the moves that led to this position. I’ll tackle that one problem. The board state problem will be     simplified to this: &lt;em&gt;store the current position of all pieces on the board ignoring castling, en passant,         stalemate conditions and whose turn it is&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;Piece layout can be broadly handled in one of two ways:     by storing the contents of each square or by storing the position of each piece.&lt;/p&gt;  &lt;h3&gt;Simple Contents&lt;/h3&gt;  &lt;p&gt;     There are six piece types (pawn, rook, knight, bishop, queen and king). Each piece can be White or Black so a square     may contain one of 12 possible pieces or it may be empty so there are 13 possibilities. 13 can be stored in 4 bits     (0-15), so the simplest solution is to store 4 bits for each square times 64 squares or 256 bits of information.&lt;/p&gt;  &lt;p&gt;The advantage of this method is that manipulation is &lt;em&gt;incredibly&lt;/em&gt; easy and fast. 
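&lt;/p&gt;  &lt;p&gt;To make the 4-bits-per-square idea concrete, here is a minimal Python sketch. The symbol table, the function names and the choice of two squares per byte are my own assumptions for illustration; any numbering of the 13 possibilities would do. Multiplication and divmod stand in for the usual bit shifts and masks.&lt;/p&gt;

```python
# Assumed symbol table (illustrative only): index 0 is an empty square,
# 1-6 are White pieces and 7-12 are Black pieces. Values 13-15 are unused.
SYMBOLS = ".PRNBQKprnbqk"

# Starting position, listed from a8 to h1, one character per square.
START = "rnbqkbnr" + "p" * 8 + "." * 32 + "P" * 8 + "RNBQKBNR"


def pack_board(squares):
    """Pack 64 square codes (each 0-15) into 32 bytes, two squares per byte."""
    packed = bytearray(32)
    for i in range(0, 64, 2):
        # The even-numbered square goes in the high nibble, the odd one in the low nibble.
        packed[i // 2] = squares[i] * 16 + squares[i + 1]
    return bytes(packed)


def get_square(packed, index):
    """Read the code for one square directly from the packed board."""
    high, low = divmod(packed[index // 2], 16)
    return high if index % 2 == 0 else low


codes = [SYMBOLS.index(ch) for ch in START]
packed = pack_board(codes)
print(len(packed) * 8, "bits")
```

&lt;p&gt;Reading or writing a square touches exactly one byte, which is where the speed comes from.&lt;/p&gt;  &lt;p&gt;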
This could even be extended     by adding 3 more possibilities without increasing the storage requirements: a pawn that has moved 2 spaces on the     last turn, a king that hasn’t moved and a rook that hasn’t moved, which will cater for a lot of previously mentioned     issues.&lt;/p&gt;  &lt;p&gt;But we can do better.&lt;/p&gt;  &lt;h3&gt;Base 13 Encoding&lt;/h3&gt;  &lt;p&gt;It is often helpful to think of the board     position as a very large number. This is often done in computer science. For example, the &lt;a href="http://en.wikipedia.org/wiki/Halting_problem"&gt;halting problem&lt;/a&gt; treats a computer program (rightly)     as a large number.&lt;/p&gt;  &lt;p&gt;The first solution treats the position as a 64 digit base 16 number but as demonstrated     there is redundancy in this information (being the 3 unused possibilities per “digit”) so we can reduce the number     space to 64 base 13 digits. Of course this can’t be done as efficiently as base 16 can but it will save on storage     requirements (and minimizing storage space is our goal).&lt;/p&gt;  &lt;p&gt;In base 10 the number 234 is equivalent to 2 x     10&lt;sup&gt;2&lt;/sup&gt; + 3 x 10&lt;sup&gt;1&lt;/sup&gt; + 4 x 10&lt;sup&gt;0&lt;/sup&gt;.&lt;/p&gt;  &lt;p&gt;In base 16 the number 0xA50 is equivalent to 10 x     16&lt;sup&gt;2&lt;/sup&gt; + 5 x 16&lt;sup&gt;1&lt;/sup&gt; + 0 x 16&lt;sup&gt;0&lt;/sup&gt; = 2640 (decimal).&lt;/p&gt;  &lt;p&gt;So we can encode our position as     p&lt;sub&gt;0&lt;/sub&gt; x 13&lt;sup&gt;63&lt;/sup&gt; + p&lt;sub&gt;1&lt;/sub&gt; x 13&lt;sup&gt;62&lt;/sup&gt; + ... + p&lt;sub&gt;63&lt;/sub&gt; x 13&lt;sup&gt;0&lt;/sup&gt; where     p&lt;sub&gt;i&lt;/sub&gt; represents the contents of square &lt;em&gt;i&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;2&lt;sup&gt;256&lt;/sup&gt; equals approximately 1.16e77. 13&lt;sup&gt;64&lt;/sup&gt;     equals approximately 1.96e71, which requires 237 bits of storage space. 
That saving of a mere 7.5% comes at the price of &lt;strong&gt;&lt;em&gt;significantly&lt;/em&gt;&lt;/strong&gt; increased manipulation costs.&lt;/p&gt;  &lt;h3&gt;Variable Base Encoding&lt;/h3&gt;  &lt;p&gt;In legal boards certain pieces can’t appear in certain squares. For example, pawns cannot occur in the first or eighth ranks, reducing the possibilities for those squares to 11. That reduces the possible boards to 11&lt;sup&gt;16&lt;/sup&gt; x 13&lt;sup&gt;48&lt;/sup&gt; = 1.35e70 (approximately), requiring 233 bits of storage space.&lt;/p&gt;  &lt;p&gt;Actually encoding and decoding such values to and from decimal (or binary) is a little more convoluted but it can be done reliably and is left as an exercise to the reader.&lt;/p&gt;  &lt;h3&gt;Variable Width Alphabets&lt;/h3&gt;  &lt;p&gt;The previous two methods can both be described as &lt;em&gt;fixed-width alphabetic encoding&lt;/em&gt;. Each of the 11, 13 or 16 members of the alphabet is substituted for another value. Each “character” is the same width but the efficiency can be improved when you consider that &lt;em&gt;not every character is equally likely&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;&lt;img height="250" width="250" src="http://www.spacetoday.org/images/History/RadioHistory/MorseCodeChart.gif" /&gt; &lt;/p&gt;  &lt;p&gt;Consider &lt;a href="http://en.wikipedia.org/wiki/Morse_code"&gt;Morse code&lt;/a&gt; (pictured left). Characters in a message are encoded as a sequence of dashes and dots. 
Those dashes and dots are transferred over radio (typically) with a pause between them to delimit them.&lt;/p&gt;  &lt;p&gt;Notice how the letter E (&lt;a href="http://en.wikipedia.org/wiki/Letter_frequencies"&gt;the most common letter in English&lt;/a&gt;) is a single dot, the shortest possible sequence, whereas Z (the least frequent) is two dashes and two dots.&lt;/p&gt;  &lt;p&gt;Such a scheme can significantly reduce the size of an &lt;em&gt;expected&lt;/em&gt; message but comes at the cost of increasing the size of a random character sequence.&lt;/p&gt;  &lt;p&gt;It should be noted that Morse code has another inbuilt feature: dashes are as long as three dots so the above code is created with this in mind to minimize the use of dashes. Since 1s and 0s (our building blocks) don’t have this problem, it’s not a feature we need to replicate.&lt;/p&gt;  &lt;p&gt;Lastly, there are two kinds of rests in Morse code. A short rest (the length of a dot) is used to distinguish between dots and dashes. A longer gap (the length of a dash) is used to delimit characters.&lt;/p&gt;  &lt;p&gt;So how does this apply to our problem?&lt;/p&gt;  &lt;h3&gt;Huffman Coding&lt;/h3&gt;  &lt;p&gt;There is an algorithm for dealing with variable length codes called &lt;a href="http://en.wikipedia.org/wiki/Huffman_coding"&gt;Huffman coding&lt;/a&gt;. Huffman coding creates a variable length code substitution, typically using the expected frequency of the symbols to assign shorter values to the more common symbols.&lt;/p&gt;  &lt;p&gt;&lt;img height="322" width="500" src="http://upload.wikimedia.org/wikipedia/commons/thumb/8/82/Huffman_tree_2.svg/500px-Huffman_tree_2.svg.png" /&gt; &lt;/p&gt;  &lt;p&gt;In the above tree, the letter E is encoded as 000 (or left-left-left) and S is 1011. It should be clear that this encoding scheme is &lt;em&gt;unambiguous&lt;/em&gt;: no code is a prefix of another.&lt;/p&gt;  &lt;p&gt;This is an important distinction from Morse code. 
Morse code     has the character separator so it can do otherwise ambiguous substitution (eg 4 dots can be H or 2 Is) but we only     have 1s and 0s so we choose an unambiguous substitution instead.&lt;/p&gt;  &lt;p&gt;Below is a simple implementation:&lt;/p&gt;  &lt;pre class="brush:java"&gt;
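import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// The members below are assumed to sit inside an enclosing class.
private static class Node {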
  private final Node left;
  private final Node right;
  private final String label;
  private final int weight;

  private Node(String label, int weight) {
    this.left = null;
    this.right = null;
    this.label = label;
    this.weight = weight;
  }

  public Node(Node left, Node right) {
    this.left = left;
    this.right = right;
    label = &amp;quot;&amp;quot;;
    weight = left.weight + right.weight;
  }

  public boolean isLeaf() {
    return left == null &amp;&amp; right == null;
  }

  public Node getLeft() {
    return left;
  }

  public Node getRight() {
    return right;
  }

  public String getLabel() {
    return label;
  }

  public int getWeight() {
    return weight;
  }
}

private static class WeightComparator implements Comparator&amp;lt;Node&amp;gt; {
  @Override
  public int compare(Node o1, Node o2) {
    if (o1.getWeight() == o2.getWeight()) {
      return 0;
    } else {
      return o1.getWeight() &amp;lt; o2.getWeight() ? -1 : 1;
    }
  }
}

private static class PathComparator implements Comparator&amp;lt;String&amp;gt; {
  @Override
  public int compare(String o1, String o2) {
    if (o1 == null) {
      return o2 == null ? 0 : -1;
    } else if (o2 == null) {
      return 1;
    } else {
      int length1 = o1.length();
      int length2 = o2.length();
      if (length1 == length2) {
        return o1.compareTo(o2);
      } else {
        return length1 &amp;lt; length2 ? -1 : 1;
      }
    }
  }
}

private final static List&amp;lt;String&amp;gt; COLOURS;
private final static Map&amp;lt;String, Integer&amp;gt; WEIGHTS;

static {
  List&amp;lt;String&amp;gt; list = new ArrayList&amp;lt;String&amp;gt;();
  list.add(&amp;quot;White&amp;quot;);
  list.add(&amp;quot;Black&amp;quot;);
  COLOURS = Collections.unmodifiableList(list);
  Map&amp;lt;String, Integer&amp;gt; map = new HashMap&amp;lt;String, Integer&amp;gt;();
  for (String colour : COLOURS) {
    map.put(colour + &amp;quot; &amp;quot; + &amp;quot;King&amp;quot;, 1);
    map.put(colour + &amp;quot; &amp;quot; + &amp;quot;Queen&amp;quot;, 1);
    map.put(colour + &amp;quot; &amp;quot; + &amp;quot;Rook&amp;quot;, 2);
    map.put(colour + &amp;quot; &amp;quot; + &amp;quot;Knight&amp;quot;, 2);
    map.put(colour + &amp;quot; &amp;quot; + &amp;quot;Bishop&amp;quot;, 2);
    map.put(colour + &amp;quot; &amp;quot; + &amp;quot;Pawn&amp;quot;, 8);
  }
  map.put(&amp;quot;Empty&amp;quot;, 32);
  WEIGHTS = Collections.unmodifiableMap(map);
}

public static void main(String args[]) {
  PriorityQueue&amp;lt;Node&amp;gt; queue = new PriorityQueue&amp;lt;Node&amp;gt;(WEIGHTS.size(), new WeightComparator());
  for (Map.Entry&amp;lt;String, Integer&amp;gt; entry : WEIGHTS.entrySet()) {
    queue.add(new Node(entry.getKey(), entry.getValue()));
  }
  while (queue.size() &amp;gt; 1) {
    Node first = queue.poll();
    Node second = queue.poll();
    queue.add(new Node(first, second));
  }
  Map&amp;lt;String, Node&amp;gt; nodes = new TreeMap&amp;lt;String, Node&amp;gt;(new PathComparator());
  addLeaves(nodes, queue.peek(), &amp;quot;&amp;quot;);
  for (Map.Entry&amp;lt;String, Node&amp;gt; entry : nodes.entrySet()) {
    System.out.printf(&amp;quot;%s %s%n&amp;quot;, entry.getKey(), entry.getValue().getLabel());
  }
}

public static void addLeaves(Map&amp;lt;String, Node&amp;gt; nodes, Node node, String prefix) {
  if (node != null) {
    addLeaves(nodes, node.getLeft(), prefix + &amp;quot;0&amp;quot;);
    addLeaves(nodes, node.getRight(), prefix + &amp;quot;1&amp;quot;);
    if (node.isLeaf()) {
      nodes.put(prefix, node);
    }
  }
}
&lt;/pre&gt;

&lt;p&gt;One possible output is:&lt;/p&gt;

&lt;table border="1" cellspacing="0" cellpadding="2" width="500"&gt;
    &lt;tbody&gt;
    &lt;tr&gt;
        &lt;th style="width: 100px"&gt;&amp;#160;&lt;/th&gt;

        &lt;th style="text-align: center; width: 100px"&gt;White&lt;/th&gt;

        &lt;th style="text-align: center; width: 100px"&gt;Black&lt;/th&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th style="text-align: left"&gt;Empty&lt;/th&gt;

        &lt;td style="text-align: center" colspan="2"&gt;0&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th style="text-align: left"&gt;Pawn&lt;/th&gt;

        &lt;td style="text-align: center"&gt;110&lt;/td&gt;

        &lt;td style="text-align: center"&gt;100&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th style="text-align: left"&gt;Rook&lt;/th&gt;

        &lt;td style="text-align: center"&gt;11111&lt;/td&gt;

        &lt;td style="text-align: center"&gt;11110&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th style="text-align: left"&gt;Knight&lt;/th&gt;

        &lt;td style="text-align: center"&gt;10110&lt;/td&gt;

        &lt;td style="text-align: center"&gt;10101&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th style="text-align: left"&gt;Bishop&lt;/th&gt;

        &lt;td style="text-align: center"&gt;10100&lt;/td&gt;

        &lt;td style="text-align: center"&gt;11100&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th style="text-align: left"&gt;Queen&lt;/th&gt;

        &lt;td style="text-align: center"&gt;111010&lt;/td&gt;

        &lt;td style="text-align: center"&gt;111011&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th style="text-align: left"&gt;King&lt;/th&gt;

        &lt;td style="text-align: center"&gt;101110&lt;/td&gt;

        &lt;td style="text-align: center"&gt;101111&lt;/td&gt;
    &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;For a starting position this equates to 32 x 1 + 16 x 3 + 12 x 5 + 4 x 6 = 164 bits.&lt;/p&gt;
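&lt;p&gt;As a rough check of that figure, here is a minimal sketch that encodes the starting position with the codes from the table above and counts the bits. The class name &lt;code&gt;HuffmanBoardSketch&lt;/code&gt;, the 0-12 numbering of square contents and the a1-first square order are my own assumptions for illustration; they are not part of the implementation above.&lt;/p&gt;

```java
// Sketch only: names, piece numbering and square order are assumptions.
final class HuffmanBoardSketch {
    // Codes copied from the table above. Index 0 = empty, 1..6 = White pawn, rook,
    // knight, bishop, queen, king, 7..12 = the same pieces for Black.
    static final String[] CODES = {
        "0",
        "110", "11111", "10110", "10100", "111010", "101110",
        "100", "11110", "10101", "11100", "111011", "101111"
    };

    // Concatenates the code for each of the 64 squares into one bit string.
    static String encode(int[] squares) {
        StringBuilder bits = new StringBuilder();
        for (int square : squares) {
            bits.append(CODES[square]);
        }
        return bits.toString();
    }

    // Builds the starting position with a1 = 0, b1 = 1, ..., h8 = 63.
    static int[] startingPosition() {
        int[] backRank = {2, 3, 4, 5, 6, 4, 3, 2}; // R N B Q K B N R
        int[] board = new int[64];                 // every square starts empty
        for (int file = 0; file != 8; file++) {
            board[file] = backRank[file];          // White back rank
            board[8 + file] = 1;                   // White pawns
            board[48 + file] = 7;                  // Black pawns
            board[56 + file] = backRank[file] + 6; // Black back rank
        }
        return board;
    }

    public static void main(String[] args) {
        String bits = encode(startingPosition());
        System.out.println(bits.length() + " bits"); // prints "164 bits"
    }
}
```

&lt;p&gt;Because no code is a prefix of another, a decoder can read the string one bit at a time and emit a square as soon as the bits read so far match a code.&lt;/p&gt;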

&lt;h3&gt;State Difference&lt;/h3&gt;

&lt;p&gt;Another possible approach is to combine the very first approach with Huffman coding. This is based on the assumption
    that most expected Chess boards (rather than randomly generated ones) are more likely than not to, at least in part,
    resemble a starting position.&lt;/p&gt;

&lt;p&gt;So what you do is XOR the 256 bit current board position with a 256 bit starting position and then encode that (using
    Huffman coding or, say, some method of &lt;a href="http://en.wikipedia.org/wiki/Run-length_encoding"&gt;run length
        encoding&lt;/a&gt;). Obviously this will be very efficient to start with (64 zero values, probably corresponding to 64 bits) but
    the storage required will increase as the game progresses.&lt;/p&gt;
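&lt;p&gt;A minimal sketch of the XOR step, assuming the board is held as 64 four-bit values rather than one packed 256 bit number. The class name &lt;code&gt;StateDifferenceSketch&lt;/code&gt;, the 0-12 piece numbering and the a1-first square order are assumptions made for the example, and the follow-up Huffman or run length step is left out.&lt;/p&gt;

```java
// Sketch only: names, piece numbering and square order are assumptions.
final class StateDifferenceSketch {
    // Starting position as 64 four-bit values: 0 = empty, 1..6 = White pawn, rook,
    // knight, bishop, queen, king, 7..12 = the same pieces for Black.
    static int[] startingPosition() {
        int[] backRank = {2, 3, 4, 5, 6, 4, 3, 2}; // R N B Q K B N R
        int[] board = new int[64];
        for (int file = 0; file != 8; file++) {
            board[file] = backRank[file];
            board[8 + file] = 1;
            board[48 + file] = 7;
            board[56 + file] = backRank[file] + 6;
        }
        return board;
    }

    // XORs each square with the same square of the starting position.
    // Applying it to its own result gives back the original board.
    static int[] xorWithStart(int[] board) {
        int[] start = startingPosition();
        int[] diff = new int[64];
        for (int i = 0; i != 64; i++) {
            diff[i] = board[i] ^ start[i];
        }
        return diff;
    }

    // Counts the squares whose four bits are not all zero after the XOR.
    static int changedSquares(int[] diff) {
        int changed = 0;
        for (int value : diff) {
            if (value != 0) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        int[] board = startingPosition();
        board[28] = board[12]; // 1. e4: pawn from e2 (square 12) to e4 (square 28)
        board[12] = 0;
        int[] diff = xorWithStart(board);
        System.out.println(changedSquares(diff) + " of 64 squares differ"); // prints "2 of 64 squares differ"
    }
}
```

&lt;p&gt;After one move only two squares are non-zero, which is what makes the difference so compressible early in the game.&lt;/p&gt;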

&lt;h3&gt;Piece Position&lt;/h3&gt;

&lt;p&gt;As mentioned, another way of attacking this problem is to instead store the position of each piece a player has. This
    works particularly well with endgame positions where most squares will be empty (but in the Huffman coding approach
    empty squares only use 1 bit anyway).&lt;/p&gt;

&lt;p&gt;Each side will have a king and 0-15 other pieces. Because of promotion the exact makeup of those pieces can vary
    enough that you can’t assume the numbers based on the starting positions are maxima.&lt;/p&gt;

&lt;p&gt;The logical way to divide this up is store a Position consisting of two Sides (White and Black). Each Side has:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;A king: 6 bits for the location;&lt;/li&gt;

    &lt;li&gt;Has pawns: 1 bit (1 for yes, 0 for no);&lt;/li&gt;

    &lt;li&gt;If yes, number of pawns: 3 bits (the stored value 0-7 plus one gives 1-8);&lt;/li&gt;

    &lt;li&gt;If yes, the location of each pawn is encoded: 45 bits (see below);&lt;/li&gt;

    &lt;li&gt;Number of non-pawns: 4 bits (0-15);&lt;/li&gt;

    &lt;li&gt;For each piece: type (2 bits for queen, rook, knight, bishop) and location (6 bits)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As for the pawn location, the pawns can only be on 48 possible squares (not 64 like the others). As such, it is
    better not to waste the extra 16 values that using 6 bits per pawn would use. So if you have 8 pawns there are
    48&lt;sup&gt;8&lt;/sup&gt; possibilities, equalling 28,179,280,429,056. You need 45 bits to encode that many values.&lt;/p&gt;

&lt;p&gt;For the starting position that’s 6 + 1 + 3 + 45 + 4 + 7 x 8 = 115 bits per side or 230 bits total. The starting position is close to the worst case for this method however and it
    will get substantially better as you remove pieces.&lt;/p&gt;

&lt;p&gt;It should be pointed out that there are fewer than 48&lt;sup&gt;8&lt;/sup&gt; possibilities because the pawns can’t all be in the
    same square. The first has 48 possibilities, the second 47 and so on. 48 x 47 x … x 41 = 1.52e13, which needs 44 bits of
    storage.&lt;/p&gt;

&lt;p&gt;You can further improve this by eliminating the squares that are occupied by other pieces (including the other side)
    so you could first place the white non-pawns then the black non-pawns, then the white pawns and lastly the black
    pawns. On a starting position this reduces the storage requirements to 44 bits for White and 42 bits for Black.&lt;/p&gt;
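&lt;p&gt;The three pawn figures above (45, 44 and 42 bits) can be checked with a few lines of arbitrary-precision arithmetic. This is a sketch under my own naming (&lt;code&gt;PawnPlacementBits&lt;/code&gt;, &lt;code&gt;bitsFor&lt;/code&gt; and &lt;code&gt;placements&lt;/code&gt; are hypothetical); it only counts possibilities and does not encode anything.&lt;/p&gt;

```java
import java.math.BigInteger;

// Sketch only: class and method names are made up for this example.
final class PawnPlacementBits {
    // Bits needed to tell `count` different values apart: the bit length of count - 1.
    static int bitsFor(BigInteger count) {
        return count.subtract(BigInteger.ONE).bitLength();
    }

    // Ordered ways to put `pawns` pawns on `squares` free squares without sharing:
    // squares x (squares - 1) x ... with one factor per pawn.
    static BigInteger placements(int squares, int pawns) {
        BigInteger total = BigInteger.ONE;
        for (int i = 0; i != pawns; i++) {
            total = total.multiply(BigInteger.valueOf(squares - i));
        }
        return total;
    }

    public static void main(String[] args) {
        // 8 pawns, 48 squares each, repeats allowed: 48^8
        System.out.println(bitsFor(BigInteger.valueOf(48).pow(8))); // prints 45
        // White pawns, no two on the same square: 48 x 47 x ... x 41
        System.out.println(bitsFor(placements(48, 8)));             // prints 44
        // Black pawns once the 8 White pawns are placed: 40 x 39 x ... x 33
        System.out.println(bitsFor(placements(40, 8)));             // prints 42
    }
}
```

&lt;p&gt;The 40 in the last line assumes, as in the starting position, that the non-pawns sit on the first and eighth ranks and so take away none of the 48 pawn squares.&lt;/p&gt;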

&lt;h3&gt;Combined Approaches&lt;/h3&gt;

&lt;p&gt;Each of these approaches has its strengths and weaknesses, which suggests another possible optimization. You could, say, pick
    the best 4, encode a scheme selector in the first two bits and then follow it with the scheme-specific
    storage.&lt;/p&gt;

&lt;p&gt;With the overhead that small, this will by far be the best approach.&lt;/p&gt;

&lt;h3&gt;Game State&lt;/h3&gt;

&lt;p&gt;I return to the problem of storing a &lt;em&gt;game&lt;/em&gt; rather than a &lt;em&gt;position&lt;/em&gt;. Because of the threefold
    repetition rule we have to store the list of moves that have occurred to this point.&lt;/p&gt;

&lt;h3&gt;Annotations&lt;/h3&gt;

&lt;p&gt;One thing you have to determine is whether you are simply storing a list of moves or annotating the game. Chess games
    are often annotated, for example:&lt;/p&gt;

&lt;p&gt;17. Bb5!! Nc4?&lt;/p&gt;

&lt;p&gt;White’s move is marked by two exclamation points as brilliant whereas Black’s is viewed as a mistake. See &lt;a href="http://en.wikipedia.org/wiki/Punctuation_(chess)"&gt;Chess punctuation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You might also need to store free text commentary describing the moves.&lt;/p&gt;

&lt;p&gt;I am assuming that the moves are sufficient so there will be no annotations.&lt;/p&gt;

&lt;h3&gt;Algebraic Notation&lt;/h3&gt;

&lt;p&gt;We could simply store the text of the move here (“e4”, “Bxb5”, etc). Including a terminating byte you’re looking
    at about 6 bytes (48 bits) per move (worst case). That’s not particularly efficient.&lt;/p&gt;

&lt;p&gt;The second thing to try is to store the starting location (6 bits) and end location (6 bits) so 12 bits per move.
    That is significantly better.&lt;/p&gt;
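&lt;p&gt;A minimal sketch of the 12 bit form, assuming squares are numbered a1 = 0 through h8 = 63. The class &lt;code&gt;MoveCodec&lt;/code&gt; and its method names are hypothetical, and the sketch does not record which piece a pawn promotes to, which would need extra bits.&lt;/p&gt;

```java
// Sketch only: names and the a1 = 0 ... h8 = 63 numbering are assumptions.
final class MoveCodec {
    // Converts a square name such as "e2" to its number: rank index x 8 + file index.
    static int square(String name) {
        return (name.charAt(1) - '1') * 8 + (name.charAt(0) - 'a');
    }

    // Packs two 6 bit squares into one value in the range 0..4095 (12 bits).
    static int pack(int from, int to) {
        return from * 64 + to;
    }

    // Recovers the starting square from a packed move.
    static int from(int packed) {
        return packed / 64;
    }

    // Recovers the destination square from a packed move.
    static int to(int packed) {
        return packed % 64;
    }

    public static void main(String[] args) {
        int packed = pack(square("e2"), square("e4"));
        System.out.println(packed);                          // prints 796
        System.out.println(from(packed) + " " + to(packed)); // prints "12 28"
    }
}
```

&lt;p&gt;Multiplying and dividing by 64 is the same as shifting by 6 bits; it is written this way only to keep the arithmetic explicit.&lt;/p&gt;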

&lt;p&gt;Alternatively we can determine all the legal moves from the current position in a predictable and deterministic way
    and state which we’ve chosen. This then goes back to the variable base encoding mentioned above. White and Black
    have 20 possible moves each on their first move (16 pawn moves and 4 knight moves), more on the second and so on.&lt;/p&gt;
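&lt;p&gt;Here is a sketch of that variable base idea applied to a list of moves. Everything named here is hypothetical: &lt;code&gt;MoveListCodec&lt;/code&gt; takes the number of legal moves at each ply as a ready-made array, whereas a real implementation would get each count from a move generator as it replays the game. The first move is stored as the least significant digit so that a decoder can recover it before it needs to know how many moves were available later.&lt;/p&gt;

```java
import java.math.BigInteger;

// Sketch only: names are made up and the move counts are supplied by the caller.
final class MoveListCodec {
    // choices[i] is the index of the move played at ply i among counts[i] legal moves,
    // so each choices[i] lies in the range 0..counts[i]-1.
    static BigInteger encode(int[] choices, int[] counts) {
        BigInteger value = BigInteger.ZERO;
        // Work backwards so that ply 0 ends up as the least significant digit.
        for (int i = choices.length - 1; i != -1; i--) {
            value = value.multiply(BigInteger.valueOf(counts[i]))
                         .add(BigInteger.valueOf(choices[i]));
        }
        return value;
    }

    // Peels the digits off again, first move first.
    static int[] decode(BigInteger value, int[] counts) {
        int[] choices = new int[counts.length];
        for (int i = 0; i != counts.length; i++) {
            BigInteger[] quotientAndRemainder =
                value.divideAndRemainder(BigInteger.valueOf(counts[i]));
            choices[i] = quotientAndRemainder[1].intValue();
            value = quotientAndRemainder[0];
        }
        return choices;
    }

    public static void main(String[] args) {
        int[] counts = {5, 2, 7};  // made-up numbers of legal moves at three plies
        int[] choices = {3, 0, 4}; // which move was played at each ply
        BigInteger value = encode(choices, counts);
        System.out.println(value); // prints 43
    }
}
```

&lt;p&gt;With those made-up counts there are only 5 x 2 x 7 = 70 possible sequences, so the three moves fit in 7 bits rather than the 36 bits that three 12 bit moves would take.&lt;/p&gt;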

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;There is no absolutely right answer to this question. There are many possible approaches of which the above are just
    a few.&lt;/p&gt;

&lt;p&gt;What I like about this and similar problems is that it demands abilities important to any programmer like considering
    the usage pattern, accurately determining requirements and thinking about corner cases.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Chess positions taken as screenshots from &lt;/em&gt;&lt;a href="http://www.chesspositiontrainer.com/"&gt;&lt;em&gt;Chess Position
    Trainer&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/3488586796642816981/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/12/programming-puzzles-chess-positions-and.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3488586796642816981?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/3488586796642816981?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/eMhhoWrvlsA/programming-puzzles-chess-positions-and.html" title="Programming Puzzles, Chess Positions and Huffman Coding" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/12/programming-puzzles-chess-positions-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkQGSXo8fCp7ImA9WxNaEEo.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-7975371492719883428</id><published>2009-11-24T23:00:00.001+08:00</published><updated>2009-11-24T23:58:48.474+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-24T23:58:48.474+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" 
term="lombok" /><category scheme="http://www.blogger.com/atom/ns#" term="review" /><title>Reviewing Project Lombok or the Right Way to Write a Library</title><content type="html">&lt;p&gt;You could consider this a parody of my own &lt;a href="http://www.cforcoding.com/2009/07/spring-batch-or-how-not-to-design-api.html"&gt;Spring Batch or How Not to Design an API&lt;/a&gt;. Credit where credit is due however and this brings me to &lt;a href="http://projectlombok.org/"&gt;Project Lombok&lt;/a&gt;.&lt;/p&gt;  &lt;h3&gt;What is Project Lombok?&lt;/h3&gt;  &lt;p&gt;Every Java developer knows that Java involves writing a lot of boilerplate code. Create a &lt;a href="http://en.wikipedia.org/wiki/Data_transfer_object"&gt;value object&lt;/a&gt; (or “data transfer object”) and you might have a behaviour-less class with a dozen properties. Those twelve statements are the only important thing in the class. After that you create a hundred lines of getters, setters (if not &lt;a href="http://en.wikipedia.org/wiki/Immutable_object"&gt;immutable&lt;/a&gt;), an equals/hashCode and a toString method, possibly all IDE generated.&lt;/p&gt;  &lt;p&gt;This is an error-prone process, even when generated by an IDE. An IDE won’t typically tell you if, say, you add another data member and don’t regenerate your equals and hashCode methods.&lt;/p&gt;  &lt;p&gt;Project Lombok seeks to greatly reduce the need for such boilerplate by using annotations to automatically generate it so you don’t have to. This fits in well with my own philosophy:&lt;/p&gt;  &lt;h3&gt;Lines of Code Are The Enemy&lt;/h3&gt;  &lt;p&gt;This is not a new or revolutionary idea. Decades ago it was noted that the number of lines of code was an important metric for both programmer productivity and expected bugs, meaning that whether you were dealing with assembly language or Erlang the metrics were roughly the same. 
This in part explains the steady move towards higher level languages as the more you can get done in one line of code the better.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://stackoverflow.com/questions/862277/what-is-the-industry-standard-for-bugs-per-1000-lines-of-code/862304#862304"&gt;Bugs per lines of code&lt;/a&gt; addresses this point, referencing Code Complete and other sources.&lt;/p&gt;  &lt;p&gt;This is why I think things like first-class properties and closures are advantages in the .Net world (over Java): because they can do the same thing with less code, even if you can do the same thing with IDE-generated getters and setters and anonymous inner classes (respectively).&lt;/p&gt;  &lt;h3&gt;What Can Project Lombok Do?&lt;/h3&gt;  &lt;p&gt;Project Lombok has &lt;a href="http://projectlombok.org/features/index.html"&gt;seven annotations&lt;/a&gt; for minimizing boilerplate code.&lt;/p&gt;  &lt;blockquote&gt;   &lt;dl&gt;&lt;dt&gt;&lt;a href="http://projectlombok.org/GetterSetter.html"&gt;&lt;code&gt;@Getter&lt;/code&gt; / &lt;code&gt;@Setter&lt;/code&gt;&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;Never write &lt;code&gt;public int getFoo() {return foo;}&lt;/code&gt; again.&lt;/dd&gt;&lt;dt&gt;&lt;a href="http://projectlombok.org/ToString.html"&gt;&lt;code&gt;@ToString&lt;/code&gt;&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;No need to start a debugger to see your fields: Just let lombok generate a &lt;code&gt;toString&lt;/code&gt; for you!&lt;/dd&gt;&lt;dt&gt;&lt;a href="http://projectlombok.org/EqualsAndHashCode.html"&gt;&lt;code&gt;@EqualsAndHashCode&lt;/code&gt;&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;Equality made easy: Generates &lt;code&gt;hashCode&lt;/code&gt; and &lt;code&gt;equals&lt;/code&gt; implementations from the fields of your object.&lt;/dd&gt;&lt;dt&gt;&lt;a href="http://projectlombok.org/Data.html"&gt;&lt;code&gt;@Data&lt;/code&gt;&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;All together now: A shortcut for &lt;code&gt;@ToString&lt;/code&gt;, &lt;code&gt;@EqualsAndHashCode&lt;/code&gt;, &lt;code&gt;@Getter&lt;/code&gt; on all fields, and &lt;code&gt;@Setter&lt;/code&gt; on all non-final fields. You even get a free constructor to initialize your final fields!&lt;/dd&gt;&lt;dt&gt;&lt;a href="http://projectlombok.org/Cleanup.html"&gt;&lt;code&gt;@Cleanup&lt;/code&gt;&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;Automatic resource management: Call your &lt;code&gt;close()&lt;/code&gt; methods safely with no hassle.&lt;/dd&gt;&lt;dt&gt;&lt;a href="http://projectlombok.org/Synchronized.html"&gt;&lt;code&gt;@Synchronized&lt;/code&gt;&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;&lt;code&gt;synchronized&lt;/code&gt; done right: Don't expose your locks.&lt;/dd&gt;&lt;dt&gt;&lt;a href="http://projectlombok.org/SneakyThrows.html"&gt;&lt;code&gt;@SneakyThrows&lt;/code&gt;&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;To boldly throw checked exceptions where no one has thrown them before!&lt;/dd&gt;&lt;/dl&gt;&lt;/blockquote&gt;  &lt;p&gt;The effect can be dramatic. Using the example from &lt;a href="http://projectlombok.org/features/Data.html"&gt;@Data&lt;/a&gt;, you can replace this:&lt;/p&gt;  &lt;blockquote&gt;   &lt;pre class="brush:java"&gt;import java.util.Arrays;

public class DataExample {
    private final String name;
    private int age;
    private double score;
    private String[] tags;

    public DataExample(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    void setAge(int age) {
        this.age = age;
    }

    public int getAge() {
        return age;
    }

    public void setScore(double score) {
        this.score = score;
    }

    public double getScore() {
        return score;
    }

    public String[] getTags() {
        return tags;
    }

    public void setTags(String[] tags) {
        this.tags = tags;
    }

    @Override
    public String toString() {
        return &amp;quot;DataExample(&amp;quot; + name + &amp;quot;, &amp;quot; + age + &amp;quot;, &amp;quot; + score + &amp;quot;, &amp;quot; + Arrays.deepToString(tags) + &amp;quot;)&amp;quot;;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (o == null) return false;
        if (o.getClass() != this.getClass()) return false;
        DataExample other = (DataExample) o;
        if (name == null ? other.name != null : !name.equals(other.name)) return false;
        if (age != other.age) return false;
        if (Double.compare(score, other.score) != 0) return false;
        if (!Arrays.deepEquals(tags, other.tags)) return false;
        return true;
    }

    @Override
    public int hashCode() {
        final int PRIME = 31;
        int result = 1;
        final long temp1 = Double.doubleToLongBits(score);
        result = (result * PRIME) + (name == null ? 0 : name.hashCode());
        result = (result * PRIME) + age;
        result = (result * PRIME) + (int) (temp1 ^ (temp1 &amp;gt;&amp;gt;&amp;gt; 32));
        result = (result * PRIME) + Arrays.deepHashCode(tags);
        return result;
    }

    public static class Exercise&amp;lt;T&amp;gt; {
        private final String name;
        private final T value;

        private Exercise(String name, T value) {
            this.name = name;
            this.value = value;
        }

        public static &amp;lt;T&amp;gt; Exercise&amp;lt;T&amp;gt; of(String name, T value) {
            return new Exercise&amp;lt;T&amp;gt;(name, value);
        }

        public String getName() {
            return name;
        }

        public T getValue() {
            return value;
        }

        @Override
        public String toString() {
            return &amp;quot;Exercise(name=&amp;quot; + name + &amp;quot;, value=&amp;quot; + value + &amp;quot;)&amp;quot;;
        }

        @Override
        public boolean equals(Object o) {
            if (o == this) return true;
            if (o == null) return false;
            if (o.getClass() != this.getClass()) return false;
            Exercise&amp;lt;?&amp;gt; other = (Exercise&amp;lt;?&amp;gt;) o;
            if (name == null ? other.name != null : !name.equals(other.name)) return false;
            if (value == null ? other.value != null : !value.equals(other.value)) return false;
            return true;
        }

        @Override
        public int hashCode() {
            final int PRIME = 31;
            int result = 1;
            result = (result * PRIME) + (name == null ? 0 : name.hashCode());
            result = (result * PRIME) + (value == null ? 0 : value.hashCode());
            return result;
        }
    }
}&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;with&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="brush:java"&gt;import lombok.AccessLevel;
import lombok.Setter;
import lombok.Data;
import lombok.ToString;

@Data
public class DataExample {
    private final String name;
    @Setter(AccessLevel.PACKAGE)
    private int age;
    private double score;
    private String[] tags;

    @ToString(includeFieldNames = true)
    @Data(staticConstructor = &amp;quot;of&amp;quot;)
    public static class Exercise&amp;lt;T&amp;gt; {
        private final String name;
        private final T value;
    }
}&lt;/pre&gt;
&lt;/blockquote&gt;


&lt;h3&gt;With Annotations? How?&lt;/h3&gt;

&lt;p&gt;Most developers’ experience with annotations involves:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Putting @Override on overridden methods; &lt;/li&gt;

  &lt;li&gt;Using @SuppressWarnings, often to disable an IDE warning about casting generic collections; and &lt;/li&gt;

  &lt;li&gt;Using API specific annotations in JPA, J2EE, Spring, etc (eg @Resource, @Transactional). &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Far fewer developers have ever &lt;em&gt;written&lt;/em&gt; an annotation. If you haven’t it’s worth having a read of &lt;a href="http://java.sun.com/docs/books/tutorial/java/javaOO/annotations.html"&gt;Sun's Annotation Tutorial&lt;/a&gt;. Annotations can be used to basically generate code at compile-time. That’s how Project Lombok works.&lt;/p&gt;

&lt;h3&gt;Is This Kosher?&lt;/h3&gt;

&lt;p&gt;There is some debate about this. A certain school of thought believes that code should still compile and possibly even work without the source annotations present. Alternatively, the belief is that annotations shouldn’t be used as a substitute for language features.&lt;/p&gt;

&lt;p&gt;Java 7 had debate over many new features, some changing the language. One is &lt;em&gt;first-class properties&lt;/em&gt;, to avoid the boilerplate of getters and setters or provide them in a far terser manner, much like C#/.Net does. &lt;a href="http://tech.puredanger.com/java7#property"&gt;First-class properties didn’t make it into Java 7&lt;/a&gt;. Project Lombok gives a viable alternative.&lt;/p&gt;

&lt;p&gt;Possibly more controversial is that you can use &lt;a href="http://projectlombok.org/features/SneakyThrows.html"&gt;@SneakyThrows to throw checked exceptions without declaring them&lt;/a&gt;. This stokes a debate that is as old as Java itself: &lt;a href="http://www.ibm.com/developerworks/java/library/j-jtp05254.html"&gt;are checked exceptions a mistake?&lt;/a&gt; I view them as a failed experiment in software engineering.&lt;/p&gt;

&lt;p&gt;So Project Lombok is somewhat controversial (some have even gone so far as to call it a “hack”), and perhaps the best thing to come out of it is the debate about Java and its future. Up until a couple of years ago Java was a hotbed for debate about software design and an incubator for new technologies, methodologies and architectures. Java (through the &lt;a href="http://www.springsource.org/"&gt;Spring framework&lt;/a&gt;) popularized &lt;a href="http://en.wikipedia.org/wiki/Dependency_injection"&gt;dependency injection&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Inversion_of_control"&gt;inversion of control&lt;/a&gt; as well as the use of MVC in Web frameworks.&lt;/p&gt;

&lt;p&gt;But after Java 5 Sun seems to have lost its way. It lost such luminaries as &lt;a href="http://en.wikipedia.org/wiki/Joshua_Bloch"&gt;Joshua Bloch&lt;/a&gt; (to Google). Java 5 itself was a huge change and debate still rages about the complexity of Java generics and the wisdom of type erasure. So much so that closures for Java 7 were declined (but out of nowhere, it was announced &lt;a href="http://java.dzone.com/news/closures-coming-java-7"&gt;closures are now coming to Java 7&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Microsoft has demonstrated leadership in pushing the .Net platform forward, to the point that Java is now playing catch-up (which wasn’t the case prior to .Net 3.0/3.5). Java 7, by contrast, was mired in debate and lacked leadership and vision from Sun about where to go next, complicated by grand misadventures like the stillborn JavaFX.&lt;/p&gt;

&lt;p&gt;But anyway, perhaps all this debate will help reinvigorate a Java development community that seems to have given up.&lt;/p&gt;

&lt;h3&gt;What has Project Lombok Done Right?&lt;/h3&gt;

&lt;p&gt;Firstly, &lt;a href="http://projectlombok.org/mavenrepo/index.html"&gt;using Lombok with Maven&lt;/a&gt; is easy. I can’t emphasize enough how useful that is. There’s nothing more frustrating than digging around to find the right artifact(s) and/or repositories to get a particular library to work in Maven. Usually it’s not hard but sometimes it is. I have better things to do than try and figure out what someone should just mention in their project’s documentation.&lt;/p&gt;

&lt;p&gt;Also the documentation isn’t super-extensive (as, say, Spring’s typically is) but it sure beats some others (eg typical Apache projects). At least there are examples of all the annotations.&lt;/p&gt;

&lt;p&gt;Another thing I really like about it is that it generates toString(), hashCode() and equals() methods automatically but unlike some Apache Commons libraries, it doesn’t do it via reflection at run-time.&lt;/p&gt;

&lt;p&gt;Lastly, Project Lombok is released under the highly permissive &lt;a href="http://en.wikipedia.org/wiki/MIT_License"&gt;MIT license&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;What Could Project Lombok Do Better?&lt;/h3&gt;

&lt;p&gt;Currently, Project Lombok only integrates (well) with Eclipse. &lt;a href="http://wiki.github.com/rzwitserloot/lombok/netbeans"&gt;Netbeans&lt;/a&gt; and &lt;a href="http://wiki.github.com/rzwitserloot/lombok/intellij-idea"&gt;IntelliJ&lt;/a&gt; users are out in the cold and support for those two IDEs seems anywhere from far off to never going to happen. I’m a huge fan of IntelliJ and frankly I wouldn’t use any other IDE given the choice. People tend to feel pretty strongly about their IDEs so currently Project Lombok’s lack of IntelliJ support is (unfortunately) a deal breaker.&lt;/p&gt;

&lt;p&gt;I’ve had a bit of a look at the IntelliJ plug-in architecture. You can create plug-ins for new languages so all that’s really required is modification to the existing code for Java, assuming it is done with the same mechanism (which is a big and possibly incorrect assumption).&lt;/p&gt;

&lt;p&gt;It seems like a serious limitation of IntelliJ’s internal compiler that it can’t handle compile-time annotations like this in a general sense. Surely one compile-time annotation is like any other, right?&lt;/p&gt;

&lt;h3&gt;What Features Could Be Added?&lt;/h3&gt;

&lt;p&gt;Others have taken to this idea, despite the controversy. One such extension is &lt;a href="http://code.google.com/p/morbok/"&gt;Morbok&lt;/a&gt;, which uses the same idea to get rid of the boilerplate of creating loggers in classes.&lt;/p&gt;

&lt;p&gt;It’s worth noting that &lt;a href="http://projectlombok.org/features/Cleanup.html"&gt;@Cleanup&lt;/a&gt; should become obsolete with Java 7 (a year or more from now) with the advent of &lt;a href="http://mail.openjdk.java.net/pipermail/coin-dev/2009-February/000011.html"&gt;ARM&lt;/a&gt; (“Automatic Resource Management”). Again, this mimics another .Net feature, where such tedious try-catch-finally blocks are abbreviated using &lt;code&gt;using() { … }&lt;/code&gt; blocks for anything that implements the &lt;code&gt;IDisposable&lt;/code&gt; interface.&lt;/p&gt;
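
&lt;p&gt;As a rough sketch of what that buys you, assuming the form in which ARM eventually landed in Java 7 (the try-with-resources statement and &lt;code&gt;java.lang.AutoCloseable&lt;/code&gt;), the resource is closed automatically when the block exits. The &lt;code&gt;Resource&lt;/code&gt; class is a made-up stand-in for a stream or connection.&lt;/p&gt;

&lt;pre class="brush:java"&gt;public class ArmDemo {
  // Hypothetical resource that records what happens to it.
  static class Resource implements AutoCloseable {
    private final StringBuilder log;

    Resource(StringBuilder log) {
      this.log = log;
      log.append(&amp;quot;open;&amp;quot;);
    }

    void use() { log.append(&amp;quot;use;&amp;quot;); }

    @Override
    public void close() { log.append(&amp;quot;close;&amp;quot;); }
  }

  static String run() {
    StringBuilder log = new StringBuilder();
    // close() is called when the block exits, normally or via an exception.
    try (Resource r = new Resource(log)) {
      r.use();
    }
    return log.toString();
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}&lt;/pre&gt;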

&lt;p&gt;One limitation I noticed was that you can’t use @Data on enums. It would be useful to have an enum-specific version of this. One problem that Lombok could solve is the boilerplate around the use of values(). Enum.values() returns an array of the values. Because arrays aren’t immutable, this needs to be copied each time you call it. This makes code like this inefficient:&lt;/p&gt;

&lt;pre class="brush:java"&gt;public enum Gender {
  MALE(&amp;quot;M&amp;quot;, &amp;quot;male&amp;quot;), FEMALE(&amp;quot;F&amp;quot;, &amp;quot;female&amp;quot;);

  private static final Map&amp;lt;String, Gender&amp;gt; LOOKUP;

  static {
    Map&amp;lt;String, Gender&amp;gt; map = new HashMap&amp;lt;String, Gender&amp;gt;();
    for (Gender gender : values()) {
      map.put(gender.code, gender);
    }
    LOOKUP = Collections.unmodifiableMap(map);
  }

  public static Gender find(String code) {
    return LOOKUP.get(code);
  }

  private final String code;
  private final String description;

  private Gender(String code, String description) {
    this.code = code;
    this.description = description;
  }

  public String getCode() { return code; }
  public String getDescription() { return description; }
}&lt;/pre&gt;

&lt;p&gt;That could easily be reduced to a couple of Lombok-style annotations. For example:&lt;/p&gt;

&lt;pre class="brush:java"&gt;@Enum
@Finder
public enum Gender {
  MALE(&amp;quot;M&amp;quot;, &amp;quot;male&amp;quot;), FEMALE(&amp;quot;F&amp;quot;, &amp;quot;female&amp;quot;);

  @Code
  private final String code;
  private final String description;

  private Gender(String code, String description) {
    this.code = code;
    this.description = description;
  }
}&lt;/pre&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;I’m a big fan of Project Lombok. It’s my kind of library: lightweight, practical and doesn’t get in your face, in exactly the way that many Java Web frameworks aren’t. For example, Seam uses a flawed idea (component-oriented JSF) to solve a problem I don’t have. And for those of you that don’t think JSF is a failed experiment, it’s been about to take off for 7+ years now. At some point you just have to accept that it’s not going to work.&lt;/p&gt;

&lt;p&gt;I personally come down on the side of the fence that accepts the utility of using annotations as a substitute for language features, especially considering how Java has stalled in that department in recent years (and Java 7 is now delayed until the end of 2010).&lt;/p&gt;

&lt;p&gt;If you happen to be an Eclipse user, you’re in luck. Give it a try. If you’re not, well it’s a choice between those red lines under your seemingly non-existent methods and not using Lombok. But hope springs eternal for future IDE support.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/uWqk2L9u3QinZDBQzuC9_HLhLtc/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/uWqk2L9u3QinZDBQzuC9_HLhLtc/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/uWqk2L9u3QinZDBQzuC9_HLhLtc/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/uWqk2L9u3QinZDBQzuC9_HLhLtc/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/k1-cbncSpYI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/7975371492719883428/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/11/reviewing-project-lombok-or-right-way.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7975371492719883428?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7975371492719883428?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/k1-cbncSpYI/reviewing-project-lombok-or-right-way.html" title="Reviewing Project Lombok or the Right Way to Write a Library" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/11/reviewing-project-lombok-or-right-way.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUcHSXw5fip7ImA9WxNVF0g.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-1251598093694832745</id><published>2009-10-29T02:03:00.001+08:00</published><updated>2009-10-29T02:03:58.226+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-29T02:03:58.226+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="javascript" /><category 
scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" term="web 2.0" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="gwt" /><category scheme="http://www.blogger.com/atom/ns#" term="web" /><title>Lost in Translation or Why GWT Isn’t the Future of Web Development</title><content type="html">&lt;p&gt;I recently read &lt;a href="http://blog.balfes.net/?p=869"&gt;Is GWT the future of web development?&lt;/a&gt; The post postulates that GWT (“Google Web Toolkit”) is the future because it introduces type safety, leverages the existing base of Java programmers and it has some widgets.&lt;/p&gt;  &lt;p&gt;Google has recently put their considerable weight behind it, most notably with &lt;a href="http://wave.google.com"&gt;Google Wave&lt;/a&gt;. I’m naturally hesitant to bet against Google or &lt;a href="http://www.crunchbase.com/person/lars-rasmussen"&gt;Lars Rasmussen&lt;/a&gt; but the fact is that’s what I’m doing.&lt;/p&gt;  &lt;h3&gt;On Type Safety and Static Typing&lt;/h3&gt;  &lt;p&gt;In the 90s type-safety and static typing ruled almost unchallenged, first with C then C++ and Java (&lt;em&gt;yes I realize Pascal, Algol-68 and a plethora of other languages came beforehand&lt;/em&gt;). Perl was the calling card of smug, bearded Unix systems administrators.&lt;/p&gt;  &lt;p&gt;&lt;img style="width: 550px" src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/20000/1000/000/21021/21021.strip.gif" /&gt; &lt;/p&gt;  &lt;p&gt;Performance and the challenges of increasing complexity with relatively low-powered hardware (certainly by today’s standards) were the impetus behind this movement. 
The idea that variables didn’t need to be declared or that the type could morph as required was tantamount to the sky falling.&lt;/p&gt;  &lt;p&gt;Between them, Javascript, PHP, Python, Perl, Ruby and other languages over the last decade (&lt;em&gt;and yes some have a history going far earlier than that&lt;/em&gt;) have clearly demonstrated that indeed the sky hasn’t fallen with loose and dynamic typing.&lt;/p&gt;  &lt;h3&gt;On Leveraging Java Programmers&lt;/h3&gt;  &lt;p&gt;This sounds good in theory but let me put it to you another way: if you were to write textbooks in German would you write them in German or write them in English and have a tool convert them to German?&lt;/p&gt;  &lt;p&gt;Anyone who has studied or knows a second language knows that &lt;em&gt;some things just don’t translate&lt;/em&gt;. The same applies to programming languages. Javascript has lots of features that Java doesn’t: &lt;a href="http://en.wikipedia.org/wiki/First-class_function"&gt;first class functions&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Closure_(computer_science)"&gt;closures&lt;/a&gt;, &lt;a href="http://codebetter.com/blogs/jeremy.miller/archive/2008/09/08/quot-extension-method-quot-in-javascript.aspx"&gt;extension methods&lt;/a&gt;, a vastly different “this” context, &lt;a href="http://www.javascriptkata.com/2007/03/22/how-to-use-anonymous-objects/"&gt;anonymous objects&lt;/a&gt;, dynamic typing and so on.&lt;/p&gt;  &lt;p&gt;The problems you face when writing a “cross-compiler” are:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The weaknesses and limitations of the end result are the combined weaknesses of both languages (or “A union B” in a maths context where A and B are the two languages); &lt;/li&gt;    &lt;li&gt;The strengths of the end result are the common strengths (“A intersect B”) of the two languages; &lt;/li&gt;    &lt;li&gt;The idioms are different; and &lt;/li&gt;    &lt;li&gt;&lt;a 
href="http://www.joelonsoftware.com/articles/LeakyAbstractions.html"&gt;Abstractions are leaky&lt;/a&gt;. Jeff Atwood characterized this as &lt;a href="http://www.codinghorror.com/blog/archives/001281.html"&gt;All Abstractions Are Failed Abstractions&lt;/a&gt;. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;This is the same basic problem with ORMs like Hibernate: &lt;a href="http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch"&gt;Object-relational impedance mismatch&lt;/a&gt;. Every now and again you end up spending half a day figuring the correct combination of properties, annotations, XML and VM parameters to have a query generate the right two lines of SQL that’ll actually be performant.&lt;/p&gt;  &lt;p&gt;Another problem is that GWT fools naive Java developers into thinking &lt;em&gt;they don’t need to learn Javascript&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;My position can be summed up as: &lt;strong&gt;&lt;em&gt;GWT treats Javascript as a bug that needs to be solved&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;  &lt;h3&gt;On Widgets and Maturity&lt;/h3&gt;  &lt;p&gt;I’ve programmed with GWT. The widget selection is woeful. The standard GWT widgets look awful, even amateurish. There are some third-party options but &lt;a href="http://www.extjs.com/products/gxt/"&gt;ExtGWT&lt;/a&gt; is a shockingly bad library. &lt;a href="http://code.google.com/p/smartgwt/"&gt;SmartGWT&lt;/a&gt; looks like a better alternative (and is actually a community effort rather than a split GPL/commercial mish-mash from someone who simply doesn’t understand Java Generics). 
There aren’t many other choices.&lt;/p&gt;  &lt;p&gt;Javascript has many choices: &lt;a href="http://developer.yahoo.com/yui/"&gt;YUI&lt;/a&gt;, &lt;a href="http://www.extjs.com/"&gt;ExtJS&lt;/a&gt; (&lt;em&gt;completely&lt;/em&gt; different beast to ExtGWT), &lt;a href="http://www.dojotoolkit.org/"&gt;Dojo&lt;/a&gt;, &lt;a href="http://jqueryui.com/"&gt;jQuery UI&lt;/a&gt;, &lt;a href="http://www.smartclient.com/"&gt;SmartClient&lt;/a&gt; and others. Not only is there substantially more choice but the choices are substantially more mature.&lt;/p&gt;  &lt;h3&gt;Development Speed is King&lt;/h3&gt;  &lt;p&gt;Java Web apps can take minutes to build and deploy. Within certain restrictions you can hot-deploy classes and JSPs. One of the wonderful things about PHP and Javascript development is that the build and deploy step is typically replaced by saving the file you’re working on and clicking reload on your browser.&lt;/p&gt;  &lt;p&gt;GWT compiles are &lt;strong&gt;&lt;em&gt;brutal&lt;/em&gt;&lt;/strong&gt;, so much so that significant effort has gone into improving the experience with GWT 1.6+ and 2.0. Draft compiles, parallel compilation, optimized vs unoptimized Javascript and selected targeted browsers in development. These all can help but these are in part counteracted by increasing compile times with each version.&lt;/p&gt;  &lt;p&gt;Also compiles are only required when you change your service interfaces. Pure client-side changes can be tested by refreshing the hosted browser (or a real browser in GWT 2.0+). 
Serverside changes that don’t alter the interface don’t technically require a GWT recompile but this can be problematic to implement (in either Ant or Maven).&lt;/p&gt;  &lt;p&gt;Why are long compile times a problem?&lt;/p&gt;  &lt;p&gt;&lt;img style="width: 413px" src="http://imgs.xkcd.com/comics/compiling.png" /&gt;&lt;/p&gt;  &lt;p style="clear: left"&gt;Or from &lt;a href="http://www.joelonsoftware.com/articles/fog0000000043.html"&gt;The Joel Test: 12 Steps to Better Code&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;We all know that knowledge workers work best by getting into &amp;quot;flow&amp;quot;, also known as being &amp;quot;in the zone&amp;quot;, where they are fully concentrated on their work and fully tuned out of their environment. They lose track of time and produce great stuff through absolute concentration. This is when they get all of their productive work done. Writers, programmers, scientists, and even basketball players will tell you about being in the zone.&lt;/p&gt;    &lt;p&gt;&amp;#160;&lt;/p&gt;    &lt;p&gt;The trouble is, getting into &amp;quot;the zone&amp;quot; is not easy. When you try to measure it, it looks like it takes an average of 15 minutes to start working at maximum productivity.&lt;/p&gt;    &lt;p&gt;&amp;#160;&lt;/p&gt;    &lt;p&gt;The other trouble is that it's so easy to get knocked &lt;i&gt;out&lt;/i&gt; of the zone. Noise, phone calls, going out for lunch, having to drive 5 minutes to Starbucks for coffee, and interruptions by coworkers -- &lt;i&gt;especially&lt;/i&gt;interruptions by coworkers -- all knock you out of the zone.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Even a one minute compile can knock you out of the zone. 
Even Jeff Atwood, still desperately clinging to his irrational hatred of PHP like an identity-asserting life preserver, has seen the light and is a self-proclaimed &lt;a href="http://www.codinghorror.com/blog/archives/001216.html"&gt;Scripter at Heart&lt;/a&gt;.&lt;/p&gt;  &lt;h3&gt;Not Every Application is GMail&lt;/h3&gt;  &lt;p&gt;I think of a Web application as something like GMail. It is typically a single page (or close to it) and will often mimic a desktop application. Traditional Web pages may use Javascript varying from none to lots but still rely on a fairly standard HTTP transition between HTML pages.&lt;/p&gt;  &lt;p&gt;GWT is a technology targeted at Web applications. Load times are high (because it’s not hard to get to 1MB+ of Javascript) but that’s OK because in your whole session you tend to load only one page once. Web pages are still far more common than that and GWT is not applicable to that kind of problem.&lt;/p&gt;  &lt;p&gt;Even if you limit the discussion to Web applications, all but the largest Web applications can be managed with a Javascript library in my experience.&lt;/p&gt;  &lt;p&gt;Now for something truly monumental in size I can perhaps see the value in GWT or at least the value of type checking. Still, I’d rather deal with dynamic loading of code in Javascript than I would with &lt;a href="http://code.google.com/p/google-web-toolkit/wiki/CodeSplitting"&gt;GWT 2.0+ code splitting&lt;/a&gt;. Compare that to, say, &lt;a href="http://developer.yahoo.com/yui/3/examples/yui/yui-loader-ext.html"&gt;YUI 3 dynamic loading&lt;/a&gt;, which leverages terse Javascript syntax and first class functions.&lt;/p&gt;  &lt;h3&gt;Of Layers and Value Objects&lt;/h3&gt;  &lt;p&gt;It’s no secret that Java programmers love their layers. 
No sooner do you have a Presentation Layer, a Controller Layer and a Repository Layer than someone suggests you also need a Database Abstraction Layer, a Service Layer, a Web Services Layer and a Messaging Layer.&lt;/p&gt;  &lt;p&gt;And of course you can’t use the same value object to pass data between them so you end up writing a lot of boilerplate like:&lt;/p&gt;  &lt;pre class="brush:java"&gt;public class TranslationUtils {
  public static CustomerVO translate(Customer customer) {
    CustomerVO ret = new CustomerVO();
    ret.setName(customer.getName());
    ret.setDateOfBirth(customer.getDateOfBirth());
    // ... and so on for every other property
    return ret;
  }
}&lt;/pre&gt;

&lt;p&gt;Or you end up using some form of reflection (or even XML) based property copying mechanism.&lt;/p&gt;
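
&lt;p&gt;A rough sketch of the reflection flavour, using the standard &lt;code&gt;java.beans&lt;/code&gt; introspection classes. The &lt;code&gt;PropertyCopier&lt;/code&gt; class and the cut-down &lt;code&gt;Customer&lt;/code&gt; and &lt;code&gt;CustomerVO&lt;/code&gt; beans are assumptions for illustration, and it naively assumes that same-named properties have compatible types:&lt;/p&gt;

&lt;pre class="brush:java"&gt;import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class PropertyCopier {
  // Hypothetical beans, trimmed to a single property.
  public static class Customer {
    private String name;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
  }

  public static class CustomerVO {
    private String name;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
  }

  // Copies each readable property of source to the same-named writable
  // property of target. No type checking: mismatches fail at run-time.
  public static void copy(Object source, Object target) throws Exception {
    PropertyDescriptor[] targetProps =
        Introspector.getBeanInfo(target.getClass(), Object.class).getPropertyDescriptors();
    PropertyDescriptor[] sourceProps =
        Introspector.getBeanInfo(source.getClass(), Object.class).getPropertyDescriptors();
    for (PropertyDescriptor from : sourceProps) {
      for (PropertyDescriptor to : targetProps) {
        boolean sameName = from.getName().equals(to.getName());
        if (sameName &amp;amp;&amp;amp; from.getReadMethod() != null &amp;amp;&amp;amp; to.getWriteMethod() != null) {
          to.getWriteMethod().invoke(target, from.getReadMethod().invoke(source));
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Customer customer = new Customer();
    customer.setName(&amp;quot;Ada&amp;quot;);
    CustomerVO vo = new CustomerVO();
    copy(customer, vo);
    System.out.println(vo.getName());
  }
}&lt;/pre&gt;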

&lt;p&gt;Apparently this sort of thing is deemed a &lt;em&gt;good idea&lt;/em&gt; (or is at least common practice). The problem of course is that if your interface mentions that class you’ve created a dependency.&lt;/p&gt;

&lt;p&gt;What’s more, Java programmers have a predilection for worrying about swapping out layers or plugging in alternative implementations, neither of which ever happens.&lt;/p&gt;

&lt;p&gt;I am a firm believer that lines of code are the enemy. You should have as few of them as possible. As a result, it is my considered opinion that you are better off passing around one object that you can dynamically change as needed than writing lots of boilerplate property copying. That copying is error-prone due to its sheer monotony and, because of subtle differences, can’t be solved (at least not completely) with automated tools.&lt;/p&gt;

&lt;p&gt;In Javascript of course you can just add properties and methods to classes (all instances) or individual instances as you see fit. Since Java doesn’t support that, it creates a problem for GWT: what do you use for your presentation objects? Libraries like ExtGWT have ended up treating everything as Maps (so where is your type safety?) that go through several translations (including to and from JSON).&lt;/p&gt;
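
&lt;p&gt;To make the type safety point concrete, here is a small sketch using a plain &lt;code&gt;java.util.Map&lt;/code&gt; as a stand-in for a map-backed model. It is not ExtGWT’s actual API, and the names are made up. The compiler accepts all of it; the mistake only surfaces when the value is read back.&lt;/p&gt;

&lt;pre class="brush:java"&gt;import java.util.HashMap;
import java.util.Map;

public class MapModelDemo {
  // Returns true if reading the property back blows up at run-time.
  static boolean ageLookupFails() {
    Map&amp;lt;String, Object&amp;gt; customer = new HashMap&amp;lt;String, Object&amp;gt;();
    customer.put(&amp;quot;name&amp;quot;, &amp;quot;Ada&amp;quot;);
    customer.put(&amp;quot;age&amp;quot;, &amp;quot;36&amp;quot;); // a String where an Integer was intended

    try {
      Integer age = (Integer) customer.get(&amp;quot;age&amp;quot;); // compiles fine
      return false;
    } catch (ClassCastException e) {
      return true; // caught only when the code runs
    }
  }

  public static void main(String[] args) {
    System.out.println(ageLookupFails());
  }
}&lt;/pre&gt;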

&lt;h3&gt;On Idioms&lt;/h3&gt;

&lt;p&gt;Managers and recruiters tend to place too much stock in what languages and frameworks you (as the programmer candidate) have used. Good programmers can (and do) pick up new things almost constantly. This applies to languages as well. Basic control structures are the same as are the common operations (at least with two languages within the same family ie imperative, functional, etc).&lt;/p&gt;

&lt;p&gt;Idioms are harder. A lot of people from say a Java, C++ or C# background when they go to something like PHP will try and recreate what they did in their “mother tongue”. This is nearly always a mistake.&lt;/p&gt;

&lt;p&gt;Object-oriented programming is the most commonly misplaced idiom. &lt;a href="http://michaelkimsal.com/blog/php-is-not-object-oriented/"&gt;PHP is not object-oriented&lt;/a&gt; (“object capable” is a more accurate description). Distaste for globals is another. Few things are truly global in PHP and serving HTTP requests is quite naturally procedural most of the time. As Joel notes in &lt;a href="http://www.joelonsoftware.com/articles/APIWar.html"&gt;How Microsoft Lost the API War&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A lot of us thought in the 1990s that the big battle would be between procedural and object oriented programming, and we thought that object oriented programming would provide a big boost in programmer productivity. I thought that, too. Some people still think that. It turns out we were wrong. Object oriented programming is handy dandy, but it's not really the productivity booster that was promised. The &lt;em&gt;real &lt;/em&gt;significant productivity advance we've had in programming has been from languages which manage memory for you automatically.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The point is that Java and Javascript have very different idioms. Well-designed Javascript code will do things quite differently to well-designed Java so by definition you’re losing something by converting Java to Javascript: idioms can’t be automagically translated.&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Scripting is the future. Long build and deploy steps are at odds with both industry trends and maximizing productivity. This trend has been developing for many years.&lt;/p&gt;

&lt;p&gt;Where once truly compiled languages (like C/C++ and not Java/C#, which are “compiled” into an intermediate form) accounted for the vast bulk of development, now they are the domain of the tools we use (Web browsers, operating systems, databases, Web servers, virtual machines, etc). They have been displaced by the “semi-compiled” managed platforms (Java and .Net primarily). Those too will have their niches but for an increasing amount of programming, they too will be displaced by more script-based approaches.&lt;/p&gt;

&lt;p&gt;GWT reminds me of trying to figure out the best way to implement a large-scale, efficient global messaging system using telegrams where everyone else has switched to email.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/273dOOsZJLhcareaM17NpSC0eBs/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/273dOOsZJLhcareaM17NpSC0eBs/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/273dOOsZJLhcareaM17NpSC0eBs/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/273dOOsZJLhcareaM17NpSC0eBs/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/2NFrIaHe7rQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/1251598093694832745/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/10/lost-in-translation-or-why-gwt-isnt.html#comment-form" title="61 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1251598093694832745?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1251598093694832745?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/2NFrIaHe7rQ/lost-in-translation-or-why-gwt-isnt.html" title="Lost in Translation or Why GWT Isn’t the Future of Web Development" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">61</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/10/lost-in-translation-or-why-gwt-isnt.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0ICQX07cSp7ImA9WxNVEk8.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-7948784112271408046</id><published>2009-10-22T19:10:00.001+08:00</published><updated>2009-10-22T23:32:40.309+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-22T23:32:40.309+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="opinion" /><category scheme="http://www.blogger.com/atom/ns#" 
term="windows" /><title>Microsoft, Marketing Insanity and Windows Piracy</title><content type="html">&lt;p&gt;Thursday marks Microsoft’s release of the Windows 7 operating system. This is an opportune moment to reflect on Microsoft’s marketing strategy because its like they want me to pirate Windows. And I feel the need to rant.&lt;/p&gt;  &lt;h3&gt;A Brief History of Windows&lt;/h3&gt;  &lt;p&gt;Windows 3.0 was released in 1990 in an attempt to stave off the successful (but expensive) Macintosh. And it certainly ticked off the boxes (from a marketing perspective at least).&lt;/p&gt;  &lt;p&gt;Various incarnations of Windows 3.x followed over the next 2-3 years. Probably the most interesting thing is that &lt;a href="$word = htmlentities(str_replace($search, $replace, $word), ENT_QUOTES);"&gt;Microsoft only stopped selling Windows 3.x licenses in November 2008&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Windows 95 (“Chicago”) in 1995. What some don’t realize is that Windows 95 was in many ways &lt;em&gt;technically&lt;/em&gt; superior to MacOS, most notably:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Pre-emptive multitasking rather than cooperative multitasking. Rather than waiting for an application to yield, the operating system could interrupt. Nothing new to UNIX but certainly new to Windows and MacOS; &lt;/li&gt;    &lt;li&gt;Virtual address spaces. Macs at the time had to allocate memory slices to programs. Win95 programs could simply ask for more memory. Depending on your hardware, this could be physical RAM or hard disk space. The OS could swap between them while the application was running too. Again, nothing new for UNIX. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The biggest impact of Windows 95 was that it killed off non-Microsoft DOS.&lt;/p&gt;  &lt;p&gt;Another notable feature was DirectX, Microsoft’s gaming API. This wasn’t part of the original release. 
It quickly supplanted OpenGL as a gaming API in the burgeoning world of hardware acceleration to the point that even stalwart advocates like &lt;a href="http://techreport.com/discussions.x/13237"&gt;id software are abandoning it&lt;/a&gt; but DirectX was a commercial success and the majority player almost a decade earlier.&lt;/p&gt;  &lt;p&gt;Separately Windows NT had sprung into existence to break the connection between Windows and the DOS shell. Windows NT 4.0 in 1996 was probably the first version with broad market success, targeted at businesses.&lt;/p&gt;  &lt;p&gt;The next notable release was Windows 2000, the successor to the venerable Windows NT 4.0, as it began the convergence of the NT and 9X (including ME) families. This culminated in 2001 with the release of Windows XP.&lt;/p&gt;  &lt;h3&gt;The Wintel Alliance&lt;/h3&gt;  &lt;p&gt;The rise of Windows and decline of IBM’s leadership of the PC coincided with marriage of convenience between Microsoft (Windows) and Intel or “Wintel”. This has never been a comfortable arrangement but it is based on a fairly simple principle:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Most people only buy operating systems with new computers.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;That conventional wisdom has since been disproven by Apple but more on that later.&lt;/p&gt;  &lt;p&gt;The dark side of this marriage is &lt;a href="http://en.wikipedia.org/wiki/Planned_obsolescence"&gt;planned obsolescence&lt;/a&gt;. Chipsets changing, RAM standards changing, CPU sockets changing and so on. Some of it’s necessary and understandable. 
Savvy consumers have long figured out that buying high quality (and high cost) components for their PCs for upgrades down the line is a waste of time and money.&lt;/p&gt;  &lt;p&gt;The biggest threat to Intel came from the disaster that was (and is) Itanium, AMD cutting them off at the knees with the hugely successful Athlon line of processors, AMD’s x86-64 instruction set putting the final nail in Itanium’s coffin and the disaster that was the Pentium 4. I say “disaster” but it was mixed. The gigahertz marketing campaign against AMD was successful. As a technology it was a disaster.&lt;/p&gt;  &lt;p&gt;Why do I say that? Because eventually it was abandoned as Intel returned to the Pentium 3 architecture with what became the hugely successful Pentium M, Core Solo, Core Duo and Core 2 Duo releases.&lt;/p&gt;  &lt;h3&gt;A New Millennium&lt;/h3&gt;  &lt;p&gt;For those of us who purchased (and typically built) PCs in the 1990s, it was worth upgrading your PC every year or two. The newer PCs were just that much better. The market, while large, was much smaller than it is today, to the point where Microsoft could count on a rapid turnover in PCs and a lot of purchases from first-time PC owners.&lt;/p&gt;  &lt;p&gt;As Joel Spolsky noted in &lt;a href="http://www.joelonsoftware.com/articles/APIWar.html"&gt;How Microsoft Lost the API War&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Microsoft just waited for the next big wave of hardware upgrades and sold Windows, Word and Excel to corporations buying their next round of desktop computers (in some cases their first round). So in many ways Microsoft never needed to learn how to get an installed base to switch from product N to product N+1.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;In the last 8-10 years PCs have gotten better but for increasingly more people it’s &lt;em&gt;enough&lt;/em&gt;. My father has an old PC cannibalized from parts I bought in 2002. It runs a (modern) browser, Word and Excel and that’s all he needs. 
There will be absolutely no need to upgrade that PC or purchase a new one until it dies. This is the case for most consumers and businesses.&lt;/p&gt;  &lt;p&gt;So rather than people buying a new operating system every 2-3 years, the music stopped playing when Windows XP was the OS &lt;em&gt;du jour&lt;/em&gt;. Everyone sat down and they haven’t moved since.&lt;/p&gt;  &lt;p&gt;Interestingly, Office suffered the same problem at roughly the same time. Office 97 was basically feature complete for 95%+ of all users. Every version since has been an attempt to get businesses to buy it for its enterprise tinselware. Sure there have been minor improvements but overall, Office 97 is it.&lt;/p&gt;  &lt;h3&gt;Fighting Fires&lt;/h3&gt;  &lt;p&gt;As the scope of Windows has grown over the years, Microsoft has been fighting fires to defend its franchise, including:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Java: “run anywhere” (well, write once, test everywhere) was a threat to the Windows lock-in; &lt;/li&gt;    &lt;li&gt;Games: OpenGL also threatened the lock-in since DirectX is Windows-specific; &lt;/li&gt;    &lt;li&gt;Developer Tools: once Borland was a major player. Now it’s all about Visual Studio; &lt;/li&gt;    &lt;li&gt;The Web: the free Internet Explorer was a desperate attempt to fend off Netscape that was ultimately successful. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The last one is important because even though Microsoft won the battle they lost the war. Microsoft’s hubris, breaking of backwards compatibility, ever-changing platforms and standards and so on probably accelerated the adoption of the Web as a platform for application delivery.&lt;/p&gt;  &lt;p&gt;Microsoft was once an innovator trying to get market share. It was then they were at their best. At some point companies become so large that they switch from being innovators to defenders. No longer are they concerned with making the best product. 
They are primarily concerned with defending what they already have.&lt;/p&gt;  &lt;h3&gt;The Madness Begins&lt;/h3&gt;  &lt;p&gt;Even before it was released, Vista (or Longhorn as it was called then) had a lot of people concerned. Microsoft had seemingly decided that it was OK to start breaking backwards compatibility.&lt;/p&gt;  &lt;p&gt;Faced with people only buying an operating system (meaning a PC) every 5-8 years, what did Microsoft do? They did what most marketing eggheads would do: they raised prices. Instead of getting $100 from a consumer every 2-3 years, let’s charge $250 every 5-8 years including revenue growth to please our shareholders.&lt;/p&gt;  &lt;p&gt;Earth to Microsoft: if you charge people &lt;em&gt;more&lt;/em&gt; they will buy &lt;em&gt;less&lt;/em&gt;.&lt;/p&gt;  &lt;h3&gt;Segmentation Insanity&lt;/h3&gt;  &lt;p&gt;A bigger problem was where there were &lt;em&gt;basically&lt;/em&gt; two versions of Windows XP (Home and Professional) ignoring the Server version. In classic &lt;a href="http://www.joelonsoftware.com/articles/CamelsandRubberDuckies.html"&gt;“how much money do you have?”&lt;/a&gt; pricing, there were now &lt;a href="http://www.microsoft.com/windows/windows-vista/compare-editions/default.aspx"&gt;four versions&lt;/a&gt;:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Home Basic &lt;/li&gt;    &lt;li&gt;Home Premium &lt;/li&gt;    &lt;li&gt;Business; and &lt;/li&gt;    &lt;li&gt;Ultimate &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;But it gets worse…&lt;/p&gt;  &lt;h3&gt;Retail, Upgrade or OEM?&lt;/h3&gt;  &lt;p&gt;Say what now? Try and explain this one to a non-techie. For Windows 7 this is actually worse. 
For example, &lt;a href="http://www.amazon.co.uk/s/ref=amb_link_85141893_3?ie=UTF8&amp;amp;rh=n:300435,k:B002DUCMTC|B002DGS82G|B002M78BPU&amp;amp;pf_rd_m=A3P5ROKL5A1OLE&amp;amp;pf_rd_s=center-sign-in&amp;amp;pf_rd_r=060PPGZN6JZ4JQM77WSX&amp;amp;pf_rd_t=101&amp;amp;pf_rd_p=473679253&amp;amp;pf_rd_i=248293031"&gt;Windows 7 Professional Upgrade is more expensive than Windows 7 Professional Retail&lt;/a&gt;. What the…?&lt;/p&gt;  &lt;h3&gt;32 or 64 bits?&lt;/h3&gt;  &lt;p&gt;&lt;em&gt;Seriously?&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;This is perhaps the most egregious transgression. Several months ago I had a conversation with a friend who has been using PCs for 15 years and was upgrading his computer about getting the right version of Vista (32 or 64) since he was thinking about getting 4GB or more of RAM. He’s reasonably proficient. Try explaining it to someone who isn’t.&lt;/p&gt;  &lt;p&gt;It reminds me of this &lt;a href="http://www.joelonsoftware.com/uibook/chapters/fog0000000059.html"&gt;classic UI blunder&lt;/a&gt;:&lt;/p&gt;  &lt;p&gt;&lt;img style="width: 470px" src="http://www.joelonsoftware.com/uibook/pictures/Stupidest_Dialog_Ever.gif" /&gt; &lt;/p&gt;  &lt;p&gt;Why are you asking consumers about questions they don’t understand and don’t care about when choosing which OS to buy? Just sell one version. If an advanced user wishes to install the 64 bit version, let them do so during installation.&lt;/p&gt;  &lt;h3&gt;Activations and DRM&lt;/h3&gt;  &lt;p&gt;One of the scourges of the last decade has been the rise of DRM (“digital rights management”). As I &lt;a href="http://www.cforcoding.com/2009/10/stackoverflow-advertising-and-ethics-of.html"&gt;previously said&lt;/a&gt;, people have an innate sense of fairness. 
If you tell them they don’t own something like a program, video or song even though they paid for it, they aren’t going to like it.&lt;/p&gt;  &lt;p&gt;Even Google got sucked into this sham (probably at the behest of Big Content) and discovered the downside (for them) when they had to &lt;a href="http://profy.com/2007/08/21/google-apologizes-offers-video-customers-full-refund/"&gt;refund users when they closed Google Video&lt;/a&gt;. Whoops.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.bit-tech.net/news/hardware/2006/10/26/Microsoft_clarifies_Vista_activation_to_bit-tech/1"&gt;Vista came with draconian activation limits&lt;/a&gt;. Microsoft eventually relented somewhat, particularly with the (pricey) retail version.&lt;/p&gt;  &lt;p&gt;Now compare that to a version I can find on The Pirate Bay that simply works and you begin to see that &lt;strong&gt;&lt;em&gt;DRM creates pirates&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;  &lt;h3&gt;Windows 7? It Gets Worse&lt;/h3&gt;  &lt;p&gt;Don’t believe me? Consider this table:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;img style="width: 550px" src="http://i.zdnet.com/blogs/windows-upgrade-chart-eb-remake-final3.png" /&gt; &lt;/p&gt;  &lt;p&gt;And that’s the simple version of the chart that doesn’t include 32 and 64 bit combinations. The upgrade costs have also changed as you can, say, upgrade from Windows Vista Home Premium to Windows 7 Professional.&lt;/p&gt;  &lt;p&gt;But there’s no other choice, right? Wrong.&lt;/p&gt;  &lt;h3&gt;Timing is Everything&lt;/h3&gt;  &lt;p&gt;From the mid to late 90s Apple was in the wilderness. Windows had eroded Mac market share to the small single digits. Several attempts were made to turn this around such as Apple’s purchase of Steve Jobs’ NeXT and the Rhapsody OS.&lt;/p&gt;  &lt;p&gt;With the return of the king, there were (eventually) two great successes. Firstly, the iPod and (more importantly) iTunes. 
Secondly, Jobs put the nail in the coffin for PowerPC by switching Apple hardware to Intel’s x86 architecture. Jobs cited “power efficiency” as the reason, which many found laughable given the Pentium 4’s power and heat issues.&lt;/p&gt;  &lt;p&gt;Jobs’ timing however was superb (and undoubtedly not luck). Intel’s Centrino platform became all-conquering.&lt;/p&gt;  &lt;h3&gt;Some Things Are Greater Than the Sum of Their Parts&lt;/h3&gt;  &lt;p&gt;This was a risky move. &lt;em&gt;Differentiation&lt;/em&gt; is a key component of Apple’s strategy. If Macs use the same hardware as PCs, why pay extra? Yet Macbook and iMac market share has recovered from around 2% to be &lt;a href="http://osxdaily.com/2009/10/14/apple-now-has-9-4-market-share-in-usa-unofficially-at-4-worldwide/"&gt;almost 10% of US shipments&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Part of Apple’s appeal has always been what I term “countercultural knockback”, meaning there is a certain group of people who will attach themselves to something—sometimes fanatically—in part because it &lt;em&gt;isn't&lt;/em&gt; popular. Another part of it is that Apple aims itself at the top end of the market quite deliberately. But a huge part that’s often overlooked by detractors is that the &lt;em&gt;whole package&lt;/em&gt; is attractive.&lt;/p&gt;  &lt;p&gt;Apple didn’t invent the concept of a sleek laptop, or a digital music player or a phone with Internet capability. They just did it better than anyone else.&lt;/p&gt;  &lt;h3&gt;One Size Fits All&lt;/h3&gt;  &lt;p&gt;Consumers don’t like being forced to make choices they don’t care about or don’t understand. 
Two years ago, Steve Jobs famously made fun of Vista segmentation during his WWDC keynote.&lt;/p&gt;  &lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:5737277B-5D6D-4f48-ABFC-DD9C333F4C5D:1412dea3-827c-4457-8caf-12cea5fe00a9" class="wlWriterSmartContent"&gt;&lt;embed height="355" type="application/x-shockwave-flash" width="425" src="http://www.youtube.com/v/7G5UUw9puhQ" wmode="transparent" /&gt;&lt;/div&gt;  &lt;h3&gt;Don’t Make Me Think&lt;/h3&gt;  &lt;p&gt;So I’m looking to buy a copy of Windows 7. I initially started on the Home Premium version. A friend pointed out to me that the XP Compatibility Mode, something I'm not certain I won't need, is only in the &lt;a href="http://www.microsoft.com/windows/windows-7/compare/default.aspx"&gt;Professional and Ultimate versions&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Microsoft’s policy on OEM versions takes some figuring out. As it turns out it all comes down to the motherboard. Change your motherboard and you need a new OEM version. This may or may not be enforced. I’ve been using Windows XP for over 7 years. I’ve had 3 motherboards in the last 2 years so I want the retail version instead.&lt;/p&gt;  &lt;p&gt;Well that’s going to cost $449 for the Professional version. Oddly, the Ultimate version is only $20 more. Was there really a need to differentiate between these two versions for $20? Why not just have one version that covers both?&lt;/p&gt;  &lt;p&gt;That’s a lot of money to pay for an operating system. Times have changed. No longer does a PC cost $3,000. You can buy a quad core box with 4GB of RAM for as little as $500. And I need to pay almost 100% of the hardware costs for Windows? Seriously? What does it do that XP doesn’t? Not a lot (that I care about anyway).&lt;/p&gt;  &lt;p&gt;Are you kidding me?&lt;/p&gt;  &lt;h3&gt;Fear, Uncertainty, Doubt&lt;/h3&gt;  &lt;p&gt;FUD has been a hallmark of tech marketing. 
Microsoft is no exception. Just last month, &lt;a href="http://www.computerworld.com/s/article/9138007/Microsoft_No_TCP_IP_patches_for_you_XP"&gt;Microsoft announced no TCP/IP patch for Windows XP&lt;/a&gt;, claiming the code was too old. Bullshit. It’s a marketing strategy to convince us we &lt;em&gt;need&lt;/em&gt; to upgrade.&lt;/p&gt;  &lt;p&gt;They tried it with Vista too. The long-awaited DirectX 10 update was Vista only. Microsoft marketing was to suggest you might not be able to run the latest games if you have XP instead of Vista (when most hardcore gamers were sticking with XP for performance reasons).&lt;/p&gt;  &lt;p&gt;Microsoft has been using FUD against Linux for years. There’s something amusing (even ironic) about them using FUD on one of their own products.&lt;/p&gt;  &lt;h3&gt;Lipstick on a Pig&lt;/h3&gt;  &lt;p&gt;What is Windows 7, really? I’ll give Microsoft props for one thing: the Windows 7 marketing is a success. A lot of people &lt;em&gt;are&lt;/em&gt; excited about it. I used the RC version for a few months and it’s not bad. The NTFS support is noticeably faster and I didn’t get those stupid “Preparing to delete” boxes when I deleted a directory tree. I must admit I also like to find programs by the Start Menu speed search.&lt;/p&gt;  &lt;p&gt;Could these features have been added on an XP base? Absolutely.&lt;/p&gt;  &lt;p&gt;Vista has a service pack already. As far as I’m concerned Windows 7 is just Vista Service Pack 2. How is it really different to Vista? The UAC security is slightly less annoying but it’s basically the same. Maybe wireless is a bit better but these are all incremental changes.&lt;/p&gt;  &lt;p&gt;The 90s were a pioneering period for personal computing where they went from niche to mainstream. Operating systems and applications are both mature now. Even Linus Torvalds has recognized this:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;APC&lt;/strong&gt;: When do you expect to see a kernel version 3.0? 
What will be the major changes or differences from the 2.6 series?&lt;/p&gt;    &lt;p&gt;&amp;#160;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;LT&lt;/strong&gt;: We really don't expect to need to go to a 3.0.x version at all: we've been very good at introducing even pretty big new features without impacting the code-base in a disruptive manner, and without breaking any old functionality.&lt;/p&gt;    &lt;p&gt;&amp;#160;&lt;/p&gt;    &lt;p&gt;That, together with the aforementioned lack of a marketing department that says &amp;quot;You have to increase the version number to show how good you are!&amp;quot; just means that we tend to just improve everything we can, but you're not likely to see a big &amp;quot;Get the new-and-improved version 3!&amp;quot; campaign.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Basically there (probably) won’t be a Linux 3.0. There’s no need. Microsoft needs to recognize that &lt;em&gt;need&lt;/em&gt; isn’t a factor for consumers. Whatever they have, it’s enough.&lt;/p&gt;  &lt;h3&gt;Old Versions Cost You Money&lt;/h3&gt;  &lt;p&gt;Anyone who has written software for a living knows this to be true: supporting old versions of your software costs you money. You want your customers to be on the latest version.&lt;/p&gt;  &lt;p&gt;Here &lt;a href="http://arstechnica.com/apple/news/2009/10/how-fast-are-mac-users-adopting-snow-leopard-pretty-fast.ars"&gt;Apple is clearly more successful at getting their users to upgrade&lt;/a&gt;, in part helped by the low (US$29) cost. You can even buy 5 licenses for home for US$49. &lt;a href="http://www.appleinsider.com/articles/09/10/19/apples_mac_os_x_snow_leopard_sales_double_previous_records.html"&gt;Each release seems to get bigger&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Cost is clearly a factor here. Detractors would argue Apple is charging for service packs. Maybe so. 
But it’s clear consumers prefer to pay less money more often.&lt;/p&gt;  &lt;h3&gt;Ship Early, Ship Often&lt;/h3&gt;  &lt;p&gt;The other way you cost yourself money is increasing the time between releases. Costs scale exponentially rather than linearly. If it takes you four years to ship a product it will probably cost you twice what it does to ship two products at two year intervals.&lt;/p&gt;  &lt;p&gt;Long releases tend to be over-ambitious releases. What’s more, there is a huge likelihood that market conditions will have changed by the time you release, so you end up spending a lot of effort changing an unshipped product before it even gets out the door. There is no better example than the &lt;a href="http://en.wikipedia.org/wiki/Duke_Nukem_Forever"&gt;Duke Nukem Forever debacle&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;And of course complexity is the enemy. The growth in &lt;a href="http://en.wikipedia.org/wiki/Source_lines_of_code#Example"&gt;Windows lines of code&lt;/a&gt; shows no signs of abating.&lt;/p&gt;  &lt;h3&gt;The Business Market&lt;/h3&gt;  &lt;p&gt;This is both Microsoft’s biggest source of revenue (for both Windows and Office licenses) and its biggest thorn in the side. It’s also a problem Apple does not have.&lt;/p&gt;  &lt;p&gt;Most companies buy PCs for their employees. To help with support costs they come up with a standard installation, called an SOE (“standard operating environment”). This version will then come on a CD that’ll install everything. It’s expensive to change and roll out. Most companies will have a Windows 2000 or XP based SOE.&lt;/p&gt;  &lt;p&gt;A ten year old PC running Win2K and Office 97 &lt;em&gt;still&lt;/em&gt; does its required job. This isn’t just being cheap. Why would you roll out new hardware that from a functionality perspective does the same thing? There’s no business case for it. 
What do you think a hospital will choose between new PCs of dubious utility and a $3 million MRI that’ll save some lives?&lt;/p&gt;  &lt;p&gt;So it’s understandable but these people are the bane of Web developers as they’re responsible for the dogged ~10% market share for Internet Explorer 6 too.&lt;/p&gt;  &lt;h3&gt;So How Does Microsoft Sell Operating Systems?&lt;/h3&gt;  &lt;p&gt;Good question, one for which Microsoft has no answer. It probably doesn’t help that the man at the helm (Steve Ballmer) isn’t a programmer. He’s not even a techie. He’s a business guy who thinks in terms of marketing, business strategy and gap analysis. At least Bill Gates was a programmer. Bill Gates’ Microsoft was an innovator no matter what else you could say about it or him.&lt;/p&gt;  &lt;p&gt;I have to agree with Jeff Atwood on this one. &lt;a href="http://www.codinghorror.com/blog/archives/001293.html"&gt;Microsoft is getting pricing wrong&lt;/a&gt;. Prices need to be low enough that it ceases to be a major purchase.&lt;/p&gt;  &lt;h3&gt;Microsoft Just Doesn’t “Get” Marketing&lt;/h3&gt;  &lt;p&gt;What can I say? &lt;a href="http://www.computerworld.com/s/article/9139725/Windows_7_launch_parties_Are_we_all_mad_?taxonomyId="&gt;Windows 7 Launch Parties&lt;/a&gt;? &lt;a href="http://finchannel.com/Main_News/Tech/50041_New_Retail_Stores_Connect_Consumers_With_the_Best_of_Microsoft/"&gt;Microsoft retail stores&lt;/a&gt; (hint: Apple had compelling consumer products rather than “me too!” wannabe products before they opened stores)? &lt;a href="http://tech.yahoo.com/blogs/null/104224"&gt;Vista ads with Jerry Seinfeld&lt;/a&gt;? I’m shaking my head.&lt;/p&gt;  &lt;h3&gt;So Which Windows 7 to Buy?&lt;/h3&gt;  &lt;p&gt;Non-coincidentally, Apple just cut the prices of Macbook Pros and released a new “low” cost white Macbook. In Australia this was at least in part due to the appreciation of the Australian dollar in recent months. 
A Macbook Pro 13 is now only a few hundred dollars more expensive than a (plastic) Dell Studio XPS 13.&lt;/p&gt;  &lt;p&gt;I’m not a .Net developer so I’m not tied to Windows. My favourite IDEs come from Jetbrains (Intellij IDEA and increasingly Web IDE) and they run on Windows, Macs and Linux.&lt;/p&gt;  &lt;p&gt;Windows virtualization and emulation (eg WINE) are getting sufficiently good that you can run Office 2007 under Ubuntu.&lt;/p&gt;  &lt;p&gt;I need to install Cygwin to get a workable command line on Windows anyway. It also makes working with Git easier.&lt;/p&gt;  &lt;p&gt;As a developer I’m finding the Macbook an increasingly attractive option. I only have three criticisms and concerns:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Apple reversed themselves on adopting ZFS as a replacement to (what &lt;a href="http://www.smh.com.au/news/technology/torvalds-pans-apples-os-x/2008/02/05/1202090393959.html"&gt;Linus Torvalds described as &amp;quot;scary&amp;quot;&lt;/a&gt;) HFS+ filesystem; &lt;/li&gt;    &lt;li&gt;Apple’s bizarre stance on delaying Java releases to integrate their look and feel. Java 6 for the Mac was almost a year late; and &lt;/li&gt;    &lt;li&gt;I’ll have to buy another copy of Civilization 4. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Linux is of course an option but I’ve been there and done that. Fact is, both Windows and MacOS are slicker than Gnome or KDE.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;Microsoft reminds me of an ageing housewife revelling in her high school glory days whose greatest achievement is that she still fits into her cheerleader outfit. Sure you were the popular girl once but that was 20 years ago. Times have changed.&lt;/p&gt;  &lt;p&gt;It isn’t 1995 anymore. Next Christmas most people will be fine with a $200-300 netbook. Why would such people pay that price again for an operating system (ever)? 
Getting existing users to upgrade is (or should be) a key strategy for Microsoft but like any incumbent, business weenies are now running the asylum and they’re more concerned about having room for revenue growth than in actually selling products people want to buy.&lt;/p&gt;  &lt;p&gt;Bring it on Apple!&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/7948784112271408046/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/10/microsoft-marketing-insanity-and.html#comment-form" title="9 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7948784112271408046?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/7948784112271408046?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/tHqU6fivvwo/microsoft-marketing-insanity-and.html" title="Microsoft, Marketing Insanity and Windows Piracy" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">9</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/10/microsoft-marketing-insanity-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkAHQXs5fCp7ImA9WxNXFEk.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-5297993532907126674</id><published>2009-10-01T23:25:00.001+08:00</published><updated>2009-10-02T07:45:30.524+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-02T07:45:30.524+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="stackoverflow" /><category scheme="http://www.blogger.com/atom/ns#" term="opinion" 
/><title>Stackoverflow, Advertising and the Ethics of a Free Lunch</title><content type="html">&lt;p&gt;If the Internet has taught us nothing else, it has taught us that:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Advertising pays for otherwise free services; &lt;/li&gt;    &lt;li&gt;People don’t like advertising; and &lt;/li&gt;    &lt;li&gt;Advertising works. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;These conflicting forces always cause consternation and &lt;a href="http://stackoverflow.com/"&gt;Stackoverflow&lt;/a&gt; is by no means immune.&lt;/p&gt;  &lt;h3&gt;Stackoverflow is Free&lt;/h3&gt;  &lt;p&gt;One of the most important features of Stackoverflow is that it is free to browse, ask and answer questions. People like free. It’s one reason I believe that Stackoverflow has been so well-received by programmers as a whole. Of course it has its detractors (most of whom seem to lurk on reddit) but as &lt;a href="http://www.research.att.com/~bs/bs_faq.html#really-say-that"&gt;Bjarne Stroustrup says&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;There are only two kinds of languages: the ones people complain about and the ones nobody uses.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The lesson being that &lt;em&gt;anything&lt;/em&gt;—not just programming languages—that’s popular will attract countercultural malcontents keen to assert their non-mainstream identities.&lt;/p&gt;  &lt;h3&gt;Stackoverflow Costs Money&lt;/h3&gt;  &lt;p&gt;While the content is community driven (and thus free), the site is not. It takes money for hosting, hardware, software development, administration, support issues (separate to community moderators) and so on. No one would argue with that. Yet there appears to be a disconnect between the fact that something costs money and the activities required to earn that money. 
Either that or people mentally file that away as &lt;a href="http://en.wikipedia.org/wiki/Somebody_Else's_Problem"&gt;Somebody Else’s Problem&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;So how does a “free” service pay for itself?&lt;/p&gt;  &lt;h3&gt;Micro-Transactions Don’t Work&lt;/h3&gt;  &lt;p&gt;An excellent resource on this is &lt;a href="http://www.shirky.com/writings/fame_vs_fortune.html"&gt;Fame vs Fortune: Micropayments and Free Content&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;This strategy doesn't work, because the act of buying anything, even if the price is very small, creates what Nick Szabo calls &lt;a href="http://szabo.best.vwh.net/micropayments.html"&gt;mental transaction costs&lt;/a&gt;, the energy required to decide whether something is worth buying or not, regardless of price.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Joel Spolsky has also spoken on this subject. In addition to the mental cost of transactions (no matter how small), Joel remarked on how people will do things for free that they will never do if paid (a small amount).&lt;/p&gt;  &lt;h3&gt;Segmentation Doesn’t Work&lt;/h3&gt;  &lt;p&gt;Market segmentation is the time-honoured technique of asking people how much money they have when they want to buy something rather than telling them what it costs, meaning what it costs is a function of how much money they have.&lt;/p&gt;  &lt;p&gt;Joel speaks about this in-depth in &lt;a href="http://www.joelonsoftware.com/articles/CamelsandRubberDuckies.html"&gt;Camels and Rubber Duckies&lt;/a&gt;.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Working my way backwards, this business about segmenting? It pisses the &lt;em&gt;heck&lt;/em&gt; off of people. People want to feel they're paying a fair price. They don't want to think they're paying extra just because they're not clever enough to find the magic coupon code. 
The airline industry got really, really good at segmenting and ended up charging literally a different price to every single person on the plane. As a result most people felt they weren't getting the best deal, and they didn't like the airlines.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Perhaps it’s more correct to say segmentation doesn’t work &lt;em&gt;in the long term&lt;/em&gt;.&lt;/p&gt;  &lt;h3&gt;Advertising Works&lt;/h3&gt;  &lt;p&gt;It’s clear that advertising works as a means of revenue. Why is it clear? If it didn’t, we wouldn’t have it. Of course, that doesn’t mean it works universally. It is obviously possible to lose money on advertising but it’s clearly possible to make money too.&lt;/p&gt;  &lt;p&gt;Traditional media typically lied about &lt;em&gt;conversion rates&lt;/em&gt;. Conversion rate is the percentage of visitors, users, viewers or listeners who see, hear or read an advert and then take some desirable action, which could be simply clicking through or could result in an inquiry, a sale or the like. Twenty years ago you’d have radio and TV marketing departments who would work up a model based on conversion rates of up to 25%. They did so because there was no way to refute their claims (other than taking the plunge and getting disappointed with the result). With the internet such things are precisely measurable. Because the cost of distribution is so low, conversion rates of 1 in 1000 (or less) are fine.&lt;/p&gt;  &lt;p&gt;The other proof that advertising works is spam. Possibly &lt;a href="http://arstechnica.com/old/content/2007/12/report-95-percent-of-all-e-mail-has-that-spammy-smell.ars"&gt;95% of email is spam&lt;/a&gt;, &lt;em&gt;if not more&lt;/em&gt;. 
Clearly the conversion rate is non-zero, otherwise they wouldn’t do it. So that one guy in 10,000 who can’t find porn on the Internet (somehow) or thinks a plastic bottle of oregano will really extend his… well, you know… is responsible for spam eclipsing legitimate email by a factor of roughly 20-to-1.&lt;/p&gt;  &lt;h3&gt;Registration to Read Annoys People&lt;/h3&gt;  &lt;p&gt;The Evil Hyphen Site (ie Experts Exchange; deliberately no link) exemplifies this point. You can read content for free on that site if you either know where to look for free registration (deliberately not obvious) or you get to the site from Google (even though it says “register to see the answer” the answer is at the bottom of the page; try it).&lt;/p&gt;  &lt;p&gt;This annoys people and is part of the reason that site has (justifiably) earnt so much hate.&lt;/p&gt;  &lt;p&gt;Sometimes this registration is simply offensive, like why do I need to provide you with my date of birth and home address to read your forum post? Of course one has to wonder about what the less scrupulous operators are doing with such private information but even if you’re reputable, &lt;em&gt;you don’t need it so why are you asking?&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;I can’t speak for anyone else, but as far as that sort of invasive information gathering goes, my name is Jimmy Hoffa, I’m 93 and I live in Afghanistan. I also run a banking company with a million employees and have an annual income of $10,000.&lt;/p&gt;  &lt;p&gt;Stackoverflow doesn’t even require you to register to &lt;em&gt;ask questions&lt;/em&gt;.&lt;/p&gt;  &lt;h3&gt;Alienate Your Community and You Have No Site&lt;/h3&gt;  &lt;p&gt;In this era of social sites (including crowd-sourced sites like Stackoverflow), &lt;em&gt;community matters&lt;/em&gt;. A given solution can succeed or fail on the strength of its community. 
The same solution in different communities may succeed in one and fail in another by virtue of the different communities.&lt;/p&gt;  &lt;p&gt;On a site like Stackoverflow the most important people are the ones who answer questions. This is a somewhat controversial opinion. The editors will disagree (or at least have a higher opinion of their worth). Don’t get me wrong: editing has value but no one celebrates the guy who edited The Great Gatsby, they celebrate F. Scott Fitzgerald.&lt;/p&gt;  &lt;p&gt;Such communities over time can become insular (arguably incestuous). The poster children for this are Wikipedia editors, who went so far as to have a &lt;a href="http://www.theregister.co.uk/2007/12/04/wikipedia_secret_mailing/"&gt;secret McCarthy-esque black list of &amp;quot;problem&amp;quot; users&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Lose your community and you lose your site. The Evil Hyphen Site has already done that.&lt;/p&gt;  &lt;h3&gt;Stackoverflow and Advertising&lt;/h3&gt;  &lt;p&gt;Originally Stackoverflow was quite light on for advertising, limited to a (mostly textual) right sidebar. The site started out looking much like this:&lt;/p&gt; &lt;img style="float: left" src="http://img242.imageshack.us/img242/5347/soad3.png" /&gt;   &lt;p style="clear: left"&gt;Now if you have less than 200 reputation it looks like this:&lt;/p&gt; &lt;img style="float: left" src="http://img242.imageshack.us/img242/7175/soad2.png" /&gt;   &lt;p style="clear: left"&gt;Interestingly, it only seems to look like this in Internet Explorer, even when I delete all my cookies. Firefox and Chrome (cookies deleted) still look like the original.&lt;/p&gt; &lt;img style="float: left" src="http://easycaptures.com/fs/uploaded/373/2656709805.png" /&gt;   &lt;p&gt;The difference? The right sidebar is &lt;a href="http://blog.stackoverflow.com/2009/03/"&gt;“higher contrast”&lt;/a&gt; and there is an ad banner at the top of the question (and another further down). 
The top ads, I believe, were once text only, which is far less invasive. But Jeff has stated there won’t be any Flash or animated ads.&lt;/p&gt;  &lt;p&gt;The latest controversy concerns the “offensive” advertising indicated to the left, along with the &amp;quot;offensive&amp;quot; Adobe icons.&lt;/p&gt;  &lt;p&gt;Call me crazy but I actually &lt;em&gt;like&lt;/em&gt; these Adobe symbols on anything Flash/Flex related. It makes them easier to spot and I think it adds value. Spotting an Adobe icon is easier than finding the exact text that you’re after.&lt;/p&gt;  &lt;p&gt;If you do find all questions tagged with one of these sponsored tags, you get this:&lt;/p&gt; &lt;img style="float: left" src="http://easycaptures.com/fs/uploaded/374/6459281781.png" /&gt;   &lt;p style="clear: left"&gt;Is this too much? In my opinion? No. Others (naturally) disagree. Some to the point that they’ve &lt;a href="http://meta.stackoverflow.com/questions/24065/user-script-to-remove-so-sponsored-tag-advertisements"&gt;written a script to remove such sponsored content&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Such activities, if done by a sufficiently large percentage of the userbase, undermine the site’s ability to generate the revenue that pays for its existence.&lt;/p&gt;  &lt;h3&gt;“But I Don’t Click on Ads Anyway!”&lt;/h3&gt;  &lt;p&gt;The first obvious rationalization is that basically ads don’t affect you. Bullshit. Ads do two things. They attempt to entice the user to take a particular action: clicking through, buying something and so on. They also simply raise awareness of a brand, product or service. This is all about &lt;a href="http://en.wikipedia.org/wiki/Mind_share"&gt;mind share&lt;/a&gt;. This one is subtle and hard to measure but if you see an ad or a logo often enough you’ll subconsciously recognize it.&lt;/p&gt;  &lt;h3&gt;“It’s Like Fast Forwarding Through Commercials”&lt;/h3&gt;  &lt;p&gt;No it isn’t. 
This defence was used in the &lt;a href="http://arstechnica.com/old/content/2001/11/2551.ars"&gt;ReplayTV lawsuit&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Yet, what the advertisers who are supporting TV are paying for is the &lt;em&gt;potential&lt;/em&gt; that you &lt;em&gt;might&lt;/em&gt; watch television ads. They know you might channel surf, get up and fart, go grab a smoke, or whatever. The challenge to the advertising agencies is to make commercials that you like to watch, that you &lt;em&gt;want&lt;/em&gt; to watch. By editing out the commercials entirely, &lt;em&gt;a priori&lt;/em&gt;, the networks can claim that ReplayTV in effect creates a derivative work that deprives them of the &lt;em&gt;possibility&lt;/em&gt; that you might actually watch the ads. It is that possibility that generates the value of their ad space, and if something like ReplayTV were widely used, those numbers would drop, big time.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The other way to look at this is that if no one saw the ads, no sponsor would pay for them. If half the audience skipped the ads, it would be worth half as much to the sponsor and so on.&lt;/p&gt;  &lt;h3&gt;“It’s Already Loaded!”&lt;/h3&gt;  &lt;p&gt;Irrelevant. Something that’s loaded but never seen is of no value to an advertiser. Also, revenue from advertising can come from simply placing the ad, clicking through the ad or some combination of the two.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://meta.stackoverflow.com/users/130914/adam-bellaire"&gt;Adam Bellaire&lt;/a&gt; claims:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;I don't think the SO guys are going to &amp;quot;not get paid&amp;quot; by a user script removing images and content &lt;i&gt;after it's already loaded.&lt;/i&gt; Nobody can tell who or how many people are using this thing.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;It could be argued that Adam believes advertisers are that clueless but it would be the height of naiveté. 
It’s far more likely that this is simply rationalisation.&lt;/p&gt;  &lt;h3&gt;“I Should Be Able to Opt Out.”&lt;/h3&gt;  &lt;p&gt;Adam once again pontificates:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;…this is a completely opt-in script!&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;So what? Does it magically cost nothing to provide the service for you specifically? I must’ve missed the “This packet is intended for Adam” bit in the TCP/IP packet structure so the telcos know not to charge for it. I’ve got a good mind to write to &lt;a href="http://www.kohala.com/start/"&gt;W. Richard Stevens&lt;/a&gt; so he can issue an emergency addendum to his book.&lt;/p&gt;  &lt;h3&gt;“But This is ME!”&lt;/h3&gt;  &lt;p&gt;All of this comes down to what I call the BITM (“But this is ME!”) syndrome, closely related to &lt;a href="http://en.wikipedia.org/wiki/NIMBY"&gt;NIMBY&lt;/a&gt; (“Not In My Backyard”). Examples include “I realize there is a speed limit… but this is ME!”, “I realize that I should stop at this almost red light… but this is ME!” and so on. Once again with Adam:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Or do you mean to say that as long as &lt;i&gt;some&lt;/i&gt; people see the sponsored ads, then it's okay? Because I agree with that,&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;To put it another way: I understand someone needs to pay for this, I just don’t see why it should be me.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://meta.stackoverflow.com/users/133733/alex-papadimoulis"&gt;Alex Papadimoulis&lt;/a&gt; succinctly rebuts this:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Ad blockers are like the fat bastards at the grocery store who take handful after handful of free samples. If everyone [Ed] did it, the system would collapse and everyone would lose out. We know it, they know it, and we all just roll our eyes as they stuff their face with cut-up hot dogs and go &amp;quot;whhaaaat?&amp;quot;. 
When they try to justify it (&amp;quot;it doesn't say &lt;i&gt;only one&lt;/i&gt;, not my fault they give it away!&amp;quot;), it just makes 'em look worse. As I always say, at least have the decency to admit you're a bastard&lt;/p&gt; &lt;/blockquote&gt;  &lt;h3&gt;The Innate Sense of Fairness&lt;/h3&gt;  &lt;p&gt;People have an &lt;a href="http://arstechnica.com/old/content/2008/06/exploring-the-neurochemistry-of-fairness.ars"&gt;innate sense of fairness&lt;/a&gt; to the point where it can be manipulated or predicted with neurochemistry. This can work for companies if they treat their users fairly or against them if they don’t.&lt;/p&gt;  &lt;p&gt;EA released Spore with a nauseating and invasive DRM system that &lt;a href="http://news.cnet.com/8301-10797_3-10046565-235.html"&gt;limited it to three activations&lt;/a&gt; (later changed to five). Microsoft tried the same thing with &lt;a href="http://www.bit-tech.net/news/hardware/2006/10/26/Microsoft_clarifies_Vista_activation_to_bit-tech/1"&gt;limited activations of OEM Vista&lt;/a&gt;. People naturally believe that if they buy something they should be able to reinstall it as many times as they want.&lt;/p&gt;  &lt;p&gt;A lawyer may argue that you haven’t bought the software, you’ve bought a license to use it in a limited way. But we’re not talking about what’s &lt;em&gt;legal&lt;/em&gt; here. We’re talking about what’s &lt;em&gt;fair&lt;/em&gt;.&lt;/p&gt;  &lt;p&gt;It’s this sense of fairness that will cause people to reject the underhanded tactics the Evil Hyphen Site uses to get you to subscribe, or to pirate some piece of software they’ve bought that has run out of activations (or simply installs a rootkit allowing your system to be hacked). So I guarantee you that if Jeff and Joel go too far with advertising the userbase will react. But we’re not there yet. Nowhere near it.&lt;/p&gt;  &lt;h3&gt;There’s No Such Thing as a Free Lunch&lt;/h3&gt;  &lt;p&gt;Sites cost money to develop, maintain and host. 
They have a &lt;em&gt;right&lt;/em&gt; to earn revenue to cover their costs and make a return on the investment they’ve made (and risked). So what’s &lt;em&gt;fair&lt;/em&gt;?&lt;/p&gt;  &lt;p&gt;My personal opinion is that icons on tags are OK, sponsored links at the top are OK (if you have a problem with that I guess you don’t use Google or pretty much any other search engine) and the side bar is OK. I find the graphical ads littered throughout the question a bit much but then again I have more than 200 reputation so don’t see them.&lt;/p&gt;  &lt;p&gt;For what it’s worth I think I’ve even clicked on a couple (&lt;a href="http://www.telerik.com/"&gt;Telerik&lt;/a&gt; and &lt;a href="http://www.spreadsheetgear.com/"&gt;SpreadsheetGear&lt;/a&gt; spring to mind) of the many thousands I’ve no doubt seen but, as mentioned, such a low conversion rate is to be expected.&lt;/p&gt;  &lt;p&gt;The euphemism “opt out” in this context is akin to “it’s OK to steal from the supermarket as long as no one else does it”. If you want good services like Stackoverflow to exist then &lt;em&gt;on principle alone&lt;/em&gt; you should be supporting them.&lt;/p&gt;  &lt;p&gt;Writing scripts to block ads is just selfish. What’s more if the ads offend you that much it suggests a certain irresponsibility, thoughtlessness, touchiness and intolerance that doesn’t speak well of your character.&lt;/p&gt;  &lt;p&gt;Seriously, get a clue.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-5297993532907126674?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/5297993532907126674/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/10/stackoverflow-advertising-and-ethics-of.html#comment-form" title="72 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/5297993532907126674?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/5297993532907126674?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/szwklptf4Go/stackoverflow-advertising-and-ethics-of.html" title="Stackoverflow, Advertising and the Ethics of a Free Lunch" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">72</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/10/stackoverflow-advertising-and-ethics-of.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkEDQHo6fSp7ImA9WxNQEUw.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-856290674033538103</id><published>2009-09-17T00:31:00.001+08:00</published><updated>2009-09-17T00:31:11.415+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-17T00:31:11.415+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="open source" /><category 
scheme="http://www.blogger.com/atom/ns#" term="php" /><category scheme="http://www.blogger.com/atom/ns#" term="html" /><title>Anyone Interested in a PHP Fluent Interface for Generating HTML?</title><content type="html">&lt;p&gt;For the past few days I’ve been toying with a set of classes in PHP that is basically a fluent interface for generating HTML. A basic usage example is:&lt;/p&gt;  &lt;pre class="brush:php"&gt;echo p(b('This'), ' is a ', i('test'))-&amp;gt;addClass('foo')-&amp;gt;html();&lt;/pre&gt;

&lt;p&gt;or alternatively:&lt;/p&gt;

&lt;pre class="brush:php"&gt;echo new Paragraph(
       new Bold('This'),
       ' is a ',
       new Italic('test'))-&amp;gt;addClass('foo')-&amp;gt;html();&lt;/pre&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;pre class="brush:php"&gt;echo new Paragraph()-&amp;gt;append(new Bold('This'))
                    -&amp;gt;append(' is a ')
                    -&amp;gt;append(new Italic('test'))
                    -&amp;gt;addClass('foo')
                    -&amp;gt;html();&lt;/pre&gt;

&lt;p&gt;which outputs&lt;/p&gt;

&lt;pre class="brush:xml"&gt;&amp;lt;p class=&amp;quot;foo&amp;quot;&amp;gt;&amp;lt;b&amp;gt;This&amp;lt;/b&amp;gt; is a &amp;lt;i&amp;gt;test&amp;lt;/i&amp;gt;&amp;lt;/p&amp;gt;&lt;/pre&gt;

&lt;p&gt;My goals in doing this were to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Create a terse fluent interface for generating HTML; &lt;/li&gt;

  &lt;li&gt;Create valid HTML. This code currently implements the HTML 4.01 Transitional standard so it'll generate an error if you try and append a &amp;lt;p&amp;gt; element to an &amp;lt;object&amp;gt; element, for example; &lt;/li&gt;

  &lt;li&gt;Correctly handle HTML escaping implicitly; and &lt;/li&gt;

  &lt;li&gt;Provide a level of jQuery-like manipulation functionality. &lt;/li&gt;
&lt;/ul&gt;
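
&lt;p&gt;To make the first form concrete, here is a minimal sketch of how such an interface could be wired up. Everything in it is hypothetical: the Element class and the p(), b() and i() helpers are stand-in names rather than the actual classes, and the HTML 4.01 validation is left out entirely.&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php
// Hypothetical sketch only: Element, p(), b() and i() are assumed names.
class Element {
  private $tag;
  private $children;
  private $classes = array();

  public function __construct($tag, array $children) {
    $this-&amp;gt;tag = $tag;
    $this-&amp;gt;children = $children;
  }

  // Returning $this is what makes the calls chainable.
  public function addClass($name) {
    $this-&amp;gt;classes[] = $name;
    return $this;
  }

  public function html() {
    $attrs = '';
    if ($this-&amp;gt;classes) {
      $attrs = ' class=&amp;quot;' . htmlspecialchars(implode(' ', $this-&amp;gt;classes)) . '&amp;quot;';
    }
    $inner = '';
    foreach ($this-&amp;gt;children as $child) {
      // Nested elements render themselves; plain strings are escaped.
      $inner .= ($child instanceof Element) ? $child-&amp;gt;html() : htmlspecialchars($child);
    }
    return &amp;quot;&amp;lt;{$this-&amp;gt;tag}{$attrs}&amp;gt;{$inner}&amp;lt;/{$this-&amp;gt;tag}&amp;gt;&amp;quot;;
  }
}

function p() { return new Element('p', func_get_args()); }
function b() { return new Element('b', func_get_args()); }
function i() { return new Element('i', func_get_args()); }

echo p(b('This'), ' is a ', i('test'))-&amp;gt;addClass('foo')-&amp;gt;html();&lt;/pre&gt;

&lt;p&gt;The chaining works because each mutator returns the object it was called on, and the implicit escaping lives in one place, html().&lt;/p&gt;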

&lt;p&gt;I did this largely out of curiosity, to see how it would turn out. I don't even know if I'll be using this with what I'm working on but, just in case, I have to ask: is this something others might be interested in?&lt;/p&gt;

&lt;p&gt;I haven't seen anything like it but that doesn't mean it hasn't already been done. If there is interest in it, even if it's just idle curiosity, I'll clean it up and open source it.&lt;/p&gt;
</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/856290674033538103/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/09/anyone-interested-in-php-fluent.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/856290674033538103?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/856290674033538103?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/_V-nXxa6t9M/anyone-interested-in-php-fluent.html" title="Anyone Interested in a PHP Fluent Interface for Generating HTML?" 
/><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">5</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/09/anyone-interested-in-php-fluent.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak8CRHo_eyp7ImA9WxNQEU0.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-1092210923623922049</id><published>2009-09-16T21:47:00.001+08:00</published><updated>2009-09-16T21:47:45.443+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-16T21:47:45.443+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="web development" /><category scheme="http://www.blogger.com/atom/ns#" term="php" /><category scheme="http://www.blogger.com/atom/ns#" term="smarty" /><title>PHP Smarty Tutorial: Caching and Versioning Static Content</title><content type="html">&lt;p&gt;I am a nascent advocate of using &lt;a href="http://www.smarty.net/"&gt;Smarty&lt;/a&gt; as a view templating engine in PHP. Some—including myself until recently—might ask “Why use a templating engine for PHP, which is a templating engine?” In this post I hope to give you one example of how Smarty can clean up code in a useful way.&lt;/p&gt;  &lt;h3&gt;Static Content&lt;/h3&gt;  &lt;p&gt;Static content refers to primarily images, Cascading Style Sheets (“CSS”) and JavaScript files but can also include such things as videos, audio clips and so on. 
All of this tends to be labelled “static content” as it is largely unchanging (as opposed to Web pages, which are largely dynamic).&lt;/p&gt;  &lt;p&gt;Static content can include things like static HTML files but in practice other methods tend to be more common for caching HTML content.&lt;/p&gt;  &lt;h3&gt;Caching&lt;/h3&gt;  &lt;p&gt;Static content can be cached in two places: on the client and on the server.&lt;/p&gt;  &lt;p&gt;Caching on the server isn’t typically an issue with truly static content. It’s more an issue with cached content that is generated in some way, such as the dynamically generated Javascript files from &lt;a href="http://www.cforcoding.com/2009/05/supercharging-javascript.html"&gt;Supercharging Javascript in PHP&lt;/a&gt;. Server-side caching is not the focus of this post.&lt;/p&gt;  &lt;p&gt;Client caching involves telling the browser to keep a copy of certain content and not request it from the server. There are several variations upon this theme:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Caching for a fixed period of time. For example, setting the HTTP Expires header to a time in the future; &lt;/li&gt;    &lt;li&gt;Allowing the browser to ask if the content has changed. For example, use of ETags in HTTP/1.1; and &lt;/li&gt;    &lt;li&gt;Some combination of the two. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;The one we’ll focus on is (1). It is a simple scheme but there is a trade-off: set the Expires header too far in the future and browsers keep using stale content after it changes; set it too near and they make spurious requests to the server when the content hasn’t changed.&lt;/p&gt;  &lt;h3&gt;Versioning&lt;/h3&gt;  &lt;p&gt;As a solution to the problem of far-future Expires headers on content that changes, a common scheme used by modern Websites is some form of versioning. That is, the Expires header is set in the far future (one or more years). When the content changes a new URL is generated. 
This forces the browser to reload the content.&lt;/p&gt;  &lt;p&gt;In my previous articles I used URLs like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://example.com/images/logo.1233454569.png"&gt;http://example.com/images/logo.1233454569.png&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The disadvantage of this is that it requires URL rewriting to work. I have since settled on a simpler scheme: putting the file’s last modified time in the query string. This means a URL like this:&lt;/p&gt;  &lt;p&gt;&lt;a title="http://example.com/images/logo.png?1233454569" href="http://example.com/images/logo.png?1233454569"&gt;http://example.com/images/logo.png?1233454569&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;No URL rewriting is required and it should work out of the box with Web servers.&lt;/p&gt;  &lt;h3&gt;Static Content Domain&lt;/h3&gt;  &lt;p&gt;Another optimization that is encouraged is to split your publicly available static content onto a separate domain. The reason is that the browser does not send your site’s cookies with requests to the other domain, which reduces traffic and overhead.&lt;/p&gt;  &lt;h3&gt;Image Dimensions&lt;/h3&gt;  &lt;p&gt;Lastly, images in particular can be optimized by specifying their height and width in the HTML. This can speed up page rendering and avoid unsightly visual artefacts as the page loads, such as content jittering or shifting position.&lt;/p&gt;  &lt;h3&gt;Directory Structure&lt;/h3&gt;  &lt;p&gt;This example assumes the following directory structure:&lt;/p&gt;  &lt;pre class="brush:plain"&gt;/var/www                    Top-level on host
+- /site                    Document root of dynamic domain
+- /static                  Document root of static domain
   +- /images               Images subdirectory
+- /lib                     Third-party code
   +- /Smarty-2.6.26        Smarty install files
+- /include                 PHP files not served by the Web server and thus not under the document root
   +- Mysmarty.class.php    Custom Smarty template generation. See Smarty installation notes.
   +- /templates            Smarty templates
   +- /templates_c          Compiled Smarty templates
   +- /cache                Cached Smarty content
   +- /config               Smarty config files
   +- /plugins              Smarty plugins&lt;/pre&gt;

&lt;p&gt;Your structure may of course vary. I will say I prefer to put nothing under the document root that isn’t the endpoint of an HTTP request.&lt;/p&gt;

&lt;p&gt;This example assumes your Apache, IIS, nginx or other Web server is already configured to send the far-future Expires header for static content.&lt;/p&gt;
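
&lt;p&gt;For reference, the headers such a configuration sends can be sketched in PHP. This is only an illustration: far_future_headers() is a hypothetical helper, the one-year lifetime is an assumption, and for truly static files the Web server would normally set these headers itself.&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php
// Hypothetical helper: far-future caching headers for a response.
function far_future_headers($lifetime, $now) {
  return array(
    // HTTP dates are always expressed in GMT.
    'Expires'       =&amp;gt; gmdate('D, d M Y H:i:s', $now + $lifetime) . ' GMT',
    'Cache-Control' =&amp;gt; 'public, max-age=' . $lifetime,
  );
}

// One year (an assumed lifetime), counted from the current time.
foreach (far_future_headers(365 * 24 * 60 * 60, time()) as $name =&amp;gt; $value) {
  header($name . ': ' . $value);
}&lt;/pre&gt;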

&lt;h3&gt;Mysmarty.class.php&lt;/h3&gt;

&lt;p&gt;A common technique with Smarty is to create a custom subclass of Smarty that sets all the correct configuration. This looks something like this:&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php
define('BASE_DIR',      '/var/www/');
define('LIB_DIR',       BASE_DIR . 'lib/');
define('INCLUDE_DIR',   BASE_DIR . 'include/');

// separate static site
define('STATIC_URL',    'http://mystatic.com/');
define('STATIC_DIR',    BASE_DIR . 'static/');

// use the following if everything is on one site
//define('STATIC_URL',    '/');
//define('STATIC_DIR',    BASE_DIR . 'site/');

// Smarty directories
define('TEMPLATES_DIR', INCLUDE_DIR . 'templates/');
define('CONFIG_DIR',    INCLUDE_DIR . 'config/');
define('COMPILE_DIR',   INCLUDE_DIR . 'templates_c/');
define('CACHE_DIR',     INCLUDE_DIR . 'cache/');
define('PLUGINS_DIR',   INCLUDE_DIR . 'plugins/');

// static content
define('IMAGES_DIR',    STATIC_DIR . 'images/');

// URLs
define('IMAGES_URL',    STATIC_URL . 'images/');

// Use the absolute path so the include does not depend on the current directory.
require LIB_DIR . 'Smarty-2.6.26/libs/Smarty.class.php';

class MySmarty extends Smarty {
  function MySmarty() {
    $this-&amp;gt;Smarty();

    $this-&amp;gt;template_dir = TEMPLATES_DIR;
    $this-&amp;gt;compile_dir  = COMPILE_DIR;
    $this-&amp;gt;config_dir   = CONFIG_DIR;
    $this-&amp;gt;cache_dir    = CACHE_DIR;
    $this-&amp;gt;plugins_dir  = PLUGINS_DIR;
  }
}
?&amp;gt;&lt;/pre&gt;

&lt;p&gt;You should be able to adapt the above to whatever directory structure you use.&lt;/p&gt;

&lt;h3&gt;Smarty Image Plugin&lt;/h3&gt;

&lt;p&gt;To demonstrate this, I’ll use just one example: images. You can however easily apply this to other forms of static content.&lt;/p&gt;

&lt;p&gt;Assume the following code exists and is accessible to the plugin:&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php
$standard_attributes = array('class', 'dir', 'id', 'lang', 'style', 'title');

function standard_attributes(array $attributes) {
  global $standard_attributes;
  $ret = '';
  foreach ($standard_attributes as $attr) {
    if (isset($attributes[$attr])) {
      $ret .= attribute($attr, $attributes[$attr]);
    }
  }
  return $ret;
}

function attribute($name, $value) {
  return ' ' . $name . '=&amp;quot;' . htmlspecialchars($value) . '&amp;quot;';
}

function build_image($params, &amp;amp;$smarty) {
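  // Test the array entry itself so a missing 'src' does not raise a notice.
  if (!isset($params['src'])) {
    return &amp;quot;[No src to image]&amp;quot;;
  }
  $src = $params['src'];
  $file = IMAGES_DIR . $src;
  // filemtime() warns on a missing file, so check that it exists first.
  $mtime = is_file($file) ? filemtime($file) : false;
  if ($mtime === false) {
    return &amp;quot;[Image '$src' not found]&amp;quot;;
  }
  // Only read the image when the caller left out a dimension.
  $size = null;
  if (!isset($params['height']) || !isset($params['width'])) {
    $size = getimagesize($file);
  }
  $height = isset($params['height']) ? $params['height'] : $size[1];
  $width = isset($params['width']) ? $params['width'] : $size[0];
  $alt = htmlspecialchars(isset($params['alt']) ? $params['alt'] : '');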
  $attribs = standard_attributes($params);
  $url = IMAGES_URL . htmlspecialchars($src);
  return &amp;lt;&amp;lt;&amp;lt;END
&amp;lt;img alt=&amp;quot;$alt&amp;quot; src=&amp;quot;$url?$mtime&amp;quot; width=&amp;quot;$width&amp;quot; height=&amp;quot;$height&amp;quot; $attribs&amp;gt;
END;
}
?&amp;gt;&lt;/pre&gt;

&lt;p&gt;It makes sense to put the above into Mysmarty.class.php or at least include/require it there.&lt;/p&gt;

&lt;p&gt;In the Smarty plugins directory, you need to create a file called function.image.php.&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php
function smarty_function_image($params, &amp;amp;$smarty) {
  // $smarty is already a reference parameter; call-time &amp;amp; is an error in PHP 5.4+.
  return build_image($params, $smarty);
}
?&amp;gt;&lt;/pre&gt;

&lt;p&gt;You may well ask &amp;quot;why not put the code in the plugin directly instead of calling a function?&amp;quot; and you'd be right except for one thing: I will reuse it for another plugin but more on that later.&lt;/p&gt;

&lt;p&gt;This plugin does several things:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;It verifies that the specified image actually exists in the expected location; &lt;/li&gt;

  &lt;li&gt;It gets the last modified time (mtime) of the image file; &lt;/li&gt;

  &lt;li&gt;It prepends the correct URI to the image name including the static domain if there is one; &lt;/li&gt;

  &lt;li&gt;The plugin user can specify as many or as few attributes of the image tag as they wish. This version is written to handle most that you would expect to encounter and is easily extensible to include others; &lt;/li&gt;

  &lt;li&gt;It uses getimagesize() to get the height and width of the image, but only if the user does not specify both (the function is documented alongside PHP’s GD functions but does not itself require the GD library). User-supplied values are used in preference. &lt;/li&gt;
&lt;/ol&gt;
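
&lt;p&gt;Steps 2 and 3 are the heart of the versioning scheme and can be shown in isolation. A minimal sketch, assuming a hypothetical versioned_url() helper (not part of Smarty or of the plugin above) that takes the base URL and directory as arguments rather than reading the constants:&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php
// Hypothetical helper: append the file's last modified time as a query string.
function versioned_url($base_url, $base_dir, $path) {
  $file = $base_dir . $path;
  if (!is_file($file)) {
    return false; // nothing to version
  }
  return $base_url . $path . '?' . filemtime($file);
}&lt;/pre&gt;

&lt;p&gt;With the constants defined earlier, versioned_url(IMAGES_URL, IMAGES_DIR, 'logo.png') would return something like http://mystatic.com/images/logo.png?1233454569.&lt;/p&gt;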

&lt;p&gt;That’s a lot. In a template it is nothing more than:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;{image src=&amp;quot;logo.gif&amp;quot;}
{image src=&amp;quot;top.png&amp;quot; height=&amp;quot;100&amp;quot; width=&amp;quot;300&amp;quot; alt=&amp;quot;Some alt text&amp;quot;}
etc&lt;/pre&gt;

&lt;p&gt;You can of course do that in standard PHP. I've previously created functions like auto_version() just for this purpose but it's nowhere near as readable and maintainable. The pure PHP version would have to pass in an array of parameters like so:&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php echo auto_version(array('src' =&amp;gt; 'logo.gif', 'height' =&amp;gt; '100', 'width' =&amp;gt; '200', 'alt' =&amp;gt; 'Some alt text')); ?&amp;gt;&lt;/pre&gt;

&lt;p&gt;But it gets even better.&lt;/p&gt;

&lt;h3&gt;Compiler Plugins&lt;/h3&gt;

&lt;p&gt;The astute reader will point out the inefficiency of this: every page request will access the image file once or twice, first to get the mtime and then to get the height and width (if not specified by the user). What’s more, the second operation is arguably more expensive since it necessitates reading in the image file and processing it.&lt;/p&gt;

&lt;p&gt;The plugin we just created is, in Smarty terms, a &lt;a href="http://www.smarty.net/manual/en/plugins.functions.php"&gt;template function&lt;/a&gt;. The function is called with every view of the template. An alternative is to use a &lt;a href="http://www.smarty.net/manual/en/plugins.compiler.functions.php"&gt;compiler function&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The basic lifecycle of a template is:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;If the template file is newer than the compiled template or there is no compiled template, compile it; &lt;/li&gt;

  &lt;li&gt;Templates are compiled into PHP files that are put in the ‘compile_dir’ directory (typically ‘templates_c’); &lt;/li&gt;

  &lt;li&gt;Caching and other config can change this. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A template function is called on each execution. A compiler function is only called when the template is compiled into PHP.&lt;/p&gt;

&lt;p&gt;Many of your images probably change very infrequently, so reading the dimensions and the last modified time on every request is overkill. This makes them good candidates for a compiler function: these values will only be calculated once.&lt;/p&gt;

&lt;p&gt;The downside is that if the image does change you will need to regenerate the compiled templates. This is easily done by simply deleting the relevant contents of the compile directory.&lt;/p&gt;
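
&lt;p&gt;Deleting the compiled templates can be scripted. A minimal sketch, assuming the compiled templates are the .php files directly inside the compile directory; clear_compiled_dir() is a hypothetical helper, and Smarty 2 also provides its own clear_compiled_tpl() method for this:&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php
// Hypothetical helper: delete compiled templates so they are rebuilt on next use.
// Assumes the compiled files are the *.php files directly inside $compile_dir.
function clear_compiled_dir($compile_dir) {
  $files = glob(rtrim($compile_dir, '/') . '/*.php');
  if ($files === false) {
    return 0; // glob() failed, nothing removed
  }
  $removed = 0;
  foreach ($files as $file) {
    if (unlink($file)) {
      $removed++;
    }
  }
  return $removed;
}&lt;/pre&gt;

&lt;p&gt;For example, clear_compiled_dir(COMPILE_DIR) could be run as part of deploying new images.&lt;/p&gt;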

&lt;h3&gt;Static Image Plugin&lt;/h3&gt;

&lt;p&gt;We will create another version of our plugin that will be executed at template compile time. We will call this one ‘static_image’. If it had the same name (‘image’) as the template function, the compiler function would take precedence (ie the other would not be called).&lt;/p&gt;

&lt;p&gt;There are several differences with template functions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The file is named compiler.static_image.php; &lt;/li&gt;

  &lt;li&gt;The function is named smarty_compiler_static_image(); &lt;/li&gt;

  &lt;li&gt;The raw argument (eg ‘src=”logo.png” height=”100”’) is passed as a string to the plugin instead of an array of key-value pairs; and &lt;/li&gt;

  &lt;li&gt;A compiler function generates PHP statements instead of HTML markup. &lt;/li&gt;
&lt;/ol&gt;
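
&lt;p&gt;To illustrate point 3, this is roughly what turning the raw argument string into key-value pairs involves. It is a stand-alone sketch and not Smarty’s own parser: parse_tag_attrs() is a hypothetical function that only handles simple name=value pairs with quoted or bare values.&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php
// Hypothetical stand-in for attribute parsing; not Smarty's implementation.
function parse_tag_attrs($raw) {
  $attrs = array();
  // (?| ... ) is a branch-reset group: whichever alternative matches
  // (double quoted, single quoted or bare), the value lands in group 2.
  $pattern = '/(\w+)\s*=\s*(?|&amp;quot;([^&amp;quot;]*)&amp;quot;|\'([^\']*)\'|(\S+))/';
  preg_match_all($pattern, $raw, $matches, PREG_SET_ORDER);
  foreach ($matches as $match) {
    $attrs[$match[1]] = $match[2];
  }
  return $attrs;
}

print_r(parse_tag_attrs('src=&amp;quot;logo.png&amp;quot; height=100'));&lt;/pre&gt;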

&lt;p&gt;Bearing all this in mind, we need two functions: one to parse the raw text argument and another to generate the PHP statements. Also, the build_image() function needs to handle a string parameter.&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php
function build_params($tag_attrs, Smarty &amp;amp;$smarty) {
  $args = $smarty-&amp;gt;_parse_attrs($tag_attrs);
  $ret = array();
  foreach ($args as $k =&amp;gt; $v) {
    $ret[$k] = $smarty-&amp;gt;_dequote($v);
  }
  return $ret;
}

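function php_echo($text) {
  // var_export() yields a correctly quoted PHP string literal, so quotes,
  // backslashes and $ in $text are all safe in the generated statement.
  return 'echo ' . var_export($text, true) . ';';
}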

function build_image($params, &amp;amp;$smarty) {
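  // The compiler plugin passes the raw attribute string; parse it first.
  if (!is_array($params)) {
    $params = build_params($params, $smarty);
  }
  // Test the array entry itself so a missing 'src' does not raise a notice.
  if (!isset($params['src'])) {
    return &amp;quot;[No src to image]&amp;quot;;
  }
  $src = $params['src'];
  $file = IMAGES_DIR . $src;
  // filemtime() warns on a missing file, so check that it exists first.
  $mtime = is_file($file) ? filemtime($file) : false;
  if ($mtime === false) {
    return &amp;quot;[Image '$src' not found]&amp;quot;;
  }
  // Only read the image when the caller left out a dimension.
  $size = null;
  if (!isset($params['height']) || !isset($params['width'])) {
    $size = getimagesize($file);
  }
  $height = isset($params['height']) ? $params['height'] : $size[1];
  $width = isset($params['width']) ? $params['width'] : $size[0];
  $alt = htmlspecialchars(isset($params['alt']) ? $params['alt'] : '');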
  $attribs = standard_attributes($params);
  $url = IMAGES_URL . htmlspecialchars($src);
  return &amp;lt;&amp;lt;&amp;lt;END
&amp;lt;img alt=&amp;quot;$alt&amp;quot; src=&amp;quot;$url?$mtime&amp;quot; width=&amp;quot;$width&amp;quot; height=&amp;quot;$height&amp;quot; $attribs&amp;gt;
END;
}
?&amp;gt;&lt;/pre&gt;

&lt;p&gt;With these changes and additions, the compiler plugin becomes:&lt;/p&gt;

&lt;pre class="brush:php"&gt;&amp;lt;?php
function smarty_compiler_static_image($tag_attrs, &amp;amp;$smarty) {
  return php_echo(build_image($tag_attrs, $smarty));
}
?&amp;gt;&lt;/pre&gt;

&lt;p&gt;Using it is as simple as:&lt;/p&gt;

&lt;pre class="brush:plain"&gt;{static_image src=&amp;quot;logo.png&amp;quot;}&lt;/pre&gt;

&lt;p&gt;&lt;i&gt;This is something not easily done in pure PHP.&lt;/i&gt;&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;This only scratches the surface of what Smarty is capable of, but I hope it provides a useful demonstration of what makes Smarty so good.&lt;/p&gt;

&lt;p&gt;What’s more this functionality comes at very low cost since Smarty is simply compiled into straight PHP. Barring the initial compile, the only cost on a per-request basis is one to see if the template has changed (and thus needs a recompile). In production environments this check can even be disabled for further performance.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/xmsi_gXFQuRTyLZVynWr9NXh20c/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/xmsi_gXFQuRTyLZVynWr9NXh20c/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/xmsi_gXFQuRTyLZVynWr9NXh20c/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/xmsi_gXFQuRTyLZVynWr9NXh20c/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/TvAUgDWDxb0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/1092210923623922049/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/09/php-smarty-tutorial-caching-and.html#comment-form" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1092210923623922049?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/1092210923623922049?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/TvAUgDWDxb0/php-smarty-tutorial-caching-and.html" title="PHP Smarty Tutorial: Caching and Versioning Static Content" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">6</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/09/php-smarty-tutorial-caching-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0MCQHkyeip7ImA9WxNRFE8.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-8588625789022570319</id><published>2009-09-08T23:40:00.001+08:00</published><updated>2009-09-08T23:57:41.792+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-08T23:57:41.792+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="git" /><category scheme="http://www.blogger.com/atom/ns#" term="methodology" 
/><category scheme="http://www.blogger.com/atom/ns#" term="hosting" /><title>Windows Git Tutorial: Cygwin, SSH and Projectlocker</title><content type="html">&lt;p&gt;Recently I’ve switched from using Subversion to using &lt;a href="http://git-scm.com/"&gt;Git&lt;/a&gt; for version control on personal projects. Now as much as Windows annoys me (and believe me when I say it does annoy me), I still prefer to use it as a development environment (for PHP or Java) over Ubuntu (which I used for that sort of thing for a year or two). There is a better selection of tools. The UI is better. And to top it off, Civilization 4 is easier to run on Windows.&lt;/p&gt;  &lt;h3&gt;Why Git?&lt;/h3&gt;  &lt;p&gt;There are quite a few comparisons between Subversion and Git out there. Ultimately they’re just different philosophies and approaches. For me it came down to:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;I like the Git philosophy of being able to distribute patches without creating history on your central repository. At work, doing a code review usually entails doing a checkin (and yes I know you can create svn diffs); &lt;/li&gt;    &lt;li&gt;Git allows you to push your changes to several repositories. This is great for redundancy. I’m paranoid about losing source code. Anyone who writes software should be paranoid about becoming the next &lt;a href="http://it.slashdot.org/article.pl?sid=09/05/15/0138204"&gt;Avsim&lt;/a&gt;; &lt;/li&gt;    &lt;li&gt;You can use Git for deployment to your hosting provider; and &lt;/li&gt;    &lt;li&gt;All the cool kids are doing it. :-) &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;That being said, Git does have some downsides:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Subversion has better tooling and IDE integration; &lt;/li&gt;    &lt;li&gt;Getting Git to work right on Windows is a hair-pulling experience (which is the point of this post); &lt;/li&gt;    &lt;li&gt;With Subversion you are the user that’s set up for you in the repository. 
In Git, you are who you say you are, which takes a bit of getting used to; &lt;/li&gt;    &lt;li&gt;Obviously, working with multiple repositories is by its nature more complicated than a single central repository; and &lt;/li&gt;    &lt;li&gt;Subversion deals with binary files better and more easily than Git does. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;One piece of advice I’ll give you is this: &lt;strong&gt;If your Windows username has a space in it, &lt;em&gt;change it&lt;/em&gt;&lt;/strong&gt;. I was bashing my head against a brick wall for a while before I figured out that my Git commands were failing to remotely authenticate because, as best I could tell, the Python script didn’t handle the case where the username contained a space.&lt;/p&gt;  &lt;h3&gt;Why Cygwin?&lt;/h3&gt;  &lt;p&gt;The nicest environment to have is one where you don’t have to type in your password every time you want to do a pull or push to a remote repository. Good security suggests you should do such things over SSH.&lt;/p&gt;  &lt;p&gt;I wasted a lot of time with firstly the bundled git-bash and then with PuTTY. I had problems with git-bash like when doing a git push over ssh it would inexplicably hang. It also seemed to be leaking file descriptors or something because it reached a point where it wouldn’t connect to anything, not even things I’d successfully connected to.&lt;/p&gt;  &lt;p&gt;PuTTY I usually use under sufferance. It does the job but I find myself just wanting a good command line (rather than say plink.exe and a command prompt).&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Do yourself a favour and use Cygwin from Day One&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;I’m currently using &lt;a href="http://www.cygwin.com/#beta-test"&gt;Cygwin 1.7.0&lt;/a&gt; even though it’s in beta. When you install it, you have to select the packages you want. 
SSH (openssh) is not installed by default so select that under Net or search for it and select it.&lt;/p&gt;  &lt;p&gt;One of the (many) nice things about Cygwin is that it creates a virtual filesystem for you where you can mount different directories at different points using the mount command. For example:&lt;/p&gt;  &lt;pre class="brush:bash"&gt;mount c:/xampp/htdocs /www&lt;/pre&gt;

&lt;p&gt;creates a convenient spot for your deployment directory.&lt;/p&gt;

&lt;p&gt;If you’re not familiar with a Unix/Linux shell like bash and the typical Unix/Linux commands, do yourself a favour and get comfortable with them.&lt;/p&gt;

&lt;h3&gt;Why Projectlocker?&lt;/h3&gt;

&lt;p&gt;One of the things I wanted was a remote backup for my source code that is separate to my hosting account. Some may consider this overly paranoid. I think it’s good practice to have a local repository, your production code deployed on your hosting provider and a remote backup of your development work since that won’t necessarily be pushed to the hosting provider.&lt;/p&gt;

&lt;p&gt;I compared several private hosting alternatives including the ever-popular &lt;a href="http://github.com/"&gt;Github&lt;/a&gt;, &lt;a href="http://unfuddle.com/"&gt;Unfuddle&lt;/a&gt; and &lt;a href="http://www.projectlocker.com/"&gt;Projectlocker&lt;/a&gt;. For me, Projectlocker’s &lt;a href="https://www.projectlocker.com/scenario/startup"&gt;free version&lt;/a&gt; had the best features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Both Subversion and Git; &lt;/li&gt;

  &lt;li&gt;Multiple active projects; &lt;/li&gt;

  &lt;li&gt;Unlimited repositories; &lt;/li&gt;

  &lt;li&gt;RAID storage and nightly backups; &lt;/li&gt;

  &lt;li&gt;500MB storage; and &lt;/li&gt;

  &lt;li&gt;&lt;strong&gt;SSL encryption.&lt;/strong&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These services all differ in what products they integrate (eg Wikis, bug/issue tracking, project management and so on) so you really need to do your own research to find what’s right for you.&lt;/p&gt;

&lt;p&gt;The one downside of Projectlocker is their website is rather primitive and at times slow to respond (with certain pages). But once you’re set up you should have almost no need to go to it again.&lt;/p&gt;

&lt;h3&gt;Setting up SSH&lt;/h3&gt;

&lt;p&gt;Using public key encryption is far more convenient than being asked for your password every time you want to access the remote repository.&lt;/p&gt;

&lt;p&gt;First, create a public key if you don’t have one already:&lt;/p&gt;

&lt;pre class="brush:bash"&gt;username@hostname:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/username/.ssh/id_rsa):
Created directory '/home/username/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/username/.ssh/id_rsa.
Your public key has been saved in /home/username/.ssh/id_rsa.pub.
The key fingerprint is:
63:17:a4:06:78:34:9d:4f:2d:4c:7f:8f:84:5a:4b:c4 username@hostname
username@hostname:~$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEArGyNxx5nct8vic01GxKPybSxL6ZNBMcOrtwo0dMSU0qN9l4pxAuX0jjJe4Pl/I121NztTDdiIntOTaYiQQXTQ2NP8MD5X1oyr7svs8Rm50zpQwOQ3rt4MvwgotZZjMoETT39fA3soRoQLQS5LzD0W7cvVdSoTGYcL+lfv2f0xQdS+CgCrRTeiwn7KYgD6arZLo6B1wVPCAmaiyo1Hetu1q7UIzh4dCACUy4BymlLAxon0NRWAhEADKltZqMitnPgDqtRXyMLUzEn6AvIotRprK7LoPzvLqz2MgBfzTne13Dz8LFOPhbM2n7cSf/OEUt+TtKZqvIoUb79smKDsf2aXw== username@hostname&lt;/pre&gt;

&lt;p&gt;Next, go to the &lt;a href="http://portal.projectlocker.com"&gt;Projectlocker Portal&lt;/a&gt;, log in and, if you haven’t done so already, create a Git repository. Your Git URL will be under “User Home” on the left-hand menu. Also on the left-hand menu is “Manage Public Keys”. Click “New Key” and enter:&lt;/p&gt;

&lt;p&gt;&lt;img style="width: 500px" src="http://easycaptures.com/fs/uploaded/351/3956077181.png" /&gt;&lt;/p&gt;

&lt;p&gt;Save that and assuming everything has been done correctly you should be able to go to your local machine and type:&lt;/p&gt;

&lt;pre class="brush:bash"&gt;username@host:~$ ssh git-CompanyName@freeN.projectlocker.com
PTY allocation request failed on channel 0
ERROR:gitosis.serve.main:Need SSH_ORIGINAL_COMMAND in environment.
Connection to freeN.projectlocker.com closed.&lt;/pre&gt;

&lt;p&gt;You don't have shell access but as long as you don't get prompted for your password, it should all be working correctly. You can get more detailed information with:&lt;/p&gt;

&lt;pre class="brush:bash"&gt;username@host:~$ ssh -v git-CompanyName@freeN.projectlocker.com&lt;/pre&gt;

&lt;p&gt;or even more verbose:&lt;/p&gt;

&lt;pre class="brush:bash"&gt;username@host:~$ ssh -vv git-CompanyName@freeN.projectlocker.com&lt;/pre&gt;
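
&lt;p&gt;If you get tired of typing the full user and host, an entry in ~/.ssh/config can shorten it. This is only a sketch: the alias “projectlocker” and the key path are placeholders of mine, and CompanyName and freeN are the same stand-ins used above.&lt;/p&gt;

&lt;pre class="brush:plain"&gt;# ~/.ssh/config (the alias name and key path are assumptions)
Host projectlocker
    HostName freeN.projectlocker.com
    User git-CompanyName
    IdentityFile ~/.ssh/id_rsa
    IdentitiesOnly yes&lt;/pre&gt;

&lt;p&gt;With that in place, “ssh projectlocker” should behave like the longer command, and a Git URL such as projectlocker:project.git should resolve the same way.&lt;/p&gt;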

&lt;h3&gt;Local Git Repository&lt;/h3&gt;

&lt;p&gt;There are multiple ways of doing this. You can create a local repository and push it to Projectlocker or you can simply clone it, which is what I’ll do:&lt;/p&gt;

&lt;pre class="brush:bash"&gt;username@hostname:~$ git config --global user.name &amp;quot;Your Name&amp;quot;
username@hostname:~$ git config --global user.email &amp;quot;youremail@example.com&amp;quot;
username@hostname:~$ git clone git-CompanyName@freeN.projectlocker.com:project.git
Initialized empty Git repository in /home/username/project/.git/
remote: Counting objects: 118, done.
remote: Compressing objects: 100% (96/96), done.
remote: Total 118 (delta 15), reused 111 (delta 15)
Receiving objects: 100% (118/118), 184.08 KiB | 69 KiB/s, done.
Resolving deltas: 100% (15/15), done.&lt;/pre&gt;

&lt;p&gt;Your output may look different as I'm cloning a populated repository in this case. The first two statements tell Git who you are. Any commits will be identified by that name and email, and this carries over when you push to remote repositories. This is what I meant by Subversion users needing to get used to this. Rather than being who the remote server authorizes you as, you are who you say you are.&lt;/p&gt;

&lt;p&gt;An alternative way to set this up is:&lt;/p&gt;

&lt;pre class="brush:bash"&gt;username@hostname:~/work$ git init
Initialized empty Git repository in /home/username/work/.git/
username@hostname:~/work$ git remote add project git-CompanyName@freeN.projectlocker.com:project.git
username@hostname:~/work$ git pull project master
remote: Counting objects: 126, done.
remote: Compressing objects: 100% (102/102), done.
remote: Total 126 (delta 17), reused 111 (delta 15)
Receiving objects: 100% (126/126), 184.87 KiB | 78 KiB/s, done.
Resolving deltas: 100% (17/17), done.
From git-CompanyName@freeN.projectlocker.com:project
 * branch            master     -&amp;gt; FETCH_HEAD&lt;/pre&gt;
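
&lt;p&gt;On the identity point: the --global settings above apply to every repository, but Git also lets you override them for a single repository by leaving out --global. Here is a small sketch using a throwaway repository; the name and email are placeholders of mine, not anything Projectlocker requires.&lt;/p&gt;

&lt;pre class="brush:bash"&gt;# Sketch: per-repository identity in a scratch repository (placeholder values).
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
# Without --global these values are stored in this repository's .git/config only.
git config user.name "Work Name"
git config user.email "work@example.com"
git commit --quiet --allow-empty -m "identity check"
# Show the author recorded on the commit we just made.
git log -1 --format="%an %ae"&lt;/pre&gt;

&lt;p&gt;The last command should print whatever you configured, which is a quick way to confirm who Git thinks you are before you push.&lt;/p&gt;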

&lt;p&gt;Now try the following test:&lt;/p&gt;

&lt;pre class="brush:bash"&gt;username@hostname:~/project$ cat &amp;gt; test.txt
This is a test
^D
username@hostname:~/project$ git add test.txt
username@hostname:~/project$ git remote
origin
username@hostname:~/project$ git commit -m &amp;quot;adding test file&amp;quot;
[master 4811b38] adding test file
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 test.txt
username@hostname:~/project$ git push origin master
Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 282 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To git-CompanyName@freeN.projectlocker.com:project.git
   a6a6bf9..4811b38  master -&amp;gt; master
username@hostname:~/project$ git rm test.txt
rm 'test.txt'
username@hostname:~/project$ git commit -m &amp;quot;getting rid of test&amp;quot;
[master 172d57d] getting rid of test
 1 files changed, 0 insertions(+), 1 deletions(-)
 delete mode 100644 test.txt
username@hostname:~/project$ git push origin master
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 243 bytes, done.
Total 2 (delta 1), reused 0 (delta 0)
To git-CompanyName@freeN.projectlocker.com:project.git
   3edc309..bccaeeb  master -&amp;gt; master&lt;/pre&gt;

&lt;p&gt;Here we’ve simply added a test file, committed it, pushed the whole master branch back to Projectlocker and then removed the file again. The &amp;quot;git remote&amp;quot; command lists our remote repositories.&lt;/p&gt;
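
&lt;p&gt;Since redundancy was one of my reasons for choosing Git, here is a rough sketch of pushing the same branch to more than one remote. To keep it self-contained, two local bare repositories stand in for Projectlocker and a hosting provider; the remote names “backup” and “hosting” and all of the paths are made up for illustration.&lt;/p&gt;

&lt;pre class="brush:bash"&gt;# Sketch: one working repository, two remotes (local stand-ins, assumed names).
set -e
work=$(mktemp -d)
git init --quiet --bare "$work/backup.git"
git init --quiet --bare "$work/hosting.git"
git init --quiet "$work/project"
cd "$work/project"
git config user.name "Your Name"
git config user.email "youremail@example.com"
echo "This is a test" | tee test.txt
git add test.txt
git commit --quiet -m "adding test file"
git remote add backup "$work/backup.git"
git remote add hosting "$work/hosting.git"
# Push the current commit to a master branch on each remote in turn.
for remote in backup hosting; do
  git push --quiet "$remote" HEAD:refs/heads/master
done
# List the configured remotes.
git remote&lt;/pre&gt;

&lt;p&gt;In real use you would replace the local paths with your actual remote URLs. If one remote is unreachable, the other still has your history.&lt;/p&gt;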

&lt;h3&gt;Remote Deployment&lt;/h3&gt;

&lt;p&gt;To do this cleanly, you require SSH shell access to your hosting provider or remote server. Rather than write this out myself, I’ll point you to the excellent &lt;a href="http://toroid.org/ams/git-website-howto"&gt;Using Git to manage a web site&lt;/a&gt; for how to set this up.&lt;/p&gt;
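
&lt;p&gt;To give a flavour of the general idea, here is a minimal local simulation of hook-based deployment. It is my own sketch under stated assumptions, not a copy of the linked guide: a bare repository plays the part of the server, a post-receive hook checks the pushed master branch out into a “www” directory, and every path here is a temporary placeholder.&lt;/p&gt;

&lt;pre class="brush:bash"&gt;# Sketch: deploy on push via a post-receive hook (all paths are placeholders).
set -e
base=$(mktemp -d)
mkdir "$base/www"
git init --quiet --bare "$base/site.git"
mkdir -p "$base/site.git/hooks"
# The hook checks the master branch out into the web directory after each push.
printf '%s\n' '#!/bin/sh' "GIT_WORK_TREE=$base/www git checkout -f master" | tee "$base/site.git/hooks/post-receive"
chmod +x "$base/site.git/hooks/post-receive"
git init --quiet "$base/dev"
cd "$base/dev"
git config user.name "Your Name"
git config user.email "youremail@example.com"
echo "hello" | tee index.html
git add index.html
git commit --quiet -m "first page"
git push --quiet "$base/site.git" HEAD:refs/heads/master
# The pushed file should now exist in the deployment directory.
cat "$base/www/index.html"&lt;/pre&gt;

&lt;p&gt;On a real server the bare repository and the web directory would live on the remote machine and you would push over SSH, which is why shell access is needed.&lt;/p&gt;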

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;I hope you’ve found this guide useful. It can be a daunting task to figure out exactly where to start and how to begin with a new tool like Git, particularly because there are so many ways you can use it. I truly hope this saves you some grief in getting it set up and working on Windows.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/r5ME6_oBFnN3Qyj0xujxfugO69M/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/r5ME6_oBFnN3Qyj0xujxfugO69M/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/r5ME6_oBFnN3Qyj0xujxfugO69M/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/r5ME6_oBFnN3Qyj0xujxfugO69M/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/FVo5v0YCTS0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/8588625789022570319/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/09/windows-git-tutorial-cygwin-ssh-and.html#comment-form" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/8588625789022570319?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/8588625789022570319?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/FVo5v0YCTS0/windows-git-tutorial-cygwin-ssh-and.html" title="Windows Git Tutorial: Cygwin, SSH and Projectlocker" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/09/windows-git-tutorial-cygwin-ssh-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0EERnk8eip7ImA9WxNREEg.&quot;"><id>tag:blogger.com,1999:blog-336308386934546555.post-4382335172597854037</id><published>2009-09-04T16:06:00.001+08:00</published><updated>2009-09-04T16:06:47.772+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-04T16:06:47.772+08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="php" /><category scheme="http://www.blogger.com/atom/ns#" term="smarty" 
/><title>PHP and Smarty: What IDE?</title><content type="html">&lt;p&gt;This week I’ve decided to use &lt;a href="http://www.smarty.net/"&gt;Smarty&lt;/a&gt; for a personal project. I’ve previously been leery of using a template engine with PHP because PHP is a template engine. You can embed code in what are otherwise HTML documents.&lt;/p&gt;  &lt;h3&gt;Why Smarty?&lt;/h3&gt;  &lt;p&gt;But this week I’ve come around to the Smarty way of thinking for several reasons.&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;It has &lt;a href="http://www.smarty.net/manual/en/"&gt;excellent documentation&lt;/a&gt;. Every function is documented (as far as I can tell) and there are sections on relevant advanced topics. As regular readers will know, I have been &lt;a href="http://www.cforcoding.com/2009/08/its-time-we-stopped-rewarding-projects.html"&gt;critical of projects lacking documentation&lt;/a&gt;;&lt;/li&gt;    &lt;li&gt;It’s compiled to PHP anyway so apart from an initial “compile”, there is no performance cost for using it;&lt;/li&gt;    &lt;li&gt;It simplifies a lot of common tasks like odd-even table row styling, rendering combo boxes with a particular item selected, etc;&lt;/li&gt;    &lt;li&gt;There are extensive &lt;a href="http://www.smarty.net/manual/en/language.modifiers.php"&gt;variable modifiers&lt;/a&gt;;&lt;/li&gt;    &lt;li&gt;There is very little magic going on. By “magic” I basically mean auto-loading and other things that introduce code you don’t necessarily expect. Smarty is very straightforward and is obviously designed for high-performance use, which I really appreciate;&lt;/li&gt;    &lt;li&gt;Smarty is extensible with its &lt;a href="http://www.smarty.net/manual/en/plugins.php"&gt;plugin architecture&lt;/a&gt; and input and output filters.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;Ok, so decision made: use Smarty. Now what?&lt;/p&gt;  &lt;h3&gt;Where are the Tools?&lt;/h3&gt;  &lt;p&gt;I still haven’t found a PHP IDE that I’m comfortable with. 
In part, this is probably due to using &lt;a href="http://www.jetbrains.com/idea/"&gt;Intellij IDEA&lt;/a&gt;, which as far as I’m concerned is by far the best Java IDE out there. So much so that many Visual Studio users consider &lt;a href="http://www.jetbrains.com/resharper/"&gt;Resharper&lt;/a&gt; a “must have” extension for .Net development. Resharper and Intellij are both by &lt;a href="http://www.jetbrains.com/"&gt;Jetbrains&lt;/a&gt;. Resharper adds a lot of the refactoring tools that Intellij has to Visual Studio.&lt;/p&gt;  &lt;p&gt;So what am I looking for?&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Syntax highlighting;&lt;/li&gt;    &lt;li&gt;Code completion;&lt;/li&gt;    &lt;li&gt;Being able to navigate to and from templates, plugins, etc; and&lt;/li&gt;    &lt;li&gt;Some form of validation (eg plugin file naming).&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;The code completion and syntax highlighting should, at a minimum, apply to PHP, HTML, Javascript and CSS. Ideally it would also include good jQuery support but that’s just a nice-to-have.&lt;/p&gt;  &lt;p&gt;This isn’t an exhaustive review of these tools, merely an overview based on a fairly narrow set of requirements.&lt;/p&gt;  &lt;h3&gt;Eclipse PDT&lt;/h3&gt;  &lt;p&gt;I have never quite &lt;a href="http://en.wikipedia.org/wiki/Grok"&gt;grokked&lt;/a&gt; the Eclipse way of thinking. The whole perspectives thing where your whole IDE changes when you are, say, debugging never sat right with me. To each their own I guess. You also have the well-known malaise known as &lt;a href="http://www.eclipsezone.com/eclipse/forums/t60989.html"&gt;Eclipse plugin hell&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Still, I’ve tried it a couple of times and it’s OK but it doesn’t understand Smarty templates. 
There is a project to add &lt;a href="http://code.google.com/p/smartypdt/"&gt;Smarty support to Eclipse&lt;/a&gt; but it seems to be abandoned or dead and doesn’t seem to work with Eclipse 3.5.&lt;/p&gt;  &lt;h3&gt;Netbeans&lt;/h3&gt;  &lt;p&gt;For PHP I’ve used Netbeans far more than Eclipse. I can’t give you a concrete reason for this other than to say that I just found it less bloated and more straightforward. But unfortunately there is no Smarty support and it appears there are no plans to add it.&lt;/p&gt;  &lt;h3&gt;Komodo Edit&lt;/h3&gt;  &lt;p&gt;I only tried the free version (Komodo IDE is commercial) but it has no Smarty support. I’ve read some people rave about this tool. Personally I found it uninspiring.&lt;/p&gt;  &lt;h3&gt;PhpEd&lt;/h3&gt;  &lt;p&gt;There are a lot of things I like about PhpEd but it is commercial. It has one glaring problem though: You can’t change the font size of menus and other UI components. You can change the editing pane but the rest is derived from the OS DPI settings. Apparently this is because of the framework PhpEd is written in from what I could gather from the forums. Still, that’s a problem. There’s a huge difference between a 24” 1920x1200 display and a 15” laptop display that’s also 1920x1200. Changing the DPI setting in the OS is fraught with problems so changing it is not always a viable option.&lt;/p&gt;  &lt;p&gt;That being said, I am generally most impressed with this IDE out of all the others. The price (US$210) isn’t that high but it can be a lot for a personal project considering how much you can get for free elsewhere.&lt;/p&gt;  &lt;p&gt;One thing I don’t like is that you have to buy a Windows or a Linux version. Why? Compare this to Intellij. You simply buy a license and you can use it on Windows, Mac or Linux. Why should it matter if I want to change my OS? It’s also a greatly different version (3.3.3 on Linux, 5.8 on Windows). Is there a difference? 
Is Linux lagging behind?&lt;/p&gt;  &lt;p&gt;PhpEd does understand Smarty templates though so it has to get a big plus for that.&lt;/p&gt;  &lt;h3&gt;CodeLobster PHP&lt;/h3&gt;  &lt;p&gt;I’ve been testing &lt;a href="http://www.codelobster.com/"&gt;Code Lobster&lt;/a&gt; for a few days. It’s an interesting product. What I like about it:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;It’s lightweight at under 10 megs in size;&lt;/li&gt;    &lt;li&gt;The UI is modern-looking and snappy;&lt;/li&gt;    &lt;li&gt;It has support for Smarty, Joomla, Wordpress, CodeIgniter, Drupal and jQuery;&lt;/li&gt;    &lt;li&gt;It is quite cheap and you only pay for those plugins you use, basically.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;That being said, I did come up with a few things that need some more work.&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;No “live” search. I’ve been spoilt by Intellij on this one. This is where the cursor moves as you type so you can stop typing as soon as you get to where you want. I consider this a “must have” feature of any editor. The venerable search box needs to be retired;&lt;/li&gt;    &lt;li&gt;The search/replace functionality was a bit awkward in how it searches multiple files. It doesn’t list all the matches that you can just click on. Searching a subfolder is awkward;&lt;/li&gt;    &lt;li&gt;Making changes to the filesystem externally requires a manual ‘Reload Project’. Something like Netbeans or Intellij will just find changed files quickly;&lt;/li&gt;    &lt;li&gt;There doesn’t seem to be a way of moving files around within the IDE;&lt;/li&gt;    &lt;li&gt;Some of the auto-formatting takes a bit of getting used to. For example, you type an opening brace and it advances automatically to the next line and indents. That’s not bad. You just need to get used to it.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;I can’t say this is an exhaustive list and I’m not saying it’s a bad product. Overall I like it. 
It’s just a bit rough around the edges.&lt;/p&gt;  &lt;h3&gt;Web IDE&lt;/h3&gt;  &lt;p&gt;In the last month, Jetbrains has announced the development of &lt;a href="http://blogs.jetbrains.com/idea/2009/08/web-ide-&amp;mdash;-intellij-idea-for-html-and-php-developers/"&gt;Web IDE&lt;/a&gt; in standard and PHP editions. You can download an &lt;a href="http://www.jetbrains.net/confluence/display/WI/Web+IDE+EAP;jsessionid=2EA455B03CBB806EE3F0F41F06D3C50E"&gt;Early Access Preview release&lt;/a&gt; and try it out. To me this is the Great White Hope of the PHP IDE market because I have such enormous respect for Intellij’s code editor, which to me is the most important part of any IDE.&lt;/p&gt;  &lt;p&gt;I’ve had a brief look at the PHP support and it still seems like there is a long way to go but I’ll certainly keep an eye on it. Based on Jetbrains’ track record, I have a lot of hope for this one but as Eclipse has shown, making a good Java IDE is different to making a good PHP IDE and as soon as you try to do everything you end up being good at nothing. I usually describe this condition as “doing everything badly”.&lt;/p&gt;  &lt;p&gt;Also, it seems Jetbrains will be rolling this functionality into the fully-featured Intellij IDEA 9 (Maia) release so if you happen to get that version anyway (which I will) you’ll get it for “free”.&lt;/p&gt;  &lt;p&gt;The good thing about Intellij is that it’s equally usable on Mac, Linux and Windows and you can use your license for any of those. Out of the box it comes with Subversion integration and pretty much everything else you need. For me, this sure beats the plugin hell of Eclipse like figuring out which Subversion plugin to use (Subclipse or Subversive).&lt;/p&gt;  &lt;h3&gt;It’s the Little Things&lt;/h3&gt;  &lt;p&gt;There are plenty of people who happily develop using a text editor such as Textmate, Notepad++, Vim or Emacs. 
You can but there’s nothing like an IDE that supports left-clicking on a class to open the other class. Or an IDE that understands what framework you use (eg understanding Spring is particularly useful for a Java IDE).&lt;/p&gt;  &lt;p&gt;Several of the programs I looked at did things like have code completion for HTML, CSS and Javascript to the point that you could press Ctrl+space and get a list of values for a CSS property. All of those things are useful.&lt;/p&gt;  &lt;p&gt;One of the things that really impresses me about Intellij is the little ways the editing makes your life easier. This can be as simple as you type “mysql_query(“ and it inserts a closing parenthesis for you. Lots of programs do that but in Intellij if you then type a closing parenthesis, it moves your cursor after the automatic one. No spurious parenthesis is added and I don’t have to navigate over the automatic one with cursor keys or mouse. This speeds up how fast you can type and helps not break your flow.&lt;/p&gt;  &lt;p&gt;That may seem like a small thing but there are literally &lt;strong&gt;&lt;em&gt;hundreds&lt;/em&gt;&lt;/strong&gt; of things like that in Intellij to the point that I now look for them in anything else I use, for better or for worse.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;I guess the downside of using a dynamic language like PHP is that it’s harder to do anything like validation and navigation.&lt;/p&gt;  &lt;p&gt;For my money, at this point in time, I would have to say that PhpEd is the best of the bunch if Smarty is important to you but keep an eye on Web IDE/Intellij IDEA when the next version is released by the end of the year (Q4 2009).&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img src="http://c.statcounter.com/counter.php?sc_project=4738793&amp;amp;java=0&amp;amp;security=26803be4&amp;amp;invisible=1" alt="." 
border="0" height="1" width="1" /&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/336308386934546555-4382335172597854037?l=www.cforcoding.com' alt='' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/F880qEbnKk-yOlxt0W2SEtlgnWc/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/F880qEbnKk-yOlxt0W2SEtlgnWc/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/F880qEbnKk-yOlxt0W2SEtlgnWc/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/F880qEbnKk-yOlxt0W2SEtlgnWc/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/CForCoding/~4/BzDH9zEQNj0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://www.cforcoding.com/feeds/4382335172597854037/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.cforcoding.com/2009/09/php-and-smarty-what-ide.html#comment-form" title="11 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4382335172597854037?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/336308386934546555/posts/default/4382335172597854037?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/CForCoding/~3/BzDH9zEQNj0/php-and-smarty-what-ide.html" title="PHP and Smarty: What IDE?" /><author><name>William Shields</name><uri>http://www.blogger.com/profile/18356811199950883367</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="07140129710674369084" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">11</thr:total><feedburner:origLink>http://www.cforcoding.com/2009/09/php-and-smarty-what-ide.html</feedburner:origLink></entry></feed>
