<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>developers.hover.in</title>
	
	<link>http://developers.hover.in/blog</link>
	<description>the developer lounge at hover.in...</description>
	<pubDate>Tue, 27 Oct 2009 10:13:06 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" /><atom:link rel="hub" href="http://superfeedr.com/hubbub" />		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/hoverinDevelopers" /><feedburner:info uri="hoverindevelopers" /><item>
		<title>hoverlets and a case study : Zemanta API</title>
		<link>http://feedproxy.google.com/~r/hoverinDevelopers/~3/gcjaFqWPj9s/</link>
		<comments>http://developers.hover.in/blog/2009/hoverlets-and-case-study-zemanta-api/#comments</comments>
		<pubDate>Sat, 20 Jun 2009 18:46:27 +0000</pubDate>
		<dc:creator>Bosky</dc:creator>
		
		<category><![CDATA[posts]]></category>

		<category><![CDATA[casestudy]]></category>

		<category><![CDATA[demo]]></category>

		<category><![CDATA[hoverlet]]></category>

		<category><![CDATA[library]]></category>

		<category><![CDATA[semantic]]></category>

		<category><![CDATA[zemanta]]></category>

		<guid isPermaLink="false">http://developers.hover.in/blog/?p=160</guid>
		<description><![CDATA[In this post we see :

What is a hoverlet?
Creation of a hoverlet
Usage of a hoverlet
Case Study : A Zemanta Hoverlet

What is a hoverlet?
When we launched the concept of a hoverlet, we wanted it to be as extensible as possible with respect to what was possible. We decided not to go for proprietary packaging and app [...]]]></description>
			<content:encoded><![CDATA[<p>In this post we see :</p>
<ol>
<li>What is a hoverlet?</li>
<li>Creation of a hoverlet</li>
<li>Usage of a hoverlet</li>
<li>Case Study : A Zemanta Hoverlet</li>
</ol>
<h2>What is a hoverlet?</h2>
<p>When we launched the concept of a hoverlet, we wanted it to be as extensible as possible. We decided not to go for proprietary packaging and app creation standards for creating a hoverlet, since the web is already an open enough standard. We simply decided to augment the same simple markup, javascript, and CSS that everyone is familiar with, exposing just a few extra objects.</p>
<p>Currently you need to register with us to create a hoverlet. But if you are a hosting environment or company that would like to provide hoverlet hosting, we&#8217;d love to talk to you, so that our users and the community as a whole can benefit from defragmented user-generated content. We intend to open source whatever it takes to see a wave of adoption that transcends a company and has a chance to benefit the user experience of the web as it is today.</p>
<p>A hoverlet is just a regular html document that exposes a javascript object containing details such as which word was hovered, on which page and on which site, along with several other parameters that let developers build contextual mashups. This can easily be done by including a CGI script or other server-side scripting methods, irrespective of the server and language used.</p>
<h2>Creation of a hoverlet</h2>
<p>So essentially we&#8217;re just hosting webpages that are interpreted in realtime to understand certain macros and have a javascript object that contains context. In our beta product - a developer can create a hoverlet that shows contextual information based on whichever keyword is hovered.<br />
A publisher needs to say - &#8220;i like this map hoverlet, now let me associate a keyword to this hoverlet&#8221;, or &#8220;i like this stock quotes hoverlet, let me associate a company to it&#8221;.</p>
<p>Here&#8217;s  a screencast of how to create a hoverlet, by registering for hover.in beta .</p>
<p><object width="568" height="355" data="http://blip.tv/play/Ae3yQAA" type="application/x-shockwave-flash"><param name="src" value="http://blip.tv/play/Ae3yQAA" /><param name="allowfullscreen" value="true" /></object></p>
<h2>Using a hoverlet</h2>
<p>Using a hoverlet has been made much easier and more straightforward. A hoverlet can be embedded into any webpage by including the <a href="http://start.hover.in/">start.hover.in</a> script, without any registration or credentials. After that, since the URI is the API, all you need to do to embed or render a hoverlet in a webpage is point the href or title attribute of an anchor tag at a valid hoverlet resource. That&#8217;s about it! Here is the structure of the URI.</p>
<pre class="prettyprint">http://&lt;event&gt;.hover.in/&lt;type&gt;/&lt;nick&gt;/&lt;hoverlet&gt;/&lt;param1&gt;/&lt;param2&gt;/...</pre>
<p>We have made 10-20 hoverlets  during our internal hackdays using the method specified in the video above.</p>
<pre class="prettyprint">crunchbase
blinkxvideos
relatedyoutube
pubinfo
relatedflickr
relatedpicasa
relatedwordpress
twittersearch
googlemaps
priceinfo
twitterprofile
zemantarelated
safaribooks
bookmarks
nytimes
cricinfo
aolvideos</pre>
<p>These were all made under our nick &#8216;hover.in&#8217;. However, we will open up the community in January 2010, when anyone will be able to create hoverlets (much like creating facebook apps). If you are a content provider or a developer, you might want to register well before that, so that you have time to understand the platform and the possibilities of a hoverlet - be it a registration form, a dictionary lookup or a paid <span class="il">transaction</span> to buy a product.</p>
<p>Note: If you&#8217;re into RDF and semantic web vocabularies, perhaps we could delve deeper into the implications of having a <em>hoverlet vocab</em>?</p>
<h2>Case Study : A Zemanta Hoverlet</h2>
<p>We loved the idea of showing developers and publishers how they can use mashups and APIs such as the <a href="http://www.zemanta.com/blog/zemanta-api-hoverin-hoverlet/" target="_blank">Zemanta API and hover.in&#8217;s hoverlets</a>, as posted on the <a href="http://www.zemanta.com/blog/" target="_blank">Zemanta blog</a> [cheers guys : ) ], to bring context and increased user-engagement to their users.</p>
<p>Here&#8217;s a simple example of the <a href="http://start.hover.in/" target="_blank">start.hover.in</a> library in action, using the zemantarelated hoverlet created by hover.in to get articles related to the Iran election in just a few lines of markup:</p>
<pre class="prettyprint"> &lt;html&gt;
 &lt;body&gt;

  &lt;a href="http://onclick.hover.in/hoverlet/hover.in/zemantarelated/election" &gt;iran election&lt;/a&gt; 

  &lt;script src="http://start.hover.in/script" id="hi_start" type="text/javascript"&gt;&lt;/script&gt;

 &lt;/body&gt;
&lt;/html&gt;</pre>
<p>So here just to clarify :</p>
<pre class="prettyprint">http://onclick.hover.in/hoverlet/hover.in/zemantarelated/election
translates to
http://&lt;event&gt;.&lt;host&gt;/&lt;type&gt;/&lt;nick&gt;/&lt;hoverlet&gt;/&lt;param1&gt;/&lt;param2&gt;/...
Hence
    event -&gt; onclick
    host-&gt; hover.in
    type -&gt; hoverlet
    nick -&gt; hover.in
    hoverlet  -&gt; zemantarelated
    param1 (keyword) -&gt; election</pre>
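<p>To make the mapping concrete, here is a minimal javascript sketch that splits a hoverlet URI into the parts listed above. It is only an illustration: parseHoverletUri is a hypothetical helper and not part of the start.hover.in library, and it assumes that the event is always the first label of the hostname and that everything after the hoverlet name is a parameter.</p>
<pre class="prettyprint">// Hypothetical helper (not part of start.hover.in): split a hoverlet URI
// into event, host, type, nick, hoverlet and params.
function parseHoverletUri(uri) {
  var url = new URL(uri);
  // Assumption: the event is the first hostname label, the rest is the host.
  var dot = url.hostname.indexOf(".");
  // Drop empty path segments and decode the remaining ones.
  var segments = url.pathname.split("/").filter(Boolean).map(decodeURIComponent);
  return {
    event: url.hostname.slice(0, dot),
    host: url.hostname.slice(dot + 1),
    type: segments[0],
    nick: segments[1],
    hoverlet: segments[2],
    params: segments.slice(3)
  };
}

var parts = parseHoverletUri(
  "http://onclick.hover.in/hoverlet/hover.in/zemantarelated/election");
console.log(parts.event, parts.host, parts.hoverlet, parts.params);</pre>
<p>For the example URI, the event comes out as onclick, the host and nick as hover.in, the hoverlet as zemantarelated, and election as the only entry in params.</p>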
<p>You can see a demo of the zemanta hoverlet at <a href="http://start.hover.in/#demo4" target="_blank">http://start.hover.in/#demo4</a>, and other demos of the hoverlets, along with more details on everything mentioned above, at the <a href="http://start.hover.in/" target="_blank">homepage of the start.hover.in library</a>.</p>
<p>If you want to pull in relevant content and <a href="http://www.zemanta.com/pro/" target="_blank">use the Zemanta API</a>, you can get your own API key and start building truly contextual applications and hoverlets for the web. And of course, if you&#8217;re a content provider, an application developer on an existing platform or a hacker who wants to build hoverlets, do <a href="http://get.hover.in" target="_blank">sign up for beta</a>, and get in touch with me via kode at hover dot in, or ping <a href="http://twitter.com/hoverin" target="_blank">hover.in on twitter</a>.</p>
<p><strong>Keep Clicking,</strong><br />
~B</p>
<img src="http://feeds.feedburner.com/~r/hoverinDevelopers/~4/gcjaFqWPj9s" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://developers.hover.in/blog/2009/hoverlets-and-case-study-zemanta-api/feed/</wfw:commentRss>
		<feedburner:origLink>http://developers.hover.in/blog/2009/hoverlets-and-case-study-zemanta-api/</feedburner:origLink></item>
		<item>
		<title>intern challenge - 1</title>
		<link>http://feedproxy.google.com/~r/hoverinDevelopers/~3/zZE_vgj7QZs/</link>
		<comments>http://developers.hover.in/blog/2009/intern-challenge-1/#comments</comments>
		<pubDate>Sun, 24 May 2009 15:30:45 +0000</pubDate>
		<dc:creator>interns</dc:creator>
		
		<category><![CDATA[posts]]></category>

		<category><![CDATA[appjet]]></category>

		<category><![CDATA[atoms]]></category>

		<category><![CDATA[erlang]]></category>

		<category><![CDATA[feeds]]></category>

		<category><![CDATA[google ajax api]]></category>

		<category><![CDATA[mochiweb]]></category>

		<category><![CDATA[perl]]></category>

		<category><![CDATA[php]]></category>

		<category><![CDATA[RSS]]></category>

		<category><![CDATA[yahoo pipes]]></category>

		<category><![CDATA[yaws]]></category>

		<category><![CDATA[yql]]></category>

		<category><![CDATA[zembly]]></category>

		<guid isPermaLink="false">http://developers.hover.in/blog/?p=117</guid>
		<description><![CDATA[Problem:
  Given a URL find out the location of the RSS feed and show the corresponding posts.
Bonus: Provide context by finding related posts to a term / given query
Steps:
  1.    Find the various feeds from the URL using a parser or a similar web service
2.    Taking the [...]]]></description>
			<content:encoded><![CDATA[<h2><strong>Problem:</strong></h2>
<p><strong> </strong> Given a URL find out the location of the RSS feed and show the corresponding posts.<br />
Bonus: Provide context by finding related posts to a term / given query</p>
<h2><strong>Steps:</strong></h2>
<p><strong> </strong> 1.    Find the various feeds from the URL using a parser or a similar web service<br />
2.    Take the feed and show the posts<br />
3.    Finally host the application<br />
4.    Bonus: pass in the context using the HOVER APIs</p>
<h2><strong>Solutions:</strong></h2>
<p><strong> </strong> ·    Using server side scripting (Perl/PHP/Erlang) to discover RSS / Atom feeds.<br />
·    Using Yahoo Pipes to auto discover RSS / Atom feeds.<br />
·    Using YQL to fetch the links to RSS.<br />
·    Using Google Ajax APIs to find Feeds on Zembly or Appjet or a Hoverlet.</p>
<h2><strong>Using PHP to discover RSS/Atom feeds</strong></h2>
<p>We have been using PHP for developing our different projects like the Wordpress plugin named <a href="http://wordpress.org/extend/plugins/wp-hover/screenshots/" target="_blank">wp-hover</a>. So we decided to start with PHP.</p>
<p>First, this is how a typical RSS link appears in a page.</p>
<pre class="prettyprint">&lt;link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://developers.hover.in/blog/feed/" /&gt;</pre>
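<p>A link tag like the one above is all that feed discovery has to look for. As a rough javascript sketch of the idea (an illustration, not one of the solutions tried in this post): findFeedLinks and attr are hypothetical names, the regular expression is a shortcut rather than a real HTML parser, and only the type attribute is checked, not rel.</p>
<pre class="prettyprint">// Sketch: collect the feed URLs advertised by link tags in an HTML string.
// \x3c and \x3e are the angle brackets, written as escapes so that the
// snippet can sit inside markup untouched.
var FEED_TYPES = ["application/rss+xml", "application/atom+xml"];

function attr(tag, name) {
  // Reads a quoted attribute value; returns null when it is missing.
  var m = new RegExp("\\s" + name + "\\s*=\\s*[\"']([^\"']*)[\"']", "i").exec(tag);
  return m ? m[1] : null;
}

function findFeedLinks(html, baseUrl) {
  var tags = html.match(/\x3clink\b[^\x3e]*\x3e/gi) || [];
  var feeds = [];
  tags.forEach(function (tag) {
    var type = (attr(tag, "type") || "").toLowerCase();
    var href = attr(tag, "href");
    if (href !== null) {
      if (FEED_TYPES.indexOf(type) !== -1) {
        // Resolve relative hrefs against the page URL.
        feeds.push(new URL(href, baseUrl).href);
      }
    }
  });
  return feeds;
}</pre>
<p>Given the html of a page and the URL it was fetched from, findFeedLinks returns an array of absolute feed URLs, in the order in which the link tags appear.</p>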
<p>We tried to discover the feeds using Zend_Feed::findFeeds(&lt;URL&gt;), but it did not give us the RSS links as expected. At times it just printed the whole content of the body, and at other times it gave DOM Element objects which, when var_dumped, contained just integers and some metadata. Had the function returned the feed URLs, we could easily have parsed each one using Zend_Feed::import(&lt;URL of the discovered RSS&gt;).</p>
<p>The pseudocode for the solution in PHP:</p>
<pre class="prettyprint">// Find the feeds advertised by the page
$feedArray = Zend_Feed::findFeeds('http://developers.hover.in');
// Walk through the discovered feeds
foreach ($feedArray as $item) {
    // Get the RSS feed
    $rss = Zend_Feed::import($item);
    // Loop through the items in the feed
    // and print them out
}</pre>
<p>Since the above didn’t work as expected, we looked at other options.</p>
<h2><strong>Using Perl to discover RSS/Atom feeds</strong></h2>
<p>Next we turned to perl, which is known for its string and regex handling, and found Feed::Find, which worked much better. We set up perl (and its long list of dependencies) as follows:</p>
<pre class="prettyprint">#Installing Perl with CPAN modules (Ubuntu) using apt-get to install :
sudo apt-get install build-essential  libssl-dev   libc6-dev    perl    yaml-mode
#followed by :
perl -MCPAN -e 'install Feed::Find'
#or run the cpan terminal and use the command 'install Feed::Find'

<strong>Perl Script:
</strong>
#!/usr/local/bin/perl
use strict;
use warnings;
use Feed::Find;
# find() returns the URIs of all the feeds advertised by the page
my @feeds = Feed::Find-&gt;find('http://trak.in/');
print "@feeds\n";</pre>
<p>Using the above script we were able to find all the RSS / ATOM links for the given URL. By default it finds all kinds of feeds. It would have been better if the find function took a second argument of &lt;FEED_MIME_TYPE&gt;; currently you need to set this in the module source itself.</p>
<p>Since we are interested in a web application, the Perl code above can be run from a web server as a CGI script.</p>
<h2><strong>Using Erlang to discover RSS/Atom feeds</strong></h2>
<p>Since erlang is predominantly used at hover.in, we explored the same in erlang. <a href="http://code.google.com/p/mochiweb/" target="_blank">MochiWeb</a> is an Erlang library for building lightweight HTTP servers. It is an open source project that has been used in big projects like Couchdb, erlyweb etc, so we were encouraged to test out its parsing capabilities.</p>
<p><a href="http://www.w3.org/TR/xpath" target="_blank">XPath</a> is a language for addressing parts of an XML document, which is based on element nodes, attribute nodes and text nodes.</p>
<p>Reading through a post on <a href="http://ppolv.wordpress.com/2008/05/09/fun-with-mochiwebs-html-parser-and-xpath" target="_blank">ppolv’s blog</a>, we came to know that he had contributed an xpath parser for mochiweb, which can be downloaded at the <a href="http://groups.google.com/group/mochiweb/files">mochiweb google group</a>.</p>
<p><strong>Code:</strong></p>
<pre class="prettyprint">-module(find).
-export([feeds/1]).
feeds(Url)-&gt;
    {ok,{_,_Headers,Body}} = http:request(Url),
    Tree = mochiweb_html:parse(Body),
    Xpath = "//link[@type='application/rss+xml']",
    [ {_Tag1,Attributes1,_Content1}|_Rest] = mochiweb_xpath:execute(Xpath,Tree),
    BinUrl = lists:foldl(
         fun({&lt;&lt;"href"&gt;&gt;,Href},_Prev) -&gt; Href;
             (_Else,Prev)-&gt; Prev
         end,error,Attributes1),
    binary_to_list(BinUrl).</pre>
<p><strong>To run above code</strong></p>
<pre class="prettyprint">1&gt;application:start(inets).
ok
2&gt; find:feeds("http://developers.hover.in").
"http://developers.hover.in/blog/feed/"</pre>
<p><strong>Conclusion:<br />
</strong> PHP, Perl and Erlang, or similar modules from other languages, are definitely viable options, but we decided to also check out solutions that have come up more recently, such as Pipes, YQL and the Google AJAX APIs, which could be hosted on environments like <a href="http://appjet.net" target="_blank">Appjet</a> (<span dir="ltr">AppJet is a website that allows users to create web based applications in their web browser</span>), <a href="http://zembly.com" target="_blank">Zembly</a> or Hoverlets - hover.in’s own hovering widget hosting environment. But more on the application hosting later. First let’s try other methods to discover feeds from a webpage.</p>
<h2><strong>Using Yahoo Pipes to auto discover RSS/Atom feeds</strong></h2>
<p>Yahoo! Pipes is a web application from Yahoo! that provides a graphical user interface for building data mashups that aggregate web feeds, web pages, and other services. hover.in has always been a big fan of pipes and of <a href="http://hover.in/2007/12/17/see-you-the-first-yahoo-pipes-unconference/" target="_blank">showcasing</a> it.</p>
<p style="text-align: center;"><a href="http://developers.hover.in/blog/wp-content/uploads/2009/05/findfeeds.jpg" target="_blank"><img class="aligncenter size-full wp-image-118" src="http://developers.hover.in/blog/wp-content/uploads/2009/05/findfeeds.jpg" alt="findfeeds" width="796" height="430" /></a><br />
The above snapshot shows how simple it is to find all the RSS and ATOM feeds for a given URL. You can run the <strong>findFeeds</strong> pipe at <a href="http://pipes.yahoo.com/pipes/pipe.info?_id=00ff6bb493d2785b7594eea76e55c988" target="_blank">http://pipes.yahoo.com/pipes/pipe.info?_id=00ff6bb493d2785b7594eea76e55c988</a>.</p>
<p style="text-align: center;"><a href="http://developers.hover.in/blog/wp-content/uploads/2009/05/showfeeds.jpg" target="_blank"><img class="aligncenter size-full wp-image-120" src="http://developers.hover.in/blog/wp-content/uploads/2009/05/showfeeds.jpg" alt="showfeeds" width="881" height="591" /></a></p>
<p>The above snapshot is a clone of the first pipe, extended to show the posts of an RSS feed for any given URL. You can run the <strong>showRelated</strong> pipe at <a href="http://pipes.yahoo.com/pipes/pipe.info?_id=_iMHbOhG3hGjsadmgQSecQ" target="_self">http://pipes.yahoo.com/pipes/pipe.info?_id=_iMHbOhG3hGjsadmgQSecQ</a> as well as view its source.</p>
<p>To see how far we could stray from traditional implementations, the final pipe was hosted on Appjet and hence aptly called <a href="http://pipesonajet.appjet.net" target="_blank">pipes-on-a-jet</a> ; ) This is how it looks in the appjet IDE:<br />
<a href="http://developers.hover.in/blog/wp-content/uploads/2009/05/appjet1.gif"><img class="aligncenter size-full wp-image-149" title="appjet1" src="http://developers.hover.in/blog/wp-content/uploads/2009/05/appjet1.gif" alt="appjet1" width="936" height="548" /></a><br />
<strong>examples of using pipesOnAJet:</strong></p>
<ol>
<li>search <a href="http://pipesonajet.appjet.net?site=http://trak.in&amp;kw=kolkata%20knight%20riders" target="_blank">for kolkata knight riders on trak.in</a></li>
<li>search <a href="http://pipesonajet.appjet.net?site=http://techcrunch.com&amp;kw=demo" target="_blank">for demo on techcrunch</a></li>
</ol>
<h2><strong>Using Yahoo! Query Language</strong><strong> to discover RSS/Atom feeds</strong></h2>
<p><a href="http://developer.yahoo.com/yql/" target="_blank">YQL</a> (Yahoo! Query Language) is an expressive SQL-like language that lets you query, filter, and join data across Web services.</p>
<p>Running “select * from data where URL=&#8217;http://developers.hover.in&#8217;” gave us the content of the body tag of that page. One piece of feedback we have is that we were not able to get the content of the head element, and we look forward to this feature in upcoming builds. So we tried another query with the help of <a href="http://developer.yahoo.com/yql/" target="_blank">Open Data Tables</a> (Open Data Tables enable developers to add tables for any data on the Web to our stable of API-specific tables), which returned all the link tags in the given URL. But this was not sufficient, since we had to find the links to all the RSS feeds. Playing around with it, we found that a clever hack was to provide a minimal CSS selector, and we got this!<br />
<a href="http://developers.hover.in/blog/wp-content/uploads/2009/05/yql1.gif"><img class="aligncenter size-full wp-image-151" title="yql1" src="http://developers.hover.in/blog/wp-content/uploads/2009/05/yql1.gif" alt="yql1" width="891" height="414" /></a><br />
<strong>Final statement:</strong></p>
<pre class="prettyprint" style="padding:20px;">use 'http://yqlblog.net/samples/data.html.cssselect.xml' as data.html.cssselect;
select * from data.html.cssselect where url="&lt;URL&gt;" and css="link"</pre>
<p><a href="http://query.yahooapis.com/v1/public/yql?q=use%20'http%3A%2F%2Fyqlblog.net%2Fsamples%2Fdata.html.cssselect.xml'%20as%20data.html.cssselect%3B%20select%20*%20from%20data.html.cssselect%20where%20url%3D%22%3CURL%3E%22%20and%20css%3D%22link%22&amp;format=xml" target="_blank">You can execute this here:</a></p>
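<p>The same request can be put together from a script. Here is a minimal javascript sketch that builds the REST URL for the final statement, using the public endpoint and the table URL from this post. buildYqlUrl is a hypothetical name, and the sketch assumes the page URL contains no double quotes, since it is pasted into the statement as is.</p>
<pre class="prettyprint">// Sketch: build the public YQL REST URL for the statement above.
// The endpoint and the table URL are the ones used in this post.
var YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";
var CSS_SELECT_TABLE = "http://yqlblog.net/samples/data.html.cssselect.xml";

function buildYqlUrl(pageUrl) {
  // Assumption: pageUrl holds no double quotes.
  var query =
    "use '" + CSS_SELECT_TABLE + "' as data.html.cssselect; " +
    'select * from data.html.cssselect where url="' + pageUrl + '" and css="link"';
  // URLSearchParams takes care of percent-encoding the statement.
  var params = new URLSearchParams({ q: query, format: "xml" });
  return YQL_ENDPOINT + "?" + params.toString();
}

console.log(buildYqlUrl("http://developers.hover.in"));</pre>
<p>Letting URLSearchParams do the encoding keeps the quotes and the semicolon in the statement from having to be escaped by hand.</p>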
<h2><strong>Using Google AJAX APIs to discover RSS/Atom feeds</strong></h2>
<p><a href="http://code.google.com/apis/ajaxsearch/documentation/" target="_blank">Google&#8217;s AJAX APIs</a> let you implement rich, dynamic web sites entirely in JavaScript and HTML. You can add a map to your site, a dynamic search box, or download feeds with just a few lines of JavaScript. Unlike most javascript libraries out there - this one focuses more on data and less on the typical UI capabilities.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-123" src="http://developers.hover.in/blog/wp-content/uploads/2009/05/google12.gif" alt="google12" width="775" height="528" /><br />
<strong>Google Ajax libraries snippet to discover RSS and show the post</strong></p>
<pre class="prettyprint">google.load("feeds", "1");

function OnLoad() {
   var query;
   // Query for finding posts related to erlang on the dev blog
   query = 'site:http://developers.hover.in/ erlang';

   // OR Query to find related posts to the hovered word within a hoverlet
   // query = 'site:' + HOVER.site +' '+ HOVER.kw;

   google.feeds.findFeeds(query, findDone);
}

function findDone(result) {
   //traverse and print out
}

google.setOnLoadCallback(OnLoad);</pre>
<p>You can find the above code running <a href="http://trak.in/tags/business/2009/05/11/ipl-is-valued-at-2bn-and-kkr-is-the-richest-team-are-you-kidding-me/" target="_blank"><strong>live here</strong></a>. It has been used for a &#8220;related posts from your blog&#8221; hoverlet, which is being used on sites like trak.in to show posts related to Kolkata Knight Riders, and basically to any word that the hover.in user specifies in his dashboard.</p>
<p>And finally… here’s the result of a couple of days of hacking around with perl, php, erlang+yaws, cgi, y!pipes, y!ql, appjet and google apis. To top it all, I think it’s safe to say that it took longer to edit this post though! ; )</p>
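<p>The findDone callback in the snippet is left as a stub. As a hedged sketch of the traversal, assume the result object carries an error field on failure and an entries array whose items have title and url fields. Those field names are an assumption here, so check them against the Google AJAX Feed API documentation. listFeeds is a hypothetical helper.</p>
<pre class="prettyprint">// Assumption: result.error is set on failure, and result.entries is an
// array of objects with title and url fields.
function listFeeds(result) {
  if (result.error) {
    return [];
  }
  return (result.entries || []).map(function (entry) {
    return entry.title + " : " + entry.url;
  });
}

function findDone(result) {
  // Traverse the discovered feeds and print them out.
  listFeeds(result).forEach(function (line) {
    console.log(line);
  });
}</pre>
<p>Keeping the traversal in listFeeds, apart from the printing, makes it easy to try out with a hand-written result object before wiring it to google.feeds.findFeeds.</p>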
<p style="text-align: center;"><a href="http://trak.in/tags/business/2009/05/11/ipl-is-valued-at-2bn-and-kkr-is-the-richest-team-are-you-kidding-me/" target="_blank"><img class="aligncenter size-full wp-image-119" src="http://developers.hover.in/blog/wp-content/uploads/2009/05/hoverlet.jpg" alt="hoverlet" width="550" height="380" /></a></p>
<p>You can look forward to more posts that deal with web apps, APIs and hosting environments, apart from the official HOVER API documentation, to be announced soon, that will enable you to build your own contextual applications. <a href="http://get.hover.in" target="_blank">Sign up</a>, get in touch <a href="mailto:contact@hover.in">with us</a> for more, or follow <a href="http://onhover.hover.in/hoverlet/hover.in/twitterprofile/hoverin">hover.in on twitter</a>.</p>
<p>~<br />
<strong>for hover.in</strong></p>
<p>Kanchan, Ravi, Zeeshan<br />
( o8-o9 hover.in developer interns from Symbiosis, Pune)</p>
<img src="http://feeds.feedburner.com/~r/hoverinDevelopers/~4/zZE_vgj7QZs" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://developers.hover.in/blog/2009/intern-challenge-1/feed/</wfw:commentRss>
		<feedburner:origLink>http://developers.hover.in/blog/2009/intern-challenge-1/</feedburner:origLink></item>
		<item>
		<title>hover.in presenting at Erlang Factory London 2009</title>
		<link>http://feedproxy.google.com/~r/hoverinDevelopers/~3/vMroBBBJ4sU/</link>
		<comments>http://developers.hover.in/blog/2009/hoverin-at-erlang-factory-2009/#comments</comments>
		<pubDate>Tue, 21 Apr 2009 10:21:51 +0000</pubDate>
		<dc:creator>Bosky</dc:creator>
		
		<category><![CDATA[Announcements]]></category>

		<category><![CDATA[events]]></category>

		<category><![CDATA[talks]]></category>

		<guid isPermaLink="false">http://developers.hover.in/blog/?p=96</guid>
		<description><![CDATA[ 
We are extremely thrilled to announce that hover.in will be presenting at Erlang Factory&#8217;s London Conference to be held between June 25-26th 2009.
List of speakers
http://www.erlang-factory.com/conference/SFBayAreaErlangFactory2009/speakers
List of talks
http://www.erlang-factory.com/conference/London2009/talks
Abstract of the talk : Erlang at hover.in
From our experiences at &#8216;hover.in&#8216; I&#8217;d like to present how we first chose erlang late 2007, and got about using it [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://www.erlang-factory.com/conference/London2009"> <img class="aligncenter" style="border: 0pt none;" src="http://www.erlang-factory.com/images/lon-speak.gif" border="0" alt="" width="283" height="206" /></a></p>
<p>We are extremely thrilled to announce that hover.in will be presenting at Erlang Factory&#8217;s London Conference, to be held on June 25-26, 2009.</p>
<p><strong>List of speakers<br />
</strong><a href="http://www.erlang-factory.com/conference/SFBayAreaErlangFactory2009/speakers" target="_blank">http://www.erlang-factory.com/conference/SFBayAreaErlangFactory2009/speakers</a></p>
<p><strong>List of talks<br />
</strong><a href="http://www.erlang-factory.com/conference/London2009/talks" target="_blank">http://www.erlang-factory.com/conference/London2009/talks</a></p>
<p><strong>Abstract of the talk : Erlang at hover.in</strong></p>
<blockquote><p>From our experiences at &#8216;<a href="http://hover.in/">hover.in</a>&#8217; I&#8217;d like to present how we first chose erlang in late 2007, and went about using it as our bridge across our multi-node cluster. In particular the architectural decisions that went into making our distributed python crawler backend running off mnesia with its sharding &amp; fragmentation strategies for tables that span several millions of rows, load-balancing to our 3-node yaws web servers, tweaks to serve more requests, trade-offs in efficiency vs cost, experiments in DHT&#8217;s, our cache worker implementations &amp; messaging queues, cron&#8217;s &amp; dispatching jobs while throwing light on design choices that can fit in distributed and heterogeneous environments. We have also recently built and intend to opensource our own in-memory cache workers, persistent stats &amp; logging system, and are now in the process of deploying an A/B testing framework, that we&#8217;d love to talk about. The talk will also bring in some interesting metaphors from bacteria and its group dynamics, as well as the human brain in handling concurrency &amp; memory.</p>
<p><a href="http://www.erlang-factory.com/conference/London2009/speakers/bhaskerkode" target="_blank">link</a></p></blockquote>
<p>All in all - we&#8217;re very excited to push the boundaries of both <a href="http://erlang.org" target="_blank">erlang</a> and the functional programming scene in India, and are looking forward to sharing with and learning from other erlang legends, developers, enthusiasts, and startups at the <a href="http://erlang-factory.com" target="_blank">Erlang Factory</a> - so <a title="kode@hover.in" href="mailto:kode@hover.in" target="_blank">do ping me</a> if you are <a href="http://www.erlang-factory.com/conference/London2009/venue" target="_blank">in the neighborhood</a>.</p>
<p>Keep Clicking,<br />
~B</p>
<img src="http://feeds.feedburner.com/~r/hoverinDevelopers/~4/vMroBBBJ4sU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://developers.hover.in/blog/2009/hoverin-at-erlang-factory-2009/feed/</wfw:commentRss>
		<feedburner:origLink>http://developers.hover.in/blog/2009/hoverin-at-erlang-factory-2009/</feedburner:origLink></item>
		<item>
		<title>hover.in at devcamp bangalore</title>
		<link>http://feedproxy.google.com/~r/hoverinDevelopers/~3/ilkTVXOxiHA/</link>
		<comments>http://developers.hover.in/blog/2009/hover-presents-erlang-talk-at-devcamp-blr/#comments</comments>
		<pubDate>Sun, 12 Apr 2009 15:23:34 +0000</pubDate>
		<dc:creator>Bosky</dc:creator>
		
		<category><![CDATA[posts]]></category>

		<category><![CDATA[erlang]]></category>

		<category><![CDATA[talks]]></category>

		<guid isPermaLink="false">http://developers.hover.in/blog/?p=87</guid>
		<description><![CDATA[talks from developer camp ,bangalore 09, including the "erlang at hover.in" slides as well.]]></description>
			<content:encoded><![CDATA[<p>A quick shoutout to<a href="http://sidu.in" target="_blank"> Sidu Ponappa</a>, had a great time at <a href="http://devcamp.in" target="_blank">devcamp</a> edition 2 held on Apr 11, in Bangalore.</p>
<p>some of the interesting talks were&#8230;</p>
<h2><strong>sahi, by Narayan Raman<br />
</strong></h2>
<p>The morning at <a href="http://blog.sahi.co.in/2009/04/sahi-in-dev-camp_11.html" target="_blank">devcamp saw a talk by a committer on the sahi</a> test suite, featuring some pretty impressive javascript trickery. He demonstrated how to automate a test that logs in, sends a mail, goes into the sent mail and deletes it&#8230; all automated. Considering that it works on one of the most beefed up apps on the web right now, gmail, that is extremely impressive (it even got to detect and write into gmail&#8217;s rich text editor through some smart DOM traversal). It even works on IE, which is commendable. It works by having your browser point to a proxy server, which inserts scripts to simulate various actions. Another thing to notice was that alerts and other blocking calls were overwritten and handled differently. One question that still lingered was how they simulated mouse cursors, and how many of the event properties would be passed on to the event handlers, considering that all the actions were simulated.</p>
<pre class="prettyprint">function login($username, $password){
    _setValue(_textbox("Email"), $username);
    _setValue(_password("Passwd"), $password);
    _click(_submit("Sign in"));
}

login("sahi.abcde", "tough123");
_click(_spandiv("Compose Mail"));
_setValue(_textarea("to"), ", ");
_setValue(_textbox("subject"), "important subject");
_rteWrite(_rte(0, _near(_textbox("subject"))), "lots of content");
_click(_spandiv("Send[9]"));
_assertExists(_cell("Your message has been sent. View message"));
_click(_link("Sent Mail"));
_assertExists(_spandiv("To: dummy.email"));
_click(_checkbox(0, _near(_spandiv("To: dummy.email"))));
_expectConfirm("You are about to move the entire conversation to the Trash. Are you sure you want to trash the entire conversation containing your sent message?", true);
_click(_spandiv("Delete[14]"));
_assertExists(_cell("The conversation has been moved to the Trash. Learn more Undo"));
_assertExists(_cell("No sent messages! Send one now!"));
_click(_link("Sign out"));</pre>
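<p>The point about blocking calls being overwritten can be pictured with a small javascript sketch. This is an illustration only, not Sahi&#8217;s actual code: stubBlockingCalls is a hypothetical helper that swaps alert and confirm on a window-like object for functions that record the message and return at once.</p>
<pre class="prettyprint">// Illustration only, not Sahi's actual code: replace the blocking
// alert and confirm calls on a window-like object with recorders.
function stubBlockingCalls(win, confirmAnswer) {
  var log = [];
  var original = { alert: win.alert, confirm: win.confirm };
  win.alert = function (message) {
    log.push({ type: "alert", message: String(message) });
  };
  win.confirm = function (message) {
    log.push({ type: "confirm", message: String(message) });
    // The answer is decided up front, much as the script above
    // passes true to _expectConfirm.
    return confirmAnswer;
  };
  return {
    log: log,
    restore: function () {
      win.alert = original.alert;
      win.confirm = original.confirm;
    }
  };
}</pre>
<p>A test can then drive the page, inspect the log to assert that the expected dialog came up, and call restore when it is done.</p>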
<h2>Visual metrics for code, by neil ford</h2>
<p>A thoughtworker showcasing the various evolution of visual code metrics, most of which are open source and fun projects that give your projects/codebase a completely different dimention ,literally speaking. one of the most impressive was the codecity view</p>
<div class="wp-caption alignnone" style="width: 421px"><a href="http://www.inf.unisi.ch/phd/wettel/codecity.html"><img title="a codecity example of modules as districts, buildings as functions wrt LOC" src="http://www.inf.unisi.ch/phd/wettel/pics/codecity_screenshot.png" alt="a codecity example of modules as districts, buildings as functions wrt LOC" width="411" height="259" /></a><p class="wp-caption-text">a codecity example of modules as districts, buildings as functions wrt LOC, function calls, etc</p></div>
<h2>Software for a concurrent world, by chromewatir devs</h2>
<p>The afternoon saw plenty of talks on concurrency, including one by a couple of developers of <a href="http://code.google.com/p/chrome-watir/" target="_self">chromewatir</a> (who forgot to introduce themselves, btw! They&#8217;re from thoughtworks as well) on software transactions and locking mechanisms, comparing the atomicity of an ecommerce payment implemented in java, haskell and erlang. Very impressive.<br />
UPDATE:  <a href="http://developer-in-test.blogspot.com/" target="_blank">Sai</a> is the ChromeWatir guy, and <a href="http://harikrishnan83.wordpress.com" target="_blank">Hari</a> works on Rapa (ActiveResource for Java)</p>
<h2>CouchDB, by Anand Chitipothu</h2>
<p>I was fortunate to bump into <a href="http://anand.infogami.com/about" target="_blank">Anand Chitipothu</a>, an early dev at Infogami &amp; now the Chief Web Programmer at <a href="http://www.archive.org/" target="_blank">Internet Archive</a>. He&#8217;s had the pleasure of working with <a href="http://www.aaronsw.com/" target="_self">Aaron Swartz</a> (of RSS, reddit, infogami, web.py, ycombinator and openlibrary fame&#8230; phew!). So I got talking with Anand over lunch: how he got to meet Aaron (it started when he began contributing to <a href="http://webpy.org/">web.py</a>), his involvement with infogami, and now the IA, which gets over 2 million hits a day. We talked about erlang and python for a while, and about his experiments with <a href="http://couchdb.apache.org/" target="_blank">couchdb</a>. His talk cleared up some lingering questions I still had about this REST-based db, and there were a lot of relevant questions from a couple of mysql developers. Backing up a document is as easy as copying it to another system/location, where it can then be accessed from a couchdb instance on that system. Nice for debugging between home &amp; office, he says. Views are javascript functions that are used to build a view index; you can also add a reduce function on top. There are no in-place edits, only appends to the file (or you recreate it yourself). You need to handle conflict resolution yourself, and views are rebuilt lazily, the next time they are read.</p>
<p>He also mentioned that for 30 million records, it took:</p>
<ul>
<li>nearly 3-4 hours to insert the 30 million entries</li>
<li>1 day to create the view index</li>
</ul>
<p>This is slightly startling. Also, any slight change in the view requires all documents to be re-indexed. He also talked about why he&#8217;s skeptical about the concept of keeping presentation data inside couchdb as well (the idea of an entire blog running off just couchdb). So if it&#8217;s just about quickly setting up some key, value pairs, couchdb is for you. He also talked about the time his site&#8217;s db/backups went kaput: his replacement, until things were brought back under control, was a couchdb instance of key, value pairs and its various views, accessed through python bindings.</p>
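<p>To make the view idea concrete, here is a minimal sketch of a map and a reduce function run over a few documents in plain javascript. The documents, the function names and the tiny <code>buildViewIndex</code> helper are all made up for illustration; in CouchDB itself <code>emit</code> is a global rather than a parameter, reduce functions also receive a <code>rereduce</code> flag, and the real index is a B-tree on disk, not an array.</p>

```javascript
// Hypothetical documents; in CouchDB these would be JSON documents in a database.
const docs = [
  { _id: "a", type: "post", author: "anand" },
  { _id: "b", type: "post", author: "bosky" },
  { _id: "c", type: "comment", author: "anand" },
];

// CouchDB-style map function: called once per document, emits key/value rows.
// (Assumption: emit is passed in here; in CouchDB it is a global.)
function mapByAuthor(doc, emit) {
  if (doc.type === "post") {
    emit(doc.author, 1);
  }
}

// CouchDB-style reduce function, simplified: collapses the emitted values.
function countReduce(keys, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Toy stand-in for the view engine: run map over every document, sort rows by key.
function buildViewIndex(allDocs, mapFn) {
  const rows = [];
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((x, y) => x.key.localeCompare(y.key));
}

const index = buildViewIndex(docs, mapByAuthor);
const totalPosts = countReduce(index.map((r) => r.key), index.map((r) => r.value));
console.log(index, totalPosts);
```

<p>Since the index is just the output of the map function over every document, changing the map function means running it over all the documents again, which is in line with the day-long index build mentioned above.</p>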
<div class="wp-caption alignnone" style="width: 302px"><img title="couchdb internals" src="http://couchdb.apache.org/img/sketch.png" alt="couchdb internals" width="292" height="340" /><p class="wp-caption-text">couchdb internals</p></div>
<h2>Other interesting talks</h2>
<ul>
<li><a href="http://devcamp.in/wiki/Ruby_for_the_curious_hacker" target="_blank">ruby for the curious hacker</a>, by Sidu, a thoughtworker whom I first met as the co-founder of inactive - the micro-messaging <em>twitterish</em> utility for mobile phones alone</li>
<li>on open source licences by the folks at twinglyfoundation</li>
<li>augmented reality demos, by uber flash/flex guru <a href="http://weblog.mrinalwadhwa.com/" target="_blank">Mrinal Wadhwa</a>, where he coded up one of those now popular &#8220;animating something on a piece of paper, taking input from a web cam&#8221; demos. really cool!</li>
</ul>
<h2>erlang at hover.in</h2>
<p>And finally &#8230; [drum roll] ;) here&#8217;s my talk (hosted on slideshare) on erlang at hover.in. To make things interesting, I brought in some context on how bacteria and the brain handle concurrency and memory issues.</p>
<ul>
<li>hover.in founded in late 2007</li>
<li>the web ~ 10-20 years</li>
<li>humans have been around for hundreds of thousands of years&#8230;</li>
<li>but bacteria&#8230; have been around for millions of years, so this talk looks at what we can learn from bacteria, the brain, and memory in a concurrent world.</li>
</ul>
<div id="__ss_1277515" style="width: 425px; text-align: left;"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" title="erlang at hover.in , Devcamp Blr 09" href="http://www.slideshare.net/bosky101/erlang-at-hoverin-devcamp-blr-09?type=powerpoint">erlang at hover.in , Devcamp Blr 09</a><object width="425" height="355" data="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=erlangathoverindevcampblr09-090412084909-phpapp02&amp;stripped_title=erlang-at-hoverin-devcamp-blr-09" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=erlangathoverindevcampblr09-090412084909-phpapp02&amp;stripped_title=erlang-at-hoverin-devcamp-blr-09" /><param name="allowfullscreen" value="true" /></object></p>
<div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;">View more <a style="text-decoration:underline;" href="http://www.slideshare.net/">presentations</a> from <a style="text-decoration:underline;" href="http://www.slideshare.net/bosky101">bosky101</a>.</div>
</div>
<p>All in all, a great unconference, and I&#8217;m looking forward to meeting and working with more like-minded hackers in the near future.</p>
<p><strong>Keep Clicking,</strong><br />
~B</p>
]]></content:encoded>
			<wfw:commentRss>http://developers.hover.in/blog/2009/hover-presents-erlang-talk-at-devcamp-blr/feed/</wfw:commentRss>
		<feedburner:origLink>http://developers.hover.in/blog/2009/hover-presents-erlang-talk-at-devcamp-blr/</feedburner:origLink></item>
		<item>
		<title>YUI’sing Tsung to choose your CDN</title>
		<link>http://feedproxy.google.com/~r/hoverinDevelopers/~3/dkorPBo-98c/</link>
		<comments>http://developers.hover.in/blog/2009/yui-tsung-to-choose-a-cdn/#comments</comments>
		<pubDate>Tue, 31 Mar 2009 07:34:18 +0000</pubDate>
		<dc:creator>Bosky</dc:creator>
		
		<category><![CDATA[posts]]></category>

		<category><![CDATA[amazon]]></category>

		<category><![CDATA[cloudfront]]></category>

		<category><![CDATA[erlang]]></category>

		<category><![CDATA[googleapis]]></category>

		<category><![CDATA[load-testing]]></category>

		<category><![CDATA[s3]]></category>

		<category><![CDATA[scaling]]></category>

		<category><![CDATA[tsung]]></category>

		<category><![CDATA[yaws]]></category>

		<category><![CDATA[yui]]></category>

		<guid isPermaLink="false">http://developers.hover.in/blog/?p=49</guid>
		<description><![CDATA[An introduction to Tsung
Tsung is an open-source multi-protocol distributed load testing tool, written in &#8230; you guessed it, erlang, by the folks at process-one (they also contribute heavily to multiple erlang projects, and in fact maintain ejabberd, etc).
I built Tsung 1.3.0 from source on erlang R13A ( built from source, catch the earlier post [...]]]></description>
			<content:encoded><![CDATA[<h3>An introduction to Tsung</h3>
<p><a href="http://tsung.erlang-projects.org/" target="_blank">Tsung</a> is an open-source multi-protocol distributed load testing tool, written in &#8230; you guessed it, erlang, by the folks at process-one (they also contribute heavily to multiple erlang projects, and in fact maintain <a href="http://www.process-one.net/en/ejabberd/" target="_blank">ejabberd</a>, etc).</p>
<p>I built Tsung 1.3.0 from source on erlang R13A (also built from source; <a href="http://developers.hover.in/blog/2009/somethings-to-rejoice-about/" target="_blank">catch the earlier post on first impressions</a>) on an AMD 64-bit, 4 GB RAM machine running ubuntu. Beebole has a nice intro on <a href="http://beebole.com/blog/erlang/test-performance-and-scalability-of-your-web-applications-with-tsung/" target="_blank">how to setup Tsung</a>. The only key hints I have: you can skip the tsung recorder and straight away copy a sample tsung config file from /usr/local/share/tsung/examples/http-simple.xml to $HOME/.tsung/tsung.xml, and edit/tweak it. Once that is set, running tsung start should show the log directory that it uses.</p>
<p>We needed to estimate and gauge how much load the quad-core servers at hover.in could take. To start off, I&#8217;m simulating static file requests, and you have a lot of control over what kind of requests you want to simulate. So I could have different sets of user visit frequencies: one set of requests at frequency x from browser M asking for url A, while another set at a faster frequency y, from browser I, asks for url B.</p>
<p>It may take some time to figure out where you need to tweak your app/server, since there are several areas in which your LYME app can be optimized:</p>
<ul>
<li>kernel-polling , and other erlang VM optimizations</li>
<li>size of log files, and whether gzip is enabled in yaws.conf (through deflate = true)
<div id="attachment_80" class="wp-caption aligncenter" style="width: 760px"><img class="size-full wp-image-80" title="yaws with &amp; without gzip ( deflate flag in yaws.conf against each server tag)" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/gzip.gif" alt="yaws with &amp; without gzip ( deflate flag in yaws.conf against each server tag)" width="750" height="69" /><p class="wp-caption-text">yaws with &amp; without gzip ( deflate flag in yaws.conf against each server tag)</p></div></li>
<li>max processes set for the VM</li>
<li>mnesia specific - the dump_log_time_threshold, table loading timeouts, etc</li>
<li>yaws even has some other flags that you can insert into yaws.conf such as  max_num_cached_files , max_size_cached_file, etc</li>
</ul>
<p><strong>So&#8230; let&#8217;s go serve some static files!</strong> And what better way to test your load testing tool than to run it against the various content delivery networks out there. It&#8217;s a pity that I&#8217;m only running this on one node; I will definitely work on using Tsung on a cluster soon.</p>
<h3>Load-testing the YUI library across various networks</h3>
<p>I decided to simulate a new request to a static file every 0.1 seconds, for a period of 10 minutes!</p>
<pre class="prettyprint">&lt;!-- tsung.xml --&gt;
.
.
&lt;load&gt;
&lt;!-- several arrival phases can be set: for each phase, you can set
       the mean inter-arrival time between new clients and the phase duration --&gt;
       &lt;arrivalphase phase="1" duration="10" unit="minute" &gt;
            &lt;users interarrival="0.1" unit="second"&gt;&lt;/users&gt;
       &lt;/arrivalphase&gt;
&lt;/load&gt;
.
.</pre>
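<p>As a quick sanity check on what that arrival phase asks for, here is a small back-of-the-envelope sketch. The variable names are mine; the two numbers are the ones from the config above. Since interarrival is a mean, treat the results as expected values rather than exact counts.</p>

```javascript
// Values copied from the arrivalphase in tsung.xml above.
const phase = { durationMinutes: 10, interarrivalSeconds: 0.1 };

// interarrival is a mean, so these are expected values, not exact counts.
const durationSeconds = phase.durationMinutes * 60;
const meanArrivalsPerSecond = 1 / phase.interarrivalSeconds;
const expectedUsers = Math.round(durationSeconds / phase.interarrivalSeconds);

console.log({ durationSeconds, meanArrivalsPerSecond, expectedUsers });
```

<p>That works out to roughly 10 new users a second, or about 6000 users over the whole phase.</p>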
<p>You can even give a combination of different sessions of varying probability (their sum being 100%). But since I wanted to test only the single yahoo minified javascript file, I added a single session of 100% probability.</p>
<pre class="prettyprint">.
.
&lt;sessions&gt;
     &lt;session name="hit1"  probability="100" type="ts_http" &gt;
          &lt;request&gt;&lt;http url="http://full.path.to.my.static.file.com/ymin-2-4-1.js"
                   method="get" version="1.1" if_modified_since="Fri, 14 Jan 2009 02:43:31 GMT"&gt;&lt;/http&gt;
         &lt;/request&gt;
    &lt;/session&gt;
&lt;/sessions&gt;
.
.</pre>
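<p>If you do mix several sessions, it helps to check that the probabilities add up before starting a run. A minimal sketch, assuming a hypothetical second session called hit2 (only hit1 is used in this post); this is my own helper, not something tsung ships:</p>

```javascript
// The single session used in this post.
const singleSession = [{ name: "hit1", probability: 100 }];

// Hypothetical mix: "hit2" and the 70/30 split are made up for illustration.
const mixedSessions = [
  { name: "hit1", probability: 70 },
  { name: "hit2", probability: 30 },
];

// Returns true when the probabilities add up to 100 (within rounding error).
function probabilitiesAddUp(sessions) {
  const total = sessions.reduce((sum, s) => sum + s.probability, 0);
  return 1e-9 > Math.abs(total - 100);
}

console.log(probabilitiesAddUp(singleSession), probabilitiesAddUp(mixedSessions));
```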
<h3>The experiment</h3>
<p>Our favourite javascript library has been <a href="http://developer.yahoo.com" target="_blank">YUI</a>, hands down, for its foresight, documentation and great developer community over the past many years. So we tried serving the same YUI javascript base file from 5 different networks, to test their <strong>response times</strong> and the effect of<strong> hundreds of simultaneous users</strong>, and then to analyze and compare the performance, pros and cons of:</p>
<ol>
<li><a href="http://yaws.hyber.org" target="_blank">Yaws server</a> 1.77, running on <a href="http://erlang.org" target="_blank">erlang R12B</a></li>
<li>Amazon <a href="http://s3.amazonaws.com" target="_blank">S3</a></li>
<li>Amazon <a href="http://aws.amazon.com/cloudfront/" target="_blank">CloudFront</a></li>
<li>Yahoo&#8217;s <a href="http://developer.yahoo.com/yui/articles/hosting/" target="_blank">YUI hosting</a></li>
<li>Google&#8217;s <a href="http://code.google.com/apis/ajaxlibs/documentation/" target="_blank">javascript library hosting</a> (a slight difference here, since google hosts YUI only from 2.6.0 onwards, while the rest served 2.4.1)</li>
</ol>
<h3>Results / Main statistics</h3>
<p>The statistics that come with tsung, coupled with some perl graphing magic, give you a comprehensive report. In order not to be biased, I took two trials of load-testing the yahoo yui library against four networks (or content delivery networks, if you will) which we at hover have tested at some point. While firebug definitely gives you the end-user perspective (time to load, etc.), here you can have a look at the latency and network effects, and at a couple of stats that I&#8217;ve compiled (tsung gives a much more detailed view, btw), and make your own judgements.</p>
<table style="height: 230px;" border="0" cellspacing="0" width="728" frame="void" rules="none">
<colgroup><col width="86"></col><col width="103"></col><col width="104"></col><col width="108"></col><col width="105"></col></colgroup>
<tbody>
<tr>
<td width="86" height="38" align="center"><strong>network</strong></td>
<td width="103" align="center"><strong><span style="font-family: Nimbus Roman No9 L;">highest 10sec mean</span></strong></td>
<td width="104" align="center"><strong><span style="font-family: Nimbus Roman No9 L;">lowest 10sec mean </span></strong></td>
<td width="108" align="center"><strong><span style="font-family: Nimbus Roman No9 L;">Highest Rate</span></strong></td>
<td width="105" align="center"><strong><span style="font-family: Nimbus Roman No9 L;">Mean</span></strong></td>
</tr>
<tr>
<td height="19" align="left">hover yaws 1</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">9.167 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.642 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">11.3 / sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.975 sec </span></td>
</tr>
<tr>
<td height="19" align="left">hover yaws 2</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">1.093 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.170 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">8 / sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.634 sec</span></td>
</tr>
<tr>
<td height="19" align="left">S3 1</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">5.329 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.285 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">10.8 / sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.750 sec</span></td>
</tr>
<tr>
<td height="19" align="left">S3 2</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">1.878 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.256 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">11.2 / sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.478 sec </span></td>
</tr>
<tr>
<td height="19" align="left">Cloudfront 1</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">4.121 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.259 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">11.8 / sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.693 sec</span></td>
</tr>
<tr>
<td height="19" align="left">Cloudfront 2</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">4.420 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.256 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">11.2 / sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.478 sec</span></td>
</tr>
<tr>
<td height="19" align="left">Yahoo 1</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">45.5 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">73.67 msec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">11.2 / sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.738 sec </span></td>
</tr>
<tr>
<td height="19" align="left">Yahoo 2</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">29.9 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">78.71 msec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">16.1 / sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">2.657 sec</span></td>
</tr>
<tr>
<td height="19" align="left">Google 1</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">2.188 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.178 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">14.6 / sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">1.033 sec</span></td>
</tr>
<tr>
<td height="19" align="left">Google 2</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">4.420 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.170 sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">15.1 / sec</span></td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">0.808 sec</span></td>
</tr>
</tbody>
</table>
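<p>To compare the networks at a glance, one rough option is to average the two trial means from the table above and rank them. This is only a sketch: the object layout and names are mine, and two trials per network is far too small a sample to read much into.</p>

```javascript
// Mean response times (seconds) for trials 1 and 2, copied from the table above.
const meanSeconds = {
  "hover yaws": [0.975, 0.634],
  "S3": [0.750, 0.478],
  "Cloudfront": [0.693, 0.478],
  "Yahoo": [0.738, 2.657],
  "Google": [1.033, 0.808],
};

// Average the trials per network, then sort from fastest to slowest.
const ranked = Object.entries(meanSeconds)
  .map(([network, trials]) => ({
    network,
    average: trials.reduce((sum, t) => sum + t, 0) / trials.length,
  }))
  .sort((a, b) => a.average - b.average);

for (const { network, average } of ranked) {
  console.log(network.padEnd(12), average.toFixed(3), "sec");
}
```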
<h3>Simultaneous Users per second</h3>
<ol>
<li>
<h3>hover yaws</h3>
<p>AMD 64-bit quad core machine, 4GB RAM (an extra unused node that we&#8217;re removing as part of our see-how-much-you-can-get-out-of-one-machine scaling attempts). We felt that for site-specific js/static files, etc., we needn&#8217;t look to a CDN. The one thing you need to tweak (and we seem to be getting loads of errors on yaws here) is possibly insufficient sockets once the load moves beyond 2-3 requests per second, in our experience. The errors in fact increased when gzip was enabled, but there may be other reasons. Anyhow, tsung should help crash-test our local dev servers long enough, and slowly enough, for us to figure out where the bottlenecks really are.</p>
<p><div id="attachment_53" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-53" title="hover yaws 1" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/graphes-users-simultaneous-300x225.png" alt="hover yaws 1" width="300" height="225" /><p class="wp-caption-text">hover yaws 1</p></div>
<div id="attachment_54" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-54" title="hover yaws 2" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/graphes-users-simultaneous1-300x225.png" alt="hover yaws 2" width="300" height="225" /><p class="wp-caption-text">hover yaws 2</p></div>
<hr /></li>
<li>
<h3>Amazon s3</h3>
<p>Reading through the <a href="http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf" target="_blank">dynamo paper</a> gives some nice insights into their architecture, and we never had any issues with them. Cheap &amp; effective. Our only qualm is that they don&#8217;t gzip files (but I hear that you can push gzipped content into S3 through a few hacks).</p>
<div id="attachment_61" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-61" title="S3 1" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/graphes-users-simultaneous8-300x225.png" alt="S3 1" width="300" height="225" /><p class="wp-caption-text">S3 1</p></div>
<div id="attachment_59" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-59" title="S3 2" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/graphes-users-simultaneous6-300x225.png" alt="S3 2" width="300" height="225" /><p class="wp-caption-text">S3 2</p></div>
<hr /></li>
<li>
<h3>Amazon CloudFront</h3>
<p>Our move to amazon cloudfront has had drastically good results for static file serving. One reason I might quit CloudFront, though, is that files pushed to S3 seem to be inconsistent across data centers (version 1 from asia, version 2 from the US, etc., for the same file at the same time). Simply not acceptable. The lame workaround is to keep giving files new names (and, by the way, deleting from S3 doesn&#8217;t delete from cloudfront). We did upwards of 25 million GET requests from Cloudfront last month. Overall, Cloudfront steals the thunder, since it&#8217;s most likely the only edge network serving the static file among the networks tested.</p>
<div id="attachment_62" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-62" title="Cloudfront 1" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/graphes-users-simultaneous9-300x225.png" alt="Cloudfront 1" width="300" height="225" /><p class="wp-caption-text">Cloudfront 1</p></div>
<div id="attachment_60" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-60" title="Cloudfront 2" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/graphes-users-simultaneous7-300x225.png" alt="CloudFront 2" width="300" height="225" /><p class="wp-caption-text">CloudFront 2</p></div>
<hr /></li>
<li>
<h3>yahoo&#8217;s free static content hosting</h3>
<p>They&#8217;ve been offering free hosting for some time, across various releases, and the free hosting is definitely worth it, be it for quick prototyping or spanking new web apps.</p>
<div id="attachment_56" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-56" title="yahoo apis 1" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/graphes-users-simultaneous3-300x225.png" alt="yahoo apis 1" width="300" height="225" /><p class="wp-caption-text">yahoo apis 1</p></div>
<div id="attachment_63" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-63" title="yahoo 2" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/graphes-users-simultaneous10-300x225.png" alt="yahoo apis 2" width="300" height="225" /><p class="wp-caption-text">yahoo 2</p></div>
<hr /></li>
<li>
<h3>google free static content hosting</h3>
<p>YUI was recently added as part of Google&#8217;s ajax apis initiative. The javascript needs to be lazy loaded using javascript though. I doubt that they&#8217;re edge networks.</p>
<div id="attachment_57" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-57" title="googleapis.com 1" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/graphes-users-simultaneous4-300x225.png" alt="google 1" width="300" height="225" /><p class="wp-caption-text">googleapis.com 1</p></div>
<p><div id="attachment_64" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-64" title="google 2" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/graphes-users-simultaneous11-300x225.png" alt="google 2" width="300" height="225" /><p class="wp-caption-text">google 2</p></div></li>
</ol>
<p>And here is another compilation: the number of HTTP status responses per second, again two trials per network.</p>
<h3>HTTP Status Responses per second</h3>
<table style="height: 190px;" border="0" cellspacing="0" width="720" frame="void" rules="none">
<colgroup><col width="86"></col><col width="103"></col></colgroup>
<tbody>
<tr>
<td width="86" height="19" align="left">hover yaws 1</td>
<td width="103" align="left"><span style="font-family: Nimbus Roman No9 L;">11 / sec</span></td>
</tr>
<tr>
<td height="19" align="left">hover yaws 2</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">8.3 / sec</span></td>
</tr>
<tr>
<td height="19" align="left">S3 1</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">12.2 / sec</span></td>
</tr>
<tr>
<td height="19" align="left">S3 2</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">10.7 / sec</span></td>
</tr>
<tr>
<td height="19" align="left">Cloudfront 1</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">8.4 / sec</span></td>
</tr>
<tr>
<td height="19" align="left">Cloudfront 2</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">9.1 / sec</span></td>
</tr>
<tr>
<td height="19" align="left">Yahoo 1</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">11.1 / sec</span></td>
</tr>
<tr>
<td height="19" align="left">Yahoo 2</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">12.6 / sec</span></td>
</tr>
<tr>
<td height="19" align="left">Google 1</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">10.9 / sec</span></td>
</tr>
<tr>
<td height="19" align="left">Google 2</td>
<td align="left"><span style="font-family: Nimbus Roman No9 L;">14.1 / sec</span></td>
</tr>
</tbody>
</table>
<h3>Conclusion</h3>
<p>What&#8217;s interesting is that the figures suggest the Amazon servers score lower on handling concurrent users, and higher on combating latency. That&#8217;s quite understandable, considering that every S3 object needs to be fetched from at least 2 nodes while reading, and that conflict resolution is done during reads, not writes (they try to avoid rejecting writes from any subsystem at any cost). But the increased response times from google, yahoo and, not to forget, our very own hover server running on yet another web server could be explained by the smaller number of requests being served by the respective CDNs when compared to Amazon. What was also interesting to see was that the yahoo api servers seem to consistently give higher throughput from their <em>YTS/1.17.8 server. </em></p>
<p>That said, static files are just half the story. I&#8217;d love to benchmark more libraries when we get the time, but next up might be benchmarking images, css and more dynamic content, possibly a comparison of distributed hash stores, caches and the like.</p>
<p>~B</p>
]]></content:encoded>
			<wfw:commentRss>http://developers.hover.in/blog/2009/yui-tsung-to-choose-a-cdn/feed/</wfw:commentRss>
		<feedburner:origLink>http://developers.hover.in/blog/2009/yui-tsung-to-choose-a-cdn/</feedburner:origLink></item>
		<item>
		<title>Somethings to rejoice about</title>
		<link>http://feedproxy.google.com/~r/hoverinDevelopers/~3/1MBabOlsN2s/</link>
		<comments>http://developers.hover.in/blog/2009/somethings-to-rejoice-about/#comments</comments>
		<pubDate>Fri, 27 Mar 2009 02:48:25 +0000</pubDate>
		<dc:creator>Bosky</dc:creator>
		
		<category><![CDATA[posts]]></category>

		<category><![CDATA[Add new tag]]></category>

		<category><![CDATA[erlang]]></category>

		<category><![CDATA[mnesia]]></category>

		<category><![CDATA[numbers]]></category>

		<category><![CDATA[releases]]></category>

		<category><![CDATA[stats]]></category>

		<guid isPermaLink="false">http://developers.hover.in/blog/?p=15</guid>
		<description><![CDATA[We're really excited to check out the new erlang R13A release , here's a summary of what caught our attention - including 8 highlights and respective lessons learnt. And there are some more things to rejoice about at hover.in too, cheers to reaching quarter of a million hovers in less than two months.]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re really excited to check out the <a title="release notes" href="http://erlang.org/download/otp_src_R13A.readme" target="_blank">new erlang R13A release</a>. Here&#8217;s a summary of what caught our attention.</p>
<ul>
<li>
<h2><strong>Highlight #1</strong></h2>
<pre>OTP-7500  The runtime system with SMP support now uses multiple,
	      scheduler specific run queues, instead of one globally shared
	      run queue.</pre>
</li>
</ul>
<p>Not introducing concurrency into your programs can hinder the effective and optimal utilization of your machine. That said, just because erlang can spawn a million processes doesn&#8217;t quite warrant making every call concurrent. Most experiments don&#8217;t take into consideration that the same system may well be running <a href="http://yaws.hyber.org" target="_blank">yet another web server</a>, running your own crons using flow control through message queues, or loading data into or out of the database, at the very instant your benchmarking algorithm wants you to start a multi-node concurrent version of every function that you come across. In fact, it&#8217;s quite a misnomer. Sure, you can spawn a mighty concurrent sorting algorithm across 100 nodes. But what if each of those 100 nodes tried to do the same thing? Would it be more efficient to spawn more processes on the same node, versus <a href="http://steve.vinoski.net/blog/2008/05/27/joe-armstrong-erlang-and-rpc/" target="_blank">RPC</a>&#8217;ing and handling latencies from different nodes?</p>
<p>Now, I have nothing against location transparency, or mnesia&#8217;s ability to decide which tables need to be in RAM, completely transparent from any node, or only accessible from a third node; that granularity is from another planet. The same goes for gen_servers holding state in memory, and being able to choose between trees, lists, queues and sets. It rocks. Period. But here&#8217;s where the improvements in SMP help: your code relies on being able to add jobs to nodes, while internally making sure that node X runs its processes at 2X courtesy of concurrency, as opposed to spawning processes for every element in the list on every node in the cluster. The latter is simply not controlled enough, and sometimes good old serially running jobs, while making sure they run faster, make a hell of a difference.</p>
<p>Well, that&#8217;s what we&#8217;ve been trying to do, and it has held us in amazing stead. Something like replacing every lists:map with a parallel version is best left for when you have way too much money to spend on extra nodes to replicate your data on. With that in mind, even with R13A we intend to keep mnesia reads synchronous for things like crons, while the handling of that data immediately afterwards (massaging it into whatever form is needed) is encouraged to be made more concurrent. This works out well when we want to compare datasets stored in the DB. One way we&#8217;ve tried to make better use of the cores is to go through our code and see where we can avoid unnecessary list folds in favour of list maps, which are much more concurrency friendly, since the input to each map function doesn&#8217;t depend on the previous or next values (as opposed to foldl / foldr). But wouldn&#8217;t it be nice to know when the last map function completes? By making a lists:map concurrent, each concurrent process may take a different amount of time and complete in any order, which sort of resonates with what I meant by asynchronous.</p>
<p>I&#8217;ve compared <a href="http://21ccw.blogspot.com/2008/05/parallel-quicksort-in-erlang-part-ii.html">some techniques</a> <a href="http://yomi.at/posts/3">from</a> <a href="http://yarivsblog.com/articles/2008/02/08/the-erlang-challenge/" target="_blank">many pmap</a> <a href="http://bc.tech.coop/blog/070601.html" target="_blank">implementations</a> <a href="http://montsamu.blogspot.com/2007/02/erlang-parallel-map-and-parallel.html">out there</a>, and here are some notes&#8230;</p>
<p>While it&#8217;s common practice to send the origin pid or to make a reference to the callee pid, most implementations could easily bind self() to a variable once and reuse it.</p>
<pre class="prettyprint">lists:map( fun(Msg) -&gt;
                  Pid ! { self() ,Msg }
                end , Data ).</pre>
<p>vs</p>
<pre class="prettyprint">%% Neater and should be faster , since self() is not recomputed N times
Self = self(),
lists:map( fun(Msg) -&gt;
                  Pid ! { Self ,Msg }
                end , Data ).</pre>
<p>Now an interesting part wrt receives is where they are placed, and there are a fair share of different ways of placing them. The most commonly used method is :</p>
<pre class="prettyprint">Pids = lists:map( fun(Msg) -&gt;
                  Pid ! { Self ,Msg }
                end , Data ),
lists:map( fun(Pid) -&gt;
                     receive
                            Pid -&gt; ok
                     end
                 end, Pids ).</pre>
<p>or</p>
<pre class="prettyprint">Pids = lists:map( fun(Msg) -&gt;
                  Pid ! { Self ,Msg }
                end , Data ),
[ receive   Pid -&gt; ok  end ||  Pid &lt;-  Pids ].</pre>
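<p>For reference, here is a minimal, self-contained sketch of the receive-in-a-separate-pass pattern. It is not the code from the gist: the module name pmap_sketch and the function pmap/2 are made up for illustration, and it assumes one spawned worker per element that replies with { WorkerPid, Result }.</p>
<pre class="prettyprint">-module(pmap_sketch).
-export([pmap/2]).

%% Apply F to every element of List in its own process and
%% return the results in the original order.
%% Sketch only: no timeouts, no handling of crashed workers.
pmap(F, List) -&gt;
    Self = self(),   % computed once, reused by every worker
    Pids = [spawn(fun() -&gt; Self ! {self(), F(X)} end) || X &lt;- List],
    %% Selective receive on each pid keeps the results ordered,
    %% no matter which worker finishes first.
    [receive {Pid, Result} -&gt; Result end || Pid &lt;- Pids].</pre>
<p>Calling pmap_sketch:pmap( fun(X) -&gt; X * 2 end, [1,2,3] ) returns [2,4,6]. Matching on the worker pid is what lets the workers finish in any order without scrambling the result list.</p>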
<p>I&#8217;ve always been content with avoiding the extra lists:map, and receiving results back from the concurrent processes within the same scope / lists:map :</p>
<pre class="prettyprint">lists:map( fun(Msg) -&gt;
                  Pid ! { Self ,Msg },
                  receive
                     Pid -&gt; ok
                 end
               end, List ) .</pre>
<p>To test the above on R12B on a 32bit, R12B on a 64bit &amp; the spanking new R13A on a 64-bit node, I <a href="http://gist.github.com/86463" target="_blank">hacked up a simple module</a> that took a list of elements, and attempted the three approaches listed below.</p>
<p><em><strong>UPDATE To reiterate what the module is for</strong> &#8230;“How would you get a counter of how many times some success() happened for each element of a list , and also store the results of each as well . Typically this would be a lists:foldl . Could there be some way faster than a lists:foldl ( approach serial_something1) ?”</em></p>
<p>What I found was that if you use a lists:map where one spawned process is used to maintain state, the results were much better ( approaches something1 and something2 ).</p>
<ol>
<li><strong>something1</strong> concurrently gets the number of attempts and the status of each attempt, using receive inside the first map</li>
<li><strong>something2</strong> concurrently gets the number of attempts and the status of each attempt, using receive in a separate lists:map</li>
<li><strong>serial_something</strong> is a normal lists:map that synchronously gets the number of attempts and the status of each attempt ( no concurrency )</li>
</ol>
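<p>To make the problem statement concrete, here is a rough sketch of a serial fold and a concurrent equivalent. The names count_sketch, serial_count/2 and concurrent_count/2 are hypothetical and this is not the gist&#8217;s code: the post describes a spawned process that maintains state, whereas this sketch, to stay short, lets the caller gather the replies. It assumes the Success fun returns true for a successful attempt.</p>
<pre class="prettyprint">-module(count_sketch).
-export([serial_count/2, concurrent_count/2]).

%% Serial version: one fold carries both the counter and the results.
serial_count(Success, List) -&gt;
    {Count, Results} =
        lists:foldl(fun(X, {C, Acc}) -&gt;
                        case Success(X) of
                            true -&gt; {C + 1, [{X, true} | Acc]};
                            _    -&gt; {C, [{X, false} | Acc]}
                        end
                    end, {0, []}, List),
    {Count, lists:reverse(Results)}.

%% Concurrent version: every element is checked in its own process,
%% the caller then gathers the replies and counts the successes.
%% Sketch only: no timeouts, no handling of crashed workers.
concurrent_count(Success, List) -&gt;
    Self = self(),
    Pids = [spawn(fun() -&gt; Self ! {self(), {X, Success(X) =:= true}} end)
            || X &lt;- List],
    Results = [receive {Pid, R} -&gt; R end || Pid &lt;- Pids],
    {length([ok || {_, true} &lt;- Results]), Results}.</pre>
<p>Both functions return { SuccessCount, [ { Element, Status } ] }, so they can be compared directly against each other.</p>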
<p>I then passed in lists of various sizes : 10, 100, 1000 and 10000 elements.</p>
<p><strong>erl , on R12B-1 on a 32bit node</strong></p>
<pre class="prettyprint">Erlang (BEAM) emulator version 5.6.1 [source] [smp:2] [async-threads:0] [kernel-poll:false]
Eshell V5.6.1  (abort with ^G)
1&gt; something:multi_bench().
[{[{161,something1},      {29,something2},        {7,serial_something}]},      %% 10    elements
 {[{543,something1},     {347,something2},       {80,serial_something}]},      %% 100   elements
 {[{5672,something1},  {12840,something2},     {5227,serial_something}]},      %% 1000  elements
 {[{50898,something1}, {64534,something2},   {617422,serial_something}]}]      %% 10000 elements</pre>
<p><strong>erl , on R12B  on a 64bit node </strong></p>
<pre class="prettyprint">Erlang (BEAM) emulator version 5.6.1 [source] [64-bit] [smp:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.6.1  (abort with ^G)
1&gt; something:multi_bench_something().
[{[{40,something1},       {15,something2},         {5,serial_something}]},     %% 10    elements
 {[{260,something1},     {243,something2},        {83,serial_something}]},     %% 100   elements
 {[{2633,something1},   {2600,something2},      {5794,serial_something}]},     %% 1000  elements
 {[{23505,something1},  {28484,something2},   {622124,serial_something}]}]     %% 10000 elements</pre>
<p><strong>erl , on R13A  on a 64bit node </strong></p>
<pre class="prettyprint">Erlang R13A (erts-5.7) [smp:4:4][rq:4] [async-threads:0]
Eshell V5.7  (abort with ^G)
1&gt; something:multi_bench().
[{[{1,something1},           {1,something2},         {1,serial_something}]},   %%  10     elements
 {[{1,something1},           {1,something2},         {1,serial_something}]},   %%  100    elements
 {[{1,something1},           {1,something2},         {1,serial_something}]},   %%  1000   elements
 {[{15000,something1},   {30999,something2},    {531999,serial_something}]},   %%  10000  elements
 {[{234000,something1}, {265000,something2},  {68109999,serial_something}]}]   %%  100000 elements [BONUS] ;) </pre>
<p>What&#8217;s surprising is that the first approach, avoiding an extra lists:map for the receives, consistently works out better for large datasets across all three setups, R13A included. Here could be some key takeaways:</p>
<p><em>if the list has &lt; 1000 elements then it&#8217;s in fact better to use another lists:map ( doesn&#8217;t make a difference on R13A though )<br />
if the list has &gt; 10000 elements, doing the receive in another lists:map could turn out up to twice as slow ( across all releases )</em></p>
<p>The <a href="http://gist.github.com/86463" target="_blank">code used to run the tests</a> , including the two different approaches to the  receives is <a href="http://gist.github.com/86463" target="_blank">something.erl on github.</a></p>
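<p>If you want to run a similar comparison yourself, timings in this { Time, Name } shape can be collected with timer:tc/3. The sketch below is not taken from something.erl; bench_sketch, bench/2 and time_one/3 are made-up names, and it simply assumes each candidate is a fun that takes the input list.</p>
<pre class="prettyprint">-module(bench_sketch).
-export([bench/2]).

%% Time each {Name, Fun} pair against lists of the given sizes.
%% Returns [{Size, [{Microseconds, Name}]}].
bench(Candidates, Sizes) -&gt;
    [{Size, [time_one(Name, Fun, lists:seq(1, Size))
             || {Name, Fun} &lt;- Candidates]}
     || Size &lt;- Sizes].

time_one(Name, Fun, List) -&gt;
    %% timer:tc/3 returns {ElapsedMicroseconds, ReturnValue}
    {Micros, _Result} = timer:tc(erlang, apply, [Fun, [List]]),
    {Micros, Name}.</pre>
<p>A single run per size is noisy, so in practice you would want to repeat each measurement a few times before reading much into the numbers.</p>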
<ul>
<li>
<h2><strong><strong>Highlight #2</strong></strong></h2>
<pre> OTP-7648  Support for Unicode is implemented as described in EEP10.
	      Formatting and reading of unicode data both from terminals
	      and files is supported by the io and io_lib modules. Files
	      can be opened in modes with automatic translation to and from
	      different unicode formats. The module 'unicode' contains
	      functions for conversion between external and internal
	      unicode formats and the re module has support for unicode
	      data. There is also language syntax for specifying string and
	      character data beyond the ISO-latin-1 range.

	      The interactive shell will support input and output of
	      unicode characters when the terminal and operating system
	      supports it.</pre>
</li>
</ul>
<p>And with the awesomeness of binary pattern matching, there&#8217;s nothing stopping us now. That said, we&#8217;ve already had a good run with unicode ( despite the myths ). Our prayers were answered through a common glue between erlang, python as well as javascript. It&#8217;s dreadfully simple, yet a life-saver.</p>
<pre class="prettyprint">%% how we handled unicode pretty printing prior to R13A ( 3 cheers for Base64 encoding ! )
%% some_binary_keyword/0 is a placeholder for whatever returns the keyword as a binary
out(_Arg) -&gt;
    B64 = base64:encode( some_binary_keyword() ),
    Kw = {script, [], [
              "document.write(Base64.decode('",
              B64,
              "'));"
         ]},
    {ehtml, Kw}.</pre>
<p>Since a binary can be spat out by the yaws ehtml structure, there is no need to convert from binary back to string. Wrt unicode, I hope this also means that we will be able to actually work with unicode in the erlang shell, inside code, and so on. Erlang&#8217;s nature of storing strings as lists of integers turned out to be a boon for us, because it now doesn&#8217;t matter if our python crawler crawls english or swahili; it&#8217;s all integers stored as binary, which makes it cheaper than strings and more parser friendly. Having a look at <a href="http://github.com/rvirding/leex/tree/master">leex/yecc</a> for building unicode compilers might be a hack for another day.</p>
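<p>With the unicode module mentioned in the release note, the same Base64 trick can be written without hand-rolling the conversion. This is only a sketch under that assumption; unicode_sketch, to_b64/1 and from_b64/1 are illustrative names, not something we run in production.</p>
<pre class="prettyprint">-module(unicode_sketch).
-export([to_b64/1, from_b64/1]).

%% Turn a list of code points into a UTF-8 binary, then Base64 it
%% so that it survives being embedded in ASCII-only javascript.
to_b64(CodePoints) -&gt;
    Utf8 = unicode:characters_to_binary(CodePoints),
    base64:encode(Utf8).

%% The reverse trip: Base64 text back to a list of code points.
from_b64(B64) -&gt;
    unicode:characters_to_list(base64:decode(B64)).</pre>
<p>The round trip gives back the original list of code points, whether they fall inside or outside the ISO-latin-1 range.</p>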
<ul>
<li>
<h2><strong><strong>Highlight #3</strong></strong></h2>
<pre>Message passing has been further optimized for parallel
	      execution. Serial message passing is slightly more expensive
	      than before, but parallel send to a common receiver is much
	      cheaper.</pre>
</li>
</ul>
<p>I&#8217;ll be looking forward to also comparing <a href="http://hg.rabbitmq.com/rabbitmq-server/file/b95f2fd4e3f6/src/gen_server2.erl">gen_server2</a> which has been doing the  rounds in some projects.</p>
<ul>
<li>
<h2><strong><strong>Highlight #4</strong></strong></h2>
<pre>OTP-7804

The BIFs atom_to_binary/2, binary_to_atom/2, and
	      binary_to_existing_atom/2 have been added.</pre>
</li>
</ul>
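<p>A quick sketch of how these BIFs fit together ( atom_sketch and its functions are made-up names ). binary_to_existing_atom/2 is the interesting one for input coming from outside, since it fails instead of creating a new atom.</p>
<pre class="prettyprint">-module(atom_sketch).
-export([roundtrip/1, known_atom/1]).

%% atom -&gt; binary -&gt; atom
roundtrip(Atom) -&gt;
    Bin = atom_to_binary(Atom, utf8),
    binary_to_atom(Bin, utf8).

%% Only succeeds for atoms that already exist,
%% so untrusted input cannot grow the atom table.
known_atom(Bin) -&gt;
    try binary_to_existing_atom(Bin, utf8) of
        Atom -&gt; {ok, Atom}
    catch
        error:badarg -&gt; error
    end.</pre>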
<ul>
<li>
<h2><strong><strong>Highlight #5</strong></strong></h2>
<pre>OTP-7826
Nodes belonging to different independent clusters can now
	      co-exist on the same host with the help of a new environment
	      variable setting ERL_EPMD_PORT.</pre>
</li>
</ul>
<p>Naming nodes ( hostname vs fully qualified ip ) tends to get a little tricky if you don&#8217;t have static IPs, especially when different hostnames try to interconnect. For those who aren&#8217;t familiar, creating a distributed cluster is as simple as</p>
<pre>net_adm:ping( 'name@ip').
or
net_adm:ping('justhostname') .</pre>
<p>and if the nodes set the same cookie ( just a security identifier ) and all goes well, it should return pong. If there are problems, pang. If the nodes are all connected on the same LAN and share the same cookie, there wouldn&#8217;t be too many hassles. Until now, I don&#8217;t think nodes from different clusters could communicate. I wonder if this means that we could have nodes talking to each other from different LANs. This is big.</p>
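<p>As a small illustration of the pong / pang behaviour, here is a sketch that pings a list of nodes and reports which ones answered. ping_sketch and ping_all/1 are hypothetical names, and the node names you pass in are your own.</p>
<pre class="prettyprint">-module(ping_sketch).
-export([ping_all/1]).

%% Ping every node and report which ones answered.
%% Returns [{Node, pong | pang}].
ping_all(Nodes) -&gt;
    [{Node, net_adm:ping(Node)} || Node &lt;- Nodes].</pre>
<p>Nodes that share your cookie and are reachable come back as pong and show up in nodes() afterwards; everything else comes back as pang.</p>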
<ul>
<li>
<h2><strong><strong>Highlight #6</strong></strong></h2>
<pre>OTP-7641  When chunk reading a disk log opened in read_only mode, bad
	      terms could crash the disk log process.</pre>
</li>
</ul>
<p><img class="size-medium wp-image-26 alignleft" title="screenshot-32" src="http://developers.hover.in/blog/wp-content/uploads/2009/03/screenshot-32-300x187.png" alt="screenshot-32" width="316" height="198" />Some time back, we noticed that the nodes were crashing, mostly with &#8220;out of memory&#8221; errors which would bring down the VM, sometimes even the node, which isn&#8217;t pretty at all. We followed it up with more logging, to no avail. I don&#8217;t know how we missed it, but it turned out that the “error_logger” module registers one and only one process to receive and handle all log messages. That process ( at least till 1.77 ) was a bottleneck, so in fact logging to find out what was wrong kept forcing more crashes. Since then we&#8217;ve simply cut down logging to almost nil, and we haven&#8217;t faced the same issue again. It was quite embarrassing when I did try it out locally. Simply running <em>[  error_logger:info_msg( "writing ~p~n", [ X ] ) || X &lt;- lists:seq(1, AHighEnoughNumber) ] </em>brought yaws down. I hope this is fixed either with the latest yaws release or R12B, if that&#8217;s what they meant by this fix. Pity we didn&#8217;t debug <a href="http://www.planeterlang.org/en/planet/article/A_Case_Study_of_Scalability_Related_Out_of_memory_Crash_in_Erlang/">this</a> earlier.</p>
<ul>
<li>
<h2><strong><strong>Highlight #7 (mnesia)<br />
</strong></strong></h2>
<pre>    OTP-7753  With bad timing several api functions could return or exit
	      with a bad error message when mnesia was shutting down.

    OTP-7835  mnesia:clear_table/1 cleared all nodes table content even if
	      the table was local_content only type.</pre>
</li>
</ul>
<p>We&#8217;ve also noticed inconsistencies in R12B wrt clearing fragments of mnesia tables. What our module frag_mnesia ( which we&#8217;ll talk about later ) does is spawn processes that each address one fragment explicitly and clear it. We&#8217;ve got similar functions for querying disk-only tables as well. Here are some results</p>
<pre>%% wrt time taken
an mnesia dirty_match_object  on a fragmented table ( eg: 8 fragments + the base table )
%%is slower than
 spawning as many process's as there are fragments and dirty matching over them
%%is slower than
mnesia reads</pre>
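<p>The middle option above, one process per fragment, can be sketched roughly as follows. This is not frag_mnesia itself: frag_sketch, frag_names/1 and match_fragments/2 are illustrative names, and the sketch assumes the table was created with frag_properties.</p>
<pre class="prettyprint">-module(frag_sketch).
-export([frag_names/1, match_fragments/2]).

%% Names of all fragments of Tab ( the base table included ).
frag_names(Tab) -&gt;
    mnesia:activity(async_dirty,
                    fun() -&gt; mnesia:table_info(Tab, frag_names) end,
                    [], mnesia_frag).

%% Run a dirty match on every fragment in its own process
%% and concatenate the results.
%% Sketch only: no timeouts, no handling of crashed workers.
match_fragments(Tab, Pattern) -&gt;
    Self = self(),
    Pids = [spawn(fun() -&gt;
                      Self ! {self(), mnesia:dirty_match_object(Frag, Pattern)}
                  end)
            || Frag &lt;- frag_names(Tab)],
    lists:append([receive {Pid, Rows} -&gt; Rows end || Pid &lt;- Pids]).</pre>
<p>Each worker talks to its fragment by name, so the match bypasses the mnesia_frag dispatch and the fragments are scanned side by side.</p>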
<p>Say you have data that changes over time; perhaps it could change from more than one point of entry, or perhaps you&#8217;re accepting a whole bunch of tags. The smart thing to do for editing is in fact to make the mnesia table of type set, and simply overwrite. In fact the create &amp; edit functions are often the same function at some point. But sometimes you have data that is one-to-many, eg: meta tags of a webpage. There may be multiple meta tags for the same webpage. The choices we had for the key were</p>
<pre class="prettyprint">%Approach1 : Table is type set, Key as tuple , no conflict resolution, but slower
{ Url , Kw }
%Approach2 : Table is type bag, Key is either Kw or Url ,  conflict resolution, but much quicker
Kw

%sample entries for a mnesia set table
{ &lt;&lt;"unitednations.com/"&gt;&gt; , &lt;&lt;"india"&gt;&gt; } , MetaData, TimeT1
{ &lt;&lt;"unitednations.com/"&gt;&gt; , &lt;&lt;"india"&gt;&gt; } , NewMetaData, TimeT2  %% overwrites TimeT1 which is great

%sample entries for a mnesia bag table
 &lt;&lt;"unitednations.com/"&gt;&gt; , &lt;&lt;"india"&gt;&gt;  , MetaData, TimeT1
 &lt;&lt;"unitednations.com/"&gt;&gt; , &lt;&lt;"india"&gt;&gt;  , NewMetaData, TimeT2  %% two versions of data</pre>
<p>For a crawler, recrawls of webpages are something you need to cater to. If the table above was of type set, then the record at TimeT2 would simply overwrite the first row with new data, which is ideal. If the table above was of type bag, then there would have been two rows for the same url and keyword combination. We didn&#8217;t want to have to deal with the inconsistent rows, so we kept the table as a set, which drastically slowed down queries on large fragmented tables. There are two approaches that we might look at from here on: keeping another table where the index is in fact the keyword, or storing the tables as bags and dealing with conflict resolution between different versions of data. <a href="http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pd">Amazon&#8217;s paper on dynamo</a> talks about reading multiple versions of data ( in fact an S3 object is typically replicated on 3 nodes, reads are only allowed if two nodes respond, and writes are always accepted ). But they say this ratio changes from webapp to webapp within Amazon itself. The interesting part was that conflict resolution is done during the reads, not during the writes. In fact vector clocks are used to merge two different versions of data.</p>
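<p>The overwrite-vs-two-rows difference is easy to see in a few lines. Note that this sketch uses ets rather than mnesia, purely as a stand-in to keep it self-contained; set and bag behave the same way here. setbag_sketch, demo/0 and the table names are made up.</p>
<pre class="prettyprint">-module(setbag_sketch).
-export([demo/0]).

%% Insert two versions of the same { Url, Kw } key into a set
%% and into a bag, and return what each table kept.
demo() -&gt;
    Key = {&lt;&lt;"unitednations.com/"&gt;&gt;, &lt;&lt;"india"&gt;&gt;},
    Set = ets:new(meta_set, [set]),
    Bag = ets:new(meta_bag, [bag]),
    Insert = fun(T) -&gt;
                 ets:insert(T, {Key, old_meta, 1}),
                 ets:insert(T, {Key, new_meta, 2})
             end,
    lists:foreach(Insert, [Set, Bag]),
    Result = {ets:lookup(Set, Key), ets:lookup(Bag, Key)},
    ets:delete(Set),
    ets:delete(Bag),
    Result.</pre>
<p>The set ends up with only the new_meta row, while the bag holds both versions and leaves it to the reader to pick one.</p>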
<p>Another design decision we took early was how we <a href="http://highscalability.com/unorthodox-approach-database-design-coming-shard">sharded</a> the data. I guess we broke every rule in the book when we decided that every user would have their own set of tables.</p>
<p><strong><em>Won&#8217;t there be too many tables ?</em></strong> Yes, and that many fewer table locks to deal with, letting us have more workers accessing multiple tables that have absolutely no transaction locks between them, and drastically reducing the number of processes trying to access the same data.</p>
<p><strong><em>Tell me more ! </em></strong>Backing up is easier, having user specific rules is easier, and analysing and awarding more workers on the fly to high-priority publishers becomes a charm ( much like spawning ec2 instances i assume ).</p>
<p><strong><em>Wait, did you say you stored crawler data in mnesia ? </em></strong>That&#8217;s right, sharded from the word go, disc-only and fragmented from the word go ( allowing us to spawn processes to search each fragment, which turns out to be several magnitudes faster than non-concurrent matches ). What&#8217;ll be interesting is how far we can go with mnesia.</p>
<p><em><strong>But how do you get in so much data so fast ?</strong></em> It isn&#8217;t lightning speed, but it takes around 4 seconds for everything from crawling and creating a whole bunch of indexes to loading into mnesia from a heat seeking ( will crawl based on popularity of pages ) hybridqueue. We also have another singleton crawler that uses more traditional techniques. User specific metadata is often replicated across all nodes, and is kept as disc_copies when it is required in most calls. Stat tables are typically disc-only tables, and we currently have awstats configured over yaws logs to give us some http analytics juice. We&#8217;re also working on an <a href="http://sourceforge.net/projects/rrdstats/" target="_blank">rrdstats</a> implementation that uses a gen_server to fill in data within the step frequency; we still need to spend some more time before it&#8217;s production ready.</p>
<p><em><strong>And the rest of the tables are mainly disc_copies or in RAM? Where does that leave us ?</strong> </em> <a href="http://en.wikipedia.org/wiki/Denormalization" target="_blank">Denormalization</a> works great when you need to render amazing amounts of data to some of India&#8217;s biggest sites in under 200-300 ms. Loads of inverted indexes are generated through extremely well-controlled and predictive flow controls. We&#8217;re still exploring our own custom cache workers, which are typically ring buffers of fixed size, but we&#8217;d love to spruce things up with perhaps squid servers, memcached , <a href="http://www.joeandmotorboat.com/2009/01/10/introducing-merle-an-erlang-memcached-client/" target="_blank">merle</a> , or a <a href="http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/" target="_blank">distributed key-value store</a> ?</p>
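<p>For the curious, a fixed-size ring buffer of the kind mentioned above can be sketched on top of the queue module. This is an illustration only and not our cache worker; ring_sketch and its functions are made-up names, and a real worker would wrap this state in a process.</p>
<pre class="prettyprint">-module(ring_sketch).
-export([new/1, push/2, to_list/1]).

%% A ring is {MaxSize, CurrentSize, Queue}.
new(Max) when is_integer(Max), Max &gt; 0 -&gt;
    {Max, 0, queue:new()}.

%% Add an item; once the ring is full the oldest item is dropped.
push(Item, {Max, Max, Q}) -&gt;
    {_Oldest, Q1} = queue:out(Q),
    {Max, Max, queue:in(Item, Q1)};
push(Item, {Max, N, Q}) -&gt;
    {Max, N + 1, queue:in(Item, Q)}.

%% Oldest item first.
to_list({_Max, _N, Q}) -&gt;
    queue:to_list(Q).</pre>
<p>The current size is tracked alongside the queue so that push/2 never has to walk the queue to find out whether it is full.</p>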
<ul>
<li>
<h2><strong><strong>Highlight #8 ( stdlib )</strong></strong></h2>
<pre>    OTP-7230  The functions lists:seq/1,2 return the empty list in a few
	      cases when they used to generate an exception, for example
	      lists:seq(1, 0). See lists(3) for details. (Thanks to Richard
	      O'Keefe.)</pre>
</li>
<li>
<pre>    OTP-7740  When a process started with proc_lib, gen_server, or gen_fsm
	      exits with reason {shutdown,Term}, a crash report will no
	      longer be generated (to allow a clean shutdown, but still
	      provide additional information to process that are linked to
	      the terminating process).</pre>
</li>
</ul>
<div class="mceTemp">
<dl class="wp-caption alignleft" style="width: 160px;">
<dt class="wp-caption-dt"><a href="http://www.erlang-factory.com/conference/SFBayAreaErlangFactory2009"><img title="erlang factory" src="http://www.erlang-factory.com/images/erlanglogo.gif" alt="erlang factory" width="150" height="80" /></a></dt>
</dl>
</div>
<p>All in all, it&#8217;s interesting times at hover.in with R13A coming up, and the buzz is definitely in the air wrt erlang. Too bad I couldn&#8217;t make the CFP for the <a href="http://www.erlang-factory.com/conference/SFBayAreaErlangFactory2009">erlang factory&#8217;s Bay Area 2009 conference</a>; it looks like it&#8217;s going to be the biggest and most widely anticipated erlang conference yet! Can&#8217;t wait for the videos of the talks to roll out. There are also a few talks by companies like Facebook and Mochiweb, and of course the amazing guys at ProcessOne.</p>
<p><strong>And there are some more things to rejoice about at hover.in too :</strong></p>
<ul>
<li><strong>Peaking at 50,000 hovers in a single day</strong>, where a hover is every time a viewer moves their cursor over a word that the publisher / blogger chooses, to see content. We&#8217;re now happy to see around 15,000 - 20,000 hovers a day from the few publishers that have signed up with us.</li>
<li>An interesting part of our scaling experience with erlang &amp; yaws has been &#8230;..<strong>reducing one node from our 4 node cluster</strong> : ) We&#8217;re more excited about pushing how much we can do with how little ( 2-3 quad-core machines as production servers ). With anywhere between 3 and 30 requests per second, we&#8217;ve just touched <strong>quarter of a million hovers in under 2 months</strong>, with zilch marketing or PR, and with established large-size Indian portals on board ( Sify Finance , Oneindia and more to be announced )</li>
<li><strong>full support for unicode languages</strong>, which means we&#8217;re the first intext company to hoverize regional languages. Go have a look. <a href="http://thatstamil.oneindia.in" target="_blank">http://thatstamil.oneindia.in </a>. A big shoutout to the folks at OneIndia for making it happen, and believing in the potential of the product. We are however looking for an NLP hacker to help us move ahead, especially in parts of speech recognition, classification and clustering, not just for english but to pioneer with regional languages as well.</li>
<li>Building the next generation of contextual and engaging in-text content, be it a crunchbase hoverlet , a map hoverlet, music from last.fm or videos from youtube, a deck from slideshare, graphs from compete, conversations from twitter, and virtually any widget or 3rd party gadget out there. We typically work on erlang, python , javascript ( based on the excellent <a href="http://developer.yahoo.net/yui" target="_blank">YUI</a> ), and will be launching our facebook / wordpress / firefox plugins soon.</li>
<li><strong>the 26th alliance</strong>, <a href="http://twitter.com/26th" target="_blank">was announced</a> in our typical low-key fashion on January 26th, and attempts to bring together a growing set of Indian consumer-web companies on common ground to pursue a singular cause: to develop and distribute widgets, and help expose and utilize upcoming APIs.</li>
</ul>
<p>It&#8217;s been a fantastic ride through the past 18 months, and hopefully we&#8217;ll be able to keep in touch within the eco-system of bloggers, content-providers and developers in the near future. Take a look at the product <a href="http://hover.in/category/blog/">demos on the main hover.in blog</a>, and keep track of <a href="http://twitter.com/hoverin" target="_blank">hover.in on twitter</a>.</p>
<p>Thanks for dropping in, and a big shoutout to everyone behind erlang, everyone at #erlang and github and the blogosphere promoting erlang - on behalf of our team of 5 developers, including myself : )<br />
~<br />
Keep Clicking,<br />
Bhasker V Kode , aka bosky<br />
Co-Founder &amp; CTO</p>
]]></content:encoded>
			<wfw:commentRss>http://developers.hover.in/blog/2009/somethings-to-rejoice-about/feed/</wfw:commentRss>
		<feedburner:origLink>http://developers.hover.in/blog/2009/somethings-to-rejoice-about/</feedburner:origLink></item>
		<item>
		<title>developers.hover.in</title>
		<link>http://feedproxy.google.com/~r/hoverinDevelopers/~3/UJ3VWMm2LB8/</link>
		<comments>http://developers.hover.in/blog/2009/hello-world/#comments</comments>
		<pubDate>Mon, 02 Feb 2009 04:58:22 +0000</pubDate>
		<dc:creator>Bosky</dc:creator>
		
		<category><![CDATA[Announcements]]></category>

		<category><![CDATA[announcement]]></category>

		<guid isPermaLink="false">http://dev.hover.in/blog/?p=1</guid>
		<description><![CDATA[This is the developer blog at hover.in , where posts are contributed by the developers  (and wannabe-cartoonists until we get another blog) . For the most part - this group blog will touch upon the following areas :

erlang, python , system utils , and other stuff mostly on the LYME stack ( linux, yaws, [...]]]></description>
			<content:encoded><![CDATA[<p>This is the developer blog at <em>hover.in</em> , where posts are contributed by the developers  (and wannabe-cartoonists until we get another blog) . For the most part - this group blog will touch upon the following areas :</p>
<ol>
<li><a href="http://developers.hover.in/blog/?tag=erlang" target="_self">erlang</a>, python , system utils , and other stuff mostly on the LYME stack ( linux, yaws, <a href="http://developers.hover.in/blog/?tag=mnesia">mnesia</a>, erlang )</li>
<li>javascript, user interfaces , css , graphics discussions</li>
<li>theoretical computer science, caching, CDN’s ,and scaling experiences</li>
<li>tech <a href="http://developers.hover.in/blog/category/anns/" target="_self">announcements</a> updates on features, open-source work, API talk, and even hiring</li>
</ol>
<p>Do get in touch with us , <a href="http://twitter.com/hoverin">follow us on twitter</a>,or shoot in an email to contact at hover dot in.</p>
<p><img class="size-full wp-image-213" title="Bosky" src="http://hover.in/wp-content/uploads/2009/01/dsc_9749-1.jpg" alt="Bosky" width="58" height="58" /><img title="Tag!" src="http://developers.hover.in/blog/wp-content/uploads/2009/02/q_silhouette.gif" alt="Tag" width="57" height="57" /> <img title="Arun" src="http://hover.in/wp-content/uploads/2009/01/dsc_9777-2.jpg" alt="Arun" width="56" height="57" /></p>
<p>~<br />
FTW<br />
Team hover.in</p>
]]></content:encoded>
			<wfw:commentRss>http://developers.hover.in/blog/2009/hello-world/feed/</wfw:commentRss>
		<feedburner:origLink>http://developers.hover.in/blog/2009/hello-world/</feedburner:origLink></item>
	</channel>
</rss>
