<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>nutrun » Software</title>
	
	<link>http://nutrun.com</link>
	<description>nutrun</description>
	<lastBuildDate>Tue, 10 Nov 2009 13:48:45 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/nutrun/feed" type="application/rss+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
		<title>Deployment setup automation</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/TFqWN_iztSk/</link>
		<comments>http://nutrun.com/weblog/deployment-setup-automation/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 13:47:16 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=327</guid>
		<description><![CDATA[Part of my work these days has to do with building and deploying numerous experimental applications with varying life cycles. Many of these applications get built and put on a server in less than a day only to be shut down and never looked at again a couple of days later, others get turned off [...]]]></description>
			<content:encoded><![CDATA[<p>Part of my work these days has to do with building and deploying numerous experimental applications with varying life cycles. Many of these applications get built and put on a server in less than a day only to be shut down and never looked at again a couple of days later, others get turned off and revisited after some time, while others graduate to larger, wider scope systems.</p>
<p>This means that I get to deploy applications for the first time more frequently than usual. Also, because we deploy to virtualised infrastructures (including an internal cloud, Slicehost and Amazon EC2), slice instances (servers) tend to get rebuilt more often than they would in the absence of virtualisation. First-time deployments are generally more involved than subsequent ones because there is setup to be done and software to be installed in order for the host servers to accommodate the application.</p>
<p>One way to treat first time deployment woes is to create and maintain images of the system in the state required to host the application. I find this to work well when dealing with moderate numbers of applications and servers, whereas creating and keeping images up to date has a tendency to become tedious and inflexible as the number of applications and images increases.</p>
<p>As an alternative, we can move responsibility for prerequisite system setup and installation closer to the application code, in the form of an <code>after</code> hook to the <code>deploy:setup</code> task that we call the first time we deploy an application with Capistrano. Here&#8217;s some Capistrano code that performs one-time setup tasks.</p>
<pre>
namespace :setup do
  task :install_libraries do
    sudo 'apt-get install libxml2 libxml2-dev libmysqlclient15-dev -y'
  end
end

after 'deploy:setup', 'setup:install_libraries'
</pre>
<p>With this approach, the application knows how to set up the system the way it needs it to be the next time it gets deployed for the first time. As an added benefit, the Capistrano code serves as documentation for the application&#8217;s system requirements.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/deployment-setup-automation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/deployment-setup-automation/</feedburner:origLink></item>
		<item>
		<title>VCS practices over features</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/PZdydXtXuHE/</link>
		<comments>http://nutrun.com/weblog/vcs-practices-over-features/#comments</comments>
		<pubDate>Sat, 29 Aug 2009 00:36:09 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=316</guid>
		<description><![CDATA[I&#8217;ve often heard people I know and respect say that git is leaps and bounds better than Subversion. I&#8217;ve been a relatively early adopter of git, it&#8217;s been my VCS of choice for almost two years now. Even though I find it superior to most of the competition I struggle to justify the &#8220;leaps and [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve often heard people I know and respect say that <cite><a href="http://git-scm.com/" title="Git - Fast Version Control System">git</a> is leaps and bounds better than <a href="http://subversion.tigris.org/" title="subversion.tigris.org">Subversion</a></cite>. I&#8217;ve been a relatively early adopter of git, it&#8217;s been my VCS of choice for almost two years now. Even though I find it superior to most of the competition I struggle to justify the &#8220;leaps and bounds&#8221; claim and would rather more modestly call it &#8220;a step forward&#8221;.</p>
<p>This is probably due to the practices we find benefit our development process. Git puts great emphasis on branching, something we generally tend to avoid (to clarify, I&#8217;m not referring to local branching). We concentrate on feedback based on the usage of our applications. This means that we strive to commit as often as possible and, most importantly, deploy to production at a constant rate. Grossly simplified, the process is: identify a small coherent feature, build it, commit it to the master branch and deploy. No part of the codebase is owned by a subdivision of the team, everyone works on everything.</p>
<p>By far the most popular git commands we issue are <code>git pull</code>, <code>git add</code> and <code>git push</code>, not that different to <code>svn update</code> and <code>svn commit</code>.</p>
<p>When I first started using git I was wondering if I had developed a fear of branching because of Subversion&#8217;s inefficiencies in that area. In reality, I think that an environment where every developer constantly has an up to date understanding of the codebase and especially a current grasp of the design and overall vision will always be more efficient than working remotely and having merge checkpoints, no matter how cleverly the VCS handles branching. This is why I think a faster, distributed, superior at merging VCS is not something more dramatic than a desirable step forward.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/vcs-practices-over-features/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/vcs-practices-over-features/</feedburner:origLink></item>
		<item>
		<title>Hello world nginx module</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/BRhTbyruaGk/</link>
		<comments>http://nutrun.com/weblog/hello-world-nginx-module/#comments</comments>
		<pubDate>Sat, 15 Aug 2009 00:20:09 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=287</guid>
		<description><![CDATA[Several times over the past few months I made short lived attempts of delving into the mechanics of nginx modules. Although an invaluable resource to anyone seriously interested in the subject, Emiller&#8217;s Guide To Nginx Module Development doesn&#8217;t at the time of this writing include a quick-start example I could hack together and see in [...]]]></description>
			<content:encoded><![CDATA[<p>Several times over the past few months I made short lived attempts of delving into the mechanics of <a href="http://nginx.net/" title="nginx">nginx</a> modules. Although an invaluable resource to anyone seriously interested in the subject, <a href="http://www.evanmiller.org/nginx-modules-guide.html" title="Emiller's Guide to Nginx Module Development">Emiller&#8217;s Guide To Nginx Module Development</a> doesn&#8217;t at the time of this writing include a quick-start example I could hack together and see in action. Getting something to run as quickly as possible is my preferred way of starting the study of new things and every time I caught myself searching the web for a &#8220;Hello world nginx module&#8221;.</p>
<p>I will not go into any details, <a href="http://www.evanmiller.org/nginx-modules-guide.html" title="Emiller's Guide to Nginx Module Development">Emiller&#8217;s Guide</a> does an excellent job at that, I&#8217;m only going to mention the steps I believe are absolutely necessary to write, compile and run an nginx handler module that responds to every request with the string &#8220;Hello world&#8221;.</p>
<p>There is a minimum of two files required for writing an nginx module, the first should be called <code>config</code> and looks something like this:</p>
<pre>
ngx_addon_name=ngx_http_hello_world_module
HTTP_MODULES="$HTTP_MODULES ngx_http_hello_world_module"
NGX_ADDON_SRCS="$NGX_ADDON_SRCS $ngx_addon_dir/ngx_http_hello_world_module.c"
</pre>
<p>The second is the module&#8217;s implementation in C and nginx convention suggests a name like <code>ngx_http_modulename_module.c</code>, in this case <code>ngx_http_hello_world_module.c</code>.</p>
<pre>
#include &lt;ngx_config.h&gt;
#include &lt;ngx_core.h&gt;
#include &lt;ngx_http.h&gt;

static char *ngx_http_hello_world(ngx_conf_t *cf, ngx_command_t *cmd, void *conf);

static ngx_command_t  ngx_http_hello_world_commands[] = {

  { ngx_string("hello_world"),
    NGX_HTTP_LOC_CONF|NGX_CONF_NOARGS,
    ngx_http_hello_world,
    0,
    0,
    NULL },

    ngx_null_command
};

static u_char  ngx_hello_world[] = "hello world";

static ngx_http_module_t  ngx_http_hello_world_module_ctx = {
  NULL,                          /* preconfiguration */
  NULL,                          /* postconfiguration */

  NULL,                          /* create main configuration */
  NULL,                          /* init main configuration */

  NULL,                          /* create server configuration */
  NULL,                          /* merge server configuration */

  NULL,                          /* create location configuration */
  NULL                           /* merge location configuration */
};

ngx_module_t ngx_http_hello_world_module = {
  NGX_MODULE_V1,
  &amp;ngx_http_hello_world_module_ctx, /* module context */
  ngx_http_hello_world_commands,   /* module directives */
  NGX_HTTP_MODULE,               /* module type */
  NULL,                          /* init master */
  NULL,                          /* init module */
  NULL,                          /* init process */
  NULL,                          /* init thread */
  NULL,                          /* exit thread */
  NULL,                          /* exit process */
  NULL,                          /* exit master */
  NGX_MODULE_V1_PADDING
};

static ngx_int_t ngx_http_hello_world_handler(ngx_http_request_t *r)
{
  ngx_buf_t    *b;
  ngx_chain_t   out;

  r-&gt;headers_out.content_type.len = sizeof("text/plain") - 1;
  r-&gt;headers_out.content_type.data = (u_char *) "text/plain";

  b = ngx_pcalloc(r-&gt;pool, sizeof(ngx_buf_t));
  if (b == NULL) {
    return NGX_HTTP_INTERNAL_SERVER_ERROR;
  }

  out.buf = b;
  out.next = NULL;

  /* sizeof counts the trailing NUL, so subtract 1 to send only the text */
  b-&gt;pos = ngx_hello_world;
  b-&gt;last = ngx_hello_world + sizeof(ngx_hello_world) - 1;
  b-&gt;memory = 1;
  b-&gt;last_buf = 1;

  r-&gt;headers_out.status = NGX_HTTP_OK;
  r-&gt;headers_out.content_length_n = sizeof(ngx_hello_world) - 1;
  ngx_http_send_header(r);

  return ngx_http_output_filter(r, &amp;out);
}

static char *ngx_http_hello_world(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)
{
  ngx_http_core_loc_conf_t  *clcf;

  clcf = ngx_http_conf_get_module_loc_conf(cf, ngx_http_core_module);
  clcf-&gt;handler = ngx_http_hello_world_handler;

  return NGX_CONF_OK;
}
</pre>
<p>Both <code>config</code> and <code>ngx_http_hello_world_module.c</code> should be placed in the same directory, let&#8217;s say <code>/etc/ngxhelloworld</code>. Modules are compiled into the nginx binary. To do so, <a href="http://wiki.nginx.org/NginxInstall" title="NginxInstall">download the nginx source</a>, uncompress, and in the nginx source directory run:</p>
<pre>
./configure --add-module=/etc/ngxhelloworld
make
sudo make install
</pre>
<p>Finally, add a module directive to nginx&#8217;s configuration (default is <code>/usr/local/nginx/conf/nginx.conf</code>) to enable the module for a location.</p>
<pre>
location = /hello {
  hello_world;
}
</pre>
<p>At this point, we can start nginx. Navigating to <code>http://localhost/hello</code> will yield the result of all this labor.</p>
<p>Alongside Emiller&#8217;s Guide, I also found reading <a href="http://wiki.nginx.org/Nginx3rdPartyModules" title="Nginx3rdPartyModules">nginx third party module</a> code helpful.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/hello-world-nginx-module/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/hello-world-nginx-module/</feedburner:origLink></item>
		<item>
		<title>Asynchronous session content injection</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/FWWW2yNAHPY/</link>
		<comments>http://nutrun.com/weblog/asynchronous-session-content-injection/#comments</comments>
		<pubDate>Thu, 06 Aug 2009 12:14:07 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=273</guid>
		<description><![CDATA[Applying a clear distinction between stateless and stateful content when designing a web application is tricky but worth tackling early so that content not specific to user sessions can benefit from web caching. The technique we are trying out for scramble.com reminds me of what I described in State separation and was introduced to me [...]]]></description>
			<content:encoded><![CDATA[<p>Applying a clear distinction between stateless and stateful content when designing a web application is tricky but worth tackling early so that content not specific to user sessions can benefit from web caching. The technique we are trying out for <a href="http://www.scramble.com/" title="">scramble.com</a> reminds me of what I described in <a href="http://nutrun.com/weblog/state-separation/" title="nutrun  &raquo; Blog Archive   &raquo; State separation">State separation</a> and was introduced to me by <a href="http://www.neophiliac.net/" title="ne•o•phil•i•ac">Mike Jones</a> who was inspired by the <em>Dynamically Update Cached Pages</em> chapter in <a href="http://www.pragprog.com/titles/fr_arr/advanced-rails-recipes" title="The Pragmatic Bookshelf | Advanced Rails Recipes">Advanced Rails Recipes</a>.</p>
<p>
<a href="http://www.flickr.com/photos/nutrun/3794424247/" title="asynchronous-session-content-injection by nutrunflickr, on Flickr"><img src="http://farm3.static.flickr.com/2501/3794424247_30b0d5cc52_o.png" width="331" height="242" alt="asynchronous-session-content-injection" /></a>
</p>
<p>The idea involves serving resources that are not session specific independently of personalized content, and using AJAX calls to inject session specific content into the page.</p>
<pre>
require 'rubygems'
require 'sinatra'
require 'json'

configure do
  enable :sessions
end

get '/' do
  headers['Cache-Control'] = 'max-age=60, must-revalidate'
  erb :index
end

get '/userinfo' do
  if session[:user]
    JSON.dump(:user =&gt; session[:user])
  else
    halt 401
  end
end

get '/login' do
  session[:user] = 'rock'
  redirect '/'
end

get '/logout' do
  session.clear
  redirect '/'
end
</pre>
<p>Notice some of the headers for <code>'/'</code>:</p>
<pre>
$ curl -I http://localhost:4567/
Cache-Control: max-age=60, must-revalidate
Set-Cookie: rack.session=BAh7AA%3D%3D%0A; path=/
</pre>
<p>The <code>Cache-Control</code> policy instructs a web cache to keep this version of the resource for 60 seconds before requesting a fresh one. <code>Set-Cookie</code> however will usually cause a web cache to never store the response and always query its back end.</p>
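<p>As a rough illustration of that rule, here is a minimal Ruby sketch of the decision a shared cache makes. The <code>shared_cacheable?</code> helper is hypothetical and heavily simplified: it assumes a cache that refuses anything carrying <code>Set-Cookie</code> and otherwise looks only for a positive <code>max-age</code>, whereas real caches apply many more rules.</p>

```ruby
# Hypothetical helper: a simplification of what a shared cache checks
# before storing a response. Not taken from Varnish or any other cache.
def shared_cacheable?(headers)
  # A response that sets a cookie is treated as user specific.
  return false if headers.key?('Set-Cookie')

  # Otherwise require an explicit, positive max-age.
  headers['Cache-Control'].to_s.match?(/max-age=[1-9]\d*/)
end

# The two headers from the curl output above.
index_headers = {
  'Cache-Control' => 'max-age=60, must-revalidate',
  'Set-Cookie' => 'rack.session=BAh7AA%3D%3D%0A; path=/'
}

puts shared_cacheable?(index_headers)

# The same response once the cookie has been stripped.
without_cookie = index_headers.reject { |name, _| name == 'Set-Cookie' }
puts shared_cacheable?(without_cookie)
```

<p>Under these assumptions the first call reports the response as not cacheable and the second as cacheable, which is why the cookie has to go before the cache policy can take effect.</p>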
<p>The following configuration tells <a href="http://varnish.projects.linpro.no/" title="Varnish - Trac">Varnish</a> to throw away the cookie from any request/response that doesn&#8217;t match one of the URLs that require authorization, thus causing it to react to response cache policies.</p>
<pre>
sub vcl_recv {
  if (req.url !~ "^(/login|/logout|/userinfo)") {
    unset req.http.cookie;
  }
}

sub vcl_fetch {
  if (req.url !~ "^(/login|/logout|/userinfo)") {
    unset obj.http.set-cookie;
  }
}
</pre>
<p>A snippet from the HTML response for <code>'/'</code>:</p>
<pre>
&lt;h1&gt;Hi&lt;/h1&gt;
&lt;div id="nav"&gt;
  &lt;a href="/login" class='login-control'&gt;Login&lt;/a&gt;
&lt;/div&gt;
</pre>
<p>&#8230; and the JavaScript for asynchronously injecting session data into the page:</p>
<pre>
$(function() {
  $.getJSON('/userinfo', function(data) {
    $('h1').text('Hi ' + data.user);
    $('#nav .login-control').attr('href', '/logout').html('logout');
  })
})
</pre>
<p>In summary, it is likely that a website will have significant amounts of content that is intended for everyone without the need for personalization. The performance of serving that content can benefit from web caching, but that becomes difficult as many websites&#8217; user experience depends on the presence of user sessions. Separating stateless from session specific content at the resource level and using a combination of HTTP and AJAX to merge the results of requests for both types of resources will make stateless content cacheable by decoupling it from the unnecessary cookie dependency.</p>
<p>Runnable code example : <a href="http://pastie.org/573878">http://pastie.org/573878</a></p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/asynchronous-session-content-injection/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/asynchronous-session-content-injection/</feedburner:origLink></item>
		<item>
		<title>Rack::CacheHeaders code</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/Xw1hobGSM3E/</link>
		<comments>http://nutrun.com/weblog/rackcacheheaders-code/#comments</comments>
		<pubDate>Mon, 18 May 2009 13:22:34 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=261</guid>
		<description><![CDATA[A few months ago I wrote about a possible method for centrally configuring HTTP cache headers in Rack based web applications which I called Rack::CacheHeaders. This is useful if your application&#8217;s architecture involves tools like Squid or Varnish, or if you are generally interested in harvesting the numerous advantages of HTTP caching for your web [...]]]></description>
			<content:encoded><![CDATA[<p>A few months ago I <a href="http://nutrun.com/weblog/rack-cache-headers/" title="nutrun  &raquo; Blog Archive   &raquo; Rack cache headers">wrote</a> about a possible method for centrally configuring HTTP cache headers in <a href="http://rack.rubyforge.org/" title="Rack: a Ruby Webserver Interface">Rack</a> based web applications which I called <code>Rack::CacheHeaders</code>. This is useful if your application&#8217;s architecture involves tools like <a href="http://www.squid-cache.org/" title="squid : Optimising Web Delivery">Squid</a> or <a href="http://varnish.projects.linpro.no/" title="Varnish - Trac">Varnish</a>, or if you are generally interested in harvesting the numerous advantages of HTTP caching for your web application.</p>
<p>The code has evolved a bit since and proven useful in a number of production systems. I created a <a href="http://gist.github.com/113441" title="gist: 113441 - GitHub">gist</a> of <code>Rack::CacheHeaders</code> in case someone else finds it handy. The tool is not exhaustive in terms of policies as found in the HTTP <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html" title="HTTP/1.1: Caching in HTTP">specs</a>; it&#8217;s a collection of the ones we needed in the projects it&#8217;s been used in so far. Consider adding ones you need to the gist to make the code more complete and widely useful.</p>
<p><code>Rack::CacheHeaders</code> allows configuring HTTP cache policy response headers based on request URI patterns. For example, to set the <code>Cache-Control: max-age</code> header for a <code>/guitars/:id</code> resource to one hour:</p>
<pre>
Rack::CacheHeaders.configure do |cache|
  cache.max_age(/^\/guitars\/\d+$/, 3600)
end
</pre>
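<p>To give an idea of the mechanics, here is a minimal sketch of a Rack-style middleware that maps request paths to a <code>max-age</code> value. It is not the code in the gist: the <code>MaxAgeHeaders</code> class and its rules hash are hypothetical names used for illustration, and only the <code>max-age</code> policy is covered.</p>

```ruby
# Hypothetical sketch, not the code in the gist: a Rack-style middleware
# that adds Cache-Control: max-age to responses whose path matches a pattern.
class MaxAgeHeaders
  # rules maps a Regexp to a number of seconds,
  # for example { %r{\A/guitars/\d+\z} => 3600 }
  def initialize(app, rules)
    @app = app
    @rules = rules
  end

  def call(env)
    status, headers, body = @app.call(env)
    path = env['PATH_INFO'].to_s

    # Pick the first rule whose pattern matches the request path, if any.
    _pattern, seconds = @rules.find { |pattern, _| pattern.match?(path) }
    headers = headers.merge('Cache-Control' => "max-age=#{seconds}") if seconds

    [status, headers, body]
  end
end

# A stand-in application so the sketch runs without the rack gem.
app = lambda { |_env| [200, { 'Content-Type' => 'text/plain' }, ['ok']] }
stack = MaxAgeHeaders.new(app, { %r{\A/guitars/\d+\z} => 3600 })

_status, headers, _body = stack.call({ 'PATH_INFO' => '/guitars/42' })
puts headers['Cache-Control']
```

<p>Keeping the rules in one place means individual handlers never need to know about cache policy, which is the point of configuring the headers centrally.</p>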
<p><a href="http://gist.github.com/113441" title="gist: 113441 - GitHub">Download/develop Rack::CacheHeaders</a></p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/rackcacheheaders-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/rackcacheheaders-code/</feedburner:origLink></item>
		<item>
		<title>97 Things Every Software Architect Should Know</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/lrEjngk4Ris/</link>
		<comments>http://nutrun.com/weblog/97-things-every-software-architect-should-know/#comments</comments>
		<pubDate>Sat, 28 Feb 2009 13:26:27 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=258</guid>
		<description><![CDATA[A few months ago I wrote one of the axioms for a community effort called 97 Things Every Software Architect Should Know which was driven and edited by Richard Monson-Haefel. This collection of principles, as contributed by an impressive range of software architects around the world, was recently released as a book by O&#8217;Reilly Media [...]]]></description>
			<content:encoded><![CDATA[<p>A few months ago I wrote one of the axioms for a community effort called <a href="http://97-things.near-time.net/wiki" title="Home Page for 97 Things 		 [97 Things] : Near-Time">97 Things Every Software Architect Should Know</a> which was driven and edited by <a href="http://www.monson-haefel.com/" title="Monson-Haefel's Web Site">Richard Monson-Haefel</a>. This collection of principles, as contributed by an impressive range of software architects around the world, was recently released as a <a href="http://oreilly.com/catalog/9780596522698/index.html" title="97 Things Every Software Architect Should Know | O'Reilly Media">book</a> by <a href="http://oreilly.com/" title="O'Reilly Media - Spreading the knowledge of technology innovators">O&#8217;Reilly Media</a> and is well worth a look if you&#8217;re interested in pragmatic advice based on how some of our colleagues approach technology projects.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/97-things-every-software-architect-should-know/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/97-things-every-software-architect-should-know/</feedburner:origLink></item>
		<item>
		<title>Caching proxy fronted web consumer</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/dhGygk-Fs1Y/</link>
		<comments>http://nutrun.com/weblog/caching-proxy-fronted-web-consumer/#comments</comments>
		<pubDate>Sat, 14 Feb 2009 14:31:16 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=242</guid>
		<description><![CDATA[Consider an application which as part of its functionality queries a product search web service.

WEB_SERVICE_ADDRESS = 'http://www.example.com'

url = URI.parse(WEB_SERVICE_ADDRESS)

Net::HTTP.start(url.host, url.port) do &#124;http&#124;
  http.get('/product-search?q=guitar')
end

Inspecting the response headers, we notice the web service instructs consumers that the results of the query will remain the same for one hour.

curl -I "http://www.example.com/product-search?q=guitar"

HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: [...]]]></description>
			<content:encoded><![CDATA[<p>Consider an application which as part of its functionality queries a product search web service.</p>
<pre>
WEB_SERVICE_ADDRESS = 'http://www.example.com'

url = URI.parse(WEB_SERVICE_ADDRESS)

Net::HTTP.start(url.host, url.port) do |http|
  http.get('/product-search?q=guitar')
end
</pre>
<p>Inspecting the response headers, we notice the web service instructs consumers that the results of the query will remain the same for one hour.</p>
<pre>
curl -I "http://www.example.com/product-search?q=guitar"

HTTP/1.1 200 OK
Content-Type: text/html
<strong>Cache-Control: max-age=3600, must-revalidate</strong>
Content-Length: 32650
Date: Sat, 14 Feb 2009 13:53:31 GMT
Age: 0
Connection: keep-alive
</pre>
<p>At this point we can choose to ignore the cache control header and keep on querying the service for this specific resource regardless of whether the response is going to be the same. This is suboptimal for the consumer, which will suffer unnecessary latency penalties, the service, which will have to respond to inessential requests, and the network which will be subject to unnecessary bandwidth usage. Another option involves making the web consumer aware of the service&#8217;s caching policies so that it only queries for data that it doesn&#8217;t have or data that&#8217;s become stale. This option remedies the above problems but introduces additional complexity to the consumer.</p>
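<p>To make the cost of that second option concrete, here is a minimal sketch of the bookkeeping a cache-aware consumer would have to carry. The <code>MaxAgeCache</code> class is hypothetical; it assumes only <code>max-age</code> matters and ignores revalidation, <code>Age</code> and every other caching rule.</p>

```ruby
# Hypothetical consumer-side cache: keeps a body until its max-age expires.
class MaxAgeCache
  Entry = Struct.new(:body, :expires_at)

  # The clock is injectable so expiry can be exercised without waiting.
  def initialize(clock = -> { Time.now })
    @clock = clock
    @entries = {}
  end

  # The block performs the real request and returns [body, cache_control].
  # It is only called when no fresh copy is held for the key.
  def fetch(key)
    now = @clock.call
    cached = @entries[key]
    unless cached.nil?
      return cached.body if cached.expires_at > now
    end

    body, cache_control = yield
    max_age = cache_control.to_s[/max-age=(\d+)/, 1]
    @entries[key] = Entry.new(body, now + max_age.to_i) if max_age
    body
  end
end

cache = MaxAgeCache.new
requests = 0

2.times do
  cache.fetch('/product-search?q=guitar') do
    requests += 1 # stands in for the Net::HTTP call
    ['results', 'max-age=3600, must-revalidate']
  end
end

puts requests
```

<p>Even this stripped-down version adds state, expiry logic and header parsing to the consumer, and a faithful implementation would need considerably more.</p>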
<p>A third option involves introducing a caching proxy to the web consumer&#8217;s stack responsible for mediating the service/consumer interactions solely based on the content&#8217;s caching characteristics.</p>
<p><a href="http://www.flickr.com/photos/nutrun/3278914298/" title="caching-proxy-fronted-web-consumer by nutrunflickr, on Flickr"><img src="http://farm4.static.flickr.com/3054/3278914298_f039f380ff_o.png" width="422" height="149" alt="caching-proxy-fronted-web-consumer" /></a></p>
<p>Benefits of this approach include: the consumer never has to deal with any caching logic; no effort is required in re-implementing cache handling code; the caching engine is likely to perform better than custom caching code in the consumer because it&#8217;s been built and optimized for this purpose; the caching proxy can be re-used by more than one type of consumer or more than one instance of the same consumer in the stack. As a possible side-effect, the caching proxy is an additional layer in the consumer stack and this can add network latency on the consumer&#8217;s LAN.</p>
<p>Here&#8217;s the configuration needed in order to use <a href="http://varnish.projects.linpro.no/" title="Varnish - Trac">Varnish</a> as a caching web consumer proxy for the above example.</p>
<pre>
<strong># varnish.conf</strong>

backend default {
  .host = "www.example.com";
  .port = "http";
}
</pre>
<p>The only thing that changes in the consumer is the address it directs its requests to.</p>
<pre>
WEB_SERVICE_ADDRESS = <strong>'http://service-proxy'</strong>

url = URI.parse(WEB_SERVICE_ADDRESS)

Net::HTTP.start(url.host, url.port) do |http|
  http.get('/product-search?q=guitar')
end
</pre>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/caching-proxy-fronted-web-consumer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/caching-proxy-fronted-web-consumer/</feedburner:origLink></item>
		<item>
		<title>Distributed key-value store indexing</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/k_Lx-PfL0PM/</link>
		<comments>http://nutrun.com/weblog/distributed-key-value-store-indexing/#comments</comments>
		<pubDate>Sun, 01 Feb 2009 13:57:36 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=233</guid>
		<description><![CDATA[Distributed key-value stores present an interesting alternative to some of the functionality relational databases are commonly employed for. Advantages include improved performance, easy replication, horizontal scaling and redundancy.
By nature, key value stores offer one way of retrieving data, by some sort of primary key which uniquely identifies each entry. But what about queries that require [...]]]></description>
			<content:encoded><![CDATA[<p>Distributed key-value stores present an interesting alternative to some of the functionality relational databases are commonly employed for. Advantages include improved performance, easy replication, horizontal scaling and redundancy.</p>
<p>By nature, key-value stores offer one way of retrieving data, by some sort of primary key which uniquely identifies each entry. But what about queries that require more elaborate input in order to collect relevant entries? Full text search engines like <a href="http://www.sphinxsearch.com/" title="Sphinx - Free open-source SQL full-text search engine" rel="nofollow">Sphinx</a> and <a href="http://lucene.apache.org/java/docs/" title="Apache Lucene - Overview" rel="nofollow">Lucene</a> do exactly this and when used in conjunction with a database will query their indexes and return a collection of ids which are then used to retrieve the results from the database. Full text search engines support indexing data sources other than RDBMSs, so there&#8217;s no reason why one couldn&#8217;t index a distributed key-value store.</p>
<p><a href="http://www.flickr.com/photos/nutrun/3244315588/" title="distributed-key-value-store-index by nutrunflickr, on Flickr"><img src="http://farm4.static.flickr.com/3515/3244315588_b9e2f08356_o.png" width="413" height="390" alt="distributed-key-value-store-index" /></a></p>
<p>Here, we&#8217;ll look at how we can integrate Sphinx with <a href="http://memcachedb.org/" title="MemcacheDB: A distributed key-value storage system designed for persistent">MemcacheDB</a>, a distributed key-value store which conforms to the <a href="http://www.danga.com/memcached/" rel="nofollow" title="memcached: a distributed memory object caching system">memcached</a> protocol and uses Berkeley DB as its storage back-end.</p>
<p>Sphinx comes with an <a href="http://www.sphinxsearch.com/docs/current.html#xmlpipe2" title="Sphinx - Free open-source SQL full-text search engine" rel="nofollow">xmlpipe2 datasource</a>, a generic XML interface aimed at simplifying custom integration. What this means is that our application can transform content from MemcacheDB into this format and feed it to Sphinx for indexing. The highlighted lines from the following Sphinx configuration instruct Sphinx to use the <code>xmlpipe2</code> source type and invoke the <code>ruby /app/lib/sphinxpipe.rb</code> script in order to retrieve the data to index.</p>
<pre>
<strong># sphinx.conf</strong>

source products_src
{
  <strong>type = xmlpipe2</strong>
  <strong>xmlpipe_command = ruby /app/lib/sphinxpipe.rb</strong>
}

index products
{
  source = products_src
  path = /app/sphinx/data/products
  docinfo = extern
  mlock = 0
  morphology = stem_en
  min_word_len = 1
  charset_type = utf-8
  enable_star = 1
  html_strip = 0
}

indexer
{
  mem_limit = 256M
}

searchd
{
  port = 3312
  log = /app/sphinx/log/searchd.log
  query_log = /app/sphinx/log/query.log
  read_timeout = 5
  max_children = 30
  pid_file = /app/sphinx/searchd.pid
  max_matches = 10000
  seamless_rotate = 1
  preopen_indexes = 0
  unlink_old = 1
}
</pre>
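<p>For reference, an <code>xmlpipe2</code> stream is a <code>sphinx:docset</code> element holding a <code>sphinx:schema</code> that declares the fields, followed by one <code>sphinx:document</code> per entry. The sketch below builds a minimal stream with REXML, which ships with Ruby, purely to keep the example self-contained. The single <code>title</code> field and the sample document are assumptions made for illustration.</p>
<pre>
require 'rexml/document'

# Build a minimal xmlpipe2 stream: one full text field and one document.
# The field name and the document values are made up for illustration.
docset = REXML::Element.new('sphinx:docset')

# The schema declares which child elements Sphinx should index as fields.
schema = docset.add_element('sphinx:schema')
field = schema.add_element('sphinx:field')
field.add_attribute('name', 'title')

# Each document carries a unique id and one element per declared field.
document = docset.add_element('sphinx:document')
document.add_attribute('id', '1')
document.add_element('title').text = 'product 1'

xml = docset.to_s
puts xml
</pre>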
<p>Following is a Product class. Each product instance can present itself as <code>xmlpipe2</code> data, and the class itself can render the entire product catalog as an <code>xmlpipe2</code> data source. It also has a <code>search</code> method used for querying Sphinx and retrieving matched products from MemcacheDB. Finally, there&#8217;s a <code>bootstrap</code> method for populating the store with some example data.</p>
<pre>
<strong># product.rb</strong>

require "rubygems"
require "xml/libxml"
require "memcached"
require "riddle"

class Product
  attr_reader :id
  MEM = Memcached.new('localhost:21201')

  def initialize(id, title)
    @id, @title = id, title
  end

  def to_sphinx_doc
    sphinx_document = XML::Node.new('sphinx:document')
    sphinx_document['id'] = @id
    sphinx_document &lt;&lt; title = XML::Node.new('title')
    title &lt;&lt; @title
    sphinx_document
  end

  <strong># Query sphinx and load products with matched ids from MemcacheDB</strong>
  def self.search(query)
    client = Riddle::Client.new
    client.match_mode = :any
    client.max_matches = 10_000
    results = client.query(query, 'products')
    ids = results[:matches].map {|m| m[:doc].to_s}
    MEM.get(ids) if ids.any?
  end

  <strong># Load all products from MemcacheDB and convert them to xmlpipe2 data</strong>
  def self.sphinx_datasource
    docset = XML::Document.new.root = XML::Node.new("sphinx:docset")
    docset &lt;&lt; sphinx_schema = XML::Node.new("sphinx:schema")
    sphinx_schema &lt;&lt; sphinx_field = XML::Node.new('sphinx:field')
    sphinx_field['name'] = 'title'

    keys = MEM.get('product_keys')
    products = MEM.get(keys)
    products.each { |id, product| docset &lt;&lt; product.to_sphinx_doc }

    %(&lt;?xml version="1.0" encoding="utf-8"?&gt;\n#{docset})
  end

  <strong># Create some products and store them in MemcacheDB</strong>
  def self.bootstrap
    product_ids = ('1'..'5').to_a.inject([]) do |ids, id|
      product = Product.new(id, "product #{id}")
      MEM.set(product.id, product)
      ids &lt;&lt; id
    end
    MEM.set('product_keys', product_ids)
  end
end
</pre>
<p>The <code>sphinxpipe.rb</code> script looks like this.</p>
<pre>
<strong># sphinxpipe.rb</strong>
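# Load the Product class (assumes product.rb sits next to this script)
require File.join(File.dirname(__FILE__), "product")

Product.bootstrap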
puts Product.sphinx_datasource
</pre>
<p>With MemcacheDB (or even memcached for the purpose of this example) running, we can tell Sphinx to create an index of products by invoking <code>indexer --all -c sphinx.conf</code> and then start the search daemon &#8211; <code>searchd -c sphinx.conf</code>. Now we&#8217;re ready to start querying the index and retrieving results from the distributed store.</p>
<pre>
puts Product.search('product 1').inspect
</pre>
<p>It is not uncommon for the database to become a performance hotspot. The integration of a fast, distributed key-value store with an efficient search engine can be an interesting substitute for the database in high throughput data retrieval operations.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/distributed-key-value-store-indexing/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/distributed-key-value-store-indexing/</feedburner:origLink></item>
		<item>
		<title>State separation</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/yelXLmV9BTU/</link>
		<comments>http://nutrun.com/weblog/state-separation/#comments</comments>
		<pubDate>Sun, 01 Feb 2009 03:06:14 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=230</guid>
		<description><![CDATA[It is usual for web applications to deal with serving content specific to a user&#8217;s session. This makes web caching harder to implement as we don&#8217;t want content that is meant to be viewed by a particular user being cached and accidentally offered to others. Some HTTP accelerators like Varnish choose by default to completely [...]]]></description>
			<content:encoded><![CDATA[<p>It is usual for web applications to deal with serving content specific to a user&#8217;s session. This makes web caching harder to implement as we don&#8217;t want content that is meant to be viewed by a particular user being cached and accidentally offered to others. Some HTTP accelerators like <a href="http://varnish.projects.linpro.no/" title="Varnish - Trac">Varnish</a> choose by default to completely ignore responses that contain cookies, leaving them uncached. However, not all content is tied to a user&#8217;s session, and if that content doesn&#8217;t change in real time, it makes sense to cache the parts that are common to all users in order to improve efficiency. With this in mind, one logical split could be made between parts of the system that are globally cache friendly and ones that aren&#8217;t.</p>
<p>Consider online retailer websites which usually operate in two modes, one for visitors and one for logged in users. Logged in users are presented with a customized, session specific experience, yet data like the product catalog is essentially the same regardless of whether one is logged in or not and it makes sense for everyone to be accessing the same cached copy of a common resource.</p>
<p>A possible solution involves creating two separate web applications, one entirely dedicated to stateless interactions and one meant for pages that are rendered as part of a user&#8217;s session. This might seem like overkill, but it clearly enforces the divide between what can and what can&#8217;t be cached. It also promotes reuse of the system&#8217;s web caching layer, which now serves content to site &#8220;visitors&#8221; as well as to the stateful components. The stateful application can delegate requests for potentially cached content to its stateless counterpart via the caching layer and decorate the responses with session specific data.</p>
<p><a href="http://www.flickr.com/photos/nutrun/3242283753/" title="split_by_state by nutrunflickr, on Flickr"><img src="http://farm4.static.flickr.com/3128/3242283753_9731c494c0_o.png" width="378" height="271" alt="split_by_state" /></a></p>
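<p>The arrangement in the diagram above can be sketched in plain Ruby. This is a minimal illustration, not a production design: the <code>StatelessApp</code>, <code>CachingLayer</code> and <code>StatefulApp</code> classes are hypothetical stand-ins, the cache never expires anything, and real deployments would put an HTTP cache between two separate web applications.</p>
<pre>
# Hypothetical stand-ins for the three parts in the diagram.

# Renders pages that are identical for every visitor.
class StatelessApp
  attr_reader :renders

  def initialize
    @renders = 0
  end

  def call(path)
    @renders += 1
    "catalog page for #{path}"
  end
end

# Remembers responses by path, like a very naive web cache with no expiry.
class CachingLayer
  def initialize(backend)
    @backend = backend
    @responses = {}
  end

  def call(path)
    @responses[path] ||= @backend.call(path)
  end
end

# Fetches shared content through the cache and decorates it per session.
class StatefulApp
  def initialize(cache)
    @cache = cache
  end

  def call(path, user)
    "#{@cache.call(path)} (signed in as #{user})"
  end
end

stateless = StatelessApp.new
cache = CachingLayer.new(stateless)
stateful = StatefulApp.new(cache)

puts cache.call('/products')             # anonymous visitor
puts stateful.call('/products', 'alice') # logged in user
puts stateful.call('/products', 'bob')   # another logged in user
puts stateless.renders                   # the shared page was rendered only once
</pre>
<p>Visitors and the stateful application share the same cached copy, so the stateless application does the rendering work once regardless of who asks.</p>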
<p>Web caching presents but one way to cache data that remains static for predefined periods of time. Apart from harnessing proven existing tools, this form of caching comes with the advantage that its policies are universally understood and can significantly improve a website&#8217;s efficiency in ways beyond the maintainer&#8217;s control. Retrofitting web caching into an application that hasn&#8217;t been designed with it in mind can be difficult, so it is worth logically separating cacheable and non-cacheable resources early on.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/state-separation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/state-separation/</feedburner:origLink></item>
		<item>
		<title>Live component rotation</title>
		<link>http://feedproxy.google.com/~r/nutrun/feed/~3/3Fc-sXBg4Wo/</link>
		<comments>http://nutrun.com/weblog/live-component-rotation/#comments</comments>
		<pubDate>Thu, 22 Jan 2009 01:42:22 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://nutrun.com/?p=226</guid>
		<description><![CDATA[Many applications consist of a number of components, the majority of which are shared by others in the system. Different parts of the system exercise their collaborators in a variety of ways: think of a website where data is periodically processed by jobs and stored in a database while presentation modules handle rendering the data [...]]]></description>
			<content:encoded><![CDATA[<p>Many applications consist of a number of components, the majority of which are shared by others in the system. Different parts of the system exercise their collaborators in a variety of ways: think of a website where data is periodically processed by jobs and stored in a database while presentation modules handle rendering the data in ways meaningful to end users. Shared resources can yield the unwanted side effect of performance degradation when a given component is being pushed too hard to perform part of its tasks, affecting each piece of the system that depends on it. In the shared database website example, the website might suffer slow response times while processing jobs that are heavy on the database are running.</p>
<p>One way of getting around this problem involves creating more than one instance of the shared resource. One instance is considered &#8220;live&#8221;: it is the one the system&#8217;s clients interact with. Expensive operations are performed on a copy, which itself becomes live the moment these operations conclude. This solution does not apply to every situation but can be useful in scenarios where real time is not a concern. In the example website&#8217;s case, we can create a copy of the database on which we run the processing jobs. The front end components run off the &#8220;stale&#8221;, live database copy whose performance is not affected by the jobs. Once the jobs complete we can switch databases and repeat the live component rotation process as needed. Live component rotation also lends itself nicely to distribution, as component copies can exist on different physical hosts.</p>
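<p>The rotation itself can be sketched in a few lines of Ruby. This is a minimal illustration under loud assumptions: the <code>RotatingResource</code> class is hypothetical, a <code>Hash</code> plays the role of the shared database, and copying is done in memory rather than by duplicating a real database.</p>
<pre>
# Hypothetical wrapper that tracks which copy of a shared resource is live.
class RotatingResource
  attr_reader :live

  def initialize(initial)
    @live = initial
  end

  # Copy the live resource, run the expensive work on the copy, then
  # make the copy live. Readers keep seeing the old data until the swap.
  def rotate
    standby = Marshal.load(Marshal.dump(@live)) # deep copy of the live data
    yield standby
    @live = standby
  end
end

# A Hash plays the role of the shared database in this example.
database = RotatingResource.new({ visits: 10 })

database.rotate do |copy|
  copy[:visits] += 5           # the heavy processing job works on the copy
  puts database.live[:visits]  # clients still read the old value during the job
end

puts database.live[:visits]    # the processed copy is now live
</pre>
<p>Clients only ever hold a reference to whatever is currently live, so the swap is the single point at which they notice the processing jobs at all.</p>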
<p>Virtualization and cloud computing make this method all the more interesting. Imagine hosting a database server on Amazon EC2 with its static data stored on an EBS volume. We can snapshot the EBS volume, fire up a new EC2 instance, create a volume from the snapshot and attach it to the new instance, run the jobs there and rotate live database instances once the jobs are complete, with most parts of the system never having to worry about the costly operations taking place.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/live-component-rotation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://nutrun.com/weblog/live-component-rotation/</feedburner:origLink></item>
	</channel>
</rss>
