<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom">
  <title>Robey</title>
  
  <link href="http://robey.lag.net/" />
  <updated>2009-11-07T10:47:15-08:00</updated>
  <id>http://robey.lag.net/</id>
  <author>
    <name>Robey Pointer</name>
    <email>robeypointer@gmail.com</email>
  </author>
  
    <link rel="self" href="http://feeds.feedburner.com/robey" type="application/atom+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><entry>
      <title>Fanout queues in kestrel 1.2</title>
      <link href="http://robey.lag.net/2009/10/10/fanout-queues.html" />
      <updated>2009-10-10T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2009/10/10/fanout-queues</id>
      <content type="html">&lt;p&gt;I bumped kestrel to version 1.2 this week and there are a few new features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Queue files are loaded at startup time now, before opening the listening
port.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can explicitly abort an outstanding transcational &lt;code&gt;GET&lt;/code&gt; by passing the
&lt;code&gt;/abort&lt;/code&gt; option (in the same way as &lt;code&gt;/open&lt;/code&gt; or &lt;code&gt;/close&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can also peek at the current head item on a queue with &lt;code&gt;/peek&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New options &lt;code&gt;max_item_size&lt;/code&gt; and &lt;code&gt;sync_journal&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Queues can be deleted with &lt;code&gt;DELETE&lt;/code&gt;, which just forgets every item and
erases any journal file (ie: fast &amp;amp; unrecoverable).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some bugs in stats reporting were fixed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hopefully the documentation is much better. There&amp;rsquo;s a
&lt;a href="http://github.com/robey/kestrel/blob/master/docs/guide.md"&gt;guide&lt;/a&gt; now.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Thanks to Clement, Zachary Kim, Blair Zajac, and Frederic Jean for submitting
patches!&lt;/p&gt;

&lt;p&gt;Special thanks to Justin Azoff, who&amp;#8217;s responsible for the coolest new feature:
fanout queues. My coworkers had been asking about for this feature for a
while, and he posted a branch on github which added them in a way that makes
sense to me, so I pulled it in with minor changes. Here&amp;rsquo;s how they work:&lt;/p&gt;

&lt;p&gt;A fanout queue is a &amp;ldquo;parent&amp;rdquo; queue that has &amp;ldquo;children&amp;rdquo; queues. Whenever an
item is added to the parent, the same item is added automatically to each
child. The children each have their own journal files, and are essentially
independent queues. An item removed from one child doesn&amp;rsquo;t affect any other
children. In fact, you can even add items directly to a child if you want,
bypassing the fanout. The primary distinction is that &lt;code&gt;SET&lt;/code&gt;s to the parent
are repeated to each child automatically.&lt;/p&gt;

&lt;p&gt;A child queue is created automatically by referring to a queue with a &lt;code&gt;+&lt;/code&gt; in
its name. For example, a queue named &amp;ldquo;orders+audit&amp;rdquo; is considered to be a
child of the &amp;ldquo;orders&amp;rdquo; queue. Once a child is created, it starts receiving new
items immediately, but existing items in the parent are not copied over. (The
child&amp;rsquo;s history begins now.) Fanout stops when a child is &lt;code&gt;DELETE&lt;/code&gt;d.&lt;/p&gt;

&lt;p&gt;Children use the same configuration as their parent, to make things easier.
There&amp;rsquo;s no specific configuration for a child. Also, you can expect fanout
queues to use more disk space and memory, since items are being copied around
muliple times.&lt;/p&gt;

&lt;p&gt;Hopefully the simplicity of the implementation makes it easy to reason about,
which I think is more important than having a lot of clever tricks or magic.&lt;/p&gt;

&lt;p&gt;Kestrel is, as always, hosted here: &lt;strong&gt;&lt;a href="http://github.com/robey/kestrel"&gt;http://github.com/robey/kestrel&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Follow-up on Scala on Android</title>
      <link href="http://robey.lag.net/2009/07/19/followup-android.html" />
      <updated>2009-07-19T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2009/07/19/followup-android</id>
      <content type="html">&lt;p&gt;Dan Bornstein had a correction about my guesswork on how dx works, and I got permission to post it
here. I said:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;The end result is a little wasteful: even a tiny &amp;ldquo;hello world&amp;rdquo; app is 800K, probably because of
that big ol' scala library. I suspect dex is proactively throwing away classes that I didn&amp;rsquo;t
use, or it would be even bigger.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Dan replies:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;Actually, dx doesn&amp;rsquo;t do any sort of tree-shaking. It&amp;rsquo;s just that I
obsessed over making the dex format compact, and I got a bit of help
on that front from mikef who designed a super-tight encoding for debug
info (line numbers, local variables, etc.).&lt;/p&gt;

&lt;p&gt;An &lt;em&gt;uncompressed&lt;/em&gt; dex file is usually a little smaller (5% or so) than
a corresponding (compressed) jar file, and for distribution a
&lt;em&gt;compressed&lt;/em&gt; dex file is generally about 40-45% the size of &lt;em&gt;that&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;For comparison on the distribution front, a compressed pack200 is
still usually a bit better than a compressed dex file, weighing in at
30-35% of the original compressed jar size, a win which it gets by
translating class files into a compressability-optimized intermediate
format (doing stuff like collating immediate integer constants out of
bytecode streams). The distinction here is that the dex format loses a
bit of compressability, trading that for suitability to be directly
executed.&lt;/p&gt;

&lt;p&gt;We decided that the overhead (both in terms of code and runtime) of
doing something like pack200 wasn&amp;rsquo;t worth it for us, at least not yet,
because most apps don&amp;rsquo;t actually have that much code (much more likely
to be heavy with media), though if people start regularly including
language runtimes with their apps, we may want to revise.&lt;/p&gt;

&lt;p&gt;Oh, a much more minor tidbit: You can pass dx options for the
underlying JVM by prefixing them with &amp;ldquo;-J&amp;rdquo;, e.g. &amp;ldquo;-JXmx2000m&amp;rdquo; or
what-have-you.&lt;/p&gt;&lt;/blockquote&gt;
</content>
    </entry>
  
    <entry>
      <title>Actors talk for SDForum</title>
      <link href="http://robey.lag.net/2009/06/25/actors-talk.html" />
      <updated>2009-06-25T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2009/06/25/actors-talk</id>
      <content type="html">&lt;p&gt;Last night I got to give a
&lt;a href="http://sdforum.org/index.cfm?fuseaction=Calendar.eventDetail&amp;amp;eventID=13413"&gt;short talk at SDForum&lt;/a&gt;
on &amp;ldquo;solving real problems with actors&amp;rdquo;, or more
specifically, why I got interested in using scala actors, how they&amp;rsquo;re used in naggati and kestrel,
and what I learned about where they&amp;rsquo;re useful and not. It was fun, even though I was a little
nervous &amp;mdash; they had me speaking after Carl Hewitt, one of the instigators of the actor model,
and Frank Sommers, who&amp;rsquo;s writing a book about actors in scala.&lt;/p&gt;

&lt;p&gt;Here are the slides:
&lt;a href="http://robey.lag.net/docs/robey-actors-sdforum.pdf"&gt;http://robey.lag.net/docs/robey-actors-sdforum.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks to everyone who came, and everyone who chatted afterwards! It was worth the drive. :)&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Performance bug in kestrel 1.1</title>
      <link href="http://robey.lag.net/2009/05/18/kestrel-bug.html" />
      <updated>2009-05-18T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2009/05/18/kestrel-bug</id>
      <content type="html">&lt;p&gt;If you&amp;rsquo;re using (or have tried to use) kestrel 1.1, you should upgrade to 1.1.1
(&lt;a href="http://github.com/robey/kestrel/tree/master"&gt;in github&lt;/a&gt;) for a performance fix.&lt;/p&gt;

&lt;p&gt;Kestrel uses &lt;a href="http://github.com/robey/naggati/tree/master"&gt;naggati&lt;/a&gt;, an actor-adapter for
&lt;a href="http://mina.apache.org/"&gt;mina&lt;/a&gt;, to handle I/O and session management. Mina is a java library
for handling thousands of open connections across NIO, and naggati converts mina events from
java-style &amp;ldquo;listener method calls&amp;rdquo; to actor messages. I wrote all about mina &amp;amp; naggati in
&lt;a href="http://robey.lag.net/2009/03/02/actors-mina-and-naggati.html"&gt;another article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One feature added to naggati 0.6 was the ability to filter out mina events you didn&amp;rsquo;t care about. So
if, for example, you didn&amp;rsquo;t care to receive an event message every time mina finished transmitting a
block of data, you could filter out those messages:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;IoHandlerActorAdapter.filter(session) -= classOf[MinaMessage.MessageSent]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Potentially this saves you a lot of unwanted actor messages that you'd just ignore anyway, causing
fewer context switches. If the feature works correctly&amp;hellip; aye, there&amp;rsquo;s the rub.&lt;/p&gt;

&lt;p&gt;Naggati 0.6 would accidentally overwrite any filter changes that happened during an actor&amp;rsquo;s
constructor. Kestrel&amp;rsquo;s session actors changed their filters in the constructor, and so their filter
changes were lost. This caused kestrel actors to receive reams of unwanted messages like
&lt;code&gt;MessageSent&lt;/code&gt;, which by itself would only be a minor annoyance, and would have the same performance
behavior as kestrel 1.0.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But&lt;/em&gt;, in actors, if you don&amp;rsquo;t pattern match on a message type, and you receive a message of that
type, the message just sits in your queue forever. Long-lived connections would eventually have
thousands of these unwanted &lt;code&gt;MessageSent&lt;/code&gt; messages, which had to be &lt;em&gt;linearly&lt;/em&gt; traversed to get to
interesting messages. Scanning this queue on every I/O event eventually slowed down a long-lived
session to a crawl &amp;mdash; and it consumed memory in the process.&lt;/p&gt;

&lt;p&gt;This bug is now fixed in naggati 0.7 and kestrel 1.1.1. Kudos to John Kalucki, who found the bug
during his own stress tests and persevered when I couldn&amp;rsquo;t reproduce it myself.&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Sidekick LX party</title>
      <link href="http://robey.lag.net/2009/05/15/sidekick-lx.html" />
      <updated>2009-05-15T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2009/05/15/sidekick-lx</id>
      <content type="html">&lt;p&gt;When Nick, Evan and I stumbled out of &amp;ldquo;Star Trek&amp;rdquo; at the Metreon last night, it was 1am. Nick
announced that there was some kind of party at the Mezzanine, and Jack was there, but it seemed
pretty late, and my iPhone said there was a final bart train in 10 minutes, so I decided to clock
out and go home. Nick and Evan went on to the club.&lt;/p&gt;

&lt;p&gt;All 4 bart entrances on Market around Powell were gated up, though. (How exactly is one supposed to
take the last bart train?) So I texted back that I was coming anyway and went to the Mezzanine in
Mint Plaza. The place was set up like an LA bar in decline, with bouncers and a bunch of signs
celebrating the &amp;ldquo;Sidekick LX&amp;rdquo;, which is apparently the name of the final Sidekick. I wasn&amp;rsquo;t sure I
was at the right place and had to text Nick a bit before I decided to go in.&lt;/p&gt;

&lt;p&gt;It was packed, even at 1am. There was no cover and it was an open bar filled with all kinds of
kitchy decorations to celebrate the launch of Danger&amp;rsquo;s final phone in some sort of fake
Metropolis-slash-amalgamation of all American cities (plus an English phone booth). One of the first
things I witnessed was a set of fake &amp;ldquo;telephone poles&amp;rdquo; fall on top of partiers, light bulbs tinkling
under our feet. I immediately took this as a metaphor for Danger.&lt;/p&gt;

&lt;p&gt;There were no other Danger people there, and a very buxom woman wearing a Sidekick LX shirt wasn&amp;rsquo;t
even sure what Danger might be. But never mind, the drinks were free, and DJ Jazzy Jeff was on
stage. Another guy was &amp;ldquo;MC"ing, which appeared to mean that for several hours he had to walk around
on stage shouting "What! What! DJ Jazzy Jeff in the house! Say what! Yeaaaaah!&amp;rdquo;&lt;/p&gt;

&lt;p&gt;Everyone in the audience was taking pictures of each other and texting &amp;mdash; using blackberries and
iPhones. The only Sidekick I saw anywhere was in the hands of a drunk T-Mobile rep. But Twitter was
represented! A looping movie behind the &amp;ldquo;MC&amp;rdquo; showed a bird flying around, and a Sidekick that opened
up to reveal 3 apps: Facebook, Myspace (?!), and Twitter. Neet! Jack teased me a few times about the
work I'd put into making a push-notification service for the Sidekick, which they were too
disorganized to make use of for six months until we finally just threw the code away. I think the
current Sidekick Twitter app just uses the same polling API as everyone else. Not a metaphor: That&amp;rsquo;s
a good example of how poor Danger&amp;rsquo;s follow-through was.&lt;/p&gt;

&lt;p&gt;This all could very well have been a dream or nightmare, or both combined. Certainly an event that
combines Twitter, Danger, DJ Jazzy Jeff, and free drinks sounds highly improbable. But I must report
it as I remember it, because that is the only way I know how.&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Does anyone actually use JMX?</title>
      <link href="http://robey.lag.net/2009/03/29/anyone-use-jmx.html" />
      <updated>2009-03-29T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2009/03/29/anyone-use-jmx</id>
      <content type="html">&lt;p&gt;In java 5 update 6, or 1.5.0_06, or whatever goofy release naming scheme
they&amp;rsquo;re using now, there&amp;rsquo;s a cool replacement for jconsole called &amp;ldquo;visualvm&amp;rdquo;.
(On my mac, it didn&amp;rsquo;t come with the installed java, but it can easily be
&lt;a href="https://visualvm.dev.java.net/"&gt;downloaded&lt;/a&gt;.) It cleans up jconsole a
little bit, and merges in some of the other tools. You can monitor, take heap
snapshots, and even do an extremely skeletal form of profiling.&lt;/p&gt;

&lt;p&gt;We were tinkering with hooking up some of our servers' stats to monitoring
tools by exposing them as &amp;ldquo;mbeans&amp;rdquo; across JMX (Java Management Xtreme), and
someone suggested that it would be nice if the server&amp;rsquo;s configuration could be
viewed and edited through the same console interface. So during some of the
more lucid moments of my two-week-long cold, I tried hooking up a JMX
interface to &lt;a href="http://github.com/robey/configgy/"&gt;configgy&lt;/a&gt; to do just that.&lt;/p&gt;

&lt;p&gt;It hasn&amp;rsquo;t gone extremely well.&lt;/p&gt;

&lt;p&gt;Java &amp;mdash; and JMX in particular &amp;mdash; has reams of documentation, but most of it
assumes you are a drooling infant or recent PHP convert. Quite a lot of it
assumes you want to write your own replacement for jconsole or visualvm. Very
little of it is written for someone who may just want to hook up management
tools to an existing app or server. This seems strange when java is used so
heavily for servers.&lt;/p&gt;

&lt;p&gt;I finally figured out that the best interface for a tree of config nodes was
to create a tree of &lt;code&gt;DynamicMBean&lt;/code&gt; objects which report their attributes as
being the attributes on the config node. This works pretty well. The interface
through visualvm won&amp;rsquo;t win any awards (and would make a UX person weep openly)
but functions basically the way you expect and lets you discover and edit
things in a straightforward way.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/configgy-jmx.png" alt="not bad, for java" /&gt;&lt;/p&gt;

&lt;p&gt;The fatal flaw in the UI is that if you add a new config item, visualvm
doesn&amp;rsquo;t update the display tab to show the new attribute. (Jconsole behaves
similarly.) At some level, it does appear to know to re-fetch the &amp;ldquo;mbeans&amp;rdquo; for
the config tree, because I can see it doing that in the debugger. But it
doesn&amp;rsquo;t use this information to update the display.&lt;/p&gt;

&lt;p&gt;Looks like I have 2 options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Register &amp;ldquo;mbeans&amp;rdquo; as the config nodes, and just live with the fact that
they won&amp;rsquo;t be updated for new attributes. If you add a new attribute,
you&amp;rsquo;ll have to disconnect visualvm and reconnect it to see everything.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Any time any config attribute changes, unregister and re-register the
&amp;ldquo;mbeans&amp;rdquo; for config nodes. This makes visualvm destroy all open tabs to
config nodes and rebuild the tree of config nodes in the left panel. You
have to open the nodes back up to see the changes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;For now, I'm leaning toward #1 but both options make me sad. The punchline is
this article I found while searching around on google, since I was convinced I
must be doing it wrong:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://forums.sun.com/thread.jspa?threadID=5218394"&gt;http://forums.sun.com/thread.jspa?threadID=5218394&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The thread summary seems to be: &amp;ldquo;We have documented a way for dynamic &amp;lsquo;mbeans&amp;rsquo;
to indicate that their attributes may change. We did not code our own tools to
follow our own documentation so no existing tool uses this hint. Has our
incompetence convinced you that you really want to do what you want to do, or
are you eyeing that python book with a bit more interest now?&amp;rdquo;&lt;/p&gt;

&lt;p&gt;Is there anyone out there that actually uses JMX for real?&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Actors, Mina, and Naggati</title>
      <link href="http://robey.lag.net/2009/03/02/actors-mina-and-naggati.html" />
      <updated>2009-03-02T00:00:00-08:00</updated>
      <id>http://robey.lag.net/2009/03/02/actors-mina-and-naggati</id>
      <content type="html">&lt;p&gt;A few weeks ago, Jorge Ortiz gave a good talk at BASE about actors. I want to
summarize a bit of that and explain why I wrote naggati, and why I think
mina-plus-actors is a big deal.&lt;/p&gt;

&lt;h2&gt;The history&lt;/h2&gt;

&lt;p&gt;In the 90s, we server coders wrote daemons using the fork/exec process model.
Apache still does it this way, though you can switch to threaded models in the
current version.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/naggati1.png" alt="fork/exec model" /&gt;&lt;/p&gt;

&lt;p&gt;Having a bunch of processes makes coordination difficult (you have to use
shared memory, pipes, or something similar) but works well for a web server
where there isn&amp;rsquo;t really any shared state except configuration, and you can
just respawn new processes if the configuration changes.&lt;/p&gt;

&lt;p&gt;For a chat server, you&amp;rsquo;d like a lot of shared state and communication between
the various sessions. So in the late 90s (early 2000s on Linux), we started
using threads. Each incoming connection spawns a new thread.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/naggati2.png" alt="each session gets its own thread" /&gt;&lt;/p&gt;

&lt;p&gt;Each session is exactly one thread. Because threads all share the same memory
space, communication between sessions is easy. You just have to make sure to
use locks and condition variables, and to get the locking logic right &amp;mdash; which
can be tricky.&lt;/p&gt;

&lt;p&gt;When you start to have a lot of sessions, the number of active threads begins
to get a little unmanagable. At a past job, we had a daemon that would handle
between 3000-4000 sessions this way. We couldn&amp;rsquo;t push it past that, because
the OS wasn&amp;rsquo;t able to allocate stack space for more than about 4000 threads,
and the task switching overhead was getting pretty ridiculous anyway.&lt;/p&gt;

&lt;p&gt;Most sessions are blocked waiting for new data most of the time, so in the
mid-2000s, server coders migrated to a thread-pool model.&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/naggati3.png" alt="thread pool" /&gt;&lt;/p&gt;

&lt;p&gt;Java even included a good thread pool library in java 5. In this model, you
can have thousands of open sessions or socket connections, but you farm out
work across a smaller pool of threads. Usually you have one or two threads in
a socket &lt;code&gt;poll()&lt;/code&gt; call (or &lt;code&gt;Selector&lt;/code&gt; in java) across all the open sockets,
and when incoming data arrives, it posts a work item into a queue, which gets
handled by the next available thread from the pool.&lt;/p&gt;

&lt;h2&gt;Here&amp;rsquo;s where mina comes in&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://mina.apache.org/"&gt;Mina&lt;/a&gt; takes this thread-pool pattern and pulls it
out into a reusable library. You can register a socket with mina and get an
&lt;code&gt;IoSession&lt;/code&gt; object back. There&amp;rsquo;s a &lt;code&gt;Selector&lt;/code&gt; running in its own thread, and
when an I/O event happens on one of your registered sockets, mina calls an
event method on your &lt;code&gt;IoHandler&lt;/code&gt;, passing in the &lt;code&gt;IoSession&lt;/code&gt;. These methods
represent events like &amp;ldquo;new data has arrived&amp;rdquo; or &amp;ldquo;some data you sent has been
successfully transmitted&amp;rdquo;, and they are called from within a thread pool, or
other executor of your choosing.&lt;/p&gt;

&lt;p&gt;The great thing mina adds to this mix is the idea of a &amp;ldquo;protocol decoder&amp;rdquo;.
Instead of handling I/O events in your session and doing stateful buffering
and protocol decoding there, mina lets you write a &lt;code&gt;ProtocolFilter&lt;/code&gt; that
receives incoming data as a byte buffer. This filter can do the buffering and
decoding, and pass decoded objects on to the session. For example, a web
server could have a decoder that takes byte buffers and generates
&lt;code&gt;HttpRequest&lt;/code&gt;s.&lt;/p&gt;

&lt;p&gt;The protocol decoding happens in a special set of I/O processor threads
maintained by the mina library, so your &lt;code&gt;IoHandler&lt;/code&gt; only gets events for fully
decoded objects, and the handler code can be small and simple. In the web
server example, an &lt;code&gt;IoHandler&lt;/code&gt; would receive &lt;code&gt;HttpRequest&lt;/code&gt; objects as events
that run in a worker thread from a thread pool. The repsonse can be created
and queued for writing, and then the event is over. Since the queued work
items are small, discrete tasks that finish quickly, many active sessions can
be handled across a much smaller number of threads.&lt;/p&gt;

&lt;p&gt;There are only two big speedbumps with this system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;You&amp;rsquo;re using threads, so you need to be very careful about locking and all
the other threading issues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The protocol decoder is isolated into its own filter class, but it still
has to be stateful and buffered. So it will still be ugly and gnarly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;Enter actors&amp;hellip;&lt;/h2&gt;

&lt;p&gt;Actors solve problem 1. There are
&lt;a href="http://www.stanford.edu/class/cs94si/Concurrency.pdf"&gt;several&lt;/a&gt;
&lt;a href="http://www.scala-lang.org/node/242"&gt;good&lt;/a&gt;
&lt;a href="http://java.dzone.com/articles/scala-threadless-concurrent"&gt;articles&lt;/a&gt; about
actors out there, so I won&amp;rsquo;t belabor the point. With actors, you can write
code in a way similar to writing threaded code, but all communication happens
through message passing. The actor library promises that any actor will only
be running in one thread at a time, so you can avoid using locks entirely if
you want. Shared state can be passed around as &amp;ldquo;messages&amp;rdquo; that are immutable
objects.&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a pretty clear one-to-one mapping between the events that mina would
like to notify you about, and messages passed to an actor. It&amp;rsquo;s
straightforward enough that I put it into naggati as a helper class called
&lt;a href="http://github.com/robey/naggati/blob/7e9236bd6c5e7646afc6def397b034c7b5ce3a85/src/main/scala/net/lag/naggati/IoHandlerActorAdapter.scala"&gt;IoHandlerActorAdapter&lt;/a&gt;.
It creates an &lt;code&gt;IoHandler&lt;/code&gt; that handles every event by sending a corresponding
message to the actor handling that session. (For example, the
&lt;code&gt;messageReceived&lt;/code&gt; method causes &lt;code&gt;MinaMessage.MessageReceived&lt;/code&gt; to be sent.)&lt;/p&gt;

&lt;p&gt;This line in kestrel is all that&amp;rsquo;s needed to tell mina to create a new actor
for each incoming connection:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;acceptor.setHandler(new IoHandlerActorAdapter(session =&amp;gt; new KestrelHandler(session, config)))
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;&amp;hellip;and naggati&lt;/h2&gt;

&lt;p&gt;Mina&amp;rsquo;s written in java, so they can&amp;rsquo;t do much about the way protocol filters
need to be written, beyond giving you some tools to store state between calls
and some base classes you can use to get implementations of some common bits
(like reading a line). But scala&amp;rsquo;s syntax supports attaching code blocks to
method calls as &lt;code&gt;Function&lt;/code&gt; objects, so we can write code that looks sequential
but is actually a series of continuations.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://github.com/robey/naggati/"&gt;Naggati&lt;/a&gt; is a library that makes it easy to
build protocol filters for mina 2.0 using this sequential style. A DSL allows
you to write the decoder in sequence, but each decoding step is actually a
state in a state machine, and when one step is finished decoding, it calls the
code block for the next step.&lt;/p&gt;

&lt;p&gt;As an example, let&amp;rsquo;s say there&amp;rsquo;s a simple protocol that sends packets. Each
packet starts with an (ascii) count of the number of bytes in the packet,
followed by a linefeed, and then the actual bytes. (This is how HTTP chunked
encoding works.) We could write the decoder like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;case class DataBlob(data: Array[Byte])

val decode = readLine { line =&amp;gt;
  readByteBuffer(line.toInt) { data =&amp;gt;
    state.out.write(DataBlob(data))
    End
  }
}
val decoder = new Decoder(decode)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first line just defines a case class for the decoded objects we&amp;rsquo;re going
to be throwing to an actor inside a &lt;code&gt;MinaMessage.MessageReceived&lt;/code&gt; event.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;decode&lt;/code&gt; step is where it gets interesting. It just calls &lt;code&gt;readLine&lt;/code&gt; with
a block of code that will process that line when it&amp;rsquo;s read. &lt;code&gt;readLine&lt;/code&gt; returns
a &lt;code&gt;Step&lt;/code&gt; object that contains that code block and some logic for reading bytes
from mina&amp;rsquo;s buffers until a linefeed is found. Once a complete line has been
read, &lt;code&gt;readLine&lt;/code&gt; passes it to the saved code block.&lt;/p&gt;

&lt;p&gt;In this example, that code block just contains another call (&lt;code&gt;readByteBuffer&lt;/code&gt;)
which returns a &lt;code&gt;Step&lt;/code&gt; that reads a certain number of bytes and then passes
them on to the attached code block. The nested code block takes that byte
array and sends it back to mina as a decoded object, to be handed off to a
session actor. &lt;code&gt;End&lt;/code&gt; is a special state object that means decoding is
complete, and the decoder should reset and start over at the beginning again.&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve written a tiny state machine here, which buffers data as it arrives and
progresses through each state as the previous state is completed.
&lt;em&gt;But we didn&amp;rsquo;t have to write it that way&lt;/em&gt;. By passing code blocks around, we
got to write a very simple, sequential description of the protocol without
worrying about callbacks or buffers.&lt;/p&gt;

&lt;p&gt;There are a couple of more in-depth examples in the naggati source code,
including an ASN.1 tag decoder, and a simple HTTP request decoder. They&amp;rsquo;re too
long to quote here, but they demonstrate things like using a value from one
decoded state to help decode a later state. Kestrel is another good example,
because it uses nagatti to decode memcache requests in a pretty small chunk of
code.&lt;/p&gt;

&lt;p&gt;With actors, mina, and naggati, I think the promise of a simple programming
model for servers with thousands of connections is here. You can spend your
time thinking about and coding up the actual logic of a server, and not worry
so much about how to juggle all those sockets and threads.&lt;/p&gt;

&lt;p&gt;Hopefully in this tiny space I&amp;rsquo;ve convinced you too.&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Paramiko is on github now</title>
      <link href="http://robey.lag.net/2009/02/16/paramiko-is-on-github-now.html" />
      <updated>2009-02-16T00:00:00-08:00</updated>
      <id>http://robey.lag.net/2009/02/16/paramiko-is-on-github-now</id>
      <content type="html">&lt;h2&gt;A story&lt;/h2&gt;

&lt;p&gt;A few years ago, at Danger, I wrote an SSH client for the Hiptop (or Sidekick)
phone, which we gave away to developers for free as a demonstration of our API
and the cool things you could do with it. Later, the developer program would
be effectively shut down, and the app would be renamed &amp;ldquo;Terminal Client&amp;rdquo; and
sold to hapless users for $10. But I didn&amp;rsquo;t know that yet.&lt;/p&gt;

&lt;p&gt;The protocol implementation was awful. SSH protocol &amp;ldquo;documentation&amp;rdquo; was just a
handful of poorly written RFC drafts that described things in seemingly random
order. I hacked on some java code until I got it mostly functioning and
left it at that. But it was so buggy and bare-bones that it often would only
connect to certain OpenSSH daemon versions, and this eventually bothered me
enough that I wanted to fix it.&lt;/p&gt;

&lt;p&gt;Python was my favorite language at the time. The syntax is so simple and clean
that I feel like I'm writing in pseudocode. It&amp;rsquo;s a very close match to &amp;ldquo;the
way I think&amp;rdquo; when I'm trying to solve a problem, so there&amp;rsquo;s a very low
impedance between what I think and what I type. If I'm trying to sketch out a
rough draft of code, I usually do it in python as a prototype. And since the
SSH protocol seemed like a difficult problem to me, I decided I should try to
write an SSH library in python, as a prototype for doing it in java.&lt;/p&gt;

&lt;p&gt;The basic idea was to use a thread to handle the protocol level of the socket,
so that SSH&amp;rsquo;s packet windowing requests would get immediate responses, and
supply an API to the client that looked just like a normal python file. Data
should arrive, get handled by the protocol thread, and stuffed into a buffer
so that a client &lt;code&gt;read&lt;/code&gt; could get it. The &lt;code&gt;read&lt;/code&gt; would trigger window feedback
to the server, but the client could go do other stuff while waiting for data.&lt;/p&gt;

&lt;p&gt;It worked beautifully. I called it &lt;a href="http://www.lag.net/paramiko"&gt;paramiko&lt;/a&gt; as
a dorky esperanto pun and released it, and after a while, to my moderate
surprise, people were using it. I didn&amp;rsquo;t really anticipate that it would be
useful for automating tasks for working with server machines and clusters,
though it&amp;rsquo;s kind of obvious in retrospect.&lt;/p&gt;

&lt;p&gt;Python&amp;rsquo;s thread model (the dreaded Global Interpreter Lock) isn&amp;rsquo;t awesome, but
when threads are mostly blocked on I/O, they&amp;rsquo;re good enough. The
&lt;a href="http://bazaar-vcs.org/bazaar"&gt;bazaar&lt;/a&gt; team obsessed about performance, and
helped find and solve some real bottlenecks, so that in the end, paramiko was
slower than OpenSSH, but not by much.&lt;/p&gt;

&lt;p&gt;Over time, the feature set exploded. Now it can be an SSH client or server, or
an SFTP client or server, and has a few experimental SFTP features and a few
made-up ones (just to see if they were useful). I wanted to try writing a
standalone SFTP server based on it, for making little file-server sandboxes,
but&amp;hellip; I only have so much free time. :)&lt;/p&gt;

&lt;p&gt;I did end up using what I learned to write &amp;ldquo;jaramiko&amp;rdquo;, a java library for SSH,
and used that in later releases of the Sidekick SSH client, but it was never
really as powerful or successful as paramiko. In retrospect, it&amp;rsquo;s obvious why:
Java is not even remotely as expressive or concise as python, so most of the
java code is boilerplate that obscures the real logic. It&amp;rsquo;s hard to keep
up-to-date and maintain, much less add new features to.&lt;/p&gt;

&lt;h2&gt;What now?&lt;/h2&gt;

&lt;p&gt;I haven&amp;rsquo;t made a new release of paramiko since July. It feels like even longer
than that. Really, to me, the library is finished. I'd like to fix any
remaining bugs in it, but I don&amp;rsquo;t get many bug reports any more, and most of
the mailing list support questions are answered by other users.&lt;/p&gt;

&lt;p&gt;I don&amp;rsquo;t want the project to die. I just have to admit to myself that I'm not
really taking a leading role in it anymore. So I've migrated the repository to
git and put it on github here:
&lt;a href="github.com/robey/paramiko"&gt;github.com/robey/paramiko&lt;/a&gt; Anyone can fork it
there and submit patches, and I&amp;rsquo;ll make regular releases as long as there are
still bugs to fix, or new features that look cool.&lt;/p&gt;

&lt;p&gt;Why git? I like bazaar better (see some previous rants on this blog) but the
rest of the world has chosen git, and I don&amp;rsquo;t have the energy to keep fighting
that fight. It&amp;rsquo;s also easier for me if I can use the same VCS everywhere.
Keeping the project on github, in front of my face, will make it less likely
to bitrot. At least, that&amp;rsquo;s my hope.&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Scala on Android</title>
      <link href="http://robey.lag.net/2009/01/19/scala-on-android.html" />
      <updated>2009-01-19T00:00:00-08:00</updated>
      <id>http://robey.lag.net/2009/01/19/scala-on-android</id>
      <content type="html">&lt;p&gt;Chris DiBona gave me a free G-phone after an open-source talk last week
(thanks!) with the admonishment that I write an app for it. I'm going to give
it an effort, at least!&lt;/p&gt;

&lt;p&gt;My first task was to figure out the scala-android build process, because that
will make coding much easier. Turns out to be easier than the instructions I
found online, so I thought I'd share.&lt;/p&gt;

&lt;p&gt;You don&amp;rsquo;t have to use Eclipse like the instructions say. You can use the ant
build file, but you have to modify it to find and compile using the scala
compiler.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;First&lt;/em&gt;, copy &lt;strong&gt;&lt;code&gt;scala-compiler.jar&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;scala-library.jar&lt;/code&gt;&lt;/strong&gt; out from
your scala-home and into your android project folder. Since the compiler is
only needed for building, and the library is the only one you&amp;rsquo;ll need
installed, I made a folder for the compiler and put the library in &lt;strong&gt;&lt;code&gt;libs/&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ mkdir scala-compiler
$ cp $SCALA_HOME/lib/scala-compiler.jar scala-compiler/
$ cp $SCALA_HOME/lib/scala-library.jar libs/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;Second&lt;/em&gt;, edit &lt;strong&gt;&lt;code&gt;build.xml&lt;/code&gt;&lt;/strong&gt; and change the compile task to look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;target name="compile" depends="dirs, resource-src, aidl"&amp;gt;
    &amp;lt;javac encoding="ascii" target="1.5" debug="true" extdirs=""
           srcdir="${srcdir}" includes="**/*.java"
           destdir="${outdir-classes}"
           bootclasspath="${android-jar}"&amp;gt;
        &amp;lt;classpath&amp;gt;
            &amp;lt;fileset dir="${external-libs}" includes="*.jar"/&amp;gt;
        &amp;lt;/classpath&amp;gt;
    &amp;lt;/javac&amp;gt;

    &amp;lt;taskdef resource="scala/tools/ant/antlib.xml"
             classpath="scala-compiler/scala-compiler.jar:${external-libs-ospath}/scala-library.jar" /&amp;gt;
    &amp;lt;scalac force="changed" deprecation="on"
            srcdir="${srcdir}" includes="**/*.scala"
            destdir="${outdir-classes}"
            bootclasspath="${android-jar}"&amp;gt;
        &amp;lt;classpath&amp;gt;
            &amp;lt;fileset dir="${external-libs}" includes="*.jar"/&amp;gt;
        &amp;lt;/classpath&amp;gt;
    &amp;lt;/scalac&amp;gt;
&amp;lt;/target&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;Third,&lt;/em&gt; you need to edit &lt;strong&gt;&lt;code&gt;dx&lt;/code&gt;&lt;/strong&gt; from the android toolkit so that it gets
lots of memory. Uncomment the line with the java heap option and give it a
&lt;em&gt;lot&lt;/em&gt; of memory, like 512MB (&lt;strong&gt;&lt;code&gt;"-Xmx=512m"&lt;/code&gt;&lt;/strong&gt;). It needs this because dex is
going to convert the entire scala library into dalvik. Without a lot of
memory, it will run out of heap space.&lt;/p&gt;

&lt;p&gt;The end result is a little wasteful: even a tiny &amp;ldquo;hello world&amp;rdquo; app is 800K,
probably because of that big ol' scala library. I suspect dex is proactively
throwing away classes that I didn&amp;rsquo;t use, or it would be even bigger.&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Java SE6 on Leopard</title>
      <link href="http://robey.lag.net/2008/12/01/java-se6-on-leopard.html" />
      <updated>2008-12-01T00:00:00-08:00</updated>
      <id>http://robey.lag.net/2008/12/01/java-se6-on-leopard</id>
      <content type="html">&lt;p&gt;I could not find this anywhere on google in one place, but I pieced it
together from clues Steve found by poring over a half dozen other blogs.&lt;/p&gt;

&lt;p&gt;To get Java 6 to run on Leopard, you must go into
&lt;strong&gt;&lt;code&gt;/System/Library/Frameworks/JavaVM.framework/Versions&lt;/code&gt;&lt;/strong&gt; and blow away the
&lt;strong&gt;&lt;code&gt;Current&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;CurrentJDK&lt;/code&gt;&lt;/strong&gt; symlinks. Make them point to &lt;strong&gt;&lt;code&gt;A&lt;/code&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo rm Current CurrentJDK
$ sudo ln -s A Current
$ sudo ln -s A CurrentJDK
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now launch the &lt;strong&gt;Java Preferences&lt;/strong&gt; app and move the JDK order to the way you
want it.&lt;/p&gt;

&lt;p&gt;Setting the symlinks manually to 1.5 or 1.6 will make text apps work, but
jconsole will blow chunks. It seems like the &lt;strong&gt;Java Preferences&lt;/strong&gt; app changes
&amp;ldquo;what&amp;rsquo;s in &lt;strong&gt;&lt;code&gt;A&lt;/code&gt;&lt;/strong&gt;&amp;rdquo; and then the symlinks make that work.&lt;/p&gt;

&lt;p&gt;If you have the symlinks set up in the pre-Leopard way, the prefs app won&amp;rsquo;t
fix them.&lt;/p&gt;

&lt;p&gt;Sigh. Apple.&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Scarling » Kestrel</title>
      <link href="http://robey.lag.net/2008/11/27/scarling-to-kestrel.html" />
      <updated>2008-11-27T00:00:00-08:00</updated>
      <id>http://robey.lag.net/2008/11/27/scarling-to-kestrel</id>
      <content type="html">&lt;p&gt;Ever since we deployed scarling in production, its name has progressed from
being a stale joke to an annoyance. It started out as a test of porting
starling to scala, but as features have been added and it&amp;rsquo;s been hardened by
the real world, it has needed its own identity. I wanted to stay with the bird
theme, because I think it&amp;rsquo;s cute, so after spending several days mulling over
possible new names, I've settled on &amp;ldquo;kestrel&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Poof! And so it is. What was scarling is now kestrel. I've updated the code
and github page:
&lt;a href="http://github.com/robey/kestrel/"&gt;http://github.com/robey/kestrel/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Meanwhile, over the last couple of weeks, I've added three big features.&lt;/p&gt;

&lt;h2&gt;Big Queues&lt;/h2&gt;

&lt;p&gt;One of the working assumptions of kestrel is that queues are normally empty. A
healthy system has more consumers than producers, so new work items are
gobbled up as fast as they come in, and we can usually keep all queued items
in memory, for the rare times there are any items queued at all.&lt;/p&gt;

&lt;p&gt;However, you may have an event &amp;mdash; like a realigning US presidential election
&amp;mdash; which causes brief &amp;ldquo;high traffic&amp;rdquo; bursts that temporarily overwhelm
consumers. It became clear at Twitter that we needed to have a graceful
soft-landing for these bursts, to prevent kestrel from running out of memory
or needing manual intervention.&lt;/p&gt;

&lt;p&gt;In kestrel&amp;rsquo;s git-head, now, when a queue passes a memory limit (128MB by
default), that queue will be dropped into what I call &amp;ldquo;read-behind mode&amp;rdquo;. New
items are added to the queue by writing them into the queue&amp;rsquo;s journal, but not
kept around in memory. Instead, we just keep the first 128MB of the queue head
in memory, and track a second file pointer to our journal. As items are read
from the queue, this new file pointer replays the journal from behind, filling
the queue back up until it either catches up with the write pointer or fills
up 128MB again.&lt;/p&gt;

&lt;p&gt;In effect, we&amp;rsquo;re keeping a window of the head of the queue in memory, and
using the journal as a disk store for the rest. It nicely caps memory
consumption and the added disk I/O can be amortized out across consumers.&lt;/p&gt;

&lt;p&gt;You probably don&amp;rsquo;t want to let an out-of-control queue grow forever because it
will fill up your disk, but this should make it cope well with short-term
spikes and give you one less thing to worry about when the snake is trying to
swallow the pig.&lt;/p&gt;

&lt;h2&gt;Blocking fetches&lt;/h2&gt;

&lt;p&gt;Something that&amp;rsquo;s bothered me about using the memcache protocol is that there&amp;rsquo;s
no way for a consumer to do a blocking fetch from a queue. If an item is
immediately available, kestrel will give it to you. If not, you&amp;rsquo;ll immediately
get a &amp;ldquo;nothing&amp;rdquo; response. Since, like I just said above, you always want to
have more consumers/workers than work items, these consumers swarm all over
the cluster, asking for work and immediately being sent away empty-handed.
Just to keep them from going crazy, we have ruby client code that looks
something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;while !(response = QUEUE.get(queue_name))
  sleep 0.25
end
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Good grief. If we&amp;rsquo;re going to let the workers take a nap on the job, we could
at least make it happen while blocking on a queue fetch.&lt;/p&gt;

&lt;p&gt;So I did a little sneaky thing with queue names in the memcache &amp;ldquo;get&amp;rdquo; command
by letting clients add options to the end, separated by slashes. Slashes
aren&amp;rsquo;t allowed in filenames anyway so they were never valid in queue names.
Then I made a timeout option, so a client can ask to block for work for some
amount of time:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;while !(response = QUEUE.get("&amp;lt;b&amp;gt;#{queue_name}/t=250&amp;lt;/b&amp;gt;")); end
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &amp;ldquo;&lt;strong&gt;&lt;code&gt;t=250&lt;/code&gt;&lt;/strong&gt;&amp;rdquo; option means &amp;ldquo;if there&amp;rsquo;s nothing in the queue right now, I'm
willing to wait up to 250 milliseconds for something to arrive&amp;rdquo;. After that
timeout, if there&amp;rsquo;s still nothing, kestrel will answer with the usual
empty-response. It&amp;rsquo;s important here to make sure that your memcache client is
set to have a read-timeout larger than the timeout you send in the &amp;ldquo;get&amp;rdquo;
request.&lt;/p&gt;

&lt;p&gt;This was the easiest thing to implement after I worked out how. Each queue
just has a kernel-style wait-list of clients attached to it. If a client makes
a timeout-style &amp;ldquo;get&amp;rdquo; request, and the queue is empty, we just put the client
on the wait-list and the client&amp;rsquo;s actor does a &lt;code&gt;receiveWithin(timeout)&lt;/code&gt; to
wait for a message saying something new has arrived. When items are put on the
queue, the first wait-list client is removed from the wait-list and notified.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ManyClients&lt;/code&gt; load test exercises this by having 100 (or 500) clients pile
on to a queue with blocking fetches while a single producer slowly trickles
out data. It seems to work like a charm.&lt;/p&gt;

&lt;h2&gt;Reliable Fetch&lt;/h2&gt;

&lt;p&gt;Writing something into a queue is pretty reliable. The client does a &amp;ldquo;set&amp;rdquo;
operation, and if it worked, kestrel responds &amp;ldquo;&lt;code&gt;STORED&lt;/code&gt;&amp;rdquo;. Naturally, it only
sends that response after the item has been written into the queue&amp;rsquo;s journal
file. The &amp;ldquo;&lt;code&gt;STORED&lt;/code&gt;&amp;rdquo; response means kestrel has taken responsibility for the
item.&lt;/p&gt;

&lt;p&gt;Fetching from a queue is not such a happy story. When kestrel sends an item to
a client, it will never get an acknowledgement or confirmation, and has to
blithely assume that the client got all the data okay and took responsibility
for it. If a client loses its connection during the data transfer, or crashes
right after receiving a work item, that item is gone forever.&lt;/p&gt;

&lt;p&gt;So I added an &amp;ldquo;open&amp;rdquo; option to &amp;ldquo;get&amp;rdquo; which opens a tentative fetch on a queue.
If an item is available, kestrel will remove it from the queue and send it to
the client as usual. But it will also set the item aside and prepare to
&amp;ldquo;un-get&amp;rdquo; it if the client disconnects without confirming it. So a tentative
fetch is started with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;QUEUE.get("#{queue_name}/open")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and confirmed with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;QUEUE.get("#{queue_name}/close")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which returns nothing. For efficiency, you can also confirm a previous fetch
and get the next item in one operation (avoiding an extra round-trip):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;QUEUE.get("#{queue_name}/close/open")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each client connection may only have one outstanding tentative fetch, and if a
connection is dropped, any tentatively-fetched item will be put back on the
head of the queue and given to the next available consumer.&lt;/p&gt;

&lt;p&gt;I want to briefly make a distinction here between confirming that a client
receives an enqueued item and confirming that some useful work was done on it.
Kestrel can really only concern itself with the former. As a good queue
server, it would like confirmation that a client has accepted responsibility
for an item before that item is erased from the queue and journal. But it has
no way of confirming that &amp;ldquo;something useful&amp;rdquo; was done with that item. You
still need to write careful client code to ensure that an item isn&amp;rsquo;t lost
after it&amp;rsquo;s received.&lt;/p&gt;

&lt;p&gt;Using reliable fetch means you are protected from losing items, at the
expense of potentially receiving duplicates &amp;mdash; that&amp;rsquo;s the trade-off. A client
may successfully handle a fetched item but crash before confirming it to
kestrel, and the item may then be given to another client. I think this is a
good trade-off, though. If you know you may handle some items twice, you can
design your system so that duplicate work is harmless &amp;mdash; versus the case where
you may lose items and don&amp;rsquo;t have any recourse.&lt;/p&gt;

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;With these three new features, you should be able to survive large bursts of
traffic more easily (with big queues), allow incoming items to be processed
immediately by the next available consumer (with blocking fetches), and
deliver items reliably even to flaky consumers (with reliable fetch). They
expanded the code size 50%, from 1000 lines to 1500, but I think they were
worth it, because they solve several limitations inherited from starling.&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Coder Dvorak</title>
      <link href="http://robey.lag.net/2008/09/20/coder-dvorak.html" />
      <updated>2008-09-20T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2008/09/20/coder-dvorak</id>
      <content type="html">&lt;p&gt;Some kind of bug was spreading around at work a few weeks ago and got me
interested in alternate keyboard layouts again. Several of my co-workers use
Dvorak, which never really interested me. But Evan pointed me to a page on
&amp;ldquo;programmer dvorak&amp;rdquo; and suddenly my interest was piqued.&lt;/p&gt;

&lt;p&gt;Dvorak rearranges the letter keys so that letters used more frequently in
english are closer to the home row, and &lt;a href="http://www.kaufmann.no/roland/dvorak/"&gt;Progammer
dvorak&lt;/a&gt; continues that concept into the
symbol keys and attempts to rearrange them so that symbols frequently used in
code are close at hand. The stroke of genius that makes it worthwhile is that
he &lt;em&gt;moves the number keys up to shifted positions&lt;/em&gt;, making space for more
non-shifted symbols. I don&amp;rsquo;t hit the number 8 nearly as much as I hit
underline, plus, or the &amp;ldquo;squiggly braces&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Unfortunately, he then moves the number keys into a jumbled-up order so that
they&amp;rsquo;re no longer in incrementing or any other intuitive order. I'm not sure
why. (I later found out that this is apparently the original Dvorak number
layout which nobody uses.)&lt;/p&gt;

&lt;p&gt;Being the type of person I am, I decided I could do better, so I constructed a
&amp;ldquo;coder dvorak&amp;rdquo; based on this idea, but keeping the number keys in order, and
moving symbols to positions that make sense for a modern language. (There&amp;rsquo;s
not much need to keep &lt;strong&gt;&lt;code&gt;$&lt;/code&gt;&lt;/strong&gt; nearby unless you still do a lot of perl.)&lt;/p&gt;

&lt;p&gt;I grouped the symbol keys into four loose categories, the first being &amp;ldquo;paired&amp;rdquo;
(like parentheses), and the other three ordered from &amp;ldquo;most used&amp;rdquo; to &amp;ldquo;least
used&amp;rdquo;. Then after trying it out for a few days, I re-assigned several symbols
because I made some mistakes in guessing frequency of use &amp;mdash; both period and
slash were used way more than i thought, for example.&lt;/p&gt;

&lt;p&gt;The layout I finally adopted, unchanged for a month now, is below:&lt;/p&gt;

&lt;p&gt;&lt;img src="/images/cdv.png" alt="coder dvorak layout" /&gt;&lt;/p&gt;

&lt;p&gt;I used the awesome &lt;a href="http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&amp;amp;amp;id=ukelele"&gt;Ukelele&lt;/a&gt;
to create the Mac keyboard layout file, which can just be dropped into
&lt;strong&gt;&lt;code&gt;~/Library/Keyboard Layouts/&lt;/code&gt;&lt;/strong&gt; in your home folder.&lt;/p&gt;

&lt;p&gt;A lot of the design came from whiskey-inspired ideas from Britt and Evan.
Originally, I was going to put the paired keys on symmetric sides of the
keyboard, like having &amp;ldquo;&lt;strong&gt;&lt;code&gt;(&lt;/code&gt;&lt;/strong&gt;&amp;rdquo; on &lt;strong&gt;&lt;code&gt;5&lt;/code&gt;&lt;/strong&gt; and &amp;ldquo;&lt;strong&gt;&lt;code&gt;)&lt;/code&gt;&lt;/strong&gt;&amp;rdquo; on &lt;strong&gt;&lt;code&gt;7&lt;/code&gt;&lt;/strong&gt;, which we
all convinced ourselves made sense. But after trying that for a day, it was
clearly wrong. It turns out that by the time my fingers jump the two rows from
the home row to the number row, the finger I use to hit a key isn&amp;rsquo;t 100%
deterministic. It depends on what else I've been typing. The home row
&amp;ldquo;bindings&amp;rdquo; don&amp;rsquo;t hold as tightly up there. So it was as if I'd put the
symmetric keys randomly apart from each other.&lt;/p&gt;

&lt;p&gt;Some placements worked out better than I expected. The underline key (used
heavily in ruby, python, and scala) almost couldn&amp;rsquo;t be in a more accessible
place. Similarly, period and slash ended up in great places, and putting bang
(&lt;strong&gt;&lt;code&gt;!&lt;/code&gt;&lt;/strong&gt;) and question (&lt;strong&gt;&lt;code&gt;?&lt;/code&gt;&lt;/strong&gt;) on the same key is strangely intuitive.&lt;/p&gt;

&lt;p&gt;The thing I have the most trouble with, after about 6 weeks, is the Dvorak
letter layout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Dvorak is right-handed-centric.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Great, because I'm right-handed. But it&amp;rsquo;s not that simple! Look at your QWERTY
keyboard: more letters are centered around the left hand than the right.
QWERTY even devotes one of the 8 precious home-row positions to &lt;em&gt;semi-colon&lt;/em&gt;
on the right hand &amp;mdash; hardly a frequently used english key! (Most english
writers have never even learned correct usage of the semi-colon.)&lt;/p&gt;

&lt;p&gt;I never realized, until typing on Dvorak, that QWERTY has made my left hand a
much stronger typer. Dvorak relies on the right hand a lot more, and I was in
the strange position of having my right hand get tired pretty quickly for the
first couple of weeks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. English has a different distribution than a programming language.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It mostly works, but there are a few cases that sting. Words like &amp;ldquo;if&amp;rdquo; and
&amp;ldquo;while&amp;rdquo; appear &lt;em&gt;all the time&lt;/em&gt; in code, but are difficult to type in Dvorak
because F and W are in awkward places. A real coder layout would probably
reorganize the keys to match actual words used in programming languages, but I
chickened out. It seemed like a very large gratuitous change.&lt;/p&gt;

&lt;p&gt;The worst is &amp;ldquo;I&amp;rdquo; versus &amp;ldquo;U&amp;rdquo;. &amp;ldquo;I&amp;rdquo; is used all over the place in coding. It&amp;rsquo;s
also the most common loop variable name. So it should have been on the home
row instead of &amp;ldquo;U&amp;rdquo;, which is rarely used.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anyway&amp;hellip;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'm liking it so far. I may change my mind in another month or so, because I'm
pretty hard to please, but it&amp;rsquo;s working well for now.&lt;/p&gt;

&lt;p&gt;The keyboard layout can be downloaded here:
&lt;a href="http://www.lag.net/robey/cdv/"&gt;http://www.lag.net/robey/cdv/&lt;/a&gt;&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>OS X Leopard JNI Link Error</title>
      <link href="http://robey.lag.net/2008/09/02/jni-link-error.html" />
      <updated>2008-09-02T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2008/09/02/jni-link-error</id>
      <content type="html">&lt;p&gt;I just spent a long time tracking this down, and nobody else on google had
ever reported a solution. So I'm posting this so the next person wit h this
problem can find this post.&lt;/p&gt;

&lt;p&gt;If you see this error when linking something in JNI:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ gcc -dynamiclib -o libDirectorySyncer.jnilib target/c/dsync.o -framework JavaVM
ld: framework not found JavaVM
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;it&amp;rsquo;s because leopard has messed up your JDK folder:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ls -l /System/Library/Frameworks/JavaVM.framework/Versions/Current
/System/Library/Frameworks/JavaVM.framework/Versions/Current@ -&amp;gt; 1.6.0
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Wrong! It needs to be pointing to &amp;ldquo;A&amp;rdquo;. No idea why.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cd /System/Library/Frameworks/JavaVM.framework/Versions/
$ sudo rm Current
$ sudo ln -s A Current
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That will fix it. Yay!&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>CJC Prime Number Sieve</title>
      <link href="http://robey.lag.net/2008/08/27/cjc-prime-number-sieve.html" />
      <updated>2008-08-27T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2008/08/27/cjc-prime-number-sieve</id>
      <content type="html">&lt;p&gt;Apologies in advance. This is probably the geekiest post I've ever done.&lt;/p&gt;

&lt;p&gt;To get an SSH client working &amp;mdash; with a reasonable response time &amp;mdash; on a 200mhz
ARM chip a few years ago, I had to optimize some crazy things. One of those
things was the generation of prime numbers, which are needed to create new
public keys.&lt;/p&gt;

&lt;p&gt;You may have heard that it&amp;rsquo;s very difficult to factor large numbers. This is
true, but it&amp;rsquo;s actually a lot easier to determine whether a large number is
&amp;ldquo;probably prime&amp;rdquo;. Most prime number generators create a random number from a
secure source of entropy and then run it through a Miller-Rabin test, which
can identify composite (non-prime) numbers. If the test doesn&amp;rsquo;t prove that a
number is composite, then there&amp;rsquo;s at least a &amp;frac34; chance that it&amp;rsquo;s prime. You
can perform the test multiple times to improve those odds. More info here:
&lt;a href="http://en.wikipedia.org/wiki/Miller-Rabin_test"&gt;Miller-Rabin test&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Miller-Rabin test can be pretty time consuming, though, and since you may
need to run it against many random numbers before you&amp;rsquo;ll find one that&amp;rsquo;s
prime, it would be nice to weed out obvious composites beforehand. For
example, you wouldn&amp;rsquo;t want to test an even number since that&amp;rsquo;s obviously
divisible by 2. So here&amp;rsquo;s a shortcut I came up with that I call the &amp;ldquo;CJC prime
number sieve&amp;rdquo;.&lt;/p&gt;

&lt;h2&gt;The Sum-of-Digits Trick&lt;/h2&gt;

&lt;p&gt;I thought I would be able to gloss over this part of the explanation by
throwing out a few links and saying &amp;ldquo;go read about the (name) theorem!&amp;rdquo; But a
google search only turns up some anecdotal descriptions, and wikipedia barely
devotes an entire sentence to it (here: 
&lt;a href="http://en.wikipedia.org/wiki/Modular_arithmetic#Applications"&gt;Modular Arithmetic&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;There&amp;rsquo;s a fairly well-known trick for finding out if a number is divisible by
3: add the digits, and if the sum of the digits is divisible by 3, so is the
original number. If it&amp;rsquo;s not, the original number is not. For example, 4401 is
divisible by 3 because 4 + 4 + 0 + 1 = 9, and 9 is divisible by 3. 512 is not,
because 5 + 1 + 2 = 8, and 8 is not divisible by 3. (This trick also works for
11 and 9 but these are emo numbers that don&amp;rsquo;t get as much attention.)&lt;/p&gt;

&lt;p&gt;So why does this work? As you can guess from the wikipedia link above, it has
to do with modular arithmetic. In (mod k) space, the only numbers that exist
are integers from 0 to k &amp;ndash; 1. Coders recognize this as being the way all int
math works on a computer: a byte can represent a number &amp;ldquo;mod 256&amp;rdquo;. It turns
out that in (mod k) space, you can cancel out a factor if that factor is
&lt;em&gt;relatively prime&lt;/em&gt; to k.&lt;/p&gt;

&lt;p&gt;Say what? Okay, &amp;ldquo;relatively prime&amp;rdquo; just means two numbers don&amp;rsquo;t share any
factors. 9 and 10 are relatively prime because 10 = 2 &lt;em&gt; 5, and 9 = 3 &lt;/em&gt; 3. But
4 and 6 aren&amp;rsquo;t, because they share the factor 2.&lt;/p&gt;

&lt;p&gt;Sooo&amp;hellip; since:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;20 = 2 (mod 9)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and 20 is relatively prime to 9, you can figure out 40 mod 9 by factoring out
the 20, and replacing it with its equivalent in (mod 9) space, 2:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;40 = 2 * 20
2 * 20 = 2 * 2 = 4 (mod 9)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The reason this works well for finding numbers divisible by 3 is that:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;4401 = (4 * 1000) + (4 * 100) + (0 * 10) + 1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;10 = 1 (mod 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Aha! So factoring all the 10s out and replacing them with 1s gives us:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;(4 * 1000) + (4 * 100) + (0 * 10) + 1 (mod 3)
(4 * 1) + (4 * 1) + (0 * 1) + 1 (mod 3)
4 + 4 + 0 + 1 (mod 3)
9 (mod 3)
0 (mod 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and any number that&amp;rsquo;s 0 in (mod 3) is of course divisible by 3.&lt;/p&gt;

&lt;p&gt;The reason 9 and 11 work is that 10 mod 9 = 1, and 10 mod 11 = -1. (The -1
means you have to do alternate add/subtracts instead of just summing the di
gits, but it&amp;rsquo;s not worth obsessing over in this article.)&lt;/p&gt;

&lt;h2&gt;Making a sieve&lt;/h2&gt;

&lt;p&gt;So that&amp;rsquo;s exciting&amp;hellip; if you do a lot of dividing by 3. But how does this help
discover primes?&lt;/p&gt;

&lt;p&gt;Well if computers worked in base 10 instead of 2, we'd now have a pretty fast
way of determining if a really large number were divisible by 3. And I hope
you won&amp;rsquo;t think I'm being pedantic if I point out here that if you make up a
random number, it has about 1/3 chance of being divisible by 3. :)&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s say instead of working in base 10, we were working in some other base B
that was relatively prime to a lot of interesting numbers. Ideally we'd like
to be in a base B such that&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;B = 1 (mod n)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;for as many n as possible. One nice coincidence is that&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;256 =
255 + 1 =
(3 * 5 * 17) + 1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;256 = 1 (mod 3) and
256 = 1 (mod 5) and
256 = 1 (mod 17)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That means that in base 256, summing the digits will tell you quickly if 3, 5,
or 17 is a factor. Summing the base-256 digits of a number also goes by the
name &amp;ldquo;adding the bytes in the binary representation&amp;rdquo;. It actually helps a lot,
as you can see from timings at the end of this article.&lt;/p&gt;

&lt;p&gt;If we&amp;rsquo;re willing to leave powers of 2, though, we can do better. It just so
happens that we can construct a base that&amp;rsquo;s relatively prime to a handful of
small primes, yet is close to a power of 2. The key is noticing that, in the
256 case above, multiplying a few primes together and then adding 1 created a
number that was both relatively prime to all those primes, and was equal to 1
in each of the primes' (mod k) spaces.&lt;/p&gt;

&lt;p&gt;Think of it this way. If A, B, and C are all prime, then A &lt;em&gt; B &lt;/em&gt; C will equal
0 in (mod A) because it&amp;rsquo;s a multiple of A. Same goes for B and C for the same
reason. Adding 1 makes it equal 1 in all three (mod k) as long as the primes
are all greater than 2. And that also means the resulting number is no longer
a multiple of A, B, or C, which makes our new number relatively prime to them.
In short, we can construct a base that&amp;rsquo;s relatively prime to a set of primes,
and is equal to 1 in each of the primes' &amp;ldquo;mod spaces&amp;rdquo;, by just multiplying the
primes together and adding 1.&lt;/p&gt;

&lt;p&gt;We want the base to be close to a power of 2 to preserve our entropy source.
If we grab 8 bits (0 &amp;ndash; 255) from a secure random pool, but we want a smal ler
base like 250, we have two choices. We can mod the random byte (&lt;code&gt;rand % 250&lt;/code&gt;)
to wrap into range, or we can just discard an d retry when we get a byte
that&amp;rsquo;s out of our preferred range. If we mod-and-wrap, though, we&amp;rsquo;re giving
more weight to results in the wrapped part &amp;mdash; numbers like 1 are going to be
more common because both 1 and 251 will mod-wrap to them. That ruins the
&amp;ldquo;secure&amp;rdquo; part of our randomness, so we need to go with the discard option. And
if we have to discard some bytes, we'd like to pick ranges to minimize the
chance of a byte being discarded, which means choosing ranges that are as
close to a power-of-2 range as possible.&lt;/p&gt;

&lt;p&gt;One nice possibility is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;3 * 5 * 7 * 11 * 13 * 17 * 19**2 * 23 + 1 = 2119382266 = 0x7e5334fa
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That&amp;rsquo;s a base that covers 98.7% of the 31-bit range, so it won&amp;rsquo;t cause many
random digits to be discarded, and it&amp;rsquo;s relatively prime to the first 8 primes
excluding 2. (We can ensure that a number isn&amp;rsquo;t divisible by 2 by just setting
its lowest bit to 1 &amp;mdash; a bonus for working in base 2!)&lt;/p&gt;

&lt;h2&gt;Let&amp;rsquo;s Go!&lt;/h2&gt;

&lt;p&gt;At this point, the code should just write itself. To generate a 1024-bit
random prime, we need to figure out how many &amp;ldquo;digits&amp;rdquo; that would be in a base
-2119382266 number, and what the range would be on the highest digit. We&amp;rsquo;ll
lose some of the 1024-bit range because our new base doesn&amp;rsquo;t map exactly to
the desired range, but we&amp;rsquo;ll never lose more than a single bit&amp;rsquo;s worth.
(There&amp;rsquo;s probably a fancy proof for this, but if you think about it, you can
reason it out pretty quickly in your head.) You already lose 2 bits for any
prime, because you need to set the high bit to ensure a prime number of the
desired size, and you need to set the low bit to ensure an odd number as
mentioned above.&lt;/p&gt;

&lt;p&gt;For a 1024-bit prime, that means a 34-digit number with the highest digit set
to 2. For a 2048-bit prime, it&amp;rsquo;s a 67-digit number with the highest digit
between 5 and 8 inclusive. (You can play around with other sizes by calling
&lt;code&gt;computeParams&lt;/code&gt; in the attached code.)&lt;/p&gt;

&lt;p&gt;We can then generate, for a 1024-bit prime, 33 &amp;ldquo;digits&amp;rdquo; of 32-bit machine
words, masking off the high bit and retrying any time we get a &amp;ldquo;digit&amp;rdquo; bigger
than (base &amp;ndash; 1). Set the high &amp;ldquo;digit&amp;rdquo; to 2 (or select a digit at random for
other bit choices) and we&amp;rsquo;re done.&lt;/p&gt;

&lt;p&gt;A quick shortcut here: Normally we should sum the digits, and then find the
mod of that sum against each of our 8 primes. However, we already have a
really convenient thing here: base &amp;ndash; 1 is a multiple of all of 8 primes (by
design), so we can mod the sum by (base &amp;ndash; 1) as we add, and not lose any
information, keeping the sum small enough to fit into a 32-bit word.&lt;/p&gt;

&lt;p&gt;After summing, a few small mod operations tell us if the generated number is
divisible by 3, 5, 7, 11, 13, 17, 19, or 23. If it is, we can loop back and
generate another number immediately, skipping the Miller-Rabin prime test. If
it passes, we still dump it into Miller-Rabin for final verification &amp;mdash; the
sum check just lets us quickly discard any simple composites.&lt;/p&gt;

&lt;h2&gt;Code and Results&lt;/h2&gt;

&lt;p&gt;I wrote a proof-of-concept in scala, and hosted it in git here:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git clone http://www.lag.net/robey/code/cjc/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It includes an implementation of CJC in base 2119382266, alongside a prime
number generator that just uses the standard Miller-Rabin prime test by
itself, and a variant of the standard M-R sieve that checks the base-256
primes (3, 5, 17) first.&lt;/p&gt;

&lt;p&gt;The test code generates 5 primes from each algorithm, using a random number
seed given on the command line. (By the way,
&lt;em&gt;never use the built-in random number generator&lt;/em&gt; like this code does! Use a
secure one. I
used a seeded generator so the results would be repeatable, which is exactly
what you don&amp;rsquo;t want in the real world.) I've posted my own results in a chart
below: Using the base-256 check alone gives a 2x speedup, and usi ng CJC gives
over 3x! The primary increase, according to the second chart, appears to be
due to cutting back on calls to the Miller-Rabin algorithm. The more
candidates we can discard before trying M-R, the better.&lt;/p&gt;

&lt;p&gt;(By the way, ignore how long these algorithms are taking in wall-clock time. I
ran them in a standard JVM without JIT. In real life, you would definitely
want this to be in JIT or possibly even C &amp;mdash; horrors!)&lt;/p&gt;

&lt;p&gt;Anyway, hopefully this is an interesting technique for filtering primes. And
the name CJC? It&amp;rsquo;s named after my friend Communist J. Cat.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;img src="/images/cjc-graph.png"" alt="results graph" /&gt;&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Git for the real world</title>
      <link href="http://robey.lag.net/2008/07/13/git-for-the-real-world.html" />
      <updated>2008-07-13T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2008/07/13/git-for-the-real-world</id>
      <content type="html">&lt;p&gt;Now that we've been using git at Twitter for a couple of months, we've
overcome several crippling problems and misunderstandings about how to use it
properly. There are dozens of &amp;ldquo;intros&amp;rdquo; and &amp;ldquo;tutorials&amp;rdquo; to git online, but at
some point you need to know more than just the basics of DVCS and the map to
svn commands &amp;mdash; you need to know practical considerations of real-world usage.
None of the intros or tutorials had this stuff, so I thought I'd share what we
learned.&lt;/p&gt;

&lt;p&gt;Git&amp;rsquo;s command-line interface is hands-down the worst of any DVCS (except the
archaic &lt;code&gt;tla&lt;/code&gt;). There are inconsistencies: Some commands will expect you to
type &amp;ldquo;origin/master&amp;rdquo;, while others will want &amp;ldquo;origin master&amp;rdquo;. Other commands
should never ever be used, but are presented in the documentation as if
they&amp;rsquo;re part of a normal usage pattern. Some commands are useless in their
default form and need several command-line options to make them work right.&lt;/p&gt;

&lt;p&gt;I ended up writing a wrapper script to cover up a lot of these flaws, which I
consider an &amp;ldquo;ultimate fail&amp;rdquo; for a UI. But I'm still not sure the script is a
good idea, since it may make me forget all the quirks I need to keep in mind
when the script isn&amp;rsquo;t around.&lt;/p&gt;

&lt;h2&gt;Don&amp;rsquo;t change history&lt;/h2&gt;

&lt;p&gt;Two commands you should avoid: &lt;strong&gt;&lt;code&gt;git rebase&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;git reset&lt;/code&gt;&lt;/strong&gt;. Some of
the tutorials will tell you that &lt;strong&gt;&lt;code&gt;rebase&lt;/code&gt;&lt;/strong&gt; is one of the first commands you
should learn. Lies! &lt;strong&gt;&lt;code&gt;rebase&lt;/code&gt;&lt;/strong&gt; is a way to trick you into creating merge
conflicts.&lt;/p&gt;

&lt;p&gt;When you rebase, you are erasing every local commit you've made, and turning
them into patches (as if you were back on CVS). After syncing your repository
up with the remote one, your patches are re-applied one by one. Presto! you've
changed history.&lt;/p&gt;

&lt;p&gt;The only reason I can think of for doing this is if you&amp;rsquo;re not comfortable
doing merges. But DVCS is all about merges, so you should just get used to
doing them. A merge provides a little signpost to everyone else about your
branch. Don&amp;rsquo;t fear the merge &amp;mdash; love it! It records exactly how your local
work should be rectified with remote changes, without requiring you to keep
tweaking your patch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;reset&lt;/code&gt;&lt;/strong&gt; is even worse. It erases commits from your history, which will
very likely make your local repository different from everyone else&amp;rsquo;s, and
guarantee future conflicts or even an inability to push in the future. Some
people will say you should learn &lt;strong&gt;&lt;code&gt;reset&lt;/code&gt;&lt;/strong&gt; so you can use it in a panic
situation, but if you&amp;rsquo;re panicking, you&amp;rsquo;re more likely to make things worse,
so stop. Calm down. You have time to think and solve the problem in a
rational way.&lt;/p&gt;

&lt;p&gt;My problem with these two commands is that they violate a core philosophy of
DVCS: Everyone has their own view of the repository, but these views obey
entropy and flow in only one time direction. When they meet, they merge. Doing
a &lt;strong&gt;&lt;code&gt;rebase&lt;/code&gt;&lt;/strong&gt; or &lt;strong&gt;&lt;code&gt;reset&lt;/code&gt;&lt;/strong&gt; goes back in time and changes the past. They
should be in a separate tool, like &amp;ldquo;git-fix&amp;rdquo; or &amp;ldquo;git-hack&amp;rdquo;.&lt;/p&gt;

&lt;h2&gt;The story matters more than the chronology&lt;/h2&gt;

&lt;p&gt;Have you ever read a history book that said &amp;ldquo;In 1812, the British empire
shelled the tiny new American capital. Meanwhile, Napoleon marched across
Europe. In China, &amp;hellip;&amp;rdquo;? Hopefully not, that would suck. Telling a thread of
the story from beginning to end is more important than placing every single
event in its exact chronological order. The default format for &lt;strong&gt;&lt;code&gt;git log&lt;/code&gt;&lt;/strong&gt;
reorders commits by their exact date and time, so you need to be aware of
that and not get confused.&lt;/p&gt;

&lt;p&gt;Say, for example, you made a local branch, and made 3 local commits: L1, L2,
and L3. Meanwhile, someone else is working on a different feature on their
branch, and does commits R1 and R2. After you merge (M1), git is likely to
show you a history like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;M1 -&amp;gt; R2 -&amp;gt; L3 -&amp;gt; L2 -&amp;gt; R1 -&amp;gt; L1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Huh? What? Why are my local commits intermixed with my co-worker&amp;rsquo;s commits?
The merge must have messed up! Crap! Time to &lt;strong&gt;&lt;code&gt;git reset&lt;/code&gt;&lt;/strong&gt; and destroy
everything, right? No! Stop! Don&amp;rsquo;t do it. It&amp;rsquo;s a trap! Git is lying by
omission &amp;mdash; it&amp;rsquo;s telling the literal, actual truth, but it&amp;rsquo;s telling it to you
in a way that makes it confusing. Git is re-ordering the history to make sure
every commit is shown in its actual time order, not the story order.&lt;/p&gt;

&lt;p&gt;You should probably just go ahead and alias &lt;strong&gt;&lt;code&gt;log&lt;/code&gt;&lt;/strong&gt; to:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git log --topo-order --decorate
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That tells git to show things in &amp;ldquo;topological&amp;rdquo; (story) order, and to also mark
where various branches are sitting. I usually find it useful to take that one
step further:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git log --topo-order --decorate --first-parent
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That tells git to show things in story order &lt;em&gt;and&lt;/em&gt; to tell that story from
&lt;em&gt;my point of view&lt;/em&gt;. It&amp;rsquo;s sometimes interesting to see every commit that one of
your coworkers did in their branch, but often you just want to see the
merge-commit and move on. &lt;strong&gt;&lt;code&gt;"-- first-parent"&lt;/code&gt;&lt;/strong&gt; tells git
to skip over the details of every branch that isn&amp;rsquo;t a linear parent of yours.
Generally this means you&amp;rsquo;ll see a simplified history of what&amp;rsquo;s been going on,
without the intricacies of what happened on forked branches while they were
forked off.&lt;/p&gt;

&lt;p&gt;If you want to see all the threads of history intertwined, I suggest using a
graphical tool like gitk instead of git-log.&lt;/p&gt;

&lt;h2&gt;Don&amp;rsquo;t fast-forward &amp;mdash; live every moment&lt;/h2&gt;

&lt;p&gt;This one is pretty confusing. And it sucks, because this concept doesn&amp;rsquo;t even
exist in other DVCS. I think it&amp;rsquo;s another symptom of &amp;ldquo;fear of merge&amp;rdquo;.
Basically, sometimes when you ask git to merge branch A into branch B, it will
decide that it doesn&amp;rsquo;t want to merge and it will instead turn A and B into
clones of each other.&lt;/p&gt;

&lt;p&gt;For example, let&amp;rsquo;s pretend you made a branch of &amp;ldquo;master&amp;rdquo; called &amp;ldquo;feature&amp;rdquo; and
did a few commits on it, and are now ready to merge it back into master. If no
other work has happened on the master branch, git will try to out-clever you.
It thinks: &amp;ldquo;Well, nobody else has worked on the master branch, so I could just
&lt;em&gt;make the feature branch become the new master branch&lt;/em&gt; and
that would be logically equivalent.&amp;rdquo; So after the merge, you&amp;rsquo;ll see every
single commit you made, as if you had done them directly on the master branch.
Git has cloned your feature branch into the new master branch.&lt;/p&gt;

&lt;p&gt;This might not be so bad if there are only a couple of people working on the
project, but there are a few side effects: Your branch has effectively
vanished from history. There is no longer any indication that you were working
on a side branch; it looks like you were working directly on master. And if it
turns out that there were bugs in your new feature (which, you know, sometimes
happens), you can&amp;rsquo;t reverse the merge-commit because
&lt;em&gt;there is no merge-commit&lt;/em&gt;. You will have to reverse every single commit you
made, in reverse order, or worse.&lt;/p&gt;

&lt;p&gt;So really, you want git to always create a merge-commit when you do a merge.
For this, you have to ask it nicely:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git merge --no-ff
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Git calls the history erasing &amp;ldquo;fast-forwarding&amp;rdquo;.)&lt;/p&gt;

&lt;h2&gt;A few other things&lt;/h2&gt;

&lt;p&gt;To remove a branch from a remote repository after it&amp;rsquo;s been merged and
deployed, you have to push the branch with a colon in front of the branch
name. This has become a running joke in the office: &amp;ldquo;Colon means delete.&amp;rdquo;
Look, don&amp;rsquo;t ask me, I'm not Linus. That&amp;rsquo;s just how it works.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git push origin :stale_completed_branch
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When other people remove branches, they won&amp;rsquo;t be removed from your local copy
of the repository. To take care of this housekeeping, you need to express a
fruit preference:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git remote prune origin
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Again, don&amp;rsquo;t ask. I don&amp;rsquo;t know why. That&amp;rsquo;s just how it is.&lt;/p&gt;

&lt;p&gt;I have a few ranty topics on how git is implemented and used, and how that
compares with the older DVCS (especially bazaar), but I&amp;rsquo;ll save that for some
other time. If you&amp;rsquo;re using git, hopefully this information is useful.&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Git gripes</title>
      <link href="http://robey.lag.net/2008/05/11/git-gripes.html" />
      <updated>2008-05-11T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2008/05/11/git-gripes</id>
      <content type="html">&lt;p&gt;I have to get this rant off my chest.&lt;/p&gt;

&lt;p&gt;Why does git insist on making up new terminology for features that are in all
DVCS, and already have commonly accepted names?&lt;/p&gt;

&lt;p&gt;Last week during a presentation, I almost choked when someone showed off a git
feature where you can temporarily shelve uncommitted changes. It&amp;rsquo;s called
&amp;ldquo;git stash&amp;rdquo;. Stash?! Mercurial and bazaar at least agree that this is called,
um, shelve. Since you are shelving. I don&amp;rsquo;t have a bag of heroin to stash,
Linus. Just patches. To shelve. Now until the end of time I will need to
remember which term to use when talking to git people or the rest of the
universe.&lt;/p&gt;

&lt;p&gt;In git, when you commit, you are actually staging. (Git doesn&amp;rsquo;t consider a
commit to be truly committed until you've posted it to someone else&amp;rsquo;s repos
itory &amp;mdash; a charming bit of communism.) A staged commit is, to git, a cached
commit. Wikipedia tells me a cache is a copy of data that&amp;rsquo;s already stored
elsewhere, but nevermind, that definition doesn&amp;rsquo;t apply in the gitverse.
Commit = staged; staged = cached; got it?&lt;/p&gt;

&lt;p&gt;To switch branches in a working tree, you &amp;ldquo;checkout&amp;rdquo; in git. You also
&amp;ldquo;checkout&amp;rdquo; to create a new working tree for a branch. The manpage for
&amp;ldquo;git-check out&amp;rdquo; can&amp;rsquo;t even avoid sounding confused by this, and admits that
the former usage really &amp;ldquo;switches branches&amp;rdquo;. Let&amp;rsquo;s see if we can think of a
good term for switching branches. How about &amp;ldquo;switch&amp;rdquo;? Hey look, that&amp;rsquo;s what
everyone else calls it! Must be a very odd coincidence. To branch is to
&amp;ldquo;clone&amp;rdquo;; to revert is to &amp;ldquo;clean&amp;rdquo;. Not even the simplest of terms escape
newspeak.&lt;/p&gt;

&lt;p&gt;I want to end on a positive note, so I will say that compared to every other
DVCS, git is fast. I frequently have no idea what it&amp;rsquo;s doing (is &amp;ldquo;deltifying&amp;rdquo;
really a word?), or if it&amp;rsquo;s doing what I'd like it to do, but it&amp;rsquo;s doing it
very very quickly. Maybe there&amp;rsquo;s a metaphor there.&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Scarling</title>
      <link href="http://robey.lag.net/2008/05/07/scarling.html" />
      <updated>2008-05-07T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2008/05/07/scarling</id>
      <content type="html">&lt;p&gt;I recently spent bits of my free time porting
&lt;a href="http://rubyforge.org/projects/starling/"&gt;Starling&lt;/a&gt; to scala,
and thought I'd post the story here in case anyone else finds it enlightening
or interesting.&lt;/p&gt;

&lt;p&gt;Starling is a simple, reliable message-queue server written in ruby. It uses
MemCache protocol, so almost any language can already speak to it. You can
set up multiple starling servers, and just like a memcache pool, the servers
don&amp;rsquo;t need to know anything about each other. If clients pick a server at
random for each operation, it appears to be a single loosely-ordered queue,
with each server holding its own part. Starling is used extensively at
Twitter.&lt;/p&gt;

&lt;p&gt;I've been tinkering with scala since December, and have been building a 
&lt;a href="http://github.com/robey/configgy/tree/master"&gt;config/logging library&lt;/a&gt;
in my spare time. It was easy to pick up since I've been using
java and python for many years, and scala seems to nicely synthesize the
styles of those two languages. I felt like I was ready to try a meaningful
project, and Starling is potentially the kind of project that could gain a lot
from running on the JVM. At a mere 1000 lines of ruby, it was also a nice
managable size.&lt;/p&gt;

&lt;h2&gt;First chunk&lt;/h2&gt;

&lt;p&gt;I'm a bottom-up coder, so I started with the PersistentQueue class. It&amp;rsquo;s the
guts of Starling: a FIFO queue backed by a journal, and the means to rotate
and replay that journal when necessary. For this class, I was able to do
nearly a direct copy, just converting ruby syntax and library calls into scala
ones. Only a few places tripped me up:&lt;/p&gt;

&lt;p&gt;Scala, like java, has no equivalent of the &amp;ldquo;pack&amp;rdquo; function that&amp;rsquo;s in all of
the perl/python/ruby languages. I actually had to write a 10-line class to
write a little-endian int into a stream, and another 10-line class to read
one. Ack! That wasn&amp;rsquo;t an auspicious start, and gave me pause. Everything that
makes people run from java to higher-level languages was there: You&amp;rsquo;re given
only the most basic, fundamental tools to do I/O, and serializing data is
treated like some strange, rare operation.&lt;/p&gt;

&lt;p&gt;Aside from that, it went smoothly, though. The payoff was when I downloaded a
troublesome 500MB journal file from a running starling server and ran it
through the scala journal replayer. Starling would take about a half hour to
process the log (and frequently crash the ruby interpreter). &amp;ldquo;Scarling&amp;rdquo; (as I
call my scala version) was able to process the log in &lt;em&gt;23 seconds&lt;/em&gt;. Exciting!&lt;/p&gt;

&lt;p&gt;Next up was QueueCollection, which is what it sounds like: a collection of all
the persistent queues, and methods for getting stats on them. Starling does
some trickery here to avoid races around queues that are replaying journals. I
made a few false starts, trying to either duplicate the logic or improve it,
before I decided the really clever thing would be to make each queue an actor.
Once I made that leap, the code practically wrote itself, and I stopped for a
week.&lt;/p&gt;

&lt;h2&gt;Second chunk&lt;/h2&gt;

&lt;p&gt;The remainder of Starling is front-end code for handling connections, speaking
memcache protocol, and interfacing with QueueCollection. I needed to break
away from porting code at this point, because Starling uses a ruby wrapper
around EventMachine, a fast asynchronous event I/O library. New incoming
connections create a Handler object and send it events when new data arrives.
I was sure this was a place to use actors, but I wasn&amp;rsquo;t sure how much of the
I/O code I'd need to write. (I wrote most of ziggurat, Danger&amp;rsquo;s async I/O
library, so I'm not scared of writing an I/O library, but it would be really
time-consuming and non-fun.)&lt;/p&gt;

&lt;p&gt;Googling for &amp;ldquo;java EventMachine&amp;rdquo; turned up links to an apache project called
Mina. Aha! This was pretty much exactly what I was looking for. In fact, it
closely resembles a ziggurat-derived library my friend Matt is working on,
since they both use the idea of pushing protocol encoders into the I/O event
engine. Mina not only creates a new &amp;ldquo;session&amp;rdquo; object for incoming connections,
it can decode the wire protocol inside its own worker threads, and notify your
session of I/O events by sending it fully-decoded objects.&lt;/p&gt;

&lt;p&gt;This model is so close to how actors work that it just made my mouth water. I
just needed some kind of gasket! More googling revealed that one (and only
one) person had already done this work, and written a scala wrapper for Mina
that turned &amp;ldquo;I/O events&amp;rdquo; into actor messages. Well actually, he did it as a
patch against Mina instead of a wrapper, for ill-explained reasons. Oh, and
the patch fails to compile. Oh, and all the links on his project page are
broken. Oh, and also he has no posted email address. FAIL. I made an attempt
to reach him through blog comments (ugh) and decided to do this one on my own.&lt;/p&gt;

&lt;p&gt;Luckily, once I read enough of the (actually semi-decent) Mina docs, I was
able to connect Mina to actors in less than 50 lines of code. I told you: the
two models (Mina and actors) really are made for each other. I hope someone
ends up writing a working wrapper for it. Or heck, just use mine as a starting
point.&lt;/p&gt;

&lt;p&gt;The last piece was a memcache protocol handler, written to Mina&amp;rsquo;s API. I found
one from a project called jmemcached, ported the bits I needed into scala,
fixed it up to use some of scala&amp;rsquo;s nicer collections like ArrayBuffer[Byte],
and impatiently patched the wrole thing together for some trial runs.&lt;/p&gt;

&lt;h2&gt;First Results&lt;/h2&gt;

&lt;p&gt;The first results were pretty discouraging. I wrote a quick test script to
open a connection and do 10,000 queue SETs, and on my old Macbook Pro,
Starling could do them in under 4 seconds. My scala port could do them in 30.
This is roughly a factor of 10 worse &amp;mdash; an order of magnitude. Miserable
failure. What was I doing wrong?&lt;/p&gt;

&lt;p&gt;Scala doesn&amp;rsquo;t have many (any?) performance benchmarking tools, but java does,
and these java tools don&amp;rsquo;t seem to care if your jar was made by java or scala.
I fixed a few small things, but wasn&amp;rsquo;t making much progress. My inexperience
with hprof made me fix the least important things first, but the last two
were worth telling about.&lt;/p&gt;

&lt;p&gt;Using an actor for each queue was overkill. Reading from or writing to a queue
is such a fast operation that the message-passing for &amp;ldquo;please write to this
queue&amp;rdquo;, &amp;ldquo;okay done&amp;rdquo; was deadly. I reluctantly scrapped the actor code there
and used simple locks for the queues, and shaved off several seconds. The
client connections were already actors, so it was just adding too much
overhead to have the queues themselves be separate actors.&lt;/p&gt;

&lt;p&gt;The biggest gain, which took me the longest time to find, and made me feel the
most foolish, was in the memcache protocol decoder. I thought I was being
really clever by sending incoming data into a scala ArrayBuffer[Byte] and
doing functional operations on it. Those operations are sometimes slow as
molasses! One, &amp;ldquo;trimStart&amp;rdquo;, was using up 1 millsecond per queue push, or 10
seconds of total test time all by itself.&lt;/p&gt;

&lt;p&gt;Begrudgingly, I dove into my memcache protocol decoder and decided to do
things the Mina way instead of trying to be clever. Using Mina&amp;rsquo;s ByteBuffer
class slashed times dramatically, and suddenly my scala code was as fast as
Starling (less than 4 seconds). I actually spent a couple of days fretting
about the poor performance before this &amp;ldquo;eureka&amp;rdquo; moment, and was very relieved
to find out that the slowness was entirely due to my own brain damage, and
not to anything in scala or actors.&lt;/p&gt;

&lt;h2&gt;Better Results&lt;/h2&gt;

&lt;p&gt;&amp;ldquo;As fast as ruby&amp;rdquo; may not sound impressive. In some circles, it&amp;rsquo;s actually an
insult. But for this test, I felt good. First, Starling doesn&amp;rsquo;t do anything
intensive in ruby. It uses ruby&amp;rsquo;s expressiveness and its libraries to make a
small server that&amp;rsquo;s mostly disk I/O and network I/O bound. EventMachine does
a bang-on job of network I/O so there&amp;rsquo;s little room for improvement.&lt;/p&gt;

&lt;p&gt;However&amp;hellip; My test was effectively written to play to Starling&amp;rsquo;s strengths.
Since Starling runs in a single thread on a single process (STSP), it excels
at handling a single client which makes a lot of requests, but waits for a
response between each request. The advantage of the JVM appeared when I
changed the test to emulate 100 clients doing 100 queries each.&lt;/p&gt;

&lt;p&gt;To Starling, this is exactly the same test (10,000 queries, handled
sequentially) so it gets approximately the same result (less than four
seconds). Mina-plus-actors starts to shine here, though, and finishes in under
&lt;strong&gt;two&lt;/strong&gt; seconds. It&amp;rsquo;s able to juggle queue work with I/O threads. Success!&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I've heard there&amp;rsquo;s a big migration of ruby people to scala, and so the first
thing I would say to the ruby people is that this is no panacea. It&amp;rsquo;s not ruby
on a JVM; it&amp;rsquo;s an entirely new langauge, with much stronger java roots than
any other language, so familiarity with java is probably more helpful than
python or ruby. On the other hand, if ruby whetted your appetite for
functional programming, scala has more of that than ruby and python combined,
and seems to live up to its promise of exposing the wonders of java&amp;rsquo;s
scalability and rock-solid virtual machine and garbage collector.&lt;/p&gt;

&lt;p&gt;The main stumbling blocks I hit were the same ones others have talked about:
scarcity of tools, and a lack of pure-scala libraries &amp;mdash; both probably due to
the language being so new. Coming from java, it felt great to express myself
with python conciseness. It was like opening up the space capsule and
breathing fresh air again. Coming from ruby, it was reassuring to write for a
platform and libraries that are solid and well-tested, not just an &amp;ldquo;overnight
project&amp;rdquo; like many of the core ruby libraries appear to be.&lt;/p&gt;

&lt;p&gt;Oh, and in the end, my Starling port was only a little over 900 lines long. I
think if I added proper config file support like Starling has, it would end up
being roughly the same number of lines (1000), even though I had to write a
few large pieces (like the memcache codec) that aren&amp;rsquo;t necessary in ruby.&lt;/p&gt;

&lt;p&gt;The end result, which I plan to keep playing with and expanding, is here:
http://github.com/robey/scarling/tree/master&lt;/p&gt;
</content>
    </entry>
  
    <entry>
      <title>Scala infatuation</title>
      <link href="http://robey.lag.net/2008/03/16/scala-infatuation.html" />
      <updated>2008-03-16T00:00:00-07:00</updated>
      <id>http://robey.lag.net/2008/03/16/scala-infatuation</id>
      <content type="html">&lt;p&gt;Why my recent infatuation with scala? (Actually, since it&amp;rsquo;s reaching a half
year, I guess it can no longer be called an infatuation.) Here&amp;rsquo;s my answer in
the form of a quick example. I want to determine the average of a list of
timings for something.&lt;/p&gt;

&lt;p&gt;In python:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;timings = [1.2, 1.5, 1.3, 1.5]
reduce(lambda a, b: a + b, timings, 0.0) / len(timings)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In ruby:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;timings = [1.2, 1.5, 1.3, 1.5]
timings.inject(0.0) { |a, b| a + b } / timings.length
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Besides demonstrating that ruby is just python with different spelling, this
shows they both have some powerful built-in functions and syntax. I can type
one line for calculating the average and move on to the real point of my code,
whatever that is.&lt;/p&gt;

&lt;p&gt;Here it is in scala:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var timings = Array(1.2, 1.5, 1.3, 1.5)
timings.foldLeft(0.0) { (a, b) =&amp;gt; a + b } / timings.size
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It&amp;rsquo;s pretty much the same! I can type code that is very close to the
pseudocode in my head, and move on.&lt;/p&gt;

&lt;p&gt;Whether intentionally or not, scala is borrowing very heavily from the &amp;ldquo;high
level&amp;rdquo; language features of python (and ruby). But it&amp;rsquo;s running on top of the
JVM, which is still the best combination of interpreter machine + libraries
out there. Python&amp;rsquo;s VM is amateur at best, and ruby&amp;rsquo;s is even worse. This way
is like getting my reeces cup, two great tastes that taste great together: a
high-level, pseudocode language on top of a racecar engine.&lt;/p&gt;

&lt;p&gt;I suspect this is the main reason scala has been getting a lot of attention
recently, in spite of the gaping lack of decent documentation or tools.&lt;/p&gt;

&lt;p&gt;Oh yeah, here&amp;rsquo;s that same code in java. I'm not including the creation of the
array, because that takes 5 lines all by itself, and wouldn&amp;rsquo;t be fair to
compare:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// assume omitted var: List&amp;lt;Double&amp;gt; timings.
double sum = 0.0;
for (double d : timings) {
    sum += d;
}
double average = sum / timings.size();
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For java versions before 1.5, it&amp;rsquo;s a lot worse.&lt;/p&gt;

&lt;p&gt;Incidentally, last night I found a
&lt;a href="http://www.codecommit.com/blog/scala/roundup-scala-for-java-refugees"&gt;nice introductory article on scala&lt;/a&gt;,
which I skimmed and recommend.&lt;/p&gt;
</content>
    </entry>
  
</feed>
