Robey

Running the Gauntlet

2026-04-26T00:00:00-07:00

TLDR: I made a small typescript-compiling front-end for the nodejs test runner: https://www.npmjs.com/package/@robey/gauntlet

I’ve been on a quest to reduce dependencies in my nodejs projects, partly because of things like the left-pad incident and the debug/chalk incident, and partly because my ideal codebase is simple and organized. Deeply nested dependency trees of unknown provenance look to me like a mess to be cleaned up.

You’d be surprised how many important, useful libraries have an indirect dependency on things like “is-arrayish”, a polyfill for Array.isArray (available since ES5 15 years ago). Or “strip-ansi”, a one-line call to string.replace that itself imports a one-line regex from a different package. Most are relics of a past age or land grab of npm’s namespace, and probably considered harmless. But each one adds a layer of needless complexity and another potential attack vector.

As I’ve gradually cut back these dependency weeds, one tool has remained: mocha. It has a lot of dependencies. I’ve been eyeing it for a while, trying to figure out how to replace it, but building my own testing framework is a much bigger project than I want to take on. And in the javascript world, mocha is not a testing framework, it’s the testing framework. Whenever someone recommends a hot new alternative, I find out it’s just a new layer built on top of mocha. Mocha is still there underneath.

There’s also the uncomfortable truth that mocha is not very typescript-friendly. The favored way to run typescript tests is by running mocha with a special config file that sets the loader to “ts-node”, a just-in-time typescript compiler which is increasingly unmaintained and now emits warnings each time you run it. It would be nice to have a test runner that compiled the tests before running them, using the same typescript module I have in my project already.

I’ve noticed that some build systems (like vite) solve this by ignoring the types in tests and running them as if they were untyped javascript. It’s much faster, no doubt, but I consider a type mismatch between my tests and library to be a bug, so I want that checked explicitly as part of the unit tests. And I want that to happen at build time, not just IDE time.

By the way, this is not a call to bully the maintainers of mocha! I’ve been using it for many years and it’s worked well enough that I’ve rarely thought about it until recently. Mocha clearly strives for long-term backward compatibility in the face of the moving target of nodejs, a commendable and difficult job. They also now have to support all these other test frameworks built on top of them. I think they’re doing great. I just have different goals.

The gauntlet

The idea remained stuck in the back of my head: I could probably replace mocha with a small library, like a minimal “pytest” that just discovered test files, defined describe and it functions, and ran them. But the word “just” is doing a lot of work there, and the more I thought about how to code it, the more load-bearing it became. It seemed like a huge effort for low payoff.

A few years ago, I noticed that nodejs’s builtin “assert” module had quietly added features like regex matching and promises, and could now handily replace my use of “should.js”. It didn’t have every feature I was using, but it had enough that I could make it work. One more dependency down.

And then at some point last year, I saw a new builtin module: “test”. Turns out some nodejs contributors had been thinking along similar lines, and had added most of the features of a standard test runner to the standard library. It can collect tests and suites into a tree, run them, and report the results, either as asynchronous events in javascript, or as one of a few simple text formats. In many cases, you could probably use this module as-is, and remove all your testing dependencies.
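
For a sense of what that looks like, here is a minimal sketch of a test file that uses only the builtin modules. The `parseDuration` function is made up for the example; `describe`, `it`, `assert.throws`, `assert.match`, and `assert.rejects` are real parts of `node:test` and `node:assert`.

```typescript
import { describe, it } from "node:test";
import { strict as assert } from "node:assert";

// Hypothetical function under test: turns "250ms", "5s", or "2m" into milliseconds.
function parseDuration(text: string): number {
  const match = /^(\d+)(ms|s|m)$/.exec(text);
  if (!match) throw new Error(`bad duration: ${text}`);
  const scale = { ms: 1, s: 1000, m: 60_000 }[match[2] as "ms" | "s" | "m"];
  return Number(match[1]) * scale;
}

describe("parseDuration", () => {
  it("converts units to milliseconds", () => {
    assert.equal(parseDuration("5s"), 5000);
    assert.equal(parseDuration("2m"), 120_000);
  });

  it("rejects garbage with a useful message", () => {
    // A regex as the second argument is matched against the thrown error.
    assert.throws(() => parseDuration("soon"), /bad duration/);
    assert.match("bad duration: soon", /soon$/);
  });

  it("handles promises", async () => {
    await assert.rejects(Promise.reject(new Error("timeout")), /timeout/);
  });
});
```

Once compiled to javascript, a file like this can be run directly with node, with no test framework installed.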

I was nerd-sniped. In my spare time across a couple of weeks, I hacked up a front-end to node:test to cover my needs:

Compile (changed) tests into a cache folder.
Run them.
Display a simple summary on success, and details (including stdout & stderr) for any failures.

I call it “gauntlet” after the english phrase “running the gauntlet”, which I recently discovered is actually a corruption/confusion of the swedish word “gatlopp”, and has nothing to do with the glove.

Typescript

The hardest part turned out to be running the typescript compiler as a library.

I’d already done this before. For many years, typescript has been my preferred config file format for JS projects. It’s an easy way to enforce a structured schema, define some custom types (like duration), and let the server fail immediately on startup for errors or typos. I used some sample code to drive the compiler API. It was pretty slow, but that didn’t matter much since a config file only gets recompiled if it changes. I don’t actually remember where I found the sample code. It was probably an earlier version of this “minimal compiler” from the typescript wiki.

Compiling a few dozen test files made speed a real problem, though, since each file took about one second. I want tests to build and run quickly for rapid iteration! Mocha + ts-node are much faster. They’re obviously not using the sample code from the wiki. It was time to get my hands dirty and figure out what they were doing differently.

The sample code in the “minimal compiler” isn’t much more than three steps (and three lines): build a compiler, compile a file, and get the list of errors. The code in ts-node is… a lot. Thousands of lines. Eventually I figured out the core difference, and whittled it down to about 100 lines that could reproduce ts-node’s speed. Their trick is to create a “language service” that can cache file & module contents across multiple typescript source files. The language service requires an environment to handle filesystem access and project metadata – things an IDE might want to control. The ts-node compiler builds a fake environment for each file, directing the language service back at typescript’s own internal implementations. The cache is the secret speed sauce.
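
To make the language-service idea concrete, here is a rough sketch. It is not the code from ts-node or gauntlet, and it is much smaller than either. The in-memory file names, the compiler options, and the `compile` helper are all invented for the example; `ts.createLanguageService`, `ts.createDocumentRegistry`, `ts.ScriptSnapshot.fromString`, and `ts.sys` are real parts of the compiler API. The host answers for the "test files" itself and sends every other request (like the standard library typings) back to typescript's own filesystem layer.

```typescript
import * as ts from "typescript";

// Hypothetical in-memory "test files", standing in for real files under tests/.
const sources = new Map<string, string>([
  ["/virtual/a.test.ts", "export const a: number = 1 + 1;\n"],
  ["/virtual/b.test.ts", "export const b: string = 42;\n"], // deliberate type error
]);

// Options picked for the example; a real project would read its tsconfig.
const options: ts.CompilerOptions = {
  target: ts.ScriptTarget.ES2022,
  module: ts.ModuleKind.ESNext,
  strict: true,
  types: [],
};

// The "environment": our own files first, then typescript's built-in ts.sys.
const host: ts.LanguageServiceHost = {
  getCompilationSettings: () => options,
  getScriptFileNames: () => [...sources.keys()],
  getScriptVersion: () => "1", // bump per file if its content changes
  getScriptSnapshot: (fileName) => {
    const text = sources.get(fileName) ?? ts.sys.readFile(fileName);
    return text === undefined ? undefined : ts.ScriptSnapshot.fromString(text);
  },
  getCurrentDirectory: () => "/virtual",
  getDefaultLibFileName: (opts) => ts.getDefaultLibFilePath(opts),
  fileExists: (fileName) => sources.has(fileName) || ts.sys.fileExists(fileName),
  readFile: (fileName) => sources.get(fileName) ?? ts.sys.readFile(fileName),
};

// One service for all files, so parsed library files are reused between them.
const service = ts.createLanguageService(host, ts.createDocumentRegistry());

function compile(fileName: string): { js: string; errors: string[] } {
  const diagnostics = [
    ...service.getSyntacticDiagnostics(fileName),
    ...service.getSemanticDiagnostics(fileName),
  ];
  const errors = diagnostics.map((d) => ts.flattenDiagnosticMessageText(d.messageText, "\n"));
  const emitted = service.getEmitOutput(fileName).outputFiles.find((f) => f.name.endsWith(".js"));
  return { js: emitted?.text ?? "", errors };
}

for (const fileName of sources.keys()) {
  const { errors } = compile(fileName);
  console.log(fileName, errors.length === 0 ? "ok" : errors.join("; "));
}
```

The important design choice is that `service` is created once and reused for every file, instead of building a fresh compiler per file the way the wiki sample does.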

It became clear to me that typescript was never really meant to be used as a library like this. Kudos to the author(s) of ts-node for figuring it out and making it look simple, despite the underlying complexity! If the project became unmaintained because they’ve moved to the forest to become a landscape painter, I wish them a happy retirement.

node:test

I wrote code to compile every test file from tests/ into a .gauntlet/ folder, then copy over everything from lib/ too (and any other configured asset folders). That way, tests can use relative imports while remaining in a relatively clean environment, separate from the project workspace.

After that, it was just a matter of hooking up the nodejs test runner, listening for events, assembling those events into some coherent story for the end user… and lots of debugging.

The test module is new enough that the event API documentation is sparse, and it still has a lot of quirks. I assume it’s too late to make significant changes, but if it wasn’t, here are things I would suggest for a future version:

There’s no easy way to know how many test cases are going to run until they finish. The enqueue/dequeue events report when a file is processed, but it would help to also report how many test cases they found in each file. This would let a runner estimate progress once the test cases start running… and even show a progress bar.
Use different event names for different event types. The start event currently means the test runner started reading a file, or possibly started processing a test suite, or possibly started running an individual test case. It’s difficult to infer which one. They should be 3 different events.
Some events (like start) only pass the name of the individual test case, not the full path of enclosing suites. For example, if a suite "queue worker" has a test case "handles shutdown", you’ll get a start event that only names "handles shutdown". Gauntlet has some convoluted logic to model the tree and figure out the path to each test, but the test runner already has this model of the tree. It built it when reading the test files. It should include the full tree path in each event that refers to a suite or test case.
Test case timeouts seem to include the time the test (and its file) spend enqueued in the test runner. So if there are many more test files than cores, and some of the files have slow tests, a file can “time out” with an error before it is even executed, without running any of its tests. The runner should start the timer only when a test case is actually started.
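
As an illustration of the path problem in the third item, here is one simple way a listener could rebuild the full path from the nesting depth that comes with each start event. This is a simplified sketch, not gauntlet's actual logic. The `StartEvent` shape is trimmed down to the fields used here, and it assumes a suite's start event always arrives before those of its children.

```typescript
// Simplified shape of a "test:start" event payload. The real event carries
// more fields; name, nesting, and file are the only ones used in this sketch.
interface StartEvent {
  name: string;
  nesting: number; // 0 for a top-level suite or test, 1 for its children, ...
  file?: string;
}

// Remembers the most recent name seen at each nesting level, per file.
class PathTracker {
  private stacks = new Map<string, string[]>();

  // Returns the full path, e.g. ["queue worker", "handles shutdown"].
  onStart(event: StartEvent): string[] {
    const key = event.file ?? "";
    const stack = this.stacks.get(key) ?? [];
    // Anything at this depth or deeper belongs to an earlier sibling: drop it.
    stack.splice(event.nesting);
    stack.push(event.name);
    this.stacks.set(key, stack);
    return [...stack];
  }
}

const tracker = new PathTracker();
tracker.onStart({ name: "queue worker", nesting: 0, file: "queue.test.js" });
const path = tracker.onStart({ name: "handles shutdown", nesting: 1, file: "queue.test.js" });
console.log(path.join(" > ")); // queue worker > handles shutdown
```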

Results

The results beat my expectations! Tests are much faster now than they were under mocha.

All of that speed is due to the node:test module, not me. It seems to be launching a thread on each core and running all the test files in parallel. Once the VMs warm up, it can fly through them.

So I’m releasing gauntlet. Zero dependencies – though you’ll need typescript, of course. I gave it an MIT license because it’s not really much code, and it relies entirely on nodejs libraries that use MIT(-ish) licenses themselves, so that seemed only fair.

I admit my requirements are pretty narrowly focused: typescript compilation, ESM, and no dependencies. But if that describes you too, check it out.

https://www.npmjs.com/package/@robey/gauntlet

My Watchy OS

2025-09-14T00:00:00-07:00

In the summer of 2021, I bought a Watchy: a hobbyist watch with a tiny (200x200 pixel, 4cm) e-ink screen, an ESP32 processor (plus separate real-time clock chip), 4 buttons, a step-counting accelerometer, and 200mAh battery. I was doing a lot of ESP32 coding at work, so the idea of writing my own watch software sounded like fun, and the ESP32 is pretty powerful: tons of memory, bluetooth/wifi support, and good low-power modes.

Over the course of the next year, I iterated on a unique watch design that I really liked, and had fun showing off to other nerds.

How does it work?

In normal operation, Watchy’s ESP32 is powered down. That’s right, the cores and all but 8KB of RAM are turned off in what they call “deep sleep” mode. The e-ink screen is also off, conveniently still showing the most recently-drawn clock face. Only the RTC (real-time clock) chip is powered, to track the current time. When that time rolls over to the next minute, the RTC triggers an interrupt that wakes up (actually: boots up) the ESP32. The four buttons are also connected to a different interrupt/wake pin so they can have a sub-minute reaction time. As it “boots up” once a minute, the sample Watchy code initializes the e-ink display, updates it for the current time, and then powers everything off again.

But the fun of owning a hobbyist watch like this is replacing the sample code with something personalized, so of course that’s what I did.

First, I drew a round clock face to show the current (12-hour) time, with no number markings, because it looked cooler that way. Then I put the digital (24-hour) time and current day in the bottom corners, and an icon of the battery level in the top corner. I’m a visual learner, so the large clock face gave me an instant impression of roughly what time it is, while the digital time in the corner let me know the exact minute if I was about to be late for a meeting.

I’d written code for an e-ink display before (the Inkplate), and that one could take about a dozen quick-refresh partial updates before the screen started to show the faded remnants of old content, like a half-erased etch-a-sketch. To avoid that, you needed to do a full refresh, which causes the screen to distractingly flash black and white a few times. Whatever e-ink display they used in the Watchy does not have that problem. It looked like I could update the clock face every minute for hours (maybe even days?) at a time without the full refresh. I ended up doing a full refresh for any user interaction via button press that was going to change the whole screen anyway, but I’m not sure it needed even that. Amazing!

The most obvious use for the built-in wifi is NTP, and I believe some of the example code from Watchy already demonstrated this. It’s easy because an NTP client is built into the ESP’s standard library. Wifi, however, is extremely power hungry for a tiny embedded device, so I set it to connect only at the top of each hour. The RTC chip is not very accurate, losing about 5 seconds a day, but an hourly wifi check was enough to keep it pretty accurate without draining the battery. Given how much time I spent away from my home wifi without reconfiguring the watch’s connection, it was probably good enough to be unnoticeable even if it only updated every day or two.
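
The arithmetic behind that claim is short enough to write down. This sketch uses only the 5-seconds-a-day figure from above and assumes the drift is linear.

```typescript
// Drift figure quoted above: roughly 5 seconds lost per day.
const DRIFT_SECONDS_PER_DAY = 5;

// Accumulated error if the clock was last corrected `hoursSinceSync` hours ago.
function driftSeconds(hoursSinceSync: number): number {
  return (hoursSinceSync / 24) * DRIFT_SECONDS_PER_DAY;
}

console.log(driftSeconds(1).toFixed(2));  // 0.21
console.log(driftSeconds(48).toFixed(2)); // 10.00
```

So an hourly sync keeps the error around a fifth of a second, and even two days away from wifi costs about ten seconds, which is hard to notice on a display that only updates once a minute.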

One improvement I made was to set the RTC’s time in UTC (as it comes from the NTP server), then use the ESP’s standard library to convert to the current time zone. It already has support for one of the standard time zone description formats, including daylight savings, so it was both easier and better than trying to do it by hand, and left open the possibility of switching between a set of predefined time zones (foreshadowing).

At that point, I thought: But this watch has wifi. Much more is possible!

Weather conditions

I used to track our house’s thermostat usage against the outside temperature, so I got skilled at bringing up a new API client as each weather service either died (Weather Underground), enshittified (Dark Sky), or both. I’m currently using OpenWeather and like it just fine. I drew some tiny icons to represent the current conditions, and displayed the “current” temperature and humidity from the last time it connected to wifi.

Wildfires and the day the sun didn’t rise were also top of mind, so I connected to the AirNow API too, and drew some tiny icons representing 5 different levels of air quality, from “great” to “gas mask time”.

Here are the icons for your amusement, and feel free to use them for your own projects if you like. If it’s possible to license a 24-pixel icon, I hereby release them under CC0:

What next? Why not sunrise and sunset? Why not the current moon phase, and moonrise and moonset? I often go for night walks, so this is not useless info! Turns out you can get a great estimate of all of these if you put the current time and your latitude/longitude into a set of cubic equations that first appeared as a Pascal program on a floppy disk included with a paper book about astronomy in 1994. I only mention that because almost every code sample you can find online is a direct descendant of that code – including mine. You can find this same code translated into QBasic, Javascript, and Python. It deserves its own “secretly influential code” blog post. (The equations are complicated because not only are all three orbits elliptical, but everything wobbles.)

I’m not a great artist (please act shocked), so I bought high-quality icons of the moon phases from the Noun Project and down-res’d the images to 48 pixels, which did them a lot of injustice, but still looked great.

While working on this feature, I took a vacation to Hawai'i, which bumped up the priority of supporting multiple time zones. Naturally I had to stay up all night on the lanai, hacking on my watch as the moon rose. I ended up making a table of locations popular with my friends and family, and letting you switch between these locations by name. Each location has a time zone descriptor and latitude/longitude, which is enough to localize the time, weather, and astronomical conditions. It’s easy to add new ones by looking up a city’s entry on wikipedia, which has all the relevant data in a handy table in a sidebar.
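
The shape of that table is simple. Here is a sketch of the idea in TypeScript, not the watch's actual code. The two entries are sample data with approximate coordinates, and the example swaps in IANA zone names with `Intl.DateTimeFormat` where the watch uses a time zone descriptor and the ESP's standard library. The principle is the same: keep the clock in UTC and convert only when displaying.

```typescript
// One row per place: a time zone plus coordinates.
interface Place {
  name: string;
  timeZone: string; // IANA zone name (an assumption made for this sketch)
  latitude: number;
  longitude: number;
}

// Sample entries; coordinates are approximate.
const PLACES: Place[] = [
  { name: "San Francisco", timeZone: "America/Los_Angeles", latitude: 37.77, longitude: -122.42 },
  { name: "Honolulu", timeZone: "Pacific/Honolulu", latitude: 21.31, longitude: -157.86 },
];

// Convert a UTC instant to a 24-hour wall-clock string for one place.
function localTime(utc: Date, place: Place): string {
  return new Intl.DateTimeFormat("en-GB", {
    timeZone: place.timeZone,
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23",
  }).format(utc);
}

const instant = new Date(Date.UTC(2025, 0, 15, 20, 0)); // 20:00 UTC, mid-January
for (const place of PLACES) console.log(place.name, localTime(instant, place));
// San Francisco 12:00
// Honolulu 10:00
```

The latitude and longitude are unused here, but they are what the weather and sunrise calculations would read from the same row.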

Now it gets excessive

All this weather and moon information really needed to be on a second screen to keep the main clock face from getting cluttered. There also needed to be a way to change the current location (and possibly other future settings) without recompiling and re-installing the watch each time. So, of course, I decided to write a custom UI toolkit, and assigned a purpose to each button: menu/select, back, up, and down.

Several KB of RAM are preserved when the ESP32 is in deep sleep, and that’s more than enough to store all the state you need to track user interactions: which “view” is currently displayed, a small stack of previous views for navigating “back”, and the state of each view (like which menu item is highlighted). It’s a small screen that can only display 8 lines of 3mm-height text, so any long text or menu also needs to support scrolling and remembering its scroll position. Aside from memory reserved for the system log and a dynamic demo, it all uses less than 1KB of state… and most of that is for caching the current weather & air quality across 10 locations.
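
A sketch of what that preserved state might look like, written in TypeScript for readability (the watch itself is not written in TypeScript). The view names, field names, and the stack limit are all invented for the example.

```typescript
// Everything the UI needs to remember between wake-ups (names are illustrative).
type ViewId = "clock" | "weather" | "menu" | "text";

interface ViewState {
  view: ViewId;
  highlighted: number; // selected menu item, if the view has one
  scroll: number;      // first visible line, for long text or menus
}

interface UiState {
  current: ViewState;
  back: ViewState[]; // small stack of previous views
}

const MAX_BACK = 4; // keep the stack small so the state has a fixed upper size

// Switch to a new view, remembering the current one for the "back" button.
function openView(state: UiState, view: ViewId): UiState {
  const back = [...state.back, state.current].slice(-MAX_BACK);
  return { current: { view, highlighted: 0, scroll: 0 }, back };
}

// Return to the previous view, restoring its highlight and scroll position.
function goBack(state: UiState): UiState {
  const previous = state.back[state.back.length - 1];
  if (previous === undefined) return state; // already at the root view
  return { current: previous, back: state.back.slice(0, -1) };
}

let ui: UiState = { current: { view: "clock", highlighted: 0, scroll: 0 }, back: [] };
ui = openView(ui, "menu");
ui.current.highlighted = 2;
ui = openView(ui, "text");
ui = goBack(ui);
console.log(ui.current.view, ui.current.highlighted); // menu 2
```

Because every field is a small number or an enum, a structure like this fits in a few dozen bytes, which is why it can live in the sliver of memory that survives deep sleep.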

I was so pleased when I got word-wrap and scroll working – with a scroll bar and long-press to scroll by a page! – that I recorded a small video demonstrating it on a famous PKD quote. Then I built a small menu widget for changing settings on the fly (like location). Preferences are stored in NVS, a tiny wear-leveled key-value flash store that’s part of the ESP standard library. EZPZ.
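
Word-wrap and paging are a classic pair of small algorithms. Here is a generic greedy version in TypeScript, as an illustration and not the watch's own code. The `measure` callback is an assumption to account for a proportional font, where a line's width is not just its character count.

```typescript
// Greedy word-wrap: pack words into a line until the next word would not fit.
// `measure` returns the pixel width of a string in the current font.
function wrapText(text: string, maxWidth: number, measure: (s: string) => number): string[] {
  const lines: string[] = [];
  let line = "";
  for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
    const candidate = line === "" ? word : `${line} ${word}`;
    if (line !== "" && measure(candidate) > maxWidth) {
      lines.push(line);
      line = word; // an overlong single word still gets a line of its own
    } else {
      line = candidate;
    }
  }
  if (line !== "") lines.push(line);
  return lines;
}

const LINES_PER_SCREEN = 8; // the screen fits 8 lines of text

// The lines visible at a given scroll position.
function visibleLines(lines: string[], scroll: number): string[] {
  return lines.slice(scroll, scroll + LINES_PER_SCREEN);
}

// Long-press: advance one page, but never scroll past the last full screen.
function pageDown(lines: string[], scroll: number): number {
  const last = Math.max(0, lines.length - LINES_PER_SCREEN);
  return Math.min(scroll + LINES_PER_SCREEN, last);
}

// Pretend every character is 10 pixels wide and the screen is 100 pixels across.
console.log(wrapText("the quick brown fox", 100, (s) => s.length * 10));
// [ 'the quick', 'brown fox' ]
```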

You may have looked at the images so far and thought, “What bitmap font is that? I haven’t seen one that odd before.” That’s because I took my Bizcat font and adapted it to a larger size (24-pixel) proportional style to fit the watch.

Nothing lasts forever

For a while, I had a lot of fun writing the code and adding features. Then it felt sort of “done” and it was my working watch for two years, which may be the highest praise to give a thing like a wristwatch: after a while, I took it for granted. Every once in a while, someone would notice it and I could show it off, but most of the time, I just … used it.

I tried to connect with other Watchy fans to share code & ideas and get inspired, but the only active forum seemed to be a Discord guild. As with all discords, information tended to scroll gently away into oblivion, with no way to pull out or organize the good stuff. A message board and/or a group wiki for tracking projects and toolkits would have been invaluable, but the community never got enough traction to make the leap, and the creator wasn’t really engaged with it by then.

If you feel like iterating on the Watchy design, the schematics are online. There were two common suggestions from users for a 2nd rev:

Add some minimal amount of water resistance to the case. Some people reported that a single drop of water could kill the device. This makes it hard to wear if you live in a place where it rains.
Add a little LED light like old Casios had. I don’t think it would have hurt the battery life much since you usually don’t hold the light on for very long. Without this, the watch was unreadable outside at night.

I found network connections over wifi to be extremely flaky, though it seemed to be a software problem, not hardware. At the top of each hour, if I was home, my Watchy would connect to the house wifi and update NTP and weather data. Weather involved making a separate HTTP request to OpenWeather and AirNow for each of the 4 or 5 cities I was tracking, but I rarely saw it complete all of the cities before it would start getting network errors as if the wifi had dropped out. (The wifi was fine.) Sometimes it would only get one city, sometimes almost all of them. I tried adding delays and fussing around to triple-ensure that all resources were closed after each HTTP request, but nothing worked. It might have been a bug in the ESP network stack… or maybe there was one last elusive resource I didn’t close correctly. I never figured it out.

As you can see, mine got a lot of use despite those shortcomings.

My first watch died after about a year for mysterious reasons. I bought a replacement and felt newly inspired to hook up the step counter and track a small history of daily steps in flash. I factored out common code into a “Watchy OS” library that others could reuse, but never felt motivated to actually post it as a separate repo. Later I found out this “shared base OS” idea had percolated through the community a few times before without much success.

My second watch died within about a year for similarly mysterious reasons. This time, it was still working, but could no longer be charged. Once it drained its battery completely, it was an ex-watch, permanently showing the last time it had enough juice to update the screen.

I decided not to buy a third, because it was kind of expensive for a watch that lasts one year, and it was clear the product had quietly reached end-of-life. I still miss it.

The code is here, with an Apache-2 license, if you want to check it out or borrow any of it: code.lag.net/robey/nuwatchy

Setting up a CI server for Forgejo

2025-08-10T00:00:00-07:00

I recently got CI working with my personal forgejo instance, and thought I would share how I did it. It’s not that hard, but has a few confusing quirks, and all the tutorials I found online were wrong, outdated, or both. They also tend to include a few pages of steps you don’t really need for a small instance.

Background: Forgejo is a “software forge” – a way to host git & jujutsu code repos for you, your friends, or your team. It’s part of the “small communities” decentralization movement, and is a good replacement for things like Github or Launchpad. CI means “continuous integration”, which in practice just means a “build server” like Jenkins, TeamCity, or Tinderbox. It’s totally possible to host your own code and CI on 1 or 2 servers, virtual or home lab.

How it works: The build server runs some kind of Linux container manager. I chose podman, a command-line tool that’s compatible with Docker, but much smaller & simpler, and doesn’t have to run as root. A separate service called forgejo-runner authenticates with your forgejo server, receives build requests, runs them in a container, and sends back logs and artifacts.

Forgejo recommends putting the build server on a different machine than the forge, because both pieces of software can run hot, and the build server will be literally running new Linux containers to do builds. For security, you probably want to treat the build machine as disposable: don’t give it access to anything besides your forge, and rebuild it from scratch if it behaves suspiciously. (I haven’t had any trouble here. This is just general good advice.)

I chose Alpine as the Linux install for the build machine. It’s a good thin distribution for servers, where you don’t need any bells and whistles or a full desktop. The first time I tried it, I was pleasantly surprised that the display of top only showed my servers, not pages and pages of systemd and random “support” daemons. It also has packages for podman and forgejo-runner, which greatly simplifies the CI install.

Install Alpine

If you already have an Alpine 3.22 VM ready, you can skip this part. Otherwise, install a fresh Alpine image, and do the bare minimum:

# setup-alpine

Make sure to create a user account for yourself and add your SSH key.
Add your user account to /etc/sudoers under root.
Make sure you can login as your new user, from a new ssh connection, and sudo echo hi to make sure you can sudo.
Turn off ssh password auth. I don’t understand why this defaults on for virtual machine images.
- Edit /etc/ssh/sshd_config, find the PasswordAuthentication line, and change the value to no.

If you’re picky like me, now is also your chance to install some of your favorite tools:

# apk add less lsof

Now upgrade it to at least 3.22, if it’s not already. That’s the minimum version to get forgejo-runner.

Edit /etc/apk/repositories
- Change the version number to 3.22 on every line.

# apk update
# apk add --upgrade apk-tools
# apk upgrade --available

This should be pretty fast. Then reboot into the new, fresh-smelling 3.22:

# reboot

From now on, you’ll be logging into this machine as yourself (the username you added above), not root, and using “sudo” to do things that require root permission.

Install podman & runner

Podman and forgejo-runner can be installed at the same time, but they need extra dependencies. Some are obvious (git), others are needed to create virtual file systems and network interfaces.

# apk add nodejs npm rsync git iptables podman forgejo-runner slirp4netns fuse
# rc-update add cgroups
# rc-service cgroups start
# modprobe tun
# echo tun >>/etc/modules
# echo forgejo-runner:100000:9999 >>/etc/subuid
# echo forgejo-runner:100000:9999 >>/etc/subgid

If you aren’t already using tmpfs, you need to set that up too. Podman has a panic attack if it sees old files in /tmp after a reboot:

# echo "tmpfs /tmp tmpfs nodev,nosuid,size=1G 0 0" >> /etc/fstab
# mount /tmp

At this point, you should be able to run a test image with podman – as your normal user account! – to prove it’s installed correctly.

$ podman run --rm hello-world

We need to do a little bootstrapping to make sure the environment is set up correctly before starting the daemons. Runner creates its own user (forgejo-runner) and we want podman to run as the same user. Runner will create this user’s home folder the first time it starts, but it expects podman to be running already, and podman will crash if it can’t reach the same home folder. Easy to untangle, though.

# forgejo-runner generate-config > /etc/forgejo-runner/config.yaml
# mkdir /var/lib/forgejo-runner
# chown forgejo-runner:forgejo-runner /var/lib/forgejo-runner

Runner and podman communicate through a unix socket. Podman’s default location for this socket is undiscoverable, so tell them to look in /tmp.

Edit /etc/conf.d/podman and change these lines:
- podman_uri="unix:///tmp/podman.sock"
- podman_user="forgejo-runner"
Edit /etc/forgejo-runner/config.yaml and change this line inside the container section at the end:
- docker_host: "unix:///tmp/podman.sock"

Also, the startup script for forgejo-runner has a bug. It can’t start at boot time until podman is running, but the script forgot to declare that dependency.

Edit /etc/init.d/forgejo-runner and add this line inside depend():
- need podman

That makes the depend() function look like:

depend() {
    need net
    need podman
    use dns logger
}

Now reboot to make sure all the new config is loaded (especially the changes to /etc/subuid and /etc/subgid).

Start the podman & runner services

# rc-update add podman
# rc-service podman start
# rc-service podman status

As your normal user, double-check that the runner can talk to podman, and register it with your forgejo server:

$ doas -u forgejo-runner podman run --rm alpine echo "It works!"
$ cd /var/lib/forgejo-runner && doas -u forgejo-runner forgejo-runner register

It will ask for your forgejo server’s url, and then a token. There are a few kinds of tokens, but the simplest kind is the “site” token which gives access to every project on your server. That’s in the upper right corner under “Site administration” -> “Actions” -> “Runners” -> “Create new runner”.

After that, you should see the config in /var/lib/forgejo-runner/.runner as JSON. You can change the label here if you like, but I figured “docker” is already a pretty good description of what’s going on.

Now you can start up the runner:

# rc-update add forgejo-runner
# rc-service forgejo-runner start
# rc-service forgejo-runner status

Back on your forgejo server, reloading the “Runners” admin page should show your new runner as idle. You did it!

As a last step, you may need to “activate” actions for your project(s). That’s a separate checkbox in the project’s “Settings” page, under “Units” -> “Overview”.

How do I use this?

Forgejo runners seem to be based on a Github feature called “Github Actions”. They’re so heavily based on it that they use the same build file structure, use Github’s name in many of the field names and environment variables, and require a lot of ceremony that’s overkill for a team build server.

Describe a build by committing a special file into your repo in a .forgejo/workflows/ folder. This file must be YAML. Inside, you declare which runner to use (usually “docker” unless you changed it), which docker base image to start from, and a set of steps/commands. Optionally, you can set which branches trigger the build when pushed. Here’s a simple example:

# .forgejo/workflows/build.yaml

---
name: build
"on":
  push: "*"
jobs:
  build:
    runs-on: docker
    container:
      image: "rust:alpine"
    steps:
      - run: apk add nodejs npm git
      - uses: actions/checkout@v4
      - run: "./ci.sh"

This calls the build “build”, starts it on a push to any branch, starts with the rust:alpine base docker image, and runs the shell script ci.sh to do the actual build.

Two tricky bits to catch here: You have to add the uses: line to ask explicitly for your code to be cloned into the image – without that, you get an empty folder. And before that, you need to install nodejs & git in the image, because the “plugin” that handles that is written in typescript, and will fall back to downloading your code over an API if it can’t find git installed. This looks like a bunch of cruft “borrowed” from Github, which hasn’t bothered anyone enough to get fixed yet.

As soon as you push this file to your forgejo repo, you can go watch your build in the “Actions” tab. Congrats!

Bizcat - an 8x16 bitmap font

2020-02-09T00:00:00-08:00

Bizcat is a retro 8x16 bitmap font, intended for hobby computers or terminals that use a 640x480 graphics mode, like the Linux VGA console. It supports most of Latin-9 and box-drawing glyphs. It’s available to use under the Creative Commons “share and adapt” license, so feel free to tweak it or use it in your own projects.

An image file in BMP format and a Linux console PSF file are available in the fonts/ folder of my font-problems toolset.

Why?

Retro-computing has received a new wave of interest over the past few years. It may be a reaction to the boggling complexity and fragility of modern systems: x86 boxes apparently require a separate CPU and OS just to bootstrap the real CPU, and once it’s running, it uses tricks so baroque that it took over a decade to figure out they were subtly flawed. It may also be due to the explosion of the “maker” community and the recent availability of simple hardware and components: you can build your own custom CPU from logic chips or assemble a motherboard for a new 8-bit platform.

Over the winter, I played around with the Commander X16 emulator, part of a project by David Murray (The 8-bit Guy) and his friends to create a new hackable 8-bit home computer in the style of the Commodore 64. The video output is 640x480 VGA, and by default it uses a blocky 8x8 font, maintaining the spirit of Commodore graphics while scaling up to allow a console of 80x60 characters. Unfortunately, while square letters are authentic, they clearly represent a limitation of the old graphics mode (which was only 320x200 or 40x25), not an aesthetic choice.

It's hip to be square.

This made me think about how the old VGA PCs drew characters. With a wide-open 640x480 space, you can fit a standard 80x25 terminal by using a character cell size of 8x16, so that’s what they did. A 2:1 ratio sounds really tall for that era, but they used the extra vertical space to give more breathing room to accents and to pad out the line spacing. The result (check out this awesome collection of IBM fonts) looks really pretentious to me because of all the serifs. I think IBM chose a typewriter style to differentiate their “strictly business” PCs from the “fun time” look of their competitors. But now it’s fun time again.

I was immediately bitten by the bug. Time to draw a new bitmap font!

Since I was committed to spending my free time on this, I decided to make a few other philosophical changes. In the 1980s, you needed a 100-kilo CRT to view text. These days, you’re more likely to be looking at VGA graphics on a 7" flatscreen made from old tablet parts. So I decided to use thick lines that can be seen at high DPI. If all lines are two pixels wide, then 1-pixel offsets act as a kind of anti-aliasing to smooth out curves. And if curves look nice, let’s lean into them and make circles look as round as possible. Some fonts push circles in the direction of rounded rectangles, which gives a nice rectangular look overall, but to me, geometric looks more “fun”, and circles are very retro-80s.

Left: International Business Machines. Right: Business cat.

6502 assembly on the X16

I tested and iterated on the font by trying it out on the Commander X16 emulator. The VERA video “chip” has two layers, and each can be assigned a different bitmap font, so I sliced the 8x16 font into two half-fonts, with each character’s top half on one layer and the bottom half on the other. Once I was satisfied, I realized I could also use this font on my physical Linux server console, after cleaning the spider webs off of it.
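The slicing step is simple enough to sketch. This is a rough illustration in Python, not the tool I actually used, and it assumes a layout I haven't described above: each glyph stored as 16 bytes, one byte per pixel row, with glyphs packed back to back. The names `split_glyph` and `split_font` are made up for the example.

```python
def split_glyph(rows):
    """Split one 8x16 glyph (16 row bytes) into top and bottom 8x8 halves."""
    if len(rows) != 16:
        raise ValueError("expected 16 rows for an 8x16 glyph")
    return bytes(rows[:8]), bytes(rows[8:])


def split_font(font):
    """Split a whole font (glyphs packed back to back) into two half-height fonts."""
    top, bottom = bytearray(), bytearray()
    for offset in range(0, len(font), 16):
        upper, lower = split_glyph(font[offset:offset + 16])
        top += upper
        bottom += lower
    return bytes(top), bytes(bottom)


# A made-up glyph for illustration: a 2-pixel-wide vertical bar, 16 rows tall.
bar = bytes([0b00011000] * 16)
top_font, bottom_font = split_font(bar * 2)  # a tiny two-glyph "font"
```

Each half-font can then be loaded as an ordinary 8x8 font, one per layer, with the same character codes written to both layers.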

man epoll

midnight commander

So jaunty! I think I’ll keep it.

If you make any significant changes, or if you use this font in any interesting places, I’d love to see them!

glyphs so far

The Year of the Linux Laptop - Darter Pro Review

2019-04-21T00:00:00-07:00

Background

I started using Linux in the early 1990s, first as a toy, and gradually as my primary/only OS. Aside from the cute Solitaire game, Windows 3 didn’t really appeal to me, while Linux had native support for the internet and a development environment that would let you write networked programs that could do several things at once through multitasking – a world of possibilities, which I promptly exploited.

2000: Year of the 640x480 laptop

In 2000, my employer loaned me a Linux Dell laptop. I think it was called “Geographer” or “Latitude”. It was a mess. Despite the internet bursting into popularity five years prior, you needed a separate dongle with a modem or ethernet adapter that connected directly to the main bus through a slot on the side. Once plugged in, you had to modprobe the right driver to bring the network up. If the laptop ever went to sleep, the driver would die and you had to reboot.

I say “sleep” but it actually had three different ways to pause, with silly names like “cat-nap” and “hibernate”, each triggered by a different F-key. If you used any of them, the laptop had a 50% chance of hard-crashing, either right away, or when you tried to wake it up later. The battery drained so quickly that you couldn’t count on it lasting through a meeting.

That same year, I bought an iBook. It was my first Mac. It was very bad for coding, but it was pretty. It could play music without glitching, because there was no preemptive multi-tasking. It could connect to the network through a built-in ethernet port or wifi, without dongles. If you closed it, it paused, and when you opened it, it resumed and reconnected to the network without crashing or rebooting. It was a computer re-imagined as a tool. I fell in love.

Over the next few years, I started using Mac laptops for all development work, personal or paid, and used Linux exclusively as a server platform. By the late zeros, I wasn’t considered “quirky” for this anymore: almost all developers had moved their daily work to Mac laptops. Windows and Linux had quietly given up the (admittedly very small) tech-worker share of the market.

But for the past decade, there’s been a rising anxiety about this situation in the tech world. Apple probably didn’t intend to become the premier (or only) manufacturer of development machines, and they’ve become increasingly unsubtle about their intention to focus entirely on the highly lucrative fashion-accessory market instead. The rapid decline in laptop quality has made this palpable.

Surely in the past 20 years, the PC world has caught up enough to become a viable alternative, right? I decided it was time to give it another try.

Investigation

I’m not an Apple fanboy, so I haven’t been oblivious to the outside world. Several friends are die-hard fans of Linux and/or freedom, and for years they’ve been swearing that the water is fine, and I should dive in.

My table stakes, though, are that a candidate laptop has to pass two tests:

It can be paused by closing the lid. When the lid is opened and the screen is unlocked, it resumes without crashing.
It can remain wifi-capable for at least a week without requiring a reboot.

Apparently these requirements are still enough to disqualify most laptops, but one that gets a lot of love is the Lenovo (previously IBM) Thinkpad. More than one person has proclaimed to me over a beer that the Thinkpad is the best Linux laptop, full stop. So over the winter holidays, I read several reviews of the current model, the “Thinkpad X1”.

These reviews are nearly universally terrible.

As far as I can tell, Lenovo doesn’t actually consider this to be a Linux laptop. It’s not mentioned in their marketing materials, and they don’t offer a version with Linux installed. Instead, you’re responsible for installing a Linux distro from scratch, then hunting down and installing various custom drivers if you would like to use features like the trackpad.

I don’t think this model can work. I know Linux fans really want to believe it can, but I think laptops just contain too much custom hardware. To be a “Linux laptop”, I think the manufacturer must provide and install the right drivers at the factory. It can’t be the customer’s job to go scavenging on sketchy websites for drivers.

If you want a good experience, the laptop needs to be sold to you as a “Linux laptop”, with the explicit promise that it has an OS with drivers that have been tested and pre-installed. Surprisingly few laptop makers are doing this (yet?).

Darter Pro Hardware

I heard about System76 from an enthusiastic co-worker, and had been eyeing one of their laptops as a contender, but I was indecisive, and a few months later it had been discontinued. Then the Darter Pro was posted in February, the specs looked reasonable, and the initial reviews were positive, so I went for it.

The requisite unboxing photos

As you can see, it’s a cool unboxing experience. There are a bunch of stickers and a welcome card inside; it looks like the expected audience is a hardware “maker” who is trying out Linux for the first time. It’s lighter than a macbook, which made it feel a bit flimsy at first, but it’s really sturdy and I’ve grown to like the feel of it.

The laptop is ready to use when you open it, and does some Apple-style setup screens before dumping you into a cute-looking variant of Gnome. It configured my wifi during setup, and also connected my Google and Nextcloud accounts so that my calendar and file-sharing started syncing immediately. It gives a great first impression.

The keyboard… is weird. It looks like it was designed by committee, or a machine learning system gone haywire. There’s a number keypad on the right side, which pushes the main keyboard to the left edge. There are full-sized arrow keys (yay) that are hard to find by touch (boo). Keys with no purpose, like “Scroll Lock”, “System Request”, and “Number Lock”, are resurrected from the distant past, seemingly just to fill in empty space. There are four separate function keys for controlling the brightness and hue of the keyboard backlight. 😂 That said, once you get used to holding your hands on the left side of the laptop, the keys are pleasantly tactile and feel nice to type on. I’ve had to unlearn some Mac shortcuts for delete, page up, and page down, but those keys are relatively easy to find by touch after some practice.

Weird but functional keyboard layout

The screen, sadly, is not retina (also known as 4K or Hi-DPI), and it is noticeable. I don’t mind most of the time, and my friends tell me I wouldn’t want a retina display yet anyway, because X11/Gnome doesn’t support it well yet. But it means that small text in Firefox looks pretty retro and frankly bad.

I was really nervous about the trackpad, given that every PC laptop review complains about how bad they are. But it works great! Moving the pointer, two-finger scrolling, and even double-tap-drag work without editing a single config file. It works so simply and effortlessly that I’m sure it was a huge project, so: hats off to whoever hacked on trackpad support! It’s amazing!

There are two wobbly physical mouse buttons under the trackpad. They’re hard to find by touch so I rarely use them.

This laptop would like you to know it has a fan. At random times, even if nothing is happening, the fan will come on full blast for about 5 seconds and then shut off. It’s alarming but hopefully harmless? The fan seems to have no intermediate setting: it’s either off, or it’s a jet engine. Once, the fan stayed on for 8 hours and I worried that it was broken, but it turned out that I had accidentally hit Fn-1, which is a secret command to turn the fan on max power. I’m not sure why that seemed like an important feature to have, and an important feature to hide.

Relatedly, sometimes the laptop will turn off unexpectedly. It’s happened 3 or 4 times in the past two months, and doesn’t seem to be related to load, so it’s probably a firmware bug. Save early and often. On the other hand, this laptop cold-boots faster than anything, so I’m not offline for long.

The battery lasts forever. Several hours, sometimes most of a day. It has so much capacity that it takes the power brick a few hours to fully recharge it.

This is a very powerful machine. I haven’t been able to fill up its 32GB of RAM, even running several electron apps and two browsers. The fan may come on intermittently, but Minecraft is the only app that’s been able to keep it on continuously (a feat it accomplishes with equal ease on my macbook). It never approaches the macbook’s heat level either.

Software

Table stakes: It does indeed pause when you close the lid, and resume with an unlock screen when you open it again. Wifi comes back online and it doesn’t crash! (I wish this surprised me less than it does.) Usually it wakes up immediately, but occasionally it takes about 30 seconds. The syslogs make it look like sometimes the kernel doesn’t finish putting everything away before the CPU turns off, so when it wakes up again, it spends a few seconds continuing to pause before realizing that it’s supposed to be waking up now.

The OS is labeled as “Pop! OS”, which seems to be Ubuntu with a new desktop GUI, probably in reaction to Ubuntu’s floundering around Unity. My recent Linux desktop experience has been limited to xfce in virtualbox, and this is at least as good. Hitting the command key (also known as the “super” key) brings up a view of every open desktop and app, a launcher, a package installer, and a file manager which seems to be a distant descendant of Nautilus, to my embarrassment. Nothing is ugly and everything is snappy.

Apparently, while I’ve been away, Linux has migrated its command key from “Alt” to “Control”. This is a terrible idea, since many control-keys already have a defined meaning, but I can’t find any global setting to fix it back. So, for example, many text fields will interpret ^A as “select all” instead of “start of line”, and erase everything you typed. It’s funny the first few dozen times.

This includes all terminal apps, which need to send control keys to the terminal so they can be used for their normal purposes. Most try to rebind the command keys to new places, so “copy” becomes control-shift-C or alt-C, but all this does is add to the confusion. The punch line is that few apps have any use for the “Alt” or “Command” (“Super”) keys, so “Control” does the duty of every possible meta-command, and the rest are wasted. I ended up binding “Command” to various window manager tasks, and used a tool called AutoKey to bind command-C and command-V to send either control-C or control-shift-C depending on which app is active. It’s a total hack, it feels dirty, and it makes Linux look bad.

Why is this necessary?

I’m impressed with how many apps have a Linux port that works great without effort. Success stories include:

Firefox (and Chrome)
Slack, Discord, Telegram, Skype
Spotify
VS Code (and Atom)
VLC
Minecraft

A few that are missing:

Sonos
an image editor similar to Acorn (though Krita appears to be a cool paint app)
a git visualizer similar to Tower or Fork
a tiling terminal app similar to iTerm

The last one was very surprising. I assumed desktop Linux was primarily used by people who spend their day in terminals. Not so. Gnome-terminal has never progressed beyond the level of “demo app”, and most of the others I tried (Alacritty, Guake, Terminology) exist to show off a whiz-bang graphics trick, not to be a power tool. I settled on Terminus, which is very configurable and supports multi-pane windows.

There are also the usual glitches: once, across a reboot, the entire settings panel (which is an app) just… vanished. I had to research what it’s called and reinstall it from apt. It was more funny than annoying.

So far, Pop! OS has never displayed a popup nagging me to change my web browser, or insisted that it needs to reboot to install an update to a video codec, so it’s an improvement over OSX from the start. It’s also, bewilderingly, able to switch apps much faster than OSX. I have my main apps each bound to a different “super” keystroke (using wmctrl to change the active window), and when I hit one of these key combos, there’s a very short animation, and then the new app is immediately responsive. On the macbook, apps often respond sluggishly for a second or so after activation.

Summary

This review probably sounds overly critical, but I wanted to be honest about the warts and splinters, and prepare anyone else who’s thinking of taking the plunge back into the Linux world. For every flaw, there were two or three things that worked better than expected, or wowed me.

I’ve had the laptop for about two months now. During the first two weeks, I switched often between the Darter and my macbook, depending on what I was doing. Then, one morning, I noticed that the macbook was sitting on a desk collecting dust, and hadn’t been opened in days. I now use this Linux laptop almost exclusively – for everything but image editing, and I’m still holding out hope on that one.

It’s a weird little laptop, and I love it.

Interaction

2012-12-13T00:00:00-08:00

I’ve been reading more and more blog posts of the “please don’t interrupt me” variety, from yesterday’s post by Zach of Github about communicating over text back to Paul Graham’s essay on meetings to the under-appreciated Peopleware (from 1987!).

Just so you don’t get the wrong idea, I agree with all of these sentiments. In fact, usually I think they’re too narrow-focused – it may surprise you to learn that coders are not the only people who have to think on the job, and need uninterrupted time.

Interruptions bad

Every place I’ve worked, interruptions have been an issue.

Sometimes, it’s just a difference in perceived priority. When a co-worker taps your shoulder, breaking your concentration, it may not be because they have a different way of scheduling time. I usually find that it’s because when you have a problem that you need help with, it becomes the highest priority “work item” you have. You can’t do anything but twiddle your thumbs and be unhappy until you get help. The person you’re interrupting is busy working, though, so answering questions is a much lower priority, and your interruption seems rude. The best solution I’ve seen for this is to have a group chat or IRC channel where anyone can ask a question, and the audience is large enough that someone will be around who can help.

Sometimes, the work environment is bankrupt. As offices with doors gave way to cheaper “Dilbert cubes”, which gave way to even cheaper sweatshop tables, background noise levels have crept up. Some workspaces now have the feel of never-ending house parties. You can tell this is becoming a problem when a large percentage of people are wearing headphones at any given moment, and conference rooms are confiscated for coding.

Peopleware categorizes work communication as either interrupt-driven or not. Interrupt-driven communication includes face-to-face meetings and telephone calls, which break your concentration. (This book is from the 1980s when it was common for each desk to have an office phone.) In fact, they provide evidence that each interruption of this type takes at least 15 minutes to recover from, afterwards, before you can get back “in the zone”. Email and chat are not interrupt-driven, and you can check them between tasks, without losing efficiency.

There is no question that whenever possible, you should favor non-interrupt-driven communication channels. But all the other blogs have been saying the same thing, using more flowery words. I want to talk about the other side of this coin:

You cannot have a job where you don’t interact, often, face to face (or its video equivalent) with your co-workers.

Socializing good

Humans are social creatures. Even engineers! Even (heaven forfend) neckbearded hackers from the C++ caverns! Think of the grumpiest, most hermit-like person you know. I bet they still do something social, just not in a format you’re used to. They may play MMORPG tournaments all night long, or meet up for chess clubs or Yu-Gi-Oh tournaments.

Some anthropologists argue that our social behaviors spawned the creation of language. Language gave us the ability to pass things we’ve learned firsthand on to other people, creating our entire culture. Big stuff.

Because we are social creatures, we need to interact with the people we’re working with. We need to create bonds. We need to create a shared sense that we’re in the same “tribe” together, working toward the same goals. We need to build a rapport and trust, so that if I do something wrong, or something that you aren’t fully informed about, you’ll give me the benefit of doubt, and stay focused on your part of the overall project. This is part of the magic sauce that makes a successful team.

Why does this matter? As an example, I might notice an odd-looking bit of code one night while I’m working on an unrelated feature. In fact, I might think, “This is a bug.” If I don’t know the author very well, my reaction might be anything from “I should file a bug” to “Ugh, another piece of buggy crap, whatever”. If I’m comfortable with the author, though, I might shoot off an email instantly: “What on earth is this?” We might avoid downtime because I’m not worried about offending my team-mate. Or my confidence level might be raised because they’re not worried about shooting me down. (“That is seriously how the library works, Robey. Check out my snarky comment 3 lines up.”)

These social bonds are really hard to form over email.

How to socialize

My gut tells me that this socialization is why some bad teams have weekly “staff meetings” (now, thankfully, out of fashion). The manager understands, at some level, that everyone needs to be in the same room periodically, but doesn’t know how to do that without scheduling a meeting where everyone reads a spreadsheet out loud from a projector. Standups are not good for this, either, since the purpose is to share information quickly, not to chit-chat.

If you understand that the socializing is important, there are better ways to work it in.

I had a co-worker once who went around to our whole team at lunchtime every day and demanded that we go out to eat, outside the office. He did not want to hear that you would maybe get something from the cafeteria, or that you might just bring something back to your desk. We were going to go out, as a team, and eat together. And even though we tried to avoid talking about work, at least one day out of five, I would find out something interesting and relevant about what someone else was working on.

The “coffee train” phenomenon works well, too. In the doldrums of the afternoon, everyone that’s not in crunch time gathers and goes out to get coffee. This is sometimes mystifying to managers, because there’s coffee available in the office – even a fancy espresso machine! But the purpose is (1) to leave the office, and (2) to talk to each other for a while.

At one company, we used our “team building” budget to buy a kegerator (a mini-fridge that’s been retrofitted to serve beer from a keg). Once a week, late in the evening, we rolled it out next to a bunch of couches and had a social. It got so popular that it was hard to get near the keg, but you could meet people from all over, and learn a lot. Several times, an utter stranger would walk up to me, beer in hand, and say something like, “I’m from team X, and I didn’t want to bother you about this, but I was wondering…” and suddenly I’d find out something really important to the success of my project.

On the other end of the scale, some of my saddest experiences have been with teams that don’t communicate with each other. They silo themselves off, sometimes because they don’t think anyone else has anything valuable to offer, or sometimes because they just need to be “heads down” to meet an impossible ship date. Months later, they can’t ship anyway, because they missed an important prerequisite or spent too much time duplicating a bunch of existing work. They were too busy to talk to anyone, and instead wasted a chunk of their life that they’ll never get back.

So, while I love the new awareness about interruptions and alternate means of communication, I strongly encourage you: Find some way to interact with your coworkers, socially, between code sprints. It may seem like you’re just “wasting time” chatting at the water cooler, but you’re really making an investment that can make you more productive, and your team more successful.

How to add numbers (part 2)

2012-11-14T00:00:00-08:00

Last time, I explained how adders work in CPUs, and one nice trick for speeding them up. Be sure to read part 1 before diving into this!

Generation and propagation

In 1958, some sharp fellows named Weinberger & Smith hit the carry ripple problem from a different angle. Even if you don’t know what a column’s carry-in will be yet, you can make some assumptions about what will happen:

A	B	C_out
0	0	0
0	1	C_in
1	0	C_in
1	1	1

If both inputs are 0, the carry will definitely be 0, so the carry is “killed”. If both are 1, the carry will definitely be 1, so a carry is “generated”. Both of these cases are the same whether the carry-in is 0 or 1. But if only one of the inputs is 1, then we’ll only have a carry-out if we had a carry-in, so a carry is “propagated”.

We can use “G” to mean a 1-bit adder would generate a carry by itself, and “P” to mean it will propagate its incoming carry.

G = A⋅B
P = A⊕B

So, for any column, the carry-out will be 1 if either “G” is 1 (it generates a carry), or “P” is 1 (it propagates a carry) and the carry-in is 1.

C_out = G + P⋅C_in

For the lowest bit, if we substitute G and P into the above equation, we get:

C_out = A⋅B + (A⊕B)⋅C_in

which is equivalent to our original carry-out equation:

C_out = A⋅B + A⋅C_in + B⋅C_in
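If you'd rather check that equivalence than take it on faith, there are only eight input combinations, so brute force works. Here's a small Python sketch (the function names are mine, just for illustration) that evaluates both forms of the carry-out for every input. The only place the two could differ is when A and B are both 1, where A⊕B is 0, but in that case A⋅B has already forced the carry to 1.

```python
from itertools import product


def carry_gp(a, b, c_in):
    """Carry-out written as G + P.C_in."""
    g = a & b  # G = A.B
    p = a ^ b  # P = A xor B
    return g | (p & c_in)


def carry_original(a, b, c_in):
    """Carry-out written as A.B + A.C_in + B.C_in."""
    return (a & b) | (a & c_in) | (b & c_in)


# Collect every input combination where the two forms disagree.
mismatches = [
    bits for bits in product((0, 1), repeat=3)
    if carry_gp(*bits) != carry_original(*bits)
]
```

The `mismatches` list comes out empty.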

The fun comes when you consider the second bit. It will have a carry-out if it generates one, or it propagates one and the lowest bit generated one, or it propagates one and the lowest bit propagates one and the carry-in was 1.

C₀ = G₀ + P₀⋅C_in
C₁ = G₁ + P₁⋅G₀ + P₁⋅P₀⋅C_in
C₂ = G₂ + P₂⋅G₁ + P₂⋅P₁⋅G₀ + P₂⋅P₁⋅P₀⋅C_in
...

Parallel (in small doses)

This series can go on indefinitely. If we compute a G and P for each column, then we can compute the carry bit for a column N by making an OR gate with N + 2 inputs, each of which is an AND of a G (or the carry-in) and a string of Ps, with the last AND gate having N + 2 inputs. We could compute each carry bit in 3 gate delays, but to add 64 bits, it would require a pile of mythical 65-input AND and OR gates, and a lot of silicon.
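To make the pattern concrete, here's a sketch in Python that builds each carry from the expanded series, one AND term at a time, and a plain ripple version to compare against. It's an illustration of the equations, not a model of real gates; the function names and the lowest-bit-first list layout are choices I made for the example.

```python
def lookahead_carries(a_bits, b_bits, c_in):
    """Carry-out of every column, each computed directly from the G/P series.

    a_bits and b_bits are lists of 0/1 with the lowest bit first.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    carries = []
    for n in range(len(g)):
        terms = []
        # G_n, then P_n.G_(n-1), and so on down to P_n...P_1.G_0
        for k in range(n, -1, -1):
            term = g[k]
            for j in range(k + 1, n + 1):
                term &= p[j]
            terms.append(term)
        # The last term: P_n...P_0.C_in
        last = c_in
        for j in range(n + 1):
            last &= p[j]
        terms.append(last)
        carries.append(int(any(terms)))  # the (n + 2)-input OR
    return carries


def ripple_carries(a_bits, b_bits, c_in):
    """The same carries, computed one column at a time."""
    carries, c = [], c_in
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a ^ b) & c)
        carries.append(c)
    return carries


# 6 + 7, lowest bit first
six, seven = [0, 1, 1, 0], [1, 1, 1, 0]
example = lookahead_carries(six, seven, 0)  # [0, 1, 1, 0]
```

The inner loops are exactly what gets expensive in hardware: column n needs n + 2 terms, and the widest of them grows with n.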

It’s more feasible for small adders, like 4 or 8 bits at a time. Here’s a sample two-bit adder that computes the two carry-out bits in parallel, by computing P and G first:

That circuit is already a bit intimidating to look at, so I didn’t show the sum bits, but remember that the sum bit is

S = A⊕B⊕C_in

or, using P:

S = P⊕C_in

So the sum for any column is just an XOR of the carry-in bit and the P bit that we already computed for our carry-out. That adds one more gate, for a total of 4 gate delays to compute the whole 2-bit sum.
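Here's the same two-bit adder written out as Python bit operations, with each line labeled by the gate layer it would sit in. This is only a sketch of the equations above; `two_bit_add` is a name I picked for the example.

```python
def two_bit_add(a, b, c_in=0):
    """Add two 2-bit numbers; returns (2-bit sum, carry-out)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    # Layer 1: generate and propagate for each column.
    g0, p0 = a0 & b0, a0 ^ b0
    g1, p1 = a1 & b1, a1 ^ b1

    # Layers 2 and 3: the AND terms, then the OR that joins them.
    c0 = g0 | (p0 & c_in)
    c1 = g1 | (p1 & g0) | (p1 & p0 & c_in)

    # Layer 4: each sum bit is P xor the carry coming into that column.
    s0 = p0 ^ c_in
    s1 = p1 ^ c0
    return s0 | (s1 << 1), c1
```

For example, `two_bit_add(3, 2)` gives a sum of 1 with a carry-out of 1, which is 5.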

If we built a set of 4-bit adders this way – assuming a 6-way OR gate is fine – our carry-select adder could add two 64-bit numbers in 19 gate delays: 3 for all of the carries to be generated, and 16 for the muxes to ripple down. These ripples now account for almost all of the delay.

Kogge-Stone

In 1973, probably while listening to a Yes or King Crimson album, Kogge and Stone came up with the idea of parallel-prefix computation. Their paper was a description of how to generalize recursive linear functions into forms that can be quickly combined in an arbitrary order, but um, they were being coy in a way that math people do. What they were really getting at is that these G and P values can be combined before being used.

If you combine two columns together, you can say that as a whole, they may generate or propagate a carry. If the left one generates, or the left one propagates and the right one generates, then the combined two-column unit will generate a carry. The unit will only propagate a carry bit across if both columns are propagating. It looks like this:

G_unit = G₁ + P₁⋅G₀
P_unit = P₁⋅P₀
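The property that makes this useful is that the grouping doesn't matter: combining (A with B) and then C gives the same answer as combining A with (B and C). Here's a quick Python sketch that checks this by brute force over every possible (G, P) pair. The `combine` function and the tuple layout are my own framing for the example.

```python
from itertools import product


def combine(left, right):
    """Merge two adjacent (G, P) pairs; `left` is the more significant one."""
    g_left, p_left = left
    g_right, p_right = right
    return (g_left | (p_left & g_right), p_left & p_right)


pairs = list(product((0, 1), repeat=2))

# Try every triple of (G, P) pairs with both groupings.
associative = all(
    combine(combine(x, y), z) == combine(x, combine(y, z))
    for x, y, z in product(pairs, repeat=3)
)
```

`associative` comes out `True`. Note that the order still matters (left and right can't be swapped), only the grouping is free.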

In a circuit, it adds 2 gate delays, but can be used to combine any set of P and G signals that are next to each other, and even to combine some P and G signals that are already combined. On the right, below, is the symbol we’ll use to represent this combining operation from now on:

Any time we can do a recursive combination like this, we’re in log-scale country. This is the country where cowboys ride horses that go twice as far with each hoofstep. But seriously, it means we can compute the final carry in an 8-bit adder in 3 steps.

Wait, what? Well, the numbers at the top represent the computed P and G bit for each of the 8 columns of our 8-bit adder. The diamonds combine two adjacent sets of columns and produce a new combined P and G for the set. If this works, at the bottom, each arrow should represent the combined P and G for that column and every column to its right.

Look at the line on the far left, and trace it back up. It combines the lines from 7 and 3, and as we trace that up again, each of those combines two more units, and then again to cover all 8 columns. The same path up should work for each column.

There are lots of wires/connections because we need to compute the combined P and G for each column, not just the final one. These combined P and G values represent the combined value for each set of columns all the way to the right edge, so they can be used to compute the carry-out for each column from the original carry-in bit, instead of rippling:

C_n = G_n-combined + P_n-combined⋅C_in

The sum bit can still be computed with a final XOR, using the original (not combined) P and the carry bit to its immediate right:

S_n = P_n⊕C_n-1

This final step adds three gates to the end of each column. As we saw above, each combining operation is two gates, and computing the original P and G is one more. For this 8-bit adder, which uses three combining steps, we wait 1 + 3⋅2 + 3 = 10 gate delays for the result. For a 64-bit adder, we need 6 combining steps, and get our result in 16 gate delays!
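Here's a sketch of the whole thing in Python, simulating the layers with lists instead of wires. It's my own illustration of the scheme described above, not anything taken from the Kogge-Stone paper: `kogge_stone_add` is a made-up name, and it returns the number of combining steps so the gate-delay arithmetic can be checked.

```python
def kogge_stone_add(a, b, width=8, c_in=0):
    """Return (sum, carry-out, combining steps) for two `width`-bit numbers."""
    # Per-column generate and propagate, lowest column first.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    gc, pc = g[:], p[:]  # combined values, rebuilt at every level
    steps, distance = 0, 1
    while distance < width:
        next_g, next_p = gc[:], pc[:]
        for i in range(distance, width):
            # Merge the group ending at column i with the group
            # ending `distance` columns to its right.
            next_g[i] = gc[i] | (pc[i] & gc[i - distance])
            next_p[i] = pc[i] & pc[i - distance]
        gc, pc = next_g, next_p
        distance *= 2
        steps += 1

    # Carry-out of each column from the original carry-in.
    carries = [gc[i] | (pc[i] & c_in) for i in range(width)]

    # Sum bit: original P xor the carry from the column to the right.
    total = 0
    for i in range(width):
        carry_from_right = c_in if i == 0 else carries[i - 1]
        total |= (p[i] ^ carry_from_right) << i
    return total, carries[-1], steps


def gate_delays(steps):
    """1 for P/G, 2 per combining step, 3 for the final carry and XOR."""
    return 1 + 2 * steps + 3
```

With `width=8` the loop runs 3 times and `gate_delays(3)` is 10; with `width=64` it runs 6 times and `gate_delays(6)` is 16.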

The Kogge-Stone adder is the fastest possible layout, because it scales logarithmically. Every time we add a combining step, it doubles the number of bits that can be added. It’s so efficient that 25% of the delay in our 64-bit adder will be the setup and final computation before and after the combining phase. The only real flaw is that the number of wires gets a little crazy – the 8-bit adder is already filled with cross-connections, and that gets so much worse in the 64-bit version that I’m not going to try to draw it. It might even monopolize a lot of the chip space if we tried to build it.

Luckily, there’s a compromise that adds a few steps but removes a lot of the wires.

Brent-Kung

In 1982, Brent & Kung described this clever modification, which just computes the left-most column in a binary tree, and then fills in the intermediate columns in a reverse tree:

If you walk up the tree from bottom to top on any column, it should still end up combining every other column to its right, but this time it uses far fewer connections to do so. A nice paper from 2007 compares several adder strategies and decides that this one is the most energy-efficient because of the trade-off of speed for simplicity. That is, it can be built more easily than the Kogge-Stone adder, even though it has nearly twice as many combination steps in it. For our 64-bit adder, we’d have 11 steps, for 1 + 11 ⋅ 2 + 3 = 26 gate delays. (This is more than our best-case of 16 for the Kogge-Stone adder, and a bit more than our naive-case of 24 with the carry-select adder.)
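Here's a sketch of the two trees in Python, as I understand the layout: one pass building the binary tree toward the left-most column, then a reverse pass filling in the columns the first pass skipped. The function name and the (G, P) tuple layout are mine, and it assumes the number of columns is a power of two.

```python
def brent_kung_prefix(gp):
    """Combine (G, P) pairs so each column covers itself and everything to its right.

    gp is a list of (G, P) tuples, lowest column first; its length must be
    a power of two. Returns (combined list, number of combining levels).
    """
    n = len(gp)
    gp = list(gp)
    levels = 0

    def combine(left, right):
        return (left[0] | (left[1] & right[0]), left[1] & right[1])

    # Binary tree: after this pass, columns 1, 3, 7, ... are complete.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            gp[i] = combine(gp[i], gp[i - d])
        d *= 2
        levels += 1

    # Reverse tree: fill in the intermediate columns.
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            gp[i] = combine(gp[i], gp[i - d])
        d //= 2
        levels += 1
    return gp, levels
```

For 8 columns this takes 5 levels, and for 64 columns it takes 11, matching the step count above. Each level touches far fewer columns than the Kogge-Stone layout does, which is where the saved wires come from.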

One potential problem is “fan-out”, which means one outgoing signal is being sent to several other gates as inputs. Electronics people would say one gate is “driving” a bunch of other gates, and this is bad, because the current gets split several different ways and diluted and weakened, just like water through a fork in a pipe. You can see this especially in column 3. A Brent-Kung adder will actually turn the joints (that I’ve marked with black circles) into buffers, or gates that don’t do anything. That reduces the fan-out back to 2 without slowing anything down.

Hybrid

One thing you might have spotted with your eagle eye is that the Brent-Kung adder doesn’t slow down the left-most column, which generates the final carry-out bit. So if we were to combine this strategy with the carry-select strategy from last time, our carry bits could start rippling across the adder units before each unit finishes computing the intermediate bits. Hmm.

An n-bit Brent-Kung adder will be able to generate the carry-out bit in log₂(n) steps, using 2 gates per step, with an additional gate delay for computing P and G for each bit, and two extra gate delays to compute the carry-out from the combined P/G.

The full sum will take an extra log₂(n) - 1 steps, and an extra gate to do the P⊕C_in operation.

When a carry-select adder is used with k units, the ripple delay is k plus the time it takes to get a carry-out from the first unit.

So if we split our 64-bit adder into 8 8-bit Brent-Kung adders, and combine those into a carry-select adder, the 8-bit adders will compute their carry-out bits in 9 gate delays, after which the carry bits ripple through the muxes for 7 gate delays, for a total of 16. The sum bits are available after 14 gate delays, in plenty of time. So we got it down to 16 total, and this time in a pretty efficient way!
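That arithmetic is easy to fumble, so here it is as a small Python sketch that follows the accounting above step by step. The function name is made up, and it counts one mux delay per unit after the first, which is how the 7 in the worked numbers comes out.

```python
from math import log2


def hybrid_delays(total_bits, unit_bits):
    """Gate delays for a carry-select adder built from Brent-Kung units."""
    units = total_bits // unit_bits
    steps = int(log2(unit_bits))

    # P/G (1), the tree to the left-most column (2 per step), final carry (2).
    carry_out = 1 + 2 * steps + 2
    # The reverse tree (2 per step), then the final XOR (1).
    sum_ready = carry_out + 2 * (steps - 1) + 1
    # One mux hop per unit after the first.
    ripple = units - 1

    return {
        "carry_out": carry_out,
        "sum_ready": sum_ready,
        "total": carry_out + ripple,
    }
```

`hybrid_delays(64, 8)` gives a carry-out at 9, sums ready at 14, and a total of 16. Trying other splits (say, 16 units of 4 bits) is a quick way to see how the tree depth trades off against the ripple.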

Adding numbers: Proof that humans can make anything complicated, if they try hard enough.

There are a bunch of other historical strategies, but I thought these were the most interesting and effective. If you stuck it out through both articles, I’d love to hear your thoughts, ideas, and/or corrections.

How to add numbers (part 1)

2012-11-07T00:00:00-08:00

A few weeks ago, probably due to my recent Arduino and D-CPU obsessions, I started thinking about this topic: How do modern computer CPUs add numbers? I took classes on this in school, so I had a basic understanding, but the more I thought about it, the more I realized that my ideas about how this would scale up to 64-bit computers would be too slow to actually work.

I started digging around, and even though wikipedia is usually exhaustive (and often inscrutable) about obscure topics, I had reached the edge of the internet. Only context-less names like “Kogge-Stone” and unexplained box diagrams greeted me. I had to do actual research of the 20th-century kind.

So come with me over the precipice and learn – in great detail – how to add numbers!

I’m going to start out as if you’ve never taken a class in computer engineering. If you’re familiar with the basics of binary addition, skip below to get to the good stuff.

Adding in binary

For big numbers, addition by hand means starting on the rightmost digit, adding all the digits in the column, and then writing down the units digit and carrying the tens over. In the example below, 8 plus 4 is 12, so we carry the 1, which I’ve indicated with a precious tiny blue 1 over the left column:

 1
 482
+345
----
 827

We memorize this in school, but the reason it works is that each column is the same power of ten: 8 tens plus 4 tens is 12 tens. And 12 tens is really 1 hundred and 2 tens, so the 1 hundred is shifted/carried over to the hundreds column.

This works the same in binary, but the digits can only ever be 0 or 1, so the biggest number we can add is 1 plus 1. This would be 2, or “10” in binary (1 two and 0 ones), so there’s a carry of 1. In fact, if we have a carry, 1 plus 1 with a carried 1 is 3: “11” (1 two and 1 one). That still only carries a 1, which is convenient, because it means the carry can be represented in binary just like every other digit.

 11
 0110 (6)
+0111 (7)
-----
 1101 (13)

So, to add two binary numbers, we just need to add 3 binary digits (one digit from each of the numbers, plus a possible incoming carry), and produce a sum bit and an outgoing carry bit. We can make a logic table for this:

A	B	C	Carry	Sum
0	0	0	0	0
0	0	1	0	1
0	1	0	0	1
0	1	1	1	0
1	0	0	0	1
1	0	1	1	0
1	1	0	1	0
1	1	1	1	1

…and then design a logic circuit to generate the Sum and Carry bits. In logic circuit equations, “+” means OR, “⋅” means AND, and “⊕” means XOR. (Programmers usually use “&” to mean AND, and “|” to mean OR, but I think in this case it’s important to use the symbols that professional circuit designers use. It gives you a bit more intuition when dealing with logical equations, which will come up later.)

One way to think of it is: According to the logic table we just made, the sum should be 1 if there are an odd number of incoming 1s. XOR is the operation that matches odd inputs. And the carry should be 1 if at least two of the incoming digits are 1.

Adding in circuitry

The most straightforward logic circuit for this computes Sum = A ⊕ B ⊕ C and Carry = A⋅B + A⋅C + B⋅C, assuming you have a 3-input XOR gate. If you don’t, you can just hook two 2-input XOR gates together.

Now rename C to C_in, and Carry to C_out, and we have a “full adder” block that can add two binary digits, including an incoming carry, and generate a sum and an outgoing carry.
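If you'd rather check the logic table in code than by eye, here is a small C++ sketch of the full adder. It assumes the sum is the XOR of the three inputs and the carry-out is 1 when at least two inputs are 1, which is my reading of the description above. The names full_adder, AdderOut and matches_truth_table are made up for illustration.

```cpp
#include <cassert>

// Output of a one-bit full adder (names are hypothetical).
struct AdderOut {
    unsigned sum;    // 1 when an odd number of inputs are 1
    unsigned c_out;  // 1 when at least two inputs are 1
};

// a, b and c_in must each be a single bit (0 or 1).
inline AdderOut full_adder(unsigned a, unsigned b, unsigned c_in) {
    assert(a <= 1 && b <= 1 && c_in <= 1);
    unsigned sum = a ^ b ^ c_in;                         // A xor B xor C
    unsigned c_out = (a & b) | (a & c_in) | (b & c_in);  // AB + AC + BC
    return AdderOut{sum, c_out};
}

// Compare every row of the logic table against ordinary arithmetic:
// a + b + c must equal (2 * carry) + sum.
inline bool matches_truth_table() {
    for (unsigned a = 0; a <= 1; ++a)
        for (unsigned b = 0; b <= 1; ++b)
            for (unsigned c = 0; c <= 1; ++c) {
                AdderOut r = full_adder(a, b, c);
                if (a + b + c != 2 * r.c_out + r.sum) return false;
            }
    return true;
}
```

Calling full_adder(0, 1, 1) gives a sum of 0 and a carry-out of 1, matching the fourth row of the table.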

And if we put a bunch of them in a row, we can add any N-bit numbers together!

Starting along the top, there are four inputs each of A and B, which allows us to add two 4-bit numbers. The right-most bit, A₀, is the “ones”, A₁ is the “twos”, and so on through the “fours” and “eights” (powers of two instead of ten). On the far right, we have a dangling carry-in which we’ll just set to zero so that it doesn’t matter.

The carry-out from the right-most adder is passed along to the second adder, just like in long addition: any carry from the “ones” is added to the “twos” column. Finally, on the far left, we get an “extra” carry out, because the addition of two 4-bit numbers may require 5 bits. Normally this is considered an “overflow”, but the carry-out bit is stored in some kind of status register by every CPU that I know of. It just usually can’t be accessed from C or any other language directly, so it gets lost.
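The chain of adders can be simulated in a few lines of C++. This is only a software sketch of the wiring described above, not how anyone would add numbers in real code; ripple_add and RippleResult are hypothetical names, and the loop walks from the "ones" bit leftward, handing each carry-out to the next bit.

```cpp
#include <cassert>
#include <cstdint>

// Result of an N-bit ripple-carry addition (hypothetical names).
struct RippleResult {
    std::uint64_t sum;   // the low `bits` bits of a + b
    unsigned carry_out;  // the "extra" bit from the leftmost adder
};

inline RippleResult ripple_add(std::uint64_t a, std::uint64_t b,
                               unsigned bits, unsigned carry_in = 0) {
    assert(bits >= 1 && bits <= 64);
    std::uint64_t sum = 0;
    unsigned carry = carry_in & 1u;  // the dangling carry-in on the far right
    for (unsigned i = 0; i < bits; ++i) {
        unsigned ai = static_cast<unsigned>((a >> i) & 1u);
        unsigned bi = static_cast<unsigned>((b >> i) & 1u);
        unsigned s = ai ^ bi ^ carry;                        // sum bit of adder i
        carry = (ai & bi) | (ai & carry) | (bi & carry);     // carry into adder i+1
        sum |= static_cast<std::uint64_t>(s) << i;
    }
    return RippleResult{sum, carry};
}
```

With 4 bits, ripple_add(6, 7, 4) produces 13 with no carry-out, while ripple_add(9, 8, 4) produces 1 with a carry-out of 1: the fifth bit that doesn't fit.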

Adding in slow-motion

But here’s where the problems come in. Imagine setting up 64 of those adders in a chain, so you could add two 64-bit numbers together. How long would it take? The circuit diagram above shows that each sum goes through one or two gates, and each carry-out goes through two. And the carry-out of one adder becomes the carry-in for the next one. So to generate the entire sum and the final carry-out bit, we need to go through 64 ⋅ 2 = 128 gates.

Uh oh.

Spoiler alert: No CPU has time to wait for 128 gates to flip in sequence, so no CPU actually adds this way. The problem is that the carry bit needs to “ripple” across each bit, so the delay scales linearly with the number of bits being added. We’ll need some way to break out of linearity.

Carry-select adder

The trick that seems most obvious to me – and the only one I thought of before doing research – was apparently invented in 1960 by Sklansky. If you’re willing to add more circuitry in exchange for speed, you can put two adders in parallel. One computes the sum with a carry-in of 0, and the other computes with a carry-in of 1. When the real carry-in signal arrives, it selects which addition to use. Here’s an example of a 4-bit carry-select adder:

The weird rhombus-shapes are multiplexers, or “mux” for short. A mux takes two inputs and selects one or the other, based on a control signal. In this case, each mux uses the carry-in signal to determine which adder output to use, for each of the four sum bits (along the bottom), and the carry-out bit (on the left).

The diagram gets simpler if we make a shortcut box for a series of connected adder units, and draw each group of 4 input or output bits as a thick gray bus:

Now, for example, to compute the sum of two 16-bit numbers, we can split each number into four chunks of four bits each, and let each of these 4-bit chunks add in parallel. When the adders are finished, the carry-out bit from the lowest (rightmost) adder is used to select which adder’s result to use for the next four bits, and then that selected carry-out is used to select the next adder’s result, and so on. Simplifying the diagram a bit more, it looks like:

If we assume a mux takes as long as a logic gate, then this circuit can compute a 16-bit addition in 2 ⋅ 4 + 4 = 12 gate delays: 8 for all the adders to finish, and 4 for the muxes to ripple the carry bits across. For a 64-bit adder, it would take 24 delays, because it would have 16 muxes instead of 4. Going from 128 to 24 is a great start, and it only cost us a little less than twice as many gates!
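To make the select step and the delay arithmetic concrete, here is a rough C++ simulation. It assumes, as above, two gate delays per bit inside each ripple block and one delay per mux, and it requires the block size to divide the word size evenly. All of the names (ripple_block, carry_select_add, and the two structs) are invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>

struct BlockSum { std::uint64_t sum; unsigned carry_out; };

// Plain ripple-carry addition over one small block of bits.
inline BlockSum ripple_block(std::uint64_t a, std::uint64_t b,
                             unsigned bits, unsigned carry) {
    std::uint64_t sum = 0;
    for (unsigned i = 0; i < bits; ++i) {
        unsigned ai = static_cast<unsigned>((a >> i) & 1u);
        unsigned bi = static_cast<unsigned>((b >> i) & 1u);
        sum |= static_cast<std::uint64_t>(ai ^ bi ^ carry) << i;
        carry = (ai & bi) | (ai & carry) | (bi & carry);
    }
    return BlockSum{sum, carry};
}

struct SelectResult {
    std::uint64_t sum;
    unsigned carry_out;
    unsigned gate_delays;  // 2 per bit in one block, plus 1 per mux
};

inline SelectResult carry_select_add(std::uint64_t a, std::uint64_t b,
                                     unsigned bits, unsigned block_bits) {
    assert(bits >= 1 && bits <= 64);
    assert(block_bits >= 1 && bits % block_bits == 0);
    const std::uint64_t mask =
        (block_bits == 64) ? ~0ULL : ((1ULL << block_bits) - 1);
    const unsigned blocks = bits / block_bits;
    std::uint64_t sum = 0;
    unsigned carry = 0;
    for (unsigned k = 0; k < blocks; ++k) {
        const unsigned shift = k * block_bits;
        const std::uint64_t ak = (a >> shift) & mask;
        const std::uint64_t bk = (b >> shift) & mask;
        // In hardware these two additions happen at the same time, in every block.
        const BlockSum if0 = ripple_block(ak, bk, block_bits, 0);
        const BlockSum if1 = ripple_block(ak, bk, block_bits, 1);
        // The mux: the real carry-in picks which precomputed answer to keep.
        const BlockSum& chosen = carry ? if1 : if0;
        sum |= chosen.sum << shift;
        carry = chosen.carry_out;
    }
    return SelectResult{sum, carry, 2 * block_bits + blocks};
}
```

With 4-bit blocks, carry_select_add(a, b, 16, 4) reports 12 gate delays and carry_select_add(a, b, 64, 4) reports 24, the same numbers as the arithmetic above.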

We can fuss with this and make it a little faster. The leftmost adder unit waits a long time to get its incoming carry bit, and the first 75% of the time is spent waiting for the first adder to finish. If we compute only one bit at a time on the right, then two, then three, and so on as it goes left, we can shave off a few more.

But… we can do better.

Next time, some trickier adding methods that end up being quicker.

C++ quirks, remembered

2012-10-09T00:00:00-07:00

Remember how annoying it was to try to get anything done in C++? Typing reams of boilerplate and fighting mysterious compiler errors that need to be fed through c++filt to make any sense of them? Or maybe those days have faded into the rose-tinged past, when we were “real” programmers, who ran through the gauntlet just to prove we could.

It’s been over five years since I did any serious C++ coding, but I dusted it off for a pet project recently, and had forgotten a lot more than I thought. It was also a lot worse than I remembered.

The main thing I remembered is that C++ is more verbose (by line count) than modern languages. At a previous job, we translated a few servers from C++ to Java, and calculated that Java required about half as many lines of code, not counting custom libraries for things like network I/O (which you get for free in Java). I strongly believe that verbosity and boilerplate are bad features in a programming language, but I’ll save that argument for a future blog post.

But the quirkiness of C++ goes a lot deeper than its wordiness. Check out these things I’ve been uncovering as I work on my project.

Private inheritance

Woe be unto you if you think of the java line

class Complex extends Number

and you write it in C++ as

class Complex : Number

because gotcha! C++ predates consensus ideas about public/private exposure, and the default inheritance actually changes the exposure of all inherited members to be private (hidden). Like most of the 137,000 features of C++, this is probably useful to someone somewhere, and everyone else just has to remember the rule that you need to type

class Complex : public Number

to get standard inheritance.
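Here's a small sketch of the difference, using the standard type traits to ask the compiler what outside code is allowed to see. The class names are made up, and the static_asserts are just a way to show the rule without triggering a compile error. (The private default applies to the class keyword; a struct inherits publicly by default.)

```cpp
#include <cassert>
#include <type_traits>

class Number {
public:
    double magnitude() const { return 0.0; }
};

class PrivateComplex : Number {};        // no keyword: private inheritance
class PublicComplex : public Number {};  // what the java line meant

// Both classes really do derive from Number...
static_assert(std::is_base_of<Number, PrivateComplex>::value, "still a base class");
static_assert(std::is_base_of<Number, PublicComplex>::value, "still a base class");

// ...but outside code can only treat the public one as a Number.
constexpr bool kPrivateConverts =
    std::is_convertible<PrivateComplex*, Number*>::value;
constexpr bool kPublicConverts =
    std::is_convertible<PublicComplex*, Number*>::value;
static_assert(!kPrivateConverts, "the base class is hidden from callers");
static_assert(kPublicConverts, "the base class is visible to callers");

// Accepts anything that is visibly a Number.
inline double magnitude_of(const Number& n) { return n.magnitude(); }
```

Calling magnitude_of(PublicComplex{}) compiles, while magnitude_of(PrivateComplex{}) is rejected because the conversion to Number is inaccessible.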

Implicit constructors

This is a feature where, if you have a constructor taking a single parameter, like

CatList(int maxSize);

and a function that takes an object of that type, like

void feed(const CatList& cats);

then you can make a seemingly nonsensical or broken call like

feed(3);

and it will compile. Why? Because the C++ compiler sees that you didn’t pass a CatList, and it starts rooting through your garbage to find something that will work. (Like perl, C++ would rather do the wrong thing than fail to compile.) So it finds the CatList constructor that takes an int, and turns your call into

feed(CatList(3));

so that it creates an empty list of cats and feeds them. I’m sure that’s what the author meant!

This became such a huge problem that C++ eventually added the explicit keyword, so you can mark which constructors should be safe – more boilerplate to memorize.
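A quick sketch of what explicit changes, again leaning on type traits so that the "doesn't compile" case can be shown without breaking the build. StrictCatList and feedStrict are hypothetical names, and the functions take the list by const reference because a temporary can't bind to a non-const reference in standard C++.

```cpp
#include <cassert>
#include <type_traits>

class CatList {
public:
    CatList(int maxSize) : maxSize_(maxSize) {}  // usable as an implicit conversion
    int maxSize() const { return maxSize_; }
private:
    int maxSize_;
};

class StrictCatList {
public:
    explicit StrictCatList(int maxSize) : maxSize_(maxSize) {}  // must be spelled out
    int maxSize() const { return maxSize_; }
private:
    int maxSize_;
};

inline int feed(const CatList& cats) { return cats.maxSize(); }
inline int feedStrict(const StrictCatList& cats) { return cats.maxSize(); }

// An int silently converts to CatList, but not to StrictCatList.
constexpr bool kIntConvertsToCatList = std::is_convertible<int, CatList>::value;
constexpr bool kIntConvertsToStrict = std::is_convertible<int, StrictCatList>::value;
// Direct construction still works for the explicit version.
constexpr bool kStrictConstructible = std::is_constructible<StrictCatList, int>::value;
```

So feed(3) compiles and quietly builds a CatList, while feedStrict(3) is an error and you have to write feedStrict(StrictCatList(3)) to say what you mean.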

Scala added a similar feature, but reversed the sense, so you have to mark a method as implicit before it will be considered. You also have to import any implicit conversions you want to use in a file, so they become a bit more explicit. But scala coders will tell you that even with all these restrictions, they can bite you from time to time. C++ making them default behavior is just punitive.

Generic classes must live in headers

This is pretty obscure, but if you make a generic class (called a “template class” in C++), it must live entirely in a header file, or else be used only in the file it’s defined in.

Normal C++ classes have a header file with the class’s interface, and a code file for the implementation. This can’t be done for generic classes because the compiler still treats templates as macros: each possible variant of a generic class is effectively compiled into a separate class. If you put part of the implementation into a C++ file, the compiler can’t find it. You’ll just get context-free link errors.

There’s a great description of what’s going on here.

My favorite part is that the C++ standards body tried out a fix for this, but the fix turned out to be broken in a different way, so they deprecated the fix. (Details here.) It’s as if they’ve thrown up their hands and said “Well this is hopeless. Just remember to treat these classes specially.”

Method pointers

It’s possible to grab a pointer to a method in C++. Most people I’ve mentioned this to looked surprised, but really, it’s an old, supported language feature. If you have a class like

class CatList {
public:
  void add(Cat& cat);
};

then you can define a type for the add method (which you should absolutely do, because good grief look at this mess) – let’s call it Method:

typedef void (CatList::*Method)(Cat& cat);

and you can pick out the method pointer and save it!

Method method = &CatList::add;

And you can call it later!

CatList *c = ...;
(c->*method)(cat);

If you’ve never seen ::* or ->* in C++ code before, that’s because they are special operators added just to make this feature work. The parentheses around c->*method are there because they thought it would be funny to give these new operators lower precedence than a function call, for maximum surprise-factor.

I wanted to use this feature to translate method calls on a javascript (v8) object into method calls on a C++ object. v8 will pass a (void *) to the C++ handler, so all I need to do is stuff the method pointer in there and cast it back on the way out… right? Wrong!

error: cannot cast from type 'Method' (aka 'void (CatList::*)(Cat &)') to pointer type 'void *'

The pages about this in the C++ faq read like “Get off my lawn!” and don’t explain why this won’t work, so I’ll try. In C++, a method pointer isn’t just an address; it’s potentially a struct containing the method’s address and some other data too. So the struct is too big to be assigned into a pointer.

When I mention this to friends over beers, most people guess that it’s because C++ needs to store two pointers: one for the address of the function, and one for the “this” pointer. But nope, it’s not that. You need to pass the “this” pointer on the left side of the ->* operation. In fact, nobody seems to know what else a C++ compiler might want to store besides the function address. My guess is that it might also want to store the address of the “vtable” – the class definition/metadata.
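For what it's worth, on compilers that follow the Itanium C++ ABI (GCC and Clang on most platforms), a method pointer is two words: the function address (or a vtable offset for virtual methods) plus an adjustment to apply to “this”. Other compilers lay it out differently, so the size is platform-dependent. One workaround, sketched below, is to put the method pointer inside a small struct and pass the address of that struct as the (void *). MethodBox and invoke are hypothetical names, and the handler signature is a stand-in rather than the real v8 API.

```cpp
#include <cassert>

struct Cat { int id; };

class CatList {
public:
    void add(Cat& cat) { lastId_ = cat.id; ++count_; }
    int count() const { return count_; }
    int lastId() const { return lastId_; }
private:
    int count_ = 0;
    int lastId_ = -1;
};

typedef void (CatList::*Method)(Cat& cat);

// True only where a method pointer happens to be no bigger than a plain pointer.
constexpr bool kFitsInVoidPtr = sizeof(Method) <= sizeof(void*);

// A Method can't be cast to void*, but a pointer to a struct holding one can.
struct MethodBox { Method method; };

// Stand-in for a callback that is handed an opaque void*.
inline void invoke(void* data, CatList& list, Cat& cat) {
    MethodBox* box = static_cast<MethodBox*>(data);
    (list.*(box->method))(cat);  // .* is the reference flavor of ->*
}
```

The caller makes a MethodBox holding &CatList::add, hands its address to the callback machinery, and keeps the box alive for as long as the callback might fire.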

If you’ve been feeling nostalgic about the old days of “real coding” in C++, maybe now you realize that you haven’t really been missing it. Sometimes the present really is better than the past!

(Photo credits: my dad’s painting, and a wikipedia article about the dust bowl.)

Mensch 2.0

2012-08-22T00:00:00-07:00

I recently picked up one of the new “retina” Macbooks, and it looks incredible. There’s nothing magic about the hardware – they simply doubled the number of pixels in each dimension, as you can read about in this arstechnica review of Mountain Lion. But the scaling effect on text is beautiful. Here’s what it looks like blown up 8 times:

That’s a lot of pixels.

The new sharpness highlighted a few, um, “rough edges” in the Mensch font that I posted a few years ago. Back then, the only accessible font editor for macs was a port of an old X11 Linux font editor. It crashed a lot, and getting it to connect lines together was usually a five-minute process of trying the same thing over and over until it worked.

But a casual mention in a github blog post led me to an awesome new font editor called “Glyphs”: Check it out! It’s Mac-native, has a simple interface, great documentation, and (so far) hasn’t crashed.

Armed with Glyphs, I went through the current Menlo drop from Apple, and re-created the same changes I had made to create Mensch. With so much more control over the shapes, it came out a lot better. It might not be obvious on a non-retina screen at a small point size, but it’s a pretty clear improvement on the new screens. I’m still calling it Mensch, but it’s a Mensch 2.0.

One thing I backed down on from the original version: The “angle brackets” < and > are reduced to the height of capital letters instead of being as tall as the square brackets. Their excessive height was a little distracting, and I don’t think it added any more clarity than letter height gives them.

There’s a bug in the new version of Font Book that causes the new Mensch font to display a weird selection of characters when first importing, but it doesn’t seem to affect its actual use.

Download Mensch 2.0

Why Config?

2012-03-26T00:00:00-07:00

When I first started playing with scala in 2008, I was dismayed by the state of server configuration in the java world. A lot of java servers were still using property files, or worse, XML. XML is meant to be easily parsed by computers, but is really hard for humans to read and edit, and tends to hide useful information in baroque syntax and line noise. The python world was still clinging to Windows-era “INI” files, and the ruby world had invented something called YAML, with its own odd syntax.

[Alex Feinberg pointed out that the use of the term “config” can be overly general. In this post, I’m talking specifically about configuration used to bootstrap a cluster of machines all running the same server code. Shared configuration required by multiple server clusters is a different problem, and obviously not well-served by any solution that only works on the JVM.]

We had gone through many iterations of config file formats at my previous job, as we moved from perl to C++ to java, but it was a very private company, terrified of open source, so we shared none of what we learned. I thought it was time to spread some best-practices around, so I wrote “configgy” lazily over a couple of months as I learned scala.

Configgy

The core ideas behind “configgy” were:

A config file should be a text file, primarily readable by humans. It should be unambiguous and have minimal syntax.
A server’s configuration should really just be a set of (string) keys and values. The values can be bools, ints, strings, lists of strings.
You should be able to take blocks of these key/value sets and nest them, so subsystems can have their own isolated configuration.
The API should be similarly minimal, like a typesafe hashtable, and should allow subsystems to “subscribe” to configuration values and get notified if they’ve changed.

The end result was pretty successful, and we used it at Twitter for several years. An example chunk of a config file might look like this:

port = 22133
timeout_msec = 100

log {
  filename = "/var/log/kestrel/kestrel.log"
  roll = "daily"
  level = "info"
}

Unfortunately, I had gone in the wrong direction, and it took a while for the mounting evidence (and my coworkers) to convince me.

What’s wrong

Some of the problems with configgy show up in the config file example I pasted above:

There’s no schema. “port” should be an int, but there’s no place to declare that. There’s no definition for what should be in the config file at all. What are the keys? What do they do? You have to document it separately in a text file, if you’re really ambitious.
The available types aren’t sufficient. Durations are really common in server configuration because they specify timeouts, and there’s no real support for them. You have to drop sly hints in the field names (like “msec” for milliseconds) and hope people are paying close attention.
Extending the available types will never cover all cases. The “roll” field above can only have a few possible values, but there’s no simple way to define a new enumerated type like that.

Other problems only show up in daily use:

validation: How do you validate that the config file won’t cause a server crash hours after it starts? There’s nothing forcing “timeout_msec” to be an int, so it may throw an exception minutes later, when the code first tries to call .toInt on it.
defaults: What is the default timeout? Is there one? Configgy supported providing a default value in the API, but how do you know what that is when you’re editing the config file – especially if you didn’t write the original code?

One of the biggest faults should get its own section, because I have a lot to say about it.

Reloading config files

Configgy had a lot of code to support reloading config files on the fly, allowing a server to “subscribe” to a key and change its behavior if a config file was reloaded. It seemed really clever at the time, but experience taught me and my coworkers that it’s a really bad idea in practice.

How often do you change a config file on the fly and ask the server to reload it? And more importantly, when? Murphy’s Law tells us the answer: when something is broken, it’s the middle of the night, and it needs to be fixed immediately.

But because we only did this in a crisis, the code was effectively untested. If you aren’t regularly using some part of a server, you can’t trust it enough to depend on it in a crisis. In a crisis, I want only tools that I’ve used before and am confident in. It only takes a couple of incidents where reloading a config file doesn’t actually fix the server’s behavior before your policy becomes: Fix the config file offline, then restart the server.

The ability to reload configuration became just another moving part: something you had to think about, but would never actually use in a crunch.

This could probably be solved by adding automated testing that changes your config file, asks the server to reload, and then re-runs a suite of tests. But it just didn’t seem worth it. As a practical matter, the server needs to start up cleanly after any kind of unclean shutdown (“kill -9” or a fire) – and must be tested to do so – so you don’t need any other feature for reloading the config file. Just change the file and kill the server. Now it’s running with the new config!

How to fix it

If you read my post from last year about patterns, you know where this is heading. There’s one obvious way to define a set of named, type-safe fields: write a scala trait. Your config file can then just be a scala file that you compile and evaluate when the server starts.

Your config trait should be a builder that creates a server from config, like this:

trait ServerConfig extends (() => Server) {
  var port: Int = 9999
  var timeout: Option[Duration] = None

  def apply(): Server = new Server(port, timeout, ...)
}

The apply method assembles a Server from the configuration. After that, your config file can be:

new ServerConfig {
  port = 12345
  timeout = 250.milliseconds
}

The important lines look just like the configgy version, and are executed as part of the constructor.

Now you have a schema (the config trait), and every field has a type, declared in the trait and enforced by the scala compiler. If you need a specialized type, like an enum, you can make one. I especially like how readable timeouts become. It’s unambiguous that the duration is specified in milliseconds, and you could use seconds if you want.

How does it work?

The key is Eval, a component of util-eval that makes it easier to compile and execute scala code from inside the JVM. Scala already exposes this functionality – the scala compiler runs on the JVM, after all, and the REPL needs to do line-by-line compilation – but the API is arcane and marked with a “No serviceable parts inside” label. The Eval class simplifies it to:

scala> val eval = new Eval()
eval: com.twitter.util.Eval = com.twitter.util.Eval@1df5973b

scala> eval[Int]("3 + 4")
res0: Int = 7

The result of evaluating a config file is a new ServerConfig object (or similar), and calling apply on that will return a fully-initialized Server object, so loading the config file and starting the server boils down to:

  val eval = new Eval()
  val config: ServerConfig = eval[ServerConfig](new File("..."))
  val server: Server = config()
  server.start()

If you add some exception handling to log errors, you end up with the code inside RuntimeEnvironment in ostrich, which we use to bootstrap server startup from config files in a deployed server.

Sleight of hand

There are two problems I listed above that aren’t solved by this simple solution: validation and default values. So you have to add a little bit of code to finish up.

If a config file can be compiled and executed, then it’s valid. The result of the evaluation is a config object (ServerConfig in this example) that doesn’t have any side-effects and can be safely evaluated at compile time. So that’s what we do: the last phase of a build executes the server jar with a special "--validate" option that compiles the config files and exits. If that succeeds, the config files are valid and they won’t crash the server in production.

In the example above, all the fields had default values, which is not always what you want. For those cases, we defined a basic Config trait. It allows you to mark a field as required with no default value, or optional, or lazily computed.

trait ServerConfig extends Config[Server] {
  var port = required[Int]
  var timeout = optional[Duration]
}

Implicits handle the conversion from a normal type to a “required” or “optional” type (optional types just use scala’s Option class), so the config file looks the same.

The Config trait fits completely in one file, with fewer than 100 lines of code (according to cloc). That’s an incredible improvement over configgy.

Postscript

This post is a little overdue, but better late than never. :-)

I wrote this because it was important to me to share the knowledge, not because I did all (or even most) of the work. I carefully avoided naming coworkers while writing this post, because it disturbed the flow, but they all deserve callouts:

John Kalucki first spelled out for me why the implementation of default values was bad. Matt Freels and Ed Ceaser implemented the first draft of the Config class and pulled me in to help iterate on it. Nick Kallen opened my eyes to the dangers of depending on a server’s “shutdown” and “reload” behavior.

Dissolving patterns

2011-04-30T00:00:00-07:00

I forget where I first read about this meme, but I’ve been appreciating it more and more over time:

Once a programming pattern achieves widespread use, languages start to absorb it into their structure.

Object-orientation is a pattern in C, because you have to use a library like gobject to implement virtual method calls, and you have to write code to that library’s conventions. In java, you get virtual method calls as default behavior. It’s not a pattern; it’s part of the language.

Iteration used to be a pattern in java, but by java 5, there was enough syntactic sugar to make iteration seem like a natural part of the language. You don’t have to think about iterating; you can do it with half a line of code. You don’t usually need to think about how to create iterable objects, either, because the standard library has a set of methods to make it easy. It’s become a part of the language.

Another way to look at patterns is that they’re boilerplate you need to add to your code – boilerplate you could avoid (or make a lot smaller) if the pattern was integrated into the language. So as a pattern becomes commonly accepted, it’s more likely to be added as a core feature, and cease to be a pattern at all. I think this is part of the reason that high-level languages are terser than low-level languages: patterns become syntax.

There are two patterns that I think should go away in a scala (or any post-java language) world.

Factories

The factory pattern is used to abstract away object creation, usually so libraries can have functionality injected. For example, a popular HTTP library for java uses a factory to allow clients to replace or augment the socket library:

interface SocketFactory {
  public Socket createSocket();
}

This is a cool feature, because it allows the library to be very configurable, but it ends up creating a lot of boilerplate code when it’s used:

class MySSLSocketFactory implements SocketFactory {
  public Socket createSocket() {
    return new MySSLSocket();
  }
}
...
Protocol p = new Protocol("https", new MySSLSocketFactory(), 443);

Let’s look at the interface a bit more closely. It’s really just sugar for a single method which takes zero or more parameters and returns a new object. In fact, let me suggest that it’s really just another way of describing… a function! But in higher-level languages, functions are first-class objects, so we could define the factory that way instead:

type SocketFactory = () => Socket

And instead of creating a factory, we can just pass in our function for creating sockets:

val protocol = new Protocol("https", { () => new MySSLSocket() }, 443)

Poof! All the boilerplate is gone. You don’t need a factory because a factory is just a function, and the function is a factory. The pattern is integrated into the language already.

So I propose that the next time you start to write a factory interface, you just use a function instead.

Builders

That probably isn’t very controversial or shocking, but let me go one step further and tackle a tougher pattern: the builder.

Builders are used when an object may have a lot of initial state to configure. For example, if your class constructor has 10 parameters, you probably want to create a builder so that constructing it can be a little more self-documenting, and optional configuration can be omitted.

As immutable objects become more popular, builders are also becoming a popular way to allow multi-step construction in a mutable object before squirting out the final immutable object.

Here’s an example from the android API. There are different code styles, but the most common one seems to be a cascade of method calls, culminating in a final create call:

AlertDialog alert = new AlertDialog.Builder(this)
  .setTitle("Alert!").setMessage("lp on fire!")
  .setNegativeButton("Denial", null)
  .setPositiveButton("Acceptance", null)
  .create();

You have to admit, at least, that it’s a lot better (and easier to understand) than a huge constructor call, which could end up looking something like:

AlertDialog alert = new AlertDialog(this, "Alert!", 0, "lp on fire!", null, null, button1, ...)

Anyway, you’ve probably already guessed what I’m going to say about that create call: It’s a factory function!

But the cascading method calls can be fixed too. What if the builder class was replaced by a configuration class with mutable fields? It might look like this:

class Builder extends (() => AlertDialog) {
  var title: String = ""
  var message: String = ""
  var positiveButton: Option[Button] = None
  ...
  def apply() = ...
}

The builder class is a factory function that generates new AlertDialog objects – that is, it’s a function () => AlertDialog – but it also has a bunch of optional, mutable configuration. Instead of cascading method calls, you can use the constructor to override any fields you want to set:

val builder = new Builder {
  title = "Alert!"
  message = "lp on fire!"
  positiveButton = Some(new Button(...))
}
val alert = builder()

I think this is even easier to read than the builder call. To avoid confusion, I call these classes Config instead of Builder, but they have exactly the same effect: to create a (possibly immutable) object out of mutable configuration.

It looks like having functions as first-class objects lets us integrate at least two patterns into the language, but there are a lot more to work on. For one, the delegator pattern seems to still involve a lot of boilerplate in most languages. So there is more to be done! Onward and upward!

Why AT&T Deserves To Fail

2010-07-03T00:00:00-07:00

AT&T is a very large, very old company. It’s so old, it has the stock ticker symbol “T”. That’s right, just “T”. It’s so large, the US government tried to break it up into a half dozen smaller companies 25 years ago, and ultimately failed. But history is filled with very large, very old institutions that eventually failed: the Roman empire, the Dutch East India company, and the Atlantic slave trade, to name a few.

Recently I bought an HTC Evo (Android) on the Sprint network. My plan was to try it for a few weeks, and if the battery life was as bad as some reviews said it was, or the coverage was as bad as AT&T’s, I would return it.

I’ve barely had the thing for 48 hours and I can’t imagine going back.

More than half of my joy is the phone itself. The iPhone has always been a step backward, since its crippled OS means it can’t run a decent SSH terminal or chat client – two of my three primary uses for a mobile device. (It has a great browser, though.) The newly-named iOS 4 doesn’t fix that. We’ve had multi-tasking on phones since 2002, but Apple can’t seem to get it working on hardware that laptops barely had in 2002.

The Evo evokes a sense of wonder and infinite possibilities, not too different from how it felt as a child to play with a computer for the first time. Unlike the fortress-like atmosphere of an iPhone, the Android OS suggests that if you can think of something to try, it might just work. For example, I browsed to a website offering an mp3, and clicked on the link just to see what would happen. The song started downloading! It showed up in the music player! I could listen to music on the internet without going through the tedious process of docking and syncing with a laptop!

Apple is not trying for this market, though. They’re aiming to be a shinier, more expensive RAZR; a consumer phone that can’t do much, but can do it very well. And they’re very successful at it, to gauge from their sales numbers. Very, very successful. The iPhone is probably the most popular phone in the world. People stood in line 14 hours this week to purchase a modest upgrade.

If, say, a cellphone carrier were able to get an exclusive contract for selling the iPhone for the first five years of its meteoric rise, that carrier might very well make bank. They could become larger than all the other carriers combined! They could add extra fees for the privilege of having an iPhone, and expand their reach and coverage to hitherto unknown extent.

AT&T has this exclusive contract, and they have the privilege fees. But they haven’t expanded their coverage or their reach. They haven’t done anything, as far as I can tell, except shovel wheelbarrows full of money into a Scrooge McDuck vault somewhere, while their name & brand have become a joke. Whatever else AT&T did in the past, they now represent only “incompetent wireless provider” in the mind of the consumer. How did this happen?

I live in the second-most densely populated city in the US, with 6,700 people per square kilometer. In the three years I’ve been using AT&T’s cell network, they have not managed to make a single improvement to their abysmal coverage. Entire neighborhoods, including popular ones like the Haight and the Mission, are giant dead zones. My friends and I regularly use Skype to make voice calls, out of necessity. If you can’t cover the end of a peninsula 10km on a side, something is seriously wrong. The only rational explanation is that AT&T is not trying.

One excuse often offered is that “cities are hard to cover”. Sorry, I’m not buying it. If the task is difficult, we’d be seeing progress, but slowly. Seeing no progress means the task is either impossible or isn’t being attempted. Since Sprint and Verizon are able to provide excellent coverage, I guess it’s not impossible. So, again, AT&T is not trying.

In fact, their coverage situation is dire throughout the bay area of California. Why would they care to provide coverage to the bay area, though? Aside from it being the home of Apple and 7.4 million people, there’s a really good PR reason why they might want to step up their game here. It’s the home of nearly every tech pundit and blogger on the net. AT&T could improve coverage in every other part of the US, and still get a continuous stream of bad press about their network, because these journalists can’t make a phone call. It’s so self-destructive, it’s mind-boggling.

Apple seems to have given up on AT&T. Their latest new features, like video chat, won’t even try to use AT&T’s network. They’re just hoping that you’ll be in wifi coverage most of the time. And sales of the wifi-only model of the iPad imply that a wifi-only iPhone would probably sell as well as the AT&T one.

All this money in the bank won’t save AT&T any more than it saved Microsoft. And it won’t buy them happiness – ask Scrooge McDuck about that. It might have bought them a future, but it’s probably too late even for that, now.

It doesn’t matter anymore, though. Yesterday, from my Evo, I sent a text message from my bedroom! The future is now!

Mensch -- A coding font

2010-06-21T00:00:00-07:00

The latest MacOS release (10.6, or “Snow Leopard”) comes with a new monospace font. It’s called “Menlo” and it’s a slightly modified form of the standard Linux font (with appropriately weighty Linux name) “DejaVu Sans Mono”, which is itself an updated form of Bitstream Vera Sans Mono. Apple’s modifications are a definite improvement to my eyes, mostly because they thicken up some of the wispier glyphs from DejaVu, like the underline and period. There’s a great comparison over here.

One thing that bothered me, though, is that they turned the zero into a 1980s-style “slashed circle”. Unhip, daddy-o! Naturally I searched for a font editor, and the best one I found was Font Forge, an old Linux app ported to the Mac but still requiring X11. So that’s two ways OS X is borrowing from Linux for font support. What’s up with that? Was there an elite cadre of fontistas working on Linux machines in a secret bunker? Linux is, um, not usually known for its great designers.

I couldn’t limit my tweaking to the zero glyph, so in the end I made about a dozen changes. Bitstream released these fonts with a very open license that only requires that you change the name if you change anything about the font, so I’m releasing my changes with the same license, as the font “Mensch”.

A summary of the changes I made:

Zero is back to being a circle with a dot in the middle.
The ampersand’s loop was closed. That was also bugging me.
Three is rendered in the gothic style, because the gothic style is clearly superior.
Lowercase L is drawn in a more traditional style (the Menlo variant looks bizarre to me), and one is turned gothic. I think the original artist drew the L weirdly to make it extremely clear that it’s not a one, but if you draw a gothic one, the difference is obvious even with a simpler L.
The bang, question, lowercase I, and lowercase J are made friendlier by turning the dots into actual circles, not just blocks.
Angle brackets are embiggened to facilitate languages like Java and C++ that use them to enclose class names. They parallel parens and square brackets in size now.
Q and lowercase Q are made more distinct. Lowercase Q gets a little more spunk. (This was a bit gratuitous.)

Here’s a sample of Mensch in use in code:

Doesn’t it look cool? Don’t you want it now? Here’s the link!

mensch.ttf

A very tiny, monospace, bitmap font

2010-01-23T00:00:00-08:00

Long ago, when making an SSH client for a phone, we needed the tiniest possible monospace font we could find that was still basically legible. Brian Swetland – who was also the instigator of this side project – had made a 4x6 font (3x5 usable pixels) for a Palm Pilot VT100 emulator. Here’s what it looks like, blown up four times:

When I first tried it out doing things like reading email and editing code, a couple of things bugged me. The uppercase letters were great. The lowercase letters had looked fine individually, but when used in long strings together, they tended to look like a child’s writing. I eventually figured out that this was because they all had different middle heights, and modified them so that they had a fairly tall mid-height (one pixel shorter than uppercase) but were all consistent. This made some of the individual letters (like “x”) look odder, but it made them much easier to read as a whole. It also let the “a” and “g” look a lot more stylish.

The other thing that bothered me was the digits, which I think were intentionally modeled on the 7-segment LED of an alarm clock. Again, they looked fine on their own or as part of a long string of digits, but when they were in the middle of a line like “Date: 23 Jan 2010”, they stuck out a lot. I pretty much just went in with a belt sander and sanded off the corners so they would seem more rounded.

The final version, which we jokingly called “Tom Thumb”, is below:

And here’s a dumb comparison of a line of text, with the modified form on top and the original underneath:

Brian’s page implies that his font was licensed under the MIT license, so since I did these modifications in my free time, the same license applies here. Feel free to download the BDF file below if you would find this useful, or would like to modify it further for some other nefarious purposes. (Update from 2015: As per comments below, Brian has authorized his font to be released under the CC0 or CC-BY 3.0 license. Therefore, this font may also be used under either CC0 or CC-BY 3.0 license.)

The BDF actually includes a sprinkling of Latin-1 characters too, but the pixel limitations hit those characters a lot harder than ASCII, so… be forewarned. :)

tom-thumb.bdf

More real-world git

2009-11-29T00:00:00-08:00

Git is really confusing for new users who have come over from subversion or perforce. On one hand, I can admire, in a sort of detached objective way, Linus’s commitment to making the tool bare-bones and focusing on trying to make the command-line tools as fast as possible. On the other hand, many of the defaults are maddeningly obscure, and there are large gaps where the starship’s hallway just ends in a catwalk and piles of exposed wiring. “Watch your step here. We haven’t felt like finishing this part.”

So, cool that git has reached a critical mass where it’s bringing DVCS (“distributed version control systems”) to people who never would have tried them before. But it means that, as opposed to bazaar, we need to share a lot of knowledge and best practices in order to make it work smoothly. Consider this a sequel to my last git post.

Single master

A few times I’ve heard people argue that if you’re using git (or any DVCS) and you have a central repository, then “You’re Doing It Wrong”. I disagree.

I think most software projects, whether they are a library, a server, or a website, have a single linear series of releases. Toaster 1.3.2 is followed by Toaster 1.3.3, etc. On this macro scale, each release is meant to be an improvement or progression over the previous one. If you have this kind of system, then you’re probably going to have a single repository somewhere which holds the “authoritative” copies of production or release branches, even though lots of people will have local copies of them.

DVCS makes it easy to have a large cloud of coders, each with their own copy of the source, equally authoritative from git’s point of view, and therefore makes it easy to fork projects, which is pretty useful in open source. But it doesn’t require you to treat each repository as equally authoritative in your workflow. It works just fine with the model of a single centralized repository. It would be foolish if it didn’t, since that’s the way almost every software project works.

The key is that you can fork a branch from the “master” branch, experiment for an hour on the train, and then if you want, you can merge back in, keeping all of your change history. If you can hack on things wherever and whenever you want, and sync back up later, You’re Doing It Right.

Why you shouldn’t fast-forward

As far as I know, git is the only DVCS that has a “fast-forward” merge feature. Maybe that’s why they have it turned on by default. Please, git maintainers, change this default because it is wrong. It’s a clever feature, and when you want it, it’s a nice tool to have around, but default behavior should always be the most commonly-wanted behavior, and normal (non-fast-forward) merges are the most commonly-wanted behavior for a merge.

I didn’t explain fast-forward merges well last time, so I’m going to try again. Let’s imagine you write software for the jukebox in a Waffle House. You have a pretty small codebase so far.

And you decide you want to try hacking on the randomizer code, so that when it plays songs at random, it’s more likely to play songs with “Waffle House” in the name. You make a branch.

And you hack on it for a while, and it works!

Meanwhile, nobody has been working on the main branch, so nothing has happened there. It’s time to merge back in so you can release this awesome new code. If you do this with git merge, it will do a fast-forward merge, meaning it will just ignore the existence of your branch and pretend that you were working on the master branch all along. Most importantly, it will not create a merge point that you can identify later. The information that you ever had a feature branch is gone.

If you want to revert this feature later (possibly because it’s driving the staff crazy), you have to figure out which changes were involved, and revert them one by one.

However, if you did a “normal” merge, using git merge --no-ff, there will be a specific revision marking the merge.

No history is lost. You can see everything that happened on the feature branch if you like, and you can also revert the entire branch by reverting the merge.
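To make the difference concrete, here is a minimal sketch of the two merge styles using a toy commit graph. This is not how git stores anything; `Commit`, `MergeDemo`, and the commit ids are made up for illustration. It only shows that a fast-forward reuses the feature tip as-is, while a non-fast-forward merge adds one commit with two parents.

```scala
// A toy model of a commit graph, not git's real data structures.
final case class Commit(id: String, parents: List[Commit])

object MergeDemo {
  // Fast-forward: master simply adopts the feature tip.
  // No new commit is created, so nothing records that a merge happened.
  def fastForward(master: Commit, feature: Commit): Commit = feature

  // Non-fast-forward: always create a merge commit with two parents.
  def mergeNoFf(master: Commit, feature: Commit): Commit =
    Commit(s"merge(${feature.id})", List(master, feature))

  // Follow first parents only, similar in spirit to `git log --first-parent`.
  def firstParentHistory(tip: Commit): List[String] =
    tip.id :: tip.parents.headOption.map(firstParentHistory).getOrElse(Nil)

  // Master is at A; the feature branch added B and C on top of it.
  val base = Commit("A", Nil)
  val f1 = Commit("B", List(base))
  val f2 = Commit("C", List(f1))
}
```

With these values, `MergeDemo.firstParentHistory(MergeDemo.fastForward(MergeDemo.base, MergeDemo.f2))` is `List("C", "B", "A")`: the branch has dissolved into master. The `mergeNoFf` version gives `List("merge(C)", "A")`, so the whole feature hangs off a single commit on master that can be identified later. In real git, that merge commit is the thing you would revert, for example with `git revert -m 1`.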

Cross merging

One nice feature of DVCS is cheap branching. After figuring out that creating and merging branches is as easy as making a commit, most people jump right in to the workflow of creating a new branch for every bugfix or feature. But you can still get stuck in the “star model”, where every branch is forked and merged only to the master branch. And, as David Yang pointed out, if you have a long-term branch, merging it into master can cause a giant conflict that has to be resolved at the last minute.

It doesn’t have to be that way though. You can and should merge the master branch into your branch often. This works because DVCS like git use “merge strategies” that look for the most recent common ancestor revision, and play through changes from that point forward. Every time you merge master into your branch, you have a more recent common ancestor (marked with * on the diagram below), so there is less to merge, and conflicts are resolved on your branch.
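The common-ancestor idea can be sketched with a toy revision graph. This is a simplified illustration, not git's actual merge-base algorithm (which has to cope with multiple candidate ancestors); `Rev`, `MergeBase`, and the revision names are hypothetical.

```scala
import scala.annotation.tailrec

// Toy revision graph: each revision knows its parents.
final case class Rev(id: String, parents: List[Rev])

object MergeBase {
  // Every revision reachable from `tip`, including `tip` itself.
  def ancestors(tip: Rev): Set[String] =
    tip.parents.foldLeft(Set(tip.id))((seen, p) => seen ++ ancestors(p))

  // Breadth-first walk back from `a`; the first revision that is also
  // reachable from `b` is treated as the common ancestor in this toy model.
  def commonAncestor(a: Rev, b: Rev): Option[String] = {
    val reachableFromB = ancestors(b)
    @tailrec
    def walk(frontier: List[Rev]): Option[String] = frontier match {
      case Nil => None
      case rev :: rest =>
        if (reachableFromB(rev.id)) Some(rev.id) else walk(rest ++ rev.parents)
    }
    walk(List(a))
  }

  // master: m1 <- m2 <- m3, with a branch forked from m1.
  val m1 = Rev("m1", Nil)
  val m2 = Rev("m2", List(m1))
  val m3 = Rev("m3", List(m2))
  val b1 = Rev("b1", List(m1))
  // b2 is the result of merging master (at m2) into the branch.
  val b2 = Rev("b2", List(b1, m2))
}
```

Before the cross merge, `MergeBase.commonAncestor(MergeBase.m3, MergeBase.b1)` is `Some("m1")`. After merging master into the branch, `MergeBase.commonAncestor(MergeBase.m3, MergeBase.b2)` is `Some("m2")`, so only the changes after m2 are left to reconcile. In a real repository, `git merge-base` reports the ancestor git would use.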

If you do the last merge right before you merge back into master, it won’t even be possible to have conflicts, because you took care of them all. Our deploy system uses this fact, and auto-rejects any branch that won’t merge without conflict. It’s the branch owner’s responsibility to keep each branch merged up to date.

You can also cross-merge between two unrelated branches, which is helpful if they’re dependent on each other. (Maybe fixing bug 13 requires bug 12 to be fixed too.) This has the side effect of making the two branches interdependent, but if they were already interdependent, you’re fine.

Okay, that’s all for this installment!

Fanout queues in kestrel 1.2

2009-10-10T00:00:00-07:00

I bumped kestrel to version 1.2 this week and there are a few new features:

Queue files are loaded at startup time now, before opening the listening port.
You can explicitly abort an outstanding transactional GET by passing the /abort option (in the same way as /open or /close).
You can also peek at the current head item on a queue with /peek.
New options max_item_size and sync_journal.
Queues can be deleted with DELETE, which just forgets every item and erases any journal file (ie: fast & unrecoverable).
Some bugs in stats reporting were fixed.
Hopefully the documentation is much better. There’s a guide now.

Thanks to Clement, Zachary Kim, Blair Zajac, and Frederic Jean for submitting patches!

Special thanks to Justin Azoff, who’s responsible for the coolest new feature: fanout queues. My coworkers had been asking for this feature for a while, and he posted a branch on github which added them in a way that makes sense to me, so I pulled it in with minor changes. Here’s how they work:

A fanout queue is a “parent” queue that has “children” queues. Whenever an item is added to the parent, the same item is added automatically to each child. The children each have their own journal files, and are essentially independent queues. An item removed from one child doesn’t affect any other children. In fact, you can even add items directly to a child if you want, bypassing the fanout. The primary distinction is that SETs to the parent are repeated to each child automatically.

A child queue is created automatically by referring to a queue with a + in its name. For example, a queue named “orders+audit” is considered to be a child of the “orders” queue. Once a child is created, it starts receiving new items immediately, but existing items in the parent are not copied over. (The child’s history begins now.) Fanout stops when a child is DELETEd.

Children use the same configuration as their parent, to make things easier. There’s no specific configuration for a child. Also, you can expect fanout queues to use more disk space and memory, since items are being copied around multiple times.
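The fanout rules above can be sketched as a toy in-memory model. This is not kestrel's implementation (real queues are journaled and configured per queue); `QueueSet` and its `set`, `get`, and `delete` methods are made-up names that only mirror the behavior described.

```scala
import scala.collection.mutable

// Toy in-memory model of the fanout rules; no journals, no configuration.
class QueueSet {
  private val queues = mutable.Map.empty[String, mutable.Queue[String]]

  // Referring to a queue by name creates it on demand,
  // including child names like "orders+audit".
  private def queue(name: String): mutable.Queue[String] =
    queues.getOrElseUpdate(name, mutable.Queue.empty[String])

  // Children of "orders" are the existing queues named "orders+<something>".
  private def childrenOf(name: String): List[String] =
    queues.keys.filter(_.startsWith(name + "+")).toList

  // A SET to a parent is repeated to each child; a SET to a child stays there.
  def set(name: String, item: String): Unit = {
    val children = childrenOf(name)
    queue(name).enqueue(item)
    children.foreach(child => queue(child).enqueue(item))
  }

  // Removing an item from one queue never touches any other queue.
  def get(name: String): Option[String] = {
    val q = queue(name)
    if (q.isEmpty) None else Some(q.dequeue())
  }

  // Deleting a child forgets its items and stops the fanout to it.
  def delete(name: String): Unit = {
    queues.remove(name)
    ()
  }
}
```

In this model, if "a" is set on "orders" before anyone mentions "orders+audit", the child starts out empty. A later "b" lands in both queues, and each can be drained independently.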

Hopefully the simplicity of the implementation makes it easy to reason about, which I think is more important than having a lot of clever tricks or magic.

Kestrel is, as always, hosted here: http://github.com/robey/kestrel

Follow-up on Scala on Android

2009-07-19T00:00:00-07:00

Dan Bornstein had a correction about my guesswork on how dx works, and I got permission to post it here. I said:

The end result is a little wasteful: even a tiny “hello world” app is 800K, probably because of that big ol' scala library. I suspect dex is proactively throwing away classes that I didn’t use, or it would be even bigger.

Dan replies:

Actually, dx doesn’t do any sort of tree-shaking. It’s just that I obsessed over making the dex format compact, and I got a bit of help on that front from mikef who designed a super-tight encoding for debug info (line numbers, local variables, etc.).

An uncompressed dex file is usually a little smaller (5% or so) than a corresponding (compressed) jar file, and for distribution a compressed dex file is generally about 40-45% the size of that.

For comparison on the distribution front, a compressed pack200 is still usually a bit better than a compressed dex file, weighing in at 30-35% of the original compressed jar size, a win which it gets by translating class files into a compressability-optimized intermediate format (doing stuff like collating immediate integer constants out of bytecode streams). The distinction here is that the dex format loses a bit of compressability, trading that for suitability to be directly executed.

We decided that the overhead (both in terms of code and runtime) of doing something like pack200 wasn’t worth it for us, at least not yet, because most apps don’t actually have that much code (much more likely to be heavy with media), though if people start regularly including language runtimes with their apps, we may want to revise.

Oh, a much more minor tidbit: You can pass dx options for the underlying JVM by prefixing them with “-J”, e.g. “-JXmx2000m” or what-have-you.

Actors talk for SDForum

2009-06-25T00:00:00-07:00

Last night I got to give a short talk at SDForum on “solving real problems with actors”, or more specifically, why I got interested in using scala actors, how they’re used in naggati and kestrel, and what I learned about where they’re useful and not. It was fun, even though I was a little nervous – they had me speaking after Carl Hewitt, one of the instigators of the actor model, and Frank Sommers, who’s writing a book about actors in scala.

Here are the slides: http://robey.lag.net/docs/robey-actors-sdforum.pdf

Thanks to everyone who came, and everyone who chatted afterwards! It was worth the drive. :)

Performance bug in kestrel 1.1

2009-05-18T00:00:00-07:00

If you’re using (or have tried to use) kestrel 1.1, you should upgrade to 1.1.1 (in github) for a performance fix.

Kestrel uses naggati, an actor-adapter for mina, to handle I/O and session management. Mina is a java library for handling thousands of open connections across NIO, and naggati converts mina events from java-style “listener method calls” to actor messages. I wrote all about mina & naggati in another article.

One feature added to naggati 0.6 was the ability to filter out mina events you didn’t care about. So if, for example, you didn’t care to receive an event message every time mina finished transmitting a block of data, you could filter out those messages:

IoHandlerActorAdapter.filter(session) -= classOf[MinaMessage.MessageSent]

Potentially this saves you a lot of unwanted actor messages that you’d just ignore anyway, causing fewer context switches. If the feature works correctly… aye, there’s the rub.

Naggati 0.6 would accidentally overwrite any filter changes that happened during an actor’s constructor. Kestrel’s session actors changed their filters in the constructor, and so their filter changes were lost. This caused kestrel actors to receive reams of unwanted messages like MessageSent, which by itself would only be a minor annoyance, and would have the same performance behavior as kestrel 1.0.

But, in actors, if you don’t pattern match on a message type, and you receive a message of that type, the message just sits in your queue forever. Long-lived connections would eventually have thousands of these unwanted MessageSent messages, which had to be linearly traversed to get to interesting messages. Scanning this queue on every I/O event eventually slowed down a long-lived session to a crawl – and it consumed memory in the process.
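A minimal sketch of why that hurts, using a toy mailbox rather than the real scala actors library: `Mailbox`, `Msg`, and the two message types below are stand-ins I made up, and `scanned` exists only to count the work done.

```scala
import scala.collection.mutable

// Stand-ins for the mina event messages; not naggati's real types.
sealed trait Msg
case object MessageSent extends Msg
final case class MessageReceived(data: String) extends Msg

// Toy mailbox: receive() scans from the head for the first message the
// handler is defined at, and leaves everything it skipped in the queue.
class Mailbox[A] {
  private val pending = mutable.ArrayBuffer.empty[A]
  var scanned = 0L // total messages examined across all receive() calls

  def send(msg: A): Unit = {
    pending += msg
    ()
  }

  def receive[B](handler: PartialFunction[A, B]): Option[B] = {
    val index = pending.indexWhere { msg =>
      scanned += 1
      handler.isDefinedAt(msg)
    }
    if (index < 0) None else Some(handler(pending.remove(index)))
  }

  def size: Int = pending.size
}
```

If a session has accumulated 1000 `MessageSent` messages that nothing matches, a single `receive { case MessageReceived(d) => d }` has to step over all 1000 before it finds the one it wants, and they are all still sitting there for the next call.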

This bug is now fixed in naggati 0.7 and kestrel 1.1.1. Kudos to John Kalucki, who found the bug during his own stress tests and persevered when I couldn’t reproduce it myself.

Sidekick LX party

2009-05-15T00:00:00-07:00

When Nick, Evan and I stumbled out of “Star Trek” at the Metreon last night, it was 1am. Nick announced that there was some kind of party at the Mezzanine, and Jack was there, but it seemed pretty late, and my iPhone said there was a final bart train in 10 minutes, so I decided to clock out and go home. Nick and Evan went on to the club.

All 4 bart entrances on Market around Powell were gated up, though. (How exactly is one supposed to take the last bart train?) So I texted back that I was coming anyway and went to the Mezzanine in Mint Plaza. The place was set up like an LA bar in decline, with bouncers and a bunch of signs celebrating the “Sidekick LX”, which is apparently the name of the final Sidekick. I wasn’t sure I was at the right place and had to text Nick a bit before I decided to go in.

It was packed, even at 1am. There was no cover and it was an open bar filled with all kinds of kitschy decorations to celebrate the launch of Danger’s final phone in some sort of fake Metropolis-slash-amalgamation of all American cities (plus an English phone booth). One of the first things I witnessed was a set of fake “telephone poles” fall on top of partiers, light bulbs tinkling under our feet. I immediately took this as a metaphor for Danger.

There were no other Danger people there, and a very buxom woman wearing a Sidekick LX shirt wasn’t even sure what Danger might be. But never mind, the drinks were free, and DJ Jazzy Jeff was on stage. Another guy was “MC”ing, which appeared to mean that for several hours he had to walk around on stage shouting “What! What! DJ Jazzy Jeff in the house! Say what! Yeaaaaah!”

Everyone in the audience was taking pictures of each other and texting – using blackberries and iPhones. The only Sidekick I saw anywhere was in the hands of a drunk T-Mobile rep. But Twitter was represented! A looping movie behind the “MC” showed a bird flying around, and a Sidekick that opened up to reveal 3 apps: Facebook, Myspace (?!), and Twitter. Neat! Jack teased me a few times about the work I’d put into making a push-notification service for the Sidekick, which they were too disorganized to make use of for six months until we finally just threw the code away. I think the current Sidekick Twitter app just uses the same polling API as everyone else. Not a metaphor: That’s a good example of how poor Danger’s follow-through was.

This all could very well have been a dream or nightmare, or both combined. Certainly an event that combines Twitter, Danger, DJ Jazzy Jeff, and free drinks sounds highly improbable. But I must report it as I remember it, because that is the only way I know how.

Does anyone actually use JMX?

2009-03-29T00:00:00-07:00

In java 6 update 7, or 1.6.0_07, or whatever goofy release naming scheme they’re using now, there’s a cool replacement for jconsole called “visualvm”. (On my mac, it didn’t come with the installed java, but it can easily be downloaded.) It cleans up jconsole a little bit, and merges in some of the other tools. You can monitor, take heap snapshots, and even do an extremely skeletal form of profiling.

We were tinkering with hooking up some of our servers' stats to monitoring tools by exposing them as “mbeans” across JMX (Java Management Xtreme), and someone suggested that it would be nice if the server’s configuration could be viewed and edited through the same console interface. So during some of the more lucid moments of my two-week-long cold, I tried hooking up a JMX interface to configgy to do just that.

It hasn’t gone extremely well.

Java – and JMX in particular – has reams of documentation, but most of it assumes you are a drooling infant or recent PHP convert. Quite a lot of it assumes you want to write your own replacement for jconsole or visualvm. Very little of it is written for someone who may just want to hook up management tools to an existing app or server. This seems strange when java is used so heavily for servers.

I finally figured out that the best interface for a tree of config nodes was to create a tree of DynamicMBean objects which report their attributes as being the attributes on the config node. This works pretty well. The interface through visualvm won’t win any awards (and would make a UX person weep openly) but functions basically the way you expect and lets you discover and edit things in a straightforward way.
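Here is a minimal sketch of that shape, assuming a config node can be treated as a plain mutable map of strings. It does not use configgy's real node type, and `ConfigNodeMBean` is a hypothetical name; only the `javax.management` interface and classes are real.

```scala
import javax.management._
import scala.collection.mutable

// Hypothetical stand-in for a config node: a mutable map of string values.
class ConfigNodeMBean(values: mutable.Map[String, String]) extends DynamicMBean {
  def getAttribute(name: String): AnyRef =
    values.getOrElse(name, throw new AttributeNotFoundException(name))

  // Setting an unknown name adds a new config item.
  def setAttribute(attribute: Attribute): Unit =
    values(attribute.getName) = String.valueOf(attribute.getValue)

  def getAttributes(names: Array[String]): AttributeList = {
    val list = new AttributeList()
    names.filter(values.contains).foreach(n => list.add(new Attribute(n, values(n))))
    list
  }

  def setAttributes(attributes: AttributeList): AttributeList = {
    for (i <- 0 until attributes.size) {
      setAttribute(attributes.get(i).asInstanceOf[Attribute])
    }
    attributes
  }

  // This sketch exposes no operations.
  def invoke(action: String, params: Array[AnyRef], signature: Array[String]): AnyRef =
    throw new ReflectionException(new NoSuchMethodException(action))

  // Built fresh on every call, so newly added keys are reported as attributes.
  def getMBeanInfo(): MBeanInfo = {
    val attrs = values.keys.toArray.sorted.map { key =>
      new MBeanAttributeInfo(key, "java.lang.String", "config value", true, true, false)
    }
    new MBeanInfo(getClass.getName, "config node", attrs,
      Array.empty[MBeanConstructorInfo], Array.empty[MBeanOperationInfo],
      Array.empty[MBeanNotificationInfo])
  }
}
```

An instance could then be registered with `ManagementFactory.getPlatformMBeanServer.registerMBean(node, new ObjectName("com.example:type=Config"))`, where the object name is just a placeholder. One node per mbean, with child nodes registered under related names, would give the tree.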

The fatal flaw in the UI is that if you add a new config item, visualvm doesn’t update the display tab to show the new attribute. (Jconsole behaves similarly.) At some level, it does appear to know to re-fetch the “mbeans” for the config tree, because I can see it doing that in the debugger. But it doesn’t use this information to update the display.

Looks like I have 2 options:

Register “mbeans” as the config nodes, and just live with the fact that they won’t be updated for new attributes. If you add a new attribute, you’ll have to disconnect visualvm and reconnect it to see everything.
Any time any config attribute changes, unregister and re-register the “mbeans” for config nodes. This makes visualvm destroy all open tabs to config nodes and rebuild the tree of config nodes in the left panel. You have to open the nodes back up to see the changes.

For now, I’m leaning toward #1 but both options make me sad. The punchline is this article I found while searching around on google, since I was convinced I must be doing it wrong:

http://forums.sun.com/thread.jspa?threadID=5218394

The thread summary seems to be: “We have documented a way for dynamic ‘mbeans’ to indicate that their attributes may change. We did not code our own tools to follow our own documentation so no existing tool uses this hint. Has our incompetence convinced you that you really want to do what you want to do, or are you eyeing that python book with a bit more interest now?”

Is there anyone out there that actually uses JMX for real?

Actors, Mina, and Naggati

2009-03-02T00:00:00-08:00

A few weeks ago, Jorge Ortiz gave a good talk at BASE about actors. I want to summarize a bit of that and explain why I wrote naggati, and why I think mina-plus-actors is a big deal.

The history

In the 90s, we server coders wrote daemons using the fork/exec process model. Apache still does it this way, though you can switch to threaded models in the current version.

Having a bunch of processes makes coordination difficult (you have to use shared memory, pipes, or something similar) but works well for a web server where there isn’t really any shared state except configuration, and you can just respawn new processes if the configuration changes.

For a chat server, you’d like a lot of shared state and communication between the various sessions. So in the late 90s (early 2000s on Linux), we started using threads. Each incoming connection spawns a new thread.

Each session is exactly one thread. Because threads all share the same memory space, communication between sessions is easy. You just have to make sure to use locks and condition variables, and to get the locking logic right – which can be tricky.

When you start to have a lot of sessions, the number of active threads begins to get a little unmanageable. At a past job, we had a daemon that would handle between 3000 and 4000 sessions this way. We couldn’t push it past that, because the OS wasn’t able to allocate stack space for more than about 4000 threads, and the task switching overhead was getting pretty ridiculous anyway.

Most sessions are blocked waiting for new data most of the time, so in the mid-2000s, server coders migrated to a thread-pool model.

Java even included a good thread pool library in java 5. In this model, you can have thousands of open sessions or socket connections, but you farm out work across a smaller pool of threads. Usually you have one or two threads in a socket poll() call (or Selector in java) across all the open sockets, and when incoming data arrives, it posts a work item into a queue, which gets handled by the next available thread from the pool.
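As a rough illustration of the shape of this model, here is a sketch using the `java.util.concurrent` executor API. It leaves out the selector thread entirely and just pretends every session produced one work item; `PoolDemo` and `simulate` are made-up names.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object PoolDemo {
  // Pretend `sessions` open connections each produced one work item, and
  // handle them all on a fixed pool of `workers` threads.
  // Returns the number of work items that were handled.
  def simulate(sessions: Int, workers: Int): Int = {
    val pool = Executors.newFixedThreadPool(workers)
    val handled = new AtomicInteger(0)
    val done = new CountDownLatch(sessions)
    try {
      for (_ <- 1 to sessions) {
        // In a real server the selector thread would post this when data arrives.
        pool.execute(new Runnable {
          def run(): Unit = {
            handled.incrementAndGet() // stand-in for "decode and respond"
            done.countDown()
          }
        })
      }
      // Wait (up to 10 seconds) for the pool to drain the queue.
      done.await(10, TimeUnit.SECONDS)
    } finally pool.shutdown()
    handled.get
  }
}
```

`PoolDemo.simulate(5000, 4)` handles 5000 work items with only 4 threads, which is the whole point: the number of sessions no longer dictates the number of threads.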

Here’s where mina comes in

Mina takes this thread-pool pattern and pulls it out into a reusable library. You can register a socket with mina and get an IoSession object back. There’s a Selector running in its own thread, and when an I/O event happens on one of your registered sockets, mina calls an event method on your IoHandler, passing in the IoSession. These methods represent events like “new data has arrived” or “some data you sent has been successfully transmitted”, and they are called from within a thread pool, or other executor of your choosing.

The great thing mina adds to this mix is the idea of a “protocol decoder”. Instead of handling I/O events in your session and doing stateful buffering and protocol decoding there, mina lets you write a ProtocolFilter that receives incoming data as a byte buffer. This filter can do the buffering and decoding, and pass decoded objects on to the session. For example, a web server could have a decoder that takes byte buffers and generates HttpRequests.

The protocol decoding happens in a special set of I/O processor threads maintained by the mina library, so your IoHandler only gets events for fully decoded objects, and the handler code can be small and simple. In the web server example, an IoHandler would receive HttpRequest objects as events that run in a worker thread from a thread pool. The response can be created and queued for writing, and then the event is over. Since the queued work items are small, discrete tasks that finish quickly, many active sessions can be handled across a much smaller number of threads.

There are only two big speedbumps with this system:

You’re using threads, so you need to be very careful about locking and all the other threading issues.
The protocol decoder is isolated into its own filter class, but it still has to be stateful and buffered. So it will still be ugly and gnarly.

Enter actors…

Actors solve problem 1. There are several good articles about actors out there, so I won’t belabor the point. With actors, you can write code in a way similar to writing threaded code, but all communication happens through message passing. The actor library promises that any actor will only be running in one thread at a time, so you can avoid using locks entirely if you want. Shared state can be passed around as “messages” that are immutable objects.

There’s a pretty clear one-to-one mapping between the events that mina would like to notify you about, and messages passed to an actor. It’s straightforward enough that I put it into naggati as a helper class called IoHandlerActorAdapter. It creates an IoHandler that handles every event by sending a corresponding message to the actor handling that session. (For example, the messageReceived method causes MinaMessage.MessageReceived to be sent.)

This line in kestrel is all that’s needed to tell mina to create a new actor for each incoming connection:

acceptor.setHandler(new IoHandlerActorAdapter(session => new KestrelHandler(session, config)))

…and naggati

Mina’s written in java, so they can’t do much about the way protocol filters need to be written, beyond giving you some tools to store state between calls and some base classes you can use to get implementations of some common bits (like reading a line). But scala’s syntax supports attaching code blocks to method calls as Function objects, so we can write code that looks sequential but is actually a series of continuations.

Naggati is a library that makes it easy to build protocol filters for mina 2.0 using this sequential style. A DSL allows you to write the decoder in sequence, but each decoding step is actually a state in a state machine, and when one step is finished decoding, it calls the code block for the next step.

As an example, let’s say there’s a simple protocol that sends packets. Each packet starts with an (ascii) count of the number of bytes in the packet, followed by a linefeed, and then the actual bytes. (This is roughly how HTTP chunked encoding works, though HTTP writes the count in hex.) We could write the decoder like this:

case class DataBlob(data: Array[Byte])

val decode = readLine { line =>
  readByteBuffer(line.toInt) { data =>
    state.out.write(DataBlob(data))
    End
  }
}
val decoder = new Decoder(decode)

The first line just defines a case class for the decoded objects we’re going to be throwing to an actor inside a MinaMessage.MessageReceived event.

The decode step is where it gets interesting. It just calls readLine with a block of code that will process that line when it’s read. readLine returns a Step object that contains that code block and some logic for reading bytes from mina’s buffers until a linefeed is found. Once a complete line has been read, readLine passes it to the saved code block.

In this example, that code block just contains another call (readByteBuffer) which returns a Step that reads a certain number of bytes and then passes them on to the attached code block. The nested code block takes that byte array and sends it back to mina as a decoded object, to be handed off to a session actor. End is a special state object that means decoding is complete, and the decoder should reset and start over at the beginning again.

We’ve written a tiny state machine here, which buffers data as it arrives and progresses through each state as the previous state is completed. But we didn’t have to write it that way. By passing code blocks around, we got to write a very simple, sequential description of the protocol without worrying about callbacks or buffers.
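For readers who don't speak scala, here is a rough sketch of the same idea in python. None of these names come from naggati: Step, read_line, read_bytes, and Decoder are made up for illustration, and the buffer is a plain bytearray rather than one of mina's. It only tries to show how "a step plus a code block" turns into a state machine that can be fed data in arbitrary chunks.

```python
# Hypothetical python analogue of the Step idea; not naggati's actual API.
END = object()  # sentinel: packet finished, restart at the first step


class Step:
    """One decoder state: a readiness test plus a continuation."""

    def __init__(self, try_read, then):
        self.try_read = try_read  # bytearray -> value, or None if more data is needed
        self.then = then          # value -> next Step, or END


def read_line(then):
    def try_read(buf):
        i = buf.find(b"\n")
        if i < 0:
            return None           # no complete line buffered yet
        line = bytes(buf[:i]).decode("ascii")
        del buf[: i + 1]          # consume the line and its linefeed
        return line
    return Step(try_read, then)


def read_bytes(count, then):
    def try_read(buf):
        if len(buf) < count:
            return None           # keep buffering
        data = bytes(buf[:count])
        del buf[:count]
        return data
    return Step(try_read, then)


class Decoder:
    def __init__(self, first_step):
        self.first_step = first_step
        self.current = first_step
        self.buf = bytearray()

    def feed(self, chunk):
        """Buffer a chunk and run as many steps as the data allows."""
        self.buf.extend(chunk)
        while True:
            value = self.current.try_read(self.buf)
            if value is None:
                return
            nxt = self.current.then(value)
            self.current = self.first_step if nxt is END else nxt


decoded = []

def emit(data):
    decoded.append(data)
    return END

decoder = Decoder(read_line(lambda line: read_bytes(int(line), emit)))
for chunk in (b"5\nhe", b"llo3", b"\nabc"):   # packets split at awkward places
    decoder.feed(chunk)
print(decoded)  # [b'hello', b'abc']
```

The protocol description is still the one-liner passed to Decoder; all the buffering lives in the two step helpers.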

There are a couple of more in-depth examples in the naggati source code, including an ASN.1 tag decoder, and a simple HTTP request decoder. They’re too long to quote here, but they demonstrate things like using a value from one decoded state to help decode a later state. Kestrel is another good example, because it uses naggati to decode memcache requests in a pretty small chunk of code.

With actors, mina, and naggati, I think the promise of a simple programming model for servers with thousands of connections is here. You can spend your time thinking about and coding up the actual logic of a server, and not worry so much about how to juggle all those sockets and threads.

Hopefully in this tiny space I’ve convinced you too.

Paramiko is on github now

2009-02-16T00:00:00-08:00

A story

A few years ago, at Danger, I wrote an SSH client for the Hiptop (or Sidekick) phone, which we gave away to developers for free as a demonstration of our API and the cool things you could do with it. Later, the developer program would be effectively shut down, and the app would be renamed “Terminal Client” and sold to hapless users for $10. But I didn’t know that yet.

The protocol implementation was awful. SSH protocol “documentation” was just a handful of poorly written RFC drafts that described things in seemingly random order. I hacked on some java code until I got it mostly functioning and left it at that. But it was so buggy and bare-bones that it often would only connect to certain OpenSSH daemon versions, and this eventually bothered me enough that I wanted to fix it.

Python was my favorite language at the time. The syntax is so simple and clean that I feel like I’m writing in pseudocode. It’s a very close match to “the way I think” when I’m trying to solve a problem, so there’s a very low impedance between what I think and what I type. If I’m trying to sketch out a rough draft of code, I usually do it in python as a prototype. And since the SSH protocol seemed like a difficult problem to me, I decided I should try to write an SSH library in python, as a prototype for doing it in java.

The basic idea was to use a thread to handle the protocol level of the socket, so that SSH’s packet windowing requests would get immediate responses, and supply an API to the client that looked just like a normal python file. Data should arrive, get handled by the protocol thread, and stuffed into a buffer so that a client read could get it. The read would trigger window feedback to the server, but the client could go do other stuff while waiting for data.

It worked beautifully. I called it paramiko as a dorky esperanto pun and released it, and after a while, to my moderate surprise, people were using it. I didn’t really anticipate that it would be useful for automating tasks for working with server machines and clusters, though it’s kind of obvious in retrospect.

Python’s thread model (the dreaded Global Interpreter Lock) isn’t awesome, but when threads are mostly blocked on I/O, they’re good enough. The bazaar team obsessed about performance, and helped find and solve some real bottlenecks, so that in the end, paramiko was slower than OpenSSH, but not by much.

Over time, the feature set exploded. Now it can be an SSH client or server, or an SFTP client or server, and has a few experimental SFTP features and a few made-up ones (just to see if they were useful). I wanted to try writing a standalone SFTP server based on it, for making little file-server sandboxes, but… I only have so much free time. :)

I did end up using what I learned to write “jaramiko”, a java library for SSH, and used that in later releases of the Sidekick SSH client, but it was never really as powerful or successful as paramiko. In retrospect, it’s obvious why: Java is not even remotely as expressive or concise as python, so most of the java code is boilerplate that obscures the real logic. It’s hard to keep up-to-date and maintain, much less add new features to.

What now?

I haven’t made a new release of paramiko since July. It feels like even longer than that. Really, to me, the library is finished. I’d like to fix any remaining bugs in it, but I don’t get many bug reports any more, and most of the mailing list support questions are answered by other users.

I don’t want the project to die. I just have to admit to myself that I’m not really taking a leading role in it anymore. So I’ve migrated the repository to git and put it on github at github.com/robey/paramiko, where anyone can fork it and submit patches. I’ll make regular releases as long as there are still bugs to fix, or new features that look cool.

Why git? I like bazaar better (see some previous rants on this blog) but the rest of the world has chosen git, and I don’t have the energy to keep fighting that fight. It’s also easier for me if I can use the same VCS everywhere. Keeping the project on github, in front of my face, will make it less likely to bitrot. At least, that’s my hope.

Scala on Android

2009-01-19T00:00:00-08:00

Chris DiBona gave me a free G-phone after an open-source talk last week (thanks!) with the admonishment that I write an app for it. I’m going to give it an effort, at least!

My first task was to figure out the scala-android build process, because that will make coding much easier. Turns out to be easier than the instructions I found online, so I thought I’d share.

You don’t have to use Eclipse like the instructions say. You can use the ant build file, but you have to modify it to find and compile using the scala compiler.

First, copy scala-compiler.jar and scala-library.jar out from your scala-home and into your android project folder. Since the compiler is only needed for building, and the library is the only one you’ll need installed, I made a folder for the compiler and put the library in libs/:

$ mkdir scala-compiler
$ cp $SCALA_HOME/lib/scala-compiler.jar scala-compiler/
$ cp $SCALA_HOME/lib/scala-library.jar libs/

Second, edit build.xml and change the compile task to look like this:

<target name="compile" depends="dirs, resource-src, aidl">
    <javac encoding="ascii" target="1.5" debug="true" extdirs=""
           srcdir="${srcdir}" includes="**/*.java"
           destdir="${outdir-classes}"
           bootclasspath="${android-jar}">
        <classpath>
            <fileset dir="${external-libs}" includes="*.jar"/>
        </classpath>
    </javac>

    <taskdef resource="scala/tools/ant/antlib.xml"
             classpath="scala-compiler/scala-compiler.jar:${external-libs-ospath}/scala-library.jar" />
    <scalac force="changed" deprecation="on"
            srcdir="${srcdir}" includes="**/*.scala"
            destdir="${outdir-classes}"
            bootclasspath="${android-jar}">
        <classpath>
            <fileset dir="${external-libs}" includes="*.jar"/>
        </classpath>
    </scalac>
</target>

Third, you need to edit dx from the android toolkit so that it gets lots of memory. Uncomment the line with the java heap option and give it a lot of memory, like 512MB ("-Xmx512m", with no equals sign). It needs this because dex is going to convert the entire scala library into dalvik. Without a lot of memory, it will run out of heap space.

The end result is a little wasteful: even a tiny “hello world” app is 800K, probably because of that big ol' scala library. I suspect dex is proactively throwing away classes that I didn’t use, or it would be even bigger.

Java SE6 on Leopard

2008-12-01T00:00:00-08:00

I could not find this anywhere on google in one place, but I pieced it together from clues Steve found by poring over a half dozen other blogs.

To get Java 6 to run on Leopard, you must go into /System/Library/Frameworks/JavaVM.framework/Versions and blow away the Current and CurrentJDK symlinks. Make them point to A:

$ sudo rm Current CurrentJDK
$ sudo ln -s A Current
$ sudo ln -s A CurrentJDK

Now launch the Java Preferences app and move the JDK order to the way you want it.

Setting the symlinks manually to 1.5 or 1.6 will make text apps work, but jconsole will blow chunks. It seems like the Java Preferences app changes “what’s in A” and then the symlinks make that work.

If you have the symlinks set up in the pre-Leopard way, the prefs app won’t fix them.

Sigh. Apple.

Scarling » Kestrel

2008-11-27T00:00:00-08:00

Ever since we deployed scarling in production, its name has progressed from being a stale joke to an annoyance. It started out as a test of porting starling to scala, but as features have been added and it’s been hardened by the real world, it has needed its own identity. I wanted to stay with the bird theme, because I think it’s cute, so after spending several days mulling over possible new names, I’ve settled on “kestrel”.

Poof! And so it is. What was scarling is now kestrel. I’ve updated the code and github page: http://github.com/robey/kestrel/

Meanwhile, over the last couple of weeks, I’ve added three big features.

Big Queues

One of the working assumptions of kestrel is that queues are normally empty. A healthy system has more consumers than producers, so new work items are gobbled up as fast as they come in, and we can usually keep all queued items in memory, for the rare times there are any items queued at all.

However, you may have an event – like a realigning US presidential election – which causes brief “high traffic” bursts that temporarily overwhelm consumers. It became clear at Twitter that we needed to have a graceful soft-landing for these bursts, to prevent kestrel from running out of memory or needing manual intervention.

In kestrel’s git-head, now, when a queue passes a memory limit (128MB by default), that queue will be dropped into what I call “read-behind mode”. New items are added to the queue by writing them into the queue’s journal, but not kept around in memory. Instead, we just keep the first 128MB of the queue head in memory, and track a second file pointer to our journal. As items are read from the queue, this new file pointer replays the journal from behind, filling the queue back up until it either catches up with the write pointer or fills up 128MB again.

In effect, we’re keeping a window of the head of the queue in memory, and using the journal as a disk store for the rest. It nicely caps memory consumption and the added disk I/O can be amortized out across consumers.
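Here's a toy python model of read-behind mode, just to make the two-pointer idea concrete. It is not kestrel's code: the "journal" is an in-memory list standing in for the file on disk (and is never trimmed), sizes are measured with len(), and the class and method names are invented for the sketch.

```python
from collections import deque


class ReadBehindQueue:
    """Toy model: keep only a capped window of the queue head in memory."""

    def __init__(self, max_memory_bytes):
        self.max_memory_bytes = max_memory_bytes
        self.journal = []         # stand-in for the on-disk journal
        self.memory = deque()     # in-memory window on the head of the queue
        self.memory_bytes = 0
        self.replay_pos = 0       # journal index of the next item not yet in memory
        self.read_behind = False

    def add(self, item):
        self.journal.append(item)  # always journaled first
        if (not self.read_behind and self.memory
                and self.memory_bytes + len(item) > self.max_memory_bytes):
            self.read_behind = True   # over the limit: stop keeping new items
        if not self.read_behind:
            self._load(item)

    def _load(self, item):
        self.memory.append(item)
        self.memory_bytes += len(item)
        self.replay_pos += 1

    def remove(self):
        if not self.memory:
            return None
        item = self.memory.popleft()
        self.memory_bytes -= len(item)
        self._refill()
        return item

    def _refill(self):
        # Replay the journal from behind until we catch up or hit the cap.
        while self.read_behind:
            if self.replay_pos == len(self.journal):
                self.read_behind = False   # caught up with the write pointer
                return
            nxt = self.journal[self.replay_pos]
            if self.memory and self.memory_bytes + len(nxt) > self.max_memory_bytes:
                return
            self._load(nxt)


q = ReadBehindQueue(max_memory_bytes=10)
for item in (b"aaaa", b"bbbb", b"cccc", b"dddd"):
    q.add(item)
print(len(q.memory), q.read_behind)      # 2 True
print([q.remove() for _ in range(4)])    # [b'aaaa', b'bbbb', b'cccc', b'dddd']
print(q.read_behind)                     # False
```

Items still come out in order; the only thing that changes in read-behind mode is where the tail of the queue lives.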

You probably don’t want to let an out-of-control queue grow forever because it will fill up your disk, but this should make it cope well with short-term spikes and give you one less thing to worry about when the snake is trying to swallow the pig.

Blocking fetches

Something that’s bothered me about using the memcache protocol is that there’s no way for a consumer to do a blocking fetch from a queue. If an item is immediately available, kestrel will give it to you. If not, you’ll immediately get a “nothing” response. Since, like I just said above, you always want to have more consumers/workers than work items, these consumers swarm all over the cluster, asking for work and immediately being sent away empty-handed. Just to keep them from going crazy, we have ruby client code that looks something like this:

while !(response = QUEUE.get(queue_name))
  sleep 0.25
end

Good grief. If we’re going to let the workers take a nap on the job, we could at least make it happen while blocking on a queue fetch.

So I did a little sneaky thing with queue names in the memcache “get” command by letting clients add options to the end, separated by slashes. Slashes aren’t allowed in filenames anyway so they were never valid in queue names. Then I made a timeout option, so a client can ask to block for work for some amount of time:

while !(response = QUEUE.get("#{queue_name}/t=250")); end

The “t=250” option means “if there’s nothing in the queue right now, I’m willing to wait up to 250 milliseconds for something to arrive”. After that timeout, if there’s still nothing, kestrel will answer with the usual empty-response. It’s important here to make sure that your memcache client is set to have a read-timeout larger than the timeout you send in the “get” request.

This was the easiest thing to implement after I worked out how. Each queue just has a kernel-style wait-list of clients attached to it. If a client makes a timeout-style “get” request, and the queue is empty, we just put the client on the wait-list and the client’s actor does a receiveWithin(timeout) to wait for a message saying something new has arrived. When items are put on the queue, the first wait-list client is removed from the wait-list and notified.
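A small python sketch of the same behavior, with one big substitution: instead of actors and an explicit wait-list, it leans on a condition variable (threading.Condition), which keeps its own list of waiters. The parse_key helper and the BlockingQueue class are hypothetical names, and the only option parsed is the "t=" timeout.

```python
import threading
from collections import deque


def parse_key(key):
    """Split 'name/t=250' into (name, timeout in ms or None)."""
    name, *options = key.split("/")
    timeout_ms = None
    for opt in options:
        if opt.startswith("t="):
            timeout_ms = int(opt[2:])
    return name, timeout_ms


class BlockingQueue:
    def __init__(self):
        self.items = deque()
        self.cond = threading.Condition()

    def put(self, item):
        with self.cond:
            self.items.append(item)
            self.cond.notify()        # wake one waiting consumer, if any

    def get(self, timeout_ms=None):
        with self.cond:
            if timeout_ms is not None:
                # Sleep until an item shows up or the timeout expires.
                self.cond.wait_for(lambda: len(self.items) > 0,
                                   timeout=timeout_ms / 1000.0)
            return self.items.popleft() if self.items else None


name, timeout_ms = parse_key("jobs/t=250")
q = BlockingQueue()
print(q.get())                        # None (plain get returns immediately)
threading.Timer(0.05, q.put, args=(b"work",)).start()
print(q.get(timeout_ms=timeout_ms))   # b'work', after about 50 milliseconds
```

The consumer's loop no longer needs its own sleep; the nap happens inside get.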

The ManyClients load test exercises this by having 100 (or 500) clients pile on to a queue with blocking fetches while a single producer slowly trickles out data. It seems to work like a charm.

Reliable Fetch

Writing something into a queue is pretty reliable. The client does a “set” operation, and if it worked, kestrel responds “STORED”. Naturally, it only sends that response after the item has been written into the queue’s journal file. The “STORED” response means kestrel has taken responsibility for the item.

Fetching from a queue is not such a happy story. When kestrel sends an item to a client, it will never get an acknowledgement or confirmation, and has to blithely assume that the client got all the data okay and took responsibility for it. If a client loses its connection during the data transfer, or crashes right after receiving a work item, that item is gone forever.

So I added an “open” option to “get” which opens a tentative fetch on a queue. If an item is available, kestrel will remove it from the queue and send it to the client as usual. But it will also set the item aside and prepare to “un-get” it if the client disconnects without confirming it. So a tentative fetch is started with:

QUEUE.get("#{queue_name}/open")

and confirmed with:

QUEUE.get("#{queue_name}/close")

which returns nothing. For efficiency, you can also confirm a previous fetch and get the next item in one operation (avoiding an extra round-trip):

QUEUE.get("#{queue_name}/close/open")

Each client connection may only have one outstanding tentative fetch, and if a connection is dropped, any tentatively-fetched item will be put back on the head of the queue and given to the next available consumer.
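The bookkeeping for this is small enough to sketch in python. This is an illustration of the rules described above, not kestrel's implementation: the class, the conn identifiers, and the disconnect hook are invented, and raising an error on a second outstanding open is my assumption about what "may only have one" should mean.

```python
from collections import deque


class ReliableQueue:
    def __init__(self):
        self.items = deque()
        self.pending = {}   # connection id -> its one tentatively fetched item

    def put(self, item):
        self.items.append(item)

    def get(self, conn, opening=False, closing=False):
        if closing:
            self.pending.pop(conn, None)     # confirmed: forget it for good
        if not opening:
            if closing:
                return None                  # a bare "/close" returns nothing
            return self.items.popleft() if self.items else None  # plain get
        if conn in self.pending:
            # Assumption: a second open without a close is treated as an error.
            raise RuntimeError("only one tentative fetch per connection")
        if not self.items:
            return None
        item = self.items.popleft()
        self.pending[conn] = item            # set aside in case we must un-get
        return item

    def disconnect(self, conn):
        item = self.pending.pop(conn, None)
        if item is not None:
            self.items.appendleft(item)      # back on the head of the queue


q = ReliableQueue()
q.put(b"a")
q.put(b"b")
print(q.get("c1", opening=True))                 # b'a'
q.disconnect("c1")                               # c1 dies before confirming
print(q.get("c2", opening=True))                 # b'a' again, for the next consumer
print(q.get("c2", opening=True, closing=True))   # b'b' (the "/close/open" form)
print(q.get("c2", closing=True))                 # None
```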

I want to briefly make a distinction here between confirming that a client receives an enqueued item and confirming that some useful work was done on it. Kestrel can really only concern itself with the former. As a good queue server, it would like confirmation that a client has accepted responsibility for an item before that item is erased from the queue and journal. But it has no way of confirming that “something useful” was done with that item. You still need to write careful client code to ensure that an item isn’t lost after it’s received.

Using reliable fetch means you are protected from losing items, at the expense of potentially receiving duplicates – that’s the trade-off. A client may successfully handle a fetched item but crash before confirming it to kestrel, and the item may then be given to another client. I think this is a good trade-off, though. If you know you may handle some items twice, you can design your system so that duplicate work is harmless – versus the case where you may lose items and don’t have any recourse.

Summary

With these three new features, you should be able to survive large bursts of traffic more easily (with big queues), allow incoming items to be processed immediately by the next available consumer (with blocking fetches), and deliver items reliably even to flaky consumers (with reliable fetch). They expanded the code size 50%, from 1000 lines to 1500, but I think they were worth it, because they solve several limitations inherited from starling.

Coder Dvorak

2008-09-20T00:00:00-07:00

Some kind of bug was spreading around at work a few weeks ago and got me interested in alternate keyboard layouts again. Several of my co-workers use Dvorak, which never really interested me. But Evan pointed me to a page on “programmer dvorak” and suddenly my interest was piqued.

Dvorak rearranges the letter keys so that letters used more frequently in english are closer to the home row, and Programmer dvorak continues that concept into the symbol keys and attempts to rearrange them so that symbols frequently used in code are close at hand. The stroke of genius that makes it worthwhile is that he moves the number keys up to shifted positions, making space for more non-shifted symbols. I don’t hit the number 8 nearly as much as I hit underline, plus, or the “squiggly braces”.

Unfortunately, he then moves the number keys into a jumbled-up order so that they’re no longer in incrementing or any other intuitive order. I’m not sure why. (I later found out that this is apparently the original Dvorak number layout which nobody uses.)

Being the type of person I am, I decided I could do better, so I constructed a “coder dvorak” based on this idea, but keeping the number keys in order, and moving symbols to positions that make sense for a modern language. (There’s not much need to keep $ nearby unless you still do a lot of perl.)

I grouped the symbol keys into four loose categories, the first being “paired” (like parentheses), and the other three ordered from “most used” to “least used”. Then after trying it out for a few days, I re-assigned several symbols because I made some mistakes in guessing frequency of use – both period and slash were used way more than I thought, for example.

The layout I finally adopted, unchanged for a month now, is below:

I used the awesome Ukelele to create the Mac keyboard layout file, which can just be dropped into ~/Library/Keyboard Layouts/ in your home folder.

A lot of the design came from whiskey-inspired ideas from Britt and Evan. Originally, I was going to put the paired keys on symmetric sides of the keyboard, like having “(” on 5 and “)” on 7, which we all convinced ourselves made sense. But after trying that for a day, it was clearly wrong. It turns out that by the time my fingers jump the two rows from the home row to the number row, the finger I use to hit a key isn’t 100% deterministic. It depends on what else I’ve been typing. The home row “bindings” don’t hold as tightly up there. So it was as if I’d put the symmetric keys randomly apart from each other.

Some placements worked out better than I expected. The underline key (used heavily in ruby, python, and scala) almost couldn’t be in a more accessible place. Similarly, period and slash ended up in great places, and putting bang (!) and question (?) on the same key is strangely intuitive.

The thing I have the most trouble with, after about 6 weeks, is the Dvorak letter layout.

1. Dvorak is right-handed-centric.

Great, because I’m right-handed. But it’s not that simple! Look at your QWERTY keyboard: more letters are centered around the left hand than the right. QWERTY even devotes one of the 8 precious home-row positions to semi-colon on the right hand – hardly a frequently used english key! (Most english writers have never even learned correct usage of the semi-colon.)

I never realized, until typing on Dvorak, that QWERTY has made my left hand a much stronger typer. Dvorak relies on the right hand a lot more, and I was in the strange position of having my right hand get tired pretty quickly for the first couple of weeks.

2. English has a different distribution than a programming language.

It mostly works, but there are a few cases that sting. Words like “if” and “while” appear all the time in code, but are difficult to type in Dvorak because F and W are in awkward places. A real coder layout would probably reorganize the keys to match actual words used in programming languages, but I chickened out. It seemed like a very large gratuitous change.

The worst is “I” versus “U”. “I” is used all over the place in coding. It’s also the most common loop variable name. So it should have been on the home row instead of “U”, which is rarely used.

Anyway…

I’m liking it so far. I may change my mind in another month or so, because I’m pretty hard to please, but it’s working well for now.

The keyboard layout can be downloaded here: coder-dvorak.keylayout

Update - Nov 2013: Evan Weaver has provided a Windows keyboard file also: coder-dvorak.klc

OS X Leopard JNI Link Error

2008-09-02T00:00:00-07:00

I just spent a long time tracking this down, and nobody else on google had ever reported a solution. So I’m posting this so the next person with this problem can find this post.

If you see this error when linking something in JNI:

$ gcc -dynamiclib -o libDirectorySyncer.jnilib target/c/dsync.o -framework JavaVM
ld: framework not found JavaVM

it’s because leopard has messed up your JDK folder:

$ ls -l /System/Library/Frameworks/JavaVM.framework/Versions/Current
/System/Library/Frameworks/JavaVM.framework/Versions/Current@ -> 1.6.0

Wrong! It needs to be pointing to “A”. No idea why.

$ cd /System/Library/Frameworks/JavaVM.framework/Versions/
$ sudo rm Current
$ sudo ln -s A Current

That will fix it. Yay!

CJC Prime Number Sieve

2008-08-27T00:00:00-07:00

Apologies in advance. This is probably the geekiest post I’ve ever done.

To get an SSH client working – with a reasonable response time – on a 200mhz ARM chip a few years ago, I had to optimize some crazy things. One of those things was the generation of prime numbers, which are needed to create new public keys.

You may have heard that it’s very difficult to factor large numbers. This is true, but it’s actually a lot easier to determine whether a large number is “probably prime”. Most prime number generators create a random number from a secure source of entropy and then run it through a Miller-Rabin test, which can identify composite (non-prime) numbers. Each round of the test catches a composite number at least ¾ of the time, so a number that survives a round is probably prime. You can perform the test multiple times to improve those odds. More info here: Miller-Rabin test
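In case the link rots, here is a compact python version of the textbook Miller-Rabin test. It is the standard algorithm, not the code from my SSH client; the function name and the default of 20 rounds are arbitrary choices for the sketch.

```python
import random


def is_probable_prime(n, rounds=20, rng=None):
    """Miller-Rabin: False means definitely composite, True means probably prime."""
    rng = rng or random.SystemRandom()
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)      # random base with 2 <= a <= n - 2
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue                     # this base says nothing; try another
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a is a witness: n is composite
    return True


print(is_probable_prime(97))             # True
print(is_probable_prime(2**61 - 1))      # True (a Mersenne prime)
```

Each round costs a big modular exponentiation, which is why it pays to throw out obvious composites before getting here.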

The Miller-Rabin test can be pretty time consuming, though, and since you may need to run it against many random numbers before you’ll find one that’s prime, it would be nice to weed out obvious composites beforehand. For example, you wouldn’t want to test an even number since that’s obviously divisible by 2. So here’s a shortcut I came up with that I call the “CJC prime number sieve”.

The Sum-of-Digits Trick

I thought I would be able to gloss over this part of the explanation by throwing out a few links and saying “go read about the (name) theorem!” But a google search only turns up some anecdotal descriptions, and wikipedia barely devotes an entire sentence to it (here: Modular Arithmetic).

There’s a fairly well-known trick for finding out if a number is divisible by 3: add the digits, and if the sum of the digits is divisible by 3, so is the original number. If it’s not, the original number is not. For example, 4401 is divisible by 3 because 4 + 4 + 0 + 1 = 9, and 9 is divisible by 3. 512 is not, because 5 + 1 + 2 = 8, and 8 is not divisible by 3. (This trick also works for 11 and 9 but these are emo numbers that don’t get as much attention.)

So why does this work? As you can guess from the wikipedia link above, it has to do with modular arithmetic. In (mod k) space, the only numbers that exist are integers from 0 to k - 1. Coders recognize this as being the way all int math works on a computer: a byte can represent a number “mod 256”. It turns out that in (mod k) space, you can cancel out a factor if that factor is relatively prime to k.

Say what? Okay, “relatively prime” just means two numbers don’t share any factors. 9 and 10 are relatively prime because 10 = 2 x 5, and 9 = 3 x 3. But 4 and 6 aren’t, because they share the factor 2.

Sooo… since:

20 = 2 (mod 9)

and 20 is relatively prime to 9, you can figure out 40 mod 9 by factoring out the 20, and replacing it with its equivalent in (mod 9) space, 2:

40 = 2 * 20
2 * 20 = 2 * 2 = 4 (mod 9)

The reason this works well for finding numbers divisible by 3 is that:

4401 = (4 * 1000) + (4 * 100) + (0 * 10) + 1

and:

10 = 1 (mod 3)

Aha! So factoring all the 10s out and replacing them with 1s gives us:

(4 * 1000) + (4 * 100) + (0 * 10) + 1 (mod 3)
(4 * 1) + (4 * 1) + (0 * 1) + 1 (mod 3)
4 + 4 + 0 + 1 (mod 3)
9 (mod 3)
0 (mod 3)

and any number that’s 0 in (mod 3) is of course divisible by 3.

The reason 9 and 11 work is that 10 mod 9 = 1, and 10 mod 11 = -1. (The -1 means you have to do alternate add/subtracts instead of just summing the digits, but it’s not worth obsessing over in this article.)
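If you'd rather see it than take my word for it, a few lines of python will check all three tricks against plain old division. The helper names are made up for this sketch.

```python
def digits(n, base=10):
    """Digits of n in the given base, least significant first."""
    out = []
    while n:
        n, d = divmod(n, base)
        out.append(d)
    return out or [0]


def divisible_by_3(n):
    return sum(digits(n)) % 3 == 0       # 10 = 1 (mod 3)


def divisible_by_9(n):
    return sum(digits(n)) % 9 == 0       # 10 = 1 (mod 9)


def divisible_by_11(n):
    # 10 = -1 (mod 11), so the signs alternate: +d0 - d1 + d2 - ...
    alternating = sum(d if i % 2 == 0 else -d for i, d in enumerate(digits(n)))
    return alternating % 11 == 0


print(divisible_by_3(4401), divisible_by_3(512))   # True False
print(all(divisible_by_11(n) == (n % 11 == 0) for n in range(1, 10000)))  # True
```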

Making a sieve

So that’s exciting… if you do a lot of dividing by 3. But how does this help discover primes?

Well if computers worked in base 10 instead of 2, we’d now have a pretty fast way of determining if a really large number were divisible by 3. And I hope you won’t think I’m being pedantic if I point out here that if you make up a random number, it has about 1/3 chance of being divisible by 3. :)

Let’s say instead of working in base 10, we were working in some other base B that was relatively prime to a lot of interesting numbers. Ideally we’d like to be in a base B such that

B = 1 (mod n)

for as many n as possible. One nice coincidence is that

256 =
255 + 1 =
(3 * 5 * 17) + 1

256 = 1 (mod 3) and
256 = 1 (mod 5) and
256 = 1 (mod 17)

That means that in base 256, summing the digits will tell you quickly if 3, 5, or 17 is a factor. Summing the base-256 digits of a number also goes by the name “adding the bytes in the binary representation”. It actually helps a lot, as you can see from timings at the end of this article.
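As a quick illustration (python here, not the scala from the proof-of-concept), the base-256 check really is just a byte sum. The function name is invented for the sketch.

```python
def small_factors_base256(n):
    """Return which of 3, 5, 17 divide n, using only the sum of n's bytes."""
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    byte_sum = sum(raw)                  # the "digit sum" in base 256
    return [p for p in (3, 5, 17) if byte_sum % p == 0]


print(small_factors_base256(255))        # [3, 5, 17]
print(small_factors_base256(2**64 + 1))  # [] (its two nonzero bytes sum to 2)
```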

If we’re willing to leave powers of 2, though, we can do better. It just so happens that we can construct a base that’s relatively prime to a handful of small primes, yet is close to a power of 2. The key is noticing that, in the 256 case above, multiplying a few primes together and then adding 1 created a number that was both relatively prime to all those primes, and was equal to 1 in each of the primes’ (mod k) spaces.

Think of it this way. If A, B, and C are all prime, then A x B x C will equal 0 in (mod A) because it’s a multiple of A. Same goes for B and C for the same reason. Adding 1 makes it equal 1 in all three (mod k) as long as the primes are all greater than 2. And that also means the resulting number is no longer a multiple of A, B, or C, which makes our new number relatively prime to them. In short, we can construct a base that’s relatively prime to a set of primes, and is equal to 1 in each of the primes’ “mod spaces”, by just multiplying the primes together and adding 1.

We want the base to be close to a power of 2 to preserve our entropy source. If we grab 8 bits (0 - 255) from a secure random pool, but we want a smaller base like 250, we have two choices. We can mod the random byte (rand % 250) to wrap into range, or we can just discard and retry when we get a byte that’s out of our preferred range. If we mod-and-wrap, though, we’re giving more weight to results in the wrapped part – numbers like 1 are going to be more common because both 1 and 251 will mod-wrap to them. That ruins the “secure” part of our randomness, so we need to go with the discard option. And if we have to discard some bytes, we’d like to pick ranges to minimize the chance of a byte being discarded, which means choosing ranges that are as close to a power-of-2 range as possible.
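Both options fit in a few lines of python, and you can count the bias directly without sampling anything. This is a sketch using python's secrets module as the "secure random pool"; the function names are mine.

```python
import secrets
from collections import Counter


def biased_digit(limit):
    """Mod-and-wrap: simple, but low values come up too often."""
    return secrets.randbits(8) % limit


def unbiased_digit(limit):
    """Discard-and-retry: every value below limit is equally likely."""
    while True:
        b = secrets.randbits(8)
        if b < limit:
            return b


# How many of the 256 possible bytes land on each result under mod 250?
counts = Counter(b % 250 for b in range(256))
print(counts[1], counts[100])   # 2 1 (1 is hit by both 1 and 251)
```

With a limit of 250, only 6 of every 256 bytes get thrown away, which is the point of staying close to a power of 2.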

One nice possibility is:

3 * 5 * 7 * 11 * 13 * 17 * 19**2 * 23 + 1 = 2119382266 = 0x7e5334fa

That’s a base that covers 98.7% of the 31-bit range, so it won’t cause many random digits to be discarded, and it’s relatively prime to the first 8 primes excluding 2. (We can ensure that a number isn’t divisible by 2 by just setting its lowest bit to 1 – a bonus for working in base 2!)
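The arithmetic is easy to double-check in python. The make_base helper is just a name for "multiply and add 1"; it isn't part of the proof-of-concept code.

```python
from math import gcd, prod

PRIMES = (3, 5, 7, 11, 13, 17, 19, 23)


def make_base(primes, extra=1):
    """Multiply the primes (times an optional extra factor) and add 1."""
    return prod(primes) * extra + 1


BASE = make_base(PRIMES, extra=19)       # the extra 19 gives the 19**2 above

print(make_base((3, 5, 17)))             # 256
print(BASE, hex(BASE))                   # 2119382266 0x7e5334fa
print(all(BASE % p == 1 and gcd(BASE, p) == 1 for p in PRIMES))  # True
print(f"{BASE / 2**31:.1%}")             # 98.7%
```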

Let’s Go!

At this point, the code should just write itself. To generate a 1024-bit random prime, we need to figure out how many “digits” that would be in a base-2119382266 number, and what the range would be on the highest digit. We’ll lose some of the 1024-bit range because our new base doesn’t map exactly to the desired range, but we’ll never lose more than a single bit’s worth. (There’s probably a fancy proof for this, but if you think about it, you can reason it out pretty quickly in your head.) You already lose 2 bits for any prime, because you need to set the high bit to ensure a prime number of the desired size, and you need to set the low bit to ensure an odd number, as mentioned above.

For a 1024-bit prime, that means a 34-digit number with the highest digit set to 2. For a 2048-bit prime, it’s a 67-digit number with the highest digit between 5 and 8 inclusive. (You can play around with other sizes by calling computeParams in the attached code.)
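Here is how I'd re-derive those numbers in python. To be clear, this is a from-scratch sketch and not a port of computeParams from the attached code: it picks the digit count, then the range of high digits whose numbers all fall inside the desired bit size.

```python
BASE = 2119382266


def compute_params(bits, base=BASE):
    """Return (digit_count, lowest_high_digit, highest_high_digit)."""
    lo, hi = 1 << (bits - 1), 1 << bits   # we want lo <= n < hi
    count = 1
    while base ** count < hi:             # digits needed to reach the top of the range
        count += 1
    while True:
        unit = base ** (count - 1)        # place value of the highest digit
        low_digit = -(-lo // unit)        # ceiling division
        high_digit = min(hi // unit - 1, base - 1)
        if low_digit <= high_digit:
            return count, low_digit, high_digit
        count -= 1                        # no usable high digit: use one digit fewer


print(compute_params(1024))   # (34, 2, 2)
print(compute_params(2048))   # (67, 5, 8)
```

Any number with that many digits and a high digit in that range is guaranteed to have exactly the requested bit length, whatever the lower digits are.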

We can then generate, for a 1024-bit prime, 33 “digits” of 32-bit machine words, masking off the high bit and retrying any time we get a “digit” bigger than (base - 1). Set the high “digit” to 2 (or select a digit at random for other bit choices) and we’re done.

A quick shortcut here: Normally we should sum the digits, and then find the mod of that sum against each of our 8 primes. However, we already have a really convenient thing here: base - 1 is a multiple of all 8 primes (by design), so we can mod the sum by (base - 1) as we add, and not lose any information, keeping the sum small enough to fit into a 32-bit word.

After summing, a few small mod operations tell us if the generated number is divisible by 3, 5, 7, 11, 13, 17, 19, or 23. If it is, we can loop back and generate another number immediately, skipping the Miller-Rabin prime test. If it passes, we still dump it into Miller-Rabin for final verification – the sum check just lets us quickly discard any simple composites.
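Putting the pieces together, a python sketch of the candidate generator might look like this. It follows the steps described above but is not the scala proof-of-concept: the names are invented, secrets stands in for the secure random pool, and the digit count and high digit are the 1024-bit values. The result still has to go through Miller-Rabin.

```python
import secrets

BASE = 2119382266
SMALL_PRIMES = (3, 5, 7, 11, 13, 17, 19, 23)


def random_digit():
    """One base-BASE digit: 31 random bits, retried if out of range."""
    while True:
        d = secrets.randbits(31)          # a 32-bit word with the high bit masked off
        if d < BASE:
            return d


def random_digits(count=34, high_digit=2):
    """Digits least significant first; low digit forced odd, high digit fixed."""
    low = random_digit() | 1              # BASE is even, so an odd low digit means odd n
    middle = [random_digit() for _ in range(count - 2)]
    return [low] + middle + [high_digit]


def passes_sieve(digits):
    total = 0
    for d in digits:
        total = (total + d) % (BASE - 1)  # stays small; BASE - 1 is a multiple of every prime
    return all(total % p != 0 for p in SMALL_PRIMES)


def to_int(digits):
    n = 0
    for d in reversed(digits):
        n = n * BASE + d
    return n


def sieved_candidate(count=34, high_digit=2):
    """Loop until a candidate survives the digit-sum check."""
    while True:
        digits = random_digits(count, high_digit)
        if passes_sieve(digits):
            return to_int(digits)


n = sieved_candidate()
print(n.bit_length(), n % 2)              # 1024 1
print(any(n % p == 0 for p in SMALL_PRIMES))   # False
```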

Code and Results

I wrote a proof-of-concept in scala, and hosted it in git here:

git clone http://www.lag.net/robey/code/cjc/

It includes an implementation of CJC in base 2119382266, alongside a prime number generator that just uses the standard Miller-Rabin prime test by itself, and a variant of the standard M-R sieve that checks the base-256 primes (3, 5, 17) first.

The test code generates 5 primes from each algorithm, using a random number seed given on the command line. (By the way, never use the built-in random number generator like this code does! Use a secure one. I used a seeded generator so the results would be repeatable, which is exactly what you don’t want in the real world.) I’ve posted my own results in a chart below: Using the base-256 check alone gives a 2x speedup, and using CJC gives over 3x! The primary increase, according to the second chart, appears to be due to cutting back on calls to the Miller-Rabin algorithm. The more candidates we can discard before trying M-R, the better.

(By the way, ignore how long these algorithms are taking in wall-clock time. I ran them in a standard JVM without JIT. In real life, you would definitely want this to be in JIT or possibly even C – horrors!)

Anyway, hopefully this is an interesting technique for filtering primes. And the name CJC? It’s named after my friend Communist J. Cat.

Git for the real world

2008-07-13T00:00:00-07:00

Now that we’ve been using git at Twitter for a couple of months, we’ve overcome several crippling problems and misunderstandings about how to use it properly. There are dozens of “intros” and “tutorials” to git online, but at some point you need to know more than just the basics of DVCS and the map to svn commands – you need to know practical considerations of real-world usage. None of the intros or tutorials had this stuff, so I thought I’d share what we learned.

Git’s command-line interface is hands-down the worst of any DVCS (except the archaic tla). There are inconsistencies: Some commands will expect you to type “origin/master”, while others will want “origin master”. Other commands should never ever be used, but are presented in the documentation as if they’re part of a normal usage pattern. Some commands are useless in their default form and need several command-line options to make them work right.

I ended up writing a wrapper script to cover up a lot of these flaws, which I consider an “ultimate fail” for a UI. But I’m still not sure the script is a good idea, since it may make me forget all the quirks I need to keep in mind when the script isn’t around.

Don’t change history

Two commands you should avoid: git rebase and git reset. Some of the tutorials will tell you that rebase is one of the first commands you should learn. Lies! rebase is a way to trick you into creating merge conflicts.

When you rebase, you are erasing every local commit you’ve made, and turning them into patches (as if you were back on CVS). After syncing your repository up with the remote one, your patches are re-applied one by one. Presto! you’ve changed history.

The only reason I can think of for doing this is if you’re not comfortable doing merges. But DVCS is all about merges, so you should just get used to doing them. A merge provides a little signpost to everyone else about your branch. Don’t fear the merge – love it! It records exactly how your local work should be rectified with remote changes, without requiring you to keep tweaking your patch.

reset is even worse. It erases commits from your history, which will very likely make your local repository different from everyone else’s, and guarantee future conflicts or even an inability to push in the future. Some people will say you should learn reset so you can use it in a panic situation, but if you’re panicking, you’re more likely to make things worse, so stop. Calm down. You have time to think and solve the problem in a rational way.

My problem with these two commands is that they violate a core philosophy of DVCS: Everyone has their own view of the repository, but these views obey entropy and flow in only one time direction. When they meet, they merge. Doing a rebase or reset goes back in time and changes the past. They should be in a separate tool, like “git-fix” or “git-hack”.

The story matters more than the chronology

Have you ever read a history book that said “In 1812, the British empire shelled the tiny new American capital. Meanwhile, Napoleon marched across Europe. In China, …”? Hopefully not, that would suck. Telling a thread of the story from beginning to end is more important than placing every single event in its exact chronological order. The default format for git log reorders commits by their exact date and time, so you need to be aware of that and not get confused.

Say, for example, you made a local branch, and made 3 local commits: L1, L2, and L3. Meanwhile, someone else is working on a different feature on their branch, and does commits R1 and R2. After you merge (M1), git is likely to show you a history like this:

M1 -> R2 -> L3 -> L2 -> R1 -> L1

Huh? What? Why are my local commits intermixed with my co-worker’s commits? The merge must have messed up! Crap! Time to git reset and destroy everything, right? No! Stop! Don’t do it. It’s a trap! Git is lying by omission – it’s telling the literal, actual truth, but it’s telling it to you in a way that makes it confusing. Git is re-ordering the history to make sure every commit is shown in its actual time order, not the story order.

You should probably just go ahead and alias log to:

git log --topo-order --decorate

That tells git to show things in “topological” (story) order, and to also mark where various branches are sitting. I usually find it useful to take that one step further:

git log --topo-order --decorate --first-parent

That tells git to show things in story order and to tell that story from my point of view. It’s sometimes interesting to see every commit that one of your coworkers did in their branch, but often you just want to see the merge-commit and move on. "--first-parent" tells git to skip over the details of every branch that isn’t a linear parent of yours. Generally this means you’ll see a simplified history of what’s been going on, without the intricacies of what happened on forked branches while they were forked off.
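If the three orderings are hard to picture, here is a toy model in Python. It is not how git is implemented; it only mimics the orderings on the L1/L2/L3, R1/R2, M1 example. The "base" commit, the timestamps, and all function names are made up for illustration.

```python
# Toy history: name -> (commit time, parents with the first parent first).
# "base" is a hypothetical common ancestor; the times are invented.
HISTORY = {
    "base": (0, []),
    "L1": (1, ["base"]),
    "R1": (2, ["base"]),
    "L2": (3, ["L1"]),
    "L3": (4, ["L2"]),
    "R2": (5, ["R1"]),
    "M1": (6, ["L3", "R2"]),
}


def date_order(history):
    """Newest commit first, ignoring which branch it came from."""
    return sorted(history, key=lambda name: history[name][0], reverse=True)


def topo_order(history, head):
    """Children before parents, keeping each line of history together."""
    unseen_children = {name: 0 for name in history}
    for _, parents in history.values():
        for p in parents:
            unseen_children[p] += 1
    order, stack = [], [head]
    while stack:
        name = stack.pop()
        order.append(name)
        # Push in reverse so the first parent's line is walked first.
        for p in reversed(history[name][1]):
            unseen_children[p] -= 1
            if unseen_children[p] == 0:
                stack.append(p)
    return order


def first_parent(history, head):
    """Follow only the first parent of each commit."""
    order, name = [], head
    while name is not None:
        order.append(name)
        parents = history[name][1]
        name = parents[0] if parents else None
    return order
```

With this history, `date_order(HISTORY)` interleaves the two branches just like the confusing log above, `topo_order(HISTORY, "M1")` lists L3, L2, L1 together before R2 and R1, and `first_parent(HISTORY, "M1")` skips the R commits entirely.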

If you want to see all the threads of history intertwined, I suggest using a graphical tool like gitk instead of git-log.

Don’t fast-forward – live every moment

This one is pretty confusing. And it sucks, because this concept doesn’t even exist in other DVCS. I think it’s another symptom of “fear of merge”. Basically, sometimes when you ask git to merge branch A into branch B, it will decide that it doesn’t want to merge and it will instead turn A and B into clones of each other.

For example, let’s pretend you made a branch of “master” called “feature” and did a few commits on it, and are now ready to merge it back into master. If no other work has happened on the master branch, git will try to out-clever you. It thinks: “Well, nobody else has worked on the master branch, so I could just make the feature branch become the new master branch and that would be logically equivalent.” So after the merge, you’ll see every single commit you made, as if you had done them directly on the master branch. Git has cloned your feature branch into the new master branch.

This might not be so bad if there are only a couple of people working on the project, but there are a few side effects: Your branch has effectively vanished from history. There is no longer any indication that you were working on a side branch; it looks like you were working directly on master. And if it turns out that there were bugs in your new feature (which, you know, sometimes happens), you can’t reverse the merge-commit because there is no merge-commit. You will have to reverse every single commit you made, in reverse order, or worse.

So really, you want git to always create a merge-commit when you do a merge. For this, you have to ask it nicely:

git merge --no-ff

(Git calls the history erasing “fast-forwarding”.)
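Here is a small Python model of the difference, in case it helps. It is only a sketch of the idea, not git's actual code: commits are plain names in a dict of parents, branches are a dict of heads, and every name here (including the `no_ff` parameter, borrowed from the flag) is hypothetical. It also ignores the case where there is nothing to merge.

```python
def is_ancestor(parents, old, new):
    """True if commit `old` is reachable from commit `new`."""
    todo, seen = [new], set()
    while todo:
        commit = todo.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            todo.extend(parents[commit])
    return False


def merge(parents, heads, target, source, no_ff=False):
    """Merge branch `source` into branch `target`, updating both dicts in place.

    Returns the name of the new merge-commit, or None on a fast-forward.
    """
    if not no_ff and is_ancestor(parents, heads[target], heads[source]):
        # Fast-forward: nothing new on target, so just move its pointer.
        heads[target] = heads[source]
        return None
    merge_commit = "merge-%s-into-%s" % (source, target)
    parents[merge_commit] = [heads[target], heads[source]]
    heads[target] = merge_commit
    return merge_commit


def example(no_ff):
    """A 'feature' branch with two commits on top of an untouched 'master'."""
    parents = {"A": [], "F1": ["A"], "F2": ["F1"]}
    heads = {"master": "A", "feature": "F2"}
    made = merge(parents, heads, "master", "feature", no_ff=no_ff)
    return made, heads["master"], parents
```

In the default case, `example(False)` leaves master pointing straight at F2 with no record that a branch ever existed. With `example(True)`, master points at a new commit whose two parents are A and F2, and that single commit is the thing you can later reverse.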

A few other things

To remove a branch from a remote repository after it’s been merged and deployed, you have to push the branch with a colon in front of the branch name. This has become a running joke in the office: “Colon means delete.” Look, don’t ask me, I’m not Linus. That’s just how it works.

git push origin :stale_completed_branch

When other people remove branches, they won’t be removed from your local copy of the repository. To take care of this housekeeping, you need to express a fruit preference:

git remote prune origin

Again, don’t ask. I don’t know why. That’s just how it is.

I have a few ranty topics on how git is implemented and used, and how that compares with the older DVCS (especially bazaar), but I’ll save that for some other time. If you’re using git, hopefully this information is useful.

Git gripes

2008-05-11T00:00:00-07:00

I have to get this rant off my chest.

Why does git insist on making up new terminology for features that are in all DVCS, and already have commonly accepted names?

Last week during a presentation, I almost choked when someone showed off a git feature where you can temporarily shelve uncommitted changes. It’s called “git stash”. Stash?! Mercurial and bazaar at least agree that this is called, um, shelve. Since you are shelving. I don’t have a bag of heroin to stash, Linus. Just patches. To shelve. Now until the end of time I will need to remember which term to use when talking to git people or the rest of the universe.

In git, when you commit, you are actually staging. (Git doesn’t consider a commit to be truly committed until you’ve posted it to someone else’s repository – a charming bit of communism.) A staged commit is, to git, a cached commit. Wikipedia tells me a cache is a copy of data that’s already stored elsewhere, but nevermind, that definition doesn’t apply in the gitverse. Commit = staged; staged = cached; got it?

To switch branches in a working tree, you “checkout” in git. You also “checkout” to create a new working tree for a branch. The manpage for “git-checkout” can’t even avoid sounding confused by this, and admits that the former usage really “switches branches”. Let’s see if we can think of a good term for switching branches. How about “switch”? Hey look, that’s what everyone else calls it! Must be a very odd coincidence. To branch is to “clone”; to revert is to “clean”. Not even the simplest of terms escape newspeak.

I want to end on a positive note, so I will say that compared to every other DVCS, git is fast. I frequently have no idea what it’s doing (is “deltifying” really a word?), or if it’s doing what I’d like it to do, but it’s doing it very very quickly. Maybe there’s a metaphor there.

Scarling

2008-05-07T00:00:00-07:00

I recently spent bits of my free time porting Starling to scala, and thought I’d post the story here in case anyone else finds it enlightening or interesting.

Starling is a simple, reliable message-queue server written in ruby. It uses MemCache protocol, so almost any language can already speak to it. You can set up multiple starling servers, and just like a memcache pool, the servers don’t need to know anything about each other. If clients pick a server at random for each operation, it appears to be a single loosely-ordered queue, with each server holding its own part. Starling is used extensively at Twitter.
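As a rough illustration of that "pool of independent servers" idea, here is an in-memory Python sketch. It does no networking and is not Starling's code; the class and method names are invented, and having `get` try the other servers when its first pick is empty is my own simplification.

```python
import random
from collections import deque


class QueuePool:
    """In-memory stand-in for several queue servers that know nothing of each other."""

    def __init__(self, server_count, seed=None):
        self.servers = [deque() for _ in range(server_count)]
        self.rng = random.Random(seed)

    def put(self, item):
        # The client picks a server at random for each operation.
        self.rng.choice(self.servers).append(item)

    def get(self):
        # Visit the servers in a random order and take from the first
        # one that has anything; each server is strictly FIFO by itself.
        for server in self.rng.sample(self.servers, len(self.servers)):
            if server:
                return server.popleft()
        return None
```

Items put into a `QueuePool(3)` all come back out, but not necessarily in the order they went in: each server keeps its own part in order, and the overall queue is only loosely ordered.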

I’ve been tinkering with scala since December, and have been building a config/logging library in my spare time. It was easy to pick up since I’ve been using java and python for many years, and scala seems to nicely synthesize the styles of those two languages. I felt like I was ready to try a meaningful project, and Starling is potentially the kind of project that could gain a lot from running on the JVM. At a mere 1000 lines of ruby, it was also a nice manageable size.

First chunk

I’m a bottom-up coder, so I started with the PersistentQueue class. It’s the guts of Starling: a FIFO queue backed by a journal, and the means to rotate and replay that journal when necessary. For this class, I was able to do nearly a direct copy, just converting ruby syntax and library calls into scala ones. Only a few places tripped me up:

Scala, like java, has no equivalent of the “pack” function that’s in all of the perl/python/ruby languages. I actually had to write a 10-line class to write a little-endian int into a stream, and another 10-line class to read one. Ack! That wasn’t an auspicious start, and gave me pause. Everything that makes people run from java to higher-level languages was there: You’re given only the most basic, fundamental tools to do I/O, and serializing data is treated like some strange, rare operation.
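For comparison, this is roughly what the "pack" approach looks like in Python, next to a hand-rolled version similar in spirit to what I had to write. The function names are made up for this sketch, and it assumes a signed 32-bit int; the actual journal format isn't shown here.

```python
import io
import struct


def write_int_le(stream, value):
    """Write a signed 32-bit int in little-endian order using struct.pack."""
    stream.write(struct.pack("<i", value))


def read_int_le(stream):
    """Read back a signed 32-bit little-endian int."""
    data = stream.read(4)
    if len(data) != 4:
        raise EOFError("truncated int")
    return struct.unpack("<i", data)[0]


def write_int_le_by_hand(stream, value):
    """The same thing without pack: emit the low byte first, one byte at a time."""
    stream.write(bytes((value >> shift) & 0xFF for shift in (0, 8, 16, 24)))
```

Both writers produce the same four bytes for any value that fits in 32 bits, so anything written by one can be read back with `read_int_le`.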

Aside from that, it went smoothly, though. The payoff was when I downloaded a troublesome 500MB journal file from a running starling server and ran it through the scala journal replayer. Starling would take about a half hour to process the log (and frequently crash the ruby interpreter). “Scarling” (as I call my scala version) was able to process the log in 23 seconds. Exciting!

Next up was QueueCollection, which is what it sounds like: a collection of all the persistent queues, and methods for getting stats on them. Starling does some trickery here to avoid races around queues that are replaying journals. I made a few false starts, trying to either duplicate the logic or improve it, before I decided the really clever thing would be to make each queue an actor. Once I made that leap, the code practically wrote itself, and I stopped for a week.

Second chunk

The remainder of Starling is front-end code for handling connections, speaking memcache protocol, and interfacing with QueueCollection. I needed to break away from porting code at this point, because Starling uses a ruby wrapper around EventMachine, a fast asynchronous event I/O library. New incoming connections create a Handler object and send it events when new data arrives. I was sure this was a place to use actors, but I wasn’t sure how much of the I/O code I’d need to write. (I wrote most of ziggurat, Danger’s async I/O library, so I’m not scared of writing an I/O library, but it would be really time-consuming and non-fun.)

Googling for “java EventMachine” turned up links to an apache project called Mina. Aha! This was pretty much exactly what I was looking for. In fact, it closely resembles a ziggurat-derived library my friend Matt is working on, since they both use the idea of pushing protocol encoders into the I/O event engine. Mina not only creates a new “session” object for incoming connections, it can decode the wire protocol inside its own worker threads, and notify your session of I/O events by sending it fully-decoded objects.

This model is so close to how actors work that it just made my mouth water. I just needed some kind of gasket! More googling revealed that one (and only one) person had already done this work, and written a scala wrapper for Mina that turned “I/O events” into actor messages. Well actually, he did it as a patch against Mina instead of a wrapper, for ill-explained reasons. Oh, and the patch fails to compile. Oh, and all the links on his project page are broken. Oh, and also he has no posted email address. FAIL. I made an attempt to reach him through blog comments (ugh) and decided to do this one on my own.

Luckily, once I read enough of the (actually semi-decent) Mina docs, I was able to connect Mina to actors in less than 50 lines of code. I told you: the two models (Mina and actors) really are made for each other. I hope someone ends up writing a working wrapper for it. Or heck, just use mine as a starting point.

The last piece was a memcache protocol handler, written to Mina’s API. I found one from a project called jmemcached, ported the bits I needed into scala, fixed it up to use some of scala’s nicer collections like ArrayBuffer[Byte], and impatiently patched the whole thing together for some trial runs.

First Results

The first results were pretty discouraging. I wrote a quick test script to open a connection and do 10,000 queue SETs, and on my old Macbook Pro, Starling could do them in under 4 seconds. My scala port could do them in 30. This is roughly a factor of 10 worse – an order of magnitude. Miserable failure. What was I doing wrong?
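For a sense of what such a test does on the wire, here is a Python sketch of it. This is not my original script: the function names, the payload, and the queue name are invented, and the caller is assumed to pass in an already-connected socket. The request format is the standard memcache text-protocol "set" command, which the server answers with "STORED".

```python
import time


def set_request(key, data, flags=0, exptime=0):
    """Frame one memcache text-protocol SET: a header line, then the data block."""
    header = "set %s %d %d %d\r\n" % (key, flags, exptime, len(data))
    return header.encode("ascii") + data + b"\r\n"


def run_sets(sock, queue_name, count=10000):
    """Send `count` SETs one at a time, waiting for each reply.

    `sock` is a connected socket supplied by the caller.
    Returns the elapsed wall-clock time in seconds.
    """
    reader = sock.makefile("rb")
    start = time.perf_counter()
    for i in range(count):
        sock.sendall(set_request(queue_name, b"item %d" % i))
        if reader.readline() != b"STORED\r\n":
            raise RuntimeError("unexpected reply from server")
    return time.perf_counter() - start
```

Because each SET waits for its reply before sending the next one, a test like this measures round-trip latency on a single connection more than raw throughput.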

Scala doesn’t have many (any?) performance benchmarking tools, but java does, and these java tools don’t seem to care if your jar was made by java or scala. I fixed a few small things, but wasn’t making much progress. My inexperience with hprof made me fix the least important things first, but the last two were worth telling about.

Using an actor for each queue was overkill. Reading from or writing to a queue is such a fast operation that the message-passing for “please write to this queue”, “okay done” was deadly. I reluctantly scrapped the actor code there and used simple locks for the queues, and shaved off several seconds. The client connections were already actors, so it was just adding too much overhead to have the queues themselves be separate actors.
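The lock-based shape is simple enough to sketch. This is Python rather than scala, it leaves out the journal entirely, and the class and method names are hypothetical, so treat it as an illustration of "simple locks around a queue" rather than the code I actually ended up with.

```python
import threading
from collections import deque


class LockedQueue:
    """A FIFO queue guarded by one lock instead of a message-passing actor."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = deque()

    def add(self, item):
        with self._lock:
            self._items.append(item)

    def remove(self):
        """Return the oldest item, or None if the queue is empty."""
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)


def hammer(queue, threads=4, per_thread=1000):
    """Have several threads add to the same queue at once."""
    def work(thread_id):
        for i in range(per_thread):
            queue.add((thread_id, i))

    workers = [threading.Thread(target=work, args=(t,)) for t in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Each operation holds the lock only for the instant it touches the queue, so a caller gets its answer directly instead of sending a request message and waiting for a reply message.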

The biggest gain, which took me the longest time to find, and made me feel the most foolish, was in the memcache protocol decoder. I thought I was being really clever by sending incoming data into a scala ArrayBuffer[Byte] and doing functional operations on it. Those operations are sometimes slow as molasses! One, “trimStart”, was using up 1 millisecond per queue push, or 10 seconds of total test time all by itself.

Begrudgingly, I dove into my memcache protocol decoder and decided to do things the Mina way instead of trying to be clever. Using Mina’s ByteBuffer class slashed times dramatically, and suddenly my scala code was as fast as Starling (less than 4 seconds). I actually spent a couple of days fretting about the poor performance before this “eureka” moment, and was very relieved to find out that the slowness was entirely due to my own brain damage, and not to anything in scala or actors.

Better Results

“As fast as ruby” may not sound impressive. In some circles, it’s actually an insult. But for this test, I felt good. First, Starling doesn’t do anything intensive in ruby. It uses ruby’s expressiveness and its libraries to make a small server that’s mostly disk I/O and network I/O bound. EventMachine does a bang-on job of network I/O so there’s little room for improvement.

However… My test was effectively written to play to Starling’s strengths. Since Starling runs in a single thread on a single process (STSP), it excels at handling a single client which makes a lot of requests, but waits for a response between each request. The advantage of the JVM appeared when I changed the test to emulate 100 clients doing 100 queries each.

To Starling, this is exactly the same test (10,000 queries, handled sequentially) so it gets approximately the same result (less than four seconds). Mina-plus-actors starts to shine here, though, and finishes in under two seconds. It’s able to juggle queue work with I/O threads. Success!

Conclusion

I’ve heard there’s a big migration of ruby people to scala, and so the first thing I would say to the ruby people is that this is no panacea. It’s not ruby on a JVM; it’s an entirely new language, with much stronger java roots than any other language, so familiarity with java is probably more helpful than python or ruby. On the other hand, if ruby whetted your appetite for functional programming, scala has more of that than ruby and python combined, and seems to live up to its promise of exposing the wonders of java’s scalability and rock-solid virtual machine and garbage collector.

The main stumbling blocks I hit were the same ones others have talked about: scarcity of tools, and a lack of pure-scala libraries – both probably due to the language being so new. Coming from java, it felt great to express myself with python conciseness. It was like opening up the space capsule and breathing fresh air again. Coming from ruby, it was reassuring to write for a platform and libraries that are solid and well-tested, not just an “overnight project” like many of the core ruby libraries appear to be.

Oh, and in the end, my Starling port was only a little over 900 lines long. I think if I added proper config file support like Starling has, it would end up being roughly the same number of lines (1000), even though I had to write a few large pieces (like the memcache codec) that aren’t necessary in ruby.

The end result, which I plan to keep playing with and expanding, is here: http://github.com/robey/scarling/tree/master

Scala infatuation

2008-03-16T00:00:00-07:00

Why my recent infatuation with scala? (Actually, since it’s reaching a half year, I guess it can no longer be called an infatuation.) Here’s my answer in the form of a quick example. I want to determine the average of a list of timings for something.

In python:

from functools import reduce  # only needed on python 3; reduce is a builtin in python 2

timings = [1.2, 1.5, 1.3, 1.5]
reduce(lambda a, b: a + b, timings, 0.0) / len(timings)

In ruby:

timings = [1.2, 1.5, 1.3, 1.5]
timings.inject(0.0) { |a, b| a + b } / timings.length

Besides demonstrating that ruby is just python with different spelling, this shows they both have some powerful built-in functions and syntax. I can type one line for calculating the average and move on to the real point of my code, whatever that is.

Here it is in scala:

var timings = Array(1.2, 1.5, 1.3, 1.5)
timings.foldLeft(0.0) { (a, b) => a + b } / timings.size

It’s pretty much the same! I can type code that is very close to the pseudocode in my head, and move on.

Whether intentionally or not, scala is borrowing very heavily from the “high level” language features of python (and ruby). But it’s running on top of the JVM, which is still the best combination of interpreter machine + libraries out there. Python’s VM is amateur at best, and ruby’s is even worse. This way is like getting my Reese’s cup, two great tastes that taste great together: a high-level, pseudocode language on top of a racecar engine.

I suspect this is the main reason scala has been getting a lot of attention recently, in spite of the gaping lack of decent documentation or tools.

Oh yeah, here’s that same code in java. I’m not including the creation of the array, because that takes 5 lines all by itself, and wouldn’t be fair to compare:

// assume omitted var: List<Double> timings.
double sum = 0.0;
for (double d : timings) {
    sum += d;
}
double average = sum / timings.size();

For java versions before 1.5, it’s a lot worse.

Incidentally, last night I found a nice introductory article on scala, which I skimmed and recommend.