<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.3">Jekyll</generator><link href="https://chadaustin.me/feed/atom" rel="self" type="application/atom+xml" /><link href="https://chadaustin.me/" rel="alternate" type="text/html" /><updated>2025-03-20T23:31:22-05:00</updated><id>https://chadaustin.me/feed/atom</id><title type="html">Chad Austin</title><subtitle></subtitle><entry><title type="html">(Partially) Repairing a Super NES Classic</title><link href="https://chadaustin.me/2025/03/snes-classic-partial-repair/" rel="alternate" type="text/html" title="(Partially) Repairing a Super NES Classic" /><published>2025-03-20T00:00:00-05:00</published><updated>2025-03-20T00:00:00-05:00</updated><id>https://chadaustin.me/2025/03/snes-classic-partial-repair</id><content type="html" xml:base="https://chadaustin.me/2025/03/snes-classic-partial-repair/"><![CDATA[<p>My friend’s SNES Classic stopped responding to controller inputs. He
reset it to factory settings but couldn’t even get through the
language selection menu without being able to push buttons.</p>

<p>I told him I could take a look.</p>

<p>First I used Hakchi to save a backup of its internal storage.</p>

<p>Then I flashed the factory kernel and system software and erased the
user partition. Same thing. Controller did nothing.</p>

<p>I tried my own controllers to no avail. And his controller worked in
mine, so the issue was with the console itself.</p>

<p>How do we debug further?</p>

<p>The SNES Classic runs Linux on a fairly mainstream ARM Cortex-A7 SOC by
Allwinner, and if you use Hakchi to flash its custom kernel, you can
telnet to the device while it’s running and interrogate it.</p>

<p>Unfortunately, I didn’t keep great logs of this part of the process,
but eventually I noticed something suspicious in <code class="language-plaintext highlighter-rouge">dmesg</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[    -.------] twi_start()434 - [i2c1] START can't sendout!
</code></pre></div></div>

<p>That message comes from the Allwinner SOC i2c-sunxi kernel driver in
<a href="https://github.com/linux-sunxi/linux-sunxi/blob/d47d367036be38c5180632ec8a3ad169a4593a88/drivers/i2c/busses/i2c-sunxi.c#L411">/drivers/i2c/busses/i2c-sunxi.c</a>.</p>

<p>Oh no, an I²C start-packet timeout sounds more like a hardware failure
than anything my friend did with Hakchi. The SNES Classic
controllers (like all Wii Nunchuk-connector accessories) communicate over
I²C (or TWI, if you prefer that name). It’s time to pop open the case.</p>

<figure>
<a href="/images/snes-classic/with-heat-sink.jpeg"><img src="/images/snes-classic/with-heat-sink.jpeg" alt="The inside of the case." /></a>
<figcaption>The inside of the case.</figcaption>
</figure>

<p>With the case open and the heatsink off, there’s not much to it.</p>

<figure>
<a href="/images/snes-classic/board-labeled.jpeg"><img src="/images/snes-classic/board-labeled.jpeg" alt="The unshielded board with labeled components." /></a>
<figcaption>The unshielded board with labeled components.</figcaption>
</figure>

<p>The SOC is a four-core <a href="https://linux-sunxi.org/images/b/b3/R16_Datasheet_V1.4_(1).pdf">Allwinner
R16</a>
with a Mali GPU. It’s quite a capable little chip.</p>

<p>You can also see the DRAM (Nanya
<a href="https://www.mxic.com.tw/Lists/Datasheet/Attachments/8462/MX30LF4G18AC,%203V,%204Gb,%20v1.4.pdf">NT5CC128M16IP-DI</a>),
flash (Macronix
<a href="https://www.mxic.com.tw/Lists/Datasheet/Attachments/8462/MX30LF4G18AC,%203V,%204Gb,%20v1.4.pdf">MX30LF4G18AC</a>),
and PMIC (X-Powers/Allwinner
<a href="https://linux-sunxi.org/images/e/e5/AXP223_Datasheet_V1.0_en.pdf">AXP223</a>)
on the top of the board. The HDMI circuitry (Explore
<a href="https://bbs.aw-ol.com/assets/uploads/files/1639040585144-ep952-%E6%8A%80%E6%9C%AF%E5%8F%82%E6%95%B0.pdf">EP952</a>)
is on the bottom.</p>

<p>Most of that’s irrelevant: the important question was whether I could
find anything wrong with the I²C signals between the controller
connectors and the SOC.</p>

<p>I won’t bore you with all of the random stuff I probed, but I
eventually noticed that controller 1’s SCL line had a low-impedance
path to ground: about 670 Ω. SDA and both of controller 2’s signal
lines measured about 1 MΩ to ground, which makes a lot more sense.</p>

<p>Having the clock incorrectly pulled low would explain why it couldn’t
communicate with controller 1. So where is the fault? It’s either
somewhere in the PCB (unlikely) or in the SOC’s I²C buffer.</p>

<p>If it’s in the PCB, that’s easily fixable: cut traces and bodge a
wire from the closest good point. If it’s in the SOC, that’s something
I can’t fix: while you can easily acquire an Allwinner R16 on Alibaba,
I don’t have the skill to reball and resolder a BGA without breaking
something else. I did find a service that does BGA rework, but it
starts at $200 and the SNES Classic isn’t worth that much.</p>

<p>It was a little hard to trace the path from the connectors to the SOC
because the traces run on inner layers. Eventually, I found some
jumpers to desolder to isolate the fault.</p>

<figure>
<a href="/images/snes-classic/i2c-labeled.jpeg"><img src="/images/snes-classic/i2c-labeled.jpeg" alt="Labeled I²C jumpers." /></a>
<figcaption>Labeled I²C jumpers.</figcaption>
</figure>

<p>I desoldered SCL1’s jumper – well, let’s be honest. I melted that
tiny sucker into a black paste while struggling to apply even heat.
Unfortunately, SCL1’s short to ground is in the chip and I can’t fix
that.</p>

<figure>
<a href="/images/snes-classic/i2c-buffer-failure.png"><img src="/images/snes-classic/i2c-buffer-failure.png" alt="Likely a FET in the I²C buffer failed." /></a>
<figcaption>Likely a FET in the I²C buffer failed.</figcaption>
</figure>

<p>At this point, software fixes became the only option.</p>

<p>I wondered if you could patch the Hakchi kernel to swap
controller 1 with controller 2. It may not even be hard. Then at least
one controller would function.</p>

<p>But I also saw someone on Reddit mention that the system supports
receiving power over USB while acting as a USB host with a powered
<a href="https://en.wikipedia.org/wiki/USB_On-The-Go">USB On-The-Go</a> (OTG) cable.
You can build your own OTG cable with wire snips, a soldering iron,
and heat-shrink tubing. I just purchased <a href="https://www.amazon.com/dp/B00C452XFO">one from
Amazon</a> instead. The reviews
make it clear this is a popular use for the cable.</p>

<figure>
<a href="/images/snes-classic/usb-otg.jpeg"><img src="/images/snes-classic/usb-otg.jpeg" alt="Powered USB OTG Cable" /></a>
<figcaption>Powered USB OTG Cable</figcaption>
</figure>

<p>The stock Nintendo kernel does not support USB controllers but the
latest Hakchi kernel does! I confirmed an Xbox controller can navigate
the menu and every button works in game. Even better, controller port
2 still works as usual!</p>

<p>I wonder if raphnet’s <a href="https://www.raphnet-tech.com/products/wusbmote_1player_adapter_v3/index.php">Classic to USB
adapter</a>
would work so you could keep the SNES controller experience.</p>

<p>At this point, I declared as much victory as this was going to get and
sent it back to my friend. It’s nice to keep quality hardware out of
the e-waste bin.</p>

<figure>
<a href="/images/snes-classic/it-works.jpeg"><img src="/images/snes-classic/it-works.jpeg" alt="It works!" /></a>
<figcaption>It works!</figcaption>
</figure>]]></content><author><name></name></author><category term="electronics" /><summary type="html"><![CDATA[My friend’s SNES Classic stopped responding to controller inputs. He reset it to factory settings but couldn’t even get through the language selection menu without being able to push buttons.]]></summary></entry><entry><title type="html">Unsafe Rust Is Harder Than C</title><link href="https://chadaustin.me/2024/10/intrusive-linked-list-in-rust/" rel="alternate" type="text/html" title="Unsafe Rust Is Harder Than C" /><published>2024-10-24T00:00:00-05:00</published><updated>2024-10-24T00:00:00-05:00</updated><id>https://chadaustin.me/2024/10/intrusive-linked-list-in-rust</id><content type="html" xml:base="https://chadaustin.me/2024/10/intrusive-linked-list-in-rust/"><![CDATA[<h2 id="or-the-most-expensive-linked-list-ive-ever-written">Or: The Most Expensive Linked List I’ve Ever Written</h2>

<p>Some of you already know the contents of this post, especially if
you’ve written embedded or unsafe code in Rust. But I didn’t, so I
thought it was useful to write down my experience as accurately as I
can. Without further ado…</p>

<p>Last year, I wrote Photohash, <a href="https://github.com/chadaustin/photohash">software to help me index my NAS and
find duplicate photos</a> with
rotation-independent hashing and <a href="https://en.wikipedia.org/wiki/Perceptual_hashing">perceptual
hashing</a>. To make
use of cores and keep the disks busy, it distributes work to compute
and IO workers. Work is distributed with channels – synchronized work
queues.</p>

<p>In Photohash, work tends to be discovered and processed in batches:
enumerating directories returns multiple entries and the database is
updated in multi-row transactions.</p>

<p>Rust has a rich selection of channel implementations:
<a href="https://doc.rust-lang.org/std/sync/mpsc/index.html">std::sync::mpsc</a>,
<a href="https://docs.rs/futures/latest/futures/channel/index.html">futures::channel</a>,
<a href="https://docs.rs/tokio/latest/tokio/sync/index.html">tokio::sync</a>,
<a href="https://docs.rs/crossbeam/latest/crossbeam/channel/index.html">crossbeam::channel</a>,
<a href="https://docs.rs/flume/">flume</a>, and <a href="https://docs.rs/kanal/">kanal</a>
are high-quality options.</p>

<p>Unfortunately, none of them exactly met my needs, so I nerd-sniped
myself into writing my dream channel. My previous day job
(<a href="https://github.com/facebook/sapling/tree/main/eden/fs">EdenFS</a> and
<a href="https://github.com/facebook/watchman">Watchman</a>) was full of ad-hoc
channels so I knew roughly what I wanted. <code class="language-plaintext highlighter-rouge">kanal</code> is closest, but it is
riddled with unsafe code and uses spinlocks, which look great in
microbenchmarks but have <a href="https://matklad.github.io/2020/01/02/spinlocks-considered-harmful.html">no place in userspace
software</a>.</p>

<p>Introducing
<a href="https://docs.rs/batch-channel/">batch-channel</a>,
a throughput-optimized channel. The design goals are:</p>

<ul>
  <li><strong>Multi-producer, multi-consumer</strong>. Parallelism in both production
and consumption.</li>
  <li><strong>Sync and async support</strong> for both consumers and producers.
Mix-and-match provides flexibility for use in any type of thread
pool, async runtime, or FFI.</li>
  <li><strong>Bounded or unbounded</strong>. Bounded for backpressure and limiting peak
memory consumption. Unbounded for situations where you cannot
guarantee deadlock freedom.</li>
  <li><strong>Sending and receiving multiple values</strong>. I often want to send
multiple values. Like reading all of the paths in a directory. Or
writing multiple rows to a database. Batching allows amortizing
per-batch costs. It’s silly to acquire the channel lock N times to
push N values. It’s the same on the consumption side: workers may
want to pull all pending work items in one channel read. You might
wonder about lock-free queues, and they have their place, but
you’ll still contend on the head and tail, and atomic operations
remain slow even on modern Intel cores. If you’re going to contend
on the queue anyway, batch-channel’s philosophy is to stick the
whole thing behind a mutex and maximize batch sizes on both ends.</li>
</ul>
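<p>The batching philosophy in that last bullet can be sketched in a few
lines. This is a hypothetical illustration (the helper names are mine,
not batch-channel’s real API): one lock acquisition moves a whole batch,
instead of locking once per element.</p>

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// One lock acquisition for N pushes, instead of N acquisitions.
fn send_batch<T>(q: &Mutex<VecDeque<T>>, batch: impl IntoIterator<Item = T>) {
    q.lock().unwrap().extend(batch);
}

// One channel read drains every pending item.
fn recv_batch<T>(q: &Mutex<VecDeque<T>>) -> VecDeque<T> {
    std::mem::take(&mut *q.lock().unwrap())
}

fn main() {
    let q = Mutex::new(VecDeque::new());
    send_batch(&q, 0..5);
    send_batch(&q, 5..8);
    let got = recv_batch(&q);
    assert_eq!(got.into_iter().collect::<Vec<_>>(), (0..8).collect::<Vec<_>>());
    assert!(q.lock().unwrap().is_empty());
}
```

<p>The per-batch cost (lock, wake) is paid once no matter how many values
move, which is the whole point of a throughput-optimized channel.</p>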

<p>At the time this was written, the following design goals weren’t yet
implemented:</p>

<ul>
  <li><strong>Priorities</strong>. Senders can influence processing order.</li>
  <li><strong>Bounding variable-sized items</strong>. For example, being able to say a
queue can hold up to 20 MB of paths, no matter how long they are.</li>
</ul>

<p>And, finally, the design goal that led to this post:</p>

<ul>
  <li><strong>No allocations under steady-state use</strong>. Allocations are a source
of contention, failure, and overhead, especially when using slow
system allocators.</li>
</ul>

<h2 id="the-shape-of-a-channel">The Shape of a Channel</h2>

<p>To explain why unsafe Rust is even involved, let’s look at the
implementation of a channel.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="n">Channel</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="n">q</span><span class="p">:</span> <span class="n">VecDeque</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span>
  <span class="c1">// Blocked receivers</span>
  <span class="n">waiting_for_elements</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Waker</span><span class="o">&gt;</span><span class="p">,</span>
  <span class="c1">// Blocked senders</span>
  <span class="n">waiting_for_capacity</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Waker</span><span class="o">&gt;</span><span class="p">,</span>
  <span class="c1">// Synchronous use requires some condition variables too.</span>
<span class="p">}</span>
</code></pre></div></div>

<p>When an async receiver blocks on <code class="language-plaintext highlighter-rouge">recv()</code> because the channel is
empty, the task’s <code class="language-plaintext highlighter-rouge">Waker</code> is stored so that the channel knows to wake
it when a value arrives.</p>

<p><a href="https://doc.rust-lang.org/core/task/struct.Waker.html"><code class="language-plaintext highlighter-rouge">Waker</code></a> is a
handle to a blocked task. The channel can signal the async runtime to
wake a task when it should poll the channel again. It’s a <a href="https://doc.rust-lang.org/nomicon/exotic-sizes.html">wide
pointer</a>, two
words in size.</p>
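<p>That two-word claim is easy to verify (a standalone sanity check, not
code from the channel itself):</p>

```rust
use std::mem::size_of;
use std::task::{RawWaker, Waker};

fn main() {
    // Waker wraps a RawWaker: a data pointer plus a pointer to a
    // static vtable, i.e. two machine words.
    assert_eq!(size_of::<Waker>(), size_of::<RawWaker>());
    assert_eq!(size_of::<Waker>(), 2 * size_of::<usize>());
    println!("Waker is {} bytes", size_of::<Waker>());
}
```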

<p><code class="language-plaintext highlighter-rouge">Waker</code> is used like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="n">Recv</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="n">channel</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="n">Channel</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="n">Future</span> <span class="k">for</span> <span class="n">Recv</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span>
  
  <span class="k">fn</span> <span class="nf">poll</span><span class="p">(</span><span class="k">self</span><span class="p">:</span> <span class="nb">Pin</span><span class="o">&lt;&amp;</span><span class="k">mut</span> <span class="k">Self</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">cx</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Context</span><span class="o">&lt;</span><span class="nv">'_</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Poll</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">::</span><span class="n">Output</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="c1">// Is the queue empty?</span>
    <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">element</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="py">.channel.q</span><span class="nf">.pop_front</span><span class="p">()</span> <span class="p">{</span>
      <span class="c1">// The queue has an element, so return it.</span>
      <span class="nn">Poll</span><span class="p">::</span><span class="nf">Ready</span><span class="p">(</span><span class="n">element</span><span class="p">)</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="c1">// Queue is empty so block and try again later.</span>
      <span class="k">self</span><span class="py">.channel.waiting_for_elements</span><span class="nf">.push</span><span class="p">(</span><span class="n">cx</span><span class="nf">.waker</span><span class="p">()</span><span class="nf">.clone</span><span class="p">());</span>
      <span class="nn">Poll</span><span class="p">::</span><span class="n">Pending</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Note</strong>: The above code is illustrative. In reality, the channel has
a <code class="language-plaintext highlighter-rouge">Mutex</code> and some condition variables, but that’s incidental for this
post.</p>

<p>If the queue is empty when <code class="language-plaintext highlighter-rouge">recv()</code> is called, the waker is stored in
the channel and the task enters the blocked state.</p>

<p>Later, when a value is added to the queue, any waiting tasks are
woken:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">send</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">channel</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Channel</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">value</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{</span>
  <span class="n">channel</span><span class="py">.q</span><span class="nf">.push_back</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
  <span class="k">let</span> <span class="n">wakers</span> <span class="o">=</span> <span class="nn">mem</span><span class="p">::</span><span class="nf">take</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">channel</span><span class="py">.waiting_for_elements</span><span class="p">);</span>
  <span class="c1">// NOTE: Mutexes are released here, before waking.</span>
  <span class="c1">// Unless we are clever with cancellation, we have to wake all futures,</span>
  <span class="c1">// because we don't know which, if any, will attempt the next poll.</span>
  <span class="k">for</span> <span class="n">waker</span> <span class="k">in</span> <span class="n">wakers</span> <span class="p">{</span>
    <span class="n">waker</span><span class="nf">.wake</span><span class="p">();</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Here’s the issue</strong>: <code class="language-plaintext highlighter-rouge">waiting_for_elements</code> is a <code class="language-plaintext highlighter-rouge">Vec&lt;Waker&gt;</code>. The
channel cannot know how many tasks are blocked, so we can’t use a
fixed-size array. Using a <code class="language-plaintext highlighter-rouge">Vec</code> means we allocate memory every time we
queue a waker. And that allocation is taken and released every time we
have to wake.</p>

<p>The result is that a naive implementation will allocate and free
repeatedly under steady-state send and recv. That’s a lot of memory
allocator traffic.</p>
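<p>Here’s a tiny illustration of that churn, using a plain integer in place
of a <code class="language-plaintext highlighter-rouge">Waker</code>: after <code class="language-plaintext highlighter-rouge">mem::take</code>, the <code class="language-plaintext highlighter-rouge">Vec</code> left behind has zero capacity, so
every block/wake cycle round-trips the allocator.</p>

```rust
fn main() {
    // Stand-in for the waker list; u32 substitutes for Waker.
    let mut waiting_for_elements: Vec<u32> = Vec::new();
    for fake_waker in 0..3 {
        waiting_for_elements.push(fake_waker); // recv(): allocates
        // send(): steals the allocation, leaving an empty Vec behind.
        let woken = std::mem::take(&mut waiting_for_elements);
        assert_eq!(waiting_for_elements.capacity(), 0); // next push reallocates
        drop(woken); // and the stolen buffer is freed immediately
    }
}
```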

<h2 id="can-we-use-an-intrusive-list">Can We Use an Intrusive List?</h2>

<p>The optimization I want: rather than allocating a <code class="language-plaintext highlighter-rouge">Vec</code> every
time a task blocks on the queue, can we store the list of <code class="language-plaintext highlighter-rouge">Waker</code>s
within the blocked futures themselves? We never need more
<code class="language-plaintext highlighter-rouge">Waker</code>s than there are blocked futures, so that should work.</p>

<p>It should look something like:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="n">Channel</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="n">q</span><span class="p">:</span> <span class="n">VecDeque</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span>
  <span class="c1">// Intrusive doubly-linked list head.</span>
  <span class="n">waiting_for_elements</span><span class="p">:</span> <span class="n">WakerList</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="n">send</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">channel</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Channel</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">value</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{</span>
  <span class="n">channel</span><span class="py">.q</span><span class="nf">.push_back</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
  <span class="k">let</span> <span class="n">wakers</span> <span class="o">=</span> <span class="n">channel</span><span class="py">.waiting_for_elements</span><span class="nf">.extract_list</span><span class="p">();</span>
  <span class="c1">// Release any mutex before waking.</span>
  <span class="k">for</span> <span class="n">waker</span> <span class="k">in</span> <span class="n">wakers</span> <span class="p">{</span>
    <span class="n">waker</span><span class="nf">.wake</span><span class="p">();</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="k">pub</span> <span class="k">struct</span> <span class="n">Recv</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="n">channel</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="n">Channel</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span>
  <span class="c1">// Every Future gets a WakerSlot, which is an intrusive doubly-linked</span>
  <span class="c1">// list node protected by Channel's mutex.</span>
  <span class="n">waker</span><span class="p">:</span> <span class="n">WakerSlot</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="n">Future</span> <span class="k">for</span> <span class="n">Recv</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span>
  
  <span class="k">fn</span> <span class="nf">poll</span><span class="p">(</span><span class="k">self</span><span class="p">:</span> <span class="nb">Pin</span><span class="o">&lt;&amp;</span><span class="k">mut</span> <span class="k">Self</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">cx</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Context</span><span class="o">&lt;</span><span class="nv">'_</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Poll</span><span class="o">&lt;</span><span class="k">Self</span><span class="p">::</span><span class="n">Output</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">element</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="py">.channel.q</span><span class="nf">.pop_front</span><span class="p">()</span> <span class="p">{</span>
      <span class="nn">Poll</span><span class="p">::</span><span class="nf">Ready</span><span class="p">(</span><span class="n">element</span><span class="p">)</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="c1">// Queue is empty so try again later.</span>
      <span class="c1">// Store the Waker in this Future by linking it into channel's list.</span>
      <span class="k">self</span><span class="py">.channel.waiting_for_elements</span><span class="nf">.link</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.waker</span><span class="p">,</span> <span class="n">cx</span><span class="nf">.waker</span><span class="p">()</span><span class="nf">.clone</span><span class="p">());</span>
      <span class="nn">Poll</span><span class="p">::</span><span class="n">Pending</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And that’s about the limit of my fake illustrative code. It’s time to
get into details. How do we express an intrusive linked list in Rust?</p>

<h2 id="intrusive-list-crates">Intrusive List Crates</h2>

<p>This isn’t a new idea. I looked for existing crates:</p>

<ul>
  <li><a href="https://docs.rs/intrusive-collections/">intrusive-collections</a>
is popular, but list nodes must outlive the list itself. In my case,
the future will never outlive the channel.</li>
  <li><a href="https://docs.rs/futures-intrusive/">futures-intrusive</a>
is a nice-looking crate that performs the same optimization, but
does not meet my design goals.</li>
  <li><a href="https://github.com/pcwalton/multilist">multilist</a> is Patrick
Walton’s pre-1.0 experiment. Interesting idea, but it allocates
nodes on the heap.</li>
</ul>

<p>There are two other production examples of this approach:</p>

<ul>
  <li><a href="https://docs.rs/lilos-list/0.1.0/lilos_list/">lilos-list</a> is part
of Cliff Biffle’s embedded operating system
<a href="https://docs.rs/lilos/">lilos</a>. It stores wakers with an intrusive
list. It’s close to what I wanted, but embedded OS code tends to
have its own concurrency model. In particular, it would take some
work to integrate it with standard library mutexes, and it <a href="https://users.rust-lang.org/t/should-locks-be-dropped-before-calling-waker-wake/53057/4">calls
wakers while locks are held, which is a bad
idea</a>.
On the other hand, since there are no threads in lilos, it can avoid
implementing <code class="language-plaintext highlighter-rouge">Send</code> and use
<a href="https://doc.rust-lang.org/std/cell/struct.Cell.html"><code class="language-plaintext highlighter-rouge">std::cell::Cell</code></a>
to temporarily perform mutations on otherwise shared references.</li>
  <li><a href="https://github.com/tokio-rs/tokio/blob/c8f3539bc11e57843745c68ee60ca5276248f9f9/tokio/src/sync/batch_semaphore.rs#L35">tokio</a>
stores its channel wakers in an intrusive linked list too. Its
implementation has a surprising amount of code, but it’s closest to
what I want. The point is moot: it’s an implementation detail not
visible outside of Tokio. (See Alice Rhyl’s <a href="https://gist.github.com/Darksonn/1567538f56af1a8038ecc3c664a42462">musings on the
challenges of intrusive structures in
Rust</a>.)</li>
</ul>
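<p>The wake-outside-the-lock pattern mentioned above is worth pinning down.
A minimal sketch (my own illustration, not lilos or tokio code): swap the
waker list out while holding the mutex, drop the guard, then wake. Calling
<code class="language-plaintext highlighter-rouge">wake()</code> under the lock invites the
woken task to immediately contend on, or with some runtimes deadlock
against, the very mutex we still hold.</p>

```rust
use std::sync::Mutex;
use std::task::Waker;

fn wake_all(waiting: &Mutex<Vec<Waker>>) {
    // The temporary MutexGuard is dropped at the end of this statement,
    // so every wake() below runs unlocked.
    let wakers = std::mem::take(&mut *waiting.lock().unwrap());
    for waker in wakers {
        waker.wake();
    }
}

fn main() {
    let waiting: Mutex<Vec<Waker>> = Mutex::new(Vec::new());
    wake_all(&waiting); // nothing queued: a no-op, but exercises the path
    assert!(waiting.lock().unwrap().is_empty());
}
```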

<h2 id="pinning">Pinning</h2>

<p>It’s time to write a crate.</p>

<p>I want the channel to store a <code class="language-plaintext highlighter-rouge">WakerList</code> and each future to have a
<code class="language-plaintext highlighter-rouge">WakerSlot</code> member. Slots can be linked into the list and unlinked
either on wake or future cancellation.</p>

<p><code class="language-plaintext highlighter-rouge">WakerList</code> and <code class="language-plaintext highlighter-rouge">WakerSlot</code> form a self-referential data structure.
Self-referential data structures are a well-known challenge in Rust.
They require unsafe code.</p>

<p>In C++, this is relatively easy. You delete the move and copy
operations and fix up the link pointers as appropriate.</p>

<p>So, at this point in my Rust journey, still thinking in C++, I assume
“easy!”</p>

<p>I just need to disable movement with <code class="language-plaintext highlighter-rouge">!Unpin</code> (actually
<a href="https://doc.rust-lang.org/std/marker/struct.PhantomPinned.html"><code class="language-plaintext highlighter-rouge">PhantomPinned</code></a>)
and ensure all methods take <code class="language-plaintext highlighter-rouge">Pin&lt;&amp;mut WakerList&gt;</code> and <code class="language-plaintext highlighter-rouge">Pin&lt;&amp;mut
WakerSlot&gt;</code>.</p>

<p>Once you observe a <code class="language-plaintext highlighter-rouge">Pin&lt;&amp;mut T&gt;</code>, you can assume T will never move
again. I’m not going to discuss <code class="language-plaintext highlighter-rouge">Pin</code> in depth – Jon Gjengset has an
<a href="https://www.youtube.com/watch?v=DkMwYxfSYNQ">excellent video describing its rationale and
usage</a>.</p>
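<p>For concreteness, here is the smallest version of that opt-out. This is
a sketch: the real <code class="language-plaintext highlighter-rouge">WakerSlot</code> would
also carry list pointers and a <code class="language-plaintext highlighter-rouge">Waker</code>.</p>

```rust
use std::marker::PhantomPinned;

// Embedding PhantomPinned removes the auto-derived Unpin impl, so a
// Pin<&mut WakerSlot> really does promise the slot will never move again.
struct WakerSlot {
    _pinned: PhantomPinned,
}

fn require_unpin<T: Unpin>() {}

fn main() {
    require_unpin::<i32>(); // ordinary types are Unpin; pinning them is a no-op
    // require_unpin::<WakerSlot>(); // fails to compile: WakerSlot is !Unpin
    let slot = Box::pin(WakerSlot { _pinned: PhantomPinned });
    // We can still observe the pinned value; safe code just can't move it.
    println!("slot pinned at {:p}", &*slot);
}
```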

<p>Here’s where things start to get hard. Pinning was added well after
Rust 1.0 was stabilized. The language pervasively assumes values of
any type can be moved with a memcpy, so writing a data structure that
violates that assumption makes the public APIs themselves awkward.</p>

<p>Here’s what I tried first:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nf">WakerSlot</span><span class="p">(</span><span class="o">...</span><span class="p">);</span>
<span class="k">struct</span> <span class="nf">WakerList</span><span class="p">(</span><span class="o">...</span><span class="p">);</span>

<span class="k">impl</span> <span class="n">WakerList</span> <span class="p">{</span>
  <span class="k">fn</span> <span class="n">link</span><span class="o">&lt;</span><span class="nv">'list</span> <span class="p">:</span> <span class="nv">'slot</span><span class="p">,</span> <span class="nv">'slot</span><span class="o">&gt;</span><span class="p">(</span>
    <span class="k">self</span><span class="p">:</span> <span class="nb">Pin</span><span class="o">&lt;&amp;</span><span class="nv">'list</span> <span class="k">mut</span> <span class="n">WakerList</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="n">slot</span><span class="p">:</span> <span class="nb">Pin</span><span class="o">&lt;&amp;</span><span class="nv">'slot</span> <span class="k">mut</span> <span class="n">WakerSlot</span><span class="o">&gt;</span><span class="p">,</span>
  <span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>My thought was that the act of linking should constrain the
“pinnedness” and the lifetimes: list must outlive slot. Alas,
<a href="https://stackoverflow.com/questions/66017394/does-rust-narrow-lifetimes-to-satisfy-constraints-defined-on-them">lifetimes don’t work like
that</a>.
A function call cannot constrain the actual lifetimes of its parameters.
The borrow checker will happily narrow <code class="language-plaintext highlighter-rouge">'list</code> and <code class="language-plaintext highlighter-rouge">'slot</code> until it
can satisfy the constraints. The result is that the definition of
<code class="language-plaintext highlighter-rouge">link</code> above has no effect.</p>
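<p>The narrowing is easy to see in a small, self-contained example of my own: a signature that forces two borrows to share a lifetime does not force the variables to live equally long; the borrow checker just shrinks the shared lifetime to the smaller region.</p>

```rust
// `tie` demands one lifetime for both references. That does not
// constrain the variables' actual lifetimes: the borrow checker
// narrows 'a to the intersection of the two borrows, so this
// compiles even though `outer` outlives `inner`.
fn tie<'a>(x: &'a u32, y: &'a u32) -> u32 {
    x + y
}

fn main() {
    let outer = 1u32;
    let sum;
    {
        let inner = 2u32;
        sum = tie(&outer, &inner); // 'a shrinks to this inner block
    }
    assert_eq!(sum, 3);
}
```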

<p>The idea that lifetimes precisely match the lives of actual values is
apparently a common misconception, and it resulted in some puzzling
error messages.</p>

<p>(Writing a post like this after the fact feels weird because “of
course it doesn’t work that way” but I’m faithfully documenting my
learning process.)</p>

<p>Can <code class="language-plaintext highlighter-rouge">WakerSlot</code> itself take a lifetime to the list it references?</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nf">WakerList</span><span class="p">(</span><span class="o">...</span><span class="p">);</span>
<span class="k">struct</span> <span class="n">WakerSlot</span><span class="o">&lt;</span><span class="nv">'list</span><span class="o">&gt;</span><span class="p">(</span><span class="o">...</span><span class="p">);</span>

<span class="k">impl</span> <span class="n">WakerSlot</span><span class="o">&lt;</span><span class="nv">'_</span><span class="o">&gt;</span> <span class="p">{</span>
  <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">list</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">WakerList</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">WakerSlot</span><span class="o">&lt;</span><span class="nv">'_</span><span class="o">&gt;</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This doesn’t work. If you pretend the WakerSlot has a reference to the
WakerList, then you can never create a <code class="language-plaintext highlighter-rouge">&amp;mut WakerList</code> elsewhere,
because Rust’s core lifetime rule is that you can either have one mut
reference or many shared references but never both.</p>

<p>I’m hoping this is possible and a reader leaves me a note.</p>

<h2 id="when-panicking-isnt-safe-enough">When Panicking Isn’t Safe Enough</h2>

<p>Conceptually, <code class="language-plaintext highlighter-rouge">link</code> and <code class="language-plaintext highlighter-rouge">unlink</code> operations take mutable references
to both the list and slot. But I never found a way to satisfy all of
the rules in the type system:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">WakerSlot</code>’s lifetime parameter must not outlive its list.</li>
  <li>Only one <code class="language-plaintext highlighter-rouge">&amp;mut</code> reference at a time.</li>
  <li>Never <code class="language-plaintext highlighter-rouge">&amp;mut</code> and <code class="language-plaintext highlighter-rouge">&amp;</code> simultaneously.</li>
</ul>

<p>Here, I gave up on trying to express the rules in the type system and
chose to assert at runtime. The runtime lifetime rules are:</p>

<p><code class="language-plaintext highlighter-rouge">WakerList</code> must be empty when dropped. Otherwise, slots would have pointers
to invalid memory.</p>

<p><code class="language-plaintext highlighter-rouge">WakerSlot</code> must be unlinked when dropped. Otherwise, the list
references deallocated memory.</p>

<p>Reporting these invariant violations with panic is not sufficient.
Panics can be caught, but the program would remain in a state where
safe code can access dangling pointers and cause undefined behavior (UB).</p>

<p>Therefore, when an invariant is violated, the program must abort.</p>

<p>But I can’t just call abort: I want this utility crate to be
<code class="language-plaintext highlighter-rouge">no_std</code>, so it’s up to the calling program to decide how it aborts.</p>

<p>The simplest solution I found was to panic from an <code class="language-plaintext highlighter-rouge">extern "C"</code>
function and let Rust translate that to an abort.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[allow(non_snake_case)]</span>
<span class="nd">#[inline(never)]</span>
<span class="nd">#[cold]</span>
<span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">MUST_UNLINK_WakerSlot_BEFORE_DROP</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="o">!</span> <span class="p">{</span>
    <span class="c1">// panic! from extern "C" is an abort with an error message.</span>
    <span class="nd">panic!</span><span class="p">(</span><span class="s">"Must unlink WakerSlot before drop"</span><span class="p">)</span>
    <span class="c1">// Another option, at the cost of a tiny, stable, dependency, is</span>
    <span class="c1">// the `abort` crate.</span>
    <span class="c1">//abort::abort()</span>
<span class="p">}</span>
</code></pre></div></div>
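<p>For completeness, here is a sketch of how that abort helper might be wired into a drop check. The <code class="language-plaintext highlighter-rouge">linked</code> flag is a simplified stand-in for wakerset’s real linkage state.</p>

```rust
#[allow(non_snake_case)]
#[inline(never)]
#[cold]
extern "C" fn MUST_UNLINK_WakerSlot_BEFORE_DROP() -> ! {
    // Unwinding out of an extern "C" fn aborts the process.
    panic!("Must unlink WakerSlot before drop")
}

// Simplified stand-in: the real WakerSlot checks its intrusive links.
struct WakerSlot {
    linked: bool,
}

impl Drop for WakerSlot {
    fn drop(&mut self) {
        if self.linked {
            // Dropping a linked slot would leave dangling list pointers,
            // so the only sound response is to abort.
            MUST_UNLINK_WakerSlot_BEFORE_DROP();
        }
    }
}

fn main() {
    let slot = WakerSlot { linked: false };
    drop(slot); // unlinked slots drop cleanly
    println!("ok");
}
```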

<h2 id="structural-pinning">Structural Pinning</h2>

<p>I elided this detail in the example code above, but <code class="language-plaintext highlighter-rouge">WakerList</code> is
intended to be accessed behind a mutex. However, neither
<a href="https://doc.rust-lang.org/std/sync/struct.Mutex.html"><code class="language-plaintext highlighter-rouge">std::sync::Mutex</code></a>
nor
<a href="https://docs.rs/parking_lot/latest/parking_lot/type.Mutex.html"><code class="language-plaintext highlighter-rouge">parking_lot::Mutex</code></a>
have <a href="https://doc.rust-lang.org/std/pin/index.html#projections-and-structural-pinning">structural
pinning</a>.
That is, <code class="language-plaintext highlighter-rouge">lock()</code> maps <code class="language-plaintext highlighter-rouge">&amp;Mutex&lt;T&gt;</code> onto <code class="language-plaintext highlighter-rouge">&amp;mut T</code>, allowing T to be
moved.</p>

<p>I needed a safe API for getting <code class="language-plaintext highlighter-rouge">Pin&lt;&amp;mut T&gt;</code> from <code class="language-plaintext highlighter-rouge">Pin&lt;&amp;Mutex&lt;T&gt;&gt;</code>.</p>

<p>So I wrote the <a href="https://docs.rs/pinned-mutex/">pinned-mutex</a> crate
which provides structurally-pinned <code class="language-plaintext highlighter-rouge">Mutex</code>, <code class="language-plaintext highlighter-rouge">MutexGuard</code>, and
<code class="language-plaintext highlighter-rouge">Condvar</code> wrappers.</p>
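<p>The core trick can be sketched in a few lines. These names are mine, not necessarily pinned-mutex’s actual API; the point is that the guard only ever exposes <code class="language-plaintext highlighter-rouge">Pin&lt;&amp;mut T&gt;</code>, never a bare <code class="language-plaintext highlighter-rouge">&amp;mut T</code>.</p>

```rust
use std::pin::Pin;
use std::sync::{Mutex, MutexGuard};

// Hypothetical structurally-pinned mutex, for illustration only.
struct PinnedMutex<T>(Mutex<T>);

impl<T> PinnedMutex<T> {
    fn new(value: T) -> Self {
        PinnedMutex(Mutex::new(value))
    }

    // Locking requires a pinned reference to the mutex itself.
    fn lock<'a>(self: Pin<&'a Self>) -> PinnedGuard<'a, T> {
        PinnedGuard(self.get_ref().0.lock().unwrap())
    }
}

struct PinnedGuard<'a, T>(MutexGuard<'a, T>);

impl<'a, T> PinnedGuard<'a, T> {
    // Only Pin<&mut T> escapes, so safe code can never move T out.
    fn as_mut(&mut self) -> Pin<&mut T> {
        // Safety: the mutex is pinned, so its contents never move,
        // and we never hand out a bare &mut T.
        unsafe { Pin::new_unchecked(&mut *self.0) }
    }
}

fn main() {
    let m = PinnedMutex::new(41u32);
    let pinned = Pin::new(&m); // fine for a demo: PinnedMutex<u32> is Unpin
    *pinned.lock().as_mut() += 1; // u32 is Unpin, so DerefMut works on Pin<&mut u32>
    assert_eq!(*pinned.lock().as_mut(), 42);
    println!("locked value: 42");
}
```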

<p>Note that there is a <a href="https://docs.rs/pinarcmutex/">pinarcmutex crate</a>
that offers a <code class="language-plaintext highlighter-rouge">PinArcMutex&lt;T&gt;</code> type roughly equivalent to
<code class="language-plaintext highlighter-rouge">Pin&lt;Arc&lt;Mutex&lt;T&gt;&gt;&gt;</code> except with structural pinning. But it allocates
and you can’t drop in <code class="language-plaintext highlighter-rouge">parking_lot</code>’s mutex, which is faster and
lighter than the standard library’s.</p>

<p>We can imagine a future Rust version where pinning is more natural and
has pervasive (or implicit) standard library support.</p>

<h2 id="pinning-ergonomics">Pinning Ergonomics</h2>

<p>Boats recently wrote a <a href="https://without.boats/blog/pin/">nice overview of why Pin is shaped the way it
is and why it is painful to use</a>.</p>

<p>And the internet is full of threads like <a href="https://www.reddit.com/r/rust/comments/v64nej/pin_suffering_continues/">“Pin Suffering
Continues”</a>.</p>

<p>If you want to use pinned APIs safely in your own code, you will need
to depend on a pin-projection crate like
<a href="https://docs.rs/pin-project/">pin-project</a> or
<a href="https://docs.rs/pin-project-lite/">pin-project-lite</a>. (And don’t
forget
<a href="https://docs.rs/pin-project/latest/pin_project/attr.pinned_drop.html">pinned_drop</a>!)</p>

<p>It works fine, but you end up with code that looks like the following.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">wakers</span> <span class="o">=</span> <span class="n">state</span><span class="nf">.as_mut</span><span class="p">()</span><span class="nf">.project</span><span class="p">()</span><span class="py">.base</span><span class="nf">.project</span><span class="p">()</span><span class="py">.rx_wakers</span><span class="nf">.extract_some_wakers</span><span class="p">();</span>
<span class="k">while</span> <span class="n">wakers</span><span class="nf">.wake_all</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">state</span> <span class="o">=</span> <span class="k">self</span><span class="nf">.project_ref</span><span class="p">()</span><span class="py">.state</span><span class="nf">.lock</span><span class="p">();</span>
    <span class="n">wakers</span><span class="nf">.extract_more</span><span class="p">(</span><span class="n">state</span><span class="nf">.as_mut</span><span class="p">()</span><span class="nf">.base</span><span class="p">()</span><span class="nf">.project</span><span class="p">()</span><span class="py">.rx_wakers</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Writing code this way is miserable. The compiler will guide you, but
my mind was shouting “you know what I want, just do it” the whole
time.</p>

<h2 id="it-works">It Works!</h2>

<p>Deep breath. Tests passed. <code class="language-plaintext highlighter-rouge">WakerList</code> and <code class="language-plaintext highlighter-rouge">WakerSlot</code> provide the
interface I wanted, so I published them in a
<a href="https://docs.rs/wakerset/">wakerset</a> crate. It offers a safe,
intrusive, <code class="language-plaintext highlighter-rouge">no_std</code> list of Wakers.</p>

<p>With it, I could remove the main source of steady-state allocations in
<code class="language-plaintext highlighter-rouge">batch_channel</code>.</p>

<p>It was time to polish it up and ensure I didn’t miss anything, and
this blog post is only half-done.</p>

<h2 id="undefined-behavior-sanitizers-and-miri">Undefined Behavior, Sanitizers, and MIRI</h2>

<p>One of my original goals for <code class="language-plaintext highlighter-rouge">batch-channel</code> was to avoid unsafe
code.</p>

<p>Unfortunately, two optimizations required it. Besides the intrusive
list described above, the MPMC channel objects themselves are managed
with two reference counts, one for senders and one for receivers. A
split reference count is required: when one half of the channel is
dropped, the channel is closed.</p>
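<p>A minimal sketch of the split-count idea (my simplification, not splitrc’s actual API): senders and receivers are counted separately, so each side can observe when the other side is entirely gone.</p>

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Two counts instead of one: the channel closes for receivers when the
// sender count hits zero, while the shared allocation itself lives
// until both counts are zero (that last part is elided here).
struct SplitCounts {
    senders: AtomicUsize,
    #[allow(dead_code)]
    receivers: AtomicUsize,
}

impl SplitCounts {
    fn new() -> Self {
        SplitCounts {
            senders: AtomicUsize::new(1),
            receivers: AtomicUsize::new(1),
        }
    }

    // Returns true if this was the last sender: receivers now see "closed".
    fn drop_sender(&self) -> bool {
        self.senders.fetch_sub(1, Ordering::AcqRel) == 1
    }

    fn senders_remaining(&self) -> usize {
        self.senders.load(Ordering::Acquire)
    }
}

fn main() {
    let counts = SplitCounts::new();
    assert_eq!(counts.senders_remaining(), 1);
    let closed = counts.drop_sender();
    assert!(closed); // the single sender dropped, so the channel closed
    println!("closed: {closed}");
}
```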

<p>To simplify auditing, I placed all of this new unsafe code behind safe
APIs and separate crates. They are:</p>
<ul>
  <li><a href="https://docs.rs/splitrc/">splitrc</a></li>
  <li><a href="https://docs.rs/wakerset/">wakerset</a></li>
  <li><a href="https://docs.rs/pinned-mutex/">pinned-mutex</a></li>
</ul>

<p>Safe Rust, excepting compiler bugs and incorrectly-designed unsound
APIs, has no undefined behavior. Unsafe Rust, on the other hand,
removes the guardrails and opens a <a href="https://doc.rust-lang.org/reference/behavior-considered-undefined.html">buffet of possible
UB</a>.</p>

<p>There are three ways you can deal with potential undefined behavior.
In increasing order of happiness over time:</p>

<ul>
  <li>Hope for the best and deal with potential bugs when they come up.</li>
  <li>Think carefully and convince yourself the code is correct.</li>
  <li>Automated sanitizers.</li>
</ul>

<p>Fortunately, Rust supports
<a href="https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/sanitizer.html">sanitizers</a>
that detect various types of undefined behavior. The two most useful
are
<a href="https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/sanitizer.html#addresssanitizer">ASan</a>
and
<a href="https://doc.rust-lang.org/nightly/unstable-book/compiler-flags/sanitizer.html#threadsanitizer">TSan</a>.
C++ programmers are quite familiar with them at this point, and I
consider them table stakes for any C or C++ project.</p>

<p>But Rust has one even better:
<a href="https://github.com/rust-lang/miri">MIRI</a>. It catches violations of
the Rust aliasing model, which is pickier than ASan. Satisfying MIRI
is where I spent most of my time.</p>

<h2 id="rust-aliasing-model">Rust Aliasing Model</h2>

<p>The first time I ran MIRI, it failed:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>test link_and_notify_all ...
error: Undefined Behavior: trying to retag from &lt;254318&gt; for SharedReadWrite permission at alloc88289[0x10], but that tag does not exist in the borrow stack for this location
...
trying to retag from &lt;254318&gt; for SharedReadWrite permission at alloc88289[0x10], but that tag does not exist in the borrow stack for this location
...
help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
</code></pre></div></div>

<p>Stacked borrows? What’s all this?</p>

<p>Here, I realized my first mistake: I dove straight into unsafe Rust
and should have read more in advance:</p>
<ul>
  <li><a href="https://rust-unofficial.github.io/too-many-lists/index.html">Learning Rust With Entirely Too Many Linked
Lists</a></li>
  <li><a href="https://rust-lang.github.io/unsafe-code-guidelines/">Rust Unsafe Code
Guidelines</a></li>
  <li><a href="https://doc.rust-lang.org/nomicon/">The Rustonomicon</a></li>
</ul>

<p>My other mistake was “thinking in C”. Being deeply familiar with <a href="https://en.wikipedia.org/wiki/Alias_analysis#Type-based_alias_analysis">C’s
semantics</a>
wasn’t helpful here. In hindsight, it feels foolish to have assumed
Rust’s aliasing model was similar to C’s. In C, merely holding a
pointer to a value usually carries no meaning; you reason primarily
about pointer dereference operations.</p>

<p>For example, in C, this is perfectly legal if <code class="language-plaintext highlighter-rouge">p == q</code>:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span><span class="o">*</span> <span class="n">p</span><span class="p">,</span> <span class="k">const</span> <span class="kt">int</span><span class="o">*</span> <span class="n">q</span><span class="p">)</span> <span class="p">{</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">"%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">*</span><span class="n">q</span><span class="p">);</span>
  <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="mi">456</span><span class="p">;</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">"%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">*</span><span class="n">q</span><span class="p">);</span> <span class="c1">// if p == q, prints 456</span>
  <span class="o">*</span><span class="p">(</span><span class="kt">int</span><span class="o">*</span><span class="p">)</span><span class="n">q</span> <span class="o">=</span> <span class="mi">789</span><span class="p">;</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">"%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">*</span><span class="n">p</span><span class="p">);</span> <span class="c1">// if p == q, prints 789</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">const</code> is “meaningless” in that it does not prevent the pointee from
changing, so the compiler can’t optimize based on it. Rust is different:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">foo</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">u32</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="nb">u32</span><span class="p">)</span> <span class="p">{</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"{a}"</span><span class="p">);</span>
    <span class="o">*</span><span class="n">b</span> <span class="o">=</span> <span class="mi">123</span><span class="p">;</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"{a}"</span><span class="p">);</span> <span class="c1">// always prints the same value as above</span>
<span class="p">}</span>
</code></pre></div></div>

<p>On the other hand, Rust’s primary aliasing rule is that, at any point,
an object may have a unique <code class="language-plaintext highlighter-rouge">&amp;mut</code> reference to it or any number of
shared <code class="language-plaintext highlighter-rouge">&amp;</code> references, but never both.</p>

<p>The optimizer will take advantage of that. <code class="language-plaintext highlighter-rouge">a</code> is not reloaded from
memory because the write to <code class="language-plaintext highlighter-rouge">b</code> cannot alias it.</p>

<p>The <a href="https://doc.rust-lang.org/nomicon/aliasing.html">aliasing rules</a>
in Rust are not fully defined. That’s part of what makes this hard.
You have to write code assuming the most pessimal aliasing model.</p>

<p>Under the most pessimal aliasing rules, you have to assume taking a
<code class="language-plaintext highlighter-rouge">&amp;mut</code> reference to a value <em>immediately</em> writes to it and continues
to write to it as long as the reference lives. And if you have a
shared reference, you have to assume the value is read at arbitrary
times as long as any reference is held.</p>

<h2 id="boxleak">Box::leak</h2>

<p>Let’s start with the first confusing example I ran into:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="nn">Box</span><span class="p">::</span><span class="nf">leak</span><span class="p">(</span><span class="nn">Box</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">MyThing</span><span class="p">::</span><span class="nf">new</span><span class="p">()))</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">MyThing</span><span class="p">;</span>
<span class="c1">// later:</span>
<span class="k">let</span> <span class="n">r</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">MyThing</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;*</span><span class="n">p</span> <span class="p">};</span>
<span class="n">r</span><span class="nf">.method</span><span class="p">();</span>
</code></pre></div></div>

<p>MIRI failed, complaining that <code class="language-plaintext highlighter-rouge">p</code> was formed from the perpetual <code class="language-plaintext highlighter-rouge">&amp;mut</code>
returned by <code class="language-plaintext highlighter-rouge">Box::leak</code> and therefore it’s UB to create a shared
<code class="language-plaintext highlighter-rouge">&amp;MyThing</code> reference to it at any point thenceforth.</p>

<p>The fix was to allocate without forming a reference:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="nn">Box</span><span class="p">::</span><span class="nf">into_raw</span><span class="p">(</span><span class="nn">MyThing</span><span class="p">::</span><span class="nf">new</span><span class="p">());</span>
<span class="c1">// later:</span>
<span class="k">let</span> <span class="k">ref</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">MyThing</span> <span class="o">=</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span>
<span class="k">ref</span><span class="nf">.method</span><span class="p">();</span>
</code></pre></div></div>

<p><strong>Note</strong>: This may have been a MIRI bug or the rules have since been
relaxed, because I can no longer reproduce as of nightly-2024-06-12.
Here’s where the memory model and aliasing rules not being defined
caused some pain: when MIRI fails, it’s unclear whether it’s my fault
or not. For example, given the <code class="language-plaintext highlighter-rouge">&amp;mut</code> was immediately turned into a
pointer, does the <code class="language-plaintext highlighter-rouge">&amp;mut</code> reference still exist? There are multiple
valid interpretations of the rules.</p>

<h2 id="boxfrom_raw">Box::from_raw</h2>

<p>OK, if you allocate some memory with <code class="language-plaintext highlighter-rouge">Box::into_raw</code>, you’d expect to
deallocate with <code class="language-plaintext highlighter-rouge">Box::from_raw</code>, right? That failed MIRI too.</p>

<p>I ended up having to write:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">unsafe</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">ptr</span><span class="nf">.as_ptr</span><span class="p">();</span>
    <span class="nn">std</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::</span><span class="nf">drop_in_place</span><span class="p">(</span><span class="n">ptr</span><span class="p">);</span>
    <span class="nn">std</span><span class="p">::</span><span class="nn">alloc</span><span class="p">::</span><span class="nf">dealloc</span><span class="p">(</span>
        <span class="n">ptr</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span>
        <span class="nn">std</span><span class="p">::</span><span class="nn">alloc</span><span class="p">::</span><span class="nn">Layout</span><span class="p">::</span><span class="nn">new</span><span class="p">::</span><span class="o">&lt;</span><span class="n">Inner</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">());</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Note</strong>: This may have also been a MIRI bug. It is no longer
reproducible. I changed <code class="language-plaintext highlighter-rouge">splitrc</code> to use <code class="language-plaintext highlighter-rouge">Box::into_raw</code> and
<code class="language-plaintext highlighter-rouge">Box::from_raw</code> and it passes MIRI. I enabled MIRI in my CI so we’ll
see if it breaks again going forward.</p>

<h2 id="linkage-references-and-interior-mutability">Linkage, References, and Interior Mutability</h2>

<p>That’s channel allocation and deallocation covered. Now let’s look at
the intrusive pointers in <code class="language-plaintext highlighter-rouge">wakerset</code>.</p>

<p>In a linked list, every node has a linkage struct.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Pointers</span> <span class="p">{</span>
    <span class="n">next</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">Pointers</span><span class="p">,</span>
    <span class="n">prev</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">Pointers</span><span class="p">,</span>
    <span class="n">pinned</span><span class="p">:</span> <span class="n">PhantomPinned</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">WakerList</span> <span class="p">{</span>
    <span class="n">pointers</span><span class="p">:</span> <span class="n">Pointers</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">WakerSlot</span> <span class="p">{</span>
    <span class="n">pointers</span><span class="p">:</span> <span class="n">Pointers</span><span class="p">,</span>
    <span class="c1">// Required to assert that this slot is never</span>
    <span class="c1">// unlinked from an unrelated list.</span>
    <span class="n">owner</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">WakerList</span><span class="p">,</span>
    <span class="n">waker</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">Waker</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now imagine two threads. Thread A holds a <code class="language-plaintext highlighter-rouge">&amp;mut WakerList</code> with the
intent to extract pending Wakers. Thread B happens to hold a
<code class="language-plaintext highlighter-rouge">&amp;WakerSlot</code> at the same time.</p>

<p>It is UB for code traversing the pointers to form a <code class="language-plaintext highlighter-rouge">&amp;mut WakerSlot</code>
(or even a <code class="language-plaintext highlighter-rouge">&amp;WakerSlot</code>) if any thread might have a <code class="language-plaintext highlighter-rouge">&amp;mut WakerSlot</code>,
because this violates Rust’s aliasing rules. A <code class="language-plaintext highlighter-rouge">&amp;mut</code> reference must
always be exclusive, <em>even if it is never dereferenced</em>. This is the
important difference with C.</p>

<p>Because Rust reorders reads and writes based on its aliasing rules,
you must never convert a pointer into a reference unless you know that
nobody else has a conflicting reference.</p>

<p>We need to prevent the compiler from optimizing a <code class="language-plaintext highlighter-rouge">&amp;WakerSlot</code> into
early reads of the <code class="language-plaintext highlighter-rouge">pointers</code> and <code class="language-plaintext highlighter-rouge">waker</code> fields.</p>

<p><a href="https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html"><code class="language-plaintext highlighter-rouge">UnsafeCell</code></a>
is the tool to reach for. It introduces a “mutability barrier”, and
<code class="language-plaintext highlighter-rouge">UnsafeCell&lt;Pointers&gt;</code> tells Rust not to cache reads. We are
responsible for ensuring we won’t violate Rust’s aliasing rules when
accessing <code class="language-plaintext highlighter-rouge">Pointers</code>’s fields.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">WakerList</span> <span class="p">{</span>
    <span class="n">pointers</span><span class="p">:</span> <span class="n">Pointers</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">WakerSlot</span> <span class="p">{</span>
    <span class="c1">// UnsafeCell: written by WakerList independent of</span>
    <span class="c1">// WakerSlot references</span>
    <span class="n">pointers</span><span class="p">:</span> <span class="n">UnsafeCell</span><span class="o">&lt;</span><span class="n">Pointers</span><span class="o">&gt;</span><span class="p">,</span>
    <span class="c1">// Required to assert that this slot is never</span>
    <span class="c1">// unlinked from an unrelated list.</span>
    <span class="n">owner</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">WakerList</span><span class="p">,</span>
    <span class="n">waker</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">Waker</span><span class="o">&gt;</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
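<p>Here is <code class="language-plaintext highlighter-rouge">UnsafeCell</code>’s effect in miniature (a standalone illustration, not wakerset code): mutation through a shared reference is legal, provided we uphold exclusivity ourselves.</p>

```rust
use std::cell::UnsafeCell;

// A shared &Slot can still be used to rewrite the contents, because
// UnsafeCell tells the compiler not to assume the contents are
// immutable or safe to cache.
struct Slot {
    value: UnsafeCell<u32>,
}

fn bump(slot: &Slot) {
    // Safety: in this single-threaded demo no other reference to the
    // contents exists; in wakerset this is guaranteed by requiring
    // &mut WakerList (behind a mutex) for all link/unlink operations.
    unsafe { *slot.value.get() += 1 };
}

fn main() {
    let slot = Slot { value: UnsafeCell::new(1) };
    bump(&slot);
    bump(&slot);
    assert_eq!(unsafe { *slot.value.get() }, 3);
    println!("value: 3");
}
```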

<p>A circular linked list means that only a slot reference is required to
mutate the list, so I needed to enforce the guarantee that only one
thread may access the <code class="language-plaintext highlighter-rouge">UnsafeCell</code>s at a time.</p>

<p>I did this with an important, if subtle, API guarantee: all link and
unlink operations take <code class="language-plaintext highlighter-rouge">&amp;mut WakerList</code>. If <code class="language-plaintext highlighter-rouge">&amp;mut WakerSlot</code> was
sufficient to unlink, it could violate thread safety if <code class="language-plaintext highlighter-rouge">WakerList</code>
was behind a mutex. (This also means that <code class="language-plaintext highlighter-rouge">WakerList</code> does not require
an <code class="language-plaintext highlighter-rouge">UnsafeCell&lt;Pointers&gt;</code>.)</p>
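<p>That guarantee, sketched as an API shape (simplified, not wakerset’s exact signatures): both operations demand a pinned mutable reference to the list, so whatever guards the list serializes every touch of the intrusive links.</p>

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

// Simplified stand-ins for the real types; the `linked` flag replaces
// the actual intrusive pointers.
struct WakerList {
    _pin: PhantomPinned,
}

struct WakerSlot {
    linked: bool,
    _pin: PhantomPinned,
}

impl WakerList {
    // Requiring Pin<&mut WakerList> means a Mutex<WakerList> serializes
    // all access to slot linkage, even though slots are reachable
    // through their own references.
    fn link(self: Pin<&mut Self>, slot: Pin<&mut WakerSlot>) {
        // Safety: we mutate a field in place; nothing is moved out.
        unsafe { slot.get_unchecked_mut().linked = true };
    }

    fn unlink(self: Pin<&mut Self>, slot: Pin<&mut WakerSlot>) {
        unsafe { slot.get_unchecked_mut().linked = false };
    }
}

fn main() {
    let mut list = pin!(WakerList { _pin: PhantomPinned });
    let mut slot = pin!(WakerSlot { linked: false, _pin: PhantomPinned });
    list.as_mut().link(slot.as_mut());
    assert!(slot.linked);
    list.as_mut().unlink(slot.as_mut());
    assert!(!slot.linked);
    println!("link/unlink ok");
}
```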

<p>The
<a href="https://docs.rs/pinned-aliasable/latest/pinned_aliasable/"><code class="language-plaintext highlighter-rouge">pinned-aliasable</code></a>
crate solves a related problem: how do we define self-referential data
structures with mutable references that do not miscompile? Read the
motivation in the crate’s doc comments. It’s a situation required by
async futures, which are self-referential and thus pinned, but have no
desugaring. See the open Rust <a href="https://github.com/rust-lang/rust/issues/63818">Issue
#63818</a>.</p>

<h2 id="avoiding-references-entirely">Avoiding References Entirely</h2>

<p>As mentioned, when traversing a linked list, it’s easy to form conflicting
<code class="language-plaintext highlighter-rouge">&amp;mut</code> references to nodes. Consider this slightly contrived unlink
example:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">slot</span><span class="py">.pointers</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">Pointers</span><span class="p">;</span>
<span class="k">let</span> <span class="n">next</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Pointers</span> <span class="o">=</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="o">*</span><span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">)</span><span class="py">.next</span><span class="p">;</span>
<span class="k">let</span> <span class="n">prev</span><span class="p">:</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">Pointers</span> <span class="o">=</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="o">*</span><span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">)</span><span class="py">.prev</span><span class="p">;</span>
<span class="n">next</span><span class="py">.prev</span> <span class="o">=</span> <span class="n">prev</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">Pointers</span><span class="p">;</span>
<span class="n">prev</span><span class="py">.next</span> <span class="o">=</span> <span class="n">next</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">Pointers</span><span class="p">;</span>
<span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">)</span><span class="py">.next</span> <span class="o">=</span> <span class="nn">ptr</span><span class="p">::</span><span class="nf">null_mut</span><span class="p">();</span>
<span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">)</span><span class="py">.prev</span> <span class="o">=</span> <span class="nn">ptr</span><span class="p">::</span><span class="nf">null_mut</span><span class="p">();</span>
</code></pre></div></div>

<p>If <code class="language-plaintext highlighter-rouge">slot</code> is the only slot in the list, then <code class="language-plaintext highlighter-rouge">next</code> and <code class="language-plaintext highlighter-rouge">prev</code> both
point to the <code class="language-plaintext highlighter-rouge">WakerList</code> and we’ve now formed two mut references to
the same value, which is UB.</p>

<p>In this particular case, we could ensure every pointer dereference
occurs as a temporary. That limits the scope of each reference to each
line.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">slot</span><span class="py">.pointers</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">Pointers</span><span class="p">;</span>
<span class="k">let</span> <span class="n">next</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">)</span><span class="py">.next</span><span class="p">;</span>
<span class="k">let</span> <span class="n">prev</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">)</span><span class="py">.prev</span><span class="p">;</span>
<span class="p">(</span><span class="o">*</span><span class="n">next</span><span class="p">)</span><span class="py">.prev</span> <span class="o">=</span> <span class="n">prev</span><span class="p">;</span>
<span class="p">(</span><span class="o">*</span><span class="n">prev</span><span class="p">)</span><span class="py">.next</span> <span class="o">=</span> <span class="n">next</span><span class="p">;</span>
<span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">)</span><span class="py">.next</span> <span class="o">=</span> <span class="nn">ptr</span><span class="p">::</span><span class="nf">null_mut</span><span class="p">();</span>
<span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">)</span><span class="py">.prev</span> <span class="o">=</span> <span class="nn">ptr</span><span class="p">::</span><span class="nf">null_mut</span><span class="p">();</span>
</code></pre></div></div>

<p>But I just don’t trust myself to ensure, under all code paths, that I
never have two references overlap, violating Rust’s aliasing rules.</p>

<p>It’s kind of miserable, but the safest approach is to avoid
creating references altogether and operate purely in the domain of
pointer reads, writes, and offsets.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">nextp</span> <span class="o">=</span> <span class="nd">addr_of_mut!</span><span class="p">((</span><span class="o">*</span><span class="n">node</span><span class="p">)</span><span class="py">.next</span><span class="p">);</span>
<span class="k">let</span> <span class="n">prevp</span> <span class="o">=</span> <span class="nd">addr_of_mut!</span><span class="p">((</span><span class="o">*</span><span class="n">node</span><span class="p">)</span><span class="py">.prev</span><span class="p">);</span>
<span class="k">let</span> <span class="n">next</span> <span class="o">=</span> <span class="n">nextp</span><span class="nf">.read</span><span class="p">();</span>
<span class="k">let</span> <span class="n">prev</span> <span class="o">=</span> <span class="n">prevp</span><span class="nf">.read</span><span class="p">();</span>
<span class="nd">addr_of_mut!</span><span class="p">((</span><span class="o">*</span><span class="n">prev</span><span class="p">)</span><span class="py">.next</span><span class="p">)</span><span class="nf">.write</span><span class="p">(</span><span class="n">next</span><span class="p">);</span>
<span class="nd">addr_of_mut!</span><span class="p">((</span><span class="o">*</span><span class="n">next</span><span class="p">)</span><span class="py">.prev</span><span class="p">)</span><span class="nf">.write</span><span class="p">(</span><span class="n">prev</span><span class="p">);</span>
<span class="n">nextp</span><span class="nf">.write</span><span class="p">(</span><span class="nn">ptr</span><span class="p">::</span><span class="nf">null_mut</span><span class="p">());</span>
<span class="n">prevp</span><span class="nf">.write</span><span class="p">(</span><span class="nn">ptr</span><span class="p">::</span><span class="nf">null_mut</span><span class="p">());</span>
</code></pre></div></div>

<p>(So much syntax. Makes you appreciate C.)</p>

<p><code class="language-plaintext highlighter-rouge">addr_of_mut!</code> is key: it computes a pointer to a place expression
without forming a reference. There are gotchas: you can still
accidentally form a reference within an <code class="language-plaintext highlighter-rouge">addr_of_mut!</code> argument. Read
<a href="https://doc.rust-lang.org/beta/std/ptr/macro.addr_of_mut.html">the
documentation</a>.</p>
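<p>To make the gotcha concrete, here is a small standalone sketch (mine,
not code from <code class="language-plaintext highlighter-rouge">wakerset</code>): field access through a raw pointer
stays reference-free, while the similar-looking expression through a
<code class="language-plaintext highlighter-rouge">Box</code> would auto-deref and silently form a reference.</p>

```rust
use std::ptr::addr_of_mut;

struct Node {
    value: i32,
}

fn bump(n: *mut Node) -> i32 {
    unsafe {
        // The whole place expression is raw-pointer-based, so no
        // reference to `n` or its field is ever created.
        let vp = addr_of_mut!((*n).value);
        vp.write(vp.read() + 1);
        vp.read()
    }
}

fn main() {
    let mut n = Node { value: 41 };
    assert_eq!(bump(&mut n), 42);
    // Gotcha: if `n` lived in a Box, the similar-looking
    // `addr_of_mut!(boxed.value)` would compile but auto-deref the
    // Box via DerefMut, creating an intermediate &mut reference.
}
```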

<p><strong>Note</strong>: As I publish this, Rust 1.82 introduces <a href="https://doc.rust-lang.org/stable/reference/expressions/operator-expr.html#raw-borrow-operators">new
syntax</a>
that allows replacing <code class="language-plaintext highlighter-rouge">addr_of_mut!</code> with <code class="language-plaintext highlighter-rouge">&amp;raw mut</code> and <code class="language-plaintext highlighter-rouge">addr_of!</code>
with <code class="language-plaintext highlighter-rouge">&amp;raw const</code>. It’s not yet clear to me how much this prevents
accidental reference creation.</p>

<p>Despite the noise, to be safe, I ended up converting all of <code class="language-plaintext highlighter-rouge">WakerSet</code>
to pointer reads and writes. It’s not greppable: the code looks like
it’s still full of pointer dereferences, but they’re within
<code class="language-plaintext highlighter-rouge">addr_of_mut!</code> and the place expressions have the right shape.</p>

<p>I think it was Patrick Walton who once proposed a sugar for unsafe
Rust and pointers with a hypothetical <code class="language-plaintext highlighter-rouge">-&gt;</code> operator. It would be
convenient and easier on the eyes.</p>

<p>Until the Rust memory model stabilizes further and the aliasing rules
are well-defined, your best option is to integrate ASAN, TSAN, and
MIRI (both <a href="https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md">stacked
borrows</a>
and <a href="https://perso.crans.org/vanille/treebor/">tree borrows</a>) into
your continuous integration for any project that contains unsafe code.</p>

<p>If your project is safe Rust but depends on a crate which makes heavy
use of unsafe code, you should probably still enable sanitizers. I
didn’t discover all UB in wakerset until it was integrated into
batch-channel.</p>

<h2 id="miri-stacked-borrows-and-tree-borrows">MIRI: Stacked Borrows and Tree Borrows</h2>

<p>MIRI supports two aliasing models: <a href="https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md">stacked
borrows</a>
and <a href="https://perso.crans.org/vanille/treebor/">tree borrows</a>.</p>

<p>I won’t attempt to describe them. They are different approaches with
the same goal: formalize and validate the Rust memory model. Ralf
Jung, Neven Villani, and all are doing amazing work. Without MIRI, it
would be hard to trust unsafe Rust.</p>

<p>I decided to run both stacked and tree borrows and haven’t hit any
false positives so far.</p>
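<p>For reference, a typical way to run both models (MIRI requires the
nightly toolchain, and the tree borrows flag is experimental, so its
exact spelling may change):</p>

```shell
# One-time setup: MIRI ships as a nightly rustup component.
rustup +nightly component add miri

# Default model: stacked borrows.
cargo +nightly miri test

# Opt into the experimental tree borrows model instead.
MIRIFLAGS="-Zmiri-tree-borrows" cargo +nightly miri test
```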

<h2 id="active-research-in-self-referential-structures">Active Research in Self-Referential Structures</h2>

<p>This topic is an <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/495">active work in
progress</a>.
I hope this blog post is obsolete in two years.</p>

<p>Self-referential and pinned data structures are something of a hot
topic right now. The <a href="https://rust-for-linux.com/">Rust-for-Linux
project</a>, doing the kinds of things
systems programs do, has <code class="language-plaintext highlighter-rouge">Pin</code> ergonomics and self-referential data
structures near the top of <a href="https://github.com/Rust-for-Linux/linux/issues/354">their
wishlist</a>.</p>

<p>One problem in particular is the <a href="https://rust-for-linux.com/the-safe-pinned-initialization-problem">pinned initialization
problem</a>.
The Linux kernel has self-referential data structures and it is
currently hard to initialize them with safe code. In <code class="language-plaintext highlighter-rouge">wakerset</code>, I
sidestepped this problem at the cost of a small amount of runtime
inefficiency by giving <code class="language-plaintext highlighter-rouge">WakerList</code> two empty states: one that is
moveable and one that is pinned. The former converts to the latter the
first time it is used after pinning.</p>
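<p>A minimal sketch of that idea (hypothetical names, not the actual
<code class="language-plaintext highlighter-rouge">wakerset</code> code): the list head starts in a movable empty state and
is promoted to the pinned empty state on first use.</p>

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Sketch: a list head with two empty states. In the real structure,
// the pinned state would hold self-referential next/prev pointers.
enum ListHead {
    MovableEmpty,               // no self-references; safe to move
    PinnedEmpty(PhantomPinned), // address is now significant (!Unpin)
}

impl ListHead {
    fn new() -> Self {
        ListHead::MovableEmpty // constructible without pinning
    }

    // First use after pinning: promote the movable empty state.
    fn ensure_pinned(self: Pin<&mut Self>) {
        // Safety: we only overwrite in place; nothing is moved out.
        let this = unsafe { self.get_unchecked_mut() };
        if matches!(this, ListHead::MovableEmpty) {
            *this = ListHead::PinnedEmpty(PhantomPinned);
        }
    }

    fn is_pinned(&self) -> bool {
        matches!(self, ListHead::PinnedEmpty(_))
    }
}

fn main() {
    let mut head = Box::pin(ListHead::new());
    assert!(!head.is_pinned());
    head.as_mut().ensure_pinned();
    assert!(head.is_pinned());
}
```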

<p>y86-dev has a great blog post proposing <a href="https://y86-dev.github.io/blog/safe-pinned-initialization/overview.html">Safe Pinned
Initialization</a>.</p>

<h2 id="conclusions">Conclusions</h2>

<p>As much as this post might come across as a gripefest, I still think
Rust is great, in particular its composable safety. The result of my
pain is a safe, efficient API. You can use <code class="language-plaintext highlighter-rouge">wakerset</code> without any risk
of undefined behavior.</p>

<p>What I learned from this experience:</p>
<ul>
  <li>Be extra careful with any use of <code class="language-plaintext highlighter-rouge">unsafe</code>.</li>
  <li>References, even if never used, are more dangerous than pointers in
C.</li>
  <li>Pinning syntax is awful, but it feels like Rust could solve this
someday.</li>
  <li><code class="language-plaintext highlighter-rouge">UnsafeCell</code> is required for intrusive structures.</li>
  <li>I don’t know how to statically constrain lifetime relationships with
intrusive structures, but maybe it’s possible? Avoiding the need for
runtime assertions would be nice.</li>
  <li>MIRI, especially under multithreaded stress tests, is critical.</li>
  <li>Putting this in words was as hard as writing the code.</li>
</ul>]]></content><author><name></name></author><category term="rust" /><summary type="html"><![CDATA[Or: The Most Expensive Linked List I’ve Ever Written]]></summary></entry><entry><title type="html">Terminal Latency on Windows</title><link href="https://chadaustin.me/2024/02/windows-terminal-latency/" rel="alternate" type="text/html" title="Terminal Latency on Windows" /><published>2024-02-18T00:00:00-06:00</published><updated>2024-02-18T00:00:00-06:00</updated><id>https://chadaustin.me/2024/02/windows-terminal-latency</id><content type="html" xml:base="https://chadaustin.me/2024/02/windows-terminal-latency/"><![CDATA[<p><strong>UPDATE 2024-04-15</strong>: Windows Terminal 1.19 contains a fix that
reduces latency by half! It’s now competitive with WSLtty on my
machine. Details in the <a href="https://github.com/microsoft/terminal/issues/5590">GitHub
Issue</a>.</p>

<p>In 2009, I wrote about <a href="https://chadaustin.me/2009/10/reasons-why-mintty-is-the-best-terminal-on-windows/">why MinTTY is the best terminal on
Windows</a>.
Even today, that post is one of my most popular.</p>

<figure>
<a href="/wp-uploads/mintty_right_click.png"><img src="/wp-uploads/mintty_right_click.png" alt="MinTTY in 2009" /></a>
<figcaption>MinTTY in 2009</figcaption>
</figure>

<p>Since then, the terminal situation on Windows has improved:</p>
<ul>
  <li>Cygwin defaults to MinTTY; you no longer need to manually install
it.</li>
  <li>Windows added <a href="https://devblogs.microsoft.com/commandline/windows-command-line-introducing-the-windows-pseudo-console-conpty/">PTY
support</a>,
obviating the need for offscreen console window hacks that add
latency.</li>
  <li>Windows added basically full support for <a href="https://learn.microsoft.com/en-us/windows/console/console-virtual-terminal-sequences">ANSI terminal
sequences</a>
in both the legacy conhost.exe consoles and its new <a href="https://github.com/microsoft/terminal">Windows
Terminal</a>.</li>
  <li>We now have a variety of terminals to choose from, even on Windows:
<a href="https://cmder.app/">Cmder</a>, <a href="https://conemu.github.io/">ConEmu</a>,
<a href="https://alacritty.org/">Alacritty</a>,
<a href="https://wezfurlong.org/wezterm/index.html">WezTerm</a>,
<a href="http://xtermjs.org/">xterm.js</a> (component of Visual Studio Code)</li>
</ul>

<p>The beginning of a year is a great time to look at your tools and
improve your environment.</p>

<p>I’d already <a href="https://chadaustin.me/2024/01/truecolor-terminal-emacs/">enabled 24-bit color in all of my
environments</a>
and <a href="https://chadaustin.me/2024/02/tmux-config/">streamlined my tmux
config</a>. It’s about time
that I take a look at the newer terminals.</p>

<p>Roughly in order, I care about:</p>
<ul>
  <li>Minimum feature set: 24-bit color, reasonable default fonts with
emoji support, italics are nice.</li>
  <li>Input latency.</li>
  <li>Throughput at line rate, for example, when I <code class="language-plaintext highlighter-rouge">cat</code> a large file.</li>
  <li>Support for multiple tabs in one window would be nice, but tmux
suffices for me.</li>
</ul>

<h2 id="which-terminals-should-i-test">Which terminals should I test?</h2>

<p>I considered the following.</p>

<ul>
  <li>Legacy conhost.exe (also known as Windows Console), Windows 10 19045</li>
  <li>MinTTY (3.7.0)</li>
  <li>Alacritty (0.13.1)</li>
  <li>WezTerm (20240203-110809-5046fc22)</li>
  <li>Windows Terminal (1.18.10301.0)</li>
</ul>

<h2 id="testing-features">Testing Features</h2>

<p>Testing color and italics support is easy with my
<a href="https://gist.github.com/chadaustin/2d2c2cb4b71fd1d4163aa8115077624a">colortest.rs</a>
script. To test basic emoji, you can cat the <a href="https://unicode.org/Public/emoji/1.0/emoji-data.txt">Unicode emoji 1.0
emoji-data.txt</a>.
To test more advanced support, try the zero-width joiner list in the
<a href="https://unicode.org/Public/emoji/latest/">latest/</a> directory.</p>
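<p>If you don’t want to grab the script, a few lines suffice for a smoke
test. This sketch (mine, not the linked <code class="language-plaintext highlighter-rouge">colortest.rs</code>) emits a
24-bit red-to-blue background gradient and an italic sample; a smooth
gradient and slanted text mean the terminal passes.</p>

```rust
fn gradient_line(width: u32) -> String {
    let mut s = String::new();
    for i in 0..width {
        // SGR "48;2;R;G;B" sets a 24-bit background color.
        let b = (i * 255 / (width - 1)) as u8;
        let r = 255 - b;
        s.push_str(&format!("\x1b[48;2;{};0;{}m ", r, b));
    }
    s.push_str("\x1b[0m"); // reset attributes
    s
}

fn main() {
    println!("{}", gradient_line(60));
    // SGR 3 enables italics where supported.
    println!("\x1b[3mitalic sample\x1b[0m");
}
```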

<table>
  <thead>
    <tr>
      <th>Terminal</th>
      <th>Emoji</th>
      <th>Font Attributes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>conhost.exe</td>
      <td>No</td>
      <td>No italics</td>
    </tr>
    <tr>
      <td>MinTTY</td>
      <td>Black and white</td>
      <td>All major attributes</td>
    </tr>
    <tr>
      <td>Alacritty</td>
      <td>Black and white</td>
      <td>Everything but double underline</td>
    </tr>
    <tr>
      <td>WezTerm</td>
      <td><a href="https://wezfurlong.org/wezterm/config/fonts.html">Color</a></td>
      <td>All major attributes</td>
    </tr>
    <tr>
      <td>Windows Terminal</td>
      <td>Color</td>
      <td>All major attributes</td>
    </tr>
  </tbody>
</table>

<p>Everything but conhost.exe meets my bar.</p>

<p>It’s also worth noting that conhost.exe has a terrible default
palette. The default yellow is a pukey green and dark blue is barely
visible. You can change palettes, but defaults matter.</p>

<figure>
<a href="/images/windows-terminal-latency/default-palette-conhost.png"><img src="/images/windows-terminal-latency/default-palette-conhost.png" alt="Conhost.exe Default Palette" /></a>
<figcaption>Conhost.exe Default Palette</figcaption>
</figure>

<figure>
<a href="/images/windows-terminal-latency/default-palette-mintty.png"><img src="/images/windows-terminal-latency/default-palette-mintty.png" alt="MinTTY Default Palette" /></a>
<figcaption>MinTTY Default Palette</figcaption>
</figure>

<h2 id="latency">Latency</h2>

<p>I set up two latency tests. One with an 80x50 blank window in the
upper left corner of the screen. The other fullscreen, editing an
Emacs command at the bottom of the screen.</p>

<p>Since latencies are additive, system configuration doesn’t matter as
much as the absolute milliseconds of latency each terminal adds, but
I’ll describe my entire setup and include total keypress-to-pixels
latency.</p>

<ul>
  <li>Windows 10</li>
  <li>Intel i7-4771 @ 3.5 GHz</li>
  <li>NVIDIA GTX 1060</li>
  <li>Keyboard: <a href="https://1upkeyboards.com/shop/keyboard-kits/macro-pads/sweet16-macro-pad-white/">Sweet 16 Macro
Pad</a></li>
  <li>Display: <a href="https://www.lg.com/us/monitors/lg-27gp950-b-gaming-monitor">LG
27GP950-B</a>
at 4K, 120 Hz, adaptive sync</li>
</ul>

<h3 id="measurement-methodology">Measurement Methodology</h3>

<p>With <a href="https://isitsnappy.com/">Is It Snappy?</a>, I measured the number
of frames between pressing a key and pixels changing on the screen.</p>

<p>To minimize ambiguity about when the key was pressed, I slammed a
pencil’s eraser into the key, and always measured the key press as the
<em>second</em> frame after contact. (The first frame was usually when the
eraser barely touched the key. It would usually clear the activation
depth by the second frame.)</p>

<p>I considered the latency to end when pixels just started to change on
the screen. In practice, pixels take several 240 Hz frames to
transition from black to white, but I consistently marked the
beginning of that transition.</p>

<p>I took five measurements for each configuration and picked the median.
Each measurement was relatively consistent, so average would have been
a fine metric too. It doesn’t change the results below.</p>

<h3 id="80x50">80x50</h3>

<p>80x50 window, upper left of screen, cleared terminal, single keypress.</p>

<p>Confirmed window size with:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo $(tput cols)x$(tput lines)
80x50
</code></pre></div></div>

<table>
  <thead>
    <tr>
      <th>Terminal</th>
      <th>Median Latency (ms)</th>
      <th>240 Hz Camera Frames</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>conhost.exe WSL1</td>
      <td>33.3</td>
      <td>8</td>
    </tr>
    <tr>
      <td>MinTTY WSL1</td>
      <td>33.3</td>
      <td>8</td>
    </tr>
    <tr>
      <td>conhost.exe Cygwin</td>
      <td>41.3</td>
      <td>10</td>
    </tr>
    <tr>
      <td>MinTTY Cygwin</td>
      <td>57.9</td>
      <td>14</td>
    </tr>
    <tr>
      <td>WezTerm cmd.exe</td>
      <td>62.5</td>
      <td>15</td>
    </tr>
    <tr>
      <td>Alacritty WSL1</td>
      <td>62.5</td>
      <td>15</td>
    </tr>
    <tr>
      <td>WezTerm WSL1</td>
      <td>66.7</td>
      <td>16</td>
    </tr>
    <tr>
      <td>Windows Terminal WSL1</td>
      <td>66.7</td>
      <td>16</td>
    </tr>
  </tbody>
</table>

<h3 id="fullscreen">Fullscreen</h3>

<p>Maximized Emacs, editing a command in the bottom row of the terminal.
I only tested WSL1 this time.</p>

<table>
  <thead>
    <tr>
      <th>Terminal</th>
      <th>Median Latency (ms)</th>
      <th>240 Hz Camera Frames</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>conhost.exe</td>
      <td>45.8</td>
      <td>11</td>
    </tr>
    <tr>
      <td>MinTTY</td>
      <td>50</td>
      <td>12</td>
    </tr>
    <tr>
      <td>WezTerm</td>
      <td>75</td>
      <td>18</td>
    </tr>
    <tr>
      <td>Windows Terminal</td>
      <td>75</td>
      <td>18</td>
    </tr>
    <tr>
      <td>Alacritty</td>
      <td>87.5</td>
      <td>21</td>
    </tr>
  </tbody>
</table>

<h3 id="throughput">Throughput</h3>

<p>I generated a 100,000-line file with:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ yes "This sentence has forty-five (45) characters." | head -n 100000 &gt; /tmp/lines.txt
</code></pre></div></div>

<p>Then I measured the wall-clock duration of:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ time cat /tmp/lines.txt
</code></pre></div></div>

<p>This benchmark captures the case that I accidentally dump a ton of
output and I’m sitting there just waiting for the terminal to become
responsive again. I have a gigabit internet connection, and it’s
embarrassing to be CPU-bound instead of IO-bound.</p>

<p>I did include Cygwin in this test, just to have two different MinTTY
datapoints.</p>

<table>
  <thead>
    <tr>
      <th>Terminal</th>
      <th>Elapsed Time (s)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>MinTTY WSL1</td>
      <td>0.57</td>
    </tr>
    <tr>
      <td>MinTTY Cygwin</td>
      <td>2.2</td>
    </tr>
    <tr>
      <td>Windows Terminal</td>
      <td>5.25</td>
    </tr>
    <tr>
      <td>Alacritty</td>
      <td>5.75</td>
    </tr>
    <tr>
      <td>WezTerm</td>
      <td>6.2</td>
    </tr>
    <tr>
      <td>conhost.exe</td>
      <td>21.8</td>
    </tr>
  </tbody>
</table>

<p>I assume this means MinTTY throttles display updates in some way. Of
course this is totally fine, because you couldn’t read the output
either way.</p>

<p>To test the hypothesis that MinTTY caches rendered cells by their
contents, I also tried generating a file that rotates through shifted
versions of the line, with no effect.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">"/tmp/lines2.txt"</span><span class="p">,</span> <span class="s">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
  <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100000</span><span class="p">):</span>
    <span class="n">sentence</span><span class="o">=</span><span class="s">"This sentence has forty-five (45) characters."</span>
    <span class="k">print</span><span class="p">(</span><span class="n">sentence</span><span class="p">[</span><span class="n">i</span><span class="o">%</span><span class="nb">len</span><span class="p">(</span><span class="n">sentence</span><span class="p">):]</span><span class="o">+</span><span class="n">sentence</span><span class="p">[:</span><span class="n">i</span><span class="o">%</span><span class="nb">len</span><span class="p">(</span><span class="n">sentence</span><span class="p">)],</span> <span class="nb">file</span><span class="o">=</span><span class="n">f</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="cpu-usage-during-repeated-keypresses">CPU Usage During Repeated Keypresses</h3>

<p>While making these measurements, I noticed some strange behaviors. My
monitor runs at 120 Hz and animation and window dragging are generally
smooth. But right after you start Alacritty, dragging the window
animates at something like 30-60 frames per second. It’s noticeably
chunkier. WezTerm does the same, but slightly worse. Maybe 20 frames
per second.</p>

<p>I don’t know if I can blame the terminals themselves, because I
sometimes experience this even with Notepad.exe. But the choppiness
stands out much more in the terminals. Maybe something is CPU-bound in
responding to window events?</p>

<p>This made me think of a new test: if I open a terminal and hold down
the “a” button on autorepeat, how much CPU does the terminal consume?</p>

<p>To measure this, I set the terminal process’s affinity to my third
physical core, and watched the CPU usage graph in Task Manager. Not a
great methodology, but it gave a rough sense. Again, 80x50.</p>

<table>
  <thead>
    <tr>
      <th>Terminal</th>
      <th>Percent of Core</th>
      <th>Private Bytes After Startup (KiB)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>conhost</td>
      <td>0%</td>
      <td>6,500</td>
    </tr>
    <tr>
      <td>Alacritty</td>
      <td>5%</td>
      <td>74,000</td>
    </tr>
    <tr>
      <td>MinTTY WSL1</td>
      <td>10%</td>
      <td>10,200</td>
    </tr>
    <tr>
      <td>MinTTY Cygwin</td>
      <td>10%</td>
      <td>10,500</td>
    </tr>
    <tr>
      <td>Windows Terminal</td>
      <td>20%</td>
      <td>73,700</td>
    </tr>
    <tr>
      <td>WezTerm</td>
      <td>85%</td>
      <td>134,000</td>
    </tr>
  </tbody>
</table>

<p>The WezTerm CPU usage has to be a bug. I’ll report it.</p>

<h3 id="cpu-usage-idle">CPU Usage (Idle)</h3>

<p>I often have a pile of idle terminals sitting around. I don’t want
them to chew battery life. So let’s take a look at CPU Cycles Delta
(courtesy of Process Explorer) with a fresh, idle WSL
session.</p>

<table>
  <thead>
    <tr>
      <th>Terminal</th>
      <th>Idle Cycles/s (Focused)</th>
      <th>Idle Cycles/s (Background)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>conhost</td>
      <td>~900,000</td>
      <td>0</td>
    </tr>
    <tr>
      <td>Alacritty</td>
      <td>~2,400,000</td>
      <td>no difference</td>
    </tr>
    <tr>
      <td>WezTerm</td>
      <td>~2,600,000</td>
      <td>~1,600,000</td>
    </tr>
    <tr>
      <td>Windows Terminal</td>
      <td>~55,000,000</td>
      <td>~6,100,000</td>
    </tr>
    <tr>
      <td>MinTTY WSL1</td>
      <td>~120,000,000</td>
      <td>no difference</td>
    </tr>
    <tr>
      <td>MinTTY Cygwin</td>
      <td>~120,000,000</td>
      <td>no difference</td>
    </tr>
  </tbody>
</table>

<p>These numbers aren’t great at all! For perspective, I have a pile of
Firefox tabs open, some of them actively running JavaScript, and
they’re “only” using a few hundred million cycles per second.</p>

<p>Raymond Chen once wrote a <a href="https://devblogs.microsoft.com/oldnewthing/20060124-17/?p=32553">blog post about the importance of properly
idling</a>
in the Windows Terminal Server days. You might have a dozen users
logged into a host, and if a program is actively polling, it’s eating
performance that others could use.</p>

<p>Today, we often run on batteries, so idling correctly still matters,
but it seems to be something of a lost art. The only terminal that
idles completely is the old conhost.exe.</p>

<p>The other lesson we can draw is that Microsoft’s own replacement for
conhost.exe, Windows Terminal, uses over 10x the RAM, 60x the CPU when
focused, and infinitely more CPU when idle.</p>

<h2 id="conclusions">Conclusions</h2>

<p>conhost.exe consistently has the best latency, with MinTTY not much
behind. MinTTY handily dominates the throughput test, supports all
major ANSI character attributes, and has a better default palette.</p>

<p>As in 2009, I’d say MinTTY is still pretty great. (I should try to
track down that idle CPU consumption. It feels more like a bug than a
requirement.)</p>

<p>If you want to use MinTTY as the default terminal for WSL, install
<a href="https://github.com/mintty/wsltty">WSLtty</a>.</p>

<p>The others all have slightly worse latencies, but they’re in a similar
class. I’m particularly sensitive to latency, so I’d had a suspicion
even before measuring. Maybe it’s some consequence of being
GPU-accelerated? Out of curiosity, I put Windows Terminal in
software-rendered mode, and it shaved perhaps 4 ms off (median of 62.5
ms, 15 frames). Perhaps just measurement noise.</p>

<p>While I’m going to stick with MinTTY, one thing is clear: there is
room to improve all of the above.</p>]]></content><author><name></name></author><category term="terminal" /><summary type="html"><![CDATA[UPDATE 2024-04-15: Windows Terminal 1.19 contains a fix that reduces latency by half! It’s now competitive with WSLtty on my machine. Details in the GitHub Issue.]]></summary></entry><entry><title type="html">My Minimal tmux Config</title><link href="https://chadaustin.me/2024/02/tmux-config/" rel="alternate" type="text/html" title="My Minimal tmux Config" /><published>2024-02-10T00:00:00-06:00</published><updated>2024-02-10T00:00:00-06:00</updated><id>https://chadaustin.me/2024/02/tmux-config</id><content type="html" xml:base="https://chadaustin.me/2024/02/tmux-config/"><![CDATA[<p>If you spend any significant time in a terminal, you’ve probably used
<a href="https://github.com/tmux/tmux/wiki">tmux</a>.</p>

<p>I’m writing this post for a few reasons:</p>
<ul>
  <li>People have asked for my config.</li>
  <li>I see too many people wasting their time in the morning, rebuilding
their session from the previous day.</li>
  <li>I felt I should justify the configuration to myself rather than
setting options ad-hoc.</li>
</ul>

<p>tmux is often paired with a persistent connection. There are two
popular choices: Eternal Terminal and Mosh. The goal is to close your
laptop at the end of the day, open it the next morning, and have
everything where it was so you can immediately get back into flow.</p>

<p><strong>Note</strong>: There are other options. WezTerm has <a href="https://wezfurlong.org/wezterm/multiplexing.html">built-in
multiplexing</a>, for
example.</p>

<h2 id="macos--iterm2--eternal-terminal--tmux-control-mode">macOS + iTerm2 + Eternal Terminal + tmux Control Mode</h2>

<p>If you use macOS, <a href="https://iterm2.com/">iTerm2</a> has deep
<a href="https://iterm2.com/documentation-tmux-integration.html">tmux
integration</a>
in the form of <a href="https://github.com/tmux/tmux/wiki/Control-Mode">tmux Control
Mode</a>.</p>

<p>tmux windows map to iTerm2 tabs. Native tab navigation and scrollback
(both scrolling and find) work just as you’d expect.</p>

<p>tmux control mode does expect a reliable stream channel, so if you
want a connection that persists even when network connections are
dropped, Mosh will not work. You’ll need <a href="https://eternalterminal.dev/">Eternal
Terminal</a>.</p>

<p>If you use a Mac, this is an excellent configuration. I worked on
<a href="https://github.com/facebook/sapling/blob/main/eden/fs/docs/Overview.md">EdenFS</a>
and <a href="https://facebook.github.io/watchman/">Watchman</a> for almost five
years this way.</p>

<h2 id="mosh--tmux">mosh + tmux</h2>

<p>But now I use Windows and Linux and can’t use iTerm2, so tmux within
<a href="https://mosh.org/">Mosh</a> it is.</p>

<p>I find tmux’s default keybindings a little awkward, and the colors
simultaneously harsh and too minimal, so I made a configuration to
match my tastes.</p>

<p>You can <a href="https://gist.github.com/chadaustin/d4696e18217a7de9b2671549abcb54c4">download the full .tmux.conf
here</a>.</p>

<p>(You can go crazy, but avoiding too much fanciness was my goal. If you
want all the bling, install the <a href="https://github.com/tmux-plugins/tpm?tab=readme-ov-file">Tmux Plugin
Manager</a> and
things like <a href="https://github.com/erikw/tmux-powerline">tmux-powerline</a>.)</p>

<h2 id="tmuxconf">.tmux.conf</h2>

<p>First, a small demonstration.</p>

<div style="max-width: 500px"><script async="" id="asciicast-xYtwktNN10OIgwImlFUviz5tr" src="https://asciinema.org/a/xYtwktNN10OIgwImlFUviz5tr.js"></script></div>

<p>Despite all of the work I put into <a href="https://chadaustin.me/2024/01/truecolor-terminal-emacs/">my recent post about 24-bit color
in
terminals</a>, I
do still use some terminals with limited color support. macOS’s
Terminal.app only supports the 256-color palette, and the Linux
console only really supports 8. The following selects the correct
<code class="language-plaintext highlighter-rouge">tmux</code> terminfo entry.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Detect the correct TERM value for new sessions.
# if-shell uses /bin/sh, so bashisms like [[ do not work.
</span><span class="n">if</span> <span class="s2">"[ $(tput colors) = 16777216 ]"</span> {
  <span class="n">set</span> -<span class="n">g</span> <span class="n">default</span>-<span class="n">terminal</span> <span class="s2">"tmux-direct"</span>
} {
  <span class="n">if</span> <span class="s2">"[ $(tput colors) = 256 ]"</span> {
    <span class="n">set</span> -<span class="n">g</span> <span class="n">default</span>-<span class="n">terminal</span> <span class="s2">"tmux-256color"</span>
  } {
    <span class="n">set</span> -<span class="n">g</span> <span class="n">default</span>-<span class="n">terminal</span> <span class="s2">"tmux"</span>
  }
}
</code></pre></div></div>

<p>I prefer Emacs keybindings in both bash (readline) and tmux.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">setw</span> -<span class="n">g</span> <span class="n">mode</span>-<span class="n">keys</span> <span class="n">emacs</span>
</code></pre></div></div>
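<p>bash already uses Emacs-style line editing by default, but it can be
pinned explicitly for all readline programs; a minimal sketch for the
readline side:</p>

```conf
# ~/.inputrc — make readline use Emacs bindings (this is also the default):
set editing-mode emacs
```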

<p>The next setting is more legacy terminal insanity. On some (most?)
terminals, programs cannot differentiate between a user pressing the
escape key and the beginning of an escape sequence. <code class="language-plaintext highlighter-rouge">readline</code> and
<code class="language-plaintext highlighter-rouge">tmux</code> default to a 500 ms timeout, which adds <a href="https://unix.stackexchange.com/a/608179/459183">noticeable
latency</a> in some
terminals when using programs like <code class="language-plaintext highlighter-rouge">vi</code>.</p>

<p>There’s no correct value here. Ideally, your terminal would use an
unambiguous code for the escape key, <a href="https://github.com/mintty/mintty/wiki/CtrlSeqs#escape-keycode">like
MinTTY</a>.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">set</span> -<span class="n">s</span> <span class="n">escape</span>-<span class="n">time</span> <span class="m">200</span>
</code></pre></div></div>

<p>Let’s not be stingy with scrollback! Searching lots of history is
worth spending megabytes.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># I can afford 50 MB of scrollback.
# Measured on WSL 1 with:
# yes $(python3 -c "print('y' * 80)")
</span><span class="n">set</span> -<span class="n">g</span> <span class="n">history</span>-<span class="n">limit</span> <span class="m">100000</span>
</code></pre></div></div>

<p>By default, if multiple clients connect to one tmux session, tmux will
resize all of the windows to the smallest connected terminal.</p>

<p>This behavior is annoying, and it’s always an accident. Sometimes I’ll
leave a temporary connection to a server from home and then another
fullscreen connection from work will cram each window into 80x25.</p>

<p>The <code class="language-plaintext highlighter-rouge">aggressive-resize</code> option applies this logic only to the
currently-viewed window, not everything in the session.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">setw</span> -<span class="n">g</span> <span class="n">aggressive</span>-<span class="n">resize</span> <span class="n">on</span>
</code></pre></div></div>

<p>Window titles don’t automatically forward to whatever graphical
terminal you’re using. The following forwards them, adding the
hostname while keeping the title concise.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">set</span> -<span class="n">g</span> <span class="n">set</span>-<span class="n">titles</span> <span class="n">on</span>
<span class="n">set</span> -<span class="n">g</span> <span class="n">set</span>-<span class="n">titles</span>-<span class="n">string</span> <span class="s2">"#h: #W"</span>
</code></pre></div></div>

<p>iTerm2 has this nice behavior where active tabs are visually marked so
you can see, at a glance, which had recent activity. The following two
options offer similar behavior. Setting <code class="language-plaintext highlighter-rouge">activity-action</code> to <code class="language-plaintext highlighter-rouge">none</code>
disables any audible ding or visible flash, leaving just a subtle
indication in the status bar.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">set</span> -<span class="n">g</span> <span class="n">monitor</span>-<span class="n">activity</span> <span class="n">on</span>
<span class="n">set</span> -<span class="n">g</span> <span class="n">activity</span>-<span class="n">action</span> <span class="n">none</span>
</code></pre></div></div>

<p>The following is perhaps the most important part of my configuration:
tab management. Like browsers and iTerm2, I want my tabs numbered. I
want a single (modified) keypress to select a tab, and I want tabs
automatically renumbered as they’re created, destroyed, and reordered.</p>

<p>I also want iTerm2-style previous- and next-tab keybindings.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Match window numbers to the order of the keys on a keyboard.
</span><span class="n">set</span> -<span class="n">g</span> <span class="n">base</span>-<span class="n">index</span> <span class="m">1</span>
<span class="n">setw</span> -<span class="n">g</span> <span class="n">pane</span>-<span class="n">base</span>-<span class="n">index</span> <span class="m">1</span>

<span class="n">setw</span> -<span class="n">g</span> <span class="n">renumber</span>-<span class="n">windows</span> <span class="n">on</span>

<span class="c"># My tmux muscle memory still wants C-b 0 to select the first window.
</span><span class="n">bind</span> <span class="m">0</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":^"</span>
<span class="c"># Other terminals and web browsers use 9 to focus the final tab.
</span><span class="n">bind</span> <span class="m">9</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":$"</span>

<span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-0"</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":^"</span>
<span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-1"</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":1"</span>
<span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-2"</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":2"</span>
<span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-3"</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":3"</span>
<span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-4"</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":4"</span>
<span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-5"</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":5"</span>
<span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-6"</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":6"</span>
<span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-7"</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":7"</span>
<span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-8"</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":8"</span>
<span class="c"># Browsers also select last tab with M-9.
</span><span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-9"</span> <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> <span class="s2">":$"</span>
<span class="c"># Match iTerm2.
</span><span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-{"</span> <span class="n">previous</span>-<span class="n">window</span>
<span class="n">bind</span> -<span class="n">n</span> <span class="s2">"M-}"</span> <span class="n">next</span>-<span class="n">window</span>
</code></pre></div></div>

<p>Note that Emacs assigns meaning to Alt-number. If it matters to you,
pick a different modifier.</p>
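<p>For example, a hypothetical variant using Ctrl+Alt (tmux spells the
prefix <code class="language-plaintext highlighter-rouge">C-M-</code>) would leave Emacs’s plain Alt+number bindings alone:</p>

```conf
# Hypothetical alternative: Ctrl+Alt+number, so Emacs keeps plain Alt+number.
bind -n "C-M-1" select-window -t ":1"
bind -n "C-M-2" select-window -t ":2"
# ...and so on through C-M-9.
```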

<p>Now let’s optimize the window ordering. By default, <code class="language-plaintext highlighter-rouge">C-b c</code> creates a
new window at the end. That’s a fine default. But sometimes I want a
new window right after the current one, so define <code class="language-plaintext highlighter-rouge">C-b C-c</code>. Also, add
some key bindings for sliding the current window around.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bind</span> <span class="s2">"C-c"</span> <span class="n">new</span>-<span class="n">window</span> -<span class="n">a</span>

<span class="n">bind</span> <span class="s2">"S-Left"</span> {
  <span class="n">swap</span>-<span class="n">window</span> -<span class="n">t</span> -<span class="m">1</span>
  <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> -<span class="m">1</span>
}
<span class="n">bind</span> <span class="s2">"S-Right"</span> {
  <span class="n">swap</span>-<span class="n">window</span> -<span class="n">t</span> +<span class="m">1</span>
  <span class="n">select</span>-<span class="n">window</span> -<span class="n">t</span> +<span class="m">1</span>
}
</code></pre></div></div>

<p>I wanted “C-{” and “C-}”, but terminal key encoding <a href="https://vt100.net/docs/vt100-ug/chapter3.html">doesn’t work like
that</a>.</p>

<p>Next, let’s define some additional key bindings for very common
operations.</p>

<p>By default, searching in the scrollback requires entering “copy mode”
with <code class="language-plaintext highlighter-rouge">C-b [</code> and then entering reverse search mode with <code class="language-plaintext highlighter-rouge">C-r</code>.
Searching is common, so give it a dedicated <code class="language-plaintext highlighter-rouge">C-b r</code>.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bind</span> <span class="n">r</span> {
  <span class="n">copy</span>-<span class="n">mode</span>
  <span class="n">command</span>-<span class="n">prompt</span> -<span class="n">i</span> -<span class="n">p</span> <span class="s2">"(search up)"</span> \
    <span class="s2">"send-keys -X search-backward-incremental '%%%'"</span>
}
</code></pre></div></div>

<p>And some convenient toggles:</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Toggle terminal mouse support.
</span><span class="n">bind</span> <span class="n">m</span> <span class="n">set</span>-<span class="n">option</span> -<span class="n">g</span> <span class="n">mouse</span> \; <span class="n">display</span> <span class="s2">"Mouse: #{?mouse,ON,OFF}"</span>

<span class="c"># Toggle status bar. Useful for fullscreen focus.
</span><span class="n">bind</span> <span class="n">t</span> <span class="n">set</span>-<span class="n">option</span> <span class="n">status</span>
</code></pre></div></div>

<p>Now the status bar. The default status bar is okay, but we can do
better.</p>

<figure>
<a href="/images/tmux/tmux-status-before.png"><img src="/images/tmux/tmux-status-before.png" alt="tmux status bar: before" /></a>
<figcaption>tmux status bar: before</figcaption>
</figure>

<ul>
  <li>Move the tmux session ID next to the hostname on the right side.</li>
  <li>Move the current time to the far right corner.</li>
  <li>Keep the date, but I think I can remember what year it is.</li>
  <li>Ensure there is a single space between the windows and the left
edge. Without a space at the edge, it looks weird.</li>
</ul>

<figure>
<a href="/images/tmux/tmux-status-after.png"><img src="/images/tmux/tmux-status-after.png" alt="tmux status bar: after" /></a>
<figcaption>tmux status bar: after</figcaption>
</figure>

<p>The other half of that improvement is the color scheme. Instead of a
harsh black-on-green, I chose a scheme that evokes <a href="https://www.worthpoint.com/worthopedia/dec-digital-vt520-monitor-amber-1791664954">old amber CRT
phosphors</a>
or <a href="https://retropaq.com/the-miracle-of-gas-plasma/">gas plasma
displays</a>. My dad had
a “laptop” with one of those when I was young.</p>

<p>The following color scheme mildly highlights the current window and
uses a dark blue for the hostname-and-time section. These colors don’t
distract me when I’m not working, but if I do look, the important
information is there.</p>

<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">if</span> <span class="s2">"[ $(tput colors) -ge 256 ]"</span> {
  <span class="n">set</span> -<span class="n">g</span> <span class="n">status</span>-<span class="n">left</span>-<span class="n">style</span> <span class="s2">"fg=black bg=colour130"</span>
  <span class="n">set</span> -<span class="n">g</span> <span class="n">status</span>-<span class="n">right</span>-<span class="n">style</span> <span class="s2">"bg=colour17 fg=orange"</span>
  <span class="n">set</span> -<span class="n">g</span> <span class="n">status</span>-<span class="n">style</span> <span class="s2">"fg=black bg=colour130"</span>
  <span class="n">set</span> -<span class="n">g</span> <span class="n">message</span>-<span class="n">style</span> <span class="s2">"fg=black bg=colour172"</span>
  <span class="c"># Current window should be slightly brighter.
</span>  <span class="n">set</span> -<span class="n">g</span> <span class="n">window</span>-<span class="n">status</span>-<span class="n">current</span>-<span class="n">style</span> <span class="s2">"fg=black bg=colour172"</span>
  <span class="c"># Windows with activity should be very subtly highlighted.
</span>  <span class="n">set</span> -<span class="n">g</span> <span class="n">window</span>-<span class="n">status</span>-<span class="n">activity</span>-<span class="n">style</span> <span class="s2">"fg=colour17 bg=colour130"</span>
  <span class="n">set</span> -<span class="n">g</span> <span class="n">mode</span>-<span class="n">style</span> <span class="s2">"fg=black bg=colour172"</span>
}
</code></pre></div></div>

<p>And that’s it!</p>

<p>Again, feel free to copy <a href="https://gist.github.com/chadaustin/d4696e18217a7de9b2671549abcb54c4">the complete
.tmux.conf</a>.</p>

<h2 id="shell-integration">Shell Integration</h2>

<p>There’s one more config to mention: adding some shell aliases to
.bashrc.</p>

<p>I sometimes want to look at or edit a file right next to my shell.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$TMUX</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
    function </span>lv<span class="o">()</span> <span class="o">{</span>
        tmux split-window <span class="nt">-h</span> less <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
    <span class="o">}</span>
    <span class="k">function </span>ev<span class="o">()</span> <span class="o">{</span>
        tmux split-window <span class="nt">-h</span> emacs <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
    <span class="o">}</span>
    <span class="k">function </span>lh<span class="o">()</span> <span class="o">{</span>
        tmux split-window <span class="nt">-v</span> less <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
    <span class="o">}</span>
    <span class="k">function </span>eh<span class="o">()</span> <span class="o">{</span>
        tmux split-window <span class="nt">-v</span> emacs <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
    <span class="o">}</span>
<span class="k">fi</span>
</code></pre></div></div>

<p>(You may notice the aliases use the opposite meanings of horizontal
and vertical from tmux: tmux’s <code class="language-plaintext highlighter-rouge">-h</code> splits side-by-side. I don’t
know, it feels like tmux is backwards, but that could be my brain.)</p>

<script async="" id="asciicast-02hGqCVoOv13lpzI3XYZ1Tpz6" src="https://asciinema.org/a/02hGqCVoOv13lpzI3XYZ1Tpz6.js"></script>

<p>Happy multiplexing!</p>]]></content><author><name></name></author><category term="terminal" /><summary type="html"><![CDATA[If you spend any significant time in a terminal, you’ve probably used tmux.]]></summary></entry><entry><title type="html">I Just Wanted Emacs to Look Nice — Using 24-Bit Color in Terminals</title><link href="https://chadaustin.me/2024/01/truecolor-terminal-emacs/" rel="alternate" type="text/html" title="I Just Wanted Emacs to Look Nice — Using 24-Bit Color in Terminals" /><published>2024-01-27T00:00:00-06:00</published><updated>2024-01-27T00:00:00-06:00</updated><id>https://chadaustin.me/2024/01/truecolor-terminal-emacs</id><content type="html" xml:base="https://chadaustin.me/2024/01/truecolor-terminal-emacs/"><![CDATA[<p>Thanks to some coworkers and David Wilson’s <a href="https://www.youtube.com/watch?v=74zOY-vgkyw&amp;list=PLEoMzSkcN8oPH1au7H6B7bBJ4ZO7BXjSZ">Emacs from Scratch
playlist</a>,
I’ve been getting back into Emacs. The community is more vibrant than
the last time I looked, and
<a href="https://microsoft.github.io/language-server-protocol/">LSP</a> brings
modern completion and inline type checking.</p>

<p>David’s Emacs looks so fancy — I want nice colors and fonts
too, especially my preferred themes like
<a href="https://ethanschoonover.com/solarized/">Solarized</a>.</p>

<p>From desktop environments, Emacs automatically supports 24-bit color.</p>

<figure>
<a href="/images/truecolor-terminal-emacs/emacs-window.png"><img src="/images/truecolor-terminal-emacs/emacs-window.png" alt="Graphical Emacs: Fonts and Colors" /></a>
<figcaption>Graphical Emacs: Fonts and Colors</figcaption>
</figure>

<p>But, since I work on infrastructure, I’ve lived primarily in terminals
for years. And my Emacs looks like:</p>

<figure>
<a href="/images/truecolor-terminal-emacs/emacs-terminal.png"><img src="/images/truecolor-terminal-emacs/emacs-terminal.png" alt="Terminal Emacs: Not Fancy" /></a>
<figcaption>Terminal Emacs: Not Fancy</figcaption>
</figure>

<p>It turns out, for <em>years</em>, <a href="https://github.com/termstandard/colors#truecolor-support-in-output-devices">popular terminals have supported 24-bit
color</a>.
And yet they’re rarely used.</p>

<p>Like everything else, it boils down to legacy and politics. Control
codes are a protocol, and changes to that protocol take time to
propagate, especially with missteps along the way.</p>

<p>This post is two things:</p>
<ol>
  <li>how to enable true-color support in the terminal environments I
use, and</li>
  <li>how my desire for nice colors in Emacs led to poring over technical
standards from the 70s, 80s, and 90s, wondering how we got to this
point.</li>
</ol>

<blockquote>
  <p><strong><em>NOTE:</em></strong> I did my best, but please forgive any terminology
slip-ups or false histories. I grew up on VGA text mode UIs, but
never used a hardware terminal and wasn’t introduced to unix until
much later.</p>
</blockquote>

<h2 id="ansi-escape-codes">ANSI Escape Codes</h2>

<p>Early hardware terminals offered their own, incompatible, control code
schemes. That made writing portable software hard, so ANSI
standardized the protocol, while reserving room for expansion and
vendor-specific capabilities.</p>

<figure>
<a href="/images/truecolor-terminal-emacs/dec-vt100.webp"><img src="/images/truecolor-terminal-emacs/dec-vt100.webp" alt="DEC
VT100 (1978)" /></a>
<figcaption>DEC
VT100 (1978)</figcaption>
</figure>

<p><a href="https://en.wikipedia.org/wiki/ANSI_escape_code">ANSI escape codes</a>
date back to the 70s. They cover a huge range of functionality, but
since this post is focused on colors, I’m mostly interested in SGR
(Select Graphics Rendition), which allows configuring a variety of
character display attributes:</p>

<ul>
  <li>bold or intensity</li>
  <li>italics (not frequently supported)</li>
  <li>blink</li>
  <li>foreground and background colors</li>
  <li>and a bunch of other stuff. You can look at Wikipedia.</li>
</ul>

<h2 id="3--4--and-8-bit-color">3-, 4-, and 8-bit Color</h2>

<p>When color was introduced, there were eight. Black, white, the
additive primaries, and the subtractive primaries. The eight corners
of an RGB color cube.</p>

<p>Later, a bright (or bold) bit added eight more; “bright black” being
dark gray.</p>

<figure>
<a href="/images/truecolor-terminal-emacs/microsoft-vga.png"><img src="/images/truecolor-terminal-emacs/microsoft-vga.png" alt="4-Bit VGA Text Mode Palette" /></a>
<figcaption>4-Bit VGA Text Mode Palette</figcaption>
</figure>

<p>In 1999, <a href="https://invisible-island.net/xterm/xterm.log.html#xterm_111">Todd Larason patched xterm to add support for 256
colors</a>.
He chose a palette that filled out the RGB color cube with a 6x6x6
interior sampling and added a 24-entry finer-precision grayscale ramp.</p>
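<p>The resulting layout is easy to compute: indices 0–15 are the classic
palette, 16–231 are the cube (index = 16 + 36r + 6g + b for r, g, b in
0–5), and 232–255 are the grayscale ramp. A small shell sketch:</p>

```shell
# Map a 6x6x6 cube coordinate (each component 0-5) to an
# xterm-256color palette index.
cube_index() {
  echo $((16 + 36 * $1 + 6 * $2 + $3))
}

cube_index 5 0 0   # brightest red corner of the cube: 196
```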

<figure>
<a href="/images/truecolor-terminal-emacs/xterm-256color.png"><img src="/images/truecolor-terminal-emacs/xterm-256color.png" alt="Output From colortest-256" /></a>
<figcaption>Output From colortest-256</figcaption>
</figure>

<blockquote>
  <p><strong><em>NOTE:</em></strong> There’s a rare, but still-supported, 88-color variant
with a 4x4x4 color cube and 8-entry grayscale ramp, primarily to
reduce the use of historically-limited X11 color objects.</p>
</blockquote>

<blockquote>
  <p><strong><em>NOTE:</em></strong> We’ll debug this later, but Todd’s patch to add
256-color support to xterm used semicolons as the separator between
the ANSI SGR command 48 and the color index, which set off a chain
reaction of ambiguity we’re still dealing with today.</p>
</blockquote>

<h2 id="where-did-24-bit-color-support-come-from">Where Did 24-Bit Color Support Come From?</h2>

<p>It’s well-documented how to send 8-bit and 24-bit colors to compatible
terminals. Per
<a href="https://en.wikipedia.org/wiki/ANSI_escape_code#8-bit">Wikipedia</a>:</p>

<p><code class="language-plaintext highlighter-rouge">ESC[38;5;&lt;n&gt;m</code> sets foreground color <code class="language-plaintext highlighter-rouge">n</code> per the palettes above.</p>

<p><code class="language-plaintext highlighter-rouge">ESC[38;2;&lt;r&gt;;&lt;g&gt;;&lt;b&gt;m</code> sets foreground color (<code class="language-plaintext highlighter-rouge">r</code>, <code class="language-plaintext highlighter-rouge">g</code>, <code class="language-plaintext highlighter-rouge">b</code>).</p>
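<p>For example, these two lines print the same word with each form; the
specific colors are arbitrary, and they only render correctly in a
terminal that understands the sequences:</p>

```shell
# "hello" in 24-bit orange, then in 8-bit palette color 208.
printf '\033[38;2;255;135;0mhello\033[0m\n'
printf '\033[38;5;208mhello\033[0m\n'
```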

<p>(Again, that confusion about <a href="https://wezfurlong.org/wezterm/escape-sequences.html#graphic-rendition-sgr">semicolons vs.
colons</a>,
and an unused colorspace ID if colons are used. We’ll get to the
bottom of that soon.)</p>

<p>But why 5? Why 2? How did any of this come about? I’d struggled enough
with unexpected output that it was time to discover the ground truth.</p>

<p>Finding and reading original sources led me to construct the following
narrative:</p>

<ul>
  <li>In the 70s, ANSI standardized terminal escape sequences, resulting
in <a href="https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub86.pdf">ANSI
X3.64</a>
and the better-known
<a href="https://www.ecma-international.org/wp-content/uploads/ECMA-48_5th_edition_june_1991.pdf">ECMA-48</a>.</li>
  <li>The first edition of ECMA-48 is lost to time, but it probably looks
much like ANSI X3.64.</li>
  <li>The <a href="https://ecma-international.org/wp-content/uploads/ECMA-48_2nd_edition_august_1979.pdf">2nd
edition</a>
of ECMA-48 (1979) allocated SGR parameters 30-37 and 40-47 for setting
3-bit foreground and background colors, respectively.
    <ul>
      <li>By the way, these standards use the word “parameter” to mean
command, and “subparameter” to mean argument, if applicable.</li>
    </ul>
  </li>
  <li>The <a href="https://ecma-international.org/wp-content/uploads/ECMA-48_3rd_edition_march_1984.pdf">3rd
edition</a>
(1984) introduced the concept of an implementation-defined default
color for both foreground and background, and allocated parameters
39 and 49, respectively.</li>
  <li>Somewhere in this timeline, vendors did ship hardware terminals with
richer color support. The <a href="https://terminals-wiki.org/wiki/index.php/Wyse_WY-370">Wyse
WY-370</a>
introduced new color modes, including a direct-indexed 64-color
palette. (See Page 86 of its <a href="http://bitsavers.org/pdf/wyse/WY-370/881133-02A_WY-370_Programmers_Guide_Jun90.pdf">Programmer’s
Guide</a>.)</li>
  <li>38 and 48 are the most important parameters for selecting colors
today, but they weren’t allocated by either the
<a href="https://ecma-international.org/wp-content/uploads/ECMA-48_4th_edition_december_1986.pdf">4th</a>
(1986) or
<a href="https://www.ecma-international.org/wp-content/uploads/ECMA-48_5th_edition_june_1991.pdf">5th</a>
(1991) editions. So where did they come from? The 5th edition gives
a clue:
    <blockquote>
      <p>reserved for future standardization; intended for setting
character foreground colour as specified in ISO 8613-6 [CCITT
Recommendation T.416]</p>
    </blockquote>
  </li>
  <li>
    <p>ISO 8613 was a boondoggle of a project intended to <a href="https://en.wikipedia.org/wiki/Open_Document_Architecture">standardize and
replace all proprietary document file
formats</a>.
You’ve never heard of it, so it obviously failed. But its legacy
lives on – ISO 8613-6 (ITU T.416) (1993) built on ECMA-48’s codes
and defined parameters 38 and 48 as extended foreground and
background color modes, respectively.</p>

    <blockquote>
      <p>The first parameter element indicates a choice between:</p>
      <ul>
        <li>0 implementation defined (only applicable for the character foreground colour)</li>
        <li>1 transparent;</li>
        <li>2 direct colour in RGB space;</li>
        <li>3 direct colour in CMY space;</li>
        <li>4 direct colour in CMYK space;</li>
        <li>5 indexed colour.</li>
      </ul>
    </blockquote>
  </li>
</ul>

<p>There we go! <em>That</em> is why 5 is used for 256-color mode and 2 is
24-bit RGB.</p>

<p>Careful reading also gives a clue as to the semicolon vs. colon syntax
screw-up. Note the subtle use of the term “parameter element” vs.
“parameter”.</p>

<p>If you read ISO 8613-6 (ITU T.416) and ECMA-48 closely, it’s not
explicitly stated, but they seem to indicate that unknown parameters
for commands like “select graphics rendition” should be ignored. And
parameters are separated with semicolons.</p>

<p>That implies <code class="language-plaintext highlighter-rouge">ESC[38;5;3m</code> should be interpreted, in terminals that
don’t support SGR 38, as “unknown, ignored (38)”, “blinking (5)”, and
“italicized (3)”. The syntax should use colons to separate
sub-parameter components, but something got lost along the way.</p>
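<p>The unambiguous form separates sub-parameters with colons, leaving an
empty slot for T.416’s colorspace ID. A sketch of both syntaxes (most
modern terminals accept both; older ones may misparse one or the other):</p>

```shell
# ITU T.416 colon syntax: 38:2::R:G:B (the empty field is the colorspace ID).
printf '\033[38:2::255:0:0mcolon form\033[0m\n'
# De facto semicolon syntax, as emitted by xterm's original patches.
printf '\033[38;2;255;0;0msemicolon form\033[0m\n'
```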

<p>(Now, in practice, programs are told how to communicate with their
terminals via the TERM variable and the terminfo database, so I
don’t know how much pain occurs in reality.)</p>

<p>Thomas Dickey has done a great job documenting the history of
<a href="https://invisible-island.net/ncurses/">ncurses</a> and
<a href="https://invisible-island.net/xterm/xterm.log.html">xterm</a>, and, lo
and behold, explains exactly the <a href="https://invisible-island.net/xterm/xterm.faq.html#semicolon_vs_colon">origin of the ambiguous
syntax</a>:</p>

<blockquote>
  <p>We used semicolon (like other SGR parameters) for separating the
R/G/B values in the escape sequence, since a copy of ITU T.416
(ISO-8613-6) which presumably clarified the use of colon for this
feature was costly.</p>

  <p>Using semicolon was incorrect because some applications could expect
their parameters to be order-independent. As used for the R/G/B
values, that was order-dependent. The relevant information, by the
way, is part of ECMA-48 (not ITU T.416, as mentioned in Why only 16
(or 256) colors?). Quoting from section 5.4.2 of ECMA-48, page 12,
and adding emphasis (not in the standard):</p>

  <blockquote>
    <p>Each parameter sub-string consists of one or more bit combinations
from 03/00 to 03/10; the bit combinations from 03/00 to 03/09
represent the digits ZERO to NINE; bit combination 03/10 may be
used as a separator in a parameter sub-string, for example, to
separate the fractional part of a decimal number from the integer
part of that number.</p>
  </blockquote>

  <p>and later on page 78, in 8.3.117 SGR – SELECT GRAPHIC RENDITION, the
description of SGR 38:</p>

  <blockquote>
    <p>(reserved for future standardization; intended for setting
character foreground colour as specified in ISO 8613-6 [CCITT
Recommendation T.416])</p>
  </blockquote>

  <p>Of course you will immediately recognize that 03/10 is ASCII colon,
and that ISO 8613-6 necessarily refers to the encoding in a
parameter sub-string. Or perhaps you will not.</p>
</blockquote>

<p>So it’s all because the ANSI and ISO standards are ridiculously
expensive (to this day, these crappy PDF scans from the 90s and
earlier are $200 USD!) and because they use a baroque syntax to denote
ASCII characters. While writing this post, I had to keep <code class="language-plaintext highlighter-rouge">man ascii</code>
open to match, for example, <code class="language-plaintext highlighter-rouge">03/10</code> to colon and <code class="language-plaintext highlighter-rouge">03/11</code> to semicolon.
I guess it’s how standards were written back then. A Hacker News
thread in the context of WezTerm <a href="https://news.ycombinator.com/item?id=35138390">gives more
detail</a>.</p>

<p>So, to recap in the timeline:</p>

<ul>
  <li>1999: <a href="https://invisible-island.net/xterm/xterm.log.html#xterm_111">Thomas Dickey merged Todd Larason’s 256-color
patches</a>
with ambiguous semicolon syntax.</li>
  <li>2006: Konsole added support for 256-color and 24-bit truecolor using
the same ambiguous syntax as xterm, with a <a href="https://bugs.kde.org/show_bug.cgi?id=107487">follow-on
discussion</a> about
colons vs. semicolons. The issue was noticed, but semicolon syntax
was adopted anyway.</li>
  <li>2012: Thomas Dickey <a href="https://invisible-island.net/xterm/xterm.log.html#xterm_282">fixed xterm to accept the standards-compliant
syntax</a>.</li>
  <li>2016: Windows 10’s built-in console gained <a href="https://devblogs.microsoft.com/commandline/24-bit-color-in-the-windows-console/">ANSI escape code
support, including 24-bit
colors</a>.
Unfortunately with the ambiguous semicolon syntax.</li>
  <li>2019: Windows Terminal is released, with ANSI escape code support,
but also using ambiguous semicolon syntax.</li>
  <li>2022: Microsoft announced <a href="https://learn.microsoft.com/en-us/windows/console/ecosystem-roadmap">ecosystem-wide
migration</a>
from the legacy framebuffer-based VGA-style console subsystem to
ANSI terminal emulation, specifically using xterm as a guide.</li>
  <li>2022: Konsole <a href="https://github.com/KDE/konsole/commit/316a386d92a083e235624e9f81df3b6dbbe08bff">gains support for standards-compliant
syntax</a>.</li>
</ul>

<p>Okay, here’s what we’ve established:</p>

<ul>
  <li>ANSI codes are widely supported, even on Windows.</li>
<li>Truecolor is either widely supported or (for example, on the
Linux text-mode console) at least recognized and mapped to a more
limited palette.</li>
  <li>Semicolon syntax is the most compatible, though the unambiguous
colon syntax is slowly spreading.</li>
</ul>

<p>I wrote a <a href="https://gist.github.com/chadaustin/2d2c2cb4b71fd1d4163aa8115077624a">small colortest.rs program to test color support and attributes like
reverse and
italics</a>
to confirm the above in every terminal I use.</p>

<h2 id="terminfo">Terminfo</h2>

<p>Now that we’ve established terminal capabilities and how to use them,
the next trick is to convince software of varying lineages to detect
and use the best color support available.</p>

<p>Typically, this is done with the old
<a href="https://en.wikipedia.org/wiki/Terminfo">terminfo</a> library (or the
even older <a href="https://en.wikipedia.org/wiki/Termcap">termcap</a>).</p>

<p>Terminfo provides a database of terminal capabilities and the ability
to generate appropriate escape sequences. The TERM environment
variable tells programs which terminfo record to use. Its value is
automatically forwarded over <code class="language-plaintext highlighter-rouge">ssh</code> connections.</p>

<p>Terminfo uses ridiculous command names: <code class="language-plaintext highlighter-rouge">infocmp</code>, <code class="language-plaintext highlighter-rouge">tic</code>, <code class="language-plaintext highlighter-rouge">toe</code>. (Not
to be confused with the unrelated <code class="language-plaintext highlighter-rouge">tac</code>.)</p>

<p>To see the list of terminfo records installed on your host, run <code class="language-plaintext highlighter-rouge">toe
-a</code>. (Do we <em>really</em> need to install support for every legacy hardware
terminal on modern machines? Good luck even finding a hardware
terminal these days. They’re collector’s items.)</p>

<p><code class="language-plaintext highlighter-rouge">infocmp</code> is how you inspect the capabilities of a specific terminfo
record.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ infocmp xterm-256color
#       Reconstructed via infocmp from file: /lib/terminfo/x/xterm-256color
xterm-256color|xterm with 256 colors,
        am, bce, ccc, km, mc5i, mir, msgr, npc, xenl,
        colors#0x100, cols#80, it#8, lines#24, pairs#0x10000,
        acsc=``aaffggiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~,
        bel=^G, blink=\E[5m, bold=\E[1m, cbt=\E[Z, civis=\E[?25l,
        clear=\E[H\E[2J, cnorm=\E[?12l\E[?25h, cr=\r,
        csr=\E[%i%p1%d;%p2%dr, cub=\E[%p1%dD, cub1=^H,
        cud=\E[%p1%dB, cud1=\n, cuf=\E[%p1%dC, cuf1=\E[C,
        cup=\E[%i%p1%d;%p2%dH, cuu=\E[%p1%dA, cuu1=\E[A,
        cvvis=\E[?12;25h, dch=\E[%p1%dP, dch1=\E[P, dim=\E[2m,
        dl=\E[%p1%dM, dl1=\E[M, ech=\E[%p1%dX, ed=\E[J, el=\E[K,
        el1=\E[1K, flash=\E[?5h$&lt;100/&gt;\E[?5l, home=\E[H,
        hpa=\E[%i%p1%dG, ht=^I, hts=\EH, ich=\E[%p1%d@,
        il=\E[%p1%dL, il1=\E[L, ind=\n, indn=\E[%p1%dS,
        initc=\E]4;%p1%d;rgb:%p2%{255}%*%{1000}%/%2.2X/%p3%{255}%*%{1000}%/%2.2X/%p4%{255}%*%{1000}%/%2.2X\E\\,
        invis=\E[8m, is2=\E[!p\E[?3;4l\E[4l\E&gt;, kDC=\E[3;2~,
        kEND=\E[1;2F, kHOM=\E[1;2H, kIC=\E[2;2~, kLFT=\E[1;2D,
        kNXT=\E[6;2~, kPRV=\E[5;2~, kRIT=\E[1;2C, ka1=\EOw,
        ka3=\EOy, kb2=\EOu, kbeg=\EOE, kbs=^?, kc1=\EOq, kc3=\EOs,
        kcbt=\E[Z, kcub1=\EOD, kcud1=\EOB, kcuf1=\EOC, kcuu1=\EOA,
        kdch1=\E[3~, kend=\EOF, kent=\EOM, kf1=\EOP, kf10=\E[21~,
        kf11=\E[23~, kf12=\E[24~, kf13=\E[1;2P, kf14=\E[1;2Q,
        kf15=\E[1;2R, kf16=\E[1;2S, kf17=\E[15;2~, kf18=\E[17;2~,
        kf19=\E[18;2~, kf2=\EOQ, kf20=\E[19;2~, kf21=\E[20;2~,
        kf22=\E[21;2~, kf23=\E[23;2~, kf24=\E[24;2~,
        kf25=\E[1;5P, kf26=\E[1;5Q, kf27=\E[1;5R, kf28=\E[1;5S,
        kf29=\E[15;5~, kf3=\EOR, kf30=\E[17;5~, kf31=\E[18;5~,
        kf32=\E[19;5~, kf33=\E[20;5~, kf34=\E[21;5~,
        kf35=\E[23;5~, kf36=\E[24;5~, kf37=\E[1;6P, kf38=\E[1;6Q,
        kf39=\E[1;6R, kf4=\EOS, kf40=\E[1;6S, kf41=\E[15;6~,
        kf42=\E[17;6~, kf43=\E[18;6~, kf44=\E[19;6~,
        kf45=\E[20;6~, kf46=\E[21;6~, kf47=\E[23;6~,
        kf48=\E[24;6~, kf49=\E[1;3P, kf5=\E[15~, kf50=\E[1;3Q,
        kf51=\E[1;3R, kf52=\E[1;3S, kf53=\E[15;3~, kf54=\E[17;3~,
        kf55=\E[18;3~, kf56=\E[19;3~, kf57=\E[20;3~,
        kf58=\E[21;3~, kf59=\E[23;3~, kf6=\E[17~, kf60=\E[24;3~,
        kf61=\E[1;4P, kf62=\E[1;4Q, kf63=\E[1;4R, kf7=\E[18~,
        kf8=\E[19~, kf9=\E[20~, khome=\EOH, kich1=\E[2~,
        kind=\E[1;2B, kmous=\E[&lt;, knp=\E[6~, kpp=\E[5~,
        kri=\E[1;2A, mc0=\E[i, mc4=\E[4i, mc5=\E[5i, meml=\El,
        memu=\Em, mgc=\E[?69l, nel=\EE, oc=\E]104\007,
        op=\E[39;49m, rc=\E8, rep=%p1%c\E[%p2%{1}%-%db,
        rev=\E[7m, ri=\EM, rin=\E[%p1%dT, ritm=\E[23m, rmacs=\E(B,
        rmam=\E[?7l, rmcup=\E[?1049l\E[23;0;0t, rmir=\E[4l,
        rmkx=\E[?1l\E&gt;, rmm=\E[?1034l, rmso=\E[27m, rmul=\E[24m,
        rs1=\Ec\E]104\007, rs2=\E[!p\E[?3;4l\E[4l\E&gt;, sc=\E7,
        setab=\E[%?%p1%{8}%&lt;%t4%p1%d%e%p1%{16}%&lt;%t10%p1%{8}%-%d%e48;5;%p1%d%;m,
        setaf=\E[%?%p1%{8}%&lt;%t3%p1%d%e%p1%{16}%&lt;%t9%p1%{8}%-%d%e38;5;%p1%d%;m,
        sgr=%?%p9%t\E(0%e\E(B%;\E[0%?%p6%t;1%;%?%p5%t;2%;%?%p2%t;4%;%?%p1%p3%|%t;7%;%?%p4%t;5%;%?%p7%t;8%;m,
        sgr0=\E(B\E[m, sitm=\E[3m, smacs=\E(0, smam=\E[?7h,
        smcup=\E[?1049h\E[22;0;0t, smglp=\E[?69h\E[%i%p1%ds,
        smglr=\E[?69h\E[%i%p1%d;%p2%ds,
        smgrp=\E[?69h\E[%i;%p1%ds, smir=\E[4h, smkx=\E[?1h\E=,
        smm=\E[?1034h, smso=\E[7m, smul=\E[4m, tbc=\E[3g,
        u6=\E[%i%d;%dR, u7=\E[6n, u8=\E[?%[;0123456789]c,
        u9=\E[c, vpa=\E[%i%p1%dd,
</code></pre></div></div>

<p>There’s so much junk in there. I wonder how much of it applies only
to non-ANSI hardware terminals, and is therefore irrelevant these days.</p>

<p>For now, we’re only interested in three of these capabilities:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">colors</code> is how many colors this terminal supports. The standard
values are 0, 8, 16, 256, and 0x1000000 (24-bit), though other
values exist.</li>
  <li><code class="language-plaintext highlighter-rouge">setaf</code> and <code class="language-plaintext highlighter-rouge">setab</code> set foreground and background colors,
respectively. I believe they stand for “Set ANSI Foreground” and
“Set ANSI Background”. Each takes a single argument, the color
number.</li>
</ul>
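<p>You can query and expand these capabilities without writing escape
sequences by hand. As a sketch, Python’s standard <code class="language-plaintext highlighter-rouge">curses</code> bindings can
load a record and evaluate <code class="language-plaintext highlighter-rouge">setaf</code> (assuming the <code class="language-plaintext highlighter-rouge">xterm-256color</code> entry
is installed on the host):</p>

```python
import curses

# Load the xterm-256color record from the terminfo database.
curses.setupterm("xterm-256color")

print(curses.tigetnum("colors"))   # 256
setaf = curses.tigetstr("setaf")   # the parameterized capability string
print(curses.tparm(setaf, 1))      # b'\x1b[31m' (ANSI red)
print(curses.tparm(setaf, 200))    # b'\x1b[38;5;200m' (256-palette entry)
```

<p><code class="language-plaintext highlighter-rouge">tparm</code> is the same parameter-language interpreter ncurses uses, so
it’s a handy way to check what a given terminfo entry will actually emit.</p>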

<p>Those percent signs are a parameter arithmetic and substitution
language. Let’s decode <code class="language-plaintext highlighter-rouge">setaf</code> in particular:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>setaf=\E[%?%p1%{8}%&lt;%t3%p1%d%e%p1%{16}%&lt;%t9%p1%{8}%-%d%e38;5;%p1%d%;m
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>print "\E["
if p1 &lt; 8 {
  print "3" p1
} else if p1 &lt; 16 {
  print "9" (p1 - 8)
} else {
  print "38;5;" p1
}
print "m"
</code></pre></div></div>

<p>This is the <code class="language-plaintext highlighter-rouge">xterm-256color</code> terminfo description. It only knows how
to output the ANSI 30-37 SGR parameters, the non-standard 90-97
brights (from IBM AIX), or otherwise the 256-entry palette, using
ambiguous semicolon-delimited syntax.</p>
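<p>In ordinary code, that logic amounts to the following (a hypothetical
helper of my own, not any real API):</p>

```python
def setaf_256color(p1: int) -> str:
    """Foreground escape as xterm-256color's setaf computes it."""
    if p1 < 8:
        return f"\x1b[3{p1}m"         # standard ANSI 30-37
    elif p1 < 16:
        return f"\x1b[9{p1 - 8}m"     # non-standard AIX "bright" 90-97
    else:
        return f"\x1b[38;5;{p1}m"     # 256-entry palette, semicolon syntax

print(repr(setaf_256color(1)))    # '\x1b[31m'
print(repr(setaf_256color(9)))    # '\x1b[91m'
print(repr(setaf_256color(200)))  # '\x1b[38;5;200m'
```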

<p>Let’s compare with <code class="language-plaintext highlighter-rouge">xterm-direct</code>, the terminfo entry that supports
RGB.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ infocmp xterm-256color xterm-direct
comparing xterm-256color to xterm-direct.
    comparing booleans.
        ccc: T:F.
    comparing numbers.
        colors: 256, 16777216.
    comparing strings.
        initc: '\E]4;%p1%d;rgb:%p2%{255}%*%{1000}%/%2.2X/%p3%{255}%*%{1000}%/%2.2X/%p4%{255}%*%{1000}%/%2.2X\E\\', NULL.
        oc: '\E]104\007', NULL.
        rs1: '\Ec\E]104\007', '\Ec'.
        setab: '\E[%?%p1%{8}%&lt;%t4%p1%d%e%p1%{16}%&lt;%t10%p1%{8}%-%d%e48;5;%p1%d%;m', '\E[%?%p1%{8}%&lt;%t4%p1%d%e48:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&amp;%d:%p1%{255}%&amp;%d%;m'.
        setaf: '\E[%?%p1%{8}%&lt;%t3%p1%d%e%p1%{16}%&lt;%t9%p1%{8}%-%d%e38;5;%p1%d%;m', '\E[%?%p1%{8}%&lt;%t3%p1%d%e38:2::%p1%{65536}%/%d:%p1%{256}%/%{255}%&amp;%d:%p1%{255}%&amp;%d%;m'.
</code></pre></div></div>

<p>A few things are notable:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">xterm-direct</code> advertises 16.7 million colors, as expected.</li>
  <li><code class="language-plaintext highlighter-rouge">xterm-direct</code> unsets the <code class="language-plaintext highlighter-rouge">ccc</code> boolean, which indicates color
indices cannot have new RGB values assigned.</li>
  <li>Correspondingly, xterm-direct unsets <code class="language-plaintext highlighter-rouge">initc</code>, <code class="language-plaintext highlighter-rouge">oc</code>, and <code class="language-plaintext highlighter-rouge">rs1</code>, also
related to changing color values at runtime.</li>
  <li>And of course <code class="language-plaintext highlighter-rouge">setaf</code> and <code class="language-plaintext highlighter-rouge">setab</code> change. We’ll decode that next.</li>
</ul>

<p>Here’s where Terminfo’s limitations cause us trouble. Terminfo and
ncurses are joined at the hip. Their programming model is that there are
N palette entries, each of which has a default RGB value, and
terminals may support overriding any palette entry’s RGB value.</p>

<p>The <code class="language-plaintext highlighter-rouge">-direct</code> terminals, however, are different. They represent 24-bit
colors by pretending there are 16.7 million palette entries, each of
which maps to the 8:8:8 RGB cube, but whose values cannot be changed.</p>

<p>Now let’s look at the new <code class="language-plaintext highlighter-rouge">setaf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>print "\E["
if p1 &lt; 8 {
  print "3" p1
} else {
  print "38:2::" (p1 / 65536) ":" ((p1 / 256) &amp; 255) ":" (p1 &amp; 255)
}
print "m"
</code></pre></div></div>

<p>It’s not <em>quite</em> as simple as direct RGB. For compatibility with
programs that assume the meaning of <code class="language-plaintext highlighter-rouge">setaf</code>, this scheme steals the
darkest 7 blues, not including black, and uses them for compatibility
with the basic ANSI 8 colors. Otherwise, there’s a risk of legacy
programs outputting barely-visible dark blues instead of the ANSI
colors they expect.</p>

<p>One consequence is that the <code class="language-plaintext highlighter-rouge">-direct</code> schemes are incompatible with
the <code class="language-plaintext highlighter-rouge">-256color</code> schemes, so programs must be aware that 256 colors
means indexed and 16.7 million means direct, except that the darkest 7
blues are to be avoided.</p>

<p>Fundamentally, terminfo has no notion of color space. So a program
that was written before terminfo even supported more colors than 256
might (validly!) assume the values of the first 8, 16, or even 256
palette entries.</p>

<p>This explains an issue with the Rust crate
<a href="https://docs.rs/termwiz/latest/termwiz/">termwiz</a> that I <a href="https://github.com/wez/wezterm/issues/4528">recently
ran into</a> at work. A
<a href="https://sapling-scm.com/">program</a> expected to output colors in the
xterm-256color palette, but was actually generating various
illegibly-dark shades of blue. (Note: Despite the fact that the issue
is open as of this writing, @quark-zju landed a fix, so current
termwiz behaves reasonably.)</p>

<p>This is a terminfo restriction, not a terminal restriction. As far as
I know, every terminal that supports 24-bit color also supports the
xterm 256-color palette and even dynamically changing their RGB
values. (You can even <a href="https://gist.github.com/chadaustin/7046bff2261b0f669d223a88ecad8282">animate the
palette</a>
like <a href="https://www.youtube.com/watch?v=aMcJ1Jvtef0">The Secret of Monkey Island
did</a>!) While I appreciate
Thomas Dickey’s dedication to accurately documenting history and
preserving compatibility, terminfo simply isn’t great at accurate and
timely descriptions of today’s vibrant ecosystem of terminal
emulators.</p>

<p>Kovid Goyal, author of <a href="https://sw.kovidgoyal.net/kitty/">kitty</a>,
<a href="https://github.com/kovidgoyal/kitty/issues/4172#issuecomment-955190343">expresses his
frustration</a>:</p>

<blockquote>
  <p>To summarize, one cannot have both 256 and direct color support in
one terminfo file.</p>

  <p>Frustrated users of the ncurses library have only themselves to
blame, for choosing to use such a bad library.</p>
</blockquote>

<p>A deeper, more accurate discussion of the challenges is documented in
<a href="https://github.com/kovidgoyal/kitty/issues/879">kitty issue #879</a>.</p>

<p>In an ideal world, terminfo would have introduced a brand new
capability for 24-bit RGB, leaving the adjustable 256-color palette in
place.</p>

<p>Modern programs should probably disregard most of terminfo and assume
that 16.7 million colors implies support for the rest of the color
capabilities. And maybe generate their own ANSI-compatible escape
sequences… except for the next wrinkle.</p>
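<p>Emitting those sequences yourself is a one-liner; the only real
decision is the delimiter (a hypothetical helper, defaulting to the
more compatible semicolon form):</p>

```python
def truecolor_fg(r: int, g: int, b: int, colon: bool = False) -> str:
    # Semicolon syntax is ambiguous but the most widely accepted; colon
    # syntax (with an empty colorspace slot) is the standards-compliant form.
    if colon:
        return f"\x1b[38:2::{r}:{g}:{b}m"
    return f"\x1b[38;2;{r};{g};{b}m"

print(repr(truecolor_fg(255, 128, 0)))              # '\x1b[38;2;255;128;0m'
print(repr(truecolor_fg(255, 128, 0, colon=True)))  # '\x1b[38:2::255:128:0m'
```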

<h2 id="setting-term-semicolons-again">Setting TERM: Semicolons Again!</h2>

<p>Gripes about terminfo aside, everyone uses it, so we do need to ensure
TERM is set correctly.</p>

<p>While I’d like to standardize on the colon-based SGR syntax, several
terminals I use only support semicolons:</p>

<ul>
  <li><a href="https://learn.microsoft.com/en-us/windows/console/definitions#console-host">Conhost</a>,
Windows’s built-in console.</li>
  <li><a href="https://github.com/mintty/mintty/wiki/Changelog#370-14-november-2023">Mintty</a>
<a href="https://github.com/mintty/mintty/wiki/Changelog#370-14-november-2023">claims to
work</a>
(and <a href="https://github.com/mintty/wsltty">wsltty</a> does), but for some
reason running <a href="https://gist.github.com/chadaustin/2d2c2cb4b71fd1d4163aa8115077624a">my colortest.rs
program</a>
from Cygwin only works with semicolon syntax, unless I pipe the
output through <code class="language-plaintext highlighter-rouge">cat</code> or a file. There must be some kind of magic
translation happening under the hood. I haven’t debugged.</li>
  <li><a href="https://mosh.org/">Mosh</a> is aware, but hasn’t <a href="https://github.com/mobile-shell/mosh/issues/951">added
support</a>.</li>
  <li><a href="https://www.chiark.greenend.org.uk/~sgtatham/putty/">PuTTY</a>.</li>
  <li>Ubuntu 22.04 LTS ships a version of Konsole that only supports
semicolons.</li>
</ul>

<p>Terminfo entries are built from “building blocks”, marked with a plus.
<a href="https://invisible-island.net/ncurses/terminfo.src.html#tic-xterm_direct"><code class="language-plaintext highlighter-rouge">xterm+direct</code></a>
is the building block for the standard colon-delimited syntax.
<a href="https://invisible-island.net/ncurses/terminfo.src.html#tic-xterm_indirect"><code class="language-plaintext highlighter-rouge">xterm+indirect</code></a>
is the building block for legacy terminals that only support semicolon
syntax.</p>

<p>Searching for <code class="language-plaintext highlighter-rouge">xterm+indirect</code> shows which terminfo entries might work
for me. <code class="language-plaintext highlighter-rouge">vscode-direct</code> looks the most accurate. I assume that, since
it targets a Microsoft terminal, it’s probably close enough in
functionality to Windows Terminal and Windows Console. I have not
audited all capabilities, but it seems to work.</p>

<p>The next issue was that none of my servers had the <code class="language-plaintext highlighter-rouge">-direct</code> terminfo
entries installed! On most systems, the terminfo database comes from
the
<a href="https://packages.ubuntu.com/jammy/all/ncurses-base/filelist"><code class="language-plaintext highlighter-rouge">ncurses-base</code></a>
package, but you need
<a href="https://packages.ubuntu.com/jammy/all/ncurses-term/filelist"><code class="language-plaintext highlighter-rouge">ncurses-term</code></a>
for the extended set of terminals.</p>

<p>At work, we can configure a default set of installed packages for your
hosts, but I have to install them manually on my unmanaged personal
home machines. Also, I was still running Ubuntu 18, so I had to
upgrade to a version that contained the <code class="language-plaintext highlighter-rouge">-direct</code> terminfo entries.
(Of course, two of my headless machines failed to boot after
upgrading, but that’s a different story.)</p>

<p><del>Unfortunately, there is no terminfo entry for the Windows console.</del>
Since I started writing this post, ncurses introduced a
<a href="https://invisible-island.net/ncurses/NEWS.html#index-t20231230">winconsole</a>
terminfo entry, but it neither supports 24-bit color nor has shipped
in a released ncurses version.</p>

<h2 id="configuring-emacs">Configuring Emacs</h2>

<p>Emacs documents <a href="https://www.gnu.org/software/emacs/manual/html_node/efaq/Colors-on-a-TTY.html">how it detects truecolor
support</a>.</p>

<p>I find it helpful to <code class="language-plaintext highlighter-rouge">M-x eval-expression</code> <code class="language-plaintext highlighter-rouge">(display-color-cells)</code> to
confirm whether Emacs sees 16.7 million colors.</p>

<p>Emacs also documents the <code class="language-plaintext highlighter-rouge">-direct</code> mode terminfo limitation described
above:</p>

<blockquote>
  <p>Terminals with ‘RGB’ capability treat pixels #000001 - #000007 as
indexed colors to maintain backward compatibility with applications
that are unaware of direct color mode. Therefore the seven darkest
blue shades may not be available. If this is a problem, you can
always use custom terminal definition with ‘setb24’ and ‘setf24’.</p>
</blockquote>

<p>It’s worth noting that <code class="language-plaintext highlighter-rouge">RGB</code> is Emacs’s fallback capability. Emacs
looks for the <code class="language-plaintext highlighter-rouge">setf24</code> and <code class="language-plaintext highlighter-rouge">setb24</code> strings first, but no terminfo
entries on my machine contain those capabilities:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ for t in $(toe -a | cut -f1); do
    if (infocmp "$t" | grep 'setf24') &gt; /dev/null; then
      echo "$t";
    fi;
done
$
</code></pre></div></div>

<h2 id="nesting-terminals">Nesting Terminals</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>conhost.exe (WSL1)
+-------------------------+
| mosh                    |
| +---------------------+ |
| | tmux                | |
| | +-----------------+ | |
| | | emacs terminal  | | |
| | | +-------------+ | | |
| | | | $ ls        | | | |
| | | | foo bar baz | | | |
| | | +-------------+ | | |
| | +-----------------+ | |
| +---------------------+ |
+-------------------------+
</code></pre></div></div>

<p>I’d never consciously considered this, but my typical workflow nests
multiple terminals.</p>
<ul>
  <li>I open a graphical terminal emulator on my local desktop, Windows,
Mac, or Linux.</li>
  <li>I mosh to a remote machine or VM.</li>
  <li>I start tmux.</li>
  <li>I might then use a terminal within Emacs or
<a href="https://asciinema.org/">Asciinema</a> or <a href="https://www.gnu.org/software/screen/">GNU
Screen</a>.
    <ul>
      <li>Yes, there are situations where it’s useful to have some screen
sessions running inside or outside of tmux.</li>
    </ul>
  </li>
</ul>

<p>Each of those layers is its own implementation of the ANSI escape
sequence state machine. For 24-bit color to work, every single layer
has to understand and accurately translate the escape sequences from
the inner TERM value’s terminfo to the outer terminfo.</p>

<p>Therefore, you need recent-enough versions of all of this software.
Current LTS Ubuntus only ship with mosh 1.3, so I had to enable the
<a href="https://launchpad.net/~keithw/+archive/ubuntu/mosh-dev">mosh-dev
PPA</a>.</p>

<p>TERM must be set correctly within each terminal: <code class="language-plaintext highlighter-rouge">tmux-direct</code> within
tmux, for example. There is no standard terminfo for <code class="language-plaintext highlighter-rouge">mosh</code>, so you
have to pick something close enough.</p>

<h3 id="graphical-terminal-emulators">Graphical Terminal Emulators</h3>

<p>Most terminals either set TERM to a reasonable default or
allow you to override TERM.</p>

<p>I use Konsole, but I think you could find a similar option in
whichever you use.</p>

<figure>
<a href="/images/truecolor-terminal-emacs/konsole.png"><img src="/images/truecolor-terminal-emacs/konsole.png" alt="Konsole's TERM value selection" /></a>
<figcaption>Konsole's TERM value selection</figcaption>
</figure>

<h3 id="ssh">ssh</h3>

<p>Often, the first thing I do when opening a terminal is to <code class="language-plaintext highlighter-rouge">ssh</code>
somewhere else. Fortunately, this is easy, as long as the remote host
has the same terminfo record. <code class="language-plaintext highlighter-rouge">ssh</code> carries your TERM value into the
new shell.</p>

<h3 id="tmux">tmux</h3>

<p>But then you load <code class="language-plaintext highlighter-rouge">tmux</code> and TERM is set to <code class="language-plaintext highlighter-rouge">screen</code>! To fix this,
override <code class="language-plaintext highlighter-rouge">default-terminal</code> in your <code class="language-plaintext highlighter-rouge">~/.tmux.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set -g default-terminal "tmux-direct"
</code></pre></div></div>

<p>For extra credit, consider setting <code class="language-plaintext highlighter-rouge">tmux-direct</code> conditionally with
<code class="language-plaintext highlighter-rouge">%if</code> when the outer TERM supports 24-bit color, otherwise leaving the
default of <code class="language-plaintext highlighter-rouge">screen</code> or <code class="language-plaintext highlighter-rouge">tmux-256color</code>. And then let me know how you
did it. :P</p>

<h3 id="mosh">mosh</h3>

<p>While recent mosh does support 24-bit color, it <a href="https://github.com/mobile-shell/mosh/blob/1105d481bb9143dad43adf768f58da7b029fd39c/src/frontend/mosh-server.cc#L571">only advertises 8 or
256
colors</a>.
Thus, it’s up to you to set TERM appropriately.</p>

<p>Mosh aims for xterm compatibility, but unfortunately only supports
semicolon syntax for SGR 38 and 48, so <code class="language-plaintext highlighter-rouge">TERM=xterm-direct</code> does not
work. So far, I’ve found that <code class="language-plaintext highlighter-rouge">vscode-direct</code> is the closest to
<code class="language-plaintext highlighter-rouge">xterm-direct</code>.</p>

<p>There is no convenient “I’m running in mosh” variable, so I wrote a
<a href="https://gist.github.com/chadaustin/ee1a20e0522c10b65cb4006496d1fb7c"><code class="language-plaintext highlighter-rouge">detect-mosh.rs</code></a>
Rust script and called it from <code class="language-plaintext highlighter-rouge">.bashrc</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">unamer</span><span class="o">=</span><span class="si">$(</span><span class="nb">uname</span> <span class="nt">-r</span><span class="si">)</span>
<span class="nv">unameo</span><span class="o">=</span><span class="si">$(</span><span class="nb">uname</span> <span class="nt">-o</span><span class="si">)</span>
<span class="k">if</span> <span class="o">[[</span> <span class="o">!</span> <span class="s2">"</span><span class="nv">$TMUX</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
    if</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$unamer</span><span class="s2">"</span> <span class="o">==</span> <span class="k">*</span>Microsoft <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
        <span class="c"># WSL 1</span>
        <span class="nb">export </span><span class="nv">TERM</span><span class="o">=</span>vscode-direct
    <span class="k">elif</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$unameo</span><span class="s2">"</span> <span class="o">==</span> Cygwin <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
        <span class="c"># Eh, could just configure mintty to set mintty-direct.</span>
        <span class="nb">export </span><span class="nv">TERM</span><span class="o">=</span>vscode-direct
    <span class="k">elif </span>detect-mosh 2&gt;/dev/null<span class="p">;</span> <span class="k">then</span>
        <span class="c"># This should be xterm-direct, but mosh does not understand SGR</span>
        <span class="c"># colon syntax.</span>
        <span class="nb">export </span><span class="nv">TERM</span><span class="o">=</span>vscode-direct
    <span class="k">fi
fi</span>
</code></pre></div></div>

<p>It works by checking whether the shell process is a child of
<code class="language-plaintext highlighter-rouge">mosh-server</code>.</p>
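<p>For illustration, a rough Linux-only equivalent of that check can walk
<code class="language-plaintext highlighter-rouge">/proc</code> looking for a <code class="language-plaintext highlighter-rouge">mosh-server</code> ancestor (a sketch, not the actual
<code class="language-plaintext highlighter-rouge">detect-mosh.rs</code> linked above):</p>

```python
import os

def ancestor_comms(pid=None):
    """Collect the comm names of this process and all its ancestors."""
    pid = os.getpid() if pid is None else pid
    comms = []
    while pid > 1:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
        # comm is parenthesized and may itself contain spaces or parens,
        # so split around the *last* ')'.
        rparen = data.rindex(")")
        comms.append(data[data.index("(") + 1:rparen])
        # After ')': state, then ppid.
        pid = int(data[rparen + 2:].split()[1])
    return comms

def running_under_mosh() -> bool:
    return "mosh-server" in ancestor_comms()
```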

<p>The jury’s still out on whether it’s a good idea to compile Rust in
the critical path of login, especially on an underpowered host like
my Intel Atom NAS or a Raspberry Pi.</p>

<h2 id="it-works">It Works!</h2>

<p>Beautiful Emacs themes everywhere!</p>

<figure>
<a href="/images/truecolor-terminal-emacs/finally.png"><img src="/images/truecolor-terminal-emacs/finally.png" alt="Emacs within tmux within mosh" /></a>
<figcaption>Emacs within tmux within mosh</figcaption>
</figure>

<p>This was a ton of work, but I learned a lot, and, perhaps most
importantly, I now feel confident I could debug any kind of wonky
terminal behavior in the future.</p>

<p>To recap:</p>
<ul>
  <li>Terminals don’t agree on syntax and capabilities.</li>
  <li>Terminfo is how those capabilities are queried.</li>
  <li>Terminfo is often limited, sometimes inaccurate, and new terminfo
versions are released infrequently.</li>
</ul>

<h2 id="whats-next">What’s Next?</h2>

<p>If you were serious about writing software to take full advantage of
modern terminal capabilities, it would be time to break from terminfo.</p>

<p>I imagine such a project would look like this:</p>
<ul>
  <li>Continue to use the TERM variable because it’s well-supported.</li>
  <li>Give programs knowledge of terminals independent of the age of the
operating system or distribution they’re running on:
    <ul>
      <li>Programs would link with a frequently-updated (Rust?) library.</li>
      <li>Said library would contain a (modern!) terminfo database
representing, say, the last 10 years of terminal emulators, keyed
on (name, version). Notably, the library would not pretend to
support any hardware terminals, because they no longer exist. We
can safely forget about
<a href="https://www.gnu.org/software/termutils/manual/termcap-1.3/html_mono/termcap.html#SEC7">padding</a>,
for example.</li>
    </ul>
  </li>
  <li>Continue to support the terminfo file format and OS-provided
terminfo files on disk, with some protocol for determining which
information is most-up-to-date.</li>
  <li>Allow an opt-in TERMVERSION to differentiate between the
capabilities of, for example, 2022’s Konsole and 2023’s Konsole.</li>
  <li>Allow describing modern terminal capabilities (like 24-bit color,
256-color palette animation, <a href="https://github.com/Alhadis/OSC8-Adoption/">URL
links</a>, <a href="https://sw.kovidgoyal.net/kitty/graphics-protocol/">Kitty’s graphics
protocol</a>) in an
accurate, unambiguous format, independent of the timeline of new
ncurses releases.</li>
  <li>Backport modern terminal descriptions to legacy programs by
providing a program to be run by <code class="language-plaintext highlighter-rouge">.bashrc</code> that:
    <ul>
      <li>Uses TERM and TERMVERSION to generate a binary terminfo file in
<code class="language-plaintext highlighter-rouge">$HOME/.terminfo/</code>, which ncurses knows how to discover.</li>
      <li>Generates unambiguous 24-bit color capabilities like <code class="language-plaintext highlighter-rouge">RGB</code>,
<code class="language-plaintext highlighter-rouge">setf24</code>, and <code class="language-plaintext highlighter-rouge">setb24</code>, despite the fact that getting them added
to terminfo has been politically untenable.</li>
      <li>Otherwise, assumes RGB-unaware programs will assume the 256-color
palette, and leaves <code class="language-plaintext highlighter-rouge">colors#0x100</code>, <code class="language-plaintext highlighter-rouge">initc</code>, <code class="language-plaintext highlighter-rouge">oc</code> in place.
Palette animation is a useful, widely-supported feature.</li>
    </ul>
  </li>
</ul>

<p>Let me know if you’re interested in such a project!</p>]]></content><author><name></name></author><category term="terminal" /><summary type="html"><![CDATA[Thanks to some coworkers and David Wilson’s Emacs from Scratch playlist, I’ve been getting back into Emacs. The community is more vibrant than the last time I looked, and LSP brings modern completion and inline type checking.]]></summary></entry><entry><title type="html">Reference Counting Things</title><link href="https://chadaustin.me/2023/11/reference-counting-things/" rel="alternate" type="text/html" title="Reference Counting Things" /><published>2023-11-28T00:00:00-06:00</published><updated>2023-11-28T00:00:00-06:00</updated><id>https://chadaustin.me/2023/11/reference-counting-things</id><content type="html" xml:base="https://chadaustin.me/2023/11/reference-counting-things/"><![CDATA[<p>Reference counting is cheap and easy. An integer starts at one,
increments on every new reference, and whoever decrements it to zero
is responsible for deallocation.</p>

<p>If references are shared across threads, increments and decrements
must be atomic.</p>

<p>Decades ago, I wrote an <a href="https://audiere.sourceforge.net/">audio
library</a> that shipped in a couple
commercial games. Things you’d find on CD in the bargain bin at
Walmart. The ABI was <a href="https://chadaustin.me/cppinterface.html">modeled after
COM</a> and most objects were
reference-counted. At the time I’d never seen a dual-CPU system, and
thought <code class="language-plaintext highlighter-rouge">inc [refcount]</code> and <code class="language-plaintext highlighter-rouge">dec [refcount]</code> were single instructions.
It would be fine, right?!</p>

<p>Dual-core didn’t yet exist, but some people had dual-socket boards,
and we started seeing crash reports after the CDs were burned… oops.</p>

<p>(On the bright side, since I was religious about maintaining stable
ABIs, users could just drop the fixed DLL into place.)</p>

<h2 id="cost-of-atomics">Cost of Atomics</h2>

<p>Atomics are more expensive than non-atomic operations. <code class="language-plaintext highlighter-rouge">inc</code> is a
handful of cycles. <code class="language-plaintext highlighter-rouge">lock inc</code>, even uncontended, can be dozens.</p>

<p>When C++ standardized <code class="language-plaintext highlighter-rouge">std::shared_ptr</code> in 2011, the committee <a href="https://stackoverflow.com/a/15140227/1483824">didn’t even
bother with a non-atomic
version</a>. C++ isn’t safe
enough, and there was a feeling that atomic increments and decrements
were common enough that they’d get optimized in hardware. That was
correct – it just took a while.</p>

<p>Rust’s safety guarantees, on the other hand, allow safe use of an
unsynchronized <code class="language-plaintext highlighter-rouge">Rc</code> if you don’t want to pay for <code class="language-plaintext highlighter-rouge">Arc</code>.</p>

<p>It’s pretty easy for reference counting overhead to show up in
profiles. Sometimes it’s the accidental <code class="language-plaintext highlighter-rouge">shared_ptr</code> copy in a hot
loop or a recursive <code class="language-plaintext highlighter-rouge">.clone()</code> in Rust. Last time I wrote Swift,
atomic reference counts were a major cost.</p>

<p>The hardware is getting better. On Apple Silicon and AMD
Zen 3, uncontended atomic increments and decrements are almost as
cheap as non-atomic. (Interestingly, atomics are also cheap on my
64-bit, 4-thread Intel Atom from 2011.) These optimizations are a big
deal, and if all CPUs worked that way, maybe this blog post would end
here.</p>

<p>Alas, data centers are still filled with years-old Intel CPUs and
non-Apple ARM implementations. It’s worth spending some time in
software to avoid synchronization if possible.</p>

<h2 id="avoid-0-to-1">Avoid 0-to-1</h2>

<p>Here’s an easy but commonly-missed trick. Initialize your reference
counts to 1.</p>

<p>For whatever reason (symmetry?), it’s common to see implementations like:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">Object</span> <span class="p">{</span>
  <span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o">&lt;</span><span class="kt">size_t</span><span class="o">&gt;</span> <span class="n">count</span><span class="p">{</span><span class="mi">0</span><span class="p">};</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="nc">ObjectPtr</span> <span class="p">{</span>
  <span class="n">ObjectPtr</span><span class="p">(</span><span class="n">Object</span><span class="o">*</span> <span class="n">p</span><span class="p">)</span><span class="o">:</span> <span class="n">p</span><span class="p">{</span><span class="n">p</span><span class="p">}</span> <span class="p">{</span>
    <span class="n">p</span><span class="o">-&gt;</span><span class="n">count</span><span class="p">.</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">memory_order_relaxed</span><span class="p">);</span>
  <span class="p">}</span>
  <span class="n">Object</span><span class="o">*</span> <span class="n">p</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>I haven’t seen a compiler realize it can replace the initial value
with 1 and avoid atomics when new objects are allocated.</p>
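<p>Hand-writing the transformation is easy, though. Here’s a sketch of the same hypothetical <code class="language-plaintext highlighter-rouge">ObjectPtr</code> with the count started at 1 (release logic omitted; names are illustrative, not from any particular library):</p>

```cpp
#include <atomic>
#include <cstddef>

struct Object {
  // Start at 1: this is the reference owned by whoever allocated.
  std::atomic<size_t> count{1};
};

struct ObjectPtr {
  // Adopts the initial reference -- no atomic RMW on the allocation path.
  explicit ObjectPtr(Object* p) : p{p} {}

  // Copies still pay for an atomic increment.
  ObjectPtr(const ObjectPtr& other) : p{other.p} {
    p->count.fetch_add(1, std::memory_order_relaxed);
  }

  Object* p;
};
```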

<h2 id="avoid-1-to-0">Avoid 1-to-0</h2>

<p>A typical release implementation is written:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">ObjectPtr</span> <span class="p">{</span>
  <span class="o">~</span><span class="n">ObjectPtr</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="mi">1</span> <span class="o">==</span> <span class="n">p</span><span class="o">-&gt;</span><span class="n">count</span><span class="p">.</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">memory_order_acq_rel</span><span class="p">))</span> <span class="p">{</span>
      <span class="k">delete</span> <span class="n">p</span><span class="p">;</span>
    <span class="p">}</span>
  <span class="p">}</span>
  <span class="n">Object</span><span class="o">*</span> <span class="n">p</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>However, actually decrementing the count to zero is not necessary. We
only need to know if we’re the last reference. Thus, we can write:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="o">~</span><span class="n">ObjectPtr</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="mi">1</span> <span class="o">==</span> <span class="n">p</span><span class="o">-&gt;</span><span class="n">count</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">memory_order_acquire</span><span class="p">)</span> <span class="o">||</span>
        <span class="mi">1</span> <span class="o">==</span> <span class="n">p</span><span class="o">-&gt;</span><span class="n">count</span><span class="p">.</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">memory_order_acq_rel</span><span class="p">))</span> <span class="p">{</span>
      <span class="k">delete</span> <span class="n">p</span><span class="p">;</span>
    <span class="p">}</span>
  <span class="p">}</span>
</code></pre></div></div>

<p>Maybe the impact on code size isn’t worth it. That’s your call. On
older Intel CPUs, in situations where most objects only have one
reference, it can be a meaningful optimization.</p>

<p>Maged Michael <a href="https://github.com/gcc-mirror/gcc/commit/dbf8bd3c2f2cd2d27ca4f0fe379bd9490273c6d7">implemented a fancier version of this
algorithm</a>
in gcc’s libstdc++.</p>

<p><a href="https://github.com/facebook/watchman/commit/32c74f3a580785ecef2ee329a336f46e7002df8f">Implementing these two
optimizations</a>
in <a href="https://facebook.github.io/watchman/">Watchman</a> was a material win
for code that allocated or deallocated large trees.</p>

<h2 id="biased-reference-counting">Biased Reference Counting</h2>

<p>Swift implicitly reference-counts many of its objects. When I worked
at Dropbox, we measured reference counting operations as a substantial
portion of our overall CPU time.</p>

<p>In 2018, researchers at University of Illinois Urbana-Champaign
<a href="http://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf">proposed an algorithm called Biased Reference
Counting</a> that
splits the reference count into two. One is biased to a specific
thread and can be updated without atomic operations. The other
reference count is atomic and shared among the remaining threads.
Unifying these two counts requires extra bookkeeping, especially in
languages like Swift or C++ where unsynchronized values can easily migrate
across threads.</p>
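<p>A heavily reduced sketch of the idea follows. This is my simplification, not the paper’s algorithm (real implementations pack the owner’s thread ID and state flags into the counter word itself), and the <code class="language-plaintext highlighter-rouge">QUEUED</code> flag name is hypothetical:</p>

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

struct BiasedCount {
  // Hypothetical flag meaning "the owner has released its bias."
  static constexpr uint64_t QUEUED = uint64_t(1) << 32;

  std::thread::id owner = std::this_thread::get_id();
  uint64_t biased = 1;              // owner-thread refs: plain, non-atomic
  std::atomic<uint64_t> shared{0};  // refs held by every other thread

  void incref() {
    if (std::this_thread::get_id() == owner) {
      ++biased;  // no synchronization on the hot, owner-thread path
    } else {
      shared.fetch_add(1, std::memory_order_relaxed);
    }
  }

  // Returns true exactly once, when the object is fully dead. Assumes
  // the owner takes no new references after dropping its last biased one.
  bool decref() {
    if (std::this_thread::get_id() == owner) {
      if (--biased != 0) {
        return false;
      }
      // Release the bias; if no shared refs exist either, we were last.
      return shared.fetch_add(QUEUED, std::memory_order_acq_rel) == 0;
    }
    // The last shared decrement frees only if the bias is already gone.
    return shared.fetch_sub(1, std::memory_order_acq_rel) == QUEUED + 1;
  }
};
```

The bookkeeping cost the paper pays is precisely in that merge step: unifying the biased and shared halves without losing a concurrent decrement.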

<p>The <a href="https://crates.io/crates/hybrid_rc">hybrid_rc</a> Rust crate has an
implementation of this algorithm that takes advantage of Rust’s type
system (in particular, by not providing <code class="language-plaintext highlighter-rouge">Send</code> for thread-local
references) to avoid extra bookkeeping.</p>

<p>I’m curious if anyone uses biased reference counting in practice.</p>

<h2 id="split-reference-counting">Split Reference Counting</h2>

<p>Channel and promise implementations need to track two reference
counts: one for readers and one for writers. When either reaches zero,
the channel is closed. Waiting senders or receivers are notified that
no more messages can be sent.</p>

<p>Rust’s built-in channels use <a href="https://github.com/rust-lang/rust/blob/9144d511758fdd85db4daeeea1020f62c61bdd04/library/std/src/sync/mpmc/counter.rs#L6">two atomic counters and an atomic
bit</a>.
The bit is necessary to determine which thread should deallocate in
the case that a thread drops the last reader exactly as another thread
drops the last writer.</p>

<p>It’s possible to pack all of these into a single 64-bit counter. If
each half has 32 bits but the entire counter is updated atomically, no
additional state is required to disambiguate who deallocates.</p>
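<p>A sketch of the packing (illustrative names; <code class="language-plaintext highlighter-rouge">splitrc</code>’s real layout differs): readers occupy the low half, writers the high half, and whichever decrement observes the entire 64-bit word reaching zero knows it dropped the final reference of either kind:</p>

```cpp
#include <atomic>
#include <cstdint>

struct SplitCount {
  static constexpr uint64_t READER = 1;
  static constexpr uint64_t WRITER = uint64_t(1) << 32;

  // Start with one reader and one writer.
  std::atomic<uint64_t> bits{READER + WRITER};

  void retain(uint64_t kind) {
    bits.fetch_add(kind, std::memory_order_relaxed);
  }

  // True iff the caller dropped the very last reference, reader or
  // writer. "All readers gone, close the channel" is a mask check on
  // the same returned value: ((old - kind) & (WRITER - 1)) == 0.
  bool release(uint64_t kind) {
    uint64_t old = bits.fetch_sub(kind, std::memory_order_acq_rel);
    return old == kind;
  }
};
```

Because both halves move under a single atomic update, there is no window where one thread sees the readers gone while another sees the writers gone, so no tiebreaker bit is needed.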

<p>I have a Rust implementation of the above in the <a href="https://docs.rs/splitrc/latest/splitrc/">splitrc
crate</a>.</p>

<h2 id="how-many-bits">How Many Bits?</h2>

<p>Rust is sound: safe code must not have undefined behavior.
<a href="https://doc.rust-lang.org/std/mem/fn.forget.html">std::mem::forget</a>
is a safe function. Therefore, it’s possible to run up the reference
count of some <code class="language-plaintext highlighter-rouge">p</code> in a tight loop such as:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">loop</span> <span class="p">{</span>
  <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">forget</span><span class="p">(</span><span class="n">p</span><span class="nf">.clone</span><span class="p">());</span>
<span class="p">}</span>
</code></pre></div></div>

<p>64-bit counters are effectively infinite. Let’s hypothesize a 4 GHz
CPU where increments take one cycle. It would take almost 150 years to
overflow.</p>
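<p>The arithmetic, written out as a compile-time check under the same hypothetical 4 GHz, one-increment-per-cycle CPU:</p>

```cpp
#include <cstdint>

// 2^64 - 1 increments at 4e9 increments per second:
constexpr uint64_t kIncrementsPerSecond = 4'000'000'000;
constexpr uint64_t kSeconds = UINT64_MAX / kIncrementsPerSecond;  // ~4.6e9 s
constexpr uint64_t kYears = kSeconds / (365ull * 24 * 60 * 60);
static_assert(kYears == 146, "about a century and a half to overflow");
```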

<p>In contrast, a modern CPU can overflow a 32-bit counter in seconds.
You might say (and I’d agree) that a program that holds billions of
references is pathological, and need not be supported. On the other
hand, in Rust, safe code must never be able to overflow a count and
cause use-after-free.</p>

<p>Therefore, any 32-bit counter (even <code class="language-plaintext highlighter-rouge">usize</code> and <code class="language-plaintext highlighter-rouge">AtomicUsize</code> on
32-bit CPUs) must detect and handle overflow.</p>
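<p>One way to satisfy that requirement, sketched below. It’s modeled loosely on <code class="language-plaintext highlighter-rouge">Arc</code>’s reserved-range approach; <code class="language-plaintext highlighter-rouge">incref_checked</code> is my name, not a real API:</p>

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Increment first, check after: safe as long as fewer than ~2 billion
// threads can race past the check before any of them reaches it.
inline void incref_checked(std::atomic<uint32_t>& count) {
  uint32_t old = count.fetch_add(1, std::memory_order_relaxed);
  if (old > UINT32_MAX / 2) {
    std::fputs("reference count overflow\n", stderr);
    std::abort();  // a loud failure beats a silent use-after-free
  }
}
```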

<p><code class="language-plaintext highlighter-rouge">Rc</code> uses <code class="language-plaintext highlighter-rouge">usize::wrapping_add</code> to <a href="https://github.com/rust-lang/rust/blob/6faa181015208911f9492cc41254fb2a0e95f23f/library/alloc/src/rc.rs#L3263">detect
wraparound</a>.
<code class="language-plaintext highlighter-rouge">Arc</code> reserves <a href="https://github.com/rust-lang/rust/blob/6faa181015208911f9492cc41254fb2a0e95f23f/library/alloc/src/sync.rs#L2025">half the range of
<code class="language-plaintext highlighter-rouge">usize</code></a>
to detect overflow. This is safe under the assumption that billions of
threads aren’t simultaneously incrementing the counter.</p>

<p>Rust reference counts typically abort on overflow rather than panic. I
assume this is because panics can be caught and ignored. There may be
codegen benefits as well. However, in the context of long-lived server
processes that concurrently handle requests, it’s nice to catch panics
and fail the one buggy request instead of aborting.</p>

<p><code class="language-plaintext highlighter-rouge">splitrc</code> allocates a <a href="https://docs.rs/splitrc/0.1.4/src/splitrc/lib.rs.html#55">panic range and an abort
range</a> to
get the best of both worlds.</p>

<p>In practice, reference counts should never get high enough to
overflow. But that sounds like
<a href="https://www.wfrp.de/hosted/flw/en/flw0250.html">famous last words</a>,
and I’ll happily pay a branch and some cold code for a loud failure.</p>

<p>Older versions of the Linux kernel even had a <a href="https://lwn.net/Articles/786044/">use-after-free caused
by reference count overflow</a>.</p>

<h2 id="weak-references">Weak References</h2>

<p>Like split reference counts, supporting weak references requires
maintaining two counts: a strong count and a weak count. When the
strong count reaches zero, the referenced value is destructed. But the
memory can’t be deallocated until both counts reach zero.</p>

<p>The approach taken by Rust’s <code class="language-plaintext highlighter-rouge">Arc</code> is to maintain two separate
counters. All strong references share an extra weak reference. When
the last strong reference is dropped, the extra weak reference is
dropped too.</p>

<p>Memory is deallocated when the weak count reaches zero.</p>
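<p>A sketch of that two-counter scheme (the names and the return protocol are mine, and the real <code class="language-plaintext highlighter-rouge">Arc</code> also has to handle racing weak-to-strong upgrades):</p>

```cpp
#include <atomic>
#include <cstddef>

// Simplified control block: all strong references collectively hold
// exactly one weak reference, so `weak` starts at 1 alongside `strong`.
struct Control {
  std::atomic<size_t> strong{1};
  std::atomic<size_t> weak{1};
};

// Illustrative return protocol: 0 = nothing to do, 1 = destruct the
// value, 2 = destruct the value AND free the allocation.
inline int drop_strong(Control& c) {
  if (c.strong.fetch_sub(1, std::memory_order_acq_rel) != 1) {
    return 0;  // other strong refs remain
  }
  // Last strong ref: the value dies now; drop the shared weak ref too.
  if (c.weak.fetch_sub(1, std::memory_order_acq_rel) != 1) {
    return 1;  // outstanding weak handles keep the allocation alive
  }
  return 2;
}
```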

<p><a href="https://github.com/llvm/llvm-project/blob/43bc81d7488a8fbd43b855f7e1100cfe110f90fc/libcxx/include/__memory/shared_ptr.h#L213-L216">libc++ takes a similar
approach</a>
with the interesting caveat that it starts counting at zero and waits
until the counts decrement to -1.</p>

<p>Supporting weak references has a small cost. You need space for two
counters, and some implementations actually perform two atomic
decrements when the last strong reference is dropped.</p>

<p>It’s possible to do better: like
<a href="https://docs.rs/splitrc/latest/splitrc/">splitrc</a>, the strong and
weak references can be packed into a single 64-bit integer with
overflow detection on each half. Each new reference is a single atomic
addition. As in the 1-to-0 optimization above, an optimistic load can
avoid an atomic RMW in the common case that no weak references are
alive.</p>

<p>If you don’t need weak references, the Rust
<a href="https://docs.rs/triomphe/latest/triomphe/">triomphe</a> crate provides
some faster alternatives to the standard library.</p>

<h2 id="count-first-reference-from-zero-or-one">Count First Reference From Zero or One?</h2>

<p>It’s typical for reference counts to start at one and decrement to
zero. But that’s not the only option. As mentioned above, libc++
initializes its counts to zero, where a stored zero means one
reference. Decrement checks whether the count underflows to -1.</p>

<p>Unless you’re the most standard of libraries, the tiny differences in
instruction selection don’t matter. But they’re fun to look at, so
let’s see. (<a href="https://gcc.godbolt.org/z/qf9sGG9Th">Compiler Explorer</a>)</p>

<p>Initializing values to zero is smaller in most ISAs:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">RC</span> <span class="p">{</span>
    <span class="kt">size_t</span> <span class="n">s</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">w</span><span class="p">;</span>
<span class="p">};</span>

<span class="kt">void</span> <span class="n">init_zero</span><span class="p">(</span><span class="n">RC</span><span class="o">&amp;</span> <span class="n">rc</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">rc</span><span class="p">.</span><span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">rc</span><span class="p">.</span><span class="n">w</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="n">init_one</span><span class="p">(</span><span class="n">RC</span><span class="o">&amp;</span> <span class="n">rc</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">rc</span><span class="p">.</span><span class="n">s</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="n">rc</span><span class="p">.</span><span class="n">w</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>x86-64 (gcc 13.2):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>init_zero(RC&amp;):
        pxor    xmm0, xmm0
        movups  XMMWORD PTR [rdi], xmm0
        ret
init_one(RC&amp;):
        movdqa  xmm0, XMMWORD PTR .LC0[rip]
        movups  XMMWORD PTR [rdi], xmm0
        ret
</code></pre></div></div>

<p>gcc chooses to load the pair of ones from a 128-bit constant. clang
instead generates two stores.</p>

<p>x86-64 (clang 17):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>init_zero(RC&amp;):                       # @init_zero(RC&amp;)
        xorps   xmm0, xmm0
        movups  xmmword ptr [rdi], xmm0
        ret
init_one(RC&amp;):                        # @init_one(RC&amp;)
        mov     qword ptr [rdi], 1
        mov     qword ptr [rdi + 8], 1
        ret
</code></pre></div></div>

<p>ARM64 gcc generates equivalent code to x86-64. clang on ARM64 instead
broadcasts a constant 1 into a vector and stores it.</p>

<p>64-bit ARM (clang 17):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>init_zero(RC&amp;):                       // @init_zero(RC&amp;)
        stp     xzr, xzr, [x0]
        ret
init_one(RC&amp;):                        // @init_one(RC&amp;)
        mov     w8, #1                          // =0x1
        dup     v0.2d, x8
        str     q0, [x0]
        ret
</code></pre></div></div>

<p>As expected, zero-initialization is slightly cheaper.</p>

<p>Increment will generate the same instructions no matter where the
count starts, of course. (If using a 32-bit counter, overflow checks
are required. Choosing an overflow range that allows branching on the
sign bit can generate a smaller hot path, but that’s almost
independent of where to start counting.)</p>

<p>Decrement is a little interesting.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">dec_zero_exact</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o">&lt;</span><span class="kt">size_t</span><span class="o">&gt;&amp;</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="mi">0</span> <span class="o">==</span> <span class="n">c</span><span class="p">.</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">memory_order_acq_rel</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">dealloc</span><span class="p">();</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="n">dec_zero_less</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o">&lt;</span><span class="kt">size_t</span><span class="o">&gt;&amp;</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">using</span> <span class="kt">ssize_t</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">make_signed_t</span><span class="o">&lt;</span><span class="kt">size_t</span><span class="o">&gt;</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="mi">0</span> <span class="o">&gt;=</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">ssize_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">memory_order_acq_rel</span><span class="p">)))</span> <span class="p">{</span>
        <span class="n">dealloc</span><span class="p">();</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="n">dec_one</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o">&lt;</span><span class="kt">size_t</span><span class="o">&gt;&amp;</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="mi">1</span> <span class="o">==</span> <span class="n">c</span><span class="p">.</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">memory_order_acq_rel</span><span class="p">))</span> <span class="p">{</span>
        <span class="n">dealloc</span><span class="p">();</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Let’s look at x86-64:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dec_zero_exact(std::atomic&lt;unsigned long&gt;&amp;):    # @dec_zero_exact(std::atomic&lt;unsigned long&gt;&amp;)
        mov        rax, -1
        lock xadd  qword ptr [rdi], rax
        test       rax, rax
        je         dealloc()@PLT                # TAILCALL
        ret
dec_zero_less(std::atomic&lt;unsigned long&gt;&amp;):     # @dec_zero_less(std::atomic&lt;unsigned long&gt;&amp;)
        lock dec  qword ptr [rdi]
        jl        dealloc()@PLT                 # TAILCALL
        ret
dec_one(std::atomic&lt;unsigned long&gt;&amp;):           # @dec_one(std::atomic&lt;unsigned long&gt;&amp;)
        lock dec  qword ptr [rdi]
        je        dealloc()@PLT                 # TAILCALL
        ret
</code></pre></div></div>

<p>There are two atomic decrement instructions, <code class="language-plaintext highlighter-rouge">lock dec</code> and <code class="language-plaintext highlighter-rouge">lock
xadd</code>. <code class="language-plaintext highlighter-rouge">lock dec</code> is slightly preferable: it has a similar cost, but
its latency is one cycle less on Zen 4, and it’s smaller. (<code class="language-plaintext highlighter-rouge">lock xadd</code>
also requires loading -1 into a register.)</p>

<p>But, since <code class="language-plaintext highlighter-rouge">lock dec</code> doesn’t return the previous value and only sets
flags, it can only be used when a following comparison can use those flags.</p>

<p>Therefore, on x86-64, counting from 1 is slightly cheaper, at least
with a naive comparison. However, if we sacrifice half the range of
the counter type (again, two billion should be plenty), then we can
get the same benefits in the counting-from-zero decrement.</p>

<p>Now let’s take a look at ARM64:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dec_zero_exact(std::atomic&lt;unsigned long&gt;&amp;):        // @dec_zero_exact(std::atomic&lt;unsigned long&gt;&amp;)
        mov     x8, #-1                         // =0xffffffffffffffff
        ldaddal x8, x8, [x0]
        cbz     x8, .LBB2_2
        ret
.LBB2_2:
        b       dealloc()
dec_zero_less(std::atomic&lt;unsigned long&gt;&amp;):         // @dec_zero_less(std::atomic&lt;unsigned long&gt;&amp;)
        mov     x8, #-1                         // =0xffffffffffffffff
        ldaddal x8, x8, [x0]
        cmp     x8, #0
        b.le    .LBB3_2
        ret
.LBB3_2:
        b       dealloc()
dec_one(std::atomic&lt;unsigned long&gt;&amp;):                // @dec_one(std::atomic&lt;unsigned long&gt;&amp;)
        mov     x8, #-1                         // =0xffffffffffffffff
        ldaddal x8, x8, [x0]
        cmp     x8, #1
        b.ne    .LBB4_2
        b       dealloc()
.LBB4_2:
        ret
</code></pre></div></div>

<p>None of the atomic read-modify-writes on ARM64 set flags, so the value
has to be explicitly compared anyway. The only difference is that
comparing equality with zero is one fewer instruction.</p>

<p>So there we go. All of this instruction selection likely comes out
in the wash. I was hoping for a <a href="https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF">Dijkstra-like clear
winner</a>. The
strongest argument to start counts at 1 is that the counter never
underflows, allowing multiple counts to be packed into a single
integer.</p>

<h2 id="false-sharing">False Sharing</h2>

<p>Where the reference count is positioned in the object can matter. If
the reference count ends up in the same cache line as other
frequently-modified data, concurrent workloads are penalized. By reducing <a href="https://lore.kernel.org/netdev/CAHk-=wi=CDyS_ebXw745OCXnhwDpVLnahNveQNcZOPrzE5QiQA@mail.gmail.com/T/">false sharing and
making RCU more
scalable</a>,
Intel improved highly concurrent network performance in Linux by
2-130%.</p>
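<p>A common mitigation is to pad the counter onto its own cache line. A sketch (64 bytes is typical on x86-64; C++17’s <code class="language-plaintext highlighter-rouge">std::hardware_destructive_interference_size</code> names the portable constant, though not every standard library ships it):</p>

```cpp
#include <atomic>
#include <cstddef>

struct Object {
  int read_mostly_payload = 0;

  // Give the hot counter its own cache line so refcount traffic doesn't
  // invalidate the line holding read-mostly fields.
  alignas(64) std::atomic<size_t> count{1};
};

// Alignment forces the payload and the counter onto separate lines.
static_assert(sizeof(Object) == 128, "count padded to its own line");
```

The trade-off is memory: every object grows to a multiple of the cache-line size, which is why this is usually reserved for heavily shared objects.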

<p>There may be value in abstracting the count’s location through a
vtable, like COM does.</p>

<h2 id="ill-stop-here">I’ll Stop Here</h2>

<p>Reference counting is a long-studied topic. There are saturating
counts, counts that saturate into mark-and-sweep, counts that saturate
into (logged) leaks, cycle detection, weighted reference counts,
deferred increments, combining updates, and external counts, but you
can read about those elsewhere.</p>

<p>I mostly wanted to share some things I’ve recently run into.</p>]]></content><author><name></name></author><category term="c++" /><category term="rust" /><category term="performance" /><summary type="html"><![CDATA[Reference counting is cheap and easy. An integer starts at one, increments on every new reference, and whoever decrements it to zero is responsible for deallocation.]]></summary></entry><entry><title type="html">Microsoft Sculpt Wired Conversion Mod</title><link href="https://chadaustin.me/2021/02/wired-sculpt/" rel="alternate" type="text/html" title="Microsoft Sculpt Wired Conversion Mod" /><published>2021-02-21T00:00:00-06:00</published><updated>2021-02-21T00:00:00-06:00</updated><id>https://chadaustin.me/2021/02/wired-sculpt</id><content type="html" xml:base="https://chadaustin.me/2021/02/wired-sculpt/"><![CDATA[<p>I made a control board for the Microsoft Sculpt wireless keyboard that converts it to wired USB, and now my favorite keyboard is even better.</p>

<figure>
<a href="/images/sculpt/finished-board.jpeg"><img src="/images/sculpt/finished-board.jpeg" alt="The finished and installed board." /></a>
<figcaption>The finished and installed board.</figcaption>
</figure>

<figure>
<a href="/images/sculpt/messy-desk.jpeg"><img src="/images/sculpt/messy-desk.jpeg" alt="Wired keyboard and the resulting project mess!" /></a>
<figcaption>Wired keyboard and the resulting project mess!</figcaption>
</figure>

<figure>
<a href="/images/sculpt/underside.jpeg"><img src="/images/sculpt/underside.jpeg" alt="USB cable and reset button." /></a>
<figcaption>USB cable and reset button.</figcaption>
</figure>

<p>The QMK config is available at <a href="https://github.com/chadaustin/qmk_firmware">@chadaustin/qmk_firmware</a> (<a href="https://github.com/chadaustin/qmk_firmware/tree/master/keyboards/handwired/sculpt">keyboards/handwired/sculpt/</a>), and the PCB design files at <a href="https://github.com/chadaustin/wired-sculpt-pcb">@chadaustin/wired-sculpt-pcb</a>.</p>

<p>I’m planning on making at least one more, so if you’d like one, maybe I can help.</p>

<p>It’s a huge improvement. Latency is reduced by about 13 milliseconds, and with full control over the microcontroller’s firmware, you can customize keymaps and layers, and actually use the keyboard’s built-in LEDs.</p>

<h2 id="why">Why?</h2>

<p>Feel free to stop reading here — I am going to recount the sequence of events that led to this project. Besides some exposure to basic voltage and resistance circuits in college, I have very little electronics background. But, in a short time, I went from barely knowing what a capacitor was to having a working PCB manufactured and assembled, and maybe this will inspire someone else to give it a try.</p>

<p>Since developing RSI in college, I’ve exclusively used Microsoft’s ergonomic keyboards. And when I first tried the Sculpt, I instantly knew it was the best yet. The soft actuation, short key travel, and rigid frame are perfect for my hands. And because the number pad is a separate device, the distance to my mouse is shortened.</p>

<p>My brother went out and bought one too. Not much later, he gave it to me, saying the latency was inconsistent and high, and it was unacceptable for gaming. I thought he was being unusually sensitive, since I had no problem in Linux, Windows 7, or macOS. But then I updated to Windows 10 and saw exactly what he meant.</p>

<p>It was like the keyboard would go to sleep if a key wasn’t pressed for a few seconds, and the first keypress after a wake would be delayed or, worse, dropped.</p>

<p>And heaven forbid I use my USB 3 hub, whose EMI would disrupt the 2.4 GHz signal, and <em>every other</em> keypress would be unreliable. I’d gone as far as mounting the wireless transceiver directly under my keyboard, on the underside of my desk, and keys were still dropped.</p>

<p>So, best keyboard ever. But wireless sucks. (But mostly in Windows 10? No idea about that.)</p>

<h2 id="over-the-hump">Over the Hump</h2>

<p>What started this whole thing is that the <a href="https://github.com/facebookexperimental/eden/#edenfs">EdenFS</a> team was a bunch of keyboard enthusiasts. During the pandemic, as we’re all at home burning out and missing each other, we were trying to think of some virtual team offsites. Wez offered to walk everyone through building a <a href="https://www.1upkeyboards.com/instructions-downloads/sweet-16-instructions/">Sweet 16 Macro Pad</a>.</p>

<figure>
<a href="/images/sculpt/sweet-16.jpeg"><img src="/images/sculpt/sweet-16.jpeg" alt="Assembled Sweet 16 underside" /></a>
<figcaption>Assembled Sweet 16 underside. This is take two, after resoldering and cleaning the whole thing. Take one was a bit of a mess.</figcaption>
</figure>

<p>So, okay, a keyboard is a matrix, with some diodes used to disambiguate the signalling, and a microcontroller that rapidly polls the matrix and reports events over USB…</p>

<p>So maybe I could fix the Sculpt! I bought a transceiver-less Sculpt off eBay for cheap and <a href="http://emmanuelcontreras.com/how-to/how-to-disassemble-microsoft-sculpt-ergonomic-keyboard-and-make-it-wired/">popped it open (thanks Emmanuel Contreras!)</a>, thinking maybe its controller could be flashed with new firmware that speaks USB. The Sculpt uses a <a href="https://infocenter.nordicsemi.com/pdf/nRF24LE1_PS_v1.6.pdf">Nordic Semiconductor nRF24LE1</a>, but I was nowhere near capable of making use of that information at the time, though it did point me to Samy Kamkar’s horrifying guide on <a href="https://samy.pl/keysweeper/">surreptitiously sniffing keystrokes from nearby (older) Microsoft wireless keyboards</a>.</p>

<p>I almost gave up here, but Per Vognsen <a href="https://twitter.com/pervognsen/status/1322422385174220800">suggested I scan the matrix myself</a> and it turns out Michael Fincham had already <a href="https://www.reddit.com/r/MechanicalKeyboards/comments/bhkgnp/modification_photos_qmk_wired_microsoft_sculpt/">mapped out the matrix and soldered a Teensy 2.0++ board onto the Sculpt’s test pads</a>, showing this was doable!</p>

<p>So I ordered my own microcontroller to try the same thing.</p>

<p>First, I bought an Arduino Pro Micro, like the Sweet 16 uses. Oh hey, 18 GPIO pins isn’t enough to drive the Sculpt’s 26-pin matrix. I looked at using an I2C GPIO expander, but it felt like taking on too much.</p>

<figure>
<a href="/images/sculpt/pro-micro.jpeg"><img src="/images/sculpt/pro-micro.jpeg" alt="Arduino Pro Micro" /></a>
<figcaption>Arduino Pro Micro. Wait, you need pins to scan a matrix?</figcaption>
</figure>

<p>More pins? QMK’s Proton C has more pins! So I carefully soldered onto the test pads as Michael had shown was possible… and it worked!</p>

<figure>
<a href="/images/sculpt/proton-c.jpeg"><img src="/images/sculpt/proton-c.jpeg" alt="QMK Proton C" /></a>
<figcaption>QMK Proton C. It's a beautiful board.</figcaption>
</figure>

<figure>
<a href="/images/sculpt/test-pads.jpeg"><img src="/images/sculpt/test-pads.jpeg" alt="Soldering test pads to Proton C." /></a>
<figcaption>Soldering test pads to Proton C.</figcaption>
</figure>

<figure>
<a href="/images/sculpt/all-test-pads.jpeg"><img src="/images/sculpt/all-test-pads.jpeg" alt="All test pads connected to Proton C. It works!" /></a>
<figcaption>All test pads connected to Proton C. It works!</figcaption>
</figure>

<p>Getting those wires to stick to the pads without shorting was tricky. (I hadn’t yet discovered how magical flux is.)</p>

<p>The keyboard worked, but I couldn’t fit the board, its wires, and the new microcontroller into the case, and I wasn’t <em>really</em> happy leaving it in this state, even if I could pack it in somehow.</p>

<p>I thought, all I <em>really</em> need is the ribbon cable connector, so I ordered a 30 pin, 1.0 mm pitch ribbon breakout and the pricier (but tons of pins!) <a href="https://www.pjrc.com/store/teensypp.html">Teensy 2.0++</a>. Looking back, it’s cute that I was trying to save $10 on the microcontroller… You just have to get used to spending money on whatever saves you time.</p>

<figure>
<a href="/images/sculpt/breakout-and-teensy.jpeg"><img src="/images/sculpt/breakout-and-teensy.jpeg" alt="Ribbon cable breakout and Teensy 2.0++" /></a>
<figcaption>Ribbon cable breakout and Teensy 2.0++</figcaption>
</figure>

<p>Well, it was almost as annoying to solder, and still didn’t fit. So much for saving money on microcontrollers.</p>

<p>I thought about giving up. Is it really that bad that my keys don’t always register in games? Can I just tolerate some flakiness and latency?</p>

<p>But Jon Watte offered to spend an entire day showing me how to use KiCad, design circuits, lay out PCBs, select components on Digi-Key, scan datasheets for the important information, and work with a PCB manufacturing house. Of course you never turn down opportunities like that.</p>

<h2 id="designing-the-final-board---schematic">Designing the Final Board - Schematic</h2>

<p>Assuming, like me, you’ve never done this, I’ll summarize the steps.</p>

<p>First you sketch out the circuit schematic.</p>

<figure>
<a href="/images/sculpt/schematic.png"><img src="/images/sculpt/schematic.png" alt="Schematic" /></a>
<figcaption>Schematic in KiCad. Most of this was informed by the datasheet and Atmel's design guides.</figcaption>
</figure>

<p>Jon showed me several tricks in KiCad, like global labels, and starting with some standard resistor and capacitor values, but it’s very important that you go through the datasheets, because details can matter a ton.</p>

<p>I knew I wanted the main processor to be the AT90USB1286 controller, and fortunately KiCad already had a symbol for it. Atmel has a comprehensive and accessible data sheet, which showed me I needed some 22 Ω resistors on the USB data lines, which of the ISP programmer lines needed resistors (and appropriate values), and that I needed to either pull HWB low, or provide a physical switch that pulls it low, in order to allow rebooting the device into USB firmware update mode.</p>

<p>There are a bunch of things that are implicitly known to electrical engineers but that were new to me. You want:</p>

<ul>
  <li>a ground plane under the data lines and most of the microcontroller if possible.</li>
  <li>an electrolytic or tantalum bypass capacitor on the main 5V power from USB.</li>
  <li>ceramic filter capacitors on each power pin.</li>
  <li>appropriate values for the resonance capacitors on your crystal.</li>
  <li>electrostatic discharge protection! Turns out transients are common and it’s easy to fry a chip just by plugging it in.</li>
</ul>

<p>And then when you get into concerns like EMI and high-frequency signal integrity, the rabbit hole goes deep.</p>

<p>I kept having to tell myself “it’s just a keyboard”, but it also helped that there are a great number of high-quality resources on these topics just a click away. I spent lots of time on <a href="https://www.eevblog.com/">EEVBlog</a>.</p>

<p>Before finishing the circuit design, Jon had me do a couple smart things. In case the factory-supplied USB bootloader didn’t work out, he suggested I add the footprint (but not a connector!) for an ISP programmer and a debug LED to prove code would work at all.</p>

<h2 id="designing-the-final-board---physical-layout">Designing the Final Board - Physical Layout</h2>

<p>After arranging the schematic and ensuring it passed the electrical rules check, it was time to pick specific components. That is, the reference to a 220 Ω resistor is replaced with the Panasonic ERJ-3EKF2200V, 0603 surface mount.</p>

<p>There are a couple things to keep in mind. For common components, like resistors and ceramic capacitors, there is a huge amount of choice. For example, I see over 1400 surface-mount 220 Ω resistors on Digi-Key. I tried to just stick with one high-quality brand like Panasonic or Samsung for all of that stuff.</p>

<p>The important thing is the physical form factor, which determines the footprint on the board. Once you pick a part, it has a size, and you need to tell KiCad which physical footprint should be assigned to that component. I used 0603 resistors, so I assigned each resistor in the schematic the “Resistor_SMD:R_0603_1608Metric” footprint.</p>

<p>Same for everything else. Jon showed me how to draw my own footprints, but to avoid complexity, I was able to find appropriate footprints in KiCad’s standard libraries for every component I needed.</p>

<p>When you import the schematic into Pcbnew, it’s time to figure out where things go. Where are the edges of the board? Make careful measurements here. Where do the mounting holes go? Where do you want the microcontroller? Where do you want the USB port?</p>

<figure>
<a href="/images/sculpt/dimensions.jpeg"><img src="/images/sculpt/dimensions.jpeg" alt="Measuring dimensions and mounting holes" /></a>
<figcaption>Measuring dimensions and mounting holes</figcaption>
</figure>

<p>Also, you have to pick through-hole sizes and trace widths. Jon had me use .250 mm for the narrow traces and .500 mm for the wider ones, presumably from experience. I used the narrow traces for signalling and wide traces for power, though I’ve since heard it’s a good idea to use narrow traces between filter capacitors and VBUS.</p>

<figure>
<a href="/images/sculpt/pcb-layout.svg"><img src="/images/sculpt/pcb-layout.svg" alt="Schematic" /></a>
<figcaption>PCB layout in KiCad</figcaption>
</figure>

<p>Of course, there’s some iteration between the schematic and the PCB. After physically placing the ribbon cable connector and MCU, the traces all crossed over each other, so I had to reassign all the pins so it made sense physically.</p>

<p>There are also physical constraints about how USB data lines are run, and how the electrostatic protection chip wants to be placed for the most protection.</p>

<p>So, as simple as this board is, I spent a fair amount of time getting all of that right.</p>

<p>I found myself getting lost in the abstractness of holes and traces and footprints, so it was helpful to ground myself by occasionally loading the PCB in KiCad’s 3D viewer.</p>

<figure>
<a href="/images/sculpt/3d-view.png"><img src="/images/sculpt/3d-view.png" alt="Schematic" /></a>
<figcaption>3D View</figcaption>
</figure>

<h2 id="designing-the-final-board---manufacturing-and-testing-physical-fit">Designing the Final Board - Manufacturing and Testing Physical Fit</h2>

<p>I tried to find a low-cost prototyping service in the USA, but it looks like China is still the best option if you want a PCB manufactured <em>and</em> assembled for an amount I’m willing to spend on a keyboard.</p>

<p>I saw PCBWay recommended somewhere, and it seemed like a fine choice.  Their site has tutorials that walk you through submitting your Gerber files in a way they can process.</p>

<p>Before buying any components or doing assembly, I figured it would be smart to do a test order, just to physically look at the board and make sure it fit.</p>

<p>Good thing, because it didn’t! The mounting holes were about half a millimeter off, and the clearance was tight enough that half a millimeter mattered.</p>

<figure>
<a href="/images/sculpt/first-board.jpeg"><img src="/images/sculpt/first-board.jpeg" alt="First board!" /></a>
<figcaption>First board!</figcaption>
</figure>

<p>I couldn’t stop playing with it! It’s so magical to have the lines drawn in software turned into physical fiberglass and copper.</p>

<h2 id="designing-the-final-board---assembly">Designing the Final Board - Assembly</h2>

<p>After making a couple adjustments and updating the version number and date on the silkscreen, I sent another order to PCBWay, this time requesting assembly service.</p>

<p>Overall, I was impressed with their communication. They couldn’t get the specific LED I’d listed in my BOM, and checked with me that a substitution was okay.</p>

<p>Then, after all the parts were sourced, they asked me to clarify the polarity of the main tantalum bypass capacitor, since I’d forgotten to indicate anything on the silkscreen.</p>

<p>Finally, before shipping the assembled board, they sent me high-resolution photos of each side and asked me to confirm orientations and assembly.</p>

<figure>
<a href="/images/sculpt/pcbway-top.jpeg"><img src="/images/sculpt/pcbway-top.jpeg" alt="Top of assembled board" /></a>
<figcaption>Top of assembled board</figcaption>
</figure>

<figure>
<a href="/images/sculpt/pcbway-bottom.jpeg"><img src="/images/sculpt/pcbway-bottom.jpeg" alt="Bottom of assembled board" /></a>
<figcaption>Bottom of assembled board</figcaption>
</figure>

<p>It all looked correct to me, though I later noticed that one of the traces is lifted. (There is still connectivity, and it’s not a huge deal, as that trace is only connected to an LED that I haven’t gotten to work anyway.)</p>

<p>It took about a month for the assembled board to arrive. I checked the assembly status every day. Maybe next time I’ll expedite. :)</p>

<p>Overall, I was pretty happy:</p>

<ul>
  <li>My first test order, the minimum, was cheap and came with a cute battery-powered LED Christmas tree ornament.</li>
  <li>They made my board even though it was technically smaller than their minimum size.</li>
  <li>They took care of setting up the alignment holes for the pick-and-place machine, and sent me individual boards. I didn’t have to do any panelization.</li>
  <li>Shipping from China seemed unreasonably fast, but I suppose that’s how things work these days.</li>
</ul>

<h2 id="electrical-testing">Electrical Testing</h2>

<figure>
<a href="/images/sculpt/version-2-fit.jpeg"><img src="/images/sculpt/version-2-fit.jpeg" alt="The second revision fit in the case!" /></a>
<figcaption>The second revision fit in the case!</figcaption>
</figure>

<p>Before powering anything, I carefully did an electrical connectivity test of the main power circuits. Wanted to make sure the first power-on wasn’t going to result in a puff of blue smoke.</p>

<p>I briefly panicked, thinking everything was installed backwards, until I discovered that my crappy little multimeter, in continuity mode, runs current from COM to positive. I kept thinking there was a short somewhere on the board, and that I’d have to disassemble it to debug! In reality, the meter was showing the ESD protection circuitry correctly shunting current from GND to VBUS.</p>

<p>When I realized this and reversed the leads, everything was correct. (And I bought a nicer multimeter which doesn’t have this problem.)</p>

<p>There was an electrical issue, however! Most of the pins on the ribbon cable connector weren’t soldered down to the board. I don’t know if this is a solder mask issue with the footprint in KiCad or if the board wasn’t flat enough for the paste on each pad to connect upwards to the leg.</p>

<p>I was afraid of forming bridges between the 1 mm pitch pins, so I coated the entire area in flux and very carefully swiped solder upwards from the pad. It took three passes before I was able to measure reliable connectivity between each ribbon pin and the corresponding microcontroller leg.</p>

<figure>
<a href="/images/sculpt/resoldered-fpc-connector-legs.jpeg"><img src="/images/sculpt/resoldered-fpc-connector-legs.jpeg" alt="Resoldered FPC connector legs" /></a>
<figcaption>Resoldered FPC connector legs</figcaption>
</figure>

<p>I see why people use microscopes for this stuff.</p>

<h2 id="fuses-and-firmware">Fuses and Firmware</h2>

<p>Now that everything seemed electrically correct, it was time to plug it in. Success! The factory-supplied DFU bootloader device showed up.</p>

<figure>
<a href="/images/sculpt/dfu-device.jpeg"><img src="/images/sculpt/dfu-device.jpeg" alt="Linux recognized the DFU bootloader device!" /></a>
<figcaption>Linux recognized the DFU bootloader device!</figcaption>
</figure>

<p>With <code class="language-plaintext highlighter-rouge">dfu-programmer</code>, I uploaded a tiny C program that simply blinked the test LED pin at 1 Hz. First weirdness: the clock speed seemed to be incorrect. After some careful datasheet reading, long story short: the CKDIV8 fuse bit comes programmed from the factory, which divides your clock speed by 8. So the crystal was 16 MHz, but the MCU was dividing that down to 2 MHz. I had expected it to use the internal RC oscillator by default, which would have resulted in a 1 MHz clock.</p>

<p>You can change the fuse bits with an in-circuit programmer device (not USB!), but that has the side effect of erasing the convenient factory-supplied USB bootloader, which I’d prefer to leave alone if possible. (There’s a LUFA bootloader you can upload, but since all of this was new, baby steps felt good.)</p>

<p>Fortunately, for this device, none of the above actually matters! It turns out I can get away without programming any fuse bits. CKDIV8 merely sets the power-on default for the clock prescaler, and you can change it in software at the start of your program:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>clock_prescale_set(clock_div_1);
</code></pre></div></div>
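<p>(<code>clock_prescale_set</code> comes from avr-libc’s <code>&lt;avr/power.h&gt;</code>; it performs the timed CLKPR write sequence for you.) As a sanity check of the arithmetic above, modeled in plain host-runnable C rather than AVR code:</p>

```c
#include <stdint.h>

/* Effective CPU frequency = crystal / prescaler. The factory-programmed
 * CKDIV8 fuse makes the power-on prescaler 8; setting the prescaler back
 * to 1 in software restores the full crystal speed. */
static uint32_t effective_hz(uint32_t crystal_hz, uint32_t prescaler) {
    return crystal_hz / prescaler;
}
```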

<p>The result of all of this is that the six AVR ISP pins on the board are only necessary in emergencies. (Good thing, because I borrowed two of the pins later.) From the factory, it can be flashed with firmware and function as designed.</p>

<h2 id="qmk">QMK</h2>

<p>After getting the clock speed issues sorted, I flashed QMK — thanks again to Michael Fincham for mapping the layout — and it worked!</p>

<p>The Sculpt treats left and right spacebar as independent keys. Michael took advantage of that and mapped right spacebar to enter. Turns out I couldn’t live with that, so I mapped it back to space.</p>

<p>Now that it’s not necessary for the battery indicator, I repurposed the keyboard’s red LED for Caps Lock.</p>

<p>I’d like to use the green LED too, but I discovered it has reversed polarity, and there’s no easy way to drive it with the current circuit.</p>

<figure>
<a href="/images/sculpt/caps-lock.jpeg"><img src="/images/sculpt/caps-lock.jpeg" alt="Finally, the Sculpt has a Caps Lock indicator" /></a>
<figcaption>Finally, the Sculpt has a Caps Lock indicator</figcaption>
</figure>

<h2 id="case-fitting-and-reassembly">Case Fitting and Reassembly</h2>

<p>Dremel.</p>

<figure>
<a href="/images/sculpt/dremel.jpeg"><img src="/images/sculpt/dremel.jpeg" alt="Cut cut!" /></a>
<figcaption>Cut cut!</figcaption>
</figure>

<p>The only complication here was realizing it would be super convenient to launch the bootloader without disassembling the keyboard, so I soldered RST and GND from the AVR ISP pins to a button and hot-glued that into the battery compartment. (HWB is pulled low on the board, so all external resets enter the bootloader.)</p>

<p>To allow future disassembly, I cut up a PC fan extension cable and repurposed the connectors.</p>

<figure>
<a href="/images/sculpt/borrowing-rst-gnd.jpeg"><img src="/images/sculpt/borrowing-rst-gnd.jpeg" alt="Borrowing RST and GND pins" /></a>
<figcaption>Borrowing RST and GND pins</figcaption>
</figure>

<figure>
<a href="/images/sculpt/external-reset-button.jpeg"><img src="/images/sculpt/external-reset-button.jpeg" alt="External reset button. I almost forgot the spike-limiting resistor!" /></a>
<figcaption>External reset button. I almost forgot the spike-limiting resistor!</figcaption>
</figure>

<figure>
<a href="/images/sculpt/underside.jpeg"><img src="/images/sculpt/underside.jpeg" alt="All packed up!" /></a>
<figcaption>All packed up!</figcaption>
</figure>

<h2 id="latency">Latency</h2>

<p>I don’t have enough words to convey how happy this modded keyboard makes me.</p>

<p>After years of thinking I was just getting old and losing my dexterity, my computer feels solid again. It’s like a bunch of resistance disappeared. Gaming is easier. Typing is easier. Latency is definitely better, and perhaps more importantly, more consistent.</p>

<p>I fired up <a href="https://isitsnappy.com/">Is It Snappy?</a> and measured, on my PC, a total keyboard-to-screen latency reduction from 78 ms to 65 ms. 13 milliseconds better!</p>

<p>I’ll have to test it on my new work laptop, an MSI GS65 Stealth, which measures keypress-to-pixels latency under 30 ms (!).</p>

<p>This project was worth every hour it took.</p>

<p>And during my latency testing, the wireless keyboard repeatedly dropped keys, as if to validate all of my complaints in a final hurrah.</p>

<h2 id="power">Power</h2>

<p>While waiting for the assembled PCB to arrive from China, I modded my Wii sensor bar to take 100 mA from the TV USB and bump it up to the 7.5V required to light its infrared LEDs. I was worried about excessive current draw and potentially damaging the TV’s USB ports, so I picked up a USB meter.</p>

<p>This keyboard draws about 60 mA (roughly 0.3 W at 5 V), which isn’t bad, but it feels possible to do better.</p>

<figure>
<a href="/images/sculpt/usb-meter.jpeg"><img src="/images/sculpt/usb-meter.jpeg" alt="USB power draw" /></a>
<figcaption>USB power draw</figcaption>
</figure>

<p>The original wireless transceiver draws 20 mA in use and under 100 µA when idle. So I might play around with clocking down to 8 MHz and seeing what subsystems on the microcontroller can be turned off.</p>

<p>With a switching regulator, I could even drop the MCU voltage to 3.3 V. And as awful as the wireless Sculpt’s sleep behavior was, there’s perhaps opportunity to improve there.</p>

<p>I probably won’t push too hard. I almost never use a wired keyboard on a phone or laptop where it might make a small difference.</p>

<h2 id="next-steps">Next Steps</h2>

<p>Besides reducing power, there are a few improvements I’d like to make:</p>

<ul>
  <li>The Fn switch (between function keys and volume/brightness/etc.) isn’t functional. I traced the membrane and discovered the switch controls whether pin 1 is pulled down through 47 kΩ or through half a megohm. So I should be able to detect the switch’s state by exploiting the membrane’s parasitic capacitance: drive the pin high, then measure how long it takes to fall low.</li>
  <li>The green LED has reversed polarity from the red! To drive them both at once, I’ll have to set the ground pin at maybe half VCC and treat red active-high and green active-low. That might complicate the Fn switch, since it’s pulled towards this same “ground” voltage. I haven’t figured out what the Microsoft circuit does.</li>
  <li>Next time, I’ll put all LEDs on PWM pins. They’re a bit too bright, and breathing would be fun.</li>
  <li>I’d like a better-fitting ribbon cable connector. One that closes more nicely onto the ribbon. And while I appreciate being forced to learn the magic of flux, it would be nice if it came correctly soldered to the pads.</li>
  <li>Given tantalum is a scarce, conflict-prone metal, it might be nice to replace the tantalum capacitor with a ceramic and maybe a 4 Ω resistor in series. I’d love any input here.</li>
  <li>I’ve always kind of wanted a USB 2 hub like old keyboards used to have? Hub controllers are <a href="https://www.retrocution.com/2020/01/15/easy-diy-tiny-usb-hub-for-raspberry-pi-projects/">only a couple bucks</a>…</li>
  <li>Pretty up the traces on the board. :) They should look as <a href="https://www.youtube.com/watch?v=DAlKDkflkZo">clean as Nilaus makes his Factorio belts</a>.</li>
</ul>
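<p>For the Fn-switch idea in the list above, a quick feasibility check. Assuming something on the order of 100 pF of parasitic membrane capacitance (a guess on my part), the time for the pin to decay to half of VCC differs by roughly 10x between the two pull-down values, which a timer can easily distinguish:</p>

```c
/* Half-decay time of an RC node: t = ln(2) * R * C ~= 0.693 * R * C.
 * The 47 kOhm / 500 kOhm values come from tracing the membrane; the
 * 100 pF capacitance is a guess for illustration. */
static double half_decay_time_s(double r_ohms, double c_farads) {
    return 0.693 * r_ohms * c_farads;
}
```

<p>With 100 pF, 47 kΩ gives about 3.3 µs and half a megohm about 35 µs, so even a coarse timer should tell them apart.</p>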

<h2 id="closing-thoughts">Closing Thoughts</h2>

<p>This was a ton of fun. Electronics is so hands-on and as someone who never held interest in Maker Faire or any of that community, <em>I get it now</em>. Connecting software to the real world feels so empowering.</p>

<p>What exploded my mind the most was how accessible hardware design is these days. Download the open source KiCad, pick some inexpensive parts on Digi-Key, send it off to a manufacturer, and for not that many dollars, you have a functioning board!</p>

<p>Now I stare at duct vents and wonder if I could just hook up a microcontroller and a servo to reduce home energy usage.</p>]]></content><author><name></name></author><category term="keyboard" /><category term="electronics" /><category term="hackernews" /><summary type="html"><![CDATA[I made a control board for the Microsoft Sculpt wireless keyboard that converts it to wired USB, and now my favorite keyboard is even better.]]></summary></entry><entry><title type="html">Measuring and Coping with Indoor CO2</title><link href="https://chadaustin.me/2020/09/measuring-indoor-co2/" rel="alternate" type="text/html" title="Measuring and Coping with Indoor CO2" /><published>2020-09-28T00:00:00-05:00</published><updated>2020-09-28T00:00:00-05:00</updated><id>https://chadaustin.me/2020/09/measuring-indoor-co2</id><content type="html" xml:base="https://chadaustin.me/2020/09/measuring-indoor-co2/"><![CDATA[<p>I have two home offices. The one in the garage has lots of natural light, is separate from the family, but is only usable when the weather is sufficiently cool. The one in the house is in a tiny 100 sq. ft. bedroom. During this particularly hot summer, I mostly stayed indoors, and I noticed I’d get dizzy and sometimes nauseous in the afternoons.</p>

<p>It’s even worse on days that my daughter has to use the office for school in the morning. So I bought a CO2 monitor.</p>

<figure>
<img src="/images/co2_monitor.jpeg" alt="Autopilot CO2 Monitor" />
<figcaption>AutoPilot APCEMDL Desktop CO2</figcaption>
</figure>

<p>I bought that model because it writes values to a CSV file on a microSD card, so I could graph them if I wanted.</p>

<p>Typical outdoor CO2 levels are around 415 ppm. The science is fuzzy, but it seems like CO2 levels above 1000 ppm start to negatively impact thinking, concentration, and comfort. I know that, on days where I can’t open the window because it’s too hot or smoky outside, my thinking gets super fuzzy to the point that I’m unable to work.</p>

<p>Now that I can quantify the CO2 level in the room, with a single person and maybe the door cracked, the CO2 level will climb to about 1200 ppm.</p>

<figure>
<img src="/images/co2_1100.jpeg" alt="CO2 1100 ppm" />
<figcaption>CO2 1100 ppm</figcaption>
</figure>

<p>If I’m giving a presentation or conducting an interview, it might even reach north of 2000.</p>

<figure>
<img src="/images/co2_1750.jpeg" alt="CO2 1750 ppm" />
<figcaption>CO2 1750 ppm</figcaption>
</figure>

<p>If it got too high, no matter the conditions outside, I’d crack a window in the room I was in. Even a small crack was enough to help some. Does gas diffuse that quickly?</p>

<p>I did make a trip to a nursery to purchase some indoor golden pothos plants, but it’s not clear if they help in a material way. I don’t think I have controlled enough conditions to measure the effect of two small plants. (I will say that the goal is not for the plant to produce oxygen, as humans consume far more oxygen than a small plant can produce, but instead to absorb CO2.)</p>

<p>Even the entire house might have elevated CO2 levels. With the doors and windows shut most of the day (wildfire season), and everyone home (pandemic), the CO2 level in the house stayed around 1500 for weeks at a time. And my children were randomly vomiting in the mornings, with no other symptoms. Related? Caused by smoke seeping in? Something else? Not sure.</p>

<p>It climbs high in a bedroom during the night, too.</p>

<p>But here’s a surprising case: every time we’d use our gas stove, oven, or dryer, the CO2 level in the house would spike above 2000 ppm. This makes sense, since the outputs of burning methane are water vapor and CO2. During bad air quality days, when we had to keep the windows shut, we tried to only use electric cooking appliances: the rice cooker, bread maker, toaster oven, and crock pot.</p>

<p>When we bought the house, natural gas was the cheap, clean source of fuel. Now, I’m wishing we’d gone electric for at least the dryer and water heater, especially when we install solar panels.</p>

<p>Someone recently asked me “Hey, how is your CO2 problem?” I laughed a bit, because it’s not <em>my</em> CO2 problem – it’s all of ours. Especially in modern, insulated and well-sealed homes. I’m just measuring it. And it’s only going to get worse with climate change. In the past 60 years, humans have increased atmospheric CO2 levels from 315 ppm to 415 ppm, and it’s expected to reach 700-900 ppm in my children’s lifetimes. That means indoor CO2 levels will stay above levels that affect human cognition. (Climate change depresses the shit out of me, and this is yet another reason why.)</p>

<p>There are cities that are banning new construction of residential natural gas lines. When I first heard of that, it seemed crazy, but it doesn’t anymore. I will miss cooking eggs on a gas stove, but maybe that can be solved with a single burner fed from a tank of biogas, or something.</p>

<p>We moved some of our long-term storage into the house, some into the attic, and some into the garage. But I’ve been concerned about temperature and humidity swings, especially during the rainy season and when it gets extremely hot in late summer.</p>

<p>I bought a steel storage closet from <a href="https://www.uline.com">Uline</a>, a Raspberry Pi Zero, and an <a href="https://www.adafruit.com/product/393">AM2302</a> temperature-humidity sensor. The AM2302 is just a DHT22 with the pull-up resistor built-in, so installing it is as simple as soldering three wires onto the Raspberry Pi.</p>

<h2 id="reading-the-sensor">Reading the Sensor</h2>

<p>Then the question became how to read it from software. Standard tutorials suggest using <a href="https://github.com/adafruit/Adafruit_Python_DHT">Adafruit’s Python DHT module</a>. It works, but reading the sensor every couple of seconds consumes nearly the entire single core of the Raspberry Pi Zero. That’s because it uses Linux’s memory-mapped GPIO interface to <a href="https://github.com/adafruit/Adafruit_Python_DHT/blob/8f5e2c4d6ebba8836f6d31ec9a0c171948e3237d/source/Raspberry_Pi_2/pi_2_dht_read.c#L38">communicate with the sensor</a>. It sets scheduling priority to realtime, bit-bangs the GPIO pin, and then busy-loops to poll for signal edges.</p>

<p>(The DHT22 uses a bespoke one-pin signaling protocol where bits are distinguished by whether voltage is held high for shorter than or longer than about 40 microseconds. The idea is that you measure the time between all 80-some edges and compute bits.)</p>
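<p>Decoding that protocol is straightforward once you have the pulse widths. A sketch of the frame decoding, per my reading of the DHT22 datasheet (illustrative, not the actual driver code):</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode 40 DHT22 data bits from high-pulse durations in microseconds:
 * a pulse longer than ~40 us is a 1, shorter is a 0. The frame is
 * 16 bits of humidity x10, 16 bits of temperature x10 (top bit = sign),
 * and an 8-bit checksum (low byte of the sum of the first four bytes). */
bool dht22_decode(const uint16_t high_us[40],
                  int16_t *temp_x10, uint16_t *hum_x10) {
    uint8_t bytes[5] = {0};
    for (int i = 0; i < 40; i++) {
        bytes[i / 8] <<= 1;
        if (high_us[i] > 40)
            bytes[i / 8] |= 1;
    }
    if (((bytes[0] + bytes[1] + bytes[2] + bytes[3]) & 0xFF) != bytes[4])
        return false; /* corrupted read */
    *hum_x10 = (uint16_t)(bytes[0] << 8 | bytes[1]);
    *temp_x10 = (int16_t)((bytes[2] & 0x7F) << 8 | bytes[3]);
    if (bytes[2] & 0x80)
        *temp_x10 = -*temp_x10;
    return true;
}
```

<p>The hard part, as described above, is reliably capturing those 40 pulse widths in the first place.</p>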

<p>I had hoped to use the CPU for things in addition to busy-polling, so I wondered if there was a more efficient way to read the sensor. Well, it turns out the kernel <a href="https://citizen-stig.github.io/2020/05/17/reading-temperature-sensor-in-rust-using-raspberry-pi-gpio.html">now supports a character device interface</a>, driven by ioctls, that delivers edge transitions as a stream of events that can be read. Sounds perfect! Unfortunately, even with a minimal C program, transitions were being missed, and I couldn’t reliably parse a result packet. My guess is the kernel isn’t polling the pin signal frequently enough, so transitions are dropped.</p>

<p>Polling the pin in userspace is too expensive, and the gpio character device interface didn’t work, but fortunately recent kernels include an IIO (Industrial I/O) driver that <a href="https://github.com/torvalds/linux/blob/a1b8638ba1320e6684aa98233c15255eb803fac7/drivers/iio/humidity/dht11.c">speaks the DHT11/DHT22 protocol</a>. The driver registers an interrupt timer to reliably poll the signal at a high enough frequency to read every transition, and exposes the results as files in sysfs. Reading values with the dht11 driver on the Raspberry Pi means the CPU stays almost entirely idle, and I don’t have to worry about scheduling other processes.</p>
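<p>With the IIO driver loaded, userspace just reads small integer files. Something like this (the exact sysfs path varies by board and overlay; I’m assuming the usual IIO milli-unit convention, e.g. millidegrees Celsius for temperature):</p>

```c
#include <stdio.h>

/* Read one integer from a sysfs attribute, e.g.
 * /sys/bus/iio/devices/iio:device0/in_temp_input (path is an assumption).
 * Returns 0 on success, -1 on failure. */
int read_sysfs_long(const char *path, long *value) {
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

<p>A reading of 23100 would mean 23.1 °C; dividing by 1000 is the caller’s job.</p>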

<h2 id="recording-and-graphing-values">Recording and Graphing Values</h2>

<p>The popular time series database and graphing solution seems to be InfluxDB and Grafana these days, but after going through the setup tutorial for those, I decided I didn’t want to deal with containers and security updates and running complicated software on my home network. Given my very limited free time these days, I optimize for systems with extremely low maintenance costs, even if they require more up-front work. (This happens to be why I replaced WordPress with something based on Jekyll, too.)</p>

<p>Thus, I wrote a tiny HTTP server with <a href="https://docs.rs/hyper/0.13.8/hyper/">Hyper</a> that simply writes recorded values and their timestamps to CSV files.</p>
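<p>The recording side really is that simple. The write path amounts to something like this (sketched in C for illustration; the column layout and filename are made up, not necessarily what my server writes):</p>

```c
#include <stdio.h>
#include <time.h>

/* Append one "unix_timestamp,temperature_c,humidity_pct" row to a CSV
 * log. Append-only writes keep the log trivially greppable and easy to
 * import into InfluxDB later. */
int append_reading(const char *path, double temp_c, double hum_pct) {
    FILE *f = fopen(path, "a");
    if (!f)
        return -1;
    fprintf(f, "%lld,%.1f,%.1f\n", (long long)time(NULL), temp_c, hum_pct);
    return fclose(f); /* 0 on success */
}
```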

<p>Then, a separate program invokes gnuplot to output PNG graphs into a dedicated folder on my NAS.</p>

<figure>
<img src="/images/temphumgraph.png" alt="Temperature and Humidity" />
<figcaption>Example temperature and humidity graph</figcaption>
</figure>

<p>Simple, solves for my needs, and gives me the option to import data from CSV into InfluxDB and Grafana later, if desired.</p>

<h2 id="next-steps">Next Steps</h2>

<p>Next, more sensors! Besides temperature and humidity from various parts of the house, recording indoor carbon dioxide would be useful.</p>

<p>I also need to automate the creation of SD card images. So far, I’ve manually assigned IP addresses and configured systemd units on the NAS and each device, but that’s getting unwieldy, especially since SD cards are likely the first thing to fail on a Pi.</p>

<p>Now that I have reliable data, I’ll probably experiment with placing insulation between the storage unit and the wall, and placing some closed bags of charcoal inside the (enclosed) unit to see if it reduces humidity movement through the day.</p>]]></content><author><name></name></author><category term="family" /><category term="electronics" /><summary type="html"><![CDATA[We live in Silicon Valley, which means our house is too expensive and too small. So, unlike my parents’ old house in the midwest, we don’t have the luxury of a basement with consistent year-round temperature and humidity for long-term storage.]]></summary></entry><entry><title type="html">Two Years at Dropbox</title><link href="https://chadaustin.me/2019/11/two-years-at-dropbox/" rel="alternate" type="text/html" title="Two Years at Dropbox" /><published>2019-11-30T00:00:00-06:00</published><updated>2019-11-30T00:00:00-06:00</updated><id>https://chadaustin.me/2019/11/two-years-at-dropbox</id><content type="html" xml:base="https://chadaustin.me/2019/11/two-years-at-dropbox/"><![CDATA[<h2 id="disclaimer">Disclaimer</h2>

<p>This post is a collection of stories from my time at Dropbox. Inevitably, someone will read too much into it and come away with some overgeneralized lesson, but keep in mind that I was only there for two pre-IPO years and only exposed to a couple of specific corners of the company.</p>

<p>I certainly don’t regret my time there - my coworkers were amazing and I learned a lot about myself. In fact, this post says more about me than it does about the company.</p>

<p>And <a href="https://www.inc.com/business-insider/tech-companies-employee-turnover-average-tenure-silicon-valley.html">two years is the average Silicon Valley tenure</a>, right?</p>

<h2 id="the-interview">The Interview</h2>

<p>I’d been at <a href="/tag/imvu">IMVU</a> almost ten years (an eternity by Silicon Valley standards!) and realized a year would disappear without me noticing. It was time for something new. I knew someone at Dropbox, and I’d met some more smart people at <a href="https://cppcon2014.sched.com/event/1yYeaTK">CppCon 2014</a>, so when they reached out to see if it made sense for me to join, <a href="http://randsinrepose.com/archives/shields-down/">I didn’t say no</a>.</p>

<p>The interview process was disorganized. After dinner introductions and recruiter phone calls, I was first invited up to San Francisco to meet with a handful of product division leads. Next, they scheduled a proper technical interview. After passing what ended up being a short half-day round of trivial whiteboard problems, I was annoyed to find that I’d have to take another day off and come back for a second round. After passing those, they invited me again to meet with more product leads. By this point I’d met with something like 14 people. Finally, in frustration, I asked “What are we doing? It costs me real money to take days off, so do you want to offer me a job or not?” They did.</p>

<p>At the time, it seemed like a good offer – relative to IMVU, a doubling in total compensation!  In hindsight, though, I should have negotiated higher. More on that later.</p>

<p>My excitement started to build. IMVU was a great engineering organization, but the product direction was weak and aimless. And from the interview alone, I could tell Dropbox had an amazing product culture; I had to see how it worked from the inside. The fact that the Dropbox client was 10 years old and remained a simple and refined UX said a lot – any other company would have peppered the product with random features. As a random example of what I mean, Excel currently lists “Bing Maps” in the ribbon before “Recommended Charts”, letting its internal turf wars trump the user experience.</p>

<p>Also, during the interview process, I got a sneak peek of what’s now called Dropbox Paper. Within minutes, I understood the product, what problem it solved, and why people would want it. Not only that, but it was clean and delightful in a way that other online collaboration tools weren’t. It confirmed that Dropbox’s product culture had magic.</p>

<h2 id="yellow-flags">Yellow Flags</h2>

<p>Between accepting and starting, three people independently reached out and said “Don’t join Dropbox. You’re making the wrong decision.” One person said the culture wasn’t good – politics and currying favor with an old boys’ network. Another said the commute to San Francisco would do me in, especially since I was having my third child. A third was concerned that I was joining a company too late in its life. Dropbox had already grown substantially and the last round of investment valued the company at $10B. I should buy low instead of high. I decided to join anyway.</p>

<p>Regarding culture: I’m pleased to say that, while it may have been the case that Dropbox was a frat house in its early years, they had intentionally and decisively solved that problem by the time I joined. Like Facebook, they implemented something like the <a href="https://en.wikipedia.org/wiki/Rooney_Rule">Rooney Rule</a> in the hiring process. In addition, everyone was encouraged (required?) to take unconscious bias training. It was surprisingly valuable, and it made me notice the dozens of ways that an interview can be biased without the interviewer realizing it. I felt that Dropbox did a good job of actively striving to make the workplace and hiring process as inclusive and bias-free as you can reasonably hope for.</p>

<p>The last point, that Dropbox was valued too highly, was probably correct. Shortly after I joined, major investors wrote the stock down about 40%. Oof. (I had factored a possible 50% write-down into the evaluation of my offer, but I realize now, given the illiquidity, I should have pushed the equity component of my offer much higher. Again, more on compensation later.)</p>

<h2 id="onboarding">Onboarding</h2>

<p>After IMVU, I was so excited to jump into something new and get the clock rolling on the vesting schedule, I took no time off between IMVU and Dropbox. For anyone reading this: please don’t. Take time off, if only to reset your mind.</p>

<p>Either way, the recruiter had said my chosen start date would be okay. That was a lie - in reality, Dropbox only onboarded new employees every other Tuesday. This left me without health insurance coverage for a week between jobs. Fortunately, none of the kids got hurt that week. Always quit your previous job at the beginning of the month so you are covered until the new job starts.</p>

<p>The initial few days of hardware, account setup, and security training went smoothly. Given that IMVU did nothing to onboard people into the culture, this was a big improvement. (But it paled in comparison to what I’d later see in Facebook’s onboarding process.)</p>

<h2 id="my-first-team-sonoma">My First Team: Sonoma</h2>

<p>During the interview, the pitch was that I could slot in as the tech lead for the Paper backend team. Its current tech lead was leaving Dropbox. Not everyone on the team knew that yet, and I didn’t know it was secret, so there was an awkward moment when I said to the current lead in a group interview “So where are you headed next?” and he said “I don’t know what you’re talking about.” Oops.</p>

<p>Before I joined, however, the Paper backend team moved to NYC, so I was instead assigned to a product prototyping team. The team’s average age was in the low 20s and my grandboss wanted someone with experience on how projects play out.</p>

<p>The prototype’s current iteration was a mess. It was built atop other, failed prototypes, which in turn were built on real shipping features, so you never quite knew which code was alive or dead.</p>

<p>And Dropbox’s deployment cadence was daily, except that half the time the daily push would fail, meaning we weren’t able to test hypotheses very rapidly. Relative to IMVU’s continuous deployment, this was jarring, but it’s also just not a good way to develop new products.</p>

<h2 id="iteration-speed">Iteration Speed</h2>

<p>Coming from IMVU, my expectations around developer velocity were extremely high. If a unit test took a second to run, that was considered a <a href="http://gamesfromwithin.com/whats-your-pain-threshold">failure on our part</a>. (It doesn’t mean all of our tests took under a second, but we pushed hard.) We also deployed to production with <a href="http://timothyfitz.com/2009/02/10/continuous-deployment-at-imvu-doing-the-impossible-fifty-times-a-day/">every commit to master</a>.</p>

<p>So I was shocked to find that running an <strong>empty</strong> Python unit test at Dropbox took tens of seconds. Worse, when you fixed a bug and landed it, depending on how the daily push went, it might make it to production within a day? Two? Maybe more? Compared to IMVU, this workflow was unacceptable, especially for a prototyping team that was trying to find product market fit as fast as possible. One day, after struggling to get the simplest diff landed, the frustration overflowed; I stood up and exploded “HOW THE FUCK DOES ANYONE GET ANYTHING DONE AROUND HERE?!?”</p>

<p>The iteration speeds were bad at every scale. Unit tests were slow; uploads to the code review tool were sluggish; the reviewer might look at your diff after two or three days; and finally the aforementioned deployment issues. The net result is that simple changes took days, so you had to pipeline a lot of work in parallel, resulting in constant context switching. (This was especially painful for someone like me with high context switch overhead. I prefer to dive deep on a problem, move fast, and then come up for air.)</p>

<p>One thing I don’t understand is why these iteration speeds were tolerated. Is it because the company had a huge number of new graduates who didn’t know better?  Maybe the average Dropbox engineer can handle context switches a lot better than I can? Perhaps the situation was better outside of core product engineering?  Maybe Dropbox grew so fast that the situation regressed faster than people could fix it? All of the above?</p>

<h2 id="greenfield">Greenfield</h2>

<p>Shortly after I joined, the Sonoma team failed. We cancelled the prototype and half the team quit. (Retention was a general issue - I’ll talk more about that later.)</p>

<p>However, executives decided the initiative was still valuable and needed a fresh look. We rebooted the team with a few people from the old team, some internal hires, <a href="https://techcrunch.com/2015/07/22/dropbox-acquires-clementine-an-enterprise-communication-service/">an acquisition</a>, and some interns to attack the same problem space.</p>

<p>To avoid our previous iteration issues, we decoupled from the main Dropbox stack and built our own.</p>

<p>The new stack was a TypeScript + React + Flux + Webpack frontend deployed directly onto S3 and a Go backend. None of us actually wanted to use Go, but it had momentum at Dropbox and many existing Dropbox systems already had Go APIs. Our iteration speed was great. We could deploy whenever we wanted and were only limited by our ability to think, as it should be.</p>

<p>This new team, by the way, was the most gelled team I’ve ever worked with. Not only was everyone productive and thoughtful, but our personalities meshed in a way that made coming into the office a pleasure. And talk about cross-functional! It was as if everyone on the team was multiclassing. Some of the engineers on the team would easily have qualified as product managers at IMVU, and our (excellent!) designers regularly wrote code. The level of empathy and thoughtfulness they had about the customer’s emotional state surpassed anything I’d seen.</p>

<p>I hope someday to work on a team like that again!</p>

<p>And, in a form only conceptually related to any code I wrote, <a href="https://www.fastcompany.com/90362055/how-dropbox-is-finally-breaking-free-of-the-folder">our project did eventually ship</a>!</p>

<h2 id="personal-life-fell-apart">Personal Life Fell Apart</h2>

<p>Shortly after our product team rebooted itself and was in its <a href="https://www.infoq.com/news/2008/09/sprint_zero/">Sprint Zero</a>, my personal life began to unravel.</p>

<p>When I joined Dropbox, I knew the 90-minute commute from the south bay to San Francisco would be painful. And my wife was pregnant with our third child, so I’d be taking paternity leave only months after starting. But I did not predict how awful that year would be.</p>

<p>Days before #3 was born, my beloved grandfather suddenly collapsed and died at 74 years old. One month later, my 54-year-old father was diagnosed with stage 4 lung cancer, and was (accurately) given nine months to live. That fall, my wife’s grandmother passed. I went from being devastated to numb.</p>

<p>It was easily the worst year of my life, but I could not have asked for Dropbox to provide better support. Even though I’d just joined the company, they let me take all the time I needed to get my personal life back in order.</p>

<p>Including paternity leave, I took months of paid time off. And when my father passed, my manager gave me all the time I needed to help my mother get her estate in order, and then gradually ease back into a work mindset.</p>

<p>I’ll forever be grateful for the support Dropbox provided in the worst year of my life.</p>

<h2 id="empathy">Empathy</h2>

<p>You’d think, coming from IMVU – a company built around avatars and self expression – that a B2B file syncing company would be dry and uninteresting.</p>

<p>But I was thrilled to discover Dropbox implemented all the practices I only dreamed of at IMVU. My team regularly flew around the country to meet with users, deeply understanding their workflows and bringing that knowledge back.</p>

<p>It’s a common simplification to say that Dropbox is a file-syncing business. But file syncing is a commodity. Dropbox’s value is broader - it’s more accurate to think of Dropbox as an organizing-and-sharing-your-stuff business. This mindset leads to features like the <a href="https://help.dropbox.com/files-folders/share/badge-collaborate">Dropbox Badge</a> and how taking screenshots automatically uploads and copies the URL to your clipboard, because most of the time screenshots are going to be shared.</p>

<h2 id="employee-retention">Employee Retention</h2>

<p>I’d heard about high rates of employee churn in mainstream Silicon Valley, but IMVU had unusually high retention, so I didn’t witness it firsthand until Dropbox.</p>

<p>Maybe it’s a San Francisco culture thing. Maybe the employee base is young and not tied down. Maybe I joined right as the valuation peaked and people wanted to cash out. In any case, employee turnover was high. It felt like half the people I met would quit a couple months later. I can understand - if you’re in your 20s and sitting on a few million bucks, why not just go live in a cabin on a lake or spend a year climbing mountains?</p>

<p>It’s hard for companies in Silicon Valley to keep employees - there are so many opportunities and salaries are so high that even new grads can work on almost any project they personally find fulfilling. I suspect the majority of engineers in SV could quit, walk down the street, and get a raise somewhere else with almost no effort.</p>

<p>That said, longevity is important. If your goal is to become a senior engineer capable of large-scale impact, you have to be able to see the results of decisions you made years prior. If you jump ship every 18 months, you won’t develop that skill.</p>

<p>I don’t have data on Dropbox’s retention numbers, but anecdotally it felt like they struggled to keep employees, especially senior engineers. I feel like they’d be well-served by doing a <a href="https://www.crayon.co/blog/how-to-do-win-loss-analysis-examples-resources">win-loss analysis</a> on every regrettable departure, and solving that, even if throwing money at the problem is a short-term fix.</p>

<p>No matter the reason, it’s a worrying sign when so many of the good people leave. Companies need to retain their senior talent.</p>

<h2 id="programming-languages">Programming Languages</h2>

<p>I firmly believe that programming languages matter. They shape your thoughts and strongly influence code’s correctness, runtime behavior, and even team dynamics.</p>

<p>I’ve always been a fan of Python – we used it heavily at IMVU – but I learned something surprising: what’s worse than IMVU’s two million lines of backend PHP? Two million lines of Python! PHP is a shit language, for sure, but Python has all of the same dynamic language problems, plus high startup costs, plus so much dynamicism and sugar that people feel compelled to build fancy abstractions like decorators, proxy objects, and other forms of cute magic. In PHP, for example, you’d never even imagine a framework inspecting a function’s arguments’ <em>names</em> to determine which values should be passed in. Yet that’s how our Python code at Dropbox worked. PHP’s expressive weakness leads to obvious, straight-line code, which is a strength at scale.</p>
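<p>To make that kind of magic concrete, here is a minimal sketch (all names hypothetical, not Dropbox’s actual framework) of a framework that inspects a handler’s parameter <em>names</em> to decide which values to pass in:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import inspect

# Hypothetical sketch: a tiny "framework" that inspects a handler's
# parameter *names* and injects matching values from a context dict.
def call_with_injection(handler, context):
    params = inspect.signature(handler).parameters
    kwargs = {name: context[name] for name in params if name in context}
    return handler(**kwargs)

def handler(user_id, locale):
    return f"{user_id}:{locale}"

context = {"user_id": 42, "locale": "en-US", "unused": object()}
print(call_with_injection(handler, context))  # prints 42:en-US
</code></pre></div></div>

<p>Clever, but the wiring is invisible at the call site: rename a parameter and the injection silently changes. Straight-line PHP never tempts you into this.</p>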

<p>There was a migration away from Python and towards Go on the backend while I was there. Go is a great language for writing focused network services, but I’m not sure replacing all the complicated business logic with Go would be successful. My experience on our little product team writing an application in Go is that it’s way too easy for someone to introduce a data race on a shared data structure or to implement, for example, the “<a href="https://telliott.io/2016/09/29/three-ish-ways-to-implement-timeouts-in-go.html">timeout idiom</a>” incorrectly, leaking goroutines. And no matter how many people say otherwise, the lack of generics really does hurt.</p>

<p>When I joined the company, the Dropbox.com frontend was written in CoffeeScript. I’ve already written <a href="/2015/05/coffeescript/">my feelings on that language</a>, and the Dropbox experience didn’t change them. Fortunately, while I was there, the web platform team managed a well-motivated, well-communicated, and well-executed transition to TypeScript. Major props to them - large-scale technology transitions are hard in the best of times, and they did a great job making TypeScript happen.</p>

<h2 id="compensation">Compensation</h2>

<p>I never felt underpaid at IMVU. I joined before the series A and was given a <em>very</em> fair percentage of the company, especially for a new graduate. Sure, my starting salary was pathetic, but it climbed rapidly once we took funding, and my compensation was at the top end of Glassdoor’s numbers for engineers. And relative to other startups or mid-sized private companies, I probably <em>wasn’t</em> underpaid.</p>

<p>But Dropbox competes with the <a href="https://en.wikipedia.org/wiki/Facebook,_Apple,_Amazon,_Netflix_and_Google">FAANGs</a> for talent and I had yet to realize <a href="https://twitter.com/danluu/status/942452212445405185">just how high top-of-market rate for senior engineers had climbed</a>. Also, <a href="https://www.levels.fyi/">levels.fyi</a> wasn’t a thing, and since I was poached rather than looking around, I failed to acquire competing offers. So I didn’t know my market worth.</p>

<p>Now, by any reasonable person’s standards, my Dropbox offer was good. They matched my IMVU salary and gave me an equivalent amount in RSU equity per year (for four years), plus another 20% of my salary as a signing bonus. That should be great, and I was happy with the offer.</p>

<p>But, in hindsight, I could have gotten the same offer from publicly traded FAANGs, and since Dropbox was private (and possibly overvalued), I should have fought for a 2x equity multiplier at least.</p>

<p>This would have forced the company to place me into a more impactful role and level, ultimately making my work more satisfying, and perhaps keeping me at the company longer. As it was, I was underleveled.</p>

<p>Always try to get competing offers. Had I done that, and had Dropbox still been the #1 choice, I might still be there, with both sides happier.</p>

<h2 id="credibility">Credibility</h2>

<p>When I joined IMVU, it was a small team of founders. Eric Ries said to me “For your first week here, you should fix one thing for each person.” This was great advice. Building rapport early is important.</p>

<p>I’d forgotten that advice by the time I joined Dropbox. I had been in a position of implicit credibility for so long that I assumed it would carry over. It was a splash of cold water to realize nobody cares what you’ve done. Nobody cares about what other people have thought of you. As a new hire, you’re an unknown like everyone else.</p>

<p>It’s common for people joining from other companies to talk about their experiences. “At Acme Corp, we did this.” But when nobody has any shared context with your time at Acme, that sentence conveys no meaning.</p>

<p>I spent too much time saying “At IMVU, we did X.” To me, IMVU was a great place that did a lot well. But talking about it wasn’t helpful. Eventually, I learned to rephrase. “I’ve noticed we have X problem. What do you think about trying Y solution?” Nobody cares how you learned the trick, but you’re a wizard if you perform it in front of them.</p>

<h2 id="security">Security</h2>

<p>Dropbox cares <em>a lot</em> about security. They’re fully aware that breaches destroy trust, and Dropbox greatly values its customers’ trust.</p>

<p>As a customer, I’m very happy to know a world-class security team is protecting my data. As a developer, it sometimes was a pain in the ass. :) Culturally, they rounded towards more secure by default, even if it negatively affected velocity. My team had to sign some Windows executables, and because we didn’t have an internal service that handled apps other than the main Dropbox app, I had to be escorted to the vault, where a signing key was briefly plugged into my laptop, and all uses were supervised.</p>

<p>And the developer laptops were sooo slow. I don’t know how you take a brand new MacBook Pro and turn it into something so sluggish. The week when the virus scanner was broken was glorious. I believe all activity on our machines was monitored too. “You should have zero expectation of privacy on your work hardware.”</p>

<p>Development gripes aside, as a user, I trust Dropbox to protect my data.</p>

<h2 id="diversity-and-interviewing">Diversity and Interviewing</h2>

<p>Dropbox paid a lot of attention to employee diversity. Everyone was required to take unconscious bias training, and the interview process aimed to limit the risk of race or gender or even cultural background clouding a hire decision. For example, it’s common for interviewers to factor in the presence of a GitHub profile as positive signal, but the Dropbox interviewing process cautioned against this, as it biases towards people of a particular background.</p>

<p>To that end, the interview process for engineers was mechanized. The primary input on your hiring decision was how well you could write correct, complicated, algorithms-and-data-structures code on a given set of whiteboard questions, where questions involving concurrency were considered especially valuable. This process, while attempting to be bias-free, had unintended effects. It resulted in a heavy bias towards new grads from high-end computer science programs, such as the Ivy Leagues.</p>

<p>And even with the briefest glance around the office, you could tell the employees weren’t a representative slice of society. The company was full of pretty people. I’m sure being headquartered in San Francisco and the relative youth of the employees had an effect. I’m at Facebook now, and it feels a lot more like a normal slice of society. Or at least of suburban Silicon Valley.</p>

<p>I referred two of IMVU’s best engineers – the kinds of people who have average CS backgrounds, but who have shipped <em>a ton</em> of high-value code and led major projects. One didn’t make it through the screen, and the recruiter wrote the note “Declined: we don’t have much signal about IMVU” and the other failed the interview because they didn’t use a heap to solve a certain problem. The moderator in the debrief told the (junior) interviewer “Now, now, a senior industry product candidate probably hasn’t used a heap in 10 years”, but the result remained the same.</p>

<p>I appreciate Dropbox’s attempts to create a bias-free interview process, but I worry that it values fresh CS knowledge over experience and get-it-done attitude.</p>

<p>By the way, when the FAANGs and Dropbox are offering compensation packages twice what startups can afford, this is where startups can compete for talent. There are many people who didn’t graduate college but are focused workers or have a knack for understanding users.</p>

<h2 id="creative-talent">Creative Talent</h2>

<p>The sheer density of creative talent at Dropbox was amazing. Designers and product managers could hack on the code. Product engineers had an amazing sense of empathy for the customer. There was art on the walls and various creative projects all around the office. It seemed like everyone I met had multiple talents.</p>

<p>Even the interns were amazing. These kids, barely old enough to drink, had a strong grasp on cryptography and networks and distributed systems and programming language theory, on top of all of the basic CS knowledge. Motivated individuals have access to so much more information than when I grew up, and I’m a bit jealous. :)</p>

<h2 id="hack-week">Hack Week</h2>

<p>IMVU began <a href="https://web.archive.org/web/20160916004543/https://engineering.imvu.com/2010/03/30/its-hack-week-at-imvu/">company-wide hack weeks in 2007</a>. Eric Prestemon, one of our tech leads, modeled the idea off of Yahoo’s hack days, but as far as I’ve heard, we might have been the first to make it a week-long quarterly event. (I look forward to hearing from you if your company also ran a hack week.) So when I joined Dropbox, the idea was quite familiar, and the benefits obvious.</p>

<p>The idea is that, on some regular cadence, you give everyone in the company an entire week to work on whatever they want. The normal backlog is paused, product managers have no direct influence, and shipping to 1% of customers is encouraged. It’s good for the business – risky product ideas can be prototyped, some of which become valued parts of the product. And it’s good for employees – everyone gets a chance to drive what they think is important and underserved. Hack weeks inject a dose of positivity into the work environment.</p>

<p>But I must say that <a href="https://www.theverge.com/2014/7/24/5930927/why-dropbox-gives-its-employees-a-week-to-do-whatever-they-want">Dropbox ran its hack week better than IMVU ever did</a>. IMVU’s hack week started off open-ended. As long as it was somewhat related to the business, employees could work on anything. But over time, the product managers put an increasing amount of pressure on people to work on <em>their</em> projects and deliver concrete value that week.</p>

<p>Dropbox, on the other hand, invested substantially more organizational effort into supporting hack week. The dates were announced in advance, giving people time to write up proposals, merge project ideas, and form small teams to work on them.</p>

<p>While most people applied their creative energy to unexplored product ideas, there was no pressure to do any particular thing. At Dropbox, it was totally cool to spend hack week learning a new skill, blowing glass, or trying to break a Guinness record. There’s nothing like being surrounded by excited people. Passion is contagious.</p>

<p>Projects were celebrated during Friday’s expo, where the office was arranged into zones and each zone given a time window for presentations. Then, everyone, including executives, would tour the projects. The most impactful or promising would get a chance to be officially funded. I can’t put into words how amazing some of the projects were. Dropbox hack week was like getting a glimpse into what the future of business collaboration will look like. Of course, it takes time to ship features properly, but these weren’t smoke and mirrors demos. Many projects actually had their core loops implemented.</p>

<h2 id="community">Community</h2>

<p>If you care about volunteering your time and giving back to the community, Dropbox is a great place to work. Every quarter, you could take two paid days off to volunteer your time. For example, you could work in a local school or a food pantry. Monetary donations to charity were also matched one-to-one up to a cap.</p>

<p>Charity and service opportunities were regularly shared by email. Public service was celebrated and part of the culture.</p>

<h2 id="code-review">Code Review</h2>

<p>Dropbox follows a diff-based code review process using <a href="https://www.phacility.com/phabricator/">Phabricator</a>. I think it was largely copied from Facebook’s. I’ve written before that I don’t think diff-based code reviews are as effective or efficient as <a href="https://chadaustin.me/2015/01/code-reviews-follow-the-data/">project-based, dataflow-based code review</a>.</p>

<p>And as I expected, the code review process at Dropbox lent itself to bikeshedding. To be fair, code review processes are cultural, so I imagine diff-based review could work well.</p>

<p>Nonetheless, it was common for me to have diffs blocked for minor things. My diffs were rejected for things like use of tense or capitalization in comments. Meanwhile, important decisions like why I chose a certain hash function or system design would receive no comments at all.</p>

<p>Also! A common antipattern was for someone to block my diff because, even though it was an improvement over the previous state, it didn’t go far enough. This unnecessary perfectionism slowed progress towards the desired end state.</p>

<p>The net result was a lot of friction in the development process. At IMVU, we followed a project-structured flow, where a team planned out a body of work, had an informal design review (emphasizing core components over leaf components), implemented the feature with autonomy, and finally, as the project wrapped up, one or more hour-long project-based code review sessions were held. This made sure we got the high-order bits right, while letting the team move quickly during development.</p>

<p>In contrast, at Dropbox, code review was interleaved throughout development. Coupled with the fact that everyone was busy and team members had different schedules, the turnaround time on diffs was measured in hours or days. In egregious cases, a small diff might have one code review cycle per day, and get bounced back multiple times, resulting in a three-day latency between work starting and the diff landing on master.</p>

<p>This meant I rarely entered a flow state. I had to keep a handful of diffs in flight at all times, which was tremendously inefficient, at least for me – I am not great at context switches.</p>

<p>[UPDATE: I wrote the above before joining Facebook, and Facebook’s diff-based code review process is much healthier. It’s a combination of culture and tooling. Maybe I’ll write about that sometime.]</p>

<h2 id="testing">Testing</h2>

<p>My understanding is that Dropbox didn’t form a testing culture until years after the company started. The result is that the basic processes of effective testing at scale were still being figured out. Coming from IMVU, it felt like stepping back in time about five years. (As an aside, this made me realize that company maturity is an orthogonal axis to revenue and size.)</p>

<p>Testing maturity is a progression.</p>

<ol>
  <li>No tests</li>
  <li>Occasional, slow, unreliable tests</li>
  <li>Semi-comprehensive integration tests</li>
  <li>Fast, comprehensive unit tests comprise the bulk of testing
    <ol>
      <li>Dependency injection</li>
      <li>Composable subsystem design</li>
    </ol>
  </li>
  <li>Real-time test feedback (ideally integrated into the editor)</li>
  <li>Tests are extremely reliable or <a href="http://andyfriesen.com/2015/06/17/testable-io-in-haskell.html">guaranteed reliable by the type system</a>
    <ol>
      <li>With tooling that tracks the reliability of tests and provides that feedback to authors.</li>
    </ol>
  </li>
  <li>Fuzzing, statistically automated microbenchmarking, rich testing frameworks for every language and every platform, and a company culture of writing the appropriate number of unit tests and high-value integration tests.</li>
</ol>
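<p>Step 4 is the key unlock for fast tests: inject the slow dependency (a clock, a network client) so the unit test never touches the real one. A minimal sketch, with all names hypothetical:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import time

# Hypothetical sketch of step 4: the clock is an injected dependency,
# defaulting to the real one in production.
class ThrottleGate:
    def __init__(self, interval_s, now=time.monotonic):
        self._interval = interval_s
        self._now = now          # injected clock
        self._last = None

    def allow(self):
        t = self._now()
        if self._last is None or t - self._last >= self._interval:
            self._last = t
            return True
        return False

# In a test, inject a fake clock so no real waiting happens:
fake_time = [0.0]
gate = ThrottleGate(10.0, now=lambda: fake_time[0])
assert gate.allow()          # first call passes
assert not gate.allow()      # same instant: throttled
fake_time[0] = 11.0
assert gate.allow()          # clock advanced past the interval
</code></pre></div></div>

<p>The same test written against the real clock would sleep for ten seconds – exactly the kind of slow, flaky integration test that keeps a team stuck at level 3.</p>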

<p>If IMVU was somewhere around 5 or 6, Dropbox, when I joined, was closer to 3. The situation improved while I was there, but this stuff takes time. And good ideas spread more slowly if you have more junior engineers. Also, every flaky test written – or integration test that could have been a unit test – is a recurring cost on future engineering, so it’s valuable to climb this hierarchy early in an engineering team’s life.</p>

<p>All of that said, the company did important work on this axis while I was there. They’ll probably catch up eventually.</p>

<h2 id="epd">EPD</h2>

<p>At IMVU, between 2010-ish and 2015, there was a strong divide between product management, design, and engineering roles. IMVU’s executive leadership was a proponent of “The person who makes the decision must be responsible, and since product management must be held responsible, they must also have full control over their decisions.” The implication is that product management has total say, and engineers must do what they’re told. As you might imagine, people don’t like being told their opinion doesn’t matter, which led to conflict and unhealthy team dynamics.</p>

<p>I personally favor a soft-touch product management style, where product management gathers data, shares context with the team, and guides it to success. (See this <a href="http://www.podcastchart.com/podcasts/product-management-pulse/episodes/the-craft-of-product-management">excellent interview</a> with Bob Corrigan.) I understand it’s harder to judge a PM’s success this way than with a black-and-white “was your product successful?”, but top-down <em>tactical</em> team management is not healthy, and IMVU was frequently guilty of it.</p>

<p>Thus, when I went to Dropbox, I was thrilled to see that engineering, product management, and design were considered one unit. The teams I was involved with had frequent open communication between all team members, and I did not observe any disagreements that weren’t quickly resolved by sharing additional context. (Though there were a couple times that context led to people quitting or switching teams, haha. But better that than misery.)</p>

<p>Now, product management still owned the backlog. Unlike Facebook, engineers did not have true autonomy. But the dynamics were so much healthier than IMVU’s. I’ll grant it’s possible that healthy dynamics are easier when revenue and the stock price are growing.</p>

<h2 id="the-food">The Food</h2>

<p>The food at Dropbox is unreal. On my first day, I had <em>the best</em> fried chicken sandwich of my life. It was not uncommon to have to decide between duck confit, swordfish, and braised lamb shank for lunch. I thought “There’s no way this lasts.” But… it did, at least as long as I was there. The quality did briefly dip as the company moved to a bigger office (with a different kitchen) and expanded the food service to all of the new employees, but it recovered.</p>

<p>Dropbox’s kitchen – known as The Tuck Shop – never made the same dish twice. (Though common themes would pop up every so often.) At first, this made me sad. With an amazing dish would come knowledge that I would never have it again. This was hard to bear. There were no recipes; the dishes came straight from the minds of Chef Brian and his team. Eventually, the Tuck Shop gave me a kind of zen allegory for life. In life, moments are fleeting and you don’t get redos, so enjoy opportunities when they occur.</p>

<p>I had a very long commute from the South Bay up to the city, so I ate a large breakfast every day. Breakfast is what made the long commute possible.</p>

<figure>
<a href="https://www.reddit.com/r/grilledcheese/comments/3xdril/breakfast_dropbox_cheddar_and_tomato_basil_soup_w/"><img src="/images/dropbox/dropbox_breakfast.jpeg" alt="Grilled cheese with tomato soup and sweet potato tots" /></a>
<figcaption>Breakfast was <em>killer</em>. Check out this grilled cheese with tomato soup and sweet potato tots.</figcaption>
</figure>

<p>Oh yeah! It took me too long to learn about this, but the Tuck Shop hosted afternoon tea and cookies or cake. Afternoon desserts you’ve never heard of. The coffee shop made crepes. Fresh young coconuts every day. And even wine pairing on Wednesday nights.</p>

<figure>
<img src="/images/dropbox/wine_pairing.jpeg" alt="Wine pairing with cocktail, scallop, and pasta" />
<figcaption>Wednesday night wine pairing, with cocktail, scallop, and pasta.</figcaption>
</figure>

<p>It’s funny to hear people at other companies talk about how good the food is. (And I’m sure it is good!) But no way does Dropbox not have the best corporate food in Silicon Valley, or even the whole USA.</p>

<p>Will it last?  Hard to say. Is it egregious?  It certainly feels like it, but the cost of food is dominated by labor, and I’ve heard they managed to get costs down to a reasonable amount per day, with almost zero food waste. Hundreds of restaurant-style entrees were prepared and plated en masse, with copious use of sous vide and diced herbs sprinkled on top. I’m sure food costs were still a drop in the bucket compared to the salaries of thousands of engineers.</p>

<h2 id="floss">Floss</h2>

<p>This might sound silly, but one of my favorite benefits was that every bathroom was stocked with floss and other oral hygiene items. I’ve always tried to floss regularly, but when it’s right there at work, it’s so much easier. It was especially important after those amazing lunches.</p>

<h2 id="wrong-career-trajectory">Wrong Career Trajectory</h2>

<p>About a year into my employment, the honeymoon was over. While I was enjoying my work and the team, the impact I was having on the company was tiny compared to my potential. For one, I was on a product prototyping team, isolated from the main Dropbox offering. Like a startup, a new effort can only have significant impact by succeeding, and the chances of that are small. While I really enjoy pushing pixels and executing that tight customer-feedback-write-code loop, I wasn’t going to get promoted doing it. In fact, I know a bunch of engineers cheaper than me who write better CSS.</p>

<p>In hindsight, I probably should have joined an infrastructure team from the beginning. Engineering infrastructure has a lot of visible impact, requires deeper technical leadership than product work, and aligns better with my skillset. That said, I intentionally joined Dropbox to learn about its product culture. I’d also had little exposure to mainstream web stacks (IMVU hand-rolled its own mostly due to unfortunate timing), and no exposure to Electron, iOS, and native macOS development. Plus, again, pushing pixels with world-class collaborative designers and a gelled team is delightful. :)</p>

<p>I’m conflicted, but I can’t say I regret my time on the prototyping team. The friendships alone were worth it, and I can justify it like getting paid to go back to school after 10 years of IMVU technology and habits.</p>

<p>Nonetheless, I was unhappy, so I made the transition over to the dropbox.com performance team.</p>

<h2 id="web-performance">Web Performance</h2>

<p>Web performance and platform APIs are squarely in my wheelhouse. I led the team that transitioned IMVU from desktop client engineers to web frontend engineers and, with my team, built most of IMVU’s modern web stack (and to this day, a lot of what we did remains better than off-the-shelf open source).</p>

<p>So when Dropbox kicked off a strike team to reduce its page load times from eight seconds (!) to four, I decided that was a great opportunity, and switched teams.</p>

<p>This proved to be quite challenging for me. Coming from a prototyping team with its own stack, I had little background on how the core Dropbox.com stack worked. Meanwhile, my new teammates had years of experience with it. And because the effort was a fire drill, it never felt like there was enough time to sit down and properly spin up.</p>

<p>The big lesson for me here was about setting expectations. While I had a lot of experience in the space, I did a poor job of making it clear that I’d need time to spin up on the team. As a new but senior engineer, I could have managed those dynamics much better. I would have done better with more autonomy and time.</p>

<h2 id="management-churn">Management Churn</h2>

<p>The web platform team also had a lot of management churn. I liked my first manager quite a lot, but we both knew that everything would change once our performance goals were achieved. The company planned to hire a new group manager, who would then hire his own managers for each team.</p>

<p>In the short term, I reported to the new group manager, but he didn’t last long at the company. So then I reported to the team’s tech lead, who wasn’t planning on being a manager again, but had to.</p>

<p>The result of all of this was that I had <strong>seven managers in two years</strong>. I’d heard about Silicon Valley’s high rate of employee turnover, but with so many managers it was hard to build rapport and learn each other’s styles.</p>

<p>It didn’t help that my new manager’s style was very command-and-control. After our team planning, he laid out my next several months’ worth of work. Given my need to work with autonomy, this was probably the beginning of the end of my time at Dropbox.</p>

<p>Note that I <em>liked</em> everyone I worked with. There just wasn’t enough time or space to build a strong working relationship.</p>

<h2 id="impact">Impact</h2>

<p>I’d been at Dropbox for about 20 months before I learned how engineers are leveled. The problem boils down to this: given an organization of a thousand or more, how do you decide how to distribute responsibilities and determine compensation? You’d like to maintain some kind of fairness across disciplines, organizations, teams, and managers. In an ideal world, your compensation is independent of which team and manager you happen to have.</p>

<p>Dropbox culture derived largely from Facebook (as many early employees had come from Facebook), and Facebook determines level and compensation by gauging each employee’s impact. During review season, managers from across the company are all shuffled into groups that review random engineers. This prevents a manager from biasing their reports upwards or downwards and adds consistency. This calibration process focuses primarily on a sense of that person’s impact on the company.</p>

<p>At Facebook, impact is a first-class concept. It’s common to hear “I have some impact for you.” But I’d come from a 150-person company where the pay bands were wide, people did a mix of short-term and long-term work, and managers were primarily responsible for placing their employees in said pay bands based on a variety of factors, such as team cohesiveness, giving high-quality feedback to peers, and writing quality code. IMVU was a team culture.</p>

<p>Dropbox had copied their system from Facebook. Now, I don’t think IMVU’s system was better, and I don’t think Dropbox’s was bad. Here’s the problem: <em>nobody ever told me how it worked</em>.</p>

<p>If I had known how my level and compensation were determined, I would have made very different decisions. Facebook, on the other hand, has a class during orientation that explains how your compensation and level are determined. They have countless wiki pages, internal posts, and presentations on the subject. The incentives are very explicit.</p>

<p>I was too naive and trusting and assumed everything would just work out if I was a good teammate and worked hard. I now see the value in grasping the mechanics of the incentive structure early.</p>

<p>Obviously you can find problems with any incentive system, but the real issue here is that I somehow never learned that Dropbox’s leveling process required you to keep track of your specific contributions, especially the more nebulous ones that don’t bubble up through typical project management, and provide enough data that your manager can make the case for a level adjustment in the company-wide calibration process.</p>

<p>I was months away from quitting when I found out how this worked, and it suddenly explained why my manager had always wanted specific examples of work I’d done. At the time, whether I’d done such and such optimization had seemed irrelevant to the annual review. I could have done a much better job of framing my contributions.</p>

<h2 id="thinking-from-first-principles">Thinking from First Principles</h2>

<p>I, like several of the early IMVU engineers, am partial to first-principles thinking. What are we trying to do?  What are the constraints?  What are the options?  How do they weigh against the constraints?  Decide accordingly. Perhaps this comes from working on games where, at least in the early years, technical constraints had a big influence on the options. Facebook also seems to have a first-principles culture.</p>

<p>But one thing I found frustrating at Dropbox was a kind of… software architecture as religion. “No, don’t structure your code that way, it’s not very Fluxy.” or “[Facebook or Google] do it this way, we should too.” or “We need to compute this data lazily for performance [even though a brute-force solution under realistic data sizes can easily hit our concrete performance targets].” or “We should base our build system on webpack because it seems to be the winner [without measuring its suitability for the problem at hand].”</p>

<p>Again, I’m conflicted. There’s value in going with the flow and not spending forever shaving all the yaks if something already fits the bill. But often, with a bit of research and some thought, you <em>can</em> come up with a better solution. In hindsight, I’m amazed that IMVU was able to build such great, reliable, and fast infrastructure with a small set of talented people, simply by thinking about the problem from the top and precisely solving it. For example, IMVU’s realtime message queue, IMQ, was better than any of the four that Dropbox had built, and was written by three people.</p>

<p>(Now is a good time to remind you that I was only exposed to a small slice of Dropbox. I would hope that, for example, the data storage teams thought from first principles.)</p>

<h2 id="lessons-learned">Lessons Learned</h2>

<p>In a lot of ways, Dropbox was a great place to work. I loved my teammates and learned a lot. Even though it ended up not working out, it’s hard to regret the time I spent there.</p>

<p>Here are some lessons I took away:</p>

<ul>
  <li>Always get competing offers. Know your worth.</li>
  <li>When you’re hired, take the time to understand how you’re judged. This would have prevented a lot of confusion on my part.</li>
  <li>90-minute door-to-door commutes are horrible.</li>
  <li>Unless you’re Guido van Rossum or otherwise have a widespread reputation, building credibility is hard and takes conscious effort for months and possibly years.</li>
  <li>As fantastic as the food perks are, meaningful work is better.</li>
  <li>Thank you so much, Dropbox, for taking care of me and my family during a very hard year.</li>
</ul>]]></content><author><name></name></author><category term="dropbox" /><category term="imvu" /><category term="career" /><category term="hackernews" /><summary type="html"><![CDATA[Disclaimer]]></summary></entry></feed>