<h1>Unusual optimizations; ref foreach and ref returns</h1>
<p><em>Marc Gravell, 2022-05-20</em></p>
<p>A really interesting feature quietly slipped into C# 7.3 - interesting to me, at least - yet I’ve seen almost no noise about it. As I’ve said many times before: I have niche interests - I spend a lot of time in library code, or acting in a consulting capacity on performance-tuning application code - so in both capacities, I tend to look at performance tweaks that <em>aren’t usually needed</em>, but when they are: they’re <strong>glorious</strong>. As I say: I haven’t seen this discussed a lot, so: “be the change you want to see” - here’s my attempt to sell you on the glory of <code class="language-plaintext highlighter-rouge">ref foreach</code>.</p>
<p>Because I know folks have a short attention span, I’ll start with the money shot:</p>
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right;">Mean</th>
<th style="text-align: right;">Gen 0</th>
<th style="text-align: right;">Allocated</th>
</tr>
</thead>
<tbody>
<tr>
<td>ListForEachLoop</td>
<td style="text-align: right;">2,724.7 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>ArrayForEachLoop</td>
<td style="text-align: right;">972.2 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>CustomForEachLoop</td>
<td style="text-align: right;">987.2 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>ListForLoop</td>
<td style="text-align: right;">1,201.3 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>ArrayForLoop</td>
<td style="text-align: right;">593.0 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>CustomForLoop</td>
<td style="text-align: right;">596.2 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>ListLinqSum</td>
<td style="text-align: right;">7,057.1 ns</td>
<td style="text-align: right;">0.0076</td>
<td style="text-align: right;">80 B</td>
</tr>
<tr>
<td>ArrayLinqSum</td>
<td style="text-align: right;">4,832.7 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">32 B</td>
</tr>
<tr>
<td>ListForEachMethod</td>
<td style="text-align: right;">2,070.6 ns</td>
<td style="text-align: right;">0.0114</td>
<td style="text-align: right;">88 B</td>
</tr>
<tr>
<td>ListRefForeachLoop</td>
<td style="text-align: right;">586.2 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>ListSpanForLoop</td>
<td style="text-align: right;">590.3 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>ArrayRefForeachLoop</td>
<td style="text-align: right;">574.1 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>CustomRefForeachLoop</td>
<td style="text-align: right;">581.0 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>CustomSpanForeachLoop</td>
<td style="text-align: right;">816.1 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
<tr>
<td>CustomSpanRefForeachLoop</td>
<td style="text-align: right;">592.2 ns</td>
<td style="text-align: right;">-</td>
<td style="text-align: right;">-</td>
</tr>
</tbody>
</table>
<p>With the point being: I want to sell you on those sub-600 nanosecond versions, rather than the multi-microsecond versions <em>of the same operation</em>.</p>
<h1 id="what-the-hell-is-ref-foreach">What the hell is <code class="language-plaintext highlighter-rouge">ref foreach</code>?</h1>
<p>First, simple recap: let’s consider:</p>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">someValue</span> <span class="k">in</span> <span class="n">someSequence</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">someValue</span><span class="p">.</span><span class="nf">DoSomething</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The <em>details</em> here may vary depending on what <code class="language-plaintext highlighter-rouge">someSequence</code> is, but <em>conceptually</em>, what this is doing is reading each value from <code class="language-plaintext highlighter-rouge">someSequence</code> into a local variable <code class="language-plaintext highlighter-rouge">someValue</code>, and calling the <code class="language-plaintext highlighter-rouge">DoSomething()</code> method on each. If the type of <code class="language-plaintext highlighter-rouge">someValue</code> is a reference-type (i.e. a <code class="language-plaintext highlighter-rouge">class</code> or <code class="language-plaintext highlighter-rouge">interface</code>), then each “value” in the sequence is just that: a reference - so we’re not really moving much data around here, just a pointer.</p>
<p>Where this gets <em>interesting</em> is: what if the type of <code class="language-plaintext highlighter-rouge">someValue</code> is a <code class="language-plaintext highlighter-rouge">struct</code>? And in particular, what if it is a <em>heckin’ chonka</em> of a <code class="language-plaintext highlighter-rouge">struct</code>? (and yes, there <em>are</em> some interesting scenarios where <code class="language-plaintext highlighter-rouge">struct</code> is useful outside of simple data types, especially if we enforce <code class="language-plaintext highlighter-rouge">readonly struct</code> to prevent ourselves from shooting our own feet off) In that case, copying the value out of the sequence can be a significant operation (if we do it often enough to care). Historically, the <code class="language-plaintext highlighter-rouge">foreach</code> syntax has an inbuilt implementation for some types (arrays, etc), falling back to a duck-typed pattern that relies on a <code class="language-plaintext highlighter-rouge">bool MoveNext()</code> and <code class="language-plaintext highlighter-rouge">SomeType Current {get;}</code> pair (often, but not exclusively, provided via <code class="language-plaintext highlighter-rouge">IEnumerator<T></code>) - so the “return the entire value” is baked into the old signature (via the <code class="language-plaintext highlighter-rouge">Current</code> property).</p>
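<p>To make the cost concrete, here is an illustrative sketch (the type names here are mine, purely for demonstration) of a chunky value-type alongside the duck-typed enumerator shape described above; note how <code class="language-plaintext highlighter-rouge">Current</code> returning the value itself means a full copy of the struct on every iteration:</p>

```csharp
using System;

// illustrative only: a "chunky" value-type - 4 guids = 64 bytes per value
public readonly struct ChunkyStruct
{
    public Guid A { get; }
    public Guid B { get; }
    public Guid C { get; }
    public Guid D { get; }
    public ChunkyStruct(Guid a, Guid b, Guid c, Guid d)
        => (A, B, C, D) = (a, b, c, d);
}

// the duck-typed shape that foreach binds to; nothing here is wrong,
// but the signature of Current forces a 64-byte copy per element
public struct CopyingEnumerator
{
    private readonly ChunkyStruct[] _array;
    private int _index;
    public CopyingEnumerator(ChunkyStruct[] array)
    {
        _array = array;
        _index = -1;
    }
    public bool MoveNext() => ++_index < _array.Length;

    // "return the entire value" baked into the signature
    public ChunkyStruct Current => _array[_index];
}
```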
<p><strong><em>What if we could avoid that?</em></strong></p>
<h1 id="for-arrays-we-already-can">For arrays: <em>we already can!</em></h1>
<p>Let’s consider that <code class="language-plaintext highlighter-rouge">someSequence</code> is explicitly typed as an array. It is very tempting to think that <code class="language-plaintext highlighter-rouge">foreach</code> and <code class="language-plaintext highlighter-rouge">for</code> over the array work the same - i.e. the same <code class="language-plaintext highlighter-rouge">foreach</code> as above, compared to <code class="language-plaintext highlighter-rouge">for</code>:</p>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span> <span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">someArray</span><span class="p">.</span><span class="n">Length</span> <span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="n">someArray</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="nf">DoSomething</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>But: if we <a href="https://sharplab.io/#v2:D4AQzABCBMEMIFgBQBvZENQIwDYoBYIAxAewCcBRAQwGMALACgGUSBbAUwBUBPAB3YDaAXQgBnNuwCCZMlW4BKdJjRJMaiADNy7WnQgMAblTJiJANSoAbAK7sIASwB2pjtNkKl6iCq9fxHCxt2ADoAERIWDgAXOicAcwZ5AG5PdQBfVIzVTBBcAmJyZgkefmEXKRk5RWyMH18tEwYnKIcIAF4IAAYIJNaAHnK3OWCAGXZHOJiehwBqGerfb1TF/wr3AXshMIiJGPjElJr0zOQssh0AExJHS24xKLJrGhbIrj52ZDqMc6orm7uLuwaPZWFYIFQADQAIwhNEOanA+XCrz2E0S3iyaSAA==">run both of those through sharplab</a>, we can see that they compile differently; in C#, the difference is that <code class="language-plaintext highlighter-rouge">foreach</code> is basically:</p>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SomeType</span> <span class="n">someType</span> <span class="p">=</span> <span class="n">someArray</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="n">someType</span><span class="p">.</span><span class="nf">DoSomething</span><span class="p">();</span>
</code></pre></div></div>
<p>which fetches the entire value out of the array, whereas <code class="language-plaintext highlighter-rouge">for</code> is:</p>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">someArray</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="nf">DoSomething</span><span class="p">();</span>
</code></pre></div></div>
<p>Now, you might be looking at that and thinking “aren’t they the same thing?”, and the simple answer is: “no, no they are not”. You see, there are <em>two ways</em> of accessing values inside an array; you can <em>copy the data out</em> (<code class="language-plaintext highlighter-rouge">ldelem</code> in IL, which returns the <em>value</em> at the index), or you can access the data <em>directly inside the array</em> (<code class="language-plaintext highlighter-rouge">ldelema</code> in IL, which returns the <em>address</em> at the index). Ironically, we need an <em>address</em> to call the <code class="language-plaintext highlighter-rouge">DoSomething()</code> method, so for the <code class="language-plaintext highlighter-rouge">foreach</code> version, this actually becomes three steps: “copy out the value from the index, store the value to a local, get the address of a local” - instead of just “get the address of the index”; or in IL:</p>
<pre><code class="language-txt">IL_0006: ldloc.0 // the array
IL_0007: ldloc.1 // the index
IL_0008: ldelem SomeType // read value out from array:index
IL_000d: stloc.2 // store in local
IL_000e: ldloca.s 2 // get address of local
IL_0010: call instance void SomeType::DoSomething() // invoke method
</code></pre>
<p>vs</p>
<pre><code class="language-txt">IL_0004: ldarg.0 // the array
IL_0005: ldloc.0 // the index
IL_0006: ldelema SomeType // get address of array:index
IL_000b: call instance void SomeType::DoSomething() // invoke method
</code></pre>
<p>So by using <code class="language-plaintext highlighter-rouge">for</code> here, <em>not only</em> have we avoided copying the entire value, but we’ve dodged a few extra operations too! Nice. Depending on the size of the value being iterated (again, think “chunky struct” here), using <code class="language-plaintext highlighter-rouge">for</code> rather than <code class="language-plaintext highlighter-rouge">foreach</code> on an array (making sure you snapshot the array into a local, to elide bounds checks) can make a significant difference!</p>
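<p>The “snapshot to elide bounds checks” remark deserves a sketch; the names below are hypothetical, but the shape is the one the JIT recognizes - copy the array reference into a local and test the index against that local’s <code class="language-plaintext highlighter-rouge">.Length</code>:</p>

```csharp
// hypothetical element type, for illustration; DoSomething mutates in-place
// precisely because the for-loop gives us an address (ldelema), not a copy
public struct SomeType
{
    public int Value;
    public void DoSomething() => Value++;
}

public static class LoopHelper
{
    public static void ProcessAll(SomeType[] someArray)
    {
        // snapshot into a local: the JIT can now prove i is always in
        // range and elide the per-iteration bounds check
        var arr = someArray;
        for (int i = 0; i < arr.Length; i++)
        {
            arr[i].DoSomething(); // operates on the element in the array
        }
    }
}
```

Because the element is accessed in-place, the increment is visible in the original array afterwards - which would not be true of a `foreach` copy.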
<p>But: that’s arrays, and we aren’t always interested in arrays.</p>
<h1 id="but-how-does-that-help-me-outside-arrays">But how does that help me outside arrays?</h1>
<p>You might reasonably be thinking “great, but I don’t want to just hand arrays around” - after all, they give me no ability to protect the data, and they’re inconvenient for sizing - you can’t add/remove, short of creating a second array and copying all the data. This is where the C# 7.x releases take a huge flex, introducing a few key things:</p>
<ul>
<li>C# 7.0 adds <code class="language-plaintext highlighter-rouge">ref</code> return values from custom methods <em>including indexers</em>, and <code class="language-plaintext highlighter-rouge">ref</code> local values (so you don’t need to use them immediately as a return value)</li>
<li>C# 7.2 adds <code class="language-plaintext highlighter-rouge">ref readonly</code> to most places where <code class="language-plaintext highlighter-rouge">ref</code> might be used (and <code class="language-plaintext highlighter-rouge">readonly struct</code>, which often applies here)</li>
<li>C# 7.3 adds <code class="language-plaintext highlighter-rouge">ref</code> (and <code class="language-plaintext highlighter-rouge">ref readonly</code>) as <code class="language-plaintext highlighter-rouge">foreach</code> L-values (i.e. the iterator value corresponding to <code class="language-plaintext highlighter-rouge">.Current</code>)</li>
</ul>
<p>Note that with <code class="language-plaintext highlighter-rouge">ref</code>, the caller can <em>mutate</em> the data in-place, which is not always wanted; <code class="language-plaintext highlighter-rouge">ref readonly</code> signals that we don’t want that to happen, hence why it is so often matched with <code class="language-plaintext highlighter-rouge">readonly struct</code> (to avoid having to make defensive copies of data), but as a warning: <code class="language-plaintext highlighter-rouge">readonly</code> is always a <em>guideline</em>, not a rule; a suitably motivated caller can convert a <code class="language-plaintext highlighter-rouge">ref readonly</code> to a <code class="language-plaintext highlighter-rouge">ref</code>, and can convert a <code class="language-plaintext highlighter-rouge">ReadOnlySpan<T></code> to a <code class="language-plaintext highlighter-rouge">Span<T></code>, and convert any of the above to an unmanaged <code class="language-plaintext highlighter-rouge">T*</code> pointer (at which point you can forget about all safety); this is not a bug, but a simple reality: <em>everything</em> is mutable if you try hard enough.</p>
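<p>As a minimal sketch of that warning (using the real <code class="language-plaintext highlighter-rouge">System.Runtime.CompilerServices.Unsafe</code> helper; the surrounding names are mine):</p>

```csharp
using System.Runtime.CompilerServices;

public static class ReadonlyIsAGuideline
{
    // demonstrates that ref readonly is a guideline, not a rule:
    // Unsafe.AsRef strips the readonly-ness, yielding a writable
    // ref to the exact same location
    public static int Demo()
    {
        int value = 42;
        ref readonly int ro = ref value;      // callers "shouldn't" write via ro
        ref int rw = ref Unsafe.AsRef(in ro); // ...but a motivated caller can
        rw = 99;                              // mutates the original
        return value;                         // now 99
    }
}
```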
<p>These language features provide the building blocks - especially, but not exclusively, when combined with <code class="language-plaintext highlighter-rouge">Span<T></code>; <code class="language-plaintext highlighter-rouge">Span<T></code> (and the twin, <code class="language-plaintext highlighter-rouge">ReadOnlySpan<T></code>) provide unified access to arbitrary data, which <em>could</em> be a slice of an array, but could be anything else - with the usual <code class="language-plaintext highlighter-rouge">.Length</code>, indexer (<code class="language-plaintext highlighter-rouge">this[int index]</code>) and <code class="language-plaintext highlighter-rouge">foreach</code> support you’d expect, with some additional compiler and JIT tricks (much like with arrays) to make them fly. Since spans are naturally optimized, one of the first things we can do - if we don’t want to deal with arrays - is: deal with spans instead! This is sometimes a little hard to fit into existing systems without drastically refactoring the code, but more recently (.NET 5+), we get helper methods like <a href="https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.collectionsmarshal.asspan"><code class="language-plaintext highlighter-rouge">CollectionsMarshal.AsSpan</code></a>, which gives us the sized span of the data underpinning a <code class="language-plaintext highlighter-rouge">List<T></code>. This is only useful transiently (as any <code class="language-plaintext highlighter-rouge">Add</code>/<code class="language-plaintext highlighter-rouge">Remove</code> on the list will render the span broken - the length will be wrong, and it may now even point to the wrong array instance, if the list had to re-size the underlying data), but <em>when used correctly</em>, it allows us to access the data <em>in situ</em> rather than having to go via the indexer or iterator (both of which copy out the entire value at each position). For example:</p>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">foreach</span> <span class="p">(</span><span class="k">ref</span> <span class="kt">var</span> <span class="n">tmp</span> <span class="k">in</span> <span class="n">CollectionsMarshal</span><span class="p">.</span><span class="nf">AsSpan</span><span class="p">(</span><span class="n">someList</span><span class="p">))</span>
<span class="p">{</span> <span class="c1">// also works identically with "ref readonly var", since this is</span>
<span class="c1">// a readonly struct</span>
<span class="n">tmp</span><span class="p">.</span><span class="nf">DoSomething</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Our use of <code class="language-plaintext highlighter-rouge">ref var tmp</code> with <code class="language-plaintext highlighter-rouge">foreach</code> here means that the L-value (<code class="language-plaintext highlighter-rouge">tmp</code>) is a <em>managed pointer</em> to the data - not the data itself; we have avoided copying the overweight value-type, and called the method in-place.</p>
<p>If you look carefully, the indexer on a span is not <code class="language-plaintext highlighter-rouge">T this[int index]</code>, but rather: <code class="language-plaintext highlighter-rouge">ref T this[int index]</code> (or <code class="language-plaintext highlighter-rouge">ref readonly T this[int index]</code> for <code class="language-plaintext highlighter-rouge">ReadOnlySpan<T></code>), so we can also use a <code class="language-plaintext highlighter-rouge">for</code> loop, and avoid copying the data at any point:</p>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">span</span> <span class="p">=</span> <span class="n">CollectionsMarshal</span><span class="p">.</span><span class="nf">AsSpan</span><span class="p">(</span><span class="n">someList</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">span</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="n">span</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="nf">DoSomething</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<h1 id="generalizing-this">Generalizing this</h1>
<p>Sometimes, spans aren’t viable either - for whatever reason. The good news is: we can do the exact same thing with our own types, in two ways:</p>
<ol>
<li>we can write our own types with an indexer that returns a <code class="language-plaintext highlighter-rouge">ref</code> or <code class="language-plaintext highlighter-rouge">ref readonly</code> managed pointer to the real data</li>
<li>we can write our own <em>iterator</em> types with a <code class="language-plaintext highlighter-rouge">ref</code> or <code class="language-plaintext highlighter-rouge">ref readonly</code> return value on <code class="language-plaintext highlighter-rouge">Current</code>; this won’t satisfy <code class="language-plaintext highlighter-rouge">IEnumerator<T></code>, but the <em>compiler</em> isn’t limited to <code class="language-plaintext highlighter-rouge">IEnumerator<T></code>, and if you’re writing a custom iterator (rather than using a <code class="language-plaintext highlighter-rouge">yield return</code> iterator block): you’re probably using a custom value-type iterator and <em>avoiding</em> the interface to make sure it never gets boxed accidentally, so: nothing is lost!</li>
</ol>
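<p>The first option - an indexer that hands out a managed pointer - can be sketched like so (illustrative names only; in practice you would usually just expose a <code class="language-plaintext highlighter-rouge">ReadOnlySpan<T></code>):</p>

```csharp
// a hypothetical chunky element type
public readonly struct BigStruct
{
    public readonly int Value;
    public BigStruct(int value) => Value = value;
}

// option 1: a wrapper that protects its array, but whose indexer returns
// a managed pointer into it - callers read in-place, with no copy, and
// (being readonly) without being able to mutate the data
public sealed class StructBuffer
{
    private readonly BigStruct[] _items;
    public StructBuffer(BigStruct[] items) => _items = items;
    public int Count => _items.Length;

    public ref readonly BigStruct this[int index] => ref _items[index];
}
```

A caller can then use `ref readonly var item = ref buffer[i];` in a `for` loop to hold the position without ever copying the value.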
<p>Purely for illustration (you wouldn’t do this - you’d just use <code class="language-plaintext highlighter-rouge">ReadOnlySpan<T></code>), a very simple custom iterator could be something like:</p>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">struct</span> <span class="nc">Enumerator</span>
<span class="p">{</span>
<span class="k">private</span> <span class="k">readonly</span> <span class="n">SomeStruct</span><span class="p">[]</span> <span class="n">_array</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="n">_index</span><span class="p">;</span>
<span class="k">internal</span> <span class="nf">Enumerator</span><span class="p">(</span><span class="n">SomeStruct</span><span class="p">[]</span> <span class="n">array</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">_array</span> <span class="p">=</span> <span class="n">array</span><span class="p">;</span>
<span class="n">_index</span> <span class="p">=</span> <span class="p">-</span><span class="m">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">public</span> <span class="kt">bool</span> <span class="nf">MoveNext</span><span class="p">()</span>
<span class="p">=></span> <span class="p">++</span><span class="n">_index</span> <span class="p"><</span> <span class="n">_array</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span>
<span class="k">public</span> <span class="k">ref</span> <span class="k">readonly</span> <span class="n">SomeStruct</span> <span class="n">Current</span>
<span class="p">=></span> <span class="k">ref</span> <span class="n">_array</span><span class="p">[</span><span class="n">_index</span><span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>
<p>which would provide <code class="language-plaintext highlighter-rouge">foreach</code> access <em>almost</em> as good as a direct span. If the caller uses <code class="language-plaintext highlighter-rouge">foreach (var tmp in ...)</code> rather than <code class="language-plaintext highlighter-rouge">foreach(ref readonly var tmp in ...)</code>, then the compiler will simply de-reference the value for the caller, which it <em>would have done anyway in the old-style <code class="language-plaintext highlighter-rouge">foreach</code></em>, so: once again: no harm.</p>
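<p>For completeness, here is a self-contained sketch of how such an enumerator gets wired up for <code class="language-plaintext highlighter-rouge">foreach</code> (again, illustrative names): the compiler only pattern-matches a <code class="language-plaintext highlighter-rouge">GetEnumerator()</code> method whose return type exposes <code class="language-plaintext highlighter-rouge">MoveNext()</code>/<code class="language-plaintext highlighter-rouge">Current</code> - no interface required:</p>

```csharp
public readonly struct SomeStruct
{
    public readonly int Value;
    public SomeStruct(int value) => Value = value;
}

public sealed class SomeCollection
{
    private readonly SomeStruct[] _array;
    public SomeCollection(SomeStruct[] array) => _array = array;

    // this is all foreach needs; no IEnumerable<T>, so no risk of boxing
    public Enumerator GetEnumerator() => new Enumerator(_array);

    public struct Enumerator
    {
        private readonly SomeStruct[] _array;
        private int _index;
        internal Enumerator(SomeStruct[] array)
        {
            _array = array;
            _index = -1;
        }
        public bool MoveNext() => ++_index < _array.Length;
        public ref readonly SomeStruct Current => ref _array[_index];
    }
}
```

With this in place, `foreach (ref readonly var item in someCollection)` iterates the data in-place, and plain `foreach (var item in someCollection)` still works - the compiler simply dereferences for you.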
<h1 id="summary">Summary</h1>
<p>In modern C#, we have a range of tricks that can help in certain niche scenarios relating to sequences of - in particular - value types. These scenarios don’t apply to everyone, <em>and that’s fine</em>. If you never need to use any of the above: <strong>that’s great</strong>, and good luck to you. But when you <em>do</em> need them, they are <em>incredibly</em> powerful and versatile, and a valuable tool in the optimizer’s toolbox.</p>
<p>The benchmark code used for the table at the start of the post <a href="https://github.com/mgravell/blog-preview/tree/main/RefForeach">is included here</a>.</p>
<h1>Migrating from Redis-64 to Memurai</h1>
<p><em>Marc Gravell, 2022-02-22</em></p>
<p>or alternatively:</p>
<h1>How did updating to .NET 6 break asp-net redis cache for some users?</h1>
<p>Whereby I present the history of Redis-64, along with options and motivations for Redis-64 users on Windows to consider updating their redis via Memurai.</p>
<h2 id="running-redis-on-windows-2022-edition-replacing-redis-64">Running Redis on Windows, 2022 edition; replacing Redis-64</h2>
<p>A funny thing happened recently; after updating to .NET 6, some StackExchange.Redis users started reporting that redis was not working
from their web applications. A relatively small number, so: not an endemic fail - but also far from zero. As you might hope, we took
a look, and pieced together that what was <em>actually</em> happening here was:</p>
<ul>
<li>a part of ASP.NET allows using redis as a cache</li>
<li>historically, this used the <code class="language-plaintext highlighter-rouge">HMSET</code> redis command (which sets multiple hash fields, contrast to <code class="language-plaintext highlighter-rouge">HSET</code> which sets a single hash field)</li>
<li>in redis 4.0 (July 2017), <code class="language-plaintext highlighter-rouge">HSET</code> was made variadic and thus functionally identical to <code class="language-plaintext highlighter-rouge">HMSET</code> - and so <code class="language-plaintext highlighter-rouge">HMSET</code> was marked “deprecated” (although it still works)</li>
<li>respecting the “deprecated” marker, .NET 6 (Nov 2021) included a change to switch from <code class="language-plaintext highlighter-rouge">HMSET</code> to <code class="language-plaintext highlighter-rouge">HSET</code>, thinking that the number of people below redis 4.0 should be negligible</li>
<li>and it turned out not to be!</li>
</ul>
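<p>For illustration, the equivalence looks like this in a hypothetical <code class="language-plaintext highlighter-rouge">redis-cli</code> session against a 4.0+ server (variadic <code class="language-plaintext highlighter-rouge">HSET</code> returns the number of fields <em>newly created</em> - zero here, since <code class="language-plaintext highlighter-rouge">HMSET</code> already created both; on a pre-4.0 server, the multi-field <code class="language-plaintext highlighter-rouge">HSET</code> form instead fails with a wrong-number-of-arguments error):</p>

```txt
127.0.0.1:6379> HMSET myhash field1 "a" field2 "b"
OK
127.0.0.1:6379> HSET myhash field1 "a" field2 "b"
(integer) 0
```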
<p>This problem <a href="https://github.com/dotnet/aspnetcore/issues/38715">was reported</a> and the relevant code has now been fixed <a href="https://github.com/dotnet/aspnetcore/pull/38927">to support both variants</a>, but we need to take a step further and understand why a non-trivial number of users are <em>more than 5 years behind on servicing</em>. After a bit more probing, it is my understanding that for a lot of the affected users, the answer is simple: they are using Redis-64.</p>
<h2 id="what-is-was-redis-64">What is (was) Redis-64?</h2>
<p>Historically, the main redis project has only supported linux usage. There are some particular nuances of how redis is implemented (using
fork-based replication and persistence with copy-on-write semantics, for example) that don’t make for a direct “just recompile the code and it works the same” nirvana. Way back around the redis 2.6 era (2013), Microsoft (in the guise of MSOpenTech) released a viable Windows-compatible fork, under the name Redis-64 (May 2013). This fork was kept up to date through redis 2.8 and some 3.0 releases, but the development was ultimately dropped some time in 2016, leaving redis 3.0 as the last MSOpenTech redis implementation. There was also a Redis-32 variant for x86 usage, although this was even more short-lived, staying at 2.6.</p>
<p>I’m all in favor of a wide variety of good quality tools and options. If you want to run a redis server as part of a Windows installation, you should be able to do that! This could be because you already have Windows servers and administrative experience, and want a single OS deployment; it could be because you don’t want the additional overheads/complications of virtualization/container technologies. It could be because you’re primarily doing development on a Windows machine, and it is convenient. Clearly, Redis-64 was an attractive option to many people who want to run redis natively on Windows; I know we used it (in addition to redis on linux) when I worked with Stack Overflow.</p>
<h2 id="running-outdated-software-is-a-risk">Running outdated software is a risk</h2>
<p>Ultimately, being stuck with a server that is based on 2015/2016 starts to present a few problems:</p>
<ol>
<li>you need to live with long-known and long-fixed bugs and other problems (including any well-known security vulnerabilities)</li>
<li>you don’t get to use up-to-date features and capabilities</li>
<li>you might start dropping off the support horizon of 3rd party libraries and tools</li>
</ol>
<p>This 3rd point is what happened with ASP.NET in .NET 6, but the other points also stand; the “modules” (redis 4.x) and “streams” (redis 5.x) features come to mind immediately - both have huge utility.</p>
<p>So: if you’re currently using Redis-64, how can we resolve this, without completely changing our infrastructure?</p>
<h2 id="shout-out-memurai">Shout-out: Memurai</h2>
<p>The simplest way out of this corner is, in my opinion: Memurai, by Janea Systems. So: what is Memurai? To put it simply: Memurai is a redis 5 compatible fork of redis that runs natively on Windows. That means you get a wide range of more recent redis fixes and features. Fortunately, it is <a href="https://www.memurai.com/blog/install-redis-windows-alternatives-such-as-memurai">a breeze to install</a>, with options for <a href="https://www.nuget.org/packages/MemuraiDeveloper/">nuget</a>, <a href="https://community.chocolatey.org/packages/memurai-developer/">choco/cinst</a>, <a href="https://winget.run/pkg/Memurai/MemuraiDeveloper">winget</a>, <a href="https://winstall.app/apps/Memurai.MemuraiDeveloper">winstall</a> and <a href="https://www.memurai.com/get-memurai">an installer</a>. This means that you can get started with a Memurai development installation immediately.</p>
<p>The obsolete Redis-64 nuget package also now carries a link to Memurai in the “Suggested Alternatives”, which is encouraging. To be transparent: I need to emphasize - Memurai is a commercial offering with a free developer edition. If we look at how Redis-64 ultimately stagnated, I view this as a strength: it means that someone has a vested interest in making sure that the product continues to evolve and be supported, now and into the future.</p>
<h2 id="working-with-memurai">Working with Memurai</h2>
<p>As previously noted: installation is quick and simple, but so is working with it. The command-line tools change in name only; instead of <code class="language-plaintext highlighter-rouge">redis-cli</code>, we have <code class="language-plaintext highlighter-rouge">memurai-cli</code>; instead of <code class="language-plaintext highlighter-rouge">redis-server</code> we have <code class="language-plaintext highlighter-rouge">memurai</code>. However, they work exactly as you expect and will be immediately familiar to anyone who has used redis. At the server level, Memurai surfaces the exact same protocol and API surface as a vanilla redis server, meaning any existing redis-compatible tools and clients should work without problem:</p>
<pre><code class="language-txt">c:\Code>memurai-cli
127.0.0.1:6379> get foo
(nil)
127.0.0.1:6379> set foo bar
OK
127.0.0.1:6379> get foo
"bar"
127.0.0.1:6379>
</code></pre>
<p>(note that <code class="language-plaintext highlighter-rouge">redis-cli</code> would have worked identically)</p>
<p>At the metadata level, you may notice that <code class="language-plaintext highlighter-rouge">info server</code> reports some additional entries:</p>
<pre><code class="language-txt">127.0.0.1:6379> info server
# Server
memurai_edition:Memurai Developer
memurai_version:2.0.5
redis_version:5.0.14
...
</code></pre>
<p>The <code class="language-plaintext highlighter-rouge">redis_version</code> entry is present so that client libraries and applications expecting this entry can understand the features available, so this is effectively the redis API compatibility level; the <code class="language-plaintext highlighter-rouge">memurai_version</code> and <code class="language-plaintext highlighter-rouge">memurai_edition</code> give specific Memurai information, if you need it - but other than those additions (and extra rows are expected here), everything works as you would expect. For example, we can use any pre-existing redis client to talk to the server:</p>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="nn">StackExchange.Redis</span><span class="p">;</span>
<span class="c1">// connect to local redis, default port</span>
<span class="k">using</span> <span class="nn">var</span> <span class="n">conn</span> <span class="p">=</span> <span class="k">await</span> <span class="n">ConnectionMultiplexer</span><span class="p">.</span><span class="nf">ConnectAsync</span><span class="p">(</span><span class="s">"127.0.0.1"</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">db</span> <span class="p">=</span> <span class="n">conn</span><span class="p">.</span><span class="nf">GetDatabase</span><span class="p">();</span>
<span class="c1">// reset and populate some data</span>
<span class="k">await</span> <span class="n">db</span><span class="p">.</span><span class="nf">KeyDeleteAsync</span><span class="p">(</span><span class="s">"mykey"</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">1</span><span class="p">;</span> <span class="n">i</span> <span class="p"><=</span> <span class="m">20</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="k">await</span> <span class="n">db</span><span class="p">.</span><span class="nf">StringIncrementAsync</span><span class="p">(</span><span class="s">"mykey"</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// fetch and display</span>
<span class="kt">var</span> <span class="n">sum</span> <span class="p">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="k">await</span> <span class="n">db</span><span class="p">.</span><span class="nf">StringGetAsync</span><span class="p">(</span><span class="s">"mykey"</span><span class="p">);</span>
<span class="n">Console</span><span class="p">.</span><span class="nf">WriteLine</span><span class="p">(</span><span class="n">sum</span><span class="p">);</span> <span class="c1">// writes: 210</span>
</code></pre></div></div>
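<p>As an aside, that advertised compatibility version is exactly what client libraries consume; for example (assuming StackExchange.Redis, as above), we can ask the connection what version the server reported when we connected:</p>

```csharp
using System;
using StackExchange.Redis;

using var conn = await ConnectionMultiplexer.ConnectAsync("127.0.0.1");
// IServer.Version reflects the version the server reported when connecting;
// against Memurai, that is the redis compatibility level (5.0.14 in the
// transcript above), not the Memurai product version
var server = conn.GetServer("127.0.0.1", 6379);
Console.WriteLine(server.Version);
```

<p>This is why the <code class="language-plaintext highlighter-rouge">redis_version</code> entry matters: clients use it to decide which commands and features they can safely issue.</p>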
<p>Configuring the server works exactly like it does for redis - the config file works the same, although the example template is named differently:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>c:\Code>where memurai
C:\Program Files\Memurai\memurai.exe
c:\Code>dir "C:\Program Files\Memurai\*.conf" /B
memurai.conf
</code></pre></div></div>
<h2 id="summary">Summary</h2>
<p>Putting this all together: if you’re currently choosing Redis-64 to run a redis server natively on Windows, then Memurai might make a very appealing option - certainly more appealing than remaining on the long-obsolete Redis-64. All of your existing redis knowledge continues to apply, but you get a wide range of features that were added to redis after Redis-64 was last maintained. Are there other ways of running redis on Windows? Absolutely. But for people in the Redis-64 zone, it looks like a good option.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-56575112190126147572021-05-03T09:53:00.002-07:002021-05-03T12:50:38.467-07:00Is the era of reflection-heavy C# libraries at an end?<section id="main_content" class="inner">
<p>I’m going to talk about reflection-heavy libraries; I will describe the scenario I’m talking about - as it is commonly used today, the status quo, giving a brief overview of the pros and cons of this, and then present the case that <em>times have changed</em>, and with new language and runtime features: it may be time to challenge our way of thinking about this kind of library.</p>
<hr>
<p>I’m a code-first kind of developer; I <em>love</em> the inner-loop experience of being able to tweak some C# types and immediately have everything work, and I <em>hate</em> having to mess in external DSLs or configuration files (protobuf/xml/json/yaml/etc). Over the last almost-two-decades, I’ve selected or written libraries that allow me to work that way. And, to be fair, this seems to be a pretty common way of working in .NET.</p>
<p>What this means in reality is that we tend to have libraries where <em>a lot of magic happens at runtime</em>, based either on the various <code class="language-plaintext highlighter-rouge"><T></code> for generic APIs, or via <code class="language-plaintext highlighter-rouge">GetType()</code> on objects that are passed in. Consider the following examples:</p>
<h3 id="jsonnet">Json.NET</h3>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// from https://www.newtonsoft.com/json</span>
<span class="n">Product</span> <span class="n">product</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Product</span><span class="p">();</span>
<span class="n">product</span><span class="p">.</span><span class="n">Name</span> <span class="p">=</span> <span class="s">"Apple"</span><span class="p">;</span>
<span class="n">product</span><span class="p">.</span><span class="n">Expiry</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">DateTime</span><span class="p">(</span><span class="m">2008</span><span class="p">,</span> <span class="m">12</span><span class="p">,</span> <span class="m">28</span><span class="p">);</span>
<span class="n">product</span><span class="p">.</span><span class="n">Sizes</span> <span class="p">=</span> <span class="k">new</span> <span class="kt">string</span><span class="p">[]</span> <span class="p">{</span> <span class="s">"Small"</span> <span class="p">};</span>
<span class="kt">string</span> <span class="n">json</span> <span class="p">=</span> <span class="n">JsonConvert</span><span class="p">.</span><span class="nf">SerializeObject</span><span class="p">(</span><span class="n">product</span><span class="p">);</span>
</code></pre></div></div>
<h3 id="dapper">Dapper</h3>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">producer</span> <span class="p">=</span> <span class="s">"Megacorp, Inc."</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">products</span> <span class="p">=</span> <span class="n">connection</span><span class="p">.</span><span class="n">Query</span><span class="p"><</span><span class="n">Product</span><span class="p">>(</span><span class="s">@"
select Id, Name, Expiry
from Products
where Producer = @producer"</span><span class="p">,</span>
<span class="k">new</span> <span class="p">{</span> <span class="n">producer</span> <span class="p">}).</span><span class="nf">AsList</span><span class="p">();</span>
</code></pre></div></div>
<p>I won’t try to give an exhaustive list, but there are a <em>myriad</em> of libraries - both by Microsoft and 3rd-party, for a <em>myriad</em> of purposes - that fundamentally fall into the camp of:</p>
<blockquote>
<p>At runtime, given some <code class="language-plaintext highlighter-rouge">Type</code>: check the local library type cache; if we haven’t seen that <code class="language-plaintext highlighter-rouge">Type</code> before: perform a <em>ton</em> of reflection code to understand the model, produce a strategy to <em>implement</em> the library features on that model, and then expose some simplified API that invokes that strategy.</p>
</blockquote>
<p>Behind the scenes, this might be “simple” naive reflection (<code class="language-plaintext highlighter-rouge">PropertyInfo.GetValue()</code>, etc), or it might use the <code class="language-plaintext highlighter-rouge">Expression</code> API or the ref-emit API (mainly: <code class="language-plaintext highlighter-rouge">ILGenerator</code>) to write runtime methods directly, or it might generate C# that it then runs through the compiler (<code class="language-plaintext highlighter-rouge">XmlSerializer</code> used to work this way, and may well still do so).</p>
<p>This provides a pretty reasonable experience for the consumer; their code <em>just works</em>, and - sure, the library does a lot of work behind the scenes, but the library authors usually invest a decent amount of time into trying to minimize that so you aren’t paying the reflection costs every time.</p>
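<p>To make that pattern concrete, here is a minimal illustrative sketch (not any particular library's internals) of the "check the cache, else build a strategy" approach, using the <code class="language-plaintext highlighter-rouge">Expression</code> API to compile a per-type strategy:</p>

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Reflection;

static class NameDumper
{
    // per-Type strategy cache; built once, reused for every subsequent call
    static readonly ConcurrentDictionary<Type, Func<object, string?>> _cache = new();

    public static string? GetName(object obj)
        => _cache.GetOrAdd(obj.GetType(), BuildStrategy)(obj);

    static Func<object, string?> BuildStrategy(Type type)
    {
        // the expensive one-off reflection work: find a "Name" property...
        PropertyInfo? prop = type.GetProperty("Name", typeof(string));
        if (prop is null) return static _ => null;

        // ...then compile a strategy equivalent to: obj => ((T)obj).Name
        var p = Expression.Parameter(typeof(object));
        var body = Expression.Property(Expression.Convert(p, type), prop);
        return Expression.Lambda<Func<object, string?>>(body, p).Compile();
    }
}
```

<p>The first call for a given <code class="language-plaintext highlighter-rouge">Type</code> pays the reflection and compilation cost; every later call just invokes the cached compiled delegate.</p>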
<h2 id="so-what-is-the-problem">So what is the problem?</h2>
<p>For many cases: this is fine - we’ve certainly lived well-enough with it for the last however-many years; but: times change. In particular, a few things have become increasingly significant in the last few years:</p>
<ul>
<li>
<h3 id="asyncawait"><code class="language-plaintext highlighter-rouge">async</code>/<code class="language-plaintext highlighter-rouge">await</code></h3>
<p>Increasing demands of highly performant massively parallel code (think: “web servers”, for example) have made <code class="language-plaintext highlighter-rouge">async</code>/<code class="language-plaintext highlighter-rouge">await</code> hugely important; from the consumer perspective, it is easy to think that this is <em>mostly</em> a “sprinkle in some <code class="language-plaintext highlighter-rouge">async</code>/<code class="language-plaintext highlighter-rouge">await</code> keywords” (OK, I’m glossing over a <em>lot</em> of nuance here), but <em>behind the scenes</em>, the compiler is doing a lot - like <strong>a real lot</strong> - of work for us. If you’re in the <code class="language-plaintext highlighter-rouge">Expression</code> or <code class="language-plaintext highlighter-rouge">ILGenerator</code> mind-set, switching <em>fully</em> to <code class="language-plaintext highlighter-rouge">async</code>/<code class="language-plaintext highlighter-rouge">await</code> is virtually impossible - it is just <em>too much</em>. At best, you can end up with some <code class="language-plaintext highlighter-rouge">async</code> shell library code that calls into some generated <code class="language-plaintext highlighter-rouge">Func<...></code>/<code class="language-plaintext highlighter-rouge">Action<...></code> code, but that assumes that the context-switch points (i.e. the places where you’d want to <code class="language-plaintext highlighter-rouge">await</code> etc) can be <em>conveniently mapped to that split</em>. It isn’t a given that a reflection-heavy library <em>can even be</em> carved up in this way.</p>
</li>
<li>
<h3 id="aot-platforms">AOT platforms</h3>
<p>At the other end of the spectrum, we have AOT devices - think “Xamarin”, “Unity”, etc. Running on a small device can mean that you have reduced computational power, so you start noticing the time it takes to inspect models at runtime - but they <em>also</em> often have deliberately restricted runtimes that prevent runtime code generation. This means that you <em>can probably</em> get away with the naive reflection approach (which is relatively slow), but you won’t be able to emit optimized code via <code class="language-plaintext highlighter-rouge">ILGenerator</code>; the <code class="language-plaintext highlighter-rouge">Expression</code> approach is a nice compromise here, in that it will optimize <em>when it can</em>, but use naive reflection when that isn’t possible - but you still end up paying the performance cost.</p>
</li>
<li>
<h3 id="linkers">Linkers</h3>
<p>Another feature of AOT device scenarios is that they often involve trimmed deployments via a pruning linker, but <a href="https://docs.microsoft.com/en-us/dotnet/core/deploying/single-file">“Single file deployment and executable”</a> deployments are now a “thing” for regular .NET 5 / .NET 6+. This brings a number of problems:</p>
<ol>
<li>we need to work very hard to convince the linker <em>not</em> to remove things that our library is going to need to use at runtime, despite the fact that they aren’t used if you scan the assembly in isolation</li>
<li>our reflection-heavy library often needs to consider <em>all the possible problematic edge scenarios that could exist, ever</em>, which means it might <em>appear</em> to touch a lot more things than it does - when in reality, <em>for the majority of runs</em>, it is just going to be asking “do I need to consider this? oh, nope, that’s fine” - the linker, though, only sees that the library appears to touch it</li>
<li>we thus find ourselves fighting the linker’s tendency to <em>remove everything we need</em> while simultaneously <em>retaining everything that doesn’t apply to our scenario</em></li>
</ol>
</li>
<li>
<h3 id="cold-start">Cold start</h3>
<p>It is easy to think of applications as having a relatively long duration, so: cold-start performance doesn’t matter. Now consider things like “Azure functions”, or other environments where our code is invoked for a very brief time, as-needed (often on massively shared infrastructure); in <em>this</em> scenario, cold-start performance translates directly (almost linearly) to throughput, and thus real money.</p>
</li>
<li>
<h3 id="runtime-error-discovery">Runtime error discovery</h3>
<p>One of the problems with having the library do all the model analysis at runtime is that you don’t get feedback on your code <em>until you run it</em>; and sure, you can (and should) write unit/integration tests that push your model through the library in every way you can think of, but: things get missed. This means that code that compiled cleanly still blows up at runtime, for reasons that <em>should be knowable</em> - an “obvious” attribute misconfiguration, for example.</p>
</li>
<li>
<h3 id="magic-code">Magic code</h3>
<p>Magic is bad. By which I mean: if I said to you “there’s going to be some code running in your application, that doesn’t exist anywhere - it can’t be seen on GitHub, or in your source-code, or in the IDE, or in the assembly IL, or <em>anywhere</em>, and by the way it probably uses lots of really gnarly unusual IL, but trust me it is totally legit” - you might get a little worried; but that is <em>exactly what all of these libraries do</em>. I’m not being hyperbolic here; I’ve personally received bug-reports from the JIT god (<a href="https://github.com/AndyAyersMS">AndyAyersMS</a>) because my generated IL used <em>ever so slightly</em> the wrong pointer type in one place, which <em>worked fine almost always</em>, except when it didn’t and exploded the runtime.</p>
</li>
</ul>
<h2 id="there-is-a-different-way-we-can-do-all-of-this">There is a different way we can do all of this</h2>
<p>Everything above is a side-effect of <em>the tools that have been available to us</em> - when the only tool you’ve had for years has been a hammer, you get used to thinking in terms of nails. For “code first”, that really meant “reflection”, which meant “runtime”. Reflection-based library authors aren’t ignorant of these problems, and for a long time now have been talking to the framework and language teams about options. As the above problem scenarios have become increasingly important, we’ve recently been graced with new features in Roslyn (the C# / VB compiler engine), i.e. “generators”. So: what are generators?</p>
<p>Imagine you could take your reflection-based analysis code, and inject it <em>way</em> earlier - in the build pipe, so when your <em>library consumer</em> is building <em>their</em> code (whether they’re using Visual Studio, or <code class="language-plaintext highlighter-rouge">dotnet build</code> or whatever else), you get given the compiler’s view of the code (the types, the methods, etc), and <em>at that point</em> you have the chance to <em>add your own code</em> (note: purely additive - you aren’t allowed to <em>change</em> the existing code), and have your additional code included in the build. That: would be a generator. This solves <em>most</em> of the problems we’ve discussed:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">async</code>: our <em>generated</em> code can use <code class="language-plaintext highlighter-rouge">async</code>/<code class="language-plaintext highlighter-rouge">await</code>, and we can <em>just let the regular compiler worry about what that means</em> - we don’t need to get our hands dirty</li>
<li>AOT: all of the <em>actual code needed at runtime</em> exists <em>in the assemblies we ship</em> - nothing needs to be generated at runtime</li>
<li>linkers: the required code is <em>now much more obvious</em>, because: <em>it exists in the assembly</em>; conversely, because we can consider all the problematic edge scenarios <em>during build</em>, the workarounds needed for those niche scenarios <em>don’t get included</em> when they’re not needed, and nor do their dependency chains</li>
<li>cold start: we now don’t need to do <em>any</em> model inspection or generation at runtime: it is <em>already done</em> during build</li>
<li>error discovery: our generator doubles as a Roslyn analyzer; it can emit warnings and errors <em>during build</em> if it finds something suspicious in our model</li>
<li>magic code: the consumer can <em>see the generated code in the IDE</em>, or the final IL in the assembly</li>
</ul>
<p>If you’re thinking “this sounds great!”, you’d be right. It is a huge step towards addressing the problems described above.</p>
<h2 id="what-does-a-generator-look-like-for-a-consumer">What does a generator look like for a consumer?</h2>
<p>From the “I’m an application developer, just make things work for me” perspective, using a generator <em>firstly</em> means adding a build-time package; for example, to add <a href="https://github.com/DapperLib/DapperAOT/">DapperAOT</a> (which is purely experimental at this point, don’t get too excited), we would add (to our <code class="language-plaintext highlighter-rouge">csproj</code>):</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><ItemGroup></span>
<span class="nt"><PackageReference</span> <span class="na">Include=</span><span class="s">"Dapper.AOT"</span>
<span class="na">Version=</span><span class="s">"0.0.8"</span> <span class="na">PrivateAssets=</span><span class="s">"all"</span>
<span class="na">IncludeAssets=</span><span class="s">"runtime;build;native;contentfiles;analyzers;buildtransitive"</span> <span class="nt">/></span>
<span class="nt"></ItemGroup></span>
</code></pre></div></div>
<p>This package isn’t part of what gets shipped in our application - it just gets hooked into the build pipe. Then we need to follow the library’s instructions on what is needed! In many cases, I would expect the library to self-discover scenarios where it needs to get involved, but as with any library, there might be special methods we need to call, or attributes we need to add, to make the magic happen. For example, with DapperAOT I’m thinking of having the consumer declare their intent via <code class="language-plaintext highlighter-rouge">partial</code> methods in a <code class="language-plaintext highlighter-rouge">partial</code> type:</p>
<div class="language-c# highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="nf">Command</span><span class="p">(</span><span class="s">@"select * from Customers where Id=@id and Region=@region"</span><span class="p">)]</span>
<span class="p">[</span><span class="nf">SingleRow</span><span class="p">(</span><span class="n">SingleRowKind</span><span class="p">.</span><span class="n">FirstOrDefault</span><span class="p">)]</span> <span class="c1">// entirely optional; this is</span>
<span class="c1">// to influence what happens when zero/multiple rows returned</span>
<span class="k">public</span> <span class="k">static</span> <span class="k">partial</span> <span class="n">Customer</span> <span class="nf">GetCustomer</span><span class="p">(</span>
<span class="n">DbConnection</span> <span class="n">connection</span><span class="p">,</span> <span class="kt">int</span> <span class="n">id</span><span class="p">,</span> <span class="kt">string</span> <span class="n">region</span><span class="p">);</span>
</code></pre></div></div>
<p>If you haven’t seen this <code class="language-plaintext highlighter-rouge">partial</code> usage before, this is an “extended partial method” in C# 9, which basically means <code class="language-plaintext highlighter-rouge">partial</code> methods can now be accessible, have return values, <code class="language-plaintext highlighter-rouge">out</code> parameters, etc - the caveat is that <em>somewhere</em> the compiler expects to find another matching half of the <code class="language-plaintext highlighter-rouge">partial</code> method that provides an implementation. Our generator can detect the above dangling <code class="language-plaintext highlighter-rouge">partial</code> method, and add the implementation <em>in the generated code</em>. This generated code is then available in the IDE, either by stepping into the code as usual, or in the solution explorer:</p>
<p><img alt="Showing the solution explorer, expanding: (the project), Dependencies, Analyzers, Dapper.AOT, Dapper.CoreAnalysis.CommandGenerator, Dapper.generated.cs" src="https://mgravell.github.io/Blog/Generators/SolutionExplorer.png" width="600"></p>
<p>and as a code file:</p>
<p><img alt="The generated code file Dapper.generated.cs, declaring the GetCustomer method" src="https://mgravell.github.io/Blog/Generators/CodeFile.png" width="800"></p>
<p>Other libraries may choose other approaches, perhaps using module initializers to register some specific type handlers into a lightweight library, that handle expected known types (as discovered during build); or it could detect API calls that don’t resolve, and <em>add them</em> (either via <code class="language-plaintext highlighter-rouge">partial</code> types, or extension methods) - like a custom <code class="language-plaintext highlighter-rouge">dynamic</code> type, but where the convention-based APIs are very real, but generated automatically during build. But the theme remains: from the consumer perspective, everything <em>just works</em>, and is now <em>more</em> discoverable.</p>
<h2 id="what-does-a-generator-look-like-for-a-library-author">What does a generator look like for a library author?</h2>
<p>Things are a little more complicated for the library author; the Roslyn semantic tree is <em>similar</em> to the kind of model you get at runtime - but it isn’t <em>the same</em> model; in particular, you’re not working with <code class="language-plaintext highlighter-rouge">Type</code> any more, you’re working with <code class="language-plaintext highlighter-rouge">ITypeSymbol</code> or (perhaps more commonly) <code class="language-plaintext highlighter-rouge">INamedTypeSymbol</code>. That’s because the type system that you’re <em>inspecting</em> is not the same as the type system <em>that you’re running on</em> - it could be for an entirely different framework, for example. But if you’re already used to complex reflection analysis, most things are pretty obvious. It <em>isn’t very hard, honest</em>. Mostly, this involves:</p>
<ol>
<li>implementing <code class="language-plaintext highlighter-rouge">ISourceGenerator</code> (and marking that type with <code class="language-plaintext highlighter-rouge">[Generator]</code>)</li>
<li>implementing <code class="language-plaintext highlighter-rouge">ISyntaxReceiver</code> to capture candidate nodes you might want to look at later</li>
<li>implementing <code class="language-plaintext highlighter-rouge">ISourceGenerator.Initialize</code> to <em>register</em> your <code class="language-plaintext highlighter-rouge">ISyntaxReceiver</code></li>
<li>implementing <code class="language-plaintext highlighter-rouge">ISourceGenerator.Execute</code> to perform whatever logic you need against the nodes you captured</li>
<li>calling <code class="language-plaintext highlighter-rouge">context.AddSource</code> some number of times to add whatever file(s) you need</li>
</ol>
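<p>As a rough sketch - type and file names here are purely illustrative, and real generators do considerably more semantic work - those five steps end up shaped something like this with the <code class="language-plaintext highlighter-rouge">ISourceGenerator</code> API:</p>

```csharp
using System.Collections.Generic;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

[Generator] // step 1: ISourceGenerator, marked [Generator]
public sealed class MyGenerator : ISourceGenerator
{
    // step 3: register our syntax receiver
    public void Initialize(GeneratorInitializationContext context)
        => context.RegisterForSyntaxNotifications(static () => new PartialMethodReceiver());

    // step 4: process the captured candidates
    public void Execute(GeneratorExecutionContext context)
    {
        if (context.SyntaxReceiver is not PartialMethodReceiver receiver) return;
        var sb = new StringBuilder("// <generated/>\n");
        foreach (var method in receiver.Candidates)
        {
            // a real generator would use context.Compilation.GetSemanticModel(...)
            // here to resolve each candidate to symbols before emitting code
            sb.Append("// candidate: ").Append(method.Identifier.Text).Append('\n');
        }
        // step 5: add our generated file(s)
        context.AddSource("MyLibrary.generated.cs",
            SourceText.From(sb.ToString(), Encoding.UTF8));
    }

    // step 2: cheap syntax-only filter, invoked per node during parse
    private sealed class PartialMethodReceiver : ISyntaxReceiver
    {
        public List<MethodDeclarationSyntax> Candidates { get; } = new();
        public void OnVisitSyntaxNode(SyntaxNode node)
        {
            if (node is MethodDeclarationSyntax m
                && m.Modifiers.Any(SyntaxKind.PartialKeyword))
            {
                Candidates.Add(m);
            }
        }
    }
}
```

<p>The receiver keeps the per-node work cheap (pure syntax checks); the expensive semantic analysis happens once, in <code class="language-plaintext highlighter-rouge">Execute</code>.</p>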
<p>I’m <em>not</em> going to give a full lesson on “how to write a generator” - I’m mostly trying to set the scene for <em>why</em> you might want to consider this, but there is a <a href="https://github.com/dotnet/roslyn/blob/main/docs/features/source-generators.cookbook.md">Source Generators Cookbook</a> that covers a lot, or I humbly submit that the <a href="https://github.com/DapperLib/DapperAOT/">DapperAOT code</a> might be interesting (I am <em>not</em> suggesting that it does everything the best way, but: it kinda works, and shows input-source-file-based unit testing etc).</p>
<h2 id="this-all-sounds-too-good-to-be-true-what-is-the-catch">This all sounds too good to be true? What is the catch?</h2>
<p>Nothing is free. There’s a few gotchas here.</p>
<ol>
<li>It is a lot of re-work; if you have an existing non-trivial library, this represents a <em>lot</em> of effort</li>
<li>You may also need to re-work your main library, perhaps splitting the “reflection aware” code and “making things work” code into two separate pieces, with the generator-based approach only needing the latter half</li>
<li>Some scenarios may be hard to detect reliably during code analysis - where your code is seven layers down in generic types and methods, for example, it may be hard to discover all of the <em>original</em> types that are passed into your library; and if it is just <code class="language-plaintext highlighter-rouge">object</code>: even harder; we may need to consider this when designing APIs, or provide fallback mechanisms to educate the generator (for example, <code class="language-plaintext highlighter-rouge">[model:GenerateJsonSerializerFor(typeof(Customer))]</code>)</li>
<li>There are some things we can’t do in C# that we <em>can</em> do in IL; change <code class="language-plaintext highlighter-rouge">readonly</code> fields, call <code class="language-plaintext highlighter-rouge">init</code>/<code class="language-plaintext highlighter-rouge">get</code>-only properties, bypass accessibility, etc; in some cases, we might be able to generate <code class="language-plaintext highlighter-rouge">internal</code> constructors in another <code class="language-plaintext highlighter-rouge">partial class</code> (for example), that allows us to sneak past those boundaries, but in some other cases (where the type being used isn’t part of the current compilation, because it comes from a package reference) it might simply be that <em>we can’t offer the exact same features</em> (or need to use a fallback reflection scenario)</li>
<li>It is <strike>C# specific</strike> (edit: C# and VB, my mistake!); this is a <em>huuuuuge</em> “but”, and I can hear the F#, <strike>VB</strike>, etc developers gnashing their teeth already; there’s a very nuanced conversation here about whether the advantages I’ve covered outweigh the disadvantages of not being able to offer the same features on all .NET platforms</li>
<li>It needs up-to-date build tools, which may limit adoption (note: this does <em>not</em> mean we can only use generators when building against .NET 6 etc)</li>
<li>We have less flexibility to configure things at runtime; in practice, this isn’t usually a problem as long as we <em>can actually configure it</em>, which can be done at build-time using attributes (and by using <code class="language-plaintext highlighter-rouge">[Conditional(...)]</code> on our configuration attributes, we don’t even need to include them in the final assembly - they can be used by the generator and then discarded by the compiler)</li>
</ol>
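<p>To illustrate point 4 with a concrete (hypothetical) shape: <code class="language-plaintext highlighter-rouge">init</code> accessors can be invoked from any constructor of the declaring type, so when the type <em>is</em> part of the current compilation, a generator can restore that capability by emitting an extra constructor into the other half of a <code class="language-plaintext highlighter-rouge">partial</code> type:</p>

```csharp
// the consumer-authored half: init-only members, as normal
public partial class Customer
{
    public int Id { get; init; }
    public string Name { get; init; } = "";
}

// a (hypothetical) generated half: init accessors are callable from any
// constructor of the type, so this sidesteps the restriction - but only
// when the type is part of the current compilation, not a package reference
public partial class Customer
{
    internal Customer(int id, string name)
    {
        Id = id;
        Name = name;
    }
}
```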
<p>That said, there’s also some great upsides - during build we have access to information that <em>doesn’t exist in the reflection model</em>, for example the <em>name</em> parts of value-tuples (which are exposed <em>outwards</em> via attributes, but not <em>inwards</em>; libraries are inwards, from this perspective), and more reliable nullability annotation data when calling generic APIs with nullability.</p>
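<p>As a quick illustration of that value-tuple point: the names only exist as compiler-emitted attribute metadata on the API surface - the runtime type itself carries no trace of them, so runtime reflection against the <code class="language-plaintext highlighter-rouge">Type</code> alone can never recover them:</p>

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

static class TupleNamesDemo
{
    // the names "Id" and "Name" are recorded by the compiler in a
    // [TupleElementNames] attribute on the return value...
    public static (int Id, string Name) GetCustomer() => (1, "Fred");

    static void Main()
    {
        var attr = typeof(TupleNamesDemo)
            .GetMethod(nameof(GetCustomer))!
            .ReturnParameter
            .GetCustomAttribute<TupleElementNamesAttribute>();
        Console.WriteLine(string.Join(",", attr!.TransformNames));

        // ...but the type itself is just ValueTuple<int, string>; the names are gone
        Console.WriteLine(
            typeof((int Id, string Name)) == typeof(ValueTuple<int, string>)); // True
    }
}
```

<p>A generator sees the syntax and semantic model, so those names are right there during build.</p>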
<h2 id="summary">Summary</h2>
<p>I genuinely think we should be embracing generators and reducing or removing completely our reliance on runtime reflection emit code. I say this as someone who has built a pretty successful niche as an expert in those areas, and would have to start again with the new tools - I see the benefits, despite the work and wrinkles. Not only that, I think there is an opportunity here (with things like “extended partial methods” etc) to make our application code <em>even more expressive</em>, rather than having to worry about dancing around library implementation details.</p>
<p>But I welcome competing thoughts!</p>
</section>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-69845659700265570932020-05-18T02:44:00.003-07:002020-05-18T02:44:35.905-07:00Multi-path cancellation; a tale of two codependent async enumerators<p>Disclaimer: I'll be honest: many of the concepts in this post are a bit more advanced - some viewer caution is advised! It touches on concurrent linked async enumerators that share a termination condition by combining multiple <code>CancellationToken</code>.</p>
<hr>
<p>Something that I've been looking at recently - in the context of gRPC (and <a href="https://www.nuget.org/packages/protobuf-net.Grpc/" rel="nofollow">protobuf-net.Grpc</a> in particular) - is the complex story of duplex data pipes. A full-duplex connection is a connection between two nodes, but instead of being request-response, either node can send messages at any time. There's still a notional "client" and "server", but that is purely a feature of which node was sat listening for connection attempts vs which node reached out and <em>established</em> a connection. Shaping a duplex API is much more complex than shaping a request-response API, and frankly: a lot of the details around timing are <em>hard</em>.</p>
<p>So: I had the idea that maybe we can reshape everything at the library level, and offer the consumer something more familiar. It makes an interesting (to me, at least) worked example of cancellation in practice. So; let's start with an imaginary <em>transport</em> API (the thing that is happening underneath) - let's say that we have:</p>
<ul>
<li>a client establishes a connection (we're not going to worry about how)</li>
<li>there is a <code>SendAsync</code> method that sends a message from the client to the server</li>
<li>there is a <code>TryReceiveAsync</code> method that attempts to await a message <em>from</em> the server (this will report <code>true</code> if a message could be fetched, and <code>false</code> if the server has indicated that it won't ever be sending any more)</li>
<li>additionally, the server controls data flow termination; if the server indicates that it has sent the last message, the client should not send any more</li>
</ul>
<p>something like (where <code>TRequest</code> is the data-type being sent from the client to the server, and <code>TResponse</code> is the data-type expected from the server to the client):</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">interface</span> <span class="pl-en">ITransport</span><<span class="pl-en">TRequest</span>, <span class="pl-en">TResponse</span>> : <span class="pl-en">IAsyncDisposable</span>
{
<span class="pl-en">ValueTask</span> <span class="pl-en">SendAsync</span>(<span class="pl-en">TRequest</span> <span class="pl-smi">request</span>,
<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span>);
<span class="pl-en">ValueTask</span><(<span class="pl-k">bool</span> <span class="pl-en">Success</span>, <span class="pl-en">TResponse</span> <span class="pl-en">Message</span>)> <span class="pl-en">TryReceiveAsync</span>(
<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span>);
}</pre></div>
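<p>To make that shape concrete, here is a toy in-memory implementation (illustrative only - a real transport would sit over a gRPC duplex stream or similar), using <code>System.Threading.Channels</code>:</p>

```csharp
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// a toy in-memory transport, purely to make the shape concrete
sealed class InMemoryTransport<TRequest, TResponse> : ITransport<TRequest, TResponse>
{
    private readonly Channel<TRequest> _outbound = Channel.CreateUnbounded<TRequest>();
    private readonly ChannelReader<TResponse> _inbound;
    public InMemoryTransport(ChannelReader<TResponse> inbound) => _inbound = inbound;

    public ValueTask SendAsync(TRequest request, CancellationToken cancellationToken)
        => _outbound.Writer.WriteAsync(request, cancellationToken);

    public async ValueTask<(bool Success, TResponse Message)> TryReceiveAsync(
        CancellationToken cancellationToken)
    {
        // WaitToReadAsync returns false once the channel is completed,
        // i.e. the server has said "no more messages, ever"
        while (await _inbound.WaitToReadAsync(cancellationToken))
        {
            if (_inbound.TryRead(out var message)) return (true, message);
        }
        return (false, default!);
    }

    public ValueTask DisposeAsync()
    {
        _outbound.Writer.TryComplete();
        return default;
    }
}
```

<p>Here, completing the inbound channel is the server's "last message" signal, which is what makes <code>TryReceiveAsync</code> report <code>false</code>.</p>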
<p>This API doesn't <em>look</em> all that complicated - it <em>looks</em> like (if we ignore connection etc for the moment) we can just create a couple of loops, and expose the data via enumerators - presumably starting the <code>SendAsync</code> via <code>Task.Run</code> or similar so it is on a parallel flow:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">ITransport</span><<span class="pl-en">TRequest</span>, <span class="pl-en">TResponse</span>> <span class="pl-smi">transport</span>;
<span class="pl-k">public</span> <span class="pl-k">async</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">TResponse</span>> <span class="pl-en">ReceiveAsync</span>(
[<span class="pl-en">EnumeratorCancellation</span>] <span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span>)
{
<span class="pl-k">while</span> (<span class="pl-c1">true</span>)
{
<span class="pl-k">var</span> (<span class="pl-en">success</span>, <span class="pl-en">message</span>) <span class="pl-k">=</span>
<span class="pl-k">await</span> <span class="pl-smi">transport</span>.<span class="pl-en">TryReceiveAsync</span>(<span class="pl-smi">cancellationToken</span>);
<span class="pl-k">if</span> (<span class="pl-k">!</span><span class="pl-smi">success</span>) <span class="pl-k">break</span>;
<span class="pl-k">yield</span> <span class="pl-k">return</span> <span class="pl-smi">message</span>;
}
}
<span class="pl-k">public</span> <span class="pl-k">async</span> <span class="pl-en">ValueTask</span> <span class="pl-en">SendAsync</span>(
<span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">TRequest</span>> <span class="pl-smi">data</span>,
<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span>)
{
<span class="pl-k">await</span> <span class="pl-en">foreach</span> (<span class="pl-smi">var</span> <span class="pl-smi">message</span> <span class="pl-k">in</span> <span class="pl-smi">data</span>
.<span class="pl-en">WithCancellation</span>(<span class="pl-smi">cancellationToken</span>))
{
<span class="pl-k">await</span> <span class="pl-smi">transport</span>.<span class="pl-en">SendAsync</span>(<span class="pl-smi">message</span>, <span class="pl-smi">cancellationToken</span>);
}
}</pre></div>
<p>and it <em>looks</em> like we're all set for cancellation - we can pass in an external cancellation-token to both methods, and we're set. Right?</p>
<p>Well, it is a bit more complex than that, and the above doesn't take into consideration that these two flows are <em>codependent</em>. In particular, a big concern is that we don't want to leave the producer (the thing pumping <code>SendAsync</code>) still running in any scenario where the connection is doomed. There are actually many more cancellation paths than we might think:</p>
<ol>
<li>we might have supplied an external cancellation-token to both methods, and this token may have triggered</li>
<li>the <em>consumer</em> of <code>ReceiveAsync</code> (the thing iterating it) might have supplied a cancellation-token to <code>GetAsyncEnumerator</code> (via <code>WithCancellation</code>), and this token may have been triggered (we looked at this <a href="https://blog.marcgravell.com/2020/05/the-anatomy-of-async-iterators-aka.html" rel="nofollow">last time</a>)</li>
<li>we could have faulted in our send/receive code</li>
<li>the consumer of <code>ReceiveAsync</code> may have decided <em>not to take all the data</em> - that might be because of some async simile of <code>Enumerable.Take()</code>, or it could be because <em>they</em> faulted when processing a message they had received</li>
<li>the producer in <code>SendAsync</code> may have faulted</li>
</ol>
<p>All of these scenarios <em>essentially</em> signify termination of the connection, so we need to be able to encompass all of these scenarios in some way that allows us to communicate the problem between the send and receive path. In a word, we want our own <code>CancellationTokenSource</code>.</p>
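<p>Before we build that in, it is worth seeing how linked token sources behave in isolation; in particular, the link is <em>one-way</em>: cancelling the linked source does not cancel the tokens it was created from. A minimal standalone sketch (names here are illustrative, not from the post):</p>

```csharp
using System;
using System.Threading;

class LinkedTokenSketch
{
    static void Main()
    {
        using var external = new CancellationTokenSource();
        // the linked source fires if the external token fires,
        // OR if we cancel it ourselves - covering our extra exit paths
        using var allDone = CancellationTokenSource.CreateLinkedTokenSource(external.Token);

        allDone.Cancel(); // one of our own termination paths
        Console.WriteLine(allDone.IsCancellationRequested);  // True
        Console.WriteLine(external.IsCancellationRequested); // False: linking is one-way
    }
}
```

<p>That one-way behaviour is exactly what we want here: our internal shutdown shouldn't ripple back into tokens that the caller owns.</p>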
<p>There's a lot going on here; more than we can reasonably expect consumers to do <em>each and every time they use the API</em>, so this is a perfect scenario for a library method. Let's imagine that we want to encompass all this complexity in a simple single library API that the consumer can access - something like:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">TResponse</span>> <span class="pl-en">Duplex</span>(
<span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">TRequest</span>> <span class="pl-smi">request</span>,
<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>);</pre></div>
<p>This:</p>
<ul>
<li>allows them to pass <em>in</em> a producer</li>
<li>optionally allows them to pass in an external cancellation-token</li>
<li>makes an async feed of responses available to them</li>
</ul>
<p>Their usage might be something like:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">await</span> <span class="pl-en">foreach</span> (<span class="pl-en">MyResponse</span> <span class="pl-smi">item</span> <span class="pl-en">in</span> <span class="pl-smi">client</span>.Duplex(ProducerAsync()))
{
<span class="pl-smi">Console</span>.<span class="pl-en">WriteLine</span>(<span class="pl-smi">item</span>);
}</pre></div>
<p>where their <code>ProducerAsync()</code> method is (just "because"):</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">async</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">MyRequest</span>> <span class="pl-en">ProducerAsync</span>(
[<span class="pl-en">EnumeratorCancellation</span>] <span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
{
<span class="pl-k">for</span> (<span class="pl-k">int</span> <span class="pl-smi">i</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>; <span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-c1">100</span>; <span class="pl-smi">i</span><span class="pl-k">++</span>)
{
<span class="pl-k">yield</span> <span class="pl-k">return</span> <span class="pl-k">new</span> <span class="pl-en">MyRequest</span>(<span class="pl-smi">i</span>);
<span class="pl-k">await</span> <span class="pl-smi">Task</span>.<span class="pl-en">Delay</span>(<span class="pl-c1">100</span>, <span class="pl-smi">cancellationToken</span>);
}
}</pre></div>
<p>As I discussed in <a href="https://blog.marcgravell.com/2020/05/the-anatomy-of-async-iterators-aka.html" rel="nofollow">The anatomy of async iterators (aka await, foreach, yield)</a>, our call to <code>ProducerAsync()</code> <em>doesn't actually do much yet</em> - this just hands back a place-holder that <em>can</em> be enumerated later, and it is the act of enumerating it that actually invokes the code. Very important point, that.</p>
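<p>This deferred execution is easy to verify with a minimal sketch (the names here are illustrative, not from the post); nothing in the iterator body runs until enumeration actually begins:</p>

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class DeferredDemo
{
    static bool started;

    static async IAsyncEnumerable<int> ProducerAsync()
    {
        started = true; // only runs once enumeration begins
        for (int i = 0; i < 3; i++)
        {
            yield return i;
            await Task.Yield();
        }
    }

    static async Task Main()
    {
        var sequence = ProducerAsync(); // nothing has executed yet
        Console.WriteLine(started);     // False
        await foreach (var item in sequence) { }
        Console.WriteLine(started);     // True
    }
}
```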
<p>So; what can our <code>Duplex</code> code do? It <em>already</em> needs to think about at least 2 different kinds of cancellation:</p>
<ul>
<li>the external token that was passed into <code>cancellationToken</code></li>
<li>the potentially different token that <em>could</em> be passed into <code>GetAsyncEnumerator()</code> <em>when it is consumed</em></li>
</ul>
<p>but we know from our thoughts earlier that we <em>also</em> have a bunch of other ways of cancelling. We can do something clever here. Recall how the <em>compiler</em> usually combines the above two tokens for us? Well, if we do that <em>ourselves</em>, then instead of getting just a <code>CancellationToken</code>, we find ourselves with a <code>CancellationTokenSource</code>, which gives us lots of control:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">TResponse</span>> <span class="pl-en">Duplex</span>(
<span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">TRequest</span>> <span class="pl-smi">request</span>,
<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
<span class="pl-k">=></span> <span class="pl-en">DuplexImpl</span>(<span class="pl-smi">transport</span>, <span class="pl-smi">request</span>, <span class="pl-smi">cancellationToken</span>);
<span class="pl-k">private</span> <span class="pl-k">async</span> <span class="pl-k">static</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">TResponse</span>> <span class="pl-en">DuplexImpl</span>(
<span class="pl-en">ITransport</span><<span class="pl-en">TRequest</span>, <span class="pl-en">TResponse</span>> <span class="pl-smi">transport</span>,
<span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">TRequest</span>> <span class="pl-smi">request</span>,
<span class="pl-en">CancellationToken</span> <span class="pl-smi">externalToken</span>,
[<span class="pl-en">EnumeratorCancellation</span>] <span class="pl-en">CancellationToken</span> <span class="pl-smi">enumeratorToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
{
<span class="pl-smi">using</span> <span class="pl-k">var</span> <span class="pl-smi">allDone</span> <span class="pl-k">=</span> <span class="pl-smi">CancellationTokenSource</span>.<span class="pl-en">CreateLinkedTokenSource</span>(
<span class="pl-smi">externalToken</span>, <span class="pl-smi">enumeratorToken</span>);
<span class="pl-c"><span class="pl-c">//</span> ... todo</span>
}</pre></div>
<p>Our <code>DuplexImpl</code> method here allows the enumerator cancellation to be provided, but (importantly) kept <em>separate</em> from the original external token; this means that it won't yet be combined, and we can do that ourselves using <code>CancellationTokenSource.CreateLinkedTokenSource</code> - much like the compiler would
have done for us, but: now we have a <code>CancellationTokenSource</code> that we can cancel <em>when we choose</em>. This means that we can use <code>allDone.Token</code> in all the places we want to ask "are we done yet?", and we're considering everything.</p>
<p>For starters, let's handle the scenario where the consumer <em>doesn't take all the data</em> (out of choice, or because of a fault). We want to trigger <code>allDone</code> <em>however</em> we exit <code>DuplexImpl</code>. Fortunately, the way that iterator blocks are implemented makes this simple (and we're already using it here, via <code>using</code>): recall (from the previous blog post) that <code>foreach</code> and <code>await foreach</code> both (usually) include a <code>using</code> block that invokes <code>Dispose</code>/<code>DisposeAsync</code> on the enumerator instance? Well: anything we put in a <code>finally</code> <em>essentially</em> relocates to that <code>Dispose</code>/<code>DisposeAsync</code>. The upshot of this is that triggering the cancellation token when the consumer is <em>done with us</em> is trivial:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">using</span> <span class="pl-en">var</span> <span class="pl-en">allDone</span> <span class="pl-k">=</span> <span class="pl-en">CancellationTokenSource</span>.<span class="pl-en">CreateLinkedTokenSource</span>(
<span class="pl-en">externalToken</span>, <span class="pl-en">enumeratorToken</span>);
<span class="pl-k">try</span>
{
<span class="pl-c"><span class="pl-c">//</span> ... todo</span>
}
<span class="pl-k">finally</span>
{ <span class="pl-c"><span class="pl-c">//</span> cancel allDone however we exit</span>
<span class="pl-smi">allDone</span>.<span class="pl-en">Cancel</span>();
}</pre></div>
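<p>The way iterator <code>finally</code> blocks relocate into <code>Dispose</code> is easy to see with a tiny synchronous sketch (illustrative names, not from the post): even if the consumer abandons the loop early, the <code>finally</code> still runs, because <code>foreach</code> disposes the enumerator:</p>

```csharp
using System;
using System.Collections.Generic;

class FinallyDemo
{
    static IEnumerable<int> Numbers()
    {
        try
        {
            yield return 1;
            yield return 2;
        }
        finally
        {   // runs from Dispose() when the consumer is done with us
            Console.WriteLine("finally ran");
        }
    }

    static void Main()
    {
        foreach (var n in Numbers())
        {
            Console.WriteLine(n);
            break; // abandon early; Dispose() still triggers the finally
        }
    }
}
```

<p>The same mechanics apply to <code>await foreach</code> and <code>DisposeAsync</code>, which is what makes the pattern above reliable.</p>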
<p>The next step is to get our <em>producer</em> working - that's our <code>SendAsync</code> code. Because this is duplex, it doesn't have any bearing on the incoming messages, so we'll start that as a completely separate code-path via <code>Task.Run</code>, but we can make it such that <em>if the producer or send faults</em>, it stops the entire show; so if we look just at our <code>// ... todo</code> code, we can add:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">var</span> <span class="pl-smi">send</span> <span class="pl-k">=</span> <span class="pl-smi">Task</span>.<span class="pl-en">Run</span>(<span class="pl-k">async</span> () <span class="pl-k">=></span>
{
<span class="pl-k">try</span>
{
<span class="pl-k">await</span> <span class="pl-en">foreach</span> (<span class="pl-smi">var</span> <span class="pl-smi">message</span> <span class="pl-k">in</span>
<span class="pl-smi">request</span>.<span class="pl-en">WithCancellation</span>(<span class="pl-smi">allDone</span>.<span class="pl-smi">Token</span>))
{
<span class="pl-k">await</span> <span class="pl-smi">transport</span>.<span class="pl-en">SendAsync</span>(<span class="pl-smi">message</span>, <span class="pl-smi">allDone</span>.<span class="pl-smi">Token</span>);
}
}
<span class="pl-smi">catch</span>
{ <span class="pl-c"><span class="pl-c">//</span> trigger cancellation if send faults</span>
<span class="pl-smi">allDone</span>.<span class="pl-en">Cancel</span>();
<span class="pl-k">throw</span>;
}
}, <span class="pl-smi">allDone</span>.<span class="pl-smi">Token</span>);
<span class="pl-c"><span class="pl-c">//</span> ... todo: receive</span>
<span class="pl-k">await</span> <span class="pl-smi">send</span>; <span class="pl-c"><span class="pl-c">//</span> observe send outcome</span></pre></div>
<p>This starts a parallel operation that <em>consumes the data from our producer</em>, but notice that we're using <code>allDone.Token</code> to pass our combined cancellation knowledge <em>to the producer</em>. This is very subtle, because it represents a cancellation state that <em>didn't even conceptually exist</em> at the time <code>ProducerAsync()</code> was originally invoked. The fact that <code>GetAsyncEnumerator</code> is <em>deferred</em> has allowed us to give it something <em>much more useful</em>, and as long as <code>ProducerAsync()</code> uses the cancellation-token appropriately, it can now be fully aware of the life-cycle of the composite duplex operation.</p>
<p>This just leaves our receive code, which is more or less like it was originally, but again: using <code>allDone.Token</code>:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">while</span> (<span class="pl-c1">true</span>)
{
<span class="pl-k">var</span> (<span class="pl-en">success</span>, <span class="pl-en">message</span>) <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">transport</span>.<span class="pl-en">TryReceiveAsync</span>(<span class="pl-smi">allDone</span>.<span class="pl-smi">Token</span>);
<span class="pl-k">if</span> (<span class="pl-k">!</span><span class="pl-smi">success</span>) <span class="pl-k">break</span>;
<span class="pl-k">yield</span> <span class="pl-k">return</span> <span class="pl-smi">message</span>;
}
<span class="pl-c"><span class="pl-c">//</span> the server's last message stops everything</span>
<span class="pl-smi">allDone</span>.<span class="pl-en">Cancel</span>();</pre></div>
<p>Putting all this together gives us a non-trivial library function:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">private</span> <span class="pl-k">async</span> <span class="pl-k">static</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">TResponse</span>> <span class="pl-en">DuplexImpl</span>(
<span class="pl-en">ITransport</span><<span class="pl-en">TRequest</span>, <span class="pl-en">TResponse</span>> <span class="pl-smi">transport</span>,
<span class="pl-en">IAsyncEnumerable</span><<span class="pl-en">TRequest</span>> <span class="pl-smi">request</span>,
<span class="pl-en">CancellationToken</span> <span class="pl-smi">externalToken</span>,
[<span class="pl-en">EnumeratorCancellation</span>] <span class="pl-en">CancellationToken</span> <span class="pl-smi">enumeratorToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
{
<span class="pl-smi">using</span> <span class="pl-k">var</span> <span class="pl-smi">allDone</span> <span class="pl-k">=</span> <span class="pl-smi">CancellationTokenSource</span>.<span class="pl-en">CreateLinkedTokenSource</span>(
<span class="pl-smi">externalToken</span>, <span class="pl-smi">enumeratorToken</span>);
<span class="pl-k">try</span>
{
<span class="pl-k">var</span> <span class="pl-smi">send</span> <span class="pl-k">=</span> <span class="pl-smi">Task</span>.<span class="pl-en">Run</span>(<span class="pl-k">async</span> () <span class="pl-k">=></span>
{
<span class="pl-k">try</span>
{
<span class="pl-k">await</span> <span class="pl-en">foreach</span> (<span class="pl-smi">var</span> <span class="pl-smi">message</span> <span class="pl-k">in</span>
<span class="pl-smi">request</span>.<span class="pl-en">WithCancellation</span>(<span class="pl-smi">allDone</span>.<span class="pl-smi">Token</span>))
{
<span class="pl-k">await</span> <span class="pl-smi">transport</span>.<span class="pl-en">SendAsync</span>(<span class="pl-smi">message</span>, <span class="pl-smi">allDone</span>.<span class="pl-smi">Token</span>);
}
}
<span class="pl-smi">catch</span>
{ <span class="pl-c"><span class="pl-c">//</span> trigger cancellation if send faults</span>
<span class="pl-smi">allDone</span>.<span class="pl-en">Cancel</span>();
<span class="pl-k">throw</span>;
}
}, <span class="pl-smi">allDone</span>.<span class="pl-smi">Token</span>);
<span class="pl-k">while</span> (<span class="pl-c1">true</span>)
{
<span class="pl-k">var</span> (<span class="pl-en">success</span>, <span class="pl-en">message</span>) <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">transport</span>.<span class="pl-en">TryReceiveAsync</span>(<span class="pl-smi">allDone</span>.<span class="pl-smi">Token</span>);
<span class="pl-k">if</span> (<span class="pl-k">!</span><span class="pl-smi">success</span>) <span class="pl-k">break</span>;
<span class="pl-k">yield</span> <span class="pl-k">return</span> <span class="pl-smi">message</span>;
}
<span class="pl-c"><span class="pl-c">//</span> the server's last message stops everything</span>
<span class="pl-smi">allDone</span>.<span class="pl-en">Cancel</span>();
<span class="pl-k">await</span> <span class="pl-smi">send</span>; <span class="pl-c"><span class="pl-c">//</span> observe send outcome</span>
}
<span class="pl-k">finally</span>
{ <span class="pl-c"><span class="pl-c">//</span> cancel allDone however we exit</span>
<span class="pl-smi">allDone</span>.<span class="pl-en">Cancel</span>();
}
}</pre></div>
<p>The key points here being:</p>
<ul>
<li>both the external token and the enumerator token contribute to <code>allDone</code></li>
<li>the transport-level send and receive code uses <code>allDone.Token</code></li>
<li>the producer enumeration uses <code>allDone.Token</code></li>
<li>however we exit our enumerator, <code>allDone</code> is cancelled
<ul>
<li>if transport-receive faults, <code>allDone</code> is cancelled</li>
<li>if the consumer terminates early, <code>allDone</code> is cancelled</li>
</ul>
</li>
<li>when we receive the last message from the server, <code>allDone</code> is cancelled</li>
<li>if the producer or transport-send faults, <code>allDone</code> is cancelled</li>
</ul>
<p>The one thing it <em>doesn't</em> support well is people using <code>GetAsyncEnumerator()</code> directly and <em>not</em> disposing it. That comes under the heading of "using the API incorrectly", and is self-inflicted.</p>
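<p>For completeness: if you <em>do</em> need to call <code>GetAsyncEnumerator()</code> directly, an <code>await using</code> gives the same disposal guarantee that <code>await foreach</code> provides. A sketch against an illustrative sequence (not the duplex API above):</p>

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ManualEnumeration
{
    static async IAsyncEnumerable<int> Items()
    {
        yield return 1;
        await Task.Yield();
        yield return 2;
    }

    static async Task Main()
    {
        // manual equivalent of await foreach; the await using is what
        // guarantees DisposeAsync (and hence any finally blocks) runs
        await using (var iter = Items().GetAsyncEnumerator())
        {
            while (await iter.MoveNextAsync())
            {
                Console.WriteLine(iter.Current);
            }
        }
    }
}
```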
<p>A side note on <code>ConfigureAwait(false)</code>; by default <code>await</code> includes a check on <code>SynchronizationContext.Current</code>; in addition to meaning an extra context-switch, in the case of UI applications this may mean running code on the UI thread that <em>does not need</em> to run on the UI thread. Library code usually does not <em>require</em> this (it isn't as though we're updating form controls here, so we don't need thread-affinity). As such, in library code, it is common to use <code>.ConfigureAwait(false)</code> <em>basically everywhere that you see an <code>await</code></em> - which bypasses this mechanism. I have not included that in the code above, for readability, but: you should <em>imagine</em> it being there :) By contrast, in <em>application</em> code, you should usually default to just using <code>await</code> without <code>ConfigureAwait</code>, unless you know you're writing something that doesn't need sync-context.</p>
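<p>As a sketch of what that looks like in practice (an illustrative helper, not part of the duplex code), a library-style method would apply <code>ConfigureAwait(false)</code> at every <code>await</code>:</p>

```csharp
using System;
using System.Threading.Tasks;

class ConfigureAwaitDemo
{
    // library-style helper: ConfigureAwait(false) tells each await that it
    // does not need to resume on the captured SynchronizationContext
    static async Task<int> SumAsync(int count)
    {
        int total = 0;
        for (int i = 1; i <= count; i++)
        {
            await Task.Delay(1).ConfigureAwait(false); // stand-in for real async work
            total += i;
        }
        return total;
    }

    static async Task Main()
        => Console.WriteLine(await SumAsync(4)); // 10
}
```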
<p>I hope this has been a useful delve into some of the more complex things you can do with cancellation-tokens, and how you can combine them to represent codependent exit conditions.</p>
<h1>The anatomy of async iterators (aka await, foreach, yield)</h1>
<p><em>Marc Gravell, 2020-05-14</em></p>
<p>Here I'm going to discuss the mechanisms and concepts relating to async iterators in C# - with the hope of both demystifying them a bit, and also showing how we can use some of the more advanced (but slightly hidden) features. I'm going to give some illustrations of what happens under the hood, but note: these are <em>illustrations</em>, not the literal generated expansion - this is deliberate, to help show what is <em>conceptually</em> happening, so if I ignore some subtle implementation detail: that's not accidental. As always, if you want to see the <em>actual</em> code, tools like <a href="https://sharplab.io/" rel="nofollow">https://sharplab.io/</a> are awesome (just change the "Results" view to "C#" and paste the code you're interested in onto the left).</p>
<h2>Iterators in the sync world</h2>
<p>Before we discuss async iterators, let's start by recapping iterators. Many folks may already be familiar with all of this, but hey: it helps to set the scene. More importantly, it is useful to allow us to compare and contrast later when we look at how <code>async</code> changes things. So: we know that we can write a foreach loop (over a sequence) of the form:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">foreach</span> (<span class="pl-k">var</span> <span class="pl-smi">item</span> <span class="pl-k">in</span> <span class="pl-en">SomeSource</span>(<span class="pl-c1">42</span>))
{
<span class="pl-smi">Console</span>.<span class="pl-en">WriteLine</span>(<span class="pl-smi">item</span>);
}</pre></div>
<p>and for each item that <code>SomeSource</code> returns, we'll get a line in the console. <code>SomeSource</code> <em>could</em> be returning a fully buffered set of data (like a <code>List<string></code>):</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">IEnumerable</span><<span class="pl-k">string</span>> <span class="pl-en">SomeSource</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
{
<span class="pl-k">var</span> <span class="pl-smi">list</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">List</span><<span class="pl-k">string</span>>();
<span class="pl-k">for</span> (<span class="pl-k">int</span> <span class="pl-smi">i</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>; <span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-c1">5</span>; <span class="pl-smi">i</span><span class="pl-k">++</span>)
<span class="pl-smi">list</span>.<span class="pl-en">Add</span>(<span class="pl-s"><span class="pl-pds">$"</span>result from SomeSource, x={<span class="pl-smi">x</span>}, result {<span class="pl-smi">i</span>}<span class="pl-pds">"</span></span>);
<span class="pl-k">return</span> <span class="pl-smi">list</span>;
}</pre></div>
<p>but a problem here is that this requires <code>SomeSource</code> to run <em>to completion</em> before we get even the first result, which could take a lot of time and memory - and is just generally restrictive. Often, when we're trying to represent a <em>sequence</em>, it may be unbounded, or at least: open-ended - for example, we could be pulling data from a remote work queue, where a: we only want to be holding one pending item at a time, and b: it may not <em>have</em> a logical "end". It turns out that C#'s definition of a "sequence" (for the purposes of <code>foreach</code>) is fine with this. Instead of <em>returning</em> a list, we can write an <em>iterator block</em>:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">IEnumerable</span><<span class="pl-k">string</span>> <span class="pl-en">SomeSource</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
{
<span class="pl-k">for</span> (<span class="pl-k">int</span> <span class="pl-smi">i</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>; <span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-c1">5</span>; <span class="pl-smi">i</span><span class="pl-k">++</span>)
<span class="pl-k">yield</span> <span class="pl-k">return</span> <span class="pl-s"><span class="pl-pds">$"</span>result from SomeSource, x={<span class="pl-smi">x</span>}, result {<span class="pl-smi">i</span>}<span class="pl-pds">"</span></span>;
}</pre></div>
<p>This works <em>similarly</em>, but there are some fundamental differences - most noticeably: we don't ever have a buffer - we just make one element available at a time. To understand how this can work, it is useful to take another look at our <code>foreach</code>; the compiler interprets <code>foreach</code> as something <em>like</em> the following:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">using</span> (<span class="pl-en">var</span> <span class="pl-en">iter</span> <span class="pl-k">=</span> <span class="pl-en">SomeSource</span>(42).<span class="pl-en">GetEnumerator</span>())
{
<span class="pl-en">while</span> (<span class="pl-en">iter</span>.<span class="pl-en">MoveNext</span>())
{
<span class="pl-en">var</span> <span class="pl-en">item</span> <span class="pl-k">=</span> <span class="pl-en">iter</span>.<span class="pl-en">Current</span>;
<span class="pl-smi">Console</span>.<span class="pl-en">WriteLine</span>(<span class="pl-smi">item</span>);
}
}</pre></div>
<p>We have to be a <em>little</em> loose in our phrasing here, because <code>foreach</code> isn't actually tied to <code>IEnumerable<T></code> - it is duck-typed against an API shape instead; the <code>using</code> may or may not be there, for example. But fundamentally, the compiler calls <code>GetEnumerator()</code> on the expression passed to <code>foreach</code>, then creates a <code>while</code> loop checking <code>MoveNext()</code> (which defines "is there more data?" and advances the mechanism in the success case), then accesses the <code>Current</code> property (which exposes the element we advanced to). As an aside, historically (prior to C# 5) the compiler used to scope <code>item</code> <em>outside</em> of the <code>while</code> loop, which might sound innocent, but it was the source of <strong>absolutely no end</strong> of confusion, code errors, and questions on Stack Overflow (think "captured variables").</p>
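<p>The "captured variables" issue is worth a quick sketch of the <em>fixed</em> behaviour (illustrative code, not from the post): with modern per-iteration scoping, each closure captures its own copy of the loop variable:</p>

```csharp
using System;
using System.Collections.Generic;

class CaptureDemo
{
    static void Main()
    {
        var actions = new List<Action>();
        foreach (var item in new[] { 1, 2, 3 })
        {
            actions.Add(() => Console.WriteLine(item)); // captures 'item'
        }
        // C# 5+: each iteration has its own 'item', so this prints 1 2 3;
        // with the old outside-the-loop scoping, every closure shared one
        // variable, and this would have printed 3 3 3
        foreach (var a in actions) a();
    }
}
```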
<p>So; hopefully you can see in the above how the <em>consumer</em> can access an unbounded forwards-only sequence via this <code>MoveNext()</code> / <code>Current</code> approach; but how does that get <em>implemented</em>? Iterator blocks (anything involving the <code>yield</code> keyword) are actually <em>incredibly</em> complex, so I'm going to take a lot of liberties here, but what is going on is <em>similar</em> to:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">IEnumerable</span><<span class="pl-k">string</span>> <span class="pl-en">SomeSource</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
<span class="pl-k">=></span> <span class="pl-k">new</span> <span class="pl-en">GeneratedEnumerable</span>(<span class="pl-smi">x</span>);
<span class="pl-k">class</span> <span class="pl-en">GeneratedEnumerable</span> : <span class="pl-en">IEnumerable</span><<span class="pl-k">string</span>>
{
<span class="pl-k">private</span> <span class="pl-k">int</span> <span class="pl-smi">x</span>;
<span class="pl-k">public</span> <span class="pl-en">GeneratedEnumerable</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
<span class="pl-k">=></span> <span class="pl-k">this</span>.<span class="pl-smi">x</span> <span class="pl-k">=</span> <span class="pl-smi">x</span>;
<span class="pl-k">public</span> <span class="pl-en">IEnumerator</span><<span class="pl-k">string</span>> <span class="pl-en">GetEnumerator</span>()
<span class="pl-k">=></span> <span class="pl-k">new</span> <span class="pl-en">GeneratedEnumerator</span>(<span class="pl-smi">x</span>);
<span class="pl-c"><span class="pl-c">//</span> non-generic fallback</span>
IEnumerator IEnumerable.<span class="pl-en">GetEnumerator</span>()
<span class="pl-k">=></span> <span class="pl-en">GetEnumerator</span>();
}
<span class="pl-k">class</span> <span class="pl-en">GeneratedEnumerator</span> : <span class="pl-en">IEnumerator</span><<span class="pl-k">string</span>>
{
<span class="pl-k">private</span> <span class="pl-k">int</span> <span class="pl-smi">x</span>, <span class="pl-smi">i</span>;
<span class="pl-k">public</span> <span class="pl-en">GeneratedEnumerator</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
<span class="pl-k">=></span> <span class="pl-k">this</span>.<span class="pl-smi">x</span> <span class="pl-k">=</span> <span class="pl-smi">x</span>;
<span class="pl-k">public</span> <span class="pl-k">string</span> <span class="pl-smi">Current</span> { <span class="pl-k">get</span>; <span class="pl-k">private</span> <span class="pl-k">set</span>; }
<span class="pl-c"><span class="pl-c">//</span> non-generic fallback</span>
object IEnumerator.Current => Current;
<span class="pl-c"><span class="pl-c">//</span> if we had "finally" code, it would go here</span>
<span class="pl-k">public</span> <span class="pl-k">void</span> <span class="pl-en">Dispose</span>() { }
<span class="pl-c"><span class="pl-c">//</span> our "advance" logic</span>
<span class="pl-k">public</span> <span class="pl-k">bool</span> <span class="pl-en">MoveNext</span>()
{
<span class="pl-k">if</span> (<span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-c1">5</span>)
{
<span class="pl-smi">Current</span> <span class="pl-k">=</span> <span class="pl-s"><span class="pl-pds">$"</span>result from SomeSource, x={<span class="pl-smi">x</span>}, result {<span class="pl-smi">i</span>}<span class="pl-pds">"</span></span>;
<span class="pl-smi">i</span><span class="pl-k">++</span>;
<span class="pl-k">return</span> <span class="pl-c1">true</span>;
}
<span class="pl-k">else</span>
{
<span class="pl-k">return</span> <span class="pl-c1">false</span>;
}
}
<span class="pl-c"><span class="pl-c">//</span> this API is essentially deprecated and never used</span>
void IEnumerator.<span class="pl-en">Reset</span>() <span class="pl-k">=></span> <span class="pl-k">throw</span> <span class="pl-k">new</span> <span class="pl-en">NotSupportedException</span>();
}</pre></div>
<p>Let's tear this apart:</p>
<ul>
<li>firstly, we need <em>some object</em> to represent <code>IEnumerable<T></code>, but we also need to understand that <code>IEnumerable<T></code> and <code>IEnumerator<T></code> (as returned from <code>GetEnumerator()</code>) are <em>different</em> APIs; in the <em>generated</em> version there is a lot of overlap and they can share an instance, but to help discuss it, I've kept the two concepts separate.</li>
<li>when we call <code>SomeSource</code>, we create our <code>GeneratedEnumerable</code> which stores the state (<code>x</code>) that was passed to <code>SomeSource</code>, and exposes the required <code>IEnumerable<T></code> API</li>
<li>later (and it could be <em>much</em> later), when the caller iterates (<code>foreach</code>) the data, <code>GetEnumerator()</code> is invoked, which calls into our <code>GeneratedEnumerator</code> to act as the cursor over the data</li>
<li>our <code>MoveNext()</code> logic implements the same <code>for</code> loop <em>conceptually</em>, but one step per call to <code>MoveNext()</code>; if there is more data, <code>Current</code> is assigned with the thing we would have passed to <code>yield return</code></li>
<li>note that there is also a <code>yield break</code> C# keyword, which terminates iteration; this would essentially be <code>return false</code> in the generated expansion</li>
<li>note that there are some nuanced differences in my hand-written version that the C# compiler needs to deal with; for example, what happens if I change <code>x</code> in my enumerator code (<code>MoveNext()</code>), and then <em>later</em> iterate the data a <em>second time</em> - what is the value of <code>x</code>? emphasis: I don't care about this nuance for this discussion!</li>
</ul>
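<p>The <code>yield break</code> point above can be sketched with a small illustrative example (the names here are mine, not from any generated code):</p>

```csharp
using System.Collections.Generic;

static class Sketch
{
    // iterator block: stop early when a sentinel value is seen
    public static IEnumerable<string> TakeUntilStop(string[] values)
    {
        foreach (var value in values)
        {
            if (value == "STOP") yield break; // terminate iteration early
            yield return value;
        }
    }

    // conceptually, in the hand-written expansion, "yield break" just becomes:
    //   public bool MoveNext()
    //   {
    //       ...
    //       return false; // no more data; Current is not updated
    //   }
}
```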
<p>Hopefully this gives enough of a flavor to understand <code>foreach</code> and iterators (<code>yield</code>) - now let's get onto the more interesting bit: <code>async</code>.</p>
<h2>Why do we need async iterators?</h2>
<p>The above works great in a synchronous world, but a <em>lot</em> of .NET work is now favoring <code>async</code>/<code>await</code>, in particular to improve server scalability. The big problem in the above code is the <code>bool MoveNext()</code>. This is <em>explicitly synchronous</em>. If the thing it is doing takes some time, we'll be blocking a thread, and blocking a thread is increasingly anathema to us. In the context of our earlier "remote work queue" example, there might not be anything there for seconds, minutes, hours. We really don't want to block threads for that kind of time! The closest we can do without async iterators is to fetch the data asynchronously, but buffered - for example:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">async</span> <span class="pl-en">Task</span><<span class="pl-en">List</span><<span class="pl-k">string</span>>> <span class="pl-en">SomeSource</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>) {...}</pre></div>
<p>But this is <em>not the same semantics</em> - and is getting back into buffering. Assuming we don't want to fetch everything in one go, to get around this we'd eventually end up implementing some kind of "async batch loop" monstrosity that effectively re-implements <code>foreach</code> using manual ugly code, negating the reasons that <code>foreach</code> even exists. To address this, C# and the BCL have recently added support for async iterators, yay! The new APIs (which are available down to net461 and netstandard20 <a href="https://www.nuget.org/packages/Microsoft.Bcl.AsyncInterfaces" rel="nofollow">via NuGet</a>) are:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-k">interface</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-k">out</span> <span class="pl-en">T</span>>
{
<span class="pl-en">IAsyncEnumerator</span><<span class="pl-en">T</span>> <span class="pl-en">GetAsyncEnumerator</span>(<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>);
}
<span class="pl-k">public</span> <span class="pl-k">interface</span> <span class="pl-en">IAsyncEnumerator</span><<span class="pl-k">out</span> <span class="pl-en">T</span>> : <span class="pl-en">IAsyncDisposable</span>
{
<span class="pl-en">T</span> <span class="pl-smi">Current</span> { <span class="pl-k">get</span>; }
<span class="pl-en">ValueTask</span><<span class="pl-k">bool</span>> <span class="pl-en">MoveNextAsync</span>();
}
<span class="pl-k">public</span> <span class="pl-k">interface</span> <span class="pl-en">IAsyncDisposable</span>
{
<span class="pl-en">ValueTask</span> <span class="pl-en">DisposeAsync</span>();
}</pre></div>
<p>Let's look at our example again, this time: with added async; we'll look at the <em>consumer</em> first (the code doing the <code>foreach</code>), so for now, let's imagine that we have:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">IAsyncEnumerable</span><<span class="pl-k">string</span>> <span class="pl-en">SomeSourceAsync</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
<span class="pl-k">=></span> <span class="pl-k">throw</span> <span class="pl-k">new</span> <span class="pl-en">NotImplementedException</span>();</pre></div>
<p>and focus on the loop; C# now has the <code>await foreach</code> concept, so we can do:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">await</span> <span class="pl-en">foreach</span> (<span class="pl-en">var</span> <span class="pl-smi">item</span> <span class="pl-en">in</span> <span class="pl-smi">SomeSourceAsync</span>(42))
{
<span class="pl-smi">Console</span>.<span class="pl-en">WriteLine</span>(<span class="pl-smi">item</span>);
}</pre></div>
<p>and the compiler interprets this as something similar to:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">await</span> <span class="pl-en">using</span> (<span class="pl-en">var</span> <span class="pl-smi">iter</span> <span class="pl-k">=</span> <span class="pl-en">SomeSourceAsync</span>(<span class="pl-c1">42</span>).<span class="pl-en">GetAsyncEnumerator</span>())
{
<span class="pl-k">while</span> (<span class="pl-k">await</span> <span class="pl-smi">iter</span>.<span class="pl-en">MoveNextAsync</span>())
{
<span class="pl-k">var</span> <span class="pl-smi">item</span> <span class="pl-k">=</span> <span class="pl-smi">iter</span>.<span class="pl-smi">Current</span>;
<span class="pl-smi">Console</span>.<span class="pl-en">WriteLine</span>(<span class="pl-smi">item</span>);
}
}</pre></div>
<p>(note that <code>await using</code> is similar to <code>using</code>, but <code>DisposeAsync()</code> is called and awaited, instead of <code>Dispose()</code> - even cleanup code can be asynchronous!)</p>
<p>The key point here is that this is actually pretty similar to our sync version, just with added <code>await</code>. Ultimately, however, the moment we add <code>await</code> the entire body is ripped apart by the compiler and rewritten as an asynchronous state machine. That isn't the topic of this article, so I'm not even going to <em>try</em> and cover how <code>await</code> is implemented behind the scenes. For today "a miracle happens" will suffice for that. The observant might also be wondering "wait, but what about cancellation?" - don't worry, we'll get there!</p>
<p>So what about our enumerator? Along with <code>await foreach</code>, we can <em>also</em> now write async iterators with <code>yield</code>; for example, we could do:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">async</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-k">string</span>> <span class="pl-en">SomeSourceAsync</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
{
<span class="pl-k">for</span> (<span class="pl-k">int</span> <span class="pl-smi">i</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>; <span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-c1">5</span>; <span class="pl-smi">i</span><span class="pl-k">++</span>)
{
<span class="pl-k">await</span> <span class="pl-smi">Task</span>.<span class="pl-en">Delay</span>(<span class="pl-c1">100</span>); <span class="pl-c"><span class="pl-c">//</span> simulate async something</span>
<span class="pl-k">yield</span> <span class="pl-k">return</span> <span class="pl-s"><span class="pl-pds">$"</span>result from SomeSource, x={<span class="pl-smi">x</span>}, result {<span class="pl-smi">i</span>}<span class="pl-pds">"</span></span>;
}
}</pre></div>
<p>In real code, we could now be consuming data from a remote source asynchronously, and we have a <em>very</em> effective mechanism for expressing open sequences of <em>asynchronous</em> data. In particular, remember that the <code>await iter.MoveNextAsync()</code> <strong>might complete synchronously</strong>, so if data <em>is</em> available immediately, there is no context switch. We can imagine, for example, an iterator block that requests data from a remote server <em>in pages</em>, and <code>yield return</code> each record of the data in the current page (making it available immediately), only doing an <code>await</code> when it needs to fetch the next page.</p>
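<p>That paged idea can be sketched as follows - note that <code>FetchPageAsync</code> is a hypothetical helper standing in for a real remote call, not an actual API:</p>

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

class PagedSource
{
    // hypothetical: fetches one page of records from a remote server
    private Task<List<string>> FetchPageAsync(int pageIndex)
        => Task.FromResult(new List<string> { $"page {pageIndex}, record 0" });

    public async IAsyncEnumerable<string> AllRecordsAsync(int pageCount)
    {
        for (int page = 0; page < pageCount; page++)
        {
            // we only await when a new page is actually needed...
            var records = await FetchPageAsync(page);
            foreach (var record in records)
            {
                // ...each record in the current page is yielded without any
                // async machinery, so MoveNextAsync() completes synchronously here
                yield return record;
            }
        }
    }
}
```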
<p>Behind the scenes, the compiler generates types to implement the <code>IAsyncEnumerable<T></code> and <code>IAsyncEnumerator<T></code> pieces, but this time they are <em>even more obtuse</em>, owing to the <code>async</code>/<code>await</code> restructuring. I <em>do not</em> intend to try and cover those here - it is my hope instead that we wave a hand and say "you know that expansion we wrote by hand earlier? like that, but with more async". However, there is a very important topic that we <em>have</em> overlooked, and that we should cover: cancellation.</p>
<h2>But what about cancellation?</h2>
<p>Most async APIs support cancellation via a <code>CancellationToken</code>, and this is no exception; look back up to <code>IAsyncEnumerable<T></code> and you'll see that it can be passed into the <code>GetAsyncEnumerator()</code> method. But if we're not writing the loop by hand, how do we do this? This is achieved via <code>WithCancellation</code>, similarly to how <code>ConfigureAwait</code> can be used to configure <code>await</code> - and indeed, there's even a <code>ConfigureAwait</code> we can use too! For example, we could do (showing both config options in action here):</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">await</span> <span class="pl-en">foreach</span> (<span class="pl-en">var</span> <span class="pl-smi">item</span> <span class="pl-en">in</span> <span class="pl-smi">SomeSourceAsync</span>(42)
.WithCancellation(cancellationToken).ConfigureAwait(false))
{
<span class="pl-smi">Console</span>.<span class="pl-en">WriteLine</span>(<span class="pl-smi">item</span>);
}</pre></div>
<p>which would be semantically equivalent to:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">var</span> <span class="pl-smi">iter</span> <span class="pl-k">=</span> <span class="pl-en">SomeSourceAsync</span>(<span class="pl-c1">42</span>).<span class="pl-en">GetAsyncEnumerator</span>(<span class="pl-smi">cancellationToken</span>);
<span class="pl-en">await</span> <span class="pl-en">using</span> (iter.ConfigureAwait(false))
{
<span class="pl-k">while</span> (<span class="pl-k">await</span> <span class="pl-smi">iter</span>.<span class="pl-en">MoveNextAsync</span>().<span class="pl-en">ConfigureAwait</span>(<span class="pl-c1">false</span>))
{
<span class="pl-k">var</span> <span class="pl-smi">item</span> <span class="pl-k">=</span> <span class="pl-smi">iter</span>.<span class="pl-smi">Current</span>;
<span class="pl-smi">Console</span>.<span class="pl-en">WriteLine</span>(<span class="pl-smi">item</span>);
}
}</pre></div>
<p>(I've had to split the <code>iter</code> local out to illustrate that the <code>ConfigureAwait</code> applies to the <code>DisposeAsync()</code> too - via <code>await iter.DisposeAsync().ConfigureAwait(false)</code> in a <code>finally</code>)</p>
<p>So; now we can pass a <code>CancellationToken</code> <em>into</em> our iterator... but - how can we use it? That's where things get <em>even more</em> fun! The <em>naive</em> way to do this would be to think along the lines of "I can't take a <code>CancellationToken</code> until <code>GetAsyncEnumerator</code> is called, so... perhaps I can create a type to hold the state until I get to that point, and create an iterator block on the <code>GetAsyncEnumerator</code> method" - something like:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-c"><span class="pl-c">//</span> this is unnecessary; do not copy this!</span>
<span class="pl-en">IAsyncEnumerable</span><<span class="pl-k">string</span>> <span class="pl-en">SomeSourceAsync</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
<span class="pl-k">=></span> <span class="pl-k">new</span> <span class="pl-en">SomeSourceEnumerable</span>(<span class="pl-smi">x</span>);
<span class="pl-k">class</span> <span class="pl-en">SomeSourceEnumerable</span> : <span class="pl-en">IAsyncEnumerable</span><<span class="pl-k">string</span>>
{
<span class="pl-k">private</span> <span class="pl-k">int</span> <span class="pl-smi">x</span>;
<span class="pl-k">public</span> <span class="pl-en">SomeSourceEnumerable</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
<span class="pl-k">=></span> <span class="pl-k">this</span>.<span class="pl-smi">x</span> <span class="pl-k">=</span> <span class="pl-smi">x</span>;
<span class="pl-k">public</span> <span class="pl-k">async</span> <span class="pl-en">IAsyncEnumerator</span><<span class="pl-k">string</span>> <span class="pl-en">GetAsyncEnumerator</span>(
<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
{
<span class="pl-k">for</span> (<span class="pl-k">int</span> <span class="pl-smi">i</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>; <span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-c1">5</span>; <span class="pl-smi">i</span><span class="pl-k">++</span>)
{
<span class="pl-k">await</span> <span class="pl-smi">Task</span>.<span class="pl-en">Delay</span>(<span class="pl-c1">100</span>, <span class="pl-smi">cancellationToken</span>); <span class="pl-c"><span class="pl-c">//</span> simulate async something</span>
<span class="pl-k">yield</span> <span class="pl-k">return</span> <span class="pl-s"><span class="pl-pds">$"</span>result from SomeSource, x={<span class="pl-smi">x</span>}, result {<span class="pl-smi">i</span>}<span class="pl-pds">"</span></span>;
}
}
}</pre></div>
<p>The above <em>works</em>. If a <code>CancellationToken</code> is passed in via <code>WithCancellation</code>, our iterator will be cancelled at the correct time - including during the <code>Task.Delay</code>; we could also check <code>IsCancellationRequested</code> or call <code>ThrowIfCancellationRequested()</code> at any point in our iterator block, and all the right things would happen. But; we're making life hard for ourselves - the compiler can <em>do this for us</em>, via <code>[EnumeratorCancellation]</code>. We could <em>also</em> just have:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">async</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-k">string</span>> <span class="pl-en">SomeSourceAsync</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>,
[<span class="pl-en">EnumeratorCancellation</span>] <span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
{
<span class="pl-k">for</span> (<span class="pl-k">int</span> <span class="pl-smi">i</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>; <span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-c1">5</span>; <span class="pl-smi">i</span><span class="pl-k">++</span>)
{
<span class="pl-k">await</span> <span class="pl-smi">Task</span>.<span class="pl-en">Delay</span>(<span class="pl-c1">100</span>, <span class="pl-smi">cancellationToken</span>); <span class="pl-c"><span class="pl-c">//</span> simulate async something</span>
<span class="pl-k">yield</span> <span class="pl-k">return</span> <span class="pl-s"><span class="pl-pds">$"</span>result from SomeSource, x={<span class="pl-smi">x</span>}, result {<span class="pl-smi">i</span>}<span class="pl-pds">"</span></span>;
}
}</pre></div>
<p>This works <em>similarly</em> to our approach above - our <code>cancellationToken</code> parameter makes the token from <code>GetAsyncEnumerator()</code> (via <code>WithCancellation</code>) available to our iterator block, and we haven't had to create any dummy types. There is one slight nuance, though... we've <em>changed the signature</em> of <code>SomeSourceAsync</code> by adding a parameter. The code we had above <em>still compiles</em> because the parameter is optional. But this prompts the question: what happens if I <em>passed one in</em>? For example, what are the differences between:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-c"><span class="pl-c">//</span> option A - no cancellation</span>
<span class="pl-en">await</span> <span class="pl-en">foreach</span> (<span class="pl-en">var</span> <span class="pl-smi">item</span> <span class="pl-en">in</span> <span class="pl-smi">SomeSourceAsync</span>(42))
<span class="pl-c"><span class="pl-c">//</span> option B - cancellation via WithCancellation</span>
await foreach (<span class="pl-en">var</span> <span class="pl-smi">item</span> <span class="pl-en">in</span> <span class="pl-smi">SomeSourceAsync</span>(42).WithCancellation(cancellationToken))
<span class="pl-c"><span class="pl-c">//</span> option C - cancellation via SomeSourceAsync</span>
await foreach (<span class="pl-en">var</span> <span class="pl-smi">item</span> <span class="pl-en">in</span> <span class="pl-smi">SomeSourceAsync</span>(42, cancellationToken))
<span class="pl-c"><span class="pl-c">//</span> option D - cancellation via both</span>
await foreach (<span class="pl-en">var</span> <span class="pl-smi">item</span> <span class="pl-en">in</span> <span class="pl-smi">SomeSourceAsync</span>(42, cancellationToken).WithCancellation(cancellationToken))
<span class="pl-c"><span class="pl-c">//</span> option E - cancellation via both with different tokens</span>
await foreach (<span class="pl-en">var</span> <span class="pl-smi">item</span> <span class="pl-en">in</span> <span class="pl-smi">SomeSourceAsync</span>(42, tokenA).WithCancellation(tokenB))</pre></div>
<p>The answer is that <em>the right thing happens</em>: it doesn't matter which API you use - if a cancellation token is provided, it will be respected. If you pass two <em>different</em> tokens, then when <em>either</em> token is cancelled, it will be considered cancelled. What happens is that the <em>original token passed via the parameter</em> is stored as a field on the generated enumerable type, and when <code>GetAsyncEnumerator</code> is called, the parameter to <code>GetAsyncEnumerator</code> and the field are inspected. If they are both genuine but different cancellable tokens, <code>CancellationTokenSource.CreateLinkedTokenSource</code> is used to create a combined token (you can think of <code>CreateLinkedTokenSource</code> as the cancellation version of <code>Task.WhenAny</code>); otherwise, if <em>either</em> is genuine and cancellable, it is used. The result is that when you write an async cancellable iterator, you don't need to worry too much about whether the caller used the API directly vs indirectly.</p>
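<p>The token-combining behaviour described above can be sketched roughly like this (an illustration of the idea only - the actual generated code is structured differently, and the linked source also needs to be disposed when iteration completes):</p>

```csharp
using System.Threading;

static class TokenSketch
{
    public static CancellationToken Combine(CancellationToken field, CancellationToken param)
    {
        // same token, or the parameter can never be cancelled: just use the field
        if (field == param || !param.CanBeCanceled) return field;
        // the field can never be cancelled: just use the parameter
        if (!field.CanBeCanceled) return param;
        // two different cancellable tokens: the combined token is cancelled
        // when *either* source token is cancelled
        return CancellationTokenSource.CreateLinkedTokenSource(field, param).Token;
    }
}
```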
<p>You <em>might</em> be more concerned by the fact that we've changed the signature, however; in that case, a neat trick is to use two methods - one <em>without</em> the token that is for consumers, and one <em>with</em> the token for the actual implementation:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-k">string</span>> <span class="pl-en">SomeSourceAsync</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
<span class="pl-k">=></span> <span class="pl-en">SomeSourceImplAsync</span>(<span class="pl-smi">x</span>);
<span class="pl-k">private</span> <span class="pl-k">async</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-k">string</span>> <span class="pl-en">SomeSourceImplAsync</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>,
[<span class="pl-en">EnumeratorCancellation</span>] <span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
{
<span class="pl-k">for</span> (<span class="pl-k">int</span> <span class="pl-smi">i</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>; <span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-c1">5</span>; <span class="pl-smi">i</span><span class="pl-k">++</span>)
{
<span class="pl-k">await</span> <span class="pl-smi">Task</span>.<span class="pl-en">Delay</span>(<span class="pl-c1">100</span>, <span class="pl-smi">cancellationToken</span>); <span class="pl-c"><span class="pl-c">//</span> simulate async something</span>
<span class="pl-k">yield</span> <span class="pl-k">return</span> <span class="pl-s"><span class="pl-pds">$"</span>result from SomeSource, x={<span class="pl-smi">x</span>}, result {<span class="pl-smi">i</span>}<span class="pl-pds">"</span></span>;
}
}</pre></div>
<p>This would <em>seem</em> an ideal candidate for a "local function", but unfortunately <em>at the current time</em>, parameters on local functions are not allowed to be decorated with attributes. It is my hope that the language / compiler folks take pity on us, and allow us to do (in the future) something more like:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-en">IAsyncEnumerable</span><<span class="pl-k">string</span>> <span class="pl-en">SomeSourceAsync</span>(<span class="pl-k">int</span> <span class="pl-smi">x</span>)
{
<span class="pl-k">return</span> <span class="pl-en">Impl</span>();
<span class="pl-c"><span class="pl-c">//</span> this does not compile today</span>
<span class="pl-smi">async</span> <span class="pl-smi">IAsyncEnumerable</span><span class="pl-k"><</span><span class="pl-smi">string</span><span class="pl-k">></span> <span class="pl-en">Impl</span>(
[<span class="pl-smi">EnumeratorCancellation</span>] <span class="pl-smi">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
{
<span class="pl-k">for</span> (<span class="pl-k">int</span> <span class="pl-smi">i</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>; <span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-c1">5</span>; <span class="pl-smi">i</span><span class="pl-k">++</span>)
{
<span class="pl-k">await</span> <span class="pl-smi">Task</span>.<span class="pl-en">Delay</span>(<span class="pl-c1">100</span>, <span class="pl-smi">cancellationToken</span>); <span class="pl-c"><span class="pl-c">//</span> simulate async something</span>
<span class="pl-k">yield</span> <span class="pl-k">return</span> <span class="pl-s"><span class="pl-pds">$"</span>result from SomeSource, x={<span class="pl-smi">x</span>}, result {<span class="pl-smi">i</span>}<span class="pl-pds">"</span></span>;
}
}
}</pre></div>
<p>or the equivalent using <code>static</code> local functions, which is <em>usually</em> my preference to avoid any surprises in how capture works. The good news is that this works in the preview language versions, but that is not a guarantee that it will "land".</p>
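<p>On that <code>static</code> local function point: marking a local function <code>static</code> means it cannot capture locals or <code>this</code>, turning accidental state capture into a compile-time error rather than a surprise. A small illustrative example (not from the post):</p>

```csharp
void Demo()
{
    int x = 42;

    int Capturing() => x + 1; // non-static: silently captures x

    // static int NotCapturing() => x + 1; // error CS8421: cannot reference 'x'

    static int ExplicitState(int value) => value + 1; // state must be passed explicitly

    System.Console.WriteLine(Capturing());      // 43
    System.Console.WriteLine(ExplicitState(x)); // 43
}
```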
<h2>Summary</h2>
<p>So; that's how you can implement and use async iterators in C# now. We've looked at both the consumer and producer versions of iterators, for both synchronous and asynchronous code paths, and looked at various ways of accessing cancellation of asynchronous iterators. There is a <em>lot</em> going on here, but: hopefully it is useful and meaningful.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-37184113575839073832020-03-04T02:20:00.003-08:002020-06-28T12:20:33.405-07:00Why do I rag on BinaryFormatter?<p>tl;dr: seriously, <strong>stop using <code>BinaryFormatter</code></strong></p>
<p>The other evening, in the context of <a href="https://github.com/protobuf-net/protobuf-net.Grpc">protobuf-net.Grpc</a>, someone asked me whether it was possible to use <code>BinaryFormatter</code> as the marshaller. This isn't an unreasonable question, especially as protobuf-net.Grpc is <em>designed</em> to allow you to swap out the marshaller (gRPC is <em>usually</em> used with protobuf, but it isn't <em>restricted</em> to that as long as both ends understand what they're dealing with).</p>
<p>This made me realise that while I've spent over a decade telling people variants of "don't use <code>BinaryFormatter</code>", I don't think I've ever collated the reasons in one place. I suspect that many people think I'm being self-serving by saying this - after all it is so <em>easy</em> to use <code>BinaryFormatter</code>, and I'm not exactly a disinterested observer when it comes to serialization tools.</p>
<p>So! I thought I'd take this opportunity to put together my thoughts and reasons in one place, while also providing a "custom marshaller" example for protobuf-net.Grpc. Because "reasons", I've done this as comments in the example, but I present them below. There are four sections, but if you aren't sold by the time you've finished the first ("Security") section, then frankly: I give up. Everything beyond that first section is just decoration!</p>
<p>So; if you're still using <code>BinaryFormatter</code>, I <em>implore</em> you: please just stop.</p>
<p>And without further embellishment, <a href="https://github.com/protobuf-net/protobuf-net.Grpc/blob/main/tests/protobuf-net.Grpc.Test.Integration/CustomMarshaller.cs">I present my thesis</a>. If I missed anything, please let me know and we can add more. But again, no more should be needed.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-3653782171117488102020-01-07T04:44:00.001-08:002020-01-08T07:43:16.148-08:00.NET Core, .NET 5; the exodus of .NET Framework?<p>tl,dr; opinion: ongoing .NET Framework support for F/OSS libraries may quickly start evaporating, and this should be a consideration in migration planning.</p>
<hr>
<p>First, a clarification of terms, because they matter:</p>
<ul><li>.NET Framework - the original .NET, the one that ships on Windows and <em>only</em> on Windows; the current (and probably final) version of .NET Framework is 4.8</li><li>.NET Core - the evolution of .NET, that is not tied to the OS as much, with slightly different feature sets, and where most of the Microsoft .NET effort has been for the last few years; .NET Core 3.1 shipped recently</li><li>.NET Standard - an API definition (not implementation - akin to an interface) that allows a library to target a range of platforms in a single build, i.e. by targeting .NET Standard 2.0 a library can in theory run equivalently on .NET Core 3 and .NET Framework 4.6.2 (ish...) and others (Mono, Unity, etc), without needing to target each individually</li><li>.NET 5 - the next version of .NET Core; the naming deliberately emphasizes that there isn't a two-pronged development future consisting of "Framework" and "Core", but just one - this one - which isn't "Core" in the "minimal" sense, but is in fact now a very rich and powerful runtime; .NET 4 was avoided to prevent versioning confusion between .NET 4.* and .NET Framework 4.* (and again, to emphasize that this is the future direction of .NET, including if you are currently on .NET Framework)</li></ul>
<hr>
<p>The first thing we must be clear about, in case it isn't 100% clear from the above, is that .NET Framework is <strike>legacy</strike> <a href="https://twitter.com/blowdart/status/1214236697132584961"><em>completed</em></a>. There isn't going to be a .NET Framework 4.9 or a .NET Framework 5. There <em>might</em> be some critical security fixes, but there aren't going to be feature additions, unless those additions come from out-of-band NuGet (etc) packages that just happened to work on .NET Framework on the first (or maybe second) try.</p>
<p>I commented on Twitter yesterday about my perceptions on the status of this, and how we (the .NET community) should look at the landscape; it goes without saying that I'm merely opining here - I'm not a spokesperson for Microsoft, but I am a library author and consumer, and I work extensively in the .NET space. Other views and conclusions are possible! But: I wanted to take the time to write up a more long-form version of what I see, with space to give reasons and discuss consequences.</p>
<h1>What I said yesterday</h1>
<p>The short version is: I expect that 2020 will see a lot of library authors giving serious consideration as to whether to continue shipping .NET Framework support on new library versions. There are lots of reasons for this, including:</p>
<ul><li>increasing feature gaps making it increasingly expensive to support multiple frameworks, either via compatibility shims or framework-dependent feature sets</li><li>as more and more library authors complete their own migrations to .NET Core, the effort required to support a framework <em>that they aren't using</em> increases:<ul><li>bugs don't get spotted until they've shipped to consumers</li><li>a lot of knowledge of "the old framework" needs to be retained and kept in mind - a particular issue with new contributors who might <em>never have used</em> that framework (and yes, there are some huge gotchas)</li><li>there are often two (or more) code implementations to support</li><li>builds are more complicated than necessary (requiring either Windows or the build-pack), and take longer</li><li>tests take longer <strong>and require Windows</strong></li><li>packages and dependency trees are larger than necessary</li></ul></li><li>not all new language features are equal citizens on down-level frameworks<ul><li>some features, such as default interface methods, will not work on down-level frameworks</li><li>some important features like C# 8 nullability are in a weird middle ground where some bits kinda work sometimes most of the time except when it doesn't</li><li>some, like <code>IAsyncEnumerable<T></code>, may have <a href="https://www.nuget.org/packages/Microsoft.Bcl.AsyncInterfaces/">compatibility shims</a>, but that only allows minimal support on library surfaces, since of course many framework-level pieces to produce or consume such will be missing</li></ul></li><li>some APIs are <em>fundamentally brittle</em> on .NET Framework, especially when multi-targeting, with the breaks happening only at run-time (they are not obvious at build, and may not be obvious until a very specific code-path is hit, which might be a long time after initial deployment); a lot of this comes down to the assembly loader and assembly-binding-redirects (a problem that simply does not
exist in .NET Core / .NET 5)<ul><li>if you want to see a library author cry, mention <a href="https://www.nuget.org/packages/System.ValueTuple/"><code>System.ValueTuple</code></a>, <a href="https://www.nuget.org/packages/System.Numerics.Vectors/"><code>System.Numerics.Vectors</code></a>, or <a href="https://www.nuget.org/packages/System.Runtime.CompilerServices.Unsafe/"><code>System.Runtime.CompilerServices.Unsafe</code></a>. Why? Because they are <em>deployment nightmares</em> if you are targeting multiple platforms, because .NET Framework makes a <em>complete <a href="https://english.stackexchange.com/questions/24208/why-is-a-disastrous-mess-called-a-pigs-ear">pig's ear</a></em> of them; you can <em>just about</em> fix it up with assembly-binding-redirects some of the time, but the tooling will not and can not do this for you, which is pure pain for a library author</li><li>recall that .NET Framework is "complete"; the loader isn't going to be fixed (also, <a href="https://twitter.com/Nick_Craver/status/1212424774019952641">nobody wants to touch it</a>); alternatively, it could be said that the loader has <em>already</em> been fixed; the fix is called .NET Core / .NET 5</li></ul></li><li>a lot of recent performance-focused APIs are not available on .NET Framework, or perform very differently (which is almost the worst possible outcome for performance-focused APIs!); for example:<ul><li>concurrency: a lot of <code>async</code> APIs designed for highly concurrent systems (servers, in particular) will be simply missing on .NET Framework, or may be implemented via async-over-sync / sync-over-async, which <em>significantly</em> changes the characteristics</li><li>allocations: there are a lot of new APIs designed to avoid allocations, typically in library code related to IO, data-processing etc - things like <code>Span<T></code>; the APIs to interact with the framework directly with these won't exist on .NET Framework, forcing dual code paths, but <em>even
when they do</em>, .NET Framework uses a <em>different</em> (and less optimal) <code>Span<T></code> implementation, <em>and</em> the JIT lacks the knowledge to make <code>Span<T></code> be magical; you can hack over some of the API gaps using pointer-based APIs when they exist, but then you might be tempted to use <a href="https://www.nuget.org/packages/System.Runtime.CompilerServices.Unsafe/"><code>Unsafe.*</code></a>, which as already mentioned: wants to kill you</li><li>processing: one of the most powerful new toolkits in .NET for CPU-focused work is access to <a href="https://devblogs.microsoft.com/dotnet/hardware-intrinsics-in-net-core/">SIMD and CPU intrinsics</a>; both of these work especially well when mixed with spans, due to the ability to coerce between spans and vectors - but we just saw how <code>Span<T></code> is problematic; full CPU intrinsics are only available on .NET Core / .NET 5, but you can still get a <em>lot</em> done by using <code>Vector<T></code> which allows SIMD on .NET Framework... except I already mentioned that <a href="https://www.nuget.org/packages/System.Numerics.Vectors/"><code>System.Numerics.Vectors</code></a> is one of the trifecta of doom - so yes, you can use it, but: brace yourself.</li><li>now consider that a lot of libraries - including Microsoft libraries on NuGet, and F/OSS libraries - are starting to make more and more use of these features for performance, and you start to see how brittle things get, and <em>it often won't be the library author that sees the problem</em>.</li></ul></li><li>as .NET Core / .NET 5 expand our ability to reach more OSes, we <em>already</em> have enough permutations of configurations to worry about.</li><li>often, the issues here may not be just down to <em>a</em> library, but may be due to <em>interactions</em> of <em>multiple</em> libraries (or indeed, conflicting dependencies of multiple libraries), so the issues may be unique to specific deployments.</li></ul>
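<p>To make the SIMD point concrete, here's a minimal sketch (mine, not from the original post) of a <code>Vector<T></code>-based sum. Code like this can run on .NET Framework too - if you brave the <code>System.Numerics.Vectors</code> package - while on .NET Core the JIT lowers it to genuine SIMD instructions:</p>

```csharp
using System;
using System.Numerics;

static class SimdSum
{
    // Sums an int[] a SIMD-width at a time; Vector<int>.Count is
    // hardware-dependent (e.g. 4 lanes on SSE2, 8 on AVX2).
    public static int Sum(int[] values)
    {
        var acc = Vector<int>.Zero;
        int i = 0, width = Vector<int>.Count;
        for (; i <= values.Length - width; i += width)
            acc += new Vector<int>(values, i); // vectorized add
        int total = Vector.Dot(acc, Vector<int>.One); // horizontal sum of lanes
        for (; i < values.Length; i++) total += values[i]; // scalar tail
        return total;
    }
}
```

<p>Note that <code>SimdSum</code> is an invented name for illustration; the point is that the same source compiles everywhere, but the performance characteristics differ wildly between runtimes.</p>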
<hr>
<h2>How about just offering patch support, not feature support?</h2>
<p>So the theory here is that we can throw our hands in the air, and declare "no new features in the .NET Framework version - but we'll bugfix". This sounds great to the consumer, but... it isn't really very enticing to the maintainer. In reality, this means branching at some point, and now ... what happens? We still retain <em>all</em> of the build, test, deploy problems (although now we might need completely different build/CI tools for each), but now we have two versions of the code that are drifting apart; we need to keep all the old things in our mind for support, and when we bugfix the current code, we might also need to backport that bug into a branch that uses <em>very different code</em>, and test that. On a platform that the library maintainers <em>aren't using</em>.</p>
<p><em><strong>F/OSS isn't free; it is paid for by the maintainers.</strong></em> When proposing something like the above, we need to be very clear about whose time we are committing, and why we feel entitled to commit it. Fundamentally, I don't think that option scales very well. At some point, I think it becomes increasingly necessary to think of .NET Framework in the same way that we have thought of .NET 1.* for a very long time - it is interesting to know that it exists, but the longer you stay stuck on that island, the harder life is going to become for you.</p>
<p>In particular, to spell it out explicitly; I expect a number of libraries will start rebasing to .NET Standard 2.1 and .NET Core 3.0 or 3.1 as their minimum versions, <em>carving off</em> .NET Framework. The choice of .NET Standard 2.1 here isn't necessarily "because we want to use APIs only available in 2.1", but is instead: "because we actively don't want .NET Framework trying to run this, and .NET Framework thinks, often mistakenly, that it works with .NET Standard 2.0" (again, emphasis here is that .NET Framework 4.6.2 only <em>sort of</em> implements .NET Standard 2.0, and even when it does, it drags in a large dependency graph; this is partly resolved if you also target .NET Framework 4.7.2, but your list of TFMs is now growing even further).</p>
<h2>So what happens to .NET Framework folks?</h2>
<p>I <em>totally get</em> that a lot of people will be stuck on .NET Framework for the foreseeable future. Hell, a lot of <em>our</em> code at Stack Overflow is still .NET Framework (we're working through migration). I completely understand and empathize with all the familiar topics of service lifetimes, SLAs, budgets, clients, contracts/legals, and all of those things.</p>
<p>Just like nobody is coming to take .NET Framework off your machine, nobody is coming to take F/OSS libraries either. What I'm saying is that a time may come - and it is getting closer on the horizon - when you just won't get updates. The library you have today will continue working, and will still be on NuGet, but there won't be feature updates, and <em>very few</em> (for the reasons above) bug fixes.</p>
<p>I know I've spoken about <a href="https://blog.marcgravell.com/2018/04/having-serious-conversation-about-open.html">open source funding</a> before, but: at some point, if your business <em>genuinely needs</em> additional support on .NET Framework where it is going to create <em>significant</em> extra work (see: everything above) for the maintainers, perhaps at some point this is simply a supply-chain issue, and one solution is to sponsor that work <em>and the ongoing support</em>. Another option may be to fork the project yourself at the point where you're stuck, and maintain all the changes there, perhaps even supporting the other folks using that level. If you're thinking "but that sounds like a lot of effort": congratulations, you're right - it is! That's <em>why</em> it isn't already being done. All such work is zero sum; time spent on the additional work needed to support .NET Framework is time not being spent actually developing the library for <em>what the maintainer wants and needs</em>, and: it is <em>their time being spent</em>.</p>
<h2>Conclusion</h2>
<p>A lot of what I've discussed here is opinion; I can't say for sure how it will play out, but I think it is a very real (and IMO likely) possibility. As such, I think it is <em>just one facet</em> of the matrix you should be considering in terms of "should we, or when should we, look to migrate to .NET Core / .NET 5"; key point: <a href="https://dotnet.microsoft.com/platform/support/policy/dotnet-core">.NET Core 3.1 is a LTS release</a>, so frankly, there's absolutely no better time than now. Is migrating work? Yes, it is. But staying put <em>also</em> presents challenges, and I do not believe that .NET Framework consumers can reasonably expect the status-quo of F/OSS support (for .NET Framework) to continue.</p>
<p>(<a href="https://twitter.com/marcgravell/status/1214226671550775297">the Twitter thread</a>)</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-60271070305293783472019-08-23T03:22:00.000-07:002019-08-24T23:25:11.850-07:00Prefer ValueTask to Task, always; and don't await twice<h2 id="preamblenotapart2">Preamble - not a part 2</h2>
<p>A little while ago <a href="https://blog.marcgravell.com/2019/02/fun-with-spiral-of-death.html">I blogged here</a> and I set it up to be a "continues..." style post. I haven't had the energy to continue it in that context, and this fact was putting me off concluding the post. I then realised: the thing that matters isn't some overarching narrative structure, but that I get my ideas down. So: I'm aborting any attempt at making this post a continuation, and just focusing on the content!</p>
<h1 id="prefervaluetaskttotasktalways">Prefer <code>ValueTask[<T>]</code> to <code>Task[<T>]</code>, always.</h1>
<p>There's been a lot of confusion over when to use <code>Task[<T>]</code> vs <code>ValueTask[<T>]</code> (note: I'm going to drop the <code>[<T>]</code> from now on; just pretend they're there when you see <code>Task</code> / <code>ValueTask</code> etc).</p>
<h2 id="contextwhataretaskandvaluetask">Context: what are <code>Task</code> and <code>ValueTask</code>?</h2>
<p>In case you don't know, <code>Task</code> and <code>ValueTask</code> are the two primary implementations of "awaitable" types in .NET; "awaitable" here means that there is a duck-typed signature that allows the compiler to turn this:</p>
<pre><code>int i = await obj.SomeMethodAsync();
</code></pre>
<p>into <em>something like</em> this:</p>
<pre><code>var awaiter = obj.SomeMethodAsync().GetAwaiter();
if (!awaiter.IsCompleted)
{
// voodoo here that schedules a
// continuation that resumes here
// once the result becomes available
}
int i = awaiter.GetResult();
</code></pre>
<p><code>Task</code> is the original and most well known API, since it shipped with the TPL, but it means that an object allocation is necessary even for scenarios where it turns out that it <em>was</em> already available, i.e. <code>awaiter.IsCompleted</code> returned <code>true</code>. The <code>ValueTask</code> value-type (<code>struct</code>) acts as a hybrid result that can represent an already completed result <em>without allocating</em> <strong>or</strong> an incomplete pending operation. You <em>can</em> implement your own custom awaitables, but it isn't common.</p>
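<p>As a trivial illustration of that hybrid nature (my sketch; <code>PriceCache</code> and the numbers are invented), a method that can often answer from a cache can hand back a completed <code>ValueTask<T></code> with zero allocations, only paying for a <code>Task<T></code> when it genuinely goes asynchronous:</p>

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PriceCache
{
    private readonly ConcurrentDictionary<string, decimal> _prices
        = new ConcurrentDictionary<string, decimal>();

    public ValueTask<decimal> GetPriceAsync(string symbol)
        => _prices.TryGetValue(symbol, out var price)
            ? new ValueTask<decimal>(price)                    // hot path: no allocation
            : new ValueTask<decimal>(FetchPriceAsync(symbol)); // cold path: Task-backed

    private async Task<decimal> FetchPriceAsync(string symbol)
    {
        await Task.Delay(10); // stand-in for real I/O
        var price = 42.0m;    // invented value
        _prices[symbol] = price;
        return price;
    }
}
```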
<h2 id="whentochooseeachtheincorrectversion">When to choose each, the incorrect version</h2>
<p>If you'd asked me a while back about when to choose each, I might have <strong>incorrectly</strong> said something like:</p>
<blockquote>
<p>Use <code>Task</code> when something is usually or always going to be genuinely asynchronous, i.e. not immediately complete; use <code>ValueTask</code> when something is usually or always going to be synchronous, i.e. the value will be known inline; also use <code>ValueTask</code> in a polymorphic scenario (<code>virtual</code>, <code>interface</code>) where you can't know the answer.</p>
</blockquote>
<p>The logic behind this incorrect statement is that if something is incomplete, your <code>ValueTask</code> is going to end up being backed by a <code>Task</code> <em>anyway</em>, but without the extra indirection and false promise of <code>ValueTask</code>. This is incorrect, though, because it is based on the premise that a <code>ValueTask</code> is a composite of "known result (<code>T</code>)" and "<code>Task</code>". In fact, <code>ValueTask</code> is <em>also</em> a composite of a third thing: <code>IValueTaskSource[<T>]</code>.</p>
<h2>What is <code>IValueTaskSource[<T>]</code>?</h2>
<p><code>IValueTaskSource</code> is an abstraction that allows you to represent the <em>logical</em> behaviour of a task <strong>separately</strong> to the result itself. That's a little vague, so an example:</p>
<pre><code>IValueTaskSource<int> someSource = // ...
short token = // ...
var vt = new ValueTask<int>(someSource, token);
// ...
int i = await vt;
</code></pre>
<p>This now functions like you'd expect from an awaitable, but <strong>even in the incomplete/asynchronous case</strong> the logic about how everything works is now down to <em>whatever implements the interface</em> - it <strong>does not</strong> need to be backed by a <code>Task</code>. You might be thinking:</p>
<blockquote>
<p>ah, but we still need an instance of whatever is implementing the interface, and we're treating it as a reference, so: we're still going to allocate; what's the point? what have you gained?</p>
</blockquote>
<p>And that's when I need to point out the <code>short token</code>. This little gem allows us to use the <strong>same interface instance</strong> with <strong>multiple</strong> value-tasks, and have them know the difference. There are two ways you <em>could</em> use this:</p>
<ul>
<li>keep the state for multiple asynchronous operations <em>concurrently</em>, using the <code>token</code> to pick the correct state (presumably from a vector)</li>
<li>keep a <em>single</em> piece of state for multiple <strong>consecutive</strong> operations, using the <code>token</code> to guarantee that we're talking about the correct one</li>
</ul>
<p>The second is actually <em>by far</em> the more common implementation, and in fact is now included in the BCL for you to make direct use of - see <a href="https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/System/Threading/Tasks/Sources/ManualResetValueTaskSourceCore.cs"><code>ManualResetValueTaskSourceCore<T></code></a>.</p>
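<p>To show the shape of that second pattern, here's a compressed sketch of my own (names invented; real implementations need more care around overlapping use of the same instance) of a reusable source built on <code>ManualResetValueTaskSourceCore<T></code> - note how <code>Reset()</code> bumps the version, invalidating any old <code>token</code>:</p>

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

sealed class ReusableSource : IValueTaskSource<int>
{
    // mutable struct field: deliberately NOT readonly
    private ManualResetValueTaskSourceCore<int> _core;

    // hand out a ValueTask tied to the *current* version/token
    public ValueTask<int> AwaitCurrent()
        => new ValueTask<int>(this, _core.Version);

    // called by the producer when the answer is ready
    public void SetResult(int result) => _core.SetResult(result);

    int IValueTaskSource<int>.GetResult(short token)
    {
        int result = _core.GetResult(token); // throws if the token is stale
        _core.Reset(); // bump Version: old tokens invalid, instance reusable
        return result;
    }

    ValueTaskSourceStatus IValueTaskSource<int>.GetStatus(short token)
        => _core.GetStatus(token);

    void IValueTaskSource<int>.OnCompleted(Action<object> continuation,
        object state, short token, ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}
```

<p>The same instance can now serve consecutive operations: one allocation, many awaits (but each awaitable consumed exactly once - more on that in a moment).</p>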
<h2 id="sowhathowdoesthishelpme">So what? How does this help me?</h2>
<p>OK; so - we've seen that this alternative exists. There are two ways that people commonly author awaitable APIs today:</p>
<ul>
<li>using <code>TaskCompletionSource<T></code> and handing the caller the <code>.Task</code> (perhaps wrapped in a <code>ValueTask</code>), and calling <code>TrySetResult</code> etc when we want to trigger completion</li>
<li>using <code>async</code> and <code>await</code>, having the compiler generate all the machinery behind the scenes - noting that this currently involves creating a <code>Task</code> in the incomplete case, even for <code>ValueTask</code> methods (because it has to come from <em>somewhere</em>)</li>
</ul>
<p>Hopefully you can see that <strong>if we have <code>ValueTask</code> available to us</strong> it is relatively easy to substitute in a <code>ManualResetValueTaskSourceCore</code> backer, allowing us to <strong>reuse</strong> the same <code>IValueTaskSource</code> instance multiple times, avoiding <strong>lots</strong> of allocations. But: there's an important caveat - <strong>it changes the API</strong>. No, really. Let's take a stroll to discuss how...</p>
<h1 id="dontawaittwice">Don't await twice</h1>
<p>Right now, the following code works - assuming the result is backed by either a fixed <code>T</code> or a <code>Task<T></code>:</p>
<pre><code>var pending = obj.SomeMethodAsync();
int i = await pending;
// ...
int j = await pending;
</code></pre>
<p>You'll get the same answer from each <code>await</code>, unsurprisingly - but the actual operation (the method) is only performed once. But: if we switch to <code>ManualResetValueTaskSourceCore</code>, we should only assume that each <code>token</code> is valid <strong>exactly once</strong>; once we've awaited the result, the <em>entire point</em> is that the backing implementation is free to re-use that <code>IValueTaskSource</code> <strong>with a different <code>token</code></strong> for <em>another consumer</em>. That means that the code shown above <em>is no longer legal</em>, and we should expect that the second <code>await</code> can now throw an exception about the <code>token</code> being incorrect.</p>
<p>This is a pretty rare thing to see in code, so personally I'm OK with saying "tough; await once only". Think of it in human terms; this is like a manager going to someone's desk and saying:</p>
<blockquote>
<p>Hi, I need the answer to (some topical question); do you know that now? if so, tell me now; otherwise, when you have the answer, bring it (somewhere) and nudge me.</p>
</blockquote>
<p>All fine and reasonable so far; our office hero didn't know the answer right away, so they went away and got it, took it where instructed and handed the answer to the manager.</p>
<p>20 minutes later (or 2 days later), the manager stops by their desk again:</p>
<blockquote>
<p>Hey, give me that answer</p>
</blockquote>
<p>At this point, our hero might reasonably say</p>
<blockquote>
<p>Boss, I already gave it you; I only printed it out once - you have the copy; I deal with lots of requests each day, and I can't even remember what you <em>asked about</em>, let alone what the answer was; if you've forgotten the answer, <em>that's on you</em> - feel free to ask again, it's all billable</p>
</blockquote>
<p>This is kinda how I anthropomorphize <code>ValueTask</code>, especially in the context of <code>IValueTaskSource</code>. So key point: <strong>don't await twice</strong>. Treat the results of <em>awaitables</em> exactly the same as you would the result of <em>any other expression</em>: if you are going to need the value twice, store it in a local <em>when you first fetch it</em>.</p>
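<p>In code terms, the fix is simply to hoist the value (sketch of mine; <code>SomeMethodAsync</code> stands in for any <code>ValueTask</code>-returning API):</p>

```csharp
using System.Threading.Tasks;

static class AwaitOnce
{
    // stand-in for any ValueTask-returning API
    static ValueTask<int> SomeMethodAsync() => new ValueTask<int>(42);

    public static async Task<int> ConsumeAsync()
    {
        // WRONG with IValueTaskSource backers - the second await may throw:
        //   var pending = SomeMethodAsync();
        //   int i = await pending;
        //   int j = await pending; // token may already be recycled

        // RIGHT: await exactly once, then reuse the stored value
        int i = await SomeMethodAsync();
        int j = i;
        return i + j;
    }
}
```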
<h1 id="howelsecanwebenefitfromivaluetasksource">How else can we benefit from IValueTaskSource?</h1>
<p>So; we've seen how we can <em>manually</em> use an <code>IValueTaskSource</code> to efficiently issue <code>ValueTask</code> awaitable results; but if we use <code>async</code>/<code>await</code>, in the incomplete / asynchronous case the compiler is still going to be generating a <code>Task</code> - and also generating a bunch of other state boxes associated with the continuation voodoo. But… <em>it doesn't have to!</em> A while ago I did some playing in this area that resulted in <a href="https://mgravell.github.io/PooledAwait/">"Pooled Await"</a>; I'm not going to go into details about this here, and for reasons that will become clear in a moment, I <strong>don't</strong> recommend switching to this, but the short version is: you can write a method that behaves <em>exactly like</em> a <code>ValueTask</code> awaitable method (including <code>async</code>), but the library makes the compiler generate different code that uses <code>IValueTaskSource</code> to avoid the <code>Task</code> allocation, and uses state machine boxing to reduce the other allocations. It works pretty well, but as you might expect, it has the above caveat about awaiting things more than once.</p>
<p>So; why am I saying don't leap at this? That's because the BCL folks are <em>also</em> now playing in this space, as evidenced by <a href="https://github.com/dotnet/coreclr/pull/26310">this PR</a>, which has pretty much the exact same feature set, but with the advantages of:</p>
<ul>
<li>being written by people who <em>really, really</em> understand async</li>
<li>it not adding any dependencies - it would just work <em>out of the box</em> for <code>ValueTask</code> awaitables</li>
</ul>
<p>If that happens, then a lot of asynchronous code will magically get less allocatey <em>all at once</em>. I know this is something they've discussed in the past, so maybe my "Pooled Await" stuff gave them the metaphorical kick to go and take another look at implementing it for real; or maybe it was just a timing coincidence.</p>
<p>Neither my own implementation nor the BCL version can do <em>all</em> the magic if you return <code>Task</code> - for best results, a <code>ValueTask</code> is needed (although "Pooled Await" still reuses the state-machine boxes for <code>Task</code> APIs).</p>
<h2 id="conclusion">Conclusion</h2>
<p>So, going back to the earlier question of when to use <code>Task</code> vs <code>ValueTask</code>, IMO the answer is now obvious:</p>
<blockquote>
<p>Use <code>ValueTask[<T>]</code>, unless you absolutely can't because the existing API is <code>Task[<T>]</code>, and even then: <em>at least consider</em> an API break</p>
</blockquote>
<p>And also keep in mind:</p>
<blockquote>
<p>Only <code>await</code> any single awaitable expression <strong>once</strong></p>
</blockquote>
<p>If we put those two things together, libraries and the BCL are free to work miracles in the background to improve performance <em>without the caller needing to care</em>.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-89080761594389405212019-02-21T05:17:00.002-08:002020-02-19T07:25:26.422-08:00Fun with the Spiral of Death<p>Subtitled: "a cautionary tale of <code>SemaphoreSlim</code>", an adventure in two parts:</p>
<ul>
<li>In part 1 I want to discuss a very fun series of problems we had in some asynchronous code - where "fun" here means "I took Stack Overflow offline, again". Partly because it is a fun story, but mostly because I think there's some really useful learning points in there for general adventures in asynchronicity</li>
<li>In part 2 I want to look at some of the <em>implementation details</em> of our eventual fix, which covers some slightly more advanced themes around how to implement awaitable code in non-trivial scenarios</li>
</ul>
<h2><a id="user-content-i-took-stack-overflow-offline-again" class="anchor" aria-hidden="true" href="#i-took-stack-overflow-offline-again"></a>I took Stack Overflow offline, again</h2>
<p>As a side note: many of the themes here run hand-in-hand with David and Damian's recent presentation "Why your ASP.NET Core application won't scale" at NDC; if you haven't seen it yet: <a href="https://www.youtube.com/watch?v=J-xqz_ZM9Wg" rel="nofollow"><em>go watch it</em></a> - in particular everything around "the application works fine until it suddenly doesn't" and "don't sync-over-async or async-over-sync".</p>
<p>A lot of this journey relates to our migration of <a href="https://github.com/StackExchange/StackExchange.Redis" rel="nofollow">StackExchange.Redis</a> to use "pipelines", the new IO layer in .NET (previously discussed <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-1.html" rel="nofollow">here</a>, <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-2.html" rel="nofollow">here</a>, <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-3.html" rel="nofollow">here</a>, and <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-31.html" rel="nofollow">here</a> - I love me some pipelines). One of the key design choices in StackExchange.Redis is for the library to implement <a href="https://stackexchange.github.io/StackExchange.Redis/PipelinesMultiplexers" rel="nofollow">multiplexing</a> to allow multiple concurrent calling threads to communicate over the same underlying socket to the server; this keeps the socket count low while also helping to reduce packet fragmentation, but it means that we need to do some synchronization around how the many caller threads access the underlying socket.</p>
<p>Before the pipeline migration, this code was <em>basically</em> synchronous (it was a bit more complex, but… that's close enough), and the "write an actual command" code could be expressed (if we take some liberties for readability) as below:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">readonly</span> <span class="pl-k">object</span> <span class="pl-smi">syncLock</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-k">object</span>(); <span class="pl-c"><span class="pl-c">//</span> single writer</span>
<span class="pl-k">void</span> <span class="pl-en">WriteMessage</span>(<span class="pl-en">Message</span> <span class="pl-smi">message</span>)
{
<span class="pl-k">bool</span> <span class="pl-smi">haveLock</span> <span class="pl-k">=</span> <span class="pl-c1">false</span>;
<span class="pl-k">try</span>
{
<span class="pl-smi">Monitor</span>.<span class="pl-en">TryEnter</span>(<span class="pl-smi">syncLock</span>, <span class="pl-smi">timeout</span>, <span class="pl-k">ref</span> <span class="pl-smi">haveLock</span>);
<span class="pl-k">if</span> (<span class="pl-k">!</span><span class="pl-smi">haveLock</span>) <span class="pl-en">ThrowTimeout</span>();
<span class="pl-en">ActuallyWriteTheThing</span>(<span class="pl-smi">message</span>);
<span class="pl-en">Flush</span>();
}
<span class="pl-k">finally</span>
{
<span class="pl-k">if</span> (<span class="pl-smi">haveLock</span>) <span class="pl-smi">Monitor</span>.<span class="pl-en">Exit</span>(<span class="pl-smi">syncLock</span>);
}
}</pre></div>
<p>This is a fairly normal style of coding - the <code>try</code>/<code>finally</code>/<code>Monitor</code>/<code>haveLock</code> code here is just a standard implementation of "<code>lock</code> with a timeout", so all this really does is:</p>
<ul>
<li>try to acquire exclusive access to the socket, guarded by <code>syncLock</code></li>
<li>if successful, write and flush</li>
</ul>
<p>All reasonable. But then we moved to pipelines, and one of the defining features of the pipelines implementation is that key steps in it are <code>async</code>. You might assume that it is the <em>write</em> that is <code>async</code> - but since you write to a buffer pool, this isn't actually the case - it's the <em>flush</em> that is <code>async</code>. The <em>flush</em> in pipelines achieves a few different things:</p>
<ul>
<li>if necessary, it activates the <em>consumer</em> that is pulling work from the pipe and sending it to the next step (a socket in our case)</li>
<li>it provides back-pressure to the <em>provider</em> (<code>WriteMessage</code> in this case), so that if the consumer is falling behind and there's too much backlog, we can slow down the provider (in an asynchronous way) so we don't get unbounded buffer growth</li>
</ul>
<p>All very neat.</p>
<p>But switching from synchronous code to an API that uses <code>async</code> is not always trivial - <code>async</code> begets <code>async</code>, and once you start going <code>async</code>, it <em>all</em> goes <code>async</code>. So… I did a bad thing; I was lazy, and figured "hey, flush will almost always complete synchronously anyway; we can probably get away with a sync-over-async here" (<em>narrator: they didn't get away with it</em>).</p>
<p>So; what I did was something like:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">readonly</span> <span class="pl-k">object</span> <span class="pl-smi">syncLock</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-k">object</span>(); <span class="pl-c"><span class="pl-c">//</span> single writer</span>
<span class="pl-k">void</span> <span class="pl-en">WriteMessage</span>(<span class="pl-en">Message</span> <span class="pl-smi">message</span>)
{
<span class="pl-k">bool</span> <span class="pl-smi">haveLock</span> <span class="pl-k">=</span> <span class="pl-c1">false</span>;
<span class="pl-k">try</span>
{
<span class="pl-smi">Monitor</span>.<span class="pl-en">TryEnter</span>(<span class="pl-smi">syncLock</span>, <span class="pl-smi">timeout</span>, <span class="pl-k">ref</span> <span class="pl-smi">haveLock</span>);
<span class="pl-k">if</span> (<span class="pl-k">!</span><span class="pl-smi">haveLock</span>) <span class="pl-en">ThrowTimeout</span>();
<span class="pl-en">ActuallyWriteTheThing</span>(<span class="pl-smi">message</span>);
<span class="pl-en">FlushSync</span>();
}
<span class="pl-k">finally</span>
{
<span class="pl-k">if</span> (<span class="pl-smi">haveLock</span>) <span class="pl-smi">Monitor</span>.<span class="pl-en">Exit</span>(<span class="pl-smi">syncLock</span>);
}
}
<span class="pl-k">void</span> <span class="pl-en">FlushSync</span>() <span class="pl-c"><span class="pl-c">//</span> evil hack, DO NOT USE</span>
{
<span class="pl-k">var</span> <span class="pl-smi">flush</span> <span class="pl-k">=</span> <span class="pl-en">FlushAsync</span>();
<span class="pl-k">if</span> (<span class="pl-k">!</span><span class="pl-smi">flush</span>.<span class="pl-smi">IsCompletedSuccessfully</span>)
{
<span class="pl-k">flush</span>.<span class="pl-en">Wait</span>();
}
}</pre></div>
<p>The <code>IsCompletedSuccessfully</code> here is a check you can use on many task-like (awaitable) results to see if it completed synchronously and without faulting; if it did, you're safe to access the <code>.Result</code> (etc.) and it will all be available already - a good trick for avoiding the <code>async</code> state-machine complications in high-throughput code (typically library code, not application code). The bad bit is the <code>.Wait(…)</code> when it <em>isn't</em> already completed - this is a sync-over-async.</p>
<h2><a id="user-content-what-happened-next" class="anchor" aria-hidden="true" href="#what-happened-next"></a>What happened next?</h2>
<p>A key thing to keep in mind is that StackExchange.Redis exposes both synchronous and asynchronous APIs - i.e. there are twin methods, for example:</p>
<ul>
<li><code>RedisValue StringGet(RedisKey key)</code></li>
<li><code>Task<RedisValue> StringGetAsync(RedisKey key)</code></li>
</ul>
<p>Internally they are implemented very differently so that they both get the job done with the minimum of fuss and overhead, but they were both calling into the same <code>WriteMessage</code> at some point. Actually, never afraid to double-down on the anti-patterns, this means that for the <em>async</em> callers, they were effectively doing async-over-sync-over-async; ouch.</p>
<p>The <code>WriteMessage</code> code above is used from both the <em>synchronous</em> and <em>asynchronous</em> call paths. As it happens, much of our existing internal <em>application</em> codebase uses the synchronous paths (we're gradually adding more <code>async</code>, but we need to complete our in-progress transition from .NET Framework to .NET Core to be able to do it more extensively), and on the synchronous paths you were always going to be blocked <em>anyway</em>, so from the perspective of synchronous callers, there's not really that much wrong with the above. It does what it promises: execute synchronously.</p>
<p>The <em>problem</em> here comes from <em>asynchronous</em> callers, who thought they were calling <code>StringGetAsync</code>, and their thread got blocked. The golden rule of <code>async</code> is: don't block an <code>async</code> caller unless you <em>really, really</em> have to. We broke this rule, and we had reports from users about big thread-jams with <code>async</code> call paths all stuck at <code>WriteMessage</code>, because <em>one thread</em> had paused for the flush, and all the other threads were trying to obtain the lock.</p>
<p>Note: the problem here isn't that "a backlog happened, and we had to delay" - that's just business as normal. That happens, especially when you need mutex-like semantics. The problem is that we <em>blocked the worker threads</em> (although we did at least have the good grace to include a timeout), which <em>under heavy load</em> caused thread-pool starvation and a cascading failure (again: watch the video above).</p>
<h2><a id="user-content-so-what-should-we-have-done-in-theory" class="anchor" aria-hidden="true" href="#so-what-should-we-have-done-in-theory"></a>So what <em>should</em> we have done <strong>in theory</strong>?</h2>
<p>Given that we have both synchronous and asynchronous call-paths, what we <em>should</em> do is have two versions of the write code:</p>
<ul>
<li><code>void WriteMessage(Message message)</code></li>
<li><code>ValueTask WriteMessageAsync(Message message)</code></li>
</ul>
<p>but we get into immediate problems when we talk about our locking mechanism. We can see this more clearly if we use a simple <code>lock</code> rather than the more complex <code>Monitor</code> usage above - the following <strong>does not compile</strong>:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">async</span> <span class="pl-en">ValueTask</span> <span class="pl-en">Foo</span>()
{
<span class="pl-k">lock</span> (<span class="pl-smi">syncLock</span>)
{
<span class="pl-c"><span class="pl-c">//</span> CS1996 - Cannot await in the body of a lock statement</span>
<span class="pl-k">await</span> <span class="pl-smi">Task</span>.<span class="pl-en">Delay</span>(<span class="pl-smi">SomeAmount</span>);
}
}</pre></div>
<p>The reason this doesn't work is that <code>lock</code> (aka <code>Monitor</code>) is <strong>thread-oriented</strong>. You need to <code>[Try]Enter</code> (take the lock) and <code>Exit</code> (release the lock) the constrained region from the same thread. But the moment you <code>await</code>, you're saying "this might continue synchronously, or it might resume later <em>on a different thread</em>". This actually has two consequences:</p>
<ul>
<li>it would mean that when we try to release the lock, it will fail because the resuming thread probably won't actually have it</li>
<li>when we <code>await</code>, we're releasing the <em>current</em> thread back to do <em>whatever else needs doing</em>… which <em>could</em> actually end up calling back into <code>Foo</code>… and <code>Monitor</code> is "re-entrant", meaning: if you have the lock <em>once</em>, you can actually <code>lock</code> <strong>again</strong> successfully (it maintains a counter internally), which means that code in a completely unrelated execution context could incorrectly end up <strong>inside</strong> the <code>lock</code>, <em>before</em> we've resumed from the <code>await</code> and logically released it</li>
</ul>
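<p>The re-entrancy point is easy to see in isolation; as a minimal console sketch (illustrative only, not code from the library):</p>

```csharp
using System;
using System.Threading;

static class ReentrancyDemo
{
    static readonly object syncLock = new object();

    static void Main()
    {
        lock (syncLock)
        {
            // Monitor is re-entrant: the *same* thread can take the lock
            // again; an internal counter tracks the nesting depth
            bool lockTaken = false;
            Monitor.TryEnter(syncLock, ref lockTaken);
            Console.WriteLine(lockTaken); // True
            if (lockTaken) Monitor.Exit(syncLock);
        }
    }
}
```

So if an `await` releases the thread and unrelated code on that same thread calls back in, the lock check succeeds when logically it shouldn't.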
<p>As a side note, it is worth knowing that the compiler <em>only</em> spots this (CS1996) if you use <code>lock</code>; if you use manual <code>Monitor</code> code (because of timeouts), it won't warn you - you just need to know not to do this (which perhaps by itself is good motivation for "<code>lock</code> with timeout" as a language feature). Fortunately, I <em>did</em> know not to do this - and I moved to the next most obvious locking primitive: <a href="https://docs.microsoft.com/en-us/dotnet/api/system.threading.semaphoreslim" rel="nofollow"><code>SemaphoreSlim</code></a>. A semaphore is like <code>Monitor</code>, but instead of being thread-based, it is purely counter-based. Theoretically you can use a semaphore to say "no more than 5 in here", but in reality it is often used as a mutex by saying "no more than 1". <code>SemaphoreSlim</code> is particularly enticing because it has both synchronous and asynchronous APIs, allowing us to split our code in two fairly neatly:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">readonly</span> <span class="pl-smi">SemaphoreSlim</span> <span class="pl-smi">singleWriter</span>
<span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">SemaphoreSlim</span>(<span class="pl-c1">1</span>); <span class="pl-c"><span class="pl-c">//</span> single writer</span>
<span class="pl-k">void</span> <span class="pl-en">WriteMessage</span>(<span class="pl-en">Message</span> <span class="pl-smi">message</span>)
{
<span class="pl-k">if</span> (<span class="pl-k">!</span><span class="pl-smi">singleWriter</span>.<span class="pl-en">Wait</span>(<span class="pl-smi">timeout</span>))
<span class="pl-en">ThrowTimeout</span>();
<span class="pl-k">try</span>
{
<span class="pl-en">ActuallyWriteTheThing</span>(<span class="pl-smi">message</span>);
<span class="pl-en">FlushSync</span>(); <span class="pl-c"><span class="pl-c">//</span> our hack from before</span>
}
<span class="pl-k">finally</span>
{
<span class="pl-smi">singleWriter</span>.<span class="pl-en">Release</span>();
}
}
<span class="pl-k">async</span> <span class="pl-en">ValueTask</span> <span class="pl-en">WriteMessageAsync</span>(<span class="pl-en">Message</span> <span class="pl-smi">message</span>)
{
<span class="pl-k">if</span> (<span class="pl-k">!</span><span class="pl-k">await</span> <span class="pl-smi">singleWriter</span>.<span class="pl-en">WaitAsync</span>(<span class="pl-smi">timeout</span>))
<span class="pl-en">ThrowTimeout</span>();
<span class="pl-k">try</span>
{
<span class="pl-en">ActuallyWriteTheThing</span>(<span class="pl-smi">message</span>);
<span class="pl-k">await</span> <span class="pl-en">FlushAsync</span>();
}
<span class="pl-k">finally</span>
{
<span class="pl-smi">singleWriter</span>.<span class="pl-en">Release</span>();
}
}</pre></div>
<p>This looks broadly similar to what we had before; the <code>new SemaphoreSlim(1)</code> initializes the semaphore with a limit of 1, i.e. a mutex. In the synchronous path, it works mostly like it always did, but the asynchronous path (used by the asynchronous callers) now <em>correctly</em> releases worker threads back to wherever worker threads go - either when they can't get the lock yet, <em>or</em> when they are waiting (or rather: awaiting) on the flush. We still have the sync-over-async in the sync path, but that's not really a problem in this case - but we've completely fixed the async path. Short of <em>removing</em> or <a href="https://github.com/aspnet/Announcements/issues/342" rel="nofollow"><em>optionally disabling</em> the sync path</a> (which is an idea I'm putting serious thought into doing, as an opt-in thing), that's probably about as good as we can get.</p>
<p>This looks like it should work, and the chances are that this would have completely solved the problems being seen by our consumers with heavily asynchronous workloads. But one of the nice things about working at Stack Overflow is that I have an opportunity to dogfood library releases under Stack Overflow load (which isn't "big big" by any stretch, but it is comfortably big enough to give me confidence that the library isn't pathologically broken). So, we dropped the above changes into production (after testing etc.), and: BOOM!</p>
<p>We went down.</p>
<h2><a id="user-content-what-happened-there" class="anchor" aria-hidden="true" href="#what-happened-there"></a>What happened there?</h2>
<p>Fortunately, we were lucky enough to manage to grab some process dumps from the production servers in their death-throes before we stood them back up (with the older version of the library), and the stack-traces in the doomed processes were very interesting; they are pretty verbose, but something that kept recurring (note: I've inverted and summarised this trace for readability):</p>
<pre><code>WriteMessage
…
System.Threading.SemaphoreSlim.Wait
…
System.Threading.SemaphoreSlim.WaitAsync
…
KERNELBASE!WaitForMultipleObjectsEx
</code></pre>
<p>This was the case for 650+ threads - almost all of them; and critically, <strong>no-one</strong> actually had the lock - nobody was doing anything useful. The semaphore had, in an edge case, failed to activate the lucky winner of the <a href="https://en.wikipedia.org/wiki/Lord_of_the_Flies" rel="nofollow">conch</a>.</p>
<h3><a id="user-content-what-actually-went-wrong" class="anchor" aria-hidden="true" href="#what-actually-went-wrong"></a>What actually went wrong?</h3>
<p>Looking at it, our <em>synchronous</em> <code>WriteMessage</code> implementation, when calling <code>Wait</code> on the semaphore, was calling into <code>WaitAsync</code>, and then blocking at the kernel for the object. Despite looking odd, this by itself <em>isn't actually</em> a terrible idea. It turns out that <code>SemaphoreSlim</code> has different strategies that it uses internally:</p>
<ul>
<li>if you're just using synchronous <code>Wait</code> calls, it can handle everything using regular synchronous code and synchronous blocks</li>
<li>if you're just using <code>WaitAsync</code>, because it wants to release the caller promptly, it needs to maintain a queue (actually a linked-list) of waiting callers as <code>Task&lt;bool&gt;</code>; when you release something, it takes the next item from one end of the list, and reactivates (<code>TrySetResult</code>) that caller</li>
<li>if you're using a mixture of <code>Wait</code> and <code>WaitAsync</code>, if it can't get access immediately, then it uses the <code>WaitAsync</code> approach so that the <code>Wait</code> and <code>WaitAsync</code> consumers are in the same queue - otherwise you'd have two separate queues and it gets very confusing and unpredictable</li>
</ul>
<p>Now this <em>seems</em> fine, but it turns out that the way it was using <code>TrySetResult</code> was… problematic. It wasn't using <code>TrySetResult</code> <em>directly</em>, but instead was <em>enqueuing</em> a work item to do the <code>TrySetResult</code>. There's actually a good - albeit now legacy - reason for this: thread stealing, <em>another</em> problem I've had to contend with many times.</p>
<p>When you call <code>TrySetResult</code> etc. on a <code>Task&lt;T&gt;</code> (usually via <code>TaskCompletionSource&lt;T&gt;</code>), it is possible (likely, even) that the <em>async continuation</em> is going to run immediately and inline <strong>on the thread that called <code>TrySetResult</code>.</strong> This is something you need to be really careful about - it can lead to dedicated IO threads somehow ending up serving web requests; or more generally: just … not doing what you expected. But in the scenario presented we got into a "spiral of death": due to a very brief blip from the <code>FlushAsync</code>, our workers had got stuck in the <code>Wait</code>-><code>WaitAsync</code> path, and the <strong>very thing</strong> that was meant to unblock everything: needed a worker. To release (resource) you need more of (resource), and (resource) is currently exhausted. It is <strong>almost impossible</strong> to recover from that situation due to the growth limits on workers, and the servers became increasingly unstable until they stopped working completely.</p>
<p>This is clearly a dangerous scenario, so we reported it as an issue, and amazingly within a day Stephen Toub had a <a href="https://github.com/dotnet/corefx/commit/ecba811b1438517ac90a957e2cfe4cef64a13861" rel="nofollow">surprisingly minimal and elegant fix for <code>SemaphoreSlim</code></a>. The commit message (and code changes themselves) explain it in more depth, but by configuring the queued tasks with the <code>TaskCreationOptions.RunContinuationsAsynchronously</code> flag, it means the "release" code can call <code>TrySetResult</code> <strong>directly</strong>, without needing an extra worker as an intermediary. In the specific case where the only thing waiting on the task is a synchronous <code>Wait</code>, the task code already has specific detection to unblock that scenario directly without needing a worker <em>at all</em>, and in the genuine <code>async</code>/<code>await</code> case, we just end up with the <em>actual work</em> going to the queue, rather than the "call <code>TrySetResult</code>" going to the queue. Tidy!</p>
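<p>The underlying behaviour is easy to observe with a standalone <code>TaskCompletionSource&lt;T&gt;</code>; the following console sketch (illustrative only - not the <code>SemaphoreSlim</code> internals) shows the continuation typically running inline on the completing thread by default, and being pushed to the thread-pool when the <code>RunContinuationsAsynchronously</code> flag is used:</p>

```csharp
using System;
using System.Threading.Tasks;

static class ThreadStealingDemo
{
    static async Task Observe(TaskCompletionSource<bool> tcs, string label)
    {
        await tcs.Task;
        // report which thread the continuation resumed on
        Console.WriteLine($"{label}: resumed on thread {Environment.CurrentManagedThreadId}");
    }

    static void Main()
    {
        var inline = new TaskCompletionSource<bool>();
        var queued = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        var t1 = Observe(inline, "default");
        var t2 = Observe(queued, "queued");

        Console.WriteLine($"completing on thread {Environment.CurrentManagedThreadId}");
        inline.TrySetResult(true); // continuation typically runs *inline*, right here
        queued.TrySetResult(true); // continuation goes to the thread-pool instead
        Task.WaitAll(t1, t2);
    }
}
```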
<h2><a id="user-content-but-that-isnt-the-end-of-the-story" class="anchor" aria-hidden="true" href="#but-that-isnt-the-end-of-the-story"></a>But that isn't the end of the story</h2>
<p>It would be nice to say "all's well that ends well; bug in <code>SemaphoreSlim</code> fixed", but it isn't as easy as that. The fix for <code>SemaphoreSlim</code> <em>has</em> been merged, but a) that won't help "us" until the next .NET Framework service release, and b) as library authors, we can't rely on which service releases are on our <em>consumers'</em> machines. We need a fix that works reliably everywhere. So whilst it is great to know that our pain has improved things for future users of <code>SemaphoreSlim</code>, we needed something more immediate and framework-independent. So that's when I went away and created a bespoke synchronous/asynchronous <code>MutexSlim</code> that we are now using in StackExchange.Redis.</p>
<p>It is <em>amazing</em> how much simpler things become if you limit yourself to "0 or 1 people in the pool", so it wasn't <em>actually</em> that much work; but: I thought I knew a lot about <code>async</code>/<code>await</code>, yet in writing <code>MutexSlim</code> I dove deeper into that topic than I have usually had to; and in the second part I'll talk about some of what I learned.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-24368155922713010612018-12-06T06:34:00.002-08:002018-12-06T06:52:28.274-08:00A Thanksgiving Carol<p>Normally I write about programming topics (usually .NET); today I'm going to veer very far from that track - and talk about society, mental health, individual and corporate responsibility, and personal relationships. I genuinely hope you hear me out, but if that isn't your thing ... well, then you probably need to read it more than most. I could try a clever reverse psychology trick to oblige you to see it through, but you'd see straight through it... or would you?</p>
<p>My apologies in advance if I seem to be on a negative tone through much of this - I'm pulling no punches in something that has been a quite deep - and painful - personal journey and realisation. I assure you that it ends much more positively than the body might suggest. Maybe for me this is mostly cathartic self-indulgence and rambling, but.. it's my personal blog and I get to do that if I want. But if it makes even one person think for a few minutes, it has been well worth my time.</p>
<p>So; on with the real title:</p>
<h1><a id="user-content-technology-is-outpacing-our-individual-and-societal-health" class="anchor" aria-hidden="true" href="#technology-is-outpacing-our-individual-and-societal-health"></a>Technology is Outpacing our Individual and Societal Health</h1>
<p>This week, I've identified hugely with that famous (infamous?) festive favorite: Ebenezer Scrooge (humbug!). Not the usury part - but instead:</p>
<ul>
<li>the familiar story of spending a long time making choices that cause harm</li>
<li>having some catastrophic event or events bring everything into focus</li>
<li>having a genuine yet painful inspection of those past (and present) choices</li>
<li>consideration of what those choices mean for the future</li>
<li>undergoing a fundamental transformation, a realignment of priorities and thinking, that should lead to a much happier future</li>
<li>actively walking that path with actions, not just hollow words</li>
</ul>
<p>See, I got heavy and personal! Let's see how deep this rabbit hole goes. How to start...</p>
<hr>
<p>Recently I nearly destroyed my marriage and a relationship of nearly 25 years.</p>
<p>As opening lines go, it isn't quite up there with "Marley was dead: to begin with.", but it's all I've got. It wasn't anything huge and obvious like an affair or a huge violent argument. What I did was to make - over an extended period of time - a series of bad choices about my relationship with technology.</p>
<p>The reality of the era is that we are absolutely surrounded by technology - it encroaches and invades on every aspect of our lives, and it has progressed so fast that we haven't really had time to figure out where "healthy" lies. I must immediately stress that I don't say this to absolve myself of responsibility; we're adults, and we must own the choices that we make, even if we make those choices in an environment that normalises them. So what do I mean?</p>
<p>Ultimately, the heart of my personal failings here stems from how easy - and tempting - it can be to lose ourselves in a digital world. We live in such a hyper-connected time, surrounded by flashing instant updates at every turn. It is alarmingly easy to confuse the signals that this electronic phantom universe provides, prioritising them over the real world in front of us. I'm sure we can all relate to seeing a group of people out together, whether at a bar, a meal, or some other social gathering - and seeing the mobile phones come out regularly. Don't get me started on the idiots who think they can <em>drive</em> while distracted by a phone. I'm certainly guilty of occasionally "parenting" by observing the digital-tablet-infused face of one of my children, by half-watching them over the top of a mobile. And I'd be lying if I said I'd never treated my marriage with the same over-familiarity bordering on contempt.</p>
<p>The digital world is so <em>easy and tempting</em>. Everything is immediate and easy. The real world takes effort, work, and time. When I was growing up, "allow 28 days for delivery" was a mantra; today, if something physical won't arrive within 28 <em>hours</em> we look at alternative vendors; for purely virtual items, we'd get twitchy and worried if it took 28 minutes.</p>
<p>I've reached the conclusion that among other things, I was - for want of a better word - in an addictive and unhealthy relationship with the internet. The internet is amazing and brilliant - and I'm not proposing we need to nuke it from orbit, but it is at our great peril that we think that it is always (or ever) without harm. We have grown complacent, when we should be treating it with respect and, yes, at times: fear - or at least concern.</p>
<p>We build a global platform for communicating data - all the shared collective knowledge and wisdom of the world past and present, and how do we choose to use it? If only it was "sharing cat pics", maybe the world would be a better place. Instead, <em>as people</em>, we mostly seem to use it for either validating ourselves in echo chambers (tip: nothing useful is ever achieved by listening to people you already agree with), or getting into angry anonymous rows with strangers. Either triggers a spurt of rewarding chemicals to the brain, but they're both usually entirely empty of any real achievement. If only that was the only mine to avoid.</p>
<h2><a id="user-content-perverse-incentives-and-eroded-psychological-walls" class="anchor" aria-hidden="true" href="#perverse-incentives-and-eroded-psychological-walls"></a>Perverse Incentives and Eroded Psychological Walls</h2>
<p>Again, I want to keep emphasizing that no personal responsibility is voided, but we haven't arrived at this place in isolation. At risk of sounding like a radical anti-capitalist (I'm not - really), corporate interests are actively averse to us having a healthy relationship with the internet. One way this materializes is in the notion of "engagement". Now; "engagement" by itself isn't an unreasonable measure, but as with most measures: the moment that we start treating it as a target, all hell breaks loose.</p>
<p>Because all genuine inspections should start at home, I'll start by talking about Stack Overflow. We have a measure there, on a user's profile page: consecutive days visited. We're not monsters, so we only display this on <em>your own</em> profile, but: I can only see negative things about this number. On its own, it adds nothing (not least: you can't compare it to anything), but: I know that at some point in history <em>I cared</em> about that number. I would try to do something, <em>anything</em> to not lose this number, including checking in while on family holidays. And here's the thing: the more you maintain it, <em>the more it feels to lose</em>. It is purely a psychological thing, but... when thinking about it, I can't think of a single positive use of this number. The <em>only</em> thing it does is encourage <strong>wholly harmful</strong> behaviours. I love our users, and I want them to be happy, rested, and healthy. Making users not want to <em>go even 24 hours without checking in with us</em> - well, that doesn't seem good to me. If anything, it sounds like a great way to cause burnout and frustration. I would <em>love</em> to start a conversation internally about whether we should just nuke that number entirely - or if anything, use it to prompt a user "hey, we really love you, but ... maybe take a day off? we'll see you next week!". As a counterpoint to that: we actively enforce a daily "rep cap", which I think is a hugely positive thing towards sensible and healthy usage; I just want to call that out for balance and fairness.</p>
<p>Now consider: in the grand scheme of things: <em>we're cuddly kittens</em>. Just think what the Facebooks, Googles, etc are doing with psychological factors to drive "engagement". We've already seen the disclosures about Facebook's manipulation of feeds to drive specific responses. Corporations are often perversely incentivized to be at odds with healthy engagement. We can see this most clearly in sectors like gambling, pornography, gaming (especially in-game/in-app purchases, "pay to win"), drugs (whether legal or illicit), "psychics" (deal with the air-quotes) etc. Healthy customers are all well and good, but you make most of your money from the customers with <em>unhealthy</em> relationships. The idea of fast-eroding virtual "credit" is rife. If I can pick another example: I used to play quite a bit of <em>Elite: Dangerous</em>; I stopped playing around the time of the "Powerplay" update, which involved a mechanic around "merits" with a <em>steep</em> decay cycle: if you didn't play <em>significant</em> amounts of grind <em>every</em> week (without fail): you'd basically always have zero merits. This is far from unusual in today's games, especially where an online component exists. I've seen YouTube content creators talking about how they strongly feel that if they don't publish on a hard schedule, their channel tanks - and it doesn't even matter whether they're right: their behaviour is driven by the perception, not cold reality (whatever it may be).</p>
<p>I now accept that I had developed some unhealthy relationships with the internet. It hugely impacted my relationships at home, both in quality and quantity. I would either be unavailable, or when I was available, I would be... distracted. Checking my phone <em>way</em> too often - essentially not really present, except in the "meat" sense. Over time, this eroded things. Important things.</p>
<p>And yet as a society we've normalized it.</p>
<p>Let's look at some of the worst examples from above - gambling, pornography, drugs, etc: it used to be that if you had a proclivity in those directions, there would be some psychological or physical barrier: you'd need to go to the book-maker or casino, or that seedy corner-shop, or find a dealer. Now we have all of those things in our pocket, 24/7, offering anonymous instant access to the best and worst of everything the world has to offer. How would you know that your colleague has a gambling problem, when placing a bet looks identical to responding to a work email? As if that wasn't enough, we've even invented new ways of paying - "crypto-currency" - the key purposes of which are (in no particular order) "to ensure we don't get caught" and "to burn electricity pointlessly". There is possibly some third option about "decentralization" (is that just another word for "crowd-sourced money-laundering"? I can't decide), but I think we all know that in reality for most regular crypto-currency users this is a very far third option; it is probably more important for the organised criminals using it, but... that's another topic.</p>
<h2><a id="user-content-we-need-to-maintain-vigilance" class="anchor" aria-hidden="true" href="#we-need-to-maintain-vigilance"></a>We Need to Maintain Vigilance</h2>
<p>I wouldn't be saying all this if I thought it was all doom. I <em>do</em> think we've reached a fundamentally unhealthy place with technology; maybe we've been over-indulging in an initial excited binge, but: we <em>really</em> need to <strong>get over it</strong> and see where we're harming and being harmed. We <em>don't</em> need to obsess over our phones - those little notifications mean <em>almost nothing</em>. I'm <strong>absolutely not</strong> saying that I'm detaching myself from the internet, but I <em>am</em> treating it with a lot more respect - and caution. I'm actively limiting the times that I engage to times that <em>I am comfortable with</em>. There are very few things that are important enough to need your constant attention; things can wait. For most things: if it is genuinely urgent, <em>someone will simply call you</em>. I've completely and irrevocably blocked my access to a range of locations that (upon introspection) I found myself over-using, but which weren't helping me as a person - again, hollow validation like echo-chambers and empty arguments. I can limit my usage of things like "twitter" to useful professional interactions, not the uglier side of twitter politics. And I can ensure that in the time I spend with my family: I'm <em>actually there</em>. In mind and person, not just body. I've completely removed technology from the bedroom - and no, I'm not being crude there - there is a lot of important and useful discussion and just closeness time to be had there, without touching on more ... "intimate" topics. You really, <em>really</em> don't need to check your inbox while on the toilet - nobody deserves that; just leave the phone outside.</p>
<p>I got lucky; whatever problems I had, I was able to identify, isolate, and work through before they caused total destruction - and I need to be thankful for the support and patience of my wife. But it was genuinely close, and I need to acknowledge that. I'm happier today - and closer to my wife - than I have been in a long long time, mostly through my own casual fault. I'm cautious that the next person might not be so lucky. I'm also terrified about the upcoming generation of children who have very little baseline to compare to. What, for them, is "normal"? How much time at school and home are we dedicating to teaching these impressionable youths successful tactics for navigating the internet, and what that means for their "real" life? I think we can agree that when we hear of "Fortnite", "kids" and "rehab" being used in the same sentence: something is wrong somewhere.</p>
<p>Maybe somewhere along the line we (culture) threw the baby out with the bathwater. I'm not at all a religious person, but if I look at most established religions with that atheistic lens, I have to acknowledge that among the superstition: there are some good wisdoms about leading a good and healthy life - whether by way of moral codes (that vary hugely by religion), or by instilling a sense of personal accountability and responsibility, or by the simple act of finding time to sit quietly - regularly - and be honestly contemplative. To consider the consequences of our actions, even - perhaps especially - when we haven't had to do so directly. Humility, patience, empathy. I know in the past I've been somewhat dismissive of even non-theistic meditation, but: I suspect that it is something that I might now be in a position to appreciate.</p>
<p>To re-state: I'm OK; I am (and in terms of my marriage: we are) in a much better, stronger, healthier place than I (we) have been in a long time. I've had my Thanksgiving Miracle, and I've come out the other side with a renewed energy, and some fundamentally altered opinions. I'm interested in your thoughts here, but I'm not opening comments; again - we've made it too easy and anonymous! If you want to email me on this, please do (marc.gravell at gmail.com - if you could use "Thanksgiving Carol" in the subject, that'd really help me organize my inbox); I may respond, but I won't guarantee it, and I certainly won't guarantee an immediate response. I'm also deliciously conscious of the apparent irony of my blogging about the harms of the internet. But: if - as Joel assures me - "Developers are Writing the Script for the Future" - we need to start being a bit more outspoken about what that script says, and calling out when some measure of "success" of a product or service is likely impactful to healthy usage.</p>
<p>Closing: technology is great, the internet is great; but: we need to treat them with respect, and use them in sensible moderation. And pay <em>lots</em> more attention to the real world.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-70833534980737196362018-09-08T02:35:00.001-07:002018-09-08T02:35:28.785-07:00Monotoolism<h1><a id="One_Tool_To_Rule_Them_All_0"></a>One Tool To Rule Them All</h1>
<p>A recent twitter thread reminded me of a trope that I see frequently as a library author (and just as a general observer) - let’s call it “monotoolism”.</p>
<p>Examples of this might be examples like:</p>
<ul>
<li>“wait, you’re still using ‘LINQ to SQL’? I thought you were using ‘Dapper’?”</li>
<li>“Google’s protobuf implementation provides opinionated JSON parsing, but my JSON doesn’t fit that layout - how do I get the library to parse my layout?”</li>
<li><a href="https://stackoverflow.com/a/1732454/23354">“how do I parse HTML with a regular expression?”</a></li>
<li>etc</li>
</ul>
<p>The common theme here being the expectation that once you have <em>one</em> tool in a codebase that fits a particular category: <em>that’s it</em> - there is one and only one tool against each category; one “data access tool”, one “string parsing tool”, etc.</p>
<p>This has always irked me. I <em>understand where people are coming from</em> - they don’t want an <em>explosion</em> of different tools to have to worry about:</p>
<ul>
<li>they don’t want an overly complex dependency tree</li>
<li>they don’t want to have to check licensing / compliance etc against a huge number of libraries</li>
<li>they don’t want to have to train everyone to use a plethora of tools</li>
<li>etc</li>
</ul>
<p>It absolutely makes sense to minimize the dependency count, and to remove unnecessary library overlap. But the key word in that sentence: “unnecessary” - and I mean that in a fairly loose sense: you can use the handle of a screwdriver to drive in a nail if you try hard enough, but it is <em>much easier</em> (and you get a better outcome) if you use a hammer. I think I’d include a hammer as a “necessary” tool alongside a set of screwdrivers if you’re doing any form of construction (but is that a metric or imperial hammer?).</p>
<p>I often see people either expressing frustration that their chosen “one tool to rule them all” can’t do tangentially-related-feature-X, or bending their code <em>massively</em> out of shape to try to make it do it; sometimes they even succeed, which is even scarier as a library author - because now there’s some completely undesigned-for, unspecified, undocumented and just <em>unknown</em> usage in the wild (quite possibly abusing reflection to push buttons that aren’t exposed) that the library author is going to get yelled at when it breaks.</p>
<p><a href="https://xkcd.com/1172/"><img src="https://imgs.xkcd.com/comics/workflow.png" alt="XKCD: Workflow"></a></p>
<h1><a id="It_is_OK_to_use_more_than_one_tool_26"></a>It is OK to use more than one tool!</h1>
<p>Yes, it is desirable to minimize the number of unnecessary tools. But: <strong>it is OK to use more than one tool</strong>. Expected, even. You <em>absolutely should</em> be wary of uncontrolled tool propagation, but I strongly advocate <em>against</em> being too aggressive with rebukes along the lines of:</p>
<blockquote>
<p>We already have a tool that does something kinda like that; can you just torture the tool and the domain model a bit and see if you can make it just about work?</p>
</blockquote>
<p>Remember, the options here are:</p>
<ol>
<li>two (or more) different tools, each used in their intended way, closely following their respective documented examples in ways that are “obviously right” and which it is easy to ask questions of the library authors or the community</li>
<li>one single tool, tortured and warped beyond recognition, looking nothing like… <em>anything</em>, where even the tool’s authors can’t understand what you’re doing (let alone why, and they’re probably too afraid to ask), where you’re the <em>only usage like that, ever</em>, and where your “elegant hack” might stop working in the next minor revision, because it wasn’t a tested scenario</li>
</ol>
<p>I prefer “1”. It’ll keep your model cleaner. It’ll make your relationship with the tool more successful. Yes, it will mean that you occasionally need more than one tool listed in a particular box. Deal with it! If the tool <em>really is</em> complex enough that this is problematic, just move the ugly complexity behind some abstraction, then only a limited number of people need to worry about <em>how</em> it works.</p>
<p>Always use the right tool for the job.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-2090008337841825372018-08-02T16:26:00.001-07:002018-08-02T16:46:04.979-07:00protobuf-net, August 2018 update<h1>An update on what's happening with <code>protobuf-net</code></h1>
<p>Headline: .proto processing now works directly from <code>dotnet build</code> and MSBuild, without any need for DSL processing steps; and - new shiny things in the future.</p>
<hr>
<p>I haven't spoken about protobuf-net for quite a while, but: it is <em>very much</em> alive and active. However, I really should do a catch-up, and I'm <em>really</em> excited about where we are.</p>
<h2><a id="user-content-level-100-primer-if-you-dont-know-what-protobuf-is" class="anchor" aria-hidden="true" href="#level-100-primer-if-you-dont-know-what-protobuf-is"></a>Level 100 primer, if you don't know what "protobuf" is</h2>
<p>"protobuf" is <a href="https://developers.google.com/protocol-buffers/" rel="nofollow">Protocol Buffers</a>, Google's cross-platform/language/OS/etc serialization format (and associated tools). It is <em>primarily</em> a dense binary format, but a JSON variant also exists. A lot of Google's public and private APIs are protobuf, but it is used widely outside of Google too.</p>
<p>The data/schema is <em>often</em> described via a custom DSL, <a href="https://developers.google.com/protocol-buffers/docs/proto" rel="nofollow">.proto</a> - which comes in 2 versions (proto2 and proto3). They both describe the same binary format.</p>
<p>Google provide implementations for a range of platforms including C# (note: "proto3" only), but ... I kinda find the "DSL first, always" approach limiting (I like the flexibility of "code first"), and ... the Google implementation is "Google-idiomatic", rather than ".NET idiomatic".</p>
<p>Hence <a href="https://www.nuget.org/packages/protobuf-net/" rel="nofollow">protobuf-net exists</a>; it is a fast/dense binary serializer that implements the protobuf specification, but which is .NET-idiomatic, and allows either code-first or DSL-first. I use it a lot.</p>
<p>Historically, it was biased towards "code first", with the "DSL first" tools a viable but more awkward option.</p>
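<p>By way of a quick illustration, "code first" usage looks something like this (a minimal sketch; the <code>Customer</code> type and its members are purely illustrative):</p>

```csharp
using System.IO;
using ProtoBuf; // from the protobuf-net NuGet package

[ProtoContract]
public class Customer
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

static class Demo
{
    static void Main()
    {
        // round-trip via the dense protobuf binary format
        using var ms = new MemoryStream();
        Serializer.Serialize(ms, new Customer { Id = 123, Name = "Fred" });
        ms.Position = 0;
        var clone = Serializer.Deserialize<Customer>(ms);
        // clone now carries the same Id and Name as the original
    }
}
```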
<h2><a id="user-content-whats-changed-lately" class="anchor" aria-hidden="true" href="#whats-changed-lately"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What's changed lately?</h2>
<h3><a id="user-content-bespoke-managed-dsl-parser" class="anchor" aria-hidden="true" href="#bespoke-managed-dsl-parser"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Bespoke managed DSL parser</h3>
<p>Just over a year ago now, back in 2.3.0, I released a new set of DSL parsing tools. In the past, protobuf-net's tooling (<code>protogen</code>) made use of Google's <code>protoc</code> tool - a binary executable that processes .proto files - but this was <em>incredibly</em> awkward to deploy between platforms. Essentially, the tools would <em>probably</em> work on Windows, but that was about it. This wasn't a great option going forward, so I implemented a completely bespoke 100% managed-code parser and code-generator that didn't depend on <code>protoc</code> at all. <code>protogen</code> was reborn (and it works with both "proto2" and "proto3"), but it lacked a good deployment route.</p>
<h3><a id="user-content-playground-website" class="anchor" aria-hidden="true" href="#playground-website"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Playground website</h3>
<p>Next, I threw together <a href="https://protogen.marcgravell.com/" rel="nofollow">protogen.marcgravell.com</a>. This is an ASP.NET Core web app that uses the same library code as <code>protogen</code>, but in an interactive web app. This makes for a pretty easy way to play with .proto files, including a code-editor and code generator. It also hosts <code>protoc</code>, if you prefer that - and includes a <em>wide range</em> of Google's API definitions available as <code>import</code>s. This is a very easy way of working with casual .proto usage, and it provides a download location for the standalone <code>protogen</code> tools. It isn't going to win any UI awards, but it works. It even <a href="https://protogen.marcgravell.com/decode" rel="nofollow">includes a decoder</a>, if you want to understand serialized protobuf data.</p>
<h3><a id="user-content-global-tools" class="anchor" aria-hidden="true" href="#global-tools"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Global tools</h3>
<p>Having a download for the command-line tools is a great step forward, but ... it is still a lot of hassle. If only there were a way of installing managed-code developer tools in a convenient way. Well, there is: .NET "global tools"; so, a few months ago I added <a href="https://www.nuget.org/packages/protobuf-net.Protogen/" rel="nofollow"><code>protobuf-net.Protogen</code></a>. As a "global tool", this can be installed once via</p>
<pre><code>dotnet tool install --global protobuf-net.Protogen
</code></pre>
<p>and then <code>protogen</code> will be available anywhere, as a development tool. Impressively, "global tools" work <em>between operating systems</em>, so the exact same package will also work on Linux (and presumably Mac). This starts to make .proto very friendly to work with, as a developer.</p>
<h3><a id="user-content-build-tools" class="anchor" aria-hidden="true" href="#build-tools"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Build tools</h3>
<p>I'm going to be frank and honest: MSBuild scares the bejeezus out of me. I don't understand .targets files, etc. It is a huge blind-spot of mine, but I've made my peace with that reality. So... I was <em>genuinely delighted</em> to receive a pull request from <a href="https://twitter.com/markpflug" rel="nofollow">Mark Pflug</a> that fills in the gaps! What this adds is <a href="https://www.nuget.org/packages/protobuf-net.MSBuild" rel="nofollow"><code>protobuf-net.MSBuild</code></a> - tools that tweak the build process from <code>dotnet build</code> and <code>MSBuild</code>. What this means is that you can just install <code>protobuf-net.MSBuild</code> into a project, and it automatically runs the .proto → C# code-generation steps <em>as part of build</em>. You can maintain your .proto files without any need to generate the C# as a separate step. You can still extend the <code>partial</code> types in the usual way. All you need to do is make sure the .proto files are in the project. It even includes the common Google <code>import</code> additions for free (without any extra files required), so: if you know what a <code>.google.protobuf.timestamp.Timestamp</code> is - know that it'll work without you having to add the relevant .proto file manually (although you still need the <code>import</code> statement).</p>
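<p>To make the "extend the <code>partial</code> types" point concrete, here's roughly what that looks like (a sketch - the generated half is simplified, and the type name is illustrative):</p>

```csharp
using ProtoBuf;

// roughly what the build-time code-gen produces from the .proto (simplified):
[ProtoContract]
public partial class Customer
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

// your own file: extend the generated type without touching generated code
public partial class Customer
{
    // extra members that are not part of the serialized contract
    public override string ToString() => $"Customer #{Id}: {Name}";
}
```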
<p>I can't overstate how awesome I think these tools are, and how much friendlier it makes the "DSL first" scenario. Finally, protobuf-net can use .proto as a truly first-class experience. Thanks again, Mark Pflug!</p>
<h2><a id="user-content-what-next" class="anchor" aria-hidden="true" href="#what-next"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What next?</h2>
<p>That's where we are <em>today</em>, but: to give an update on my plans and priorities going forwards...</p>
<h3><a id="user-content-spans-and-pipelines" class="anchor" aria-hidden="true" href="#spans-and-pipelines"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Spans and Pipelines</h3>
<p>You might have noticed me talking about these a little lately; I've done lots of <em>research</em> to look at what protobuf-net might do with these, but it is probably time to start looking at doing it "for real". The first step there is getting some real timings on the performance difference between a few different approaches.</p>
<h3><a id="user-content-aot" class="anchor" aria-hidden="true" href="#aot"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>AOT</h3>
<p>In particular: platforms that don't allow IL-emit. This helps consumers like UWP, Unity, iOS, etc. They usually currently <em>work</em> with protobuf-net, but via huge compromises. To do better, we need to radically overhaul how we approach those platforms. I see two viable avenues to explore there.</p>
<ol>
<li>
<p>we can enhance the .proto codegen (the bits that <code>protobuf-net.MSBuild</code> just made tons better), to include generation of <em>the actual serialization code</em></p>
</li>
<li>
<p>we can implement Roslyn-based tools that pull apart code-first usage to understand the model, and emit the serialization code at build time</p>
</li>
</ol>
<p>All of these are going to keep me busy into the foreseeable!</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-64192061156846879472018-07-30T03:52:00.003-07:002018-07-30T06:27:11.319-07:00Pipe Dreams, part 3.1<h1><a id="user-content-pipelines---a-guided-tour-of-the-new-io-api-in-net-part-31" class="anchor" aria-hidden="true" href="#pipelines---a-guided-tour-of-the-new-io-api-in-net-part-31"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Pipelines - a guided tour of the new IO API in .NET, part 3.1</h1>
<p>(<a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-1.html" rel="nofollow">part 1</a>, <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-2.html" rel="nofollow">part 2</a>, <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-3.html" rel="nofollow">part 3</a>)</p>
<p>After part 3, I got some great feedback - mostly requests to clarify things that I touched on, but could do with further explanation. Rather than make part 3 <em>even longer</em>, I want to address those here! Yay, more words!</p>
<hr>
<h3><a id="user-content-isnt-arraypoolownert-doing-the-same-thing-as-memorypoolt-why-dont-you-just-use-memorypoolt-" class="anchor" aria-hidden="true" href="#isnt-arraypoolownert-doing-the-same-thing-as-memorypoolt-why-dont-you-just-use-memorypoolt-"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Isn't <code>ArrayPoolOwner<T></code> doing the same thing as <code>MemoryPool<T></code>? Why don't you just use <code>MemoryPool<T></code> ?</h3>
<p>Great question! I didn't actually mention <code>MemoryPool<T></code>, so I'd better introduce it.</p>
<p><code>MemoryPool<T></code> is an <code>abstract</code> base type that offers an API of the form:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-k">abstract</span> <span class="pl-k">class</span> <span class="pl-en">MemoryPool</span><<span class="pl-en">T</span>> : <span class="pl-en">IDisposable</span>
{
<span class="pl-k">public</span> <span class="pl-k">abstract</span> <span class="pl-en">IMemoryOwner</span><<span class="pl-en">T</span>> <span class="pl-en">Rent</span>(<span class="pl-k">int</span> <span class="pl-smi">minBufferSize</span> <span class="pl-k">=</span> <span class="pl-k">-</span><span class="pl-c1">1</span>);
<span class="pl-c"><span class="pl-c">//</span> not shown: a few unrelated details</span>
}</pre></div>
<p>As you can see, this <code>Rent()</code> method looks exactly like what we were looking for before - it takes a size and returns an <code>IMemoryOwner<T></code> (to provide a <code>Memory<T></code>), with it being returned from whence it came upon disposal.</p>
<p><code>MemoryPool<T></code> also has a default implementation (<code>public static MemoryPool<T> Shared { get; }</code>), which returns a <code>MemoryPool<T></code> that is based on the <code>ArrayPool<T></code> (i.e. <a href="https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Buffers/ArrayMemoryPool.cs" rel="nofollow"><code>ArrayMemoryPool<T></code></a>). The <code>Rent()</code> method returns an <a href="https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Buffers/ArrayMemoryPool.ArrayMemoryPoolBuffer.cs" rel="nofollow"><code>ArrayMemoryPoolBuffer</code></a>, which looks <em>remarkably like</em> the thing that I called <code>ArrayPoolOwner<T></code>.</p>
<p>So: a very valid question would be: "Marc, didn't you just re-invent the default memory pool?". The answer is "no", but it is for a <em>very</em> subtle reason that I probably should have expounded upon at the time.</p>
<p>The problem is in the name <code>minBufferSize</code>; well... not really the <em>name</em>, but the <em>consequence</em>. What this means is: when you <code>Rent()</code> from the default <code>MemoryPool<T>.Shared</code>, the <code>.Memory</code> that you get back will be <em>over-sized</em>. Often this isn't a problem, but in our case we <em>really</em> want the <code>.Memory</code> to represent the actual number of bytes that were sent (<em>even if</em> we are, behind the scenes, using a larger array from the pool to contain it).</p>
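<p>This is easy to see in isolation (the exact sizes are an implementation detail, but the "minimum, not exact" behaviour is the point):</p>

```csharp
using System;
using System.Buffers;

static class Program
{
    static void Main()
    {
        // ArrayPool<T> (and hence the default MemoryPool<T>.Shared)
        // treats the requested size as a *minimum*
        byte[] arr = ArrayPool<byte>.Shared.Rent(100);
        Console.WriteLine(arr.Length); // typically 128, not 100
        ArrayPool<byte>.Shared.Return(arr);

        using IMemoryOwner<byte> lease = MemoryPool<byte>.Shared.Rent(100);
        Console.WriteLine(lease.Memory.Length); // also >= 100, not == 100
    }
}
```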
<p>We <em>could</em> use an extension method on arbitrary memory pools to <em>wrap</em> potentially oversized memory, i.e.</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-k">static</span> <span class="pl-en">IMemoryOwner</span><<span class="pl-en">T</span>> <span class="pl-en">RentRightSized</span><<span class="pl-en">T</span>>(
<span class="pl-k">this</span> <span class="pl-en">MemoryPool</span><<span class="pl-en">T</span>> <span class="pl-smi">pool</span>, <span class="pl-k">int</span> <span class="pl-smi">size</span>)
{
<span class="pl-k">var</span> <span class="pl-smi">leased</span> <span class="pl-k">=</span> <span class="pl-smi">pool</span>.<span class="pl-en">Rent</span>(<span class="pl-smi">size</span>);
<span class="pl-k">if</span> (<span class="pl-smi">leased</span>.<span class="pl-smi">Memory</span>.<span class="pl-smi">Length</span> <span class="pl-k">==</span> <span class="pl-smi">size</span>)
<span class="pl-k">return</span> <span class="pl-smi">leased</span>; <span class="pl-c"><span class="pl-c">//</span> already OK</span>
<span class="pl-k">return</span> <span class="pl-k">new</span> <span class="pl-en">RightSizeWrapper</span><<span class="pl-en">T</span>>(<span class="pl-smi">leased</span>, <span class="pl-smi">size</span>);
}
<span class="pl-k">class</span> <span class="pl-en">RightSizeWrapper</span><<span class="pl-en">T</span>> : <span class="pl-en">IMemoryOwner</span><<span class="pl-en">T</span>>
{
<span class="pl-k">public</span> <span class="pl-en">RightSizeWrapper</span>(
<span class="pl-en">IMemoryOwner</span><<span class="pl-en">T</span>> <span class="pl-smi">inner</span>, <span class="pl-k">int</span> <span class="pl-smi">length</span>)
{
<span class="pl-smi">_inner</span> <span class="pl-k">=</span> <span class="pl-smi">inner</span>;
<span class="pl-smi">_length</span> <span class="pl-k">=</span> <span class="pl-smi">length</span>;
}
<span class="pl-en">IMemoryOwner</span><<span class="pl-en">T</span>> <span class="pl-smi">_inner</span>;
<span class="pl-k">int</span> <span class="pl-smi">_length</span>;
<span class="pl-k">public</span> <span class="pl-k">void</span> <span class="pl-en">Dispose</span>() <span class="pl-k">=></span> <span class="pl-smi">_inner</span>.<span class="pl-en">Dispose</span>();
<span class="pl-k">public</span> <span class="pl-en">Memory</span><<span class="pl-en">T</span>> <span class="pl-smi">Memory</span>
<span class="pl-k">=></span> <span class="pl-smi">_inner</span>.<span class="pl-smi">Memory</span>.<span class="pl-en">Slice</span>(<span class="pl-c1">0</span>, <span class="pl-smi">_length</span>);
}</pre></div>
<p>but... this would mean allocating <em>two</em> objects for most leases - one for the <em>actual</em> lease, and one for the thing that fixes the length. So, since we only <em>really</em> care about the array-pool here, it is preferable IMO to cut out the extra layer, and write our own right-sized implementation from scratch.</p>
<p>So: that's the difference in the reasoning and implementation. As a side note, though: it prompts the question as to whether I should refactor my API to actually implement the <code>MemoryPool<T></code> API.</p>
<hr>
<h3><a id="user-content-you-might-not-want-to-complete-with-success-if-the-cancellation-token-is-cancelled" class="anchor" aria-hidden="true" href="#you-might-not-want-to-complete-with-success-if-the-cancellation-token-is-cancelled"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>You <em>might</em> not want to complete with success if the cancellation token is cancelled</h3>
<p>This is in relation to the <code>while</code> in the read loop:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">while</span> (<span class="pl-k">!</span><span class="pl-smi">cancellationToken</span>.<span class="pl-smi">IsCancellationRequested</span>)
{...}</pre></div>
<p>The <em>more typical</em> expectation for cancellation is for it to throw with a cancellation exception of some kind; therefore, if it <em>is</em> cancelled, I might want to reflect that.</p>
<p>This is very valid feedback! Perhaps the most practical fix here is simply to use <code>while (true)</code> and let the subsequent <code>await reader.ReadAsync(cancellationToken)</code> worry about what cancellation should look like.</p>
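<p>In other words, something like this sketch (the frame-processing body is elided):</p>

```csharp
using System.Buffers;
using System.IO.Pipelines;
using System.Threading;
using System.Threading.Tasks;

static class ReadLoop
{
    public static async Task ProcessAsync(
        PipeReader reader, CancellationToken cancellationToken)
    {
        while (true)
        {
            // cancellation now surfaces as an OperationCanceledException,
            // rather than exiting the loop as though we completed normally
            ReadResult result = await reader.ReadAsync(cancellationToken);
            ReadOnlySequence<byte> buffer = result.Buffer;

            // ... process whatever complete frames 'buffer' contains ...

            reader.AdvanceTo(buffer.Start, buffer.End);
            if (result.IsCompleted) break; // the writer has finished
        }
        reader.Complete();
    }
}
```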
<hr>
<h3><a id="user-content-you-should-clarify-about-testing-the-result-in-async-sync-path-scenarios" class="anchor" aria-hidden="true" href="#you-should-clarify-about-testing-the-result-in-async-sync-path-scenarios"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>You should clarify about testing the result in async "sync path" scenarios</h3>
<p>In my aside about <code>async</code> uglification (optimizing when we <em>expect</em> it to be synchronous in most cases), I omitted to talk about
getting results from the pseudo-awaited operation. Usually, this comes down to calling <code>.Result</code> on an awaitable (something like
a <code>Task<T></code> or <code>ValueTask<T></code>), or <code>.GetResult()</code> on an awaiter (the thing you get from <code>.GetAwaiter()</code>). I haven't done it <em>in the
example</em> because in <code>async</code> terms this would simply have been an <code>await theThing;</code> usage, not a <code>var local = await theThing;</code> usage; but
you can if you need that.</p>
<p>I must, however, clarify a few points that perhaps weren't clear:</p>
<ul>
<li>you <strong>should not</strong> (usually) try to access the <code>.Result</code> of a task <strong>unless</strong> you know that it
has already completed</li>
<li>knowing that it has completed <em>isn't enough</em> to know that it has completed <em>successfully</em>; if you only test "is completed", you can use <code>.GetResult()</code> on the awaiter to <em>check for exceptions</em> while also fetching the result (which you can then discard if you like)</li>
<li>in my case, I'm taking a shortcut by checking <code>IsCompletedSuccessfully</code>; this exists on <code>ValueTask[<T>]</code> (and on <code>Task[<T>]</code> in .NET Core 2.0, else you can check <code>.Status == TaskStatus.RanToCompletion</code>) - which is only <code>true</code> in the "completed without an exception" case</li>
<li>because of expectations around how exceptions on <code>async</code> operations are wrapped and surfaced, it is almost always preferable to just switch into the <code>async</code> flow if you know a task has faulted, and just <code>await</code> it; the compiler knows how to get the exception out in the most suitable way, so: let it do the hard work</li>
</ul>
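<p>Putting those points together, the "hot path" check looks something like this (a sketch; the method names are illustrative):</p>

```csharp
using System.Threading.Tasks;

static class FastPath
{
    // avoid the async state machine entirely when the operation
    // completed synchronously and successfully
    public static Task<int> ConsumeAsync(ValueTask<int> pending)
    {
        if (pending.IsCompletedSuccessfully)
        {
            int bytes = pending.Result; // safe: completed without fault
            return Task.FromResult(bytes);
        }
        // incomplete *or* faulted: switch to the async flow and let
        // 'await' surface any exception in the usual way
        return Awaited(pending);

        static async Task<int> Awaited(ValueTask<int> t) => await t;
    }
}
```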
<hr>
<h3><a id="user-content-you-should-explain-more-about-valuetaskt-vs-taskt---not-many-people-understand-them-well" class="anchor" aria-hidden="true" href="#you-should-explain-more-about-valuetaskt-vs-taskt---not-many-people-understand-them-well"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>You should explain more about <code>ValueTask[<T>]</code> vs <code>Task[<T>]</code> - not many people understand them well</h3>
<p>OK! Many moons ago, <code>Task<T></code> became a thing, and all was well. <code>Task<T></code> actually happened <em>long before</em> C# had any kind of support
for <code>async</code>/<code>await</code>, and the main scenarios it was concerned about were <em>genuinely asynchronous</em> - it was <em>expected</em> that the answer
<em>would not be immediately available</em>. So, the overhead of allocating a placeholder object was fine and dandy, dandy and fine.</p>
<p>As the usage of <code>Task<T></code> grew, and the language support came into effect, it started to become clear that there were many cases where:</p>
<ul>
<li>the operation would often be available immediately (think: caches, buffered data, uncontested locking primitives, etc)</li>
<li>it was being used <em>inside a tight loop</em>, or just at high frequency - i.e. something that happens thousands of times a second (file IO, network IO, synchronization over a collection, etc)</li>
</ul>
<p>When you put those two things together, you find yourself allocating <em>large numbers</em> of objects for something that <em>was only rarely actually asynchronous</em> (so: when there <em>wasn't</em> data available in the socket, or the file buffer was empty, or the lock was contested). For some scenarios, there are pre-completed reusable task instances available (such as <code>Task.CompletedTask</code>, and inbuilt handling for some low integers), but this doesn't help if the return value is outside this very limited set. To help avoid the allocations in the general case, <code>ValueTask[<T>]</code> was born. A <code>ValueTask[<T>]</code> is a <code>struct</code> that implements the "awaitable" pattern (a duck-typed pattern, like <code>foreach</code>, but that's a story for another day), that essentially contains two fields:</p>
<ul>
<li>a <code>T</code> if the value was known immediately (obviously not needed for the untyped <code>ValueTask</code> case)</li>
<li>a <code>Task<T></code> if the value is pending and the answer depends on the result of the incomplete operation</li>
</ul>
<p>That means that <em>if the value is known now</em>, no <code>Task[<T>]</code> (and no corresponding <code>TaskCompletionSource<T></code>) <em>ever needs to be allocated</em> - we just throw back the <code>struct</code>, it gets unwrapped by the <code>async</code> pattern, and life is good. Only in the case where the operation is <em>actually asynchronous</em> does an object need to be allocated.</p>
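<p>A sketch of the classic "cache hit" shape (the cache itself is illustrative):</p>

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class ValueCache
{
    private readonly ConcurrentDictionary<string, int> _cache = new();

    public ValueTask<int> GetAsync(string key)
        => _cache.TryGetValue(key, out var value)
            ? new ValueTask<int>(value)           // hot path: no allocation
            : new ValueTask<int>(LoadAsync(key)); // cold path: wraps a Task<int>

    private async Task<int> LoadAsync(string key)
    {
        await Task.Delay(10);   // simulate genuinely asynchronous work
        var value = key.Length; // illustrative "load"
        _cache[key] = value;
        return value;
    }
}
```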
<p>Now, there are three common views on what we should do with this:</p>
<ol>
<li>always expose <code>Task[<T>]</code>, regardless of whether it is likely to be synchronous</li>
<li>expose <code>Task[<T>]</code> if we know it will be async, expose <code>ValueTask[<T>]</code> if we think it <em>may</em> be synchronous</li>
<li>always expose <code>ValueTask[<T>]</code></li>
</ol>
<p>Frankly, the only valid reason to use <code>1</code> is because your API surface was baked and fixed back before <code>ValueTask[<T>]</code> existed.</p>
<p>The choice between <code>2</code> and <code>3</code> is interesting; what we're actually talking about there is an implementation detail, so a good case <em>could be argued</em> for <code>3</code>, allowing you to amaze yourself later if you find a way of doing something synchronously (where it was previously asynchronous), without breaking the API. I went for <code>2</code> in the code shown, but it would be something I'd be willing to change without much prodding.</p>
<p>You should also note that there is actually a fourth option: <strong>use custom awaitables</strong> (meaning: a custom type that implements the "awaitable" duck-typed pattern). This is an advanced topic, and needs <em>very</em> careful consideration. I'm not even going to give examples of <em>how</em> to do that, but it is worth noting that <code>ReadAsync</code> and <code>FlushAsync</code> ("pipelines" methods that we've used extensively here) <em>do return</em> custom awaitables. You'd need to <em>really, really</em> understand your reasons before going down that path, though.</p>
<hr>
<h3><a id="user-content-i-spotted-a-bug-in-your-next-message-number-code" class="anchor" aria-hidden="true" href="#i-spotted-a-bug-in-your-next-message-number-code"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>I spotted a bug in your "next message number" code</h3>
<p>Yes, the code shown in the post can generate two messages with id <code>1</code>, after 4-billion-something messages:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-smi">messageId</span> <span class="pl-k">=</span> <span class="pl-k">++</span><span class="pl-smi">_nextMessageId</span>;
<span class="pl-k">if</span> (<span class="pl-smi">messageId</span> <span class="pl-k">==</span> <span class="pl-c1">0</span>) <span class="pl-smi">messageId</span> <span class="pl-k">=</span> <span class="pl-c1">1</span>;</pre></div>
<p>Note that I didn't increment <code>_nextMessageId</code> when I dodged the sentinel (zero). There's also a <em>very</em> small chance that a previous message from 4-billion-ago <em>still hasn't been replied to</em>. Both of these are fixed in the "real" code.</p>
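<p>For reference, one simple fix for the duplicate-id problem is to keep incrementing until we land on a non-zero value (a sketch - the real code may do it differently):</p>

```csharp
public class MessageIdGenerator
{
    private uint _nextMessageId;

    public uint Next()
    {
        uint messageId;
        // skip the zero sentinel by *incrementing past it*, so that
        // after wrap-around the sequence never repeats an id early
        do
        {
            messageId = ++_nextMessageId;
        } while (messageId == 0);
        return messageId;
    }
}
```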
<hr>
<h3><a id="user-content-you-might-be-leaking-your-lease-around-the-trysetresult" class="anchor" aria-hidden="true" href="#you-might-be-leaking-your-lease-around-the-trysetresult"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>You might be leaking your lease around the <code>TrySetResult</code></h3>
<p>In the original blog code, I had</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-smi">tcs</span><span class="pl-k">?</span>.<span class="pl-en">TrySetResult</span>(<span class="pl-smi">payload</span>.<span class="pl-en">Lease</span>());</pre></div>
<p>If <code>tcs</code> is not <code>null</code> (via the "Elvis operator"), this allocates a lease and then invokes <code>TrySetResult</code>. However, <code>TrySetResult</code> <em>can return <code>false</code></em> - meaning: it couldn't do that, because the underlying task was already completed in some other way (perhaps we added timeout code). The only time we should consider that we have <em>successfully</em> transferred ownership of the lease to the task is if it returns <code>true</code>. The real code fixes this, ensuring that it is disposed in all cases <em>except</em> where <code>TrySetResult</code> returns <code>true</code>.</p>
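<p>The corrected shape looks something like this (a sketch - the helper name and signature are illustrative, not the real code):</p>

```csharp
using System.Buffers;
using System.Threading.Tasks;

public static class LeaseHelpers
{
    // transfer ownership of the lease to the task *only* if it was accepted
    public static void CompleteWithLease(
        TaskCompletionSource<IMemoryOwner<byte>> tcs,
        IMemoryOwner<byte> lease)
    {
        if (tcs == null || !tcs.TrySetResult(lease))
        {
            // the task was already completed some other way (timeout,
            // cancellation, fault); ownership was never transferred,
            // so we must release the lease ourselves
            lease.Dispose();
        }
    }
}
```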
<hr>
<h3><a id="user-content-what-about-incremental-frame-parsers" class="anchor" aria-hidden="true" href="#what-about-incremental-frame-parsers"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What about incremental frame parsers?</h3>
<p>In my discussion of handling the frame, I was using an approach that processed a frame either in its entirety, or not at all. This is not the only option, and you can consume <em>any amount of the frame that you want</em>, as long as you write code to track the internal state. For example, if you are parsing http, you could parse the http headers into some container as long as you have at least one entire http header name/value pair (without requiring <em>all</em> the headers to start parsing). Similarly, you could consume <em>some</em> of the payload (perhaps writing what you have so far to disk). In both cases, you would simply need to <code>Advance</code> past the bit that you consider consumed, and update your own state object.</p>
<p>So yes, that is absolutely possible - even highly desirable in some cases. In some cases it is highly desirable <em>not to start</em> until you've got everything. Remember that parsing often means taking the data from a <em>streamed</em> representation, and pushing it into a <em>model</em> representation - you might actually need <em>more</em> memory for the <em>model</em> representation (especially if the source data is compressed or simply stored in a dense binary format). An <em>advantage</em> of incremental parsing is that when the last few bytes dribble in, you might already have done <em>most</em> of the parsing work - allowing you to overlap pre-processing the data with data receive - rather than "buffer, buffer, buffer; right - now <em>start</em> parsing it".</p>
<p>However, in the case I was discussing: the header was 8 bytes, so there's not much point trying to over-optimize; if we don't have an entire header <em>now</em>, we'll most likely have a complete header when the next packet arrives, or we'll <em>never</em> have an entire header. Likewise, because we want to hand the entire payload to the consumer as a single chunk, we need all the data. We <em>could</em> actually lease the target array as soon as we know the size, and start copying data into <em>that</em> buffer and releasing the source buffers. We're not gaining much by this - we're simply exchanging data in one buffer for the same amount of data in another buffer; but we <em>are</em> exposing ourselves to an attack vector: a malicious (or buggy) client can send a message-header that claims to be sending a large amount of data (maybe 1GiB), then just ... keeps the socket open and doesn't send anything more. In this scenario, the client has sent <em>almost nothing</em> (maybe just 8 bytes!), but they've chewed up a <b>lot</b> of server memory. Now imagine they do this from 10, 100, or 1000 parallel connections - and you can see how they've achieved <em>disproportionate</em> harm to our server, for almost zero cost at the client. There are two pragmatic fixes for this:</p>
<ul>
<li>Put an upper limit on the message size, and put an upper limit on the number of connections from a single endpoint</li>
<li>Make the client pay their dues: if they claim to be sending a large message (which may indeed have legitimate uses): don't lease any expensive resources until they've actually sent that much (which is what the code as-implemented achieves)</li>
</ul>
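<p>The first of those is cheap to enforce at the point where we parse the frame header; a minimal sketch (the cap and the names here are illustrative, not from the real code):</p>
<div class="highlight highlight-source-cs"><pre>const int MaxPayloadBytes = 4 * 1024 * 1024; // pick a limit suited to your protocol
// after decoding the declared payload length from the 8-byte header:
if (payloadLength &lt; 0 || payloadLength &gt; MaxPayloadBytes)
    throw new InvalidOperationException(
        "Declared payload length exceeds the permitted maximum"); // kill the connection</pre></div>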
<p>Emphasis: your choice of frame parsing strategy is entirely contextual, and you can play with other implementations.</p>
<hr>
<p>So; that's the amendments. I hope they are useful. A huge "thanks" to the people who are keeping me honest here, including Shane Grüling, David Fowler, and Nick Craver.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-60071335528837361432018-07-29T07:23:00.000-07:002018-08-01T07:51:12.993-07:00Pipe Dreams, part 3<h1><a id="user-content-pipelines---a-guided-tour-of-the-new-io-api-in-net-part-3" class="anchor" aria-hidden="true" href="#pipelines---a-guided-tour-of-the-new-io-api-in-net-part-3"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Pipelines - a guided tour of the new IO API in .NET, part 3</h1>
<p>Update: please also see <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-31.html">part 3.1</a> for further clarifications on this post</p>
<p>Sorry, it has been longer than anticipated since <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-2.html" rel="nofollow">part 2</a> (also: <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-1.html" rel="nofollow">part 1</a>). A large part of the reason for that is that I've been trying to think how best to explain some of the inner guts of <a href="https://github.com/StackExchange/StackExchange.Redis"><code>StackExchange.Redis</code></a> in a way that makes it easy to understand, and is useful for someone trying to learn about "pipelines", not <code>StackExchange.Redis</code>. I've also been thinking on ways to feed more practical "pipelines" usage guidance into the mix, which was something that came up <em>a lot</em> in feedback to parts 1/2.</p>
<p>In the end, I decided that the best thing to do was to step back from <code>StackExchange.Redis</code>, and use a <em>completely different example</em>, but one that faces almost all of the same challenges.</p>
<p>So, with your kind permission, I'd like to deviate from our previously advertised agenda, and instead talk about a library by my colleague <a href="https://twitter.com/haneycodes" rel="nofollow">David Haney</a> - <a href="https://github.com/haneytron/simplsockets"><code>SimplSockets</code></a>. What I hope to convey is a range of both the <em>reasoning</em> behind preferring pipelines, but also practical guidance that the reader can directly transfer to their own IO-based needs. In particular, I hope to discuss:</p>
<ul>
<li>different ways to pass chunks of data between APIs</li>
<li>working effectively with the array-pool</li>
<li><code>async</code>/<code>await</code> optimization in the context of libraries</li>
<li>practical real-world examples of writing to and reading from pipelines</li>
<li>how to connect pipelines client and server types to the network</li>
<li>performance comparisons from pipelines, and tips on measuring performance</li>
</ul>
<p>I'll be walking through <em>a lot</em> of code here, but I'll also be making the "real" code available for further exploration; this also includes some things I didn't have time to cover here, such as how to host a pipelines server inside the Kestrel server.</p>
<p>Sound good?</p>
<h2><a id="user-content-what-is-simplsockets" class="anchor" aria-hidden="true" href="#what-is-simplsockets"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What is <code>SimplSockets</code>?</h2>
<p>This is a network helper library designed to make it easier to implement basic client/server network comms over a socket:</p>
<ul>
<li>it implements a simple framing protocol to separate messages</li>
<li>it allows for concurrent usage over a single client, with a message queuing mechanism</li>
<li>it embeds additional data in the framing data to allow responses to be tied back to requests, to complete operations</li>
<li>out-of-order and out-of-band replies are allowed - you might send requests <code>A</code>, <code>B</code>, <code>C</code> - and get the responses <code>A</code>, <code>C</code>, <code>D</code>, <code>B</code> - i.e. two of the responses came in the opposite order (presumably <code>B</code> took longer to execute), and <code>D</code> came from the server unsolicited (broadcasts, etc)</li>
<li>individual messages are always complete in a single frame - there is no frame splitting</li>
<li>in terms of API surface: everything is synchronous and <code>byte[]</code> based; for example the client has a <code>byte[] SendReceive(byte[])</code> method that sends a payload and blocks until the corresponding response is received, and there is a <code>MessageReceived</code> event for unsolicited messages that exposes a <code>byte[]</code></li>
<li>the server takes incoming requests via the same <code>MessageReceived</code> event, and can (if required, not always) post replies via a <code>Reply(byte[], ...)</code> method that also takes the incoming message (for pairing) - and has a <code>Broadcast(byte[])</code> method for sending a message to all clients</li>
<li>there are some other nuances like heartbeats, but; that's probably enough</li>
</ul>
<p>So; we've probably got enough there to start talking about real-world - and very common - scenarios in network code, and we can use that to start thinking about how "pipelines" makes our life easier.</p>
<p>Also an important point: anything I say below is not meant to be critical of <code>SimplSockets</code> - rather, it is to acknowledge that it was written when a <em>lot</em> of pieces like "pipelines" and <code>async</code>/<code>await</code> <em>didn't exist</em> - so it is more an exploration into how we <em>could</em> implement this differently with today's tools.</p>
<h2><a id="user-content-first-things-first-we-need-to-think-about-our-exchange-types" class="anchor" aria-hidden="true" href="#first-things-first-we-need-to-think-about-our-exchange-types"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>First things first: we need to think about our exchange types</h2>
<p>The first question I have here is - for received messages in particular: "how should we expose this data to consumers?". By this I mean: <code>SimplSockets</code> went with <code>byte[]</code> as the exchange type; can we improve on that? Unsurprisingly: yes. There are many approaches we can use here.</p>
<ol>
<li>at one extreme, we can stick with <code>byte[]</code> - i.e. allocate a standalone copy of the data, that we can hand to the user; simple to work with, and very safe (nobody else sees that array - no risk of confusion), but it comes at the cost of allocations and copy time.</li>
<li>at the other extreme, we can use zero-copy - and stick with <code>ReadOnlySequence<byte></code> - this means we're consuming the non-contiguous buffers <em>in the pipe itself</em>; this is <em>fast</em>, but somewhat limiting - we can't <em>hand that out</em>, because once we <code>Advance</code> the pipe: that data is going to be recycled. This might be a good option for strictly controlled server-side processing (where the data never escapes the request context)</li>
<li>as an extension of <code>2</code>, we could move the <em>payload</em> parsing code into the library (based on the live <code>ReadOnlySequence<byte></code>), just exposing the <em>deconstructed</em> data, perhaps using custom <code>struct</code>s that map to the scenario; efficient, but requires lots more knowledge of the contents than a general message-passing API allows; this might be a good option if you can pair the library with a serializer that accepts input as <code>ReadOnlySequence<byte></code>, though - allowing the serializer to work on the data without any copies</li>
<li>we could return a <code>Memory<byte></code> to a copy of the data, perhaps using an oversized <code>byte[]</code> from the <code>ArrayPool<byte>.Shared</code> pool; but it isn't necessarily obvious to the consumer that they should return it to the pool (and indeed: getting a <code>T[]</code> array back from a <code>Memory<T></code> is an advanced and "unsafe" operation - not all <code>Memory<T></code> is based on <code>T[]</code> - so we <em>really</em> shouldn't encourage users to try)</li>
<li>we could compromise by returning something that <em>provides</em> a <code>Memory<byte></code> (or <code>Span<byte></code> etc), but which makes it <em>very obvious</em> via a well-known API that the user is meant to do something when they're done with it - i.e. <code>IDisposable</code> / <code>using</code> - and have the exchange-type <em>itself</em> return things to the pool when <code>Dispose()</code> is called</li>
</ol>
<p>In the context of a general purpose messaging API, I think that <code>5</code> is a reasonable option - it means the caller <em>can</em> store the data for some period while they work with it, without jamming the pipe, while still allowing us to make good use of the array pool. And if someone forgets the <code>using</code>, it is <em>less efficient</em>, but nothing will actually explode - it just means it'll tend to run a bit more like option <code>1</code>. But: this decision of exchange types needs careful consideration for your scenario. The <code>StackExchange.Redis</code> client uses option <code>3</code>, handing out deconstructed data; I also have a fake redis <em>server</em> using the <code>StackExchange.Redis</code> framing code, which uses option <code>2</code> - never allowing live buffers to escape a request context. You need to take time in considering your exchange types, because it is basically impossible to change this later!</p>
<blockquote>
<p>As a pro tip for option <code>2</code> (using live <code>ReadOnlySequence<byte></code> data and not letting it escape the context - zero-copy for maximum efficiency), one way to <em>guarantee</em> this is to wrap the buffer in a domain-specific <code>ref struct</code> before handing it to the code that needs to consume it. It is impossible to store a <code>ref struct</code>, which includes holding onto it in an <code>async</code>/<code>await</code> context, and includes basic reflection (since that requires "boxing", and you cannot "box" a <code>ref struct</code>) - so you have confidence that when the method completes, they no longer have indirect access to the data.</p>
</blockquote>
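<p>A minimal sketch of such a wrapper (purely illustrative - the type and members are mine):</p>
<div class="highlight highlight-source-cs"><pre>public readonly ref struct RequestPayload
{
    private readonly ReadOnlySequence<byte> _payload;
    public RequestPayload(ReadOnlySequence<byte> payload)
        => _payload = payload;
    public long Length => _payload.Length;
    // expose whatever domain-specific read/parse operations you need;
    // being a ref struct, an instance can never be stored, boxed,
    // or captured by async code - so it cannot outlive the request
}</pre></div>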
<p>But, let's assume we're happy with option <code>5</code> (<em>for this specific scenario</em> - there is no general "here's the option you should use", except: not <code>1</code> if you can help it). What might that look like? It turns out that this intent is already described in the framework, as <code>System.Buffers.IMemoryOwner<T></code>:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-k">interface</span> <span class="pl-en">IMemoryOwner</span><<span class="pl-en">T</span>> : <span class="pl-en">IDisposable</span>
{
<span class="pl-en">Memory</span><<span class="pl-en">T</span>> <span class="pl-smi">Memory</span> { <span class="pl-k">get</span>; }
}</pre></div>
<p>We can then implement this to put our leased arrays back into the array-pool when disposed, taking care to be thread-safe so that if it is disposed twice, we don't put the array into the pool twice (very bad):</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">private</span> <span class="pl-k">sealed</span> <span class="pl-k">class</span> <span class="pl-en">ArrayPoolOwner</span><<span class="pl-en">T</span>> : <span class="pl-en">IMemoryOwner</span><<span class="pl-en">T</span>>
{
<span class="pl-k">private</span> <span class="pl-k">readonly</span> <span class="pl-k">int</span> <span class="pl-smi">_length</span>;
<span class="pl-k">private</span> <span class="pl-en">T</span>[] <span class="pl-smi">_oversized</span>;
<span class="pl-k">internal</span> <span class="pl-en">ArrayPoolOwner</span>(<span class="pl-en">T</span>[] <span class="pl-smi">oversized</span>, <span class="pl-k">int</span> <span class="pl-smi">length</span>)
{
<span class="pl-smi">_length</span> <span class="pl-k">=</span> <span class="pl-smi">length</span>;
<span class="pl-smi">_oversized</span> <span class="pl-k">=</span> <span class="pl-smi">oversized</span>;
}
<span class="pl-k">public</span> <span class="pl-en">Memory</span><<span class="pl-en">T</span>> <span class="pl-smi">Memory</span> <span class="pl-k">=></span> <span class="pl-k">new</span> <span class="pl-en">Memory</span><<span class="pl-en">T</span>>(<span class="pl-en">GetArray</span>(), <span class="pl-c1">0</span>, <span class="pl-smi">_length</span>);
<span class="pl-k">private</span> <span class="pl-en">T</span>[] <span class="pl-en">GetArray</span>() <span class="pl-k">=></span>
<span class="pl-smi">Interlocked</span>.<span class="pl-en">CompareExchange</span>(<span class="pl-k">ref</span> <span class="pl-smi">_oversized</span>, <span class="pl-c1">null</span>, <span class="pl-c1">null</span>)
<span class="pl-k">??</span> <span class="pl-k">throw</span> <span class="pl-k">new</span> <span class="pl-en">ObjectDisposedException</span>(<span class="pl-en">ToString</span>());
<span class="pl-k">public</span> <span class="pl-k">void</span> <span class="pl-en">Dispose</span>()
{
<span class="pl-k">var</span> <span class="pl-smi">arr</span> <span class="pl-k">=</span> <span class="pl-smi">Interlocked</span>.<span class="pl-en">Exchange</span>(<span class="pl-k">ref</span> <span class="pl-smi">_oversized</span>, <span class="pl-c1">null</span>);
<span class="pl-k">if</span> (<span class="pl-smi">arr</span> <span class="pl-k">!=</span> <span class="pl-c1">null</span>) <span class="pl-smi">ArrayPool</span><<span class="pl-en">T</span>>.<span class="pl-smi">Shared</span>.<span class="pl-en">Return</span>(<span class="pl-smi">arr</span>);
}
}</pre></div>
<p>The key point here is in <code>Dispose()</code>, where it swaps out the array field (using <code>Interlocked.Exchange</code>), and puts the array back into the pool. Once we've done this, subsequent calls to <code>.Memory</code> will fail, and calls to <code>Dispose()</code> will do nothing.</p>
<p>Some important things to know about the array pool:</p>
<ol>
<li>the arrays it gives you are often <em>oversized</em> (so that it can give you a larger array if it doesn't have one in exactly your size, but it has a larger one ready to go). This means we need to track the <em>expected</em> length (<code>_length</code>), and use that when constructing <code>.Memory</code>.</li>
<li>the array <em>is not zeroed upon fetch</em> - it can contain garbage. In our case, this isn't a problem because (below) we are <em>immediately</em> going to overwrite it with the data we want to represent, so the external caller will never see this, but <em>in the general case</em>, you might want to consider a: should I zero the contents on behalf of the receiver before giving it to them?, and b: is my data sensitive such that I don't want to accidentally leak it into the pool? (there is an existing "zero when <em>returning</em> to the pool" option in the array-pool, for this reason)</li>
</ol>
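<p>For reference, the "zero when <em>returning</em>" option mentioned above is the <code>clearArray</code> parameter on <code>Return</code>; zeroing on fetch is something you do yourself:</p>
<div class="highlight highlight-source-cs"><pre>var arr = ArrayPool<byte>.Shared.Rent(len); // may be oversized; contents undefined
Array.Clear(arr, 0, len); // a: zero on behalf of the receiver (only if needed)
// ... fill and use arr ...
ArrayPool<byte>.Shared.Return(arr, clearArray: true); // b: don't leak data into the pool</pre></div>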
<p>As a side note, I wonder whether the above concept might be a worthy addition inside the framework itself, for usage directly from <code>ArrayPool<T></code> - i.e. a method like <code>IMemoryOwner<T> RentOwned(int length)</code> alongside <code>T[] Rent(int minimumLength)</code> - perhaps with the additions of flags for "zero upon fetch" and "zero upon return".</p>
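<p>In the absence of such an API, it could be sketched today as a static helper over the <code>ArrayPoolOwner<T></code> type above (to be clear: no <code>RentOwned</code> method exists in the framework - this is hypothetical):</p>
<div class="highlight highlight-source-cs"><pre>public static IMemoryOwner<T> RentOwned<T>(int length)
{
    var arr = ArrayPool<T>.Shared.Rent(length); // possibly oversized
    return new ArrayPoolOwner<T>(arr, length);  // remembers the *requested* length
}</pre></div>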
<p>The idea here is that passing an <code>IMemoryOwner<T></code> expresses a transfer of ownership, so a typical usage might be:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">void</span> <span class="pl-en">DoSomethingWith</span>(<span class="pl-en">IMemoryOwner</span><<span class="pl-k">byte</span>> <span class="pl-smi">data</span>)
{
<span class="pl-k">using</span> (<span class="pl-smi">data</span>)
{
<span class="pl-c"><span class="pl-c">//</span> ... other things here ...</span>
<span class="pl-en">DoTheThing</span>(<span class="pl-smi">data</span>.<span class="pl-smi">Memory</span>);
}
<span class="pl-c"><span class="pl-c">//</span> ... more things here ...</span>
}</pre></div>
<p>The caller doesn't need to know about the implementation details (array-pool, etc). Note that we still have to allocate a <em>small</em> object to represent this, but this is still hugely preferable to allocating a large <code>byte[]</code> buffer each time, for our safety.</p>
<p>As a caveat, we should note that a badly written consumer could store the <code>.Memory</code> somewhere, which would lead to undefined behaviour after it has been disposed; or they could use <code>MemoryMarshal</code> to get an array from the memory. If we <em>really needed to prevent these problems</em>, we could do so by implementing a custom <code>MemoryManager<T></code> (most likely, by making <code>ArrayPoolOwner<T> : MemoryManager<T></code>, since <code>MemoryManager<T> : IMemoryOwner<T></code>). We could then make <code>.Span</code> fail just like <code>.Memory</code> does above, and we could prevent <code>MemoryMarshal</code> from being able to obtain the underlying array. It is almost certainly overkill here, but it is useful to know that this option exists, for more extreme scenarios.</p>
<p>At this point you're probably thinking "wow, Marc, you're really over-thinking this - just give them the data", but: getting the exchange types right is probably the single most important design decision you have to make, so: this bit matters!</p>
<p>OK, so how would we populate this? Fortunately, that is pretty simple, as <code>ReadOnlySequence<T></code> has a very handy <code>CopyTo</code> method that does all the heavy lifting:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-k">static</span> <span class="pl-en">IMemoryOwner</span><<span class="pl-en">T</span>> <span class="pl-en">Lease</span><<span class="pl-en">T</span>>(
<span class="pl-k">this</span> <span class="pl-en">ReadOnlySequence</span><<span class="pl-en">T</span>> <span class="pl-smi">source</span>)
{
<span class="pl-k">if</span> (<span class="pl-smi">source</span>.<span class="pl-smi">IsEmpty</span>) <span class="pl-k">return</span> <span class="pl-en">Empty</span><<span class="pl-en">T</span>>();
<span class="pl-k">int</span> <span class="pl-smi">len</span> <span class="pl-k">=</span> <span class="pl-k">checked</span>((<span class="pl-k">int</span>)<span class="pl-smi">source</span>.<span class="pl-smi">Length</span>);
<span class="pl-k">var</span> <span class="pl-smi">arr</span> <span class="pl-k">=</span> <span class="pl-smi">ArrayPool</span><<span class="pl-en">T</span>>.<span class="pl-smi">Shared</span>.<span class="pl-en">Rent</span>(<span class="pl-smi">len</span>);
<span class="pl-smi">source</span>.<span class="pl-en">CopyTo</span>(<span class="pl-smi">arr</span>);
<span class="pl-k">return</span> <span class="pl-k">new</span> <span class="pl-en">ArrayPoolOwner</span><<span class="pl-en">T</span>>(<span class="pl-smi">arr</span>, <span class="pl-smi">len</span>);
}</pre></div>
<p>This shows how we can use <code>ArrayPool<T></code> to obtain a (possibly oversized) array that we can use to hold a <em>copy</em> of the data; once we've copied it, we can hand the <em>copy</em> to a consumer to use however they need (and being a flat vector here makes it simple to consume), while the network code can advance the pipe and discard / re-use the buffers. When they <code>Dispose()</code> it, it goes back in the pool, and everyone is happy.</p>
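<p>One loose end: the <code>Empty<T>()</code> helper used for the zero-length case isn't shown above; one plausible shape (an assumption on my part - the real code may differ) is a cached singleton whose <code>Dispose()</code> is a no-op:</p>
<div class="highlight highlight-source-cs"><pre>private sealed class EmptyOwner<T> : IMemoryOwner<T>
{
    public static readonly EmptyOwner<T> Instance = new EmptyOwner<T>();
    private EmptyOwner() { }
    public Memory<T> Memory => default; // zero-length; always valid
    public void Dispose() { } // nothing was leased, so nothing to return
}
private static IMemoryOwner<T> Empty<T>() => EmptyOwner<T>.Instance;</pre></div>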
<h2><a id="user-content-starting-the-base-api" class="anchor" aria-hidden="true" href="#starting-the-base-api"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Starting the base API</h2>
<p>There is a <em>lot</em> of overlap in the code between a client and server; both need thread-safe mechanisms to write data, and both need some kind of read-loop to check for received data; but what <em>happens</em> is different. So - it sounds like a base-class might be useful; let's start with a skeleton API that lets us hand in a pipe (or two: recall that an <code>IDuplexPipe</code> is actually the ends of two <em>different</em> pipes - <code>.Input</code> and <code>.Output</code>):</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-k">abstract</span> <span class="pl-k">class</span> <span class="pl-en">SimplPipeline</span> : <span class="pl-en">IDisposable</span>
{
<span class="pl-k">private</span> <span class="pl-en">IDuplexPipe</span> <span class="pl-smi">_pipe</span>;
<span class="pl-k">protected</span> <span class="pl-en">SimplPipeline</span>(<span class="pl-en">IDuplexPipe</span> <span class="pl-smi">pipe</span>)
<span class="pl-k">=></span> <span class="pl-smi">_pipe</span> <span class="pl-k">=</span> <span class="pl-smi">pipe</span>;
<span class="pl-k">public</span> <span class="pl-k">void</span> <span class="pl-en">Dispose</span>() <span class="pl-k">=></span> <span class="pl-en">Close</span>();
<span class="pl-k">public</span> <span class="pl-k">void</span> <span class="pl-en">Close</span>() {<span class="pl-c"><span class="pl-c">/*</span> burn the pipe<span class="pl-c">*/</span></span>}
}</pre></div>
<p>The first thing we need after this is some mechanism to send a message in a thread-safe way that doesn't block the caller unduly. The way <code>SimplSockets</code> handles this (and also how <code>StackExchange.Redis</code> v1 works) is to have a <em>message queue</em> of messages that <em>have not yet been written</em>. When the caller calls <code>Send</code>, the message is added to the queue (synchronized, etc), and will <em>at some point</em> be dequeued and written to the socket. This helps with perceived performance and can help avoid packet fragmentation in some scenarios, <em>but</em>:</p>
<ul>
<li>it has a lot of moving parts</li>
<li>it duplicates something that "pipelines" already provides</li>
</ul>
<p>For the latter, specifically: the pipe <strong>is the queue</strong>; meaning: we <em>already have</em> a buffer of data between the actual output. Adding a <em>second</em> queue is just duplicating this and retaining complexity, so: the second major design change we can make is: <em>throw away the unsent queue</em>; just write to the pipe (synchronized, etc), and let the pipe worry about the rest. One slight consequence of this is that the v1 code had a concept of prioritising messages that are expecting a reply - essentially queue-jumping. By treating the pipe as the outbound queue we <em>lose this ability</em>, but in reality this is unlikely to make a huge difference, so I'm happy to lose it. For very similar reasons, <code>StackExchange.Redis</code> v2 loses the concept of <code>CommandFlags.HighPriority</code>, which is this exact same queue-jumping idea. I'm not concerned by this.</p>
<p>We also need to consider the <em>shape</em> of this API, to allow a server or client to add a message:</p>
<ul>
<li>we don't necessarily want to be synchronous; we don't need to block while waiting to access to write to the pipe, or while waiting for a response from the server</li>
<li>we might want to expose alternate APIs for whether the caller is simply giving us memory to write (<code>ReadOnlyMemory<byte></code>), or <em>giving us ownership</em> of the data, for us to clean up when we've written it (<code>IMemoryOwner<byte></code>)</li>
<li>let's assume that write and read are decoupled - we don't want to worry about the issues of response messages here</li>
</ul>
<p>So; putting that together, I quite like:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">protected</span> <span class="pl-k">async</span> <span class="pl-en">ValueTask</span> <span class="pl-en">WriteAsync</span>(
<span class="pl-en">IMemoryOwner</span><<span class="pl-k">byte</span>> <span class="pl-smi">payload</span>, <span class="pl-k">int</span> <span class="pl-smi">messageId</span>)
{
<span class="pl-k">using</span> (<span class="pl-smi">payload</span>)
{
<span class="pl-k">await</span> <span class="pl-en">WriteAsync</span>(<span class="pl-smi">payload</span>.<span class="pl-smi">Memory</span>, <span class="pl-smi">messageId</span>);
}
}
<span class="pl-k">protected</span> <span class="pl-en">ValueTask</span> <span class="pl-en">WriteAsync</span>(
<span class="pl-en">ReadOnlyMemory</span><<span class="pl-k">byte</span>> <span class="pl-smi">payload</span>, <span class="pl-k">int</span> <span class="pl-smi">messageId</span>);</pre></div>
<p>Here we're giving the caller the convenience of passing us either an <code>IMemoryOwner<byte></code> (which we then dispose correctly), or a <code>ReadOnlyMemory<byte></code> if they don't need to convey ownership.</p>
<p>The <code>ValueTask</code> makes sense because a write <em>to a pipe</em> is often synchronous; we <em>probably</em> won't be contested for the single-writer access, and the only async part of writing to a pipe is flushing <em>if the pipe is backed up</em> (flushing is very often entirely synchronous). The <code>messageId</code> is the additional metadata in the frame header that lets us pair replies later. We'll worry about what it <em>is</em> in a bit.</p>
<h2><a id="user-content-writes-and-wrongs" class="anchor" aria-hidden="true" href="#writes-and-wrongs"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Writes and wrongs</h2>
<p>So; let's implement that. The first thing we need is guaranteed single-writer access. It would be tempting to use a <code>lock</code>, but <code>lock</code> <em>doesn't play well with <code>async</code></em> (<a href="https://twitter.com/marcgravell/status/1023176337652109312" rel="nofollow">even if you don't screw it up</a>). Because the flush <em>may</em> be async, the continuation could come back on another thread, so we need an <code>async</code>-compatible locking primitive; <code>SemaphoreSlim</code> should suffice.</p>
<p>Next, I'm going to go off on one of my wild tangents. Premise:</p>
<blockquote>
<p>In general, application code should be optimized for readability; library code should be optimized for performance.</p>
</blockquote>
<p>You may or may not agree with this, but it is the general guidance that I code by. What I mean by this is that <em>library</em> code tends to have a <em>single focused purpose</em>, often being maintained by someone whose experience may be "deep but not necessarily wide"; your mind is focusing on that one area, and it is OK to go to bizarre lengths to optimize the code. Conversely, <em>application</em> code tends to involve a lot more plumbing of <em>different</em> concepts - "wide but not necessarily deep" (the depth being hidden in the various libraries). Application code often has more complex and unpredictable interactions, so the focus should be on maintainable and "obviously right".</p>
<p>Basically, my point here is that I tend to focus a lot on optimizations that you wouldn't normally put into application code, because <em>I know from experience and extensive benchmarking</em> that they <em>really matter</em>. So... I'm going to do some things that might look odd, and I want you to take that journey with me.</p>
<p>Let's start with the "obviously right" implementation:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">private</span> <span class="pl-k">readonly</span> <span class="pl-smi">SemaphoreSlim</span> <span class="pl-smi">_singleWriter</span>
<span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">SemaphoreSlim</span>(<span class="pl-c1">1</span>);
<span class="pl-k">protected</span> <span class="pl-k">async</span> <span class="pl-en">ValueTask</span> <span class="pl-en">WriteAsync</span>(
<span class="pl-en">ReadOnlyMemory</span><<span class="pl-k">byte</span>> <span class="pl-smi">payload</span>, <span class="pl-k">int</span> <span class="pl-smi">messageId</span>)
{
<span class="pl-k">await</span> <span class="pl-smi">_singleWriter</span>.<span class="pl-en">WaitAsync</span>();
<span class="pl-k">try</span>
{
<span class="pl-en">WriteFrameHeader</span>(<span class="pl-smi">writer</span>, <span class="pl-smi">payload</span>.<span class="pl-smi">Length</span>, <span class="pl-smi">messageId</span>);
<span class="pl-k">await</span> <span class="pl-smi">writer</span>.<span class="pl-en">WriteAsync</span>(<span class="pl-smi">payload</span>);
}
<span class="pl-k">finally</span>
{
<span class="pl-smi">_singleWriter</span>.<span class="pl-en">Release</span>();
}
}</pre></div>
<p>This <code>await</code>s single-writer access to the pipe, writes the frame header using <code>WriteFrameHeader</code> (which we'll show in a bit), then drops the <code>payload</code> using the framework-provided <code>WriteAsync</code> method, noting that this includes the <code>FlushAsync</code> as well. There's nothing <em>wrong</em> with this code, but... it does involve unnecessary state machine plumbing in the <strong>most likely case</strong> - i.e. where everything completes synchronously (the writer is not contested, and the pipe is not backed up). We can tweak this code by asking:</p>
<ul>
<li>can I get the single-writer access uncontested?</li>
<li>was the flush synchronous?</li>
</ul>
<p>Consider, instead - making the method we just wrote <code>private</code> and renaming it to <code>WriteAsyncSlowPath</code>, and adding a <strong>non-<code>async</code></strong> method instead:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">protected</span> <span class="pl-en">ValueTask</span> <span class="pl-en">WriteAsync</span>(
<span class="pl-en">ReadOnlyMemory</span><<span class="pl-k">byte</span>> <span class="pl-smi">payload</span>, <span class="pl-k">int</span> <span class="pl-smi">messageId</span>)
{
<span class="pl-c"><span class="pl-c">//</span> try to get the conch; if not, switch to async</span>
<span class="pl-k">if</span> (<span class="pl-k">!</span><span class="pl-smi">_singleWriter</span>.<span class="pl-en">Wait</span>(<span class="pl-c1">0</span>))
<span class="pl-k">return</span> <span class="pl-en">WriteAsyncSlowPath</span>(<span class="pl-smi">payload</span>, <span class="pl-smi">messageId</span>);
<span class="pl-k">bool</span> <span class="pl-smi">release</span> <span class="pl-k">=</span> <span class="pl-c1">true</span>;
<span class="pl-k">try</span>
{
<span class="pl-en">WriteFrameHeader</span>(<span class="pl-smi">writer</span>, <span class="pl-smi">payload</span>.<span class="pl-smi">Length</span>, <span class="pl-smi">messageId</span>);
<span class="pl-k">var</span> <span class="pl-smi">write</span> <span class="pl-k">=</span> <span class="pl-smi">writer</span>.<span class="pl-en">WriteAsync</span>(<span class="pl-smi">payload</span>);
<span class="pl-k">if</span> (<span class="pl-smi">write</span>.<span class="pl-smi">IsCompletedSuccessfully</span>) <span class="pl-k">return</span> <span class="pl-smi">default</span>;
<span class="pl-smi">release</span> <span class="pl-k">=</span> <span class="pl-c1">false</span>;
<span class="pl-k">return</span> <span class="pl-en">AwaitFlushAndRelease</span>(<span class="pl-smi">write</span>);
}
<span class="pl-k">finally</span>
{
<span class="pl-k">if</span> (<span class="pl-smi">release</span>) <span class="pl-smi">_singleWriter</span>.<span class="pl-en">Release</span>();
}
}
<span class="pl-k">async</span> <span class="pl-en">ValueTask</span> <span class="pl-en">AwaitFlushAndRelease</span>(
<span class="pl-en">ValueTask</span><<span class="pl-en">FlushResult</span>> <span class="pl-smi">flush</span>)
{
<span class="pl-k">try</span> { <span class="pl-k">await</span> <span class="pl-smi">flush</span>; }
<span class="pl-k">finally</span> { <span class="pl-smi">_singleWriter</span>.<span class="pl-en">Release</span>(); }
}</pre></div>
<p>The <code>Wait(0)</code> returns <code>true</code> <em>if and only if</em> we can take the semaphore synchronously without delay. If we can't: all bets are off, just switch to the <code>async</code> version. Note once you've gone <code>async</code>, there's no point doing any more of these "hot path" checks - you've already built a state machine (and probably boxed it): the meal is already paid for, so you might as well sit and eat.</p>
<p>However, if we <em>do</em> get the semaphore for free, we can continue and do our <em>writing</em> for free. The header is synchronous <em>anyway</em>, so our next decision is: did the <em>flush</em> complete synchronously? If it did (<code>IsCompletedSuccessfully</code>), <em>we're done</em> - away we go (<code>return default;</code>). Otherwise, we'll need to <code>await</code> the flush. Now, we can't do that from our non-<code>async</code> method, but we can write a <em>separate</em> method (<code>AwaitFlushAndRelease</code>) that takes our incomplete flush, and <code>await</code>s it. In particular, note that we only want the semaphore to be released <em>after</em> the flush has completed, hence the <code>Release()</code> in our helper method. This is also why we set <code>release</code> to <code>false</code> in the calling method, so it doesn't get released prematurely.</p>
<p>We can apply similar techniques to <em>most</em> <code>async</code> operations if we know they're going to <em>often</em> be synchronous, and it is a pattern you may wish to consider. Emphasis: it doesn't help you <em>at all</em> if the result is usually or always <em>genuinely</em> asynchronous - so: don't over-apply it.</p>
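<p>The shape of that pattern, stripped to its essence, is worth internalizing; here's a minimal standalone sketch (the type and method names - <code>Worker</code>, <code>PumpAsync</code>, <code>WriteCoreAsync</code> - are invented for illustration, not part of any library):</p>

```csharp
using System.Threading.Tasks;

// illustrative only: these names are invented for the sketch
class Worker
{
    public ValueTask PumpAsync()
    {
        var pending = WriteCoreAsync(); // some ValueTask-returning operation
        // completed synchronously and successfully? no state machine needed
        if (pending.IsCompletedSuccessfully) return default;
        // otherwise, pay for the async machinery exactly once
        return Awaited(pending);

        async ValueTask Awaited(ValueTask task) => await task;
    }

    // stand-in for the real work; here it always completes synchronously
    private ValueTask WriteCoreAsync() => default;
}
```

<p>The caller sees a single <code>ValueTask</code> either way; only the genuinely asynchronous case pays for a state machine.</p>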
<hr>
<p>Right; so - how do we write the header? What <em>is</em> the header? <code>SimplSockets</code> defines the header to be 8 bytes composed of two little-endian 32-bit integers. The first 4 bytes contains the payload length in bytes; the second 4 bytes is the <code>messageId</code> used to correlate requests and responses. Writing this is remarkably simple:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">void</span> <span class="pl-en">WriteFrameHeader</span>(<span class="pl-en">PipeWriter</span> <span class="pl-smi">writer</span>, <span class="pl-k">int</span> <span class="pl-smi">length</span>, <span class="pl-k">int</span> <span class="pl-smi">messageId</span>)
{
<span class="pl-k">var</span> <span class="pl-smi">span</span> <span class="pl-k">=</span> <span class="pl-smi">writer</span>.<span class="pl-en">GetSpan</span>(<span class="pl-c1">8</span>);
<span class="pl-smi">BinaryPrimitives</span>.<span class="pl-en">WriteInt32LittleEndian</span>(
<span class="pl-smi">span</span>, <span class="pl-smi">length</span>);
<span class="pl-smi">BinaryPrimitives</span>.<span class="pl-en">WriteInt32LittleEndian</span>(
<span class="pl-smi">span</span>.<span class="pl-en">Slice</span>(<span class="pl-c1">4</span>), <span class="pl-smi">messageId</span>);
<span class="pl-smi">writer</span>.<span class="pl-en">Advance</span>(<span class="pl-c1">8</span>);
}</pre></div>
<p>You can ask a <code>PipeWriter</code> for "reasonable" sized buffers with confidence, and <code>8</code> bytes is certainly a reasonable size. The helpful framework-provided <code>BinaryPrimitives</code> type provides explicit-endian tools, perfect for network code. The first call writes <code>length</code> to the first 4 bytes of the span. After that, we need to <code>Slice</code> the span so that the second call writes to the <em>next</em> 4 bytes - and finally we call <code>Advance(8)</code> which commits our header to the pipe <em>without</em> flushing it. Normally, you might have to write lots of pieces manually, then call <code>FlushAsync</code> explicitly, but this particular protocol is a good fit for simply calling <code>WriteAsync</code> on the pipe to attach the payload, which <em>includes</em> the flush. So; putting those pieces together, we've successfully written our message to the pipe.</p>
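<p>As a sanity check, the 8-byte header layout round-trips cleanly between the write and parse halves; here's a self-contained sketch that re-declares the two helpers locally, working against spans rather than a pipe:</p>

```csharp
using System;
using System.Buffers.Binary;

class FrameHeaderDemo
{
    // same layout as the pipe-based helper above: length then messageId,
    // both as little-endian 32-bit integers
    static void WriteFrameHeader(Span<byte> span, int length, int messageId)
    {
        BinaryPrimitives.WriteInt32LittleEndian(span, length);
        BinaryPrimitives.WriteInt32LittleEndian(span.Slice(4), messageId);
    }

    static int ParseFrameHeader(ReadOnlySpan<byte> input, out int messageId)
    {
        messageId = BinaryPrimitives.ReadInt32LittleEndian(input.Slice(4));
        return BinaryPrimitives.ReadInt32LittleEndian(input);
    }

    static void Main()
    {
        Span<byte> header = stackalloc byte[8];
        WriteFrameHeader(header, length: 42, messageId: 7);
        var length = ParseFrameHeader(header, out var messageId);
        Console.WriteLine($"{length} {messageId}"); // 42 7
    }
}
```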
<h2><a id="user-content-using-that-from-a-client" class="anchor" aria-hidden="true" href="#using-that-from-a-client"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Using that from a client</h2>
<p>We have a <code>WriteAsync</code> method in the base class; now let's add a concrete client class and start hooking pieces together. Consider:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-k">class</span> <span class="pl-en">SimplPipelineClient</span> : <span class="pl-en">SimplPipeline</span>
{
<span class="pl-k">public</span> <span class="pl-k">async</span> <span class="pl-en">Task</span><<span class="pl-en">IMemoryOwner</span><<span class="pl-k">byte</span>>> <span class="pl-en">SendReceiveAsync</span>(<span class="pl-en">ReadOnlyMemory</span><<span class="pl-k">byte</span>> <span class="pl-smi">message</span>)
{
<span class="pl-k">var</span> <span class="pl-smi">tcs</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">TaskCompletionSource</span><<span class="pl-en">IMemoryOwner</span><<span class="pl-k">byte</span>>>();
<span class="pl-k">int</span> <span class="pl-smi">messageId</span>;
<span class="pl-k">lock</span> (<span class="pl-smi">_awaitingResponses</span>)
{
<span class="pl-smi">messageId</span> <span class="pl-k">=</span> <span class="pl-k">++</span><span class="pl-smi">_nextMessageId</span>;
<span class="pl-k">if</span> (<span class="pl-smi">messageId</span> <span class="pl-k">==</span> <span class="pl-c1">0</span>) <span class="pl-smi">messageId</span> <span class="pl-k">=</span> <span class="pl-c1">1</span>;
<span class="pl-smi">_awaitingResponses</span>.<span class="pl-en">Add</span>(<span class="pl-smi">messageId</span>, <span class="pl-smi">tcs</span>);
}
<span class="pl-k">await</span> <span class="pl-en">WriteAsync</span>(<span class="pl-smi">message</span>, <span class="pl-smi">messageId</span>);
<span class="pl-k">return</span> <span class="pl-k">await</span> <span class="pl-smi">tcs</span>.<span class="pl-smi">Task</span>;
}
<span class="pl-k">public</span> <span class="pl-k">async</span> <span class="pl-en">Task</span><<span class="pl-en">IMemoryOwner</span><<span class="pl-k">byte</span>>> <span class="pl-en">SendReceiveAsync</span>(<span class="pl-en">IMemoryOwner</span><<span class="pl-k">byte</span>> <span class="pl-smi">message</span>)
{
<span class="pl-k">using</span> (<span class="pl-smi">message</span>)
{
<span class="pl-k">return</span> <span class="pl-k">await</span> <span class="pl-en">SendReceiveAsync</span>(<span class="pl-smi">message</span>.<span class="pl-smi">Memory</span>);
}
}
}</pre></div>
<p>where <code>_awaitingResponses</code> is a dictionary of <code>int</code> message-ids to <code>TaskCompletionSource&lt;IMemoryOwner&lt;byte&gt;&gt;</code>. This code invents a new <code>messageId</code> (avoiding zero, which we'll use as a sentinel value), and creates a <code>TaskCompletionSource&lt;T&gt;</code> to represent our in-progress operation. Since this definitely will involve network access, there's no benefit in exposing it as <code>ValueTask&lt;T&gt;</code>, so this works well. Once we've added our placeholder for catching the reply, we write our message (always do book-keeping <em>first</em>, to avoid race conditions). Finally, we expose the incomplete task to the caller.</p>
<p>Note that I've implemented this the "obvious" way, but we can optimize this like we did previously, by checking if <code>WriteAsync</code> completed synchronously and simply <code>return</code>ing the <code>tcs.Task</code> without <code>await</code>ing it. Note also that <code>SimplSockets</code> used the <em>calling thread-id</em> as the message-id; this works fine in a blocking scenario, but it isn't viable when we're using <code>async</code> - but: the number is opaque to the "other end" <em>anyway</em> - all it has to do is return the same number.</p>
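<p>The zero-avoidance deserves a second look: the id must never be zero, because zero is how we'll flag "no reply expected". Here's a standalone sketch of that allocation logic pulled into its own type (the name <code>MessageIdAllocator</code> is invented; this variant also writes the skipped value back on wrap-around):</p>

```csharp
// illustrative only: "MessageIdAllocator" is an invented name for the sketch
class MessageIdAllocator
{
    private int _nextMessageId;
    private readonly object _gate = new object();

    public int Next()
    {
        lock (_gate)
        {
            var id = ++_nextMessageId;
            // zero is reserved as the "no reply" sentinel; skip it on wrap-around
            if (id == 0) id = _nextMessageId = 1;
            return id;
        }
    }
}
```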
<h2><a id="user-content-programmed-to-receive" class="anchor" aria-hidden="true" href="#programmed-to-receive"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Programmed to receive</h2>
<p>That's pretty-much it for write; next we need to think about receive. As mentioned in the previous posts, there's almost always a receive <em>loop</em> - especially if we need to support out-of-band and out-of-order messages (so: we can't just read one frame immediately after writing). A basic read loop can be approximated by:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">protected</span> <span class="pl-k">async</span> <span class="pl-en">Task</span> <span class="pl-en">StartReceiveLoopAsync</span>(
<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
{
<span class="pl-k">try</span>
{
<span class="pl-k">while</span> (<span class="pl-k">!</span><span class="pl-smi">cancellationToken</span>.<span class="pl-smi">IsCancellationRequested</span>)
{
<span class="pl-k">var</span> <span class="pl-smi">readResult</span> <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">reader</span>.<span class="pl-en">ReadAsync</span>(<span class="pl-smi">cancellationToken</span>);
<span class="pl-k">if</span> (<span class="pl-smi">readResult</span>.<span class="pl-smi">IsCanceled</span>) <span class="pl-k">break</span>;
<span class="pl-k">var</span> <span class="pl-smi">buffer</span> <span class="pl-k">=</span> <span class="pl-smi">readResult</span>.<span class="pl-smi">Buffer</span>;
<span class="pl-k">var</span> <span class="pl-smi">makingProgress</span> <span class="pl-k">=</span> <span class="pl-c1">false</span>;
<span class="pl-k">while</span> (<span class="pl-en">TryParseFrame</span>(<span class="pl-k">ref</span> <span class="pl-smi">buffer</span>, <span class="pl-k">out</span> <span class="pl-k">var</span> <span class="pl-smi">payload</span>, <span class="pl-k">out</span> <span class="pl-k">var</span> <span class="pl-smi">messageId</span>))
{
<span class="pl-smi">makingProgress</span> <span class="pl-k">=</span> <span class="pl-c1">true</span>;
<span class="pl-k">await</span> <span class="pl-en">OnReceiveAsync</span>(<span class="pl-smi">payload</span>, <span class="pl-smi">messageId</span>);
}
<span class="pl-smi">reader</span>.<span class="pl-en">AdvanceTo</span>(<span class="pl-smi">buffer</span>.<span class="pl-smi">Start</span>, <span class="pl-smi">buffer</span>.<span class="pl-smi">End</span>);
<span class="pl-k">if</span> (<span class="pl-k">!</span><span class="pl-smi">makingProgress</span> <span class="pl-k">&&</span> <span class="pl-smi">readResult</span>.<span class="pl-smi">IsCompleted</span>) <span class="pl-k">break</span>;
}
<span class="pl-k">try</span> { <span class="pl-smi">reader</span>.<span class="pl-en">Complete</span>(); } <span class="pl-k">catch</span> { }
}
<span class="pl-k">catch</span> (<span class="pl-en">Exception</span> <span class="pl-smi">ex</span>)
{
<span class="pl-k">try</span> { <span class="pl-smi">reader</span>.<span class="pl-en">Complete</span>(<span class="pl-smi">ex</span>); } <span class="pl-k">catch</span> { }
}
}
<span class="pl-k">protected</span> <span class="pl-k">abstract</span> <span class="pl-en">ValueTask</span> <span class="pl-en">OnReceiveAsync</span>(
<span class="pl-en">ReadOnlySequence</span><<span class="pl-k">byte</span>> <span class="pl-smi">payload</span>, <span class="pl-k">int</span> <span class="pl-smi">messageId</span>);</pre></div>
<p>Note: since we are <em>bound</em> to have an <code>async</code> delay at some point (probably immediately), we might as well just jump straight to an "obvious" <code>async</code> implementation - we'll gain nothing from trying to be clever here. Key points to observe:</p>
<ul>
<li>we get data from the pipe (note that we <em>might</em> want to also consider <code>TryRead</code> here, but only if we are making progress - otherwise we could find ourselves in a hot loop)</li>
<li>read (<code>TryParseFrame</code>) and process (<code>OnReceiveAsync</code>) as many frames as we can</li>
<li>advance the reader to report our progress, noting that <code>TryParseFrame</code> will have updated <code>buffer.Start</code>, and since we're actively reading as many frames as we can, it is true to say that we've "inspected" to <code>buffer.End</code></li>
<li>keep in mind that the pipelines code is dealing with all the back-buffer concerns re data that we haven't consumed yet (usually a significant amount of code repeated in lots of libraries)</li>
<li>check for exit conditions - if we aren't progressing and the pipe won't get any more data, we're done</li>
<li>report when we've finished reading - through success or failure</li>
</ul>
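<p>On the <code>TryRead</code> point above: when data is already buffered, <code>PipeReader.TryRead</code> lets us skip the asynchronous path entirely. A minimal standalone illustration:</p>

```csharp
using System;
using System.IO.Pipelines;
using System.Threading.Tasks;

class TryReadDemo
{
    static async Task Main()
    {
        var pipe = new Pipe();
        // seed the pipe so that data is already available to the reader
        await pipe.Writer.WriteAsync(new byte[] { 1, 2, 3 });

        // data is buffered, so TryRead succeeds without going async
        if (pipe.Reader.TryRead(out var result))
        {
            Console.WriteLine(result.Buffer.Length); // 3
            pipe.Reader.AdvanceTo(result.Buffer.End);
        }
    }
}
```

<p>As cautioned above, a real loop should only attempt this while it is making progress; otherwise falling back to <code>ReadAsync</code> avoids spinning in a hot loop.</p>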
<p>Unsurprisingly, <code>TryParseFrame</code> is largely the reverse of <code>WriteAsync</code>:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">private</span> <span class="pl-k">bool</span> <span class="pl-en">TryParseFrame</span>(
<span class="pl-k">ref</span> <span class="pl-en">ReadOnlySequence</span><<span class="pl-k">byte</span>> <span class="pl-smi">input</span>,
<span class="pl-k">out</span> <span class="pl-en">ReadOnlySequence</span><<span class="pl-k">byte</span>> <span class="pl-smi">payload</span>, <span class="pl-k">out</span> <span class="pl-k">int</span> <span class="pl-smi">messageId</span>)
{
<span class="pl-k">if</span> (<span class="pl-smi">input</span>.<span class="pl-smi">Length</span> <span class="pl-k"><</span> <span class="pl-c1">8</span>)
{ <span class="pl-c"><span class="pl-c">//</span> not enough data for the header</span>
<span class="pl-smi">payload</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>;
<span class="pl-smi">messageId</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>;
<span class="pl-k">return</span> <span class="pl-c1">false</span>;
}
<span class="pl-k">int</span> <span class="pl-smi">length</span>;
<span class="pl-k">if</span> (<span class="pl-smi">input</span>.<span class="pl-smi">First</span>.<span class="pl-smi">Length</span> <span class="pl-k">>=</span> <span class="pl-c1">8</span>)
{ <span class="pl-c"><span class="pl-c">//</span> already 8 bytes in the first segment</span>
<span class="pl-smi">length</span> <span class="pl-k">=</span> <span class="pl-en">ParseFrameHeader</span>(
<span class="pl-smi">input</span>.<span class="pl-smi">First</span>.<span class="pl-smi">Span</span>, <span class="pl-k">out</span> <span class="pl-smi">messageId</span>);
}
<span class="pl-k">else</span>
{ <span class="pl-c"><span class="pl-c">//</span> copy 8 bytes into a local span</span>
<span class="pl-en">Span</span><<span class="pl-k">byte</span>> <span class="pl-smi">local</span> <span class="pl-k">=</span> <span class="pl-smi">stackalloc</span> <span class="pl-smi">byte</span>[<span class="pl-c1">8</span>];
<span class="pl-smi">input</span>.<span class="pl-en">Slice</span>(<span class="pl-c1">0</span>, <span class="pl-c1">8</span>).<span class="pl-en">CopyTo</span>(<span class="pl-smi">local</span>);
<span class="pl-smi">length</span> <span class="pl-k">=</span> <span class="pl-en">ParseFrameHeader</span>(
<span class="pl-smi">local</span>, <span class="pl-k">out</span> <span class="pl-smi">messageId</span>);
}
<span class="pl-c"><span class="pl-c">//</span> do we have the "length" bytes?</span>
<span class="pl-k">if</span> (<span class="pl-smi">input</span>.<span class="pl-smi">Length</span> <span class="pl-k"><</span> <span class="pl-smi">length</span> <span class="pl-k">+</span> <span class="pl-c1">8</span>)
{
<span class="pl-smi">payload</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>;
<span class="pl-k">return</span> <span class="pl-c1">false</span>;
}
<span class="pl-c"><span class="pl-c">//</span> success!</span>
<span class="pl-smi">payload</span> <span class="pl-k">=</span> <span class="pl-smi">input</span>.<span class="pl-en">Slice</span>(<span class="pl-c1">8</span>, <span class="pl-smi">length</span>);
<span class="pl-smi">input</span> <span class="pl-k">=</span> <span class="pl-smi">input</span>.<span class="pl-en">Slice</span>(<span class="pl-smi">payload</span>.<span class="pl-smi">End</span>);
<span class="pl-k">return</span> <span class="pl-c1">true</span>;
}</pre></div>
<p>First we check whether we have enough data for the frame header (8 bytes); if we don't have that - we certainly don't have a frame. Once we know we have enough bytes for the frame header, we can parse it out to find the payload length. This is a little subtle, because we need to recall that a <code>ReadOnlySequence&lt;byte&gt;</code> can span <em>multiple discontiguous</em> buffers. Since we're only talking about 8 bytes, the simplest thing to do is:</p>
<ul>
<li>check whether the <em>first segment</em> has 8 bytes; if so, parse from that</li>
<li>otherwise, <code>stackalloc</code> a span (note that this doesn't need <code>unsafe</code>), copy 8 bytes from <code>input</code> into that, and parse <em>from there</em>.</li>
</ul>
<p>Once we know how much payload we're expecting, we can check whether we <em>have that too</em>; if we don't: cede back to the read loop. But if we do:</p>
<ul>
<li>our <em>actual payload</em> is the <code>length</code> bytes <em>after</em> the header - i.e. <code>input.Slice(8, length)</code></li>
<li>we want to update <code>input</code> by cutting off everything up to the end of the frame, i.e. <code>input = input.Slice(payload.End)</code></li>
</ul>
<p>This means that when we return <code>true</code>, <code>payload</code> now contains the bytes that were sent to us, as a discontiguous buffer.</p>
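<p>If the discontiguous nature of <code>ReadOnlySequence&lt;byte&gt;</code> seems abstract, here's a standalone sketch that builds a two-segment sequence by hand, showing that slicing and copying cope with segment boundaries transparently (the <code>Segment</code> helper is hand-rolled for illustration - it isn't part of the pipelines API):</p>

```csharp
using System;
using System.Buffers;

// minimal hand-rolled segment type, for illustration only
class Segment : ReadOnlySequenceSegment<byte>
{
    public Segment(byte[] data, Segment previous = null)
    {
        Memory = data;
        if (previous != null)
        {
            RunningIndex = previous.RunningIndex + previous.Memory.Length;
            previous.Next = this;
        }
    }
}

class Program
{
    static void Main()
    {
        var first = new Segment(new byte[] { 1, 2, 3, 4, 5 });
        var second = new Segment(new byte[] { 6, 7, 8, 9, 10 }, first);
        var seq = new ReadOnlySequence<byte>(first, 0, second, second.Memory.Length);

        Console.WriteLine(seq.Length);       // 10: the logical payload
        Console.WriteLine(seq.First.Length); // 5: only the first segment

        // an 8-byte "header" straddles the boundary, so we copy locally -
        // exactly the fallback branch in TryParseFrame above
        Span<byte> local = stackalloc byte[8];
        seq.Slice(0, 8).CopyTo(local);
        Console.WriteLine(local[7]);         // 8
    }
}
```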
<p>We should also take a look at <code>ParseFrameHeader</code>, which is a close cousin to <code>WriteFrameHeader</code>:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">static</span> <span class="pl-k">int</span> <span class="pl-en">ParseFrameHeader</span>(
<span class="pl-en">ReadOnlySpan</span><<span class="pl-k">byte</span>> <span class="pl-smi">input</span>, <span class="pl-k">out</span> <span class="pl-k">int</span> <span class="pl-smi">messageId</span>)
{
<span class="pl-k">var</span> <span class="pl-smi">length</span> <span class="pl-k">=</span> <span class="pl-smi">BinaryPrimitives</span>
.<span class="pl-en">ReadInt32LittleEndian</span>(<span class="pl-smi">input</span>);
<span class="pl-smi">messageId</span> <span class="pl-k">=</span> <span class="pl-smi">BinaryPrimitives</span>
.<span class="pl-en">ReadInt32LittleEndian</span>(<span class="pl-smi">input</span>.<span class="pl-en">Slice</span>(<span class="pl-c1">4</span>));
<span class="pl-k">return</span> <span class="pl-smi">length</span>;
}</pre></div>
<p>Once again, <code>BinaryPrimitives</code> is helping us out, and we are slicing the <code>input</code> in exactly the same way as before to get the two halves.</p>
<hr>
<p>So; we can parse frames; now we need to act upon them; here's our client implementation:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">protected</span> <span class="pl-k">override</span> <span class="pl-en">ValueTask</span> <span class="pl-en">OnReceiveAsync</span>(
<span class="pl-en">ReadOnlySequence</span><<span class="pl-k">byte</span>> <span class="pl-smi">payload</span>, <span class="pl-k">int</span> <span class="pl-smi">messageId</span>)
{
<span class="pl-k">if</span> (<span class="pl-smi">messageId</span> <span class="pl-k">!=</span> <span class="pl-c1">0</span>)
{ <span class="pl-c"><span class="pl-c">//</span> request/response</span>
<span class="pl-en">TaskCompletionSource</span><<span class="pl-en">IMemoryOwner</span><<span class="pl-k">byte</span>>> <span class="pl-smi">tcs</span>;
<span class="pl-k">lock</span> (<span class="pl-smi">_awaitingResponses</span>)
{
<span class="pl-k">if</span> (<span class="pl-smi">_awaitingResponses</span>.<span class="pl-en">TryGetValue</span>(<span class="pl-smi">messageId</span>, <span class="pl-k">out</span> <span class="pl-smi">tcs</span>))
{
<span class="pl-smi">_awaitingResponses</span>.<span class="pl-en">Remove</span>(<span class="pl-smi">messageId</span>);
}
}
<span class="pl-smi">tcs</span><span class="pl-k">?</span>.<span class="pl-en">TrySetResult</span>(<span class="pl-smi">payload</span>.<span class="pl-en">Lease</span>());
}
<span class="pl-k">else</span>
{ <span class="pl-c"><span class="pl-c">//</span> unsolicited</span>
<span class="pl-smi">MessageReceived</span><span class="pl-k">?</span>.<span class="pl-en">Invoke</span>(<span class="pl-smi">payload</span>.<span class="pl-en">Lease</span>());
}
<span class="pl-k">return</span> <span class="pl-smi">default</span>;
}</pre></div>
<p>This code has two paths; it can be the request/response scenario, or it can be an out-of-band response message with no request. So; if we <em>have</em> a non-zero <code>messageId</code>, we check (synchronized) in our <code>_awaitingResponses</code> dictionary to see if we have a message awaiting completion. If we do, we use <code>TrySetResult</code> to complete the task (after exiting the <code>lock</code>), giving it a lease with the data from the message. Otherwise, we check whether the <code>MessageReceived</code> event is subscribed, and invoke that similarly. In both cases, the use of <code>?.</code> here means that we don't populate a leased array if nobody is listening. It will be the receiver's job to ensure the lease is disposed, as only they can know the lifetime.</p>
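<p>We haven't shown <code>Lease()</code>; conceptually, it copies the discontiguous payload into a buffer rented from the memory pool, handing back an <code>IMemoryOwner&lt;byte&gt;</code> that the receiver disposes. A simplified sketch of the idea - note this is an assumption about the shape, and a production version would also need to remember the logical length, since the pool can over-allocate:</p>

```csharp
using System.Buffers;

static class SequenceExtensions
{
    // simplified sketch only: a real implementation would wrap the rented
    // buffer so that Memory is trimmed to the payload length, because
    // Rent is allowed to return a larger buffer than requested
    public static IMemoryOwner<byte> Lease(this ReadOnlySequence<byte> source)
    {
        var length = checked((int)source.Length);
        var owner = MemoryPool<byte>.Shared.Rent(length);
        source.CopyTo(owner.Memory.Span);
        return owner;
    }
}
```

<p>Disposing the returned owner hands the buffer back to the pool - which is why the receiver owning the lifetime matters so much.</p>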
<h2><a id="user-content-service-please" class="anchor" aria-hidden="true" href="#service-please"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Service, please</h2>
<p>We need to think a little about how we orchestrate this at the server. The <code>SimplPipeline</code> base type above relates to a <em>single</em> connection - it is essentially a proxy to a socket. But servers usually have many clients. Because of that, we'll create a server type that does the <em>actual processing</em>, that internally has a client-type that is our <code>SimplPipeline</code>, and a set of connected clients; so:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-k">abstract</span> <span class="pl-k">class</span> <span class="pl-en">SimplPipelineServer</span> : <span class="pl-en">IDisposable</span>
{
<span class="pl-k">protected</span> <span class="pl-k">abstract</span> ValueTask<IMemoryOwner<byte>>
<span class="pl-en">OnReceiveForReplyAsync</span>(<span class="pl-en">IMemoryOwner</span><<span class="pl-k">byte</span>> <span class="pl-smi">message</span>);
<span class="pl-k">public</span> <span class="pl-k">int</span> <span class="pl-smi">ClientCount</span> <span class="pl-k">=></span> <span class="pl-smi">_clients</span>.<span class="pl-smi">Count</span>;
<span class="pl-k">public</span> <span class="pl-en">Task</span> <span class="pl-en">RunClientAsync</span>(<span class="pl-en">IDuplexPipe</span> <span class="pl-smi">pipe</span>,
<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
<span class="pl-k">=></span> <span class="pl-k">new</span> <span class="pl-en">Client</span>(<span class="pl-smi">pipe</span>, <span class="pl-k">this</span>).<span class="pl-en">RunAsync</span>(<span class="pl-smi">cancellationToken</span>);
<span class="pl-k">private</span> <span class="pl-k">class</span> <span class="pl-en">Client</span> : <span class="pl-en">SimplPipeline</span>
{
<span class="pl-k">public</span> <span class="pl-en">Task</span> <span class="pl-en">RunAsync</span>(<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span>)
<span class="pl-k">=></span> <span class="pl-en">StartReceiveLoopAsync</span>(<span class="pl-smi">cancellationToken</span>);
<span class="pl-k">private</span> <span class="pl-k">readonly</span> <span class="pl-en">SimplPipelineServer</span> <span class="pl-smi">_server</span>;
<span class="pl-k">public</span> <span class="pl-en">Client</span>(<span class="pl-en">IDuplexPipe</span> <span class="pl-smi">pipe</span>, <span class="pl-en">SimplPipelineServer</span> <span class="pl-smi">server</span>)
: <span class="pl-k">base</span>(<span class="pl-smi">pipe</span>) <span class="pl-k">=></span> <span class="pl-smi">_server</span> <span class="pl-k">=</span> <span class="pl-smi">server</span>;
<span class="pl-k">protected</span> <span class="pl-k">override</span> <span class="pl-k">async</span> <span class="pl-en">ValueTask</span> <span class="pl-en">OnReceiveAsync</span>(
<span class="pl-en">ReadOnlySequence</span><<span class="pl-k">byte</span>> <span class="pl-smi">payload</span>, <span class="pl-k">int</span> <span class="pl-smi">messageId</span>)
{
<span class="pl-k">using</span> (<span class="pl-k">var</span> <span class="pl-smi">msg</span> <span class="pl-k">=</span> <span class="pl-smi">payload</span>.<span class="pl-en">Lease</span>())
{
<span class="pl-k">var</span> <span class="pl-smi">response</span> <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">_server</span>.<span class="pl-en">OnReceiveForReplyAsync</span>(<span class="pl-smi">msg</span>);
<span class="pl-k">await</span> <span class="pl-en">WriteAsync</span>(<span class="pl-smi">response</span>, <span class="pl-smi">messageId</span>);
}
}
}
}</pre></div>
<p>So; our <em>publicly visible server type</em>, <code>SimplPipelineServer</code> has an <code>abstract</code> method for providing the implementation for <em>what we want to do with messages</em>: <code>OnReceiveForReplyAsync</code> - that takes a payload, and returns the response. Behind the scenes we have a set of clients, <code>_clients</code>, although the details of that aren't interesting.</p>
<p>We accept new clients via the <code>RunClientAsync</code> method; this might seem counter-intuitive, but the emerging architecture for pipelines servers (especially considering "Kestrel" hosts) is to let an external host deal with listening and accepting connections; all we need to do is provide something that accepts an <code>IDuplexPipe</code> and returns a <code>Task</code>. In this case, what that <em>does</em> is create a new <code>Client</code> and start the client's read loop, <code>StartReceiveLoopAsync</code>. When the client receives a message (<code>OnReceiveAsync</code>), it asks the server for a response (<code>_server.OnReceiveForReplyAsync</code>), and then writes that response back via <code>WriteAsync</code>. Note that the version of <code>OnReceiveAsync</code> shown means that we can't handle multiple overlapped messages on the same connection at the same time; the "real" version has been aggressively uglified: it checks whether <code>_server.OnReceiveForReplyAsync(msg)</code> completed synchronously, and if it didn't, it schedules a <em>continuation</em> to perform the <code>WriteAsync</code> (also handling the disposal of <code>msg</code>) and yields to the caller - optimizing for the "everything is synchronous" case.</p>
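<p>The "completed synchronously?" dance is a common <code>ValueTask</code> pattern; the real uglified code isn't shown, but a self-contained sketch of the general shape (with illustrative names and payloads - this is not the actual <code>OnReceiveAsync</code>) looks like this:</p>

```csharp
using System;
using System.Threading.Tasks;

class Demo
{
    static int _handled;

    // the general shape of the "completed synchronously?" optimization;
    // the real OnReceiveAsync is messier, but follows this pattern
    static ValueTask OnReceiveAsync(ValueTask<int> pending)
    {
        if (pending.IsCompletedSuccessfully)
        {
            // synchronous fast path: no await, no async state machine
            _handled += pending.Result;
            return default;
        }
        // asynchronous slow path: schedule a continuation and yield
        return Awaited(pending);

        async ValueTask Awaited(ValueTask<int> p) => _handled += await p;
    }

    static async Task Main()
    {
        await OnReceiveAsync(new ValueTask<int>(1)); // completes synchronously
        await OnReceiveAsync(new ValueTask<int>(Task.Run(() => 2)));
        Console.WriteLine(_handled); // 3
    }
}
```

<p>The point of the fast path is that when the inner operation completed synchronously, we never touch the async machinery at all.</p>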
<p>The only other server API we need is a broadcast:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-k">async</span> <span class="pl-en">ValueTask</span><<span class="pl-k">int</span>> <span class="pl-en">BroadcastAsync</span>(
<span class="pl-en">ReadOnlyMemory</span><<span class="pl-k">byte</span>> <span class="pl-smi">message</span>)
{
<span class="pl-k">int</span> <span class="pl-smi">count</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>;
<span class="pl-k">foreach</span> (<span class="pl-k">var</span> <span class="pl-smi">client</span> <span class="pl-k">in</span> <span class="pl-smi">_clients</span>)
{
<span class="pl-k">try</span>
{
<span class="pl-k">await</span> <span class="pl-smi">client</span>.<span class="pl-smi">Key</span>.<span class="pl-en">SendAsync</span>(<span class="pl-smi">message</span>);
<span class="pl-smi">count</span><span class="pl-k">++</span>;
}
<span class="pl-k">catch</span> { } <span class="pl-c"><span class="pl-c">//</span> ignore failures on specific clients</span>
}
<span class="pl-k">return</span> <span class="pl-smi">count</span>;
}</pre></div>
<p>(again, possibly with an overload that takes <code>IMemoryOwner<byte></code>)</p>
<p>where <code>SendAsync</code> is simply:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">public</span> <span class="pl-en">ValueTask</span> <span class="pl-en">SendAsync</span>(<span class="pl-en">ReadOnlyMemory</span><<span class="pl-k">byte</span>> <span class="pl-smi">message</span>)
<span class="pl-k">=></span> <span class="pl-en">WriteAsync</span>(<span class="pl-smi">message</span>, <span class="pl-c1">0</span>);</pre></div>
<h2><a id="user-content-putting-it-all-together-implementing-a-client-and-server" class="anchor" aria-hidden="true" href="#putting-it-all-together-implementing-a-client-and-server"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Putting it all together; implementing a client and server</h2>
<p>So how can we <em>use</em> all of this? How can we get a working client and server? Let's start with the simpler of the two, the client:</p>
<pre><code>using (var client = await SimplPipelineClient.ConnectAsync(
new IPEndPoint(IPAddress.Loopback, 5000)))
{
// subscribe to broadcasts
client.MessageReceived += async msg => {
if (!msg.Memory.IsEmpty)
await WriteLineAsync('*', msg);
};
string line;
while ((line = await Console.In.ReadLineAsync()) != null)
{
if (line == "q") break;
using (var leased = line.Encode())
{
var response = await client.SendReceiveAsync(leased.Memory);
await WriteLineAsync('<', response);
}
}
}
</code></pre>
<p><code>SimplPipelineClient.ConnectAsync</code> here just uses <code>Pipelines.Sockets.Unofficial</code> to spin up a client socket pipeline, and starts the <code>StartReceiveLoopAsync()</code> method. Taking an additional dependency on <code>Pipelines.Sockets.Unofficial</code> is vexing, but right now there is no framework-supplied client-socket API for pipelines, so: it'll do the job.</p>
<p>This code sets up a simple console client that takes keyboard input; if it receives a <code>"q"</code> it quits; otherwise it sends the message to the server (<code>Encode</code>, not shown, is just a simple text-encode into a leased buffer), and writes the response. The <code>WriteLineAsync</code> method here takes a leased buffer, decodes it, and writes the output to the console - then disposes the buffer. We also listen for unsolicited messages via <code>MessageReceived</code>, and write those to the console with a different prefix.</p>
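<p><code>Encode</code> and the lease type aren't shown here; purely as a guess at the shape, it might UTF-8 encode into a pooled buffer and expose only the bytes actually written (the <code>Sliced</code> wrapper is illustrative, not the real code):</p>

```csharp
using System;
using System.Buffers;
using System.Text;

static class LeaseExtensions
{
    // hypothetical Encode: UTF-8 encode into a pooled buffer, exposing
    // exactly the payload via .Memory
    public static IMemoryOwner<byte> Encode(this string value)
    {
        int max = Encoding.UTF8.GetMaxByteCount(value.Length);
        IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(max);
        int len = Encoding.UTF8.GetBytes(value.AsSpan(), owner.Memory.Span);
        return new Sliced(owner, len);
    }

    sealed class Sliced : IMemoryOwner<byte>
    {
        readonly IMemoryOwner<byte> _owner;
        readonly int _length;
        public Sliced(IMemoryOwner<byte> owner, int length)
        {
            _owner = owner;
            _length = length;
        }
        // the pooled buffer may be oversized; only expose the used slice
        public Memory<byte> Memory => _owner.Memory.Slice(0, _length);
        public void Dispose() => _owner.Dispose();
    }
}

class Program
{
    static void Main()
    {
        using (var leased = "hello".Encode())
        {
            Console.WriteLine(leased.Memory.Length); // 5
        }
    }
}
```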
<p>The server is a little more involved; first we need to implement a server; in this case let's simply reverse the bytes we get:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">class</span> <span class="pl-en">ReverseServer</span> : <span class="pl-en">SimplPipelineServer</span>
{
<span class="pl-k">protected</span> <span class="pl-k">override</span> ValueTask<IMemoryOwner<byte>>
<span class="pl-en">OnReceiveForReplyAsync</span>(<span class="pl-en">IMemoryOwner</span><<span class="pl-k">byte</span>> <span class="pl-smi">message</span>)
{
<span class="pl-c"><span class="pl-c">//</span> since the "message" outlives the response write,</span>
<span class="pl-c"><span class="pl-c">//</span> we can do an in-place reverse and hand</span>
<span class="pl-c"><span class="pl-c">//</span> the same buffer back</span>
<span class="pl-k">var</span> <span class="pl-smi">memory</span> <span class="pl-k">=</span> <span class="pl-smi">message</span>.<span class="pl-smi">Memory</span>;
<span class="pl-en">Reverse</span>(<span class="pl-smi">memory</span>.<span class="pl-smi">Span</span>); <span class="pl-c"><span class="pl-c">//</span> details not shown</span>
<span class="pl-k">return</span> <span class="pl-k">new</span> <span class="pl-en">ValueTask</span><<span class="pl-en">IMemoryOwner</span><<span class="pl-k">byte</span>>>(<span class="pl-smi">memory</span>);
}
}</pre></div>
<p>All this does is respond to messages by returning the same payload, but backwards. And yes, I realize that since we're dealing with text, this could go horribly wrong for grapheme-clusters and/or multi-byte code-points! I never said it was a <em>useful</em> server...</p>
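<p>The <code>Reverse</code> details were elided; for completeness, it can be as simple as an in-place two-pointer swap over the span (newer runtimes can just use <code>MemoryExtensions.Reverse</code>):</p>

```csharp
using System;

class Program
{
    // in-place reverse of a span; no allocations, works over any backing memory
    static void Reverse(Span<byte> span)
    {
        for (int i = 0, j = span.Length - 1; i < j; i++, j--)
        {
            byte tmp = span[i];
            span[i] = span[j];
            span[j] = tmp;
        }
    }

    static void Main()
    {
        byte[] data = { 1, 2, 3, 4 };
        Reverse(data);
        Console.WriteLine(string.Join(",", data)); // 4,3,2,1
    }
}
```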
<p>Next up, we need a host. Kestrel (the "ASP.NET Core" server) is an excellent choice there, but implementing a Kestrel host requires introducing quite a few more concepts. But... since we already took a dependency on <code>Pipelines.Sockets.Unofficial</code> for the client, we can use that for the server host with a few lines of code:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">class</span> <span class="pl-en">SimplPipelineSocketServer</span> : <span class="pl-en">SocketServer</span>
{
<span class="pl-k">public</span> <span class="pl-en">SimplPipelineServer</span> <span class="pl-smi">Server</span> { <span class="pl-k">get</span>; }
<span class="pl-k">public</span> <span class="pl-en">SimplPipelineSocketServer</span>(<span class="pl-en">SimplPipelineServer</span> <span class="pl-smi">server</span>)
<span class="pl-k">=></span> <span class="pl-smi">Server</span> <span class="pl-k">=</span> <span class="pl-smi">server</span>;
<span class="pl-k">protected</span> <span class="pl-k">override</span> <span class="pl-en">Task</span> <span class="pl-en">OnClientConnectedAsync</span>(
<span class="pl-en">in</span> <span class="pl-smi">ClientConnection</span> client)
<span class="pl-k">=></span> <span class="pl-smi">Server</span>.<span class="pl-en">RunClientAsync</span>(<span class="pl-smi">client</span>.<span class="pl-smi">Transport</span>);
<span class="pl-k">public</span> <span class="pl-k">static</span> <span class="pl-en">SimplPipelineSocketServer</span> <span class="pl-en">For</span><<span class="pl-en">T</span>>()
<span class="pl-k">where</span> <span class="pl-en">T</span> : <span class="pl-en">SimplPipelineServer</span>, <span class="pl-k">new</span>()
<span class="pl-k">=></span> <span class="pl-k">new</span> <span class="pl-en">SimplPipelineSocketServer</span>(<span class="pl-k">new</span> <span class="pl-en">T</span>());
<span class="pl-k">protected</span> <span class="pl-k">override</span> <span class="pl-k">void</span> <span class="pl-en">Dispose</span>(<span class="pl-k">bool</span> <span class="pl-smi">disposing</span>)
{
<span class="pl-k">if</span> (<span class="pl-smi">disposing</span>) <span class="pl-smi">Server</span>.<span class="pl-en">Dispose</span>();
}
}</pre></div>
<p>The key line in here is our <code>OnClientConnectedAsync</code> method, which is how we accept new connections, simply by passing down the <code>client.Transport</code> (an <code>IDuplexPipe</code>). Hosting in Kestrel works very similarly, except you subclass <code>ConnectionHandler</code> instead of <code>SocketServer</code>, and <code>override</code> the <code>OnConnectedAsync</code> method - but there are a few more steps involved in plumbing everything together. Kestrel, however, has advantages such as supporting exotic socket APIs.</p>
<p>So, let's whack together a console that interacts with the server:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">using</span> (<span class="pl-en">var</span> <span class="pl-en">socket</span> <span class="pl-k">=</span>
<span class="pl-en">SimplPipelineSocketServer</span>.<span class="pl-en">For</span><<span class="pl-en">ReverseServer</span>>())
{
<span class="pl-en">socket</span>.<span class="pl-en">Listen</span>(<span class="pl-en">new</span> <span class="pl-en">IPEndPoint</span>(<span class="pl-en">IPAddress</span>.<span class="pl-en">Loopback</span>, 5000));
<span class="pl-k">string</span> <span class="pl-smi">line</span>;
<span class="pl-k">while</span> ((<span class="pl-smi">line</span> <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">Console</span>.<span class="pl-smi">In</span>.<span class="pl-en">ReadLineAsync</span>()) <span class="pl-k">!=</span> <span class="pl-c1">null</span>)
{
<span class="pl-k">if</span> (<span class="pl-smi">line</span> <span class="pl-k">==</span> <span class="pl-s"><span class="pl-pds">"</span>q<span class="pl-pds">"</span></span>) <span class="pl-k">break</span>;
<span class="pl-k">int</span> <span class="pl-smi">clientCount</span>, <span class="pl-smi">len</span>;
<span class="pl-k">using</span> (<span class="pl-k">var</span> <span class="pl-smi">leased</span> <span class="pl-k">=</span> <span class="pl-smi">line</span>.<span class="pl-en">Encode</span>())
{
<span class="pl-smi">len</span> <span class="pl-k">=</span> <span class="pl-smi">leased</span>.<span class="pl-smi">Memory</span>.<span class="pl-smi">Length</span>;
<span class="pl-smi">clientCount</span> <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">socket</span>.<span class="pl-smi">Server</span>.<span class="pl-en">BroadcastAsync</span>(<span class="pl-smi">leased</span>.<span class="pl-smi">Memory</span>);
}
<span class="pl-k">await</span> <span class="pl-smi">Console</span>.<span class="pl-smi">Out</span>.<span class="pl-en">WriteLineAsync</span>(
<span class="pl-s"><span class="pl-pds">$"</span>Broadcast {<span class="pl-smi">len</span>} bytes to {<span class="pl-smi">clientCount</span>} clients<span class="pl-pds">"</span></span>);
}
}</pre></div>
<p>This works much like the client, except any input other than <code>"q"</code> is <em>broadcast</em> to all the clients.</p>
<h2><a id="user-content-now-race-your-horses" class="anchor" aria-hidden="true" href="#now-race-your-horses"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Now race your horses</h2>
<p>We're not just doing this for fun! The key objective of things like pipelines and the array-pool is to make it <strong>much</strong> simpler to write IO code that makes efficient use of memory; reducing allocations (and <em>especially</em> reducing large object allocations) <em>significantly</em> reduces garbage collection overhead, allowing our code to be much more scalable (useful for both servers, and high-throughput client scenarios). Our use of <code>async</code>/<code>await</code> makes it <strong>much</strong> simpler to make effective use of the CPU: instead of blocking for a while, we can make the thread available to do other <em>useful work</em> - increasing throughput, and once again: reducing memory usage (having lots of threads is <em>not</em> cheap - each thread has a quite significant stack space reserved for it).</p>
<p>Note that this isn't entirely free; fetching arrays from the pool (and remembering to return them) <em>by itself</em> has some overhead - but the general expectation is that the cost of checking the pool is, <em>overall</em>, lower than the cost associated from constant allocations and collections. Similarly, <code>async</code>: the hope is that the increased scalability afforded by freeing up threads more-than-offsets the cost of the additional work required by the plumbing involved.</p>
<p>But: there's only one way to find out. <a href="https://ericlippert.com/2012/12/17/performance-rant/" rel="nofollow">As Eric Lippert puts it</a>:</p>
<blockquote>
<p>If you have two horses and you want to know which of the two is the faster then <strong>race your horses</strong></p>
</blockquote>
<p>Setting up a good race-track for code can be awkward, because we need to try to reproduce a <em>meaningful scenario</em>. And it is <em>amazingly</em> easy to write bad performance tests. Rather than reinvent bad code, it is <em>hugely</em> advisable to lean on tools like <a href="https://benchmarkdotnet.org/" rel="nofollow"><code>BenchmarkDotNet</code></a>. If you are <em>even remotely</em> performance minded, and you haven't used <code>BenchmarkDotNet</code>: sorry, but <em>you're doing it wrong</em>.</p>
<p>There are 4 combinations we can check here:</p>
<ul>
<li><code>SimplSocketClient</code> against <code>SimplSocketServer</code></li>
<li><code>SimplSocketClient</code> against <code>SimplPipelineServer</code></li>
<li><code>SimplPipelineClient</code> against <code>SimplSocketServer</code></li>
<li><code>SimplPipelineClient</code> against <code>SimplPipelineServer</code></li>
</ul>
<p>I won't list all of these, but for these tests I'll use a <code>[GlobalSetup]</code> method (a <code>BenchmarkDotNet</code> concept) to spin up both servers (on different ports), then we can test clients against each. Here's our "<code>SimplSocketClient</code> against <code>SimplSocketServer</code>" test (remembering that <code>SimplSocketClient</code> is synchronous):</p>
<div class="highlight highlight-source-cs"><pre>[<span class="pl-en">Benchmark</span>(<span class="pl-en">OperationsPerInvoke</span> <span class="pl-k">=</span> <span class="pl-smi">Ops</span>)]
<span class="pl-k">public</span> <span class="pl-k">long</span> <span class="pl-en">c1_s1</span>()
{
<span class="pl-k">long</span> <span class="pl-smi">x</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>;
<span class="pl-k">using</span> (<span class="pl-k">var</span> <span class="pl-smi">client</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">SimplSocketClient</span>(<span class="pl-smi">CreateSocket</span>))
{
<span class="pl-smi">client</span>.<span class="pl-en">Connect</span>(<span class="pl-smi">s1</span>);
<span class="pl-k">for</span> (<span class="pl-k">int</span> <span class="pl-smi">i</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>; <span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-smi">Ops</span>; <span class="pl-smi">i</span><span class="pl-k">++</span>)
{
<span class="pl-k">var</span> <span class="pl-smi">response</span> <span class="pl-k">=</span> <span class="pl-smi">client</span>.<span class="pl-en">SendReceive</span>(<span class="pl-smi">_data</span>);
<span class="pl-smi">x</span> <span class="pl-k">+=</span> <span class="pl-smi">response</span>.<span class="pl-smi">Length</span>;
}
}
<span class="pl-k">return</span> <span class="pl-en">AssertResult</span>(<span class="pl-smi">x</span>);
}</pre></div>
<p>and here's our "<code>SimplPipelineClient</code> against <code>SimplPipelineServer</code>" test (using a <code>Task</code> this time, as <code>SimplPipelineClient</code> uses an <code>async</code> API):</p>
<div class="highlight highlight-source-cs"><pre>[<span class="pl-en">Benchmark</span>(<span class="pl-en">OperationsPerInvoke</span> <span class="pl-k">=</span> <span class="pl-smi">Ops</span>)]
<span class="pl-k">public</span> <span class="pl-k">async</span> <span class="pl-en">Task</span><<span class="pl-k">long</span>> <span class="pl-en">c2_s2</span>()
{
<span class="pl-k">long</span> <span class="pl-smi">x</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>;
<span class="pl-k">using</span> (<span class="pl-k">var</span> <span class="pl-smi">client</span> <span class="pl-k">=</span>
<span class="pl-k">await</span> <span class="pl-smi">SimplPipelineClient</span>.<span class="pl-en">ConnectAsync</span>(<span class="pl-smi">s2</span>))
{
<span class="pl-k">for</span> (<span class="pl-k">int</span> <span class="pl-smi">i</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>; <span class="pl-smi">i</span> <span class="pl-k"><</span> <span class="pl-smi">Ops</span>; <span class="pl-smi">i</span><span class="pl-k">++</span>)
{
<span class="pl-k">using</span> (<span class="pl-k">var</span> <span class="pl-smi">response</span> <span class="pl-k">=</span>
<span class="pl-k">await</span> <span class="pl-smi">client</span>.<span class="pl-en">SendReceiveAsync</span>(<span class="pl-smi">_data</span>))
{
<span class="pl-smi">x</span> <span class="pl-k">+=</span> <span class="pl-smi">response</span>.<span class="pl-smi">Memory</span>.<span class="pl-smi">Length</span>;
}
}
}
<span class="pl-k">return</span> <span class="pl-en">AssertResult</span>(<span class="pl-smi">x</span>);
}</pre></div>
<p>Note that we're performing multiple operations (<code>Ops</code>) per run here, so we're not just measuring overheads like connect. Other than that, we'll just let <code>BenchmarkDotNet</code> do the hard work. We run our tests, and we get (after some time; benchmarking isn't always fast, although you can tune the iteration counts etc. to speed it up if you want):</p>
<table>
<thead>
<tr>
<th>Method</th>
<th>Runtime</th>
<th align="right">Mean</th>
<th align="right">Error</th>
<th align="right">StdDev</th>
<th align="right">Gen 0</th>
<th align="right">Gen 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>c1_s1</td>
<td>Clr</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">N/A</td>
<td align="right">N/A</td>
</tr>
<tr>
<td>c1_s2</td>
<td>Clr</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">N/A</td>
<td align="right">N/A</td>
</tr>
<tr>
<td>c2_s1</td>
<td>Clr</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">N/A</td>
<td align="right">N/A</td>
</tr>
<tr>
<td>c2_s2</td>
<td>Clr</td>
<td align="right">45.99us</td>
<td align="right">0.4275us</td>
<td align="right">0.2544us</td>
<td align="right">0.3636</td>
<td align="right">0.0909</td>
</tr>
<tr>
<td>c1_s1</td>
<td>Core</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">N/A</td>
<td align="right">N/A</td>
</tr>
<tr>
<td>c1_s2</td>
<td>Core</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">N/A</td>
<td align="right">N/A</td>
</tr>
<tr>
<td>c2_s1</td>
<td>Core</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">NA</td>
<td align="right">N/A</td>
<td align="right">N/A</td>
</tr>
<tr>
<td>c2_s2</td>
<td>Core</td>
<td align="right">29.87us</td>
<td align="right">0.2294us</td>
<td align="right">0.1518us</td>
<td align="right">0.1250</td>
<td align="right">-</td>
</tr></tbody></table>
<p>Now, you're probably looking at that table and thinking "huh? most of the data is missing - how can I interpret that?" - and: you wouldn't be wrong! It turns out that the <code>c1</code> (<code>SimplSocketClient</code>) and <code>s1</code> (<code>SimplSocketServer</code>) implementations are <em>simply unreliable</em>. Ultimately, it was <strong>painfully hard</strong> to write reliable socket code before pipelines, and it looks like the legacy implementation simply has bugs and race conditions that <em>don't show up in casual usage</em> (it works fine in the REPL client), but which manifest pretty quickly when <code>BenchmarkDotNet</code> runs it <em>aggressively</em>. Our "pipelines" implementation simply used the "obvious" thing, and <em>it works reliably first time</em>. All of the complex pieces that IO authors previously had to worry about have now moved to the framework code, which enables programmers to focus on the interesting thing <em>that they're trying to do</em> (rather than spending most of their time fighting with IO intrinsics), <em>and</em> benefit from a reliable well-tested implementation of the ugly IO code.</p>
<blockquote>
<p>A major advantage of moving to pipelines is getting rid of the gnarly IO bugs that <em>you didn't even know you had</em>.</p>
</blockquote>
<p>I will be more than happy to update this table with updated numbers if <code>SimplSockets</code> can find the things that are stalling it.</p>
<p>Of the numbers that we <em>do</em> have, we can see that the pipelines implementation behaves <em>well</em> on <code>Clr</code> (.NET Framework) but works <em>much better</em> on <code>Core</code> (.NET Core). .NET Core 2.1 is frankly <em>amazing</em> (and 3.0 looks even better) - with <em>lots</em> of advantages. If you're serious about performance, migrating to .NET Core should definitely be on your roadmap.</p>
<h2><a id="user-content-summary" class="anchor" aria-hidden="true" href="#summary"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Summary</h2>
<p>This has been a long read, but I hope I've conveyed some useful practical advice and tips for working with pipelines in real systems, in a way that is directly translatable to your <em>own</em> requirements. If you want to play with the code in more depth, or see it in action, you can <a href="https://github.com/mgravell/simplsockets/">see my fork here</a>.</p>
<p>Update: please also see <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-31.html">part 3.1</a> for further clarifications on this post</p>
<p>Enjoy!</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-51878917846161915102018-07-03T08:42:00.003-07:002018-08-01T07:51:33.039-07:00Pipe Dreams, part 2<h1><a href="#pipelines---a-guided-tour-of-the-new-io-api-in-net-part-2" aria-hidden="true" class="anchor" id="user-content-pipelines---a-guided-tour-of-the-new-io-api-in-net-part-2"></a>Pipelines - a guided tour of the new IO API in .NET, part 2</h1>
<p>In <a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-1.html" rel="nofollow">part 1</a>, we discussed some of the problems that exist in the familiar <code>Stream</code> API, and we had an introduction to the <code>Pipe</code>, <code>PipeWriter</code> and <code>PipeReader</code> APIs, looking at how to write to a single <code>Pipe</code> and then consume the data from that <code>Pipe</code>; we also discussed how <code>FlushAsync()</code> and <code>ReadAsync()</code> work together to keep both sides of the machinery working, dealing with "empty" and "full" scenarios - suspending the reader when there is nothing to do, and resuming it when data arrives; and suspending the writer when it is out-pacing the reader (over-filling the pipe), and resuming when the reader has caught up; and we discussed what it <em>means</em> to "resume" here, in terms of the threading model.</p>
<p>In this part, we're going to discuss the <em>memory model</em> of pipelines: where does the data actually <em>live</em>? We'll also start looking at how we can use pipelines in realistic scenarios to fulfil real needs.</p>
<h2><a href="#the-memory-model-where-are-all-my-datas" aria-hidden="true" class="anchor" id="user-content-the-memory-model-where-are-all-my-datas"></a>The memory model: where are all my datas?</h2>
<p>In part 1, we spoke about how the pipe owns all the buffers, allowing the writer to request a buffer via <code>GetMemory()</code> and <code>GetSpan()</code>, with the committed data later being exposed to the reader via the <code>.Buffer</code> on <code>ReadAsync()</code> - which is a <code>ReadOnlySequence<byte></code>, i.e. some number of <em>segments</em> of data.</p>
<p>So what <em>actually happens</em>?</p>
<p>Each <code>Pipe</code> instance has a reference to a <code>MemoryPool<byte></code> - a new device in <a href="https://www.nuget.org/packages/System.Memory/" rel="nofollow"><code>System.Memory</code></a> for, unsurprisingly, creating a memory pool. You can specify a specific <code>MemoryPool<byte></code> in the options when creating a <code>Pipe</code>, but by default (and, I imagine, almost always) - a shared application-wide pool (<code>MemoryPool<byte>.Shared</code>) is used.</p>
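<p>For illustration, specifying the pool (and, while we're at it, the segment sizing that we'll discuss in a moment) via <code>PipeOptions</code> looks like this - the values here are purely illustrative:</p>

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;

class Program
{
    static void Main()
    {
        var pipe = new Pipe(new PipeOptions(
            pool: MemoryPool<byte>.Shared, // this is the default anyway
            minimumSegmentSize: 4096));    // override the usual segment size

        // even a tiny request gets at least a whole segment to play with
        Span<byte> span = pipe.Writer.GetSpan(1);
        Console.WriteLine(span.Length >= 4096); // True
    }
}
```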
<p>The <code>MemoryPool<byte></code> concept is very open-ended. The <em>default</em> implementation simply makes use of <code>ArrayPool<byte>.Shared</code> (the application-wide array-pool), renting arrays as needed, and returning them when done. This <code>ArrayPool<T></code> is implemented using <code>WeakReference</code>, so pooled arrays are collectible if memory pressure demands it. However, when you call <code>GetMemory(someSize)</code> or <code>GetSpan(someSize)</code>, the pipe doesn't simply ask the memory pool for <em>that amount</em>; instead, it tracks a "segment" internally. A new "segment" will be (by default; this is configurable) the larger of <code>someSize</code> or 2048 bytes. Requesting a non-trivial amount of memory means that we aren't filling the system with tiny arrays, which would significantly impact garbage collection. When you <code>Advance(bytesWritten)</code> in the writer, it:</p>
<ul>
<li>moves an internal counter that is how much of the current segment has been used</li>
<li>updates the end of the "available to be read" chain for the reader; if we've just written the first bytes of an empty segment, this will mean adding a new segment to the chain, otherwise it'll mean increasing the end marker of the final segment of the existing chain</li>
</ul>
<p>It is this "available to be read" chain that we fetch in <code>ReadAsync()</code>; and as we <code>AdvanceTo</code> in the reader - when entire segments are consumed, the pipe hands those segments back to the memory pool. From there, they can be reused many times. And as a direct consequence of the two points above, we can see that <em>most of the time</em>, even with multiple calls to <code>Advance</code> in the writer, we may end up with a single segment in the reader, with multiple segments happening either at segment boundaries, or where the reader is falling behind the writer, and data is starting to accumulate.</p>
<p>What this achieves <em>just using the default pool</em> is:</p>
<ul>
<li>we don't need to keep allocating every time we call <code>GetMemory()</code> / <code>GetSpan()</code></li>
<li>we don't need a separate array per <code>GetMemory()</code> / <code>GetSpan()</code> - we'll often just get a different range of the same "segment"</li>
<li>a relatively small number of non-trivial buffer arrays are used</li>
<li>they are automatically recycled without needing lots of library code</li>
<li>when not being used, they are available for garbage collection</li>
</ul>
<p>This also explains why the approach of requesting a very small amount in <code>GetMemory()</code> / <code>GetSpan()</code> and then checking the size can be so successful: we have access to the <em>rest of the unused part of the current segment</em>. Meaning: with a segment size of 2048, of which 200 bytes were already used by previous writes - even if we only ask for 5 bytes, we'll probably find we have 1848 bytes available to play with. Or possibly more - remember that obtaining an array from <code>ArrayPool<T>.Shared</code> is <em>also</em> an "at least this big" operation.</p>
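<p>That "at least this big" behaviour is easy to see with the shared array-pool directly:</p>

```csharp
using System;
using System.Buffers;

class Program
{
    static void Main()
    {
        // ask for "at least" 100 bytes; the pool hands back whatever
        // bucket size it keeps, which will be at least that big
        byte[] arr = ArrayPool<byte>.Shared.Rent(100);
        Console.WriteLine(arr.Length >= 100); // True

        // always hand it back when done, or the pool can't reuse it
        ArrayPool<byte>.Shared.Return(arr);
    }
}
```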
<h2><a href="#zero-copy-buffers" aria-hidden="true" class="anchor" id="user-content-zero-copy-buffers"></a>Zero copy buffers</h2>
<p>Something else to notice in this setup is that we get data buffering without <em>any</em> data copying. The writer asked for a buffer, and wrote the data to <em>where it needed to be</em> the first time, on the way in. This then acted as a buffer between the writer and the reader without any need to copy data around. And if the reader couldn't process all the data yet, it was able to push data back into the pipe simply by saying explicitly what it <em>did</em> consume. There was no need to maintain a separate backlog of data for the reader, something that is <em>very</em> common in protocol processing code using <code>Stream</code>.</p>
<p>It is this combination of features that makes the <em>memory</em> aspect of pipeline code so friendly. You could <em>do</em> all of this with <code>Stream</code>, but it is an excruciating amount of error-prone code to do it, and even more if you want to do it <em>well</em> - and you'd pretty much have to implement it separately for each scenario. Pipelines makes good memory handling the default simple path - <a href="https://blog.codinghorror.com/falling-into-the-pit-of-success/" rel="nofollow">the pit of success</a>.</p>
<h2><a href="#more-exotic-memory-pools" aria-hidden="true" class="anchor" id="user-content-more-exotic-memory-pools"></a>More exotic memory pools</h2>
<p>You aren't limited to the memory model discussed; you can implement your own custom memory pool! The advantage of the default pool is that it is simple. In particular, it <em>doesn't really matter</em> if we aren't 100% perfect about returning every segment - if we somehow drop a pipe on the floor, the worst that can happen is that the garbage collector collects the abandoned segments at some point. They won't go back into the pool, but that's fine.</p>
<p>You <em>can</em>, however, do much more interesting things. Imagine, for example, a <code>MemoryPool<byte></code> that takes huge <em>slabs</em> of memory - either managed memory via a number of very large arrays, or unmanaged memory via <code>Marshal.AllocHGlobal</code> (note that <code>Memory<T></code> and <code>Span<T></code> <em>are not limited to arrays</em> - all they require is some kind of contiguous memory), leasing blocks of this larger chunk as required. This has great potential, but it becomes increasingly important to ensure that segments are reliably returned. Most systems shouldn't need this, but it is good that the flexibility is offered.</p>
<h1><a href="#useful-pipes-in-real-systems" aria-hidden="true" class="anchor" id="user-content-useful-pipes-in-real-systems"></a>Useful pipes in real systems</h1>
<p>The example that we used in part 1 was of a single <code>Pipe</code> that was written and read by the same code. That's clearly not a realistic scenario (unless we're trying to mock an "echo" server), so what can we do for more realistic scenarios? First, we need to connect our pipelines to something. We don't usually want a <code>Pipe</code> in isolation; we want a pipe that <em>integrates with a common system or API</em>. So: let's start by seeing what this would look like.</p>
<p>Here we need a bit of a caveat and disclaimer: the pipelines released in .NET Core 2.1 <em>do not include any endpoint implementations</em>. Meaning: the <code>Pipe</code> machinery is there, but nothing is shipped <em>inside the box</em> that actually connects pipes with any other existing systems - like shipping the abstract <code>Stream</code> base-type, but without shipping <code>FileStream</code>, <code>NetworkStream</code>, etc. Yes, that sounds frustrating, but it was a pragmatic reality of time constraints. Don't panic! There are... "lively" conversations going on right now about which bindings to implement with which priority; and there are a few community offerings to bridge the most obvious gaps for today.</p>
<p>Since we find ourselves in that position, we might naturally ask: "what does it take to connect pipelines to another data backend?".</p>
<p>Perhaps a good place to start would be connecting a pipe to a <code>Stream</code>. I know what you're thinking: "Marc, but in part 1 you went out of your way to say how terrible <code>Stream</code> is!". I haven't changed my mind; it isn't necessarily <em>ideal</em> - for any scenario-specific <code>Stream</code> implementation (such as <code>NetworkStream</code> or <code>FileStream</code>) we <em>could</em> have a dedicated pipelines-based endpoint that talked <em>directly</em> to that service with minimal indirection; but it is a <em>useful</em> first step:</p>
<ul>
<li>it gives us immediate access to a <em>huge</em> range of API surfaces - anything that can expose data via <code>Stream</code>, and anything that can act as a middle-layer via wrapped streams (encryption, compression, etc)</li>
<li>it hides all the wrinkly bits of the <code>Stream</code> API behind a clear unambiguous surface</li>
<li>it gives us <em>almost all</em> of the advantages that we have mentioned so far</li>
</ul>
<p>So, let's get started! The first thing we need to think about is: what is the <em>direction</em> here? As previously mentioned, a <code>Stream</code> is ambiguous - and could be read-only, write-only, or read-write. Let's assume we want to deal with the most general case: a read-write stream that acts in a duplex manner - this will give us access to things like sockets (via <code>NetworkStream</code>). This means we're actually going to want <em>two</em> pipes - one for the input, one for the output. Pipelines helps clear this up for us, by declaring an interface expressly for this: <code>IDuplexPipe</code>. This is a very simple interface, and being handed an <code>IDuplexPipe</code> is analogous to being handed the ends of two pipes - one marked "in", one marked "out":</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">interface</span> <span class="pl-en">IDuplexPipe</span>
{
<span class="pl-en">PipeReader</span> <span class="pl-smi">Input</span> { <span class="pl-k">get</span>; }
<span class="pl-en">PipeWriter</span> <span class="pl-smi">Output</span> { <span class="pl-k">get</span>; }
}</pre></div>
<p>What we want to do, then, is create a type that <em>implements</em> <code>IDuplexPipe</code>, but using 2 <code>Pipe</code> instances internally:</p>
<ul>
<li>one <code>Pipe</code> will be the output buffer (from the consumer's perspective), which will be filled by caller-code writing to <code>Output</code> - and we'll have a loop that consumes this <code>Pipe</code> and pushes the data into the underlying <code>Stream</code> (to be written to the network, or whatever the stream does)</li>
<li>one <code>Pipe</code> will be the input buffer (from the consumer's perspective); we'll have a loop that <em>reads</em> data from the underlying <code>Stream</code> (from the network, etc) and pushes it into the <code>Pipe</code>, where it will be drained by caller-code reading from <code>Input</code></li>
</ul>
<p>This approach immediately solves a <em>wide range</em> of problems that commonly affect people using <code>Stream</code>:</p>
<ul>
<li>we now have input/output buffers that decouple stream access from the read/write caller-code, without having to add <code>BufferedStream</code> or similar to prevent packet fragmentation (for the writing code), and to make it very easy to continue receiving more data while we process it (for the reading code especially, so we don't have to keep pausing while we ask for more data)</li>
<li>if the caller-code is writing faster than the stream <code>Write</code> can process, the back-pressure feature will kick in, throttling the caller-code so we don't end up with a huge buffer of unsent data</li>
<li>if the stream <code>Read</code> is out-pacing the caller-code that is <em>consuming</em> the data, the back-pressure will kick in here too, throttling our stream read loop so we don't end up with a huge buffer of unprocessed data</li>
<li>both the read and write implementations benefit from all the memory pool goodness that we discussed above</li>
<li>the caller-code doesn't ever need to worry about backlog of data (incomplete frames), etc - the pipe deals with it</li>
</ul>
<h1><a href="#so-what-might-that-look-like" aria-hidden="true" class="anchor" id="user-content-so-what-might-that-look-like"></a>So what might that look like?</h1>
<p>Essentially, all we need to do, is something like:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">class</span> <span class="pl-en">StreamDuplexPipe</span> : <span class="pl-en">IDuplexPipe</span>
{
<span class="pl-en">Stream</span> <span class="pl-smi">_stream</span>;
<span class="pl-en">Pipe</span> <span class="pl-smi">_readPipe</span>, <span class="pl-smi">_writePipe</span>;
<span class="pl-k">public</span> <span class="pl-en">PipeReader</span> <span class="pl-smi">Input</span> <span class="pl-k">=></span> <span class="pl-smi">_readPipe</span>.<span class="pl-smi">Reader</span>;
<span class="pl-k">public</span> <span class="pl-en">PipeWriter</span> <span class="pl-smi">Output</span> <span class="pl-k">=></span> <span class="pl-smi">_writePipe</span>.<span class="pl-smi">Writer</span>;
<span class="pl-c"><span class="pl-c">//</span> ... more here</span>
}</pre></div>
<p>Note that we have two different pipes; the caller gets one end of each pipe - and our code will act on the <em>other</em> end of each pipe.</p>
<h2><a href="#pumping-the-pipe" aria-hidden="true" class="anchor" id="user-content-pumping-the-pipe"></a>Pumping the pipe</h2>
<p>So what does the code look like to interact with the stream? We need two methods, as discussed above. The first - and simplest - has a loop that reads data from the <code>_stream</code> and pushes it to <code>_readPipe</code>, to be consumed by the calling code; the core of this method could be <em>something like</em>:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">while</span> (<span class="pl-c1">true</span>)
{
<span class="pl-c"><span class="pl-c">//</span> note we'll usually get *much* more than we ask for</span>
<span class="pl-k">var</span> <span class="pl-smi">buffer</span> <span class="pl-k">=</span> <span class="pl-smi">_readPipe</span>.<span class="pl-smi">Writer</span>.<span class="pl-en">GetMemory</span>(<span class="pl-c1">1</span>);
<span class="pl-k">int</span> <span class="pl-smi">bytes</span> <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">_stream</span>.<span class="pl-en">ReadAsync</span>(<span class="pl-smi">buffer</span>);
<span class="pl-smi">_readPipe</span>.<span class="pl-smi">Writer</span>.<span class="pl-en">Advance</span>(<span class="pl-smi">bytes</span>);
<span class="pl-k">if</span> (<span class="pl-smi">bytes</span> <span class="pl-k">==</span> <span class="pl-c1">0</span>) <span class="pl-k">break</span>; <span class="pl-c"><span class="pl-c">//</span> source EOF</span>
<span class="pl-k">var</span> <span class="pl-smi">flush</span> <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">_readPipe</span>.<span class="pl-smi">Writer</span>.<span class="pl-en">FlushAsync</span>();
<span class="pl-k">if</span> (<span class="pl-smi">flush</span>.<span class="pl-smi">IsCompleted</span> <span class="pl-k">||</span> <span class="pl-smi">flush</span>.<span class="pl-smi">IsCanceled</span>) <span class="pl-k">break</span>;
}</pre></div>
<p>This loop asks the pipe for a buffer, then uses the new <code>netcoreapp2.1</code> overload of <code>Stream.ReadAsync</code> that accepts a <code>Memory<byte></code> to populate that buffer - we'll discuss what to do if you don't have an API that takes <code>Memory<byte></code> shortly. When the read is complete, it commits that-many bytes to the pipe using <code>Advance</code>, then it invokes <code>FlushAsync()</code> on the <em>pipe</em> to (if needed) awaken the reader, or pause the write loop while the back-pressure eases. Note we should also check the outcome of the <code>Pipe</code>'s <code>FlushAsync()</code> - it could tell us that the pipe's <em>consumer</em> has signalled that they've finished reading the data they want (<code>IsCompleted</code>), or that the pipe itself was shut down (<code>IsCanceled</code>).</p>
<p>Note that in both cases, we want to ensure that we tell the pipe when this loop has exited - <em>however it exits</em> - so that we don't end up with the calling code awaiting forever on data that will never come. Accidents happen, and sometimes the call to <code>_stream.ReadAsync</code> (or any other method) might throw an exception, so a good way to do this is with a <code>try</code>/<code>finally</code> block:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">Exception</span> <span class="pl-smi">error</span> <span class="pl-k">=</span> <span class="pl-c1">null</span>;
<span class="pl-k">try</span>
{
<span class="pl-c"><span class="pl-c">//</span> our loop from the previous sample</span>
}
<span class="pl-k">catch</span>(<span class="pl-en">Exception</span> <span class="pl-smi">ex</span>) { <span class="pl-smi">error</span> <span class="pl-k">=</span> <span class="pl-smi">ex</span>; }
<span class="pl-k">finally</span> { <span class="pl-smi">_readPipe</span>.<span class="pl-smi">Writer</span>.<span class="pl-en">Complete</span>(<span class="pl-smi">error</span>); }</pre></div>
<p>If you prefer, you could also use two calls to <code>Complete</code> - one at the end of the <code>try</code> (for success) and one inside the <code>catch</code> (for failure).</p>
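<p>It is worth seeing in isolation what passing an exception to <code>Complete</code> actually does: it is how failure is propagated to the other end of the pipe. A minimal standalone example:</p>

```csharp
using System;
using System.IO.Pipelines;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        var pipe = new Pipe();
        // the writer failed; tell the pipe why
        pipe.Writer.Complete(new InvalidOperationException("stream failed"));
        try
        {
            // the reader observes the writer's fault as an exception
            await pipe.Reader.ReadAsync();
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message); // stream failed
        }
    }
}
```

So the calling code awaiting <code>ReadAsync()</code> doesn't just stop getting data - it gets told <em>why</em>, rather than hanging forever.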
<p>The second method we need is a bit more complex; we need a loop that consumes data from <code>_writePipe</code> and pushes it to <code>_stream</code>. The core of this could be something like:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">while</span> (<span class="pl-c1">true</span>)
{
<span class="pl-k">var</span> <span class="pl-smi">read</span> <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">_writePipe</span>.<span class="pl-smi">Reader</span>.<span class="pl-en">ReadAsync</span>();
<span class="pl-k">var</span> <span class="pl-smi">buffer</span> <span class="pl-k">=</span> <span class="pl-smi">read</span>.<span class="pl-smi">Buffer</span>;
<span class="pl-k">if</span> (<span class="pl-smi">read</span>.<span class="pl-smi">IsCanceled</span>) <span class="pl-k">break</span>;
<span class="pl-k">if</span> (<span class="pl-smi">buffer</span>.<span class="pl-smi">IsEmpty</span> <span class="pl-k">&&</span> <span class="pl-smi">read</span>.<span class="pl-smi">IsCompleted</span>) <span class="pl-k">break</span>;
<span class="pl-c"><span class="pl-c">//</span> write everything we got to the stream</span>
<span class="pl-k">foreach</span> (<span class="pl-k">var</span> <span class="pl-smi">segment</span> <span class="pl-k">in</span> <span class="pl-smi">buffer</span>)
{
<span class="pl-k">await</span> <span class="pl-smi">_stream</span>.<span class="pl-en">WriteAsync</span>(<span class="pl-smi">segment</span>);
}
<span class="pl-smi">_writePipe</span>.<span class="pl-smi">Reader</span>.<span class="pl-en">AdvanceTo</span>(<span class="pl-smi">buffer</span>.<span class="pl-smi">End</span>);
<span class="pl-k">await</span> <span class="pl-smi">_stream</span>.<span class="pl-en">FlushAsync</span>();
}</pre></div>
<p>This awaits <em>some</em> data (which could be in multiple buffers), and checks some exit conditions; as before, we can give up if <code>IsCanceled</code>, but the next check is more subtle: we don't want to stop writing just because the <em>producer</em> indicated that they've written everything they wanted to (<code>IsCompleted</code>), or we might not write the last few segments of their data - we need to continue until <em>we've written all their data</em>, hence only exiting when <code>IsCompleted</code> coincides with <code>buffer.IsEmpty</code>. This is simplified in this case because we're always writing everything - we'll see a more complex example shortly. Once we have data, we write each of the non-contiguous buffers to the stream sequentially - because <code>Stream</code> can only write one buffer at a time (again, I'm using the <code>netcoreapp2.1</code> overload here that accepts <code>ReadOnlyMemory<byte></code>, but we aren't restricted to this). Once we have written the buffers, we tell the pipe that we have consumed the data, and flush the underlying <code>Stream</code>.</p>
<p>In "real" code we <em>might</em> want to be a bit more aggressive about deferring flushes of the underlying stream until we know there is no more data readily available, perhaps using the <code>_writePipe.Reader.TryRead(...)</code> method in addition to <code>_writePipe.Reader.ReadAsync()</code>; <code>TryRead</code> works similarly to <code>ReadAsync()</code> but is guaranteed to return synchronously - useful for asking "did the writer append something while I was busy?". But the above illustrates the point.</p>
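<p>If <code>TryRead</code> is new to you, its behaviour is easy to demonstrate in isolation: it fails when nothing is buffered, and succeeds without ever awaiting when data is already there:</p>

```csharp
using System;
using System.IO.Pipelines;

class Program
{
    static void Main()
    {
        var pipe = new Pipe();
        // nothing written yet: TryRead returns false, synchronously
        Console.WriteLine(pipe.Reader.TryRead(out _)); // False
        // write (and flush) a single byte
        pipe.Writer.WriteAsync(new byte[] { 42 }).AsTask().Wait();
        // data is buffered now, so TryRead succeeds without awaiting
        bool ok = pipe.Reader.TryRead(out ReadResult read);
        Console.WriteLine($"{ok},{read.Buffer.Length}"); // True,1
        pipe.Reader.AdvanceTo(read.Buffer.End);
    }
}
```

In the write pump, this lets us keep draining (and skip the <code>Stream.FlushAsync()</code>) while more outbound data is immediately available, only flushing when we're about to <em>actually</em> wait.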
<p>Additionally, like before we would want to add a <code>try</code>/<code>finally</code>, so that we always call <code>_writePipe.Reader.Complete();</code> when we exit.</p>
<p>We can use the <code>PipeScheduler</code> to start these two pumps, which will ensure that they run in the intended context, and our loops start pumping data. We'd have a <em>little</em> more house-keeping to add (we'd probably want a mechanism to <code>Close()</code>/<code>Dispose()</code> the underlying stream, etc) - but as you can see, it doesn't have to be a <em>huge</em> task to connect an <code>IDuplexPipe</code> to a source that wasn't designed with pipelines in mind.</p>
<h1><a href="#heres-one-i-made-earlier" aria-hidden="true" class="anchor" id="user-content-heres-one-i-made-earlier"></a>Here's one I made earlier...</h1>
<p>I've simplified the above a little (not too much, honest) to make it concise for discussion, but you still probably don't want to start copying/pasting chunks from here to try and get it to work. I'm not claiming they are the perfect solution for all situations, but as part of the <a href="https://github.com/StackExchange/StackExchange.Redis/issues/871" rel="nofollow">2.0 work for <code>StackExchange.Redis</code></a>, we have implemented a range of bindings for pipelines that we are making available on nuget - unimaginatively titled <code>Pipelines.Sockets.Unofficial</code> (<a href="https://www.nuget.org/packages/Pipelines.Sockets.Unofficial/" rel="nofollow">nuget</a>, <a href="https://github.com/mgravell/Pipelines.Sockets.Unofficial" rel="nofollow">github</a>); this includes:</p>
<ul>
<li>converting a duplex <code>Stream</code> to an <code>IDuplexPipe</code> (like the above)</li>
<li>converting a read-only <code>Stream</code> to a <code>PipeReader</code></li>
<li>converting a write-only <code>Stream</code> to a <code>PipeWriter</code></li>
<li>converting an <code>IDuplexPipe</code> to a duplex <code>Stream</code></li>
<li>converting a <code>PipeReader</code> to a read-only <code>Stream</code></li>
<li>converting a <code>PipeWriter</code> to a writer-only <code>Stream</code></li>
<li>converting a <code>Socket</code> to an <code>IDuplexPipe</code> directly (without going via <code>NetworkStream</code>)</li>
</ul>
<p>The first six are all available via static methods on <code>StreamConnection</code>; the last is available via <code>SocketConnection</code>.</p>
<p><code>StackExchange.Redis</code> is very involved in <code>Socket</code> work, so we are very interested in how to connect pipelines to sockets; for redis connections without TLS, we can connect our <code>Socket</code> direct to the pipeline:</p>
<ul>
<li><code>Socket</code> ⇔ <code>SocketConnection</code></li>
</ul>
<p>For redis connections <em>with</em> TLS (in particular: cloud redis providers), we can connect the pieces thusly:</p>
<ul>
<li><code>Socket</code> ⇔ <code>NetworkStream</code> ⇔ <code>SslStream</code> ⇔ <code>StreamConnection</code></li>
</ul>
<p>Both of these configurations give us a <code>Socket</code> at one end, and an <code>IDuplexPipe</code> at the other, and it begins to show how we can orchestrate pipelines as part of a more complex system. Perhaps more importantly, it gives us room in the future to <em>change</em> the implementation. As examples of future possibilities:</p>
<ul>
<li>Tim Seaward has been working on <a href="https://github.com/Drawaes/Leto" rel="nofollow"><code>Leto</code></a>, which provides TLS capability as an <code>IDuplexPipe</code> directly, without requiring <code>SslStream</code> (and thus: no stream inverters)</li>
<li>between Tim Seaward, David Fowler and Ben Adams, there are a <em>range</em> of experimental or in-progress network layers directly implementing pipelines without using managed sockets, including "libuv", "RIO" (Registered IO), and most recently, "magma" - which pushes the entire TCP stack into user code to reduce syscalls.</li>
</ul>
<p>It'll be interesting to see how this space develops!</p>
<h2><a href="#but-my-existing-api-doesnt-talk-in-spanbyte-or-memorybyte" aria-hidden="true" class="anchor" id="user-content-but-my-existing-api-doesnt-talk-in-spanbyte-or-memorybyte"></a>But my existing API doesn't talk in <code>Span<byte></code> or <code>Memory<byte></code>!</h2>
<p>When writing code to pump data from a pipe to another system (such as a <code>Socket</code>), it is very likely you'll bump into APIs that don't take <code>Memory<byte></code> or <code>Span<byte></code>. Don't panic, all is not lost! You still have multiple ways of breaking out of that world into something more ... traditional.</p>
<p>The first trick, for when you have a <code>Memory<T></code> or <code>ReadOnlyMemory<T></code>, is <code>MemoryMarshal.TryGetArray(...)</code>. This takes in a <em>memory</em> and attempts to get an <code>ArraySegment<T></code> that describes the same data in terms of a <code>T[]</code> vector and an <code>int</code> offset/count pair. Obviously this can only work if the memory <em>was based on</em> a vector, which is not always the case. So this can fail <em>on exotic memory pools</em>. Our second escape hatch is <code>MemoryMarshal.GetReference(...)</code>. This takes in a <em>span</em> and returns a reference (actually a "managed pointer", aka <code>ref T</code>) to the start of the data. Once we have a <code>ref T</code>, we can use <code>unsafe</code> C# to get an unmanaged pointer to the data, useful for APIs that talk in such:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">Span</span><<span class="pl-k">byte</span>> <span class="pl-smi">span</span> <span class="pl-k">=</span> ...
<span class="pl-en">fixed</span>(<span class="pl-smi">byte</span><span class="pl-k">*</span> <span class="pl-smi">ptr</span> <span class="pl-k">=</span> <span class="pl-k">&</span><span class="pl-smi">MemoryMarshal</span>.<span class="pl-en">GetReference</span>(<span class="pl-smi">span</span>))
{
<span class="pl-c"><span class="pl-c">//</span> ...</span>
}</pre></div>
<p><code>GetReference(...)</code> can still do this if the length of the span is zero, returning a reference to where the zeroth item <em>would have been</em>, and it <em>even works</em> for a <code>default</code> span where there never was any backing memory. This last one requires a slight word of caution because a <code>ref T</code> is <em>not usually expected to be null</em>, but that's exactly what you get here. Essentially, as long as you don't ever try to dereference this kind of null reference: you'll be fine. If you use <code>fixed</code> to convert it to an unmanaged pointer, you get back a null (zero) pointer, which <em>is</em> more expected (and can be useful in some P/Invoke scenarios). <code>MemoryMarshal</code> is <em>essentially</em> synonymous with <code>unsafe</code> code, even if the method you're calling doesn't require the <code>unsafe</code> keyword. It is perfectly valid to use it, but if you use it incorrectly, it reserves the right to hurt you - so just be careful.</p>
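<p>For completeness, here is the first escape hatch - <code>MemoryMarshal.TryGetArray(...)</code> - in action; this one is safe code, and succeeds here because the memory really is array-backed:</p>

```csharp
using System;
using System.Runtime.InteropServices;

class Program
{
    static void Main()
    {
        // an array-backed slice: offset 1, length 3 over a 5-byte array
        ReadOnlyMemory<byte> memory = new byte[] { 1, 2, 3, 4, 5 }.AsMemory(1, 3);
        if (MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
        {
            // the segment exposes the underlying array plus offset/count
            Console.WriteLine($"{segment.Offset},{segment.Count}"); // 1,3
        }
    }
}
```

Remember the failure path: over an exotic (non-array) pool <code>TryGetArray</code> returns <code>false</code>, so code using it needs a fallback - typically copying, or dropping to <code>GetReference</code> as above.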
<h1><a href="#what-about-the-app-code-end-of-the-pipe" aria-hidden="true" class="anchor" id="user-content-what-about-the-app-code-end-of-the-pipe"></a>What about the app-code end of the pipe?</h1>
<p>OK, we've got our <code>IDuplexPipe</code>, and we've seen how to connect the "business end" of both pipes to your backend data service of choice. Now; how do we use it in our app code?</p>
<p>As in our example from part 1, we're going to hand the <code>PipeWriter</code> from <code>IDuplexPipe.Output</code> to our outbound code, and the <code>PipeReader</code> from <code>IDuplexPipe.Input</code> to our inbound code.</p>
<p>The <em>outbound</em> code is typically very simple, and porting from <code>Stream</code>-based code to <code>PipeWriter</code>-based code is usually very direct. The key difference, once again, is that <em>you don't control the buffers</em>. A typical implementation might look something like:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">ValueTask</span><<span class="pl-k">bool</span>> <span class="pl-en">Write</span>(<span class="pl-en">SomeMessageType</span> <span class="pl-smi">message</span>, <span class="pl-en">PipeWriter</span> <span class="pl-smi">writer</span>)
{
<span class="pl-c"><span class="pl-c">//</span> (this may be multiple GetSpan/Advance calls, or a loop,</span>
<span class="pl-c"><span class="pl-c">//</span> depending on what makes sense for the message/protocol)</span>
<span class="pl-k">var</span> <span class="pl-smi">span</span> <span class="pl-k">=</span> <span class="pl-smi">writer</span>.<span class="pl-en">GetSpan</span>(...);
<span class="pl-c"><span class="pl-c">//</span> TODO: ... actually write the message</span>
<span class="pl-k">int</span> <span class="pl-smi">bytesWritten</span> <span class="pl-k">=</span> ... <span class="pl-c"><span class="pl-c">//</span> from writing</span>
<span class="pl-smi">writer</span>.<span class="pl-en">Advance</span>(<span class="pl-smi">bytesWritten</span>);
<span class="pl-k">return</span> <span class="pl-en">FlushAsync</span>(<span class="pl-smi">writer</span>);
}
<span class="pl-k">private</span> <span class="pl-k">static</span> <span class="pl-k">async</span> <span class="pl-en">ValueTask</span><<span class="pl-k">bool</span>> <span class="pl-en">FlushAsync</span>(<span class="pl-en">PipeWriter</span> <span class="pl-smi">writer</span>)
{
<span class="pl-c"><span class="pl-c">//</span> apply back-pressure etc</span>
<span class="pl-k">var</span> <span class="pl-smi">flush</span> <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">writer</span>.<span class="pl-en">FlushAsync</span>();
<span class="pl-c"><span class="pl-c">//</span> tell the calling code whether any more messages</span>
<span class="pl-c"><span class="pl-c">//</span> should be written</span>
<span class="pl-k">return</span> <span class="pl-k">!</span>(<span class="pl-smi">flush</span>.<span class="pl-smi">IsCanceled</span> <span class="pl-k">||</span> <span class="pl-smi">flush</span>.<span class="pl-smi">IsCompleted</span>);
}</pre></div>
<p>The first part of <code>Write</code> is our business code - we do whatever we need to write the data to the buffers from <code>writer</code>; typically this will include multiple calls to <code>GetSpan(...)</code> and <code>Advance()</code>. When we've written our message, we can flush it to ensure the pump is active, and apply back-pressure. For very large messages we <em>could</em> also flush at intermediate points, but for most simple scenarios: flushing once per message is fine.</p>
<p>If you're wondering why I split the <code>FlushAsync</code> code into a separate method: that's because I want to <code>await</code> the result of <code>FlushAsync</code> to check the exit conditions, so it needs to be in an <code>async</code> method. The most efficient way to access memory here is via the <code>Span<byte></code> API, and <code>Span<byte></code> is a <code>ref struct</code> type; as a consequence <a href="https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md" rel="nofollow">we <strong>cannot</strong> use a <code>Span<byte></code> local variable in an <code>async</code> method</a>. A pragmatic solution is to simply split the methods, so one method deals with the <code>Span<byte></code> work, and another method deals with the <code>async</code> aspect.</p>
<h2><a href="#random-aside-async-code-hot-synchronous-paths-and-async-machinery-overhead" aria-hidden="true" class="anchor" id="user-content-random-aside-async-code-hot-synchronous-paths-and-async-machinery-overhead"></a>Random aside: async code, hot synchronous paths, and async machinery overhead</h2>
<p>The machinery involved in <code>async</code> / <code>await</code> is pretty good, but it can still be a surprising amount of stack work - you can see this on <a href="https://sharplab.io/#v2:D4AQDABCCMCsDcBYAUCkBmKAmCB5ArgE4DCA9gCYCmKA3ihAxAA6ECWAbgIYAulU0ANigAOCADVOAG3yUQAgDwAjUqUkA+CADFpAZwAWAQR0BPAHYBjABQAFVk0oB1Nr0IQA7s8qEAlPUZ1kRiCIAHoQiE4mJkljCEVOcwBrAFoWSh0dIj5KbnM/YIguVwAzXT0IAF4oAE53T0IAOm18fSMzK28kQILQ8N5JSQhuPT5zKUlWUwBzCHMKPjcR4a8I01iAW1JCPnX0nU4p9PzgsIh9UnxJcjiF515TY6CQAHYIAEJLUpa9BoBJHWInAslEklGuwGAEC++j+ANI62iOTBnWOAF8UOjUFjTjomCpimCoVshiMIJQAB6cBGglCTFymKTYCC2exOVguWjHDDiKQyOTyZr6ABK6Uu3A0gsMJgslm8lQ0VGKnDFXUxOm4hHw5m4WjKIsykm4nO6DG5ylUEH+gOBoOuFQVlCVKq5mHNgyt8MRvDtDqdhtVQA=" rel="nofollow">sharplab.io</a> - take a look at the generated machinery for the <code>OurCode.FlushAsync</code> method - and the entirety of <code>struct <FlushAsync>d__0</code>. Now, this code is <em>not terrible</em> - it tries hard to avoid allocations in the synchronous path - but it is <em>unnecessary</em>. There are two ways to significantly improve this; one is to not <code>await</code> <em>at all</em>, which is often possible if the <code>await</code> is the last line in a method <strong>and we don't need to process the results</strong>: don't <code>await</code> - just remove the <code>async</code> and <code>return</code> the task - complete or incomplete. We can't do that here, because we need to check the state of the result, but we can optimize for success by checking whether the task <em>is already complete</em> (via <code>.IsCompletedSuccessfully</code> - if it has completed but faulted, we still want to use the <code>await</code> to make sure the exception behaves correctly). If it <em>is</em> successfully completed, we're allowed to access the <code>.Result</code>; so we could <em>also</em> write our <code>FlushAsync</code> method as:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">private</span> <span class="pl-k">static</span> <span class="pl-en">ValueTask</span><<span class="pl-k">bool</span>> <span class="pl-en">Flush</span>(<span class="pl-en">PipeWriter</span> <span class="pl-smi">writer</span>)
{
<span class="pl-smi">bool</span> <span class="pl-en">GetResult</span>(<span class="pl-en">FlushResult</span> <span class="pl-smi">flush</span>)
<span class="pl-c"><span class="pl-c">//</span> tell the calling code whether any more messages</span>
<span class="pl-c"><span class="pl-c">//</span> should be written</span>
<span class="pl-k">=</span><span class="pl-k">></span> <span class="pl-k">!</span>(<span class="pl-smi">flush</span>.<span class="pl-smi">IsCanceled</span> <span class="pl-k">||</span> <span class="pl-smi">flush</span>.<span class="pl-smi">IsCompleted</span>);
<span class="pl-smi">async</span> <span class="pl-smi">ValueTask</span><span class="pl-k"><</span><span class="pl-smi">bool</span><span class="pl-k">></span> <span class="pl-en">Awaited</span>(<span class="pl-en">ValueTask</span><<span class="pl-en">FlushResult</span>> <span class="pl-smi">incomplete</span>)
<span class="pl-k">=</span><span class="pl-k">></span> <span class="pl-en">GetResult</span>(<span class="pl-en">await</span> <span class="pl-smi">incomplete</span>);
<span class="pl-c"><span class="pl-c">//</span> apply back-pressure etc</span>
<span class="pl-k">var</span> <span class="pl-smi">flushTask</span> <span class="pl-k">=</span> <span class="pl-smi">writer</span>.<span class="pl-en">FlushAsync</span>();
<span class="pl-k">return</span> <span class="pl-smi">flushTask</span>.<span class="pl-smi">IsCompletedSuccessfully</span>
<span class="pl-k">?</span> <span class="pl-k">new</span> <span class="pl-en">ValueTask</span><<span class="pl-k">bool</span>>(<span class="pl-en">GetResult</span>(<span class="pl-smi">flushTask</span>.<span class="pl-smi">Result</span>))
<span class="pl-k">:</span> <span class="pl-en">Awaited</span>(<span class="pl-smi">flushTask</span>);
}</pre></div>
<p>This <em>completely avoids</em> the <code>async</code>/<code>await</code> machinery in the most common case: synchronous completion - as we can see again on <a href="https://sharplab.io/#v2:D4AQDABCCMCsDcBYAUCkBmKAmCB5ArgE4DCA9gCYCmKA3ihAxAA6ECWAbgIYAulU0ANggA1TgBt8lEAIA8AI1KkxAPggAxCQGcAFgAoACqyaUA6m16EIAd3OVCASnqM6yRm4gKlEAOKVuAJUpNfDFuXQ18HUDg0IgAMy1tR1d3VIB6NIheMTEs7T4AY3ExVgA7AHMIAoo+K3zufMtOUoBPCABbUkI+dqDNTnKgp1S3DIgdUhDyD1rzXlLhkYYAXlUAQl0EyO0AOgBJTWJmgsoxSmngYHjE/cPSdqYz3nJ7JFQU1JAADhFxSWl5IoVBAAIJWTisZ66UQSKSyCJRIIhbiqMrVB5PSjJJZuVY+PzRZG6EAATggaPujz8WLei1GmU4TEebTknAKAGsALQsPpEPh+Ap0xhcSxbHTSCDLay2Qg7BHaEGaFqlAq6V4oIUMEAAdmu22ktzIGOp5AAyvgCidNJo4iExC1Ne4APwQUqUKy/WEAzwqXS+AJI0KbRIGwmhezYnEMABcoPBkPOwf1AnVHwgAF8UJn3igxpomIo4ud4l08vyAB6cY0oMoWUribAQQzGMwJwi0RYYT3/eGJMMo9SJRXK1X2SWqKhxTjIt7ZzTcQgW7iD7b9jtprs+iAHI4q07FvGT6ehN5uTdA7d3Y3PccQI8zrNAA=" rel="nofollow">sharplab.io</a>. I should emphasize: there's absolutely no point doing this if the code is usually (or exclusively) going to <em>actually be asynchronous</em>; it <em>only</em> helps when the result is usually (or exclusively) going to be available synchronously.</p>
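<p>The same pattern applies to any <code>ValueTask<T></code>-returning call, not just <code>FlushAsync</code>; a stripped-down, self-contained sketch (with a hypothetical <code>TwiceAsync</code> helper) showing the shape:</p>

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static ValueTask<int> TwiceAsync(ValueTask<int> source)
    {
        // synchronous fast-path: no async machinery when the value is ready
        if (source.IsCompletedSuccessfully)
            return new ValueTask<int>(source.Result * 2);
        // incomplete (or faulted): fall back to a genuine await
        return Awaited(source);

        async ValueTask<int> Awaited(ValueTask<int> incomplete)
            => await incomplete * 2;
    }

    static void Main()
    {
        // an already-completed ValueTask takes the synchronous path
        Console.WriteLine(TwiceAsync(new ValueTask<int>(21)).Result); // 42
    }
}
```

Note that the faulted case deliberately goes through <code>Awaited</code> too, so the exception is rethrown with the behaviour <code>await</code> guarantees.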
<h2><a href="#and-what-about-the-reader" aria-hidden="true" class="anchor" id="user-content-and-what-about-the-reader"></a>And what about the reader?</h2>
<p>As we've seen many times, the reader is often slightly more complicated - we can't know that a single "read" operation will contain exactly one inbound message. We may need to loop until we have all the data we need, and we may have <em>additional</em> data that we need to push back. So let's assume we want to consume a single message of some kind:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">async</span> <span class="pl-en">ValueTask</span><<span class="pl-en">SomeMessageType</span>> <span class="pl-en">GetNextMessage</span>(
<span class="pl-en">PipeReader</span> <span class="pl-smi">reader</span>,
<span class="pl-en">CancellationToken</span> <span class="pl-smi">cancellationToken</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>)
{
<span class="pl-k">while</span> (<span class="pl-c1">true</span>)
{
<span class="pl-k">var</span> <span class="pl-smi">read</span> <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">reader</span>.<span class="pl-en">ReadAsync</span>(<span class="pl-smi">cancellationToken</span>);
<span class="pl-k">if</span> (<span class="pl-smi">read</span>.<span class="pl-smi">IsCanceled</span>) <span class="pl-en">ThrowCanceled</span>();
<span class="pl-c"><span class="pl-c">//</span> can we find a complete frame?</span>
<span class="pl-k">var</span> <span class="pl-smi">buffer</span> <span class="pl-k">=</span> <span class="pl-smi">read</span>.<span class="pl-smi">Buffer</span>;
<span class="pl-k">if</span> (<span class="pl-en">TryParseFrame</span>(
<span class="pl-smi">buffer</span>,
<span class="pl-k">out</span> <span class="pl-en">SomeMessageType</span> <span class="pl-smi">nextMessage</span>,
<span class="pl-k">out</span> <span class="pl-en">SequencePosition</span> <span class="pl-smi">consumedTo</span>))
{
<span class="pl-smi">reader</span>.<span class="pl-en">AdvanceTo</span>(<span class="pl-smi">consumedTo</span>);
<span class="pl-k">return</span> <span class="pl-smi">nextMessage</span>;
}
<span class="pl-smi">reader</span>.<span class="pl-en">AdvanceTo</span>(<span class="pl-smi">buffer</span>.<span class="pl-smi">Start</span>, <span class="pl-smi">buffer</span>.<span class="pl-smi">End</span>);
<span class="pl-k">if</span> (<span class="pl-smi">read</span>.<span class="pl-smi">IsCompleted</span>) <span class="pl-en">ThrowEOF</span>();
}
}</pre></div>
<p>Here we obtain <em>some</em> data from the pipe, checking exit conditions like cancellation. Next, we <em>try to find a message</em>; what this means depends on your exact code - this could mean:</p>
<ul>
<li>looking through the buffer for some sentinel value such as an ASCII line-ending, then treating everything up to that point as a message (discarding the line ending)</li>
<li>parsing a well-defined binary frame header, obtaining the payload length, checking that we have that much data, and processing it</li>
<li>or anything else you want!</li>
</ul>
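<p>The sentinel approach is what we'll implement below; for the second option, a hedged sketch of a length-prefixed parser might look like this (the 4-byte big-endian header and all names here are assumptions for illustration, not part of any real protocol):</p>

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

static class LengthPrefixedFrames
{
    // Sketch only: frames are assumed to be a 4-byte big-endian payload
    // length, followed by that many payload bytes.
    public static bool TryParseFrame(
        ReadOnlySequence<byte> buffer,
        out ReadOnlySequence<byte> payload,
        out SequencePosition consumedTo)
    {
        payload = default;
        consumedTo = default;
        if (buffer.Length < 4) return false; // not even a full header yet

        // the header may span segments, so copy it to a local scratch span
        Span<byte> header = stackalloc byte[4];
        buffer.Slice(0, 4).CopyTo(header);
        int length = BinaryPrimitives.ReadInt32BigEndian(header);

        if (buffer.Length < 4 + (long)length) return false; // incomplete payload

        payload = buffer.Slice(4, length);
        consumedTo = payload.End; // consume header + payload
        return true;
    }
}
```

<p>Note how "not enough data yet" is simply a <code>false</code> return - the pipe keeps the bytes, and we'll see them again on the next read.</p>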
<p>If we <em>do</em> manage to find a message, we can tell the pipe to discard the data that we've consumed - by <code>AdvanceTo(consumedTo)</code>, which uses whatever our own frame-parsing code told us that we consumed. If we <em>don't</em> manage to find a message, the first thing to do is tell the pipe that we consumed nothing despite trying to read everything - by <code>reader.AdvanceTo(buffer.Start, buffer.End)</code>. At this point, there are two possibilities:</p>
<ul>
<li>we haven't got enough data <em>yet</em></li>
<li>the pipe is dead and there will <em>never</em> be enough data</li>
</ul>
<p>Our check on <code>read.IsCompleted</code> tests this, reporting failure in the latter case; otherwise we continue the loop, and await more data. What is left, then, is our frame parsing - we've reduced complex IO management down to simple operations; for example, if our messages are separated by line-feed sentinels:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">private</span> <span class="pl-k">static</span> <span class="pl-k">bool</span> <span class="pl-en">TryParseFrame</span>(
<span class="pl-en">ReadOnlySequence</span><<span class="pl-k">byte</span>> <span class="pl-smi">buffer</span>,
<span class="pl-k">out</span> <span class="pl-en">SomeMessageType</span> <span class="pl-smi">nextMessage</span>,
<span class="pl-k">out</span> <span class="pl-en">SequencePosition</span> <span class="pl-smi">consumedTo</span>)
{
<span class="pl-c"><span class="pl-c">//</span> find the end-of-line marker</span>
<span class="pl-k">var</span> <span class="pl-smi">eol</span> <span class="pl-k">=</span> <span class="pl-smi">buffer</span>.<span class="pl-en">PositionOf</span>((<span class="pl-smi">byte</span>)<span class="pl-s">'<span class="pl-cce">\n</span>'</span>);
<span class="pl-k">if</span> (<span class="pl-smi">eol</span> <span class="pl-k">==</span> <span class="pl-c1">null</span>)
{
<span class="pl-smi">nextMessage</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>;
<span class="pl-smi">consumedTo</span> <span class="pl-k">=</span> <span class="pl-smi">default</span>;
<span class="pl-k">return</span> <span class="pl-c1">false</span>;
}
<span class="pl-c"><span class="pl-c">//</span> read past the line-ending</span>
<span class="pl-smi">consumedTo</span> <span class="pl-k">=</span> <span class="pl-smi">buffer</span>.<span class="pl-en">GetPosition</span>(<span class="pl-c1">1</span>, <span class="pl-smi">eol</span>.<span class="pl-smi">Value</span>);
<span class="pl-c"><span class="pl-c">//</span> consume the data</span>
<span class="pl-k">var</span> <span class="pl-smi">payload</span> <span class="pl-k">=</span> <span class="pl-smi">buffer</span>.<span class="pl-en">Slice</span>(<span class="pl-c1">0</span>, <span class="pl-smi">eol</span>.<span class="pl-smi">Value</span>);
<span class="pl-smi">nextMessage</span> <span class="pl-k">=</span> <span class="pl-en">ReadSomeMessageType</span>(<span class="pl-smi">payload</span>);
<span class="pl-k">return</span> <span class="pl-c1">true</span>;
}</pre></div>
<p>Here <code>PositionOf</code> tries to find the first location of a line-feed. If it can't find one, we give up. Otherwise, we set <code>consumedTo</code> to be "the line-feed plus one" (so we consume the line-feed), and we slice our buffer to create a sub-range that represents the payload <em>without</em> the line-feed, which we can then parse however we choose. Finally, we report success, and can rejoice at the simplicity of parsing linux-style line-endings.</p>
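<p><code>ReadSomeMessageType</code> is left undefined above; as an assumed stand-in for illustration, a version that treats a message as plain ASCII text could look like:</p>

```csharp
using System.Buffers;
using System.Text;

static class MessageDecoder
{
    // Hypothetical stand-in for ReadSomeMessageType: here "a message" is
    // just the payload decoded as ASCII. ToArray() linearizes a possibly
    // multi-segment payload - fine for a sketch, although a real parser
    // would usually work segment-by-segment to avoid the copy.
    public static string ReadSomeMessageType(in ReadOnlySequence<byte> payload)
        => Encoding.ASCII.GetString(payload.ToArray());
}
```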
<h2><a href="#whats-the-point-here" aria-hidden="true" class="anchor" id="user-content-whats-the-point-here"></a>What's the point here?</h2>
<p>With minimal code that is <em>very similar to the most naïve and simple <code>Stream</code> version</em> (without any nice features) our app code now has a reader and writer chain that <em>automatically</em> exploits a wide range of capabilities to ensure efficient and effective processing. Again, you <em>can do all these things</em> with <code>Stream</code>, but it is <em>really, really hard</em> to do well and reliably. By pushing all these features into the framework, multiple code-bases can benefit from a single implementation. It also gives future scope for interesting custom pipeline endpoints and decorators that work directly on the pipeline API.</p>
<h1><a href="#summary" aria-hidden="true" class="anchor" id="user-content-summary"></a>Summary</h1>
<p>In this section, we looked at the memory model used by pipelines, and how it helps us avoid allocations. Then we looked at how we might integrate pipelines into existing APIs and systems such as <code>Stream</code> - and we introduced <code>Pipelines.Sockets.Unofficial</code> as an available utility library. We looked at the options available for integrating span/memory code with APIs that don't offer those options, and finally we looked at what the <em>actual calling code</em> might look like when talking to pipelines (taking a brief side step into how to optimize <code>async</code> code that is usually synchronous) - showing what our <em>application</em> code might look like. In the third and final part, we'll look at how we combine all these learning points when looking at a real-world library such as <code>StackExchange.Redis</code> - discussing what complications the code needed to solve, and how pipelines made it simple to do so.</p>
<p><em>Pipe Dreams, part 1 - Marc Gravell, 2018-07-02</em></p>
<h1><a href="#pipelines---a-guided-tour-of-the-new-io-api-in-net-part-1" aria-hidden="true" class="anchor" id="user-content-pipelines---a-guided-tour-of-the-new-io-api-in-net-part-1"></a>Pipelines - a guided tour of the new IO API in .NET, part 1</h1>
<p><a href="https://blog.marcgravell.com/2018/07/pipe-dreams-part-2.html">(part 2 here)</a></p>
<p>About two years ago <a href="https://blog.marcgravell.com/2016/09/channelling-my-inner-geek.html" rel="nofollow">I blogged about an upcoming experimental IO API in the .NET world</a> - at the time provisionally called "Channels"; at the end of May 2018, this finally shipped - under the name <a href="https://www.nuget.org/packages/System.IO.Pipelines/" rel="nofollow"><code>System.IO.Pipelines</code></a>. I am hugely interested in the API, and over the last few weeks I've been <em>consumed</em> with converting <code>StackExchange.Redis</code> to use "pipelines", <a href="https://github.com/StackExchange/StackExchange.Redis/issues/871" rel="nofollow">as part of our 2.0 library update</a>.</p>
<p>My hope in this series, then, is to discuss:</p>
<ul>
<li>what "pipelines" <em>are</em></li>
<li>how to use them in terms of code</li>
<li><em>when</em> you might want to use them</li>
</ul>
<p>To help put this in concrete terms, after introducing "pipelines" I intend to draw heavily on the <code>StackExchange.Redis</code> conversion - and in particular by discussing which problems it solves for us in each scenario. Spoiler: in virtually all cases, the answer can be summarized as:</p>
<blockquote>
<p>It perfectly fits a complex but common stumbling point in IO code; allowing us to replace an ugly kludge, workaround or compromise in <em>our</em> code - with a purpose-designed elegant solution that is in framework code.</p>
</blockquote>
<p>I'm pretty sure that the pain points I'm going to cover below will be familiar to anyone who works at "data protocol" levels, and I'm equally sure that the hacks and messes that we'll be replacing with pipelines will be duplicated in a lot of code-bases.</p>
<h2><a href="#what-do-pipelines-replace--complement" aria-hidden="true" class="anchor" id="user-content-what-do-pipelines-replace--complement"></a>What do pipelines replace / complement?</h2>
<p>The starting point here has to be: <em>what is the closest analogue in existing framework code?</em> And that is simple: <code>Stream</code>. The <code>Stream</code> API will be familiar to anyone who has worked with serialization or data protocols. As an aside: <code>Stream</code> is actually a very ambiguous API - it works <em>very</em> differently in different scenarios:</p>
<ul>
<li>some streams are read-only, some are write-only, some are read-write</li>
<li>the same <em>concrete type</em> can sometimes be read-only, and sometimes write-only (<code>DeflateStream</code>, for example)</li>
<li>when a stream is read-write, sometimes it works like a <a href="https://en.wikipedia.org/wiki/Compact_Cassette">cassette tape</a>, where read and write are operating on the same underlying data (<code>FileStream</code>, <code>MemoryStream</code>); and sometimes it works like two separate streams, where read and write are essentially completely separate streams (<code>NetworkStream</code>, <code>SslStream</code>) - a <em>duplex stream</em></li>
<li>in many of the duplex cases, it is hard or impossible to express "no more data will be arriving, but you should continue to read the data to the end" - there's just <code>Close()</code>, which usually kills both halves of the duplex</li>
<li>sometimes streams are seekable and support concepts like <code>Position</code> and <code>Length</code>; often they're not</li>
<li>because of the progression of APIs over time, there are often multiple ways of performing the same operation - for example, we could use <code>Read</code> (synchronous), <code>BeginRead</code>/<code>EndRead</code> (asynchronous using the <code>IAsyncResult</code> pattern), or <code>ReadAsync</code> (asynchronous using the <code>async</code>/<code>await</code> pattern); calling code has no way <em>in the general case</em> of knowing which of these is the "intended" (optimal) API</li>
<li>if you use either of the asynchronous APIs, it is often unclear what the threading model is; will it always actually be synchronous? if not, what thread will be calling me back? does it use sync-context? thread-pool? IO completion-port threads?</li>
<li>and more recently, there are also extensions to allow <code>Span<byte></code> / <code>Memory<byte></code> to be used in place of <code>byte[]</code> - again, the caller has no way of knowing which is the "preferred" API</li>
<li>the nature of the API <em>encourages</em> copying data; need a buffer? that's a block-copy into another chunk of memory; need a backlog of data you haven't processed yet? block-copy into another chunk of memory; etc</li>
</ul>
<p>So even before we start talking about real-world <code>Stream</code> examples and the problems that happen <em>when using it</em>, it is clear that there are a <em>lot</em> of problems in the <code>Stream</code> API <em>itself</em>. The first unsurprising news, then, is that pipelines sorts this mess out!</p>
<h2><a href="#what-are-pipelines" aria-hidden="true" class="anchor" id="user-content-what-are-pipelines"></a>What are pipelines?</h2>
<p>By "pipelines", I mean a set of 4 key APIs that between them implement decoupled and overlapped reader/writer access to a binary stream (not <code>Stream</code>), including buffer management (pooling, recycling), threading awareness, rich backlog control, and over-fill protection via back-pressure - all based around an API designed around non-contiguous memory. That's a <em>heck</em> of a word salad - but don't worry, I'll be talking about each element to explain what I mean.</p>
<h2><a href="#starting-out-simple-writing-to-and-reading-from-a-single-pipe" aria-hidden="true" class="anchor" id="user-content-starting-out-simple-writing-to-and-reading-from-a-single-pipe"></a>Starting out simple: writing to, and reading from, a single pipe</h2>
<p>Let's start with a <code>Stream</code> analogue, and write something simple to a stream, and read it back - sticking to just the <code>Stream</code> API. We'll use ASCII text so we don't need to worry about any complex encoding concerns, and our read/write code shouldn't assume anything about the underlying stream. We'll just write the data, and then read to the end of the stream to consume it.</p>
<p>We'll do this with <code>Stream</code> first - familiar territory. Then we'll re-implement it with pipelines, to see where the similarities and differences lie. After that, we'll investigate what is actually happening under the hood, so we understand <em>why</em> this is interesting to us!</p>
<p>Also, before you say it: yes, I'm aware of <code>TextReader</code>/<code>TextWriter</code>; I'm not using them intentionally - because I'm trying to talk about the <code>Stream</code> API here, so that the example extends to a wide range of data protocols and scenarios.</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">using</span> (<span class="pl-en">MemoryStream</span> <span class="pl-en">ms</span> <span class="pl-k">=</span> <span class="pl-en">new</span> <span class="pl-en">MemoryStream</span>())
{
<span class="pl-c"><span class="pl-c">//</span> write something</span>
<span class="pl-en">WriteSomeData</span>(<span class="pl-en">ms</span>);
<span class="pl-c"><span class="pl-c">//</span> rewind - MemoryStream works like a tape</span>
<span class="pl-smi">ms</span>.<span class="pl-smi">Position</span> <span class="pl-k">=</span> <span class="pl-c1">0</span>;
<span class="pl-c"><span class="pl-c">//</span> consume it</span>
<span class="pl-en">ReadSomeData</span>(<span class="pl-smi">ms</span>);
}</pre></div>
<p>Now, to write to a <code>Stream</code> the caller needs to obtain and populate a buffer which they then pass to the <code>Stream</code>. We'll keep it simple for now by using the synchronous API and simply allocating a <code>byte[]</code>:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">void</span> <span class="pl-en">WriteSomeData</span>(<span class="pl-en">Stream</span> <span class="pl-smi">stream</span>)
{
<span class="pl-k">byte</span>[] <span class="pl-smi">bytes</span> <span class="pl-k">=</span> <span class="pl-smi">Encoding</span>.<span class="pl-smi">ASCII</span>.<span class="pl-en">GetBytes</span>(<span class="pl-s"><span class="pl-pds">"</span>hello, world!<span class="pl-pds">"</span></span>);
<span class="pl-smi">stream</span>.<span class="pl-en">Write</span>(<span class="pl-smi">bytes</span>, <span class="pl-c1">0</span>, <span class="pl-smi">bytes</span>.<span class="pl-smi">Length</span>);
<span class="pl-smi">stream</span>.<span class="pl-en">Flush</span>();
}
</pre></div>
<p>Note: there are <em>tons</em> of things in the above I could do for efficiency; but that isn't the point yet. So if you're familiar with this type of code and are twitching at the above... don't panic; we'll make it uglier - er, I mean <em>more efficient</em> - later.</p>
<p>The <em>reading</em> code is typically more complex than the writing code, because the reading code can't assume that it will get everything in a single call to <code>Read</code>. A read operation on a <code>Stream</code> can return nothing (which indicates the end of the data), or it could fill our buffer, or it could return a single byte despite being offered a huge buffer. So read code on a <code>Stream</code> is <em>almost always</em> a loop:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">void</span> <span class="pl-en">ReadSomeData</span>(<span class="pl-en">Stream</span> <span class="pl-smi">stream</span>)
{
<span class="pl-k">int</span> <span class="pl-smi">bytesRead</span>;
<span class="pl-c"><span class="pl-c">//</span> note that the caller usually can't know much about</span>
<span class="pl-c"><span class="pl-c">//</span> the size; .Length is not usually usable</span>
<span class="pl-k">byte</span>[] <span class="pl-smi">buffer</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-k">byte</span>[<span class="pl-c1">256</span>];
<span class="pl-k">do</span>
{
<span class="pl-smi">bytesRead</span> <span class="pl-k">=</span> <span class="pl-smi">stream</span>.<span class="pl-en">Read</span>(<span class="pl-smi">buffer</span>, <span class="pl-c1">0</span>, <span class="pl-smi">buffer</span>.<span class="pl-smi">Length</span>);
<span class="pl-k">if</span> (<span class="pl-smi">bytesRead</span> <span class="pl-k">></span> <span class="pl-c1">0</span>)
{ <span class="pl-c"><span class="pl-c">//</span> note this only works for single-byte encodings</span>
<span class="pl-k">string</span> <span class="pl-smi">s</span> <span class="pl-k">=</span> <span class="pl-smi">Encoding</span>.<span class="pl-smi">ASCII</span>.<span class="pl-en">GetString</span>(
<span class="pl-smi">buffer</span>, <span class="pl-c1">0</span>, <span class="pl-smi">bytesRead</span>);
<span class="pl-smi">Console</span>.<span class="pl-en">Write</span>(<span class="pl-smi">s</span>);
}
} <span class="pl-k">while</span> (<span class="pl-smi">bytesRead</span> <span class="pl-k">></span> <span class="pl-c1">0</span>);
}</pre></div>
<p>Now let's translate that to pipelines. A <code>Pipe</code> is broadly comparable to a <code>MemoryStream</code>, except instead of being able to rewind it many times, the data is more simply a "first in first out" queue. We have a <em>writer</em> API that can push data in at one end, and a <em>reader</em> API that can pull the data out at the other. The <code>Pipe</code> is the buffer that sits between the two. Let's reproduce our previous scenario, but using a single <code>Pipe</code> instead of the <code>MemoryStream</code> (again not something we'd usually do in practice, but it is simple to illustrate):</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-en">Pipe</span> <span class="pl-smi">pipe</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-en">Pipe</span>();
<span class="pl-c"><span class="pl-c">//</span> write something</span>
<span class="pl-en">await</span> <span class="pl-en">WriteSomeDataAsync</span>(pipe.Writer);
<span class="pl-c"><span class="pl-c">//</span> signal that there won't be anything else written</span>
<span class="pl-smi">pipe</span>.<span class="pl-smi">Writer</span>.<span class="pl-en">Complete</span>();
<span class="pl-c"><span class="pl-c">//</span> consume it</span>
<span class="pl-en">await</span> <span class="pl-en">ReadSomeDataAsync</span>(pipe.Reader);</pre></div>
<p>First we create a pipe using the default options, then we write to it. Note that IO operations on pipes are usually asynchronous, so we'll need to <code>await</code> our two helper methods. Note also that we don't pass the <code>Pipe</code> to them - unlike <code>Stream</code>, pipelines have separate API surfaces for read and write operations, so we pass a <code>PipeWriter</code> to the helper method that does our writing, and a <code>PipeReader</code> to the helper method that does our reading. After writing the data, we call <code>Complete()</code> on the <code>PipeWriter</code>. We didn't have to do this with the <code>MemoryStream</code> because it automatically <a href="https://en.wikipedia.org/wiki/End-of-file" rel="nofollow">EOFs</a> when it reaches the end of the buffered data - but on some other <code>Stream</code> implementations - especially one-way streams - we might have had to call <code>Close</code> after writing the data.</p>
<p>OK, so what does <code>WriteSomeDataAsync</code> look like? Note, I've deliberately over-annotated here:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">async</span> <span class="pl-en">ValueTask</span> <span class="pl-en">WriteSomeDataAsync</span>(<span class="pl-en">PipeWriter</span> <span class="pl-smi">writer</span>)
{
<span class="pl-c"><span class="pl-c">//</span> use an oversized size guess</span>
<span class="pl-en">Memory</span><<span class="pl-k">byte</span>> <span class="pl-smi">workspace</span> <span class="pl-k">=</span> <span class="pl-smi">writer</span>.<span class="pl-en">GetMemory</span>(<span class="pl-c1">20</span>);
<span class="pl-c"><span class="pl-c">//</span> write the data to the workspace</span>
<span class="pl-k">int</span> <span class="pl-smi">bytes</span> <span class="pl-k">=</span> <span class="pl-smi">Encoding</span>.<span class="pl-smi">ASCII</span>.<span class="pl-en">GetBytes</span>(
<span class="pl-s"><span class="pl-pds">"</span>hello, world!<span class="pl-pds">"</span></span>, <span class="pl-smi">workspace</span>.<span class="pl-smi">Span</span>);
<span class="pl-c"><span class="pl-c">//</span> tell the pipe how much of the workspace</span>
<span class="pl-c"><span class="pl-c">//</span> we actually want to commit</span>
<span class="pl-smi">writer</span>.<span class="pl-en">Advance</span>(<span class="pl-smi">bytes</span>);
<span class="pl-c"><span class="pl-c">//</span> this is **not** the same as Stream.Flush!</span>
<span class="pl-k">await</span> <span class="pl-smi">writer</span>.<span class="pl-en">FlushAsync</span>();
}</pre></div>
<p>The first thing to note is that when dealing with pipelines: <em>you don't control the buffers</em>: the <code>Pipe</code> does. Recall how in our <code>Stream</code> code, both the read and write code created a local <code>byte[]</code>, but we don't have that here. Instead, we ask the <code>Pipe</code> for a buffer (<code>workspace</code>), via the <code>GetMemory</code> method (or its twin - <code>GetSpan</code>). As you might expect from the name, this gives us either a <code>Memory<byte></code> or a <code>Span<byte></code> - of size <em>at least</em> twenty bytes.</p>
<p>Having obtained this buffer, we encode our <code>string</code> into it. This means that we're writing directly into the pipe's memory, and keep track of how many bytes we <em>actually used</em>, so we can tell it in <code>Advance</code>. We are under no obligation to use the twenty that we asked for: we could write zero, one, twenty, or even fifty bytes. The last one may seem surprising, but it is actually actively encouraged! The emphasis previously was on "<em>at least</em>" - the writer can actually give us a <em>much bigger</em> buffer than we ask for. When dealing with larger data, it is common to make modest requests but expect greatness: ask for the <em>minimum we can usefully utilize</em>, but then check the size of the memory/span that it gives us before deciding how much to actually write.</p>
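<p>To make "ask for the minimum, use what you're given" concrete, here's a hedged sketch (helper name assumed) that writes a large payload in chunks, sizing each chunk by what the writer actually handed back rather than what was requested:</p>

```csharp
using System;
using System.IO.Pipelines;
using System.Threading.Tasks;

static class ChunkedWriter
{
    public static async ValueTask WriteAsync(
        PipeWriter writer, ReadOnlyMemory<byte> payload)
    {
        while (!payload.IsEmpty)
        {
            // request a modest minimum...
            Memory<byte> workspace = writer.GetMemory(512);
            // ...but fill however much we were actually given
            int take = Math.Min(workspace.Length, payload.Length);
            payload.Slice(0, take).CopyTo(workspace);
            writer.Advance(take);
            payload = payload.Slice(take);
        }
        await writer.FlushAsync();
    }
}
```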
<p>The call to <code>Advance</code> is important; this completes a single write operation, making the data available in the pipe to be consumed by a reader. The call to <code>FlushAsync</code> is <em>equally important</em>, but much more nuanced. However, before we can adequately describe what it does, we need to take a look at the reader. So; here's our <code>ReadSomeDataAsync</code> method:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-k">async</span> <span class="pl-en">ValueTask</span> <span class="pl-en">ReadSomeDataAsync</span>(<span class="pl-en">PipeReader</span> <span class="pl-smi">reader</span>)
{
<span class="pl-k">while</span> (<span class="pl-c1">true</span>)
{
<span class="pl-c"><span class="pl-c">//</span> await some data being available</span>
<span class="pl-en">ReadResult</span> <span class="pl-smi">read</span> <span class="pl-k">=</span> <span class="pl-k">await</span> <span class="pl-smi">reader</span>.<span class="pl-en">ReadAsync</span>();
<span class="pl-en">ReadOnlySequence</span><<span class="pl-k">byte</span>> <span class="pl-smi">buffer</span> <span class="pl-k">=</span> <span class="pl-smi">read</span>.<span class="pl-smi">Buffer</span>;
<span class="pl-c"><span class="pl-c">//</span> check whether we've reached the end</span>
<span class="pl-c"><span class="pl-c">//</span> and processed everything</span>
<span class="pl-k">if</span> (<span class="pl-smi">buffer</span>.<span class="pl-smi">IsEmpty</span> <span class="pl-k">&&</span> <span class="pl-smi">read</span>.<span class="pl-smi">IsCompleted</span>)
<span class="pl-k">break</span>; <span class="pl-c"><span class="pl-c">//</span> exit loop</span>
<span class="pl-c"><span class="pl-c">//</span> process what we received</span>
<span class="pl-k">foreach</span> (<span class="pl-en">Memory</span><<span class="pl-k">byte</span>> <span class="pl-smi">segment</span> <span class="pl-k">in</span> <span class="pl-smi">buffer</span>)
{
<span class="pl-k">string</span> <span class="pl-smi">s</span> <span class="pl-k">=</span> <span class="pl-smi">Encoding</span>.<span class="pl-smi">ASCII</span>.<span class="pl-en">GetString</span>(
<span class="pl-smi">segment</span>.<span class="pl-smi">Span</span>);
<span class="pl-smi">Console</span>.<span class="pl-en">Write</span>(<span class="pl-smi">s</span>);
}
<span class="pl-c"><span class="pl-c">//</span> tell the pipe that we used everything</span>
<span class="pl-smi">reader</span>.<span class="pl-en">AdvanceTo</span>(<span class="pl-smi">buffer</span>.<span class="pl-smi">End</span>);
}
}</pre></div>
<p>Just like with the <code>Stream</code> example, we have a loop that continues until we've reached the end of the data. With <code>Stream</code>, that is defined as being when <code>Read</code> returns a non-positive result, but with pipelines there are two things to check:</p>
<ul>
<li><code>read.IsCompleted</code> tells us whether the write pipe has been signalled as completed and therefore no more data will be written (<code>pipe.Writer.Complete();</code> in our earlier code did this)</li>
<li><code>buffer.IsEmpty</code> tells us whether there is any data left to process <em>in this iteration</em></li>
</ul>
<p>If there's nothing in the pipe now <em>and</em> the writer has been completed, then there will <strong>never</strong> be anything in the pipe, and we can exit.</p>
<p>If we <em>do</em> have data, then we can look at <code>buffer</code>. So first - let's talk about <code>buffer</code>; in the code it is a <code>ReadOnlySequence<byte></code>, which is a new type - this concept combines a few roles:</p>
<ul>
<li>describing non-contiguous memory, specifically a sequence of zero, one or many <code>ReadOnlyMemory<byte></code> chunks</li>
<li>describing a logical position (<code>SequencePosition</code>) in such a data-stream - in particular via <code>buffer.Start</code> and <code>buffer.End</code></li>
</ul>
<p>The <em>non-contiguous</em> part is very important here. We'll look at where the data is actually going shortly, but in terms of reading: we need to be prepared to handle data that <em>could</em> be spread across multiple segments. In this case, we do this by a simple <code>foreach</code> over the <code>buffer</code>, decoding each segment in turn. Note that even though the API is designed to be able to describe multiple non-contiguous buffers, it is frequently the case that the data received is contiguous in a single buffer; and in that case, it is often possible to write an optimized implementation for a single buffer. You can do that by checking <code>buffer.IsSingleSegment</code> and accessing <code>buffer.First</code>.</p>
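<p>A hedged sketch of that single-segment optimization, using a trivial byte-sum as stand-in work:</p>

```csharp
using System;
using System.Buffers;

static class SequenceSum
{
    public static long Sum(in ReadOnlySequence<byte> buffer)
    {
        // fast path: the common case is one contiguous chunk,
        // so we can skip the segment enumerator entirely
        if (buffer.IsSingleSegment)
            return SumSpan(buffer.First.Span);

        long total = 0;
        foreach (ReadOnlyMemory<byte> segment in buffer)
            total += SumSpan(segment.Span);
        return total;
    }

    private static long SumSpan(ReadOnlySpan<byte> span)
    {
        long total = 0;
        foreach (byte b in span) total += b;
        return total;
    }
}
```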
<p>Finally, we call <code>AdvanceTo</code>, which tells the pipe <em>how much data we actually used</em>.</p>
<h1><a href="#key-point-you-dont-need-to-take-everything-you-are-given" aria-hidden="true" class="anchor" id="user-content-key-point-you-dont-need-to-take-everything-you-are-given"></a>Key point: <em>you don't need to take everything you are given!</em></h1>
<p>Contrast to <code>Stream</code>: when you call <code>Read</code> on a <code>Stream</code>, it puts data into the buffer you gave it. In most real-world scenarios, it isn't always possible to consume all the data yet - maybe it only makes sense to consider "commands" as "entire text lines", and you haven't yet seen a <code>cr</code>/<code>lf</code> in the data. With <code>Stream</code>: this is tough - once you've been given the data, it is your problem; if you can't use it yet, you need to store the backlog somewhere. However, with pipelines, <em>you can tell it</em> what you've consumed. In our case, we're telling it that we consumed everything we were given, which we do by passing <code>buffer.End</code> to <code>AdvanceTo</code>. That means we'll never see that data again, just like with <code>Stream</code>. However, we could also have passed <code>buffer.Start</code>, which would mean "we didn't use anything" - and <em>even though we had chance to inspect the data</em>, it would remain in the pipe for subsequent reads. We can also get arbitrary <code>SequencePosition</code> values inside the buffer - if we read 20 bytes, for example - so we have full control over how much data is dropped from the pipe. There are two ways of getting a <code>SequencePosition</code>:</p>
<ul>
<li>you can <code>Slice(...)</code> a <code>ReadOnlySequence<byte></code> in the same way that you <code>Slice(...)</code> a <code>Span<T></code> or <code>Memory<T></code> - and access the <code>.Start</code> or <code>.End</code> of the resulting sub-range</li>
<li>you can use the <code>.GetPosition(...)</code> method of the <code>ReadOnlySequence<byte></code>, which returns a relative position <em>without</em> actually slicing</li>
</ul>
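<p>For example (purely illustrative - the 20-byte offset is arbitrary), both approaches yield the same position:</p>
<div class="highlight highlight-source-cs"><pre>// a position 20 bytes into the buffer, obtained two equivalent ways:
SequencePosition viaSlice = buffer.Slice(20).Start;       // slice, then take .Start
SequencePosition viaGetPosition = buffer.GetPosition(20); // no intermediate slice</pre></div>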
<p>Even more subtle: we can tell it separately that we <em>consumed</em> some amount, but that we <em>inspected</em> a different amount. The most common example here is to express "you can drop <em>this much</em> - I'm done with that; but I looked at everything, I can't make any more progress at the moment - I need more data" - specifically:</p>
<div class="highlight highlight-source-cs"><pre><span class="pl-smi">reader</span>.<span class="pl-en">AdvanceTo</span>(<span class="pl-smi">consumedToPosition</span>, <span class="pl-smi">buffer</span>.<span class="pl-smi">End</span>);</pre></div>
<p>This is where the subtle interplay of <code>PipeWriter.FlushAsync()</code> and <code>PipeReader.ReadAsync()</code> starts to come into play. I skipped over <code>FlushAsync</code> earlier, but it actually serves two different functions in one call:</p>
<ul>
<li>if there is a <code>ReadAsync</code> call that is outstanding because it needs data, then it <em>awakens</em> the reader, allowing the read loop to continue</li>
<li>if the writer is out-pacing the reader, such that the pipe is filling up with data that isn't being cleared by the reader, it can <em>suspend</em> the writer (by not completing synchronously) - to be reactivated when there is more space in the pipe (the thresholds for writer suspend/resume can be optionally specified when creating the <code>Pipe</code> instance)</li>
</ul>
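<p>Those thresholds are supplied via <code>PipeOptions</code> when constructing the pipe; a minimal sketch (the numbers here are purely illustrative):</p>
<div class="highlight highlight-source-cs"><pre>// suspend the writer once 512KiB is buffered; resume it once the
// reader has drained the backlog below 256KiB
var pipe = new Pipe(new PipeOptions(
    pauseWriterThreshold: 512 * 1024,
    resumeWriterThreshold: 256 * 1024));</pre></div>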
<p>Obviously these concepts don't come into play in our example, but they are central ideas to how pipelines works. The ability to push data back into the pipe <em>hugely</em> simplifies a vast range of IO scenarios. Virtually every piece of protocol handling code I've seen before pipelines has <em>masses</em> of code related to handling the backlog of incomplete data - it is such a repeated piece of logic that I am <em>incredibly</em> happy to see it handled well in a framework library instead.</p>
<h2><a href="#what-does-awaken-or-reactivate-mean-here" aria-hidden="true" class="anchor" id="user-content-what-does-awaken-or-reactivate-mean-here"></a>What does "awaken" or "reactivate" mean here?</h2>
<p>You might have observed that I didn't really define what I meant here. At the <em>obvious</em> level, I mean that: an <code>await</code> operation of <code>ReadAsync</code> or <code>FlushAsync</code> had previously returned as incomplete, so now the asynchronous continuation gets invoked, allowing our <code>async</code> method to resume execution. Yeah, OK, but that's just re-stating what <code>async</code>/<code>await</code> <em>mean</em>. It is a bug-bear of mine that I care <em>deeply</em> (really, it is alarming how deep) about which threads code runs on - for reasons that I'll talk about later in this series. So saying "the asynchronous continuation gets invoked" <em>isn't enough for me</em>. I want to understand <em>who is invoking it</em>, in terms of threads. The most common answers to this are:</p>
<ul>
<li>it delegates via the <code>SynchronizationContext</code> (note: many systems <em>do not have</em> a <code>SynchronizationContext</code>)</li>
<li>the thread that <em>triggered the state change</em> gets used, at the point of the state change, to invoke the continuation</li>
<li>the global thread-pool is used to invoke the continuation</li>
</ul>
<p>All of these can be fine in some cases, and all of these can be terrible in some cases! Sync-context is a well-established mechanism for getting from worker threads back to primary application threads (especially: the UI thread in desktop applications). However, it isn't necessarily the case that just because we've finished one IO operation, we're ready to jump back to an application thread; and doing so can effectively push a lot of IO code and data processing code <em>onto</em> an application thread - usually the one thing we explicitly want to avoid. Additionally, it can be prone to deadlocks if the application code has used <code>Wait()</code> or <code>.Result</code> on an asynchronous call (which, to be fair, you're not meant to do). The second option (performing the callback "inline" on the thread that triggered it) can be problematic because it can steal a thread that you expected to be doing something else (and can lead to deadlocks as a consequence); and in some extreme cases it can lead to a stack-dive (and eventually a stack-overflow) when two asynchronous methods are essentially functioning as co-routines. The final option (global thread-pool) is immune to the problems of the other two - but can run into severe problems under some load conditions - something again that I'll discuss in a later part in this series.</p>
<p>However, the good news is that <em>pipelines gives you control here</em>. When creating the <code>Pipe</code> instance, we can supply <code>PipeScheduler</code> instances to use for the reader and writer (separately). The <code>PipeScheduler</code> is used to perform these activations. If not specified, then it defaults <em>first</em> to checking for <code>SynchronizationContext</code>, then using the global thread-pool, with "inline" continuations (i.e. intentionally using the thread that caused the state change) as another option readily available. But: <em>you can provide your own implementation</em> of a <code>PipeScheduler</code>, giving you full control of the threading model.</p>
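<p>As a sketch, forcing both sets of continuations onto the global thread-pool - and away from any ambient <code>SynchronizationContext</code> - looks like:</p>
<div class="highlight highlight-source-cs"><pre>var pipe = new Pipe(new PipeOptions(
    readerScheduler: PipeScheduler.ThreadPool,
    writerScheduler: PipeScheduler.ThreadPool,
    useSynchronizationContext: false));</pre></div>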
<h2><a href="#summary" aria-hidden="true" class="anchor" id="user-content-summary"></a>Summary</h2>
<p>So: we've looked at what a <code>Pipe</code> is when considered individually, and how we can write to a pipe with a <code>PipeWriter</code>, and read from a pipe with a <code>PipeReader</code> - and how to "advance" both reader and writer. We've looked at the similarity and differences with <code>Stream</code>, and we've discussed how <code>ReadAsync()</code> and <code>FlushAsync()</code> can interact to control how the writer and reader pieces execute. We looked at how responsibility for buffers is reversed, with the pipe providing all buffers - and how the pipe can simplify backlog management. Finally, we discussed the threading model that is active for continuations in the <code>await</code> operations.</p>
<p>That's probably enough for step 1; next, we'll look at how the <em>memory model</em> for pipelines works - i.e. where does the data live. We'll also look at <em>how we can use pipelines in real scenarios</em> to start doing interesting things.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-63683265453471377252018-04-12T06:42:00.001-07:002018-04-17T01:10:55.914-07:00Having a serious conversation about open source<h2 id="havingaseriousconversationaboutopensource">Having a serious conversation about open source</h2>
<p>A certain topic has been surfacing a lot lately on places like twitter; one that I've quietly tried to ignore, but which has been gnawing away at me slowly. It is seemingly one of the most taboo, dirty and avoided topics.</p>
<p>Open source and money</p>
<p>See, I said it was taboo and dirty.</p>
<p>Talking openly about money is always hard, but when you combine money and open source, it very quickly devolves into a metaphorical warzone, complete with entrenched camps, propaganda, etc.</p>
<p>This is a complex area, and if I mis-speak I hope that you'll afford me some generosity in interpreting my words as meant constructively and positively. This is largely a bit of a brain-dump; I don't intend it to be too ranty or preachy, but I'm probably not the best judge of that.</p>
<p>I absolutely love open source and the open source community. I love sharing ideas, being challenged by requirements and perspectives outside of my own horizon, benefitting from the contributions and wisdom of like-minded folks etc. I love that packages I've created (usually originally because I needed to solve a problem that vexed me) have been downloaded and used to help people tens of millions of times - that's a great feeling! I love that I have benefitted indirectly from community recognition (including things like an MVP award), and professionally (I doubt I'd have got my job at Stack Overflow if the team hadn't been using protobuf-net from a very early date).</p>
<p>But: the consumers of open source (and I <strong>very much</strong> include myself in this) have become... for want of a better word: entitled. We've essentially reinforced that software is free (edit: I mean in the "beer" sense, not in the "Stallman" sense). That our efforts - as an open source community: have no value beyond the occasional pat on a back. Perhaps worse, it undermines the efforts of those developers trying to earn an honest buck by offering professional products in the same area... or maybe it just forces them to offer very clear advantages and extra features, which perhaps is a good thing for them? Or is that just me trying to suppress a sense of guilt at cutting off someone else's customer base?</p>
<p>Yes it is true that some open source projects get great community backing from companies benefitting from the technology, but that seems to be the minority. Most open source projects... just don't.</p>
<ul>
<li>maybe this is because the project isn't popular</li>
<li>maybe it is popular, but not in a way that helps "enterprise" customers (the people most likely to pay)</li>
<li>maybe the project team simply haven't made a pay option available, which could be lack of confidence, or lack or know-how, or legal issues - or it could be the <em>expectation</em> that the software is completely free</li>
<li>maybe people like it, but not enough to pay anything towards it</li>
<li>maybe anything other than the "free to use and open licensing" is massively disruptive to the dominant tool-chain for accessing libraries in a particular ecosystem (npm, nuget, etc)</li>
<li>maybe with multiple contributors of different-sized efforts, it would become massively confusing as to who <em>receives</em> what money, if any was made</li>
<li>(added) maybe your daytime employment contract <em>prohibits</em> taking payment for additional work</li>
</ul>
<p>But whatever the reason; most open source libraries don't get financially supported. Sometimes this might not be a problem; maybe a library is sponsored internally by a company that has then <em>made</em> that software available for other people to benefit from. But: the moment a library hits the public, it has to deal with all the scenarios and problems and edge cases that the originating company <em>didn't</em> need to worry about. For example, you'd be <em>amazed</em> at how much trouble SQLite causes dapper due to the data types, or how much complexity things like "sentinel", "cluster" and "modules" make for redis clients. But: the originating company (Stack Overflow in the case of dapper, etc) <em>doesn't use</em> those features, so they don't get fixed on company time. This is a recurring theme in many such projects - and now you're in an even more complex place where the people maintaining and advancing something are doing a lot of that work on their own time, but it is now <em>even more awkward</em> to ask the simple question: "this thing that I'm doing to benefit real users: am I getting paid for this? can I even accept contributions other than PRs?".</p>
<h2 id="isthisaproblem">Is this a problem?</h2>
<p>Perhaps, perhaps not. I'm certainly not bitter; I love working on this stuff - I do it for hobby and pleasure reasons too (I love solving challenging problems), and it has <em>hugely</em> advanced my knowledge, but I have to be honest and admit that there's a piece of me that thinks an opportunity has been missed. Take protobuf-net: if I sat down and added up the hours that I've spent on that, it would be horrifying. And I know people are succeeding with it, and using it for commercial gain - I get the support emails from people using it in incredibly interesting ways.</p>
<p>Quite a while back I tried adding small and discreet contribution options for protobuf-net (think: "buy me a beer"). It wasn't entirely unsuccessful: to date I've received a little over (edit: incorrectly stated USD) GBP 100 in direct contributions; most of that was in one very much appreciated donation - that I can't find the details of because "pledgie" closed down. But overall, almost all of the work done has been completely for free. Again, I don't <em>resent this</em>, but it feels that there's a huge imbalance in terms of who is doing the work, versus who is <em>benefiting</em> from the work. There is very little motivation for companies benefiting from open source to contribute back to it - even when they're using it in commercial ways that are helping them create profits from successful products or services.</p>
<p>In my view, this is just as bad for the <em>consuming company</em> as it is for the author: if the developer isn't motivated to improve and maintain a library that you depend on for your successful product, then: that sounds a lot like a supply-chain risk. But then, I guess you can just move onto the next competing free tool if one author burns out.</p>
<p>I'm not sure I have a <em>solution</em> here, but I <em>do</em> think there's a very real conversation that we shouldn't be afraid of having, about how we - as an industry - avoid open source being treated simply as free contractors.</p>
<h2 id="imtoyingwithideas">I'm toying with ideas</h2>
<p>For protobuf-net, I'm aware that a good number of my users are doing things like games, which tend to run on limited runtimes that don't tend to have runtime-emit support (they are "AOT" runtimes). This really hurts the performance of reflection-based libraries. I've been toying <em>for ages</em> with new ideas to make protobuf-net work much better on those platforms by having much richer build tooling that does all of the emit code up-front as part of the build, and <em>one of the ideas I'm playing with</em> is to:</p>
<ul>
<li>keep the core protobuf-net runtime library "as is" (and continue to make it available for free)</li>
<li>add an additional separate package that adds the AOT features</li>
<li>but make this package dual-licensed: GPL, or purchase a non-GPL license</li>
</ul>
<p>But I'm very very unsure about this. Philosophically, I kinda hate the GPL. I just do. But the stickiness of the GPL might be the thing that actually gets <em>some</em> customers - the ones who care about compliance - to pay a little for it. It doesn't have to be much; just enough to make it feel justified to spend such a vast amount of time developing these complex features. As for the people that don't care about compliance: they weren't going to pay anything <em>anyway</em>, so frankly it isn't worth worrying about what they do.</p>
<p>Is this a terrible idea? Is this just me exploiting the fact that I know some users have AOT requirements? Am I just being greedy in my middle-age? Is this just going to make a nightmare of accounting and legal problems? Am I just being grumpy? Should I just accept that open source is free work? I genuinely don't know, and I haven't made my mind up on what to do. I'm genuinely interested in what people think though; <strike>comments should be open below</strike> (edit: comments <em>were</em> open, but... blog spam, so much blog spam; maybe <a href="https://twitter.com/marcgravell">tweet me @marcgravell</a>).</p>
Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-9353611261862925022018-01-30T05:07:00.001-08:002018-01-31T02:24:18.187-08:00Sorting myself out, extreme edition<h3 id="sorting-myself-out-extreme-edition">...where I go silly with optimization</h3>
<p>Yes, I’m still talking about sorting... ish. I love going <em>deep</em> on problems - it isn’t enough to just have <em>a solution</em> - I like knowing that I have the <em>best solution that I am capable of</em>. This isn’t always possible in a business setting - deadlines etc - but this is my own time.</p>
<p><a href="http://blog.marcgravell.com/2018/01/a-sort-of-problem.html">In part 1</a>, we introduced the scenario - and discussed how to build a composite unsigned value that we could use to sort multiple properties as a single key.</p>
<p><a href="http://blog.marcgravell.com/2018/01/more-of-sort-of-problem.html">In part 2</a>, we looked a little at radix sort, and found that it was a very compelling candidate.</p>
<p>In this episode, we’re going to look at some ways to significantly improve what we’ve done so far. In particular, we’ll look at:</p>
<ul>
<li>using knowledge of how signed data works to avoid having to transform between them</li>
<li>performing operations in blocks rather than per value to reduce calls</li>
<li>using <code class="highlighter-rouge">Span<T></code> as a replacement for <code class="highlighter-rouge">unsafe</code> code and unmanaged pointers, allowing you to get very high performance even in 100% managed/safe code</li>
<li>investigating branch removal as a performance optimization of critical loops</li>
<li>vectorizing critical loops to do the same work with significantly fewer CPU operations</li>
</ul>
<p>Hopefully, as a follow up after this one, I’ll look at practical guidance on <em>parallelizing</em> this same work to spread the load over available cores.</p>
<p>Key point: the main purpose of these words is <strong>not</strong> to discuss how to implement a radix sort - in fact, we don’t even do that. Instead, it uses <em>one small part</em> of radix sort as an example problem with which to discuss <strong>much broader</strong> concepts of performance optimization in C# / .NET.</p>
<p>Obviously I can’t cover the entirety of radix sort here, so I’m going to focus on one simple part: composing the radix for sorting. To recall, a naive implementation of radix sort requires unsigned keys, so that the data is naturally sortable in its binary representation. Signed integers and floating point numbers don’t follow this layout, so in part 1 we introduced some basic tools to change between them:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uint Sortable(int value)
{
// re-base everything upwards, so anything
// that was the min-value is now 0, etc
var val = unchecked((uint)(value - int.MinValue));
return val;
}
unsafe uint Sortable (float value)
{
const int MSB = 1 << 31;
int raw = *(int*)(&value);
if ((raw & MSB) != 0) // IEEE first bit is the sign bit
{
// is negative; should interpret as -(the value without the MSB) - not the same as just
// dropping the bit, since integer math is twos-complement
raw = -(raw & ~MSB);
}
return Sortable(raw);
}
</code></pre></div></div>
<p>These two simple transformations - applied to our target values - will form the central theme of this entry.</p>
<p>To measure performance, I’ll be using the inimitable <a href="https://www.nuget.org/packages/BenchmarkDotNet/">BenchmarkDotNet</a>, looking at multiple iterations of transforming 2 million random <code class="highlighter-rouge">float</code> values taken from a seeded <code class="highlighter-rouge">Random()</code>, with varying signs etc. The method above will be our baseline, and at each step we’ll add a new row to our table at the bottom:</p>
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right">Mean</th>
<th style="text-align: right">Scaled</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>SortablePerValue</strong></td>
<td style="text-align: right">10,483.8 us</td>
<td style="text-align: right">1.00</td>
</tr>
</tbody>
</table>
<p>This gives us a good starting point.</p>
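<p>For reference, the shape of the harness is roughly as follows - a sketch only: the exact setup code isn’t shown here, so the names and the random distribution are assumptions:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class RadixBench
{
    private float[] values;
    private uint[] radix;

    [GlobalSetup]
    public void Setup()
    {
        var rand = new Random(12345); // seeded, for repeatability
        values = new float[2_000_000];
        radix = new uint[values.Length];
        for (int i = 0; i < values.Length; i++)
            values[i] = (float)(rand.NextDouble() * 2000 - 1000); // varying signs
    }

    [Benchmark(Baseline = true)]
    public void SortablePerValue()
    {
        for (int i = 0; i < values.Length; i++)
            radix[i] = Sortable(values[i]); // Sortable(float) as defined above
    }
    // ... one [Benchmark] method per row in the table
}
</code></pre></div></div>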
<h2 id="a-negative-sign-of-the-times">A negative sign of the times<a class="anchorjs-link " href="#a-negative-sign-of-the-times" aria-label="Anchor link for: a negative sign of the times" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>What’s faster than performing a fast operation? <em>Not</em> performing a fast operation. The way radix sort works is by looking at the sort values <code class="highlighter-rouge">r</code> bits at a time (commonly 4, 8, 10, but any number is valid) and for that block of bits: counting how many candidates are in each of the <code class="highlighter-rouge">1 << r</code> possible buckets. So if <code class="highlighter-rouge">r</code> is 3, we have 8 possible buckets. From that it computes target offsets for each group: if there are 27 values in bucket 0, 12 in bucket 1, 3 in bucket 2, etc - then <em>when sorted</em> bucket 0 will start at offset 0, bucket 1 at offset 27, bucket 2 at offset 39, bucket 3 at offset 42, and so on - just by accumulating the counts. But this breaks if we have signed numbers.</p>
<p>Why?</p>
<p>First, let’s remind ourselves of the various ways that signed and unsigned data can be represented in binary, using a 4 bit number system and integer representations:</p>
<table>
<thead>
<tr>
<th style="text-align: center">Binary</th>
<th style="text-align: center">Unsigned</th>
<th style="text-align: center">2s-complement</th>
<th style="text-align: center">1s-complement</th>
<th style="text-align: center">Sign bit</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">0000</td>
<td style="text-align: center">0</td>
<td style="text-align: center">0</td>
<td style="text-align: center">+0</td>
<td style="text-align: center">+0</td>
</tr>
<tr>
<td style="text-align: center">0001</td>
<td style="text-align: center">1</td>
<td style="text-align: center">1</td>
<td style="text-align: center">1</td>
<td style="text-align: center">1</td>
</tr>
<tr>
<td style="text-align: center">0010</td>
<td style="text-align: center">2</td>
<td style="text-align: center">2</td>
<td style="text-align: center">2</td>
<td style="text-align: center">2</td>
</tr>
<tr>
<td style="text-align: center">0011</td>
<td style="text-align: center">3</td>
<td style="text-align: center">3</td>
<td style="text-align: center">3</td>
<td style="text-align: center">3</td>
</tr>
<tr>
<td style="text-align: center">0100</td>
<td style="text-align: center">4</td>
<td style="text-align: center">4</td>
<td style="text-align: center">4</td>
<td style="text-align: center">4</td>
</tr>
<tr>
<td style="text-align: center">0101</td>
<td style="text-align: center">5</td>
<td style="text-align: center">5</td>
<td style="text-align: center">5</td>
<td style="text-align: center">5</td>
</tr>
<tr>
<td style="text-align: center">0110</td>
<td style="text-align: center">6</td>
<td style="text-align: center">6</td>
<td style="text-align: center">6</td>
<td style="text-align: center">6</td>
</tr>
<tr>
<td style="text-align: center">0111</td>
<td style="text-align: center">7</td>
<td style="text-align: center">7</td>
<td style="text-align: center">7</td>
<td style="text-align: center">7</td>
</tr>
<tr>
<td style="text-align: center">1000</td>
<td style="text-align: center">8</td>
<td style="text-align: center">-8</td>
<td style="text-align: center">-7</td>
<td style="text-align: center">-0</td>
</tr>
<tr>
<td style="text-align: center">1001</td>
<td style="text-align: center">9</td>
<td style="text-align: center">-7</td>
<td style="text-align: center">-6</td>
<td style="text-align: center">-1</td>
</tr>
<tr>
<td style="text-align: center">1010</td>
<td style="text-align: center">10</td>
<td style="text-align: center">-6</td>
<td style="text-align: center">-5</td>
<td style="text-align: center">-2</td>
</tr>
<tr>
<td style="text-align: center">1011</td>
<td style="text-align: center">11</td>
<td style="text-align: center">-5</td>
<td style="text-align: center">-4</td>
<td style="text-align: center">-3</td>
</tr>
<tr>
<td style="text-align: center">1100</td>
<td style="text-align: center">12</td>
<td style="text-align: center">-4</td>
<td style="text-align: center">-3</td>
<td style="text-align: center">-4</td>
</tr>
<tr>
<td style="text-align: center">1101</td>
<td style="text-align: center">13</td>
<td style="text-align: center">-3</td>
<td style="text-align: center">-2</td>
<td style="text-align: center">-5</td>
</tr>
<tr>
<td style="text-align: center">1110</td>
<td style="text-align: center">14</td>
<td style="text-align: center">-2</td>
<td style="text-align: center">-1</td>
<td style="text-align: center">-6</td>
</tr>
<tr>
<td style="text-align: center">1111</td>
<td style="text-align: center">15</td>
<td style="text-align: center">-1</td>
<td style="text-align: center">-0</td>
<td style="text-align: center">-7</td>
</tr>
</tbody>
</table>
<p>We’re usually most familiar with unsigned and 2s-complement representations, because that is what most modern processors use to represent integers. 1s-complement is where <code class="highlighter-rouge">-x ≡ ~x</code> - i.e. to negate something we simply invert <em>all the bits</em>. This works fine but has two zeros, which is one more than we usually need - hence we usually use 2s-complement which simply adds an off-by-one step; this makes zero unambiguous (very useful for <code class="highlighter-rouge">false</code> as we’ll see later) and (perhaps less important) gives us an extra negative value to play with.</p>
<p>The final option is to use a sign bit; to negate a number we flip the most significant bit, so <code class="highlighter-rouge">-x ≡ x ^ 0b1000</code>. IEEE754 floating point numbers (<code class="highlighter-rouge">float</code> and <code class="highlighter-rouge">double</code>) are implemented using a sign bit, which is why floating point numbers have +0 and -0. Due to clever construction, the rest of the value can be treated as naturally/bitwise sortable - even without needing to understand about the “mantissa”, “exponent” and “bias”. This means that to convert a <strong>negative</strong> <code class="highlighter-rouge">float</code> (or any other sign-bit number) to a 1s-complement representation, we simply flip all the bits except the most significant bit. Or we flip <em>all</em> the bits and put the most significant bit back again, since we know it should be a <code class="highlighter-rouge">1</code>.</p>
<hr>
<p>So: armed with this knowledge, we can see that signed data in 1s-complement or 2s-complement is <em>almost</em> “naturally sortable” in binary, but simply: the negative values are sorted in increasing numerical value, but come <em>after</em> the positive values (we can happily assert that -0 < +0). This means that we can educate radix sort about 1s-complement and 2s-complement signed data <em>simply by being clever when processing the <strong>final left-most chunk</strong></em>: based on <code class="highlighter-rouge">r</code> and the bit-width, calculate which bit is the most-significant bit (which indicates sign), and simply <em>process the negative buckets first</em> (still in the same order) when calculating the offsets; then calculate the offsets of the non-negative buckets. If we were using the 4-bit system above and <code class="highlighter-rouge">r=4</code>, we would have 16 buckets, and would calculate offsets in the order (of unsigned buckets) 8-15 then 0-7.</p>
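<p>In code, that “negative buckets first” offset pass might look like this (a sketch for <code class="highlighter-rouge">r=4</code>, i.e. 16 buckets; the helper name is mine):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// turn per-bucket counts into start offsets, in place, for the final
// left-most chunk of signed data: buckets with the MSB set (8-15, the
// negative values) are accumulated first, then the non-negative ones (0-7)
void SignedCountsToOffsets(int[] buckets) // length 16 when r = 4
{
    int offset = 0;
    for (int i = 8; i < 16; i++) { int n = buckets[i]; buckets[i] = offset; offset += n; }
    for (int i = 0; i < 8; i++)  { int n = buckets[i]; buckets[i] = offset; offset += n; }
}
</code></pre></div></div>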
<p>By doing this, we can <strong>completely remove</strong> any need to do any pre-processing when dealing with values like <code class="highlighter-rouge">int</code>. We could perhaps wish that the IEEE754 committee had preferred 1s-complement so we could skip all of this for <code class="highlighter-rouge">float</code> too, but a: I think it is fair to assume that there are good technical reasons for the choice (presumably relating to fast negation, and fast access to the mantissa/exponent), and b: it is moot: IEEE754 is implemented in CPU architectures and is here to stay.</p>
<p>So we’re still left with an issue for <code class="highlighter-rouge">float</code>: we can’t use the same trick for values with a sign bit, because the sign bit changes the order <em>throughout</em> the data - making grouping impossible. We can make our lives <em>easier</em> though: since the algorithm can now cope with 1s-complement and 2s-complement data, we can switch to 1s-complement rather than to fully unsigned, which as discussed above: is pretty easy:</p>
<p>(Aside: actually, there <em>is</em> a related trick we can do to avoid having to pre-process floating point data, but: it would make this entire blog redundant! So for the purposes of a problem to investigate, we’re going to assume that we need to do this transformation.)</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>unsafe int ToRadix (float value)
{
const int MSB = 1 << 31;
int raw = *(int*)(&value);
// if sign bit set: flip all bits except the MSB
return (raw & MSB) == 0 ? raw : ~raw | MSB;
}
</code></pre></div></div>
<p>A nice side-effect of this is that it is self-reversing: we can apply the exact same bit operations to convert from 1s-complement back to a sign bit.</p>
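<p>A quick way to convince yourself of that (a sketch; <code class="highlighter-rouge">Flip</code> is just the ternary from <code class="highlighter-rouge">ToRadix</code> factored out): when the MSB is clear the value is untouched, and when it is set, <code class="highlighter-rouge">~(~raw | MSB) | MSB == raw</code>.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>int Flip(int raw)
{
    const int MSB = 1 << 31;
    return (raw & MSB) == 0 ? raw : ~raw | MSB;
}
// Flip(Flip(x)) == x for all x: the transform is its own inverse
</code></pre></div></div>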
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right">Mean</th>
<th style="text-align: right">Scaled</th>
</tr>
</thead>
<tbody>
<tr>
<td>SortablePerValue</td>
<td style="text-align: right">10,483.8 us</td>
<td style="text-align: right">1.00</td>
</tr>
<tr>
<td><strong>ToRadixPerValue</strong></td>
<td style="text-align: right">10,120.5 us</td>
<td style="text-align: right">0.97</td>
</tr>
</tbody>
</table>
<p>We’ve made a slight but measurable improvement - nothing drastic, but the code is nicer.</p>
<h2 id="blocking-ourselves-out">Blocking ourselves out<a class="anchorjs-link " href="#blocking-ourselves-out" aria-label="Anchor link for: blocking ourselves out" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>We have a large chunk of data, and we want to perform a transformation on each value. So far, we’ve looked at a per-value transformation function (<code class="highlighter-rouge">Sortable</code>), but that means the overhead of a call per-value (which may or may not get inlined, depending on the complexity, and how we resolve the method - i.e. does it involve a <code class="highlighter-rouge">virtual</code> call to a type that isn’t reliably known). Additionally, it makes it very hard for us to apply more advanced optimizations! Blocks good.</p>
<p>So; let’s say we have our existing loop:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>float[] values = ...
int[] radix = ...
for(int i = 0 ; i < values.Length; i++)
{
radix[i] = someHelper.Sortable(values[i]);
}
</code></pre></div></div>
<p>and we want to retain the ability to swap in per-type implementations of <code class="highlighter-rouge">someHelper.Sortable</code>; we can significantly reduce the call overhead by performing a block-based transformation. Consider:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>float[] values = ...
int[] radix = ...
someHelper.ToRadix(values, radix);
...
unsafe void ToRadix(float[] source, int[] destination)
{
const int MSB = 1 << 31;
for(int i = 0 ; i < source.Length; i++)
{
var val = source[i];
int raw = *(int*)(&val);
// if sign bit set: flip all bits except the MSB
destination[i] = (raw & MSB) == 0 ? raw : ~raw | MSB;
}
}
</code></pre></div></div>
<p>How much of a speed improvement this makes depends a lot on whether the JIT managed to inline the IL from the original version. It is usually a good win by itself, but more importantly: it is a key stepping stone to further optimizations.</p>
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right">Mean</th>
<th style="text-align: right">Scaled</th>
</tr>
</thead>
<tbody>
<tr>
<td>SortablePerValue</td>
<td style="text-align: right">10,483.8 us</td>
<td style="text-align: right">1.00</td>
</tr>
<tr>
<td>ToRadixPerValue</td>
<td style="text-align: right">10,120.5 us</td>
<td style="text-align: right">0.97</td>
</tr>
<tr>
<td><strong>ToRadixBlock</strong></td>
<td style="text-align: right">10,080.0 us</td>
<td style="text-align: right">0.96</td>
</tr>
</tbody>
</table>
<p>Another small improvement; I was hoping for more, but I suspect that the JIT was <em>already</em> doing a good job of inlining the method we're calling, making it <em>almost</em> the same loop at runtime. This is not always the case, though - especially if you have multiple different transformations to apply through a single API.</p>
<h2 id="safely-spanning-the-performance-chasm">Safely spanning the performance chasm<a class="anchorjs-link " href="#safely-spanning-the-performance-chasm" aria-label="Anchor link for: safely spanning the performance chasm" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>You’ll notice that in the code above I’ve made use of <code class="highlighter-rouge">unsafe</code> code. There are a few things that make <code class="highlighter-rouge">unsafe</code> appealing, but one of the things it does exceptionally well is allow us to reinterpret chunks of data as other types, which is what this line does:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>int raw = *(int*)(&val);
</code></pre></div></div>
<p>Actually, there are some methods on <code class="highlighter-rouge">BitConverter</code> <a href="https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/BitConverter.cs#L462-L472">to do exactly this</a>, but only the 64-bit (<code class="highlighter-rouge">double</code>/<code class="highlighter-rouge">long</code>) versions exist in the “.NET Framework” (“.NET Core” has both 32-bit and 64-bit) - and that only helps us with this single example, rather than the general case.</p>
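<p>For what it’s worth, the 32-bit methods do exactly this reinterpretation; a quick sketch, assuming a runtime that has them (.NET Core 2.0 or later):</p>

```csharp
using System;

// reinterpret (not convert) the bits of a float as an int, and back
int bits = BitConverter.SingleToInt32Bits(1.0f);
Console.WriteLine(bits.ToString("X8"));          // 3F800000: IEEE754 for 1.0f
float back = BitConverter.Int32BitsToSingle(bits);
Console.WriteLine(back);                         // 1: round-trips exactly
```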
<p>For example, if we are talking in pointers, we can tell the compiler to treat a <code class="highlighter-rouge">float*</code> as though it were an <code class="highlighter-rouge">int*</code>. One way we <em>might</em> be tempted to rewrite our <code class="highlighter-rouge">ToRadix</code> method could be to move this coercion earlier:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>unsafe void ToRadix(float[] source, int[] destination)
{
const int MSB = 1 << 31;
fixed(float* fPtr = source)
{
int* ptr = (int*)fPtr;
for(int i = 0 ; i < source.Length; i++)
{
int raw = *ptr++;
destination[i] = (raw & MSB) == 0 ? raw : ~raw | MSB;
}
}
}
</code></pre></div></div>
<p>Now we’re naturally reading values out <em>as</em> <code class="highlighter-rouge">int</code>, rather than performing any reinterpretation per value. This is <em>useful</em>, but it requires us to use <code class="highlighter-rouge">unsafe</code> code (always a great way to get hurt), and it doesn’t work with generics - you <strong>cannot</strong> use <code class="highlighter-rouge">T*</code> for some <code class="highlighter-rouge"><T></code>, even with the <code class="highlighter-rouge">where T : struct</code> constraint.</p>
<p>I’ve spoken more than a few times about <code class="highlighter-rouge">Span<T></code>; quite simply: it rocks. To recap, <code class="highlighter-rouge">Span<T></code> (and its heap-friendly cousin, <code class="highlighter-rouge">Memory<T></code>) is a general purpose, efficient, and versatile representation of contiguous memory - which includes things like arrays (<code class="highlighter-rouge">float[]</code>), but more exotic things too.</p>
<p>One of the most powerful (but simple) features of <code class="highlighter-rouge">Span<T></code> is that it allows us to do type coercion <em>in fully safe managed code</em>. For example, instead of a <code class="highlighter-rouge">float[]</code>, let’s say that we have a <code class="highlighter-rouge">Span<float></code>. We can reinterpret that as <code class="highlighter-rouge">int</code> <em>very simply</em>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Span<float> values = ...
var asIntegers = values.NonPortableCast<float, int>();
</code></pre></div></div>
<p>Note: this API is likely to change - by the time it hits general availability, it’ll probably be:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>var asIntegers = values.Cast<int>();
</code></pre></div></div>
<p>What this does is:</p>
<ul>
<li>look at the sizes of the original and target type</li>
<li>calculate how many of the target type fit into the original data</li>
<li>round down (so we never go out of range)</li>
<li>hand us back a span of the target type, of that (possibly reduced) length</li>
</ul>
<p>Since <code class="highlighter-rouge">float</code> and <code class="highlighter-rouge">int</code> are the same size, we’ll find that <code class="highlighter-rouge">asIntegers</code> has the same length as <code class="highlighter-rouge">values</code>.</p>
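<p>(As it happens, the shipped form of this API landed as <code class="highlighter-rouge">MemoryMarshal.Cast</code>; the length logic is identical. A sketch:)</p>

```csharp
using System;
using System.Runtime.InteropServices;

Span<float> values = new float[] { 1.0f, -1.0f, 0.5f };
// reinterpret the same memory as int - fully safe managed code
Span<int> asIntegers = MemoryMarshal.Cast<float, int>(values);

Console.WriteLine(asIntegers.Length);             // 3: float and int are both 4 bytes
Console.WriteLine(asIntegers[0].ToString("X8"));  // 3F800000: raw bits of 1.0f
Console.WriteLine(asIntegers[1].ToString("X8"));  // BF800000: raw bits of -1.0f
```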
<p>What is <em>especially</em> powerful here is that this trick <em>works with generics</em>. It does something that <code class="highlighter-rouge">unsafe</code> code <strong>will not do for us</strong>. Note that a <em>lot of love</em> has been shown to <code class="highlighter-rouge">Span<T></code> in the runtime and JIT - essentially all of the same tricks that make <code class="highlighter-rouge">T[]</code> array performance largely indistinguishable from pointer <code class="highlighter-rouge">T*</code> performance.</p>
<p>This means we could simplify a lot of things by converting from our generic <code class="highlighter-rouge">T</code> <em>even earlier</em> (so we only do it once for the entire scope of our radix code), and having our radix converter just talk in terms of the raw bits (usually: <code class="highlighter-rouge">uint</code>).</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// our original data
float[] values = ...
// recall that radix sort needs some extra space to work in
float[] workspace = ...
// implicit <float> and implicit conversion to Span<float>
RadixSort32(values, workspace);
...
void RadixSort32<T>(Span<T> typedSource, Span<T> typedWorkspace)
where T : struct
{
// note that the JIT can inline this *completely* and remove impossible
// code paths, because for struct T, the JIT is per-T
if (Unsafe.SizeOf<T>() != Unsafe.SizeOf<uint>()) throw new InvalidOperationException("expected 32-bit data");
var source = typedSource.NonPortableCast<T, uint>();
var workspace = typedWorkspace.NonPortableCast<T, uint>();
// convert the radix if needed (into the workspace)
var converter = GetConverter<T>();
converter?.ToRadix(source, workspace);
// ... more radix sort details not shown
}
...
// our float converter
public void ToRadix(Span<uint> values, Span<uint> destination)
{
const uint MSB = 1U << 31;
for(int i = 0 ; i < values.Length; i++)
{
uint raw = values[i];
destination[i] = (raw & MSB) == 0 ? raw : ~raw | MSB;
}
}
</code></pre></div></div>
<p>The code is getting simpler, while retaining performance <em>and</em> becoming more generic-friendly; and we haven’t needed to use a single <code class="highlighter-rouge">unsafe</code>. You’ll have to excuse the one <code class="highlighter-rouge">Unsafe.SizeOf<T>()</code> - despite the name, this isn’t <em>really</em> an “unsafe” operation in the usual sense - this is simply a wrapper to the <code class="highlighter-rouge">sizeof</code> IL instruction that is <em>perfectly well defined</em> for all <code class="highlighter-rouge">T</code> that are usable in generics. It just isn’t directly available in safe C#.</p>
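<p>A tiny illustration (assuming the <code class="highlighter-rouge">System.Runtime.CompilerServices.Unsafe</code> package, which is where this helper lives outside of the core libraries) - note that it works from generic code, where the C# <code class="highlighter-rouge">sizeof</code> operator would not:</p>

```csharp
using System;
using System.Runtime.CompilerServices;

Console.WriteLine(Unsafe.SizeOf<float>());  // 4
Console.WriteLine(Unsafe.SizeOf<uint>());   // 4
Console.WriteLine(SameSize<float, uint>()); // True - usable from generic code

// the "sizeof" IL opcode is defined for any T usable in generics
bool SameSize<TFrom, TTo>() where TFrom : struct where TTo : struct
    => Unsafe.SizeOf<TFrom>() == Unsafe.SizeOf<TTo>();
```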
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right">Mean</th>
<th style="text-align: right">Scaled</th>
</tr>
</thead>
<tbody>
<tr>
<td>SortablePerValue</td>
<td style="text-align: right">10,483.8 us</td>
<td style="text-align: right">1.00</td>
</tr>
<tr>
<td>ToRadixPerValue</td>
<td style="text-align: right">10,120.5 us</td>
<td style="text-align: right">0.97</td>
</tr>
<tr>
<td>ToRadixBlock</td>
<td style="text-align: right">10,080.0 us</td>
<td style="text-align: right">0.96</td>
</tr>
<tr>
<td><strong>ToRadixSpan</strong></td>
<td style="text-align: right">7,976.3 us</td>
<td style="text-align: right">0.76</td>
</tr>
</tbody>
</table>
<p>Now we’re starting to make decent improvements - <code class="highlighter-rouge">Span<T></code> is <em>really</em> useful for large operations where type coercion is necessary.</p>
<h2 id="taking-up-tree-surgery-prune-those-branches">Taking up tree surgery: prune those branches<a class="anchorjs-link " href="#taking-up-tree-surgery-prune-those-branches" aria-label="Anchor link for: taking up tree surgery prune those branches" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>Something that gnaws at my soul in where we’ve got to is that it includes a branch - an <code class="highlighter-rouge">if</code> test, essentially - in the inner part of the loop. Actually, there are two, and they’re both hidden. The first is in the <code class="highlighter-rouge">for</code> loop, but the one I’m talking about here is the one hidden in the ternary conditional operation, <code class="highlighter-rouge">a ? b : c</code>. CPUs are very clever about branching, with branch prediction and other fancy things - but it can still stall the instruction pipeline, especially if the prediction is wrong. If only there was a way to rewrite that operation to not need a branch. I’m sure you can see where this is going.</p>
<p>Branching: bad. Bit operations: good. A common trick we can use to remove branches is to obtain a bit-mask that is either all 0s (000…000) or all 1s (111…111) - so: 0 and -1 in 2s-complement terms. There are various ways we can do that (although it also depends on the actual value of <code class="highlighter-rouge">true</code> in your target system, which is a surprisingly complex question). Obviously one way to do that would be:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// -1 if negative, 0 otherwise
var mask = (raw & MSB) == 0 ? 0 : ~0;
</code></pre></div></div>
<p>but that just <em>adds</em> another branch. If we were using C, we could use the knowledge that a comparison returns an <em>integer</em> of either 0 or 1, and just negate that to get 0 or -1:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// -1 if negative, 0 otherwise
var mask = -((raw & MSB) != 0);
</code></pre></div></div>
<p>But no such trick is available in C#. What we <em>can</em> do, though, is use knowledge of <em>arithmetic shift</em>. Left-shift (<code class="highlighter-rouge"><<</code>) is simple; we shift our bits <code class="highlighter-rouge">n</code> places to the left, filling in with 0s on the right. So binary <code class="highlighter-rouge">11001101 << 3</code> becomes <code class="highlighter-rouge">01101000</code> (we lose the <code class="highlighter-rouge">110</code> from the left).</p>
<p>Right-shift is more subtle, as there are two of them: logical and arithmetic, which are essentially unsigned and signed. The <em>logical</em> shift (used with <code class="highlighter-rouge">uint</code> etc) moves our bits <code class="highlighter-rouge">n</code> places to the right, filling in with 0s on the left, so <code class="highlighter-rouge">11001101 >> 3</code> gives <code class="highlighter-rouge">00011001</code> (we lose the <code class="highlighter-rouge">101</code> from the right). The <em>arithmetic</em> shift behaves differently depending on whether the most significant bit (which tells us the sign of the value) is <code class="highlighter-rouge">0</code> or <code class="highlighter-rouge">1</code>. If it is <code class="highlighter-rouge">0</code>, it behaves exactly like the logical shift; however, if it is <code class="highlighter-rouge">1</code>, it fills in with 1s on the left; so <code class="highlighter-rouge">11001101 >> 3</code> gives <code class="highlighter-rouge">11111001</code>. Using this, we can use <code class="highlighter-rouge">>> 31</code> (or <code class="highlighter-rouge">>> 63</code> for 64-bit data) to create a mask that matches the sign of the original data:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// -1 if negative, 0 otherwise
var mask = (uint)(((int)raw) >> 31);
</code></pre></div></div>
<p>Don’t worry about the extra conversions: as long as we’re in an <code class="highlighter-rouge">unchecked</code> context (which we are by default), they simply don’t exist. All they do is tell the compiler which shift operation to emit. If you’re curious, you can <a href="https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQxASwDYB8ACAmAAgGUALNAMxgFgAoAb1oKYJwEYA2AgVzQDsYCAcTgwAstADWACh78CANwgYucAJSNmDGsx04A7ARl8YqqWeOrFytQQB8tggGZWqgNwamAX1qegA===">see this here</a>, but in IL terms, this is just:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(push the value onto the stack)
ldc.i4.s 31 // push 31 onto the stack
shr // arithmetic right shift
</code></pre></div></div>
<p>In IL terms, there’s really no difference between signed and unsigned integers, <em>other</em> than whether the compiler emits the signed or unsigned opcodes for operations. In this case we’ve told it to emit <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.shr(v=vs.110).aspx"><code class="highlighter-rouge">shr</code></a> - the signed/arithmetic opcode, instead of <a href="https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.shr_un(v=vs.110).aspx"><code class="highlighter-rouge">shr.un</code></a> - the unsigned/logical opcode.</p>
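<p>We can see the two opcodes in action directly from C#, since the static type of the left operand chooses between them:</p>

```csharp
using System;

int signedVal = unchecked((int)0xFFFFFFF8);           // -8
uint unsignedVal = 0xFFFFFFF8;

Console.WriteLine(signedVal >> 1);                    // -4: shr, fills with the sign bit
Console.WriteLine((unsignedVal >> 1).ToString("X8")); // 7FFFFFFC: shr.un, fills with zeros

// the mask trick: all-ones for negative, all-zeros otherwise
Console.WriteLine(-5 >> 31);                          // -1
Console.WriteLine(5 >> 31);                           // 0
```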
<p>OK; so we’ve got a mask that is either all zeros or all ones. Now we need to use it to avoid a branch, but how? Consider:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>var condition = // all zeros or all ones
var result = (condition & trueValue) | (~condition & falseValue);
</code></pre></div></div>
<p>If <code class="highlighter-rouge">condition</code> is all zeros, then the <code class="highlighter-rouge">condition & trueValue</code> gives us zero; the <code class="highlighter-rouge">~condition</code> becomes all ones, and therefore <code class="highlighter-rouge">~condition & falseValue</code> gives us <code class="highlighter-rouge">falseValue</code>. When we “or” (<code class="highlighter-rouge">|</code>) those together, we get <code class="highlighter-rouge">falseValue</code>.</p>
<p>Likewise, if <code class="highlighter-rouge">condition</code> is all ones, then <code class="highlighter-rouge">condition & trueValue</code> gives us <code class="highlighter-rouge">trueValue</code>; the <code class="highlighter-rouge">~condition</code> becomes all zeros, and therefore <code class="highlighter-rouge">~condition & falseValue</code> gives us zero. When we “or” (<code class="highlighter-rouge">|</code>) those together, we get <code class="highlighter-rouge">trueValue</code>.</p>
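<p>To make that concrete, here’s a tiny sketch; the <code class="highlighter-rouge">0xAAAAAAAA</code>/<code class="highlighter-rouge">0x55555555</code> values are purely illustrative, chosen to make the blending visible:</p>

```csharp
using System;

uint trueValue = 0xAAAAAAAA, falseValue = 0x55555555;

// branch-free "condition ? trueValue : falseValue" via a bit-mask
uint Select(uint condition) => (condition & trueValue) | (~condition & falseValue);

Console.WriteLine(Select(0xFFFFFFFF).ToString("X8")); // AAAAAAAA: all-ones picks trueValue
Console.WriteLine(Select(0x00000000).ToString("X8")); // 55555555: all-zeros picks falseValue
Console.WriteLine(Select(0xFFFF0000).ToString("X8")); // AAAA5555: mixed masks blend per-bit
```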
<p>So our branchless operation becomes:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public void ToRadix(Span<uint> values, Span<uint> destination)
{
const uint MSB = 1U << 31;
for(int i = 0 ; i < values.Length; i++)
{
uint raw = values[i];
var ifNeg = (uint)(((int)raw) >> 31);
destination[i] =
(ifNeg & (~raw | MSB)) // true
| (~ifNeg & raw); // false
}
}
</code></pre></div></div>
<p>This might look more complicated, but it is <strong>very</strong> CPU-friendly: it pipelines very well, and doesn’t involve any branches for it to worry about. Doing a few extra bit operations is <em>nothing</em> to a CPU - especially if they can be pipelined. Long instruction pipelines are actually a <em>good</em> thing to a CPU - compared to a branch or something that might involve a cache miss, at least.</p>
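<p>If you want to convince yourself that the branchless form agrees with the ternary original, a quick check over some interesting bit patterns (a sketch, not the benchmark code):</p>

```csharp
using System;

const uint MSB = 1U << 31;

uint Ternary(uint raw) => (raw & MSB) == 0 ? raw : ~raw | MSB;

uint Branchless(uint raw)
{
    var ifNeg = (uint)(((int)raw) >> 31); // all-ones if MSB set, else all-zeros
    return (ifNeg & (~raw | MSB)) | (~ifNeg & raw);
}

foreach (var value in new uint[] { 0, 1, MSB, MSB | 1, 0x7FFFFFFF, uint.MaxValue })
{
    if (Ternary(value) != Branchless(value)) throw new Exception("mismatch!");
}
Console.WriteLine("all match");
```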
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right">Mean</th>
<th style="text-align: right">Scaled</th>
</tr>
</thead>
<tbody>
<tr>
<td>SortablePerValue</td>
<td style="text-align: right">10,483.8 us</td>
<td style="text-align: right">1.00</td>
</tr>
<tr>
<td>ToRadixPerValue</td>
<td style="text-align: right">10,120.5 us</td>
<td style="text-align: right">0.97</td>
</tr>
<tr>
<td>ToRadixBlock</td>
<td style="text-align: right">10,080.0 us</td>
<td style="text-align: right">0.96</td>
</tr>
<tr>
<td>ToRadixSpan</td>
<td style="text-align: right">7,976.3 us</td>
<td style="text-align: right">0.76</td>
</tr>
<tr>
<td><strong>Branchless</strong></td>
<td style="text-align: right">2,507.0 us</td>
<td style="text-align: right">0.24</td>
</tr>
</tbody>
</table>
<p>By removing the branches, we’re down to <em>less than a quarter</em> of the original run-time; that’s a huge win, even if the code is slightly more complex.</p>
<h2 id="why-do-one-thing-at-a-time">Why do one thing at a time?<a class="anchorjs-link " href="#why-do-one-thing-at-a-time" aria-label="Anchor link for: why do one thing at a time" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>OK, so we’ve got a nice branchless version, and the world looks great. We’ve made significant improvements. But we can still get <em>much better</em>. At the moment we’re processing each value one at a time, but as it happens, this is a perfect scenario for <em>vectorization</em> via <a href="https://en.wikipedia.org/wiki/SIMD">SIMD (“Single instruction, multiple data”)</a>.</p>
<p>We have a pipelineable operation without any branches; if we can execute it for <em>one</em> value, we can probably execute it for <em>multiple</em> values <strong>at the same time</strong> - magic, right? Many modern CPUs include support for performing basic operations like the above on multiple values at a time, using super-wide registers. Right now we’re using 32-bit values, but most current CPUs will have support for AVX (mostly: 128-bit) or AVX2 (mostly: 256-bit) operations. If you’re on a very expensive server, you might have more (AVX512). But let’s assume AVX2: that means we can handle 8 32-bit values at a time. That means 1/8th of the main operations, and also 1/8th of the <code class="highlighter-rouge">if</code> branches hidden in the <code class="highlighter-rouge">for</code> loop.</p>
<p>Some languages have automatic vectorization during compilation; C# doesn’t have that, and neither does the JIT. But, we still have access to a range of vectorized operations (with support for the more exotic intrinsics being added soon). Until recently, one of the most awkward things about working with vectorization has been <em>loading the values</em>. This might sound silly, but it is surprisingly difficult to pull the values in efficiently when you don’t know how wide the vector registers are on the target CPU. Fortunately, our amazing new friend <code class="highlighter-rouge">Span<T></code> jumps to the rescue here - making it almost embarrassingly easy!</p>
<p>First, let’s look at what the shell loop might look like, without actually doing the real work in the middle:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public void ToRadix(Span<uint> values, Span<uint> destination)
{
const uint MSB = 1U << 31;
int i = 0;
if (Vector.IsHardwareAccelerated)
{
var vSource = values.NonPortableCast<uint, Vector<uint>>();
var vDest = destination.NonPortableCast<uint, Vector<uint>>();
for (int j = 0; j < vSource.Length; j++)
{
var vec = vSource[j];
vDest[j] = // TODO
}
// change our root offset for the remainder of the values
i = vSource.Length * Vector<uint>.Count;
}
for( ; i < values.Length; i++)
{
uint raw = values[i];
var ifNeg = (uint)(((int)raw) >> 31);
destination[i] =
(ifNeg & (~raw | MSB)) // true
| (~ifNeg & raw); // false
}
}
</code></pre></div></div>
<p>First, look at the bottom of the code; here we see that our <em>regular branchless</em> code still persists. This is for two reasons:</p>
<ul>
<li>the target CPU <em>might not be capable of vectorization</em></li>
<li>our input data might not be a nice multiple of the register-width, so we might need to process a final few items the old way</li>
</ul>
<p>Note that we’ve changed the <code class="highlighter-rouge">for</code> loop so that it doesn’t reset the position of <code class="highlighter-rouge">i</code> - we don’t <em>necessarily</em> start at <code class="highlighter-rouge">0</code>.</p>
<p>Now look at the <code class="highlighter-rouge">if (Vector.IsHardwareAccelerated)</code>; this checks that suitable vectorization support is available. Note that the JIT can optimize this check away completely (and remove all of the inner code if it won’t be reached). If we <em>do</em> have support, we cast the span from a <code class="highlighter-rouge">Span<uint></code> to a <code class="highlighter-rouge">Span<Vector<uint>></code>. Note that <code class="highlighter-rouge">Vector<T></code> is recognized by the JIT, and will be <em>reshaped</em> by the JIT to match the size of the available vectorization support on the running computer. That means that when using <code class="highlighter-rouge">Vector<T></code> we don’t need to worry about whether the target computer has SSE vs AVX vs AVX2 etc - or what the available widths are; simply: “give me what you can”, and the JIT worries about the details.</p>
<p>We can now loop over the <em>vectors</em> available in the cast spans - loading an entire vector at a time simply using our familiar: <code class="highlighter-rouge">var vec = vSource[j];</code>. This is a <em>huge</em> difference to what loading vectors used to look like. We then do some operation (not shown) on <code class="highlighter-rouge">vec</code>, and assign the result <em>again as an entire vector</em> to <code class="highlighter-rouge">vDest[j]</code>. On my machine with AVX2 support, <code class="highlighter-rouge">vec</code> is block of 8 32-bit values.</p>
<p>Next, we need to think about that <code class="highlighter-rouge">// TODO</code> - what are we actually going to <em>do</em> here? If you’ve already re-written your inner logic to be branchless, there’s actually a very good chance that it will be a like-for-like translation of your branchless code. In fact, it turns out that the ternary conditional scenario we’re looking at here is <em>so common</em> that there are vectorized operations <em>precisely to do it</em>; the “conditional select” vectorized CPU instruction can essentially be stated as:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// result conditionalSelect(condition, left, right)
result = (condition & left) | (~condition & right);
</code></pre></div></div>
<p>Where <code class="highlighter-rouge">condition</code> is <em>usually</em> either all-zeros or all-ones (but it doesn’t have to be; if you want to pull different bits from each value, you can do that too).</p>
<p>This intrinsic is exposed directly on <code class="highlighter-rouge">Vector</code>, so our missing code becomes simply:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>var vMSB = new Vector<uint>(MSB);
var vNOMSB = ~vMSB;
for (int j = 0; j < vSource.Length; j++)
{
var vec = vSource[j];
vDest[j] = Vector.ConditionalSelect(
condition: Vector.GreaterThan(vec, vNOMSB),
left: ~vec | vMSB, // when true
right: vec // when false
);
}
</code></pre></div></div>
<p>Note that I’ve pre-loaded a vector with the MSB value (which creates a vector with that value in every cell), and I’ve switched to using a <code class="highlighter-rouge">></code> test instead of a bit test and shift. Partly, this is because the vectorized equality / inequality operations <em>expect</em> this kind of usage, and very kindly return <code class="highlighter-rouge">-1</code> as their true value - using the result to directly feed “conditional select”.</p>
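<p>To make the “true is all-ones” convention concrete, here’s a minimal sketch; note that it runs even without hardware acceleration, since <code class="highlighter-rouge">Vector<T></code> has a software fallback:</p>

```csharp
using System;
using System.Numerics;

var vMSB = new Vector<uint>(1U << 31);
var vNOMSB = ~vMSB;                       // 0x7FFFFFFF in every lane
var vec = new Vector<uint>(0x80000001);   // an "MSB set" value in every lane

var condition = Vector.GreaterThan(vec, vNOMSB); // unsigned compare, per lane
Console.WriteLine(condition[0].ToString("X8"));  // FFFFFFFF: "true" is all-ones

var result = Vector.ConditionalSelect(condition, ~vec | vMSB, vec);
Console.WriteLine(result[0].ToString("X8"));     // FFFFFFFE: sign handled branch-free
```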
<table>
<thead>
<tr>
<th>Method</th>
<th style="text-align: right">Mean</th>
<th style="text-align: right">Scaled</th>
</tr>
</thead>
<tbody>
<tr>
<td>SortablePerValue</td>
<td style="text-align: right">10,483.8 us</td>
<td style="text-align: right">1.00</td>
</tr>
<tr>
<td>ToRadixPerValue</td>
<td style="text-align: right">10,120.5 us</td>
<td style="text-align: right">0.97</td>
</tr>
<tr>
<td>ToRadixBlock</td>
<td style="text-align: right">10,080.0 us</td>
<td style="text-align: right">0.96</td>
</tr>
<tr>
<td>ToRadixSpan</td>
<td style="text-align: right">7,976.3 us</td>
<td style="text-align: right">0.76</td>
</tr>
<tr>
<td>Branchless</td>
<td style="text-align: right">2,507.0 us</td>
<td style="text-align: right">0.24</td>
</tr>
<tr>
<td><strong>Vectorized</strong></td>
<td style="text-align: right">930.0 us</td>
<td style="text-align: right">0.09</td>
</tr>
</tbody>
</table>
<p>As you can see, the effect of vectorization on this type of code is just amazing - with us now getting more than an order-of-magnitude improvement on the original data. That’s why I’m so excited about how easy (relatively speaking) <code class="highlighter-rouge">Span<T></code> makes vectorization, and why I can’t wait for <code class="highlighter-rouge">Span<T></code> to hit production.</p>
<p>A reasonable range of common operations are available on <a href="https://msdn.microsoft.com/en-us/library/system.numerics.vector(v=vs.111).aspx"><code class="highlighter-rouge">Vector</code></a> and <a href="https://msdn.microsoft.com/en-us/library/dn858385(v=vs.111).aspx"><code class="highlighter-rouge">Vector<T></code></a>. If you need exotic operations like “gather”, you might need to wait until <a href="https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs"><code class="highlighter-rouge">System.Runtime.Intrinsics</code></a> lands. One key difference here is that <code class="highlighter-rouge">Vector<T></code> exposes the <em>common intersection</em> of operations that might be available (with different widths) against <em>different</em> CPU instruction sets, whereas <code class="highlighter-rouge">System.Runtime.Intrinsics</code> aims to expose the <em>underlying</em> intrinsics - giving access to the full range of instructions, but forcing you to code specifically to a chosen instruction set (or possibly having two implementations - one for AVX and one for AVX2). This is simply because there isn’t a uniform API surface between generations and vendors - it isn’t simply that you get the same operations with different widths: you get different operations too. So you’d typically be checking <code class="highlighter-rouge">Aes.IsSupported</code>, <code class="highlighter-rouge">Avx2.IsSupported</code>, etc. Being realistic: <code class="highlighter-rouge">Vector<T></code> is what we have <em>today</em>, and it works damned well.</p>
<h2 id="summary">Summary<a class="anchorjs-link " href="#summary" aria-label="Anchor link for: summary" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>We’ve looked at a range of advanced techniques for improving performance of critical loops of C# code, including (to repeat the list from the start):</p>
<ul>
<li>using knowledge of how signed data works to avoid having to transform between them</li>
<li>performing operations in blocks rather than per value to reduce calls</li>
<li>using <code class="highlighter-rouge">Span<T></code> as a replacement for <code class="highlighter-rouge">unsafe</code> code and unmanaged pointers, allowing you to get very high performance even in 100% managed/safe code</li>
<li>investigating branch removal as a performance optimization of critical loops</li>
<li>vectorizing critical loops to do the same work with significantly fewer CPU operations</li>
</ul>
<p>And we’ve seen <em>dramatic</em> improvements to the performance. Hopefully, some or all of these techniques will be applicable to your own code. Either way, I hope it has been an interesting diversion.</p>
<p>Next time: practical parallelization</p>
<h2 id="addendum">Addendum</h2>
<p>For completeness: yes I also tried a <code class="highlighter-rouge">val ^ ~MSB</code> approach for both branchless and vectorized; it wasn't an improvement.</p>
<p>And for the real implementation (the "aside" mentioned above): what the code <em>actually</em> does for sign-bit data (IEEE754) is: sort <em>just</em> on the sign bit <em>first</em>, use the count data to find where the sign changes (without scanning over the data an extra time), and then sort the two halves separately <em>ignoring</em> the MSB, with the first chunk in descending order and the second chunk in ascending order. By doing this, we avoid the need for the transform - again, by using knowledge of the bit layout of the data.</p>
<p>Marc Gravell</p>
<h1>More Of A Sort Of Problem (2018-01-20)</h1>
<p>(<a href="http://blog.marcgravell.com/2018/01/a-sort-of-problem.html">part 1 here</a>)</p>
<p>(<a href="http://blog.marcgravell.com/2018/01/sorting-myself-out-extreme-edition.html">part 3 here</a>)</p>
<p>So <a href="http://blog.marcgravell.com/2018/01/a-sort-of-problem.html">last time</a> I talked about a range of ways of performing a sort, ranging from the simple thru to hijacking .NET source code. Very quickly, some folks pointed out that I should have looked at “radix sort”, and they’re absolutely right - I should have. In fact, in the GPU version of this same code, we do exactly that <a href="https://nvlabs.github.io/cub/structcub_1_1_device_radix_sort.html">via the CUB library</a>.</p>
<p>The great thing is, radix sort is relatively simple, so:</p>
<h2 id="attempt-9-radix-sort">Attempt 9: radix sort<a class="anchorjs-link " href="#attempt-9-radix-sort" aria-label="Anchor link for: attempt 9 radix sort" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>The key point about radix sort is that it works by grouping the data by groups of bits in the data, using the same “bitwise sortable” idea that we used previously. We’ve already done the hard work to get a radix compliant representation of our sort data.</p>
<p>We can get a <a href="https://en.wikibooks.org/wiki/Algorithm_Implementation/Sorting/Radix_sort">basic radix sort implementation from wikibooks</a>, but this version has a few things we need to fix:</p>
<ul>
<li>this is a single-array version; we want a dual array</li>
<li>we can use <code class="highlighter-rouge">unsafe</code> code to get rid of a lot of array range checks (and just: don’t be wrong!)</li>
<li>radix sort needs a workspace the same size as the input values as a scratch area; in the version shown, it allocates this internally, but in “real” code we’ll want to manage that externally and pass it in</li>
<li>we can make <code class="highlighter-rouge">r</code> (the number of bits to consider at a time) configurable</li>
<li>the shown code copies the workspace over the real data each cycle, but we can avoid this by simply swapping what we consider “real” and “workspace” each cycle, and copying once at the end if required</li>
</ul>
<p>I’m not going to try and describe how or why radix sort works (wikipedia covers much of that <a href="https://en.wikipedia.org/wiki/Radix_sort">here</a>); the key thing <strong>that will be relevant in a moment</strong> is: for the group size <code class="highlighter-rouge">r</code>, it loops through all the data looking <code class="highlighter-rouge">r</code> bits at a time, to see how many values there are with each possible value for those <code class="highlighter-rouge">r</code> bits. So if <code class="highlighter-rouge">r=4</code>, there are 16 possible values over each 4 bits. Once it has that, it iterates a second time, writing the values into the corresponding places for the group it is in.</p>
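<p>The counting pass can be sketched like this (simplified, with <code class="highlighter-rouge">r=4</code> hard-coded, so 16 possible group values; not the real implementation):</p>

```csharp
using System;

// count how many keys fall into each of the 16 buckets for one 4-bit group
int[] CountGroups(uint[] keys, int shift)
{
    var buckets = new int[16];
    foreach (var key in keys)
        buckets[(key >> shift) & 0xF]++;
    return buckets;
}

var sample = new uint[] { 0x12, 0x15, 0x22, 0x2F };
var groupCounts = CountGroups(sample, 4); // group on bits 4..7
Console.WriteLine(groupCounts[1]);        // 2: 0x12 and 0x15
Console.WriteLine(groupCounts[2]);        // 2: 0x22 and 0x2F
```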
<p>Once we have an implementation, our code basically consists of preparing the bit-sortable keys just like we did before, then simply invoking the algorithm, passing in our reusable workspace:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Helpers.RadixSort(sortKeys, index, keysWorkspace, valuesWorkspace, r);
</code></pre></div></div>
<p>(where <code class="highlighter-rouge">keysWorkspace</code> and <code class="highlighter-rouge">valuesWorkspace</code> are scratch areas of the required size, shared between sort cycles).</p>
<p>One consideration here is: what value of <code class="highlighter-rouge">r</code> (the number of bits to consider at a time) to choose. <code class="highlighter-rouge">4</code> is a reasonably safe default, but you can experiment with different values for your data to see what works well.</p>
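<p>The underlying trade-off for our 64-bit keys: a larger <code class="highlighter-rouge">r</code> means fewer passes over the data, but a bigger count table per pass (plausibly why very large <code class="highlighter-rouge">r</code> stops paying off - a 65,536-entry table is much less cache-friendly; that last part is my speculation, not measured here). A quick back-of-envelope sketch:</p>

```csharp
using System;

foreach (int r in new[] { 2, 4, 8, 16 })
{
    int passes = (64 + r - 1) / r; // full passes over 64-bit keys
    int buckets = 1 << r;          // count-table entries per pass
    Console.WriteLine($"r={r}: {passes} passes, {buckets} buckets");
}
// r=2: 32 passes, 4 buckets
// r=4: 16 passes, 16 buckets
// r=8: 8 passes, 256 buckets
// r=16: 4 passes, 65536 buckets
```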
<p>I get:</p>
<ul>
<li>r=2: 3800ms</li>
<li>r=4: 1900ms</li>
<li>r=8: 1200ms</li>
<li>r=16: 2113ms</li>
</ul>
<p>That r=8 result is very tempting: it is a significant improvement on our previous best.</p>
<h2 id="attempt-10-radix-sort-with-parellelization">Attempt 10: radix sort with parallelization<a class="anchorjs-link " href="#attempt-10-radix-sort-with-parellelization" aria-label="Anchor link for: attempt 10 radix sort with parallelization" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>Remember the “<strong>that will be relevant in a moment</strong>” from a few paragraphs ago? Recall: a key point of radix sort is that for each group of bits (of size <code class="highlighter-rouge">r</code>), it needs to iterate the entire key-set to count the frequencies of each possible group value. This count operation is embarrassingly parallelizable, since separate chunks of the data can be counted independently.</p>
<p>To do that, we can create a number of workers, divide the key-space into that many chunks, and tell each worker to perform the counts <em>for that chunk</em>. Fire these workers in parallel via <code class="highlighter-rouge">Parallel.Invoke</code> or similar, and reap the rewards. This creates a slight complexity: we need to <em>combine</em> the per-worker counts, and there will be thread races. A naive but thread-safe implementation would be to use <code class="highlighter-rouge">Interlocked.Increment</code> to do all the counts, but that would have severe collision penalties - it is far preferable to count each chunk in complete isolation, and only worry about the combination at the end. At that point, either <code class="highlighter-rouge">lock</code> or <code class="highlighter-rouge">Interlocked</code> would be fine, as it is going to happen very rarely. We should also be careful to hoist everything we want into a local, to avoid a lot of <code class="highlighter-rouge">ldarg.0</code>, <code class="highlighter-rouge">ldfld</code> overhead:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public void Invoke()
{
    var len = Length;
    var mask = Mask;
    var keys = Keys + Offset;
    var shift = Shift;
    int* count = stackalloc int[CountLength];
    // count into a local buffer
    for (int i = 0; i < len; i++)
        count[(*keys++ >> shift) & mask]++;
    // now update the origin data, synchronized
    lock (SyncLock)
    {
        for (int i = 0; i < CountLength; i++)
            Counts[i] += count[i];
    }
}
</code></pre></div></div>
<p>Here we’re also using <code class="highlighter-rouge">stackalloc</code> to do all our counting in the stack space, rather than allocating a count buffer per worker. This is fine, since we’ll typically be dealing with values like <code class="highlighter-rouge">r=4</code> (<code class="highlighter-rouge">CountLength=16</code>). Even for larger <em>reasonable</em> <code class="highlighter-rouge">r</code>, the stack space is fine. We could very reasonably put an upper bound on <code class="highlighter-rouge">r</code> of <code class="highlighter-rouge">16</code> if we wanted to be sure.</p>
<p>Our calling code is virtually identical - all we’re doing is changing the internal implementation:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Helpers.RadixSortParallel(sortKeys, index, keysWorkspace, valuesWorkspace, r);
</code></pre></div></div>
<p>So what does this do for performance? Note: I’m using <code class="highlighter-rouge">Environment.ProcessorCount * 2</code> workers, but we could play with other values.</p>
<p>I get</p>
<ul>
<li>r=2: 3600ms</li>
<li>r=4: 1800ms</li>
<li>r=8: 1200ms</li>
<li>r=16: 2000ms</li>
</ul>
<p>So; we don’t get a <em>vast</em> improvement really - our key benefit comes from simply choosing a suitable <code class="highlighter-rouge">r</code> for our data, like <code class="highlighter-rouge">r=8</code>.</p>
<h2 id="throws-down-gauntlet">Throws down gauntlet<a class="anchorjs-link " href="#throws-down-gauntlet" aria-label="Anchor link for: throws down gauntlet" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>So; so far we’ve gone from 17s (LINQ) to 1.2s (radix sort, single-threaded or parallel). What more can we do? Can we parallelize the second half of radix sort? Can we try a completely different sort? Can we combine our index and keys so we are performing a single array sort? Can we make use of some obscure CPU instructions to perform 128-bit (or wider) operations to combine our existing 64-bit key and 32-bit value? Vectorize a key part of one of the existing algorithms with SIMD?</p>
<p>If you have more ideas, please feel free to fork and PR <a href="https://github.com/mgravell/SortOfProblem">from here</a>.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-11790635244400250332018-01-19T16:50:00.002-08:002018-01-30T06:18:56.119-08:00A Sort Of Problem<p>(<a href="http://blog.marcgravell.com/2018/01/more-of-sort-of-problem.html">part 2 here</a>)</p>
<p>(<a href="http://blog.marcgravell.com/2018/01/sorting-myself-out-extreme-edition.html">part 3 here</a>)</p>
<p>I <em>love</em> interesting questions, especially when they directly relate to things I need to do. A great question came up on Stack Overflow today about <a href="https://stackoverflow.com/q/48345753/23354">how to efficiently sort large data</a>. I gave an answer, but there’s <em>so much more</em> we can say on the topic, so I thought I’d turn it into a blog entry, exploring pragmatic ways to improve sort performance when dealing with non-trivial amounts of data. In particular, this is remarkably similar to time I’ve spent trying to make our “tag engine” faster.</p>
<h2 id="the-problem">The problem<a class="anchorjs-link " href="#the-problem" aria-label="Anchor link for: the problem" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>So, the premise is this:</p>
<ul>
<li>we have a complex entity, <code class="highlighter-rouge">SomeType</code>, with multiple properties</li>
<li>we have a large number of these entities - let’s say 16M+</li>
<li>we want to sort this data using a sort that considers multiple properties - “this, then that”</li>
<li>and we want it to be fast</li>
</ul>
<p>Note that sorting data when it is already sorted or nearly-sorted is <em>usually</em> cheap under most common algorithms, so I’m going to be focusing only on the initial painful sort when the data is not at all sorted.</p>
<p>Because we’re going to have so many of them, and they are going to be basic storage types only, this is a good scenario to consider a <code class="highlighter-rouge">struct</code>, and I was delighted to see that the OP in the question had already done this. We’ll play with a few of the properties (for sorting, etc), but to simulate the usual context, there will be extra stuff that isn’t relevant to the question, so we’ll pad the size of the struct with some dummy fields up to 64 bytes. So, something like:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>readonly partial struct SomeType
{
    public int Id { get; }
    public DateTime ReleaseDate { get; }
    public double Price { get; }
    public SomeType(int id, DateTime releaseDate, double price)
    {
        Id = id;
        ReleaseDate = releaseDate;
        Price = price;
        _some = _other = _stuff = _not = _shown = 0;
    }
#pragma warning disable CS0414 // suppress "assigned, never used"
    private readonly long _some, _other, _stuff, _not, _shown;
#pragma warning restore CS0414
}
</code></pre></div></div>
<p>Note: yes, I know that <code class="highlighter-rouge">double</code> is a terrible choice for something that describes money.</p>
<p>Note: <code class="highlighter-rouge">readonly struct</code> is a new C# feature, <a href="https://blogs.msdn.microsoft.com/mazhou/2017/11/21/c-7-series-part-6-read-only-structs/">described in more detail here</a> - this is a good fit, and might help us avoid some large “load” costs.</p>
<p>For something interesting to do, we’ll try sorting things “most recent, then cheapest”.</p>
<h2 id="inventing-some-data">Inventing some data<a class="anchorjs-link " href="#inventing-some-data" aria-label="Anchor link for: inventing some data" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>The first thing we need is some data; a very basic seeded random data script might be something like:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>var rand = new Random(data.Length);
for (int i = 0; i < data.Length; i++)
{
    int id = rand.Next();
    var releaseDate = Epoch
        .AddYears(rand.Next(50))
        .AddDays(rand.Next(365))
        .AddSeconds(rand.Next(24 * 60 * 60));
    var price = rand.NextDouble() * 50000;
    data[i] = new SomeType(
        id, releaseDate, price);
}
</code></pre></div></div>
<h2 id="attempt-1-linq">Attempt 1: LINQ<a class="anchorjs-link " href="#attempt-1-linq" aria-label="Anchor link for: attempt 1 linq" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>LINQ is great; I love LINQ, and it makes some code very expressive. So let’s try the most obvious thing first:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sorted = (from item in data
          orderby item.ReleaseDate descending,
                  item.Price
          select item).ToArray();
</code></pre></div></div>
<p>This LINQ expression performs the sort we want, creating a copy of the data - but on my machine this takes about 17 seconds to run - not ideal. So that’s the target to beat. The key thing about LINQ is that it is designed for <em>your</em> efficiency, i.e. the size and complexity of the code that you need to write, on the assumption that you’ll only use it on reasonable data. We do not have reasonable data here.</p>
<h2 id="attempt-2-icomparablet">Attempt 2: <code class="highlighter-rouge">IComparable<T></code><a class="anchorjs-link " href="#attempt-2-icomparablet" aria-label="Anchor link for: attempt 2 icomparablet" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>Since we’re talking about arrays, another obvious thing to do is <code class="highlighter-rouge">Array.Sort</code>; for the simplest version of that, we need to implement <code class="highlighter-rouge">IComparable<T></code> on our type:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>partial struct SomeType : IComparable<SomeType>
{
    int IComparable<SomeType>.CompareTo(SomeType other)
    {
        var delta = other.ReleaseDate
            .CompareTo(this.ReleaseDate);
        if (delta == 0) // second property
            delta = this.Price.CompareTo(other.Price);
        return delta;
    }
}
</code></pre></div></div>
<p>And then we can use:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Array.Sort<SomeType>(data);
</code></pre></div></div>
<p>to perform an in-place sort. The generic <code class="highlighter-rouge"><SomeType></code> here is actually redundant, but I’ve included it to make it obvious that I am using the generic API.</p>
<p>This takes just over 6 seconds for me, so: a huge improvement! Note that for the purpose of our tests, we will re-populate the data after this, to ensure that all tests start with randomized data. For brevity, assume we’re doing this whenever necessary - I won’t keep calling it out.</p>
<h2 id="attempt-3-icomparert">Attempt 3: <code class="highlighter-rouge">IComparer<T></code><a class="anchorjs-link " href="#attempt-3-icomparert" aria-label="Anchor link for: attempt 3 icomparert" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>There’s a second common sort API: <code class="highlighter-rouge">IComparer<T></code> custom comparers. This has the advantages that a: you don’t need to edit the target type, and b: you can support multiple different sorts against the same type via different custom comparers. For this, we add our own comparer:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sealed class SomeTypeComparer : IComparer<SomeType>
{
    private SomeTypeComparer() { }
    public static SomeTypeComparer Default { get; } = new SomeTypeComparer();
    int IComparer<SomeType>.Compare(SomeType x, SomeType y)
    {
        var delta = y.ReleaseDate
            .CompareTo(x.ReleaseDate);
        if (delta == 0) // second property
            delta = x.Price.CompareTo(y.Price);
        return delta;
    }
}
</code></pre></div></div>
<p>using it via:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Array.Sort<SomeType>(data, SomeTypeComparer.Default);
</code></pre></div></div>
<p>This takes around 8 seconds; we’re not going in the right direction here!</p>
<h2 id="attempt-4-comparisont">Attempt 4: <code class="highlighter-rouge">Comparison<T></code><a class="anchorjs-link " href="#attempt-4-comparisont" aria-label="Anchor link for: attempt 4 comparisont" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>Why stop at two ways to do the same thing, when we can have 3? For completeness, there’s yet another primary <code class="highlighter-rouge">Array.Sort<T></code> variant that takes a delegate, for example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Array.Sort<SomeType>(data, (x, y) =>
{
    var delta = y.ReleaseDate
        .CompareTo(x.ReleaseDate);
    if (delta == 0) // second property
        delta = x.Price.CompareTo(y.Price);
    return delta;
});
</code></pre></div></div>
<p>This keeps the “do a sort” and “like this” code all in the same place, which is nice; but: it performs virtually identically to the previous attempt, at around 8 seconds.</p>
<h1 id="first-intermission-what-is-going-wrong">First intermission: what is going wrong?</h1>
<p>We’re doing a lot of work here, that much is true; but there are things that are exacerbating the situation:</p>
<ul>
<li>we have a large struct, which means we need to copy that data on the stack whenever we do anything</li>
<li>because it needs to compare values to their neighbours, there are a <em>lot</em> of virtual calls going on</li>
</ul>
<p>These costs are fine for reasonable data, but for larger volumes the costs start building up.</p>
<p>We need an alternative.</p>
<p>It happens that <code class="highlighter-rouge">Array.Sort</code> also has overloads that accept <em>two</em> arrays - the keys and the values. What this does is: perform the sort logic on the <em>first</em> array, but whenever it swaps data around: it swaps the corresponding items in <strong>both</strong> arrays. This has the effect of sorting the <em>second</em> array by the values of the <em>first</em>. In visual terms, it is like selecting two columns of a spreadsheet and clicking sort.</p>
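<p>A tiny standalone illustration of that keys/values overload, with toy data rather than our actual types:</p>

```csharp
using System;

int[] keys = { 3, 1, 2 };
string[] values = { "c", "a", "b" };

// sorts 'keys', moving the matching 'values' entries in step
Array.Sort(keys, values);

Console.WriteLine(string.Join(",", values)); // a,b,c
```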
<p>If we only had a single value, this would be great! For example…</p>
<h2 id="attempt-5-dual-arrays-single-property">Attempt 5: dual arrays, single property<a class="anchorjs-link " href="#attempt-5-dual-arrays-single-property" aria-label="Anchor link for: attempt 5 dual arrays single property" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>Let’s pretend for a moment that we only want to sort by the date, in ascending order. Which isn’t what we want, but: humour me.</p>
<p>What we <em>could</em> do is keep a <code class="highlighter-rouge">DateTime[]</code> hanging around (reuse it between operations), and when we want to sort: populate the data we want to sort by into this array:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for (int i = 0; i < data.Length; i++)
    releaseDates[i] = data[i].ReleaseDate;
</code></pre></div></div>
<p>and then to sort:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Array.Sort(releaseDates, data);
</code></pre></div></div>
<p>For me, this takes about 150ms to prepare the keys, and 4.5s to execute the sort. Promising, although hard to tell if that is useful until we can handle the complex dual sort.</p>
<h1 id="second-intermission-how-can-we-compose-the-sort">Second intermission: how can we compose the sort?</h1>
<p>We have two properties that we want to sort by, and a <code class="highlighter-rouge">Sort</code> method that only takes a single value. We could start looking at tuple types, but that is just making things even more complex. What we want is a way to <em>simplify</em> the complex sort into a single value. What if we could use something simple like an integer to represent our combined sort? Well, we can!</p>
<p>Many basic values can - either directly, or via a hack - be treated as a bitwise-sortable value. By bitwise sortable, I essentially mean: “sorts like the same bits expressed as an unsigned integer would sort”. Consider a 32-bit integer: obviously an unsigned integer sorts <em>just like</em> an unsigned integer, but a <em>signed</em> integer does not - negative numbers are problematic. What would be great is if <code class="highlighter-rouge">int.MinValue</code> was treated as 0, with <code class="highlighter-rouge">int.MinValue + 1</code> treated as 1, etc; we can do that by subtraction:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>protected static ulong Sortable(int value)
{
    // re-base everything upwards, so anything
    // that was the min-value is now 0, etc
    var val = unchecked((uint)(value - int.MinValue));
    return val;
}
</code></pre></div></div>
<p>The result of this is that <code class="highlighter-rouge">Sortable</code> will return 32-bits worth of data (the same as the input), but with <code class="highlighter-rouge">000...000</code> as the minimum expected value, and <code class="highlighter-rouge">111...111</code> as the maximum expected value.</p>
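<p>A quick sanity check that the re-based values preserve the original ordering:</p>

```csharp
using System;

static ulong Sortable(int value)
    => unchecked((uint)(value - int.MinValue));

// the minimum signed value maps to 0, the maximum to all-ones,
// and relative order is preserved across the sign boundary
Console.WriteLine(Sortable(int.MinValue)); // 0
Console.WriteLine(Sortable(-1) < Sortable(0)); // True
Console.WriteLine(Sortable(int.MaxValue) == uint.MaxValue); // True
```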
<p>Now; notice that we’re only talking about 32 bits here, but we’ve returned a <code class="highlighter-rouge">ulong</code>; that’s because we’re going to pack <em>2</em> values into a single token.</p>
<p>For our <em>actual</em> data, we have two pieces:</p>
<ul>
<li>a <code class="highlighter-rouge">DateTime</code></li>
<li>a <code class="highlighter-rouge">Double</code></li>
</ul>
<p>Now, that’s 16 bytes worth of data, and we only have 8 to play with. This sounds like a dilemma, but <em>usually</em>: we can cheat by fudging the precision.</p>
<p>For many common applications - and especially things like a <code class="highlighter-rouge">ReleaseDate</code>, most of the bits in a <code class="highlighter-rouge">DateTime</code> are not useful. We probably don’t need to handle every tick in a 10,000-year range. We can <em>almost certainly</em> use per-second precision - perhaps even per-<em>day</em> for a release date. Unix time in seconds using 32 bits has us covered until January 19, 2038. If we need <em>less precision than seconds</em>, we can extend that hugely; and we can often use a different epoch that fits our minimum expected data. Heck, starting at the year 2000 instead of 1970 buys 30 years even in per-second precision. Time in an epoch is bitwise-sortable.</p>
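<p>The code below uses a <code class="highlighter-rouge">ToMillenialTime()</code> helper without showing its body; a plausible sketch (my assumption: per-second precision from a 2000-01-01 epoch, shown here as a plain static method rather than the extension method the calling code implies) might be:</p>

```csharp
using System;

// hypothetical sketch of ToMillenialTime: whole seconds elapsed
// since a 2000-01-01 epoch, truncated into 32 bits - bitwise
// sortable for any date from 2000 onwards
static uint ToMillenialTime(DateTime value)
{
    var epoch = new DateTime(2000, 1, 1);
    return (uint)((value - epoch).Ticks / TimeSpan.TicksPerSecond);
}

Console.WriteLine(ToMillenialTime(new DateTime(2000, 1, 2))); // 86400
```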
<p>Likewise, an obvious way of approximating a <code class="highlighter-rouge">double</code> in 32 bits would be to cast it as a <code class="highlighter-rouge">float</code>. This doesn’t have the same range or precision, but <em>will usually be just fine</em> for sorting purposes. Floating point data in .NET <a href="https://en.wikipedia.org/wiki/IEEE_754">has a complex internal structure</a>, but fortunately making it bitwise-sortable can be achieved through some simple well-known bit hacks:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>protected static unsafe ulong Sortable(float value)
{
    const int MSB = 1 << 31;
    int raw = *(int*)(&value);
    if ((raw & MSB) != 0) // IEEE first bit is the sign bit
    {
        // is negative; should interpret as -(the value without the MSB) - not the same as just
        // dropping the bit, since integer math is twos-complement
        raw = -(raw & ~MSB);
    }
    return Sortable(raw);
}
</code></pre></div></div>
<p>Putting these together, we have all the tools we need to create a single composite value that is <strong>totally meaningless</strong> for all ordinary purposes, but which represents our sort perfectly.</p>
<h2 id="attempt-6-dual-arrays-dual-property">Attempt 6: dual arrays, dual property<a class="anchorjs-link " href="#attempt-6-dual-arrays-dual-property" aria-label="Anchor link for: attempt 6 dual arrays dual property" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>We can create a method that composes our two properties:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>static ulong Sortable(in SomeType item)
{
    return (~(ulong)item.ReleaseDate.ToMillenialTime()) << 32
        | Sortable((float)item.Price);
}
</code></pre></div></div>
<p>This might look complex, but what it does is:</p>
<ul>
<li>compute the time in seconds since 2000 as a 32-bit unsigned integer</li>
<li>extend it to 64 bits</li>
<li><em>invert it</em>; this has the same effect as “descending”, since it reverses the order</li>
<li>left-shift it by 32 bits, to place those 32 bits in the <strong>upper</strong> half of our 64 bits (padding on the right with zero)</li>
<li>compute the bitwise-sortable representation of the price as a 32-bit unsigned integer</li>
<li>throw that value into the lower 32 bits</li>
</ul>
<p>We can prepare our data into a <code class="highlighter-rouge">ulong[]</code> that we keep around between sort operations:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for (int i = 0; i < data.Length; i++)
    sortKeys[i] = Sortable(in data[i]);
</code></pre></div></div>
<p>and finally sort:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Array.Sort(sortKeys, data);
</code></pre></div></div>
<p>The prepare operation is more complex now - and has gone up to 300ms, but the sort is <em>faster</em> at just over 4 seconds. We’re moving in the right direction. Note that the prepare operation is <a href="https://en.wikipedia.org/wiki/Embarrassingly_parallel">embarrassingly parallelizable</a>, so we can trivially divide that over a number of cores (say: 16 blocks of 1M records per block) - and can often be further reduced by storing the data in the struct in similar terms to the sortable version (so: the same representation of time, and the same floating-point scale) - thus I’m not going to worry about the prepare cost here.</p>
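<p>For illustration, dividing the prepare over cores is as simple as a <code class="highlighter-rouge">Parallel.For</code> (toy stand-in data shown; in the real code the loop body would be the <code class="highlighter-rouge">Sortable(in data[i])</code> call):</p>

```csharp
using System;
using System.Threading.Tasks;

var data = new double[1_000_000];      // stand-in for SomeType[]
var sortKeys = new ulong[data.Length];

// each element is independent, so the runtime can chunk
// the range across cores however it likes
Parallel.For(0, data.Length, i =>
{
    sortKeys[i] = (ulong)i; // stand-in for Sortable(in data[i])
});

Console.WriteLine(sortKeys[999_999]); // 999999
```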
<p>But we’re still paying a lot of overhead from having to move around those big structs. We can avoid that by… just not doing that!</p>
<h2 id="attempt-7-indexed">Attempt 7: indexed<a class="anchorjs-link " href="#attempt-7-indexed" aria-label="Anchor link for: attempt 7 indexed" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>Rather than sorting our <code class="highlighter-rouge">SomeType[]</code> array, we could instead <em>leave that data alone</em>, forever. Never move the items around (although it is usually fine to replace them with updates). This has multiple advantages, but the one we’re keen on is the reduction of cost copying the data.</p>
<p>So; we can declare an <code class="highlighter-rouge">int[] index</code> that is our <em>index</em> - it just tells us the offsets to look in the <em>actual</em> data. We can sort that index <em>as though</em> it were the actual data, and just make sure we go through the index. We need to initialize the index as well as the composite sortable value (although when re-sorting the same data we can re-use the positions and <em>do not</em> need to reset the index; the data usually doesn’t move much between cycles, so we’ll get another huge boost on re-sorts when the data hasn’t drifted much):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for (int i = 0; i < data.Length; i++)
{
    index[i] = i;
    sortKeys[i] = Sortable(in data[i]);
}
</code></pre></div></div>
<p>and sort:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Array.Sort(sortKeys, index);
</code></pre></div></div>
<p>The only complication is that now, to access the sorted data - instead of looking at <code class="highlighter-rouge">data[i]</code> we need to look at <code class="highlighter-rouge">data[index[i]]</code>, i.e. find the i’th item in the index, and use <em>that value</em> as the offset in the actual data.</p>
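<p>A toy demonstration of reading through the index, with simple values standing in for our structs:</p>

```csharp
using System;

double[] data = { 3.5, 1.25, 2.0 };   // never moves
ulong[] sortKeys = { 3, 1, 2 };       // composite keys, one per item
int[] index = { 0, 1, 2 };

Array.Sort(sortKeys, index);          // index becomes { 1, 2, 0 }

// the i'th item in sorted order is data[index[i]]
for (int i = 0; i < index.Length; i++)
    Console.WriteLine(data[index[i]]); // 1.25, 2, 3.5
```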
<p>This takes the time down to 3 seconds - we’re getting there.</p>
<h2 id="attempt-8-indexed-direct-compare-no-range-checks">Attempt 8: indexed, direct compare, no range checks<a class="anchorjs-link " href="#attempt-8-indexed-direct-compare-no-range-checks" aria-label="Anchor link for: attempt 8 indexed direct compare no range checks" data-anchorjs-icon="" style="font-style: normal; font-variant: normal; font-weight: normal; font-stretch: normal; font-size: 1em; line-height: 1; font-family: anchorjs-icons; padding-left: 0.375em;"></a></h2>
<p>The introspective sort that <code class="highlighter-rouge">Array.Sort</code> does is great, but it is still going to be talking via the general <code class="highlighter-rouge">CompareTo</code> API on our key type (<code class="highlighter-rouge">ulong</code>), and using the array indexers extensively. The JIT in .NET is good, but we can help it out a <em>little bit</em> more by … “borrowing” (ahem) the <a href="https://github.com/dotnet/coreclr/blob/775003a4c72f0acc37eab84628fcef541533ba4e/src/mscorlib/src/System/Collections/Generic/ArraySortHelper.cs"><code class="highlighter-rouge">IntroSort</code> code</a>, and:</p>
<ul>
<li>replacing the <code class="highlighter-rouge">CompareTo</code> usage on the keys with direct integer operations</li>
<li>replacing the array access with <code class="highlighter-rouge">unsafe</code> code that uses <code class="highlighter-rouge">ulong*</code> (for the keys) and <code class="highlighter-rouge">int*</code> (for the index)</li>
</ul>
<p>(as an aside, it’ll be interesting to see how this behaves with <code class="highlighter-rouge">Span<T></code>, but that’s not complete yet).</p>
<p>I’m not going to show the implementation here, but it is available in the source project. Our code for consuming this doesn’t change much, except to call our butchered sort API:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Helpers.Sort(sortKeys, index);
</code></pre></div></div>
<p>For me this now takes just over 2.5 seconds.</p>
<h1 id="conclusion">Conclusion</h1>
<p>So there we go; I’ve explored some common approaches to improving sort performance; we’ve looked at LINQ; we’ve looked at basic sorts using comparables, comparers and comparisons (which are all different, <em>obviously</em>); we’ve looked at keyed dual-array sorts; we’ve looked at <em>indexed</em> sorts (where the source data remains unsorted); and finally we’ve hacked the introspective sort to squeeze a tiny bit more from it.</p>
<p>We’ve seen performance range from 17 seconds for LINQ, 8 seconds for the 3 basic sort APIs, then 4 seconds for our dual array sorts, 3 seconds for the indexed sort, and finally 2.5 seconds with our hacked and debased version.</p>
<p>Not bad for a night’s work!</p>
<p>All the code discussed here is <a href="https://github.com/mgravell/SortOfProblem">available on github</a>.</p>
<p>(<a href="http://blog.marcgravell.com/2018/01/more-of-sort-of-problem.html">part 2 here</a>)</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-50616482326677166432017-12-06T02:47:00.004-08:002017-12-06T08:24:38.392-08:00Dapper, Prepared Statements, and Car Tyres<h2><a id="why-doesnt-dapper-use-prepared-statements" class="anchor" href="#why-doesnt-dapper-use-prepared-statements" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Why Doesn't Dapper Use Prepared Statements?</h2>
<p>I had a very interesting email in my inbox this week from a <a href="https://www.nuget.org/packages/Dapper/">Dapper</a> user; I'm not going to duplicate the email here, but it can be boiled down to:</p>
<blockquote>
<p>My external security consultant is telling me that Dapper is insecure because it doesn't use prepared statements, and is therefore susceptible to SQL injection. What are your thoughts on this?</p>
</blockquote>
<p>with a Dapper-specific example of something comparable to:</p>
<pre><code>List<Order> GetOpenOrders(int customerId) => _connection.Query<Order>(
    "select * from Orders where CustomerId=@customerId and Status=@Open",
    new { customerId, OrderStatus.Open }).AsList();
</code></pre>
<p>Now this is a fun topic for me, because in my head I'm reading it in the same way that I would read:</p>
<blockquote>
<p>My car mechanic is telling me my car is dangerous because it doesn't use anti-smear formula screen-wash, and is therefore susceptible to skidding in icy weather. What are your thoughts on this?</p>
</blockquote>
<p>Basically, these are two completely unrelated topics. You can have a perfectly good and constructive conversation about either in isolation. There are merits to both discussions. But when you smash them together, it might suggest that the person raising the issue (the "security consultant" in this case, not the person sending me the email) has misunderstood something fundamental.</p>
<p>My initial response - while in my opinion valid - probably needs to be expanded upon:</p>
<p><a href="https://twitter.com/marcgravell/status/938128960474492928"><img src="https://pbs.twimg.com/media/DQTnTo4W4AEA7Gz.jpg" alt="Hi! No problem at all. If your security consultant is telling you that a correctly parameterized SQL query is prone to SQL injection, then your security consultant is a fucking idiot with no clue what they're talking about, and you can quote me on that." style="max-width:100%;"></a></p>
<p>So; let's take this moment to discuss the two topics and try to put this beast to bed!</p>
<h2><a id="part-the-first-what-is-sql-injection" class="anchor" href="#part-the-first-what-is-sql-injection" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Part The First: What is SQL injection?</h2>
<p>Most folks will be very familiar with this, so I'm not going to cover every nuance, but: SQL injection is the <strong>major</strong> class of error caused by concatenating inputs directly into SQL strings. It could be typified by the bad example:</p>
<pre><code>string customerName = customerNameTextBox.Value; // or a http request input; whatever
var badOptionOne = connection.Query<Customer>(
"select * from Customers where Name='" + customerName + "'");
var badOptionTwo = connection.Query<Customer>(
string.Format("select * from Customers where Name='{0}'", customerName));
var badOptionThree = connection.Query<Customer>(
$"select * from Customers where Name='{customerName}'");
</code></pre>
<p>As an aside on <code>badOptionThree</code>, it <em>really</em> frustrates me that C# overloading prefers <code>string</code> to <code>FormattableString</code> (interpolated <code>$"..."</code> strings can be assigned to either, but only <code>FormattableString</code> retains the semantics). I would really have loved to be able to add a method to Dapper like:</p>
<pre><code>[Obsolete("No! Bad developer! Bobby Tables will find you in the dark", error: true)]
public static IEnumerable<T> Query<T>(FormattableString query, /* other args not shown */)
=> throw new InvalidOperationException(...);
</code></pre>
<p>This category of coding error is perpetually on the <a href="https://www.owasp.org/index.php/Top_10_2017-A1-Injection">OWASP "top 10" list</a>, and is now infamously associated with <a href="https://xkcd.com/327/">xkcd's "Bobby Tables"</a>:</p>
<p><a href="https://xkcd.com/327/"><img src="https://imgs.xkcd.com/comics/exploits_of_a_mom.png" alt="Did you really name your son Robert'); DROP TABLE Students;-- ?" style="max-width:100%;"></a></p>
<p>The problem, as the cartoon shows us, is that this allows malicious input to do unexpected and dangerous things. <em>In this case</em> the hack was to use a quote to end a SQL literal (<code>');...</code> - in this case with the attacker guessing that the clause was inside parentheses), then issue a separate command (<code>DROP TABLE ...</code>), then discard anything at the end of the original query using a comment (<code>-- ...</code>). But the issue is not limited to quotes, and frankly any attempt to play "replace/escape the risky tokens" is an arms race where you need to win every time, but the attacker only needs to win once. Don't play that game.</p>
<p>It can also be a huge internationalization problem, familiar to every developer who has received bug reports about the search not working for some people of Irish or Scottish descent. This (SQL injection - not Irish or Scottish people) is such an exploitable problem that readily available tools exist that can <em>trivially</em> search a site for exploitable inputs and give free access to the database with a friendly UI. So... yeah, you really don't want to have SQL injection bugs. No argument there.</p>
<h2><a id="so-how-do-we-prevent-sql-injection" class="anchor" href="#so-how-do-we-prevent-sql-injection" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>So how do we prevent SQL injection?</h2>
<p>The solution to SQL injection is <em>parameters</em>. One complaint I have about the xkcd comic - and a range of other discussions on the topic - is the suggestion that you should "sanitize" your inputs to prevent SQL injection. Nope! Wrong. You "sanitize" your inputs to check that they are <em>within what your logic allows</em> - for example, if the only permitted options from a drop-down are 1, 2 and 3 - then you might want to check that they haven't entered 42. Sanitizing the inputs <em>is not</em> the right solution to SQL injection: parameters are. We already showed an example of parameters in my SQL example at the top, but to take our search example:</p>
<pre><code>string customerName = customerNameTextBox.Value; // or a http request input; whatever
var customers = connection.Query<Customer>(
"select * from Customers where Name=@customerName",
new { customerName });
</code></pre>
<p>What <em>this</em> does is add a <em>parameter</em> called "customerName" with the chosen value, passing that <strong>alongside and separate to</strong> the command text, in a raw form that doesn't need it to be encoded to work inside a SQL string. At no point does the parameter value get written into the SQL as a literal. Well, except perhaps on some DB backends that don't support parameters at all, in which case frankly it is up to the DB provider to get the handling right (and: virtually all RDBMS have first-class support for parameters).</p>
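<p>To make the "alongside and separate" point concrete, here's roughly what that parameterized call boils down to in raw ADO.NET (a sketch using the provider-agnostic <code>DbCommand</code> API; Dapper's actual generated code is more sophisticated):</p>
<pre><code>using (var cmd = connection.CreateCommand())
{
    cmd.CommandText = "select * from Customers where Name=@customerName";
    var p = cmd.CreateParameter();
    p.ParameterName = "@customerName";
    p.Value = customerName;
    cmd.Parameters.Add(p);
    // the command text and the parameter values travel separately;
    // the value is never spliced into the SQL as a literal
    using (var reader = cmd.ExecuteReader()) { /* materialize rows */ }
}
</code></pre>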
<p>Note that parameters solve other problems too:</p>
<ul>
<li>the formatting of things like dates and numbers: if you use injection you need to know the format that the RDBMS expects, which is usually not the format that the "current culture" is going to specify, making it awkward; but by using a parameter, the value <em>doesn't need to be formatted as text at all</em> - with things like numbers and dates usually being sent in a raw binary format - some-defined-endian-fixed-width for integers (including dates), or something like IEEE754 for floating point.</li>
<li>query-plan re-use: the RDBMS can cache our <code>...Name=@customerName</code> query and re-use the same plan automatically and trivially (without saturating the cache with a different plan for every unique name searched), with different values of the parameter <code>@customerName</code> - this can provide a great performance boost (side note: this can be double-edged, so you should <em>also</em> probably learn about <a href="https://blogs.msdn.microsoft.com/sqlprogrammability/2008/11/26/optimize-for-unknown-a-little-known-sql-server-2008-feature/"><code>OPTIMIZE FOR ... UNKNOWN</code></a> (or the equivalent on your chosen RDBMS) if you're serious about SQL performance - note this should only be added <em>reactively</em> based on actual performance investigations)</li>
</ul>
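<p>To illustrate the first point: with a parameter, a <code>DateTime</code> never needs to become culture-sensitive text at all (a sketch; <code>OrderDate</code> and <code>cutoff</code> are illustrative names):</p>
<pre><code>// bad: the text produced here depends on the current culture,
// which is usually not what the RDBMS expects
var risky = connection.Query<Order>(
    "select * from Orders where OrderDate > '" + cutoff.ToString() + "'");

// good: the provider sends the value in a raw binary form; no formatting involved,
// and the query plan can be re-used for any cutoff value
var safe = connection.Query<Order>(
    "select * from Orders where OrderDate > @cutoff",
    new { cutoff });
</code></pre>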
<h2><a id="dapper-loves-parameters" class="anchor" href="#dapper-loves-parameters" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Dapper loves parameters</h2>
<p>Parameterization is great; Dapper loves parameterization, and does <em>everything it can</em> to make it easy for you to parameterize your queries. So: whatever criticism you want to throw at Dapper: SQL injection isn't really a fair one. The only time Dapper will be complicit in SQL injection is when you feed it a query that <em>already has</em> an injection bug before Dapper ever sees it. We can't fix stupid.</p>
<p>For full disclosure: there <em>is</em> actually one case where Dapper allows literal injection. Consider our <code>GetOpenOrders</code> query from above. This can <em>also</em> be written:</p>
<pre><code>List<Order> GetOpenOrders(int customerId) => _connection.Query<Order>(
"select * from Orders where CustomerId=@customerId and Status={=Open}",
new { customerId, OrderStatus.Open }).AsList();
</code></pre>
<p>Note that instead of <code>@Open</code> we're now using <code>{=Open}</code>. This is not SQL syntax - it is telling <em>Dapper</em> to do an injection of a literal value. This is intended for things <em>that don't change per query</em>, such as status codes - and can result <em>in some cases</em> in performance improvements. Dapper doesn't want to make it easy to blow your own feet off, so it <em>STRICTLY</em> only allows this for integers (including <code>enum</code> values, which are fundamentally integers), since integers: a) are very common for this scenario, and b) follow predictable rules as literals.</p>
<h2><a id="part-the-second-what-are-prepared-statements" class="anchor" href="#part-the-second-what-are-prepared-statements" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Part The Second: What are prepared statements?</h2>
<p>There's often a slight confusion here with "stored procedures", so we'll have to touch on that too...</p>
<p>It is pretty common to issue commands to a RDBMS, where the SQL for those commands is contained <em>in the calling application</em>. This isn't <em>universal</em> - some applications are written with all the SQL in "stored procedures" that are deployed separately to the server, so the only SQL in the application is the names to invoke. There are merits of both approaches, which might include discussions around:</p>
<ul>
<li>isolation - the ability to deploy and manage the SQL separately to the application (which might be desirable in client applications in particular, where re-deploying all the client installations to fix a small SQL bug is hard or expensive)</li>
<li>performance - <em>historically</em> stored procedures tended to out-perform ad-hoc commands; in most modern RDBMS this simply isn't a real concern, with the query-plan-cache working virtually identically regardless of the mechanism</li>
<li>granular security - in a high security application you might not want users (even if the "user" is a central app-server) to have direct <code>SELECT</code> permission on the tables or views - instead preferring to wrap the <em>allowed</em> queries in stored procedures that the calling user can be granted <code>EXEC</code> permission; of course a counter-argument there is that a blind <code>EXEC</code> can <em>hide</em> what a stored procedure is doing (so it does something the caller didn't expect), but ultimately if someone has pwned your RDBMS server, you're already toast</li>
<li>flexibility - being able to construct SQL to match a <em>specific scenario</em> (for example: the exact combination of 17 search options) can be important to improving performance (compared, in our search example, to 17 <code>and (@someArg is null or row.SomeCol=@someArg)</code> clauses). Tools like LINQ and ORMs rely extensively on runtime query generation to match the queries and model known at runtime, so allowing them to execute ad-hoc parameterized commands is required; it should also be noted that most RDBMS can <em>also</em> execute ad-hoc parameterized commands <em>from within SQL</em> - via things like <a href="https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-executesql-transact-sql"><code>sp_executesql</code></a> from <em>inside</em> a stored procedure</li>
</ul>
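<p>For completeness: parameterization works identically in both models. For example, Dapper will happily invoke a stored procedure (the procedure name here is illustrative) with exactly the same parameter handling as an ad-hoc command - just set the <code>commandType</code> (from <code>System.Data</code>):</p>
<pre><code>var openOrders = connection.Query<Order>(
    "GetOpenOrders",                          // stored procedure name
    new { customerId },                       // parameters, exactly as before
    commandType: CommandType.StoredProcedure
).AsList();
</code></pre>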
<p>You'll notice that SQL injection is not part of that discussion on the merits of "ad-hoc commands" vs "stored procedures", because parameterization <em><strong>makes it</strong></em> a non-topic.</p>
<p>So: let's assume that we've had the conversation about stored procedures and we've decided to use ad-hoc statements.</p>
<h2><a id="what-does-is-mean-to-prepare-a-statement" class="anchor" href="#what-does-is-mean-to-prepare-a-statement" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>What does it mean to "prepare" a statement?</h2>
<p>"Preparing a statement" is a sometimes-optional / sometimes-mandatory (depending on the RDBMS) step required to issue ad-hoc SQL commands. Conceptually, it takes our <code>"select * from Orders where CustomerId=@customerId and Status=@Open"</code> query - along with the defined parameters - and says "I'm going to want to run this in a moment; kindly figure out what that means to you and get everything in place". In terms of ADO.NET, this means calling the <a href="https://msdn.microsoft.com/en-us/library/system.data.common.dbcommand.prepare(v=vs.110).aspx"><code>DbCommand.Prepare()</code></a> method. There are 3 possible outcomes of a <code>Prepare()</code> call (ignoring errors):</p>
<ul>
<li>it does <em>literally nothing</em> - a no-op; this might commonly be the case if you've told it that you're running a stored procedure (it is already as prepared as it will ever be), or if your chosen RDBMS isn't interested in the concept of prepared statements</li>
<li>it runs an <em>additional optional</em> operation that it wouldn't have done otherwise - adding a round trip</li>
<li>it runs a <em>required</em> operation that it was <em>otherwise going to do automatically</em> when we executed the query</li>
</ul>
<p>So <em>on the surface</em>, the <strong>best case</strong> is that we achieve no benefit (the first and third options). The <strong>worst case</strong> is that we've added a round trip. You might be thinking "so why does <code>Prepare()</code> exist, if it is only ever harmful?" - and the reason is: I've only talked about running the operation <em>once</em>.</p>
<p>The main scenario in which <code>Prepare()</code> helps us is when you're going to be issuing <em>exactly the same command</em> (including the parameter definition, but not values), on <em>exactly the same connection</em>, <em>many many times</em>, and especially when your RDBMS <em>requires</em> command preparation. In that scenario, preparing a statement can be a very important <em>performance tweak</em>.</p>
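<p>In raw ADO.NET, that "many many times" scenario looks something like this sketch - one command, prepared once, executed repeatedly with different parameter values (table and variable names are illustrative):</p>
<pre><code>using (var cmd = connection.CreateCommand())
{
    cmd.CommandText = "insert into AuditLog(Message) values(@message)";
    var p = cmd.CreateParameter();
    p.ParameterName = "@message";
    p.DbType = DbType.String;
    p.Size = 4000; // some providers require an explicit type/size before Prepare()
    cmd.Parameters.Add(p);
    cmd.Prepare(); // one-off cost; pays for itself over the loop (on providers that care)
    foreach (var message in messages)
    {
        p.Value = message;
        cmd.ExecuteNonQuery(); // re-uses the prepared statement each time
    }
}
</code></pre>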
<p>You'll notice - similarly to stored procedures - that SQL injection is not part of that discussion on the merits of "prepared statements".</p>
<p>It is entirely true to say that Dapper does not currently call <code>Prepare()</code>.</p>
<h2><a id="why-doesnt-dapper-prepare-statements" class="anchor" href="#why-doesnt-dapper-prepare-statements" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Why doesn't Dapper <code>Prepare()</code> statements?</h2>
<p>There are various reasons for this, but the most important one is: on most providers, a prepared statement is scoped to the connection <em>and</em> is stored as part of the <code>DbCommand</code>. To <em>actually provide</em> a useful prepared statement story, Dapper would need to store and re-use every <code>DbCommand</code> for every <code>DbConnection</code>. Dapper <em>really, really</em> doesn't want to store your connections. It is designed with high concurrency in mind, and typically works in scenarios where the <code>DbConnection</code> is short-lived - perhaps scoped to the context of a single web-request. Note that connection pooling doesn't mean that the <em>underlying</em> connection is short-lived, but Dapper only gets to see the managed <code>DbConnection</code>, so anything else is opaque to it.</p>
<p>Without tracking every <code>DbConnection</code> / <code>DbCommand</code> and <em>without a new abstraction</em>, the best Dapper could do would be to call <code>.Prepare()</code> on every <code>DbCommand</code> immediately before executing it - but this is <em>exactly</em> the situation we discussed previously where the only two options are "has no effect" and "makes things worse".</p>
<p>Actually, there <em>is</em> one scenario <em>using the current API</em> in which Dapper <em>could</em> usefully consider doing this, which is the scenario:</p>
<pre><code>connection.Execute(someSql, someListOfObjects);
</code></pre>
<p>In <em>this case</em>, Dapper <em>unrolls</em> <code>someListOfObjects</code>, executing <code>someSql</code> with the parameters from each object in turn - on the same connection. I will acknowledge that a case could be made for Dapper to call <code>.Prepare()</code> in anticipation of the loop here, although it would require some changes to implement.</p>
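<p>For example, this single call executes the insert once per element of <code>orders</code> (any sequence of objects with <code>CustomerId</code>/<code>Status</code> members), re-using the same connection throughout - which is precisely the shape where a <code>Prepare()</code> could plausibly pay off:</p>
<pre><code>var rowsAffected = connection.Execute(
    "insert into Orders(CustomerId, Status) values(@CustomerId, @Status)",
    orders); // Dapper unrolls the sequence, executing once per element
</code></pre>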
<p>But fundamentally, the main objection that Dapper has to prepared statements is that <em>typically</em>, the <em>connections</em> that Dapper works with are transient and short-lived.</p>
<h2><a id="could-dapper-usefully-offer-a-prepare-api-for-systems-with-long-lived-connections" class="anchor" href="#could-dapper-usefully-offer-a-prepare-api-for-systems-with-long-lived-connections" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Could Dapper usefully offer a <code>Prepare()</code> API for systems with long-lived connections?</h2>
<p>Hypothetically, yes: there <em>is</em> something that Dapper could do here, <em>specifically targeted</em> at the scenario:</p>
<blockquote>
<p>I have a long-lived connection and an RDBMS that needs statements to be prepared, and I want the best possible performance when issuing repeated ad-hoc parameterized commands.</p>
</blockquote>
<p>We could conceptualize an API that pins a command to a single connection:</p>
<pre><code>var getOrders = connection.Prepare<Order>(
"select * from Orders where CustomerId=@customerId",
new { customerId = 123 }); // dummy args, for type inference
// ...
var orders = getOrders.Query(new { customerId }).AsList();
</code></pre>
<p>Note that in this imaginary API the connection is trapped and pinned inside the object that we stored in <code>getOrders</code>. There are some things that would need to be considered - for example, how does this work for literal injection and Dapper's fancy "<code>in</code>" support. A trivial answer might be: just don't support those features when used with <code>.Prepare()</code>.</p>
<p>I think there's plenty of merit to have this kind of discussion, and I'm 100% open to discussing API features and additions. <em>As long as</em> we are discussing the right thing - i.e. the "I have a long-lived..." discussion from above.</p>
<p>If, however, we start that conversation (via a security consultant) via:</p>
<blockquote>
<p>I want to use prepared statements to avoid SQL injection</p>
</blockquote>
<p>then: that <em>is not</em> a useful discussion.</p>
<h2><a id="tldr" class="anchor" href="#tldr" aria-hidden="true"><svg aria-hidden="true" class="octicon octicon-link" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Tl;dr:</h2>
<p>If you want to avoid your car skidding in icy weather, you fit appropriate tyres. You don't change the screen-wash.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-2135526467117807462017-06-21T04:34:00.002-07:002017-06-21T05:44:48.433-07:00protobuf-net gets proto3 support<h1>protobuf-net gets proto3</h1>
<p>For quite a little while, protobuf-net hasn't seen any <em>major</em> changes. Sure, I've been pottering along with ongoing maintenance and things like .NET Core support, but it hasn't had any step changes in behavior. Until recently.</p>
<h1>2.3.0 Released</h1>
<p>I'm pleased to say that <a href="https://www.nuget.org/packages/protobuf-net/2.3.0-alpha">2.3.0 has finally dropped</a>. The most significant part of this is "proto3", which ties into the 3.0.0 version of Protocol Buffers - released by Google at the end of July 2016. There are a few reasons why I haven't looked at this for protobuf-net before now, including:</p>
<ul>
<li>zero binary format changes; so <em>ultimately</em>, even without any library or tooling changes: everything that can be done in proto2 can be done in proto3, <em>interchangeably</em>; I didn't feel under immense pressure to rush out a release</li>
<li>significant DSL changes for "proto3" syntax, coupled with the fact protobuf-net's existing DSL tools were in bad shape; not least, they were tied into some technologies with a bad cross-platform story. Since I knew I needed a new answer for DSL tooling, it seemed a poor investment to hack the new features into the end-of-life tooling. A significant portion of protobuf-net's usage is from code-first users who don't even <em>have</em> a DSL version of their schema, hence why this wasn't at the top of my list of priorities</li>
<li>some new data contracts targeting commonly exchanged types, but this is tied into the DSL changes</li>
<li>I misunderstood the nature of the "proto3" syntax changes; I assumed it would be <em>adding</em> features and complexity, when in fact it <em>removes</em> a lot of the more awkward features. The few pieces that it did actually add were backported into "proto2" <em>anyway</em></li>
<li>I've been busy with lots of other things, including a lot of .NET Core work for multiple libraries</li>
</ul>
<p>But; I've finally managed to get enough time together to look at things properly.</p>
<p>First, some notes on proto3:</p>
<h2>proto3 is simpler than proto2</h2>
<p>This genuinely surprised me, but it was a very pleasant surprise. When writing protobuf-net, I made a conscious decision to make it <em>easy and natural</em> to implement the most common scenarios. I <em>supported</em> the full range of protobuf features, but some of them were more awkward to use. As such, I made some random decisions towards making it simple and obvious to use:</p>
<ul>
<li>implicit zero defaults: most people don't have complex default values, where-as this makes it simple and efficient to store "empty" data (in zero bytes) without any configuration</li>
<li>don't worry about implicitly set vs explicitly set values: values are values; the library <em>supports</em> a few common .NET patterns for explicit assignment (<code>ShouldSerialize*</code> / <code>*Specified</code> / <code>Nullable<T></code> + <code>null</code>), but it doesn't <em>demand</em> them and is perfectly fine without them</li>
<li>extensions and unknown data entirely optional: the question here is what to do if the serialized data contains unexpected / unknown values - which could be from external "extensions", or could just be new fields that the code doesn't know about. protobuf-net <em>supports</em> this type of usage, but accepts that it isn't something that most folks need or even want - they just want to get the <em>expected</em> data in and out</li>
</ul>
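<p>In code terms, that "easy and natural" path looks like this sketch - no defaults to configure, with explicit-assignment tracking (via <code>Nullable<T></code> here) only if you opt in:</p>
<pre><code>[ProtoContract]
public class Customer
{
    [ProtoMember(1)]
    public int Id { get; set; }           // zero default: stored in zero bytes

    [ProtoMember(2)]
    public string Name { get; set; }

    [ProtoMember(3)]
    public int? CreditLimit { get; set; } // null means "not explicitly set"
}
</code></pre>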
<p>It turns out that proto3 makes some striking <em>omissions</em> from proto2:</p>
<ul>
<li>default values are gone - implicit zero values are assumed and are the <em>only</em> permitted defaults</li>
<li>explicit assignment is gone - if something has a value other than the zero default, it is serialized, <em>and that's it</em></li>
<li>extensions are largely missing</li>
</ul>
<p>A part of me feels that these changes <em>totally validate</em> the decisions I made when making protobuf-net as simple to use as possible. Note that protobuf-net still retains full support for the wider set of protobuf features (including all the proto2 features) - they're not going anywhere.</p>
<h2>what about protobuf JSON?</h2>
<p>protobuf 3.0.0 added a well-defined JSON encoding for protobuf data. I confess that I'm deeply conflicted on this. In the .NET world, JSON is a solved problem. If I want my data serialized as JSON, I'm probably going to look at JIL (if I want raw performance) or Json.NET (if I want greater flexibility and range of features, or just want to use the de-facto platform serializer). Since protobuf-net targets idiomatic .NET types that would <em>already</em> serialize <strong>just fine</strong> with either of these, it seems to me of very little benefit to spend a large amount of time writing JSON support directly for protobuf-net. As such, protobuf-net still <em>does not support this</em>. If there is a genuine need for this, the first thing I would do would be to look at JIL or Json.NET to see if there is some combination of configuration options that I can specify that would conveniently be compatible with the expected JSON encoding. At the very worst case, I could see either some PRs to JIL or a fork of JIL to support it, but frankly I'm going to defer on touching the JSON option until I understand the use-case. On the surface, it <em>seems</em> like the JSON option here takes all the main reasons for using protobuf and throws them out the window. My reservations here are probably because I'm spoiled by working in a platform where I can take <em>virtually any object</em>, and both JIL and Json.NET will be able to serialize and deserialize it for me.</p>
<h1>So what do we get in protobuf-net 2.3.0?</h1>
<h2>Brand new protogen tooling for both proto2 and proto3</h2>
<p>This release completely replaces the protogen DSL parsing tool; it has been 100% rewritten from scratch using pure managed code. The old version used to:</p>
<ul>
<li>shell execute to call Google's "protoc" tool to emit a compiled schema (in the protobuf serialization format, naturally) as a file</li>
<li>then deserialize that file into the corresponding type model using protobuf-net</li>
<li>serialize that same object as xml</li>
<li>run the xml through an xslt 1.0 engine to generate C#</li>
</ul>
<p>This worked, but is a cross-platform nightmare as well as being a maintenance nightmare. I doubt that xslt was a <em>good</em> choice for codegen even when it was written, but today... just painful. I looked at a range of parsing engines, but ultimately decided on a manual tokenizer and imperative forwards-only parser. It turned out to not be anything like as much work as I had feared, which was nice. In order to have confidence in the parser, I have tested it on every .proto schema I can find, including about 220 schemas that describe a good portion of Google's public API surface. I've tested these against protoc's binary output to ensure that <em>not only</em> does it parse the files meaningfully, but it produces <em>the exact same bytes</em> (as a compiled / serialized schema) that protoc produces.</p>
<p>This parser is then tied into a relatively basic codegen system. At the moment this is fairly crude, and is subject to significant change. The good thing is that <em>now that everything is in place</em>, it can be reworked easily - perhaps to use one of the many templating systems that are available in .NET.</p>
<p>As an illustration of how the parser and codegen are neatly decoupled, <a href="https://twitter.com/RogerAlsing">Roger Johansson</a> has also independently converted his <a href="https://github.com/AsynkronIT/protoactor-go">Proto Actor code</a> to use protobuf-net's parser rather than protoc, which is great! <a href="https://twitter.com/RogerAlsing/status/871829162218184704">https://twitter.com/RogerAlsing/status/871829162218184704</a>. If you want to use the parser and code-generation tools outside of the tools I provide, <a href="https://www.nuget.org/packages/protobuf-net.Reflection/">protobuf-net.Reflection</a> may be useful to you.</p>
<h3>How do I use it?</h3>
<p>OK, you have a .proto schema (proto2 or proto3). At the moment, you have 2 options for codegen from protobuf-net:</p>
<ol>
<li>compile, build and execute the <code>protogen</code> command line tool (which deliberately shares command-line switches with Google's <code>protoc</code> tool)</li>
<li>use <a href="https://protogen.marcgravell.com/">https://protogen.marcgravell.com/</a> to do it online</li>
</ol>
<p>(as a 2.1 option you could also clone that same website from git and host it locally; that's totally fine)</p>
<p>I want to introduce much better tooling options, including something that ties into msbuild and dotnet CLI, and (optionally) devenv, but so far this is looking like hard work, so I wanted to ship 2.3.0 before tackling it. It is my opinion that <a href="https://protogen.marcgravell.com/">https://protogen.marcgravell.com/</a> is now perhaps the <em>easiest</em> way to play with .proto schemas - and to show willing, it also includes support for all official protoc output languages, <em>and</em> includes the entire public Google API surface as readily available imports (those same 220 schemas from before).</p>
<h2>Support for maps</h2>
<p>Maps (<code>map<key_type, value_type></code>) in .proto are the equivalent of dictionaries in .NET. If you're familiar with protobuf-net, you'll know that it has offered dictionary support <em>for many years</em>. Fortunately, Google's idea of how this should be implemented matches perfectly with the arbitrary and unilateral decisions I stumbled into, so maps are 99.95% interchangeable with how protobuf-net already handles dictionaries. The 0.05% relates to what happens with duplicate keys. Basically: historically, protobuf-net used <code>theData.Add(key, value)</code>, which would throw if a key was duplicated. However, maps are defined such that the last value <em>replaces</em> previous values - so: <code>theData[key] = value;</code>. This is a very small difference, and doesn't impact any data that would <em>currently successfully deserialize</em>, so I've made the executive decision that from 2.3.0 all dictionaries should follow the "map" rules by default (when appropriate). To allow full control, protobuf-net has a new <code>ProtoMapAttribute</code> (<code>[ProtoMap]</code>). This has options to use the old <code>.Add</code> behavior, and also has options to control the sub-format used for the key and value. The protogen tool will always include the appropriate <code>[ProtoMap]</code> options for your data.</p>
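<p>As a sketch of what that opt-out might look like (attribute property names per the 2.3.0 API - check <code>ProtoMapAttribute</code> for the exact members), a dictionary member could be annotated like so:</p>
<pre><code>[ProtoMember(4)]
[ProtoMap(DisableMap = true)] // revert to the legacy .Add(key, value) semantics,
                              // throwing on duplicate keys during deserialization
public Dictionary<string, Order> OrdersById { get; set; }
</code></pre>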
<h2>Support for <code>Timestamp</code> and <code>Duration</code></h2>
<p><code>Timestamp</code> and <code>Duration</code> refer to a point in time (think: <code>DateTime</code>) and an <em>amount</em> of time (think: <code>TimeSpan</code>). Again, protobuf-net has had support for <code>DateTime</code> and <code>TimeSpan</code> <em>for many years</em>, but this time my arbitrary interpretation and Google's differs significantly. I have added native support for these formats, but because it is different to (and fundamentally incompatible with) what protobuf-net has done historically, this has to be done on an opt-in basis. I've added a new <code>DataFormat.WellKnown</code> option that indicates that you want to use these formats. For example:</p>
<pre><code>[ProtoMember(7, DataFormat = DataFormat.WellKnown)]
public DateTime CreationDate {get; set;}
</code></pre>
<p>will be serialized as a <code>Timestamp</code>. The protogen tool recognises <code>Timestamp</code> and <code>Duration</code> and will emit the appropriate options.</p>
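<p>The same opt-in flag applies to <code>TimeSpan</code>, which is then serialized as a <code>Duration</code>; for example:</p>
<pre><code>[ProtoMember(8, DataFormat = DataFormat.WellKnown)]
public TimeSpan Elapsed {get; set;}
</code></pre>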
<h2>Simpler <code>enum</code> handling</h2>
<p>Historically, enums in .proto were a bit awkward when it came to unknown values, and protobuf-net defaulted to the most paranoid options of panicking if it saw a value it didn't explicitly expect. However, the guidance now includes the new remark:</p>
<blockquote>
<p>During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent. In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation.</p>
</blockquote>
<p>Enums in .NET are open enum types, so it makes sense to relax the handling here. Additionally, historically protobuf-net didn't <em>really</em> properly implement the older "make it available as an extension value" approach from proto2 (it would throw an exception instead) - far from ideal. So: from 2.3.0 onwards, all enums will be (by default) interpreted directly and without checking against expected values, <em>with the exception</em> of the unusual scenario where <code>[ProtoEnum(Value=...)]</code> has been used to <em>re-map</em> any enum such that the serialized value is different to the natural value. In this case, it can't assume that a direct interpretation will be valid, so the legacy checks will remain. Emphasis: this is a very rare scenario, and probably won't impact anyone except me (and my test suite). Because of this, the <code>[ProtoContract(EnumPassthru = ...)]</code> option is now <em>mostly</em> redundant: the only time it is useful is to explicitly set this to <code>false</code> to revert to the previous "throw an exception" behaviour.</p>
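<p>To make the behaviours concrete - a sketch (illustrative type names only):</p>
<pre><code>// direct interpretation by default: unknown values pass straight through
[ProtoContract]
public enum Status { Active = 1, Inactive = 2 }

// re-mapped serialized values: the legacy validation still applies here
[ProtoContract]
public enum Legacy
{
    [ProtoEnum(Value = 10)] // serialized as 10, not 1
    Foo = 1,
}

// explicitly revert to the old "throw on unexpected values" behaviour
[ProtoContract(EnumPassthru = false)]
public enum Strict { A = 0, B = 1 }
</code></pre>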
<h2>Discriminated unions, aka one-of</h2>
<p>One of the features introduced in proto3 (and back-ported to proto2) is the ability for multiple fields to overlap such that only one of them can contain a value at a time. The ideal in-memory representation of this is a discriminated union, which C# can't really represent <i>directly</i>, but which can be simulated via a <code>struct</code> with explicit layout; so that's exactly what we now do! A family of discriminated union structs have been introduced for this purpose,
and are <i>mainly</i> intended to be used with generated code. But if you want to use them directly: have fun!</p>
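<p>To illustrate the underlying trick (a sketch of the technique only, not protobuf-net's actual types): value-type fields can legally share the same bytes via explicit layout, alongside a discriminator that records which one is populated:</p>
<pre><code>using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct UnionSketch
{
    [FieldOffset(0)] private readonly int _discriminator; // which field is set
    [FieldOffset(4)] private readonly int _int32;    // overlapped: these two
    [FieldOffset(4)] private readonly float _single; // share the same 4 bytes

    public UnionSketch(int discriminator, int value) : this()
    {
        _discriminator = discriminator;
        _int32 = value;
    }
    public UnionSketch(int discriminator, float value) : this()
    {
        _discriminator = discriminator;
        _single = value;
    }
    public int Discriminator => _discriminator;
    public int Int32 => _int32;
    public float Single => _single;
}
</code></pre>
<p>(note that only value types can overlap like this - the runtime rejects an object reference sharing an offset with unmanaged data, which is why reference-typed cases need separate handling)</p>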
<h2>proto3 schema generation</h2>
<p>Since the DSL tools accept proto2 or proto3 syntax, it makes sense that we should be able to <em>emit</em> both proto2 and proto3 syntax, so there are now overloads of <code>GetSchema</code> / <code>GetProto<T></code> that allow this. These tools have also been updated to be aware of maps, <code>Timestamp</code>, <code>Duration</code> etc. </p>
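<p>For example (hedging: overload shape as I understand it in 2.3.0):</p>
<pre><code>string proto3 = Serializer.GetProto<MyType>(ProtoSyntax.Proto3);
string proto2 = Serializer.GetProto<MyType>(ProtoSyntax.Proto2);
</code></pre>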
<h2>New custom option DSL support</h2>
<p>The new DSL tooling makes use of the "extensions" feature to add custom syntax options to your .proto files. At the moment the options here are <a href="https://raw.githubusercontent.com/mgravell/protobuf-net/master/src/protogen.site/wwwroot/protoc/protobuf-net/protogen.proto">pretty limited</a>, allowing you to control the accessibility and naming of elements, but as new controls become necessary: that's where they will go. </p><h2>General bug fixes</h2>
<p>This build also includes a range of more general fixes for specific scenarios, as covered by the <a href="http://mgravell.github.io/protobuf-net/releasenotes">release notes</a>.</p>
<h1>What next?</h1>
<p>I'm keeping a basic future roadmap on the <a href="https://mgravell.github.io/protobuf-net/releasenotes">release notes</a>. There are some significant pieces of work ahead, including (almost certainly) a major rework of the core serializer to support <code>async</code> IO, "Pipelines", etc. I also want to improve the build-time tooling. My work here is very much not done.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-10839042385983860132017-05-17T13:43:00.000-07:002017-05-18T06:29:32.540-07:00protobuf-net: large data, and the future<h1>protobuf-net was born into a different world</h1>
<p>On Jul 17, 2008 I pushed <a href="https://code.google.com/archive/p/protobuf-net/source/default/commits?page=7">the first commits of protobuf-net</a>. It is easy to forget, but <em>back then</em>, most machines had access to a lot less memory than they do today, with x86 still being a common choice, meaning that 2GB user space (or maybe a little more if you fancied fighting with /3GB+LAA) was a hard upper limit. In reality, your usable memory was much less. Processors were much less powerful - user desktops were doing well if their single core had hyper-threading support (dual and quad cores existed, but were much rarer).</p>
<h1>Thanks for the 2GB memories</h1>
<p>It is in this context that protobuf-net was born, and in which many of the early design decisions were made. Although to be fair, even Google (who designed the thing) suggested an upper bound in the low hundreds of MB. Here's the original author (Kenton Varda) saying on Stack Overflow <a href="http://stackoverflow.com/questions/34128872/google-protobuf-maximum-size">that 10MB is "pushing it"</a> - although he does also note that 1GB works, but that 2GB is a hard limit.</p>
<p>protobuf-net took these limitations on board, and many aspects of the code could only work inside these borders. In particular, one of the key design questions in protobuf-net was how, when serializing general purpose objects, to handle the length prefix.</p>
<h2>protobuf strings</h2>
<p>Protobuf is actually a relatively simple binary format; it has few primitives, one of which is the length-prefixed string (where "string" means "arbitrary payload", not just text). The encoding of this is a <em>variable length</em> "varint" that tells it how many bytes are involved, then <em>that many bytes</em> of the payload:</p>
<pre><code>[field x, "string"]
[n, 1-10 bytes]
[payload, n bytes]
</code></pre>
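<p>For illustration, the "varint" here packs 7 bits per byte, least-significant group first, using the high bit of each byte as a "more data follows" flag; a minimal sketch of the writer:</p>
<pre><code>static int WriteVarint(ulong value, byte[] buffer, int offset)
{
    int count = 0;
    do
    {
        byte b = (byte)(value & 0x7F); // low 7 bits
        value >>= 7;
        if (value != 0) b |= 0x80;     // continuation bit
        buffer[offset + count++] = b;
    } while (value != 0);
    return count; // 1 byte for lengths up to 127; at most 10 for 64 bits
}
// for example, a length of 300 is written as the two bytes 0xAC, 0x02
</code></pre>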
<p>The requirement to know the length in advance is fine for the Google implementation - as I understand it, the "builder" approach means that the length is calculated when the "builder" creates the actual object, which is long before serialization happens (note: I'm happy to be corrected here if I've misunderstood). But protobuf-net doesn't work with "builder" types; it works against general every-day POCOs - usually written without any DSL schema ("code-first"). We can't rely on any construction-time calculations. So: how to write the length?</p>
<p>Essentially, there's two ways of doing this:</p>
<ul>
<li>serialize the data <em><strong>first</strong></em> (perhaps hoping that the length prefix will fit in a single byte, and leaving a space for it); when you've finished serializing, you know the length - so now backfill that into the original space, which might mean nudging the data over a bit if the prefix took more space than expected</li>
<li><em>compute</em> the actual required length, write the prefix, <em>then</em> serialize the data</li>
</ul>
<p>Both have advantages and disadvantages. The first requires us to buffer all the data in the payload (you can't flush something that you might need to update later), and might mean moving a lot of data around. The second requires us to do more thinking before actually writing anything - which might mean doing a lot of work twice.</p>
<p>At the current time, protobuf-net chooses the first approach. For quite a lot of small leaf types, this doesn't actually mean much more than backfilling a single byte of length data, but it becomes progressively more expensive as the payload size increases.</p>
<h2>I hate limits</h2>
<p>Over the time since then, I have seen <em>many, many</em> requests from people asking for protobuf-net to support larger data sizes - at least an order of magnitude above what has previously been usable, tens of GB or more, which makes perfect sense when you consider the data that some apps load into the plentiful RAM available on even a mid-range server. In <em>principle</em> this is simple (mostly making sure that the reader and writer use 64-bit tracking internally), but there are a few stumbling blocks:</p>
<ul>
<li>the need to buffer vast quantities of data would demand excessive amounts of RAM</li>
<li>the current buffer implementation would be prohibitively hard to refactor to go above 2GB</li>
<li>even if we did, it would then take a <em>loooong</em> time to output the buffered data after backfilling</li>
</ul>
<p>I've recently pushed some commits intended to address the 64-bit reader/writer issue - unblocking some users, but the other factors are much harder to solve in the current implementation.</p>
<h2>Wait... how does that unblock anyone?</h2>
<p>Good catch; indeed, simply enabling 64-bit readers and writers doesn't fix the buffering problem - but: there is a workaround. A long time in protobuf's past, there were two ways of encoding sub-messages. One was the length-prefixed string that we've discussed; the other was the "group". At the binary level, the difference is that "groups" don't have a length prefix - instead a sentinel value <em>suffix</em> is used to denote the end of the message:</p>
<pre><code>[field x, "start group"]
[payload]
[field x, "end group"]
</code></pre>
<p>(the protocol itself means that "end group" could not occur as an immediate child of the payload, so this is unambiguous)</p>
<p>As with most things, this has various advantages and disadvantages - but most significantly in our case here, it means we <em>don't need to know the length in advance</em>. And if we don't need to know the length, then we don't need to <em>buffer</em> anything - we can write the data in a purely forwards direction without any need to backfill data. There's just one problem: it is out of favor with the protobuf specification owners - it was marked as deprecated but supported in the proto2 DSL, and there is no syntax for it <em>at all</em> in the proto3 DSL (these all just describe data against the same binary format).</p>
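<p>For reference, each field marker in the binary format is just <code>(fieldNumber << 3) | wireType</code>, where "start group" and "end group" are wire types 3 and 4 - a quick sketch:</p>
<pre><code>const int StartGroup = 3, EndGroup = 4; // wire types
static uint Tag(int fieldNumber, int wireType)
    => (uint)((fieldNumber << 3) | wireType);

// a group for field 2 is framed as:
//   Tag(2, StartGroup)  // 0x13
//   ...payload fields...
//   Tag(2, EndGroup)    // 0x14
</code></pre>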
<p>But: I <em>really, really like groups</em>, at least at the binary format level. Essentially, the current 2GB+ unblocking in an upcoming deploy of protobuf-net is limited to scenarios where it is possible to use groups <em>extensively</em>. The closer something is to being a leaf, the more it'll be OK to use length-prefixed strings; the closer something is to the root object, the more it will benefit from being treated as a "group". With this removing the need to buffer+backfill, arbitrarily large files can be produced. The cost, however, is that you won't be able to interop with data that is expressed as proto3 schemas.</p>
<p>Historically, you have been able to indicate that a <em>member</em> should be treated as a group via:</p>
<pre><code>// for field number "n"
[ProtoMember(n, DataFormat = DataFormat.Group)]
public SomeType MemberName { get; set; }
</code></pre>
<p>However, this is hard to express in some cases (such as dictionaries), so this has been <em>extended</em> to allow declaration at the type-level:</p>
<pre><code>[ProtoContract(IsGroup = true)]
public class SomeType {...}
</code></pre>
<p>(both of which can also be expressed via the <code>RuntimeTypeModel</code> API for runtime configuration)</p>
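<p>The runtime configuration might look something like this (a sketch; assuming the <code>MetaType</code> / <code>ValueMember</code> APIs expose the matching options):</p>
<pre><code>var model = RuntimeTypeModel.Create();

// member-level: equivalent of DataFormat.Group on [ProtoMember(7)]
model.Add(typeof(SomeParent), true)[7].DataFormat = DataFormat.Group;

// type-level: equivalent of [ProtoContract(IsGroup = true)]
model.Add(typeof(SomeType), true).IsGroup = true;
</code></pre>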
<p>These changes move us forward, at least - but are mainly appropriate when using protobuf-net as the only piece of the puzzle, since it simply cannot be expressed in the proto3 DSL.</p>
<h2>The future</h2>
<p>This is all great, but isn't ideal. So <em>in parallel</em> with that, I have some work-in-progress early-stages work that is taking a much more aggressive look at the future of protobuf-net and what it needs to move forward. I have many lofty aims on the list:</p>
<ul>
<li>true 2GB+ support including length-prefix, achieved by a redesign of the writer API, including switching to precalculation of lengths as required</li>
<li>optimized support for heterogeneous backend targets, including in-memory serialization, <code>Stream</code>s, "Channels" (the experimental redesign of the .NET IO stack), memory-mapped-files, etc</li>
<li>making use of new concepts like <code>Utf8String</code>, <code>Span<T></code> where appropriate</li>
<li>full support for <code>async</code> backend targets, making optimal use of <code>ValueTask<T></code> as appropriate so that performance is retained in the case where it is possible to complete entirely synchronously</li>
<li>rework of the codegen / meta-programming layer, reducing or removing the dependency on IL-emit, and moving more towards compile-time code-gen (ideally fully automated and silent) using Roslyn</li>
<li>in doing so, greatly improve the experience for AOT scenarios, where meta-programming is restricted or impossible</li>
<li>improve the performance of a range of common scenarios by every mechanism imaginable</li>
<li>and maybe, just maybe: getting around to implementing updated DSL parsing tooling (but realistically: that isn't the key selling-point of protobuf-net)</li>
</ul>
<p>As counterpoints, I <em>also</em> imagine that I'll be dropping support for everything that isn't either ".NET Framework recent-enough to build via <code>dotnet build</code>" (4.0 and above, IIRC) or ".NET Standard (something)". The reality is that I'm not in a position to support some obscure PCL configuration or an ancient version of Silverlight. If you can make it compile: great! I'm also entirely open to including targets for things like Xamarin or Unity as long as <em>somebody else</em> can make them work in the build - I'm simply not a user of those tools, and it would be artificial to say that I've seen it work. I'm also moving away from my historic aim of being able to compile on down-level compiler versions. These days, with NuGet as the de-facto package manager, and <code>dotnet build</code> readily available, and the free <a href="https://www.visualstudio.com/vs/community/">Visual Studio Community</a> edition, I'm not sure it makes sense to worry about old compilers.</p>
<p>As you can see, there's a lot in the planning. I've been experimenting with various pieces of it to see how it fits together, and I'm confident that I see a viable route forward. Now all I need is to make it happen.</p>
<p>The first step there is to get the "longification" changes shipped; this has now seen real-world usage, so it is just some packaging work to do. I hope to have that available on NuGet before next week.</p>
<p>Fun times!</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-60803781152772418862017-04-29T11:27:00.003-07:002017-04-29T11:27:59.783-07:00StackExchange.Redis and Redis 4.0 Modules<h2>StackExchange.Redis and Redis Modules</h2>
<p>This is largely a brain-dump of my plans for Redis 4.0 Modules and the StackExchange.Redis client library.</p>
<p>Redis 4.0 <a href="https://raw.githubusercontent.com/antirez/redis/4.0/00-RELEASENOTES">is in RC 3</a>, which is great for folks interested in Redis. As the primary maintainer of <a href="https://www.nuget.org/packages/StackExchange.Redis/">StackExchange.Redis</a>, new releases also bring me some extra work in terms of checking whether there are new features that I need to incorporate into the client library. Some client libraries expose a very raw API surface, leaving the individual commands etc to the caller - this has the advantage of simplicity, but it has disadvantages too:</p>
<ul>
<li>it presents a higher barrier to entry, as users need to learn the redis command semantics</li>
<li>it prevents the library offering any special-case optimizations or guidance</li>
<li>it makes it hard to ensure that key-based sharding is being implemented correctly (as to do that you need to know <em>with certainty</em> which tokens are <em>keys</em> vs <em>values</em> vs <em>command semantics</em>)</li>
<li>it is hard to optimize the API</li>
</ul>
<p>For all these reasons, StackExchange.Redis has historically offered a more method-per-command experience, allowing full intellisense, identification of <em>keys</em>, helper enums for options, sanity checking of operands, and various scenario-specific optimizations / fallback strategies. And ... if that isn't enough, you can always hack by using Lua to do things at the server directly.</p>
<h2>Along comes Modules</h2>
<p>A key feature in Redis 4.0 is the introduction of <em>modules</em>. This allows <em>anyone</em> to write a module that does something interesting and useful that they want to run <em>inside</em> Redis, and load that module into their Redis server - then invoke it using whatever exotic commands they choose. If you're interested in Redis, you should go check it out! There's already a <a href="http://redismodules.com/">gallery of useful modules started by Redis Labs</a> - things like JSON support, Machine Learning, or Search - with an option to submit your own modules to the community.</p>
<p>Clearly, my old approach of "manually update the API when new releases come out" doesn't scale to the advent of modules, and saying "use Lua to run them" is ... ungainly. We need a different approach.</p>
<h2>Adding <code>Execute</code> / <code>ExecuteAsync</code></h2>
<p>As a result, in an upcoming (not yet released) version, the plan is to add some new methods to StackExchange.Redis to allow more direct and raw access to the pipe; for example the <code>rejson</code> module adds a <code>JSON.GET</code> command that takes a key to an existing JSON value, and a path inside that json - we can invoke this via:</p>
<pre><code>string foo = (string)db.Execute(
"JSON.GET", key, "[1].foo");
</code></pre>
<p>(there's a similar <code>ExecuteAsync</code> method)</p>
<p>The return value of these methods is the flexible <code>RedisResult</code> type that the Lua API already exposes, which handles all the expected scenarios of primitives, arrays, etc. The parameters are simply a <code>string</code> command name, and a <code>params object[]</code> of <em>everything else</em> - with appropriate handling of the types you're likely to use with redis commands (<code>string</code>, <code>int</code>, <code>double</code>, etc). It also recognises parameters <em>typed as <code>RedisKey</code></em> and uses them for routing / sharding purposes as necessary.</p>
<p>The key from all of this is that it should be easy to quickly hook into any modules that you write or want to consume.</p>
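<p>For completeness, the async twin reads much the same; a sketch (using another <code>rejson</code> command - the exact arguments here are illustrative):</p>
<pre><code>RedisResult result = await db.ExecuteAsync(
    "JSON.SET", key, ".", "{\"name\":\"test\"}");
</code></pre>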
<h2>What about more graceful handling for well-known modules?</h2>
<p>My hope here is that for <em>well-known</em> but non-trivial modules, "someone" (maybe me, maybe the wider community) will be able to write helper methods <em>as C# extension methods</em> against the client library, and package them as module-specific NuGet packages; for example, a package could add:</p>
<pre><code>public static RedisValue JsonGet(this IDatabase db, RedisKey key,
string path = ".", CommandFlags flags = CommandFlags.None)
{
return (RedisValue)db.Execute("JSON.GET",
new object[] { key, path }, flags);
}
</code></pre>
<p>to expose <em>raw</em> json functionality, or could choose to add serialization / deserialization into the mix too:</p>
<pre><code>public static T JsonGet<T>(this IDatabase db, RedisKey key,
string path = ".", CommandFlags flags = CommandFlags.None)
{
byte[] bytes = (byte[])db.Execute("JSON.GET",
new object[] { key, path }, flags);
using (var ms = new MemoryStream(bytes))
{
return SomeJsonSerializer.Deserialize<T>(ms);
}
}
</code></pre>
<p>The <em>slight</em> wrinkle here is that it is still using the <code>Execute[Async]</code> API; as a general-purpose API it is very <em>convenient and flexible</em>, but slightly more expensive than it <em>absolutely needs to be</em>. But being realistic, it is probably fine for 95% of use-cases, so: let's get that shipped and iterate from there.</p>
<p>I'd <em>like</em> to add a second API specifically intended for extensions like this (more direct, less allocations, etc), but a: ideally I'd want to ensure that I can subsequently tie it cleanly into the "pipelines" concept (which is currently just a corefxlab dream, without a known ETA for "real" .NET), and b: it would be good to gauge interest and uptake before spending any time doing this.</p>
<h2>But what should consumers target?</h2>
<p>This <em>also</em> makes "strong naming" rear its ugly head. I'm not going to opine on strong naming here - the discussion is not very interesting and has been done to death. tl;dr: currently, there are two packages for the client library - strong named and not strong named. It would be sucky if there was a mix of external extensions targeting one, the other, or both. The mid range plan is to make a <strong>breaking package change</strong> and re-deploy StackExchange.Redis (which currently is not strong-named) as: strong-named. The StackExchange.Redis.StrongName would be <em>essentially</em> retired, although I guess it could be an empty package with a StackExchange.Redis dependency for convenience purposes, possibly populated entirely by <a href="https://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.typeforwardedtoattribute(v=vs.110).aspx"><code>[assembly:TypeForwardedTo(...)]</code></a> markers. I'm open to better ideas, of course!</p>
<h2>So that's "The Plan"</h2>
<p>If you have strong views, <a href="https://twitter.com/marcgravell">hit me on twitter (@marcgravell)</a>, or <a href="https://github.com/StackExchange/StackExchange.Redis/">log an issue</a> and we can discuss it.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.comtag:blogger.com,1999:blog-8184237816669520763.post-72503780450065474752017-04-23T15:40:00.001-07:002017-05-05T01:47:05.278-07:00Spans and ref part 2 : spans
<h1 id="spans-and-ref-part-2--spans">Spans and <code class="highlighter-rouge">ref</code> part 2 : spans</h1>
<p>In <a href="http://blog.marcgravell.com/2017/04/spans-and-ref-part-1-ref.html">part 1</a>, we looked at <code class="highlighter-rouge">ref</code> locals and <code class="highlighter-rouge">ref return</code>, and hinted at a connection to “spans”; this time we’re going to take a deeper look at what this connection might be, and how we can use make use of it.</p>
<h2 id="disclaimer">Disclaimer</h2>
<p>I’m <em>mostly</em> on the outside of this - looking in at the public artefacts, playing with the API etc - maybe the odd PR or issue report. It is entirely possible that I’ve misunderstood some things, and it is possible that things will change between now and general availability.</p>
<h2 id="what-are-spans">What are spans?</h2>
<p>By spans, I mean <code class="highlighter-rouge">System.Span<T></code>, which is part of .NET Core, living in the <code class="highlighter-rouge">System.Memory</code> assembly. It is also available for .NET via the <code class="highlighter-rouge">System.Memory</code> package. But please note: <em>it is a loaded gun to use at the moment</em> - you can currently compile code that has <strong>undefined behavior</strong>, and which <strong>may not</strong> compile at some point in the future. Although to be fair, to get into any of the terrible scenarios you need to use the <code class="highlighter-rouge">unsafe</code> keyword, at which point you already said “I take full responsibility for everything that goes wrong here”. I’ll discuss this more below, but I wanted to mention that at the top in case you stop reading and don’t get to that important point.</p>
<p>Note that some of the code in this post uses unreleased features; I’m using:</p>
<div class="highlighter-rouge"><pre class="highlight"><code><PackageReference Include="System.Memory"
Version="4.4.0-preview1-25219-04" />
<PackageReference Include="System.Runtime.CompilerServices.Unsafe"
Version="4.4.0-preview1-25219-04" />
</code></pre>
</div>
<p>Obviously <a href="http://idioms.thefreedictionary.com/all+bets+are+off">all bets are off</a> with preview code; things may change.</p>
<h2 id="why-do-spans-need-to-exist">Why do spans need to exist?</h2>
<p>We <a href="http://blog.marcgravell.com/2017/04/spans-and-ref-part-1-ref.html">saw previously</a> how <code class="highlighter-rouge">ref T</code> can be used similarly to pointers (<code class="highlighter-rouge">T*</code>) to represent a reference to a single value. Basically, anything that allows us to talk about complex scenarios without needing pointers is a good thing. But: representing a <em>single</em> value is not the only use-case of pointers. The <em>much more common</em> scenario for pointers is for talking about a <em>range</em> of contiguous data, usually when paired with a count of the elements.</p>
<p>At the most basic level, a <code class="highlighter-rouge">Span<T></code> represents a strongly typed contiguous chunk of elements of type <code class="highlighter-rouge">T</code> with a known and enforced length. In many ways, very comparable to an array (<code class="highlighter-rouge">T[]</code>) or a segment (<code class="highlighter-rouge">ArraySegment<T></code>) - but… more. They also provide <em>safe</em> (by which I mean: not <code class="highlighter-rouge">unsafe</code> in the C# sense) access to features that would previously have required pointers (<code class="highlighter-rouge">T*</code>).</p>
<p>I’m probably missing a few things here, but the most immediate features are:</p>
<ul>
<li>provide a unified type system over all contiguous memory, including: arrays, unmanaged pointers, stack pointers, fixed / pinned pointers to managed data, and references into the interior of values</li>
<li>allow type coercion for primitives and value-types</li>
<li>work with generics (unlike pointers, which don’t)</li>
<li>respect garbage collection (GC) semantics by using <em>references</em> instead of <em>pointers</em> (the GC only walks references)</li>
</ul>
<p>Now: if none of the above sounds like things you ever need to do, then great: you probably won’t ever need to use <code class="highlighter-rouge">Span<T></code> - and that’s perfectly OK. Most <em>application</em> code will never need to use these features. Ultimately, these tools are designed for <em>lower level</em> code (usually: library code) that is performance critical. That said, there <em>are</em> some great uses in regular code, that we’ll get onto.</p>
<h2 id="but-what-is-a-span">But… what is a span?</h2>
<p>OK, OK. <em>Conceptually</em>, a <code class="highlighter-rouge">Span<T></code> can be thought of as a reference and a length:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>public struct Span<T> {
ref T _reference;
int _length;
public ref T this[int index] { get {...} }
}
</code></pre>
</div>
<p>with a cousin:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>public struct ReadOnlySpan<T> {
ref T _reference;
int _length;
public T this[int index] { get {...} }
}
</code></pre>
</div>
<p>You would be perfectly correct to complain “but… but… in the last part you said no <code class="highlighter-rouge">ref</code> fields!”. That’s fair, but I did say <em>conceptually</em>. At least… for now!</p>
<h2 id="spans-as-ranges-of-an-array">Spans as ranges of an array</h2>
<p>As a completely trivial (and rather pointless) example, we can see how we can use a <code class="highlighter-rouge">Span<T></code> very similarly to how we might have used a <code class="highlighter-rouge">T[]</code>:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>void ArrayExample() {
byte[] data = new byte[1024];
// not shown: populate data
ProcessData(data);
}
void ProcessData(Span<byte> span) {
for (int i = 0; i < span.Length; i++) {
DoSomething(span[i]);
}
}
</code></pre>
</div>
<p>Here we implicitly convert the <code class="highlighter-rouge">byte[]</code> to <code class="highlighter-rouge">Span<byte></code> when calling the method, but at this point you would still be justified in being underwhelmed - we could have done everything here with just an array.</p>
<p>Similarly, we can talk about just a <em>portion</em> of the array:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>void ArrayExample() {
byte[] data = new byte[1024];
// not shown: populate data
ProcessData(new Span<byte>(data, 10, 512));
}
void ProcessData(Span<byte> span) {
for (int i = 0; i < span.Length; i++) {
DoSomething(span[i]);
}
}
</code></pre>
</div>
<p>And again you could observe that we could have just used <code class="highlighter-rouge">ArraySegment<T></code>. Actually, let’s be realistic: very few people use <code class="highlighter-rouge">ArraySegment<T></code> - but we could have just passed <code class="highlighter-rouge">int offset</code> and <code class="highlighter-rouge">int count</code> as additional parameters; it would have worked fine. But I mentioned pointers earlier…</p>
<h2 id="spans-as-ranges-of-pointers">Spans as ranges of pointers</h2>
<p>The <em>second</em> way we can use <code class="highlighter-rouge">Span<T></code> is <em>over a pointer</em>; which could be any of:</p>
<ul>
<li>a <code class="highlighter-rouge">stackalloc</code> pointer for a small value that we want to work on without allocating an array</li>
<li>a managed array that we previously <code class="highlighter-rouge">fixed</code></li>
<li>a managed array that we previously pinned with <code class="highlighter-rouge">GCHandle.Alloc</code></li>
<li>a fixed-sized buffer that we previously <code class="highlighter-rouge">fixed</code></li>
<li>the contents of a <code class="highlighter-rouge">string</code> that we previously <code class="highlighter-rouge">fixed</code></li>
<li>a <em>coerced</em> pointer from any of the above (I’ll explain what this means below)</li>
<li>a chunk of unmanaged memory obtained with <code class="highlighter-rouge">Marshal.AllocHGlobal</code> or any other unmanaged memory API</li>
<li>etc</li>
</ul>
<p>All of these will necessarily involve <code class="highlighter-rouge">unsafe</code>, but: we’ll tread carefully! Let’s have a look at a <code class="highlighter-rouge">stackalloc</code> example (<code class="highlighter-rouge">stackalloc</code> is where you obtain a chunk of data directly on the call-stack):</p>
<div class="highlighter-rouge"><pre class="highlight"><code>void StackAllocExample() {
unsafe {
byte* data = stackalloc byte[128];
var span = new Span<byte>(data, 128);
// not shown: populate data / span
ProcessData(span);
}
}
void ProcessData(Span<byte> span) {
for (int i = 0; i < span.Length; i++) {
DoSomething(span[i]);
}
}
</code></pre>
</div>
<p>That’s… actually pretty huge! We just used the exact same processing code to handle an array and a pointer, and we didn’t need to use <code class="highlighter-rouge">unsafe</code> (except in the code that initially obtained the pointer). This opens up a <em>huge</em> range of possibilities, especially for things like network IO and serialization. Even better, it means that we can do all of the above with a “zero copy” mentality: rather than having managed code writing to a <code class="highlighter-rouge">byte[]</code> that later gets copied to some unmanaged chunk (for whatever IO we need), we can write <em>directly</em> to the unmanaged memory via a <code class="highlighter-rouge">Span<T></code>.</p>
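<p>As an aside that postdates the preview bits described here: C# 7.2 went on to let <code class="highlighter-rouge">stackalloc</code> target a span directly, so even the allocation side loses its <code class="highlighter-rouge">unsafe</code> block. A minimal sketch:</p>

```csharp
using System;

class SafeStackAllocExample
{
    static void StackAllocExample()
    {
        // C# 7.2+: stackalloc assigned straight to Span<T> - no unsafe needed
        Span<byte> span = stackalloc byte[128];
        // not shown: populate span
        ProcessData(span);
    }

    static void ProcessData(Span<byte> span)
    {
        for (int i = 0; i < span.Length; i++)
            DoSomething(span[i]);
    }

    static void DoSomething(byte value) { /* placeholder */ }
}
```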
<h2 id="slice-and-dice">Slice and dice</h2>
<p>A very common scenario when working with buffers and buffer segments is the need to sub-divide the buffer. <code class="highlighter-rouge">Span<T></code> makes this easy via the <code class="highlighter-rouge">Slice()</code> method, best illustrated by an example:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>void ProcessData(Span<byte> span) {
while(span.Length > 0) {
// first byte is single-byte length-prefix
int len = span[0];
// process the next "len" bytes
ProcessChunk(span.Slice(1, len));
// move forward len+1 bytes
span = span.Slice(len + 1);
}
}
</code></pre>
</div>
<p>This isn’t something we couldn’t do other ways, but it is very <em>convenient</em> here. Importantly, we haven’t <em>allocated</em> anything here - there’s no “new array” or similar - we just have a reference to a <em>different part</em> of the existing range, and / or a different length.</p>
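<p>To see the slice-and-dice loop run end-to-end, here is a hypothetical self-contained version - the sample data and the <code class="highlighter-rouge">ProcessChunk</code> body are invented purely for illustration:</p>

```csharp
using System;

class LengthPrefixDemo
{
    static void Main()
    {
        // two length-prefixed chunks: [2: 'A','B'] then [1: 'C']
        byte[] data = { 2, (byte)'A', (byte)'B', 1, (byte)'C' };
        ProcessData(data); // byte[] converts implicitly to Span<byte>
    }

    static void ProcessData(Span<byte> span)
    {
        while (span.Length > 0)
        {
            int len = span[0];                 // single-byte length prefix
            ProcessChunk(span.Slice(1, len));  // the next "len" bytes
            span = span.Slice(len + 1);        // advance past prefix + payload
        }
    }

    static void ProcessChunk(ReadOnlySpan<byte> chunk)
        => Console.WriteLine($"chunk of {chunk.Length} byte(s)");
}
```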
<h2 id="coercion">Coercion</h2>
<p>A more interesting example is coercion; this is something that you can do with pointers, but is very hard to do with arrays. A classic scenario here would be IO / serialization: you have a chunk of bytes, and at <em>some point</em> in that data you need to treat the data as fixed-size <code class="highlighter-rouge">int</code>, <code class="highlighter-rouge">float</code>, <code class="highlighter-rouge">double</code>, etc data. In the world of pointers, you just… <em>do that</em>:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>byte* raw = ...
float* floats = (float*)raw;
float x = floats[0], y = floats[1]; // consume 8 bytes
</code></pre>
</div>
<p>With arrays, there is no <em>direct</em> way to do this; you’d either need to use <code class="highlighter-rouge">unsafe</code> hacks, or you can use <code class="highlighter-rouge">BitConverter</code> if the types you need are supported. But this is easy with <code class="highlighter-rouge">Span<T></code>:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>Span<byte> raw = ...
var floats = raw.NonPortableCast<byte, float>();
float x = floats[0], y = floats[1]; // consume 8 bytes
</code></pre>
</div>
<p>Not only can we <em>do it</em>, but we have the added advantage that it has correctly tracked the end range for us during the conversion - we will find that <code class="highlighter-rouge">floats.Length</code> is equal to <code class="highlighter-rouge">raw.Length / 4</code> (since each <code class="highlighter-rouge">float</code> requires 4 bytes). The important thing to realise here is that we haven’t <em>copied any data</em> - we’re still looking at the exact same place in memory - but instead of treating it as a <code class="highlighter-rouge">ref byte</code>, we’re treating it as a <code class="highlighter-rouge">ref float</code>.</p>
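<p>To demonstrate the no-copy behaviour, here is a hedged sketch; note that <code class="highlighter-rouge">NonPortableCast</code> was the corefxlab-era name - the equivalent API that eventually shipped is <code class="highlighter-rouge">MemoryMarshal.Cast</code>, used below:</p>

```csharp
using System;
using System.Runtime.InteropServices;

class CoercionDemo
{
    static void Main()
    {
        byte[] data = new byte[8];
        Span<byte> raw = data;

        // NonPortableCast was the corefxlab-era name; the API that
        // eventually shipped with the same behaviour is MemoryMarshal.Cast
        Span<float> floats = MemoryMarshal.Cast<byte, float>(raw);

        Console.WriteLine(floats.Length); // 8 bytes / 4 bytes-per-float = 2

        floats[0] = 1f;             // write through the float view...
        Console.WriteLine(data[3]); // ...and the underlying bytes change
                                    // (63, i.e. 0x3F, on little-endian)
    }
}
```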
<h2 id="except-better">Except… better!</h2>
<p>We observed that with pointers we could coerce from <code class="highlighter-rouge">byte*</code> to <code class="highlighter-rouge">float*</code>. That’s fine, but you can’t use pointers with all types. <code class="highlighter-rouge">Span<T></code> has <em>much stronger</em> support here. A particularly interesting illustration is <a href="https://en.wikipedia.org/wiki/SIMD">SIMD</a>, which is exposed in .NET via <a href="https://msdn.microsoft.com/en-us/library/dn858385(v=vs.111).aspx"><code class="highlighter-rouge">Vector<T></code></a>. A vexing limitation of pointers is that we <em>cannot</em> talk about a <code class="highlighter-rouge">Vector<float>*</code> pointer (for example). This means that we can’t use pointer coercion as a convenient way of reading and writing SIMD vectors (you’ll usually have to use <a href="https://www.nuget.org/packages/System.Runtime.CompilerServices.Unsafe/"><code class="highlighter-rouge">Unsafe.Read<T></code> and <code class="highlighter-rouge">Unsafe.Write<T></code></a> instead). But we <em>can</em> coerce directly to <code class="highlighter-rouge">Vector<T></code> from a span! Here’s an example that might come up in things like applying the web-sockets xor mask to a received frame’s payload:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>void ApplyXor(Span<byte> span, uint mask) {
if(Vector.IsHardwareAccelerated) {
// apply the mask to SIMD-width bytes at a time
var vectorMask = new Vector<uint>(mask);
var typed = span.NonPortableCast<byte, Vector<uint>>();
for (int i = 0; i < typed.Length; i++) {
typed[i] ^= vectorMask;
}
// move past that data (each vector covered Vector<byte>.Count bytes)
span = span.Slice(typed.Length * Vector<byte>.Count);
}
// not shown - finish any remaining data
}
</code></pre>
</div>
<p>That’s pretty minimal code for vectorizing something; it is especially nice that we didn’t even need to do the math to figure out the vectorizable range - <code class="highlighter-rouge">typed.Length</code> did everything we wanted. It would be premature for me to know for sure, but I’m also hopeful that these <code class="highlighter-rouge">0</code>-<code class="highlighter-rouge">Span<T>.Length</code> loops will also elide the bounds check in the same way that array access from <code class="highlighter-rouge">0</code>-<code class="highlighter-rouge">T[].Length</code> elides the bounds check.</p>
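<p>Filling in the “finish any remaining data” step, here is one hedged end-to-end sketch; it uses <code class="highlighter-rouge">MemoryMarshal.Cast</code> (the shipped equivalent of corefxlab’s <code class="highlighter-rouge">NonPortableCast</code>) and assumes a little-endian host with the 4 mask bytes packed into the <code class="highlighter-rouge">uint</code> in memory order:</p>

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

static class WebSocketMask
{
    public static void ApplyXor(Span<byte> span, uint mask)
    {
        if (Vector.IsHardwareAccelerated)
        {
            // apply the mask to SIMD-width bytes at a time
            var vectorMask = new Vector<uint>(mask);
            var typed = MemoryMarshal.Cast<byte, Vector<uint>>(span);
            for (int i = 0; i < typed.Length; i++)
                typed[i] ^= vectorMask;
            // each vector covered Vector<byte>.Count bytes
            span = span.Slice(typed.Length * Vector<byte>.Count);
        }
        // whole uints next
        var ints = MemoryMarshal.Cast<byte, uint>(span);
        for (int i = 0; i < ints.Length; i++)
            ints[i] ^= mask;
        span = span.Slice(ints.Length * sizeof(uint));
        // final 0-3 bytes; the offset so far is a multiple of 4, so the
        // mask phase restarts cleanly (little-endian assumed throughout)
        for (int i = 0; i < span.Length; i++)
        {
            span[i] ^= (byte)mask;
            mask = (mask >> 8) | (mask << 24); // rotate to the next mask byte
        }
    }
}
```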
<h2 id="and-readonly-too">And readonly too!</h2>
<p>Pointers are notoriously permissive; if you have a pointer: you can do anything. You can use <code class="highlighter-rouge">fixed</code> to obtain the <code class="highlighter-rouge">char*</code> pointer inside a <code class="highlighter-rouge">string</code>: if you change the data via the pointer, the <code class="highlighter-rouge">string</code> now has different contents. <code class="highlighter-rouge">string</code> is not immutable if you allow <code class="highlighter-rouge">unsafe</code>: <em>nothing</em> is immutable if you allow <code class="highlighter-rouge">unsafe</code>. But just as we can obtain a <code class="highlighter-rouge">Span<T></code>, we can also get a <code class="highlighter-rouge">ReadOnlySpan<T></code>. If you only expect a method to read the data, you can give them a <code class="highlighter-rouge">ReadOnlySpan<T></code>.</p>
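<p>A minimal sketch of what that buys us (<code class="highlighter-rouge">Checksum</code> here is a hypothetical method): <code class="highlighter-rouge">Span<T></code> converts implicitly to <code class="highlighter-rouge">ReadOnlySpan<T></code>, and the compiler rejects writes through the read-only view:</p>

```csharp
using System;

public class ReadOnlyDemo
{
    static void Main()
    {
        Span<byte> span = new byte[] { 1, 2, 3 };
        Console.WriteLine(Checksum(span)); // implicit conversion; prints 6
    }

    public static int Checksum(ReadOnlySpan<byte> data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        // data[0] = 0; // won't compile: no writable indexer on ReadOnlySpan<T>
        return sum;
    }
}
```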
<h2 id="zero-cost-substrings">Zero-cost substrings</h2>
<p>In the “corefxlab” preview code, there’s a method-group with signatures along the lines of:</p>
<div class="highlighter-rouge"><pre class="highlight"><code> public static ReadOnlySpan<char> Slice(this string text, ...)
</code></pre>
</div>
<p>(where the overloads allow an initial range to be specified). This gives us a <code class="highlighter-rouge">ReadOnlySpan<char></code> that points directly at a range <em>inside</em> the <code class="highlighter-rouge">string</code>. If we want a substring, we can just <code class="highlighter-rouge">Slice()</code> again and again - with zero allocations and zero string copying - we just have different spans over the same data. A rich set of APIs already exists in the corefxlab code for working with this type of string-like data. If you do a lot of text processing, this could have some really interesting aspects.</p>
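<p>For a flavour of the idea using the shape that eventually shipped (<code class="highlighter-rouge">MemoryExtensions.AsSpan</code> rather than the corefxlab <code class="highlighter-rouge">Slice</code> extension shown above): repeated slicing never copies or allocates; only <code class="highlighter-rouge">ToString()</code> materialises a new string:</p>

```csharp
using System;

class SubstringDemo
{
    static void Main()
    {
        string text = "hello, world";
        ReadOnlySpan<char> span = text.AsSpan(7);  // "world" - no copy
        ReadOnlySpan<char> sub = span.Slice(0, 3); // "wor" - still no copy
        Console.WriteLine(sub.Length);             // 3
        // only pay for an allocation when a real string is needed:
        string s = sub.ToString();                 // "wor"
        Console.WriteLine(s);
    }
}
```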
<h2 id="this-all-sounds-too-good-to-be-true---whats-the-catch">This all sounds too good to be true - what’s the catch?</h2>
<p>Here’s the gotcha: in order to have the appropriate <em>correctness</em> guarantees when discussing something that <em>could</em> be a managed object, <em>could</em> be data on the stack, or <em>could</em> be unmanaged data, we run into very similar problems that make it impossible to store a <code class="highlighter-rouge">ref T</code> local as a field. Remember that a <code class="highlighter-rouge">Span<T></code> is <em>conceptually</em> a <code class="highlighter-rouge">ref T</code> (reference) and <code class="highlighter-rouge">int</code> (length) - well: we still need to obey the rules imposed by that “conceptually”. For a trivial example of how we can get in a mess, we can tweak our <code class="highlighter-rouge">stackalloc</code> example:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>private Span<byte> _span;
unsafe void StackAllocExample() {
byte* data = stackalloc byte[128];
_span = new Span<byte>(data, 128);
...
}
void SomeWhileLater() {
ProcessData(_span);
}
</code></pre>
</div>
<p>Where does <code class="highlighter-rouge">_span</code> refer to in <code class="highlighter-rouge">SomeWhileLater</code>? I can’t tell you. We get into similar problems with anything that used <code class="highlighter-rouge">fixed</code> to get a pointer - the pointer is only guaranteed to make sense inside the <code class="highlighter-rouge">fixed</code>. Conceptually the issue is not restricted to pointers - it would apply equally if we could initialize <code class="highlighter-rouge">Span<T></code> directly with a <code class="highlighter-rouge">ref T</code> constructor:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>private Span<SomeStruct> _span;
void StackRefExample() {
var val = new SomeStruct(123, 456);
_span = new Span<SomeStruct>(ref val);
// ^^^ hypothetical span of length 1
}
</code></pre>
</div>
<p>We didn’t even need <code class="highlighter-rouge">unsafe</code> to break things this time. No such constructor currently exists, very wisely!</p>
<p>We <em>should</em> be OK if we only ever use managed heap objects (arrays, etc) to initialize a <code class="highlighter-rouge">Span<T></code>, but the <em>entire point</em> of <code class="highlighter-rouge">Span<T></code> is to provide feature parity between things like arrays and pointers while making it hard to shoot yourself in the foot.</p>
<p>In addition to this, we also need to worry about <em>atomicity</em>. The runtime and language guarantee that a <em>single reference</em> can be read atomically (in one CPU instruction), but it makes no guarantees about anything larger. If we have a reference <em>and a length</em>, we start getting into very complex issues around <a href="http://joeduffyblog.com/2006/02/07/threadsafety-torn-reads-and-the-like/">“torn” values</a> (an invalid pair of the reference and length that didn’t actually exist, due to two threads squabbling). A torn value is vexing at the best of times, but in this case it would lead to valid-looking code accessing unexpected memory - a very bad thing.</p>
<p>The <code class="highlighter-rouge">stackalloc</code> example above is a perfect example of code that will compile without complaint today, but will end <em>very very badly</em> - although we used <code class="highlighter-rouge">unsafe</code>, so: self-inflicted. But this and the atomicity issue are both illustrations of why we have…</p>
<h2 id="the-important-big-rule-of-spans">The Important Big Rule Of Spans</h2>
<p><strong><code class="highlighter-rouge">Span<T></code> has undefined behavior off the stack</strong>. And in the future: may not be allowed off the stack at all - this means no fields, no arrays, no boxing, etc. In the same way that <code class="highlighter-rouge">ref T</code> only has defined behavior on the stack (locals, parameters, return values) - so <code class="highlighter-rouge">Span<T></code> only has defined behavior on the stack. You are not meant to <strong>ever</strong> put a <code class="highlighter-rouge">Span<T></code> in a field (including all those times when things look like locals but are actually fields, that I touched on last time). An immediate consequence of this is that atomicity is no longer an issue: each stack is specific to a single thread; if our value can’t escape the stack, then two threads can’t have competing reads and writes.</p>
<p>There’s <a href="https://github.com/dotnet/csharplang/pull/472">some in-progress discussion</a> on how the rules for this requirement should work, but it <em>looks</em> like the concept of a “ref-like” stack-only type is being introduced. <code class="highlighter-rouge">ref T</code> as a field would be ref-like, and <code class="highlighter-rouge">Span<T></code> would be ref-like. Any ref-like type would only be valid directly on the stack, or as an instance field (not a <code class="highlighter-rouge">static</code> field) on a ref-like type. If I had to <strong><em>speculate</em></strong> at syntax, I’d expect this to look something like:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>public ref struct Span<T> {
ref T _reference;
int _length;
public ref T this[int index] { get {...} }
}
</code></pre>
</div>
<p>Emphasis: this syntax is pure speculation based on the historic reluctance to introduce new keywords, but the <code class="highlighter-rouge">ref struct</code> here denotes a ref-like type. It could also be done via attributes or a range of other ideas, but note that we’re now allowed to embed the ref-like <code class="highlighter-rouge">ref T</code> field. Additionally, the compiler and runtime would verify that <code class="highlighter-rouge">Span<T></code> is never used illegally as a field or in an array etc. Notionally, we could also do this for our own types that shouldn’t escape the stack, if we have similar semantics but <code class="highlighter-rouge">Span<T></code> doesn’t represent our scenario.</p>
<p>Thinking back to the <code class="highlighter-rouge">StackRefExample</code>, if we <em>wanted</em> to safely support usage like:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>var val = new SomeStruct(123, 456);
var span = new Span<SomeStruct>(ref val); // local, not field
</code></pre>
</div>
<p>then presumably it <em>could</em> work, but we’d have to have similar logic about <em>returning</em> ref-like types as currently exists for <code class="highlighter-rouge">ref return</code>, further complicated by the fact that we don’t have the single-assignment guarantee - we can <em>reassign</em> a <code class="highlighter-rouge">Span<T></code>. If ref-like types work in the general case, then the logic about <em>passing and returning</em> such a value needs ironing out. And that’s complex. I’m very happy to <a href="https://github.com/dotnet/csharplang/pull/472">defer to Vladimir Sadov</a> on this!</p>
<p>EDIT: to clarify - it is only the <em>pair of <code class="highlighter-rouge">ref T</code> and <code class="highlighter-rouge">length</code></em> (together known as a span, <code class="highlighter-rouge">Span<T></code> or <code class="highlighter-rouge">ReadOnlySpan<T></code>) that need to stay on the stack; the memory that we're <em>spanning</em> can be <em>anywhere</em> - and will often be part of a regular array (<code class="highlighter-rouge">T[]</code>) on the managed heap. It <em>could</em> also be a reference to the unmanaged heap, or to a separate part of the current stack.</p>
<h2 id="so-how-am-i-meant-to-work-with-spans">So how am I meant to work with spans?</h2>
<p>Sure, not everything is on the stack.</p>
<p>This isn’t as much of a limitation as it sounds. Instead of storing the <code class="highlighter-rouge">Span<T></code> itself, you just need to store <em>something that can manifest a span</em>. For example, if you’re actually using arrays you might have a type that <em>contains</em> an <code class="highlighter-rouge">ArraySegment<T></code>, but which has a property:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>public Span<T> Span { get { ... } }
</code></pre>
</div>
<p>As long as you can switch into <code class="highlighter-rouge">Span<T></code> mode when you’re inside an appropriate method, all is good.</p>
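<p>A hypothetical wrapper along those lines - the type and its names are invented for illustration - holds an <code class="highlighter-rouge">ArraySegment<T></code> and manifests a span on demand:</p>

```csharp
using System;

public struct BufferSlice<T>
{
    private readonly ArraySegment<T> _segment;

    public BufferSlice(T[] array, int offset, int count)
        => _segment = new ArraySegment<T>(array, offset, count);

    // the struct itself is safe to store in a field; a fresh Span<T>
    // is manifested on the stack each time the property is read
    public Span<T> Span => new Span<T>(
        _segment.Array, _segment.Offset, _segment.Count);
}

class WrapperDemo
{
    static void Main()
    {
        var slice = new BufferSlice<byte>(new byte[1024], 10, 512);
        Span<byte> span = slice.Span; // switch into span mode on demand
        span[0] = 42;
    }
}
```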
<p>For a more unified model, the corefxlab code contains the <code class="highlighter-rouge">Buffer<T></code> concept, but it is still very much a work in progress. We’ll have to see how it shakes out in time.</p>
<h2 id="wait-why-so-much-ref-previously">Wait… why so much <code class="highlighter-rouge">ref</code> previously?</h2>
<p>We covered a <em>lot</em> of <code class="highlighter-rouge">ref</code> details - you might feel cheated. Well, <em>partly</em> we needed that information to <em>understand</em> the stack-only semantics of <code class="highlighter-rouge">Span<T></code>. But there’s more! <code class="highlighter-rouge">Span<T></code> also exposes the <code class="highlighter-rouge">ref T</code> directly via the aptly named <code class="highlighter-rouge">DangerousGetPinnableReference()</code> method. This is a <code class="highlighter-rouge">ref return</code>, and allows us to do any of:</p>
<ul>
<li>store the <code class="highlighter-rouge">ref return</code> into a <code class="highlighter-rouge">ref</code> local and work with it</li>
<li>pass the <code class="highlighter-rouge">ref return</code> as a <code class="highlighter-rouge">ref</code> or <code class="highlighter-rouge">out</code> parameter to another method</li>
<li>use <code class="highlighter-rouge">fixed</code> to convert the <code class="highlighter-rouge">ref</code> to a pointer (preventing GC movement at the same time)</li>
</ul>
<p>The latter option means that <em>not only</em> can we get from <code class="highlighter-rouge">unsafe</code> to <code class="highlighter-rouge">Span<T></code>, but we can go the other direction if we need:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>fixed(byte* ptr = &span.DangerousGetPinnableReference())
{ ... }
</code></pre>
</div>
<h2 id="if-i-can-get-a-ref-can-i-escape-the-bounds">If I can get a <code class="highlighter-rouge">ref</code>, can I escape the bounds?</h2>
<p>The <code class="highlighter-rouge">DangerousGetPinnableReference()</code> method gives us back a <code class="highlighter-rouge">ref</code> to the start of the range, comparable to how a <code class="highlighter-rouge">T*</code> pointer refers to the start of a range in pointer terms. So: can we use this to get around the range constraints? Well… yes… ish:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>ref int somewhere = ref Unsafe.Add(
ref span.DangerousGetPinnableReference(), 5000);
</code></pre>
</div>
<p>This cheeky duo gives us a reference to <em>whatever</em> is 5000-integers ahead of the span we were thinking of. It <em>might</em> still be part of our data (if we have a large array, for example), or it might be something completely random. But the sharp eyed might have noticed some key words in that expression… “<code class="highlighter-rouge">Unsafe...</code>” and “<code class="highlighter-rouge">Dangerous...</code>”. If you keep sprinting past signs with words like that on: expect to hit rocks. There’s nothing here that you couldn’t already do with <code class="highlighter-rouge">unsafe</code> code, note.</p>
<h2 id="doing-crazy-things-with-unmanaged-memory">Doing crazy things with unmanaged memory</h2>
<p>Sometimes you need to use unmanaged memory - this could be because of memory / collection issues, or could be because of interfacing with unmanaged systems - I use it in CUDA work, for example, where the CUDA driver has to allocate the memory in a special way to get optimal performance. Historically, working with unmanaged memory <em>is hard</em> - you will be using pointers all the time. But we can simplify everything by using spans. Here’s our dummy type that we will store in unmanaged memory:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>// could be explicit layout to match external definition
struct SomeType
{
public SomeType(int id, DateTime creationDate)
{
Id = id;
_creationDate = creationDate.ToEpochTime();
// ...
}
public int Id { get; }
private long _creationDate;
public DateTime CreationDate => _creationDate.FromEpochTime();
// ...
public override string ToString()
=> $"{Id}: {CreationDate}, ...";
}
</code></pre>
</div>
<p>We’ll need to allocate some memory and ensure it is collected, usually via a finalizer in a wrapper class:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>unsafe class UnmanagedStuff : IDisposable
{
private SomeType* ptr;
public UnmanagedStuff(int count)
{
ptr = (SomeType*) Marshal.AllocHGlobal(
sizeof(SomeType) * count).ToPointer();
}
~UnmanagedStuff() { Dispose(false); }
public void Dispose() => Dispose(true);
private void Dispose(bool disposing)
{
if(disposing) GC.SuppressFinalize(this);
var ip = new IntPtr(ptr);
if (ip != IntPtr.Zero) Marshal.FreeHGlobal(ip); // pairs with AllocHGlobal
ptr = default(SomeType*);
}
}
</code></pre>
</div>
<p>The wrapper type needs to know about the pointers, so is going to be <code class="highlighter-rouge">unsafe</code> - but does the <em>rest</em> of the code need to? Sure, we could add an indexer that uses <code class="highlighter-rouge">Unsafe.Read</code> / <code class="highlighter-rouge">Unsafe.Write</code> to access individual elements, but that means copying the data constantly, which is probably not what we want - and it doesn’t help us represent ranges. But <em>spans</em> do: we can return a <em>span</em> of the data (perhaps via a <code class="highlighter-rouge">Slice()</code> API):</p>
<div class="highlighter-rouge"><pre class="highlight"><code>public Span<SomeType> Slice(int offset, int count)
=> new Span<SomeType>(ptr + offset, count);
// ^^^ not shown: validate range first
</code></pre>
</div>
<p>And we can consume this pretty naturally <em>without</em> <code class="highlighter-rouge">unsafe</code>:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>// "stuff" is our UnmanagedStuff object
// easily talk about a slice of unmanaged data
var slice = stuff.Slice(5, 10);
slice[0] = new SomeType(123, DateTime.Now);
// (separate slices work)
slice = stuff.Slice(0, 25);
Console.WriteLine(slice[5]); // 123: 23/04/2017 09:09:51, ...
</code></pre>
</div>
<p>If we want to talk about <em>individual</em> elements (rather than a range), then a <code class="highlighter-rouge">ref</code> local (via a <code class="highlighter-rouge">ref return</code>) is what we want; we <em>could</em> use the <code class="highlighter-rouge">DangerousGetPinnableReference()</code> API on a <code class="highlighter-rouge">Span<T></code> for this, but in this case it is probably easier just to use <code class="highlighter-rouge">Unsafe</code> directly:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>public ref SomeType this[int index]
=> ref Unsafe.AsRef<SomeType>(ptr + index);
// ^^^ not shown: validate range first
</code></pre>
</div>
<p>We can consume this with similar ease:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>// talk about a *reference* to unmanaged data
ref SomeType item = ref stuff[5];
Console.WriteLine(item); // 123: 23/04/2017 09:09:51, ...
item = new SomeType(42, new DateTime(2016, 1, 8));
// prove that updated *inside* the slice
Console.WriteLine(slice[5]); // 42: 08/01/2016 00:00:00, ...
</code></pre>
</div>
<p>And now from <strong>any</strong> code, we can talk directly to the unmanaged memory simply by passing it in as a <code class="highlighter-rouge">ref</code> parameter - it will never be copied, just dereferenced. If you <em>want</em> to talk about an isolated copy or store a copy as a field, <em>then</em> you can dereference, but that is easy:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>SomeType isolated = item;
</code></pre>
</div>
<p>If you’ve ever worked with unmanaged memory from C#, this is a <em>huge</em> difference - and opens up a whole range of interesting scenarios for allocation-free systems <em>without</em> requiring the entire codebase to be <code class="highlighter-rouge">unsafe</code>. For context, in an allocation-free system, the lifetime of a set of data is strictly defined by some unit of work - processing an inbound request, for example. This means we don’t <em>need</em> reference tracking and garbage collection (and GC pauses can hurt high performance systems), so instead we simply take some slabs of memory, work from them (incrementing counters as we consume space), and then when we’ve finished the request we just set all the counters back to zero and we’re ready for the next request, no mess. Spans and <code class="highlighter-rouge">ref</code> locals and <code class="highlighter-rouge">ref return</code> make this friendly, even in the unmanaged memory scenario. The only caveat being - once again: <code class="highlighter-rouge">Span<T></code> and <code class="highlighter-rouge">ref T</code> cannot legally escape the stack. But as we’ve seen, we can <em>expose on-demand</em> a <code class="highlighter-rouge">Span<T></code> or <code class="highlighter-rouge">ref T</code> - so it <em>isn’t a burden</em>.</p>
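<p>To make the allocation-free pattern concrete, here is a hypothetical “arena” sketch (names invented; a managed slab is used for simplicity, but the same shape works over unmanaged memory): spans are handed out by bumping a counter, and the end of a request “frees” everything by resetting it:</p>

```csharp
using System;

public sealed class Arena
{
    private readonly byte[] _slab;
    private int _used;

    public Arena(int size) => _slab = new byte[size];

    // hand out the next "count" bytes of the slab - no allocation
    public Span<byte> Rent(int count)
    {
        if (_used + count > _slab.Length)
            throw new InvalidOperationException("arena exhausted");
        var span = new Span<byte>(_slab, _used, count);
        _used += count;
        return span;
    }

    // end of request: everything is "freed" by resetting the counter
    public void Reset() => _used = 0;
}
```

The stack-only rule actually helps here: because a span cannot be stored in a field, callers are naturally prevented from accidentally keeping a rented span beyond the request.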
<h2 id="summary">Summary</h2>
<p>Spans; they’re very powerful <em>if</em> you need that kind of thing. And they force a range of new concepts into C#, giving us all the combined strong points of arrays, pointers, references and generics - with very few of the pain points. If you <em>don’t care</em> about pointers, buffers, etc - you probably won’t need to learn about spans. But if you <em>do</em>, <em>they’re awesome</em>. The amount of effort the .NET folks (and the community, but mostly Microsoft) have made making this span concept so rich and powerful is <em>huge</em> - it impacts the compiler, the JIT, the runtime, and multiple libraries both pre-existing and brand new. And it impacts both .NET and .NET Core. As someone who works <em>a lot</em> in the areas affected by spans and <code class="highlighter-rouge">ref</code> - it is also hugely appreciated. Good things are coming.</p>Marc Gravellhttp://www.blogger.com/profile/01023334706549710089noreply@blogger.com