<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>bdunagan</title>
    <description>fill the void</description>
    <link>https://bdunagan.com</link>
    <atom:link href="https://bdunagan.com/feed.xml" rel="self" type="application/rss+xml" />
    
      <item>
        <title>How Faster Compute, More Data, and Better Algorithms Scaled AI to ChatGPT</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/scaling_ai_to_chatgpt.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/nIEHqGSymRU&quot;&gt;Unsplash&lt;/a&gt; - @viazavier&lt;/div&gt;

&lt;p&gt;I heard someone comment that NeurIPS (a machine learning and computational neuroscience conference) has grown in interest and narrowed in focus over the last two decades: there were 500 people in 2003 discussing many different AI fields, 1,500 people in 2013 focused on machine learning, and 15,000 people in 2023 only talking about Transformers.&lt;/p&gt;

&lt;p&gt;Why? The Transformer neural network architecture scales results in terms of compute, data, and parameters. There are now 2-trillion parameter LLMs trained on 30-trillion token data sets and running on thousands of GPUs. Billions of people now use LLMs in Google Search’s AI Overview and OpenAI’s ChatGPT.&lt;/p&gt;

&lt;p&gt;Let’s walk through the three fundamental forces that have pushed the field forward at such a fast pace: faster compute, more data, and better algorithms.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-microchip&quot;&gt;&lt;/i&gt;&lt;br /&gt;Faster Compute&lt;/h3&gt;

&lt;p&gt;LLMs don’t run on CPUs. They run on GPUs. In &lt;a href=&quot;https://bdunagan.com/2024/03/23/building-a-10-million-parameter-llm/&quot;&gt;Building a 10-Million Parameter LLM with 300 Lines of Python and Training It in 10 Minutes&lt;/a&gt;, I tried running an 800k-parameter LLM on a CPU and a GPU: the GPU was 118x faster (2 hours vs 1 minute). CPUs are designed for serial execution of any set of instructions; they can handle running an OS, writing a Word document, and resizing a photo. GPUs are designed for parallelized execution of a specific set of instructions: math, specifically for graphics. It’s even in the name. GPU stands for Graphics Processing Unit, as they were originally designed for computer games.&lt;/p&gt;

&lt;p&gt;CPUs have a small number of processing cores (8-core or even 64-core), but GPUs have &lt;em&gt;thousands&lt;/em&gt; of cores. The Nvidia GeForce RTX 4090 ($1,600 MSRP) has 16,384 cores on a single card. Nvidia wrote a huge software framework called CUDA to abstract away the low-level details of all those cores, so that developers could write simpler high-level software to leverage them in parallel.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/scaling_ai_compute.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;strong&gt;&lt;a href=&quot;https://www.topcpu.net/en/gpu-r&quot;&gt;Compute&lt;/a&gt;&lt;/strong&gt;: 100 GFLOPS to 100 TFLOPS in 20 years&lt;/div&gt;

&lt;!-- 2005: 10^2 (7800 GTX)
2012: 10^3 (K20)
2017: 10^4 (Titan)
2025: 10^5 (Blackwell) --&gt;

&lt;p&gt;Moreover, the amount of compute per card has been growing incredibly fast. Over the last twenty years, the compute power of a single GPU card has grown 1000x from 10^2 GFLOPS to 10^5 GFLOPS. FLOPS stands for floating-point operations per second. GFLOPS stands for giga-FLOPS, or one billion FLOPS, and TFLOPS stands for tera-FLOPS, or 1,000 GFLOPS. The latest GPU cards clock in at 100 TFLOPS.&lt;/p&gt;

&lt;p&gt;It’s purely coincidental that neural networks, like computer graphics, also require processing massive amounts of math that can be parallelized, but GPUs are a fantastic fit for neural networks. Fundamentally, GPUs are so much faster than CPUs at machine learning specifically because they are &lt;em&gt;designed&lt;/em&gt; for solving that specific numerical problem incredibly efficiently, processing math computation over thousands of cores in parallel.&lt;/p&gt;

&lt;p&gt;Tools such as PyTorch abstract away CUDA and tailor the experience to neural network programming. With PyTorch, developers can simply assign models to “the GPU”, without any lower-level understanding of blocks, threads, or cores. In fact, I built a 10-million parameter LLM in the post linked above by training and running inference on a single GPU.&lt;/p&gt;

&lt;p&gt;However, a 2-trillion parameter model does not train or run on “the GPU”. PyTorch isn’t magic and reinforces the lesson that LLMs are a leaky abstraction at scale. To train a model that requires thousands of individual GPUs, the model needs to be split up using parallelism, referred to as 3D parallelism (as there are three independent axes):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Data Parallelism (DP)&lt;/strong&gt;: Data is split into batches. Parameter sharding is a memory optimization on top of this with tools like ZeRO (Zero Redundancy Optimizer) and FSDP (Fully Sharded Data Parallel). This is for training or high-volume inference.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Tensor Parallelism (TP)&lt;/strong&gt;: Tensors are split within a single layer.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Pipeline Parallelism (PP)&lt;/strong&gt;: The pipeline between layers is split.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Meta said that pre-training Llama 4 Behemoth (the 2T-parameter model) required 32,000 GPUs, but let’s focus on the simpler task of running the model. Llama 4 Behemoth might be a 2T-parameter model, but it uses Mixture-of-Experts (see “Better Algorithms”) and so only has 288B active parameters.&lt;/p&gt;

&lt;p&gt;In practice, running the model might use an architecture like this using Nvidia 80GB H100s: 1 DP x 2 PP x 8 TP or 16 GPUs. That setup enables two axes of parallelism: splitting the model across layers for pipeline parallelism and then splitting within the individual layers for tensor parallelism. In fact, for a single request, pipeline parallelism is used simply to fit the model into GPU memory; when there are multiple requests, pipeline parallelism enables micro-batching to fully utilize the GPUs as each request progresses through the model layers.&lt;/p&gt;
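
&lt;p&gt;As a rough illustration of how the three axes multiply, here is a minimal Python sketch that maps each GPU rank to a coordinate in the 1 DP x 2 PP x 8 TP layout described above. The function name and the rank ordering are assumptions for illustration, not how any particular serving framework assigns devices.&lt;/p&gt;

```python
from itertools import product

# Hypothetical layout from the paragraph above: 1 data-parallel replica,
# 2 pipeline stages, and 8 tensor-parallel shards per stage.
DP, PP, TP = 1, 2, 8

def build_device_mesh(dp, pp, tp):
    """Map each GPU rank to its (dp, pp, tp) coordinate."""
    coords = product(range(dp), range(pp), range(tp))
    return {rank: coord for rank, coord in enumerate(coords)}

mesh = build_device_mesh(DP, PP, TP)
print(len(mesh))  # 16 GPUs in total (DP * PP * TP)
print(mesh[0])    # (0, 0, 0): first shard of the first pipeline stage
print(mesh[15])   # (0, 1, 7): last shard of the second pipeline stage
```

&lt;p&gt;In this sketch, the eight ranks that share a pipeline stage would each hold a slice of the same layers, and the two stages would hold different layers.&lt;/p&gt;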

&lt;p&gt;Combining all of these axes of parallelism enables LLMs to train and run quickly and efficiently at scale, but keep in mind that this entire description is all to generate a single token. After that token is generated, it’s appended to the context window, and the entire process runs again. That workflow cannot run in parallel because those steps are dependent. Similar to most at-scale cloud services, the actual infrastructure that undergirds being able to use ChatGPT is incredibly complicated.&lt;/p&gt;
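
&lt;p&gt;The token-by-token dependency can be sketched in a few lines of Python. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;toy_next_token&lt;/code&gt; function below is a hypothetical stand-in for a full forward pass through the model; only the shape of the loop is the point.&lt;/p&gt;

```python
def toy_next_token(context):
    """Hypothetical stand-in for a full forward pass through the model."""
    # A real LLM would run every layer across every GPU here; this toy rule
    # just returns the sum of the last two tokens modulo 10.
    return (context[-1] + context[-2]) % 10

def generate(prompt, max_new_tokens):
    context = list(prompt)
    for _ in range(max_new_tokens):
        token = toy_next_token(context)  # each step needs every earlier token
        context.append(token)            # append, then run the model again
    return context

print(generate([1, 1], 5))  # [1, 1, 2, 3, 5, 8, 3]
```

&lt;p&gt;Each iteration reads the token produced by the previous one, which is why the loop itself is sequential even when each forward pass is heavily parallelized.&lt;/p&gt;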

&lt;p&gt;At a much higher level, we can look at how increasing amounts of compute have decreased the costs of generating tokens.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/scaling_ai_token_cost.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;strong&gt;&lt;a href=&quot;https://a16z.com/llmflation-llm-inference-cost/&quot;&gt;Cost&lt;/a&gt;&lt;/strong&gt;: GPT-3: $60, GPT-3.5: $20, Llama 3.2: $0.06&lt;/div&gt;

&lt;!-- 2022: $60 for GPT-3
2023: $20 for GPT-3.5
2024: $0.06 for Llama 3.2 3B --&gt;

&lt;p&gt;Token costs have plummeted by 1,000x over the last two years, from $60/million tokens for GPT-3 to $0.06/million tokens for Llama 3.2. Faster compute is making this massive amount of numerical computation far cheaper.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-database&quot;&gt;&lt;/i&gt;&lt;br /&gt;More Data&lt;/h3&gt;

&lt;p&gt;As compute becomes cheaper, there is an accelerating need for high-quality training data. There are training data sets for vision, speech, and other modalities, but let’s focus on text and see how training data sets have grown over just the last eight years.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/scaling_ai_dataset.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;strong&gt;Training Words&lt;/strong&gt;: GPT-1: 1B, GPT-3: 1T, Llama 4: 20T&lt;/div&gt;

&lt;!-- 2018: 1B words (GPT-1)
2020: 1T words (GPT-3)
2025: 20T words (Llama 4) --&gt;

&lt;p&gt;Since 2018, the training size for large language models has grown 20,000x. GPT-1 was trained on 1B words. Wikipedia has 55 million articles across all languages, comprising 29 billion words, and Llama 4 was trained on almost 1,000x more data, including text, image, and video datasets.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-project-diagram&quot;&gt;&lt;/i&gt;&lt;br /&gt;Better Algorithms&lt;/h3&gt;

&lt;p&gt;Faster compute is great. More data is great. Neither matters if the algorithm (the neural network model) cannot embed the nuance of the data. That’s where Transformers come in. The Transformer architecture drastically increased the number of parameters for neural networks but enabled them to store far more nuance in a parallelizable data flow. More recently, Mixture-of-Experts (MoE) networks pushed the number of parameters even higher and increased the accuracy of the model while keeping the number of “active” parameters much lower.&lt;/p&gt;

&lt;p&gt;Let’s walk through how quickly deep learning neural network models have grown in terms of parameter count.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/scaling_ai_parameters.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;strong&gt;&lt;a href=&quot;https://computerprogress.csail.mit.edu/compute-trends-machine-learning&quot;&gt;Parameters&lt;/a&gt;&lt;/strong&gt;: 117M to 2T in 8 years&lt;/div&gt;

&lt;!-- 2018: GPT-1 with 117m
2019: GPT-2 with 1.5b (10x)
2020: GPT-3 with 175b (100x)
2021: no big improvement
2022: no big improvement
2023: Grok-1 with 314b
2024: DeepSeek v3 with 671b
2025: Llama 4 with 2T (open source/weights) --&gt;

&lt;p&gt;That’s roughly a 17,000x increase in parameters in eight years, from 117 million in GPT-1 to 2 trillion in Llama 4. For every parameter in GPT-1, there are about 17,000 in Llama 4, and every one of those parameters can learn a little bit more about the data.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-lightbulb&quot;&gt;&lt;/i&gt;&lt;br /&gt;The Bitter Lesson&lt;/h3&gt;

&lt;p&gt;Every graph above is log scale.&lt;/p&gt;

&lt;p&gt;The three dimensions driving the current wave of AI are all increasing exponentially. The pace of change feels so blisteringly fast these days precisely because all three drivers are advancing so quickly. We keep shipping faster compute. We keep expanding the data sets to train AI on. We keep improving the algorithms to train. Progress on any one of these would push AI forward. Our current exponential progress on all three made ChatGPT a reality.&lt;/p&gt;

&lt;p&gt;In fact, there is a growing number of examples, from Google’s AlphaZero to DeepSeek’s R1, where compute and search (self-play or reinforcement learning) scale results better than any human-derived heuristic systems that researchers can come up with. Better algorithms, such as Transformers and Mixture-of-Experts, that leverage faster compute and more data continue to produce increasingly accurate results.&lt;/p&gt;

&lt;p&gt;Rich Sutton wrote an influential essay in 2019 titled &lt;a href=&quot;http://incompleteideas.net/IncIdeas/BitterLesson.html&quot;&gt;“The Bitter Lesson”&lt;/a&gt;, arguing that the best way forward for AI is leveraging compute and search:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin…One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are &lt;em&gt;search&lt;/em&gt; and &lt;em&gt;learning&lt;/em&gt;.”&lt;/p&gt;
&lt;/blockquote&gt;
</description>
        <pubDate>Sat, 31 May 2025 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2025/05/31/scaling-compute-data-and-algorithms-to-chatgpt/</link>
        <guid isPermaLink="true">https://bdunagan.com/2025/05/31/scaling-compute-data-and-algorithms-to-chatgpt/</guid>
      </item>
    
      <item>
        <title>Building a 10-Million Parameter LLM with 300 Lines of Python and Training It in 10 Minutes</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_training.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;DALL-E: &quot;2:1 size picture of toy building blocks halfway through building a complicated structure&quot;&lt;/div&gt;

&lt;p&gt;OpenAI’s ChatGPT, Google’s Gemini, Meta’s Llama2, and Mistral’s Mixtral are all examples of large language models (LLMs). They are general-purpose machine learning models that can handle a wide variety of tasks. They seem magical, but we can build a small-scale 10-million parameter example of an LLM in around 300 lines of Python and train it in ten minutes on Google Colab for free. As we’ll see, creating and training a model is easy; achieving good performance is difficult.&lt;/p&gt;

&lt;p&gt;The steps we’ll go through are:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Creating a data set to train on&lt;/li&gt;
  &lt;li&gt;Creating a machine learning model&lt;/li&gt;
  &lt;li&gt;Training the model on the data set&lt;/li&gt;
  &lt;li&gt;Running inference on the model&lt;/li&gt;
  &lt;li&gt;Thinking about leaky abstractions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Throughout this post, I’ll be using machine learning terminology. See my post &lt;a href=&quot;https://bdunagan.com/2024/03/16/a-glossary-for-understanding-large-language-models-in-ai/&quot;&gt;A Glossary for Understanding Large Language Models in AI such as OpenAI’s GPT-4, Meta’s Llama2, and Google’s Gemini&lt;/a&gt; for any unfamiliar terms.&lt;/p&gt;

&lt;p&gt;Let’s get started. All the code is available at &lt;a href=&quot;https://www.github.com/bdunagan/bdunaganGPT&quot;&gt;github.com/bdunagan/bdunaganGPT&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-table&quot;&gt;&lt;/i&gt;&lt;br /&gt;Data Set&lt;/h3&gt;

&lt;p&gt;For a small data set, I compiled a single text document with the contents of every blog post on this blog, bdunagan.com. The blog is built with Jekyll and Markdown, so I wrote a Python script (&lt;a href=&quot;https://github.com/bdunagan/bdunaganGPT/blob/main/create_data_set.py&quot;&gt;create_data_set.py&lt;/a&gt;) to concatenate all the files in the “_posts” folder (195 files) into a single document at &lt;a href=&quot;https://github.com/bdunagan/bdunaganGPT/blob/main/bdunagan.com.txt&quot;&gt;bdunagan.com.txt&lt;/a&gt;. We’ll use this text file as the training data for our small Transformer model.&lt;/p&gt;

&lt;p&gt;The training data has 363,859 characters with a vocabulary of 107 using character-level tokens, as opposed to subword tokens like OpenAI’s tiktoken and Google’s sentencepiece.&lt;/p&gt;
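
&lt;p&gt;A character-level tokenizer can be sketched in a few lines of Python. This is a minimal illustration rather than the exact code in the notebook: the sample string stands in for the contents of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bdunagan.com.txt&lt;/code&gt;, and the names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;stoi&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;itos&lt;/code&gt; are illustrative.&lt;/p&gt;

```python
text = "fill the void"  # stand-in for the contents of bdunagan.com.txt

# The vocabulary is every distinct character in the training text.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character to token id
itos = {i: ch for ch, i in stoi.items()}      # token id to character

def encode(s):
    """Turn a string into a list of integer token ids."""
    return [stoi[ch] for ch in s]

def decode(ids):
    """Turn a list of token ids back into a string."""
    return "".join(itos[i] for i in ids)

print(len(chars))                  # 10 distinct characters in this sample
print(decode(encode("the void")))  # the void
```

&lt;p&gt;Run over the full training file, the same approach would yield the 107-character vocabulary described above.&lt;/p&gt;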

&lt;p&gt;Even in this simple example, I had to go through ten iterations of generating the data: manually inspecting, training, and testing it to see what data worked well. For instance, the model was confused by HTML and Liquid tags, so I removed those. I also needed to remove Jekyll’s front matter, which is metadata for a particular blog post. Because it’s a text machine learning model, I didn’t even need to worry about labeling the data because the next predicted token (the label) is the next character in the text document.&lt;/p&gt;
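
&lt;p&gt;To make the “no labeling needed” point concrete, here is a small sketch of how one training example can be cut from the token stream: the targets are just the inputs shifted by one position. The helper name and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ord()&lt;/code&gt;-based encoding are assumptions for illustration.&lt;/p&gt;

```python
def make_example(token_ids, start, block_size):
    """Return (inputs, targets) where targets are inputs shifted by one."""
    inputs = token_ids[start : start + block_size]
    targets = token_ids[start + 1 : start + block_size + 1]
    return inputs, targets

token_ids = [ord(ch) for ch in "hello world"]  # toy encoding for illustration
x, y = make_example(token_ids, start=0, block_size=4)
print([chr(i) for i in x])  # ['h', 'e', 'l', 'l']
print([chr(i) for i in y])  # ['e', 'l', 'l', 'o']
```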

&lt;p&gt;Data collection and cleaning is an incredibly important and time-intensive task for any machine learning application. Without it, “garbage in, garbage out”.&lt;/p&gt;

&lt;p&gt;There are many, many public data sets to choose from to start training a machine learning model, including &lt;a href=&quot;https://huggingface.co/datasets&quot;&gt;HuggingFace’s data sets&lt;/a&gt; and &lt;a href=&quot;https://www.tensorflow.org/datasets/catalog/overview&quot;&gt;TensorFlow’s catalog&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-layer-group&quot;&gt;&lt;/i&gt;&lt;br /&gt;Model&lt;/h3&gt;

&lt;p&gt;Our model is ~300 lines of Python, available as a Jupyter Notebook on GitHub (&lt;a href=&quot;https://github.com/bdunagan/bdunaganGPT/blob/main/bdunaganGPT.ipynb&quot;&gt;bdunaganGPT.ipynb&lt;/a&gt;), and Google Colab makes it easy to import.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Click on &lt;a href=&quot;https://colab.research.google.com/github/bdunagan/bdunaganGPT/blob/main/bdunaganGPT.ipynb&quot;&gt;this link&lt;/a&gt; to open the Jupyter Notebook in Google Colab.&lt;/li&gt;
  &lt;li&gt;Alternatively, you can visit &lt;a href=&quot;https://colab.research.google.com/&quot;&gt;https://colab.research.google.com/&lt;/a&gt;, click File &amp;gt; Open Notebook, select “GitHub”, and paste the URL for &lt;a href=&quot;https://github.com/bdunagan/bdunaganGPT/blob/main/bdunaganGPT.ipynb&quot;&gt;bdunaganGPT.ipynb&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_colab_github.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Open the Jupyter Notebook file from GitHub in Google Colab&lt;/div&gt;

&lt;p&gt;To complete the training in 10 minutes, we need a GPU. Change the processor by clicking on the arrow in the top right and selecting “Change runtime type”. Select the free T4 GPU, click “Save”, then click “Connect” next to the arrow.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_colab_runtime.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Click on the arrow and select &quot;Change runtime type&quot;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_colab_t4.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Select the T4 GPU, click &quot;Save&quot;, then click &quot;Connect&quot;&lt;/div&gt;

&lt;p&gt;After the notebook loads in Google Colab and the correct processor is connected, press the Play button.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_colab_play.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Press the play button to train and run this model for free in Google Colab&lt;/div&gt;

&lt;p&gt;Google Colab will run this notebook: training the 10-million parameter model, running inference, and outputting 100 new tokens, all in 10 minutes for free using the T4 GPU with the following hyperparameters:&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;GPTTest(batch_size=32, block_size=128, max_iters=1400, learning_rate=1e-3, n_embd=128, n_head=4, n_layer=52, dropout=0.1, device=device)&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;The model achieves a loss of 1.680 and produces the following text:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;Store 200 nice detail handled details of Launch and files or an useful base Amazon Vier. I also read&quot;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Some people will find this output astonishing. Others will find it laughably bad. Both are right. The difference is in expectation and understanding of the building blocks of the neural network.&lt;/p&gt;

&lt;p&gt;Before training, the model knows nothing; it doesn’t understand or write English. Its only training for knowing what character (token) to write next is based solely on my blog’s contents. After running a training loop 1,400 times on one GPU, the model was able to string together characters into English words, capitalize the first word, and add some spaces and even a period. It’s not ChatGPT though, which took months to train on thousands of GPUs with billions of tokens.&lt;/p&gt;

&lt;p&gt;This model is based on Andrej Karpathy’s excellent nanoGPT model from his &lt;a href=&quot;https://karpathy.ai/zero-to-hero.html&quot;&gt;Zero-to-Hero&lt;/a&gt; YouTube course, a fantastic and accessible deep dive into deep neural networks. Karpathy is a cofounder of OpenAI and was the Director of AI at Tesla before recently returning to OpenAI. Learn about how to write a model like this in his &lt;a href=&quot;https://www.youtube.com/watch?v=kCc8FmEb1nY&quot;&gt;“Let’s build GPT: from scratch, in code, spelled out”&lt;/a&gt; YouTube video.&lt;/p&gt;

&lt;p&gt;I updated the Python code in three ways:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;PyTorch’s Multihead Attention&lt;/strong&gt;: I switched from Karpathy’s version of Multihead Attention to PyTorch’s version using an attention mask.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Positional Encoding&lt;/strong&gt;: I experimented with the original &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sin()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cos()&lt;/code&gt; positional encoding function with no learned parameters from Google’s &lt;a href=&quot;https://arxiv.org/abs/1706.03762&quot;&gt;“Attention Is All You Need”&lt;/a&gt; paper, but testing showed that Karpathy’s simpler learned positional encoding reduced loss a bit more.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Toggles&lt;/strong&gt;: I added three different toggles to experiment with changing the model architecture. Two of them were the above items, and the third was disabling residual connections.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of my changes improved the model’s original performance, but they were useful for experimentation.&lt;/p&gt;

&lt;p&gt;Let’s compare it to GPT-3, using the table from OpenAI’s 2020 research paper titled &lt;a href=&quot;https://arxiv.org/abs/2005.14165&quot;&gt;“Language Models are Few-Shot Learners”&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_gpt3_table.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Hyperparameter table from GPT-3 paper&lt;/div&gt;

&lt;p&gt;Hyperparameters are attributes about the model and training, rather than weights within the model that we’re tuning. Each of these models was trained on 300 billion tokens, meaning each model had the same massive set of training data to learn from, but there are a number of hyperparameters listed that define the structure of each model. Let’s walk through each hyperparameter, including the equivalent variable name in our bdunaganGPT version:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;n&lt;sub&gt;params&lt;/sub&gt;&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parameters&lt;/code&gt;): The number of total weights (parameters) in the model.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;n&lt;sub&gt;layers&lt;/sub&gt;&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n_layer&lt;/code&gt;): The number of aggregate layers in the model. Each layer is composed of multiple components.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;d&lt;sub&gt;model&lt;/sub&gt;&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n_embd&lt;/code&gt;): The dimensionality of the embedding vector.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;n&lt;sub&gt;heads&lt;/sub&gt;&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n_head&lt;/code&gt;): The number of attention heads in each multi-head attention block.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;d&lt;sub&gt;head&lt;/sub&gt;&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;head_size&lt;/code&gt;): The dimensionality of each attention head, determined by the two hyperparameters above: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n_embd // n_head&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Batch Size&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;batch_size&lt;/code&gt;): The number of examples in each batch as it’s processed.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Learning Rate&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;learning_rate&lt;/code&gt;): The step size of how much we update the weights on every back propagation.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Context Window&lt;/strong&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;block_size&lt;/code&gt;): The number of tokens each model reads in to predict the next token. All GPT-3 models used 2048 tokens.&lt;/li&gt;
&lt;/ul&gt;
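
&lt;p&gt;To see how these hyperparameters relate, here is a rough Python sketch that derives the head size and estimates the parameter count from the layer count and embedding size. The 12 * n_layer * n_embd^2 rule is an assumption based on a standard GPT block with a 4x-wide feed-forward layer, not the exact accounting in the notebook; it ignores biases, layer norms, and embedding tables, so real counts come out slightly higher.&lt;/p&gt;

```python
def head_size(n_embd, n_head):
    """Dimensionality of each attention head."""
    return n_embd // n_head

def approx_block_params(n_layer, n_embd):
    """Rough weight count for the Transformer blocks only.

    Assumes a standard GPT block: 4 * d^2 attention weights plus
    8 * d^2 weights in a 4x-wide feed-forward layer. Biases, layer norms,
    and embedding tables are ignored.
    """
    return 12 * n_layer * n_embd ** 2

print(head_size(128, 4))             # 32
print(approx_block_params(52, 128))  # 10223616, about 10 million
```

&lt;p&gt;With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n_layer=52&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n_embd=128&lt;/code&gt;, the estimate lands at roughly 10 million, in line with the model size quoted earlier.&lt;/p&gt;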

&lt;p&gt;With these hyperparameters, OpenAI configured GPT-3, just like we configured bdunaganGPT. GPT-3 is the same basic architecture as the model we ran. GPT-3 is just much bigger.&lt;/p&gt;

&lt;p&gt;There is a significant caveat that models on the scale of GPT-3 can be optimized in various ways. For instance, OpenAI’s paper on GPT-3 refers to “alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer.” But broadly speaking, the architectures are the same.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fa fa-gears&quot;&gt;&lt;/i&gt;&lt;br /&gt;Training&lt;/h3&gt;

&lt;p&gt;The loss calculation is independent of our architecture, so we can test the loss with different architectures, processors, and hyperparameters to see which minimizes the training time and loss. For instance, doubling the layers might reduce the loss by a bit but double the training time, which might be an unacceptable tradeoff. I used cross entropy loss as the loss function for the model feedback. It’s the same as negative log likelihood and is derived from KL divergence.&lt;/p&gt;

&lt;p&gt;Let’s look at how the loss decreases over time. With random weights, the loss should be around &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-ln(1/n)&lt;/code&gt; where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; is the number of possible tokens. In my case, there are 107 possible characters in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bdunagan.com.txt&lt;/code&gt;, so the baseline loss is around 4.7. The loss quickly decreases though. By saving the loss for each iteration, we can use Matplotlib to plot how the loss changes over time.&lt;/p&gt;
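
&lt;p&gt;The baseline is easy to check directly. This short sketch computes the cross entropy loss of a model that assigns equal probability to every token; the function name is illustrative.&lt;/p&gt;

```python
import math

def uniform_baseline_loss(vocab_size):
    """Cross entropy loss when every token gets equal probability."""
    p = 1.0 / vocab_size
    return -math.log(p)

print(round(uniform_baseline_loss(107), 2))  # 4.67
```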

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_loss_10k.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;pre&gt;plt.plot(steps, losses)&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The significant noise in the loss graph is caused by the variance in the batches. As the LLM progressively trains across different batches, the loss goes up and down locally while still decreasing on average, and the plot visually confirms that the model is improving with more training.&lt;/p&gt;

&lt;p&gt;The problem is that this model is just one combination of hyperparameters. There could be a different set that produces an even better model. In theory, we could test each combination of hyperparameters to find the optimal one, but in practice, large models take months to train. We cannot exhaustively search the possible space for the minimum loss.&lt;/p&gt;

&lt;p&gt;To get a sense of how the loss changes, let’s look at a small set of combinations, all tested on Google Colab’s T4 processor:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Hyperparameters&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;#1&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;#2&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;#3&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;#4&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Parameters&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;5,451&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;31,339&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;212,331&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1,621,611&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Layers&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;4&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;16&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;32&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Embedding Vector&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;16&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;32&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;64&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Heads&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;4&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;16&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;32&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Batch Size&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;16&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;32&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;64&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Block Size&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;32&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;128&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;128&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;256&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Learning Rate&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1e-2&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1e-3&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1e-4&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1e-5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Dropout&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.0&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.1&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.2&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Iterations&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1,000&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1,000&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1,000&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1,000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Training Time (s)&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;19&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;36&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;77&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1218&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Validation Loss&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;2.627&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;2.588&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;2.770&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;3.253&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;More parameters don’t automatically translate to lower loss. By decreasing the learning rate for the 1.6m-parameter model, we saw a higher loss than the 30k-parameter model after the same number of training iterations.&lt;/p&gt;

&lt;p&gt;Even with one particular hyperparameter combination, more training isn’t always better. As an example of overfitting, see the table below to watch the validation loss grow while the training loss continues to shrink. The set of parameters shows that even the number of iterations is important.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Iterations&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Training Loss&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Validation Loss&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;500&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;2.0993&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;2.1271&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;1,500&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1.2805&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1.6006&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;3,000&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.9815&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1.7457&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&quot;caption&quot;&gt;An 800k parameter model with batch_size=64, block_size=128, max_iters=3000, learning_rate=1e-3, n_embd=64, n_head=8, n_layer=16, dropout=0.0&lt;/div&gt;
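
&lt;p&gt;As a minimal sketch of how we might act on that table, the snippet below picks the checkpoint with the lowest validation loss and prints the gap between validation and training loss. The helper names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;best_checkpoint&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;generalization_gap&lt;/code&gt; are hypothetical and not part of the original training script; only the numbers come from the table.&lt;/p&gt;

```python
# (iterations, training loss, validation loss) copied from the table above
checkpoints = [
    (500, 2.0993, 2.1271),
    (1500, 1.2805, 1.6006),
    (3000, 0.9815, 1.7457),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss (hypothetical helper)."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss; a growing gap hints at overfitting."""
    return row[2] - row[1]


best = best_checkpoint(checkpoints)
print("lowest validation loss at", best[0], "iterations")
for row in checkpoints:
    print(row[0], round(generalization_gap(row), 4))
```

&lt;p&gt;With only three checkpoints this is a coarse view, but the gap widens at every step, which is the overfitting pattern described above.&lt;/p&gt;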

&lt;p&gt;Unsurprisingly, changes to the Transformer architecture lead to significant changes in the final loss. Let’s see the effect of disabling various parts of a 200k-parameter model over 2,000 iterations with the following hyperparameters: 4 layers, 4 heads, 16 batch size, 32 block size, 1e-3 learning rate, 64 embedding vector, and 0.0 dropout.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Model Notes&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Validation Loss&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;Original architecture&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1.916&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;No normalization&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1.926&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;No positional encoding&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;2.026&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;No attention&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;2.554&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;No residual connections&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;3.162&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
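
&lt;p&gt;The parameter counts quoted for these models can be roughly sanity-checked from the hyperparameters alone. The sketch below assumes a standard block layout that this post does not spell out: four &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n_embd x n_embd&lt;/code&gt; attention matrices and a feed-forward layer that expands to four times the embedding size and back. It ignores biases, layer norms, and the embedding tables, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;approx_block_params&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
def approx_block_params(n_embd, n_layer, mlp_ratio=4):
    """Rough weight count for the Transformer blocks only.

    Assumption (not confirmed by the post): each block holds four
    n_embd x n_embd attention matrices (query, key, value, output projection)
    and a feed-forward layer that expands to mlp_ratio * n_embd and back.
    Biases, layer norms, and embedding tables are ignored.
    """
    attention = 4 * n_embd * n_embd
    feed_forward = 2 * mlp_ratio * n_embd * n_embd
    return n_layer * (attention + feed_forward)


print(approx_block_params(n_embd=64, n_layer=4))   # hyperparameters of the 200k-parameter model
print(approx_block_params(n_embd=64, n_layer=16))  # hyperparameters of the 800k-parameter model
```

&lt;p&gt;Under those assumptions, the estimates land close to the 200k and 800k figures used in this post.&lt;/p&gt;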

&lt;p&gt;At this small scale, certain architectural pieces are less necessary than others are. Attention is the fundamental insight of the &lt;a href=&quot;https://arxiv.org/abs/1706.03762&quot;&gt;“Attention Is All You Need”&lt;/a&gt; paper from Google, and that layer is clearly helpful at capturing information for the model to use.&lt;/p&gt;

&lt;p&gt;However, the effect of residual connections is even more powerful. These connections are the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+&lt;/code&gt; operations that add the input to the output of a layer. During backpropagation, the derivative of an addition passes the gradient through unchanged, so the final loss flows back to each layer directly instead of shrinking as it passes through every layer in between, and the model avoids vanishing gradients. (Watch Karpathy’s &lt;a href=&quot;https://karpathy.ai/zero-to-hero.html&quot;&gt;micrograd&lt;/a&gt; YouTube video for a detailed mathematical explanation.) By cobbling together these architectural patterns, we can avoid plateauing in the loss function over many training iterations.&lt;/p&gt;
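
&lt;p&gt;To make the vanishing-gradient argument concrete, here is a toy sketch using plain numbers rather than the real model. It assumes every layer has the same small local derivative (0.1, chosen only for illustration) and multiplies the derivatives together the way the chain rule does. The function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gradient_through_stack&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def gradient_through_stack(local_derivative, depth, residual):
    """Multiply per-layer derivatives along the backward pass (chain rule).

    A plain layer y = f(x) contributes f'(x).
    A residual layer y = x + f(x) contributes 1 + f'(x).
    """
    per_layer = 1.0 + local_derivative if residual else local_derivative
    gradient = 1.0
    for _ in range(depth):
        gradient *= per_layer
    return gradient


# Toy assumption: all 16 layers share a local derivative of 0.1
plain = gradient_through_stack(0.1, depth=16, residual=False)
skip = gradient_through_stack(0.1, depth=16, residual=True)
print("without residual connections:", plain)
print("with residual connections:", skip)
```

&lt;p&gt;Without the addition, the gradient reaching the first layer is effectively zero after 16 layers. With it, each layer contributes at least 1 to the product, so the signal survives.&lt;/p&gt;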

&lt;p&gt;All these training passes take time, but the processor type drastically changes how long. Let’s compare how long training takes for different processors that Google Colab has available.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_colab_processors.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Google Colab - Processor Options&lt;/div&gt;

&lt;p&gt;We’ll use the 800k-parameter LLM to accentuate the differences in times with the following hyperparameters: 16 layers, 1,000 iterations, 8 heads, 64 batch size, 128 block size, 1e-3 learning rate, 64 embedding, and 0.0 dropout. Note that Google Colab provides CPU and T4 for free when resources are available, but the A100 and V100 are part of the paid plan.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Processor&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Training Time (seconds)&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;CPU&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;7568&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;T4 GPU&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;117&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;V100 GPU&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;69&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;A100 GPU&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;64&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&quot;caption&quot;&gt;The A100 GPU trained 118x faster than the CPU did.&lt;/div&gt;

&lt;p&gt;GPUs greatly accelerate training performance, even in this very small example. The model trained in two hours on the CPU and in one minute on the GPU. The Nvidia A100 GPU enabled the model to train 118x faster than the CPU did, opening up opportunities for scaling that wouldn’t have been practical with CPUs.&lt;/p&gt;
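
&lt;p&gt;The speedup figures follow directly from the table. As a quick check, this sketch divides the CPU time by each processor’s time; the dictionary simply restates the measured seconds, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;speedup_over_cpu&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
# Training times in seconds, copied from the table above
training_seconds = {"CPU": 7568, "T4 GPU": 117, "V100 GPU": 69, "A100 GPU": 64}


def speedup_over_cpu(times):
    """Return how many times faster each processor is than the CPU."""
    cpu = times["CPU"]
    return {name: cpu / seconds for name, seconds in times.items()}


speedups = speedup_over_cpu(training_seconds)
for name, factor in speedups.items():
    print(name, round(factor, 1))
```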

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-file-alt&quot;&gt;&lt;/i&gt;&lt;br /&gt;Inference&lt;/h3&gt;

&lt;p&gt;We’ve quantified the loss across a variety of different hyperparameters, but we have no idea what the output looks like for each tier of loss. Again, before training, the model knows nothing about English. During training, all it sees is my blog’s content, and it tries to replicate that content to minimize the loss. Let’s qualify the loss by seeing the output text at various points:&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;4.0 Validation Loss&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;==rv8\&lt;/code&gt;
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Exlu(6~@ &apos;/)1/U⌘N4vbyYW&amp;amp;?jeuP48-ea&amp;gt;&quot;6\\R&lt;/code&gt;
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;;J!&amp;gt;:3~ &quot;H~25 V&quot; djP1Qw%uI&amp;lt;\d”&amp;gt;—y7	dna+ZHa%Cal&amp;lt;(-:?7&lt;/code&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;3.0 Validation Loss&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T .septheeh.s  9 nt&amp;gt; e  yr. ye dmnild  &lt;/code&gt;
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a .aueo_r%lme ol e  aslen r  hu.noco W Un elenQdhautf t neep&lt;/code&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;2.5 Validation Loss&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;alder t Cfo t o s TalySL mlo deo dotersincolrwfit rto me ontade. iom Pk gs &quot;indodeacronero andn. t&apos;&lt;/code&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;2.0 Validation Loss&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ayigh, Ontiss thippated to so peckup to with spoickin&apos;t nox a rele davestrn difUpearanges and frotuI&lt;/code&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;1.6 Validation Loss&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;I going turn into a key-vs, partners, remarker of these; additions tha steps disconsforw was easy li&lt;/code&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Look at how the text changes from random letters to almost coherent words and then to almost reasonable sentences. A magical quality of deep neural networks and LLMs in particular is the ability to continually decrease the loss with the right hyperparameters and more data in a reasonable amount of time with GPUs. They continue to absorb more context and leverage that knowledge to generate new tokens that accurately resemble the existing data set.&lt;/p&gt;
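
&lt;p&gt;One way to put numbers on those tiers is to convert each loss into a perplexity: roughly, how many equally likely tokens the model is choosing between at each step. This sketch assumes the loss is the mean cross entropy measured with the natural log, which is what PyTorch’s cross entropy reports. The vocabulary size of 100 is a hypothetical value used only to show what a uniform guess would score.&lt;/p&gt;

```python
import math


def perplexity(loss):
    """Effective number of equally likely choices per token, for a loss in nats."""
    return math.exp(loss)


def uniform_guess_loss(vocab_size):
    """Loss of a model that spreads probability evenly over the vocabulary."""
    return math.log(vocab_size)


for loss in (4.0, 3.0, 2.5, 2.0, 1.6):
    print(loss, round(perplexity(loss), 1))

# Hypothetical vocabulary of 100 tokens: the loss of pure guessing
print(round(uniform_guess_loss(100), 3))
```

&lt;p&gt;Under that assumption, a loss of 4.0 means the model is choosing among roughly 55 options per token, and a loss of 1.6 narrows that to about 5.&lt;/p&gt;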

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-user-cog&quot;&gt;&lt;/i&gt;&lt;br /&gt;Fine-Tuning&lt;/h3&gt;

&lt;p&gt;What we built is not ChatGPT. What we did is referred to as pre-training. The model is trained on a large corpus with a loss function that measures how well the output matches the corpus. Regardless of how much we improved the loss, the output would remain a steady stream of sentences that sound like my blog posts, not answers to questions.&lt;/p&gt;

&lt;p&gt;The next step is called fine-tuning: feeding in thousands of examples of conversations, along with reinforcement learning from human feedback (RLHF), to nudge the model’s pre-trained weights so that it responds to questions with answers and writes accurate responses to prompts.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-brain&quot;&gt;&lt;/i&gt;&lt;br /&gt;Leaky Abstraction&lt;/h3&gt;

&lt;p&gt;There is a significant difference between reading about machine learning and large language models and actually attempting to create one. At one point during his YouTube series, Karpathy notes that deep neural networks are a “leaky abstraction”: understanding the internals of each aspect is critical to avoid pitfalls. Let’s look at three different instances in our simple model:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Hyperparameter Tuning&lt;/strong&gt;: Simply increasing the iterations for certain hyperparameter combinations increased our model’s validation loss rather than decreasing it. Overfitting is a huge problem in machine learning in general.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Residual Connections&lt;/strong&gt;: Before these connections, models suffered from vanishing gradients because the loss function’s output didn’t flow back to each layer separately and instead passed through each layer. But without understanding the math behind how the derivative can bypass layers with an addition, it wouldn’t be clear that residual connections would be a way around that. See the following code blocks as a comparison:&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;c1&quot;&gt;# Residual connection in Transformer Block
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ln1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ffwd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ln2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# No residual connection in Transformer Block
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ln1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ffwd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ln2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# Subtle bug in residual connection in Transformer Block
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ln1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# BUG: x is overridden and no longer passes through, but this works for small networks
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;attn_mask&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transformer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generate_square_subsequent_mask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;device&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attn_mask&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;attn_mask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;need_weights&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;is_causal&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ffwd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ln2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Matrix, Layer, and Model Analysis&lt;/strong&gt;: Every level of the model requires analysis and optimization to reduce the loss to the minimum, including studying matrix sizes to ensure they are doing what they are expected to do and making sure the network is set up correctly. These systems are incredibly complex at scale, with a host of places that could silently increase loss.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding the inner workings of neural networks enables us to optimize and tune models in a way that relying on the abstraction would not.&lt;/p&gt;

&lt;p&gt;Moreover, a deeper knowledge both demystifies and highlights the magical quality of LLMs. ChatGPT is an LLM that frequently &lt;em&gt;feels&lt;/em&gt; like talking to a person. Fundamentally, it is an LLM that is simply predicting the most appropriate next token in a way that minimizes the loss. I still find it amazing that these steps can lead to such an authentic interaction.&lt;/p&gt;
</description>
        <pubDate>Sat, 23 Mar 2024 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2024/03/23/building-a-10-million-parameter-llm/</link>
        <guid isPermaLink="true">https://bdunagan.com/2024/03/23/building-a-10-million-parameter-llm/</guid>
      </item>
    
      <item>
        <title>A Glossary for Understanding Large Language Models in AI such as OpenAI's GPT-4, Meta's Llama2, and Google's Gemini</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_glossary.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;DALL-E: &quot;picture of an open book 16:9&quot;&lt;/div&gt;

&lt;p&gt;OpenAI’s GPT-4, Meta’s Llama2, and Google’s Gemini are all forms of large language models (LLMs), a subset of deep neural networks, which are a subset of machine learning algorithms. LLMs feel magical, but at their core, these models are token-prediction algorithms. The fundamental building blocks of these machine learning models are quite simple but capable of achieving astounding results at scale.&lt;/p&gt;

&lt;p&gt;Let’s walk through a number of words that come up for LLMs.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-book&quot;&gt;&lt;/i&gt;&lt;br /&gt;Glossary&lt;/h3&gt;

&lt;p&gt;At the highest level, we care about two aspects:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Model&lt;/strong&gt;: The model is the actual algorithm for processing a given input context and generating an output.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Weights&lt;/strong&gt;: Weights are numerical constants being multiplied with (weights) or added to (biases) the inputs, like (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;w*x + b&lt;/code&gt;). For simplicity, weights and biases are both referred to as weights. They’re also referred to as parameters. GPT-3 has 175 billion parameters. Talking about the number of weights for a neural network gives an instant scope to the complexity of the network, but the number does not translate directly into accuracy.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Going a level deeper, we can think about how we train and use the model and weights:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_tokens.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Converting &quot;LLMs understand numbers, not text.&quot; into a numerical vector using tiktoken&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Tokens&lt;/strong&gt;: GPT-3 was trained on a massive corpus of text. However, neural networks don’t “read” text. They take a set of numbers as input and output a set of numbers that minimize the loss function. These numbers are called tokens. For large language models (LLMs), text is converted into these tokens that typically represent subword chunks (not individual characters but not words) for the network to read as input and to write as output.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt;: To optimize the weights of the neural network, we take a large corpus of data and iteratively run a forward pass (inference) on small sections of it to generate an output. The output is then compared to our desired output, and the difference is called the loss. The loss is passed backwards through the network using calculus to adjust the weights, so that the next forward pass is more accurate. The goal of training is to minimize the loss of a model’s inference.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Inference&lt;/strong&gt;: When consumers use ChatGPT, the model is running inference (a forward pass) to read an input to generate an output.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
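
&lt;p&gt;To illustrate the text-to-numbers step, here is a minimal character-level tokenizer. It is not how &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiktoken&lt;/code&gt; works (that library uses subword chunks); it is a simplified stand-in, and the function names are hypothetical.&lt;/p&gt;

```python
def build_vocab(text):
    """Assign each distinct character an integer ID, in sorted order."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def encode(text, vocab):
    """Convert text into a list of integer tokens."""
    return [vocab[ch] for ch in text]


def decode(tokens, vocab):
    """Convert integer tokens back into text."""
    reverse = {i: ch for ch, i in vocab.items()}
    return "".join(reverse[t] for t in tokens)


text = "LLMs understand numbers, not text."
vocab = build_vocab(text)
tokens = encode(text, vocab)
print(tokens)
print(decode(tokens, vocab) == text)
```

&lt;p&gt;A subword tokenizer follows the same encode and decode contract but maps multi-character chunks to integers, so the same sentence becomes far fewer tokens.&lt;/p&gt;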

&lt;p&gt;Below models and weights, we can dive into the inner workings of neural networks. Let’s go through them in quasi-top-down order:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Hyperparameters&lt;/strong&gt;: These refer to the actual configuration of the neural network, like having 2 layers of 100 neurons or 4 layers of 50 neurons. Optimizing hyperparameters is a second-order optimization on top of optimizing the weights of one iteration of the neural network.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_neuron.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Neuron&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Neuron&lt;/strong&gt;: The neuron, also referred to as a node or a perceptron, is the central building block of the neural network. It’s a simple mathematical function, designed to represent a biological neuron. The neuron takes a set of input values (&lt;tt&gt;x&lt;sub&gt;n&lt;/sub&gt;&lt;/tt&gt;), multiplies each by a weight (&lt;tt&gt;w&lt;sub&gt;n&lt;/sub&gt;&lt;/tt&gt;), adds them all together, adds a bias (&lt;tt&gt;b&lt;/tt&gt;), and finally passes that value through a non-linear function, also called an activation function, to produce an output (&lt;tt&gt;y&lt;/tt&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Loss Function&lt;/strong&gt;: The loss function is the final part of the neural network, only present during training, and compares the output of the forward pass of the network to the expected output. Again, the goal of training is to minimize the loss.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Forward Pass&lt;/strong&gt;: The forward pass of a network is processing an input through the network into an output. For an LLM, the output is a set of probabilities returned by the softmax function to decide the likelihood of the next token within the token space. That probability distribution is then sampled to decide on the predicted token.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Backward Pass&lt;/strong&gt;: The backward pass of a network is the crucial part of training neural networks. We take the output of the loss function and feed it into the network going backwards using calculus to take local derivatives of each part with respect to the final output.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Gradient Descent&lt;/strong&gt;: When we calculate the derivative of the local equation with respect to the final output during the backward pass, we take the local slope, multiply it by a small learning rate, and step each weight in the opposite direction to move toward a local minimum of the loss.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Learning Rate&lt;/strong&gt;: We are using gradient descent to move the weights incrementally toward a local minimum of the loss. The learning rate is the step size of these increments: too small, and the network takes too long to reach the local minimum; too large, and the network overshoots it.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Cross Entropy Loss&lt;/strong&gt;: We need a loss function to help the neural network learn how to improve based on a single positive number that we want to minimize, and popular ones include cross entropy loss and mean square error (MSE) loss. Cross entropy loss takes the softmax of the logits and then the mean of the negative log likelihood, giving us a single number. Negative log likelihood (NLL) is simply the negative log of the value. The cross entropy loss function is calculating the distance (KL divergence) between the predicted probability distribution and the true probability distribution. Simplifying that mathematical formula results in the negative log likelihood.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Softmax&lt;/strong&gt;: This is a normalization function that translates a set of numbers to a set of probabilities between zero and one, enabling the result to be handled as a standard probability range. However, instead of a basic normalization function that divides each value by the sum of all values, softmax uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;e^x / sum(e^x)&lt;/code&gt;. The exponential component both highlights the maximum value by increasing the distance between it and the other values, unlike basic normalization, and is differentiable, unlike hardmax.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Logits&lt;/strong&gt;: It’s the unit of measurement for a log scale (&lt;em&gt;log&lt;/em&gt;istic un&lt;em&gt;it&lt;/em&gt;). We use this term for the output of the penultimate neural network layer before we normalize the output using the softmax function (to get final probabilities for the next token) or before we calculate cross entropy loss (to get a loss number to minimize). Logits are unnormalized log-probabilities because both of those subsequent functions include a softmax calculation, which exponentiates and normalizes the values. Logits are not the final values because we want the neural network’s output to be in the form of probabilities with the maximum value exaggerated through the softmax function.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Logprobs&lt;/strong&gt;: After we use softmax on the output to get the probabilities of the next token, we take the log to get logprobs, and this extra calculation helps with numerical stability for small probability values due to the way computers store real numbers. For example, .000000001 and .0000000001 are 10x different, but their precision could be lost; however, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;log10(.000000001)&lt;/code&gt; is -9 and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;log10(.0000000001)&lt;/code&gt; is -10, and their precision won’t be lost in storage. Surfacing logprobs from an LLM provides more nuance to the LLM’s confidence in its own output and what alternative responses would have been. OpenAI started providing logprobs for its selected output tokens and alternative tokens in 2023.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Batch Size&lt;/strong&gt;: The model cannot process the entire training data set in one pass, and it would lose valuable information if it processed each example independently. The batch size refers to the number of examples that are processed concurrently in a single forward pass. For example, the largest GPT-3 model used a batch size of 3.2 million tokens, so every training step processed sequences totaling 3.2 million tokens together to form a better understanding.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
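
&lt;p&gt;The softmax, logprobs, and cross entropy loss entries above fit together in a few lines of plain Python. This is a sketch for a single example with hypothetical logits over a three-token vocabulary, not the batched implementation a framework would use.&lt;/p&gt;

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities with e^x / sum(e^x)."""
    peak = max(logits)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log likelihood of the correct token for a single example."""
    return -math.log(softmax(logits)[target_index])


logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-token vocabulary
probs = softmax(logits)
logprobs = [math.log(p) for p in probs]

print([round(p, 3) for p in probs])
print([round(lp, 3) for lp in logprobs])
print(round(cross_entropy(logits, target_index=0), 3))
```

&lt;p&gt;Note how the loss for the correct token is just the negative of its logprob, which is why the two concepts show up together so often.&lt;/p&gt;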

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_transformer_attention.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Attention block from &quot;Attention Is All You Need&quot; paper&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Attention&lt;/strong&gt;: Neural networks do not automatically absorb relationship information. For example, in the sentence “The house is on the market.”, “house” and “market” have a relationship, but it’s not the same relationship as the two words in the sentence “The house is next to the market.” Previous architectures, like recurrent neural networks (RNNs), added dependencies between stages of the model to capture better relationship information, but this dependency prevented them from being efficiently parallelizable. In a 2017 research paper from Google titled &lt;a href=&quot;https://arxiv.org/abs/1706.03762&quot;&gt;“Attention Is All You Need”&lt;/a&gt;, researchers proposed capturing that contextual information in a new set of weights to train in the network in the form of matrices called “Query”, “Key”, and “Value”. The paper was focused on machine translation, using an encoder block for the source text and a decoder block for the destination text. However, attention has become the central architectural insight for modern LLMs because models can absorb relationship information of surrounding tokens in an efficiently parallelizable way.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Cross-Attention&lt;/strong&gt;: This version of attention has the keys and values come from the encoder while the queries come from the decoder. Cross-attention is tailored to machine translation, where the model is attempting both to understand the source language in its entirety and to understand &lt;em&gt;and predict&lt;/em&gt; the destination language output. For example, a model translating French to English would see “Bonne après-midi.” as input to the encoder block but only “Good” (not “ afternoon”) as input to the decoder block. Its job is to use the entire French context and a partial English context to predict the next English token. Again, this approach has proven in practice to be far better than any other ML algorithm has been.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Self-Attention&lt;/strong&gt;: This version of attention has the keys and values generated from the same input as the queries, using a decoder block and no encoder block. Reusing our example, the input would be “Good” as tokens, and based on training data, the model would predict “ afternoon” as the next token.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Multi-Head Attention&lt;/strong&gt;: A single head of self-attention only lets us represent a single relationship between two tokens, because the softmax function highlights only one possibility. Adding more heads enables the layer to capture more relationships and absorb more information about the tokens. For example, “The hungry dog ate breakfast.” has multiple relationships: “hungry” and “dog”, “dog” and “ate”, “hungry” and “ate”, “ate” and “breakfast”, and “hungry” and “breakfast”. We want the LLM to absorb as many relationships as possible to most accurately predict what to say next.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Encoder&lt;/strong&gt;: There can be two modules for attention: an encoder and a decoder. A model for machine translation uses both. The encoder takes the source language, and the decoder takes the destination language. Keep in mind that encoding is not the same as embedding. Moreover, there are encoder-only models like BERT.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Decoder&lt;/strong&gt;: The decoder can take a destination language for machine translation, or it can be used outside of machine translation for token prediction, like for GPT.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Tokenization&lt;/strong&gt;: The input needs to be translated into numerical values (tokens) for the model to interpret mathematically. This process is called tokenization. A single token is pre-assigned a number, so the set of tokens in the input becomes a vector of numbers. The token-to-integer lookup is defined statically in advance. For example, Google uses a sub-word tokenization algorithm called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sentencepiece&lt;/code&gt;, and OpenAI uses one called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tiktoken&lt;/code&gt; (with 50,257 possibilities).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Embedding&lt;/strong&gt;: Tokens are used as a lookup into an N-dimensional vector space, called an embedding space. Each token’s vector is random initially and adjusted through backpropagation during training to bring similar tokens closer to each other and dissimilar tokens farther from each other. Keep in mind that embedding is not the same as attention and does not retain token positions.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Positional Encoding&lt;/strong&gt;: Attention captures relationships between tokens, but it does not include actual position. Encoding the position of each token provides more information for the model to utilize. The 2017 “Attention Is All You Need” paper from Google uses a sin/cos formula based on the embedding vector dimension, the index within the embedding vector, and the token’s position in the sequence. These encoding values are simply added to the embedding vectors’ values (keeping the dimensionality the same). An alternative approach is to create a second embedding with the dimensions of the context length and the original embedding vector and add that to the original embedding vector. This second option is called learned positional encoding because it lets the neural network optimize the weights and can lead to even better performance.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Vanishing Gradient&lt;/strong&gt;: In deep architectures, the gradient can vanish as it travels back through the layers. The Transformer addresses this by adding the raw input of a layer to the layer’s output through a residual connection, a technique popularized by ResNet in 2015. During backpropagation, the gradient can pass back to the initial layers without being diminished by intermediate layers, making deep networks easier to train.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
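
&lt;p&gt;To make the attention entries above concrete, here is a minimal sketch of single-head causal self-attention in plain Python. It is not the code of any particular model: the function names, the toy 2-dimensional embeddings, and the identity matrices standing in for the trained “Query”, “Key”, and “Value” weights are all assumptions for illustration.&lt;/p&gt;

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows.
    return [[sum(p * r for p, r in zip(row, col)) for col in zip(*b)] for row in a]


def causal_self_attention(x, w_q, w_k, w_v):
    # Self-attention: queries, keys, and values are all projections of the same input.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    output, weights = [], []
    for i, query in enumerate(q):
        # Causal mask: token i only scores itself and the tokens before it.
        scores = [sum(a * b for a, b in zip(query, k[j])) / scale for j in range(i + 1)]
        attn = softmax(scores)
        weights.append(attn)
        # Each output row is a weighted average of the visible value vectors.
        output.append([sum(w * v[j][d] for j, w in enumerate(attn)) for d in range(len(v[0]))])
    return output, weights


# Hypothetical toy input: 3 tokens with 2-dimensional embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity matrices stand in for trained weights (an assumption for this sketch).
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = causal_self_attention(x, identity, identity, identity)
```

&lt;p&gt;Each row of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;weights&lt;/code&gt; sums to 1 and only covers the current token and the ones before it, which is what lets a decoder predict the next token without peeking ahead. Multi-head attention repeats this computation with several independent sets of weight matrices and concatenates the results.&lt;/p&gt;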

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_context_window.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;The context window and next token prediction over three inference runs&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Context Window&lt;/strong&gt;: As token-prediction algorithms, LLMs take an input to produce exactly one token. The first inference produces the first output token, based on the input. The second inference produces the second token, based on the input and the first output token. And the process continues, sliding the context window along by one token per inference run. When using ChatGPT, the displayed response is actually the model being run over and over again to take the question and on-going response to produce the next token.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Context Length&lt;/strong&gt;: This is the total number of tokens that the neural network can take as input. In papers, researchers refer to this as the block size. Because it’s the total, LLMs like GPT include the output token length in the count, so that the LLM can keep the beginning of the input in the context window while generating the end of the output. Otherwise, the LLM would “forget” what the beginning of the initial input was while attempting to generate the end of the output.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
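
&lt;p&gt;The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;generate&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;toy_model&lt;/code&gt; are hypothetical names, and the toy model simply sums the tokens it can see instead of running a neural network.&lt;/p&gt;

```python
def generate(prompt_tokens, predict_next, context_length, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Only the most recent context_length tokens fit into the model's input.
        window = tokens[-context_length:]
        # One inference run produces exactly one new token.
        tokens.append(predict_next(window))
    return tokens


def toy_model(window):
    # Hypothetical stand-in for an LLM: the "prediction" is the sum of the visible tokens.
    return sum(window)


result = generate([1, 2, 3], toy_model, context_length=2, max_new_tokens=3)
```

&lt;p&gt;With a context length of 2, the first token of the prompt never influences the output: the window has already slid past it on the first inference run. That is the “forgetting” that a longer context length avoids.&lt;/p&gt;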

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_data_set.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Data Sets&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Labeled Data Set&lt;/strong&gt;: Training a model requires an input and a desired output, referred to as the label. The model trains its parameters to minimize the difference (loss) between the actual output and the desired output. For the text prediction tasks that an LLM performs, the label is the next token in the data set. This data set is split into three subsets: training data set (60%), validation data set (20%), and test data set (20%). (Percentages vary.) Consumer LLMs like ChatGPT are trained on huge data sets. For instance, GPT-3 was trained on 300 billion tokens, so in our breakdown, that would be 180 billion tokens for the training set, 60 billion tokens for the validation set, and 60 billion tokens for the test set.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Training Data Set&lt;/strong&gt;: This subset of data is used to actually train the model.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Validation Data Set&lt;/strong&gt;: This subset of data is used to validate the loss of the model but not to train the model, such as running many different configurations of hyperparameters and finding the lowest loss. At a higher level though, the model is being trained on this subset simply by optimizing for the model where those hyperparameters generate the lowest loss.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Test Data Set&lt;/strong&gt;: This subset of data is used to test the loss of the final model using data that the model has never seen to identify underfitting or overfitting.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Overfitting&lt;/strong&gt;: By training models on a specific set of data, the parameters can become too optimized on its quirks. For instance, training a model on thousands of pictures of a cow in a grass field might lead to a model that cannot identify a cow standing on a road. It extracted the wrong information from the data and overfit: no grass, no cow.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Underfitting&lt;/strong&gt;: If the model does not have access to enough data, it can underfit and not be able to complete the task. In the example of the cow in a grass field above, training a model on only ten images of cows would not provide enough information for the model to be able to identify a cow in a new image. Well-tuned models neither overfit nor underfit; they absorb enough information to complete the task but are able to generalize beyond the trained data set to work on new inputs.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Zero-Shot Learning&lt;/strong&gt;: A model is able to complete a task without ever having seen it before. ChatGPT was so surprising because it did very well at tasks that it had never been trained on, like “write a haiku about why ChatGPT’s service is overloaded” (which OpenAI had on its status page for a while).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Few-Shot Learning&lt;/strong&gt;: A model is able to complete a task having only seen a couple of instances of it before.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;PyTorch&lt;/strong&gt;: This is a popular Python library created by Meta (then Facebook) for writing machine learning models. It’s a complement to the pandas and NumPy libraries and competes with Google’s TensorFlow.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Pre-training&lt;/strong&gt;: “Training” encompasses the entire process of preparing a model for usage, but technically, the initial phase of optimizing weights on a large, general data set is referred to as pre-training.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Fine-Tuning&lt;/strong&gt;: After pre-training a model, we use a smaller set of examples, on the order of thousands, to fine-tune the weights of the model to solve more specific tasks, such as answering questions.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
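
&lt;p&gt;As a rough sketch of the data set entries above, the following Python works through the 60/20/20 split for a 300-billion-token data set and shows how next-token labels fall out of a token sequence. The function names and the five-token example are hypothetical, and real pipelines split documents or shards instead of raw counts.&lt;/p&gt;

```python
def split_counts(total, train=0.6, validation=0.2):
    # Round the first two subsets and give the remainder to the test set.
    n_train = round(total * train)
    n_validation = round(total * validation)
    return n_train, n_validation, total - n_train - n_validation


def next_token_pairs(tokens, context_length):
    # Each example is a window of tokens; its label is the token that follows it.
    return [
        (tokens[i:i + context_length], tokens[i + context_length])
        for i in range(len(tokens) - context_length)
    ]


counts = split_counts(300_000_000_000)
pairs = next_token_pairs([10, 11, 12, 13, 14], context_length=3)
```

&lt;p&gt;Here &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;counts&lt;/code&gt; holds the 180, 60, and 60 billion token subsets from the breakdown above, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pairs&lt;/code&gt; holds two labeled examples: the first three tokens labeled with the fourth, and the middle three labeled with the fifth. No human labeling is needed because the text itself supplies the label.&lt;/p&gt;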

&lt;p&gt;Finally, let’s touch on a number of general terms that come up for the current state of machine learning:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/llm_transformer_model.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Transformer model from &quot;Attention Is All You Need&quot; paper&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Transformers&lt;/strong&gt;: This is a type of neural network architecture that Google researchers proposed in 2017 in &lt;a href=&quot;https://arxiv.org/abs/1706.03762&quot;&gt;“Attention Is All You Need”&lt;/a&gt; that utilizes attention as a building block. It’s called a transformer because the model transforms the data in different ways and the model needed a name to differentiate itself from RNNs, which were the state-of-the-art at the time.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;GPT&lt;/strong&gt;: This is an acronym for generative pre-trained transformer that OpenAI came up with, first seen with their GPT-1 model. ChatGPT has been so successful that transformer and GPT are frequently used interchangeably.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;LLM&lt;/strong&gt;: This is an acronym for large language models. GPT is a specific type of LLM.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Foundation Model (FM)&lt;/strong&gt;: This is a type of AI model that has been trained on a large set of data and can be applied to a large set of use cases. Due to the success of ChatGPT and other LLMs, there is a growing set of companies operationalizing foundation model pipelines, referred to as FMOps or LLMOps.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Multi-Layer Perceptron&lt;/strong&gt;: This is a simple type of neural network that has multiple layers of perceptrons, also referred to as neurons.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Convolutional Neural Networks (CNNs)&lt;/strong&gt;: Commonly used for image analysis, this type of neural network has convolutional layers in between feedforward (unidirectional) layers. These convolutional layers act as filters on the data, sliding across the context and extracting certain features into their weights. Google DeepMind released a popular version called WaveNet for generating audio.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Recurrent Neural Networks (RNNs)&lt;/strong&gt;: Before transformers, this was the best neural network for machine translation. It preserves state inside each layer to use the previous output in the next input, giving the network a memory but also introducing dependencies between iterations and neurons. For comparison, the attention architecture in transformers also adds a form of memory but without the dependencies, so transformers can parallelize more efficiently.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Small Language Models (SLMs)&lt;/strong&gt;: Given the success of ChatGPT, companies are exploring how to build their own small-scale models, trained on a small set of domain-specific data to focus the model on a set of tasks. General LLMs like GPT-4 and Gemini take months to train and significant resources even for inference. Shrinking the number of parameters makes the model smaller and more efficient, ideally without reducing performance at specific tasks, to the point where these models can run on-device instead of in a data center via the cloud.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fontawesome_icon fas fa-cubes&quot;&gt;&lt;/i&gt;&lt;br /&gt;Foundation&lt;/h3&gt;

&lt;p&gt;Neural networks have been around for decades, but access to data and compute power at scale has enabled these models to cross a threshold from useful to magical. ChatGPT became an instant sensation because people did not realize a computer could respond to questions like a person. Still, at its core, a large language model such as ChatGPT is a token-prediction algorithm, and understanding how an LLM is built helps demystify and ground its performance. This glossary provides a reasonable foundation for understanding the details.&lt;/p&gt;
</description>
        <pubDate>Sat, 16 Mar 2024 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2024/03/16/a-glossary-for-understanding-large-language-models-in-ai/</link>
        <guid isPermaLink="true">https://bdunagan.com/2024/03/16/a-glossary-for-understanding-large-language-models-in-ai/</guid>
      </item>
    
      <item>
        <title>SudokuAI 1.1: Solve Sudoku Instantly at Night with Dark Mode</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai_1_1_dark_mode_top.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;SudokuAI 1.1 with Dark Mode support&lt;/div&gt;

&lt;p&gt;SudokuAI now supports Dark Mode. This app is an open-source side project that I wrote to learn Swift. Learn more about it:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://bdunagan.com/2022/12/31/sudokuai-instantly-solve-any-sudoku-puzzle-with-your-iphone/&quot;&gt;SudokuAI: Instantly Solve Any Sudoku Puzzle with Your iPhone&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://bdunagan.com/2022/12/31/how-to-solve-any-sudoku-puzzle-in-one-second/&quot;&gt;How to Solve Any Sudoku Puzzle in One Second&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Available now.&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;center&quot; href=&quot;https://apps.apple.com/us/app/sudokuai/id6444911620&quot;&gt;
  &lt;img src=&quot;http://bdunagan.com/files/appstore.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Fri, 16 Feb 2024 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2024/02/16/sudokai-1-1-dark-mode/</link>
        <guid isPermaLink="true">https://bdunagan.com/2024/02/16/sudokai-1-1-dark-mode/</guid>
      </item>
    
      <item>
        <title>Prompt Engineering for Stories: A Generative AI Children's Book using ChatGPT and Midjourney</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/aiart_panda_sunglasses.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;Midjourney Prompt: “a panda wearing sunglasses and lying down beach bed in cancun”&lt;/div&gt;

&lt;p&gt;ChatGPT and Midjourney are both examples of generative AI, a form of machine learning that lets users create artwork based on specific inputs, such as text or images. Reading about the possibilities of this technology doesn’t have the same impact as personally trying it. It’s amazing.&lt;/p&gt;

&lt;p&gt;I first learned about Midjourney from Stratechery’s interview with its founder, David Holz, co-founder of Leap Motion. I had no idea that Midjourney was Discord’s most popular server or that Midjourney is one of the biggest consumers of GPUs in the cloud. The images are amazing. Type in text, and out comes a professional image.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/aiart_panda_reading.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Midjourney Prompt: “panda reading a book on the beach”&lt;/div&gt;

&lt;p&gt;Midjourney, released in July 2022, creates images in 60 seconds, solely through Discord. In the Stratechery interview, Holz talks about how the team was using Discord as an easy way to test the software and then realized how much fun it was, so they decided to make the tool available only through Discord. Midjourney uses a freemium business model, with a couple dozen free images and paid tiers.&lt;/p&gt;

&lt;p&gt;OpenAI’s ChatGPT, released in November 2022, is also remarkable. It’s a text chatbot, based on the GPT-3.5 large language model (LLM), and can respond to an impressive variety of questions. Ask it for an essay comparing communism and capitalism, and the essay is ready in less than a minute. Ask for a Python script to ingest a CSV file. Done. Ask for a sonnet about why ChatGPT is at capacity. It displays that on its status page.&lt;/p&gt;

&lt;p&gt;Both services take text as an input, referred to as a prompt, and run it through an advanced machine learning model to output a generative AI result. To explore the two services, I thought it would be fun to create children’s books, writing a prompt for ChatGPT so that the output could be fed directly into Midjourney for story artwork. I did one pass myself and then another with my family, and we printed all of them on Shutterfly as 6x6 Instant Books for $4 each.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-keyboard fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Prompt Engineering&lt;/h3&gt;

&lt;p&gt;Prompts are a very new concept, but there is already a term for iterating on them for better results: prompt engineering. My goal was to create a prompt for ChatGPT that would generate a story, and the story sentences would serve as the prompts for artwork in Midjourney. I went with two family favorites: birds and pandas.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;ChatGPT: “Write a seven-sentence children’s story titled “Falcon Finds Home” about a peregrine falcon that wakes up in a box, escapes, and flies home above the forest and stream to find its family and rainbow-colored nest, with each sentence written as a prompt.”&lt;/p&gt;
  &lt;ol&gt;
    &lt;li&gt;The peregrine falcon woke up inside a small box, feeling confused and alone.&lt;/li&gt;
    &lt;li&gt;&lt;span style=&quot;color:red;text-decoration:line-through;&quot;&gt;It&lt;/span&gt;&lt;span style=&quot;color:green;&quot;&gt;The falcon&lt;/span&gt; quickly realized it needed to escape &lt;span style=&quot;color:green&quot;&gt;from the box&lt;/span&gt; and began looking for a way out.&lt;/li&gt;
    &lt;li&gt;Using its powerful wings, the falcon broke through the top of the box and flew into the open air.&lt;/li&gt;
    &lt;li&gt;&lt;span style=&quot;color:red;text-decoration:line-through;&quot;&gt;It&lt;/span&gt;&lt;span style=&quot;color:green;&quot;&gt;The falcon&lt;/span&gt; soared above the forest, the trees a blur of green beneath it.&lt;/li&gt;
    &lt;li&gt;The falcon followed the babbling stream, knowing it would lead it home.&lt;/li&gt;
    &lt;li&gt;As &lt;span style=&quot;color:red;text-decoration:line-through;&quot;&gt;It&lt;/span&gt;&lt;span style=&quot;color:green;&quot;&gt;the falcon&lt;/span&gt; flew, a rainbow appeared in the sky, guiding the falcon towards its destination.&lt;/li&gt;
    &lt;li&gt;Finally, the falcon saw its colorful nest built high in a tree, and it knew it was home.&lt;/li&gt;
  &lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;ChatGPT wrote the above story within ten seconds, including the numbering. I made slight alterations to the text to keep the context of the story for Midjourney in every line.&lt;/p&gt;

&lt;p&gt;It’s a story and a bit unnerving. I came up with the idea in thirty seconds, and I was able to create all of the assets for a book in around ten minutes, lay it out in Shutterfly in ten minutes, and ship it in a week for $20.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/aiart_falcon_cover.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Book cover for &quot;Falcon Finds Home&quot;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/aiart_shutterfly_falcon.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Book layout for &quot;Falcon Finds Home&quot;&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;ChatGPT: “Write a children’s story titled “Great Wall of Pandas” in seven sentences about a family of panda that must journey over the Great Wall of China to make it home to their bamboo forest.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;The Great Wall of Pandas is a story about a family of pandas who live in a bamboo forest in China. One day, the mother panda, Peaches, and the father panda, Mr. Panda, decide to take their baby panda, Bao, on a journey to see the Great Wall of China. As they walk along the Great Wall, Bao &lt;span style=&quot;color:green;&quot;&gt;the Panda&lt;/span&gt; asks his parents many questions about the history of the wall and how it was built to protect the country. Bao &lt;span style=&quot;color:green;&quot;&gt;the Panda&lt;/span&gt; is so fascinated by the Great Wall that he wants to climb to the top, so with the help of his parents, he does just that. From the top of the wall, Bao &lt;span style=&quot;color:green;&quot;&gt;the Panda&lt;/span&gt; sees a beautiful view of the country below. When it’s time to go home, the family of pandas has a great journey back to their bamboo forest. Bao &lt;span style=&quot;color:green;&quot;&gt;the Panda&lt;/span&gt; can’t wait to tell all of his friends about his adventure and is excited for more adventures with his family.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/aiart_panda_cover.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Book cover for &quot;Great Wall of Pandas&quot;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/aiart_shutterfly_pandas.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Book layout for &quot;Great Wall of Pandas&quot;&lt;/div&gt;

&lt;p&gt;Connecting ChatGPT to Midjourney to create visual stories from prompts creates an AI-driven creativity pipeline. With prompt engineering, one person can fuse words and art into a story in minutes instead of months.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/aiart_books.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;6x6 Instant Books from Shutterfly&lt;/div&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-palette fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Creativity&lt;/h3&gt;

&lt;p&gt;My family was uninterested in ChatGPT, but Midjourney was magical. Type in words, and out comes an image. Visualizing ideas was far more interesting for the family than getting the computer to generate a story. They wanted to write their own stories and use Midjourney for the artwork.&lt;/p&gt;

&lt;p&gt;They wrote two books, one about a fox and a panda and one about birds, and then we laid out the stories with the artwork from Midjourney on Shutterfly. In fact, printing 6x6 books for $4 with Shutterfly was as much of a hit as Midjourney.&lt;/p&gt;

&lt;p&gt;ChatGPT and Midjourney aren’t perfect. I generated many images with Midjourney where the panda had ten toes. ChatGPT repeatedly ignored my requests to use the name in each sentence or to incorporate styles like “cinematic” or “sketch”. Still, this is the state of what’s available now. Those shortcomings will be resolved soon.&lt;/p&gt;
</description>
        <pubDate>Sat, 28 Jan 2023 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2023/01/28/prompt-engineering-for-stories-a-generate-ai-childrens-book-using-chatgpt-and-midjourney/</link>
        <guid isPermaLink="true">https://bdunagan.com/2023/01/28/prompt-engineering-for-stories-a-generate-ai-childrens-book-using-chatgpt-and-midjourney/</guid>
      </item>
    
      <item>
        <title>SudokuAI: Instantly Solve Any Sudoku Puzzle with Your iPhone</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai.png&quot; /&gt;&lt;/div&gt;

&lt;p&gt;Before I started this side project, I had tried Sudoku a handful of times. Then, my friend sent me this excellent article by Peter Norvig titled &lt;a href=&quot;https://norvig.com/sudoku.html&quot;&gt;Solving Every Sudoku Puzzle&lt;/a&gt;, and I wrote about the algorithm in &lt;a href=&quot;https://bdunagan.com/2022/12/31/how-to-solve-any-sudoku-puzzle-in-one-second/&quot;&gt;How to Solve Any Sudoku Puzzle in One Second&lt;/a&gt;. Productizing that algorithm into an iPhone app seemed like a fun reason to learn Swift.&lt;/p&gt;

&lt;p&gt;My original vision for this app was one screen: a camera view that automatically recognized a Sudoku puzzle, highlighted the numbers with Apple’s yellow box (like QR codes get), and solved it on the fly, superimposing the calculated numbers over the image in real time. I thought I could learn Swift, SwiftUI, and iOS’s image recognition. This is not that app.&lt;/p&gt;

&lt;p&gt;Brainstorming is a fantastic avenue to thinking about what’s possible, but all projects, especially side projects like this one, are constrained by time and resources. SudokuAI is a far simpler app. It does have image recognition, but the process doesn’t involve floating yellow boxes. It solves puzzles in real time, but the result isn’t superimposed on a real-time image. SudokuAI also covers a couple more use cases than the original vision did: creating a new Sudoku puzzle, using an image from Photos, or getting just a hint.&lt;/p&gt;

&lt;p&gt;Most importantly, SudokuAI is shipping on the App Store now. It’s the most important aspect because without shipping, people don’t have the opportunity to try the app.&lt;/p&gt;

&lt;p&gt;Shipping a product highlights the tension between Product Management and Engineering. Product Management wants to deliver the best experience to the customer, but Engineering is tasked with implementing the experience. Features that sound simple or fun to customers can translate into untenable requests for Engineering, like floating yellow boxes with superimposed numbers for someone who has never written Swift working on a side project. Delivering a new product requires understanding the needs of customers in addition to the resources of the team and finding the best intersection between them.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai_list.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;SudokuAI List&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai_puzzle.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;SudokuAI Puzzle&lt;/div&gt;

&lt;p&gt;I iterated many times on the design both to simplify the interaction and to enable me to implement it.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-pencil-alt fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Simple is Hard&lt;/h3&gt;

&lt;p&gt;My original vision for the app was derailed as soon as I looked into Apple’s documentation for image recognition. I thought that those yellow rectangles might be part of an API for VisionKit. They were not, and I wasn’t interested in building them. Moreover, the original vision didn’t account for creating new puzzles, importing digital photos of puzzles, or getting a hint.&lt;/p&gt;

&lt;p&gt;I pivoted to having a list of puzzles. The user could try a pre-loaded puzzle, create a new one, or import one from the camera or photo roll, and they could watch autoplay, get a hint, or solve it instantly. That list of use cases seemed reasonable and achievable. Still, those use cases could be surfaced with many different user interfaces.&lt;/p&gt;

&lt;p&gt;I settled on buttons along the top of the list view for puzzle ingestion: plus, camera, photo. Toolbar items like these are a standard iOS style.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai_toolbar.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Toolbar&lt;/div&gt;

&lt;p&gt;The puzzle interaction workflows were far harder to get right. I started with the music player analogy: back, play/pause, next.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai_actions_v1.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Actions - v1&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai_actions_v2.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Actions - v2&lt;/div&gt;

&lt;p&gt;Interacting with them wasn’t a great experience though. “Back” should always be a reset for the puzzle. “Next” doesn’t cover both use cases: hint and solve. I abandoned the music analogy and switched to words: “Solve”, “Hint”, and “AI” (for autoplay). Tapping “AI” switched the buttons to “Pause AI” and “Reset AI”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai_actions_v3.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Actions - v3&lt;/div&gt;

&lt;p&gt;That interaction felt awkward. I iterated on the words with “Solve” and “AI”. Tapping “Solve” switched to “Answer” and “Hint” with “Reset”. Tapping “AI” switched to “Reset” and “Pause” with a play rate slider. The slider would allow users to slow down or speed up autoplay, seeing what choices the algorithm made to proceed through the problem space.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai_actions_v4.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Actions - v4&lt;/div&gt;

&lt;p&gt;That workflow was far too awkward. This app was supposed to be simple. I returned to the music player analogy with a small tweak: “Reset” (Back), “Play”/“Pause”, “Hint” (Next), “Solve” (End).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai_actions_v5.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Actions - v5 (Final)&lt;/div&gt;

&lt;p&gt;These buttons addressed the use cases and were easy to implement, and they translated well to the high-level pitch for the app:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Solve Sudoku Instantly&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Tap for a Hint&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;How to Solve Any Puzzle&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I got tired of adding puzzles manually into the app and added a toolbar item for importing a default set of puzzles: Norvig’s example along with the three hardest puzzles I could find.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://norvig.com/sudoku.html&quot;&gt;Norvig’s Example&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.sudokuwiki.org/Weekly_Sudoku.asp?puz=28&quot;&gt;Unsolvable #28&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.sudokuwiki.org/Weekly_Sudoku.asp?puz=49&quot;&gt;Unsolvable #49&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://abcnews.go.com/blogs/headlines/2012/06/can-you-solve-the-hardest-ever-sudoku&quot;&gt;World’s Hardest&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The final addition to the app was the tutorial. The first question during beta testing was what exactly the app did, so I added the tutorial and a question mark button to bring up the Help section any time.&lt;/p&gt;

&lt;p&gt;SudokuAI is not a complicated or flashy app, but it achieves its goals.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-laptop-code fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Technical Difficulties&lt;/h3&gt;

&lt;p&gt;I’ve used Objective-C, Cocoa, and Interface Builder (NIBs/XIBs) for over a decade, writing 50k lines of code to create the user interface for Retrospect Backup for Mac, but Apple’s moved on to Swift. I had zero knowledge of Swift when I started this app.&lt;/p&gt;

&lt;p&gt;The initial problem was learning enough Swift to implement the algorithm, but productizing R&amp;amp;D doesn’t just mean wrapping a UI around some research. Features like “Tap for a Hint” meant deciding how to store and surface the intermediate steps of the algorithm, and the “Play”/“Pause” buttons required the algorithm to slow down and occasionally stop at the user’s request. Adding the plumbing for those user-facing features required extending the original algorithm to support new use cases.&lt;/p&gt;

&lt;p&gt;Next, when should the app actually solve a puzzle? When the user hits “Solve”? When it’s created? The design choice depends on the engineering capabilities. Luckily, solving any known puzzle takes 1.2 seconds at maximum, about 1400 steps in the algorithm. Based on that, I created an upper bound, declaring “No Solution” if the algorithm exceeds 2000 steps in its search. Because the time was short enough, I opted to re-solve the puzzle on every change. By always having a solution or failing after a certain number of tries, the app can display whether there is a solution, providing great feedback to users who are editing a puzzle.&lt;/p&gt;

&lt;p&gt;The UI is built in SwiftUI. SwiftUI is drastically different from working with XIBs in Interface Builder. Apple discontinued Interface Builder as a standalone app in 2011, rolling it into Xcode 4, and introduced SwiftUI in 2019. SwiftUI is declarative code (as opposed to imperative code), meaning you describe what you want, not exactly how to build it. The following code becomes a list with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PuzzleCell&lt;/code&gt; as the table view’s cells and “SudokuAI” as the title.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-swift&quot; data-lang=&quot;swift&quot;&gt;&lt;span class=&quot;kt&quot;&gt;NavigationView&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;List&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;ForEach&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;items&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;item&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt;
      &lt;span class=&quot;kt&quot;&gt;PuzzleCell&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;item&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;item&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;listStyle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;PlainListStyle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;navigationTitle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;SudokuAI&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With SwiftUI, I could easily write UI code (with only a few conditional statements) that worked on any iPhone and iPad in both portrait and landscape. Coming from Interface Builder, the experience was both transformative and incredibly frustrating: amazing because very little code can become a functioning UI and annoying because that UI might not be exactly what I wanted. Learning SwiftUI informed the app’s design by showing me what was easy and shippable.&lt;/p&gt;

&lt;p&gt;Beyond solving the puzzle, the app needed to ingest puzzles, either by user creation or by import and image recognition. The goal was to do a reasonable job of recognizing a Sudoku puzzle in an image and extracting the numbers from it. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VNRecognizeTextRequest&lt;/code&gt; in Apple’s Vision framework was a great API to accomplish this with minimal effort, but it took a number of tries to find the correct parameters. Thanks to &lt;a href=&quot;https://developer.apple.com/documentation/vision/locating_and_displaying_recognized_text&quot;&gt;Apple’s Locating and Displaying Recognized Text sample project&lt;/a&gt; for making it easy to test attributes against sample puzzle images.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-swift&quot; data-lang=&quot;swift&quot;&gt;&lt;span class=&quot;n&quot;&gt;textRecognitionRequest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;VNRecognizeTextRequest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;completionHandler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;recognizeTextHandler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;textRecognitionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;recognitionLevel&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;accurate&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;textRecognitionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;usesLanguageCorrection&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;textRecognitionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;recognitionLanguages&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;en-US&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;textRecognitionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;usesCPUOnly&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;textRecognitionRequest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;revision&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;VNRecognizeTextRequestRevision1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;For a given image, iOS’s Vision API provided a list of character guesses with their location in the image. The app needed to estimate the spacing between the recognized numbers, including where the puzzle started and stopped within the image, so the code first guesses the surrounding box size and then assigns each character to a square, adding blanks for squares that contain no number. The algorithm works well if the user crops the image to only include the puzzle. To help the user correct any errors, I overlay the recognized puzzle on top of the image, so that the user can tap on each square to correct the number. The app uses the same workflow to create a new puzzle.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudokuai_puzzle_overlay.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Image Overlay on Puzzle&lt;/div&gt;

&lt;p&gt;With camera support, photo roll integration, and puzzle creation, SudokuAI gives users a couple different workflows for ingesting puzzles. Solving puzzles instantly on any change allows the app to display a status like “Ready to Solve” or “44 hints left”. Letting users tap to change each square ensures any image recognition errors can be easily fixed. SwiftUI enables me to bundle these features into a simple UI.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fab fa-app-store fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Available Now&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://bdunagan.com/sudokuai&quot;&gt;SudokuAI&lt;/a&gt; is available for free from the &lt;a href=&quot;https://apps.apple.com/us/app/sudokuai/id6444911620&quot;&gt;App Store&lt;/a&gt;. It is compatible with iOS 15 and later on iPhone and iPad. The source code is available on GitHub at &lt;a href=&quot;https://www.github.com/bdunagan/SudokuAI&quot;&gt;bdunagan/SudokuAI&lt;/a&gt; under the MIT license.&lt;/p&gt;

&lt;p&gt;Despite its name, the underlying algorithm is simply constraint propagation and search, neither of which is AI (artificial intelligence) or ML (machine learning). SudokuAI just seemed like a catchy name.&lt;/p&gt;

&lt;p&gt;&lt;a class=&quot;center&quot; href=&quot;https://apps.apple.com/us/app/sudokuai/id6444911620&quot;&gt;
  &lt;img src=&quot;http://bdunagan.com/files/appstore.png&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Sat, 31 Dec 2022 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2022/12/31/sudokuai-instantly-solve-any-sudoku-puzzle-with-your-iphone/</link>
        <guid isPermaLink="true">https://bdunagan.com/2022/12/31/sudokuai-instantly-solve-any-sudoku-puzzle-with-your-iphone/</guid>
      </item>
    
      <item>
        <title>How to Solve Any Sudoku Puzzle in One Second</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/sudoku.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/4eBOPIJliNc&quot;&gt;Unsplash&lt;/a&gt; - @iamfrancismorgan&lt;/div&gt;

&lt;p&gt;My friend pointed out an excellent article by Peter Norvig titled &lt;a href=&quot;https://norvig.com/sudoku.html&quot;&gt;Solving Every Sudoku Puzzle&lt;/a&gt;. Norvig is an Education Fellow at Stanford Institute for Human-Centered AI, co-authored “Artificial Intelligence: A Modern Approach”, the most popular AI textbook, and served as a director of research and search quality at Google.&lt;/p&gt;

&lt;p&gt;This is Sudoku: fill in every blank square with a number between 1 and 9 such that every row, every column, and every 3x3 group has every digit.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudoku_norvig.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Example Sudoku Puzzle&lt;/div&gt;

&lt;p&gt;I was a complete novice to Sudoku when I started this project. I had casually played it before, but my naive mental algorithm was a mix of constraint propagation and search, without any systematic method to it. In his article, Norvig walked through how to use constraint propagation and backtracking search to quickly solve any valid Sudoku puzzle. His Python version was able to do so in one second for the world’s hardest puzzles.&lt;/p&gt;

&lt;p&gt;I skimmed Norvig’s general approach (summarized below) and then attempted to fill in the details myself with a Ruby implementation. The process took a couple of iterations and pushed me to understand the problem space far better than I would have by simply reading the article and Norvig’s Python implementation.&lt;/p&gt;

&lt;p&gt;Let’s walk through the different challenges.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-puzzle-piece fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Problem Space&lt;/h3&gt;

&lt;p&gt;Sudoku has roughly 6.7 x 10^21 valid completed grids, and the minimum number of supplied values for a puzzle with a unique solution is 17. A researcher actually spent a year proving that no 16-value Sudoku puzzle with a unique solution exists, according to &lt;a href=&quot;https://www.technologyreview.com/2012/12/27/180994/best-of-2012-mathematicians-solve-minimum-sudoku-problem/&quot;&gt;MIT Tech Review&lt;/a&gt;. Fittingly, Norvig’s example puzzle (pictured above) has 17 values.&lt;/p&gt;

&lt;p&gt;Each square in the 9x9 grid can have a value between 1 and 9. These squares are grouped into “units”: columns, rows, and 3x3 groups. Each unit has nine squares (peers of one another), and a solved unit contains each digit exactly once. If a square in a unit already has a value assigned, like “4” in the first square (A1 in Sudoku parlance), then no other square in that unit can contain that value. By converting the grid that we see into a representation of known and potential values (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A1-&amp;gt;4, A2-&amp;gt;12356789...&lt;/code&gt;), we can evaluate each unit according to a set of constraints.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-list-ul fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Constraint Propagation&lt;/h3&gt;

&lt;p&gt;There are two constraints that we can use to reduce the space of possibilities:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Elimination&lt;/em&gt;: If a square has only one possible value, eliminate that value from all of its peers.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Assignment&lt;/em&gt;: If a square is the only one in a unit that can still hold a certain value, that value must belong to that square.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For every elimination, we can attempt an assignment, and for every assignment, we can attempt further eliminations, recursively narrowing the potential value space for the puzzle.&lt;/p&gt;

&lt;p&gt;In practice, easy and medium difficulty Sudoku puzzles can be solved without search by simply running eliminations and assignments recursively.&lt;/p&gt;
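
&lt;p&gt;As a rough illustration of the two rules, here is a minimal Ruby sketch modeled on the structure of Norvig’s Python &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;eliminate&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;assign&lt;/code&gt; functions. The constant and method names (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SQUARES&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UNITS&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PEERS&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;propagate&lt;/code&gt;) are chosen for this example and are not necessarily what the repo uses.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;ROWS = ('A'..'I').to_a
COLS = ('1'..'9').to_a
SQUARES = ROWS.product(COLS).map { |pair| pair.join }

# 27 units: 9 columns, 9 rows, and 9 3x3 groups.
UNIT_LIST =
  COLS.map { |c| ROWS.map { |r| r + c } } +
  ROWS.map { |r| COLS.map { |c| r + c } } +
  ROWS.each_slice(3).flat_map { |rs|
    COLS.each_slice(3).map { |cs| rs.product(cs).map { |pair| pair.join } }
  }
# Each square belongs to 3 units and shares a unit with 20 other squares.
UNITS = SQUARES.to_h { |s| [s, UNIT_LIST.select { |u| u.include?(s) }] }
PEERS = SQUARES.to_h { |s| [s, UNITS[s].flatten.uniq - [s]] }

# Elimination: remove digit d from square s, then propagate.
# Returns the values hash, or nil if a contradiction appears.
def eliminate(values, s, d)
  return values unless values[s].include?(d)
  values[s] = values[s].delete(d)
  return nil if values[s].empty?
  # s is down to one value, so no peer may hold it.
  if values[s].size == 1
    last = values[s]
    return nil unless PEERS[s].all? { |peer| eliminate(values, peer, last) }
  end
  # If a unit has only one place left for d, d must go there.
  UNITS[s].each do |unit|
    places = unit.select { |sq| values[sq].include?(d) }
    return nil if places.empty?
    return nil if places.size == 1 and assign(values, places.first, d).nil?
  end
  values
end

# Assignment: fix square s to digit d by eliminating every other digit.
def assign(values, s, d)
  others = values[s].delete(d)
  others.each_char.all? { |other| eliminate(values, s, other) } ? values : nil
end

# Start with every digit possible everywhere, then assign the givens.
def propagate(grid)
  values = SQUARES.to_h { |s| [s, '123456789'] }
  SQUARES.zip(grid.chars).each do |s, ch|
    next unless '123456789'.include?(ch)
    return nil unless assign(values, s, ch)
  end
  values
end

# Easy grid from Norvig's article; 0 marks a blank square.
easy = '003020600900305001001806400008102900700000008006708200002609500800203009005010300'
values = propagate(easy)
puts SQUARES.map { |s| values[s] }.each_slice(9).map { |row| row.join(' ') }&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;For an easy puzzle like this one, every square should come back with a single digit. Harder puzzles leave squares with several candidates, which is where search takes over.&lt;/p&gt;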

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-search fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Search&lt;/h3&gt;

&lt;p&gt;Constraint propagation only gets us so far. After we’ve limited the potential values using eliminations and assignments, we need to search the problem space: choose a possible value for a square and see if it’s correct. A backtracking search leverages constraints to prune a given branch and backtrack as soon as the remaining possibilities on that branch are not valid. I used a backtracking depth-first search in my implementation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudoku_search_bdfs.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Backtracking Depth-First Search (BDFS)&lt;/div&gt;
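
&lt;p&gt;To make the idea concrete, here is a deliberately simplified Ruby sketch of a backtracking depth-first search. It is an assumption-laden illustration, not the code from my implementation: it skips constraint propagation entirely, works directly on an 81-character grid string, and the method names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;candidates&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search&lt;/code&gt; are made up for the example.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;BLANKS = '0.'

# Digits still allowed at index i, given its row, column, and 3x3 group.
def candidates(grid, i)
  row, col = i.divmod(9)
  used = (0...81).select { |j|
    r, c = j.divmod(9)
    r == row || c == col || [r / 3, c / 3] == [row / 3, col / 3]
  }.map { |j| grid[j] }
  ('1'..'9').reject { |d| used.include?(d) }
end

# Returns a solved 81-character string, or nil if this branch is a dead end.
def search(grid)
  blanks = (0...81).select { |i| BLANKS.include?(grid[i]) }
  return grid if blanks.empty?
  # Branch on the blank square with the fewest candidates.
  i = blanks.min_by { |b| candidates(grid, b).size }
  candidates(grid, i).each do |d|
    attempt = grid.dup
    attempt[i] = d
    solved = search(attempt)
    return solved if solved
  end
  nil # No candidate worked (or none existed): backtrack.
end&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A square with no remaining candidates makes the loop body never run, so the branch returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nil&lt;/code&gt; and the caller tries its next digit. Without propagation to shrink the search space first, hard puzzles can take far longer with this version.&lt;/p&gt;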

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-laptop-code fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Implementation Details&lt;/h3&gt;

&lt;p&gt;Norvig wrote his implementation in Python. I wrote mine in Ruby. They’re both great, high-level languages, but they have important differences that only manifested when I tested difficult puzzles.&lt;/p&gt;

&lt;p&gt;In particular, Python evaluates generator expressions lazily. Norvig passes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;some()&lt;/code&gt; a generator, so each element &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;e&lt;/code&gt; is not computed until the loop reaches it. The first time one succeeds, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;some()&lt;/code&gt; returns and the rest are never evaluated.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;some&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;Return some element of seq that is true.&quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seq&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Ruby’s enumerable methods evaluate eagerly by default, so the function needs to check an instance variable instead to break out of the search.&lt;/p&gt;
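
&lt;p&gt;That said, Ruby does offer opt-in laziness through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Enumerator::Lazy&lt;/code&gt;, which can mimic the Python behavior. The following is a small sketch rather than code from the repo; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;try_candidate&lt;/code&gt; is a hypothetical stand-in for one branch of the search.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;# Hypothetical stand-in for one search branch: returns a result or nil.
def try_candidate(digit)
  puts &quot;trying #{digit}&quot;
  digit == '3' ? &quot;solved with #{digit}&quot; : nil
end

# Ruby port of Norvig's some(): return the first truthy element, or false.
def some(seq)
  seq.each { |e| return e if e }
  false
end

# lazy defers each try_candidate call until some() asks for the next element.
result = some(%w[1 2 3 4 5].lazy.map { |d| try_candidate(d) })
puts result&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Only the first three candidates are tried here. Dropping &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.lazy&lt;/code&gt; would run all five before &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;some&lt;/code&gt; even starts.&lt;/p&gt;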

&lt;p&gt;Norvig picked an excellent puzzle as the example because there are only 17 initial numbers, compared to 36 in easy puzzles and 22 in hard ones. Moreover, the puzzle seems designed to trip up naive search algorithms. My early iterations failed on his example because squares with the smallest number of potential values needed the larger number assigned, and my code guessed the lowest one and could never recover after that. It was a great example that ensured I didn’t prematurely stop improving the algorithm.&lt;/p&gt;

&lt;p&gt;However, only testing with one puzzle is not a good QA practice, so I built out a test framework with 60 puzzles of varying difficulty to ensure my implementation worked on a variety of puzzles beyond Norvig’s example. To ease debugging, I added two functions for displaying the current state of the puzzle grid in Terminal: one for the actual values and one for the potential values. The visual aid helped debug issues with the search algorithm.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;format_grid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;grid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# Print grid as numbers and dots.&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;grid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;chars&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;each_slice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;to_a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;each_with_index&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;log&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;join&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;|&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;join&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;|&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;slice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span 
class=&quot;nf&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot; &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;log&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;------+------+------&quot;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudoku_grid.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Sudoku Grid in Terminal&lt;/div&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;display_values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;each_slice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;to_a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;each_with_index&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;log&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;collect&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;center&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)}.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot; &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;|&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;collect&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;center&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;)}.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot; &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;|&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;collect&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;center&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)}.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot; &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;log&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;-----------------------------+-----------------------------+-----------------------------&quot;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;log&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/sudoku_values.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Sudoku Potential Values in Terminal&lt;/div&gt;

&lt;p&gt;After I finished debugging my version, I ported Norvig’s Python implementation into Ruby to see how they compared. Norvig’s algorithm was vastly faster than mine was: from 3x faster (0.004 seconds vs 0.01 seconds) for easy puzzles to 178x faster for the hardest puzzle available (&lt;a href=&quot;https://www.sudokuwiki.org/Weekly_Sudoku.asp?puz=28&quot;&gt;Unsolvable #28&lt;/a&gt;): 0.2 seconds vs 34 seconds.&lt;/p&gt;

&lt;p&gt;The implementations were close for puzzles that only required constraint propagation, but mine was much slower at search for two reasons:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Overzealous Constraint Propagation&lt;/em&gt;: My version wasted a tremendous amount of time performing constraint propagation on every square during every search, rather than focusing on the square being tested and its affected peers in each unit.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Wasted Data Conversions&lt;/em&gt;: My version passed each grid to be tested as a string to the next search, forcing each run to recreate the potential values of each square in the grid rather than leaving the internal representation as a set of known/potential values for each square.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I preserved my Ruby approach along with my Ruby port of the Python version in a GitHub repo for comparison along with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudoku.csv&lt;/code&gt; test framework and results: &lt;a href=&quot;https://www.github.com/bdunagan/SudokuAI&quot;&gt;bdunagan/SudokuAI&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-mobile-alt fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;SudokuAI&lt;/h3&gt;

&lt;p&gt;As an intersection between Product Management and Engineering, I thought productizing Norvig’s algorithm into an iPhone app written in Swift would be a fun extension to this side project. Read more about the process in &lt;a href=&quot;https://bdunagan.com/2022/12/31/sudokuai-instantly-solve-any-sudoku-puzzle-with-your-iphone/&quot;&gt;SudokuAI: Instantly Solve Any Sudoku Puzzle with Your iPhone&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Sat, 31 Dec 2022 11:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2022/12/31/how-to-solve-any-sudoku-puzzle-in-one-second/</link>
        <guid isPermaLink="true">https://bdunagan.com/2022/12/31/how-to-solve-any-sudoku-puzzle-in-one-second/</guid>
      </item>
    
      <item>
        <title>Adopting a Continuous Delivery (CD) Jenkins DevOps Pipeline</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/cd_ci.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/_EFvjSgbw1c&quot;&gt;Unsplash&lt;/a&gt; - @chuttersnap&lt;/div&gt;

&lt;p&gt;More than a decade ago, Retrospect Engineering set up a daily build system for our Windows and Mac products. Once a day, a cronjob would kick off a task to build the latest version of our codebase and email the results to the team. The daily build was a fantastic way to automate the build process: it gave us a repeatable build, ensured the main codebase compiled in production, and stored builds that we could use to narrow down any issue that arose during regression testing.&lt;/p&gt;

&lt;p&gt;Since then, continuous integration (CI) workflows have become a common practice for Engineering teams. CI and continuous delivery (CD) both automate building, testing, and, in the case of delivery, packaging and deploying a product to the final platform for customers. This DevOps automation creates a pipeline from codebase to customers, eliminating human bottlenecks and manual steps to increase the efficiency and performance of an engineering team.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.devops-research.com/research.html&quot;&gt;Google’s DevOps Research and Assessment (DORA)&lt;/a&gt; group studied thousands of teams to understand the practices and metrics for high-performing teams vs low-performing teams, and continuous delivery is a fundamental building block for high-performing teams. By relying on an automation pipeline, engineering teams can ship more features faster and better tested while reducing burnout.&lt;/p&gt;

&lt;p&gt;Recently, we decided it was time to upgrade our infrastructure to a continuous delivery workflow to improve our efficiency as an Engineering team. Let’s walk through the original homegrown build system, what our overall goals for automation were, and how we extended the homegrown build system into a continuous delivery pipeline with Jenkins to improve our delivery performance.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-cube fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Homegrown Build System&lt;/h3&gt;

&lt;p&gt;Since the mid-2000s, Retrospect has had a daily build system for Windows and Mac and a manual build for Linux.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/cd_build.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Homegrown Build System&lt;/div&gt;

&lt;p&gt;On Windows, we’ve used Visual Build Pro to generate builds. Visual Build Pro checked out the source code from GitHub Enterprise, compiled the appropriate project, posted the build output to a NAS share, and emailed the Engineering team with the results. When we added automated testing, we updated Visual Build Pro to kick that off as well.&lt;/p&gt;

&lt;p&gt;On Mac, we wrote our own build system. The first version was a shell script, run daily as a cron job. That lasted a couple years until we rewrote it in Ruby. The Ruby version has endured quite well. In the intervening years, we’ve added support for building different versions of the Mac application, running regressions with our automated test framework, signing them, notarizing them, and uploading the final artifacts to Amazon S3. The Ruby build script is an end-to-end build process for our Mac product.&lt;/p&gt;

&lt;p&gt;On Linux, we have never made changes to the client agent frequently enough to justify automating the build process. We built it manually for every change.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;far fa-list-ul fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Continuous Delivery Goals&lt;/h3&gt;

&lt;p&gt;Our homegrown system had elements of continuous integration, but there were many manual steps. Let’s walk through what we needed from a continuous delivery pipeline:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Continuous Delivery&lt;/strong&gt;: We wanted a single system that supported an end-to-end pipeline, starting when a pull request was merged, building, testing, packaging, deploying to Amazon S3 for download by customers, and notifying the Engineering team of the results.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;On-Premise&lt;/strong&gt;: We use GitHub Enterprise, so we needed a system that could run on-premise and control the local testing infrastructure.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Cross Platform&lt;/strong&gt;: Retrospect has Windows, Mac, and Linux elements to build as well as cross-platform dependencies. We need to build a Mac artifact and then transfer it to Windows and vice versa. Moreover, we needed to check out our codebase at a single consistent point across platforms.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Build Commit with Status&lt;/strong&gt;: As a team, we are accustomed to having a build commit to master/main on GitHub that identifies the exact bits that were compiled and shipped, and the build commits keep version numbers and cross-platform artifacts up-to-date within the codebase. We wanted to consolidate the two commits that we currently had into a single commit and mark it with a status from the CD pipeline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our goal was to leverage our current build processes, knit them together into a single build pipeline, and extend that into continuous delivery.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-cubes fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Third-Party Build Pipeline&lt;/h3&gt;

&lt;p&gt;A homegrown system has the fantastic advantage of doing exactly what it was coded for and the unfortunate downside of doing nothing else. Only the internal team can push its functionality forward. We needed to leverage a third-party solution to achieve our CD goals with minimal commitment.&lt;/p&gt;

&lt;p&gt;We looked at Jenkins, GoCD, Bamboo, CircleCI, and GitHub Actions; Jenkins was open source, worked on-premise, and seemed to be popular enough that we could google for help. Jenkins was designed to be a CI system with a plugin architecture, and it had thousands of plugins. We thought it made sense to try it first.&lt;/p&gt;

&lt;p&gt;Overall, the setup process took about three weeks of one engineer’s time, spread out over three months. We have nine stages in our pipeline using three nodes (2 Windows, 1 Mac) to perform the following steps through a versioned Groovy script.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/cd_jenkins_pipeline.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Third-Party Build Pipeline&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Auto Trigger&lt;/strong&gt;: We wanted the new system to start a new build when someone merged a pull request to master. As we use GitHub, its Webhooks were the natural choice. We originally set up GitHub Webhooks to only send Jenkins data on pull request changes, but because that event didn’t actually include the master commit, Jenkins didn’t trigger the build. We had to send the push event from GitHub for the Jenkins Git plugin to notify our pipeline. However, because the pipeline was triggered by pushes, the later build commit auto-triggered a new build, creating a feedback loop. We had to add special code to detect and suppress builds based on the build commit.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Delay&lt;/strong&gt;: We set a 30-minute delay before the pipeline starts in case an engineer merges several pull requests in quick succession (which did happen to us in practice).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Code Checkout&lt;/strong&gt;: We have two platforms that we’re building on, and we needed the codebase to be in the same state on both to avoid the race condition of building different commits on different platforms. We reduced the window for this race condition from 45 minutes to 15 seconds by separating out the checkout stages and moving them to the beginning. We did experiment with checking out a specific Git hash, but that approach ran into issues when we tried to push a new build commit back to master.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Build Dependencies&lt;/strong&gt;: Retrospect for Windows includes a Mac binary and vice versa. The build pipeline needed to build the dependencies on both platforms and then copy them to the other platform via an Amazon S3 artifact staging folder.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Commit Version and Dependencies&lt;/strong&gt;: The team was accustomed to the build system updating the product version and the above dependencies that they use in development. In the past, we had the separate build systems commit these, but that approach would re-introduce the race condition from above. If one platform committed to master, the other platform would need to fetch that change or its push would fail, but because it was fetching a change, it could fetch too much. Either build commit push would fail if someone had merged another pull request, but we were trying to avoid silent bugs, rather than the build process failing. We consolidated the two platform commits into one by transferring all of the updated files from Windows onto Mac via the S3 artifact staging folder and then committing all of them.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Build Applications&lt;/strong&gt;: Each platform node would build the Retrospect client agent and engine application as before, using the previous build systems.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Package Applications&lt;/strong&gt;: Jenkins was quite handy for creating the folder structure and moving files around for packaging. On Mac, we did need to code sign and notarize the binaries. These steps used to be manual, and the team really appreciated automating them.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Deploy Applications&lt;/strong&gt;: Our deployment process is quite straightforward: upload the packaged applications to S3 and mark them as public.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Test Applications&lt;/strong&gt;: On Mac, the original build system continued to run automated testing on the finalized product. On Windows, Jenkins downloaded the finished binaries to a new node where it built the test framework and then ran it against the new binaries. Our tests are quite extensive, as we make backup software, so the tests actually extended the build pipeline from two hours to nine hours.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Email Team&lt;/strong&gt;: Jenkins sent an email at the end with the status of the build and linked (if successful) to the staged final products.&lt;/li&gt;
&lt;/ul&gt;
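
&lt;p&gt;The build-commit check from the Auto Trigger step can be sketched in a few lines of shell. This is a minimal illustration rather than our actual Groovy code: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[build]&lt;/code&gt; subject prefix and the function names are assumptions made up for the example.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# Hypothetical marker placed at the start of every commit made by the pipeline itself.
BUILD_COMMIT_PREFIX=&quot;[build]&quot;

# Succeeds when the given commit subject starts with the marker.
# The quoted expansion keeps the brackets literal inside the case pattern.
is_build_commit() {
  case &quot;$1&quot; in
    &quot;$BUILD_COMMIT_PREFIX&quot;*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints &quot;skip&quot; for the pipeline&apos;s own commits and &quot;build&quot; for everything else.
pipeline_action() {
  if is_build_commit &quot;$1&quot;; then
    echo &quot;skip&quot;
  else
    echo &quot;build&quot;
  fi
}

pipeline_action &quot;[build] Update version and dependencies&quot;
pipeline_action &quot;Fix restore progress bar&quot;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In a pipeline step, the subject of the triggering commit could come from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git log -1 --pretty=%s&lt;/code&gt;, and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skip&lt;/code&gt; result would end the run before any build stage starts, which breaks the feedback loop.&lt;/p&gt;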

&lt;p&gt;It’s a long list, but it ticks off all of our requirements. Jenkins provides us with an infrastructure for managing multiple build systems on different platforms as nodes in a single pipeline, and its Groovy script support enables us to knit those build systems together into a DevOps process.&lt;/p&gt;

&lt;p&gt;The resulting Groovy script is 500 lines, and that count excludes the 2k-line Ruby file for building Mac and the extensive Visual Build Pro script on Windows. The option to pull the Groovy script from source code (GitHub Enterprise in our case) highlights the DevOps mindset of repeatability and automation.&lt;/p&gt;

&lt;p&gt;While the team is really happy that Jenkins is working now, the setup process was not smooth. It took a significant amount of time to understand what Jenkins could provide us, how to connect what we needed into what it could deliver, and then codify that process in Groovy. Beyond that expertise barrier, Jenkins has bugs, like all software. We hit a number of issues where we eventually found open JIRA issues without resolution, so we worked around them.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-tachometer-alt fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;High-Performing Team&lt;/h3&gt;

&lt;p&gt;CI/CD workflows are a necessary part of a high-performing Engineering team. While we had a good system for building and testing the software, we were missing the full continuous delivery pipeline. As with all Engineering projects, there was an opportunity cost associated with the infrastructure upgrade, but resolving this technical debt allows the team to focus on features rather than manual processes.&lt;/p&gt;

&lt;p&gt;Thanks to the engineer who took on this project and overcame the hurdles we found in Jenkins to wrap up this project and to the entire team for helping out when needed! Moving to cross-platform continuous integration and continuous delivery had been a goal for years, so a huge shout-out to the Retrospect Engineering team for ticking it off this summer.&lt;/p&gt;
</description>
        <pubDate>Sat, 29 Oct 2022 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2022/10/29/adopting-a-continuous-delivery-cd-jenkins-devops-pipeline/</link>
        <guid isPermaLink="true">https://bdunagan.com/2022/10/29/adopting-a-continuous-delivery-cd-jenkins-devops-pipeline/</guid>
      </item>
    
      <item>
        <title>Anomaly Detection with Retrospect Backup</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/anomaly_detection.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/1k3vsv7iIIc&quot;&gt;Unsplash&lt;/a&gt; - @kaip&lt;/div&gt;

&lt;p&gt;The rise of ransomware, and ransomware-as-a-service more recently, poses a huge threat to businesses around the world with a projected $20B payout in 2021, a 100% year-over-year increase for the last four years. Ransomware is now a vast ecosystem with many different forms of attacks. Many attackers have their own versions of ransomware, and these are called variants. Each variant has the same purpose, but it uses a different mechanism or simply a different naming convention. The majority of ransomware variants and all of the top 10 forms for 2021 followed the same attack pattern: infiltrate a computer, encrypt its files, and then rename the files with a different extension. Businesses need to detect ransomware as early as possible to stop the threat and remediate those resources.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.retrospect.com/anomaly_detection&quot;&gt;Anomaly detection&lt;/a&gt; in Retrospect Backup 18.5, launched last month, identifies changes in an environment that warrant the attention of the IT team. Administrators can tailor anomaly detection to their business’s specific systems using customizable filtering and thresholds for each of their backup policies, and those anomalies are aggregated on Retrospect Management Console across the entire business’s Retrospect Backup instances or a partner’s client base with a notification area for responding to those anomalies.&lt;/p&gt;

&lt;p&gt;Retrospect Backup detects all of the major ransomware variants using an algorithm that focuses on file metadata anomalies for behavior-based monitoring. According to Coveware, the top variants are always changing, with over 50% changing every quarter. The key to detection is combining technologies such as signature detection in processes with file-based irregularities. Using a multi-pronged defense, with immutable backups, anomaly detection, and other security layers, businesses will know when they’re being attacked and will have the tools to remediate it and move on.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-tools fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Technical Deep Dive&lt;/h3&gt;

&lt;p&gt;As a data protection solution, Retrospect Backup has a significant footprint in a business’s computer environment with visibility into endpoints, servers, NAS volumes, and even cloud storage. To detect anomalies, Retrospect Backup provides a per-policy option for filtering and threshold to decide whether or not certain file changes are an anomaly with options for notifications. Let’s walk through each:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Filtering&lt;/strong&gt;: Configure a filter to identify the files to observe. Retrospect lets administrators tailor this to file types, paths, dates, or specific attributes, and the built-in filter focuses on office documents, photos, and movies.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Threshold&lt;/strong&gt;: Set the threshold for the alert. If the percentage of new or changed files out of the total number of files matched by the filter is greater than or equal to the threshold, Retrospect will create an anomaly event.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Notification&lt;/strong&gt;: Access notifications on Retrospect Management Console, receive them immediately in an email, and find them in the Execution History and Backup Report. Retrospect surfaces the notification for anomaly detection in the best place for an organization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/anomaly_detection_diagram.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Anomaly Detection Diagram&lt;/div&gt;

&lt;p&gt;The diagram shows the volume being monitored as a whole, the subset of files that match the “Anomaly Detection” filter, and the files that are new or changed within that subset. Retrospect generates an alert if the percentage of new or changed files within that subset meets or exceeds the threshold.&lt;/p&gt;
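
&lt;p&gt;As a rough illustration of that comparison (not Retrospect’s actual implementation), the check reduces to simple percentage arithmetic. The function name and the sample numbers below are hypothetical.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# Prints &quot;anomaly&quot; when the share of new or changed files among the files
# matched by the filter is greater than or equal to the threshold percentage.
# Arguments (non-negative integers): changed matched threshold_percent
check_anomaly() {
  changed=$1
  matched=$2
  threshold=$3
  # No matched files means there is nothing to evaluate.
  if [ &quot;$matched&quot; -eq 0 ]; then
    echo &quot;ok&quot;
    return
  fi
  # Compare changed/matched against threshold/100 using integer math only.
  if [ $((changed * 100)) -ge $((threshold * matched)) ]; then
    echo &quot;anomaly&quot;
  else
    echo &quot;ok&quot;
  fi
}

check_anomaly 300 1000 25   # 30% of the matched files are new or changed
check_anomaly 100 1000 25   # 10% of the matched files are new or changed&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With a 25% threshold, the first call reports an anomaly and the second does not. A share of exactly 25% also counts as an anomaly.&lt;/p&gt;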

&lt;p&gt;Customers can simply enable “Anomaly Detection” in each policy, select the appropriate filter, and assign a threshold. Retrospect takes care of the rest.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/anomaly_detection_win_settings.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect Backup for Windows - Anomaly Detection&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/anomaly_detection_mac_settings.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect Backup for Mac - Anomaly Detection&lt;/div&gt;

&lt;p&gt;When an anomaly is detected, Retrospect can generate an email notification and surface the anomaly in the application and on Retrospect Management Console. It also provides API integration for businesses to tailor the workflow to their processes using Script Hooks, so the organizations can take further actions, like posting to a Slack channel or even stopping the backup until the problem is assessed.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-truck fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Free Update&lt;/h3&gt;

&lt;p&gt;Congratulations to the Retrospect Engineering team for releasing such a huge update! Retrospect Backup 18 included immutable backups for ransomware protection, and with this free update, we’re extending that ransomware focus to anomaly detection.&lt;/p&gt;
</description>
        <pubDate>Sun, 06 Mar 2022 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2022/03/06/anomaly-detection-with-retrospect-backup/</link>
        <guid isPermaLink="true">https://bdunagan.com/2022/03/06/anomaly-detection-with-retrospect-backup/</guid>
      </item>
    
      <item>
        <title>Containerizing our Web Development Toolchain with Docker</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/containers.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/SInhLTQouEk&quot;&gt;Unsplash&lt;/a&gt; - @exdigy&lt;/div&gt;

&lt;p&gt;Containerization has been around for decades since chroot, but the technology has become ubiquitous as large tech companies have invested in platforms and tools like Kubernetes to make containers easier to use and scale. While I’d been using containers with Heroku for years, I didn’t realize how pervasive they were until I read an &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44843.pdf&quot;&gt;ACM Queue article&lt;/a&gt; in 2016 about Google’s Borg, Omega, and Kubernetes, especially this section:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“The isolation and dependency minimization provided by containers have proved quite effective at Google, and the container has become the sole runnable entity supported by the Google infrastructure.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Containers are how Google can scale services, by abstracting away the physical infrastructure and building orchestration tools to manage them, and they fit into a broader context with the rise of DevOps, handing Engineering an excellent building block for creating software delivery pipelines and deploying applications.&lt;/p&gt;

&lt;p&gt;Containers are one level of abstraction up from virtual machines. Virtual machines (VMs) allow you to run an operating system in an isolated environment in a server, virtualizing the hardware, but containers allow you to run a single application within an isolated environment, virtualizing the operating system. Compared to virtual machines, containers are smaller, start faster, and consume fewer resources, so you can run more on a single physical hardware system while maintaining isolation between them.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/containerization.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Virtual Machine vs Container&lt;/div&gt;

&lt;p&gt;You’ll find many diagrams like this one online, but it’s an oversimplification. Containers are so small and fast because they rely on the host OS’s kernel, but they actually need their own minimal guest OS as well to handle tasks like package management, configuration, and other miscellaneous OS features. The application’s libraries and dependencies as well as the small guest OS are all wrapped up in “Libs”. As an example, the Alpine Linux OS is 5MB, and applications running on it start in less than a second.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://docker.com/&quot;&gt;Docker&lt;/a&gt; made containers accessible and has become very popular since its launch in 2013 with support for Windows, Mac, and Linux. Docker containers are instances of a Docker image, and each image is a stack of immutable layers, based on the commands of your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt;. This layered immutability has multiple benefits, allowing you to only change the image based on new or changed commands and to store layers separately in Docker Hub, the hosted service for distributing public Docker images. Clients can download layers concurrently for performance.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-tools fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Toolchain Upgrade&lt;/h3&gt;

&lt;p&gt;Before the transition to Docker, our web app toolchain was a local macOS environment for development, &lt;a href=&quot;https://www.github.com&quot;&gt;GitHub&lt;/a&gt; for source control, and &lt;a href=&quot;https://www.heroku.com&quot;&gt;Heroku&lt;/a&gt; for deployment. Retrospect Engineering has used Heroku for web app deployment since 2011, starting with &lt;a href=&quot;https://www.retrospect.com/&quot;&gt;Retrospect.com&lt;/a&gt;, our Ruby-on-Rails website, and then with &lt;a href=&quot;https://console.retrospect.com/&quot;&gt;Retrospect Management Console&lt;/a&gt;, our hosted service for backup analytics. Heroku, now owned by Salesforce, has been a fantastic deployment target. Deploying a new version is as simple as pushing to a Git repository.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/container_development.jpg&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Development Toolchain&lt;/div&gt;

&lt;p&gt;Instead of installing and managing upgrades for Ruby/Rails/Gems/Postgres/Redis on macOS, we manage them all within a Docker container running on our Mac development computers. We finally decided to migrate from a local macOS environment to Docker because of upgrade issues. Every Ruby/Rails/gem/macOS version update hit installation issues which we only resolved after finding a blog post for our specific combination of versions. Upgrades were painful enough that we postponed them, a sure sign that we needed a deeper toolchain upgrade.&lt;/p&gt;

&lt;p&gt;Docker provided that next-generation environment, where upgrading a Ruby version was as simple as changing the base Docker image that we depend on. Every upgrade starts with a clean slate in the image, so there are far fewer conflicts between different versions or different packages. Using containers for development work increased the Engineering team’s efficiency.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-cube fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;One Container&lt;/h3&gt;

&lt;p&gt;The website is a Ruby-on-Rails environment with only a SQLite dependency for local development, so we use a simple Docker configuration file (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt;). Here is our short file for the website:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;FROM ruby:2.7.5-alpine3.14 as retrospectdotcom

&lt;span class=&quot;c&quot;&gt;# Set up environment: build-base for &quot;developer tools&quot;, sqlite for SQLite, postgres for Postgres, git for Git, and tzdata for weird Rails 5 error.&lt;/span&gt;
RUN apk update &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; apk add &lt;span class=&quot;nt&quot;&gt;--no-cache&lt;/span&gt; build-base sqlite-dev postgresql-client postgresql-dev tzdata git

&lt;span class=&quot;c&quot;&gt;# Add Gemfiles for Bundler.&lt;/span&gt;
RUN &lt;span class=&quot;nb&quot;&gt;mkdir&lt;/span&gt; /retrospectdotcom
COPY ./Gemfile /retrospectdotcom
COPY ./Gemfile.lock /retrospectdotcom
WORKDIR /retrospectdotcom

&lt;span class=&quot;c&quot;&gt;# Set up gems&lt;/span&gt;
RUN bundle &lt;span class=&quot;nb&quot;&gt;install
&lt;/span&gt;RUN &lt;span class=&quot;nb&quot;&gt;rm&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-rf&lt;/span&gt; /retrospectdotcom

&lt;span class=&quot;c&quot;&gt;# Run rails server&lt;/span&gt;
CMD &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;bundle&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;exec&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;rails&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;server&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;-b&quot;&lt;/span&gt;, &lt;span class=&quot;s2&quot;&gt;&quot;0.0.0.0&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Let’s walk through each chunk:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Base Image&lt;/strong&gt;: We use the official Ruby Alpine Linux image as the base image. Common tools like Ruby publish official images to Docker Hub.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;APK&lt;/strong&gt;: Alpine Package Keeper (APK) is an extremely lightweight package manager. One of the reasons we need a base image inside the container is for the package management functionality, so that we can download the necessary libraries for our application’s environment.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Gemfiles&lt;/strong&gt;: We create the directory for our application, copy in the Gemfiles, run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bundle install&lt;/code&gt; to install the required libraries, and finally remove both Gemfiles. This image is specifically for local development, so we want to bind mount to the host’s application folder and not depend on any application code within the container.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Gemfile.lock&lt;/strong&gt;: We actually created a separate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt; for running &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bundle update&lt;/code&gt; because it’s such an infrequent task to generate a new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Gemfile.lock&lt;/code&gt;, and we wanted it to be an intentional action.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Rails Server&lt;/strong&gt;: We run the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rails server&lt;/code&gt; command to start the application.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From Terminal, we use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt; to build the image. Then, to run an instance of the image, we use a bind mount, reference a Docker environment file, and include &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-it&lt;/code&gt; to allow interactive debugging with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pry-byebug&lt;/code&gt; gem.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;docker build &lt;span class=&quot;nt&quot;&gt;-t&lt;/span&gt; retrospectdotcom &lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt;
docker run &lt;span class=&quot;nt&quot;&gt;-it&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--mount&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;bind&lt;/span&gt;,source&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;,target&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/retrospectdotcom &lt;span class=&quot;nt&quot;&gt;--env-file&lt;/span&gt; Docker.env &lt;span class=&quot;nt&quot;&gt;-p&lt;/span&gt; 3000:3000 retrospectdotcom&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With that, we have the website running in a Docker container with the following features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Local Development&lt;/strong&gt;: It’s accessible at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http://localhost:3000&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Live Editing&lt;/strong&gt;: The bind mount means the container’s app folder references the app folder on the host under source control, so that we can do live editing and commit our changes as if the app was not running in a container.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Live Debugging&lt;/strong&gt;: The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-it&lt;/code&gt; parameter enables &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tty&lt;/code&gt; mode for live interaction, which means we can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;debugger&lt;/code&gt; for breakpointing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also override the default command by passing a command at the end. For instance, to run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rails console&lt;/code&gt;, use:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;docker run &lt;span class=&quot;nt&quot;&gt;-it&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--mount&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;bind&lt;/span&gt;,source&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;,target&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/retrospectdotcom &lt;span class=&quot;nt&quot;&gt;--env-file&lt;/span&gt; Docker.env &lt;span class=&quot;nt&quot;&gt;-p&lt;/span&gt; 3000:3000 retrospectdotcom bundle &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;rails console&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Instead of typing the whole line every time, we set up a shell alias in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/.bash_profile&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;span class=&quot;nb&quot;&gt;alias &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;rs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;docker run -it --mount type=bind,source=&quot;$(pwd)&quot;,target=/retrospectdotcom --env-file Docker.env -p 3000:3000 retrospectdotcom&apos;&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# The above command becomes&lt;/span&gt;
rs bundle &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;rails console&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-cubes fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Multiple Containers&lt;/h3&gt;

&lt;p&gt;Retrospect Management Console is a Ruby-on-Rails web application that has two dependencies: Postgres and Redis. We needed all three services running and able to communicate with each other. This configuration required Docker Compose to handle the multiple microservices.&lt;/p&gt;

&lt;p&gt;With Docker Compose, we can set up a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-compose.yml&lt;/code&gt; file to describe the containers, dependencies, volumes, and configurations. Docker Compose even supports health checks to periodically verify that a service is operating correctly.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-yaml&quot; data-lang=&quot;yaml&quot;&gt;&lt;span class=&quot;na&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;2&apos;&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;services&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;postgresdb&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;postgres:11.2-alpine&apos;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;user&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;postgres&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;ports&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; 
        &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;5432:5432&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;pgdata:/var/lib/postgresql/data&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;healthcheck&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;CMD-SHELL&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;pg_isready&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-U&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;postgres&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;interval&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;5s&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;timeout&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;5s&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;retries&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;5&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;env_file&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Docker.env&apos;&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;redis&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;redis:4.0-alpine&apos;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;redisdata:/data&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;ports&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; 
        &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;6379:6379&quot;&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;console&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;depends_on&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;postgresdb&apos;&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;service_healthy&lt;/span&gt;
      &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;redis&apos;&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;service_started&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;sh -c &quot;rm -f tmp/pids/server.pid &amp;amp;&amp;amp; bundle exec rails s -p 3000 -b &apos;0.0.0.0&apos;&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;.&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;stdin_open&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;true&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;tty&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;true&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;ports&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;3000:3000&apos;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;.:/consoledotcom&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;env_file&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Docker.env&apos;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;entrypoint&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/docker-entry.sh&quot;&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;sidekiq&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;.&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;bundle exec sidekiq -C config/sidekiq.yml&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;.:/consoledotcom&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;links&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;postgresdb&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;redis&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;env_file&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Docker.env&apos;&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;pgdata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; 
  &lt;span class=&quot;na&quot;&gt;redisdata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;For our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-entry.sh&lt;/code&gt; file, we added logic to run any pending Rails migrations if the database already exists, or to create and seed it with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;db:setup&lt;/code&gt; if it does not:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;span class=&quot;c&quot;&gt;# Run postgres tasks.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if &lt;/span&gt;psql &lt;span class=&quot;nt&quot;&gt;-h&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$PGHOST&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-U&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$PGUSER&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$PGDBNAME&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&apos;&lt;/span&gt; 2&amp;gt;&amp;amp;1&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;then&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# Apply any migrations.&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Running &apos;bundle exec rake db:migrate&apos;&quot;&lt;/span&gt;
    bundle &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;rake db:migrate
&lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# If no database, run db:setup (db:create + db:schema:load + db:seed)&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# Note this will attempt all of those actions above twice (development + test) &lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# and might include log errors due to https://github.com/rails/rails/issues/27299&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Running &apos;bundle exec rake db:setup&apos;&quot;&lt;/span&gt;
    bundle &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;rake db:setup
&lt;span class=&quot;k&quot;&gt;fi&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# To load database dump, run the following from the *host* Terminal (because Docker can&apos;t see the host file system):&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# pg_restore --verbose --clean --no-acl --no-owner -h localhost -U postgres -d dbname latest.dump&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Run &quot;command&quot; supplied by docker-compose.&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;bundle &lt;span class=&quot;nb&quot;&gt;exec&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$@&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;As with the website, we configured Docker to use the host system’s web application folder under source control via a bind mount.&lt;/p&gt;

&lt;p&gt;To launch the containers, first build the images, then bring the services up:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;docker-compose &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; docker-compose.yml build
docker-compose &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; docker-compose.yml up&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Live debugging takes a few extra steps. First, we needed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;stdin_open: true&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tty: true&lt;/code&gt; entries under the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;console&lt;/code&gt; container in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-compose.yml&lt;/code&gt;. Then, we added &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;debugger&lt;/code&gt; in the Rails app wherever we wanted a breakpoint. Finally, we attached to the container after it launched:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;docker ps
docker attach container_name&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;To detach, type the escape sequence Ctrl+P then Ctrl+Q.&lt;/p&gt;
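
&lt;p&gt;As a rough sketch, the lookup and attach steps can be combined into one helper. The function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;attach_service&lt;/code&gt; and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COMPOSE&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DOCKER&lt;/code&gt; variables are hypothetical and not part of our setup; the helper assumes it is run from the folder containing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-compose.yml&lt;/code&gt;.&lt;/p&gt;

```shell
# Hypothetical helper (not from the original setup): attach to the
# container behind a Compose service without copying its name by hand.
# COMPOSE and DOCKER are assumed overridable, e.g. COMPOSE="docker compose".
COMPOSE="${COMPOSE:-docker-compose}"
DOCKER="${DOCKER:-docker}"

attach_service() {
    service="${1:-console}"
    # "ps -q SERVICE" prints only the container ID for that service.
    container_id="$($COMPOSE ps -q "$service")"
    if [ -z "$container_id" ]; then
        echo "No running container for service '$service'" >/dev/stderr
        return 1
    fi
    $DOCKER attach "$container_id"
}

# Usage: attach_service console
```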

&lt;p&gt;Since the Postgres database now runs in a container, we needed to modify our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;psql&lt;/code&gt; command to reach it. Fortunately, because the Compose file publishes port 5432 to the host, connecting is easy:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;span class=&quot;c&quot;&gt;# https://stackoverflow.com/questions/27673563/how-to-get-into-psql-of-a-running-postgres-container&lt;/span&gt;
psql &lt;span class=&quot;nt&quot;&gt;--host&lt;/span&gt; localhost &lt;span class=&quot;nt&quot;&gt;--port&lt;/span&gt; 5432 &lt;span class=&quot;nt&quot;&gt;--username&lt;/span&gt; postgres&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
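
&lt;p&gt;If the host has no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;psql&lt;/code&gt; client installed, one alternative is to run the client that ships inside the Postgres container. This is a minimal sketch, assuming the service name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;postgresdb&lt;/code&gt; from the Compose file above; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;db_shell&lt;/code&gt; function and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COMPOSE&lt;/code&gt; variable are hypothetical names.&lt;/p&gt;

```shell
# Hypothetical helper: open psql inside the running postgresdb container
# instead of connecting from the host.
# COMPOSE is assumed overridable, e.g. COMPOSE="docker compose".
COMPOSE="${COMPOSE:-docker-compose}"

db_shell() {
    # "exec" runs a command in the service's already-running container;
    # any extra arguments are passed straight through to psql.
    $COMPOSE exec postgresdb psql --username postgres "$@"
}

# Usage: db_shell
#        db_shell -c "select version()"
```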

&lt;p&gt;Finally, here are a couple more handy Docker commands:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;&lt;span class=&quot;c&quot;&gt;# Clean up Docker&lt;/span&gt;
docker system prune
&lt;span class=&quot;c&quot;&gt;# Check storage usage (Docker is only allocated a fixed amount of disk space, set in its preferences)&lt;/span&gt;
docker system &lt;span class=&quot;nb&quot;&gt;df&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Run sh in an instance&lt;/span&gt;
docker &lt;span class=&quot;nb&quot;&gt;exec&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-it&lt;/span&gt; container_name sh&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-window-restore fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Headless Automation Testing&lt;/h3&gt;

&lt;p&gt;Automated testing is significantly more complicated with Docker. You can’t just drive a browser because the container has no screen. We needed to update our Rails RSpec and Capybara setup to use a headless environment for the integration tests. After testing different options, we settled on Selenium Grid with a Google Chrome container using Docker Compose, based on &lt;a href=&quot;https://www.plymouthsoftware.com/articles/rails-on-docker-system-specs-in-containers-with-rspec-capybara-chrome-and-selenium&quot;&gt;Plymouth Software’s excellent blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With this environment, RSpec can run through our 711 tests for the website and generate screenshots for any failures, covering the site and the embedded partner and support portal.&lt;/p&gt;

&lt;p&gt;Here is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-compose.test.yml&lt;/code&gt; file for reference:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-yaml&quot; data-lang=&quot;yaml&quot;&gt;&lt;span class=&quot;na&quot;&gt;services&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;web&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;environment&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;HUB_URL&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;http://chrome:4444/wd/hub&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# &amp;lt;-- Add the HUB_URL environment variable&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;depends_on&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;chrome&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# &amp;lt;-- Link to the chrome container&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;ports&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;3000:3000&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;.&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;sh -c &quot;rm -f tmp/pids/server.pid &amp;amp;&amp;amp; bundle exec rspec&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;stdin_open&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;true&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;tty&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;true&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;env_file&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Docker.env&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;bind&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;.&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;target&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;/retrospectdotcom&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;chrome&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;selenium/standalone-chrome:4.1.0-20211123&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# this version should match that of the selenium-webdriver gem (see Gemfile)&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;ports&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;4444:4444&quot;&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;7900:7900&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;/dev/shm:/dev/shm&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
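
&lt;p&gt;One way to run the suite with this file is sketched below. It is not necessarily the exact invocation we use. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_specs&lt;/code&gt; function and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;COMPOSE&lt;/code&gt; variable are hypothetical names; the flags are standard &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-compose up&lt;/code&gt; options.&lt;/p&gt;

```shell
# Hypothetical helper: build the images, run the specs, and stop.
# COMPOSE is assumed overridable, e.g. COMPOSE="docker compose".
COMPOSE="${COMPOSE:-docker-compose}"

run_specs() {
    # --build rebuilds the images before starting the services.
    # --exit-code-from stops every container once the named service exits
    # and returns that service's exit code, so a failing rspec run fails
    # the whole command.
    $COMPOSE -f docker-compose.test.yml up --build --exit-code-from "${1:-web}"
}

# Usage: run_specs
```

&lt;p&gt;Returning the exit code of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;web&lt;/code&gt; service is useful when the same command runs in a CI job, since the Chrome container would otherwise keep the setup running after the specs finish.&lt;/p&gt;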

&lt;p&gt;Here is a snippet from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rails_helper.rb&lt;/code&gt; that works for us:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;&lt;span class=&quot;nb&quot;&gt;require&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;byebug&apos;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Support devise&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;require&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;devise&apos;&lt;/span&gt;
&lt;span class=&quot;no&quot;&gt;RSpec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;configure&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;include&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;Devise&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Test&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;ControllerHelpers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:type&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:controller&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;include&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;FactoryBot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Syntax&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Methods&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Add Capybara settings and remote web driver. See docker-compose.test.yml for more details.&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# NOTE: screenshots for failing tests appear in #{Rails.root}/tmp/screenshots.&lt;/span&gt;
&lt;span class=&quot;no&quot;&gt;Capybara&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;server&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:webrick&lt;/span&gt;
&lt;span class=&quot;no&quot;&gt;Capybara&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;register_driver&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:chrome_headless&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;chrome_capabilities&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Selenium&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;WebDriver&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Remote&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Capabilities&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;chrome&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;goog:chromeOptions&apos;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;args&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sx&quot;&gt;%w[no-sandbox headless disable-gpu window-size=1400,1400]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;ENV&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;HUB_URL&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;no&quot;&gt;Capybara&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Selenium&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Driver&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                   &lt;span class=&quot;ss&quot;&gt;browser: :remote&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                   &lt;span class=&quot;ss&quot;&gt;url: &lt;/span&gt;&lt;span class=&quot;no&quot;&gt;ENV&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;HUB_URL&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                                   &lt;span class=&quot;ss&quot;&gt;desired_capabilities: &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;chrome_capabilities&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;
    &lt;span class=&quot;no&quot;&gt;Capybara&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Selenium&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Driver&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;new&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                   &lt;span class=&quot;ss&quot;&gt;browser: :chrome&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                   &lt;span class=&quot;ss&quot;&gt;desired_capabilities: &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;chrome_capabilities&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;no&quot;&gt;RSpec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;configure&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;before&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;:each&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;type: :system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;driven_by&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:chrome_headless&lt;/span&gt;

    &lt;span class=&quot;no&quot;&gt;Capybara&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;app_host&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;http://&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;#{&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;IPSocket&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;getaddress&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Socket&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;gethostname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;:3000&quot;&lt;/span&gt;
    &lt;span class=&quot;no&quot;&gt;Capybara&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;server_host&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;no&quot;&gt;IPSocket&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;getaddress&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Socket&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;gethostname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;no&quot;&gt;Capybara&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;server_port&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3000&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Getting the setup exactly right was tricky, but this is the code that works for us. We frequently referred to the &lt;a href=&quot;https://relishapp.com/rspec/&quot;&gt;RSpec&lt;/a&gt; documentation as well.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-laptop-code fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Dev Setup&lt;/h3&gt;

&lt;p&gt;The proof that the toolchain upgrade was a success came when we rolled out Docker to the wider web development team. Previously, setting up Ruby, Rails, and all of the gems was a multi-hour process with frequent web searches to fix environment issues. With Docker, the developers downloaded Docker, built the image with one Terminal command, and launched the multi-container application with a second command. It took five minutes.&lt;/p&gt;
</description>
        <pubDate>Sat, 19 Feb 2022 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2022/02/19/containerizing-our-web-development-toolchain-with-docker/</link>
        <guid isPermaLink="true">https://bdunagan.com/2022/02/19/containerizing-our-web-development-toolchain-with-docker/</guid>
      </item>
    
      <item>
        <title>Building a Technical Documentation DocOps/DevOps Pipeline</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/documentation.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/2JIvboGLeho&quot;&gt;Unsplash&lt;/a&gt; - @syinq&lt;/div&gt;

&lt;p&gt;Retrospect Backup is designed to be easy to use for non-technical customers, but it has a deep set of features for the technical subset. To understand its capabilities, customers need well-written, up-to-date documentation, in the form of user guides and knowledgebase articles in their language, with targeted content accessible as a URL. Retrospect has been around for 33 years and supports six languages, so achieving that goal is a significant ongoing DocOps/DevOps project.&lt;/p&gt;

&lt;p&gt;In the past, we used Adobe FrameMaker for our user guides and a Salesforce Force.com Site for our knowledgebase. Using FrameMaker meant only someone with FrameMaker knowledge could update the user guides, typically an outside contractor. It was difficult to see what changed when updating the content, so the team had to rely on spot checks. Using Force.com meant Support could manage the knowledgebase content, but it wasn’t easy and had no search capabilities at the time. Moreover, supporting multiple languages multiplied the work for both.&lt;/p&gt;

&lt;p&gt;All these barriers led to fewer documentation updates, which detracted from our customer experience and increased Support tickets from confused customers.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-file-import fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Documentation Pipeline&lt;/h3&gt;

&lt;p&gt;To simplify the process of keeping documentation current and adding new information, we updated our documentation DocOps/DevOps pipeline with a set of goals and the following tools:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Markup&lt;/strong&gt;: Documents need to be readable plain text with markup =&amp;gt; &lt;em&gt;AsciiDoc and Markdown&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Source Control and WYSIWYG&lt;/strong&gt;: Documents need to be versioned, diffable, and editable by non-technical staff =&amp;gt; &lt;em&gt;Git and GitHub&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Languages&lt;/strong&gt;: Documents need to be easily localized =&amp;gt; &lt;em&gt;Gengo&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Deployment, Search, and Links&lt;/strong&gt;: Document generation needs to be automated and deployable with search and deep linking  =&amp;gt; &lt;em&gt;Website, Prince, and Swiftype&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/documentation_pipeline.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Documentation Pipeline&lt;/div&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-file-alt fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Markup: AsciiDoc and Markdown&lt;/h3&gt;

&lt;p&gt;Markup languages like &lt;a href=&quot;https://asciidoc.org/&quot;&gt;AsciiDoc&lt;/a&gt; and &lt;a href=&quot;https://daringfireball.net/projects/markdown/&quot;&gt;Markdown&lt;/a&gt; are excellent at embedding sophisticated styles and layouts in plain text. AsciiDoc in particular was designed for generating books from markup, so it works well for both HTML generation and PDF generation. We use AsciiDoc for most parts of our documentation now, both user guides and KB articles, and mix in Markdown documents when it’s easier.&lt;/p&gt;

&lt;p&gt;Converting Adobe FrameMaker to AsciiDoc was painful. The Retrospect Backup for Windows User’s Guide was 500 pages long, and the Retrospect Backup for Mac User’s Guide was 300 pages long. Each was localized into six languages: English, German, Spanish, French, Italian, and Japanese. First, we exported from FrameMaker to HTML, but the HTML was very messy. Next, we used many passes of regular expressions in Ruby to clean it up, and finally, we used &lt;a href=&quot;https://pandoc.org/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pandoc&lt;/code&gt;&lt;/a&gt; to convert from HTML to AsciiDoc.&lt;/p&gt;

&lt;p&gt;Compared to the Adobe FrameMaker conversion, we had a far easier time ingesting the SFDC Article pages that populated our Force.com knowledgebase. We used a Ruby gem to read the objects in from the API and write them out to files, with bits of Ruby to convert them into Markdown. A small subset of KB articles had already been localized, and those were saved as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kb_name.language.adoc&lt;/code&gt; (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;error_1101.ja.adoc&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;With all of our documentation now in a plain text format, we could easily add, edit, delete, and localize it.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-globe fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Language: Gengo&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://gengo.com/&quot;&gt;Gengo&lt;/a&gt; translates content within a few hours at a low price point. We use them for all of our localization needs. Compared to our previous service, it has saved us thousands of dollars and months of waiting. Getting a string translated for a couple of dollars in an hour by a professional sounded absurd to us before we discovered Gengo.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-link fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Source Control and WYSIWYG: Git and GitHub&lt;/h3&gt;

&lt;p&gt;We switched to Git and GitHub almost a decade ago: &lt;a href=&quot;https://bdunagan.com/2014/04/02/migrating-retrospect-from-svn-to-github-enterprise/&quot;&gt;Migrating Retrospect from SVN to GitHub Enterprise&lt;/a&gt;. By adding our documentation to Git, we could easily version the content and diff any changes. We now have 402 KB articles in Git, and our user guides total 104k lines. When we recently added a new chapter in the user’s guide, it only took 10 minutes and became a GitHub pull request to be diff’d and reviewed. When Support writes up a new KB article, Engineering can add it to the production website in about 10 minutes.&lt;/p&gt;

&lt;p&gt;Moreover, GitHub provides an excellent WYSIWYG editor for AsciiDoc and Markdown files, with rich styles and preview available.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/documentation_kb_edit.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;GitHub - WYSIWYG Editor&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/documentation_kb_preview.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;GitHub - WYSIWYG Preview&lt;/div&gt;

&lt;p&gt;Using Git and GitHub allows us to integrate our documentation into our existing build and deployment pipeline while providing editor abilities to non-technical staff.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-file-import fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Deployment, Search, Links: Website, Prince, and Swiftype&lt;/h3&gt;

&lt;p&gt;The documentation is now in plain text and versioned, but we still need to deploy it with search and links. Most companies use WordPress for their websites, but we use Ruby-on-Rails. The tradeoff is extensibility. Ruby-on-Rails is a less common choice, so fewer web developers have experience with it compared to WordPress. However, Rails lets us quickly build new functionality: &lt;a href=&quot;https://bdunagan.com/2021/07/25/building-deep-scalable-customer-relationship-management-tools/&quot;&gt;Building Deep, Scalable Customer Relationship Management Tools&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For the knowledgebase, we built a new page with categories and links to all of the articles. To populate it, we wrote a simple Rails initializer to read in all of the articles in the directory and store their frontmatter in memory to be used later by the KB index page.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/documentation_site_kb.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Website - Knowledgebase&lt;/div&gt;
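
&lt;p&gt;As a rough illustration of that initializer, here is a minimal sketch in plain Ruby. It assumes each article begins with a YAML frontmatter block delimited by lines of three dashes, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;title&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;category&lt;/code&gt; keys. The helper names, keys, and directory path are hypothetical and are not the actual Retrospect code.&lt;/p&gt;

```ruby
require "yaml"

# Hypothetical helper: parse the YAML frontmatter at the top of an article.
# Assumes the block is delimited by lines containing only three dashes.
def read_frontmatter(text)
  match = text.match(/\A---\s*\n(.*?)\n---\s*\n/m)
  return {} if match.nil?

  YAML.safe_load(match[1]) || {}
end

# Build an in-memory index of articles, grouped by category.
def build_kb_index(directory)
  entries = Dir.glob(File.join(directory, "*.{adoc,md}")).sort.map do |path|
    meta = read_frontmatter(File.read(path))
    { slug: File.basename(path, ".*"), title: meta["title"], category: meta["category"] }
  end
  entries.group_by { |entry| entry[:category] }
end

# In a Rails app this could live in config/initializers/ (path is an assumption):
# KB_INDEX = build_kb_index(Rails.root.join("app", "views", "kb").to_s)
```

&lt;p&gt;Reading the files once at boot keeps the index page from touching the disk on every request, at the cost of needing a restart or deploy to pick up new articles.&lt;/p&gt;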

&lt;p&gt;The user’s guides took more effort. We could easily build a section of the website for Documentation. The trick was handling the chapters well. We wanted to enable Support to link directly to specific sections of chapters without loading the entirety of the user’s guide in a single page. We manually build the chapters in the guide to allow custom sorting, but we dynamically build the subchapters by reading and parsing the AsciiDoc files within Rails. With both, we can display any chapter along with a sidebar of the overall structure of the guide for context.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/documentation_site_ug.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Website - User&apos;s Guide&lt;/div&gt;
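
&lt;p&gt;To make the subchapter parsing concrete, here is a simplified sketch that pulls second-level section titles out of AsciiDoc text and derives a link anchor for each. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Subchapter&lt;/code&gt; struct and the anchor rule are assumptions for illustration. A line-based scan like this would also match a heading that appears inside a listing block, so a real implementation would more likely walk the document parsed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;asciidoctor&lt;/code&gt;.&lt;/p&gt;

```ruby
# Hypothetical value object for one sidebar entry.
Subchapter = Struct.new(:title, :anchor)

# Collect "== Title" sections from AsciiDoc text, skipping the "= Chapter"
# title and deeper "=== ..." levels.
def subchapters(asciidoc_text)
  asciidoc_text.each_line.filter_map do |line|
    match = line.match(/\A==\s+(.+?)\s*\z/)
    next if match.nil?

    title = match[1]
    # Simplified anchor: lowercase, runs of non-alphanumerics become one hyphen.
    anchor = title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
    Subchapter.new(title, anchor)
  end
end
```

&lt;p&gt;With a list like this, the sidebar can link to a chapter URL plus the anchor, so Support can point a customer at one section.&lt;/p&gt;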

&lt;p&gt;It took some time to work out the correct AsciiDoc syntax to integrate well with Rails via &lt;a href=&quot;https://asciidoctor.org/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;asciidoctor&lt;/code&gt;&lt;/a&gt;. For instance, every chapter needed the following frontmatter:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;----
:showtitle:
:idprefix:
:idseparator: -

= Chapter Title [[chapter-title]]
----&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Another hurdle was incorporating ERB code into AsciiDoc, so that we could use our Rails helper functions, like short references to our S3 assets:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;&lt;span class=&quot;o&quot;&gt;+++&amp;lt;&lt;/span&gt;&lt;span class=&quot;sx&quot;&gt;%= image_tag(s3_image(&quot;docs/example.png&quot;), :class =&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;whitepaper_image&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&amp;gt;+++&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The user’s guides also needed to be converted into PDFs for offline usage. We use &lt;a href=&quot;https://www.princexml.com/&quot;&gt;Prince&lt;/a&gt; to convert them from a single HTML page into 300+ page PDFs, wrapped into a Rails &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rake&lt;/code&gt; task.&lt;/p&gt;
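
&lt;p&gt;A minimal sketch of what such a task might wrap is below. It relies only on the basic &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;prince input.html -o output.pdf&lt;/code&gt; invocation; the method names, task name, and file paths are hypothetical.&lt;/p&gt;

```ruby
# Hypothetical helper: the argument list for converting one HTML file to PDF.
def prince_command(html_path, pdf_path)
  ["prince", html_path, "-o", pdf_path]
end

# Run Prince and fail loudly if it exits with an error.
# Passing an argument list to system avoids shell interpolation of the paths.
def build_pdf(html_path, pdf_path)
  system(*prince_command(html_path, pdf_path)) or raise "prince failed for #{html_path}"
end

# Inside a Rakefile, the task could look like this (names are assumptions):
# namespace :docs do
#   task pdf: :environment do
#     build_pdf("tmp/user_guide.html", "public/user_guide.pdf")
#   end
# end
```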

&lt;p&gt;Because the knowledgebase and user’s guides are on the website, all of their content is indexed by Google and others. For search, we use Swiftype to index the content, rank the results when necessary, and integrate a simple search field at the top of the page.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-shapes fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Alternatives&lt;/h3&gt;

&lt;p&gt;We are an engineering-focused company, but many companies optimize their documentation pipeline for other goals.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Full Service&lt;/strong&gt;: Services like Zendesk provide a complete customer service experience, from ticketing to knowledgebase.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Domain-Specific Documentation&lt;/strong&gt;: There are other projects to leverage, like &lt;a href=&quot;https://github.com/slatedocs/slate&quot;&gt;Slate&lt;/a&gt;, to create documentation for a specific purpose, like API documentation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are clear downsides to our approach. Retrospect Engineering is the bottleneck for feature development compared to using a third-party service. I touched on these sorts of trade-offs in &lt;a href=&quot;https://bdunagan.com/2021/07/25/building-deep-scalable-customer-relationship-management-tools/&quot;&gt;Building Deep, Scalable Customer Relationship Management Tools&lt;/a&gt;. The issue comes down to what is best for a company at a given time.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-building fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Internal Documentation&lt;/h3&gt;

&lt;p&gt;Along with customer-facing documentation, we migrated our internal documentation from MediaWiki to AsciiDoc hosted on GitHub Wikis: 542 pages. GitHub Wikis are version controlled, in plain text, with universal search, and with access via a local repo. The wiki lives in its own repo but under the Wiki feature, so that we have access to the better WYSIWYG editor with sidebar support and the universal search separates the wiki entries correctly.&lt;/p&gt;

&lt;p&gt;For the migration, we used Git’s wiki extension to clone MediaWiki to a Git repo then used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pandoc&lt;/code&gt; to convert the pages from MediaWiki format to AsciiDoc. Most of the pages are old and unused, another reason why we wanted the content in GitHub. It’s easy to see the entirety of the wiki as local text files rather than a long HTML list. The biggest downside was migrating the 300 embedded images and files from MediaWiki. We opted to migrate selectively when we found one missing in a relevant wiki page in GitHub. In the end, we only migrated a couple dozen files.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-sort-amount-down-alt fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Lowering Barriers&lt;/h3&gt;

&lt;p&gt;Retrospect Engineering has consistently found that better tools and workflows lead to significantly improved experiences. We found that with Git and GitHub, and we found that with these documentation pipeline upgrades. The improved DocOps/DevOps pipeline reduced our documentation schedule from weeks to days.&lt;/p&gt;
</description>
        <pubDate>Sat, 05 Feb 2022 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2022/02/05/building-a-technical-documentation-pipeline/</link>
        <guid isPermaLink="true">https://bdunagan.com/2022/02/05/building-a-technical-documentation-pipeline/</guid>
      </item>
    
      <item>
        <title>Leveraging Forever-Incremental Backup Technology for Customer-Centric Data Protection</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/backup_hard_drive.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/GNyjCePVRs8&quot;&gt;Unsplash&lt;/a&gt; - @benjaminlehman&lt;/div&gt;

&lt;p&gt;Let’s say you have a computing environment, and you need to protect its contents over time. You need a backup: a versioned snapshot of the systems for point-in-time restores. Any backup solution will be optimized for certain criteria–simplicity, completeness, flexibility, etc–but each one needs to perform backups. Let’s walk through the three different types:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Full Backup&lt;/strong&gt;: This is a complete point-in-time snapshot of a volume. It takes up the most space.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Differential Backup&lt;/strong&gt;: This is a backup that only depends on one full backup. If you have two differential backups based on one full backup, each differential backup contains every change since that full backup and thus takes up more space than an incremental backup.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Incremental Backup&lt;/strong&gt;: This is a backup that depends on a full backup and a number of incremental backups. This is the smallest backup format you can have because it only takes up as much space as necessary to preserve point-in-time restores for each recovery point.&lt;/li&gt;
&lt;/ul&gt;
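
&lt;p&gt;The dependency difference between the three types can be sketched in a few lines of Ruby. This is a conceptual model, not Retrospect’s implementation: backups are hypothetical hashes listed oldest to newest, and a chain is assumed to use either differential or incremental backups after a full backup, not both.&lt;/p&gt;

```ruby
# Return the backups needed to restore the recovery point at target_index.
# Each backup is a hash such as { id: "F1", type: :full }.
def restore_chain(backups, target_index)
  target = backups.fetch(target_index)
  history = backups[0..target_index]
  last_full = history.rindex { |backup| backup[:type] == :full }
  raise ArgumentError, "no full backup before target" if last_full.nil?

  case target[:type]
  when :full
    [target]
  when :differential
    # A differential needs only the full backup it is based on.
    [history[last_full], target]
  when :incremental
    # An incremental needs the full backup and every incremental since.
    history[last_full..].reject { |backup| backup[:type] == :differential }
  end
end
```

&lt;p&gt;Restoring the third recovery point therefore reads two backups in a differential chain but three in an incremental chain. The trade is that each incremental stores only what changed since the previous backup.&lt;/p&gt;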

&lt;p&gt;Retrospect Backup’s core engine uses forever-incremental file-based backup technology to protect individual files and folders as well as full systems. Forever-incremental backup allows you to only back up file and system changes since the previous backup, minimizing backup time and storage costs.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_incremental_backup.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Forever-Incremental Backup&lt;/div&gt;

&lt;p&gt;When you restore, Retrospect will combine these incremental backups into a synthetic full backup, giving you a point-in-time restore for a set of files or a full system.&lt;/p&gt;
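
&lt;p&gt;Conceptually, building a synthetic full backup is a replay of changes in order. The sketch below assumes a simplified model where each backup is a hash from file path to the version it captured, and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nil&lt;/code&gt; value marks a file deleted since the previous backup. Retrospect’s actual catalog format is not described here, so treat the data shape as hypothetical.&lt;/p&gt;

```ruby
# Merge a full backup followed by incrementals (oldest first) into one snapshot.
def synthetic_full(backups)
  backups.each_with_object({}) do |changes, snapshot|
    changes.each do |path, version|
      if version.nil?
        snapshot.delete(path)     # file was removed before this recovery point
      else
        snapshot[path] = version  # newest version of the file wins
      end
    end
  end
end
```

&lt;p&gt;Passing only the backups up to a chosen recovery point yields the file set as it existed at that time.&lt;/p&gt;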

&lt;p&gt;This customer-centric approach allows businesses to leverage the speed of file-based recovery, restoring a file to the Desktop or to the original location. IT administrators can even restore an entire folder structure based on settings like “Do not replace newer files”, allowing IT to recover a large folder structure in place on the affected server or endpoint without determining whether files have been touched more recently than the backup.&lt;/p&gt;
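
&lt;p&gt;The effect of a setting like “Do not replace newer files” can be sketched as a filter over the files in the backup. The method name and the path-to-modification-time hashes are assumptions for illustration, not Retrospect’s internals.&lt;/p&gt;

```ruby
# Decide which backed-up paths to write during an in-place restore.
# Both arguments map a path to a modification time (any comparable value).
def files_to_restore(backup_files, current_files)
  backup_files.select do |path, backed_up_at|
    on_disk_at = current_files[path]
    # Restore when the file is missing, or when the copy on disk
    # is not newer than the backed-up copy.
    on_disk_at.nil? || [on_disk_at, backed_up_at].max == backed_up_at
  end.keys
end
```

&lt;p&gt;Files edited after the backup are left alone, which is why IT does not have to check each file before restoring a large folder.&lt;/p&gt;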

&lt;p&gt;Retrospect Backup’s goal is to protect your entire environment the way you need it, giving you as much flexibility as possible.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-shapes fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Workflow Components&lt;/h3&gt;

&lt;p&gt;Retrospect Backup leverages a number of components to optimize its forever-incremental file-based backup workflow:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;System State&lt;/strong&gt;: Retrospect includes system state, application state, disk layout, and boot information to support disaster recovery scenarios. To capture an accurate backup of a Windows system, Retrospect utilizes Windows Volume Shadow Copy Service (VSS) to snapshot the system, so that with the other information, it can perform a full disaster recovery (DR), also known as bare-metal recovery (BMR).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Filtering&lt;/strong&gt;: Businesses need to be able to include or exclude files based on how critical they are to the business. Music and movies can be excluded while preserving the ability to perform a bare-metal recovery.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Destinations&lt;/strong&gt;: Retrospect supports a wide range of destinations–disk, NAS, tape, and cloud–and backups can be transferred between different storage locations.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Scheduling&lt;/strong&gt;: Organizations have their own unique business flow, and Retrospect allows them to schedule their data protection strategies based on their business needs.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Scripts&lt;/strong&gt;: Retrospect supports different types of actions on data, including backup, replication, archive, and transfers with support for block-level incremental backup (BLIB) and file-level deduplication for backup scenarios. Transfers allow customers to make a copy of a backup in another storage destination, with support for different destination types, filtering, and scheduling. Customers can make one backup of a source and then move it multiple places without touching the source again. Customers can even enable automatic backup healing by using a “Verify” script to auto-heal their backups. If any issues are found, Retrospect will back up the affected files in the next backup. It’s a flexible approach that you can apply to each backup set on your own schedule to detect and fix integrity issues automatically.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;ProactiveAI&lt;/strong&gt;: Retrospect’s unique ProactiveAI policy-based scheduling enables businesses to intelligently protect their infrastructure based on a policy window. Endpoints that go on and off the network will be protected as soon as they join, and with remote backup, remote workers are protected even outside of the office. See &lt;a href=&quot;https://bdunagan.com/2021/06/19/optimizing-retrospect-s-algorithm-for-resource-scheduling/&quot;&gt;Optimizing Retrospect’s Algorithm for Resource Scheduling&lt;/a&gt; for a technical deep dive.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_backup_stack.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect&apos;s Backup Stack&lt;/div&gt;

&lt;p&gt;The combination and deep integration of all of these components form Retrospect’s backup stack and provide businesses with a flexible backup solution that extends to all major operating systems and includes application-level backup for Microsoft Exchange and Microsoft SQL Server. Customers can deploy Retrospect wherever it makes the most sense: on a VM, in the cloud, or on an old desktop.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-images fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Image-Based Backup Comparison&lt;/h3&gt;

&lt;p&gt;An alternative backup method is image-based backup. Image-based backup uses the blocks of data on a volume to read and protect data at a system level.&lt;/p&gt;

&lt;p&gt;Many enterprise data protection solutions use this approach. They focus on the entire volume, copying all used blocks on a volume for a full backup and then subsequent changed blocks for incremental backups. This approach allows customers to restore entire systems, but it loses the file-level granularity of file-based backup.&lt;/p&gt;

&lt;p&gt;To allow better flexibility, a number of solutions also support file-level restore. This granular recovery lets businesses quickly choose a point-in-time recovery for a system, launch and attach to it, and then retrieve the individual file that they needed. This workflow enables IT administrators to quickly retrieve a lost file.&lt;/p&gt;

&lt;p&gt;However, I don’t know of any solutions that integrate file-level restore into the original system along with the ability to leave files that are newer in that folder structure, as Retrospect can.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-user-shield fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Customer-Centric Approach&lt;/h3&gt;

&lt;p&gt;Businesses use Retrospect’s customer-centric approach to data protection to quickly restore individual files, large folders, or entire systems with incremental-forever file-based backup, system state snapshotting, filtering, scheduling, and deep integration with destinations.&lt;/p&gt;
</description>
        <pubDate>Sun, 02 Jan 2022 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2022/01/02/leveraging-forever-incremental-backup-for-customer-centric-data-protection/</link>
        <guid isPermaLink="true">https://bdunagan.com/2022/01/02/leveraging-forever-incremental-backup-for-customer-centric-data-protection/</guid>
      </item>
    
      <item>
        <title>Scaling Data Ingestion for a Web Application to 250 Million Requests</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/wave.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/czkOVu3YwIQ&quot;&gt;Unsplash&lt;/a&gt; - @luckybeanz&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;https://console.retrospect.com&quot;&gt;Retrospect Management Console&lt;/a&gt; is Retrospect’s hosted service where businesses can connect their instances of Retrospect Backup to aggregate, analyze, and manage their backup environments. See &lt;a href=&quot;https://bdunagan.com/2021/10/30/mvp-and-iteration-how-we-built-a-web-application-for-backup-analytics/&quot;&gt;MVP and Iteration: How We Built a Web Application for Backup Analytics&lt;/a&gt; for more details about why we built it.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_mc_current.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect Management Console&lt;/div&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-cubes fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Stack&lt;/h3&gt;

&lt;p&gt;Here is our current stack for Retrospect Management Console:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Web framework: Ruby-on-Rails&lt;/li&gt;
  &lt;li&gt;Background processing: Sidekiq&lt;/li&gt;
  &lt;li&gt;Auto-Scaling: Rails Autoscale&lt;/li&gt;
  &lt;li&gt;Authentication: Devise Ruby gem&lt;/li&gt;
  &lt;li&gt;Database: Postgres&lt;/li&gt;
  &lt;li&gt;Deployment: Salesforce Heroku&lt;/li&gt;
  &lt;li&gt;Source control: GitHub Enterprise&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-sitemap fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Data Ingestion&lt;/h3&gt;

&lt;p&gt;Data requests are like ocean waves hitting a service. Ingestion is trivial for small waves, but the larger and more frequent they get, the harder it is to finish ingesting before the next wave. The service might recover for a bit, but then it gets knocked over again with an even larger wave. And of course, you can’t see the waves, so you have no warning.&lt;/p&gt;

&lt;p&gt;As Retrospect Management Console has grown in usage, we hit two constraints: request number and request size. There were more requests, and more requests had a lot of data, up to 11MB.&lt;/p&gt;

&lt;p&gt;In the beginning, we didn’t know what the problem was beyond Heroku telling us that the service was down. The memory consumption was far too much, and increasing the running instances (dynos) didn’t resolve the problem.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_management_console_heroku.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Heroku Metrics&lt;/div&gt;

&lt;p&gt;We initially focused on the memory consumption and lowered the maximum allowed amount. That ceiling temporarily resolved the 500 errors, but the service still went down with a big wave and didn’t recover without a couple manual restarts.&lt;/p&gt;

&lt;p&gt;After we investigated the problem, it was clear that our background processing framework, &lt;a href=&quot;https://github.com/brandonhilkert/sucker_punch&quot;&gt;SuckerPunch&lt;/a&gt;, was the bottleneck. It ran on background Ruby threads in the same instance and was built for sending the occasional email in the background, not processing millions of large data requests every day. We had chosen it initially to ship the MVP, but it was time to move on.&lt;/p&gt;

&lt;p&gt;We considered switching to Amazon SQS or a similar cloud service, but we settled on &lt;a href=&quot;https://sidekiq.org/&quot;&gt;Sidekiq&lt;/a&gt;. Sidekiq was created for scalable background processing in Ruby. It runs separate background instances and hands off jobs to them. In fact, SuckerPunch was written as a lighter version of Sidekiq, so Sidekiq fit well into our existing workflow.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_management_console_sidekiq.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Sidekiq Dashboard&lt;/div&gt;

&lt;p&gt;We switched to Sidekiq in April 2021. The transition was five lines of code, and it immediately resolved our scaling issues in both directions. Since then, the background instances have processed 250 million requests at 1.1 million requests per day, including 20MB requests. 20MB is a lot of JSON backup data. The service has been able to handle 300 requests per second.&lt;/p&gt;
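
&lt;p&gt;The figures above also show why the waves matter more than the average. A quick back-of-the-envelope calculation in Ruby, using only the numbers quoted in this post (the method names are made up for the example):&lt;/p&gt;

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Average request rate implied by a daily total.
def average_per_second(requests_per_day)
  requests_per_day.fdiv(SECONDS_PER_DAY)
end

# How many times larger a burst is than the average rate.
def burst_factor(peak_per_second, requests_per_day)
  peak_per_second / average_per_second(requests_per_day)
end

average = average_per_second(1_100_000)
factor = burst_factor(300, 1_100_000)
puts format("average: %.1f requests/second, burst: %.1fx average", average, factor)
```

&lt;p&gt;At 1.1 million requests per day, the average is only about 13 requests per second, so a burst of 300 requests per second is more than 20 times the typical load.&lt;/p&gt;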

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-layer-group fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Auto-Scaling&lt;/h3&gt;

&lt;p&gt;In addition to Sidekiq, we needed to automatically scale our web instances and worker instances up and down based on the amount of traffic coming in. If we underprovisioned, we would not be able to process the amount of data coming in. If we overprovisioned, we would be paying too much for the amount of data we needed to process.&lt;/p&gt;

&lt;p&gt;Heroku offers an autoscaling service for web instances, but to cover both types, we chose &lt;a href=&quot;https://railsautoscale.com/&quot;&gt;Rails Autoscale&lt;/a&gt;. Rails Autoscale automatically scales instances based on queue time for requests, and it has a number of options for scaling up and scaling down along with limiting the minimum and maximum number of instances.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_management_console_rails_autoscale_settings.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Rails Autoscale Settings&lt;/div&gt;
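
&lt;p&gt;Queue-time-based scaling boils down to a small decision rule. The sketch below is a generic illustration rather than Rails Autoscale’s actual algorithm, and the thresholds and instance limits are invented defaults.&lt;/p&gt;

```ruby
# Decide how many instances to run given the current count and queue time.
# All thresholds are hypothetical; scale_down_at must be below scale_up_at.
def desired_instances(current, queue_time_ms,
                      scale_up_at: 100, scale_down_at: 25,
                      min_instances: 1, max_instances: 10)
  # Clamping the queue time to the threshold band tells us which edge it hit.
  target =
    case queue_time_ms.clamp(scale_down_at, scale_up_at)
    when scale_up_at then current + 1    # requests are waiting too long
    when scale_down_at then current - 1  # capacity is sitting idle
    else current                         # inside the band, hold steady
    end
  target.clamp(min_instances, max_instances)
end
```

&lt;p&gt;The gap between the two thresholds keeps the instance count from flapping up and down, and the minimum and maximum bound both the worst-case latency and the worst-case bill.&lt;/p&gt;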

&lt;p&gt;It also has a nice usage page to show how much money it’s saving you by not overprovisioning.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-users fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;250 Million Requests&lt;/h3&gt;

&lt;p&gt;We’ve been thrilled by how Retrospect Management Console has grown from a product idea to solve customer problems into a shipping product that has handled 250 million requests since the summer. By monitoring the service and investigating each bottleneck in turn, we have been able to scale the service up to meet customer needs, using excellent solutions like Sidekiq and Rails Autoscale.&lt;/p&gt;
</description>
        <pubDate>Thu, 30 Dec 2021 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2021/12/30/scaling-data-ingestion-for-a-web-application-to-250-million-requests/</link>
        <guid isPermaLink="true">https://bdunagan.com/2021/12/30/scaling-data-ingestion-for-a-web-application-to-250-million-requests/</guid>
      </item>
    
      <item>
        <title>Retrospect, Inc. Turns 10</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_10_years.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/Hli3R6LKibo&quot;&gt;Unsplash&lt;/a&gt; - @adigold1&lt;/div&gt;

&lt;p&gt;Retrospect, Inc. incorporated ten years ago today: November 1, 2011. Retrospect Backup was originally owned by Dantz Development until Dantz was acquired by EMC in 2004. In 2010, Sonic Solutions acquired Retrospect Backup from EMC and then itself was acquired by Rovi. The leaders of Retrospect approached the management team at Rovi about the idea of spinning off the product, and Rovi agreed to a deal.&lt;/p&gt;

&lt;p&gt;Thus, Retrospect, Inc. was born, focused solely on data protection. See our &lt;a href=&quot;https://en.wikipedia.org/wiki/Retrospect_(software)&quot;&gt;Wikipedia page&lt;/a&gt; for more details. Eight years later, we were acquired by StorCentric. We’re part of a larger organization now, with ways to grow the business and protect more data.&lt;/p&gt;

&lt;p&gt;While Retrospect, Inc. has only been around for a decade, Retrospect Backup has been protecting businesses for 32 years now. Retrospect Backup doesn’t collect analytics, but a couple years ago, I did a back-of-the-envelope calculation for how much data we protect: 100PB. In fact, we now have individual customers who have over 1PB protected by Retrospect Backup. For a long list of features we’ve shipped since 2011, see &lt;a href=&quot;https://www.retrospect.com/en/30_history&quot;&gt;“Retrospect Celebrates 30 Years of Data Protection”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I’m extremely grateful to be a part of such a fantastic team and to work on a product that has helped so many recover from data loss. Thanks to everyone who is or has been a part of Retrospect!&lt;/p&gt;
</description>
        <pubDate>Mon, 01 Nov 2021 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2021/11/01/retrospect-inc-is-10-years-old/</link>
        <guid isPermaLink="true">https://bdunagan.com/2021/11/01/retrospect-inc-is-10-years-old/</guid>
      </item>
    
      <item>
        <title>MVP and Iteration: How We Built a Web Application for Backup Analytics</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_management_console_mvp.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/CcADlKdo94o&quot;&gt;Unsplash&lt;/a&gt; - @halacious&lt;/div&gt;

&lt;p&gt;Retrospect is a small company, so when we build new products or features, our goal is to find the intersection between what customers need and what our team can build, deliver, sell, and support.&lt;/p&gt;

&lt;p&gt;In 2017, feedback from customers and partners frequently touched on how difficult it was to monitor and manage multiple instances of Retrospect Backup. Retrospect Backup is software that is installed on Windows and Macs, either servers or desktops, so administrators needed to log into each computer and look at the status in the application or configure email reporting for each instance. The process was painful and manual.&lt;/p&gt;

&lt;p&gt;The Retrospect team talked about different solutions to this, including third-party monitoring service integration, better email reporting, or a web UI for each instance. Eventually, we agreed that the best long-term solution would be our own hosted service.&lt;/p&gt;

&lt;p&gt;With a hosted service, customers and partners could log into a single web application and see an aggregated view of their entire backup infrastructure or, in the case of partners, all of their clients’ backup infrastructures through a single pane of glass. That one-sentence product pitch was the goal for &lt;a href=&quot;https://console.retrospect.com&quot;&gt;Retrospect Management Console&lt;/a&gt;, focused initially on monitoring and analytics and eventually on management.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-cubes fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;MVP: Minimum Viable Product&lt;/h3&gt;

&lt;p&gt;Building a minimum viable product or MVP represents a compromise between Sales, Engineering, and Product Management. Sales is looking at competitors and wanting to ship the equivalent solution as soon as possible. Engineering is trying to architect and deploy a well-designed product. Product Management is representing the customer and how this product will address their pain points. Finding a balance between these competing views comes down to a well-defined product roadmap that satisfies all three perspectives and begins with the MVP.&lt;/p&gt;

&lt;p&gt;The value of the MVP is shipping and feedback. Shipping an MVP means the Engineering team has built a foundation for the product with a small set of features along with a packaging and deployment process. Feedback means you have customers, both internal and external, who are interested enough to give you feedback that helps you iterate toward a better product.&lt;/p&gt;

&lt;p&gt;In our case, we wanted to ship the MVP for a hosted service that a customer could sign into and see an aggregated view from multiple Retrospect Backup instances. That translated into the following requirements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Hosted Service Platform&lt;/strong&gt;: Heroku&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Database&lt;/strong&gt;: Postgres&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Application Framework&lt;/strong&gt;: Ruby-on-Rails&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Authentication&lt;/strong&gt;: Devise&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Security&lt;/strong&gt;: SSL encryption&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Data Processing&lt;/strong&gt;: SuckerPunch&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Integration&lt;/strong&gt;: Retrospect Backup connection and JSON data&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Dashboard&lt;/strong&gt;: Aggregated view of multiple instances&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The actual customer use case is the final point because the others represent the foundation of the product. We couldn’t get data into the service without them. The product requirement for the dashboard was a set of useful statistics and charts, and we could build more after shipping and getting feedback.&lt;/p&gt;

&lt;p&gt;For comparison, we added the following features in future releases after the MVP:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Scalable Data Processing&lt;/strong&gt;: Sidekiq&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Detailed Instance Monitoring&lt;/strong&gt;: Backup Report, Sources List, Scripts List, Backup Sets List&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Interactivity&lt;/strong&gt;: Pause/Stop Support for Activities and Script Creation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We started building in May 2018 and shipped the MVP in August 2018. After testing it in-house for a month, we released it to the public as a beta. It required the latest version of Retrospect Backup to connect and send analytics data.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_mc_mvp_web.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect Management Console - MVP&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_mc_mvp_integration.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect Backup Integration&lt;/div&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-balance-scale-left fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;MVP Tradeoffs&lt;/h3&gt;

&lt;p&gt;When you’re building an MVP, you also need to acknowledge the tradeoffs. The problem with shipping an MVP is those early decisions build the foundation for the product, and as teams continue to build features on top of them, they also find it harder to justify the time commitment of changing them.&lt;/p&gt;

&lt;p&gt;For us, we wanted to ship as soon as possible, which meant utilizing our existing knowledge of Ruby-on-Rails. We could have built the service as a Node.js application with React or Angular, but we weren’t as familiar with that approach. However, switching to a different application framework now would be a significant project because of the features we would need to replicate.&lt;/p&gt;

&lt;p&gt;Moreover, we chose to serve the API endpoint for the integration from the user-facing application. That design choice simplified the architecture in the beginning: we only needed one web instance running to both ingest data from Retrospect Backup and serve the dashboard. However, as the service scaled, that shared instance became a bottleneck that we needed to address.&lt;/p&gt;

&lt;p&gt;MVPs are a fantastic approach to getting a product into customers’ hands and iterating based on their feedback, but your team needs to acknowledge the Engineering tradeoffs that come with that approach.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-sync-alt fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Feedback and Iteration&lt;/h3&gt;

&lt;p&gt;With the MVP shipped, we revisited the product roadmap. Sales wanted more management abilities to compete with other services. Product Management pushed for easier workflows for customers to use the service. Support pointed out issues that customers had. Engineering evaluated performance optimizations for scaling the service.&lt;/p&gt;

&lt;p&gt;We’ve added a number of features since the MVP shipped, including the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Navigation&lt;/strong&gt;: Better navigation for organizations with managed organizations and multiple Retrospect Backup instances&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Detailed Monitoring&lt;/strong&gt;: More insight into Retrospect Backup instances via Backup Report, Sources list, Scripts list, and Backup Sets list&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Interactivity&lt;/strong&gt;: Pause/Stop support for activities, shared scripts, and backup set/script creation&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Status&lt;/strong&gt;: Detailed status for each Retrospect Backup instance’s service connectivity&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Scalable Data Processing&lt;/strong&gt;: Sidekiq and Rails AutoScale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Web applications are particularly suited to MVPs and iteration based on feedback because the company controls the delivery. We only update Retrospect Backup every six months because customers have repeatedly told us they don’t want updates more frequently. However, with a web application, we can update the application far more frequently, and the customer is only aware if there are new features.&lt;/p&gt;

&lt;p&gt;For Retrospect Management Console, we’re on deployment #396, so on average, we’ve shipped 2.5 times per week since Fall 2018.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_mc_current.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect Management Console - Current&lt;/div&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-desktop fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Deeper Product Integration&lt;/h3&gt;

&lt;p&gt;Owning a hosted service allowed us to integrate and streamline the user experience for Retrospect Backup. In May 2021, we launched Retrospect Backup 18 with deeper integration with Retrospect Management Console.&lt;/p&gt;

&lt;p&gt;When you download a trial of Retrospect Backup, the download link points to Retrospect Management Console. The service automatically creates an account for the embedded email address and license and then creates a personalized version of the application installer with license and Retrospect Management Console UUID included. After installation, Retrospect Backup automatically creates a public/private keypair and uploads it to the service, so that the service can bundle it in each client installer for that Retrospect instance.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_mc_mac_client.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect Backup - Add Clients&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_mc_client_link.png&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect Management Console - Client Installers&lt;/div&gt;

&lt;p&gt;In Retrospect Backup, the customer can copy a single download link for the Retrospect Backup Client agent installer and send it to their entire company. When each employee downloads and installs it, their computer will be automatically added to the Retrospect Backup instance and start getting protected, without any administrative setup.&lt;/p&gt;

&lt;p&gt;By leveraging Retrospect Management Console, the Engineering team was able to streamline the entire end-to-end process, from trial download to automatic protection for an entire organization, requiring only a couple of clicks from the administrator.&lt;/p&gt;

&lt;p&gt;Building an MVP and then iterating based on feedback enabled the Retrospect Engineering team to ship an integrated backup service that helps companies easily onboard, protect, and manage their backup environment from a single pane of glass.&lt;/p&gt;
</description>
        <pubDate>Sat, 30 Oct 2021 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2021/10/30/mvp-and-iteration-how-we-built-a-web-application-for-backup-analytics/</link>
        <guid isPermaLink="true">https://bdunagan.com/2021/10/30/mvp-and-iteration-how-we-built-a-web-application-for-backup-analytics/</guid>
      </item>
    
      <item>
        <title>Jekyll Tip: Absolute URLs for Links and Images in RSS Feed</title>
        <description>&lt;p&gt;When I write a post and add an image, the image lives in the “files” folder. When Jekyll builds the static site locally, I can see the image inline. When I push the site to GitHub Pages, I can see the image online in the post with my web browser. But when I view the feed in an RSS reader (Reeder for me), no image.&lt;/p&gt;

&lt;p&gt;The issue is the URL for the image is a relative path instead of an absolute path:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;# Relative Path
/files/example.jpg
# Absolute Path
https://bdunagan.com/files/example.jpg&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The RSS reader doesn’t know what the base path is. But if I wrote it with the absolute URL format locally, I couldn’t see it until I pushed the post live.&lt;/p&gt;

&lt;p&gt;Web browsers handle relative URLs, but RSS readers are not consistent about it. You can add “xml:base” to your feed, but there is no guarantee that the RSS reader will observe it.&lt;/p&gt;

&lt;p&gt;Below is a quick Jekyll hack to replace relative URLs for links and images with absolute URLs in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/feed.xml&lt;/code&gt;, using Liquid’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;replace&lt;/code&gt; filter:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;&lt;span class=&quot;p&quot;&gt;{{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;content&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;replace: &lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;src=&amp;amp;quot;/&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;src=&amp;amp;quot;https://bdunagan.com/&apos;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Here is the code in the context of my entire &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;feed.xml&lt;/code&gt; file:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;&lt;span class=&quot;o&quot;&gt;---&lt;/span&gt;
&lt;span class=&quot;ss&quot;&gt;layout: &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;null&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;---&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xml&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;1.0&quot;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;encoding&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;UTF-8&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;?&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rss&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;2.0&quot;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xmlns&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;:atom&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;http://www.w3.org/2005/Atom&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;channel&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;title&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;site&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;title&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}}&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;sr&quot;&gt;/title&amp;gt;
    &amp;lt;description&amp;gt;{{ site.description }}&amp;lt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;description&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;site&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}}&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;sr&quot;&gt;/link&amp;gt;
    &amp;lt;atom:link href=&quot;{{ site.url }}/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;feed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;xml&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot; rel=&quot;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot; type=&quot;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rss&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xml&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot; /&amp;gt;
    {% for post in site.posts limit:20 %}
      &amp;lt;item&amp;gt;
        &amp;lt;title&amp;gt;{{ post.title }}&amp;lt;/title&amp;gt;
        &amp;lt;description&amp;gt;{{ post.content | xml_escape | replace: &apos;&apos; | replace: &apos;&apos; | replace: &apos;&apos; | replace: &apos;&apos; | replace: &apos;src=&amp;amp;quot;/&apos;, &apos;src=&amp;amp;quot;https://bdunagan.com/&apos; | replace: &apos;href=&amp;amp;quot;/&apos;, &apos;href=&amp;amp;quot;https://bdunagan.com/&apos; }}&amp;lt;/description&amp;gt;
        &amp;lt;pubDate&amp;gt;{{ post.date | date: &quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;Y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;H&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;:%&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;S&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;z&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot; }}&amp;lt;/pubDate&amp;gt;
        &amp;lt;link&amp;gt;{{ site.url }}{{ post.url }}&amp;lt;/link&amp;gt;
        &amp;lt;guid isPermaLink=&quot;&lt;/span&gt;&lt;span class=&quot;kp&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&amp;gt;{{ site.url }}{{ post.url }}&amp;lt;/guid&amp;gt;
      &amp;lt;/item&amp;gt;
    {% endfor %}
  &amp;lt;/channel&amp;gt;
&amp;lt;/rss&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Because the feed template replaces relative links with absolute links, the post is viewable in development, in production, and in an RSS reader, without any modifications to the post itself.&lt;/p&gt;
</description>
        <pubDate>Fri, 29 Oct 2021 23:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2021/10/29/jekyll-tip-absolute-urls-for-links-and-images-in-rss-feed/</link>
        <guid isPermaLink="true">https://bdunagan.com/2021/10/29/jekyll-tip-absolute-urls-for-links-and-images-in-rss-feed/</guid>
      </item>
    
      <item>
        <title>How I Managed Cash Flow at a Bootstrapped Startup</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/cashflow.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/75nbwHfDsnY&quot;&gt;Unsplash&lt;/a&gt; - @bkotynski&lt;/div&gt;

&lt;p&gt;Retrospect, Inc. was a bootstrapped startup of sorts. We were spun out in 2011, responsible for a product that was 22 years old, with around thirty people at the company. We had an accounting firm, Accretive (now Countsy), with a fantastic and helpful advisor there, and they maintained our financials in NetSuite. From 2016 until our acquisition in 2019, I was Chief Operating Officer (COO), so I was responsible for the financials and the cash flow.&lt;/p&gt;

&lt;p&gt;The cash flow of a small company is composed of two pieces: Accounts Payable (AP) and Accounts Receivable (AR). AP is the list of vendors to whom we owe money, along with our payment deadlines. AR is the list of customers that owe us money, along with their payment deadlines. Sales closes opportunities (bookings) and hands off booked orders to Operations; Operations fulfills those orders and bills customers (billings), transitioning those orders into AR and over to Finance; and finally, Finance follows up on AR and collects money from customers for those orders (collections). This back-office process is how product turns into cash, and, for us, it involved many manual steps.&lt;/p&gt;

&lt;p&gt;The core component of a bootstrapped startup is cash in the bank. We needed to know our current financial position, predict the company’s cash flow, and minimize past-due AR. Let’s walk through how we did that with a simple spreadsheet.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-random fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Predicting Cash Flow&lt;/h3&gt;

&lt;p&gt;I’m not referring to complicated machine learning models to predict cash flow. We needed basic spreadsheet-driven estimates for our expenses and our cash. We did not have a subscription service at that point, so we did not deal with annual recurring revenue (ARR) or subscription churn rate. Our focus was simply understanding our current business, selling software licenses online and in the channel as well as support contracts and collecting payment from customers.&lt;/p&gt;

&lt;p&gt;AP was consistent for us every month. Countsy did a great job maintaining our NetSuite instance, so that I could drill into any department and see their expenses for the last three years or the last quarter. For us, the largest expense was payroll. Beyond that, we had sales rep contractors, our European support team, our accounting team, and a long list of credit card charges. Every quarter, we had to account for quarterly bonuses as well as quarterly bills, such as Salesforce. When we grew our Sales staff, I knew how much those additional people were going to cost on a monthly basis. There were occasional out-of-the-blue bills, such as the yearly online store fee, but in general, our burn rate was easy to predict.&lt;/p&gt;

&lt;p&gt;AR took more effort. Most of our customers had NET30 payment schedules, meaning that they had 30 days to pay us after we sent the invoice. However, when I started as COO, we had an AR forecast of one week, meaning we knew which customers were going to pay us what amount for the next seven days. Think about those two data points. Our customers had 30 days to pay us, and we only predicted payments a week out. We had better data than a one-week horizon.&lt;/p&gt;

&lt;p&gt;To predict cash as far out as possible, I built an “AR Forecast” spreadsheet in Google Sheets. Using a report in NetSuite, I copied every customer’s open invoices with amount, purchase date, and payment terms into the spreadsheet, and I used formulas to note when the invoice was due. For each customer, I learned what their past payment schedule was and manually predicted around when they would pay us. The spreadsheet allowed me to extend our AR forecast from 1 week to 6 weeks as well as accurately track our total AR.&lt;/p&gt;
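
&lt;p&gt;The spreadsheet logic can be sketched in a few lines of Python. This is a minimal illustration, not the actual spreadsheet: the customer names, amounts, dates, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;typical_delay_days&lt;/code&gt; table are all hypothetical, and the delay stands in for the manual per-customer prediction described above.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;from datetime import date, timedelta

# Hypothetical open invoices copied from an accounting report:
# (customer, amount, invoice date, payment terms in days).
open_invoices = [
    (&quot;Distributor A&quot;, 12000.00, date(2018, 3, 1), 30),
    (&quot;Reseller B&quot;, 3500.00, date(2018, 3, 10), 30),
]

# Hypothetical typical lateness per customer, in days, learned from past payments.
typical_delay_days = {&quot;Distributor A&quot;: 14, &quot;Reseller B&quot;: 0}


def forecast(invoices, delays):
    # Build (customer, amount, due date, expected payment date) rows.
    rows = []
    for customer, amount, invoiced, terms in invoices:
        due = invoiced + timedelta(days=terms)
        expected = due + timedelta(days=delays.get(customer, 0))
        rows.append((customer, amount, due, expected))
    return rows


rows = forecast(open_invoices, typical_delay_days)
total_ar = sum(amount for _, amount, _, _ in rows)&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Sorting the resulting rows by expected payment date would give a week-by-week view of incoming cash, assuming customers keep to their past habits.&lt;/p&gt;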

&lt;p&gt;As a bootstrapped startup, tracking AR and predicting payments were critical to understanding our runway and spotting any hurdles.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-funnel-dollar fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Minimizing Past-Due AR&lt;/h3&gt;

&lt;p&gt;In addition to only having a 1-week cash prediction, we had a huge past-due AR balance. According to NetSuite, many customers had unpaid invoices that were over 90 days old. Some open invoices were over a year old.&lt;/p&gt;

&lt;p&gt;All customers had their own processes for paying vendors. Large distributors’ Finance departments ran like clockwork, but they were still idiosyncratic. For instance, one consistently sent checks every week for all of the open invoices that had passed NET30 that week, but the key phrase was “sent checks”. The checks were marked as sent in their system, but they always managed to arrive two weeks later. Adding up the times, the checks arrived six weeks after we sold the products: NET45, not NET30. Still, large customers were consistent.&lt;/p&gt;

&lt;p&gt;Smaller customers varied in terms of how prompt they were at paying. We had a number of small distributors that were great at paying on time. Others needed reminders. One change we made was ensuring there was an employee who was responsible for following up on past-due AR.&lt;/p&gt;

&lt;p&gt;A past-due invoice did not necessarily mean that we hadn’t been paid. All customers need to submit their payment with remittance information, which lists what invoices the payment covers. Sometimes, remittance information is incorrect, leading to accounting not closing the correct open invoice. Other times, accounting has the correct information but has not yet updated the open invoice.&lt;/p&gt;

&lt;p&gt;Our huge past-due AR balance was a mix of all three. Accounting only processed invoices every other week, so there were frequently unprocessed invoices. A subset of remittance information was incorrect, so while the customer thought the invoice was paid, our system still showed it as open. Working with customers resolved a long list of these that had built up over a couple of years. Finally, a couple of customers were consistently late in paying. Our diligent back-office person worked with them to figure out a payment schedule that we both agreed on.&lt;/p&gt;

&lt;p&gt;By minimizing our past-due AR, I knew what our real AR was, so I could predict payments more accurately.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-money-bill-wave fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Profit: EBITDA vs Cash&lt;/h3&gt;

&lt;p&gt;We didn’t need to worry about EBITDA (earnings before interest, taxes, depreciation, and amortization), a common accounting measure of a company’s financial health. As a bootstrapped startup, cash was critical, and EBITDA was only useful for our corporate taxes. We followed cash basis accounting (as opposed to accrual basis accounting), and we focused on profit in terms of cash to decide questions like hiring, bonuses, or layoffs.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-building fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Understanding Data and Processes&lt;/h3&gt;

&lt;p&gt;Understanding a small company’s cash flow isn’t hard, but it does require patience and tools. NetSuite is a fantastic back office tool, but it’s not necessarily ground truth. We couldn’t simply click on “AR Report” and accept the sum at the bottom. The data in the report wasn’t accurate in multiple ways. Digging into the actual data, customer by customer and vendor by vendor, helped us see what our actual AR was, allowed us to predict our cash up to six weeks using a simple spreadsheet, and enabled us to manage the cash flow for our bootstrapped startup.&lt;/p&gt;

</description>
        <pubDate>Sun, 24 Oct 2021 23:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2021/10/24/how-i-managed-cash-flow-at-a-bootstrapped-startup/</link>
        <guid isPermaLink="true">https://bdunagan.com/2021/10/24/how-i-managed-cash-flow-at-a-bootstrapped-startup/</guid>
      </item>
    
      <item>
        <title>Technical Deep Dive on Object Lock, Ransomware Protection, and Immutable Backups in Retrospect Backup</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/object_lock.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/8wewP5tpt-4&quot;&gt;Unsplash&lt;/a&gt; - @myfotocanva&lt;/div&gt;

&lt;p&gt;Ransomware is a huge threat to businesses around the world. The problem for companies is that their storage is always connected with full access for admins. When ransomware gets the administrative credentials, it has full access too. There is no policy that says no one, not even the administrator, can change a given file for a set amount of time.&lt;/p&gt;

&lt;p&gt;Cloud Object Lock does just that. Because cloud storage providers like Amazon S3 control the API, they can add features like Object Lock. This lock is a retention policy for a specific version of a file that blocks changes from every user, including the administrator. You can think of this as a virtual air-gap in the cloud because there is no way, short of closing the account, to delete that file before the retention date has passed.&lt;/p&gt;

&lt;p&gt;Retrospect was one of the first data protection solutions to add ransomware protection using immutable backups: &lt;a href=&quot;https://bdunagan.com/2021/05/30/retrospect-backup-18-ransomware-protection/&quot;&gt;“Retrospect Backup 18: Ransomware Protection”&lt;/a&gt;. Retrospect utilizes Object Lock technology in major cloud storage providers to set a retention policy for cloud backups to ensure no one, not even the root user, can delete them during the retention window.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-window-restore fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Retrospect User Interface&lt;/h3&gt;

&lt;p&gt;Creating an immutable backup set with Retrospect Backup is easy. There is a single checkbox in the user interface to enable it and a number of days to specify:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_retain_ui.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect Backup - Immutable Retention Policy&lt;/div&gt;

&lt;p&gt;However, there is a lot of functionality underneath that checkbox to create immutable backups. Let’s dive into the technical details.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-history fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Forever-Incremental Backup’s Rolling Window&lt;/h3&gt;

&lt;p&gt;Retrospect Backup uses ProactiveAI for policy-driven scheduling and forever-incremental backup technology to minimize backup sizes while ensuring a point-in-time restore. The first backup is a full backup and every subsequent backup is called an incremental backup. Those incremental backups depend on previous backups. If a file doesn’t change, it doesn’t get backed up again.&lt;/p&gt;

&lt;p&gt;Ordinarily, this workflow is a fantastic combination of minimizing storage while providing a backup that can perfectly recreate a point-in-time snapshot of the volume being protected. But that changes if you’re concerned the previous backups might be deleted. If a file is no longer locked, it can be deleted maliciously. Retrospect Backup needs to create backups where every backup within the rolling window of immutability is a fully contained point-in-time snapshot of the volume.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_immutable_retention_policy.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Retrospect Backup - Immutable Retention Policy&lt;/div&gt;

&lt;p&gt;Retrospect Backup accounts for the rolling window in two ways:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;File Matching&lt;/strong&gt;: Retrospect adjusts its file matching to take into account the retention policy for a given backed-up file. A file that is outside of the retention policy is no longer considered to be backed up, and Retrospect will back it up again.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;ProactiveAI Scheduling&lt;/strong&gt;: ProactiveAI determines the next date the script will run and backs up any file that will fall out of the retention policy by that date with forever-incremental backup, predicting into the future to ensure the file is protected at all times.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The consequence of this change is Retrospect will back up any file that is not protected in an immutable backup. Let’s say you back up every week and you set the retention policy for 4 weeks. Retrospect will back up every file every four weeks, regardless of whether it changed, because it needs to keep those files in the ransomware protection’s rolling window.&lt;/p&gt;
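
&lt;p&gt;The weekly schedule with a 4-week retention policy can be simulated with a short Python sketch. This is not Retrospect’s implementation: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;needs_backup&lt;/code&gt; is a hypothetical helper, and treating a file as unprotected only when its lock would expire strictly before the next scheduled run is an assumption chosen to match the example above.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;from datetime import date, timedelta

RETENTION = timedelta(weeks=4)  # immutable retention period from the example
INTERVAL = timedelta(weeks=1)   # backup schedule from the example


def needs_backup(retain_until, run_date):
    # Hypothetical rule: copy the file again if it was never backed up or if
    # its lock would expire before the next scheduled run.
    next_run = run_date + INTERVAL
    return retain_until is None or retain_until &amp;lt; next_run


# Simulate ten weekly runs for one file that never changes.
first_run = date(2021, 1, 4)
retain_until = None
backup_weeks = []
for week in range(10):
    run_date = first_run + week * INTERVAL
    if needs_backup(retain_until, run_date):
        retain_until = run_date + RETENTION
        backup_weeks.append(week)&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Under these assumptions, the unchanged file is copied in weeks 0, 4, and 8, so &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;backup_weeks&lt;/code&gt; ends up as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[0, 4, 8]&lt;/code&gt;: one fresh immutable copy every four weeks, with incremental runs in between.&lt;/p&gt;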

&lt;p&gt;This process ensures that customers always have immutable backups with complete point-in-time restores. A backup never depends on an out-of-policy file, and Retrospect still preserves forever-incremental backups.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-cloud fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Cloud Storage Providers&lt;/h3&gt;

&lt;p&gt;There are two types of approaches from cloud storage providers: per-object policies and per-bucket policies. Per-object policies can be applied granularly to specific versions of an object at the time of creation, and they can vary within a bucket. Per-bucket policies are created for an entire bucket and are applied uniformly to every new version of any object in that bucket.&lt;/p&gt;

&lt;p&gt;To compare with Retrospect:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Per-Object Policy&lt;/strong&gt;: You can create Backup Set A with an immutable retention policy of 2 days and Backup Set B with an immutable retention policy of 6 months in the same bucket, and the bucket does not need to have a bucket-wide policy.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Per-Bucket Policy&lt;/strong&gt;: You can only set a bucket-wide policy for immutable retention, so every new object is set to that retention period, regardless of what you have set in Retrospect.&lt;/li&gt;
&lt;/ul&gt;
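
&lt;p&gt;The difference between the two approaches can be sketched as a small Python model. The function and parameter names are hypothetical, and six months is approximated as 180 days.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Illustrative model only: the function and parameter names are hypothetical.
def effective_retention_days(backup_set_days, bucket_policy_days=None):
    # Per-bucket policy: the bucket-wide value wins for every new object.
    if bucket_policy_days is not None:
        return bucket_policy_days
    # Per-object policy: each backup set keeps its own retention period.
    return backup_set_days

print(effective_retention_days(2))      # Backup Set A with per-object policies
print(effective_retention_days(180))    # Backup Set B with per-object policies
print(effective_retention_days(2, 30))  # any backup set in a bucket with a 30-day policy&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;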

&lt;p&gt;Cloud storage providers with per-object policies are Amazon S3, Wasabi, Backblaze B2, MinIO, and Microsoft Azure Blob Storage (Preview - September 2021), while those with per-bucket policies are Google Cloud Storage and Microsoft Azure Blob Storage.&lt;/p&gt;

&lt;p&gt;There are also different policy modes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Compliance Mode&lt;/strong&gt;: The policy is time-based and enforced for every user, including administrators.&lt;/li&gt;
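  &lt;li&gt;&lt;strong&gt;Governance Mode&lt;/strong&gt;: The policy is also time-based, but a user with the appropriate permissions can shorten or remove it. A legal hold is a separate setting that does not expire and stays in place until a permitted user removes it.&lt;/li&gt;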
&lt;/ul&gt;

&lt;p&gt;See &lt;a href=&quot;https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock-overview.html#object-lock-retention-modes&quot;&gt;Amazon S3 documentation&lt;/a&gt; for more information.&lt;/p&gt;

&lt;p&gt;Retrospect Backup uses Compliance Mode for its immutable backups. When you create an immutable backup, no permission level will allow you to delete that version of the backup files before the retention date passes. Even the root account cannot delete them. The only way to delete them is to close the account.&lt;/p&gt;

&lt;p&gt;Because there is a way to ultimately delete the files, it’s important to use multi-factor authentication (MFA) for your root account on the cloud storage provider.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;far fa-calendar-alt fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Retention Policy Dates&lt;/h3&gt;

&lt;p&gt;Let’s walk through the user interfaces for retention policy dates in the different cloud storage providers.&lt;/p&gt;

&lt;h3 id=&quot;amazon-s3&quot;&gt;Amazon S3&lt;/h3&gt;

&lt;p&gt;Below is Amazon S3’s Retention Mode UI.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_amazon_s3_retention_panel.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Amazon S3 Object Lock - Retention Policy&lt;/div&gt;

&lt;p&gt;You’ll see it specifies the mode, the “Retain Until Date”, and the version of an object that you’re applying this to. Retrospect Backup does this step automatically when creating an immutable backup.&lt;/p&gt;

&lt;p&gt;For Microsoft Azure Blob Storage and Google Cloud Storage, you will need to create the retention policy manually because they only support per-bucket policies.&lt;/p&gt;

&lt;p&gt;Click on any object and scroll down on “Properties” to “Object Lock retention”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_retain_s3.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Amazon S3&lt;/div&gt;

&lt;h3 id=&quot;wasabi&quot;&gt;Wasabi&lt;/h3&gt;

&lt;p&gt;Click on any object and “File Details” appears.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_retain_wasabi.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Wasabi&lt;/div&gt;

&lt;h3 id=&quot;backblaze-b2&quot;&gt;Backblaze B2&lt;/h3&gt;

&lt;p&gt;Click on any object’s blue link and “Details” appears.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_retain_backblaze.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Backblaze B2&lt;/div&gt;

&lt;h3 id=&quot;cyberduck&quot;&gt;Cyberduck&lt;/h3&gt;

&lt;p&gt;Select any object and click “Info” then “Metadata”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_retain_cyberduck.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Cyberduck&lt;/div&gt;

&lt;h3 id=&quot;microsoft-azure-blob-storage&quot;&gt;Microsoft Azure Blob Storage&lt;/h3&gt;

&lt;p&gt;For a container, select “Access Policy”. Note: per-object (blob) version locking is in preview as of September 2021.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_retain_azure.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Microsoft Azure Blob Storage&lt;/div&gt;

&lt;h3 id=&quot;google-cloud-storage&quot;&gt;Google Cloud Storage&lt;/h3&gt;

&lt;p&gt;When viewing a list of files, see the “Retention expiration date” column.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_retain_google.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Google Cloud Storage&lt;/div&gt;

&lt;h3 id=&quot;viewing-and-deleting-versions&quot;&gt;Viewing and Deleting Versions&lt;/h3&gt;

&lt;p&gt;One important nuance is how to view versions of a file. Only Amazon S3 and Cyberduck show versions. See below. Other interfaces choose to display a simplified version of the actual underlying content while preventing you from taking certain actions, like deletion.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_show_versions.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Object Lock - Show Versions&lt;/div&gt;

&lt;p&gt;One underlying feature is a delete marker. When you delete an object in a versioned bucket on Amazon S3, the file is not deleted. You are adding a delete marker as the next version of that file, and Amazon S3 understands it should not display that file in the interface without “Show Versions” enabled.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_show_versions_delete_marker.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Object Lock - Delete Markers&lt;/div&gt;

&lt;p&gt;Let’s look at the difference between deleting an object (“delete”) and deleting an object version (“permanently delete”):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_object_lock_delete.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Object Lock - Delete&lt;/div&gt;

&lt;p&gt;In Wasabi or Backblaze, you don’t see versions, even though they are there for buckets with Object Lock enabled. Wasabi won’t let you delete files through their interface, but if an attacker added a delete marker to your file using an API, the file would no longer show up in Wasabi. You would have to use Cyberduck or another API client to see that the locked files were indeed still there.&lt;/p&gt;
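
&lt;p&gt;To make the delete-marker behavior concrete, here is a toy Python model of a versioned bucket with Object Lock. It is a sketch under stated assumptions: the class, method names, and file name are hypothetical, and it does not call any provider’s API.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Toy model of a versioned bucket with Object Lock; all names are hypothetical.
from datetime import date

class LockedVersionError(Exception):
    pass

class VersionedBucket:
    def __init__(self):
        # key -> list of versions, oldest first
        self.versions = {}

    def put(self, key, data, retain_until=None):
        entry = {"data": data, "retain_until": retain_until, "delete_marker": False}
        self.versions.setdefault(key, []).append(entry)

    def delete(self, key):
        # "Delete" only stacks a delete marker on top; older versions remain.
        marker = {"data": None, "retain_until": None, "delete_marker": True}
        self.versions.setdefault(key, []).append(marker)

    def permanently_delete(self, key, index, today):
        # "Permanently delete" removes one version, unless it is still locked.
        version = self.versions[key][index]
        locked = version["retain_until"] is not None and version["retain_until"] > today
        if locked:
            raise LockedVersionError(key)
        del self.versions[key][index]

    def list_objects(self):
        # Simplified view: hide keys whose newest version is a delete marker.
        return [k for k, v in self.versions.items() if v and not v[-1]["delete_marker"]]

    def list_versions(self, key):
        # "Show Versions" view: everything, including delete markers.
        return list(self.versions.get(key, []))

bucket = VersionedBucket()
bucket.put("backup.rdb", b"data", retain_until=date(2021, 11, 14))
bucket.delete("backup.rdb")
print(bucket.list_objects())
print(len(bucket.list_versions("backup.rdb")))&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this model, the simplified listing comes back empty after the delete, while the version listing still holds both the locked version and the delete marker.&lt;/p&gt;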

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fab fa-youtube fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Video Walkthrough&lt;/h3&gt;

&lt;p&gt;I recorded a detailed video of the use cases and step-by-step walkthroughs on both Windows and Mac platforms as well as this technical deep dive into how Retrospect’s ransomware protection works and how Object Lock is implemented across cloud storage providers.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/embed/mtKKiFHvJRc&quot;&gt;
&lt;img src=&quot;https://bdunagan.com/files/retrospect_ransomware_protection_video.jpg&quot; style=&quot;max-height:400;&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Ransomware Protection with Retrospect Backup&apos;s Immutable Backups&lt;/div&gt;

&lt;p&gt;Note that much of this content is also available on the Retrospect website, under &lt;a href=&quot;https://www.retrospect.com/ransomware&quot;&gt;Ransomware Protection&lt;/a&gt; and &lt;a href=&quot;https://www.retrospect.com/en/support/kb/technical_deep_dive_on_ransomware_protection&quot;&gt;Technical Deep Dive on Ransomware Protection, Object Lock, and Immutable Backups&lt;/a&gt;. It took me a bit to compile, but the broad overview followed by the technical details has really helped clarify people’s understanding of ransomware, Object Lock, and immutable backups, both inside the company and for our partners and customers.&lt;/p&gt;
</description>
        <pubDate>Sun, 17 Oct 2021 12:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2021/10/17/technical-deep-dive-on-retrospect-ransomware-protection/</link>
        <guid isPermaLink="true">https://bdunagan.com/2021/10/17/technical-deep-dive-on-retrospect-ransomware-protection/</guid>
      </item>
    
      <item>
        <title>How to Protect Cloud Storage with Retrospect Backup's Cloud Data Protection</title>
        <description>&lt;div class=&quot;flush&quot;&gt;&lt;img src=&quot;https://bdunagan.com/files/cloud_data_protection.jpg&quot; /&gt;&lt;/div&gt;
&lt;div class=&quot;caption&quot;&gt;&lt;a href=&quot;https://unsplash.com/photos/gnxb59lGU1M&quot;&gt;Unsplash&lt;/a&gt; - @ilumire&lt;/div&gt;

&lt;p&gt;Cloud storage provides a low-cost, scalable, offsite storage location for a multitude of corporate assets. Companies use cloud storage for all sorts of data, from website assets to corporate storage to unstructured data used for analysis.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_cloud_data_protection_uses.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Cloud Data&lt;/div&gt;

&lt;p&gt;Retrospect has used Amazon S3 for storing various assets for the last decade. Today, we store tens of terabytes of content on it: website assets, sales collateral, customer logs, Engineering builds, and, of course, corporate backups.&lt;/p&gt;

&lt;p&gt;Cloud data storage is fantastic for its infinite supply, worldwide accessibility, and abstraction. It allows companies to move up the storage stack and not manage the physical bits of hardware that are required to store vast quantities of data–all for an affordable per-data-unit cost. Amazon S3 costs $0.023 per GB per month, and other providers like Wasabi and Backblaze B2 are even more affordable.&lt;/p&gt;

&lt;p&gt;To achieve durability of up to eleven 9s like Amazon S3, cloud storage providers build redundancy into their infrastructure to protect your data from hardware failure. Objects are stored as multiple copies within a single zone and across multiple zones. Two data centers can go offline, and you’ll still have your data. Awesome.&lt;/p&gt;

&lt;p&gt;Unless you delete a file accidentally. Or ransomware compromises your account and encrypts your data. Or a malicious insider deletes every bucket.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://bdunagan.com/files/retrospect_cloud_data_protection_loss.png&quot; style=&quot;max-height:400;&quot; /&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;Cloud Data Loss&lt;/div&gt;

&lt;p&gt;Working at a backup company, I hear stories about data loss a lot. Bad things happen. You can protect yourself against one type of data loss, only to be hit by another. Cloud data is no exception.&lt;/p&gt;

&lt;p&gt;Retrospect Backup can now protect your cloud data with Cloud Data Protection. Retrospect treats cloud storage as a first-class storage volume and integrates cloud volumes into the rest of Retrospect’s deep feature set. Cloud Data Protection is included for an unlimited number of cloud volumes in Retrospect Backup’s server-level editions on both Windows and Mac platforms, and Retrospect integrates cloud storage from all the major cloud storage providers: Amazon S3, Microsoft Azure Blob Storage, Google Cloud Storage, Wasabi, Backblaze B2, and MinIO.&lt;/p&gt;

&lt;p&gt;I made a video covering common use cases and step-by-step walkthroughs for Retrospect Backup for Windows and Retrospect Backup for Mac:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/embed/OvAOlpAH7Ec&quot;&gt;
&lt;img src=&quot;https://bdunagan.com/files/retrospect_cloud_data_protection_video.jpg&quot; style=&quot;max-height:400;&quot; /&gt;
&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;caption&quot;&gt;How to Protect Cloud Assets with Retrospect Backup&apos;s Cloud Data Protection&lt;/div&gt;

&lt;p&gt;Learn more at &lt;a href=&quot;https://www.retrospect.com/cloud_data_protection&quot;&gt;retrospect.com/cloud_data_protection&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 class=&quot;center&quot;&gt;&lt;i class=&quot;fas fa-layer-group fontawesome_icon&quot;&gt;&lt;/i&gt;&lt;br /&gt;Deep Integration&lt;/h3&gt;

&lt;p&gt;Retrospect Backup has a deep feature set, and Cloud Data Protection allows organizations to incorporate cloud data into their standard backup strategies. Let’s walk through a number of facets:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;On-Premise or Cloud&lt;/strong&gt;: You can back up the cloud volume to on-premise storage, back up the cloud data to a different cloud from a cloud server instance, or back up a local volume and restore it to the cloud. A cloud volume has the same level of protection and interoperability as a local volume.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Incremental-Forever Backups&lt;/strong&gt;: Retrospect uses incremental-forever backup technology to ensure each backup is as compact as possible while still providing point-in-time restore, and cloud volumes are included.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Scheduling&lt;/strong&gt;: Retrospect supports cloud volumes under scheduled scripts or policy-based scheduling called ProactiveAI, which prioritizes backups based on a backup window and availability across local volumes, NAS shares, servers, endpoints, email accounts, and now cloud volumes.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Filtering&lt;/strong&gt;: Retrospect includes a granular set of filtering options for companies to include only critical data or exclude extraneous data, based on volume information or file information. For instance, you can easily write a filter to only back up cloud data that was created in the last 90 days and is smaller than 100MB.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Cloud Backup&lt;/strong&gt;: Retrospect can also use cloud storage as a destination for backups, with support for over 20 different cloud storage providers.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Replication&lt;/strong&gt;: In addition to backup, Retrospect can replicate between cloud volumes, local volumes, NAS shares, servers, endpoints, or any combination, with scheduling and filtering available.&lt;/li&gt;
&lt;/ul&gt;
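
&lt;p&gt;As a rough illustration of the filtering example above, the rule “created in the last 90 days and smaller than 100MB” can be written as a Python predicate. This is not Retrospect’s filter syntax or API. The function name is hypothetical, and treating 1MB as 1,000,000 bytes is an assumption.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Illustrative filter only: this is not Retrospect's filter syntax or API.
from datetime import datetime, timedelta

MB = 1000 * 1000  # assumption: decimal megabytes

def include_in_backup(created, size_bytes, now):
    # Keep objects created within the last 90 days and smaller than 100 MB.
    recent = created >= now - timedelta(days=90)
    small = 100 * MB > size_bytes
    return recent and small

now = datetime(2021, 10, 16)
print(include_in_backup(datetime(2021, 9, 1), 5 * MB, now))    # recent and small
print(include_in_backup(datetime(2021, 1, 1), 5 * MB, now))    # too old
print(include_in_backup(datetime(2021, 9, 1), 500 * MB, now))  # too large&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;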

&lt;p&gt;Retrospect Backup’s goal is to ensure your data is protected from the wide array of possibilities for data loss. With Cloud Data Protection in Retrospect Backup, you can back up and restore cloud storage with the same policy-driven workflows that you use for on-premise systems and storage.&lt;/p&gt;

&lt;p&gt;Congratulations to the Retrospect Engineering team for shipping this excellent feature in such a deeply integrated fashion! With the continued migration to the cloud, data protection that is agnostic to volume type or location helps companies protect their data in the way that works for their business constraints.&lt;/p&gt;
</description>
        <pubDate>Sat, 16 Oct 2021 13:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2021/10/16/how-to-protect-cloud-storage-with-retrospect-backup/</link>
        <guid isPermaLink="true">https://bdunagan.com/2021/10/16/how-to-protect-cloud-storage-with-retrospect-backup/</guid>
      </item>
    
      <item>
        <title>Jekyll Tip: Adding Years and Post Counts using Liquid</title>
        <description>&lt;p&gt;After 13 years and 177 posts, this blog is a bit long for a single scrolling index, so I added years to delineate the content as well as post count per year.&lt;/p&gt;

&lt;p&gt;That involved a bit of programming in &lt;a href=&quot;https://shopify.github.io/liquid/&quot;&gt;Liquid&lt;/a&gt;, a template language created by Shopify and used by Jekyll. Liquid does not expose support for hash creation, even though Jekyll can provide hashes like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;site&lt;/code&gt;. Fortunately, Jekyll supports the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;group_by_exp&lt;/code&gt; filter, available in v3.4+.&lt;/p&gt;

&lt;p&gt;Below is the code I used:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ul&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;posts&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;sx&quot;&gt;% assign &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;posts_per_year&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;site&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;posts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;group_by_exp: &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;post&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;post.date | date: &apos;%Y&apos;&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;sx&quot;&gt;% for &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;post&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;site&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;posts&lt;/span&gt; &lt;span class=&quot;sx&quot;&gt;%}
    {% assign year = post.date | date: &quot;%Y&quot; %}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;sx&quot;&gt;% for &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current_year&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;posts_per_year&lt;/span&gt; &lt;span class=&quot;sx&quot;&gt;%}
      {% if last_year != year and current_year.name == year %}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;li&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;year&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;year&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}}&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current_year&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;size&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}}&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;sr&quot;&gt;/li&amp;gt;
      {% endif %}
    {% endfor %}
    {% assign last_year = year %}

    &amp;lt;li&amp;gt;
      &amp;lt;span class=&quot;datetime muted&quot; data-time=&quot;{{ post.date }}&quot;&amp;gt;{{ post.date | date: &quot;%B %-d %Y&quot; }}&amp;lt;/s&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pan&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;href&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;{{ post.url }}&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;post&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;title&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}}&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;sr&quot;&gt;/a&amp;gt;
      &amp;lt;p&amp;gt;{{ post.description }}&amp;lt;/&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;sr&quot;&gt;/li&amp;gt;
  {% endfor %}
&amp;lt;/u&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
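
&lt;p&gt;For readers less familiar with Liquid, the same idea can be restated in Python: group the posts by year, then print a heading with the post count before each year’s posts. This is only a sketch of the logic, not part of the Jekyll template. The sample posts are made up, and it assumes the posts arrive newest first.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Python restatement of the Liquid logic; the sample posts are made up.
from datetime import date
from itertools import groupby

posts = [
    {"title": "Post C", "date": date(2021, 10, 17)},
    {"title": "Post B", "date": date(2021, 10, 15)},
    {"title": "Post A", "date": date(2020, 5, 1)},
]

def index_lines(posts):
    # Assumes posts are sorted newest first, so each year forms one run.
    lines = []
    for year, group in groupby(posts, key=lambda post: post["date"].year):
        group = list(group)
        # One heading per year with its post count, then that year's posts.
        lines.append(f"{year} - {len(group)}")
        lines.extend("  " + post["title"] for post in group)
    return lines

for line in index_lines(posts):
    print(line)&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;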

&lt;p&gt;I’ve been using Jekyll for eight years now. In 2013, I converted from WordPress’s database-driven approach to Jekyll’s static HTML generation: &lt;a href=&quot;https://bdunagan.com/2013/11/08/why-i-moved-from-wordpress-to-jekyll/&quot;&gt;“Why I moved from Wordpress to Jekyll”&lt;/a&gt;. It continues to work well.&lt;/p&gt;

&lt;p&gt;And for those who use Jekyll, you can post Liquid as text in a post by using &lt;code&gt;&amp;#123;% raw %}&amp;#123;% endraw %}&lt;/code&gt;.&lt;/p&gt;
</description>
        <pubDate>Fri, 15 Oct 2021 23:00:00 +0000</pubDate>
        <link>https://bdunagan.com/2021/10/15/jekyll-tip-adding-years-and-post-counts-in-liquid/</link>
        <guid isPermaLink="true">https://bdunagan.com/2021/10/15/jekyll-tip-adding-years-and-post-counts-in-liquid/</guid>
      </item>
    
  </channel>
</rss>
