<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://byroot.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://byroot.github.io/" rel="alternate" type="text/html" /><updated>2025-11-05T08:06:53+00:00</updated><id>https://byroot.github.io/feed.xml</id><title type="html">byroot’s blog</title><subtitle>Various ramblings.</subtitle><entry><title type="html">Frozen String Literals: Past, Present, Future?</title><link href="https://byroot.github.io/ruby/performance/2025/10/28/string-literals.html" rel="alternate" type="text/html" title="Frozen String Literals: Past, Present, Future?" /><published>2025-10-28T08:03:51+00:00</published><updated>2025-10-28T08:03:51+00:00</updated><id>https://byroot.github.io/ruby/performance/2025/10/28/string-literals</id><content type="html" xml:base="https://byroot.github.io/ruby/performance/2025/10/28/string-literals.html"><![CDATA[<p>If you are a Rubyist, you’ve likely been writing <code class="language-plaintext highlighter-rouge"># frozen_string_literal: true</code> at the top of most of your Ruby
source code files, or at the very least, you’ve seen it in some other projects.</p>

<p>Based on informal discussions at conferences and online, it seems that what this magic comment really is about is not always well understood,
so I figured it would be worth talking about why it’s there, what it does exactly, and what its future might look like.</p>

<h2 id="ruby-strings-are-mutable">Ruby Strings Are Mutable</h2>

<p>Before we can delve into what makes frozen string literals special, we first need to talk about the Ruby String type,
because it’s quite different from the equivalent type in other popular languages.</p>

<p>In the overwhelming majority of popular languages, strings are immutable.
That’s the case in Java, JavaScript, Python, Go, etc.</p>

<p>There are a few exceptions, though, like Perl, PHP<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>, C/C++ (except for literals), and of course Ruby:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="n">str</span> <span class="o">=</span> <span class="no">String</span><span class="p">.</span><span class="nf">new</span>
<span class="o">=&gt;</span> <span class="s2">""</span>
<span class="o">&gt;&gt;</span> <span class="n">str</span><span class="p">.</span><span class="nf">object_id</span>
<span class="o">=&gt;</span> <span class="mi">24952</span>
<span class="o">&gt;&gt;</span> <span class="n">str</span> <span class="o">&lt;&lt;</span> <span class="s2">"foo"</span>
<span class="o">=&gt;</span> <span class="s2">"foo"</span>
<span class="o">&gt;&gt;</span> <span class="n">str</span>
<span class="o">=&gt;</span> <span class="s2">"foo"</span>
<span class="o">&gt;&gt;</span> <span class="n">str</span><span class="p">.</span><span class="nf">capitalize!</span>
<span class="o">=&gt;</span> <span class="s2">"Foo"</span>
<span class="o">&gt;&gt;</span> <span class="n">str</span><span class="p">.</span><span class="nf">upcase!</span>
<span class="o">=&gt;</span> <span class="s2">"FOO"</span>
<span class="o">&gt;&gt;</span> <span class="n">str</span>
<span class="o">=&gt;</span> <span class="s2">"FOO"</span>
<span class="o">&gt;&gt;</span> <span class="n">str</span><span class="p">.</span><span class="nf">object_id</span>
<span class="o">=&gt;</span> <span class="mi">24952</span>
</code></pre></div></div>

<p>Implementation-wise, they’re just an array of bytes, with an associated encoding to know how these bytes should be interpreted:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">String</span>
  <span class="nb">attr_reader</span> <span class="ss">:encoding</span>

  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="vi">@bytes</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="vi">@encoding</span> <span class="o">=</span> <span class="no">Encoding</span><span class="o">::</span><span class="no">UTF_8</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>That too is quite unusual.</p>

<h2 id="string-encoding">String Encoding</h2>

<p>Most languages, especially the ones I listed above, instead have chosen a specific internal encoding, and all strings are encoded that way.
For instance, in Java and JavaScript, strings are encoded in UTF-16 because they were created somewhat at the same time as the first Unicode specification, and at that time, many people thought that surely 16 bits should be enough to encode all possible characters, but that later turned out to be wrong.
Most newer languages use UTF-8, or a limited set of internal encodings.</p>

<p>For instance, in Python, strings can be encoded in either <code class="language-plaintext highlighter-rouge">ISO-8859-1</code> (AKA Latin 1), UTF-16 or UTF-32.
But from a user perspective, it’s an implementation detail, and you can’t really tell what encoding a particular string is using.
Semantically, strings are Unicode sequences; how that sequence is encoded in memory is abstracted away.</p>

<p>In these languages, whenever you have to handle text in another encoding, you start by re-encoding it into the internal representation.
In Ruby however, strings with different internal encodings can exist in the same program, and Ruby supports over a hundred different encodings:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="no">Encoding</span><span class="p">.</span><span class="nf">list</span><span class="p">.</span><span class="nf">size</span>
<span class="o">=&gt;</span> <span class="mi">103</span>
</code></pre></div></div>
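
<p>As a minimal sketch of what that means in practice, the snippet below holds the same character in two strings with different encodings. The variable names are only illustrative, and the character is written as an escape so the example doesn’t depend on the encoding of the source file.</p>

```ruby
# "\u00E9" is the single character "é"; a \u escape always yields a UTF-8 string.
utf8 = "\u00E9"
latin1 = utf8.encode(Encoding::ISO_8859_1)

puts utf8.encoding    # UTF-8
puts latin1.encoding  # ISO-8859-1

# Same character, different bytes in memory.
p utf8.bytes    # [195, 169]
p latin1.bytes  # [233]

# Converting back to UTF-8 yields an equal string.
p latin1.encode(Encoding::UTF_8) == utf8  # true
```

<p>Both strings live side by side in the same program, and neither had to go through a shared internal representation.</p>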

<p>While I’m not 100% certain of why Ruby went that way, I highly suspect it is in large part due to Ruby’s Japanese origin.
In the early days of the Unicode specification, there was an attempt at unifying some of the “common” Chinese, Korean, and Japanese characters,
in what is now called the <a href="https://en.wikipedia.org/wiki/Han_unification">Han unification</a>.
Because of that character unification attempt, Unicode had lots of problems for Japanese text, hence the Japanese IT industry didn’t adopt Unicode as fast as the Western IT industry did, and for a very long time, Japanese-specific encodings such as <a href="https://en.wikipedia.org/wiki/Shift_JIS">Shift JIS</a> remained widespread.</p>

<p>As such, being able to work with Japanese text without going through a forced Unicode conversion was an important feature for a large part of Ruby’s core contributors.</p>

<p>But let’s go back to mutability.</p>

<h2 id="pros-and-cons">Pros And Cons</h2>

<p>Like most things in engineering, both immutable and mutable strings have pros and cons, so it’s not like one choice is inherently superior to the other.</p>

<p>One of the advantages of immutable strings is that you can more easily share them, for instance:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sliced_string</span> <span class="o">=</span> <span class="n">very_long_string</span><span class="p">[</span><span class="mi">1</span><span class="o">..-</span><span class="mi">1</span><span class="p">]</span>
</code></pre></div></div>

<p>In the above case, if strings are mutable, you need to copy all but one of the bytes of <code class="language-plaintext highlighter-rouge">very_long_string</code> into <code class="language-plaintext highlighter-rouge">sliced_string</code>, which can be costly.
But if strings are immutable, you can instead have <code class="language-plaintext highlighter-rouge">sliced_string</code> internally be pointing at the content of <code class="language-plaintext highlighter-rouge">very_long_string</code> with just an offset.
That is what some languages call String Views, or String slices.</p>

<p>Another advantage of immutable strings is that they allow for <a href="https://en.wikipedia.org/wiki/String_interning">interning</a>.
The idea is simple: if strings can’t be mutated, whenever you have multiple instances of strings with identical content, you can coalesce them into a single instance.
This deduplication can be done more or less aggressively, as it’s always a tradeoff in how much CPU time you want to spend searching for duplicates in the hope of saving some memory.</p>
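
<p>To make the idea concrete, here is a small sketch using Ruby’s unary minus on strings (<code class="language-plaintext highlighter-rouge">String#-@</code>), which returns a frozen and possibly pre-existing copy. It assumes CRuby 2.5 or newer, where two equal plain strings end up deduplicated to the same object; other implementations may behave differently.</p>

```ruby
a = String.new("hello world")
b = String.new("hello world")
p a.equal?(b)  # false: two separate objects with equal content

# Unary minus returns a frozen, deduplicated string.
interned_a = -a
interned_b = -b
p interned_a.frozen?             # true
p interned_a.equal?(interned_b)  # true on CRuby 2.5+
```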

<p>Some other advantages include not having to worry about mutation in multi-threaded code, as well as dictionary keys.
Strings are used a lot as dictionary keys.
If you mutate a string, you change its hash code, and that basically breaks hash tables.</p>
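
<p>A rough illustration of that problem follows. The first part shows a string’s hash code changing after a mutation. The second part uses an Array as the key, because Ruby doesn’t protect Array keys the way it protects String keys, so the failure mode is visible. The names are only illustrative.</p>

```ruby
str = String.new("foo")
hash_before = str.hash
str.concat("bar")
p str.hash == hash_before  # false, barring an astronomically unlikely collision

# An Array key is stored as is, so mutating it afterwards misfiles the entry.
key = ["a"]
table = { key => 1 }
key.push("b")
p table[key]  # the entry is still filed under the old hash code, so this usually misses

# Rebuilding the table from the current hash codes makes the entry reachable again.
table.rehash
p table[key]  # 1
```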

<p>On the other hand, mutable strings are very handy in some scenarios, like to iteratively build a final string:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">buffer</span> <span class="o">=</span> <span class="s2">""</span>
<span class="mi">10</span><span class="p">.</span><span class="nf">times</span> <span class="k">do</span>
  <span class="n">buffer</span> <span class="o">&lt;&lt;</span> <span class="s2">"hello"</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Whereas in a language with immutable strings like Java, concatenating strings in a loop is known as a classic performance gotcha:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">String</span> <span class="n">buffer</span> <span class="o">=</span> <span class="s">""</span><span class="o">;</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
  <span class="n">buffer</span> <span class="o">+=</span> <span class="s">"hello"</span><span class="o">;</span>
<span class="o">}</span>
</code></pre></div></div>

<p>In the above example, on every loop, the <code class="language-plaintext highlighter-rouge">+=</code> operator causes a new string to be allocated, and the content to be copied, which gets more and more expensive as the string grows, making the whole loop quadratic.
Instead, you are supposed to use a different object as a buffer, <code class="language-plaintext highlighter-rouge">StringBuilder</code>:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">StringBuilder</span> <span class="n">buffer</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">StringBuilder</span><span class="o">();</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
  <span class="n">buffer</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="s">"hello"</span><span class="o">);</span>
<span class="o">}</span>
<span class="n">buffer</span><span class="o">.</span><span class="na">toString</span><span class="o">();</span>
</code></pre></div></div>

<p>That’s the Java equivalent of appending strings to an array and then calling <code class="language-plaintext highlighter-rouge">array.join("")</code>.
It’s a common enough mistake that at some point the Java compiler gained the ability to <a href="https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.18.1">detect that pattern and automatically replace it with the equivalent code using <code class="language-plaintext highlighter-rouge">StringBuilder</code></a>.</p>

<p>While having to use a different buffer type isn’t the end of the world, I do very much like that it’s not necessary in Ruby.</p>

<p>But more generally, the advantage of mutable strings is that for some algorithms, being able to modify the string in place saves a lot of memory allocations and copying.</p>
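
<p>The difference is easy to observe from Ruby itself. The sketch below builds the same content twice, once by appending in place and once with <code class="language-plaintext highlighter-rouge">+=</code>, and checks whether the variable still points at the original object. The variable names are only illustrative.</p>

```ruby
# Appending in place: the buffer stays the same object throughout.
buffer = String.new
original_buffer = buffer
3.times { buffer << "hello" }
p buffer                          # "hellohellohello"
p buffer.equal?(original_buffer)  # true

# Using += instead: every iteration allocates a brand new String.
copied = String.new
original_copied = copied
3.times { copied += "hello" }
p copied                          # "hellohellohello"
p copied.equal?(original_copied)  # false
```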

<h2 id="ruby-actually-has-both">Ruby Actually Has Both</h2>

<p>Earlier in this post, I said Ruby had mutable strings, but it’s not quite true.
Ruby actually has both mutable and immutable strings, because in Ruby, every mutable object can be frozen, and it takes advantage of this.</p>

<p>A fun way to poke at Ruby internals is through <a href="https://docs.ruby-lang.org/en/3.4/ObjectSpace.html#method-i-dump">the <code class="language-plaintext highlighter-rouge">ObjectSpace.dump</code> method</a>.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"json"</span>
<span class="nb">require</span> <span class="s2">"objspace"</span>

<span class="k">def</span> <span class="nf">dump</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
  <span class="no">JSON</span><span class="p">.</span><span class="nf">pretty_generate</span><span class="p">(</span><span class="no">JSON</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="n">obj</span><span class="p">)))</span>
<span class="k">end</span>

<span class="n">str</span> <span class="o">=</span> <span class="s2">"Hello World"</span> <span class="o">*</span> <span class="mi">80</span>
<span class="nb">puts</span> <span class="n">dump</span><span class="p">(</span><span class="n">str</span><span class="p">)</span>
</code></pre></div></div>

<p>The above script will output something like:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0x105068e10"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"STRING"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"slot_size"</span><span class="p">:</span><span class="w"> </span><span class="mi">40</span><span class="p">,</span><span class="w">
  </span><span class="nl">"bytesize"</span><span class="p">:</span><span class="w"> </span><span class="mi">880</span><span class="p">,</span><span class="w">
  </span><span class="nl">"memsize"</span><span class="p">:</span><span class="w"> </span><span class="mi">921</span><span class="p">,</span><span class="w">
  </span><span class="err">...</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>It tells us the string content is <code class="language-plaintext highlighter-rouge">880B</code> (<code class="language-plaintext highlighter-rouge">bytesize</code>) and that Ruby allocated a <code class="language-plaintext highlighter-rouge">40B</code> wide slot (<code class="language-plaintext highlighter-rouge">slot_size</code>),
hence the string content is stored in an external buffer for a total of <code class="language-plaintext highlighter-rouge">921B</code> (<code class="language-plaintext highlighter-rouge">memsize</code>).</p>

<p>Now, look what happens if we slice that string:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"json"</span>
<span class="nb">require</span> <span class="s2">"objspace"</span>

<span class="k">def</span> <span class="nf">dump</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
  <span class="no">JSON</span><span class="p">.</span><span class="nf">pretty_generate</span><span class="p">(</span><span class="no">JSON</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="n">obj</span><span class="p">)))</span>
<span class="k">end</span>

<span class="n">str</span> <span class="o">=</span> <span class="s2">"Hello World"</span> <span class="o">*</span> <span class="mi">80</span>
<span class="nb">puts</span> <span class="s2">"initial str: </span><span class="si">#{</span><span class="n">dump</span><span class="p">(</span><span class="n">str</span><span class="p">)</span><span class="si">}</span><span class="se">\n</span><span class="s2">"</span>

<span class="n">slice</span> <span class="o">=</span> <span class="n">str</span><span class="p">[</span><span class="mi">40</span><span class="o">..-</span><span class="mi">1</span><span class="p">]</span>

<span class="nb">puts</span> <span class="s2">"str after:</span><span class="se">\n</span><span class="si">#{</span><span class="n">dump</span><span class="p">(</span><span class="n">str</span><span class="p">)</span><span class="si">}</span><span class="se">\n</span><span class="s2">"</span>
<span class="nb">puts</span> <span class="s2">"slice:</span><span class="se">\n</span><span class="si">#{</span><span class="n">dump</span><span class="p">(</span><span class="n">slice</span><span class="p">)</span><span class="si">}</span><span class="se">\n</span><span class="s2">"</span>
</code></pre></div></div>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">str</span><span class="w"> </span><span class="err">after:</span><span class="w">
</span><span class="p">{</span><span class="w">
  </span><span class="nl">"address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0x105178e18"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"STRING"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"slot_size"</span><span class="p">:</span><span class="w"> </span><span class="mi">40</span><span class="p">,</span><span class="w">
  </span><span class="nl">"shared"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
  </span><span class="nl">"references"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="s2">"0x1051786c0"</span><span class="w"> </span><span class="p">],</span><span class="w">
  </span><span class="nl">"memsize"</span><span class="p">:</span><span class="w"> </span><span class="mi">40</span><span class="p">,</span><span class="w">
  </span><span class="err">...</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="err">slice:</span><span class="w">
</span><span class="p">{</span><span class="w">
  </span><span class="nl">"address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0x1051786e8"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"STRING"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"slot_size"</span><span class="p">:</span><span class="w"> </span><span class="mi">40</span><span class="p">,</span><span class="w">
  </span><span class="nl">"shared"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
  </span><span class="nl">"references"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="s2">"0x1051786c0"</span><span class="w"> </span><span class="p">],</span><span class="w">
  </span><span class="nl">"memsize"</span><span class="p">:</span><span class="w"> </span><span class="mi">40</span><span class="p">,</span><span class="w">
  </span><span class="err">...</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Now, both <code class="language-plaintext highlighter-rouge">str</code> and <code class="language-plaintext highlighter-rouge">slice</code> have the <code class="language-plaintext highlighter-rouge">shared: true</code> attribute, which indicates that they don’t actually own their content; they point inside another String object.
You can also see that both <code class="language-plaintext highlighter-rouge">str</code> and <code class="language-plaintext highlighter-rouge">slice</code> have a reference to the same object at address: <code class="language-plaintext highlighter-rouge">0x1051786c0</code>.
So even though it has mutable strings, Ruby is still able to optimize some operations using “string views” like languages with immutable strings.
However, since <code class="language-plaintext highlighter-rouge">str</code> is mutable, Ruby couldn’t directly create a string view that references <code class="language-plaintext highlighter-rouge">str</code>. It first had to transfer the buffer ownership to a third String object, which is immutable.
But if <code class="language-plaintext highlighter-rouge">str</code> was frozen, Ruby would have been able to directly create <code class="language-plaintext highlighter-rouge">slice</code> as a view inside <code class="language-plaintext highlighter-rouge">str</code>.</p>
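
<p>If you want to check the frozen case on your own Ruby build, a variation of the earlier script could look like the sketch below. The exact attributes reported by <code class="language-plaintext highlighter-rouge">ObjectSpace.dump</code> depend on the Ruby version, so this only prints the relevant fields and makes no claim about their values.</p>

```ruby
require "json"
require "objspace"

# Parse the JSON document produced by ObjectSpace.dump into a Hash.
def dump(obj)
  JSON.parse(ObjectSpace.dump(obj))
end

# Same string as before, but frozen before it is sliced.
str = ("Hello World" * 80).freeze
slice = str[40..-1]

str_info = dump(str)
slice_info = dump(slice)

puts "str address:      #{str_info["address"]}"
puts "str shared:       #{str_info["shared"].inspect}"
puts "slice shared:     #{slice_info["shared"].inspect}"
puts "slice references: #{slice_info["references"].inspect}"
```

<p>Comparing the printed <code class="language-plaintext highlighter-rouge">references</code> of <code class="language-plaintext highlighter-rouge">slice</code> with the address of <code class="language-plaintext highlighter-rouge">str</code> tells you whether the view points directly inside the frozen string.</p>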

<p>Similarly, when I was listing some of the pros and cons of mutable strings, I mentioned how mutable strings are a problem when used as hash table keys.
Perhaps you’ve never noticed it, but to avoid this problem, Ruby automatically freezes string keys in Hash:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="n">str</span> <span class="o">=</span> <span class="s2">"test"</span>
<span class="o">=&gt;</span> <span class="s2">"test"</span>
<span class="o">&gt;&gt;</span> <span class="n">str</span><span class="p">.</span><span class="nf">frozen?</span>
<span class="o">=&gt;</span> <span class="kp">false</span>
<span class="o">&gt;&gt;</span> <span class="nb">hash</span> <span class="o">=</span> <span class="p">{</span> <span class="n">str</span> <span class="o">=&gt;</span> <span class="mi">1</span> <span class="p">}</span>
<span class="o">=&gt;</span> <span class="p">{</span><span class="s2">"test"</span> <span class="o">=&gt;</span> <span class="mi">1</span><span class="p">}</span>
<span class="o">&gt;&gt;</span> <span class="nb">hash</span><span class="p">.</span><span class="nf">keys</span><span class="p">.</span><span class="nf">first</span>
<span class="o">=&gt;</span> <span class="s2">"test"</span>
<span class="o">&gt;&gt;</span> <span class="nb">hash</span><span class="p">.</span><span class="nf">keys</span><span class="p">.</span><span class="nf">first</span><span class="p">.</span><span class="nf">frozen?</span>
<span class="o">=&gt;</span> <span class="kp">true</span>
<span class="o">&gt;&gt;</span> <span class="p">[</span><span class="n">str</span><span class="p">.</span><span class="nf">object_id</span><span class="p">,</span> <span class="nb">hash</span><span class="p">.</span><span class="nf">keys</span><span class="p">.</span><span class="nf">first</span><span class="p">.</span><span class="nf">object_id</span><span class="p">]</span>
<span class="o">=&gt;</span> <span class="p">[</span><span class="mi">16</span><span class="p">,</span> <span class="mi">24</span><span class="p">]</span>
</code></pre></div></div>

<p>As you can see, here Ruby couldn’t directly use the <code class="language-plaintext highlighter-rouge">str</code> string as a Hash key, it first had to make a frozen copy of it.
Here, too, if <code class="language-plaintext highlighter-rouge">str</code> was frozen, Ruby could have saved the extra work of duplicating this string.</p>
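
<p>Both cases can be compared side by side. This is a small sketch assuming CRuby’s behavior for plain String keys in a regular Hash; the key names are only illustrative.</p>

```ruby
mutable_key = String.new("mutable")
frozen_key = String.new("frozen").freeze

hash = {}
hash[mutable_key] = 1
hash[frozen_key] = 2

stored_mutable, stored_frozen = hash.keys

p stored_mutable.equal?(mutable_key)  # false: Ruby stored a frozen copy
p stored_mutable.frozen?              # true
p mutable_key.frozen?                 # false: the original is left untouched
p stored_frozen.equal?(frozen_key)    # true: already frozen, stored as is
```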

<p>I believe that illustrates the common tradeoffs at play with mutable strings.
On one hand, they can be much more efficient, allowing for in-place modifications, but on the other hand, they impose extra allocations and copying to protect yourself from mutations.</p>

<h2 id="the-history-of-frozen-string-literal">The History Of Frozen String Literal</h2>

<p>To avoid this extra copying overhead, it used to be a fairly common optimization technique to store string literals in constants.
For instance, you can see this idiom in <a href="https://github.com/rack/rack/commit/8b8690bcb7762cde729088c2abdacb610ebea1f7">a 17-year-old patch to rack</a>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">Rack</span>
  <span class="k">class</span> <span class="nc">MethodOverride</span>
    <span class="no">METHOD_OVERRIDE_PARAM_KEY</span> <span class="o">=</span> <span class="s2">"_method"</span><span class="p">.</span><span class="nf">freeze</span>
    <span class="no">HTTP_METHOD_OVERRIDE_HEADER</span> <span class="o">=</span> <span class="s2">"HTTP_X_HTTP_METHOD_OVERRIDE"</span><span class="p">.</span><span class="nf">freeze</span>

    <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
      <span class="c1"># ...</span>
      <span class="nb">method</span> <span class="o">=</span> <span class="n">req</span><span class="o">.</span><span class="no">POST</span><span class="p">[</span><span class="no">METHOD_OVERRIDE_PARAM_KEY</span><span class="p">]</span> <span class="o">||</span>
        <span class="n">env</span><span class="p">[</span><span class="no">HTTP_METHOD_OVERRIDE_HEADER</span><span class="p">]</span>
      <span class="c1"># ...</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>It’s this pattern that led <a href="https://github.com/haileys">Hailey Somerville</a> from GitHub to open <a href="https://bugs.ruby-lang.org/issues/8579">a feature request to propose a new syntax for frozen string literals</a>: <code class="language-plaintext highlighter-rouge">%f</code>.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">req</span><span class="o">.</span><span class="no">POST</span><span class="p">[</span><span class="o">%</span><span class="n">f</span><span class="p">(</span><span class="n">_method</span><span class="p">)]</span> <span class="o">||</span> <span class="n">env</span><span class="p">[</span><span class="o">%</span><span class="n">f</span><span class="p">(</span><span class="no">HTTP_X_HTTP_METHOD_OVERRIDE</span><span class="p">)]</span>
</code></pre></div></div>

<p>This syntax wasn’t accepted, but as a counter proposal, <a href="https://github.com/mame">Yusuke Endoh (mame)</a> suggested an “f suffix”:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">req</span><span class="o">.</span><span class="no">POST</span><span class="p">[</span><span class="s2">"_method"</span><span class="n">f</span><span class="p">]</span> <span class="o">||</span> <span class="n">env</span><span class="p">[</span><span class="s2">"HTTP_X_HTTP_METHOD_OVERRIDE"</span><span class="n">f</span><span class="p">]</span>
</code></pre></div></div>

<p>This one was accepted and implemented in Ruby <code class="language-plaintext highlighter-rouge">2.1.0dev</code>.</p>

<p>However, many core developers didn’t like this new syntax, so even after its implementation, multiple counterproposals were made.
Notably, <a href="https://bugs.ruby-lang.org/issues/8976">Akira Tanaka (akr) proposed a file-based directive</a>: <code class="language-plaintext highlighter-rouge"># freeze_string: true</code>, but it didn’t catch on.</p>

<p>However, before the final 2.1.0 release, <a href="https://github.com/headius">Charles Nutter</a> <a href="https://bugs.ruby-lang.org/issues/8992">opened another feature request</a>,
and suggested instead implementing a compiler optimization for <code class="language-plaintext highlighter-rouge">String#freeze</code>, so as to provide the same feature but without introducing a new syntax.</p>

<p>If you aren’t familiar with how the Ruby virtual machine works, or virtual machines in general, you may be surprised to hear that Ruby has a compiler, but it absolutely does.</p>

<p>Prior to Ruby 2.1, the program <code class="language-plaintext highlighter-rouge">"Hello World".freeze</code> would be compiled by Ruby into a sequence of two instructions:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">RubyVM</span><span class="o">::</span><span class="no">InstructionSequence</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="sx">%{"Hello World".freeze}</span><span class="p">).</span><span class="nf">disasm</span>
<span class="o">==</span> <span class="ss">disasm: </span><span class="c1">#&lt;ISeq:&lt;compiled&gt;@&lt;compiled&gt;:1 (1,0)-(1,19)&gt;</span>
<span class="mo">0000</span> <span class="n">putstring</span>                              <span class="s2">"Hello World"</span>             <span class="p">(</span>   <span class="mi">1</span><span class="p">)[</span><span class="no">Li</span><span class="p">]</span>
<span class="mo">0002</span> <span class="n">opt_send_without_block</span>                 <span class="o">&lt;</span><span class="n">calldata!mid</span><span class="ss">:freeze</span><span class="p">,</span> <span class="n">argc</span><span class="p">:</span><span class="mi">0</span><span class="p">,</span> <span class="no">ARGS_SIMPLE</span><span class="o">&gt;</span>
<span class="mo">0004</span> <span class="n">leave</span>
</code></pre></div></div>

<p>First, a <code class="language-plaintext highlighter-rouge">putstring</code> instruction to put <code class="language-plaintext highlighter-rouge">"Hello World"</code> on the VM stack, followed by an <code class="language-plaintext highlighter-rouge">opt_send_without_block</code> to call the <code class="language-plaintext highlighter-rouge">#freeze</code> method on it.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">putstring</span><span class="p">(</span><span class="n">frozen_string</span><span class="p">)</span>
  <span class="vi">@stack</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="n">frozen_string</span><span class="p">.</span><span class="nf">dup</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>When invoked, the instruction receives a reference to a frozen String object that has been created by the Ruby compiler.
But since, semantically, the string that <code class="language-plaintext highlighter-rouge">#freeze</code> is called on must be mutable, the instruction has to duplicate it, and it’s the mutable copy that is put on the stack.</p>
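
<p>That duplication is observable from plain Ruby. The sketch below assumes the file has no <code class="language-plaintext highlighter-rouge"># frozen_string_literal: true</code> magic comment, and <code class="language-plaintext highlighter-rouge">greeting</code> is just a hypothetical helper that evaluates the same literal twice.</p>

```ruby
# Every call evaluates the same string literal again.
def greeting
  "Hello World"
end

first = greeting
second = greeting

p first == second       # true: same content
p first.equal?(second)  # false: each evaluation got its own copy
```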

<p>In my opinion, the <code class="language-plaintext highlighter-rouge">putstring</code> instruction isn’t correctly named, because its name suggests it just puts the frozen string directly on the stack.
This isn’t consistent with other <code class="language-plaintext highlighter-rouge">put*</code> instructions like <code class="language-plaintext highlighter-rouge">putobject</code>, which directly puts an object on the stack without duping it:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">putobject</span><span class="p">(</span><span class="n">object</span><span class="p">)</span>
  <span class="vi">@stack</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="n">object</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>But also inconsistent with some other instructions like <code class="language-plaintext highlighter-rouge">duparray</code> and <code class="language-plaintext highlighter-rouge">duphash</code>, which actually behave like <code class="language-plaintext highlighter-rouge">putstring</code> does:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">duparray</span><span class="p">(</span><span class="n">array</span><span class="p">)</span>
  <span class="vi">@stack</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="n">array</span><span class="p">.</span><span class="nf">dup</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>So it would be much clearer if it had been named <code class="language-plaintext highlighter-rouge">dupstring</code> instead of <code class="language-plaintext highlighter-rouge">putstring</code>.</p>

<p>But anyways, Charles’ suggestion was to have the compiler generate a different set of VM instructions when the <code class="language-plaintext highlighter-rouge">#freeze</code> method is called
on a string literal:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">RubyVM</span><span class="o">::</span><span class="no">InstructionSequence</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="sx">%{"Hello World".freeze}</span><span class="p">).</span><span class="nf">disasm</span>
<span class="o">==</span> <span class="ss">disasm: </span><span class="c1">#&lt;ISeq:&lt;compiled&gt;@&lt;compiled&gt;:1 (1,0)-(1,20)&gt;</span>
<span class="mo">0000</span> <span class="n">opt_str_freeze</span>                         <span class="s2">"Hello World"</span><span class="p">,</span> <span class="o">&lt;</span><span class="n">calldata!mid</span><span class="ss">:freeze</span><span class="p">,</span> <span class="n">argc</span><span class="p">:</span><span class="mi">0</span><span class="p">,</span> <span class="no">ARGS_SIMPLE</span><span class="o">&gt;</span><span class="p">(</span>   <span class="mi">1</span><span class="p">)[</span><span class="no">Li</span><span class="p">]</span>
<span class="mo">0003</span> <span class="n">leave</span>
</code></pre></div></div>

<p>As you can see, on more recent rubies, the <code class="language-plaintext highlighter-rouge">putstring</code> and <code class="language-plaintext highlighter-rouge">opt_send_without_block</code> instructions have been replaced by a single <code class="language-plaintext highlighter-rouge">opt_str_freeze</code>.
Its implementation in pseudo-Ruby would be something like:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">opt_str_freeze</span><span class="p">(</span><span class="n">frozen_string</span><span class="p">)</span>
  <span class="k">if</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">string_freeze_was_redefined?</span>
    <span class="vi">@stack</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="n">frozen_string</span><span class="p">.</span><span class="nf">dup</span><span class="p">.</span><span class="nf">freeze</span><span class="p">)</span>
  <span class="k">else</span>
    <span class="vi">@stack</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="n">frozen_string</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>As you can see, to not break semantics, the instruction has to check that <code class="language-plaintext highlighter-rouge">String#freeze</code> hasn’t been redefined, but apart from that cheap precondition, the instruction does strictly less work than before.</p>

<p>This is the feature Ruby 2.1.0 ultimately shipped with in December 2013.</p>
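<p>The effect of <code class="language-plaintext highlighter-rouge">opt_str_freeze</code> can be observed from plain Ruby. The following is a minimal sketch, assuming CRuby (the <code class="language-plaintext highlighter-rouge">RubyVM</code> API is implementation-specific) and that <code class="language-plaintext highlighter-rouge">String#freeze</code> hasn’t been redefined. The magic comment is explicitly set to <code class="language-plaintext highlighter-rouge">false</code> so that only the <code class="language-plaintext highlighter-rouge">.freeze</code> call is responsible for the result:</p>

```ruby
# Compile a snippet separately so its magic comment is under our control.
source = <<~RUBY
  # frozen_string_literal: false
  -> { "Hello World".freeze }
RUBY

frozen_literal = RubyVM::InstructionSequence.compile(source).eval

first = frozen_literal.call
second = frozen_literal.call

# When opt_str_freeze applies, both calls return the string created by the
# compiler, so no copy is made per call.
same_object = first.equal?(second)

puts "frozen: #{first.frozen?}, same object: #{same_object}"
```

<p>If both calls return the very same object, no per-call copy was made, which is the whole point of the optimization.</p>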

<h2 id="further-optimizations">Further Optimizations</h2>

<p>To further reduce string allocations, in 2014, Aman Karmani (tmm1) and Hailey Somerville (haileys) from GitHub submitted <a href="https://bugs.ruby-lang.org/issues/9382">a patch to add two more optimized instructions, <code class="language-plaintext highlighter-rouge">opt_aref_with</code> and <code class="language-plaintext highlighter-rouge">opt_aset_with</code></a>.</p>

<p>Before their patch, accessing a hash with a string key would cause a string allocation:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">RubyVM</span><span class="o">::</span><span class="no">InstructionSequence</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="sx">%{some_hash["str"]}</span><span class="p">).</span><span class="nf">disasm</span>
<span class="o">...</span>
<span class="mo">0003</span> <span class="n">putstring</span>                              <span class="s2">"str"</span>
<span class="mo">0005</span> <span class="n">opt_aref</span>                               <span class="o">&lt;</span><span class="n">calldata!mid</span><span class="ss">:[]</span><span class="p">,</span> <span class="n">argc</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="no">ARGS_SIMPLE</span><span class="o">&gt;</span><span class="p">[</span><span class="no">CcCr</span><span class="p">]</span>
<span class="mo">0007</span> <span class="n">leave</span>
</code></pre></div></div>

<p>After the patch, these two instructions were replaced by a single <code class="language-plaintext highlighter-rouge">opt_aref_with</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">RubyVM</span><span class="o">::</span><span class="no">InstructionSequence</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="sx">%{some_hash["str"]}</span><span class="p">).</span><span class="nf">disasm</span>
<span class="o">...</span>
<span class="mo">0003</span> <span class="n">opt_aref_with</span>                          <span class="s2">"str"</span><span class="p">,</span> <span class="o">&lt;</span><span class="n">calldata!mid</span><span class="ss">:[]</span><span class="p">,</span> <span class="n">argc</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="no">ARGS_SIMPLE</span><span class="o">&gt;</span>
<span class="mo">0006</span> <span class="n">leave</span>
</code></pre></div></div>

<p>Similar to <code class="language-plaintext highlighter-rouge">opt_str_freeze</code>, these instructions would check if the method is being called on a Hash, and if <code class="language-plaintext highlighter-rouge">Hash#[]</code> hadn’t been redefined.
When both conditions are true, the instruction would be able to look up in the hash without first copying the string.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">opt_aref_with</span><span class="p">(</span><span class="n">frozen_string</span><span class="p">)</span>
  <span class="k">if</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">hash_aref_was_redefined?</span> <span class="o">||</span> <span class="o">!</span><span class="vi">@stack</span><span class="p">.</span><span class="nf">last</span><span class="p">.</span><span class="nf">is_a?</span><span class="p">(</span><span class="no">Hash</span><span class="p">)</span>
    <span class="c1"># fallback</span>
    <span class="vi">@stack</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="n">frozen_string</span><span class="p">.</span><span class="nf">dup</span><span class="p">)</span>
    <span class="n">value</span> <span class="o">=</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">call_method</span><span class="p">(</span><span class="ss">:[]</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="vi">@stack</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
  <span class="k">else</span>
    <span class="c1"># fast path</span>
    <span class="nb">hash</span> <span class="o">=</span> <span class="vi">@stack</span><span class="p">.</span><span class="nf">pop</span>
    <span class="n">value</span> <span class="o">=</span> <span class="nb">hash</span><span class="p">[</span><span class="n">frozen_string</span><span class="p">]</span>
    <span class="vi">@stack</span><span class="p">.</span><span class="nf">push</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>According to Aman Karmani, this reduced allocations in GitHub by 3%, which is quite massive for what is a relatively small patch.</p>

<p>As a sidenote, this optimized instruction has <a href="https://bugs.ruby-lang.org/issues/21553">just been removed by Aaron Patterson</a> on the Ruby trunk, because given most performance-sensitive code already uses the magic comment, this optimization no longer yields much benefit.</p>

<h2 id="ruby-30-and-frozen-string-literals">Ruby 3.0 And Frozen String Literals</h2>

<p>Perhaps in part because of that new feature, or perhaps for other reasons,
knowledge of the performance impact of all these useless string duplications in Ruby applications started to spread around 2014,
and some community members, notably Richard Schneeman, started to submit <a href="https://github.com/rails/rails/pull/21057">pull requests in Rails</a>,
<a href="https://github.com/rack/rack/pull/737">rack</a> and a bunch of other gems, with some pretty significant results, such as an 11.9% latency reduction on <a href="https://www.codetriage.com/">codetriage.com</a>.</p>

<p>These performance gains were generally too good to pass up, but regardless, many people felt that the resulting code was much uglier.
So the question of freezing strings by default came back regularly, but was always rejected.</p>

<p>Until <a href="https://github.com/ruby/dev-meeting-log/blob/master/2015/DevMeeting-2015-08-20.md#magic-comment-for-frozen-string-literal-by-default">Akira Matsuda (amatsuda) brought the issue up again at the Ruby core developer meeting in August 2015</a>,
and there <a href="https://xcancel.com/yukihiro_matz/status/634386185507311616">Matz decided that Ruby string literals would be frozen in Ruby 3.0</a>.</p>

<p>A number of other features to ease the transition were also decided.
First, the <code class="language-plaintext highlighter-rouge"># frozen_string_literal: true</code> magic comment was introduced to help gems prepare for Ruby 3.0.</p>

<p>Then, to ensure that any code that wouldn’t have been made compatible with Ruby 3.0 would remain usable, two Ruby command line options were added: <code class="language-plaintext highlighter-rouge">--enable-frozen-string-literal</code> and <code class="language-plaintext highlighter-rouge">--disable-frozen-string-literal</code>.</p>

<p>This way, once Ruby 3.0 was released, if your code or one of your dependencies wasn’t compatible yet, you could just set
<code class="language-plaintext highlighter-rouge">RUBYOPT="--disable-frozen-string-literal"</code> and keep going.</p>

<p>A <code class="language-plaintext highlighter-rouge">--debug-frozen-string-literal</code> command line option was also added, to help developers track down where a frozen string literal was created.</p>

<p>All these new features were released with Ruby 2.3 in December 2015.</p>

<p>What happens when you run Ruby with <code class="language-plaintext highlighter-rouge">--enable-frozen-string-literal</code> or with the <code class="language-plaintext highlighter-rouge"># frozen_string_literal: true</code> magic comment is that the compiler generates different bytecode:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">RubyVM</span><span class="o">::</span><span class="no">InstructionSequence</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="sx">%{# frozen_string_literal: true</span><span class="se">\n</span><span class="sx">"Hello World"}</span><span class="p">).</span><span class="nf">disasm</span>
<span class="o">==</span> <span class="ss">disasm: </span><span class="c1">#&lt;ISeq:&lt;compiled&gt;@&lt;compiled&gt;:2 (2,0)-(2,13)&gt;</span>
<span class="mo">0000</span> <span class="n">putobject</span>                              <span class="s2">"Hello World"</span>             <span class="p">(</span>   <span class="mi">2</span><span class="p">)[</span><span class="no">Li</span><span class="p">]</span>
<span class="mo">0002</span> <span class="n">leave</span>
</code></pre></div></div>

<p>Now, instead of the <code class="language-plaintext highlighter-rouge">putstring</code> instruction, the compiler generates a <code class="language-plaintext highlighter-rouge">putobject</code> instruction.
As I mentioned above, this instruction directly puts the frozen string that was created during compilation on the stack, with no extra duplication.</p>

<p>So it’s important to understand that frozen string literals are strictly less work for Ruby than mutable string literals.</p>
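<p>The difference between <code class="language-plaintext highlighter-rouge">putstring</code> and <code class="language-plaintext highlighter-rouge">putobject</code> can also be seen without reading bytecode, by checking object identity. This is a rough sketch, assuming CRuby; <code class="language-plaintext highlighter-rouge">compile_literal</code> is a helper name made up for the example:</p>

```ruby
# Hypothetical helper: compiles the same lambda with the magic comment
# set to either true or false, and returns the resulting lambda.
def compile_literal(frozen:)
  source = <<~RUBY
    # frozen_string_literal: #{frozen}
    -> { "Hello World" }
  RUBY
  RubyVM::InstructionSequence.compile(source).eval
end

mutable = compile_literal(frozen: false)
frozen = compile_literal(frozen: true)

# Mutable literal: every evaluation produces a fresh copy.
mutable_same = mutable.call.equal?(mutable.call)
# Frozen literal: every evaluation returns the compiler's string.
frozen_same = frozen.call.equal?(frozen.call)

puts "mutable literal reused: #{mutable_same}"
puts "frozen literal reused:  #{frozen_same}"
```

<p>The mutable version hands out a new object on every evaluation, while the frozen version keeps returning the one string the compiler created.</p>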

<h2 id="community-usage">Community Usage</h2>

<p>Following the release of Ruby 2.3, the Rubocop project added <a href="https://github.com/rubocop/rubocop/pull/2542/commits/425b7469f109f2eae0648b600aa3ad24e85f6e21">a new cop to enforce the use of the <code class="language-plaintext highlighter-rouge"># frozen_string_literal: true</code> comment</a>,
with the intent of helping projects be ready for Ruby 3.0 in the future.</p>

<p>Over the following years, many projects migrated to frozen string literals, <a href="https://github.com/rails/rails/pull/29506">including Rails</a> and <a href="https://github.com/ruby/rake/pull/209">rake</a> in 2017, <a href="https://github.com/rack/rack/pull/1250">Rack in 2018</a>,
and of course a long tail of other projects.</p>

<p>It’s always hard to say with certainty how much a feature is used, but I think it’s safe to say that, aside from a few projects that deliberately chose not to follow suit, a large majority of the actively developed gems did migrate to frozen string literals.
However, many of the more stable and less actively developed gems didn’t.</p>

<p>There was no indication of when Ruby 3.0 would be released, and the lack of compatibility with it wasn’t advertised by warnings or any other methods, hence, few people even knew whether any of their dependencies needed to be updated.</p>

<p>Over time, the magic comment slowly became an incantation most Rubyists follow, in big part because of rubocop, but as far as I know, basically no one was trying to run their application with <code class="language-plaintext highlighter-rouge">--enable-frozen-string-literal</code>, and few even knew about it.</p>

<h2 id="abandoned-plan">Abandoned Plan</h2>

<p>However, <a href="https://bugs.ruby-lang.org/issues/11473#note-53">in October 2019, just before the release of Ruby 2.7, Matz abandoned the plan to make frozen string literal the default for Ruby 3.0</a>.</p>

<blockquote>
  <p>I consider this for years. I REALLY like the idea but I am sure introducing this could cause HUGE compatibility issue, even bigger than Ruby 1.9.
So I officially abandon making frozen-string-literals default (for Ruby3).</p>

  <p>–
Matz</p>
</blockquote>

<p>I must say this decision did surprise me at the time.
I definitely understand not wanting to cause a Python 3 sort of moment, but I don’t think frozen string literals would have caused it,
because ultimately you could always have set <code class="language-plaintext highlighter-rouge">RUBYOPT="--disable-frozen-string-literal"</code> and kept running your applications unchanged if necessary.</p>

<p>I’m pretty sure if Python 3 had a way of running Python 2 code, the migration would have been much less of a big deal.</p>

<p>It was even more surprising to me because Ruby 2.7 also introduced new deprecation warnings in preparation for the keyword argument change in Ruby 3.0, and from my point of view, this breaking change was way bigger than frozen string literals would ever have been.
It caused so many deprecations that a <a href="https://www.ruby-lang.org/en/news/2020/10/02/ruby-2-7-2-released/">Ruby 2.7.2 was later released specifically to turn deprecation warnings off</a>.
And arguably, updating code to support the new keyword argument logic was way more involved than for frozen string literals.
If you have a look at <a href="https://www.ruby-lang.org/en/news/2019/12/12/separation-of-positional-and-keyword-arguments-in-ruby-3-0/">the migration guide</a>, it’s fairly long and complex,
whereas frozen string literals only need a few strategically placed <code class="language-plaintext highlighter-rouge">.dup</code> here and there.</p>

<p>As a datapoint, I personally handled the migration of Shopify’s monolith and roughly 700 gem dependencies for both the Ruby 3.0 keyword arguments and for <code class="language-plaintext highlighter-rouge">--enable-frozen-string-literal</code>.
For keyword arguments, I had to send pull requests to almost a hundred gems, as well as change a lot of code in the monolith itself, and some of them were really non-trivial to fix.
For frozen string literals, I only had to send pull requests to 12 gems, and it was just a matter of adding a few <code class="language-plaintext highlighter-rouge">.dup</code> calls.</p>
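<p>For illustration, the kind of fix mentioned above typically looks like the following sketch. The method and constant names are made up, and the explicit <code class="language-plaintext highlighter-rouge">.freeze</code> stands in for a literal frozen by the magic comment:</p>

```ruby
SEPARATOR = ", ".freeze # stands in for a literal frozen by the magic comment

def join_names(names)
  # Code that starts from `result = ""` and appends with `<<` raises
  # FrozenError once literals are frozen. An explicit mutable copy fixes it.
  result = "".dup # `+""` and `String.new` are common alternatives
  names.each_with_index do |name, index|
    result << SEPARATOR unless index.zero?
    result << name
  end
  result
end

puts join_names(%w[Matz Charles Aman])
```

<p>Only the string used as a buffer needs to be mutable, which is why so few call sites usually have to change.</p>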

<p>But anyway, by the time of the Ruby 3.0 release, it had been almost 5 years since the initial plan had been laid out, and most of the performance-sensitive code had migrated to use the magic comment, so this abandonment didn’t spark much discussion, and few people noticed.</p>

<h2 id="new-standards">New Standards</h2>

<p>Until four years later, in January 2024, I started hearing about <code class="language-plaintext highlighter-rouge">standardrb</code> and how <a href="https://github.com/standardrb/standard/pull/181">it doesn’t enforce the presence of the frozen string literal magic comment</a>.
I also saw a few projects starting to remove them, or new projects deliberately not adding them, because this extra comment at the top is seen as cruft.</p>

<p>And I must say I agree.
I hate that comment.</p>

<p>Back when I started with Ruby, in version 1.8, the default encoding of source files was ASCII, so we frequently had to add a magic comment
at the top of the file to tell Ruby they were encoded in UTF-8.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># encoding: utf-8</span>
</code></pre></div></div>

<p>I hated that comment back then, because what I always loved about Ruby is that the source code is almost entirely free of boilerplate.
So when Ruby 2.0 made UTF-8 the default encoding, and we could finally get rid of all this cruft, it made me extremely happy.</p>

<p>I would love to do the same with the frozen string literal comment, but once you are aware of all these useless allocations and copies, it’s really hard to unsee.
I’m now familiar enough with the VM that when I look at code without the magic comment, I pretty much visualize the implicit <code class="language-plaintext highlighter-rouge">dup</code> calls.</p>

<p>When I look at code like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">env</span><span class="p">[</span><span class="s2">"HTTPS"</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"on"</span> <span class="p">?</span> <span class="s2">"https"</span> <span class="p">:</span> <span class="s2">"http"</span>
</code></pre></div></div>

<p>I can’t help but see this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">env</span><span class="p">[</span><span class="s2">"HTTPS"</span><span class="p">.</span><span class="nf">dup</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"on"</span><span class="p">.</span><span class="nf">dup</span> <span class="p">?</span> <span class="s2">"https"</span><span class="p">.</span><span class="nf">dup</span> <span class="p">:</span> <span class="s2">"http"</span><span class="p">.</span><span class="nf">dup</span>
</code></pre></div></div>

<p>Which drives me nuts.
And yes, these are small strings, and the GC got faster in the last few years, but still, string literals are everywhere, so these allocations add up and cause a death by a thousand cuts.</p>

<p>So seeing that the community was slowly unlearning this lesson pained me, and I decided I’d try to revive the initiative.</p>
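<p>The hidden copies in the ternary example above can be counted. This is a rough sketch, assuming CRuby and relying on <code class="language-plaintext highlighter-rouge">GC.stat(:total_allocated_objects)</code>; the helper names are made up, and the exact numbers will vary between Ruby versions:</p>

```ruby
# Hypothetical helper: compiles the same expression with and without
# frozen string literals.
def compile_scheme_check(frozen:)
  source = <<~RUBY
    # frozen_string_literal: #{frozen}
    ->(env) { env["HTTPS"] == "on" ? "https" : "http" }
  RUBY
  RubyVM::InstructionSequence.compile(source).eval
end

# Hypothetical helper: counts objects allocated while calling the lambda.
def allocations_for(callable, env, iterations: 1_000)
  callable.call(env) # warm up so one-time caches don't skew the count
  before = GC.stat(:total_allocated_objects)
  iterations.times { callable.call(env) }
  GC.stat(:total_allocated_objects) - before
end

env = { "HTTPS" => "on" }
mutable_allocations = allocations_for(compile_scheme_check(frozen: false), env)
frozen_allocations = allocations_for(compile_scheme_check(frozen: true), env)

puts "mutable literals: #{mutable_allocations} objects allocated"
puts "frozen literals:  #{frozen_allocations} objects allocated"
```

<p>The mutable version should report allocations on every call, since at the very least the returned string has to be copied each time.</p>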

<h2 id="chilled-string-literals">Chilled String Literals</h2>

<p>In my opinion, what the initial plan lacked was a proper deprecation path.
Many Ruby users had heard the default would change with Ruby 3.0, but Ruby itself never emitted any deprecation to warn users that code would need to be updated, so very little work happened to prepare for it.</p>

<p>Hence, if I wanted to convince Matz to try again, I needed to come up with a way to emit useful deprecation warnings whenever some code would mutate a literal string.
That’s where I came up with <a href="https://bugs.ruby-lang.org/issues/20205">the concept of <em>chilled strings</em></a>.</p>

<p>Starting from Ruby 3.4, when a source file has no <code class="language-plaintext highlighter-rouge">frozen_string_literal</code> comment (either <code class="language-plaintext highlighter-rouge">true</code> or <code class="language-plaintext highlighter-rouge">false</code>), instead of generating <code class="language-plaintext highlighter-rouge">putstring</code> instructions, the compiler now generates <code class="language-plaintext highlighter-rouge">putchilledstring</code> instructions:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">RubyVM</span><span class="o">::</span><span class="no">InstructionSequence</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="sx">%{puts "Hello World"}</span><span class="p">).</span><span class="nf">disasm</span>
<span class="o">==</span> <span class="ss">disasm: </span><span class="c1">#&lt;ISeq:&lt;compiled&gt;@&lt;compiled&gt;:1 (1,0)-(1,18)&gt;</span>
<span class="mo">0000</span> <span class="n">putself</span>                                                          <span class="p">(</span>   <span class="mi">1</span><span class="p">)[</span><span class="no">Li</span><span class="p">]</span>
<span class="mo">0001</span> <span class="n">putchilledstring</span>                       <span class="s2">"Hello World"</span>
<span class="mo">0003</span> <span class="n">opt_send_without_block</span>                 <span class="o">&lt;</span><span class="n">calldata!mid</span><span class="ss">:puts</span><span class="p">,</span> <span class="n">argc</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span> <span class="no">FCALL</span><span class="o">|</span><span class="no">ARGS_SIMPLE</span><span class="o">&gt;</span>
<span class="mo">0005</span> <span class="n">leave</span>
</code></pre></div></div>

<p>This new instruction is identical to <code class="language-plaintext highlighter-rouge">putstring</code>, except it additionally marks the newly allocated string with the <code class="language-plaintext highlighter-rouge">STR_CHILLED</code> flag.
Then I modified the <code class="language-plaintext highlighter-rouge">rb_check_frozen</code> function, which is responsible for raising <code class="language-plaintext highlighter-rouge">FrozenError</code> when a frozen object is mutated, to also check for that flag.
When a chilled string is mutated, a deprecation warning is emitted, and the flag is removed so that only the very first mutation emits a warning:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="no">Warning</span><span class="p">[</span><span class="ss">:deprecated</span><span class="p">]</span> <span class="o">=</span> <span class="kp">true</span>
<span class="o">=&gt;</span> <span class="kp">true</span>
<span class="o">&gt;&gt;</span> <span class="s2">"test"</span> <span class="o">&lt;&lt;</span> <span class="s2">"a"</span> <span class="o">&lt;&lt;</span> <span class="s2">"b"</span>
<span class="p">(</span><span class="n">irb</span><span class="p">):</span><span class="mi">3</span><span class="p">:</span> <span class="ss">warning: </span><span class="n">literal</span> <span class="n">string</span> <span class="n">will</span> <span class="n">be</span> <span class="n">frozen</span> <span class="k">in</span> <span class="n">the</span> <span class="n">future</span> <span class="p">(</span><span class="n">run</span> <span class="n">with</span> <span class="o">--</span><span class="n">debug</span><span class="o">-</span><span class="n">frozen</span><span class="o">-</span><span class="n">string</span><span class="o">-</span><span class="n">literal</span> <span class="k">for</span> <span class="n">more</span> <span class="n">information</span><span class="p">)</span>
<span class="o">=&gt;</span> <span class="s2">"testab"</span>
</code></pre></div></div>

<p>The migration plan is that in a yet to be defined future version, these deprecation warnings would be visible by default, and then in a further version, frozen string literals would become the default.</p>
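<p>To find these mutations before the default changes, one option is to collect deprecation warnings in a test suite. The following is only a sketch: the <code class="language-plaintext highlighter-rouge">DeprecationCollector</code> module is made up for the example, and it assumes CRuby 3.4 or later for the chilled string warning to be emitted (older versions simply won’t warn):</p>

```ruby
# Hypothetical module: collects deprecation warnings instead of printing
# them, so a test suite could fail when a literal string is mutated.
module DeprecationCollector
  MESSAGES = []

  def warn(message, category: nil, **options)
    if category == :deprecated
      MESSAGES << message
    else
      super
    end
  end
end

Warning.singleton_class.prepend(DeprecationCollector)
Warning[:deprecated] = true

# Compiled separately, with no magic comment, so that the literal is
# chilled on Ruby 3.4+.
result = RubyVM::InstructionSequence.compile(%{"test" << "a" << "b"}).eval

puts result
puts "deprecation warnings captured: #{DeprecationCollector::MESSAGES.size}"
```

<p>Since the chilled flag is removed on the first mutation, at most one warning is expected here, even though the string is mutated twice.</p>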

<h2 id="measuring-the-performance-impact">Measuring The Performance Impact</h2>

<p>Just like in the previous discussions back in 2014, <a href="https://github.com/mame">Yusuke Endoh (mame)</a> objected to the change, arguing that the performance benefits of frozen string literals were never properly measured because back in 2014, lots of code wasn’t compatible so it wasn’t possible to measure.</p>

<blockquote>
  <p>how much would the performance degrade if we removed <code class="language-plaintext highlighter-rouge"># frozen_string_literal: true</code> from all code used in yjit-bench?</p>
</blockquote>

<p>So I went ahead and built a modified Ruby interpreter on which the magic comment had no effect, and <a href="https://bugs.ruby-lang.org/issues/20205#note-34">benchmarked it against mainline Ruby</a>.</p>

<p>The results were that frozen string literals make Lobsters, an open source discussion board in Rails, 8-9% faster.
It also made <code class="language-plaintext highlighter-rouge">railsbench</code>, a synthetic Rails application, 4-6% faster, and <code class="language-plaintext highlighter-rouge">liquid-render</code> 11% faster.</p>

<p>And one thing to note is that the benchmarked codebase and its dependencies, like Rack, still contain lots of code that was hand-optimized from the pre-frozen string literal days to avoid allocations.
So the difference would certainly be larger if mutable string literals weren’t already worked around.</p>

<p>Similarly, back then I was surprised to only see a meager 1-2% gain on the <code class="language-plaintext highlighter-rouge">erubi-rails</code> benchmark, given it’s quite string-heavy.
But in retrospect, it’s very much expected because one of the biggest performance tricks of erubi is that it works around mutable string literals in its code generation by leveraging <code class="language-plaintext highlighter-rouge">opt_str_freeze</code> instructions:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">Erubi</span><span class="o">::</span><span class="no">Engine</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"Hello &lt;% name%&gt;!"</span><span class="p">).</span><span class="nf">src</span>
<span class="n">_buf</span> <span class="o">=</span> <span class="o">::</span><span class="no">String</span><span class="p">.</span><span class="nf">new</span><span class="p">;</span> <span class="n">_buf</span> <span class="o">&lt;&lt;</span> <span class="s1">'Hello '</span><span class="p">.</span><span class="nf">freeze</span><span class="p">;</span> <span class="nb">name</span><span class="p">;</span> <span class="n">_buf</span> <span class="o">&lt;&lt;</span> <span class="s1">'!'</span><span class="p">.</span><span class="nf">freeze</span><span class="p">;</span>
<span class="n">_buf</span><span class="p">.</span><span class="nf">to_s</span>
</code></pre></div></div>

<p>All this makes it hard to come up with a clear measure of the performance benefits of freezing string literals.
At this point, making them the default is more about allowing Rubyists to write nicer and less contrived code than about improving performance.</p>

<p>After some more rounds of discussion, Matz <a href="https://bugs.ruby-lang.org/issues/20205#note-35">accepted the proposal</a> but without committing to any specific timeline,
and I implemented the feature with <a href="https://github.com/etiennebarrie">Étienne Barrié</a>, which shipped with Ruby 3.4.0.</p>

<h2 id="so-its-done">So It’s Done?</h2>

<p>So at this point, it may look like a done deal.
The deprecations are in place, it’s just a matter of deciding when to flip the switch.</p>

<p>But as we’ve seen in the past, that doesn’t mean much.
Matz may still change his mind at any point, and there are still a few Ruby core members actively campaigning against frozen string literals.</p>

<p>Personally, I’m quite tired of arguing about it.
It might be a personal bias, given the overwhelming majority of the code I interact with has been frozen string literal compatible for a decade, but it seems to me that the Ruby community very largely adopted frozen string literals, so for me it seems obvious to make it the default.</p>

<p>But not everyone in Ruby core has the same view of the community.
Some members like Mame are very involved in <a href="https://en.wikipedia.org/wiki/Quine_(computing)">quines</a> and other forms of <a href="https://github.com/tric/trick2025">artistic programming like TRICK</a>,
in which mutable string literals are used a lot.
So I understand that for him, switching the default means breaking a number of historical programs he cares about.</p>

<p>Ultimately, as always with Ruby’s direction, it will come down to what Matz decides.
For now, he has publicly accepted the migration plan, but not yet committed to any timeline, and I’m not sure Matz really has a vision of what the community at large desires on this topic.
With Ruby 4.0 likely being released this year, it’s very possible this migration stays in limbo for years and is ultimately abandoned again.</p>

<h2 id="alternatives">Alternatives</h2>

<p>At the end of the day, I don’t care so much about frozen string literals being the default.
I just want to be able to stop adding this ugly comment at the top of my files, without losing the performance benefit and without having to explicitly freeze my constants.</p>

<p>An alternative to changing the default could be to allow setting compiler options for entire directories.
This would allow Rubyists to enable frozen string literals in a single place, typically the <code class="language-plaintext highlighter-rouge">gemspec</code> or Rails config.</p>

<p>However, this would fragment Ruby more, because it means a given code snippet may or may not work based on where it is located.
This was already a concern with the magic comment; it would be an even bigger one with directory-based compiler options.
So I’m not sure Matz would be ok with that.</p>

<h2 id="conclusion">Conclusion</h2>

<p>I can’t predict what the future of string literals in Ruby will be.
I do hope they’ll be frozen a few years from now, but I’m not holding my breath.</p>

<p>In the meantime, I do encourage gem authors to <a href="https://github.com/asciidoctor/asciimath/pull/78">test their gems with <code class="language-plaintext highlighter-rouge">--enable-frozen-string-literal</code></a>.</p>

<p>What is certain, however, is that performance-wise, they only have upsides, as they’re strictly less work for the Ruby VM. That said, your performance-sensitive dependencies likely already use them, or at least work around mutable string literals in the hot paths.
Hence, you are unlikely to notice a big difference if you were to run your application with <code class="language-plaintext highlighter-rouge">RUBYOPT="--enable-frozen-string-literal"</code>.
However, if you do measure a negative performance impact, there is no doubt you are measuring incorrectly.</p>
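<p>For gem authors checking compatibility, the usual fix is to stop mutating string literals directly. Here is a minimal sketch of that pattern. The method names <code class="language-plaintext highlighter-rouge">build_greeting</code> and <code class="language-plaintext highlighter-rouge">append_raises?</code> are hypothetical and only there for illustration; the sketch relies on nothing more than <code class="language-plaintext highlighter-rouge">String#+@</code> and <code class="language-plaintext highlighter-rouge">FrozenError</code>.</p>

```ruby
# Hypothetical helper: builds a string in a way that works whether or not
# string literals are frozen.
def build_greeting(name)
  buffer = +"" # String#+@ returns a mutable string (a dup if the receiver is frozen)
  buffer << "Hello, " << name << "!"
  buffer
end

# Hypothetical helper: reports whether appending to the given string raises.
# Note that it does mutate the string when the append succeeds.
def append_raises?(string)
  string << "x"
  false
rescue FrozenError
  true
end
```

<p>Code written this way should behave the same with or without <code class="language-plaintext highlighter-rouge">--enable-frozen-string-literal</code>, which is what makes it a reasonable default for shared libraries.</p>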

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>A previous version of the post wrongly listed PHP as a language with immutable strings. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ruby" /><category term="performance" /><summary type="html"><![CDATA[If you are a Rubyist, you’ve likely been writing # frozen_string_literal: true at the top of most of your Ruby source code files, or at the very least, that you’ve seen it in some other projects.]]></summary></entry><entry><title type="html">Dear Rubyists: Shopify Isn’t Your Enemy</title><link href="https://byroot.github.io/opensource/ruby/2025/10/09/dear-rubyists.html" rel="alternate" type="text/html" title="Dear Rubyists: Shopify Isn’t Your Enemy" /><published>2025-10-09T05:03:51+00:00</published><updated>2025-10-09T05:03:51+00:00</updated><id>https://byroot.github.io/opensource/ruby/2025/10/09/dear-rubyists</id><content type="html" xml:base="https://byroot.github.io/opensource/ruby/2025/10/09/dear-rubyists.html"><![CDATA[<p>I’ve been meaning to write a post about my perspective on Open Source and corporate entities.
I already have a rough outline of it; however, I’m suffering from writer’s block,
but more importantly, the whole post is in praise of how Shopify engages with Open Source communities.
Hence, given the current climate, I don’t think I could publish it without addressing the elephant in the room first anyway.</p>

<p>So here it is, I am deeply convinced that contrary to what has been alleged recently,
Shopify has nothing but good intentions toward Ruby and its community.</p>

<p>It is healthy to be skeptical toward corporations, I certainly am, but I believe Shopify is currently receiving undue distrust considering their track record of massive investment in the Ruby ecosystem.
And some of that may be due to a lack of understanding of how they engage with Open Source communities.</p>

<p>So I’ll try to explain what they do, how they do it, and why we need more companies like Shopify, not less.</p>

<h2 id="proper-disclaimer">Proper Disclaimer</h2>

<p>As is customary in this sort of situation, I first need to disclose the nature of my relationship with Shopify.</p>

<p>I could try to brush it off by just saying that I was employed by them from November 2013 to August 2025, but in my opinion, that would be a cop-out.
Knowing that someone has been previously employed by someone else doesn’t tell you anything about where they’re speaking from.
Worse, instead of enlightening you on which biases the author might have, it might let you think they have insider knowledge, hence are even more reliable.</p>

<p>What is important to disclose is how the relationship ended.</p>

<p>In my case, I left Shopify for several reasons, but mainly because of my constant friction with the CEO.
Ever since my first interaction with him twelve years ago, I knew he was someone I couldn’t see eye to eye with on almost every subject.
Even when I’d occasionally happen to agree on a specific topic, his overly maximalist position and lack of nuance would drive me away.
The only reason I managed to stick this long at the company is that I made sure to pick projects and teams so as to minimize my interactions with him.</p>

<p>And the reason why I ended up quitting is that it was no longer possible to avoid him.
Since I consider him directly responsible for my burnout last year, I couldn’t possibly stay any longer.</p>

<p>I could go on for hours about all the hard feelings, but this is not really the place, I only mean to share enough to explain where I’m speaking from.
What is important to know is that I have absolutely zero reasons to give a pass to Shopify over anything.</p>

<h2 id="people-are-multidimensional">People Are Multidimensional</h2>

<p>But despite my personal feelings and history, it has to be said that Shopify’s CEO is a Rubyist at heart, almost to a fault.</p>

<p>Contrary to what you might think, Ruby isn’t all that popular at Shopify.
Even when I started back in 2013, only a small fraction of new hires had any prior experience with Ruby,
and a decade later, there aren’t so many proud Rubyists in the Shopify ranks.
Most developers, and even many executives, would rather use something else.</p>

<p>Yet, Ruby and Rails remain the default stack at Shopify, and the only reason for that is the CEO.
Every Shopify employee knows that suggesting straying away from Ruby wouldn’t fly there.
And I’m convinced that if it were anyone else at the helm, Shopify would have joined the long list of companies that attempted to migrate to something else
and are now stuck with both a Ruby monolith and a ton of half-migrated micro-services in Java or Go.</p>

<p>Hence, it’s important to recognize that people are multidimensional.
Just because you can’t see eye to eye on some topic doesn’t mean you can’t be allies (even if only by circumstances) on another.</p>

<p>But Shopify isn’t only its CEO.</p>

<h2 id="the-ruby--rails-infrastructure-team-rri">The Ruby &amp; Rails Infrastructure Team (R&amp;RI)</h2>

<p>As Rubyists, the side of Shopify you are the most likely to interact with, or at least be familiar with, is the Ruby and Rails Infrastructure team (R&amp;RI).</p>

<p>It’s a team of 40ish people.
They’re the ones you see on countless GitHub issues and pull requests, maintaining countless projects, and speaking at conferences.
I know all of them very well, and I can attest that, barring a couple of rare exceptions, they’re all long-time proud Rubyists, not mercenaries nor zealous “company men”.</p>

<p>I believe, without the shadow of a doubt, that if Shopify ever started to have ill intentions toward the community, many people in the R&amp;RI team would either resign or call it out or both.
At the very least, they would confide in other members of the community, and that would inevitably be public rather quickly.</p>

<p>You may think I’m exaggerating, and surely with their cushy salaries, many of them would have second thoughts.
But I honestly don’t think so.
Shopify isn’t even paying that well (depending on the market).
Based on my discussions with the people who left the team over the years, the most common cause of voluntary departures, by far, was compensation.
And most of the team could find another job rather quickly anyway, even in this market.</p>

<p>What makes them stay at Shopify, and why it took me so long to finally decide to quit, is that right now, it is hands down the best place
in the world to contribute to the Ruby ecosystem.
Nowhere else comes close, and that’s all due to Shopify’s philosophy toward Open Source.</p>

<h2 id="your-dependencies-are-your-code-too">Your Dependencies Are Your Code Too</h2>

<p>Whether you realized it already or not, all the code you depend on, all the code that runs on your servers, is your code.
It doesn’t matter if it was written by someone you never met in Nebraska, or by a multi-billion-dollar corporation.</p>

<p>You run it, you own it.</p>

<p>If it has a bug, if it is missing a feature, or if it has any other needs, it’s on you to figure out the solution for yourself.
There’s no relying on the original author to get that responsibility off your plate.</p>

<p>To illustrate this, I remember back in 2014 or 2015, when MySQL servers started segfaulting in production at a regular interval.
IIRC, Shopify had a support contract with a MySQL consultancy, and they probably were notified of it.
But we didn’t sit there waiting for the “owners” or experts to figure it out.</p>

<p>It was a colleague who went knee-deep in core dumps, figured out this was caused by an <code class="language-plaintext highlighter-rouge">alloca</code> call in a non-leaf function
causing a stack overflow, produced a patch, patched our MySQL servers, and then sent the patch upstream<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>.</p>

<p>This philosophy is at the heart of Shopify’s Ruby &amp; Rails Infrastructure team.
It is determined not just to be a user of the open source ecosystem, but to proactively engage with it, contribute,
and make it better through engineering time and contributions.
Not by delegating the responsibility to a third party nor exploiting maintainers’ goodwill.</p>

<p>I also sometimes hear people saying that Shopify is snatching all the super senior Ruby developers, but I’d argue that’s mostly untrue.
The reality is that in most cases, Shopify is growing these developers internally.</p>

<p>Take Kevin Newton, for instance.
He started as a product developer at Shopify, but after a few years, he pitched his vision of a universal parser for Ruby,
managed to get transferred to the R&amp;RI team, worked on the project that became Prism, became a Ruby core committer, won the Ruby Prize award, etc.
Since then, he left Shopify to work on a Python JIT at Meta, yet he is still maintaining Prism, because he is a Rubyist at heart.
And Kevin is far from the only example of that; I am one as well, and so are dozens of my former teammates.
Some, like Peter Zhu, even started as interns.</p>

<p>The reason I’m explaining this is that I feel there is a part of the community that is naturally distrustful of Shopify or corporations in general.
I don’t blame them, there have been countless examples of nefarious behaviours from companies, so it’s logical and healthy to at least be skeptical.
But it’s also important to recognize and salute positive behavior when it happens.</p>

<p>In this specific case, I believe that recently, Shopify has been giving the community something that is priceless: a large number of proficient and deeply committed contributors to Ruby itself and the whole ecosystem.
And I’d argue that is way more valuable for the future and sustainability of Ruby than any amount of money.</p>

<h2 id="sustainability-isnt-just-about-money">Sustainability Isn’t Just About Money</h2>

<p>Usually, when the topic of Open Source sustainability comes up, it ends up revolving around how to make companies pay for developers’ time.
There is this idealized image of Open Source being an amalgamation of lone developers tirelessly maintaining projects for free, eating ramen while big bad companies make huge profits out of their work.
There is definitely some truth to it, it is far from uncommon, but it’s also a bit of a tired cliché.</p>

<p>The Open Source ecosystem is also a lot of projects that are contributed to by people on various companies’ payrolls.
Linux is the poster child of healthy corporate involvement, with the overwhelming majority of contributions coming from employees of companies with a vested interest in the kernel.
That’s just one example, but when you look at big and complex open source projects, most of the time you’ll see big companies involved in one way or another.
That’s how most of the sustainable open source happens today, way more than through donations.</p>

<p>Hence, I’d argue that if an open source community wants to be sustainable, it needs to be welcoming of corporate contributions.
I don’t mean trust them blindly, it’s important to keep them in check just in case, but you have to let them play ball.</p>

<p>Ruby has successfully done that.
Back in 2019, Rafael França and Matz met in Bristol.
Rafael asked Matz what he needed, and Matz answered: “I need people”.
That’s how the Ruby and Rails Infrastructure team started getting involved in Ruby development, that’s what ultimately led to YJIT, now ZJIT, numerous GC improvements like Variable Width Allocation, modular GC, Prism, tons of Ractors improvements, etc.
But more importantly, almost a dozen new Ruby core committers.</p>

<p>I would wager that if that day Matz had asked for money, we’d have much worse results to show for it.</p>

<h2 id="money-can-create-perverse-incentives">Money Can Create Perverse Incentives</h2>

<p>And aside from worse results, I’d argue it would have created perverse incentives.</p>

<p>I have nothing but respect for people who try to find ways to fund open source development in alternative ways.
However, it’s important to look at it through the lens of structures and incentives.</p>

<p>Whenever you design a system that involves people, you need to consider how a person who tries to maximize their personal benefits is incentivized to behave.</p>

<p>A typical example is ticket inspectors on trains and buses.
You may be tempted to give them a cut of the fines they give to people, so as to incentivize them to work harder, but by doing so,
you create a problem: they are incentivized to be inflexible with commuters, causing a lot of conflicts instead of resolving situations peacefully.
Some of them might even be incentivized to give bullshit fines to earn a little extra money<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>.</p>

<p>If a system requires all the people involved to be perfect and act selflessly, then I’d argue it’s a flawed system.</p>

<p>Now, if Shopify had instead poured millions in cash into the Ruby Association or Matz himself, how would you, I, or anyone
be able to trust that the project’s direction and decisions are free of influence?
How do you trust that a given feature was accepted solely on its own merit and not just because it came from a big sponsor?
Inversely, when a feature is declined, how do you trust it wasn’t because it didn’t come from a sponsor?</p>

<p>That’s the thing with money, once you have it, it’s very difficult to do without.
When a big sponsor pulls out, you have to lay off staff, stop some initiatives, etc.
So even if you publicly declare that there are no strings attached, even if you never explicitly say anything about it,
entities and people who receive funding are naturally incentivized to keep the donor happy so that the funding keeps coming.</p>

<p>Whereas with corporate contributors, sure, their employer may decide to assign them to another project, but there are no hard consequences, and most of them will stick around regardless.
Most will even remain contributors if they quit or are laid off.</p>

<p>You can actually witness that dynamic between Shopify and Ruby publicly, for instance, in how Prism is now the default parser, but isn’t yet the only official parser.
I can tell you that this has ruffled quite a few feathers at Shopify, but that’s the thing, Matz and Ruby don’t feel indebted to Shopify, they feel entirely free to say no.
And I think that’s how it should be.</p>

<p>To be clear, I’m not saying open source should be free of any monetary exchanges, just that it’s crucial to do it in a way that doesn’t let these sorts of suspicions arise.</p>

<h2 id="not-every-project-is-equally-forkable">Not Every Project Is Equally Forkable</h2>

<p>I know some people will object to the above, arguing that this is all open source, so if you are not happy with the direction of the project, you can always fork, ergo: shut up!
And while this is true in most cases, in practice, there are some projects that aren’t as easily forked because of their position.</p>

<p>For instance, if you look at Sidekiq, it’s making loads of money with its Pro and Enterprise offerings, and quite openly declines some features in the open source project so as not to cannibalize sales.
As far as I am aware, pretty much everyone is fine with it.
Sure, you’ll find a few people complaining about it, but that’s just background noise.</p>

<p>This is because Sidekiq isn’t on any critical path, there are plenty of alternatives you can go for if you aren’t satisfied with it, and if you wish to fork it and add such a feature for yourself, it’s pretty trivial, you don’t need to convince anyone.
Hence, everyone sees it as fair.</p>

<p>However, some projects have a moat.
A dominant position granted by another project.
Imagine if, instead of allowing you to use any job processor you want through Active Job, Rails had instead decided to make Sidekiq the only option.
In such a world, I believe a whole lot more people would be upset or suspicious, because the bar to clear to use an alternative would be way higher.
A lot of Rails users would feel captive.</p>

<p>Well, I would argue that rubygems is in such a situation.
It is distributed with Ruby, required early during the Ruby boot process, is coupled with all distributed gems via the <code class="language-plaintext highlighter-rouge">gemspec</code> format, etc.
Because of this, it has a massive moat.
Forking it to build and use your own alternative to it is hardly viable, even for a big team like Shopify’s Ruby and Rails Infrastructure team.</p>

<p>As such, while it’s still nothing but commendable to try to fund its maintenance work, you have to be careful to avoid any perverse incentives and conflicts of interest.
Otherwise, even if you are exceptionally selfless and well-intentioned, you will inevitably spur suspicion whenever you refuse contributions or ask for sponsorship on a GitHub issue.</p>

<p>Unfortunately, it did happen.</p>

<p>Over the past decade, people in the community, not just Shopify employees, started to conclude that rubygems and bundler were being monetized by some key maintainers.
To be clear, I’m not trying to convince anyone that this was actually the case.
Some of that dirty laundry that has been an open secret among the Ruby maintainers’ community for a long time has recently been aired out, and I suspect there’s more to come.
You are free to form your own opinion on the topic if you so wish.</p>

<p>But my point is that it doesn’t actually matter whether rubygems was actually being unduly taken advantage of or not.
Ultimately, it’s down to who and what you consider legitimate.</p>

<p>My point is that the economic model chosen to fund rubygems’ maintenance, combined with its critical position in the ecosystem, has allowed for these suspicions to exist and persist, creating tensions and driving potential sources of funding away.</p>

<p>Again, I believe the problem is with structures and incentives, as well as optics, not specific people being imperfect or ill-intentioned.</p>

<h2 id="shopify-and-rubygems-rocky-relationship">Shopify and Rubygems Rocky Relationship</h2>

<p>Because of this, the relationship between Shopify and the various entities overseeing rubygems development has been quite rocky for a long time.</p>

<p>As you are probably aware, supply chain security has been a hot topic in the corporate world, hence, around 2021, Shopify started trying to contribute more to rubygems, and an entire team of developers was assembled with the goal of helping the upstream projects.</p>

<p>I no longer have access to all the history, and some details are now blurry.
But from what I recall, there were various goals, such as requiring multi-factor authentication to publish the most popular packages, making code signing easier, and a few other topics.</p>

<p>However, that initiative didn’t exactly receive a warm welcome from upstream.
It’s not that these features weren’t desired, but the understanding on Shopify’s side was that maintainers preferred to be paid to do it, rather than just accept contributions.</p>

<p>This is what ultimately led to Shopify funding Ruby Central directly (other than being a recurring major sponsor at their conferences for years).
The deal was for <a href="https://rubycentral.org/news/ruby-shield/">one million dollars over 4 years, under the name Ruby Shield</a>.</p>

<p>But even after that, the feeling on the Shopify side was that upstream was still uncooperative, until ultimately they decided to cut their losses and re-assigned engineers elsewhere.
The 4-year funding deal remained, but not much was expected of it.</p>

<p>Shopify could have threatened to pull funding at that time to try to coerce Ruby Central, yet they didn’t.</p>

<h2 id="shopify-never-threatened-to-pull-funding">Shopify Never Threatened To Pull Funding</h2>

<p>As I said earlier, ever since this controversy started, I’ve been unconvinced by the theory that all this would have been orchestrated by Shopify or through Shopify.
That simply would have required involving too many people, and I absolutely can’t imagine that none of them would have objected in one way or another.</p>

<p>But anyway, since then, I did contact two former coworkers, and they both assured me that Shopify never threatened to pull Ruby Central’s funding, nor threatened not to renew it.</p>

<p>Now, as I tried to explain earlier, even if you loudly claim money comes with no strings attached, people and entities are naturally incentivized to do what they think is necessary to keep it coming.
As such, it’s entirely possible that despite the absence of threats, Ruby Central’s moves may have been motivated by the need to secure the existing funding and/or find additional sources of funding.</p>

<p>My former coworkers also told me their side of the story, and it’s absolutely nothing like what has been alleged so far.
I deeply trust these two people, and I can’t possibly imagine they’d be lying to me, but I’d understand if you don’t want to take my word for it.</p>

<p>I don’t know when their side of the story will come out, nor if it will come out at all, but I do hope it comes out soon and with receipts.
Seeing so many good-natured and well-intentioned people get demonized like they have been over the last few weeks is depressing.</p>

<p>It is undeniable that, regardless of what Ruby Central’s intentions were, the communication and execution have been abysmal.
It is also true that there is a deep disagreement about what they rightfully or legitimately owned that won’t easily be resolved.
However, I can’t believe the entire organisation was ill-intentioned, here again, that would involve too many people to be conceivable.</p>

<p>Similarly, the claim that Aaron sending patches to rubygems is a clue that there was a conspiracy at play drives me nuts.
I’ve seen these pull requests being made with my own eyes, and I can tell you that the reason is way more mundane than that.
We were at Rails World, someone mentioned <code class="language-plaintext highlighter-rouge">rv</code>, the question of why you’d need to write something in Rust to speed up gem installation was raised, and Aaron and a few others started to profile Bundler to see if it could be made faster.</p>

<p>That’s it, that’s all there is.
Aaron got nerd sniped into making Bundler faster, and now he’s being called out for supposedly being part of a hostile takeover?
Give me a break.</p>

<h2 id="we-need-more-shopifies-not-less">We Need More Shopifies, Not Less</h2>

<p>I think it’s healthy to be wary of Shopify’s huge footprint on the ecosystem.
Companies are fickle beings, and even if I’m not particularly concerned about them ever having ill intent toward the Ruby ecosystem,
it’s not impossible that in the future they may decide to invest less.</p>

<p>But the response shouldn’t be to try to cast Shopify and its employees aside.
It would be silly to punish them for helping too much.
What we need is more companies doing their part.
Both to reduce Shopify’s relative influence, but also to have more diverse perspectives, use cases, and priorities.</p>

<p>I’m not saying every company should have a team as big as Shopify’s R&amp;RI, but there are numerous Ruby-based companies with valuations in the billions and several hundred developers on their payroll, yet they contribute very little upstream.
If you work at one of these companies, you should really consider how you could do more.</p>

<p>That’s what I intend to do at my next job, to get one more Ruby company to pull its weight.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>As an anecdote, MySQL refused the patch, arguing that this error couldn’t realistically happen. LOL. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>This is not a made up example by the way. It has been a big issue in France for over a decade. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="opensource" /><category term="ruby" /><summary type="html"><![CDATA[I’ve been meaning to write a post about my perspective on Open Source and corporate entities. I already got the rough outline of it; however, I’m suffering from writer’s block, but more importantly, the whole post is a praise of how Shopify engages with Open Source communities. Hence, given the current climate, I don’t think I could publish it without addressing the elephant in the room first anyway.]]></summary></entry><entry><title type="html">Unlocking Ractors: generic instance variables</title><link href="https://byroot.github.io/ruby/performance/2025/08/11/unlocking-ractors-generic-variables.html" rel="alternate" type="text/html" title="Unlocking Ractors: generic instance variables" /><published>2025-08-11T09:03:51+00:00</published><updated>2025-08-11T09:03:51+00:00</updated><id>https://byroot.github.io/ruby/performance/2025/08/11/unlocking-ractors-generic-variables</id><content type="html" xml:base="https://byroot.github.io/ruby/performance/2025/08/11/unlocking-ractors-generic-variables.html"><![CDATA[<p>In two previous posts, I explained that one of the big blockers for Ractors’ viability is that while they’re supposed
to run fully in parallel, in many cases, they’d perform worse than a single thread because there were numerous codepaths
in the Ruby virtual machine and runtime that were still protected by the global VM lock.</p>

<p>I also explained how I removed two of these contention points, <a href="/ruby/performance/2025/04/26/unlocking-ractors-object-id.html">the <code class="language-plaintext highlighter-rouge">object_id</code> method</a>,
and <a href="/ruby/performance/2025/05/24/unlocking-ractors-class-variables.html">class instance variables</a>.</p>

<p>Since then, the situation has improved quite drastically, as numerous other contention points have been either eliminated or reduced by me and my former teammates.
I’m not going to make a post for each of them, as in most cases it boils down to the same <a href="https://en.wikipedia.org/wiki/Read-copy-update">RCU technique</a>
I explained in the post about class instance variables.</p>

<p>But there’s one such contention point I find interesting and that I’d like to write about: the generic instance variables table.</p>

<h2 id="how-instance-variables-work">How Instance Variables Work</h2>

<p>As a Ruby user, you are likely familiar with the idea that everything is an object, and that is somewhat true, but that doesn’t mean all objects are equal.
I already touched on that subject in some of my previous posts, so I’ll do it quickly.</p>

<p>In the context of instance variables, in the Ruby VM you essentially have 3 or 4 types of objects, depending on how you count.</p>

<p>First, you have the “immediates”, small integers (<code class="language-plaintext highlighter-rouge">1</code>), booleans (<code class="language-plaintext highlighter-rouge">true</code>, <code class="language-plaintext highlighter-rouge">false</code>), static symbols (<code class="language-plaintext highlighter-rouge">:foo</code>, but not dynamic symbols like <code class="language-plaintext highlighter-rouge">"bar".to_sym</code>), etc.
These are called immediates because they don’t actually exist in memory; they don’t have an allocated object slot on the heap.  Their reference <em>is</em> their value.
In other words, they’re just <a href="https://en.wikipedia.org/wiki/Tagged_pointer">tagged pointers</a>.</p>

<p>Hence, they can’t have instance variables, and Ruby will treat them as if they were frozen to maintain the illusion of parity with other objects:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="mi">42</span><span class="p">.</span><span class="nf">instance_variable_set</span><span class="p">(</span><span class="ss">:@test</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">(</span><span class="n">irb</span><span class="p">):</span><span class="mi">2</span><span class="ss">:in</span> <span class="s1">'Kernel#instance_variable_set'</span><span class="p">:</span> <span class="n">can</span><span class="err">'</span><span class="n">t</span> <span class="n">modify</span> <span class="n">frozen</span> <span class="no">Integer</span><span class="p">:</span> <span class="mi">42</span> <span class="p">(</span><span class="no">FrozenError</span><span class="p">)</span>
</code></pre></div></div>
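<p>The same behavior can be checked from a script rather than IRB. This is only a small sketch, and the helper names <code class="language-plaintext highlighter-rouge">all_frozen?</code> and <code class="language-plaintext highlighter-rouge">ivar_set_error</code> are made up for the illustration:</p>

```ruby
# Hypothetical helper: true when every value reports itself as frozen.
def all_frozen?(values)
  values.all?(&:frozen?)
end

# Hypothetical helper: tries to set an instance variable and returns the
# FrozenError if one is raised, or nil if the assignment succeeded.
def ivar_set_error(object)
  object.instance_variable_set(:@test, 1)
  nil
rescue FrozenError => e
  e
end

# Immediates are always treated as frozen.
all_frozen?([42, true, false, nil, :foo])

# An immediate rejects the instance variable; a regular object accepts it.
ivar_set_error(42)
ivar_set_error(Object.new)
```

<p>The contrast with <code class="language-plaintext highlighter-rouge">Object.new</code> is the point: a heap-allocated object has somewhere to put the variable, while an immediate does not.</p>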

<p>Then you have the more regular <code class="language-plaintext highlighter-rouge">T_OBJECT</code>, for your user-defined classes.
In the case of <code class="language-plaintext highlighter-rouge">T_OBJECT</code>, instance variables are stored inside the object’s slot like an array.
Consider the following object with 3 instance variables:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Foo</span>
  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="vi">@a</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="vi">@b</span> <span class="o">=</span> <span class="mi">2</span>
    <span class="vi">@c</span> <span class="o">=</span> <span class="mi">3</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>It will fit in the base <code class="language-plaintext highlighter-rouge">40B</code> object slot.
<code class="language-plaintext highlighter-rouge">16B</code> is used for the object’s flags and a pointer to its class, and the remaining <code class="language-plaintext highlighter-rouge">24B</code> is used for the three instance variable references:</p>

<table>
  <thead>
    <tr>
      <th>flags</th>
      <th>klass</th>
      <th>@a</th>
      <th>@b</th>
      <th>@c</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>T_OBJECT</td>
      <td>0xffeff</td>
      <td>1</td>
      <td>2</td>
      <td>3</td>
    </tr>
  </tbody>
</table>

<p>In some cases, if an instance variable is added later and the slot is full, the Ruby VM may have to allocate a separate memory
region and “spill” the instance variables there, but this is actually fairly rare. The VM keeps track of how many variables
the instances of each class have, so if Ruby ever has to spill, every future instance of that class will be allocated in a larger slot.</p>
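<p>If you want to look at slot sizes yourself, here is a minimal sketch using <code class="language-plaintext highlighter-rouge">ObjectSpace.memsize_of</code> from the <code class="language-plaintext highlighter-rouge">objspace</code> standard library. The class names are made up for this example, and the exact numbers depend on the Ruby version and platform, so treat the output as indicative only.</p>

```ruby
require "objspace"

# Hypothetical class: three instance variables, like the example above.
class ThreeIvars
  def initialize
    @a = 1
    @b = 2
    @c = 3
  end
end

# Hypothetical class: more instance variables than fit next to the
# flags and class pointer in the smallest slot.
class EightIvars
  def initialize
    8.times { |i| instance_variable_set(:"@v#{i}", i) }
  end
end

# memsize_of reports the slot size plus any separately allocated memory.
small = ObjectSpace.memsize_of(ThreeIvars.new)
wide = ObjectSpace.memsize_of(EightIvars.new)
puts "3 ivars: #{small}B, 8 ivars: #{wide}B"
```

<p>On a 64-bit CRuby, the first number should match the base slot size discussed above, and the second should be larger, whether the variables were spilled or a bigger slot was picked.</p>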

<p>The third kind of object is <code class="language-plaintext highlighter-rouge">T_CLASS</code> and <code class="language-plaintext highlighter-rouge">T_MODULE</code>. Since that was the topic of my previous post, I’ll be quick.
Class instance variables are laid out as they are for <code class="language-plaintext highlighter-rouge">T_OBJECT</code>, except that they live in a “companion” slot.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Foo</span>
  <span class="vi">@a</span> <span class="o">=</span> <span class="mi">1</span>
  <span class="vi">@b</span> <span class="o">=</span> <span class="mi">2</span>
  <span class="vi">@c</span> <span class="o">=</span> <span class="mi">3</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The layout of the class itself stores a reference to that “companion” slot:</p>

<table>
  <thead>
    <tr>
      <th>flags</th>
      <th>klass</th>
      <th>obj_fields</th>
      <th>…</th>
      <th>…</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>T_CLASS</td>
      <td>0xffeaa</td>
      <td>0xffdddd</td>
      <td> </td>
      <td> </td>
    </tr>
  </tbody>
</table>

<p>And that other slot is laid out exactly like a <code class="language-plaintext highlighter-rouge">T_OBJECT</code>, except its type is <code class="language-plaintext highlighter-rouge">T_IMEMO</code> for “Internal Memory”:</p>

<table>
  <thead>
    <tr>
      <th>flags</th>
      <th>klass</th>
      <th>@a</th>
      <th>@b</th>
      <th>@c</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>T_IMEMO/fields</td>
      <td>0xffeaa</td>
      <td>1</td>
      <td>2</td>
      <td>3</td>
    </tr>
  </tbody>
</table>

<p>That’s a type of object that, as a Ruby user, you can’t directly interact with, nor even get a reference to; they’re basically invisible.
But they are used internally by the VM to store various data in memory managed by the GC instead of using manual memory management with <code class="language-plaintext highlighter-rouge">malloc</code> and <code class="language-plaintext highlighter-rouge">free</code>.</p>
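<p>To make the “invisible” part a bit more concrete, here is a small sketch. It assumes CRuby, since <code class="language-plaintext highlighter-rouge">ObjectSpace.count_imemo_objects</code> is implementation-specific, and the <code class="language-plaintext highlighter-rouge">Settings</code> class is made up. The set of keys it reports varies between Ruby versions, so I’m not listing an expected output for it.</p>

```ruby
require "objspace"

# Hypothetical class with class-level instance variables.
class Settings
  @a = 1
  @b = 2
  @c = 3
end

# The variables are reachable through the class itself...
p Settings.instance_variables

# ...but internal objects can only be counted per kind, never referenced.
imemo_counts = ObjectSpace.count_imemo_objects
p imemo_counts.keys
```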

<p>And then you have all the other objects. <code class="language-plaintext highlighter-rouge">Hash</code>, <code class="language-plaintext highlighter-rouge">Array</code>, <code class="language-plaintext highlighter-rouge">String</code>, etc.
For these, the space inside the object slot is already used.
For example, a <code class="language-plaintext highlighter-rouge">String</code> slot is used to store the string <code class="language-plaintext highlighter-rouge">length</code>, <code class="language-plaintext highlighter-rouge">capacity</code>, and if it’s small enough, the bytes that compose the string itself, otherwise a pointer to a manually allocated buffer.</p>

<p>Yet, Ruby allows you to define any instance variables you want on a string:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="n">s</span> <span class="o">=</span> <span class="s2">"test"</span>
<span class="o">&gt;&gt;</span> <span class="n">s</span><span class="p">.</span><span class="nf">instance_variable_set</span><span class="p">(</span><span class="ss">:@test</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="o">&gt;&gt;</span> <span class="n">s</span><span class="p">.</span><span class="nf">instance_variable_get</span><span class="p">(</span><span class="ss">:@test</span><span class="p">)</span>
<span class="o">=&gt;</span> <span class="mi">1</span>
</code></pre></div></div>

<p>To allow this, the VM has an internal hash table, which used to be called the <code class="language-plaintext highlighter-rouge">genivar_tbl</code>, for Generic Instance Variables Hash-Table, and which I renamed to <code class="language-plaintext highlighter-rouge">generic_fields_tbl_</code> as part of my work on <code class="language-plaintext highlighter-rouge">object_id</code>.</p>

<p>I previously explained how this works in <a href="/ruby/performance/2025/04/26/unlocking-ractors-object-id.html#generic-instance-variables">my post about the <code class="language-plaintext highlighter-rouge">object_id</code></a>
method, but I’ll reexplain here with a bit more detail, as it’s really the core topic.</p>

<p>Once again, I’ll use Ruby pseudo-code to make it easier:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">GenericIvarObject</span>
  <span class="no">GENERIC_FIELDS_TBL</span> <span class="o">=</span> <span class="no">Hash</span><span class="p">.</span><span class="nf">new</span><span class="p">.</span><span class="nf">compare_by_identity</span>

  <span class="k">def</span> <span class="nf">instance_variable_get</span><span class="p">(</span><span class="n">ivar_name</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">ivar_shape</span> <span class="o">=</span> <span class="nb">self</span><span class="p">.</span><span class="nf">shape</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="n">ivar_name</span><span class="p">)</span>
      <span class="no">RubyVM</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="k">if</span> <span class="n">buffer</span> <span class="o">=</span> <span class="no">GENERIC_FIELDS_TBL</span><span class="p">[</span><span class="nb">self</span><span class="p">]</span>
          <span class="n">buffer</span><span class="p">[</span><span class="n">ivar_shape</span><span class="p">.</span><span class="nf">index</span><span class="p">]</span>
        <span class="k">end</span>
      <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>In that global hash, the keys are references to the objects, and the values are pointers to manually allocated buffers.
Inside the buffer, there is an array of references just like in a <code class="language-plaintext highlighter-rouge">T_OBJECT</code> or a <code class="language-plaintext highlighter-rouge">T_IMEMO/fields</code>.</p>
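<p>The pseudo-code above relies on VM internals, so it can’t run as is. Below is a self-contained toy version of the same idea that does run. Everything in it is an assumption made for illustration: the <code class="language-plaintext highlighter-rouge">SideTableIvars</code> module, the per-object “shape” hash, and the <code class="language-plaintext highlighter-rouge">Mutex</code> standing in for the VM lock.</p>

```ruby
# Toy model of a global side table for instance variables.
# All names here are hypothetical; the real table is implemented in C.
module SideTableIvars
  BUFFERS = {}.compare_by_identity # object => array of values
  SHAPES = {}.compare_by_identity  # object => { ivar name => index in the buffer }
  LOCK = Mutex.new                 # stand-in for the VM lock

  def self.set(obj, name, value)
    LOCK.synchronize do
      shape = (SHAPES[obj] ||= {})
      index = (shape[name] ||= shape.size) # next free index for a new name
      (BUFFERS[obj] ||= [])[index] = value
    end
  end

  def self.get(obj, name)
    LOCK.synchronize do
      index = SHAPES.dig(obj, name)
      index && BUFFERS[obj][index]
    end
  end
end

a = "test".dup
b = "test".dup # equal content, but a different object
SideTableIvars.set(a, :@test, 1)
p SideTableIvars.get(a, :@test) # => 1
p SideTableIvars.get(b, :@test) # => nil
```

<p>Note how both the read and the write hold the lock for their entire duration, which is exactly the contention discussed here.</p>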

<p>This isn’t ideal for multiple reasons.</p>

<p>First, having to do a hash lookup is way more expensive than reading at an offset like we do for <code class="language-plaintext highlighter-rouge">T_OBJECT</code>, or even chasing a reference
like we do for <code class="language-plaintext highlighter-rouge">T_CLASS</code> and <code class="language-plaintext highlighter-rouge">T_MODULE</code>.</p>

<p>But worse, if we’re in a multi-ractor scenario, we have to acquire the VM lock for the whole operation.
First, because that hash-table is global and not thread-safe, then because we must ensure that another Ractor can’t free that manually allocated buffer while we’re reading it<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>.</p>

<p>So now, you probably understand the problem.
Any code that reads or writes an instance variable in an object that isn’t a direct descendant of <code class="language-plaintext highlighter-rouge">Object</code> (actually <code class="language-plaintext highlighter-rouge">BasicObject</code>) nor <code class="language-plaintext highlighter-rouge">Module</code> is a contention point for Ractors.</p>

<h2 id="surely-that-isnt-common">Surely That Isn’t Common?</h2>

<p>Before I dig into what can be changed, you may wonder if it even matters.</p>

<p>And it’s a very fair question. Developer time isn’t unlimited, so whether it is worth removing a contention
point boils down to how hot the code path is, and how hard it is to fix.</p>

<p>When I started looking at this, it was from the angle of <code class="language-plaintext highlighter-rouge">T_STRUCT</code>.
I wanted the instance variables of <code class="language-plaintext highlighter-rouge">Struct</code> and <code class="language-plaintext highlighter-rouge">Data</code> objects
not to be contention points, because it’s not that rare to see <code class="language-plaintext highlighter-rouge">Struct</code> being used as some sort of code generator:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Address</span> <span class="o">=</span> <span class="no">Struct</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:street</span><span class="p">,</span> <span class="ss">:city</span><span class="p">)</span> <span class="k">do</span>
  <span class="k">def</span> <span class="nf">something_else</span>
    <span class="vi">@something_else</span> <span class="o">||=</span> <span class="n">compute_something</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This matters because <code class="language-plaintext highlighter-rouge">Struct.new</code> and <code class="language-plaintext highlighter-rouge">Data.define</code> don’t create <code class="language-plaintext highlighter-rouge">T_OBJECT</code> but <code class="language-plaintext highlighter-rouge">T_STRUCT</code> objects.
In these, the space inside the slot is used for the declared fields, not for the ivars.</p>
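<p>The distinction between members and instance variables is visible from Ruby. Here is a short sketch with a made-up <code class="language-plaintext highlighter-rouge">Place</code> struct:</p>

```ruby
# Hypothetical struct that memoizes a computed value in an ivar.
Place = Struct.new(:street, :city) do
  def label
    @label ||= "#{street}, #{city}"
  end
end

place = Place.new("1 Main St", "Springfield")
p place.members            # => [:street, :city]
p place.instance_variables # => []
place.label
p place.instance_variables # => [:@label]
```

<p>The members never show up as instance variables, and <code class="language-plaintext highlighter-rouge">@label</code> only exists once the memoized method has run, so it has to be stored somewhere other than the member area.</p>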

<p>Another pattern I expected was C extensions. When a Ruby C extension needs to expose an API, it uses the <code class="language-plaintext highlighter-rouge">TypedData</code> API, which allows it to create <code class="language-plaintext highlighter-rouge">T_DATA</code> objects.
But it’s not rare for extensions to do as little as possible in C, and to extend that C class with some Ruby.</p>

<p>An example of that is the <code class="language-plaintext highlighter-rouge">trilogy</code> gem, which <a href="https://github.com/trilogy-libraries/trilogy/blob/16667c95e8c2716a16e69e8325d6b0cb615591e2/contrib/ruby/ext/trilogy-ruby/cext.c#L1141-L1153">defines a bunch of C methods</a>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">RUBY_FUNC_EXPORTED</span> <span class="kt">void</span> <span class="nf">Init_cext</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">VALUE</span> <span class="n">Trilogy</span> <span class="o">=</span> <span class="n">rb_const_get</span><span class="p">(</span><span class="n">rb_cObject</span><span class="p">,</span> <span class="n">rb_intern</span><span class="p">(</span><span class="s">"Trilogy"</span><span class="p">));</span>
    <span class="n">rb_define_alloc_func</span><span class="p">(</span><span class="n">Trilogy</span><span class="p">,</span> <span class="n">allocate_trilogy</span><span class="p">);</span>

    <span class="n">rb_define_private_method</span><span class="p">(</span><span class="n">Trilogy</span><span class="p">,</span> <span class="s">"_connect"</span><span class="p">,</span> <span class="n">rb_trilogy_connect</span><span class="p">,</span> <span class="mi">3</span><span class="p">);</span>
    <span class="n">rb_define_method</span><span class="p">(</span><span class="n">Trilogy</span><span class="p">,</span> <span class="s">"change_db"</span><span class="p">,</span> <span class="n">rb_trilogy_change_db</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">rb_define_alias</span><span class="p">(</span><span class="n">Trilogy</span><span class="p">,</span> <span class="s">"select_db"</span><span class="p">,</span> <span class="s">"change_db"</span><span class="p">);</span>
    <span class="n">rb_define_method</span><span class="p">(</span><span class="n">Trilogy</span><span class="p">,</span> <span class="s">"query"</span><span class="p">,</span> <span class="n">rb_trilogy_query</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
    <span class="c1">//...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>But <a href="https://github.com/trilogy-libraries/trilogy/blob/16667c95e8c2716a16e69e8325d6b0cb615591e2/contrib/ruby/lib/trilogy.rb">then augments that C class with Ruby code</a>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Trilogy</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">options</span> <span class="o">=</span> <span class="p">{})</span>
    <span class="n">options</span><span class="p">[</span><span class="ss">:port</span><span class="p">]</span> <span class="o">=</span> <span class="n">options</span><span class="p">[</span><span class="ss">:port</span><span class="p">].</span><span class="nf">to_i</span> <span class="k">if</span> <span class="n">options</span><span class="p">[</span><span class="ss">:port</span><span class="p">]</span>
    <span class="n">mysql_encoding</span> <span class="o">=</span> <span class="n">options</span><span class="p">[</span><span class="ss">:encoding</span><span class="p">]</span> <span class="o">||</span> <span class="s2">"utf8mb4"</span>
    <span class="n">encoding</span> <span class="o">=</span> <span class="no">Trilogy</span><span class="o">::</span><span class="no">Encoding</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="n">mysql_encoding</span><span class="p">)</span>
    <span class="n">charset</span> <span class="o">=</span> <span class="no">Trilogy</span><span class="o">::</span><span class="no">Encoding</span><span class="p">.</span><span class="nf">charset</span><span class="p">(</span><span class="n">mysql_encoding</span><span class="p">)</span>
    <span class="vi">@connection_options</span> <span class="o">=</span> <span class="n">options</span>
    <span class="vi">@connected_host</span> <span class="o">=</span> <span class="kp">nil</span>

    <span class="n">_connect</span><span class="p">(</span><span class="n">encoding</span><span class="p">,</span> <span class="n">charset</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>That’s a pattern I really like, as it allows writing less C and more Ruby, so I would have hated having to complicate some C extensions so that they’d perform better under Ractors.</p>

<p>Then you have a few classics, <a href="https://github.com/rails/rails/blob/3235827585d87661942c91bc81f64f56d710f0b2/activesupport/lib/active_support/core_ext/string/output_safety.rb#L19-L73">like <code class="language-plaintext highlighter-rouge">ActiveSupport::SafeBuffer</code></a>,
which is a subclass of <code class="language-plaintext highlighter-rouge">String</code> with a <code class="language-plaintext highlighter-rouge">@html_safe</code> instance variable:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">ActiveSupport</span>
  <span class="k">class</span> <span class="nc">SafeBuffer</span> <span class="o">&lt;</span> <span class="no">String</span>
    <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">str</span> <span class="o">=</span> <span class="s2">""</span><span class="p">)</span>
      <span class="vi">@html_safe</span> <span class="o">=</span> <span class="kp">true</span>
      <span class="k">super</span>
    <span class="k">end</span>

    <span class="c1"># ...snip</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>So it’s not that rare for code to inherit from core types, and it can end up in hot spots.
Even though I would recommend avoiding it as much as possible, for reasons other than performance, sometimes it’s the pragmatic thing to do, so users do it.</p>

<h2 id="some-data-points">Some Data Points</h2>

<p>Regardless, I was quite convinced that improving this code path would be useful and started working on it.
But later on, I was asked to provide some data, so while I’m breaking the chronology here, let me share it with you.</p>

<p>I started by doing my favorite hack in the VM, a good old print gated by an environment variable:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="k">if</span> <span class="p">(</span><span class="n">getenv</span><span class="p">(</span><span class="s">"DEB"</span><span class="p">))</span> <span class="p">{</span>
      <span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">rb_obj_info</span><span class="p">(</span><span class="n">obj</span><span class="p">));</span>
  <span class="p">}</span>
</code></pre></div></div>

<p>Then I modified the <a href="https://github.com/Shopify/yjit-bench/"><code class="language-plaintext highlighter-rouge">yjit-bench</code> suite</a> to set <code class="language-plaintext highlighter-rouge">ENV["DEB"] = "1"</code> at the start
of the benchmark loops, as I’m more interested in runtime code paths than in boot-time ones.</p>

<p>I then ran the <a href="https://github.com/Shopify/shipit-engine"><code class="language-plaintext highlighter-rouge">shipit</code></a> benchmark while redirecting STDERR to a file:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>bundle <span class="nb">exec </span>ruby benchmark.rb 2&gt; /tmp/ivar-stats.txt
</code></pre></div></div>

<p>And did some quick number crunching with <code class="language-plaintext highlighter-rouge">irb</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">File</span><span class="p">.</span><span class="nf">readlines</span><span class="p">(</span><span class="s2">"/tmp/ivar-stats.txt"</span><span class="p">,</span> <span class="ss">chomp: </span><span class="kp">true</span><span class="p">).</span><span class="nf">tally</span><span class="p">.</span><span class="nf">sort_by</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:last</span><span class="p">).</span><span class="nf">reverse</span>
</code></pre></div></div>
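<p>In case that one-liner is a bit dense, here is the same pipeline applied to a tiny in-memory sample. The sample lines and the <code class="language-plaintext highlighter-rouge">rank_lines</code> helper are made up for illustration:</p>

```ruby
# Count identical lines and list them from most to least frequent.
def rank_lines(lines)
  lines
    .tally           # { "T_HASH" => 3, ... }
    .sort_by(&:last) # ascending by count
    .reverse         # most frequent first
end

sample = %w[T_HASH T_STRING T_HASH VM/thread T_HASH T_STRING]
p rank_lines(sample)
# => [["T_HASH", 3], ["T_STRING", 2], ["VM/thread", 1]]
```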

<p>Here are some results. It’s a very vanilla Rails 8 application, nothing fancy:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span>
 <span class="p">[</span><span class="s2">"VM/thread"</span><span class="p">,</span> <span class="mi">4886969</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"T_HASH"</span><span class="p">,</span> <span class="mi">229501</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"SQLite3::Backup"</span><span class="p">,</span> <span class="mi">122531</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"T_STRING"</span><span class="p">,</span> <span class="mi">70597</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"xmlDoc"</span><span class="p">,</span> <span class="mi">23625</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"T_ARRAY"</span><span class="p">,</span> <span class="mi">9039</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"OpenSSL/Cipher"</span><span class="p">,</span> <span class="mi">2800</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"xmlNode"</span><span class="p">,</span> <span class="mi">2025</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"encoding"</span><span class="p">,</span> <span class="mi">358</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"time"</span><span class="p">,</span> <span class="mi">199</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"proc"</span><span class="p">,</span> <span class="mi">68</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"T_STRUCT"</span><span class="p">,</span> <span class="mi">38</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"OpenSSL/X509/STORE"</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"Psych/parser"</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
 <span class="p">[</span><span class="s2">"set"</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="p">]</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">T_STRUCT</code> was there as I expected, but entirely dwarfed by other types.
For the ones that aren’t obvious:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">"VM/thread"</code> is literally <code class="language-plaintext highlighter-rouge">Thread</code> instances.</li>
  <li><code class="language-plaintext highlighter-rouge">xmlNode</code> and <code class="language-plaintext highlighter-rouge">xmlDoc</code> are <code class="language-plaintext highlighter-rouge">nokogiri</code> objects.</li>
  <li>Anything that doesn’t start with <code class="language-plaintext highlighter-rouge">T_</code> is a <code class="language-plaintext highlighter-rouge">T_DATA</code>.</li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">T_HASH</code> I definitely didn’t expect, and it wasn’t clear where it was coming from. So I did another hack:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">getenv</span><span class="p">(</span><span class="s">"DEB"</span><span class="p">)</span> <span class="o">&amp;&amp;</span> <span class="n">TYPE_P</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">T_HASH</span><span class="p">)</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="mi">1000</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">rb_bug</span><span class="p">(</span><span class="s">"here"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">rb_bug</code> function causes the Ruby VM to abort and print its crash report, which contains the Ruby-level backtrace.
With that, I figured these were <a href="https://github.com/rack/rack/blob/9163ac3f5fac795179f9935e2ba6533a0ca1cf82/lib/rack/utils.rb#L436-L449"><code class="language-plaintext highlighter-rouge">Rack::Utils::HeaderHash</code></a> instances.</p>

<p>As for the <code class="language-plaintext highlighter-rouge">T_ARRAY</code>, it seems like it was mostly from <a href="https://github.com/rails/rails/blob/bb3ddbf032c3a24c2c94f911c8c5ca9f6939c6d9/activesupport/lib/active_support/inflector/inflections.rb#L33-L37"><code class="language-plaintext highlighter-rouge">ActiveSupport::Inflector::Inflections::Uncountables</code></a>.</p>

<p>As for <code class="language-plaintext highlighter-rouge">"VM/thread"</code>, it comes from <a href="https://github.com/rails/rails/blob/bb3ddbf032c3a24c2c94f911c8c5ca9f6939c6d9/activesupport/lib/active_support/isolated_execution_state.rb#L7-L8"><code class="language-plaintext highlighter-rouge">ActiveSupport::IsolatedExecutionState</code></a>.</p>

<p>All the rest was various <code class="language-plaintext highlighter-rouge">T_DATA</code> defined by C extensions, like the <code class="language-plaintext highlighter-rouge">trilogy</code> example I shared.</p>
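<p>To give an idea of what these usages look like, here is a sketch of two of the patterns. Both are hypothetical and only loosely modeled on the linked code, not copies of it: a <code class="language-plaintext highlighter-rouge">Hash</code> subclass that tracks extra state in an instance variable, and an accessor added to <code class="language-plaintext highlighter-rouge">Thread</code>.</p>

```ruby
# Hypothetical Hash subclass: its instances are T_HASH, so @names
# cannot live inside the object slot.
class HeaderMap < Hash
  def initialize
    super
    @names = {} # maps downcased names to the originally used names
  end

  def []=(key, value)
    @names[key.downcase] = key
    super
  end

  def original_name(key)
    @names[key.downcase]
  end
end

# Hypothetical accessor on Thread: the ivar lands on a T_DATA object.
class Thread
  attr_accessor :example_state
end

headers = HeaderMap.new
headers["Content-Type"] = "text/html"
p headers.original_name("content-type") # => "Content-Type"

Thread.current.example_state = { request_id: 1 }
p Thread.current.example_state # => {request_id: 1} (formatting depends on the Ruby version)
```

<p>Neither looks unusual when reading the code, which is part of why this kind of instance variable is easy to overlook.</p>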

<p>I ran a few other benchmarks from the <code class="language-plaintext highlighter-rouge">yjit-bench</code> repo, and often found similar generic instance variable usages.</p>

<p>So to answer the question, while it’s not that big of a hotspot, I believe it’s used enough to be worth optimizing, especially for <code class="language-plaintext highlighter-rouge">T_DATA</code>,
and not just because of Ractors.</p>

<h2 id="shaped-structs">Shaped Structs</h2>

<p>But as I said, before I got all that data, my sight was set on <code class="language-plaintext highlighter-rouge">T_STRUCT</code>.
Struct objects are laid out very similarly to <code class="language-plaintext highlighter-rouge">T_OBJECT</code> except that the space is used for “members” instead of instance variables.</p>

<p>For instance, the following struct:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">struct</span> <span class="o">=</span> <span class="no">Struct</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:field_1</span><span class="p">,</span> <span class="ss">:field_2</span><span class="p">).</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>

<p>Would be laid out like this:</p>

<table>
  <thead>
    <tr>
      <th>flags</th>
      <th>klass</th>
      <th>field_1</th>
      <th>field_2</th>
      <th>-</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>T_STRUCT</td>
      <td>0xbbeaa</td>
      <td>1</td>
      <td>2</td>
      <td> </td>
    </tr>
  </tbody>
</table>

<p>Hence, my initial idea was that if we were to encode the struct’s layout using shapes like we do for instance variables, we’d
be able to collocate members and variables together so that:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">MyStruct</span> <span class="o">=</span> <span class="no">Struct</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:field_1</span><span class="p">,</span> <span class="ss">:field_2</span><span class="p">)</span> <span class="k">do</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
    <span class="k">super</span>
    <span class="vi">@c</span> <span class="o">=</span> <span class="mi">1</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Could be laid out as:</p>

<table>
  <thead>
    <tr>
      <th>flags</th>
      <th>klass</th>
      <th>field_1</th>
      <th>field_2</th>
      <th>@c</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>T_STRUCT</td>
      <td>0xffeaa</td>
      <td>1</td>
      <td>2</td>
      <td>1</td>
    </tr>
  </tbody>
</table>

<p>Which would be perfect. Everything would be embedded in the object slot, so we’d have minimal memory usage and access times.</p>

<p>Unfortunately, after putting some more thought into it, I realized there was a major problem with it: complex shapes.
I <a href="https://railsatscale.com/2023-10-24-memoization-pattern-and-object-shapes/#shape_too_complex">previously wrote at length on what complex shapes are</a>, so very quickly,
in the Ruby VM, shapes aren’t garbage collected, so if some code generates a lot of different shapes, Ruby will deoptimize the object and use a hash table to store
its instance variables. It also does the same if the program uses all the possible shape slots.</p>

<p>So if <code class="language-plaintext highlighter-rouge">Struct</code> members were encoded with shapes, we’d need to have many fallback code paths to handle complex structs,
and for some of the struct APIs, that is outright impossible, because Struct objects can be treated like arrays:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="no">Struct</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:a</span><span class="p">,</span> <span class="ss">:b</span><span class="p">).</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span>
<span class="o">=&gt;</span> <span class="mi">2</span>
</code></pre></div></div>

<p>In such a case, all we have is the member offset, so if the struct was deoptimized into a hash, we wouldn’t be able to look up members by index anymore, short of keeping a reverse index, but that’s really a lot of extra complexity.
So I abandoned this idea.</p>
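<p>For reference, here are a few of the positional APIs in question, shown on a made-up <code class="language-plaintext highlighter-rouge">Pair</code> struct. They all work from member offsets rather than names, which is what a hash-based fallback would struggle with:</p>

```ruby
Pair = Struct.new(:a, :b) # hypothetical struct for illustration
pair = Pair.new(1, 2)

p pair[:b]             # by name            => 2
p pair[1]              # by position        => 2
p pair[-1]             # from the end       => 2
p pair.to_a            # in member order    => [1, 2]
p pair.values_at(1, 0) # several positions  => [2, 1]

first, second = *pair  # destructuring also relies on member order
p [first, second]      # => [1, 2]
```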

<h2 id="shape-offset">Shape Offset</h2>

<p>A few days later, I was brainstorming with Étienne Barrié, and we thought of a simpler solution.
Instead of encoding struct members in shapes, we could introduce a new type of shape to encode at which offset the instance variables start.</p>

<p>As often mentioned, shapes are a tree, so for an object with variables <code class="language-plaintext highlighter-rouge">@a -&gt; @b -&gt; @c -&gt; @d</code>, the shape tree would look like:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">ROOT_SHAPE</span>
  <span class="p">\</span><span class="o">-</span> <span class="no">Ivar</span><span class="p">(</span><span class="ss">name: :@a</span><span class="p">,</span> <span class="ss">index: </span><span class="mi">0</span><span class="p">,</span> <span class="ss">capacity: </span><span class="mi">3</span><span class="p">)</span>
    <span class="p">\</span><span class="o">-</span> <span class="no">Ivar</span><span class="p">(</span><span class="ss">name: :@b</span><span class="p">,</span> <span class="ss">index: </span><span class="mi">1</span><span class="p">,</span> <span class="ss">capacity: </span><span class="mi">3</span><span class="p">)</span>
      <span class="p">\</span><span class="o">-</span> <span class="no">Ivar</span><span class="p">(</span><span class="ss">name: :@c</span><span class="p">,</span> <span class="ss">index: </span><span class="mi">2</span><span class="p">,</span> <span class="ss">capacity: </span><span class="mi">3</span><span class="p">)</span>
        <span class="p">\</span><span class="o">-</span> <span class="no">Ivar</span><span class="p">(</span><span class="ss">name: :@d</span><span class="p">,</span> <span class="ss">index: </span><span class="mi">3</span><span class="p">,</span> <span class="ss">capacity: </span><span class="mi">8</span><span class="p">)</span>
</code></pre></div></div>

<p>With offset shapes, the same instance variable list, but for a struct with two members, would look like:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">ROOT_SHAPE</span>
  <span class="p">\</span><span class="o">-</span> <span class="no">Offset</span><span class="p">(</span><span class="ss">index: </span><span class="mi">1</span><span class="p">,</span> <span class="ss">capacity: </span><span class="mi">3</span><span class="p">)</span>
    <span class="p">\</span><span class="o">-</span> <span class="no">Ivar</span><span class="p">(</span><span class="ss">name: :@a</span><span class="p">,</span> <span class="ss">index: </span><span class="mi">2</span><span class="p">,</span> <span class="ss">capacity: </span><span class="mi">3</span><span class="p">)</span>
      <span class="p">\</span><span class="o">-</span> <span class="no">Ivar</span><span class="p">(</span><span class="ss">name: :@b</span><span class="p">,</span> <span class="ss">index: </span><span class="mi">3</span><span class="p">,</span> <span class="ss">capacity: </span><span class="mi">8</span><span class="p">)</span>
        <span class="p">\</span><span class="o">-</span> <span class="no">Ivar</span><span class="p">(</span><span class="ss">name: :@c</span><span class="p">,</span> <span class="ss">index: </span><span class="mi">4</span><span class="p">,</span> <span class="ss">capacity: </span><span class="mi">8</span><span class="p">)</span>
          <span class="p">\</span><span class="o">-</span> <span class="no">Ivar</span><span class="p">(</span><span class="ss">name: :@d</span><span class="p">,</span> <span class="ss">index: </span><span class="mi">5</span><span class="p">,</span> <span class="ss">capacity: </span><span class="mi">8</span><span class="p">)</span>
</code></pre></div></div>

<p>Here again, we’d need to handle the case where the Ruby VM ran out of shapes, but at least only the instance variables
would be deoptimized into a hash table, the struct members would still be laid out like an array, saving a ton of complexity.</p>
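
<p>To make the index arithmetic concrete, here is a toy Ruby model of the two trees above. It is only a sketch: <code class="language-plaintext highlighter-rouge">Shape</code>, <code class="language-plaintext highlighter-rouge">add_ivar</code>, <code class="language-plaintext highlighter-rouge">add_offset</code> and <code class="language-plaintext highlighter-rouge">index_of</code> are names made up for illustration, capacities are left out, and none of it reflects how the VM actually stores shapes.</p>

```ruby
# Toy model of a shape tree; not the VM's actual data structure.
Shape = Struct.new(:kind, :name, :index, :parent) do
  # Adding an instance variable creates a child shape at the next free index.
  def add_ivar(name)
    Shape.new(:ivar, name, next_index, self)
  end

  # Hypothetical "offset" transition: reserves `count` leading indexes
  # (for example, for struct members) before any instance variable.
  def add_offset(count)
    Shape.new(:offset, nil, next_index + count - 1, self)
  end

  def next_index
    index.nil? ? 0 : index + 1
  end

  # Walk up the tree to find the index of an instance variable.
  def index_of(ivar_name)
    shape = self
    while shape
      return shape.index if shape.kind == :ivar && shape.name == ivar_name
      shape = shape.parent
    end
    nil
  end
end

ROOT = Shape.new(:root, nil, nil, nil)
IVARS = %i[@a @b @c @d].freeze

plain = IVARS.reduce(ROOT) { |shape, name| shape.add_ivar(name) }
with_offset = IVARS.reduce(ROOT.add_offset(2)) { |shape, name| shape.add_ivar(name) }

puts plain.index_of(:@a)       # 0
puts with_offset.index_of(:@a) # 2
puts with_offset.index_of(:@d) # 5
```

<p>The only difference between the two chains is the starting point: the offset node pushes every later instance variable past the two struct members.</p>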

<p>That being said, while I still think this is a good idea, it’s a fairly big project with some uncertainties.
So when I brought up this solution with Peter Zhu, he suggested something much simpler.</p>

<h2 id="direct-references">Direct References</h2>

<p>The annoying thing with generic instance variables isn’t so much that they aren’t embedded inside the object’s slot, but that to find the companion slot, you need to go through that global hash table.</p>

<p>Of course, if they were embedded, it would mean better data locality, which is good for performance, but that really isn’t much compared to the hash-lookup, so a single pointer chase would already be a major win.</p>

<p>Hence, Peter’s suggestion was to just use empty space in struct slots to keep a direct reference to the buffer that holds
the instance variables, and since structs are basically fixed-size arrays, we can store that reference right after the
last struct member.</p>

<p>In pseudo-code, it would be more or less:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Struct</span>
  <span class="k">def</span> <span class="nf">instance_variable_get</span><span class="p">(</span><span class="n">ivar</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">__slot_capacity__</span> <span class="o">&gt;</span> <span class="n">size</span>
      <span class="nb">self</span><span class="p">[</span><span class="n">size</span><span class="p">].</span><span class="nf">instance_variable_get</span><span class="p">(</span><span class="n">ivar</span><span class="p">)</span>
    <span class="k">else</span>
      <span class="c1"># use the generic instance variables table</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>That’s essentially the same strategy as with classes and modules.</p>
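
<p>For readers who want something they can run, the same idea can be modelled in plain Ruby. This is a sketch under heavy assumptions: <code class="language-plaintext highlighter-rouge">ToySlot</code> and <code class="language-plaintext highlighter-rouge">GENERIC_FIELDS</code> are invented names, a slot is just an array of cells, and the fields buffer is a regular hash.</p>

```ruby
# Toy model, not VM code: a "slot" is a fixed-capacity array holding the
# struct members, and GENERIC_FIELDS stands in for the global table.
GENERIC_FIELDS = {}

class ToySlot
  def initialize(capacity, members)
    @capacity = capacity
    @members_count = members.size
    @cells = members.dup
  end

  # True when there is room for one more reference after the last member.
  def spare_cell?
    @capacity > @members_count
  end

  def ivar_set(name, value)
    fields_for_write[name] = value
  end

  def ivar_get(name)
    fields = spare_cell? ? @cells[@members_count] : GENERIC_FIELDS[object_id]
    fields && fields[name]
  end

  private

  def fields_for_write
    if spare_cell?
      # Direct reference stored right after the last member.
      @cells[@members_count] ||= {}
    else
      # Fallback: go through the global table.
      GENERIC_FIELDS[object_id] ||= {}
    end
  end
end

roomy = ToySlot.new(3, [1, 2]) # one spare cell after the two members
full  = ToySlot.new(2, [1, 2]) # no room left, falls back to the table

roomy.ivar_set(:@a, :direct)
full.ivar_set(:@a, :via_table)

puts roomy.ivar_get(:@a)  # direct
puts full.ivar_get(:@a)   # via_table
puts GENERIC_FIELDS.size  # 1
```

<p>Only the slot without spare room ever touches the shared table, which is the whole point of the change.</p>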

<p>At least on paper, that was quite easy because <a href="https://github.com/ruby/ruby/pull/13626">a few weeks prior, I had refactored the generic instance variables to use the same underlying managed object as classes</a>: <code class="language-plaintext highlighter-rouge">T_IMEMO/fields</code>.</p>

<p>Once again, <a href="https://github.com/ruby/ruby/pull/14095">I paired with Étienne Barrié to implement that idea</a>, but the resulting PR was way larger and more complex than I had hoped for, because of a lack of encapsulation.</p>

<p>In many places across the VM, when dealing with instance variables, you have a similar big <code class="language-plaintext highlighter-rouge">switch/case</code> statement with
a branch for each of the 3 or 4 possible types of object layouts.
So making <code class="language-plaintext highlighter-rouge">T_STRUCT</code> different would mean adding one more code path in all these places, which would leave me with a bad taste in my mouth.</p>

<p>That’s why I backtracked a bit and decided to start by <a href="https://github.com/ruby/ruby/pull/14107">refactoring the generic instance variables table, so that all accesses go through a very small number of functions</a>.
After that, all reads and writes to the table went through mostly just two functions, making it the perfect place to specialize the behavior for struct objects.</p>

<p>As a bit of a sidenote, the more I work on the Ruby VM, the more I realize the challenging part isn’t to come up with a brilliant idea,
or a clever algorithm, but the sheer effort required to refactor code without breaking everything.
The C language doesn’t have a lot of features for abstractions and encapsulation, so coupling is absolutely everywhere.</p>

<p>Anyways, with that refactoring done, I was able to re-implement <a href="https://github.com/ruby/ruby/pull/14129">the same pull request we did with Étienne, but half the size</a>, most of it being just tests, documentation, and benchmarking code.</p>

<p>Now the generic instance variable lookup function looks like this:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">VALUE</span>
<span class="nf">rb_obj_fields</span><span class="p">(</span><span class="n">VALUE</span> <span class="n">obj</span><span class="p">,</span> <span class="n">ID</span> <span class="n">field_name</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">RUBY_ASSERT</span><span class="p">(</span><span class="o">!</span><span class="n">RB_TYPE_P</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">T_IMEMO</span><span class="p">));</span>
    <span class="n">ivar_ractor_check</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">field_name</span><span class="p">);</span>

    <span class="n">VALUE</span> <span class="n">fields_obj</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">rb_shape_obj_has_fields</span><span class="p">(</span><span class="n">obj</span><span class="p">))</span> <span class="p">{</span>
        <span class="k">switch</span> <span class="p">(</span><span class="n">BUILTIN_TYPE</span><span class="p">(</span><span class="n">obj</span><span class="p">))</span> <span class="p">{</span>
          <span class="k">case</span> <span class="n">T_STRUCT</span><span class="p">:</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">LIKELY</span><span class="p">(</span><span class="o">!</span><span class="n">FL_TEST_RAW</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">RSTRUCT_GEN_FIELDS</span><span class="p">)))</span> <span class="p">{</span>
                <span class="n">fields_obj</span> <span class="o">=</span> <span class="n">RSTRUCT_FIELDS_OBJ</span><span class="p">(</span><span class="n">obj</span><span class="p">);</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="c1">// fall through</span>
          <span class="nl">default:</span>
            <span class="n">RB_VM_LOCKING</span><span class="p">()</span> <span class="p">{</span>
                <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">st_lookup</span><span class="p">(</span><span class="n">generic_fields_tbl_</span><span class="p">,</span> <span class="p">(</span><span class="n">st_data_t</span><span class="p">)</span><span class="n">obj</span><span class="p">,</span> <span class="p">(</span><span class="n">st_data_t</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">fields_obj</span><span class="p">))</span> <span class="p">{</span>
                    <span class="n">rb_bug</span><span class="p">(</span><span class="s">"Object is missing entry in generic_fields_tbl"</span><span class="p">);</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">fields_obj</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>When dealing with a <code class="language-plaintext highlighter-rouge">T_STRUCT</code> and if there’s some unused space in the slot, we entirely bypass the <code class="language-plaintext highlighter-rouge">generic_fields_tbl</code> and <code class="language-plaintext highlighter-rouge">RB_VM_LOCKING</code>.</p>

<p>And to ensure we don’t fall in the fallback path too much, we modified the Struct allocator to allocate large enough slots for structs
that have instance variables:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">VALUE</span>
<span class="nf">struct_alloc</span><span class="p">(</span><span class="n">VALUE</span> <span class="n">klass</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">long</span> <span class="n">n</span> <span class="o">=</span> <span class="n">num_members</span><span class="p">(</span><span class="n">klass</span><span class="p">);</span>
    <span class="kt">size_t</span> <span class="n">embedded_size</span> <span class="o">=</span> <span class="n">offsetof</span><span class="p">(</span><span class="k">struct</span> <span class="n">RStruct</span><span class="p">,</span> <span class="n">as</span><span class="p">.</span><span class="n">ary</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">VALUE</span><span class="p">)</span> <span class="o">*</span> <span class="n">n</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">RCLASS_MAX_IV_COUNT</span><span class="p">(</span><span class="n">klass</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">embedded_size</span> <span class="o">+=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">VALUE</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// snip...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As a result, instance variable accesses in structs are now noticeably faster, even when no ractor is involved:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>compare-ruby: ruby 3.5.0dev (2025-08-06T12:50:36Z struct-ivar-fields-2 9a30d141a1) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-06T12:57:59Z struct-ivar-fields-2 2ff3ec237f) +PRISM [arm64-darwin24]
warming up.....

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|member_reader         |    590.317k|  579.246k|
|                      |       1.02x|         -|
|member_writer         |    543.963k|  527.104k|
|                      |       1.03x|         -|
|member_reader_method  |    213.540k|  213.004k|
|                      |       1.00x|         -|
|member_writer_method  |    192.657k|  191.491k|
|                      |       1.01x|         -|
|ivar_reader           |    403.993k|  569.915k|
|                      |           -|     1.41x|
</code></pre></div></div>

<p>That was a satisfying change.</p>

<h2 id="generalizing-to-other-types">Generalizing to Other Types</h2>

<p>Now that we had a working pattern, the question was where else we could apply it.</p>

<p>I definitely knew instance variables on <code class="language-plaintext highlighter-rouge">T_STRING</code> are rather common, given I’m very familiar with <code class="language-plaintext highlighter-rouge">ActiveSupport::SafeBuffer</code>, so I thought about pulling a similar trick for them.</p>

<p>Unfortunately, what made this possible with <code class="language-plaintext highlighter-rouge">T_STRUCT</code> is that they are essentially fixed-size arrays.
Which means we know that whatever free space is left in the slot won’t ever be needed in the future.</p>

<p>Whereas other types like <code class="language-plaintext highlighter-rouge">T_STRING</code> and <code class="language-plaintext highlighter-rouge">T_ARRAY</code> are variable size.
If you start storing a reference in free space at the end of the slot, you then need to be very careful that if the user appends
to the string or array, it won’t overwrite that reference. That’s much harder to do and probably not worth the extra complexity.</p>

<p>But one of my favorite things with Ruby and Rails is to be able to optimize from both ends.
If some pattern Rails uses isn’t very performant, I can try to optimize Ruby, but I can also just change what Rails does.</p>

<p>In the case of <code class="language-plaintext highlighter-rouge">ActiveSupport::SafeBuffer</code>, all we’re storing is just a boolean: <code class="language-plaintext highlighter-rouge">@html_safe = true</code>, and eventually, if something is appended to the buffer, the flag will be flipped.
But appends into safe buffers are very rare.</p>

<p>Most of the time, <code class="language-plaintext highlighter-rouge">String#html_safe</code> is only used as a way to tag the string, to indicate that it doesn’t need to be escaped when it’s later appended into another buffer. In other words, the overwhelming majority of instances never flip that flag.</p>

<p>Based on that knowledge, <a href="https://github.com/rails/rails/pull/55352">I changed that variable to be a negative</a>.
Instead of starting with <code class="language-plaintext highlighter-rouge">@html_safe = true</code>, we can start with <code class="language-plaintext highlighter-rouge">@html_unsafe = false</code>, and since referencing an instance
variable that doesn’t exist evaluates to <code class="language-plaintext highlighter-rouge">nil</code>, which is also falsy, we can simply not set the variable at all.</p>
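
<p>A minimal sketch of that trick, assuming a simplified stand-in class rather than the real <code class="language-plaintext highlighter-rouge">ActiveSupport::SafeBuffer</code> (the names <code class="language-plaintext highlighter-rouge">TaggedBuffer</code> and <code class="language-plaintext highlighter-rouge">mark_unsafe!</code> are hypothetical):</p>

```ruby
# Simplified stand-in for a safe buffer; not the actual Rails code.
class TaggedBuffer < String
  # An unset instance variable reads as nil, so a fresh buffer is safe
  # without ever assigning anything.
  def html_safe?
    !@html_unsafe
  end

  # The variable is only created in the rare case the flag is flipped.
  def mark_unsafe!
    @html_unsafe = true
    self
  end
end

buffer = TaggedBuffer.new("<b>hello</b>")
p buffer.html_safe?          # true
p buffer.instance_variables  # []

buffer.mark_unsafe!
p buffer.html_safe?          # false
p buffer.instance_variables  # [:@html_unsafe]
```

<p>As long as the flag is never flipped, the string carries no instance variable at all, so there is nothing to look up in the generic table.</p>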

<p>The result made <code class="language-plaintext highlighter-rouge">String#html_safe</code> twice as fast, even when no Ractor is started:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ruby 3.5.0dev (2025-07-17T14:01:57Z master a46309d19a) +YJIT +PRISM [arm64-darwin24]
Calculating -------------------------------------
    String#html_safe (old)     6.421M (± 1.6%) i/s  (155.75 ns/i) -     32.241M in   5.022802s
    String#html_safe          12.470M (± 0.8%) i/s   (80.19 ns/i) -     63.140M in   5.063698s
</code></pre></div></div>

<p>I guess this is a good example of <a href="https://en.wiktionary.org/wiki/mechanical_sympathy">mechanical sympathy</a><sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>: the more you know about how the tools you are using work, the more effectively you can use them.</p>

<p>And now that I learned about <code class="language-plaintext highlighter-rouge">ActiveSupport::Inflector::Inflections::Uncountables</code>, I should probably change it in a similar way.</p>

<p>But the one other type that I thought was worth paying attention to was <code class="language-plaintext highlighter-rouge">T_DATA</code>.</p>

<h2 id="typeddata">TypedData</h2>

<p>Until just a few months ago, <code class="language-plaintext highlighter-rouge">T_DATA</code> slots were fully used; here’s the <code class="language-plaintext highlighter-rouge">RTypedData</code> C struct in Ruby 3.4,
I added some annotations with the size of each field:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">RTypedData</span> <span class="p">{</span>
    <span class="cm">/** The part that all ruby objects have in common. */</span>
    <span class="k">struct</span> <span class="n">RBasic</span> <span class="n">basic</span><span class="p">;</span> <span class="c1">// 16B</span>

    <span class="cm">/**
     * This field  stores various  information about how  Ruby should  handle a
     * data.   This roughly  resembles a  Ruby level  class (apart  from method
     * definition etc.)
     */</span>
    <span class="k">const</span> <span class="n">rb_data_type_t</span> <span class="o">*</span><span class="k">const</span> <span class="n">type</span><span class="p">;</span> <span class="c1">// 8B</span>

    <span class="cm">/**
     * This has to be always 1.
     *
     * @internal
     */</span>
    <span class="k">const</span> <span class="n">VALUE</span> <span class="n">typed_flag</span><span class="p">;</span> <span class="c1">// 8B</span>

    <span class="cm">/** Pointer to the actual C level struct that you want to wrap. */</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">;</span> <span class="c1">// 8B</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Just quickly, the first <code class="language-plaintext highlighter-rouge">16B</code> was used for the common header all Ruby objects share, <code class="language-plaintext highlighter-rouge">8B</code> was used to store a pointer
to another struct that gives information to Ruby on what to do with this object, for instance, how to garbage collect it.</p>

<p>And then two other <code class="language-plaintext highlighter-rouge">8B</code> values, one pointing to arbitrary memory a C extension might have allocated, and then <code class="language-plaintext highlighter-rouge">typed_flag</code>.
If you read the comment associated with <code class="language-plaintext highlighter-rouge">typed_flag</code>, you may wonder what purpose it can possibly serve.</p>

<p>It’s there because <code class="language-plaintext highlighter-rouge">RTypedData</code> is the newer API for C extensions that was introduced in 2009 by Koichi Sasada.
Historically, when you needed to wrap a piece of native memory in a Ruby object, you’d use the <code class="language-plaintext highlighter-rouge">RData</code> API, and you had to
supply:</p>

<ul>
  <li>A pointer to the memory region.</li>
  <li>A marking function for the GC.</li>
  <li>A free function for the GC.</li>
</ul>

<p>That older, deprecated API is still there today, and you can see the struct that backs it up:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * @deprecated
 *
 * Old  "untyped"  user  data.   It  has  roughly  the  same  usage  as  struct
 * ::RTypedData, but lacked several features such as support for compaction GC.
 * Use of this struct is not recommended  any longer.  If it is dead necessary,
 * please inform the core devs about your usage.
 *
 * @internal
 *
 * @shyouhei tried to add RBIMPL_ATTR_DEPRECATED for this type but that yielded
 * too many warnings  in the core.  Maybe  we want to retry  later...  Just add
 * deprecated document for now.
 */</span>
<span class="k">struct</span> <span class="n">RData</span> <span class="p">{</span>

    <span class="cm">/** Basic part, including flags and class. */</span>
    <span class="k">struct</span> <span class="n">RBasic</span> <span class="n">basic</span><span class="p">;</span>

    <span class="cm">/**
     * This function is called when the object is experiencing GC marks.  If it
     * contains references to  other Ruby objects, you need to  mark them also.
     * Otherwise GC will smash your data.
     *
     * @see      rb_gc_mark()
     * @warning  This  is  called  during  GC  runs.   Object  allocations  are
     *           impossible at that moment (that is why GC runs).
     */</span>
    <span class="n">RUBY_DATA_FUNC</span> <span class="n">dmark</span><span class="p">;</span>

    <span class="cm">/**
     * This function is called when the object  is no longer used.  You need to
     * do whatever necessary to avoid memory leaks.
     *
     * @warning  This  is  called  during  GC  runs.   Object  allocations  are
     *           impossible at that moment (that is why GC runs).
     */</span>
    <span class="n">RUBY_DATA_FUNC</span> <span class="n">dfree</span><span class="p">;</span>

    <span class="cm">/** Pointer to the actual C level struct that you want to wrap. */</span>
    <span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>So in various places in the Ruby VM, when you interact with a <code class="language-plaintext highlighter-rouge">T_DATA</code> object, you need to know if it’s a <code class="language-plaintext highlighter-rouge">RTypedData</code> or a <code class="language-plaintext highlighter-rouge">RData</code>
before you can do much of anything with it.</p>

<p>That’s where <code class="language-plaintext highlighter-rouge">typed_flag</code> comes in. It’s at the same offset in the <code class="language-plaintext highlighter-rouge">RTypedData</code> struct as the <code class="language-plaintext highlighter-rouge">dfree</code> pointer in the <code class="language-plaintext highlighter-rouge">RData</code> struct, and for various reasons, it’s impossible for a legitimate C function pointer to be strictly equal to <code class="language-plaintext highlighter-rouge">1</code>.</p>

<p>That’s why <code class="language-plaintext highlighter-rouge">typed_flag</code> is always <code class="language-plaintext highlighter-rouge">1</code>: it allows us to check if a <code class="language-plaintext highlighter-rouge">T_DATA</code> is typed by checking <code class="language-plaintext highlighter-rouge">rdata-&gt;dfree == 1</code>.</p>

<p>Now you might wonder why I’m telling you all of this.
Well, it’s because that <code class="language-plaintext highlighter-rouge">typed_flag</code> field is using <code class="language-plaintext highlighter-rouge">8B</code> of space to store exactly <code class="language-plaintext highlighter-rouge">1bit</code> of information, and that has bugged me for several years.</p>

<p>Even though truth be told, the comment is outdated, and the field can also sometimes be <code class="language-plaintext highlighter-rouge">3</code> as <a href="https://railsatscale.com/2025-06-03-implementing-embedded-typeddata-objects/">we piggy-backed on it with Peter Zhu last year to implement embedded TypedData objects</a>.
But that’s still 32 times more than needed, so if someone could think of a better place to store these two bits, that would free an entire <code class="language-plaintext highlighter-rouge">8B</code> to store a direct reference to the <code class="language-plaintext highlighter-rouge">T_IMEMO/fields</code>.</p>

<h2 id="enter-set-man">Enter Set Man</h2>

<p>Well, it turns out that someone did earlier this year.</p>

<p>Just before the RubyKaigi developer meeting, Jeremy Evans <a href="https://bugs.ruby-lang.org/issues/21216">proposed to turn <code class="language-plaintext highlighter-rouge">Set</code> into a core class, and to reimplement it in C</a>, and that was accepted.
Later during the conference, he asked me to <a href="https://github.com/ruby/ruby/pull/13074">review his usage of the RTypedData API</a>, and I suggested a bunch of improvements to make <code class="language-plaintext highlighter-rouge">Set</code>
objects smaller and reduce pointer chasing by leveraging embedded RTypedData objects.</p>

<p>But it turns out that there was a bit of an annoying tradeoff here. The <code class="language-plaintext highlighter-rouge">RTypedData</code> struct is <code class="language-plaintext highlighter-rouge">40B</code> large, but when used embedded, we recycle the <code class="language-plaintext highlighter-rouge">data</code> pointer, so it’s only <code class="language-plaintext highlighter-rouge">32B</code> large,
and the <code class="language-plaintext highlighter-rouge">set_table</code> struct Jeremy needed to store is <code class="language-plaintext highlighter-rouge">56B</code>, for a total of <code class="language-plaintext highlighter-rouge">88B</code>, which is a particularly annoying number.</p>

<p>Not because of the meaning some distasteful people attribute to it, but because it is just <code class="language-plaintext highlighter-rouge">8B</code> too large to fit in a standard <code class="language-plaintext highlighter-rouge">80B</code> GC slot, hence if we marked it as embedded, the footprint would grow from <code class="language-plaintext highlighter-rouge">40 + 56 = 96B</code> to <code class="language-plaintext highlighter-rouge">160B</code> with lots of wasted space.</p>
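
<p>The arithmetic can be checked with a few lines of Ruby. This is a sketch, assuming the slot sizes usually found on 64-bit builds (<code class="language-plaintext highlighter-rouge">40B</code>, <code class="language-plaintext highlighter-rouge">80B</code>, <code class="language-plaintext highlighter-rouge">160B</code>, <code class="language-plaintext highlighter-rouge">320B</code> and <code class="language-plaintext highlighter-rouge">640B</code>) and ignoring any overhead from <code class="language-plaintext highlighter-rouge">malloc</code> itself; <code class="language-plaintext highlighter-rouge">slot_size_for</code> is a hypothetical helper, not a VM function.</p>

```ruby
# Slot sizes are an assumption based on common 64-bit builds.
SLOT_SIZES = [40, 80, 160, 320, 640].freeze

# Smallest slot that can hold `bytes`.
def slot_size_for(bytes)
  SLOT_SIZES.find { |size| size >= bytes } or raise ArgumentError, "too large to embed"
end

RTYPEDDATA_SIZE = 40                  # full struct, per the annotations above
EMBEDDED_HEADER = RTYPEDDATA_SIZE - 8 # the data pointer is recycled when embedded
SET_TABLE_SIZE  = 56

# Not embedded: a 40B slot plus a separately allocated 56B table.
separate = slot_size_for(RTYPEDDATA_SIZE) + SET_TABLE_SIZE
# Embedded: header and table must fit in a single slot.
embedded = slot_size_for(EMBEDDED_HEADER + SET_TABLE_SIZE)

puts separate # 96
puts embedded # 160
# If one more 8B field could be dropped from the header:
puts slot_size_for(EMBEDDED_HEADER - 8 + SET_TABLE_SIZE) # 80
```

<p>Being a mere <code class="language-plaintext highlighter-rouge">8B</code> over the limit is enough to double the slot size, which is why shaving a single field matters so much here.</p>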

<p>In all honesty, it wasn’t a massive problem unless your application is using a massive amount of sets, but it seems that it really bothered Jeremy.</p>

<p>What he came up with a couple of weeks later was that <a href="https://github.com/ruby/ruby/pull/13190">he moved these two bits of memory into the low bits of <code class="language-plaintext highlighter-rouge">RTypedData.type</code> and <code class="language-plaintext highlighter-rouge">RData.dmark</code></a>,
freeing <code class="language-plaintext highlighter-rouge">8B</code> per embedded TypedData object and allowing <code class="language-plaintext highlighter-rouge">Set</code> objects to fit in 80B.</p>

<p>Here again, the assumption was that because of alignment rules, the three lower bits of pointers can’t ever be set, so we can store our own information in there.</p>
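
<p>Pointer tagging is easier to picture with plain integers. The sketch below models an address as a Ruby <code class="language-plaintext highlighter-rouge">Integer</code>; the address itself and the <code class="language-plaintext highlighter-rouge">tag</code> / <code class="language-plaintext highlighter-rouge">untag</code> helpers are made up for illustration and are not the macros used in the VM.</p>

```ruby
# Pointers are modelled as plain integers; the address below is made up.
TAG_MASK = 0b111 # the three low bits an 8-byte aligned address never uses

# Store small flags in the unused low bits of an aligned address.
def tag(address, flags)
  raise ArgumentError, "address is not 8-byte aligned" unless (address & TAG_MASK).zero?
  raise ArgumentError, "flags do not fit in three bits" unless (flags & ~TAG_MASK).zero?
  address | flags
end

# Split a tagged value back into the original address and the flags.
def untag(tagged)
  [tagged & ~TAG_MASK, tagged & TAG_MASK]
end

type_pointer = 0x7f00_dead_bee8  # hypothetical aligned address
tagged = tag(type_pointer, 0b11) # two flag bits stored in the pointer

address, flags = untag(tagged)
puts address == type_pointer # true
puts flags                   # 3
```

<p>The cost is that every read of the pointer has to mask the low bits out first, but that is a single cheap instruction.</p>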

<p>But now, I think <a href="https://github.com/ruby/ruby/pull/14134">this space could be put to better use to store a reference to a companion <code class="language-plaintext highlighter-rouge">T_IMEMO/fields</code></a>, so we could skip the global instance variables table.
The problem is that here again it’s a matter of tradeoff. We can waste some memory to save some CPU cycles; which is better is really just a judgment call.</p>

<p>Just like this issue bothered Jeremy a few months back, it now bothered me, and I went searching for a way to save another <code class="language-plaintext highlighter-rouge">8B</code> in <code class="language-plaintext highlighter-rouge">Set</code> objects.</p>

<h2 id="shrinking-set">Shrinking Set</h2>

<p>Hence, I started to stare at the <code class="language-plaintext highlighter-rouge">struct set_table</code> while frowning my eyebrows in the hope of spotting some redundant or superfluous member I could eliminate:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">set_table</span> <span class="p">{</span>
    <span class="cm">/* Cached features of the table -- see st.c for more details.  */</span>
    <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">entry_power</span><span class="p">,</span> <span class="n">bin_power</span><span class="p">,</span> <span class="n">size_ind</span><span class="p">;</span>
    <span class="cm">/* How many times the table was rebuilt.  */</span>
    <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">rebuilds_num</span><span class="p">;</span>
    <span class="k">const</span> <span class="k">struct</span> <span class="n">st_hash_type</span> <span class="o">*</span><span class="n">type</span><span class="p">;</span>
    <span class="cm">/* Number of entries currently in the table.  */</span>
    <span class="n">st_index_t</span> <span class="n">num_entries</span><span class="p">;</span>
    <span class="cm">/* Array of bins used for access by keys.  */</span>
    <span class="n">st_index_t</span> <span class="o">*</span><span class="n">bins</span><span class="p">;</span>
    <span class="cm">/* Start and bound index of entries in array entries.
       entries_starts and entries_bound are in interval
       [0,allocated_entries].  */</span>
    <span class="n">st_index_t</span> <span class="n">entries_start</span><span class="p">,</span> <span class="n">entries_bound</span><span class="p">;</span>
    <span class="cm">/* Array of size 2^entry_power.  */</span>
    <span class="n">set_table_entry</span> <span class="o">*</span><span class="n">entries</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>I was first attracted to the trio of <code class="language-plaintext highlighter-rouge">num_entries</code>, <code class="language-plaintext highlighter-rouge">entries_start</code>, and <code class="language-plaintext highlighter-rouge">entries_bound</code>. All of these are <code class="language-plaintext highlighter-rouge">8B</code> integers, so if I could eliminate just one of them, I’d be set<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>.</p>
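
<p>To see where the <code class="language-plaintext highlighter-rouge">56B</code> come from, and what removing one member would buy, here is a rough layout calculation. It assumes a typical 64-bit platform (8-byte pointers and <code class="language-plaintext highlighter-rouge">st_index_t</code>, 4-byte <code class="language-plaintext highlighter-rouge">unsigned int</code>) and the usual C padding rules; <code class="language-plaintext highlighter-rouge">struct_size</code> is a helper written for this example.</p>

```ruby
# Sizes and alignments assume a typical 64-bit (LP64) platform.
FIELDS = [
  [:entry_power,   1, 1], # [name, size, alignment]
  [:bin_power,     1, 1],
  [:size_ind,      1, 1],
  [:rebuilds_num,  4, 4],
  [:type,          8, 8],
  [:num_entries,   8, 8],
  [:bins,          8, 8],
  [:entries_start, 8, 8],
  [:entries_bound, 8, 8],
  [:entries,       8, 8],
].freeze

# Round `offset` up to the next multiple of `alignment`.
def align_up(offset, alignment)
  (offset + alignment - 1) / alignment * alignment
end

# Lay the fields out in order, padding each one to its alignment, then
# pad the total to the largest alignment.
def struct_size(fields)
  offset = 0
  fields.each do |_name, size, alignment|
    offset = align_up(offset, alignment) + size
  end
  align_up(offset, fields.map { |field| field[2] }.max)
end

puts struct_size(FIELDS)                                             # 56
puts struct_size(FIELDS.reject { |name, *| name == :entries_bound }) # 48
```

<p>The three <code class="language-plaintext highlighter-rouge">unsigned char</code> members and <code class="language-plaintext highlighter-rouge">rebuilds_num</code> already share a single 8-byte word, so the realistic savings have to come from one of the 8-byte members.</p>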

<p>Without being really intimate with the set implementation, I guessed that surely, if you know how many entries you have, you don’t need both the offset of the start and end of the entries list.
So in theory, I could just replace every reference to <code class="language-plaintext highlighter-rouge">entries_bound</code> by <code class="language-plaintext highlighter-rouge">entries_start + num_entries</code>.</p>

<p>What I do when I experiment with code I’m not fully familiar with, is that I try to prove my assumptions.
Here I wrote a small helper function:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="n">st_index_t</span>
<span class="nf">set_entries_bound</span><span class="p">(</span><span class="k">const</span> <span class="k">struct</span> <span class="n">set_table</span> <span class="o">*</span><span class="n">set</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">RUBY_ASSERT</span><span class="p">(</span><span class="n">set</span><span class="o">-&gt;</span><span class="n">entries_start</span> <span class="o">+</span> <span class="n">set</span><span class="o">-&gt;</span><span class="n">num_entries</span> <span class="o">==</span> <span class="n">set</span><span class="o">-&gt;</span><span class="n">entries_bound</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">set</span><span class="o">-&gt;</span><span class="n">entries_bound</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And then went over the code to replace all the direct accesses to <code class="language-plaintext highlighter-rouge">set-&gt;entries_bound</code> by my helper, and tried to run the test suite to see if that <code class="language-plaintext highlighter-rouge">RUBY_ASSERT</code> would trip or not.</p>

<p>Well, turns out it wasn’t that simple… After seeing the test suite light up like a Christmas tree, I dug into the code
helped by the backtraces in the crash reports, and realized the <code class="language-plaintext highlighter-rouge">entries_bound</code> doesn’t always match the entries’ size.
There is even a comment about it in the code:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="cm">/* Do not update entries_bound here.  Otherwise, we can fill all
       bins by deleted entry value before rebuilding the table.  */</span>
</code></pre></div></div>

<p>So that was a bust, and I went back to the drawing board.</p>

<p>After some more staring and eyebrow frowning, I got another idea.</p>

<p>Ruby’s hash-tables (Ruby sets are hash-sets) are ordered.
Hence, you can see them as the combination of a regular unordered hash table and a classic array. The hash-table values are just offsets into that array.</p>

<p>Here, the hash-table part is the <code class="language-plaintext highlighter-rouge">st_index_t *bins</code>, and the array part is <code class="language-plaintext highlighter-rouge">set_table_entry *entries</code>.</p>
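<p>To make that layout concrete, here is a toy model in Ruby. It is only a sketch: the class name <code class="language-plaintext highlighter-rouge">TinyOrderedSet</code> and its methods are made up for illustration, and a plain Ruby <code class="language-plaintext highlighter-rouge">Hash</code> stands in for the unordered bins (even though Ruby’s own <code class="language-plaintext highlighter-rouge">Hash</code> is already ordered). The real implementation is C code working on raw memory.</p>

```ruby
# Toy model only: hypothetical class, not how the VM is implemented.
class TinyOrderedSet
  def initialize
    @bins = {}     # element => offset into @entries (the "hash table" part)
    @entries = []  # elements in insertion order (the "array" part)
  end

  def add(element)
    unless @bins.key?(element)
      # The bin only records where the element lives in the entries array.
      @bins[element] = @entries.size
      @entries.push(element)
    end
    self
  end

  def include?(element)
    # Lookups go through the bin to find the offset, then read the entry.
    offset = @bins[element]
    !offset.nil? && @entries[offset].eql?(element)
  end

  def to_a
    @entries.dup
  end
end

set = TinyOrderedSet.new
[3, 1, 3, 2].each { |n| set.add(n) }
p set.to_a # => [3, 1, 2]
```

<p>Iteration only ever walks the array part, which is what gives the insertion ordering, while the bins are only used to find an entry quickly.</p>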

<p>Both of these are memory regions allocated with <code class="language-plaintext highlighter-rouge">malloc</code>, and they are grown and shrunk at the same time when you add or remove elements from the set.</p>

<p>Hence, if we can know how large one of them is, we could allocate both with a single <code class="language-plaintext highlighter-rouge">malloc</code>, and then access the other by simply skipping over the first one.</p>

<p>In this case, the size of <code class="language-plaintext highlighter-rouge">set_table.bins</code> is indicated by <code class="language-plaintext highlighter-rouge">set_table.entry_power</code>:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Return size of the allocated bins of table TAB.  */</span>
<span class="k">static</span> <span class="kr">inline</span> <span class="n">st_index_t</span>
<span class="nf">set_bins_size</span><span class="p">(</span><span class="k">const</span> <span class="n">set_table</span> <span class="o">*</span><span class="n">tab</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">features</span><span class="p">[</span><span class="n">tab</span><span class="o">-&gt;</span><span class="n">entry_power</span><span class="p">].</span><span class="n">bins_words</span> <span class="o">*</span> <span class="k">sizeof</span> <span class="p">(</span><span class="n">st_index_t</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>That’s how <a href="https://github.com/ruby/ruby/commit/9250ece276bae357a6ac42cb832c67bbfab0eb01">with a relatively small patch, I was able to save 8B from <code class="language-plaintext highlighter-rouge">struct set_table</code></a>,
which could allow us to keep <code class="language-plaintext highlighter-rouge">Set</code> objects in <code class="language-plaintext highlighter-rouge">80B</code> slots even if we make embedded <code class="language-plaintext highlighter-rouge">RTypedData</code> <code class="language-plaintext highlighter-rouge">32B</code> again.</p>

<p>However, I still need to run some benchmarks to make sure this patch wouldn’t degrade set performance significantly.</p>

<h2 id="lookup-cache">Lookup Cache</h2>

<p>For some remaining types like <code class="language-plaintext highlighter-rouge">T_STRING</code>, <code class="language-plaintext highlighter-rouge">T_ARRAY</code>, or <code class="language-plaintext highlighter-rouge">T_HASH</code>, it’s unlikely we’ll ever find space in their slots for an extra reference.
So I had another idea to speed up accesses and reduce contention.</p>

<p>The core of the assumption is that whenever we look up the instance variables of an object, there is a high chance that the next lookup will be for the same object.</p>

<p>So what if we kept a cache of the last object we looked up, and its associated <code class="language-plaintext highlighter-rouge">T_IMEMO/fields</code>?</p>

<p>In pseudo-ruby:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">GenericIvarObject</span>
  <span class="no">GENERIC_FIELDS_TBL</span> <span class="o">=</span> <span class="no">Hash</span><span class="p">.</span><span class="nf">new</span><span class="p">.</span><span class="nf">compare_by_identity</span>

  <span class="k">def</span> <span class="nf">instance_variable_get</span><span class="p">(</span><span class="n">ivar_name</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">ivar_shape</span> <span class="o">=</span> <span class="nb">self</span><span class="p">.</span><span class="nf">shape</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="n">ivar_name</span><span class="p">)</span>
      <span class="n">fields_obj</span> <span class="o">=</span> <span class="k">if</span> <span class="no">Fiber</span><span class="p">[</span><span class="ss">:__last_obj__</span><span class="p">]</span> <span class="o">==</span> <span class="nb">self</span>
        <span class="no">Fiber</span><span class="p">[</span><span class="ss">:__last_fields__</span><span class="p">]</span>
      <span class="k">else</span>
        <span class="no">Fiber</span><span class="p">[</span><span class="ss">:__last_obj__</span><span class="p">]</span> <span class="o">=</span> <span class="nb">self</span>
        <span class="no">Fiber</span><span class="p">[</span><span class="ss">:__last_fields__</span><span class="p">]</span> <span class="o">=</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
          <span class="no">GENERIC_FIELDS_TBL</span><span class="p">[</span><span class="nb">self</span><span class="p">]</span>
        <span class="k">end</span>
      <span class="k">end</span>

      <span class="n">fields_obj</span><span class="p">.</span><span class="nf">instance_variable_get</span><span class="p">(</span><span class="n">ivar_name</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Given that the cache is in fiber local storage, we don’t need to protect it with a lock.</p>
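<p>The same idea can be tried out in plain Ruby. What follows is a rough sketch under several assumptions: <code class="language-plaintext highlighter-rouge">FIELDS_TABLE</code>, <code class="language-plaintext highlighter-rouge">TABLE_LOCK</code>, <code class="language-plaintext highlighter-rouge">LOCKED_LOOKUPS</code> and <code class="language-plaintext highlighter-rouge">fields_for</code> are hypothetical names, a <code class="language-plaintext highlighter-rouge">Mutex</code> plays the role of the VM lock, and the cache lives in <code class="language-plaintext highlighter-rouge">Thread.current[]</code>, which is fiber-local in Ruby. It also ignores cache invalidation, which a real implementation would have to handle.</p>

```ruby
# Hypothetical stand-ins for the VM-level structures described above.
FIELDS_TABLE = {}.compare_by_identity # global object => fields table
TABLE_LOCK = Mutex.new                # plays the role of the VM lock
LOCKED_LOOKUPS = [0]                  # counts slow-path lookups, for demonstration only

def fields_for(obj)
  # Thread#[] is fiber-local, so each fiber gets its own one-entry cache
  # and no lock is needed to read or write it.
  cache = Thread.current
  if cache[:last_obj].equal?(obj)
    cache[:last_fields]
  else
    # Slow path: take the lock and look the object up in the shared table.
    fields = TABLE_LOCK.synchronize do
      LOCKED_LOOKUPS[0] += 1
      FIELDS_TABLE[obj]
    end
    cache[:last_obj] = obj
    cache[:last_fields] = fields
  end
end

first = +"first"
second = +"second"
FIELDS_TABLE[first] = { foo: 1 }
FIELDS_TABLE[second] = { foo: 2 }

3.times { fields_for(first) } # only the first call takes the lock
fields_for(second)            # different object, so the lock is taken again
```

<p>In this toy version, repeated lookups on the same object only hit the locked table once, which is the whole bet behind the cache.</p>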

<p>I have <a href="https://github.com/ruby/ruby/pull/14132">a draft patch for that idea</a> that I need to polish and benchmark, but I like that it’s quite simple.</p>

<h2 id="future-work">Future Work</h2>

<p>Ultimately, for the remaining cases, it would be good if the Ruby VM had a proper concurrent-map implementation to allow lock-free lookups into the generic instance variables table.
However, concurrent maps are <em>hard</em>, so it might not happen any time soon.</p>

<p>In the meantime, for the more important types like <code class="language-plaintext highlighter-rouge">T_STRUCT</code> and <code class="language-plaintext highlighter-rouge">T_DATA</code>, we now have solutions, either already merged or potentially soon to be, and for others, we have a way to reduce how often we look up the table.
And all that improves performance for both single-threaded and multi-ractor applications, so it’s a win-win.</p>

<p>My biggest concern with Ractors is that at some point we’d significantly impact single-threaded performance for the benefit of Ractors, so when we find optimizations that improve both use-cases, I’m particularly happy.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>You might think that since an object can’t be visible by more than one ractor unless it is frozen, then this isn’t a concern. But actually, since <code class="language-plaintext highlighter-rouge">object_id</code> is now essentially a memoized instance variable, it can happen. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>There was <a href="https://www.youtube.com/watch?v=wCOuJB6MEQo">a pretty good talk on that subject</a> at Euruko 2024. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>Pun intended. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ruby" /><category term="performance" /><summary type="html"><![CDATA[In two previous posts, I explained that one of the big blockers for Ractors’ viability is that while they’re supposed to run fully in parallel, in many cases, they’d perform worse than a single thread because there were numerous codepaths in the Ruby virtual machine and runtime that were still protected by the global VM lock.]]></summary></entry><entry><title type="html">What’s wrong with the JSON gem API?</title><link href="https://byroot.github.io/ruby/json/2025/08/02/whats-wrong-with-the-json-gem-api.html" rel="alternate" type="text/html" title="What’s wrong with the JSON gem API?" /><published>2025-08-02T09:03:51+00:00</published><updated>2025-08-02T09:03:51+00:00</updated><id>https://byroot.github.io/ruby/json/2025/08/02/whats-wrong-with-the-json-gem-api</id><content type="html" xml:base="https://byroot.github.io/ruby/json/2025/08/02/whats-wrong-with-the-json-gem-api.html"><![CDATA[<p>As I mentioned at the start of my <a href="/ruby/json/2024/12/15/optimizing-ruby-json-part-1.html">Optimizing Ruby’s JSON</a> series of posts,
performance isn’t why I applied to be the gem’s new maintainer.</p>

<p>The actual reason is that the gem has many APIs that I think aren’t very good, and some that are outright dangerous.</p>

<p>As a gem user, it’s easy to be annoyed at deprecations and breaking changes.
It’s noisy and creates extra work, so I entirely understand that people may suffer from deprecation fatigue.
But while you occasionally run into mostly cosmetic deprecations that aren’t really worth the churn they cause (and that annoys me a lot too),
most of the time there’s a good reason for them. That reason is just very rarely conveyed to users, and even more rarely discussed,
so let’s do that for once.</p>

<p>So I’d like to go over some of the API changes and deprecations I already implemented or will likely implement soon,
given it’s a good occasion to explain why the change is valuable, and to talk about API design more broadly.</p>

<h2 id="dealing-with-deprecations-in-ruby">Dealing With Deprecations in Ruby</h2>

<p>But before I delve into deprecated API, I’d like to mention how to effectively deal with deprecations in modern Ruby.</p>

<p>Since Ruby 2.7, warning messages emitted with <code class="language-plaintext highlighter-rouge">Kernel#warn</code> are categorized, and one of the available categories is <code class="language-plaintext highlighter-rouge">:deprecated</code>.
By default, deprecation warnings are silenced; to display them, you must enable the <code class="language-plaintext highlighter-rouge">:deprecated</code> category like so:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Warning</span><span class="p">[</span><span class="ss">:deprecated</span><span class="p">]</span> <span class="o">=</span> <span class="kp">true</span>
</code></pre></div></div>

<p>It is very highly recommended to do so in your test suite, so much so that Rails and Minitest will do it by default.</p>

<p>However, if you are using RSpec, you’ll have to do it yourself in your <code class="language-plaintext highlighter-rouge">spec_helper.rb</code> file, because we’ve tried to get
<a href="https://github.com/rspec/rspec/issues/37">RSpec to do it too for over four years now, but without success</a>.
But I’m still hopeful <a href="https://github.com/rspec/rspec/pull/161">it will eventually happen</a>.</p>

<p>Another useful thing to know about Ruby’s <code class="language-plaintext highlighter-rouge">Kernel#warn</code> method is that under the hood, it calls the <code class="language-plaintext highlighter-rouge">Warning.warn</code> method,
allowing you to redefine it and customize its behavior.</p>

<p>For instance, you could turn warnings into errors like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">Warning</span>
  <span class="k">def</span> <span class="nf">warn</span><span class="p">(</span><span class="n">message</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
    <span class="k">raise</span> <span class="n">message</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Doing so both ensures warnings aren’t missed and helps track them down, as you’ll get an exception with a full backtrace
rather than a warning that points at a single call-site that may not necessarily help you find the problem.</p>

<p>This is a pattern I use in most of my own projects, and that <a href="https://github.com/rails/rails/blob/add5a73b26e78d6b13945525874749ae40af21c7/tools/strict_warnings.rb">I also included into Rails’ own test suite</a>.
For larger projects, where being deprecation-free all the time may be complicated, there’s also the more sophisticated <a href="https://github.com/Shopify/deprecation_toolkit"><code class="language-plaintext highlighter-rouge">deprecation_toolkit</code> gem</a>.</p>
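<p>If raising on every warning is too strict for a project, a narrower variant is possible. The following is a minimal sketch, assuming Ruby 3.0 or newer (where <code class="language-plaintext highlighter-rouge">Warning.warn</code> can receive a <code class="language-plaintext highlighter-rouge">category:</code> keyword); the module name <code class="language-plaintext highlighter-rouge">RaiseOnDeprecation</code> is made up for the example. It only raises for the <code class="language-plaintext highlighter-rouge">:deprecated</code> category and lets every other warning through.</p>

```ruby
# Hypothetical module name; only the :deprecated category is turned into an error.
module RaiseOnDeprecation
  def warn(message, category: nil, **options)
    if category == :deprecated
      raise message
    else
      super # fall back to the default behavior for other warnings
    end
  end
end

Warning.extend(RaiseOnDeprecation)
Warning[:deprecated] = true

caught = nil
begin
  warn("old_api is deprecated", category: :deprecated)
rescue RuntimeError => error
  caught = error.message
end
```

<p>Extending <code class="language-plaintext highlighter-rouge">Warning</code> with a module, rather than reopening it, keeps <code class="language-plaintext highlighter-rouge">super</code> available for the warnings you don’t want to turn into errors.</p>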

<h2 id="the-create_additions-option">The create_additions Option</h2>

<p>Now, let’s start with the API that convinced me to request maintainership.</p>

<p>Do you know the difference between <code class="language-plaintext highlighter-rouge">JSON.load</code> and <code class="language-plaintext highlighter-rouge">JSON.parse</code>?</p>

<p>There’s more than one, but the main difference is that <code class="language-plaintext highlighter-rouge">JSON.load</code> has a different set of options enabled by default, and notably
one that is a massive footgun: <code class="language-plaintext highlighter-rouge">create_additions: true</code>.</p>

<p>This option is so bad that <a href="https://github.com/rubocop/rubocop/pull/3448">Rubocop’s default set of rules bans <code class="language-plaintext highlighter-rouge">JSON.load</code> outright for security reasons</a>,
and it has been involved in more than one <a href="https://discuss.rubyonrails.org/t/cve-2023-27531-possible-deserialization-of-untrusted-data-vulnerability-in-kredis-json/82467">security vulnerability</a>.</p>

<p>Let’s dig into what it does:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"json"</span>

<span class="k">class</span> <span class="nc">Point</span>
  <span class="k">class</span> <span class="o">&lt;&lt;</span> <span class="nb">self</span>
    <span class="k">def</span> <span class="nf">json_create</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
      <span class="n">new</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s2">"x"</span><span class="p">],</span> <span class="n">data</span><span class="p">[</span><span class="s2">"y"</span><span class="p">])</span>
    <span class="k">end</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
    <span class="vi">@x</span> <span class="o">=</span> <span class="n">x</span>
    <span class="vi">@y</span> <span class="o">=</span> <span class="n">y</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="n">document</span> <span class="o">=</span> <span class="o">&lt;&lt;~</span><span class="no">'JSON'</span><span class="sh">
  {
    "json_class": "Point",
    "x": 123.456,
    "y": 789.321
  }
</span><span class="no">JSON</span>

<span class="nb">p</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">document</span><span class="p">)</span>
<span class="c1"># =&gt; {"json_class" =&gt; "Point", "x" =&gt; 123.456, "y" =&gt; 789.321}</span>

<span class="nb">p</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="n">document</span><span class="p">)</span>
<span class="c1"># =&gt; #&lt;Point:0x00000001007f6d08 @x=123.456, @y=789.321&gt;</span>
</code></pre></div></div>

<p>So with the <code class="language-plaintext highlighter-rouge">create_additions: true</code> parsing option, when the parser notices an object with the special key <code class="language-plaintext highlighter-rouge">"json_class"</code>,
it resolves the constant named by its value and calls <code class="language-plaintext highlighter-rouge">.json_create</code> on it with the object.</p>

<p>By itself, this isn’t really a security vulnerability, as only classes with a <code class="language-plaintext highlighter-rouge">.json_create</code> method can be instantiated this way.
But if you’ve been using Ruby for a long time, this may remind you of similar issues with gems like <code class="language-plaintext highlighter-rouge">YAML</code> where similar capabilities
were exploited.</p>

<p>That’s the problem with these sorts of duck-typed APIs: they are way too global.</p>

<p>You can have a piece of code using <code class="language-plaintext highlighter-rouge">JSON.load</code> that is perfectly safe on its own, but then if it’s embedded in an application
that also loads some other piece of code that defines some <code class="language-plaintext highlighter-rouge">.json_create</code> methods you weren’t expecting, you may end up with
an unforeseen vulnerability.</p>

<p>But even if you don’t define any <code class="language-plaintext highlighter-rouge">json_create</code> methods, the gem will always define one on <code class="language-plaintext highlighter-rouge">String</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">require</span> <span class="s2">"json"</span>
<span class="o">&gt;&gt;</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="s1">'{"json_class": "String", "raw": [112, 119, 110, 101, 100]}'</span><span class="p">)</span>
<span class="o">=&gt;</span> <span class="s2">"pwned"</span>
</code></pre></div></div>

<p>Here again, you probably need to find some specific circumstances to exploit that, but you can probably see how this
trick can be used to bypass a validation check of some sort.</p>

<p>So what do I plan to do about it? Several things.</p>

<p>First, I deprecated the implicit <code class="language-plaintext highlighter-rouge">create_additions: true</code> option. If you use <code class="language-plaintext highlighter-rouge">JSON.load</code> for that feature, a deprecation
warning will be emitted, asking you to use <code class="language-plaintext highlighter-rouge">JSON.unsafe_load</code> instead:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"json"</span>
<span class="no">Warning</span><span class="p">[</span><span class="ss">:deprecated</span><span class="p">]</span> <span class="o">=</span> <span class="kp">true</span>
<span class="no">JSON</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="s1">'{"json_class": "String", "raw": [112, 119, 110, 101, 100]}'</span><span class="p">)</span>
<span class="c1"># /tmp/j.rb:3: warning: JSON.load implicit support for `create_additions: true`</span>
<span class="c1"># is deprecated and will be removed in 3.0,</span>
<span class="c1"># use JSON.unsafe_load or explicitly pass `create_additions: true`</span>
</code></pre></div></div>

<p>That being said, considering how wonky this feature is, I’m also considering extracting it into another gem.</p>

<p>This used to be impossible, as it was baked deep into both the C and the Java parsers,
but <a href="https://github.com/ruby/json/pull/774">I recently refactored it to be pure Ruby code using a callback exposed by the parsers</a>.</p>

<p>Now you can provide a <code class="language-plaintext highlighter-rouge">Proc</code> to <code class="language-plaintext highlighter-rouge">JSON.load</code>, and the parser will invoke it for every parsed value, allowing you to substitute
one value for another:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cb</span> <span class="o">=</span> <span class="o">-&gt;</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span> <span class="k">do</span>
  <span class="k">case</span> <span class="n">obj</span>
  <span class="k">when</span> <span class="no">String</span>
    <span class="n">obj</span><span class="p">.</span><span class="nf">upcase</span>
  <span class="k">else</span>
    <span class="n">obj</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="nb">p</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="s1">'["a", {"b": 1}]'</span><span class="p">,</span> <span class="n">cb</span><span class="p">)</span>
<span class="c1"># =&gt; ["A", {"B" =&gt; 1}]</span>
</code></pre></div></div>

<p>Prior to that change, <code class="language-plaintext highlighter-rouge">JSON.load</code> already accepted a Proc, but its return value was ignored.</p>

<p>The nice thing is that this callback also now serves as a much safer and more flexible way to handle the deserialization of rich objects.
For instance, you could implement something like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">types</span> <span class="o">=</span> <span class="p">{</span>
  <span class="s2">"range"</span> <span class="o">=&gt;</span> <span class="no">MyRangeType</span>
<span class="p">}</span>
<span class="n">cb</span> <span class="o">=</span> <span class="o">-&gt;</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span> <span class="k">do</span>
  <span class="k">case</span> <span class="n">obj</span>
  <span class="k">when</span> <span class="no">Hash</span>
    <span class="k">if</span> <span class="n">type</span> <span class="o">=</span> <span class="n">types</span><span class="p">[</span><span class="n">obj</span><span class="p">[</span><span class="s2">"__type"</span><span class="p">]]</span>
      <span class="n">type</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
    <span class="k">else</span>
      <span class="n">obj</span>
    <span class="k">end</span>
  <span class="k">else</span>
    <span class="n">obj</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>While this requires more code from the user, it gives much tighter control over the deserialization,
but more importantly, it isn’t global anymore.
If a library uses this feature to deserialize trusted data, its callback is never going to be invoked by another library
as is the case with the old <code class="language-plaintext highlighter-rouge">Class#json_create</code> API.</p>

<p>The obvious solution would have been to follow the same route as <code class="language-plaintext highlighter-rouge">YAML</code>, with its <code class="language-plaintext highlighter-rouge">permitted_classes</code> argument, but
in my opinion, it wouldn’t have addressed the root of the problem, and it makes for a very unpleasant API to use.</p>

<p>Instead, I believe this Proc interface provides the same functionality as before, but in a way that is both more
flexible and safer.</p>

<p>I think this is a clear case for deprecation, given it is very rarely needed, has security implications, and surprises users.</p>

<h2 id="parsing-of-duplicate-keys">Parsing of Duplicate Keys</h2>

<p>Another behavior of the parser I recently deprecated is the treatment of duplicate keys.
Consider the following code:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">p</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="s1">'{"a": 1, "a": 2}'</span><span class="p">)[</span><span class="s2">"a"</span><span class="p">]</span>
</code></pre></div></div>

<p>What do you think it should return? You could argue that the first key or the last key should win, or that this should
result in a parse error.</p>

<p>Unfortunately, JSON is a bit of a “post-specified” format, as in it started as <a href="https://www.json.org/json-en.html">an extremely simple document</a>.
All it says about “objects” is:</p>

<blockquote>
  <p>An object is an unordered set of name/value pairs.
An object begins with <code class="language-plaintext highlighter-rouge">{</code> and ends with <code class="language-plaintext highlighter-rouge">}</code>.
Each name is followed by <code class="language-plaintext highlighter-rouge">:</code> and the name/value pairs are separated by <code class="language-plaintext highlighter-rouge">,</code>.</p>
</blockquote>

<p>That’s it, that’s the extent of the specification. As you can see, there is no mention of what a parser should do if it encounters a duplicate key.</p>

<p>Later on, various standardisation bodies tried to specify JSON based on the implementations out there.</p>

<p>Hence, we now have IETF’s STD 90, also known as <a href="https://datatracker.ietf.org/doc/html/rfc8259">RFC 8259</a>, which states:</p>

<blockquote>
  <p>Many implementations report the last name/value pair only.
Other implementations report an error or fail to parse the object,
and some implementations report all of the name/value pairs, including duplicates.</p>
</blockquote>

<p>In other words, it acknowledges that many implementations return the last seen pair, but doesn’t prescribe any particular behavior.</p>

<p>There’s also the <a href="https://ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf">ECMA-404 standard</a>, which states:</p>

<blockquote>
  <p>The JSON syntax does not impose any restrictions on the strings used as names,
does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs.
These are all semantic considerations that may be defined by
JSON processors or in specifications defining specific uses of JSON for data interchange.</p>
</blockquote>

<p>Which is pretty much the specification language equivalent of: 🤷‍♂️.</p>

<p>The problem with under-specified formats is that they can sometimes be exploited, the classic example being
<a href="https://en.wikipedia.org/wiki/HTTP_request_smuggling">HTTP request smuggling</a>.</p>

<p>And while it wasn’t an exploitation per se, <a href="https://hackerone.com/reports/3000510#activity-32819479">a security issue happened to Hacker One</a>,
in part because of that behavior.
Technically, the bug was on the JSON generation side, but had the JSON gem’s parser not silently accepted duplicate keys,
they would have caught it early in development.</p>

<p>That’s why starting from version <code class="language-plaintext highlighter-rouge">2.13.0</code>, <code class="language-plaintext highlighter-rouge">JSON.parse</code> <a href="https://github.com/ruby/json/pull/818">now accepts a new <code class="language-plaintext highlighter-rouge">allow_duplicate_key:</code> keyword argument</a>,
and if not explicitly allowed, a deprecation warning is emitted if a duplicate key is encountered:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"json"</span>
<span class="no">Warning</span><span class="p">[</span><span class="ss">:deprecated</span><span class="p">]</span> <span class="o">=</span> <span class="kp">true</span>

<span class="nb">p</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="s1">'{"a": 1, "a": 2}'</span><span class="p">)</span>
<span class="c1"># =&gt; {"a" =&gt; 2}</span>

<span class="c1"># /tmp/j.rb:4: warning: detected duplicate key "a" in JSON object.</span>
<span class="c1"># This will raise an error in json 3.0 unless enabled via `allow_duplicate_key: true`</span>
<span class="c1"># at line 1 column 1</span>
</code></pre></div></div>

<p>As mentioned in the warning message, I plan to change the default behavior to be an error in the next major version, but of course
it will always be possible to explicitly allow for duplicate keys, for the rare cases where it’s needed.</p>

<p>Here again, I think this deprecation is justified because duplicated keys are rare, but also almost always a mistake,
hence I expect few people to need to change anything, and the ones who do will likely learn about a previously unnoticed
mistake in their application.</p>

<h2 id="the-to_json-and-to_s-methods">The to_json And to_s Methods</h2>

<p>Before you gasp in horror, don’t worry, I don’t plan on deprecating the <code class="language-plaintext highlighter-rouge">Object#to_json</code> method, ever.
It is way too widespread for this to ever be acceptable.</p>

<p>But that doesn’t mean this API is good, nor that nothing should be done about it.</p>

<p>At the center of the <code class="language-plaintext highlighter-rouge">json</code> gem API, there’s the notion that objects can define themselves how they should be
serialized into JSON by responding to the <code class="language-plaintext highlighter-rouge">to_json</code> method.</p>

<p>At first sight, it seems like a perfectly fine API, it’s an interface that objects can implement, fairly classic object-oriented design.</p>

<p>Here’s an example that changes how <code class="language-plaintext highlighter-rouge">Time</code> objects are serialized.</p>

<p>By default, <code class="language-plaintext highlighter-rouge">json</code> will call <code class="language-plaintext highlighter-rouge">#to_s</code> on objects it doesn’t know how to handle:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">generate</span><span class="p">({</span> <span class="ss">created_at: </span><span class="no">Time</span><span class="p">.</span><span class="nf">now</span> <span class="p">})</span>
<span class="p">{</span><span class="s2">"created_at"</span><span class="ss">:"2025-08-02 13:03:32 +0200"</span><span class="p">}</span>
</code></pre></div></div>

<p>But we can instead instruct it to serialize <code class="language-plaintext highlighter-rouge">Time</code> using the ISO 8601 / <a href="https://datatracker.ietf.org/doc/html/rfc3339">RFC 3339</a>
format:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Time</span>
  <span class="k">def</span> <span class="nf">to_json</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
    <span class="n">iso8601</span><span class="p">(</span><span class="mi">3</span><span class="p">).</span><span class="nf">to_json</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">generate</span><span class="p">({</span> <span class="ss">created_at: </span><span class="no">Time</span><span class="p">.</span><span class="nf">now</span> <span class="p">})</span>
<span class="p">{</span><span class="s2">"created_at"</span><span class="ss">:"2025-08-02T13:05:04.160+02:00"</span><span class="p">}</span>
</code></pre></div></div>

<p>This seems all well and good, but the problem, like for the <code class="language-plaintext highlighter-rouge">.json_create</code> method, is that this is a global behavior.
An application may very well need to serialize dates in different ways in different contexts.</p>

<p>Worse, in the context of a library, say an API client that needs to serialize <code class="language-plaintext highlighter-rouge">Time</code> in a specific way, it’s not really
possible to use this API: you can’t assume it’s acceptable to change such a global behavior, given that you know nothing about the application in which you’ll run.</p>

<p>So to me, there are two problems here. First, using <code class="language-plaintext highlighter-rouge">#to_s</code> as a fallback works for a few types, like dates, but it is really not helpful
for the overwhelming majority of other objects:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">generate</span><span class="p">(</span><span class="no">Object</span><span class="p">.</span><span class="nf">new</span><span class="p">)</span>
<span class="s2">"#&lt;Object:0x000000011ce214a0&gt;"</span>
</code></pre></div></div>

<p>I really can’t think of a situation in which this is the behavior that you want. If <code class="language-plaintext highlighter-rouge">JSON.generate</code> ends up calling <code class="language-plaintext highlighter-rouge">to_s</code> on an object, I’m willing to bet that 99% of the time, the developer didn’t intend for that object to be serialized, or forgot to implement a <code class="language-plaintext highlighter-rouge">#to_json</code> on it.</p>

<p>Either way, it would be way more useful to raise an error and require that an explicit method to serialize that unknown object be provided.</p>

<p>The second problem is that it should be possible to customize a given type’s serialization locally, instead of globally.</p>

<p>In addition, returning a String as a JSON fragment is also not great, because it means recursively calling generators, and
it allows generating invalid documents:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Broken</span>
  <span class="k">def</span> <span class="nf">to_json</span>
    <span class="nb">to_s</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="o">&gt;&gt;</span> <span class="no">Broken</span><span class="p">.</span><span class="nf">new</span><span class="p">.</span><span class="nf">to_json</span>
<span class="o">=&gt;</span> <span class="s2">"#&lt;Broken:0x0000000123054050&gt;"</span>
<span class="o">&gt;&gt;</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="no">Broken</span><span class="p">.</span><span class="nf">new</span><span class="p">.</span><span class="nf">to_json</span><span class="p">)</span>
<span class="c1">#&gt; JSON::ParserError: unexpected character: '#&lt;Broken:0x000000011c9377a0&gt;'</span>
<span class="c1"># &gt; at line 1 column 1 </span>
</code></pre></div></div>

<p>Those are the problems the new <code class="language-plaintext highlighter-rouge">JSON::Coder</code> API is meant to solve.</p>

<p>By default, <code class="language-plaintext highlighter-rouge">JSON::Coder</code> only serializes types that have a direct JSON equivalent, so <code class="language-plaintext highlighter-rouge">Hash</code>, <code class="language-plaintext highlighter-rouge">Array</code>, <code class="language-plaintext highlighter-rouge">String</code> / <code class="language-plaintext highlighter-rouge">Symbol</code>,
<code class="language-plaintext highlighter-rouge">Integer</code>, <code class="language-plaintext highlighter-rouge">Float</code>, <code class="language-plaintext highlighter-rouge">true</code>, <code class="language-plaintext highlighter-rouge">false</code> and <code class="language-plaintext highlighter-rouge">nil</code>. Any type that doesn’t have a direct JSON equivalent produces an error:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="no">MY_JSON</span> <span class="o">=</span> <span class="no">JSON</span><span class="o">::</span><span class="no">Coder</span><span class="p">.</span><span class="nf">new</span>
<span class="o">&gt;&gt;</span> <span class="no">MY_JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="ss">a: </span><span class="mi">1</span><span class="p">})</span>
<span class="o">=&gt;</span> <span class="s2">"{</span><span class="se">\"</span><span class="s2">a</span><span class="se">\"</span><span class="s2">:1}"</span>
<span class="o">&gt;&gt;</span> <span class="no">MY_JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="ss">a: </span><span class="no">Time</span><span class="p">.</span><span class="nf">new</span><span class="p">})</span>
<span class="c1">#&gt; JSON::GeneratorError: Time not allowed in JSON</span>
</code></pre></div></div>

<p>But it does allow you to provide a <code class="language-plaintext highlighter-rouge">Proc</code> to define the serialization of all other types:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">MY_JSON</span> <span class="o">=</span> <span class="no">JSON</span><span class="o">::</span><span class="no">Coder</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span> <span class="o">|</span><span class="n">obj</span><span class="o">|</span>
  <span class="k">case</span> <span class="n">obj</span>
  <span class="k">when</span> <span class="no">Time</span>
    <span class="n">obj</span><span class="p">.</span><span class="nf">iso8601</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
  <span class="k">else</span>
    <span class="n">obj</span> <span class="c1"># return `obj` to fail serialization</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="o">&gt;&gt;</span> <span class="no">MY_JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="ss">a: </span><span class="no">Time</span><span class="p">.</span><span class="nf">new</span><span class="p">})</span>
<span class="o">=&gt;</span> <span class="s2">"{</span><span class="se">\"</span><span class="s2">a</span><span class="se">\"</span><span class="s2">:</span><span class="se">\"</span><span class="s2">2025-08-02T14:03:15.091+02:00</span><span class="se">\"</span><span class="s2">}"</span>
</code></pre></div></div>

<p>Contrary to the <code class="language-plaintext highlighter-rouge">#to_json</code> method, here the Proc is expected to return a JSON primitive object, so you don’t have to
concern yourself with JSON escaping rules and such, which is much safer.</p>

<p>But if for some reason you do need to, you still can, using <code class="language-plaintext highlighter-rouge">JSON::Fragment</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">MY_JSON</span> <span class="o">=</span> <span class="no">JSON</span><span class="o">::</span><span class="no">Coder</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span> <span class="o">|</span><span class="n">obj</span><span class="o">|</span>
  <span class="k">case</span> <span class="n">obj</span>
  <span class="k">when</span> <span class="no">SomeRecord</span>
    <span class="no">JSON</span><span class="o">::</span><span class="no">Fragment</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">obj</span><span class="p">.</span><span class="nf">json_blob</span><span class="p">)</span>
  <span class="k">else</span>
    <span class="n">obj</span> <span class="c1"># return `obj` to fail serialization</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>With this new API, it’s now much easier for a gem to customize JSON generation in a local way.</p>

<p>Now, as I said before, I absolutely don’t plan to deprecate <code class="language-plaintext highlighter-rouge">#to_json</code>, nor even the behavior that calls <code class="language-plaintext highlighter-rouge">#to_s</code> on unknown objects.
Even though I think it’s a bad API, and that its replacement is way superior, the <code class="language-plaintext highlighter-rouge">#to_json</code> method has been at the center of the <code class="language-plaintext highlighter-rouge">json</code>
gem from the beginning, and migrating away from it would require a massive amount of work from the community.</p>

<p>The decision to deprecate an API should always weigh the benefits against the costs.
Here, the cost is so massive that it is unimaginable for me to even consider it.</p>

<h2 id="load_default_options--dump_default_options">load_default_options / dump_default_options</h2>

<p>Another set of APIs I’ve marked as deprecated are the various <code class="language-plaintext highlighter-rouge">_default_options</code> accessors.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="s2">"http://example.com"</span><span class="p">)</span>
<span class="s2">"http://example.com"</span>
<span class="o">&gt;&gt;</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">dump_default_options</span><span class="p">[</span><span class="ss">:script_safe</span><span class="p">]</span> <span class="o">=</span> <span class="kp">true</span>
<span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="s2">"http://example.com"</span><span class="p">)</span>
<span class="s2">"http:</span><span class="se">\/\/</span><span class="s2">example.com"</span>
</code></pre></div></div>

<p>The concept is simple: you can globally change the default options received by certain methods.</p>

<p>At first sight, this might seem like a convenience: it allows you to set some option without having to pass it around
at potentially dozens of different call sites.</p>

<p>But just like <code class="language-plaintext highlighter-rouge">#to_json</code> and other APIs, this change applies to the entire application, including some dependencies that may
not expect standard JSON methods to behave differently.</p>

<p>And that’s not a hypothetical: I personally ran into a gem that was using JSON to fingerprint some object graphs, e.g.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">fingerprint</span>
  <span class="no">Digest</span><span class="o">::</span><span class="no">SHA1</span><span class="p">.</span><span class="nf">hexdigest</span><span class="p">(</span><span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="n">some_object_graph</span><span class="p">))</span>
<span class="k">end</span>
</code></pre></div></div>

<p>That fingerprinting method was well tested in the gem, and was working well in a few dozen applications until one
day someone reported a bug in the gem. After some investigation, I figured out that the host application in question
had modified <code class="language-plaintext highlighter-rouge">JSON.dump_default_options</code>, causing the fingerprints to be different.</p>
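<p>To make that failure mode concrete, here is a minimal sketch of how a library could pass its generation options explicitly at the call site rather than relying on process-wide defaults. The <code class="language-plaintext highlighter-rouge">stable_fingerprint</code> helper and the sample payload are hypothetical, and <code class="language-plaintext highlighter-rouge">space: " "</code> merely stands in for any option that changes the generated document:</p>

```ruby
require "json"
require "digest"

# Hypothetical helper: the generator options are passed explicitly,
# so the caller decides exactly which options shape the output.
def stable_fingerprint(object_graph, **options)
  Digest::SHA1.hexdigest(JSON.generate(object_graph, options))
end

payload = { url: "http://example.com", tags: ["a", "b"] }

compact = stable_fingerprint(payload)
spaced  = stable_fingerprint(payload, space: " ")

# Same data, different generator options, different fingerprints.
puts compact == spaced                       # prints "false"
puts compact == stable_fingerprint(payload)  # prints "true"
```

<p>The data is identical in both calls; only the generator options differ, and that alone is enough to change the digest.</p>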

<p>If you think about it, these sorts of global settings aren’t very different from monkey patching:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">JSON</span><span class="p">.</span><span class="nf">singleton_class</span><span class="p">.</span><span class="nf">prepend</span><span class="p">(</span><span class="no">Module</span><span class="p">.</span><span class="nf">new</span> <span class="p">{</span>
  <span class="k">def</span> <span class="nf">dump</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">proc</span> <span class="o">=</span> <span class="kp">nil</span><span class="p">,</span> <span class="n">opts</span> <span class="o">=</span> <span class="p">{})</span>
    <span class="n">opts</span> <span class="o">=</span> <span class="n">opts</span><span class="p">.</span><span class="nf">merge</span><span class="p">(</span><span class="ss">script_safe: </span><span class="kp">true</span><span class="p">)</span>
    <span class="k">super</span>
  <span class="k">end</span>
<span class="p">})</span>
</code></pre></div></div>

<p>The overwhelming majority of Rubyists are very aware of the potential pitfalls of monkey patching, and some absolutely loathe it,
yet, these sorts of global configuration APIs don’t get frowned upon as much for some reason.</p>

<p>In some cases, they make sense, e.g. if the configuration is for an application or a framework (a framework essentially being an application skeleton),
there’s not really a need for local configuration, and a global one is simpler and easier to reason about.
But in a library that may in turn be used by multiple other libraries with different configuration needs, they’re a problem.</p>

<p>Amusingly, <a href="https://bugs.ruby-lang.org/issues/21311#Avoiding-unexpected-globally-shared-modulesobjects">this sort of API was one of the justifications for the currently experimental namespace feature in Ruby 3.5.0dev</a>,
which shows the <code class="language-plaintext highlighter-rouge">json</code> gem is not the only one with this problem.</p>

<p>Here again, a better solution is the <code class="language-plaintext highlighter-rouge">JSON::Coder</code> API. If you want to centralize your JSON generation configuration across
your codebase, you can allocate a singleton with your desired options:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">MyLibrary</span>
  <span class="no">JSON_CODER</span> <span class="o">=</span> <span class="no">JSON</span><span class="o">::</span><span class="no">Coder</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">script_safe: </span><span class="kp">true</span><span class="p">)</span>

  <span class="k">def</span> <span class="nf">do_things</span>
    <span class="no">JSON_CODER</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>As a library author, you can even allow your users to substitute the configuration for one of their choosing:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">MyLibrary</span>
  <span class="k">class</span> <span class="o">&lt;&lt;</span> <span class="nb">self</span>
    <span class="nb">attr_accessor</span> <span class="ss">:json_coder</span>
  <span class="k">end</span>
  <span class="vi">@json_coder</span> <span class="o">=</span> <span class="no">JSON</span><span class="o">::</span><span class="no">Coder</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">script_safe: </span><span class="kp">true</span><span class="p">)</span>

  <span class="k">def</span> <span class="nf">do_things</span>
    <span class="no">MyLibrary</span><span class="p">.</span><span class="nf">json_coder</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
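<p>Seen from the application side, the substitution could look like the sketch below. To stay runnable on older versions of the gem it uses two small stand-in classes, <code class="language-plaintext highlighter-rouge">CompactCoder</code> and <code class="language-plaintext highlighter-rouge">PrettyCoder</code>, which are hypothetical; the only assumption is that a coder is any object responding to <code class="language-plaintext highlighter-rouge">dump(obj)</code>, which is also how <code class="language-plaintext highlighter-rouge">JSON::Coder</code> instances are used above:</p>

```ruby
require "json"

module MyLibrary
  # Hypothetical stand-in for a configured coder: it only needs `dump(obj)`.
  class CompactCoder
    def dump(obj)
      JSON.generate(obj)
    end
  end

  # Defines MyLibrary.json_coder and MyLibrary.json_coder= on the module itself.
  singleton_class.attr_accessor :json_coder
  self.json_coder = CompactCoder.new

  def self.serialize(payload)
    json_coder.dump(payload)
  end
end

# Hypothetical application-side coder that pretty-prints instead.
class PrettyCoder
  def dump(obj)
    JSON.pretty_generate(obj)
  end
end

puts MyLibrary.serialize({ a: 1 })  # prints {"a":1}

# The application swaps the coder for this one library only.
MyLibrary.json_coder = PrettyCoder.new
puts MyLibrary.serialize({ a: 1 })  # now spans several lines
```

<p>The substitution only affects <code class="language-plaintext highlighter-rouge">MyLibrary</code>; any other code calling the <code class="language-plaintext highlighter-rouge">JSON</code> module directly is untouched.</p>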

<p>Thankfully, from what I can see of the gem’s usage, these APIs were very rarely used, so while they’re not a major hindrance,
I figured the cost vs benefit is positive. And if someone really needs to set an option globally, they can monkey-patch JSON:
the effect is the same, and at least it’s more honest.</p>

<h2 id="conclusion">Conclusion</h2>

<p>As mentioned previously, the decision to deprecate shouldn’t be taken lightly.
It’s important to have empathy for the users who will have to deal with the fallout,
and there are few things more annoying than cosmetic deprecations.</p>

<p>Yet it is also important to recognize when an API is error-prone or even outright dangerous,
and deprecations are sometimes a necessary evil to correct course.</p>

<p>Also, as you probably noticed, a common theme in most of the APIs I don’t like in the <code class="language-plaintext highlighter-rouge">json</code> gem is global behavior and configuration.
I’m not certain why that is. A part of it might be that as Rubyists we value simplicity and conciseness, and that historically
the community has built its ethos as a reaction against overly verbose and ceremonial enterprise Java APIs, with their dependency injection frameworks and whatnot.</p>

<p>A bit of global state or behavior can sometimes bring a lot of simplicity, but it’s a very sharp tool that needs to be handled with extreme care.</p>]]></content><author><name></name></author><category term="ruby" /><category term="json" /><summary type="html"><![CDATA[As I mentioned at the start of my Optimizing Ruby’s JSON series of posts, performance isn’t why I candidated to be the new gem’s maintainer.]]></summary></entry><entry><title type="html">Unlocking Ractors: class instance variables</title><link href="https://byroot.github.io/ruby/performance/2025/05/24/unlocking-ractors-class-variables.html" rel="alternate" type="text/html" title="Unlocking Ractors: class instance variables" /><published>2025-05-24T09:03:51+00:00</published><updated>2025-05-24T09:03:51+00:00</updated><id>https://byroot.github.io/ruby/performance/2025/05/24/unlocking-ractors-class-variables</id><content type="html" xml:base="https://byroot.github.io/ruby/performance/2025/05/24/unlocking-ractors-class-variables.html"><![CDATA[<p>In <a href="/ruby/performance/2025/02/27/whats-the-deal-with-ractors.html">a previous post about ractors</a>, I explained why
I think it’s really unlikely you’d ever be able to run an entire application inside a ractor, but that they could
still be situationally very useful to move CPU-bound work out of the main thread, and to unlock some parallel algorithms.</p>

<p>But as I mentioned, this is unfortunately not yet viable because there are many known implementation bugs that can lead
to interpreter crashes, and because, while they are supposed to execute in parallel, the Ruby VM still has one true global
lock that Ractors need to acquire to perform certain operations, making them often perform worse than the equivalent
single-threaded code.</p>

<p>One of these remaining contention points is class instance variables and class variables, and given it’s quite frequent
for code to check a class or module instance variable as some sort of configuration, this contention point can have a very
sizeable impact on Ractor performance. Let me show you with a simple benchmark:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">Mod</span>
  <span class="vi">@a</span> <span class="o">=</span> <span class="vi">@b</span> <span class="o">=</span> <span class="vi">@c</span> <span class="o">=</span> <span class="mi">1</span>

  <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">compute</span><span class="p">(</span><span class="n">count</span><span class="p">)</span>
    <span class="n">count</span><span class="p">.</span><span class="nf">times</span> <span class="k">do</span>
      <span class="vi">@a</span> <span class="o">+</span> <span class="vi">@b</span> <span class="o">+</span> <span class="vi">@c</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="no">ITERATIONS</span> <span class="o">=</span> <span class="mi">1_000_000</span>
<span class="no">PARALLELISM</span> <span class="o">=</span> <span class="mi">8</span>

<span class="k">if</span> <span class="no">ARGV</span><span class="p">.</span><span class="nf">first</span> <span class="o">==</span> <span class="s2">"ractor"</span>
  <span class="n">ractors</span> <span class="o">=</span> <span class="no">PARALLELISM</span><span class="p">.</span><span class="nf">times</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span>
    <span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span>
      <span class="no">Mod</span><span class="p">.</span><span class="nf">compute</span><span class="p">(</span><span class="no">ITERATIONS</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
  <span class="n">ractors</span><span class="p">.</span><span class="nf">each</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:take</span><span class="p">)</span>
<span class="k">else</span>
  <span class="no">Mod</span><span class="p">.</span><span class="nf">compute</span><span class="p">(</span><span class="no">ITERATIONS</span> <span class="o">*</span> <span class="no">PARALLELISM</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This simplistic micro-benchmark just adds three module instance variables together repeatedly.
In one mode it does so serially in the main thread, and if the <code class="language-plaintext highlighter-rouge">ractor</code> argument is passed, it runs the same total number of iterations, but spread across 8
parallel ractors.
Hence in a perfect world, using the Ractors branch should be close to 8 times faster.</p>

<p>However, if you run this benchmark on Ruby’s master branch, this isn’t the result you’ll get:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>hyperfine <span class="nt">-w</span> 1 <span class="s1">'./miniruby --yjit ../test.rb'</span> <span class="s1">'./miniruby --yjit ../test.rb ractor'</span>
Benchmark 1: ./miniruby <span class="nt">--yjit</span> <span class="nt">--disable-all</span> ../test.rb
  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:     252.4 ms ±   1.2 ms    <span class="o">[</span>User: 250.2 ms, System: 1.6 ms]
  Range <span class="o">(</span>min … max<span class="o">)</span>:   249.9 ms … 253.8 ms    11 runs

Benchmark 2: ./miniruby <span class="nt">--yjit</span> <span class="nt">--disable-all</span> ../test.rb ractor
  Time <span class="o">(</span>mean ± σ<span class="o">)</span>:      2.005 s ±  0.013 s    <span class="o">[</span>User: 2.098 s, System: 6.963 s]
  Range <span class="o">(</span>min … max<span class="o">)</span>:    1.992 s …  2.027 s    10 runs

Summary
  ./miniruby <span class="nt">--yjit</span> ../test.rb ran
    7.94 ± 0.06 <span class="nb">times </span>faster than ./miniruby <span class="nt">--yjit</span> ../test.rb ractor
</code></pre></div></div>

<p>That’s right, instead of being 8 times faster, the branch that uses Ractors ended up being 8 times slower.
This is because to read module or class instance variables, secondary ractors have to acquire the VM lock,
which is a costly operation in itself, and worse, they end up waiting a lot to obtain the lock.</p>

<p>So what can we do about it?</p>

<h2 id="language-semantic">Language Semantics</h2>

<p>Before we delve into how this lock could be removed or reduced, let’s review how class instance variables behave with ractors.</p>

<p>Given that classes are global, their instance variables are essentially global too.
Because of this, Ractors can’t let you do everything with them, otherwise, it would be a way to work around Ractor isolation.</p>

<p>The first rule is that only the main Ractor is allowed to set class instance variables:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Test</span>
  <span class="k">class</span> <span class="o">&lt;&lt;</span> <span class="nb">self</span>
    <span class="nb">attr_accessor</span> <span class="ss">:var</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="no">Test</span><span class="p">.</span><span class="nf">var</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># works</span>

<span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span>
  <span class="c1"># works</span>
  <span class="nb">p</span> <span class="no">Test</span><span class="p">.</span><span class="nf">var</span>

  <span class="c1"># raises Ractor::IsolationError: can not set instance variables</span>
  <span class="c1"># of classes/modules by non-main Ractors</span>
  <span class="no">Test</span><span class="p">.</span><span class="nf">var</span> <span class="o">=</span> <span class="mi">2</span>
<span class="k">end</span><span class="p">.</span><span class="nf">take</span>
</code></pre></div></div>

<p>So secondary ractors can read instance variables on classes and modules, but can’t write them.</p>

<p>The second rule is that they can only read instance variables on classes if the object stored in that variable is shareable:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Test</span>
  <span class="k">class</span> <span class="o">&lt;&lt;</span> <span class="nb">self</span>
    <span class="nb">attr_accessor</span> <span class="ss">:var1</span><span class="p">,</span> <span class="ss">:var2</span>
  <span class="k">end</span>

  <span class="vi">@var1</span> <span class="o">=</span> <span class="p">{}.</span><span class="nf">freeze</span>
  <span class="vi">@var2</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">end</span>

<span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span>
  <span class="c1"># works:</span>
  <span class="nb">p</span> <span class="no">Test</span><span class="p">.</span><span class="nf">var1</span>

  <span class="c1"># raises Ractor::IsolationError: can not get unshareable values from</span>
  <span class="c1"># instance variables of classes/modules from non-main Ractors</span>
  <span class="nb">p</span> <span class="no">Test</span><span class="p">.</span><span class="nf">var2</span>
<span class="k">end</span><span class="p">.</span><span class="nf">take</span>
</code></pre></div></div>
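<p>Whether a given value counts as shareable can be checked without spawning any Ractor, using <code class="language-plaintext highlighter-rouge">Ractor.shareable?</code>. A short sketch with made-up values, showing that freezing only the outer container isn’t enough:</p>

```ruby
# A frozen container is only shareable if everything it references is too.
frozen_empty   = {}.freeze
mutable_hash   = {}
shallow_frozen = { name: "config".dup }.freeze # the String inside is still mutable

puts Ractor.shareable?(frozen_empty)    # prints "true"
puts Ractor.shareable?(mutable_hash)    # prints "false"
puts Ractor.shareable?(shallow_frozen)  # prints "false"

# Ractor.make_shareable freezes the whole object graph and returns the object.
deep_frozen = Ractor.make_shareable({ name: "config".dup })
puts Ractor.shareable?(deep_frozen)     # prints "true"
puts deep_frozen[:name].frozen?         # prints "true"
```

<p>So for a class instance variable to be readable from a secondary ractor, the value it holds has to be deeply frozen, which is what <code class="language-plaintext highlighter-rouge">Ractor.make_shareable</code> takes care of.</p>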

<h2 id="reducing-contention">Reducing Contention</h2>

<p>Usually when dealing with lock contention issues, the first solution is to turn one big lock into multiple finer-grained locks.
In our simplistic benchmark, all ractors are accessing variables on the same module, so that wouldn’t help, but we could
assume that in more realistic scenarios, they’d access the variables of many different modules and hence wouldn’t fight as much
for the same one.</p>

<p>But the way I envision Ractors being used in real-world cases, at least initially, is for running small pieces of
code in parallel, with an API approaching futures:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">futures</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">futures</span> <span class="o">&lt;&lt;</span> <span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="p">{</span> <span class="n">fetch_and_compute_prices</span> <span class="p">}</span>
<span class="n">futures</span> <span class="o">&lt;&lt;</span> <span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="p">{</span> <span class="n">fetch_and_compute_order_history</span> <span class="p">}</span>
<span class="o">...</span>
<span class="n">futures</span><span class="p">.</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:take</span><span class="p">)</span>
</code></pre></div></div>

<p>As such I actually expect Ractors to commonly access the same module or class variables over and over, so introducing more finely grained locks isn’t very enticing.</p>

<p>Another possibility would be to use a <a href="https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock">read-write lock</a>:
given that only the main ractor can “write” variables, all secondary ractors could acquire the read lock concurrently.
But from previous experience, while read-write locks do allow concurrent read threads not to stall, they’re still quite
costly when contended because all threads have to atomically increment and decrement the same value, and that isn’t good
for the CPU cache.
It’s a fine solution when the operation you are protecting is a relatively slow one, but in our case, reading an instance
variable is extremely cheap, so any kind of lock, even an uncontended one, will be disproportionately costly and ruin performance.</p>

<p>That’s why the only reasonable solution is to find a way to not use a lock at all.</p>

<h2 id="how-do-instance-variables-work">How do Instance Variables Work</h2>

<p>To understand how we could make instance variables lock-free, we must first understand how they work.
As is now tradition, I’ll try to explain it using Ruby pseudo code, starting with instance variable reads:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Module</span>
  <span class="k">def</span> <span class="nf">instance_variable_get</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
    <span class="k">if</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">main_ractor?</span>
      <span class="c1"># The main ractor is the only one allowed to write instance variables</span>
      <span class="c1"># hence it doesn't need to lock because we know no one else could be</span>
      <span class="c1"># concurrently modifying `@shape` or `@fields`</span>
      <span class="k">if</span> <span class="n">field_index</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">field_index_for</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
        <span class="vi">@fields</span><span class="p">[</span><span class="n">field_index</span><span class="p">]</span>
      <span class="k">end</span>
    <span class="k">else</span>
      <span class="c1"># Secondary ractors must lock the VM even for reads because the main Ractor</span>
      <span class="c1"># could be modifying `@shape` or `@fields` concurrently.</span>
      <span class="no">RubyVM</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="k">if</span> <span class="n">field_index</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">field_index_for</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
          <span class="n">value</span> <span class="o">=</span> <span class="vi">@fields</span><span class="p">[</span><span class="n">field_index</span><span class="p">]</span>
          <span class="k">raise</span> <span class="no">Ractor</span><span class="o">::</span><span class="no">IsolationError</span> <span class="k">unless</span> <span class="no">Ractor</span><span class="p">.</span><span class="nf">shareable?</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
          <span class="n">value</span>
        <span class="k">end</span>
      <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>I’m not going to explain how shapes work here, as I already explained it in multiple previous posts.
The only thing you really need to know is that instance variables are stored in a contiguous array, and shapes
keep track of the offset at which each variable is stored. Shapes are also immutable, so you can query them concurrently.</p>

<p>As a result, reading an instance variable only amounts to querying the shape tree to figure out if that particular variable exists,
and if it does, what its index is. After that, we read the variable at the specified offset in the <code class="language-plaintext highlighter-rouge">@fields</code> array of the
object.</p>
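<p>To make that lookup a bit more concrete, here is a minimal sketch of the idea in plain Ruby. <code class="language-plaintext highlighter-rouge">ToyShape</code> is a made-up class for illustration only. It borrows the method names from the pseudo-code above, but it is an assumption about the general principle, not how the VM actually lays shapes out:</p>

```ruby
# Toy model only: ToyShape is a made-up class, not CRuby's internal structure.
class ToyShape
  attr_reader :parent, :name, :index

  def initialize(parent = nil, name = nil)
    @parent = parent
    @name = name
    # The root shape holds no variable; each child stores one more than its parent.
    @index = parent ? parent.fields_count : nil
    freeze # shapes never change once created, so concurrent queries are safe
  end

  def fields_count
    @name ? @index + 1 : 0
  end

  # Transitioning returns a new shape; the receiver is left untouched.
  def add_instance_variable(name)
    ToyShape.new(self, name)
  end

  # Walk up the ancestors until we find the variable, or run out of shapes.
  def field_index_for(name)
    shape = self
    while shape
      return shape.index if shape.name == name
      shape = shape.parent
    end
    nil
  end
end

root = ToyShape.new
shape = root.add_instance_variable(:@a).add_instance_variable(:@b)
fields = [1, 2]

b_index = shape.field_index_for(:@b)
puts fields[b_index]                     # prints 2
puts shape.field_index_for(:@c).inspect  # prints nil, the variable doesn't exist
```

<p>Since every <code class="language-plaintext highlighter-rouge">ToyShape</code> is frozen and a transition always builds a new object, a reader can never observe a shape halfway through a modification.</p>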

<p>However, on secondary Ractors, we additionally need to lock the VM to ensure the shape and the fields are consistent,
but that will be clearer once I explain how writing instance variables works.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Module</span>
  <span class="k">def</span> <span class="nf">instance_variable_set</span><span class="p">(</span><span class="n">variable_name</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
    <span class="k">raise</span> <span class="no">FrozenError</span> <span class="k">if</span> <span class="nb">frozen?</span>
    <span class="c1"># The main ractor is the only one allowed to write instance variables</span>
    <span class="k">raise</span> <span class="no">Ractor</span><span class="o">::</span><span class="no">IsolationError</span> <span class="k">unless</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">main_ractor?</span>

    <span class="no">RubyVM</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
      <span class="k">if</span> <span class="n">field_index</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">field_index_for</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
        <span class="c1"># The variable already exists, we replace its value</span>
        <span class="vi">@fields</span><span class="p">[</span><span class="n">field_index</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
      <span class="k">else</span>
        <span class="c1"># The variable doesn't exist, we have to make a shape transition</span>
        <span class="n">next_shape</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">add_instance_variable</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>

        <span class="k">if</span> <span class="n">next_shape</span><span class="p">.</span><span class="nf">capacity</span> <span class="o">&gt;</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">capacity</span>
          <span class="c1"># @fields is full, we need to allocate a larger one</span>
          <span class="n">new_fields</span> <span class="o">=</span> <span class="no">Memory</span><span class="p">.</span><span class="nf">allocate</span><span class="p">(</span><span class="ss">size: </span><span class="n">next_shape</span><span class="p">.</span><span class="nf">capacity</span><span class="p">)</span>
          <span class="n">new_fields</span><span class="p">.</span><span class="nf">replace</span><span class="p">(</span><span class="vi">@fields</span><span class="p">)</span> <span class="c1"># copy content</span>
          <span class="vi">@fields</span><span class="p">,</span> <span class="n">old_fields</span> <span class="o">=</span> <span class="n">new_fields</span><span class="p">,</span> <span class="vi">@fields</span>

          <span class="c1"># The fields array is manually managed memory, so it needs to be freed explicitly</span>
          <span class="no">Memory</span><span class="p">.</span><span class="nf">free</span><span class="p">(</span><span class="n">old_fields</span><span class="p">)</span>
        <span class="k">end</span>

        <span class="vi">@fields</span><span class="p">[</span><span class="n">next_shape</span><span class="p">.</span><span class="nf">field_index</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
        <span class="vi">@shape</span> <span class="o">=</span> <span class="n">next_shape</span>
      <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>As you can see, the fields array has a fixed size, so if we’re adding a new instance variable, we may need to allocate
a larger one and swap the two, as well as change the object’s shape.</p>

<p>That is why we need to lock the VM. If another ractor could read an instance variable while we’re doing this,
it would run into all sorts of race conditions:</p>

<ul>
  <li>It could be reading inside <code class="language-plaintext highlighter-rouge">old_fields</code> while we’re freeing it, causing a use-after-free bug.</li>
  <li>It could be reading inside <code class="language-plaintext highlighter-rouge">old_fields</code> using the new shape, causing an out-of-bounds read.</li>
  <li>It could be reading inside <code class="language-plaintext highlighter-rouge">new_fields</code> using the new shape, but before we’ve written the new value, causing an uninitialized memory read.</li>
</ul>

<p>Now, if you are not familiar with C, or another low-level programming language, you might be thinking that I’m exaggerating.
After all, updating the shape is the last operation, so surely cases 2 and 3 aren’t possible.</p>

<p>Well, I got some bad news…</p>

<h2 id="memory-model">Memory Model</h2>

<p>Multithreaded programming is tricky, but even more so when allowing multiple threads to read and write the same memory,
because processors have all sorts of caches, hence a variable doesn’t only reside in one place in your RAM.</p>

<p>It can also be copied in the CPU L1/L2/etc caches, or even in the CPU registers.
When one thread writes into a variable, it’s not immediately visible to all other threads; the write will take a while
to propagate back to the RAM.
Worse, if you write into multiple variables in a specific order, it’s not even guaranteed other threads will witness these changes
in the same order.</p>

<p>Let’s consider a simple multi-threaded program:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Point</span> <span class="o">=</span> <span class="no">Struct</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:x</span><span class="p">,</span> <span class="ss">:y</span><span class="p">)</span>

<span class="n">treasure</span> <span class="o">=</span> <span class="kp">nil</span>

<span class="n">thread</span> <span class="o">=</span> <span class="no">Thread</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span>
  <span class="k">while</span> <span class="kp">true</span>
    <span class="k">if</span> <span class="n">treasure</span>
      <span class="nb">puts</span> <span class="s2">"Treasure is at </span><span class="si">#{</span><span class="n">treasure</span><span class="p">.</span><span class="nf">x</span><span class="p">.</span><span class="nf">inspect</span><span class="si">}</span><span class="s2"> / </span><span class="si">#{</span><span class="n">treasure</span><span class="p">.</span><span class="nf">y</span><span class="p">.</span><span class="nf">inspect</span><span class="si">}</span><span class="s2">"</span>
      <span class="k">break</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="n">point</span> <span class="o">=</span> <span class="no">Point</span><span class="p">.</span><span class="nf">new</span>
<span class="n">point</span><span class="p">.</span><span class="nf">x</span> <span class="o">=</span> <span class="mi">12</span>
<span class="n">point</span><span class="p">.</span><span class="nf">y</span> <span class="o">=</span> <span class="mi">24</span>
<span class="n">treasure</span> <span class="o">=</span> <span class="n">point</span>

<span class="n">thread</span><span class="p">.</span><span class="nf">join</span>
</code></pre></div></div>

<p>As a Ruby programmer, you likely expect this program to print <code class="language-plaintext highlighter-rouge">Treasure is at 12 / 24</code>, and you’d be correct.
After all, we fully initialize the <code class="language-plaintext highlighter-rouge">Point</code> instance before updating the <code class="language-plaintext highlighter-rouge">treasure</code> variable to point to it.</p>

<p>But if we were to write a similar program in C, the output could be any of:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">Treasure is at 12 / 24</code></li>
  <li><code class="language-plaintext highlighter-rouge">Treasure is at nil / 24</code></li>
  <li><code class="language-plaintext highlighter-rouge">Treasure is at 12 / nil</code></li>
  <li><code class="language-plaintext highlighter-rouge">Treasure is at nil / nil</code></li>
</ul>

<p>Why? Well, this has to do with <a href="https://en.wikipedia.org/wiki/Memory_model_(programming)">memory models</a>.
In order to optimize your code, compilers sometimes may have to change the order of memory reads and writes.
So for programmers to be able to write correct programs, they need to know what the compiler can and cannot do, and that’s
what a language memory model defines. In the case of C, the memory model is very lax, and compilers are allowed to reorder
reads and writes very extensively.</p>

<p>And it’s not only about the compilers. CPUs too can reorder read and write operations.
The <code class="language-plaintext highlighter-rouge">x86</code> (AKA Intel) memory model is quite strict, so it doesn’t reorder much, but the <code class="language-plaintext highlighter-rouge">arm64</code> memory model is much more lax,
so even if your compiler generated the native code in the same order, your CPU could execute them out of order,
giving you unpredictable results.</p>

<p>To work around this problem, C compilers and CPUs provide <a href="https://en.wikipedia.org/wiki/Barrier_(computer_science)">“barriers”</a>.
You can insert them in your code to enforce that reads and writes can’t be reordered across such barriers, allowing
you to ensure that all threads will observe memory in a consistent way.</p>

<h2 id="atomic-write">Atomic Write</h2>

<p>From a programmer’s perspective, it’s generally exposed as “atomic” read and write operations, and it’s understood by the
compiler and CPU that memory operations cannot be reordered across atomic operations.</p>

<p>So going back to our <code class="language-plaintext highlighter-rouge">instance_variable_set</code> implementation, we can fix two of the three race conditions by using an atomic
write:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Module</span>
  <span class="k">def</span> <span class="nf">instance_variable_set</span><span class="p">(</span><span class="n">variable_name</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
    <span class="k">raise</span> <span class="no">FrozenError</span> <span class="k">if</span> <span class="nb">frozen?</span>
    <span class="c1"># The main ractor is the only one allowed to write instance variables</span>
    <span class="k">raise</span> <span class="no">Ractor</span><span class="o">::</span><span class="no">IsolationError</span> <span class="k">unless</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">main_ractor?</span>

    <span class="k">if</span> <span class="n">field_index</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">field_index_for</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
      <span class="c1"># The variable already exists, we replace its value</span>
      <span class="vi">@fields</span><span class="p">[</span><span class="n">field_index</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
    <span class="k">else</span>
      <span class="c1"># The variable doesn't exist, we have to make a shape transition</span>
      <span class="n">next_shape</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">add_instance_variable</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>

      <span class="k">if</span> <span class="n">next_shape</span><span class="p">.</span><span class="nf">capacity</span> <span class="o">&gt;</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">capacity</span>
        <span class="c1"># @fields is full, we need to allocate a larger one</span>
        <span class="n">new_fields</span> <span class="o">=</span> <span class="no">Memory</span><span class="p">.</span><span class="nf">allocate</span><span class="p">(</span><span class="ss">size: </span><span class="n">next_shape</span><span class="p">.</span><span class="nf">capacity</span><span class="p">)</span>
        <span class="n">new_fields</span><span class="p">.</span><span class="nf">replace</span><span class="p">(</span><span class="vi">@fields</span><span class="p">)</span> <span class="c1"># copy content</span>
        <span class="n">old_fields</span> <span class="o">=</span> <span class="vi">@fields</span>
        <span class="c1"># Ensure `@fields` isn't updated before its content has been filled</span>
        <span class="no">Atomic</span><span class="p">.</span><span class="nf">write</span> <span class="p">{</span> <span class="vi">@fields</span> <span class="o">=</span> <span class="n">new_fields</span> <span class="p">}</span>

        <span class="c1"># The fields array is manually managed memory, so it needs to be freed explicitly</span>
        <span class="no">Memory</span><span class="p">.</span><span class="nf">free</span><span class="p">(</span><span class="n">old_fields</span><span class="p">)</span>
      <span class="k">end</span>

      <span class="vi">@fields</span><span class="p">[</span><span class="n">next_shape</span><span class="p">.</span><span class="nf">field_index</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
      <span class="vi">@shape</span> <span class="o">=</span> <span class="n">next_shape</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>With this simple change, we now guarantee that the new <code class="language-plaintext highlighter-rouge">@fields</code> will be visible to other threads before the new <code class="language-plaintext highlighter-rouge">@shape</code> is.</p>

<p>They may still see the old <code class="language-plaintext highlighter-rouge">@shape</code> with the new <code class="language-plaintext highlighter-rouge">@fields</code>, but that’s acceptable because all the offsets <code class="language-plaintext highlighter-rouge">@shape</code> may point to
contain the same values. Pretty neat. Now we only need to find a solution for the use-after-free problem.</p>
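<p>The reason a stale shape is harmless when paired with the new array can be checked with plain Ruby stand-ins. In this sketch, the arrays play the role of <code class="language-plaintext highlighter-rouge">@fields</code> and the hash plays the role of the offsets the old shape knows about. All the names are invented for the example:</p>

```ruby
old_fields = [:a_value, :b_value]     # full: capacity 2
old_offsets = { "@a": 0, "@b": 1 }    # what the old shape knows about

# Growing: allocate a larger array and copy the existing content first...
new_fields = Array.new(4)
new_fields[0, old_fields.size] = old_fields
# ...and only then would the real code publish it with an atomic write.

# A reader that still holds the old shape but already sees the new array
# gets the same answer for every offset the old shape can produce.
stale_reads_match = old_offsets.all? do |_name, offset|
  new_fields[offset].equal?(old_fields[offset])
end
puts stale_reads_match # prints true
```

<p>The opposite pairing, the new shape with the old array, is the one that has to be ruled out, which is what the ordering of the two writes is for.</p>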

<h2 id="our-friend-the-garbage-collector">Our Friend The Garbage Collector</h2>

<p>So our problem is that after we swap the old <code class="language-plaintext highlighter-rouge">@fields</code> array for the new one, we must free the old array to not leak memory.
But if there is no synchronization, we can’t guarantee that another thread doesn’t have a reference to the old array in its
registers or caches, so it may try to read from it after it was freed, and that might lead to a segmentation fault.</p>

<p>Hence, we must wait until there’s no longer any reference to the old array before freeing it, and if you think about it
that’s exactly what a garbage collector does, and lucky for us, Ruby already has one.</p>

<p>So the solution to avoid use-after-free is to use an actual Ruby <code class="language-plaintext highlighter-rouge">Array</code> instead of manually allocated memory.
This way we no longer have to free it explicitly, the garbage collector will take care of it later:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Module</span>
  <span class="k">def</span> <span class="nf">instance_variable_set</span><span class="p">(</span><span class="n">variable_name</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
    <span class="k">raise</span> <span class="no">FrozenError</span> <span class="k">if</span> <span class="nb">frozen?</span>
    <span class="c1"># The main ractor is the only one allowed to write instance variables</span>
    <span class="k">raise</span> <span class="no">Ractor</span><span class="o">::</span><span class="no">IsolationError</span> <span class="k">unless</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">main_ractor?</span>

    <span class="k">if</span> <span class="n">field_index</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">field_index_for</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
      <span class="c1"># The variable already exists, we replace its value</span>
      <span class="vi">@fields</span><span class="p">[</span><span class="n">field_index</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
    <span class="k">else</span>
      <span class="c1"># The variable doesn't exist, we have to make a shape transition</span>
      <span class="n">next_shape</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">add_instance_variable</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>

      <span class="k">if</span> <span class="n">next_shape</span><span class="p">.</span><span class="nf">capacity</span> <span class="o">&gt;</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">capacity</span>
        <span class="c1"># @fields is full, we need to allocate a larger one</span>
        <span class="n">new_fields</span> <span class="o">=</span> <span class="no">Array</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">next_shape</span><span class="p">.</span><span class="nf">capacity</span><span class="p">)</span>
        <span class="n">new_fields</span><span class="p">.</span><span class="nf">replace</span><span class="p">(</span><span class="vi">@fields</span><span class="p">)</span> <span class="c1"># copy content</span>
        <span class="n">old_fields</span> <span class="o">=</span> <span class="vi">@fields</span>
        <span class="c1"># Ensure `@fields` isn't updated before its content has been filled</span>
        <span class="no">Atomic</span><span class="p">.</span><span class="nf">write</span> <span class="p">{</span> <span class="vi">@fields</span> <span class="o">=</span> <span class="n">new_fields</span> <span class="p">}</span>
      <span class="k">end</span>

      <span class="vi">@fields</span><span class="p">[</span><span class="n">next_shape</span><span class="p">.</span><span class="nf">field_index</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
      <span class="vi">@shape</span> <span class="o">=</span> <span class="n">next_shape</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Now, if another thread is currently reading inside the old <code class="language-plaintext highlighter-rouge">@fields</code>, it doesn’t matter because it will remain valid
memory until the garbage collector notices it’s no longer referenced by anyone.</p>
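<p>You can observe the same property from regular Ruby code. In this small sketch, <code class="language-plaintext highlighter-rouge">ToyHolder</code> is a hypothetical stand-in for an object whose fields live in a Ruby <code class="language-plaintext highlighter-rouge">Array</code>, and a local variable stands in for the reference another thread might still be holding:</p>

```ruby
# ToyHolder is a made-up stand-in, not a real VM structure.
class ToyHolder
  attr_accessor :fields
end

holder = ToyHolder.new
holder.fields = [1, 2]

# A "reader" grabbed a reference to the current array...
seen_by_reader = holder.fields

# ...and then the "writer" swapped in a larger copy. Nothing is freed by hand.
larger = Array.new(4)
larger[0, 2] = holder.fields
holder.fields = larger

GC.start # even after a collection, the old array is alive: the reader references it
puts seen_by_reader.inspect          # prints [1, 2]
puts holder.fields.first(2).inspect  # prints [1, 2]
```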

<p>And just like that, we now have fully lock-free class instance variable reads and writes!</p>

<p>Well… no. Because we overlooked two complications.</p>

<h2 id="removing-instance-variables">Removing Instance Variables</h2>

<p>Perhaps you don’t know about it, because it’s quite a rare thing to do, but in Ruby, you can remove an object’s instance variables:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Test</span>
  <span class="nb">p</span> <span class="n">instance_variable_defined?</span><span class="p">(</span><span class="ss">:@foo</span><span class="p">)</span> <span class="c1"># =&gt; false</span>
  <span class="vi">@foo</span> <span class="o">=</span> <span class="mi">1</span>
  <span class="nb">p</span> <span class="n">instance_variable_defined?</span><span class="p">(</span><span class="ss">:@foo</span><span class="p">)</span> <span class="c1"># =&gt; true</span>

  <span class="n">remove_instance_variable</span><span class="p">(</span><span class="ss">:@foo</span><span class="p">)</span>
  <span class="nb">p</span> <span class="n">instance_variable_defined?</span><span class="p">(</span><span class="ss">:@foo</span><span class="p">)</span> <span class="c1"># =&gt; false</span>
<span class="k">end</span>
</code></pre></div></div>

<p>And while this is an extremely rare operation, it can happen, hence we must handle it in a thread-safe way.</p>

<p>Let’s look at its pseudo-implementation:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Module</span>
  <span class="k">def</span> <span class="nf">remove_instance_variable</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
    <span class="n">removed_index</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">field_index_for</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>

    <span class="c1"># The variable didn't exist in the first place</span>
    <span class="k">return</span> <span class="k">unless</span> <span class="n">removed_index</span>

    <span class="n">next_shape</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">remove_instance_variable</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>

    <span class="c1"># Shift fields left</span>
    <span class="n">removed_index</span><span class="p">.</span><span class="nf">upto</span><span class="p">(</span><span class="n">next_shape</span><span class="p">.</span><span class="nf">fields_count</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">index</span><span class="o">|</span>
      <span class="vi">@fields</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">=</span> <span class="vi">@fields</span><span class="p">[</span><span class="n">index</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span>
    <span class="k">end</span>

    <span class="vi">@shape</span> <span class="o">=</span> <span class="n">next_shape</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>So when removing an instance variable, we get a new shape that is shorter than the previous one, which means that
all the variables stored after the one we removed now have a lower index, so we need to shift all of those fields.</p>

<p>To better illustrate, consider the following code:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="vi">@a</span> <span class="o">=</span> <span class="mi">1</span>
<span class="vi">@b</span> <span class="o">=</span> <span class="mi">2</span>
<span class="vi">@c</span> <span class="o">=</span> <span class="mi">3</span>
<span class="n">remove_instance_variable</span><span class="p">(</span><span class="ss">:@b</span><span class="p">)</span>
</code></pre></div></div>

<p>In the snippet above, <code class="language-plaintext highlighter-rouge">@fields</code> will change from <code class="language-plaintext highlighter-rouge">[1, 2, 3]</code> to <code class="language-plaintext highlighter-rouge">[1, 3]</code>, and it’s not really possible to do this in a thread-safe way.</p>

<p>We could, of course, do this shifting in a copy of <code class="language-plaintext highlighter-rouge">@fields</code>, and then swap <code class="language-plaintext highlighter-rouge">@fields</code> atomically, but one major problem would remain: the old shape and the new
shape are fundamentally incompatible.</p>

<p>If you are accessing <code class="language-plaintext highlighter-rouge">@c</code> using the old fields with the new shape, you will get <code class="language-plaintext highlighter-rouge">2</code> which is incorrect.</p>

<p>If you are accessing <code class="language-plaintext highlighter-rouge">@c</code> using new fields with the old shape, you will get whatever is outside the array, or perhaps a segmentation fault.</p>

<p>So in this case, we can’t rely on clever ordering of writes to keep a consistent view of the instance variables for all ractors.</p>
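<p>The two mismatches are easy to reproduce with plain arrays and hashes standing in for the fields and the shapes. The variable names below are invented for the example:</p>

```ruby
old_fields = [1, 2, 3]
old_offsets = { "@a": 0, "@b": 1, "@c": 2 }

new_fields = [1, 3]
new_offsets = { "@a": 0, "@c": 1 }

# Consistent pairs give the right answer for @c:
puts old_fields[old_offsets[:@c]] # prints 3
puts new_fields[new_offsets[:@c]] # prints 3

# Mixed pairs don't:
puts old_fields[new_offsets[:@c]]          # prints 2, which is the old value of @b
puts new_fields[old_offsets[:@c]].inspect  # prints nil here; in C this would be an out-of-bounds read
```

<p>Ruby’s <code class="language-plaintext highlighter-rouge">Array</code> politely returns <code class="language-plaintext highlighter-rouge">nil</code> for the last case, but the VM is reading raw memory and has no such safety net.</p>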

<p>As an aside, this isn’t how the initial implementation of object shapes in Ruby worked.</p>

<p>Early in Ruby 3.2 development, <code class="language-plaintext highlighter-rouge">#remove_instance_variable</code> wouldn’t produce a shorter shape, but instead
a child shape of type <code class="language-plaintext highlighter-rouge">UNDEF</code> that would record that the variable at offset <code class="language-plaintext highlighter-rouge">1</code> needs to be considered not defined.</p>

<p>However, it was found that this could cause an unbounded number of shapes to be created by misbehaving code:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">obj</span> <span class="o">=</span> <span class="no">Object</span><span class="p">.</span><span class="nf">new</span>
<span class="kp">loop</span> <span class="k">do</span>
  <span class="n">obj</span><span class="p">.</span><span class="nf">instance_variable_set</span><span class="p">(</span><span class="ss">:@foo</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
  <span class="n">obj</span><span class="p">.</span><span class="nf">remove_instance_variable</span><span class="p">(</span><span class="ss">:@foo</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
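<p>To see why, here is a rough toy model of an append-only shape tree in which removing a variable adds an <code class="language-plaintext highlighter-rouge">UNDEF</code> child. <code class="language-plaintext highlighter-rouge">ToyNode</code> and its <code class="language-plaintext highlighter-rouge">transition</code> method are hypothetical and far simpler than the actual code from that period, so take it as an illustration of the principle only:</p>

```ruby
# Toy model only: ToyNode is hypothetical, not the real shape tree.
class ToyNode
  attr_reader :depth

  def initialize(depth = 0)
    @depth = depth
    @children = {}
  end

  # Transitions are cached, so following the same edge twice reuses the same node.
  def transition(kind, name)
    @children[[kind, name]] ||= ToyNode.new(@depth + 1)
  end
end

root = ToyNode.new
shape = root

3.times do
  shape = shape.transition(:ivar, :@foo)   # instance_variable_set
  shape = shape.transition(:undef, :@foo)  # remove_instance_variable
end

puts shape.depth # prints 6: two more shapes for every iteration, with no upper bound
```

<p>Caching the transitions doesn’t help here, because each iteration starts from a shape that was never seen before.</p>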

<p>So instead <a href="https://github.com/ruby/ruby/pull/6866">the implementation was changed to rebuild the shape tree</a>.</p>

<p>That previous implementation would have been useful in this case, as it would have prevented this race condition.
But ultimately it doesn’t matter, because there is another complication I didn’t mention.</p>

<h2 id="complex-shape">Complex Shape</h2>

<p>The other major complication I deliberately overlooked in my explanation thus far is the existence of complex shapes.</p>

<p>Since shapes are append-only, Ruby code that defines instance variables in random order or often removes instance variables
can potentially generate an infinite combination of shapes, and each shape uses some amount of memory.</p>

<p>That’s why Ruby keeps track of how many shape variations a given class causes, and after a specific threshold (currently 8),
Ruby gives up and marks the class as “too complex”.</p>

<p>If you run this script on a recent Ruby, you will see a performance warning:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Warning</span><span class="p">[</span><span class="ss">:performance</span><span class="p">]</span> <span class="o">=</span> <span class="kp">true</span>

<span class="k">class</span> <span class="nc">TooComplex</span>
  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="mi">10</span><span class="p">.</span><span class="nf">times</span> <span class="k">do</span> <span class="o">|</span><span class="n">i</span><span class="o">|</span>
      <span class="nb">instance_variable_set</span><span class="p">(</span><span class="s2">"@iv_</span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
      <span class="n">remove_instance_variable</span><span class="p">(</span><span class="s2">"@iv_</span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="no">TooComplex</span><span class="p">.</span><span class="nf">new</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/tmp/complex.rb:6: warning: The class TooComplex reached 8 shape variations,
instance variables accesses will be slower and memory usage increased.
It is recommended to define instance variables in a consistent order,
for instance by eagerly defining them all in the #initialize method.
</code></pre></div></div>

<p>When this happens, any operation on an instance of that class that would result in a new shape being created instead results
in some sort of “singleton” shape, known as the complex shape, and in that case instance variables are stored in a Hash
instead of being stored in an array. It’s slower and uses more memory, but limits the creation of new shapes.</p>

<p>So the real <code class="language-plaintext highlighter-rouge">#instance_variable_get</code> and <code class="language-plaintext highlighter-rouge">#instance_variable_set</code> implementations are more complicated than what I described at the start of the post.
In reality, they look more like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Module</span>
  <span class="k">def</span> <span class="nf">instance_variable_get</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
    <span class="k">if</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">too_complex?</span>
      <span class="vi">@fields</span><span class="p">[</span><span class="n">variable_name</span><span class="p">]</span> <span class="c1"># @fields is a Hash</span>
    <span class="k">elsif</span> <span class="n">field_index</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">field_index_for</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
      <span class="vi">@fields</span><span class="p">[</span><span class="n">field_index</span><span class="p">]</span> <span class="c1"># @fields is an Array</span>
    <span class="k">end</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">instance_variable_set</span><span class="p">(</span><span class="n">variable_name</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
    <span class="k">raise</span> <span class="no">FrozenError</span> <span class="k">if</span> <span class="nb">frozen?</span>
    <span class="c1"># The main ractor is the only one allowed to write instance variables</span>
    <span class="k">raise</span> <span class="no">Ractor</span><span class="o">::</span><span class="no">IsolationError</span> <span class="k">unless</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">main_ractor?</span>

    <span class="k">if</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">too_complex?</span>
      <span class="k">return</span> <span class="vi">@fields</span><span class="p">[</span><span class="n">variable_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
    <span class="k">end</span>

    <span class="k">if</span> <span class="n">field_index</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">field_index_for</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
      <span class="c1"># The variable already exists, we replace its value</span>
      <span class="vi">@fields</span><span class="p">[</span><span class="n">field_index</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
    <span class="k">else</span>
      <span class="c1"># The variable doesn't exist, we have to make a shape transition</span>
      <span class="n">next_shape</span> <span class="o">=</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">add_instance_variable</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>

      <span class="k">if</span> <span class="n">next_shape</span><span class="p">.</span><span class="nf">too_complex?</span>
        <span class="n">new_fields</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="vi">@shape</span><span class="p">.</span><span class="nf">each_ancestor</span> <span class="k">do</span> <span class="o">|</span><span class="n">shape</span><span class="o">|</span>
          <span class="n">new_fields</span><span class="p">[</span><span class="n">shape</span><span class="p">.</span><span class="nf">variable_name</span><span class="p">]</span> <span class="o">=</span> <span class="vi">@fields</span><span class="p">[</span><span class="n">shape</span><span class="p">.</span><span class="nf">field_index</span><span class="p">]</span>
        <span class="k">end</span>

        <span class="vi">@fields</span> <span class="o">=</span> <span class="n">new_fields</span>
        <span class="vi">@shape</span> <span class="o">=</span> <span class="n">next_shape</span>

        <span class="k">return</span> <span class="vi">@fields</span><span class="p">[</span><span class="n">variable_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
      <span class="k">end</span>

      <span class="k">if</span> <span class="n">next_shape</span><span class="p">.</span><span class="nf">capacity</span> <span class="o">&gt;</span> <span class="vi">@shape</span><span class="p">.</span><span class="nf">capacity</span>
        <span class="c1"># @fields is full, we need to allocate a larger one</span>
        <span class="n">new_fields</span> <span class="o">=</span> <span class="no">Array</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">next_shape</span><span class="p">.</span><span class="nf">capacity</span><span class="p">)</span>
        <span class="n">new_fields</span><span class="p">.</span><span class="nf">replace</span><span class="p">(</span><span class="vi">@fields</span><span class="p">)</span> <span class="c1"># copy content</span>
        <span class="n">old_fields</span> <span class="o">=</span> <span class="vi">@fields</span>
        <span class="c1"># Ensure `@fields` isn't updated before its content has been filled</span>
        <span class="no">Atomic</span><span class="p">.</span><span class="nf">write</span> <span class="p">{</span> <span class="vi">@fields</span> <span class="o">=</span> <span class="n">new_fields</span> <span class="p">}</span>
      <span class="k">end</span>

      <span class="vi">@fields</span><span class="p">[</span><span class="n">next_shape</span><span class="p">.</span><span class="nf">field_index</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
      <span class="vi">@shape</span> <span class="o">=</span> <span class="n">next_shape</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>And this code is now riddled with race conditions, because regular and complex shapes are radically different.
Even in the happy path case where we’re adding a new instance variable, we might turn <code class="language-plaintext highlighter-rouge">@fields</code> from an <code class="language-plaintext highlighter-rouge">Array</code> into
a <code class="language-plaintext highlighter-rouge">Hash</code>.
So if <code class="language-plaintext highlighter-rouge">@shape</code> and <code class="language-plaintext highlighter-rouge">@fields</code> aren’t perfectly synchronized, we might end up trying to access a Hash
like an Array, or vice-versa, which will likely end up in a VM crash.</p>

<h2 id="128bit-atomics">128bit Atomics</h2>

<p>One solution could have been to ensure <code class="language-plaintext highlighter-rouge">@shape</code> and <code class="language-plaintext highlighter-rouge">@fields</code> are written atomically together, but unfortunately in this case
it isn’t really possible.</p>

<p>First, because it would require writing two pointer-sized (64bit) values in a single atomic operation, which is possible
on some modern CPUs using SIMD instructions, but Ruby supports many different platforms, and there is no way all of them
would have support for it.</p>

<p>And second, because the constraint with this is that both fields need to be contiguous.
You can’t atomically write two pointer-sized values that are distant from each other.
Semantically you are treating two contiguous 64bit values as a single 128bit one, and for reasons I won’t get into here,
<code class="language-plaintext highlighter-rouge">@shape</code> and <code class="language-plaintext highlighter-rouge">@fields</code> can’t be made contiguous.</p>

<h2 id="delegation">Delegation</h2>

<p>That’s where it came to me that we could instead bundle the <code class="language-plaintext highlighter-rouge">@shape</code> and <code class="language-plaintext highlighter-rouge">@fields</code> in their own GC-managed object,
so that when we have to update both atomically, we can work on a copy and then swap the pointer:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Module</span>
  <span class="k">def</span> <span class="nf">instance_variable_get</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
    <span class="vi">@fields_object</span><span class="o">&amp;</span><span class="p">.</span><span class="nf">instance_variable_get</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">instance_variable_set</span><span class="p">(</span><span class="n">variable_name</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
    <span class="k">raise</span> <span class="no">FrozenError</span> <span class="k">if</span> <span class="nb">frozen?</span>
    <span class="c1"># The main ractor is the only one allowed to write instance variables</span>
    <span class="k">raise</span> <span class="no">Ractor</span><span class="o">::</span><span class="no">IsolationError</span> <span class="k">unless</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">main_ractor?</span>

    <span class="n">new_fields_object</span> <span class="o">=</span> <span class="vi">@fields_object</span> <span class="p">?</span> <span class="vi">@fields_object</span><span class="p">.</span><span class="nf">dup</span> <span class="p">:</span> <span class="no">Object</span><span class="p">.</span><span class="nf">new</span>
    <span class="n">new_fields_object</span><span class="p">.</span><span class="nf">instance_variable_set</span><span class="p">(</span><span class="n">variable_name</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
    <span class="no">Atomic</span><span class="p">.</span><span class="nf">write</span> <span class="p">{</span> <span class="vi">@fields_object</span> <span class="o">=</span> <span class="n">new_fields_object</span> <span class="p">}</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">remove_instance_variable</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
    <span class="k">raise</span> <span class="no">FrozenError</span> <span class="k">if</span> <span class="nb">frozen?</span>
    <span class="c1"># The main ractor is the only one allowed to write instance variables</span>
    <span class="k">raise</span> <span class="no">Ractor</span><span class="o">::</span><span class="no">IsolationError</span> <span class="k">unless</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">main_ractor?</span>

    <span class="n">new_fields_object</span> <span class="o">=</span> <span class="vi">@fields_object</span> <span class="p">?</span> <span class="vi">@fields_object</span><span class="p">.</span><span class="nf">dup</span> <span class="p">:</span> <span class="no">Object</span><span class="p">.</span><span class="nf">new</span>
    <span class="n">new_fields_object</span><span class="p">.</span><span class="nf">remove_instance_variable</span><span class="p">(</span><span class="n">variable_name</span><span class="p">)</span>
    <span class="no">Atomic</span><span class="p">.</span><span class="nf">write</span> <span class="p">{</span> <span class="vi">@fields_object</span> <span class="o">=</span> <span class="n">new_fields_object</span> <span class="p">}</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>It really is that trivial. Instead of storing instance variables in the class or module, we store them in a regular <code class="language-plaintext highlighter-rouge">Object</code>,
and on mutation, we first clone the current state, do our unsafe mutation, and finally atomically swap the <code class="language-plaintext highlighter-rouge">@fields_object</code> reference.</p>
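
<p>To make the copy-then-swap idea concrete outside of the VM, here is a minimal, runnable sketch in plain Ruby.
It is only a toy model and not what the patch does internally: <code class="language-plaintext highlighter-rouge">FieldsHolder</code> is a hypothetical name, a frozen <code class="language-plaintext highlighter-rouge">Hash</code> stands in for the
fields object, and a plain reference assignment stands in for the atomic write. Like the code above, it assumes there is
only ever a single writer.</p>

```ruby
# Toy model of "copy, mutate the copy, then swap the reference".
# FieldsHolder is a hypothetical name; this is not the VM's implementation.
class FieldsHolder
  def initialize
    # The published snapshot is frozen, so a reader can never see it change.
    @snapshot = {}.freeze
  end

  def get(name)
    # One read of @snapshot: the reader works on whichever complete version it got.
    @snapshot[name]
  end

  def set(name, value)
    copy = @snapshot.dup     # 1. copy the current state (dup returns an unfrozen Hash)
    copy[name] = value       # 2. mutate the private copy, invisible to readers
    @snapshot = copy.freeze  # 3. publish it with a single reference assignment
    value
  end

  def snapshot
    @snapshot
  end
end

holder = FieldsHolder.new
before = holder.snapshot
holder.set(:@a, 1)
after = holder.snapshot
```

<p>A reader that grabbed <code class="language-plaintext highlighter-rouge">before</code> keeps seeing a consistent, empty state, while new readers see <code class="language-plaintext highlighter-rouge">after</code>.
No reader can observe a half-updated state, which is the property the delegation approach relies on.</p>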

<p>Of course, doing it exactly like this would cause a huge increase in object allocation, so in the actual code I added lots
of special cases to directly mutate the existing object rather than to copy it when it is safe to do so, but conceptually
this is <a href="https://github.com/byroot/ruby/commit/989bce8eef24c6dc6aeb7495d7c57c4324016e72">exactly what my current patch is doing</a>.</p>

<p>That patch is mostly a proof of concept. In the end, I don’t think we should use an actual <code class="language-plaintext highlighter-rouge">T_OBJECT</code> for various reasons,
but I already have a follow-up patch that replaces it with a <code class="language-plaintext highlighter-rouge">T_IMEMO</code>, which is an internal type invisible to Ruby users.</p>

<p>With this solution I was able to remove the locks around class instance variables, and now the ractor version
of the micro-benchmark runs almost 3 times faster than the single-threaded version:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ hyperfine -w 1 './miniruby --yjit ../test.rb' './miniruby --yjit ../test.rb ractor'
Benchmark 1: ./miniruby --yjit ../test.rb
  Time (mean ± σ):     166.3 ms ±   1.1 ms    [User: 164.4 ms, System: 1.5 ms]
  Range (min … max):   164.0 ms … 168.5 ms    18 runs

Benchmark 2: ./miniruby --yjit ../test.rb ractor
  Time (mean ± σ):      59.3 ms ±   2.6 ms    [User: 211.4 ms, System: 1.5 ms]
  Range (min … max):    57.9 ms …  67.7 ms    48 runs

Summary
  ./miniruby --yjit ../test.rb ractor ran
    2.80 ± 0.12 times faster than ./miniruby --yjit ../test.rb
</code></pre></div></div>

<p>That’s still far from the 8 times faster you might expect, but profiling indicates that it’s now a scheduling problem,
which we’ll eventually fix too, and it’s still over 13 times faster than on Ruby 3.4:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ hyperfine -w 1 'ruby --disable-all --yjit ../test.rb ractor' './ruby --disable-all --yjit ../test.rb ractor'
Benchmark 1: ruby --disable-all --yjit ../test.rb ractor
  Time (mean ± σ):     772.3 ms ±   9.0 ms    [User: 1023.8 ms, System: 1325.6 ms]
  Range (min … max):   759.3 ms … 790.5 ms    10 runs

Benchmark 2: ./ruby --disable-all --yjit ../test.rb ractor
  Time (mean ± σ):      56.8 ms ±   1.4 ms    [User: 205.7 ms, System: 1.6 ms]
  Range (min … max):    55.8 ms …  65.6 ms    50 runs

Summary
  ./ruby --disable-all --yjit ../test.rb ractor ran
   13.59 ± 0.36 times faster than ruby --disable-all --yjit ../test.rb ractor
</code></pre></div></div>

<p>Hopefully, I’ll get this merged in the next couple of weeks.</p>

<h2 id="wont-this-increase-memory-usage">Won’t This Increase Memory Usage?</h2>

<p>You may be thinking that this is all well and good, but that using another object to store the instance
variables of classes and modules will increase Ruby’s memory usage.</p>

<p>Well, probably not. Previously the <code class="language-plaintext highlighter-rouge">@fields</code> memory was managed by <code class="language-plaintext highlighter-rouge">malloc</code>, and while it depends on which implementation
of <code class="language-plaintext highlighter-rouge">malloc</code> you are using, most of them will have an overhead of <code class="language-plaintext highlighter-rouge">16B</code> per allocated pointer, which is exactly the overhead
of a Ruby object.</p>

<p>So overall it shouldn’t cause memory usage to increase.</p>

<h2 id="cherry-on-top">Cherry On Top</h2>

<p>This solution has another incidental benefit, which is that it fixes both a bug and a performance regression recently introduced
when <a href="https://bugs.ruby-lang.org/issues/21311">the new Namespace feature was merged</a>.</p>

<p>Under namespaces, core classes are supposed to have a different set of instance variables, and frozen status, in each namespace,
but this doesn’t work well at all with shapes because right now the shape is stored in the object header, hence all objects
including classes and modules, only have a single shape.</p>

<p>By delegating instance variable management to another object, classes can now have one <code class="language-plaintext highlighter-rouge">@fields_object</code> per namespace,
encompassing both the shape and the fields, hence properly namespacing class instance variables.</p>

<p>It wasn’t at all a motivation for this change, but it’s a nice side effect.</p>]]></content><author><name></name></author><category term="ruby" /><category term="performance" /><summary type="html"><![CDATA[In a previous post about ractors, I explained why I think it’s really unlikely you’d ever be able to run an entire application inside a ractor, but that they could still be situationally very useful to move CPU-bound work out of the main thread, and to unlock some parallel algorithm.]]></summary></entry><entry><title type="html">Unlocking Ractors: object_id</title><link href="https://byroot.github.io/ruby/performance/2025/04/26/unlocking-ractors-object-id.html" rel="alternate" type="text/html" title="Unlocking Ractors: object_id" /><published>2025-04-26T10:03:51+00:00</published><updated>2025-04-26T10:03:51+00:00</updated><id>https://byroot.github.io/ruby/performance/2025/04/26/unlocking-ractors-object-id</id><content type="html" xml:base="https://byroot.github.io/ruby/performance/2025/04/26/unlocking-ractors-object-id.html"><![CDATA[<p>In <a href="/ruby/performance/2025/02/27/whats-the-deal-with-ractors.html">a previous post about ractors</a>, I explained why I think it’s really unlikely you’d ever be able to run an entire application inside a ractor, but that they could
still be situationally very useful to move CPU-bound work out of the main thread, and to unlock some parallel algorithms.</p>

<p>But as I mentioned, this is unfortunately not yet viable because there are many known implementation bugs that can lead
to interpreter crashes, and that while they are supposed to execute in parallel, the Ruby VM still has one true global
lock that Ractors need to acquire to perform certain operations, making them often perform worse than the equivalent
single-threaded code.</p>

<p>But things are evolving rapidly.
Since then, a team of people has started working on fixing exactly that: tackling known bugs and eliminating or reducing the remaining contention points.</p>

<p>The one example I gave to illustrate this remaining contention was the <code class="language-plaintext highlighter-rouge">fstring_table</code>, which in short is a big internal
hash table used to deduplicate strings, which Ruby does whenever you use a String as a key in a Hash.
Because looking into that table while another Ractor is inserting a new entry would result in a crash (or worse),
until last week Ruby had to acquire the remaining VM lock whenever it touched that table.</p>

<p>But <a href="https://bugs.ruby-lang.org/issues/21268">John Hawthorn recently replaced it with a lock-free Hash-Set</a>, and now this
contention point is gone. If you re-run the JSON benchmarks from the previous post using the latest Ruby master,
the Ractor version is now twice as fast as the single-threaded version, instead of being 3 times slower.</p>

<p>This still isn’t perfect though, as the benchmark uses 5 ractors, hence in an ideal world should be almost 5 times faster
than the single-threaded example, so we still have a lot of work to do to eliminate or reduce the remaining contention
points.</p>

<p>One such remaining contention point, which you likely didn’t suspect would be one, is
<a href="https://docs.ruby-lang.org/en/3.4/Object.html#method-i-object_id">the <code class="language-plaintext highlighter-rouge">#object_id</code> method</a>.
And on my way back from RubyKaigi, I started working on tackling it.</p>

<p>But before we delve into what I plan to do about it, let’s talk about how this method came to be a contention point.</p>

<h2 id="a-little-bit-of-history">A Little Bit Of History</h2>

<p>Up until Ruby 2.6, the <code class="language-plaintext highlighter-rouge">#object_id</code> implementation used to be quite trivial:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">VALUE</span>
<span class="nf">rb_obj_id</span><span class="p">(</span><span class="n">VALUE</span> <span class="n">obj</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">STATIC_SYM_P</span><span class="p">(</span><span class="n">obj</span><span class="p">))</span> <span class="p">{</span>
        <span class="k">return</span> <span class="p">(</span><span class="n">SYM2ID</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">RVALUE</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="mi">4</span> <span class="o">&lt;&lt;</span> <span class="mi">2</span><span class="p">))</span> <span class="o">|</span> <span class="n">FIXNUM_FLAG</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">FLONUM_P</span><span class="p">(</span><span class="n">obj</span><span class="p">))</span> <span class="p">{</span>
      <span class="k">return</span> <span class="n">LL2NUM</span><span class="p">((</span><span class="n">SIGNED_VALUE</span><span class="p">)</span><span class="n">obj</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">SPECIAL_CONST_P</span><span class="p">(</span><span class="n">obj</span><span class="p">))</span> <span class="p">{</span>
      <span class="k">return</span> <span class="n">LONG2NUM</span><span class="p">((</span><span class="n">SIGNED_VALUE</span><span class="p">)</span><span class="n">obj</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">LL2NUM</span><span class="p">((</span><span class="n">SIGNED_VALUE</span><span class="p">)(</span><span class="n">obj</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Of course, it’s C so it might be a bit cryptic to the uninitiated, but in short, for the common case of a heap allocated
object, its <code class="language-plaintext highlighter-rouge">object_id</code> would be the address where the object is stored, divided by two.
So in a way, <code class="language-plaintext highlighter-rouge">#object_id</code> used to return you an actual pointer to the object.</p>

<p>This made implementing the lesser-known counterpart of <code class="language-plaintext highlighter-rouge">#object_id</code>, <a href="https://docs.ruby-lang.org/en/2.5.0/ObjectSpace.html#method-c-_id2ref"><code class="language-plaintext highlighter-rouge">ObjectSpace._id2ref</code></a>,
just as trivial: multiply the <code class="language-plaintext highlighter-rouge">object_id</code> by two, and here you go, you now have a pointer to the corresponding object.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span> <span class="o">=</span> <span class="s2">"I am a string"</span>
<span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">_id2ref</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="nf">object_id</span><span class="p">).</span><span class="nf">equal?</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="c1"># =&gt; true</span>
</code></pre></div></div>
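
<p>The round trip described above is plain integer arithmetic. Here is a small sketch of it, using a made-up address
(the value below is an assumption chosen for illustration, not one taken from a real process). It only covers the
heap-allocated case and ignores special constants such as symbols and flonums:</p>

```ruby
# Hypothetical heap address. Slot addresses are aligned, hence always even,
# so dividing by two loses no information.
address = 0x7f3a_1c2b_4d50

object_id = address / 2    # conceptually what #object_id returned up to Ruby 2.6
recovered = object_id * 2  # conceptually what _id2ref did to get the pointer back
```

<p>Because <code class="language-plaintext highlighter-rouge">recovered</code> is just an address, nothing in that scheme can tell whether the slot still holds the original object.</p>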

<p>But there was actually a major problem with that implementation, which is that the Ruby heap is composed of standard-size slots.
When an object is no longer referenced, the GC reclaims the object slot and will most likely re-use it for a future object.</p>

<p>Hence if you were to hold onto an <code class="language-plaintext highlighter-rouge">object_id</code>, and use <code class="language-plaintext highlighter-rouge">ObjectSpace._id2ref</code>, it’s not actually certain the object you get
back is the one you got the <code class="language-plaintext highlighter-rouge">object_id</code> from, it might be a totally different object.</p>

<p>It also meant that if you are holding onto an <code class="language-plaintext highlighter-rouge">object_id</code> as a way to know if you’ve already seen a given object,
you may run into some false positives.</p>

<p>That’s why <a href="https://bugs.ruby-lang.org/issues/15408">in 2018 there was already a feature request to deprecate both <code class="language-plaintext highlighter-rouge">#object_id</code> and <code class="language-plaintext highlighter-rouge">_id2ref</code></a>.
Back then Matz agreed to deprecate <code class="language-plaintext highlighter-rouge">_id2ref</code> for Ruby 2.7, but pointed out that removing <code class="language-plaintext highlighter-rouge">#object_id</code> would be too much of a breaking change,
and that it is a useful API.
However, this somehow fell through the cracks, and <code class="language-plaintext highlighter-rouge">_id2ref</code> was never formally deprecated, which is <a href="https://github.com/ruby/ruby/pull/13157">something I’d like to
do for Ruby 3.5</a>.</p>

<p>I’m not certain why <code class="language-plaintext highlighter-rouge">_id2ref</code> was added initially, given that <code class="language-plaintext highlighter-rouge">git blame</code> points to <a href="https://github.com/ruby/ruby/commit/210367ec889">a commit from 1999 that was generated by cvs2svn</a>.
But if I had to guess, I’d say it was added for <code class="language-plaintext highlighter-rouge">drb</code> which today remains the only significant user of that API in the stdlib, but <a href="https://github.com/ruby/drb/pull/35">even that is about to change</a>.</p>

<h2 id="gc-compaction">GC Compaction</h2>

<p>Regardless of why <code class="language-plaintext highlighter-rouge">_id2ref</code> was added, that major flaw in its design became a blocker for Aaron Patterson when <a href="https://bugs.ruby-lang.org/issues/15626">he implemented
GC compaction in Ruby 2.7</a>.
Since GC compaction implies that objects can be moved from one slot to another, <code class="language-plaintext highlighter-rouge">#object_id</code> could no longer be derived from
the object address, otherwise, it wouldn’t remain stable.</p>

<p>What Aaron did is conceptually simple:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">Kernel</span>
  <span class="k">def</span> <span class="nf">object_id</span>
    <span class="k">unless</span> <span class="nb">id</span> <span class="o">=</span> <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">OBJ_TO_ID_TABLE</span><span class="p">[</span><span class="nb">self</span><span class="p">]</span>
      <span class="nb">id</span> <span class="o">=</span> <span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">next_obj_id</span>
      <span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">next_obj_id</span> <span class="o">+=</span> <span class="mi">8</span>
      <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">OBJ_TO_ID_TABLE</span><span class="p">[</span><span class="nb">self</span><span class="p">]</span> <span class="o">=</span> <span class="nb">id</span>
      <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">ID_TO_OBJ_TABLE</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span> <span class="o">=</span> <span class="nb">self</span>
    <span class="k">end</span>
    <span class="nb">id</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">module</span> <span class="nn">ObjectSpace</span>
  <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">_id2ref</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
    <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">ID_TO_OBJ_TABLE</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>In short, Ruby added two internal Hash tables. One of them with objects as keys and IDs as values, and the inverse for the other.
Whenever you access an object’s ID for the first time, a unique ID is created by incrementing an internal counter,
and the relation between the object and its ID is stored in the two hash tables.</p>
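
<p>The conceptual code above relies on VM internals that don’t exist as Ruby constants, so here is a runnable toy
version of the same idea. <code class="language-plaintext highlighter-rouge">ToyIdRegistry</code>, <code class="language-plaintext highlighter-rouge">id_for</code> and <code class="language-plaintext highlighter-rouge">ref_for</code> are hypothetical names, and starting the
counter at 8 is an assumption. Unlike the real tables, this sketch holds strong references to the objects, so it would
keep them alive forever:</p>

```ruby
# Toy model of the two tables: object to ID, and ID to object.
# All names here are hypothetical; the real tables live inside the VM.
class ToyIdRegistry
  STEP = 8 # same increment as in the conceptual code above

  def initialize
    @next_id = STEP
    # Keys are compared by identity, not by #hash / #eql?, so two equal
    # strings still get two different IDs.
    @obj_to_id = {}.compare_by_identity
    @id_to_obj = {}
  end

  def id_for(obj)
    # The block only runs the first time an object is seen.
    @obj_to_id.fetch(obj) do
      id = @next_id
      @next_id += STEP
      @obj_to_id[obj] = id
      @id_to_obj[id] = obj
      id
    end
  end

  def ref_for(id)
    @id_to_obj[id]
  end
end

registry = ToyIdRegistry.new
a = Object.new
b = Object.new
first = registry.id_for(a)
second = registry.id_for(b)
```

<p>Asking for the ID of <code class="language-plaintext highlighter-rouge">a</code> a second time returns the same number, and <code class="language-plaintext highlighter-rouge">ref_for</code> maps it back to the very same object.</p>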

<p>As a Ruby user, you can observe this change easily by printing some <code class="language-plaintext highlighter-rouge">object_id</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">p</span> <span class="no">Object</span><span class="p">.</span><span class="nf">new</span><span class="p">.</span><span class="nf">object_id</span>
<span class="nb">p</span> <span class="no">Object</span><span class="p">.</span><span class="nf">new</span><span class="p">.</span><span class="nf">object_id</span>
</code></pre></div></div>

<p>Up to Ruby 2.6, the above code will print some large and seemingly random integers such as <code class="language-plaintext highlighter-rouge">50666405449360</code>, whereas on
Ruby 2.7 onwards, it will print small integers, likely <code class="language-plaintext highlighter-rouge">8</code> and <code class="language-plaintext highlighter-rouge">16</code>.</p>

<p>This change both solved the historical issue with <code class="language-plaintext highlighter-rouge">_id2ref</code> and allowed the GC to keep stable IDs when moving objects from one
address to the other, but made <code class="language-plaintext highlighter-rouge">object_id</code> way more costly than it used to be.</p>

<p>Ruby’s hash-table implementation stores 3 pointer-sized numbers per entry.
One for the key, one for the value, and one for the hashcode:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">st_table_entry</span> <span class="p">{</span>
    <span class="n">st_hash_t</span> <span class="n">hash</span><span class="p">;</span>
    <span class="n">st_data_t</span> <span class="n">key</span><span class="p">;</span>
    <span class="n">st_data_t</span> <span class="n">record</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>

<p>And given every <code class="language-plaintext highlighter-rouge">object_id</code> is stored in two hash-tables, that makes for a total of <code class="language-plaintext highlighter-rouge">48B</code> (plus some change) per <code class="language-plaintext highlighter-rouge">object_id</code>.
That’s quite a lot of memory for just a small number.</p>
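
<p>For reference, this is the arithmetic behind that figure, assuming a 64-bit platform where a pointer-sized field
takes 8 bytes. The variable names are mine, and the result ignores the tables’ own bookkeeping (the “plus some change”):</p>

```ruby
word_size = 8         # bytes per pointer-sized field, assuming a 64-bit platform
fields_per_entry = 3  # hash, key and record in st_table_entry
tables = 2            # object-to-ID and ID-to-object

bytes_per_entry = word_size * fields_per_entry
bytes_per_object_id = bytes_per_entry * tables
```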

<p>In addition, accessing the <code class="language-plaintext highlighter-rouge">object_id</code> now requires doing a hash lookup, when before it was a simple division, and whenever
the GC frees or moves an object that has an ID, it needs to update these two hash-tables.</p>

<p>To be clear, I don’t have any evidence that these two tables cause significant memory or CPU overhead in real-world Ruby applications.
I’m just saying that <code class="language-plaintext highlighter-rouge">#object_id</code> is way more expensive than one might expect.</p>

<h2 id="entering-ractors">Entering Ractors</h2>

<p>Then later on, when Koichi Sasada implemented Ractors, since multiple ractors could now attempt to access these two hash-tables
concurrently, <a href="https://github.com/ruby/ruby/commit/da3438a5045">he had to add a lock around them in <code class="language-plaintext highlighter-rouge">#object_id</code></a>, turning
<code class="language-plaintext highlighter-rouge">#object_id</code> into a contention point:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">Kernel</span>
  <span class="k">def</span> <span class="nf">object_id</span>
    <span class="no">RubyVM</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
      <span class="k">unless</span> <span class="nb">id</span> <span class="o">=</span> <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">OBJ_TO_ID_TABLE</span><span class="p">[</span><span class="nb">self</span><span class="p">]</span>
        <span class="nb">id</span> <span class="o">=</span> <span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">next_obj_id</span>
        <span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">next_obj_id</span> <span class="o">+=</span> <span class="mi">8</span>
        <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">OBJ_TO_ID_TABLE</span><span class="p">[</span><span class="nb">self</span><span class="p">]</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">ID_TO_OBJ_TABLE</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span> <span class="o">=</span> <span class="nb">self</span>
      <span class="k">end</span>
      <span class="nb">id</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">module</span> <span class="nn">ObjectSpace</span>
  <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">_id2ref</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
    <span class="no">RubyVM</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
      <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">ID_TO_OBJ_TABLE</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>At this point, you may wonder if it’s really a big deal.
After all, <code class="language-plaintext highlighter-rouge">#object_id</code> is used a bit for debugging, but not so much in actual production code.
And this is mostly true, but it does come up in real-world code, e.g. <a href="https://github.com/mikel/mail/blob/d1d65b370b109b98e673a934e8b70a0c1f58cc59/lib/mail/message.rb#L1698">in the <code class="language-plaintext highlighter-rouge">mail</code> gem</a>,
<a href="https://github.com/rubocop/rubocop/blob/4a611564c4e1d8ec12a8e45e96490465e5141605/lib/rubocop/cop/variable_force/branch.rb#L129-L131">in <code class="language-plaintext highlighter-rouge">rubocop</code></a>,
and of course <a href="https://github.com/rails/rails/blob/99e27fa586af7db2b5334124a62eb3a464cdffd8/activesupport/lib/active_support/cache/strategy/local_cache.rb#L213-L215">quite a bit in Rails</a>.</p>

<p>But calling <code class="language-plaintext highlighter-rouge">Kernel#object_id</code> isn’t the only way you might rely on an object ID.</p>

<p>The <a href="https://docs.ruby-lang.org/en/3.4/Object.html#method-i-hash"><code class="language-plaintext highlighter-rouge">Object#hash</code></a> method, for example, relies on it:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">st_index_t</span>
<span class="nf">objid_hash</span><span class="p">(</span><span class="n">VALUE</span> <span class="n">obj</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">VALUE</span> <span class="n">object_id</span> <span class="o">=</span> <span class="n">rb_obj_id</span><span class="p">(</span><span class="n">obj</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">FIXNUM_P</span><span class="p">(</span><span class="n">object_id</span><span class="p">))</span>
        <span class="n">object_id</span> <span class="o">=</span> <span class="n">rb_big_hash</span><span class="p">(</span><span class="n">object_id</span><span class="p">);</span>

    <span class="k">return</span> <span class="p">(</span><span class="n">st_index_t</span><span class="p">)</span><span class="n">st_index_hash</span><span class="p">((</span><span class="n">st_index_t</span><span class="p">)</span><span class="n">NUM2LL</span><span class="p">(</span><span class="n">object_id</span><span class="p">));</span>
<span class="p">}</span>

<span class="n">VALUE</span>
<span class="nf">rb_obj_hash</span><span class="p">(</span><span class="n">VALUE</span> <span class="n">obj</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">long</span> <span class="n">hnum</span> <span class="o">=</span> <span class="n">any_hash</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">objid_hash</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">ST2FIX</span><span class="p">(</span><span class="n">hnum</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Common value classes such as <code class="language-plaintext highlighter-rouge">String</code>, <code class="language-plaintext highlighter-rouge">Array</code> etc, do define their own <code class="language-plaintext highlighter-rouge">#hash</code> method that doesn’t rely on the object ID,
but all other objects that are compared by identity by default will end up using <code class="language-plaintext highlighter-rouge">Object#hash</code>, hence accessing the <code class="language-plaintext highlighter-rouge">object_id</code>.</p>
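
<p>As a quick illustration of that difference, here is a minimal sketch you can run in any recent Ruby. The variable names are only there for the example:</p>

```ruby
# Value classes hash by content: two distinct but equal strings
# produce the same hash code.
a = "ractor"
b = "ractor".dup
same_content_same_hash = (a.hash == b.hash)

# String brings its own #hash, while a plain object falls back
# to the default implementation defined in Kernel.
value_hash_owner = "".method(:hash).owner
plain_hash_owner = Object.new.method(:hash).owner

puts same_content_same_hash
puts value_hash_owner
puts plain_hash_owner
```

<p>Anything that lands on the <code class="language-plaintext highlighter-rouge">Kernel</code> implementation is hashed by identity, which is where the object ID comes into play.</p>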

<p>For instance, here’s a quite classic <code class="language-plaintext highlighter-rouge">#hash</code> implementation from one of Rails’ classes:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#  activerecord/lib/arel/nodes/delete_statement.rb</span>
  <span class="k">def</span> <span class="nf">hash</span>
    <span class="p">[</span><span class="nb">self</span><span class="p">.</span><span class="nf">class</span><span class="p">,</span> <span class="vi">@relation</span><span class="p">,</span> <span class="vi">@wheres</span><span class="p">,</span> <span class="vi">@orders</span><span class="p">,</span> <span class="vi">@limit</span><span class="p">,</span> <span class="vi">@offset</span><span class="p">,</span> <span class="vi">@key</span><span class="p">].</span><span class="nf">hash</span>
  <span class="k">end</span>
</code></pre></div></div>

<p>It absolutely isn’t obvious, but here we’re hashing a <code class="language-plaintext highlighter-rouge">Class</code> object, and classes are indexed by identity like a default object:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="no">Class</span><span class="p">.</span><span class="nf">new</span><span class="p">.</span><span class="nf">method</span><span class="p">(</span><span class="ss">:hash</span><span class="p">).</span><span class="nf">owner</span>
<span class="o">=&gt;</span> <span class="no">Kernel</span>
<span class="o">&gt;&gt;</span> <span class="no">Object</span><span class="p">.</span><span class="nf">new</span><span class="p">.</span><span class="nf">method</span><span class="p">(</span><span class="ss">:hash</span><span class="p">).</span><span class="nf">owner</span>
<span class="o">=&gt;</span> <span class="no">Kernel</span>
</code></pre></div></div>

<p>Hence the above code currently requires locking the entire virtual machine, just to produce a hashcode.</p>
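
<p>If you want to convince yourself that hashing such an array really does hash the class, here is a small sketch. <code class="language-plaintext highlighter-rouge">Node</code> and its call counter are hypothetical, made up for this example, and it assumes that <code class="language-plaintext highlighter-rouge">Array#hash</code> dispatches to each element’s <code class="language-plaintext highlighter-rouge">#hash</code> method when it has been overridden:</p>

```ruby
class Node
  @hash_calls = 0

  # Count how many times the class object itself gets hashed,
  # then defer to the default identity-based implementation.
  def self.hash
    @hash_calls += 1
    super
  end

  def self.hash_calls
    @hash_calls
  end

  def initialize(relation)
    @relation = relation
  end

  # Same pattern as the Arel node above, trimmed down.
  def hash
    [self.class, @relation].hash
  end
end

Node.new("users").hash
puts Node.hash_calls
```

<p>The counter is no longer zero after a single call to <code class="language-plaintext highlighter-rouge">Node#hash</code>, so the class did get hashed along the way.</p>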

<h2 id="deoptimization">Deoptimization</h2>

<p>So what could we do to remove or reduce the need to synchronize the entire virtual machine when accessing object IDs?</p>

<p>Well first, given that <code class="language-plaintext highlighter-rouge">ObjectSpace._id2ref</code> is very rarely used, and will likely be marked as deprecated soon,
we can start by optimistically not creating nor updating the <code class="language-plaintext highlighter-rouge">id -&gt; object</code> table until someone needs it, which hopefully
won’t be the case in the vast majority of programs:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">Kernel</span>
  <span class="k">def</span> <span class="nf">object_id</span>
    <span class="no">RubyVM</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
      <span class="k">unless</span> <span class="nb">id</span> <span class="o">=</span> <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">OBJ_TO_ID_TABLE</span><span class="p">[</span><span class="nb">self</span><span class="p">]</span>
        <span class="nb">id</span> <span class="o">=</span> <span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">next_obj_id</span>
        <span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">next_obj_id</span> <span class="o">+=</span> <span class="mi">8</span>
        <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">OBJ_TO_ID_TABLE</span><span class="p">[</span><span class="nb">self</span><span class="p">]</span> <span class="o">=</span> <span class="nb">id</span>
        <span class="k">if</span> <span class="k">defined?</span><span class="p">(</span><span class="no">ObjectSpace</span><span class="o">::</span><span class="no">ID_TO_OBJ_TABLE</span><span class="p">)</span>
          <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">ID_TO_OBJ_TABLE</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span> <span class="o">=</span> <span class="nb">self</span>
        <span class="k">end</span>
      <span class="k">end</span>
      <span class="nb">id</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">module</span> <span class="nn">ObjectSpace</span>
  <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">_id2ref</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
    <span class="no">RubyVM</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
      <span class="k">unless</span> <span class="k">defined?</span><span class="p">(</span><span class="no">ObjectSpace</span><span class="o">::</span><span class="no">ID_TO_OBJ_TABLE</span><span class="p">)</span>
        <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">ID_TO_OBJ_TABLE</span> <span class="o">=</span> <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">OBJ_TO_ID_TABLE</span><span class="p">.</span><span class="nf">invert</span>
      <span class="k">end</span>
      <span class="no">ObjectSpace</span><span class="o">::</span><span class="no">ID_TO_OBJ_TABLE</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This doesn’t remove the lock yet, but assuming your program never calls <code class="language-plaintext highlighter-rouge">ObjectSpace._id2ref</code> it removes some work
from inside the lock, hence it shouldn’t be held as long.
And even if you don’t use Ractors, it should slightly reduce memory usage as well as remove work for the GC,
as demonstrated by a micro-benchmark:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>benchmark:
  baseline: "Object.new"
  object_id: "Object.new.object_id"
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>compare-ruby: ruby 3.5.0dev (2025-04-10T09:44:40Z master 684cfa42d7) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-04-10T10:13:43Z lazy-id-to-obj d3aa9626cc) +YJIT +PRISM [arm64-darwin24]
warming up..

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|baseline   |     26.364M|   25.974M|
|           |       1.01x|         -|
|object_id  |     10.293M|   14.202M|
|           |           -|     1.38x|
</code></pre></div></div>

<p>As always, the most efficient way to speed up some code is to not call it when you can avoid it.</p>

<p>If you’re curious to see the actual implementation, <a href="https://github.com/ruby/ruby/pull/13115">you can have a look at the pull request</a>.</p>
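
<p>The pseudo-code above can’t run as is, since <code class="language-plaintext highlighter-rouge">RubyVM.synchronize</code> and these tables don’t exist in Ruby land. Here is a runnable approximation of the same idea, assuming a plain <code class="language-plaintext highlighter-rouge">Mutex</code> as a stand-in for the VM lock and a hypothetical <code class="language-plaintext highlighter-rouge">IdRegistry</code> class. Unlike the real tables, it holds strong references to the objects, so it’s only meant to show the lazy construction of the reverse table:</p>

```ruby
class IdRegistry
  def initialize
    @lock = Mutex.new
    @next_id = 8
    @obj_to_id = {}.compare_by_identity
    @id_to_obj = nil # Only built if someone asks for a reverse lookup.
  end

  def id_for(obj)
    @lock.synchronize do
      id = @obj_to_id[obj]
      unless id
        id = @next_id
        @next_id += 8
        @obj_to_id[obj] = id
        # Only maintain the reverse table once it exists.
        @id_to_obj[id] = obj if @id_to_obj
      end
      id
    end
  end

  def ref_for(id)
    @lock.synchronize do
      @id_to_obj ||= @obj_to_id.invert
      @id_to_obj[id]
    end
  end

  def reverse_table_built?
    !@id_to_obj.nil?
  end
end

registry = IdRegistry.new
first = Object.new
first_id = registry.id_for(first)
puts registry.reverse_table_built?
puts registry.ref_for(first_id).equal?(first)
puts registry.reverse_table_built?
```

<p>As long as nobody calls <code class="language-plaintext highlighter-rouge">ref_for</code>, only one table is ever written to.</p>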

<h2 id="inline-storage">Inline Storage</h2>

<p>But while saving a bit of memory and CPU is nice, we’re still not significantly reducing contention, so what else could we do?</p>

<p>The crux of the issue here is that the <code class="language-plaintext highlighter-rouge">object_id</code> is stored in a centralized hash table, and as long as that’s the case,
synchronization will be required, short of implementing a lock-free hash table, but this is quite tricky to do.
Much trickier than the hash-set John used for the <code class="language-plaintext highlighter-rouge">fstring_table</code>.</p>

<p>But more importantly, a centralized data structure to store all the IDs of all objects isn’t great for locality anyway.
More so, needing to do a hash lookup to access an object’s property is quite costly, when conceptually it should be stored directly
inside the object.</p>

<p>If you think about it, <code class="language-plaintext highlighter-rouge">object_id</code> isn’t very different from an instance variable:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">Kernel</span>
  <span class="k">def</span> <span class="nf">object_id</span>
    <span class="vi">@__object_id</span> <span class="o">||=</span> <span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">generate_next_obj_id</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>You’d need the id generation to be thread-safe, which is easily done using an atomic increment operation, but other than that,
assuming the object isn’t one of the special objects that is accessible from multiple ractors, you can mutate it to store the
<code class="language-plaintext highlighter-rouge">object_id</code> without having to lock the entire VM.</p>

<p>However, as is tradition, nothing is ever that simple.</p>

<h2 id="final-shapes">Final Shapes</h2>

<p>Since Ruby 3.2, objects use shapes to define how their instance variables are stored.</p>

<p>Here again, let’s use some pseudo-Ruby code to illustrate the basics of how they work.</p>

<p>To start, shapes are a tree-like structure. Every shape has a parent (except the root one)
and 0-N children:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Shape</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">parent</span><span class="p">,</span> <span class="n">type</span><span class="p">,</span> <span class="n">edge_name</span><span class="p">,</span> <span class="n">next_ivar_index</span><span class="p">)</span>
    <span class="vi">@parent</span> <span class="o">=</span> <span class="n">parent</span>
    <span class="vi">@type</span> <span class="o">=</span> <span class="n">type</span>
    <span class="vi">@edge_name</span> <span class="o">=</span> <span class="n">edge_name</span>
    <span class="vi">@next_ivar_index</span> <span class="o">=</span> <span class="n">next_ivar_index</span>
    <span class="vi">@edges</span> <span class="o">=</span> <span class="p">{}</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">add_ivar</span><span class="p">(</span><span class="n">ivar_name</span><span class="p">)</span>
    <span class="vi">@edges</span><span class="p">[</span><span class="n">ivar_name</span><span class="p">]</span> <span class="o">||=</span> <span class="no">Shape</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="nb">self</span><span class="p">,</span> <span class="ss">:ivar</span><span class="p">,</span> <span class="n">ivar_name</span><span class="p">,</span> <span class="n">next_ivar_index</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>With this, when the Ruby VM has to execute code such as:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">User</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="nb">name</span><span class="p">,</span> <span class="n">role</span><span class="p">)</span>
    <span class="vi">@name</span> <span class="o">=</span> <span class="nb">name</span>
    <span class="vi">@role</span> <span class="o">=</span> <span class="n">role</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>It can compute the object shape on the fly such as:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Allocate the object</span>
<span class="n">object</span> <span class="o">=</span> <span class="n">new_object</span>
<span class="n">object</span><span class="p">.</span><span class="nf">shape</span> <span class="o">=</span> <span class="no">ROOT_SHAPE</span>

<span class="c1"># add @name</span>
<span class="n">next_shape</span> <span class="o">=</span> <span class="n">object</span><span class="p">.</span><span class="nf">shape</span><span class="p">.</span><span class="nf">add_ivar</span><span class="p">(</span><span class="ss">:@name</span><span class="p">)</span>
<span class="n">object</span><span class="p">.</span><span class="nf">shape</span> <span class="o">=</span> <span class="n">next_shape</span>
<span class="n">object</span><span class="p">.</span><span class="nf">ivars</span><span class="p">[</span><span class="n">next_shape</span><span class="p">.</span><span class="nf">next_ivar_index</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="nb">name</span>

<span class="c1"># add @role</span>
<span class="n">next_shape</span> <span class="o">=</span> <span class="n">object</span><span class="p">.</span><span class="nf">shape</span><span class="p">.</span><span class="nf">add_ivar</span><span class="p">(</span><span class="ss">:@role</span><span class="p">)</span>
<span class="n">object</span><span class="p">.</span><span class="nf">shape</span> <span class="o">=</span> <span class="n">next_shape</span>
<span class="n">object</span><span class="p">.</span><span class="nf">ivars</span><span class="p">[</span><span class="n">next_shape</span><span class="p">.</span><span class="nf">next_ivar_index</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">role</span>
</code></pre></div></div>

<p>This method may seem surprising, but it’s actually very efficient for various reasons I won’t get into here,
because I wrote <a href="https://railsatscale.com/2023-10-24-memoization-pattern-and-object-shapes/">another post about it a bit over a year ago</a>.
Go read it if you are curious to know more.</p>
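
<p>To make the shape tree a bit more tangible, here is a runnable version of the pseudo-code above. <code class="language-plaintext highlighter-rouge">SketchObject</code> and <code class="language-plaintext highlighter-rouge">define_ivar</code> are hypothetical helpers invented for this sketch, and it only handles defining new variables, not re-assigning existing ones:</p>

```ruby
class Shape
  attr_reader :parent, :type, :edge_name, :next_ivar_index

  def initialize(parent, type, edge_name, next_ivar_index)
    @parent = parent
    @type = type
    @edge_name = edge_name
    @next_ivar_index = next_ivar_index
    @edges = {}
  end

  # Find or create the child shape reached by defining ivar_name.
  def add_ivar(ivar_name)
    @edges[ivar_name] ||= Shape.new(self, :ivar, ivar_name, @next_ivar_index + 1)
  end
end

ROOT_SHAPE = Shape.new(nil, :root, nil, 0)

# Stand-in for a Ruby object: a shape plus a flat array of values.
class SketchObject
  attr_reader :shape, :ivars

  def initialize
    @shape = ROOT_SHAPE
    @ivars = []
  end

  def define_ivar(name, value)
    @shape = @shape.add_ivar(name)
    @ivars[@shape.next_ivar_index - 1] = value
  end
end

alice = SketchObject.new
alice.define_ivar(:@name, "Alice")
alice.define_ivar(:@role, "admin")

bob = SketchObject.new
bob.define_ivar(:@name, "Bob")
bob.define_ivar(:@role, "guest")

# Same variables, but defined in the opposite order.
carol = SketchObject.new
carol.define_ivar(:@role, "guest")
carol.define_ivar(:@name, "Carol")

puts alice.shape.equal?(bob.shape)
puts alice.shape.equal?(carol.shape)
```

<p>Objects that define the same variables in the same order end up pointing at the very same shape, while a different order leads to a different branch of the tree.</p>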

<p>But how instance variables are laid out isn’t the only thing that shapes record. They also keep track of how large an object
is, hence how many instance variables it can store, as well as whether it has been frozen.</p>

<p>Still in pseudo-Ruby code, it looks like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Shape</span>
  <span class="k">def</span> <span class="nf">add_ivar</span><span class="p">(</span><span class="n">ivar_name</span><span class="p">)</span>
    <span class="k">if</span> <span class="vi">@type</span> <span class="o">==</span> <span class="ss">:frozen</span>
      <span class="k">raise</span> <span class="s2">"Can't modify frozen object"</span>
    <span class="k">end</span>
    <span class="vi">@edges</span><span class="p">[</span><span class="n">ivar_name</span><span class="p">]</span> <span class="o">||=</span> <span class="no">Shape</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="nb">self</span><span class="p">,</span> <span class="ss">:ivar</span><span class="p">,</span> <span class="n">ivar_name</span><span class="p">,</span> <span class="n">next_ivar_index</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">freeze</span>
    <span class="vi">@edges</span><span class="p">[</span><span class="ss">:__frozen</span><span class="p">]</span> <span class="o">||=</span> <span class="no">Shape</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="nb">self</span><span class="p">,</span> <span class="ss">:frozen</span><span class="p">,</span> <span class="kp">nil</span><span class="p">,</span> <span class="n">next_ivar_index</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>So <code class="language-plaintext highlighter-rouge">frozen</code> shapes are final. It is expected that a shape of type <code class="language-plaintext highlighter-rouge">frozen</code> won’t ever have any children.</p>

<p>But in the case of <code class="language-plaintext highlighter-rouge">object_id</code>, we want to be able to store the id on any object, regardless of whether they are frozen
or not. So the first step is to modify shapes to allow that, <a href="https://github.com/Shopify/ruby/commit/ca92bbe4f646658f9a420e61089cf5d6e27a5a71">which I did in a relatively simple commit</a>.</p>
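
<p>You can observe the same tension from plain Ruby. In this sketch, <code class="language-plaintext highlighter-rouge">Widget#lazy_id</code> is a hypothetical method that memoizes a value in an instance variable, similar in spirit to the <code class="language-plaintext highlighter-rouge">@__object_id</code> idea from earlier:</p>

```ruby
class Widget
  # Memoize a made-up id in an instance variable.
  def lazy_id
    @lazy_id ||= 42
  end
end

frozen = Widget.new.freeze

# The real object_id works fine on a frozen object...
real_id = frozen.object_id

# ...but memoizing in an instance variable doesn't.
error = begin
  frozen.lazy_id
  nil
rescue FrozenError => e
  e
end

puts real_id.class
puts error.class
```

<p>A frozen object rejects new instance variables, yet it still has to be able to hand out an <code class="language-plaintext highlighter-rouge">object_id</code> on demand.</p>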

<p>But here too there was a bit of a complication. In a few cases, for instance when calling <code class="language-plaintext highlighter-rouge">Object#dup</code>, Ruby needs to find
the unfrozen version of a shape. Previously, since frozen shapes couldn’t possibly have children, it was quite simple:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Object</span>
  <span class="k">def</span> <span class="nf">dup</span>
    <span class="n">new_object</span> <span class="o">=</span> <span class="nb">self</span><span class="p">.</span><span class="nf">class</span><span class="p">.</span><span class="nf">allocate</span>
    <span class="k">if</span> <span class="nb">self</span><span class="p">.</span><span class="nf">shape</span><span class="p">.</span><span class="nf">type</span> <span class="o">==</span> <span class="ss">:frozen</span>
      <span class="n">new_object</span><span class="p">.</span><span class="nf">shape</span> <span class="o">=</span> <span class="nb">self</span><span class="p">.</span><span class="nf">shape</span><span class="p">.</span><span class="nf">parent</span>
    <span class="k">else</span>
      <span class="n">new_object</span><span class="p">.</span><span class="nf">shape</span> <span class="o">=</span> <span class="nb">self</span><span class="p">.</span><span class="nf">shape</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Once you allow frozen shapes to have children, this operation becomes more involved, as you now need to go up the tree
to find the last non-frozen shape, then reapply all the child shapes you wish to carry over.</p>
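
<p>Here is roughly what that walk could look like, as a runnable sketch rather than the actual implementation. The <code class="language-plaintext highlighter-rouge">child</code> and <code class="language-plaintext highlighter-rouge">unfrozen_shape</code> methods are names I made up for the example, and to keep it short it rebuilds the path from the root, keeping only the instance variable transitions:</p>

```ruby
class Shape
  attr_reader :parent, :type, :edge_name, :next_ivar_index

  def initialize(parent, type, edge_name, next_ivar_index)
    @parent = parent
    @type = type
    @edge_name = edge_name
    @next_ivar_index = next_ivar_index
    @edges = {}
  end

  # Find or create a child. Only :frozen transitions reserve no slot.
  def child(type, edge_name = nil)
    slots = type == :frozen ? 0 : 1
    @edges[[type, edge_name]] ||= Shape.new(self, type, edge_name, @next_ivar_index + slots)
  end

  # Walk up to the root, remember the ivar transitions on the way,
  # then replay only those from the root.
  def unfrozen_shape
    ivar_names = []
    shape = self
    while shape.parent
      ivar_names.unshift(shape.edge_name) if shape.type == :ivar
      shape = shape.parent
    end
    ivar_names.reduce(shape) { |current, name| current.child(:ivar, name) }
  end
end

root = Shape.new(nil, :root, nil, 0)
with_name = root.child(:ivar, :@name)
frozen_with_id = with_name.child(:frozen).child(:obj_id)

puts frozen_with_id.unfrozen_shape.equal?(with_name)
```

<p>Since children are memoized in <code class="language-plaintext highlighter-rouge">@edges</code>, replaying the transitions lands on the already existing shape instead of creating a new one.</p>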

<p>After this small refactoring was done, I could introduce a new type of shape: <code class="language-plaintext highlighter-rouge">SHAPE_OBJ_ID</code>, which behaves very similarly
to instance variable shapes:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Shape</span>
  <span class="k">def</span> <span class="nf">object_id</span>
    <span class="c1"># First check if there is an OBJ_ID shape in ancestors</span>
    <span class="n">shape</span> <span class="o">=</span> <span class="nb">self</span>
    <span class="k">while</span> <span class="n">shape</span><span class="p">.</span><span class="nf">parent</span>
      <span class="k">return</span> <span class="n">shape</span> <span class="k">if</span> <span class="n">shape</span><span class="p">.</span><span class="nf">type</span> <span class="o">==</span> <span class="ss">:obj_id</span>
      <span class="n">shape</span> <span class="o">=</span> <span class="n">shape</span><span class="p">.</span><span class="nf">parent</span>
    <span class="k">end</span>

    <span class="c1"># Otherwise create one.</span>
    <span class="vi">@edges</span><span class="p">[</span><span class="ss">:__object_id</span><span class="p">]</span> <span class="o">||=</span> <span class="no">Shape</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="nb">self</span><span class="p">,</span> <span class="ss">:obj_id</span><span class="p">,</span> <span class="kp">nil</span><span class="p">,</span> <span class="n">next_ivar_index</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>And just like this, we’re now able to reserve some inline space inside any object to store the <code class="language-plaintext highlighter-rouge">object_id</code>,
and in <em>some cases</em> we’re able to access an object’s ID fully lock-free.</p>
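
<p>Putting the pieces together, here is a runnable sketch of an object storing its ID in its own slots. <code class="language-plaintext highlighter-rouge">SketchObject</code>, <code class="language-plaintext highlighter-rouge">sketch_object_id</code> and <code class="language-plaintext highlighter-rouge">mark_frozen</code> are hypothetical names, and a <code class="language-plaintext highlighter-rouge">Mutex</code> stands in for the atomic increment:</p>

```ruby
class Shape
  attr_reader :parent, :type, :next_ivar_index

  def initialize(parent, type, next_ivar_index)
    @parent = parent
    @type = type
    @next_ivar_index = next_ivar_index
    @edges = {}
  end

  # Look for an :obj_id shape in self and its ancestors.
  def find_obj_id_shape
    shape = self
    while shape
      return shape if shape.type == :obj_id
      shape = shape.parent
    end
    nil
  end

  def add_obj_id
    @edges[:__object_id] ||= Shape.new(self, :obj_id, @next_ivar_index + 1)
  end

  def add_frozen
    @edges[:__frozen] ||= Shape.new(self, :frozen, @next_ivar_index)
  end
end

class SketchObject
  ROOT_SHAPE = Shape.new(nil, :root, 0)
  @next_id = 0
  @id_lock = Mutex.new

  # Stand-in for an atomic increment.
  def self.generate_next_obj_id
    @id_lock.synchronize { @next_id += 8 }
  end

  def initialize
    @shape = ROOT_SHAPE
    @slots = []
  end

  def mark_frozen
    @shape = @shape.add_frozen
  end

  def sketch_object_id
    id_shape = @shape.find_obj_id_shape
    unless id_shape
      # First access: transition to an :obj_id shape and fill its slot.
      id_shape = @shape.add_obj_id
      @shape = id_shape
      @slots[id_shape.next_ivar_index - 1] = self.class.generate_next_obj_id
    end
    @slots[id_shape.next_ivar_index - 1]
  end
end

frozen = SketchObject.new
frozen.mark_frozen
puts frozen.sketch_object_id
puts frozen.sketch_object_id
```

<p>The second call only reads the shape’s ancestors and the object’s own slots, which is the part that no longer needs any global table.</p>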

<h2 id="lock-free-shapes">Lock Free Shapes</h2>

<p>I’m saying <em>in some cases</em> because there are still a number of limitations.</p>

<p>First, since shapes are mostly immutable, we can access an object’s shape, and all its ancestors without taking a lock.
However, finding or creating a shape’s child currently still requires synchronizing the VM.
So even if my patch were to be applied, Ruby would still lock when accessing an object’s ID for the very first time,
it would only be lock-free on subsequent accesses.</p>

<p>Being able to find or create child shapes in a lock-free way would be useful way beyond the <code class="language-plaintext highlighter-rouge">object_id</code> use case, so
hopefully we’ll get to it in the future. I haven’t yet dedicated much thought to it, but I’m hopeful we can find
a solution. But even if we can’t do it lock-free, I think we could at least use a dedicated lock for it, so we wouldn’t
contend with all the other code paths that synchronize the entire VM, only paths that do the same operation.</p>

<p>Then, if the object is potentially shared between ractors, we also still need to acquire the lock before storing the ID,
as otherwise, concurrent writes may cause a race condition. Given we need to both update the object’s shape and write
the <code class="language-plaintext highlighter-rouge">object_id</code> inside the object, we can’t do it all in an atomic manner.</p>

<p>Finally, not all objects store their instance variables in the same way.</p>

<h2 id="generic-instance-variables">Generic Instance Variables</h2>

<p>As a Rubyist, you likely know that in Ruby everything is an object, but that doesn’t mean all objects are equal.</p>

<p>In the context of instance variables, there are essentially three types of objects: <code class="language-plaintext highlighter-rouge">T_OBJECT</code>, <code class="language-plaintext highlighter-rouge">T_CLASS/T_MODULE</code> and
then all the rest.</p>

<p><code class="language-plaintext highlighter-rouge">T_OBJECT</code> are your classic objects that inherit from the <code class="language-plaintext highlighter-rouge">BasicObject</code> class. Their instance variables are stored
inline directly inside the object slot, as long as it’s large enough. If it ends up overflowing, then a separate memory
location is allocated, and instance variables are moved there. The object slot then only contains a pointer to that auxiliary memory.</p>
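
<p>You can get a rough feel for this with <code class="language-plaintext highlighter-rouge">ObjectSpace.memsize_of</code>. This is only a sketch, and the exact numbers depend on the Ruby version and platform, so I’m not listing any here:</p>

```ruby
require "objspace"

small = Object.new

# Define enough instance variables that they can't all fit
# in a regular object slot.
big = Object.new
100.times { |i| big.instance_variable_set(:"@ivar_#{i}", i) }

small_size = ObjectSpace.memsize_of(small)
big_size = ObjectSpace.memsize_of(big)

puts "small=#{small_size} big=#{big_size}"
```

<p>The reported size of the second object includes the memory used to hold its instance variables, so it comes out larger than the empty one.</p>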

<p><code class="language-plaintext highlighter-rouge">T_CLASS</code> and <code class="language-plaintext highlighter-rouge">T_MODULE</code>, as their names suggest, are all instances of the <code class="language-plaintext highlighter-rouge">Class</code> and <code class="language-plaintext highlighter-rouge">Module</code> classes. These are much
larger than regular objects, as they need to keep track of a lot of things, such as their method table, a pointer to the
parent class, etc:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">memsize_of</span><span class="p">(</span><span class="no">Object</span><span class="p">.</span><span class="nf">new</span><span class="p">)</span>
<span class="o">=&gt;</span> <span class="mi">40</span>
<span class="o">&gt;&gt;</span> <span class="no">ObjectSpace</span><span class="p">.</span><span class="nf">memsize_of</span><span class="p">(</span><span class="no">Class</span><span class="p">.</span><span class="nf">new</span><span class="p">)</span>
<span class="o">=&gt;</span> <span class="mi">192</span>
</code></pre></div></div>

<p>As such, they never store their instance variables inline, they always store them in auxiliary memory, and they have
dedicated space in their object slot to store the auxiliary memory pointer:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp"># internal/class.h
</span><span class="k">struct</span> <span class="n">rb_classext_struct</span> <span class="p">{</span>
    <span class="n">VALUE</span> <span class="o">*</span><span class="n">iv_ptr</span><span class="p">;</span> <span class="c1">// iv = instance variable</span>
    <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And finally, there are all the other objects, such as <code class="language-plaintext highlighter-rouge">T_STRING</code>, <code class="language-plaintext highlighter-rouge">T_ARRAY</code>, <code class="language-plaintext highlighter-rouge">T_HASH</code>, <code class="language-plaintext highlighter-rouge">T_REGEXP</code>, etc.
None of these have free space in their slot to store inline variables, and not even space to store the auxiliary memory
pointer.</p>

<p>So what does Ruby do when you do add an instance variable to such objects? Well, it stores it in a Hash-table of course!</p>

<p>In pseudo-Ruby, it would look like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">GenericIvarObject</span>
  <span class="k">class</span> <span class="nc">GenericStorage</span>
    <span class="nb">attr_accessor</span> <span class="ss">:shape</span>
    <span class="nb">attr_reader</span> <span class="ss">:ivars</span>

    <span class="k">def</span> <span class="nf">initialize</span>
      <span class="vi">@ivars</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">end</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">instance_variable_get</span><span class="p">(</span><span class="n">ivar_name</span><span class="p">)</span>
    <span class="n">store</span> <span class="o">=</span> <span class="no">RubyVM</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
      <span class="no">GENERIC_STORAGE</span><span class="p">[</span><span class="nb">self</span><span class="p">]</span> <span class="o">||=</span> <span class="no">GenericStorage</span><span class="p">.</span><span class="nf">new</span>
    <span class="k">end</span>

    <span class="k">if</span> <span class="n">ivar_shape</span> <span class="o">=</span> <span class="n">store</span><span class="p">.</span><span class="nf">shape</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="n">ivar_name</span><span class="p">)</span>
      <span class="n">store</span><span class="p">.</span><span class="nf">ivars</span><span class="p">[</span><span class="n">ivar_shape</span><span class="p">.</span><span class="nf">next_ivar_index</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>As you probably have noticed or even guessed, since this is yet another global hash table, any access needs to be synchronized,
which means that for objects other than <code class="language-plaintext highlighter-rouge">T_OBJECT</code>, <code class="language-plaintext highlighter-rouge">T_CLASS</code> and <code class="language-plaintext highlighter-rouge">T_MODULE</code>,
my patch replaces one global synchronized hash with another…</p>

<p>So perhaps for these, keeping the original <code class="language-plaintext highlighter-rouge">object -&gt; id</code> table would be preferable, that’s something I still need to figure out.</p>

<h3 id="conclusion">Conclusion</h3>

<p>My patch isn’t finished. I still have to figure out how to best deal with “generic” objects, and probably refine the
implementation some more, and perhaps it won’t even be merged at all in the end.</p>

<p>But I wanted to share it because explaining something helps me think about the problem,
and also because while I don’t think <code class="language-plaintext highlighter-rouge">object_id</code> is currently the biggest Ractor bottleneck,
it’s a good showcase of the type of work that needs to be done to make Ractors more parallel.</p>

<p>If you are curious about the patch, here’s <a href="https://github.com/ruby/ruby/compare/master...byroot:ruby:object_id-in-shape-snapshot">what it currently looks like as of this writing</a>.</p>

<p>Similar work will have to be done for other internal tables, such as the symbol table and the various method tables.</p>]]></content><author><name></name></author><category term="ruby" /><category term="performance" /><summary type="html"><![CDATA[In a previous post about ractors, I explained why I think it’s really unlikely you’d ever be able to run an entire application inside a ractor, but that they could still be situationally very useful to move CPU-bound work out of the main thread, and to unlock some parallel algorithm.]]></summary></entry><entry><title type="html">Database Protocols Are Underwhelming</title><link href="https://byroot.github.io/performance/2025/03/21/database-protocols.html" rel="alternate" type="text/html" title="Database Protocols Are Underwhelming" /><published>2025-03-21T08:03:51+00:00</published><updated>2025-03-21T08:03:51+00:00</updated><id>https://byroot.github.io/performance/2025/03/21/database-protocols</id><content type="html" xml:base="https://byroot.github.io/performance/2025/03/21/database-protocols.html"><![CDATA[<p>If you’ve been in this trade for a while, you have probably seen dozens of debates on the merits and problems of SQL
as a relational database query language.
As an ORM maintainer, I have a few gripes with SQL, but overall it is workable, and anyway, it has so much inertia
that there’s no point fantasizing about a replacement.</p>

<p>However one database-adjacent topic I don’t think I’ve ever seen any discussions about, and that I think could be improved,
is the protocols exposed by these databases to execute queries.
Relational databases are very impressive pieces of technology, but their client protocol makes me wonder if they ever
considered being used by anything other than a human typing commands in a CLI.</p>

<p>I also happen to maintain the Redis client for Ruby, and while the Redis protocol is far from perfect, I think
there are some things it does better than PostgreSQL and MySQL protocols, which are the two I am somewhat familiar with.</p>

<h2 id="mutable-state-lot-of-mutable-state">Mutable State, Lots Of Mutable State</h2>

<p>You’ve probably never seen them, because they’re not logged by default, but when Active Record connects to your database
it starts by executing several database-specific queries, which I generally call the “prelude”.</p>

<p>Which queries are sent exactly depends on how you configured Active Record, but for most people, it will be the default.</p>

<p>In the case of MySQL it will look like this:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SET</span>  <span class="o">@@</span><span class="k">SESSION</span><span class="p">.</span><span class="n">sql_mode</span> <span class="o">=</span> <span class="n">CONCAT</span><span class="p">(</span><span class="o">@@</span><span class="n">sql_mode</span><span class="p">,</span> <span class="s1">',STRICT_ALL_TABLES,NO_AUTO_VALUE_ON_ZERO'</span><span class="p">),</span>
     <span class="o">@@</span><span class="k">SESSION</span><span class="p">.</span><span class="n">wait_timeout</span> <span class="o">=</span> <span class="mi">2147483</span>
</code></pre></div></div>

<p>For PostgreSQL, there’s a bit more:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SET</span> <span class="n">client_min_messages</span> <span class="k">TO</span> <span class="s1">'warning'</span><span class="p">;</span>
<span class="k">SET</span> <span class="n">standard_conforming_strings</span> <span class="o">=</span> <span class="k">on</span><span class="p">;</span>
<span class="k">SET</span> <span class="n">intervalstyle</span> <span class="o">=</span> <span class="n">iso_8601</span><span class="p">;</span>
<span class="k">SET</span> <span class="k">SESSION</span> <span class="n">timezone</span> <span class="k">TO</span> <span class="s1">'UTC'</span>
</code></pre></div></div>

<p>In both cases the idea is the same, we’re configuring the connection, making it behave differently.
And there’s nothing wrong with the general idea of that, as a database gets older, new modes and features get introduced
so for backward compatibility reasons you have to opt-in to them.</p>

<p>My issue with this however is that you can set these at any point.
They’re not restricted to an initial authentication and configuration step, so when as a framework or library you hand
over a connection to user code and later get it back, you can’t know for sure they haven’t changed any of these settings.
Similarly, it means you have both configured and unconfigured connections and must be careful to never use an unconfigured one.
It’s not the end of the world, but it noticeably complicates the connection management code.</p>

<p>This statefulness also makes it hard if not impossible to recover from errors. If for some reason a query fails, it’s hard
to tell which state the connection is in, and the only reasonable thing to do is to close it and start from scratch with a new connection.</p>

<p>If these protocols had an explicit initial configuration phase, it would make it easier to have some sort of “reset state”
message you could send after an error (or after letting user code run unknown queries) to get the connection back to a known clean state.</p>

<p>From a Ruby client perspective, it would look like this:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">connection</span> <span class="o">=</span> <span class="no">MyDB</span><span class="p">.</span><span class="nf">new_connection</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">authenticate</span><span class="p">(</span><span class="n">user</span><span class="p">,</span> <span class="n">password</span><span class="p">)</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">configure</span><span class="p">(</span><span class="s2">"SET ..."</span><span class="p">)</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">query</span><span class="p">(</span><span class="s2">"INSERT INTO ..."</span><span class="p">)</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">reset</span>
</code></pre></div></div>

<p>You could even cheaply reset the state whenever a connection is checked back into a connection pool.</p>
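
<p>To make the idea a bit more concrete, here is a minimal sketch of a pool that resets connections on check-in. Everything in it is hypothetical: <code class="language-plaintext highlighter-rouge">ResettableConnection</code> and <code class="language-plaintext highlighter-rouge">Pool</code> are made-up names, the “session” is just a Hash, and the pool isn’t thread-safe. It only illustrates what a protocol-level “reset to the configured state” message would allow.</p>

```ruby
# Hypothetical sketch: ResettableConnection stands in for a client whose
# protocol supports the "reset to configured state" message described above.
class ResettableConnection
  attr_reader :settings

  def initialize(defaults)
    @defaults = defaults
    @settings = defaults.dup
  end

  # Simulates user code running something like `SET name = value`.
  def set(name, value)
    @settings[name] = value
  end

  # Simulates the hypothetical reset message.
  def reset
    @settings = @defaults.dup
  end
end

class Pool
  def initialize(size, defaults)
    @idle = Array.new(size) { ResettableConnection.new(defaults) }
  end

  # Hands a connection to user code, then resets it before reuse.
  def with
    connection = @idle.pop or raise "pool exhausted"
    yield connection
  ensure
    if connection
      connection.reset
      @idle.push(connection)
    end
  end
end

pool = Pool.new(1, { "timezone" => "UTC" })
pool.with { |c| c.set("timezone", "Europe/Paris") } # user code changes the session
pool.with { |c| p c.settings }                      # back to the configured state
```

<p>With such a reset, the pool never has to wonder what user code did to the session.</p>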

<p>I’m not particularly knowledgeable about all the constraints database servers face, but I can’t think of a reason why such
a protocol feature would be particularly tricky to implement.</p>

<h2 id="safe-retries">Safe Retries</h2>

<p>One of the most important jobs of a database client, or network clients in general, is to deal with network errors.</p>

<p>Under the hood, most if not all clients will look like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">query</span><span class="p">(</span><span class="n">command</span><span class="p">)</span>
  <span class="n">packet</span> <span class="o">=</span> <span class="n">serialize</span><span class="p">(</span><span class="n">command</span><span class="p">)</span>
  <span class="vi">@socket</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="n">packet</span><span class="p">)</span>
  <span class="n">response</span> <span class="o">=</span> <span class="vi">@socket</span><span class="p">.</span><span class="nf">read</span>
  <span class="n">deserialize</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>It’s fairly trivial, you send the query to the server and read the server response.
The difficulty however is that both the <code class="language-plaintext highlighter-rouge">write</code> and the <code class="language-plaintext highlighter-rouge">read</code> operations can fail in dozens of different ways.</p>

<p>Perhaps the server is temporarily unreachable and will work again in a second or two.
Or perhaps it’s reachable but was temporarily overloaded and didn’t answer fast enough so the client timeout was reached.</p>

<p>These errors should hopefully be rare, but can’t be fully avoided.
Whenever you are sending something through the network, there is a chance it might not work, it’s a fact of life.
Hence a client should try to gracefully handle such errors as much as possible, and there aren’t many ways to do so.</p>

<p>The most obvious way to handle such an error is to retry the query, the problem is that most of the time, from the point
of view of the database client, it isn’t clear whether it is safe to retry or not.</p>

<p>In my view, the best feature of <code class="language-plaintext highlighter-rouge">HTTP</code> by far is its explicit verb specification.
The HTTP spec clearly states that clients, and even proxies, are allowed to retry some specific verbs such as <code class="language-plaintext highlighter-rouge">GET</code> or <code class="language-plaintext highlighter-rouge">DELETE</code>
because they are <a href="https://en.wikipedia.org/wiki/Idempotence#Computer_science_meaning">idempotent</a>.</p>

<p>The reason this is important is that whenever the <code class="language-plaintext highlighter-rouge">write</code> or the <code class="language-plaintext highlighter-rouge">read</code> fails, in the overwhelming majority of cases,
you don’t know whether the query was executed on the server or not.
That is why idempotency is such a valuable property, by definition an idempotent operation can safely be executed twice,
hence when you are in doubt whether it was executed, you can retry.</p>

<p>But knowing whether a query is idempotent or not with SQL isn’t easy.
For instance, a simple <code class="language-plaintext highlighter-rouge">DELETE</code> query is idempotent:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">DELETE</span>
<span class="k">FROM</span> <span class="n">articles</span>
<span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">42</span><span class="p">;</span>
</code></pre></div></div>

<p>But one can perfectly write a <code class="language-plaintext highlighter-rouge">DELETE</code> query that isn’t:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">DELETE</span>
<span class="k">FROM</span> <span class="n">articles</span>
<span class="k">WHERE</span> <span class="n">id</span> <span class="k">IN</span> <span class="p">(</span>
  <span class="k">SELECT</span> <span class="n">id</span>
  <span class="k">FROM</span> <span class="n">articles</span>
  <span class="k">LIMIT</span> <span class="mi">10</span>
<span class="p">);</span>
</code></pre></div></div>

<p>So in practice, database clients can’t safely retry on errors, unless the caller instructs them that it is safe to do so.
You could attempt to write a client that parses the queries to figure out whether they are idempotent, but it is fraught with peril,
hence it’s generally preferable to rely on the caller to tell us.</p>
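
<p>As a rough illustration of “relying on the caller”, here is a small sketch of a retry helper that only retries when told the query is idempotent. <code class="language-plaintext highlighter-rouge">FakeConnection</code> and <code class="language-plaintext highlighter-rouge">with_retries</code> are hypothetical names invented for this example, and <code class="language-plaintext highlighter-rouge">IOError</code> stands in for whatever network errors a real client would see.</p>

```ruby
# Hypothetical sketch: FakeConnection and with_retries are illustrative names,
# not part of any real database client.
class FakeConnection
  attr_reader :attempts

  def initialize(failures)
    @failures = failures # how many calls fail before one succeeds
    @attempts = 0
  end

  def query(sql)
    @attempts += 1
    if @failures.positive?
      @failures -= 1
      raise IOError, "connection reset"
    end
    "OK: #{sql}"
  end
end

# Retry only when the caller vouches that the query is idempotent.
def with_retries(connection, sql, idempotent: false, attempts: 3)
  tries = 0
  begin
    tries += 1
    connection.query(sql)
  rescue IOError
    retry if idempotent && tries < attempts
    raise
  end
end

flaky = FakeConnection.new(1)
puts with_retries(flaky, "DELETE FROM articles WHERE id = 42", idempotent: true)
```

<p>A non-idempotent query simply gets the error re-raised on the first failure, because the helper has no way to know whether the server executed it.</p>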

<p>That’s one of the reasons why I’ve been slowly refactoring Active Record lately, to progressively make it easier to retry
more queries in case of network errors.
But even once I’m done with that refactoring, numerous non-idempotent queries will remain, and whenever they fail,
there is still nothing Active Record will be able to do about it.</p>

<h2 id="idempotency-keys">Idempotency Keys</h2>

<p>However, there are solutions to turn non-idempotent operations into idempotent ones, using what is sometimes called “Idempotency Keys”.
If you’ve used <a href="https://docs.stripe.com/api/idempotent_requests">the Stripe API</a>, perhaps you are already familiar with them.
I suspect they’re not the first ones to come up with such a solution, but that’s where I was first exposed to it.</p>

<p>Conceptually it’s rather simple, when performing a non-idempotent operation, say creating a new customer record, you can
add an <code class="language-plaintext highlighter-rouge">Idempotency-Key</code> <code class="language-plaintext highlighter-rouge">HTTP</code> header containing a randomly generated string.
If for some reason you need to retry that request, you do it with the same idempotency key, allowing the Stripe API to
check if the initial request succeeded or not, and either perform or discard the retry.</p>

<p>They even go a bit further, when a request with an idempotency key succeeds, they record the response so that in case of
a retry, they return you exactly the original response. Thanks to this feature, it is safe to retry all API calls to their
API, regardless of whether they are idempotent or not.</p>
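
<p>The server side of that idea can be sketched in a few lines. This is a toy in-memory model under my own assumptions, not Stripe’s implementation: <code class="language-plaintext highlighter-rouge">IdempotentServer</code> and <code class="language-plaintext highlighter-rouge">create_customer</code> are hypothetical names.</p>

```ruby
require "securerandom"

# Hypothetical in-memory server: IdempotentServer and create_customer are
# illustrative names, not Stripe's actual implementation.
class IdempotentServer
  attr_reader :customers

  def initialize
    @customers = []
    @responses = {} # idempotency key => recorded response
  end

  def create_customer(name, idempotency_key:)
    # A key we've already seen means this is a retry: replay the response.
    return @responses[idempotency_key] if @responses.key?(idempotency_key)

    @customers << name
    @responses[idempotency_key] = { id: @customers.size, name: name }
  end
end

server = IdempotentServer.new
key = SecureRandom.uuid
first = server.create_customer("Alice", idempotency_key: key)
retried = server.create_customer("Alice", idempotency_key: key) # e.g. after a timeout
p first == retried      # the original response is replayed
p server.customers.size # the customer was only created once
```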

<p>This is such a great feature that last year, at Rails World 2024, when I saw there was a ValKey booth, hosted by
<a href="https://fosstodon.org/@linux_mclinuxface">Kyle Davis</a>, I decided to go have a chat with him, to see if perhaps ValKey
was interested in tackling this fairly common problem.</p>

<p>Because everything I said about SQL and idempotency also applies to Redis (hence to ValKey).
It is also hard for a Redis client to know if a query can safely be retried, and for decades, long before I became the
maintainer, the Redis client would <a href="https://github.com/avgerin0s/redis-rb/blob/f17a33f05146d29256622e7736abe00870aed6ef/lib/redis.rb#L139-L152">retry all queries by default</a>.</p>

<p>At first, it would only do so in case of <code class="language-plaintext highlighter-rouge">ECONNRESET</code> errors, but over time more and more errors were added to the retry list.
I must admit I’m not the most knowledgeable person about <code class="language-plaintext highlighter-rouge">TCP</code>, so perhaps it is indeed safe to assume the server never
received the query when <code class="language-plaintext highlighter-rouge">ECONNRESET</code> is returned, but I highly
doubt all of the errors added since are safe to retry.</p>

<p>That’s why when I later wrote <code class="language-plaintext highlighter-rouge">redis-client</code>, a much simpler and lower-level client for Redis, I made sure it doesn’t retry by
default, and gave it a way to distinguish idempotent queries by having both a <code class="language-plaintext highlighter-rouge">call</code> and a <code class="language-plaintext highlighter-rouge">call_once</code> method.</p>

<p>But from the feedback I got when Mike Perham replaced the <code class="language-plaintext highlighter-rouge">redis</code> gem with <code class="language-plaintext highlighter-rouge">redis-client</code> in Sidekiq, lots of users
started noticing reports of errors they wouldn’t experience before, showing how unreliable remote data stores can be in practice,
especially in cloud environments.</p>

<p>So even though these retries were potentially unsafe, and may have occasionally caused data loss, they were desired by users.</p>

<p>That’s why I tried to pitch an idempotency key kind of feature to Kyle, and he encouraged me to open <a href="https://github.com/valkey-io/valkey/issues/1087">a feature request
in the ValKey repo</a>. After a few rounds of discussion, the ValKey core
team accepted the feature, and while as far as I know it hasn’t been implemented yet, the next version of ValKey will likely
have it.</p>

<p>It is again pretty simple conceptually:</p>

<pre><code class="language-SQL">MULTISTORE 699accd1-c7fa-4c40-bc85-5cfcd4d3d344 EX 10
INCR counter
LPOP queue
EXEC
</code></pre>

<p>Just like with Stripe’s API, you start a transaction with a randomly generated key, in this case, a UUID, as well as an expiry.</p>

<p>In the example above we ask ValKey to remember this transaction for the next 10 seconds, that’s for how long we can safely
retry, after that ValKey can discard the response.</p>
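
<p>The bookkeeping this implies on the server can be sketched as follows. This is only my mental model of the feature, with a hypothetical <code class="language-plaintext highlighter-rouge">ResponseStore</code> class and an injectable clock; it says nothing about how ValKey would actually implement it.</p>

```ruby
# Hypothetical sketch of the server-side bookkeeping; ResponseStore is an
# illustrative name and says nothing about how ValKey would implement it.
class ResponseStore
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @entries = {} # key => [response, expires_at]
  end

  # Runs the block once per key; within the expiry window a retry with the
  # same key gets the recorded response instead of re-running the block.
  def fetch(key, ex:)
    now = @clock.call
    @entries.delete_if { |_, (_, expires_at)| expires_at <= now }
    entry = @entries[key]
    return entry.first if entry

    response = yield
    @entries[key] = [response, now + ex]
    response
  end
end

counter = 0
store = ResponseStore.new
# The second call simulates a client retrying the same transaction.
2.times { store.fetch("699accd1-c7fa-4c40-bc85-5cfcd4d3d344", ex: 10) { counter += 1 } }
p counter
```

<p>Once the expiry has passed, the entry is discarded and the same key would run the transaction again, which is why the client must stop retrying after that window.</p>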

<p>Assuming the next version of ValKey ships with the feature, that should finally offer a solution to safely retry all possible queries.</p>

<p>I fully understand that relational databases are much bigger beasts than an in-memory key-value store, hence it likely is harder
to implement, but if I was ever asked what feature MySQL or PostgreSQL could add to make them nicer to work with, it certainly would be this one.</p>

<p>In the case of ValKey, given it’s a text protocol that meant introducing a new command, but MySQL and PostgreSQL both have
binary protocols, with distinct packet types, so I think it would be possible to introduce at the protocol level with
no change to their respective SQL syntax, and no backward compatibility concerns.</p>

<h2 id="prepared-statements">Prepared Statements</h2>

<p>Another essential part of database protocols that I think isn’t pleasant to work with is prepared statements.</p>

<p>Prepared statements mostly serve two functions, the most important one is to provide a query and its parameters separately,
so as to eliminate the risk of SQL injections.
In addition to that, it can in some cases help with performance, because it saves on having to parse the query every time,
as well as to send it down the wire. Some databases will also cache the associated query plan.</p>

<p>Here’s how you use prepared statements using the MySQL protocol:</p>

<ul>
  <li>First send a <code class="language-plaintext highlighter-rouge">COM_STMT_PREPARE</code> packet with the parameterized query (<code class="language-plaintext highlighter-rouge">SELECT * FROM users WHERE id = ?</code>).</li>
  <li>Read the returned <code class="language-plaintext highlighter-rouge">COM_STMT_PREPARE_OK</code> packet and extract the <code class="language-plaintext highlighter-rouge">statement_id</code>.</li>
  <li>Then send a <code class="language-plaintext highlighter-rouge">COM_STMT_EXECUTE</code> with the <code class="language-plaintext highlighter-rouge">statement_id</code> and the parameters.</li>
  <li>Read the <code class="language-plaintext highlighter-rouge">OK_Packet</code> response.</li>
  <li>Whenever you no longer need that prepared statement, send a <code class="language-plaintext highlighter-rouge">COM_STMT_CLOSE</code> packet with the <code class="language-plaintext highlighter-rouge">statement_id</code>.</li>
</ul>

<p>Now ideally, you execute the same statements relatively often, so you keep track of them, and in the happy path you
can perform a parameterized query in a single roundtrip by directly sending a <code class="language-plaintext highlighter-rouge">COM_STMT_EXECUTE</code> with the known <code class="language-plaintext highlighter-rouge">statement_id</code>.</p>

<p>But one major annoyance is that these <code class="language-plaintext highlighter-rouge">statement_id</code> are session-scoped, meaning they’re only valid with the connection
that was used to create them.
In a modern web application, you don’t just have one connection, but a pool of them, and that’s per process, so you need
to keep track of the same thing many times.</p>

<p>Worse, as explained previously, since closing and reopening the connection is often the only safe way to recover from errors,
whenever that happens, all prepared statements are lost.</p>

<p>These statements also have a cost on the server side. Each statement requires some amount of memory in the database server.
So you have to be careful not to create an unbounded amount of them, which for an ORM isn’t easy to enforce.</p>

<p>It’s not rare for applications to dynamically generate queries based on user input, typically some advanced search or filtering form.</p>

<p>In addition, Active Record allows you to provide SQL fragments, and it can’t know whether they are static strings or dynamically
generated ones. For example, it’s not good practice, but users can perfectly do something like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Article</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="s2">"published_at &gt; '</span><span class="si">#{</span><span class="no">Time</span><span class="p">.</span><span class="nf">now</span><span class="p">.</span><span class="nf">to_fs</span><span class="p">(</span><span class="ss">:db</span><span class="p">)</span><span class="si">}</span><span class="s2">'"</span><span class="p">)</span>
</code></pre></div></div>

<p>Also, if you have <a href="https://api.rubyonrails.org/classes/ActiveRecord/QueryLogs.html">Active Record query logs</a>, then most
queries will be unique.</p>

<p>All this means that a library like Active Record has to have lots of logic to keep track of prepared statements and their
lifetime. You might even need some form of Least Recently Used logic to prune unused statements and free resources on the server.</p>
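
<p>To give an idea of what that bookkeeping looks like, here is a minimal sketch of a per-connection LRU cache of prepared statements. It is not Active Record’s actual code: <code class="language-plaintext highlighter-rouge">StatementCache</code> is a hypothetical name, and the <code class="language-plaintext highlighter-rouge">prepare</code> and <code class="language-plaintext highlighter-rouge">close</code> callbacks stand in for sending <code class="language-plaintext highlighter-rouge">COM_STMT_PREPARE</code> and <code class="language-plaintext highlighter-rouge">COM_STMT_CLOSE</code>.</p>

```ruby
# Hypothetical sketch: StatementCache and the prepare/close callbacks are
# illustrative, not Active Record's actual implementation.
class StatementCache
  def initialize(max_size, prepare:, close:)
    @max_size = max_size
    @prepare = prepare # called with the SQL, returns a statement id
    @close = close     # called with the id of an evicted statement
    @statements = {}   # SQL => statement id, least recently used first
  end

  def statement_id_for(sql)
    if @statements.key?(sql)
      # Remove the entry so re-inserting it below marks it as most recently used.
      id = @statements.delete(sql)
    else
      id = @prepare.call(sql)
      evict_oldest if @statements.size >= @max_size
    end
    @statements[sql] = id
  end

  private

  def evict_oldest
    _sql, id = @statements.shift # Hashes preserve insertion order
    @close.call(id)
  end
end

next_id = 0
closed = []
cache = StatementCache.new(2, prepare: ->(_sql) { next_id += 1 }, close: ->(id) { closed << id })
cache.statement_id_for("SELECT * FROM users WHERE id = ?")
cache.statement_id_for("SELECT * FROM posts WHERE id = ?")
cache.statement_id_for("SELECT * FROM users WHERE id = ?") # cache hit, refreshed
cache.statement_id_for("SELECT * FROM tags WHERE id = ?")  # evicts the posts statement
p closed
```

<p>And since statement ids are session-scoped, you need one such cache per connection, and you have to throw it away whenever the connection is re-established.</p>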

<p>In many cases, when you have no reason to believe a particular query will be executed again soon, it is actually advantageous
not to use prepared statements.
Ideally, you’d still use a parameterized query, but then it means doing 2-3 roundtrips<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> to the database instead of just one.</p>

<p>So for MySQL at least, when you use Active Record with a SQL fragment provided as a string, Active Record falls back to
not using prepared statements, and instead interpolates the parameters inside the query.</p>

<p>Ideally, we’d still use a parameterized query, just not a prepared one, but the MySQL protocol doesn’t offer such functionality.
If you want to use parameterized queries, you have to use prepared statements and in many cases, that will mean an extra roundtrip.</p>

<p>I’m much less familiar with the PostgreSQL protocol, but from glancing at its specification I believe it works largely in the same way.</p>

<p>So how could it be improved?</p>

<p>First I think it should be possible to perform parameterized queries without a prepared statement, I can’t think of a reason
why this isn’t a possibility yet.</p>

<p>Then I think that here again, some inspiration could be taken from Redis.</p>

<h2 id="evalsha">EVALSHA</h2>

<p>Redis doesn’t have prepared statements, that wouldn’t make much sense, but it does have something rather similar in
the form of <a href="https://valkey.io/topics/eval-intro/">Lua scripts</a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; EVAL "return ARGV[1] .. ARGV[2]" 0 "hello" "world!"
"helloworld!"
</code></pre></div></div>

<p>But just like SQL queries, Lua code needs to be parsed and can be relatively large, so caching that operation is preferable for
performance.
But rather than a <code class="language-plaintext highlighter-rouge">PREPARE</code> command that returns you a connection-specific identifier for your given script, Redis
instead uses SHA1 digests.</p>

<p>You can first load a script with the <code class="language-plaintext highlighter-rouge">SCRIPT LOAD</code> command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; SCRIPT LOAD "return ARGV[1] .. ARGV[2]"
"702b19e4aa19aaa9858b9343630276d13af5822e"
</code></pre></div></div>

<p>Then you can execute the script as many times as desired by only referring to its digest:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; EVALSHA "702b19e4aa19aaa9858b9343630276d13af5822e" 0 "hello" "world!"
"helloworld!"
</code></pre></div></div>

<p>And that script registry is global, so even if you have 5000 connections, they can all share the same script, and you can
even assume scripts have been loaded already, and load them on a retry if they weren’t:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"redis-client"</span>
<span class="nb">require</span> <span class="s2">"digest/sha1"</span>

<span class="k">class</span> <span class="nc">RedisScript</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">src</span><span class="p">)</span>
    <span class="vi">@src</span> <span class="o">=</span> <span class="n">src</span>
    <span class="vi">@digest</span> <span class="o">=</span> <span class="no">Digest</span><span class="o">::</span><span class="no">SHA1</span><span class="p">.</span><span class="nf">hexdigest</span><span class="p">(</span><span class="n">src</span><span class="p">)</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">execute</span><span class="p">(</span><span class="n">connection</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span>
    <span class="n">connection</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="s2">"EVALSHA"</span><span class="p">,</span> <span class="vi">@digest</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span>
  <span class="k">rescue</span> <span class="no">RedisClient</span><span class="o">::</span><span class="no">CommandError</span>
    <span class="n">connection</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="s2">"SCRIPT"</span><span class="p">,</span> <span class="s2">"LOAD"</span><span class="p">,</span> <span class="vi">@src</span><span class="p">)</span>
    <span class="n">connection</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="s2">"EVALSHA"</span><span class="p">,</span> <span class="vi">@digest</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="no">CONCAT_SCRIPT</span> <span class="o">=</span> <span class="no">RedisScript</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="o">&lt;&lt;~</span><span class="no">LUA</span><span class="p">)</span><span class="sh">
  return ARGV[1] .. " " .. ARGV[2]
</span><span class="no">LUA</span>

<span class="n">redis</span> <span class="o">=</span> <span class="no">RedisClient</span><span class="p">.</span><span class="nf">new</span>
<span class="nb">p</span> <span class="no">CONCAT_SCRIPT</span><span class="p">.</span><span class="nf">execute</span><span class="p">(</span><span class="n">redis</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">"Hello"</span><span class="p">,</span> <span class="s2">"World!"</span><span class="p">)</span>
</code></pre></div></div>

<p>I’m not a database engineer, so perhaps there’s some big constraint I’m missing, but I think it would make a lot of sense
for prepared statement identifiers to be some sort of predictable digests, so that they are much more easily shared
across connections, and let the server deal with garbage-collecting prepared statements that haven’t been seen in a long
time, or use some sort of reference counting strategy.</p>
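
<p>Here is a rough sketch of what I have in mind, modeled on the <code class="language-plaintext highlighter-rouge">EVALSHA</code> pattern above. To be clear, no MySQL or PostgreSQL server works this way: <code class="language-plaintext highlighter-rouge">StatementRegistry</code> and <code class="language-plaintext highlighter-rouge">execute_prepared</code> are hypothetical names, and the “server” just echoes the query back instead of running it.</p>

```ruby
require "digest/sha1"

# Hypothetical sketch: no MySQL or PostgreSQL server works this way; the
# class and method names are made up for illustration.
class StatementRegistry
  UnknownStatement = Class.new(StandardError)

  def initialize
    @statements = {} # digest => SQL, shared by every connection
  end

  def prepare(sql)
    digest = Digest::SHA1.hexdigest(sql)
    @statements[digest] = sql
    digest
  end

  def execute(digest, *params)
    sql = @statements.fetch(digest) { raise UnknownStatement, digest }
    [sql, params] # a real server would run the query here
  end
end

# Client side: compute the id locally, optimistically execute, and only
# prepare when the server doesn't know the statement yet.
def execute_prepared(server, sql, *params)
  digest = Digest::SHA1.hexdigest(sql)
  server.execute(digest, *params)
rescue StatementRegistry::UnknownStatement
  server.prepare(sql)
  server.execute(digest, *params)
end

server = StatementRegistry.new
p execute_prepared(server, "SELECT * FROM users WHERE id = ?", 42)
```

<p>The client no longer has to track anything per connection: the identifier is derived from the query itself, so the happy path is a single roundtrip on any connection.</p>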

<h2 id="conclusion">Conclusion</h2>

<p>I could probably find a few more examples of things that are impractical in MySQL and PostgreSQL protocols, but I think
I’ve shown enough to share my feelings about them.</p>

<p>Relational databases are extremely impressive projects, clearly built by very smart people, but it feels like the developer
experience isn’t very high on their priority list, if it’s even considered.
And that perhaps explains part of the NoSQL appeal in the early 2010s.
However, I think it would be possible to significantly improve their usability without changing the query language, just by improving the query
protocol.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>3 roundtrips in total, but you theoretically can do the <code class="language-plaintext highlighter-rouge">COM_STMT_CLOSE</code> asynchronously. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="performance" /><summary type="html"><![CDATA[If you’ve been in this trade for a while, you have probably seen dozens of debates on the merits and problems of SQL as a relational database query language. As an ORM maintainer, I have a few gripes with SQL, but overall it is workable, and anyway, it has so much inertia that there’s no point fantasizing about a replacement.]]></summary></entry><entry><title type="html">The Pitchfork Story</title><link href="https://byroot.github.io/ruby/performance/2025/03/04/the-pitchfork-story.html" rel="alternate" type="text/html" title="The Pitchfork Story" /><published>2025-03-04T10:03:51+00:00</published><updated>2025-03-04T10:03:51+00:00</updated><id>https://byroot.github.io/ruby/performance/2025/03/04/the-pitchfork-story</id><content type="html" xml:base="https://byroot.github.io/ruby/performance/2025/03/04/the-pitchfork-story.html"><![CDATA[<p>A bit more than two years ago, as part of my work in Shopify’s Ruby and Rails Infrastructure team,
I released a new Ruby HTTP server called <a href="https://github.com/Shopify/pitchfork">Pitchfork</a>.</p>

<p>It has a bit of an unusual design and makes hard tradeoffs, so I’d like to explain the thought process behind
these decisions and how I see the future of that project.</p>

<h2 id="unicorns-design-is-fine">Unicorn’s Design Is Fine</h2>

<p>Ever since I joined Shopify over 11 years ago, the main monolith application has been using <a href="https://yhbt.net/unicorn/">Unicorn</a>
as its application server in production.
I know that Unicorn is seen as legacy software by many if not most Rubyists, <a href="https://yhbt.net/unicorn-public/20200908084429.GA16521@dcvr/T/#u">including Unicorn’s own maintainer</a>,
but I very strongly disagree with this opinion.</p>

<p>A major argument against Unicorn is that Rails apps are mostly IO-bound, so despite the existence of the GVL,
you can use a threaded server to increase throughput. <a href="/ruby/performance/2025/01/23/the-mythical-io-bound-rails-app.html">I explained in a previous post why I don’t believe most
Rails applications are IO-bound</a>,
but regardless of how true it is in general, it certainly isn’t the case of Shopify’s monolith, hence using a threaded
server wasn’t a viable option.</p>

<p>In addition, back in 2014, before the existence of the Ruby and Rails Infrastructure team at Shopify,
I worked on the Resiliency team, where we were in charge of reducing the likelihood of outages, as well as reducing the
blast radius of any outage we failed to prevent. That’s the team where we developed tools such as
<a href="https://github.com/Shopify/toxiproxy">Toxiproxy</a> and <a href="https://github.com/Shopify/semian">Semian</a>.</p>

<p>During my stint on the Resiliency team, I witnessed some pretty catastrophic failures.
Some <a href="https://github.com/protocolbuffers/protobuf/issues/11968">C extensions segfaulting</a>, or worse,
<a href="https://github.com/grpc/grpc/pull/16332">deadlocking the Ruby VM</a>, some datastores becoming unresponsive, and more.</p>

<p>What I learned from that experience is that while you should certainly strive to catch as many bugs as possible up front in CI,
you have to accept that you can’t possibly catch them all.
So ultimately, it becomes a numbers game. If an application is developed by half a dozen people, this kind of event
may only happen once in a blue moon. But when dealing with a monolith on which hundreds if not thousands of developers are
actively making changes every day, bugs are a fact of life.</p>

<p>As such, it’s important to adopt a defense-in-depth strategy: if you cannot possibly abolish all bugs, you can at least
limit their blast radius with various techniques.
And <a href="/ruby/performance/2025/02/09/guard-rails-are-not-code-smells.html">Unicorn’s process-based execution model contributed greatly to the resiliency of the system</a>.</p>

<h2 id="its-not-all-rainbows-and-unicorns">It’s Not All Rainbows And Unicorns</h2>

<p>But while I’ll never cease to defend Unicorn’s design, I’m perfectly able to recognize that it also has its downsides.</p>

<p>One is that Unicorn doesn’t attempt to protect against common attacks such as <a href="https://en.wikipedia.org/wiki/Slowloris_(cyber_attack)">slowloris</a>,
so it’s mandatory to put it behind a buffering reverse proxy such as NGINX.
You may consider this to be extra complexity, but to me, it’s the opposite.
Yes, it’s one more “moving piece”, but from my point of view, it’s less complex to defer many classic concerns to battle-tested software used
across the world, with lots of documentation, rather than to trust my application server can safely be exposed directly
to the internet. I’d much rather trust the NGINX community to keep up with whatever novel attack was engineered last week
than rely on the part of the Ruby community that uses my app server of choice. Not that I distrust the Ruby community,
but my assumption is that the larger community is more likely to quickly get the security fixes in.</p>

<p>And if a reverse proxy will be involved anyway, you can let it take care of many standard concerns such as terminating SSL,
allowing newer versions of HTTP, serving static assets, etc. I don’t think that an extra moving piece brings extra complexity
when it is such a standard part of so many stacks and removes a ton of complexity from the next piece in the chain.
But that’s just me. I suppose, especially after reading some of the reactions to my previous posts, that not everybody
agrees on what is complex and what is simple.</p>

<p>Another often-mentioned shortcoming of the multi-process design is its inability to do efficient connection pooling.
Since connections aren’t easily shared across processes, each Unicorn worker will maintain a separate pool of connections
that will be idle most of the time.</p>

<p>But here too, there aren’t many alternatives. Even if you accept the tradeoff of using a threaded server, you will still need to
run at least one process per core, hence you won’t be able to cut the number of idle connections significantly compared to Unicorn.
You may be able to buy a bit of time that way, but sooner or later it won’t be enough.</p>

<p>Ultimately, once you scale past a certain size you kinda have to accept that external connection pooling is a necessity.
The only alternative I can think of would be to implement cross-process connection pooling by passing file descriptors via IPC.
It’s technically doable, but I can’t imagine myself arguing that it’s less complex than setting up <a href="https://proxysql.com/">ProxySQL</a>,
<a href="https://github.com/facebook/mcrouter">mcrouter</a> / <a href="https://github.com/twitter/twemproxy">twemproxy</a> etc.</p>

<p>Yet another complaint I heard was that the multi-process design made it impossible to cache data in memory.
But here too I’m going to sound like a broken record: as long as Ruby doesn’t have a viable way to do in-process parallelism,
you will have to run at least one process per core, so trying to cache data in-process is never going to work well.</p>

<p>But even without that limitation, I’d still argue you’d be better off not using the heap as a cache, because by doing so you
are creating extra work for the garbage collector, and anyway, all the caches would be wiped on every deploy, which may
be quite frequent. So I’d much rather run a small local Memcached instance on every web node, or use something like SQLite
or whatever. It’s a bit slower than in-memory caching, in part because it requires serialization, but it persists across
deploys and is shared across all the processes on the server, so it has a much better hit ratio.</p>

<p>And finally, by far the most common complaint against the Unicorn model is the extra memory usage induced by processes,
and that’s exactly what Pitchfork was designed to solve.</p>

<h2 id="the-heap-janitor">The Heap Janitor</h2>

<p>Whenever I’m asked what my day-to-day job is like, I have a very hard time explaining it, because I kind of do an
amalgamation of lots of small things that aren’t necessarily all logically related. So it’s almost impossible for me to
come up with an answer that makes sense, and I don’t think I ever gave the same answer twice.
I also probably made a fool of myself more than once.</p>

<p>But among the many hats I occasionally wear, there’s one I call the “Heap Janitor”.
When you task hundreds if not thousands of developers with adding features to a monolith, its memory usage will keep growing.
Some of that growth will be legitimate because every line of code has to reside somewhere in memory as VM bytecode, but some of it can
be reduced or eliminated by using better data structures, deduplicating some data, etc.</p>

<p>Most of the time when the Shopify monolith would experience a memory leak, or simply would have increased its memory
usage enough to be problematic, I’d get involved in the investigation.
Over time I developed some expertise in how to analyse a Ruby application’s heap and find leaks or opportunities for memory
usage reduction.</p>

<p>I even <a href="https://github.com/Shopify/heap-profiler">developed some dedicated tools</a> to help with that task, and integrated
them into CI so every morning I’d get a nightly report of what Shopify’s monolith heap is made of, to better see historical
trends and proactively fix newly introduced problems.</p>

<p>Once, <a href="https://github.com/rails/rails/pull/35860#issuecomment-480218928">by deduplicating the schema information Active Record keeps</a>,
I managed to reduce each process’s memory usage by 114MB, and by now I have probably sent over a hundred patches to many gems
to reduce their memory usage. Most of these <a href="https://github.com/dry-rb/dry-schema/pull/399">patches revolve around interning some strings</a>.</p>

<p>But while you can often find more compact ways to represent some data in memory, that can’t possibly compensate for the
new features being added constantly.</p>

<h2 id="the-miracle-cow">The Miracle CoW</h2>

<p>So by far, the most effective way to reduce an application’s memory usage is to allow
more memory to be shared between processes via Copy-on-Write, which in the case of Puma or Unicorn, means ensuring it’s loaded
during boot, and is never mutated after that.</p>

<p>Since the Shopify monolith runs in pretty large containers with 36 workers, if you load 1GiB of extra data in memory,
as long as you do it during boot and it is never mutated, thanks to Copy-on-Write that will only account for an extra
28MiB (<code class="language-plaintext highlighter-rouge">1024 / 36</code>) of actual memory usage per worker, which is perfectly reasonable.</p>

<p>Unfortunately, the lazy loading pattern is extremely common in Ruby code, I’m sure you’ve seen plenty of code like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">SomeNamespace</span>
  <span class="k">class</span> <span class="o">&lt;&lt;</span> <span class="nb">self</span>
    <span class="k">def</span> <span class="nf">config</span>
      <span class="vi">@config</span> <span class="o">||=</span> <span class="no">YAML</span><span class="p">.</span><span class="nf">load_file</span><span class="p">(</span><span class="s2">"path/to/config.yml"</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Here I used a YAML config file as an example, but sometimes it’s fetching or computing data from somewhere else,
the key point is <code class="language-plaintext highlighter-rouge">@ivar ||=</code> being done in a class or module method.</p>

<p>This pattern is good in development because it means that if you don’t need that data, you won’t waste time computing
it, but in production, it’s bad: not only will that memory not be in shared pages, it will also cause the first
request that needs this data to do some extra work, causing latency to spike around deploys.</p>

<p>A very simple way to improve this code is to just use a constant:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">SomeNamespace</span>
  <span class="no">CONFIG</span> <span class="o">=</span> <span class="no">YAML</span><span class="p">.</span><span class="nf">load_file</span><span class="p">(</span><span class="s2">"path/to/config.yml"</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>But if for some reason you really want this to be lazily loaded in development,
<a href="https://guides.rubyonrails.org/configuring.html#config-eager-load-namespaces">Rails offers a not-so-well-known API</a> to
help with that:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">SomeNamespace</span>
  <span class="k">class</span> <span class="o">&lt;&lt;</span> <span class="nb">self</span>
    <span class="k">def</span> <span class="nf">eager_load!</span>
      <span class="n">config</span>
    <span class="k">end</span>

    <span class="k">def</span> <span class="nf">config</span>
      <span class="vi">@config</span> <span class="o">||=</span> <span class="no">YAML</span><span class="p">.</span><span class="nf">load_file</span><span class="p">(</span><span class="s2">"path/to/config.yml"</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># in: config/application.rb</span>
<span class="n">config</span><span class="p">.</span><span class="nf">eager_load_namespaces</span> <span class="o">&lt;&lt;</span> <span class="no">SomeNamespace</span>
</code></pre></div></div>

<p>In the above example, Rails takes care of calling <code class="language-plaintext highlighter-rouge">eager_load!</code> on all objects you add to <code class="language-plaintext highlighter-rouge">config.eager_load_namespaces</code>
when it’s booted in production mode. This way you keep lazy loading in development environments, but get eager loading
in production.</p>

<p>I spent a lot of time improving Shopify’s monolith and its open-source dependencies to make it eager-load more.
To help me track down the offending call sites, I configured <a href="https://github.com/Shopify/app_profiler">our profiling middleware</a>
so that it would automatically trigger profiling of the very first request processed by a worker.
And similarly, I configured our Unicorn so that a few workers would dump their heap with <a href="https://docs.ruby-lang.org/en/3.4/ObjectSpace.html#method-i-dump_all"><code class="language-plaintext highlighter-rouge">ObjectSpace.dump_all</code></a>
before and after their very first request.</p>

<p>On paper, every object allocated as part of a Rails request is supposed to no longer be referenced once the request has been completed.
So by taking a heap snapshot before and after a request, and making a diff of them, you can locate any object that should
have been eager loaded during boot.</p>
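<p>As a rough illustration, such a before/after heap diff could be sketched as follows. The helper names (<code class="language-plaintext highlighter-rouge">dump_heap</code>, <code class="language-plaintext highlighter-rouge">retained_by</code>) are made up for this example, and this is not the actual tooling described above. It assumes that matching objects by address is good enough, which can miss an object that reuses a slot freed between the two dumps, and the dumping itself adds a bit of noise to the result.</p>

```ruby
require "objspace"
require "tmpdir"

# Every object line in a heap dump starts with its address; ROOT lines have none.
ADDRESS = /\A\{"address":"(0x\h+)"/

# Hypothetical helper: runs a full GC so that only reachable objects are
# listed, then writes one JSON document per live object to `path`.
def dump_heap(path)
  GC.start
  File.open(path, "w") { |file| ObjectSpace.dump_all(output: file) }
end

# Returns the dump lines of objects that exist after the block ran,
# but did not exist before it.
def retained_by(before_path, after_path)
  dump_heap(before_path)
  yield
  dump_heap(after_path)

  known = {}
  File.foreach(before_path) { |line| known[line[ADDRESS, 1]] = true }
  File.foreach(after_path).reject do |line|
    address = line[ADDRESS, 1]
    address.nil? || known.key?(address)
  end
end

Dir.mktmpdir do |dir|
  leaked = retained_by(File.join(dir, "before.json"), File.join(dir, "after.json")) do
    # Stand-in for a request that lazily initializes some global state.
    $lazy_config = "config-loaded-by-#{Process.pid}"
  end
  puts "#{leaked.size} objects survived the request"
end
```

<p>Each returned line is the JSON description of one surviving object, so grouping them by type, or by allocation site if allocation tracing is enabled, points at what should have been loaded during boot.</p>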

<p>Over time this data helped me increase the amount of shared memory, from something around <code class="language-plaintext highlighter-rouge">45%</code> up to about <code class="language-plaintext highlighter-rouge">60%</code> of the total,
which significantly reduced the memory usage of individual workers, but I was hitting diminishing returns.</p>
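<p>For reference, here is one way such a ratio could be estimated on Linux. This is only a sketch: I’m assuming that “shared” means <code class="language-plaintext highlighter-rouge">Shared_Clean + Shared_Dirty</code> as reported by <code class="language-plaintext highlighter-rouge">/proc/self/smaps_rollup</code>, divided by <code class="language-plaintext highlighter-rouge">Rss</code>, which is not necessarily how the figures above were measured. The two method names are made up for the example.</p>

```ruby
# Parses the "Name:    123 kB" lines of a smaps_rollup file into a Hash.
# The first line (the address range) doesn't match and is skipped.
def parse_smaps_rollup(text)
  text.scan(/^(\w+):\s+(\d+) kB$/).to_h { |name, kb| [name, Integer(kb)] }
end

# Fraction of the resident memory that lives in pages shared with other processes.
def shared_ratio(fields)
  shared = fields.fetch("Shared_Clean") + fields.fetch("Shared_Dirty")
  shared.fdiv(fields.fetch("Rss"))
end

path = "/proc/self/smaps_rollup"
if File.readable?(path)
  ratio = shared_ratio(parse_smaps_rollup(File.read(path)))
  puts "shared: #{(ratio * 100).round(1)}%"
end
```

<p>Called from inside a worker after it has served some traffic, this gives a single number that can be tracked over time.</p>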

<p><code class="language-plaintext highlighter-rouge">60%</code> is good, but I was hoping for more. In theory, only the memory allocated as part of the request cycle can’t be shared;
the overwhelming majority of the rest of the objects should be shareable, so I was expecting the ratio of shared memory
to be more akin to <code class="language-plaintext highlighter-rouge">80%</code>, which raised the question: which memory still wasn’t shared?</p>

<h2 id="inline-caches">Inline Caches</h2>

<p>For a while I tried to answer this question using eBPF probes, but after reading man pages for multiple days,
I had to accept that these sorts of things fly over my head<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>, so I gave up.</p>

<p>But one day I had a revelation: It must be the inline caches!</p>

<p>A very large portion of the Shopify monolith heap is made up of VM bytecode: as mentioned previously, all the code
written by all these developers has to end up somewhere. That bytecode is largely immutable, but very close to it there
are inline caches<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> and they are mutable, at least early on.</p>

<p>And if they are close together in the heap, mutating an inline cache would invalidate the entire 4KiB page, including lots
of immutable objects on the same page.</p>

<p>To validate my assumption, I wrote a test application:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">App</span>
  <span class="no">CONST_NUM</span> <span class="o">=</span> <span class="no">Integer</span><span class="p">(</span><span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s2">"NUM"</span><span class="p">,</span> <span class="mi">100_000</span><span class="p">))</span>

  <span class="no">CONST_NUM</span><span class="p">.</span><span class="nf">times</span> <span class="k">do</span> <span class="o">|</span><span class="n">i</span><span class="o">|</span>
    <span class="nb">class_eval</span><span class="p">(</span><span class="o">&lt;&lt;~</span><span class="no">RUBY</span><span class="p">,</span> <span class="kp">__FILE__</span><span class="p">,</span> <span class="kp">__LINE__</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span><span class="sh">
      Const</span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="sh"> = Module.new

      def self.lookup_</span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="sh">
        Const</span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="sh">
      end
</span><span class="no">    RUBY</span>
  <span class="k">end</span>

  <span class="nb">class_eval</span><span class="p">(</span><span class="o">&lt;&lt;~</span><span class="no">RUBY</span><span class="p">,</span> <span class="kp">__FILE__</span><span class="p">,</span> <span class="kp">__LINE__</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span><span class="sh">
    def self.warmup
      </span><span class="si">#{</span><span class="no">CONST_NUM</span><span class="p">.</span><span class="nf">times</span><span class="p">.</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">i</span><span class="o">|</span> <span class="s2">"lookup_</span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="s2">"</span><span class="si">}</span><span class="sh">.join("</span><span class="se">\n</span><span class="sh">")}
    end
</span><span class="no">  RUBY</span>
<span class="k">end</span>
</code></pre></div></div>

<p>It uses meta-programming, but is rather simple: it defines 100k methods, each referencing a unique constant.
If I removed the meta-programming it would look like this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">App</span>
  <span class="no">Const0</span> <span class="o">=</span> <span class="no">Module</span><span class="p">.</span><span class="nf">new</span>
  <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">lookup_0</span>
    <span class="no">Const0</span>
  <span class="k">end</span>

  <span class="no">Const1</span> <span class="o">=</span> <span class="no">Module</span><span class="p">.</span><span class="nf">new</span>
  <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">lookup_1</span>
    <span class="no">Const1</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">warmup</span>
    <span class="n">lookup_0</span>
    <span class="n">lookup_1</span>
    <span class="c1"># snip...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Why this pattern? Because it’s a good way to generate a lot of inline caches, constant caches in this case, and to
trigger their warmup.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;</span> <span class="nb">puts</span> <span class="no">RubyVM</span><span class="o">::</span><span class="no">InstructionSequence</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="s1">'Const0'</span><span class="p">).</span><span class="nf">disasm</span>
<span class="o">==</span> <span class="ss">disasm: </span><span class="c1">#&lt;ISeq:&lt;compiled&gt;@&lt;compiled&gt;:1 (1,0)-(1,6)&gt;</span>
<span class="mo">0000</span> <span class="n">opt_getconstant_path</span>                   <span class="o">&lt;</span><span class="n">ic</span><span class="p">:</span><span class="mi">0</span> <span class="no">Const0</span><span class="o">&gt;</span>             <span class="p">(</span>   <span class="mi">1</span><span class="p">)[</span><span class="no">Li</span><span class="p">]</span>
<span class="mo">0002</span> <span class="n">leave</span>
</code></pre></div></div>

<p>Here the <code class="language-plaintext highlighter-rouge">&lt;ic:0&gt;</code> tells us this instruction has an associated inline cache.
These constant caches start uninitialized, and the first time this codepath is executed, the Ruby VM goes through
the slow process of finding the object that’s pointed to by that constant, and stores it in the cache.
On further executions, it just needs to check the cache wasn’t invalidated, which for constants is extremely rare unless
you are doing some really nasty meta-programming at runtime.</p>

<p>Now, using this app, we can demonstrate the effect of inline caches on Copy-on-Write effectiveness:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">show_pss</span><span class="p">(</span><span class="n">title</span><span class="p">)</span>
  <span class="c1"># Easy way to get PSS on Linux</span>
  <span class="nb">print</span> <span class="n">title</span><span class="p">.</span><span class="nf">ljust</span><span class="p">(</span><span class="mi">30</span><span class="p">,</span> <span class="s2">" "</span><span class="p">)</span>
  <span class="nb">puts</span> <span class="no">File</span><span class="p">.</span><span class="nf">read</span><span class="p">(</span><span class="s2">"/proc/self/smaps_rollup"</span><span class="p">).</span><span class="nf">scan</span><span class="p">(</span><span class="sr">/^Pss: (.*)$/</span><span class="p">)</span>
<span class="k">end</span>

<span class="n">show_pss</span><span class="p">(</span><span class="s2">"initial"</span><span class="p">)</span>

<span class="n">pid</span> <span class="o">=</span> <span class="nb">fork</span> <span class="k">do</span>
  <span class="n">show_pss</span><span class="p">(</span><span class="s2">"after fork"</span><span class="p">)</span>

  <span class="no">App</span><span class="p">.</span><span class="nf">warmup</span>
  <span class="n">show_pss</span><span class="p">(</span><span class="s2">"after fork after warmup"</span><span class="p">)</span>
<span class="k">end</span>
<span class="no">Process</span><span class="p">.</span><span class="nf">wait</span><span class="p">(</span><span class="n">pid</span><span class="p">)</span>
</code></pre></div></div>

<p>If you run the above script on Linux, you should get something like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>initial                                    246380 kB
after fork                                 121590 kB
after fork after warmup                    205688 kB
</code></pre></div></div>

<p>So our synthetic <code class="language-plaintext highlighter-rouge">App</code> made our initial Ruby process grow to <code class="language-plaintext highlighter-rouge">246MB</code>, and once we forked a child, its
<a href="https://en.wikipedia.org/wiki/Proportional_set_size">proportionate memory usage</a> was immediately cut in half as expected.
However once <code class="language-plaintext highlighter-rouge">App.warmup</code> is called in the child, all these inline caches end up initialized, and most of the Copy-on-Write
pages get invalidated, making the proportionate memory usage grow back to <code class="language-plaintext highlighter-rouge">205MB</code>.</p>
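<p>These numbers follow from how PSS is defined: private pages count in full, while each shared page is split evenly between the processes mapping it. The sketch below redoes the arithmetic under two simplifying assumptions: that the pages are shared by exactly two processes, and that the child’s total footprint stays at the initial 246,380 kB (the measured 121,590 kB is a bit below the exact half, so this is only an approximation). The method names are made up for the example.</p>

```ruby
# PSS counts private pages in full and splits every shared page evenly
# between the processes that map it.
def pss_kb(private_kb:, shared_kb:, sharers:)
  private_kb + shared_kb.fdiv(sharers)
end

# Solves the formula above for the private part, given a measured PSS:
#   pss = private + (total - private) / sharers
def private_kb(measured_pss_kb:, total_kb:, sharers:)
  (sharers * measured_pss_kb - total_kb).fdiv(sharers - 1)
end

total = 246_380

# Right after fork, everything is shared between the parent and the child.
puts pss_kb(private_kb: 0, shared_kb: total, sharers: 2) # prints 123190.0

# After warmup the child reported 205688 kB, so roughly 165MB of pages were copied.
puts private_kb(measured_pss_kb: 205_688, total_kb: total, sharers: 2) # prints 164996.0
```

<p>In other words, under these assumptions, warming up the inline caches after forking unshared about two-thirds of the heap.</p>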

<p>So you probably guessed the next step: if you can call <code class="language-plaintext highlighter-rouge">App.warmup</code> before forking, you stand to save a ton of memory:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">show_pss</span><span class="p">(</span><span class="n">title</span><span class="p">)</span>
  <span class="c1"># Easy way to get PSS on Linux</span>
  <span class="nb">print</span> <span class="n">title</span><span class="p">.</span><span class="nf">ljust</span><span class="p">(</span><span class="mi">30</span><span class="p">,</span> <span class="s2">" "</span><span class="p">)</span>
  <span class="nb">puts</span> <span class="no">File</span><span class="p">.</span><span class="nf">read</span><span class="p">(</span><span class="s2">"/proc/self/smaps_rollup"</span><span class="p">).</span><span class="nf">scan</span><span class="p">(</span><span class="sr">/^Pss: (.*)$/</span><span class="p">)</span>
<span class="k">end</span>

<span class="n">show_pss</span><span class="p">(</span><span class="s2">"initial"</span><span class="p">)</span>
<span class="no">App</span><span class="p">.</span><span class="nf">warmup</span>
<span class="n">show_pss</span><span class="p">(</span><span class="s2">"after warmup"</span><span class="p">)</span>

<span class="n">pid</span> <span class="o">=</span> <span class="nb">fork</span> <span class="k">do</span>
  <span class="n">show_pss</span><span class="p">(</span><span class="s2">"after fork"</span><span class="p">)</span>

  <span class="no">App</span><span class="p">.</span><span class="nf">warmup</span>
  <span class="n">show_pss</span><span class="p">(</span><span class="s2">"after fork after warmup"</span><span class="p">)</span>
<span class="k">end</span>
<span class="no">Process</span><span class="p">.</span><span class="nf">wait</span><span class="p">(</span><span class="n">pid</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>initial                                    246404 kB
after warmup                               251140 kB
after fork                                 123944 kB
after fork after warmup                    124240 kB
</code></pre></div></div>

<p>My theory was somewhat validated.
If I found a way to fill inline caches before fork, I’d stand to achieve massive memory savings.
Some would for sure continue to flip-flop like inline method caches in polymorphic code paths,
but the vast majority of them would essentially be static memory.</p>

<p>However, that was easier said than done.</p>

<p>Generally, when I mentioned that problem, the suggestion was to exercise these code paths as part of boot, but
it already isn’t easy to get good coverage in the test environment, it would be even harder during boot in the production
environment.
Even worse, many of these code paths have side effects, so you can’t just run them like that out of context. Anyway, with something like this in place, the application would take ages to boot, and it would be painful to maintain.</p>

<p>Another idea was to attempt to precompute these caches statically, which for constant caches is relatively easy.
But it’s only part of the picture: method caches and instance variable caches are much harder, if not impossible, to predict
statically, so perhaps it would help a bit, but it wouldn’t solve the issue once and for all.</p>

<p>Given all these types of caches are stored right next to each other, as soon as a single one changes, the entire <code class="language-plaintext highlighter-rouge">4KiB</code> memory page is invalidated.</p>

<p>Yet another suggestion was to serve traffic for a while from the Unicorn master process, but I didn’t like this
idea because that process is in charge of overseeing and coordinating all the workers, it can’t afford to render
requests, as it can’t be timed out.</p>

<h2 id="pumas-fork-worker">Puma’s Fork Worker</h2>

<p>That idea lived in my head for quite some time, not too sure how long but certainly months, until one day I noticed
an experimental feature in Puma: <a href="https://github.com/puma/puma/pull/2099"><code class="language-plaintext highlighter-rouge">fork_worker</code></a>.
Someone had identified the same issue, or at least a very similar one, and came up with an interesting idea.</p>

<p>It would initially start Puma in a normal way, with the cluster process overseeing its workers, but after a while you
could trigger a mechanism that would cause all workers except the first one to shut down, and be replaced not by
forking from the cluster process, but from the remaining worker.</p>

<p>So in terms of process hierarchy, you’d go from:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>10000   \_ puma 4.3.3 (tcp://0.0.0.0:9292) [puma]
10001       \_ puma: cluster worker 0: 10000 [puma]
10002       \_ puma: cluster worker 1: 10000 [puma]
10003       \_ puma: cluster worker 2: 10000 [puma]
10004       \_ puma: cluster worker 3: 10000 [puma]
</code></pre></div></div>

<p>To:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>10000   \_ puma 4.3.3 (tcp://0.0.0.0:9292) [puma]
10001       \_ puma: cluster worker 0: 10000 [puma]
10005           \_ puma: cluster worker 1: 10000 [puma]
10006           \_ puma: cluster worker 2: 10000 [puma]
10007           \_ puma: cluster worker 3: 10000 [puma]
</code></pre></div></div>

<p>I found the solution quite brilliant, rather than trying to exercise code paths in some automated way, just let live traffic
do it and then share that state with other workers. Simple.</p>

<p>But I had a major reservation about that feature: if you use it, you end up with three levels of processes,
and as I explained in <a href="/ruby/performance/2025/02/09/guard-rails-are-not-code-smells.html">my post about how guardrails are important</a>,
if anything goes wrong, I want to be able to terminate any worker safely.</p>

<p>In this case, what happens if <code class="language-plaintext highlighter-rouge">worker 0</code> is terminated or crashes by itself? Other workers end up orphaned, which in POSIX
means that they’ll be adopted by PID 1, AKA the init process, not the Puma cluster process, and that’s a major resiliency issue,
as Puma needs the workers to be its direct children for various things.
For this to be resilient, you’d need to fork these workers as siblings, not children, and that’s just not possible.</p>

<p>I really couldn’t reasonably consider deploying Shopify’s monolith this way, it would for sure bite us hard soon enough.
Yet, I was really curious about how effective it could be, so I set up an experiment where a single container in the canary
environment would use Puma with this feature enabled for a while, and it performed both fantastically and horribly.</p>

<p>Fantastically because the memory gains were absolutely massive, and horribly because the newly spawned workers started
raising errors from the <code class="language-plaintext highlighter-rouge">grpc</code> gem.
Errors that I knew relatively well because they came from <a href="https://github.com/grpc/grpc/pull/16332">a safety check added a few years prior in the <code class="language-plaintext highlighter-rouge">grpc</code> gem by one of my coworkers</a>
to prevent <code class="language-plaintext highlighter-rouge">grpc</code> from deadlocking in the presence of <code class="language-plaintext highlighter-rouge">fork</code>.</p>

<p>In addition to my reservations about process parenting, it was also clear that making the <code class="language-plaintext highlighter-rouge">grpc</code> gem fork-safe would
be almost impossible.
So I shoved that idea in the drawer with all the other good ideas that will never be and moved on.</p>

<h2 id="child-subreaper">Child Subreaper</h2>

<p>Until one day, I’m not too sure how long after, I was searching for a solution to a different problem, in <a href="https://man7.org/linux/man-pages/man2/prctl.2.html">the
<code class="language-plaintext highlighter-rouge">prctl(2)</code> manpage</a>, and I stumbled upon <a href="https://man7.org/linux/man-pages/man2/PR_SET_CHILD_SUBREAPER.2const.html">the <code class="language-plaintext highlighter-rouge">PR_SET_CHILD_SUBREAPER</code>
constant</a>.</p>

<blockquote>
  <p>If set is nonzero, set the “child subreaper” attribute of the
calling process; if set is zero, unset the attribute.</p>

  <p>A subreaper fulfills the role of init(1) for its descendant
processes.  When a process becomes orphaned (i.e., its immediate
parent terminates), then that process will be reparented to the
nearest still living ancestor subreaper.</p>
</blockquote>

<p>This was exactly the feature I didn’t know existed and didn’t know I wanted, to make Puma’s experimental feature more robust.</p>

<p>If you enabled <code class="language-plaintext highlighter-rouge">PR_SET_CHILD_SUBREAPER</code> on the Puma cluster process, <code class="language-plaintext highlighter-rouge">worker 0</code> would be able to spawn siblings
by doing the classic daemonization procedure: forking a grandchild, and orphaning it.
This would cause the new worker to be reparented to the Puma cluster process, effectively allowing you to fork a sibling.</p>
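
<p>To make the mechanism concrete, here is a minimal, Linux-only sketch of that reparenting trick in plain Ruby. It is an illustration under assumptions, not Puma’s or Pitchfork’s actual code: Ruby has no built-in <code class="language-plaintext highlighter-rouge">prctl</code> binding, so the sketch goes through <code class="language-plaintext highlighter-rouge">Fiddle</code> and hardcodes the constant’s value (36 in <code class="language-plaintext highlighter-rouge">linux/prctl.h</code>), and the helper names <code class="language-plaintext highlighter-rouge">become_subreaper</code> and <code class="language-plaintext highlighter-rouge">fork_sibling_demo</code> are made up for the example.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require 'fiddle'

# Value taken from linux/prctl.h; hardcoded here because Ruby doesn't expose it.
PR_SET_CHILD_SUBREAPER = 36

# Hypothetical helper: marks the current process as a child subreaper.
# Returns false on platforms other than Linux.
def become_subreaper
  return false unless RUBY_PLATFORM.include?('linux')

  prctl = Fiddle::Function.new(
    Fiddle.dlopen(nil)['prctl'],
    [Fiddle::TYPE_INT, Fiddle::TYPE_LONG, Fiddle::TYPE_LONG, Fiddle::TYPE_LONG, Fiddle::TYPE_LONG],
    Fiddle::TYPE_INT
  )
  prctl.call(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == 0
end

# Hypothetical demo: a child forks a grandchild and exits, and the orphaned
# grandchild reports which process became its parent.
def fork_sibling_demo
  go_read, go_write = IO.pipe
  report_read, report_write = IO.pipe

  middle_pid = fork do
    # Plays the role of "worker 0": fork a grandchild, then exit right away.
    fork do
      go_write.close
      report_read.close
      go_read.read # returns at EOF, once the middle process has been reaped
      report_write.write("#{Process.pid} #{Process.ppid}")
      report_write.close
      exit!(0)
    end
    exit!(0)
  end

  go_read.close
  report_write.close
  Process.wait(middle_pid) # the grandchild is now an orphan
  go_write.close           # let the grandchild look up its new parent
  grandchild_pid, new_parent_pid = report_read.read.split.map { |value| value.to_i }
  report_read.close
  # Only our own children can be reaped, so wait only if we adopted it.
  Process.wait(grandchild_pid) if new_parent_pid == Process.pid
  new_parent_pid
end

if become_subreaper
  puts "orphan adopted by #{fork_sibling_demo}, we are #{Process.pid}"
end
</code></pre></div></div>

<p>Without the <code class="language-plaintext highlighter-rouge">prctl</code> call, the orphan would instead be adopted by <code class="language-plaintext highlighter-rouge">init</code> (or the nearest subreaper further up the tree), and the original process would have no way to <code class="language-plaintext highlighter-rouge">wait</code> on it.</p>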

<p>Additionally, at that point, we were running YJIT in production, which made our memory usage situation noticeably worse, so we had to use tricks to enable it only on a subset of workers.</p>

<p>By definition, JIT compilers generate code at runtime, which amounts to a lot of memory that can’t be in shared pages.
If I could make this idea work in production, that would allow JITed code to be shared, making the potential savings
even bigger.</p>

<p>So I then proceeded to spend the next couple weeks prototyping.</p>

<h2 id="the-very-first-prototype">The Very First Prototype</h2>

<p>I tried both to improve Puma’s feature and to add the feature to Unicorn, to see which would be the simplest.</p>

<p>It is probably in big part due to my higher familiarity with Unicorn, but I found it easier to do in Unicorn,
and proceeded to <a href="https://yhbt.net/unicorn-public/aecd9142-94cf-b195-34f3-bea4870ed9c8@shopify.com/T/">send a patch to the mailing list</a>.</p>

<p>The first version of the patch actually didn’t use <code class="language-plaintext highlighter-rouge">PR_SET_CHILD_SUBREAPER</code> because it is a Linux-only feature, and Unicorn
supports all POSIX systems.
Instead, I built on Unicorn’s zero-downtime restart functionality: I’d fork a new master process, proceed to shut down
the old one, and replace the pidfile.</p>

<p>To help you picture it better, starting from a classic Unicorn process tree:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PID     Proctitle

1000   \_ unicorn master
1001       \_ unicorn worker 0
1002       \_ unicorn worker 1
1003       \_ unicorn worker 2
1004       \_ unicorn worker 3
</code></pre></div></div>

<p>Once you trigger reforking, the worker starts to behave like a new master:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PID     Proctitle

1000   \_ unicorn master
1001       \_ unicorn master, generation 2
1002       \_ unicorn worker 1
1003       \_ unicorn worker 2
1004       \_ unicorn worker 3
</code></pre></div></div>

<p>Then the old and new master processes would progressively shut down and spawn their workers respectively:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PID     Proctitle

1000   \_ unicorn master
1001       \_ unicorn master, generation 2
1005         \_ unicorn worker 0, generation 2
1006         \_ unicorn worker 1, generation 2
1003       \_ unicorn worker 2
1004       \_ unicorn worker 3
</code></pre></div></div>

<p>Until the old master has no workers left, at which point it exits.</p>

<p>This approach had the benefit of working on all POSIX systems. However, it was very brittle and required launching Unicorn
in daemonized mode, which isn’t what you want in containers and most modern deployment systems.</p>

<p>I was also relying on creating named pipes in the file system to allow the master process and workers to have a communication pipe,
which really wasn’t elegant at all.</p>

<p>But that was enough to send a patch and get some feedback on whether such a feature was desired upstream, as well as feedback on the implementation.</p>

<h2 id="inter-process-communication">Inter-Process Communication</h2>

<p>In Unicorn, the master process has to be able to communicate with its workers, for instance, to ask them to shut down,
this sort of thing.</p>

<p>The easiest way to do inter-process communication is to send a signal, but it limits you to just a few predefined
signals, many of which already have a meaning.
In addition, signals are handled asynchronously, so they tend to interrupt system calls and can generally conflict with
the running application.</p>

<p>So what Unicorn does is implement “soft signals”. Instead of sending real signals, before spawning each
worker, it creates a pipe, and the children look for messages from the master process in between processing two requests.</p>

<p>Here’s a simplified example of how it works.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">spawn_worker</span>
  <span class="n">read_pipe</span><span class="p">,</span> <span class="n">write_pipe</span> <span class="o">=</span> <span class="no">IO</span><span class="p">.</span><span class="nf">pipe</span>
  <span class="n">child_pid</span> <span class="o">=</span> <span class="nb">fork</span> <span class="k">do</span>
    <span class="n">write_pipe</span><span class="p">.</span><span class="nf">close</span>
    <span class="kp">loop</span> <span class="k">do</span>
      <span class="n">ready_ios</span><span class="p">,</span> <span class="o">=</span> <span class="no">IO</span><span class="p">.</span><span class="nf">select</span><span class="p">([</span><span class="n">read_pipe</span><span class="p">,</span> <span class="vi">@server_socket</span><span class="p">])</span>
      <span class="n">ready_ios</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">io</span><span class="o">|</span>
        <span class="k">if</span> <span class="n">io</span> <span class="o">==</span> <span class="n">read_pipe</span>
          <span class="c1"># handle commands sent by the parent process in the pipe</span>
        <span class="k">else</span>
          <span class="c1"># handle HTTP request</span>
        <span class="k">end</span>
      <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
  <span class="n">read_pipe</span><span class="p">.</span><span class="nf">close</span>
  <span class="p">[</span><span class="n">child_pid</span><span class="p">,</span> <span class="n">write_pipe</span><span class="p">]</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The master process keeps the writing end of the pipe, and the worker the reading end.
Whenever it is idle, a worker waits for either the command pipe or the HTTP socket to have something to read using
either <code class="language-plaintext highlighter-rouge">epoll</code>, <code class="language-plaintext highlighter-rouge">kqueue</code> or <code class="language-plaintext highlighter-rouge">select</code>. In this example, I just use Ruby’s provided <code class="language-plaintext highlighter-rouge">IO.select</code>, which is functionally equivalent.</p>

<p>With this in place, the Unicorn master always has both the PID and a communication pipe to all its workers.</p>

<p>But in my case, I wanted the master to be able to know about workers it didn’t spawn itself.
For the PID, it wasn’t that hard, I could just create a second pipe, but in the opposite direction, so that workers
would be able to send a message to the master to let it know about the new worker PID.
But how to establish the communication pipe with the grandparent?</p>

<p>That’s why my first prototype used named pipes, also known as FIFOs, which are exactly like regular pipes, except they are
exposed as files on the file system tree. This way the master could look for a named pipe at an agreed-upon location, and
have a way to send messages to its grandchildren. It worked, but as Unicorn’s maintainer pointed out in his feedback, there
was a much cleaner solution, <a href="https://man7.org/linux/man-pages/man2/socketpair.2.html"><code class="language-plaintext highlighter-rouge">socketpair(2)</code></a> and
<a href="https://docs.ruby-lang.org/en/3.4/UNIXSocket.html#method-i-send_io"><code class="language-plaintext highlighter-rouge">UNIXSocket#send_io</code></a>.</p>

<p>First, <code class="language-plaintext highlighter-rouge">socketpair(2)</code>, as its name implies, creates two sockets that are connected to each other, so it’s very similar
to pipes but is bidirectional. Since I needed two-way communication between processes, that was simpler and cleaner than
creating two pipes each time.</p>

<p>But then, a little-known capability of UNIX domain sockets (at least I didn’t know about it) is that they allow you to
pass file descriptors to another process. Here’s a quick demo in Ruby:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'socket'</span>
<span class="nb">require</span> <span class="s1">'tempfile'</span>

<span class="n">parent_socket</span><span class="p">,</span> <span class="n">child_socket</span> <span class="o">=</span> <span class="no">UNIXSocket</span><span class="p">.</span><span class="nf">socketpair</span>

<span class="n">child_pid</span> <span class="o">=</span> <span class="nb">fork</span> <span class="k">do</span>
  <span class="n">parent_socket</span><span class="p">.</span><span class="nf">close</span>

  <span class="c1"># Create a file that doesn't exist on the file system</span>
  <span class="n">file</span> <span class="o">=</span> <span class="no">Tempfile</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span><span class="ss">anonymous: </span><span class="kp">true</span><span class="p">)</span>
  <span class="n">file</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="s2">"Hello"</span><span class="p">)</span>
  <span class="n">file</span><span class="p">.</span><span class="nf">rewind</span>

  <span class="n">child_socket</span><span class="p">.</span><span class="nf">send_io</span><span class="p">(</span><span class="n">file</span><span class="p">)</span>
  <span class="n">file</span><span class="p">.</span><span class="nf">close</span>
<span class="k">end</span>
<span class="n">child_socket</span><span class="p">.</span><span class="nf">close</span>

<span class="n">child_io</span> <span class="o">=</span> <span class="n">parent_socket</span><span class="p">.</span><span class="nf">recv_io</span>
<span class="nb">puts</span> <span class="n">child_io</span><span class="p">.</span><span class="nf">read</span>
<span class="no">Process</span><span class="p">.</span><span class="nf">wait</span><span class="p">(</span><span class="n">child_pid</span><span class="p">)</span>
</code></pre></div></div>

<p>In the above example, we have the child process create an anonymous file and share it with its parent through a UNIX
domain socket.</p>

<p>With this new capability, I could make the design much less brittle. Now when a new worker was spawned, it could send
a message to the master process with all the necessary metadata as well as an attached socket for direct communication
with the new worker.</p>
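
<p>As a rough sketch of that registration handshake (not Pitchfork’s actual protocol), here is what it could look like. The names <code class="language-plaintext highlighter-rouge">register_worker_demo</code>, <code class="language-plaintext highlighter-rouge">control_for_master</code> and <code class="language-plaintext highlighter-rouge">control_for_worker</code>, as well as the <code class="language-plaintext highlighter-rouge">pid=</code> message format, are invented for the illustration, and for brevity the worker here is a direct child rather than a reparented grandchild.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require 'socket'

# Hypothetical demo: a freshly spawned worker announces itself to the master
# and hands over one end of a dedicated control socket.
def register_worker_demo
  master_sock, worker_boot_sock = UNIXSocket.socketpair

  worker_pid = fork do
    master_sock.close

    # The worker creates its own private channel...
    control_for_master, control_for_worker = UNIXSocket.socketpair
    # ...and passes one end to the master, followed by its metadata.
    worker_boot_sock.send_io(control_for_master)
    worker_boot_sock.puts("pid=#{Process.pid}")
    control_for_master.close
    worker_boot_sock.close

    # From now on, commands arrive on the dedicated socket.
    command = control_for_worker.gets.to_s.chomp
    control_for_worker.puts("ack #{command}")
    control_for_worker.close
    exit!(0)
  end
  worker_boot_sock.close

  # The master receives the file descriptor first, then the metadata line.
  control = master_sock.recv_io(UNIXSocket)
  announced_pid = master_sock.gets.chomp.delete_prefix('pid=').to_i
  master_sock.close

  control.puts('shutdown')
  reply = control.gets.chomp
  control.close
  Process.wait(worker_pid)

  [announced_pid == worker_pid, reply]
end

p register_worker_demo # prints [true, "ack shutdown"]
</code></pre></div></div>

<p>The master ends up with both the PID and a bidirectional socket to a process it never had to spawn itself, which is what the named pipes were awkwardly approximating.</p>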

<h2 id="the-decision-to-fork">The Decision To Fork</h2>

<p>Thanks to Eric Wong’s suggestions, I started to have a much neater design based around <code class="language-plaintext highlighter-rouge">PR_SET_CHILD_SUBREAPER</code> but at that
point rather than continue to attempt to upstream that new feature in Unicorn, I chose to instead fork the project under
a different name for multiple reasons.</p>

<p>First, it became clear that several Unicorn features were hard to make work in conjunction with reforking.
Not impossible, but it would have required quite a lot of effort, and ultimately it would induce a risk that I’d break Unicorn
for some of its users.</p>

<p>Unicorn also isn’t the easiest project to contribute to.
It has a policy of supporting very old versions of Ruby, many of them lacking features I wanted to use,
and hard to install on modern systems, making debugging extra hard.
It also doesn’t use bundler nor most of the modern Ruby tooling, which makes it hard to contribute to for many people,
has its own bash-based unit test framework,
and accepts patches over a mailing list rather than some forge.</p>

<p>I wouldn’t go as far as to say Unicorn is hostile to outside contributions, as it’s not the intent,
but in practice it kinda is.</p>

<p>So if I had to make large changes to support that new feature, it was preferable to do it as a different project,
one that wouldn’t impact the existing user base in case of mistakes, and one I’d be in control of, allowing me to
iterate and release quickly based on production experience.</p>

<p>That’s why I decided to fork. I started by removing many of Unicorn’s features that I believe aren’t useful in a modern
container-based world, and by removing the dependency on <code class="language-plaintext highlighter-rouge">kgio</code> in favor of using the non-blocking IO APIs introduced in newer
versions of Ruby.</p>

<p>From that simplified Unicorn base I could more easily do a clean and robust implementation of the feature I wanted
without having the constraint of not breaking features I didn’t need.</p>

<p>The nice thing when you start a new project is that you get to choose a name for it.
Initially, I wanted to continue the trend of naming Ruby web servers after animals and possibly marking the lineage with
Unicorn by naming it after another mythical animal.
So for a while, I considered naming the new project <a href="https://en.wikipedia.org/wiki/Dahu">Dahu</a>,
but ultimately I figured something with <code class="language-plaintext highlighter-rouge">fork</code> in the name would be more catchy.
Unfortunately, it’s very hard to find names on Rubygems that haven’t been taken yet, but I decided to send a mail to
the person who owned the <code class="language-plaintext highlighter-rouge">pitchfork</code> gem, which was long abandoned, and they very graciously transferred the gem to me.
That’s how <code class="language-plaintext highlighter-rouge">pitchfork</code> was born.</p>

<h2 id="the-mold-process">The Mold Process</h2>

<p>Now that I could more significantly change the server, I decided to move the responsibility of spawning new workers
out of the master process, which I renamed “monitor process” for the occasion.</p>

<p>In Unicorn, assuming you use the <code class="language-plaintext highlighter-rouge">preload_app</code> option to better benefit from Copy-on-Write, new workers are forked from
the master process, but that master process never serves any request, so all the application code it loaded is never called.
In addition, if you are running in a container, you can’t reasonably replace the initial process.</p>

<p>What I did instead is that Pitchfork’s monitor process never loads the application code, instead it gives that responsibility
to the first child it spawns: the “mold”. That mold process is responsible for loading the application, and spawning
new workers when ordered to do so by the “monitor” process. The process tree initially looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PID     Proctitle

1000   \_ pitchfork monitor
1001       \_ pitchfork mold
</code></pre></div></div>

<p>Then, once the mold is fully booted, the monitor sends requests to spawn workers, which the mold does using the classic double fork:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PID     Proctitle

1000   \_ pitchfork monitor
1001       \_ pitchfork mold
1002          \_ pitchfork init-worker
1003             \_ pitchfork worker 0
</code></pre></div></div>

<p>Once the <code class="language-plaintext highlighter-rouge">init-worker</code> process exits, <code class="language-plaintext highlighter-rouge">worker 0</code> becomes an orphan and is automatically reparented to the monitor:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PID     Proctitle

1000   \_ pitchfork monitor
1001       \_ pitchfork mold
1003       \_ pitchfork worker 0
</code></pre></div></div>

<p>Since all workers and the mold are at the same level, whenever we decide to do so, we can declare that a worker is now the new
mold, and respawn all other workers from it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PID     Proctitle

1000   \_ pitchfork monitor
1001       \_ pitchfork mold &lt;exiting&gt;
1003       \_ pitchfork mold, generation 2
1005       \_ pitchfork worker 0, generation 2
1007       \_ pitchfork worker 1, generation 2
</code></pre></div></div>

<p>All of this of course being done progressively, one worker at a time, to avoid significantly reducing the capacity
of the server.</p>

<h2 id="benchmarking">Benchmarking</h2>

<p>After that, I turned my constant cache demo into <a href="https://github.com/Shopify/pitchfork/tree/b70ee3c8700a997ee9513c81709b91062cc79ca1/benchmark">a memory usage benchmark for Rack servers</a>,
and that early version of Pitchfork performed as well as I hoped.</p>

<p>Compared to Puma with 2 workers and 2 threads, Pitchfork configured with 4 processes would use half the memory:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ PORT=9292 bundle exec benchmark/cow_benchmark.rb puma -w 2 -t 2 --preload
Booting server...
Warming the app with ab...
Memory Usage:
Single Worker Memory Usage: 207.5 MiB
Total Cluster Memory Usage: 601.6 MiB
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ PORT=8080 bundle exec benchmark/cow_benchmark.rb pitchfork -c examples/pitchfork.conf.minimal.rb 
Booting server...
Warming the app with ab...
Memory Usage:
Single Worker Memory Usage: 62.6 MiB
Total Cluster Memory Usage: 320.3 MiB
</code></pre></div></div>

<p>Of course, this is an extreme micro-benchmark for demonstration purposes, and not indicative of the effect on any
given real application in production, but it was very encouraging.</p>

<h2 id="the-bumpy-road-to-production">The Bumpy Road To Production</h2>

<p>Writing a new server, and benchmarking it, is the fun and easy part, and you can probably spend months ironing it out
if you so wish.</p>

<p>But it’s only once you attempt to put it in production that you’ll learn of all the mistakes you made and all the
problems you didn’t think of.</p>

<p>In this particular case though, there was one major blocker I did know of, and that I did know I had to solve
before even attempting to put Pitchfork in production: my old nemesis, the <code class="language-plaintext highlighter-rouge">grpc</code> gem.</p>

<p>I have a very long history of banging my head against my desk trying to fix compilation issues in that gem,
or figuring out leaks and other issues, so I knew making it fork-safe wouldn’t be an easy task.</p>

<p>To give you an idea of how much of a juggernaut it is, here’s a <code class="language-plaintext highlighter-rouge">cloc</code> report from the
source package, hence excluding tests, etc:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cloc --include-lang='C,C++,C/C++ Header' .
-----------------------------------------------------------------
Language       files          blank        comment           code
-----------------------------------------------------------------
C/C++ Header    1797          43802          96161         309150
C++              983          35199          53621         261047
C                463           9020           8835          81831
-----------------------------------------------------------------
SUM:            3243          88021         158617         652028
-----------------------------------------------------------------
</code></pre></div></div>

<p>Depending on whether you consider that headers are code or not, that is either
significantly bigger than Ruby’s own source code, or about as big.</p>

<p>Here’s the same <code class="language-plaintext highlighter-rouge">cloc</code> report in <code class="language-plaintext highlighter-rouge">ruby/ruby</code> excluding tests and default gems for comparison:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cloc --include-lang='C,C++,C/C++ Header' --exclude-dir=test,spec,-test-,gems,trans,build .
------------------------------------------------------------------
Language        files          blank        comment           code
------------------------------------------------------------------
C                 304          51562          83404         315614
C/C++ Header      406           8588          32604          84751
------------------------------------------------------------------
SUM:              710          60150         116008         400365
------------------------------------------------------------------
</code></pre></div></div>

<p>And to that, you’d also need to add the <code class="language-plaintext highlighter-rouge">google-protobuf</code> gem that works hand in hand with <code class="language-plaintext highlighter-rouge">grpc</code> and is also quite
sizeable.</p>

<p>Because of that, rather than try to make <code class="language-plaintext highlighter-rouge">grpc</code> fork-safe, I first tried to see if I could instead eliminate that
problematic dependency, given that after all, it was barely used in the monolith. It was only used to call a single service.
Unfortunately, I wasn’t capable of convincing the team using that gem to move to something else.</p>

<p>I later attempted to find a way to make the library fork-safe, but I was forced to admit I wasn’t capable of it.
All I managed to do was figure out that <a href="https://github.com/grpc/grpc/blob/master/doc/fork_support.md#current-status">the Python bindings had optional support for fork safety behind an environment
variable</a>.
That confirmed it was theoretically possible, but still beyond my capacities.</p>

<p>So I wasn’t happy about it, but I had to abandon the Pitchfork project. It just wasn’t viable as long as <code class="language-plaintext highlighter-rouge">grpc</code> remained
a dependency.</p>

<p>A few months later, a colleague who probably heard me cursing across the Atlantic Ocean asked if he could help.
Given that fork-safety was supported by the Python version of <code class="language-plaintext highlighter-rouge">grpc</code>, and that Shopify is a big Google Cloud customer
with a very high tier of support, he thought he could pull a few strings and get Google to implement it.
And he was right, it took a long time, probably something like six months, but
<a href="https://github.com/grpc/grpc/pull/33430">the <code class="language-plaintext highlighter-rouge">grpc</code> gem did end up gaining fork support</a>.</p>

<p>And just like that, after being derailed for half a year, the Pitchfork project was back on track, so a big thanks to
Alexander Polcyn for improving <code class="language-plaintext highlighter-rouge">grpc</code>.</p>

<h2 id="fixing-other-fork-safety-issues">Fixing Other Fork Safety Issues</h2>

<p>At that point, it was clear there were other issues than <code class="language-plaintext highlighter-rouge">grpc</code>, but I had some confidence I’d be able to
tackle them. Even without enabling reforking, it was advantageous to replace Unicorn with Pitchfork in production,
both to confirm no bugs were introduced in the HTTP and IO layers, and because it allowed us to remove
our dependency on <code class="language-plaintext highlighter-rouge">kgio</code>, unlocked compatibility with <code class="language-plaintext highlighter-rouge">rack 3</code>, and a few other small things.
So that was the first step.</p>

<p>Then, fixing the fork safety issues other than <code class="language-plaintext highlighter-rouge">grpc</code> took approximately another month.</p>

<p>The first thing I did was to <a href="https://github.com/minitest/minitest/pull/961#issuecomment-1654393109">simulate reforking on CI</a>.
Every 100 tests or so, CI workers would refork the same way Pitchfork does. This uncovered fork-safety issues
in other gems, notably <code class="language-plaintext highlighter-rouge">ruby-vips</code>.
Luckily this gem wasn’t used much by web workers, so I devised a new strategy to deal with it.</p>

<p>Pitchfork doesn’t actually need all workers to be fork-safe, only the ones that will be promoted into the next mold.
So if some libraries cause workers to become fork unsafe once they’ve been used, like <code class="language-plaintext highlighter-rouge">ruby-vips</code>, but are very rarely called,
what we can do is <a href="https://github.com/Shopify/pitchfork/pull/55">mark the worker as no longer being allowed to be promoted</a>.</p>

<p>If you are abusing this feature, you may end up with all workers marked as fork-unsafe, and no longer able
to refork ever. But once I shipped Pitchfork in production, I did put some instrumentation in place to keep an eye on
how often workers would be marked unsafe and it was very rare, so we were fine.</p>

<p>Once I managed to get a green CI with reforking on, I still was a bit worried about the application being fork-safe,
because simulating reforking on CI was good for catching issues with dead threads, but didn’t do much for catching
issues with inherited file descriptors.</p>

<p>In production, the problem with inheriting file descriptors mostly comes from multiple processes using the same
file descriptor concurrently. But on CI, even with that reforking simulation, we’re always running a single process.</p>

<p>So I had to think of another strategy to ensure no file descriptors were leaking.</p>

<p>This led me to develop another Pitchfork helper: <a href="https://github.com/Shopify/pitchfork/pull/56"><code class="language-plaintext highlighter-rouge">close_all_ios!</code></a>.
The idea is relatively simple: after reforking happens, you can use <a href="https://docs.ruby-lang.org/en/3.4/ObjectSpace.html#method-c-each_object"><code class="language-plaintext highlighter-rouge">ObjectSpace.each_object</code></a>
to find all instances of <code class="language-plaintext highlighter-rouge">IO</code> and close them unless they’ve been explicitly marked as fork-safe with <code class="language-plaintext highlighter-rouge">Pitchfork::Info.keep_io</code>.</p>

<p>This isn’t fully reliable, as it can only catch Ruby-level IOs, and can’t catch file descriptors held in C extensions,
but it still helped find numerous issues in gems and private code.</p>
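
<p>The general idea can be sketched in a few lines. This is a simplified illustration, not Pitchfork’s implementation: <code class="language-plaintext highlighter-rouge">close_unmarked_ios</code> is a hypothetical helper, and the list of IOs to keep is passed explicitly instead of being registered through <code class="language-plaintext highlighter-rouge">Pitchfork::Info.keep_io</code>. The demo runs the sweep in a forked child so that it doesn’t close anything in the calling process.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: close every open Ruby-level IO except the standard
# streams and the ones explicitly listed in `keep`. Returns how many it closed.
def close_unmarked_ios(keep)
  protected_ios = [STDIN, STDOUT, STDERR, $stdin, $stdout, $stderr] + keep
  closed = 0
  ObjectSpace.each_object(IO) do |io|
    begin
      next if io.closed?
      next if protected_ios.any? { |kept| kept.equal?(io) }

      io.close
      closed += 1
    rescue IOError, SystemCallError
      # Uninitialized or already broken IO objects are simply skipped.
    end
  end
  closed
end

# Demo: the child inherits a "leaked" pipe, sweeps its IOs, and reports back
# through the one pipe it was told to keep.
def close_all_ios_demo
  leaked_read, leaked_write = IO.pipe
  report_read, report_write = IO.pipe

  pid = fork do
    close_unmarked_ios([report_write])
    report_write.write("leaked_closed=#{leaked_read.closed?} kept_open=#{!report_write.closed?}")
    report_write.close
    exit!(0)
  end

  report_write.close
  message = report_read.read
  Process.wait(pid)
  [leaked_read, leaked_write, report_read].each { |io| io.close }
  message
end

puts close_all_ios_demo # prints "leaked_closed=true kept_open=true"
</code></pre></div></div>

<p>Any code that later tries to use one of the closed objects raises an <code class="language-plaintext highlighter-rouge">IOError</code> instead of silently sharing a descriptor with another process, which makes the offending library much easier to spot.</p>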

<p>Here’s <a href="https://github.com/discourse/mini_mime/pull/50">one example in the <code class="language-plaintext highlighter-rouge">mini_mime</code> gem</a>.</p>

<p>The gem is a small wrapper that allows querying flat files that contain information about mime types,
and to do that it would keep a read-only file, and <code class="language-plaintext highlighter-rouge">seek</code> into it:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">resolve</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
  <span class="vi">@file</span><span class="p">.</span><span class="nf">seek</span><span class="p">(</span><span class="n">row</span> <span class="o">*</span> <span class="vi">@row_length</span><span class="p">)</span>
  <span class="no">Info</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="vi">@file</span><span class="p">.</span><span class="nf">readline</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Since <code class="language-plaintext highlighter-rouge">seek</code> and <code class="language-plaintext highlighter-rouge">readline</code> aren’t thread-safe, the gem would wrap all that in a global mutex.</p>

<p>The problem here is that on fork file descriptors are inherited, and file descriptors aren’t just a pointer to a file
or socket. File descriptors also include a cursor that moves when you call <code class="language-plaintext highlighter-rouge">seek</code> or <code class="language-plaintext highlighter-rouge">read</code>, and after a fork that cursor is shared between the parent and its children, so a per-process mutex no longer protects it.</p>

<p>To make this fork safe you could detect that a fork happened, and reopen the file, but there’s actually a much better solution.</p>

<p>Rather than relying on <code class="language-plaintext highlighter-rouge">seek + read</code>, you can instead rely on <a href="https://man7.org/linux/man-pages/man2/pread.2.html"><code class="language-plaintext highlighter-rouge">pread(2)</code></a>,
which Ruby conveniently exposes in the <code class="language-plaintext highlighter-rouge">IO</code> class.
Instead of advancing the cursor like <code class="language-plaintext highlighter-rouge">read</code>, <code class="language-plaintext highlighter-rouge">pread</code> takes absolute offsets from the start of the file, which makes it
ideal to use in multi-threaded and multi-process scenarios:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">resolve</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
  <span class="no">Info</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="vi">@file</span><span class="p">.</span><span class="nf">pread</span><span class="p">(</span><span class="vi">@row_length</span><span class="p">,</span> <span class="n">row</span> <span class="o">*</span> <span class="vi">@row_length</span><span class="p">))</span>
<span class="k">end</span>
</code></pre></div></div>

<p>In addition to fixing the fork-safety in that gem, using <code class="language-plaintext highlighter-rouge">pread</code> also allowed removing the global mutex, making the gem faster.
Win-win.</p>
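
<p>If the shared cursor sounds abstract, here is a small self-contained sketch that shows it. The file content and the <code class="language-plaintext highlighter-rouge">shared_cursor_demo</code> name are made up for the example: a child process seeks in an inherited file, the parent observes that its own position moved, and <code class="language-plaintext highlighter-rouge">pread</code> still returns the right row regardless.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require 'tempfile'

# Demo: four fixed-size rows of 4 bytes each, read after a child moved the cursor.
def shared_cursor_demo
  Tempfile.create('rows') do |file|
    file.write('aaaabbbbccccdddd')
    file.flush
    file.rewind # the cursor is back at offset 0

    pid = fork do
      file.seek(12) # moves the cursor in the child...
      exit!(0)      # ...and exits without running the parent's cleanup code
    end
    Process.wait(pid)

    position_after_child = file.pos # ...but the parent sees the move too
    second_row = file.pread(4, 4)   # absolute offset, ignores the cursor

    [position_after_child, second_row]
  end
end

p shared_cursor_demo # prints [12, "bbbb"]
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">seek</code> followed by <code class="language-plaintext highlighter-rouge">readline</code>, another process moving the cursor between the two calls would make the parent read the wrong row, and no in-process lock can prevent that.</p>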

<h2 id="the-first-production-reforking">The First Production Reforking</h2>

<p>After a few more rounds of grepping the codebase and its dependencies for patterns that may be problematic, I started being
confident enough to start manually triggering reforking in a single canary container.</p>

<p>To be clear, I was expecting some issues to be left, but I was out of ideas on how to catch any more of them
and confident the most critical problems such as data corruption were out of the picture.</p>

<p>These manual reforks didn’t reveal any issues, except that <a href="https://github.com/Shopify/pitchfork/issues/60">I forgot to also prevent manual reforking once a worker
had been marked as fork-unsafe</a>, 🤦.</p>

<p>Since other than that it went well, I progressively enabled automatic reforking on more and more servers over the span of
a few days, first 1%, then 10%, etc, with seemingly no problems.
While doing that I was also trying multiple different reforking frequencies, to try to identify a good tradeoff
between memory usage reduction and latency impact.</p>

<p>But one of the characteristics of the Shopify monolith, with so many engineers shipping changes every day, is that
it’s deployed extremely frequently, as often as every 30 minutes, and with teams across the world, this never really
stops except for a couple of hours at night, and a couple of days during weekends.</p>

<p>For the same reason that rebooting your computer will generally make whatever issue you had go away, redeploying a web
application will generally hide various bugs that take time to manifest themselves.
So over the years, doing this sort of infrastructure change, I learned that even when you think you succeeded,
you might discover problems over the next weekend.</p>

<p>And in this case, that is what happened. On the night of Friday to Saturday, Site Reliability Engineers got paged because
some application servers became unresponsive, with very high CPU usage.</p>

<p>Luckily I had a ton of instrumentation in place to help me tune reforking, so I was able to investigate this immediately
on Saturday morning, and quickly identified some smoking guns.</p>

<p>The first thing I noticed is that on these nodes, the <code class="language-plaintext highlighter-rouge">after_fork</code> callbacks were taking close to a minute on average,
while they’d normally take less than a second. In that callback, we were mostly doing two things,
calling <code class="language-plaintext highlighter-rouge">Pitchfork::Info.close_all_ios!</code>, and eagerly reconnecting to datastores. So a good explanation for these spikes
would be an IO “leak”.</p>

<p>Hence I immediately jumped on a canary container to confirm my suspicion. The worker processes were fine, but
the mold processes were indeed “leaking” file descriptors. I still have the logs from that investigation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>appuser@web-59bccbbd79-sgfph:~$ date; ls  /proc/135229/fd | wc -l
Sat Sep 23 07:52:46 UTC 2023
155
appuser@web-59bccbbd79-sgfph:~$ date; ls  /proc/135229/fd | wc -l
Sat Sep 23 07:52:47 UTC 2023
156
appuser@web-59bccbbd79-sgfph:~$ date; ls  /proc/135229/fd | wc -l
Sat Sep 23 07:52:47 UTC 2023
157
appuser@web-59bccbbd79-sgfph:~$ date; ls  /proc/135229/fd | wc -l
Sat Sep 23 07:52:48 UTC 2023
157
appuser@web-59bccbbd79-sgfph:~$ date; ls  /proc/135229/fd | wc -l
Sat Sep 23 07:52:49 UTC 2023
158
appuser@web-59bccbbd79-sgfph:~$ date; ls  /proc/135229/fd | wc -l
Sat Sep 23 07:52:49 UTC 2023
158
appuser@web-59bccbbd79-sgfph:~$ date; ls  /proc/135229/fd | wc -l
Sat Sep 23 07:52:50 UTC 2023
159
appuser@web-59bccbbd79-sgfph:~$ date; ls  /proc/135229/fd | wc -l
Sat Sep 23 07:52:51 UTC 2023
160
appuser@web-59bccbbd79-sgfph:~$ date; ls  /proc/135229/fd | wc -l
Sat Sep 23 07:52:51 UTC 2023
160
</code></pre></div></div>

<p>I could see that the mold process was creating file descriptors at a rate of roughly one per second.</p>

<p>So I snapshotted the result of <code class="language-plaintext highlighter-rouge">ls -lh /proc/&lt;pid&gt;/fd</code> twice a few seconds apart, and used <code class="language-plaintext highlighter-rouge">diff</code> to see
which ones were new:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ diff tmp/fds-1.txt tmp/fds-2.txt 
130a131,135
&gt; lrwx------ 1 64 Sep 23 07:54 215 -&gt; 'socket:[10443548]'
&gt; lrwx------ 1 64 Sep 23 07:54 216 -&gt; 'socket:[10443561]'
&gt; lrwx------ 1 64 Sep 23 07:54 217 -&gt; 'socket:[10443568]'
&gt; lrwx------ 1 64 Sep 23 07:54 218 -&gt; 'socket:[10443577]'
&gt; lrwx------ 1 64 Sep 23 07:54 219 -&gt; 'socket:[10443605]'
&gt; lrwx------ 1 64 Sep 23 07:54 220 -&gt; 'socket:[10465514]'
&gt; lrwx------ 1 64 Sep 23 07:54 221 -&gt; 'socket:[10443625]'
&gt; lrwx------ 1 64 Sep 23 07:54 222 -&gt; 'socket:[10443637]'
&gt; lrwx------ 1 64 Sep 23 07:54 223 -&gt; 'socket:[10477738]'
&gt; lrwx------ 1 64 Sep 23 07:54 224 -&gt; 'socket:[10477759]'
&gt; lrwx------ 1 64 Sep 23 07:54 225 -&gt; 'socket:[10477764]'
&gt; lrwx------ 1 64 Sep 23 07:54 226 -&gt; 'socket:[10445634]'
...
</code></pre></div></div>
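
<p>The same snapshot-and-compare step can be sketched in Ruby. This is only an illustration, not what was run during the investigation: <code class="language-plaintext highlighter-rouge">list_fds</code> and <code class="language-plaintext highlighter-rouge">new_fds</code> are hypothetical helper names, and <code class="language-plaintext highlighter-rouge">list_fds</code> assumes a Linux <code class="language-plaintext highlighter-rouge">/proc</code> filesystem.</p>

```ruby
# Hypothetical helper: map each file descriptor number of a process to its
# target (e.g. "socket:[10443548]"). Assumes Linux and a readable /proc.
def list_fds(pid)
  Dir.children("/proc/#{pid}/fd").to_h do |fd|
    target = begin
      File.readlink("/proc/#{pid}/fd/#{fd}")
    rescue SystemCallError
      nil # the descriptor was closed between listing and readlink
    end
    [fd.to_i, target]
  end
end

# Hypothetical helper: descriptors present in the second snapshot only.
def new_fds(before, after)
  after.reject { |fd, _target| before.key?(fd) }
end
```

<p>Taking two <code class="language-plaintext highlighter-rouge">list_fds</code> snapshots a few seconds apart and passing them to <code class="language-plaintext highlighter-rouge">new_fds</code> would give roughly the same information as the <code class="language-plaintext highlighter-rouge">diff</code> output above.</p>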

<p>These file descriptors were sockets. I went on and took a heap dump using <code class="language-plaintext highlighter-rouge">rbtrace</code>,
to see what the leak looked like from Ruby’s point of view:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
5130070:{"address":"0x7f5d11bfff48", "type":"FILE", "class":"0x7f5d8bc9eec0", "fd":11, "memsize":248}
7857847:{"address":"0x7f5cd9950668", "type":"FILE", "class":"0x7f5d8bc9eec0", "fd":-1, "memsize":8440}
7857868:{"address":"0x7f5cd99511d0", "type":"FILE", "class":"0x7f5d81597280", "fd":4855, "memsize":248}
7857933:{"address":"0x7f5cd9951fb8", "type":"FILE", "class":"0x7f5d8bc9eec0", "fd":-1, "memsize":8440}
7857953:{"address":"0x7f5cd99523c8", "type":"FILE", "class":"0x7f5d81597280", "fd":4854, "memsize":248}
7858016:{"address":"0x7f5cd9952fd0", "type":"FILE", "class":"0x7f5d8bc9eec0", "fd":-1, "memsize":8440}
7858036:{"address":"0x7f5cd9953390", "type":"FILE", "class":"0x7f5d81597280", "fd":4853, "memsize":248}
...
</code></pre></div></div>

<p>Here <code class="language-plaintext highlighter-rouge">"type":"FILE"</code> corresponds to Ruby’s <code class="language-plaintext highlighter-rouge">T_FILE</code> base type, which encompasses all <code class="language-plaintext highlighter-rouge">IO</code> objects.
I then used <a href="https://github.com/csfrancis/harb"><code class="language-plaintext highlighter-rouge">harb</code></a><sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>, to get some more context on these IO objects
and quickly got my answer:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>harb&gt; print 0x7f5cd9950668
    0x7f5cd9950668: "FILE"
           memsize: 8,440
  retained memsize: 8,440
     references to: [
                      0x7f5cc9c59158 (FILE: (null))
                      0x7f5cd71d8540 (STRING: "/tmp/raindrop_monitor_84")
                      0x7f5cc9c590e0 (DATA: mutex)
                    ]
   referenced from: [
                      0x7f5cc9c59158 (FILE: (null))
                    ]
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">/tmp/raindrop_monitor</code> path hinted at one of our utility threads, which used to run in the Unicorn master process
and that I had moved into the Pitchfork mold process.</p>

<p>It uses the <code class="language-plaintext highlighter-rouge">raindrops</code> gem to connect to the server port and extract TCP statistics to estimate how many requests
are queued, hence producing a utilization metric of the application server.</p>

<p>Basically, it executes the following code in a loop, and makes the result accessible to all workers:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Raindrops</span><span class="o">::</span><span class="no">Linux</span><span class="p">.</span><span class="nf">tcp_listener_stats</span><span class="p">(</span><span class="s2">"localhost:$PORT"</span><span class="p">)</span>
</code></pre></div></div>

<p>The problem here is that <code class="language-plaintext highlighter-rouge">tcp_listener_stats</code> opens a socket to get the TCP stats, but doesn’t close the socket, nor even return it to you. It leaves to the Ruby GC the responsibility of closing the file descriptor.</p>

<p>Normally, this isn’t a big deal, because GC should trigger somewhat frequently. But the Pitchfork mold process, or
even the Unicorn master process, doesn’t do all that much work, hence allocates rarely. As a result, GC may only very
rarely trigger, if at all, letting these objects, and hence their file descriptors, accumulate over time.</p>

<p>Then once a new worker had to be spawned, it would inherit all these file descriptors, and have to close them all,
causing a lot of work for the kernel. That perfectly explained the observed issue and also explained why it would get
worse over time. The reforking frequency wasn’t fixed: it was configured to be relatively frequent at first,
and then less and less so, leaving increasingly more time for file descriptors to accumulate.</p>

<p>To fix that problem, <a href="https://yhbt.net/raindrops-public/6E0E349D-A7CE-4B88-8F89-66438BB775A1@gmail.com/T/#u">I submitted a patch to Raindrops</a>,
to make it eagerly close these sockets, and applied the patch immediately on our systems, and the problem was gone.</p>
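
<p>The actual fix lives in Raindrops itself, but the general idea of not leaving a file descriptor for the GC to close can be sketched in plain Ruby. <code class="language-plaintext highlighter-rouge">with_short_lived_socket</code> is a hypothetical helper made up for this illustration; it is not part of Raindrops.</p>

```ruby
require "socket"

# Hypothetical helper, not part of Raindrops: yields a short-lived socket and
# closes it as soon as the block returns, instead of waiting for the GC.
def with_short_lived_socket
  socket = Socket.new(:INET, :STREAM)
  yield socket
ensure
  socket&.close
end

# The descriptor number is still known, but the descriptor itself is closed.
fd = with_short_lived_socket { |socket| socket.fileno }
```

<p>With this shape, a process that rarely allocates, like the mold, no longer depends on GC frequency to release its sockets.</p>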

<p>What I find interesting here, is that in a way this bug was predating the Pitchfork migration.
Sockets were already accumulating in Unicorn’s master process, it just had not enough of an impact there for us to notice.</p>

<p>This wasn’t the only issue found in production, but it was the most impactful and is a good illustration of how
reforking can go wrong.</p>

<h2 id="tuning-reforking-frequency">Tuning Reforking Frequency</h2>

<p>Concurrently to ironing out reforking bugs, I spent a lot of time deploying various reforking settings, as it’s
a bit of a balancing act.</p>

<p>Reforking and Copy-on-Write aren’t free. It sounds a bit magical when described, but this is a lot of work for the
kernel.</p>

<p>Forking a process with which you share memory isn’t terribly costly, but after that, whenever a shared page has to be
invalidated because either the child or the parent has mutated it, the kernel has to pause the process and copy the
page over. So after you trigger a refork, you can expect some negative impact on the process latency, at least for
a little while.</p>

<p>That’s why it can be hard to find the sweet spot. If you refork too often you’ll degrade the service latency,
if you refork too infrequently, you’re not going to save as much memory.</p>

<p>For this sort of configuration, with lots of variables, I just tend to deploy multiple configurations concurrently,
and graph the results to try to locate the sweet spot, which is exactly what I did here.</p>

<p>Ultimately I settled on a setting with fairly linear growth:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">PITCHFORK_REFORK_AFTER</span><span class="o">=</span><span class="s2">"500,750,1000,1200,1400,1800,2000,2200,2400,2600,2800,...
</span></code></pre></div></div>

<p>The idea is that young containers are likely triggering various lazy initializations at a relatively fast rate,
but that over time, as more and more of these have been warmed, invalidations become less frequent.</p>
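
<p>As a rough sketch of how such a comma-separated schedule could be turned into a list of thresholds, assuming it is read from an environment variable: <code class="language-plaintext highlighter-rouge">parse_refork_schedule</code> is a hypothetical helper, and the value used here is a shortened example rather than the full schedule.</p>

```ruby
# Hypothetical helper: turn a comma-separated schedule such as
# "500,750,1000" into a list of request-count thresholds.
def parse_refork_schedule(value)
  value.to_s.split(",").map { |threshold| Integer(threshold.strip, 10) }
end

# Shortened example value, the real schedule is longer.
parse_refork_schedule("500,750,1000") # => [500, 750, 1000]
```

<p>The resulting array is the kind of value that, as far as I know, Pitchfork’s <code class="language-plaintext highlighter-rouge">refork_after</code> configuration setting accepts, with each entry applying to one generation.</p>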

<p>Back <a href="https://railsatscale.com/2023-10-23-pitchfork-impact-on-shopify-monolith/">in 2023 I wrote a post that shared quite a few details on the results of reforking on Shopify’s monolith</a>,
you can read it if you want more details, but in short, memory usage was reduced by <code class="language-plaintext highlighter-rouge">30%</code>, and latency by <code class="language-plaintext highlighter-rouge">9%</code>.</p>

<p>The memory usage reduction was largely expected, but the latency reduction was a bit of a nice surprise at first.
If anything, I was hoping latency wouldn’t be degraded too much.</p>

<p>I had to investigate to understand how it was even possible.</p>

<h2 id="the-unicorn-bias">The Unicorn Bias</h2>

<p>One thing to know about how Unicorn and Pitchfork work is that, on Linux, they wait for incoming requests using the <code class="language-plaintext highlighter-rouge">epoll</code> system call.
Once a request comes in, the worker is woken up by the kernel and immediately calls <code class="language-plaintext highlighter-rouge">accept</code> to, well, accept the request.
This is a very classic pattern, that many servers use, but historically it suffered from a problem called the
<a href="https://en.wikipedia.org/wiki/Thundering_herd_problem">“thundering herd problem”</a>.</p>

<p>Assuming a fully idle server with 32 workers, all waiting on <code class="language-plaintext highlighter-rouge">epoll</code>, whenever a request would come in,
all 32 workers would be woken up, and all try to call <code class="language-plaintext highlighter-rouge">accept</code>, but only one of them would succeed.
This was a pretty big waste of resources, so in 2016, with the release of Linux 4.5, <code class="language-plaintext highlighter-rouge">epoll</code> gained a new flag: <code class="language-plaintext highlighter-rouge">EPOLLEXCLUSIVE</code>.</p>

<p>If this flag is set, the Linux kernel will only wake up a single worker when a request comes in.
However the feature doesn’t try to be fair or anything, it just wakes up the first it finds, and because of how the
feature is implemented, it behaves a bit like a Last In First Out queue, in other words, a stack.</p>

<p>As a result, unless most workers are busy most of the time, what you’ll observe is that some workers will serve
disproportionately more requests than others. In some cases, I witnessed that <code class="language-plaintext highlighter-rouge">worker 0</code> had processed over a thousand
requests while <code class="language-plaintext highlighter-rouge">worker 47</code> had only seen a dozen requests.</p>
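
<p>This bias can be illustrated with a deliberately simplified model, which is not the kernel’s actual wake-up logic: assume idle workers sit on a stack, each burst of concurrent requests is served by the workers at the top, and they all finish before the next burst arrives. <code class="language-plaintext highlighter-rouge">simulate_lifo</code> and the burst sizes are made up for the illustration.</p>

```ruby
# Toy model of LIFO wake-ups: a burst of N concurrent requests is always
# served by the N workers at the top of the stack (workers 0 to N-1 here),
# and all of them are idle again before the next burst arrives.
def simulate_lifo(worker_count, bursts)
  served = Array.new(worker_count, 0)
  bursts.each do |burst_size|
    [burst_size, worker_count].min.times { |worker| served[worker] += 1 }
  end
  served
end

simulate_lifo(4, [1, 1, 2, 1, 3]) # => [5, 2, 1, 0]
```

<p>Even in this toy version, the first worker takes part in every burst while the last one never sees a request unless all the others are busy.</p>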

<p>Unicorn isn’t the only server impacted by that, <a href="https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/">Cloudflare engineers wrote a much more detailed post on how NGINX behaves
the same way</a>.</p>

<p>In Ruby’s case, this imbalance means that all these inline caches in the VM, all the lazy initialized code in the application,
as well as YJIT, are much more warmed up in some workers than in others.</p>

<h2 id="how-reforking-can-reduce-latency">How Reforking Can Reduce Latency</h2>

<p>Because of all these caches, JIT, etc, a “cold” worker is measurably slower than a warmed-up one,
and because of the balancing bias, workers are very unevenly warmed up.</p>

<p>However, since the criterion for promoting a worker into the new mold is the number of requests it has handled,
it’s almost always the most warmed-up worker that ends up being used as a template for the next generation of workers.</p>

<p>As a result, with reforking enabled, workers are much more warmed up on average, hence running faster.
In my initial post about Pitchfork, I illustrated this by showing how much more JITed code workers had in containers
where reforking was enabled compared to the ones without:</p>

<p><img src="/assets/articles/pitchfork/yjit-code-region-size.png" alt="" /></p>

<p>And more JITed code translates into faster execution and less time spent compiling hot methods.</p>

<h2 id="the-actual-killer-feature">The Actual Killer Feature</h2>

<p>As explained previously, the motivator for working on Pitchfork was reducing memory usage.
Especially with the advent of YJIT, we were hitting some limits, and I wanted to solve that once and for all.
But in reality, it would have been much less effort to just ask for more RAM on servers.
RAM is quite cheap these days, and most hosting services will give you about 4GiB of RAM per core, which even for
Ruby is plenty.</p>

<p>It’s only when working with very large monoliths that this becomes a bit tight. But even then, we could have
relatively easily used servers with more RAM per core, and while it would have incurred extra cost, it probably wouldn’t
have been too bad in the grand scheme of things.</p>

<p>It’s only after reforking fully shipped to production, that I started to understand its real benefits.
Beyond the memory savings, the way the warmest worker is essentially “checkpointed” and used as a template means
that whenever a small spike of traffic comes in, and workers that are normally mostly idle respond to that traffic,
they do it noticeably faster than they used to.</p>

<p>In addition, when we were running Unicorn, we were keeping a close eye on worker terminations caused by request
timeouts or OOM, because killing a Unicorn worker meant replacing a warm worker with a cold worker, hence it had a
noticeable performance impact.</p>

<p>But since reforking was enabled, not only does this happen less often because OOM events are less common,
but also the killed worker is now replaced with a fairly well-warmed-up one, with already a lot of JITed code and such.</p>

<p>And I now believe this is the true killer feature of Pitchfork, before the memory usage reduction.</p>

<p>This realization of how powerful checkpointing is, later led me to further optimize the monolith.</p>

<h2 id="pushing-it-further">Pushing It Further</h2>

<p>YJIT has this nice characteristic that it warms up quite fast and for relatively cheap.
By that, I mean that it reaches its peak performance quickly, and doesn’t slow down normal Ruby execution too much
while doing so.</p>

<p>However last summer, when I started testing Ruby 3.4.0-preview1 in production, I discovered a pretty major regression
in YJIT compile time. The compiled code was still as fast if not faster, but YJIT was suddenly requiring 4 times as much
CPU to do its compilation, which was causing large spikes of CPU utilization on our servers, negatively impacting the
overall latency.</p>

<p>What happened is that the YJIT team had recently rewritten the register allocator to be smarter, but it also ended up
being noticeably slower. This is a common tradeoff in JIT design, if you complexify the compiler, it may generate faster
code, but degrade performance more while it is compiling.</p>

<p>I of course reported the issue to the YJIT team, but it was clear that this performance would not be reclaimed quickly, so
it was complicated to keep the Ruby preview in production with such regression in it.</p>

<p>Until it hit me: why are we even bothering to compile this much?</p>

<p>If you think about it, we were deploying Pitchfork with 36 workers, and all 36 of them have YJIT enabled, so all of
them compile new code when they discover new hot methods. So most methods, especially the hottest ones, are compiled 36 times.</p>

<p>But once one worker has served the 500 requests required to be promoted, all the code compiled by other workers is just
thrown out of the window, it’s a huge waste.</p>

<p>Which gave me the idea, what if we only enabled YJIT in the <code class="language-plaintext highlighter-rouge">worker 0</code>? Thanks to the balancing bias induced by
<code class="language-plaintext highlighter-rouge">EPOLLEXCLUSIVE</code>, we already know it will most likely be the one to be promoted, and for the others, we can just
mark them as not fork-safe.</p>

<p>This is quite trivially done from the Pitchfork config:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">after_worker_fork</span> <span class="k">do</span> <span class="o">|</span><span class="n">server</span><span class="p">,</span> <span class="n">worker</span><span class="o">|</span>
  <span class="k">if</span> <span class="n">worker</span><span class="p">.</span><span class="nf">nr</span> <span class="o">==</span> <span class="mi">0</span>
    <span class="no">RubyVM</span><span class="o">::</span><span class="no">YJIT</span><span class="p">.</span><span class="nf">enable</span>
  <span class="k">else</span>
    <span class="o">::</span><span class="no">Pitchfork</span><span class="o">::</span><span class="no">Info</span><span class="p">.</span><span class="nf">no_longer_fork_safe!</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Of course, once the first generation is promoted, YJIT is then enabled in all workers, but this helped tremendously
to reduce the YJIT overhead soon after a deploy.</p>

<p>Here’s a graph that shows the distribution of system time around deploys. YJIT tends to make the system time spike
when warming up, because it calls <code class="language-plaintext highlighter-rouge">mprotect</code> frequently to mark pages as either executable or writable.
This causes quite a lot of load on the kernel.</p>

<p>The first spike is a deploy before I enabled this configuration, on the second spike the yellow line has the
configuration enabled, while the green one still doesn’t have it.</p>

<p><img src="/assets/articles/pitchfork/yjit_delayed_enable.png" alt="" /></p>

<p>While there is currently no way to turn YJIT back off once it has been enabled, we did experiment with such a feature
for other reasons a few years ago. So there may be a case for bringing that feature back, as it would allow us to
keep YJIT compilation disabled in all workers but one, further reducing the overhead caused by YJIT’s warmup.</p>

<p>There are also a few other advanced optimizations that aren’t exclusive to Pitchfork but are facilitated by it, such as
<a href="https://railsatscale.com/2024-10-23-next-generation-oob-gc/">Out of Band Garbage Collection</a>, but I can’t mention everything.</p>

<h2 id="beyond-shopify">Beyond Shopify</h2>

<p>I never really intended Pitchfork to be more than a very opinionated fork of Unicorn, for very specific needs.
I even wrote <a href="https://github.com/Shopify/pitchfork/blob/b70ee3c8700a997ee9513c81709b91062cc79ca1/docs/WHY_MIGRATE.md">a long document essentially explaining why you probably don’t want to migrate to Pitchfork</a>.</p>

<p>But based on issues open on the repo, some conference chatter, and a few DMs I got, it seems that a handful of companies
either migrated to it or are currently working on doing so.</p>

<p>Unsurprisingly, these are mostly companies that used to run Unicorn and have relatively large monoliths.</p>

<p>However, <a href="https://blog.studysapuri.jp/entry/2024-pitchfork-into-the-largest-rails-application-in-studysapuri">the only public article about such migration I know of is in Japanese</a>.</p>

<p>But it’s probably for the better, because while reforking is very powerful, as I tried to demonstrate in this post,
fork-safety issues can lead to pretty catastrophic bugs that can be very hard to debug, hence it’s probably better left
to teams with the resources and expertise needed to handle that sort of thing.</p>

<p>So I prefer to avoid any sort of Pitchfork hype.</p>

<p>That being said, I’ve also noticed some people simply interested in a modernized Unicorn, not intending to ever enable
reforking, which I guess is a good enough reason to migrate.</p>

<h2 id="the-future-of-pitchfork">The Future Of Pitchfork</h2>

<p>At this point, after seeing all the performance improvements I mentioned, you may be thinking that Shopify must be pretty
happy with its brand-new application server.</p>

<p>Well.</p>

<p>While Pitchfork was well received by my immediate team, my manager, my director, and many of my peers, the feedback I got
from upper management wasn’t exactly as positive:</p>

<blockquote>
  <p>reforking is a hack that I think is borderline abdication of engineering responsibilities, so this won’t do</p>
</blockquote>

<p>Brushing aside the offensiveness of the phrasing, it may surprise you to hear that I do happen to, at least partially,
agree with this statement.</p>

<p>This is why before writing this post, I wrote a whole series on <a href="/ruby/performance/2025/01/23/the-mythical-io-bound-rails-app.html">how IO-bound Rails applications really are</a>,
<a href="/ruby/performance/2025/02/27/whats-the-deal-with-ractors.html">the current state of parallelism in Ruby</a> and a few other adjacent subjects.
To better explain the tradeoffs currently at play when designing a Ruby web server.</p>

<p>I truly believe that <strong>today</strong>, Pitchfork’s design is what best answers the needs of a large Rails monolith,
I wouldn’t have developed it otherwise.
It offers true parallelism and faster JIT warmup, absurdly little time spent in GC, while keeping memory usage low and
does so with a decent level of resiliency.</p>

<p>That being said, I also truly hope that <strong>tomorrow</strong>, Pitchfork’s design will be obsolete.</p>

<p>I do hope that in the future Ruby will be capable of true parallelism in a single process, be it via improved Ractors,
or by progressively <a href="/ruby/performance/2025/01/29/so-you-want-to-remove-the-gvl.html">removing the GVL</a>, I’m not picky.</p>

<p>But this is a hypothetical future. The very second it happens, I’ll happily work on Pitchfork’s successor, and slap a deprecation
notice on Pitchfork.</p>

<p>That being said, I know I’m rarely the most optimistic person in the room, it’s in my nature, but I honestly can’t
see this future happening in the short term. Maybe in 2 or 3 years, certainly not before.</p>

<p>Because it’s not just about Ruby itself, it’s also about the ecosystem. Even if Ractors were perfectly usable tomorrow
morning, tons of gems would need to be adapted to work in a Ractor world. This would be the mother of all yak-shaves.</p>

<p>Trust me, I’ve done my fair share of yak-shaves in the past. When Ruby 2.7 started throwing keyword deprecation warnings
I took it upon myself to fix all these issues in Shopify’s monolith and all its dependencies, which led me to open over a hundred pull requests on open-source gems, trying to reach maintainers, etc.
And again recently with frozen string literal, I submitted tons of PRs to fix lots of gems ahead of Ruby 3.4’s release.</p>

<p>All this to say, I’m not scared of yak-shaves, but making an application like Shopify’s monolith, including its dependencies,
Ractor compatible requires an amount of work that is largely beyond what you may imagine.
And more than work, an ecosystem like Ruby’s needs time to adapt to new features like Ractors.
It’s not just a matter of throwing more engineers at the problem.</p>

<p>In the meantime, reforking may or may not be a hack, I don’t really care.
What is important to me is that it solves some real problems, and it does so today.</p>

<p>Of course, it’s not perfect, there are several common complaints it doesn’t solve, such as still requiring more
database connections than what would be possible with in-process parallelism.
But I don’t believe it’s a problem that can be reasonably solved today with a different server design that doesn’t mostly
rely on <code class="language-plaintext highlighter-rouge">fork</code>, and trying to do so now would be putting the cart before the horse.</p>

<p>An engineer’s responsibility is to solve problems while considering the limitations imposed by practicality.</p>

<p>As such, I believe Pitchfork will continue to do fine for at least a few more years.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Years later, <a href="https://github.com/ruby/ruby/pull/10899">John Hawthorn figured out how to do it with <code class="language-plaintext highlighter-rouge">perf</code> to great effect</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>Since I explained what inline caches are multiple times in the past, I’ll just refer you to <a href="/ruby/json/2024/12/18/optimizing-ruby-json-part-2.html#inline-caches">Optimizing JSON, Part 2</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>Today I’d probably recommend <a href="https://github.com/jhawthorn/sheap"><code class="language-plaintext highlighter-rouge">sheap</code></a> for the same use case. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ruby" /><category term="performance" /><summary type="html"><![CDATA[A bit more than two years ago, as part of my work in Shopify’s Ruby and Rails Infrastructure team, I released a new Ruby HTTP server called Pitchfork.]]></summary></entry><entry><title type="html">What’s The Deal With Ractors?</title><link href="https://byroot.github.io/ruby/performance/2025/02/27/whats-the-deal-with-ractors.html" rel="alternate" type="text/html" title="What’s The Deal With Ractors?" /><published>2025-02-27T08:03:51+00:00</published><updated>2025-02-27T08:03:51+00:00</updated><id>https://byroot.github.io/ruby/performance/2025/02/27/whats-the-deal-with-ractors</id><content type="html" xml:base="https://byroot.github.io/ruby/performance/2025/02/27/whats-the-deal-with-ractors.html"><![CDATA[<p>I want to write a post about <a href="https://rubygems.org/gems/pitchfork">Pitchfork</a>, explaining where it comes from, why it
is like it is, and how I see its future.
But before I can get to that, I think I need to share my mental model on a few things, in this case, Ractors.</p>

<p>When Ractors were announced 4 or 5 years ago, many people expected we’d quickly see a Ractor-based web server,
some sort of Puma but with Ractors instead of threads.
Yet this still hasn’t happened, except for a few toy projects and experiments.</p>

<p>Since this post series is about giving context to the design constraints of Ruby HTTP servers, I think it makes sense to share
my view on Ractors’ viability.</p>

<h2 id="what-are-they-supposed-to-be">What Are They Supposed to Be?</h2>

<p>The core idea of Ractors is relatively simple, the goal is to provide a primitive that allows true in-process parallelism,
while still not fully <a href="/ruby/performance/2025/01/29/so-you-want-to-remove-the-gvl.html">removing the GVL</a>.</p>

<p>As I mentioned in depth in a previous post, operating without a GVL would require synchronization (mutexes) on every
mutable object that is shared between threads.
Ractors’ solution to that problem is not to allow sharing of mutable objects between Ractors.
Instead, they can send each other copies of objects, or in some cases “move” an object to another Ractor, which means they
can no longer access it themselves.</p>
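
<p>What counts as safe to share can be checked without even spawning a Ractor, using <code class="language-plaintext highlighter-rouge">Ractor.shareable?</code> and <code class="language-plaintext highlighter-rouge">Ractor.make_shareable</code>. The variable names below are only for illustration.</p>

```ruby
mutable = +"hello"            # unfrozen String
frozen  = "hello".freeze      # frozen String
nested  = [+"a", +"b"].freeze # frozen Array holding mutable Strings

Ractor.shareable?(mutable) # => false
Ractor.shareable?(frozen)  # => true
Ractor.shareable?(nested)  # => false, the elements are still mutable

# Deep-freezes the Array and everything it references.
Ractor.make_shareable(nested)
Ractor.shareable?(nested)  # => true
```

<p>Anything that isn’t shareable in this sense has to be copied or moved when it is sent to another Ractor.</p>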

<p>This isn’t unique to Ruby, it’s largely inspired by the <a href="https://en.wikipedia.org/wiki/Actor_model">Actor model</a>, like
the Ractor name suggests, and many languages in the same category as Ruby have a similar construct or are working on one.
For instance, JavaScript has <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API">Web Workers</a>,
and Python has been working on <a href="https://peps.python.org/pep-0734/">subinterpreters</a> for a while.</p>

<p>And it’s no surprise because it makes total sense from a language evolution perspective.
If you have a language that has prevented in-process parallelism for a long time, a Ractor-like API allows you to introduce (constrained) parallelism in a way that isn’t going to break existing code, without having to add mutexes everywhere.</p>

<p>But even in languages that have free threading, shared mutable state parallelism is seen as a major foot gun by many,
and message-passing parallelism is often deemed safer, for instance, channels in Go, etc.</p>

<p>Applied to Ruby, this means that instead of having a single Global VM Lock that synchronizes all threads,
you’d instead have many Ractor Locks, that each synchronize all threads that belong to a given Ractor.
So in a way, since the Ruby 3.0 release that introduced Ractors, on paper the GVL is somewhat already gone,
even though as we’ll see later, it’s more subtle than that.</p>

<p>And this can easily be confirmed experimentally with a simple test script:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"benchmark"</span>
<span class="no">Warning</span><span class="p">[</span><span class="ss">:experimental</span><span class="p">]</span> <span class="o">=</span> <span class="kp">false</span>

<span class="k">def</span> <span class="nf">fibonacci</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
  <span class="k">if</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">1</span>
    <span class="n">n</span>
  <span class="k">else</span>
    <span class="n">fibonacci</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">fibonacci</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">synchronous_fib</span><span class="p">(</span><span class="n">concurrency</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span>
  <span class="n">concurrency</span><span class="p">.</span><span class="nf">times</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span>
    <span class="n">fibonacci</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">threaded_fib</span><span class="p">(</span><span class="n">concurrency</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span>
  <span class="n">concurrency</span><span class="p">.</span><span class="nf">times</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span>
    <span class="no">Thread</span><span class="p">.</span><span class="nf">new</span> <span class="p">{</span> <span class="n">fibonacci</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="p">}</span>
  <span class="k">end</span><span class="p">.</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:value</span><span class="p">)</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">ractor_fib</span><span class="p">(</span><span class="n">concurrency</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span>
  <span class="n">concurrency</span><span class="p">.</span><span class="nf">times</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span>
    <span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="p">{</span> <span class="o">|</span><span class="n">num</span><span class="o">|</span> <span class="n">fibonacci</span><span class="p">(</span><span class="n">num</span><span class="p">)</span> <span class="p">}</span>
  <span class="k">end</span><span class="p">.</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:take</span><span class="p">)</span>
<span class="k">end</span>

<span class="nb">p</span> <span class="p">[</span><span class="ss">:sync</span><span class="p">,</span> <span class="no">Benchmark</span><span class="p">.</span><span class="nf">realtime</span> <span class="p">{</span> <span class="n">synchronous_fib</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">38</span><span class="p">)</span> <span class="p">}.</span><span class="nf">round</span><span class="p">(</span><span class="mi">2</span><span class="p">)]</span>
<span class="nb">p</span> <span class="p">[</span><span class="ss">:thread</span><span class="p">,</span> <span class="no">Benchmark</span><span class="p">.</span><span class="nf">realtime</span> <span class="p">{</span> <span class="n">threaded_fib</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">38</span><span class="p">)</span> <span class="p">}.</span><span class="nf">round</span><span class="p">(</span><span class="mi">2</span><span class="p">)]</span>
<span class="nb">p</span> <span class="p">[</span><span class="ss">:ractor</span><span class="p">,</span> <span class="no">Benchmark</span><span class="p">.</span><span class="nf">realtime</span> <span class="p">{</span> <span class="n">ractor_fib</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">38</span><span class="p">)</span> <span class="p">}.</span><span class="nf">round</span><span class="p">(</span><span class="mi">2</span><span class="p">)]</span>
</code></pre></div></div>

<p>Here we use the Fibonacci function as a classic CPU-bound workload and benchmark it in 3 different ways.
First without any concurrency, just serially, then concurrently using 5 threads, and finally concurrently using 5 Ractors.</p>

<p>If I run this script on my machine, I get these results:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="ss">:sync</span><span class="p">,</span> <span class="mf">2.26</span><span class="p">]</span>
<span class="p">[</span><span class="ss">:thread</span><span class="p">,</span> <span class="mf">2.29</span><span class="p">]</span>
<span class="p">[</span><span class="ss">:ractor</span><span class="p">,</span> <span class="mf">0.68</span><span class="p">]</span>
</code></pre></div></div>

<p>As we already knew, using threads for CPU-bound workloads doesn’t make anything faster because of the GVL. With Ractors, however, we do get some parallelism: the same work completes about 3.3 times faster here.
So this script proves that, at least to some extent, the Ruby VM can execute code in parallel, hence the GVL is not so
global anymore.</p>

<p>But as always, the devil is in the details.
Running a pure function like <code class="language-plaintext highlighter-rouge">fibonacci</code>, which only deals with immutable integers, in parallel is one thing; running
a full-on web application, with hundreds of gems and a lot of global state, in parallel is another.</p>

<h2 id="shareable-objects">Shareable Objects</h2>

<p>Where Ruby’s Ractors differ significantly from most similar features in other languages is that they share the global
namespace with other Ractors.</p>

<p>To create a <code class="language-plaintext highlighter-rouge">WebWorker</code> in JavaScript, you have to provide an entry script:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">myWorker</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Worker</span><span class="p">(</span><span class="dl">"</span><span class="s2">worker.js</span><span class="dl">"</span><span class="p">)</span>
</code></pre></div></div>

<p>WebWorkers are created from a blank slate and have their own namespace, they don’t automatically inherit all the constants
defined by the caller.</p>

<p>Similarly, Python’s sub-interpreters, as defined in PEP 734, start with a clean slate.</p>

<p>So both JavaScript’s WebWorker and Python’s sub-interpreters have very limited sharing capabilities and are more akin to light subprocesses, but with an API that allows passing objects to each other without needing to serialize them.</p>

<p>Ruby’s Ractors are more ambitious than that.
From a secondary Ractor, you have visibility on all the constants and methods defined by the main Ractor:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">INT</span> <span class="o">=</span> <span class="mi">1</span>

<span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span>
  <span class="nb">p</span> <span class="no">INT</span> <span class="c1"># prints 1</span>
<span class="k">end</span><span class="p">.</span><span class="nf">take</span>
</code></pre></div></div>

<p>But since Ruby cannot allow concurrent access to mutable objects, it has to limit this in some way:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">HASH</span> <span class="o">=</span> <span class="p">{}</span>

<span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span>
  <span class="nb">p</span> <span class="no">HASH</span> <span class="c1"># Ractor::IsolationError</span>
  <span class="c1"># can not access non-shareable objects in constant Object::HASH by non-main Ractor.</span>
<span class="k">end</span><span class="p">.</span><span class="nf">take</span>
</code></pre></div></div>

<p>So all objects are divided into shareable and unshareable objects, and only shareable ones can be accessed by secondary ractors.
In general, objects that are frozen or inherently immutable are shareable, as long as they don’t reference a non-shareable object.</p>
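
<p>A quick way to build intuition for this rule is to ask Ruby directly with <code class="language-plaintext highlighter-rouge">Ractor.shareable?</code>. The snippet below is only a small sketch, and the sample objects are arbitrary values picked for illustration:</p>

```ruby
# Inherently immutable objects are always shareable.
int_ok    = Ractor.shareable?(1)
symbol_ok = Ractor.shareable?(:sym)

# A mutable String is not shareable, a frozen one is.
mutable_str_ok = Ractor.shareable?(String.new("a"))
frozen_str_ok  = Ractor.shareable?(String.new("a").freeze)

# Freezing a container is not enough if it still references a mutable object.
shallow    = [String.new("a")].freeze
shallow_ok = Ractor.shareable?(shallow)

# Ractor.make_shareable freezes the whole object graph.
deep    = Ractor.make_shareable([String.new("a")])
deep_ok = Ractor.shareable?(deep)

p [int_ok, symbol_ok, mutable_str_ok, frozen_str_ok, shallow_ok, deep_ok]
```

<p>The frozen Array that still holds a mutable String is the case that tends to surprise people: shareability is a property of the whole object graph, not only of the top-level object.</p>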

<p>In addition, some other operations, such as assigning class instance variables, aren’t allowed from any ractor other than
the main one:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span> 
  <span class="k">class</span> <span class="nc">Foo</span>
    <span class="k">class</span> <span class="o">&lt;&lt;</span> <span class="nb">self</span>
      <span class="nb">attr_accessor</span> <span class="ss">:bar</span>
    <span class="k">end</span>
  <span class="k">end</span>
  <span class="no">Foo</span><span class="p">.</span><span class="nf">bar</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># Ractor::IsolationError</span>
  <span class="c1"># can not set instance variables of classes/modules by non-main Ractors</span>
<span class="k">end</span><span class="p">.</span><span class="nf">take</span>
</code></pre></div></div>

<p>So Ractors’ design is a bit of a double-edged sword.
On one hand, by having access to all the loaded constants and methods, you don’t have to load the same code multiple
times, and it’s easier to pass complex objects from one ractor to the other. On the other hand, it means that not all code is
able to run from a secondary ractor.
Actually, a lot of, if not most, existing Ruby code can’t run from a secondary Ractor.
Something as mundane as accessing a constant that is technically mutable, like a String or Hash, will raise an <code class="language-plaintext highlighter-rouge">IsolationError</code>,
even if you never attempted to mutate it.</p>

<p>Something as mundane and idiomatic as having a constant with some defaults is enough to make your code not Ractor-compatible,
e.g.:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Something</span>
  <span class="no">DEFAULTS</span> <span class="o">=</span> <span class="p">{</span> <span class="ss">config: </span><span class="mi">1</span> <span class="p">}</span> <span class="c1"># You'd need to explicitly freeze that Hash.</span>

  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">options</span> <span class="o">=</span> <span class="p">{})</span>
    <span class="vi">@options</span> <span class="o">=</span> <span class="no">DEFAULTS</span><span class="p">.</span><span class="nf">merge</span><span class="p">(</span><span class="n">options</span><span class="p">)</span> <span class="c1"># =&gt; Ractor::IsolationError</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
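
<p>For reference, here is one way the constant could be made shareable. This is only a sketch: the <code class="language-plaintext highlighter-rouge">:tags</code> entry and the <code class="language-plaintext highlighter-rouge">options</code> reader are made up for the example, to show a nested mutable value that a plain <code class="language-plaintext highlighter-rouge">freeze</code> would miss:</p>

```ruby
class Something
  # Ractor.make_shareable deep-freezes the Hash, including the nested Array
  # and its Strings. The :tags entry is hypothetical, added for illustration.
  DEFAULTS = Ractor.make_shareable({ config: 1, tags: ["a", "b"] })

  attr_reader :options

  def initialize(options = {})
    # Hash#merge returns a new, unfrozen Hash, so every instance still gets
    # its own top-level Hash to work with.
    @options = DEFAULTS.merge(options)
  end
end

p Ractor.shareable?(Something::DEFAULTS)
p Something.new(config: 2).options
```

<p>For a flat Hash of Symbols and Integers like the original one, calling <code class="language-plaintext highlighter-rouge">freeze</code> is enough. Ruby 3.0 also added a <code class="language-plaintext highlighter-rouge"># shareable_constant_value: literal</code> magic comment that applies this treatment to the literal constants of a file.</p>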

<p>That’s one of the main reasons why a Ractor-based web server isn’t really practical for anything more than a trivial application.</p>

<p>If you take Rails as an example, there is quite a lot of legitimate global state, such as the routes, the database schema
cache, or the logger. Some of it could probably be frozen to be accessible by secondary ractors, but for things
like the logger, the Active Record connection pool, and various caches, it’s tricky.</p>

<p>To be honest, I’m not even sure how you could implement a Ractor-safe connection pool with the current API, but I may
be missing something. Actually, that’s probably a good illustration of the problem, so let’s try to implement a Ractor-compatible connection pool.</p>

<h2 id="a-ractor-aware-connection-pool">A Ractor Aware Connection Pool</h2>

<p>The first challenge is that you’d need to be able to move connections from one ractor to another, something like:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"trilogy"</span>

<span class="n">db_client</span> <span class="o">=</span> <span class="no">Trilogy</span><span class="p">.</span><span class="nf">new</span>
<span class="n">ractor</span> <span class="o">=</span> <span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="p">{</span> <span class="n">receive</span><span class="p">.</span><span class="nf">query</span><span class="p">(</span><span class="s2">"SELECT 1"</span><span class="p">)</span> <span class="p">}</span>
<span class="n">ractor</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="n">db_client</span><span class="p">,</span> <span class="ss">move: </span><span class="kp">true</span><span class="p">)</span>
<span class="nb">p</span> <span class="n">ractor</span><span class="p">.</span><span class="nf">take</span>
</code></pre></div></div>

<p>If you try that, you’ll get a <code class="language-plaintext highlighter-rouge">can not move Trilogy object. (Ractor::Error)</code>.
This is because, as far as I’m aware, there is no way for classes implemented in C to declare that they can be moved to
another ractor. Even the ones defined in Ruby’s core, like <code class="language-plaintext highlighter-rouge">Time</code>, can’t:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span><span class="p">{}.</span><span class="nf">send</span><span class="p">(</span><span class="no">Time</span><span class="p">.</span><span class="nf">now</span><span class="p">,</span> <span class="ss">move: </span><span class="kp">true</span><span class="p">)</span> <span class="c1"># can not move Time object. (Ractor::Error)</span>
</code></pre></div></div>

<p>The only thing C extensions can do is define that a type can be shared between Ractors once it is frozen, using the
<code class="language-plaintext highlighter-rouge">RUBY_TYPED_FROZEN_SHAREABLE</code> flag, but that wouldn’t make sense for a database connection.</p>

<p>A way around this is to encapsulate that object inside its own Ractor:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"trilogy"</span>

<span class="k">class</span> <span class="nc">RactorConnection</span>
  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="vi">@ractor</span> <span class="o">=</span> <span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span>
      <span class="n">client</span> <span class="o">=</span> <span class="no">Trilogy</span><span class="p">.</span><span class="nf">new</span>
      <span class="k">while</span> <span class="n">args</span> <span class="o">=</span> <span class="no">Ractor</span><span class="p">.</span><span class="nf">receive</span>
        <span class="n">ractor</span><span class="p">,</span> <span class="nb">method</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span> <span class="o">=</span> <span class="n">args</span>
        <span class="n">ractor</span><span class="p">.</span><span class="nf">send</span> <span class="n">client</span><span class="p">.</span><span class="nf">public_send</span><span class="p">(</span><span class="nb">method</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span>
      <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">query</span><span class="p">(</span><span class="n">sql</span><span class="p">)</span>
    <span class="vi">@ractor</span><span class="p">.</span><span class="nf">send</span><span class="p">([</span><span class="no">Ractor</span><span class="p">.</span><span class="nf">current</span><span class="p">,</span> <span class="ss">:query</span><span class="p">,</span> <span class="n">sql</span><span class="p">],</span> <span class="ss">move: </span><span class="kp">true</span><span class="p">)</span>
    <span class="no">Ractor</span><span class="p">.</span><span class="nf">receive</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>When we need to perform an operation on the object, we send a message telling it what to do,
and give it our own ractor so it can send the result back.</p>

<p>It really is a huge hack, and perhaps there is a proper way to do this, but I don’t know of any.</p>

<p>Now that we have a “way” to pass database connections across ractors, we need to implement a pool.
Here again, it is tricky, because by definition a pool is a mutable data structure, hence it can’t
be referenced by multiple ractors.</p>

<p>So we more or less need to use the same hack again:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">RactorConnectionPool</span>
  <span class="k">def</span> <span class="nf">initialize</span>
    <span class="vi">@ractor</span> <span class="o">=</span> <span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span>
      <span class="n">pool</span> <span class="o">=</span> <span class="p">[]</span>
      <span class="k">while</span> <span class="n">args</span> <span class="o">=</span> <span class="no">Ractor</span><span class="p">.</span><span class="nf">receive</span>
        <span class="n">ractor</span><span class="p">,</span> <span class="nb">method</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span> <span class="o">=</span> <span class="n">args</span>
        <span class="k">case</span> <span class="nb">method</span>
        <span class="k">when</span> <span class="ss">:checkout</span>
          <span class="n">ractor</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="n">pool</span><span class="p">.</span><span class="nf">pop</span> <span class="o">||</span> <span class="no">RactorConnection</span><span class="p">.</span><span class="nf">new</span><span class="p">)</span>
        <span class="k">when</span> <span class="ss">:checkin</span>
          <span class="n">pool</span> <span class="o">&lt;&lt;</span> <span class="n">args</span><span class="p">.</span><span class="nf">first</span>
        <span class="k">end</span>
      <span class="k">end</span>
    <span class="k">end</span>
    <span class="nb">freeze</span> <span class="c1"># so we're shareable</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">checkout</span>
    <span class="vi">@ractor</span><span class="p">.</span><span class="nf">send</span><span class="p">([</span><span class="no">Ractor</span><span class="p">.</span><span class="nf">current</span><span class="p">,</span> <span class="ss">:checkout</span><span class="p">],</span> <span class="ss">move: </span><span class="kp">true</span><span class="p">)</span>
    <span class="no">Ractor</span><span class="p">.</span><span class="nf">receive</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">checkin</span><span class="p">(</span><span class="n">connection</span><span class="p">)</span>
    <span class="vi">@ractor</span><span class="p">.</span><span class="nf">send</span><span class="p">([</span><span class="no">Ractor</span><span class="p">.</span><span class="nf">current</span><span class="p">,</span> <span class="ss">:checkin</span><span class="p">,</span> <span class="n">connection</span><span class="p">],</span> <span class="ss">move: </span><span class="kp">true</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="no">CONNECTION_POOL</span> <span class="o">=</span> <span class="no">RactorConnectionPool</span><span class="p">.</span><span class="nf">new</span>

<span class="n">ractor</span> <span class="o">=</span> <span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span>
  <span class="n">db_client</span> <span class="o">=</span> <span class="no">CONNECTION_POOL</span><span class="p">.</span><span class="nf">checkout</span>
  <span class="n">result</span> <span class="o">=</span> <span class="n">db_client</span><span class="p">.</span><span class="nf">query</span><span class="p">(</span><span class="s2">"SELECT 1"</span><span class="p">)</span>
  <span class="no">CONNECTION_POOL</span><span class="p">.</span><span class="nf">checkin</span><span class="p">(</span><span class="n">db_client</span><span class="p">)</span>
  <span class="n">result</span>
<span class="k">end</span>
<span class="nb">p</span> <span class="n">ractor</span><span class="p">.</span><span class="nf">take</span><span class="p">.</span><span class="nf">to_a</span> <span class="c1"># =&gt; [[1]]</span>
</code></pre></div></div>

<p>I’m not going to go further, as this implementation is quite ridiculous, but I think this is enough to make my point.</p>

<p>For Ractors to be viable to run a full-on application in, Ruby would need to provide at least a few basic data structures
that would be shareable across ractors, so that we can implement useful constructs like connection pools.</p>

<p>Perhaps some <code class="language-plaintext highlighter-rouge">Ractor::Queue</code>, maybe even some <code class="language-plaintext highlighter-rouge">Ractor::ConcurrentMap</code>, and more importantly, C extensions
would need to be able to make their types movable.</p>

<h2 id="what-ractors-could-be-useful-for">What Ractors Could Be Useful For</h2>

<p>So while I don’t believe it makes sense to try to run a full application inside Ractors, I still think Ractors could be
very useful even with their current limitations.</p>

<p>For instance, in my previous post about the GVL, I mentioned how some gems do have background threads, one example being
<a href="https://github.com/Shopify/statsd-instrument/blob/6fd8c49d50803bbccfcc11b195f9e334a6e835e9/lib/statsd/instrument/batched_sink.rb#L163"><code class="language-plaintext highlighter-rouge">statsd-instrument</code></a>,
but there are others, like OpenTelemetry.</p>

<p>These gems all follow a similar pattern: they collect information in memory, and periodically serialize it and send it down
the wire. Currently, this is done using a thread, which is sometimes problematic because the serialization part holds
the GVL, hence can slow down the threads that are responding to incoming traffic.</p>

<p>This would be an excellent pattern for Ractors, as they’d be able to do the same thing without holding the main Ractor’s
GVL and it’s mostly fire and forget.</p>
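
<p>To make the pattern concrete, here is a minimal sketch of what such a background flusher could look like. Everything specific in it is an assumption for the sake of the example: the metric names, the statsd-like line format, and the use of <code class="language-plaintext highlighter-rouge">nil</code> as a stop signal. A real implementation would write to a socket instead of accumulating payloads. The <code class="language-plaintext highlighter-rouge">respond_to?</code> check is there because newer Ruby versions replace <code class="language-plaintext highlighter-rouge">Ractor#take</code> with <code class="language-plaintext highlighter-rouge">Ractor#value</code>:</p>

```ruby
# Hypothetical metrics flusher: serialization happens inside the Ractor,
# so it doesn't hold the main Ractor's GVL.
flusher = Ractor.new do
  payloads = []
  # A nil message is our (assumed) shutdown signal.
  while (batch = Ractor.receive)
    # Serialize the batch into statsd-like counter lines.
    payloads << batch.map { |name, count| "#{name}:#{count}|c" }.join("\n")
  end
  payloads
end

# The main Ractor only hands over raw data; the Array is copied on send.
flusher.send([["requests", 12], ["errors", 1]])
flusher.send(nil)

# Retrieve the block's return value on both older and newer Rubies.
result = flusher.respond_to?(:value) ? flusher.value : flusher.take
p result
```

<p>The main Ractor never touches the serialized form; all it does is send plain arrays, which is what makes this kind of workload a good fit.</p>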

<p>I only mean this as an example I know well, I’m sure there are more.
The key point is that while Ractors in their current form can hardly be used as the main execution primitive, they can certainly be used for parallelizing lower-level functions inside libraries.</p>

<p>But unfortunately, in practice, it’s not really a good idea to do that today.</p>

<h2 id="also-there-are-many-implementation-issues">Also There Are Many Implementation Issues</h2>

<p>If you attempt to use Ractors, Ruby will display a warning:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>warning: Ractor is experimental, and the behavior may change in future versions of Ruby!
Also there are many implementation issues.
</code></pre></div></div>

<p>And that’s not an overstatement.
As I’m writing this article, there are 74 open issues about Ractors.
A handful are feature requests or minor things, but a significant share are critical bugs such as
segmentation faults or deadlocks.
As such, one cannot reasonably use Ractors for anything more than small experiments.</p>

<p>Another major reason not to use them, even in the cases that are perfect for them, is that quite often they’re not
really running in parallel as they’re supposed to.</p>

<h2 id="one-more-global-lock">One More Global Lock</h2>

<p>As mentioned previously, on paper, the true Global VM Lock is supposedly gone since the introduction of Ractors in Ruby 3.0
and instead, each ractor has its own “GVL”. But this isn’t actually true.</p>

<p>There are still a significant number of routines in the Ruby virtual machine that do lock all Ractors.
Let me show you an example.</p>

<p>Imagine you have 5 million small JSON documents to parse:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># frozen_string_literal: true</span>
<span class="nb">require</span> <span class="s1">'json'</span>

<span class="n">document</span> <span class="o">=</span> <span class="o">&lt;&lt;~</span><span class="no">JSON</span><span class="sh">
  {"a": 1, "b": 2, "c": 3, "d": 4}
</span><span class="no">JSON</span>

<span class="mi">5_000_000</span><span class="p">.</span><span class="nf">times</span> <span class="k">do</span>
  <span class="no">JSON</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">document</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Doing so serially takes about 1.3 seconds on my machine:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">time </span>ruby <span class="nt">--yjit</span> /tmp/j.rb

real	0m1.292s
user	0m1.251s
sys	0m0.018s
</code></pre></div></div>

<p>As unrealistic as this script may look, it should be a perfect use case for Ractor. In theory, we could spawn
5 Ractors, have each of them parse 1 million documents, and be done in 1/5th of the time:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># frozen_string_literal: true</span>
<span class="nb">require</span> <span class="s1">'json'</span>

<span class="no">DOCUMENT</span> <span class="o">=</span> <span class="o">&lt;&lt;~</span><span class="no">JSON</span><span class="sh">
  {"a": 1, "b": 2, "c": 3, "d": 4}
</span><span class="no">JSON</span>

<span class="n">ractors</span> <span class="o">=</span> <span class="mi">5</span><span class="p">.</span><span class="nf">times</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span>
  <span class="no">Ractor</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span>
    <span class="mi">1_000_000</span><span class="p">.</span><span class="nf">times</span> <span class="k">do</span>
      <span class="no">JSON</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="no">DOCUMENT</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
<span class="n">ractors</span><span class="p">.</span><span class="nf">each</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:take</span><span class="p">)</span>
</code></pre></div></div>

<p>But somehow, it’s over twice as slow as doing it serially:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/tmp/jr.rb:9: warning: Ractor is experimental, and the behavior may change <span class="k">in </span>future versions of Ruby! Also there are many implementation issues.

real	0m3.191s
user	0m3.055s
sys	0m6.755s
</code></pre></div></div>

<p>What’s happening is that in this particular example, JSON has to acquire the true remaining VM lock for each key in
the JSON document.
With 4 keys, a million times, it means each Ractor has to acquire and release a lock 4 million times.
It’s almost surprising it only takes 3 seconds to do so.</p>

<p>For the keys, it needs to acquire that VM-wide lock because it inserts string keys into a Hash, and as I explained in
<a href="/ruby/json/2025/01/12/optimizing-ruby-json-part-6.html">Optimizing Ruby’s JSON, Part 6</a>, when you do that Ruby will
look inside the interned string table to search for an equivalent string that is already interned.</p>

<p>I used the following Ruby pseudo-code to explain how it works:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Hash</span>
  <span class="k">def</span> <span class="nf">[]=</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">entry</span> <span class="o">=</span> <span class="n">find_entry</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
      <span class="n">entry</span><span class="p">.</span><span class="nf">value</span> <span class="o">=</span> <span class="n">value</span>
    <span class="k">else</span>
      <span class="k">if</span> <span class="n">key</span><span class="p">.</span><span class="nf">is_a?</span><span class="p">(</span><span class="no">String</span><span class="p">)</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">key</span><span class="p">.</span><span class="nf">interned?</span>
        <span class="k">if</span> <span class="n">interned_str</span> <span class="o">=</span> <span class="o">::</span><span class="no">RubyVM</span><span class="o">::</span><span class="no">INTERNED_STRING_TABLE</span><span class="p">[</span><span class="n">key</span><span class="p">]</span>
          <span class="n">key</span> <span class="o">=</span> <span class="n">interned_str</span>
        <span class="k">elsif</span> <span class="o">!</span><span class="n">key</span><span class="p">.</span><span class="nf">frozen?</span>
          <span class="n">key</span> <span class="o">=</span> <span class="n">key</span><span class="p">.</span><span class="nf">dup</span><span class="p">.</span><span class="nf">freeze</span>
        <span class="k">end</span>
      <span class="k">end</span>

      <span class="nb">self</span> <span class="o">&lt;&lt;</span> <span class="no">Entry</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>In the above example <code class="language-plaintext highlighter-rouge">::RubyVM::INTERNED_STRING_TABLE</code> is a regular hash that could cause a crash if it were accessed
concurrently, so Ruby still acquires the VM-wide lock to look it up.</p>
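
<p>The visible part of that behavior can be observed from plain Ruby. The following is a small sketch, not a look at the actual internals: it only shows that a Hash stores a frozen copy of a mutable String key, and uses <code class="language-plaintext highlighter-rouge">String#-@</code> to show what interning means in practice:</p>

```ruby
key = String.new("a") # a mutable, non-interned String

hash = {}
hash[key] = 1
stored_key = hash.keys.first

stored_is_frozen = stored_key.frozen?     # the Hash stored a frozen String
stored_is_same   = stored_key.equal?(key) # it is not the object we passed in
key_is_frozen    = key.frozen?            # our own String is left untouched

# String#-@ returns the interned version of a String, so two separately
# allocated Strings with the same content resolve to the same object.
first    = -String.new("a")
second   = -String.new("a")
interned = first.equal?(second)

p [stored_is_frozen, stored_is_same, key_is_frozen, interned]
```

<p>That last lookup is the one that has to consult the shared table, which is why it needs some form of synchronization as soon as several Ractors can run at once.</p>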

<p>If you look at <a href="https://github.com/ruby/ruby/blob/d4b8da66ca9533782d2fed9762783c3e560f2998/string.c#L538-L570"><code class="language-plaintext highlighter-rouge">register_fstring</code> in <code class="language-plaintext highlighter-rouge">string.c</code></a>
(<code class="language-plaintext highlighter-rouge">fstring</code> is the internal name for interned strings), you can see the very obvious <code class="language-plaintext highlighter-rouge">RB_VM_LOCK_ENTER()</code> and
<code class="language-plaintext highlighter-rouge">RB_VM_LOCK_LEAVE()</code> calls.</p>

<p>As I’m writing this, there are 42 remaining calls to <code class="language-plaintext highlighter-rouge">RB_VM_LOCK_ENTER()</code> in the Ruby VM. Many are very rarely hit and not
much of a problem, but this one demonstrates how, even when you have a perfect use case for Ractors and can live with their constraints,
it may still not be advantageous to use them yet.</p>

<h2 id="conclusion">Conclusion</h2>

<p>In his RubyKaigi 2023 talk about the state of Ractors, Koichi Sasada, who’s the main driving force behind them, mentioned that
Ractors suffered from <a href="https://youtu.be/Id706gYi3wk?si=DaECpXT2lEMO7kiA&amp;t=878">some sort of a chicken and egg problem</a>.
By his own admission, Ractors suffer from many bugs, and often don’t actually deliver the performance they’re supposed to,
hence very few people use them enough to be able to provide feedback on the API, and I’m afraid that almost two years later,
my assessment is the same on bugs and performance.</p>

<p>If Ractors’ bugs and performance problems were fixed, it’s likely that some of the provided feedback would lead to some of their
restrictions being lifted over time.
I personally don’t think they’ll ever have few enough restrictions for it to be practical to run a full application inside a Ractor, or for a Ractor-based web server to make sense, but who knows, I’d be happy to be proven wrong.</p>

<p>Ultimately, even if you are among the people who believe that Ruby should just try to remove its GVL for real rather
than spend resources on Ractors, let me say that a large part of the work needed to make Ractors perform well,
like a concurrent hash map for interned strings, is work that would be needed to enable free threading anyway, so it’s not
wasted.</p>]]></content><author><name></name></author><category term="ruby" /><category term="performance" /><summary type="html"><![CDATA[I want to write a post about Pitchfork, explaining where it comes from, why it is like it is, and how I see its future. But before I can get to that, I think I need to share my mental model on a few things, in this case, Ractors.]]></summary></entry><entry><title type="html">There Isn’t Much Point to HTTP/2 Past The Load Balancer</title><link href="https://byroot.github.io/ruby/performance/2025/02/24/http2-past-the-load-balancer.html" rel="alternate" type="text/html" title="There Isn’t Much Point to HTTP/2 Past The Load Balancer" /><published>2025-02-24T19:47:51+00:00</published><updated>2025-02-24T19:47:51+00:00</updated><id>https://byroot.github.io/ruby/performance/2025/02/24/http2-past-the-load-balancer</id><content type="html" xml:base="https://byroot.github.io/ruby/performance/2025/02/24/http2-past-the-load-balancer.html"><![CDATA[<p>I want to write a post about <a href="https://rubygems.org/gems/pitchfork">Pitchfork</a>, explaining where it comes from, why it
is like it is, and how I see its future.
But before I can get to that, I think I need to share my mental model on a few things, in this case, HTTP/2.</p>

<p>From time to time, either online or at conferences, I hear people complain about the lack of support for HTTP/2 in
Ruby HTTP servers, generally Puma.
And every time, I do the same thing: I ask them why they want that feature, and so far nobody has had an actual use case for it.</p>

<p>Personally, this lack of support doesn’t bother me much, because the only use case I can see for it is wanting to expose
your Ruby HTTP server directly to the internet without any sort of load balancer or reverse proxy, which I understand may seem
tempting, as it’s “one less moving piece”, but not really worth the trouble in my opinion.</p>

<p>If you are not familiar with the HTTP protocol and what’s different in version 2 (and even 3 nowadays), you might
be surprised by this take, so let me try to explain what it is all about.</p>

<h2 id="what-does-http2-solve">What Does HTTP/2 Solve?</h2>

<p>HTTP/2 started under the name SPDY in 2009, with multiple goals, but mainly to reduce page load latency, by allowing browsers to
download more resources faster.
A major factor in page load time is that a page isn’t just a single HTTP request.
Once your browser has downloaded the HTML page and starts parsing it, it will find other resources it needs to also
download to render the page, be it stylesheets, scripts, or images.</p>

<p>So a page isn’t one HTTP request, but a cascade of them, and in the late 2000s, the number of resources on the average
page kept going up.
This bloat was in part offset by broadband getting better, but still, HTTP/1.1 wasn’t really adequate to download
many small files quickly for a few reasons.</p>

<p>The first one is that <a href="https://datatracker.ietf.org/doc/html/rfc2616">RFC 2616</a>, which defined HTTP/1.1,
specified that browsers should only maintain <em>two</em> concurrent connections to a given domain:</p>

<blockquote>
  <p>8.1.4 Practical Considerations</p>

  <p>Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a
given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.</p>
</blockquote>

<p>So if you can only request a single resource per connection, and are limited to two connections, even if you have a very
large bandwidth, the latency to the server will have a massive impact on performance whenever you need to download more than
a couple of resources.</p>

<p>Imagine you have an excellent 100Gb connection, but are trying to load a webpage hosted across the Atlantic Ocean.
The roundtrip time to that server (your ping) will probably be around 60ms. If you need to download 100 small resources
through just two connections, it will take at least <code class="language-plaintext highlighter-rouge">ping * (resources / connections)</code>, so 3 seconds, which isn’t great.</p>
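
<p>To make that back-of-the-envelope computation concrete, here is a minimal sketch of it in Ruby. The method name and keyword arguments are mine, and it assumes each connection handles exactly one request per roundtrip and ignores transfer time, so it’s only a lower bound. The second call uses a hypothetical client allowed 6 connections, just to show how sensitive the result is to that limit.</p>

```ruby
# Lower bound on page load time when every connection can only
# handle one request per roundtrip (no pipelining, no multiplexing).
# Transfer time is ignored, only latency is counted.
def min_load_time_ms(ping_ms:, resources:, connections:)
  # Each connection fetches its share of the resources one after the other.
  rounds = (resources.to_f / connections).ceil
  ping_ms * rounds
end

two = min_load_time_ms(ping_ms: 60, resources: 100, connections: 2)
six = min_load_time_ms(ping_ms: 60, resources: 100, connections: 6)

puts "2 connections: #{two}ms" # 3000ms, the 3 seconds from above
puts "6 connections: #{six}ms" # 1020ms
```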

<p>That’s what made many frontend optimization techniques like assets bundling absolutely essential back then, as they
made a major difference in load time<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>.
Similarly, some websites were using a technique called domain sharding, splitting assets into multiple domains to allow
more concurrency.</p>

<p>In theory, even these two connections could have been used much more effectively by pipelining requests.
RFC 2616 has an entire section about it, and that was one of the big features added in HTTP/1.1 compared to 1.0.
The idea is simple: after sending your request, you don’t have to wait for the response before sending more requests.
You can send 10 requests immediately before having received a single response, and the server will send the responses one by one,
in order.</p>

<p>But in practice most browsers ended up disabling that feature by default because they ran into misbehaving servers, dooming
the feature.
It also wasn’t perfect, as you could experience <em>head-of-line blocking</em>.
Since responses don’t have an identifier to map them to the request they’re the answer to, they have to be sent in order.
If one resource is slow to generate, all the subsequent resources can’t be sent yet.</p>

<p>That’s why as early as 2008, browsers stopped respecting the two-connection rule.
Firefox 3 raised the connection limit to 6 per domain, and most other browsers followed suit shortly after.</p>

<p>However, opening more concurrent connections isn’t an ideal solution, because TCP connections have a <em>slow start</em>.
When you connect to a remote address, your computer doesn’t know if the link to that other machine can support 10 Gbit/s
or only 56 kbit/s.
Hence, to avoid flooding the network with tons of packets that will be dropped on the floor, it starts relatively slow
and periodically increases the throughput until it receives packet loss notifications, at which point it knows it has more
or less reached the maximum throughput the link can sustain.</p>

<p>That’s why persistent connections are a big deal, a freshly established connection has a much lower throughput than one
that has seen some use.</p>
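
<p>To get a feel for it, here is a deliberately simplified model of slow start in Ruby. It is not how a real TCP stack works: I’m assuming a window that starts at 10 segments of 1460 bytes, doubles on every roundtrip, and is capped at an arbitrary 1000 segments standing in for what the link can sustain. All these numbers and names are assumptions chosen for illustration.</p>

```ruby
# Simplified slow start model: the window (in segments) doubles every
# roundtrip until it reaches max_window. Returns how many roundtrips
# are needed to send the given number of bytes.
def round_trips_to_send(bytes, initial_window: 10, segment_size: 1460, max_window: 1000)
  window = initial_window
  sent = 0
  trips = 0
  until sent >= bytes
    sent += window * segment_size
    window = [window * 2, max_window].min
    trips += 1
  end
  trips
end

# A fresh connection versus one that has already grown its window.
fresh = round_trips_to_send(1_000_000)
warm = round_trips_to_send(1_000_000, initial_window: 1000)

puts "fresh connection: #{fresh} roundtrips" # 7
puts "warm connection:  #{warm} roundtrip"   # 1
```

<p>With a 60ms ping, that’s the difference between roughly 420ms and 60ms to move the same megabyte, under these assumptions.</p>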

<p>So by multiplying the number of connections, you can download more resources faster, but it would be preferable if they
were all downloaded over the same connection to not suffer as much from TCP slow start.</p>

<p>And that’s exactly the main thing HTTP/2 solved, by allowing multiplexing of requests inside a single TCP connection,
solving the head-of-line blocking issue<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>.</p>
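
<p>The difference with pipelining can be sketched with a toy simulation. Here I assume the server works on all requests concurrently and I ignore transfer time entirely; the numbers are made up, with one slow response followed by three fast ones. It only illustrates the ordering constraint, not any real server’s behavior.</p>

```ruby
# Times (in ms) at which each response is ready, in request order.
# Hypothetical numbers: one slow resource followed by three fast ones.
READY_AT_MS = [500, 10, 10, 10].freeze

# Pipelining: responses must be sent in request order, so a response
# can't leave before every response ahead of it has left.
def pipelined_delivery(ready_at)
  sent_at = []
  ready_at.each do |ready|
    previous = sent_at.last || 0
    sent_at.push([ready, previous].max)
  end
  sent_at
end

# Multiplexing: each response is tagged with its stream, so it can be
# sent as soon as it is ready, regardless of the others.
def multiplexed_delivery(ready_at)
  ready_at.dup
end

puts "pipelined:   #{pipelined_delivery(READY_AT_MS).inspect}"   # [500, 500, 500, 500]
puts "multiplexed: #{multiplexed_delivery(READY_AT_MS).inspect}" # [500, 10, 10, 10]
```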

<p>It also did a few other things, such as mandating the use of encryption<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>, compressing request and response headers
with HPACK, and “server push”, but multiplexing is really the big one.</p>

<h2 id="why-it-doesnt-matter-over-lan">Why It Doesn’t Matter Over LAN</h2>

<p>So the main motivation for HTTP/2 is multiplexing, and over the Internet, especially mobile Internet with somewhat more
spotty connections, it can have a massive impact.</p>

<p>But in the data center, not so much. If you think about it, the dominant factor in the computation we did above was the
roundtrip time (ping) with the client.
Unless your infrastructure is terribly designed, that roundtrip time between your server (say Puma) and its client
(your load balancer or reverse proxy) should be extremely small, way under one millisecond, and totally dwarfed by the
actual request render time.</p>
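
<p>As a rough illustration, here is the share of the total time spent on the roundtrip in both situations. The 50ms render time and the 0.5ms LAN roundtrip are numbers I’m assuming for the sake of the example, not measurements.</p>

```ruby
# Share of the total response time spent waiting on the network roundtrip.
# The 50ms render time and 0.5ms LAN roundtrip are assumed values.
def roundtrip_share(roundtrip_ms:, render_ms:)
  roundtrip_ms.to_f / (roundtrip_ms + render_ms)
end

internet = roundtrip_share(roundtrip_ms: 60, render_ms: 50)
lan = roundtrip_share(roundtrip_ms: 0.5, render_ms: 50)

puts format("Internet: %.1f%% of the time is roundtrip", internet * 100)
puts format("LAN:      %.1f%% of the time is roundtrip", lan * 100)
```

<p>Under these assumptions the roundtrip is more than half of the total over the Internet, and about one percent over LAN.</p>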

<p>When you are serving mostly static assets over the Internet, latency may be high and HTTP/2 multiplexing is a huge deal.
But when you are serving application-generated responses over LAN (or even a UNIX socket), it won’t make a measurable
difference.</p>

<p>In addition to the low roundtrip time, the connections between your load balancer and application server likely have
a very long lifetime, hence don’t suffer from TCP slow start as much, and that’s assuming your operating system hasn’t
been tuned to disable slow start entirely, which is very common on servers.</p>

<h2 id="server-push-fail">Server Push Fail</h2>

<p>Another reason people may have wanted HTTP/2 all the way to the Ruby application server at one point was the “server push”
capability.</p>

<p>The idea was relatively simple: servers were allowed to send HTTP resources to the client without being asked for them.
This way, when you request the landing page of a website, the server can send you all the associated resources up front
so your browser doesn’t have to parse the HTML to realize it needs them and start asking for them.</p>

<p>However, that capability was actually removed from the spec and nowadays all browsers have removed it because it was
actually doing more harm than good. It turns out that if the browser already had these resources in its cache, then
pushing them again would slow down the page load time.</p>

<p>People tried to find smart heuristics to know which resources may be in the cache or not, but in the end, none worked
and the feature was abandoned.</p>

<p>Today it has been superseded by <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/103">103 Early Hints</a>, which
is a much simpler and more elegant spec, and is backward compatible with HTTP/1.1.</p>

<p>So there isn’t any semantic difference left between HTTP/1.1 and HTTP/2.
From a <a href="https://github.com/rack/rack/blob/main/SPEC.rdoc"><code class="language-plaintext highlighter-rouge">Rack</code></a> application point of view, whether the request was
issued through an HTTP/2 or HTTP/1.1 connection makes no difference.
You can tunnel one into the other just fine.</p>
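
<p>Here is a minimal sketch of what that means in practice. The application below is a plain Rack-style callable, and the two env hashes are hypothetical and stripped down to the few keys this example needs; a real server provides many more, and the exact <code class="language-plaintext highlighter-rouge">SERVER_PROTOCOL</code> string it reports for HTTP/2 is an assumption here.</p>

```ruby
# A minimal Rack-style application: a callable that takes the request
# env and returns [status, headers, body].
app = lambda do |env|
  body = "Hello from #{env["PATH_INFO"]}"
  [200, { "content-type" => "text/plain" }, [body]]
end

# Hypothetical, stripped-down envs for the same request arriving
# over two different protocol versions.
http1_env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/", "SERVER_PROTOCOL" => "HTTP/1.1" }
http2_env = http1_env.merge("SERVER_PROTOCOL" => "HTTP/2")

# The application has no reason to look at the protocol version,
# so both requests produce the same response.
puts app.call(http1_env) == app.call(http2_env) # true
```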

<h2 id="extra-complexity">Extra Complexity</h2>

<p>In addition to not providing much if any benefit over LAN, HTTP/2 adds some extra complexity.</p>

<p>First, the complexity of implementation, as HTTP/2, while not being crazy complicated at all, is still a largely binary
protocol, so it’s much harder to debug.</p>

<p><del>But also the complexity of deployment. HTTP/2 is fully encrypted, so you need all your application servers to have a key and
certificate, that’s not insurmountable, but is an extra hassle compared to just using HTTP/1.1, unless of course for some
reasons you are required to use only encrypted connections even over LAN.</del> Edit: The HTTP/2 spec doesn’t actually require
encryption, only browsers and some libraries, so you can do unencrypted HTTP/2 inside your datacenter.</p>

<p>So unless you are deploying to a single machine, hence don’t have a load balancer, bringing HTTP/2 all the way to
the Ruby app server significantly complicates your infrastructure for little benefit.</p>

<p>And even if you are on a single machine, it’s probably best to leave that concern to a reverse proxy, which will also take
care of serving static assets, normalizing inbound requests, and probably fending off at least some malicious actors.</p>

<p>There are numerous battle-tested reverse proxies such as Nginx, Caddy, etc, and they’re pretty simple to set up, so you
might as well use these common middlewares rather than try to do everything in a single Ruby application.</p>

<p>But if you think a reverse proxy is too much complexity and you’d rather do without, there are now zero-config solutions
such as <a href="https://github.com/basecamp/thruster">thruster</a>. I haven’t tried it so I can’t vouch for it, but at least on
paper it solves that need.</p>

<h2 id="conclusion">Conclusion</h2>

<p>I think HTTP/2 is better thought of not as an upgrade over HTTP/1.1, but as an alternative protocol to more efficiently
transport the same HTTP resources over the Internet. In a way, it’s similar to how HTTPS doesn’t change the semantics
of the HTTP protocol, it only changes how it’s serialized over the wire.</p>

<p>So I believe handling HTTP/2 is better left to your infrastructure entry point, typically the load balancer or reverse proxy, for the same
reason that TLS has been left to the load balancer or reverse proxy for ages. They have to decrypt and decompress
the request to know what to do with it, why re-encrypt and re-compress it to forward it to the app server?</p>

<p>Hence, in my opinion, HTTP/2 support in Ruby HTTP servers isn’t a critically important feature. It would be nice to have for a few
niche use cases, but overall, the lack of it isn’t hindering much of anything.</p>

<p>Note that I haven’t mentioned HTTP/3, but while the protocol is very different, its goals are largely the same as HTTP/2, so I’d apply the same conclusion to it.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Minifying and bundling still improve load time with HTTP/2, fewer requests and fewer bytes transferred are still positive, so they’re still useful, but it’s no longer critical to achieve a decent experience. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>At the HTTP layer at least, HTTP/2 still suffers from some forms of head-of-line blocking in lower layers, but it is beyond the scope of this post. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>The RFC doesn’t actually require encryption, but all browser implementations do. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ruby" /><category term="performance" /><summary type="html"><![CDATA[I want to write a post about Pitchfork, explaining where it comes from, why it is like it is, and how I see its future. But before I can get to that, I think I need to share my mental model on a few things, in this case, HTTP/2.]]></summary></entry></feed>