Aaron Patterson's closing keynote at Rails World 2025 covered the performance work coming to Ruby, focusing on true parallelism with Ractors and the new JIT compiler, ZJIT. Here’s a breakdown of the key insights for developers.
At Shopify, a core goal is improving machine utilization by handling more parallel work without degrading latency. Current web servers, whether process-based (like Unicorn) or thread-based (like Puma), face a fundamental Ruby limitation: only processes can achieve true CPU parallelism, leading to inefficiencies with mixed I/O and CPU workloads.
An ideal solution would be a load-aware web server that could dynamically scale its parallel processing capacity based on current CPU load, potentially using HTTP/2 for back-pressure communication. The missing piece is a construct that can be spun up quickly and handle CPU-bound work in parallel. The answer? Ractors.
Ractors (Ruby Actors) provide an actor-model style of parallelism. Each Ractor has its own independent Global VM Lock (GVL), allowing them to run Ruby code truly in parallel on multiple CPU cores.
A benchmark calculating a Fibonacci sequence demonstrates the difference:
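The figures from the talk are not reproduced here. As a rough sketch of how such a comparison could be set up, the following runs the same Fibonacci computation in threads and in Ractors. The method names, the input 27 and the worker count of 4 are assumptions chosen for illustration, not the talk's actual benchmark. The small ractor_result helper is also an assumption: it uses Ractor#value where available and falls back to Ractor#take on older Rubies.

```ruby
require "benchmark"

# Naive recursive Fibonacci: deliberately CPU-bound.
def fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

# Compatibility helper (an assumption for this sketch): newer Rubies expose
# Ractor#value, older releases expose Ractor#take.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# Runs fib(n) in `count` threads; they take turns holding the GVL.
def fib_with_threads(n, count)
  Array.new(count) { Thread.new { fib(n) } }.map(&:value)
end

# Runs fib(n) in `count` Ractors; each can run on its own core.
def fib_with_ractors(n, count)
  Array.new(count) { Ractor.new(n) { |x| fib(x) } }.map { |r| ractor_result(r) }
end

if $PROGRAM_NAME == __FILE__
  n, count = 27, 4
  puts "threads: #{Benchmark.realtime { fib_with_threads(n, count) }.round(2)}s"
  puts "ractors: #{Benchmark.realtime { fib_with_ractors(n, count) }.round(2)}s"
end
```

Timings depend on the machine and Ruby version, so no expected numbers are given; on a multi-core machine the Ractor version should finish sooner than the thread version.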
Ractors currently output a warning: "experimental feature... behavior may change in future versions." The Ruby core team, including Patterson's team at Shopify, is actively working on stabilizing the API and fixing implementation issues to remove this warning and make Ractors production-ready.
The fundamental rule of Ractors is that mutable objects cannot be shared between them. When a mutable object is passed to another Ractor, it is copied rather than shared. Immutable objects (e.g., frozen strings) can be passed by reference at no copying cost.
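One way to observe this rule is to compare object identities before and after passing. The sketch below is illustrative only: identity_after_passing and ractor_result are hypothetical helper names, and the helper falls back to Ractor#take on Rubies that lack Ractor#value.

```ruby
# Compatibility helper (an assumption for this sketch).
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# Returns [mutable_kept_identity, frozen_kept_identity].
def identity_after_passing
  mutable = +"hello"        # unfrozen String: copied when passed
  frozen  = "world".freeze  # frozen String: shareable, passed by reference

  ids = ractor_result(
    Ractor.new(mutable, frozen) { |m, f| [m.object_id, f.object_id] }
  )

  [ids[0] == mutable.object_id, ids[1] == frozen.object_id]
end

p identity_after_passing
```

The mutable string should arrive as a different object, while the frozen string should keep its identity.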
To pass complex, nested mutable data structures between Ractors, use Ractor.make_shareable to deep-freeze the object, or use APIs that return frozen structures by default (e.g., JSON.parse(json, freeze: true)).
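A minimal sketch of both approaches follows. The configuration hash and the method names shareable_config and parsed_frozen are hypothetical, used only for illustration.

```ruby
require "json"

# Hypothetical nested configuration; make_shareable deep-freezes it in place.
def shareable_config
  Ractor.make_shareable({ "hosts" => ["a.example", "b.example"], "retries" => 3 })
end

# Asks the JSON parser to freeze the objects it builds.
def parsed_frozen(json)
  JSON.parse(json, freeze: true)
end

config = shareable_config
p Ractor.shareable?(config)     # the whole structure can now be shared
p config["hosts"].frozen?       # nested containers are frozen too

parsed = parsed_frozen('{"hosts": ["a.example"], "retries": 3}')
p parsed.frozen?
```

Note that Ractor.make_shareable freezes the object it is given, so the original can no longer be mutated afterwards.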
Ractors communicate through "ports," which are essentially queues. The critical rules for ports are:
This requires a different mental model for coordination compared to traditional thread-based queues, often involving a central "coordinator" Ractor that distributes work to "worker" Ractors that explicitly ask for tasks.
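The coordinator/worker shape described above can be sketched as follows. This is an illustration under assumptions, not the design shown in the talk: it uses only Ractor#send and Ractor.receive (each Ractor's built-in message queue) so that it also runs on Rubies without Ractor::Port, and the names run_pool and square are hypothetical.

```ruby
# Hypothetical stand-in for a CPU-bound task.
def square(n) = n * n

# The calling Ractor acts as coordinator; workers explicitly ask it for tasks.
def run_pool(tasks, worker_count: 2)
  coordinator = Ractor.current

  worker_count.times do
    Ractor.new(coordinator) do |coord|
      loop do
        coord.send([:ready, Ractor.current])  # ask for work
        task = Ractor.receive                 # wait for the reply
        break if task == :stop
        coord.send([:result, task, square(task)])
      end
    end
  end

  pending = tasks.dup
  results = {}
  stopped = 0

  while stopped < worker_count
    kind, a, b = Ractor.receive
    case kind
    when :ready                    # a is the worker asking for work
      if pending.empty?
        a.send(:stop)
        stopped += 1
      else
        a.send(pending.shift)
      end
    when :result                   # a is the task, b is its result
      results[a] = b
    end
  end

  results
end

p run_pool([1, 2, 3, 4])
```

Each worker sends its result before asking for more work, so by the time every worker has been told to stop, the coordinator has already received every result.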
A compelling use case for Ractors is wrapping CPU-intensive operations, effectively creating a "noGVL" block in pure Ruby. This allows other threads (e.g., those handling web requests) to run in parallel while the CPU-bound task executes in a Ractor.
For example, instead of writing a C extension to release the GVL for a task like password hashing (with bcrypt) or JSON parsing, you can achieve the same parallelism by isolating that task inside a Ractor, making it accessible to pure Ruby code.
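A minimal sketch of that idea follows, using a pure-Ruby checksum as a stand-in for the expensive work. The names slow_checksum, checksum_in_ractor and ractor_result are hypothetical, and the checksum is not a real password hash.

```ruby
# Compatibility helper (an assumption for this sketch).
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# Hypothetical CPU-bound work written in plain Ruby.
def slow_checksum(input, rounds)
  acc = 0
  rounds.times do |i|
    input.each_byte { |b| acc = (acc * 31 + b + i) % 1_000_000_007 }
  end
  acc
end

# Runs the work in its own Ractor. Only the calling thread waits; other
# threads in the main Ractor can keep running Ruby code in the meantime.
def checksum_in_ractor(input, rounds)
  ractor_result(Ractor.new(input, rounds) { |s, n| slow_checksum(s, n) })
end

worker = Thread.new { checksum_in_ractor("secret", 20_000) }
other  = Thread.new { (1..1_000).sum }  # stands in for request handling
p other.value
p worker.value
```

The block given to Ractor.new cannot capture outer local variables, which is why the inputs are passed as arguments.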
The path to stable Ractors involves finding and fixing hidden global bottlenecks within CRuby itself. Patterson highlighted an issue where parsing JSON was slower with Ractors than serially due to lock contention on Ruby's internal "fstring table" (a global hash for deduplicating frozen strings).
Fixing this by implementing a lock-free hash made JSON parsing in Ractors 12x faster. Similar bottlenecks have been found and addressed in other internal structures (e.g., the symbol ID table, the inline cache CC table, the encoding table).
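The deduplication that the fstring table provides is visible from Ruby through String#-@, which returns a frozen, possibly pre-existing copy of a string. The short sketch below assumes CRuby, where equal contents normally resolve to the same object; the method name interned_same_object? is hypothetical.

```ruby
# Builds two equal strings at runtime and interns both with String#-@.
def interned_same_object?
  a = -("fo" + "o")
  b = -("f" + "oo")
  a.equal?(b)   # identity comparison, not content comparison
end

p interned_same_object?
```

Every such lookup goes through one process-wide table, which is why it became a point of contention once several Ractors were creating frozen strings at the same time.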
Alongside Ractors, the Ruby team is developing ZJIT, a new method-based Just-In-Time compiler set to ship with Ruby 3.5. It complements the existing YJIT compiler.
A JIT compiler assembles machine code at runtime. It speeds up programs by:
YJIT (Lazy Basic Block Versioning): Currently ships with Ruby. It compiles straight-line code paths ("basic blocks") lazily as they are discovered during execution.
ZJIT (Method-Based): Compiles entire methods at a time.
ZJIT is very new and is not yet as fast as YJIT, but the goal is to apply the lessons learned from YJIT to create a powerful method-based compiler.
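To see which JIT a given process is running with, a check along these lines can help. It is a hedged sketch: RubyVM::YJIT.enabled? exists on YJIT-capable builds, while the RubyVM::ZJIT check is guarded because that module is only present on builds that include ZJIT and its interface may still change. The method name active_jit is hypothetical.

```ruby
# Reports which JIT, if any, is active in the current process.
def active_jit
  if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
    :yjit
  elsif defined?(RubyVM::ZJIT) &&
        RubyVM::ZJIT.respond_to?(:enabled?) &&
        RubyVM::ZJIT.enabled?
    :zjit
  else
    :none
  end
end

puts "active JIT: #{active_jit}"
```

YJIT is typically switched on with the --yjit command-line flag; builds that include ZJIT are expected to accept --zjit in the same way.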
While the ideal is to write any code and let the JIT optimize it, some patterns are easier to speed up than others. Patterson's key advice is to monomorphize call sites.
A call site is monomorphic when it always sees the same receiver type, and polymorphic when it sees several (e.g., object.to_s where object could be a String, Symbol, Integer, etc.). At a polymorphic call site, the generated code must check for all possible types, adding overhead.
A real-world example from the Prism parser showed a 13% speedup in Ruby 3.4 simply by refactoring code to ensure call sites were monomorphic. This works with or without a JIT.
The guidance is not to avoid polymorphism altogether but to favor high-value polymorphism (e.g., different strategy objects that encode meaningful business logic) over low-value polymorphism (e.g., calling to_s on an input that could be either a String or a Symbol for caching). For low-value cases, standardize the input type early to create monomorphic call sites.
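As an illustration of standardizing the input type early, the sketch below normalizes cache keys once at the boundary so that internal call sites only ever see String keys. The class KeyCache and its method names are hypothetical and are not taken from the Prism change mentioned in the talk.

```ruby
class KeyCache
  def initialize
    @store = {}
  end

  # Boundary: the only method that accepts either a String or a Symbol.
  def fetch(key, &block)
    fetch_string(normalize(key), &block)
  end

  private

  # `case` dispatches on the String and Symbol classes themselves, and
  # `to_s` is only ever called on a Symbol, so each call site here sees
  # a single receiver type.
  def normalize(key)
    case key
    when String then key
    when Symbol then key.to_s
    else raise ArgumentError, "unsupported key: #{key.inspect}"
    end
  end

  # From here on, every key is a String.
  def fetch_string(key)
    @store.fetch(key) { @store[key] = yield }
  end
end

cache = KeyCache.new
p cache.fetch(:user) { "computed once" }
p cache.fetch("user") { "not evaluated" }
```

Both lookups resolve to the same String key, so the second block is never called.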
The future of Ruby performance is bright, driven by the dual engines of Ractors for parallelism and ZJIT for raw execution speed. The core team is focused on stabilizing Ractors and advancing ZJIT for the upcoming Ruby 3.5 release.
Patterson also highlighted that object allocation has been made 70% faster in Ruby 3.5, providing another compelling reason to upgrade.
The talk concluded by emphasizing that these advancements aim to empower developers to write powerful applications, with the language and infrastructure handling the complex task of making them fast.