Aaron Patterson - Rails World 2025 Closing Keynote

Rails World 2025: Unpacking Ruby's Future with Ractors and ZJIT

Aaron Patterson's closing keynote at Rails World 2025 delved into the cutting-edge performance enhancements coming to Ruby, focusing on true parallelism with Ractors and the next-generation JIT compiler, ZJIT. Here’s a breakdown of the key insights for developers.

The Challenge: Efficient Parallel Work

At Shopify, a core goal is improving machine utilization by handling more parallel work without degrading latency. Current web servers, whether process-based (like Unicorn) or thread-based (like Puma), face a fundamental Ruby limitation: only processes can achieve true CPU parallelism, leading to inefficiencies with mixed I/O and CPU workloads.

An ideal solution would be a load-aware web server that could dynamically scale its parallel processing capacity based on current CPU load, potentially using HTTP/2 for back-pressure communication. The missing piece is a construct that can be spun up quickly and handle CPU-bound work in parallel. The answer? Ractors.

Ractors: True Parallelism in Ruby

Ractors (Ruby Actors) provide an actor-model style of parallelism. Each Ractor has its own independent Global VM Lock (GVL), allowing them to run Ruby code truly in parallel on multiple CPU cores.
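
A minimal sketch using the API as it exists on stable Rubies today (Ractor#take retrieves the block's return value; the API is still evolving toward Ruby 3.5):

```ruby
# Each Ractor runs its block under its own GVL, so these two sums can
# execute on two CPU cores at the same time.
a = Ractor.new { (1..1_000_000).sum }
b = Ractor.new { (1..1_000_000).sum }

# #take blocks until the Ractor finishes and returns the block's value.
total = a.take + b.take
p total  # => 1000001000000
```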

Ractors vs. Other Concurrency Models

A benchmark calculating a Fibonacci sequence demonstrates the difference:

  • Serial execution: ~2000 ms
  • Threads/Fibers: ~2000 ms (No parallelism for CPU-bound work)
  • Processes/Ractors: ~480 ms (True CPU parallelism achieved)
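
The shape of that benchmark can be sketched with a naive Fibonacci (timing code omitted): all three variants compute the same results, but only the Ractor version can occupy multiple cores.

```ruby
# Naive Fibonacci: pure CPU-bound work with no I/O to overlap.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

serial   = 4.times.map { fib(20) }                              # one core, one at a time
threaded = 4.times.map { Thread.new { fib(20) } }.map(&:value)  # one shared GVL, interleaved
parallel = 4.times.map { Ractor.new { fib(20) } }.map(&:take)   # four GVLs, up to four cores
p [serial, threaded, parallel].all? { |r| r == [6765] * 4 }  # => true
```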

Overcoming the Experimental Hurdle

Ractors currently output a warning: "experimental feature... behavior may change in future versions." The Ruby core team, including Patterson's team at Shopify, is actively working on stabilizing the API and fixing implementation issues to remove this warning and make Ractors production-ready.

The Golden Rule: No Shared Mutable State

The fundamental rule of Ractors is that mutable objects cannot be shared between them. When a mutable object is sent to another Ractor, it is deep-copied. Immutable objects (e.g., frozen strings) can be passed by reference at no cost.

To pass complex, nested mutable data structures between Ractors, use Ractor.make_shareable to deep-freeze the object, or use APIs that return frozen structures by default (e.g., JSON.parse(json, freeze: true)).
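
A sketch of both routes to shareable data; Ractor.make_shareable and the json gem's freeze: option are real APIs on current Rubies:

```ruby
require "json"

# Deep-freeze an existing nested structure so it can cross Ractor
# boundaries by reference instead of being copied.
settings = { "retries" => 3, "hosts" => ["a", "b"] }
Ractor.make_shareable(settings)
p Ractor.shareable?(settings)  # => true

# Or parse JSON straight into a frozen (and therefore shareable) tree.
config = JSON.parse('{"pool": {"size": 5}}', freeze: true)
p config.frozen? && config["pool"].frozen?  # => true
```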

Communication via Ports

Ractors communicate through "ports," which are essentially queues. The critical rules for ports are:

  • Any Ractor can write to any port.
  • Only the Ractor that created the port can read from it.

This requires a different mental model than traditional thread-based queues: coordination typically revolves around a central "coordinator" Ractor that hands out work to "worker" Ractors, each of which explicitly asks for its next task.
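
The port API is part of what is being stabilized for Ruby 3.5; on today's stable Rubies the same discipline, anyone may write but only the owner reads, appears as send/receive, which is enough to sketch the coordination pattern:

```ruby
# Each Ractor owns an incoming queue: any Ractor may `send` to it, but
# only the owning Ractor can `receive` from it, mirroring the port rules.
collector = Ractor.new do
  3.times.sum { Ractor.receive }   # only we can read our own queue
end

(1..3).each do |n|
  Ractor.new(collector, n) do |dest, val|
    dest.send(val * 10)            # any Ractor may write to another's queue
  end
end

result = collector.take
p result  # => 60
```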

The "noGVL" Block: A Powerful Use Case

A compelling use case for Ractors is wrapping CPU-intensive operations, effectively creating a "noGVL" block in pure Ruby. This allows other threads (e.g., those handling web requests) to run in parallel while the CPU-bound task executes in a Ractor.

For example, instead of writing a C extension that releases the GVL for a task like password hashing (bcrypt) or JSON parsing, you can isolate that task inside a Ractor and get the same parallelism from pure Ruby code.
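
A sketch of the idea; without_gvl is a hypothetical helper name, and SHA-256 stands in for bcrypt. While the calling thread blocks waiting for the Ractor, it is not holding the main GVL, so sibling threads keep running.

```ruby
require "digest"

# Hypothetical helper: run CPU-heavy work under a Ractor's own GVL.
# The caller blocks in #take, but a blocked thread releases the GVL,
# so other threads (e.g. request handlers) proceed in parallel.
def without_gvl(input)
  Ractor.new(input) { |data| Digest::SHA256.hexdigest(data * 50_000) }.take
end

digest = without_gvl("secret")
p digest == Digest::SHA256.hexdigest("secret" * 50_000)  # => true
```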

Fixing Hidden Bottlenecks

The path to stable Ractors involves finding and fixing hidden global bottlenecks within CRuby itself. Patterson highlighted an issue where parsing JSON was slower with Ractors than serially due to lock contention on Ruby's internal "fstring table" (a global hash for deduplicating frozen strings).

Fixing this by implementing a lock-free hash made JSON parsing in Ractors 12x faster. Similar bottlenecks have been found and addressed in other internal structures (e.g., the symbol ID table, the inline cache CC table, the encoding table).

ZJIT: The Next-Generation JIT Compiler

Alongside Ractors, the Ruby team is developing ZJIT, a new method-based Just-In-Time compiler set to ship with Ruby 3.5. It complements the existing YJIT compiler.

How JIT Compilers Speed Up Code

A JIT compiler assembles machine code at runtime. It speeds up programs by:

  • Eliminating interpreter overhead: Running code directly on the CPU instead of through the YARV bytecode interpreter.
  • Caching values: Embedding constants directly into machine code.
  • Speculating and deoptimizing: Making assumptions (e.g., a method isn't monkey-patched) and generating optimized code, then gracefully falling back if the assumption is wrong.
  • Eliminating redundant type checks: Inferring types and removing repeated checks.

YJIT vs. ZJIT

YJIT (Lazy Basic Block Versioning): Currently ships with Ruby. It compiles straight-line code paths ("basic blocks") lazily as they are discovered during execution.

  • Pros: Very fast warm-up time, low memory overhead.
  • Cons: More challenging register allocation, can lead to values being spilled to memory.

ZJIT (Method-Based): Compiles entire methods at a time.

  • Pros: Better register allocation, easier constant folding, leverages existing compiler research.
  • Cons: May compile unused code (e.g., both branches of an if-statement), requires more type-tracking infrastructure.

ZJIT is very new and is not yet as fast as YJIT, but the goal is to apply the lessons learned from YJIT to create a powerful method-based compiler.

Writing JIT-Friendly Code: A Pro Tip

While the ideal is to write any code and let the JIT optimize it, some patterns are easier to speed up than others. Patterson's key advice is to monomorphize call sites.

Polymorphic vs. Monomorphic Call Sites

  • Polymorphic: A method call where the receiver can be many different types (e.g., object.to_s where object could be a String, Symbol, Integer, etc.). The generated code must check for all possible types, adding overhead.
  • Monomorphic: A method call where the receiver is consistently the same type. This allows for much faster, optimized code.

Practical Application

A real-world example from the Prism parser showed a 13% speedup in Ruby 3.4 simply by refactoring code to ensure call sites were monomorphic. This works with or without a JIT.

The guidance is not to avoid polymorphism altogether but to favor high-value polymorphism (e.g., different strategy objects that encode meaningful business logic) over low-value polymorphism (e.g., calling to_s on an input that could be either a String or a Symbol for caching). For low-value cases, standardize the input type early to create monomorphic call sites.
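
A hypothetical before/after of that advice (fetch, lookup, and CACHE are illustrative names, not from the talk):

```ruby
CACHE = {}

# Boundary method: absorb the low-value String-or-Symbol polymorphism
# here, exactly once.
def fetch(key)
  lookup(key.to_s)
end

# Hot path: `key` is now always a String, so every call site on it
# (here #downcase and the hash lookup) is monomorphic.
def lookup(key)
  CACHE[key.downcase]
end

CACHE["timeout"] = 30
p fetch(:Timeout)   # => 30
p fetch("TIMEOUT")  # => 30
```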

Conclusion & The Road to Ruby 3.5

The future of Ruby performance is bright, driven by the dual engines of Ractors for parallelism and ZJIT for raw execution speed. The core team is focused on stabilizing Ractors and advancing ZJIT for the upcoming Ruby 3.5 release.

Patterson also highlighted that object allocation has been made 70% faster in Ruby 3.5, providing another compelling reason to upgrade.

The talk concluded by emphasizing that these advancements aim to empower developers to write powerful applications, with the language and infrastructure handling the complex task of making them fast.
