Goaly Research Blog

Featured Posts

Explore our latest research and updates

Lessons from Llama

From frontier-scale training to Goaly.

[Stat cards: total parameters · GPUs at peak scale · policy-weight sync]

Featured · Engineering

What We Learned Building Infrastructure for Frontier-Scale Models

Lessons from scaling asynchronous RL across multiple generations of Llama—from 405B dense models to a nearly two-trillion-parameter mixture-of-experts model running across thousands of GPUs—and why those lessons led us to build Goaly.

July 10, 2026 · 10 min read

Engineering
Beyond Async RL: Faster Post-Training for Reasoning Models and Agents
Async trainer-sampler execution can speed up RL post-training, but load balancing and off-policy drift still make systems hard to scale safely. Goaly combines system optimizations with algorithmic controls to cut training time and cost while preserving model quality, delivering 2.8-4.3x speedups on 8B models and 1.8-2.5x on 30-32B models. The same stack extends to agentic RL workloads with long horizons, crash recovery, stateful workflows, and latency-aware scheduling.
May 28, 2026
