Goaly Research Blog

Featured Posts

Explore our latest research and updates

Speed and cost preview
Faster training. Lower GPU spend.

At the same GPU count, a 2× step-time speedup roughly halves per-step compute cost.

RL 8B (77s to 27s vs OSS async) · Speed 2.8x · Cost cut 64%
RL 30B (fixed GPU count) · Speed 2.5x · Cost cut 60%
RL 32B (~$4k saved per 1k steps) · Speed 1.8x · Cost cut 44%
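
The speed and cost-cut figures above fit a simple relationship: if per-step cost is proportional to GPU time and the GPU count is fixed, a speedup of s cuts cost by 1 - 1/s. The sketch below is a minimal illustration under that assumption. The function name `cost_cut_from_speedup` is hypothetical and is not part of any Goaly tooling.

```python
def cost_cut_from_speedup(speedup: float) -> float:
    """Fractional cost reduction for a step-time speedup at a fixed GPU count.

    Assumes per-step cost scales linearly with step time (GPU-seconds),
    which is an illustrative simplification.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Speedups quoted in the preview cards above.
for label, speedup in [("RL 8B", 2.8), ("RL 30B", 2.5), ("RL 32B", 1.8)]:
    print(f"{label}: {speedup}x -> {cost_cut_from_speedup(speedup):.0%} cost cut")
```

Under the same assumption, the 77s to 27s step time quoted for the 8B run corresponds to a speedup of about 2.85x, in line with the rounded 2.8x figure.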
Featured · Engineering

Beyond Async RL: Faster Post-Training for Reasoning Models and Agents

Async trainer-sampler execution can speed up RL post-training, but load balancing and off-policy drift still make these systems hard to scale safely. Goaly combines system optimizations with algorithmic controls to cut training time and cost while preserving model quality, delivering speedups of 2.8x to 4.3x on 8B models and 1.8x to 2.5x on 30B to 32B models. The same stack extends to agentic RL workloads with long horizons, crash recovery, stateful workflows, and latency-aware scheduling.

May 28, 2026 · 10 min read
More posts

More technical write-ups are on the way.

Want a heads-up when we publish? Join the waitlist.