SPIRAL: Learning to Search and Aggregate

The authors introduce Sequential-Parallel-Aggregative Reinforcement Learning (SPIRAL), a framework that trains language models to use three inference-time reasoning primitives together: sequential, parallel, and aggregative. Standard post-training methods optimize only single-trace sequential reasoning; SPIRAL instead unifies the three primitives in one inference compute pipeline. The model first samples independent chain-of-thought traces in parallel, then generates a final aggregation trace conditioned on those traces. The whole pipeline is optimized end-to-end against the reward of the final aggregated response, combining set reinforcement learning with standard reinforcement learning techniques. In experiments on reasoning tasks, SPIRAL scales effectively with inference compute: it outperforms GRPO by up to 11 times in scaling efficiency and achieves 15% higher performance when all three compute primitives are scaled.
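
The sample-then-aggregate rollout described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `rollout` and the toy stand-ins (`toy_sample_trace`, `toy_aggregate`, `toy_reward`) are hypothetical names introduced here. In particular, the summary says the aggregation step is itself a trace generated by the model, whereas this sketch substitutes a simple majority vote so that it runs without a model. No training update is shown; the sketch only illustrates that the reward is computed on the final aggregated response.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple


def rollout(
    prompt: str,
    sample_trace: Callable[[str], str],
    aggregate: Callable[[str, List[str]], str],
    reward_fn: Callable[[str], float],
    num_parallel: int = 4,
) -> Tuple[List[str], str, float]:
    """Sample traces independently, aggregate them, and score the result."""
    # Parallel step: each trace depends only on the prompt, not on the others.
    traces = [sample_trace(prompt) for _ in range(num_parallel)]
    # Aggregation step: the final response is conditioned on all traces.
    final = aggregate(prompt, traces)
    # The reward is computed on the aggregated response only.
    return traces, final, reward_fn(final)


# Toy stand-ins (assumptions for illustration, not part of SPIRAL).
rng = random.Random(0)


def toy_sample_trace(prompt: str) -> str:
    # Pretend policy: answers "4" more often than "5".
    return rng.choice(["4", "4", "5"])


def toy_aggregate(prompt: str, traces: List[str]) -> str:
    # Majority vote stands in for a model-generated aggregation trace.
    return Counter(traces).most_common(1)[0][0]


def toy_reward(answer: str) -> float:
    # Binary correctness reward against a known target.
    return 1.0 if answer == "4" else 0.0


traces, final, reward = rollout(
    "What is 2 + 2?", toy_sample_trace, toy_aggregate, toy_reward
)
print(traces, final, reward)
```

In a training loop, the scalar returned here would be the signal used to update both the trace-sampling and aggregation behavior; how that credit is assigned across the set of traces is the part the summary attributes to set reinforcement learning and is not modeled in this sketch.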