Local Branch Routing: Efficient Trainable Test-Time Scaling for Language Models

The authors introduce Local Branch Routing (LBR), a token-level framework for improving language model reasoning through efficient test-time scaling. LBR expands a small local lookahead tree, forwards all sampled branches through the model, and uses a lightweight router to select which depth-1 subtree to commit. Each token decision can therefore draw on evidence from candidate local futures without incurring the computational cost of full solution-level search. The method uses a prune-shift-grow decoding process that preserves discrete branch identities and defines a tractable tree-trajectory likelihood. As a result, LBR supports end-to-end reinforcement learning with verifiable rewards (RLVR), jointly optimizing the base model and the router under the same likelihood-ratio principle as discrete-token RLVR. Experiments on synthetic hierarchical-planning tasks show that post-candidate hidden states provide useful routing evidence, and on mathematical reasoning benchmarks LBR improves both Pass@1 and Pass@32 over discrete chain-of-thought and other baselines.
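The prune-shift-grow loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the `Node` class, the `grow` and `decode` helpers, the fixed branching `width`, and `toy_router_score` are all hypothetical names introduced here. In particular, the router in this sketch scores a subtree by summing its token ids, standing in for a learned router that would read post-candidate hidden states, and `sample_token` stands in for sampling from the language model.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One sampled token in the local lookahead tree (hypothetical structure)."""
    token: int
    children: list = field(default_factory=list)


def grow(node, depth, width, sample_token):
    """Extend the tree below `node` until it is `depth` levels deep."""
    if depth == 0:
        return
    if not node.children:
        # Assumption: a fixed number of sampled branches per node.
        node.children = [Node(sample_token()) for _ in range(width)]
    for child in node.children:
        grow(child, depth - 1, width, sample_token)


def toy_router_score(node):
    """Stand-in router: sum of token ids in the subtree (NOT a learned router)."""
    return node.token + sum(toy_router_score(c) for c in node.children)


def tree_depth(node):
    return 0 if not node.children else 1 + max(tree_depth(c) for c in node.children)


def count_leaves(node):
    return 1 if not node.children else sum(count_leaves(c) for c in node.children)


def decode(steps, depth, width, sample_token, router_score):
    """Commit one token per step using a local lookahead tree."""
    root = Node(token=-1)  # placeholder standing for the committed prefix
    grow(root, depth, width, sample_token)
    committed = []
    for _ in range(steps):
        # Route: pick the depth-1 subtree with the highest router score.
        chosen = max(root.children, key=router_score)
        committed.append(chosen.token)
        # Prune and shift: drop the sibling subtrees; the chosen one becomes the root.
        root = chosen
        # Grow: extend the surviving branches back to the full lookahead depth.
        grow(root, depth, width, sample_token)
    return committed, root


if __name__ == "__main__":
    rng = random.Random(0)
    tokens, _ = decode(
        steps=5, depth=2, width=3,
        sample_token=lambda: rng.randrange(10),
        router_score=toy_router_score,
    )
    print(tokens)
```

Because the chosen subtree is reused rather than resampled, the branches that informed a routing decision keep their identities into the next step; only the newly exposed leaves are sampled. How the actual method batches branches through the model or computes the tree-trajectory likelihood is not covered by this sketch.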