ReLAR introduces a reinforcement-guided framework that iteratively refines an LLM's hidden states to make its reasoning more stable. Learned depth and action controllers, trained with policy gradients, adaptively decide the refinement steps for each input, yielding better accuracy and generation quality with lower inference overhead than explicit reasoning methods.
ReLAR: Reinforcement-Guided Latent Refinement for Stable LLM Reasoning
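The summary above describes the control loop only at a high level. As a rough illustration of the general idea, here is a toy Python sketch: a depth controller decides at each step whether to halt, an action controller picks a refinement operator, and both are trained jointly with REINFORCE (a basic policy-gradient method) using a moving-average baseline. This is not ReLAR's implementation. The vector "hidden state", the three step-size operators, the distance-based reward, the per-step cost, and every name (`STEP_SIZES`, `refine`, `run_episode`, `train`, etc.) are hypothetical stand-ins chosen so the example runs with only the standard library.

```python
import math
import random

# Toy stand-ins (hypothetical, not from ReLAR): a "hidden state" is a list of
# floats, each action moves it toward a fixed target by a different step size,
# and the reward stands in for whatever task signal the real method uses.
TARGET = [0.0, 0.0]
INIT_STATE = [1.0, -1.0]
STEP_SIZES = [0.1, 0.5, 0.9]  # one refinement operator per action
MAX_STEPS = 4                 # hard cap on refinement depth
STEP_COST = 0.05              # per-step penalty standing in for inference cost


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def refine(state, action):
    # Hypothetical refinement operator: h <- h + s * (target - h).
    s = STEP_SIZES[action]
    return [h + s * (t - h) for h, t in zip(state, TARGET)]


def reward(state, steps):
    # Negative squared distance to the target minus a cost per step taken.
    err = sum((h - t) ** 2 for h, t in zip(state, TARGET))
    return -err - STEP_COST * steps


def run_episode(theta, phi, rng, greedy=False):
    """Roll out both controllers; return (reward, grad_theta, grad_phi)."""
    state = list(INIT_STATE)
    g_theta = [0.0] * len(theta)
    g_phi = [0.0] * len(phi)
    steps = 0
    for t in range(MAX_STEPS):
        # Depth controller: halting probability sigmoid(phi[t]) at step t.
        p_halt = sigmoid(phi[t])
        halt = p_halt > 0.5 if greedy else rng.random() < p_halt
        # Gradient of log p(decision) w.r.t. phi[t].
        g_phi[t] += (1.0 - p_halt) if halt else -p_halt
        if halt:
            break
        # Action controller: softmax over refinement operators.
        probs = softmax(theta)
        if greedy:
            a = max(range(len(probs)), key=probs.__getitem__)
        else:
            a = rng.choices(range(len(probs)), weights=probs)[0]
        # Gradient of log softmax(theta)[a] is onehot(a) - probs.
        for k, p in enumerate(probs):
            g_theta[k] += (1.0 if k == a else 0.0) - p
        state = refine(state, a)
        steps += 1
    return reward(state, steps), g_theta, g_phi


def train(episodes=2000, lr=0.1, seed=0):
    # REINFORCE with an exponential-moving-average reward baseline.
    rng = random.Random(seed)
    theta = [0.0] * len(STEP_SIZES)
    phi = [0.0] * MAX_STEPS
    baseline = None
    for _ in range(episodes):
        r, g_theta, g_phi = run_episode(theta, phi, rng)
        if baseline is None:
            baseline = r
        adv = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        theta = [w + lr * adv * g for w, g in zip(theta, g_theta)]
        phi = [w + lr * adv * g for w, g in zip(phi, g_phi)]
    return theta, phi


if __name__ == "__main__":
    theta, phi = train()
    best = max(range(len(theta)), key=theta.__getitem__)
    r, _, _ = run_episode(theta, phi, random.Random(1), greedy=True)
    print("preferred step size:", STEP_SIZES[best], "greedy reward:", round(r, 4))
```

Because extra steps always reduce the error in this toy, the per-step cost is the only thing pushing the depth controller to halt early. It is a crude proxy for the accuracy-versus-inference-overhead trade-off the summary mentions.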