CAT: Confidence-Adaptive Thinking balances accuracy and latency in Large Reasoning Models

The authors propose Confidence-Adaptive Thinking (CAT), a framework that uses a model's intrinsic self-certainty signals to adjust reasoning length autonomously according to problem difficulty. The approach targets overthinking in Large Reasoning Models, which causes significant token overhead and reduces inference efficiency.

CAT incorporates self-certainty into the preference optimization process to compress confident responses while deliberating on uncertain ones.
Because it neither applies uniform length reduction nor relies on coarse-grained difficulty estimation, the method avoids performance degradation on difficult problems.
Experimental results show that CAT consistently outperforms state-of-the-art baselines in reasoning accuracy across multiple benchmarks and base models.
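
The summary does not spell out how self-certainty enters preference optimization, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes self-certainty is measured as the mean KL divergence from the uniform distribution to each next-token distribution (one definition used in prior work), and it assumes a simple pairing rule: for confident prompts the shortest correct response is preferred, and for uncertain prompts the longest correct one is preferred. The names `self_certainty`, `Sample`, `build_preference_pair`, and `threshold` are hypothetical.

```python
import math
from dataclasses import dataclass


def self_certainty(token_dists):
    """Mean KL(U || p) over per-token distributions (assumed definition).

    A uniform distribution scores 0; more peaked distributions score higher.
    """
    total = 0.0
    for p in token_dists:
        v = len(p)
        # KL(U || p) = -(1/v) * sum_j log(v * p_j); clamp to avoid log(0).
        total += -sum(math.log(v * max(pj, 1e-12)) for pj in p) / v
    return total / len(token_dists)


@dataclass
class Sample:
    text: str
    num_tokens: int
    correct: bool
    certainty: float  # e.g. self_certainty(...) of this response


def build_preference_pair(samples, threshold):
    """Return (chosen, rejected) for one prompt, or None if no pair exists.

    Hypothetical rule: compress when the model is confident, otherwise
    keep the longer reasoning trace.
    """
    correct = [s for s in samples if s.correct]
    if len(correct) < 2:
        return None
    by_len = sorted(correct, key=lambda s: s.num_tokens)
    shortest, longest = by_len[0], by_len[-1]
    if shortest.num_tokens == longest.num_tokens:
        return None
    mean_certainty = sum(s.certainty for s in correct) / len(correct)
    if mean_certainty >= threshold:
        return shortest, longest  # confident: prefer the shorter answer
    return longest, shortest      # uncertain: prefer more deliberation
```

Pairs built this way could be passed to any standard preference-optimization objective; the choice of `threshold` and of the pairing rule for uncertain prompts are the parts most likely to differ from the actual method.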

The work offers a potentially robust solution for balancing accuracy and latency in practical industrial scenarios.