REVES introduces a two-stage iterative framework that enhances LLM reasoning through sequential revision and verification. It achieves +6.5 points over RL baselines and +4.0 points over standard multi-turn training on LiveCodeBench, using a 4B base model with fewer rollouts than large evolutionary systems. The method improves error correction and generalizes to out-of-distribution puzzles like n_queens and mini_sudoku.
arXiv cs.LG · 7d ago · research
REVES: Augmented Training for Test-Time Scaling
Importance 3/3
Beats a top-lab benchmark
New feature vs. leaders
OpenAI
Google DeepMind
Meta AI
Code generation
Reasoning models
Training methods
Benchmarks
| Benchmark | Model | Gain over RL baselines |
|---|---|---|
| LiveCodeBench | REVES (4B base model) | +6.5 points |