ContextRL introduces an indirect auxiliary objective to improve long-horizon reasoning and multimodal performance in LLMs. It rewards the model for selecting the context that supports a given query-answer pair, using contrastive context data built from coding agent trajectories and image-based visual questions. ContextRL reports gains of +2.2% and +1.8% over standard methods on long-horizon and visual QA benchmarks, respectively. The gains are attributed to the selection objective rather than to data augmentation.
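As a rough illustration of the selection objective described above, here is a minimal sketch of a context-selection reward. The summary does not give the paper's actual reward formulation, so the binary reward, the additive combination with a task reward, and every name here (`ContrastiveExample`, `selection_reward`, `combined_reward`, `aux_weight`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ContrastiveExample:
    """One hypothetical contrastive item: a query-answer pair plus candidate contexts."""
    query: str
    answer: str
    contexts: Sequence[str]   # one supporting context, the rest are distractors
    supporting_index: int     # position of the context that supports the answer


def selection_reward(example: ContrastiveExample, chosen_index: int) -> float:
    """Return 1.0 if the model picked the supporting context, else 0.0 (assumed binary reward)."""
    if not 0 <= chosen_index < len(example.contexts):
        raise ValueError("chosen_index is out of range for the candidate contexts")
    return 1.0 if chosen_index == example.supporting_index else 0.0


def combined_reward(task_reward: float, example: ContrastiveExample,
                    chosen_index: int, aux_weight: float = 0.5) -> float:
    """Add the auxiliary selection reward to a task reward (additive weighting is an assumption)."""
    return task_reward + aux_weight * selection_reward(example, chosen_index)


example = ContrastiveExample(
    query="Which file defines the failing test?",
    answer="tests/test_parser.py",
    contexts=["log of an unrelated build step", "trace that names tests/test_parser.py"],
    supporting_index=1,
)
print(selection_reward(example, 1))
print(combined_reward(1.0, example, 0))
```

In this sketch a wrong selection leaves the task reward unchanged, so the auxiliary term only ever adds signal. Whether ContextRL penalizes wrong selections is not stated in the summary.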
arXiv cs.CL · 9d ago · research
ContextRL: Context-Aware RL for LLMs
Importance 3/3
New feature vs. leaders
New harness with differentiators
OpenAI
Google DeepMind
Mistral AI
AI agents
Multimodal
Reasoning models
Benchmarks
| Benchmark | Model | Gain over baseline |
|---|---|---|
| SWE-bench | ContextRL | +2.2% |