Answer-in-context diagnostic and submodular packing improve budget-constrained multi-hop RAG

Researchers introduce "answer-in-context," a diagnostic measuring whether gold answers survive as contiguous spans in packed reader contexts, arguing it is superior to document recall for budget-constrained retrieval-augmented generation. They also propose casting reader-context construction as budgeted monotone submodular maximization to jointly optimize relevance, coverage, representativeness, and diversity.
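
As an illustration of the diagnostic described above, the following minimal sketch checks whether a gold answer survives as a contiguous span once passages have been packed into a fixed budget. The function names (`normalize`, `pack_naive`, `answer_in_context`), the whitespace tokenization, and the lowercase-and-strip-punctuation normalization are all assumptions made for this example; the study's own tokenizer and matching rules may differ.

```python
import string


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, split on whitespace (assumed normalization)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def pack_naive(passages: list[str], budget: int) -> list[str]:
    """Concatenate passages in rank order, truncating at `budget` whitespace tokens."""
    packed: list[str] = []
    for passage in passages:
        remaining = budget - len(packed)
        if remaining <= 0:
            break
        packed.extend(passage.split()[:remaining])
    return packed


def answer_in_context(context_tokens: list[str], gold_answer: str) -> bool:
    """Return True if the normalized gold answer is a contiguous span of the context."""
    answer = normalize(gold_answer)
    context = normalize(" ".join(context_tokens))
    if not answer:
        return False
    n = len(answer)
    return any(context[i:i + n] == answer for i in range(len(context) - n + 1))
```

Under these assumptions, a document can count as recalled while its answer span is cut off by the budget: packing "The Eiffel Tower is in Paris, France." followed by "Gustave Eiffel designed it." into 8 tokens keeps only "Gustave" from the second passage, so the answer "Gustave Eiffel" does not survive even though its source passage was retrieved.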

Answer-in-context correlates with answer F1 more strongly than document recall does (r = 0.39 to 0.55 vs. roughly 0.31) and separates answer quality five-fold on HotpotQA.
The submodular packer beats maximal marginal relevance (MMR) and naive packing by up to +5.1 F1 on HotpotQA with a 160-token budget and a 3B reader.
Gains require multi-hop structure, effective retrieval, binding budgets, and readers where evidence density is the bottleneck.
The advantage over heuristic packers disappears with 7B readers and reverses with 14B readers, a pattern the diagnostic accounts for.
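
To make the idea of budgeted monotone submodular packing concrete, here is a minimal sketch using cost-scaled greedy selection. The objective is a stand-in chosen for this example (non-negative relevance plus query-term coverage), not the study's combination of relevance, coverage, representativeness, and diversity; `Passage`, `objective`, `greedy_pack`, and the `alpha` weight are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    text: str
    relevance: float  # hypothetical retriever score, assumed non-negative

    @property
    def cost(self) -> int:
        # Token cost approximated by whitespace tokens (an assumption).
        return len(self.text.split())

    @property
    def terms(self) -> frozenset[str]:
        return frozenset(self.text.lower().split())


def objective(selected: list[Passage], query_terms: set[str], alpha: float = 1.0) -> float:
    """Illustrative objective: summed relevance plus number of query terms covered.

    Both parts are monotone submodular, so their weighted sum is as well.
    """
    relevance = sum(p.relevance for p in selected)
    covered = set().union(*(p.terms for p in selected)) & query_terms
    return relevance + alpha * len(covered)


def greedy_pack(candidates: list[Passage], query_terms: set[str], budget: int) -> list[Passage]:
    """Repeatedly add the passage with the best marginal gain per token that still fits."""
    selected: list[Passage] = []
    remaining = list(candidates)
    spent = 0
    while True:
        base = objective(selected, query_terms)
        best, best_ratio = None, 0.0
        for p in remaining:
            if p.cost == 0 or spent + p.cost > budget:
                continue  # skip empty passages and those that would exceed the budget
            ratio = (objective(selected + [p], query_terms) - base) / p.cost
            if ratio > best_ratio:
                best, best_ratio = p, ratio
        if best is None:
            return selected
        selected.append(best)
        remaining.remove(best)
        spent += best.cost
```

Because the coverage term has diminishing returns, a near-duplicate passage adds little once its terms are already covered, which is what lets the packer spend a tight budget on complementary evidence. Cost-scaled greedy on its own is a heuristic; the usual constant-factor guarantee for a budget constraint needs an extra step, such as comparing the result against the best single passage.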

The study demonstrates that optimizing for answer survival rather than recall improves performance in specific multi-hop scenarios with limited context budgets.