Synthetic RAG benchmark shows document shape yields bigger gains than model tweaks

A benchmark built on a synthetic healthcare database shows that optimizing data representation, for example with rollup documents and Small-to-Big retrieval, improves answer quality far more than standard RAG upgrades such as query rewriting and reranking.

The author built a 30-question evaluation set over synthetic patient, doctor, and billing records and used it to compare several RAG techniques.
Basic vector search achieved an answer score of 2.856/5, while adding query rewriting and BGE reranking only raised it to 3.056/5.
Small-to-Big retrieval (searching over small chunks, then expanding each hit to its full record) raised the score to 4.044/5: small chunks keep matching precise, and returning the whole record keeps the model from being starved of context.
Adding precomputed rollup documents for aggregates such as appointment loads and billing totals raised the answer score to 4.622/5 and the hard-question score to 4.500/5.
A final run with a Jina reranker achieved the highest retrieval MRR (mean reciprocal rank) of any configuration, 0.792, but the rollup setup still produced the best overall answer quality.
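Small-to-Big retrieval can be sketched roughly as follows. This is a minimal illustration, not the author's code: it assumes each record is split into one small chunk per line, and it uses a toy bag-of-words cosine similarity (the hypothetical helpers `embed` and `cosine`) in place of a real embedding model and vector store. Chunks are ranked against the query, then each top-ranked chunk is swapped for its full parent record, with duplicates removed.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    parent_id: str  # id of the full record this chunk was cut from
    text: str


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): bag-of-words term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_record(record_id: str, text: str) -> list[Chunk]:
    # "Small" side: one chunk per non-empty line (field) of the record.
    return [Chunk(record_id, line) for line in text.splitlines() if line.strip()]


def small_to_big(query: str, records: dict[str, str], k: int = 3) -> list[str]:
    """Rank small chunks against the query, then return the full parent records of the top-k chunks."""
    chunks = [c for rid, text in records.items() for c in split_record(rid, text)]
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    parent_ids: list[str] = []
    for chunk in ranked[:k]:
        if chunk.parent_id not in parent_ids:  # several chunks may share one parent
            parent_ids.append(chunk.parent_id)
    # "Big" side: hand the generator whole records, not the matched fragments.
    return [records[pid] for pid in parent_ids]


if __name__ == "__main__":
    records = {
        "patient-001": "Name: Ana Ruiz\nDiagnosis: asthma\nPrimary doctor: Dr. Lee",
        "patient-002": "Name: Ben Okafor\nDiagnosis: diabetes\nPrimary doctor: Dr. Shah",
    }
    print(small_to_big("Which patient has diabetes?", records, k=1))
```

The query matches only the short "Diagnosis: diabetes" chunk, yet the generator receives Ben Okafor's whole record, including his doctor. A query such as "diabetes Dr. Shah" hits two chunks of the same record, which the deduplication step collapses into a single returned record.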
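Precomputed rollup documents might be built along these lines. The source does not describe the rollup format, so the input field names (`doctor`, `patient`, `amount_cents`), the hypothetical `build_rollups` function, and the wording of the generated documents are all assumptions. The idea is to turn aggregates such as appointment loads and billing totals into short text documents that are indexed next to the raw records.

```python
from collections import Counter, defaultdict


def build_rollups(appointments: list[dict], bills: list[dict]) -> list[str]:
    """Precompute aggregate facts as short text documents that can be indexed like any record."""
    # Appointment load: number of appointments per doctor.
    load = Counter(appt["doctor"] for appt in appointments)
    # Billing total per patient, summed in integer cents to avoid float drift.
    totals = defaultdict(int)
    for bill in bills:
        totals[bill["patient"]] += bill["amount_cents"]
    docs = [f"Appointment count for {doctor}: {n}" for doctor, n in sorted(load.items())]
    docs += [
        f"Billing total for {patient}: ${cents / 100:,.2f}"
        for patient, cents in sorted(totals.items())
    ]
    return docs


if __name__ == "__main__":
    appointments = [{"doctor": "Dr. Lee"}, {"doctor": "Dr. Shah"}, {"doctor": "Dr. Lee"}]
    bills = [
        {"patient": "Ana Ruiz", "amount_cents": 12000},
        {"patient": "Ana Ruiz", "amount_cents": 4550},
        {"patient": "Ben Okafor", "amount_cents": 9900},
    ]
    for doc in build_rollups(appointments, bills):
        print(doc)
    # Appointment count for Dr. Lee: 2
    # Appointment count for Dr. Shah: 1
    # Billing total for Ana Ruiz: $165.50
    # Billing total for Ben Okafor: $99.00
```

With rollups in the index, a question like "How much was Ana Ruiz billed in total?" can be answered from one retrieved document. Without them, the model would have to sum individual billing rows, and retrieval may not surface all of them.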
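Mean reciprocal rank (MRR), the retrieval metric the Jina reranker run led on, is the average over questions of 1/rank of the first relevant retrieved document, with 0 when nothing relevant is retrieved. The sketch below uses that standard definition. The source does not say how relevance was judged or whether a rank cutoff was applied, so it assumes one set of relevant document ids per question and no cutoff.

```python
def mean_reciprocal_rank(results: list[list[str]], relevant: list[set[str]]) -> float:
    """Average of 1/rank of the first relevant document per query (0 if none is retrieved)."""
    if len(results) != len(relevant):
        raise ValueError("need exactly one relevance set per query")
    total = 0.0
    for ranked, gold in zip(results, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(results) if results else 0.0


if __name__ == "__main__":
    results = [["a", "b", "c"], ["x", "y"], ["p", "q", "r"]]
    relevant = [{"a"}, {"y"}, {"z"}]
    print(mean_reciprocal_rank(results, relevant))  # → 0.5, i.e. (1 + 1/2 + 0) / 3
```

Because MRR only scores where the first relevant document lands, it says nothing about whether the retrieved text actually contains the facts the answer needs. That is how one configuration can lead on MRR while another, such as the rollup setup, produces better answers.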

The results suggest that RAG quality is often a data representation problem rather than a model problem: document structure should be aligned with the query types it has to serve, such as entity-level lookups or aggregate calculations.