The authors propose RAPS-DA, a regime-aware peer specialization framework that addresses the fragility of retrieval-augmented generation (RAG) when retrieved context conflicts with a model's parametric knowledge. The approach disentangles incompatible learning signals across reliability regimes by training specialized peers and applying targeted supervision.

Conflicts are categorized into three regimes (Grounding, Arbitration, and Resistance), and one same-scale peer specialist is trained per regime from a shared base model. At the sample level, each sample is hard-routed to its matched peer, which supervises the student with an on-policy reverse-KL objective. At the token level, a dual-layer selector filters uninformative tokens and upweights confidently misaligned ones, using inter-teacher disagreement and student entropy. The gains come from specialization at a fixed model scale; the peer specialists exist only during training. Experiments show that RAPS-DA surpasses all prompting, decoding, fine-tuning, RL, and single-teacher baselines across five conflict scenarios and two out-of-distribution benchmarks.
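
The routing and token selection described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the disagreement measure (mean KL of each peer from the peer mixture), the argmax-based test for misalignment, the thresholds `tau_dis` and `tau_ent`, the `boost` factor, and all function names are assumptions chosen for illustration. The logits are assumed to be scored on a continuation already sampled from the student (the on-policy part), and sampling itself is omitted.

```python
import numpy as np

REGIMES = ("grounding", "arbitration", "resistance")


def log_softmax(logits):
    """Numerically stable log-softmax over the vocabulary axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def reverse_kl(student_logp, teacher_logp):
    """Per-token KL(student || teacher), shape (T,)."""
    return (np.exp(student_logp) * (student_logp - teacher_logp)).sum(axis=-1)


def teacher_disagreement(teacher_logps):
    """Assumed measure: mean KL of each peer from the peer mixture, shape (T,)."""
    logps = np.stack([teacher_logps[r] for r in REGIMES])  # (K, T, V)
    probs = np.exp(logps)
    log_mix = np.log(probs.mean(axis=0) + 1e-12)           # (T, V)
    return (probs * (logps - log_mix)).sum(axis=-1).mean(axis=0)


def token_weights(student_logp, teacher_logps, regime,
                  tau_dis=0.05, tau_ent=0.5, boost=2.0):
    """Hypothetical dual-layer selector; thresholds and boost are assumptions."""
    matched = teacher_logps[regime]
    entropy = -(np.exp(student_logp) * student_logp).sum(axis=-1)
    # Layer 1: drop tokens on which the peers agree (treated as uninformative).
    informative = teacher_disagreement(teacher_logps) > tau_dis
    # Layer 2: upweight tokens where the student is confident (low entropy)
    # but its top token differs from the matched peer's top token.
    confident = entropy < tau_ent
    misaligned = student_logp.argmax(axis=-1) != matched.argmax(axis=-1)
    weights = np.where(confident & misaligned, boost, 1.0)
    return np.where(informative, weights, 0.0)


def routed_loss(student_logits, teacher_logits, regime):
    """Weighted reverse-KL against the peer matched to the sample's regime."""
    student_logp = log_softmax(student_logits)
    teacher_logps = {r: log_softmax(teacher_logits[r]) for r in REGIMES}
    weights = token_weights(student_logp, teacher_logps, regime)
    kl = reverse_kl(student_logp, teacher_logps[regime])  # hard routing
    return float((weights * kl).sum() / max(weights.sum(), 1e-8))


# Toy example: 3 tokens, vocabulary of size 2.
student = np.array([[2.0, 0.0], [-4.0, 4.0], [0.0, 0.0]])
teachers = {
    "grounding":   np.array([[2.0, 0.0], [3.0, -3.0], [3.0, -3.0]]),
    "arbitration": np.array([[2.0, 0.0], [-3.0, 3.0], [-3.0, 3.0]]),
    "resistance":  np.array([[2.0, 0.0], [-3.0, 3.0], [-3.0, 3.0]]),
}
loss = routed_loss(student, teachers, regime="grounding")
```

In this toy input the peers agree on the first token, so it is filtered out; on the second token the student is confident but disagrees with the Grounding peer, so it is upweighted; on the third the student is uncertain, so it keeps unit weight. Only the regime label of the training sample selects the peer, which is consistent with the summary's point that labels and peers are needed during training only.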

This framework allows the deployed student model to handle heterogeneous knowledge conflicts without requiring regime labels or access to peer specialists during inference.