Bad context in a conversation can lead to 'pigeonholing', where a model keeps repeating an incorrect answer or narrows down to a single response. Experiments show performance drops of 38-40%, with errors worsening as the number of conversation turns grows, even when the initial inputs are correct. A new method, RLVR (reinforcement learning with verifiable rewards) with synthetic errors, improves model performance by 43-60% under such bad contexts.
Bad Prompts Cause Model Collapse and Mistakes