CEAP Reduces Variance in LLM Circuit Discovery
CEAP, a new circuit discovery method, substantially reduces resampling variance compared to EAP-IG. The paper shows that rephrasing variance arises because different prompt templates activate different circuits, which suggests that LLMs are inherently hard to steer across diverse inputs. Sample-wise variance is largely benign: high unfaithfulness scores result from selective contribution scaling, not from defects in the circuit.