CircuitLasso proposes a scalable method for learning sparse circuits in large language models using sparse linear regression. It achieves structural accuracy comparable to state-of-the-art intervention-based methods at significantly lower computational cost. It also enables efficient discovery of how semantic features propagate through the model and improves performance on domain-generalization tasks, again at reduced cost.
CircuitLasso: Scalable Circuit Learning for LLM Interpretability
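The summary names sparse linear regression as the core tool but does not spell out the setup. The sketch below illustrates the general idea only, under a loudly stated assumption: the activation of one downstream node is regressed on the activations of candidate upstream nodes with an L1 penalty, and nonzero coefficients are read as candidate circuit edges. The function names (`lasso_coordinate_descent`, `candidate_edges`), the toy data, and the penalty value are all hypothetical and are not taken from CircuitLasso itself.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, since w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes j's own contribution.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def candidate_edges(weights, tol=1e-8):
    """Indices of upstream nodes whose coefficient is nonzero (hypothetical edge rule)."""
    return [j for j, wj in enumerate(weights) if abs(wj) > tol]


# Toy data (illustrative only): 6 upstream "nodes", of which only 0 and 3
# actually drive the downstream node.
rng = random.Random(0)
n_samples, n_upstream = 200, 6
X = [[rng.gauss(0.0, 1.0) for _ in range(n_upstream)] for _ in range(n_samples)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0.0, 0.05) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
edges = candidate_edges(weights)
print("coefficients:", [round(wj, 3) for wj in weights])
print("candidate upstream edges:", edges)
```

The L1 penalty is what produces sparsity here: coefficients of upstream nodes that do not help predict the downstream activation are set exactly to zero, so the edge set falls out of a single regression fit rather than a separate intervention per edge. The surviving coefficients are shrunk toward zero by the penalty, so they indicate which edges exist more reliably than how strong they are.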