SAE-Guided Activation Regularization for LLM Continual Learning
This paper proposes a new approach to mitigating catastrophic forgetting in large language models. Instead of constraining parameters as traditional weight-space methods such as Elastic Weight Consolidation (EWC) do, it regularizes in activation space, using pretrained Sparse Autoencoders (SAEs) as a dictionary of monosemantic features.
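As a rough illustration, the code below is a minimal sketch of what an SAE-guided activation penalty could look like. It assumes a standard ReLU SAE encoder, f(h) = ReLU(W_enc (h - b_dec) + b_enc), and an EWC-style quadratic penalty applied to SAE feature activations rather than to weights. The function names, the per-feature `importance` weights, and the coefficient `lam` are hypothetical and not taken from the paper. Here `h_new` stands for an activation from the model being fine-tuned and `h_ref` for the activation of a frozen reference copy on the same input.

```python
"""Hypothetical sketch of an SAE-guided activation-space penalty.

Assumptions (not from the source): a ReLU SAE encoder and an EWC-style
quadratic penalty on SAE feature activations instead of on weights.
"""
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sae_encode(h: Sequence[float], w_enc: Matrix, b_enc: Sequence[float],
               b_dec: Sequence[float]) -> Vector:
    """Encode an activation vector h into (sparse) SAE feature activations."""
    # Subtract the decoder bias before encoding, as in common SAE setups.
    centered = [hi - bi for hi, bi in zip(h, b_dec)]
    feats = []
    for row, b in zip(w_enc, b_enc):
        pre = sum(w * x for w, x in zip(row, centered)) + b
        feats.append(max(0.0, pre))  # ReLU keeps feature activations non-negative
    return feats


def sae_feature_penalty(h_new: Sequence[float], h_ref: Sequence[float],
                        w_enc: Matrix, b_enc: Sequence[float],
                        b_dec: Sequence[float], importance: Sequence[float],
                        lam: float) -> float:
    """Return lam * sum_i importance_i * (f_i(h_new) - f_i(h_ref))**2."""
    f_new = sae_encode(h_new, w_enc, b_enc, b_dec)
    f_ref = sae_encode(h_ref, w_enc, b_enc, b_dec)
    # Weighted squared drift of each dictionary feature (hypothetical weighting).
    return lam * sum(w * (a - b) ** 2
                     for w, a, b in zip(importance, f_new, f_ref))


if __name__ == "__main__":
    w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 features, 2-dim activations
    b_enc = [0.0, 0.0, -1.0]
    b_dec = [0.0, 0.0]
    penalty = sae_feature_penalty([1.0, 1.0], [1.0, 2.0], w_enc, b_enc, b_dec,
                                  importance=[1.0, 1.0, 2.0], lam=0.5)
    print(penalty)
```

Under these assumptions the training objective would be the task loss plus this penalty. Plain Python lists keep the sketch self-contained; a real implementation would use tensors so that gradients flow through `h_new`. The design choice it illustrates is that drift is measured on interpretable SAE features, so the constraint can be concentrated on features that matter for earlier tasks instead of on every parameter, as EWC does.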