Researchers introduce LACUNA, the first unlearning testbed with ground-truth parameter-level localization. It addresses a gap in current evaluation: checking whether unlearning truly erases knowledge from model parameters. The testbed injects personally identifiable information (PII) of synthetic individuals into predefined parameters of 1B and 7B OLMo-based models via masked continual pretraining.
- LACUNA enables direct evaluation of whether unlearning targets the weights responsible for knowledge storage.
- Benchmarking reveals that current state-of-the-art unlearning methods are highly imprecise in the weights they target, despite strong output-level performance.
- Existing methods remain susceptible to resurfacing attacks even when they appear effective at the output level.
- Successful localization allows simple gradient-based unlearning to achieve strong erasure and robustness.
The authors release LACUNA to complement behavioral evaluations and drive further advances in robust, localization-based unlearning.
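
The idea of confining knowledge to predefined parameters, and then checking whether unlearning touches those same parameters, can be illustrated with a toy sketch. This is not LACUNA's implementation: it assumes a four-weight linear model in plain Python, and every name in it (`masked_step`, `localization_precision`, the example mask and data) is hypothetical. Gradient ascent stands in here as one simple gradient-based unlearning method; the summary above does not say which variant the authors use.

```python
# Toy illustration only: a linear model y = w . x with hypothetical names.
# It is NOT the LACUNA code; it only mimics the masked-update idea.

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def grad_squared_error(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w."""
    err = predict(w, x) - y
    return [err * xi for xi in x]

def masked_step(w, grad, mask, lr, ascent=False):
    """Update only coordinates where mask == 1; the rest stay frozen."""
    sign = 1.0 if ascent else -1.0
    return [wi + sign * lr * gi * mi for wi, gi, mi in zip(w, grad, mask)]

def localization_precision(w_before, w_after, mask, tol=1e-12):
    """Fraction of changed weights that lie inside the ground-truth mask."""
    changed = [abs(a - b) > tol for a, b in zip(w_before, w_after)]
    n_changed = sum(changed)
    if n_changed == 0:
        return 0.0
    hits = sum(1 for c, m in zip(changed, mask) if c and m)
    return hits / n_changed

# Assumed toy setup: weights 1 and 2 are the "predefined parameters".
w0 = [0.5, -0.2, 0.1, 0.3]
mask = [0, 1, 1, 0]
all_weights = [1, 1, 1, 1]
x, y = [1.0, 2.0, -1.0, 0.5], 3.0  # one synthetic "fact" to inject

# Injection: masked gradient descent, so only masked weights absorb the fact.
w_inj = list(w0)
for _ in range(10):
    w_inj = masked_step(w_inj, grad_squared_error(w_inj, x, y), mask, lr=0.05)

# Localized unlearning: gradient ascent restricted to the known mask.
w_loc = list(w_inj)
for _ in range(10):
    w_loc = masked_step(w_loc, grad_squared_error(w_loc, x, y), mask,
                        lr=0.05, ascent=True)

# Unlocalized unlearning: the same ascent applied to every weight.
w_glob = list(w_inj)
for _ in range(10):
    w_glob = masked_step(w_glob, grad_squared_error(w_glob, x, y), all_weights,
                         lr=0.05, ascent=True)

print(localization_precision(w_inj, w_loc, mask))   # 1.0
print(localization_precision(w_inj, w_glob, mask))  # 0.5
```

In this toy setting both unlearning runs raise the error on the injected example, so they look similar at the output level, yet only the masked run confines its changes to the weights that actually store the fact. A ground-truth mask is what makes that difference measurable.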