KVEraser: Efficient Localized Context Erasing in LLMs
KVEraser enables efficient localized context erasing in large language models. Rather than recomputing the cache, it replaces only the KV cache states of the erased span with learned steering states. Across context lengths from 1K to 32K tokens, it comes close to full-recomputation performance on in-domain tasks with only a 24% increase in latency. On long-document QA, it outperforms other approximate methods while running 3–4x faster than full recomputation.
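The core cache edit can be illustrated with a toy sketch. It assumes a single-layer, single-head cache holding one key and one value vector per token, and one steering state per erased position. The `KVCache` class, the `erase_span` function, and the steering vectors are hypothetical. The sketch also ignores multiple layers and heads, positional encodings, and how steering states are actually learned. It shows only why the edit is cheap: positions outside the erased span are reused as-is, with no recomputation.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class KVCache:
    """Toy single-layer, single-head KV cache: one key and one value vector per token."""
    keys: List[Vector]
    values: List[Vector]


def erase_span(cache: KVCache, start: int, end: int,
               steering_keys: List[Vector], steering_values: List[Vector]) -> KVCache:
    """Return a new cache whose positions [start, end) hold steering states.

    Hypothetical sketch: assumes one steering key/value pair per erased position.
    Entries outside the span are reused unchanged, so nothing is recomputed.
    """
    if not (0 <= start <= end <= len(cache.keys)):
        raise ValueError("span out of range")
    span = end - start
    if len(steering_keys) != span or len(steering_values) != span:
        raise ValueError("need one steering state per erased position")
    # Splice: prefix + steering states + suffix (prefix and suffix left untouched).
    keys = cache.keys[:start] + [list(k) for k in steering_keys] + cache.keys[end:]
    values = cache.values[:start] + [list(v) for v in steering_values] + cache.values[end:]
    return KVCache(keys, values)


# Usage: erase tokens 2 and 3 from a 6-token cache.
cache = KVCache(keys=[[float(i)] for i in range(6)],
                values=[[10.0 * i] for i in range(6)])
erased = erase_span(cache, 2, 4,
                    steering_keys=[[-1.0], [-1.0]],
                    steering_values=[[0.0], [0.0]])
print(erased.keys)  # [[0.0], [1.0], [-1.0], [-1.0], [4.0], [5.0]]
```

In this toy model, full recomputation would have to re-encode every token from position 2 onward. The span-local edit writes only the two erased positions.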