KVEraser: Efficient Localized Context Erasing in LLMs
KVEraser enables efficient localized context erasing in large language models by replacing only the KV cache states of the erased span with learned steering states, leaving the rest of the cache untouched. It achieves near-full-recomputation performance on in-domain tasks while increasing latency by only 24%, compared with a 17.6x increase for full recomputation, and delivers speedups of up to 3--4x on long-document QA tasks.
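
The span-replacement idea can be illustrated with a minimal sketch. This is not the authors' implementation: the name `erase_span`, the nested-list cache layout (layers x positions x state vector), the merging of keys and values into a single state per position, and the assumption of one steering state per erased position are all hypothetical simplifications. How the steering states are learned is outside the scope of the sketch; they are simply passed in.

```python
from typing import List

Vector = List[float]
LayerCache = List[Vector]  # one cached state vector per token position


def erase_span(
    cache: List[LayerCache],
    steering: List[LayerCache],
    start: int,
    end: int,
) -> List[LayerCache]:
    """Return a new cache with positions [start, end) swapped for steering states.

    Hypothetical helper: assumes one steering state per erased position and
    treats keys and values as a single combined state for brevity.
    """
    if not 0 <= start <= end:
        raise ValueError("span must satisfy 0 <= start <= end")
    if len(cache) != len(steering):
        raise ValueError("cache and steering must have the same number of layers")

    patched: List[LayerCache] = []
    for layer_states, layer_steer in zip(cache, steering):
        if end > len(layer_states):
            raise ValueError("span extends past the cached sequence")
        if len(layer_steer) != end - start:
            raise ValueError("need one steering state per erased position")
        # States outside the span are reused as-is; nothing is recomputed.
        patched.append(layer_states[:start] + layer_steer + layer_states[end:])
    return patched


# Toy cache: 2 layers, 6 token positions, 2-dimensional states.
cache = [[[float(t), float(t)] for t in range(6)] for _ in range(2)]
# Placeholder steering states for the erased span [2, 4).
steering = [[[-1.0, -1.0], [-1.0, -1.0]] for _ in range(2)]

patched = erase_span(cache, steering, start=2, end=4)
```

Only the entries inside the span change, so the cost of the edit scales with the span length rather than with the full context, which is the intuition behind avoiding full recomputation.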