InfoKV: Information-Aware KV Cache Compression for Long Reasoning
Researchers introduce InfoKV, an entropy-aware framework that compresses key-value (KV) caches by combining token-level predictive uncertainty with attention scores, aiming to improve long-context reasoning.
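
To make the idea concrete, here is a minimal sketch of how an entropy signal and an attention signal could be blended into one keep-or-evict score per cached token. It is not the InfoKV algorithm itself, whose exact scoring rule is not given here. The function names, the min-max normalization, the mixing weight alpha, the fixed budget, and the choice to favor high-entropy tokens are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_tokens_to_keep(attn_scores, token_probs, budget, alpha=0.5):
    """Return the sorted indices of cached tokens to retain.

    attn_scores: one accumulated attention score per cached token.
    token_probs: one predictive distribution per cached token.
    budget:      number of tokens to keep (hypothetical parameter).
    alpha:       weight on attention vs. entropy (hypothetical parameter).

    Assumption: tokens with higher predictive entropy are treated as more
    informative and therefore more worth keeping.
    """
    if len(attn_scores) != len(token_probs):
        raise ValueError("need one distribution per attention score")

    attn_norm = min_max(attn_scores)
    ent_norm = min_max([entropy(p) for p in token_probs])

    # Convex combination of the two normalized signals.
    combined = [alpha * a + (1.0 - alpha) * e
                for a, e in zip(attn_norm, ent_norm)]

    budget = max(0, min(budget, len(combined)))
    ranked = sorted(range(len(combined)),
                    key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:budget])


# Toy example: four cached tokens, keep two of them.
attn = [0.40, 0.05, 0.30, 0.25]
probs = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
kept = select_tokens_to_keep(attn, probs, budget=2)
```

In this sketch, setting alpha to 1.0 reduces the rule to pure attention-based eviction, and setting it to 0.0 ranks tokens by entropy alone, so the parameter makes it easy to see how much each signal contributes.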