The authors propose MATCH, a framework that augments sparsified attention mechanisms with dynamically integrated in-context information to address the scalability bottlenecks of traditional attention in long-context scenarios.

  • Addresses both the quadratic computational cost of full attention and the performance degradation caused by rigid structural constraints, such as local attention windows, in sparse alternatives.
  • Integrates an efficient retrieval system to dynamically incorporate in-context information into sparse attention architectures.
  • Demonstrates significant performance improvements on both synthetic and real-world natural-language tasks requiring precise long-range recall.
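The bullets above describe the combination only at a high level. The sketch below illustrates one simple way, assumed here for illustration, that retrieval could be layered onto sparse attention. Each query attends to a causal local window plus the `top_k` best-scoring earlier tokens that fall outside that window. The function names (`retrieval_augmented_local_attention`, `attend`), the parameters (`window`, `top_k`), and the dot-product retrieval rule are hypothetical and are not taken from MATCH itself.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector], values: List[Vector], idx: List[int]) -> Vector:
    """Scaled dot-product softmax attention of one query over positions `idx`."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, keys[j]) * scale for j in idx]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, j in zip(weights, idx):
        for c, val in enumerate(values[j]):
            out[c] += (w / total) * val
    return out


def retrieval_augmented_local_attention(
    q: List[Vector], k: List[Vector], v: List[Vector], window: int, top_k: int
) -> List[Vector]:
    """Hypothetical sketch (not the MATCH algorithm): causal sliding-window
    attention in which each query also attends to the top_k highest-scoring
    keys that lie outside its local window."""
    outputs = []
    for t in range(len(q)):
        start = max(0, t - window + 1)
        local = list(range(start, t + 1))  # rigid local window, always includes t
        # Retrieval step (assumed): score every out-of-window earlier key and keep
        # the top_k. This exhaustive scan is for clarity only; an efficient
        # retrieval system would use an index instead.
        distant = range(0, start)
        retrieved = sorted(distant, key=lambda j: dot(q[t], k[j]), reverse=True)[:top_k]
        outputs.append(attend(q[t], k, v, sorted(retrieved) + local))
    return outputs


# Usage: a window of one token plus one retrieved distant token per query.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
k = [[0.0, 1.0], [1.0, 0.0], [1.0, -1.0], [0.3, 0.3]]
v = [[1.0, 0.0], [2.0, 1.0], [3.0, -1.0], [4.0, 2.0]]
out = retrieval_augmented_local_attention(q, k, v, window=1, top_k=1)
```

Setting `top_k=0` reduces the sketch to plain sliding-window attention, and a `top_k` at least as large as the sequence length recovers full causal attention. Because the distant keys are scanned exhaustively, this sketch is no cheaper than dense attention; it shows only where retrieved context enters the sparse pattern, not how a real system keeps retrieval efficient.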

MATCH serves as a versatile approach for enhancing in-context retrieval capabilities while maintaining the efficiency benefits of sparse attention models.