Memory-Managed Long-Context Attention: A Preliminary Study of Editable Request-Local Memory

This study investigates memory-managed long-context attention by separating a fast recurrent or sparse backbone from two explicit components: editable request-local memory slots and a query-time sparse fallback. It targets a limitation shared by existing linear, recurrent, and sparse attention methods: none of them explicitly decides when a fact should be written, overwritten, protected, or discarded.
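The slot lifecycle and fallback route described above can be illustrated with a toy sketch. This is not the study's implementation: the names `Slot`, `RequestLocalMemory`, `sparse_fallback`, and `answer` are hypothetical, facts are plain key/value pairs rather than learned representations, and the fallback is a newest-first scan over chunks rather than a sparse attention kernel.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Slot:
    """One editable memory slot (hypothetical structure)."""
    value: str
    version: int = 1
    protected: bool = False


@dataclass
class RequestLocalMemory:
    """Toy request-local memory: write, overwrite, protect, discard."""
    slots: Dict[str, Slot] = field(default_factory=dict)

    def write(self, key: str, value: str) -> bool:
        slot = self.slots.get(key)
        if slot is None:
            self.slots[key] = Slot(value)   # first write creates the slot
            return True
        if slot.protected:
            return False                    # protected slots reject overwrites
        slot.value = value                  # overwrite and bump the version
        slot.version += 1
        return True

    def protect(self, key: str) -> None:
        if key in self.slots:
            self.slots[key].protected = True

    def discard(self, key: str) -> None:
        self.slots.pop(key, None)


def sparse_fallback(chunks: List[Dict[str, str]], key: str) -> Optional[str]:
    """Scan chunks newest-first and return the latest value stored under key."""
    for chunk in reversed(chunks):
        if key in chunk:
            return chunk[key]
    return None


def answer(memory: RequestLocalMemory,
           chunks: List[Dict[str, str]],
           key: str) -> Tuple[Optional[str], str]:
    """Try the slot route first; fall back to the chunk scan if no slot exists."""
    slot = memory.slots.get(key)
    if slot is not None:
        return slot.value, "slot"
    value = sparse_fallback(chunks, key)
    if value is not None:
        return value, "fallback"
    return None, "miss"


if __name__ == "__main__":
    chunks = [{"color": "red"}, {"city": "Oslo"}, {"color": "blue"}]
    memory = RequestLocalMemory()
    memory.write("color", "red")
    memory.write("color", "blue")           # overwrite: version becomes 2
    print(answer(memory, chunks, "color"))  # ('blue', 'slot')
    print(answer(memory, chunks, "city"))   # ('Oslo', 'fallback'): never written
```

In this toy, the `city` fact is never written to a slot, so only the fallback route can recover it. That corresponds to the no-write-signal case the study describes, under the simplifying assumptions stated above.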

Pure fixed-state and pure sparse methods each fail on specific overwrite, version, anti-pollution, or no-write-signal cases, whereas a hybrid that combines editable slots with sparse fallback covers both routes.
A 2,097,152-token (2^21) mechanism stress test achieved 50/50 pooled accuracy with between 2 and 132 active chunks.
A 2.74M-parameter minimal causal event-token model reached 595/600 accuracy with lite write supervision, indicating that the mechanism is trainable without relying on model scale.
A six-family frozen-hidden-state bridge achieved 1079/1080 controlled pointer accuracy but relied on generator-provided integer key IDs rather than open-text entity resolution.
Local, non-leaderboard RULER 4K diagnostics remained close to full-context performance, while a 33-record LongBench v1 16K subset showed that naive lexical selection does not generalize.

The evidence suggests that controlled slot lifecycle is feasible and sparse fallback is necessary when writes lack future-query signals, but learned open-domain selection remains the primary architectural bottleneck.