This paper formalizes the fleet-memory problem in multi-agent LLM environments, identifying four foundational failure modes: unauthorized leakage, stale propagation, contradiction persistence, and provenance collapse. To address these issues, the authors define explicit systems-level primitives including scoped retrieval, temporal supersession, provenance tracking, and policy-governed memory propagation.
- These primitives are implemented in MemClaw, a production multi-tenant memory service, and evaluated via the ArgusFleet harness.
- The system successfully reconstructed 100% of depth-four derivation chains with correct writer identity at sub-second per-hop latency.
- Evaluation demonstrated high intra-fleet visibility with zero cross-fleet leakage under strong write mode.
- Production testing revealed an asymmetry in scope enforcement: sub-tenant scopes were bypassed on direct GET-by-id requests.
- A pipeline ordering conflict was identified: a synchronous near-duplicate gate can reject a contradictory write before asynchronous contradiction detection ever sees it.
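
The pipeline ordering conflict in the last point can be illustrated with a toy model. The sketch below is not MemClaw's implementation; the `MemoryStore` class, the `dedup_threshold` value, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration. It shows how a surface-similarity gate that runs synchronously can discard a write that differs from an existing record in exactly the detail that makes it a contradiction.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class MemoryStore:
    """Toy store: a synchronous near-duplicate gate ahead of an async contradiction queue."""

    dedup_threshold: float = 0.9  # hypothetical similarity cutoff
    records: list[str] = field(default_factory=list)
    contradiction_queue: list[str] = field(default_factory=list)

    def _is_near_duplicate(self, text: str) -> bool:
        # Surface similarity only: it cannot tell a restatement from a conflicting claim.
        return any(
            SequenceMatcher(None, text, existing).ratio() >= self.dedup_threshold
            for existing in self.records
        )

    def write(self, text: str) -> bool:
        """Return True if the write is stored, False if the synchronous gate rejects it."""
        if self._is_near_duplicate(text):
            return False  # rejected before any contradiction detector can see it
        self.records.append(text)
        self.contradiction_queue.append(text)  # would be processed later, asynchronously
        return True


store = MemoryStore()
first = store.write("The deploy freeze for service alpha ends on Friday.")
# Same sentence with a conflicting date: highly similar on the surface.
second = store.write("The deploy freeze for service alpha ends on Monday.")
```

In this toy run the second write is rejected as a near-duplicate, so the queue that feeds contradiction detection never receives it. One possible remedy, under the same assumptions, is to enqueue gate-rejected writes for contradiction checking instead of dropping them silently.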
The study concludes that long-context retrieval alone is insufficient for production multi-agent memory: governed shared memory requires explicit systems-level abstractions, and live evaluation is needed to expose enforcement failures.
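
As an illustration of the kind of enforcement check that live evaluation can exercise, the following sketch routes both search and get-by-id through a single visibility predicate and includes a probe that flags any record reachable by id but hidden from search. All names here (`Memory`, `ScopedStore`, `probe_asymmetry`, the `scope` label) are hypothetical and do not reflect the MemClaw or ArgusFleet APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    id: str
    tenant: str
    scope: str  # hypothetical sub-tenant scope label, e.g. a team name
    text: str


class ScopedStore:
    """Toy store in which every read path shares one visibility check."""

    def __init__(self) -> None:
        self._items: dict[str, Memory] = {}

    def put(self, memory: Memory) -> None:
        self._items[memory.id] = memory

    @staticmethod
    def _visible(memory: Memory, tenant: str, scopes: frozenset[str]) -> bool:
        return memory.tenant == tenant and memory.scope in scopes

    def search(self, term: str, tenant: str, scopes: frozenset[str]) -> list[Memory]:
        return [
            m for m in self._items.values()
            if term in m.text and self._visible(m, tenant, scopes)
        ]

    def get(self, memory_id: str, tenant: str, scopes: frozenset[str]) -> Memory | None:
        memory = self._items.get(memory_id)
        # Apply the same check as search; skipping it here would create the asymmetry.
        if memory is None or not self._visible(memory, tenant, scopes):
            return None
        return memory


def probe_asymmetry(store: ScopedStore, memory_id: str, term: str,
                    tenant: str, scopes: frozenset[str]) -> bool:
    """Return True if get-by-id exposes a record that search hides from the same caller."""
    via_search = any(m.id == memory_id for m in store.search(term, tenant, scopes))
    via_get = store.get(memory_id, tenant, scopes) is not None
    return via_get and not via_search


store = ScopedStore()
store.put(Memory(id="m1", tenant="acme", scope="team-b", text="rotation schedule"))

# A caller scoped to team-a should see m1 through neither path.
leak = probe_asymmetry(store, "m1", "rotation", "acme", frozenset({"team-a"}))
owner_view = store.get("m1", "acme", frozenset({"team-b"}))
```

Because both read paths call `_visible`, the probe reports no asymmetry for this store. Running the same probe against a live service would be one way to surface a get-by-id path that omits the check.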