This paper formalizes the fleet-memory problem in multi-agent LLM environments, identifying four foundational failure modes: unauthorized leakage, stale propagation, contradiction persistence, and provenance collapse. To address these issues, the authors define explicit systems-level primitives including scoped retrieval, temporal supersession, provenance tracking, and policy-governed memory propagation.
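
As a rough illustration of how three of these primitives could fit together, the sketch below models scoped retrieval, temporal supersession, and provenance tracking in a small in-memory store. It is not MemClaw's API: the `Memory` and `Store` classes, their field names, and the sample records are all hypothetical, chosen only to make the ideas concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    # All field names are hypothetical, not taken from MemClaw.
    id: str
    fleet: str                           # scope the record belongs to
    writer: str                          # identity of the agent that wrote it
    text: str
    derived_from: Optional[str] = None   # provenance link to a parent record
    superseded_by: Optional[str] = None  # set when a newer record replaces this one


class Store:
    def __init__(self):
        self.records = {}

    def write(self, mem, supersedes=None):
        """Store a record; optionally mark an older record as superseded."""
        self.records[mem.id] = mem
        if supersedes is not None:
            self.records[supersedes].superseded_by = mem.id

    def retrieve(self, fleet):
        """Scoped retrieval: only the caller's fleet, skipping superseded records."""
        return [
            m for m in self.records.values()
            if m.fleet == fleet and m.superseded_by is None
        ]

    def chain(self, mem_id):
        """Walk provenance links back to the root, returning (id, writer) pairs."""
        out, seen = [], set()
        cur = self.records.get(mem_id)
        while cur is not None and cur.id not in seen:
            seen.add(cur.id)  # guard against accidental cycles
            out.append((cur.id, cur.writer))
            cur = self.records.get(cur.derived_from) if cur.derived_from else None
        return out


store = Store()
store.write(Memory("m1", "fleet-a", "agent-1", "deploy window is Tuesday"))
store.write(
    Memory("m2", "fleet-a", "agent-2", "deploy window is Thursday", derived_from="m1"),
    supersedes="m1",
)
store.write(Memory("m3", "fleet-b", "agent-9", "unrelated note"))

visible = [m.id for m in store.retrieve("fleet-a")]  # m1 is stale, m3 is out of scope
lineage = store.chain("m2")                          # newest record first
```

Here a reader in `fleet-a` sees only the current record, while the superseded one stays reachable through the provenance chain together with its writer identity.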

  • These primitives are implemented in MemClaw, a production multi-tenant memory service, and evaluated via the ArgusFleet harness.
  • The system successfully reconstructed 100% of depth-four derivation chains with correct writer identity at sub-second per-hop latency.
  • Evaluation demonstrated high intra-fleet visibility with zero cross-fleet leakage under strong write mode.
  • Production testing revealed asymmetric scope enforcement: sub-tenant scopes were bypassed on direct GET-by-id requests.
  • A pipeline-ordering conflict was identified: the synchronous near-duplicate gate can reject a contradictory write before the asynchronous contradiction detection ever sees it.
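
The ordering conflict in the last bullet can be reproduced in a few lines. The sketch below is a toy model, not the MemClaw pipeline: the token-level Jaccard similarity, the 0.5 threshold, and the `Pipeline` class are assumptions made for illustration, and the list `contradiction_queue` merely stands in for whatever feeds the asynchronous detector.

```python
def tokens(text):
    """Lowercased whitespace tokens; a deliberately crude similarity basis."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


class Pipeline:
    # Hypothetical write path: synchronous gate first, async detection second.
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.stored = []
        self.contradiction_queue = []  # stand-in for the async detector's input

    def write(self, text):
        # Step 1 (synchronous): reject anything that looks like a near-duplicate.
        for existing in self.stored:
            if jaccard(existing, text) >= self.threshold:
                return "rejected_near_duplicate"
        # Step 2: only accepted writes are ever handed to contradiction detection.
        self.stored.append(text)
        self.contradiction_queue.append(text)
        return "accepted"


p = Pipeline()
first = p.write("deploy window is Tuesday")
# Shares 3 of 5 distinct tokens with the first write (similarity 0.6),
# yet it states the opposite fact.
second = p.write("deploy window is Thursday")
```

Because a contradiction typically differs from the original claim in only a word or two, it scores as highly similar and is dropped at the gate, so the queue consumed by the detector never contains it.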

The study concludes that long-context retrieval alone is insufficient for production multi-agent memory, emphasizing that governed shared memory requires explicit systems-level abstractions and live evaluation to expose enforcement failures.
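
One way to picture the kind of live check the conclusion calls for is a probe that compares what a scoped search returns with what direct fetches return for the same caller. The sketch below runs such a probe against a toy service that deliberately contains the class of bug reported above (tenant checked, sub-tenant scope ignored on get-by-id). `ToyService`, `asymmetric_ids`, and the record layout are hypothetical and do not reflect MemClaw or the ArgusFleet harness.

```python
class ToyService:
    """In-memory stand-in for a memory service with two read paths."""

    def __init__(self):
        self.rows = {}

    def put(self, mem_id, tenant, scope, text):
        self.rows[mem_id] = {"id": mem_id, "tenant": tenant, "scope": scope, "text": text}

    def search(self, tenant, scope):
        # Search path: enforces both tenant and sub-tenant scope.
        return [
            r for r in self.rows.values()
            if r["tenant"] == tenant and r["scope"] == scope
        ]

    def get_by_id(self, tenant, scope, mem_id):
        # Deliberately buggy path: checks the tenant but ignores `scope`.
        row = self.rows.get(mem_id)
        if row is None or row["tenant"] != tenant:
            return None
        return row


def asymmetric_ids(svc, tenant, scope, candidate_ids):
    """Return ids readable via get_by_id that the scoped search does not expose."""
    visible = {r["id"] for r in svc.search(tenant, scope)}
    return [
        mem_id for mem_id in candidate_ids
        if svc.get_by_id(tenant, scope, mem_id) is not None and mem_id not in visible
    ]


svc = ToyService()
svc.put("m1", "tenant-1", "team-a", "in scope")
svc.put("m2", "tenant-1", "team-b", "same tenant, other sub-tenant scope")
svc.put("m3", "tenant-2", "team-a", "other tenant")

leaked = asymmetric_ids(svc, "tenant-1", "team-a", ["m1", "m2", "m3"])
```

A non-empty result flags records that one read path hides and another exposes, which is the asymmetry a purely retrieval-quality benchmark would not surface.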