Research demonstrates that LLM agent memory systems rewrite casual or hedged remarks into confident, dated assertions that agents subsequently treat as verified facts. This process allows unverified information to bypass safety checks without requiring an active attacker, as the agent responds to phrasing confidence rather than source attribution.

  • Memory products like mem0 and LangMem convert conversation history into stored "facts" that later steps trust.
  • Once rewritten, a casual remark becomes a confident assertion, and the agent grants every subsequent request that relies on it.
  • Agents obey flat assertions regardless of whether they are attributed, unattributed, or forged.
  • The evidential register (e.g., "reportedly") is the least-discounted hedge and is often obeyed like a flat assertion.
  • Passive tags like "unverified" are ignored, while active instructions to distrust memory can increase errors on entries that are actually correct.
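
As an illustration of the rewriting hazard described in the list above, the following minimal sketch stores a remark verbatim, together with its source, instead of flattening it into a confident assertion. It does not use the mem0 or LangMem APIs. The names MemoryEntry, HEDGE_PATTERN, is_hedged, and to_memory, and the list of hedge markers, are all hypothetical and chosen only for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical hedge markers (an assumption for this sketch); a real
# detector would need a much richer vocabulary.
HEDGE_PATTERN = re.compile(
    r"\b(?:i think|maybe|probably|reportedly|not sure|might)\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class MemoryEntry:
    text: str     # the wording that later agent steps will read
    source: str   # who made the remark (attribution)
    hedged: bool  # whether the original remark was tentative


def is_hedged(remark: str) -> bool:
    """Return True if the remark contains one of the hedge markers."""
    return HEDGE_PATTERN.search(remark) is not None


def to_memory(remark: str, source: str) -> MemoryEntry:
    """Store the remark verbatim so its tentative phrasing survives."""
    return MemoryEntry(text=remark.strip(), source=source, hedged=is_hedged(remark))


entry = to_memory("I think the deploy key rotates on Fridays", source="user")
```

Here the hedge stays in the stored text itself. The hedged flag is only bookkeeping for the store, since the findings above indicate that a passive tag on its own is ignored.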

The study concludes that keeping tentative phrasing in the memory store is necessary hygiene, but that the most effective deployable defense against this hazard is consulting redundant sources, which restores correct decisions.
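
One way to picture the redundant-source defense is a gate that lets an agent act on a stored claim only when several distinct sources support it. The summary does not describe how the study implements redundancy, so the sketch below is an assumption throughout: the function name corroborated_claims, the min_sources threshold, and the matching of claims by exact string are all hypothetical.

```python
from collections import defaultdict


def corroborated_claims(entries, min_sources=2):
    """Return the claims backed by at least `min_sources` distinct sources.

    `entries` is an iterable of (claim, source) pairs. Claims are matched
    by exact string, which is a simplifying assumption for this sketch.
    """
    sources_by_claim = defaultdict(set)
    for claim, source in entries:
        sources_by_claim[claim].add(source)
    return {
        claim
        for claim, sources in sources_by_claim.items()
        if len(sources) >= min_sources
    }


memory = [
    ("refunds over $500 need manager approval", "policy_doc"),
    ("refunds over $500 need manager approval", "ops_handbook"),
    ("refunds are always auto-approved", "chat_remark"),
    ("refunds are always auto-approved", "chat_remark"),  # same source repeated
]
trusted = corroborated_claims(memory)
```

Repeating a claim from the same source does not count as corroboration here, because sources are collected into a set per claim.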