The article introduces TraceRetain, a lightweight framework that gives frozen LLM agents a bounded external memory, scoring and evicting entries based on interpretable features such as success and redundancy. The study evaluates how retention policies affect performance when external memory is used to augment language models.
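
To make the score-and-evict idea concrete, here is a minimal sketch of a capacity-bounded memory that drops its lowest-scoring entry on overflow. It is not TraceRetain's actual implementation: the `Entry` fields, the linear `score` function, its weights, and the `BoundedMemory` class are all hypothetical names and choices made for illustration, using only the two features named above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    text: str
    success: float     # assumed feature: 1.0 if the trace came from a solved task
    redundancy: float  # assumed feature: 0..1 similarity to already stored entries


def score(e: Entry, w_success: float = 1.0, w_redundancy: float = 1.0) -> float:
    """Hypothetical linear score: reward success, penalize redundancy."""
    return w_success * e.success - w_redundancy * e.redundancy


class BoundedMemory:
    """Keeps at most `capacity` entries, evicting the lowest-scoring one."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries: List[Entry] = []

    def write(self, e: Entry) -> None:
        self.entries.append(e)
        if len(self.entries) > self.capacity:
            # min() returns the first minimum, so ties evict the oldest entry.
            worst = min(range(len(self.entries)),
                        key=lambda i: score(self.entries[i]))
            del self.entries[worst]


mem = BoundedMemory(capacity=2)
mem.write(Entry("opened drawer, found key", success=1.0, redundancy=0.0))
mem.write(Entry("distractor note", success=0.0, redundancy=0.9))
mem.write(Entry("used key on safe", success=1.0, redundancy=0.2))
print([e.text for e in mem.entries])
```

In this sketch a FIFO policy would instead always delete index 0; the contrast is that the scored policy can discard a low-value entry regardless of its age.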

  • On clean ALFWorld with gpt-5-mini, external memory improves over no memory, but differences among bounded retention policies fall within Wilson 95% CIs.
  • Under a controlled noisy-write stress with 75% synthetic distractors, unbounded memory and FIFO-K50 degrade significantly on Precision@5.
  • TraceRetain-CEM remains essentially unchanged under noise, preserving 97 out of 100 task successes while maintaining precision.
  • Held-out in-distribution evaluation shows memory-augmented policies solving 47 to 49 of 50 tasks compared to 39 for no memory.
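
The two metrics named in the findings above can be sketched in a few lines. The Wilson score interval below follows the standard textbook formula; the `precision_at_k` helper assumes the common convention of dividing by k, which the summary does not specify. Function names and the example identifiers are illustrative, not taken from the article.

```python
from math import sqrt
from statistics import NormalDist
from typing import Iterable, Sequence, Tuple


def wilson_interval(successes: int, n: int,
                    confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def precision_at_k(ranked_ids: Sequence[str], relevant: Iterable[str],
                   k: int = 5) -> float:
    """Fraction of the top-k retrieved items that are relevant (assumed: divide by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_set)
    return hits / k


# Example: intervals for the held-out task counts reported above.
for solved in (39, 47, 49):
    lo, hi = wilson_interval(solved, 50)
    print(f"{solved}/50: [{lo:.3f}, {hi:.3f}]")

# Example: two relevant entries among five retrieved (hypothetical ids).
print(precision_at_k(["m1", "d7", "m4", "d2", "d9"], {"m1", "m4"}, k=5))
```

With only 50 or 100 tasks per condition these intervals are wide, which is why small gaps between policies can sit inside them.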

On saturated benchmarks, bounded retention improves memory and step efficiency without costing task success; it differentiates itself from cache heuristics primarily when data streams contain noise.