MemDelta: Controlled Baselines and Hidden Confounds in Agent Memory Evaluation
The article introduces MemDelta, a controlled evaluation protocol for agent memory systems that isolates individual components so that confounding variables do not skew results. Using the 500-question LongMemEval-S dataset across three model families, the study finds that reported gains often conflate changes in the memory method with changes in the language model or retrieval pipeline.
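
To make the idea of a confounded comparison concrete, the following minimal Python sketch labels each pair of runs as controlled (exactly one factor differs) or confounded (several factors differ at once). It is an illustration under stated assumptions, not the MemDelta implementation: the `RunConfig` schema, the three factor names, and the placeholder values such as `method-x` and `model-a` are all hypothetical, chosen only to mirror the three sources of variation named above.

```python
from dataclasses import dataclass, fields
from itertools import combinations


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical factor names; the source does not specify a schema.
    memory_method: str
    language_model: str
    retrieval_pipeline: str


def differing_factors(a: RunConfig, b: RunConfig) -> list[str]:
    """Return the names of the factors on which two runs differ."""
    return [
        f.name for f in fields(RunConfig)
        if getattr(a, f.name) != getattr(b, f.name)
    ]


def classify_pairs(runs: dict[str, RunConfig]) -> dict[tuple[str, str], str]:
    """Label every pair of runs as identical, controlled, or confounded."""
    labels = {}
    for (name_a, cfg_a), (name_b, cfg_b) in combinations(runs.items(), 2):
        diff = differing_factors(cfg_a, cfg_b)
        if not diff:
            labels[(name_a, name_b)] = "identical"
        elif len(diff) == 1:
            # Only one factor changed, so a score gap can be attributed to it.
            labels[(name_a, name_b)] = f"controlled: {diff[0]}"
        else:
            # Several factors changed together; the gap cannot be attributed.
            labels[(name_a, name_b)] = "confounded: " + ", ".join(diff)
    return labels


# Placeholder runs (illustrative values, not taken from the study).
runs = {
    "baseline": RunConfig("no-memory", "model-a", "pipeline-1"),
    "memory_only": RunConfig("method-x", "model-a", "pipeline-1"),
    "memory_and_model": RunConfig("method-x", "model-b", "pipeline-1"),
}
labels = classify_pairs(runs)

for pair, label in labels.items():
    print(pair, "->", label)
```

In this sketch, comparing `baseline` with `memory_and_model` is flagged as confounded because the memory method and the language model change together, which is the kind of mixed comparison the study warns about.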