Researchers introduce AutoMem, a framework that treats memory management in large language models (LLMs) as a trainable skill rather than a static component. The system promotes file-system operations to first-class memory actions, allowing the model to decide autonomously how to encode, retrieve, and organize knowledge.
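The summary does not specify AutoMem's actual action set. As a rough illustration only, here is one way file-system operations could be exposed to an agent as memory actions. The `FileMemory` class, its `write`/`read`/`ls`/`search`/`move` methods, the `{"op": ..., "args": ...}` action format, and the naive substring search are all hypothetical assumptions, not AutoMem's interface.

```python
import tempfile
from pathlib import Path


class FileMemory:
    """Hypothetical file-backed memory; NOT AutoMem's API, just an illustration."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, rel_path: str) -> Path:
        # Keep every action inside the memory root (reject "../" escapes).
        path = (self.root / rel_path).resolve()
        if not path.is_relative_to(self.root):
            raise ValueError(f"path escapes memory root: {rel_path}")
        return path

    def write(self, rel_path: str, text: str) -> str:
        # Encode: create or overwrite a note, creating folders as needed.
        path = self._resolve(rel_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
        return f"wrote {rel_path}"

    def read(self, rel_path: str) -> str:
        # Retrieve: return a note's full contents.
        return self._resolve(rel_path).read_text(encoding="utf-8")

    def ls(self, rel_dir: str = ".") -> list[str]:
        # Organize: list files under a directory, as paths relative to the root.
        base = self._resolve(rel_dir)
        return sorted(
            str(p.relative_to(self.root)) for p in base.rglob("*") if p.is_file()
        )

    def search(self, query: str) -> list[str]:
        # Retrieve: naive case-insensitive substring search over all notes.
        q = query.lower()
        return [rel for rel in self.ls() if q in self.read(rel).lower()]

    def move(self, src: str, dst: str) -> str:
        # Organize: rename or refile a note into a new location.
        target = self._resolve(dst)
        target.parent.mkdir(parents=True, exist_ok=True)
        self._resolve(src).rename(target)
        return f"moved {src} -> {dst}"

    def dispatch(self, action: dict) -> object:
        # The agent emits {"op": ..., "args": {...}}; route it to a method.
        ops = {"write": self.write, "read": self.read, "ls": self.ls,
               "search": self.search, "move": self.move}
        return ops[action["op"]](**action.get("args", {}))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mem = FileMemory(Path(tmp))
        # Illustrative note text; the layout is chosen by the agent, not fixed.
        mem.dispatch({"op": "write", "args": {
            "rel_path": "crafter/recipes.md",
            "text": "Wooden pickaxe: needs a nearby table and wood."}})
        mem.dispatch({"op": "move", "args": {
            "src": "crafter/recipes.md", "dst": "crafter/tools/pickaxe.md"}})
        print(mem.dispatch({"op": "search", "args": {"query": "pickaxe"}}))
```

Because the memory is plain files, its organization (folder layout, file names, note format) is itself something that can be inspected and revised, which loosely mirrors the idea of a learnable memory structure.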
- AutoMem automates improvement along two axes: the memory structure (prompts, schemas, vocabulary) and the model's proficiency in using it.
- A strong LLM reviews agent trajectories to iteratively revise the memory structure, while the agent's successful decisions serve as training signals to sharpen proficiency.
- On the procedurally generated, long-horizon games Crafter, MiniHack, and NetHack, optimizing memory alone improved base-agent performance by 2x to 4x.
- This approach made a 32B open-weight model competitive with frontier systems such as Claude Opus 4.5 and Gemini 3.1 Pro Thinking, without modifying its task-action behavior.
The results demonstrate that memory management is an independently learnable skill and a high-leverage optimization target for long-horizon tasks.