The paper introduces WorldEvolver, a framework that gives long-horizon LLM agents more reliable foresight by revising deployment-time context rather than modifying model parameters. It targets the problem that unreliable predictions degrade decision-making, using a self-evolving approach intended to improve both predictive fidelity and planning performance.
- Episodic Memory retrieves real action transitions for simulation.
- Semantic Memory extracts persistent heuristic rules from prediction-observation mismatches.
- Selective Foresight filters low-confidence predictions before integration.
- Evaluated on ALFWorld and ScienceWorld, it achieves the highest prediction accuracy across three backbones.
- It also outperforms the baselines on downstream agent success rate, as measured on AgentBoard.
Together, these results indicate that test-time memory revision improves both the accuracy of world-model predictions and the success rate of agent planning.
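To make the three components listed above more concrete, here is a minimal Python sketch of how they might fit together. It is an illustration under stated assumptions, not the paper's implementation: the names `Transition`, `ForesightMemory`, and `selective_foresight`, the exact-match retrieval, the string-template rule, and the `0.7` threshold are all hypothetical. In particular, the summary does not say how transitions are retrieved, how rules are extracted, or how confidence is computed, so simple placeholders stand in for each.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transition:
    """One real (state, action, next_state) step observed in the environment."""
    state: str
    action: str
    next_state: str


@dataclass
class ForesightMemory:
    """Hypothetical container for the two memories; no model weights are touched."""
    episodic: List[Transition] = field(default_factory=list)
    rules: List[str] = field(default_factory=list)

    def retrieve(self, action: str, k: int = 3) -> List[Transition]:
        # Placeholder retrieval: exact action match, most recent k entries.
        # A real system would likely use some form of similarity search.
        matches = [t for t in self.episodic if t.action == action]
        return matches[-k:] if k > 0 else []

    def update(self, state: str, action: str, predicted: str, observed: str) -> None:
        # Episodic memory always stores what actually happened.
        self.episodic.append(Transition(state, action, observed))
        # Semantic memory only grows when prediction and observation disagree.
        # The template below stands in for whatever rule extraction is used.
        if predicted != observed:
            rule = f"After '{action}', expect '{observed}' rather than '{predicted}'."
            if rule not in self.rules:
                self.rules.append(rule)


def selective_foresight(
    prediction: str, confidence: float, threshold: float = 0.7
) -> Optional[str]:
    # Pass the prediction on to the planner only if it clears the threshold;
    # otherwise return None so the agent plans without it.
    return prediction if confidence >= threshold else None


if __name__ == "__main__":
    memory = ForesightMemory()
    memory.update("kitchen", "open fridge", "fridge is empty", "fridge contains apple")
    print(memory.retrieve("open fridge"))
    print(memory.rules)
    print(selective_foresight("fridge contains apple", confidence=0.4))
```

The design point the sketch tries to capture is that everything the agent learns at deployment time lives in `episodic` and `rules`, which would be fed back into the model's context, while the filter decides whether a given prediction is used at all.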