The paper introduces WorldEvolver, a framework that gives long-horizon LLM agents reliable foresight by revising their deployment-time context rather than modifying model parameters. It targets a specific failure mode: unreliable world-model predictions that degrade decision-making. It addresses this with a self-evolving approach that improves both predictive fidelity and planning performance.

  • Episodic Memory retrieves real, previously observed action transitions to inform simulation.
  • Semantic Memory extracts persistent heuristic rules from prediction-observation mismatches.
  • Selective Foresight filters out low-confidence predictions before they are integrated into the agent's context.
  • On ALFWorld and ScienceWorld, WorldEvolver achieves the highest prediction accuracy with each of the three LLM backbones tested.
  • It also outperforms the baselines on downstream agent success rate, as measured on AgentBoard.
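The three components can be combined into a single context-assembly routine. Below is a minimal Python sketch under stated assumptions. It is not WorldEvolver's actual implementation. The assumptions are:

- All names (`Transition`, `Prediction`, `EpisodicMemory`, `SemanticMemory`, `selective_foresight`, `build_context`) are hypothetical.
- Retrieval uses word-overlap (Jaccard) similarity as a stand-in for whatever retriever the paper uses.
- Rules come from a fixed string template rather than the paper's extraction procedure.
- The confidence scores and the 0.6 threshold are arbitrary.

In line with the framework's premise, the sketch changes only the text context, never model parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    state: str
    action: str
    next_state: str


@dataclass
class Prediction:
    action: str
    next_state: str
    confidence: float  # assumed to be in [0, 1]


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for a real retriever (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


@dataclass
class EpisodicMemory:
    """Stores real (observed) transitions; retrieves the most similar ones."""
    transitions: list[Transition] = field(default_factory=list)

    def add(self, t: Transition) -> None:
        self.transitions.append(t)

    def retrieve(self, state: str, action: str, k: int = 2) -> list[Transition]:
        query = f"{state} {action}"
        ranked = sorted(
            self.transitions,
            key=lambda t: jaccard(query, f"{t.state} {t.action}"),
            reverse=True,
        )
        return ranked[:k]


@dataclass
class SemanticMemory:
    """Accumulates rules from prediction-observation mismatches."""
    rules: list[str] = field(default_factory=list)

    def update(self, action: str, predicted: str, observed: str) -> None:
        if predicted.strip().lower() == observed.strip().lower():
            return  # prediction matched reality: nothing to learn
        # Hypothetical fixed template; the paper's rule-extraction step is not reproduced here.
        rule = f"After '{action}', expect '{observed}', not '{predicted}'."
        if rule not in self.rules:  # keep rules deduplicated across episodes
            self.rules.append(rule)


def selective_foresight(preds: list[Prediction], threshold: float = 0.6) -> list[Prediction]:
    """Drop predictions whose confidence falls below the (arbitrary) threshold."""
    return [p for p in preds if p.confidence >= threshold]


def build_context(state: str, action: str, episodic: EpisodicMemory,
                  semantic: SemanticMemory, preds: list[Prediction],
                  k: int = 2, threshold: float = 0.6) -> str:
    """Assemble the text context the agent sees; no model weights are changed."""
    lines = [f"Current state: {state}"]
    for t in episodic.retrieve(state, action, k):
        lines.append(f"Observed: {t.state} + {t.action} -> {t.next_state}")
    lines += [f"Rule: {r}" for r in semantic.rules]
    for p in selective_foresight(preds, threshold):
        lines.append(f"Foresight: {p.action} -> {p.next_state} (conf {p.confidence:.2f})")
    return "\n".join(lines)


# Usage with toy, made-up household states.
episodic = EpisodicMemory()
episodic.add(Transition("in kitchen, fridge closed", "open fridge", "fridge open, apple visible"))
episodic.add(Transition("in garden", "water plant", "plant watered"))

semantic = SemanticMemory()
semantic.update("take apple", predicted="holding apple", observed="fridge is closed")

preds = [Prediction("open fridge", "fridge open", 0.9),
         Prediction("take apple", "holding apple", 0.3)]
print(build_context("in kitchen, fridge closed", "open fridge", episodic, semantic, preds, k=1))
```

Keeping the memories as plain text makes them easy to inspect, prune, or reset between episodes. A fuller version might swap the Jaccard scorer for embedding similarity and take confidences from the world model itself.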

The results show that test-time memory revision significantly improves both the accuracy of world-model predictions and agents' success rates on planning tasks.