The paper introduces Agent-Authored World Modeling (AAWM), a training procedure that addresses a limitation of standard world-modeling objectives built on next-observation prediction. Under that objective, supervision depends on what a transition happens to reveal rather than on what the agent needs to know, so dynamics relevant to the current decision are often omitted. AAWM instead constructs supervision from the policy's decision needs: at each state, the agent identifies what it must understand about the environment before acting. Relevant transition evidence is then retrieved across trajectories and synthesized into training targets that capture these decision-oriented dynamics. The learning objective is thereby aligned with the information required before acting, rather than with reconstructing the next observation. Experiments across multiple environments and training settings indicate that decision-aware world-model targets provide a more effective learning signal than conventional next-observation prediction.
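The three steps described above (the agent states what it needs to know, evidence is retrieved across trajectories, and the evidence is synthesized into a target) can be illustrated with a toy sketch. This is not the paper's implementation. Every name here (`Transition`, `TrainingExample`, `retrieve_evidence`, `synthesize_target`, `build_examples`, `toy_author`) is hypothetical, and the sketch makes simplifying assumptions: the agent's query is reduced to a single action name, retrieval is an exact match on that action, and synthesis is plain string concatenation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Transition:
    """One observed step in a trajectory (hypothetical toy representation)."""
    state: str
    action: str
    next_state: str


@dataclass(frozen=True)
class TrainingExample:
    """A decision-oriented training target for a world model (hypothetical)."""
    state: str
    query: str
    target: str


def retrieve_evidence(query: str, trajectories: List[List[Transition]]) -> List[Transition]:
    # Assumption: a query is just an action name, and a transition is
    # relevant when it used that action. Search spans ALL trajectories.
    return [t for traj in trajectories for t in traj if t.action == query]


def synthesize_target(evidence: List[Transition]) -> str:
    # Assumption: synthesis is a deterministic summary of observed effects.
    if not evidence:
        return "no evidence found"
    effects = sorted({f"{t.state} -> {t.next_state}" for t in evidence})
    return "; ".join(effects)


def build_examples(
    states: List[str],
    author_query: Callable[[str], str],
    trajectories: List[List[Transition]],
) -> List[TrainingExample]:
    examples = []
    for state in states:
        query = author_query(state)                       # step 1: agent states its need
        evidence = retrieve_evidence(query, trajectories)  # step 2: cross-trajectory retrieval
        target = synthesize_target(evidence)               # step 3: synthesize the target
        examples.append(TrainingExample(state, query, target))
    return examples


# Toy data: three short trajectories.
trajectories = [
    [Transition("door_locked", "use_key", "door_open"),
     Transition("door_open", "walk", "hallway")],
    [Transition("door_locked", "push", "door_locked")],
    [Transition("chest_locked", "use_key", "chest_open")],
]


def toy_author(state: str) -> str:
    # Stand-in for the agent: it always asks what "use_key" does.
    return "use_key"


examples = build_examples(["door_locked"], toy_author, trajectories)
print(examples[0])
```

In this toy setting, a next-observation target taken from the second trajectory would only record that pushing a locked door leaves it locked. The decision-oriented target instead gathers what `use_key` does wherever it was observed, which is the information the stand-in agent asked for.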
Agent-Authored World Modeling Aligns Training with Decision Needs