arXiv cs.LG · 8d ago · src: 9d ago · research

EnvRL: Leveraging Environment Dynamics in Agentic RL

EnvRL introduces a framework that enhances agentic reinforcement learning by incorporating environment dynamics through state prediction and inverse dynamics objectives. When trained with GRPO, EnvRL improves success rates of Qwen-2.5-1.5B-Instruct from 72.8% to 77.4% on ALFWorld and from 56.8% to 67.0% on WebShop.
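The summary names three ingredients: GRPO, a state prediction objective, and an inverse dynamics objective. The sketch below shows one common way such pieces fit together; it is not the paper's implementation. The function names, the weights `lambda_sp` and `lambda_id`, and the choice to add the dynamics terms as weighted auxiliary losses are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each rollout's reward within its group.

    Uses the population standard deviation; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dynamics_examples(trajectory):
    """Build auxiliary training pairs from (state, action, next_state) steps.

    State prediction:  (state, action)      -> next_state
    Inverse dynamics:  (state, next_state)  -> action
    """
    state_prediction = []
    inverse_dynamics = []
    for state, action, next_state in trajectory:
        state_prediction.append(((state, action), next_state))
        inverse_dynamics.append(((state, next_state), action))
    return state_prediction, inverse_dynamics


def total_loss(policy_loss, sp_loss, id_loss, lambda_sp=0.1, lambda_id=0.1):
    """Hypothetical combination: policy loss plus weighted auxiliary terms.

    The weights are illustrative placeholders, not values from the paper.
    """
    return policy_loss + lambda_sp * sp_loss + lambda_id * id_loss


if __name__ == "__main__":
    # Four rollouts of the same task: two succeed (reward 1), two fail (reward 0).
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
    # A one-step toy trajectory with string placeholders for observations/actions.
    print(dynamics_examples([("in kitchen", "open fridge", "fridge is open")]))
```

In this reading, the dynamics pairs come for free from the same rollouts used for the policy update, so the agent is also trained to model how the environment responds to its actions.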

Importance 3/3 · New feature vs. leaders · New harness with differentiators · arXiv cs.LG · Alibaba (Qwen) · AI agents · Reasoning models · Training methods

Benchmarks

Benchmark	Model	Score
ALFWorld	Qwen-2.5-1.5B-Instruct	77.4%
WebShop	Qwen-2.5-1.5B-Instruct	67.0%
