Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation
The authors propose a reinforcement learning fine-tuning framework that uses autonomous vision-language evaluation as a scalable supervision signal for GUI agents, removing the need for manual labels or task-specific heuristics. The method treats evaluator feedback as a noisy binary reward channel and derives a noise-corrected estimator for Proximal Policy Optimization (PPO), addressing the difficulty of obtaining machine-readable rewards in open-ended desktop environments.
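
To make the idea of a noise-corrected binary reward concrete, the sketch below uses the standard unbiased correction for a binary signal passed through a noisy channel. It is an illustration only, not necessarily the estimator derived by the authors. The names `corrected_reward`, `simulate_mean`, `fp_rate` (the probability that the evaluator reports success on a failed episode), and `fn_rate` (the probability that it reports failure on a successful episode) are hypothetical, and both rates are assumed to be known and to satisfy fp_rate + fn_rate < 1.

```python
import random


def corrected_reward(observed: int, fp_rate: float, fn_rate: float) -> float:
    """Return a noise-corrected estimate of a binary reward.

    Assumption (not from the source): the evaluator flips a true 0 to 1 with
    probability fp_rate and a true 1 to 0 with probability fn_rate.
    Under that model the returned value has expectation equal to the true reward.
    """
    denom = 1.0 - fp_rate - fn_rate
    if denom <= 0.0:
        raise ValueError("fp_rate + fn_rate must be below 1 for the correction to exist")
    return (observed - fp_rate) / denom


def simulate_mean(true_reward: int, fp_rate: float, fn_rate: float,
                  n: int = 100_000, seed: int = 0) -> float:
    """Average the corrected reward over n simulated noisy evaluations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        if true_reward == 1:
            # A successful episode is misjudged as failure with probability fn_rate.
            observed = 0 if rng.random() < fn_rate else 1
        else:
            # A failed episode is misjudged as success with probability fp_rate.
            observed = 1 if rng.random() < fp_rate else 0
        total += corrected_reward(observed, fp_rate, fn_rate)
    return total / n


if __name__ == "__main__":
    # The averages should be close to the true rewards 1 and 0 respectively.
    print(simulate_mean(1, fp_rate=0.1, fn_rate=0.2))
    print(simulate_mean(0, fp_rate=0.1, fn_rate=0.2))
```

In a PPO-style pipeline, a corrected value of this kind would stand in for the raw evaluator verdict when computing returns and advantages. The correction removes bias at the cost of higher variance, and the variance grows as fp_rate + fn_rate approaches 1.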