The paper introduces HRLLI, a hierarchical reinforcement learning framework designed to improve sample efficiency by leveraging natural-language instructions. Existing approaches treat instructions as static inputs and therefore fail to account for their stage-dependent relevance in complex environments. HRLLI instead decomposes instructions into pieces of guidance that become relevant at different stages of interaction. It formulates a Select-to-Act paradigm in which a high-level semantic policy selects the instruction piece most relevant to the current state. The selected guidance conditions a low-level policy that executes environment actions, and both policies are learned simultaneously to maximize augmented expected returns. Experiments on the RTFM benchmark show that HRLLI consistently outperforms strong instruction-conditioned RL baselines, indicating that explicitly modeling adaptive instruction selection makes reinforcement learning more effective.
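The Select-to-Act loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the sentence-based splitter, the word-overlap scorer, the tabular low-level policy, and every name here (`split_instruction`, `HighLevelSelector`, `LowLevelPolicy`, `select_to_act`) are hypothetical stand-ins chosen for illustration, and the joint training of the two policies is omitted.

```python
import math
import random


def split_instruction(text):
    """Decompose an instruction into pieces (assumption: one piece per sentence)."""
    return [p.strip() for p in text.split(".") if p.strip()]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relevance(state, piece):
    """Toy relevance score: number of words shared by the state text and the piece."""
    return float(len(set(state.lower().split()) & set(piece.lower().split())))


class HighLevelSelector:
    """High-level policy: a distribution over instruction pieces given the state."""

    def __init__(self, weight=1.0):
        self.weight = weight  # hypothetical scalar parameter; a real model would be learned

    def probs(self, state, pieces):
        return softmax([self.weight * relevance(state, p) for p in pieces])

    def select(self, state, pieces, rng):
        weights = self.probs(state, pieces)
        return rng.choices(range(len(pieces)), weights=weights, k=1)[0]


class LowLevelPolicy:
    """Low-level policy: a distribution over actions given state and selected piece."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.prefs = {}  # (state, piece_index, action) -> preference; defaults to 0.0

    def probs(self, state, piece_index):
        return softmax(
            [self.prefs.get((state, piece_index, a), 0.0) for a in self.actions]
        )

    def act(self, state, piece_index, rng):
        weights = self.probs(state, piece_index)
        return rng.choices(self.actions, weights=weights, k=1)[0]


def select_to_act(state, pieces, selector, policy, rng):
    """One decision step: select a piece, then act conditioned on that piece."""
    k = selector.select(state, pieces, rng)
    action = policy.act(state, k, rng)
    return k, action


if __name__ == "__main__":
    rng = random.Random(0)
    pieces = split_instruction("Use the sword against goblins. Avoid the dragon.")
    selector = HighLevelSelector()
    policy = LowLevelPolicy(["up", "down", "left", "right"])
    k, action = select_to_act("goblins nearby", pieces, selector, policy, rng)
    print(pieces[k], "->", action)
```

In this sketch the selector is re-queried at every step, which is what lets the chosen piece change as the state changes; a learned version would replace the word-overlap score and the preference table with trainable models optimized together.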
Select-to-Act: Hierarchical Reinforcement Learning via Adaptive Language Guidance