Imagine to Ensure Safety in Hierarchical Reinforcement Learning
The method combines a learnable world model with a high-level and a low-level policy to enable safe exploration in long-horizon tasks. The high-level policy steers exploration toward safe subgoals, while the low-level policy uses rollouts imagined in the world model to avoid unsafe behaviors. Across diverse tasks, the method outperforms existing Safe RL methods in both success rate and constraint satisfaction.
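To make the idea of checking actions against imagined rollouts concrete, the following is a minimal sketch and not the paper's implementation. It assumes a one-dimensional state, a hand-written stand-in for the learned world model, and a fixed cost budget. All names (`imagined_cost`, `safe_action`, `toy_model`, `toy_cost`, `toy_policy`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

State = float
Action = float


def imagined_cost(state: State, first_action: Action,
                  policy: Callable[[State], Action],
                  model: Callable[[State, Action], State],
                  cost_fn: Callable[[State], float],
                  horizon: int) -> float:
    """Sum predicted costs over a rollout imagined inside the model."""
    total = 0.0
    action = first_action
    for _ in range(horizon):
        state = model(state, action)   # predicted next state, no real env step
        total += cost_fn(state)        # predicted safety cost of that state
        action = policy(state)         # follow the low-level policy afterwards
    return total


def safe_action(state: State, subgoal: State, candidates: Sequence[Action],
                policy: Callable[[State], Action],
                model: Callable[[State, Action], State],
                cost_fn: Callable[[State], float],
                horizon: int = 5, budget: float = 0.0) -> Action:
    """Pick the candidate that gets closest to the subgoal among those whose
    imagined cost stays within the budget (assumed selection rule)."""
    costs = {a: imagined_cost(state, a, policy, model, cost_fn, horizon)
             for a in candidates}
    safe = [a for a in candidates if costs[a] <= budget]
    if not safe:
        # Assumed fallback: no candidate is safe, take the least costly one.
        return min(candidates, key=lambda a: costs[a])
    return min(safe, key=lambda a: abs(model(state, a) - subgoal))


# Toy stand-ins (assumptions): additive dynamics, unsafe region at state > 1.0,
# and a low-level policy that retreats when inside the unsafe region.
def toy_model(state: State, action: Action) -> State:
    return state + action


def toy_cost(state: State) -> float:
    return 1.0 if state > 1.0 else 0.0


def toy_policy(state: State) -> Action:
    return -0.5 if state > 1.0 else 0.0


chosen = safe_action(0.5, 2.0, [-0.5, 0.0, 0.25, 1.0],
                     toy_policy, toy_model, toy_cost)
print(chosen)
```

In this toy setting the largest step would reach the subgoal fastest, but its imagined rollout enters the unsafe region, so the filter settles for the largest step that stays outside it. A learned world model and learned policies would replace the three toy functions.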