Reward-free Pretraining for Reinforcement Learning via Occupancy Coverage Maximization
ROVER enables reward-free pretraining by maximizing occupancy coverage in state space, using a learned world model to estimate occupancy without density or entropy estimation. It introduces a virtual sink state to balance exploration of known and unknown regions, achieving more uniform coverage and better downstream performance in tabular and pixel-based navigation tasks.
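The summary does not spell out ROVER's objective or its world model, so the following is only a minimal tabular sketch of the two ideas it names: computing a policy's state occupancy from a transition model, and routing unknown transitions to a virtual sink state. Everything here is an assumption made for illustration. The model is built from visit counts rather than learned. The names `build_model_with_sink` and `discounted_occupancy`, the discount `gamma`, and the rule "an untried state-action pair leads to the sink" are hypothetical.

```python
import numpy as np

def build_model_with_sink(counts):
    """Turn visit counts[s, a, s'] into a transition model with one extra state.

    Assumption for illustration: index S is a virtual sink state, and any
    (state, action) pair that was never tried transitions to it.
    """
    S, A, _ = counts.shape
    P = np.zeros((S + 1, A, S + 1))
    totals = counts.sum(axis=2)
    for s in range(S):
        for a in range(A):
            if totals[s, a] > 0:
                P[s, a, :S] = counts[s, a] / totals[s, a]  # empirical estimate
            else:
                P[s, a, S] = 1.0  # unknown pair leads to the sink
    P[S, :, S] = 1.0  # the sink is absorbing
    return P

def discounted_occupancy(P, policy, mu0, gamma=0.9):
    """Discounted state occupancy d = (1 - gamma) * mu0^T (I - gamma * P_pi)^{-1}."""
    n = P.shape[0]
    P_pi = np.einsum("sa,sat->st", policy, P)  # state-to-state matrix under the policy
    return np.linalg.solve((np.eye(n) - gamma * P_pi).T, (1 - gamma) * mu0)

# Toy 3-state cycle in which only action 0 has ever been tried.
counts = np.zeros((3, 2, 3))
counts[0, 0, 1] = 5
counts[1, 0, 2] = 3
counts[2, 0, 0] = 2
P = build_model_with_sink(counts)
mu0 = np.array([1.0, 0.0, 0.0, 0.0])  # start in state 0

uniform = np.full((4, 2), 0.5)        # tries the unknown action half the time
known_only = np.zeros((4, 2))
known_only[:, 0] = 1.0                # always takes the known action

d_uniform = discounted_occupancy(P, uniform, mu0)
d_known = discounted_occupancy(P, known_only, mu0)
```

In this toy setup, the last entry of each occupancy vector is the mass that flows into the sink, that is, into transitions the model knows nothing about. A policy that only uses known transitions puts no mass there, while one that tries untested actions does, so the sink entry gives a single number to trade off against coverage of the known states.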