Direct Advantage Estimation for Partially Observable Domains
This work extends Direct Advantage Estimation (DAE) to partially observable domains with minimal modifications. To reduce computational overhead, a discrete latent dynamics model approximates the transition probabilities, which enables scalable, sample-efficient deep reinforcement learning in high-dimensional observation spaces.
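
As background, the sketch below illustrates the two ingredients usually associated with on-policy DAE: an advantage estimate centered so that its policy-weighted sum is zero, and an n-step residual whose square is minimized. It is an illustrative reading under stated assumptions, not the method summarized above. The function names (`center_advantages`, `dae_residual`) are hypothetical, the inputs are plain Python numbers rather than network outputs, and the discrete latent dynamics model is not shown. In a partially observable setting, the value and advantage inputs would presumably be computed from an encoding of the observation history rather than from the true state, which is also an assumption here.

```python
from typing import List, Sequence


def center_advantages(raw: Sequence[float], policy: Sequence[float]) -> List[float]:
    """Shift raw per-action scores so that sum_a pi(a) * A(a) == 0.

    `raw` holds one unconstrained score per action and `policy` holds the
    action probabilities pi(a | .) for the same conditioning input.
    """
    baseline = sum(p * x for p, x in zip(policy, raw))
    return [x - baseline for x in raw]


def dae_residual(
    rewards: Sequence[float],
    advantages: Sequence[float],
    v_start: float,
    v_bootstrap: float,
    gamma: float,
) -> float:
    """n-step residual: sum_t gamma^t (r_t - A_t) + gamma^n V_n - V_0.

    `advantages[t]` is the centered advantage of the action actually taken
    at step t. The squared residual is the quantity one would minimize.
    """
    total = 0.0
    for t, (r, a) in enumerate(zip(rewards, advantages)):
        total += (gamma ** t) * (r - a)
    n = len(rewards)
    return total + (gamma ** n) * v_bootstrap - v_start


# Hypothetical numbers, chosen only to exercise the two functions.
policy = [0.5, 0.25, 0.25]
centered = center_advantages([1.0, 2.0, 4.0], policy)
residual = dae_residual([1.0, 1.0], [0.0, 0.0], v_start=1.0, v_bootstrap=0.0, gamma=0.5)
loss = residual ** 2
```

Centering by subtraction is one simple way to enforce the zero-mean constraint by construction, so no penalty term is needed in the loss.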