Analytic Policy Gradients for Sample and Learning Efficient Control
Analytic Policy Gradients (APG) computes exact policy gradients by backpropagating through the simulation, which is possible when the environment dynamics are differentiable. On four continuous control tasks, APG outperforms Proximal Policy Optimization (PPO) in both sample efficiency and learning efficiency. A segmented backpropagation scheme reduces gradient degradation on long-horizon tasks.
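To make the idea of backpropagating through a differentiable simulation in segments concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. It assumes a hypothetical 1D system x_{t+1} = x_t + dt * u_t, a linear policy u_t = -k * x_t with a single parameter k, and a cost equal to the sum of squared states. It also assumes one plausible reading of "segmented": the state is carried across segment boundaries, but the gradient is not. All function names and constants are illustrative.

```python
def rollout_segment(k, x0, steps, dt=0.1):
    """Forward pass: simulate `steps` steps and return the states and summed cost."""
    xs = [x0]
    cost = 0.0
    for _ in range(steps):
        u = -k * xs[-1]            # linear policy (assumed for illustration)
        x_next = xs[-1] + dt * u   # differentiable toy dynamics
        xs.append(x_next)
        cost += x_next ** 2        # per-step cost
    return xs, cost


def segment_gradient(k, xs, dt=0.1):
    """Reverse pass: exact d(cost)/dk for one segment, treating xs[0] as a constant."""
    grad_k = 0.0
    adj = 0.0  # cost gradient flowing back into x_{t+1} from later steps
    for t in range(len(xs) - 2, -1, -1):
        g_next = 2.0 * xs[t + 1] + adj    # total d(cost)/d(x_{t+1})
        grad_k += g_next * (-dt * xs[t])  # direct dependence of x_{t+1} on k
        adj = g_next * (1.0 - dt * k)     # propagate back through x_{t+1} = x_t * (1 - dt*k)
    return grad_k


def segmented_gradient(k, x0, horizon, segment_len, dt=0.1):
    """Sum per-segment gradients; the state crosses segment boundaries, the gradient does not."""
    total_cost, total_grad, x = 0.0, 0.0, x0
    for start in range(0, horizon, segment_len):
        steps = min(segment_len, horizon - start)
        xs, cost = rollout_segment(k, x, steps, dt)
        total_grad += segment_gradient(k, xs, dt)
        total_cost += cost
        x = xs[-1]  # next segment starts here, with the gradient cut
    return total_cost, total_grad


if __name__ == "__main__":
    # Plain gradient descent on k using the segmented analytic gradient.
    k = 0.5
    for _ in range(50):
        cost, grad = segmented_gradient(k, x0=1.0, horizon=20, segment_len=5)
        k -= 0.05 * grad
    print(f"k = {k:.3f}, cost = {cost:.4f}")
```

With `segment_len` equal to `horizon`, this reduces to ordinary backpropagation through the whole rollout. Shorter segments bound how many dynamics Jacobians are multiplied together, which is the usual motivation for truncating gradients on long horizons, but the resulting gradient is biased because dependence on the parameter through the segment's initial state is dropped.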