Variance Reduction in Temporal Difference Learning

Temporal difference (TD) learning reduces variance by aggregating information across multiple trajectories. The study shows that the variance of TD estimators is asymptotically bounded above by that of Monte Carlo estimators, and that shorter-horizon updates reduce variance for a fixed number of samples. Direct Advantage Estimation (DAE) acts as a regression-adjusted control variate and achieves tighter variance bounds than TD in large samples.
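
The aggregation effect can be illustrated with a minimal sketch. It is not the study's experiment: the two-state chain (A -> B -> terminal), the Gaussian terminal reward, the episode counts, and the function names simulate_batch and compare_variance are all assumptions chosen for illustration. Monte Carlo estimates V(A) only from episodes that pass through A, while the batch TD(0) solution sets V(A) = 0 + V(B) and so also draws on episodes that start in B.

```python
import random
import statistics


def simulate_batch(rng, n_from_a=5, n_from_b=50):
    """Return (mc_estimate, td_estimate) of V(A) from one batch of episodes.

    Hypothetical setup: A -> B gives reward 0, B -> terminal gives a
    Gaussian reward with mean 1 and standard deviation 1, no discounting.
    """
    # Episodes that start in A and pass through B.
    returns_via_a = [rng.gauss(1.0, 1.0) for _ in range(n_from_a)]
    # Episodes that start directly in B (same reward distribution).
    returns_from_b = [rng.gauss(1.0, 1.0) for _ in range(n_from_b)]

    # Monte Carlo: average only the returns observed after visiting A.
    mc = statistics.fmean(returns_via_a)

    # Batch TD(0) fixed point: V(A) = 0 + V(B), where V(B) is the average
    # reward over every visit to B, whichever state the episode started in.
    td = statistics.fmean(returns_via_a + returns_from_b)
    return mc, td


def compare_variance(n_batches=2000, seed=0):
    """Empirical variance of both estimators across independent batches."""
    rng = random.Random(seed)
    estimates = [simulate_batch(rng) for _ in range(n_batches)]
    mc_var = statistics.pvariance([mc for mc, _ in estimates])
    td_var = statistics.pvariance([td for _, td in estimates])
    return mc_var, td_var


if __name__ == "__main__":
    mc_var, td_var = compare_variance()
    print(f"MC variance of V(A): {mc_var:.4f}")
    print(f"TD variance of V(A): {td_var:.4f}")
```

Under these assumptions the Monte Carlo estimate averages 5 returns and the TD estimate averages 55, so the TD variance should be roughly an order of magnitude smaller. The gap shrinks as fewer episodes start in B, since TD then has less additional data to aggregate.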