A new ensemble method for finite-horizon MDPs uses quantile-based estimates to achieve minimax optimal regret bounds. It eliminates reliance on count-based uncertainty and provides theoretical justification for ensemble-based exploration in reinforcement learning.