Leveraging Similarities in Multi-Armed Bandits

This study investigates online learning with similarity-structured action sets encoded by rooted trees, demonstrating that standard one-point feedback cannot exploit these similarities. The authors propose unified algorithms for richer feedback models that replace the number of actions with a similarity-aware effective count to improve regret bounds.

Actions are encoded by a rooted tree where levels quantify relationship closeness and losses are tree-compatible.
Standard one-point bandit feedback is proven unable to leverage range or tree-induced similarity under strong constraints.
Algorithms adapt to feedback models ranging from semi-bandit down to minimal two-point protocols.
Regret bounds replace the action count K with a similarity-aware effective number of actions, K_eff.
The approach achieves sqrt(T) regret in Lipschitz bandits with dimension d <= 2 under two-point feedback.

These algorithms provably exploit action similarities by adapting to various feedback models, offering best-of-both-worlds guarantees and improved performance in specific high-dimensional settings.