This study investigates online learning with similarity-structured action sets encoded by rooted trees, demonstrating that standard one-point feedback cannot exploit these similarities. The authors propose unified algorithms for richer feedback models that replace the number of actions with a similarity-aware effective count to improve regret bounds.
- Actions are encoded by a rooted tree where levels quantify relationship closeness and losses are tree-compatible.
- Standard one-point bandit feedback is proven unable to leverage range or tree-induced similarity under strong constraints.
- Algorithms adapt to feedback models ranging from semi-bandit down to minimal two-point protocols.
- Regret bounds replace the action count K with a similarity-aware effective number of actions, K_eff.
- The approach achieves sqrt(T) regret in Lipschitz bandits with dimension d <= 2 under two-point feedback.
These algorithms provably exploit action similarities by adapting to various feedback models, offering best-of-both-worlds guarantees and improved performance in specific high-dimensional settings.