The paper proposes SME-OFU, a novel algorithm for stochastic linear contextual bandits with bounded reward noise. It achieves a regret bound of O(log T) by combining set-membership estimation with optimism in the face of uncertainty. Simulations show that SME-OFU outperforms a benchmark designed for sub-Gaussian noise when the reward noise is bounded.
SME-OFU: Set-Membership Approach for Stochastic Linear Contextual Bandits
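The two ingredients named in the summary can be illustrated with a toy, one-dimensional sketch. This is not the paper's SME-OFU algorithm: the scalar parameter, the prior interval, the uniform noise, and all names (`feasible_interval`, `optimistic_arm`, `run`) are assumptions made for illustration only. With bounded noise, each observation (x, r) restricts the unknown parameter to the set where |r - x*theta| <= noise_bound; the optimistic step then picks the arm with the largest best-case reward over that set.

```python
import random


def feasible_interval(history, noise_bound, prior=(-10.0, 10.0)):
    """Set-membership estimate for a scalar theta (illustrative assumption).

    Intersects the constraints |r - x * theta| <= noise_bound over all
    past (x, r) pairs, starting from an assumed prior interval.
    """
    lo, hi = prior
    for x, r in history:
        if x == 0:
            continue  # a zero context carries no information about theta
        a, b = (r - noise_bound) / x, (r + noise_bound) / x
        if a > b:
            a, b = b, a  # dividing by a negative context flips the bounds
        lo, hi = max(lo, a), min(hi, b)
    return lo, hi


def optimistic_arm(arms, interval):
    """Pick the arm whose best-case reward over the feasible set is largest."""
    lo, hi = interval
    # x * theta is linear in theta, so its maximum over [lo, hi]
    # is attained at one of the two endpoints.
    return max(arms, key=lambda x: max(x * lo, x * hi))


def run(theta_true=1.5, noise_bound=0.5, rounds=200, seed=0):
    """Simulate the toy loop and return the final feasible interval."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        arms = [rng.uniform(-1.0, 1.0) for _ in range(5)]  # hypothetical contexts
        x = optimistic_arm(arms, feasible_interval(history, noise_bound))
        # Bounded (here uniform) noise; this choice is an assumption of the sketch.
        r = x * theta_true + rng.uniform(-noise_bound, noise_bound)
        history.append((x, r))
    return feasible_interval(history, noise_bound)


if __name__ == "__main__":
    print(run())
```

Because the noise never exceeds `noise_bound`, the true parameter stays inside the feasible interval in every round, which is the property a set-membership estimator relies on. In higher dimensions the feasible set would be a polytope and the optimistic step an optimization over it; the sketch does not attempt that.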