NTR Webinar: Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits

NTR organizes and hosts technical webinars about neural networks and invites speakers from all over the world to lead them. Webinars are held on Tuesdays and most are in Russian. 

On June 15 Yevgeny Seldin, University of Copenhagen, Copenhagen, Denmark, led a technical Zoom webinar on Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits.

About the webinar: 

I started with a general introduction to online learning and then presented the Tsallis-INF algorithm, which achieves the optimal (within constants) pseudo-regret in both stochastic and adversarial multi-armed bandits without prior knowledge of the regime and time horizon. 

Additionally, it achieves the optimal regret guarantee in several intermediate regimes, including stochastically constrained adversarial bandits and stochastic bandits with adversarial corruptions. 

We provided empirical evaluation of the algorithm, demonstrating that it significantly outperforms UCB1 and EXP3 in stochastic environments. We also provide examples of adversarial environments, where UCB1 and Thompson Sampling exhibit almost linear regret, whereas Tsallis-INF suffers only logarithmic regret.

I surveyed several follow-up works, including an optimal algorithm for adversarial bandits with arbitrary delays, and an algorithm for stochastic and adversarial bandits with switching costs.

The presentation was based on the following papers:

Julian Zimmert and Yevgeny Seldin. Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 2021.

Saeed Masoudian and Yevgeny Seldin. Improved analysis of robustness of the Tsallis-INF algorithm to adversarial corruptions in stochastic multiarmed bandits. COLT, 2021.

Julian Zimmert and Yevgeny Seldin. An optimal algorithm for adversarial bandits with arbitrary delays. AISTATS, 2020.

Chloé Rouyer, Yevgeny Seldin, and Nicolò Cesa-Bianchi. An algorithm for stochastic and adversarial bandits with switching costs. ICML, 2021.

Chloé Rouyer and Yevgeny Seldin. Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits. COLT, 2020.

Webinar recording (in English)

Webinar presentation (in English)

Moderator and contact: NTR CEO Nick Mikhailovsky: nickm@ntrlab.com.

Leave a Reply

Your email address will not be published. Required fields are marked *