Probability and Statistics Seminar

Monday, 1 March 2021, 13:45 - UM - Building 09 - Conference room (1st floor)

Odalric-Ambrym Maillard
Some recent results in Reinforcement Learning Theory

In this talk, I will focus on two key topics in Reinforcement Learning Theory. First, in the multi-armed bandit setup, which formalizes the exploration-exploitation trade-off, I will review some recent works showing the benefit of exploiting time-uniform concentration inequalities, of leveraging the information given by regret lower bounds in a structured bandit setup, and of considering non-parametric estimation schemes. Then, in the setup of learning in (rested, stationary, discrete-time) Markov Decision Processes, I will show how statistical tools can improve regret minimization strategies inspired by the bandit literature, first in the tabular (discrete state-action space) case, then in a continuous state-action setup with parametric transitions. Finally, I will highlight a few challenges inspired by an application to agroecology.

WEBINAR open to all: https://umontpellier-fr.zoom.us/j/85813807839
