Monday, March 1, 2021, at 13:45 - UM - Building 09 - Conference room (1st floor)
Odalric-Ambrym Maillard
In this talk, I will focus on two key topics in Reinforcement Learning Theory.
First, in the multi-armed bandit setup, which formalizes the exploration-exploitation trade-off,
I will review some recent works showing the benefits of exploiting time-uniform concentration inequalities,
of leveraging the information provided by regret lower bounds in structured bandit setups,
and of using non-parametric estimation schemes.
Then, in the setting of learning in (rested, stationary, discrete-time) Markov Decision Processes,
I will show how statistical tools can improve regret minimization strategies inspired by the bandit literature,
first in the tabular (discrete state-action space) case, then in a continuous state-action setup with parametric transitions.
Finally, I will highlight a few challenges arising from an application to agroecology.
WEBINAR open to all: https://umontpellier-fr.zoom.us/j/85813807839