Bayesian methods for machine learning have been widely investigated,
yielding principled methods for incorporating prior information into
inference algorithms. In this survey, we provide an in-depth review
of the role of Bayesian methods for the reinforcement learning (RL)
paradigm. The major motivations for incorporating Bayesian reasoning
in RL are: 1) it provides an elegant approach to action selection (exploration/
exploitation) as a function of the uncertainty in learning; and
2) it provides machinery for incorporating prior knowledge into learning algorithms.
We first discuss models and methods for Bayesian inference
in the simple single-step bandit model. We then review the extensive
recent literature on Bayesian methods for model-based RL, where prior
information can be expressed on the parameters of the Markov model.
We also present Bayesian methods for model-free RL, where priors are
expressed over the value function or policy class. The objective of
this paper is to provide a comprehensive survey of Bayesian RL algorithms
and their theoretical and empirical properties.
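As a concrete illustration of the first motivation, the minimal sketch below (ours, not drawn from the survey; the arm means, horizon, and Beta(1, 1) priors are illustrative assumptions) implements Thompson sampling for a Bernoulli bandit: each round the agent samples a mean-reward estimate from every arm's Beta posterior and acts greedily on those samples, so exploration is driven directly by posterior uncertainty and arms with well-known, low means are played less and less often.

```python
import numpy as np

def thompson_sampling_bernoulli(true_means, n_rounds=1000, seed=0):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors.

    Illustrative sketch: `true_means` are the (unknown to the agent)
    success probabilities of each arm.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # Beta posterior parameter: 1 + observed successes
    beta = np.ones(k)   # Beta posterior parameter: 1 + observed failures
    total_reward = 0.0
    for _ in range(n_rounds):
        # Sample one mean-reward estimate per arm from its posterior,
        # then act greedily with respect to the samples; exploration
        # arises naturally from posterior uncertainty.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        reward = float(rng.random() < true_means[arm])
        # Conjugate Beta-Bernoulli update of the chosen arm's posterior.
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total_reward += reward
    return total_reward, alpha, beta

if __name__ == "__main__":
    reward, alpha, beta = thompson_sampling_bernoulli([0.2, 0.5, 0.8])
    print(f"total reward over 1000 rounds: {reward:.0f}")
    print("posterior means per arm:", alpha / (alpha + beta))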