This tutorial is concerned with applications of information theory
concepts in statistics, in the finite alphabet setting. The information
measure known as information divergence or Kullback-Leibler distance
or relative entropy plays a key role, often with a geometric flavor as
an analogue of squared Euclidean distance, as in the concepts of
I-projection, I-radius and I-centroid. The topics covered include large
deviations, hypothesis testing, maximum likelihood estimation in
exponential families, analysis of contingency tables, and iterative
algorithms with an “information geometry” background. Also, an
introduction is provided to the theory of universal coding, and to
statistical inference via the minimum description length principle
motivated by that theory.
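As a reminder, the central quantity referred to above is the relative entropy of two distributions on a finite alphabet; the display below only records the standard definition, in one common notation chosen here for concreteness:
\[
D(P\|Q) \;=\; \sum_{x \in \mathcal{X}} P(x)\,\log\frac{P(x)}{Q(x)},
\]
which is nonnegative and equals zero if and only if $P = Q$; this is the sense in which it behaves as an analogue of squared Euclidean distance.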