
Why is Kullback-Leibler divergence not a distance?
As an example, consider the probability densities below, one exponential and one gamma with a shape parameter of 2.

The two densities differ mostly at the left end. The exponential distribution puts substantial probability mass in this region while the gamma does not. Because KL(P‖Q) = E_P[log(p/q)] is an expectation with respect to its first argument, the divergence weighs disagreements more heavily in regions that P considers likely. In this information-theoretic sense, an exponential is a better approximation to a gamma than the other way around.
tags: statistics, InformationTheory
November 2017 by mike