Article

Information Theory With Kernel Methods

Journal

IEEE Transactions on Information Theory
Volume 69, Issue 2, Pages 752-775

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TIT.2022.3211077

Keywords

Kernel; Entropy; Probability distribution; Information theory; Hilbert space; Eigenvalues and eigenfunctions; Tensors; Kernel methods; quantum information theory


We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from various oracles on the probability distributions. We also consider product spaces and show that for tensor product kernels we can define notions of mutual information and joint entropies, which characterize independence perfectly but conditional independence only partially. We finally show how these new notions of relative entropy lead to new upper bounds on log partition functions, which can be used together with convex optimization within variational inference methods, providing a new family of probabilistic inference methods.
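The covariance-operator entropy described in the abstract admits a simple empirical estimate: for the empirical distribution over n samples, the (normalized) kernel covariance operator shares its nonzero eigenvalues with the normalized kernel matrix K/n, so the von Neumann entropy -tr(Σ log Σ) reduces to an eigenvalue computation. The sketch below is illustrative, not the paper's estimator: the function name, the choice of a Gaussian RBF kernel, and the bandwidth parameter `sigma` are all assumptions made for the example.

```python
import numpy as np

def von_neumann_entropy_rkhs(X, sigma=1.0, eps=1e-12):
    """Illustrative sketch (not the paper's exact estimator):
    von Neumann entropy of the empirical kernel covariance operator
    for samples X (shape n x d), using a Gaussian RBF kernel.
    """
    # Pairwise squared distances and Gaussian kernel matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    K = np.exp(-d2 / (2.0 * sigma**2))

    n = len(X)
    # Nonzero eigenvalues of K/n coincide with those of the
    # empirical covariance operator in the RKHS.
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > eps]       # drop numerical zeros / round-off negatives
    lam = lam / lam.sum()      # normalize to unit trace
    return float(-np.sum(lam * np.log(lam)))
```

As a sanity check on the independence with Shannon entropy: n identical samples give a rank-one operator (entropy 0), while n well-separated samples give K ≈ I, so the entropy approaches log n, mirroring the uniform-distribution case.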


