Article

Ensemble Estimation of Generalized Mutual Information With Applications to Genomics

Journal

IEEE TRANSACTIONS ON INFORMATION THEORY
Volume 67, Issue 9, Pages 5963-5996

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIT.2021.3100108

Keywords

Convergence; Estimation; Random variables; Feature extraction; Entropy; Density measurement; Kernel; Mutual information; nonparametric estimation; central limit theorem; single cell data; feature selection; minimax rate

Funding

  1. U.S. Army Research Office [W911NF1910269, W911NF1510479]
  2. National Nuclear Security Administration in U.S. Department of Energy [DE-NA0003921]
  3. U.S. Department of Defense (DOD) [W911NF1510479]


This paper introduces the concept and applications of mutual information and proposes GENIE, an ensemble estimator of mutual information measures between continuous and mixed random variables. The estimator achieves the parametric mean squared error convergence rate of 1/N and is suited to the mixed discrete/continuous scenarios commonly encountered in practice.
Mutual information is a measure of the dependence between random variables that has been used successfully in myriad applications in many fields. Generalized mutual information measures that go beyond classical Shannon mutual information have also received much interest in these applications. We derive the mean squared error convergence rates of kernel density-based plug-in estimators of general mutual information measures between two multidimensional random variables X and Y for two cases: 1) X and Y are continuous; 2) X and Y may have a mixture of discrete and continuous components. Using the derived rates, we propose an ensemble estimator of these information measures called GENIE by taking a weighted sum of the plug-in estimators with varied bandwidths. The resulting ensemble estimators achieve the 1/N parametric mean squared error convergence rate when the conditional densities of the continuous variables are sufficiently smooth. To the best of our knowledge, this is the first nonparametric mutual information estimator known to achieve the parametric convergence rate for the mixture case, which frequently arises in applications (e.g. variable selection in classification). The estimator is simple to implement and it uses the solution to an offline convex optimization problem and simple plug-in estimators. A central limit theorem is also derived for the ensemble estimators and minimax rates are derived for the continuous case. We demonstrate the ensemble estimator for the mixed case on simulated data and apply the proposed estimator to analyze gene relationships in single cell data.
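The abstract describes the estimator as a weighted sum of kernel density-based plug-in estimators computed at several bandwidths, with the weights chosen by an offline convex optimization that cancels lower-order bias terms. The sketch below illustrates only the ensemble structure, not the paper's method: the toy `kde_mi_plugin` is a naive resubstitution Shannon-MI estimator for 1-D variables, and the bandwidths and weights are supplied by the caller rather than solved for, so this is a minimal illustration under those assumptions.

```python
import numpy as np

def kde_mi_plugin(x, y, h):
    """Toy KDE plug-in estimate of Shannon MI I(X;Y) for 1-D x, y.

    Gaussian kernels with a single bandwidth h; resubstitution
    (no leave-one-out correction), so it is biased -- illustrative only.
    """
    n = len(x)

    def kde(pts, data, bw):
        # Gaussian kernel density estimate evaluated at pts.
        d = pts[:, None] - data[None, :]
        return np.exp(-0.5 * (d / bw) ** 2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))

    fx = kde(x, x, h)
    fy = kde(y, y, h)
    # Joint density via a product kernel with the same bandwidth in both axes.
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    fxy = np.exp(-0.5 * ((dx / h) ** 2 + (dy / h) ** 2)).sum(axis=1) / (n * 2 * np.pi * h ** 2)
    # Plug-in estimate: average log density ratio over the sample.
    return np.mean(np.log(fxy / (fx * fy)))

def ensemble_mi(x, y, bandwidths, weights):
    """Weighted sum of plug-in estimates at several bandwidths.

    In the paper, the weights sum to one and solve an offline convex
    program that cancels lower-order bias terms; here they are simply
    passed in by the caller (e.g., uniform weights).
    """
    estimates = np.array([kde_mi_plugin(x, y, h) for h in bandwidths])
    return float(np.dot(weights, estimates))
```

For example, with `y = x + noise` the ensemble estimate should come out clearly larger than for independent samples, since the true mutual information is large in the first case and zero in the second.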

