4.7 Review

Quantifying relevance in learning and inference

Journal: Physics Reports
Publisher: ELSEVIER
DOI: 10.1016/j.physrep.2022.03.001

Keywords: Relevance; Statistical inference; Machine learning; Information theory

Funding:

  1. Kavli Foundation, United States
  2. Norwegian Research Council, Centre of Excellence scheme (Centre for Neural Computation) [223262]

Summary

Learning is a distinctive feature of intelligent behaviour, but our conceptual understanding of learning is still poor. This article reviews recent progress in understanding learning based on the concept of relevance, which quantifies the amount of information that a dataset, or the internal representation of a learning machine, contains about the generative model of the data. The theoretical framework is supported by empirical analysis.

Abstract

Learning is a distinctive feature of intelligent behaviour. High-throughput experimental data and Big Data promise to open new windows on complex systems such as cells, the brain or our societies. Yet the puzzling success of Artificial Intelligence and Machine Learning shows that we still have a poor conceptual understanding of learning. These applications push statistical inference into uncharted territories where data are high-dimensional and scarce, and prior information on the true models is scant if not totally absent. Here we review recent progress on understanding learning, based on the notion of relevance. The relevance, as we define it here, quantifies the amount of information that a dataset or the internal representation of a learning machine contains about the generative model of the data. This allows us to define maximally informative samples on the one hand, and optimal learning machines on the other. These are ideal limits of samples and of machines that contain the maximal amount of information about the unknown generative process at a given resolution (or level of compression). Both ideal limits exhibit critical features in the statistical sense: maximally informative samples are characterised by a power-law frequency distribution (statistical criticality), and optimal learning machines by an anomalously large susceptibility. The trade-off between resolution (i.e. compression) and relevance separates the regime of noisy representations from that of lossy compression; the two regimes are divided by a special point characterised by Zipf's law statistics. This identifies samples obeying Zipf's law as the most compressed lossless representations that are optimal in the sense of maximal relevance. Criticality in optimal learning machines manifests as an exponential degeneracy of energy levels, which leads to unusual thermodynamic properties. This distinctive feature is consistent with the invariance of the classification under coarse graining of the output, which is a desirable property of learning machines. The theoretical framework is corroborated by empirical analysis showing (i) how the concept of relevance can be used to identify relevant variables in high-dimensional inference, and (ii) that widely used machine learning architectures approach the ideal limit of optimal learning machines reasonably well, within the limits of the data on which they are trained. © 2022 The Authors. Published by Elsevier B.V.
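In this line of work, resolution and relevance are usually defined from the empirical frequency distribution of a sample: if a state s occurs k_s times in N observations, the resolution is the entropy H[s] of the state frequencies k_s/N, and the relevance is the entropy H[k] of the frequency-of-frequencies distribution. The minimal sketch below assumes these standard definitions; the function name, the synthetic Zipf-like sample and the numerical choices are illustrative and not taken from the paper.

```python
from collections import Counter
import numpy as np

def resolution_and_relevance(sample):
    """Compute resolution H[s] and relevance H[k] (in nats) of a sample.

    sample: iterable of hashable observations (states).
    Resolution H[s]: entropy of the empirical state frequencies k_s / N.
    Relevance  H[k]: entropy of the probability k * m_k / N that a random
    observation belongs to a state seen exactly k times, where m_k is the
    number of distinct states observed exactly k times.
    """
    counts = Counter(sample)                      # k_s for each observed state s
    N = sum(counts.values())

    # Resolution: H[s] = -sum_s (k_s/N) log(k_s/N)
    p_s = np.array(list(counts.values())) / N
    H_s = -np.sum(p_s * np.log(p_s))

    # Frequency of frequencies: m_k = number of states with count k
    m_k = Counter(counts.values())
    # Relevance: H[k] = -sum_k (k*m_k/N) log(k*m_k/N)
    p_k = np.array([k * m for k, m in m_k.items()]) / N
    H_k = -np.sum(p_k * np.log(p_k))
    return H_s, H_k

# Illustration only: a synthetic sample whose state frequencies roughly follow
# Zipf's law (frequency of the r-th most common state ~ 1/r).
rng = np.random.default_rng(0)
ranks = np.arange(1, 10_001)
probs = (1.0 / ranks) / np.sum(1.0 / ranks)
sample = rng.choice(ranks, size=50_000, p=probs)

H_s, H_k = resolution_and_relevance(sample)
print(f"resolution H[s] = {H_s:.3f} nats, relevance H[k] = {H_k:.3f} nats")
```

Under this construction, the trade-off described in the abstract can be read off the (H[s], H[k]) plane: maximally informative samples maximise the relevance H[k] at a given resolution H[s], and samples whose frequencies follow Zipf's law sit at the special point separating the noisy-representation regime from the lossy-compression regime.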
