Article

The dropout learning algorithm

Journal

ARTIFICIAL INTELLIGENCE
Volume 210, Pages 78-122

Publisher

ELSEVIER
DOI: 10.1016/j.artint.2014.02.004

Keywords

Machine learning; Neural networks; Ensemble; Regularization; Stochastic neurons; Stochastic gradient descent; Backpropagation; Geometric mean; Variance minimization; Sparse representations

Funding

  1. NVIDIA
  2. [NSF IIS-0513376]
  3. [NSF-IIS-1321053]
  4. [NIH LM010235]
  5. [NIH NLM T15 LM07443]
  6. Direct For Computer & Info Scie & Enginr [1321053] Funding Source: National Science Foundation
  7. Div Of Information & Intelligent Systems [1321053] Funding Source: National Science Foundation

Abstract

Dropout is a recently introduced algorithm for training neural networks that randomly drops units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analysis of the ensemble averaging properties of dropout in linear networks, which is useful for understanding the non-linear case. The ensemble averaging properties of dropout in non-linear logistic networks result from three fundamental equations: (1) the approximation of the expectations of logistic functions by normalized geometric means, for which bounds and estimates are derived; (2) the algebraic equality between the normalized geometric mean of logistic functions and the logistic of the mean, which mathematically characterizes logistic functions; and (3) the linearity of means with respect to sums, as well as products of independent variables. The results are also extended to other classes of transfer functions, including rectified linear functions. Approximation errors tend to cancel each other and do not accumulate. Dropout can also be connected to stochastic neurons and used to predict firing rates, and to backpropagation by viewing the backward propagation as ensemble averaging in a dropout linear network. Moreover, the convergence properties of dropout can be understood in terms of stochastic gradient descent. Finally, regarding the regularization properties of dropout, the expectation of the dropout gradient is the gradient of the corresponding approximation ensemble, regularized by an adaptive weight-decay term with a propensity for self-consistent variance minimization and sparse representations. (C) 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
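The core identity behind this ensemble-averaging analysis can be stated compactly. For a logistic unit with total input S_m in dropout sub-network m, occurring with probability P(m), equations (1) and (2) from the abstract combine as follows (sigma denotes the logistic function; this is a reconstruction of the identity the abstract summarizes, not a quotation of the paper's notation):

\[
E[\sigma(S)] \;\approx\; \mathrm{NWGM}[\sigma(S)]
= \frac{\prod_m \sigma(S_m)^{P(m)}}{\prod_m \sigma(S_m)^{P(m)} + \prod_m \bigl(1 - \sigma(S_m)\bigr)^{P(m)}}
= \sigma\bigl(E[S]\bigr)
\]

so a single deterministic forward pass with expected inputs approximates the geometric-mean ensemble over all exponentially many sub-networks. A minimal numerical sketch for a single logistic unit with Bernoulli gating on its inputs (the retention rate p = 0.5 and all variable names are illustrative, not taken from the paper):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w = rng.normal(size=10)   # fixed weights of one logistic unit
x = rng.normal(size=10)   # fixed input vector
p = 0.5                   # Bernoulli retention probability (illustrative)

# Monte Carlo estimate of the true ensemble average E[sigma(S)]
# over random Bernoulli(p) gating masks applied to the inputs.
masks = rng.random((200_000, 10)) < p
ensemble_mean = sigmoid((masks * x) @ w).mean()

# Deterministic approximation sigma(E[S]) = sigma(p * w . x),
# which equals the NWGM of the sub-network outputs exactly.
nwgm = sigmoid(p * (x @ w))

print(f"E[sigma(S)] ~ {ensemble_mean:.4f}")
print(f"sigma(E[S]) = {nwgm:.4f}")

The two printed values are close, illustrating the abstract's claim that the approximation errors are small and tend to cancel rather than accumulate.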

Authors

Pierre Baldi; Peter Sadowski
