Article

Training neural networks using Metropolis Monte Carlo and an adaptive variant

MACHINE LEARNING-SCIENCE AND TECHNOLOGY (2022)

Journal

MACHINE LEARNING-SCIENCE AND TECHNOLOGY

Volume 3, Issue 4, Pages -

Publisher

IOP Publishing Ltd

DOI: 10.1088/2632-2153/aca6cd

Keywords

adaptive; optimization; neural networks; Metropolis Monte Carlo; gradients

Categories

Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Multidisciplinary Sciences

Funding

Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy [DE-AC02-05CH11231]
National Science and Engineering Council of Canada
Research Foundation-Flanders (FWO)
National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility [DE-AC02-05CH11231]

Smart summary

The zero-temperature Metropolis Monte Carlo (MC) algorithm is examined as a tool for training neural networks. It can train neural networks to an accuracy comparable to that of gradient descent (GD), although not necessarily as quickly. The adaptive Monte Carlo algorithm (aMC) is introduced to overcome limitations that arise when the network structure or neuron activations are strongly heterogeneous. The MC method allows training of deep neural networks and recurrent neural networks that cannot be trained by GD because their gradients are too small or too large. MC methods offer a complementary approach to gradient-based methods for training neural networks, providing access to different network architectures and principles.

Abstract

We examine the zero-temperature Metropolis Monte Carlo (MC) algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis MC can train a neural net with an accuracy comparable to that of gradient descent (GD), if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo algorithm (aMC) to overcome these limitations. The intrinsic stochasticity and numerical stability of the MC method allow aMC to train deep neural networks and recurrent neural networks in which the gradient is too small or too large to allow training by GD. MC methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.
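To make the zero-temperature Metropolis idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes Gaussian proposals applied to every parameter at once with a fixed step size `sigma`, a toy one-input network with four tanh hidden units, and a sine-fitting task; the function names, the network, the data, and all numerical settings are hypothetical choices made for illustration. The adaptive variant (aMC) is not shown.

```python
import math
import random


def predict(params, x, hidden=4):
    """Toy network (an assumption): 1 input, one tanh hidden layer, 1 output."""
    w1 = params[0:hidden]
    b1 = params[hidden:2 * hidden]
    w2 = params[2 * hidden:3 * hidden]
    b2 = params[3 * hidden]
    return b2 + sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden))


def loss(params, data):
    """Mean squared error over (x, y) pairs."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def zero_temperature_metropolis(params, data, sigma=0.05, steps=2000, seed=0):
    """Propose a Gaussian perturbation of all parameters; accept it only if
    the loss does not increase (the zero-temperature Metropolis rule)."""
    rng = random.Random(seed)
    current = loss(params, data)
    history = [current]
    for _ in range(steps):
        trial = [p + rng.gauss(0.0, sigma) for p in params]
        trial_loss = loss(trial, data)
        if trial_loss <= current:  # uphill moves are always rejected
            params, current = trial, trial_loss
        history.append(current)
    return params, history


# Hypothetical toy task: fit sin(x) on [-3, 3].
data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-30, 31)]
init_rng = random.Random(1)
init = [init_rng.gauss(0.0, 1.0) for _ in range(13)]  # 3 * 4 weights/biases + 1 output bias

trained, history = zero_temperature_metropolis(init, data)
print("initial loss:", history[0])
print("final loss:", history[-1])
```

Because a proposed move is kept only when the loss does not go up, the recorded loss never increases, and no gradient is ever computed. That is the property the abstract relies on when it describes training in regimes where gradients are too small or too large for GD.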
