Article

Training Neural Networks by Lifted Proximal Operator Machines

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2020.3048430

Keywords

Training; Artificial neural networks; Linear programming; Convergence; Tuning; Standards; Patents; Neural networks; lifted proximal operator machines; block multi-convex; block coordinate descent; parallel implementation

Funding

  1. NSF of China [61802269, 61972132, 61625301, 61731018, 61876007]
  2. Major Scientific Research Project of Zhejiang Lab [2019KB0AC01, 2019KB0AB02]
  3. Beijing Academy of Artificial Intelligence and Qualcomm
  4. Fundamental Research Funds for the Central Universities

Abstract

LPOM trains fully-connected feed-forward neural networks by representing the activation function as an equivalent proximal operator and adding the resulting penalties to the network objective, which is block multi-convex in all layer-wise weights and activations. It avoids gradient vanishing and exploding issues, is nearly as memory-efficient as SGD, and is relatively easy to tune.

We present the lifted proximal operator machine (LPOM) to train fully-connected feed-forward neural networks. LPOM represents the activation function as an equivalent proximal operator and adds the proximal operators to the objective function of a network as penalties. LPOM is block multi-convex in all layer-wise weights and activations. This allows us to develop a new block coordinate descent (BCD) method with a convergence guarantee to solve it. Owing to the novel formulation and solving method, LPOM uses only the activation function itself and does not require any gradient steps. It therefore avoids the gradient vanishing and exploding issues that often plague gradient-based methods, and it can handle various non-decreasing Lipschitz continuous activation functions. Additionally, LPOM is almost as memory-efficient as stochastic gradient descent, and its parameter tuning is relatively easy. We further implement and analyze the parallel solution of LPOM. We first propose a general asynchronous-parallel BCD method with a convergence guarantee and use it to solve LPOM, resulting in asynchronous-parallel LPOM. For faster training, we also develop synchronous-parallel LPOM. We validate the advantages of LPOM on various network architectures and datasets, and apply synchronous-parallel LPOM to autoencoder training, demonstrating its fast convergence and superior performance.
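To make the "activation as proximal operator" idea concrete, here is a rough sketch of how such a lifted, penalized objective can be written; the exact penalty terms, scaling, and notation in the paper may differ, and the symbols $\phi$, $f$, $g$, $\mu_i$, $W^i$, $X^i$, $L$, and $\ell$ are shorthand introduced here. For a non-decreasing activation $\phi$, one can define $f(x)=\int_0^x(\phi^{-1}(y)-y)\,dy$, so that $\phi(x)=\operatorname{prox}_f(x)=\arg\min_y\{f(y)+\tfrac12(y-x)^2\}$. The layer constraints $X^i=\phi(W^{i-1}X^{i-1})$ can then be relaxed into penalties of the form

\[
\min_{\{W^i\},\{X^i\}}\ \ell(X^n,L)\;+\;\sum_{i=2}^{n}\mu_i\Big(\mathbf{1}^{\top}f(X^i)\,\mathbf{1}+\mathbf{1}^{\top}g(W^{i-1}X^{i-1})\,\mathbf{1}+\tfrac12\|X^i-W^{i-1}X^{i-1}\|_F^2\Big),
\]

where $g$ is a companion potential (e.g. $g(x)=\int_0^x(\phi(y)-y)\,dy$), $f$ and $g$ act elementwise, $L$ denotes the labels, and $\ell$ is the training loss. Each penalty term is convex in $X^i$ with the other blocks fixed and convex in $W^{i-1}$ with the activations fixed, which is the block multi-convexity that a BCD solver can exploit.

The following is a minimal Python/NumPy sketch of a block coordinate descent loop for an objective of this kind, assuming a ReLU activation and a squared loss. The update rules, inner solver, and penalty schedule are illustrative placeholders rather than the paper's algorithm, and the names bcd_train, mu, n_inner, etc. are hypothetical.

    # Toy block coordinate descent on a lifted, quadratically penalized objective.
    # Illustrative only: not the paper's exact update rules or penalty schedule.
    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def bcd_train(X0, labels, layer_sizes, mu=10.0, n_outer=50, n_inner=5, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        n = len(layer_sizes)
        # W[i] maps the activations of layer i to the pre-activations of layer i+1.
        W = [rng.standard_normal((layer_sizes[i + 1], layer_sizes[i])) * 0.1
             for i in range(n - 1)]
        # Auxiliary activation variables, initialized by a forward pass.
        X = [X0]
        for i in range(n - 1):
            X.append(relu(W[i] @ X[i]))

        for _ in range(n_outer):
            # Hidden activation blocks: minimize the penalties involving X[i]
            # with all other blocks fixed (a few gradient steps as inner solver).
            for i in range(1, n - 1):
                for _ in range(n_inner):
                    z = W[i] @ X[i]
                    grad = mu * (X[i] - relu(W[i - 1] @ X[i - 1])) \
                         + mu * W[i].T @ ((relu(z) - X[i + 1]) * (z > 0))
                    X[i] -= lr * grad
            # Output block: trade off the squared loss against its penalty term.
            X[-1] -= lr * ((X[-1] - labels) + mu * (X[-1] - relu(W[-1] @ X[-2])))
            # Weight blocks: one gradient step per layer on the corresponding penalty.
            for i in range(n - 1):
                z = W[i] @ X[i]
                W[i] -= lr * mu * ((relu(z) - X[i + 1]) * (z > 0)) @ X[i].T
        return W, X

For instance, bcd_train(X0, Y, [784, 300, 10]) would run the loop on a 784-300-10 network with X0 of shape (784, batch) and Y of shape (10, batch). Note that only the activation function itself appears in the updates; no end-to-end backpropagated gradient is needed, which is the property the abstract highlights.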
