4.7 Article

SARAH-M: A fast stochastic recursive gradient descent algorithm via momentum

Related references

Note: Only a subset of the related references is listed.
Article Operations Research & Management Science

Momentum-Based Variance-Reduced Proximal Stochastic Gradient Method for Composite Nonconvex Stochastic Optimization

Yangyang Xu et al.

Summary: This paper proposes PStorm, a new stochastic gradient method for solving nonconvex, nonsmooth stochastic problems. Using only one or O(1) samples per update, PStorm achieves the optimal complexity of O(ε⁻³) for producing a stochastic ε-stationary solution. It applies to online learning problems and shows better generalization on large-scale machine learning problems than competing methods and the vanilla SGM. (A minimal sketch of a momentum-based variance-reduced proximal update follows this entry.)

JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS (2023)
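
The momentum-based variance-reduced update behind methods of this kind can be illustrated with a small sketch. The code below is a generic STORM-style recursive estimator combined with an l1 proximal step, not the exact PStorm pseudocode; the helper names (`grad`, `sample`, `prox_l1`) and all constants are assumptions for illustration.

```python
import numpy as np

def prox_l1(x, lam):
    """Proximal operator of lam * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def momentum_vr_prox(grad, sample, x0, n_iters=1000, lr=1e-2, beta=0.1, lam=1e-3, seed=0):
    """Generic momentum-based variance-reduced proximal stochastic gradient sketch.

    grad(x, xi) : stochastic gradient of the smooth part at x for sample xi
    sample(rng) : draws one data sample (or a small O(1) batch) per update
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = grad(x, sample(rng))                   # initial gradient estimate d_0
    for _ in range(n_iters):
        x_new = prox_l1(x - lr * d, lr * lam)  # proximal step on the composite objective
        xi = sample(rng)                       # one fresh sample per update
        # Recursive momentum estimator: the same sample evaluated at x_new and x.
        d = grad(x_new, xi) + (1.0 - beta) * (d - grad(x, xi))
        x = x_new
    return x

# Toy usage: l1-regularized least squares with per-sample gradients.
A, b = np.random.randn(200, 50), np.random.randn(200)
x_hat = momentum_vr_prox(lambda x, i: A[i] * (A[i] @ x - b[i]),
                         lambda rng: rng.integers(len(b)),
                         np.zeros(50))
```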

Article Automation & Control Systems

Distributed Momentum-Based Frank-Wolfe Algorithm for Stochastic Optimization

Jie Hou et al.

Summary: This paper proposes a distributed stochastic Frank-Wolfe solver for convex and nonconvex optimization over networks. The algorithm combines Nesterov's momentum with gradient tracking to establish convergence. Numerical simulations demonstrate its efficacy against competing alternatives. (A simplified, centralized sketch of the momentum-based Frank-Wolfe step follows this entry.)

IEEE-CAA JOURNAL OF AUTOMATICA SINICA (2023)
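
A centralized, single-node sketch of a momentum-based stochastic Frank-Wolfe step is given below; the distributed version additionally averages iterates and tracks gradients across network neighbors, which is omitted here. The l1-ball constraint, the decay schedules, and the function name `stoch_grad` are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """Linear minimization oracle over the l1 ball: argmin_{||s||_1 <= radius} <g, s>."""
    s = np.zeros_like(g)
    i = int(np.argmax(np.abs(g)))
    s[i] = -radius * np.sign(g[i])
    return s

def momentum_stochastic_fw(stoch_grad, x0, n_iters=2000, radius=1.0, seed=0):
    """Centralized sketch of a momentum-based (projection-free) stochastic Frank-Wolfe.

    stoch_grad(x, rng) returns a stochastic gradient at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = np.zeros_like(x)
    for t in range(1, n_iters + 1):
        rho, gamma = 1.0 / t ** (2.0 / 3.0), 2.0 / (t + 2.0)  # illustrative decay schedules
        d = (1.0 - rho) * d + rho * stoch_grad(x, rng)        # momentum gradient estimate
        x = x + gamma * (lmo_l1_ball(d, radius) - x)          # Frank-Wolfe (projection-free) step
    return x
```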

Article Operations Research & Management Science

Finite-sum smooth optimization with SARAH

Lam M. Nguyen et al.

Summary: NC-SARAH is a modified version of the original SARAH algorithm for non-convex optimization that allows flexible mini-batch sizes and large step sizes to achieve fast convergence. SARAH++ is proposed for convex optimization with its own convergence-rate guarantees and shows improved performance in numerical experiments. (A minimal sketch of the underlying SARAH recursion follows this entry.)

COMPUTATIONAL OPTIMIZATION AND APPLICATIONS (2022)
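
For reference, the SARAH recursive gradient estimator that NC-SARAH and SARAH++ build on can be sketched as follows; this is a minimal single-sample version for a finite-sum objective, with assumed helper names (`full_grad`, `comp_grad`) and illustrative constants.

```python
import numpy as np

def sarah(full_grad, comp_grad, w0, n, n_outer=20, n_inner=100, lr=0.05, seed=0):
    """Minimal SARAH sketch for minimizing (1/n) * sum_i f_i(w).

    full_grad(w)    : exact gradient of the full objective at w
    comp_grad(w, i) : gradient of the single component f_i at w
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for _ in range(n_outer):
        v = full_grad(w)                 # full gradient at the outer snapshot
        w_prev, w = w, w - lr * v
        for _ in range(n_inner):
            i = rng.integers(n)
            # Recursive estimator: the same component i evaluated at w and w_prev.
            v = comp_grad(w, i) - comp_grad(w_prev, i) + v
            w_prev, w = w, w - lr * v
    return w
```

The variants discussed in this entry mainly adjust the mini-batch size, step size, and inner-loop length around this core recursion.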

Article Computer Science, Artificial Intelligence

Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent

Bao Wang et al.

Summary: This paper proposes a new DNN training scheme, scheduled restart SGD (SRSGD), which replaces the constant momentum in SGD with an increasing momentum and stabilizes the iterations by resetting the momentum to zero according to a schedule. Experimental results demonstrate that SRSGD significantly improves the convergence and generalization of DNNs across various models and benchmarks. (A minimal sketch of scheduled-restart momentum follows this entry.)

SIAM JOURNAL ON IMAGING SCIENCES (2022)
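
A minimal sketch of the scheduled-restart idea is shown below, using a Nesterov-style increasing momentum coefficient k/(k + 3) that is reset to zero at fixed intervals. The restart interval, the specific coefficient schedule, and the function name `stoch_grad` are assumptions for illustration rather than the exact SRSGD schedule.

```python
import numpy as np

def srsgd_style(stoch_grad, x0, n_iters=5000, lr=0.1, restart_every=40, seed=0):
    """Scheduled-restart momentum SGD sketch.

    stoch_grad(x, rng) returns a stochastic gradient at x.
    The momentum coefficient k / (k + 3) grows with the counter k and is
    reset to zero every `restart_every` iterations.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v_prev = x.copy()
    k = 0
    for _ in range(n_iters):
        v = x - lr * stoch_grad(x, rng)          # SGD step
        x = v + (k / (k + 3.0)) * (v - v_prev)   # increasing momentum extrapolation
        v_prev = v
        k += 1
        if k == restart_every:                   # scheduled restart: momentum back to zero
            k = 0
    return x
```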

Article Computer Science, Information Systems

Accelerating mini-batch SARAH by step size rules

Zhuang Yang et al.

Summary: The performance of the SARAH method depends on the choice of step-size sequence, which motivates the proposed MB-SARAH-RBB method; it is proven to converge linearly in expectation for strongly convex objective functions and attains better gradient complexity. Numerical experiments show the superiority of the proposed methods. (A sketch of one automatic step-size rule of this kind follows this entry.)

INFORMATION SCIENCES (2021)
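
Assuming the "RBB" rule is of Barzilai-Borwein type (an assumption based on the acronym), one common way to compute such a step size from successive outer-loop snapshots is sketched below; the exact formula, scaling, and randomization used in MB-SARAH-RBB may differ.

```python
import numpy as np

def bb_step_size(w_curr, w_prev, g_curr, g_prev, inner_len, eps=1e-12):
    """Barzilai-Borwein-style step size from successive outer-loop snapshots.

    w_curr, w_prev : current and previous snapshot iterates
    g_curr, g_prev : gradient estimates (full or mini-batch) at those snapshots
    inner_len      : inner-loop length used to scale the step
    """
    s, y = w_curr - w_prev, g_curr - g_prev
    return float(s @ s) / (inner_len * abs(float(s @ y)) + eps)
```

In a mini-batch SARAH outer loop (see the SARAH sketch above), such a step size would be recomputed once per outer iteration and reused for all inner updates.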

Article Computer Science, Artificial Intelligence

DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for Uncertainty Inference in Image Recognition

Jiyang Xie et al.

Summary: This paper introduces DS-UI, a framework that combines a DNN classifier with a mixture of Gaussian mixture models (MoGMM) to enhance Bayesian-estimation-based uncertainty inference (UI) in image recognition. DS-UI improves image recognition accuracy by calculating probabilities directly, and a dual-supervised stochastic gradient-based variational Bayes algorithm is proposed for optimization.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2021)

Article Computer Science, Software Engineering

Regularized nonlinear acceleration

Damien Scieur et al.

MATHEMATICAL PROGRAMMING (2020)

Article Computer Science, Artificial Intelligence

An accelerated stochastic variance-reduced method for machine learning problems

Zhuang Yang et al.

KNOWLEDGE-BASED SYSTEMS (2020)

Article Operations Research & Management Science

Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods

Nicolas Loizou et al.

COMPUTATIONAL OPTIMIZATION AND APPLICATIONS (2020)

Article Mathematics, Applied

Optimization Methods for Large-Scale Machine Learning

Leon Bottou et al.

SIAM REVIEW (2018)

Article Computer Science, Artificial Intelligence

Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization

Yadong Mu et al.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2017)

Article Mathematics, Applied

Local improvement results for Anderson acceleration with inaccurate function evaluations

Alex Toth et al.

SIAM JOURNAL ON SCIENTIFIC COMPUTING (2017)

Article Mathematics, Applied

Anderson acceleration of the alternating projections method for computing the nearest correlation matrix

Nicholas J. Higham et al.

NUMERICAL ALGORITHMS (2016)

Article Mathematics, Applied

Convergence analysis for Anderson acceleration

Alex Toth et al.

SIAM JOURNAL ON NUMERICAL ANALYSIS (2015)

Article Mathematics, Applied

Proximal stochastic gradient method with progressive variance reduction

Lin Xiao et al.

SIAM JOURNAL ON OPTIMIZATION (2014)

Article Computer Science, Artificial Intelligence

Dreaming of mathematical neuroscience for half a century

Shun-ichi Amari

NEURAL NETWORKS (2013)

Article Mathematics, Applied

Randomized smoothing for stochastic optimization

John C. Duchi et al.

SIAM JOURNAL ON OPTIMIZATION (2012)

Article Computer Science, Artificial Intelligence

LIBSVM: A Library for Support Vector Machines

Chih-Chung Chang et al.

ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY (2011)