3.8 Proceedings Paper

Structured Stochastic Gradient MCMC

Publisher

JMLR (Journal of Machine Learning Research)

Keywords

-

Funding

  1. National Science Foundation (NSF) under the NSF CAREER Award [2047418]
  2. NSF [1928718, 2003237, 2007719]
  3. NSF Graduate Research Fellowship [DGE-1839285]
  4. U.S. Department of Energy (DOE), Office of Science [DE-SC0022331]
  5. Defense Advanced Research Projects Agency (DARPA) [HR001120C0021]
  6. NSF Directorate for Computer & Information Science & Engineering [2003237, 2007719, 2047418]
  7. NSF Directorate for Social, Behavioral & Economic Sciences [1928718]
  8. NSF Division of Computer and Network Systems [2003237]
  9. NSF Division of Social and Economic Sciences [1928718]
  10. NSF Division of Information & Intelligent Systems

Abstract

Researchers propose a new non-parametric variational inference scheme that combines ideas from SGMCMC and coordinate-ascent VI, aiming to relax the assumptions on the posterior distribution. They introduce a new Langevin-type algorithm that operates on a self-averaged posterior energy function to break the statistical dependencies between coordinates, allowing for faster mixing. The scheme is tested on ResNet-20 image-classification tasks and shows improvements in convergence speed and/or final accuracy compared to SGMCMC and parametric VI.
Stochastic gradient Markov chain Monte Carlo (SGMCMC) is a scalable algorithm for asymptotically exact Bayesian inference in parameter-rich models, such as Bayesian neural networks. However, since mixing can be slow in high dimensions, practitioners often resort to variational inference (VI). Unfortunately, VI makes strong assumptions on both the factorization and functional form of the posterior. To relax these assumptions, this work proposes a new non-parametric variational inference scheme that combines ideas from both SGMCMC and coordinate-ascent VI. The approach relies on a new Langevin-type algorithm that operates on a self-averaged posterior energy function, where parts of the latent variables are averaged over samples from earlier iterations of the Markov chain. This way, statistical dependencies between coordinates can be broken in a controlled way, allowing the chain to mix faster. This scheme can be further modified in a dropout manner, leading to even more scalability. We test our scheme for ResNet-20 on CIFAR-10, SVHN, and FMNIST. In all cases, we find improvements in convergence speed and/or final accuracy compared to SGMCMC and parametric VI.
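For intuition, the update described above can be sketched on a toy problem. The following is a minimal, hypothetical illustration (not the authors' implementation) of a structured Langevin step on a correlated 2-D Gaussian target: the gradient for each coordinate is averaged over earlier chain samples of the other coordinate, so the coordinates are decoupled in the update. The function and parameter names (grad_energy, structured_sgld, step_size, K) are assumptions made for this sketch.

    # Minimal sketch, not the authors' code: a toy "structured" Langevin sampler
    # on a correlated 2-D Gaussian. For each coordinate, the gradient of the
    # energy is averaged over earlier chain samples of the other coordinate,
    # loosely in the spirit of the "self-averaged posterior energy" described above.
    import numpy as np

    rng = np.random.default_rng(0)
    PREC = np.array([[2.0, 1.5], [1.5, 2.0]])  # precision matrix of the toy target


    def grad_energy(theta):
        """Gradient of the negative log-density U(theta) = 0.5 * theta' PREC theta."""
        return PREC @ theta


    def structured_sgld(n_steps=5000, step_size=1e-2, K=50):
        theta = rng.normal(size=2)
        history = [theta.copy()]   # earlier samples used for the self-averaging
        samples = []
        for _ in range(n_steps):
            past = history[-K:]    # last K states of the chain
            for i in range(2):     # update each coordinate in turn
                # Average the i-th gradient component over past values of the
                # *other* coordinate; this breaks the dependence between the two.
                g = 0.0
                for p in past:
                    mixed = p.copy()
                    mixed[i] = theta[i]
                    g += grad_energy(mixed)[i]
                g /= len(past)
                # Langevin step: drift down the averaged gradient plus injected noise.
                theta[i] += -step_size * g + np.sqrt(2.0 * step_size) * rng.normal()
            history.append(theta.copy())
            samples.append(theta.copy())
        return np.array(samples)


    if __name__ == "__main__":
        draws = structured_sgld()
        print("estimated marginal std devs:", draws[1000:].std(axis=0))

Because each coordinate only ever sees an average over past values of the other, the chain targets a factorized approximation of the toy posterior rather than the exact joint; this mirrors the controlled trade-off between dependence structure and mixing speed described in the abstract.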

