4.8 Article

Logically synthesized and hardware-accelerated restricted Boltzmann machines for combinatorial optimization and integer factorization

Journal

NATURE ELECTRONICS
Volume 5, Issue 2, Pages 92-101

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41928-022-00714-0

Funding

  1. ASCENT, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program - DARPA

Researchers demonstrate that multiple small computational modules can be combined to create field-programmable gate-array-based RBMs capable of solving more complex problems than their individually trained parts. Their approach combines developments in training, model quantization, and efficient hardware implementation for inference.
Multiple, small computational modules can be combined to create field-programmable gate-array-based stochastic neural network accelerators that are able to solve more complex problems than their individually trained parts. The restricted Boltzmann machine (RBM) is a stochastic neural network capable of solving a variety of difficult tasks including non-deterministic polynomial-time hard combinatorial optimization problems and integer factorization. The RBM is ideal for hardware acceleration as its architecture is compact (requiring few weights and biases) and its simple parallelizable sampling algorithm can find the ground states of difficult problems. However, training the RBM on these problems is challenging as the training algorithm tends to fail for large problem sizes and it can be hard to find efficient mappings. Here we show that multiple, small computational modules can be combined to create field-programmable gate-array-based RBMs capable of solving more complex problems than their individually trained parts. Our approach offers a combination of developments in training, model quantization and efficient hardware implementation for inference. With our implementation, we demonstrate hardware-accelerated factorization of 16-bit numbers with high accuracy and with a speed improvement of 10,000 times over a central processing unit implementation and 1,000 times over a graphics processing unit implementation, as well as a power improvement of 30 and 7 times, respectively.
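
The sampling step the abstract refers to can be illustrated with a minimal sketch of block Gibbs sampling on a toy RBM. Everything here is an assumption made for illustration: the function names, the pure-Python implementation, and any weights passed in are hypothetical and are not taken from the paper, whose trained models, quantization scheme, and FPGA design are not reproduced. The sketch uses the standard RBM energy E(v, h) = -a·v - b·h - vᵀWh and simply remembers the lowest-energy state visited.

```python
import math
import random


def sigmoid(x):
    """Logistic function; adequate for the small toy weights assumed here."""
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, W, a, b):
    """Standard RBM energy E(v, h) = -a.v - b.h - v^T W h for binary v, h."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    coupling = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -(visible_term + hidden_term + coupling)


def sample_hidden(v, W, b, rng):
    """Sample every hidden unit independently given the visible layer."""
    probs = [
        sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b))
    ]
    return [1 if rng.random() < p else 0 for p in probs]


def sample_visible(h, W, a, rng):
    """Sample every visible unit independently given the hidden layer."""
    probs = [
        sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
        for i in range(len(a))
    ]
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs_search(W, a, b, steps, seed=0):
    """Alternate the two layer updates and keep the lowest-energy state seen.

    Returns (best_energy, best_visible, best_hidden). This is a heuristic
    search: it is not guaranteed to reach the true ground state.
    """
    rng = random.Random(seed)
    v = [rng.randint(0, 1) for _ in a]
    h = sample_hidden(v, W, b, rng)
    best = (energy(v, h, W, a, b), v, h)
    for _ in range(steps):
        v = sample_visible(h, W, a, rng)
        h = sample_hidden(v, W, b, rng)
        e = energy(v, h, W, a, b)
        if e < best[0]:
            best = (e, v, h)
    return best
```

Within each layer, the units are conditionally independent given the other layer, so every unit in a layer can be updated at the same time. That independence is the property that makes this kind of sampler a natural fit for parallel hardware; the loops above only serialize it for readability.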
