Article

Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers

Journal

Parallel Computing
Volume 113

Publisher

ELSEVIER
DOI: 10.1016/j.parco.2022.102952

Keywords

Spiking neural networks; Large-scale simulation; Cache performance; Distributed computing; Parallel computing; Memory access bottleneck

Funding

  1. European Union's Horizon 2020 (H2020) [785907, 945539, 754304]
  2. Helmholtz Association Initiative and Networking Fund [SO-092]
  3. Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) [368482240/GRK2416, 49111148]
  4. VSR computation time grant [JINB33]
  5. Exploratory Challenge on Post-K Computer project (Understanding the neural mechanisms of thoughts and its applications to AI) of the Ministry of Education, Culture, Sports, Science and Technology (MEXT)


Simulation is an important approach for studying complex dynamic systems, especially biological neural networks. This study demonstrates that common cache-optimization techniques, such as software-induced prefetching and software pipelining, can substantially improve simulation efficiency.
Simulation is a third pillar next to experiment and theory in the study of complex dynamic systems such as biological neural networks. Contemporary brain-scale networks correspond to directed random graphs of a few million nodes, each with an in-degree and out-degree of several thousand edges, where nodes and edges correspond to the fundamental biological units, neurons and synapses, respectively. The activity in neuronal networks is also sparse: each neuron occasionally transmits a brief signal, called a spike, via its outgoing synapses to the corresponding target neurons. In distributed computing, these targets are scattered across thousands of parallel processes. This spatial and temporal sparsity represents an inherent bottleneck for simulations on conventional computers: irregular memory-access patterns cause poor cache utilization. Using an established neuronal network simulation code as a reference implementation, we investigate how common techniques to recover cache performance, such as software-induced prefetching and software pipelining, can benefit a real-world application. The algorithmic changes reduce simulation time by up to 50%. The study exemplifies that many-core systems assigned an intrinsically parallel computational problem can alleviate the von Neumann bottleneck of conventional computer architectures.
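To make the idea of software-induced prefetching concrete, here is a minimal sketch, assuming a hypothetical flat array of outgoing synapses per spike and one input-buffer slot per target neuron. It is not the paper's or the reference simulator's actual code: `Synapse`, `deliver_naive`, `deliver_prefetch`, and the default `lookahead` of 8 are illustrative names and values. It relies on the GCC/Clang builtin `__builtin_prefetch`, which is compiler-specific. While delivering to synapse i, the loop already requests the buffer entry needed by synapse i + lookahead, a simple two-stage software pipeline that overlaps memory latency with useful work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical synapse layout (illustrative only): index of the target
// neuron's input-buffer slot and the synaptic weight.
struct Synapse {
    std::size_t target;
    double weight;
};

// Baseline delivery of one spike to all of its targets. Consecutive
// synapses point to scattered entries of input_buffer, so these
// read-modify-write accesses frequently miss the cache.
void deliver_naive(const std::vector<Synapse>& synapses,
                   std::vector<double>& input_buffer) {
    for (const Synapse& s : synapses) {
        assert(s.target < input_buffer.size());  // debug-only bounds check
        input_buffer[s.target] += s.weight;
    }
}

// Same arithmetic in the same order, but the entry needed `lookahead`
// iterations later is requested in advance: stage 1 prefetches ahead,
// stage 2 delivers the current synapse.
void deliver_prefetch(const std::vector<Synapse>& synapses,
                      std::vector<double>& input_buffer,
                      std::size_t lookahead = 8) {
    const std::size_t n = synapses.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + lookahead < n) {
            // GCC/Clang builtin, only a hint to the hardware.
            // Arguments: address, 1 = will be written, 1 = low temporal locality.
            __builtin_prefetch(&input_buffer[synapses[i + lookahead].target], 1, 1);
        }
        assert(synapses[i].target < input_buffer.size());  // debug-only bounds check
        input_buffer[synapses[i].target] += synapses[i].weight;
    }
}
```

Both functions produce identical buffer contents because the additions happen in the same order; only the memory-access timing differs. The lookahead distance is a tuning parameter: if it is too short, the prefetched line has not arrived yet, and if it is too long, the line may be evicted before it is used.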
