Article

Core Placement Optimization for Multi-chip Many-core Neural Network Systems with Reinforcement Learning

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3418498

Keywords

Multi-chip many-core architecture; neural network accelerator; core placement optimization; machine learning for system

Funding

  1. NSF [1725447, 1719160, 1730309]

The study proposes a reinforcement-learning-based method to optimize core placement in multi-chip many-core systems, improving system performance and efficiency. Experimental results demonstrate significant improvements in throughput and latency compared to traditional methods.
Multi-chip many-core neural network systems provide high parallelism through decentralized execution, and they can be scaled to very large systems at reasonable fabrication cost. As these systems scale up, communication latency accounts for a growing share of overall system performance. Previous work focuses mainly on core placement within a single chip, leaving two principal issues unresolved: the communication problems caused by the non-uniform, hierarchical on-chip/off-chip communication capability of multi-chip systems, and the scalability of heuristic-based approaches in a factorially growing search space. To this end, we propose a reinforcement-learning-based method that automatically optimizes core placement using deep deterministic policy gradient. The method gathers information about the environment by performing a series of trials (i.e., placements) and uses convolutional neural networks to extract the spatial features of different placements. Experimental results indicate that, compared with a naive sequential placement, the proposed method achieves a 1.99x increase in throughput and a 50.5% reduction in latency; compared with simulated annealing, an effective technique for approximating the global optimum in an extremely large search space, it improves throughput by 1.22x and reduces latency by 18.6%. We further demonstrate that the proposed method can find optimal placements that take advantage of the different communication properties arising from different system configurations, and that it works in a topology-agnostic manner.
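
To make the placement problem concrete, the following minimal Python sketch models the two baselines named in the abstract (naive sequential placement and simulated annealing) on a toy system. It does not reproduce the paper's reinforcement-learning method. The cost model, the grid sizes, `OFF_CHIP_PENALTY`, the traffic pattern, and all function names are illustrative assumptions, not details from the paper.

```python
import math
import random

# All constants below are assumptions chosen for illustration only.
CHIP_W, CHIP_H = 2, 2      # cores per chip along x and y
CHIPS_X, CHIPS_Y = 2, 1    # chips in the system along x and y
OFF_CHIP_PENALTY = 10      # extra cost per chip boundary crossed


def slot_xy(slot):
    """Map a physical slot index to global (x, y) core coordinates, row-major."""
    width = CHIP_W * CHIPS_X
    return slot % width, slot // width


def hop_cost(a, b):
    """Manhattan hops between two slots plus a penalty per chip boundary crossed."""
    (ax, ay), (bx, by) = slot_xy(a), slot_xy(b)
    hops = abs(ax - bx) + abs(ay - by)
    crossings = abs(ax // CHIP_W - bx // CHIP_W) + abs(ay // CHIP_H - by // CHIP_H)
    return hops + OFF_CHIP_PENALTY * crossings


def placement_cost(placement, traffic):
    """Total traffic-weighted communication cost; placement[i] is the slot of logical core i."""
    return sum(vol * hop_cost(placement[s], placement[d])
               for (s, d), vol in traffic.items())


def simulated_annealing(traffic, n, steps=2000, t0=5.0, seed=0):
    """Swap-based annealing that starts from the sequential placement."""
    rng = random.Random(seed)
    current = list(range(n))
    cur_cost = placement_cost(current, traffic)
    best, best_cost = current[:], cur_cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9   # linear cooling schedule
        i, j = rng.sample(range(n), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = placement_cost(current, traffic)
        # Metropolis rule: always accept improvements, sometimes accept regressions.
        if new_cost <= cur_cost or rng.random() < math.exp((cur_cost - new_cost) / temp):
            cur_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = current[:], new_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return best, best_cost


# Hypothetical workload: a chain of 8 logical cores plus one heavy long-range link.
traffic = {(i, i + 1): 1 for i in range(7)}
traffic[(0, 7)] = 4

sequential = list(range(8))
seq_cost = placement_cost(sequential, traffic)
best, best_cost = simulated_annealing(traffic, n=8)
print("sequential cost:", seq_cost)
print("annealed cost:  ", best_cost)
```

Because the annealer starts from the sequential placement and keeps the best placement it has seen, its result is never worse than the sequential baseline under this cost model. The number of possible placements is n!, which is why exhaustive search stops being feasible beyond a few cores.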
