Proceedings Paper

Accelerating Proximal Policy Optimization on CPU-FPGA Heterogeneous Platforms

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/FCCM48280.2020.00012

Keywords

-

Funding

  1. U.S. National Science Foundation [OAC-1911229]
  2. Intel Strategic Research Alliance

Abstract

Reinforcement Learning (RL) is a technique that enables an agent to learn optimal behavior by repeatedly interacting with an environment and receiving rewards. RL is widely used in domains such as robotics, game playing, and finance. Proximal Policy Optimization (PPO) is a state-of-the-art policy optimization algorithm that achieves superior overall performance on various RL benchmarks. PPO iteratively optimizes its policy, a function that chooses actions, with each iteration consisting of two computationally intensive phases: an inference phase, in which agents infer actions to interact with the environment and collect data, and a training phase, in which agents train the policy using the collected data. In this work, we develop the first high-throughput PPO accelerator on a CPU-FPGA heterogeneous platform, targeting both phases of the algorithm for acceleration. We implement a systolic-array-based architecture coupled with a novel memory-blocked data layout that enables streaming data access in both forward and backward propagation to achieve high throughput. Additionally, we develop a novel systolic-array compute-sharing technique to mitigate the potential load imbalance when training the two networks. We develop an accurate performance model of our design, based on which we perform design space exploration to obtain optimal design points. Our design is evaluated on widely used robotics benchmarks, achieving 2.1x-30.5x and 2x-27.5x improvements in throughput over state-of-the-art CPU and CPU-GPU implementations, respectively.
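For context, the policy update that PPO repeats each iteration maximizes a clipped surrogate objective. The sketch below is a minimal plain-Python/NumPy illustration of that objective, not the authors' FPGA design; the function name, the `clip_eps` default, and the example inputs are illustrative assumptions.

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective optimized by PPO each training iteration.

    ratio:     pi_new(a|s) / pi_old(a|s) for the sampled actions
    advantage: advantage estimates for those actions
    clip_eps:  clipping range (0.2 is a commonly used value)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the element-wise minimum bounds how far a single update
    # can move the policy away from the data-collecting (old) policy.
    return np.mean(np.minimum(unclipped, clipped))

# Illustrative example with two sampled transitions
ratio = np.array([1.3, 0.7])
advantage = np.array([2.0, -1.0])
print(ppo_clipped_objective(ratio, advantage))  # ~0.8
```

In each PPO iteration, the inference phase produces the trajectories from which `ratio` and `advantage` are computed, and the training phase repeatedly evaluates this objective and its gradients for the policy and value networks, which is the workload mapped onto the systolic arrays in this work.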
