Proceedings Paper

Accelerating Proximal Policy Optimization on CPU-FPGA Heterogeneous Platforms

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/FCCM48280.2020.00012

Keywords

-

Funding

  1. U.S. National Science Foundation [OAC-1911229]
  2. Intel Strategic Research Alliance

Abstract

Reinforcement Learning (RL) is a technique that enables an agent to learn optimal behavior by repeatedly interacting with an environment and receiving rewards. RL is widely used in domains such as robotics, game playing, and finance. Proximal Policy Optimization (PPO) is a state-of-the-art policy optimization algorithm that achieves superior overall performance on various RL benchmarks. PPO iteratively optimizes its policy, a function that chooses actions, with each iteration consisting of two computationally intensive phases: an inference phase, in which agents infer actions to interact with the environment and collect data, and a training phase, in which agents train the policy using the collected data. In this work, we develop the first high-throughput PPO accelerator on a CPU-FPGA heterogeneous platform, targeting both phases of the algorithm for acceleration. We implement a systolic-array-based architecture coupled with a novel memory-blocked data layout that enables streaming data access in both forward and backward propagation to achieve high throughput. Additionally, we develop a novel systolic-array compute-sharing technique to mitigate the potential load imbalance when training the two networks. We develop an accurate performance model of our design, based on which we perform design space exploration to obtain optimal design points. Our design is evaluated on widely used robotics benchmarks, achieving 2.1x-30.5x and 2x-27.5x improvements in throughput over state-of-the-art CPU and CPU-GPU implementations, respectively.
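For context, the policy update that PPO repeats each iteration maximizes a clipped surrogate objective. The sketch below is a minimal plain-Python/NumPy illustration of that objective, not the authors' FPGA design; the function name, the `clip_eps` default, and the example inputs are illustrative assumptions.

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective optimized by PPO each training iteration.

    ratio:     pi_new(a|s) / pi_old(a|s) for the sampled actions
    advantage: advantage estimates for those actions
    clip_eps:  clipping range (0.2 is a commonly used value)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the element-wise minimum bounds how far a single update
    # can move the policy away from the data-collecting (old) policy.
    return np.mean(np.minimum(unclipped, clipped))

# Illustrative example with two sampled transitions
ratio = np.array([1.3, 0.7])
advantage = np.array([2.0, -1.0])
print(ppo_clipped_objective(ratio, advantage))  # ~0.8
```

In each PPO iteration, the inference phase produces the trajectories from which `ratio` and `advantage` are computed, and the training phase repeatedly evaluates this objective and its gradients for the policy and value networks, which is the workload mapped onto the systolic arrays in this work.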
