Journal
IEEE TRANSACTIONS ON COMPUTERS
Volume 72, Issue 5, Pages 1314-1328
Publisher: IEEE COMPUTER SOC
DOI: 10.1109/TC.2022.3214113
Keywords
Computer architecture; Field programmable gate arrays; Dynamic scheduling; Optimization; Hardware; Bandwidth; Parallel processing; Multi-tenancy; deep neural network; multi-core; accelerator; FPGA
This paper proposes H3M, a framework that jointly optimizes the architecture, scheduling, and mapping for serving DNN inference-as-a-service (INFaaS) on cloud FPGAs. On the ASIC platform, H3M outperforms prior multi-tenant accelerators in Energy-Delay-Product (EDP) reduction; on the Xilinx U200 and U280 FPGA platforms, it reduces EDP by 2.1-5.7x and 1.8-9.0x, respectively, compared to Herald.
Deep Neural Network (DNN) INFerence-as-a-Service (INFaaS) is the dominating workload in current data centers, for which FPGAs become promising hardware platforms because of their high flexibility and energy efficiency. The dynamic and multi-tenancy nature of INFaaS requires careful design in three aspects: multi-tenant architecture, multi-DNN scheduling, and multi-core mapping. These three factors are critical to the system latency and energy efficiency but are also challenging to optimize since they are tightly coupled and correlated. This paper proposes H3M, an automatic Design Space Exploration (DSE) framework to jointly optimize the architecture, scheduling, and mapping for serving INFaaS on cloud FPGAs. H3M explores: (1) the architecture design space with Heterogeneous spatial Multi-tenant sub-accelerators, (2) layer-wise scheduling for Heterogeneous Multi-DNN workloads, and (3) single-layer mapping to the Homogeneous Multi-core architecture. H3M beats state-of-the-art multi-tenant DNN accelerators, Planaria and Herald, by up to 7.5x and 3.6x in Energy-Delay-Product (EDP) reduction on the ASIC platform. On the Xilinx U200 and U280 FPGA platforms, H3M offers 2.1-5.7x and 1.8-9.0x EDP reduction over Herald.
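As a quick illustration of the Energy-Delay-Product metric the abstract reports (this sketch and its numbers are not from the paper; the design points are hypothetical), EDP multiplies energy by latency so that a design must do well on both to score well, and an "N-fold EDP reduction" is the ratio of baseline EDP to optimized EDP:

```python
# Illustrative sketch of the Energy-Delay-Product (EDP) metric used to
# compare accelerator design points. Lower EDP is better. All numbers
# below are made up for demonstration; they are not from the paper.

def edp(energy_joules: float, delay_seconds: float) -> float:
    """EDP = energy * latency; jointly penalizes energy and delay."""
    return energy_joules * delay_seconds

# Hypothetical baseline vs. optimized design point from a DSE search.
baseline = edp(energy_joules=2.0, delay_seconds=0.010)
optimized = edp(energy_joules=0.8, delay_seconds=0.005)

# "X-fold EDP reduction" means baseline EDP divided by optimized EDP.
reduction = baseline / optimized
print(f"EDP reduction: {reduction:.1f}x")
```

A design-space explorer like H3M searches over architecture, scheduling, and mapping choices to minimize this product across the multi-DNN workload, rather than minimizing latency or energy in isolation.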