4.3 Article

Deep Learning Inference Parallelization on Heterogeneous Processors With TensorRT

Journal

IEEE Embedded Systems Letters
Volume 14, Issue 1, Pages 15-18

Publisher

IEEE (Institute of Electrical and Electronics Engineers), Inc.
DOI: 10.1109/LES.2021.3087707

Keywords

Graphics processing units; Pipeline processing; Throughput; Optimization; Deep learning (DL); Engines; Space exploration; Acceleration

Funding

  1. National Research Foundation of Korea (NRF) - Korea Government (MSIT) [NRF-2019R1A2B5B02069406]

Abstract

As deep learning (DL) inference applications proliferate, embedded devices increasingly include neural processing units (NPUs) in addition to a CPU and a GPU. For fast and efficient development of DL applications, NVIDIA provides TensorRT as the software development kit for its hardware platforms; it includes an optimizer and a runtime that deliver low latency and high throughput for DL inference. Like most DL frameworks, TensorRT assumes that inference runs on a single processing element, either the GPU or the NPU, not both. In this letter, we propose a parallelization methodology that maximizes the throughput of a single DL application by using both the GPU and the NPU, exploiting various types of parallelism on top of TensorRT. On six real-life benchmarks, we achieve an 81%-391% throughput improvement over the baseline inference that uses the GPU only.
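
For illustration only, the sketch below shows the kind of GPU/NPU partitioning that TensorRT exposes on NVIDIA Jetson-class devices, where the NPU role is typically played by the Deep Learning Accelerator (DLA): one engine is built for the GPU and one for the DLA, and an inference request is issued to each through its own CUDA stream so the two can run concurrently. It assumes the TensorRT 8.x Python bindings, PyCUDA, a network with static shapes and a single input and output, and a hypothetical ONNX file "model.onnx"; it is not the authors' parallelization methodology, only a minimal starting point.

# A minimal sketch, not the authors' method: build one TensorRT engine for the
# GPU and one for the DLA (the "NPU" on NVIDIA Jetson-class devices), then issue
# one inference request to each through its own CUDA stream so the two can run
# concurrently.  Assumptions: TensorRT 8.x Python bindings, PyCUDA, a network
# with static shapes and a single input/output, and a hypothetical ONNX file
# "model.onnx".
import numpy as np
import pycuda.autoinit          # creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)


def build_engine(onnx_path, use_dla=False):
    """Parse an ONNX model and build an engine for the GPU or the DLA."""
    builder = trt.Builder(LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(str(parser.get_error(0)))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)              # DLA requires FP16 or INT8
    if use_dla:
        config.default_device_type = trt.DeviceType.DLA
        config.DLA_core = 0
        config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # DLA-unsupported layers fall back to the GPU
    serialized = builder.build_serialized_network(network, config)
    return trt.Runtime(LOGGER).deserialize_cuda_engine(serialized)


def make_buffers(engine):
    """Allocate pinned host and device buffers for every binding."""
    host_bufs, dev_bufs, bindings = [], [], []
    for i in range(engine.num_bindings):
        size = trt.volume(engine.get_binding_shape(i))
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host = cuda.pagelocked_empty(size, dtype)
        dev = cuda.mem_alloc(host.nbytes)
        host_bufs.append(host)
        dev_bufs.append(dev)
        bindings.append(int(dev))
    return host_bufs, dev_bufs, bindings


# One engine, execution context, and stream per processing element.
engines = [build_engine("model.onnx", use_dla=False),   # GPU
           build_engine("model.onnx", use_dla=True)]    # DLA ("NPU")

inflight = []
for engine in engines:
    stream = cuda.Stream()
    context = engine.create_execution_context()
    host_bufs, dev_bufs, bindings = make_buffers(engine)
    host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype)
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_bufs[-1], dev_bufs[-1], stream)
    inflight.append((stream, context, host_bufs, dev_bufs))

# Both requests are now in flight on different devices; synchronizing only after
# both enqueues keeps the GPU and DLA executions free to overlap.
for stream, _, host_bufs, _ in inflight:
    stream.synchronize()
    print("first outputs:", host_bufs[-1][:4])

Keeping several requests outstanding per device and overlapping transfers with execution is what ultimately determines throughput; the letter's methodology goes further by exploiting multiple types of parallelism across both processing elements for a single application, which this sketch does not attempt.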
