☆ 4.7 Article

An adaptive DNN inference acceleration framework with end-edge-cloud collaborative computing

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2023)

Journal

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE

Volume 140, Issue -, Pages 422-435

Publisher

ELSEVIER

DOI: 10.1016/j.future.2022.10.033

Keywords

Deep neural networks; DNN inference acceleration; End-edge-cloud collaboration; DNN computation partitioning; Latency prediction model

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

This paper proposes an adaptive DNN inference acceleration framework that utilizes end-edge-cloud collaborative computing to accelerate inference latency. The framework includes a latency prediction model and a computation partitioning algorithm, and experimental results show significant improvements in prediction accuracy and inference latency reduction.

Deep Neural Networks (DNNs) based on intelligent applications have been intensively deployed on mobile devices. Unfortunately, resource-constrained mobile devices cannot meet stringent latency requirements due to a large amount of computation required by these intelligent applications. Both exiting cloud-assisted DNN inference approaches and edge-assisted DNN inference approaches can reduce end-to-end inference latency through offloading DNN computations to the cloud server or edge servers, but they suffer from unpredictable communication latency caused by long wide-area massive data transmission or performance degeneration caused by the limited computation resources. In this paper, we propose an adaptive DNN inference acceleration framework, which accelerates DNN inference by fully utilizing the end-edge-cloud collaborative computing. First, a latency prediction model is built to estimate the layer-wise execution latency of a DNN on different heterogeneous computing platforms, which use neural networks to learn non-linear features related to inference latency. Second, a computation partitioning algorithm is designed to identify two optimal partitioning points, which adaptively divide DNN computations into end devices, edge servers, and the cloud server for minimizing DNN inference latency. Finally, we conduct extensive experiments on three widely -adopted DNNs, and the experimental results show that our latency prediction models can improve the prediction accuracy by about 72.31% on average compared with four baseline approaches, and our computation partitioning approach can reduce the end-to-end latency by about 20.81% on average against six baseline approaches under three wireless networks. (c) 2022 Elsevier B.V. All rights reserved.

An adaptive DNN inference acceleration framework with end-edge-cloud collaborative computing

Journal

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE

Publisher

ELSEVIER

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

An adaptive DNN inference acceleration framework with end-edge-cloud collaborative computing

Journal

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE

Publisher

ELSEVIER

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper