Proceedings Paper

Dynamic Adaptive DNN Surgery for Inference Acceleration on the Edge

IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2019) (2019)

Journal

IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2019)

Pages 1423-1431

Publisher

IEEE

DOI: 10.1109/infocom.2019.8737614

Funding

Hong Kong Polytechnic University [PolyU G-YBQE]
Innovation Technology Fund (ITF)-UICP-MGJR [UIM/363]
University of Sydney DVC

Abstract

Recent advances in deep neural networks (DNNs) have substantially improved the accuracy and speed of a variety of intelligent applications. Nevertheless, one obstacle is that DNN inference imposes a heavy computation burden on end devices, while offloading inference tasks to the cloud requires transmitting a large volume of data. Motivated by the fact that the output of some intermediate DNN layers is significantly smaller than the raw input data, we design DNN surgery, which allows a partitioned DNN to be processed at both the edge and the cloud while limiting data transmission. The challenge is twofold: (1) network dynamics substantially influence the performance of DNN partitioning, and (2) state-of-the-art DNNs are characterized by a directed acyclic graph (DAG) rather than a chain, which greatly complicates partitioning. To address these issues, we design a Dynamic Adaptive DNN Surgery (DADS) scheme, which optimally partitions the DNN under different network conditions. Under light load, DNN Surgery Light (DSL) minimizes the overall delay to process one frame. This minimization problem is equivalent to a min-cut problem, so a globally optimal solution can be derived. Under heavy load, DNN Surgery Heavy (DSH) aims to maximize throughput. However, this problem is NP-hard, so DSH resorts to an approximation method that achieves an approximation ratio of 3. A real-world prototype based on a self-driving car video dataset is implemented, showing that, compared with executing the entire DNN on the edge or in the cloud, DADS reduces latency by up to 6.45 and 8.08 times, respectively, and improves throughput by up to 8.31 and 14.01 times, respectively.
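The abstract states that minimizing single-frame delay on a DAG-structured DNN reduces to a min-cut problem. The following is a minimal sketch of one way such a reduction can be built; it is not the authors' implementation. It assumes that per-layer edge compute times, cloud compute times, and upload times are known in advance; that the frame delay is the plain sum of these times (no pipelining); that the cost of returning results is ignored; and that data never flows from the cloud back to the edge. Each layer's output is uploaded at most once, which is modelled with a hypothetical auxiliary node per layer. The function names `min_cut` and `partition`, the terminal names, and all cost numbers are illustrative assumptions.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_cut(cap, s, t):
    """Edmonds-Karp max flow. Returns (cut value, nodes on the source side)."""
    res = defaultdict(lambda: defaultdict(float))  # residual capacities
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res[u][v] += c
            res[v][u] += 0.0  # make sure the reverse residual edge exists
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # no augmenting path left: nodes reachable from s form a min cut
            return flow, set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push


def partition(dag, edge_ms, cloud_ms, tx_ms, input_layer):
    """Split a DAG of layers between the edge device and the cloud.

    dag:      layer -> list of successor layers
    edge_ms:  layer -> compute time on the edge device
    cloud_ms: layer -> compute time in the cloud
    tx_ms:    layer -> time to upload that layer's output
    Returns (total delay, set of layers that run on the edge).
    """
    SRC, SNK = "<edge>", "<cloud>"  # hypothetical terminal names
    cap = defaultdict(dict)
    for v in edge_ms:
        cap[v][SNK] = edge_ms[v]   # cut iff v stays on the edge
        cap[SRC][v] = cloud_ms[v]  # cut iff v is offloaded to the cloud
    cap[SRC][input_layer] = INF    # the raw input originates on the edge
    for u, succs in dag.items():
        if not succs:
            continue
        out = ("out", u)           # hypothetical auxiliary node: u's output tensor
        cap[u][out] = tx_ms[u]     # upload is paid once, however many consumers
        cap[out][u] = INF          # this edge and v -> out below make any
        for v in succs:            # cloud -> edge assignment infinitely costly
            cap[out][v] = INF
            cap[v][out] = INF
    delay, src_side = min_cut(cap, SRC, SNK)
    return delay, {v for v in edge_ms if v in src_side}


# Usage on a small branching DAG with made-up timings (ms)
dag = {"input": ["conv1"], "conv1": ["branch_a", "branch_b"],
       "branch_a": ["concat"], "branch_b": ["concat"], "concat": []}
edge_ms = {"input": 0, "conv1": 30, "branch_a": 25, "branch_b": 25, "concat": 5}
cloud_ms = {"input": 0, "conv1": 3, "branch_a": 2, "branch_b": 2, "concat": 1}
tx_ms = {"input": 100, "conv1": 8, "branch_a": 6, "branch_b": 6, "concat": 1}
delay, on_edge = partition(dag, edge_ms, cloud_ms, tx_ms, "input")
print(delay, sorted(on_edge))  # 43.0 ['conv1', 'input']
```

With these illustrative numbers, uploading the raw input would cost 100 ms, so the cheapest cut keeps `input` and `conv1` on the edge and uploads `conv1`'s output once, even though two branches consume it. The auxiliary node is what prevents that shared output from being charged once per consumer, which is one reason a DAG is harder to partition than a chain.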
