Article

Low-Latency In Situ Image Analytics With FPGA-Based Quantized Convolutional Neural Network

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2020.3046452

Keywords

Computer architecture; Microprocessors; Neural networks; Hardware; Field programmable gate arrays; Throughput; Real-time systems; Cell image classification; convolutional neural network (CNN); field-programmable gate array (FPGA); hardware architecture; low-latency inference; multiplexed asymmetric-detection time-stretch optical microscopy (multi-ATOM); quantized convolutional neural network (QCNN); reconfigurable computing

Funding

  1. Croucher Foundation Croucher Innovation Award 2013
  2. Innovation and Technology Commission [ITS/204/18]
  3. Research Grants Council of Hong Kong [CRF C7047-16G, CRF 17307919, GRF 17208918, GRF 17209017, GRF 17245716]
  4. University of Hong Kong Platform Technology Fund
  5. National Natural Science Foundation of China (NSFC) under Scheme Excellent Young Scientists Fund (Hong Kong and Macau) [21922816]

Abstract

Real-time in situ image analytics impose stringent latency requirements on intelligent neural network inference operations. While conventional software-based implementations on graphics processing unit (GPU)-accelerated platforms are flexible and achieve very high inference throughput, they are not suitable for latency-sensitive applications in which real-time feedback is needed. Here, we demonstrate that high-performance reconfigurable computing platforms based on field-programmable gate array (FPGA) processing can successfully bridge the gap between low-level hardware processing and high-level intelligent image analytics algorithm deployment within a unified system. The proposed design performs inference on a stream of individual images as they are produced and employs a deeply pipelined hardware architecture that allows all layers of a quantized convolutional neural network (QCNN) to compute concurrently on partial image inputs. Using label-free classification of human peripheral blood mononuclear cell (PBMC) subtypes as a proof-of-concept illustration, our system achieves an ultralow classification latency of 34.2 µs with over 95% end-to-end accuracy using a QCNN, while the cells are imaged at a throughput exceeding 29,200 cells/s. Our QCNN design is modular and readily adaptable to other QCNNs with different latency and resource requirements.
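To illustrate the streaming, layer-pipelined style of computation described in the abstract, the following is a minimal C++ sketch (not the authors' implementation; image size, kernel, and quantization scale are all assumptions). It shows a quantized 3x3 convolution that consumes pixels row by row through a small line buffer and emits output rows as soon as enough rows have arrived, which is the property that lets downstream QCNN layers start working on partial image inputs instead of waiting for a full frame.

```cpp
// Hedged sketch of a streaming, quantized 3x3 convolution (illustrative only).
#include <array>
#include <cstdint>
#include <iostream>
#include <vector>

constexpr int WIDTH  = 8;   // assumed image width
constexpr int HEIGHT = 8;   // assumed image height
constexpr int K      = 3;   // 3x3 kernel

// Quantized int8 weights and a per-layer scale, as in typical QCNN schemes.
const std::array<int8_t, K * K> kWeights = {1, 0, -1, 2, 0, -2, 1, 0, -1};
constexpr float kScale = 0.05f;

int main() {
    // Pixels arrive row by row, as from a streaming image source.
    std::vector<uint8_t> pixels(WIDTH * HEIGHT);
    for (int i = 0; i < WIDTH * HEIGHT; ++i)
        pixels[i] = static_cast<uint8_t>(i % 256);

    // Line buffer holding only K rows: an output row is produced as soon as
    // K input rows are present, so later layers need not wait for the frame.
    std::array<std::array<uint8_t, WIDTH>, K> lines{};

    for (int row = 0; row < HEIGHT; ++row) {
        // Shift in the newest row.
        for (int r = 0; r < K - 1; ++r) lines[r] = lines[r + 1];
        for (int col = 0; col < WIDTH; ++col)
            lines[K - 1][col] = pixels[row * WIDTH + col];

        if (row < K - 1) continue;  // not enough rows buffered yet

        // Emit one output row using integer multiply-accumulates, then rescale.
        for (int col = 0; col <= WIDTH - K; ++col) {
            int32_t acc = 0;
            for (int kr = 0; kr < K; ++kr)
                for (int kc = 0; kc < K; ++kc)
                    acc += kWeights[kr * K + kc] *
                           static_cast<int32_t>(lines[kr][col + kc]);
            float activation = kScale * static_cast<float>(acc);
            std::cout << activation << (col == WIDTH - K ? '\n' : ' ');
        }
    }
    return 0;
}
```

On an FPGA, each layer of such a design would run as its own hardware stage connected by small FIFOs, so the whole network forms one deep pipeline; the sketch above only conveys the row-streaming idea in software.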

