4.7 Article

A Resource-Efficient Pipelined Architecture for Real-Time Semi-Global Stereo Matching

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2021.3061704

Keywords

Stereo matching accelerator; semi-global matching; FPGA; real-time; stereo vision; pipelined architecture

Funding

  1. National Key Research and Development Program of China [2019YFB2204800]
  2. National Natural Science Foundation of China (NSFC) [61931008, 61874102, 61732020]
  3. Strategic Priority Research Program of Chinese Academy of Sciences [XDB44000000]
  4. Fundamental Research Funds for the Central Universities [WK2100000005]
  5. Beijing Municipal Science and Technology Program [Z181100008918013]

This paper presents a resource-efficient pipelined hardware architecture for implementing a high-accuracy, high-performance stereo matching algorithm on a resource-limited hardware platform. By combining image down-sampling and disparity skipping, the architecture processes 1280 x 960 images at 62.5 fps with 75 disparity levels on a Zynq-7 FPGA board at its maximum frequency of 216 MHz, and is reported to use far fewer resources than the latest reference work.
Implementing a high-accuracy, high-performance stereo matching algorithm on a resource-limited hardware platform remains a grand challenge in stereo vision systems. This paper proposes a resource-efficient pipelined hardware architecture with four-cycle time-sharing for the semi-global matching (SGM) algorithm with weighted path cost aggregation. To save hardware resources, the SGM algorithm is also combined with image down-sampling and disparity skipping. The architecture is synthesized and implemented on a Zynq-7 FPGA board, where it achieves a throughput of 1280 x 960 pixels at 62.5 fps with 75 disparity levels at the maximum frequency of 216 MHz. To improve the accuracy of the disparity map at close range, the architecture is also adapted to two-cycle time-sharing with the disparity range increased to 128; this variant processes 1280 x 960 pixels at 116 fps at 200 MHz on a VCU-118 FPGA board, for a throughput of 18245 MDE/s. With 128 disparity levels, the whole architecture takes only 50465 LUTs, 48046 registers, and 125.5 BRAMs, which is much more efficient than the latest reference work.
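
The throughput figure quoted in the abstract can be checked with a short calculation. The sketch below assumes the usual definition of MDE/s (million disparity evaluations per second) as width x height x frame rate x disparity levels, divided by 10^6; the abstract does not spell out the definition, and the function name `mde_per_second` is chosen here for illustration only.

```python
def mde_per_second(width, height, fps, disparities):
    """Million disparity evaluations per second (assumed definition:
    pixels per frame x frames per second x disparity levels / 1e6)."""
    return width * height * fps * disparities / 1e6


# Configurations taken from the abstract.
zynq7 = mde_per_second(1280, 960, 62.5, 75)    # four-cycle time-sharing, 216 MHz
vcu118 = mde_per_second(1280, 960, 116, 128)   # two-cycle time-sharing, 200 MHz

print(f"Zynq-7:  {zynq7:.1f} MDE/s")
print(f"VCU-118: {vcu118:.1f} MDE/s")
```

Under this assumed definition, the VCU-118 configuration comes to roughly 18245 MDE/s, matching the figure reported in the abstract, and the Zynq-7 configuration comes to 5760 MDE/s.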
