Article

TLP-LDPC: Three-Level Parallel FPGA Architecture for Fast Prototyping of LDPC Decoder Using High-Level Synthesis

Journal

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
Volume 37, Issue 6, Pages 1290-1306

Publisher

SCIENCE PRESS
DOI: 10.1007/s11390-022-1499-9

Keywords

low-density parity-check (LDPC); high-level synthesis (HLS); field-programmable gate array (FPGA)

Funding

  1. National Key Research and Development Program of China [2018YFA0701800]
  2. National Natural Science Foundation of China [61821003, 62172175]
  3. Alibaba Group through Alibaba Innovative Research (AIR) Program

This paper proposes a three-level parallel architecture, TLP-LDPC, to achieve high throughput LDPC decoding on large FPGA platforms. By fully exploiting the characteristics of LDPC and underlying hardware, and eliminating potential data conflicts, this decoder provides high decoding performance.
Low-density parity-check (LDPC) codes, with their excellent error-correction capabilities, have been widely used in both data communication and storage to construct reliable cyber-physical systems that are resilient to real-world noise. Fast prototyping of field-programmable gate array (FPGA)-based decoders is essential to achieve high decoding performance while accelerating the development process. This paper proposes a three-level parallel architecture, TLP-LDPC, to achieve high throughput by fully exploiting the characteristics of both LDPC and the underlying hardware while effectively scaling to large FPGA platforms. The three-level parallel architecture contains a low-level decoding unit, a mid-level multi-unit decoding core, and a high-level multi-core decoder. The low-level decoding unit is a basic LDPC computation component that combines the features of the LDPC algorithm with the specific structures of the FPGA (e.g., look-up tables, LUTs) and eliminates potential data conflicts. The mid-level decoding core integrates the input/output and multiple decoding units in a well-balanced pipelined fashion. The high-level multi-core architecture makes full use of board-level resources to improve the overall throughput. We develop LDPC C++ code with dedicated pragmas and leverage high-level synthesis (HLS) tools to implement the TLP-LDPC architecture. Experimental results show that TLP-LDPC achieves 9.63 Gbps end-to-end decoding throughput on a Xilinx Alveo U50 platform, 3.9x higher than existing HLS-based FPGA implementations.
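
The unit, core, and decoder hierarchy described in the abstract can be sketched in HLS-style C++ as follows. This is only an illustrative sketch and not the authors' code: the abstract does not name the decoding kernel, so a min-sum check-node update is assumed here, and all names and sizes (`decoding_unit`, `decoding_core`, `decoder`, `kDegree`, `kUnits`, `kCores`) are hypothetical. The `#pragma HLS UNROLL` directives are hints for an HLS tool, and an ordinary C++ compiler ignores them.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical sizes, chosen only to keep the sketch small.
constexpr int kDegree = 4;  // messages entering one check node (assumed)
constexpr int kUnits  = 2;  // decoding units per core (assumed)
constexpr int kCores  = 2;  // cores per decoder (assumed)

using Llr       = std::int8_t;                 // quantized log-likelihood ratio
using CheckIn   = std::array<Llr, kDegree>;    // inputs of one check node
using CoreIn    = std::array<CheckIn, kUnits>; // work for one core
using DecoderIn = std::array<CoreIn, kCores>;  // work for the whole decoder

// Low level: one decoding unit. Assumed kernel: min-sum check-node update.
// Each output takes the sign product and minimum magnitude of the OTHER inputs.
CheckIn decoding_unit(const CheckIn& in) {
    int min1 = 127, min2 = 127, min_idx = -1;  // two smallest magnitudes
    bool sign_neg = false;                     // parity of negative inputs
    for (int i = 0; i < kDegree; ++i) {
#pragma HLS UNROLL
        const int mag = std::abs(static_cast<int>(in[i]));
        sign_neg ^= (in[i] < 0);
        if (mag < min1) { min2 = min1; min1 = mag; min_idx = i; }
        else if (mag < min2) { min2 = mag; }
    }
    CheckIn out{};
    for (int i = 0; i < kDegree; ++i) {
#pragma HLS UNROLL
        const int mag = (i == min_idx) ? min2 : min1;  // exclude own input
        const bool neg = sign_neg ^ (in[i] < 0);       // exclude own sign
        out[i] = static_cast<Llr>(neg ? -mag : mag);
    }
    return out;
}

// Mid level: one core runs several decoding units side by side.
CoreIn decoding_core(const CoreIn& in) {
    CoreIn out{};
    for (int u = 0; u < kUnits; ++u) {
#pragma HLS UNROLL
        out[u] = decoding_unit(in[u]);
    }
    return out;
}

// High level: the decoder replicates cores to fill the board.
DecoderIn decoder(const DecoderIn& in) {
    DecoderIn out{};
    for (int c = 0; c < kCores; ++c) {
#pragma HLS UNROLL
        out[c] = decoding_core(in[c]);
    }
    return out;
}
```

Under these assumptions, calling `decoding_unit` on the inputs `{3, -2, 5, -7}` yields `{2, -3, 2, -2}`. The sketch shows only the nesting of the three levels; the input/output pipelining and the data-conflict elimination mentioned in the abstract are not modeled.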
