Elsevier

Integration, the VLSI Journal

Volume 51, September 2015, Pages 21-36

Efficient multi-Gb/s multi-mode LDPC decoder architecture for IEEE 802.11ad applications

https://doi.org/10.1016/j.vlsi.2015.05.001

Highlights

  • The use of the one’s-complement number system instead of the conventional two’s-complement system is explored; one’s complement reduces the critical path and hardware complexity.

  • A dynamic column-shifting scheme for a pipelined multi-mode decoder is presented; it enables multi-mode operation with a minimal increase in decoder area.

  • A very-low-complexity local switch is presented to implement the proposed dynamic column-shifting scheme.

  • The proposed decoder occupies only 0.575 mm² of core area in 65-nm CMOS technology and achieves a throughput of 9.25 Gb/s at 400 MHz for all modes. In terms of throughput, hardware complexity, energy efficiency and area efficiency, the proposed multi-mode LDPC decoder outperforms previous works.

Abstract

This paper presents a novel multi-Gb/s multi-mode LDPC decoder architecture and efficient design techniques for gigabit wireless communications. An efficient dynamic and fixed column-shifting scheme is presented for multi-mode architectures, together with a novel low-complexity local switch that implements it. Furthermore, an efficient quantization method and the use of a one’s-complement scheme instead of a two’s-complement scheme are explored. The proposed decoder achieves very high throughput with minimal area overhead. Post-layout results in TSMC 65-nm CMOS technology show much higher throughput, as well as better area and energy efficiency, than other multi-mode LDPC decoders.
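This excerpt does not describe the one’s-complement datapath itself. A commonly cited reason for the saving, shown below as a sketch rather than as the authors’ circuit, is that min-sum check-node units operate on sign-magnitude values: recovering the magnitude of a negative two’s-complement number needs an increment (a carry chain in hardware), whereas for one’s complement an XOR of each bit with the sign bit suffices. The word width `W = 5` and all function names are hypothetical.

```python
W = 5                   # hypothetical LLR word width in bits
MASK = (1 << W) - 1     # W-bit all-ones mask
MAG_MASK = MASK >> 1    # the lower W-1 bits carry the magnitude


def to_twos(v: int) -> int:
    """Encode v as a W-bit two's-complement bit pattern."""
    if not -(1 << (W - 1)) <= v < (1 << (W - 1)):
        raise ValueError("value out of range for W bits")
    return v & MASK


def to_ones(v: int) -> int:
    """Encode v as a W-bit one's-complement bit pattern."""
    if not -(1 << (W - 1)) < v < (1 << (W - 1)):
        raise ValueError("value out of range for W bits")
    return v if v >= 0 else ~(-v) & MASK


def twos_to_sign_mag(x: int) -> tuple[int, int]:
    """Two's complement -> (sign, magnitude).

    Negative values need 'invert and add one', i.e. an incrementer whose
    carry may ripple through all W bits in hardware. The most negative
    value, -2**(W-1), has no (W-1)-bit magnitude and is usually avoided
    by saturation."""
    sign = (x >> (W - 1)) & 1
    mag = ((~x + 1) & MASK) if sign else x
    return sign, mag


def ones_to_sign_mag(x: int) -> tuple[int, int]:
    """One's complement -> (sign, magnitude).

    Each magnitude bit is XOR-ed with the sign bit; there is no carry
    chain. Both zero patterns (all 0s, all 1s) give magnitude 0."""
    sign = (x >> (W - 1)) & 1
    mag = (x ^ (MAG_MASK if sign else 0)) & MAG_MASK
    return sign, mag
```

For example, -3 is 11101 in 5-bit two’s complement but 11100 in one’s complement; the latter becomes magnitude 0011 by inverting the low four bits alone. The price is a symmetric range with two encodings of zero.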

Introduction

Low-density parity-check (LDPC) codes, first proposed by Gallager in 1962 [1], have attracted much attention because of their excellent error-correction performance, inherent parallelism and high-throughput potential. Consequently, they are widely used in communication standards such as second-generation digital video broadcasting (DVB-S2), IEEE 802.16e and IEEE 802.11n. In addition, the IEEE 802.11ad Working Group [2] for millimeter-wave (mmWave) gigabit wireless considers LDPC codes the preferred choice for forward error correction (FEC). Many recent studies have aimed to simplify the very large-scale integration (VLSI) implementation of LDPC decoders by structuring the codes themselves, yielding so-called “architecture-aware LDPC codes” [3] or “block-LDPC codes” [4]. Building on these approaches, quasi-cyclic LDPC (QC-LDPC) codes have received significant attention because they admit efficient hardware implementations while providing error-correction performance comparable to that of random LDPC codes.
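To make the quasi-cyclic structure concrete: a QC-LDPC parity-check matrix is usually specified by a small base matrix of shift values, each expanded into a z × z cyclically shifted identity matrix or an all-zero block. The sketch below assumes the common convention that -1 marks a zero block; the function name, the toy base matrix and z = 4 are hypothetical (IEEE 802.11ad itself uses z = 42 with 672-bit codewords).

```python
import numpy as np


def expand_base_matrix(base: np.ndarray, z: int) -> np.ndarray:
    """Expand a QC-LDPC base matrix into the full parity-check matrix H.

    Entry -1  -> z x z all-zero block.
    Entry s>=0 -> z x z identity with each row's 1 moved right by s
                  columns (cyclically), i.e. a circulant permutation.
    """
    rows, cols = base.shape
    h = np.zeros((rows * z, cols * z), dtype=np.uint8)
    identity = np.eye(z, dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            s = int(base[r, c])
            if s >= 0:
                h[r * z:(r + 1) * z, c * z:(c + 1) * z] = np.roll(identity, s, axis=1)
    return h


# Hypothetical 2 x 3 base matrix with z = 4 -> an 8 x 12 H.
H = expand_base_matrix(np.array([[0, 1, -1], [2, -1, 0]]), 4)
```

Because each non-zero block is a permutation, every row of a block row touches each of its block columns exactly once, which is what lets hardware treat a whole block row as z independent check nodes.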

The development of multi-gigabit data transmission techniques for 60 GHz-band wireless communication systems has created demand for high-throughput, low-power LDPC decoder architectures. Multi-gigabit standards such as IEEE 802.11ad [2] for gigabit wireless local area networks (WLANs) and IEEE 802.15.3c for wireless personal area networks (WPANs) have been proposed accordingly, and several multi-gigabit LDPC decoder architectures have been reported to meet the continuing demand for ever-higher data rates [5], [6], [7], [8], [9], [10], [11].

The main source of complexity and throughput limitation in a multi-mode decoder is the routing and sorting network, which varies with the code rate. Yen et al. [5] proposed an effective solution for reducing routing and sorting congestion in IEEE 802.15.3c decoders. Chen et al. [6] improved on this by exploiting the inherent block-cyclic structure of the 802.15.3c LDPC code to develop an efficient decoder with fixed, wired switch networks. Weiner et al. [7] proposed check-node granularity by placing pre- and post-routers, but presented no architecture for these routers; their decoder achieves only 3 Gb/s with 1.3 mm² of area. Park et al. [8] used an architecture similar to that of Weiner et al. [7] but improved performance by using a special embedded dynamic random access memory (eDRAM), achieving the required throughput of 9 Gb/s with 1.6 mm² of area. Li et al. [9] implemented a half-row pipelined decoder that achieves a very small area of 0.16 mm² but a low throughput of only 3.45 Gb/s in 65-nm CMOS technology. Schlafer et al. [10] designed a decoder for ultra-high-throughput applications that achieves 160 Gb/s for rate-13/16 with 4-bit quantization, at the cost of a 12 mm² area.
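For QC-LDPC codes, much of this routing reduces to cyclically rotating groups of z messages by the shift values of the base matrix. The sketch below shows a generic logarithmic rotator, a standard QC-LDPC building block and not the local switch proposed in this paper: any rotation by s < z is composed of fixed rotations by 2^k (mod z), one stage per bit of s, so ceil(log2 z) stages of 2:1 multiplexers suffice (six stages for z = 42). The function name is hypothetical.

```python
def rotate_staged(messages, shift):
    """Cyclically rotate a list of z messages left by `shift`.

    Stage k either passes its input through or applies a fixed rotation
    by 2**k (mod z), selected by bit k of `shift`. In hardware each stage
    is fixed wiring plus a column of 2:1 multiplexers; because rotations
    compose additively mod z, this works for any z, not only powers of 2.
    """
    z = len(messages)
    if not 0 <= shift < z:
        raise ValueError("shift must be in [0, z)")
    out = list(messages)
    stages = max(1, (z - 1).bit_length())  # bits needed to express shift
    for k in range(stages):
        if (shift >> k) & 1:
            step = (1 << k) % z
            out = out[step:] + out[:step]
    return out
```

Rotating `list("abcde")` by 3, for instance, applies the 1-step and 2-step stages and yields `['d', 'e', 'a', 'b', 'c']`.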

Our own previous work [11] presented an efficient multi-mode LDPC decoder for IEEE 802.11ad, which provided a throughput of 6 Gb/s with 1.1 mm² of area. Its major limitation, however, is throughput, since the IEEE 802.11ad standard calls for nearly 9 Gb/s; a more efficient architecture is therefore required. Furthermore, [11] described a generic architecture without a detailed explanation of the dynamic column-shifting scheme and the local switch network. Moreover, it did not include a low-complexity scheme such as the one’s-complement scheme proposed in this work, and it did not discuss decoding performance in terms of bit error rate (BER).

The 802.11ad LDPC decoders presented by Weiner et al. [7] and Park et al. [8] used a flooding schedule [12] with a five-stage pipeline to build a multi-mode decoder. These designs have three major drawbacks. First, a flooding schedule inherently converges more slowly than a layered schedule [13], which translates into a higher number of iterations. Second, a five-stage pipeline processing two frames requires many registers, which increases area and power consumption. Although the special eDRAM used in [8] partly addresses this problem, a more efficient architecture is still required. Finally, no dedicated low-complexity routing and sorting architecture has been described for multi-mode applications. The 802.11ad decoders presented by Li et al. [9] and Schlafer et al. [10] are single-rate (rate-13/16) LDPC decoders, designed for ultra-low-area and ultra-high-throughput applications, respectively; neither supports multi-mode operation.

In this paper, a novel, efficient, high-throughput multi-mode QC-LDPC decoder architecture for IEEE 802.11ad gigabit wireless communications is proposed. For rates 3/4 and 13/16, the proposed architecture processes a single block layer in one clock cycle; for rates 1/2 and 5/8, it processes two block rows concurrently in one clock cycle. A high-throughput pipelined architecture with a novel reduced-complexity local switch is described that achieves at least 9.25 Gb/s for all four code rates and more than 12 Gb/s for rate-13/16. The proposed local switch addresses the problem of a highly congested routing and sorting network. Moreover, a single-frame layered decoder architecture with a one-stage pipeline is designed to achieve a better trade-off between throughput and area. Finally, a novel one’s-complement technique and a quantization reduction technique are proposed to reduce the critical path and hardware complexity.
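Why processing two block rows per cycle matters can be seen from a simple first-order throughput model for a layered decoder that spends one cycle per layer: T = N · f_clk / (I · L + C), with N bits per codeword, clock frequency f_clk, I iterations, L cycles per iteration and C extra pipeline or I/O cycles. This model, the function name and the iteration count below are assumptions for illustration, not the paper’s throughput formula. The IEEE 802.11ad base matrices have 8, 6, 4 and 3 block rows for rates 1/2, 5/8, 3/4 and 13/16, so pairing the eight rate-1/2 block rows gives four cycles per iteration, the same as rate-3/4.

```python
def layered_throughput_bps(codeword_bits, clock_hz, iterations,
                           cycles_per_iteration, overhead_cycles=0):
    """First-order (hypothetical) throughput model of a layered decoder
    that spends one clock cycle per processed layer."""
    total_cycles = iterations * cycles_per_iteration + overhead_cycles
    if total_cycles <= 0:
        raise ValueError("total cycle count must be positive")
    return codeword_bits * clock_hz / total_cycles


# Hypothetical setting: 672-bit codewords, 400 MHz clock, 5 iterations.
one_row_per_cycle = layered_throughput_bps(672, 400e6, 5, 8)   # 8 block rows, one per cycle
two_rows_per_cycle = layered_throughput_bps(672, 400e6, 5, 4)  # rows paired: 4 cycles
```

Under this model, halving the cycles per iteration doubles throughput at a fixed iteration count, which is the motivation for concurrent processing of low-degree block rows.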

The rest of this paper is organized as follows. Section 2 describes the design techniques used in the proposed multi-mode, multi-block parallel-layered QC-LDPC decoder, including the novel one’s-complement technique and the quantization reduction technique. Section 3 presents the dynamic column-shifting scheme and the proposed LDPC decoder architecture. Section 4 presents the implementation and comparison results. Finally, Section 5 concludes the paper.

Section snippets

Design techniques

The layered decoding algorithm [13] can reduce the number of iterations by almost 50% compared with the belief-propagation (BP) decoding algorithm with a flooding schedule. It can therefore roughly double throughput without degrading error-correction performance. However, because of the data dependency between consecutive rows in layered decoding, multi-row parallel processing and pipelining cannot be applied directly. QC-LDPC codes, which are composed of cyclically shifted identity sub-matrices, allow a block-parallel layered decoding architecture: the rows within one block row never share a variable node, so they can all be processed in parallel as a single layer.
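The layered schedule can be sketched in a few lines. The sketch below uses normalized min-sum check-node updates with a hypothetical scaling factor α = 0.75; this excerpt does not state which check-node approximation the decoder uses, and the function name is hypothetical. The defining property of layered decoding is visible in the inner loop: posterior LLRs are updated after every layer and used immediately by the next one, which is why it converges in fewer iterations than a flooding schedule.

```python
import numpy as np


def layered_min_sum(h, llr, layers, iterations=10, alpha=0.75):
    """Layered normalized min-sum decoding (illustrative sketch).

    h: (M, N) 0/1 parity-check matrix, every row of degree >= 2.
    llr: length-N channel LLRs (positive favors bit 0).
    layers: list of lists of row indices, processed in order; rows in the
        same layer must not share a column, so hardware can run them in
        parallel (here they are simply looped over).
    Returns the hard-decision codeword estimate.
    """
    h = np.asarray(h)
    posterior = np.asarray(llr, dtype=float).copy()  # a-posteriori LLRs
    r = np.zeros(h.shape)                            # check-to-variable messages
    hard = (posterior < 0).astype(np.uint8)
    for _ in range(iterations):
        for layer in layers:
            for row in layer:
                cols = np.flatnonzero(h[row])
                q = posterior[cols] - r[row, cols]   # variable-to-check messages
                signs = np.where(q < 0, -1.0, 1.0)
                mags = np.abs(q)
                order = np.argsort(mags)
                min1, min2 = mags[order[0]], mags[order[1]]
                # Each edge receives the minimum over the *other* edges.
                new_mag = np.full(cols.size, min1)
                new_mag[order[0]] = min2
                # Product of all signs times own sign = product of the others.
                new_r = alpha * np.prod(signs) * signs * new_mag
                posterior[cols] = q + new_r  # immediate update: layered schedule
                r[row, cols] = new_r
        hard = (posterior < 0).astype(np.uint8)
        if not np.any(h.dot(hard) % 2):  # all parity checks satisfied
            break
    return hard


# (7,4) Hamming code; bit 3 of the all-zero codeword is received unreliably.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
decoded = layered_min_sum(H, [2, 2, 2, -0.5, 2, 2, 2], [[0], [1], [2]])
```

In the example, the three rows share bit 3, so each row forms its own layer; the negative LLR is corrected within the first iteration.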

Dynamic column-shifting scheme

The LDPC code of the IEEE 802.11ad standard supports four code rates. The number of check nodes (CNs) and the CN degree vary greatly with the code rate. For a fully parallel layered decoder, we developed a column-shifting scheme that provides fixed throughput across all rates with very little increase in complexity. Fig. 8 shows the proposed dynamic column-shifting scheme. It shows how group #1 of rate-5/8 (one layer with a CN degree > 8) and rate-1/2 (two layers where each consists of a CN
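The scheme itself is cut off in this excerpt, so the sketch below only illustrates the general precondition for processing two low-degree block rows in the same cycle; it is a hypothetical helper, not the authors’ switch. The two block rows must not touch the same block column, and their combined degree must fit the datapath width provisioned for the highest-degree layer. Block rows are given as lists of shift values with -1 marking a zero block, and `max_cn_inputs` is a hypothetical datapath width.

```python
def block_support(row):
    """Block columns that hold a non-zero (shifted identity) block."""
    return {col for col, shift in enumerate(row) if shift >= 0}


def can_share_cycle(row_a, row_b, max_cn_inputs):
    """True if two block rows touch disjoint block columns and their
    combined degree fits a datapath with max_cn_inputs inputs."""
    sa, sb = block_support(row_a), block_support(row_b)
    return sa.isdisjoint(sb) and len(sa) + len(sb) <= max_cn_inputs


def overlay_rows(row_a, row_b):
    """Combine two disjoint block rows into one row of shift values,
    i.e. how they would occupy the datapath in a shared cycle."""
    if not block_support(row_a).isdisjoint(block_support(row_b)):
        raise ValueError("block rows overlap")
    return [a if a >= 0 else b for a, b in zip(row_a, row_b)]
```

The overlay is only a scheduling view: the check nodes of each original block row still compute their own minima, so a datapath built this way would need a check-node unit that can be split into two independent halves when two rows share a cycle.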

Results and comparisons

The proposed LDPC decoder architecture was modeled in the Verilog hardware description language (HDL), and its functionality was verified by simulation using test patterns generated by a C simulator. After the functionality had been fully verified, the design was synthesized and laid out under appropriate timing and area constraints. Both simulation and synthesis were carried out using Synopsys design tools and TSMC 65-nm CMOS standard-cell technology. With the 65-nm low-voltage

Conclusions

This paper presents a novel, efficient, high-throughput multi-mode QC-LDPC decoder that supports all modes of the IEEE 802.11ad standard for gigabit wireless communications. The proposed decoder needs only 7.875 Kbits of memory. For rate-13/16, it achieves much higher throughput and area efficiency than other state-of-the-art architectures in the literature. Furthermore, a novel one’s-complement scheme, a reduced quantization technique, a dynamic

Acknowledgment

This research was supported by the MSIP, Korea, under the ITRC support program (NIPA-2014-H0301-14-1042) supervised by the NIPA, by the Basic Science Research Program through the NRF funded by the Ministry of Science, ICT and Future Planning (2013R1A2A2A01068628), and by an Inha University research grant.

Sabooh Ajaz received the Bachelor's degree in Electronic Engineering from NED University of Engineering and Technology, Karachi, in 2006 and the Master's degree from the University of Wollongong, Wollongong, Australia, in 2010. Since 2011, he has been pursuing the Ph.D. degree in Information & Communication Engineering at Inha University, Incheon, Korea. His research interests include VLSI and SoC architecture design for digital signal processing and communication systems.

References (17)

  • [1] R.G. Gallager, Low-Density Parity-Check Codes (1963).
  • [2] IEEE 802.11ad Wireless LAN: PHY/MAC Complete Proposal Specification, May...
  • [3] M. Mansour et al., High-throughput LDPC decoders, IEEE Trans. VLSI Syst. (2003).
  • [4] H. Zhong et al., Block-LDPC: a practical LDPC coding system design approach, IEEE Trans. Circuits Syst. (2005).
  • [5] S.W. Yen, A 5.79-Gb/s energy-efficient multirate LDPC codec chip for IEEE 802.15.3c applications, IEEE J. Solid-State Circuits (2012).
  • [6] Z. Chen et al., A macro-layer level fully parallel layered LDPC decoder SOC for IEEE 802.15.3c application, in:...
  • [7] M. Weiner et al., LDPC decoder architecture for high-data rate personal-area networks, in: IEEE International...
  • [8] Y. Park et al., A 1.6-mm² 38-mW 1.5-Gb/s LDPC decoder enabled by refresh-free embedded DRAM, in: IEEE Symposium on VLSI...

Hanho Lee received the Ph.D. and M.S. degrees, both in Electrical & Computer Engineering, from the University of Minnesota, Minneapolis, in 2000 and 1996, respectively. In 1999, he was a Member of Technical Staff-1 at Lucent Technologies, Bell Labs, Holmdel, New Jersey. From April 2000 to August 2002, he was a Member of Technical Staff at Lucent Technologies (Bell Labs Innovations), Allentown. From August 2002 to August 2004, he was an Assistant Professor in the Department of Electrical and Computer Engineering, University of Connecticut, USA. Since August 2004, he has been with the Department of Information and Communication Engineering, Inha University, where he is currently a Professor. He was a visiting researcher at the Electronics and Telecommunications Research Institute (ETRI), Korea, in 2005. From August 2010 to August 2011, he was a visiting scholar at Bell Labs, Alcatel-Lucent, Murray Hill, New Jersey, USA. He has authored more than 140 papers and holds more than 25 patents. He received the best paper award at the International SoC Design Conference (ISOCC) in 2006 and 2009, and the 2013 best paper award of the Institute of Electronics and Information Engineers (IEIE). He served as the Secretary General of IEEE ISCAS 2012, Special Session Chair or Technical Program Vice Chair of ISOCC 2012–2014, and Guest Editor of the Journal of Electrical and Computer Engineering. He has been a Technical Committee Member or Track Chair of SiPS 2013, APCCAS 2014, ISCIT 2014 and ICEIC 2014. He served as a Technical Committee Member of the IEEE Signal Processing Society, Design and Implementation of Signal Processing Systems (DISPS) (Jan. 2011 – Dec. 2013). He is a Senior Member of the IEEE and a member of the IEEE Circuits and Systems Society's VLSI Systems and Application (VSA) and Circuits and Systems for Communications (CASCOM) Technical Committees.
His research interests include VLSI architecture design for digital signal processing, forward error correction architectures, cryptographic systems, and communications.
