Implementation of a High Throughput 3GPP Turbo Decoder on GPU

Journal of Signal Processing Systems

Abstract

Turbo codes are computationally intensive channel codes that are widely used in current and upcoming wireless standards. The general-purpose graphics processing unit (GPGPU) is a programmable commodity processor that achieves high computational performance by using many simple cores. In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that takes advantage of the processing power of the GPU to offer high Turbo decoding throughput. Several techniques are used to improve the performance of the decoder. To fully utilize the computational resources of the GPU, our decoder can decode multiple codewords simultaneously, divide the workload for a single codeword across multiple cores, and pack multiple codewords to fit the single instruction multiple data (SIMD) instruction width. In addition, we use shared memory judiciously to support hundreds of concurrent threads while keeping frequently used data local so that memory accesses stay fast. To improve the efficiency of the decoder in the high SNR regime, we also present a low-complexity early termination scheme based on average extrinsic LLR statistics. Finally, we examine how different workload partitioning choices affect the error correction performance and the decoder throughput.
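
The early termination idea mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact statistic, threshold, or iteration schedule, so everything here is an assumption made for illustration: the stopping rule compares the mean absolute extrinsic LLR of a codeword against a fixed threshold, and the names `mean_abs_llr`, `decode_with_early_stop`, `run_iteration`, `max_iterations`, and `threshold` are hypothetical. The decoder iteration itself is passed in as a callable and is not modeled.

```python
from typing import Callable, List, Sequence, Tuple


def mean_abs_llr(extrinsic: Sequence[float]) -> float:
    """Average magnitude of the extrinsic LLRs of one (non-empty) codeword."""
    return sum(abs(x) for x in extrinsic) / len(extrinsic)


def decode_with_early_stop(
    run_iteration: Callable[[List[float]], List[float]],
    initial_extrinsic: List[float],
    max_iterations: int = 6,   # assumed iteration budget, not from the paper
    threshold: float = 10.0,   # assumed confidence threshold, not from the paper
) -> Tuple[List[float], int]:
    """Run decoder iterations until the average |LLR| reaches the threshold.

    run_iteration stands in for one full Turbo decoding iteration: it takes
    the current extrinsic LLRs and returns the updated ones.
    Returns the final LLRs and the number of iterations actually executed.
    """
    extrinsic = list(initial_extrinsic)
    for iteration in range(1, max_iterations + 1):
        extrinsic = run_iteration(extrinsic)
        # Large average magnitude suggests the decoder has converged,
        # so the remaining iterations can be skipped.
        if mean_abs_llr(extrinsic) >= threshold:
            return extrinsic, iteration
    return extrinsic, max_iterations
```

A check of this kind is cheap because it needs only one reduction over the extrinsic values per iteration. At high SNR the LLR magnitudes grow quickly, so most codewords would stop after a few iterations under such a rule.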

Acknowledgements

This work was supported in part by Renesas Mobile, Texas Instruments, Xilinx, and by the US National Science Foundation under grants CNS-0551692, CNS-0619767, EECS-0925942 and CNS-0923479.

Author information

Corresponding author

Correspondence to Michael Wu.

About this article

Cite this article

Wu, M., Sun, Y., Wang, G. et al. Implementation of a High Throughput 3GPP Turbo Decoder on GPU. J Sign Process Syst 65, 171–183 (2011). https://doi.org/10.1007/s11265-011-0617-7

  • DOI: https://doi.org/10.1007/s11265-011-0617-7
