Exploring the parallel capabilities of GPU: Berlekamp-Massey algorithm case study

Cluster Computing

Abstract

Graphics Processing Unit (GPU) architectures are becoming increasingly programmable, offering the potential for dramatic speedups over contemporary general-purpose processors (CPUs) for a variety of general-purpose applications. Realizing that potential, however, depends on several optimization techniques that maximize the use of GPU resources. This research applies such techniques on a CUDA-enabled GPU architecture to achieve the best possible performance for the Berlekamp-Massey Algorithm (BMA) as a case study. BMA is one of the best solutions for finding the shortest linear feedback shift register that generates a given sequence, a problem that is important in applications such as digital processing and cryptography. The experimental results show that the optimized BMA implementation is almost 160× faster than a non-bit serial CPU implementation, 7× faster than a bit serial implementation, and 4× faster than an initial parallel bit implementation.
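
For readers unfamiliar with the algorithm, the following is a minimal serial sketch of the textbook Berlekamp-Massey procedure over GF(2), written in Python for readability. It is not the authors' CUDA implementation and does not reproduce their bit-packed or parallel variants; the function name `berlekamp_massey_gf2` and its return convention are assumptions made for illustration only.

```python
def berlekamp_massey_gf2(bits):
    """Textbook Berlekamp-Massey over GF(2) (illustrative sketch).

    bits: list of 0/1 values.
    Returns (L, c), where L is the length of the shortest LFSR that
    generates `bits` and c = [1, c1, ..., cL] holds the coefficients of
    the connection polynomial, so that for every i >= L:
        bits[i] = c1*bits[i-1] ^ c2*bits[i-2] ^ ... ^ cL*bits[i-L]
    """
    n = len(bits)
    c = [0] * (n + 1)  # current connection polynomial
    b = [0] * (n + 1)  # copy of c from before the last length change
    c[0] = b[0] = 1
    L = 0              # current LFSR length
    m = -1             # index at which L last changed

    for i in range(n):
        # Discrepancy: does the current LFSR predict bits[i]?
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]

        if d:
            t = c[:]                 # save c before modifying it
            shift = i - m
            # c(x) <- c(x) + x^shift * b(x), addition in GF(2) is XOR
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:           # the register has to grow
                L = i + 1 - L
                m = i
                b = t

    return L, c[:L + 1]
```

As a usage example, the 14-bit sequence produced by the recurrence s[i] = s[i-1] XOR s[i-3] with seed 1, 0, 0 yields `L = 3` and `c = [1, 1, 0, 1]`. The inner discrepancy loop and the polynomial update loop are the two data-parallel steps in this formulation; the outer loop over `i` is inherently sequential.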

Figs. 1–18 (images not reproduced in this preview)

Author information

Corresponding author

Correspondence to Ghada M. Fathy.

About this article

Cite this article

Ali, H., Fathy, G.M., Fayez, Z. et al. Exploring the parallel capabilities of GPU: Berlekamp-Massey algorithm case study. Cluster Comput 23, 1007–1024 (2020). https://doi.org/10.1007/s10586-019-02961-x

  • DOI: https://doi.org/10.1007/s10586-019-02961-x
