Exploring the parallel capabilities of GPU: Berlekamp-Massey algorithm case study

Ali, Hanan; Fathy, Ghada M.; Fayez, Zeinab; Sheta, Walaa

doi:10.1007/s10586-019-02961-x

Published: 12 August 2019

Cluster Computing, volume 23, pages 1007–1024 (2020)

Hanan Ali¹, Ghada M. Fathy¹ (ORCID: orcid.org/0000-0003-0175-6961), Zeinab Fayez¹ & Walaa Sheta¹

Abstract

Graphics processing unit (GPU) architectures are becoming increasingly programmable, offering the potential for dramatic speedups over contemporary general-purpose processors (CPUs) for a variety of general-purpose applications. Realizing that potential, however, depends on applying several optimization techniques that make full use of GPU resources. This research applies such techniques on a CUDA-enabled GPU architecture to achieve the best possible performance for the Berlekamp-Massey algorithm (BMA) as a case study. BMA is one of the best solutions for finding the shortest linear feedback shift register that generates a given sequence, a problem that is important in applications such as digital processing and cryptography. The experimental results show that the optimized BMA implementation is almost 160× faster than the non-bit serial CPU implementation, 7× faster than the bit serial implementation, and 4× faster than an initial parallel bit implementation.
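For readers unfamiliar with the algorithm, the following is a minimal serial sketch of the Berlekamp-Massey algorithm over GF(2). It is not the authors' implementation: the function names `berlekamp_massey_gf2` and `lfsr_generate` are hypothetical, and packing polynomial coefficients into Python integers is only an assumed stand-in for the "bit" representation mentioned in the abstract, which the abstract does not describe in detail.

```python
def berlekamp_massey_gf2(bits):
    """Return (L, c) for the shortest LFSR that generates `bits`.

    L is the register length. c is the connection polynomial packed
    into an int: bit j of c is the coefficient of x^j, and bit 0 is 1.
    """
    c = 1    # current connection polynomial C(x)
    b = 1    # copy of C(x) from before the last length change
    L = 0    # current LFSR length
    m = -1   # index at which the last length change happened
    w = 0    # reversed window: bit j of w holds bits[i - j]
    for i, s in enumerate(bits):
        w = (w << 1) | (s & 1)
        # Discrepancy: XOR of c_j * bits[i - j] for j = 0..L,
        # computed as the parity of a single bitwise AND.
        d = bin(c & w).count("1") & 1
        if d:
            t = c
            c ^= b << (i - m)      # C(x) += x^(i - m) * B(x)
            if 2 * L <= i:         # the register has to grow
                L = i + 1 - L
                m = i
                b = t
    return L, c


def lfsr_generate(c, L, seed, n):
    """Generate n bits from the LFSR (L, c), starting from `seed`."""
    out = list(seed[:L])
    for i in range(L, n):
        bit = 0
        for j in range(1, L + 1):
            if (c >> j) & 1:
                bit ^= out[i - j]
        out.append(bit)
    return out


if __name__ == "__main__":
    # Sequence produced by s[i] = s[i-1] XOR s[i-3], i.e. C(x) = 1 + x + x^3.
    seq = lfsr_generate(0b1011, 3, [1, 0, 0], 20)
    print(berlekamp_massey_gf2(seq))  # (3, 11)
```

In this sketch the discrepancy at each step reduces to one AND followed by a parity count, and the polynomial update to one shift and one XOR. Word-level operations of this kind are what distinguish a bit-packed formulation from one that stores a single coefficient per array element.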

Author information

Authors and Affiliations

Informatic Research Institute, City for Scientific Research, Alexandria, Egypt
Hanan Ali, Ghada M. Fathy, Zeinab Fayez & Walaa Sheta

Corresponding author

Correspondence to Ghada M. Fathy.

About this article

Cite this article

Ali, H., Fathy, G.M., Fayez, Z. et al. Exploring the parallel capabilities of GPU: Berlekamp-Massey algorithm case study. Cluster Comput 23, 1007–1024 (2020). https://doi.org/10.1007/s10586-019-02961-x

Received: 25 December 2018
Revised: 15 May 2019
Accepted: 17 July 2019
Published: 12 August 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10586-019-02961-x
