Parallel SIMD CPU and GPU Implementations of Berlekamp–Massey Algorithm and Its Error Correction Application

International Journal of Parallel Programming

Abstract

The Berlekamp–Massey algorithm (BMA) finds the shortest linear feedback shift register that generates a given binary input sequence. It is used in a wide range of applications, including cryptography and digital signal processing. This research exploits the parallel mechanisms offered by heterogeneous CPU and GPU hardware to achieve the best possible performance for the BMA. The proposed bitwise implementation of the BMA is almost 35 times faster than state-of-the-art implementations. Further improvement is achieved by using SIMD instructions, which provide data-level parallelism. This new approach can be 4.6 and 35 times faster than the bitwise CPU and state-of-the-art implementations, respectively. To achieve the highest possible speedup on a multi-core architecture, a multi-threaded implementation is introduced. By leveraging OpenMP, we obtained a speedup of 10 times on a 12-core server. A GPU device with thousands of processing cores can bring a large speedup over the best CPU implementation. Two further parallel mechanisms offered by the GPU are concurrent kernel execution and streaming. They achieve speedups of 14.5 and 2.2 times over the serial CPU and typical CUDA implementations, respectively. The performance of the OpenMP code with SIMD instructions is also compared with the GPU stream implementation. The effectiveness of the proposed method is evaluated in a real-world error correction application, where it achieves a speedup of 6.8 times.
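To make the idea of a bitwise BMA concrete, the following is a minimal sketch of the textbook Berlekamp–Massey algorithm over GF(2), with polynomials packed into integers so that the discrepancy is an AND followed by a parity and the polynomial update is a shift followed by an XOR. It is not the article's implementation: the function name `linear_complexity` and the variable names are hypothetical, and Python's arbitrary-precision integers stand in here for the machine words or SIMD registers that a CPU or GPU version would use.

```python
def linear_complexity(bits):
    """Textbook Berlekamp-Massey over GF(2) with bit-packed polynomials.

    Returns (L, c): L is the length of the shortest LFSR that generates
    `bits`, and c is the connection polynomial C(x) packed into an
    integer (bit i holds the coefficient of x^i, bit 0 is always 1).
    """
    c = 1        # current connection polynomial C(x)
    b = 1        # C(x) as it was before the last change of L
    L = 0        # current LFSR length
    m = -1       # index at which L last changed
    window = 0   # bit i holds s[n - i], so bit 0 is the newest input bit

    for n, bit in enumerate(bits):
        window = (window << 1) | (bit & 1)
        # Discrepancy: XOR of c_i * s[n - i] for i = 0..L, computed as
        # the parity of one bitwise AND instead of a per-bit loop.
        d = bin(c & window).count("1") & 1
        if d:
            t = c
            c ^= b << (n - m)       # C(x) += x^(n - m) * B(x) over GF(2)
            if 2 * L <= n:
                L = n + 1 - L
                m = n
                b = t
    return L, c


if __name__ == "__main__":
    # Two periods of the sequence generated by s[n] = s[n-2] XOR s[n-3],
    # i.e. C(x) = 1 + x^2 + x^3, starting from the state 1, 0, 0.
    seq = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
    L, c = linear_complexity(seq)
    print(L, bin(c))
```

In this packed form the inner loop of the scalar algorithm disappears into word-wide operations, which is the property that data-level parallelism builds on; a fixed-width implementation would need to split `c`, `b`, and `window` across several words and carry the shift between them.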


Acknowledgements

I would like to express my sincere gratitude to Professor Ming Ouyang of the Computer Science Department at the University of Massachusetts Boston. His comments and suggestions greatly improved the manuscript.

Author information

Corresponding author

Correspondence to Hamidreza Mohebbi.

Cite this article

Mohebbi, H. Parallel SIMD CPU and GPU Implementations of Berlekamp–Massey Algorithm and Its Error Correction Application. Int J Parallel Prog 47, 137–160 (2019). https://doi.org/10.1007/s10766-018-0574-x

  • DOI: https://doi.org/10.1007/s10766-018-0574-x
