Parallel SIMD CPU and GPU Implementations of Berlekamp–Massey Algorithm and Its Error Correction Application

Mohebbi, Hamidreza

doi:10.1007/s10766-018-0574-x

Published: 03 May 2018

Volume 47, pages 137–160, (2019)

International Journal of Parallel Programming

Hamidreza Mohebbi ORCID: orcid.org/0000-0001-8300-6042¹

322 Accesses
2 Citations

Abstract

The Berlekamp–Massey algorithm (BMA) finds the shortest linear feedback shift register that generates a given binary input sequence. It is used in a wide range of applications, including cryptography and digital signal processing. This research proposes novel uses of the parallel mechanisms offered by heterogeneous CPU and GPU hardware in order to achieve the best possible performance for BMA. The proposed bitwise implementation of BMA is almost 35 times faster than state-of-the-art implementations. Further improvement is achieved by using SIMD instructions, which provide data-level parallelism. This new approach can be 4.6 and 35 times faster than the bitwise CPU and state-of-the-art implementations, respectively. To achieve the highest possible speedup on a multi-core architecture, a multi-threaded implementation is introduced. By leveraging OpenMP, we obtained a 10-fold speedup on a 12-core server. A GPU device, with thousands of processing cores, can bring a large speedup over the best CPU implementation. Two other parallel mechanisms offered by the GPU are concurrent kernel execution and streaming. They achieve speedups of 14.5 and 2.2 times over the serial CPU and typical CUDA implementations, respectively. The performance of the OpenMP code with SIMD instructions is also compared with the GPU stream implementation. The effectiveness of the proposed method is evaluated in a real-world error correction application, where it achieves a 6.8-fold speedup.
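To make the idea of a bitwise BMA concrete, the following is a minimal sketch of the textbook Berlekamp–Massey algorithm over GF(2) with the polynomials packed into integers, so that each update is a single XOR and each discrepancy is the parity of a single AND. It is not the paper's implementation, which is not reproduced on this page. The function name `berlekamp_massey`, the bit layout, and the use of Python integers as a stand-in for machine words or SIMD registers are all assumptions made for illustration.

```python
def berlekamp_massey(bits):
    """Return (L, c) for a binary sequence given as a list of 0/1 values.

    L is the length of the shortest LFSR that generates the sequence.
    c is the connection polynomial packed into an int: bit i of c is the
    coefficient of x^i, and bit 0 is always 1.
    Hypothetical helper for illustration; not taken from the paper.
    """
    c = 1        # current connection polynomial C(x)
    b = 1        # copy of C(x) from before the last length change
    L = 0        # current LFSR length
    m = -1       # index at which the last length change happened
    window = 0   # reversed history: bit i holds bits[N - i]

    for N, bit in enumerate(bits):
        window = (window << 1) | (bit & 1)
        # Discrepancy: sum over i of c_i * bits[N - i] (mod 2),
        # computed as the parity of one bitwise AND.
        d = bin(c & window).count("1") & 1
        if d:
            t = c
            c ^= b << (N - m)      # C(x) += x^(N - m) * B(x) over GF(2)
            if 2 * L <= N:         # the register has to grow
                L = N + 1 - L
                m = N
                b = t
    return L, c


# Bits produced by the recurrence s[n] = s[n-1] XOR s[n-3], seed 1, 0, 0.
seq = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
print(berlekamp_massey(seq))  # (3, 11), i.e. C(x) = 1 + x + x^3
```

In a compiled implementation the same AND, XOR, and shift steps would operate on fixed-width words, and wider registers would process more coefficients per instruction. That is the kind of data-level parallelism the abstract attributes to SIMD.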

Acknowledgements

I would like to express my sincere gratitude to Professor Ming Ouyang of the Computer Science Department at the University of Massachusetts Boston. His comments and suggestions greatly improved the manuscript.

Author information

Authors and Affiliations

Computer Science Department, University of Massachusetts Boston, Boston, MA, 02125, USA
Hamidreza Mohebbi

Corresponding author

Correspondence to Hamidreza Mohebbi.

About this article

Cite this article

Mohebbi, H. Parallel SIMD CPU and GPU Implementations of Berlekamp–Massey Algorithm and Its Error Correction Application. Int J Parallel Prog 47, 137–160 (2019). https://doi.org/10.1007/s10766-018-0574-x

Received: 13 May 2017
Accepted: 24 April 2018
Published: 03 May 2018
Issue Date: 15 February 2019
DOI: https://doi.org/10.1007/s10766-018-0574-x

Part of a collection:

Special Issue on High-Level Programming for Heterogeneous Parallel Systems
