Abstract
The Berlekamp–Massey algorithm (BMA) finds the shortest linear feedback shift register (LFSR) that generates a given binary input sequence. It is used in a wide range of applications, such as cryptography and digital signal processing. This research proposes novel parallel implementations that exploit the mechanisms offered by heterogeneous CPU and GPU hardware in order to achieve the best possible performance for the BMA. The proposed bitwise implementation of the BMA is almost 35 times faster than state-of-the-art implementations. Further improvement is achieved by using SIMD instructions, which provide data-level parallelism. This approach can be 4.6 and 35 times faster than a bitwise CPU implementation and state-of-the-art implementations, respectively. To achieve the highest possible speedup on a multi-core architecture, a multi-threaded implementation is introduced. By leveraging OpenMP, we obtained a speedup of 10 times on a 12-core server. A GPU device, with its thousands of processing cores, can bring a large speedup over the best CPU implementation. Two other parallel mechanisms offered by the GPU are concurrent kernel execution and streaming. They achieve speedups of 14.5 and 2.2 times compared to the serial CPU and typical CUDA implementations, respectively. The performance of the OpenMP code with SIMD instructions is also compared with the GPU stream implementation. The effectiveness of the proposed method is evaluated in a real-world error correction application, where it achieves a speedup of 6.8 times.
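To make the idea of a bitwise formulation concrete, the following is a minimal Python sketch of the textbook Berlekamp–Massey algorithm over GF(2), with polynomials and the sequence window packed into integers so that each discrepancy reduces to one AND plus a parity count. It is an illustration under stated assumptions, not the paper's C, SIMD, or CUDA code: the function names `parity` and `berlekamp_massey_gf2` and the bit layout are hypothetical choices made for this example.

```python
def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def berlekamp_massey_gf2(bits):
    """Shortest LFSR for a binary sequence (textbook BMA over GF(2)).

    Returns (L, c): L is the linear complexity and c is the connection
    polynomial C(x) packed into an int (bit j holds the coefficient of x^j).
    The packing is an assumption of this sketch, not the paper's layout.
    """
    c = 1  # current connection polynomial, C(x) = 1
    b = 1  # copy of C(x) from before the last length change
    L = 0  # current LFSR length
    m = 1  # steps since the last length change
    w = 0  # reversed window: after the update below, bit j holds s[i - j]

    for i, s in enumerate(bits):
        w = (w << 1) | (s & 1)
        # Discrepancy d = s[i] + sum_{j=1..L} c_j * s[i-j] over GF(2),
        # computed for all taps at once as the parity of (c AND w).
        d = parity(c & w)
        if d:
            t = c
            c ^= b << m  # C(x) <- C(x) + x^m * B(x)
            if 2 * L <= i:
                # The register must grow; remember the old polynomial.
                L = i + 1 - L
                b = t
                m = 1
            else:
                m += 1
        else:
            m += 1
    return L, c
```

For example, calling `berlekamp_massey_gf2` on a sequence produced by the recurrence s[i] = s[i-1] XOR s[i-3] recovers a length-3 register with C(x) = 1 + x + x^3. In a fixed-width C implementation the same AND-and-parity step would operate on machine words (or SIMD registers) instead of arbitrary-precision integers, which is the kind of data-level parallelism the abstract refers to.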














Acknowledgements
I would like to express my sincere gratitude to Professor Ming Ouyang of the Computer Science Department at the University of Massachusetts Boston. His comments and suggestions greatly improved the manuscript.
Cite this article
Mohebbi, H. Parallel SIMD CPU and GPU Implementations of Berlekamp–Massey Algorithm and Its Error Correction Application. Int J Parallel Prog 47, 137–160 (2019). https://doi.org/10.1007/s10766-018-0574-x
DOI: https://doi.org/10.1007/s10766-018-0574-x