High performance parallelization of Boyer–Moore algorithm on many-core accelerators

Jeong, Yosang; Lee, Myungho; Nam, Dukyun; Kim, Jik-Soo; Hwang, Soonwook

doi:10.1007/s10586-015-0466-4

Published: 18 June 2015

Volume 18, pages 1087–1098, (2015)
Cluster Computing

Yosang Jeong¹, Myungho Lee¹, Dukyun Nam², Jik-Soo Kim² & Soonwook Hwang²

Abstract

The Boyer–Moore (BM) algorithm is a single-pattern string matching algorithm. It is widely considered the most efficient string matching algorithm and is used in many applications. In its preprocessing phase, the algorithm computes two string shift rules from the given pattern string. In the second phase, it uses these shift rules to match the pattern against the target input string; the shift rules allow parts of the target input string where no match can occur to be skipped. The second phase is time-consuming and must be parallelized to achieve high-performance string matching. In this paper, we parallelize the BM algorithm on the latest many-core accelerators, such as the Intel Xeon Phi and the Nvidia Tesla K20 GPU, as well as on general-purpose multi-core microprocessors. For parallel string matching, the target input data is partitioned among multiple threads. Data lying on the thread boundaries is searched redundantly so that an occurrence of the pattern string straddling the boundary between two neighboring threads is not missed. The overhead of this redundant search increases significantly for a large number of threads. For a fixed target input length, the number of possible matches decreases as the pattern length increases. Furthermore, the positions of the pattern string are spread randomly over the target data. This leads to an unbalanced workload distribution among threads. We employ dynamic scheduling and multithreading techniques to address the load-balancing issue. We also use the algorithmic cascading technique to maximize the benefit of multithreading and to reduce the overhead of the redundant data search between neighboring threads. Our parallel implementation achieves a speedup of approximately 17 times on the Xeon Phi and approximately 47 times on the Nvidia Tesla K20 GPU compared with a serial implementation on the host Intel Xeon processor.
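
The partitioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the function names (`bad_char_table`, `bm_search`, `parallel_search`) and the `chunk_size` and `workers` parameters are hypothetical; only the bad-character shift rule is used, whereas the full BM algorithm combines two shift rules; and a thread pool handing out many small chunks stands in for dynamic scheduling. Each chunk reads `m - 1` extra characters past its own range, which is the redundant boundary search mentioned above. Python threads do not run this loop in parallel on CPython, so the sketch shows the decomposition, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def bad_char_table(pattern):
    """Preprocessing: map each pattern character to its rightmost index."""
    return {ch: i for i, ch in enumerate(pattern)}


def bm_search(text, pattern, start, end, last):
    """Return match positions p with start <= p and p + len(pattern) <= end.

    Simplification (assumption): only the bad-character rule drives the shift.
    """
    m = len(pattern)
    matches = []
    s = start
    while s + m <= end:
        # Compare right to left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1  # conservative shift after a full match
        else:
            # Align the mismatched text character with its rightmost
            # occurrence in the pattern; always move forward by at least 1.
            s += max(1, j - last.get(text[s + j], -1))
    return matches


def parallel_search(text, pattern, chunk_size, workers=4):
    """Partition the text into chunks and search each chunk independently."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    last = bad_char_table(pattern)
    # Each chunk owns the match start positions in [lo, lo + chunk_size).
    # It reads m - 1 characters beyond that range so that a match straddling
    # the boundary with the next chunk is still found (redundant search).
    ranges = [
        (lo, min(lo + chunk_size + m - 1, len(text)))
        for lo in range(0, len(text), chunk_size)
    ]
    # More chunks than workers: idle workers pick up the next pending chunk,
    # a rough analogue of dynamic scheduling.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda r: bm_search(text, pattern, r[0], r[1], last), ranges
        )
        return [p for part in parts for p in part]
```

Because each chunk reports only matches that start inside its own range, no match is counted twice even though the overlapping `m - 1` characters are scanned by two chunks. A smaller `chunk_size` gives the pool more units to balance but raises the share of redundantly scanned text, which is the trade-off the abstract attributes to large thread counts.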

Acknowledgments

This research was performed as a subproject (P14008) of the project “National Supercomputing Technology Development and Research” and was supported by the Korea Institute of Science and Technology Information (KISTI). It was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science, and Technology (Grant No. 2012R1A1A2042267).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Myongji University, 116 Myongji Ro, Cheo-in Gu, Yongin, Kyungki Do, Korea
Yosang Jeong & Myungho Lee
Supercomputing R&D Center, Korea Institute of Science and Technology Information (KISTI), 245 Daehak Ro, Yuseong Gu, Daejeon, Korea
Dukyun Nam, Jik-Soo Kim & Soonwook Hwang

Corresponding author

Correspondence to Myungho Lee.

About this article

Cite this article

Jeong, Y., Lee, M., Nam, D. et al. High performance parallelization of Boyer–Moore algorithm on many-core accelerators. Cluster Comput 18, 1087–1098 (2015). https://doi.org/10.1007/s10586-015-0466-4

Received: 03 February 2015
Revised: 09 May 2015
Accepted: 25 May 2015
Published: 18 June 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s10586-015-0466-4
