Abstract
The Boyer–Moore (BM) algorithm is a single-pattern string matching algorithm. It is considered one of the most efficient string matching algorithms and is used in many applications. In the preprocessing phase, the algorithm computes two shift rules (the bad-character rule and the good-suffix rule) from the given pattern string. In the second phase, pattern matching is performed against the target input string using these two rules, which allow portions of the input that cannot contain a match to be skipped. The second phase is time-consuming and must be parallelized to achieve high-performance string matching. In this paper, we parallelize the BM algorithm on the latest many-core accelerators, such as the Intel Xeon Phi and the Nvidia Tesla K20 GPU, as well as on general-purpose multi-core microprocessors. For parallel string matching, the target input data is partitioned among multiple threads. Data lying on the boundaries between threads is searched redundantly so that an occurrence of the pattern that straddles the boundary between two neighboring threads is not missed. The overhead of this redundant search increases significantly as the number of threads grows. For a fixed target input length, the number of possible matches decreases as the pattern length increases. Furthermore, the positions at which the pattern occurs are randomly scattered across the target data. This leads to an unbalanced workload distribution among threads. We employ dynamic scheduling and multithreading to deal with this load-balancing issue. We also use an algorithmic cascading technique to maximize the benefit of multithreading and to reduce the overhead associated with the redundant data search between neighboring threads.
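The partitioning scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (which targets the Xeon Phi and a CUDA GPU); the function names `bad_character_table`, `good_suffix_table`, `search_range`, and `parallel_search` and the parameters `num_chunks` and `num_workers` are hypothetical. The sketch assumes a non-empty pattern, computes the two BM shift tables once (phase 1), then splits the text into chunks that each extend m - 1 characters into the next chunk, so that a match straddling a boundary is reported exactly once. Submitting more chunks than worker threads lets idle workers pick up the next pending chunk, a simple form of dynamic scheduling. Algorithmic cascading is not modeled.

```python
# Illustrative sketch with hypothetical names; not the paper's implementation.
from concurrent.futures import ThreadPoolExecutor


def bad_character_table(pattern):
    """Bad-character rule: rightmost index of each symbol in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def good_suffix_table(pattern):
    """Strong good-suffix rule; shift[j + 1] is used after a mismatch at pattern[j]."""
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start of the widest border of pattern[i:]
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Remaining cases: only a prefix of the pattern matches a suffix of the matched part.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def search_range(text, pattern, bc, gs, lo, hi):
    """Phase 2 on one chunk: every match s with lo <= s and s + m <= hi."""
    m = len(pattern)
    matches = []
    s = lo
    while s + m <= hi:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:  # compare right to left
            j -= 1
        if j < 0:
            matches.append(s)
            s += gs[0]
        else:
            # Take the larger safe shift; gs[j + 1] >= 1 guarantees progress.
            s += max(gs[j + 1], j - bc.get(text[s + j], -1))
    return matches


def parallel_search(text, pattern, num_chunks=8, num_workers=4):
    """Partition text; extend each chunk by m - 1 chars so boundary matches survive."""
    m, n = len(pattern), len(text)
    bc, gs = bad_character_table(pattern), good_suffix_table(pattern)  # phase 1, once
    size = max(1, -(-n // num_chunks))  # ceiling division
    # Chunk k owns match starts in [lo, lo + size); hi adds the m - 1 char overlap.
    bounds = [(lo, min(n, lo + size + m - 1)) for lo in range(0, n, size)]
    # More chunks than workers: an idle worker takes the next chunk (dynamic scheduling).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = pool.map(lambda b: search_range(text, pattern, bc, gs, *b), bounds)
        # map preserves chunk order, so the flattened list is already sorted.
        return [s for part in parts for s in part]
```

For example, `parallel_search("ABABABAB", "ABAB", num_chunks=3)` returns `[0, 2, 4]`; the match at position 4 starts in the second chunk and is found there, while the first chunk's overlap region prevents nothing from being missed and nothing from being duplicated. With p full-size chunks, the overlaps add (p - 1)(m - 1) characters of redundant scanning, which is why this overhead grows with the thread count. Because of CPython's global interpreter lock, the threads here illustrate the work decomposition rather than deliver a speedup.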
Our parallel implementation achieves speedups of ~17× on the Xeon Phi and ~47× on the Nvidia Tesla K20 GPU over a serial implementation on the host Intel Xeon processor.
Acknowledgments
This research has been performed as a subproject (P14008) of the project "National Supercomputing Technology Development and Research" and supported by the Korea Institute of Science and Technology Information (KISTI). This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (Grant No. 2012R1A1A2042267).
Cite this article
Jeong, Y., Lee, M., Nam, D. et al. High performance parallelization of Boyer–Moore algorithm on many-core accelerators. Cluster Comput 18, 1087–1098 (2015). https://doi.org/10.1007/s10586-015-0466-4