New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance

The Journal of Supercomputing

A Correction to this article was published on 20 March 2018

This article has been updated

Abstract

This paper proposes new algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance. Both problems are special cases of approximate string matching and have numerous direct applications in bioinformatics and text searching. First, a counter-vector-mismatches (CVM) algorithm is proposed to solve fixed-length approximate string matching with k-mismatches. The CVM algorithm is based on the parallel summation of counters located in the same machine word. Second, a parallel counter-vector-mismatches (PCVM) algorithm is proposed to accelerate the CVM algorithm. The PCVM algorithm combines two levels of parallelism: word-level parallelism and data parallelism in parallel environments such as multi-core processors and graphics processing units (GPUs). In the particular case of GPUs, a shared-mem parallel counter-vector-mismatches (PCVMsmem) scheme can be derived from the PCVM algorithm. The PCVMsmem scheme exploits the memory model of GPUs to optimize the performance of the PCVM algorithm. Finally, this paper shows several methods for applying the CVM and PCVM algorithms when the input pattern has a circular structure. In experiments with real DNA packages, the proposed algorithms and scheme run substantially faster than the previous bit-vector-mismatches and parallel bit-vector-mismatches algorithms.
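The idea of summing several counters held in one machine word can be illustrated with a small sketch. The code below is not the CVM algorithm from the paper, whose details are not given in this abstract; it is a Shift-Add style k-mismatches matcher in the spirit of Baeza-Yates and Gonnet, written under the assumption that a Python integer stands in for the machine word. The function name `shift_add_mismatches` and its parameters are hypothetical.

```python
def shift_add_mismatches(text, pattern, k):
    """Report (start, distance) for every window of text whose Hamming
    distance to pattern is at most k.

    Illustrative Shift-Add style sketch, NOT the paper's CVM algorithm.
    One Python int plays the role of the machine word and holds m packed
    counters of b bits each.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []

    b = m.bit_length()            # 2**b > m, so a counter never overflows
    word_mask = (1 << (m * b)) - 1
    field_mask = (1 << b) - 1
    top_shift = (m - 1) * b

    # table[c] has a 1 in counter i exactly when pattern[i] != c.
    table = {}
    for c in set(text) | set(pattern):
        word = 0
        for i, p in enumerate(pattern):
            if p != c:
                word |= 1 << (i * b)
        table[c] = word

    state = 0
    hits = []
    for j, c in enumerate(text):
        # Shift every counter one slot up, then add all m mismatch
        # indicators with a single integer addition.
        state = ((state << b) + table[c]) & word_mask
        if j >= m - 1:
            # The top counter now holds the mismatches between the
            # whole pattern and text[j - m + 1 : j + 1].
            distance = (state >> top_shift) & field_mask
            if distance <= k:
                hits.append((j - m + 1, distance))
    return hits


if __name__ == "__main__":
    print(shift_add_mismatches("ACGTACGA", "ACGT", 1))
```

Each text character costs one shift, one addition and one mask, regardless of how many counters are packed into the word, which is the word-level parallelism the abstract refers to. A real implementation on fixed-width words would have to split long patterns across several words.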

Change history

  • 20 March 2018

    The funding information is missing in the Acknowledgements section of the original article. The correct wording is given below.

Acknowledgements

We would like to thank Mr. Ji-Won Song, an M.S. candidate at the School of EEE at Dankook University, who helped us set up the Linux-based experimental environment using a GPU.

Author information

Corresponding author

Correspondence to HyunJin Kim.

Cite this article

Ho, T., Oh, SR. & Kim, H. New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance. J Supercomput 74, 1815–1834 (2018). https://doi.org/10.1007/s11227-017-2192-6

  • DOI: https://doi.org/10.1007/s11227-017-2192-6
