A Comparative Study of Pattern Matching Algorithms on Sequences

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5908)

Abstract

In biological sequence pattern mining, pattern matching is a core component that counts the matches of each candidate pattern. We consider patterns with wildcard gaps. A wildcard gap matches any subsequence whose length lies between predefined lower and upper bounds. Since the number of candidate patterns may be huge, the efficiency of pattern matching is critical. We study two existing pattern matching algorithms, Pattern mAtching with Independent wildcard Gaps (PAIG) and Gap Constraint Search (GCS). GCS was designed for patterns with identical gaps, and we propose a revision of it for the case of independent gaps. PAIG can handle global length constraints, while GCS cannot. Both algorithms have the same space complexity. In the worst case, GCS has the lower time complexity; in the best case, however, PAIG is more efficient. We discuss how to choose between PAIG and GCS on the basis of theoretical analysis and experimental results on a biological sequence.
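
To make the notion of a wildcard gap concrete, the following is a minimal Python sketch of counting matches for a pattern with independent gaps. It is a plain dynamic-programming counter and is not PAIG or GCS. The function name `count_matches`, the representation of a pattern as a string of characters plus a list of `(lo, hi)` gap bounds, and the choice to count every distinct tuple of positions as one match are all assumptions made for illustration.

```python
def count_matches(seq, chars, gaps):
    """Count matches of a pattern with independent wildcard gaps (illustrative).

    seq   -- the sequence to search, e.g. a DNA string
    chars -- the pattern characters, e.g. "AC"
    gaps  -- gaps[k] = (lo, hi): between chars[k] and chars[k + 1] there must be
             at least lo and at most hi arbitrary characters

    Assumption: a match is any tuple of positions that satisfies every gap bound.
    """
    if not chars or len(gaps) != len(chars) - 1:
        raise ValueError("need exactly one (lo, hi) pair between adjacent characters")

    n = len(seq)
    # ways[j] = number of partial matches whose latest pattern character sits at j
    ways = [1 if seq[j] == chars[0] else 0 for j in range(n)]

    for k, (lo, hi) in enumerate(gaps, start=1):
        nxt = [0] * n
        for j in range(n):
            if seq[j] != chars[k]:
                continue
            # The previous character sits at i with lo <= j - i - 1 <= hi,
            # that is, j - hi - 1 <= i <= j - lo - 1.
            for i in range(max(0, j - hi - 1), j - lo):
                nxt[j] += ways[i]
        ways = nxt

    return sum(ways)
```

For example, `count_matches("ACGCC", "AC", [(0, 2)])` returns 2: the `A` at position 0 pairs with the `C` at positions 1 and 3 (gap lengths 0 and 2), while the `C` at position 4 would need a gap of length 3. This sketch runs in time proportional to the sequence length times the pattern length times the largest gap width, and it does not model a global length constraint.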

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Min, F., Wu, X. (2009). A Comparative Study of Pattern Matching Algorithms on Sequences. In: Sakai, H., Chakraborty, M.K., Hassanien, A.E., Ślęzak, D., Zhu, W. (eds) Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. RSFDGrC 2009. Lecture Notes in Computer Science, vol 5908. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10646-0_62

  • DOI: https://doi.org/10.1007/978-3-642-10646-0_62

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10645-3

  • Online ISBN: 978-3-642-10646-0

  • eBook Packages: Computer Science (R0)
