Abstract
In biological sequence pattern mining, pattern matching is a core component that counts the matches of each candidate pattern. We consider patterns with wildcard gaps, where a wildcard gap matches any subsequence whose length lies between predefined lower and upper bounds. Since the number of candidate patterns may be huge, the efficiency of pattern matching is critical. We study two existing pattern matching algorithms: Pattern mAtching with Independent wildcard Gaps (PAIG) and Gap Constraint Search (GCS). GCS was designed for patterns with identical gaps, and we propose a revision of it for the case of independent gaps. PAIG can handle global length constraints while GCS cannot. Both algorithms have the same space complexity. In the worst case, the time complexity of GCS is lower; in the best case, however, PAIG is more efficient. We discuss how to choose between PAIG and GCS on the basis of theoretical analysis and experimental results on a biological sequence.
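To make the notion of a wildcard gap concrete, the following is a minimal sketch of counting the matches of a pattern with independent gaps. It is a plain dynamic-programming counter written for illustration, not an implementation of PAIG or GCS. The function name `count_matches`, its parameters, and the assumption that a match is a tuple of sequence positions (one per pattern character) are all hypothetical choices made here. Global length constraints are not handled.

```python
def count_matches(sequence, chars, gaps):
    """Count matches of a pattern with independent wildcard gaps.

    chars: the pattern characters, e.g. "AAA".
    gaps:  one (lo, hi) pair per adjacent character pair; the number of
           skipped sequence characters between them must lie in [lo, hi].
    A match is assumed to be a tuple of positions, one per pattern character.
    """
    if not chars or len(gaps) != len(chars) - 1:
        raise ValueError("need at least one character and len(chars) - 1 gaps")

    n = len(sequence)
    # ways[j] = number of partial matches whose latest character is at position j
    ways = [1 if sequence[j] == chars[0] else 0 for j in range(n)]

    for k, (lo, hi) in enumerate(gaps, start=1):
        nxt = [0] * n
        for j in range(n):
            if sequence[j] != chars[k]:
                continue
            # The previous character sits at position i with lo <= j - i - 1 <= hi.
            start = max(0, j - hi - 1)
            end = j - lo - 1  # inclusive upper bound on i
            if end >= start:
                nxt[j] = sum(ways[start:end + 1])
        ways = nxt

    return sum(ways)


if __name__ == "__main__":
    # Pattern: A, then a gap of 0 to 2 characters, then A.
    print(count_matches("AATA", "AA", [(0, 2)]))
```

In this sketch each gap carries its own bounds, which is the independent-gap setting; passing the same `(lo, hi)` pair for every gap gives the identical-gap setting that GCS was originally designed for.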
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Min, F., Wu, X. (2009). A Comparative Study of Pattern Matching Algorithms on Sequences. In: Sakai, H., Chakraborty, M.K., Hassanien, A.E., Ślęzak, D., Zhu, W. (eds) Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. RSFDGrC 2009. Lecture Notes in Computer Science(), vol 5908. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10646-0_62
DOI: https://doi.org/10.1007/978-3-642-10646-0_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10645-3
Online ISBN: 978-3-642-10646-0
eBook Packages: Computer Science, Computer Science (R0)