Abstract
We introduce a novel alphabet sampling technique for speeding up both online and indexed string matching. We choose a subset of the alphabet and select the corresponding subsequence of the text. Online or indexed searching is then carried out on that subsequence, and candidate matches are verified in the full text. We show that this speeds up online searching by a factor of up to 5, especially for moderate to long patterns. For indexed searching, we obtain indexes that are as fast as the classical suffix array yet occupy less than 0.5 times the text size (instead of 4 times), in addition to the text itself. In our experiments, no alternative is competitive over a wide range of space/time trade-offs.
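The online variant described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`build_sample`, `plain_search`, `sampled_search`) are hypothetical, and the sketch assumes that the text position of every sampled character is stored, which is a simplification made here for clarity. The pattern is reduced to its sampled characters, that shorter string is searched in the sampled text, and each candidate is mapped back to a text position and verified against the full text.

```python
def build_sample(text, sampled_chars):
    """Return the subsequence of `text` over `sampled_chars` and, for each
    sampled character, its position in the original text.
    Assumption: every position is stored (a simplification for clarity)."""
    positions = [i for i, c in enumerate(text) if c in sampled_chars]
    sampled_text = "".join(text[i] for i in positions)
    return sampled_text, positions


def plain_search(text, pattern):
    """Report all (possibly overlapping) occurrences with a plain scan."""
    matches = []
    j = text.find(pattern)
    while j != -1:
        matches.append(j)
        j = text.find(pattern, j + 1)
    return matches


def sampled_search(text, pattern, sampled_chars, sampled_text, positions):
    """Search in the sampled text, then verify candidates in the full text."""
    sampled_pattern = "".join(c for c in pattern if c in sampled_chars)
    if not sampled_pattern:
        # The pattern has no sampled character, so the sample cannot help.
        return plain_search(text, pattern)

    # Offset of the first sampled character inside the pattern.
    offset = next(i for i, c in enumerate(pattern) if c in sampled_chars)

    matches = []
    j = sampled_text.find(sampled_pattern)
    while j != -1:
        # Map the candidate back to the text and verify it there.
        start = positions[j] - offset
        if start >= 0 and text.startswith(pattern, start):
            matches.append(start)
        j = sampled_text.find(sampled_pattern, j + 1)
    return matches
```

For example, with the text `abracadabra` and the sampled alphabet `{b, c, d, r}` (an arbitrary choice for illustration), the sampled text is `brcdbr`. Searching for `abra` looks for `br` in the sampled text and verifies two candidates, at text positions 0 and 7. Searching for `rc` finds one candidate in the sampled text, which verification in the full text rejects.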
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Claude, F., Navarro, G., Peltola, H., Salmela, L., Tarhio, J. (2008). Speeding Up Pattern Matching by Text Sampling. In: Amir, A., Turpin, A., Moffat, A. (eds) String Processing and Information Retrieval. SPIRE 2008. Lecture Notes in Computer Science, vol 5280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89097-3_10
DOI: https://doi.org/10.1007/978-3-540-89097-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89096-6
Online ISBN: 978-3-540-89097-3
eBook Packages: Computer Science, Computer Science (R0)