Abstract
We present a new algorithm for multiple approximate string matching, based on an extension of the average-optimal single-pattern approximate string matching algorithm of Chang and Marr. Our algorithm inherits that average-case optimality and is also competitive in practice. We present a second algorithm that runs in linear time and handles higher difference ratios. We show experimentally that our algorithms are the fastest for intermediate difference ratios, a range in which the only existing algorithms could search simultaneously for just a few patterns. Our algorithm is also robust to the number of patterns, remaining effective for hundreds of patterns. Hence we fill an important gap in approximate string matching techniques, since no effective algorithms existed for searching for many patterns at an intermediate difference ratio.
Work developed while the author was working in the Dept. of Computer Science, University of Helsinki. Supported by the Academy of Finland.
Partially supported by Fondecyt grant 1-020831.
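To make the problem stated in the abstract concrete, the following is a minimal baseline sketch in Python. It is not the algorithm of this paper and not the Chang and Marr filter: it is the classical column-wise dynamic programming attributed to Sellers, run once per pattern, which is the kind of per-pattern cost that multi-pattern algorithms try to avoid. The function names `approx_end_positions` and `multi_search` are hypothetical, and the code assumes that a "difference" means one character insertion, deletion, or substitution, with `k` the maximum number of differences allowed.

```python
def approx_end_positions(pattern, text, k):
    """Return 1-based end positions in text where some substring
    matches pattern with at most k differences (edit distance)."""
    m = len(pattern)
    # col[i] = min differences between pattern[:i] and a text substring
    # ending at the current text position; start with the empty text.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value of col[i-1] from the previous column
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def multi_search(patterns, text, k):
    """Naive multi-pattern search: one independent scan per pattern."""
    return {p: approx_end_positions(p, text, k) for p in patterns}


if __name__ == "__main__":
    print(multi_search(["abc", "xyz"], "xxabcxx", 1))
```

This baseline costs time proportional to the text length times the total length of all patterns, so it degrades steadily as patterns are added; it is useful mainly as a correctness reference for faster filters.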
References
R. Baeza-Yates and G. Navarro. Multiple approximate string matching. In F. Dehne et al., editors, Proceedings of the 5th Annual Workshop on Algorithms and Data Structures (WADS’97), pages 174–184, 1997.
R. Baeza-Yates and G. Navarro. New and faster filters for multiple approximate string matching. Random Structures and Algorithms (RSA), 20:23–49, 2002.
W. Chang and T. Marr. Approximate string matching and local similarity. In Proc. 5th Combinatorial Pattern Matching (CPM’94), LNCS 807, pages 259–273, 1994.
M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, 1994.
H. Hyyrö and G. Navarro. Faster bit-parallel approximate string matching. In Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching (CPM 2002), LNCS 2373, pages 203–224, 2002.
R. Muth and U. Manber. Approximate multiple string search. In Proc. 7th Combinatorial Pattern Matching (CPM’96), LNCS 1075, pages 75–86, 1996.
E. W. Myers. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM, 46(3):395–415, 1999.
G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88, 2001.
G. Navarro, R. Baeza-Yates, E. Sutinen, and J. Tarhio. Indexing methods for approximate string matching. IEEE Data Engineering Bulletin, 24(4):19–27, 2001. Special issue on Managing Text Natively and in DBMSs.
G. Navarro, E. Sutinen, J. Tanninen, and J. Tarhio. Indexing text with approximate q-grams. In Proc. 11th Combinatorial Pattern Matching (CPM 2000), LNCS 1848, pages 350–363, 2000.
P. Sellers. The theory and computation of evolutionary distances: pattern recognition. Journal of Algorithms, 1:359–373, 1980.
E. Sutinen and J. Tarhio. Filtration with q-samples in approximate string matching. In D. S. Hirschberg and E. W. Myers, editors, Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching, number 1075 in Lecture Notes in Computer Science, pages 50–63, Laguna Beach, CA, 1996. Springer-Verlag, Berlin.
E. Ukkonen. Finding approximate patterns in strings. Journal of Algorithms, 6:132–137, 1985.
A. C. Yao. The complexity of pattern matching for a random string. SIAM Journal on Computing, 8(3):368–387, 1979.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fredriksson, K., Navarro, G. (2003). Average-Optimal Multiple Approximate String Matching. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_9
DOI: https://doi.org/10.1007/3-540-44888-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive