Average-Optimal Multiple Approximate String Matching

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2676)

Abstract

We present a new algorithm for multiple approximate string matching, based on an extension of the average-optimal single-pattern approximate string matching algorithm of Chang and Marr. Our algorithm inherits this average-case optimality and is also competitive in practice. We present a second algorithm that runs in linear time and handles higher difference ratios (the ratio k/m between the number of allowed differences and the pattern length). We show experimentally that our algorithms are the fastest for intermediate difference ratios, a range in which the only existing algorithms could search simultaneously for just a few patterns. Our algorithm also scales well with the number of patterns, remaining effective for hundreds of patterns. Hence we fill an important gap in approximate string matching techniques, since no effective algorithms previously existed to search for many patterns with an intermediate difference ratio.
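The abstract describes the filtering idea only at a high level. The Python sketch below illustrates a simplified multi-pattern filter in the spirit of Chang and Marr, under assumptions that are ours rather than the paper's: all patterns share the same length m, 0 ≤ k < m, the text is cut into blocks of length ⌊(m − k + 1)/2⌋, and each block's disjoint ℓ-grams are charged the minimum edit distance to any substring of any pattern (computed lazily here instead of being precomputed for every ℓ-gram). Blocks whose total charge exceeds k are discarded; surviving blocks are verified with Sellers' dynamic programming. The names `gram_distance`, `sellers_ends`, `multi_search` and the parameter `ell` are hypothetical; this is an illustration, not the authors' algorithm or code.

```python
# Hypothetical sketch of a Chang-and-Marr-style multi-pattern filter; not the paper's code.


def gram_distance(gram, pattern):
    """Minimum edit distance between `gram` and any substring of `pattern`.

    Sellers-style DP with `pattern` playing the role of the text, so the
    matching substring may start and end anywhere in `pattern`.
    """
    prev = [0] * (len(pattern) + 1)  # empty gram prefix: cost 0 at every position
    for i, g in enumerate(gram, 1):
        cur = [i] + [0] * len(pattern)  # i gram chars against an empty substring
        for j, p in enumerate(pattern, 1):
            cur[j] = min(prev[j] + 1,             # gram char left unmatched
                         cur[j - 1] + 1,          # pattern char left unmatched
                         prev[j - 1] + (g != p))  # match or substitution
        prev = cur
    return min(prev)


def sellers_ends(pattern, text, k):
    """End positions j such that some substring of `text` ending at j is
    within edit distance k of `pattern` (Sellers' algorithm)."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text):
        new = [0]  # an occurrence may start at any text position
        for i, p in enumerate(pattern, 1):
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + (p != c)))
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends


def multi_search(patterns, text, k, ell=2):
    """Sorted (pattern_index, end_position) pairs with at most k differences.

    Assumes equal-length patterns, 0 <= k < m, and ell >= 1.
    """
    m = len(patterns[0])
    if any(len(p) != m for p in patterns) or not 0 <= k < m:
        raise ValueError("need equal-length patterns and 0 <= k < m")
    b = (m - k + 1) // 2  # any occurrence (length >= m - k) covers a whole block
    cache = {}  # lazily filled table: ell-gram -> min distance over all patterns

    def charge(gram):
        if gram not in cache:
            cache[gram] = min(gram_distance(gram, p) for p in patterns)
        return cache[gram]

    found = set()
    for s in range(0, len(text) - b + 1, b):
        total = 0
        for g in range(s, s + b - ell + 1, ell):  # disjoint ell-grams inside the block
            total += charge(text[g:g + ell])
            if total > k:
                break
        if total > k:
            continue  # no occurrence can contain this block
        lo = max(0, s + b - m - k)      # earliest start of an occurrence covering the block
        hi = min(len(text), s + m + k)  # one past its latest possible end
        for idx, p in enumerate(patterns):
            for e in sellers_ends(p, text[lo:hi], k):
                found.add((idx, lo + e))
    return sorted(found)


print(multi_search(["abcd"], "xxabcdxx", 0))  # [(0, 5)]
```

Discarding a block is safe because an occurrence with at most k differences has length at least m − k, so it fully contains some block; the disjoint ℓ-grams of that block align to disjoint pieces of the pattern, so their charges can add up to at most k.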

Work carried out while the author was at the Dept. of Computer Science, University of Helsinki. Supported by the Academy of Finland.

Partially supported by Fondecyt grant 1-020831.

References

  1. R. Baeza-Yates and G. Navarro. Multiple approximate string matching. In F. Dehne et al., editors, Proceedings of the 5th Annual Workshop on Algorithms and Data Structures (WADS’97), pages 174–184, 1997.

  2. R. Baeza-Yates and G. Navarro. New and faster filters for multiple approximate string matching. Random Structures and Algorithms (RSA), 20:23–49, 2002.

  3. W. Chang and T. Marr. Approximate string matching and local similarity. In Proc. 5th Combinatorial Pattern Matching (CPM’94), LNCS 807, pages 259–273, 1994.

  4. M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, 1994.

  5. H. Hyyrö and G. Navarro. Faster bit-parallel approximate string matching. In Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching (CPM 2002), LNCS 2373, pages 203–224, 2002.

  6. R. Muth and U. Manber. Approximate multiple string search. In Proc. 7th Combinatorial Pattern Matching (CPM’96), LNCS 1075, pages 75–86, 1996.

  7. E. W. Myers. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM, 46(3):395–415, 1999.

  8. G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88, 2001.

  9. G. Navarro, R. Baeza-Yates, E. Sutinen, and J. Tarhio. Indexing methods for approximate string matching. IEEE Data Engineering Bulletin, 24(4):19–27, 2001. Special issue on Managing Text Natively and in DBMSs.

  10. G. Navarro, E. Sutinen, J. Tanninen, and J. Tarhio. Indexing text with approximate q-grams. In Proc. 11th Combinatorial Pattern Matching (CPM 2000), LNCS 1848, pages 350–363, 2000.

  11. P. Sellers. The theory and computation of evolutionary distances: pattern recognition. Journal of Algorithms, 1:359–373, 1980.

  12. E. Sutinen and J. Tarhio. Filtration with q-samples in approximate string matching. In D. S. Hirschberg and E. W. Myers, editors, Proc. 7th Combinatorial Pattern Matching (CPM’96), LNCS 1075, pages 50–63, Laguna Beach, CA, 1996. Springer-Verlag, Berlin.

  13. E. Ukkonen. Finding approximate patterns in strings. Journal of Algorithms, 6:132–137, 1985.

  14. A. C. Yao. The complexity of pattern matching for a random string. SIAM Journal on Computing, 8(3):368–387, 1979.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fredriksson, K., Navarro, G. (2003). Average-Optimal Multiple Approximate String Matching. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_9

  • DOI: https://doi.org/10.1007/3-540-44888-8_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40311-1

  • Online ISBN: 978-3-540-44888-4

  • eBook Packages: Springer Book Archive
