
A Look-Ahead Branch and Bound Pruning Scheme for Trie-Based Approximate String Matching

  • Conference paper
Computer Recognition Systems

Part of the book series: Advances in Soft Computing ((AINSC,volume 30))


Abstract

This paper deals with the problem of estimating a transmitted string X* by processing the corresponding string Y, which is a noisy version of X*. We assume that Y contains substitution, insertion and deletion errors, and that X* is an element of a finite (but possibly large) dictionary, H. The best estimate X⁺ of X* is defined as that element of H which minimizes the Generalized Levenshtein Distance D(X, Y) between X and Y over all X ∈ H, subject to the total number of errors being at most K. In this paper we present a new Branch and Bound pruning strategy that can be applied to dictionary-based approximate string matching when the dictionary is stored as a trie. The new strategy attempts to look ahead at each node, c, before moving further, by merely evaluating a certain local criterion at c. As opposed to the reported trie-based methods [10], [17], the pruning is done a priori, before even embarking on the edit distance computations, and thus it combines the advantages of partitioning the dictionary according to the string lengths with the advantages gleaned by representing H using the trie data structure. The results demonstrate a marked improvement (up to 33%) with respect to the number of operations needed on three benchmark dictionaries.
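
The abstract does not state the local criterion itself, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes unit-cost Levenshtein distance in place of the Generalized Levenshtein Distance, and it assumes a length-based look-ahead test: each trie node stores the shortest and longest word length in its subtree, and a child is skipped before any edit distance row is computed if no word below it can have a length within K of |Y|. The names `Node`, `build_trie`, `search` and the `look_ahead` flag are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)
    is_word: bool = False
    min_len: int = 10**9  # shortest dictionary word passing through this node
    max_len: int = 0      # longest dictionary word passing through this node


def build_trie(words):
    """Build a trie whose nodes record the word-length range of their subtree."""
    root = Node()
    for w in words:
        node = root
        node.min_len = min(node.min_len, len(w))
        node.max_len = max(node.max_len, len(w))
        for ch in w:
            node = node.children.setdefault(ch, Node())
            node.min_len = min(node.min_len, len(w))
            node.max_len = max(node.max_len, len(w))
        node.is_word = True
    return root


def search(root, y, k, look_ahead=True):
    """Return (best_word, distance, rows_computed); best_word is None if no
    dictionary word lies within k unit-cost edits of y."""
    n = len(y)
    best_word, best_dist = None, k + 1
    rows = 0  # number of edit-distance rows computed (rough operation count)

    def visit(node, prefix, prev_row):
        nonlocal best_word, best_dist, rows
        if node.is_word and prev_row[n] < best_dist:
            best_word, best_dist = prefix, prev_row[n]
        for ch, child in node.children.items():
            # Assumed look-ahead test, evaluated before any distance work:
            # the distance is at least the length difference, so a subtree
            # whose words are all too long or too short cannot hold a match.
            if look_ahead and (child.min_len > n + k or child.max_len < n - k):
                continue
            row = [prev_row[0] + 1]
            for j in range(1, n + 1):
                row.append(min(row[j - 1] + 1,                      # insertion
                               prev_row[j] + 1,                     # deletion
                               prev_row[j - 1] + (y[j - 1] != ch))) # substitution
            rows += 1
            # Conventional cutoff, applied only after the row is known.
            if min(row) <= k:
                visit(child, prefix + ch, row)

    visit(root, "", list(range(n + 1)))
    if best_word is None:
        return None, None, rows
    return best_word, best_dist, rows
```

Calling `search(root, y, k)` and `search(root, y, k, look_ahead=False)` on the same trie returns the same word and distance; only the row count differs, which is a crude stand-in for the operation count the abstract refers to.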


References

  1. A. Acharya, H. Zhu, and K. Shen (1999) Adaptive algorithms for cache-efficient trie search. ACM and SIAM Workshop on Algorithm Engineering and Experimentation.

  2. G. Badr and B. J. Oommen (2005) A look-ahead branch pruning scheme for trie-based approximate string matching. Unabridged version of the present paper.

  3. J. Bentley and R. Sedgewick (1997) Fast algorithms for sorting and searching strings. Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans.

  4. H. Bunke (1993) Structural and syntactic pattern recognition. In: Handbook of Pattern Recognition and Computer Vision. Edited by C.H. Chen, L.F. Pau and P.S.P. Wang, World Scientific, Singapore.

  5. W. Chang and E. Lawler (1992) Approximate string matching in sublinear expected time. 13th Annual Symposium on Foundations of Computer Science, 116–124.

  6. J. Clement, P. Flajolet, and B. Vallee (1998) The analysis of hybrid trie structures. Proc. Annual ACM-SIAM Symp. on Discrete Algorithms, San Francisco, California, 531–539.

  7. G. Dewey (1923) Relative Frequency of English Speech Sounds. Harvard University Press.

  8. M. Du and S. Chang (1994) An approach to designing very fast approximate string matching algorithms. IEEE Transactions on Knowledge and Data Engineering, 6(4):620–633.

  9. M. Firebaugh (1988) Artificial Intelligence: A Knowledge-Based Approach. Boyd and Fraser.

  10. R. L. Kashyap and B. J. Oommen (1981) An effective algorithm for string correction using generalized edit distances-I. Description of the algorithm and its optimality. Inf. Sci., 23(2):123–142.

  11. G. Navarro (2001) A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88.

  12. K. Oflazer (1996) Error-tolerant finite state recognition with applications to morphological analysis and spelling correction. Computational Linguistics, 22(1):73–89.

  13. B. J. Oommen (1987) Recognition of noisy subsequences using constrained edit distances. IEEE Trans. on Pattern Anal. and Mach. Intell., PAMI-9:676–685.

  14. B. J. Oommen and G. Badr (2004) Dictionary-based syntactic pattern recognition using tries. Proceedings of the Joint IAPR International Workshops SSPR 2004 and SPR 2004, 251–259.

  15. B. J. Oommen and R. K. S. Loke (2003) Syntactic pattern recognition involving traditional and generalized transposition errors: Attaining the information theoretic bound. Submitted for Publication.

  16. D. Sankoff and J. B. Kruskal (1983) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.

  17. H. Shang and T. Merrett (1996) Tries for approximate string matching. IEEE Transactions on Knowledge and Data Engineering, 8(4):540–547.

  18. G. A. Stephen (2000) String Searching Algorithms, volume 6. Lecture Notes Series on Computing, World Scientific, Singapore, NJ.

  19. E. Ukkonen (1985) Algorithm for approximate string matching. Information and Control, 64:100–118.

  20. R. Wagner and A. Fischer (1974) The string-to-string correction problem. Journal of the Association for Computing Machinery (ACM), 21:168–173.

  21. R. A. Wagner (1974) Order-n correction for regular languages. Comm. ACM, 17:265–268.


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Badr, G., Oommen, J.B. (2005). A Look-Ahead Branch and Bound Pruning Scheme for Trie-Based Approximate String Matching. In: Kurzyński, M., Puchała, E., Woźniak, M., Żołnierek, A. (eds) Computer Recognition Systems. Advances in Soft Computing, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32390-2_8


  • DOI: https://doi.org/10.1007/3-540-32390-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25054-8

  • Online ISBN: 978-3-540-32390-7

  • eBook Packages: Engineering, Engineering (R0)
