A Look-Ahead Branch and Bound Pruning Scheme for Trie-Based Approximate String Matching

Badr, Ghada; Oommen, John B.

doi:10.1007/3-540-32390-2_8

Part of the book series: Advances in Soft Computing (AINSC, volume 30)

Abstract

This paper deals with the problem of estimating a transmitted string X* by processing the corresponding string Y, which is a noisy version of X*. We assume that Y contains substitution, insertion and deletion errors, and that X* is an element of a finite (but possibly large) dictionary, H. The best estimate X⁺ of X* is defined as the element of H that minimizes the Generalized Levenshtein Distance D(X, Y) between X and Y over all X ∈ H, subject to the total number of errors being at most K. In this paper we present a new Branch and Bound pruning strategy that can be applied to dictionary-based approximate string matching when the dictionary is stored as a trie. The new strategy looks ahead at each node, c, before moving further, by merely evaluating a certain local criterion at c. In contrast to the reported trie-based methods [10], [17], the pruning is done a priori, before even embarking on the edit distance computations. It thus combines the advantages of partitioning the dictionary according to string length with those gained by representing H as a trie. The results demonstrate a marked improvement (up to 33%) in the number of operations needed on three benchmark dictionaries.
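
The idea in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: the abstract does not state the local criterion, so the sketch assumes a simple length-based one (each trie node stores the shortest and longest word length reachable through it, and a child is skipped when every such length differs from |Y| by more than the current error budget). It also assumes unit edit costs rather than the generalized, symbol-dependent costs of D(X, Y). The names Node, build_trie and search are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    is_word: bool = False
    # Shortest / longest full word length among words passing through this node.
    min_len: int = 10**9
    max_len: int = 0


def build_trie(words: Iterable[str]) -> Node:
    root = Node()
    for w in words:
        node = root
        node.min_len = min(node.min_len, len(w))
        node.max_len = max(node.max_len, len(w))
        for ch in w:
            node = node.children.setdefault(ch, Node())
            node.min_len = min(node.min_len, len(w))
            node.max_len = max(node.max_len, len(w))
        node.is_word = True
    return root


def search(root: Node, y: str, k: int) -> Optional[Tuple[str, int]]:
    """Return (word, distance) for a closest word within k unit-cost errors, or None."""
    n = len(y)
    best: Optional[Tuple[str, int]] = None
    limit = k  # largest distance that would still improve on the best match so far

    def visit(node: Node, prefix: str, row: List[int]) -> None:
        nonlocal best, limit
        if node.is_word and row[-1] <= limit:
            best = (prefix, row[-1])
            limit = row[-1] - 1
        for ch, child in sorted(node.children.items()):
            if limit < 0:
                return  # an exact match was found; nothing can improve on it
            # Look-ahead (assumed criterion): the edit distance is at least the
            # length difference, so skip the child before computing any
            # distances if no word below it can be within the budget.
            if child.min_len - n > limit or n - child.max_len > limit:
                continue
            # One row of the standard edit-distance table for prefix + ch.
            new_row = [row[0] + 1]
            for j in range(1, n + 1):
                cost = 0 if y[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1, row[j] + 1, row[j - 1] + cost))
            # Usual cutoff: no extension of this prefix can beat the row minimum.
            if min(new_row) <= limit:
                visit(child, prefix + ch, new_row)

    visit(root, "", list(range(n + 1)))
    return best


trie = build_trie(["cat", "care", "dog", "category"])
print(search(trie, "catt", 1))
```

In this toy dictionary, the subtree leading to "category" is skipped by the length test alone when matching "catt" with K = 1, so no table rows are computed for it. Storing the length bounds in the nodes is one design choice among several; it costs two integers per node and keeps the test local to the node being considered.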

References

[1] A. Acharya, H. Zhu, and K. Shen (1999) Adaptive algorithms for cache-efficient trie search. ACM and SIAM Workshop on Algorithm Engineering and Experimentation.
[2] G. Badr and B. J. Oommen (2005) A look-ahead branch pruning scheme for trie-based approximate string matching. Unabridged version of the present paper.
[3] J. Bentley and R. Sedgewick (1997) Fast algorithms for sorting and searching strings. Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans.
[4] H. Bunke (1993) Structural and syntactic pattern recognition. In: Handbook of Pattern Recognition and Computer Vision. Edited by C.H. Chen, L.F. Pau and P.S.P. Wang, World Scientific, Singapore.
[5] W. Chang and E. Lawler (1992) Approximate string matching in sublinear expected time. 13th Annual Symposium on Foundations of Computer Science, 116–124.
[6] J. Clement, P. Flajolet, and B. Vallee (1998) The analysis of hybrid trie structures. Proc. Annual ACM-SIAM Symp. on Discrete Algorithms, San Francisco, California, 531–539.
[7] G. Dewey (1923) Relative Frequency of English Speech Sounds. Harvard University Press.
[8] M. Du and S. Chang (1994) An approach to designing very fast approximate string matching algorithms. IEEE Transactions on Knowledge and Data Engineering, 6(4):620–633.
[9] M. Firebaugh (1988) Artificial Intelligence: A Knowledge-Based Approach. Boyd and Fraser.
[10] R. L. Kashyap and B. J. Oommen (1981) An effective algorithm for string correction using generalized edit distances - I. Description of the algorithm and its optimality. Inf. Sci., 23(2):123–142.
[11] G. Navarro (2001) A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88.
[12] K. Oflazer (1996) Error-tolerant finite state recognition with applications to morphological analysis and spelling correction. Computational Linguistics, 22(1):73–89.
[13] B. J. Oommen (1987) Recognition of noisy subsequences using constrained edit distances. IEEE Trans. on Pattern Anal. and Mach. Intell., PAMI-9:676–685.
[14] B. J. Oommen and G. Badr (2004) Dictionary-based syntactic pattern recognition using tries. Proceedings of the Joint IAPR International Workshops SSPR 2004 and SPR 2004, 251–259.
[15] B. J. Oommen and R. K. S. Loke (2003) Syntactic pattern recognition involving traditional and generalized transposition errors: Attaining the information theoretic bound. Submitted for Publication.
[16] D. Sankoff and J. B. Kruskal (1983) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.
[17] H. Shang and T. Merrett (1996) Tries for approximate string matching. IEEE Transactions on Knowledge and Data Engineering, 8(4):540–547.
[18] G. A. Stephen (2000) String Searching Algorithms, volume 6. Lecture Notes Series on Computing, World Scientific, Singapore, NJ.
[19] E. Ukkonen (1985) Algorithm for approximate string matching. Information and Control, 64:100–118.
[20] R. Wagner and A. Fischer (1974) The string-to-string correction problem. Journal of the Association for Computing Machinery (ACM), 21:168–173.
[21] R. A. Wagner (1974) Order-n correction for regular languages. Comm. ACM, 17:265–268.

Author information

Authors and Affiliations

School of Computer Science, Carleton University, Ottawa, Canada, K1S 5B6
Ghada Badr & John B. Oommen (Fellow of the IEEE)

Editor information

Editors and Affiliations

Faculty of Electronics, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Marek Kurzyński, Edward Puchała, Michał Woźniak & Andrzej Żołnierek

About this paper

Cite this paper

Badr, G., Oommen, J.B. (2005). A Look-Ahead Branch and Bound Pruning Scheme for Trie-Based Approximate String Matching. In: Kurzyński, M., Puchała, E., Woźniak, M., Żołnierek, A. (eds) Computer Recognition Systems. Advances in Soft Computing, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32390-2_8

DOI: https://doi.org/10.1007/3-540-32390-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25054-8
Online ISBN: 978-3-540-32390-7
eBook Packages: Engineering, Engineering (R0)
