Abstract
This paper deals with the problem of estimating a transmitted string X* by processing the corresponding string Y, which is a noisy version of X*. We assume that Y contains substitution, insertion and deletion errors, and that X* is an element of a finite (but possibly large) dictionary, H. The best estimate X⁺ of X* is defined as that element of H which minimizes the Generalized Levenshtein Distance D(X, Y) between X and Y over all X ∈ H, subject to the total number of errors being no more than K. We present a new Branch and Bound pruning strategy that can be applied to dictionary-based approximate string matching when the dictionary is stored as a trie. The new strategy looks ahead at each node, c, before moving further, by merely evaluating a certain local criterion at c. As opposed to the reported trie-based methods [10], [17], the pruning is done a priori, before even embarking on the edit distance computations. The strategy thus combines the advantages of partitioning the dictionary according to the string lengths with the advantages gleaned by representing H using the trie data structure. The results on three benchmark dictionaries demonstrate a marked improvement (up to 33%) in the number of operations needed.
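The look-ahead idea can be illustrated with a minimal sketch. The abstract does not state the local criterion itself, so the code below assumes one plausible form: each trie node stores the shortest and longest word length in its subtree, and a branch is skipped when no word below it can have a length within K of |Y|. The sketch also assumes unit edit costs rather than the generalized inter-symbol costs, a non-empty dictionary of non-empty words, and hypothetical names (`Node`, `build_trie`, `search`) that do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)
    is_word: bool = False
    min_len: int = 10**9  # shortest dictionary word passing through this node
    max_len: int = 0      # longest dictionary word passing through this node


def build_trie(words):
    """Build a trie whose nodes record the length range of the words below them."""
    root = Node()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, Node())
            node.min_len = min(node.min_len, len(w))
            node.max_len = max(node.max_len, len(w))
        node.is_word = True
    return root


def search(root, y, k):
    """Return (word, distance) for the closest word with distance <= k, else (None, None)."""
    best = [None, k + 1]

    def visit(node, prefix, prev_row):
        for ch, child in node.children.items():
            # Assumed look-ahead test, evaluated BEFORE any edit distance work:
            # with unit costs, a word within k errors of y must have a length
            # in [len(y) - k, len(y) + k].
            if child.max_len < len(y) - k or child.min_len > len(y) + k:
                continue
            # One row of the standard edit distance table for prefix + ch.
            row = [prev_row[0] + 1]
            for j in range(1, len(y) + 1):
                row.append(min(row[j - 1] + 1,                       # insertion
                               prev_row[j] + 1,                      # deletion
                               prev_row[j - 1] + (y[j - 1] != ch)))  # substitution
            if child.is_word and row[-1] < best[1]:
                best[0], best[1] = prefix + ch, row[-1]
            # Classical cutoff: descend only if some extension can stay within k.
            if min(row) <= k:
                visit(child, prefix + ch, row)

    visit(root, "", list(range(len(y) + 1)))
    return (best[0], best[1]) if best[0] is not None else (None, None)


trie = build_trie(["cat", "cart", "care", "dog", "category"])
print(search(trie, "cot", 1))
```

In this toy dictionary, the branch leading only to "category" is discarded by the length test alone when Y = "cot" and K = 1, so no distance row is computed for it. The classical cutoff on the row minimum is kept as a second, a posteriori filter.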
References
A. Acharya, H. Zhu, and K. Shen (1999) Adaptive algorithms for cache-efficient trie search. ACM and SIAM Workshop on Algorithm Engineering and Experimentation.
G. Badr and B. J. Oommen (2005) A look-ahead branch pruning scheme for trie-based approximate string matching. Unabridged version of the present paper.
J. Bentley and R. Sedgewick (1997) Fast algorithms for sorting and searching strings. Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans.
H. Bunke (1993) Structural and syntactic pattern recognition. In: Handbook of Pattern Recognition and Computer Vision. Edited by C.H. Chen, L.F. Pau and P.S.P. Wang, World Scientific, Singapore.
W. Chang and E. Lawler (1992) Approximate string matching in sublinear expected time. 13th Annual Symposium on Foundations of Computer Science, 116–124.
J. Clement, P. Flajolet, and B. Vallee (1998) The analysis of hybrid trie structures. Proc. Annual ACM-SIAM Symp. on Discrete Algorithms, San Francisco, California, 531–539.
G. Dewey (1923) Relative Frequency of English Speech Sounds. Harvard University Press.
M. Du and S. Chang (1994) An approach to designing very fast approximate string matching algorithms. IEEE Transactions on Knowledge and Data Engineering, 6(4):620–633.
M. Firebaugh (1988) Artificial Intelligence: A Knowledge-Based Approach. Boyd and Fraser.
R. L. Kashyap and B. J. Oommen (1981) An effective algorithm for string correction using generalized edit distances - I. Description of the algorithm and its optimality. Inf. Sci., 23(2):123–142.
G. Navarro (2001) A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31–88.
K. Oflazer (1996) Error-tolerant finite state recognition with applications to morphological analysis and spelling correction. Computational Linguistics, 22(1):73–89.
B. J. Oommen (1987) Recognition of noisy subsequences using constrained edit distances. IEEE Trans. on Pattern Anal. and Mach. Intell., PAMI-9:676–685.
B. J. Oommen and G. Badr (2004) Dictionary-based syntactic pattern recognition using tries. Proceedings of the Joint IAPR International Workshops SSPR 2004 and SPR 2004, 251–259.
B. J. Oommen and R. K. S. Loke (2003) Syntactic pattern recognition involving traditional and generalized transposition errors: Attaining the information theoretic bound. Submitted for Publication.
D. Sankoff and J. B. Kruskal (1983) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.
H. Shang and T. Merrett (1996) Tries for approximate string matching. IEEE Transactions on Knowledge and Data Engineering, 8(4):540–547.
G. A. Stephen (2000) String Searching Algorithms, volume 6. Lecture Notes Series on Computing, World Scientific, Singapore, NJ.
E. Ukkonen (1985) Algorithm for approximate string matching. Information and Control, 64:100–118.
R. Wagner and A. Fischer (1974) The string-to-string correction problem. Journal of the Association for Computing Machinery (ACM), 21:168–173.
R. A. Wagner (1974) Order-n correction for regular languages. Comm. ACM, 17:265–268.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Badr, G., Oommen, J.B. (2005). A Look-Ahead Branch and Bound Pruning Scheme for Trie-Based Approximate String Matching. In: Kurzyński, M., Puchała, E., Woźniak, M., Żołnierek, A. (eds) Computer Recognition Systems. Advances in Soft Computing, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32390-2_8
DOI: https://doi.org/10.1007/3-540-32390-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25054-8
Online ISBN: 978-3-540-32390-7