Proximity matching using fixed-queries trees

Baeza-Yates, Ricardo; Cunto, Walter; Manber, Udi; Wu, Sun

doi:10.1007/3-540-58094-8_18

Proximity matching using fixed-queries trees

Ricardo Baeza-Yates¹,
Walter Cunto²,
Udi Manber³ &
…
Sun Wu⁴

Conference paper
First Online: 01 January 2005

249 Accesses
67 Citations

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 807))

Abstract

We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. We give an analysis of several performance parameters of fixed-queries trees and experimental results that support the analysis. Fixed-queries trees are particularly efficient for applications in which comparing two elements is expensive.

This work was partially supported by Grant 1930765 from Fondecyt.

Supported in part by NSF grants CCR-9002351 and CCR-9301129, and by the Advanced Research Projects Agency under contract number DABT63-93-C-0052. The information contained in this paper does not necessarily reflect the position or the policy of the U.S. Government or other sponsors of this research. No official endorsement should be inferred.

This is a preview of subscription content, log in via an institution.

Preview

Unable to display preview. Download preview PDF.

References

Altschul S.F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” J. Molecular Biology 15 (1990), 403–410.
Google Scholar
Baeza-Yates, R.A. and Cunto, W., “Unbalanced Multiway Trees Improved by Partial Expansions”, Acta Information, 29 (5), 1992, 443–460.
Google Scholar
Baeza-Yates, R.A. and Gonnet, G.H., “All-against-all Sequence Matching”, Dept. of Computer Science, Universidad de Chile, 1990.
Google Scholar
Bahl L. R., P. S. Gopalakrishnan, D. S. Kanevsky, and D.S. Nahamoo, “A fast admissible method for identifying a short list of candidate words,” IBM tech report RC 15874 (June 1990).
Google Scholar
Bugnion, E. and Roos, T. and Shi, F. and Widmayer, P. and Widmer, F. “A Spatial Index for Approximate Multiple String Matching”, 1st South American Workshop on String Processing, Belo Horizonte, Sept 1993, 43–54.
Google Scholar
Burkhard, W.A. and Keller, R.M. “Some Approaches to Best-Match File Searching”, Communications of the ACM 16 (4), April 1973, 230–236.
Google Scholar
Chang W.L., and E.L. Lawler, “Approximate matching in sublinear expected time,” Proc. of the 31st IEEE Symp. on Foundations of Computer Science (1990) 116–124.
Google Scholar
Friedman, J.H. and Bentley, J.L. and Finkel, R.A. “An Algorithm to find best matches in logarithmic expected time”, ACM Trans. on Math. Software 3(3), 1977.
Google Scholar
Gonnet, G.H. and Baeza-Yates, R. Handbook of Algorithms and Data Structures, Addison-Wesley, second edition, 1991.
Google Scholar
Gonnet, G.H., M.A. Cohen, and S.A. Benner, “Exhaustive matching of the entire protein sequence database,” Science 256, 1443.
Google Scholar
Lipman D. J., and W.R. Pearson, “Rapid and sensitive protein similarity searches,” Science 227 (1985), 1435–1441.
Google Scholar
Mahmoud, H. Evolution of Random Search Trees, John Wiley, New York, 1992.
Google Scholar
Murtagh, F. “A Survey of Recent Advances in Hierarchical Clustering Algorithms”, IEEE Computer 26 (4), 1983, 354–359.
Google Scholar
Myers, E. “Algorithmic Advances for Searching Biosequence Databases,” Proceedings of the International Symposium on Computational Methods in Genome Research (Heidelberg, 1992), to appear.
Google Scholar
Myers, E. “A Sublinear Algorithm for Approximate Keyword Matching,” Algorithmica, in press.
Google Scholar
Nevalainen, O. and Katajainen, J. “Experiments with a Closest Point Algorithm in Hamming Space”, Angewandte Informatik 5, 1982, 277–281.
Google Scholar
Santana, O. and Diaz, M. and Duque, J.D. and Rodriguez, J.C. “Increasing radius search schemes for the most similar strings on the Burkhard-Keller tree”, International Workshop on Computer Aided Systems Theory, EUROCAST'89, 1989.
Google Scholar
Shapiro, M. “The Choice of Reference Points in Best-Match File Searching”, Communications of the ACM 20 (5), May 1977, 339–343.
Google Scholar
Shasha, D. and Wang, T-L. “New Techniques for Best-Match Retrieval”, ACM Transactions on Information Systems 8, 1990, 140–158.
Google Scholar
Ukkonen, E., “Approximate string matching with q-grams and maximal matches,” Theoretical Computer Science (1992), 191–212.
Google Scholar
Ukkonen, E., “Approximate string-matching over suffix trees,” 4th Annual Combinatorial Pattern Matching Symp., Padova, Italy (June 1993), 228–242.
Google Scholar

Download references

Author information

Authors and Affiliations

Dpto. de Ciencias de la Computación, Universidad de Chile, Blanco Encalada, 2120, Santiago, Chile
Ricardo Baeza-Yates
IBM Consulting Group, Aptdo. 64778 & Dpto. de Computación y Tecnología de la Información, Univ. Simón Bolivar, A.P. 68000, Caracas, Venezuela
Walter Cunto
Dept. of Computer Science, University of Arizona, 85721, Tucson, AZ, USA
Udi Manber
Dept. of Computer Science, National Chung-Cheng Univ., Ming-Shong, Chia-Yi, Taiwan
Sun Wu

Authors

Ricardo Baeza-Yates
View author publications
You can also search for this author in PubMed Google Scholar
Walter Cunto
View author publications
You can also search for this author in PubMed Google Scholar
Udi Manber
View author publications
You can also search for this author in PubMed Google Scholar
Sun Wu
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Maxime Crochemore Dan Gusfield

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Baeza-Yates, R., Cunto, W., Manber, U., Wu, S. (1994). Proximity matching using fixed-queries trees. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_18

Download citation

DOI: https://doi.org/10.1007/3-540-58094-8_18
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58094-2
Online ISBN: 978-3-540-48450-9
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics