Skip to main content

Proximity matching using fixed-queries trees

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 807))

Abstract

We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. We give an analysis of several performance parameters of fixed-queries trees and experimental results that support the analysis. Fixed-queries trees are particularly efficient for applications in which comparing two elements is expensive.

This work was partially supported by Grant 1930765 from Fondecyt.

Supported in part by NSF grants CCR-9002351 and CCR-9301129, and by the Advanced Research Projects Agency under contract number DABT63-93-C-0052. The information contained in this paper does not necessarily reflect the position or the policy of the U.S. Government or other sponsors of this research. No official endorsement should be inferred.

This is a preview of subscription content, log in via an institution.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Altschul S.F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” J. Molecular Biology 15 (1990), 403–410.

    Google Scholar 

  2. Baeza-Yates, R.A. and Cunto, W., “Unbalanced Multiway Trees Improved by Partial Expansions”, Acta Information, 29 (5), 1992, 443–460.

    Google Scholar 

  3. Baeza-Yates, R.A. and Gonnet, G.H., “All-against-all Sequence Matching”, Dept. of Computer Science, Universidad de Chile, 1990.

    Google Scholar 

  4. Bahl L. R., P. S. Gopalakrishnan, D. S. Kanevsky, and D.S. Nahamoo, “A fast admissible method for identifying a short list of candidate words,” IBM tech report RC 15874 (June 1990).

    Google Scholar 

  5. Bugnion, E. and Roos, T. and Shi, F. and Widmayer, P. and Widmer, F. “A Spatial Index for Approximate Multiple String Matching”, 1st South American Workshop on String Processing, Belo Horizonte, Sept 1993, 43–54.

    Google Scholar 

  6. Burkhard, W.A. and Keller, R.M. “Some Approaches to Best-Match File Searching”, Communications of the ACM 16 (4), April 1973, 230–236.

    Google Scholar 

  7. Chang W.L., and E.L. Lawler, “Approximate matching in sublinear expected time,” Proc. of the 31st IEEE Symp. on Foundations of Computer Science (1990) 116–124.

    Google Scholar 

  8. Friedman, J.H. and Bentley, J.L. and Finkel, R.A. “An Algorithm to find best matches in logarithmic expected time”, ACM Trans. on Math. Software 3(3), 1977.

    Google Scholar 

  9. Gonnet, G.H. and Baeza-Yates, R. Handbook of Algorithms and Data Structures, Addison-Wesley, second edition, 1991.

    Google Scholar 

  10. Gonnet, G.H., M.A. Cohen, and S.A. Benner, “Exhaustive matching of the entire protein sequence database,” Science 256, 1443.

    Google Scholar 

  11. Lipman D. J., and W.R. Pearson, “Rapid and sensitive protein similarity searches,” Science 227 (1985), 1435–1441.

    Google Scholar 

  12. Mahmoud, H. Evolution of Random Search Trees, John Wiley, New York, 1992.

    Google Scholar 

  13. Murtagh, F. “A Survey of Recent Advances in Hierarchical Clustering Algorithms”, IEEE Computer 26 (4), 1983, 354–359.

    Google Scholar 

  14. Myers, E. “Algorithmic Advances for Searching Biosequence Databases,” Proceedings of the International Symposium on Computational Methods in Genome Research (Heidelberg, 1992), to appear.

    Google Scholar 

  15. Myers, E. “A Sublinear Algorithm for Approximate Keyword Matching,” Algorithmica, in press.

    Google Scholar 

  16. Nevalainen, O. and Katajainen, J. “Experiments with a Closest Point Algorithm in Hamming Space”, Angewandte Informatik 5, 1982, 277–281.

    Google Scholar 

  17. Santana, O. and Diaz, M. and Duque, J.D. and Rodriguez, J.C. “Increasing radius search schemes for the most similar strings on the Burkhard-Keller tree”, International Workshop on Computer Aided Systems Theory, EUROCAST'89, 1989.

    Google Scholar 

  18. Shapiro, M. “The Choice of Reference Points in Best-Match File Searching”, Communications of the ACM 20 (5), May 1977, 339–343.

    Google Scholar 

  19. Shasha, D. and Wang, T-L. “New Techniques for Best-Match Retrieval”, ACM Transactions on Information Systems 8, 1990, 140–158.

    Google Scholar 

  20. Ukkonen, E., “Approximate string matching with q-grams and maximal matches,” Theoretical Computer Science (1992), 191–212.

    Google Scholar 

  21. Ukkonen, E., “Approximate string-matching over suffix trees,” 4th Annual Combinatorial Pattern Matching Symp., Padova, Italy (June 1993), 228–242.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Maxime Crochemore Dan Gusfield

Rights and permissions

Reprints and permissions

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baeza-Yates, R., Cunto, W., Manber, U., Wu, S. (1994). Proximity matching using fixed-queries trees. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_18

Download citation

  • DOI: https://doi.org/10.1007/3-540-58094-8_18

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58094-2

  • Online ISBN: 978-3-540-48450-9

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics