Abstract
Pivot-based algorithms are effective tools for proximity searching in metric spaces. They allow trading space overhead for the number of distance evaluations performed at query time. With additional search structures (which incur extra space overhead) they can also reduce the amount of side computations. We introduce a new data structure, the Fixed Queries Array (FQA), whose novelties are that (1) it permits sublinear extra CPU time without any extra data structure, and (2) it permits trading the number of pivots for their precision so as to make better use of the available memory. We show experimentally that the FQA is an efficient tool for searching in metric spaces and that it compares favorably against other state-of-the-art approaches. Its simplicity makes it an effective tool for practitioners seeking a black-box method to plug into their applications.
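To make the space-for-distance-evaluations trade concrete, the following is a minimal sketch of generic pivot-based range searching. It is not the FQA itself, whose layout the abstract does not describe. The names build_table and range_search are hypothetical, and the plain linear scan over the stored table is an assumption chosen for brevity; that scan is the kind of side computation the abstract says the FQA reduces to sublinear time.

```python
def build_table(db, pivots, dist):
    # Space overhead: store d(u, p) for every element u and every pivot p.
    return [[dist(u, p) for p in pivots] for u in db]


def range_search(db, table, pivots, dist, q, r):
    # Returns (elements within distance r of q, number of distance evaluations).
    dq = [dist(q, p) for p in pivots]  # one evaluation per pivot
    evaluations = len(pivots)
    result = []
    for u, du in zip(db, table):
        # Triangle inequality: |d(q, p) - d(u, p)| <= d(q, u).
        # If any pivot gives a gap larger than r, u cannot be an answer.
        if any(abs(a - b) > r for a, b in zip(dq, du)):
            continue
        evaluations += 1  # only surviving candidates are compared directly
        if dist(q, u) <= r:
            result.append(u)
    return result, evaluations


if __name__ == "__main__":
    # Toy metric space (an assumption for illustration): integers under |a - b|.
    db = list(range(100))
    pivots = [0, 99]
    dist = lambda a, b: abs(a - b)
    table = build_table(db, pivots, dist)
    print(range_search(db, table, pivots, dist, q=50, r=3))
```

Adding pivots enlarges the stored table but usually discards more elements before any direct comparison, which is the trade-off the abstract refers to.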
Cite this article
Chávez, E., Marroquín, J.L. & Navarro, G. Fixed Queries Array: A Fast and Economical Data Structure for Proximity Searching. Multimedia Tools and Applications 14, 113–135 (2001). https://doi.org/10.1023/A:1011343115154
DOI: https://doi.org/10.1023/A:1011343115154