Trading Quality for Time with Nearest-Neighbor Search

Weber, Roger; Böhm, Klemens

doi:10.1007/3-540-46439-5_2


Conference paper
First Online: 01 January 2000


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1777)

Abstract

In many situations, users would readily accept an approximate query result if evaluation of the query becomes faster. In this article, we investigate approximate evaluation techniques based on the VA-File for Nearest-Neighbor Search (NN-Search). The VA-File contains approximations of feature points. These approximations frequently suffice to eliminate the vast majority of points in a first phase. Then, a second phase identifies the NN by computing exact distances of all remaining points. To develop approximate query-evaluation techniques, we proceed in two steps: first, we derive an analytic model for VA-File based NN-search. This is to investigate the relationship between approximation granularity, effectiveness of the filtering step and search performance. In more detail, we develop formulae for the distribution of the error of the bounds and the duration of the different phases of query evaluation. Based on these results, we develop different approximate query evaluation techniques. The first one adapts the bounds to have a more rigid filtering, the second one skips computation of the exact distances. Experiments show that these techniques have the desired effect: for instance, when allowing for a small but specific reduction of result quality, we observed a speedup of 7 in 50-NN search.
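The two-phase scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the uniform grid over [0, 1), the constant `BITS`, and the parameters `shrink` and `skip_exact` are hypothetical names chosen for illustration. `shrink` stands in for a bound adaptation that filters more rigidly, and `skip_exact` stands in for omitting the exact-distance phase; the paper derives its own adaptations from an analytic error model that is not reproduced here.

```python
import heapq
import math

BITS = 4              # bits per dimension (assumed value)
CELLS = 1 << BITS     # grid cells per dimension over [0, 1)

def approximate(point):
    """Map each coordinate in [0, 1) to the index of its grid cell."""
    return tuple(min(int(x * CELLS), CELLS - 1) for x in point)

def bounds(query, cells):
    """Lower and upper bound on the Euclidean distance to any point in the cell."""
    lo2 = hi2 = 0.0
    for q, c in zip(query, cells):
        left, right = c / CELLS, (c + 1) / CELLS
        near = left - q if q < left else q - right if q > right else 0.0
        far = max(q - left, right - q)
        lo2 += near * near
        hi2 += far * far
    return math.sqrt(lo2), math.sqrt(hi2)

def knn(points, approx, query, k, shrink=0.0, skip_exact=False):
    """k-NN over a VA-File-style approximation array.

    shrink=0 and skip_exact=False give an exact result; other settings
    trade result quality for less work (hypothetical knobs).
    """
    # Phase 1: scan the approximations and filter by bounds.
    uppers = []        # negated max-heap holding the k smallest upper bounds
    candidates = []
    for i, cells in enumerate(approx):
        lo, hi = bounds(query, cells)
        lo += shrink * (hi - lo)          # tighter, no longer guaranteed, lower bound
        if len(uppers) < k:
            heapq.heappush(uppers, -hi)
        elif hi < -uppers[0]:
            heapq.heapreplace(uppers, -hi)
        if len(uppers) < k or lo <= -uppers[0]:
            candidates.append((lo, i))
    candidates.sort()
    if skip_exact:
        # Approximate answer: rank by lower bound only, never touch the vectors.
        return [i for _, i in candidates[:k]]
    # Phase 2: visit candidates by increasing lower bound, compute exact distances.
    best = []          # negated max-heap of (distance, index)
    for lo, i in candidates:
        if len(best) == k and lo > -best[0][0]:
            break      # no remaining candidate can improve the result
        d = math.dist(query, points[i])
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return [i for _, i in sorted((-nd, i) for nd, i in best)]
```

With the default arguments, `knn(points, [approximate(p) for p in points], query, k)` returns the same indices as a brute-force scan. Raising `shrink` or setting `skip_exact=True` reduces the number of exact distance computations at the cost of possibly missing true neighbors.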



Author information

Authors and Affiliations

Institute of Information Systems, ETH Zentrum, 8092, Zurich, Switzerland
Roger Weber & Klemens Böhm


Editor information

Editors and Affiliations

Computer Science Department, University of California, Los Angeles, CA, 90095, USA
Carlo Zaniolo
Computer Science Department, University of Karlsruhe, P.O. Box 6980, 76128, Karlsruhe, Germany
Peter C. Lockemann
University of Konstanz, P.O. Box D188, 78457, Konstanz, Germany
Marc H. Scholl & Torsten Grust


Cite this paper

Weber, R., Böhm, K. (2000). Trading Quality for Time with Nearest-Neighbor Search. In: Zaniolo, C., Lockemann, P.C., Scholl, M.H., Grust, T. (eds) Advances in Database Technology — EDBT 2000. EDBT 2000. Lecture Notes in Computer Science, vol 1777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46439-5_2


DOI: https://doi.org/10.1007/3-540-46439-5_2
Published: 24 March 2000
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67227-2
Online ISBN: 978-3-540-46439-6
eBook Packages: Springer Book Archive
