Quantization Techniques for Similarity Search in High-Dimensional Data Spaces

Garcia-Arellano, Christian; Sevcik, Ken

doi:10.1007/3-540-45073-4_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2712)

Included in the following conference series:

British National Conference on Databases

Abstract

In recent years, several techniques have been developed for efficient similarity search in high-dimensional data spaces. Of these, the techniques based on vector approximation via quantization have been shown to be the most effective. The VA-file was the first technique to use vector approximation. The IQ-tree and the A-tree are subsequent techniques that impose a directory structure over the quantized VA-file representation. The performance gains of the IQ-tree result mainly from an optimized I/O strategy permitted by the directory structure, while those of the A-tree result mainly from exploiting the clustering of the data itself. In this work, we first evaluate the relative performance of these two enhanced approaches over high-dimensional data sets with different clustering characteristics. Second, we present the Clustered IQ-Tree, an indexing strategy that combines the best features of the IQ-tree and the A-tree, leading to better query performance than the former and more stable performance than the latter across different types of data sets.
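
To make the idea of vector approximation via quantization concrete, the following is a minimal Python sketch of a VA-file-style filter-and-refine nearest-neighbor search. It is an illustration only and not the authors' implementation: the function names (quantize, lower_bound, nearest_neighbor), the assumption that coordinates lie in [0, 1), the uniform equal-width grid, and the choice of 4 bits per dimension are all assumptions made for this example. The IQ-tree, the A-tree, and the Clustered IQ-Tree add directory structures on top of such approximations, which this sketch does not attempt to model.

```python
import math
import random


def quantize(vectors, bits):
    """Map each coordinate in [0, 1) to one of 2**bits equal-width cells."""
    cells = 1 << bits
    return [[min(int(x * cells), cells - 1) for x in v] for v in vectors]


def lower_bound(query, approximation, bits):
    """Smallest possible Euclidean distance from query to any point in the cell."""
    cells = 1 << bits
    total = 0.0
    for q, c in zip(query, approximation):
        lo, hi = c / cells, (c + 1) / cells
        if q < lo:
            total += (lo - q) ** 2
        elif q > hi:
            total += (q - hi) ** 2
        # If lo <= q <= hi, this dimension contributes nothing to the bound.
    return math.sqrt(total)


def nearest_neighbor(query, vectors, approximations, bits):
    """Filter on the approximations, then refine with exact distances.

    Candidates are visited in ascending order of lower bound; the scan stops
    once a lower bound is no smaller than the best exact distance seen so far.
    Returns (index, exact distance, number of exact distance computations).
    """
    bounds = sorted(
        (lower_bound(query, a, bits), i) for i, a in enumerate(approximations)
    )
    best_index, best_dist, refined = -1, math.inf, 0
    for bound, i in bounds:
        if bound >= best_dist:
            break
        dist = math.dist(query, vectors[i])  # exact distance on the full vector
        refined += 1
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist, refined


# Hypothetical usage on synthetic uniform data (assumed, not from the paper).
random.seed(0)
DIMS, BITS = 16, 4
vectors = [[random.random() for _ in range(DIMS)] for _ in range(500)]
approximations = quantize(vectors, BITS)
query = [random.random() for _ in range(DIMS)]
index, distance, refined = nearest_neighbor(query, vectors, approximations, BITS)
print(index, distance, refined)
```

In this sketch each vector is approximated by DIMS * BITS bits (64 bits here), so the filtering pass reads far less data than a scan over the full vectors would, and exact distances are computed only for the candidates whose lower bound cannot rule them out.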

Author information

Authors and Affiliations

Department of Computer Science, University of Toronto, Canada
Christian Garcia-Arellano & Ken Sevcik
IBM Toronto Lab, Toronto, Canada
Christian Garcia-Arellano

Editor information

Editors and Affiliations

School of Mathematical and Information Sciences, Coventry University, Priory Street, Coventry, CV1 5FB, UK
Anne James & Muhammad Younas
Department of Computer Science, University of Exeter, Prince of Wales Road, Exeter, EX4 4PT, UK
Brian Lings

About this paper

Cite this paper

Garcia-Arellano, C., Sevcik, K. (2003). Quantization Techniques for Similarity Search in High-Dimensional Data Spaces. In: James, A., Younas, M., Lings, B. (eds) New Horizons in Information Management. BNCOD 2003. Lecture Notes in Computer Science, vol 2712. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45073-4_8

DOI: https://doi.org/10.1007/3-540-45073-4_8
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40536-8
Online ISBN: 978-3-540-45073-3
eBook Packages: Springer Book Archive
