Quantization Techniques for Similarity Search in High-Dimensional Data Spaces

  • Conference paper
New Horizons in Information Management (BNCOD 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2712)

Abstract

In recent years, several techniques have been developed for efficient similarity search in high-dimensional data spaces. Those based on vector approximation via quantization have been shown to be the most effective. The VA-file was the first technique to use vector approximation. The IQ-tree and the A-tree are subsequent techniques that impose a directory structure over the quantized VA-file representation. The performance gains of the IQ-tree result mainly from an optimized I/O strategy made possible by the directory structure, whereas those of the A-tree result mainly from exploiting the clustering of the data itself. In this work, we first evaluate the relative performance of these two enhanced approaches on high-dimensional data sets with different clustering characteristics. Second, we present the Clustered IQ-Tree, an indexing strategy that combines the best features of the IQ-tree and the A-tree, yielding better query performance than the former and more stable performance than the latter across different types of data sets.
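
The idea of vector approximation via quantization mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation, and it covers neither the IQ-tree, the A-tree, nor the Clustered IQ-Tree: it only shows a flat, VA-file-style filter-and-refine search in a simplified form. The function names (build_va, lower_bound, nearest), the uniform grid, the Euclidean metric, and the default of 4 bits per dimension are all assumptions chosen for illustration.

```python
import math
import random

def build_va(points, bits=4):
    """Quantize every coordinate into 2**bits uniform cells per dimension.

    Assumption: a uniform grid over the observed data range in each dimension.
    """
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    cells = 1 << bits
    # Fall back to width 1.0 when a dimension has zero extent.
    width = [((hi[d] - lo[d]) / cells) or 1.0 for d in range(dims)]
    approx = [
        tuple(min(cells - 1, int((p[d] - lo[d]) / width[d])) for d in range(dims))
        for p in points
    ]
    return approx, lo, width

def lower_bound(query, cell, lo, width):
    """Smallest possible Euclidean distance from query to any point in the cell."""
    total = 0.0
    for d, c in enumerate(cell):
        cell_lo = lo[d] + c * width[d]
        cell_hi = cell_lo + width[d]
        if query[d] < cell_lo:
            total += (cell_lo - query[d]) ** 2
        elif query[d] > cell_hi:
            total += (query[d] - cell_hi) ** 2
    return math.sqrt(total)

def nearest(query, points, approx, lo, width):
    """Filter and refine: visit vectors by increasing lower bound and stop
    once no remaining cell can contain a closer point."""
    bounds = [lower_bound(query, a, lo, width) for a in approx]
    order = sorted(range(len(points)), key=bounds.__getitem__)
    best, best_dist, fetched = None, math.inf, 0
    for i in order:
        if bounds[i] >= best_dist:
            break  # every remaining candidate is at least this far away
        fetched += 1  # stands in for reading the exact vector from disk
        dist = math.dist(query, points[i])
        if dist < best_dist:
            best, best_dist = i, dist
    return best, best_dist, fetched
```

Calling build_va on a list of equal-length coordinate lists and passing its three results to nearest returns the index of the nearest vector, its distance, and the number of exact vectors that had to be examined. The last value is the quantity a quantization-based index tries to keep small, since each exact vector lookup stands in for a disk access.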

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Garcia-Arellano, C., Sevcik, K. (2003). Quantization Techniques for Similarity Search in High-Dimensional Data Spaces. In: James, A., Younas, M., Lings, B. (eds) New Horizons in Information Management. BNCOD 2003. Lecture Notes in Computer Science, vol 2712. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45073-4_8

  • DOI: https://doi.org/10.1007/3-540-45073-4_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40536-8

  • Online ISBN: 978-3-540-45073-3

  • eBook Packages: Springer Book Archive
