Abstract
In recent years, several techniques have been developed for efficient similarity search in high-dimensional data spaces. Of these, the techniques based on vector approximation via quantization have been shown to be the most effective. The VA-file was the first technique to use vector approximation. The IQ-tree and the A-tree are subsequent techniques that impose a directory structure over the quantized VA-file representation. The performance gains of the IQ-tree result mainly from an optimized I/O strategy made possible by the directory structure, whereas those of the A-tree result mainly from exploiting the clustering of the data itself. In this work, we first evaluate the relative performance of these two enhanced approaches on high-dimensional data sets with different clustering characteristics. Second, we present the Clustered IQ-Tree, an indexing strategy that combines the best features of the IQ-tree and the A-tree, yielding better query performance than the former and more stable performance than the latter across different types of data sets.
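For readers unfamiliar with vector approximation, the general idea behind a VA-file style filter can be sketched as follows. This is only an illustrative sketch, not the authors' implementation and not the IQ-tree, the A-tree, or the Clustered IQ-Tree: it assumes coordinates normalized to [0, 1), a uniform grid with the same number of bits in every dimension, and Euclidean distance. The function names (`quantize`, `lower_bound`, `nearest_neighbor`) are hypothetical.

```python
import math


def quantize(vectors, bits=2):
    """Map each coordinate in [0, 1) to one of 2**bits uniform cells (assumed grid)."""
    cells = 1 << bits
    return [[min(int(x * cells), cells - 1) for x in v] for v in vectors]


def lower_bound(query, approx, bits=2):
    """Smallest possible Euclidean distance from the query to any point in the cell."""
    cells = 1 << bits
    total = 0.0
    for q, c in zip(query, approx):
        lo, hi = c / cells, (c + 1) / cells
        if q < lo:
            total += (lo - q) ** 2
        elif q > hi:
            total += (q - hi) ** 2
        # If lo <= q <= hi, this dimension contributes nothing to the bound.
    return math.sqrt(total)


def nearest_neighbor(query, vectors, approximations, bits=2):
    """Two-phase search: rank by lower bound, then refine with exact distances.

    Returns (index, distance, number_of_exact_vectors_visited).
    """
    bounds = [lower_bound(query, a, bits) for a in approximations]
    order = sorted(range(len(vectors)), key=lambda i: bounds[i])
    best_index, best_dist, visited = -1, math.inf, 0
    for i in order:
        # Every remaining vector is at least bounds[i] away, so stop here.
        if bounds[i] >= best_dist:
            break
        visited += 1
        d = math.dist(query, vectors[i])
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist, visited


if __name__ == "__main__":
    data = [[0.1, 0.1], [0.9, 0.9], [0.5, 0.5], [0.12, 0.2]]
    approx = quantize(data, bits=2)
    print(nearest_neighbor([0.1, 0.15], data, approx, bits=2))
```

In this sketch only the compact approximations are examined for every vector. The exact vectors, which in a disk-based setting would be the expensive accesses, are read only while their lower bound is still smaller than the best distance found so far.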
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garcia-Arellano, C., Sevcik, K. (2003). Quantization Techniques for Similarity Search in High-Dimensional Data Spaces. In: James, A., Younas, M., Lings, B. (eds) New Horizons in Information Management. BNCOD 2003. Lecture Notes in Computer Science, vol 2712. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45073-4_8
DOI: https://doi.org/10.1007/3-540-45073-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40536-8
Online ISBN: 978-3-540-45073-3
eBook Packages: Springer Book Archive