
Dimensionality reduction for similarity search with the Euclidean distance in high-dimensional applications

Multimedia Tools and Applications

Abstract

In multimedia information retrieval, multimedia data are represented as vectors in a high-dimensional space. A variety of indexing methods have been proposed to search these vectors efficiently. However, the performance of these methods degrades dramatically as dimensionality increases, a phenomenon known as the dimensionality curse. To resolve it, dimensionality reduction methods have been proposed, which map feature vectors in the high-dimensional space to vectors in a low-dimensional space before the data are indexed. This paper proposes a novel dimensionality reduction method built on a function that approximates the Euclidean distance from the norm and angle components of a vector. First, we identify the causes of errors in approximating the angle component during Euclidean distance approximation, and discuss basic solutions to them. Then, we propose a new dimensionality reduction method that extracts a set of subvectors from a feature vector and keeps only the norm and the approximated angle of each subvector. Because the selection of a good reference vector is crucial for accurately approximating the angle component, we present criteria for a good reference vector and propose a method that chooses one. We also define a novel distance function using the norm and angle components, and formally prove that it always lower-bounds the Euclidean distance. This guarantees that similarity search using this function incurs no false dismissals. Finally, extensive experiments with synthetic and real-life data sets verify the superiority of the proposed approach.
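The core idea above, lower-bounding the Euclidean distance from per-subvector norms and angles measured against a reference vector, can be illustrated with a short sketch. This is not the paper's distance function or its reference-selection method. It assumes equal-length contiguous subvectors, uses the data centroid as a stand-in reference vector, and bounds the unknown angle θ_xy between two subvectors by θ_xy ≥ |θ_x - θ_y| (the triangle inequality for angles, with θ_x and θ_y measured against the same reference subvector). Since cos is decreasing on [0, π], each subvector satisfies ‖x - y‖² ≥ ‖x‖² + ‖y‖² - 2‖x‖‖y‖cos(θ_x - θ_y), and summing over subvectors preserves the bound, so a filter step using it cannot drop a true answer. The function names (`split`, `reduce_vector`, `lower_bound`, `range_query`) are hypothetical.

```python
import math
import random


def split(v, m):
    """Split v into m contiguous subvectors of (nearly) equal length."""
    n = len(v)
    bounds = [round(i * n / m) for i in range(m + 1)]
    return [v[bounds[i]:bounds[i + 1]] for i in range(m)]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def angle(u, r):
    """Angle in [0, pi] between subvector u and reference subvector r.

    Returns 0 when either vector is zero; the bound below stays valid in
    that case (it becomes exact or degrades to (|x| - |y|)^2).
    """
    nu, nr = norm(u), norm(r)
    if nu == 0.0 or nr == 0.0:
        return 0.0
    c = sum(a * b for a, b in zip(u, r)) / (nu * nr)
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error


def reduce_vector(v, ref_parts, m):
    """Keep only (norm, angle-to-reference) for each of the m subvectors."""
    return [(norm(p), angle(p, r)) for p, r in zip(split(v, m), ref_parts)]


def lower_bound(rx, ry):
    """Lower bound on the Euclidean distance, computed from reduced vectors only."""
    total = 0.0
    for (nx, tx), (ny, ty) in zip(rx, ry):
        # |x - y|^2 = nx^2 + ny^2 - 2 nx ny cos(theta_xy), and theta_xy >= |tx - ty|
        total += max(0.0, nx * nx + ny * ny - 2.0 * nx * ny * math.cos(tx - ty))
    return math.sqrt(total)


def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def range_query(data, reduced, q, eps, ref_parts, m):
    """Filter with the lower bound, then refine survivors with the exact distance."""
    rq = reduce_vector(q, ref_parts, m)
    candidates = [i for i, r in enumerate(reduced) if lower_bound(rq, r) <= eps]
    hits = [i for i in candidates if euclid(data[i], q) <= eps]
    return hits, len(candidates)


if __name__ == "__main__":
    random.seed(0)
    d, m, eps = 64, 8, 3.0
    data = [[random.random() for _ in range(d)] for _ in range(500)]
    # Hypothetical reference choice: the data centroid.
    ref_parts = split([sum(col) / len(data) for col in zip(*data)], m)
    reduced = [reduce_vector(v, ref_parts, m) for v in data]
    q = [random.random() for _ in range(d)]
    hits, n_cand = range_query(data, reduced, q, eps, ref_parts, m)
    exact = [i for i, v in enumerate(data) if euclid(v, q) <= eps]
    print(f"{n_cand} candidates, {len(hits)} answers, no false dismissals: {hits == exact}")
```

In this sketch each 64-dimensional vector is stored as 8 (norm, angle) pairs, i.e. 16 numbers; the filter step reads only those, and the exact distance is computed only for the candidates that survive it.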





Acknowledgements

This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (Grant: KRF-2005-041-D00651), the Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research Promotion Fund) (Grant: KRF-2007-314-D00221), and the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (Grant: IITA-2008-C1090-0801-0040). All correspondence concerning this work should be addressed to S.-W. Kim.

Author information


Correspondence to Sang-Wook Kim.


About this article

Cite this article

Jeong, S., Kim, SW. & Choi, BU. Dimensionality reduction for similarity search with the Euclidean distance in high-dimensional applications. Multimed Tools Appl 42, 251–271 (2009). https://doi.org/10.1007/s11042-008-0243-y


  • DOI: https://doi.org/10.1007/s11042-008-0243-y
