
Easing the Dimensionality Curse by Stretching Metric Spaces

  • Conference paper
Scientific and Statistical Database Management (SSDBM 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5566)

Abstract

Queries over sets of complex elements are performed by extracting features from each element, and these features are used in place of the original elements during processing. Extracting a large number of significant features increases the representative power of the feature vector and improves query precision. However, each feature is a dimension in the representation space, so handling more features worsens the dimensionality curse. The problem arises because elements tend to spread throughout the space, and a higher dimensionality lets them spread over a much broader space. Therefore, in high-dimensional spaces, elements are frequently farther from each other, and the distances among pairs of elements tend to homogenize. When searching for nearest neighbors, the first one is usually not close, but once one is found, small increases in the query radius tend to include several others. This effect increases the overlap between nodes in access methods that index the dataset. Both spatial and metric access methods are sensitive to the problem. This paper presents a general strategy for metric access methods that improves the performance of similarity queries in high-dimensional spaces. Our technique applies a function that “stretches” the distances, so that close objects become closer and far ones become even farther. Experiments using the metric access method Slim-tree show that similarity queries performed in the transformed spaces require up to 70% fewer distance calculations, 52% fewer disk accesses, and up to 57% less total time than in the original spaces.
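The effect described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the stretching function, so `stretch` below is a hypothetical choice (an exponential with an assumed `base` of 2), and the uniform random data, the Euclidean distance, and the `relative_contrast` measure are likewise assumptions made for illustration. The sketch compares how well a query's distances are separated in 2 and in 256 dimensions, then applies the stretch to the high-dimensional distances.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative_contrast(dists):
    """(d_max - d_min) / d_min; small values mean the distances have homogenized."""
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def stretch(d, base=2.0):
    """Hypothetical stretching function (an assumption, not taken from the paper).

    It is strictly increasing with stretch(0) == 0, shrinks distances
    below 1 and enlarges distances above 1.
    """
    return base ** d - 1.0


def query_distances(dim, n=500, seed=0):
    """Distances from one random query to n uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    query = [rng.random() for _ in range(dim)]
    return [euclidean(query, p) for p in points]


low = query_distances(dim=2)
high = query_distances(dim=256)
stretched = [stretch(d) for d in high]

print("relative contrast, 2 dimensions:      ", relative_contrast(low))
print("relative contrast, 256 dimensions:    ", relative_contrast(high))
print("relative contrast, 256 dims stretched:", relative_contrast(stretched))

# A strictly increasing stretch leaves the nearest-neighbor ranking unchanged.
rank_before = sorted(range(len(high)), key=high.__getitem__)
rank_after = sorted(range(len(stretched)), key=stretched.__getitem__)
print("ranking preserved:", rank_before == rank_after)
```

Because the assumed stretch is strictly increasing, it changes how far apart the distances are without changing which element is nearest to the query.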




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pola, I.R.V., Traina, A.J.M., Traina, C. (2009). Easing the Dimensionality Curse by Stretching Metric Spaces. In: Winslett, M. (eds) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_30


  • DOI: https://doi.org/10.1007/978-3-642-02279-1_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02278-4

  • Online ISBN: 978-3-642-02279-1

  • eBook Packages: Computer Science, Computer Science (R0)
