Abstract
Queries over sets of complex elements are performed by extracting features from each element; these features are used in place of the real elements during processing. Extracting a large number of significant features increases the representative power of the feature vector and improves query precision. However, each feature is a dimension in the representation space, so handling more features worsens the dimensionality curse. The problem derives from the fact that elements tend to distribute all over the space, and a large dimensionality allows them to spread over much broader spaces. Therefore, in high-dimensional spaces, elements are frequently farther from each other, so the differences in distance among pairs of elements tend to homogenize. When searching for nearest neighbors, the first one is usually not close, but once one is found, small increases in the query radius tend to include several others. This effect increases the overlap between nodes in access methods indexing the dataset. Both spatial and metric access methods are sensitive to the problem. This paper presents a general strategy, applicable to metric access methods in general, that improves the performance of similarity queries in high-dimensional spaces. Our technique applies a function that “stretches” the distances, so that close objects become closer and far ones become even farther. Experiments using the metric access method Slim-tree show that similarity queries performed in the transformed spaces demand up to 70% fewer distance calculations and 52% fewer disk accesses, and take up to 57% less total time, compared with the original spaces.
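The homogenization of distances and the effect of stretching them can be illustrated with a minimal sketch. The abstract does not specify the stretching function, so the cubic `stretch` below is a hypothetical stand-in, chosen only because it is monotonically increasing and convex; all function names, the uniform random data, and the relative-contrast measure are likewise assumptions made for illustration. The sketch also does not address how an index would keep its pruning correct under the transformed distances, since a convex stretch is not guaranteed to preserve the triangle inequality.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative_contrast(distances):
    """(d_max - d_min) / d_min; small values mean the distances have homogenized."""
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


def stretch(d, power=3.0):
    """Hypothetical stretching function (an assumption, not the paper's definition)."""
    return d ** power


def sample_distances(dim, n_points=200, seed=0):
    """Distances from one random query to n_points uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    return [euclidean(query, p) for p in points]


def ranking(distances):
    """Indices of the points ordered from nearest to farthest."""
    return sorted(range(len(distances)), key=distances.__getitem__)


low = sample_distances(dim=2)
high = sample_distances(dim=256)
stretched = [stretch(d) for d in high]

print("contrast, 2 dimensions:          ", relative_contrast(low))
print("contrast, 256 dimensions:        ", relative_contrast(high))
print("contrast, 256 dims after stretch:", relative_contrast(stretched))
print("neighbor ranking preserved:      ", ranking(high) == ranking(stretched))
```

Because the stand-in function is strictly increasing, the nearest-neighbor ordering is unchanged; only the gaps between distances widen, which is the property the abstract attributes to stretching.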
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pola, I.R.V., Traina, A.J.M., Traina, C. (2009). Easing the Dimensionality Curse by Stretching Metric Spaces. In: Winslett, M. (eds) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_30
DOI: https://doi.org/10.1007/978-3-642-02279-1_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02278-4
Online ISBN: 978-3-642-02279-1
eBook Packages: Computer Science, Computer Science (R0)