Abstract
Databases are increasingly used to store multimedia data such as images, maps, video clips, and music clips. To make such data searchable, each object is represented by features encoded as high-dimensional vectors. The resulting increase in dimensionality leads to ‘the curse of dimensionality’, under which index structures perform poorly. Dimensionality reduction has been studied as a remedy. However, some reduction methods do not guarantee no false dismissal, while others incur high computational cost. This paper proposes dimensionality reduction techniques that guarantee no false dismissal while providing considerable efficiency by approximating distances with a few values. To provide the no-false-dismissal property, an approximated distance must never exceed the original distance. The Cauchy–Schwarz inequality and two trigonometric equations are used, together with a dimension partitioning technique, to approximate distances so that the difference between the approximated distance and the original distance is reduced. As a result, the proposed techniques reduce the candidate set of a query result, which makes query processing more efficient.
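The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes Euclidean distance, contiguous partitions of the dimensions, and one stored value (the norm) per partition. By the Cauchy–Schwarz inequality, ‖x − y‖² ≥ (‖x‖ − ‖y‖)², and applying this to each partition yields an approximated distance that never exceeds the original one, so filtering by it causes no false dismissal. The function names (`partition_norms`, `lower_bound`, `range_query`) are hypothetical, and the trigonometric refinements mentioned in the abstract are not modeled.

```python
import math


def partition_norms(vec, parts):
    """Reduce a vector to `parts` values: the Euclidean norm of each
    contiguous chunk of dimensions (an assumed partitioning scheme)."""
    n = len(vec)
    bounds = [i * n // parts for i in range(parts + 1)]
    return [
        math.sqrt(sum(v * v for v in vec[bounds[i]:bounds[i + 1]]))
        for i in range(parts)
    ]


def lower_bound(norms_a, norms_b):
    """Approximated distance computed from the reduced values only.
    Cauchy-Schwarz gives (|a_i| - |b_i|)^2 <= |a_i - b_i|^2 per chunk,
    so this never exceeds the true Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(norms_a, norms_b)))


def euclidean(a, b):
    """Original (exact) Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(data, reduced, query, radius, parts):
    """Filter-and-refine range query: prune with the lower bound,
    then verify the surviving candidates with the exact distance."""
    q_norms = partition_norms(query, parts)
    candidates = [
        i for i, r in enumerate(reduced) if lower_bound(r, q_norms) <= radius
    ]
    hits = [i for i in candidates if euclidean(data[i], query) <= radius]
    return hits, candidates
```

In this sketch, increasing `parts` stores more values per object but tightens the bound, which shrinks the candidate set that has to be verified with the exact distance.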
Cite this article
Kim, Y., Chung, CW., Lee, SL. et al. Distance approximation techniques to reduce the dimensionality for multimedia databases. Knowl Inf Syst 28, 227–248 (2011). https://doi.org/10.1007/s10115-010-0322-z
DOI: https://doi.org/10.1007/s10115-010-0322-z