Distance approximation techniques to reduce the dimensionality for multimedia databases

  • Regular Paper
Knowledge and Information Systems

Abstract

Databases are increasingly used to store multimedia data such as images, maps, video clips, and music clips. To make such data searchable, each object is represented by features encoded as high-dimensional vectors. The resulting growth in dimensionality leads to ‘the curse of dimensionality’, under which index structures perform poorly. Dimensionality reduction has been studied as a way to overcome this problem. However, some reduction methods do not guarantee no false dismissal, while others incur a high computational cost. This paper proposes dimensionality reduction techniques that guarantee no false dismissal while providing considerable efficiency by approximating distances with a few values. To provide the no-false-dismissal property, an approximated distance must never exceed the corresponding original distance. The Cauchy–Schwarz inequality and two trigonometric equations are used, together with a dimension partitioning technique, to approximate distances in a way that reduces the difference between the approximated distance and the original distance. As a result, the proposed techniques reduce the candidate set of a query result, which makes query processing more efficient.
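The lower-bounding idea can be illustrated with a minimal sketch. It is not the paper's exact technique (the trigonometric refinements are not reproduced here); it only assumes the standard consequence of the Cauchy–Schwarz inequality that, for any block of dimensions, the difference of the two block norms never exceeds the Euclidean distance over that block. The function names, the block layout, and the parameters `DIM`, `PARTS`, and the query radius are hypothetical choices made for illustration.

```python
import math
import random

def partition_norms(vec, parts):
    """Reduce a vector to at most `parts` values: the norm of each contiguous block."""
    size = math.ceil(len(vec) / parts)
    return [math.sqrt(sum(v * v for v in vec[i:i + size]))
            for i in range(0, len(vec), size)]

def approx_distance(norms_a, norms_b):
    """Lower bound on the Euclidean distance, computed from block norms only.

    By Cauchy-Schwarz, (|a_p| - |b_p|)^2 <= |a_p - b_p|^2 for every block p,
    so summing over blocks never overestimates the squared distance.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(norms_a, norms_b)))

def exact_distance(a, b):
    """Original Euclidean distance in the full-dimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(data, reduced, query, radius, parts):
    """Filter with the lower bound, then refine candidates with the exact distance."""
    q_norms = partition_norms(query, parts)
    candidates = [i for i, r in enumerate(reduced)
                  if approx_distance(r, q_norms) <= radius]
    answers = [i for i in candidates
               if exact_distance(data[i], query) <= radius]
    return candidates, answers

# Hypothetical setup: 200 random 64-dimensional vectors reduced to 4 values each.
random.seed(0)
DIM, PARTS = 64, 4
data = [[random.random() for _ in range(DIM)] for _ in range(200)]
reduced = [partition_norms(v, PARTS) for v in data]
query = [random.random() for _ in range(DIM)]
candidates, answers = range_query(data, reduced, query, 3.0, PARTS)
```

Because the approximated distance is a lower bound, any object within the query radius also passes the filter step, so no true answer is dismissed. A tighter bound shrinks `candidates` toward `answers`, which is the effect the abstract attributes to the proposed techniques.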

Author information

Correspondence to Chin-Wan Chung.

About this article

Cite this article

Kim, Y., Chung, CW., Lee, SL. et al. Distance approximation techniques to reduce the dimensionality for multimedia databases. Knowl Inf Syst 28, 227–248 (2011). https://doi.org/10.1007/s10115-010-0322-z

  • DOI: https://doi.org/10.1007/s10115-010-0322-z
