Personalized video similarity measure

  • Interactive Multimedia Computing
Multimedia Systems

Abstract

Personalized video search, an effective technique for managing and exploring large-scale video collections, has received considerable attention in recent years. A key problem in developing such techniques is how to design and evaluate similarity measures. Most existing approaches simply adopt the traditional Euclidean distance or its variants. Consequently, they generally suffer from two main disadvantages: (1) low effectiveness: retrieval accuracy is poor, one of the main reasons being that very little research has been carried out on designing an effective fusion scheme for integrating multimodal information (e.g., text, audio and visual) from video sequences; and (2) poor scalability: the development of video similarity metrics is largely disconnected from that of the relevant database access methods (indexing structures). This article reports a new distance metric, called personalized video distance, that effectively fuses information about individual preference and multimodal properties into a compact signature. Moreover, a novel hashing-based indexing structure has been designed to facilitate fast retrieval and better scalability. A comprehensive set of empirical studies has been carried out on two large video test collections using carefully designed queries of varying complexity. We observe significant improvements over existing techniques in various aspects.
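To make the contrast in the abstract concrete, the following is a minimal sketch, not the personalized video distance itself, whose definition is not given here. It compares the plain Euclidean baseline over concatenated multimodal features with a hypothetical preference-weighted fusion of per-modality distances. The function names, the dictionary layout of a video's features, and the weighting scheme are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def baseline_distance(video_a, video_b, modalities=("text", "audio", "visual")):
    """Baseline: Euclidean distance over the concatenated modality features.

    Every modality contributes equally, regardless of user preference.
    """
    flat_a = [x for m in modalities for x in video_a[m]]
    flat_b = [x for m in modalities for x in video_b[m]]
    return euclidean(flat_a, flat_b)


def weighted_fusion_distance(video_a, video_b, weights):
    """Hypothetical fusion: weighted sum of per-modality Euclidean distances.

    `weights` maps a modality name to a user-specific weight. This is an
    illustrative assumption, not the metric proposed in the article.
    """
    return sum(w * euclidean(video_a[m], video_b[m]) for m, w in weights.items())


# Two toy videos that differ only in their text features.
video_a = {"text": [0.0, 0.0], "audio": [0.0], "visual": [0.0, 0.0]}
video_b = {"text": [3.0, 4.0], "audio": [0.0], "visual": [0.0, 0.0]}

print(baseline_distance(video_a, video_b))  # 5.0
print(weighted_fusion_distance(
    video_a, video_b, {"text": 0.5, "audio": 0.25, "visual": 0.25}))  # 2.5
```

In this sketch, a user who cares little about text can lower the text weight and the two videos move closer together, whereas the baseline distance stays fixed for all users.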

Notes

  1. Variance must be larger than 80%.

Acknowledgments

Jialie Shen was supported by the Lee Foundation Fellowship for Research Excellence (SMU Research Project Fund No: C220/T050024), Singapore. We would like to thank Professor Ramayya Krishnan of the School of Information Systems and Management, Heinz College, Carnegie Mellon University, as well as the associate editors and referees, for their valuable comments.

Author information

Corresponding author

Correspondence to Jialie Shen.

About this article

Cite this article

Shen, J., Cheng, Z. Personalized video similarity measure. Multimedia Systems 17, 421–433 (2011). https://doi.org/10.1007/s00530-010-0223-8
