
Social video annotation by combining features with a tri-adaptation approach

  • Special Issue Paper
Multimedia Systems

Abstract

Online social video websites such as YouTube allow users to manually annotate their videos with textual labels. These labels can serve as indexing keywords that facilitate the search and organization of video data. However, manual video annotation is usually labor-intensive and time-consuming. In this work, we propose a novel social video annotation method that combines multiple feature sets through tri-adaptation. The shots in each video are annotated by aggregating models learned from three complementary feature sets. Meanwhile, the models are collaboratively adapted by exploiting unlabeled shots. In this sense, the method can be viewed as a novel semi-supervised algorithm that explores three complementary views. Our approach also exploits the temporal smoothness of video labels by applying a label correction strategy. Experiments on a web video dataset demonstrate the effectiveness of the proposed approach.
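The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes binary shot labels, a toy nearest-centroid classifier per feature set, a tri-training-style rule in which each model is refit on unlabeled shots that the other two models agree on, majority voting for aggregation, and a sliding-window majority vote as the label correction step. All names (`CentroidModel`, `tri_view_adapt`, `aggregate`, `temporal_correct`) and parameters (`rounds`, `window`) are hypothetical.

```python
import numpy as np

class CentroidModel:
    """Toy binary classifier (an assumption): assign the nearer class centroid."""
    def fit(self, X, y):
        self.pos = X[y == 1].mean(axis=0)
        self.neg = X[y == 0].mean(axis=0)
        return self

    def predict(self, X):
        d_pos = np.linalg.norm(X - self.pos, axis=1)
        d_neg = np.linalg.norm(X - self.neg, axis=1)
        return (d_pos < d_neg).astype(int)

def tri_view_adapt(views_l, y, views_u, rounds=3):
    """Adapt three per-view models using unlabeled shots.

    views_l, views_u: lists of three arrays, one per feature set, holding
    the labeled and unlabeled shots respectively. y must contain both classes.
    """
    models = [CentroidModel().fit(X, y) for X in views_l]
    for _ in range(rounds):
        preds = [m.predict(Xu) for m, Xu in zip(models, views_u)]
        new_models = []
        for i in range(3):
            j, k = [v for v in range(3) if v != i]
            agree = preds[j] == preds[k]  # the other two views agree
            # Refit view i on labeled shots plus the agreed pseudo-labels.
            X_aug = np.vstack([views_l[i], views_u[i][agree]])
            y_aug = np.concatenate([y, preds[j][agree]])
            new_models.append(CentroidModel().fit(X_aug, y_aug))
        models = new_models
    return models

def aggregate(models, views):
    """Majority vote over the three per-view predictions."""
    votes = np.sum([m.predict(X) for m, X in zip(models, views)], axis=0)
    return (votes >= 2).astype(int)

def temporal_correct(labels, window=3):
    """Smooth labels of consecutive shots with a sliding-window majority vote."""
    half = window // 2
    padded = np.pad(labels, half, mode="edge")
    out = [padded[t:t + window].sum() * 2 > window for t in range(len(labels))]
    return np.array(out).astype(int)
```

In this sketch, `aggregate(tri_view_adapt(views_l, y, views_u), views_u)` would give per-shot labels for the unlabeled shots, and `temporal_correct` would then be applied to the labels of each video in shot order. The choice of classifier, agreement rule, and window size are placeholders for whatever the full paper specifies.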


Acknowledgments

The authors sincerely appreciate the useful comments and suggestions from the anonymous reviewers. This work was supported by the National Natural Science Fund of China (Grant Nos. 61272214, 61173104, 61301222), the China Postdoctoral Science Foundation (Grant No. 2013M541821), and the Fundamental Research Funds for the Central Universities (Grant Nos. 2013HGQC0018, 2013HGBH0027, 2013HGBZ0166).

Author information

Corresponding author

Correspondence to Shijie Hao.

About this article

Cite this article

Sun, F., Xu, M., Li, H. et al. Social video annotation by combining features with a tri-adaptation approach. Multimedia Systems 22, 413–422 (2016). https://doi.org/10.1007/s00530-014-0405-x

  • DOI: https://doi.org/10.1007/s00530-014-0405-x
