Social video annotation by combining features with a tri-adaptation approach

Sun, Fuming; Xu, Meixiang; Li, Haojie; Hao, Shijie

doi:10.1007/s00530-014-0405-x

Special Issue Paper
Published: 10 August 2014

Multimedia Systems, volume 22, pages 413–422 (2016)

Abstract

Online social video websites such as YouTube allow users to manually annotate their videos with textual labels. These labels serve as indexing keywords that facilitate the search and organization of video data. However, manual video annotation is usually labor-intensive and time-consuming. In this work, we propose a novel social video annotation approach that combines multiple feature sets through tri-adaptation. The shots in each video are annotated by aggregating models learned from three complementary feature sets. Meanwhile, the models are collaboratively adapted by exploring unlabeled shots. In this sense, the method can be viewed as a novel semi-supervised algorithm that exploits three complementary views. Our approach also exploits the temporal smoothness of video labels by applying a label correction strategy. Experiments on a web video dataset demonstrate the effectiveness of the proposed approach.
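
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the models, the adaptation rule, or the correction strategy. The sketch assumes binary shot labels, a nearest-centroid model per feature view, a tri-training-style rule in which each view is retrained on unlabeled shots that the other two views label confidently and consistently, and a moving-average filter over consecutive shots as the label correction step. All function names and parameters (`tri_adapt`, `annotate`, `smooth_scores`, `margin`, `window`) are hypothetical.

```python
import numpy as np


def fit_centroids(X, y):
    """Nearest-centroid model: one mean feature vector per class (0 and 1)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])


def predict_proba(centroids, X):
    """Soft score for class 1, based on distances to the two class centroids."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # A shot close to centroid 1 (small d[:, 1]) gets a score near 1.
    return d[:, 0] / (d[:, 0] + d[:, 1] + 1e-12)


def tri_adapt(views_l, y, views_u, rounds=3, margin=0.2):
    """Adapt three per-view models using unlabeled shots (assumed rule).

    views_l / views_u: lists of three feature matrices (labeled / unlabeled),
    one per view, with rows aligned across views.
    """
    models = [fit_centroids(X, y) for X in views_l]
    for _ in range(rounds):
        probs = [predict_proba(m, Xu) for m, Xu in zip(models, views_u)]
        new_models = []
        for i in range(3):
            j, k = [v for v in range(3) if v != i]
            pj, pk = probs[j], probs[k]
            # Keep unlabeled shots on which the other two views agree
            # and are both confident (score far enough from 0.5).
            agree = (
                ((pj > 0.5) == (pk > 0.5))
                & (np.abs(pj - 0.5) > margin)
                & (np.abs(pk - 0.5) > margin)
            )
            pseudo = (pj[agree] > 0.5).astype(int)
            X_aug = np.vstack([views_l[i], views_u[i][agree]])
            y_aug = np.concatenate([y, pseudo])
            new_models.append(fit_centroids(X_aug, y_aug))
        models = new_models
    return models


def annotate(models, views):
    """Aggregate the three views by averaging their class-1 scores."""
    return np.mean([predict_proba(m, X) for m, X in zip(models, views)], axis=0)


def smooth_scores(scores, window=3):
    """Label correction stand-in: moving average over consecutive shots."""
    pad = window // 2
    padded = np.pad(np.asarray(scores, dtype=float), pad, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")
```

Under these assumptions, `annotate(tri_adapt(views_l, y, views_u), views_u)` yields one score per unlabeled shot, and `smooth_scores` applied to the scores of a single video in temporal order suppresses isolated label flips between neighboring shots. An odd `window` keeps the output the same length as the input.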

Acknowledgments

The authors sincerely appreciate the useful comments and suggestions from the anonymous reviewers. This work was supported by the National Natural Science Fund of China (Grant Nos. 61272214, 61173104, 61301222), the China Postdoctoral Science Foundation (Grant No. 2013M541821), and the Fundamental Research Funds for the Central Universities (Grant Nos. 2013HGQC0018, 2013HGBH0027, 2013HGBZ0166).

Author information

Authors and Affiliations

Liaoning University of Technology, Jinzhou, 121001, People’s Republic of China
Fuming Sun & Meixiang Xu
Dalian University of Technology, Dalian, 115024, People’s Republic of China
Haojie Li
School of Computer and Information, and Computer Science and Technology Postdoctoral Research Station, Hefei University of Technology, Hefei, 230009, People’s Republic of China
Shijie Hao

Corresponding author

Correspondence to Shijie Hao.

About this article

Cite this article

Sun, F., Xu, M., Li, H. et al. Social video annotation by combining features with a tri-adaptation approach. Multimedia Systems 22, 413–422 (2016). https://doi.org/10.1007/s00530-014-0405-x

Issue Date: July 2016
DOI: https://doi.org/10.1007/s00530-014-0405-x
