Abstract
In this paper, we propose a new method that models temporal context to boost video annotation accuracy. Our motivation comes from the observation that temporally adjacent shots in a video tend to share related content, so annotation performance can be improved by mining the temporal dependency between shots. Based on this observation, we propose a temporal context model that exploits the redundant information shared by neighboring shots. By connecting our model to conditional random fields and borrowing their learning and inference procedures, we obtain a refined probability of each concept occurring in a shot, which fuses the temporal context information with the initial output of the video annotation system. Compared with existing methods that mine temporal context for video annotation, our model captures different kinds of shot dependency more accurately and thus improves annotation performance. Furthermore, the model is simple and efficient, which matters for applications that must process large-scale data. Extensive experiments on the widely used TRECVID datasets demonstrate the effectiveness of our method in improving video annotation accuracy.
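The core idea of refining initial per-shot concept probabilities with a CRF over the temporal shot sequence can be illustrated with a minimal sketch. This is not the authors' actual model: it assumes a two-state (concept absent/present) linear-chain CRF with a single hand-set pairwise weight `w` that rewards temporally adjacent shots sharing the same label, and computes exact posterior marginals with the forward-backward algorithm; the function name and parameters are hypothetical.

```python
import math


def _logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over an iterable."""
    xs = list(xs)
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def refine_shot_scores(p_init, w=2.0):
    """Refine per-shot concept probabilities with a two-state linear-chain
    CRF-style model (illustrative sketch, not the paper's model).

    p_init : list of initial detector probabilities, one per shot.
    w      : pairwise weight rewarding equal labels on adjacent shots.
    Returns the posterior marginal P(concept present) for each shot.
    """
    n = len(p_init)
    eps = 1e-6
    # Unary log-potentials taken directly from the initial detector output.
    unary = [(math.log(max(1.0 - p, eps)), math.log(max(p, eps))) for p in p_init]
    # Pairwise log-potential: bonus w when neighboring shots agree.
    pair = [[w, 0.0], [0.0, w]]

    # Forward pass in log-space.
    alpha = [list(unary[0])]
    for t in range(1, n):
        prev = alpha[-1]
        alpha.append([
            unary[t][s] + _logsumexp(prev[r] + pair[r][s] for r in (0, 1))
            for s in (0, 1)
        ])

    # Backward pass in log-space.
    beta = [[0.0, 0.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [
            _logsumexp(unary[t + 1][s] + pair[r][s] + beta[t + 1][s] for s in (0, 1))
            for r in (0, 1)
        ]

    # Posterior marginal of "concept present" at each shot.
    refined = []
    for t in range(n):
        l0 = alpha[t][0] + beta[t][0]
        l1 = alpha[t][1] + beta[t][1]
        refined.append(1.0 / (1.0 + math.exp(l0 - l1)))
    return refined
```

With this toy model, an isolated high detector score surrounded by low-scoring shots is pulled down, while a run of consistently high scores is reinforced, which is the qualitative behavior the temporal-context refinement aims for.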
Cite this article
Yi, J., Peng, Y. & Xiao, J. A temporal context model for boosting video annotation. Sci. China Inf. Sci. 56, 1–14 (2013). https://doi.org/10.1007/s11432-012-4720-6