Abstract
Retrieving videos by keywords requires semantic knowledge of their content, but manual video annotation is costly and time-consuming. Most work reported in the literature annotates a video shot with either a single semantic concept or a fixed number of words. In this paper, we propose a new approach that automatically annotates a video shot with a variable number of semantic concepts and retrieves videos based on text queries. First, a simple but efficient method automatically extracts a Semantic Candidate Set (SCS) for each video shot from its visual features. Then, the final annotation set is selected from the SCS by Bayesian inference. Finally, we propose a new way to rank the retrieved key frames according to the probabilities obtained during Bayesian inference. Experiments show that our method is useful for automatically annotating video shots and for retrieving videos by keywords.
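The abstract does not specify how the SCS is extracted or which inference model is used, so the following is only a minimal Python sketch of the three-step pipeline under loudly stated assumptions: the SCS is taken to be the union of the concept labels on the k visually nearest annotated training key frames; each candidate concept is then scored independently by Bayes' rule, assuming isotropic Gaussian likelihoods with a shared width sigma; a candidate is kept whenever its posterior reaches a threshold, so the number of labels varies from shot to shot; and key frames are ranked for a one-word text query by that same posterior. All names (`KeyFrame`, `extract_scs`, `posterior`, `annotate`, `rank`), parameters (`k`, `sigma`, `threshold`) and the toy training data are hypothetical and not taken from the paper.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class KeyFrame:
    """A training key frame: a visual feature vector plus its manual concept labels."""
    features: list[float]
    concepts: set[str]


# Toy, made-up training data (hypothetical): two well-separated visual clusters.
TRAINING = [
    KeyFrame([0.0, 0.0], {"sky", "sea"}),
    KeyFrame([0.1, 0.0], {"sky", "sea"}),
    KeyFrame([0.0, 0.1], {"sky", "beach"}),
    KeyFrame([5.0, 5.0], {"car", "road"}),
    KeyFrame([5.1, 5.0], {"car", "road"}),
    KeyFrame([5.0, 5.1], {"road"}),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sigmoid(d):
    """Numerically stable logistic function."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def extract_scs(x, training, k=3):
    """Assumed SCS step: union of the labels on the k visually nearest training key frames."""
    nearest = sorted(training, key=lambda f: sq_dist(x, f.features))[:k]
    scs = set()
    for frame in nearest:
        scs |= frame.concepts
    return scs


def posterior(x, concept, training, sigma=1.0):
    """P(concept present | x) by Bayes' rule, assuming isotropic Gaussian likelihoods
    (shared sigma) centred on the mean features of frames with / without the concept."""
    pos = [f.features for f in training if concept in f.concepts]
    neg = [f.features for f in training if concept not in f.concepts]
    if not pos:
        return 0.0
    if not neg:
        return 1.0
    prior = len(pos) / len(training)
    # log P(x | c) + log P(c); the Gaussian normaliser cancels because sigma is shared.
    log_pos = -sq_dist(x, mean(pos)) / (2 * sigma ** 2) + math.log(prior)
    log_neg = -sq_dist(x, mean(neg)) / (2 * sigma ** 2) + math.log(1 - prior)
    return sigmoid(log_pos - log_neg)


def annotate(x, training, k=3, threshold=0.5):
    """Keep every SCS candidate whose posterior reaches the threshold (a non-fixed count)."""
    scores = {c: posterior(x, c, training) for c in extract_scs(x, training, k)}
    kept = [c for c, p in scores.items() if p >= threshold]
    return sorted(kept, key=lambda c: -scores[c])


def rank(query, frames, training):
    """Rank key frames (id -> features) by the posterior of a one-word text query."""
    scores = {fid: posterior(feat, query, training) for fid, feat in frames.items()}
    return sorted(scores, key=lambda fid: -scores[fid])


if __name__ == "__main__":
    print(annotate([0.05, 0.05], TRAINING))
    print(annotate([5.05, 5.05], TRAINING))
    print(rank("road", {"f1": [0.0, 0.05], "f2": [5.0, 5.0], "f3": [2.6, 2.6]}, TRAINING))
```

Because a threshold rather than a fixed count decides which candidates survive, the toy shot near the origin receives three labels while the one near (5, 5) receives two. A faithful reimplementation would replace the independent per-concept Bayes rule with whatever inference model the full paper specifies.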
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Wang, F., Xu, D., Lu, W., Wu, W. (2006). Automatic Video Annotation and Retrieval Based on Bayesian Inference. In: Cham, TJ., Cai, J., Dorai, C., Rajan, D., Chua, TS., Chia, LT. (eds) Advances in Multimedia Modeling. MMM 2007. Lecture Notes in Computer Science, vol 4351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69423-6_28
DOI: https://doi.org/10.1007/978-3-540-69423-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69421-2
Online ISBN: 978-3-540-69423-6
eBook Packages: Computer Science, Computer Science (R0)