Abstract
In this paper, we propose a novel video content representation framework that achieves a middle-level understanding of video content by using multimodal salient objects. Specifically, the framework includes: (a) a semantic-sensitive approach to video content representation and semantic video concept modeling based on multimodal salient objects and Gaussian mixture models; (b) a machine learning technique for training the automatic detection functions of multimodal salient objects; (c) a novel scheme for more effective classifier training that integrates model selection and parameter estimation seamlessly in a single algorithm. Our experiments on one specific domain, medical education videos, have obtained convincing results.
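As a rough illustration of what model selection and parameter estimation mean for a Gaussian mixture model, the sketch below fits 1-D mixtures with EM for several component counts and keeps the one with the lowest BIC. This is a conventional baseline that runs the two steps separately; it is not the single integrated algorithm the abstract refers to, whose details are not given here. The function names, the synthetic feature values, and the BIC criterion are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def fit_gmm_1d(data, k, iters=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, log_likelihood).
    """
    xs = sorted(data)
    n = len(xs)
    # Deterministic initialization: split the sorted data into k contiguous chunks.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
                 for c, m in zip(chunks, means)]

    def weighted_densities(x):
        # Weighted Gaussian density of every component at x.
        return [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)]

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = weighted_densities(x)
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor,
            )

    log_lik = sum(math.log(sum(weighted_densities(x)) or 1e-300) for x in xs)
    return weights, means, variances, log_lik


def select_gmm_by_bic(data, max_k=3):
    """Fit mixtures with 1..max_k components and return the lowest-BIC fit."""
    n = len(data)
    best = None
    for k in range(1, max_k + 1):
        weights, means, variances, log_lik = fit_gmm_1d(data, k)
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        bic = -2.0 * log_lik + n_params * math.log(n)
        if best is None or bic < best["bic"]:
            best = {"k": k, "bic": bic, "weights": weights,
                    "means": means, "variances": variances}
    return best


def synthetic_feature_values(n_per_cluster=100):
    """Hypothetical 1-D feature values: Gaussian-shaped clusters at 0 and 10."""
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i + 0.5) / n_per_cluster)
                 for i in range(n_per_cluster)]
    return quantiles + [10.0 + q for q in quantiles]


if __name__ == "__main__":
    best = select_gmm_by_bic(synthetic_feature_values())
    print(best["k"], [round(m, 2) for m in best["means"]])
```

On this synthetic input the BIC comparison should favor two components. In a real setting the inputs would be multidimensional features extracted from salient objects, and a full-covariance mixture would replace the 1-D model used here.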
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Luo, H., Fan, J., Gao, Y., Xu, G. (2004). Multimodal Salient Objects: General Building Blocks of Semantic Video Concepts. In: Enser, P., Kompatsiaris, Y., O’Connor, N.E., Smeaton, A.F., Smeulders, A.W.M. (eds) Image and Video Retrieval. CIVR 2004. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27814-6_45
DOI: https://doi.org/10.1007/978-3-540-27814-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22539-3
Online ISBN: 978-3-540-27814-6
eBook Packages: Springer Book Archive