Multimodal Salient Objects: General Building Blocks of Semantic Video Concepts

Luo, Hangzai; Fan, Jianping; Gao, Yuli; Xu, Guangyou

doi:10.1007/978-3-540-27814-6_45

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3115)

Included in the following conference series:

International Conference on Image and Video Retrieval

Abstract

In this paper, we propose a novel video content representation framework that uses multimodal salient objects to achieve a middle-level understanding of video content. Specifically, the framework includes: (a) a semantic-sensitive video content representation and semantic video concept modeling framework based on multimodal salient objects and Gaussian mixture models; (b) a machine learning technique for training the automatic detection functions of multimodal salient objects; and (c) a novel framework that enables more effective classifier training by integrating model selection and parameter estimation seamlessly in a single algorithm. Our experiments on a specific domain of medical education videos have obtained very convincing results.
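
The abstract names Gaussian mixture models and the coupling of model selection with parameter estimation, but gives no algorithmic detail. As a rough illustration only, and not the authors' integrated algorithm, the sketch below fits 1-D Gaussian mixtures with plain EM and chooses the number of components by BIC in an outer loop. This is the conventional two-stage baseline that an integrated method would aim to replace. All function names, the quantile initialisation, the variance floor, and the synthetic data are assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under the mixture."""
    total = 0.0
    for x in data:
        p = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def fit_gmm_em(data, k, n_iter=100, var_floor=1e-2):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances). Initialisation and the
    variance floor are illustrative choices, not taken from the paper.
    """
    xs = sorted(data)
    n = len(xs)
    # Deterministic initialisation: means at evenly spaced quantiles.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, var_floor)
    variances = [grand_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = max(sum(joint), 1e-300)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)
    return weights, means, variances


def select_gmm_by_bic(data, max_k=4):
    """Fit mixtures for k = 1..max_k and keep the one with the lowest BIC."""
    best = None
    for k in range(1, max_k + 1):
        weights, means, variances = fit_gmm_em(data, k)
        ll = log_likelihood(data, weights, means, variances)
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        bic = -2.0 * ll + n_params * math.log(len(data))
        if best is None or bic < best["bic"]:
            best = {"k": k, "bic": bic, "weights": weights,
                    "means": means, "variances": variances}
    return best


if __name__ == "__main__":
    # Synthetic stand-in for a 1-D feature of a salient object class.
    rng = random.Random(0)
    data = ([rng.gauss(0.0, 1.0) for _ in range(100)]
            + [rng.gauss(10.0, 1.0) for _ in range(100)])
    best = select_gmm_by_bic(data)
    print("selected components:", best["k"])
    print("means:", [round(m, 2) for m in best["means"]])
```

The outer loop refits the mixture from scratch for every candidate number of components, so the cost grows with `max_k`. Folding the choice of component count into the estimation itself, as item (c) of the abstract describes, would avoid that repeated work.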

Author information

Authors and Affiliations

Department of Computer Science, University of North Carolina, Charlotte, NC, 28223, USA
Hangzai Luo, Jianping Fan & Yuli Gao
Department of Computer Science, Tsinghua University, Beijing, China
Guangyou Xu

Editor information

Editors and Affiliations

School of Computing, Mathematical and Information Sciences, University of Brighton, UK
Peter Enser
Informatics and Telematics Institute, Centre for Research and Technology-Hellas, 57001, Thessaloniki, Greece
Yiannis Kompatsiaris
Centre for Digital Video Processing, Adaptive Information Cluster, Dublin City University, Ireland
Noel E. O’Connor
Dublin City University, Dublin, Ireland
Alan F. Smeaton
ISLA lab, Informatics Institute, University of Amsterdam, The Netherlands
Arnold W. M. Smeulders

About this paper

Cite this paper

Luo, H., Fan, J., Gao, Y., Xu, G. (2004). Multimodal Salient Objects: General Building Blocks of Semantic Video Concepts. In: Enser, P., Kompatsiaris, Y., O’Connor, N.E., Smeaton, A.F., Smeulders, A.W.M. (eds) Image and Video Retrieval. CIVR 2004. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27814-6_45

DOI: https://doi.org/10.1007/978-3-540-27814-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22539-3
Online ISBN: 978-3-540-27814-6
eBook Packages: Springer Book Archive
