Multimodal Salient Objects: General Building Blocks of Semantic Video Concepts

  • Conference paper
Image and Video Retrieval (CIVR 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3115)

Abstract

In this paper, we propose a novel video content representation framework that achieves a middle-level understanding of video content by using multimodal salient objects. Specifically, the framework includes: (a) a semantic-sensitive video content representation and semantic video concept modeling framework based on multimodal salient objects and Gaussian mixture models; (b) a machine learning technique for training the automatic detection functions of multimodal salient objects; and (c) a novel framework that enables more effective classifier training by integrating model selection and parameter estimation seamlessly in a single algorithm. Our experiments on a specific domain of medical education videos have produced convincing results.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luo, H., Fan, J., Gao, Y., Xu, G. (2004). Multimodal Salient Objects: General Building Blocks of Semantic Video Concepts. In: Enser, P., Kompatsiaris, Y., O’Connor, N.E., Smeaton, A.F., Smeulders, A.W.M. (eds) Image and Video Retrieval. CIVR 2004. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27814-6_45

  • DOI: https://doi.org/10.1007/978-3-540-27814-6_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22539-3

  • Online ISBN: 978-3-540-27814-6

  • eBook Packages: Springer Book Archive
