Semantic Video Indexing

Chapter in Multimedia Retrieval

Abstract

Query-by-keyword is the paradigm on which machine-based text search is still based. Building on the success of text-based search engines, query-by-keyword is also gaining momentum in multimedia retrieval. For multimedia archives, however, access based on text alone is hard to achieve. Multimodal indexing is essential for effective access to video archives. For the automatic detection of specific concepts, the state of the art has produced sophisticated and specialized indexing methods. Unlike their textual counterparts, generic methods for semantic indexing in multimedia are not generally available, do not scale in their computational needs, and are not robust in their performance. As a consequence, semantic access to multimedia archives is still limited. There is therefore a case to be made for a new approach to semantic video indexing.

© 2006 IEEE. Reprinted, with permission, from IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1678–1689, October 2006 [38].

References

  1. W.H. Adams, G. Iyengar, C.-Y. Lin, M.R. Naphade, C. Neti, H.J. Nock, and J.R. Smith. Semantic indexing of multimedia content using visual, audio, and text cues. EURASIP Journal on Applied Signal Processing, (2):170–185, 2003.

  2. A.A. Alatan, A.N. Akansu, and W. Wolf. Multimodal dialogue scene detection using hidden Markov models for content-based multimedia indexing. Multimedia Tools Applicat., 14(2):137–151, 2001.

  3. A. Amir, M. Berg, S.-F. Chang, W. Hsu, G. Iyengar, C.-Y. Lin, M.R. Naphade, A.P. Natsev, C. Neti, H.J. Nock, J.R. Smith, B.L. Tseng, Y. Wu, and D. Zhang. IBM research TRECVID-2003 video retrieval system. In Proc. TRECVID Workshop, NIST Special Publication, Gaithersburg, USA, 2003.

  4. J. Baan, A. van Ballegooij, J.-M. Geusebroek, D. Hiemstra, J. den Hartog, J. List, C. Snoek, I. Patras, S. Raaijmakers, L. Todoran, J. Vendrig, A. de Vries, T. Westerveld, and M. Worring. Lazy users and automatic video retrieval tools in (the) lowlands. In E.M. Voorhees and D.K. Harman, editors, Proc. 10th Text REtrieval Conference, volume 500-250 of NIST Special Publication, Gaithersburg, USA, 2001.

  5. N. Babaguchi, Y. Kawai, and T. Kitahashi. Event based indexing of broadcasted sports video by intermodal collaboration. IEEE Trans. Multimedia, 4(1):68–75, 2002.

  6. H.E. Bal et al. The distributed ASCI supercomputer project. Operating Syst. Review, 34(4):76–96, 2000.

  7. J.M. Boggs and D.W. Petrie. The Art of Watching Films. Mayfield Publishing Company, Mountain View, USA, 5th edition, 2000.

  8. R.M. Bolle, B.-L. Yeo, and M.M. Yeung. Video query: Research directions. IBM Journal of Research and Development, 42(2):233–252, 1998.

  9. D. Bordwell and K. Thompson. Film Art: An Introduction. McGraw-Hill, New York, USA, 5th edition, 1997.

  10. R. Brunelli, O. Mich, and C.M. Modena. A survey on the automatic indexing of video data. J. Visual Commun. Image Representation, 10(2):78–112, 1999.

  11. C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.

  12. C.-C. Chang and C.-J. Lin. LIBSVM: a library for Support Vector Machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

  13. J. Fan, A.K. Elmagarmid, X. Zhu, W.G. Aref, and L. Wu. ClassView: hierarchical video shot classification, indexing, and accessing. IEEE Trans. Multimedia, 6(1):70–86, 2004.

  14. J.L. Gauvain, L. Lamel, and G. Adda. The LIMSI broadcast news transcription system. Speech Commun., 37(1–2):89–108, 2002.

  15. J.M. Geusebroek, R. van den Boomgaard, A.W.M. Smeulders, and H. Geerts. Color invariance. IEEE Trans. Pattern Anal. Machine Intell., 23(12):1338–1350, 2001.

  16. N. Haering, R. Qian, and I. Sezan. A semantic event-detection approach and its application to detecting hunts in wildlife video. IEEE Trans. Circuits Syst. Video Technol., 10(6):857–868, 2000.

  17. A.G. Hauptmann. Towards a large scale concept ontology for broadcast video. In CIVR, volume 3115 of LNCS, pages 674–675. Springer-Verlag, 2004.

  18. A.G. Hauptmann, R.V. Baron, M.-Y. Chen, M. Christel, P. Duygulu, C. Huang, R. Jin, W.-H. Lin, T. Ng, N. Moraveji, N. Papernick, C.G.M. Snoek, G. Tzanetakis, J. Yang, R. Yang, and H.D. Wactlar. Informedia at TRECVID 2003: Analyzing and searching broadcast news video. In Proc. TRECVID Workshop, NIST Special Publication, Gaithersburg, USA, 2003.

  19. A.K. Jain, R.P.W. Duin, and J. Mao. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Machine Intell., 22(1):4–37, 2000.

  20. C.-Y. Lin, B.L. Tseng, and J.R. Smith. Video collaborative annotation forum: Establishing ground-truth labels on large multimedia datasets. In Proc. TRECVID Workshop, NIST Special Publication, Gaithersburg, USA, 2003.

  21. C.D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, USA, 1999.

  22. M.R. Naphade. On supervision and statistical learning for semantic multimedia analysis. J. Visual Commun. Image Representation, 15(3):348–369, 2004.

  23. M.R. Naphade and T.S. Huang. Extracting semantics from audiovisual content: The final frontier in multimedia retrieval. IEEE Trans. Neural Networks, 13(4):793–810, 2002.

  24. NIST. TREC Video Retrieval Evaluation. http://www-nlpir.nist.gov/projects/trecvid/.

  25. J.C. Platt. Probabilities for SV machines. In A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 61–74. MIT Press, 2000.

  26. G.M. Quénot, D. Moraru, L. Besacier, and P. Mulhem. CLIPS at TREC-11: Experiments in video retrieval. In E.M. Voorhees and L.P. Buckland, editors, Proc. 11th Text REtrieval Conference, volume 500-251 of NIST Special Publication, Gaithersburg, USA, 2002.

  27. T. Sato, T. Kanade, E.K. Hughes, M.A. Smith, and S. Satoh. Video OCR: Indexing digital news libraries by recognition of superimposed caption. Multimedia Syst., 7(5):385–395, 1999.

  28. H. Schneiderman and T. Kanade. Object detection using the statistics of parts. Int’l J. Comput. Vision, 56(3):151–177, 2004.

  29. F.J. Seinstra, C.G.M. Snoek, D. Koelma, J.M. Geusebroek, and M. Worring. User transparent parallel processing of the 2004 NIST TRECVID data set. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05), pages 90–98, Denver, USA, 2005.

  30. A.F. Smeaton, W. Kraaij, and P. Over. The TREC VIDeo retrieval evaluation (TRECVID): A case study and status report. In Proc. RIAO 2004, Avignon, France, 2004.

  31. A.F. Smeaton, P. Over, and W. Kraaij. TRECVID: Evaluating the effectiveness of information retrieval tasks on digital video. In Proceedings of the ACM MM’04 (Multimedia), pages 652–655, New York, USA, 2004.

  32. A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Machine Intell., 22(12):1349–1380, 2000.

  33. J.R. Smith and S.-F. Chang. Visually searching the Web for content. IEEE Multimedia, 4(3):12–20, 1997.

  34. C.G.M. Snoek. The Authoring Metaphor to Machine Understanding of Multimedia. PhD thesis, University of Amsterdam, 2005.

  35. C.G.M. Snoek and M. Worring. Multimedia event-based video indexing using time intervals. IEEE Trans. Multimedia, 7(4):638–647, 2005.

  36. C.G.M. Snoek and M. Worring. Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools Applicat., 25(1):5–35, 2005.

  37. C.G.M. Snoek, M. Worring, J. van Gemert, J.M. Geusebroek, D. Koelma, G.P. Nguyen, O. de Rooij, and F. Seinstra. MediaMill: Exploring news video archives based on learned semantics. In Proceedings of the ACM International Conference on Multimedia, pages 225–226, Singapore, November 2005.

  38. C.G.M. Snoek, M. Worring, J.M. Geusebroek, D.C. Koelma, F.J. Seinstra, and A.W.M. Smeulders. The semantic pathfinder: Using an authoring metaphor for generic multimedia indexing. IEEE Trans. Pattern Anal. Machine Intell., 28(10):1678–1689, 2006.

  39. C.G.M. Snoek, M. Worring, and A.G. Hauptmann. Learning rich semantics from news video archives by style analysis. ACM Trans. Multimedia Computing, Comm. Applications, 2(2):91–108, 2006.

  40. V.N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, USA, 2nd edition, 2000.

  41. H.D. Wactlar, M.G. Christel, Y. Gong, and A.G. Hauptmann. Lessons learned from building a terabyte digital video library. IEEE Computer, 32(2):66–73, 1999.

  42. Y. Wang, Z. Liu, and J. Huang. Multimedia content analysis using both audio and visual clues. IEEE Signal Processing Magazine, 17(6):12–36, 2000.

  43. H.-J. Zhang, S.Y. Tan, S.W. Smoliar, and Y. Gong. Automatic parsing and indexing of news video. Multimedia Syst., 2(6):256–266, 1995.

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Snoek, C.G.M., Worring, M., Geusebroek, JM., Koelma, D.C., Seinstra, F.J., Smeulders, A.W.M. (2007). Semantic Video Indexing. In: Blanken, H.M., Blok, H.E., Feng, L., de Vries, A.P. (eds) Multimedia Retrieval. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72895-5_8

  • DOI: https://doi.org/10.1007/978-3-540-72895-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72894-8

  • Online ISBN: 978-3-540-72895-5

  • eBook Packages: Computer Science, Computer Science (R0)
