Semantic Video Indexing

Snoek, Cees G. M.; Worring, Marcel; Geusebroek, Jan-Mark; Koelma, Dennis C.; Seinstra, Frank J.; Smeulders, Arnold W. M.

doi:10.1007/978-3-540-72895-5_8

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

Query-by-keyword is still the paradigm underlying machine-based text search. Building on the success of text-based search engines, query-by-keyword is also gaining momentum in multimedia retrieval. For multimedia archives, however, access is hard to achieve when it is based on text alone; multimodal indexing is essential for effective access to video archives. For the automatic detection of specific concepts, the state of the art has produced sophisticated and specialized indexing methods. Unlike their textual counterparts, generic methods for semantic indexing in multimedia are neither generally available, nor scalable in their computational needs, nor robust in their performance. As a consequence, semantic access to multimedia archives is still limited, and there is a case to be made for a new approach to semantic video indexing.

Author information

Authors and Affiliations

Informatics Institute, University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, The Netherlands
Cees G. M. Snoek, Marcel Worring, Jan-Mark Geusebroek, Dennis C. Koelma, Frank J. Seinstra & Arnold W. M. Smeulders

Editor information

Editors and Affiliations

Database group Faculty of EWI, University of Twente, P.O. Box 217, 7500 AE, Enschede, The Netherlands
Henk M. Blanken, Henk Ernst Blok & Ling Feng
Centrum voor Wiskunde en Informatica, Kruislaan 413, 1098 SJ, Amsterdam, The Netherlands
Arjen P. de Vries

About this chapter

Cite this chapter

Snoek, C.G.M., Worring, M., Geusebroek, JM., Koelma, D.C., Seinstra, F.J., Smeulders, A.W.M. (2007). Semantic Video Indexing. In: Blanken, H.M., Blok, H.E., Feng, L., de Vries, A.P. (eds) Multimedia Retrieval. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72895-5_8

DOI: https://doi.org/10.1007/978-3-540-72895-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72894-8
Online ISBN: 978-3-540-72895-5
eBook Packages: Computer Science, Computer Science (R0)
