Abstract
Query-by-keyword remains the paradigm on which machine-based text search is built. Building on the success of text-based search engines, query-by-keyword is also gaining momentum in multimedia retrieval. Access to multimedia archives is hard to achieve, however, when it is based on text alone. Multimodal indexing is essential for effective access to video archives. For the automatic detection of specific concepts, the state of the art has produced sophisticated and specialized indexing methods. Unlike their textual counterparts, generic methods for semantic indexing in multimedia are neither generally available, nor scalable in their computational needs, nor robust in their performance. As a consequence, semantic access to multimedia archives is still limited. Therefore, there is a case to be made for a new approach to semantic video indexing.
© 2006 IEEE. Reprinted, with permission, from IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1678–1689, October 2006 [38].
References
W. H. Adams, G. Iyengar, C.-Y. Lin, M.R. Naphade, C. Neti, H.J. Nock, and J.R. Smith. Semantic indexing of multimedia content using visual, audio, and text cues. EURASIP Journal on Applied Signal Processing, (2):170–185, 2003.
A.A. Alatan, A.N. Akansu, and W. Wolf. Multimodal dialogue scene detection using hidden Markov models for content-based multimedia indexing. Multimedia Tools Applicat., 14(2):137–151, 2001.
A. Amir, M. Berg, S.-F. Chang, W. Hsu, G. Iyengar, C.-Y. Lin, M.R. Naphade, A.P. Natsev, C. Neti, H.J. Nock, J.R. Smith, B.L. Tseng, Y. Wu, and D. Zhang. IBM research TRECVID-2003 video retrieval system. In Proc. TRECVID Workshop, NIST Special Publication, Gaithersburg, USA, 2003.
J. Baan, A. van Ballegooij, J.-M. Geusebroek, D. Hiemstra, J. den Hartog, J. List, C. Snoek, I. Patras, S. Raaijmakers, L. Todoran, J. Vendrig, A. de Vries, T. Westerveld, and M. Worring. Lazy users and automatic video retrieval tools in (the) lowlands. In E.M. Voorhees and D.K. Harman, editors, Proc. 10th Text REtrieval Conference, volume 500-250 of NIST Special Publication, Gaithersburg, USA, 2001.
N. Babaguchi, Y. Kawai, and T. Kitahashi. Event based indexing of broadcasted sports video by intermodal collaboration. IEEE Trans. Multimedia, 4(1):68–75, 2002.
H.E. Bal et al. The distributed ASCI supercomputer project. Operating Syst. Review, 34(4):76–96, 2000.
J.M. Boggs and D.W. Petrie. The Art of Watching Films. Mayfield Publishing Company, Mountain View, USA, 5th edition, 2000.
R.M. Bolle, B.-L. Yeo, and M.M. Yeung. Video query: Research directions. IBM Journal of Research and Development, 42(2):233–252, 1998.
D. Bordwell and K. Thompson. Film Art: An Introduction. McGraw-Hill, New York, USA, 5th edition, 1997.
R. Brunelli, O. Mich, and C.M. Modena. A survey on the automatic indexing of video data. J. Visual Commun. Image Representation, 10(2):78–112, 1999.
C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
C.-C. Chang and C.-J. Lin. LIBSVM: a library for Support Vector Machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
J. Fan, A.K. Elmagarmid, X. Zhu, W.G. Aref, and L. Wu. ClassView: hierarchical video shot classification, indexing, and accessing. IEEE Trans. Multimedia, 6(1):70–86, 2004.
J.L. Gauvain, L. Lamel, and G. Adda. The LIMSI broadcast news transcription system. Speech Commun., 37(1–2):89–108, 2002.
J.M. Geusebroek, R. van den Boomgaard, A.W.M. Smeulders, and H. Geerts. Color invariance. IEEE Trans. Pattern Anal. Machine Intell., 23(12):1338–1350, 2001.
N. Haering, R. Qian, and I. Sezan. A semantic event-detection approach and its application to detecting hunts in wildlife video. IEEE Trans. Circuits Syst. Video Technol., 10(6):857–868, 2000.
A.G. Hauptmann. Towards a large scale concept ontology for broadcast video. In CIVR, volume 3115 of LNCS, pages 674–675. Springer-Verlag, 2004.
A.G. Hauptmann, R.V. Baron, M.-Y. Chen, M. Christel, P. Duygulu, C. Huang, R. Jin, W.-H. Lin, T. Ng, N. Moraveji, N. Papernick, C.G.M. Snoek, G. Tzanetakis, J. Yang, R. Yang, and H.D. Wactlar. Informedia at TRECVID 2003: Analyzing and searching broadcast news video. In Proc. TRECVID Workshop, NIST Special Publication, Gaithersburg, USA, 2003.
A.K. Jain, R.P.W. Duin, and J. Mao. Statistical pattern recognition: A review. IEEE Trans. Pattern Anal. Machine Intell., 22(1):4–37, 2000.
C.-Y. Lin, B.L. Tseng, and J.R. Smith. Video collaborative annotation forum: Establishing ground-truth labels on large multimedia datasets. In Proc. TRECVID Workshop, NIST Special Publication, Gaithersburg, USA, 2003.
C.D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, USA, 1999.
M.R. Naphade. On supervision and statistical learning for semantic multimedia analysis. J. Visual Commun. Image Representation, 15(3):348–369, 2004.
M.R. Naphade and T.S. Huang. Extracting semantics from audiovisual content: The final frontier in multimedia retrieval. IEEE Trans. Neural Networks, 13(4):793–810, 2002.
NIST. TREC Video Retrieval Evaluation. http://www-nlpir.nist.gov/projects/trecvid/.
J.C. Platt. Probabilities for SV machines. In A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 61–74. MIT Press, 2000.
G.M. Quénot, D. Moraru, L. Besacier, and P. Mulhem. CLIPS at TREC-11: Experiments in video retrieval. In E.M. Voorhees and L.P. Buckland, editors, Proc. 11th Text REtrieval Conference, volume 500-251 of NIST Special Publication, Gaithersburg, USA, 2002.
T. Sato, T. Kanade, E.K. Hughes, M.A. Smith, and S. Satoh. Video OCR: Indexing digital news libraries by recognition of superimposed caption. Multimedia Syst., 7(5):385–395, 1999.
H. Schneiderman and T. Kanade. Object detection using the statistics of parts. Int’l J. Comput. Vision, 56(3):151–177, 2004.
F.J. Seinstra, C.G.M. Snoek, D. Koelma, J.M. Geusebroek, and M. Worring. User transparent parallel processing of the 2004 NIST TRECVID data set. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05), pages 90–98, Denver, USA, 2005.
A.F. Smeaton, W. Kraaij, and P. Over. The TREC VIDeo retrieval evaluation (TRECVID): A case study and status report. In Proc. RIAO 2004, Avignon, France, 2004.
A.F. Smeaton, P. Over, and W. Kraaij. TRECVID: Evaluating the effectiveness of information retrieval tasks on digital video. In Proceedings of the ACM MM’04 (Multimedia), pages 652–655, New York, USA, 2004.
A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Machine Intell., 22(12):1349–1380, 2000.
J.R. Smith and S.-F. Chang. Visually searching the Web for content. IEEE Multimedia, 4(3):12–20, 1997.
C.G.M. Snoek. The Authoring Metaphor to Machine Understanding of Multimedia. PhD thesis, University of Amsterdam, 2005.
C.G.M. Snoek and M. Worring. Multimedia event-based video indexing using time intervals. IEEE Trans. Multimedia, 7(4):638–647, 2005.
C.G.M. Snoek and M. Worring. Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools Applicat., 25(1):5–35, 2005.
C.G.M. Snoek, M. Worring, J. van Gemert, J.M. Geusebroek, D. Koelma, G.P. Nguyen, O. de Rooij, and F. Seinstra. MediaMill: Exploring news video archives based on learned semantics. In Proceedings of the ACM International Conference on Multimedia, pages 225–226, Singapore, November 2005.
C.G.M. Snoek, M. Worring, J.M. Geusebroek, D.C. Koelma, F.J. Seinstra, and A.W.M. Smeulders. The semantic pathfinder: Using an authoring metaphor for generic multimedia indexing. IEEE Trans. Pattern Anal. Machine Intell., 28(10):1678–1689, 2006.
C.G.M. Snoek, M. Worring, and A.G. Hauptmann. Learning rich semantics from news video archives by style analysis. ACM Trans. Multimedia Computing, Comm. Applications, 2(2):91–108, 2006.
V.N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, USA, 2nd edition, 2000.
H.D. Wactlar, M.G. Christel, Y. Gong, and A.G. Hauptmann. Lessons learned from building a terabyte digital video library. IEEE Computer, 32(2):66–73, 1999.
Y. Wang, Z. Liu, and J. Huang. Multimedia content analysis using both audio and visual clues. IEEE Signal Processing Magazine, 17(6):12–36, 2000.
H.-J. Zhang, S.Y. Tan, S.W. Smoliar, and Y. Gong. Automatic parsing and indexing of news video. Multimedia Syst., 2(6):256–266, 1995.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Snoek, C.G.M., Worring, M., Geusebroek, JM., Koelma, D.C., Seinstra, F.J., Smeulders, A.W.M. (2007). Semantic Video Indexing. In: Blanken, H.M., Blok, H.E., Feng, L., de Vries, A.P. (eds) Multimedia Retrieval. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72895-5_8
DOI: https://doi.org/10.1007/978-3-540-72895-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72894-8
Online ISBN: 978-3-540-72895-5
eBook Packages: Computer Science, Computer Science (R0)