Definition
High-level semantic information that is otherwise very difficult to derive from audiovisual content can be extracted automatically by combining audiovisual signal processing with screenplay processing and analysis.
Multimedia content analysis of video data has so far relied mostly on the information contained in the raw visual, audio, and text signals. This approach usually ignores the fact that film production starts with the original screenplay. Using screenplay information, however, is like using the recipe book for the movie. We demonstrated that high-level semantic information that is otherwise very difficult to derive from the audiovisual content can be extracted automatically by combining audiovisual signal processing with screenplay processing and analysis.
Here we present the use of the screenplay as a source of ground truth for automatic speaker/character identification. Our speaker identification method consists of screenplay parsing, extraction...
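As an illustration of the screenplay parsing step only, the following is a minimal sketch and not the authors' parser. It assumes a conventional plain-text screenplay layout in which scene headings begin with INT. or EXT., character cues are short lines in capitals, and dialogue follows the cue until a blank line. The names `parse_dialogue`, `CUE`, `SLUGLINE`, and `SAMPLE` are hypothetical, and the sample text is invented for the example.

```python
import re
from typing import List, Tuple

# Assumption: a character cue is a line in capitals, optionally followed
# by a parenthetical such as (V.O.) or (O.S.).
CUE = re.compile(r"^[A-Z][A-Z0-9 .'\-]*(?:\s*\([^)]*\))?$")
# Assumption: scene headings (sluglines) start with INT or EXT.
SLUGLINE = re.compile(r"^(?:INT|EXT)[./ ]")

# Invented sample text, used only to exercise the sketch.
SAMPLE = """\
INT. KITCHEN - DAY

Anna pours coffee.

ANNA
Did you sleep at all?

BEN (O.S.)
(groggy)
Not really.
I kept hearing the phone.
"""


def parse_dialogue(screenplay: str) -> List[Tuple[str, str]]:
    """Return (character, dialogue) pairs in script order."""
    pairs: List[Tuple[str, str]] = []
    speaker = None
    spoken: List[str] = []

    def flush() -> None:
        # Store the speech collected so far, then reset the state.
        nonlocal speaker, spoken
        if speaker and spoken:
            pairs.append((speaker, " ".join(spoken)))
        speaker, spoken = None, []

    for raw in screenplay.splitlines():
        line = raw.strip()
        if not line or SLUGLINE.match(line):
            flush()  # a blank line or a scene heading ends a speech
        elif speaker is None and CUE.match(line):
            # Drop a trailing parenthetical such as (O.S.) from the cue.
            speaker = re.sub(r"\s*\([^)]*\)$", "", line)
        elif speaker is not None:
            # Skip acting directions such as (groggy); keep spoken lines.
            if not (line.startswith("(") and line.endswith(")")):
                spoken.append(line)
        # Any other line is treated as action text and ignored.
    flush()
    return pairs


if __name__ == "__main__":
    for character, speech in parse_dialogue(SAMPLE):
        print(f"{character}: {speech}")
```

The resulting (character, dialogue) pairs are the kind of text-side labels that could later be matched against the audio track. A known limitation of this sketch is that an action line written entirely in capitals would be mistaken for a character cue.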
References
J. Foote, “Methods for the Automatic Analysis of Music and Audio,” TR FXPAL-TR-99–038, 1999.
S. Frank, “Minority Report,” Early and revised Drafts, available from Drew's Script-o-rama, http://www.script-o-rama.com.
N. Patel and I. Sethi, “Video Classification Using Speaker Identification,” IS&T/SPIE Proceedings of Storage and Retrieval for Image and Video Databases V, January 1997, San Jose, California.
R. Ronfard and T.T. Thuong, “A Framework for Aligning and Indexing Movies with their Script,” Proceedings of ICME 2003, Baltimore, MD, July 2003.
A. Salway and E. Tomadaki, “Temporal information in collateral texts for indexing moving images,” LREC Workshop on Annotation Standards for Temporal Information in Natural Language, 2002.
R. Turetsky and D.P.W. Ellis, “Ground-Truth Transcriptions of Real Music from Force-Aligned MIDI Syntheses,” ISMIR 2003.
R. Turetsky and N. Dimitrova, “Screenplay Alignment for Closed-System Speaker Identification and Analysis of Feature Films,” ICME 2004, Taipei, Taiwan.
J. Wachman and R.W. Picard, “Tools for browsing a TV situation comedy based on content specific attributes,” Multimedia Tools and Applications, Vol. 13, No. 3, 2001, pp. 255–284.
Copyright information
© 2008 Springer-Verlag
About this entry
Cite this entry
Dimitrova, N., Turetsky, R. (2008). Multiple Source Alignment for Video Analysis. In: Furht, B. (eds) Encyclopedia of Multimedia. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-78414-4_162
DOI: https://doi.org/10.1007/978-0-387-78414-4_162
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-74724-8
Online ISBN: 978-0-387-78414-4
eBook Packages: Computer Science, Reference Module Computer Science and Engineering