Abstract
Definition: High-level semantic information, which is otherwise very difficult to derive from the audiovisual content, can be extracted automatically using both audiovisual signal processing and screenplay processing and analysis.
Multimedia content analysis of video data has so far relied mostly on the information contained in the raw visual, audio, and text signals. This process usually ignores the fact that film production starts from the original screenplay; yet using the screenplay is like consulting the recipe book for the movie. We demonstrated that high-level semantic information, otherwise very difficult to derive from the audiovisual content, can be extracted automatically by combining audiovisual signal processing with screenplay processing and analysis.
Here we present the use of the screenplay as a source of ground truth for automatic speaker/character identification. Our speaker identification method consists of screenplay parsing, extraction of a time-stamped transcript, alignment of the screenplay with that transcript, audio segmentation, and audio speaker identification. Because screenplay alignment cannot identify every dialogue section in a film, we use the segments it does find as labeled training data for a statistical model that identifies the remaining, unaligned dialogue. Character names from the screenplay are mapped to actor names using fields extracted from imdb.com. We find that, on average, screenplay alignment alone correctly identifies the speaker in one third of the lines of dialogue; with the additional automatic statistical labeling for audio speaker identification on the soundtrack, the recognition rate improves significantly.
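The first two stages of the pipeline, screenplay parsing and alignment with a time-stamped transcript, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the screenplay excerpt, transcript tuples, and the 0.8 similarity threshold are hypothetical, and the greedy string-similarity matching stands in for the alignment procedure described in the entry.

```python
from difflib import SequenceMatcher

# Hypothetical screenplay excerpt in standard layout: character cues are
# deeply indented, upper-case lines; dialogue follows beneath them.
SCREENPLAY = """\
                    ANDERTON
          Everybody runs.

                    WITWER
          You don't have to do this.
"""

def parse_screenplay(text):
    """Extract (character, dialogue) pairs from standard screenplay layout."""
    pairs = []
    character = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            character = None          # blank line ends the speech block
            continue
        # A character cue is an all-upper-case line with deep indentation.
        if stripped.isupper() and len(line) - len(line.lstrip()) >= 15:
            character = stripped
        elif character:
            pairs.append((character, stripped))
    return pairs

def align(dialogue, transcript):
    """Greedily label each time-stamped transcript segment with the character
    of the most similar screenplay line, keeping only confident matches."""
    labeled = []
    for start, end, words in transcript:
        best = max(dialogue,
                   key=lambda d: SequenceMatcher(None, d[1].lower(),
                                                 words.lower()).ratio())
        score = SequenceMatcher(None, best[1].lower(), words.lower()).ratio()
        if score > 0.8:               # assumed confidence threshold
            labeled.append((start, end, best[0]))
    return labeled

# Hypothetical time-stamped transcript, e.g. extracted from closed captions.
transcript = [(12.0, 13.5, "everybody runs"),
              (14.0, 16.0, "you don't have to do this")]

dialogue = parse_screenplay(SCREENPLAY)
print(align(dialogue, transcript))
```

Segments the matcher leaves unlabeled correspond to the unaligned dialogue that, in the method above, is handled by the statistically trained audio speaker-identification model.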
References
J. Foote, "Methods for the Automatic Analysis of Music and Audio," FXPAL Technical Report FXPAL-TR-99-038, 1999.
S. Frank, "Minority Report," early and revised drafts, available from Drew's Script-o-rama, http://www.script-o-rama.com.
N. Patel and I. Sethi, "Video Classification Using Speaker Identification," IS&T/SPIE Proceedings of Storage and Retrieval for Image and Video Databases V, San Jose, California, January 1997.
R. Ronfard and T.T. Thuong, "A Framework for Aligning and Indexing Movies with their Script," Proceedings of ICME 2003, Baltimore, MD, July 2003.
A. Salway and E. Tomadaki, "Temporal Information in Collateral Texts for Indexing Moving Images," LREC Workshop on Annotation Standards for Temporal Information in Natural Language, 2002.
R. Turetsky and D. P. W. Ellis, "Ground-Truth Transcriptions of Real Music from Force-Aligned MIDI Syntheses," ISMIR 2003.
R. Turetsky and N. Dimitrova, "Screenplay Alignment for Closed-System Speaker Identification and Analysis of Feature Films," ICME 2004, Taipei, Taiwan.
J. Wachman and R. W. Picard, "Tools for Browsing a TV Situation Comedy Based on Content Specific Attributes," Multimedia Tools and Applications, Vol. 13, No. 3, 2001, pp. 255–284.
Copyright information
© 2006 Springer Science+Business Media, Inc.
Cite this entry
Dimitrova, N., Turetsky, R. (2006). Multiple Source Alignment for Video Analysis. In: Furht, B. (eds) Encyclopedia of Multimedia. Springer, Boston, MA. https://doi.org/10.1007/0-387-30038-4_174
Print ISBN: 978-0-387-24395-5
Online ISBN: 978-0-387-30038-2