Abstract
The location of video scenes is an important semantic descriptor, especially for broadcast news video. In this paper, we propose a learning-based approach to annotating shots of news video with locations extracted from the video transcript, using features from multiple video modalities, including the syntactic structure of transcript sentences, speaker identity, and temporal video structure. Machine learning algorithms are adopted to combine these multi-modal features to solve two sub-problems: (1) whether the location of a video shot is mentioned in the transcript, and if so, (2) which of the many locations in the transcript are the correct one(s) for this shot. Experiments on the TRECVID dataset demonstrate that our approach achieves approximately 85% accuracy in correctly labeling the location of any shot in news video.
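The two sub-problems can be read as a two-stage cascade. The Python sketch below illustrates only that control flow and is not the authors' implementation: the feature names (`anchor_shot`, `num_candidates`, `same_sentence_as_shot`, `time_gap_seconds`), the hand-set linear weights standing in for learned classifiers, and the helper names `linear_score` and `annotate_shot` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class Candidate:
    """A location name found in the transcript, paired with a shot."""
    location: str
    features: Features  # hypothetical per-(shot, location) features


def linear_score(weights: Features, bias: float, features: Features) -> float:
    """Stand-in for a trained classifier: a plain weighted sum."""
    return bias + sum(w * features.get(name, 0.0) for name, w in weights.items())


def annotate_shot(
    shot_features: Features,
    candidates: List[Candidate],
    stage1: Callable[[Features], float],
    stage2: Callable[[Features], float],
    threshold: float = 0.0,
) -> List[str]:
    """Return the transcript locations assigned to one shot."""
    # Sub-problem (1): is the shot's location mentioned in the transcript at all?
    if not candidates or stage1(shot_features) <= threshold:
        return []
    # Sub-problem (2): which of the candidate locations are correct for this shot?
    return [c.location for c in candidates if stage2(c.features) > threshold]


# Hand-set weights for illustration only; a real system would learn them.
def stage1(f: Features) -> float:
    return linear_score({"anchor_shot": -2.0, "num_candidates": 0.5}, 0.0, f)


def stage2(f: Features) -> float:
    return linear_score(
        {"same_sentence_as_shot": 1.5, "time_gap_seconds": -0.1}, 0.0, f
    )


candidates = [
    Candidate("Baghdad", {"same_sentence_as_shot": 1.0, "time_gap_seconds": 2.0}),
    Candidate("Washington", {"same_sentence_as_shot": 0.0, "time_gap_seconds": 30.0}),
]
field_shot = {"anchor_shot": 0.0, "num_candidates": 2.0}
anchor_shot = {"anchor_shot": 1.0, "num_candidates": 2.0}

print(annotate_shot(field_shot, candidates, stage1, stage2))
print(annotate_shot(anchor_shot, candidates, stage1, stage2))
```

Separating the stages lets the first decision filter out shots whose location is never spoken before any candidate is ranked, and the second stage can return several locations or none.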
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, J., Hauptmann, A.G. (2006). Annotating News Video with Locations. In: Sundaram, H., Naphade, M., Smith, J.R., Rui, Y. (eds) Image and Video Retrieval. CIVR 2006. Lecture Notes in Computer Science, vol 4071. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11788034_16
DOI: https://doi.org/10.1007/11788034_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36018-6
Online ISBN: 978-3-540-36019-3
eBook Packages: Computer Science (R0)