Annotating News Video with Locations

Yang, Jun; Hauptmann, Alexander G.

doi:10.1007/11788034_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4071)

Included in the following conference series:

International Conference on Image and Video Retrieval

Abstract

The location of video scenes is an important semantic descriptor, especially for broadcast news video. In this paper, we propose a learning-based approach that annotates shots of news video with locations extracted from the video transcript, based on features from multiple video modalities, including the syntactic structure of transcript sentences, speaker identity, and temporal video structure, among others. Machine learning algorithms combine these multi-modal features to solve two sub-problems: (1) whether the location of a video shot is mentioned in the transcript, and if so, (2) which of the many locations in the transcript are the correct one(s) for this shot. Experiments on the TRECVID dataset demonstrate that our approach achieves approximately 85% accuracy in correctly labeling the location of any shot in news video.
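
The two sub-problems in the abstract can be read as a two-stage decision per shot. The sketch below is only an illustration of that control flow, not the authors' implementation: the abstract does not specify the learners, the exact features, or any thresholds, so `LinearScorer`, the feature names (`is_anchor_shot`, `same_speaker_turn`, and so on), the weights, and the example place names are all hypothetical stand-ins for trained classifiers and real data.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class LinearScorer:
    """Hypothetical stand-in for a trained binary classifier.

    The weights here are hand-set for illustration; in a learning-based
    system they would be estimated from labeled shots.
    """
    weights: Dict[str, float]
    bias: float = 0.0

    def score(self, features: Dict[str, float]) -> float:
        z = self.bias + sum(
            w * features.get(name, 0.0) for name, w in self.weights.items()
        )
        return sigmoid(z)


def annotate_shot(
    shot_features: Dict[str, float],
    candidates: Dict[str, Dict[str, float]],
    mention_model: LinearScorer,
    candidate_model: LinearScorer,
    threshold: float = 0.5,
) -> List[str]:
    """Return the transcript locations judged correct for one shot."""
    # Sub-problem (1): is the shot's location mentioned in the transcript at all?
    if mention_model.score(shot_features) < threshold:
        return []
    # Sub-problem (2): which candidate locations are correct for this shot?
    return [
        name
        for name, feats in candidates.items()
        if candidate_model.score(feats) >= threshold
    ]


# Hypothetical models and features (assumptions, not taken from the paper).
mention_model = LinearScorer({"is_anchor_shot": -2.0, "num_candidate_locations": 0.5})
candidate_model = LinearScorer({"same_speaker_turn": 1.5, "temporal_distance": -1.0})

candidates = {
    "Baghdad": {"same_speaker_turn": 1.0, "temporal_distance": 0.2},
    "Washington": {"same_speaker_turn": 0.0, "temporal_distance": 1.5},
}

field_shot = {"is_anchor_shot": 0.0, "num_candidate_locations": 2.0}
studio_shot = {"is_anchor_shot": 1.0, "num_candidate_locations": 2.0}

field_labels = annotate_shot(field_shot, candidates, mention_model, candidate_model)
studio_labels = annotate_shot(studio_shot, candidates, mention_model, candidate_model)
```

With these made-up weights, the field shot passes the first stage and keeps only the candidate mentioned close by in the same speaker turn, while the studio shot is rejected in the first stage and receives no location. Separating the stages this way means the second classifier is only asked to rank candidates for shots that plausibly have a transcript-mentioned location.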

Author information

Authors and Affiliations

School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA, 15213, USA
Jun Yang & Alexander G. Hauptmann

Editor information

Editors and Affiliations

Arts, Media and Engineering Program, Arizona State University, 85281, Tempe, AZ, USA
Hari Sundaram
Intelligent Information Management Department, IBM T.J. Watson Research Center, 19 Skyline Drive, 10532, Hawthorne, NY, USA
Milind Naphade
Intelligent Information Management Department, IBM T.J. Watson Research Center, 19 Skyline Drive, 10532, Hawthorne, NY, USA
John R. Smith
Microsoft Corporation, Microsoft China R&D Group, 49 Zhichun Road, 100080, Beijing, China
Yong Rui

About this paper

Cite this paper

Yang, J., Hauptmann, A.G. (2006). Annotating News Video with Locations. In: Sundaram, H., Naphade, M., Smith, J.R., Rui, Y. (eds) Image and Video Retrieval. CIVR 2006. Lecture Notes in Computer Science, vol 4071. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11788034_16

DOI: https://doi.org/10.1007/11788034_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36018-6
Online ISBN: 978-3-540-36019-3
eBook Packages: Computer Science, Computer Science (R0)
