Abstract
People, as news subjects, carry rich semantics in broadcast news video, so finding a named person in the video is a major challenge for video retrieval. This task can be addressed by exploiting the multi-modal information in videos, including the transcript, video structure, and visual features. We propose a comprehensive approach for finding specific persons in broadcast news videos by exploring various clues, such as names occurring in the transcript, face information, anchor scenes, and, most importantly, the timing pattern between names and people. Experiments on the TRECVID 2003 dataset show that our approach achieves high performance.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, J., Chen, M.Y., Hauptmann, A. (2004). Finding Person X: Correlating Names with Visual Appearances. In: Enser, P., Kompatsiaris, Y., O’Connor, N.E., Smeaton, A.F., Smeulders, A.W.M. (eds) Image and Video Retrieval. CIVR 2004. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27814-6_34
DOI: https://doi.org/10.1007/978-3-540-27814-6_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22539-3
Online ISBN: 978-3-540-27814-6
eBook Packages: Springer Book Archive