21 March 2013 Person-based video summarization and retrieval by tracking and clustering temporal face sequences
Proceedings Volume 8664, Imaging and Printing in a Web 2.0 World IV; 86640O (2013)
Event: IS&T/SPIE Electronic Imaging, 2013, Burlingame, California, United States
People are often the most important subjects in videos. It is highly desirable to automatically summarize the occurrences of different people in a large video collection and to quickly find the clips that contain a particular person. In this paper, we present VideoWho, a person-based video summarization and retrieval system that extracts temporal face sequences from videos and groups them into clusters, with each cluster containing video clips of the same person. This is accomplished using advanced face detection and tracking algorithms together with a semi-supervised face clustering approach. The system achieved good clustering accuracy when tested on a hybrid video set including home videos, TV plays, and movies. A number of applications can be built on top of this technology, such as automatic summarization of major characters in videos, person-related video search on the Internet, and personalized UI systems.
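The grouping step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the face features, the distance measure, or the form of supervision, so everything below is an assumption made for illustration: the `FaceTrack` type, the 2-D `feature` tuples, the average-link merge rule, the `threshold` value, and the cannot-link constraint (two face sequences that overlap in time within one video are treated as different people). It is not the VideoWho algorithm itself.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass(frozen=True)
class FaceTrack:
    """One temporal face sequence (hypothetical representation)."""
    video: str
    start: int      # first frame index, inclusive
    end: int        # last frame index, inclusive
    feature: tuple  # assumed mean face descriptor for the whole track


def cannot_link(a, b):
    """Assumed constraint: tracks overlapping in time in one video are different people."""
    return a.video == b.video and a.start <= b.end and b.start <= a.end


def cluster_distance(c1, c2):
    """Average-link distance between two clusters; infinite if a constraint is violated."""
    if any(cannot_link(a, b) for a in c1 for b in c2):
        return float("inf")
    total = sum(dist(a.feature, b.feature) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster_tracks(tracks, threshold):
    """Greedily merge the closest pair of clusters until none is within the threshold."""
    clusters = [[t] for t in tracks]
    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), cluster_distance(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Toy data: the first two tracks have similar features but overlap in time.
tracks = [
    FaceTrack("home.mp4", 0, 50, (0.0, 0.0)),
    FaceTrack("home.mp4", 30, 80, (0.1, 0.0)),
    FaceTrack("home.mp4", 100, 150, (0.0, 0.1)),
    FaceTrack("movie.mp4", 0, 40, (5.0, 5.0)),
]
clusters = cluster_tracks(tracks, threshold=1.0)
print([len(c) for c in clusters])  # [2, 1, 1]
```

In this toy run the first and third tracks merge, while the second track stays separate despite its similar feature vector, because it appears on screen at the same time as the first. Each resulting cluster would then stand for one person, and its tracks for the video clips in which that person appears.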
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tong Zhang, Di Wen, and Xiaoqing Ding, "Person-based video summarization and retrieval by tracking and clustering temporal face sequences", Proc. SPIE 8664, Imaging and Printing in a Web 2.0 World IV, 86640O (21 March 2013)
Cited by 6 scholarly publications.

Keywords: Facial recognition systems, Detection and tracking algorithms, Video processing, Video surveillance, Feature extraction
