1 January 2001 Fusion of visual and audio features for person identification in real video
Dongge Li, Gang Wei, Ishwar K. Sethi, Nevenka Dimitrova
Proceedings Volume 4315, Storage and Retrieval for Media Databases 2001; (2001) https://doi.org/10.1117/12.410926
Event: Photonics West 2001 - Electronic Imaging, 2001, San Jose, CA, United States
Abstract
In this research, we studied the joint use of visual and audio information for the problem of identifying persons in real video. A person identification system, which identifies characters in TV shows by fusing audio and visual information, is constructed based on two different fusion strategies. In the first strategy, speaker identification is used to verify the face recognition result. The second strategy uses face recognition and tracking to supplement speaker identification results. To evaluate our system's performance, an information database was generated by manually labeling the speaker and the main person's face in every I-frame of a video segment of the TV show 'Seinfeld'. By comparing the output from our system with our information database, we evaluated the performance of each of the analysis channels and their fusion. The results show that the first fusion strategy is suitable for applications where precision is much more critical than recall. The second fusion strategy, on the other hand, generates the best overall identification performance. It greatly outperforms either of the analysis channels in both precision and recall and is applicable to more general applications, such as, in our case, identifying persons in TV programs.
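The two fusion strategies and the precision/recall evaluation described above can be illustrated with a minimal sketch. The abstract does not give the actual decision rules, so everything here is an assumption made for illustration: the function names `fuse_verify`, `fuse_supplement`, and `precision_recall` are hypothetical, each channel is assumed to emit one label (or no label) per I-frame, and the sample labels are invented toy data, not results from the paper.

```python
from typing import Optional, Sequence, Tuple

# Assumed representation: one label per I-frame, None when a channel gives no answer.
Label = Optional[str]


def fuse_verify(face: Label, speaker: Label) -> Label:
    """Strategy 1 (sketch): accept the face label only when the speaker label agrees."""
    return face if face is not None and face == speaker else None


def fuse_supplement(face: Label, speaker: Label) -> Label:
    """Strategy 2 (sketch): use the speaker label, falling back on the face label."""
    return speaker if speaker is not None else face


def precision_recall(predicted: Sequence[Label], truth: Sequence[Label]) -> Tuple[float, float]:
    """Precision = correct / frames answered; recall = correct / frames with a true label."""
    answered = sum(p is not None for p in predicted)
    relevant = sum(t is not None for t in truth)
    correct = sum(p is not None and p == t for p, t in zip(predicted, truth))
    precision = correct / answered if answered else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall


# Invented toy labels for four I-frames (illustrative only).
truth = ["jerry", "jerry", "elaine", "george"]
face = ["jerry", "elaine", "elaine", None]
speaker = ["jerry", "jerry", None, "george"]

verified = [fuse_verify(f, s) for f, s in zip(face, speaker)]
supplemented = [fuse_supplement(f, s) for f, s in zip(face, speaker)]

print("verify:    ", precision_recall(verified, truth))
print("supplement:", precision_recall(supplemented, truth))
```

In this toy setting the verification rule answers only when both channels agree, which trades recall for precision, while the supplement rule answers whenever either channel has a label. This matches the qualitative trade-off the abstract reports, but the numbers printed here say nothing about the system's measured performance.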
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Dongge Li, Gang Wei, Ishwar K. Sethi, and Nevenka Dimitrova "Fusion of visual and audio features for person identification in real video", Proc. SPIE 4315, Storage and Retrieval for Media Databases 2001, (1 January 2001); https://doi.org/10.1117/12.410926
CITATIONS
Cited by 4 scholarly publications.
KEYWORDS
Facial recognition systems
Visualization
Video
Information visualization
Databases
System identification
Image segmentation
