Fusion of visual and audio features for person identification in real video

Dongge Li; Gang Wei; Ishwar K. Sethi; Nevenka Dimitrova

doi:10.1117/12.410926

1 January 2001

Proceedings Volume 4315, Storage and Retrieval for Media Databases 2001; (2001) https://doi.org/10.1117/12.410926
Event: Photonics West 2001 - Electronic Imaging, 2001, San Jose, CA, United States

Abstract

In this research, we studied the joint use of visual and audio information for identifying persons in real video. We constructed a person identification system that identifies characters in TV shows by fusing audio and visual information, using two different fusion strategies. In the first strategy, speaker identification is used to verify the face recognition result. In the second strategy, face recognition and tracking are used to supplement the speaker identification results. To evaluate the system's performance, we generated an information database by manually labeling the speaker and the main person's face in every I-frame of a video segment of the TV show 'Seinfeld'. By comparing the output from our system with this information database, we evaluated the performance of each analysis channel and of their fusion. The results show that the first fusion strategy is suitable for applications where precision is much more critical than recall. The second fusion strategy, on the other hand, gives the best overall identification performance. It greatly outperforms either analysis channel alone in both precision and recall, and it is applicable to more general tasks such as, in our case, identifying persons in TV programs.
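The abstract does not give the fusion rules in detail, so the following is only a minimal sketch of how the two strategies and the per-frame evaluation might look. It assumes each channel outputs one label (or nothing) per I-frame. The function names `fuse_verify`, `fuse_supplement`, and `precision_recall`, the fallback rule used for the second strategy, and all labels in the toy data are hypothetical illustrations, not the authors' implementation or their measurements.

```python
from typing import Optional, Sequence, Tuple

# A per-frame label; None means the channel made no identification.
Label = Optional[str]


def fuse_verify(face: Label, speaker: Label) -> Label:
    """Strategy 1 (assumed rule): keep the face label only when the
    speaker label agrees with it; otherwise report nothing."""
    if face is not None and face == speaker:
        return face
    return None


def fuse_supplement(face: Label, speaker: Label) -> Label:
    """Strategy 2 (assumed rule): start from the speaker label and fall
    back to the face label when the speaker channel is silent."""
    return speaker if speaker is not None else face


def precision_recall(predicted: Sequence[Label],
                     truth: Sequence[Label]) -> Tuple[float, float]:
    """Precision = correct / identifications made;
    recall = correct / frames that have a ground-truth label."""
    made = sum(p is not None for p in predicted)
    correct = sum(p is not None and p == t for p, t in zip(predicted, truth))
    relevant = sum(t is not None for t in truth)
    precision = correct / made if made else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall


# Invented toy labels for five I-frames (not data from the paper).
truth = ["jerry", "jerry", "elaine", "george", "elaine"]
face = ["jerry", "george", "elaine", None, "elaine"]
speaker = ["jerry", "jerry", None, "george", "kramer"]

verified = [fuse_verify(f, s) for f, s in zip(face, speaker)]
supplemented = [fuse_supplement(f, s) for f, s in zip(face, speaker)]

for name, pred in [("face", face), ("speaker", speaker),
                   ("verify", verified), ("supplement", supplemented)]:
    p, r = precision_recall(pred, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this sketch the verification rule only emits a label when both channels agree, which trades recall for precision, while the supplement rule emits a label whenever either channel has one. The toy labels were chosen only to exercise both code paths.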

Citation

Dongge Li, Gang Wei, Ishwar K. Sethi, and Nevenka Dimitrova "Fusion of visual and audio features for person identification in real video", Proc. SPIE 4315, Storage and Retrieval for Media Databases 2001, (1 January 2001); https://doi.org/10.1117/12.410926
