A Graph Based Approach to Speaker Retrieval in Talk Show Videos with Transcript-Based Supervision

Han, Yina; Liu, Guizhong; Sahbi, Hichem; Chollet, Gérard

doi:10.1007/978-3-642-10467-1_89


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5879)

Included in the following conference series:

Pacific-Rim Conference on Multimedia


Abstract

This paper proposes a graph-based strategy for retrieving frames that contain a queried speaker in talk show videos. Using the "who spoke when" information in the audio transcript, an initial audio-based step, which restricts the search for the queried person to the frames in which he or she is speaking, is combined with a second step that analyzes the visual features of shots. Specifically, exploiting the production properties of talk show videos, (1) a shot-based graph is first constructed, and its densest subgraph is returned as the final result. However, instead of a direct search (DS) for the densest part, (2) we model intra-node and inter-node connections with a frame-layer degree map, which takes into account the duration information within each shot node; and (3) we propose a graph partition strategy, with no restriction on the shape or number of subgraphs, in which shots containing the same person are more similar to each other. Experiments on one episode of the French talk show “Le Grand Echiquier” show average improvements of more than 10% over the audio-only method and more than 7.5% over the DS method.
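The two-step pipeline summarized above can be illustrated with a minimal sketch. It assumes that shots and the queried speaker's transcript turns are given as (start, end) intervals in seconds and that pairwise visual similarities between shots have already been computed; the names `candidate_shots` and `densest_subgraph` are hypothetical. For the densest-subgraph step it substitutes Charikar's greedy peeling approximation, so it corresponds at most to a simple direct-search style baseline and does not implement the paper's frame-layer degree map or its graph partition strategy.

```python
"""Hypothetical sketch (not the authors' method): restrict candidate shots with
transcript speaker turns, then return a dense subgraph of a weighted
shot-similarity graph using Charikar's greedy peeling approximation."""
from typing import Dict, List, Set, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds

def candidate_shots(shots: Dict[str, Interval], turns: List[Interval]) -> Set[str]:
    """Audio step: keep shots that overlap any speaking turn of the queried person."""
    return {sid for sid, (s, e) in shots.items()
            if any(s < te and ts < e for ts, te in turns)}

def densest_subgraph(nodes: Set[str], w: Dict[Tuple[str, str], float]) -> Set[str]:
    """Visual step (assumed): repeatedly remove the node with the smallest
    weighted degree and keep the intermediate node set with the highest
    density, defined here as total edge weight / number of nodes."""
    alive = set(nodes)
    # Symmetric adjacency restricted to the candidate nodes; edges to
    # non-candidate shots are ignored.
    adj: Dict[str, Dict[str, float]] = {n: {} for n in alive}
    for (a, b), wt in w.items():
        if a in alive and b in alive and a != b:
            adj[a][b] = adj[a].get(b, 0.0) + wt
            adj[b][a] = adj[b].get(a, 0.0) + wt
    # Weighted degree of each node among the nodes still alive.
    degree = {n: sum(adj[n].values()) for n in alive}
    total = sum(degree.values()) / 2.0
    best = set(alive)
    best_density = total / len(alive) if alive else 0.0
    while len(alive) > 1:
        # Peel the weakest node (ties broken by shot id for determinism).
        v = min(alive, key=lambda n: (degree[n], n))
        alive.remove(v)
        total -= degree[v]
        for u, wt in adj[v].items():
            if u in alive:
                degree[u] -= wt
        density = total / len(alive)
        if density > best_density:
            best, best_density = set(alive), density
    return best

if __name__ == "__main__":
    # Toy data: five shots, two speaking turns, made-up similarity weights.
    shots = {"s1": (0, 5), "s2": (5, 9), "s3": (9, 14), "s4": (14, 20), "s5": (20, 25)}
    turns = [(1, 12), (15, 18)]
    w = {("s1", "s2"): 0.9, ("s1", "s3"): 0.8, ("s2", "s3"): 0.85,
         ("s3", "s4"): 0.1, ("s4", "s5"): 0.9}
    print(sorted(densest_subgraph(candidate_shots(shots, turns), w)))  # ['s1', 's2', 's3']
```

In this toy example, shot s5 is discarded by the audio step because it does not overlap any speaking turn, and the peeling step then drops s4, which is only weakly connected to the other candidates. Greedy peeling is used here only because it is a short, well-known approximation for dense subgraphs; the paper's own contribution lies in replacing such a direct search with duration-aware degrees and a partition-based strategy.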

This work is supported in part by the National 973 Project under Project No. 2007CB311002 and the National 863 Program under Project No. 2009AA01Z409. This material is based upon work funded by the European K-Space project and the French Infom@gic project.

Author information

Authors and Affiliations

The School of Electronic and Information Engineering, Xi’an Jiaotong University, 710049, Xi’an, China
Yina Han & Guizhong Liu
TELECOM-ParisTech, CNRS LTCI UMR 5141, 75634, Paris, France
Hichem Sahbi & Gérard Chollet

Editor information

Editors and Affiliations

Naresuan University, 65000, Phitsanulok, Thailand
Paisarn Muneesawang
Microsoft Research Asia, 100109, Beijing, China
Feng Wu
Tokyo Institute of Technology, 226-8503, Yokohama, Japan
Itsuo Kumazawa
Mahanakorn University of Technology, 10530, Bangkok, Thailand
Athikom Roeksabutr
Institute of Information Science, Academia Sinica, Taipei, Taiwan
Mark Liao
Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Xiaoou Tang

About this paper

Cite this paper

Han, Y., Liu, G., Sahbi, H., Chollet, G. (2009). A Graph Based Approach to Speaker Retrieval in Talk Show Videos with Transcript-Based Supervision. In: Muneesawang, P., Wu, F., Kumazawa, I., Roeksabutr, A., Liao, M., Tang, X. (eds) Advances in Multimedia Information Processing - PCM 2009. PCM 2009. Lecture Notes in Computer Science, vol 5879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10467-1_89

DOI: https://doi.org/10.1007/978-3-642-10467-1_89
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10466-4
Online ISBN: 978-3-642-10467-1
eBook Packages: Computer Science, Computer Science (R0)