ABSTRACT
In this paper we discuss the challenges of scaling a speaker retrieval system built for small audiovisual collections up to one that can handle large audio(visual) archives. We show that our large-scale speaker diarization approach makes query-by-example speaker retrieval possible, that is, searching for audiovisual documents in which a particular person is talking. On a selection of the ICSI meeting corpus we obtain a Mean Average Precision of 0.49 and a precision-at-ten of 0.70. On a much larger archive of three months of Dutch broadcast television we obtain a precision-at-ten of 0.52.
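For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how precision-at-ten and Mean Average Precision are commonly computed from ranked retrieval results. It uses the standard textbook definitions and is not the authors' evaluation code. The function names and the boolean relevance-list input format are assumptions made for illustration.

```python
from typing import Optional, Sequence


def precision_at_k(relevance: Sequence[bool], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are relevant.

    If fewer than k results were returned, the missing ranks are
    treated as non-relevant (the divisor stays k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Average of the precision values at each rank holding a relevant result.

    total_relevant is the number of relevant documents in the collection
    for this query; when omitted, the count found in the ranked list is
    used (an assumption that every relevant document was retrieved).
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision over all queries."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Hypothetical ranked result list for one query speaker:
    # True means the target speaker talks in the returned document.
    ranked = [True, False, True, True, False,
              False, True, False, False, False]
    print(precision_at_k(ranked, k=10))
    print(average_precision(ranked))
```

In this setting each query is an example of a speaker, and a returned document counts as relevant when that speaker is talking in it. Precision-at-ten only looks at the first ten results, which is why it can be reported for the broadcast archive even if a complete relevance judgment of the whole archive is not available.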
Index Terms
- Towards automatic speaker retrieval for large multimedia archives