DOI: 10.1145/1877850.1877857
Research article

Towards automatic speaker retrieval for large multimedia archives

Published: 29 October 2010

ABSTRACT

In this paper we discuss the challenges of scaling a speaker retrieval system designed for small audiovisual collections up to one that serves large audio(visual) archives. We show that our large-scale speaker diarization approach makes it possible to perform query-by-example speaker retrieval: searching for audiovisual documents in which a particular person is talking. On a selection of the ICSI meeting corpus we obtain a Mean Average Precision (MAP) of 0.49 and a precision-at-ten of 0.70. On a much larger archive of three months of Dutch broadcast television we obtain a precision-at-ten of 0.52.
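The results above are reported as Mean Average Precision (MAP) and precision-at-ten, two standard ranked-retrieval measures. The following is a minimal sketch of how these measures are commonly computed. It is not the authors' evaluation code. It assumes that each query-by-example search returns a ranked list of unique document identifiers, and that for each query there is a set of documents in which the query speaker actually talks. The function names and the toy document IDs are illustrative only.

```python
from typing import Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    Divides by k even when fewer than k documents were returned
    (the usual convention for P@k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of the precision values at the rank of each relevant hit.

    Relevant documents that are never retrieved contribute zero,
    because we divide by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision over all queries."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical toy queries: (ranked results, documents where the speaker talks).
    q1 = (["d1", "d2", "d3", "d4"], {"d1", "d3"})
    q2 = (["d5", "d6"], {"d6"})
    print(round(average_precision(*q1), 4))          # 0.8333
    print(precision_at_k(*q1, k=10))                 # 0.2
    print(round(mean_average_precision([q1, q2]), 4))  # 0.6667
```

MAP rewards systems that place the relevant documents near the top of each ranking across all queries. Precision-at-ten looks only at the first ten results, which is closer to what a user browsing a large archive would see.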


    Published in

      ACM Conferences
      AIEMPro '10: Proceedings of the 3rd international workshop on Automated information extraction in media production
      October 2010
      78 pages
      ISBN:9781450301640
      DOI:10.1145/1877850

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
