Progress in the AMIDA Speaker Diarization System for Meeting Data

  • Conference paper
Multimodal Technologies for Perception of Humans (RT 2007, CLEAR 2007)

Abstract

In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription evaluation 2007 for conference room data. This is done in the context of the history of this system and other speaker diarization systems. One of the goals of our system is to have as few tunable parameters as possible, while maintaining performance. The system consists of a BIC segmentation/clustering initialization, followed by a combined re-segmentation and cluster-merging algorithm. The Diarization Error Rate (DER) of our best system is 17.0 %, accounting for overlapping speech. However, we find that slightly altering the Speech Activity Detection models has a large impact on the speaker DER.
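
For readers unfamiliar with the metric, the sketch below shows how a DER figure is commonly computed: missed speech, false-alarm speech and speaker-confusion time are summed and divided by the total scored speaker time. The function name and the timings are hypothetical and chosen only for illustration; they are not taken from the paper, and this is not the scoring tool used in the evaluation.

```python
def diarization_error_rate(missed, false_alarm, speaker_error, total_speech):
    """Return DER as a fraction of total scored speaker time.

    All arguments are durations in seconds. When overlapping speech is
    scored, time in an overlapped region is usually counted once per
    active speaker; the caller is assumed to have done that already.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + speaker_error) / total_speech


# Hypothetical timings (seconds), not results from the paper.
der = diarization_error_rate(
    missed=30.0, false_alarm=20.0, speaker_error=50.0, total_speech=1000.0
)
print(f"DER = {100 * der:.1f} %")  # prints "DER = 10.0 %"
```

Because missed and false-alarm speech enter the sum directly, a change in the speech activity detector alters two of the three error terms before any speaker clustering takes place.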

Editor information

Rainer Stiefelhagen Rachel Bowers Jonathan Fiscus

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

van Leeuwen, D.A., Konečný, M. (2008). Progress in the AMIDA Speaker Diarization System for Meeting Data. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT 2007, CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_44

  • DOI: https://doi.org/10.1007/978-3-540-68585-2_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68584-5

  • Online ISBN: 978-3-540-68585-2

  • eBook Packages: Computer Science, Computer Science (R0)
