A Hybrid Generative-Discriminative Approach to Speaker Diarization

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5237)

Abstract

In this paper we present a sound probabilistic approach to speaker diarization. We use a hybrid framework in which a discriminative model estimates a distribution over the number of speakers at each point of a multimodal stream. The output of this process serves as input to a generative model that can adapt to a novel test set and perform high-accuracy speaker diarization. The approach deals efficiently, and in a principled probabilistic manner, with the less common and therefore harder segments, such as silence and parts with multiple speakers.
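
As a rough illustration of how a per-frame distribution over the number of speakers could feed a generative stage, the sketch below treats that distribution as a prior over sets of active speakers and picks the most probable set for a single frame. It is a toy example under stated assumptions, not the model described in the paper: the function name `decode_frame`, the uniform spreading of each count probability over same-sized speaker sets, and the per-speaker "active" and "inactive" log-likelihoods are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def decode_frame(count_posterior, log_lik_active, log_lik_inactive):
    """Return the most probable tuple of active speaker indices for one frame.

    count_posterior[n]  : P(n speakers are active); assumed here to be the
                          output of some discriminative model (hypothetical).
    log_lik_active[s]   : log p(observation | speaker s is talking).
    log_lik_inactive[s] : log p(observation | speaker s is silent).
    """
    num_speakers = len(log_lik_active)
    best_set, best_score = (), -math.inf
    for n, p_n in enumerate(count_posterior):
        if p_n <= 0.0 or n > num_speakers:
            continue
        # Assumption: spread the count probability uniformly over all
        # speaker sets of size n, giving a prior over sets.
        log_prior = math.log(p_n) - math.log(math.comb(num_speakers, n))
        for active in combinations(range(num_speakers), n):
            score = log_prior
            for s in range(num_speakers):
                score += log_lik_active[s] if s in active else log_lik_inactive[s]
            if score > best_score:
                best_set, best_score = active, score
    return best_set
```

An empty tuple corresponds to silence and a tuple with more than one index corresponds to overlapping speech, so the count distribution lets these rarer cases compete directly with the single-speaker case. Exhaustive enumeration of speaker sets is only practical for a small number of speakers.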

Editor information

Andrei Popescu-Belis, Rainer Stiefelhagen

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Noulas, A.K., van Kasteren, T., Kröse, B.J.A. (2008). A Hybrid Generative-Discriminative Approach to Speaker Diarization. In: Popescu-Belis, A., Stiefelhagen, R. (eds) Machine Learning for Multimodal Interaction. MLMI 2008. Lecture Notes in Computer Science, vol 5237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85853-9_9

  • DOI: https://doi.org/10.1007/978-3-540-85853-9_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85852-2

  • Online ISBN: 978-3-540-85853-9

  • eBook Packages: Computer Science, Computer Science (R0)
