Abstract
In this paper we present a sound probabilistic approach to speaker diarization. We use a hybrid framework in which a discriminative model estimates a distribution over the number of speakers at each point of a multimodal stream. This distribution then serves as input to a generative model that can adapt to a novel test set and perform high-accuracy speaker diarization. The approach deals efficiently, and in a principled probabilistic manner, with the less common and therefore harder segments, such as silence and parts with multiple speakers.
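The hybrid idea can be illustrated with a toy sketch for a single frame. This is not the authors' model: the function name `speaker_set_posterior`, the uniform prior over speaker sets of equal size, the product rule for overlapping speakers, and the `silence_likelihood` parameter are all assumptions made only for illustration. The sketch takes a distribution over the number of active speakers (as a discriminative model might output) and per-speaker frame likelihoods (as generative speaker models might provide), and combines them into a posterior over which speakers are active.

```python
from itertools import combinations


def speaker_set_posterior(count_probs, speaker_likelihoods, silence_likelihood=1.0):
    """Toy combination of a speaker-count distribution with speaker likelihoods.

    count_probs: dict mapping n -> P(n speakers active | frame), assumed to
        come from some discriminative model (hypothetical input).
    speaker_likelihoods: dict mapping speaker name -> p(frame | speaker),
        assumed to come from per-speaker generative models (hypothetical input).
    silence_likelihood: assumed likelihood of the frame under "nobody speaks".

    Returns a dict mapping frozenset of speaker names -> posterior probability.
    """
    speakers = sorted(speaker_likelihoods)
    scores = {}
    for n, p_n in count_probs.items():
        subsets = list(combinations(speakers, n))
        if not subsets:
            # More simultaneous speakers requested than speakers known: skip.
            continue
        for subset in subsets:
            # Assumption: a set's likelihood is the product of its members'
            # likelihoods; the empty set uses the silence likelihood.
            likelihood = silence_likelihood
            for name in subset:
                likelihood *= speaker_likelihoods[name]
            # Assumption: the count probability is spread uniformly over all
            # speaker sets of that size.
            scores[frozenset(subset)] = p_n / len(subsets) * likelihood
    total = sum(scores.values())
    return {subset: score / total for subset, score in scores.items()}


if __name__ == "__main__":
    posterior = speaker_set_posterior(
        count_probs={0: 0.1, 1: 0.8, 2: 0.1},
        speaker_likelihoods={"A": 0.9, "B": 0.1},
    )
    best = max(posterior, key=posterior.get)
    print(sorted(best), round(posterior[best], 3))
```

In this sketch the count distribution acts as a prior that keeps rare cases (silence, two simultaneous speakers) explicitly in the hypothesis space, while the speaker likelihoods decide between sets of the same size.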
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Noulas, A.K., van Kasteren, T., Kröse, B.J.A. (2008). A Hybrid Generative-Discriminative Approach to Speaker Diarization. In: Popescu-Belis, A., Stiefelhagen, R. (eds) Machine Learning for Multimodal Interaction. MLMI 2008. Lecture Notes in Computer Science, vol 5237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85853-9_9
DOI: https://doi.org/10.1007/978-3-540-85853-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85852-2
Online ISBN: 978-3-540-85853-9
eBook Packages: Computer Science (R0)