A Free Synthetic Corpus for Speaker Diarization Research

Edwards, Erik; Brenndoerfer, Michael; Robinson, Amanda; Sadoughi, Najmeh; Finley, Greg P.; Korenevsky, Maxim; Axtmann, Nico; Miller, Mark; Suendermann-Oeft, David

doi:10.1007/978-3-319-99579-3_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

A synthetic corpus of dialogs was constructed from the LibriSpeech corpus, and is made freely available for diarization research. It includes over 90 h of training data, and over 9 h each of development and test data. Both 2-person and 3-person dialogs, with and without overlap, are included. Timing information is provided in several formats, and includes not only speaker segmentations, but also phoneme segmentations. As such, it is a useful starting point for general, particularly early-stage, diarization system development.
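The abstract does not describe how the dialogs were assembled or which timing formats are used, so the following is only a minimal sketch of the general idea: single-speaker utterances are placed on a shared timeline, with either a silent gap or an overlap between consecutive turns, and the resulting speaker segmentation is recorded. All names here (`Segment`, `build_timeline`, `overlap_seconds`, the `offset` parameter) are hypothetical and are not taken from the paper or the corpus.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the start of the dialog
    end: float    # seconds from the start of the dialog


def build_timeline(turns: Sequence[Tuple[str, float]], offset: float) -> List[Segment]:
    """Place turns one after another on a shared timeline.

    turns  -- (speaker label, utterance duration in seconds) pairs
    offset -- time between the end of one turn and the start of the next;
              positive gives a silent gap, negative gives overlapping speech
              (an illustrative assumption, not the paper's procedure)
    """
    segments: List[Segment] = []
    cursor = 0.0
    for speaker, duration in turns:
        start = max(0.0, cursor)  # never start before the dialog begins
        end = start + duration
        segments.append(Segment(speaker, start, end))
        cursor = end + offset
    return segments


def overlap_seconds(segments: Sequence[Segment]) -> float:
    """Total overlap between adjacent turns only (not arbitrary pairs)."""
    total = 0.0
    for prev, cur in zip(segments, segments[1:]):
        shared = min(prev.end, cur.end) - max(prev.start, cur.start)
        total += max(0.0, shared)
    return total


# Hypothetical 2-person dialog: durations stand in for real utterance lengths.
turns = [("A", 2.0), ("B", 1.5), ("A", 1.0)]
no_overlap = build_timeline(turns, offset=0.5)
with_overlap = build_timeline(turns, offset=-0.5)
```

A single signed `offset` keeps the sketch short; a more realistic generator would presumably draw gap and overlap durations from a distribution and mix the corresponding audio as well.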

Author information

Authors and Affiliations

EMR.AI Inc., San Francisco, CA, USA
Erik Edwards, Amanda Robinson, Najmeh Sadoughi, Greg P. Finley, Maxim Korenevsky, Mark Miller & David Suendermann-Oeft
University of California Berkeley, Berkeley, CA, USA
Michael Brenndoerfer
DHBW, Karlsruhe, Germany
Nico Axtmann

Corresponding author

Correspondence to Erik Edwards.

Editor information

Editors and Affiliations

SPIIRAS, St. Petersburg, Russia
Alexey Karpov
Leipzig University of Telecommunications, Leipzig, Germany
Oliver Jokisch
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Edwards, E. et al. (2018). A Free Synthetic Corpus for Speaker Diarization Research. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_13

DOI: https://doi.org/10.1007/978-3-319-99579-3_13
Published: 25 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)
