Towards a Dialect Classification in German Speech Samples

Dobbriner, Johanna; Jokisch, Oliver

doi:10.1007/978-3-030-26061-3_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11658)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

The automatic classification of a speaker’s dialect can enrich many applications, e.g. in human-machine interaction (HMI) or natural language processing (NLP), but also in specific areas such as pronunciation tutoring, forensic analysis or the personalization of call-center talks. Although much HMI/NLP-related research has been dedicated to tasks in affective computing, emotion recognition, semantic understanding and other advanced topics, there seems to be a lack of methods for automated dialect analysis that are not based on transcriptions, in particular for languages such as German. For other languages such as English, Mandarin or Arabic, a multitude of feature combinations and classification methods have already been tried, which provides a starting point for our study. We describe selected experiments to train suitable classifiers on German dialect varieties in the corpus “Regional Variants of German 1” (RVG1). Our article starts with a systematic choice of appropriate spectral features. In a second step, these features are post-processed with different methods and used to train one Gaussian Mixture Model (GMM) per feature combination as a Universal Background Model (UBM). The resulting UBMs are then adapted to a varied selection of dialects by maximum a posteriori (MAP) adaptation. Our preliminary results on German show that dialect discrimination and classification are possible. The unweighted recognition accuracy ranges from 32.4% to 54.9% in a 3-dialect test and from 19.6% to 31.4% in a 9-dialect classification. Some dialects are more easily distinguishable using purely spectral features, while others require a different feature set or more sophisticated classification methods, which we will explore in future experiments.
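
The GMM-UBM pipeline named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the NumPy-only EM training, the diagonal covariances, the means-only adaptation, the relevance factor of 16 and the number of mixture components are all assumptions made for illustration. "Unweighted accuracy" is interpreted here as the mean of the per-dialect recalls, which is also an assumption.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Log-density of every frame under every diagonal Gaussian: (n_frames, n_components)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))


def responsibilities(x, weights, means, variances):
    """Return component posteriors (n_frames, n_components) and per-frame log-likelihoods."""
    log_joint = np.log(weights) + log_gauss_diag(x, means, variances)
    peak = log_joint.max(axis=1, keepdims=True)  # subtract the peak for numerical stability
    log_lik = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return np.exp(log_joint - log_lik[:, None]), log_lik


def train_ubm(x, n_components=4, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM with plain EM on frames pooled over all dialects."""
    rng = np.random.default_rng(seed)
    means = x[rng.choice(len(x), n_components, replace=False)].copy()
    variances = np.tile(x.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        post, _ = responsibilities(x, weights, means, variances)
        counts = post.sum(axis=0) + 1e-10
        weights = counts / counts.sum()
        means = (post.T @ x) / counts[:, None]
        variances = np.maximum((post.T @ x ** 2) / counts[:, None] - means ** 2, 1e-6)
    return weights, means, variances


def map_adapt_means(x, ubm, relevance=16.0):
    """MAP-adapt only the UBM means to one dialect; weights and variances are kept."""
    weights, means, variances = ubm
    post, _ = responsibilities(x, weights, means, variances)
    counts = post.sum(axis=0)                              # soft frame count per component
    data_means = (post.T @ x) / np.maximum(counts, 1e-10)[:, None]
    alpha = (counts / (counts + relevance))[:, None]       # data-dependent mixing factor
    return weights, alpha * data_means + (1.0 - alpha) * means, variances


def classify(x, dialect_models):
    """Pick the dialect model with the highest average per-frame log-likelihood."""
    scores = [responsibilities(x, *model)[1].mean() for model in dialect_models]
    return int(np.argmax(scores))


def unweighted_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every dialect counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]))
```

In this sketch, `train_ubm` would be called once on the feature frames of all dialects, `map_adapt_means` once per dialect on that dialect's training frames, and `classify` on the frames of each test utterance. Adapting only the means keeps every dialect model aligned with the same background components, so their likelihood scores remain directly comparable.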

Author information

Authors and Affiliations

Institute of Applied Informatics, Universität Leipzig, 04107, Leipzig, Germany
Johanna Dobbriner
Institute of Communications Engineering, Leipzig University of Telecommunications (HfTL), 04277, Leipzig, Germany
Oliver Jokisch

Corresponding author

Correspondence to Johanna Dobbriner.

Editor information

Editors and Affiliations

Utrecht University, Utrecht, The Netherlands
Albert Ali Salah
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Dobbriner, J., Jokisch, O. (2019). Towards a Dialect Classification in German Speech Samples. In: Salah, A., Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_7

DOI: https://doi.org/10.1007/978-3-030-26061-3_7
Published: 24 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26060-6
Online ISBN: 978-3-030-26061-3
eBook Packages: Computer Science, Computer Science (R0)