Towards a Dialect Classification in German Speech Samples

  • Conference paper
Speech and Computer (SPECOM 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11658)

Abstract

The automatic classification of a speaker’s dialect can enrich many applications, e.g. in human-machine interaction (HMI) or natural language processing (NLP), but also in specific areas such as pronunciation tutoring, forensic analysis or the personalization of call-center talks. Although a lot of HMI/NLP-related research has been dedicated to tasks in affective computing, emotion recognition, semantic understanding and other advanced topics, there seems to be a lack of methods for automated dialect analysis that are not based on transcriptions, in particular for some languages such as German. For other languages such as English, Mandarin or Arabic, a multitude of feature combinations and classification methods have already been tried, which provides a starting point for our study. We describe selected experiments to train suitable classifiers on German dialect varieties in the corpus “Regional Variants of German 1” (RVG1). Our article starts with a systematic choice of appropriate spectral features. In a second step, these features are post-processed with different methods and used to train one Gaussian Mixture Model (GMM) per feature combination as a Universal Background Model (UBM). The resulting UBMs are then adapted to a varied selection of dialects by maximum-a-posteriori (MAP) adaptation. Our preliminary results on German show that dialect discrimination and classification are possible. The unweighted recognition accuracy ranges from 32.4 to 54.9% in a 3-dialect test and from 19.6 to 31.4% in a 9-dialect classification. Some dialects are more easily distinguishable using purely spectral features, while others require a different feature set or more sophisticated classification methods, which we will explore in future experiments.
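The GMM-UBM pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: scikit-learn serves as a stand-in toolkit, only the component means are adapted, the relevance factor of 16 is a commonly used default rather than a value from the paper, and the two-dimensional synthetic "features" for the hypothetical dialects A and B merely replace real spectral features. The helper `unweighted_accuracy` assumes that "unweighted recognition accuracy" denotes the mean of the per-class recalls.

```python
import copy

import numpy as np
from sklearn.mixture import GaussianMixture


def map_adapt_means(ubm, features, relevance=16.0):
    """Return a copy of `ubm` whose means are MAP-adapted to `features`.

    Assumption: mean-only adaptation with a fixed relevance factor;
    weights and covariances are kept from the UBM.
    """
    post = ubm.predict_proba(features)          # (T, K) responsibilities
    n_k = post.sum(axis=0)                      # soft frame count per component
    # Posterior-weighted data mean per component (guarding empty components).
    e_k = (post.T @ features) / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]  # more data -> trust data more
    adapted = copy.deepcopy(ubm)
    adapted.means_ = alpha * e_k + (1.0 - alpha) * ubm.means_
    return adapted


def classify(models, features):
    """Pick the dialect whose model yields the highest mean log-likelihood."""
    return max(models, key=lambda name: models[name].score(features))


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so that every class counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def demo(seed=0):
    """Train a UBM on pooled synthetic data, adapt it per dialect, classify."""
    rng = np.random.default_rng(seed)
    # Hypothetical overlapping feature clouds for two made-up dialects.
    train = {
        "A": rng.normal(-0.75, 1.0, size=(500, 2)),
        "B": rng.normal(0.75, 1.0, size=(500, 2)),
    }
    ubm = GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=seed)
    ubm.fit(np.vstack(list(train.values())))
    models = {name: map_adapt_means(ubm, x) for name, x in train.items()}
    test_a = rng.normal(-0.75, 1.0, size=(200, 2))
    test_b = rng.normal(0.75, 1.0, size=(200, 2))
    return classify(models, test_a), classify(models, test_b)
```

Calling `demo()` returns the predicted label for one test utterance per synthetic dialect. Because each adapted model shares its weights and covariances with the UBM, the models differ only in how far their means were pulled towards the dialect-specific data.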

Author information

Corresponding author

Correspondence to Johanna Dobbriner.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Dobbriner, J., Jokisch, O. (2019). Towards a Dialect Classification in German Speech Samples. In: Salah, A., Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_7

  • DOI: https://doi.org/10.1007/978-3-030-26061-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26060-6

  • Online ISBN: 978-3-030-26061-3

  • eBook Packages: Computer Science, Computer Science (R0)
