Automatic Accent and Gender Recognition of Regional UK Speakers

  • Conference paper
Engineering Applications of Neural Networks (EANN 2022)

Abstract

With voice assistants now ubiquitous across the UK and the world, speech recognition of the regional accents across the British Isles has proven challenging due to varying pronunciations. This paper proposes automated recognition of the geographical origin and gender of a speaker from a voice sample, based on the six regional dialects of the United Kingdom. Twenty-six features are extracted from 17,877 voice samples and then used to design, implement and evaluate machine learning classifiers based on Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF) and k-nearest neighbors (k-NN) algorithms. The results suggest that the proposed approach could be applicable in areas such as e-commerce and the service industry, and that it contributes to NLP audio research.
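As a rough illustration of the classification step described in the abstract, the sketch below runs a k-NN majority vote over 26-dimensional feature vectors labelled with one of six regions. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature vectors are synthetic Gaussian clusters rather than features extracted from audio, the helper names (make_synthetic_samples, knn_predict, accuracy) are hypothetical, and the values k=5 and the 80/20 split are arbitrary choices that the abstract does not specify. Gender recognition could be sketched the same way with two labels instead of six.

```python
import math
import random
from collections import Counter

N_FEATURES = 26  # the abstract reports 26 features per voice sample
N_REGIONS = 6    # and six regional dialects


def make_synthetic_samples(n_per_class, rng):
    """Placeholder data: one Gaussian cluster per region (NOT real audio features)."""
    samples = []
    for label in range(N_REGIONS):
        centre = [rng.uniform(-5.0, 5.0) for _ in range(N_FEATURES)]
        for _ in range(n_per_class):
            features = [rng.gauss(c, 1.0) for c in centre]
            samples.append((features, label))
    return samples


def knn_predict(train, query, k=5):
    """Majority vote among the k training vectors closest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of held-out samples whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = make_synthetic_samples(n_per_class=30, rng=rng)
rng.shuffle(data)
split = int(0.8 * len(data))  # assumed 80/20 train/test split
train_set, test_set = data[:split], data[split:]
print(f"hold-out accuracy on synthetic data: {accuracy(train_set, test_set):.2f}")
```

The printed accuracy only reflects how separable the synthetic clusters are and says nothing about the results reported in the paper.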

Notes

  1. https://librosa.org/doc/latest/index.html

  2. https://keras.io/

  3. https://www.kaggle.com/

  4. https://scikit-learn.org/stable/

  5. https://scikit-learn.org/stable/

Author information

Corresponding author

Correspondence to Chrisina Jayne.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Jayne, C., Chang, V., Bailey, J., Xu, Q.A. (2022). Automatic Accent and Gender Recognition of Regional UK Speakers. In: Iliadis, L., Jayne, C., Tefas, A., Pimenidis, E. (eds) Engineering Applications of Neural Networks. EANN 2022. Communications in Computer and Information Science, vol 1600. Springer, Cham. https://doi.org/10.1007/978-3-031-08223-8_6

  • DOI: https://doi.org/10.1007/978-3-031-08223-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08222-1

  • Online ISBN: 978-3-031-08223-8

  • eBook Packages: Computer Science (R0)
