Abstract
With voice assistants now ubiquitous in the UK and worldwide, speech recognition of regional accents across the British Isles has proven challenging because pronunciation varies widely between regions. This paper proposes automated recognition of the geographical origin and gender of a speaker from a voice sample, based on the six regional dialects of the United Kingdom. Twenty-six features are extracted from 17,877 voice samples and used to design, implement and evaluate machine learning classifiers based on Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Random Forests (RF) and k-nearest neighbors (k-NN). The results suggest that the proposed approach could be applicable in areas such as e-commerce and the service industry, and that it contributes to NLP audio research.
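To make the classification setup concrete, the following is a minimal sketch of one of the four classifier families named above (k-NN), written in plain Python. It is not the authors' pipeline. The data is synthetic, the six integer labels merely stand in for the six dialects, and the helper names (`make_synthetic`, `knn_predict`, `accuracy`), the choice of k = 5 and the 80/20 split are all assumptions made for illustration. Only the feature count of 26 and the class count of six come from the abstract.

```python
import math
import random
from collections import Counter

N_FEATURES = 26  # feature count stated in the abstract
N_CLASSES = 6    # one label per dialect; labels here are just 0..5 (assumption)


def make_synthetic(n_per_class, rng):
    """Hypothetical stand-in data: one Gaussian cluster per class.

    Real experiments would use acoustic features extracted from audio;
    these random vectors only mimic the shape of such a dataset.
    """
    samples = []
    for label in range(N_CLASSES):
        centre = [rng.uniform(-5.0, 5.0) for _ in range(N_FEATURES)]
        for _ in range(n_per_class):
            vec = [rng.gauss(c, 1.0) for c in centre]
            samples.append((vec, label))
    return samples


def knn_predict(train, query, k=5):
    """Return the majority label among the k training vectors nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test vectors whose predicted label matches the true label."""
    hits = sum(knn_predict(train, vec, k) == label for vec, label in test)
    return hits / len(test)


rng = random.Random(0)          # fixed seed so the run is repeatable
data = make_synthetic(40, rng)  # 6 classes x 40 samples = 240 vectors
rng.shuffle(data)
split = int(0.8 * len(data))    # assumed 80/20 train/test split
train_set, test_set = data[:split], data[split:]
acc = accuracy(train_set, test_set)
print(f"k-NN accuracy on synthetic data: {acc:.2f}")
```

Because the synthetic clusters are well separated, the accuracy printed here says nothing about performance on real accent data. The sketch only shows the shape of the task: fixed-length feature vectors in, one of six labels out.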
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Jayne, C., Chang, V., Bailey, J., Xu, Q.A. (2022). Automatic Accent and Gender Recognition of Regional UK Speakers. In: Iliadis, L., Jayne, C., Tefas, A., Pimenidis, E. (eds) Engineering Applications of Neural Networks. EANN 2022. Communications in Computer and Information Science, vol 1600. Springer, Cham. https://doi.org/10.1007/978-3-031-08223-8_6
DOI: https://doi.org/10.1007/978-3-031-08223-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08222-1
Online ISBN: 978-3-031-08223-8
eBook Packages: Computer Science (R0)