Automatic Accent and Gender Recognition of Regional UK Speakers

Jayne, Chrisina; Chang, Victor; Bailey, Jozeene; Xu, Qianwen Ariel

doi:10.1007/978-3-031-08223-8_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1600)

Included in the following conference series:

International Conference on Engineering Applications of Neural Networks

Abstract

With voice assistants now ubiquitous across the UK and the world, speech recognition of the regional accents across the British Isles has proven challenging because pronunciation varies. This paper proposes automated recognition of the geographical origin and gender of a speaker from a voice sample, based on six regional dialects of the United Kingdom. Twenty-six features are extracted from 17,877 voice samples and then used to design, implement and evaluate machine learning classifiers based on Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF) and k-nearest neighbors (k-NN) algorithms. The results suggest that the proposed approach could be applicable in areas such as e-commerce and the service industry, and that it contributes to NLP audio research.
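To make the classification step concrete, the following is a minimal sketch of one of the four classifier families named in the abstract, k-NN, written in plain Python. It is not the authors' implementation. The abstract does not say which 26 features are used, how the data are split, or what value of k is chosen, so the synthetic Gaussian data, the 80/20 split, `k=5` and all function names below are assumptions made purely for illustration.

```python
import math
import random
from collections import Counter

N_FEATURES = 26  # the abstract reports twenty-six features per voice sample
N_CLASSES = 6    # six regional dialects; labels here are just 0..5


def make_synthetic(n_per_class, rng, spread=1.0):
    """Toy stand-in for real acoustic features: one Gaussian blob per class."""
    data = []
    for label in range(N_CLASSES):
        centre = [rng.uniform(-5, 5) for _ in range(N_FEATURES)]
        for _ in range(n_per_class):
            x = [c + rng.gauss(0, spread) for c in centre]
            data.append((x, label))
    return data


def knn_predict(train, x, k=5):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of held-out samples whose predicted label matches the truth."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)               # fixed seed so the run is repeatable
samples = make_synthetic(50, rng)    # 6 classes x 50 samples = 300 vectors
rng.shuffle(samples)
split = int(0.8 * len(samples))      # assumed 80/20 train/test split
train, test = samples[:split], samples[split:]
acc = accuracy(train, test)
print(f"k-NN accuracy on toy data: {acc:.2f}")
```

Because the toy classes are well separated by construction, the accuracy printed here says nothing about performance on real speech. The sketch only shows the shape of the task: fixed-length feature vectors in, one of six dialect labels out. Gender recognition would follow the same pattern with two labels.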

Author information

Authors and Affiliations

Teesside University, Middlesbrough, TS1 3BX, UK
Chrisina Jayne, Jozeene Bailey & Qianwen Ariel Xu
Aston University, Aston St, Birmingham, B4 7ET, UK
Victor Chang

Authors

Chrisina Jayne
Victor Chang
Jozeene Bailey
Qianwen Ariel Xu

Corresponding author

Correspondence to Chrisina Jayne.

Editor information

Editors and Affiliations

Democritus University of Thrace, Xanthi, Greece
Lazaros Iliadis
Teesside University, Middlesbrough, UK
Chrisina Jayne
Aristotle University of Thessaloniki, Thessaloniki, Greece
Anastasios Tefas
University of the West of England, Bristol, UK
Elias Pimenidis

About this paper

Cite this paper

Jayne, C., Chang, V., Bailey, J., Xu, Q.A. (2022). Automatic Accent and Gender Recognition of Regional UK Speakers. In: Iliadis, L., Jayne, C., Tefas, A., Pimenidis, E. (eds) Engineering Applications of Neural Networks. EANN 2022. Communications in Computer and Information Science, vol 1600. Springer, Cham. https://doi.org/10.1007/978-3-031-08223-8_6

DOI: https://doi.org/10.1007/978-3-031-08223-8_6
Published: 10 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08222-1
Online ISBN: 978-3-031-08223-8
eBook Packages: Computer Science, Computer Science (R0)
