Empirical analysis of linguistic and paralinguistic information for automatic dialect classification

Sinha, Shweta; Jain, Aruna; Agrawal, Shyam S.

doi:10.1007/s10462-017-9573-3

Published: 28 July 2017

Volume 51, pages 647–672, (2019)

Artificial Intelligence Review

Abstract

Current research in automatic speech recognition is primarily concerned with correctly evaluating the linguistic information carried by the speech signal and with identifying the variations naturally present in speech. These variations may stem from the speaker's age, gender, or a speaking style influenced by dialect. The focus of research in this field is to further strengthen the reliability and accuracy of the techniques developed thus far. This paper concentrates on analysing and modelling the linguistic and paralinguistic information embedded in the speech signal in order to discover the similarities and dissimilarities among acoustic characteristics arising from different dialects. It investigates the influence of dialectal variation by measuring and analysing acoustic features such as formant frequencies, pitch, pitch slope, duration and intensity of vowel sounds. These differences are then exploited to automatically identify a speaker's native dialect from a sample of their speech. Support vector machines, along with dialect-specific Gaussian mixture models, were used to classify the dialect of spoken utterances. The system's performance is compared with human perception of dialects. The study focuses on various dialects of one of the world's major languages: Hindi.
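The idea of dialect-specific models can be illustrated with a minimal sketch. The abstract does not state the model configuration, so the code below is an assumption-laden simplification rather than the authors' system: each dialect is represented by a single diagonal-covariance Gaussian (the one-component special case of a Gaussian mixture model), the support vector machine stage is omitted, and an utterance is assigned to the dialect whose model gives the highest total log-likelihood. The dialect names, feature dimensions and data are synthetic and hypothetical.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(frames, model):
    """Total log-likelihood of all frames under one diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def classify(frames, models):
    """Return the name of the dialect model that best explains the frames."""
    return max(models, key=lambda name: log_likelihood(frames, models[name]))


def synth(rng, centre, n):
    """Draw n synthetic feature vectors around a centre (toy data only)."""
    return [[rng.gauss(c, 1.0) for c in centre] for _ in range(n)]


rng = random.Random(0)
# Hypothetical two-dimensional features (for example, normalised pitch and duration).
centres = {"dialect_A": [0.0, 0.0], "dialect_B": [5.0, 5.0]}
models = {name: fit_diag_gaussian(synth(rng, c, 200)) for name, c in centres.items()}

test_utterance = synth(rng, centres["dialect_B"], 50)
predicted = classify(test_utterance, models)
```

In a fuller system, each dialect model would typically have several mixture components trained with expectation-maximisation, and the per-dialect scores could be passed to a discriminative classifier such as a support vector machine.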

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India
Shweta Sinha & Aruna Jain
KIIT College of Engineering, KIIT Campus, Sohna Road, Gurgaon, Haryana, India
Shyam S. Agrawal

Corresponding author

Correspondence to Shweta Sinha.

About this article

Cite this article

Sinha, S., Jain, A. & Agrawal, S.S. Empirical analysis of linguistic and paralinguistic information for automatic dialect classification. Artif Intell Rev 51, 647–672 (2019). https://doi.org/10.1007/s10462-017-9573-3

Published: 28 July 2017
Issue Date: 15 April 2019
DOI: https://doi.org/10.1007/s10462-017-9573-3
