Abstract
Current research in automatic speech recognition is primarily concerned with correctly evaluating the linguistic information transmitted in the speech signal and with identifying the variations naturally present in speech. These variations may arise from a speaker's age, gender, or speaking style as influenced by dialect. Research in this field continues to focus on improving the reliability and accuracy of the techniques developed so far. This paper concentrates on the analysis and modelling of linguistic and paralinguistic information embedded in the speech signal, with the aim of discovering similarities and dissimilarities among the acoustic characteristics of different dialects. It investigates the influence of dialectal variation by measuring and analysing acoustic features such as formant frequencies, pitch, pitch slope, duration and intensity of vowel sounds. These differences are then exploited to identify a speaker's native dialect automatically from a sample of that speaker's speech. Dialects in the spoken utterances were classified using support vector machines along with dialect-specific Gaussian mixture models. System performance is compared with human perception of dialects. The study focuses on dialects of Hindi, one of the world's major languages.
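As a rough illustration of how dialect-specific Gaussian mixture models can be used for identification, the sketch below scores an utterance against one GMM per dialect and picks the dialect with the highest mean frame log-likelihood. It is a minimal sketch under stated assumptions, not the system described in the paper: the dialect names, the two-dimensional (F1, F2) feature vectors, and all model parameters are hypothetical, the models are set by hand rather than trained, and the abstract does not specify how the GMM scores are combined with the support vector machines.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # Log-sum-exp over mixture components for numerical stability.
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(frames, dialect_gmms):
    """Return the dialect whose GMM gives the highest mean frame log-likelihood."""
    scores = {
        name: sum(gmm_loglik(f, gmm) for f in frames) / len(frames)
        for name, gmm in dialect_gmms.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy models over (F1 in Hz, F2 in Hz); values are illustrative only.
DIALECT_GMMS = {
    "dialect_A": [(0.6, (300.0, 2200.0), (900.0, 10000.0)),
                  (0.4, (700.0, 1200.0), (1600.0, 10000.0))],
    "dialect_B": [(0.5, (400.0, 2000.0), (900.0, 10000.0)),
                  (0.5, (600.0, 1400.0), (1600.0, 10000.0))],
}

# Hypothetical vowel frames from one utterance.
frames = [(310.0, 2180.0), (690.0, 1210.0), (305.0, 2230.0)]
label, scores = classify(frames, DIALECT_GMMS)
```

In a fuller system the per-dialect scores could serve as inputs to a discriminative classifier such as an SVM, but that combination is an assumption here rather than something the abstract describes.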
Cite this article
Sinha, S., Jain, A. & Agrawal, S.S. Empirical analysis of linguistic and paralinguistic information for automatic dialect classification. Artif Intell Rev 51, 647–672 (2019). https://doi.org/10.1007/s10462-017-9573-3
DOI: https://doi.org/10.1007/s10462-017-9573-3