Analysis and modeling of acoustic information for automatic dialect classification

Agrawal, S. S.; Jain, Aruna; Sinha, Shweta

doi:10.1007/s10772-016-9351-7

Published: 22 July 2016

Volume 19, pages 593–609 (2016)

International Journal of Speech Technology

S. S. Agrawal¹,
Aruna Jain² &
Shweta Sinha²

Abstract

A primary challenge in automatic speech recognition is to understand and build acoustic models that represent individual differences in spoken language. A speaker's age, gender, and speaking style, the last of which is influenced by dialect, are among the reasons for these differences. This work investigates dialectal differences through an analysis of variance of acoustic features of vowel sounds: formant frequencies, pitch, pitch slope, duration, and intensity. It discusses methods for capturing dialect-specific knowledge through vocal tract and prosodic information extracted from speech, which can then be used for automatic dialect identification. A kernel-based support vector machine is used to measure the dialect-discriminating ability of the acoustic features. Among the spectral features, shifted delta cepstral coefficients combined with Mel frequency cepstral coefficients give a recognition performance of 66.97%. A combination of prosodic features performs better, with a classification score of 74%. The model is further evaluated on the combined spectral and prosodic feature set and achieves a classification accuracy of 88.77%. The proposed model is compared with human perception of dialects. The work is based on four dialects of Hindi, one of the world's major languages.
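The analysis of variance mentioned in the abstract can be illustrated with a minimal sketch of a one-way ANOVA F statistic for a single acoustic feature measured across dialect groups. This is not the authors' implementation: the function name `one_way_anova_f`, the dialect labels, and the first-formant values are hypothetical placeholders chosen only to show the computation, and the paper's actual statistical procedure may differ.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic.

    `groups` is a list of samples, one list of measurements per dialect.
    A large F means the feature varies more between dialects than within them.
    """
    k = len(groups)                          # number of dialect groups
    n_total = sum(len(g) for g in groups)    # total number of measurements
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of samples around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    return ms_between / ms_within


if __name__ == "__main__":
    # Hypothetical first-formant values (Hz) for one vowel in three
    # made-up dialect groups; these are NOT data from the paper.
    f1_by_dialect = {
        "dialect_A": [310.0, 325.0, 318.0, 330.0],
        "dialect_B": [350.0, 342.0, 361.0, 355.0],
        "dialect_C": [312.0, 320.0, 315.0, 327.0],
    }
    f_stat = one_way_anova_f(list(f1_by_dialect.values()))
    print(f"F statistic for F1 across dialects: {f_stat:.2f}")
```

A feature with a large F statistic separates the dialect groups better than one with an F close to zero, which is one way to shortlist features before training a classifier. In practice the statistic would be compared against an F distribution with (k - 1, N - k) degrees of freedom to obtain a significance level.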

Author information

Authors and Affiliations

KIIT Group of Colleges, KIIT Campus, Sohna Road, Gurgaon, Haryana, India
S. S. Agrawal
Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India
Aruna Jain & Shweta Sinha

Corresponding author

Correspondence to Shweta Sinha.

About this article

Cite this article

Agrawal, S.S., Jain, A. & Sinha, S. Analysis and modeling of acoustic information for automatic dialect classification. Int J Speech Technol 19, 593–609 (2016). https://doi.org/10.1007/s10772-016-9351-7

Received: 16 January 2016
Accepted: 06 July 2016
Published: 22 July 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10772-016-9351-7
