Analysis and modeling of acoustic information for automatic dialect classification

International Journal of Speech Technology

Abstract

A primary challenge in automatic speech recognition is to understand and build acoustic models that represent individual differences in spoken language. A speaker's age, gender, and dialect-influenced speaking style are among the reasons for these differences. This work investigates dialectal differences through an analysis of variance of acoustic features of vowel sounds: formant frequencies, pitch, pitch slope, duration, and intensity. It discusses methods for capturing dialect-specific knowledge through vocal tract and prosodic information extracted from speech, which can then be used for automatic dialect identification. A kernel-based support vector machine is used to measure the dialect-discriminating ability of the acoustic features. Among the spectral features, shifted delta cepstral coefficients combined with Mel frequency cepstral coefficients give a recognition performance of 66.97%. A combination of prosodic features performs better, with a classification score of 74%. The model is further evaluated on the combined spectral and prosodic feature set, where it achieves a classification accuracy of 88.77%. The proposed model is also compared with human perception of dialects. The entire study is based on four dialects of Hindi, one of the world's major languages.
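The analysis-of-variance step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact ANOVA design, so this assumes a plain one-way ANOVA of a single acoustic feature across dialect groups. The function name `one_way_anova_f` and the toy values are hypothetical and are not data from the paper.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic.

    `groups` is a list of lists: one list of feature measurements
    (for example, vowel durations) per dialect.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Toy, made-up measurements for three hypothetical dialect groups.
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_anova_f(toy_groups)
print(f_stat)
```

A larger F value indicates that the feature varies more between dialect groups than within them, which is the sense in which a feature can be called dialect-discriminating. Converting F to a significance level requires the F distribution with (k - 1, N - k) degrees of freedom, which this sketch leaves out.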



Corresponding author

Correspondence to Shweta Sinha.


Cite this article

Agrawal, S.S., Jain, A. & Sinha, S. Analysis and modeling of acoustic information for automatic dialect classification. Int J Speech Technol 19, 593–609 (2016). https://doi.org/10.1007/s10772-016-9351-7


  • DOI: https://doi.org/10.1007/s10772-016-9351-7
