Abstract
A primary challenge in automatic speech recognition is to understand and build acoustic models that represent individual differences in spoken language. A speaker's age, gender, and speaking style, which is influenced by dialect, are among the sources of these differences. This work investigates dialectal differences through analysis of variance of acoustic features of vowel sounds, namely formant frequencies, pitch, pitch slope, duration, and intensity. The paper discusses methods for capturing dialect-specific knowledge from vocal tract and prosody information extracted from speech, which can be used for automatic dialect identification. A kernel-based support vector machine is used to measure the dialect-discriminating ability of the acoustic features. For spectral features, shifted delta cepstral coefficients combined with Mel frequency cepstral coefficients give a recognition performance of 66.97%. The combination of prosodic features performs better, with a classification score of 74%. The model is further evaluated on the combined spectral and prosodic feature set and achieves a classification accuracy of 88.77%. The proposed model is compared with human perception of dialects. The work is based on four dialects of Hindi, one of the world's major languages.
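To make the analysis-of-variance step concrete, the following is a minimal sketch of a one-way ANOVA F statistic computed over one acoustic feature grouped by dialect. It is not the authors' implementation. The function name `one_way_anova_f`, the dialect labels, and the formant values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of measurement groups.

    Each group holds values of one acoustic feature (for example, the first
    formant of a vowel) measured for speakers of one dialect.
    """
    k = len(groups)                          # number of dialect groups
    n_total = sum(len(g) for g in groups)    # total number of measurements
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the measurements around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical first-formant values in Hz for one vowel, grouped by dialect.
# These numbers are invented for illustration and are not from the paper.
f1_by_dialect = {
    "dialect_A": [310.0, 325.0, 318.0, 305.0],
    "dialect_B": [342.0, 350.0, 338.0, 347.0],
    "dialect_C": [315.0, 322.0, 309.0, 320.0],
    "dialect_D": [360.0, 355.0, 368.0, 362.0],
}
print(one_way_anova_f(list(f1_by_dialect.values())))
```

A larger F value means the feature varies more between dialects than within them, which is the sense in which such a feature can be called dialect discriminating. Judging significance would additionally require comparing F against the F distribution with (k - 1, n - k) degrees of freedom.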
Cite this article
Agrawal, S.S., Jain, A. & Sinha, S. Analysis and modeling of acoustic information for automatic dialect classification. Int J Speech Technol 19, 593–609 (2016). https://doi.org/10.1007/s10772-016-9351-7
DOI: https://doi.org/10.1007/s10772-016-9351-7