Abstract
This paper develops a two-stage language identification (LID) system for Northeast (NE) Indian languages. In the first stage, languages are pre-classified into tonal and non-tonal categories; in the second stage, the individual language is identified from among the languages of the corresponding category. New parameters that model the prosodic characteristics of the speech signal are proposed for both pre-classification and individual language identification. The effectiveness of spectral features, namely Mel-frequency cepstral coefficients (MFCCs), and of their combination with prosodic features is studied for the pre-classification task. The usefulness of MFCCs with their delta and acceleration coefficients, in combination with prosodic features, is investigated for individual language identification. System performance is analyzed for features extracted from different analysis units: syllable, disyllable, word, and utterance. Three classifiers, namely artificial neural network (ANN), Gaussian mixture model–universal background model (GMM–UBM), and i-vector based support vector machine (i-vector based SVM), are compared for both pre-classification and individual language identification. A new database, the NIT Silchar language database (NITS-LD), has been developed for seven NE Indian languages using All India Radio broadcast news. The experimental analysis suggests that the proposed prosodic parameters improve the performance of both stages; in the pre-classification stage, they improve on existing parameters by as much as 7.4%, 11.9%, and 9.1% for 30 s, 10 s, and 3 s test data, respectively. Of the baseline single-stage systems, GMM–UBM provides the highest accuracies: 80%, 76.8%, and 72% for 30 s, 10 s, and 3 s test data, respectively. In the proposed system, the combination of the ANN model in the pre-classification stage and the GMM–UBM model in the individual language identification stage provides the highest accuracies, improving on the baseline system by 7.2%, 7%, and 4.9% for 30 s, 10 s, and 3 s test data, respectively. For the OGI-Multilingual (OGI-MLTS) database, improvements of 8.1%, 7.4%, and 5.7% for 30 s, 10 s, and 3 s test data, respectively, are observed over the baseline LID system.
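The two-stage routing described above can be illustrated with a minimal sketch. It is not the authors' implementation. A toy nearest-centroid classifier stands in for the ANN and GMM–UBM models, the 2-D feature vectors are invented, and the language labels (T1, T2, N1, N2) and class names (NearestCentroid, TwoStageLID) are hypothetical placeholders. Only the control flow mirrors the abstract: pre-classify into tonal or non-tonal, then identify the language within that category.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


class NearestCentroid:
    """Toy stand-in classifier (assumption): predicts the label of the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


class TwoStageLID:
    """Stage 1 picks a category; stage 2 picks a language within that category."""

    def __init__(self, category_of):
        self.category_of = category_of  # maps language label -> "tonal" / "non-tonal"
        self.stage1 = NearestCentroid()
        self.stage2 = {}

    def fit(self, X, languages):
        categories = [self.category_of[lang] for lang in languages]
        self.stage1.fit(X, categories)
        # One second-stage model per category, trained only on that category's data.
        for cat in set(categories):
            X_cat = [x for x, c in zip(X, categories) if c == cat]
            y_cat = [lang for lang, c in zip(languages, categories) if c == cat]
            self.stage2[cat] = NearestCentroid().fit(X_cat, y_cat)
        return self

    def predict(self, x):
        category = self.stage1.predict(x)
        return category, self.stage2[category].predict(x)


# Invented toy data: placeholder languages with made-up 2-D features.
category_of = {"T1": "tonal", "T2": "tonal", "N1": "non-tonal", "N2": "non-tonal"}
X = [[0.9, 0.1], [1.0, 0.2], [0.9, 0.9], [1.0, 1.0],
     [0.1, 0.1], [0.0, 0.2], [0.1, 0.9], [0.0, 1.0]]
languages = ["T1", "T1", "T2", "T2", "N1", "N1", "N2", "N2"]

model = TwoStageLID(category_of).fit(X, languages)
print(model.predict([0.8, 0.85]))
```

Because each second-stage model only has to separate the languages of one category, a first-stage error cannot be recovered in stage 2. That makes the accuracy of the pre-classification stage critical to the overall system.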
Cite this article
China Bhanja, C., Laskar, M.A. & Laskar, R.H. A Pre-classification-Based Language Identification for Northeast Indian Languages Using Prosody and Spectral Features. Circuits Syst Signal Process 38, 2266–2296 (2019). https://doi.org/10.1007/s00034-018-0962-x
DOI: https://doi.org/10.1007/s00034-018-0962-x