
A Pre-classification-Based Language Identification for Northeast Indian Languages Using Prosody and Spectral Features

Circuits, Systems, and Signal Processing

Abstract

This paper develops a two-stage language identification (LID) system for Northeast (NE) Indian languages. In the first stage, languages are pre-classified into tonal and non-tonal categories; in the second stage, the individual language is identified from among the languages of the corresponding category. New parameters that model the prosodic characteristics of the speech signal are proposed for both pre-classification and individual language identification. The effectiveness of spectral features, namely Mel-frequency cepstral coefficients (MFCCs), and of their combination with prosodic features is studied for the pre-classification task. The usefulness of MFCCs with their delta and acceleration coefficients, in combination with prosodic features, is investigated for individual language identification. System performance is analyzed for features extracted from different analysis units: syllable, disyllable, word, and utterance. Three classifiers, namely artificial neural network (ANN), Gaussian mixture model–universal background model (GMM–UBM), and i-vector-based support vector machine (i-vector-based SVM), are compared for both pre-classification and individual language identification. A new database, the NIT Silchar language database (NITS-LD), has been developed for seven NE Indian languages using All India Radio broadcast news. The experimental analysis suggests that the proposed prosodic parameters improve the performance of both stages; in the pre-classification stage, they improve on existing parameters by as much as 7.4%, 11.9%, and 9.1% for 30 s, 10 s, and 3 s test data, respectively. Of the baseline single-stage systems, GMM–UBM provides the highest accuracies: 80%, 76.8%, and 72% for 30 s, 10 s, and 3 s test data, respectively. In the proposed system, the combination of the ANN model in the pre-classification stage and the GMM–UBM model in the individual language identification stage provides the highest accuracies, improving on the baseline system by 7.2%, 7%, and 4.9% for 30 s, 10 s, and 3 s test data, respectively. For the OGI-Multilingual (OGI-MLTS) database, improvements of 8.1%, 7.4%, and 5.7% over the baseline LID system are observed for 30 s, 10 s, and 3 s test data, respectively.
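
The two-stage routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier is a toy stand-in for the ANN and GMM–UBM models, the 2-D feature vectors are made up, and the class names (`NearestCentroid`, `TwoStageLID`) and language labels (`tonal_lang_A`, etc.) are hypothetical placeholders. Only the control flow (pre-classify into tonal or non-tonal, then identify the language within that category) follows the text.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


class NearestCentroid:
    """Toy stand-in classifier: returns the label of the closest centroid."""

    def __init__(self, centroids):
        # centroids: dict mapping label -> feature vector (tuple of floats)
        self.centroids = centroids

    def predict(self, features):
        return min(self.centroids,
                   key=lambda label: dist(features, self.centroids[label]))


class TwoStageLID:
    """Stage 1 picks a category; stage 2 picks a language within it."""

    def __init__(self, pre_classifier, language_classifiers):
        self.pre_classifier = pre_classifier              # tonal vs. non-tonal
        self.language_classifiers = language_classifiers  # one model per category

    def identify(self, features):
        category = self.pre_classifier.predict(features)
        # Only the languages of the predicted category are considered here.
        language = self.language_classifiers[category].predict(features)
        return category, language


# Hypothetical 2-D features and placeholder labels, for illustration only.
pre = NearestCentroid({"tonal": (1.0, 0.0), "non_tonal": (-1.0, 0.0)})
stage2 = {
    "tonal": NearestCentroid({"tonal_lang_A": (1.0, 1.0),
                              "tonal_lang_B": (1.0, -1.0)}),
    "non_tonal": NearestCentroid({"non_tonal_lang_A": (-1.0, 1.0),
                                  "non_tonal_lang_B": (-1.0, -1.0)}),
}
lid = TwoStageLID(pre, stage2)
print(lid.identify((0.8, 0.9)))
```

One consequence of this design is that a wrong decision in the first stage cannot be recovered in the second, because the correct language is no longer among the candidates. This is why the accuracy of the pre-classification stage matters for the overall system.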



Author information

Corresponding author

Correspondence to Chuya China Bhanja.


Cite this article

China Bhanja, C., Laskar, M.A. & Laskar, R.H. A Pre-classification-Based Language Identification for Northeast Indian Languages Using Prosody and Spectral Features. Circuits Syst Signal Process 38, 2266–2296 (2019). https://doi.org/10.1007/s00034-018-0962-x


  • DOI: https://doi.org/10.1007/s00034-018-0962-x
