Acoustic Feature Analysis and Discriminative Modeling for Language Identification of Closely Related South-Asian Languages

Circuits, Systems, and Signal Processing

Abstract

As technology advances, communication between people from different linguistic backgrounds around the world is steadily increasing, creating a need for language identification services. Language identification techniques extract distinguishing information from speech corpora as features of a language in order to differentiate one language from another. Without publicly available speech corpora, comparisons between different techniques are not reliable. This paper investigates state-of-the-art features and techniques for language identification of under-resourced and closely related languages, namely Pashto, Punjabi, Sindhi, and Urdu. A speech corpus is designed and collected for these languages. The dataset consists of read speech collected over the telephone network (mobile and landline) from different regions of Pakistan. The corpus is annotated at the sentence level using X-SAMPA, its orthographic transcription is also provided, and the verified data are divided into training and evaluation sets. Mel-frequency cepstral coefficients (MFCCs) and their shifted delta cepstral (SDC) features are used to develop a language identification system for the target languages. Two approaches are investigated: one based on a Gaussian mixture model with a universal background model (GMM-UBM) and one based on i-vectors. The results show that GMM-UBM is more effective than the i-vector approach for language identification of short-duration test utterances.
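To make the feature pipeline named in the abstract more concrete, the following is a minimal sketch of how shifted delta cepstral features can be computed from a matrix of cepstral coefficients. It is an illustration and not the authors' implementation: the function name `shifted_delta_cepstra`, the default parameters (delta spread d=1, block shift P=3, k=7 stacked blocks, as in the commonly cited 7-1-3-7 configuration), and the edge-padding strategy are all assumptions, since the abstract does not state the configuration used.

```python
import numpy as np


def shifted_delta_cepstra(cepstra, d=1, p=3, k=7):
    """Compute shifted delta cepstra (hypothetical helper, not from the paper).

    cepstra: array of shape (num_frames, N), N cepstral coefficients per frame.
    d: delta spread, p: shift between blocks, k: number of stacked blocks.
    Returns an array of shape (num_frames, N * k).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames = cepstra.shape[0]
    # Assumption: frames outside the utterance are replaced by the nearest edge frame.
    pad_after = (k - 1) * p + d
    padded = np.pad(cepstra, ((d, pad_after), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        # Original frame t sits at padded index t + d, so
        # c(t + i*p + d) is at t + i*p + 2*d and c(t + i*p - d) is at t + i*p.
        ahead = padded[i * p + 2 * d : i * p + 2 * d + num_frames]
        behind = padded[i * p : i * p + num_frames]
        blocks.append(ahead - behind)
    # Concatenate the k delta blocks along the feature axis.
    return np.hstack(blocks)


# Usage with placeholder data: 100 frames of 7 cepstral coefficients.
mfcc = np.random.default_rng(0).normal(size=(100, 7))
sdc = shifted_delta_cepstra(mfcc)
print(sdc.shape)
```

Each output row stacks k delta vectors taken P frames apart, so a single frame summarizes cepstral dynamics over a longer window than an ordinary delta feature. In a typical setup these vectors would be appended to the static MFCCs before GMM-UBM or i-vector modeling, although the exact arrangement in the paper is not given in the abstract.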

Acknowledgements

The authors would like to acknowledge Ashok Kumar Khatri, Asad Mustafa, and Inaam-ullah Torwali of the Center for Language Engineering for their assistance in developing the phonetic lexicons of the Sindhi, Punjabi, and Pashto languages.

Author information

Corresponding author

Correspondence to Farah Adeeba.

About this article

Cite this article

Adeeba, F., Hussain, S. Acoustic Feature Analysis and Discriminative Modeling for Language Identification of Closely Related South-Asian Languages. Circuits Syst Signal Process 37, 3589–3604 (2018). https://doi.org/10.1007/s00034-017-0724-1

  • DOI: https://doi.org/10.1007/s00034-017-0724-1
