Development and analysis of multilingual phone recognition systems using Indian languages

Published: 12 January 2019

Volume 22, pages 157–168, (2019)
International Journal of Speech Technology

K. E. Manjunath¹,
Dinesh Babu Jayagopi¹,
K. Sreenivasa Rao² &
…
V. Ramasubramanian¹

Abstract

This paper describes the development of a Multilingual Phone Recognition System (Multi-PRS) using four Indian languages: Kannada, Telugu, Bengali, and Odia. A Multi-PRS is a universal Phone Recognition System (PRS) that performs phone recognition independently of the language. Transcription based on the International Phonetic Alphabet (IPA) is used to group acoustically similar phonetic units from multiple languages. Multilingual phone recognisers for Indian languages are studied using two broad groups, namely Dravidian languages and Indo-Aryan languages. The Dravidian and Indo-Aryan languages are grouped separately to develop bilingual PRSs. We explore both HMMs and DNNs for developing PRSs under context-dependent and context-independent setups. The state-of-the-art DNNs outperform the HMMs. The performance of the Multi-PRSs is analysed and compared with that of the monolingual PRSs, and the advantages of Multi-PRSs over monolingual PRSs are discussed. Further, we develop tandem Multi-PRSs that use phone posteriors as tandem features to improve on the baseline Multi-PRSs. The tandem Multi-PRSs outperform the baseline Multi-PRSs in all cases.
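The idea of grouping phonetic units from several languages under shared IPA symbols can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the language names `lang_a`, `lang_b`, `lang_c`, the label-to-IPA mappings, and the helper `build_common_phone_set` are hypothetical and are not the phone sets or tooling used in the paper.

```python
from collections import defaultdict

# Hypothetical toy inventories: each language maps its own phone labels
# to IPA symbols. Illustrative only; not the inventories from the paper.
TOY_INVENTORIES = {
    "lang_a": {"a": "a", "k": "k", "sh": "ʃ"},
    "lang_b": {"a": "a", "k": "k", "ch": "tʃ"},
    "lang_c": {"a": "a", "k": "k", "sh": "ʃ", "o": "ɔ"},
}


def build_common_phone_set(inventories):
    """Group language-specific labels that share the same IPA symbol.

    Returns a dict mapping each IPA symbol to the list of
    (language, label) pairs that are transcribed with it.
    """
    groups = defaultdict(list)
    for language, mapping in inventories.items():
        for label, ipa in mapping.items():
            groups[ipa].append((language, label))
    return dict(groups)


common = build_common_phone_set(TOY_INVENTORIES)
for ipa in sorted(common):
    print(ipa, "<-", common[ipa])
```

In this sketch, units that receive the same IPA symbol in different languages collapse into one class of the common phone set, so training data for that class can be pooled across languages, while a symbol that occurs in only one language keeps a class of its own.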

Notes

MHRD. To know more about Indian Languages. [Online]. Available: http://mhrd.gov.in/sites/upload_files/mhrd/files/upload_document/languagebr.pdf.
Development of Prosodically Guided Phonetic Engine for Searching Speech Databases in Indian Languages: http://speech.iiit.ac.in/svldownloads/pro_po_en_report/.
Sclite Tool: http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm.

Acknowledgements

We thank Prof. B. Yegnanarayana, Prof. K. Sri Rama Murthy, and Prof. R. Kumaraswamy for providing the Kannada and Telugu datasets. These datasets were developed as part of the consortium project titled "Prosodically guided phonetic engine for searching speech databases in Indian languages", supported by DIT, New Delhi, India.

Author information

Authors and Affiliations

International Institute of Information Technology, Bangalore, 560100, India
K. E. Manjunath, Dinesh Babu Jayagopi & V. Ramasubramanian
Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
K. Sreenivasa Rao

Corresponding author

Correspondence to K. Sreenivasa Rao.

About this article

Cite this article

Manjunath, K.E., Jayagopi, D.B., Sreenivasa Rao, K. et al. Development and analysis of multilingual phone recognition systems using Indian languages. Int J Speech Technol 22, 157–168 (2019). https://doi.org/10.1007/s10772-018-09589-z

Received: 26 September 2018
Accepted: 24 December 2018
Published: 12 January 2019
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10772-018-09589-z
