Language identification using phase information

Dutta, Arup Kumar; Rao, K. Sreenivasa

doi:10.1007/s10772-017-9482-5

Published: 12 December 2017

Volume 21, pages 509–519, (2018)

International Journal of Speech Technology

Abstract

The present work investigates the importance of phase in language identification (LID). We propose three phase-based features for the language recognition task. An auto-regressive model with scale factor error augmentation is used for better representation of the phase-based features. We develop three group-delay-based systems: a normal group delay system, an auto-regressive model group delay system, and an auto-regressive group delay system with scale factor augmentation. Because mel-frequency cepstral coefficients (MFCCs) are extracted from the magnitude of the Fourier transform, we combine an MFCC-based system with our phase-based systems to exploit the complete information contained in a speech signal. Experiments use the IITKGP-MLILSC speech database and the OGI Multi-language Telephone Speech (OGI-MLTS) corpus, with Gaussian mixture models used to build the language models. The experimental results show that the LID accuracy obtained with the proposed phase-based features is comparable to that of MFCC features. Combining the proposed phase-based systems with the state-of-the-art MFCC-based system yields some further improvement in LID accuracy.
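For readers unfamiliar with group delay, the following is a minimal sketch of the textbook computation for a single frame. It is not the authors' feature pipeline, which is not visible in this preview. It uses the standard identity that the group delay equals (X_R·Y_R + X_I·Y_I) / |X|², where X is the Fourier transform of x[n] and Y is the Fourier transform of n·x[n]. The function name `group_delay`, the FFT size, and the small constant `eps` are illustrative assumptions.

```python
import numpy as np

def group_delay(frame, n_fft=512, eps=1e-10):
    """Group delay of one frame, in samples, at each rfft frequency bin.

    Hypothetical helper for illustration; n_fft and eps are assumed values.
    """
    x = np.asarray(frame, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # spectrum of x[n]
    Y = np.fft.rfft(n * x, n_fft)    # spectrum of n * x[n]
    # eps guards against division by zero where |X| is close to zero
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

# Sanity check: an impulse delayed by 3 samples has a flat group delay of 3.
impulse = np.zeros(64)
impulse[3] = 1.0
tau = group_delay(impulse)
```

On real speech the raw group delay is spiky wherever the magnitude spectrum approaches zero, which is one common motivation for smoothed or model-based variants such as those named in the abstract.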

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, India
Arup Kumar Dutta & K. Sreenivasa Rao

Corresponding author

Correspondence to Arup Kumar Dutta.

About this article

Cite this article

Dutta, A.K., Rao, K.S. Language identification using phase information. Int J Speech Technol 21, 509–519 (2018). https://doi.org/10.1007/s10772-017-9482-5

Received: 15 July 2017
Accepted: 10 November 2017
Published: 12 December 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10772-017-9482-5
