Abstract
This work investigates the importance of phase in language identification (LID). We propose three phase-based features for the language recognition task, using an auto-regressive (AR) model with scale factor error augmentation to obtain a better representation of the phase information. Accordingly, we develop three group delay based systems: a normal group delay system, an AR-model group delay system, and an AR-model group delay system with scale factor augmentation. Because mel-frequency cepstral coefficients (MFCCs) are extracted from the magnitude of the Fourier transform, we combine an MFCC-based system with our phase-based systems to exploit the complete information contained in the speech signal. Experiments are carried out on the IITKGP-MLILSC speech database and the OGI Multi-language Telephone Speech (OGI-MLTS) corpus, with Gaussian mixture models used to build the language models. The results show that the LID accuracy obtained with the proposed phase-based features is comparable to that of MFCC features. We also observe some improvement in LID accuracy when the proposed phase-based systems are combined with the state-of-the-art MFCC-based system.
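As a rough illustration of the kind of phase-derived quantity the abstract refers to, the sketch below computes the standard group delay function of a single frame using the well-known identity that avoids explicit phase unwrapping: with X the spectrum of x[n] and Y the spectrum of n·x[n], the group delay is (X_R·Y_R + X_I·Y_I) / |X|². This is not the authors' implementation; the function name `group_delay`, the FFT size, and the `eps` floor are assumptions made for the example, and the AR-model and scale factor variants described in the abstract are not shown.

```python
import numpy as np

def group_delay(frame, n_fft=512, eps=1e-10):
    """Group delay of one frame, computed without phase unwrapping.

    Hypothetical helper for illustration; n_fft and eps are assumed values.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)    # spectrum of n * x[n]
    power = X.real ** 2 + X.imag ** 2    # |X|^2
    # Floor the denominator so spectral nulls do not cause division by zero.
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, eps)

# Sanity check: an impulse delayed by 5 samples has a constant group delay of 5.
impulse = np.zeros(64)
impulse[5] = 1.0
tau = group_delay(impulse)
```

In a full LID front end, a function like this would be applied to each windowed speech frame, typically followed by some form of dimensionality reduction before modelling; those steps depend on design choices the abstract does not specify.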
Cite this article
Dutta, A.K., Rao, K.S. Language identification using phase information. Int J Speech Technol 21, 509–519 (2018). https://doi.org/10.1007/s10772-017-9482-5
DOI: https://doi.org/10.1007/s10772-017-9482-5