Abstract
We propose a new approach based on a deep neural network trained with multi-task learning (MTL-DNN) for simultaneous Mandarin-English code-switching conversational speech recognition (MECS-CSR, the primary task) and language identification (LID, the auxiliary task). In our approach, the hidden layers of the DNN for the primary task are shared with those of the DNN for the auxiliary task, i.e., both tasks use the same hidden-layer weight and bias parameters. Extensive experiments are carried out on LDC2015S04, with Mixed Error Rate (MER) as the performance metric for code-switching speech recognition. Compared with the baseline and with the first MECS-CSR system [1] on LDC2015S04, the proposed approach reduces MER by 4.57% and 4.07% relative, respectively. The results show that the proposed approach captures additional language-switching information from the auxiliary task and significantly outperforms the competing single-task systems.
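The shared-hidden-layer idea can be illustrated with a minimal, pure-Python sketch. The abstract states only that the hidden layers are shared between the recognition and LID tasks; everything else below is an illustrative assumption rather than the authors' configuration: the class name `MTLDNN`, the layer sizes, the ReLU activations, senones as the primary-task targets, a two-class (Mandarin/English) LID output, and the auxiliary loss weight `aux_weight`. Only the forward pass and the combined loss are shown; no training loop is included.

```python
import math
import random

def affine(layer, x):
    """Compute W @ x + b for a layer stored as (rows of W, b)."""
    W, b = layer
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, z) for z in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

class MTLDNN:
    """Hypothetical MTL-DNN: a stack of hidden layers shared by both tasks,
    plus one softmax output layer per task."""

    def __init__(self, n_feat, n_hidden, n_layers, n_targets, n_langs=2, seed=0):
        rng = random.Random(seed)
        dims = [n_feat] + [n_hidden] * n_layers
        # Shared hidden layers: one set of weights/biases used by both tasks.
        self.shared = [init_layer(dims[i], dims[i + 1], rng) for i in range(n_layers)]
        # Task-specific output layers on top of the shared representation.
        self.csr_head = init_layer(n_hidden, n_targets, rng)  # primary task (assumed senones)
        self.lid_head = init_layer(n_hidden, n_langs, rng)    # auxiliary task (assumed Mandarin/English)

    def forward(self, x):
        h = x
        for layer in self.shared:
            h = relu(affine(layer, h))
        # Both heads read the same shared hidden representation h.
        return softmax(affine(self.csr_head, h)), softmax(affine(self.lid_head, h))

def mtl_loss(model, x, csr_target, lid_target, aux_weight=0.3):
    """Primary cross-entropy plus a weighted auxiliary (LID) cross-entropy.
    During training, gradients from both terms would update the shared layers."""
    p_csr, p_lid = model.forward(x)
    return -math.log(p_csr[csr_target]) - aux_weight * math.log(p_lid[lid_target])

# Usage with a single hypothetical 40-dimensional feature frame.
model = MTLDNN(n_feat=40, n_hidden=64, n_layers=3, n_targets=100)
frame = [0.1] * 40
p_csr, p_lid = model.forward(frame)
loss = mtl_loss(model, frame, csr_target=7, lid_target=1)
```

Because the LID term only adds gradient through the shared layers, setting `aux_weight=0` reduces this sketch to an ordinary single-task acoustic model; how the two losses are actually weighted in the paper is not stated in the abstract.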
References
Vu, N.T., Lyu, D.-C., Weiner, J., Telaar, D., Schlippe, T., Blaicher, F., et al.: A first speech recognition system for Mandarin-English code-switch conversational speech. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4889–4892 (2012)
Li, Y., Fung, P.: Code switching language model with translation constraint for mixed language speech recognition. In: Proceedings of COLING, pp. 1671–1680 (2012)
Chen, M., et al.: Multi-Task Learning in Deep Neural Networks for Mandarin-English Code-Mixing Speech Recognition. IEICE Trans. Inf. Syst. 99(10), 2554–2557 (2016)
Yeh, C.F., Huang, C.Y., Sun, L.C., Lee, L.S.: An integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling. In: 7th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 214–219 (2010)
Yu, S., Zhang, S., Xu, B.: Chinese-English bilingual phone modeling for cross-language speech recognition. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), pp. I–917 (2004)
Bhuvanagiri, K., Kopparapu, S.: An approach to mixed language automatic speech recognition. In: Oriental COCOSDA, Kathmandu, Nepal (2010)
Lyu, D.-C., Lyu, R.-Y., Chiang, Y.-C., Hsu, C.-N.: Speech recognition on code-switching among the Chinese dialects. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006, p. I (2006)
Chen, D., Mak, B.K.-W.: Multitask learning of deep neural networks for low-resource speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 23, 1172–1183 (2015)
Huang, J.-T., Li, J., Yu, D., Deng, L., Gong, Y.: Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7304–7308 (2013)
Giri, R., Seltzer, M.L., Droppo, J., Yu, D.: Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5014–5018 (2015)
Davis, K., Biddulph, R., Balashek, S.: Automatic recognition of spoken digits. J. Acoust. Soc. Am. 24, 637–642 (1952)
Chen, D., Mak, B., Leung, C.-C., Sivadas, S.: Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5592–5596 (2014)
Lyu, D.-C., Tan, T.-P., Chng, E.-S., Li, H.: Mandarin–English code-switching speech corpus in South-East Asia: SEAME. Lang. Resour. Eval. 49, 581–600 (2015)
Acknowledgements
This work is partially supported by Shenzhen Science & Research projects (Nos. JCYJ20160331104524983 and JSGG20160229121006579). The authors also gratefully acknowledge the reviewers' helpful comments and suggestions, which have improved the presentation of this paper.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Song, X., Liu, Y., Yang, D., Zou, Y. (2018). A Multi-task Learning Approach for Mandarin-English Code-Switching Conversational Speech Recognition. In: Li, K., Li, W., Chen, Z., Liu, Y. (eds) Computational Intelligence and Intelligent Systems. ISICA 2017. Communications in Computer and Information Science, vol 873. Springer, Singapore. https://doi.org/10.1007/978-981-13-1648-7_9
DOI: https://doi.org/10.1007/978-981-13-1648-7_9
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1647-0
Online ISBN: 978-981-13-1648-7
eBook Packages: Computer Science, Computer Science (R0)