A Multi-task Learning Approach for Mandarin-English Code-Switching Conversational Speech Recognition

Song, Xiao; Liu, Yi; Yang, Daming; Zou, Yuexian

doi:10.1007/978-981-13-1648-7_9

Part of the book series: Communications in Computer and Information Science (CCIS, volume 873)

Included in the following conference series:

International Symposium on Intelligence Computation and Applications

Abstract

We propose a new approach based on a deep neural network trained via multi-task learning (MTL-DNN) that simultaneously performs Mandarin-English code-switching conversational speech recognition (MECS-CSR), the primary task, and language identification (LID), the auxiliary task. In our approach, the hidden layers of the DNN for the primary task are fused with those of the DNN for the auxiliary task by sharing weight and bias parameters. Extensive experiments are carried out on LDC2015S04, with Mixed Error Rate (MER) as the performance metric for code-switching speech recognition. Compared with the baseline and the first MECS-CSR system [1] on LDC2015S04, the proposed approach achieves relative MER reductions of 4.57% and 4.07%, respectively. The results show that the proposed approach captures more language-switching information from the auxiliary task and significantly outperforms the competitive single-task algorithms.
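The parameter sharing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the sigmoid activation, the random initialization, the class name `MultiTaskDNN`, and the auxiliary loss weight `aux_weight` are all assumptions chosen for illustration. The only element taken from the abstract is that one stack of hidden layers (weights and biases) feeds both the recognition output and the LID output.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def affine(layer, x):
    """Compute W x + b for a single input vector x."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]


class MultiTaskDNN:
    """Hypothetical MTL-DNN: shared hidden layers, two task-specific outputs."""

    def __init__(self, n_feat, n_hidden, n_layers, n_senones, n_langs, seed=0):
        rng = random.Random(seed)
        sizes = [n_feat] + [n_hidden] * n_layers
        # Hidden layers whose weights and biases are shared by both tasks.
        self.shared = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        # Task-specific output layers (sizes are illustrative assumptions).
        self.asr_head = init_layer(n_hidden, n_senones, rng)  # primary task
        self.lid_head = init_layer(n_hidden, n_langs, rng)    # auxiliary task

    def forward(self, x):
        h = x
        for layer in self.shared:
            h = sigmoid(affine(layer, h))
        return softmax(affine(self.asr_head, h)), softmax(affine(self.lid_head, h))


def joint_loss(model, x, senone_id, lang_id, aux_weight=0.3):
    """Primary cross-entropy plus weighted auxiliary cross-entropy (assumed form)."""
    p_asr, p_lid = model.forward(x)
    return -math.log(p_asr[senone_id]) - aux_weight * math.log(p_lid[lang_id])


model = MultiTaskDNN(n_feat=40, n_hidden=16, n_layers=3, n_senones=10, n_langs=2)
frame = [0.1] * 40  # one made-up acoustic feature vector
p_asr, p_lid = model.forward(frame)
loss = joint_loss(model, frame, senone_id=3, lang_id=1)
```

Because both output layers read the same hidden activations, gradients of the LID term in `joint_loss` would also update the shared layers during training, which is one plausible way the auxiliary task could pass language-switching information to the recognizer.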

Notes

1. https://github.com/xiaosdawn/Kaldi-multi-task/blob/master/egs/wsj/s5/local/online/run_multitask2.sh

References

1. Vu, N.T., Lyu, D.-C., Weiner, J., Telaar, D., Schlippe, T., Blaicher, F., et al.: A first speech recognition system for Mandarin-English code-switch conversational speech. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4889–4892 (2012)
2. Li, Y., Fung, P.: Code switching language model with translation constraint for mixed language speech recognition. In: Proceedings of COLING, pp. 1671–1680 (2012)
3. Chen, M., et al.: Multi-task learning in deep neural networks for Mandarin-English code-mixing speech recognition. IEICE Trans. Inf. Syst. 99(10), 2554–2557 (2016)
4. Yeh, C.F., Huang, C.Y., Sun, L.C., Lee, L.S.: An integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling. In: 7th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 214–219 (2010)
5. Yu, S., Zhang, S., Xu, B.: Chinese-English bilingual phone modeling for cross-language speech recognition. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), pp. I–917 (2004)
6. Bhuvanagiri, K., Kopparapu, S.: An approach to mixed language automatic speech recognition. In: Oriental COCOSDA, Kathmandu, Nepal (2010)
7. Lyu, D.-C., Lyu, R.-Y., Chiang, Y.-C., Hsu, C.-N.: Speech recognition on code-switching among the Chinese dialects. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006, p. I (2006)
8. Chen, D., Mak, B.K.-W.: Multitask learning of deep neural networks for low-resource speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 23, 1172–1183 (2015)
9. Huang, J.-T., Li, J., Yu, D., Deng, L., Gong, Y.: Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7304–7308 (2013)
10. Giri, R., Seltzer, M.L., Droppo, J., Yu, D.: Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5014–5018 (2015)
11. Davis, K., Biddulph, R., Balashek, S.: Automatic recognition of spoken digits. J. Acoust. Soc. Am. 24, 637–642 (1952)
12. Chen, D., Mak, B., Leung, C.-C., Sivadas, S.: Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5592–5596 (2014)
13. Lyu, D.-C., Tan, T.-P., Chng, E.-S., Li, H.: Mandarin–English code-switching speech corpus in South-East Asia: SEAME. Lang. Resour. Eval. 49, 581–600 (2015)

Acknowledgements

This work is partially supported by Shenzhen Science & Research projects (Nos. JCYJ20160331104524983 and JSGG20160229121006579). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

Author information

Authors and Affiliations

ADSPLAB/Intelligent Lab, SECE, Peking University, Shenzhen, China
Xiao Song & Yuexian Zou
IMSL Shenzhen Key Lab, PKU-HKUST Shenzhen Hong Kong Institution, Shenzhen, China
Xiao Song & Yi Liu
PKU Shenzhen Institute, Shenzhen, China
Daming Yang

Corresponding author

Correspondence to Yuexian Zou.

Editor information

Editors and Affiliations

College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
Kangshun Li
Jiangxi University of Science and Technology, Ganzhou, Jiangxi, China
Wei Li
Chemical and Petroleum Engineering, University of Calgary, Calgary, Alberta, Canada
Zhangxing Chen
School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima, Japan
Yong Liu

About this paper

Cite this paper

Song, X., Liu, Y., Yang, D., Zou, Y. (2018). A Multi-task Learning Approach for Mandarin-English Code-Switching Conversational Speech Recognition. In: Li, K., Li, W., Chen, Z., Liu, Y. (eds) Computational Intelligence and Intelligent Systems. ISICA 2017. Communications in Computer and Information Science, vol 873. Springer, Singapore. https://doi.org/10.1007/978-981-13-1648-7_9

DOI: https://doi.org/10.1007/978-981-13-1648-7_9
Published: 21 July 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1647-0
Online ISBN: 978-981-13-1648-7
eBook Packages: Computer Science, Computer Science (R0)
