Abstract
With the development of deep learning, speech recognition based on deep neural networks has improved continuously in recent years. However, the performance of minority-language speech recognition still cannot compare with that of majority languages, whose data can be collected and transcribed relatively easily. We therefore attempt to devise an effective method for sharing data across languages to improve minority-language speech recognition. We propose a speech attribute detector model under an end-to-end framework, and then use the detector to extract features for minority-language speech recognition. To the best of our knowledge, this is the first end-to-end model for extracting distinctive features. We conducted experiments on Tibetan and Mandarin. The results show that significant improvements in Tibetan phoneme recognition were achieved by utilizing the Mandarin data.
Notes
- 1.
Distinctive features are a set of distinguishing attributes, summarized by linguists to differentiate phonemes, that reflect the different states of the speech organs.
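The footnote above treats distinctive features as a language-universal description of phonemes. A toy sketch of why this enables cross-language data sharing is given below; the attribute inventory and the phoneme-to-attribute assignments are illustrative assumptions, not the feature set used in the paper:

```python
# Toy sketch: distinctive features as a shared attribute space across languages.
# Attribute inventory and phoneme assignments below are made up for illustration.

ATTRIBUTES = ["voiced", "nasal", "fricative", "stop", "aspirated", "high", "round"]

def attr_vector(active):
    """Binary vector over the shared attribute inventory."""
    return [1 if a in active else 0 for a in ATTRIBUTES]

# Hypothetical assignments for a few Mandarin and Tibetan phonemes.
MANDARIN = {
    "b": attr_vector({"stop", "voiced"}),
    "p": attr_vector({"stop", "aspirated"}),
    "m": attr_vector({"nasal", "voiced"}),
}
TIBETAN = {
    "pa":  attr_vector({"stop"}),
    "pha": attr_vector({"stop", "aspirated"}),
    "ma":  attr_vector({"nasal", "voiced"}),
}

def shared_dims(v1, v2):
    """Number of attribute dimensions on which two phonemes agree."""
    return sum(1 for a, b in zip(v1, v2) if a == b)
```

Because both phoneme inventories are described in the same attribute space, an attribute detector trained on abundant Mandarin speech produces features that remain meaningful for scarce Tibetan speech; e.g. `shared_dims(MANDARIN["m"], TIBETAN["ma"])` equals the full dimensionality, since the two nasals carry identical attribute vectors in this toy setup.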
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China (No. 11590773, No. U1713217) and the Key Program of the National Social Science Foundation of China (No. 12&ZD119).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Fu, T., Gao, S., Wu, X. (2018). Improving Minority Language Speech Recognition Based on Distinctive Features. In: Peng, Y., Yu, K., Lu, J., Jiang, X. (eds) Intelligence Science and Big Data Engineering. IScIDE 2018. Lecture Notes in Computer Science(), vol 11266. Springer, Cham. https://doi.org/10.1007/978-3-030-02698-1_36
DOI: https://doi.org/10.1007/978-3-030-02698-1_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02697-4
Online ISBN: 978-3-030-02698-1
eBook Packages: Computer Science, Computer Science (R0)