
Improving Minority Language Speech Recognition Based on Distinctive Features

  • Conference paper
Intelligence Science and Big Data Engineering (IScIDE 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11266)

Abstract

With the development of deep learning technology, speech recognition based on deep neural networks has improved continuously in recent years. However, the performance of minority language speech recognition still cannot match that of majority languages, whose data can be collected and transcribed relatively easily. We therefore attempt to develop an effective cross-language data-sharing method to improve the performance of minority language speech recognition. We propose a speech attribute detector model under an end-to-end framework and then use the detector to extract features for minority language speech recognition. To the best of our knowledge, this is the first end-to-end model for extracting distinctive features. We conducted experiments on Tibetan and Mandarin, and the results show that significant improvements in Tibetan phoneme recognition are achieved by utilizing the Mandarin data.
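
As a rough illustration, the sketch below shows one way such an end-to-end speech attribute detector could be set up: a bidirectional LSTM encoder over acoustic frames trained with CTC on unsegmented attribute label sequences, whose posteriors (or hidden activations) would then serve as language-independent features for the minority-language phoneme recognizer. The layer sizes, attribute inventory size, and tensor shapes are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch of an end-to-end speech attribute (distinctive feature) detector.
# Assumes a BiLSTM encoder trained with CTC, a common choice for end-to-end models;
# all dimensions below are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class AttributeDetector(nn.Module):
    def __init__(self, n_mels=40, hidden=256, n_attributes=21):
        super().__init__()
        # Bidirectional LSTM encoder over acoustic frames (e.g., log-Mel filterbanks).
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=3,
                               bidirectional=True, batch_first=True)
        # Project to attribute posteriors (+1 output for the CTC blank symbol).
        self.proj = nn.Linear(2 * hidden, n_attributes + 1)

    def forward(self, feats):                      # feats: (batch, time, n_mels)
        enc, _ = self.encoder(feats)               # (batch, time, 2*hidden)
        return self.proj(enc).log_softmax(dim=-1)  # log-probs for CTC

# Training-step sketch: CTC aligns unsegmented attribute sequences to frames.
model = AttributeDetector()
ctc = nn.CTCLoss(blank=21, zero_infinity=True)
feats = torch.randn(2, 100, 40)                    # two dummy utterances
targets = torch.randint(0, 21, (2, 12))            # dummy attribute label sequences
log_probs = model(feats).transpose(0, 1)           # CTC expects (time, batch, classes)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 100, dtype=torch.long),
           target_lengths=torch.full((2,), 12, dtype=torch.long))
loss.backward()

# At recognition time, the detector's attribute posteriors (or a hidden layer)
# can be stacked with acoustic features for the target-language recognizer.
```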

Notes

  1. Distinctive features are a set of distinguishing attributes, summarized by linguists to differentiate phonemes, that reflect the different states of the speech organs.
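
As a toy illustration of this definition (the feature set and values below are textbook examples, not the attribute inventory used in the paper), each phoneme can be encoded as a small vector of binary attributes, and two phonemes are distinct as soon as they differ in a single feature:

```python
# Toy illustration of distinctive features as binary attribute vectors.
# Feature names and values are standard textbook examples, not the paper's inventory.
FEATURES = ["consonantal", "voiced", "nasal", "labial"]

PHONEMES = {
    #     consonantal, voiced, nasal, labial
    "p": (1, 0, 0, 1),   # voiceless bilabial stop
    "b": (1, 1, 0, 1),   # voiced bilabial stop
    "m": (1, 1, 1, 1),   # bilabial nasal
    "a": (0, 1, 0, 0),   # open vowel
}

def contrast(x, y):
    """Return the features on which two phonemes differ."""
    return [f for f, a, b in zip(FEATURES, PHONEMES[x], PHONEMES[y]) if a != b]

print(contrast("p", "b"))  # ['voiced']  -> a single feature separates /p/ and /b/
print(contrast("b", "m"))  # ['nasal']
```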

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (No. 11590773, No. U1713217) and the Key Program of the National Social Science Foundation of China (No. 12&ZD119).

Author information

Corresponding author

Correspondence to Tong Fu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Fu, T., Gao, S., Wu, X. (2018). Improving Minority Language Speech Recognition Based on Distinctive Features. In: Peng, Y., Yu, K., Lu, J., Jiang, X. (eds) Intelligence Science and Big Data Engineering. IScIDE 2018. Lecture Notes in Computer Science, vol 11266. Springer, Cham. https://doi.org/10.1007/978-3-030-02698-1_36

  • DOI: https://doi.org/10.1007/978-3-030-02698-1_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02697-4

  • Online ISBN: 978-3-030-02698-1

  • eBook Packages: Computer Science, Computer Science (R0)
