Automated Development of DNN Based Spoken Language Systems Using Evolutionary Algorithms

A chapter in Deep Neural Evolution, part of the book series Natural Computing Series (NCS).

Abstract

Spoken language processing is one of the research areas that has contributed significantly to the recent revival of neural network research. Speech recognition, for example, has been at the forefront of deep learning research, giving rise to a variety of novel models. Their dramatic performance improvements over previous state-of-the-art implementations have led to spoken language systems being deployed in a wide range of applications today. However, these systems require intensive tuning of their network designs and training setups to achieve maximal performance, and this laborious effort by human experts has become a prominent obstacle in system development. In this chapter, we first explain the basic concepts of spoken language processing systems and their neural network-based implementations, describing several types of neural network models. We then introduce our effort to automate the tuning of system meta-parameters using evolutionary algorithms.
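The core idea of the chapter, treating system meta-parameters (such as learning rates or layer widths) as variables optimized by an evolution strategy, can be illustrated with a toy sketch. The function and parameter names below are illustrative assumptions, not the chapter's actual setup: the chapter employs CMA-ES, whereas this sketch uses a plain (mu, lambda) evolution strategy with a fixed step-size decay in place of covariance matrix adaptation.

```python
import random

def toy_objective(params):
    # Stand-in for an expensive metric such as validation-set WER.
    # x models a log10 learning rate (optimum at -2, i.e. lr = 0.01);
    # y models a hidden-layer width divided by 512 (optimum at 1.0).
    x, y = params
    return (x + 2.0) ** 2 + (y - 1.0) ** 2

def evolution_strategy(objective, mean, sigma, lam=10, mu=5,
                       generations=40, seed=0):
    """Minimize `objective` with a simple (mu, lambda) evolution strategy."""
    rng = random.Random(seed)
    mean = list(mean)
    for _ in range(generations):
        # Sample lam candidate meta-parameter vectors around the current mean.
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(lam)]
        # Evaluate the candidates; in the chapter's setting each evaluation
        # would train and score a full DNN system.
        population.sort(key=objective)
        # Recombination: the new mean is the average of the mu best candidates.
        mean = [sum(ind[i] for ind in population[:mu]) / mu
                for i in range(len(mean))]
        sigma *= 0.9  # fixed step-size decay instead of full CMA adaptation
    return mean

best = evolution_strategy(toy_objective, mean=[0.0, 0.0], sigma=1.0)
```

Each generation samples candidates around the current best guess, keeps the top performers, and recombines them; this is the black-box loop that replaces manual tuning by human experts, at the cost of many objective evaluations.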


Notes

  1. https://www.gsic.titech.ac.jp/en/tsubame

  2. https://github.com/JasperSnoek/spearmint

  3. https://www.lri.fr/~hansen/cmaes_inmatlab.html

  4. We ran the main experiments in 2015 and the additional experiments in 2018.

  5. We disabled the default parallel-training option to make the experiments tractable in our environment, as it requires a large number of GPUs.

  6. In the table, we report the evaluation set WERs of the systems that gave the lowest development set WER across all generations. These are therefore not necessarily the same as the minima of the generation-wise evaluation set WERs shown in Fig. 4.14.


Author information

Correspondence to Takahiro Shinozaki.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Shinozaki, T., Watanabe, S., Duh, K. (2020). Automated Development of DNN Based Spoken Language Systems Using Evolutionary Algorithms. In: Iba, H., Noman, N. (eds) Deep Neural Evolution. Natural Computing Series. Springer, Singapore. https://doi.org/10.1007/978-981-15-3685-4_4

  • DOI: https://doi.org/10.1007/978-981-15-3685-4_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-3684-7

  • Online ISBN: 978-981-15-3685-4

  • eBook Packages: Computer Science (R0)
