
GRU: optimization of NPI performance

The Journal of Supercomputing

Abstract

Artificial intelligence is currently applied to automatic programming by producing snippets of code. The neural programmer-interpreter (NPI) is the most widely used machine-learning approach to automatic programming. This paper aims to improve the performance of the traditional NPI and to accelerate its training without loss of precision. To this end, we changed the core structure of the NPI by replacing its LSTM (long short-term memory) with a GRU (gated recurrent unit). The GRU uses gating units that regulate the flow of information within the hidden unit without a separate memory cell. Numerical results demonstrate the performance of the proposed approach: the GRU-based NPI improves the training performance of the original LSTM-based NPI by nearly 33% while maintaining equal accuracy.
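To make the core change concrete, the sketch below illustrates, in PyTorch-style code grounded only in the abstract, how an NPI-style recurrent core could be switched from an LSTM to a GRU. It is a minimal sketch, not the authors' implementation; the module name NPICore, the dimensions, and the output heads are illustrative assumptions.

    # Minimal sketch (not the authors' code): an NPI-style core where the recurrent
    # module can be switched between LSTM and GRU. NPICore, state_dim, prog_dim and
    # the output heads are hypothetical names, not taken from the paper.
    import torch
    import torch.nn as nn

    class NPICore(nn.Module):
        def __init__(self, state_dim=128, prog_dim=32, hidden_dim=256, cell="gru"):
            super().__init__()
            in_dim = state_dim + prog_dim  # fused environment state + program embedding
            # The paper's modification amounts to swapping the recurrent core:
            if cell == "gru":
                self.rnn = nn.GRU(in_dim, hidden_dim, num_layers=2, batch_first=True)
            else:
                self.rnn = nn.LSTM(in_dim, hidden_dim, num_layers=2, batch_first=True)
            self.end_head = nn.Linear(hidden_dim, 1)          # probability of terminating
            self.prog_head = nn.Linear(hidden_dim, prog_dim)  # embedding of the next program

        def forward(self, state_prog_seq, hidden=None):
            out, hidden = self.rnn(state_prog_seq, hidden)
            return torch.sigmoid(self.end_head(out)), self.prog_head(out), hidden

    # Usage: a batch of 4 sequences, 10 time steps each.
    core = NPICore(cell="gru")
    x = torch.randn(4, 10, 128 + 32)
    end_prob, next_prog, h = core(x)
    print(end_prob.shape, next_prog.shape)  # torch.Size([4, 10, 1]) torch.Size([4, 10, 32])

Because the GRU keeps no separate memory cell, the swap reduces the number of gates and parameters in the core, which is the source of the training speedup reported in the abstract.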


Abbreviations

NPI: Neural programmer-interpreter

LSTM: Long short-term memory

GRU: Gated recurrent unit

NTM: Neural Turing Machine


Acknowledgements

The research presented in this paper was supported by the Ministry of Science and Technology of the People's Republic of China and the National Natural Science Foundation of China.

Availability of data and materials

The simulation code can be obtained by contacting the corresponding author by email.

Funding

This research was partially supported by the National Key Research and Development Plan of China under Grant Nos. 2016YFB1100501, 2017YFB1103603, and 2017YFB-1103000; the National Natural Science Foundation of China under Grant Nos. 61772365, 41772123, 61602343, 51607122, 51575158, and 51378350; and Tianjin Province Science and Technology Projects under Grant Nos. 16JCYBJC18400, 16ZLZDZF-00150, 17ZLZXZF00310, 17JCQNJC04500, and 17JCYBJC15100.

Author information


Contributions

Wei Liu is the main author of this paper. She proposed the main idea, completed the experiments, and analyzed the results. The other authors gave important suggestions for the experiments. All the authors read and approved the final manuscript.

Corresponding author

Correspondence to Hanning Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Appendix 1: NPI flowchart based on GRU


About this article


Cite this article

Liu, W., Wang, Q., Zhu, Y. et al. GRU: optimization of NPI performance. J Supercomput 76, 3542–3554 (2020). https://doi.org/10.1007/s11227-018-2634-9


