Learning to Transform Service Instructions into Actions with Reinforcement Learning and Knowledge Base

  • Research Article
  • Published in International Journal of Automation and Computing

Abstract

In order to improve the learning ability of robots, we present a reinforcement learning approach with a knowledge base for mapping natural language instructions to executable action sequences. A simulated platform with a physics engine is built as the interactive environment. Based on the knowledge base, a reward function combining immediate rewards and delayed rewards is designed to handle the sparse reward problem. In addition, a list of object states retrieved from the knowledge base serves as the standard for judging the quality of generated action sequences. Experimental results demonstrate that our approach achieves high accuracy in producing action sequences.
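
To make the reward design concrete, below is a minimal sketch of how immediate and delayed rewards might be combined with a knowledge-base lookup of goal object states. The KnowledgeBase class, the reward constants, and the state dictionaries are all hypothetical illustrations under assumed values, not the paper's actual implementation.

```python
# Minimal sketch of a KB-driven shaped reward. All names and constants
# here are hypothetical illustrations, not the paper's implementation.
from typing import Dict

IMMEDIATE_REWARD = 0.1   # small per-step reward for progress (assumed value)
DELAYED_REWARD = 1.0     # large terminal reward (assumed value)
STEP_PENALTY = -0.02     # discourages needlessly long action sequences


class KnowledgeBase:
    """Hypothetical store mapping an instruction to its goal object states."""

    def __init__(self, facts: Dict[str, Dict[str, str]]):
        # e.g. {"boil water": {"kettle": "on", "water": "boiling"}}
        self.facts = facts

    def goal_states(self, instruction: str) -> Dict[str, str]:
        return self.facts.get(instruction, {})


def reward(kb: KnowledgeBase, instruction: str,
           prev_states: Dict[str, str], cur_states: Dict[str, str],
           done: bool) -> float:
    """Immediate reward whenever an object newly reaches its goal state;
    delayed reward only when every goal state holds at episode end."""
    goal = kb.goal_states(instruction)
    r = STEP_PENALTY
    for obj, state in goal.items():
        if cur_states.get(obj) == state and prev_states.get(obj) != state:
            r += IMMEDIATE_REWARD  # immediate reward: measurable progress
    if done and all(cur_states.get(o) == s for o, s in goal.items()):
        r += DELAYED_REWARD        # delayed reward: full goal satisfied
    return r


# Example: turning the kettle on earns an immediate reward mid-episode.
kb = KnowledgeBase({"boil water": {"kettle": "on", "water": "boiling"}})
print(reward(kb, "boil water", {"kettle": "off"}, {"kettle": "on"}, done=False))
```

Because the agent receives feedback after every action that moves an object toward a knowledge-base goal state, rather than only at the end of an episode, a shaped signal of this kind is a standard remedy for the sparse reward problem the abstract refers to.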


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61773239) and the Shenzhen Future Industry Special Fund (No. JCYJ20160331174814755).

Author information

Corresponding author

Correspondence to Guo-Hui Tian.

Additional information

Recommended by Guest Editor Jun-Zhi Yu

Meng-Yang Zhang received the B.Sc. and M.Sc. degrees in automation from Qingdao University of Technology, China in 2012 and 2014, respectively. He is currently a Ph.D. candidate in control theory and control engineering at Shandong University, China.

His research interests include intelligent space technology and service robots, reinforcement learning, and ontology-based knowledge construction.

Guo-Hui Tian received the B.Sc. degree from the Department of Mathematics, Shandong University, China in 1990, the M.Sc. degree in automation from the Department of Automation, Shandong University of Technology, China in 1993, and the Ph.D. degree in automatic control theory and application from the School of Automation, Northeastern University, China in 1997. He was a post-doctoral researcher in the School of Mechanical Engineering, Shandong University from 1999 to 2001, and a visiting professor in the Graduate School of Engineering, University of Tokyo, Japan from 2003 to 2005. At Shandong University, he was a lecturer from 1997 to 1998 and an associate professor from 1998 to 2002, and he is currently a professor in the School of Control Science and Engineering. He is also the vice director of the Intelligent Robot Specialized Committee of the Chinese Association for Artificial Intelligence, the vice director of the Intelligent Manufacturing System Specialized Committee of the Chinese Association of Automation, and a member of the IEEE Robotics and Automation Society.

His research interests include service robots, intelligent space, cloud robotics and brain-inspired intelligent robotics.

Ci-Ci Li received the B.Sc. degree in automation from Northeastern University, China in 2014. She is currently a Ph.D. candidate in control science and engineering at Shandong University, China.

Her research interests include home service robots and object cognition.

Jing Gong received the B.Sc. degree in automation from Zhengzhou University, China in 2015. He is currently a master student in control science and engineering at Shandong University, China.

His research interests include home service robots, natural language processing and cloud robot systems.


About this article


Cite this article

Zhang, MY., Tian, GH., Li, CC. et al. Learning to Transform Service Instructions into Actions with Reinforcement Learning and Knowledge Base. Int. J. Autom. Comput. 15, 582–592 (2018). https://doi.org/10.1007/s11633-018-1128-9
