Abstract
Deep Reinforcement Learning (DRL) has been applied successfully across various robotic applications. However, DRL methods are not sample-efficient and require long learning times. We present an approach for online continuous deep reinforcement learning for a reach-to-grasp task in a mixed-reality environment: a human places targets for the robot in a physical environment, and DRL for reaching these targets is carried out in simulation before the actual actions are executed in the physical environment. We extend previous work on a modified Deep Deterministic Policy Gradient (DDPG) algorithm with an architecture for online learning. Our approach provides a neural inverse kinematics solution whose execution time improves over the course of learning and which focuses on those areas of the Cartesian space where the human operator often places targets, thus enabling efficient learning. We evaluate reward shaping and augmented targets as strategies for accelerating deep reinforcement learning and analyze their effect on learning stability.
The authors gratefully acknowledge partial support from the German Research Foundation DFG under project CML (TRR 169) and the European Union under project SECURE (No 642667).
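The reward shaping named in the abstract can be illustrated with a small sketch. The abstract does not give the authors' shaping function, so the code below is not their method: it assumes the generic potential-based form F = gamma * phi(s') - phi(s), with a hypothetical potential phi equal to the negative Euclidean distance between the end effector and the target. The names potential, shaped_reward and gamma, and all values, are illustrative assumptions.

```python
import math

def potential(position, target):
    """Hypothetical potential: negative Euclidean distance to the target.

    The value rises towards 0 as the end effector approaches the target.
    """
    return -math.dist(position, target)

def shaped_reward(base_reward, position, next_position, target, gamma=0.99):
    """Add a potential-based shaping term to the environment reward.

    Shaping term: gamma * phi(next_position) - phi(position).
    This is an assumed, generic form, not the reward used in the paper.
    """
    shaping = gamma * potential(next_position, target) - potential(position, target)
    return base_reward + shaping

# Illustrative values only: a step towards the target and a step away from it.
target = (0.0, 0.0, 0.0)
start = (1.0, 0.0, 0.0)
towards = shaped_reward(0.0, start, (0.5, 0.0, 0.0), target)
away = shaped_reward(0.0, start, (1.5, 0.0, 0.0), target)
```

Under these assumptions a step that reduces the distance to the target receives a larger reward than a step that increases it, which gives the learner a denser signal than a sparse reward at the goal alone.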
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Beik Mohammadi, H., Zamani, M.A., Kerzel, M., Wermter, S. (2019). Mixed-Reality Deep Reinforcement Learning for a Reach-to-grasp Task. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation. ICANN 2019. Lecture Notes in Computer Science, vol 11727. Springer, Cham. https://doi.org/10.1007/978-3-030-30487-4_47
DOI: https://doi.org/10.1007/978-3-030-30487-4_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30486-7
Online ISBN: 978-3-030-30487-4
eBook Packages: Computer Science, Computer Science (R0)