Robot Control Optimization Using Reinforcement Learning

Abstract

Conventional robot control schemes are essentially model-based methods. Exact modeling of robot dynamics, however, is difficult to achieve, and task execution is subject to various uncertainties. This paper proposes a reinforcement learning control approach to overcome these drawbacks. An artificial neural network (ANN) serves as the learning structure, and a stochastic real-valued (SRV) unit is applied as the learning method. Force tracking control of a two-link robot arm is first simulated to verify the control design. The simulation results confirm that, even without information about the robot dynamic model or environment states, operating rules for simultaneously controlling force and velocity can be acquired through repeated exploration. Acceptable performance, however, demands many learning iterations, and the learning speed is too slow for practical applications. The approach herein therefore improves tracking performance by combining a conventional controller with the reinforcement learning strategy. Experimental results demonstrate improved trajectory tracking performance of a two-link direct-drive robot manipulator using the proposed method.
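
The SRV learning method named in the abstract is not detailed on this page. In Gullapalli's general formulation, an SRV unit learns both an action mean and a reinforcement predictor, drawing exploratory actions from a Gaussian whose spread shrinks as predicted reinforcement improves. The sketch below illustrates that scheme only; it is a minimal illustration assuming a linear unit, with illustrative names and gains (`SRVUnit`, `lr_action`, `sigma_scale` are chosen here, not taken from the paper), not the authors' implementation.

```python
import numpy as np

class SRVUnit:
    """Minimal sketch of a stochastic real-valued (SRV) unit in the style of
    Gullapalli's formulation. All names and gains are illustrative assumptions."""

    def __init__(self, n_inputs, lr_action=0.05, lr_critic=0.05, sigma_scale=0.5):
        self.w = np.zeros(n_inputs)   # weights of the learned action mean
        self.v = np.zeros(n_inputs)   # weights of the reinforcement predictor
        self.lr_action = lr_action
        self.lr_critic = lr_critic
        self.sigma_scale = sigma_scale

    def act(self, x, rng=np.random):
        """Draw a real-valued action; exploration noise shrinks as the
        predicted reinforcement r_hat approaches its maximum (1.0)."""
        mu = float(self.w @ x)
        r_hat = float(np.clip(self.v @ x, 0.0, 1.0))
        sigma = self.sigma_scale * (1.0 - r_hat)
        a = rng.normal(mu, sigma) if sigma > 0.0 else mu
        return a, mu, sigma, r_hat

    def learn(self, x, a, mu, sigma, r_hat, r):
        """Reward-comparison update: if the received reinforcement r beats
        the prediction r_hat, shift the mean toward the perturbation that
        produced the action; the predictor tracks expected reinforcement."""
        if sigma > 0.0:
            self.w += self.lr_action * (r - r_hat) * ((a - mu) / sigma) * x
        self.v += self.lr_critic * (r - r_hat) * x
```

In the combined scheme the abstract describes, such a learned term would be added to a fixed conventional feedback command at each control step (e.g., a hypothetical `u = u_conventional + a`), so the conventional controller stabilizes the arm while the learning component compensates for unmodeled dynamics.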

Cite this article

Song, KT., Sun, WY. Robot Control Optimization Using Reinforcement Learning. Journal of Intelligent and Robotic Systems 21, 221–238 (1998). https://doi.org/10.1023/A:1007904418265
