
A Huber reward function-driven deep reinforcement learning solution for cart-pole balancing problem

  • S.I.: Human-aligned Reinforcement Learning for Autonomous Agents and Robots
Neural Computing and Applications

Abstract

Many learning tasks require learning from experience gathered through activities performed in real scenarios, which are affected by environmental factors. Real-time systems therefore demand models that learn from operating experience, as in system models driven by physical object properties, trajectory prediction, and Atari games. Such experience-driven learning is addressed by reinforcement learning, an important research topic that requires problem-specific simulation of the reasoning model. In this paper, the cart-pole balancing problem is selected as the task on which the system learns using two reinforcement learning approaches: Q-learning and the deep Q-network (DQN). The practical formulation of the cart-pole problem and its solution with Q-learning and DQN models are validated, and the outcomes are compared in terms of accuracy and speed of convergence. The Huber loss function, rarely applied in this setting, is used for the cart-pole balancing problem, and the results favor it over the mean-squared error loss function. The experimental study therefore suggests using a DQN with the Huber loss for fast learning and convergence of the cart pole to the balanced condition.
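To make the loss comparison concrete: the Huber loss is defined piecewise as $L_\delta(e) = \tfrac{1}{2}e^2$ for $|e| \le \delta$ and $L_\delta(e) = \delta(|e| - \tfrac{1}{2}\delta)$ otherwise, where $e$ is the temporal-difference (TD) error. The sketch below contrasts it with the mean-squared error on the same TD errors. This is an illustrative sketch only, not the authors' implementation; the function names and the threshold delta = 1 are assumptions.

```python
import numpy as np

def huber_loss(td_error, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    abs_err = np.abs(td_error)
    return np.where(
        abs_err <= delta,
        0.5 * td_error ** 2,              # small errors: behaves like MSE
        delta * (abs_err - 0.5 * delta),  # large errors: grow only linearly
    )

def mse_loss(td_error):
    """Mean-squared-error loss on the same TD errors, for comparison."""
    return td_error ** 2

# TD errors of increasing size, e.g. from a rare, surprising transition.
errors = np.array([0.1, 1.0, 10.0])
print(huber_loss(errors))  # [0.005  0.5    9.5 ]  grows linearly past delta
print(mse_loss(errors))    # [0.01   1.0  100.0 ]  grows quadratically
```

Because a single large TD error contributes only linearly to the Huber loss, it cannot dominate the gradient the way it does under MSE, which is the usual argument for the faster and more stable DQN convergence the abstract reports. In a DQN update this loss would be applied to the difference between the predicted Q(s, a) and the bootstrapped target r + γ max Q(s′, a′); PyTorch's nn.SmoothL1Loss (and, with a configurable delta, nn.HuberLoss) implements the same piecewise form.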





Author information


Corresponding author

Correspondence to Anuja Arora.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mishra, S., Arora, A. A Huber reward function-driven deep reinforcement learning solution for cart-pole balancing problem. Neural Comput & Applic 35, 16705–16722 (2023). https://doi.org/10.1007/s00521-022-07606-6

