Abstract
Many learning tasks require learning from experience gained through activities performed in real scenarios, which are affected by environmental factors. Real-time systems therefore need a model that learns from working experience; examples include system models driven by physical object properties, trajectory prediction, and Atari games. Such experience-driven learning uses reinforcement learning, an important research topic that requires problem-specific simulation of the reasoning model. In this paper, the cart-pole balancing problem is selected as the task, and the system learns using the Q-learning and deep Q-network (DQN) reinforcement learning approaches. The pragmatic foundation of the cart-pole problem and its solution with Q-learning and DQN models are validated, and the outcomes are compared in terms of accuracy and speed of convergence. A previously untried Huber loss function is applied to the cart-pole balancing problem, and the results favor it over the mean-squared error loss function. The experimental study therefore suggests using DQN with the Huber loss reward function for fast learning and convergence of the cart-pole to a balanced condition.
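To make the comparison between the two loss functions concrete, the following is a minimal sketch of how a Huber loss and a squared-error loss treat the same one-step Q-learning temporal-difference (TD) error. The function names, the threshold `delta = 1.0`, and the discount factor used in the example are illustrative assumptions; the abstract does not state the settings used in the study.

```python
def td_error(reward: float, gamma: float, max_next_q: float,
             current_q: float, done: bool) -> float:
    """One-step Q-learning TD error: bootstrapped target minus current estimate."""
    # At a terminal step there is no next state to bootstrap from.
    target = reward if done else reward + gamma * max_next_q
    return target - current_q


def squared_error_loss(error: float) -> float:
    """Squared error: grows quadratically, so large errors dominate."""
    return error ** 2


def huber_loss(error: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |error| <= delta, linear beyond it.

    delta = 1.0 is an assumed default, not a value taken from the paper.
    """
    abs_err = abs(error)
    if abs_err <= delta:
        return 0.5 * error ** 2
    return delta * (abs_err - 0.5 * delta)


if __name__ == "__main__":
    # Hypothetical transition values, chosen only for illustration.
    err = td_error(reward=1.0, gamma=0.99, max_next_q=2.0,
                   current_q=2.5, done=False)
    for e in (err, 3.0):
        print(f"error={e:.2f}  squared={squared_error_loss(e):.3f}  "
              f"huber={huber_loss(e):.3f}")
```

For small errors the two losses behave similarly, while for large errors the Huber loss grows only linearly, which bounds the gradient magnitude; this is one commonly cited reason it can stabilize DQN training.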

Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
About this article
Cite this article
Mishra, S., Arora, A. A Huber reward function-driven deep reinforcement learning solution for cart-pole balancing problem. Neural Comput & Applic 35, 16705–16722 (2023). https://doi.org/10.1007/s00521-022-07606-6
DOI: https://doi.org/10.1007/s00521-022-07606-6