
A Huber reward function-driven deep reinforcement learning solution for cart-pole balancing problem

  • S.I.: Human-aligned Reinforcement Learning for Autonomous Agents and Robots
Neural Computing and Applications

Abstract

Many learning tasks require learning from experience gathered through activities performed in real scenarios, which are affected by environmental factors. Real-time systems therefore demand models that learn from operating experience, as in system models driven by physical object properties, trajectory prediction, and Atari games. Such experience-driven learning is addressed by reinforcement learning, an important research topic that requires problem-specific simulation of the reasoning model. In this paper, the cart-pole balancing problem is selected as the task on which the system learns using two reinforcement learning approaches: Q-learning and the deep Q-network (DQN). The practical formulation of the cart-pole problem and its solution with Q-learning and DQN models are validated, and the outcomes are compared in terms of accuracy and speed of convergence. The Huber loss function, rarely applied in this setting, is used for the cart-pole balancing problem, and the results favor it over the mean-squared error loss function. The experimental study therefore suggests using a DQN with the Huber loss for fast learning and convergence of the cart pole to the balanced condition.
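To make the loss comparison concrete: the Huber loss is defined piecewise as $L_\delta(e) = \tfrac{1}{2}e^2$ for $|e| \le \delta$ and $L_\delta(e) = \delta(|e| - \tfrac{1}{2}\delta)$ otherwise, where $e$ is the temporal-difference (TD) error. The sketch below contrasts it with the mean-squared error on the same TD errors. This is an illustrative sketch only, not the authors' implementation; the function names and the threshold delta = 1 are assumptions.

```python
import numpy as np

def huber_loss(td_error, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    abs_err = np.abs(td_error)
    return np.where(
        abs_err <= delta,
        0.5 * td_error ** 2,              # small errors: behaves like MSE
        delta * (abs_err - 0.5 * delta),  # large errors: grow only linearly
    )

def mse_loss(td_error):
    """Mean-squared-error loss on the same TD errors, for comparison."""
    return td_error ** 2

# TD errors of increasing size, e.g. from a rare, surprising transition.
errors = np.array([0.1, 1.0, 10.0])
print(huber_loss(errors))  # [0.005  0.5    9.5 ]  grows linearly past delta
print(mse_loss(errors))    # [0.01   1.0  100.0 ]  grows quadratically
```

Because a single large TD error contributes only linearly to the Huber loss, it cannot dominate the gradient the way it does under MSE, which is the usual argument for the faster and more stable DQN convergence the abstract reports. In a DQN update this loss would be applied to the difference between the predicted Q(s, a) and the bootstrapped target r + γ max Q(s′, a′); PyTorch's nn.SmoothL1Loss (and, with a configurable delta, nn.HuberLoss) implements the same piecewise form.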





Author information


Corresponding author

Correspondence to Anuja Arora.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mishra, S., Arora, A. A Huber reward function-driven deep reinforcement learning solution for cart-pole balancing problem. Neural Comput & Applic 35, 16705–16722 (2023). https://doi.org/10.1007/s00521-022-07606-6

