Towards a Multiple-Lookahead-Levels agent reinforcement-learning technique and its implementation in integrated circuits

The Journal of Supercomputing

Abstract

Reinforcement learning (RL) techniques have contributed, and continue to contribute, substantially to the advancement of machine learning and its many recent applications. Two well-known limitations of existing RL techniques are their slow convergence and their computational complexity. The contributions of this paper are two-fold. (1) It introduces a reinforcement-learning technique that uses multiple lookahead levels, granting an autonomous agent greater visibility into its environment and helping it learn faster. The technique extends Watkins's Q-Learning algorithm with the Multiple-Lookahead-Levels (MLL) model equation that we develop and present here. We analyze the convergence of the MLL equation and prove its effectiveness, and we propose and implement a method for computing the improvement rate of the agent's learning speed between different lookahead levels. Both the time and space complexities are examined. Results show that the number of steps per learning path required to reach the goal decreases exponentially with the learning-path number (time). Results also show that, at any given time, the number of steps per learning path is, to a degree, smaller when the number of lookahead levels is higher (space). Furthermore, we analyze the MLL system in the time domain and prove its temporal stability using Lyapunov theory. (2) Based on this Lyapunov stability analysis, we then propose, for the first time, a circuit architecture for a software-configurable hardware implementation of the MLL technique for real-time applications.
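
The abstract takes Watkins's one-step Q-Learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], as the starting point for the MLL extension but does not reproduce the MLL model equation itself. The Python sketch below is therefore only an illustrative assumption: it contrasts the standard one-step target with a generic k-level greedy-lookahead target on a toy grid world. The environment, all names (step, lookahead_target, ALPHA, GAMMA, and so on), and the multi-step backup rule are hypothetical and are not the authors' MLL formulation.

import random
from collections import defaultdict

# Hypothetical toy grid world: these names and parameters are illustrative only.
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount, exploration rate
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
SIZE, GOAL = 5, (4, 4)

Q = defaultdict(float)                    # tabular Q-values, keyed by (state, action)

def step(state, action):
    """Deterministic grid transition; reward 1 on reaching the goal, 0 otherwise."""
    nxt = (min(max(state[0] + action[0], 0), SIZE - 1),
           min(max(state[1] + action[1], 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0)

def lookahead_target(state, action, levels):
    """Generic k-level target: roll the known model forward greedily for `levels`
    steps, accumulating discounted rewards, then bootstrap from max_a Q.
    With levels=1 this is exactly the one-step target r + gamma * max_a' Q(s', a')."""
    nxt, reward = step(state, action)
    target, discount = reward, GAMMA
    for _ in range(levels - 1):
        if nxt == GOAL:
            break
        best = max(ACTIONS, key=lambda a: Q[(nxt, a)])
        nxt, r = step(nxt, best)
        target += discount * r
        discount *= GAMMA
    return target + discount * max(Q[(nxt, a)] for a in ACTIONS)

def run_episode(levels=2, start=(0, 0)):
    """One learning path: epsilon-greedy action choice, k-level lookahead backup.
    Returns the number of steps taken to reach the goal."""
    state, steps = start, 0
    while state != GOAL:
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        target = lookahead_target(state, action, levels)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state, _ = step(state, action)
        steps += 1
    return steps

if __name__ == "__main__":
    # Steps per learning path should shrink across episodes, faster with deeper lookahead.
    print([run_episode(levels=2) for _ in range(20)])

With levels=1 the sketch reduces to the standard Q-Learning backup; larger values roll a known transition model forward greedily before bootstrapping, which is one simple way that deeper lookahead can shorten the learning paths the abstract reports on.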

References

  1. Arleo A, Floreano D et al (1999) Efficient learning of variable-resolution cognitive maps for autonomous indoor navigation. IEEE Trans Robot Autom

  2. Barto AG, Sutton RS, Watkins CJCH (1990) Learning and sequential decision making. In: Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge, pp 539–602

  3. Araabi BN, Mastoureshgh S, Ahmadabadi MN (2007) A study on expertise of agents and its effects on cooperative Q-Learning. IEEE Trans Syst Man Cybern, Part B, Cybern 37(2):398–409

  4. Watkins CJCH (1989) Learning from delayed rewards. PhD thesis, University of Cambridge, Cambridge

  5. Watkins C, Dayan P (1992) Q-Learning. Mach Learn 8:279–292

  6. Clausen C, Wechsler H (2000) Quad-Q-Learning. IEEE Trans Neural Netw 11(2):279–294

  7. Megherbi DB, Al-Dayaa HS (2007) A Lyapunov-stability-based system hardware architecture for a real-time multiple-look-ahead-levels reinforcement learning. In: Proceedings of the 2006 international conference on machine learning; models, technologies & applications, Nevada, USA

  8. Megherbi DB, Teirelbar A, Boulenouar AJ (2001) A time-varying-environment machine learning technique for autonomous agent shortest path planning. In: Proceedings of the SPIE international conference on defense sensing, unmanned ground vehicle technology, Orlando, Florida, April 2001, pp 419–428

  9. Patterson DA, Hennessy JL (2004) Computer organization & design. Morgan Kaufmann, San Mateo

  10. Ernst D, Geurts P, Wehenkel L (2005) Tree-based batch mode reinforcement learning. J Mach Learn Res 6:503–556

  11. Kreyszig E (1993) Advanced engineering mathematics, 7th edn. Wiley, New York

  12. Al-Dayaa HS, Megherbi DB (2006) Fast reinforcement learning technique via Multiple Lookahead Levels. In: Proceedings of the 2006 international conference on machine learning; models, technologies & applications, Nevada, USA

  13. Al-Dayaa HS, Megherbi DB (2006) Fast reinforcement learning techniques using the Euclidean distance and the agent state occurrence frequency. In: Proceedings of the 2006 international conference on machine learning; models, technologies & applications, Nevada, USA

  14. IEEE (1985) IEEE standard for binary floating-point arithmetic, ANSI/IEEE Std 754-1985. Institute of Electrical and Electronics Engineers, March 1985

  15. Valasek J, Doebbler J, Tandale MD, Meade AJ (2008) Improved adaptive–reinforcement learning control for morphing unmanned air vehicles. IEEE Trans Syst Man Cybern, Part B, Cybern 38(4):1014–1020

  16. Hwang K-S, Lo C-Y, Chen K-J (2009) Real-valued Q-Learning in multi-agent cooperation. In: Proceedings of 2009 IEEE international conference on systems, man, and cybernetics, Texas, USA

  17. Lakshmikantham V et al (1991) Vector Lyapunov functions and stability analysis of nonlinear systems. Mathematics and its applications. Springer, Berlin

  18. Hu L, Zhou C, Sun Z (2008) Estimating biped gait using spline-based probability distribution function with Q-Learning. IEEE Trans Ind Electron 55(3):1444–1452

  19. Guo M, Liu Y, Malec J (2004) A new Q-Learning algorithm based on the metropolis criterion. IEEE Trans Syst Man Cybern, Part B, Cybern 34(5):2140–2143

  20. Wiering MA, van Hasselt H (2008) Ensemble algorithms in reinforcement learning. IEEE Trans Syst Man Cybern, Part B, Cybern 38(4):930–935

  21. Balch M (2003) Complete digital design: a comprehensive guide to digital electronics and computer system architecture. McGraw-Hill Professional, New York

  22. Murphy SA (2005) A generalization error for Q-Learning. J Mach Learn Res, July

  23. Hijab O (1987) Stabilization of control systems. Springer, New York

  24. Dayan P (1992) The convergence of TD(λ) for general λ. Mach Learn 8:341–362

  25. Baker RJ (2002) CMOS: mixed-signal circuit design. IEEE Press series on microelectronic systems

  26. Murray RM, Li Z, Sastry SS (1994) A mathematical introduction to robotic manipulation. CRC Press LLC, Boca Raton

  27. Maclin R, Shavlik JW (1996) Creating advice-taking reinforcement learners. Mach Learn 22:251–281

  28. Riolo R (1991) Lookahead planning and latent learning in a classifier system. In: Proceedings of the international conference on simulation of adaptive behavior

  29. Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge

  30. Sutton RS (1991) Dyna, an integrated architecture for learning, planning, and reacting. In: Working notes of 1991 AAAI spring symposium, pp 151–155

  31. Sutton RS (1990) Integrated architectures for learning, planning, and reaction based on approximating dynamic programming. In: Proceedings of the seventh international conference on machine learning, pp 216–224

  32. Sutton RS, Barto AG, Williams RJ (1992) Reinforcement learning is direct adaptive optimal control. IEEE Control Syst Mag, April

  33. Hadidi R, Jeyasurya B (2009) Selective initial state criteria to enhance convergence rate of Q-Learning algorithm in power system stability application. In: IEEE Canadian conference on electrical and computer engineering, NL, Canada, May 2009

  34. Stefani RT, Savant S, Hostetter et al (2001) Design of feedback control systems, 4th edn. Oxford University Press, London

  35. Mitchell TM (1997) Machine learning. McGraw-Hill, New York

  36. Dai X, Li C-K, Rad AB (2005) An approach to tune fuzzy controllers based on reinforcement learning for autonomous vehicle control. IEEE Trans Intell Transp Syst 6(3):285–293

  37. Al-Dayaa HS, Megherbi DB (2012) Reinforcement learning technique using agent state occurrence frequency with analysis of knowledge sharing on the agent’s learning process in multi-agent environments. J Supercomput 59(1):526–547. doi:10.1007/s11227-010-0451-x

Author information

Corresponding author

Correspondence to D. B. Megherbi.

About this article

Cite this article

Al-Dayaa, H.S., Megherbi, D.B. Towards a Multiple-Lookahead-Levels agent reinforcement-learning technique and its implementation in integrated circuits. J Supercomput 62, 588–615 (2012). https://doi.org/10.1007/s11227-011-0738-6
