
A hybrid cognitive/reactive intelligent agent autonomous path planning technique in a networked-distributed unstructured environment for reinforcement learning


Abstract

This paper proposes a path planning technique for autonomous agents located in an unstructured, networked, distributed environment, where each agent has limited, incomplete knowledge of the environment. Each agent knows only the information available in the distributed memory of the computing node it runs on, and agents share some learned information over a distributed network. In particular, the environment is divided into several sectors, with each sector located on a separate distributed computing node. We consider hybrid reactive-cognitive agents whose autonomous motion planning is based on a potential field model combined with reinforcement learning and boundary detection algorithms. Potential fields provide fast convergence toward a path in a distributed environment, while reinforcement learning guarantees a variety of behaviors and consistent convergence. We show how the combination of the two techniques enhances the agent decision-making process in a distributed environment. Furthermore, path retracing is a challenging problem in a distributed environment, since no agent has complete knowledge of the environment. We propose a backtracking technique that keeps each distributed agent informed at all times of its path information and step count, including when it migrates from one node to another. Note that no node has knowledge of the entire global path from a source to a goal when the goal resides on a separate node; each agent knows only the partial path internal to a node and the number of steps corresponding to the portion of the path it traversed while running on that node. In particular, we show how each agent, starting in one of the many sectors with no initial knowledge of the environment, develops its intelligence from experience using the proposed distributed technique and seamlessly discovers the shortest global path to the target, located on a different node, while avoiding any obstacles it encounters along the way, including when transitioning and migrating from one distributed computing node to another. The agents use a multiple-token-ring scheme over the message passing interface (MPI) for internode communication. Finally, the experimental results of the proposed method show that single and multiple agents sharing the same goal, whether running on the same node or on different nodes, successfully coordinate the sharing of their respective environment states and information to collaboratively perform their respective tasks. The results also show that information-sharing distributed multiagent systems converge to the optimal shortest path to the goal an order of magnitude faster than a single agent or non-information-sharing multiple agents.
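To make the hybrid reactive-cognitive idea concrete, below is a minimal, self-contained sketch, not the authors' implementation: an agent on a small grid combines an artificial potential field (fast, reactive guidance toward the goal) with Q-learning (cognitive refinement from experience). The grid size, reward values, and the constants k_att, k_rep, alpha, gamma, eps, and bias are illustrative assumptions, not values from the paper.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def potential(cell, goal, obstacles, k_att=1.0, k_rep=4.0):
    # Attractive term grows with Manhattan distance to the goal; each
    # nearby obstacle contributes a local repulsive term. Lower is better.
    u = k_att * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))
    for ob in obstacles:
        d = abs(cell[0] - ob[0]) + abs(cell[1] - ob[1])
        if d <= 2:                        # repulsion only acts locally
            u += k_rep / (d + 1.0)
    return u

def step(cell, action, size):
    # Boundary detection: a move off the grid leaves the agent in place.
    r, c = cell[0] + action[0], cell[1] + action[1]
    return (r, c) if 0 <= r < size and 0 <= c < size else cell

def run_episode(q, start, goal, obstacles, size,
                alpha=0.5, gamma=0.9, eps=0.1, bias=0.1, max_steps=200):
    # One Q-learning episode whose greedy choice is biased by the potential
    # field, so the reactive field steers the agent while the cognitive
    # Q-table is still untrained.
    cell, path = start, [start]
    for _ in range(max_steps):
        if random.random() < eps:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)),
                    key=lambda i: q.get((cell, i), 0.0)
                    - bias * potential(step(cell, ACTIONS[i], size),
                                       goal, obstacles))
        nxt = step(cell, ACTIONS[a], size)
        reward = 10.0 if nxt == goal else (-5.0 if nxt in obstacles else -1.0)
        if nxt in obstacles:
            nxt = cell                    # reactive obstacle avoidance
        best_next = max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
        old = q.get((cell, a), 0.0)
        q[(cell, a)] = old + alpha * (reward + gamma * best_next - old)
        cell = nxt
        path.append(cell)                 # per-node partial path and step count
        if cell == goal:
            break
    return path

# Usage: train on a 6x6 sector; the path list doubles as the partial-path
# record the backtracking scheme would carry when migrating between nodes.
obstacles = {(2, 2), (2, 3), (3, 2)}
q_table = {}
for _ in range(300):
    path = run_episode(q_table, (0, 0), (5, 5), obstacles, size=6)
print("steps in final episode:", len(path) - 1)
```

The internode side can be sketched in the same spirit. The following token-ring example uses mpi4py, an assumption, since the paper does not specify an MPI binding here. Each rank stands in for one environment sector, and a token circulating around the ring carries the best path length found so far, so agents on different nodes share learned information. Run with, e.g., `mpiexec -n 4 python token_ring.py`.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
local_best = 40 + rank * 3      # stand-in for this node's best step count

for lap in range(2):            # two laps spread the global best to all ranks
    if rank == 0 and lap == 0:
        token = {"best_steps": local_best}       # rank 0 injects the token
    else:
        token = comm.recv(source=(rank - 1) % size, tag=7)
        token["best_steps"] = min(token["best_steps"], local_best)
    comm.send(token, dest=(rank + 1) % size, tag=7)

if rank == 0:                   # retire the token after the final lap
    token = comm.recv(source=size - 1, tag=7)
    print("globally best path length:", token["best_steps"])
```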



Author information


Correspondence to Dalila B. Megherbi.


About this article

Cite this article

Megherbi, D.B., Malayia, V. A hybrid cognitive/reactive intelligent agent autonomous path planning technique in a networked-distributed unstructured environment for reinforcement learning. J Supercomput 59, 1188–1217 (2012). https://doi.org/10.1007/s11227-010-0510-3
