Abstract
This paper proposes a path planning technique for autonomous agents located in an unstructured, networked, distributed environment, where each agent has only limited, incomplete knowledge of the environment. Each agent knows only the information available in the distributed memory of the computing node it runs on, and the agents share some of the information they learn over a distributed network. In particular, the environment is divided into several sectors, each hosted on a separate distributed computing node. We consider hybrid reactive-cognitive agents whose autonomous motion planning combines a potential field model with reinforcement learning and boundary detection algorithms. The potential field provides fast convergence toward a path in the distributed environment, while reinforcement learning guarantees a variety of behaviors and consistent convergence. We show how the combination of the two techniques enhances the agent's decision-making process in a distributed environment. Furthermore, path retracing is a challenging problem in a distributed environment, since no agent has complete knowledge of the environment. We propose a backtracking technique that keeps each distributed agent informed at all times of its path information and step count, including when it migrates from one node to another. Note that no node has knowledge of the entire global path from a source to a goal when that goal resides on a different node: each agent knows only the partial path internal to a node, and the number of steps corresponding to the portion of the path it traversed while running on that node.
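The hybrid reactive-cognitive idea described above can be illustrated with a small sketch. The grid size, reward values, and the weight that blends the potential field into action selection below are all illustrative assumptions, not the paper's parameters: a tabular Q-learner explores one sector, while an attractive/repulsive potential over neighboring cells biases tie-breaking so the agent heads toward the goal even before any Q-values are learned.

```python
import random

# Illustrative sector: a 5x5 grid with a goal cell and two obstacle cells.
# All constants here are assumptions for the sketch, not the paper's values.
GOAL = (4, 4)
OBSTACLES = {(2, 2), (2, 3)}
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def potential(cell):
    # Attractive term: negative Manhattan distance to the goal;
    # repulsive term: large penalty on obstacle cells.
    attract = -(abs(GOAL[0] - cell[0]) + abs(GOAL[1] - cell[1]))
    repel = -100.0 if cell in OBSTACLES else 0.0
    return attract + repel

def step(cell, action):
    nx, ny = cell[0] + action[0], cell[1] + action[1]
    if not (0 <= nx < 5 and 0 <= ny < 5) or (nx, ny) in OBSTACLES:
        return cell, -1.0                      # blocked: stay put, small penalty
    return (nx, ny), (10.0 if (nx, ny) == GOAL else -0.1)

Q = {}  # tabular Q-values keyed by (cell, action)

def choose(cell):
    # Epsilon-greedy over Q; a small potential-field bias breaks ties so the
    # reactive layer steers the agent before the cognitive layer has learned.
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS,
               key=lambda a: Q.get((cell, a), 0.0)
                             + 0.01 * potential(step(cell, a)[0]))

def train(episodes=300):
    for _ in range(episodes):
        cell = (0, 0)
        for _ in range(50):
            a = choose(cell)
            nxt, r = step(cell, a)
            old = Q.get((cell, a), 0.0)
            best_next = max(Q.get((nxt, b), 0.0) for b in ACTIONS)
            Q[(cell, a)] = old + ALPHA * (r + GAMMA * best_next - old)
            cell = nxt
            if cell == GOAL:
                break

def greedy_path(start=(0, 0), max_steps=50):
    # Roll out the learned policy (no exploration) and record the path.
    cell, path = start, [start]
    for _ in range(max_steps):
        a = max(ACTIONS,
                key=lambda b: Q.get((cell, b), 0.0)
                              + 0.01 * potential(step(cell, b)[0]))
        cell, _ = step(cell, a)
        path.append(cell)
        if cell == GOAL:
            break
    return path

random.seed(0)
train()
```

In a distributed setting, each node would run this loop over its own sector only; the backtracking bookkeeping would carry the accumulated step count (`len(path) - 1` here) with the agent when it migrates to the next node.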
In particular, we show how each agent, starting in one of the many sectors with no initial knowledge of the environment, uses the proposed distributed technique to develop its intelligence from experience and seamlessly discover the shortest global path to the target, which is located on a different node, while avoiding any obstacles it encounters along the way, including when transitioning and migrating from one distributed computing node to another. The agents use a multiple-token-ring message passing interface (MPI) scheme for internode communication. Finally, the experimental results of the proposed method show that single agents and multiple agents sharing the same goal, running on the same or different nodes, successfully coordinate the sharing of their respective environment states and information to perform their tasks collaboratively. The results also show that information-sharing distributed multiagent systems converge to the optimal shortest path to the goal an order of magnitude faster than a single agent or non-information-sharing multiple agents.
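The token-ring information sharing can be sketched as follows. This is a single-process stand-in for the MPI scheme, with illustrative names throughout: in the actual system each sector runs on its own node and tokens travel via MPI point-to-point messages, whereas here a dictionary "token" simply circulates around a list of per-node Q-tables, accumulating and redistributing what each node has learned.

```python
NUM_NODES = 4  # illustrative ring size; one Q-table per sector node

def merge(local, incoming):
    # Keep the best (maximum) Q estimate seen so far for each entry.
    for key, val in incoming.items():
        local[key] = max(local.get(key, float("-inf")), val)
    return local

def circulate(tables, rounds=2):
    """Pass a token around the ring of nodes.

    On receiving the token, each node first absorbs the entries other
    nodes contributed, then adds its own learned entries before
    forwarding the token to its successor. Two full rounds suffice for
    every entry to reach every node in this sketch.
    """
    token = {}
    for _ in range(rounds):
        for rank in range(NUM_NODES):
            merge(tables[rank], token)   # absorb what others learned
            merge(token, tables[rank])   # contribute own knowledge
    return tables
```

With real MPI, the inner loop would be replaced by each rank receiving the token from its predecessor and sending the merged token to its successor; multiple tokens can circulate concurrently to reduce waiting, which is the idea behind the multiple-token-ring variant.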
Cite this article
Megherbi, D.B., Malayia, V. A hybrid cognitive/reactive intelligent agent autonomous path planning technique in a networked-distributed unstructured environment for reinforcement learning. J Supercomput 59, 1188–1217 (2012). https://doi.org/10.1007/s11227-010-0510-3