
Adaptive replication of large-scale multi-agent systems: towards a fault-tolerant multi-agent platform

Published: 15 May 2005

Abstract

To construct and deploy large-scale multi-agent systems, we must address one of the fundamental issues of distributed systems: the possibility of partial failures. Fault tolerance is therefore an unavoidable concern for large-scale multi-agent systems. In this paper, we discuss the issues and propose an approach to fault tolerance for multi-agent systems. The starting idea is to apply replication strategies to agents, replicating the most critical agents to guard against failures. Because the criticality of agents may evolve during the course of computation and problem solving, and because resources are bounded, we need to dynamically and automatically adapt the number of replicas of each agent in order to maximize reliability and availability. We describe our approach and the related mechanisms for evaluating the criticality of a given agent (based on application-level semantic information, e.g., interdependences, and on system-level statistical information, e.g., communication load) and for deciding which strategy to apply (e.g., active or passive replication) and how to parameterize it (e.g., number of replicas). We also report on experiments conducted with our prototype architecture, named DimaX.
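To make the idea of criticality-driven replication concrete, the following is a minimal sketch and not the DimaX mechanism itself. It assumes that criticality is a weighted blend of one application-level signal (number of dependent agents) and one system-level signal (message count), and that a bounded budget of extra replicas is shared in proportion to that score. The names `AgentStats`, `criticality`, `allocate_replicas`, and the weight `w_semantic` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentStats:
    """Hypothetical per-agent observations; field names are illustrative."""
    name: str
    dependents: int  # application-level: agents that depend on this one
    messages: int    # system-level: messages handled in the last window


def criticality(agent, max_dependents, max_messages, w_semantic=0.5):
    """Blend normalized semantic and statistical signals into a score in [0, 1].

    The linear blend and its weight are assumptions, not the paper's formula.
    """
    semantic = agent.dependents / max_dependents if max_dependents else 0.0
    statistical = agent.messages / max_messages if max_messages else 0.0
    return w_semantic * semantic + (1.0 - w_semantic) * statistical


def allocate_replicas(agents, budget):
    """Share a bounded budget of extra replicas in proportion to criticality.

    Every agent keeps one copy. Extras are distributed with the
    largest-remainder method, so exactly `budget` extras are handed out
    whenever at least one agent has a non-zero score.
    """
    max_dep = max((a.dependents for a in agents), default=0)
    max_msg = max((a.messages for a in agents), default=0)
    scores = {a.name: criticality(a, max_dep, max_msg) for a in agents}
    total = sum(scores.values())
    if total == 0 or budget <= 0:
        return {a.name: 1 for a in agents}

    shares = {name: budget * s / total for name, s in scores.items()}
    extras = {name: int(share) for name, share in shares.items()}
    leftover = budget - sum(extras.values())

    # Give the remaining replicas to the largest fractional shares.
    by_remainder = sorted(shares, key=lambda n: shares[n] - extras[n], reverse=True)
    for name in by_remainder[:leftover]:
        extras[name] += 1
    return {name: 1 + e for name, e in extras.items()}
```

Under these assumptions, calling `allocate_replicas` again whenever the observed dependences or communication load change is what makes the replica counts adaptive. A real platform would also have to choose between active and passive replication per agent, which this sketch leaves out.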


Published In

ACM SIGSOFT Software Engineering Notes, Volume 30, Issue 4
July 2005
1514 pages
ISSN: 0163-5948
DOI: 10.1145/1082983

SELMAS '05: Proceedings of the fourth international workshop on Software engineering for large-scale multi-agent systems
May 2005
92 pages
ISBN: 1595931163
DOI: 10.1145/1082960

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article
