ABSTRACT
In this article, we propose an original method for providing fault tolerance in multi-agent systems. The method builds an automatic, adaptive replication policy that addresses a resource allocation problem: deciding where agents must be replicated to minimize the impact of failures. The policy takes into account the criticality of the agents and the reliability of the machines. We then propose different heuristics for allocating the available resources, and we present measurements assessing the effectiveness of our approach.
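To make the allocation problem concrete, the following is a minimal sketch of one possible greedy heuristic. It is not the heuristics proposed in the article, which the abstract does not detail. It assumes that each agent has a numeric criticality, that each machine fails independently with a known probability and has a fixed replica capacity, and that an agent is lost only if every machine hosting one of its replicas fails. All names (`expected_loss`, `greedy_replication`, the agents and machines in the example) are hypothetical.

```python
from math import prod


def expected_loss(criticality, failure_prob, placement):
    """Sum over agents of criticality times the probability that all hosts fail.

    Assumes machine failures are independent (an assumption of this sketch).
    """
    return sum(
        criticality[agent] * prod(failure_prob[m] for m in hosts)
        for agent, hosts in placement.items()
    )


def greedy_replication(criticality, failure_prob, capacity, placement, budget):
    """Add up to `budget` replicas, one at a time.

    Each step picks the (agent, machine) pair that reduces the expected
    loss the most, skipping machines that are full or already host the agent.
    """
    placement = {agent: set(hosts) for agent, hosts in placement.items()}
    load = {m: 0 for m in failure_prob}
    for hosts in placement.values():
        for m in hosts:
            load[m] += 1

    for _ in range(budget):
        best = None  # (gain, agent, machine)
        for agent, hosts in placement.items():
            p_all_fail = prod(failure_prob[m] for m in hosts)
            for m, p in failure_prob.items():
                if m in hosts or load[m] >= capacity[m]:
                    continue
                # Adding a replica on m multiplies the agent's loss probability by p.
                gain = criticality[agent] * p_all_fail * (1.0 - p)
                if best is None or gain > best[0]:
                    best = (gain, agent, m)
        if best is None:  # no feasible placement left
            break
        _, agent, m = best
        placement[agent].add(m)
        load[m] += 1
    return placement


# Hypothetical example: a critical agent on an unreliable machine.
criticality = {"planner": 10.0, "logger": 1.0}
failure_prob = {"m1": 0.1, "m2": 0.2, "m3": 0.5}
capacity = {"m1": 2, "m2": 2, "m3": 2}
initial = {"planner": {"m3"}, "logger": {"m1"}}

result = greedy_replication(criticality, failure_prob, capacity, initial, budget=2)
```

In this example both extra replicas go to the critical agent, because protecting it reduces the expected loss far more than protecting the low-criticality one. A greedy choice like this is simple but offers no optimality guarantee in general.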
Index Terms
Dynamic resource allocation heuristics for providing fault tolerance in multi-agent systems