
Evaluating fault tolerance approaches in multi-agent systems

Autonomous Agents and Multi-Agent Systems

Abstract

A multi-agent system (MAS) is a distributed system that consists of multiple agents working together to solve shared problems. Although MASs are well suited to the development of complex distributed systems, their real-world use remains limited. One of the main reasons is that MASs are very fragile. In a typical large-scale MAS, the rate of failure grows with the number of hosts, the number of deployed agents, and the duration of the agents' task execution. For this reason, numerous approaches have been introduced to deal with aspects of failure handling. However, the absence of centralized control and the large number of individual intelligent components make it difficult to detect and treat errors. The risk of uncontrollable fault propagation is high and can seriously degrade system performance. Two important factors limit the usage of MASs: (1) existing fault tolerance (FT) approaches are not generic, as each focuses on and improves specific issues of FT; and (2) despite the plethora of available FT approaches and theories, there is a remarkable lack of general metrics, tools, benchmarks, and experimental methods for formal validation and comparison of existing or newly developed FT approaches. As FT in MASs becomes a well-established field, generalized, standardized evaluation of FT approaches becomes imperative. In this paper, we first present a detailed overview of existing FT solutions, approaches, and techniques in MASs hosted on agent platforms. From that overview, we derive the commonalities in existing research. Next, we present the main contribution of our paper: an evaluation methodology, with a set of metrics, for comparing FT approaches in MASs. We adopt an engineering perspective on the problem, defining a methodology and metrics that are both implementation- and domain-independent. The metrics are formalized with a directed acyclic graph.
By using our methodology, evaluators can select an appropriate FT approach for a targeted MAS application, thus improving MAS usability, stability, and development speed. To show the viability of our approach, we present a case study that compares two FT approaches for a targeted MAS. The case study results show that our methodology can be used to select an appropriate FT approach for the targeted MAS.



Author information

Corresponding author

Correspondence to Rade Stanković.

Cite this article

Stanković, R., Štula, M. & Maras, J. Evaluating fault tolerance approaches in multi-agent systems. Auton Agent Multi-Agent Syst 31, 151–177 (2017). https://doi.org/10.1007/s10458-015-9320-6

  • DOI: https://doi.org/10.1007/s10458-015-9320-6
