Benchmarking Large Language Models for Multi-agent Systems: A Comparative Analysis of AutoGen, CrewAI, and TaskWeaver

Barbarroxa, Rafael; Gomes, Luis; Vale, Zita

doi:10.1007/978-3-031-70415-4_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15157)

Included in the following conference series:

International Conference on Practical Applications of Agents and Multi-Agent Systems

Abstract

This paper benchmarks three multi-agent frameworks powered by large language models: AutoGen, CrewAI, and TaskWeaver. Large language models have emerged as powerful tools able to assist users in various areas, and integrating them into multi-agent systems increases their potential for collaborative problem-solving. The study evaluates the frameworks' performance through a case study involving a machine learning code generation task. Each solution is asked to create an energy forecasting model from the same base dataset. The generated code is then tested on a new dataset, and model performance is measured using the root mean square error (RMSE). All three solutions were able to provide results using multiple large language models. The best result was achieved by TaskWeaver using GPT-3.5, with an RMSE of 25.04.
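
The evaluation metric named in the abstract, root mean square error, can be illustrated with a minimal sketch. The function name `rmse` and the sample values below are assumptions made for illustration only; they are not the paper's code, dataset, or results.

```python
import math


def rmse(actual, predicted):
    """Return the root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical energy readings and forecasts (not taken from the paper).
actual = [100.0, 120.0, 90.0, 110.0]
predicted = [103.0, 116.0, 90.0, 110.0]

score = rmse(actual, predicted)
print(f"RMSE: {score:.2f}")
```

A lower value means the forecasts are closer to the observed values, which is why the abstract reports the lowest error as the best result.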

Acknowledgments

This work has been supported by the European Union under the Next Generation EU, through a grant of the Portuguese Republic’s Recovery and Resilience Plan (PRR) Partnership Agreement, within the scope of the project PRODUTECH R3 – “Agenda Mobilizadora da Fileira das Tecnologias de Produção para a Reindustrialização”, Total project investment: 166.988.013,71 Euros; Total Grant: 97.111.730,27 Euros. The authors acknowledge the work facilities and equipment provided by GECAD research center (UIDB/00760/2020), DOI: https://doi.org/10.54499/UIDB/00760/2020 to the project team.

Author information

Authors and Affiliations

GECAD – Research Group on Intelligent Engineering and Computing for Advanced Innovation and Development, LASI – Intelligent Systems Associate Laboratory, Polytechnic of Porto, R. Dr. António Bernardino de Almeida, 431, 4249-015, Porto, Portugal
Rafael Barbarroxa, Luis Gomes & Zita Vale

Corresponding author

Correspondence to Zita Vale.

Editor information

Editors and Affiliations

University of Lille, Villeneuve d’Ascq, France
Philippe Mathieu
University of Salamanca, Salamanca, Spain
Fernando De la Prieta

Ethics declarations

The authors have no competing interests to declare that are relevant to the content of this article.

Cite this paper

Barbarroxa, R., Gomes, L., Vale, Z. (2025). Benchmarking Large Language Models for Multi-agent Systems: A Comparative Analysis of AutoGen, CrewAI, and TaskWeaver. In: Mathieu, P., De la Prieta, F. (eds) Advances in Practical Applications of Agents, Multi-Agent Systems, and Digital Twins: The PAAMS Collection. PAAMS 2024. Lecture Notes in Computer Science, vol 15157. Springer, Cham. https://doi.org/10.1007/978-3-031-70415-4_4

DOI: https://doi.org/10.1007/978-3-031-70415-4_4
Published: 19 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70414-7
Online ISBN: 978-3-031-70415-4
eBook Packages: Computer Science, Computer Science (R0)
