Benchmarking Large Language Models for Multi-agent Systems: A Comparative Analysis of AutoGen, CrewAI, and TaskWeaver

  • Conference paper
Advances in Practical Applications of Agents, Multi-Agent Systems, and Digital Twins: The PAAMS Collection (PAAMS 2024)

Abstract

This paper benchmarks three multi-agent systems powered by large language models, presenting a comparative analysis of AutoGen, CrewAI, and TaskWeaver. Large language models have emerged as powerful tools that can assist users across many domains, and integrating them into multi-agent systems increases the potential of these systems for collaborative problem-solving. This study evaluates the frameworks' performance on a case study involving a machine learning code generation task: each framework is asked to create energy forecasting models from the same base dataset. Once the code is produced, the resulting models are tested on a new dataset, and their performance is measured with the root mean square error (RMSE). All three frameworks produced results with multiple large language models. The best result was achieved by TaskWeaver using GPT-3.5, with an RMSE of 25.04.
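The abstract ranks the generated forecasting models by RMSE on a separate test dataset. As a minimal sketch of that metric only (the authors' evaluation code is not shown on this page), RMSE is the square root of the mean squared difference between observed values y_i and forecasts ŷ_i: RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²). The `rmse` function and the sample values below are hypothetical illustrations, not data from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    # Square each residual, average them, then take the square root.
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical hold-out data: observed energy values vs. model forecasts.
observed = [120.0, 135.0, 150.0, 160.0]
forecast = [110.0, 140.0, 145.0, 170.0]
print(rmse(observed, forecast))  # ≈ 7.906
```

Because residuals are squared before averaging, RMSE penalizes large misses more heavily than the mean absolute error does, and it is expressed in the same units as the forecast target.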


Acknowledgments

This work has been supported by the European Union under the Next Generation EU, through a grant of the Portuguese Republic’s Recovery and Resilience Plan (PRR) Partnership Agreement, within the scope of the project PRODUTECH R3 – “Agenda Mobilizadora da Fileira das Tecnologias de Produção para a Reindustrialização”, Total project investment: 166.988.013,71 Euros; Total Grant: 97.111.730,27 Euros. The authors acknowledge the work facilities and equipment provided to the project team by the GECAD research center (UIDB/00760/2020, DOI: https://doi.org/10.54499/UIDB/00760/2020).

Author information

Corresponding author

Correspondence to Zita Vale.

Ethics declarations

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Barbarroxa, R., Gomes, L., Vale, Z. (2025). Benchmarking Large Language Models for Multi-agent Systems: A Comparative Analysis of AutoGen, CrewAI, and TaskWeaver. In: Mathieu, P., De la Prieta, F. (eds) Advances in Practical Applications of Agents, Multi-Agent Systems, and Digital Twins: The PAAMS Collection. PAAMS 2024. Lecture Notes in Computer Science, vol 15157. Springer, Cham. https://doi.org/10.1007/978-3-031-70415-4_4

  • DOI: https://doi.org/10.1007/978-3-031-70415-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70414-7

  • Online ISBN: 978-3-031-70415-4

  • eBook Packages: Computer Science, Computer Science (R0)
