Abstract
This paper introduces MineLlama, a lightweight framework that uses a locally deployed large language model, Llama, to enhance decision-making in the sandbox game Minecraft without relying on external APIs. MineLlama is organized as a two-layer framework consisting of a planning module and an executing module. The planning module employs retrieval-augmented generation (RAG) with a query engine built on Minecraft recipe information to decompose a final goal into a series of interdependent subgoals. The executing module likewise uses RAG, with a query engine informed by general Minecraft knowledge, to guide the agent in choosing appropriate actions. The framework's effectiveness is demonstrated through evaluations on 7 diverse Minecraft tasks, showcasing its ability to guide agents toward specific goals. All materials, including code and videos, are available at: https://minecraftagents.github.io/MineLlama_hp/.
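To make the two-layer design concrete, below is a minimal sketch of how a planning module (RAG over recipe documents) and an executing module (RAG over general game knowledge) could be wired around a local Llama model. It is an illustration under stated assumptions, not the authors' implementation: the identifiers query_llama, RECIPE_DOCS, and KNOWLEDGE_DOCS, as well as the keyword-overlap retriever standing in for a real query engine, are all hypothetical.

```python
# Hypothetical sketch of a two-layer RAG agent loop in the spirit of MineLlama.
# All names and data here are illustrative; query_llama() must be wired to a
# locally hosted Llama inference endpoint of your choosing.

# Toy corpus for the planning module's query engine (recipe information).
RECIPE_DOCS = {
    "planks": "Craft 4 planks from 1 log.",
    "stick": "Craft 4 sticks from 2 planks.",
    "crafting_table": "Craft a crafting table from 4 planks.",
    "wooden_pickaxe": "Craft a wooden pickaxe from 3 planks and 2 sticks on a crafting table.",
}

# Toy corpus for the executing module's query engine (general game knowledge).
KNOWLEDGE_DOCS = [
    "Logs are obtained by punching or chopping trees.",
    "A crafting table is required for 3x3 recipes such as pickaxes.",
]

def retrieve(query: str, docs) -> list[str]:
    """Naive keyword-overlap retrieval standing in for a RAG query engine."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.lower().split())), d) for d in docs]
    return [d for score, d in sorted(scored, reverse=True) if score > 0][:3]

def query_llama(prompt: str) -> str:
    """Placeholder for a call to a locally deployed Llama model (no external API)."""
    raise NotImplementedError("wire this to your local Llama inference endpoint")

def plan(goal: str) -> list[str]:
    """Planning module: retrieve recipes, then ask Llama for ordered subgoals."""
    context = "\n".join(retrieve(goal, RECIPE_DOCS.values()))
    prompt = (f"Recipes:\n{context}\n\n"
              f"Decompose the goal '{goal}' into ordered subgoals, one per line.")
    return query_llama(prompt).splitlines()

def act(subgoal: str) -> str:
    """Executing module: retrieve general knowledge, then ask Llama for the next action."""
    context = "\n".join(retrieve(subgoal, KNOWLEDGE_DOCS))
    prompt = (f"Knowledge:\n{context}\n\n"
              f"Choose the next low-level action to achieve: {subgoal}")
    return query_llama(prompt)

if __name__ == "__main__":
    # The retrieval half runs as-is; the Llama calls require a local endpoint.
    print(retrieve("craft a wooden pickaxe", RECIPE_DOCS.values()))
```

In this sketch each module queries its own corpus, mirroring the paper's separation of recipe information (planning) from general Minecraft knowledge (execution); the subgoals returned by plan() would be fed to act() one at a time.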
Acknowledgement
This work was supported by JST CREST (JPMJCR20D1) and JSPS Grant-in-Aid for Scientific Research (C) (23K11230), Japan.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ding, S., Ito, T. (2025). MineLlama: Llama with Retrieval-Augmented Generation as A Decision Maker in Minecraft. In: Mathieu, P., De la Prieta, F. (eds) Advances in Practical Applications of Agents, Multi-Agent Systems, and Digital Twins: The PAAMS Collection. PAAMS 2024. Lecture Notes in Computer Science, vol 15157. Springer, Cham. https://doi.org/10.1007/978-3-031-70415-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70414-7
Online ISBN: 978-3-031-70415-4