Abstract
This paper introduces MineLlama, a lightweight framework that uses a locally deployed large language model, Llama, to enhance decision-making in the sandbox game Minecraft without relying on external APIs. MineLlama is organized as a two-layer framework consisting of a planning module and an executing module. The planning module employs retrieval-augmented generation (RAG) with a query engine built on Minecraft recipe information to decompose a final goal into a series of interdependent subgoals. The executing module likewise uses RAG, with a query engine informed by general Minecraft knowledge, to guide the agent in choosing appropriate actions. The framework's effectiveness is demonstrated through evaluations on 7 diverse Minecraft tasks, showcasing its ability to guide agents toward specific goals. All materials, including code and videos, are available at: https://minecraftagents.github.io/MineLlama_hp/.
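To make the two-layer design concrete, below is a minimal sketch of how a planning module (RAG over recipe documents) and an executing module (RAG over general game knowledge) could be wired around a local Llama model. It is an illustration under stated assumptions, not the authors' implementation: the identifiers query_llama, RECIPE_DOCS, and KNOWLEDGE_DOCS, as well as the keyword-overlap retriever standing in for a real query engine, are all hypothetical.

```python
# Hypothetical sketch of a two-layer RAG agent loop in the spirit of MineLlama.
# All names and data here are illustrative; query_llama() must be wired to a
# locally hosted Llama inference endpoint of your choosing.

# Toy corpus for the planning module's query engine (recipe information).
RECIPE_DOCS = {
    "planks": "Craft 4 planks from 1 log.",
    "stick": "Craft 4 sticks from 2 planks.",
    "crafting_table": "Craft a crafting table from 4 planks.",
    "wooden_pickaxe": "Craft a wooden pickaxe from 3 planks and 2 sticks on a crafting table.",
}

# Toy corpus for the executing module's query engine (general game knowledge).
KNOWLEDGE_DOCS = [
    "Logs are obtained by punching or chopping trees.",
    "A crafting table is required for 3x3 recipes such as pickaxes.",
]

def retrieve(query: str, docs) -> list[str]:
    """Naive keyword-overlap retrieval standing in for a RAG query engine."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.lower().split())), d) for d in docs]
    return [d for score, d in sorted(scored, reverse=True) if score > 0][:3]

def query_llama(prompt: str) -> str:
    """Placeholder for a call to a locally deployed Llama model (no external API)."""
    raise NotImplementedError("wire this to your local Llama inference endpoint")

def plan(goal: str) -> list[str]:
    """Planning module: retrieve recipes, then ask Llama for ordered subgoals."""
    context = "\n".join(retrieve(goal, RECIPE_DOCS.values()))
    prompt = (f"Recipes:\n{context}\n\n"
              f"Decompose the goal '{goal}' into ordered subgoals, one per line.")
    return query_llama(prompt).splitlines()

def act(subgoal: str) -> str:
    """Executing module: retrieve general knowledge, then ask Llama for the next action."""
    context = "\n".join(retrieve(subgoal, KNOWLEDGE_DOCS))
    prompt = (f"Knowledge:\n{context}\n\n"
              f"Choose the next low-level action to achieve: {subgoal}")
    return query_llama(prompt)

if __name__ == "__main__":
    # The retrieval half runs as-is; the Llama calls require a local endpoint.
    print(retrieve("craft a wooden pickaxe", RECIPE_DOCS.values()))
```

In this sketch each module queries its own corpus, mirroring the paper's separation of recipe information (planning) from general Minecraft knowledge (execution); the subgoals returned by plan() would be fed to act() one at a time.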
Acknowledgement
This work was supported by JST CREST (JPMJCR20D1) and JSPS Grant-in-Aid for Scientific Research (C) (23K11230), Japan.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ding, S., Ito, T. (2025). MineLlama: Llama with Retrieval-Augmented Generation as A Decision Maker in Minecraft. In: Mathieu, P., De la Prieta, F. (eds) Advances in Practical Applications of Agents, Multi-Agent Systems, and Digital Twins: The PAAMS Collection. PAAMS 2024. Lecture Notes in Computer Science, vol 15157. Springer, Cham. https://doi.org/10.1007/978-3-031-70415-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70414-7
Online ISBN: 978-3-031-70415-4