Abstract
In the realm of data-driven AI technology, the application of open-source large language models (LLMs) to robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage large task planning datasets to enhance the models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which require comprehending more context and generating longer action sequences. This paper addresses this limitation by proposing MLDT, a Multi-Level Decomposition Task planning method that decomposes tasks at the goal level, task level, and action level to mitigate the difficulty of complex long-horizon tasks. To enhance open-source LLMs' planning abilities, we introduce a goal-sensitive corpus generation method that creates high-quality training data, and we conduct instruction tuning on the generated corpus. Because existing datasets are not sufficiently complex, we construct a more challenging dataset, LongTasks, to specifically evaluate planning ability on complex long-horizon tasks. We evaluate our method with various LLMs on four datasets in VirtualHome. The results demonstrate a significant performance improvement in robotic task planning, showing MLDT's effectiveness in overcoming the limitations of existing methods based on open-source LLMs, as well as its practicality in complex, real-world scenarios. (Our code is available at https://github.com/wuyike2000/MLDT.)
Y. Wu and J. Zhang contributed equally to this research.
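To make the multi-level idea concrete, here is a minimal sketch of how goal-level, task-level, and action-level decomposition could be chained with an instruction-tuned LLM. The `llm` callable, the prompt wording, and the line-based output format are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of multi-level task decomposition as described in the
# abstract: a long-horizon goal is split into sub-goals, each sub-goal into
# tasks, and each task into executable actions. The `llm` callable and the
# prompt wording are hypothetical placeholders, not the authors' code.
from typing import Callable, List

def decompose(llm: Callable[[str], str], goal: str, observation: str) -> List[str]:
    """Return a flat action sequence for `goal` via three decomposition levels."""
    # Goal level: split the long-horizon goal into ordered sub-goals.
    sub_goals = llm(
        f"Environment: {observation}\n"
        f"Goal: {goal}\n"
        "Decompose this goal into an ordered list of sub-goals, one per line."
    ).splitlines()

    actions: List[str] = []
    for sub_goal in sub_goals:
        # Task level: turn each sub-goal into concrete tasks.
        tasks = llm(
            f"Environment: {observation}\n"
            f"Sub-goal: {sub_goal}\n"
            "List the tasks needed to achieve this sub-goal, one per line."
        ).splitlines()
        for task in tasks:
            # Action level: emit simulator-executable primitives, e.g.
            # VirtualHome-style steps such as "[Walk] <kitchen>".
            actions += llm(
                f"Environment: {observation}\n"
                f"Task: {task}\n"
                "List the executable actions for this task, one per line."
            ).splitlines()
    return actions
```

In this sketch, each prompt conditions only on the environment observation and its parent item, so every individual completion stays short; this is the intuition behind decomposition easing the long-context, long-sequence burden of complex long-horizon tasks.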
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China under Grant No. U21A20488. We thank the Big Data Computing Center of Southeast University for providing facility support for the numerical calculations in this paper.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, Y. et al. (2024). MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14854. Springer, Singapore. https://doi.org/10.1007/978-981-97-5569-1_16
Print ISBN: 978-981-97-5568-4
Online ISBN: 978-981-97-5569-1