Abstract
In the realm of data-driven AI technology, the application of open-source large language models (LLMs) to robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage large task planning datasets to enhance the models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which require comprehending more context and generating longer action sequences. This paper addresses this limitation by proposing MLDT, a Multi-Level Decomposition Task planning method that decomposes tasks at the goal level, task level, and action level to mitigate the difficulty of complex long-horizon tasks. To enhance open-source LLMs' planning abilities, we introduce a goal-sensitive corpus generation method that creates high-quality training data, and we conduct instruction tuning on the generated corpus. Because existing datasets are not sufficiently complex, we construct a more challenging dataset, LongTasks, to specifically evaluate planning ability on complex long-horizon tasks. We evaluate our method with various LLMs on four datasets in VirtualHome. The results demonstrate a significant performance improvement in robotic task planning, showing MLDT's effectiveness in overcoming the limitations of existing methods based on open-source LLMs, as well as its practicality in complex, real-world scenarios. (Our code is available at https://github.com/wuyike2000/MLDT.)
Y. Wu and J. Zhang contributed equally to this research.
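To make the multi-level idea concrete, here is a minimal sketch of how goal-level, task-level, and action-level decomposition could be chained with an instruction-tuned LLM. The `llm` callable, the prompt wording, and the line-based output format are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of multi-level task decomposition as described in the
# abstract: a long-horizon goal is split into sub-goals, each sub-goal into
# tasks, and each task into executable actions. The `llm` callable and the
# prompt wording are hypothetical placeholders, not the authors' code.
from typing import Callable, List

def decompose(llm: Callable[[str], str], goal: str, observation: str) -> List[str]:
    """Return a flat action sequence for `goal` via three decomposition levels."""
    # Goal level: split the long-horizon goal into ordered sub-goals.
    sub_goals = llm(
        f"Environment: {observation}\n"
        f"Goal: {goal}\n"
        "Decompose this goal into an ordered list of sub-goals, one per line."
    ).splitlines()

    actions: List[str] = []
    for sub_goal in sub_goals:
        # Task level: turn each sub-goal into concrete tasks.
        tasks = llm(
            f"Environment: {observation}\n"
            f"Sub-goal: {sub_goal}\n"
            "List the tasks needed to achieve this sub-goal, one per line."
        ).splitlines()
        for task in tasks:
            # Action level: emit simulator-executable primitives, e.g.
            # VirtualHome-style steps such as "[Walk] <kitchen>".
            actions += llm(
                f"Environment: {observation}\n"
                f"Task: {task}\n"
                "List the executable actions for this task, one per line."
            ).splitlines()
    return actions
```

In this sketch, each prompt conditions only on the environment observation and its parent item, so every individual completion stays short; this is the intuition behind decomposition easing the long-context, long-sequence burden of complex long-horizon tasks.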
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China under Grant No. U21A20488. We thank the Big Data Computing Center of Southeast University for providing facility support for the numerical calculations in this paper.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, Y. et al. (2024). MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14854. Springer, Singapore. https://doi.org/10.1007/978-981-97-5569-1_16
Print ISBN: 978-981-97-5568-4
Online ISBN: 978-981-97-5569-1