
MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14854)


Abstract

In the realm of data-driven AI technology, the application of open-source large language models (LLMs) to robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage large-scale task planning datasets to enhance models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which require comprehending more context and generating longer action sequences. This paper addresses this limitation by proposing MLDT, a Multi-Level Decomposition Task planning method. MLDT decomposes tasks at the goal, task, and action levels to mitigate the challenge of complex long-horizon tasks. To enhance open-source LLMs' planning abilities, we introduce a goal-sensitive corpus generation method that creates high-quality training data and conduct instruction tuning on the generated corpus. Because existing datasets are not sufficiently complex, we construct a more challenging dataset, LongTasks, to specifically evaluate planning ability on complex long-horizon tasks. We evaluate our method with various LLMs on four datasets in VirtualHome. Our results demonstrate a significant performance improvement in robotic task planning, showcasing MLDT's effectiveness in overcoming the limitations of existing methods based on open-source LLMs as well as its practicality in complex, real-world scenarios. (Our code is available at https://github.com/wuyike2000/MLDT.)

Y. Wu and J. Zhang contributed equally to this research.
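The abstract only names the three decomposition levels; the short Python sketch below illustrates how a goal-level, task-level, and action-level chain of prompts could be composed so that each individual LLM call sees a short context and produces a short output. It is a minimal illustration under stated assumptions: the prompt wording, the function names, and the `llm` callable are hypothetical and are not taken from the authors' released code.

```python
# Hypothetical sketch of a three-level decomposition planner (goal -> task -> action).
# Function names, prompts, and the `llm` callable are illustrative assumptions,
# not the MLDT authors' implementation.
from typing import Callable, List

LLM = Callable[[str], str]  # takes a prompt string, returns the model's text reply


def _lines(reply: str) -> List[str]:
    """Split a model reply into non-empty lines."""
    return [line.strip() for line in reply.splitlines() if line.strip()]


def decompose_goal(llm: LLM, goal: str, scene: str) -> List[str]:
    """Goal level: split a high-level household goal into sub-goals."""
    return _lines(llm(f"Scene: {scene}\nGoal: {goal}\nList the sub-goals, one per line."))


def decompose_task(llm: LLM, sub_goal: str, scene: str) -> List[str]:
    """Task level: turn one sub-goal into an ordered list of tasks."""
    return _lines(llm(f"Scene: {scene}\nSub-goal: {sub_goal}\nList the tasks, one per line."))


def decompose_action(llm: LLM, task: str, scene: str) -> List[str]:
    """Action level: ground one task into executable primitive actions."""
    return _lines(llm(f"Scene: {scene}\nTask: {task}\nList the actions, one per line."))


def plan(llm: LLM, goal: str, scene: str) -> List[str]:
    """Chain the three levels; each call only handles a short, focused prompt."""
    actions: List[str] = []
    for sub_goal in decompose_goal(llm, goal, scene):
        for task in decompose_task(llm, sub_goal, scene):
            actions.extend(decompose_action(llm, task, scene))
    return actions


if __name__ == "__main__":
    # Stand-in for a real model call, just to show the control flow end to end.
    dummy_llm: LLM = lambda prompt: "step A\nstep B"
    print(plan(dummy_llm, "set the table", "a kitchen with plates and a table"))
```

The point of chaining calls this way, rather than asking one model for the full action sequence, is that no single generation has to cover the entire long-horizon plan, which is the bottleneck the abstract attributes to existing planners built on open-source LLMs.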


Notes

  1. https://openai.com/chatgpt
  2. https://huggingface.co/THUDM/chatglm3-6b-32k
  3. https://openai.com/chatgpt
  4. https://openai.com/gpt-4


Acknowledgements

This work is partially supported by the National Natural Science Foundation of China under Grant No. U21A20488. We thank the Big Data Computing Center of Southeast University for providing facility support for the numerical calculations in this paper.

Author information


Corresponding authors

Correspondence to Guilin Qi or Wei Song.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wu, Y. et al. (2024). MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14854. Springer, Singapore. https://doi.org/10.1007/978-981-97-5569-1_16

  • DOI: https://doi.org/10.1007/978-981-97-5569-1_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5568-4

  • Online ISBN: 978-981-97-5569-1

  • eBook Packages: Computer Science, Computer Science (R0)
