
Grounding natural language instructions to semantic goal representations for abstraction and generalization

Published in: Autonomous Robots

Abstract

Language grounding is broadly defined as the problem of mapping natural language instructions to robot behavior. To truly be effective, these language grounding systems must be accurate in their selection of behavior, efficient in the robot’s realization of that selected behavior, and capable of generalizing beyond commands and environment configurations only seen at training time. One choice that is crucial to the success of a language grounding model is the choice of representation used to capture the objective specified by the input command. Prior work has been varied in its use of explicit goal representations, with some approaches lacking a representation altogether, resulting in models that infer whole sequences of robot actions, while other approaches map to carefully constructed logical form representations. While many of the models in either category are reasonably accurate, they fail to offer either efficient execution or any generalization without requiring a large amount of manual specification. In this work, we take a first step towards language grounding models that excel across accuracy, efficiency, and generalization through the construction of simple, semantic goal representations within Markov decision processes. We propose two related semantic goal representations that take advantage of the hierarchical structure of tasks and the compositional nature of language respectively, and present multiple grounding models for each. We validate these ideas empirically with results collected from following text instructions within a simulated mobile-manipulator domain, as well as demonstrations of a physical robot responding to spoken instructions in real time. Our grounding models tie abstraction in language commands to a hierarchical planner for the robot’s execution, enabling a response-time speed-up of several orders of magnitude over baseline planners within sufficiently large domains. Concurrently, our grounding models for generalization infer elements of the semantic representation that are subsequently combined to form a complete goal description, enabling the interpretation of commands involving novel combinations never seen during training. Taken together, our results show that the design of semantic goal representation has powerful implications for the accuracy, efficiency, and generalization capabilities of language grounding models.
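The compositional idea described above can be illustrated with a minimal sketch (not the authors' implementation): each element of a semantic goal, here the task predicate and its room argument, is inferred independently from the command and then combined into a complete goal description. The vocabulary, predicate names (`agentInRoom`, `blockInRoom`), and object identifiers below are hypothetical placeholders.

```python
def ground(command):
    """Ground a command to a semantic goal by composing independently
    inferred elements; returns None when either element is missing.
    Toy keyword lookup stands in for a learned grounding model."""
    tokens = command.lower().split()

    # Infer the task predicate from the verb, independently of the argument.
    verb_map = {"go": "agentInRoom", "walk": "agentInRoom",
                "take": "blockInRoom", "carry": "blockInRoom"}
    predicate = next((verb_map[t] for t in tokens if t in verb_map), None)

    # Infer the room argument from colour words, independently of the verb.
    colours = {"red", "blue", "green", "yellow"}
    room = next((f"{t}_room" for t in tokens if t in colours), None)

    if predicate is None or room is None:
        return None

    # Combine the inferred elements into a complete goal description.
    obj = "agent" if predicate == "agentInRoom" else "block0"
    return f"{predicate}({obj}, {room})"
```

Because the verb and the argument are grounded separately and only combined afterwards, a command such as "walk to the yellow room" can be interpreted even if that particular verb and colour never co-occurred in training, which is the generalization behavior the abstract describes.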



Acknowledgements

This work is supported by the National Science Foundation under Grant Number IIS-1637614, the US Army/DARPA under Grant Number W911NF-15-1-0503, and the National Aeronautics and Space Administration under Grant Number NNX16AR61G.

Lawson L.S. Wong was supported by a Croucher Foundation Fellowship.

Author information

Correspondence to Siddharth Karamcheti.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is one of several papers published in Autonomous Robots comprising the “Special Issue on Robotics Science and Systems”.

Dilip Arumugam and Siddharth Karamcheti have contributed equally to this work.


About this article


Cite this article

Arumugam, D., Karamcheti, S., Gopalan, N. et al. Grounding natural language instructions to semantic goal representations for abstraction and generalization. Auton Robot 43, 449–468 (2019). https://doi.org/10.1007/s10514-018-9792-8

