
Human-Robot Interaction Through Gesture-Free Spoken Dialogue

Published in Autonomous Robots 16, 239–257 (2004)

Abstract

We present an approach to human-robot interaction through gesture-free spoken dialogue. Our approach is based on passive knowledge rarefication through goal disambiguation, a technique that allows a human operator to collaborate with a mobile robot on various tasks through spoken dialogue alone, without making bodily gestures. A key assumption underlying our approach is that the operator and the robot share a common set of goals; another key idea is that language, vision, and action share common memory structures. We discuss how our approach achieves four types of human-robot interaction: command, goal disambiguation, introspection, and instruction-based learning. We describe the system we developed to implement our approach and present experimental results.
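The control flow behind the abstract's central mechanism, resolving an ambiguous spoken command against a goal set shared by the operator and the robot, can be illustrated with a minimal sketch. The sketch below is a hypothetical illustration under that shared-goals assumption; the goal names, the keyword matching, and the numbered clarification dialogue are inventions for exposition, not the system described in the paper.

```python
# Hypothetical sketch of goal disambiguation through spoken dialogue.
# Assumes, as the paper does, that the operator and the robot share a
# common set of goals; everything else (goal names, keyword matching,
# the numbered clarification dialogue) is illustrative only.

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Goal:
    name: str
    keywords: frozenset  # words in an utterance that evoke this goal


SHARED_GOALS = [
    Goal("pick_up_trash_can", frozenset({"pick", "up", "can", "trash"})),
    Goal("pick_up_soda_can", frozenset({"pick", "up", "can", "soda"})),
    Goal("go_to_door", frozenset({"go", "door"})),
]


def evoked_goals(utterance: str) -> list:
    """Return every shared goal whose keywords overlap the utterance."""
    words = set(utterance.lower().split())
    return [g for g in SHARED_GOALS if words & g.keywords]


def disambiguate(utterance: str) -> Optional[Goal]:
    """Commit to one goal, asking clarifying questions while several match."""
    candidates = evoked_goals(utterance)
    if not candidates:
        print("Robot: I do not know how to do that.")
        return None
    while len(candidates) > 1:
        print("Robot: I can do several things that match. Which one?")
        for i, goal in enumerate(candidates, start=1):
            print(f"  {i}. {goal.name}")
        reply = input("Operator (say a number): ").strip()
        if reply.isdigit() and 1 <= int(reply) <= len(candidates):
            candidates = [candidates[int(reply) - 1]]
    return candidates[0]


if __name__ == "__main__":
    # "pick up the can" evokes both pick-up goals, so the robot asks.
    goal = disambiguate("pick up the can")
    if goal:
        print(f"Robot: executing {goal.name}.")
```

In the paper's system the matching and clarification are driven by the integrated language-vision-action memory structures rather than keyword sets; the sketch shows only the disambiguation control flow that spoken dialogue replaces gestures with.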




Cite this article

Kulyukin, V. Human-Robot Interaction Through Gesture-Free Spoken Dialogue. Autonomous Robots 16, 239–257 (2004). https://doi.org/10.1023/B:AURO.0000025789.33843.6d
