Robobarista: Object Part Based Transfer of Manipulation Trajectories from Crowd-Sourcing in 3D Pointclouds

Robotics Research

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 3)

Abstract

There is a large variety of objects and appliances in human environments, such as stoves, coffee dispensers, juice extractors, and so on. It is challenging for a roboticist to program a robot for each of these object types and for each of their instantiations. In this work, we present a novel approach to manipulation planning based on the idea that many household objects share similarly operated object parts. We formulate manipulation planning as a structured prediction problem and design a deep learning model that can handle large noise in the manipulation demonstrations and learns features from three different modalities: point clouds, language, and trajectory. In order to collect a large number of manipulation demonstrations for different objects, we developed a new crowd-sourcing platform called Robobarista. We test our model on our dataset consisting of 116 objects with 249 parts along with 250 language instructions, for which there are 1225 crowd-sourced manipulation demonstrations. We further show that our robot can even manipulate objects it has never seen before.

Notes

  1. We have made sure that it does not initialize with trajectories from other folds, so that the 5-fold cross-validation in the experiment section remains valid.

  2. Although not necessary for training our model, we also collected trajectories from the expert for evaluation purposes.

Acknowledgements

We thank Joshua Reichler for building the initial prototype of the crowd-sourcing platform. We thank Ian Lenz and Ross Knepper for useful discussions. This research was funded in part by a Microsoft Faculty Fellowship (to Saxena), an NSF CAREER award (to Saxena), and the Army Research Office.

Author information

Corresponding author

Correspondence to Jaeyong Sung.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Sung, J., Jin, S.H., Saxena, A. (2018). Robobarista: Object Part Based Transfer of Manipulation Trajectories from Crowd-Sourcing in 3D Pointclouds. In: Bicchi, A., Burgard, W. (eds) Robotics Research. Springer Proceedings in Advanced Robotics, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-319-60916-4_40

  • DOI: https://doi.org/10.1007/978-3-319-60916-4_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60915-7

  • Online ISBN: 978-3-319-60916-4

  • eBook Packages: Engineering, Engineering (R0)
