Robobarista: Object Part Based Transfer of Manipulation Trajectories from Crowd-Sourcing in 3D Pointclouds

Sung, Jaeyong; Jin, Seok Hyun; Saxena, Ashutosh

doi:10.1007/978-3-319-60916-4_40

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 3)

Abstract

There is a large variety of objects and appliances in human environments, such as stoves, coffee dispensers, juice extractors, and so on. It is challenging for a roboticist to program a robot for each of these object types and for each of their instantiations. In this work, we present a novel approach to manipulation planning based on the idea that many household objects share similarly-operated object parts. We formulate manipulation planning as a structured prediction problem and design a deep learning model that can handle large amounts of noise in the manipulation demonstrations and that learns features from three different modalities: point clouds, language, and trajectories. In order to collect a large number of manipulation demonstrations for different objects, we developed a new crowd-sourcing platform called Robobarista. We test our model on a dataset of 116 objects with 249 parts and 250 language instructions, for which there are 1225 crowd-sourced manipulation demonstrations. We further show that our robot can even manipulate objects it has never seen before.
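Framing planning as structured prediction means the output (a trajectory) is chosen by maximizing a learned compatibility score over candidate outputs. As a rough illustration only, the sketch below shows what such an inference step can look like: a (point cloud, language) query and each candidate trajectory are projected into a shared space, scored by inner product, and the highest-scoring candidate is returned. The feature vectors, the linear projections `w_pl` and `w_traj`, and the helpers `score` and `predict` are hypothetical stand-ins; the chapter learns its embeddings with a deep network, which this sketch does not reproduce.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    out = []
    for row in m:
        if len(row) != len(v):
            raise ValueError("matrix column count must match vector length")
        out.append(sum(w * x for w, x in zip(row, v)))
    return out


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score(w_pl: Matrix, w_traj: Matrix,
          part_feat: Sequence[float], lang_feat: Sequence[float],
          traj_feat: Sequence[float]) -> float:
    """Hypothetical compatibility f(point cloud, language, trajectory):
    embed the concatenated (part, language) features and the trajectory
    features into a shared space and take their inner product."""
    query = matvec(w_pl, list(part_feat) + list(lang_feat))
    candidate = matvec(w_traj, traj_feat)
    return dot(query, candidate)


def predict(w_pl: Matrix, w_traj: Matrix,
            part_feat: Sequence[float], lang_feat: Sequence[float],
            candidates: List[Sequence[float]]) -> int:
    """Structured-prediction inference: index of the highest-scoring
    candidate trajectory (the first one wins on ties)."""
    return max(range(len(candidates)),
               key=lambda i: score(w_pl, w_traj, part_feat, lang_feat, candidates[i]))
```

For example, with `w_pl = [[1, 0, 0, 0], [0, 0, 0, 1]]`, an identity `w_traj`, part features `[1.0, 0.0]`, language features `[0.0, 1.0]`, and candidates `[[0.2, 0.1], [0.9, 0.8], [-1.0, 0.5]]`, the scores are 0.3, 1.7 and -0.5, so `predict` returns index 1. Restricting the argmax to a finite set of candidates (for instance, previously collected demonstrations) keeps inference trivial; this is an assumption of the sketch, not a claim about how the chapter searches the trajectory space.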

Notes

1. We made sure that it is not initialized with trajectories from other folds, so that the 5-fold cross-validation in the experiment section remains valid.
2. Although not necessary for training our model, we also collected trajectories from an expert for evaluation purposes.
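Note 1 describes a guard against leakage across cross-validation folds: if a demonstration were initialized from a trajectory belonging to another fold, information from a test fold could reach the training data. One way to enforce such a constraint is sketched below, under assumptions that are ours rather than the chapter's: folds are assigned per object, each demonstration is tagged with the object it was collected on, and a new demonstration may only be initialized from demonstrations whose objects share its fold. `assign_folds` and `init_candidates` are hypothetical helpers.

```python
import random
from typing import Dict, List, Sequence, Tuple


def assign_folds(object_ids: Sequence[str], k: int = 5, seed: int = 0) -> Dict[str, int]:
    """Deterministically shuffle objects and deal them round-robin into k folds
    (hypothetical per-object fold assignment)."""
    ids = sorted(object_ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)
    return {obj: i % k for i, obj in enumerate(ids)}


def init_candidates(target_obj: str,
                    demos: List[Tuple[str, str]],
                    folds: Dict[str, int]) -> List[str]:
    """Return ids of demonstrations that may seed a new demonstration on
    target_obj: only those whose object lies in the same fold, so no
    trajectory information crosses a fold boundary."""
    fold = folds[target_obj]
    return [demo_id for obj, demo_id in demos if folds[obj] == fold]
```

With ten objects and `k=5`, each fold holds two objects, so for a given target only the demonstrations on that object and its single fold partner are eligible.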

Acknowledgements

We thank Joshua Reichler for building the initial prototype of the crowd-sourcing platform. We thank Ian Lenz and Ross Knepper for useful discussions. This research was funded in part by a Microsoft Faculty Fellowship (to Saxena), an NSF CAREER award (to Saxena), and the Army Research Office.

Author information

Authors and Affiliations

Department of Computer Science, Cornell University, Ithaca, NY, USA
Jaeyong Sung, Seok Hyun Jin & Ashutosh Saxena

Corresponding author

Correspondence to Jaeyong Sung.

Editor information

Editors and Affiliations

Istituto Italiano di Tecnologia, Genova, Italy; University of Pisa, Pisa, Italy
Antonio Bicchi
Inst. für Informatik, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
Wolfram Burgard

About this chapter

Cite this chapter

Sung, J., Jin, S.H., Saxena, A. (2018). Robobarista: Object Part Based Transfer of Manipulation Trajectories from Crowd-Sourcing in 3D Pointclouds. In: Bicchi, A., Burgard, W. (eds) Robotics Research. Springer Proceedings in Advanced Robotics, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-319-60916-4_40

DOI: https://doi.org/10.1007/978-3-319-60916-4_40
Published: 25 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60915-7
Online ISBN: 978-3-319-60916-4
eBook Packages: Engineering, Engineering (R0)
