Abstract
Human environments contain a large variety of objects and appliances, such as stoves, coffee dispensers, and juice extractors. It is challenging for a roboticist to program a robot for each of these object types and for each of their instantiations. In this work, we present a novel approach to manipulation planning based on the idea that many household objects share similarly operated object parts. We formulate manipulation planning as a structured prediction problem and design a deep learning model that handles large noise in the manipulation demonstrations and learns features from three different modalities: point clouds, language, and trajectories. To collect a large number of manipulation demonstrations for different objects, we developed a new crowd-sourcing platform called Robobarista. We test our model on our dataset of 116 objects with 249 parts and 250 language instructions, for which there are 1225 crowd-sourced manipulation demonstrations. We further show that our robot can even manipulate objects it has never seen before.
Notes
1. We have made sure that it does not initialize with trajectories from other folds, to keep the 5-fold cross-validation in the experiment section valid.
2. Although not necessary for training our model, we also collected trajectories from the expert for evaluation purposes.
Acknowledgements
We thank Joshua Reichler for building the initial prototype of the crowd-sourcing platform. We thank Ian Lenz and Ross Knepper for useful discussions. This research was funded in part by a Microsoft Faculty Fellowship (to Saxena), an NSF CAREER award (to Saxena), and the Army Research Office.
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Sung, J., Jin, S.H., Saxena, A. (2018). Robobarista: Object Part Based Transfer of Manipulation Trajectories from Crowd-Sourcing in 3D Pointclouds. In: Bicchi, A., Burgard, W. (eds) Robotics Research. Springer Proceedings in Advanced Robotics, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-319-60916-4_40
DOI: https://doi.org/10.1007/978-3-319-60916-4_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60915-7
Online ISBN: 978-3-319-60916-4
eBook Packages: Engineering, Engineering (R0)