
Generative Attention Learning: a “GenerAL” framework for high-performance multi-fingered grasping in clutter


Abstract

Generative Attention Learning (GenerAL) is a framework for high-DOF multi-fingered grasping that is not only robust to dense clutter and novel objects but also effective with a variety of different parallel-jaw and multi-fingered robot hands. This framework introduces a novel attention mechanism that substantially improves the grasp success rate in clutter. Its generative nature allows the learning of full-DOF grasps with flexible end-effector positions and orientations, as well as all finger joint angles of the hand. Trained purely in simulation, this framework skillfully closes the sim-to-real gap. To close the visual sim-to-real gap, this framework uses a single depth image as input. To close the dynamics sim-to-real gap, this framework circumvents continuous motor control with a direct mapping from pixel to Cartesian space inferred from the same depth image. Finally, this framework demonstrates inter-robot generality by achieving over \(92\%\) real-world grasp success rates in cluttered scenes with novel objects using two multi-fingered robotic hand-arm systems with different degrees of freedom.
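As a rough illustration of the pixel-to-Cartesian mapping the abstract refers to, the sketch below deprojects a chosen pixel of a depth image into a 3D point under a pinhole camera model. The function and its intrinsics parameters (fx, fy, cx, cy) are hypothetical stand-ins, not the paper's code; in GenerAL the grasp pixel is selected by the learned policy, and this deprojection is only one plausible way to realize the pixel-to-Cartesian step.

```python
import numpy as np

def pixel_to_cartesian(u, v, depth_image, fx, fy, cx, cy):
    """Deproject pixel (u, v) of a depth image into a 3D point in the
    camera frame using a pinhole model. Hypothetical helper for
    illustration only; fx, fy, cx, cy are the camera intrinsics."""
    z = depth_image[v, u]       # depth (metres) at the chosen pixel
    x = (u - cx) * z / fx       # back-project along the image x-axis
    y = (v - cy) * z / fy       # back-project along the image y-axis
    return np.array([x, y, z])
```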


Notes

  1. For structural details of the RH8D Seed hand, see http://www.seedrobotics.com/rh8d-dexterous-hand.html.


Acknowledgements

We thank Wei Zhang and everyone at the Columbia University Robotics Lab for useful comments and suggestions.

Author information

Corresponding author

Correspondence to Bohan Wu.

Ethics declarations

Conflicts of interest

Jacob Varley is a member of Robotics at Google. Peter K. Allen has received a research grant from Google Inc.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by a Google Research Grant and National Science Foundation Grants CMMI-1734557 and IIS-1527747.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 73367 KB)

Supplementary material 2 (mp4 349594 KB)

Supplementary material 3 (pdf 25159 KB)

Appendix

1.1 Mechanical structure of the robotic hands used in the experiments

1.1.1 Staubli–Barrett

The finger joint angles of the Barrett Hand range from 0 (open) to \(2.44 \text{ rad}\) (closed) for finger-1, finger-2 and finger-3, and from 0 to \(\pi \text{ rad}\) for the lateral spread. We used the original, unmodified hand in all experiments.
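As a concrete illustration, the stated limits can be enforced when sending joint commands to the hand. The following Python sketch uses hypothetical joint names and is not from the paper's codebase.

```python
import numpy as np

# Joint limits of the Barrett Hand as stated above (radians);
# 0 = fully open, upper bound = fully closed. Names are illustrative.
BARRETT_LIMITS = {
    "finger_1": (0.0, 2.44),
    "finger_2": (0.0, 2.44),
    "finger_3": (0.0, 2.44),
    "lateral_spread": (0.0, np.pi),
}

def clip_barrett_command(command):
    """Clamp commanded joint angles (a name -> angle dict) to the limits."""
    return {name: float(np.clip(q, *BARRETT_LIMITS[name]))
            for name, q in command.items()}
```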

1.1.2 UR5-Seed

For the anthropomorphic Seed hand, flexion joints curl the fingers toward the palm from 0 (open) to \(1.48 \text{ rad}\) (closed), while adduction joints spread the fingers apart from 0 to \(\pi/2 \text{ rad}\); only the thumb on the Seed hand has an adduction joint, so only the thumb may spread. The wrist-rotation, wrist-flexion and wrist-adduction DOFs, together with the finger flexion and adduction DOFs, give the Seed hand a total of 8 DOFs. However, the three wrist DOFs belong to the arm DOFs \(\varPsi _{\mathrm{arm}}\) because they do not actuate any finger. Each robotic finger consists of 3 joints, all of which are driven by a single Dyneema tendon (a high-strength synthetic fiber string). The ring and pinky fingers are coupled, so a single actuator is responsible for the flexion of both, while each of the other fingers is driven by its own actuator.
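The DOF layout described above can be summarized in code. The sketch below is purely illustrative; the joint names and the command interface are assumptions for this sketch, not the paper's API.

```python
import numpy as np

# Illustrative encoding of the Seed hand's DOF layout described above.
# All names are assumptions for the sketch, not the paper's interface.
ARM_DOFS = ["wrist_rotation", "wrist_flexion", "wrist_adduction"]  # Psi_arm
FINGER_DOFS = ["thumb_adduction",             # only the thumb may spread
               "thumb_flexion", "index_flexion",
               "middle_flexion", "ring_pinky_flexion"]  # coupled pair

FLEXION_RANGE = (0.0, 1.48)         # rad, open -> closed
ADDUCTION_RANGE = (0.0, np.pi / 2)  # rad, thumb spread

def seed_finger_targets(thumb_add, flexions):
    """Map the five finger-actuator commands to clamped joint targets.
    flexions: (thumb, index, middle, ring_pinky) flexion commands."""
    targets = {"thumb_adduction": float(np.clip(thumb_add, *ADDUCTION_RANGE))}
    for name, q in zip(FINGER_DOFS[1:], flexions):
        targets[name] = float(np.clip(q, *FLEXION_RANGE))
    return targets
```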

1.2 Mechanical modification to the RH8D Seed hand

As shown in Fig. 6, the fingers exhibit under-actuated behavior during control. As the actuator moves through its full range of motion, the tendon pulls on the distal, intermediate, and proximal joints of a given finger in three stages (a toy model of this staged behavior follows the list):

  1. Initially, the finger is fully open (Fig. 6a). As the finger starts to close, the distal joint rotates almost completely before the intermediate joint begins to move.

  2. The distal joint, having reached its maximum displacement, stops rotating while the intermediate joint continues to move (Fig. 6b).

  3. The proximal joint reaches its limit shortly after the intermediate joint stops moving (Fig. 6c).
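The toy Python model below, referenced above, makes the staged behavior concrete: it maps a normalized actuator position to the three joint angles as a piecewise-linear schedule. The stage boundaries and joint travels are illustrative assumptions, not measured values for the RH8D hand.

```python
import numpy as np

def staged_finger_joints(a, travels=(1.2, 1.0, 0.9)):
    """Toy piecewise-linear model of the three-stage behavior listed
    above: a in [0, 1] is the normalised actuator position; travels are
    assumed (distal, intermediate, proximal) joint ranges in radians.
    These numbers are illustrative, not measured values for the RH8D."""
    d_lim, i_lim, p_lim = travels
    distal = d_lim * np.clip(3.0 * a, 0.0, 1.0)               # stage 1
    intermediate = i_lim * np.clip(3.0 * a - 1.0, 0.0, 1.0)   # stage 2
    proximal = p_lim * np.clip(3.0 * a - 2.0, 0.0, 1.0)       # stage 3
    return distal, intermediate, proximal
```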

Fig. 6 The finger-closing behavior of the Seed hand before (a–c) and after (d–f) modification, using the index finger as an example

While the fingers are anthropomorphic in design, human fingers move differently when grasping an object. In particular, the proximal joint of a human finger typically curls before any other joint and at a greater rate than the distal joint. Human fingers also rarely settle in a hook-like position like the one depicted in Fig. 6b: to sufficiently contact and grasp objects beyond thin cylindrical geometries, the fingers must sweep through a greater volume. With these observations in mind, tape is applied around the distal joint (Fig. 1b) to inhibit it from rotating (Fig. 6d–f). This effectively reduces the number of under-actuated joints on each finger by one; Fig. 6f depicts the resulting grasp form. In addition, the fingers of the Seed hand are made of a very low-friction thermoplastic, so we added small rubber caps to the fingertips to increase friction (Fig. 1b).


About this article


Cite this article

Wu, B., Akinola, I., Gupta, A. et al. Generative Attention Learning: a “GenerAL” framework for high-performance multi-fingered grasping in clutter. Auton Robot 44, 971–990 (2020). https://doi.org/10.1007/s10514-020-09907-y

