A novel robotic grasp detection method based on region proposal networks
Introduction
Robots and intelligent algorithms [1], [2], [3] are essential to the development of intelligent manufacturing. Robots are widely used in robotic welding [4], robotic assembly [5], and robotic disassembly [6]. Grasping is an essential ability for robots to complete pick-and-place tasks, but grasping objects reliably remains a great challenge due to unstructured environments and other uncertainties [7], [8]. Grasp tasks require a robot not only to accurately identify objects, but also to accurately determine their position and orientation. Inaccurate grasp points result in failed grasp operations, which in turn affect subsequent path planning and grasp-based tasks. An effective grasp detection method is therefore essential for robots to complete grasp tasks.
Deep learning does not require hand-engineered features and can effectively deal with unstructured environments. It is widely used in general object detection and has achieved great success there. However, general object detection methods indicate the detected position with a horizontal rectangle in a four-dimensional representation. Such a horizontal rectangle is not suitable for grasp detection because it cannot encode the grasp rotation angle. A grasp rectangle with a five-dimensional representation was first used for the robotic grasp task in a two-stage cascaded structure [9]. The cascaded structure outperforms approaches based on hand-engineered features [10], but its network structure is very complicated.
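For reference, the five-dimensional grasp rectangle is commonly written as (x, y, θ, w, h): the grasp center, the rotation angle, the gripper opening width, and the jaw height. A minimal sketch, assuming this common parameter order (it is not quoted from the paper), of how such a rectangle maps to four oriented corner points:

```python
import numpy as np

def grasp_rect_corners(x, y, theta, w, h):
    """Corner points of a 5-D grasp rectangle (x, y, theta, w, h).

    (x, y) is the grasp center, theta the rotation angle relative to the
    horizontal image axis, w the gripper opening width, h the jaw size.
    """
    c, s = np.cos(theta), np.sin(theta)
    # Corner offsets in the rectangle's own (unrotated) frame.
    local = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    rot = np.array([[c, -s], [s, c]])          # 2-D rotation matrix
    return local @ rot.T + np.array([x, y])    # rotate, then translate
```

The four-dimensional horizontal rectangle of general object detection is recovered as the special case θ = 0.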
To reduce the computational complexity, a single-stage network was proposed by Redmon and Angelova [11] to regress the grasp rectangle. Their method exploits depth information by substituting it for the blue channel of the input image. A new multi-modal convolutional neural network was also designed to perform grasp detection [12] based on residual layers [13]. Their results showed that deeper networks and residual connections are conducive to improving grasp accuracy. However, these methods cannot use prior information to improve detection accuracy.
Prior information has been shown to effectively improve accuracy in general object detection tasks [14]. Anchor boxes were introduced into grasp detection by Guo et al. [15], but their anchors are horizontal rectangles that cannot reflect angle information. Zhou et al. [16] proposed a rotation anchor box with an oriented anchor box mechanism to represent grasp detection results, and their anchor matching strategy greatly improved grasp accuracy. However, their network was designed on the YOLO framework [17], which divides the input image into multiple grid cells, so their anchor matching strategy is not suitable for a Faster R-CNN based detection framework. Another grasp detection method was proposed on the Faster R-CNN framework by Chu et al. [18]. Their method splits grasp detection into the regression of coordinates and the classification of angles, which increases network complexity.
This paper proposes an effective single-stage grasp detection network based on the Faster R-CNN framework. Grasp detection is treated as a detection task with two categories. The grasp detection network is designed around the region proposal network (RPN) from Faster R-CNN: the RPN not only generates oriented anchors but also predicts the category of the candidate detection rectangles. A new matching strategy for the oriented anchors is designed based on the center position and rotation angle of the anchors, and is well suited to the Faster R-CNN framework.
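The excerpt does not spell out the matching rule itself; the sketch below is a hypothetical center-and-angle matcher in the spirit described above, with purely illustrative thresholds:

```python
import numpy as np

def match_oriented_anchors(anchors, gt, center_thresh=8.0,
                           angle_thresh=np.pi / 12):
    """Hypothetical matcher: an oriented anchor (x, y, theta) counts as
    positive for a ground-truth grasp (x, y, theta) when their centers
    are close and their rotation angles differ by less than a threshold.
    Thresholds are illustrative, not taken from the paper.
    """
    # Pairwise center distances, shape (num_anchors, num_gt).
    dists = np.linalg.norm(anchors[:, None, :2] - gt[None, :, :2], axis=-1)
    # Angle differences wrapped into [0, pi/2]: a parallel-jaw grasp is
    # unchanged by rotating the gripper 180 degrees.
    diff = np.abs(anchors[:, None, 2] - gt[None, :, 2]) % np.pi
    diff = np.minimum(diff, np.pi - diff)
    return ((dists < center_thresh) & (diff < angle_thresh)).any(axis=1)
```

Anchors left unmatched by both tests would be labeled ungraspable (background) during training.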
In the rest of this paper, previous studies related to grasp detection are summarized in Section 2. A detailed description of the proposed method is presented in Section 3. Experiments on the Cornell grasp dataset [19] and the Jacquard dataset [20] are described in Section 4. Conclusions and future research directions are discussed in Section 5.
Related work
The robotic grasp problem has been studied for decades. Early work used 3D object models to identify grasp positions [21], [22]. However, building 3D models is time consuming and labor intensive, and models cannot be built for unknown objects. Utilizing 3D models is therefore not an effective way to obtain grasp positions in the real world.
Deep learning methods can directly learn object features from input images, and do not need to build an object model in advance.
Proposed method for grasp detection
Robotic grasp detection is a detection task with only two categories: the network only needs to classify proposals as graspable or ungraspable positions. Similarly, the region proposal network in the Faster R-CNN framework classifies proposals into foreground or background. It is therefore reasonable to choose the RPN as the robotic grasp detection network.
Grasp detection needs to detect not only the grasp position but also the grasp rotation angle.
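For intuition, an RPN-style head for this two-category setting can be sketched as below. This is a minimal illustration, not the paper's architecture; the layer width and the number of oriented anchors per feature-map cell are assumptions:

```python
import tensorflow as tf

def rpn_grasp_head(feature_map, anchors_per_cell=6):
    """Sketch of an RPN-style grasp head: a shared 3x3 convolution
    followed by two 1x1 convolutions that, for every oriented anchor at
    every feature-map cell, predict (a) graspable/ungraspable scores and
    (b) offsets for the 5-D grasp rectangle (x, y, theta, w, h).
    """
    shared = tf.keras.layers.Conv2D(512, 3, padding="same",
                                    activation="relu")(feature_map)
    # Two class logits per anchor: graspable vs. ungraspable.
    cls = tf.keras.layers.Conv2D(2 * anchors_per_cell, 1)(shared)
    # Five regression targets per anchor.
    reg = tf.keras.layers.Conv2D(5 * anchors_per_cell, 1)(shared)
    return cls, reg
```

The only structural change from a standard RPN is the fifth regression target for the rotation angle and the reinterpretation of foreground/background as graspable/ungraspable.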
Experimental studies
To test the performance of the proposed method, grasp detection experiments are performed on the Cornell Grasp Dataset and the Jacquard Dataset, with grasp detection accuracy as the main performance metric. The Cornell dataset contains 885 images of 240 graspable objects; the Jacquard dataset contains 54,485 images of 11,619 graspable objects.
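On both datasets, grasp accuracy is conventionally computed with the rectangle metric: a prediction is counted as correct if its angle is within 30 degrees of a ground-truth grasp and the Jaccard index (IoU) of the two rotated rectangles exceeds 0.25. Below is a minimal sketch of this check using shapely; the thresholds are the conventional ones, since the excerpt does not state the paper's exact settings:

```python
import numpy as np
from shapely.geometry import Polygon

def _corners(x, y, theta, w, h):
    # Corner points of a rotated rectangle centered at (x, y).
    c, s = np.cos(theta), np.sin(theta)
    local = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return local @ np.array([[c, s], [-s, c]]) + np.array([x, y])

def is_correct_grasp(pred, gt, iou_thresh=0.25,
                     angle_thresh=np.deg2rad(30)):
    """Rectangle metric: pred and gt are (x, y, theta, w, h) tuples.
    Thresholds follow the common convention for these datasets."""
    diff = abs(pred[2] - gt[2]) % np.pi
    if min(diff, np.pi - diff) > angle_thresh:
        return False
    p, g = Polygon(_corners(*pred)), Polygon(_corners(*gt))
    union = p.union(g).area
    return union > 0 and p.intersection(g).area / union > iou_thresh
```

A predicted rectangle is scored against every ground-truth grasp for the object and counts as a success if it passes the check for at least one of them.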
The network is implemented in TensorFlow, and all the experiments are run on Red Hat 4.8.5-28
Conclusions and future work
This paper proposes an effective robotic grasp detection method, which uses a single-stage grasp detection network based on region proposal networks. A new matching strategy is designed to match the oriented anchors generated by the proposed network. The performance of the proposed method is evaluated on the Cornell grasp dataset and the Jacquard dataset. Experimental results show that the proposed method achieves high grasp detection accuracies on both datasets, which suggests that the proposed single-stage network is effective for robotic grasp detection.
CRediT authorship contribution statement
Yanan Song: Conceptualization, Methodology, Validation, Writing - original draft, Visualization. Liang Gao: Conceptualization, Software, Investigation, Writing - review & editing. Xinyu Li: Conceptualization, Formal analysis, Project administration, Funding acquisition. Weiming Shen: Conceptualization, Methodology, Investigation, Writing - review & editing, Supervision.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by the National Key Research and Development Project [Grant Numbers 2018AAA0101704 and 2019YFB1704603] and the Program for HUST Academic Frontier Youth Team [Grant Number 2017QYTD04].
References
- et al., Fault correction of algorithm implementation for intelligentized robotic multipass welding process based on finite state machines, Robot. Comput. Integr. Manuf. (2019)
- et al., Robotic disassembly re-planning using a two-pointer detection strategy and a super-fast bees algorithm, Robot. Comput. Integr. Manuf. (2019)
- et al., Dynamic regrasping by in-hand orienting of grasped objects using non-dexterous robotic grippers, Robot. Comput. Integr. Manuf. (2018)
- et al., Stable and repeatable grasping of flat objects on hard surfaces using passive and epicyclic mechanisms, Robot. Comput. Integr. Manuf. (2019)
- et al., Robot grasp detection using multimodal deep convolutional neural networks, Adv. Mech. Eng. (2016)
- et al., Deep vision networks for real-time robotic grasp detection, Int. J. Adv. Robot. Syst. (2017)
- et al., Disassembly sequence planning considering fuzzy component quality and varying operational cost, IEEE Trans. Autom. Sci. Eng. (2017)
- et al., Modeling and planning for dual-objective selective disassembly using AND/OR graph and discrete artificial bee colony, IEEE Trans. Ind. Inform. (2018)
- G. Tian, N. Hao, M. Zhou, W. Pedrycz, C. Zhang, F. Ma, Z. Li, Fuzzy grey Choquet integral for evaluation of...
- et al., A constraint-based programming approach for robotic assembly skills implementation, Robot. Comput. Integr. Manuf. (2019)
- Deep learning for detecting robotic grasps, Int. J. Robot. Res.
- Efficient grasping from RGBD images: learning using a new rectangle representation
- Real-time grasp detection using convolutional neural networks
- Robotic grasp detection using deep convolutional neural networks
- Deep residual learning for image recognition
- Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell.