A novel robotic grasp detection method based on region proposal networks
Introduction
Robots and intelligent algorithms [1], [2], [3] are essential to the development of intelligent manufacturing. Robots are widely used in robotic welding [4], robotic assembly [5], and robotic disassembly [6]. Grasping is an essential ability for robots to complete pick-and-place tasks, but grasping objects reliably remains a great challenge due to unstructured environments and other uncertainties [7], [8]. Grasp tasks require a robot not only to accurately identify objects, but also to accurately determine their position and orientation. Inaccurate grasp points result in failed grasp operations, which in turn affect subsequent path planning and grasp-based tasks. An effective grasp detection method is therefore essential for robots to complete grasp tasks.
Deep learning does not require hand-engineered features and can effectively deal with unstructured environments. It is widely used in general object detection and has achieved great success there. However, general object detection methods indicate the detected position with a horizontal rectangle in a four-dimensional representation. Such a horizontal rectangle is not suitable for grasp detection because it cannot encode the grasp rotation angle. A grasp rectangle with a five-dimensional representation was first used for the robotic grasp task in a two-stage cascaded structure [9]. The cascaded structure outperforms approaches based on hand-engineered features [10], but its network structure is very complicated.
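For reference, the five-dimensional grasp rectangle is commonly written as (x, y, θ, w, h): the grasp center, the rotation angle, the gripper opening width, and the jaw height. A minimal sketch, assuming this common parameter order (it is not quoted from the paper), of how such a rectangle maps to four oriented corner points:

```python
import numpy as np

def grasp_rect_corners(x, y, theta, w, h):
    """Corner points of a 5-D grasp rectangle (x, y, theta, w, h).

    (x, y) is the grasp center, theta the rotation angle relative to the
    horizontal image axis, w the gripper opening width, h the jaw size.
    """
    c, s = np.cos(theta), np.sin(theta)
    # Corner offsets in the rectangle's own (unrotated) frame.
    local = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    rot = np.array([[c, -s], [s, c]])          # 2-D rotation matrix
    return local @ rot.T + np.array([x, y])    # rotate, then translate
```

The four-dimensional horizontal rectangle of general object detection is recovered as the special case θ = 0.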
To reduce the computational complexity, a single-stage network was proposed by Redmon and Angelova [11] to regress the grasp rectangle. Their method exploits depth information by substituting it for the blue channel of the input image. A new multi-modal convolutional neural network was also designed to perform grasp detection [12] based on residual layers [13]. Their results showed that deeper networks and residual connections are conducive to improving grasp accuracy. However, these methods cannot use prior information to improve detection accuracy.
Prior information has been shown to effectively improve accuracy in general object detection tasks [14]. Anchor boxes were introduced into grasp detection by Guo et al. [15], but their anchors are horizontal rectangles that cannot reflect angle information. Zhou et al. [16] proposed a rotation anchor box with an oriented anchor box mechanism to represent grasp detection results, and their anchor matching strategy greatly improved grasp accuracy. However, their network was designed on the YOLO framework [17], which divides the input image into multiple grid cells, so their anchor matching strategy is not suitable for a Faster R-CNN based detection framework. Another grasp detection method was proposed on the Faster R-CNN framework by Chu et al. [18]. Their method splits grasp detection into the regression of coordinates and the classification of angles, which increases network complexity.
This paper proposes an effective single-stage grasp detection network based on the Faster R-CNN framework. Grasp detection is treated as a detection task with two categories. The grasp detection network is designed around the region proposal network (RPN) from Faster R-CNN: the RPN not only generates oriented anchors but also predicts the category of the candidate detection rectangles. A new matching strategy for the oriented anchors is designed based on the center position and rotation angle of the anchors, and is well suited to the Faster R-CNN framework.
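The excerpt does not spell out the matching rule itself; the sketch below is a hypothetical center-and-angle matcher in the spirit described above, with purely illustrative thresholds:

```python
import numpy as np

def match_oriented_anchors(anchors, gt, center_thresh=8.0,
                           angle_thresh=np.pi / 12):
    """Hypothetical matcher: an oriented anchor (x, y, theta) counts as
    positive for a ground-truth grasp (x, y, theta) when their centers
    are close and their rotation angles differ by less than a threshold.
    Thresholds are illustrative, not taken from the paper.
    """
    # Pairwise center distances, shape (num_anchors, num_gt).
    dists = np.linalg.norm(anchors[:, None, :2] - gt[None, :, :2], axis=-1)
    # Angle differences wrapped into [0, pi/2]: a parallel-jaw grasp is
    # unchanged by rotating the gripper 180 degrees.
    diff = np.abs(anchors[:, None, 2] - gt[None, :, 2]) % np.pi
    diff = np.minimum(diff, np.pi - diff)
    return ((dists < center_thresh) & (diff < angle_thresh)).any(axis=1)
```

Anchors left unmatched by both tests would be labeled ungraspable (background) during training.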
In the rest of this paper, previous studies related to grasp detection are summarized in Section 2. A detailed description of the proposed method is presented in Section 3. Experiments on the Cornell grasp dataset [19] and the Jacquard dataset [20] are described in Section 4. Conclusions and future research directions are discussed in Section 5.
Related work
The robotic grasp problem has been studied for decades. Early work used 3D object models to identify grasp positions [21], [22]. However, building 3D models is time consuming and labor intensive, and models cannot be built for unknown objects. Utilizing 3D models is therefore not an effective way to obtain grasp positions in the real world.
Deep learning methods can directly learn object features from input images, and do not need to build an object model in advance.
Proposed method for grasp detection
Robotic grasp detection is a detection task with only two categories: the network only needs to classify proposals as graspable or ungraspable positions. Similarly, the region proposal network in the Faster R-CNN framework classifies proposals into foreground or background. It is therefore reasonable to choose the RPN as the robotic grasp detection network.
Grasp detection needs to detect not only the grasp position but also the grasp rotation angle.
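For intuition, an RPN-style head for this two-category setting can be sketched as below. This is a minimal illustration, not the paper's architecture; the layer width and the number of oriented anchors per feature-map cell are assumptions:

```python
import tensorflow as tf

def rpn_grasp_head(feature_map, anchors_per_cell=6):
    """Sketch of an RPN-style grasp head: a shared 3x3 convolution
    followed by two 1x1 convolutions that, for every oriented anchor at
    every feature-map cell, predict (a) graspable/ungraspable scores and
    (b) offsets for the 5-D grasp rectangle (x, y, theta, w, h).
    """
    shared = tf.keras.layers.Conv2D(512, 3, padding="same",
                                    activation="relu")(feature_map)
    # Two class logits per anchor: graspable vs. ungraspable.
    cls = tf.keras.layers.Conv2D(2 * anchors_per_cell, 1)(shared)
    # Five regression targets per anchor.
    reg = tf.keras.layers.Conv2D(5 * anchors_per_cell, 1)(shared)
    return cls, reg
```

The only structural change from a standard RPN is the fifth regression target for the rotation angle and the reinterpretation of foreground/background as graspable/ungraspable.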
Experimental studies
To test the performance of the proposed method, grasp detection experiments are performed on the Cornell Grasp Dataset and the Jacquard Dataset, with grasp detection accuracy as the main performance metric. The Cornell dataset contains 885 images of 240 graspable objects; the Jacquard dataset contains 54,485 images of 11,619 graspable objects.
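On both datasets, grasp accuracy is conventionally computed with the rectangle metric: a prediction is counted as correct if its angle is within 30 degrees of a ground-truth grasp and the Jaccard index (IoU) of the two rotated rectangles exceeds 0.25. Below is a minimal sketch of this check using shapely; the thresholds are the conventional ones, since the excerpt does not state the paper's exact settings:

```python
import numpy as np
from shapely.geometry import Polygon

def _corners(x, y, theta, w, h):
    # Corner points of a rotated rectangle centered at (x, y).
    c, s = np.cos(theta), np.sin(theta)
    local = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return local @ np.array([[c, s], [-s, c]]) + np.array([x, y])

def is_correct_grasp(pred, gt, iou_thresh=0.25,
                     angle_thresh=np.deg2rad(30)):
    """Rectangle metric: pred and gt are (x, y, theta, w, h) tuples.
    Thresholds follow the common convention for these datasets."""
    diff = abs(pred[2] - gt[2]) % np.pi
    if min(diff, np.pi - diff) > angle_thresh:
        return False
    p, g = Polygon(_corners(*pred)), Polygon(_corners(*gt))
    union = p.union(g).area
    return union > 0 and p.intersection(g).area / union > iou_thresh
```

A predicted rectangle is scored against every ground-truth grasp for the object and counts as a success if it passes the check for at least one of them.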
The network is implemented in TensorFlow, and all the experiments are run on Red Hat 4.8.5-28
Conclusions and future work
This paper proposes an effective robotic grasp detection method, which uses a single-stage grasp detection network based on region proposal networks. A new matching strategy is designed to match the oriented anchors generated by the proposed network. The performance of the proposed method is evaluated on the Cornell grasp dataset and the Jacquard dataset. Experimental results show that the proposed method achieves high grasp detection accuracies on both datasets, which suggests that the proposed single-stage network is effective for robotic grasp detection.
CRediT authorship contribution statement
Yanan Song: Conceptualization, Methodology, Validation, Writing - original draft, Visualization. Liang Gao: Conceptualization, Software, Investigation, Writing - review & editing. Xinyu Li: Conceptualization, Formal analysis, Project administration, Funding acquisition. Weiming Shen: Conceptualization, Methodology, Investigation, Writing - review & editing, Supervision.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by the National Key Research and Development Project [Grant Numbers 2018AAA0101704 and 2019YFB1704603] and the Program for HUST Academic Frontier Youth Team [Grant Number 2017QYTD04].
References
- et al., Fault correction of algorithm implementation for intelligentized robotic multipass welding process based on finite state machines, Robot. Comput. Integr. Manuf. (2019)
- et al., Robotic disassembly re-planning using a two-pointer detection strategy and a super-fast bees algorithm, Robot. Comput. Integr. Manuf. (2019)
- et al., Dynamic regrasping by in-hand orienting of grasped objects using non-dexterous robotic grippers, Robot. Comput. Integr. Manuf. (2018)
- et al., Stable and repeatable grasping of flat objects on hard surfaces using passive and epicyclic mechanisms, Robot. Comput. Integr. Manuf. (2019)
- et al., Robot grasp detection using multimodal deep convolutional neural networks, Adv. Mech. Eng. (2016)
- et al., Deep vision networks for real-time robotic grasp detection, Int. J. Adv. Robot. Syst. (2017)
- et al., Disassembly sequence planning considering fuzzy component quality and varying operational cost, IEEE Trans. Autom. Sci. Eng. (2017)
- et al., Modeling and planning for dual-objective selective disassembly using AND/OR graph and discrete artificial bee colony, IEEE Trans. Ind. Inform. (2018)
- G. Tian, N. Hao, M. Zhou, W. Pedrycz, C. Zhang, F. Ma, Z. Li, Fuzzy grey Choquet integral for evaluation of...
- et al., A constraint-based programming approach for robotic assembly skills implementation, Robot. Comput. Integr. Manuf. (2019)
- Deep learning for detecting robotic grasps, Int. J. Robot. Res.
- Efficient grasping from RGBD images: learning using a new rectangle representation
- Real-time grasp detection using convolutional neural networks
- Robotic grasp detection using deep convolutional neural networks
- Deep residual learning for image recognition
- Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell.