Original papers
3D global mapping of large-scale unstructured orchard integrating eye-in-hand stereo vision and SLAM

https://doi.org/10.1016/j.compag.2021.106237

Highlights

  • A new orchard mapping framework integrating eye-in-hand stereo vision and SLAM is proposed.

  • Large-scale, high-accuracy, and detailed global maps supporting high-quality orchard picking are obtained.

  • The proposed hand-eye calibration method is efficient and outperforms the compared methods.

  • The proposed stereo matching method adapts well to dynamic and complex orchard environments.

  • The framework generates a more detailed global map than the commercial products used for comparison.

Abstract

Large-scale, high-accuracy, and adaptive three-dimensional (3D) perception is a basic technical requirement for constructing a practical and stable fruit-picking robot. The latest vision-based fruit-picking robots are able to adapt to the complex backgrounds, uneven lighting, and low color contrast of the orchard environment. However, most of them have, until now, been limited to a small field of view or rigid sampling schemes. Although simultaneous localization and mapping (SLAM) methods have the potential to realize large-scale sensing, this study shows that the classic SLAM pipeline is not fully adapted to orchard picking tasks. In this study, an eye-in-hand stereo vision system and a SLAM system were integrated to provide a detailed global map supporting long-term, flexible, and large-scale orchard picking. Specifically, a mobile robot based on eye-in-hand vision was built and an efficient hand-eye calibration method was proposed; a state-of-the-art object detection network was trained and used to establish a dynamic stereo matching method well adapted to complex orchard environments; and a SLAM system was deployed and combined with the eye-in-hand stereo vision system to obtain a detailed, wide 3D orchard map. The main contribution of this work is a new global mapping framework compatible with the nature of orchard picking tasks. Compared with existing studies, this work pays more attention to the structural details of the orchard. Experimental results indicate that the constructed global map is both large in scale and high in resolution. This exploratory work provides theoretical and technical references for future research on more stable, accurate, and practical mobile fruit-picking robots.

Introduction

A fruit-picking robot equipped with a stereo vision system improves the efficiency and quality of orchard harvesting. At present, most stereo vision systems, such as binocular, RGB-D, and multi-camera systems, are able to adapt to the complex backgrounds, uneven lighting, and low color contrast of unstructured orchards (Tang et al., 2020).

The key question for further research on fruit picking is how to build a compact, coordinated, and practical fruit-picking robot on top of existing high-performance stereo vision systems. There have been several representative cases. Xiong et al. (2018) constructed a fruit-picking robot using two CCD cameras; the main contribution of that work is a method for calculating the oscillation angle of the fruit under natural disturbance. Ge et al. (2019, 2020) demonstrated a strawberry-picking robot using RGB-D vision. They proposed a method that estimates the 3D shape of the fruit from a small number of images, calculated the safe working area of the robot, and constructed a complete strawberry-picking process. Lin et al. (2019) built a robot for guava picking; the innovation of that work is an efficient stereo vision method for reconstructing obstacles on trees. Li et al. (2020) trained a high-performance semantic segmentation network and obtained the spatial positions of tiny fruit stems directly from the original sampled images, realizing end-to-end perception of the picking point. Silwal et al. (2017) combined agronomic methods to develop a picking robot for orchards with an ideal fruit distribution; this work is practical and constructive because it gives feasible solutions for achieving a complete picking process. Wibowo et al. (2017) developed an end-to-end autonomous coconut-harvesting robot that automatically climbs coconut trees and detects fruits through a vision system. They designed a novel fixing and cutting mechanism that is small and easy to carry, providing a reference for the design of similar lightweight harvesting robots. Zhang et al. (2020) introduced a robot for harvesting fruit with pedicels. They combined a deep instance segmentation network with spatial geometric methods to quickly solve for the picking point, and also designed an end effector that effectively protects the pulp from damage; the system was verified in a laboratory environment. Williams et al. (2019) proposed a kiwifruit-picking robot composed of four robotic arms. Its vision system combined deep learning and stereo vision to perceive kiwifruit under natural lighting, and a dynamic fruit scheduling system successfully coordinated the various modules. The robot passed stability tests in the orchard environment and is a complete, usable product.

The above cases successfully applied vision-based fruit-picking robots in unstructured environments. However, most of them are limited to basic issues or local scenes, which is far from meeting the needs of picking tasks spanning long periods and large areas. The behavior of picking robots at a dynamic global scale is very complicated, and many more factors must be considered, such as the coordination of the mobile platform and the robotic arm, the efficiency and completeness of image sampling, changes in terrain and obstacles, and differences in distance between targets. This requires highly integrated hardware, stable 3D vision algorithms, and a compact system framework. In addition, more stability tests in large-scale orchards are necessary.

In recent years, attention and effort in visual picking robotics have shifted from the local to the global scale. Establishing an intelligent, robust, and large-scale picking system has gradually become a consensus in the field of harvesting robots (Tang et al., 2020). Visual SLAM can accurately reconstruct a large-scale scene, track the position of the vision system in the global map, and estimate the motion trajectory; it is therefore a potential solution for completing large-scale picking tasks. There have already been exploratory studies on this topic. For example, Dong et al. (2020) used an RGB-D camera to collect point clouds of the trees and fruits in an orchard environment and, subsequently, utilized the ORB-SLAM2 framework for motion tracking and mapping. Habibie et al. (2018) collected simulated agricultural data through RGB cameras and lasers and generated a global grid map of the fruits and trees. Shalal et al. (2015) fused camera and laser scanner data to detect tree trunks and construct local orchard maps; their algorithm relied only on the mobile robot's on-board sensors, without adding artificial landmarks to the orchard. However, that work focused on the distribution of trees on the map and lacked detailed information about the structure of the fruits. Underwood et al. (2016) proposed a 3D mobile scanning system for almond orchards to reconstruct the distribution map of flowers and fruits and estimate the yield of a single tree; they used a vehicle equipped with lidar and cameras to scan the orchard and construct a canopy model database over a long time span. Fan et al. (2018) developed a portable and flexible RGB-D SLAM system to estimate the position, height, and specific geometric parameters of trees over a large forest area; as the point cloud resolution of the RGB-D camera dropped sharply at long range, the point cloud fitting was relatively rough. Nellithimaru and Kantor (2019) proposed a visual SLAM system for fast counting of fruits in vineyards, combining classic 3D reconstruction with a robust instance segmentation network to obtain a clear 3D structure of the grapes; the structure of this SLAM system was relatively simple, so error accumulation remains a risk. Ivanov et al. (2020) presented a complete set of technical solutions for outdoor mobile robots covering visual perception, navigation, and movement. While satisfying the basic functions, the solution guaranteed stable exchange of the robot's internal data; it has been successfully applied to different complex scenarios and is of great importance for the behavioral logic design of future mobile robots. In addition to the above cases, other types of agricultural SLAM systems based on GPS, ultrasonic sensors, or inertial measurement units (IMUs) have also been proposed and verified (Chen et al., 2018, Gan et al., 2017, Katikaridis et al., 2019).

The SLAM technologies involved in the above research make it possible to build vision systems suitable for large-scale environments. However, it should be pointed out that this line of work is still in its infancy, and much remains to be done to realize stable and practical orchard picking applications. Capua et al. (2018) pointed out that the performance of existing SLAM systems cannot fully meet the needs of large-scale agricultural tasks; they believed that building a new form of SLAM system, optimizing the target tracking algorithm, and increasing the type and number of sensors may solve some of the problems. Other studies have also noted that off-the-shelf SLAM systems may encounter stability problems when applied to agricultural environments (Chebrolu et al., 2017, Gao et al., 2018, Pierzchała et al., 2018). In addition, the huge demand of large-scale SLAM for computing resources is one of the most important factors restricting its application in the agricultural field (Aguiar et al., 2020, Zhao et al., 2020).

In fact, although SLAM achieves localization and mapping at the same time, most SLAM applications on autonomous platforms (self-driving, high-altitude cruising, logistics scheduling, etc.) pay more attention to real-time localization of the mobile platform than to the 3D detail of the constructed map. To ensure real-time performance, dense spatial features are usually discarded and only a sparse map is constructed; even when a dense map is built, it typically serves navigation only. Picking in large-scale orchards, on the contrary, requires high local 3D reconstruction accuracy to determine the shape, size, maturity, and 3D structure of each observable fruit, and then to generate a yield map. The more complete the output yield map, the better it supports estimating suitable picking areas and forming a picking plan for the robot. Therefore, a global mapping system fully compatible with the nature of orchard picking tasks is urgently needed. Compared with a SLAM system, a stereo vision system pays more attention to local structural details, so it can be utilized to improve the resolution of SLAM and construct a higher-performance orchard mapping system.

In this study, a new form of mobile picking robot and a mapping framework integrating stereo vision and SLAM were established. Specifically, an eye-in-hand stereo vision system was built to sample images and generate local maps, and a SLAM system was constructed to estimate the global trajectory of the cameras. Finally, the local maps were stitched according to the global trajectory to form a detailed, wide 3D orchard map. The main contribution of this work is to combine high-accuracy stereo vision with large-scale SLAM technology into a high-performance global mapping framework compatible with fruit-picking tasks. All the technologies mentioned in this study are necessary parts of this mapping framework, and they cooperate with one another.
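
The stitching step can be pictured as a rigid transform of each local point cloud by the SLAM pose of the camera that captured it. The sketch below is a minimal illustration, not the authors' implementation: it assumes local maps arrive as N×3 NumPy arrays in each camera's frame and that SLAM supplies 4×4 world-from-camera poses.

```python
import numpy as np

def stitch_local_maps(local_maps, world_from_cam_poses):
    """Fuse per-view point clouds into one global map.

    local_maps           : list of (N_i, 3) arrays, points in each camera frame
    world_from_cam_poses : list of 4x4 homogeneous transforms from SLAM
    """
    global_points = []
    for points, T_wc in zip(local_maps, world_from_cam_poses):
        # Homogenize, map into the world frame, then drop the trailing 1s.
        homogeneous = np.hstack([points, np.ones((len(points), 1))])
        global_points.append((homogeneous @ T_wc.T)[:, :3])
    return np.vstack(global_points)
```

In practice a voxel-grid downsample or outlier filter would follow the concatenation to keep the fused map tractable; that detail is omitted here.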

Section snippets

Overall framework

An eye-in-hand mobile robot based on binocular vision and a 6-DOF robotic arm was constructed. The system comprises four modules, namely sampling, object detection, local dense mapping, and global trajectory estimation, as shown in Fig. 1.
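
To make the dataflow between these modules concrete, the following skeleton sketches one plausible control loop; every class and method name in it (robot, detector, stereo, slam and their calls) is a placeholder assumption rather than the authors' API.

```python
def build_global_map(robot, detector, stereo, slam):
    """Hypothetical loop over the four modules of Fig. 1 (interfaces assumed)."""
    local_maps, poses = [], []
    for arm_pose in robot.sampling_poses():         # sampling module
        left, right = robot.capture_stereo(arm_pose)
        fruit_boxes = detector.detect(left)         # object detection module
        cloud = stereo.dense_match(left, right, fruit_boxes)  # local dense mapping
        local_maps.append(cloud)
        poses.append(slam.track(left))              # global trajectory estimation
    # Fuse the local clouds along the estimated trajectory, reusing the
    # stitch_local_maps sketch from the Introduction.
    return stitch_local_maps(local_maps, poses)
```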

The sampling module was implemented based on the eye-in-hand system. An efficient hand-eye calibration method was proposed to determine the coordinate transformation between the robotic arm and the camera.
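
The details of the proposed calibration method are not included in this snippet. As a reference point, the standard eye-in-hand formulation solves AX = XB for the camera-to-gripper transform, and OpenCV ships solvers for it, including the Andreff (1999) and Daniilidis (1999) methods cited in the references. A minimal sketch, assuming the per-view arm and calibration-board poses have already been collected:

```python
import cv2

def calibrate_eye_in_hand(R_gripper2base, t_gripper2base,
                          R_target2cam, t_target2cam):
    """Estimate the camera pose in the gripper frame (classic AX = XB).

    R_gripper2base, t_gripper2base : per-view arm poses from the controller
    R_target2cam,  t_target2cam    : per-view board poses, e.g. from cv2.solvePnP
    """
    R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base,
        R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_DANIILIDIS,  # dual-quaternion solver
    )
    return R_cam2gripper, t_cam2gripper
```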

The object detection module and the local dense mapping

Experiment

This section is divided into two parts: local experiments and global experiments. In the local experiments, the performance of the sampling, object detection, and local dense mapping modules was verified under local orchard scenes. In the global experiments, detailed and broad global 3D maps of the orchards were obtained, validating the performance of the entire mapping system.

Conclusions and future work

The basic vision technologies of orchard picking robots, such as image segmentation, 3D positioning, and surface reconstruction, have taken initial shape through the joint efforts of today's researchers. Global perception, general frameworks, and practical applications are believed to be the key to the future development of visual picking robots.

In this study, a mobile platform equipped with an eye-in-hand stereo vision system and a 6-DOF robotic arm was built. A flexible 3D

CRediT authorship contribution statement

Mingyou Chen: Methodology, Software, Writing - review & editing, Writing - original draft. Yunchao Tang: Conceptualization, Writing - original draft. Xiangjun Zou: Supervision. Zhaofeng Huang: Methodology, Data curation. Hao Zhou: Investigation, Visualization. Siyu Chen: Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the Special Funding Project of Guangdong Enterprise Science and Technology Commissioner (GDKTP2020029500), the Key-area Research and Development Program of Guangdong Province (2019B020223003), and the Major scientific research projects of Guangdong Province (2020KZDZX1037).

References (40)

  • Andreff, N., Horaud, R., Espiau, B., 1999. On-line hand-eye calibration, in: Second International Conference on 3-D...
  • Capua, F.R., Sansoni, S., Moreyra, M.L., 2018. Comparative analysis of Visual-SLAM algorithms applied to fruit...
  • Chebrolu, N., et al., 2017. Agricultural robot dataset for plant classification, localization and mapping on sugar beet fields. Int. J. Rob. Res.
  • Daniilidis, K., 1999. Hand-eye calibration using dual quaternions. Int. J. Rob. Res.
  • Dong, W., et al., 2020. Semantic mapping for orchard environments by merging two-sides reconstructions of tree rows. J. Field Robot.
  • Fan, Y., et al., 2018. Estimating tree position, diameter at breast height, and tree height in real-time using a mobile phone with RGB-D SLAM. Remote Sens.
  • Gan, H., Lee, W.S., Alchanatis, V., 2017. A prototype of an immature citrus fruit yield mapping system, in: 2017 ASABE...
  • Gao, X., et al., 2018. Review of wheeled mobile robots’ navigation problems and application prospects in agriculture. IEEE Access.
  • Ge, Y., et al., 2019. Fruit localization and environment perception for strawberry harvesting robots. IEEE Access.
  • Habibie, N., Nugraha, A.M., Anshori, A.Z., Ma’sum, M.A., Jatmiko, W., 2018. Fruit mapping mobile robot on simulated...