This paper presents the development, testing, and validation of SWEEPER, a robot for harvesting sweet pepper fruit in greenhouses. The robotic system includes a six-degrees-of-freedom industrial arm equipped with a specially designed end effector, an RGB-D camera, a high-end computer with a graphics processing unit, programmable logic controllers, other electronic equipment, and a small container to store harvested fruit. All components are mounted on a cart that autonomously drives on pipe rails and on the concrete floor in the end-user environment. The overall operation of the harvesting robot is described along with details of the algorithms for fruit detection and localization, grasp pose estimation, and motion control. The main contributions of this paper are the integrated system design and its validation and extensive field testing in a commercial greenhouse for different varieties and growing conditions. A total of 262 fruits were involved in a 4-week testing period. The average cycle time to harvest a fruit was 24 s. Logistics took approximately 50% of this time (7.8 s for discharge of the fruit and 4.7 s for platform movements). Laboratory experiments have shown that the cycle time can be reduced to 15 s by running the robot manipulator at a higher speed. The harvest success rates were 61% under best-fit crop conditions and 18% under current crop conditions. This reveals the importance of finding the best-fit crop conditions and crop varieties for successful robotic harvesting. The SWEEPER robot is the first sweet pepper harvesting robot to demonstrate this kind of performance in a commercial greenhouse.
In this paper we present the use of PointNet, a deep neural network that consumes raw unordered point clouds, for the detection of grape vine clusters in outdoor conditions. We investigate the added value of feeding the detection network with both RGB and depth, contrary to the common practice in agricultural robotics of relying on RGB only. A total of 5057 point clouds (1033 manually annotated and 4024 annotated using geometric reasoning) were collected in a field experiment conducted in outdoor conditions on 9 grape vines and 5 plants. The detection results show an overall accuracy of 91% (average class accuracy of 74%, precision 53%, recall 48%) for RGBXYZ data and a significant drop in recall for RGB-only or XYZ-only data. These results suggest that the use of depth cameras is crucial for vision in agricultural robotics for crops where the color contrast between the crop and the background is limited. The results also suggest that geometric reasoning can be used to increase training set size, a major bottleneck in the development of agricultural vision systems.
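As an illustrative sketch only (not code from the paper), the RGBXYZ input compared above can be formed by packing each RGB-D frame into an N x 6 array of metric coordinates and normalized colors; the function and variable names below are assumptions.

    import numpy as np

    # Illustrative sketch (assumed representation, not the paper's code):
    # pack an organized RGB-D frame into an N x 6 (XYZRGB) point cloud,
    # the input type compared against XYZ-only and RGB-only above.
    def to_xyzrgb(xyz, rgb):
        """xyz: HxWx3 metric coordinates; rgb: HxWx3 uint8 colors."""
        pts = xyz.reshape(-1, 3).astype(np.float32)
        cols = rgb.reshape(-1, 3).astype(np.float32) / 255.0    # normalize colors
        valid = np.isfinite(pts).all(axis=1) & (pts[:, 2] > 0)  # drop invalid depth
        return np.hstack([pts[valid], cols[valid]])              # rows: x, y, z, r, g, b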
Current practice for vine yield estimation is based on RGB cameras and has limited performance. In this paper we present a method for outdoor vine yield estimation using a consumer-grade RGB-D camera mounted on a mobile robotic platform. An algorithm for automatic grape cluster size estimation using depth information is evaluated both in controlled outdoor conditions and in commercial vineyard conditions. Ten video scans (3 camera viewpoints with 2 different backgrounds and 2 natural light conditions), acquired from a controlled outdoor experiment and a commercial vineyard setup, are used for the analyses. The collected dataset (GRAPES3D) is released to the public. A total of 4542 regions of 49 grape clusters were manually labeled by a human annotator for comparison. Eight variations of the algorithm are assessed, both for manually labeled and auto-detected regions. The effects of viewpoint, presence of an artificial background, and the human annotator are analyzed using statistical tools. Results show a 2.8-3.5 cm average error for all acquired data and reveal the potential of using low-cost commercial RGB-D cameras for improved robotic yield estimation.
Advanced automation is required for greenhouse production systems due to the lack of a skilled workforce and increasing labour costs [1]. As part of the EU project SWEEPER, we are working on developing an autonomous robot able to harvest sweet pepper fruits in greenhouses. This paper focuses on the operational flow of the robot for high-level task planning.
In the SWEEPER project, an RGB camera is mounted on the end effector to detect fruits. Due to the dense plant rows, the camera is located at most 40 cm from the plants and hence cannot provide an overview of all fruit locations; only a few ripe fruits can be seen in each acquisition. This implies that the robot must incorporate a search pattern to look for fruits. When at least one fruit has been detected in the image, the search is aborted and a harvesting phase is initiated. This phase starts by directing the manipulator to a point close to the fruit and then activating a visual servo control loop. This approach ensures that the fruit is grasped despite the occlusions caused by stems and leaves. When the manipulator has reached the fruit, the fruit is harvested and automatically released into a container. If more fruits have already been detected, the system continues to pick them. When all detected fruits have been harvested, the system resumes the search pattern. When the search pattern is finished and no more fruits are detected, the robot base advances along the row to the next plant and the operations above are repeated.
To support implementation of this workflow in a program controlling the actual robot, a generic software framework for the development of agricultural and forestry robots was used [2]. The framework is based on a hybrid robot architecture and uses a state machine that implements the workflow as a flowchart.
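A minimal sketch of such a state machine is given below, assuming hypothetical robot and camera interfaces (next_search_pose, detect_ripe_fruits, visual_servo_to, and so on); it mirrors only the control flow described above, not the actual SWEEPER or framework code [2].

    from enum import Enum, auto

    # Minimal sketch of the search/harvest state machine described above.
    # All robot/camera methods and detect_ripe_fruits are hypothetical placeholders.
    class State(Enum):
        SEARCH = auto()
        APPROACH = auto()
        HARVEST = auto()
        ADVANCE = auto()

    def run(robot, camera):
        state, queue = State.SEARCH, []
        while True:
            if state is State.SEARCH:
                pose = robot.next_search_pose()      # predefined search pattern
                if pose is None:                     # pattern finished, no fruit found
                    state = State.ADVANCE
                    continue
                robot.move_arm_to(pose)
                queue = detect_ripe_fruits(camera.grab())
                if queue:                            # abort search, start harvesting
                    state = State.APPROACH
            elif state is State.APPROACH:
                robot.move_arm_near(queue[0])        # point close to the fruit
                robot.visual_servo_to(queue[0])      # closed-loop final approach
                state = State.HARVEST
            elif state is State.HARVEST:
                robot.cut_and_release()              # harvest and drop into container
                queue.pop(0)
                state = State.APPROACH if queue else State.SEARCH
            elif state is State.ADVANCE:
                if not robot.drive_to_next_plant():  # advance platform along the row
                    return                           # end of the row
                state = State.SEARCH                 # restart search at the new plant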
Robotic harvesters that use visual servoing must choose the best direction from which to approach the fruit in order to minimize occlusion and avoid obstacles that might interfere with detection along the approach. This work proposes different approach strategies, compares them in terms of cycle times, and presents a failure analysis methodology for the different approach strategies. The approach strategies are: in-field assessment by human observers, evaluation based on an overview image using advanced algorithms or remote human observers, and attempting multiple approach directions until the fruit is successfully reached. In the latter strategy, each attempt costs time, which is a major bottleneck in bringing harvesting robots to the market. Alternatively, a single-approach strategy that attempts only one direction can be applied if the best approach direction is known a priori. The approach strategies were evaluated for a case study of sweet pepper harvesting in laboratory and greenhouse conditions. The first experiment, conducted in a commercial greenhouse, revealed that the fruit approach cycle time increased by 8% and 116% for reachable and unreachable fruits, respectively, when the multiple-approach strategy was applied, compared to the single-approach strategy. The second experiment measured human observers' ability to provide insights into approach directions based on overview images taken in both greenhouse and laboratory conditions. Results revealed that human observers are accurate in detecting unapproachable directions but tend to miss approachable directions. By detecting fruits that are unreachable (via automatic algorithms or human operators), harvesting cycle times can be significantly shortened, leading to improved commercial feasibility of harvesting robots.
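For illustration only, the single- and multiple-approach strategies compared above can be sketched as follows; try_approach is a hypothetical placeholder, and each extra call is what adds the cycle-time overhead reported for unreachable fruits.

    # Illustrative sketch of the two strategies; try_approach() is hypothetical.
    def single_approach(fruit, best_direction):
        # One attempt from an a priori known best direction.
        return try_approach(fruit, best_direction)

    def multiple_approach(fruit, candidate_directions):
        # Attempt directions in turn; every failed attempt adds cycle time.
        for direction in candidate_directions:
            if try_approach(fruit, direction):
                return True
        return False  # fruit unreachable from all candidate directions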
RGB-D cameras play an increasingly important role in the localization and autonomous navigation of mobile robots. Reasonably priced commercial RGB-D cameras have recently been developed for operation in greenhouse and outdoor conditions. They can be employed for different agricultural and horticultural operations such as harvesting, weeding, pruning and phenotyping. However, the depth information extracted from the cameras varies significantly between objects and sensing conditions. This paper presents an evaluation protocol applied to a commercially available Fotonic F80 time-of-flight RGB-D camera for eight different object types. A case study of autonomous sweet pepper harvesting was used as an exemplary agricultural task. Each of the chosen objects is a possible item that an autonomous agricultural robot must detect and localize to perform well. A total of 340 rectangular regions of interest (ROIs) were marked for the extraction of performance measures of point cloud density and variability around the center of mass (30-100 ROIs per object type). An additional 570 ROIs were generated (57 manually and 513 replicated) to evaluate the repeatability and accuracy of the point cloud. A statistical analysis was performed to evaluate the significance of differences between object types. The results show that different objects have significantly different point densities. Specifically, metallic materials and black-colored objects had significantly lower point density than organic and other artificial materials introduced to the scene, as expected. The point cloud variability measures showed no significant differences between object types, except for the metallic knife, which presented significant outliers in the collected measures. The accuracy and repeatability analysis showed that 1-3 cm errors are due to the difficulty of a human annotating exactly the same area, and up to ±4 cm of error is due to the sensor not generating exactly the same point cloud when sensing a fixed object.
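As a rough illustration (assumed definitions, not necessarily the protocol's exact measures), per-ROI point density and variability around the center of mass can be computed along the following lines.

    import numpy as np

    # Illustrative sketch of per-ROI measures: density as valid points per unit
    # ROI area, variability as mean distance from the ROI's center of mass.
    # The exact definitions used in the paper may differ.
    def roi_measures(points, roi_area_m2):
        """points: N x 3 array of ROI points in metres; roi_area_m2: ROI area."""
        valid = points[np.isfinite(points).all(axis=1)]
        density = len(valid) / roi_area_m2                    # points per square metre
        center_of_mass = valid.mean(axis=0)
        variability = np.linalg.norm(valid - center_of_mass, axis=1).mean()
        return density, variability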
An autonomous sweet pepper harvesting robot must perform several tasks to successfully harvest a fruit. Due to the highly unstructured environment in which the robot operates and the presence of occlusions, the current challenges are to improve the detection rate and lower the risk of losing sight of the fruit while approaching it for harvest. It is therefore crucial to choose the approach direction with the least occlusion from obstacles.
The value of ideal information regarding the best approach direction was evaluated by comparing it to a method that attempts several directions until harvesting succeeds. A laboratory experiment was conducted on artificial sweet pepper plants using an eye-in-hand system comprising a 6DOF robotic manipulator equipped with an RGB camera. Performance was evaluated in laboratory conditions using both descriptive statistics of the average harvesting times and harvesting success, as well as regression models. The results show a roughly 40–45% increase in average harvest time when no a priori information on the correct harvesting direction is available, with a nearly linear increase in overall harvesting time for each failed harvesting attempt. The variability of the harvesting times grows with the number of approaches required, reducing the ability to predict them.
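The near-linear growth is what a simple additive time model (assumed notation, for illustration only) would predict: total harvest time ≈ single-approach time + number of failed attempts × time per attempt, so each failed approach adds roughly one attempt time to the cycle.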
Tests show that occlusion of the front of the peppers significantly impacts harvesting times. The main reason is the robot's limited workspace, which often makes paths to positions at the side of the peppers considerably longer than paths to positions in front of the fruit, where the space is more open.