1 Introduction

Approximately one third of all pest related agricultural production losses are attributed to weeds [1]. Weeds reduce crop yield by competing with host plants for nutrients, moisture and sunlight in an adaptive process [2]. Herbicide application is a common agricultural practice for mitigating the impact of weeds on crop yield. In the USA, it constitutes two thirds of all chemical application to agricultural fields [3]. The increasing trend of chemical application has raised environmental, biological and sustainability concerns, and recent studies have shown the detrimental effects of these chemicals on human health [4]. To reduce the harmful effects of chemicals while ensuring the profitability of farmers, precision agriculture proposes site specific variable rate application of herbicides, which requires accurate mapping of weed densities [5, 6]. Weed mapping on a large scale is a challenging task due to the spectral similarity of weeds and host plants.

Weed mapping techniques can be classified into two broad categories: interline and intraline. The former assumes that host plants are planted in rows and that everything outside the plant rows is weed [7]. This technique has the inherent flaw of misclassifying intra-row weeds as host plants and inter-row host plants as weeds. Intraline approaches attempt to address these flaws by extracting shape features of plants and classifying them into host plants and weeds. With the advent of deep learning techniques, image classification tasks have become easier due to automated feature extraction. In precision agriculture, different deep learning based classification techniques are being employed. Semantic segmentation is a promising pixel level classification technique for weed density mapping. The bottleneck for this technique is the labelling of data at pixel level, which is time consuming. Recent works have concentrated on training semantic segmentation models on synthetic data and then applying them to real data. However, models trained on synthetic data do not generalize well to real datasets.

In this paper, semantic segmentation is applied to images acquired from oat fields in Saskatchewan for weed density estimation. The paper makes the following contributions:

  1. It proposes a two step manual labelling procedure for pixels in agriculture images.

  2. Semantic segmentation is employed on real oat field imagery for both training and testing.

The proposed methodology achieves an Intersection Over Union (IOU) value of 81.28% for weeds and a Mean Intersection Over Union (MIOU) value of 90.445%. The remainder of the paper is organized as follows: Sect. 2 surveys related works, Sect. 3 explains the methodology, Sect. 4 discusses the results and Sect. 5 concludes the paper.

2 Related Work

The distribution of weeds is not uniform in a field. Their patchy character prompts site specific weed management. Garibay et al. study site specific weed control by thresholding weed density for herbicide spraying [8]. Site specific weed control is not readily adopted by farmers due to accuracy concerns, the unavailability of a robust weed recognition system and the limitations of spraying machinery [9]. Castaldi et al. use Unmanned Aerial Vehicle (UAV) imagery to explore the economic potential of patch spraying and its effects on crop yield [10]. Korres et al. study the relationship between soil properties and weed types with a focus on weeds along highways [11]. Metcalfe et al. demonstrate a correlation between weed and soil properties and predict weed patches in wheat fields with the objective of making site specific weed control more effective [12].

Apart from weed patch prediction based on soil properties, weed detection using computer vision techniques is also widely studied. Traditionally, weed detection involves the following four steps [13]:

  1. RGB or multispectral image acquisition through UAV or ground moving equipment.

  2. Background and foreground (vegetation) segmentation.

  3. Feature extraction from images like shape and colours.

  4. Classification of images based on extracted features.

Saari et al. study UAV and ground equipment mounted sensors for higher resolution imagery [14]. For background segmentation, numerous techniques like Otsu-Adaptive Thresholding, clustering algorithms and principal component analysis are employed to separate vegetation from soil [5, 15, 16]. These colour based segmentation techniques do not perform well under varying sunlight, weather conditions and shadows. Feature extraction and classification techniques can be further categorized into two main classes: interline and intraline approaches. Bah et al. implement the interline approach using a normalized Hough transform to detect crop rows [17]. This approach has the disadvantage of misclassifying interline crop plants as weeds and intraline weeds as host plants. In contrast, the intraline approach assumes that weeds can be both interline and intraline [18]. For this purpose, extra features like texture and shape are extracted from weeds and host plants to classify images [19]. Lastly, different machine learning techniques like support vector machines and artificial neural networks are used to classify plants based on the extracted features [20].

Deep learning has emerged as a powerful machine learning tool in the field of computer vision because of its ability to extract features automatically [21]. Dyrmann et al. detect the location of monocot and dicot weeds in cereal field images using Convolutional Neural Networks (CNN) [22]. Yu et al. apply deep convolutional networks like VGGNet, GoogLeNet and DetectNet for detecting weeds in turf-grass [23]. Semantic segmentation techniques are also being implemented. The bottleneck in semantic segmentation is the pixel wise labelling of images. Dyrmann et al. overcome this problem by synthesizing training images and labels, in which weeds and host plants are placed in randomly overlapping and nonoverlapping configurations [24]. Potena et al. use a small representative dataset to label a large dataset for semantic segmentation [25]. To compensate for the unavailability of large labelled datasets for semantic segmentation, Milioto et al. input vegetation indices as additional variables to the segmentation model [26]. These studies lack real images that are fully labelled at pixel level for semantic segmentation, which is the focus of this work.

3 Methodology

The objective of the study is to estimate weed density for crops grown in the Canadian Prairies. The weed density map will be used for variable rate herbicide application. The approach adopted in this paper can be summarized in three steps. The first step is the acquisition of images, the second is labelling the pixels in a two step procedure, and the third is training a semantic segmentation model to automate weed mapping and weed density calculation. The following subsections give details about these steps.

3.1 Two Step Manual Labelling

For deep learning applications in precision agriculture, large numbers of labelled agriculture images are not available [27]. Semantic segmentation requires images to be labelled at pixel level, which is time consuming. In this study, the focus is on developing an efficient and effective way of labelling RGB images. A two step manual labelling procedure is proposed as follows.

Background Removal Using Maximum Likelihood Segmentation. In the first step, images are preprocessed by segmenting background and foreground using Maximum Likelihood Segmentation (MLS) [28]. Background removal is performed for two reasons: first, to label the background pixels, and second, to facilitate manual labelling of weeds, since some weed plants may be missed during labelling against a highly varied background. ArcGIS is used as the tool for this purpose. Instead of applying a single rule based scheme to all images, we group similar images into batches and train MLS on each batch separately for background removal. MLS is applied in batches because the RGB images vary in leaf colour, light conditions, soil colour, soil moisture content and the mix of dead plants, and some of the images contain the shadow of the sensing equipment. Figure 1 shows instances of these variations.

Fig. 1. Examples of images with shadows, varying sunlight and colours
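The batch-wise background removal can be illustrated with a minimal sketch of a Gaussian maximum likelihood classifier in NumPy. This is not the ArcGIS implementation used in the study; the function names, the two-class (soil and vegetation) setup and the hand-picked training samples are assumptions made for illustration only. Each class is modelled by the mean and covariance of its RGB training samples, and every pixel is assigned to the class under which it is more likely.

```python
import numpy as np

def fit_gaussian(samples):
    """Estimate the mean and covariance of RGB samples with shape (n, 3)."""
    mean = samples.mean(axis=0)
    # A small ridge keeps the covariance invertible for near-uniform samples.
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
    return mean, cov

def log_likelihood(pixels, mean, cov):
    """Gaussian log-likelihood of each pixel (shape (n, 3)), up to a constant."""
    diff = pixels - mean
    inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance of every pixel to the class mean.
    mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (mahal + logdet)

def vegetation_mask(image, soil_samples, veg_samples):
    """Return a boolean (H, W) mask that is True where vegetation is more likely.

    soil_samples and veg_samples are hypothetical training pixels drawn from
    one batch of similar images (assumption: one model is fitted per batch).
    """
    pixels = image.reshape(-1, 3).astype(float)
    ll_soil = log_likelihood(pixels, *fit_gaussian(soil_samples))
    ll_veg = log_likelihood(pixels, *fit_gaussian(veg_samples))
    return (ll_veg > ll_soil).reshape(image.shape[:2])
```

Fitting a separate pair of class models for each batch of similar images lets the class statistics follow the lighting and soil conditions of that batch, which is the motivation given above for applying MLS in batches.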

Manual Labelling. In the second step, minority class pixels are manually labelled using the Labelme software package [29]. Instead of labelling both crop and weeds, only weeds are labelled, on the assumption that they are the minority class in the images. The crop pixels are zeroed out like the background pixels in the first step. Minority class labelling dramatically reduces the time required for manual labelling of pixels. Figure 2 shows an example of a manually labelled image.

Fig. 2. Manual labelling of minority class pixels

3.2 Semantic Segmentation

Semantic segmentation has seen great progress in recent years thanks to the advent of deep learning techniques. Deep learning based semantic segmentation consists of encoding and decoding blocks. The encoding block downsamples the image and extracts features from it, and the decoding block upsamples to the target mask size. The network architecture of the encoder and decoder blocks is determined by a meta-architecture scheme like UNET [30] or SegNet [31]. This paper compares UNET and SegNet on the given dataset. In UNET, the whole feature map is transferred from the encoder block to the decoder block, while in SegNet only the pooling indices are transferred. In both UNET and SegNet, the decoding blocks are the transpose of the encoding blocks. Phased upsampling in UNET and SegNet improves the accuracy of the network [32].
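The difference between the two transfer mechanisms can be sketched on a single-channel feature map in NumPy. This is a simplified illustration under stated assumptions, not the networks trained in this study: real UNET and SegNet layers operate on multi-channel tensors with learned convolutions, and the function names below are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling on an (H, W) map; returns pooled values and argmax indices."""
    h, w = x.shape
    # Gather each 2x2 window into the last axis: shape (H/2, W/2, 4).
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    idx = windows.argmax(axis=-1)
    pooled = np.take_along_axis(windows, idx[..., None], axis=-1)[..., 0]
    return pooled, idx

def segnet_unpool(pooled, idx):
    """SegNet-style upsampling: place each value back at its stored position."""
    h, w = pooled.shape
    windows = np.zeros((h, w, 4), dtype=pooled.dtype)
    np.put_along_axis(windows, idx[..., None], pooled[..., None], axis=-1)
    # Undo the window grouping to recover an (2H, 2W) map.
    return windows.reshape(h, w, 2, 2).transpose(0, 2, 1, 3).reshape(h * 2, w * 2)

def unet_skip(encoder_map, decoder_map):
    """UNET-style skip: upsample the decoder map and stack the full encoder map."""
    upsampled = decoder_map.repeat(2, axis=0).repeat(2, axis=1)
    return np.stack([encoder_map, upsampled], axis=0)
```

The SegNet path only needs to store one small index per pooling window, whereas the UNET path keeps the complete encoder feature map, which costs more memory but preserves all of the encoder detail for the decoder.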

After semantic segmentation is performed on the images, weed density is estimated by the following equation:

$$\begin{aligned} w_d = \frac{\text {Weed pixels in an image}}{\text {Total pixels in the image}} \end{aligned}$$
(1)

Crop pixels are not separately classified because the objective of the study is to estimate weed density (\(w_{d}\)) for variable rate herbicide application. However, crop density (\(c_{d}\)) can be estimated by subtracting the weed density from the vegetation density (\(v_{d}\)) obtained from background segmentation, as given by the following equation:

$$\begin{aligned} c_d = {v_d - w_d} \end{aligned}$$
(2)

where \(v_d\) is the vegetation density and \(c_d\) is the crop density in the image.
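Equations (1) and (2) can be evaluated directly from two binary masks. The following is a minimal sketch, assuming that the vegetation mask comes from the background segmentation step and the weed mask from the segmentation model; the function and argument names are hypothetical.

```python
import numpy as np

def densities(vegetation_mask, weed_mask):
    """Return (w_d, v_d, c_d) for one image from boolean (H, W) masks."""
    total = vegetation_mask.size
    w_d = weed_mask.sum() / total        # Eq. (1): weed pixels over total pixels
    v_d = vegetation_mask.sum() / total  # vegetation pixels over total pixels
    c_d = v_d - w_d                      # Eq. (2): crop density
    return w_d, v_d, c_d
```

For a 4 by 4 image in which 6 pixels are vegetation and 2 of those are weed, the function returns \(w_d = 0.125\), \(v_d = 0.375\) and \(c_d = 0.25\).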

4 Results and Discussion

The study is conducted in collaboration with CropPro Consulting, Canada. RGB images are collected from three oat fields at an early growth stage using a quad mounted Sony DSC-RX100M2 camera. A total of 2109 images are collected in a grid pattern of 60 ft by 80 ft. The dataset is augmented to 4702 images using different combinations of flipping, rotation, shearing, scaling, noise addition, colour variation and blurring. The original images are divided into four tiles of 800 \(\times \) 544 pixels to deal with memory constraints, as downsampling would remove details from the images.
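The tiling step can be sketched as follows. The text does not state how the four tiles are arranged, so the sketch assumes a 2 by 2 grid over an image of 1600 \(\times \) 1088 pixels (width by height); both the grid layout and the function name are assumptions.

```python
import numpy as np

def split_into_tiles(image, rows=2, cols=2):
    """Split an (H, W, ...) image into rows * cols equally sized tiles.

    Tiles are returned in row-major order. Any remainder pixels at the
    right or bottom edge are dropped (assumption: H and W divide evenly).
    """
    h, w = image.shape[:2]
    tile_h, tile_w = h // rows, w // cols
    return [
        image[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
        for r in range(rows)
        for c in range(cols)
    ]
```

Tiling keeps every original pixel at full resolution, whereas downsampling to the same memory footprint would merge neighbouring pixels and blur the fine leaf shapes that distinguish weeds from crop.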

For semantic segmentation, UNET and SegNet are used with VGG16 and ResNet-50 as base models. To evaluate and fine tune the models, the dataset is divided into train, validation and test sets with a split ratio of 70%, 15% and 15% respectively. Thereafter it is augmented to avoid overfitting and to improve generalization. The trained models are evaluated on accuracy, precision, recall, F1, IOU, MIOU and Frequency Weighted Intersection Over Union (FWIOU). F1 score, IOU, MIOU and FWIOU are given by the following equations:

$$\begin{aligned} F1 = \frac{2 \cdot precision\cdot recall}{precision+ recall} \end{aligned}$$
(3)
$$\begin{aligned} IOU = \frac{Area\;of\;overlap}{Area\;of\;union} \end{aligned}$$
(4)
$$\begin{aligned} MIOU = \frac{{IOU_i + IOU_j}}{{k}} \end{aligned}$$
(5)
$$\begin{aligned} FWIOU = {w_i \times IOU_i + w_j \times IOU_j} \end{aligned}$$
(6)

where \(IOU_{i}\) and \(IOU_{j}\) are the IOU values of the two classes, \(w_{i}\) and \(w_{j}\) are the weights of the respective classes and k is the number of pixel classes.
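Equations (3) to (6) can be computed from a ground truth mask and a predicted mask as in the sketch below. The text does not define the class weights; the sketch assumes the common convention that each weight is the fraction of ground truth pixels belonging to that class. The function names are hypothetical.

```python
import numpy as np

def f1_score(precision, recall):
    """Eq. (3): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def segmentation_metrics(y_true, y_pred, num_classes=2):
    """Return (per-class IOU list, MIOU, FWIOU) for integer label masks."""
    ious, weights = [], []
    for c in range(num_classes):
        t, p = (y_true == c), (y_pred == c)
        overlap = np.logical_and(t, p).sum()
        union = np.logical_or(t, p).sum()
        # Eq. (4): area of overlap over area of union.
        ious.append(overlap / union if union else float("nan"))
        # Assumed weight: share of ground truth pixels in class c.
        weights.append(t.sum() / y_true.size)
    miou = sum(ious) / num_classes                        # Eq. (5)
    fwiou = sum(w * i for w, i in zip(weights, ious))     # Eq. (6)
    return ious, miou, fwiou
```

Under this weighting, FWIOU is dominated by the majority class, which is why MIOU and the weed class IOU are the more informative figures for a heavily imbalanced dataset.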

Table 1 summarizes the evaluation metrics on the test dataset. For comparison, the accuracy of a majority class classifier (MCC) is calculated to be 98.27%. The accuracy of the UNET model exceeds this by 1.30% while that of the SegNet model exceeds it by 1.37%. SegNet performs comparatively better than UNET. The IOU for the weed class is 81.28% for the SegNet model, and its MIOU and FWIOU values are 90.445% and 99.29% respectively.

Table 1. Evaluation metrics
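The majority class baseline can be reproduced with a few lines. A classifier that always predicts the majority class is correct exactly on the pixels of that class, so its accuracy equals the majority class share of the pixels. The sketch below uses hypothetical pixel counts chosen only to match the reported 98.27%; they are not taken from the dataset.

```python
def majority_class_accuracy(class_counts):
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())

# Hypothetical counts per 10,000 pixels, chosen to match the reported baseline.
counts = {"crop_background": 9827, "weed": 173}
baseline = majority_class_accuracy(counts)
```

Such a baseline never predicts a weed pixel, so its weed class IOU is zero despite the high accuracy. This is why IOU based metrics are reported alongside accuracy.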

In the developed methodology, models are trained so that crop pixels and background pixels are classified into one class and weed pixels into the other. This means that the models should ideally learn the shape features of the crop and the spectral properties of the background, club them together into one class and label the remaining pixels as weeds. It is pertinent to mention that there is no means available to ascertain what a model is actually learning, other than the clues obtained by testing it on various images. If the model is learning something close to the ideal scenario, then it should be able to map new types of weeds which were not included in the data at the learning stage. To evaluate model performance on new types of weeds, it is tested on images of oat crop containing new weeds. Figure 3a contains a new weed type called Horsetail (highlighted) which has not previously been seen by the model. The trained SegNet model successfully detects and maps this weed, as shown in Fig. 3b.

Fig. 3. SegNet model performance on detecting new types of weeds

There are some cases where the models confuse the weed and crop-background classes. In blurry images, oat plants are mapped as weeds: the models fail to identify crop plants because of their indistinct shapes, and therefore label all vegetation in the image as weed. At the image preprocessing stage, training images were blurred to improve model performance on blurry images. However, when the model is confronted with blurry images like those in Fig. 4, it fails to distinguish crop and weed pixels.

Fig. 4. Examples of model confusion on blurry images

5 Conclusion and Future Recommendations

Accurate mapping of weed and crop densities in a field provides the basis for variable rate herbicide application. Semantic segmentation is a promising technique to estimate these densities. Using the two step manual labelling procedure, a relatively larger set of images can be labelled for model training, resulting in better MIOU and accuracy values. In the proposed methodology, the trained model eliminates crop pixels along with background pixels, and the remaining pixels are labelled as weed pixels. This has the advantage of detecting new weeds which were not seen by the model during training. In the performance comparison of UNET and SegNet, SegNet outperforms UNET. In future work, we plan to club different density zones to provide a basis for variable rate herbicide quantification.