Late fusion of multimodal deep neural networks for weeds classification

https://doi.org/10.1016/j.compag.2020.105506

Highlights

  • Developing methods for a late fusion of multiple Deep Neural Network models for better performance.

  • Proposing methods to determine priority weights for models.

  • Comparing the methods to determine the optimal solution.

  • Enabling classification in near real-time.

Abstract

In agriculture, many types of weeds have a harmful impact on agricultural productivity. Recognizing weeds and understanding the threat they pose to farmlands is a significant challenge because many weeds are quite similar in their external structure, making them difficult to classify. A weeds classification approach with high accuracy and quick processing should be incorporated into automatic devices in smart agricultural systems to solve this problem. In this study, we develop a novel classification approach via a voting method using the late fusion of multimodal Deep Neural Networks (DNNs). The score vector used for voting is calculated either by a Bayesian conditional probability-based method or by determining priority weights so that better DNN models contribute more to the score. We experimentally studied the Plant Seedlings and Chonnam National University (CNU) Weeds datasets with 5 DNN models: NASNet, Resnet, Inception–Resnet, Mobilenet, and VGG. The results show that our methods achieved an accuracy of 97.31% on the Plant Seedlings dataset and 98.77% on the CNU Weeds dataset. Furthermore, our framework can classify an image in near real-time.

Introduction

Weeds are one of the most critical factors in the reduction of agricultural productivity and add to the burden on farmers’ crops. Correct identification of weeds is an essential step in analyzing the threats such weeds pose to agricultural productivity. Identifying weeds belonging to the same family visually can be challenging because of their natural similarity. Various applications that rely on physical features to classify weeds, such as Naive Bayes and Gaussian mixture models, were adopted in De Rainville et al. (2014). Conversely, Persson and Åstrand (2008) applied active shape models to extract weeds from images and k-nearest neighbors (KNN) for classification. However, the handcrafted features were overly simplified and could not capture the characteristics of weeds in real-world applications.

Recently, with the improvement of camera quality and the development of hardware specifications, especially of graphics processing units (GPUs), researchers have achieved significant results on the weeds classification problem by using deep learning models. Dyrmann et al. (2016) built a convolutional neural network (CNN) model based on the concept of the Resnet model. Chavan and Nandedkar (2018) combined the underlying architectures of AlexNet and VGG to form AgroAVNET, which included 6 convolutional and 3 fully connected layers. They applied it to a small dataset (the Plant Seedlings dataset), which includes 12 species and approximately 4200 images from the Aarhus University Computer Vision and Biosystems Signal Processing Group (Giselsson et al., 2017). Their model exhibited impressive performance; however, those deep learning-based methods were not sufficient for the classification of the Plant Seedlings dataset, which includes complex weed structures.

To solve this problem, we propose a novel classification approach using a voting method with the late fusion of multimodal DNNs. The score vector used for voting is calculated either by the Bayesian conditional probability-based method proposed in Kittler et al. (1998) or by using the performance of the DNN models to determine priority weights. The species are then scored using a weighted linear combination or a weighted power multiplication. The better a model's performance, the higher its priority weight. We tested our voting method experimentally by using the late fusion of 5 DNN models on the CNU and Plant Seedlings weeds datasets. We demonstrate that combining multiple DNN models and giving priority to the best performing models can provide more accurate classification results in near real-time on a central processing unit (CPU) or GPU.
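As a sketch of the two score-fusion rules described above (a weighted linear combination and a weighted power multiplication), the following is illustrative only: the per-model probability vectors and accuracy-derived weights are invented, not taken from the paper.

```python
import numpy as np

# Hypothetical softmax outputs of three models for one image over 3 species.
# In the paper, five DNNs (NASNet, Resnet, Inception-Resnet, Mobilenet, VGG)
# each produce such a probability vector; the values here are made up.
scores = np.array([
    [0.70, 0.20, 0.10],   # model 1
    [0.60, 0.30, 0.10],   # model 2
    [0.55, 0.25, 0.20],   # model 3
])

# Priority weights derived from each model's (hypothetical) validation
# accuracy: a better model gets a larger weight; normalized to sum to 1.
acc = np.array([0.97, 0.95, 0.93])
w = acc / acc.sum()

# Weighted linear combination of the score vectors.
linear_score = (w[:, None] * scores).sum(axis=0)

# Weighted power multiplication: product of scores raised to the weights.
power_score = np.prod(scores ** w[:, None], axis=0)

pred_linear = int(np.argmax(linear_score))
pred_power = int(np.argmax(power_score))
print(pred_linear, pred_power)  # both rules select species 0 here
```

Both rules reward species on which the stronger models agree; the power rule penalizes a species more sharply when any single model assigns it a low probability.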

The remainder of this paper is organized as follows. Section 2 summarizes previous approaches of weeds classification using fusion and DNNs models. The CNU Weeds and Plant Seedlings datasets are introduced in Section 3. Section 4 describes the overall voting method for weeds classification using the late fusion of multimodal DNNs and introduces the derivation of the score vector and formula for determining the priority weights. Section 5 presents the setup of our experiment, evaluation metrics, relative error reduction, performance results, and time calculations. The study is concluded in the final section.

Section snippets

Information fusion

Deep learning performance is generally improved by object-based multi-fusion. For weeds classification, Dyrmann et al. (2018) segmented the plants from the background and then separated the leaves from the plants. They fused particular features of the leaves and the plant using Bayesian belief integration to estimate the belief of a correct prediction. This method, however, suffered from the difficulty of leaf segmentation: when a leaf overlaps the target leaf, the belief estimate may be inhibited.

Plant Seedlings dataset

This dataset was created by the Computer Vision and Biosystems Signal Processing Group (2018) at the Department of Engineering, Aarhus University, Denmark (Giselsson et al., 2017). On their website, they provide 3 types of data: original, segmented, and cropped. The original image contains a small area surrounding the plant seedlings, which can lead to difficulty in extracting microfeatures from the surface of the plants. High background noise can also negatively impact the training process

Multimodal fusion

Each DNN architecture learns specific structures from the weeds; therefore, combining their different perspectives may yield better classification results. Based on this idea, we approach the weeds classification problem with a multimodal DNN system and fuse the information through scoring methods to classify the species. Early fusion aims to combine information from different DNNs at the feature level, requiring careful analysis of which feature layers are suitable for fusion to avoid
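In contrast to feature-level early fusion, the simplest decision-level (late) fusion needs no access to internal layers: each model emits a label and the most frequent label wins. A minimal sketch, with invented species names:

```python
from collections import Counter

def majority_vote(predictions):
    """Late (decision-level) fusion: each model contributes one predicted
    label; the most frequent label wins. On a tie, the label that was
    seen first among the predictions is returned (Counter insertion order)."""
    return Counter(predictions).most_common(1)[0][0]

# Five hypothetical model outputs for one image.
print(majority_vote(["ryegrass", "ryegrass", "chickweed", "ryegrass", "fathen"]))
# -> ryegrass
```

The score-vector fusion used in this work refines this idea: instead of one hard vote per model, each model contributes its full probability vector, weighted by how reliable the model is.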

Performance metrics

Suppose the evaluation dataset $D$ contains $m$ images of $c$ species ($m, c \in \mathbb{N}$), $R_i \subseteq D$ is the set of images classified as species $c_i$ ($i \in \mathbb{N}$ and $i \le c$), and $I_k^{R_i} \in R_i$ is the $k$th image in the set $R_i$. Assume $\mathrm{predict}(I_k^{R_i})$ is the prediction result and $\mathrm{species}(I_k^{R_i})$ is the correct label for image $I_k^{R_i}$. We then have the following expressions.

  • True Positive ($TP_i$): The number of images in $R_i$ that are classified correctly:

$$TP_i = \sum_{k=1}^{|R_i|} f_{cmp}\big(\mathrm{predict}(I_k^{R_i}),\, c_i\big)$$

where $f_{cmp}(a, b) = 1$ if $a = b$ and $0$ otherwise.

  • False Positive ($FP_i$): The number of images in $R_i$ that are classified incorrectly.
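The per-species counts defined above can be computed with a short helper; the function name and the sample labels below are illustrative, not from the paper.

```python
import numpy as np

def per_class_tp_fp(y_true, y_pred, num_classes):
    """Per-species true/false positives, following the paper's notation:
    R_i is the set of images predicted as species c_i; TP_i counts those
    whose correct label is also c_i, and FP_i counts the rest of R_i."""
    tp = np.zeros(num_classes, dtype=int)
    fp = np.zeros(num_classes, dtype=int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1
    return tp, fp

# Toy labels for 6 images of 3 species.
y_true = [0, 0, 1, 2, 2, 1]
y_pred = [0, 1, 1, 2, 0, 1]
tp, fp = per_class_tp_fp(y_true, y_pred, 3)
print(tp.tolist(), fp.tolist())  # [1, 2, 1] [1, 1, 0]
```

Note that $TP_i + FP_i = |R_i|$ by construction, since every image predicted as species $c_i$ is either correct or not.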

Conclusion

We introduced a voting method using the late fusion of multimodal DNNs for weeds classification. Our main contribution was proposing Bayesian conditional probability-based scoring and priority weights as methods to calculate the score vector used for voting. Our method estimates the priority weights for models (or species) so that better models (or models that are better for a particular species) have a higher contribution in the scoring step. The experimental results showed that

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was carried out with the support of the “Cooperative Research Program for Agriculture Science and Technology Development (Project No. PJ01385501)”, Rural Development Administration, Republic of Korea.



The authors contributed equally to this work.
