Original papers
A hybrid ensemble for classification in multiclass datasets: An application to oilseed disease dataset

https://doi.org/10.1016/j.compag.2016.03.026

Abstract

The paper presents a new hybrid ensemble approach that combines machine learning algorithms, a feature ranking method and a supervised instance filter. Its aim is to improve the performance of machine learning algorithms on multiclass classification problems. The effectiveness of the new hybrid ensemble approach is tested on four standard agriculture multiclass datasets, and it performs better on all of them. It is then applied to a multiclass oilseed disease dataset. Ensemble-Vote is observed to perform better than the Logistic Regression and Naïve Bayes algorithms used individually. The performance of the hybrid ensemble is then compared with that of ensemble-Vote. The results show that the new hybrid ensemble approach outperforms ensemble-Vote, improving oilseed disease classification accuracy to as much as 94.73%.

Introduction

Machine learning algorithms are useful for effective decision making in agriculture. These algorithms have a strong capability for extracting the complicated relationships that exist in agricultural data (Rocha et al., 2010). High-dimensional agricultural data requires machine learning feature selection algorithms when the most explanatory or important features (attributes) are to be selected from large datasets (El-Bendary et al., 2015, Hill et al., 2014, Kundu et al., 2011, Timmermans and Hulzebosch, 1996). Machine learning classification algorithms, viz. Logistic Regression and Naïve Bayes, have been used successfully for accurate identification of crop diseases (Phadikar et al., 2013, Sankaran et al., 2010, Gutiérrez et al., 2008, Baker and Kirk, 2007).

Soybean, groundnut and rapeseed-mustard are the three most important oilseed crops of the world. They play an important role in the oilseed economy. One of the major concerns in increasing and stabilizing the yield of oilseeds is the incidence of pests and diseases, which are to a great extent responsible for the low and unstable production of these crops. Oilseeds are susceptible to various diseases caused by bacteria, fungi, viruses and nematodes, as well as to physiological disorders. Some diseases are widespread and cause great economic losses, while others are limited in distribution and are not of much economic importance at present but may become major diseases over time under favorable climatic conditions. Oilseed diseases considered in the present work include Alternaria leaf spot, Anthracnose, Cercospora leaf spot, Charcoal rot, Collar rot, Myrothecium leaf spot, Powdery mildew, Sclerotinia stem rot, Phyllosticta leaf spot and Rust. Crop disease diagnosis is a multiclass classification problem.

In several classification problems, ensembles have proved to be more effective than a single classification algorithm (Bolón-Canedo et al., 2012, Sun et al., 2007, Stamatatos and Widmer, 2005). Ensembles have great potential in the domain of multiclass classification. Ensemble machine learning methods have been recommended in the literature for different types of classification problems (Hsu, 2012, Kotsiantis, 2007, Dietterich, 2000, Bay, 1999, Opitz, 1999, Ting and Witten, 1999, Zheng and Webb, 1999, Ho, 1998, Breiman, 1996, Wolpert, 1992, Hansen and Salamon, 1990, Schapire, 1990).

In the present work, Vote is an ensemble of the Logistic Regression and Naïve Bayes algorithms. This work proposes a new hybrid ensemble approach that aims to improve the performance of machine learning algorithms on multiclass classification problems. A further aim is to compare the proposed hybrid ensemble approach with ensemble-Vote. The proposed hybrid ensemble approach is applied to the multiclass problem of oilseed disease diagnosis for accurate identification of disease(s).
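
To make the idea of a Vote ensemble concrete, the following is a minimal sketch that combines the class-probability outputs of two classifiers by averaging them. The averaging rule is an assumption: the text above does not state which combination rule was used. The function names and the probability values are hypothetical and chosen only for illustration; the class labels are taken from the disease list above.

```python
from typing import Dict, List


def average_vote(distributions: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-class probabilities produced by several classifiers."""
    if not distributions:
        raise ValueError("need at least one probability distribution")
    # Collect every class label that any classifier reports.
    classes = set()
    for dist in distributions:
        classes.update(dist)
    n = len(distributions)
    # A class missing from one classifier's output counts as probability 0.
    return {c: sum(d.get(c, 0.0) for d in distributions) / n
            for c in sorted(classes)}


def predict(distributions: List[Dict[str, float]]) -> str:
    """Return the class with the highest averaged probability."""
    combined = average_vote(distributions)
    return max(combined, key=combined.get)


# Hypothetical outputs for one instance (illustrative values, not results
# from the paper).
logistic = {"Rust": 0.50, "Anthracnose": 0.30, "Charcoal rot": 0.20}
naive_bayes = {"Rust": 0.20, "Anthracnose": 0.70, "Charcoal rot": 0.10}

print(predict([logistic, naive_bayes]))  # prints "Anthracnose"
```

In this example the two classifiers disagree (Logistic Regression favours Rust, Naïve Bayes favours Anthracnose), and the averaged distribution settles the disagreement in favour of the class with the stronger combined support.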

The paper is organized as follows: Section 2 describes the materials and methods used in the present work. Section 3 presents the new hybrid ensemble approach for multiclass classification problems. Section 4 presents the results and discussion. Section 5 presents the conclusions drawn.

Materials and methods

The tool WEKA (Hall Mark, 2009, Witten and Frank, 2005) is used for the generation of predictive models. It is an open-source tool developed at the University of Waikato, New Zealand (http://www.cs.waikato.ac.nz/ml/Weka/).

The proposed hybrid ensemble approach

The hybrid ensemble design is based on the principle that combining the results of multiple machine learning algorithms gives a better result than any single algorithm alone.
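
One common way to illustrate this principle is to compute how often a strict majority of classifiers is correct. The sketch below assumes that every classifier has the same accuracy and that their errors are independent; both assumptions are idealizations that real classifiers rarely satisfy, and neither is taken from the paper. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, accuracy: float) -> float:
    """Probability that a strict majority of classifiers is correct.

    Assumes every classifier is correct with the same probability and
    that their errors are independent (an idealization).
    """
    if n_classifiers < 1 or n_classifiers % 2 == 0:
        raise ValueError("use an odd, positive number of classifiers")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    needed = n_classifiers // 2 + 1  # smallest strict majority
    # Binomial tail: at least `needed` of the classifiers are correct.
    return sum(
        comb(n_classifiers, k)
        * accuracy ** k
        * (1.0 - accuracy) ** (n_classifiers - k)
        for k in range(needed, n_classifiers + 1)
    )


for n in (1, 3, 5):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```

Under these assumptions, three classifiers that are each correct 70% of the time give a majority that is correct about 78.4% of the time. The benefit shrinks as the classifiers' errors become correlated, which is why ensembles are usually built from algorithms that differ from one another.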

Results and discussion

Ten-fold cross validation has been used successfully for evaluating the performance of machine learning algorithms, as it offers reliable estimates of classification accuracy on each classification task (Arora and Jain, 2014, Azar et al., 2014, Baldi et al., 2000). The experiments evaluating the performance of the hybrid ensemble are therefore performed using a 10-fold cross validation strategy.
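
The evaluation procedure can be sketched as follows. This is a minimal illustration of k-fold cross validation, not the WEKA implementation used in the paper: the stratified split, the averaging of per-fold accuracies, the majority-class baseline and the toy labels are all assumptions made for this example, and every function name is hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, List, Sequence


def stratified_folds(labels: Sequence[str], k: int = 10,
                     seed: int = 0) -> List[List[int]]:
    """Split instance indices into k folds with similar class proportions."""
    if k < 2 or k > len(labels):
        raise ValueError("k must be between 2 and the number of instances")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    # Deal each class's shuffled indices round-robin across the folds.
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


def cross_validate(labels: Sequence[str], k: int,
                   train_and_score: Callable[[List[int], List[int]], float],
                   seed: int = 0) -> float:
    """Mean accuracy over k folds; each fold is the test set exactly once."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k


# Toy labels (illustrative only): 30 Anthracnose and 20 Rust instances.
labels = ["Anthracnose"] * 30 + ["Rust"] * 20


def majority_baseline(train_idx: List[int], test_idx: List[int]) -> float:
    """Predict the most frequent training class for every test instance."""
    counts = Counter(labels[i] for i in train_idx)
    guess = counts.most_common(1)[0][0]
    return sum(labels[i] == guess for i in test_idx) / len(test_idx)


print(round(cross_validate(labels, 10, majority_baseline), 2))
```

Any real classifier can be plugged in through the `train_and_score` callback, which receives the training and test indices for one fold and returns the accuracy on the test indices.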

Conclusions

The paper proposes a new hybrid ensemble approach for improving classification accuracy on multiclass classification problems. It is successfully applied to the accurate diagnosis of oilseed diseases. The performance of the proposed hybrid ensemble is tested for classification accuracy with 10-fold cross validation on four standard agriculture datasets. The accuracy results obtained for these standard datasets show that the hybrid ensemble approach achieves better classification accuracies as

References (47)

  • A. Rocha et al. Automatic fruit and vegetable classification from images. Comput. Electron. Agric. (2010)
  • S. Sankaran et al. A review of advanced techniques for detecting plant diseases. Comput. Electron. Agric. (2010)
  • L.O.L.A. Silva et al. Comparative assessment of feature selection and classification techniques for visual inspection of pot plant seedlings. Comput. Electron. Agric. (2013)
  • E. Stamatatos et al. Automatic identification of music performers with learning ensembles. Artif. Intell. (2005)
  • S. Sun et al. An experimental evaluation of ensemble methods for EEG signal classification. Pattern Recogn. Lett. (2007)
  • A.J.M. Timmermans et al. Computer vision system for on-line sorting of pot plants using an artificial neural network classifier. Comput. Electron. Agric. (1996)
  • D.H. Wolpert. Stacked generalization. Neural Netw. (1992)
  • E. Yen et al. Relaxing instance boundaries for the search of splitting points of numerical attributes in classification trees. Inf. Sci. (2007)
  • A. Arora et al. Machine learning for diagnosis of soybean diseases. Soybean Res. (2014)
  • P. Baldi et al. Assessing the accuracy of prediction algorithms for classification and overview. Bioinformatics (2000)
  • Bartaria, A.M., Shukla, A.K., Kaushik, C.D., Kumar, P.R., Singh, N.B., 2001. Major diseases of Rapeseed-Mustard and...
  • E. Bauer et al. An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach. Learn. (1999)
  • L. Breiman. Bagging predictors. Mach. Learn. (1996)