Original papers
A hybrid ensemble for classification in multiclass datasets: An application to oilseed disease dataset
Introduction
Machine learning algorithms support effective decision making in agriculture. They are well suited to extracting the complicated relationships that exist in agricultural data (Rocha et al., 2010). High-dimensional agricultural data calls for machine learning feature selection algorithms when the most explanatory or important features (attributes) must be selected from large datasets (El-Bendary et al., 2015, Hill et al., 2014, Kundu et al., 2011, Timmermans and Hulzebosch, 1996). Machine learning classification algorithms, viz. Logistic Regression and Naïve Bayes, have been used successfully for accurate identification of crop diseases (Phadikar et al., 2013, Sankaran et al., 2010, Gutiérrez et al., 2008, Baker and Kirk, 2007).
Soybean, groundnut and rapeseed-mustard are the three most important oilseed crops of the world and play an important role in the oilseed economy. A major concern in increasing and stabilizing oilseed yield is the incidence of pests and diseases, which are to a great extent responsible for the low and unstable production of these crops. Oilseeds are susceptible to various diseases caused by bacteria, fungi, viruses and nematodes, as well as to physiological disorders. Some diseases are widespread and cause great economic losses. Others are limited in distribution and of little economic importance at present, but may become major diseases over time under favorable climatic conditions. The oilseed diseases considered in the present work are Alternaria leaf spot, Anthracnose, Cercospora leaf spot, Charcoal rot, Collar rot, Myrothecium leaf spot, Powdery mildew, Sclerotinia stem rot, Phyllosticta leaf spot and Rust. Crop disease diagnosis is therefore a multiclass classification problem.
In several classification problems, ensembles have proved more effective than a single classification algorithm (Bolón-Canedo et al., 2012, Sun et al., 2007, Stamatatos and Widmer, 2005). Ensembles have great potential in the domain of multiclass classification. Ensemble machine learning methods have been recommended in the literature for different types of classification problems (Hsu, 2012, Kotsiantis, 2007, Dietterich, 2000, Bay, 1999, Opitz, 1999, Ting and Witten, 1999, Zheng and Webb, 1999, Ho, 1998, Breiman, 1996, Wolpert, 1992, Hansen and Salamon, 1990, Schapire, 1990).
In the present work, Vote is an ensemble of the Logistic Regression and Naïve Bayes algorithms. This work proposes a new hybrid ensemble approach that aims to improve the performance of machine learning algorithms on multiclass classification problems, and it compares the proposed hybrid ensemble with ensemble-Vote. The proposed hybrid ensemble is applied to the multiclass problem of oilseed disease diagnosis for accurate identification of disease(s).
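A voting ensemble of two probabilistic classifiers can be illustrated with a minimal Python sketch. It assumes an average-of-probabilities combination rule, which the text does not specify; the function name `average_vote` and the probability values are hypothetical and chosen only for illustration, not taken from the paper's experiments.

```python
from typing import Dict, List


def average_vote(distributions: List[Dict[str, float]]) -> str:
    """Return the class whose mean predicted probability is highest.

    Each dictionary maps a class label to the probability assigned by
    one base classifier. All dictionaries must share the same labels.
    """
    if not distributions:
        raise ValueError("need at least one probability distribution")
    classes = distributions[0].keys()
    mean_prob = {
        c: sum(d[c] for d in distributions) / len(distributions)
        for c in classes
    }
    # Ties are broken by the class order of the first distribution.
    return max(mean_prob, key=mean_prob.get)


# Hypothetical outputs of the two base classifiers for one plant sample.
logistic_probs = {"Rust": 0.50, "Anthracnose": 0.40, "Charcoal rot": 0.10}
naive_bayes_probs = {"Rust": 0.20, "Anthracnose": 0.70, "Charcoal rot": 0.10}

predicted = average_vote([logistic_probs, naive_bayes_probs])
print(predicted)
```

In this made-up sample the two base classifiers disagree, and the averaged distribution favors the class that the more confident classifier supports.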
The paper is organized as follows: Section 2 describes the materials and methods used in the present work. Section 3 presents the new hybrid ensemble approach for multiclass classification problems. Section 4 presents the results and discussion. Section 5 presents the conclusions drawn.
Section snippets
Materials and methods
The tool WEKA (Hall Mark, 2009, Witten and Frank, 2005) is used to generate the predictive models. It is an open-source tool developed at the University of Waikato, New Zealand (http://www.cs.waikato.ac.nz/ml/Weka/).
The proposed hybrid ensemble approach
The hybrid ensemble design is based on the principle that combining the results of multiple machine learning algorithms is superior to the result of a single algorithm.
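One simplified way to see why combining classifiers can help is to compute the accuracy of a strict majority vote. The sketch below assumes that each base classifier is correct independently with the same probability `p` and that a prediction counts only as right or wrong; both are idealizations not stated in the text, and the function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n: int, p: float) -> float:
    """Probability that more than half of n independent classifiers,
    each correct with probability p, give the correct answer."""
    if n < 1 or n % 2 == 0:
        raise ValueError("use an odd, positive number of classifiers")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    k_min = n // 2 + 1  # smallest number of correct votes forming a majority
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


# Three independent classifiers that are each right 70% of the time.
print(round(majority_vote_accuracy(3, 0.7), 3))
# The same calculation when each classifier is right only 40% of the time.
print(round(majority_vote_accuracy(3, 0.4), 3))
```

Under these assumptions the vote beats a single classifier only when `p` exceeds 0.5. Real base classifiers are usually correlated, so the gain in practice tends to be smaller than this calculation suggests.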
Results and discussion
Ten-fold cross validation has been widely used to evaluate the performance of machine learning algorithms because it offers reliable estimates of classification accuracy on each classification task (Arora and Jain, 2014, Azar et al., 2014, Baldi et al., 2000). The experiments evaluating the performance of the hybrid ensemble are therefore performed using a 10-fold cross validation strategy.
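The 10-fold procedure can be sketched in plain Python as follows. This is an illustrative, unstratified version and not the implementation used in the experiments (tools such as WEKA may stratify the folds by class). The names `k_fold_indices`, `cross_validated_accuracy` and `majority_class`, as well as the toy labels, are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and return k (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


def cross_validated_accuracy(X: Sequence, y: Sequence,
                             fit_predict: Callable[[list, list, list], list],
                             k: int = 10, seed: int = 0) -> float:
    """Fraction of samples predicted correctly when each fold is held out once."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)


def majority_class(train_X: list, train_y: list, test_X: list) -> list:
    """Baseline 'classifier': always predict the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)


# Toy data: 15 samples of one class and 5 of another (features unused here).
labels = ["Rust"] * 15 + ["Anthracnose"] * 5
features = list(range(len(labels)))
accuracy = cross_validated_accuracy(features, labels, majority_class, k=10)
print(accuracy)
```

Any function with the same `fit_predict` signature, including an ensemble, could be passed in place of the baseline so that every model is scored on the same folds.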
Conclusions
The paper proposes a new hybrid ensemble approach for improving classification accuracy on multiclass classification problems. It is successfully applied to the accurate diagnosis of oilseed diseases. The performance of the proposed hybrid ensemble is tested for classification accuracy with 10-fold cross validation on four standard agriculture datasets. The accuracy results obtained for these standard datasets show that the hybrid ensemble approach achieves better classification accuracies as
References (47)
- et al. A random forest classifier for lymph diseases. Comput. Meth. Programs Biomed. (2014)
- et al. Comparative analysis of models integrating synoptic forecast data into potato late blight risk estimate systems. Comput. Electron. Agric. (2007)
- et al. Democracy in neural nets: voting schemes for classification. Neural Netw. (1994)
- Nearest neighbor classification from multiple feature subsets. Intell. Data Anal. (1999)
- et al. An ensemble of filters and classifiers for microarray data classification. Pattern Recogn. (2012)
- et al. A comparison of machine learning techniques for detection of drug target articles. J. Biomed. Inform. (2010)
- et al. Logistic regression product-unit neural networks for mapping Ridolfia segetum infestations in sunflower crop using multitemporal remote sensed data. Comput. Electron. Agric. (2008)
- et al. The use of data mining to assist crop protection decisions on kiwifruit in New Zealand. Comput. Electron. Agric. (2014)
- Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis. Comput. Biol. Med. (2011)
- et al. Rice diseases classification using feature selection and rule generation techniques. Comput. Electron. Agric. (2013)
- Automatic fruit and vegetable classification from images. Comput. Electron. Agric.
- A review of advanced techniques for detecting plant diseases. Comput. Electron. Agric.
- Comparative assessment of feature selection and classification techniques for visual inspection of pot plant seedlings. Comput. Electron. Agric.
- Automatic identification of music performers with learning ensembles. Artif. Intell.
- An experimental evaluation of ensemble methods for EEG signal classification. Pattern Recogn. Lett.
- Computer vision system for on-line sorting of pot plants using an artificial neural network classifier. Comput. Electron. Agric.
- Stacked generalization. Neural Netw.
- Relaxing instance boundaries for the search of splitting points of numerical attributes in classification trees. Inf. Sci.
- Machine learning for diagnosis of soybean diseases. Soybean Res.
- Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics
- An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach. Learn.
- Bagging predictors. Mach. Learn.