Early detection of tomato spotted wilt virus infection in tobacco using the hyperspectral imaging technique and machine learning algorithms
Introduction
Tobacco is an important agricultural and economic crop, both in China and around the world. Notably, China grows approximately one-third of the world's tobacco crop (Hu et al., 2010). However, plant diseases and insect pests can strongly reduce both the quality and the yield of tobacco throughout the growing season (Zhu et al., 2017). Tomato spotted wilt virus (TSWV) is one of the most widespread and damaging plant viral pathogens and can systemically infect many crops, such as tomato, pepper, tobacco, zinnia, and lettuce (Krezhova et al., 2014). TSWV has become one of the most dangerous diseases of tobacco, affecting the cultivation of a wide range of tobacco types and seriously constraining tobacco quality and yield worldwide (Mandal et al., 2007, McPherson et al., 2002). For example, in Georgia, TSWV in flue-cured tobacco reduced crop value by an average of 41%, with economic losses estimated in 2007 at up to $19.4 million per year (Mandal et al., 2007). The disease has been reported in most provinces of China and is widely distributed in Yunnan Province, one of the most important tobacco-producing regions in China.
Plant health monitoring and timely disease detection are crucial for effective disease control and crop management (Martinelli et al., 2015). Traditional approaches to crop disease detection and monitoring consist mainly of empirical evaluation (i.e., visual surveys) and laboratory-based DNA and serological methods, such as polymerase chain reaction (PCR), flow cytometry (FCM), immunofluorescence (IF), and double antibody sandwich enzyme-linked immunosorbent assay (DAS-ELISA) (Fang and Ramasamy, 2015, Madufor et al., 2017). Empirical evaluation is inefficient and unreliable, while laboratory-based techniques are destructive, time-consuming, labor-intensive and costly (Martinelli et al., 2015). Moreover, DNA-based and serological methods cannot detect infection at the asymptomatic stage, especially for systemically diffused pathogens (Martinelli et al., 2015). In addition, these approaches, particularly the laboratory-based ones, require highly trained personnel. These deficiencies have directed research towards more effective alternative methods that can detect crop diseases at an early stage and at field scale (Sankaran et al., 2010). Hyperspectral imaging technology has developed rapidly over the past two decades (Bioucas-Dias et al., 2013) and has been widely employed for the non-destructive investigation of biotic and abiotic stresses in crop plants across various spatial and temporal scales (Berger et al., 2018, Galvao et al., 2011, Mananze et al., 2018). Hyperspectral imaging integrates imaging and spectroscopy, so that spatial and spectral information about an object is acquired simultaneously (Bauriegel et al., 2011, Li et al., 2011).
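The data produced by such a system can be pictured as a three-dimensional array (a "hypercube") with two spatial axes and one spectral axis, so that every pixel carries a complete spectrum. The NumPy sketch below assumes a reflectance cube of shape (rows, cols, bands), a boolean leaf mask and a synthetic wavelength grid over 400–1000 nm; the function `mean_roi_spectrum` and all values are hypothetical. Averaging the pixels of a region of interest in this way is one common means of reducing an image to a single spectrum per leaf.

```python
import numpy as np


def mean_roi_spectrum(cube: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average the spectra of all pixels selected by a boolean mask.

    cube: reflectance array of shape (rows, cols, bands).
    mask: boolean array of shape (rows, cols), True for leaf pixels.
    Returns an array of shape (bands,).
    """
    if cube.shape[:2] != mask.shape:
        raise ValueError("mask must match the spatial dimensions of the cube")
    if not mask.any():
        raise ValueError("mask selects no pixels")
    # Boolean indexing on the two spatial axes yields (n_pixels, bands).
    return cube[mask].mean(axis=0)


# Hypothetical example: a 4 x 5 image with 7 bands spread over 400-1000 nm.
wavelengths = np.linspace(400, 1000, 7)
rng = np.random.default_rng(0)
cube = rng.uniform(0.0, 0.6, size=(4, 5, 7))
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 1:4] = True  # a 2 x 3 "leaf" region

spectrum = mean_roi_spectrum(cube, mask)
print(spectrum.shape)    # (7,)
print(cube[2, 3].shape)  # the spectrum of a single pixel: (7,)
```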
Disease infection leads to changes in the biophysical and biochemical characteristics of plants, e.g., tissue structure, intercellular space, transpiration rate, pigment content, and water content (Rumpf et al., 2010, Slaton et al., 2001). These changes can alter the spectral characteristics of plants, and such alterations can be captured by a hyperspectral platform (Zhu et al., 2016).
Similar research has been conducted by Krezhova et al. (2014) and Zhu et al. (2017). Krezhova et al. (2014) collected hyperspectral reflectance in the visible and near-infrared ranges to discriminate TSWV-infected tobacco leaves from healthy ones. Spectral data were acquired at 14 and 20 days post-inoculation (DPI), and statistical analysis methods were used to detect the development of TSWV infection in tobacco plants; the presence of TSWV was established at 14 DPI. Zhu et al. (2017) demonstrated that tobacco mosaic virus (TMV) infection in tobacco plants can be detected at the presymptomatic stage using hyperspectral imaging, and compared different machine learning algorithms for classifying disease stages. In this study, hyperspectral image data were acquired from a very early stage (beginning on the first day after inoculation) and processed to assess their potential for the early detection of TSWV infection in tobacco plants. We investigated the classification performance of different combinations of wavelength selection methods and machine learning classifiers. As a novel element, we used real-time polymerase chain reaction (RT-PCR) to compare the timeliness of the hyperspectral imaging technique with that of the molecular identification approach for detecting TSWV infection.
Hyperspectral data frequently suffer from high dimensionality and multicollinearity because they contain a large number of highly correlated spectral values (Ng et al., 2019, Wei et al., 2017). Therefore, selecting effective wavelengths (EWs) is essential in hyperspectral analysis to maximize the efficiency of data use and reduce computational complexity (Delalieux et al., 2007, Ng et al., 2019). Various approaches have been used to address multicollinearity, such as principal component regression (Surhone et al., 2013), the genetic algorithm (GA), the successive projections algorithm (SPA) (Araújo et al., 2001, Xie et al., 2015, Zhu et al., 2017), and partial least squares regression (PLSR) models (Ng et al., 2019). GA has been used for feature selection in spectral data in many studies (Dou et al., 2015, Li et al., 2011, Ma et al., 2003). Xie et al. (2015) used SPA to select the most important wavelengths for identifying different diseases on tomato leaves with hyperspectral imaging; the wavelengths selected by SPA retained most of the valid information and played significant roles in disease detection. Zhu et al. (2017) also adopted SPA to identify EWs in their study of the presymptomatic detection of TMV infection using hyperspectral imaging. Both studies showed that SPA is an effective method for EW selection. ElMasry et al. (2007) identified optimal wavelengths using PLS models, and prediction models based on the PLS-selected wavelengths achieved accuracies close to those of models built on the full spectral range. To compare different feature selection methods for plant disease detection, we used three algorithms (GA, SPA, and boosted regression trees, BRT) to select wavebands from the hyperspectral image data.
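To illustrate how SPA counters multicollinearity, the sketch below implements its projection phase as described by Araújo et al. (2001): starting from one band, it repeatedly projects every band vector onto the subspace orthogonal to the most recently selected band and then picks the band with the largest residual norm, so that each new band adds the least redundant information. This is a minimal NumPy sketch, not the study's code; the function name `spa_select` and the synthetic data are hypothetical. A complete SPA run typically repeats this phase for each starting band and subset size and keeps the subset with the best validation performance.

```python
import numpy as np


def spa_select(X: np.ndarray, n_select: int, start: int) -> list[int]:
    """Projection phase of the successive projections algorithm (SPA).

    X: calibration matrix of shape (n_samples, n_bands).
    n_select: number of wavelengths to choose.
    start: index of the initial wavelength.
    Returns the selected band indices in the order they were chosen.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_bands = X.shape
    if not 1 <= n_select <= min(n_samples, n_bands):
        raise ValueError("n_select must be between 1 and min(n_samples, n_bands)")
    if not 0 <= start < n_bands:
        raise ValueError("start must be a valid band index")

    Xp = X.copy()
    selected = [start]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]]
        denom = xk @ xk
        if denom == 0.0:
            raise ValueError("remaining bands are linearly dependent")
        # Project every column onto the orthogonal complement of xk,
        # i.e. remove its component along the last selected band.
        Xp = Xp - np.outer(xk, (xk @ Xp) / denom)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never pick a band twice
        # The band with the largest residual norm carries the most new information.
        selected.append(int(np.argmax(norms)))
    return selected


# Synthetic example: 9 bands forming three groups of near-duplicate (collinear) bands.
rng = np.random.default_rng(1)
base = rng.normal(size=(30, 3))
X = np.repeat(base, 3, axis=1) + 0.01 * rng.normal(size=(30, 9))
sel = spa_select(X, n_select=3, start=0)
print(sorted(s // 3 for s in sel))  # → [0, 1, 2]: one band from each group
```

Because near-duplicate bands lose almost all of their norm once one member of their group is projected out, SPA moves on to a different group instead of picking a redundant neighbour.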
Numerous data mining techniques have been employed in previous studies for classification and prediction based on remote sensing data, including statistical methods such as principal component analysis (PCA) and discriminant analysis (DA), and machine learning (ML) algorithms such as artificial neural networks (ANNs) (Were et al., 2015), support vector machines (SVM) (Were et al., 2015), classification and regression trees (CART) (Razi and Athappilly, 2005), and boosted regression trees (BRT) (Yang et al., 2016). Rumpf et al. (2010) presented an automatic method for the early detection of sugar beet diseases using SVM and hyperspectral reflectance, discriminating between diseased and healthy sugar beet plants with a classification accuracy of 97%. They also explored the presymptomatic detection of different sugar beet diseases, obtaining classification accuracies between 65% and 90%. Wang et al. (2008) used ANNs to predict late blight (LB) disease on tomatoes from spectral reflectance. By comparing different network structures, they successfully predicted healthy and diseased tomato canopies, with correlation coefficients between predicted and measured values of 0.99 for field experiments and 0.82 for remotely sensed images, suggesting that an ANN trained with back-propagation could be employed for the spectral detection of LB infection in tomato. Random forest (RF) and BRT are relatively recent machine learning algorithms. Michez et al. (2016) used RF to classify forest health conditions from unmanned aerial vehicle (UAV) imagery and obtained overall accuracies above 90%. Machine learning techniques have thus been used successfully in previous identification and classification studies and are promising modeling tools for identifying plant diseases from hyperspectral image data.
To compare different ML algorithms for plant disease identification, we selected four methods (CART, SVM, RF and BRT) for the early detection of TSWV infection in tobacco plants.
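A comparison of the four classifiers can be sketched with scikit-learn on synthetic spectra. This is a hypothetical illustration, not the study's pipeline: the simulated spectra, the 5% reflectance drop assumed for infected samples and the hyperparameters are all assumptions. `DecisionTreeClassifier` is used as a CART implementation, and scikit-learn's `GradientBoostingClassifier` stands in for BRT. Each model is scored by stratified 5-fold cross-validated accuracy.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic reflectance spectra (hypothetical): a sigmoid "red edge" near 710 nm,
# with infected samples assumed to reflect 5% less light than healthy ones.
rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 1000, 50)
base = 0.05 + 0.45 / (1.0 + np.exp(-(wavelengths - 710.0) / 15.0))
healthy = base + rng.normal(0.0, 0.02, size=(40, wavelengths.size))
infected = 0.95 * base + rng.normal(0.0, 0.02, size=(40, wavelengths.size))
X = np.vstack([healthy, infected])
y = np.array([0] * 40 + [1] * 40)  # 0 = healthy, 1 = infected

# Untuned, near-default hyperparameters; gradient boosting stands in for BRT.
models = {
    "CART": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "BRT": GradientBoostingClassifier(random_state=0),
}

# Stratified 5-fold cross-validation keeps the class balance in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, acc in scores.items():
    print(f"{name}: mean accuracy = {acc:.2f}")
```

Wrapping the scaler and the SVM in one pipeline means the scaler is refitted inside each training fold, so no information from the test fold leaks into preprocessing.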
More specifically, the purposes of this paper are: (1) to assess the applicability of hyperspectral imaging for detecting TSWV infection in tobacco plants at an early stage; (2) to identify the optimal predictive wavebands using different wavelength selection methods, including GA, SPA and BRT; (3) to develop prediction models based on different machine learning techniques, including CART, SVM, RF and BRT; (4) to determine the best combination of band selection method and prediction model for the early detection of TSWV in tobacco; and (5) to compare the timeliness of the hyperspectral imaging technique and the molecular identification approach for TSWV infection detection.
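Objective (2) includes BRT as a wavelength selector. One common way to use boosted trees for this, sketched here under stated assumptions, is to fit the ensemble on the full spectrum and keep the k bands with the highest relative importance. The function `brt_rank_bands`, the synthetic data and the use of scikit-learn's impurity-based `feature_importances_` are hypothetical choices for illustration; the study's own BRT settings and ranking criterion are not given in this section.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def brt_rank_bands(X: np.ndarray, y: np.ndarray, k: int, random_state: int = 0) -> list[int]:
    """Return the indices of the k bands with the highest boosted-tree importance."""
    model = GradientBoostingClassifier(random_state=random_state)
    model.fit(X, y)
    # feature_importances_ is impurity-based and normalised to sum to 1.
    order = np.argsort(model.feature_importances_)[::-1]
    return [int(i) for i in order[:k]]


# Hypothetical data: 20 bands, of which only bands 4, 11 and 17 determine the class.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
informative = [4, 11, 17]
y = (X[:, informative].sum(axis=1) > 0).astype(int)

top = brt_rank_bands(X, y, k=3)
print(sorted(top))
```

The selected indices can then be used to subset the spectra (for example `X[:, top]`) before training any of the classifiers, which is how a band-selection method and a prediction model are combined.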
Experimental design
The experiment was performed at the Zhejiang Academy of Agricultural Sciences. A total of 80 tobacco plants (Nicotiana benthamiana) were grown in a climate chamber under controlled environmental conditions (temperature 20–25 °C, humidity 50–70%) with a 12/12 h photoperiod. Of these, 40 plants were inoculated with TSWV at the 4–6-leaf stage, and the remaining 40 plants served as controls. TSWV was inoculated on tobacco plants according to the previous studies (Krezhova et al., 2014, Zhu et
Disease development
Non-inoculated control plants remained healthy throughout the experiment. In the inoculated plants, TSWV symptoms first appeared at 6 DPI, after a five-day latent period (Fig. 3). The small spots visible on the inoculated leaves expanded rapidly from 7 DPI and formed large necrotic areas at 8 DPI. Molecular identification results of TSWV-infected tobacco leaves are presented in Fig. 4. TSWV coat protein was detected by
Conclusions
This study investigated the potential of using changes in spectral reflectance in the VIS/NIR region (400–1000 nm) to identify TSWV-infected tobacco plants at an early stage. A comprehensive method was developed that combines the hyperspectral imaging platform with GA, SPA and BRT to select optimal wavelengths and with four machine learning algorithms (CART, BRT, SVM and RF) for classification. Six bands were selected by SPA, six bands by GA, and eight bands by the BRT
Acknowledgement
We gratefully acknowledge the financial support from the National Natural Science Foundation of China (Grant Nos. 41601024 and 31501220). We also thank the editor and three reviewers for their valuable comments and suggestions that improved this paper.
References (59)
The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometr. Intell. Lab. (2001).
Early detection of Fusarium infection in wheat using hyper-spectral imaging. Comput. Electron. Agr. (2011).
Detection of biotic stress (Venturia inaequalis) in apple trees using hyperspectral data: non-parametric statistical approaches and physiological implications. Eur. J. Agron. (2007).
Near-infrared hyperspectral imaging for predicting colour, pH and tenderness of fresh beef. J. Food Eng. (2012).
Hyperspectral imaging for nondestructive determination of some quality attributes for strawberry. J. Food Eng. (2007).
On intra-annual EVI variability in the dry season of tropical forest: a case study with MODIS and hyperspectral data. Remote Sens. Environ. (2011).
A hybrid forecasting approach applied to wind speed time series. Renew. Energ. (2013).
Multi-objective optimization using genetic algorithms: a tutorial. Reliab. Eng. Syst. Safe. (2006).
An effective feature selection method for hyperspectral image classification based on genetic algorithm and support vector machine. Knowl.-Based Syst. (2011).
A spatial–temporal approach to monitoring forest disease spread using multi-temporal high spatial resolution imagery. Remote Sens. Environ. (2006).
The potential of spectral reflectance technique for the detection of Grapevine leafroll-associated virus-3 in two red-berried wine grape cultivars. Comput. Electron. Agr.
Optimizing wavelength selection by using informative vectors for parsimonious infrared spectra modelling. Comput. Electron. Agr.
Receiver operating characteristic (ROC) curve: practical review for radiologists. Korean J. Radiol.
Silencing of NbXrn4 facilitates the systemic infection of Tobacco mosaic virus in Nicotiana benthamiana. Virus Res.
A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models. Expert Syst. Appl.
An insight into machine-learning algorithms to model human-caused wildfire occurrence. Environ. Modell. Softw.
An optimization model for reverse logistics network under stochastic environment by using genetic algorithm. J. Manuf. Syst.
Early detection and classification of plant diseases with Support Vector Machines based on hyperspectral reflectance. Comput. Electron. Agr.
An empirical comparison of machine learning techniques for dam behaviour modelling. Struct. Saf.
A review of advanced techniques for detecting plant diseases. Comput. Electron. Agr.
Hyperspectral characterization of freezing injury and its biochemical impacts in oilseed rape leaves. Remote Sens. Environ.
A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape. Ecol. Ind.
A simplified method for constructing artificial microRNAs based on the osa-MIR528 precursor. J. Biotechnol.
Comparison of boosted regression tree and random forest models for mapping topsoil organic carbon concentration in an alpine ecosystem. Ecol. Ind.
Detection of stress in tomatoes induced by late blight disease in California, USA, using hyperspectral remote sensing. Int. J. Appl. Earth Obs.
Detecting macronutrients content and distribution in oilseed rape leaves based on hyperspectral imaging. Biosyst. Eng.
Detection of Fire Blight disease in pear trees by hyperspectral data. Eur. J. Remote Sens.
Generation and application of hyperspectral 3D plant models: methods and challenges. Mach. Vision Appl.
Evaluation of the PROSAIL model capabilities for future hyperspectral model environments: a review study. Remote Sens.