Early detection of tomato spotted wilt virus infection in tobacco using the hyperspectral imaging technique and machine learning algorithms

https://doi.org/10.1016/j.compag.2019.105066

Highlights

  • Hyperspectral imaging can be used for early detection of TSWV infection in tobacco.

  • NIR is informative and important for identifying the TSWV-infected tobacco leaves.

  • BRT combined with SPA is the best model for TSWV infection detection in tobacco.

  • TSWV infection can be detected before systemic infection is established by RT-PCR.

Abstract

Hyperspectral imaging was used for the non-destructive detection of tomato spotted wilt virus (TSWV) infection in tobacco at an early stage. Spectra ranging from 400 to 1000 nm with 128 bands from inoculated and healthy tobacco plants were analyzed using three wavelength selection methods (successive projections algorithm (SPA), boosted regression tree (BRT), and genetic algorithm (GA)) and four machine learning (ML) techniques (boosted regression tree (BRT), support vector machine (SVM), random forest (RF), and classification and regression trees (CART)). The models built by the BRT algorithm using the wavelengths selected by SPA as input variables performed best in 10-fold cross-validation, with a mean overall accuracy of 85.2% and an area under the receiver operating characteristic curve (AUC) of 0.932. The band selection results and the variable contribution analysis in BRT modeling jointly showed that the near-infrared (NIR) spectral region is informative and important for differentiating infected from healthy tobacco leaves. The post-inoculation period was split into stages according to molecular identification and visual observation. The classification results at the different stages indicated that hyperspectral imaging data combined with ML methods and wavelength selection algorithms can be used for the early detection of TSWV in tobacco, both at the presymptomatic stage and during the period before systemic infection can be detected by the molecular identification approach.

Introduction

Tobacco is an important agricultural and economic crop, both in China and around the world. Notably, China grows approximately one-third of the world’s tobacco crop (Hu et al., 2010). However, both the quality and the output of tobacco can be strongly impacted by plant diseases and insect pests throughout the growing season (Zhu et al., 2017). Tomato spotted wilt virus (TSWV) is one of the most widespread and damaging plant viral pathogens, and can systemically infect many crops, such as tomatoes, peppers, tobacco, zinnia, and lettuce (Krezhova et al., 2014). TSWV has become one of the most dangerous pathogens of tobacco, affecting the cultivation of a wide range of tobacco crops and seriously constraining tobacco quality and yield worldwide (Mandal et al., 2007, McPherson et al., 2002). For example, in Georgia, the incidence of TSWV in flue-cured tobacco caused an average reduction in crop value of 41%, with an estimated economic loss of up to $19.4 million annually in 2007 (Mandal et al., 2007). The disease has been reported in most provinces of China, and is widely distributed in Yunnan Province, one of the most important tobacco-producing regions in China.

Plant health monitoring and timely disease detection are crucial for effective disease control and crop management (Martinelli et al., 2015). Traditional crop disease detection and monitoring approaches mainly consist of empirical evaluation, i.e., visual surveys, and of DNA-based and serological methods, such as polymerase chain reaction (PCR), flow cytometry (FCM), immunofluorescence (IF), and double antibody sandwich enzyme-linked immunosorbent assay (DAS-ELISA) (Fang and Ramasamy, 2015, Madufor et al., 2017). The empirical method is inefficient and unreliable, while the laboratory-based detection techniques are destructive, time-consuming, labor-intensive and costly (Martinelli et al., 2015). DNA-based and serological methods lack the capability to detect infection at the asymptomatic stage, especially with regard to systemically diffused pathogens (Martinelli et al., 2015). In addition, these approaches, especially the laboratory-based methods, require highly trained professionals to carry out the sophisticated techniques. These deficiencies have directed research towards more effective alternative methods for detecting crop diseases at an early stage and over large field areas (Sankaran et al., 2010). Hyperspectral imaging technology has grown significantly over the past two decades (Bioucas-Dias et al., 2013) and has been widely employed for non-destructively investigating biotic and abiotic stresses in crop plants across various spatial and temporal scales (Berger et al., 2018, Galvao et al., 2011, Mananze et al., 2018). Hyperspectral imaging integrates imaging and spectroscopy, so that spatial and spectral information on an object can be acquired simultaneously (Bauriegel et al., 2011, Li et al., 2011). Disease infection leads to variations in the biophysical and biochemical characteristics of plants, e.g., tissue structure, intercellular space, transpiration rate, pigment content, and water content (Rumpf et al., 2010, Slaton et al., 2001). These changes may affect the spectral characteristics of plants, which can be captured by the hyperspectral platform (Zhu et al., 2016).

Similar research has been conducted by Zhu et al. (2017) and Krezhova et al. (2014). Krezhova et al. (2014) collected hyperspectral reflectance in the visible and near-infrared ranges to discriminate TSWV-infected tobacco leaves from healthy ones. The spectral data were acquired at 14 and 20 days post-inoculation (DPI), and statistical analysis methods were used to detect the development of TSWV infection in tobacco plants. The presence of TSWV was established at 14 DPI. Zhu et al. (2017) demonstrated that it is possible to detect tobacco mosaic virus (TMV) infection in tobacco plants at the presymptomatic stage using hyperspectral imaging. They compared different machine learning algorithms for classifying disease stages. In this study, hyperspectral image data were acquired at a very early stage (beginning from the first day after inoculation) and processed to assess their potential for the early detection of TSWV infection in tobacco plants. We investigated the classification performance of different combinations of wavelength selection methods and machine learning classifiers. As a novel contribution, we used real-time polymerase chain reaction (RT-PCR) to compare the timeliness of the hyperspectral imaging technique with that of the molecular identification approach for TSWV infection detection.

High dimensionality and multi-collinearity frequently occur in hyperspectral data because of the large number of highly correlated spectral values within the dataset (Ng et al., 2019, Wei et al., 2017). Therefore, effective wavelength (EW) selection is essential for hyperspectral analysis to maximize the efficiency of data use and reduce computational complexity (Delalieux et al., 2007, Ng et al., 2019). Various approaches have been used to address the multi-collinearity problem, such as principal component regression (Surhone et al., 2013), genetic algorithm (GA), successive projections algorithm (SPA) (Araújo et al., 2001, Xie et al., 2015, Zhu et al., 2017), and partial least squares regression (PLSR) models (Ng et al., 2019). GA has been used for feature selection in spectral data in many studies (Dou et al., 2015, Li et al., 2011, Ma et al., 2003). Xie et al. (2015) used SPA to select the most important wavelengths for identifying different diseases on tomato leaves using hyperspectral imaging. The wavelengths selected by SPA retained most of the valid information and played significant roles in disease detection. Zhu et al. (2017) also adopted SPA for EW identification in their study of presymptomatic detection of tobacco mosaic virus (TMV) infection using hyperspectral imaging. Both papers showed that SPA was an effective method for EW selection. ElMasry et al. (2007) identified optimal wavelengths using PLS models, and the prediction models based on the wavelengths selected by PLS achieved accuracies close to those of models built on the full spectral range. To compare different feature selection methods for plant disease detection, we used three algorithms (GA, SPA, and BRT) to select wavebands from the hyperspectral image data.
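To illustrate how SPA limits collinearity among the selected bands, the following is a minimal Python sketch of its projection phase. It is not the implementation used in this study. It assumes the spectra are given as a list of samples, each a list of band reflectances; the function name spa_select and the parameters n_select and start_band are hypothetical. The full algorithm additionally chooses the starting band and the number of bands by validation error, which is omitted here.

```python
import math


def spa_select(spectra, n_select, start_band=0):
    """Return n_select band indices chosen by successive projections.

    spectra    : list of samples, each a list of reflectance values per band
    n_select   : number of bands to keep (assumed <= number of bands)
    start_band : index of the first band (an arbitrary assumption here)
    """
    n_bands = len(spectra[0])

    # One mean-centred column vector per band.
    cols = []
    for j in range(n_bands):
        col = [row[j] for row in spectra]
        mean = sum(col) / len(col)
        cols.append([v - mean for v in col])

    selected = [start_band]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_sq = sum(v * v for v in ref)
        best_j, best_norm = None, -1.0
        for j in range(n_bands):
            if j in selected:
                continue
            # Remove from band j the component along the last selected band,
            # i.e. project it onto the orthogonal subspace.
            if ref_sq > 0:
                coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_sq
                cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm = math.sqrt(sum(v * v for v in cols[j]))
            # Keep the band with the largest remaining (non-redundant) signal.
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)
    return selected
```

With 128-band spectra, a call such as spa_select(spectra, 6) would return six band indices. A band that is a scaled copy of an already selected band has zero norm after projection, so it is chosen last.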

Numerous data mining techniques have been employed in previous studies for classification and prediction based on remote sensing data, including statistical analysis methods, such as principal component analysis (PCA) and discriminant analysis (DA), and machine learning (ML) algorithms, such as artificial neural networks (ANNs) (Were et al., 2015), support vector machine (SVM) (Were et al., 2015), classification and regression trees (CART) (Razi and Athappilly, 2005), and boosted regression tree (BRT) (Yang et al., 2016). Rumpf et al. (2010) presented an automatic method for the early detection of sugar beet diseases using SVM and hyperspectral reflectance. They discriminated between diseased and healthy sugar beet plants with a classification accuracy of 97%. They also explored the potential of presymptomatic detection of different diseases on sugar beet, obtaining classification accuracies between 65% and 90%. Wang et al. (2008) adopted ANNs to predict late blight (LB) disease on tomatoes based on spectral reflectance. By comparing different network structures, they predicted healthy and diseased tomato canopies with correlation coefficients between predicted and measured values of 0.99 and 0.82 for field experiments and remotely sensed images, respectively, suggesting that an ANN with back-propagation training could be employed for spectral detection of LB infections on tomato. Random forest (RF) and BRT are relatively new machine learning algorithms. Michez et al. (2016) used RF for forest health condition classification based on imagery from an unmanned aerial vehicle (UAV), and obtained good overall accuracies (over 90%). Machine learning techniques have thus been used successfully in prior identification and classification studies and are promising modeling tools for identifying disease in plants using hyperspectral image data. To compare different ML algorithms for plant disease identification, we selected four methods (CART, SVM, RF and BRT) for the early detection of TSWV infection in tobacco plants.
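The abstract reports model performance as mean overall accuracy and AUC under 10-fold cross-validation. The sketch below shows one way such an evaluation could be organized in plain Python. All function names are hypothetical, the nearest-centroid scorer is only a stand-in for the BRT, SVM, RF and CART models, and the stratified fold assignment is an assumption, since the text does not state how the folds were drawn.

```python
import random


def auc_score(labels, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with balanced classes.

    Assumes every class has at least k samples, so no fold lacks a class.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def fit_centroids(X, y):
    """Stand-in classifier (assumption): the mean spectrum of each class."""
    model = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        model[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def centroid_score(model, x):
    """Score in [0, 1]; higher means closer to the class-1 (infected) mean."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, model[0])) ** 0.5
    d1 = sum((a - b) ** 2 for a, b in zip(x, model[1])) ** 0.5
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


def cross_validate(X, y, fit, predict_score, k=10, threshold=0.5):
    """Return (mean overall accuracy, mean AUC) over k held-out folds."""
    accs, aucs = [], []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [predict_score(model, X[i]) for i in test_idx]
        truth = [y[i] for i in test_idx]
        preds = [1 if s >= threshold else 0 for s in scores]
        accs.append(sum(p == t for p, t in zip(preds, truth)) / len(truth))
        aucs.append(auc_score(truth, scores))
    return sum(accs) / k, sum(aucs) / k
```

Any model that exposes a fit step and a continuous score can be passed in place of the stand-in, for example cross_validate(X, y, fit_centroids, centroid_score, k=10), where X holds the reflectance values at the selected bands and y marks healthy (0) and inoculated (1) samples.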

More specifically, this paper has the following purposes: (1) to verify the applicability of hyperspectral imaging for detecting TSWV infection in tobacco plants at an early stage; (2) to identify the optimal predictive wavebands by using different wavelength selection methods, including GA, SPA and BRT; (3) to develop prediction models based on different machine learning techniques, including CART, SVM, RF and BRT; (4) to determine the best combination of band selection method and prediction model technique for the early detection of TSWV in tobacco; and (5) to compare the timeliness of the hyperspectral imaging technique with that of the molecular identification approach for TSWV infection detection.


Experimental design

The experiment was performed at the Zhejiang Academy of Agricultural Science. A total of 80 tobacco plants (Nicotiana benthamiana) were grown in a climate chamber under environmentally controlled conditions (temperature 20–25 °C, humidity 50–70%) with a 12/12 h photoperiod. Of these, 40 plants were inoculated with TSWV at the 4–6 leaf stage, and the remaining 40 plants served as controls. TSWV was inoculated on tobacco plants following previous studies (Krezhova et al., 2014, Zhu et

Disease development

The non-inoculated tobacco plants kept growing healthily throughout the experimental period. For the infected plants, after a latent period of five days, symptoms of TSWV started to appear at 6 DPI (Fig. 3). The small spots visible on the inoculated leaves expanded rapidly from 7 DPI and formed large necrotic areas at 8 DPI. Molecular identification results of TSWV-infected tobacco leaves are presented in Fig. 4. TSWV coat protein was detected by

Conclusions

This study investigated the potential of using changes in spectral reflectance in the VIS/NIR region (400–1000 nm) to identify tobacco plants infected with TSWV at an early stage. A comprehensive method was developed by using the hyperspectral imaging platform in conjunction with GA, SPA and BRT to define several optimal wavelengths and four machine learning algorithms (CART, BRT, SVM and RF) for classification. Six bands were selected by SPA, six bands by GA, and eight bands by the BRT

Acknowledgement

We gratefully acknowledge the financial support from the National Natural Science Foundation of China (Grant Nos. 41601024 and 31501220). We also thank the editor and three reviewers for their valuable comments and suggestions, which improved this paper.

References (59)

  • R.A. Naidu et al. The potential of spectral reflectance technique for the detection of Grapevine leafroll-associated virus-3 in two red-berried wine grape cultivars. Comput. Electron. Agr. (2009)

  • W. Ng et al. Optimizing wavelength selection by using informative vectors for parsimonious infrared spectra modelling. Comput. Electron. Agr. (2019)

  • S.H. Park et al. Receiver operating characteristic (ROC) curve: practical review for radiologists. Korean J. Radiol. (2004)

  • J. Peng et al. Silencing of NbXrn4 facilitates the systemic infection of Tobacco mosaic virus in Nicotiana benthamiana. Virus Res. (2011)

  • M.A. Razi et al. A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models. Expert Syst. Appl. (2005)

  • M. Rodrigues et al. An insight into machine-learning algorithms to model human-caused wildfire occurrence. Environ. Modell. Softw. (2014)

  • E. Roghanian et al. An optimization model for reverse logistics network under stochastic environment by using genetic algorithm. J. Manuf. Syst. (2014)

  • T. Rumpf et al. Early detection and classification of plant diseases with Support Vector Machines based on hyperspectral reflectance. Comput. Electron. Agr. (2010)

  • F. Salazar et al. An empirical comparison of machine learning techniques for dam behaviour modelling. Struct. Saf. (2015)

  • S. Sankaran et al. A review of advanced techniques for detecting plant diseases. Comput. Electron. Agr. (2010)

  • C. Wei et al. Hyperspectral characterization of freezing injury and its biochemical impacts in oilseed rape leaves. Remote Sens. Environ. (2017)

  • K. Were et al. A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape. Ecol. Ind. (2015)

  • F. Yan et al. A simplified method for constructing artificial microRNAs based on the osa-MIR528 precursor. J. Biotechnol. (2012)

  • R. Yang et al. Comparison of boosted regression tree and random forest models for mapping topsoil organic carbon concentration in an alpine ecosystem. Ecol. Ind. (2016)

  • M. Zhang et al. Detection of stress in tomatoes induced by late blight disease in California, USA, using hyperspectral remote sensing. Int. J. Appl. Earth Obs. (2003)

  • X. Zhang et al. Detecting macronutrients content and distribution in oilseed rape leaves based on hyperspectral imaging. Biosyst. Eng. (2013)

  • N. Bagheri et al. Detection of Fire Blight disease in pear trees by hyperspectral data. Eur. J. Remote Sens. (2018)

  • J. Behmann et al. Generation and application of hyperspectral 3D plant models: methods and challenges. Mach. Vision Appl. (2016)

  • K. Berger et al. Evaluation of the PROSAIL model capabilities for future hyperspectral model environments: a review study. Remote Sens. (2018)