Abstract
In Precision Agriculture, one of the basic tasks is the classification of land zones into arable or non-arable land. Several studies have addressed this task using data obtained from soil analysis or local exploration of the parcels. However, sometimes only satellite imagery is available. The problem then becomes more challenging, but also more interesting to solve, because relying on satellite data alone is much more cost-effective. In this paper, we consider different spectral and thermal bands from Landsat 8 satellite images of a vineyard located in Galicia, a region in northwestern Spain, and apply a range of supervised Machine Learning methods to classify the different land zones. We conclude that an adequate choice of the algorithm parameters, together with feature selection techniques, can yield a classification that is both highly effective and efficient.
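The abstract's central claim is that tuning algorithm parameters and selecting features jointly improves pixel classification. The following is a minimal, hypothetical sketch of that general idea, not the authors' pipeline. It assumes a single classifier (k-nearest neighbours), greedy forward feature selection as the wrapper method, and k-fold cross-validation to score each (k, feature subset) pair. The data are synthetic: the four "band-like" features, their distributions and the helper names (`make_pixels`, `tune`, etc.) are all illustrative assumptions, not real Landsat 8 values.

```python
import random
from collections import Counter


def make_pixels(n=120, seed=0):
    """Hypothetical synthetic 'pixels' with 4 band-like features.

    Feature 1 is strongly class-dependent, feature 0 weakly so, and
    features 2 and 3 are pure noise. These are NOT real Landsat 8 values.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        arable = i % 2 == 0
        X.append([
            rng.gauss(0.08 if arable else 0.12, 0.03),  # weakly informative
            rng.gauss(0.45 if arable else 0.25, 0.05),  # strongly informative
            rng.gauss(0.30, 0.10),                      # noise
            rng.gauss(0.30, 0.10),                      # noise
        ])
        y.append("arable" if arable else "non-arable")
    return X, y


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest training pixels, using only `features`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, features, folds=5):
    """Accuracy of k-NN estimated by interleaved k-fold cross-validation."""
    n, correct = len(X), 0
    for fold in range(folds):
        test_idx = set(range(fold, n, folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        correct += sum(
            knn_predict(train_X, train_y, X[i], k, features) == y[i]
            for i in test_idx
        )
    return correct / n


def forward_select(X, y, k):
    """Greedy wrapper selection: add the feature that most improves CV accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        score, f = max((cv_accuracy(X, y, k, selected + [f]), f) for f in remaining)
        if score <= best_score:
            break  # no remaining feature improves CV accuracy
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score


def tune(X, y, ks=(1, 3, 5, 7)):
    """Grid search over k, running feature selection for each candidate k."""
    best = None
    for k in ks:
        features, score = forward_select(X, y, k)
        if best is None or score > best[2]:
            best = (k, features, score)
    return best


if __name__ == "__main__":
    X, y = make_pixels()
    k, features, score = tune(X, y)
    print(f"k={k}, selected features={features}, CV accuracy={score:.3f}")
```

On this synthetic data the search should start from the strongly class-dependent feature and keep the noise features out only if they fail to raise cross-validated accuracy. Note that the same cross-validation is used both to choose (k, features) and to report the score, so the reported accuracy is optimistic; a nested cross-validation or a held-out set would be needed for an unbiased estimate.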
Additional information
Communicated by: H. A. Babaie
This work has been supported by the Farm-Oriented Open Data in Europe (FOODIE) Pilot B from the European Union's Seventh Framework Programme for Research, Technological Development and Demonstration under grant agreement no. 621074.
Cite this article
Arango, R.B., Díaz, I., Campos, A. et al. Automatic arable land detection with supervised machine learning. Earth Sci Inform 9, 535–545 (2016). https://doi.org/10.1007/s12145-016-0270-6
DOI: https://doi.org/10.1007/s12145-016-0270-6