Abstract
Land use and land cover (LULC) maps provide crucial information for monitoring the Earth’s surface and are among the most essential products for numerous studies. Using only spectral information in the classification process may lead to poor performance in areas with heterogeneous landscape characteristics. To overcome this problem, auxiliary and ancillary data are usually employed to improve classification accuracy. The objective of this study is to integrate auxiliary data (topographic and climatic features) and ancillary data (spectral indices and texture measures) with the spectral bands of Sentinel-2A imagery and to evaluate the performance of advanced feature selection methods. In this context, genetic algorithm-based random forest (GA-RF), HSIC-Lasso, and Relief-F feature selection approaches were used to determine the most informative features for classification from a high-dimensional dataset consisting of 102 features. The GA-RF algorithm selected 65 features, HSIC-Lasso selected 38, and Relief-F selected 51 as their ideal subsets. These feature subsets, together with the whole dataset, were fed into a supervised classification process using the random forest (RF) classifier, whose parameters were optimized with a random search algorithm. The highest overall accuracy of the produced thematic maps was estimated as 91.05% for the subset determined by the HSIC-Lasso algorithm, which was also the fastest algorithm (5.71 s). McNemar’s statistical significance test confirmed the superiority of the HSIC-Lasso method over the GA-RF and Relief-F algorithms. The SHapley Additive exPlanations (SHAP) method was also applied to analyze the relative importance of each feature with respect to the model output.
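The McNemar comparison mentioned in the abstract can be illustrated with a short sketch. It assumes the chi-square form of the test with a continuity correction and one degree of freedom; the abstract does not state which variant the authors applied. The function name `mcnemar_test` and the toy labels are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence, Tuple


def mcnemar_test(
    y_true: Sequence[int],
    pred_a: Sequence[int],
    pred_b: Sequence[int],
) -> Tuple[int, int, float, float]:
    """Compare two classifications of the same test samples.

    Returns (f_ab, f_ba, chi2, p_value), where f_ab counts samples that
    classifier A labels correctly and B labels wrongly, and f_ba counts
    the reverse. Assumed form (continuity-corrected chi-square, 1 d.o.f.):
        chi2 = (|f_ab - f_ba| - 1)^2 / (f_ab + f_ba)
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    f_ab = f_ba = 0
    for truth, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == truth), (b == truth)
        if a_ok and not b_ok:
            f_ab += 1
        elif b_ok and not a_ok:
            f_ba += 1

    discordant = f_ab + f_ba
    if discordant == 0:
        # The two maps agree on every sample: no evidence of a difference.
        return f_ab, f_ba, 0.0, 1.0

    # Clamp at zero so the correction never yields a spurious statistic
    # when the discordant counts are equal.
    corrected = max(abs(f_ab - f_ba) - 1, 0)
    chi2 = corrected ** 2 / discordant
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return f_ab, f_ba, chi2, p_value


if __name__ == "__main__":
    # Toy data only: 20 samples of one class, A always right, B wrong on 10.
    truth = [0] * 20
    map_a = [0] * 20
    map_b = [1] * 10 + [0] * 10
    f_ab, f_ba, chi2, p = mcnemar_test(truth, map_a, map_b)
    print(f"f_ab={f_ab}, f_ba={f_ba}, chi2={chi2:.2f}, p={p:.4f}")
    # At the 95% confidence level the difference is significant if chi2 > 3.841.
    print("significant at 95%:", chi2 > 3.841)
```

Only the samples on which the two classifications disagree enter the statistic, which is why two maps with similar overall accuracy can still differ significantly.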
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12145-022-00874-9/MediaObjects/12145_2022_874_Fig11_HTML.png)
Data availability
Data is not available due to legal restrictions.
Author information
Contributions
All authors contributed to the study conception and design. Conceptualization, data curation, methodology analysis and writing original draft, review and editing were performed by Taskin KAVZOGLU and Furkan BILUCAN. Investigation, methodology and software were performed by Furkan BILUCAN. All authors read and approved the final manuscript.
Ethics declarations
Competing interest
The authors declare they have no competing interests.
Additional information
Communicated by H. Babaie.
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kavzoglu, T., Bilucan, F. Effects of auxiliary and ancillary data on LULC classification in a heterogeneous environment using optimized random forest algorithm. Earth Sci Inform 16, 415–435 (2023). https://doi.org/10.1007/s12145-022-00874-9
DOI: https://doi.org/10.1007/s12145-022-00874-9