Abstract
In recent years, increasing efforts have been made to predict the time, location, and magnitude of future landslides. This study explores the application of four state-of-the-art data mining models (logistic regression, random forest, support vector machine, and Naïve Bayes tree) to the spatially explicit prediction of landslide susceptibility across a landslide-prone landscape in the Zagros Mountains, Iran. Fifteen conditioning factors and 272 historical landslide events were used to develop a geospatial database for the study area. A two-step factor analysis procedure, based on multicollinearity analysis and the Gain Ratio technique, was performed to measure the predictive utility of the factors and to quantify their contribution to landslide occurrence across the study region. After the models were trained and validated using several performance metrics (ROC-AUC, sensitivity, specificity, accuracy, RMSE, and Kappa), they were applied to the entire study region to generate landslide susceptibility maps. Overall, the random forest model showed the highest training performance (AUC = 0.971; accuracy = 99%; RMSE = 0.120) and the highest ability to predict future landslides (AUC = 0.942; accuracy = 87%; RMSE = 0.312), followed by the support vector machine, Naïve Bayes tree, and logistic regression models. The Wilcoxon signed-rank test further confirmed the superiority of the random forest model for mapping landslide susceptibility in the Zagros region. The insights obtained from this research could be useful for the spatially explicit assessment of landslide-prone landscapes and for a better understanding of the capabilities of different predictive models.
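The abstract names the validation metrics without defining them. The following is a minimal Python sketch of one conventional way to compute them for a binary landslide/non-landslide validation set. The 1/0 label coding, the 0.5 classification threshold, and the function name `evaluate` are assumptions made for illustration; they are not details taken from the study, and the Wilcoxon signed-rank comparison between models is not covered here.

```python
"""Sketch of the validation metrics listed in the abstract.

Assumptions (not from the study): labels are 1 = landslide, 0 = non-landslide,
and a predicted probability >= threshold (default 0.5) counts as a landslide.
"""
import math
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Return ROC-AUC, sensitivity, specificity, accuracy, RMSE and Kappa."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equal length")

    # Confusion-matrix counts at the chosen threshold.
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn

    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / n

    # RMSE between predicted probabilities and observed 0/1 labels.
    rmse = math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n)

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_o = accuracy
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else float("nan")

    # ROC-AUC as the probability that a randomly chosen landslide point scores
    # higher than a randomly chosen non-landslide point (ties count as 0.5).
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if pos and neg:
        wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
                   for a in pos for b in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")

    return {"auc": auc, "sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "rmse": rmse, "kappa": kappa}
```

For example, `evaluate([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6, 0.1])` gives sensitivity, specificity, and accuracy of 2/3, a Kappa of 1/3, and an AUC of 8/9. The pairwise AUC is O(n²), which is adequate for a few hundred validation points; a rank-based formulation would scale better to full raster grids.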
Acknowledgements
This study was supported by the Science and Research Branch, Islamic Azad University. The authors would like to thank the administrative office of natural resources of Chaharmahal and Bakhtiari Province, Iran, for providing the landslide report database.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Fallah-Zazuli, M., Vafaeinejad, A., Alesheykh, A.A. et al. Mapping landslide susceptibility in the Zagros Mountains, Iran: a comparative study of different data mining models. Earth Sci Inform 12, 615–628 (2019). https://doi.org/10.1007/s12145-019-00389-w
DOI: https://doi.org/10.1007/s12145-019-00389-w