
Optimization of negative sample selection for landslide susceptibility mapping based on machine learning using K-means-KNN algorithm

Research article, Earth Science Informatics

Abstract

Sample quality plays a vital role in building accurate machine learning models, and regional landslide susceptibility evaluation is no exception. Previous studies have mostly selected samples by random generation, which often fails to yield representative samples. This study therefore proposes the KK-sampling method, which uses the K-means and KNN algorithms to analyze relevant attributes of the study area and select samples. To evaluate the proposed method, MLP, RF, and XGBoost models were trained in conjunction with KK-sampling, with Zhong County, Chongqing, serving as the case study. The results indicate that KK-sampling significantly improves the stability and accuracy of the models. The study also analyzes the importance of landslide factors in Zhong County using SHAP values. The findings provide a reference for establishing a reasonable and effective landslide susceptibility model in the region.
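The abstract does not spell out the steps of KK-sampling, so the following is only a minimal sketch of one plausible reading, not the author's procedure. It assumes that K-means groups all cells by their attributes, that clusters with a low share of known landslides are treated as low risk, and that a KNN check then rejects any candidate whose nearest neighbours include a landslide. The function names (`kmeans`, `kk_sample`), the parameters (`max_share`, `n_neighbors`), the farthest-point initialisation, and the toy coordinates are all hypothetical.

```python
import math


def init_centers(points, k):
    """Deterministic farthest-point initialisation (an assumption, chosen for reproducibility)."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point that lies farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    centers = init_centers(points, k)
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_assign == assign:  # assignments stopped changing: converged
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def kk_sample(points, is_landslide, n_clusters=2, n_neighbors=2, max_share=0.25):
    """Return indices of cells accepted as negative (non-landslide) samples."""
    assign = kmeans(points, n_clusters)
    # Step 1 (K-means): keep clusters whose share of known landslides is low.
    low_risk = set()
    for c in set(assign):
        labels = [lab for lab, a in zip(is_landslide, assign) if a == c]
        if sum(labels) / len(labels) <= max_share:
            low_risk.add(c)
    # Step 2 (KNN): reject candidates with a landslide among their nearest neighbours.
    negatives = []
    for i, p in enumerate(points):
        if is_landslide[i] or assign[i] not in low_risk:
            continue
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        if not any(is_landslide[j] for j in others[:n_neighbors]):
            negatives.append(i)
    return negatives


# Toy attribute space: a landslide-prone group near the origin and a mostly stable group.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (8, 8), (8.5, 8.5), (10, 10), (10, 11), (11, 10), (11, 11)]
is_landslide = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
negatives = kk_sample(points, is_landslide)
print(negatives)  # [7, 8, 9, 10]
```

In this toy run, cell 4 is rejected because it falls in the landslide-dominated cluster, and cell 6 is rejected by the KNN check because its nearest neighbour is a landslide. A real application would standardise the conditioning factors first and choose the number of clusters with a criterion such as the silhouette coefficient.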


Data availability

The datasets analysed during the current study are available in the following table.

| Factor | Source | Web link to datasets |
|---|---|---|
| Elevation | The Earth Science Data Systems (ESDS) Program provided by NASA’s collection of Earth science data | https://search.asf.alaska.edu/#/ |
| NDVI | Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences | https://www.gscloud.cn |
| Lithology | China National Digital Geological Map (Public Version) Spatial Database | http://dcc.ngac.org.cn/ |
| River | Vector Map (Public Version) from National Catalogue Service For Geographic Information | https://www.webmap.cn/ |
| Rain | National Earth System Science Data Center, National Science & Technology Infrastructure of China | http://www.geodata.cn |
| Population | WorldPop, University of Southampton, UK | https://hub.worldpop.org/ |
| Soil | Harmonized World Soil Database Version (HWSD) | https://www.fao.org/soils-portal/data-hub/soil-maps-and-databases/harmonized-world-soil-database-v12/en/ |
| Erosion | Resource & Environment Science & Data Center of Academia Sinica | |
| Road | Vector Map (Public Version) from National Catalogue Service For Geographic Information | https://www.webmap.cn/ |


Acknowledgements

The author thanks the Geoscientific Data & Discovery Publishing System for data sharing, Dr. Huang Jian for his help with the writing of this paper, and the associate editor and reviewers for their useful feedback, which improved the paper.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Contributions

Author 1 (Chao Liu): Conceptualization, Data Curation, Methodology, Software, Investigation, Formal Analysis, Writing—Review & Editing.

Corresponding author

Correspondence to Chao Liu.

Ethics declarations

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Communicated by H. Babaie

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Highlights

• Use the K-means-KNN (KK) algorithm to improve the stability and accuracy of machine-learning-based landslide susceptibility mapping.

• Evaluate and compare landslide susceptibility in Zhong County, Chongqing, using three sampling methods (random sampling, buffer sampling, and KK-sampling) and three models (Multilayer Perceptron, Random Forest, XGBoost).

• A SHAP analysis of the models, based on game theory, reveals that road, soil type, and elevation are the most influential factors.

• On average, KK-sampling outperformed random sampling and buffer sampling, increasing accuracy by 12.3%, F1 score by 42%, and AUC by 20.1%.
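The three metrics named in the last highlight can be made concrete with a short, hedged sketch. The labels, scores, and 0.5 threshold below are invented purely for illustration and are not results from the study. AUC is computed here with the pairwise-ranking definition (the probability that a randomly chosen landslide cell scores higher than a randomly chosen non-landslide cell).

```python
def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the landslide class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def auc(y_true, scores):
    """Share of (landslide, non-landslide) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted susceptibility scores for six cells.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 threshold is an assumption

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(auc(y_true, scores), 3))       # 0.889
```

Accuracy and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank the cells, which is why the three metrics can move by different amounts when the negative samples change.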

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, C. Optimization of negative sample selection for landslide susceptibility mapping based on machine learning using K-means-KNN algorithm. Earth Sci Inform 16, 4131–4152 (2023). https://doi.org/10.1007/s12145-023-01151-z


  • DOI: https://doi.org/10.1007/s12145-023-01151-z
