Advanced machine learning techniques for enhanced landslide susceptibility mapping: Integrating geotechnical parameters in the case of Southwestern Cyprus

Tzampoglou, P.; Loukidis, D.; Anastasiades, A.; Tsangaratos, P.

doi:10.1007/s12145-025-01761-9

Advanced machine learning techniques for enhanced landslide susceptibility mapping: Integrating geotechnical parameters in the case of Southwestern Cyprus

RESEARCH
Open access
Published: 03 April 2025

Volume 18, article number 357, (2025)
Cite this article

Download PDF

You have full access to this open access article

Earth Science Informatics Aims and scope Submit manuscript

Advanced machine learning techniques for enhanced landslide susceptibility mapping: Integrating geotechnical parameters in the case of Southwestern Cyprus

Download PDF

P. Tzampoglou^1,2,
D. Loukidis²,
A. Anastasiades¹ &
…
P. Tsangaratos³

650 Accesses
Explore all metrics

Abstract

This study explores the potential enhancement of the performance of machine-learning-based landslide susceptibility analysis by the incorporation of key geotechnical parameters, namely Plasticity Index, Clay Fraction and Geological Strength Index (GSI), alongside geomorphological, geological, and hydrological factors. Utilizing geotechnical parameters, which are often overlooked in conventional probabilistic landslide susceptibility studies, can provide benefits, as they are directly related to the shear strength of the ground and the problem of slope stability. Herein, three methods, namely Logistic Regression, Random Forest and XGBoost are employed, to develop landslide susceptibility classifiers for the southwestern part of Cyprus, a region for which a detailed landslide inventory and geotechnical data are available. A dataset of 2500 landslide points and an equal number of non-landslide points were split into training (70%) and validation (30%) subsets. After processing the feature importance of 17 causal factors, lithology emerged as the most influential factor, followed by rainfall and land use, while GSI and plasticity index ranked sixth and seventh in the importance hierarchy. The capabilities of the three machine learning models were assessed and compared based on ROC curve analysis and 6 statistical metrics. Generally, the machine learning algorithms achieved high accuracy and predictive capability, succeeding in identifying more than 90% of the recorded landslides as areas of high to very high landslide susceptibility. The incorporation of geotechnical parameters resulted in modest but marked increase of statistical performance metrics.

Correlation Between Geotechnical Indexes and Landslide Occurrence in Southwestern Cyprus Using GIS and Machine Learning

Article Open access 13 December 2024

Effect of different mapping units, spatial resolutions, and machine learning algorithms on landslide susceptibility mapping at the township scale

Article 26 February 2025

Landslide susceptibility mapping using GIS-based machine learning algorithms for the Northeast Chongqing Area, China

Article 13 December 2021

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Landslides are defined as the movement of ground down a slope due to gravity (Varnes 1978). They are one of the most destructive natural hazards and have significantly affected many countries worldwide causing thousands of deaths and injuries (Kavzoglu et al. 2014; Skidmore 2001). Based on the type of movement and geological conditions, landslides can be classified into different types such as rotational slides, translational slides, rockfalls, and flows (Alexoudi et al. 2010; Saunders and Glassey 2007; Varnes 1978). Slope stability is controlled by the geomorphological, geological, hydrological, geotechnical, and climatic conditions, as well as human activities (Tzampoglou et al. 2025; Gardezi et al. 2023; Ikram et al. 2022; Luino et al. 2022; Miklin et al., 2022). Among the recorded non-seismically triggered landslides for the period between 2004–2010 worldwide, the number of fatal landslides approached 2620, causing 32,322 fatalities (Petley 2012). Furthermore, this natural hazard causes significant economic losses on the scale of billions of US dollars each year in many countries (Tzampoglou and Loupasakis 2023). This is particularly true in regions characterized by mountainous terrain and high population density. For example, the average annual cost of landslides in Italy varies between 2.6 to 5 billion US dollars (Kjekstad and Highland 2009), while the cost of damages due to landslides in the Republic of Korea increased from 5.3 in 1997 to 13.9 billion Won in 2014 (Kim et al. 2018). Furthermore, the estimated annual total cost of landslides reached 1.4 billion US dollars in Canada (Sidle and Ochiai 2006) and 26.3 million US dollars in New Zealand (Sim et al. 2022).

The term susceptibility refers to the possibility of landslide activation, considering the prevailing conditions (Brabb, 1985). It does not refer to either when or how the landslide is triggered, or to the magnitude of the expected losses. The basic consideration of the landslide susceptibility map is the spatial distribution of the parameters that affect the phenomenon to identify zones that are prone to landslides without any time reference (Radbruch 1970). Stochastic landslide hazard analysis is based on certain basic assumptions (Carrara et al. 1999; Guzzetti et al. 1999, 2005): a) landslides that have occurred in the past did not occur by chance but when a series of factors acted cumulatively, b) old recorded landslides can be used as evidence to predict potential future failures, and c) each landslide has specific geomorphological characteristics that are absolutely evident on the surface of the earth.

The causal factors can be classified into two categories: preparatory (e.g., elevation, slope, aspect, lithology, geotechnical properties, tectonic faults, and human activities) and triggering (e.g., rainfall and seismicity) causal factors. The prediction of susceptible areas involves classification of the factors actively contributing to landslide activation. Various stochastic methodologies are currently available for susceptibility and hazard assessments, each with its own advantages and shortcomings (Fell et al. 2008; Rozos et al. 2008). Qualitative methods rely on expert opinions and do not require a landslide database, but their accuracy depends on expert knowledge and is influenced by subjectivity. Quantitative methods, based on statistical and probabilistic analyses, require a detailed landslide database. Their effectiveness is influenced by the complexity, uncertainty, and nonlinearity of the correlations between the spatial distribution of mass movements and the conditioning variables, which must be addressed with caution (Lee et al. 2003; Pradhan and Lee 2010). Recent approaches based on Machine Learning and Data Mining are more effective in managing uncertainty because they are not influenced by the nonlinear behaviour and complexity of systems (Tsangaratos and Ilia 2016). These advanced methods can learn and uncover hidden patterns in large multi-thematic databases. They are adept at handling data with different scales and types without relying on statistical distribution assumptions, making them powerful tools for analyzing complex datasets (Tsangaratos and Ilia 2016). Some characteristic examples include the Logistic Regression approach (Abeysiriwardana and Gomes 2022; Budimir et al. 2015; Du et al. 2022; Pokharel et al. 2021; Xing et al. 2021a; Xiong et al. 2022), Fuzzy Logic method (Abdi et al. 2021; Baharvand et al. 2020; Dandy et al. 2019; Nwazelibe et al. 2023; Vega and Hidalgo 2023), Artificial Neural Network method (Bragagnolo et al. 2020; Shahri et al. 2019; Tsangaratos and Benardos 2014), Bayes theorem which is based on weights of evidence (Batar and Watanabe 2021; Chen et al. 2021; Kouhartsiouk and Perdikou 2021; Kumar and Anbalagan 2019), Neural–Fuzzy method (Chen et al. 2019; Lucchese et al. 2021; Razavi-Termeh et al. 2021), Support Vector Machines (Dou et al. 2020; Kamran et al. 2021; Wang and Brenning 2021) and Decision Tree method (Arabameri et al. 2022; Guo et al. 2021; Taalab et al. 2018).

The present study considers three Machine Learning methods, namely Logistic Regression (LR), Random Forest (RF), and XGBoost (XGB). LR is a widely used statistical probability model applied in landslide susceptibility assessment, offering accurate and reliable results through its simple discriminative learning mechanism. It estimates the probability of a given feature (x) and label (y) directly by minimizing error (Ng and Jordan 2001). RF is an ensemble Machine Learning method that utilizes random binary trees with bootstrapping techniques (Breiman 2001). This non-linear, non-parametric algorithm can assess the importance of landslide-related factors. It can handle large numerical and categorical datasets, accounting for complex interactions. (Taalab et al. 2018). XGBoost is an increasingly popular gradient tree-boosting algorithm that creates a prediction model by boosting weak classification trees (Chen and Guestrin 2016; Cui et al. 2017). Τhe reduced computational time needed, in combination with high accuracy rates, have recently made this method one of the most popular in the studies of landslide susceptibility mapping (Kavzoglu and Teke 2022; Sahin 2020).

One of the fundamental factors in landslide susceptibility analysis is the lithology thematic layer, through which the effect of ground strength is introduced in the analysis. The lithology thematic layer contains only general information about the geological formations without explicitly considering geotechnical characteristics that control the ground strength, such as the Plasticity Index (PI), the Clay Fraction (CF) and the Geological Strength Index (GSI), and their spatial variation. In this study, we incorporate the geotechnical parameters PI, CF, and GSI as thematic layers to explore the potential of enhancing the accuracy and reliability of landslide susceptibility maps. The results of this study are expected to contribute to better-informed decision-making processes in land-use planning, infrastructure development, and disaster management, ultimately reducing the risk and impact of landslides on human life and property.

Study area

The island of Cyprus has been significantly affected by landslides because of its particular geological-geotechnical conditions and mountainous geomorphology (Myronidis et al. 2016; Northmore et al. 1988; Pantazis 1969). Over the past centuries, entire villages had to be relocated because of landslide phenomena, while recent failures in both natural and man-made slopes resulted in an estimated average annual cost of damage amounting to several millions euros.

The study area is the southwestern (SW) part of the island οf Cyprus, comprising of Paphos District and the western part of Limassol District (Fig. 1). The geology of this region is characterized by extensive outcrops of fine-grained clastic sedimentary rocks (e.g., shales, mudstones, and marls) of low strength (Northmore et al. 1986; Charalambous et al. 1986) (Fig. 2). The rock masses are often sheared and/or folded, resulting in low values of equivalent rock mass cohesion and friction angle. This is particularly true for the rocks of the Mamonia Complex, which is a group of allochthonous geological formations that were thrusted on Cyprus and were originally part of the accretionary prism formed during the early stages of subduction of the African tectonic plate under the Eurasian plate (Constantinou et al. 2002).

Another important aspect of the geology of the area is the existence of extensive outcrops of bentonitic clays and tectonic mélanges (i.e., fragments of Mamonia Complex and Troodos ophiolite rocks in a fine-grained matrix, often of high to very high plasticity). In the 1980’s, a joint study by the British Geological Survey (BGS) and the Geological Survey Department of Cyprus (GSD) examined the geotechnical properties of these argillaceous materials (Northmore et al. 1986; Charalambous et al. 1986). The laboratory results revealed that their effective peak friction angle is often lower than 20° (with the effective cohesion being only a few tens of kPa at most), with the residual friction angle in some cases falling below 10°.

Apart from geological/geotechnical conditions, other aggravating factors include regional rainfall patterns and local seismicity. Rainfall is unevenly distributed during hydrological years, with heavy rains during the winter and spring months and almost no rainfall during the summer months (Hart and Hearn 2013; Alexandris et al. 2016). The study area is also the most seismically active region of the island, encompassing a number of active faults. Moreover, its coast is only a few tens of kilometres away from the subduction zone, where most of the recent earthquake epicenters were observed (Cagnan and Tanircan 2010).

Hart and Hearn (2013) recorded 1842 landslides (active and inactive) of different types in the study area. Most of these slope instabilities can be characterized as creeping landslides, as their rates of movement are of the order of 10 cm/year. Exceptions are road cut failures, which develop quickly and abruptly after intense rainfall events. Such failures have occurred in recent years along the new and old highways connecting the cities of Paphos and Limassol, as well as in the rural road network closer to the Troodos Mountain range.

Despite the apparently slow rate of movement, creeping landslides result over the years in the accumulation of meters of ground displacement, which are capable of shattering even the most competent building foundations. As a consequence, a number of villages have been forced to relocate in the course of the last 70 years, most notably Statos, Agios Photios, Pentalia, Choletria and Fasoula (Alexakis et al. 2014; Hadjigeorgiou et al. 2006). Recent expansion of towns and the growth of the real estate sector in tourist areas have led to the construction of residential projects in regions that were traditionally avoided by local builders due to the known problematic nature of the ground. As a consequence, a number of new instabilities or reactivation of dormant landslides have occurred in the last 20 years, such as those in the areas of Armou, Nata, Tala, Petra tou Romiou, and Pissouri (Alexandris et al. 2016; Hearn et al. 2018, 2020; Perdikou and Kouhartsiouk 2019; Tzampoglou et al. 2024).

Materials and methods

The methodology for constructing the susceptibility maps of the study area is divided into four phases: (i) preparation of a landslide inventory map and selection of the landslides causative factors, (ii) pre-processing and data analysis, (iii) implementation of three machine learning algorithms for landslide susceptibility analysis, and (iv) validation and comparison of the produced maps.

The data sources which were used in this research are summarized in Table 1. All necessary processing for the construction of the thematic layers as well as the landslide susceptibility maps were performed using the program ArcMAP 10.7.1. In addition, various geospatial functions were performed using Python (v3.9). Furthermore, the programming language R (v4.3.2) was used to construct all the necessary codes and scripts of the algorithms.

Table 1 Data sources of the main causal factors

Full size table

Landslide inventory map

The first task of the methodology comprises the creation of a landslide and non-landslide inventory database (Wieczorek 1984). The landslide inventory database was based on landslides previously recorded and mapped by the Cyprus Geological Survey Department (GSD) (Fig. 3a). This database, which is available publically (GEOportal of Cyprus Geological Survey Department 2022) was provided by GSD also in digital format (shape file). In the prepared inventory database, landslides are represented as points from within the respective landslide polygon. The areas considered for extraction of non-landslide points were identified through satellite imagery inspection. These were supplemented by points from areas of very low landslide susceptibility (Günther et al. 2014; Wilde et al. 2018), as determined from previous susceptibility maps (Hart and Hearn 2010; Alexakis et al. 2014; Kouhartsiouk and Perdikou 2021), as well as from a preliminary landslide susceptibility map constructed within the context of the present study based solely on the values of the Frequency Ratio. Using this approach, the non-landslide points represent locations for which there is a high degree of certainty that there is no landslide and that the likelihood a landslide will occur there in the future is very small. This is juxtaposed to the selected landslide points for which it is almost certain that a landslide actually exists. Non-landslide points were automatically selected from the designated areas (polygons) by employing a spatial function that generates randomly positioned points, namely the Create Random Points function of the Data Management Toolset of the ArcMAP suite (ESRI 2015).

The approach for selecting non-landslide points described above relies on expert knowledge and landslide susceptibility calculations that precede the main analysis. To investigate the effect of potential biases and subjectivity stemming from this method of non-landslide point selection (designated hereinafter as method NLS-1) and its impact on model performance metrics, a sensitivity analysis was undertaken by comparing the results with those from additional analyses that were performed using non-landslide datasets generated using the buffer-controlled sampling method (referred to in the remainder as NLS-2). These analyses involved generating three alternative training and testing datasets using buffer zones with sizes of 250 m, 500 m, and 750 m around the polygons of the recorded landslides, and randomly sampling non-landslide points from the part of the study areas that falls outside the buffer zones and the landslide polygons. Additionally, for the analyses based on the buffer zone approach, each model underwent fivefold cross-validation, where the training data was split into five parts, and in each fold, four parts were used for training and one for testing. This process, repeated five times, produced average accuracy and standard deviation values, which provided insights into each model’s consistency and reliability during training.

Data pre-processing

Two separate tasks were performed during the data pre-processing phase. The first was the selection and creation of classes of causal variables. Generally, these variables should (a) have a relationship with the activation of landslides, (b) have spatial variation, (c) be expressed on a measurable scale and (d) not affect other variables in the final result (Ayalew and Yamagishi 2005). Then, multicollinearity and importance ranking assessment analyses were performed. The final outcome of this phase comprises a database with the thematic layers of all causal factors in digital format within ArcMAP, as well as training and testing datasets of landslide and non-landslide points.

Dataset preparation for spatial modelling

Considering previous studies in areas with similar geo-environmental conditions worldwide, the main causal factors for landslides are the ground elevation, ground inclination (slope and aspect), lithology, distance to tectonic faults, land use, distance to hydrographic networks, plan curvature, profile curvature, distance to road network, Stream Power Index (SPI), Topographic Wetness Index (TWI), and average rainfall amount. Among these, the only factor directly related to the ground properties is the lithology. However, lithology introduces in the analysis the effects of ground strength only qualitatively and without accounting for spatial variations of geotechnical properties within each geological formation. Herein, thematic layers of the geotechnical parameters strongly affecting the ground strength, namely the Plasticity Index (PI), the Clay Fraction (CF), and GSI are prepared and included in the input dataset. Moreover, we included in the analysis the difference between the dip direction of discontinuities with persistence (bedding planes, but also sills and dikes in the case of igneous rocks) and the slope dip direction, given that it plays a crucial role in the occurrence of planar slides. To the best of our knowledge, this is the first study that incorporates these factors as specific layers in probabilistic landslide susceptibility analysis with the aim to enhance its accuracy and level of detail.

Weighting of the classified causal factors was performed using the Frequency Ratio (FR) method of analysis. This analysis highlights the relationship between causal variables and the spatial distribution of landslides in a quantitative manner, and constitutes one of the simplest and most comprehensible probabilistic models (Waqas et al. 2021). Generally, the FR correlates with the likelihood of landslide events occurring (Khosravi et al. 2016). During this step, the thematic causal factor layers were normalized by a procedure that sets new values ranging from 0.1 to 0.9.

Multicollinearity analysis

After the creation of the normalized database, a multicollinearity analysis was carried out to investigate the intercorrelation between the landslide causal factors. This procedure needs to be performed to ensure that all causal factors have a sufficient degree of independence from each other (Achour and Pourghasemi 2020). For this purpose, variance inflation factor (VIF) and tolerance index (TOL) were calculated. Severe multicollinearity is considered to occur when the TOL index is lower than 0.1 and, equivalently, the VIF index is higher than 10 (Bui et al. 2019). In this study, the aforementioned indexes were calculated using the olsrr package (Hebbali 2017) in R programming language (v4.3.2), according to the following equations:

$$TOL=1-{R}^{2}$$

(1)

$$VIF=\frac{1}{TOL}$$

(2)

where R² is the regression coefficient of determination.

Feature importance analysis

The feature importance of the causal factors was estimated using the mean decrease accuracy (MDA) value within the Random Forest (RF) model. This measure assesses the importance of each feature in making accurate predictions. MDA is determined by permuting the values of each feature and observing a decrease in the accuracy of the model. A significant drop in accuracy indicates that the feature is important as it contributes significantly to the model’s predictive performance. Features with higher MDA values are thus considered more important because they help reduce the impurity or error of the model, enhancing its overall accuracy. Conversely, features with lower MDA values have less impact on the model’s predictive capability.

Preparing the training and testing datasets

Finally, each of the landslide and non-landslide point groups were randomly divided into a training dataset and a testing dataset. This procedure was performed using geostatistical analysis tools within the ArcGIS program. Subsequently, the training and testing datasets of landslide and non-landslide points were combined to create the final training and testing datasets, which were further analyzed using R during the next phases.

Machine learning classifiers

Logistic regression

The Logistic Regression (LR) is a quantitative statistical method that is widely used to predict the probability of landslide occurrence as a function of the predictor variables (Shafapour et al., 2017). These variables can be either continuous, discrete, or any combination of both types, and do not necessarily need to have normal distributions. The independent variables within this model serve as predictors of the dependent variable and can be assessed on nominal, ordinal, interval, or ratio scales, whereas the dependent variable is presented in a binary format. The association between the dependent and independent variables exhibits a non-linear relationship (Yesilnacar and Topal 2005).

Generally, the probability (p) of a landslide occurring can be estimated using the following equation:

$$p= \frac{1}{1+{e}^{-z}}$$

(3)

where z is a linear combination of the selected causal variables and can be expressed as follows:

$$z={b}_{0}+{b}_{1}{x}_{1}+{b}_{2}{x}_{2}+\dots +{b}_{n}{x}_{n}$$

(4)

where b_i (i = 0,n) are regression coefficients, and x_i are the independent variables (causal factors).

Random forest

The Random Forest (RF) method was introduced in the 90’s by Breiman (2001). This approach is an ensemble Machine Learning method that exploits random binary trees using a subset of observations through bootstrapping techniques (Breiman 2001; Hastie et al. 2009; Son et al. 2018). In general, this methodology achieves high accuracy in both classification and regression tasks by decreasing the variance of the classification error (Catani et al. 2013; Kamal et al. 2022; Lagomarsino et al. 2017; Xiao et al. 2021). Randomly selected training subsets encompassing both samples and variables are employed in the classification process. For each subset, a classifier yields a result. The culmination of the RF model entails aggregating all the outcomes provided by the classifiers (Nhu et al. 2020). To achieve the best possible RF performance, the out-of-bag error must be eliminated by evaluating the minimum number of trees and the number of variables used in each split. Herein, these parameters were determined as follows: n.trees = 2000, m.try = 11.

Extreme Gradient Boosting (XGBoost)

XGBoost was introduced by Chen and Guestrin (2016) and has become one of the most popular and powerful machine-learning algorithms used for both regression and classification tasks. It is an ensemble learning method known for its speed and performance. XGBoost enhances the traditional gradient boosting algorithm (Friedman 2001, 2002) by implementing several optimization and parallelization techniques. Generally, this algorithm achieves a better performance by learning from weak learners’ outputs (usually decision trees) (Hastie et al. 2009). It builds trees sequentially, and each subsequent tree corrects the errors made by the previous tree. XGBoost incorporates regularization techniques to prevent an overfitting and includes parameters that control model complexity (Can et al. 2021). Moreover, it can efficiently utilize multicore CPUs during training, making it faster than the traditional gradient-boosting implementations. In this study, the tune parameters for the XGBoost method were estimated to be max.depth = 3, nrounds = 37.

Landslide susceptibility map

All the above methods (LR, RF, XGBoost) provide as a result the probability of a landslide occurring in a specific location. The final score can take values between 0 and 1, indicating areas where landslides are very unlikely and very likely, respectively. The natural breaks method was used to create five landslide susceptibility classes: very low (VLS), low (LS), moderate (MS), high (HS), and very high (VHS).

Methods of validation and accuracy assessment

In the final phase of this research work, validation and comparison of the performance of the three Machine Learning methods were performed using the testing dataset. In this context, the number of incidents that were classified correctly were compared to the number of incidents that were misclassified by the model. Statistical metrics, namely accuracy, specificity, precision, recall, F1 score and Cohen’s kappa index, were then calculated. Furthermore, the Receiver Operating Characteristic (ROC) curve and the corresponding area under the curve (AUC) were calculated. The ROC curve along with accuracy, precision, and recall provide information about the performance of the methodology (Davis and Goadrich 2006), while the AUC provides an estimation of how well the classifier can distinguish between the positive and negative classes (Fawcett 2006).

In this study, the outcomes of the three Machine Learning classifiers were juxtaposed using non-parametric and linear regression analyses. Specifically, the Wilcoxon signed-rank test (Noether 1992; Wilcoxon 1992) was used for examining the statistical differences in predictive capability between the three models.

Input datasets

The landslide database provided by GSD contains 1936 landslides in total (Fig. 3a), both active and inactive (dormant and relict). The landslides were represented by polygon features, and their size ranges from 60 m² to 4 km². A total of 2500 landslide points were generated from the 1936 polygons, taking into account the extent of each landslide, as well as its type. Accordingly, 2500 points with no landslides were selected, as described in the previous section. Then, 1750 points (70%) from each of the two groups were randomly selected to form the training dataset, while the remaining 750 points (30%) were assigned to the testing dataset (Fig. 3b). This size of training dataset is designated as LP1750 in the remainder.

In our study, the training dataset contains both active and inactive (dormant and relict) landslides. So, the Machine Learning models were trained to identify high susceptibility regions on the basis of areas that have suffered ground movements at any time in the past, even if they seized being active long ago. The choice of treating dormant and relict landslides as locations of very high landslide susceptibility stems from recent observations of reactivation of inactive landslides, mostly due to changes in land use (Tzampoglou et al. 2024).

The study area happens to be thoroughly surveyed for landslides (Hart & Hearn 2010), resulting in a sizeable landslide inventory. Hence, to examine the performance of the methods in a case of relative scarcity of landslide data, a limited number of analyses were performed with the training dataset reduced to only 20 landslide points. These were selected randomly by utilizing the createDataPartition function. This algorithm ensures that the splits are stratified based on the landslide samples and preserves the proportion of classes in each subset, thus maintaining the integrity of the distribution across the various causal factors (Kuhn 2008). The 20 landslide points were paired with a selection of an equal number of non-landslide points. This reduced size of training dataset is labelled as LP20. The testing dataset was the same as in the analysis with the full training dataset, i.e. 750 landslide points and 750 non-landslide points, in order to have a common basis for comparison between the two scenarios. It should be noted that, in the case of the small inventory scenario, the FR was revaluated on the basis of the reduced dataset before performing the training of the models.

A total of 17 causal factors (geomorphological, geological/geotechnical, hydrological and anthropogenic) were considered for inclusion in the development of the susceptibility maps, which are presented in the following paragraphs.

Geomorphological factors

All the geomorphological factors were derived from the Digital Elevation Model (DEM) with a 15 m resolution, created using topographic maps of the Department of Land and Surveys (Cadastre) of the Republic of Cyprus. Specific geo-processing tools within ArcGIS were used to estimate the following derivative factors: slope angle, aspect, plan curvature, and profile curvature.

Elevation is one of the most common factors to be considered for the estimation of landslide susceptibility and is generally through to have a positive correlation with the occurrence of landslides (Raghuvanshi et al. 2015). Elevation, in combination with rainfall, is closely related to the manifestation of landslides in mountainous areas (De Oliveira et al. 2016). Herein, the elevation was divided into four classes, namely < 350 m, 350–700 m, 700–1050 m and > 1050 m (Fig. 4a and Table 2).

Table 2 Spatial relation of causal factor and landslides by Frequency Ratio

Full size table

Similar to elevation, the slope angle (Fig. 4b) and aspect (Fig. 4c) are the factors that are broadly used for the investigation of landslide failure mechanisms because they both directly or indirectly affect the stability of the ground (Gaidzik and Ramírez-Herrera 2021; Sun et al. 2021). For example, the shear stress increases with increasing slope angle (Aboye 2009). In addition, the aspect affects climatic parameters such as the degree of sun exposure and evapotranspiration (Conforti et al. 2014; Wang et al. 2019; Raghuvanshi et al. 2015). The slope angle was divided into six classes (< 5°, 5°−15°, 15°−25°, 25°−35°, 35°−45° and > 45°), while the aspect into nine classes (North, Northeast, East, Southeast, South, Southwest, West, Northwest, Flat) (Table 2).

To generate the thematic map of the dip direction (azimuthal) difference θ, first a map of the spatial distribution of dip direction of bedding planes (including sills and dikes) was created using all the geological compass measurements shown in the geological maps provided by GSD. The dip markings were digitized into ArcGIS in the context of this study. This dataset was enhanced with additional geological compass measurements obtained during fieldwork campaigns. The resulting geodatabase comprised 1520 measurements in total. The dip direction difference θ (Fig. 4d) was then calculated using the following formula:

$$\theta =\text{min}(\left|{AZ}_{1}-{AZ}_{2}\right|, {360}^{o}-\left|{AZ}_{1}-{AZ}_{2}\right|)$$

(5)

where AZ₁ is the dip direction of the ground surface (aspect) and AZ₂ is the discontinuity dip direction. As θ approaches zero, the likelihood of planar sliding increases. The value of θ cannot exceed 180°. Four classes (0°−30°, 30°−60°, 60°−90^o and > 90°) were considered for θ (Table 2).

The curvature factors play an active role in the manifestation of landslides as they affect the surface runoff and the distribution of erosion (Moosavi and Niazi 2016). The plan curvature (Fig. 4e) is perpendicular to the direction of the maximum slope and controls the convergence or divergence of surface water across the surface (Carson and Kirkby 1972). The plan curvature falls into three categories: positive values reflect convex surfaces, negative values reflect concave surfaces, and zero values reflect linear surfaces (Buckley 2010). The profile curvature (Fig. 4f) is parallel to the steepest slope (Ayalew et al. 2004), affecting the changes in water flow velocity across the surface, as well as the deposition and erosion (ESRI 2015; Zevenbergen and Thorne 1987). The profile curvature falls into three categories: positive values reflect concave surfaces, negative values reflect convex surfaces, and zero values reflect linear surfaces (Chapi et al. 2017).

Geological/geotechnical factors

Lithology reflects the physical and mechanical characteristics of the ground. Based on the GSD geological maps, the lithology was divided into 18 classes, namely Quaternary Formations, Nicosia Formation, Kalavasos Formation, Pachna Formation, Lefkara Formation, Talus, Kannaviou Formation (bentonitic clays), Kannaviou volcanoclastics, Agia Varvara Formation, Pera Pedi Formation, Mélange, Radiolarian mudstones, Limestones, Silica sandstones, Lavas, Diabase group, Serpentinite and Gabbro (Fig. 2 and Table 2).

Proximity to tectonic faults foretells pre-sheared rock mass of poor quality. Hence, distance to fault (Fig. 5a) is one of the main preparatory factors of slope failures (Wu et al. 2014). In this study, it was categorized into five classes, namely < 250 m, 250–500 m, 500–750 m, 750-100 m and > 1000 m (Table 2).

The geotechnical properties frequently reported in geotechnical investigations for engineering projects, such as CF, PI and GSI, are strongly related to the shear strength of the ground (Anis et al. 2019; Sartohadi et al. 2018; Mirabedini et al. 2022). High CF and PI values generally lead to slope stability problems (Yalcin 2007; Ahmad et al. 2019; De Ploey and Cruz 1979). To establish the spatial distribution of the geotechnical properties, 667 geotechnical boreholes contained in the files of GSD were processed. This dataset was enhanced with additional surface samples obtained during fieldwork campaigns. Each geological formation was examined separately. For geological formations that had plenty of data well distributed throughout their extent, interpolation was performed using the Inverse Distance Weighted (IDW) spatial analysis tool. On the other hand, in those formations where there was either scarcity of data or the data points where cluttered in regions of limited extent, the average was adopted as an indicative value. The assembly of the spatial distributions of geotechnical properties per geological formation into final thematic maps with resolution 15 m × 15 m for the entire study area was done in ArcGIS (raster calculator, mosaic to new raster, etc.). The CF (Fig. 5b) was divided into five classes, namely 0–15%, 15–30%, 30–45%, > 45%, and N/A (i.e. not applicable), while PI (Fig. 5c) into 0–20%, 20–40%, 40–60%, > 60%, and N/A (Table 2).

The GSI (Fig. 5d) is a geotechnical characterization system that quantifies the quality of rock masses by considering a series of factors, i.e. rock mass structure, number of discontinuity sets, schistosity, roughness and quality of discontinuities (Marinos et al. 2005). The GSI directly relates to rock mass strength and thus plays an important role in landslide occurrence (Mirabedini et al. 2022; Singh et al. 2014). To create the GSI thematic layer, the geological formations were aggregated into four larger groups: 1) Quaternary deposits, Talus; 2) Nicosia F., Kalavasos F., Pachna F., Lefkara F.; 3) Kannaviou F. (bentonitic clays), Kannaviou volcanoclastics, Agia Varvara F. (metamorphics), Pera Pedi F., Mélange, Radiolarian Mudstones, Limestones, Silica Sandstones; 4) Lavas, Diabase, Serpentinites, Gabbro. GSI data was collected from the field by inspecting rock masses exposed in road cuts and creeks (a total of 48 observations well-spread across the study area). Each of these four group was treated separately through interpolation processes using IDW spatial analysis tool. Finally, the GSI was divided into four classes (0–40, 40–60, 60–90, N/A) (Table 2).

Hydrological factors

Rainfall is an important triggering factor (Segoni et al. 2018). The average annual rainfall, in particular, can be considered also a preparatory factor because it affects the long-term groundwater table and, thus, the sustained pore water pressure and mobilized shear strength (Wistuba et al. 2021). The thematic map of average annual rainfall was established based on data from 24 rainfall stations of the Department of Meteorology of Cyprus for the period 1991—2022 (Fig. 6a) using Inverse Distance Weighted (IDW) interpolation in the ArcGIS program. The highest mean annual rainfall were observed in the northeastern part of the study area, whereas the coastal areas had the lowest values. It should be mentioned that the average annual rainfall was introduced as continuous raw data (Fig. 6a) instead of being divided into classes. The reason for this is that preliminary analyses showed that rainfall has one of the strongest influences on landslide susceptibility. As a result, division into classes resulted into artifacts in the susceptibility maps stemming from the borders between regions of different rainfall classes, such as unrealistically abrupt (discontinuous) changes in landslide susceptibility.

The distance to the hydrographic network (Fig. 6b) can affect the stability of slopes because in areas near streams the phreatic water table is expected to be at a relatively shallow depth, leading to reduced shear strength values. The study area was divided into five multi-ring buffer zones from both sides of the thawlegs (< 100 m, 100–200 m, 200–300 m, 300–400 m and > 400 m) (Table 2).

The Stream Power Index (SPI) (Fig. 6c) can be used to estimate the eroding power of surface water (Althuwaynee et al. 2014; Sharma 2010). The probability of landslide increases as SPI increases (Akgun and Türk 2010; Althuwaynee et al. 2014). This index was calculated using the DEM through the following equation:

$$SPI=A\;\text{tan}\;\beta$$

(6)

where A signifies the upstream area and β the slope angle. The SPI was split into 5 classes (−13 to −9, −9 to −2, −2 to 0, 0 to 2, 2 to 8) (Table 2).

The Topographic Wetness Index TWI (Fig. 6d) calculates the ratio of the accumulated surface water flow to the slope gradient (Beven and Kirkby 1979; Gruber and Peckham 2009; Sørensen et al. 2005). Higher values lead to elevated susceptibility to landslides because the ground can hold more moisture (Sun et al. 2020). This index was calculated using the DEM, considering the following equation:

$$TWI=\text{ln}\left(\frac{A}{\text{tan}\beta }\right)$$

(7)

The TWI was divided into 6 classes (2–4, 4–5, 5–6, 6–7, 7–8, > 8) (Table 2).

Anthropogenic factors

The construction of road networks influences the ground stability through the extensive earthworks (road cuts and embankments) and the removal of vegetation, as well as the imposition of additional external loads (Bourenane et al. 2015). The distance to the road network is inversely related to the likelihood of landslide occurrence (Pawluszek and Borkowski 2017). In this study, the distance to the main road network (Fig. 7a) was divided into five classes (< 100 m, 100–200 m, 200–300 m, 300–400 m and > 400 m) (Table 2).

Changes in land use over the years can lead to slope stability issues (Meneses et al. 2019; Roccati et al. 2021; Shu et al. 2019). The land use map was produced based on the CORINE Land Cover inventory (Corine, 2018). Considering the nature of their impact on landslides, land use types (Fig. 7b) were grouped into five classes as follows: 1) Continuous urban fabric, Road and rail networks and associated land, Mineral extraction sites (CLC codes 111, 122, 131); 2) Discontinuous urban fabric, Industrial or commercial units, Permanently irrigated land, Sclerophyllous vegetation, Transitional woodland-shrub, Sparsely vegetated areas (CLC codes 112, 121, 212,323,324,333); 3) Vineyards, Fruit trees and berry plantations, Olive groves, Broad-leaved forest, Coniferous forest, Beaches (CLC codes 221, 222, 223, 311, 312, 331); 4) Airports, Construction sites, Sport and leisure facilities, Non-irrigated arable land, Pastures, Annual crops associated with permanent crops, Complex cultivation patterns, Land principally occupied by agriculture, Natural grasslands, Bare rocks (CLC codes 124, 133, 142, 211, 231, 241, 242, 243, 321, 332); 5) Wetlands, Water bodies (CLC codes 512, 411) (Table 2).

Numerical Results

Frequency ratio analysis

The frequency ratio (FR) values were estimated for each class of causal factors. Higher FR values indicate a stronger correlation between the class of causal factors and landslide occurrence. This means that, as the FR value increases, the likelihood of landslides associated with that specific class also increases (Das et al. 2023). The resulting FR values on the basis of the training dataset containing 1750 landslide points are shown in Table 2. Generally, the largest FR values among all classes were observed for Kannaviou formation (clays FR = 4.75, volcanoclastics FR = 12.04) and talus (FR = 4.84), for regions with a slope angle larger than 45° (FR = 7.42) and for PI larger than 60 (FR = 4.58).

Focusing on geomorphological causal factors, the highest frequency of landslides was observed in areas with elevation between 350 and 700 m (FR = 1.33). FR values are lower in areas above 700 m elevation, probably because of sparse urban development, limited road networks, and dense vegetation. Regarding ground inclination, the areas with slope angles in the ranges 5–15° and 15–25° present also relatively high values of FR (1.21 and 1.55, respectively). The FR values appeared to decrease in areas with more than 25° slope angles, likely due to reduced rainwater infiltration. Exceptions are areas with over 45° slope angle, where FR becomes very large. The results regarding the aspect indicate that most of the landslides occurred with north orientation (North: FR = 1.33; Northwest: FR = 1.30; Northeast: FR = 1.26), highlighting the impact of climatic factors (the NNW–NNE-oriented slopes are mostly sunless, facilitating larger rainwater infiltration). The results for the plan curvature (< −0.2: FR = 1.47) and profile curvature (> 0.15: FR = 1.27) indicated that most of the landslides were recorded on concave slopes. This could be explained by the fact that concave slopes are susceptible to landslides because they can hold moisture (Sujatha et al. 2012). The dip-direction difference θ does not seem to have a significant effect. This is probably because most of the weak formations in the study area (Kannaviou formation, radiolarian mudstones) are intensely folded or they have mostly a chaotic structure (mélanges), limiting the prevalence of planar failure mechanisms.

Regarding the geological factors, most landslides are recorded in the Kannaviou Formation volcanoclastics (FR = 12.04), talus (FR = 4.84), Kannaviou Formation (bentonitic clays) (FR = 4.75), radiolarian mudstones (FR = 3.86) and mélange (FR = 3.20). Furthermore, the likelihood of presence of landslide is high near faults (distance < 250 m: FR = 1.21) and decreases substantially with distance from them.

Considering hydrological factors, the distance from the hydrographic network is strongly correlated with the occurrence of landslides. As expected, the FR values exhibit an inverse relationship with the distance from rivers and streams, with FR ranging from 1.33 at distances less than 100 m from the hydrographic network to 0.28 at distances exceeding 400 m. The SPI index appears to be an increasing function of landslide occurrence, ranging from FR = 0.69 for SPI values between −13 and −9 to FR = 1.37 for SPI values between 2 and 8. An exception observed in the SPI range 0 to 2 (FR = 1.41) may be attributed to the influence of other factors such as geological conditions, land use, or human activities that mitigate the effects of high SPI. The ΤWI classes 4–5, 5–6, and 6–7 present the greatest FR values (1.07, 1.19 and 1.04, respectively), while the class in which the TWI index is over 8 (FR = 0.74) presents similar values to the class in which the TWI ranges from 2 to 4 (FR = 0.77). This observation could be attributed to the fact that extremely high TWI values often correspond to very flat areas, where water accumulation occurs without ground inclination sufficient to trigger landslides, or areas with efficient drainage systems that reduce the impact of water on slope stability. Conversely, lower TWI values (2–4) may still represent areas with moderate slopes and some degree of water accumulation, maintaining a consistent landslide risk.

Considering the anthropogenic factors, it appears that the distance from the main road network does not correlate inversely with the occurrence of landslides (as it would have been expected), as the FR values increase with distance from it. This finding is mainly due to the fact that almost all the landslides in the study area are failures of natural slopes. Furthermore, the majority of those are inactive landslides which had developed before the existence of road networks. Moreover, the road network is mainly developed in the seaside plains of the study area; therefore, no valid conclusions can be drawn.

Finally, the clay fraction (CF) correlates positively with landslide occurrence, as it signifies low shear strength. The largest FR values were observed in areas where the CF exceeded 45% (FR = 2.93). The PI follows the same trend, with the highest FR values observed in areas in where the PI is over 60% (FR = 4.58). Finally, the FR turns out to be in general a decreasing function of GSI. This is expected because high GSI values indicate large rock mass strength.

Multicollinerarity analysis and factor importance

The results of the multicollinearity analysis indicate that the causal factors can be considered sufficiently independent from each other, as all TOL values are well above 0.1 (Table 3). The lowest TOL values are observed for PI (0.459), lithology (0.481), CF (0.569) and GSI (0.607), attesting to a certain degree of correlation of the geotechnical factors to lithology, as well as to each other (e.g. geomaterials with high CF usually exhibit high PI values).

Table 3 Results of multicollinearity analysis

Full size table

According to Mean Decrease Accuracy (MDA) results based on the Random Forest model (Fig. 8) trained with dataset LP1750 / NLS-1, the most important variable in our study area is the lithology (a geological factor), followed by rainfall (an environmental factor), land use (an anthropogenic factor) and slope angle (a geomorphological factor). It is interesting to note that the geotechnical factors PI and GSI rank 6th and 7th in importance among the 17 causal factors, despite their correlation with lithology. This finding suggests that the spatial variation of these geotechnical properties within a given geological formation may play a measurable role in landslide occurrence. Nevertheless, the other geotechnical factor, CF, appears to be of small importance. Finally, the factor with the lowest importance according to MDA analysis is the dip direction difference.

Landslide susceptibility maps

Figure 9 presents the landslide susceptibility maps generated by the three Machine Learning algorithms (LR, RF, XGBoost) using the training dataset LP1750 / NLS-1, i.e. 1750 landslide points with equal number of non-landslide points sampled using the NLS-1 approach described in Sect. "Landslide inventory map". All maps display a gradation of five levels of landslide susceptibility, namely very high (VHS), high (HS), medium (MS), low (LS) and very low (VLS), which were produced using the natural breaks classification method. The percentage of the study area covered by the HS and VHS classes is 41.2%, 36.3% and 34.7% according to the LR, RF and XGBoost methods, respectively. These areas are found mainly in the central and southeastern part, and in a separate cluster close to coast in the northwestern part. This is due to the presence of outcrops of weak argillaceous geomaterials in these regions, namely Kannaviou formation, mélange and radiolarian mudstones, a fact attesting to the high importance of the lithology. Areas of very low and low susceptibility are found close to the coastlines due to the presence of the more sandy Quaternary formations and low slope angles, and in the northern parts which are dominated by the outcrops of high strength igneous rocks of the Troodos ophiolite.

The rather unusually large area classified as "very high susceptibility" (VHS) in the maps produced by Logistic Regression (29.7%), Random Forest (27.5%), and XGBoost (22.7%) models (Fig. 9) can be attributed to the fact that 9% of the study area is covered by the recorded (mapped) landslides (Fig. 3a), coupled with the predictive power of the models. An additional reason is that most of the recorded landslides happen to be in argillaceous formations characterized by very low friction angle, such as the bentonitic clays of the Kannaviou formation, mélanges and the marls of the Nicosia formation (Northmore et al. 1986; Tzampoglou et al. 2024). As a consequence, many landslides occur in areas with low slope angles (e.g. 7°−12°) in addition to regions with rugged terrain.

Performance of machine learning models

The ROC-curves for the three models (trained with dataset LP1750 / NLS-1) are shown in Fig. 10, while the associated metrics are presented in Table 4. It can be seen that all the models exhibit large AUC (Area-Under-the-Curve) values (≥ 0.98), attesting to a very high performance. In order to investigate the significance of differences between the models, Wilcoxon signed-rank tests at the 5% significance level were used. According to this test, there are no significant differences between models if the p-value is more than 0.05 and the z-value falls in the −1.96 to 1.96 range. The Wilcoxon signed-rank test results are shown in Table 5. The LR-RF comparison yields a z-value of 2.16 and a p-value of 0.0303, indicating a statistically significant difference and suggesting that Random Forest (RF) outperforms Logistic Regression (LR). For the XGBoost-LR comparison, the z-value is 1.86 with a p-value of 0.0625, which is larger than the threshold value of 0.05, but close enough to suggest a potential trend towards significance. Therefore, while RF outperforms LR, the potential superiority of XGBoost over LR is suggested but not confirmed. Lastly, the XGBoost-RF comparison yielded a z-value of 1.46 and a p-value of 0.1426, indicating no significant difference in performance between these two models.

Table 4 ROC-curve parameters from analyses with training dataset LP1750 / NLS-1

Full size table

Table 5 Results of Wilcoxon signed-rank model (training dataset LP1750 / NLS-1)

Full size table

To further investigate the prediction performance of the three models, the polygons of all 1936 recorded landslides were divided into grid cells of size 15 m x 15 m (same size as for the landslide susceptibility map), resulting in 578967 landslide cells in total. Subsequently, the percentage of landslide cells falling in each class of the susceptibility maps of Fig. 9 was evaluated. As seen in Fig. 11a, RF achieved the best results, as 87% of the recorded landslides falls within the very high landslide susceptibility level, while the respective percentages are 78% for XGBoost and 83% for LR. Examining the landslide frequency alone may lead to misinterpretation. For example, if a map classifies 90% of a study area as having high or very high landslide susceptibility, it may show extremely high predictive performance in terms of capturing the recorded landslides. However, this result would be unrealistic as it would likely overestimate the risk in many areas. To address this issue, the predictive capability was also examined in terms of relative landslide density, i.e. in relation to the total spatial coverage of each susceptibility level in the study area (Fig. 11b). The largest values of relative landslide density are observed in the very high susceptibility class, with RF achieving 71%. Lower values were observed using the LR (68%) and XGBoost (66%) methods.

To further examine the degree of correspondence of recorded landslides to high susceptibility values independently of the classification method (natural breaks) used for generating the five susceptibility classes, the susceptibility values were divided in 10 equal intervals (i.e. 0–0.1, 0.1–0.2, ….,0.9–1.0) and their spatial extent on the study area is presented in Fig. 11c. Most of the study area (30%−36%) falls within the lowest susceptibility value range (0–0.1). A substantial percentage falls in the 0.9–1.0 range (21%−26%). This can be explained by a) the fact that a large portion (9%) of the study area is covered by recorded landslides (inventory) and b) the large extent of outcrops of weak argillaceous geomaterials. In our case, a model based on the modern high-performing machine learning methods should predict at least 9% of the study area to have landslide susceptibility values in the 0.9–1.0 range.

Tables 6 and 7 present various statistical metrics of testing performance for the case in which the models are trained using the large dataset of 1750 landslide points (LP1750 / NLS-1), and compare them to those achieved when the models are trained using a reduced dataset of only 20 landslide points (LP20 / NLS-1). The best performing model with respect to each metric is noted with bold font. For the calculation of the statistical metrics, two different values, 0.5 (Table 6) and 0.8 (Table 7), are considered for the positive classification threshold (also called probability threshold). For the models trained using the LP1750 dataset, the accuracy, specificity, precision and recall attain very high values (larger than 90%) when considering the 0.5 threshold value. Moreover, the F1 score is larger than 0.9 and the Cohen’s kappa (κ) is larger than 0.8, indicating excellent performance. The RF has the best performance in all metrics, while LR exhibits the smallest values among the three methods. However, when the threshold value is raised to 0.8, the LR and RF yield practically the same metric values, with LR having a slight lead (Table 7). This shows that the LR has a tendency to identify the recorded landslides more confidently than the other methods by assigning larger value of landslide susceptibility. It is worth pointing out that, despite the elevated threshold, the predictive capability of the models is still very good, with accuracy close to 0.9 or higher. In particular, the performance of LR and RF can be deemed excellent, since F1 score remains above 0.9 and Cohen’s kappa above 0.8.

Table 6 Testing performance for probability threshold 0.5

Full size table

Table 7 Testing performance for probability threshold 0.8

Full size table

The performance naturally degrades when the models are trained using a thinned-down dataset of only 20 landslide points (LP20 / NLS-1). Yet, all models manage to achieve adequate accuracy (> 0.7) when considering the positive classification threshold value to be equal to 0.5 (Table 6). Although RF remains the best overall performing method also in the case of the thinned-down training dataset, XGBoost exhibits better specificity and precision. Both methods achieve F1 score larger than 0.8 and Cohen’s κ larger than 0.6, indicating good to very good predictive capacity and inter-rater reliability.

Effect of non-landslide points sampling method

Figure 12 presents the landslide susceptibility maps produced using the buffer zone approach (NLS-2) for sampling the non-landslide points, with the buffer size being 250 m. It can be seed that the maps have the same qualitative characteristics as those of Fig. 9 (NLS-1 approach). Nonetheless, the areas characterized as VHS are smaller in Fig. 12. This is especially true for LR and RF, in which case the VHS areas of Fig. 12 are almost half those observed in Fig. 9.

Tables 8 and 9 compare the performance metrics from the analyses using the NLS-2 sampling method with those of method NLS-1 which involves expert knowledge and preliminary landslide susceptibility calculations. The results show that the use of the buffer zone approach results in smaller but still high AUC values (> 0.9). In particular, for the 750 m size, the AUC values for Random Forest and XGBoost are 0.978 and 0.979, very close to those from the analyses involving prior knowledge in the selection of the non-landslide points (Table 8). Moreover, the NLS-2 approach yields smaller performance metric values (accuracy, specificity, F1, κ) than NLS-1 in all scenarios, with specificity in the 750 m buffer scenario being the only exception. RF and XGBoost consistently exhibit superior predictive accuracy than LR, with AUC values surpassing 0.95. Unlike the case of NLS-1, XGBoost is outperforming RF in terms of AUC in all three buffer zone scenarios. However, RF achieved better statistical metrics (Table 9) than XGBoost when the buffer zone size is 750 m. The AUC and the other metrics improve with increasing buffer size for all three machine learning models, a fact that is in agreement with previous studies that employed the buffer zone method (Dou et al. 2023; Guo et al. 2024; Liu et al. 2024; Xi et al. 2022). This observation could be partially attributed to reduction of spatial overlap and misclassification risks as the buffer size increases.

Table 8 Comparison of AUC values for landslide susceptibility models with different non-landslide sampling method

Full size table

Table 9 Comparison of performance metrics for landslide susceptibility models with different non-landslide sampling method (probability threshold 0.5)

Full size table

Figure 13 shows the frequency distribution of landslide susceptibility values from the buffer-zone-based analyses. It can be seen that the percentage in the 0.9–1.0 range is smaller than that observed in Fig. 11c. The spatial extent of the 0.9–1.0 range from the Logistic Regression method is only 3.3%, far lower than the percentage of the study area covered by the recorded landslides (9%). RF and XGBoost distributions exhibit a local maximum in the 0.9–1.0 range, as in the case of the NLS-1 approach. More specifically, the percentage value from analysis with RF is 10%, slightly larger than the spatial extent of recorded landslides (9% of the study area), while for the XGBoost this percentage rises to 18.2%, effectively approaching the 20%−25% yielded by using the NLS-1 approach. The present results show that, in our study area, a highly performing model will inevitably lead to a “spike” in the highest landslide susceptibility range, irrespectively of the non-landslide sampling method employed.

Effect of incorporation of geotechnical thematic layers

In order to investigate the contribution of the geotechnical thematic layers (Plasticity Index, Clay Fraction and Geological Strength Index) to the predictive capability of the models, landslide susceptibility calculations were repeated without including these thematic layers in the analysis. Their effect on model performance is quantified through the ratio R_GT (Fig. 14), which is the ratio of the statistical metric values (accuracy, specificity, precision, recall, F1 and κ) obtained from the analyses that include the geotechnical thematic layers to the respective values from analyses performed without the geotechnical thematic layers. R_GT > 1.0 signifies performance enhancement, while R_GT < 1.0 performance degradation.

Figure 14a presents the results for the case of the models being trained using the large dataset of 1750 landslide points and non-landslide sampling approach NLS-1 (LP1750 / NLS-1). The R_GT for RF in all statistical metrics turns out to be slightly larger than 1.0, indicating a meager performance enhancement. On the other hand, LR and XGBoost exhibit a marked improvement, particularly in terms of precision (smaller false positive rate) and κ (better inter-rater reliability). The results indicate that the inclusion of the geotechnical thematic layers help more the methods that are comparatively underperforming. The AUC values from the two sets of analyses are shown in Table 10. It can be seen that incorporating the geotechnical variables leads to a slight improvement in AUC values for all models. The gain is larger in LR and XGB, than in RF. These enhancements, although modest, suggests that the inclusion of the geotechnical variables can provide additional predictive value, contributing to a more accurate landslide susceptibility assessment.

Table 10 Comparison of AUC values for landslide susceptibility models with and without inclusion of geotechnical variables

Full size table

When the models are trained based on the thinned-down dataset (LP20 / NLS-1), the trends appear more irregular, with improvement shown in certain metrics but degradation in others (Fig. 14b). Substantial enhancement is observed in terms of precision, F1 and κ of the RF model, showing that overall balance between precision and recall and the agreement between predictions and true labels still improve. In the case of the LR model performance enhancement is particularly seen in terms of accuracy and κ. The LR method seems to provide the most stable model, benefiting from the geotechnical layers irrespectively of the size of the training dataset. This may be attributed to the fact that LR, being a simpler method, can respond better under limited data, showing more consistent improvements across metrics. On the other hand, XGboost appears not to improve at all with the use of geotechnical thematic layers (Fig. 14b).

Discussion

The proposed methodology is expected to contribute to better-informed decision-making processes in land-use planning, infrastructure development, and disaster management, ultimately reducing the risk and impact of landslides on human life and property. The resulting maps can be utilized as an approximate guide for areas where future urban development and infrastructure projects are being considered. For instance, in areas classified as highly to very highly susceptible to landslides, thorough geological/geotechnical studies with specific requirements pertinent to landslide hazard should be submitted to and approved by state authorities prior to construction. Conversely, in areas of low to very low susceptibility, geotechnical investigations and analysis could be limited to those provisioned by the local geotechnical design standard, namely Eurocode 7, on condition that other types of geohazards are not prevalent.

The results illustrate that in cases of availability of a comprehensive landslide inventory, Machine Learning techniques are capable of achieving very high success rates in identifying zones prone to landslides and distinguishing them from those that are relatively safe. The high AUC values achieved (0.9 to 0.985) are primarily attributable to 1) the extensive number of recorded landslides (comprehensive inventory) and 2) the high performance of the machine learning methodologies. It should be mentioned that a number of recent publications employing Random Forest, XGBoost or Logistic Regression report AUC values in the 0.90 to 0.99 range (e.g. Agboola et al. 2024; Chang et al. 2023; Dou et al. 2023; Liu et al. 2024; Xi et al. 2022). The incorporation of the geotechnical thematic layers leads to a relatively small performance improvement. This is due to the inherent correlation of these thematic layers with the lithology layer, which turned out to be the most dominant factor in the given study area (Fig. 8). The results suggest that, for this study area, the variation of the ground properties across a geological formation is a less critical information than the geological formation itself.

Despite the excellent performance of the Machine Learning methods used herein, there are certain limitations that must be acknowledged. An important issue that may be introducing a certain degree of bias in the present results is the procedure followed in selecting the non-landslide points, which relied on expert knowledge and exploitation of low susceptibility areas from previous studies and preliminary frequency-ratio-based analysis. The proper selection strategy for non-landslide points remains a topic of discussion within the scientific community.

Another limitation of this study is that models may be affected by overfitting owing to the specific characteristics of the study area, limiting their generalizability to other regions. The accuracy of landslide susceptibility maps is also influenced by the resolution of the input data and the classification methods used, which might lead to the over- or underestimation of landslide risks in certain areas. Expanding the study to different geographical areas with varying conditions would help to further validate and refine the models.

Conclusions

The main objective of this study was to examine the landslide susceptibility prediction capability of three different machine learning methods: Logistic Regression (LR), Random Forest (RF), and XGboost. In addition, the geotechnical parameters Plasticity Index (PI), Clay Fraction (CF) and Geological Strength Index (GSI) were incorporated as thematic layers and the potential gain in model performance was evaluated. The study area, which comprised the southwestern part of Cyprus, contains 1936 recorded landslides of different types, rendering it an ideal test bed. After processing the feature importance of 17 causal factors, lithology stands out as the factor that affects the most the generation of landslides, followed by rainfall, land use and slope angle.

All models achieved high values of accuracy, specificity, F1 score and Cohen’s kappa index, while AUC values exceeded 0.9. Among the three methods, RF exhibits the best performance, followed by XGBoost and LR, provided that the statistical metrics are calculated assuming a true positive threshold equal to 0.5. For a threshold equal to 0.8, the LR achieves slightly better metrics compared to RF, on condition that the models are trained based on an extensive landslide inventory. Unlike LR, the RF and XGBoost methods exhibit adequate performance even when the models are trained using a thinned-down dataset of only 20 landslide points. The incorporation of the geotechnical parameters PI, CF and GSI as thematic layers in the landslide susceptibility analysis led to a small but measurable performance improvement. The performance enhancement is more marked in cases in which the models are the least effective.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Abdi A, Bouamrane A, Karech T, Dahri N, Kaouachi A (2021) Landslide susceptibility mapping using GIS-based fuzzy logic and the analytical hierarchical processes approach: a case study in constantine (North-East Algeria). Geotech Geol Eng 39:5675–5691
Article Google Scholar
Abedin J, Rabby YW, Hasan I, Akter H (2020) An investigation of the characteristics, causes, and consequences of June 13, 2017, landslides in Rangamati District Bangladesh. Geoenvironmental Disasters 7:1–19
Article Google Scholar
Abeysiriwardana HD, Gomes PI (2022) Integrating vegetation indices and geo-environmental factors in GIS-based landslide-susceptibility mapping: using logistic regression. J Mt Sci 19(2):477–492
Article Google Scholar
Aboye SA (2009) Slope Stability Analysis Using GIS and Numerical Modeling Techniques. University of Ghent, Belgium
Google Scholar
Achour Y, Pourghasemi HR (2020) How do machine learning techniques help in increasing accuracy of landslide susceptibility maps? Geosci Front 11(3):871–883
Article Google Scholar
Agboola G, Beni LH, Elbayoumi T, Thompson G (2024) Optimizing landslide susceptibility mapping using machine learning and geospatial techniques. Eco Inform 81:102583
Article Google Scholar
Ahmad A, Lopulisa C, Baja S, Imran A (2019) The correlation of soil liquid limit and plasticity index for predicting soil susceptibility: a case study on landslides area in South Sulawesi. In: IOP Conf Ser Earth Environ Sci 235(1):012007. IOP Publishing
Akgun A, Türk N (2010) Landslide susceptibility mapping for Ayvalik (Western Turkey) and its vicinity by multicriteria decision analysis. Environ Earth Sci 61:595–611
Article Google Scholar
Alexakis D, Agapiou A, Tzouvaras M, Themistocleous K, Neocleous K, Michaelides S, Hadjimitsis DG (2014) Integrated use of GIS and remote sensing for monitoring landslides in transportation pavements: the case study of Paphos area in Cyprus. Nat Hazards 72:119–141
Article Google Scholar
Alexandris A, Griva IK, Abarioti M (2016) Remediation of the Pissouri landslide in Cyprus. Int J Geoengin Case Hist 4:14–28
Google Scholar
Alexoudi M, Papaliangas T, Manolopoulou-Papaliagka S (2010) A methodology for landslide risk assessment and management. J Environ Prot Ecol 11(1):317–326
Althuwaynee OF, Pradhan B, Park H-J, Lee JH (2014) A novel ensemble bivariate statistical evidential belief function with knowledge-based analytical hierarchy process and multivariate statistical logistic regression for landslide susceptibility mapping. CATENA 114:21–36
Article Google Scholar
Anis Z, Wissem G, Riheb H, Biswajeet P, Essghaier GM (2019) Effects of clay properties in the landslides genesis in flysch massif: Case study of Aïn Draham, North Western Tunisia. J Afr Earth Sc 151:146–152
Article Google Scholar
Arabameri A, Chandra Pal S, Rezaie F, Chakrabortty R, Saha A, Blaschke T, Di Napoli M, Ghorbanzadeh O, Thi Ngo PT (2022) Decision tree based ensemble machine learning approaches for landslide susceptibility mapping. Geocarto Int 37(16):4594–4627
Article Google Scholar
Ayalew L, Yamagishi H (2005) The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65(1-2):15–31
Ayalew L, Yamagishi H, Ugawa N (2004) Landslide susceptibility mapping using GIS-based weighted linear combination, the case in Tsugawa area of Agano River, Niigata Prefecture, Japan. Landslides 1:73–81
Article Google Scholar
Baharvand S, Rahnamarad J, Soori S, Saadatkhah N (2020) Landslide susceptibility zoning in a catchment of Zagros Mountains using fuzzy logic and GIS. Environ Earth Sci 79:1–10
Article Google Scholar
Batar AK, Watanabe T (2021) Landslide susceptibility mapping and assessment using geospatial platforms and weights of evidence (WoE) method in the Indian himalayan region: Recent developments, gaps, and future directions. ISPRS Int J Geo Inf 10(3):114
Article Google Scholar
Beven KJ, Kirkby MJ (1979) A physically based, variable contributing area model of basin hydrology/Un modèle à base physique de zone d’appel variable de l’hydrologie du bassin versant. Hydrol Sci J 24(1):43–69
Article Google Scholar
Bourenane H, Bouhadad Y, Guettouche MS, Braham M (2015) GIS-based landslide susceptibility zonation using bivariate statistical and expert approaches in the city of Constantine (Northeast Algeria). Bull Eng Geol Env 74:337–355
Article Google Scholar
Brabb EE (1985) Innovative approaches to landslide hazard and risk mapping. In Proc 4th Intern Symp Landslides 1:307–323
Bragagnolo L, Da Silva R, Grzybowski J (2020) Artificial neural network ensembles applied to the mapping of landslide susceptibility. CATENA 184:104240
Article Google Scholar
Breiman L (2001) Random Forests. Mach Learn 45:5–32
Article Google Scholar
Buckley A (2010) Understanding curvature rasters. ArcGIS Blog, ESRI. https://www.esri.com/arcgis-blog/products/product/imagery/understanding-curvature-rasters/. Accessed 22 Nov 2022
Budimir M, Atkinson P, Lewis H (2015) A systematic review of landslide probability mapping using logistic regression. Landslides 12:419–436
Article Google Scholar
Bui DT, Tsangaratos P, Ngo P-TT, Pham TD, Pham BT (2019) Flash flood susceptibility modeling using an optimized fuzzy rule based feature selection technique and tree based ensemble methods. Sci Total Environ 668:1038–1054
Article CAS Google Scholar
Cagnan Z, Tanircan GB (2010) Seismic hazard assessment for Cyprus. J Seismolog 14:225–246
Article Google Scholar
Can R, Kocaman S, Gokceoglu C (2021) A comprehensive assessment of XGBoost algorithm for landslide susceptibility mapping in the upper basin of Ataturk dam. Turkey Appl Sci 11(11):4993
Article CAS Google Scholar
Carrara A, Guzzetti F, Cardinali M, Reichenbach P (1999) Use of GIS technology in the prediction and monitoring of landslide hazard. Nat Hazards 20:117–135
Article Google Scholar
Carson M, Kirkby M (1972) 1972: Hillslope form and process. Cambridge University Press, Cambridge
Google Scholar
Catani F, Lagomarsino D, Segoni S, Tofani V (2013) Landslide susceptibility estimation by random forests technique: sensitivity and scaling issues. Nat Hazard 13(11):2815–2831
Article Google Scholar
Chang L, Xing G, Yin H, Fan L, Zhang R, Zhao N, ..., & Ma J (2023) Landslide susceptibility evaluation and interpretability analysis of typical loess areas based on deep learning. Nat Hazards Res 3(2):155–169
Chapi K, Singh VP, Shirzadi A, Shahabi H, Bui DT, Pham BT, Khosravi K (2017) A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ Model Softw 95:229–245
Article Google Scholar
Charalambous M, Hobbs P, Northmore K (1986) Supplementary geotechnical and mineralogical data for cohesive soil samples from selected sites across Cyprus. Report EGARP-KW /86/5, GSD Report G/EG/16. Cyprus Geological Survey NERC copyright, Cyprus
Chen W, Panahi M, Tsangaratos P, Shahabi H, Ilia I, Panahi S, Li S, Jaafari A, Ahmad BB (2019) Applying population-based evolutionary algorithms and a neuro-fuzzy system for modeling landslide susceptibility. CATENA 172:212–231
Article Google Scholar
Chen L, Guo H, Gong P, Yang Y, Zuo Z, Gu M (2021) Landslide susceptibility assessment using weights-of-evidence model and cluster analysis along the highways in the Hubei section of the Three Gorges Reservoir Area. Comput Geosci 156:104899
Article Google Scholar
Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, p 785–794
Conforti M, Pascale S, Robustelli G, Sdao F (2014) Evaluation of prediction capability of the artificial neural networks for mapping landslide susceptibility in the Turbolo River catchment (northern Calabria, Italy). CATENA 113:236–250
Article Google Scholar
Constantinou GIP, Xenophontos K, Afrodisis S, Michaelides P, Kramvis S (2002) The geology of Cyprus. Bulletin No. 10, Cyprus Geological Survey, Nicosia, Cyprus
Corine (2018) Corine land cover 2018 (vector/raster 100m), Europe, 6-yearly. https://land.copernicus.eu/en/products/corine-land-cover/clc2018. Accessed 10 Dec 2022
Cui Y, Cai M, Stanley HE (2017) Comparative analysis and classification of cassette exons and constitutive exons. BioMed Res Int. https://doi.org/10.1155/2017/7323508
Dandy M, Sofwan A, Setiawan I (2019) Fuzzy logic method design for landslide vulnerability. In Proceedings of the 4th International Conference of Energy, Enviroment, Epidemiology and Information System (ICENIS 2019) 125:03004
Das S, Saha P, Mitra R, Alam A, Kamruzzaman M (2023) GIS-based data-driven bivariate statistical models for landslide susceptibility prediction in Upper Tista Basin, India. Heliyon 9(5)
Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference of Machine Learning, pp 233–240
De Ploey J, Cruz O (1979) Landslides in the serra do mar. Brazil Catena 6(2):111–122
Article Google Scholar
De Oliveira NS, Rotunno Filho OC, Marton E, Silva C (2016) Correlation between rainfall and landslides in Nova Friburgo, Rio de Janeiro—Brazil: a case study. Environ Earth Sci 75:1358. https://doi.org/10.1007/s12665-016-6171-7
Article Google Scholar
Dou J, Yunus AP, Bui DT, Merghadi A, Sahana M, Zhu Z, Chen C-W, Han Z, Pham BT (2020) Improved landslide assessment using support vector machine with bagging, boosting, and stacking ensemble machine learning framework in a mountainous watershed, Japan. Landslides 17:641–658
Article Google Scholar
Dou H, He J, Huang S, Jian W, Guo C (2023) Influences of non-landslide sample selection strategies on landslide susceptibility mapping by machine learning. Geomat Nat Haz Risk 14(1):2285719
Article Google Scholar
Du G, Zhang Y, Zou L, Yang Z, Yuan Y, Ren S (2022) Co-seismic landslide hazard assessment of the 2017 Ms 6.9 Milin earthquake, Tibet, China, combining the logistic regression–information value and Newmark displacement models. Bull Eng Geol Environ 81(10):446
Article Google Scholar
ESRI (2015) Getting to know ArcGIS. In: Law M, Collins A (eds), 4th edn
Fan W, Wei X-S, Cao Y-B, Zheng B (2017) Landslide susceptibility assessment using the certainty factor and analytic hierarchy process. J Mt Sci 14:906–925
Article Google Scholar
Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27(8):861–874
Article Google Scholar
Fell R, Corominas J, Bonnard C, Cascini L, Leroi E, Savage WZ (2008) Guidelines for landslide susceptibility, hazard and risk zoning for land use planning. Eng Geol 102(3–4):85–98
Article Google Scholar
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378
Article Google Scholar
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Annals of Statistics 29(5):1189–1232
Gaidzik K, Ramírez-Herrera MT (2021) The importance of input data on landslide susceptibility mapping. Sci Rep 11(1):19334
Article CAS Google Scholar
Gardezi SAH, Usmani NA, Chen XQ, Ikram N, Ahmad S, Wani S (2023) Application of data-driven techniques for landslide susceptibility prediction along an earthquake-affected road section in Kashmir Himalaya. Geol J 58(2):849–882
Article Google Scholar
GEOportal of Cyprus Geological Survey Department (2022) Nicosia, Cyprus. https://geoportal-gsd.moa.gov.cy/portal/apps/experiencebuilder/experience/?id=a18a7b0a0c494ef8abc21be965a07308&page=Home. Accessed 20 Oct 2022
Gholami M, Ghachkanlu EN, Khosravi K, Pirasteh S (2019) Landslide prediction capability by comparison of frequency ratio, fuzzy gamma and landslide index method. J Earth Syst Sci 128:1–22
Article Google Scholar
Gruber S, Peckham S (2009) Land-surface parameters and objects in hydrology. Dev Soil Sci 33:171–194
Google Scholar
Günther A, Van Den Eeckhaut M, Malet J-P, Reichenbach P, Hervás J (2014) Climate-physiographically differentiated Pan-European landslide susceptibility assessment using spatial multi-criteria evaluation and transnational landslide information. Geomorphology 224:69–85
Article Google Scholar
Guo Z, Shi Y, Huang F, Fan X, Huang J (2021) Landslide susceptibility zonation method based on C5. 0 decision tree and K-means cluster algorithms to improve the efficiency of risk management. Geosci Front 12(6):101249
Article Google Scholar
Guo Z, Tian B, Zhu Y, He J, Zhang T (2024) How do the landslide and non-landslide sampling strategies impact landslide susceptibility assessment?—A catchment-scale case study from China. J Rock Mech Geotech Eng 16(3):877–894
Article Google Scholar
Guzzetti F, Carrara A, Cardinali M, Reichenbach P (1999) Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study. Central Italy Geomorphol 31(1–4):181–216
Google Scholar
Guzzetti F, Reichenbach P, Cardinali M, Galli M, Ardizzone F (2005) Probabilistic landslide hazard assessment at the basin scale. Geomorphol 72(1–4):272–299
Article Google Scholar
Hadjigeorgiou J, Kyriakou E, Papanastasiou P (2006) A road embankment failure near Pentalia in Southwest Cyprus
Hart A, Hearn G (2010) Study of Landslide in Areas of Paphos District, Cyprus. Final Project Report. Geological Survey Department, Nicosia, Cyprus
Hart A, Hearn G (2013) Landslide assessment for land use planning and infrastructure management in the Paphos District of Cyprus. Bull Eng Geol Env 72(2):173–188
Article Google Scholar
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: prediction, inference and data mining. SpringerVerlag, New York
Book Google Scholar
Hearn GJ, Larkin H, Hadjicharalambous K, Papageorgiou A, Zoi GE (2018) Proving a landslide: ground behaviour problems at Pissouri, Cyprus. Q J Eng GeolHydrogeol 51(4):461–482
Article Google Scholar
Hearn G, Larkin H, Papageorgiou A, Zoi GE (2020) Acceleration of the Pissouri landslide, Cyprus. Q J Eng GeolHydrogeol 53(4):530–541
Article Google Scholar
Hebbali A (2017) olsrr: tools for teaching and learning OLS regression. https://cran.r-project.org/web/packages/olsrr/index.html. Accessed 25 Nov 2022
Hui S, Su L, Chen W, Tsangaratos P, Ilia I, Liu S, Cui S, Duan Z (2023) Spatial Prediction of Landslide Susceptibility Using Logistic Regression (LR), Functional Trees (FTs), and Random Subspace Functional Trees (RSFTs) for Pengyang County. Chin Remote Sens 15(20):4952. https://doi.org/10.3390/rs15204952
Article Google Scholar
Ikram N, Basharat M, Ali A, Usmani NA, Gardezi SAH, Hussain ML, Riaz MT (2022) Comparison of landslide susceptibility models and their robustness analysis: a case study from the NW Himalayas. Pakistan Geocarto Int 37(25):9204–9241
Article Google Scholar
Kamal M, Zhang B, Cao J, Zhang X, Chang J (2022) Comparative Study of Artificial Neural Network and Random Forest Model for Susceptibility Assessment of Landslides Induced by Earthquake in the Western Sichuan Plateau. Chin Sustain 14(21):13739
Google Scholar
Kamran KV, Feizizadeh B, Khorrami B, Ebadi Y (2021) A comparative approach of support vector machine kernel functions for GIS-based landslide susceptibility mapping. Appl Geomatics 13(4):837–851
Article Google Scholar
Kavzoglu T, Teke A (2022) Predictive Performances of ensemble machine learning algorithms in landslide susceptibility mapping using random forest, extreme gradient boosting (XGBoost) and natural gradient boosting (NGBoost). Arab J Sci Eng 47(6):7367–7385
Article Google Scholar
Kavzoglu T, Sahin EK, Colkesen I (2014) Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 11:425–439
Article Google Scholar
Khosravi K, Nohani E, Maroufinia E, Pourghasemi HR (2016) A GIS-based flood susceptibility assessment and its mapping in Iran: a comparison between frequency ratio and weights-of-evidence bivariate statistical models with multi-criteria decision-making technique. Nat Hazards 83:947–987
Article Google Scholar
Kim HG, Lee DK, Park C (2018) Assessing the Cost of Damage and Effect of Adaptation to Landslides Considering Climate Change. Sustain 10:1628. https://doi.org/10.3390/su10051628
Article Google Scholar
Kjekstad O, Highland L (2009) Economic and social impacts of landslides. In: Sassa K, Canuti P (eds) Landslides – disaster risk reduction. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69970-5_30
Kouhartsiouk D, Perdikou S (2021) The application of DInSAR and Bayesian statistics for the assessment of landslide susceptibility. Nat Hazards 105(3):2957–2985
Article Google Scholar
Kuhn M (2008) Building Predictive Models in R Using the caret Package. J Stat Softw 28(5):1–26. https://doi.org/10.18637/jss.v028.i05
Article Google Scholar
Kumar R, Anbalagan R (2019) Landslide susceptibility mapping of the Tehri reservoir rim area using the weights of evidence method. J Earth Syst Sci 128:1–18
Article Google Scholar
Lagomarsino D, Tofani V, Segoni S, Catani F, Casagli N (2017) A tool for classification and regression using random forest methodology: Applications to landslide susceptibility mapping and soil thickness modeling. Environ Model Assess 22:201–214
Article Google Scholar
Lee S, Ryu JH, Min K, Won JS (2003) Landslide susceptibility analysis using GIS and artificial neural network. Earth Surf Proc Landf J British Geomorphol Res Group 28(12):1361–1376
Article Google Scholar
Liu LL, Zhao SL, Yang C, Zhang W (2024) Quantifying uncertainty in landslide susceptibility mapping due to sampling randomness. Int J Disaster Risk Reduction 114:104966
Article Google Scholar
Lucchese LV, de Oliveira GG, Pedrollo OC (2021) Mamdani fuzzy inference systems and artificial neural networks for landslide susceptibility mapping. Nat Hazards 106(3):2381–2405
Article Google Scholar
Luino F, De Graff J, Biddoccu M, Faccini F, Freppaz M, Roccati A, Ungaro F, D’Amico M, Turconi L (2022) The Role of Soil Type in Triggering Shallow Landslides in the Alps (Lombardy, Northern Italy). Land 11(8):1125. https://doi.org/10.3390/land11081125
Article Google Scholar
Marinos V, Marinos P, Hoek E (2005) The geological strength index: applications and limitations. Bull Eng Geol Env 64:55–65
Article Google Scholar
Meneses BM, Pereira S, Reis E (2019) Effects of different land use and land cover data on the landslide susceptibility zonation of road networks. Nat Hazard 19(3):471–487
Article Google Scholar
Miklin L, Podolszki L, Gulam V, Markotić I (2022) The Impact of Climate Changes on Slope Stability and Landslide Conditioning Factors: An Example from Kravarsko. Croatia Remote Sens 14(8):1794. https://doi.org/10.3390/rs14081794
Article Google Scholar
Mirabedini M, Haghshenas E, Ganjian N (2022) Estimating the Geological Strength Index (GSI) in Regional Seismic-Landslide Zonation Using the Empirical Regression Model. Adv in Civ Eng 1:4798523
Moosavi V, Niazi Y (2016) Development of hybrid wavelet packet-statistical models (WP-SM) for landslide susceptibility mapping. Landslides 13:97–114
Article Google Scholar
Myronidis D, Papageorgiou C, Theophanous S (2016) Landslide susceptibility mapping based on landslide history and analytic hierarchy process (AHP). Nat Hazards 81:245–263
Article Google Scholar
Ng A, Jordan M (2001) On discriminative vs. generative classifiers: a comparison of logistic regression and naive bayes. In: Advances in Neural Information Processing Systems 14:841–848
Nhu V-H, Shirzadi A, Shahabi H, Chen W, Clague JJ, Geertsema M, Jaafari A, Avand M, Miraki S, Talebpour Asl D (2020) Shallow landslide susceptibility mapping by random forest base classifier and its ensembles in a semi-arid region of Iran. Forests 11(4):421
Article Google Scholar
Noether GE (1992) Introduction to Wilcoxon (1945) individual comparisons by ranking methods. In: Kotz S, Johnson NL (eds) Breakthroughs in statistics. Springer Series in statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-4380-9_15
Northmore K, Charalambous M, Hobbs P, Petrides G (1986) Engineering geology of the Kannaviou,'Melange'and Mamonia Complex formations: Phiti/Statos area, SW Cyprus: engineering geology of cohesive soils associated with ophiolites, with particular reference to Cyprus. Report EGARP-KW/86/4, GSD Report G/EG/15
Northmore K, Hobbs P, Charalambous M, Petrides G (1988) Complex landslides in the Kannaviou, Mélange and Mamonia formations of south-west Cyprus. In: International Symposium on Landslides 5:263–268
Nsengiyumva JB, Luo G, Nahayo L, Huang X, Cai P (2018) Landslide susceptibility assessment using spatial multi-criteria evaluation model in Rwanda. Int J Environ Res Public Health 15(2):243
Article Google Scholar
Nwazelibe VE, Unigwe CO, Egbueri JC (2023) Testing the performances of different fuzzy overlay methods in GIS-based landslide susceptibility mapping of Udi Province. SE Nigeria Catena 220:106654
Article Google Scholar
Pantazis TM (1969) Landslides in Cyprus. Bulletin of the Geological Survey Department, Cyprus
Pawluszek K, Borkowski A (2017) Impact of DEM-derived factors and analytical hierarchy process on landslide susceptibility mapping in the region of Rożnów Lake. Poland Natural Hazards 86(2):919–952
Article Google Scholar
Perdikou S, Kouhartsiouk D (2019) Landslide risk evaluation in Cyprus using satellite radar interferometry: the “Rantkat ″project. In Proceedings of the XVII European conference on soil mechanics and geotechnical engineering, icelandic geotechnical society, Reykjavik. https://doi.org/10.32075/17ECSMGE-2019-0488
Petley D (2012) Global patterns of loss of life from landslides. Geology 40:927–930. https://doi.org/10.1130/G33217.1
Article Google Scholar
Pokharel B, Alvioli M, Lim S (2021) Assessment of earthquake-induced landslide inventories and susceptibility maps using slope unit-based logistic regression and geospatial statistics. Sci Rep 11(1):21333
Article CAS Google Scholar
Pradhan B, Lee S (2010) Delineation of landslide hazard areas on Penang Island, Malaysia, by using frequency ratio, logistic regression, and artificial neural network models. Environ Earth Sci 60:1037–1054
Article Google Scholar
Radbruch DH (1970) Map of relative amounts of landslides in California. US Geological Survey Open-File Report 70-1485, 36 pp, map scale 1:500,000. US Geological Survey Open-File Report 85-585
Raghuvanshi TK, Negassa L, Kala P (2015) GIS based Grid overlay method versus modeling approach–A comparative study for landslide hazard zonation (LHZ) in Meta Robi District of West Showa Zone in Ethiopia. Egypt J Remote Sens Space Sci 18(2):235–250
Google Scholar
Rasyid AR, Bhandary NP, Yatabe R (2016) Performance of frequency ratio and logistic regression model in creating GIS based landslides susceptibility map at Lompobattang Mountain, Indonesia. Geoenvironmental Disasters 3:1–16
Article Google Scholar
Razavi-Termeh SV, Shirani K, Pasandi M (2021) Mapping of landslide susceptibility using the combination of neuro-fuzzy inference system (ANFIS), ant colony (ANFIS-ACOR), and differential evolution (ANFIS-DE) models. Bull Eng Geol Env 80:2045–2067
Article Google Scholar
Roccati A, Paliaga G, Luino F, Faccini F, Turconi L (2021) GIS-based landslide susceptibility mapping for land use planning and risk assessment. Land 10(2):162
Article Google Scholar
Rozos D, Pyrgiotis L, Skias S, Tsagaratos P (2008) An implementation of rock engineering system for ranking the instability potential of natural slopes in Greek territory. Appl Karditsa County Landslides 5:261–270
Article Google Scholar
Sahin EK (2020) Assessing the predictive capability of ensemble tree methods for landslide susceptibility mapping using XGBoost, gradient boosting machine, and random forest. SN Appl Sci 2(7):1308
Article CAS Google Scholar
Sartohadi J, Harlin Jennie Pulungan NA, Nurudin M, Wahyudi W (2018) The ecological perspective of landslides at soils with high clay content in the middle Bogowonto Watershed, Central Java, Indonesia. Appl Environ Soil Sci 1: 2648185
Saunders W, Glassey PJ (2007) Guidelines for assessing planning, policy and consent requirements for landslide-prove land GNS Science miscellaneous series 7. Wellington, New Zealand
Segoni S, Piciullo L, Gariano SL (2018) A review of the recent literature on rainfall thresholds for landslide occurrence. Landslides 15(8):1483–1501
Article Google Scholar
Shafapour Tehrany M, Shabani F, Neamah Jebur M, Hong H, Chen W, Xie X (2017) GIS-based spatial prediction of flood prone areas using standalone frequency ratio, logistic regression, weight of evidence and their ensemble techniques. Geomat Nat Haz Risk 8(2):1538–1561
Article Google Scholar
Shahri AA, Spross J, Johansson F, Larsson S (2019) Landslide susceptibility hazard map in southwest Sweden using artificial neural network. CATENA 183:104225
Article Google Scholar
Sharma A (2010) Integrating terrain and vegetation indices for identifying potential soil erosion risk area. Geo-Spatial Inf Sci 13(3):201–209
Article Google Scholar
Shu H, Hürlimann M, Molowny-Horas R, González M, Pinyol J, Abancó C, Ma J (2019) Relation between land cover and landslide susceptibility in Val d’Aran, Pyrenees (Spain): Historical aspects, present situation and forward prediction. Sci Total Environ 693:133557
Article CAS Google Scholar
Sidle RC, Ochiai H (2006) Processes, prediction, and land use American Geophysical Union. AGU Books Board, American Geophysical Union, Washington
Book Google Scholar
Sim KB, Lee ML, Wong SY (2022) A review of landslide acceptable risk and tolerable risk. Geoenviron Disasters 9:3. https://doi.org/10.1186/s40677-022-00205-6
Article Google Scholar
Singh C, Kohli A, Kumar P (2014) Comparison of results of BIS and GSI guidelines on macrolevel landslide hazard zonation—A case study along highway from Bhalukpong to Bomdila, West Kameng district, Arunachal Pradesh. J Geol Soc India 83:688–696
Article Google Scholar
Skidmore M (2001) Risk, natural disasters, and household savings in a life cycle model. Jpn World Econ 13(1):15–34
Article Google Scholar
Son N-T, Chen C-F, Chen C-R, Minh V-Q (2018) Assessment of Sentinel-1A data for rice crop classification using random forests and support vector machines. Geocarto Int 33(6):587–601
Google Scholar
Sørensen R, Zinko U, Seibert J (2005) On the calculation of the topographic wetness index: evaluation of different methods based on field observations. Hydrol Earth Syst Sci Discuss 2(4):1807–1834
Google Scholar
Sujatha ER, Rajamanickam GV, Kumaravel P (2012) Landslide susceptibility analysis using probabilistic certainty factor approach: a case study on Tevankarai stream watershed, India. J Earth Syst Sci 121:1337–1350
Article Google Scholar
Sun D, Xu J, Wen H, Wang Y (2020) An optimized random forest model and its generalization ability in landslide susceptibility mapping: application in two areas of Three Gorges Reservoir, China. J Earth Sci 31:1068–1086
Article Google Scholar
Sun D, Shi S, Wen H, Xu J, Zhou X, Wu J (2021) A hybrid optimization method of factor screening predicated on GeoDetector and Random Forest for Landslide Susceptibility Mapping. Geomorphol 379:107623
Article Google Scholar
Taalab K, Cheng T, Zhang Y (2018) Mapping landslide susceptibility and types using Random Forest. Big Earth Data 2(2):159–178
Article Google Scholar
Tsangaratos P, Benardos A (2014) Estimating landslide susceptibility through a artificial neural network classifier. Nat Hazards 74:1489–1516
Article Google Scholar
Tsangaratos P, Ilia I (2016) Landslide susceptibility mapping using a modified decision tree classifier in the Xanthi Perfection, Greece. Landslides 13:305–320
Article Google Scholar
Tzampoglou P, Loupasakis C (2023) Hydrogeological Hazards in Open Pit Coal Mines-Investigating Triggering Mechanisms by Validating the European Ground Motion Service Product with Ground Truth Data. Water 15(8):1474
Article CAS Google Scholar
Tzampoglou P, Loukidis D, Karlas E, Tolomei C, Svigkas N, Pezzo G, Koulermou N, Anastasiades A, Loupasakis C (2024) Investigating the impact of urban development on the activation of a paleolandslide. A case study from Pissouri, Cyprus. Bull Eng Geol Environ 83(10):420
Article Google Scholar
Tzampoglou P, Loukidis D, Tsangaratos P, Anastasiades A, Karalis K (2025) Correlation Between Geotechnical Indexes and Landslide Occurrence in Southwestern Cyprus Using GIS and Machine Learning. Geotech Geol Eng 43(1):1–29
Article Google Scholar
Varnes DJ (1978) Slope movement types and processes. Special Report 176:11–33
Google Scholar
Vega J, Hidalgo C (2023) Comparison study of a landslide-event hazard mapping using a multi-approach of fuzzy logic, TRIGRS model, and support vector machine in a data-scarce Andes Mountain region. Arab J Geosci 16(9):527
Article Google Scholar
Wang Z, Brenning A (2021) Active-learning approaches for landslide mapping using support vector machines. Remote Sensing 13(13):2588
Article Google Scholar
Wang Y, Fang Z, Hong H (2019) Comparison of convolutional neural networks for landslide susceptibility mapping in Yanshan County, China. Sci Total Environ 666:975–993
Article CAS Google Scholar
Waqas H, Lu L, Tariq A, Li Q, Baqa MF, Xing J, Sajjad A (2021) Flash flood susceptibility assessment and zonation using an integrating analytic hierarchy process and frequency ratio model for the Chitral District, Khyber Pakhtunkhwa. Pakistan Water 13(12):1650
Google Scholar
Wieczorek GF (1984) Preparing a detailed landslide-inventory map for hazard evaluation and reduction. Bull Assoc Eng Geol 21(3):337–342
Google Scholar
Wilcoxon F (1992) Individual comparisons by ranking methods. In: Kotz S, Johnson NL (eds) Breakthroughs in statistics. Springer Series in Statistics. Springer, New York. https://doi.org/10.1007/978-1-4612-4380-9_16
Wilde M, Günther A, Reichenbach P, Malet J-P, Hervás J (2018) Pan-European landslide susceptibility mapping: ELSUS Version 2. J Maps 14(2):97–104
Article Google Scholar
Wistuba M, Gorczyca E, Malik I (2021) Inferring precipitation thresholds of landslide activity from long-term dendrochronological and precipitation data: Case study on the unstable slope at Karpenciny. Poland Eng Geol 294:106398
Article Google Scholar
Wu Y, Chen L, Cheng C, Yin K, Török Á (2014) GIS-based landslide hazard predicting system and its real-time test during a typhoon, Zhejiang Province, Southeast China. Eng Geol 175:9–21
Article Google Scholar
Xi C, Han M, Hu X, Liu B, He K, Luo G, Cao X (2022) Effectiveness of Newmark-based sampling strategy for coseismic landslide susceptibility mapping using deep learning, support vector machine, and logistic regression. Bull Eng Geol Env 81(5):174
Article Google Scholar
Xiao T, Yu L, Tian W, Zhou C, Wang L (2021) Reducing local correlations among causal factor classifications as a strategy to improve landslide susceptibility mapping. Front Earth Sci 9:781674
Article Google Scholar
Xing X, Wu C, Li J, Li X, Zhang L, He R (2021a) Susceptibility assessment for rainfall-induced landslides using a revised logistic regression method. Nat Hazards 106:97–117
Article Google Scholar
Xing Y, Yue J, Guo Z, Chen Y, Hu J, Travé A (2021b) Large-scale landslide susceptibility mapping using an integrated machine learning model: A case study in the Lvliang mountains of China. Front Earth Sci 9:722491
Article Google Scholar
Xiong D, Zhao W, Zhao D, Zhang B, Dahal NM, Wu Y, Baig MA (2022) An Investigation of Landslide Susceptibility Using Logistic Regression and Statistical Index Methods in Dailekh District. Nepal Chin Geograph Sci 32(5):834–851
Article Google Scholar
Yalcin A (2007) The effects of clay on landslides: A case study. Appl Clay Sci 38(1–2):77–85
Article CAS Google Scholar
Yesilnacar E, Topal T (2005) Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Eng Geol 79(3–4):251–266
Article Google Scholar
Zevenbergen LW, Thorne CR (1987) Quantitative analysis of land surface topography. Earth Surf Proc Land 12(1):47–56
Article Google Scholar
Zhang S, Bai L, Li Y, Li W, Xie M (2022) Comparing convolutional neural network and machine learning models in landslide susceptibility mapping: A case study in wenchuan county. Front Environ Sci 10:886841
Article Google Scholar

Download references

Acknowledgements

The authors are grateful to Zomenia Zomeni, Niki Koulermou and Pantelitsa Koudouna of the Geological Survey Department of Cyprus for providing geological and geotechnical data. The field work performed by Maria Panagiotou on behalf of the University of Cyprus is greatly acknowledged.

Funding

The study was funded from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 101027880.

Author information

Authors and Affiliations

GeoImaging Ltd, 2021, Nicosia, Cyprus
P. Tzampoglou & A. Anastasiades
Department of Civil & Environmental Engineering, University of Cyprus, 1678, Nicosia, Cyprus
P. Tzampoglou & D. Loukidis
School of Mining and Metallurgical Engineering, National Technical University of Athens, Zografou, Athens, Greece
P. Tsangaratos

Authors

P. Tzampoglou
View author publications
You can also search for this author inPubMed Google Scholar
D. Loukidis
View author publications
You can also search for this author inPubMed Google Scholar
A. Anastasiades
View author publications
You can also search for this author inPubMed Google Scholar
P. Tsangaratos
View author publications
You can also search for this author inPubMed Google Scholar

Contributions

Author contributions Ploutarchos Tzampoglou: Data curation, Investigation, Formal analysis, Conceptualization, Methodology, Visualization, Writing—original draft. Dimitrios Loukidis: Conceptualization, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing—review & editing. Aristodemos Anastasiades: Funding acquisition, Project administration, Resources, Supervision. Paraskevas Tsangaratos: Conceptualization, Investigation, Methodology, Software, Supervision, Writing—review & editing.

Corresponding author

Correspondence to P. Tzampoglou.

Ethics declarations

Competing interest

The authors declare no competing interests.

Additional information

Communicated by: Hassan Babaie

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Tzampoglou, P., Loukidis, D., Anastasiades, A. et al. Advanced machine learning techniques for enhanced landslide susceptibility mapping: Integrating geotechnical parameters in the case of Southwestern Cyprus. Earth Sci Inform 18, 357 (2025). https://doi.org/10.1007/s12145-025-01761-9

Download citation

Received: 08 July 2024
Accepted: 27 January 2025
Published: 03 April 2025
DOI: https://doi.org/10.1007/s12145-025-01761-9

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Advanced machine learning techniques for enhanced landslide susceptibility mapping: Integrating geotechnical parameters in the case of Southwestern Cyprus

Abstract

Similar content being viewed by others

Correlation Between Geotechnical Indexes and Landslide Occurrence in Southwestern Cyprus Using GIS and Machine Learning

Effect of different mapping units, spatial resolutions, and machine learning algorithms on landslide susceptibility mapping at the township scale

Landslide susceptibility mapping using GIS-based machine learning algorithms for the Northeast Chongqing Area, China

Introduction

Study area

Materials and methods

Landslide inventory map

Data pre-processing

Dataset preparation for spatial modelling

Multicollinearity analysis

Feature importance analysis

Preparing the training and testing datasets

Machine learning classifiers

Logistic regression

Random forest

Extreme Gradient Boosting (XGBoost)

Landslide susceptibility map

Methods of validation and accuracy assessment

Input datasets

Geomorphological factors

Geological/geotechnical factors

Hydrological factors

Anthropogenic factors

Numerical Results

Frequency ratio analysis

Multicollinerarity analysis and factor importance

Landslide susceptibility maps

Performance of machine learning models

Effect of non-landslide points sampling method

Effect of incorporation of geotechnical thematic layers

Discussion

Conclusions

Data availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords