Introduction

Landslides are defined as the movement of ground down a slope due to gravity (Varnes 1978). They are one of the most destructive natural hazards and have significantly affected many countries worldwide causing thousands of deaths and injuries (Kavzoglu et al. 2014; Skidmore 2001). Based on the type of movement and geological conditions, landslides can be classified into different types such as rotational slides, translational slides, rockfalls, and flows (Alexoudi et al. 2010; Saunders and Glassey 2007; Varnes 1978). Slope stability is controlled by the geomorphological, geological, hydrological, geotechnical, and climatic conditions, as well as human activities (Tzampoglou et al. 2025; Gardezi et al. 2023; Ikram et al. 2022; Luino et al. 2022; Miklin et al., 2022). Among the recorded non-seismically triggered landslides for the period between 2004–2010 worldwide, the number of fatal landslides approached 2620, causing 32,322 fatalities (Petley 2012). Furthermore, this natural hazard causes significant economic losses on the scale of billions of US dollars each year in many countries (Tzampoglou and Loupasakis 2023). This is particularly true in regions characterized by mountainous terrain and high population density. For example, the average annual cost of landslides in Italy varies between 2.6 to 5 billion US dollars (Kjekstad and Highland 2009), while the cost of damages due to landslides in the Republic of Korea increased from 5.3 in 1997 to 13.9 billion Won in 2014 (Kim et al. 2018). Furthermore, the estimated annual total cost of landslides reached 1.4 billion US dollars in Canada (Sidle and Ochiai 2006) and 26.3 million US dollars in New Zealand (Sim et al. 2022).

The term susceptibility refers to the possibility of landslide activation, considering the prevailing conditions (Brabb, 1985). It does not refer to either when or how the landslide is triggered, or to the magnitude of the expected losses. The basic consideration of the landslide susceptibility map is the spatial distribution of the parameters that affect the phenomenon to identify zones that are prone to landslides without any time reference (Radbruch 1970). Stochastic landslide hazard analysis is based on certain basic assumptions (Carrara et al. 1999; Guzzetti et al. 1999, 2005): a) landslides that have occurred in the past did not occur by chance but when a series of factors acted cumulatively, b) old recorded landslides can be used as evidence to predict potential future failures, and c) each landslide has specific geomorphological characteristics that are absolutely evident on the surface of the earth.

The causal factors can be classified into two categories: preparatory (e.g., elevation, slope, aspect, lithology, geotechnical properties, tectonic faults, and human activities) and triggering (e.g., rainfall and seismicity) causal factors. The prediction of susceptible areas involves classification of the factors actively contributing to landslide activation. Various stochastic methodologies are currently available for susceptibility and hazard assessments, each with its own advantages and shortcomings (Fell et al. 2008; Rozos et al. 2008). Qualitative methods rely on expert opinions and do not require a landslide database, but their accuracy depends on expert knowledge and is influenced by subjectivity. Quantitative methods, based on statistical and probabilistic analyses, require a detailed landslide database. Their effectiveness is influenced by the complexity, uncertainty, and nonlinearity of the correlations between the spatial distribution of mass movements and the conditioning variables, which must be addressed with caution (Lee et al. 2003; Pradhan and Lee 2010). Recent approaches based on Machine Learning and Data Mining are more effective in managing uncertainty because they are not influenced by the nonlinear behaviour and complexity of systems (Tsangaratos and Ilia 2016). These advanced methods can learn and uncover hidden patterns in large multi-thematic databases. They are adept at handling data with different scales and types without relying on statistical distribution assumptions, making them powerful tools for analyzing complex datasets (Tsangaratos and Ilia 2016). Some characteristic examples include the Logistic Regression approach (Abeysiriwardana and Gomes 2022; Budimir et al. 2015; Du et al. 2022; Pokharel et al. 2021; Xing et al. 2021a; Xiong et al. 2022), Fuzzy Logic method (Abdi et al. 2021; Baharvand et al. 2020; Dandy et al. 2019; Nwazelibe et al. 2023; Vega and Hidalgo 2023), Artificial Neural Network method (Bragagnolo et al. 2020; Shahri et al. 2019; Tsangaratos and Benardos 2014), Bayes theorem which is based on weights of evidence (Batar and Watanabe 2021; Chen et al. 2021; Kouhartsiouk and Perdikou 2021; Kumar and Anbalagan 2019), Neural–Fuzzy method (Chen et al. 2019; Lucchese et al. 2021; Razavi-Termeh et al. 2021), Support Vector Machines (Dou et al. 2020; Kamran et al. 2021; Wang and Brenning 2021) and Decision Tree method (Arabameri et al. 2022; Guo et al. 2021; Taalab et al. 2018).

The present study considers three Machine Learning methods, namely Logistic Regression (LR), Random Forest (RF), and XGBoost (XGB). LR is a widely used statistical probability model applied in landslide susceptibility assessment, offering accurate and reliable results through its simple discriminative learning mechanism. It estimates the probability of a given feature (x) and label (y) directly by minimizing error (Ng and Jordan 2001). RF is an ensemble Machine Learning method that utilizes random binary trees with bootstrapping techniques (Breiman 2001). This non-linear, non-parametric algorithm can assess the importance of landslide-related factors. It can handle large numerical and categorical datasets, accounting for complex interactions. (Taalab et al. 2018). XGBoost is an increasingly popular gradient tree-boosting algorithm that creates a prediction model by boosting weak classification trees (Chen and Guestrin 2016; Cui et al. 2017). Τhe reduced computational time needed, in combination with high accuracy rates, have recently made this method one of the most popular in the studies of landslide susceptibility mapping (Kavzoglu and Teke 2022; Sahin 2020).

One of the fundamental factors in landslide susceptibility analysis is the lithology thematic layer, through which the effect of ground strength is introduced in the analysis. The lithology thematic layer contains only general information about the geological formations without explicitly considering geotechnical characteristics that control the ground strength, such as the Plasticity Index (PI), the Clay Fraction (CF) and the Geological Strength Index (GSI), and their spatial variation. In this study, we incorporate the geotechnical parameters PI, CF, and GSI as thematic layers to explore the potential of enhancing the accuracy and reliability of landslide susceptibility maps. The results of this study are expected to contribute to better-informed decision-making processes in land-use planning, infrastructure development, and disaster management, ultimately reducing the risk and impact of landslides on human life and property.

Study area

The island of Cyprus has been significantly affected by landslides because of its particular geological-geotechnical conditions and mountainous geomorphology (Myronidis et al. 2016; Northmore et al. 1988; Pantazis 1969). Over the past centuries, entire villages had to be relocated because of landslide phenomena, while recent failures in both natural and man-made slopes resulted in an estimated average annual cost of damage amounting to several millions euros.

The study area is the southwestern (SW) part of the island οf Cyprus, comprising of Paphos District and the western part of Limassol District (Fig. 1). The geology of this region is characterized by extensive outcrops of fine-grained clastic sedimentary rocks (e.g., shales, mudstones, and marls) of low strength (Northmore et al. 1986; Charalambous et al. 1986) (Fig. 2). The rock masses are often sheared and/or folded, resulting in low values of equivalent rock mass cohesion and friction angle. This is particularly true for the rocks of the Mamonia Complex, which is a group of allochthonous geological formations that were thrusted on Cyprus and were originally part of the accretionary prism formed during the early stages of subduction of the African tectonic plate under the Eurasian plate (Constantinou et al. 2002).

Fig. 1
figure 1

Satellite image of the study area (sources Esri, Maxar, Earthstar Geographics, and the GIS User Community)

Fig. 2
figure 2

Map of geotechnical groups (lithology) compiled from the geological maps of the Geological Survey Department of Cyprus

Another important aspect of the geology of the area is the existence of extensive outcrops of bentonitic clays and tectonic mélanges (i.e., fragments of Mamonia Complex and Troodos ophiolite rocks in a fine-grained matrix, often of high to very high plasticity). In the 1980’s, a joint study by the British Geological Survey (BGS) and the Geological Survey Department of Cyprus (GSD) examined the geotechnical properties of these argillaceous materials (Northmore et al. 1986; Charalambous et al. 1986). The laboratory results revealed that their effective peak friction angle is often lower than 20° (with the effective cohesion being only a few tens of kPa at most), with the residual friction angle in some cases falling below 10°.

Apart from geological/geotechnical conditions, other aggravating factors include regional rainfall patterns and local seismicity. Rainfall is unevenly distributed during hydrological years, with heavy rains during the winter and spring months and almost no rainfall during the summer months (Hart and Hearn 2013; Alexandris et al. 2016). The study area is also the most seismically active region of the island, encompassing a number of active faults. Moreover, its coast is only a few tens of kilometres away from the subduction zone, where most of the recent earthquake epicenters were observed (Cagnan and Tanircan 2010).

Hart and Hearn (2013) recorded 1842 landslides (active and inactive) of different types in the study area. Most of these slope instabilities can be characterized as creeping landslides, as their rates of movement are of the order of 10 cm/year. Exceptions are road cut failures, which develop quickly and abruptly after intense rainfall events. Such failures have occurred in recent years along the new and old highways connecting the cities of Paphos and Limassol, as well as in the rural road network closer to the Troodos Mountain range.

Despite the apparently slow rate of movement, creeping landslides result over the years in the accumulation of meters of ground displacement, which are capable of shattering even the most competent building foundations. As a consequence, a number of villages have been forced to relocate in the course of the last 70 years, most notably Statos, Agios Photios, Pentalia, Choletria and Fasoula (Alexakis et al. 2014; Hadjigeorgiou et al. 2006). Recent expansion of towns and the growth of the real estate sector in tourist areas have led to the construction of residential projects in regions that were traditionally avoided by local builders due to the known problematic nature of the ground. As a consequence, a number of new instabilities or reactivation of dormant landslides have occurred in the last 20 years, such as those in the areas of Armou, Nata, Tala, Petra tou Romiou, and Pissouri (Alexandris et al. 2016; Hearn et al. 2018, 2020; Perdikou and Kouhartsiouk 2019; Tzampoglou et al. 2024).

Materials and methods

The methodology for constructing the susceptibility maps of the study area is divided into four phases: (i) preparation of a landslide inventory map and selection of the landslides causative factors, (ii) pre-processing and data analysis, (iii) implementation of three machine learning algorithms for landslide susceptibility analysis, and (iv) validation and comparison of the produced maps.

The data sources which were used in this research are summarized in Table 1. All necessary processing for the construction of the thematic layers as well as the landslide susceptibility maps were performed using the program ArcMAP 10.7.1. In addition, various geospatial functions were performed using Python (v3.9). Furthermore, the programming language R (v4.3.2) was used to construct all the necessary codes and scripts of the algorithms.

Table 1 Data sources of the main causal factors

Landslide inventory map

The first task of the methodology comprises the creation of a landslide and non-landslide inventory database (Wieczorek 1984). The landslide inventory database was based on landslides previously recorded and mapped by the Cyprus Geological Survey Department (GSD) (Fig. 3a). This database, which is available publically (GEOportal of Cyprus Geological Survey Department 2022) was provided by GSD also in digital format (shape file). In the prepared inventory database, landslides are represented as points from within the respective landslide polygon. The areas considered for extraction of non-landslide points were identified through satellite imagery inspection. These were supplemented by points from areas of very low landslide susceptibility (Günther et al. 2014; Wilde et al. 2018), as determined from previous susceptibility maps (Hart and Hearn 2010; Alexakis et al. 2014; Kouhartsiouk and Perdikou 2021), as well as from a preliminary landslide susceptibility map constructed within the context of the present study based solely on the values of the Frequency Ratio. Using this approach, the non-landslide points represent locations for which there is a high degree of certainty that there is no landslide and that the likelihood a landslide will occur there in the future is very small. This is juxtaposed to the selected landslide points for which it is almost certain that a landslide actually exists. Non-landslide points were automatically selected from the designated areas (polygons) by employing a spatial function that generates randomly positioned points, namely the Create Random Points function of the Data Management Toolset of the ArcMAP suite (ESRI 2015).

Fig. 3
figure 3

Landslide inventory: (a) map of recorded landslides (source: Geological Survey Department of Cyprus), (b) Training and testing datasets of landslide and non-landslides points

The approach for selecting non-landslide points described above relies on expert knowledge and landslide susceptibility calculations that precede the main analysis. To investigate the effect of potential biases and subjectivity stemming from this method of non-landslide point selection (designated hereinafter as method NLS-1) and its impact on model performance metrics, a sensitivity analysis was undertaken by comparing the results with those from additional analyses that were performed using non-landslide datasets generated using the buffer-controlled sampling method (referred to in the remainder as NLS-2). These analyses involved generating three alternative training and testing datasets using buffer zones with sizes of 250 m, 500 m, and 750 m around the polygons of the recorded landslides, and randomly sampling non-landslide points from the part of the study areas that falls outside the buffer zones and the landslide polygons. Additionally, for the analyses based on the buffer zone approach, each model underwent fivefold cross-validation, where the training data was split into five parts, and in each fold, four parts were used for training and one for testing. This process, repeated five times, produced average accuracy and standard deviation values, which provided insights into each model’s consistency and reliability during training.

Data pre-processing

Two separate tasks were performed during the data pre-processing phase. The first was the selection and creation of classes of causal variables. Generally, these variables should (a) have a relationship with the activation of landslides, (b) have spatial variation, (c) be expressed on a measurable scale and (d) not affect other variables in the final result (Ayalew and Yamagishi 2005). Then, multicollinearity and importance ranking assessment analyses were performed. The final outcome of this phase comprises a database with the thematic layers of all causal factors in digital format within ArcMAP, as well as training and testing datasets of landslide and non-landslide points.

Dataset preparation for spatial modelling

Considering previous studies in areas with similar geo-environmental conditions worldwide, the main causal factors for landslides are the ground elevation, ground inclination (slope and aspect), lithology, distance to tectonic faults, land use, distance to hydrographic networks, plan curvature, profile curvature, distance to road network, Stream Power Index (SPI), Topographic Wetness Index (TWI), and average rainfall amount. Among these, the only factor directly related to the ground properties is the lithology. However, lithology introduces in the analysis the effects of ground strength only qualitatively and without accounting for spatial variations of geotechnical properties within each geological formation. Herein, thematic layers of the geotechnical parameters strongly affecting the ground strength, namely the Plasticity Index (PI), the Clay Fraction (CF), and GSI are prepared and included in the input dataset. Moreover, we included in the analysis the difference between the dip direction of discontinuities with persistence (bedding planes, but also sills and dikes in the case of igneous rocks) and the slope dip direction, given that it plays a crucial role in the occurrence of planar slides. To the best of our knowledge, this is the first study that incorporates these factors as specific layers in probabilistic landslide susceptibility analysis with the aim to enhance its accuracy and level of detail.

Weighting of the classified causal factors was performed using the Frequency Ratio (FR) method of analysis. This analysis highlights the relationship between causal variables and the spatial distribution of landslides in a quantitative manner, and constitutes one of the simplest and most comprehensible probabilistic models (Waqas et al. 2021). Generally, the FR correlates with the likelihood of landslide events occurring (Khosravi et al. 2016). During this step, the thematic causal factor layers were normalized by a procedure that sets new values ranging from 0.1 to 0.9.

Multicollinearity analysis

After the creation of the normalized database, a multicollinearity analysis was carried out to investigate the intercorrelation between the landslide causal factors. This procedure needs to be performed to ensure that all causal factors have a sufficient degree of independence from each other (Achour and Pourghasemi 2020). For this purpose, variance inflation factor (VIF) and tolerance index (TOL) were calculated. Severe multicollinearity is considered to occur when the TOL index is lower than 0.1 and, equivalently, the VIF index is higher than 10 (Bui et al. 2019). In this study, the aforementioned indexes were calculated using the olsrr package (Hebbali 2017) in R programming language (v4.3.2), according to the following equations:

$$TOL=1-{R}^{2}$$
(1)
$$VIF=\frac{1}{TOL}$$
(2)

where R2 is the regression coefficient of determination.

Feature importance analysis

The feature importance of the causal factors was estimated using the mean decrease accuracy (MDA) value within the Random Forest (RF) model. This measure assesses the importance of each feature in making accurate predictions. MDA is determined by permuting the values of each feature and observing a decrease in the accuracy of the model. A significant drop in accuracy indicates that the feature is important as it contributes significantly to the model’s predictive performance. Features with higher MDA values are thus considered more important because they help reduce the impurity or error of the model, enhancing its overall accuracy. Conversely, features with lower MDA values have less impact on the model’s predictive capability.

Preparing the training and testing datasets

Finally, each of the landslide and non-landslide point groups were randomly divided into a training dataset and a testing dataset. This procedure was performed using geostatistical analysis tools within the ArcGIS program. Subsequently, the training and testing datasets of landslide and non-landslide points were combined to create the final training and testing datasets, which were further analyzed using R during the next phases.

Machine learning classifiers

Logistic regression

The Logistic Regression (LR) is a quantitative statistical method that is widely used to predict the probability of landslide occurrence as a function of the predictor variables (Shafapour et al., 2017). These variables can be either continuous, discrete, or any combination of both types, and do not necessarily need to have normal distributions. The independent variables within this model serve as predictors of the dependent variable and can be assessed on nominal, ordinal, interval, or ratio scales, whereas the dependent variable is presented in a binary format. The association between the dependent and independent variables exhibits a non-linear relationship (Yesilnacar and Topal 2005).

Generally, the probability (p) of a landslide occurring can be estimated using the following equation:

$$p= \frac{1}{1+{e}^{-z}}$$
(3)

where z is a linear combination of the selected causal variables and can be expressed as follows:

$$z={b}_{0}+{b}_{1}{x}_{1}+{b}_{2}{x}_{2}+\dots +{b}_{n}{x}_{n}$$
(4)

where bi (i = 0,n) are regression coefficients, and xi are the independent variables (causal factors).

Random forest

The Random Forest (RF) method was introduced in the 90’s by Breiman (2001). This approach is an ensemble Machine Learning method that exploits random binary trees using a subset of observations through bootstrapping techniques (Breiman 2001; Hastie et al. 2009; Son et al. 2018). In general, this methodology achieves high accuracy in both classification and regression tasks by decreasing the variance of the classification error (Catani et al. 2013; Kamal et al. 2022; Lagomarsino et al. 2017; Xiao et al. 2021). Randomly selected training subsets encompassing both samples and variables are employed in the classification process. For each subset, a classifier yields a result. The culmination of the RF model entails aggregating all the outcomes provided by the classifiers (Nhu et al. 2020). To achieve the best possible RF performance, the out-of-bag error must be eliminated by evaluating the minimum number of trees and the number of variables used in each split. Herein, these parameters were determined as follows: n.trees = 2000, m.try = 11.

Extreme Gradient Boosting (XGBoost)

XGBoost was introduced by Chen and Guestrin (2016) and has become one of the most popular and powerful machine-learning algorithms used for both regression and classification tasks. It is an ensemble learning method known for its speed and performance. XGBoost enhances the traditional gradient boosting algorithm (Friedman 2001, 2002) by implementing several optimization and parallelization techniques. Generally, this algorithm achieves a better performance by learning from weak learners’ outputs (usually decision trees) (Hastie et al. 2009). It builds trees sequentially, and each subsequent tree corrects the errors made by the previous tree. XGBoost incorporates regularization techniques to prevent an overfitting and includes parameters that control model complexity (Can et al. 2021). Moreover, it can efficiently utilize multicore CPUs during training, making it faster than the traditional gradient-boosting implementations. In this study, the tune parameters for the XGBoost method were estimated to be max.depth = 3, nrounds = 37.

Landslide susceptibility map

All the above methods (LR, RF, XGBoost) provide as a result the probability of a landslide occurring in a specific location. The final score can take values between 0 and 1, indicating areas where landslides are very unlikely and very likely, respectively. The natural breaks method was used to create five landslide susceptibility classes: very low (VLS), low (LS), moderate (MS), high (HS), and very high (VHS).

Methods of validation and accuracy assessment

In the final phase of this research work, validation and comparison of the performance of the three Machine Learning methods were performed using the testing dataset. In this context, the number of incidents that were classified correctly were compared to the number of incidents that were misclassified by the model. Statistical metrics, namely accuracy, specificity, precision, recall, F1 score and Cohen’s kappa index, were then calculated. Furthermore, the Receiver Operating Characteristic (ROC) curve and the corresponding area under the curve (AUC) were calculated. The ROC curve along with accuracy, precision, and recall provide information about the performance of the methodology (Davis and Goadrich 2006), while the AUC provides an estimation of how well the classifier can distinguish between the positive and negative classes (Fawcett 2006).

In this study, the outcomes of the three Machine Learning classifiers were juxtaposed using non-parametric and linear regression analyses. Specifically, the Wilcoxon signed-rank test (Noether 1992; Wilcoxon 1992) was used for examining the statistical differences in predictive capability between the three models.

Input datasets

The landslide database provided by GSD contains 1936 landslides in total (Fig. 3a), both active and inactive (dormant and relict). The landslides were represented by polygon features, and their size ranges from 60 m2 to 4 km2. A total of 2500 landslide points were generated from the 1936 polygons, taking into account the extent of each landslide, as well as its type. Accordingly, 2500 points with no landslides were selected, as described in the previous section. Then, 1750 points (70%) from each of the two groups were randomly selected to form the training dataset, while the remaining 750 points (30%) were assigned to the testing dataset (Fig. 3b). This size of training dataset is designated as LP1750 in the remainder.

In our study, the training dataset contains both active and inactive (dormant and relict) landslides. So, the Machine Learning models were trained to identify high susceptibility regions on the basis of areas that have suffered ground movements at any time in the past, even if they seized being active long ago. The choice of treating dormant and relict landslides as locations of very high landslide susceptibility stems from recent observations of reactivation of inactive landslides, mostly due to changes in land use (Tzampoglou et al. 2024).

The study area happens to be thoroughly surveyed for landslides (Hart & Hearn 2010), resulting in a sizeable landslide inventory. Hence, to examine the performance of the methods in a case of relative scarcity of landslide data, a limited number of analyses were performed with the training dataset reduced to only 20 landslide points. These were selected randomly by utilizing the createDataPartition function. This algorithm ensures that the splits are stratified based on the landslide samples and preserves the proportion of classes in each subset, thus maintaining the integrity of the distribution across the various causal factors (Kuhn 2008). The 20 landslide points were paired with a selection of an equal number of non-landslide points. This reduced size of training dataset is labelled as LP20. The testing dataset was the same as in the analysis with the full training dataset, i.e. 750 landslide points and 750 non-landslide points, in order to have a common basis for comparison between the two scenarios. It should be noted that, in the case of the small inventory scenario, the FR was revaluated on the basis of the reduced dataset before performing the training of the models.

A total of 17 causal factors (geomorphological, geological/geotechnical, hydrological and anthropogenic) were considered for inclusion in the development of the susceptibility maps, which are presented in the following paragraphs.

Geomorphological factors

All the geomorphological factors were derived from the Digital Elevation Model (DEM) with a 15 m resolution, created using topographic maps of the Department of Land and Surveys (Cadastre) of the Republic of Cyprus. Specific geo-processing tools within ArcGIS were used to estimate the following derivative factors: slope angle, aspect, plan curvature, and profile curvature.

Elevation is one of the most common factors to be considered for the estimation of landslide susceptibility and is generally through to have a positive correlation with the occurrence of landslides (Raghuvanshi et al. 2015). Elevation, in combination with rainfall, is closely related to the manifestation of landslides in mountainous areas (De Oliveira et al. 2016). Herein, the elevation was divided into four classes, namely < 350 m, 350–700 m, 700–1050 m and > 1050 m (Fig. 4a and Table 2).

Fig. 4
figure 4

Thematic maps of geomorphological factors: (a) Elevation, (b) Slope angle (c) Aspect, (d) Dip Direction Difference, (e) Plan curvature, (f) Profile curvature

Table 2 Spatial relation of causal factor and landslides by Frequency Ratio

Similar to elevation, the slope angle (Fig. 4b) and aspect (Fig. 4c) are the factors that are broadly used for the investigation of landslide failure mechanisms because they both directly or indirectly affect the stability of the ground (Gaidzik and Ramírez-Herrera 2021; Sun et al. 2021). For example, the shear stress increases with increasing slope angle (Aboye 2009). In addition, the aspect affects climatic parameters such as the degree of sun exposure and evapotranspiration (Conforti et al. 2014; Wang et al. 2019; Raghuvanshi et al. 2015). The slope angle was divided into six classes (< 5°, 5°−15°, 15°−25°, 25°−35°, 35°−45° and > 45°), while the aspect into nine classes (North, Northeast, East, Southeast, South, Southwest, West, Northwest, Flat) (Table 2).

To generate the thematic map of the dip direction (azimuthal) difference θ, first a map of the spatial distribution of dip direction of bedding planes (including sills and dikes) was created using all the geological compass measurements shown in the geological maps provided by GSD. The dip markings were digitized into ArcGIS in the context of this study. This dataset was enhanced with additional geological compass measurements obtained during fieldwork campaigns. The resulting geodatabase comprised 1520 measurements in total. The dip direction difference θ (Fig. 4d) was then calculated using the following formula:

$$\theta =\text{min}(\left|{AZ}_{1}-{AZ}_{2}\right|, {360}^{o}-\left|{AZ}_{1}-{AZ}_{2}\right|)$$
(5)

where AZ1 is the dip direction of the ground surface (aspect) and AZ2 is the discontinuity dip direction. As θ approaches zero, the likelihood of planar sliding increases. The value of θ cannot exceed 180°. Four classes (0°−30°, 30°−60°, 60°−90o and > 90°) were considered for θ (Table 2).

The curvature factors play an active role in the manifestation of landslides as they affect the surface runoff and the distribution of erosion (Moosavi and Niazi 2016). The plan curvature (Fig. 4e) is perpendicular to the direction of the maximum slope and controls the convergence or divergence of surface water across the surface (Carson and Kirkby 1972). The plan curvature falls into three categories: positive values reflect convex surfaces, negative values reflect concave surfaces, and zero values reflect linear surfaces (Buckley 2010). The profile curvature (Fig. 4f) is parallel to the steepest slope (Ayalew et al. 2004), affecting the changes in water flow velocity across the surface, as well as the deposition and erosion (ESRI 2015; Zevenbergen and Thorne 1987). The profile curvature falls into three categories: positive values reflect concave surfaces, negative values reflect convex surfaces, and zero values reflect linear surfaces (Chapi et al. 2017).

Geological/geotechnical factors

Lithology reflects the physical and mechanical characteristics of the ground. Based on the GSD geological maps, the lithology was divided into 18 classes, namely Quaternary Formations, Nicosia Formation, Kalavasos Formation, Pachna Formation, Lefkara Formation, Talus, Kannaviou Formation (bentonitic clays), Kannaviou volcanoclastics, Agia Varvara Formation, Pera Pedi Formation, Mélange, Radiolarian mudstones, Limestones, Silica sandstones, Lavas, Diabase group, Serpentinite and Gabbro (Fig. 2 and Table 2).

Proximity to tectonic faults foretells pre-sheared rock mass of poor quality. Hence, distance to fault (Fig. 5a) is one of the main preparatory factors of slope failures (Wu et al. 2014). In this study, it was categorized into five classes, namely < 250 m, 250–500 m, 500–750 m, 750-100 m and > 1000 m (Table 2).

Fig. 5
figure 5

Thematic maps of geological/geotechnical factors: (a) Distance to tectonic faults, (b) Clay Fraction, (c) Plasticity Index, (d) Geological Strength Index

The geotechnical properties frequently reported in geotechnical investigations for engineering projects, such as CF, PI and GSI, are strongly related to the shear strength of the ground (Anis et al. 2019; Sartohadi et al. 2018; Mirabedini et al. 2022). High CF and PI values generally lead to slope stability problems (Yalcin 2007; Ahmad et al. 2019; De Ploey and Cruz 1979). To establish the spatial distribution of the geotechnical properties, 667 geotechnical boreholes contained in the files of GSD were processed. This dataset was enhanced with additional surface samples obtained during fieldwork campaigns. Each geological formation was examined separately. For geological formations that had plenty of data well distributed throughout their extent, interpolation was performed using the Inverse Distance Weighted (IDW) spatial analysis tool. On the other hand, in those formations where there was either scarcity of data or the data points where cluttered in regions of limited extent, the average was adopted as an indicative value. The assembly of the spatial distributions of geotechnical properties per geological formation into final thematic maps with resolution 15 m × 15 m for the entire study area was done in ArcGIS (raster calculator, mosaic to new raster, etc.). The CF (Fig. 5b) was divided into five classes, namely 0–15%, 15–30%, 30–45%, > 45%, and N/A (i.e. not applicable), while PI (Fig. 5c) into 0–20%, 20–40%, 40–60%, > 60%, and N/A (Table 2).

The GSI (Fig. 5d) is a geotechnical characterization system that quantifies the quality of rock masses by considering a series of factors, i.e. rock mass structure, number of discontinuity sets, schistosity, roughness and quality of discontinuities (Marinos et al. 2005). The GSI directly relates to rock mass strength and thus plays an important role in landslide occurrence (Mirabedini et al. 2022; Singh et al. 2014). To create the GSI thematic layer, the geological formations were aggregated into four larger groups: 1) Quaternary deposits, Talus; 2) Nicosia F., Kalavasos F., Pachna F., Lefkara F.; 3) Kannaviou F. (bentonitic clays), Kannaviou volcanoclastics, Agia Varvara F. (metamorphics), Pera Pedi F., Mélange, Radiolarian Mudstones, Limestones, Silica Sandstones; 4) Lavas, Diabase, Serpentinites, Gabbro. GSI data was collected from the field by inspecting rock masses exposed in road cuts and creeks (a total of 48 observations well-spread across the study area). Each of these four group was treated separately through interpolation processes using IDW spatial analysis tool. Finally, the GSI was divided into four classes (0–40, 40–60, 60–90, N/A) (Table 2).

Hydrological factors

Rainfall is an important triggering factor (Segoni et al. 2018). The average annual rainfall, in particular, can be considered also a preparatory factor because it affects the long-term groundwater table and, thus, the sustained pore water pressure and mobilized shear strength (Wistuba et al. 2021). The thematic map of average annual rainfall was established based on data from 24 rainfall stations of the Department of Meteorology of Cyprus for the period 1991—2022 (Fig. 6a) using Inverse Distance Weighted (IDW) interpolation in the ArcGIS program. The highest mean annual rainfall were observed in the northeastern part of the study area, whereas the coastal areas had the lowest values. It should be mentioned that the average annual rainfall was introduced as continuous raw data (Fig. 6a) instead of being divided into classes. The reason for this is that preliminary analyses showed that rainfall has one of the strongest influences on landslide susceptibility. As a result, division into classes resulted into artifacts in the susceptibility maps stemming from the borders between regions of different rainfall classes, such as unrealistically abrupt (discontinuous) changes in landslide susceptibility.

Fig. 6
figure 6

Thematic maps of hydrological/hydrogeological factors: (a) Average annual rainfall (b) Distance to hydrographic network, (c) Stream Power Index, (d) Topographic Wetness Index

The distance to the hydrographic network (Fig. 6b) can affect the stability of slopes because in areas near streams the phreatic water table is expected to be at a relatively shallow depth, leading to reduced shear strength values. The study area was divided into five multi-ring buffer zones from both sides of the thawlegs (< 100 m, 100–200 m, 200–300 m, 300–400 m and > 400 m) (Table 2).

The Stream Power Index (SPI) (Fig. 6c) can be used to estimate the eroding power of surface water (Althuwaynee et al. 2014; Sharma 2010). The probability of landslide increases as SPI increases (Akgun and Türk 2010; Althuwaynee et al. 2014). This index was calculated using the DEM through the following equation:

$$SPI=A\;\text{tan}\;\beta$$
(6)

where A signifies the upstream area and β the slope angle. The SPI was split into 5 classes (−13 to −9, −9 to −2, −2 to 0, 0 to 2, 2 to 8) (Table 2).

The Topographic Wetness Index TWI (Fig. 6d) calculates the ratio of the accumulated surface water flow to the slope gradient (Beven and Kirkby 1979; Gruber and Peckham 2009; Sørensen et al. 2005). Higher values lead to elevated susceptibility to landslides because the ground can hold more moisture (Sun et al. 2020). This index was calculated using the DEM, considering the following equation:

$$TWI=\text{ln}\left(\frac{A}{\text{tan}\beta }\right)$$
(7)

The TWI was divided into 6 classes (2–4, 4–5, 5–6, 6–7, 7–8, > 8) (Table 2).

Anthropogenic factors

The construction of road networks influences the ground stability through the extensive earthworks (road cuts and embankments) and the removal of vegetation, as well as the imposition of additional external loads (Bourenane et al. 2015). The distance to the road network is inversely related to the likelihood of landslide occurrence (Pawluszek and Borkowski 2017). In this study, the distance to the main road network (Fig. 7a) was divided into five classes (< 100 m, 100–200 m, 200–300 m, 300–400 m and > 400 m) (Table 2).

Fig. 7
figure 7

Thematic maps of human activity (anthropogenic) factors: (a) Distance to main road network, (b)Land use cover

Changes in land use over the years can lead to slope stability issues (Meneses et al. 2019; Roccati et al. 2021; Shu et al. 2019). The land use map was produced based on the CORINE Land Cover inventory (Corine, 2018). Considering the nature of their impact on landslides, land use types (Fig. 7b) were grouped into five classes as follows: 1) Continuous urban fabric, Road and rail networks and associated land, Mineral extraction sites (CLC codes 111, 122, 131); 2) Discontinuous urban fabric, Industrial or commercial units, Permanently irrigated land, Sclerophyllous vegetation, Transitional woodland-shrub, Sparsely vegetated areas (CLC codes 112, 121, 212,323,324,333); 3) Vineyards, Fruit trees and berry plantations, Olive groves, Broad-leaved forest, Coniferous forest, Beaches (CLC codes 221, 222, 223, 311, 312, 331); 4) Airports, Construction sites, Sport and leisure facilities, Non-irrigated arable land, Pastures, Annual crops associated with permanent crops, Complex cultivation patterns, Land principally occupied by agriculture, Natural grasslands, Bare rocks (CLC codes 124, 133, 142, 211, 231, 241, 242, 243, 321, 332); 5) Wetlands, Water bodies (CLC codes 512, 411) (Table 2).

Numerical Results

Frequency ratio analysis

The frequency ratio (FR) values were estimated for each class of causal factors. Higher FR values indicate a stronger correlation between the class of causal factors and landslide occurrence. This means that, as the FR value increases, the likelihood of landslides associated with that specific class also increases (Das et al. 2023). The resulting FR values on the basis of the training dataset containing 1750 landslide points are shown in Table 2. Generally, the largest FR values among all classes were observed for Kannaviou formation (clays FR = 4.75, volcanoclastics FR = 12.04) and talus (FR = 4.84), for regions with a slope angle larger than 45° (FR = 7.42) and for PI larger than 60 (FR = 4.58).

Focusing on geomorphological causal factors, the highest frequency of landslides was observed in areas with elevation between 350 and 700 m (FR = 1.33). FR values are lower in areas above 700 m elevation, probably because of sparse urban development, limited road networks, and dense vegetation. Regarding ground inclination, the areas with slope angles in the ranges 5–15° and 15–25° present also relatively high values of FR (1.21 and 1.55, respectively). The FR values appeared to decrease in areas with more than 25° slope angles, likely due to reduced rainwater infiltration. Exceptions are areas with over 45° slope angle, where FR becomes very large. The results regarding the aspect indicate that most of the landslides occurred with north orientation (North: FR = 1.33; Northwest: FR = 1.30; Northeast: FR = 1.26), highlighting the impact of climatic factors (the NNW–NNE-oriented slopes are mostly sunless, facilitating larger rainwater infiltration). The results for the plan curvature (< −0.2: FR = 1.47) and profile curvature (> 0.15: FR = 1.27) indicated that most of the landslides were recorded on concave slopes. This could be explained by the fact that concave slopes are susceptible to landslides because they can hold moisture (Sujatha et al. 2012). The dip-direction difference θ does not seem to have a significant effect. This is probably because most of the weak formations in the study area (Kannaviou formation, radiolarian mudstones) are intensely folded or they have mostly a chaotic structure (mélanges), limiting the prevalence of planar failure mechanisms.

Regarding the geological factors, most landslides are recorded in the Kannaviou Formation volcanoclastics (FR = 12.04), talus (FR = 4.84), Kannaviou Formation (bentonitic clays) (FR = 4.75), radiolarian mudstones (FR = 3.86) and mélange (FR = 3.20). Furthermore, the likelihood of presence of landslide is high near faults (distance < 250 m: FR = 1.21) and decreases substantially with distance from them.

Considering hydrological factors, the distance from the hydrographic network is strongly correlated with the occurrence of landslides. As expected, the FR values exhibit an inverse relationship with the distance from rivers and streams, with FR ranging from 1.33 at distances less than 100 m from the hydrographic network to 0.28 at distances exceeding 400 m. The SPI index appears to be an increasing function of landslide occurrence, ranging from FR = 0.69 for SPI values between −13 and −9 to FR = 1.37 for SPI values between 2 and 8. An exception observed in the SPI range 0 to 2 (FR = 1.41) may be attributed to the influence of other factors such as geological conditions, land use, or human activities that mitigate the effects of high SPI. The ΤWI classes 4–5, 5–6, and 6–7 present the greatest FR values (1.07, 1.19 and 1.04, respectively), while the class in which the TWI index is over 8 (FR = 0.74) presents similar values to the class in which the TWI ranges from 2 to 4 (FR = 0.77). This observation could be attributed to the fact that extremely high TWI values often correspond to very flat areas, where water accumulation occurs without ground inclination sufficient to trigger landslides, or areas with efficient drainage systems that reduce the impact of water on slope stability. Conversely, lower TWI values (2–4) may still represent areas with moderate slopes and some degree of water accumulation, maintaining a consistent landslide risk.

Considering the anthropogenic factors, it appears that the distance from the main road network does not correlate inversely with the occurrence of landslides (as it would have been expected), as the FR values ​​increase with distance from it. This finding is mainly due to the fact that almost all the landslides in the study area are failures of natural slopes. Furthermore, the majority of those are inactive landslides which had developed before the existence of road networks. Moreover, the road network is mainly developed in the seaside plains of the study area; therefore, no valid conclusions can be drawn.

Finally, the clay fraction (CF) correlates positively with landslide occurrence, as it signifies low shear strength. The largest FR values were observed in areas where the CF exceeded 45% (FR = 2.93). The PI follows the same trend, with the highest FR values observed in areas in where the PI is over 60% (FR = 4.58). Finally, the FR turns out to be in general a decreasing function of GSI. This is expected because high GSI values indicate large rock mass strength.

Multicollinerarity analysis and factor importance

The results of the multicollinearity analysis indicate that the causal factors can be considered sufficiently independent from each other, as all TOL values are well above 0.1 (Table 3). The lowest TOL values are observed for PI (0.459), lithology (0.481), CF (0.569) and GSI (0.607), attesting to a certain degree of correlation of the geotechnical factors to lithology, as well as to each other (e.g. geomaterials with high CF usually exhibit high PI values).

Table 3 Results of multicollinearity analysis

According to Mean Decrease Accuracy (MDA) results based on the Random Forest model (Fig. 8) trained with dataset LP1750 / NLS-1, the most important variable in our study area is the lithology (a geological factor), followed by rainfall (an environmental factor), land use (an anthropogenic factor) and slope angle (a geomorphological factor). It is interesting to note that the geotechnical factors PI and GSI rank 6th and 7th in importance among the 17 causal factors, despite their correlation with lithology. This finding suggests that the spatial variation of these geotechnical properties within a given geological formation may play a measurable role in landslide occurrence. Nevertheless, the other geotechnical factor, CF, appears to be of small importance. Finally, the factor with the lowest importance according to MDA analysis is the dip direction difference.

Fig. 8
figure 8

Importance ranking based on mean decrease accuracy

Landslide susceptibility maps

Figure 9 presents the landslide susceptibility maps generated by the three Machine Learning algorithms (LR, RF, XGBoost) using the training dataset LP1750 / NLS-1, i.e. 1750 landslide points with equal number of non-landslide points sampled using the NLS-1 approach described in Sect. "Landslide inventory map". All maps display a gradation of five levels of landslide susceptibility, namely very high (VHS), high (HS), medium (MS), low (LS) and very low (VLS), which were produced using the natural breaks classification method. The percentage of the study area covered by the HS and VHS classes is 41.2%, 36.3% and 34.7% according to the LR, RF and XGBoost methods, respectively. These areas are found mainly in the central and southeastern part, and in a separate cluster close to coast in the northwestern part. This is due to the presence of outcrops of weak argillaceous geomaterials in these regions, namely Kannaviou formation, mélange and radiolarian mudstones, a fact attesting to the high importance of the lithology. Areas of very low and low susceptibility are found close to the coastlines due to the presence of the more sandy Quaternary formations and low slope angles, and in the northern parts which are dominated by the outcrops of high strength igneous rocks of the Troodos ophiolite.

Fig. 9
figure 9

Landslide susceptibility maps generated using (a) Logistic Regression, (b) Random Forest and (c) XGBoost methods of analysis. Landslide susceptibility quantified through gradation of 5 levels (VHS, HS, MS, LS, VLS). Models trained using dataset LP1750 / NLS-1

The rather unusually large area classified as "very high susceptibility" (VHS) in the maps produced by Logistic Regression (29.7%), Random Forest (27.5%), and XGBoost (22.7%) models (Fig. 9) can be attributed to the fact that 9% of the study area is covered by the recorded (mapped) landslides (Fig. 3a), coupled with the predictive power of the models. An additional reason is that most of the recorded landslides happen to be in argillaceous formations characterized by very low friction angle, such as the bentonitic clays of the Kannaviou formation, mélanges and the marls of the Nicosia formation (Northmore et al. 1986; Tzampoglou et al. 2024). As a consequence, many landslides occur in areas with low slope angles (e.g. 7°−12°) in addition to regions with rugged terrain.

Performance of machine learning models

The ROC-curves for the three models (trained with dataset LP1750 / NLS-1) are shown in Fig. 10, while the associated metrics are presented in Table 4. It can be seen that all the models exhibit large AUC (Area-Under-the-Curve) values (≥ 0.98), attesting to a very high performance. In order to investigate the significance of differences between the models, Wilcoxon signed-rank tests at the 5% significance level were used. According to this test, there are no significant differences between models if the p-value is more than 0.05 and the z-value falls in the −1.96 to 1.96 range. The Wilcoxon signed-rank test results are shown in Table 5. The LR-RF comparison yields a z-value of 2.16 and a p-value of 0.0303, indicating a statistically significant difference and suggesting that Random Forest (RF) outperforms Logistic Regression (LR). For the XGBoost-LR comparison, the z-value is 1.86 with a p-value of 0.0625, which is larger than the threshold value of 0.05, but close enough to suggest a potential trend towards significance. Therefore, while RF outperforms LR, the potential superiority of XGBoost over LR is suggested but not confirmed. Lastly, the XGBoost-RF comparison yielded a z-value of 1.46 and a p-value of 0.1426, indicating no significant difference in performance between these two models.

Fig. 10
figure 10

ROC curves of the landslide susceptibility maps generated using dataset LP1750 / NLS-1

Table 4 ROC-curve parameters from analyses with training dataset LP1750 / NLS-1
Table 5 Results of Wilcoxon signed-rank model (training dataset LP1750 / NLS-1)

To further investigate the prediction performance of the three models, the polygons of all 1936 recorded landslides were divided into grid cells of size 15 m x 15 m (same size as for the landslide susceptibility map), resulting in 578967 landslide cells in total. Subsequently, the percentage of landslide cells falling in each class of the susceptibility maps of Fig. 9 was evaluated. As seen in Fig. 11a, RF achieved the best results, as 87% of the recorded landslides falls within the very high landslide susceptibility level, while the respective percentages are 78% for XGBoost and 83% for LR. Examining the landslide frequency alone may lead to misinterpretation. For example, if a map classifies 90% of a study area as having high or very high landslide susceptibility, it may show extremely high predictive performance in terms of capturing the recorded landslides. However, this result would be unrealistic as it would likely overestimate the risk in many areas. To address this issue, the predictive capability was also examined in terms of relative landslide density, i.e. in relation to the total spatial coverage of each susceptibility level in the study area (Fig. 11b). The largest values of relative landslide density are observed in the very high susceptibility class, with RF achieving 71%. Lower values were observed using the LR (68%) and XGBoost (66%) methods.

Fig. 11
figure 11

Percentage (a) and density (b) of recorded landslides in each susceptibility class and (c) Spatial extent of ranges of landslide susceptibility value. Models trained using dataset LP1750 / NLS-1

To further examine the degree of correspondence of recorded landslides to high susceptibility values independently of the classification method (natural breaks) used for generating the five susceptibility classes, the susceptibility values were divided in 10 equal intervals (i.e. 0–0.1, 0.1–0.2, ….,0.9–1.0) and their spatial extent on the study area is presented in Fig. 11c. Most of the study area (30%−36%) falls within the lowest susceptibility value range (0–0.1). A substantial percentage falls in the 0.9–1.0 range (21%−26%). This can be explained by a) the fact that a large portion (9%) of the study area is covered by recorded landslides (inventory) and b) the large extent of outcrops of weak argillaceous geomaterials. In our case, a model based on the modern high-performing machine learning methods should predict at least 9% of the study area to have landslide susceptibility values in the 0.9–1.0 range.

Tables 6 and 7 present various statistical metrics of testing performance for the case in which the models are trained using the large dataset of 1750 landslide points (LP1750 / NLS-1), and compare them to those achieved when the models are trained using a reduced dataset of only 20 landslide points (LP20 / NLS-1). The best performing model with respect to each metric is noted with bold font. For the calculation of the statistical metrics, two different values, 0.5 (Table 6) and 0.8 (Table 7), are considered for the positive classification threshold (also called probability threshold). For the models trained using the LP1750 dataset, the accuracy, specificity, precision and recall attain very high values (larger than 90%) when considering the 0.5 threshold value. Moreover, the F1 score is larger than 0.9 and the Cohen’s kappa (κ) is larger than 0.8, indicating excellent performance. The RF has the best performance in all metrics, while LR exhibits the smallest values among the three methods. However, when the threshold value is raised to 0.8, the LR and RF yield practically the same metric values, with LR having a slight lead (Table 7). This shows that the LR has a tendency to identify the recorded landslides more confidently than the other methods by assigning larger value of landslide susceptibility. It is worth pointing out that, despite the elevated threshold, the predictive capability of the models is still very good, with accuracy close to 0.9 or higher. In particular, the performance of LR and RF can be deemed excellent, since F1 score remains above 0.9 and Cohen’s kappa above 0.8.

Table 6 Testing performance for probability threshold 0.5
Table 7 Testing performance for probability threshold 0.8

The performance naturally degrades when the models are trained using a thinned-down dataset of only 20 landslide points (LP20 / NLS-1). Yet, all models manage to achieve adequate accuracy (> 0.7) when considering the positive classification threshold value to be equal to 0.5 (Table 6). Although RF remains the best overall performing method also in the case of the thinned-down training dataset, XGBoost exhibits better specificity and precision. Both methods achieve F1 score larger than 0.8 and Cohen’s κ larger than 0.6, indicating good to very good predictive capacity and inter-rater reliability.

Effect of non-landslide points sampling method

Figure 12 presents the landslide susceptibility maps produced using the buffer zone approach (NLS-2) for sampling the non-landslide points, with the buffer size being 250 m. It can be seed that the maps have the same qualitative characteristics as those of Fig. 9 (NLS-1 approach). Nonetheless, the areas characterized as VHS are smaller in Fig. 12. This is especially true for LR and RF, in which case the VHS areas of Fig. 12 are almost half those observed in Fig. 9.

Fig. 12
figure 12

Landslide susceptibility maps generated using (a) Logistic Regression, (b) Random Forest and (c) XGBoost methods of analysis. Models trained using dataset LP1750 / NLS-2 with buffer zone size 250 m

Tables 8 and 9 compare the performance metrics from the analyses using the NLS-2 sampling method with those of method NLS-1 which involves expert knowledge and preliminary landslide susceptibility calculations. The results show that the use of the buffer zone approach results in smaller but still high AUC values (> 0.9). In particular, for the 750 m size, the AUC values for Random Forest and XGBoost are 0.978 and 0.979, very close to those from the analyses involving prior knowledge in the selection of the non-landslide points (Table 8). Moreover, the NLS-2 approach yields smaller performance metric values (accuracy, specificity, F1, κ) than NLS-1 in all scenarios, with specificity in the 750 m buffer scenario being the only exception. RF and XGBoost consistently exhibit superior predictive accuracy than LR, with AUC values surpassing 0.95. Unlike the case of NLS-1, XGBoost is outperforming RF in terms of AUC in all three buffer zone scenarios. However, RF achieved better statistical metrics (Table 9) than XGBoost when the buffer zone size is 750 m. The AUC and the other metrics improve with increasing buffer size for all three machine learning models, a fact that is in agreement with previous studies that employed the buffer zone method (Dou et al. 2023; Guo et al. 2024; Liu et al. 2024; Xi et al. 2022). This observation could be partially attributed to reduction of spatial overlap and misclassification risks as the buffer size increases.

Table 8 Comparison of AUC values for landslide susceptibility models with different non-landslide sampling method
Table 9 Comparison of performance metrics for landslide susceptibility models with different non-landslide sampling method (probability threshold 0.5)

Figure 13 shows the frequency distribution of landslide susceptibility values from the buffer-zone-based analyses. It can be seen that the percentage in the 0.9–1.0 range is smaller than that observed in Fig. 11c. The spatial extent of the 0.9–1.0 range from the Logistic Regression method is only 3.3%, far lower than the percentage of the study area covered by the recorded landslides (9%). RF and XGBoost distributions exhibit a local maximum in the 0.9–1.0 range, as in the case of the NLS-1 approach. More specifically, the percentage value from analysis with RF is 10%, slightly larger than the spatial extent of recorded landslides (9% of the study area), while for the XGBoost this percentage rises to 18.2%, effectively approaching the 20%−25% yielded by using the NLS-1 approach. The present results show that, in our study area, a highly performing model will inevitably lead to a “spike” in the highest landslide susceptibility range, irrespectively of the non-landslide sampling method employed.

Fig. 13
figure 13

Spatial extent of ranges of landslide susceptibility value for analysis with dataset LP1750 / NLS-2 with buffer zone size 250 m

Effect of incorporation of geotechnical thematic layers

In order to investigate the contribution of the geotechnical thematic layers (Plasticity Index, Clay Fraction and Geological Strength Index) to the predictive capability of the models, landslide susceptibility calculations were repeated without including these thematic layers in the analysis. Their effect on model performance is quantified through the ratio RGT (Fig. 14), which is the ratio of the statistical metric values (accuracy, specificity, precision, recall, F1 and κ) obtained from the analyses that include the geotechnical thematic layers to the respective values from analyses performed without the geotechnical thematic layers. RGT > 1.0 signifies performance enhancement, while RGT < 1.0 performance degradation.

Fig. 14
figure 14

Change in statistical metrics due to the incorporation of the geotechnical thematic layers in analysis with (a) 1750 and (b) 20 landslide points in the training dataset (probability threshold 0.8)

Figure 14a presents the results for the case of the models being trained using the large dataset of 1750 landslide points and non-landslide sampling approach NLS-1 (LP1750 / NLS-1). The RGT for RF in all statistical metrics turns out to be slightly larger than 1.0, indicating a meager performance enhancement. On the other hand, LR and XGBoost exhibit a marked improvement, particularly in terms of precision (smaller false positive rate) and κ (better inter-rater reliability). The results indicate that the inclusion of the geotechnical thematic layers help more the methods that are comparatively underperforming. The AUC values from the two sets of analyses are shown in Table 10. It can be seen that incorporating the geotechnical variables leads to a slight improvement in AUC values for all models. The gain is larger in LR and XGB, than in RF. These enhancements, although modest, suggests that the inclusion of the geotechnical variables can provide additional predictive value, contributing to a more accurate landslide susceptibility assessment.

Table 10 Comparison of AUC values for landslide susceptibility models with and without inclusion of geotechnical variables

When the models are trained based on the thinned-down dataset (LP20 / NLS-1), the trends appear more irregular, with improvement shown in certain metrics but degradation in others (Fig. 14b). Substantial enhancement is observed in terms of precision, F1 and κ of the RF model, showing that overall balance between precision and recall and the agreement between predictions and true labels still improve. In the case of the LR model performance enhancement is particularly seen in terms of accuracy and κ. The LR method seems to provide the most stable model, benefiting from the geotechnical layers irrespectively of the size of the training dataset. This may be attributed to the fact that LR, being a simpler method, can respond better under limited data, showing more consistent improvements across metrics. On the other hand, XGboost appears not to improve at all with the use of geotechnical thematic layers (Fig. 14b).

Discussion

The proposed methodology is expected to contribute to better-informed decision-making processes in land-use planning, infrastructure development, and disaster management, ultimately reducing the risk and impact of landslides on human life and property. The resulting maps can be utilized as an approximate guide for areas where future urban development and infrastructure projects are being considered. For instance, in areas classified as highly to very highly susceptible to landslides, thorough geological/geotechnical studies with specific requirements pertinent to landslide hazard should be submitted to and approved by state authorities prior to construction. Conversely, in areas of low to very low susceptibility, geotechnical investigations and analysis could be limited to those provisioned by the local geotechnical design standard, namely Eurocode 7, on condition that other types of geohazards are not prevalent.

The results illustrate that in cases of availability of a comprehensive landslide inventory, Machine Learning techniques are capable of achieving very high success rates in identifying zones prone to landslides and distinguishing them from those that are relatively safe. The high AUC values achieved (0.9 to 0.985) are primarily attributable to 1) the extensive number of recorded landslides (comprehensive inventory) and 2) the high performance of the machine learning methodologies. It should be mentioned that a number of recent publications employing Random Forest, XGBoost or Logistic Regression report AUC values in the 0.90 to 0.99 range (e.g. Agboola et al. 2024; Chang et al. 2023; Dou et al. 2023; Liu et al. 2024; Xi et al. 2022). The incorporation of the geotechnical thematic layers leads to a relatively small performance improvement. This is due to the inherent correlation of these thematic layers with the lithology layer, which turned out to be the most dominant factor in the given study area (Fig. 8). The results suggest that, for this study area, the variation of the ground properties across a geological formation is a less critical information than the geological formation itself.

Despite the excellent performance of the Machine Learning methods used herein, there are certain limitations that must be acknowledged. An important issue that may be introducing a certain degree of bias in the present results is the procedure followed in selecting the non-landslide points, which relied on expert knowledge and exploitation of low susceptibility areas from previous studies and preliminary frequency-ratio-based analysis. The proper selection strategy for non-landslide points remains a topic of discussion within the scientific community.

Another limitation of this study is that models may be affected by overfitting owing to the specific characteristics of the study area, limiting their generalizability to other regions. The accuracy of landslide susceptibility maps is also influenced by the resolution of the input data and the classification methods used, which might lead to the over- or underestimation of landslide risks in certain areas. Expanding the study to different geographical areas with varying conditions would help to further validate and refine the models.

Conclusions

The main objective of this study was to examine the landslide susceptibility prediction capability of three different machine learning methods: Logistic Regression (LR), Random Forest (RF), and XGboost. In addition, the geotechnical parameters Plasticity Index (PI), Clay Fraction (CF) and Geological Strength Index (GSI) were incorporated as thematic layers and the potential gain in model performance was evaluated. The study area, which comprised the southwestern part of Cyprus, contains 1936 recorded landslides of different types, rendering it an ideal test bed. After processing the feature importance of 17 causal factors, lithology stands out as the factor that affects the most the generation of landslides, followed by rainfall, land use and slope angle.

All models achieved high values of accuracy, specificity, F1 score and Cohen’s kappa index, while AUC values exceeded 0.9. Among the three methods, RF exhibits the best performance, followed by XGBoost and LR, provided that the statistical metrics are calculated assuming a true positive threshold equal to 0.5. For a threshold equal to 0.8, the LR achieves slightly better metrics compared to RF, on condition that the models are trained based on an extensive landslide inventory. Unlike LR, the RF and XGBoost methods exhibit adequate performance even when the models are trained using a thinned-down dataset of only 20 landslide points. The incorporation of the geotechnical parameters PI, CF and GSI as thematic layers in the landslide susceptibility analysis led to a small but measurable performance improvement. The performance enhancement is more marked in cases in which the models are the least effective.