Objective The 5·12 Wenchuan earthquake triggered extensive secondary geological disasters and cascading effects. Wenchuan County, which was severely affected by the earthquake, has widespread unstable slopes and areas prone to landslides and collapses. In mountainous regions, extreme rainfall events trigger extensive landslides and collapses. The abundant loose material they produce is a substantial sediment source that, through the coupling of water and sediment movement, amplifies flash flood disasters and particularly heightens the risk of debris flows and debris floods. Assessment models for landslide and collapse susceptibility are therefore needed to support early prevention of compound flash flood disasters in Wenchuan County. Conventional susceptibility assessment approaches either rely on expert experience and subjective judgment or have difficulty fitting high-dimensional, complex data, so accurately delineating the actual spatial distribution of areas susceptible to landslides and collapses remains a major challenge. Recent advances in data science and machine learning offer promising solutions. Two state-of-the-art ensemble learning algorithms, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), are introduced to build reliable models for assessing landslide and collapse susceptibility in Wenchuan County.
Methods Factors related to topography, geology, meteorology, and hydrology were evaluated, and ten evaluation factors were selected: elevation, slope, aspect, terrain relief, distance to rivers, distance to faults, normalized difference vegetation index (NDVI), land cover type, average annual precipitation, and lithology.
Data preprocessing was carried out to ensure effective and stable model training. The data were standardized to reduce the influence of differing scales among the evaluation factors on the model. Factors showing significant multicollinearity were identified with the Variance Inflation Factor (VIF) and excluded, reducing redundancy among the features used in the analysis. In addition, the Information Gain Ratio (InGR) was used to evaluate the importance of each factor and to support the preliminary selection of explanatory variables. Two advanced ensemble learning algorithms (XGBoost and LightGBM) were then applied alongside two traditional algorithms (logistic regression and random forest) to construct landslide and collapse susceptibility assessment models for Wenchuan County. Accuracy, precision, recall, F1 score, and receiver operating characteristic (ROC) curves were used to compare the performance of the models. The models were then used to predict the probability of landslide and collapse occurrence across the study area. The natural breaks method was used to divide the predicted probabilities into susceptibility zones, producing a map of areas susceptible to landslides and collapses. The resulting susceptibility maps were further analyzed qualitatively and quantitatively, with particular attention to the correspondence between predicted zones and actual landslide and collapse events, to evaluate the predictive reliability of the models.
Results and Discussions Both ensemble learning models showed better classification performance than the traditional models. XGBoost and LightGBM each achieved an accuracy of 0.903, surpassing random forest (0.900) and logistic regression (0.864).
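The information gain ratio used in the Methods to rank factors can be illustrated with a minimal Python sketch. It assumes each factor has already been discretized into classes and that labels are 1 for landslide or collapse samples and 0 otherwise. The function names and toy values below are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain_ratio(factor_classes, labels):
    """Information gain of the labels given a discretized factor,
    divided by the split information (entropy of the factor itself)."""
    total = len(labels)
    groups = {}
    for cls, lab in zip(factor_classes, labels):
        groups.setdefault(cls, []).append(lab)
    # Expected label entropy after splitting the samples by factor class.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(factor_classes)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy samples: 1 = landslide/collapse point, 0 = non-hazard point.
labels = [0, 0, 1, 1]
slope_class = ["gentle", "gentle", "steep", "steep"]  # separates the labels
aspect_class = ["N", "S", "N", "S"]                   # unrelated to the labels

slope_ratio = information_gain_ratio(slope_class, labels)
aspect_ratio = information_gain_ratio(aspect_class, labels)
print(slope_ratio, aspect_ratio)
```

In this toy case the slope classes separate the two labels perfectly, giving a ratio of 1.0, while the aspect classes carry no information about the labels, giving 0.0. A factor with a ratio near zero would be a candidate for removal during preliminary selection.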
In terms of precision, LightGBM (0.887) slightly outperformed XGBoost (0.882), and both outperformed random forest (0.872) and logistic regression (0.802). XGBoost had the highest F1 score (0.899), closely followed by LightGBM (0.898) and random forest (0.897), while logistic regression had the lowest (0.866). In terms of the area under the ROC curve (AUC), XGBoost and LightGBM achieved nearly identical values (0.904), higher than random forest (0.902), with logistic regression lowest (0.869). The susceptibility zoning maps, together with a quantitative analysis of the area proportion of each zone, showed that the zoning produced by XGBoost and LightGBM differed from that produced by logistic regression and random forest. These differences were attributed mainly to the different data processing strategies of the algorithms. To verify the reliability of the predictions, the density of landslide and collapse points within each susceptibility zone was analyzed quantitatively. For XGBoost, LightGBM, and random forest, point density increased with susceptibility level, which is the pattern expected of a reliable susceptibility map. LightGBM performed best in identifying high and very high susceptibility areas, with landslide and collapse point density ratios of 1.844 and 3.079, respectively, the highest among all models evaluated. In contrast, logistic regression did not follow this increasing trend: its density ratio in the very low susceptibility zone (0.588) exceeded that in the high susceptibility zone (0.528).
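The reliability check described above can be sketched in Python. The abstract does not define the density ratio, so this sketch assumes a frequency-ratio style definition: the share of landslide and collapse points falling in a zone divided by the share of the study area occupied by that zone. The function names, zone areas, and point counts are hypothetical and serve only as an illustration.

```python
def density_ratios(zone_area, zone_points):
    """Per zone: (share of hazard points in the zone) / (share of area in the zone).

    Assumed definition; a value above 1 means points are over-represented
    in the zone relative to its size.
    """
    total_area = sum(zone_area.values())
    total_points = sum(zone_points.values())
    return {
        zone: (zone_points.get(zone, 0) / total_points) / (area / total_area)
        for zone, area in zone_area.items()
    }


def increases_with_susceptibility(ratios, ordered_zones):
    """True if the ratio never decreases from the lowest to the highest zone."""
    values = [ratios[zone] for zone in ordered_zones]
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical zone areas (km^2) and point counts, not values from the study.
zones = ["very low", "low", "moderate", "high", "very high"]
area = {"very low": 40, "low": 30, "moderate": 15, "high": 10, "very high": 5}
points = {"very low": 2, "low": 6, "moderate": 12, "high": 30, "very high": 50}

ratios = density_ratios(area, points)
trend_ok = increases_with_susceptibility(ratios, zones)
print(ratios, trend_ok)
```

Under this assumed definition, a model whose ratio in the very low zone exceeds its ratio in the high zone, as reported for logistic regression, would fail the monotonicity check.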
This anomaly indicated prediction bias in the logistic regression model, potentially attributable to the limitations of the logistic regression algorithm and a lack of representative data.
Conclusions The advanced ensemble learning models predicted landslide and collapse susceptibility in Wenchuan County better than the two traditional models, outperforming them in accuracy, precision, F1 score, and area under the ROC curve (AUC). LightGBM achieved higher precision, while XGBoost achieved a higher F1 score. In terms of reliability, both ensemble learning models, particularly LightGBM, were better at identifying high and very high susceptibility areas, which further supports their use in landslide and collapse susceptibility assessment. The findings provide a more accurate tool for evaluating landslide and collapse susceptibility in Wenchuan County and similar earthquake-affected areas, supporting the development of disaster prevention and mitigation measures. Future research could use more comprehensive data collection methods and explore broader applications of ensemble learning models to improve the reliability and practical use of predictions in disaster management.
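The comparison metrics named in the abstract (accuracy, precision, recall, F1 score, and AUC) can be computed as in the following minimal Python sketch. It assumes binary labels, predicted probabilities, and a 0.5 decision threshold, which the abstract does not state. The function names and sample values are hypothetical.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, and F1 for binary labels and predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as one half)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test samples: 1 = landslide/collapse, 0 = non-hazard.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

metrics = classification_metrics(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(metrics, auc)
```

The pairwise AUC shown here is quadratic in the number of samples and is meant only to make the definition concrete. For a full test set, a rank-based or library implementation would be the usual choice.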