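Landslide and Collapse Susceptibility Analysis in Wenchuan Earthquake-damaged Area Based on Ensemble Learning Methods

Jiawei DING (丁嘉伟); Xiekang WANG (王协康)

Advanced Engineering Sciences (工程科学与技术), 2025, Vol. 57, Issue (04): 52-61. DOI: 10.12454/j.jsuese.202400244

Column: Mechanisms and Prevention of Landslide-Dammed Lake Disasters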


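Abstract (translated from the Chinese)

Loose material produced by collapses and landslides is often an important sediment source for flash flood disasters. The area struck by the 5·12 Wenchuan earthquake contains many unstable slopes and zones at risk of landslides and collapses, so a susceptibility assessment model for collapses and landslides is important for the early prevention of compound flash flood disasters in the region. Ten evaluation factors covering topography, geology, meteorology, and hydrology were selected. Two advanced ensemble learning algorithms, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), and two common algorithms, logistic regression and random forest, were each used to build a collapse and landslide susceptibility model for Wenchuan County. The models were compared using quantitative metrics including accuracy, precision, and the area under the receiver operating characteristic (ROC) curve. The results show that, across the classification metrics, the two ensemble learning models have higher predictive capability than the traditional models. In accuracy, XGBoost (0.903) and LightGBM (0.903) outperform random forest (0.900) and logistic regression (0.864). In precision, LightGBM (0.887) is slightly better than XGBoost (0.882), and both are better than random forest (0.872) and logistic regression (0.802). In area under the ROC curve, XGBoost (0.904) and LightGBM (0.904) perform almost identically and slightly better than random forest (0.902), with logistic regression lowest (0.869). Further comparison of the susceptibility maps shows that the zoning produced by the two ensemble learning models differs somewhat from that of logistic regression and random forest. Based on the density of collapse and landslide points in each zone, the results of the two ensemble learning models are more reliable, and LightGBM performs best in identifying and predicting high-susceptibility areas.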

Abstract

Objective: The 5·12 Wenchuan earthquake triggered extensive secondary geological disasters and cascading effects. Wenchuan County, which was severely impacted by the earthquake, exhibits widespread unstable slopes and areas prone to landslides and collapses. In mountainous regions, extreme rainfall events trigger extensive landslides and collapses. The abundant loose material produced is a substantial sediment source that increases the magnitude of flash flood disasters through the coupled movement of water and sediment, and particularly heightens the risk of debris flows and debris floods. Assessment models for landslide and collapse susceptibility are therefore needed to support early prevention of compound flash flood disasters in Wenchuan County. Conventional susceptibility assessment approaches either rely on expert experience and subjective judgment or have difficulty fitting high-dimensional, complex data, so precise delineation of the actual spatial distribution of susceptible areas remains a challenge. Recent advances in data science and machine learning provide promising solutions. Two state-of-the-art ensemble learning algorithms, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), are introduced to build dependable models of landslide and collapse susceptibility in Wenchuan County.

Methods: Factors related to topography, geology, meteorology, and hydrology were evaluated, and ten evaluation factors were selected: elevation, slope, aspect, terrain relief, distance to rivers, distance to faults, normalized difference vegetation index (NDVI), land cover type, average annual precipitation, and lithology. Data preprocessing was applied to ensure effective and stable model training. The data were standardized to reduce the effect of differing scales among the evaluation factors. Factors with significant multicollinearity were identified using the Variance Inflation Factor (VIF) and would be excluded, ensuring the independence of each feature. In addition, the Information Gain Ratio (InGR) was used to evaluate the importance of each factor for the preliminary selection of explanatory variables. Two advanced ensemble learning algorithms (XGBoost and LightGBM) were then applied alongside two traditional algorithms (logistic regression and random forest) to construct landslide and collapse susceptibility models for Wenchuan County. Accuracy, precision, recall, F₁ score, and receiver operating characteristic (ROC) curves were used to compare model performance. The models were then used to predict the probability of landslide and collapse occurrence across the study area. The natural breaks method was used to delimit susceptibility zones and produce a susceptibility map. The resulting maps were analyzed qualitatively and quantitatively, with particular attention to the correspondence between predictions and actual landslide and collapse events, to evaluate the reliability of the models.

Results and Discussions: Both ensemble learning models showed better classification capability than the traditional models. XGBoost and LightGBM achieved accuracies of 0.903, surpassing random forest (0.900) and logistic regression (0.864). In precision, LightGBM (0.887) slightly outperformed XGBoost (0.882), and both outperformed random forest (0.872) and logistic regression (0.802). In F₁ score, XGBoost ranked first with 0.899, closely followed by LightGBM (0.898) and random forest (0.897), while logistic regression had the lowest F₁ score (0.866). The area under the ROC curve (AUC) indicated that XGBoost and LightGBM achieved nearly identical classification performance (0.904), outperforming random forest (0.902), with logistic regression lowest (0.869). The susceptibility zoning maps and the area proportion of each zone showed that the zoning from XGBoost and LightGBM differed from that of logistic regression and random forest. These differences were attributed mainly to the different data processing strategies of the algorithms. To check the reliability of the predictions, the density of landslide and collapse points in each susceptibility zone was analyzed. XGBoost, LightGBM, and random forest all reflected the general trend of increasing point density with higher susceptibility level, consistent with the typical pattern of disaster susceptibility. LightGBM performed best in identifying high and very high susceptibility areas, with point density ratios of 1.844 and 3.079, respectively, the highest among all models. In contrast, logistic regression did not follow this increasing trend: its ratio was 0.588 in the very low susceptibility zone, exceeding that in the high susceptibility zone (0.528). This anomaly indicated prediction bias in the logistic regression model, potentially attributable to the limitations of the algorithm and the lack of representative data.

Conclusions: The ensemble learning models predicted landslide and collapse susceptibility in Wenchuan County better than the two traditional models, outperforming them in accuracy, precision, F₁ score, and area under the ROC curve. LightGBM had higher precision, while XGBoost had the higher F₁ score. In terms of reliability, both ensemble learning models, particularly LightGBM, showed advantages in identifying high and very high susceptibility areas. The findings provide a more accurate tool for evaluating landslide and collapse susceptibility in Wenchuan County and similar earthquake-affected areas and support the development of disaster prevention and mitigation measures. Future research can use more comprehensive data collection methods and investigate broader applications of ensemble learning models to improve the reliability and practical use of predictions in disaster management.


Key words

Wenchuan earthquake-damaged area / landslide / collapse / flash flood disaster / machine learning

Cite this article

Ding Jiawei, Wang Xiekang. Landslide and collapse susceptibility analysis in Wenchuan earthquake-damaged area based on ensemble learning methods[J]. Advanced Engineering Sciences, 2025, 57(04): 52-61. DOI: 10.12454/j.jsuese.202400244

Author information

Jiawei DING (丁嘉伟, born 2001), male, master's student. Research interests: hydraulics and river dynamics. E-mail: dingjiawei@stu.scu.edu.cn

Xiekang WANG (王协康), corresponding author. E-mail: wangxiekang@scu.edu.cn

Affiliation (both authors): State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu 610065, China


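The enormous destructive force of the 5·12 Wenchuan earthquake in 2008, combined with the fragile geological environment of the Longmen Mountains, triggered large-scale secondary geological disasters such as landslides and collapses^[1], and the resulting loose solid material accumulated in large quantities at slope toes and in gullies. Wenchuan County was one of the areas most severely damaged by the earthquake, and landslides, collapses, unstable slopes, and potentially susceptible areas are widespread there. In mountain catchments, the sediment and rock fragments produced by collapses and landslides during short-duration heavy rainfall are readily carried into gullies by rapidly rising floodwater and, together with the loose deposits already present, provide abundant source material for flash-flood water-sediment disasters and flash-flood debris flows. Yan et al.^[2] analyzed the causes and characteristics of the 7·10 debris flow in Cutou Gully, Wenchuan County, Sichuan Province, and concluded that the large amount of loose solid material in the catchment was an important supply for debris flow development. Wang et al.^[3], based on field surveys and data from disaster sites in the mountainous region of southwest China, proposed that slope failures triggered by mountain rainstorms can cause excessive sediment yield in gullies and induce abrupt changes in bed morphology, so that water levels rise sharply and cause disasters. Building susceptibility assessment models for Wenchuan County and similar earthquake-damaged areas is therefore important for local disaster prevention and mitigation.

Early collapse and landslide susceptibility assessment relied mainly on knowledge-driven empirical models such as fuzzy logic^[4] and the analytic hierarchy process^[5]. These methods depend on the assessor's subjective judgment and experience, and they struggle to uncover the actual relationships between failures and evaluation factors. With advances in remote sensing, geographic information systems, global positioning systems, and computer science, data became easier to obtain and data-driven models emerged. Statistical models such as the certainty factor method^[6] and the frequency ratio method^[7] are objective and efficient and can reflect the statistical regularities of collapse and landslide occurrence, but such traditional mathematical methods have difficulty handling highly complex nonlinear relationships. As data volume and complexity grew, machine learning models based on self-learning algorithms were introduced and gradually became mainstream^[8-10]; they learn rules and patterns from input data and can fit high-dimensional, complex data. However, hazard susceptibility assessment based on a single machine learning model still faces obstacles: because the training set is limited, a single model may miss the true distribution of samples in the hypothesis space, so results can differ considerably between models^[11]. In recent years, ensemble learning models that combine multiple base classifiers have attracted attention; by combining the predictions of several base models, they can reduce bias and variance and obtain a better approximate solution. XGBoost and LightGBM, two advanced ensemble learning algorithms proposed in recent years, offer strong generalization ability and computational efficiency and have been widely applied in fields such as finance^[12] and medicine^[13], but studies applying them to natural hazard susceptibility assessment remain relatively limited.

On this basis, this paper compares XGBoost and LightGBM with the more common random forest (RF) and logistic regression (LR) models and, by analyzing the classification performance of the different models, obtains a reliable collapse and landslide susceptibility map.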

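1 Study Area and Data Sources

1.1 Study Area

The study area is shown in Fig. 1. Wenchuan County lies in the northwest of the Sichuan Basin, in the southeast of Aba Tibetan and Qiang Autonomous Prefecture, and covers 4 084 km². The terrain is high in the northwest and low in the southeast, with an east-west relative relief of more than 3 000 m. The county is surrounded by mountains, and the terrain descends along rivers and gullies to form valley terraces. The county is located in the central-southern segment of the Longmenshan Cathaysian structural system and contains two major NE-striking compresso-shear fault zones, the Maoxian-Wenchuan fault zone and the Beichuan-Yingxiu fault zone. Precipitation varies greatly across the county, with a pronounced pattern of a wet south and a dry north. The county has a dense network of gullies and streams, and all of its rivers belong to the Minjiang River system. Four main tributaries (the Zagunao, Caopo, Erhe, and Shoujiang rivers) flow from west to east and join the Minjiang River in the northern, central, and southern parts of the county.

According to geological hazard data provided by the Resource and Environment Science and Data Center of the Chinese Academy of Sciences (www.resdc.cn), the study area contains 287 collapses, 134 landslides, and 132 unstable slopes, a total of 553 sites that were used as positive samples. Because no precise guideline exists for selecting non-failure points in the study area, 553 negative samples were randomly selected in areas unaffected by landslides^[15], with reference to the inventory of coseismic landslides of the 5·12 Wenchuan earthquake compiled by Xu et al.^[14]. Of the samples, 70% were used as the training set and the remaining 30% as the test set.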
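The 70/30 partition described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the source does not say whether the split was stratified by class or which random seed was used, so the per-class split, the seed, and the function name `stratified_split` are assumptions.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Split sample indices into train/test, class by class (assumed stratified)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


# 553 positive (collapse/landslide/unstable slope) and 553 negative samples
labels = [1] * 553 + [0] * 553
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With 1 106 samples this gives 774 training and 332 test samples, with both classes equally represented in each subset.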

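1.2 Evaluation Factors

The occurrence of landslides and collapses results from the combined action of multiple environmental factors, which fall broadly into four groups: topography, geology, hydrometeorology, and land cover^[16]. Given the complexity of the triggers, ten parameters were selected as evaluation factors: elevation, slope, aspect, terrain relief, distance to rivers, distance to faults, normalized difference vegetation index (NDVI), land cover type, precipitation, and lithology. Details of each factor are given in Table 1.

Elevation affects the temperature, precipitation, and vegetation of an area; zones at different elevations may show marked vertical climatic zonation, which in turn affects the occurrence of collapses and landslides. Slope directly affects slope stability and sliding velocity and also reflects, to some extent, the distribution of runoff and groundwater. In general, the more concentrated the shear stress within a slope, the higher the likelihood and hazard of collapse and landslide^[6]. Aspect affects the insolation and temperature of a slope and thereby its hydrological and ecological conditions and its stability. Terrain relief reflects the gravitational potential energy that drives collapses and landslides and is closely related to the stability of the soil mass^[6]. Near rivers, the mechanical properties of the soil may deteriorate under long-term saturation, which increases the risk of failure^[23]. Active faults control the development of joints and bedding in rock masses; the rock and soil of slopes near faults are easily cut into discontinuous fragmented blocks, which lowers bedrock strength and affects slope stability^[24]. NDVI reflects vegetation growth and cover through the difference in reflectance between the infrared and visible bands of remote sensing images; in this study, 30 m resolution NDVI for 2000-2020 was extracted for the study area and averaged. Land cover type reflects the climatic, hydrological, and loading conditions of the ground surface and can indirectly affect the occurrence of collapses and landslides; land cover data for 2000-2020 were selected and averaged, and land cover in the study area was divided into eight types: cropland, forest, shrub, grassland, water, ice (snow), barren land, and impervious surface. Mean annual precipitation reflects the long-term distribution of precipitation in the study area; it directly affects the permeability of slope soil and indirectly affects vegetation and runoff^[25]. The mean annual precipitation of the study area for 2000-2020 was calculated from monthly precipitation data for China. Differences in lithology and geological structure often lead to differences in the stress distribution and permeability of soil and rock, so slopes of different lithology show different strength and stability.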

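2 Methods

The technical workflow of this study is shown in Fig. 2 and consists of three steps: 1) data preparation and preprocessing; 2) construction of machine-learning susceptibility assessment models and comparative evaluation of their classification performance; 3) production and analysis of the susceptibility maps.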

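2.1 Data Preprocessing

2.1.1 Feature Scaling

Feature scaling converts data that differ in dimension and order of magnitude into dimensionless relative values, which removes scale differences between features and improves model performance. Common feature scaling methods are Min-Max normalization and Z-score standardization. Min-Max normalization is sensitive to outliers and may then fail to reflect the true distribution of the data, so Z-score standardization was used in this study. The standardized value x_i' is:

$$x_i' = \frac{x_i - x_{\mathrm{mean}}}{s_{\mathrm{td}}} \tag{1}$$

where x_i is the original value of the i-th sample for the feature, x_mean is the mean of the feature, and s_td is the standard deviation of the feature.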
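A minimal sketch of Z-score standardization for one feature is given below. The slope values are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the source does not state whether the population or sample form was used.

```python
import statistics


def z_score(values):
    """Standardize one feature: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population form; an assumption of this sketch
    if std == 0:
        raise ValueError("constant feature: standard deviation is zero")
    return [(v - mean) / std for v in values]


slopes_deg = [5.0, 15.0, 25.0, 35.0, 45.0]  # hypothetical slope values in degrees
scaled = z_score(slopes_deg)
print([round(v, 3) for v in scaled])
```

After scaling, the feature has zero mean and unit standard deviation, so factors measured in metres, degrees, or millimetres become comparable.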

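2.1.2 Feature Selection

1) Multicollinearity analysis

Multicollinearity refers to a high degree of correlation among the independent variables in multiple regression analysis. Excessive multicollinearity inflates standard errors and reduces the stability and credibility of a model. The variance inflation factor (VIF) is a commonly used measure of correlation among variables. A threshold of 10 was adopted in this study: a variable with a VIF above 10 was considered severely collinear with the other variables and was excluded. The VIF is calculated as:

$$V_i = \frac{1}{1 - R_i^2}, \quad i = 1, 2, 3, \ldots, k \tag{2}$$

where R_i^2 is the coefficient of determination obtained by regressing the i-th variable on the remaining k-1 variables.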
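As a short worked step, the VIF threshold of 10 can be restated in terms of the coefficient of determination by rearranging the VIF formula:

$$V_i > 10 \iff \frac{1}{1 - R_i^2} > 10 \iff R_i^2 > 0.9$$

Under this threshold, a factor is excluded only when more than 90% of its variance can be explained linearly by the other factors.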

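2) Information gain ratio

The information gain ratio (InGR) is based on information theory. It quantifies the predictive power of an evaluation factor by the reduction in the information entropy of the dataset when conditioned on that factor. A very small InGR indicates that the feature provides no useful information and may instead introduce noise and degrade the prediction model. A threshold of 0.05 was adopted, and features with an InGR below the threshold were removed. The InGR is calculated as:

$$R(x, Z) = \frac{E(Z) - \sum_{i=1}^{n} \frac{Z_i}{Z} E(Z_i)}{-\sum_{i=1}^{n} \frac{Z_i}{Z} \operatorname{lb} \frac{Z_i}{Z}} \tag{3}$$

where E(Z) is the information entropy of the whole dataset and E(Z_i) is the entropy of the i-th subset; Z is the total number of samples; x is the evaluation factor that partitions the dataset into n subsets; Z_i is the number of samples whose value of x falls in the i-th subset; and lb denotes the base-2 logarithm.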
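The information gain ratio can be sketched for a factor that takes discrete values. This is an illustration under stated assumptions: continuous factors such as elevation would first have to be binned, and the source does not describe its binning; the toy feature values and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Information gain of the feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = -sum(
        len(g) / total * math.log2(len(g) / total) for g in groups.values()
    )
    if split_info == 0:  # the feature has a single value and cannot split the data
        return 0.0
    return (entropy(labels) - conditional) / split_info


labels = [1, 1, 0, 0]  # 1 = failure point, 0 = non-failure point
informative = ["near", "near", "far", "far"]  # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]          # independent of the class
print(gain_ratio(informative, labels), gain_ratio(uninformative, labels))
```

A factor that separates the two classes perfectly scores 1, and one that is independent of the class scores 0; the 0.05 threshold in the text removes factors close to the latter case.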

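2.2 Model Construction and Evaluation

2.2.1 Machine Learning Algorithms

Logistic regression (LR) is a classical statistical learning algorithm. Its core is the sigmoid function, which maps the output of a linear model to the interval (0, 1) to represent the probability of an event for a given combination of features. The probability p output by logistic regression is:

$$p = \frac{1}{1 + e^{-(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x} + b)}} \tag{4}$$

where w (the weight vector) and b (the bias) are the model parameters.

During training, w and b are estimated by maximizing the likelihood function or, equivalently, minimizing the cross-entropy loss, so that the model gradually approaches the best probabilistic description of the data. Logistic regression assumes that a linear boundary separates the classes. Its assumptions are simple, it is highly interpretable and computationally cheap, and it performs well on binary classification problems that are approximately linearly separable, which makes it a common algorithm in susceptibility modeling^[26].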
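The sigmoid mapping and the cross-entropy loss described above can be sketched as follows. The weights and inputs are hypothetical, the optimizer is omitted, and the numerically stable branch for negative inputs is an implementation choice of this sketch rather than something stated in the source.

```python
import math


def predict_proba(w, b, x):
    """Probability from the sigmoid of the linear score w.x + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow of exp(-z) when z is very negative
    return ez / (1.0 + ez)


def cross_entropy(w, b, X, y):
    """Mean cross-entropy loss; minimizing it maximizes the likelihood."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(w, b, xi), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)


X = [[1.2, -0.3], [-0.8, 0.5]]  # two standardized samples (hypothetical)
y = [1, 0]
print(cross_entropy([0.0, 0.0], 0.0, X, y))  # untrained model: every p = 0.5
```

With all parameters at zero every sample receives p = 0.5, and the loss equals ln 2; training lowers the loss from that starting point.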

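Random forest (RF) is based on the bagging ensemble strategy: it predicts complex, high-dimensional data by building many decorrelated base learners. Specifically, bootstrap sampling generates multiple independently drawn sub-datasets, each of which is used to train a separate decision tree. At each node split, a limited random subset of the features is taken as candidate split features instead of all features. The predictions of the trees are then combined by majority vote to reach the final decision. By introducing randomness at each stage of model construction, random forest improves robustness and its ability to capture complex data patterns. In susceptibility modeling, random forest is widely used and classifies well, so it is frequently chosen as the baseline against which new algorithms are evaluated^[27].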

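Chen et al.^[28] proposed eXtreme Gradient Boosting (XGBoost). XGBoost is based on the boosting ensemble strategy: a series of base learners is trained sequentially, each one correcting the prediction residuals of the preceding model, so that performance improves step by step. XGBoost minimizes the loss function by iteratively adding tree models and introduces a regularization term to control model complexity, which gives a composite objective function combining loss and model complexity:

$$L_t = \sum_{j=1}^{T}\left[G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2\right] + \gamma T \tag{5}$$

where t is the current iteration; G_j and H_j are the sums of the first-order and second-order partial derivatives of the loss over the samples in leaf node j; w_j is the weight of leaf node j; λ and γ are regularization parameters that control model complexity; and T is the total number of leaf nodes of the tree.

When a decision tree is built, the gain of each candidate split point of each feature is evaluated, the best split point is selected, and the algorithm decides whether to continue splitting. Because of its efficiency and flexibility, XGBoost has been widely used in competitions and industry.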
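As a worked step that the text leaves implicit: if the objective is read as a sum over the T leaves of G_j w_j + ½(H_j + λ)w_j², plus γT, then setting its derivative with respect to each w_j to zero gives the optimal leaf weight and the corresponding minimum of the objective. This follows the standard derivation in the XGBoost paper^[28]:

$$w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad L_t^{*} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda} + \gamma T$$

Comparing this minimum before and after a leaf is split into a left child and a right child gives the split gain, where the subscripts L and R are notation introduced here for the two children:

$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$$

A split lowers the objective only when this gain is positive, which is how γ penalizes each additional leaf.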

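The Light Gradient Boosting Machine (LightGBM) is a variant of the gradient boosting decision tree proposed by Microsoft in 2017^[29]. It is based on the boosting ensemble strategy, and its core is the introduction of gradient-based one-side sampling and exclusive feature bundling. Specifically, instead of evaluating the information gain of every possible split point of every feature on all data, LightGBM uses gradient-based one-side sampling to identify and prioritize the data instances with large gradients when estimating information gain, which greatly reduces computation. For high-dimensional sparse feature spaces, exclusive feature bundling identifies and merges mutually exclusive features that are rarely nonzero at the same time, which reduces the feature dimension. In addition, LightGBM uses a histogram algorithm to discretize continuous features and aggregates statistics of the feature values in each bin instead of computing over the continuous domain value by value, which further improves efficiency. On this basis, it adopts a leaf-wise tree growth strategy, in each iteration splitting only the leaf node with the largest gain, until the preset maximum depth is reached. Through these optimizations LightGBM balances model complexity and predictive accuracy and improves computational efficiency while maintaining predictive performance.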
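The sampling step of gradient-based one-side sampling can be sketched as follows, following the description in the LightGBM paper^[29]: the instances with the largest absolute gradients are all kept, a random subset of the remainder is drawn, and the drawn instances are up-weighted by (1 - a)/b so that the gain estimate stays approximately unbiased. The function name, the rates, and the toy gradients are hypothetical, and this is not LightGBM's internal code.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (index, weight) pairs selected by gradient-based one-side sampling."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)      # large-gradient instances, always kept
    n_other = int(other_rate * n)  # small-gradient instances, randomly drawn
    top, rest = order[:n_top], order[n_top:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))
    amplify = (1.0 - top_rate) / other_rate  # compensates for the subsampling
    return [(i, 1.0) for i in top] + [(i, amplify) for i in sampled]


gradients = [((-1) ** i) * i / 100 for i in range(100)]  # toy gradient values
picked = goss_sample(gradients)
print(len(picked), picked[0], picked[-1])
```

With the rates used here, 30 of the 100 instances enter the gain computation, and the ten randomly drawn small-gradient instances each carry a weight of 8.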

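The four models differ mainly in how they process data, how the model is constructed, and what objective is optimized. Logistic regression rests on a linear assumption: it models the relationship between features and outcome with a linear function and predicts the class through a probability mapping. Random forest relies on an ensemble of many independent decision trees and captures complex patterns and feature interactions through the tree structure. XGBoost and LightGBM use gradient boosting, learning nonlinear decision boundaries by iteratively optimizing a loss function; LightGBM further improves efficiency through specific optimizations.

Hyperparameters control the learning process and the model structure, and tuning them is essential for model performance. Traditional grid search and random search are computationally inefficient and can miss the best hyperparameter combination. For ensemble learning algorithms with many hyperparameters, Bayesian optimization is a better choice: it builds a probabilistic model of the objective function to guide the search and, over repeated iterations, seeks the global optimum in the hyperparameter space^[25], which greatly improves the efficiency and accuracy of tuning. Bayesian optimization was therefore used to tune the hyperparameters of all four models.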

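2.2.2 Model Evaluation Metrics

Accuracy, precision, recall, F₁ score, and the ROC curve were used to evaluate the classification results. All of these metrics are derived from the confusion matrix of the classification model.

$$A_{\mathrm{cc}} = \frac{T_{\mathrm{P}} + T_{\mathrm{N}}}{T_{\mathrm{P}} + F_{\mathrm{P}} + T_{\mathrm{N}} + F_{\mathrm{N}}} \tag{6}$$

$$P_{\mathrm{re}} = \frac{T_{\mathrm{P}}}{T_{\mathrm{P}} + F_{\mathrm{P}}} \tag{7}$$

$$R_{\mathrm{e}} = \frac{T_{\mathrm{P}}}{T_{\mathrm{P}} + F_{\mathrm{N}}} \tag{8}$$

$$F = \frac{2T_{\mathrm{P}}}{2T_{\mathrm{P}} + F_{\mathrm{P}} + F_{\mathrm{N}}} \tag{9}$$

$$T_{\mathrm{PR}} = \frac{T_{\mathrm{P}}}{P} = \frac{T_{\mathrm{P}}}{T_{\mathrm{P}} + F_{\mathrm{N}}} \tag{10}$$

$$F_{\mathrm{PR}} = \frac{F_{\mathrm{P}}}{N} = \frac{F_{\mathrm{P}}}{F_{\mathrm{P}} + T_{\mathrm{N}}} \tag{11}$$

In Eqs. (6)-(11), A_cc, P_re, R_e, and F are the accuracy, precision, recall, and F₁ score of the predictions; T_P, T_N, F_P, and F_N are the numbers of correctly classified positives, correctly classified negatives, negatives misclassified as positive, and positives misclassified as negative; T_PR and F_PR are the true positive rate (the share of positives correctly identified) and the false positive rate (the share of negatives misclassified as positive); and P and N are the numbers of actual positive and negative samples.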
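Equations (6)-(11) can be evaluated directly from the four cells of a confusion matrix. The counts below are hypothetical (chosen only to sum to a 332-sample test set) and are not the paper's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics corresponding to Eqs. (6)-(11)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "tpr": tp / (tp + fn),  # identical to recall
        "fpr": fp / (fp + tn),
    }


# Hypothetical confusion matrix, not taken from the paper
m = classification_metrics(tp=150, tn=145, fp=21, fn=16)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The F₁ score computed this way equals the harmonic mean of precision and recall, which is why a model with high recall but low precision, such as the LR model discussed later, still ends up with a modest F₁ score.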

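The ROC curve plots the true positive rate T_PR against the false positive rate F_PR and reflects the overall classification performance of a model. It is usually summarized by the area under the curve (AUC, denoted A_UC). For a useful classifier the AUC lies between 0.5 and 1.0: a value close to 0.5 indicates that the model predicts no better than random guessing, and a value approaching 1.0 indicates the ideal case in which every sample is classified correctly^[30].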
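The AUC can also be computed without drawing the curve, using its rank interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition; the scores are hypothetical and the quadratic loop is meant only for small examples.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs that are ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]  # hypothetical predicted probabilities
print(roc_auc(y_true, scores))
```

In this example three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. A value below 0.5 is possible in principle and would mean the ranking is worse than random.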

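3 Results and Discussion

3.1 Data Processing and Feature Selection

All data were first standardized, and the variance inflation factor and information gain ratio of each evaluation factor were then calculated. The feature selection results are listed in Table 2. The VIF of every evaluation factor is below the threshold, whereas the information gain ratios of lithology, terrain relief, and aspect are all below 0.05. Seven factors were therefore retained as the features for model training: distance to rivers, precipitation, slope, land cover, NDVI, distance to faults, and elevation.

3.2 Model Evaluation

According to the classification metrics of each model (Table 3), XGBoost and LightGBM both perform well in accuracy, reaching 0.903, higher than the two traditional models. LightGBM leads in precision with 0.887, followed by XGBoost with 0.882. In recall, LR performs best with 0.942. In F₁ score, XGBoost and LightGBM are close, at 0.899 and 0.898, while LR ranks last with 0.866, which indicates that LR is weaker at balancing precision and recall. The ROC curves in Fig. 3 show that XGBoost and LightGBM have the best classification performance, both with an AUC of 0.904; RF follows with 0.902; and LR reaches 0.869, clearly lower than the other models. Across the five metrics, the classification performance of XGBoost and LightGBM is nearly identical and better than that of RF and LR.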

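3.3 Susceptibility Map Analysis

The fitted models were used to predict over the entire study area, giving a probability of collapse or landslide occurrence between 0 and 1, with higher values corresponding to higher susceptibility. To distinguish susceptibility levels visually, the natural breaks method^[31] was used to divide the study area into five classes: very low, low, moderate, high, and very high susceptibility.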
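The natural breaks classification can be sketched as an exact dynamic program that chooses class boundaries to minimize the within-class sum of squared deviations (the Fisher-Jenks criterion). This is an assumption about the method's implementation: the source only names the natural breaks method and most likely relied on GIS software. The probabilities below are hypothetical, and the O(k·n²) loop is suitable only for small samples, not for a full raster.

```python
def natural_breaks(values, n_classes):
    """Upper bound of each class under the minimum within-class squared deviation."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    s1, s2 = [0.0], [0.0]  # prefix sums of x and x*x
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] about its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    bounds, j = [], n
    for k in range(n_classes, 0, -1):  # walk back through the stored cut points
        bounds.append(data[j - 1])
        j = cut[k][j]
    return bounds[::-1]


probabilities = [0.10, 0.12, 0.15, 0.50, 0.52, 0.90, 0.95]  # hypothetical
print(natural_breaks(probabilities, 3))
```

For these seven values the three classes end at 0.15, 0.52, and 0.95, which matches the visible gaps in the data. Applying the same idea with five classes to the predicted probabilities yields the five susceptibility levels.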

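The susceptibility zones given by the four models are shown in Fig. 4. The very high susceptibility zones are similar across models and are concentrated along the two active faults, Beichuan-Yingxiu and Maoxian-Wenchuan, and around the river network of the study area. Low and very low susceptibility zones lie mainly in the west of the county.

The susceptibility analysis is shown in Fig. 5. In terms of spatial distribution and area share, the very low, low, moderate, high, and very high susceptibility zones predicted by XGBoost account for 38.06%, 14.49%, 13.67%, 14.06%, and 19.72% of the area, and those predicted by LightGBM account for 46.29%, 12.63%, 8.99%, 12.06%, and 20.03%. The two models show a similar pattern (Fig. 5(a)) that differs somewhat from the zoning of the LR and RF models, which may be attributed to differences in how the algorithms process data. Specifically, LightGBM and XGBoost are ensemble learning algorithms based on gradient boosting; RF is based on the bootstrap aggregating (bagging) strategy and may give slightly different results when handling nonlinear and complex data structures; LR uses a linear fit followed by a mapping to an interval and may have difficulty handling high-dimensional, complex data accurately. Fig. 5(b) shows the share of the 553 confirmed collapse and landslide points that falls in each susceptibility zone. For LR, the very low susceptibility zone contains 14.65% of the points, more than the 12.84% in the high susceptibility zone, which points to a potential inconsistency in the LR predictions. For RF, the moderate susceptibility zone contains 21.88% of the points, a share that is probably related to the large area that the model assigns to that zone.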

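To further verify the reliability of the predictions, the ratio of the percentage of collapse and landslide points in each susceptibility zone to the percentage of the area occupied by that zone was calculated. This ratio quantifies how densely collapse and landslide events are concentrated in each zone and so indicates whether a model's susceptibility prediction for a zone is biased.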
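The density ratio defined above can be sketched as follows. The point counts and area shares are hypothetical round numbers chosen for illustration (the counts sum to 553); they are not the values in Table 4.

```python
def density_ratios(point_counts, zone_areas):
    """Share of failure points in each zone divided by the zone's share of area."""
    total_points = sum(point_counts.values())
    total_area = sum(zone_areas.values())
    return {
        zone: (point_counts[zone] / total_points) / (zone_areas[zone] / total_area)
        for zone in point_counts
    }


# Hypothetical inputs: failure points per zone and zone area (% of study area)
points = {"very low": 20, "low": 40, "moderate": 60, "high": 130, "very high": 303}
areas = {"very low": 40.0, "low": 15.0, "moderate": 15.0, "high": 15.0, "very high": 15.0}
ratios = density_ratios(points, areas)
for zone, ratio in ratios.items():
    print(f"{zone}: {ratio:.3f}")
```

A ratio above 1 means failure points are denser in the zone than in the study area as a whole. For a well-behaved model the ratio should rise from the very low zone to the very high zone, which is the check applied to the four models in the text.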

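The density ratios of collapse and landslide points in each susceptibility zone are listed in Table 4. For RF, XGBoost, and LightGBM the ratio increases with susceptibility level, which agrees with the expected pattern and supports the reliability of these models. LightGBM has ratios of 1.844 and 3.079 in the high and very high susceptibility zones, the highest of all models, which means that LightGBM is the most effective at identifying and predicting high-susceptibility areas and locates more accurately the areas where collapses and landslides are concentrated. The ratios for LR do not increase monotonically: the ratio in the very low susceptibility zone (0.588) is larger than that in the high susceptibility zone (0.528), and the smallest ratio (0.047) occurs in the moderate susceptibility zone, which confirms the inconsistency of the LR predictions.

The unexpected results of the LR model may stem from limitations of both the algorithm and the dataset. On the one hand, logistic regression may fail to capture complex nonlinear relationships and interactions between features and outcome in high-dimensional nonlinear data. On the other hand, the collapse and landslide data used here come mainly from news and notices published on official platforms. The recorded points generally lie in accumulation zones that are easy to reach during field surveys and have relatively gentle terrain, or in populated areas where casualties occurred, whereas slope failures in areas that are hard to reach or sparsely populated may be overlooked. The dataset may therefore not reflect the distribution of collapses and landslides in the study area completely and accurately. Future studies could obtain a more complete inventory through methods such as remote sensing image interpretation^[32] to improve the representativeness of the data.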

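4 Conclusions

Using collected historical records of collapses and landslides, this paper studied collapse and landslide susceptibility with Wenchuan County as an example. Ten evaluation factors were selected and screened with the variance inflation factor and information gain ratio. Two advanced ensemble learning algorithms (XGBoost and LightGBM) and two common traditional algorithms (logistic regression and random forest) were used to build susceptibility prediction models. The classification performance of the models was evaluated by comparing classification metrics, susceptibility zoning maps were produced, and the results of the different models were discussed. The following conclusions were drawn.

1) XGBoost and LightGBM outperform the other models in accuracy, precision, and F₁ score: LightGBM has the highest precision (0.887) and XGBoost the highest F₁ score (0.899). According to the ROC curves, XGBoost and LightGBM show better classification performance than the traditional models.

2) The susceptibility zoning maps drawn from the four models' predictions show that high-susceptibility areas in the county are concentrated around active faults and the river network in the east and north, while low-susceptibility areas lie mainly in the west.

3) In the susceptibility maps from XGBoost and LightGBM, the zones have similar spatial distributions and area shares, which differ from those of logistic regression and random forest. This may be attributed to differences in how the algorithms process data.

4) Analysis of the share and density of collapse and landslide points in each zone indicates that LightGBM performs best in identifying and predicting high-susceptibility areas. The predictions of logistic regression show an anomaly that may result from limitations of both the algorithm and the data.

For evaluation factors that vary over time, such as NDVI, multi-year means were used, so the results show the overall spatial distribution of susceptibility in the study area. The results can serve as a reference for delineating warning zones for flash-flood debris flows and water-sediment disasters and offer some guidance for disaster prevention and mitigation in Wenchuan County and similar earthquake-damaged areas. However, the lack of a more complete record of collapses and landslides is one limitation of this study.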

References

[1]	Xu Qiang, Li Weile. Distribution of large-scale landslides induced by the Wenchuan earthquake[J]. Journal of Engineering Geology, 2010, 18(6): 818-826. [许强, 李为乐. 汶川地震诱发大型滑坡分布规律研究[J]. 工程地质学报, 2010, 18(6): 818-826.]

[2]	Yan Yan, Ge Yonggang, Zhang Jianqiang, et al. Research on the debris flow hazards in Cutou Gully, Wenchuan County on July 10, 2013[J]. Journal of Catastrophology, 2014, 29(3): 229-234. [严炎, 葛永刚, 张建强, 等. 四川省汶川县簇头沟"7·10"泥石流灾害成因与特征分析[J]. 灾害学, 2014, 29(3): 229-234.]

[3]	Wang Xiekang, Liu Xingnian, Zhou Jiawen. Research framework and anticipated results of flash flood disasters under the mutation of sediment supply[J]. Advanced Engineering Sciences, 2019, 51(4): 1-10. [王协康, 刘兴年, 周家文. 泥沙补给突变下的山洪灾害研究构想和成果展望[J]. 工程科学与技术, 2019, 51(4): 1-10.]

[4]	Liu Fuzhen, Wang Ling, Xiao Dongsheng, et al. Evaluation of landslide susceptibility in Ningnan County based on fuzzy comprehensive evaluation[J]. Journal of Natural Disasters, 2021, 30(5): 237-246. [刘福臻, 王灵, 肖东升, 等. 基于模糊综合评判法的宁南县滑坡易发性评价[J]. 自然灾害学报, 2021, 30(5): 237-246.]

[5]	Lyu Haimin, Shen J, Arulrajah A. Assessment of geohazards and preventative countermeasures using AHP incorporated with GIS in Lanzhou, China[J]. Sustainability, 2018, 10(2): 304. doi:10.3390/su10020304

[6]	Yuan Xinyue, Liu Chao, Nie Ruihua, et al. A comparative analysis of certainty factor-based machine learning methods for collapse and landslide susceptibility mapping in Wenchuan County, China[J]. Remote Sensing, 2022, 14(14): 3259. doi:10.3390/rs14143259

[7]	Gnyawali K R, Zhang Yonghong, Wang Guojie, et al. Mapping the susceptibility of rainfall and earthquake triggered landslides along China-Nepal highways[J]. Bulletin of Engineering Geology and the Environment, 2020, 79(2): 587-601. doi:10.1007/s10064-019-01583-2

[8]	Wu Xueling, Shen Shaoqing, Niu Ruiqing. Landslide susceptibility prediction using GIS and PSO-SVM[J]. Geomatics and Information Science of Wuhan University, 2016, 41(5): 665-671. [武雪玲, 沈少青, 牛瑞卿. GIS支持下应用PSO-SVM模型预测滑坡易发性[J]. 武汉大学学报(信息科学版), 2016, 41(5): 665-671.]

[9]	Tian Naiman, Lan Hengxing, Wu Yuming, et al. Performance comparison of BP artificial neural network and CART decision tree model in landslide susceptibility prediction[J]. Journal of Geo-information Science, 2020, 22(12): 2304-2316. [田乃满, 兰恒星, 伍宇明, 等. 人工神经网络和决策树模型在滑坡易发性分析中的性能对比[J]. 地球信息科学学报, 2020, 22(12): 2304-2316.]

[10]	Aditian A, Kubota T, Shinohara Y. Comparison of GIS-based landslide susceptibility models using frequency ratio, logistic regression, and artificial neural network in a tertiary region of Ambon, Indonesia[J]. Geomorphology, 2018, 318: 101-111. doi:10.1016/j.geomorph.2018.06.006

[11]	Fang Zhice, Wang Yi, Peng Ling, et al. A comparative study of heterogeneous ensemble-learning techniques for landslide susceptibility mapping[J]. International Journal of Geographical Information Science, 2021, 35(2): 321-347. doi:10.1080/13658816.2020.1808897

[12]	Shehadeh A, Alshboul O, Al Mamlook R E, et al. Machine learning models for predicting the residual value of heavy construction equipment: An evaluation of modified decision tree, LightGBM, and XGBoost regression[J]. Automation in Construction, 2021, 129: 103827. doi:10.1016/j.autcon.2021.103827

[13]	Wang Dehua, Zhang Yang, Zhao Yi. LightGBM: An effective miRNA classification method in breast cancer patients[C]//Proceedings of the 2017 International Conference on Computational Biology and Bioinformatics. Newark: ACM, 2017. doi:10.1145/3155077.3155079

[14]	Xu Chong, Xu Xiwei, Yao Xin, et al. Three (nearly) complete inventories of landslides triggered by the May 12, 2008 Wenchuan Mw 7.9 earthquake of China and their spatial distribution statistical analysis[J]. Landslides, 2014, 11(3): 441-461. doi:10.1007/s10346-013-0404-6

[15]	Meng Shaoqiang, Shi Zhenming, Li Gang, et al. A novel deep learning framework for landslide susceptibility assessment using improved deep belief networks with the intelligent optimization algorithm[J]. Computers and Geotechnics, 2024, 167: 106106. doi:10.1016/j.compgeo.2024.106106

[16]	Reichenbach P, Rossi M, Malamud B D, et al. A review of statistically-based landslide susceptibility models[J]. Earth-Science Reviews, 2018, 180: 60-91. doi:10.1016/j.earscirev.2018.03.001

[17]	Sayre R, Dangermond J, Frye C, et al. A new map of global ecological land units: An ecophysiographic stratification approach[M]. Washington DC: Association of American Geographers, 2014.

[18]	Hartmann J, Moosdorf N. The new global lithological map database GLiM: A representation of rock properties at the Earth surface[J]. Geochemistry, Geophysics, Geosystems, 2012, 13(12): Q12004. doi:10.1029/2012gc004370

[19]	Peng Shouzhang, Ding Yongxia, Wen Zhongming, et al. Spatiotemporal change and trend analysis of potential evapotranspiration over the Loess Plateau of China during 2011-2100[J]. Agricultural and Forest Meteorology, 2017, 233: 183-194. doi:10.1016/j.agrformet.2016.11.129

[20]	Ding Yongxia, Peng Shouzhang. Spatiotemporal trends and attribution of drought across China from 1901-2100[J]. Sustainability, 2020, 12(2): 477. doi:10.3390/su12020477

[21]	Peng Shouzhang, Ding Yongxia, Liu Wenzhao, et al. 1 km monthly temperature and precipitation dataset for China from 1901 to 2017[J]. Earth System Science Data, 2019, 11(4): 1931-1946. doi:10.5194/essd-11-1931-2019

[22]	Peng Shouzhang, Gang Chengcheng, Cao Yang, et al. Assessment of climate change trends over the Loess Plateau in China from 1901 to 2100[J]. International Journal of Climatology, 2018, 38(5): 2250-2264. doi:10.1002/joc.5331

[23]	Vasu N N, Lee S R. A hybrid feature selection algorithm integrating an extreme learning machine for landslide susceptibility modeling of Mt. Woomyeon, South Korea[J]. Geomorphology, 2016, 263: 50-70. doi:10.1016/j.geomorph.2016.03.023

[24]	Li Xiao, Li Shouding, Chen Jian, et al. Coupling effect mechanism of endogenic and exogenic geological processes of geological hazards evolution[J]. Chinese Journal of Rock Mechanics and Engineering, 2008, 27(9): 1792-1806. [李晓, 李守定, 陈剑, 等. 地质灾害形成的内外动力耦合作用机制[J]. 岩石力学与工程学报, 2008, 27(9): 1792-1806.]

[25]	Sun Deliang, Wen Haijia, Wang Danzhou, et al. A random forest model of landslide susceptibility mapping based on hyperparameter optimization using Bayes algorithm[J]. Geomorphology, 2020, 362: 107201. doi:10.1016/j.geomorph.2020.107201

[26]	Dou Jie, Xiang Zilin, Xu Qiang, et al. Application and development trend of machine learning in landslide intelligent disaster prevention and mitigation[J]. Earth Science, 2023, 48(5): 1657-1674. [窦杰, 向子林, 许强, 等. 机器学习在滑坡智能防灾减灾中的应用与发展趋势[J]. 地球科学, 2023, 48(5): 1657-1674.]

[27]	Hosseini F S, Choubin B, Mosavi A, et al. Flash-flood hazard assessment using ensembles and Bayesian-based machine learning models: Application of the simulated annealing feature selection method[J]. Science of the Total Environment, 2020, 711: 135161. doi:10.1016/j.scitotenv.2019.135161

[28]	Chen Tianqi, Guestrin C. XGBoost: A scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco: ACM, 2016. doi:10.1145/2939672.2939785

[29]	Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems 30 (NIPS 2017). California: Curran Associates, Inc., 2017: 30.

[30]	Yesilnacar E, Topal T. Landslide susceptibility mapping: A comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey)[J]. Engineering Geology, 2005, 79(3/4): 251-266. doi:10.1016/j.enggeo.2005.02.002

[31]	Huang Faming, Zhang Yinlang, Guo Zizheng, et al. Effects of different classification methods on regional landslide susceptibility zonation[J]. Advanced Engineering Sciences, 2024, 56(1): 148-159. [黄发明, 张崟琅, 郭子正, 等. 不同分级方法对区域滑坡易发性区划的影响[J]. 工程科学与技术, 2024, 56(1): 148-159.]

[32]	Fiorucci F, Ardizzone F, Mondini A C, et al. Visual interpretation of stereoscopic NDVI satellite images to map rainfall-induced landslides[J]. Landslides, 2019, 16(1): 165-174. doi:10.1007/s10346-018-1069-y