Large-scale landslide susceptibility evaluation in upper reaches of Minjiang River based on different selection methods for landslide negative samples and random forest algorithm
Luo Fei, Ling Sixiang, Gao Fengxin, Lin Zuhao, Sun Chunwei, Gao Fangfang, Wu Xiyong
Water Resources and Hydropower Engineering, 2025, Vol. 56, Issue 9: 42-59.
[Objective] Accurate landslide susceptibility results support targeted prevention and control of landslide hazard and risk. In landslide susceptibility evaluation, the way landslide negative samples are selected is a major source of uncertainty in prediction accuracy. [Methods] Taking the mountainous area of the Minjiang River basin in Sichuan Province as the study area, an inventory of 881 large landslides ($>10^{6}$ m$^{3}$) was compiled from remote sensing imagery. Thirteen landslide evaluation factors covering topography and geomorphology, basic geology, hydrogeology, geological environment, seismic parameters, and human activities were selected, and factor redundancy was checked by collinearity analysis. Landslide negative samples were then selected, in a 1:1 ratio with the landslide samples, using six strategies: random sampling across the study area, sampling from areas with slopes below 10°, sampling from areas outside a 1 km buffer around landslides, the Information Value (IV) method, the Support Vector Machine (SVM) method, and a semi-supervised method. Each set of negative samples was coupled with Random Forest (RF) to build the Random RF, Low-Slope RF, Buffer RF, IV-RF, SVM-RF, and Semi-Supervised RF models for landslide susceptibility zoning. Finally, the prediction accuracy of the six sampling models was compared using the mean Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve.
[Results] The results showed that: (1) the high and extremely high landslide susceptibility zones obtained with all negative-sample strategies were concentrated on both sides of the river valleys from Wolong to Yingxiu, Miansi to Gu'ergou, Heihu to Musu, and Diexi to Songpan. (2) The prediction accuracy of the negative-sample strategies ranked as follows: Semi-Supervised RF ($\overline{AUC}=0.971$) > SVM-RF ($\overline{AUC}=0.954$) > IV-RF ($\overline{AUC}=0.945$) > Buffer RF ($\overline{AUC}=0.902$) > Low-Slope RF ($\overline{AUC}=0.895$) > Random RF ($\overline{AUC}=0.882$). (3) Selecting landslide negative samples from low-susceptibility areas markedly improved prediction accuracy. Compared with the Random RF model, the $\overline{AUC}$ values of the Low-Slope RF, Buffer RF, IV-RF, SVM-RF, and Semi-Supervised RF models increased by 0.013, 0.02, 0.063, 0.072, and 0.089, respectively. [Conclusion] The Semi-Supervised RF model has the smallest standard deviation (0.004) and the highest mean AUC (0.971), showing the best stability and predictive capability among the six models. This indicates that the semi-supervised sampling method gives the greatest improvement to the model. The findings provide a reference for selecting landslide negative samples and building models in landslide susceptibility prediction, and offer theoretical support for landslide risk assessment and disaster mitigation in the upper Minjiang River basin.
landslide susceptibility prediction / sampling strategy / random forest / semi-supervised method / model averaging method / Minjiang River Basin / landslides / influencing factors
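
The three rule-based negative-sampling strategies named in the Methods (random across the study area, slopes below 10°, and outside a 1 km buffer around landslides) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic grid cells, the helper names `make_cells`, `sample_negatives`, and `auc`, and the 50 km extent are all assumptions made for illustration, and the IV, SVM, semi-supervised, and Random Forest steps are not reproduced. The `auc` helper uses the standard pairwise (Mann-Whitney) definition of the area under the ROC curve, which is the quantity the abstract averages over runs.

```python
import math
import random


def make_cells(n, seed=0):
    """Hypothetical study area: n points with planar x/y (km) and slope (degrees)."""
    rng = random.Random(seed)
    return [
        {"x": rng.uniform(0, 50), "y": rng.uniform(0, 50), "slope": rng.uniform(0, 60)}
        for _ in range(n)
    ]


def dist_km(a, b):
    """Planar distance between two points, in km."""
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])


def sample_negatives(cells, landslides, strategy, rng, max_slope=10.0, buffer_km=1.0):
    """Draw as many negative samples as there are landslides (1:1 ratio)."""
    if strategy == "random":
        pool = list(cells)  # anywhere in the study area
    elif strategy == "low_slope":
        pool = [c for c in cells if c["slope"] < max_slope]
    elif strategy == "buffer":
        pool = [c for c in cells if all(dist_km(c, s) > buffer_km for s in landslides)]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return rng.sample(pool, len(landslides))


def auc(pos_scores, neg_scores):
    """Area under the ROC curve: share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def mean_auc(auc_values):
    """Mean AUC over repeated runs of the same model."""
    return sum(auc_values) / len(auc_values)


if __name__ == "__main__":
    rng = random.Random(1)
    cells = make_cells(2000, seed=0)        # candidate non-landslide cells (synthetic)
    landslides = make_cells(50, seed=42)    # synthetic landslide points
    for strategy in ("random", "low_slope", "buffer"):
        negatives = sample_negatives(cells, landslides, strategy, rng)
        print(strategy, len(negatives))
```

In a full workflow, each negative set would be combined with the landslide samples, used to train a classifier, and scored with `auc` on held-out data over several repetitions before averaging with `mean_auc`.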