Flood prediction model based on decision trees
[Objective] Floods are natural disasters triggered by factors such as heavy rainfall, rapid snow and ice melt, and storm surges, and they often cause economic losses and disrupt daily life. Conventional flood prediction relies mainly on traditional hydrological methods and experience-based statistical models. In areas that lack long-term, continuous hydrological observation data, however, methods that predict floods from other data are essential. [Methods] Decision-tree-based machine learning algorithms such as Random Forest, XGBoost, and LightGBM are intuitive and powerful, perform well in classification and regression tasks, and are therefore suitable for flood prediction. A dataset containing 50 000 records and 21 variables was used to evaluate the flood prediction capability of these three algorithms. Their performance was compared in terms of prediction quality and identification of key variables, and ROC curves with the corresponding AUC values were used to rank them. [Results] All three models achieved high prediction accuracy. XGBoost had the lowest mean squared error (0.000 186 2) and the highest coefficient of determination (0.925 2), while LightGBM achieved the highest AUC (0.99). Random Forest underperformed the other two on every indicator. [Conclusion] XGBoost performed best at predicting flood probability, with the smallest prediction error, whereas LightGBM was the best choice for the binary classification task of predicting whether a flood occurs.
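The three indicators named in the abstract (mean squared error, coefficient of determination, and AUC) can be illustrated with a minimal standard-library sketch. The toy values, the function names, and the 0.5 threshold used to turn a flood probability into a binary label are all assumptions made for illustration; the abstract does not state how the binary labels were derived, and none of these numbers come from the study.

```python
# Illustrative only: toy data and the 0.5 threshold are assumptions,
# not values taken from the study.

def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical observed and predicted flood probabilities.
y_true = [0.40, 0.55, 0.62, 0.48, 0.51]
y_pred = [0.42, 0.53, 0.60, 0.47, 0.52]

# Assumed rule: a flood "occurs" when the observed probability is >= 0.5.
labels = [1 if t >= 0.5 else 0 for t in y_true]

print("MSE:", mean_squared_error(y_true, y_pred))
print("R2 :", r2_score(y_true, y_pred))
print("AUC:", roc_auc(labels, y_pred))
```

The first two functions score the regression view (how close the predicted probability is to the observed one), while `roc_auc` scores the classification view (how well the predictions rank flood cases above non-flood cases). This is why one model can lead on error while another leads on AUC.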
flood prediction / decision trees / Random Forest / XGBoost / LightGBM