Objective: To assess the applicability of a multi-source data fusion approach for constructing high-precision regional precipitation datasets. Methods: Taking the Tarim River basin as the study area, the study compared the performance of four machine learning methods (random forest, support vector machine, XGBoost, and regression tree) in simulating regional precipitation through multi-source data fusion. It further examined how the choice of satellite precipitation product (GPM IMERG-v06 (GPMv06) or GPM IMERG-v07 (GPMv07)) and the inclusion of the lagged response of the normalized difference vegetation index (NDVI) to precipitation affect fusion model accuracy. Results: Both GPMv06 and GPMv07 overestimated precipitation in low-elevation zones and underestimated it in high-elevation zones. Compared with GPMv06, GPMv07 showed improved precipitation accuracy in both summer and winter; in particular, the Nash-Sutcliffe efficiency (NSE) coefficient for winter precipitation increased by 0.58. Among the four fusion models, the XGBoost model achieved the highest accuracy in monthly precipitation simulation. Compared with GPMv07, the XGBoost model reduced the root-mean-square error (RMSE) by 2.01 mm, increased the percentage of sites with an NSE coefficient of at least 0.6 by 33%, and raised the mean NSE coefficient by 0.23. Errors in the input satellite precipitation had a relatively minor influence on the accuracy of the XGBoost model. Incorporating the lagged response of NDVI to precipitation improved model accuracy in some spring and summer months but had limited effect in most autumn and winter months. Conclusion: The XGBoost model shows clear advantages in correcting GPMv07 satellite precipitation data in the Tarim River basin and substantially improves the accuracy of satellite precipitation estimates. This research provides a data reference for studies of regional water resource management and soil erosion prevention.
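The two accuracy metrics reported above, RMSE and the NSE coefficient, can be illustrated with a minimal sketch. It assumes the standard textbook definitions of both metrics; the function names and the sample monthly values are hypothetical and are not taken from the study.

```python
import math


def rmse(obs, sim):
    """Root-mean-square error between observed and simulated values."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and of equal length")
    squared_errors = [(s - o) ** 2 for o, s in zip(obs, sim)]
    return math.sqrt(sum(squared_errors) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared model
    error to the variance of the observations around their mean."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    error_sum = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance_sum = sum((o - mean_obs) ** 2 for o in obs)
    if variance_sum == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - error_sum / variance_sum


# Hypothetical monthly precipitation at one gauge (mm), for illustration only.
gauge_mm = [2.0, 4.0, 6.0, 8.0]
model_mm = [3.0, 4.0, 5.0, 9.0]
print(round(rmse(gauge_mm, model_mm), 3))
print(round(nse(gauge_mm, model_mm), 3))
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the simulation is no better than using the observed mean, which is why a site-level threshold such as 0.6 is a meaningful pass criterion.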
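Global Precipitation Measurement (GPM) is an international satellite mission jointly initiated by the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA) to provide high-accuracy, high-resolution global precipitation data [18]. This study uses two GPM precipitation products: Integrated Multi-satellitE Retrievals for Global Precipitation Measurement Final Run v06 (GPM IMERG-v06) and Integrated Multi-satellitE Retrievals for Global Precipitation Measurement Final Run v07 (GPM IMERG-v07). Both cover 2001-2017 at a monthly temporal resolution and a spatial resolution of 0.1°×0.1°, and were obtained from the official NASA website (https://pmm.nasa.gov/). Hereafter, GPM IMERG-v06 and GPM IMERG-v07 are referred to as GPMv06 and GPMv07, respectively.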
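To compare a gridded product at 0.1°×0.1° with point observations, each site must first be matched to the grid cell that contains it. The sketch below shows one way to do this. It assumes a regular global grid whose cell edges start at 90°S and 180°W; the function names, the index convention (row counted from the south, column from the west), and the sample coordinates are hypothetical, and the actual array layout of a downloaded file should be checked against its metadata.

```python
import math

CELLS_PER_DEGREE = 10  # 0.1 degree resolution


def cell_index(lat, lon, cells_per_degree=CELLS_PER_DEGREE):
    """Return (row, col) of the grid cell containing a point.

    Rows count northward from 90S and columns eastward from 180W
    (an assumed convention, not necessarily the file's own layout).
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    n_rows = 180 * cells_per_degree
    n_cols = 360 * cells_per_degree
    # Clamp so that lat = 90 or lon = 180 falls in the last cell.
    row = min(math.floor((lat + 90.0) * cells_per_degree), n_rows - 1)
    col = min(math.floor((lon + 180.0) * cells_per_degree), n_cols - 1)
    return row, col


def cell_center(row, col, cells_per_degree=CELLS_PER_DEGREE):
    """Return the (lat, lon) of the center of a grid cell."""
    lat = -90.0 + (row + 0.5) / cells_per_degree
    lon = -180.0 + (col + 0.5) / cells_per_degree
    return lat, lon


# Hypothetical site coordinates, for illustration only.
row, col = cell_index(41.23, 82.97)
print(row, col, cell_center(row, col))
```

Multiplying by an integer number of cells per degree, rather than dividing by 0.1, reduces the chance of floating-point rounding placing a point in the wrong cell.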
SHEN H Y, LI J, WANG Z H, et al. Water resources utilization and eco-environment problem of Fenhe River, branch of Yellow River[J]. Geology in China, 2022, 49(4): 1127-1138.
DONG J P, YE Y T, GU J J, et al. Multi-temporal characterization analysis of remotely sensed precipitation downscaling in the Luanhe River basin, China[J]. Journal of Hydroelectric Engineering, 2022, 41(8): 77-91.
XIONG J H, GUO J, GUO S L, et al. Estimating probable maximum precipitation based on multisource data of precipitation in the Lancang-Mekong River basin[J]. Journal of Hydroelectric Engineering, 2022, 41(9): 77-86.
NAN T Y, CHEN J, DING Z W, et al. Deep learning-based multi-source precipitation merging for the Tibetan Plateau[J]. Science China Earth Sciences, 2023, 66(4): 852-870.
YU H, LIANG Z T, YAN Y C. Review on multi-source and multi-modal data fusion and integration[J]. Information Studies (Theory and Application), 2020, 43(11): 169-178.
PAN Y, SHEN Y, YU J J, et al. An experiment of high-resolution gauge-radar-satellite combined precipitation retrieval based on the Bayesian merging method[J]. Acta Meteor Sinica, 2015, 73(1): 177-186.
SHI Y J, WANG Z J, SUO Y. Evaluation of Haihe River basin precipitation resources based on multisource data fusion[J]. Advances in Water Science, 2022, 33(4): 602-613.
RUAN H H, ZHANG J M, XU J H, et al. An XGBoost-based geostatistical data fusion method for integrating hourly gauge-radar-satellite precipitation data by considering the temporal correlation characteristics of precipitation[J]. Journal of Tropical Meteorology, 2023, 39(3): 300-312.
[25] LU X Y, LI J, LIU Y, et al. Quantitative precipitation estimation in the Tianshan Mountains based on machine learning[J]. Remote Sensing, 2023, 15(16): e3962.
[26] KARBALAYE GHORBANPOUR A, HESSELS T, MOGHIM S, et al. Comparison and assessment of spatial downscaling methods for enhancing the accuracy of satellite-based precipitation over Lake Urmia basin[J]. Journal of Hydrology, 2021, 596: e126055.
DENG W B, HOU X Q. Construction of a snow cover prediction model in Xinjiang based on machine learning algorithm[J]. Journal of Basic Science and Engineering, 2024, 32(6): 1664-1677.
SUN Q, Aliya·Baidourela, Ilyas·Nurmuhammat. The response of NDVI to the changes of terrestrial water storage and precipitation in Tarim basin[J]. China Rural Water and Hydropower, 2018(2): 54-59.
LI W W. Study on the utilization of water and soil resources in Tarim River basin under the influence of climate change and human activities[D]. Nanjing: Nanjing University of Information Science and Technology, 2022.
LI Y Y, NING S W, DING W, et al. The evaluation of latest GPM-Era precipitation data in Yellow River basin[J]. Remote Sensing for Land and Resources, 2019, 31(1): 164-170.
GU J J, YE Y T, DONG J P, et al. A high-precision spatial downscaling method for remotely sensed precipitation data in the Luanhe River basin[J]. South-to-North Water Transfers and Water Science and Technology, 2021, 19(5): 862-873.
CAI C, RAN X T, XUE W, et al. Research on distributed support vector regression model under the background of big data[J]. Journal of Systems Science and Mathematical Sciences, 2023, 43(4): 1081-1092.
WANG R B, CAI X Y. Biota-sediment accumulation factor models of organic chemicals in benthic invertebrates with gradient boosting regression tree[J]. Asian Journal of Ecotoxicology, 2023, 18(4): 22-33.
DENG Y R, CHENG X D, TANG F, et al. The control of moldy risk during rice storage based on multivariate linear regression analysis and random forest algorithm[J]. JUSTC, 2022, 52(1): 44-51.
XIE Y Q. Spatial inversion of soil moisture in the Beijing-Tianjin-Hebei Region using integrated multi-source data and XGBoost algorithm[J]. Geospatial Information, 2024, 22(12): 20-24.
TAN J, WEI Q J, LIAO Z Y, et al. Relationship between urban form and surface temperature based on XGBoost SHAP interpretable machine learning model[J]. Chinese Journal of Applied Ecology, 2025, 36(3): 659-670.
JI C M, ZHAO Y W, ZHANG Y K, et al. Short-term power balance model considering cost of abandoned hydropower[J]. Journal of Hydroelectric Engineering, 2021, 40(3): 50-63.
[49] TYSON C, LONGYANG Q Q, NEILSON B T, et al. Effects of meteorological forcing uncertainty on high-resolution snow modeling and streamflow prediction in a mountainous karst watershed[J]. Journal of Hydrology, 2023, 619: e129304.
[50] AKSU H, YALDIZ S G. Performance comparison of GPM IMERG V07 with its predecessor V06 and its application in extreme precipitation clustering over Türkiye[J]. Atmospheric Research, 2025, 315: e107840.
[51] WANG Y J, LI Z, GAO L, et al. Comparison of GPM IMERG version 06 final Run products and its latest version 07 precipitation products across scales: Similarities, differences and improvements[J]. Remote Sensing, 2023, 15(23): e5622.
LIAN R L. Application of XGBoost algorithm in downscaling of GPM precipitation data in Sichuan Province[J]. Water Resources and Power, 2021, 39(10): 14-17.
[54] MA Z Q, XU J T, ZHU S Y, et al. AIMERG: A new Asian precipitation dataset (0.1°/half-hourly, 2000-2015) by calibrating GPM IMERG at daily scale using APHRODITE[J]. Earth System Science Data, 2020: 1525-1544.