Key features and predictive modelling of joint complications in diabetes mellitus based on an interpretable multi-task learning model
Complications are a major cause of death among patients with diabetes mellitus, and identifying their key features can help physicians design targeted interventions that lower the mortality risk of patients with complications. Most previous studies, however, have focused on risk factors for a single complication and have ignored potential associations between complications. Using the Diabetes Complications Early Warning Dataset provided by the National Population Health Sciences Data Centre, we therefore applied the Pearson correlation coefficient and the chi-square test to screen for significantly associated diabetic complications and modelled them jointly in a multi-task learning model. SHAP (SHapley Additive exPlanations) was then used to assess the importance of each feature, and the 11 features with SHAP values above the 75th percentile were taken as the important risk factors for joint diabetic complications. Prediction models for joint diabetic complications were built with random forest, logistic regression, gradient boosting, extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), and categorical boosting (CatBoost). The input variables were the features with SHAP values above the 25th percentile, the optimal hyperparameter combination was selected by grid search, and predictive performance was evaluated with accuracy, precision, F1-score, AUC, and other metrics. The results indicate that the features selected by the interpretable multi-task learning model are key features: all six prediction models achieved an AUC close to 0.90. Finally, LIME (Local Interpretable Model-Agnostic Explanations) was used to explain the models, further supporting the effectiveness and reliability of the interpretable multi-task learning model for screening key features. Because it takes the potential relationships between complications into account, the interpretable multi-task learning model can accurately identify the key risk factors for joint diabetic complications, assist physicians in formulating targeted intervention strategies, and help reduce deaths caused by complications.
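The two screening steps named in the abstract (testing whether a pair of complications is associated, and keeping features whose importance exceeds a percentile threshold) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the complication labels, the feature names, and the importance scores are made-up placeholders, and in the study the scores would be SHAP values computed from the trained multi-task model rather than hand-written numbers.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def chi_square_2x2(x, y):
    """Chi-square statistic and p-value (1 degree of freedom, no continuity
    correction) for two binary 0/1 lists. Assumes every row and column
    total of the 2x2 table is non-zero."""
    n = len(x)
    observed = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(x, y):
        observed[(a, b)] += 1
    row = {i: observed[(i, 0)] + observed[(i, 1)] for i in (0, 1)}
    col = {j: observed[(0, j)] + observed[(1, j)] for j in (0, 1)}
    stat = 0.0
    for (i, j), count in observed.items():
        expected = row[i] * col[j] / n
        stat += (count - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def select_above_percentile(importance, q):
    """Names of the features whose importance is strictly above the q-th percentile."""
    threshold = percentile(list(importance.values()), q)
    return [name for name, value in importance.items() if value > threshold]

# Hypothetical presence/absence labels of two complications in ten patients.
comp_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
comp_b = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
stat, p_value = chi_square_2x2(comp_a, comp_b)
print("Pearson r:", pearson_r(comp_a, comp_b))
print("chi-square:", stat, "p-value:", p_value)

# Hypothetical mean |SHAP| scores for eight placeholder features.
importance = {"feature_%d" % i: 0.01 * i for i in range(1, 9)}
print("risk factors (above 75th percentile):", select_above_percentile(importance, 75))
print("model inputs (above 25th percentile):", select_above_percentile(importance, 25))
```

The stricter 75th-percentile cut gives the short list reported as risk factors, while the looser 25th-percentile cut gives the larger set fed to the classifiers. For two binary variables the chi-square statistic without continuity correction equals n times the squared Pearson coefficient, so the two tests agree on the ranking of complication pairs.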
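Anhui Provincial Education and Teaching Reform Research Project (2024sx047)
Key Natural Science Research Project of Anhui Provincial Universities (2022AH050328)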