Objective To establish a diabetes prediction model from clinical indicators using four classifiers, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), adaptive boosting (AdaBoost), and multilayer perceptron (MLP), and to evaluate its screening performance. Methods Following a case-control design, 99 clinical attributes were collected from the study group and the control group and analyzed with Python 3.8. Linear interpolation and an inherently non-negative latent factor (INLF) model were then used to impute missing feature values, and classification models for detecting diabetes were built with the four classifiers. Results The models included 99 features from 3,241 patients with hypertension combined with diabetes (study group) and 4,181 patients with hypertension (control group). The accuracy of the diabetes classification models based on XGBoost, LightGBM, AdaBoost, and MLP was 0.8949, 0.8875, 0.8620, and 0.8566, respectively. Conclusion The proposed classifier framework based on INLF imputation screens well and offers a preliminary machine-learning approach to early diabetes screening. It has practical significance for clinical diagnosis and can serve as a simple and effective screening method for diabetes and its complications.
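This retrospective study was based on the front pages of anonymized electronic medical records (EMR) from Chongqing Qianjiang Central Hospital for 2017 to 2019. All diagnoses were coded according to ICD-10 (International Classification of Diseases, 10th Revision). The chronic disease laboratory data came from the 2017 to 2019 historical records in the laboratory information system (LIS) of Chongqing Qianjiang Central Hospital and covered many test items, including blood, body fluid, biochemistry, and immunology tests. To demonstrate the effectiveness of the proposed strategy, two chronic disease datasets (hypertension, and hypertension combined with diabetes) were built according to the inclusion and exclusion criteria. The study group comprised 3,241 inpatients diagnosed with essential hypertension (I10), diabetes (E10 to E14), and their complications. The control group comprised 4,181 patients with essential hypertension (I10) and its complications only (excluding diabetes).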
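The group definitions above can be read as a rule over ICD-10 codes. The following is a minimal sketch of that rule, assuming each patient carries a list of ICD-10 code strings; the helper name `assign_group` and the labels it returns are hypothetical, and the study's full inclusion and exclusion criteria are not reproduced here.

```python
from typing import Iterable, Optional

# ICD-10 three-character categories named in the text.
HYPERTENSION = "I10"
DIABETES = {"E10", "E11", "E12", "E13", "E14"}


def assign_group(codes: Iterable[str]) -> Optional[str]:
    """Return 'study', 'control', or None for one patient's diagnosis codes.

    Hypothetical helper: only the I10 / E10-E14 rule from the text is applied.
    """
    # Compare on the three-character category, so "E11.9" counts as "E11".
    categories = {code.strip().upper()[:3] for code in codes}
    if HYPERTENSION not in categories:
        return None  # no essential hypertension: belongs to neither dataset
    if categories & DIABETES:
        return "study"  # hypertension combined with diabetes
    return "control"  # hypertension without diabetes


print(assign_group(["I10", "E11.9"]))  # study
print(assign_group(["I10", "I25.1"]))  # control
print(assign_group(["E11.9"]))         # None
```

Matching on the three-character category keeps the rule independent of how finely a given record subdivides the diagnosis.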
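1.1.2 Data preprocessing
Data integration. A data linkage method was used to integrate and clean the admission diagnosis data from the hospital information system (HIS) and the clinical laboratory data from the LIS, and records were then selected according to the inclusion and exclusion criteria.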
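The text does not specify how the HIS and LIS records were linked. As one illustration, the sketch below joins the two sources on a shared admission identifier and drops empty laboratory results. The field names `admission_id`, `item`, and `value`, the function `link_records`, and the choice to keep the first result per test item are all assumptions made for this example, not the authors' procedure.

```python
from collections import defaultdict
from typing import Any, Dict, List


def link_records(his_rows: List[Dict[str, Any]],
                 lis_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Join HIS admissions with LIS results on 'admission_id' (assumed key)."""
    tests_by_admission: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for row in lis_rows:
        value = row.get("value")
        if value is None:
            continue  # cleaning step: skip empty laboratory results
        # Assumption: keep the first recorded result for each test item.
        tests_by_admission[row["admission_id"]].setdefault(row["item"], value)

    linked = []
    for admission in his_rows:
        tests = tests_by_admission.get(admission["admission_id"])
        if tests:  # keep only admissions that have at least one usable result
            linked.append({**admission, "tests": dict(tests)})
    return linked


# Toy records with hypothetical field names and values.
his = [
    {"admission_id": "A1", "diagnoses": ["I10", "E11"]},
    {"admission_id": "A2", "diagnoses": ["I10"]},
]
lis = [
    {"admission_id": "A1", "item": "GLU", "value": 7.8},
    {"admission_id": "A1", "item": "GLU", "value": 8.1},
    {"admission_id": "A1", "item": "HbA1c", "value": None},
    {"admission_id": "A3", "item": "GLU", "value": 5.2},
]
linked = link_records(his, lis)
print(linked)
```

In this toy run only admission A1 survives: A2 has no laboratory results, and A3 has no matching admission record.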