Abstract
Existing graph representation learning methods for electronic health records (EHR) rely mainly on the local information of individual patients and overlook potential associations among patients in disease progression and treatment pathways, which limits the generalizability and robustness of the models. To address this issue, a hybrid multi-level graph neural network (H-MGNN) model was proposed and applied to mortality prediction for intensive care unit (ICU) patients. The model constructed a patient-patient graph (P-P) at the macroscopic level and a taxonomy-note-word hypergraph (T-N-W) at the microscopic level, and incorporated temporal dependencies within the hypergraph to fuse patient features across multiple scales. In addition, a hybrid embedding (Hybrid-E) algorithm was designed to extract and integrate the latent features of patient embeddings and thereby improve prediction accuracy. Experimental results on the medical information mart for intensive care III (MIMIC-III) dataset demonstrate that H-MGNN significantly outperforms existing methods on in-hospital mortality prediction and other tasks, validating its effectiveness and superiority in complex EHR data mining.
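Early studies mostly used time-series data to predict patients' health status [2-5]. Researchers subsequently introduced unstructured clinical notes into prediction tasks [6-9]. However, clinical notes are hard to interpret and complex in structure, which makes mining information from them difficult. As pre-trained language models (PLMs) matured [10-11], researchers pre-trained models on large-scale corpora and then adapted them to clinical notes through fine-tuning or transfer learning, substantially improving text understanding and prediction accuracy. Further, the development of large language models (LLMs), together with maturing techniques that strengthen their retrieval and reasoning, such as retrieval-augmented generation (RAG), has provided a new research paradigm for mining information from clinical notes [12].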
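Temporal characteristics are likewise a key factor in prediction performance in EHR research. For example, Refs. [13-15] combined temporal features with pre-trained models to build a series of Transformer-based models that performed well on multiple clinical tasks. For temporal modeling of clinical notes, Refs. [16-17] used interpolation and related methods to handle irregular time intervals and combined them with pre-trained models such as BERT (bidirectional encoder representations from Transformers) and the Transformer, further improving the modeling of unstructured clinical data. Although these models perform well, they are usually computationally expensive and concentrate on modeling temporal dependencies, paying little attention to the complex structural relationships in EHR.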
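Refs. [16-17] are only summarized above. As one minimal illustration of what "interpolation" means for irregularly sampled data, the sketch below linearly resamples measurements taken at irregular times onto a regular grid. The function name `resample_linear`, the hourly grid, and the choice to hold the first and last observations constant outside the observed range are assumptions of this sketch, not the procedures of those works.

```python
from bisect import bisect_right


def resample_linear(times, values, grid):
    """Linearly interpolate irregular (time, value) samples onto grid points.

    Assumes `times` is sorted in strictly increasing order. Grid points
    outside the observed range take the first/last observed value
    (an assumption made for this sketch).
    """
    if len(times) != len(values) or not times:
        raise ValueError("times and values must be non-empty and of equal length")
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            j = bisect_right(times, t)  # now times[j - 1] <= t < times[j]
            t0, t1 = times[j - 1], times[j]
            v0, v1 = values[j - 1], values[j]
            w = (t - t0) / (t1 - t0)  # relative position of t inside [t0, t1)
            out.append(v0 + w * (v1 - v0))
    return out


# Heart-rate readings at irregular hours (illustrative numbers, not MIMIC data).
obs_t = [0.0, 1.5, 4.0]
obs_v = [80.0, 92.0, 86.0]
print(resample_linear(obs_t, obs_v, [0, 1, 2, 3, 4, 5]))
```

Linear interpolation is used here only as the simplest scheme for aligning irregular samples to fixed time steps.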
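In recent years, the rise of graph neural networks (GNNs) [18-19] has provided effective tools for representing complex non-Euclidean structures, and GNNs have achieved notable success in natural language processing [20-21]. Inductive GNNs such as TextING (inductive text classification via graph neural networks) [22], InducT-GCN (inductive text graph convolutional network) [23], and SSL-GNN (sparse structure learning via graph neural networks) [24] effectively alleviate the problems caused by initializing a single global graph and the resulting weak generalization. When building graphs from long texts, researchers have proposed graph sparsification strategies and hypergraph modeling methods to cope with overly dense edges, such as HyperGAT (hypergraph attention network) [25] and HEGEL (hypergraph transformer for long document summarization) [26]. For modeling complex relations, hypergraphs use hyperedges to establish many-to-many connections, which makes them naturally suited to hierarchical structures and higher-order relations. However, existing GNN methods mostly focus on the internal structure of a single patient's EHR and ignore the associations among patients, such as semantic and pathological relations. How to build cross-patient relation graphs from large-scale EHR data and strengthen the expressiveness of individual embeddings through graph learning therefore remains a clinically meaningful task. The EHR graph representation problem becomes especially complex and challenging when the relation graph contains tens of thousands of nodes and millions of edges.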
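The many-to-many relation carried by a hyperedge can be made concrete with the two-stage aggregation that hypergraph neural networks build on: node features are first pooled into each hyperedge, and hyperedge features are then pooled back into the nodes they contain. HyperGAT [25], for example, performs both stages with attention. The minimal sketch below uses plain unweighted means and no learnable parameters; reading nodes as words and hyperedges as notes is only an illustration, and the function name `hypergraph_mean_propagate` is hypothetical, not part of H-MGNN or any library.

```python
def hypergraph_mean_propagate(node_feats, hyperedges):
    """One round of node -> hyperedge -> node mean aggregation.

    node_feats: list of feature vectors (lists of floats), one per node.
    hyperedges: list of non-empty lists of distinct node indices; a single
        hyperedge may join any number of nodes (many-to-many relation).
    Nodes that belong to no hyperedge keep their original features.
    """
    dim = len(node_feats[0])

    # Stage 1: each hyperedge averages the features of all its member nodes.
    edge_feats = []
    for members in hyperedges:
        acc = [0.0] * dim
        for v in members:
            for k in range(dim):
                acc[k] += node_feats[v][k]
        edge_feats.append([x / len(members) for x in acc])

    # Stage 2: each node averages the features of all hyperedges containing it.
    sums = [[0.0] * dim for _ in node_feats]
    counts = [0] * len(node_feats)
    for e, members in enumerate(hyperedges):
        for v in members:
            counts[v] += 1
            for k in range(dim):
                sums[v][k] += edge_feats[e][k]

    return [
        [x / counts[v] for x in sums[v]] if counts[v] else list(node_feats[v])
        for v in range(len(node_feats))
    ]


# Five "word" nodes; two "note" hyperedges. Word 2 appears in both notes,
# word 4 in neither.
feats = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 1.0], [9.0, 9.0]]
notes = [[0, 1, 2], [2, 3]]
out = hypergraph_mean_propagate(feats, notes)
print(out)  # → [[2.0, 0.0], [2.0, 0.0], [2.75, 0.25], [3.5, 0.5], [9.0, 9.0]]
```

Node 2, shared by both hyperedges, receives information from nodes 0, 1 and 3 in a single round, which is the higher-order mixing that a pairwise graph would need several hops to achieve.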
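Graph embedding learning aims to map the structural information of a graph into a low-dimensional space while preserving the semantic relations between nodes, and it is widely used in downstream tasks such as node classification and link prediction. Much of its theoretical motivation comes from the distributional hypothesis in natural language processing, namely that words occurring in similar contexts have similar meanings, or in J. R. Firth's words, "You shall know a word by the company it keeps".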
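Early classical methods such as multidimensional scaling (MDS) [45], IsoMap [46], locally linear embedding (LLE) [47], and Laplacian eigenmaps [48] typically first build a neighborhood graph (e.g., a K-nearest-neighbor graph) from feature vectors and then reduce its dimensionality. Most of these methods rely on eigendecomposition of the adjacency or Laplacian matrix, whose computational complexity is at least quadratic in the number of nodes, so they are hard to scale to large graphs. To ease this efficiency bottleneck, researchers proposed a variety of graph embedding methods based on sampling and matrix factorization. For example, factorization methods such as GraRep [49] and HOPE (high-order proximity preserved embedding) [50] capture the high-order structure of a graph through matrix factorization, but may lose global semantic information. DeepWalk [51] was the first to introduce truncated random walks and the Skip-Gram model, recasting graph embedding as a language-modeling-like problem and markedly improving scalability. LINE (large-scale information network embedding) [52] builds on this by explicitly modeling first- and second-order proximity, effectively preserving both the local and the global structure of the graph; the later PTE (predictive text embedding) [53] extends this to heterogeneous graphs and supports modeling text nodes together with category information. Note that traditional graph embedding methods mostly adopt unsupervised learning: node labels are not used while the embeddings are learned, and supervision enters only through the downstream classifier. Even so, these methods integrate node context, local proximity, and global topology during training, so the learned representations are general-purpose and can be applied to many graph-related tasks. With the development of GNNs, embedding learning has increasingly been combined with semi-supervised learning, giving downstream tasks richer and more structure-aware representations. More recently, hypergraph representation learning has been introduced for modeling higher-order structural relations, further extending the scope of graph embedding research.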
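The step that lets DeepWalk [51] reuse Skip-Gram, turning a graph into a "corpus" of node sequences, can be sketched as follows: uniform truncated random walks are started from every node, and (center, context) pairs are then read off each walk within a sliding window. Nodes with similar neighbourhoods co-occur in many walks and therefore receive similar embeddings once Skip-Gram is trained on these pairs. This is a minimal sketch under stated assumptions: the helper names, the toy graph, and the parameter values are illustrative, and the Skip-Gram training itself (with hierarchical softmax in the original DeepWalk) is omitted.

```python
import random


def truncated_random_walks(adj, walks_per_node, walk_length, seed=0):
    """Generate DeepWalk-style uniform truncated random walks.

    adj: dict mapping each node to a list of its neighbours.
    Each walk moves to a uniformly chosen neighbour until `walk_length`
    nodes have been visited or a node without neighbours is reached.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def skipgram_pairs(walk, window):
    """(center, context) pairs within `window` positions, as Skip-Gram consumes them."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Toy undirected graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = truncated_random_walks(graph, walks_per_node=2, walk_length=5)
print(walks[0], skipgram_pairs(walks[0], window=2)[:4])
```

Because walks only ever step along edges, the context window of a node contains its multi-hop neighbourhood, which is what makes the language-modeling analogy work.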
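MIMIC-III [1] is a publicly available clinical database containing the medical records of about 46,520 patients treated in the ICUs of Beth Israel Deaconess Medical Center between 2001 and 2012. It covers demographics, physiological measurements, laboratory tests, medical interventions, medication use, free-text notes, and diagnosis and procedure codes, stored in 26 linked tables.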
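This paper compares H-MGNN against four categories of baseline methods, based on words, sequences, graphs, and hypergraphs, and trims the dataset as needed to fit each baseline model. The word-embedding baseline is FastText [55], which generates word vectors from word context and effectively alleviates the data sparsity caused by low-frequency words. The sequential baselines are the bi-directional long short-term memory network (Bi-LSTM) [56], which improves text classification by capturing bidirectional sequential information in text, and the Bi-LSTM with attention (Bi-LSTM-Att) [57], which uses an attention mechanism to automatically focus on the important parts of the input sequence. The graph baselines are TextING (inductive text classification via graph neural networks) [22], which builds a word co-occurrence graph for each text and uses a GNN to extract dependencies between words, and the inductive text graph convolutional network (InducT-GCN) [23], which models text as a graph with a GNN and can classify documents unseen during training. The hypergraph baselines are the hypergraph attention network (HyperGAT) [25], which combines hypergraph structure with an attention mechanism to strengthen representational capacity, and the taxonomy-aware multi-level hypergraph neural network (TM-HGNN) [58], which models complex text structure with a taxonomy-aware hypergraph to improve classification.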
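To show concretely what "focusing on the important parts of the input sequence" means for Bi-LSTM-Att, the sketch below implements the attention pooling formulated by Zhou et al. (listed in the references): given hidden states H, it computes M = tanh(H), alpha = softmax(w^T M), r = H alpha^T, and h* = tanh(r). It is a plain-Python sketch under assumptions: the Bi-LSTM that would produce the hidden states is omitted, and the scoring vector `w`, which is learned in the real model, is fixed by hand.

```python
import math


def attention_pool(hidden, w):
    """Attention pooling over a sequence of hidden states.

    hidden: list of T hidden-state vectors of size d (e.g. Bi-LSTM outputs).
    w: scoring vector of size d (learned in practice; fixed here).
    Returns (h_star, alpha): the pooled representation and the attention weights.
    """
    # Score each time step: w^T tanh(h_t).
    scores = [sum(wk * math.tanh(hk) for wk, hk in zip(w, h)) for h in hidden]
    # Softmax over time steps, shifted by the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]
    # r = sum_t alpha_t * h_t, then squash with tanh.
    d = len(hidden[0])
    r = [sum(alpha[t] * hidden[t][k] for t in range(len(hidden))) for k in range(d)]
    return [math.tanh(x) for x in r], alpha


states = [[0.1, 0.2], [2.0, 1.5], [0.0, -0.3]]
h_star, alpha = attention_pool(states, w=[1.0, 1.0])
print([round(a, 3) for a in alpha], [round(x, 3) for x in h_star])
```

Because the weights `alpha` sum to 1, `h_star` is a tanh-squashed weighted average of the hidden states, dominated here by the second time step.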
[1]
Johnson A E W, Pollard T J, Shen L, et al. MIMIC-III, a freely accessible critical care database[J]. Scientific Data, 2016, 3: 160035.
[2]
Lipton Z C, Kale D C, Elkan C, et al. Learning to diagnose with LSTM recurrent neural networks[C]//International Conference on Learning Representations (ICLR). San Juan, 2016: 1-8.
[3]
Che Z P, Purushotham S, Cho K, et al. Recurrent neural networks for multivariate time series with missing values[J]. Scientific Reports, 2018, 8: 6085.
[4]
Malone B, Garcia-Duran A, Niepert M. Learning representations of missing data for predicting patient outcomes[EB/OL]. (2018-12-12)[2025-02-18].
[5]
Xu Y B, Biswal S, Deshpande S R, et al. RAIM: recurrent attentive and intensive model of multimodal patient monitoring data[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’18). London, 2018: 2565-2573.
[6]
Ngo Q H, Kechadi T, Le-Khac N A. Domain specific entity recognition with semantic-based deep learning approach[J]. IEEE Access, 2021, 9: 152892-152902.
[7]
Rasmy L, Xiang Y, Xie Z Q, et al. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction[J]. NPJ Digital Medicine, 2021, 4: 86.
[8]
Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining[J]. Bioinformatics, 2020, 36(4): 1234-1240.
[9]
Alsentzer E, Murphy J R, Boag W, et al. Publicly available clinical BERT embeddings[EB/OL]. (2019-04-06)[2025-01-10].
[10]
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30: 6000-6010.
[12]
Lewis P, Perez E, Piktus A, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks[J]. Advances in Neural Information Processing Systems, 2020, 33: 9459-9474.
[13]
Song H, Rajan D, Thiagarajan J, et al. Attend and diagnose: clinical time series analysis using attention models[C]//AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press, 2018: 4091-4098.
[14]
Hirszowicz O, Aran D. ICU bloodstream infection prediction: a transformer-based approach for EHR analysis[C]//Artificial Intelligence in Medicine. Cham: Springer, 2024: 279-292.
[15]
Li Y K, Rao S, Solares J R A, et al. BEHRT: transformer for electronic health records[J]. Scientific Reports, 2020, 10: 7155.
[16]
Pennington J, Socher R, Manning C. GloVe: global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha: ACL, 2014: 1532-1543.
[17]
Tipirneni S, Reddy C K. Self-supervised transformer for sparse and irregularly sampled multivariate clinical time-series[J]. ACM Transactions on Knowledge Discovery from Data, 2022, 16(6): 105.1-105.17.
[18]
Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2016-09-09)[2020-01-05].
[20]
Liu X E, You X X, Zhang X, et al. Tensor graph convolutional networks for text classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI Press, 2020: 8409-8416.
[21]
Yao L, Mao C S, Luo Y. Graph convolutional networks for text classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu: AAAI Press, 2019: 7370-7377.
[22]
Zhang Y F, Yu X L, Cui Z Y, et al. Every document owns its structure: inductive text classification via graph neural networks[EB/OL]. (2020-04-22)[2021-05-10].
[23]
Wang K Z, Han S C, Poon J. InducT-GCN: inductive graph convolutional networks for text classification[C]//2022 26th International Conference on Pattern Recognition (ICPR). Montreal: IEEE, 2022: 1243-1249.
[24]
Piao Y H, Lee S S, Lee D, et al. Sparse structure learning via graph neural networks for inductive document classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver, 2022: 11165-11173.
[25]
Ding K Z, Wang J L, Li J D, et al. Be more with less: hypergraph attention networks for inductive text classification[EB/OL]. (2020-11-01)[2023-05-10].
[26]
Zhang H P, Liu X, Zhang J W. HEGEL: hypergraph transformer for long document summarization[EB/OL]. (2022-08-09)[2023-05-10].
[27]
Park S, Bae S, Kim J, et al. Graph-text multi-modal pre-training for medical representation learning[C]//ACM Conference on Health, Inference, and Learning. Online, 2022: 261-281.
[28]
Zhang C H, Chu X, Ma L T, et al. M3Care: learning with missing modalities in multimodal healthcare data[C]//Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Washington DC, 2022: 2418-2428.
[29]
Xu Y X, Yang K, Zhang C H, et al. VecoCare: visit sequences-clinical notes joint learning for diagnosis prediction in healthcare data[C]//Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. Macau, 2023: 4921-4929.
[30]
Chen D X, O’Bray L, Borgwardt K M. Structure-aware transformer for graph representation learning[C]//International Conference on Machine Learning. Online, 2022: 3469-3489.
[31]
Choi E, Bahadori M T, Song L, et al. GRAM: graph-based attention model for healthcare representation learning[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Halifax NS: ACM, 2017: 787-795.
[32]
Qiu L, Gorantla S, Rajan V, et al. Multi-disease predictive analytics: a clinical knowledge-aware approach[J]. ACM Transactions on Management Information Systems, 2021, 12(3): 1-34.
[33]
Ma J T, Liu B, Li K L, et al. A review of graph neural networks and pretrained language models for knowledge graph reasoning[J]. Neurocomputing, 2024, 609: 128490.
[34]
Mo X, Ding G H, Tang R, et al. Bipartite graphs contrastive learning with knowledge-aware diffusion-enhanced[J]. IEEE Transactions on Network Science and Engineering, 2025, 12(5): 4182-4195.
[35]
Mishra R, Shridevi S. Knowledge graph driven medicine recommendation system using graph neural networks on longitudinal medical records[J]. Scientific Reports, 2024, 14: 25449.
[36]
Gaupp R, Dinius J, Drazic I, et al. Long-term effects of an e-learning course on patient safety: a controlled longitudinal study with medical students[J]. PLoS One, 2019, 14(1): e0210947.
[37]
Gupta S, Sharma S, Sharma R, et al. Healing with hierarchy: hierarchical attention empowered graph neural networks for predictive analysis in medical data[J]. Artificial Intelligence in Medicine, 2025, 165: 103134.
[38]
Zhang D D, Yin C C, Zeng J C, et al. Combining structured and unstructured data for predictive models: a deep learning approach[J]. BMC Medical Informatics and Decision Making, 2020, 20: 280.
[39]
Gayathri R, Sangeetha S K B, Sangeetha R, et al. Dynamic AI-enhanced therapeutic framework for precision medicine using multi-modal data and patient-centric reinforcement learning[J]. IEEE Access, 2025, 13: 77709-77733.
[40]
Huang K X, Singh A, Chen S T, et al. Clinical XLNet: modeling sequential clinical notes and predicting prolonged mechanical ventilation[EB/OL]. (2019-12-27)[2020-10-10].
[41]
Hou L X, Zhuang Y, Xie Y H, et al. Cross-modal generalizable visual-language models via inter-modal bidirectional supervision for enhanced pathology image recognition[J]. Pattern Recognition, 2026, 171: 112240.
[42]
Hastuti R P, Rajagede R A, Zheng M, et al. Clinic-prompt: few-shot discrete clinical prompt optimization[C]//Workshop on Large Language Models and Generative AI for Health at AAAI 2025. Philadelphia, 2025: 2451490.
[43]
Mulyar A, Schumacher E, Rouhizadeh M, et al. Phenotyping of clinical notes with improved document classification models using contextualized neural language models[EB/OL]. (2019-10-30)[2021-01-02].
[44]
Kruskal J B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis[J]. Psychometrika, 1964, 29(1): 1-27.
[45]
Haugen E, Firth J R. Papers in linguistics 1934-1951[J]. Language, 1958, 34(4): 498-502.
[46]
Tenenbaum J B, de Silva V, Langford J C. A global geometric framework for nonlinear dimensionality reduction[J]. Science, 2000, 290(5500): 2319-2323.
[47]
Roweis S T, Saul L K. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290(5500): 2323-2326.
[48]
Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering[C]//Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press, 2002: 585-592.
[49]
Cao S S, Lu W, Xu Q K. GraRep: learning graph representations with global structural information[C]//Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. Melbourne, 2015: 891-900.
[50]
Ou M D, Cui P, Pei J, et al. Asymmetric transitivity preserving graph embedding[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco: ACM, 2016: 1105-1114.
[51]
Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014: 701-710.
[52]
Tang J, Qu M, Wang M Z, et al. LINE: large-scale information network embedding[EB/OL]. (2015-03-12)[2020-03-11].
[53]
Tang J, Qu M, Mei Q Z. PTE: predictive text embedding through large-scale heterogeneous text networks[C]//Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Sydney, NSW: ACM, 2015: 1165-1174.
[54]
Harutyunyan H, Khachatrian H, Kale D C, et al. Multitask learning and benchmarking with clinical time series data[J]. Scientific Data, 2019, 6: 96.
[55]
Kim N, Piao Y H, Kim S. Clinical note owns its hierarchy: multi-level hypergraph neural networks for patient-level representation learning[EB/OL]. (2023-05-16)[2025-02-20].
[56]
Zhou P, Shi W, Tian J, et al. Attention-based bidirectional long short-term memory networks for relation classification[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin: ACL, 2016: 207-212.
[57]
Wang Z H, Yang B. Attention-based bidirectional long short-term memory networks for relation classification using knowledge distillation from BERT[C]//2020 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). Calgary: IEEE, 2020: 562-568.
[58]
Joulin A, Grave E, Bojanowski P, et al. Bag of tricks for efficient text classification[EB/OL]. (2016-07-06)[2023-05-10].