Abstract
In medical follow-up studies, longitudinal observations (e.g., repeatedly measured biomarkers or symptom scores) are often closely associated with time-to-event data (e.g., disease progression or death). Traditional approaches that analyze the two data types separately neglect both this association and measurement error, and can therefore yield biased statistical inference. Joint models (JMs) link a longitudinal sub-model and a survival sub-model through shared random effects, thereby correcting for measurement error in the repeated measurements and improving estimation efficiency and statistical power. Frequentist JMs are feasible in simple scenarios, but they face computational and inferential challenges with high-dimensional data, nonlinear trajectories, or complex missing-data mechanisms. Bayesian JMs, fitted with Markov chain Monte Carlo (MCMC) methods, combine prior distributions with posterior sampling and offer advantages in the robustness of parameter estimation, model extensibility, and dynamic prediction performance. This article introduces the methodological framework of Bayesian JMs, including: (1) specification of the longitudinal sub-model (e.g., a linear mixed-effects model) and the survival sub-model (e.g., a Cox proportional hazards model); (2) three common association structures (current value, current slope, and cumulative area); (3) Bayesian parameter estimation via MCMC; (4) individualized dynamic prediction and evaluation of model performance. Using primary biliary cirrhosis (PBC) as a case study, we demonstrate the practical workflow of Bayesian JMs, from the selection of clinical predictors, through the fitting and comparison of single- and multi-marker JMs, to the validation of predictive performance with time-dependent ROC curves. The case study shows that Bayesian JMs can effectively integrate longitudinal trajectory information, dynamically update individual survival probabilities, and provide quantitative support for precise clinical decision-making.
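For parameter estimation, early studies relied mainly on frequentist maximum likelihood or partial likelihood estimation [5]. However, for complex models involving multiple outcomes, competing risks, or high-dimensional random effects, these methods are sensitive to distributional assumptions and prone to unstable estimates. Bayesian methods based on Markov chain Monte Carlo (MCMC) algorithms offer an alternative: prior distributions are specified for the parameters and combined with the data to obtain the posterior distribution, on which inference is based. This improves the robustness and flexibility of parameter estimation, accommodates multiple event endpoints and composite longitudinal markers naturally [7], and does not require a normality assumption for the random-effects distribution [8]. Bayesian joint modeling thus relaxes the limitations of traditional frequentist methods in both model complexity and distributional assumptions, and provides a more complete statistical framework for the integrated analysis of multidimensional, multi-type clinical data.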
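As a sketch of what "combining the prior with the data" means here, the joint posterior of a shared-random-effects model is often written in the factorized form below. The notation is an assumption introduced for illustration and does not come from the article: \theta collects all model parameters, b_i are the random effects of subject i, y_{ij} is the j-th longitudinal measurement, and (T_i, \delta_i) are the observed event time and event indicator. The factorization assumes that, given b_i, the longitudinal and survival processes are conditionally independent.

```latex
p(\theta, b \mid y, T, \delta) \;\propto\;
\prod_{i=1}^{n} \Bigg[ \prod_{j=1}^{n_i} p\big(y_{ij} \mid b_i, \theta\big) \Bigg]\,
p\big(T_i, \delta_i \mid b_i, \theta\big)\,
p\big(b_i \mid \theta\big)\; p(\theta)
```

MCMC draws samples from this posterior without evaluating its normalizing constant, which is why the approach scales to models whose likelihood has no closed form after integrating out b_i.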
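Within the Bayesian joint modeling framework, model fit is usually assessed with information criteria such as the deviance information criterion (DIC) and the widely applicable information criterion (WAIC). DIC trades off the posterior mean deviance against the effective number of parameters; smaller values indicate a better balance between fit and complexity [17]. WAIC assesses out-of-sample predictive ability from the log pointwise predictive density, penalized by an effective number of parameters, and smaller values are likewise preferred [18]. Both criteria are applicable to models with complex random-effects structures.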
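The two criteria can be illustrated with a minimal sketch that computes them from posterior draws. Everything below is an assumption for illustration: the function name `dic_waic`, the toy normal model with known variance, and the simulated data are not taken from the article or from any joint-modeling package, and the formulas follow the commonly used definitions (DIC = mean deviance + pD; WAIC = -2 (lppd - p_WAIC)).

```python
import numpy as np


def dic_waic(loglik_draws, loglik_at_mean):
    """Return (DIC, pD, WAIC, p_waic) from pointwise log-likelihoods.

    loglik_draws   : array (S, n), log-likelihood of observation i under draw s
    loglik_at_mean : array (n,), log-likelihood at the posterior mean parameters
    """
    loglik_draws = np.asarray(loglik_draws, dtype=float)

    # DIC: posterior mean deviance plus the effective number of parameters pD.
    mean_deviance = -2.0 * loglik_draws.sum(axis=1).mean()
    deviance_at_mean = -2.0 * np.sum(loglik_at_mean)
    p_d = mean_deviance - deviance_at_mean
    dic = mean_deviance + p_d

    # WAIC: log pointwise predictive density (numerically stable log-mean-exp)
    # penalized by the summed posterior variance of the pointwise log-likelihood.
    shift = loglik_draws.max(axis=0)
    lppd = np.sum(shift + np.log(np.mean(np.exp(loglik_draws - shift), axis=0)))
    p_waic = np.sum(loglik_draws.var(axis=0, ddof=1))
    waic = -2.0 * (lppd - p_waic)
    return dic, p_d, waic, p_waic


def normal_loglik(y, mu):
    """Log-density of N(mu, 1) evaluated at y."""
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (y - mu) ** 2


# Toy model (hypothetical data): y_i ~ N(mu, 1) with a flat prior on mu,
# so the posterior of mu is N(mean(y), 1/n) and can be sampled directly.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=50)
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(y.size), size=4000)

loglik_draws = normal_loglik(y[None, :], mu_draws[:, None])  # shape (4000, 50)
loglik_at_mean = normal_loglik(y, mu_draws.mean())
dic, p_d, waic, p_waic = dic_waic(loglik_draws, loglik_at_mean)
print(f"DIC={dic:.2f} (pD={p_d:.2f}), WAIC={waic:.2f} (p_waic={p_waic:.2f})")
```

With a single free parameter, both effective-parameter estimates should be close to 1, which is a quick sanity check on the computation. In a real joint model the pointwise log-likelihood would come from the fitted sub-models rather than from a closed-form density.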
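Predictive performance is evaluated mainly in terms of discrimination and calibration. Discrimination is commonly assessed with the time-dependent ROC curve and its area under the curve (AUC); the dynamic AUC reflects the model's discriminative ability at different time points during follow-up. Calibration reflects the agreement between predicted and observed risk and can be assessed with the dynamic expected Brier score (BS) or with calibration plots [19].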
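The following is a simplified sketch of how a dynamic AUC and Brier score can be computed for a prediction window (t, t + dt]. It is an illustration under stated assumptions, not the estimator used in the article: subjects censored inside the window are simply dropped instead of being handled by inverse-probability-of-censoring weights, and the function names and the six-subject data set are hypothetical.

```python
import numpy as np


def window_sets(event_time, event, t, dt):
    """Boolean masks for cases (event in (t, t+dt]) and controls (alive after t+dt)."""
    event_time = np.asarray(event_time, dtype=float)
    event = np.asarray(event, dtype=int)
    at_risk = event_time > t
    cases = at_risk & (event == 1) & (event_time <= t + dt)
    controls = event_time > t + dt
    return cases, controls


def dynamic_auc(event_time, event, risk, t, dt):
    """Probability that a case has a higher predicted risk than a control."""
    risk = np.asarray(risk, dtype=float)
    cases, controls = window_sets(event_time, event, t, dt)
    diff = risk[cases][:, None] - risk[controls][None, :]
    # Concordant pairs count 1, tied pairs count 0.5.
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def dynamic_brier(event_time, event, surv_prob, t, dt):
    """Mean squared difference between survival status at t+dt and predicted survival."""
    surv_prob = np.asarray(surv_prob, dtype=float)
    cases, controls = window_sets(event_time, event, t, dt)
    usable = cases | controls              # drops subjects censored in the window
    survived = controls[usable].astype(float)
    return float(np.mean((survived - surv_prob[usable]) ** 2))


# Hypothetical follow-up data: times in years, event = 1 means the event occurred.
event_time = [2, 3, 4, 6, 7, 8]
event = [1, 1, 0, 1, 0, 0]
risk = np.array([0.9, 0.8, 0.5, 0.85, 0.3, 0.1])  # predicted P(event in window)

auc = dynamic_auc(event_time, event, risk, t=1, dt=4)
bs = dynamic_brier(event_time, event, 1.0 - risk, t=1, dt=4)
print(f"AUC(1, 5] = {auc:.3f}, Brier = {bs:.4f}")
```

Repeating the calculation over a grid of landmark times t gives the dynamic AUC and Brier curves described above. Dropping censored subjects biases both measures when censoring is heavy, so a weighted estimator is preferable in practice.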
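The model first uses a single longitudinal marker, serum bilirubin, as the response variable of the longitudinal sub-model. Because the linear mixed model requires an approximately normally distributed response, serum bilirubin was log-transformed to give log(serBilir). Covariates for the longitudinal and survival sub-models were selected separately by best-subset selection, with the lowest Akaike information criterion (AIC) as the criterion. For the longitudinal sub-model, four baseline variables were selected, including ascites and spider angiomas. For the survival sub-model, the same procedure, together with a test of the proportional hazards assumption based on Schoenfeld residuals, selected five variables, including sex, ascites, and edema. In addition, treatment (medication status) was included in both sub-models on clinical grounds.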
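The selection step can be sketched as an exhaustive search over covariate subsets that keeps the subset with the lowest AIC. The sketch below makes several simplifying assumptions that differ from the analysis described above: it uses ordinary least squares on a log-transformed response rather than a linear mixed model or a Cox model, it does not test the proportional hazards assumption, and the data and variable names (`x1` to `x4`) are simulated and hypothetical.

```python
import itertools

import numpy as np


def gaussian_aic(y, X):
    """AIC (up to an additive constant) of an OLS fit with an intercept."""
    n = y.size
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    k = design.shape[1] + 1  # regression coefficients plus the residual variance
    return n * np.log(rss / n) + 2 * k


def best_subset(y, X, names):
    """Evaluate every subset of columns and return (lowest AIC, selected names)."""
    best_aic, best_cols = np.inf, ()
    for size in range(len(names) + 1):
        for cols in itertools.combinations(range(len(names)), size):
            aic = gaussian_aic(y, X[:, list(cols)])
            if aic < best_aic:
                best_aic, best_cols = aic, cols
    return best_aic, [names[c] for c in best_cols]


# Simulated data (hypothetical): a right-skewed marker that depends on x1 and x2 only.
rng = np.random.default_rng(1)
names = ["x1", "x2", "x3", "x4"]
X = rng.normal(size=(200, 4))
marker = np.exp(0.5 + 1.0 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=200))

y = np.log(marker)  # log transform so the response is approximately normal
best_aic, selected = best_subset(y, X, names)
print("selected covariates:", selected)
```

Exhaustive search evaluates 2^p models, which is practical only for a modest number of candidate covariates. AIC tends to admit an occasional noise variable, so the selected set is best read alongside clinical judgment, as the analysis above does when it adds treatment to both sub-models.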
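At present, applications of Bayesian JMs in medicine are steadily deepening, but key challenges remain, including the integration of high-dimensional multimodal data, the handling of missing data, and the improvement of model interpretability. Future research could focus on the following directions: (1) extending the models to handle high-dimensional multimodal longitudinal data (e.g., radiomics and genomics data); (2) improving joint modeling methods for data that are missing not at random; (3) using techniques such as Bayesian model averaging (BMA) to combine predictions from models with different association structures and thereby improve robustness; (4) validating the clinical value of these models in large-scale clinical trials and real-world studies to advance precision medicine. With the continued improvement of computational tools and the growing demand for individualized prognostic assessment, Bayesian JMs are expected to play an important role in a wider range of medical research.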
RIZOPOULOS D.Joint modeling of longitudinal and time-to-event data:challenges and future directions[M/OL]//TORELLI N,PESARIN F,BAR-HEN A.Advances in Theoretical and Applied Statistics.Berlin,Heidelberg:Springer,2013:199-209.(2013-01-01)[2026-03-08].
[5]
TSIATIS A A, DEGRUTTOLA V, WULFSOHN M S.Modeling the relationship of survival to longitudinal data measured with error.Applications to survival and CD4 counts in patients with AIDS[J].J Am Stat Assoc,1995,90(429):27-37.
[6]
SWEETING M J, THOMPSON S G.Joint modelling of longitudinal and time-to-event data with application to predicting abdominal aortic aneurysm growth and rupture[J].Biom J,2011,53(5):750-763.
[7]
RIZOPOULOS D, GHOSH P.A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event[J].Stat Med,2011,30(12):1366-1380.
[8]
GOULD A L, BOYE M E, CROWTHER M J,et al.Joint modeling of survival and longitudinal non-survival data:Current methods and issues.Report of the DIA Bayesian joint modeling working group[J].Stat Med,2015,34(14):2181-2195.
[9]
RIZOPOULOS D.Joint models for longitudinal and time-to-event data:With applications in R[M].New York:Chapman and Hall/CRC,2012.
[10]
RIZOPOULOS D, HATFIELD L A, CARLIN B P,et al.Combining dynamic predictions from joint models for longitudinal and time-to-event data using Bayesian model averaging[J].J Am Stat Assoc,2014,109(508):1385-1397.
[11]
YE W, LIN X, TAYLOR J M G.Semiparametric modeling of longitudinal measurements and time-to-event data--a two-stage regression calibration approach[J].Biometrics,2008,64(4):1238-1246.
[12]
WOLBERS M, BABIKER A, SABIN C,et al.Pretreatment CD4 cell slope and progression to AIDS or death in HIV-infected patients initiating antiretroviral therapy--the CASCADE collaboration:a collaboration of 23 cohort studies[J].PLoS Med,2010,7(2):e1000239.
[13]
RIZOPOULOS D.The R package JMbayes for fitting joint models for longitudinal and time-to-event data using MCMC[J].J Stat Softw,2016,72:1-46.
[14]
GODANA A A, MOLLA B T, ABATIHUN D.Bayesian longitudinal modeling of blood pressure measurements of hypertensive patients at Wachemo University Nigist Elleni Mohamed Memorial Teaching and Referral Hospital Hosanna,southern Ethiopia[J].Heliyon,2023,9(12):e22984.
[15]
YU M, TAYLOR J M G, SANDLER H M.Individual prediction in prostate cancer studies using a joint longitudinal survival-cure model[J].J Am Stat Assoc,2008,103(481):178-187.
[16]
TAYLOR J M G, PARK Y, ANKERST D P,et al.Real-time individual predictions of prostate cancer recurrence using joint models[J].Biometrics,2013,69(1):206-213.
[17]
SPIEGELHALTER D J, BEST N G, CARLIN B P,et al.Bayesian measures of model complexity and fit[J].J R Statist Soc B,2002,64(4):583-639.
[18]
GELMAN A, HWANG J, VEHTARI A.Understanding predictive information criteria for Bayesian models[J].Stat Comput,2014,24(6):997-1016.
[19]
LI K, LUO S.Dynamic predictions in Bayesian functional joint models for longitudinal and time-to-event data:An application to Alzheimer's disease[J].Stat Methods Med Res,2019,28(2):327-342.
[20]
RIZOPOULOS D.JM:an R package for the joint modelling of longitudinal and time-to-event data[J].J Stat Softw,2010,35:1-33.
[21]
RIZOPOULOS D, MIRANDA-AFONSO P, PAPAGEORGIOU G.JMbayes2:extended joint models for longitudinal and time-to-event data[CP/OL].(2026-01-28)[2026-03-08].
[22]
LIU L, ZHENG C, KANG J.Exploring causality mechanism in the joint analysis of longitudinal and survival data[J].Stat Med,2018,37(26):3733-3744.
[23]
RAYNAUD M, AUBERT O, DIVARD G,et al.Dynamic prediction of renal survival among deeply phenotyped kidney transplant recipients using artificial intelligence:an observational,international,multicohort study[J].Lancet Digit Health,2021,3(12):e795-e805.