打开APP
userphoto
未登录

开通VIP,畅享免费电子书等14项超值服

开通VIP
复杂调查数据分析连载七
userphoto

2023.07.08 浙江

关注
加权生存分析
要将复杂的调查设计纳入生存分析,我们将使用
svykm() 计算生存函数的加权 Kaplan-Meier 估计, ·
svylogrank() 执行加权对数秩检验以比较组间的生存曲线,以及 ·
svycoxph() 执行加权 Cox 回归。
生存函数的加权 Kaplan-Meier 估计
示例 :使用 NHANES 2017-2018 数据 (nhanes1718_rmph.Rdata),估计首次诊断为哮喘的生存函数的加权 Kaplan-Meier 估计值。时间变量是第一次患哮喘的年龄(MCQ025),事件指标是 MCQ010——“曾被告知患有哮喘”)。
我们的目标是有一个事件指示器,对于那些被告知患有哮喘的人是1,对于其他人是0;时间变量是对于那些已经被告知患有哮喘的人来说是第一次被告知患有哮喘的年龄,而对于那些已经被告知患有哮喘的人来说是当前年龄(被审查)。我们首先需要格式化事件指标变量,然后为那些没有哮喘的人填写当前年龄。
注:这一步对于Kaplan-Meier估计、logrank检验和Cox回归是必要的。
# Format the event indicator
levels(nhanes.mod$MCQ010)
## [1] "Yes" "No"
nhanes.mod$asthma <- as.numeric(nhanes.mod$MCQ010 == "Yes")
table(nhanes.mod$MCQ010, nhanes.mod$asthma, exclude = NULL)
##       
##           0    1 <NA>
##   Yes     0 1325    0
##   No   7563    0    0
##   <NA>    0    0  366
# Create a copy of the age variable
nhanes.mod$asthma_age <- nhanes.mod$MCQ025
# Subset on asthma = 0 and replace the missing asthma ages with current age
SUB <- !is.na(nhanes.mod$asthma) & nhanes.mod$asthma == 0
nhanes.mod$asthma_age[SUB] <- nhanes.mod$RIDAGEYR[SUB]
# Check# For !SUB, these should be the same
summary(nhanes.mod$MCQ025[!SUB])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     1.0     3.0     8.0    16.6    25.0    80.0
##    NA's
##     387
summary(nhanes.mod$asthma_age[!SUB])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     1.0     3.0     8.0    16.6    25.0    80.0
##    NA's
##     387
# For SUB, the first is all missing# but the second has no missing values
summary(nhanes.mod$MCQ025[SUB])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##      NA      NA      NA     NaN      NA      NA
##    NA's
##    7563
summary(nhanes.mod$asthma_age[SUB])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     1.0    12.0    33.0    35.8    59.0    80.0
接下来,我们重新创建一个设计对象(在创建或修改变量时,我们总是必须重新创建该设计对象)。哮喘变量是NHANES访谈的一部分,因此我们使用访谈设计权重。
design.INT <- svydesign(strata=~SDMVSTRA, id=~SDMVPSU, weights=~WTINT2YR,                        nest=TRUE, survey.lonely.psu = "adjust",                        data=nhanes.mod)
接下来,我们使用svykm()来估计生存函数。
km.ex9.4 <- svykm(Surv(asthma_age, asthma) ~ 1,                  design = design.INT)km.ex9.4
## Weighted survival curve: svykm(formula = Surv(asthma_age, asthma) ~ 1, design = design.INT)
## Q1 = Inf  median = Inf  Q3 = Inf
plot(km.ex9.4, ylim = c(0.65, 1),     xlab = "Age (years)",     ylab = "Proportion without asthma")
 


比较组的加权LOGRANK检验
为了比较分组,我们使用公式右侧带有因子变量的svykm,然后使用svylogrank进行LOGRANK检验。
注意:如果您还没有这样做,请格式化事件指示器和时间变量。
示例 :使用 NHANES 2017-2018 数据 (nhanes1718_rmph.Rdata),使用加权对数秩检验比较种族/民族之间首次诊断哮喘的时间。**
km.ex9.4b <- svykm(Surv(asthma_age, asthma) ~ race_eth,                  design = design.INT)km.ex9.4b
## Weighted survival curves:
## svykm(formula = Surv(asthma_age, asthma) ~ race_eth, design = design.INT)
## Hispanic : Q1 = 80  median = Inf  Q3 = Inf
## Non-Hispanic White : Q1 = Inf  median = Inf  Q3 = Inf
## Non-Hispanic Black : Q1 = 62  median = Inf  Q3 = Inf
## Non-Hispanic Other : Q1 = 80  median = Inf  Q3 = Inf
# When plotting svykm objects, the lty# option must be in a list. When including# lty, ylim must also be in a list (without# lty, ylim works without being in a list)
plot(km.ex9.4b, pars=list(lty=1:4),     xlab = "Age (years)",     ylab = "Proportion without asthma",     ylim = list(c(0.65, 1)))legend(45, 1, levels(nhanes.mod$race_eth)[c(2,4,1,3)],       lty = (1:4)[c(2,4,1,3)])
 


# WARNING: svylogrank() will fail if there are missing values.# Create a domain with complete cases
nhanes.mod <- nhanes.mod %>%   mutate(domain_asthma = !is.na(asthma_age) & !is.na(asthma) & !is.na(race_eth))
# Recreate the design to include this variable
design.INT <- svydesign(strata=~SDMVSTRA, id=~SDMVPSU, weights=~WTINT2YR,                        nest=TRUE, survey.lonely.psu = "adjust",                        data=nhanes.mod)
# Create domain designdesign_asthma <- subset(design.INT, domain_asthma)
# Logrank test
svylogrank(Surv(asthma_age, asthma) ~ race_eth,                  design = design_asthma)
## [[1]]
##         score     se       z             p
## [1,] -2148289 902839 -2.3795 0.01733698168
## [2,]  1968539 369426  5.3286 0.00000009895
## [3,]   159713 730398  0.2187 0.82691048095
##
## [[2]]
##        Chisq            p
## 29.330856009  0.000001908
##
## attr(,"class")
## [1] "svylogrank"
svylogrank 不仅返回用于比较因子变量水平之间的生存曲线的全局 p 值,还返回用于比较每个水平与参考水平的 p 值。
 加权 Cox 回归
为了拟合加权Cox比例风险回归,我们使用svycoxph()。不幸的是,没有改变DF的选项,因此测试假设分母自由度是无限的。一个例外是我们的定制函数svycontract_design_df(),它确实使用了design df。
注:如果尚未设置事件指示器和时间变量的格式。
例:使用NHANES 2017-2018数据(NHANES 1718rmph.Rdata),使用Cox比例风险回归法比较种族/民族组之间首次诊断哮喘的时间。
cox.ex9.4b <- svycoxph(Surv(asthma_age, asthma) ~ race_eth,                   design = design.INT)
summary(cox.ex9.4b)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (30) clusters.
## svydesign(strata = ~SDMVSTRA, id = ~SDMVPSU, weights = ~WTINT2YR,
##     nest = TRUE, survey.lonely.psu = "adjust", data = nhanes.mod)
## Call:
## svycoxph(formula = Surv(asthma_age, asthma) ~ race_eth, design = design.INT)
##
##   n= 8867, number of events= 1304
##    (387 observations deleted due to missingness)
##
##                               coef exp(coef) se(coef)
## race_ethNon-Hispanic White -0.0809    0.9223   0.0770
## race_ethNon-Hispanic Black  0.3198    1.3768   0.0973
## race_ethNon-Hispanic Other  0.0288    1.0292   0.1075
##                            robust se     z Pr(>|z|)   
## race_ethNon-Hispanic White    0.1186 -0.68   0.4950   
## race_ethNon-Hispanic Black    0.1238  2.58   0.0098 **
## race_ethNon-Hispanic Other    0.2135  0.13   0.8927   
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##                            exp(coef) exp(-coef)
## race_ethNon-Hispanic White     0.922      1.084
## race_ethNon-Hispanic Black     1.377      0.726
## race_ethNon-Hispanic Other     1.029      0.972
##                            lower .95 upper .95
## race_ethNon-Hispanic White     0.731      1.16
## race_ethNon-Hispanic Black     1.080      1.76
## race_ethNon-Hispanic Other     0.677      1.56
##
## Concordance= 0.532  (se = 0.01 )
## Likelihood ratio test= NA  on 3 df,   p=NA
## Wald test            = 20.1  on 3 df,   p=0.0002
## Score (logrank) test = NA  on 3 df,   p=NA
##
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).
car::Anova(cox.ex9.4b, type = 3, test = "Wald")
## Analysis of Deviance Table (Type III tests)
##
## Response: Surv(asthma_age, asthma)
##          Df Chisq Pr(>Chisq)    
## race_eth  3  20.1    0.00016 ***
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# AHRs, 95% CIs, p-values
cbind(exp(summary(cox.ex9.4b)$coef[, "coef"]),      exp(confint(cox.ex9.4b)),      summary(cox.ex9.4b)$coef[, "Pr(>|z|)"])
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (30) clusters.
## svydesign(strata = ~SDMVSTRA, id = ~SDMVPSU, weights = ~WTINT2YR,
##     nest = TRUE, survey.lonely.psu = "adjust", data = nhanes.mod)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (30) clusters.
## svydesign(strata = ~SDMVSTRA, id = ~SDMVPSU, weights = ~WTINT2YR,
##     nest = TRUE, survey.lonely.psu = "adjust", data = nhanes.mod)
##                                    2.5 % 97.5 %
## race_ethNon-Hispanic White 0.9223 0.7310  1.164
## race_ethNon-Hispanic Black 1.3768 1.0802  1.755
## race_ethNon-Hispanic Other 1.0292 0.6773  1.564
##                                    
## race_ethNon-Hispanic White 0.495004
## race_ethNon-Hispanic Black 0.009803
## race_ethNon-Hispanic Other 0.892674
library(gtsummary)
# Table of regression coefficients
t1 <- cox.ex9.4b %>%  tbl_regression(estimate_fun = function(x) style_number(x, digits = 3),                 label  = list(race_eth ~ "Race/Ethnicity")) %>%  modify_column_hide(p.value) %>%  modify_caption("Cox regression results for time to asthma vs. race/ethnicity")
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (30) clusters.
## svydesign(strata = ~SDMVSTRA, id = ~SDMVPSU, weights = ~WTINT2YR,
##     nest = TRUE, survey.lonely.psu = "adjust", data = nhanes.mod)
# Table of hazard ratios
t2 <- cox.ex9.4b %>%  tbl_regression(exponentiate = T,  # HR = exp(B)                 estimate_fun = function(x) style_number(x, digits = 3),                 pvalue_fun   = function(x) style_pvalue(x, digits = 3),                 label  = list(race_eth ~ "Race/Ethnicity")) %>%  add_global_p(keep = T, test = "Wald")
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (30) clusters.
## svydesign(strata = ~SDMVSTRA, id = ~SDMVPSU, weights = ~WTINT2YR,
##     nest = TRUE, survey.lonely.psu = "adjust", data = nhanes.mod)
tbl_merge(  tbls = list(t1, t2),  tab_spanner = c("**Regression Coefficients**", "**Adjusted Hazard Ratio**"))
Table 9.6: Cox regression results for time to asthma vs. race/ethnicity
Characteristic
Regression Coefficients
Adjusted Hazard Ratio
log(HR)1
95% CI1
HR1
95% CI1
p-value
Race/Ethnicity




<0.001
Hispanic

Non-Hispanic White
-0.081
-0.313, 0.151
0.922
0.731, 1.164
0.495
Non-Hispanic Black
0.320
0.077, 0.562
1.377
1.080, 1.755
0.010
Non-Hispanic Other
0.029
-0.390, 0.447
1.029
0.677, 1.564
0.893
1 HR = Hazard Ratio, CI = Confidence Interval
排除权重为零的情况
与我们在本章中使用的其他函数不同,如果有任何权重为 0 的情况,svycoxph() 会返回错误。为避免这种情况,请使用域分析来排除这些情况。如果您已经在使用域分析,请修改该域以排除这些情况。例如,在 NHANES 数据集中,有一些检查权重为 0 的案例(不在检查子样本中)。
design.MEC <- svydesign(strata=~SDMVSTRA, id=~SDMVPSU, weights=~WTMEC2YR,                        nest=TRUE, survey.lonely.psu = "adjust",                        data=nhanes.mod)
design.pos.weights <- subset(design.MEC, WTMEC2YR > 0)
交互
包括模型中的交互作用和估计一个变量在另一个变量水平上的影响,如上述加权线性回归所述,但我们取幂以获得 HR。
示例 :体重指数 (BMXBMI) 对哮喘发作时间的影响是否因种族/民族而异?估计每个种族/民族级别的 BMI 效应。
由于 BMI 仅在检查子样本中收集,因此我们必须在指定样本设计时使用检查权重(并使用排除权重为零的领域分析)。
est     lower    upper
0.9820204 0.9519592 1.013031 est lower upper 1.004747 0.9802147 1.029894 est lower upper 0.9962407 0.9806913 1.012037 est lower upper 1.02071 0.9960611 1.045969
cox.ex9.5 <- svycoxph(Surv(asthma_age, asthma) ~ race_eth + BMXBMI +                         race_eth:BMXBMI,                  design = design.pos.weights)summary(cox.ex9.5)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (30) clusters.
## subset(design.MEC, WTMEC2YR > 0)
## Call:
## svycoxph(formula = Surv(asthma_age, asthma) ~ race_eth + BMXBMI +
##     race_eth:BMXBMI, design = design.pos.weights)
##
##   n= 7981, number of events= 1213
##    (723 observations deleted due to missingness)
##
##                                      coef exp(coef)
## race_ethNon-Hispanic White        -0.7954    0.4514
## race_ethNon-Hispanic Black        -0.0770    0.9259
## race_ethNon-Hispanic Other        -1.0542    0.3485
## BMXBMI                            -0.0181    0.9820
## race_ethNon-Hispanic White:BMXBMI  0.0229    1.0231
## race_ethNon-Hispanic Black:BMXBMI  0.0144    1.0145
## race_ethNon-Hispanic Other:BMXBMI  0.0386    1.0394
##                                   se(coef) robust se
## race_ethNon-Hispanic White          0.3380    0.4491
## race_ethNon-Hispanic Black          0.3969    0.4886
## race_ethNon-Hispanic Other          0.4454    0.5308
## BMXBMI                              0.0105    0.0146
## race_ethNon-Hispanic White:BMXBMI   0.0116    0.0149
## race_ethNon-Hispanic Black:BMXBMI   0.0134    0.0175
## race_ethNon-Hispanic Other:BMXBMI   0.0154    0.0194
##                                       z Pr(>|z|)  
## race_ethNon-Hispanic White        -1.77    0.077 .
## race_ethNon-Hispanic Black        -0.16    0.875  
## race_ethNon-Hispanic Other        -1.99    0.047 *
## BMXBMI                            -1.24    0.214  
## race_ethNon-Hispanic White:BMXBMI  1.53    0.125  
## race_ethNon-Hispanic Black:BMXBMI  0.82    0.412  
## race_ethNon-Hispanic Other:BMXBMI  1.99    0.047 *
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##                                   exp(coef) exp(-coef)
## race_ethNon-Hispanic White            0.451      2.215
## race_ethNon-Hispanic Black            0.926      1.080
## race_ethNon-Hispanic Other            0.348      2.870
## BMXBMI                                0.982      1.018
## race_ethNon-Hispanic White:BMXBMI     1.023      0.977
## race_ethNon-Hispanic Black:BMXBMI     1.014      0.986
## race_ethNon-Hispanic Other:BMXBMI     1.039      0.962
##                                   lower .95 upper .95
## race_ethNon-Hispanic White            0.187     1.089
## race_ethNon-Hispanic Black            0.355     2.413
## race_ethNon-Hispanic Other            0.123     0.986
## BMXBMI                                0.954     1.010
## race_ethNon-Hispanic White:BMXBMI     0.994     1.053
## race_ethNon-Hispanic Black:BMXBMI     0.980     1.050
## race_ethNon-Hispanic Other:BMXBMI     1.001     1.080
##
## Concordance= 0.528  (se = 0.013 )
## Likelihood ratio test= NA  on 7 df,   p=NA
## Wald test            = 27.6  on 7 df,   p=0.0003
## Score (logrank) test = NA  on 7 df,   p=NA
##
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).
car::Anova(cox.ex9.5, type = 3, test = "Wald")
## Analysis of Deviance Table (Type III tests)
##
## Response: Surv(asthma_age, asthma)
##                 Df Chisq Pr(>Chisq)  
## race_eth         3  6.36      0.095 .
## BMXBMI           1  1.55      0.214  
## race_eth:BMXBMI  3  4.86      0.183  
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
svycontrast_design_df(cox.ex9.5, c(0,0,0,1,0,0,0), EXP=T)
##     est lower upper
## 1 0.982 0.952 1.013
svycontrast_design_df(cox.ex9.5, c(0,0,0,1,1,0,0), EXP=T)
##     est  lower upper
## 1 1.005 0.9802  1.03
svycontrast_design_df(cox.ex9.5, c(0,0,0,1,0,1,0), EXP=T)
##      est  lower upper
## 1 0.9962 0.9807 1.012
svycontrast_design_df(cox.ex9.5, c(0,0,0,1,0,0,1), EXP=T)
##     est  lower upper
## 1 1.021 0.9961 1.046
 特殊情况总结
本章中有很多地方都有一个关于如何处理情况的特殊案例。下面的列表是一个提醒。
在指定设计之前,在数据管理步骤中将NA权重替换为0
使用subset()创建域分析的子集设计
表1的完整案例分析
仅用于svycoxph()的正权重
亚组分析
如果你添加/删除/更改任何变量,请重新指定设计(以及子集,如果有的话)
在tbl_summary()中使用“by”变量的字符版本
tbl_regression()没有更改DF、导出为Word以及用其他函数的值替换CI和p值的选项 
本站仅提供存储服务,所有内容均由用户发布,如发现有害或侵权内容,请点击举报
打开APP,阅读全文并永久保存 查看更多类似文章
猜你喜欢
类似文章
利用furniture包制作table1
【11月15日英语朗读课】在美国旅游千万不能这么说!!!
What Race Is George Zimmerman? | Psychology Today
R数据分析:做量性研究的必备“家伙什”-furniture包介绍
利用meta分析惯用的State软件进行认知功能障碍NHANES数据库分析
颜值最高的race
更多类似文章 >>
生活服务
热点新闻
分享 收藏 导长图 关注 下载文章
绑定账号成功
后续可登录账号畅享VIP特权!
如果VIP功能使用有故障,
可点击这里联系客服!

联系客服