Monday, 21st of January (2019)

Around the Genetic
of
Type 2 Diabetes

A bit of history …

A lot of SNPs discovered …

Flannick & Florez (2016)

Associated with T2D and traits …

Marullo, El-Sayed Moustafa, & Prokopenko (2014)

But, weak correlation of the effects

Scott et al. (2012)

But, weak correlation of the effects

Yaghootkar & Frayling (2013)

Is There
a
Genetic Joint Effect?

"In all cases, the glucose-raising allele was associated with increased risk of T2D, yet fasting glucose effect sizes and T2D ORs were weakly correlated"

Scott et al. (2012)

Why use a Joint Model?

  • To identify biomarker relevant to a disease

  • To identify the effect of a treatment on a disease
    (independently of the association between the disease and the biomarker)

For example:

  • Biomarker: CD4 counts
  • Event: death

We can start with a Cox Model …

(Extended) Cox model: \[\begin{align} \lambda_i(t)=\lambda_0(t) \exp(\beta Y_i(t) + \alpha Z_i + \eta W_i) \end{align}\]

Where:

  • \(\lambda_i(t)\) is the hazard function at time \(t\) for individual \(i\);
  • \(\lambda_0(t)\) is the unspecified baseline hazard function;
  • \(\alpha\) measures the effect of \(Z_i\) on the hazard function;
  • \(\beta\) measures the association between the trajectory function \(Y_i(t)\) and the hazard function.

We can start with a Cox Model …

coxph(Surv(Time, death) ~ drug, data = aids.id)

We can start with a Cox Model …

coxph(Surv(Time, death) ~ drug, data = aids.id)
#> Call:
#> coxph(formula = Surv(Time, death) ~ drug, data = aids.id, x = TRUE)
#> 
#>   n= 467, number of events= 188 
#> 
#>           coef exp(coef) se(coef)     z Pr(>|z|)
#> drugddI 0.2102    1.2339   0.1462 1.437    0.151
#> 
#>         exp(coef) exp(-coef) lower .95 upper .95
#> drugddI     1.234     0.8104    0.9264     1.643
#> 
#> Concordance= 0.531  (se = 0.019 )
#> Rsquare= 0.004   (max possible= 0.99 )
#> Likelihood ratio test= 2.07  on 1 df,   p=0.2
#> Wald test            = 2.07  on 1 df,   p=0.2
#> Score (logrank) test = 2.07  on 1 df,   p=0.1

It works, but biomarkers …

  1. are measured at determined time points (\(t_{ij}\))
  2. can have missing values over time
    • => Imputation?
    • \(\Rightarrow\) Bias introduction
  3. are measured with some degree of error
    • \(\Rightarrow\) Noise in the biomarker trajectory (\(Y_i(t_{ij}) \neq X_i(t_{ij})\))
  4. can be endogenous
    • \(\Rightarrow\) Trajectory can change when the event occurs
    • \(\Rightarrow\) Bias introduction

Hopefully, the Mixed Model is there!

(Generalised) linear mixed effect model: \[Y_{i}(t_{ij})=X_{i}(t_{ij})+\epsilon_{i}(t_{ij})\]

where:

  • \(Y_{i}(t_{ij})\) is the observed value
  • \(X_{i}(t_{ij})\) is the true (unobserved) value of the longitudinal measurement at time \(t_{ij}\) for individual \(i\).
  • \(\epsilon_{i}(t_{ij})\) is a random error term, usually:
    \[\epsilon_{i}(t_{ij})\sim \mathcal{N}(0,\sigma^2)\]

Hopefully, the Mixed Model is there!

lme(sqrt(CD4) ~ obstime*drug - drug, random = ~ 1|patient, data = aids)

Hopefully, the Mixed Model is there!

lme(sqrt(CD4) ~ obstime*drug - drug, random = ~ 1|patient, data = aids)
#> Linear mixed-effects model fit by REML
#>  Data: aids 
#>        AIC      BIC    logLik
#>   2730.729 2756.957 -1360.364
#> 
#> Random effects:
#>  Formula: ~1 | patient
#>         (Intercept)  Residual
#> StdDev:   0.8719493 0.4106749
#> 
#> Fixed effects: sqrt(CD4) ~ obstime * drug - drug 
#>                      Value  Std.Error  DF  t-value p-value
#> (Intercept)      2.5087153 0.04306984 936 58.24761  0.0000
#> obstime         -0.0338783 0.00352643 936 -9.60697  0.0000
#> obstime:drugddI  0.0055813 0.00498557 936  1.11949  0.2632
#>  Correlation: 
#>                 (Intr) obstim
#> obstime         -0.153       
#> obstime:drugddI  0.003 -0.691
#> 
#> Standardized Within-Group Residuals:
#>         Min          Q1         Med          Q3         Max 
#> -5.09514877 -0.46243667  0.01324625  0.48993786  5.17139372 
#> 
#> Number of Observations: 1405
#> Number of Groups: 467

Let's use a Joint Model

fitCOX <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)
fitLME <- lme(sqrt(CD4) ~ obstime*drug - drug, random = ~ 1 | patient, data = aids)
fitJOINT <- jointModel(fitLME, fitCOX, timeVar = "obstime", method = "piecewise-PH-aGH")
#> Variance Components:
#>                StdDev
#> (Intercept) 0.8793445
#> Residual    0.4093577
#>
#> Coefficients:
#> Longitudinal Process
#>                   Value Std.Err  z-value p-value
#> (Intercept)      2.5100  0.0434  57.8778 <0.0001
#> obstime         -0.0362  0.0035 -10.3474 <0.0001
#> obstime:drugddI  0.0045  0.0049   0.9173  0.3590
#>
#> Event Process
#>             Value Std.Err z-value p-value                                  
#> drugddI    0.3458  0.1523  2.2715  0.0231                                  
#> Assoct    -1.0866  0.1184 -9.1786 <0.0001  
#> log(xi.1) -1.6560  0.2529 -6.5475                                          
#> log(xi.*)  ......  .....   ......

What is a Joint Model? With a picture!

What is a Joint Model? With a picture!

What is a Joint Model? With a picture!

What is a Joint Model? With a picture!

What is a Joint Model? With a picture!

What is a Joint Model? With a picture!

What is a Joint Model? With a picture!

What is a Joint Model? With a picture!

What can we do with a Joint Model?

Test simultaneously an effect on:

  • a biomarker (\(\gamma\));
  • an event (\(\alpha\));
  • a biomarker and an event (\(\beta\gamma+\alpha\)).

The best part?

A gain in statistical power to detect those effects

  • if \(\beta\neq0\), to detect a joint effect of \(Z\): \(\beta\gamma+\alpha\neq0\)
    (from Chen, Ibrahim, & Chu (2011));
  • compared to the extended Cox model.

What is a Joint Model? With equations!

The standard (joint likelihood) formulation involves two components:

  • a longitudinal component
  • a time-to-event (survival) component.

With:

  • \(n\), the sample size;
  • \(i\), an individual (\(i=1,\cdots,n\));
  • \(m_i\), the number of measurements on individual \(i\);
  • \(t_{ij}\), a time points (\(j=1,\cdots,m_i\)).

The longitudinal component

(Generalised) linear mixed effect model: \[Y_{i}(t_{ij})=X_{i}(t_{ij})+\epsilon_{i}(t_{ij})\]

\(X_{ij}\) is the trajectory function, and could be defined: \[\begin{gather}X_{i}(t_{ij})=\theta_{0i} + \theta_{1i}t_{ij} + \cdots + \theta_{pi}t_{ij}^p &, & \boldsymbol\theta_p \sim \mathcal{N}(\boldsymbol\mu_, \boldsymbol\Sigma)\end{gather}\]

For simplicity here, we assume linearity over time (\(\theta_{0i}+\theta_{1i}t_{ij}\)): \[\begin{gather}Y_{i}(t_{ij})=\theta_{0i}+\theta_{1i}t_{ij}+\gamma Z_i+\delta W_i+\epsilon_{ij} &, & \boldsymbol{\theta} \sim \mathcal{N}_2 (\boldsymbol{\mu},\boldsymbol{\Sigma})\end{gather}\]

The longitudinal component

(Generalised) linear mixed effect model: \[Y_{i}(t_{ij})=\theta_{0i}+\theta_{1i}t_{ij}+\gamma Z_i+\delta W_i+\epsilon_{ij}\]

With:

  • \(Y_{i}(t_{ij})\), the observed value;
  • \(X_{i}(t_{ij})\), the true (unobserved) value of the longitudinal measurement at time \(t_{ij}\) for individual \(i\);
  • \(\epsilon_{ij}\),a random error term, usually: \(\epsilon_{ij}\sim \mathcal{N}(0,\sigma^2)\);
  • \(Z_i\), a vector denoting the genotype of individual \(i\);
  • \(W_i\), a set of adjusting covariates.

The time-to-event component

(Extended) Cox model (proportional hazards): \[\begin{align} \lambda_i(t)&=\lim_{dt \to 0} \frac{P\{t\leq T_i<t+dt|T_i\geq t, \bar{Y_i}(t), Z_i, W_i\}}{dt}\\ &=\lambda_0(t) \exp\{\beta X_{i}(t) + \alpha Z_i + \eta W_i\} \end{align}\]

With:

  • \(\bar{Y_i}(t)=\{Y_i(u),0 \leq u \leq t\}\), the history of the trajectory;
  • \(T_i\), the event time for individual \(i\);
  • \(C_i\), the right censoring time (end of the follow-up);
  • \(\Delta_i\), an event indicator: \(\begin{cases} \Delta_i=0, & \text{if }\ T_i>C_i. \\ \Delta_i=1, & \text{if }\ T_i <= C_i. \end{cases}\)

Hypothesis testing

Null hypothesis: \(\begin{cases}H_0&:& \theta=\theta_0\\H_1&:& \theta\neq\theta_0\end{cases}\)
  • Likelihood Ratio Test
    \[LRT=-2\{\ell(\hat{\theta}_0)-\ell(\hat{\theta})\}\]

  • Wald Test
    \[\begin{gather} W=(\hat{\theta}-\theta_0)^\top \mathcal{I}(\hat{\theta})(\hat{\theta}-\theta_0)\\ \left(\text{Univariate: }(\hat{\theta}_j-\theta_{0j})/\widehat{\text{s.e.}}(\hat{\theta}_j)\right) \end{gather}\]

  • Score Test
    \[U=S^\top(\hat{\theta}_0)\{\mathcal{I}(\hat{\theta}_0)\}^{-1}S(\hat{\theta}_0)\]

Is the
Joint Model Approach
Worth It?

Let's find out with simulations!

Estimators (& Computation time)

  • Are Joint Model estimators good?
    • Bias, variance and RMSE (Root-Mean Square Error)
      \[\begin{align} \operatorname{MSE}(\hat\phi)&= \operatorname{Bias}(\hat\phi)^2 + \operatorname{Var}(\hat\phi)\\ \operatorname{RMSE}(\hat{\phi})&=\sqrt{\operatorname{MSE}(\hat\phi)}\\ &=\sqrt{E\{(\hat{\phi}-\phi)^2\}} \end{align}\] \[\phi=(\beta, \gamma, \alpha)\]
  • Can we do a whole genome analysis…
    … in a reasonable time frame?

A more "naive" approach!

What if, we split the job in two?

\(\Rightarrow\) "Two-Step"? (Tsiatis, DeGruttola, & Wulfsohn, 1995)

  1. (Generalised) linear mixed effect model
    \[\begin{align} Y_{i}(t)&=X_i(t)+ \epsilon_{i}(t)\\ X^*_{i}(t)&=E\{X_{i}(t)|\bar{Y_i}(t), T_i\geq t\} \end{align}\]

  2. (Extended) Cox model (proportional hazards)
    \[h_i(t)=h_0(t) \exp\{\beta X^*_{i}(t)\}\]

Time to generate fake data!

Let's keep it simple, i.e., without covariates:

  • the trajectory: \(Y_{i}(t)=\theta_{0i} + \theta_{1i}t + \gamma Z_i + \epsilon_{i}(t)\)

  • the event: \(\lambda_i(t)=\lambda_0(t) \exp\{\beta X_{i}(t) + \alpha Z_i\}\)

  • the time of event, e.g., the exponential distribution (Austin, 2012):
    \[\begin{gather} H_i(T_i)=\int_0^{T_i}\lambda_0(t) \exp(\beta X_i(t)+\alpha Z_i)dt , & \lambda_0(t)=\lambda\\ F_i(T_i)=1-exp(-H_i(T_i))=u , & u\sim\mathcal{U}(0, 1) \end{gather}\] \[T_i=\frac{1}{\beta\theta_{1i}}\log\left(1-\frac{\beta\theta_{1i}\times \log(1-u)}{\lambda \exp(\beta\theta_{0i}+(\beta\gamma+\alpha)Z_i)}\right)\]

Set the parameters!

Scott et al. (2012)

Yaghootkar & Frayling (2013)

Set the parameters!

Parameters and numerical values used for sensitivity analysis and simulations, based on results from rs17747324 within gene TCF7L2 in the French cohort D.E.S.I.R.
Parameters Values
Number of participants (\(n\)) 4,352
Number of measures (\(m\)) 4
Diabetes incidence rate (\(d\)) 0.0384
Minor allele frequency (\(f\)) 0.244
Random effects (\(\theta\)) \(\sim\mathcal{N}_2\left (\begin{bmatrix}4.55\\0.0108\end{bmatrix} , \begin{bmatrix} 0.143 & -0.00109 \\ -0.00109 & 6.8\times 10^{-04} \end{bmatrix} \right )\)
SNP effect on \(Y_{ij}\) (\(\gamma\)) 0.0229
SNP effect on \(T_i\) (\(\alpha\)) 0.265 (OR=1.3)
Association between \(Y_{ij}\) and \(T_i\) (\(\beta\)) 3.17
Error term (\(\epsilon\)) \(\sim\mathcal{N}(0,0.305^2)\)

How does our fake data looks like?