class: title-slide, right, top
background-image: url(img/r_medicine.jpg)
background-position: 10% 75%, 75% 75%
background-size: 30%, cover

.right-column[
# Generalized additive models for longitudinal biomedical data
### _Beyond linear models_

**Ariel Mundo**<br>
<br>
Department of Biomedical Engineering <br />
University of Arkansas
<br><br>
08-26-2021
]

---
class: center

This talk is based on work from our lab (under review); the preprint is available at:

![:scale 15%](img/bioRxiv.png)

--

![:scale 40%](img/preprint.png)

--

This paper covers:

**Limitations of linear models**<br>
**Theory of GAMs** <br>
**Workflow for GAM selection in R using biomedical data** <br>
<br>

--

<br>
The slides of this talk are available at <br>
[](

---

# Motivation

> Longitudinal studies (LS): repeated measures on subjects in multiple groups

--

> LS are a powerful tool because they allow us to see the evolution of an effect over time

--

> Some examples of areas of biomedical research that use longitudinal studies:

- Pediatrics
- Cancer
- Nutrition

---

### How do we analyze longitudinal data?

#### What we tend to do in Biomedical Research:

<img src="img/arrow.jpg" width="300" style="position: fixed; right: 20px; bottom:20px;">

--

.my-coral[Repeated measures → repeated measures ANOVA (rm-ANOVA) → _post-hoc_ comparisons ]<br>
<br>

--

#### Or we can also do:

.green[Repeated measures → linear mixed model (LMEM) → _post-hoc_ comparisons]

---

# Simulation to the rescue!

- Some simulated data that follow the tumor volume trends reported in Zheng et al. (2019).

--

- Simulation is useful here because we can only get mean values from the paper.

--

.pull-left[
<img src="RMedicine2021_slides_files/figure-html/data-plot-1.png" width="504" />
]

--

.pull-right[
<img src="RMedicine2021_slides_files/figure-html/simulated-data-1.png" width="504" />
]

---

### How does an rm-ANOVA model look on this data?
- Linear model with interaction of time and group:

.panelset[
.panel[.panel-name[model]

```r
# Day * Group already expands to Day + Group + Day:Group; the main effects are written out for clarity
lm1 <- lm(Vol_sim ~ Day + Group + Day * Group, data = dat_sim)
```

Where: <br>
`Vol_sim`= simulated tumor volume <br>
`Day`= Day number (1-15) <br>
`Group`= Factor (T1 or T2) <br>
`dat_sim`= simulated dataset

]

.panel[.panel-name[p-values]

```r
anova(lm1)
```

```
## Analysis of Variance Table
## 
## Response: Vol_sim
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## Day         1 1572512 1572512  554.64 < 2.2e-16 ***
## Group       1 1411668 1411668  497.91 < 2.2e-16 ***
## Day:Group   1  879240  879240  310.12 < 2.2e-16 ***
## Residuals 316  895923    2835                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

]

.panel[.panel-name[post-hoc]

```r
library(emmeans)  # estimated marginal means
emmeans(lm1, ~Day * Group, adjust = "bonf")
```

```
##  Day Group emmean   SE  df lower.CL upper.CL
##  7.5 T1       257 4.21 316      248      267
##  7.5 T2       124 4.21 316      115      134
## 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 2 estimates
```

]

.panel[.panel-name[Plot]

.pull-left[
<img src="RMedicine2021_slides_files/figure-html/rm-ANOVA plot-1.png" width="504" />
]

.pull-right[
![:scale 50%](img/uncle_roger.gif)
]

]
]

---

### But what exactly is an rm-ANOVA?

<br>
<br>
<br>

`\begin{equation} y_{ijt} = \beta_0+\beta_1 \times time_{t} +\beta_2 \times treatment_{j} +\beta_3 \times time_{t}\times treatment_{j}+\varepsilon_{ijt} \end{equation}`

--

`\(y_{ijt}\)`: the response for subject `\(i\)` in treatment group `\(j\)` at time `\(t\)` <br>

--

`\(\beta_0\)`: the mean group value <br>

--

`\(time_t\)`, `\(treatment_j\)`, `\(time_t \times treatment_j\)`: fixed effects <br>

--

`\(\beta_1, \beta_2\)` and `\(\beta_3\)`: linear slopes of the fixed effects <br>

--

`\(\varepsilon_{ijt}\)`: random variation not explained by the fixed effects, assumed to be `\(\sim N(0,\sigma^2)\)` <br>

---

### In other words...

An rm-ANOVA is a model that fits a .my-gold[**line**] to the trend of the data!
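
To see why, here is a short sketch under one assumption that is not stated on the slide: `\(treatment_j\)` is coded as a 0/1 indicator. Grouping the terms of the rm-ANOVA equation by `\(time_t\)` and taking the expected value gives

`\begin{equation} E[y_{ijt}] = (\beta_0+\beta_2 \times treatment_{j}) + (\beta_1+\beta_3 \times treatment_{j}) \times time_{t} \end{equation}`

Under that coding, the reference group follows a line with intercept `\(\beta_0\)` and slope `\(\beta_1\)`, and the other group follows a line with intercept `\(\beta_0+\beta_2\)` and slope `\(\beta_1+\beta_3\)`. Each group gets its own straight line, and nothing in the model can bend it.
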

--

.pull-left[
![:scale 75%](img/batis.gif)
]

.footnote[
_Batis et al. 2013_
]

--

.pull-right[
- It works reasonably well in certain cases

.remark-slide-emphasis[
.green[
But in biomedical research things often don't look linear!
]
]
]

---

# Some examples

.pull-left[
![:scale 50%](img/Skala.jpg)
]

.footnote[
_Skala et al. 2010_
]

--

.pull-right[
![:scale 80%](img/Vishwanath.jpg)

.footnote[
_Vishwanath et al. 2009_
]
]

---

# An alternative: Generalized additive models (GAMs)

`\begin{equation} y_{ijt}=\beta_0+f(x_t\mid \beta_j)+\varepsilon_{ijt} \end{equation}`

--

`\(y_{ijt}\)`: response at time `\(t\)` of subject `\(i\)` in group `\(j\)` <br>

--

`\(\beta_0\)`: expected value at time 0 <br>

--

The change of `\(y_{ijt}\)` over time is represented by the _smooth function_ `\(f(x_t\mid \beta_j)\)`, which takes the covariates `\(x_t\)` as inputs and has parameters `\(\beta_j\)` <br>

--

`\(\varepsilon_{ijt}\)`: the residual error

---

# An alternative: GAMs

<img src="RMedicine2021_slides_files/figure-html/basis-functions-plot-1.png" width="864" style="display: block; margin: auto;" />

---

# How does a GAM look for the simulated data?

.panelset[
.panel[.panel-name[model]

```r
library(mgcv)  # provides gam() and s()
gam1 <- gam(Vol_sim ~ Group + s(Day, by = Group, k = 10), method = "REML", data = dat_sim)
```

]

.panel[.panel-name[Plot]

<img src="RMedicine2021_slides_files/figure-html/GAM-plot-1.png" width="504" style="display: block; margin: auto;" />

]

.panel[.panel-name[Pairwise comp.]

<img src="RMedicine2021_slides_files/figure-html/GAM-tumor-plot-1.png" width="576" style="display: block; margin: auto auto auto 0;" />

.pull-right[
- Comparisons are not guided by a _p-value_
- But the comparison actually makes sense!
]
]
]

---

# Other advantages of GAMs

<!-- <font size="16"> -->
<!-- <table> -->
<!-- <tr> -->
<!-- <th> Data</th> -->
<!-- <th>GAMs</th> -->
<!-- </tr> -->
<!-- <tr> -->
<!-- <td> Missing obs.</td> -->
<!-- <td> ✔</td> -->
<!-- </tr> -->
<!-- <tr> -->
<!-- <td>Different covariance <br> structures</td> -->
<!-- <td> ✔</td> -->
<!-- </tr> -->
<!-- <tr> -->
<!-- <td>Prediction</td> -->
<!-- <td> ✔</td> -->
<!-- </tr> -->
<!-- </table> -->
<!-- </font> -->

- Can use different covariance structures ✅ <br> <br>

--

- Work with missing observations ✅ <br> <br>

--

- Different types of splines can be used: <br> <br>

--

  - Cubic
  - Thin plate
  - Gaussian process

---

# Conclusions

- Doing a visual exploration of the data is always a good idea! <br> <br>

--

- GAMs allow fitting non-linear responses over time <br> <br>

--

- The same idea behind an rm-ANOVA or LMEM holds, but you use a spline instead of a line to do the fitting <br> <br>

--

- .red[_p-values_] can be misleading!

---
class: center

# Acknowledgements

.pull-left[

Dr. John R.