class: title-slide

.header[
<img src="bristol-logo.png" style="width: 200px"></img>
<img src="ieu-logo.png" style="width: 200px"></img>
]

# Epigenetic Epidemiology Update

.large[**Matthew Suderman**]

.large[**Nov 14, 2022**]

---
layout: true

.footer[MRC Integrative Epidemiology Unit]

---

## EWAS

.striped[
| pmid|journal |variable |tissue |population |results |
|--------:|:------------------------|:------------------------------|:-----------------------|:-------------------------------------------------------------|:------------------------------|
| 36266660|BMC Med |maternal blood pressure |placenta |666 births |null |
| 36329530|Clin Epigenetics |endocrine hypertension |blood |255 hypertensive patients with or without endocrine disorders |over 100K sites? |
| 36325427|Brain Behav Immun Health |history of clinical depression |blood |692 Parkinson's disease patients |35 sites |
| 36319817|Sci Rep |major depressive disorder |blood |298 MDD cases and 63 controls |multiple sites linked to TNNT3 |
]

---

## EWAS, cont.

.striped[
| pmid|journal |variable |tissue |population |results |
|--------:|:-----------------|:--------------------------------------------------------------|:------|:--------------------------------------------|:------------------------------------------|
| 36303554|Front Genet |albumin |blood |960 HIV-infected males |9 sites |
| 36253871|Genome Biol |age |blood |600 elderly |182760 sites |
| 36292585|Genes (Basel) |left ventricular hypertrophy |blood |636 African American adults |2 sites |
| 36344488|Transl Psychiatry |generalised anxiety disorder and obsessive-compulsive disorder |blood |460 Chinese adults |3 sites; 3 sites differentiate GAD and OCD |
| 36274151|Clin Epigenetics |4 lipid measures |blood |1084 from the Chinese National Twin Registry |19 sites |
]

---

.running[Prediction]

## CpG sites predict blood pressure 5 years later

Hong X ... Li L. **Association Between DNA Methylation and Blood Pressure: A 5-Year Longitudinal Twin Study.** *Hypertension*.
doi: [10.1161/HYPERTENSIONAHA.122.19953](http://doi.org/10.1161/HYPERTENSIONAHA.122.19953)

16 CpG sites associated with systolic BP; 20 with diastolic BP

Cross-lagged analysis indicates predictive ability:

<img src="36345830.png"></img>

---

.running[Prediction]

## Predictors affected by preprocessing

Ori APS, Lu AT, Horvath S, Ophoff RA. **Significant variation in the performance of DNA methylation predictors across data preprocessing and normalization strategies.** *Genome Biol*.
doi: [10.1186/s13059-022-02793-w](http://doi.org/10.1186/s13059-022-02793-w)

<img src="13059_2022_2793_Fig1_HTML.png" style="width:100%"></img>

* "32 out of 41 predictors (78%) demonstrate excellent consistency"
* "moderate correlation in performance across analytical strategies (mean rho = 0.40, SD = 0.27)"
* recommend "OOB background correction, RELIC dye-bias correction, quantile normalization applied separately for methylated and unmethylated intensities of Infinium I and II probes, and RCP to correct for probe design type bias."

---

.running[Prediction]

<img src="13059_2022_2793_Fig3_HTML.jpg" style="width:100%"></img>

*... but mortality risk by hazard ratio positively associated with technical noise?*

---

.running[Prediction]

## Detecting breast cancer from cell-free DNAm

Manoochehri M ... Hamann U. **DNA methylation biomarkers for noninvasive detection of triple-negative breast cancer using liquid biopsy.** *Int J Cancer*.
doi: [10.1002/ijc.34337](http://doi.org/10.1002/ijc.34337)

1. Identify 6 top DMRs between tumor and normal breast tissue
2. Droplet digital PCR applied to cell-free DNA from blood plasma
   (test: 60 cases, 36 controls; validation: 79 cases, 48 controls)

<img src="ijc34337-fig-0004-m.png" style="width: 66%"></img>

---

.running[Prediction]

## Including prior information when building models

Kawaguchi ES, Li S, Weaver GM, Lewinger JP. **Hierarchical Ridge Regression for Incorporating Prior Information in Genomic Studies.** *J Data Sci*.
doi: [10.6339/21-jds1030](http://doi.org/10.6339/21-jds1030)

We'd often like to use previous knowledge to generate better models,
e.g. genes regulated by CpG sites, pathways with genes regulated by CpG sites.

.pull-left[
**Ridge regression**

`$$\min_{\beta} \|y-X\beta\|_2^2 + \lambda\|\beta\|_2^2$$`

where

* `\(y\)` is the variable of interest,
* `\(X\)` is the methylation matrix,
* `\(\beta\)` is the vector of model coefficients, one per CpG site
]

.pull-right[
**Hierarchical ridge regression**

`$$\min_{\beta,\gamma} \|y-X\beta\|_2^2 + \lambda_1\|\beta-Z\gamma\|_2^2 + \lambda_2\|\gamma\|_2^2$$`

where

* `\(Z\)` describes added information,
<br>e.g. indicators of which CpG sites are linked to a gene, or effect sizes from a related EWAS

In other words, we shrink `\(\beta\)` toward the prior information `\(Z\gamma\)` and `\(\gamma\)` toward 0.

If the prior information is not useful (i.e. `\(\gamma = 0\)`), then hierarchical ridge regression reduces to ridge regression.
]

---

.running[Prediction]

There is an R package on CRAN.

```r
install.packages("xrnet")
```

```r
library(xrnet)
```

To illustrate, we'll use their example dataset with 200 samples, 50 features and 5 external features.
We randomly assign roughly 80% of samples to training and the remaining 20% to testing.

```r
data(GaussianExample)
set.seed(1) ## make the random split reproducible
is.train <- sample(c(FALSE,TRUE), length(y_linear), prob=c(0.2,0.8), replace=TRUE)
```

Here is how to fit the hierarchical model using cross-validation to optimize `\(\lambda_1\)` and `\(\lambda_2\)`.
```r
model <- tune_xrnet(
    x = x_linear[is.train,],              ## 50 features, training samples only
    y = y_linear[is.train],               ## variable of interest, training samples only
    external = ext_linear,                ## 5 external features for 50 features (50x5)
    family = "gaussian",                  ## outcome variable is numerical
    penalty_main = define_penalty(0),     ## ridge for features
    penalty_external = define_penalty(1)  ## lasso for external features
)
```

```r
model$opt_penalty      ## lambda_1
model$opt_penalty_ext  ## lambda_2
```

Predictions are made as usual:

```r
pred <- predict(model, newdata=x_linear[!is.train,], type="response")
```

---

.running[Prediction]

.pull-left-30[
For fun, I've plotted model performance for different proportions of samples set aside for training.

The 'base' model is ordinary ridge regression.

Notice:

* Performance improves as the training set proportion increases
* The hierarchical model tends to perform better
]

.pull-right-70[
<!-- -->
]

---

.running[Prediction]

.pull-left-30[
The authors trained age predictors in a dataset with 656 samples.

External features `\(Z\)` for hierarchical ridge regression linked CpG sites to genes.
In other words, a CpG site is more likely to play an important role in the model if other sites for the same gene also play important roles.

'Augmented ridge regression' is ridge regression applied to the matrix `\([X,XZ]\)` rather than just `\(X\)`.
]

.pull-right-70[
<img src="hierarchical-ridge-dnamage.png" style="width: 100%"></img>
]

---

.running[Prediction]

## Matched buccal and brain DNAm

Sommerer Y ... Bertram L. **A correlation map of genome-wide DNA methylation patterns between paired human brain and buccal samples.** *Clin Epigenetics*.
doi: [10.1186/s13148-022-01357-w](http://doi.org/10.1186/s13148-022-01357-w)

> "In this study, we performed a correlation analysis between DNAm data of a total of n=120 matched post-mortem buccal and prefrontal cortex samples.
We identified nearly 25,000 (3% of approximately 730,000) cytosine-phosphate-guanine (CpG) sites showing significant (false discovery rate q < 0.05) correlations between buccal and PFC samples."

> "The DNAm raw data generated and used for the analyses described in this manuscript are available to qualified researchers and qualified research projects."

They replicate findings in
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111165
(blood, saliva, buccal, **live** brain from 27 epilepsy patients)

---

.running[Prediction]

## Matched brain, blood, saliva and buccal DNAm

Wahba NE ... Shinozaki G. **Genome-wide DNA methylation analysis of post-operative delirium with brain, blood, saliva, and buccal samples from neurosurgery patients.** *J Psychiatr Res*.
doi: [10.1016/j.jpsychires.2022.10.023](http://doi.org/10.1016/j.jpsychires.2022.10.023)

> "Methods: The four tissue types (brain, blood, saliva, buccal) of DNA samples from up to 40 patients, including 11 POD cases, were analyzed using Illumina EPIC array. DNAm differences between patients with and without POD were examined. We also conducted enrichment analysis based on the top DNAm signals."

> "Results: The most different CpG site between control and POD was found at cg16526133 near the ADAMTS9 gene from the brain tissue(p = 8.66E-08). However, there are no CpG sites to reach the genome-wide significant level."

> "The data that support the findings of this study are available from the corresponding author, G.S., upon reasonable request."

---

.running[Epigenetics]

## CpG density and age-associated DNAm

Higham J ... Sproul D. **Local CpG density affects the trajectory and variance of age-associated DNA methylation changes.** *Genome Biol*.
doi: [10.1186/s13059-022-02787-8](http://doi.org/10.1186/s13059-022-02787-8)

Data

* longitudinal DNA methylation from 600 individuals aged 67-80

Results

* 182,760 loci change with age
* strongest changes at 8,322 low CpG density loci
* change at 1,487 of these is affected by cis SNPs
* in younger individuals, change at these sites is less variable and mostly loss of methylation