Table of Contents

Using aries to access ARIES DNA methylation profiles

This tutorial illustrates the functionality of aries by presenting an EWAS for identifying associations of BMI with ARIES DNA methylation in ALSPAC parents in middle age.

Installation

require(devtools)
devtools::install_github("MRCIEU/aries")

Load the package

library(aries)

ARIES location

The package functions will need to know where ARIES is located on your system. In this tutorial, we save this location for later use in the variable aries.dir.

aries.dir <- "path/to/aries"

Select ARIES samples

As noted above, we will analyse the mothers and fathers of ALSPAC participants in middle age, i.e. at time points 'FOM' and 'FOF', respectively.

Run aries.time.points(aries.dir) to see a list of all ARIES time points.

To select this sample subset, we use the aries.select() function.

aries <- aries.select(aries.dir, time.point=c("FOM","FOF"))

Additional arguments for refining sample selection:

The output of aries.select() is a list containing ARIES information for the selected samples. It includes the following elements:

Load ARIES methylation

Having selected our samples, we now load the corresponding DNA methylation profiles.

aries$meth <- aries.methylation(aries)

aries.methylation() returns the matrix of normalized methylation levels (aka 'betas') for the specified set of samples. There is one column for each sample and one row for each CpG site.

Load ALSPAC phenotype/exposure variables (direct ALSPAC users only)

Direct users of ALSPAC can load ALSPAC data into R using the alspac R package (https://github.com/explodecomputer/alspac).

It can be installed from github:

require(devtools)
devtools::install_github("explodecomputer/alspac")

We load the package and tell it where the ALSPAC dataset is located (On University of Bristol Windows computers, this is typically R://Data).

library(alspac)
alspac::setDataDir("path/to/alspac/data")

Next we load the BMI variables for the ALSPAC participant mothers and fathers measured at middle age (see https://github.com/explodecomputer/alspac for more information on how to use the alspac package).

varnames <- c("fm1ms111", ## mother BMI 
              "ff1ms111") ## father BMI
variable.index <- alspac::findVars(varnames)
stopifnot(all(varnames %in% tolower(variable.index$name)))
alspac <- alspac::extractVars(variable.index)

We give the variables more meaningful names and set any negative BMI values to missing.

alspac$bmi.mother <- alspac$fm1ms111
alspac$bmi.mother[which(alspac$bmi.mother < 0)] <- NA

alspac$bmi.father <- alspac$ff1ms111
alspac$bmi.father[which(alspac$bmi.father < 0)] <- NA

Prepare data for EWAS

We reorder the samples in the ALSPAC dataset to match the DNA methylation data.

alspac <- alspac[match(aries$samples$aln, alspac$aln),]

(If we were analysing data from the ALSPAC child participants, then we should use alnqlet rather than aln.)

Next we construct a data frame containing our variable of interest (BMI) and all covariates (sex, plate, and cell counts).

ewas.data <- data.frame(bmi=NA,
                        sex=aries$samples$sex,
                        batch=aries$samples$plate,
                        aries$cell.counts[["blood-gse35069"]])

BMI was left missing because we need to combine two BMI variables, one for mothers and the other for fathers, into a single variable. Here we first fill in BMI for mothers and then BMI for fathers.

mothers.idx <- which(aries$samples$time_point == "FOM")
ewas.data$bmi[mothers.idx] <- alspac$bmi.mother[mothers.idx]
fathers.idx <- which(aries$samples$time_point == "FOF")
ewas.data$bmi[fathers.idx] <- alspac$bmi.father[fathers.idx]

Run EWAS

Here we use the R Bioconductor package limma to test associations.

library(limma)
design <- model.matrix(~., ewas.data)
fit <- lmFit(aries$meth[,rownames(design)], design) 
fit <- eBayes(fit)

Compare associations to a published study

We compare our associations to a large published EWAS of BMI (n=3743).

Mendelson MM, Marioni RE, Joehanes R, et al. Association of Body Mass Index with DNA Methylation and Gene Expression in Blood Cells and Relations to Cardiometabolic Disease: A Mendelian Randomization Approach. PLoS Med. 2017;14(1):e1002215.

We load summary statistics from the study.

read.mendelson <- function(filename) {
    require(readxl)
    ret <- read_xlsx(filename, skip=2)
    ret <- as.data.frame(ret)
    cols <- colnames(ret)
    names(cols) <- cols
    cols[c("Probename","β...9","p-value...11")] <- c("cpg","effect","p.value")
    colnames(ret) <- cols
    ret
}
bmi <- read.mendelson("bmi-mendelson-plosmed-2017.xlsx")
rownames(bmi) <- bmi$cpg

Mendelson et al. identify 137 CpG sites associated with BMI.

sites <- rownames(fit$coefficient)
common <- intersect(bmi$cpg, sites)

Of these, 135 CpG sites are measured in ARIES.

Effect sizes for these sites are highly correlated with ARIES:

cor(bmi[common,"effect"],
    fit$coefficient[common,"bmi"])
## [1] 0.8284101

There is a large overlap between the top 135 CpG site associations observed in each study.

top <- rownames(fit$p.value)[order(fit$p.value[,"bmi"])[1:length(common)]]
tab <- table(mendelson=sites %in% common,
             aries=sites %in% top)
kable(tab)
FALSE TRUE
FALSE 482816 117
TRUE 117 18

(Fisher's exact test p = 3.404676 × 10-43)

Nearly all associations that survive adjustment for multiple tests in ARIES also survive adjustment for multiple tests in Mendelson et al.

padj <- p.adjust(fit$p.value[,"bmi"])
sig <- rownames(fit$p.value)[padj < 0.05]
tab <- table(mendelson=sites %in% common,
             aries=sites %in% sig)
kable(tab)
FALSE TRUE
FALSE 482932 1
TRUE 132 3

(Fisher's exact test p = 8.5356987 × 10-11).

Overall, the Mendelson et al. associations are replicated in ARIES as illustrated by the following QQ-plot showing p-values in ARIES for the top Mendelson et al. CpG site associations.

bmi$p.aries <- fit$p.value[match(bmi$cpg, rownames(fit$p.value)),"bmi"]
bmi$col <- "black"
bmi$col[which(p.adjust(bmi$p.aries,"bonferroni") < 0.05)] <- "red"
bmi <- bmi[order(bmi$p.aries),]
bmi$p.exp <- ppoints(bmi$p.aries)
lim <- rev(range(c(bmi$p.aries, bmi$p.exp), na.rm=T))
plot(x=-log10(bmi$p.exp), y=-log10(bmi$p.aries),
     xlim=-log10(lim), ylim=-log10(lim),
     pch=19,
     main="Mendelson et al. associations in ARIES",
     xlab="-log10 expected p-values",
     ylab="-log10 observed p-values",
     col=bmi$col)
abline(0,1,col="gray")
legend("bottomright",
       legend=c(
           paste0("adjusted p > 0.05 (",
                  sum(bmi$col=="black",na.rm=T), " CpG sites)"),
           paste0("adjusted p < 0.05 (",
                  sum(bmi$col=="red",na.rm=T), " CpG sites)")),
       fill=c("black","red"))           

plot of chunk unnamed-chunk-19