Tutorial 5: Alternative regional genotype-phenotype map example, with meta-analyses included
Source:vignettes/finemap_coloc2.Rmd
finemap_coloc2.Rmd
remotes::install_github('ritarasteiro/hyprcoloc', build_opts = c('--resave-data', '--no-manual'), build_vignettes = TRUE)
remotes::install_github('ritarasteiro/susieR')
#library(gwasglue2)
devtools::load_all("../") # this was added just for development
library(ieugwasr)
library(susieR) # fork that takes Summaryset
library(hyprcoloc) # fork that takes Dataset
library(dplyr)
This tutorial aims to study the use of statins, a drug to reduce cholesterol, on coronary heart disease and type 2 diabetes. There is some evidence that using statins can increase blood sugar, which can put people who use statins at higher risk of developing type 2 diabetes.
We will be generating a regional genotype-phenotype map for the following traits: chd - coronary heart disease; ldl - low-density lipoprotein; hdl - high-density lipoprotein; trig - triglycerides and t2d - type 2 diabetes around different genes. In this example, meta-analysis will be performed for the chd trait using two independent studies ieu-a-7
and ukb-d-I9_IHD
.
First, we set the path to where the reference population plink files are located to build the LD matrix
bed_ref <- "../data/ld/EUR"
and choose the data for each trait and set the IEU IDs. Note that we are performing a meta-analysis for the chd trait.
chd_id <- c("ieu-a-7", "ukb-d-I9_IHD")
ldl_id <- "ieu-b-110"
hdl_id <- "ieu-b-109"
trig_id <- "ieu-b-111"
t2d_id <- "ebi-a-GCST006867"
1. Setting the metadata
We create a chd_metadata
object for the meta-analysis we are going to perform on the chd trait.
chd_metadata <- lapply(seq_along(chd_id), function(i){
m <- create_metadata(ieugwasr::gwasinfo(chd_id[i]))
})
and then the metadata
objects for each of the other trait objects
ids <- c(ldl_id, hdl_id, trig_id, t2d_id)
# get metadata and create metadata objects
metadata <- lapply(seq_along(ids), function(i){
m <- create_metadata(ieugwasr::gwasinfo(ids[i]))
})
We are going to use these same chd_metadata
and metadata
objects for each analyse for each gene.
Note, that in the code above both chd_metadata
and metadata
objects are lists that contain metadata information for each trait. Eg., chd_metadata
will have metadata information for both ieu-a-7
and ukb-d-I9_IHD
studies, retrieved using ieugwasr::gwasinfo()
.
2. HMGCR (3-hydroxy-3-methylglutaryl-CoA reductase) gene region
Set the gene region:
hmgcr_chrpos <- "5:74132993-75132993"
2.1. Meta-analysis for the chd trait
The code below performs meta-analysis including two studies (ieu-a-7
and ukb-d-I9_IHD
, see above) for the chd trait and the end result is one hmgcr_chd
summaryset. First, it creates the summary sets for each of the studies followed by the dataset which includes them. Finally, performs the meta-analysis to create a new summary set.
hmgcr_chd <- lapply(seq_along(chd_id), function(i){
# create summarysets
s <- create_summaryset(data = ieugwasr::associations(variants = hmgcr_chrpos, id = chd_id[i]), metadata = chd_metadata[[i]])
}) %>%
# create dataset and harmonise
create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# meta-analysis to create a new summary set
meta_analysis(.)
2.2. Creating a dataset for all the traits
Here, we create an harmonised dataset named hmgcr
, which includes all the traits in ids
and the hmgcr_chd
summaryset created above. We start by creating the summarysets for each trait and then the dataset, which comprise all the summarysets (note that we are not saving the SummarySet objects separately, but filling them directly to the DataSet). Then we add the hmgcr_chd
summaryset and harmonise trait against trait followed by each trait against a LD matrix.
# create an harmonised gwasglue2 DataSet object
hmgcr <- lapply(seq_along(ids), function(i){
# create summarysets
dt <- create_summaryset(data = ieugwasr::associations(variants = hmgcr_chrpos, id = ids[i]), metadata = metadata[[i]])
}) %>%
# create dataset
create_dataset(., harmonise = FALSE) %>%
# add chd summaryset and harmonise
add_summaryset(hmgcr_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# harmonise dataset against LD matrix
harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
Before starting with the fine mapping and colocalisation analyses, we are going to look at the p-values for each variant in each trait.
# plot gwasglue2 DataSet object (hmgcr)
plot_gwasglue(hmgcr, type="manhattan", title = "HMGCR")
2.3. Finemapping with SusieR
After summary sets are harmonised, marginalise each summary set independently and create a new dataset with all marginalised summary sets merged
# do finemapping with susie
ntraits <- getLength(hmgcr)
hmgcr_marginalised <- lapply(1:ntraits, function(trait)
{
# Takes in 1 SS
# Outputs 1 DS (with at least 1 SS)
ds <- susie_rss(R = getLDMatrix(hmgcr), summaryset = getSummarySet(hmgcr, trait))
})
Now we are going to merge the datasets. Note that the merge_datasets()
function is just going to merge the marginalised datasets.
hmgcr_marginalised <- merge_datasets(hmgcr_marginalised)
3. PCSK9 (proprotein convertase subtilisin/kexin type 9) gene region
Set the gene region:
pcsk9_chrpos <- "1:55005221-56005221"
3.1. Meta-analysis for the chd trait
Like detailed above, in section 2.1, the code below performs meta-analysis for the chd trait and the end result is one pcsk9_chd
summaryset.
pcsk9_chd <- lapply(seq_along(chd_id), function(i){
# create summarysets
s <- create_summaryset(ieugwasr::associations(variants = pcsk9_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
}) %>%
# create dataset and harmonise
create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# meta-analysis to create a new summary set
meta_analysis(.)
3.2. Creating a dataset for all the traits
Like detailed in section 2.2, here we create another harmonised dataset named pcsk9
, this time for the PCSK9 gene region.
# create an harmonised gwasglue2 DataSet object
pcsk9 <- lapply(seq_along(ids), function(i){
# create summarysets
dt <- create_summaryset(data = ieugwasr::associations(variants = pcsk9_chrpos, id = ids[i]), metadata = metadata[[i]])
}) %>%
# create dataset
create_dataset(., harmonise = FALSE) %>%
# add chd summaryset and harmonise
add_summaryset(pcsk9_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# harmonise dataset against LD matrix
harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
Manhattan plots for each trait in the DataSet.
# plot gwasglue2 DataSet object (pcsk9)
plot_gwasglue(pcsk9, type="manhattan", title = "PCSK9")
3.3. Finemapping with SusieR
Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged.
# do finemapping with susie
ntraits <- getLength(pcsk9)
pcsk9_marginalised <- lapply(1:ntraits, function(trait)
{
# Takes in 1 SS
# Outputs 1 DS (with at least 1 SS)
ds <- susie_rss(R = getLDMatrix(pcsk9), summaryset = getSummarySet(pcsk9, trait))
})
Merge the marginalised datasets:
pcsk9_marginalised <- merge_datasets(pcsk9_marginalised)
4. NPC1L1 (NPC1 like intracellular cholesterol transporter 1) gene region
Set the region:
npc1l1_chrpos <- "7:44052134-45052134"
4.1. Meta-analysis for the chd trait
Like section 2.1, we’ll do a meta-analysis for the chd trait and the end result is one npc1l1_chd
summaryset.
npc1l1_chd <- lapply(seq_along(chd_id), function(i){
# create summarysets
s <- create_summaryset(ieugwasr::associations(variants = npc1l1_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
}) %>%
# create dataset and harmonise
create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# meta-analysis to create a new summary set
meta_analysis(.)
4.2. Creating a dataset for all the traits
Like detailed in section 2.2, here we create another harmonised dataset named npc1l1
, this time for the NPC1L1 gene region.
# create an harmonised gwasglue2 DataSet object
npc1l1 <- lapply(seq_along(ids), function(i){
# create summarysets
dt <- create_summaryset(data = ieugwasr::associations(variants = npc1l1_chrpos, id = ids[i]), metadata = metadata[[i]])
}) %>%
# create dataset
create_dataset(., harmonise = FALSE) %>%
# add chd summaryset and harmonise
add_summaryset(npc1l1_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# harmonise dataset against LD matrix
harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
Manhattan plots for each trait in the DataSet.
# plot gwasglue2 DataSet object (npc1l1)
plot_gwasglue(npc1l1, type="manhattan", title ="NPC1L1")
4.3. Finemapping with SusieR
Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged
ntraits <- getLength( npc1l1)
npc1l1_marginalised <- lapply(1:ntraits, function(trait)
{
# Takes in 1 SS
# Outputs 1 DS (with at least 1 SS)
ds <- susie_rss(R = getLDMatrix(npc1l1), summaryset = getSummarySet( npc1l1, trait))
})
Merge the marginalised datasets:
npc1l1_marginalised <- merge_datasets(npc1l1_marginalised)
5. LPA (Lipoprotein(A)) gene region
Set the region:
lpa_chrpos <- "6:160952515-161087407"
5.1. Meta-analysis for the chd trait
Like section 2.1, we’ll do a meta-analysis for the chd trait and the end result is one lpa_chd
summaryset.
lpa_chd <- lapply(seq_along(chd_id), function(i){
# create summarysets
s <- create_summaryset(ieugwasr::associations(variants = lpa_chrpos, id =chd_id[i]), metadata=chd_metadata[[i]])
}) %>%
# create dataset and harmonise
create_dataset(., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# meta-analysis to create a new summary set
meta_analysis(.)
5.2. Creating a dataset for all the traits
Like detailed in section 2.2, here we create another harmonised dataset named lpa
, this time for the NPC1L1 gene region.
# create an harmonised gwasglue2 DataSet object
lpa <- lapply(seq_along(ids), function(i){
# create summarysets
dt <- create_summaryset(data = ieugwasr::associations(variants = lpa_chrpos, id = ids[i]), metadata = metadata[[i]])
}) %>%
# create dataset
create_dataset(., harmonise = FALSE) %>%
# add chd summaryset and harmonise
add_summaryset(lpa_chd, ., harmonise = TRUE, tolerance = 0.08, action = 1) %>%
# harmonise dataset against LD matrix
harmonise_ld(., bfile = bed_ref , plink_bin = "plink")
Manhattan plots for each trait in the DataSet.
# plot gwasglue2 DataSet object (lpa)
plot_gwasglue(lpa, type="manhattan", title = "LPA")
5.3. Finemapping with SusieR
Marginalise each summary set independently and create a new dataset with all marginalised summary sets merged
ntraits <- getLength(lpa)
lpa_marginalised <- lapply(1:ntraits, function(trait)
{
# Takes in 1 SS
# Outputs 1 DS (with at least 1 SS)
ds <- susie_rss(R = getLDMatrix(lpa), summaryset = getSummarySet(lpa, trait))
})
Merge the marginalised datasets:
lpa_marginalised <- merge_datasets(lpa_marginalised)