Tutorial 1: How to create a SummarySet and a DataSet
Source:vignettes/SummarySet_DataSet.Rmd
SummarySet_DataSet.Rmd
library(gwasglue2)
library(VariantAnnotation)
library( gwasvcf)
library(ieugwasr)
devtools::load_all("../") # this was added just for development
In this tutorial, we will approach the several ways of creating the two types of gwasglue2 objects: the SummarySet()
and the DataSet
. Check the Strategy page for more details on these objects.
SummarySet
To create a SummarySet object we first need GWAS summary data, like the ones that can be obtained from IEU OpenGWAS through the ieugwasr
package.
data1 <- ieugwasr::associations(variants = "5:74132993-75132993", id = "ukb-d-I9_IHD")
Then we use the create_summaryset()
to create the SummarySet
.
sumset1 <- create_summaryset(data1)
getMetadata(sumset1)
In the code above, we did not supply any metadata information. The function will fill the metadata using any information available in the GWAS summary data (in data1
). It is possible to add metadata using the metadata
argument.
To create a metadata
object, we use the create_metadata()
function. We can give metadata information manually using one or more argument of the function. For example:
meta1 <- create_metadata(sample_size = 361194)
print(meta1)
It is also possible for the user to add extra metadata information.
meta1 <- create_metadata(sample_size = 361194, access = date())
print(meta1)
Or to provide a dataframe with metadata information, using the metadata
argument.
meta1 <- create_metadata(metadata = ieugwasr::gwasinfo("ukb-d-I9_IHD"), access = date())
meta1
Thus, we can now build a SummarySet
object with metadata information provided. Note that data1
is a tibble and by default create_summaryset()
reads tibbles.
sumset1 <- create_summaryset(data1, metadata=meta1)
getMetadata(sumset1)
Let’s try with another example, using another IEU id.
First the GWAS summary data,
data2 <- ieugwasr::associations(variants = "5:74132993-75132993", id = "finn-b-I9_CHD")
and then the metadata
meta2 <- create_metadata(metadata = ieugwasr::gwasinfo("finn-b-I9_CHD"))
meta2
You will notice that there is no sample_size
information in the metadata. This absence could cause gwasglue2 further down in the analyses. We could the sample_size
parameter in create_metadata()
to add the argument or add it later. We are going to add this information after creating the SummarySet
.
sumset2 <- create_summaryset(data2, metadata=meta2)
The warning tell us that there is a problem with the GWAS summary data. Specifically, there are variants with the same chromosomal position and alleles that have different betas, p-values and allele frequencies. If we change the quality control parameter to qc = TRUE
, we will let gwasglue2 to deal with this inconsistencies in the dataset.
sumset2 <- create_summaryset(data2, metadata=meta2, qc = TRUE)
Now we will deal with the lack of sample size information in the metadata. If this information was present in data2
, the create_summaryset()
function would fill this information automatically. But this is not the case, and thus we are going to use the addToMetadata()
method and the ncontrol
information already present in the metadata.
getMetadata(sumset2)
sumset2 = addToMetadata(sumset2, sample_size = (getMetadata(sumset2)$ncontrol + getMetadata(sumset2)$ncases))
getMetadata(sumset2)
Create a Summaryset from a GWAS vcf file
In the examples above, the SummarySets were created using a tibble or more specifically the output of the ieugwasr
package. Now we are going to use GWAS vcf files to create a SummarySet.
First we use the Bioconductor package VariantAnnotation
to read the vcf file.
vcffile <- "../data/vcf/IEU-a-2.vcf.gz"
vcf <- VariantAnnotation::readVcf(vcffile)
Then gwasvcf
is used to query the vcf file and converting to simple dataframes.
data_vcf <- gwasvcf::query_gwas(vcf, chrompos=c("5:74132993-75132993"))%>%
gwasvcf::vcf_to_granges() %>%
dplyr:: as_tibble()
Then we create the SummarySet using the create_sumaryset()
function with argument type = "vcf"
sumset_vcf <- create_summaryset(data_vcf, type="vcf")
Another and simpler way of creating a SummarySet
from a vcf file is to use the gwasvcf::gwasvcf_to_summaryset()
function:
sumset_vcf <- gwasvcf::query_gwas(vcf, chrompos=c("5:74132993-75132993")) %>%
gwasvcf::gwasvcf_to_summaryset()
DataSet
A DataSet-class
is a list of two or more harmonised SummarySets
.
First we need to create a list with all SummarySets
we want to analyse,
summarysets <- list(sumset1, sumset2, sumset_vcf)
and then use the create_dataset()
function to build the DataSet
object.
dataset <- create_dataset(summarysets, harmonise = TRUE, tolerance = 0.08, action = 1)