We have tried to provide useful cloud-based functionality for many operations, including relatively demanding LD operations. If you are running a large number of LD operations, we ask that you consider performing them locally rather than through the API. We have tried to write the software so that this works seamlessly. Some examples are given below.
LD operations available on the OpenGWAS API
The API includes a wrapper around plink version 1.90, which it uses to perform clumping against an LD reference panel derived from 1000 genomes reference data.
```r
a <- tophits(id="ieu-a-2", clump=0)
b <- ld_clump(
  dplyr::tibble(rsid=a$name, pval=a$p, id=a$id)
)
```
There are 5 super-populations that can be requested via the `pop` argument. By default this uses the European subset (EUR super-population). The reference panel has INDELs removed and retains only SNPs with MAF > 0.01 in the selected population.
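For example, to clump against a different super-population panel, the `pop` argument can be set explicitly; a minimal sketch following the example above (the study id and column names are carried over from that example):

```r
# Fetch top hits without server-side clumping
a <- tophits(id="ieu-a-2", clump=0)

# Clump using the African super-population reference panel
b <- ld_clump(
  dplyr::tibble(rsid=a$name, pval=a$p, id=a$id),
  pop = "AFR"
)
```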
Note that you can perform the same operation locally if you provide a path to a plink binary and a bed/bim/fam LD reference dataset.
To get a path to plink you can do the following:
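One possible approach is to use the genetics.binaRies helper package, which bundles plink binaries for common platforms; a sketch (the GitHub repository location is an assumption):

```r
# Install the helper package that ships plink binaries
# (repository location is an assumption)
devtools::install_github("explodecomputer/genetics.binaRies")

# Returns the path to a bundled plink binary for your platform
genetics.binaRies::get_plink_binary()
```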
To get the same LD reference dataset that is used by the API, you can download it directly from here:
This contains an LD reference panel for each of the 5 super-populations in the 1000 genomes reference dataset, e.g. for the European super-population it has the files EUR.bed, EUR.bim and EUR.fam.
Now suppose that in R you have a dataframe, `dat`, containing the columns `rsid`, `pval` and `id` (as in the example above). To perform clumping locally, do the following:
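A sketch of local clumping, assuming the plink binary and downloaded reference panel described above (the `bfile` path is a placeholder; it should point to the bed/bim/fam file prefix, without the extension):

```r
ld_clump(
  dat,
  plink_bin = genetics.binaRies::get_plink_binary(),
  bfile = "/path/to/reference/EUR"
)
```

When `plink_bin` and `bfile` are supplied, clumping is run locally against the specified reference panel rather than through the API.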
Similarly, a matrix of LD r values can be generated using the `ld_matrix` function. This uses the API by default but is limited to 500 variants. Instead, you can use local plink and LD reference data in the same manner as in the `ld_clump` function, e.g.
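A sketch, assuming `ld_matrix` accepts the same `plink_bin` and `bfile` arguments as `ld_clump` (the reference path is a placeholder):

```r
ld_matrix(
  dat$rsid,
  plink_bin = genetics.binaRies::get_plink_binary(),
  bfile = "/path/to/reference/EUR"
)
```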
To automatically extract variants from a dataset, and to search for LD proxies when a requested variant is not present in the dataset, see the options available in the gwasvcf package.