We developed a format for storing and harmonising GWAS summary data known as GWAS VCF format. This format is effective for being very fast when querying chromosome and position ranges, handling multiallelic variants and indels.
All the data in the IEU GWAS database is available for download in the GWAS VCF format. This R package provides fast and convenient functions for querying and creating GWAS summary data in GWAS VCF format. The package builds on the VariantAnnotation Bioconductor package, which itself is based on the widely used SummarizedExperiment Bioconductor package.
For some VCF querying functions it is faster to optionally use bcftools, and when available the R package will use that strategy. To set a location for the bcftools package, use
library(gwasvcf)
set_bcftools('/path/to/bcftools')
Note that there is bcftools binary for Windows available, so some querying options will be slower on Windows.
For LD related functions the package uses plink 1.90. You can specify the location of your plink installation by running
set_plink('/path/to/plink')
Alternatively you can automatically use use the binaries bundled here: https://github.com/mrcieu/genetics.binaRies
remotes::install_github('mrcieu/genetics.binaRies')
set_plink()
set_bcftools()
To unset a path:
set_plink(NULL)
set_bcftools(NULL)
For this vignette we will use the bundled binaries in
genetics.binaRies
.
suppressWarnings(suppressPackageStartupMessages({
library(gwasvcf)
library(VariantAnnotation)
library(dplyr)
library(magrittr)
}))
set_bcftools()
#> Loading required namespace: genetics.binaRies
#> Path not provided, using binaries in the MRCIEU/genetics.binaRies package
To read an entire dataset use the readVcf
function. As
an example we’ll use the bundled data which is a small subset of the
Speliotes et al 2010 BMI GWAS.
vcffile <- system.file("extdata", "data.vcf.gz", package="gwasvcf")
vcf <- readVcf(vcffile)
class(vcf)
#> [1] "CollapsedVCF"
#> attr(,"package")
#> [1] "VariantAnnotation"
Please refer to the VariantAnnotation
package
documentation for full details about the CollapsedVCF
object. A brief summary follows.
General info about the dataset can be obtained by calling it:
vcf
#> class: CollapsedVCF
#> dim: 92 1
#> rowRanges(vcf):
#> GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
#> info(vcf):
#> DataFrame with 1 column: AF
#> info(header(vcf)):
#> Number Type Description
#> AF A Float Allele Frequency
#> geno(vcf):
#> List of length 9: ES, SE, LP, AF, SS, EZ, SI, NC, ID
#> geno(header(vcf)):
#> Number Type Description
#> ES A Float Effect size estimate relative to the alternative allele
#> SE A Float Standard error of effect size estimate
#> LP A Float -log10 p-value for effect estimate
#> AF A Float Alternate allele frequency in the association study
#> SS A Float Sample size used to estimate genetic effect
#> EZ A Float Z-score provided if it was used to derive the EFFECT and...
#> SI A Float Accuracy score of summary data imputation
#> NC A Float Number of cases used to estimate genetic effect
#> ID 1 String Study variant identifier
There are 92 rows and 1 column which means 92 SNPs and one GWAS. See the header information:
header(vcf)
#> class: VCFHeader
#> samples(1): IEU-a-2
#> meta(4): fileformat META SAMPLE contig
#> fixed(1): FILTER
#> info(1): AF
#> geno(9): ES SE ... NC ID
See the names of the GWAS datasets (in this case just one, and it refers to the IEU GWAS database ID name):
In this case you can obtain information about this study through the
ieugwasr
package
e.g. ieugwasr::gwasinfo("IEU-a-2")
.
There are a few components within the object:
header
which has the meta data describing the dataset,
including the association result variablesrowRanges
which is information about each variantinfo
which is further metadata about each variantgeno
which is the actual association results for each
GWASthe rowRanges
object is a GenomicRanges
class, which is useful for performing fast operations on chromosome
position information.
rowRanges(vcf)
#> GRanges object with 92 ranges and 5 metadata columns:
#> seqnames ranges strand | paramRangeID REF
#> <Rle> <IRanges> <Rle> | <factor> <DNAStringSet>
#> rs12565286 1 721290 * | NA G
#> rs11804171 1 723819 * | NA T
#> rs2977670 1 723891 * | NA G
#> rs3094315 1 752566 * | NA G
#> rs2073813 1 753541 * | NA G
#> ... ... ... ... . ... ...
#> rs715643 1 1172907 * | NA C
#> rs6675798 1 1176597 * | NA T
#> rs6603783 1 1181751 * | NA T
#> rs6603785 1 1186502 * | NA A
#> rs6603787 1 1188225 * | NA G
#> ALT QUAL FILTER
#> <DNAStringSetList> <numeric> <character>
#> rs12565286 C NA PASS
#> rs11804171 A NA PASS
#> rs2977670 C NA PASS
#> rs3094315 A NA PASS
#> rs2073813 A NA PASS
#> ... ... ... ...
#> rs715643 T NA PASS
#> rs6675798 C NA PASS
#> rs6603783 C NA PASS
#> rs6603785 T NA PASS
#> rs6603787 T NA PASS
#> -------
#> seqinfo: 84 sequences from GRCh37 genome
The VCF object is somewhat complex and you can read more about it in
the ’VariantAnnotation package documentation](https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html).
You can create various other formats that might be easier to use from
it. For example, create a GRanges
object which is great for
fast chromosome-position operations
vcf_to_granges(vcf)
#> GRanges object with 92 ranges and 15 metadata columns:
#> seqnames ranges strand | paramRangeID REF ALT
#> <Rle> <IRanges> <Rle> | <factor> <character> <character>
#> rs12565286 1 721290 * | NA G C
#> rs11804171 1 723819 * | NA T A
#> rs2977670 1 723891 * | NA G C
#> rs3094315 1 752566 * | NA G A
#> rs2073813 1 753541 * | NA G A
#> ... ... ... ... . ... ... ...
#> rs715643 1 1172907 * | NA C T
#> rs6675798 1 1176597 * | NA T C
#> rs6603783 1 1181751 * | NA T C
#> rs6603785 1 1186502 * | NA A T
#> rs6603787 1 1188225 * | NA G T
#> QUAL FILTER ES SE LP AF
#> <numeric> <character> <numeric> <numeric> <numeric> <numeric>
#> rs12565286 NA PASS -0.0067 0.0145 0.1930060 0.93220
#> rs11804171 NA PASS -0.0146 0.0175 0.3935110 0.96296
#> rs2977670 NA PASS 0.0044 0.0184 0.0909791 0.07143
#> rs3094315 NA PASS 0.0060 0.0065 0.4485500 0.15520
#> rs2073813 NA PASS 0.0035 0.0102 0.1357860 NA
#> ... ... ... ... ... ... ...
#> rs715643 NA PASS 0.0019 0.0118 0.0594337 0.90833
#> rs6675798 NA PASS -0.0013 0.0067 0.0725270 0.89170
#> rs6603783 NA PASS -0.0002 0.0069 0.0101499 0.90000
#> rs6603785 NA PASS 0.0075 0.0104 0.3271640 0.91667
#> rs6603787 NA PASS 0.0025 0.0089 0.1085740 NA
#> SS EZ SI NC ID id
#> <numeric> <numeric> <numeric> <numeric> <character> <character>
#> rs12565286 109823.0 NA NA NA rs12565286 IEU-a-2
#> rs11804171 84828.0 NA NA NA rs11804171 IEU-a-2
#> rs2977670 68458.9 NA NA NA rs2977670 IEU-a-2
#> rs3094315 131544.0 NA NA NA rs3094315 IEU-a-2
#> rs2073813 64351.3 NA NA NA rs2073813 IEU-a-2
#> ... ... ... ... ... ... ...
#> rs715643 121822 NA NA NA rs715643 IEU-a-2
#> rs6675798 223475 NA NA NA rs6675798 IEU-a-2
#> rs6603783 220022 NA NA NA rs6603783 IEU-a-2
#> rs6603785 165964 NA NA NA rs6603785 IEU-a-2
#> rs6603787 199099 NA NA NA rs6603787 IEU-a-2
#> -------
#> seqinfo: 84 sequences from GRCh37 genome
Create a data frame:
vcf_to_granges(vcf) %>% dplyr::as_tibble()
#> # A tibble: 92 × 20
#> seqnames start end width strand paramRangeID REF ALT QUAL FILTER
#> <fct> <int> <int> <int> <fct> <fct> <chr> <chr> <dbl> <chr>
#> 1 1 721290 721290 1 * NA G C NA PASS
#> 2 1 723819 723819 1 * NA T A NA PASS
#> 3 1 723891 723891 1 * NA G C NA PASS
#> 4 1 752566 752566 1 * NA G A NA PASS
#> 5 1 753541 753541 1 * NA G A NA PASS
#> 6 1 754192 754192 1 * NA A G NA PASS
#> 7 1 768448 768448 1 * NA G A NA PASS
#> 8 1 775659 775659 1 * NA A G NA PASS
#> 9 1 777122 777122 1 * NA A T NA PASS
#> 10 1 779322 779322 1 * NA A G NA PASS
#> # ℹ 82 more rows
#> # ℹ 10 more variables: ES <dbl>, SE <dbl>, LP <dbl>, AF <dbl>, SS <dbl>,
#> # EZ <dbl>, SI <dbl>, NC <dbl>, ID <chr>, id <chr>
The direct conversion to formats for tools such as TwoSampleMR, coloc, and many others can also be made using the https://github.com/mrcieu/gwasglue R package.
The query_gwas
function takes either a filename to a vcf
file, or vcf object as the main argument. You can then query on
rsid
, pval
or chrompos
. For
example
vcfsubset <- query_gwas(vcffile, chrompos=c("1:1097291-1099437"))
and
vcf <- readVcf(vcffile)
vcfsubset <- query_gwas(vcf, chrompos=c("1:1097291-1099437"))
are each identical, but the former saves time and memory because it is querying the file using an index and only reading in what is required.
Examples of other filters are here:
vcf <- query_gwas(vcffile, rsid=c("rs3128126", "rs3121561", "rs3813193"))
vcf
#> class: CollapsedVCF
#> dim: 3 1
#> rowRanges(vcf):
#> GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
#> info(vcf):
#> DataFrame with 3 columns: AF, AC, AN
#> info(header(vcf)):
#> Number Type Description
#> AF A Float Allele Frequency
#> AC A Integer Allele count in genotypes
#> AN 1 Integer Total number of alleles in called genotypes
#> geno(vcf):
#> List of length 9: ES, SE, LP, AF, SS, EZ, SI, NC, ID
#> geno(header(vcf)):
#> Number Type Description
#> ES A Float Effect size estimate relative to the alternative allele
#> SE A Float Standard error of effect size estimate
#> LP A Float -log10 p-value for effect estimate
#> AF A Float Alternate allele frequency in the association study
#> SS A Float Sample size used to estimate genetic effect
#> EZ A Float Z-score provided if it was used to derive the EFFECT and...
#> SI A Float Accuracy score of summary data imputation
#> NC A Float Number of cases used to estimate genetic effect
#> ID 1 String Study variant identifier
vcf <- query_gwas(vcffile, pval=0.5)
vcf
#> class: CollapsedVCF
#> dim: 45 1
#> rowRanges(vcf):
#> GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
#> info(vcf):
#> DataFrame with 3 columns: AF, AC, AN
#> info(header(vcf)):
#> Number Type Description
#> AF A Float Allele Frequency
#> AC A Integer Allele count in genotypes
#> AN 1 Integer Total number of alleles in called genotypes
#> geno(vcf):
#> List of length 9: ES, SE, LP, AF, SS, EZ, SI, NC, ID
#> geno(header(vcf)):
#> Number Type Description
#> ES A Float Effect size estimate relative to the alternative allele
#> SE A Float Standard error of effect size estimate
#> LP A Float -log10 p-value for effect estimate
#> AF A Float Alternate allele frequency in the association study
#> SS A Float Sample size used to estimate genetic effect
#> EZ A Float Z-score provided if it was used to derive the EFFECT and...
#> SI A Float Accuracy score of summary data imputation
#> NC A Float Number of cases used to estimate genetic effect
#> ID 1 String Study variant identifier
vcf <- query_gwas(vcffile, chrompos=c("1:1097291-1099437"))
vcf
#> class: CollapsedVCF
#> dim: 2 1
#> rowRanges(vcf):
#> GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
#> info(vcf):
#> DataFrame with 3 columns: AF, AC, AN
#> info(header(vcf)):
#> Number Type Description
#> AF A Float Allele Frequency
#> AC A Integer Allele count in genotypes
#> AN 1 Integer Total number of alleles in called genotypes
#> geno(vcf):
#> List of length 9: ES, SE, LP, AF, SS, EZ, SI, NC, ID
#> geno(header(vcf)):
#> Number Type Description
#> ES A Float Effect size estimate relative to the alternative allele
#> SE A Float Standard error of effect size estimate
#> LP A Float -log10 p-value for effect estimate
#> AF A Float Alternate allele frequency in the association study
#> SS A Float Sample size used to estimate genetic effect
#> EZ A Float Z-score provided if it was used to derive the EFFECT and...
#> SI A Float Accuracy score of summary data imputation
#> NC A Float Number of cases used to estimate genetic effect
#> ID 1 String Study variant identifier
It’s possible to chain filters together e.g.
vcf <- query_gwas(vcffile, rsid=c("rs3128126", "rs3121561", "rs3813193")) %>%
query_gwas(pval=0.5)
vcf
#> class: CollapsedVCF
#> dim: 1 1
#> rowRanges(vcf):
#> GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
#> info(vcf):
#> DataFrame with 3 columns: AF, AC, AN
#> info(header(vcf)):
#> Number Type Description
#> AF A Float Allele Frequency
#> AC A Integer Allele count in genotypes
#> AN 1 Integer Total number of alleles in called genotypes
#> geno(vcf):
#> List of length 9: ES, SE, LP, AF, SS, EZ, SI, NC, ID
#> geno(header(vcf)):
#> Number Type Description
#> ES A Float Effect size estimate relative to the alternative allele
#> SE A Float Standard error of effect size estimate
#> LP A Float -log10 p-value for effect estimate
#> AF A Float Alternate allele frequency in the association study
#> SS A Float Sample size used to estimate genetic effect
#> EZ A Float Z-score provided if it was used to derive the EFFECT and...
#> SI A Float Accuracy score of summary data imputation
#> NC A Float Number of cases used to estimate genetic effect
#> ID 1 String Study variant identifier
It’s possible to have multiple GWAS studies per vcf. You can specify specific GWAS studies to read in using e.g.
vcf <- query_gwas(vcffile, rsid=c("rs3128126", "rs3121561", "rs3813193"), id="IEU-a-2")
Note that querying by chrompos is the fastest way to deal with VCFs, use this over rsid where possible when speed is an issue.
Querying by rsid is slow. If a large number of queries by rsid are to be performed then it could be worth generating an index which would speed up the querying. This approach uses SQLite to create a local database, linking rsid to chromosome and position. It strips out the ‘rs’ from the rs identifiers to make fast searchers by integer. The concept is based on that developed here: bioforensics/rsidx.
To create the index:
create_rsidx_index_from_vcf(vcffile, "index.rsidx")
#> Extracting position info
#> Generating index
To query using the index:
vcf <- query_gwas(vcffile, rsid=c("rs3128126", "rs3121561", "rs3813193"), rsidx="index.rsidx")
Querying by p-value is slow. It could be worth generating an index file for p-values to speed this up. Similar to rsid queries, it uses an sqlite database linking -log10 pvalues to chromosome and position.
To create the index:
create_pval_index_from_vcf(vcffile, maximum_pval=0.05, "index.pvali")
#> Extracting pval info
#> [1] "CREATE TABLE pval_to_coord (chrom TEXT NOT NULL DEFAULT NULL, coord INTEGER NOT NULL DEFAULT NULL, LP REAL NOT NULL DEFAULT 0);"
#> [2] ".separator ,"
#> [3] ".import /tmp/Rtmp4NVB5y/file1a267cbb0e79 pval_to_coord"
#> [4] "CREATE INDEX idx_LP ON pval_to_coord (LP)"
#> Generating index
To query using the index:
vcf <- query_gwas(vcffile, pval=0.05, pvali="index.pvali")
#> Using pval index
#> Identified 7 variants passing threshold. Extracting...
The fastest way to query VCFs is by specifying chromosome and position. Can specify specific positions, or ranges. e.g.
cp <- c("1:10000", "2:10000-20000")
or as a data frame
You can check what will be parsed out with:
parse_chrompos(cp)
#> GRanges object with 2 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] 1 10000 *
#> [2] 2 10000-20000 *
#> -------
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
Querying by p-value or rsid is also possible but is slower as only chrompos is indexed. On Mac and Linux, rsid and p-value queries are performed by calls to bcftools. On Windows it uses VariantAnnotation directly, because bcftools binaries are not available. This is unfortunately somewhat slower. If many operations are being performed it might be faster to read in the whole dataset and perform queries that way.
If a set of rsids are requested from a vcf but some are absent, a reference panel can be used to search for LD proxies, extract them, and align the effects and alleles against the original variants that were requested.
There are two ways to perform the LD proxy search:
An LD reference panel can be obtained from here: http://fileserve.mrcieu.ac.uk/ld/data_maf0.01_rs_ref.tgz. This dataset comprises Europeans from the 1000 genomes project, in plink format, and including only SNPs with MAF > 0.01, and with the reference alleles aligned to the human genome reference sequence. For this vignette we can use a small subset of that dataset:
ldfile <- system.file("extdata", "eur.bed", package="gwasvcf") %>%
gsub(".bed", "", .)
We also need to provide a path to the plink binary used to generate
LD calculations. This can be done through the
genetics.binaRies
package as with bcftools
set_plink()
#> Path not provided, using binaries in the MRCIEU/genetics.binaRies package
The rs4442317 variant is not present in the vcf file, i.e. if we query that variant:
query_gwas(vcffile, rsid="rs4442317") %>% nrow
#> [1] 0
vcf <- query_gwas(vcffile, rsid="rs4442317", proxies="yes", bfile=ldfile, tag_r2=0.05)
#> Initial search...
#> Extracted 0 out of 1 rsids
#> Searching for proxies for 1 rsids
#> Determining searchspace...
#> Proxy lookup...
#> Finding proxies...
#> Taking input= as a system command ('gunzip -c /tmp/Rtmp4NVB5y/file1a26c346ec.targets.ld.gz') and a variable has been used in the expression passed to `input=`. Please use fread(cmd=...). There is a security concern if you are creating an app, and the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.
#> Found 10 proxies
#> Extrating proxies...
#> Identified proxies for 1 of 1 rsids
#> Aligning...
vcf %>% vcf_to_granges()
#> GRanges object with 1 range and 15 metadata columns:
#> seqnames ranges strand | REF ALT QUAL
#> <Rle> <IRanges> <Rle> | <character> <character> <numeric>
#> rs4442317 1 1106784 * | T C NA
#> FILTER ES SE LP AF SS
#> <character> <numeric> <numeric> <numeric> <numeric> <numeric>
#> rs4442317 PASS 0.0059 0.0071 0.391474 0.8559 138001
#> EZ SI NC ID PR id
#> <numeric> <numeric> <numeric> <character> <character> <character>
#> rs4442317 NA NA NA rs4970420 rs4970420 IEU-a-2
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
Here we see that the proxy variant is rs4970420.
You may also extract only the best available proxies even if the
requested rsids are present, by using proxies="only"
. An
example of this shows that the effect size estimates for the proxy
variants are aligned to the effect alleles of the target variants:
# Read vcf
a <- readVcf(vcffile)
# Obtain the best LD proxy for each of the rsids
b <- query_gwas(vcffile, rsid=names(a), proxies="only", bfile=ldfile, tag_r2=0.6)
#> Determining searchspace...
#> Proxy lookup...
#> Finding proxies...
#> Taking input= as a system command ('gunzip -c /tmp/Rtmp4NVB5y/file1a26dd22e7c.targets.ld.gz') and a variable has been used in the expression passed to `input=`. Please use fread(cmd=...). There is a security concern if you are creating an app, and the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.
#> Found 270 proxies
#> Extrating proxies...
#> Identified proxies for 52 of 1 rsids
#> Aligning...
# Match the target data to the proxy data
index <- match(names(b), names(a))
# Plot the target data effects against the proxy data effects
plot(vcf_to_granges(b)$ES, vcf_to_granges(a)$ES[index])
Using the LD reference panel described above, it is possible to create a sqlite tag reference panel using the following commands. First get an example LD reference panel:
ldfile <- system.file("extdata", "eur.bed", package="gwasvcf") %>%
gsub(".bed", "", .)
We also need to provide a path to the plink binary used to generate
LD calculations. This can be done through the
genetics.binaRies
package as with bcftools
set_plink()
#> Path not provided, using binaries in the MRCIEU/genetics.binaRies package
Now generate the tagging database
dbfile <- tempfile()
create_ldref_sqlite(ldfile, dbfile, tag_r2 = 0.05)
#> identifying indels to remove
#> calculating ld tags
#> formatting
#> creating sqlite db
Perform the query
vcf <- query_gwas(vcffile, rsid="rs4442317", proxies="yes", dbfile=dbfile, tag_r2=0.05)
#> Initial search...
#> Extracted 0 out of 1 rsids
#> Searching for proxies for 1 rsids
#> Proxy lookup...
#> Found 168 proxies
#> Extrating proxies...
#> Identified proxies for 1 of 1 rsids
#> Aligning...
vcf %>% vcf_to_granges()
#> GRanges object with 1 range and 15 metadata columns:
#> seqnames ranges strand | REF ALT QUAL
#> <Rle> <IRanges> <Rle> | <character> <character> <numeric>
#> rs4442317 1 1106784 * | T C NA
#> FILTER ES SE LP AF SS
#> <character> <numeric> <numeric> <numeric> <numeric> <numeric>
#> rs4442317 PASS -4e-04 0.0066 0.0214999 0.9 233073
#> EZ SI NC ID PR id
#> <numeric> <numeric> <numeric> <character> <character> <character>
#> rs4442317 NA NA NA rs10907175 rs10907175 IEU-a-2
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
If you have GWAS summary data in a text file or data frame, this can be converted to a VCF object.
vcf <- readVcf(vcffile)
vv <- vcf_to_granges(vcf) %>% dplyr::as_tibble()
out <- vv %$% create_vcf(chrom=seqnames, pos=start, nea=REF, ea=ALT, snp=ID, ea_af=AF, effect=ES, se=SE, pval=10^-LP, n=SS, name="a")
out
#> class: CollapsedVCF
#> dim: 92 1
#> rowRanges(vcf):
#> GRanges with 4 metadata columns: REF, ALT, QUAL, FILTER
#> info(vcf):
#> DataFrame with 0 columns:
#> geno(vcf):
#> List of length 6: AF, ES, SE, LP, SS, ID
#> geno(header(vcf)):
#> Number Type Description
#> AF A Float Alternate allele frequency in the association study
#> ES A Float Effect size estimate relative to the alternative allele
#> SE A Float Standard error of effect size estimate
#> LP A Float -log10 p-value for effect estimate
#> SS A Float Sample size used to estimate genetic effect
#> ID A String Study variant identifier
It’s possible to write the vcf file:
writeVcf(out, file="temp.vcf")
You may want to first harmonise the data so that all the non-effect alleles are aligned to the human genome reference. See the gwasglue package on some functions to do this.
Although still under development, if compared with its predecessor, the gwasglue2 package has several new features, including the use of S4 R objects.
It is possible to create a SummarySet
object from a
GWAS-VCF file or VCF object e.g. output from
VariantAnnotation::readVcf()
, create_vcf()
or
query_gwas()
using the gwasvcf_to_summaryset()
function.
For example:
summaryset <- readVcf(vcffile) %>%
gwasvcf_to_summaryset()
Once the SummarySet
objects are created, it is possible
to use gwasglue2
to harmonise data, harmonise against a LD
matrix, remap genomic coordinates to a different genome assembly,
convert to other formats and more.