library(gwasglue)
library(gwasvcf)

Conditional analysis of VCF files can be performed using GCTA’s COJO routine. The procedure implemented here is as follows:

1. Obtain clumped top-hits
2. Assign each top-hit to an LD region. The LD regions are demarcated using this approach.
3. Perform finemapping within each LD region that has a top-hit, retaining a representative variant for every credible set
4. For each LD region that has multiple finemapped loci, perform conditional analysis. For example, if there are three finemapped loci in a particular region, three conditional analyses will be performed: first, obtain the effects of variant 1 conditional on variants 2 and 3; then variant 2 conditional on variants 1 and 3; then variant 3 conditional on variants 1 and 2.
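
The leave-one-out pattern in step 4 can be sketched in base R. This only illustrates how the conditioning sets are formed and is not the gwasglue implementation; the helper name `conditioning_sets` and the rsids are hypothetical.

```r
# Hypothetical helper: for each fine-mapped variant in a region, list the
# other variants that it would be conditioned on.
conditioning_sets <- function(variants) {
  sets <- lapply(variants, function(v) setdiff(variants, v))
  names(sets) <- variants
  sets
}

# Made-up rsids for a region with three fine-mapped loci
region_variants <- c("rs1", "rs2", "rs3")
cond <- conditioning_sets(region_variants)

# One conditional analysis per fine-mapped variant
for (target in names(cond)) {
  cat(target, "conditional on", paste(cond[[target]], collapse = ", "), "\n")
}
```

A region with a single fine-mapped variant yields an empty conditioning set, which is why only regions with multiple fine-mapped loci need conditional analysis.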

Ultimately, a list of results will be returned where every fine-mapped variant has a regional set of summary data that is conditionally independent of all neighbouring fine-mapped variants.

## Finemapping pipeline

1. Clump dataset
2. Map clumps to LD regions
3. Perform fine mapping in each LD region
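
As a rough illustration of step 2, the sketch below assigns top hits to regions by chromosome and position using base R. The region boundaries, rsids and the helper `assign_region` are all made up for the example; the pipeline itself may map clumps to regions differently.

```r
# Hypothetical LD region boundaries (chromosome, start, end in base pairs)
regions <- data.frame(
  chr   = c(1, 1, 2),
  start = c(1, 1000001, 1),
  end   = c(1000000, 2000000, 1500000)
)

# Hypothetical clumped top hits
tophits <- data.frame(
  rsid = c("rs1", "rs2", "rs3"),
  chr  = c(1, 1, 2),
  pos  = c(250000, 600000, 700000)
)

# Return the row index of the region containing each top hit (NA if none)
assign_region <- function(chr, pos, regions) {
  mapply(function(ch, p) {
    hit <- which(regions$chr == ch & regions$start <= p & regions$end >= p)
    if (length(hit) == 0) NA_integer_ else hit[1]
  }, chr, pos)
}

tophits$region <- assign_region(tophits$chr, tophits$pos, regions)

# Only regions that contain at least one top hit go forward to fine mapping
assigned <- tophits$region[!is.na(tophits$region)]
regions_to_finemap <- sort(unique(assigned))
```

Here two top hits fall in the first region and one in the third, so the second region would be skipped.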

Setup:

vcffile <- "ieu-a-300.vcf.gz"
ldref <- "/Users/gh13047/repo/mr-base-api/app/ld_files/EUR"
gwasvcf::set_bcftools()
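
Before running the pipeline it can help to confirm that the reference panel is in place. The sketch below assumes that `ldref` is the prefix of a PLINK binary fileset (`.bed`, `.bim`, `.fam`); `bfile_exists` is a hypothetical helper, not part of gwasglue or gwasvcf, and it is demonstrated on empty temporary files rather than a real panel.

```r
# Hypothetical helper: check that the three files of a PLINK binary fileset
# exist for a given prefix.
bfile_exists <- function(prefix) {
  all(file.exists(paste0(prefix, c(".bed", ".bim", ".fam"))))
}

# Demonstration on a temporary, empty fileset
demo_prefix <- file.path(tempdir(), "demo_panel")
file.create(paste0(demo_prefix, c(".bed", ".bim", ".fam")))

bfile_exists(demo_prefix)
bfile_exists(file.path(tempdir(), "missing_panel"))
```

In practice the same check could be applied to `ldref` before calling the pipeline.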

Perform susieR pipeline:

out <- susieR_pipeline(
    vcffile=vcffile,
    bfile=ldref,
    pop="EUR",
    L=10,
    estimate_residual_variance=TRUE,
    estimate_prior_variance=TRUE,
    check_R=FALSE,
    z_ld_weight=1/500
)

Each detected region now has a finemapped object stored against it. You can inspect them like this, for example:

summary(out$res[[1]]$susieR)
susieR::susie_plot(out$res[[1]]$susieR, y="PIP")

For each region we can extract the variants with the highest posterior inclusion probability per credible set, e.g.:

out$res[[1]]$susieR$fmset

## Conditional analysis pipeline

Now we can perform conditional analysis at each region using knowledge of the finemapped variants. The cojo_cond function does the following:

1. Creates temporary directory to store files
2. Writes vcf file to summary stats file in COJO format
3. Determines regions that have multiple fine-mapped variants
4. For each fine-mapped variant, obtains summary stats conditional on other fine-mapped variants in the region

The result is a list of regions, with a set of conditional summary stats for every fine-mapped variant in that region.

out2 <- cojo_cond(
    vcffile=vcffile,
    bfile=ldref,
    pop="EUR",
    snplist=unlist(sapply(out$res, function(x) x$susieR$fmset))
)
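
To make the inputs to this step more concrete, here is a minimal base R sketch on mock data. It flattens the per-region fine-mapped sets in the same way as the `snplist` argument above, picks out the regions with more than one fine-mapped variant, and writes a small table using the column layout that GCTA-COJO expects for its summary statistics file (`SNP A1 A2 freq b se p N`). The object `mock_out`, the rsids and all the numbers are invented, and this is not how `cojo_cond` is implemented internally.

```r
# Mock of the structure used above: one element per region, each holding
# the representative fine-mapped variants in susieR$fmset (made-up rsids)
mock_out <- list(res = list(
  list(susieR = list(fmset = c("rs1", "rs2", "rs3"))),
  list(susieR = list(fmset = "rs4")),
  list(susieR = list(fmset = c("rs5", "rs6")))
))

# Same flattening as in the cojo_cond call above
snplist <- unlist(sapply(mock_out$res, function(x) x$susieR$fmset))

# Regions with more than one fine-mapped variant are the ones that need
# conditional analysis
n_per_region <- vapply(
  mock_out$res, function(x) length(x$susieR$fmset), integer(1)
)
multi <- which(n_per_region > 1)

# Made-up summary statistics using the GCTA-COJO input column names
sumstats <- data.frame(
  SNP = snplist, A1 = "A", A2 = "G", freq = 0.3,
  b = 0.05, se = 0.01, p = 1e-6, N = 10000
)

# Write to a fresh temporary directory as a whitespace-delimited file
workdir <- tempfile("cojo_")
dir.create(workdir)
ma_file <- file.path(workdir, "sumstats.ma")
write.table(sumstats, ma_file, row.names = FALSE, quote = FALSE)
```

In this mock, the first and third regions would each receive conditional analyses, while the second, with a single fine-mapped variant, would not.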

TODO

• Make sure finemapped variants are in the reference panel
• Improve the speed of COJO by implementing it within R so that GCTA is not required
• Determine how to combine COJO with coloc