Conditional analysis of VCF files can be performed using GCTA’s COJO routine. The procedure implemented here is as follows

  1. Obtain clumped top-hits
  2. Assign each top-hit to an LD region. The LD regions are demarkated using this approach.
  3. Perform finemapping within each LD region that has a top-hit, retaining a representative variant for every credible set
  4. For each LD region that has multiple finemapped loci, perform conditional analysis. e.g. If there are three finemapped loci in a particular region, three conditional analyses will be performed. First, obtain the effects of variant 1 conditional on variants 2 and 3; then variant 2 conditional on variants 1 and 3; then variant 3 conditional on variants 1 and 2.

Ultimately, a list of results will be returned where every fine-mapped variant has a regional set of summary data that is conditionally independent of all neighbouring fine-mapped variants.

Finemapping pipeline

  1. Clump dataset
  2. Map clumps to LD regions
  3. Perform fine mapping in each LD region

Setup:

vcffile <- "ieu-a-300.vcf.gz"
ldref <- "/Users/gh13047/repo/mr-base-api/app/ld_files/EUR"
gwasvcf::set_bcftools()

Perform susieR pipeline:

out <- susieR_pipeline(
    vcffile=vcffile,
    bfile=ldref,
    plink_bin=genetics.binaRies::get_plink_binary(),
    pop="EUR",
    threads=1,
    L=10, 
    estimate_residual_variance=TRUE, 
    estimate_prior_variance=TRUE, 
    check_R=FALSE, 
    z_ld_weight=1/500
)

Each detected region now has a finemapped object stored against it. You can see them for example like this:

summary(out$res[[1]]$susieR)
susieR::susie_plot(out$res[[1]]$susieR, y="PIP")

For each region we can extract the variants with the highest posterior inclusion probability per credible set, e.g.:

out$res[[1]]$susieR$fmset

Conditional analysis pipeline

Now we can perform conditional analysis at each region using knowledge of the finemapped variants. The cojo_cond function does the following

  1. Creates temporary directory to store files
  2. Writes vcf file to summary stats file in COJO format
  3. Determines regions that have multiple fine-mapped variants
  4. For each fine-mapped variant, obtains summary stats conditional on other fine-mapped variants in the region

The result is a list of regions, with a set of conditional summary stats for every fine-mapped variant in that region.

out2 <- cojo_cond(
    vcffile=vcffile, 
    bfile=ldref,
    pop="EUR",
    snplist=unlist(sapply(out$res, function(x) x$susieR$fmset))
)

TODO

  • Make sure finemapped variants are in reference panel
  • Improve speed of cojo by implementing within R so don’t have to use GCTA
  • Determine how to combine cojo with coloc