Skip to contents

The GPMap API allows you to upload your own GWAS summary statistics and receive colocalisation results against the existing GPMap. This vignette covers how to upload a GWAS, retrieve your results, interpret them, and compare your upload with another trait (including another upload) to find insights.

Uploading a GWAS

Use gpmapr::upload_gwas() to submit your summary statistics. You need a file with standard GWAS columns and metadata such as sample size and ancestry.

# Example: upload a GWAS file
result <- gpmapr::upload_gwas(
  file = "path/to/your_gwas.tsv.gz",
  name = "Specific Triat",
  p_value_threshold = 5e-8,
  column_names = list(
    CHR = "chr",
    BP = "pos",
    P = "pval",
    EA = "effect_allele",
    OA = "other_allele",
    EAF = "eaf",
    BETA = "beta",
    SE = "se"
  ),
  email = "your@email.com",
  category = "continuous",
  ancestry = "EUR",
  sample_size = 50000,
  reference_build = "GRCh38"
)

You can also compare your upload with another upload by passing the GUIDs of the uploads you want to compare with, by including compare_with_upload_guids = c("GUID1").

The response includes a GUID (a UUID such as a1b2c3d4-e5f6-7890-abcd-ef1234567890). Save this GUID; you use it to fetch your results once they are ready.

# The GUID is returned in the response
my_guid <- result$id

Fetching your results

Processing can take some time. Use the GUID with trait() to check status and retrieve results. Here we will use an GWAS that has already been uploaded to the GPMap.

This is of Atopic Dermatitis.

my_guid <- "7a289615-c1b4-91f3-3d97-887f60de9155"
my_results <- gpmapr::trait(my_guid, include_associations = TRUE)

When processing is complete, the result includes trait, coloc_groups, study_extractions, upload_study_extractions, and optionally associations and coloc_pairs.

Interpreting your results

Much of the same analysis that is shown in the case study can be applied to your upload results. If you are interested in the relationship between your upload and another trait, you can filter the coloc groups by trait id.

Your upload results use the same structure as traits in the database. Here are the main components and what they mean.

Study extractions

The difference between study_extractions and upload_study_extractions is that upload_study_extractions are the individual finemapped regions (LD blocks) that were found from the uploaded GWAS, while study_extractions are the existing finemapped loci that your upload colocalises with.

Key columns include study, snp, gene, chr, bp, min_p, and trait metadata (trait_name, data_type, tissue).

# Study extractions (structure may vary - list of dataframes or single dataframe)
uploaded_study_extractions <- my_results$upload_study_extractions
paste("Number of uploaded study extractions:", nrow(uploaded_study_extractions))
#> [1] "Number of uploaded study extractions: 144"
study_extractions <- my_results$study_extractions
paste("Number of study extractions associated with your upload:", nrow(study_extractions))
#> [1] "Number of study extractions associated with your upload: 3854"

Coloc groups and coloc pairs

Although coloc groups provide a more robust view of the relationship between your trait and other traits, coloc pairs can be useful for identifying the specific regions that are driving the colocalisation, espeically if the loci is less powered, and therefore less likely to be captured by the coloc groups.

Coloc groups identify genomic regions (LD blocks) where your trait colocalises with other studies in the map. Each row is a study extraction that shares a colocalisation signal with your trait at that region.

If you are interested in the relationship between your trait and another specific trait, you can filter the coloc groups by trait id.

trait_to_compare <- 2527L # Allergic rhinitis
upload_id   <- my_results$trait$id
upload_name <- as.character(my_results$trait$name)[[1L]]

compare_name <- {
  nm <- my_results$coloc_groups |>
    dplyr::filter(trait_id == trait_to_compare) |>
    dplyr::distinct(trait_name) |>
    dplyr::pull(trait_name)
  if (length(nm) >= 1L) nm[[1L]] else "Comparison trait"
}

shared_coloc_groups <- my_results$coloc_groups |>
  dplyr::filter(trait_id == trait_to_compare) |>
  dplyr::pull(coloc_group_id) |>
  unique()

compare_coloc_groups <- my_results$coloc_groups |>
  dplyr::filter(
    coloc_group_id %in% shared_coloc_groups,
    trait_id == trait_to_compare | gwas_upload_id == upload_id
  )

compare_coloc_groups <- compare_coloc_groups |>
  dplyr::group_by(coloc_group_id) |>
  dplyr::filter(!any(se == 1)) |>
  dplyr::ungroup()


# Strongest hit (smallest min_p) per group per side
pick_lead <- function(dat, grp_col) {
  dat |>
    dplyr::filter(!is.na(beta), !is.na(se), se > 0, !is.na(min_p)) |>
    dplyr::group_by(dplyr::across(dplyr::all_of(grp_col))) |>
    dplyr::slice_min(order_by = min_p, n = 1L, with_ties = FALSE) |>
    dplyr::ungroup()
}

trait_colours <- c("#2166ac", "#d6604d")
names(trait_colours) <- c(upload_name, compare_name)

make_forest_plot <- function(df, panel_labels, title_str) {
  lbl_fn <- function(x) {
    m <- match(as.character(x), as.character(panel_labels$group_id))
    out <- panel_labels$strip_lbl[m]
    dplyr::if_else(is.na(out), as.character(x), out)
  }
  ggplot2::ggplot(df, ggplot2::aes(x = beta, y = trait_lbl, colour = trait_lbl)) +
    ggplot2::geom_vline(xintercept = 0, colour = "red", lty = 2) +
    ggplot2::geom_hline(yintercept = 1.5, colour = "grey85", linewidth = 0.5) +
    ggplot2::geom_errorbar(
      ggplot2::aes(xmin = beta - 1.96 * se, xmax = beta + 1.96 * se),
      width = 0.12, linewidth = 0.35
    ) +
    ggplot2::geom_point(size = 2.5) +
    ggplot2::facet_grid(
      rows = ggplot2::vars(coloc_panel),
      scales = "fixed", space = "fixed", switch = "y",
      labeller = ggplot2::labeller(coloc_panel = lbl_fn)
    ) +
    ggplot2::scale_y_discrete(expand = ggplot2::expansion(add = 0.65)) +
    ggplot2::scale_colour_manual(values = trait_colours,
                                 breaks = c(upload_name, compare_name)) +
    ggplot2::theme_bw() +
    ggplot2::theme(
      panel.grid      = ggplot2::element_blank(),
      panel.border    = ggplot2::element_blank(),
      panel.spacing.y = grid::unit(1.5, "lines"),
      strip.background = ggplot2::element_rect(colour = NA),
      strip.placement  = "outside",
      plot.title       = ggplot2::element_text(hjust = 0.5, size = 12),
      legend.position  = "bottom",
      axis.text.y      = ggplot2::element_blank(),
      axis.ticks.y     = ggplot2::element_blank()
    ) +
    ggplot2::labs(x = "Study beta", y = NULL, colour = NULL, title = title_str)
}

make_strip_labels <- function(df) {
  df |>
    dplyr::group_by(group_id) |>
    dplyr::summarise(
      strip_lbl = paste(unique(stats::na.omit(display_snp)), collapse = " · "),
      .groups = "drop"
    ) |>
    dplyr::mutate(strip_lbl = dplyr::if_else(
      nzchar(strip_lbl), strip_lbl, as.character(group_id)
    ))
}

build_forest_df <- function(upload_rows, compare_rows, max_groups = 5L) {
  df <- dplyr::bind_rows(
    upload_rows  |> dplyr::mutate(trait_lbl = upload_name),
    compare_rows |> dplyr::mutate(trait_lbl = compare_name)
  ) |>
    dplyr::mutate(trait_lbl = factor(trait_lbl, levels = c(upload_name, compare_name)))

  ids_both <- df |>
    dplyr::count(group_id, name = "n") |>
    dplyr::filter(n == 2L) |>
    dplyr::slice_head(n = max_groups) |>
    dplyr::pull(group_id)

  df |>
    dplyr::filter(group_id %in% ids_both) |>
    dplyr::mutate(coloc_panel = factor(group_id))
}

# --- Section 1: Coloc groups ---
cg_upload <- compare_coloc_groups |>
  dplyr::filter(gwas_upload_id == upload_id) |>
  pick_lead("coloc_group_id") |>
  dplyr::mutate(group_id = as.character(coloc_group_id))

cg_compare <- compare_coloc_groups |>
  dplyr::filter(trait_id == trait_to_compare) |>
  pick_lead("coloc_group_id") |>
  dplyr::mutate(group_id = as.character(coloc_group_id))

cg_df <- build_forest_df(cg_upload, cg_compare)

if (nrow(cg_df) > 0) {
  print(make_forest_plot(
    cg_df, make_strip_labels(cg_df),
    "Colocalisation groups"
  ))
}


# --- Section 2: Coloc pairs only (h4 > 0.8, locus not in shared coloc groups) ---
pairs_all <- my_results$coloc_pairs

if (!is.null(pairs_all) && nrow(pairs_all) > 0 && "h4" %in% names(pairs_all) &&
    "ld_block_id" %in% names(pairs_all)) {
  covered_ld <- compare_coloc_groups |>
    dplyr::pull(ld_block_id) |>
    unique()

  sig_ld <- pairs_all |>
    dplyr::filter(h4 > 0.8, !ld_block_id %in% covered_ld) |>
    dplyr::pull(ld_block_id) |>
    unique()

  if (length(sig_ld) > 0) {
    cp_upload <- my_results$coloc_groups |>
      dplyr::filter(gwas_upload_id == upload_id, ld_block_id %in% sig_ld) |>
      pick_lead("ld_block_id") |>
      dplyr::mutate(group_id = as.character(ld_block_id))

    cp_compare <- my_results$coloc_groups |>
      dplyr::filter(trait_id == trait_to_compare, ld_block_id %in% sig_ld) |>
      pick_lead("ld_block_id") |>
      dplyr::mutate(group_id = as.character(ld_block_id))

    cp_df <- build_forest_df(cp_upload, cp_compare)

    if (nrow(cp_df) > 0) {
      print(make_forest_plot(
        cp_df, make_strip_labels(cp_df),
        "Coloc pairs only (h4 > 0.8, not in a shared coloc group)"
      ))
    }
  }
}

Coloc pairs give pairwise colocalisation probabilities between study extractions:

  • h4: Both traits share one common causal variant
  • h3: Both traits associate at the region through two distinct causal variants

The forest plot above already includes loci with h4 > 0.8 that are not captured by any shared coloc group. The raw pair scores for all significant pairs are shown below.

truncate_results <- 20L

pairs <- my_results$coloc_pairs
if (!is.null(pairs) && nrow(pairs) > 0) {
  cols <- intersect(names(pairs), c(
    "study_extraction_id_a", "study_extraction_id_b",
    "existing_study_extraction_id_a", "existing_study_extraction_id_b",
    "ld_block_id", "h3", "h4", "false_positive", "spurious"
  ))
  knitr::kable(
    head(dplyr::filter(pairs[, cols, drop = FALSE], h4 > 0.8), truncate_results),
    digits = 3
  )
} else {
  "No coloc pairs for these traits"
}
existing_study_extraction_id_a study_extraction_id_a existing_study_extraction_id_b study_extraction_id_b ld_block_id h3 h4 false_positive
NA 7293 46415 NA 9 0.083 0.882 FALSE
NA 7293 49939 NA 9 0.117 0.879 FALSE
NA 7293 50071 NA 9 0.003 0.996 FALSE
NA 7293 50085 NA 9 0.086 0.911 FALSE
NA 7294 276438 NA 58 0.002 0.998 FALSE
NA 7294 276555 NA 58 0.069 0.931 FALSE
NA 7294 276582 NA 58 0.002 0.998 FALSE
NA 7295 NA 7296 59 0.000 1.000 FALSE
NA 7297 NA 7298 59 0.000 1.000 FALSE
NA 7303 281882 NA 60 0.067 0.933 FALSE
NA 7303 282948 NA 60 0.064 0.936 FALSE
NA 7303 283008 NA 60 0.071 0.929 FALSE
NA 7303 283259 NA 60 0.057 0.943 FALSE
NA 7303 289880 NA 60 0.097 0.903 FALSE
NA 7303 291699 NA 60 0.047 0.952 FALSE
NA 7303 291703 NA 60 0.127 0.872 FALSE
NA 7303 296454 NA 60 0.039 0.960 FALSE
NA 7303 296463 NA 60 0.052 0.948 FALSE
NA 7303 296756 NA 60 0.025 0.975 FALSE
NA 7303 296758 NA 60 0.047 0.953 FALSE