Skip to contents

Reads in exposure data. Checks and organises columns for use with MR or enrichment tests. Infers p-values when possible from beta and se.

Usage

read_exposure_data(
  filename,
  clump = FALSE,
  sep = " ",
  phenotype_col = "Phenotype",
  snp_col = "SNP",
  beta_col = "beta",
  se_col = "se",
  eaf_col = "eaf",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  pval_col = "pval",
  units_col = "units",
  ncase_col = "ncase",
  ncontrol_col = "ncontrol",
  samplesize_col = "samplesize",
  gene_col = "gene",
  id_col = "id",
  min_pval = 1e-200,
  log_pval = FALSE,
  chr_col = "chr",
  pos_col = "pos",
  clump_kb = 10000,
  clump_r2 = 0.001,
  clump_p1 = 1,
  pop = "EUR",
  bfile = NULL,
  plink_bin = NULL
)

Arguments

filename

Filename. Must have header with at least SNP column present.

clump

Whether to perform LD clumping with clump_data() on the exposure data. The default is FALSE.

sep

Specify delimiter in file. The default is a space, i.e. " ".

phenotype_col

Optional column name for the column with phenotype name corresponding the the SNP. If not present then will be created with the value "Outcome". The default is "Phenotype".

snp_col

Required name of column with SNP rs IDs. The default is "SNP".

beta_col

Required for MR. Name of column with effect sizes. The default is "beta".

se_col

Required for MR. Name of column with standard errors. The default is "se".

eaf_col

Required for MR. Name of column with effect allele frequency. The default is "eaf".

effect_allele_col

Required for MR. Name of column with effect allele. Must be "A", "C", "T" or "G". The default is "effect_allele".

other_allele_col

Required for MR. Name of column with non effect allele. Must be "A", "C", "T" or "G". The default is "other_allele".

pval_col

Required for enrichment tests. Name of column with p-value. The default is "pval".

units_col

Optional column name for units. The default is "units".

ncase_col

Optional column name for number of cases. The default is "ncase".

ncontrol_col

Optional column name for number of controls. The default is "ncontrol".

samplesize_col

Optional column name for sample size. The default is "samplesize".

gene_col

Optional column name for gene name. The default is "gene".

id_col

Optional column name to give the dataset an ID. Will be generated automatically if not provided for every trait / unit combination. The default is "id".

min_pval

Minimum allowed p-value. The default is 1e-200.

log_pval

The p-value is -log10(P). The default is FALSE.

chr_col

Optional column name for chromosome. Default is "chr".

pos_col

Optional column name for genetic position Default is "pos".

clump_kb

Clumping window, default is 10000.

clump_r2

Clumping r2 cutoff. Note that this default value has recently changed from 0.01 to 0.001.

clump_p1

Clumping sig level for index SNPs, default is 1.

pop

Super-population to use as reference panel. Default = "EUR". Options are "EUR", "SAS", "EAS", "AFR", "AMR". 'legacy' also available - which is a previously used version of the EUR panel with a slightly different set of markers

bfile

If this is provided then will use the API. Default = NULL

If NULL and bfile is not NULL then will detect packaged plink binary for specific OS. Otherwise specify path to plink binary. Default = NULL

Value

data frame