Read outcome data — read_outcome_data

Reads in outcome data. Checks and organises columns for use with MR or enrichment tests. Infers p-values when possible from beta and se.
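
As a rough illustration of what inferring a p-value from beta and se can look like, the sketch below computes a two-sided p-value from a Wald z-statistic in base R. The formula is the standard normal approximation and is an assumption here; this page does not state the exact expression the function uses, and the input values are made up.

```r
# Hypothetical effect sizes and standard errors (made-up values).
beta <- c(0.10, -0.25, 0.00)
se   <- c(0.05,  0.05, 0.10)

# Two-sided p-value from a Wald z-statistic under a normal approximation.
# Assumed formula; not confirmed to be what read_outcome_data does internally.
z    <- beta / se
pval <- 2 * stats::pnorm(abs(z), lower.tail = FALSE)

print(data.frame(beta = beta, se = se, z = z, pval = pval))
```

A beta of zero gives z = 0 and therefore a p-value of 1, while larger |z| values give smaller p-values.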

Usage

read_outcome_data(
  filename,
  snps = NULL,
  sep = " ",
  phenotype_col = "Phenotype",
  snp_col = "SNP",
  beta_col = "beta",
  se_col = "se",
  eaf_col = "eaf",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  pval_col = "pval",
  units_col = "units",
  ncase_col = "ncase",
  ncontrol_col = "ncontrol",
  samplesize_col = "samplesize",
  gene_col = "gene",
  id_col = "id",
  min_pval = 1e-200,
  log_pval = FALSE,
  chr_col = "chr",
  pos_col = "pos"
)

Arguments

filename: Filename. Must have header with at least SNP column present.
snps: SNPs to extract. If NULL, which is the default, no SNPs are extracted and all are kept.
sep: Specify delimiter in file. The default is space, i.e. sep=" ".
phenotype_col: Optional column name for the column with the phenotype name corresponding to the SNP. If not present, it will be created with the value "Outcome". The default is "Phenotype".
snp_col: Required name of column with SNP rs IDs. The default is "SNP".
beta_col: Required for MR. Name of column with effect sizes. The default is "beta".
se_col: Required for MR. Name of column with standard errors. The default is "se".
eaf_col: Required for MR. Name of column with effect allele frequency. The default is "eaf".
effect_allele_col: Required for MR. Name of column with effect allele. Values must be "A", "C", "T" or "G". The default is "effect_allele".
other_allele_col: Required for MR. Name of column with non-effect allele. Values must be "A", "C", "T" or "G". The default is "other_allele".
pval_col: Required for enrichment tests. Name of column with p-value. The default is "pval".
units_col: Optional column name for units. The default is "units".
ncase_col: Optional column name for number of cases. The default is "ncase".
ncontrol_col: Optional column name for number of controls. The default is "ncontrol".
samplesize_col: Optional column name for sample size. The default is "samplesize".
gene_col: Optional column name for gene name. The default is "gene".
id_col: Optional column name to give the dataset an ID. Will be generated automatically if not provided for every trait / unit combination. The default is "id".
min_pval: Minimum allowed p-value. The default is 1e-200.
log_pval: Set to TRUE if the p-value column is given as -log10(P). The default is FALSE.
chr_col: Optional column name for chromosome. The default is "chr".
pos_col: Optional column name for genetic position. The default is "pos".
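
The interaction between log_pval and min_pval can be sketched in base R as follows. This assumes that -log10(P) values are converted back with 10^(-x) and that p-values below min_pval are raised to min_pval; both are readings of the argument descriptions above rather than confirmed internals, and the input values are made up.

```r
# Hypothetical -log10(P) values, as they might appear when log_pval = TRUE.
neg_log10_p <- c(2, 8, 350)
min_pval    <- 1e-200

# Convert back to the p-value scale.
# 10^-350 is below the smallest representable double and underflows to 0.
pval <- 10^(-neg_log10_p)

# Assumed handling of min_pval: floor very small p-values at the minimum.
pval <- pmax(pval, min_pval)

print(pval)
```

Flooring at a small positive value avoids exact zeros, which would otherwise break later steps that take a logarithm of the p-value.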

Value

A data frame.
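
A minimal sketch of a call with non-default column names is shown below. It writes a small tab-separated file of made-up summary statistics to a temporary path and maps its columns onto the arguments documented above. The SNP IDs, values and column names in the file are hypothetical, and the call assumes the function is exported by the TwoSampleMR package, which this page does not state; the call is skipped if that package is not installed.

```r
# Made-up summary statistics with column names that differ from the defaults.
gwas <- data.frame(
  rsid   = c("rs0000001", "rs0000002"),  # hypothetical SNP IDs
  b      = c(0.02, -0.01),
  stderr = c(0.005, 0.004),
  A1     = c("A", "G"),
  A2     = c("C", "T"),
  freq   = c(0.30, 0.55),
  p      = c(6.3e-05, 1.2e-02)
)

# The file must have a header row that includes the SNP column.
path <- tempfile(fileext = ".txt")
write.table(gwas, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Assumption: read_outcome_data is provided by the TwoSampleMR package.
if (requireNamespace("TwoSampleMR", quietly = TRUE)) {
  outcome_dat <- TwoSampleMR::read_outcome_data(
    filename          = path,
    snps              = NULL,   # keep all SNPs
    sep               = "\t",
    snp_col           = "rsid",
    beta_col          = "b",
    se_col            = "stderr",
    effect_allele_col = "A1",
    other_allele_col  = "A2",
    eaf_col           = "freq",
    pval_col          = "p"
  )
  str(outcome_dat)
}
```

Optional columns that are absent from the file, such as units or sample size, are simply left at their default names here.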