Reads in outcome data. Checks and organises columns for use with MR or enrichment tests. Infers p-values when possible from beta and se.

read_outcome_data(
  filename,
  snps = NULL,
  sep = " ",
  phenotype_col = "Phenotype",
  snp_col = "SNP",
  beta_col = "beta",
  se_col = "se",
  eaf_col = "eaf",
  effect_allele_col = "effect_allele",
  other_allele_col = "other_allele",
  pval_col = "pval",
  units_col = "units",
  ncase_col = "ncase",
  ncontrol_col = "ncontrol",
  samplesize_col = "samplesize",
  gene_col = "gene",
  id_col = "id",
  min_pval = 1e-200,
  log_pval = FALSE,
  chr_col = "chr",
  pos_col = "pos"
)
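
A minimal sketch of a call, assuming the function is provided by the TwoSampleMR package and that the package is installed. The file contents, SNP IDs and the non-default header names (rsid, effect, stderr) are hypothetical and serve only to show how the *_col arguments map file headers to the expected columns.

```r
# Hypothetical summary statistics, written to a temporary tab-delimited file.
outcome_file <- tempfile(fileext = ".txt")
example_stats <- data.frame(
  rsid = c("rs1", "rs2", "rs3"),
  effect = c(0.10, -0.05, 0.02),
  stderr = c(0.02, 0.01, 0.03),
  eaf = c(0.30, 0.45, 0.12),
  effect_allele = c("A", "C", "T"),
  other_allele = c("G", "T", "C"),
  stringsAsFactors = FALSE
)
write.table(example_stats, outcome_file,
            sep = "\t", quote = FALSE, row.names = FALSE)

# Assumption: read_outcome_data() comes from the TwoSampleMR package.
# The call is skipped if the package is not available.
if (requireNamespace("TwoSampleMR", quietly = TRUE)) {
  outcome_dat <- TwoSampleMR::read_outcome_data(
    filename = outcome_file,
    snps = c("rs1", "rs3"),  # keep only these SNPs; NULL keeps all rows
    sep = "\t",              # the file is tab-delimited, not space-delimited
    snp_col = "rsid",        # header names that differ from the defaults
    beta_col = "effect",
    se_col = "stderr"
  )
  str(outcome_dat)
}
```

Columns whose headers already match the defaults (here eaf, effect_allele and other_allele) need no explicit mapping.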

Arguments

filename

Path to the input file. The file must have a header row that includes at least the SNP column.

snps

SNPs to extract. If NULL (the default), no filtering is applied and all SNPs are kept.

sep

Delimiter used in the file. The default is a space, i.e. sep=" ".

phenotype_col

Optional. Name of the column with the phenotype name corresponding to the SNP. If not present, it will be created with the value "Outcome". The default is "Phenotype".

snp_col

Required name of column with SNP rs IDs. The default is "SNP".

beta_col

Required for MR. Name of column with effect sizes. The default is "beta".

se_col

Required for MR. Name of column with standard errors. The default is "se".

eaf_col

Required for MR. Name of column with effect allele frequency. The default is "eaf".

effect_allele_col

Required for MR. Name of column with effect allele. Must be "A", "C", "T" or "G". The default is "effect_allele".

other_allele_col

Required for MR. Name of column with non-effect allele. Must be "A", "C", "T" or "G". The default is "other_allele".

pval_col

Required for enrichment tests. Name of column with p-value. The default is "pval".

units_col

Optional column name for units. The default is "units".

ncase_col

Optional column name for number of cases. The default is "ncase".

ncontrol_col

Optional column name for number of controls. The default is "ncontrol".

samplesize_col

Optional column name for sample size. The default is "samplesize".

gene_col

Optional column name for gene name. The default is "gene".

id_col

Optional column name to give the dataset an ID. If not provided, an ID is generated automatically for every trait / unit combination. The default is "id".

min_pval

Minimum allowed p-value. The default is 1e-200.

log_pval

Set to TRUE if the p-value column contains -log10(P) rather than P. The default is FALSE.

chr_col

Optional column name for chromosome. The default is "chr".

pos_col

Optional column name for genetic position. The default is "pos".
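
The p-value handling described above (inference from beta and se, log_pval and min_pval) can be sketched in base R. This is an illustration under stated assumptions, not the function's actual source: it assumes a two-sided normal-approximation test for the inferred p-value and assumes that values below min_pval are raised to min_pval. All input numbers are hypothetical.

```r
# Hypothetical effect estimates and standard errors.
beta <- c(0.10, -0.05, 0.02)
se   <- c(0.02,  0.01, 0.03)

# Inferring a p-value from beta and se:
# two-sided p-value from the z-statistic beta / se (assumed formula).
pval_inferred <- 2 * pnorm(abs(beta / se), lower.tail = FALSE)

# log_pval = TRUE: the column holds -log10(P), so convert back to P.
neg_log10_p <- c(6.5, 320, 0.3)
pval_from_log <- 10^(-neg_log10_p)

# min_pval: values below the floor are raised to the floor (assumed behaviour).
min_pval <- 1e-200
pval_floored <- pmax(pval_from_log, min_pval)

print(pval_inferred)
print(pval_floored)
```

A floor such as min_pval keeps extremely small p-values away from zero, where they would otherwise underflow or break later log transforms.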

Value

A data frame.