Reads in exposure data. Checks and organises columns for use with MR or enrichment tests. Infers p-values when possible from beta and se.
Usage
format_data(
dat,
type = "exposure",
snps = NULL,
header = TRUE,
phenotype_col = "Phenotype",
snp_col = "SNP",
beta_col = "beta",
se_col = "se",
eaf_col = "eaf",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
pval_col = "pval",
units_col = "units",
ncase_col = "ncase",
ncontrol_col = "ncontrol",
samplesize_col = "samplesize",
gene_col = "gene",
id_col = "id",
min_pval = 1e-200,
z_col = "z",
info_col = "info",
chr_col = "chr",
pos_col = "pos",
log_pval = FALSE
)
Arguments
- dat
Data frame. Must have header with at least SNP column present.
- type
Is this the exposure or the outcome data that is being read in? The default is
"exposure"
.- snps
SNPs to extract. If NULL then doesn't extract any and keeps all. The default is
NULL
.- header
The default is
TRUE
.- phenotype_col
Optional column name for the column with phenotype name corresponding the the SNP. If not present then will be created with the value
"Outcome"
. The default is"Phenotype"
.- snp_col
Required name of column with SNP rs IDs. The default is
"SNP"
.- beta_col
Required for MR. Name of column with effect sizes. The default is
"beta"
.- se_col
Required for MR. Name of column with standard errors. The default is
"se"
.- eaf_col
Required for MR. Name of column with effect allele frequency. The default is
"eaf"
.- effect_allele_col
Required for MR. Name of column with effect allele. Must contain only the characters "A", "C", "T" or "G". The default is
"effect_allele"
.- other_allele_col
Required for MR. Name of column with non effect allele. Must contain only the characters "A", "C", "T" or "G". The default is
"other_allele"
.- pval_col
Required for enrichment tests. Name of column with p-value. The default is
"pval"
.- units_col
Optional column name for units. The default is
"units"
.- ncase_col
Optional column name for number of cases. The default is
"ncase"
.- ncontrol_col
Optional column name for number of controls. The default is
"ncontrol"
.- samplesize_col
Optional column name for sample size. The default is
"samplesize"
.- gene_col
Optional column name for gene name. The default is
"gene"
.- id_col
The default is
"id"
.- min_pval
Minimum allowed p-value. The default is
1e-200
.- z_col
The default is
"z"
.- info_col
The default is
"info_col"
.- chr_col
The default is
"chr_col"
.- pos_col
The default is
"pos"
.- log_pval
The pval is -log10(P). The default is
FALSE
.