Harmonise the alleles and effects between the exposure and outcome

To perform MR, the effects of a SNP on the exposure and on the outcome must be harmonised so that both are expressed relative to the same allele.
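
The idea of expressing both effects relative to the same allele can be illustrated with a minimal base R sketch. The helper `align_outcome_to_exposure()` is hypothetical and is not part of the package; it only handles simple biallelic SNPs and ignores strand differences, palindromic SNPs and INDELs, all of which the real function has to deal with.

```r
# Hypothetical helper (not the package's implementation): express the outcome
# effect relative to the exposure's effect allele for simple biallelic SNPs.
align_outcome_to_exposure <- function(ea_exp, oa_exp, ea_out, oa_out, beta_out) {
  same    <- ea_exp == ea_out & oa_exp == oa_out  # already aligned
  swapped <- ea_exp == oa_out & oa_exp == ea_out  # effect/other alleles reversed
  # Flip the sign where the alleles are swapped; anything else is set to NA
  ifelse(same, beta_out, ifelse(swapped, -beta_out, NA_real_))
}

align_outcome_to_exposure(
  ea_exp   = c("A", "A", "A"),
  oa_exp   = c("G", "G", "G"),
  ea_out   = c("A", "G", "A"),
  oa_out   = c("G", "A", "C"),
  beta_out = c(0.10, 0.10, 0.10)
)
```

The first SNP is left unchanged, the second has its sign flipped because the effect and other alleles are reversed, and the third is set to `NA` because its allele pair (A/C) is incompatible with the exposure's (A/G).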

Usage

harmonise_data(exposure_dat, outcome_dat, action = 2)

Arguments

exposure_dat

Output from read_exposure_data().

outcome_dat

Output from extract_outcome_data().

action

Level of strictness in dealing with SNPs.

action = 1: Assume all alleles are coded on the forward strand, i.e. do not attempt to flip alleles.
action = 2: Try to infer positive strand alleles, using allele frequencies for palindromes (default, conservative).
action = 3: Correct strand for non-palindromic SNPs, and drop all palindromic SNPs from the analysis (more conservative).

If a single value is passed, this action is applied to all outcomes. Multiple values can also be supplied as a vector, with each element relating to a different outcome.
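
The action levels differ mainly in how they treat palindromic SNPs, so it may help to see what "palindromic" means in code. The sketch below uses two hypothetical helpers, `complement()` and `is_palindromic()`, written in base R; they are illustrative assumptions and not the package's internal functions.

```r
# Hypothetical helpers (not the package's internals).

# Return the complementary base, i.e. the allele as read on the other strand
complement <- function(allele) chartr("ACGT", "TGCA", allele)

# A SNP is palindromic when its two alleles are each other's complement
# (A/T or C/G), so the allele pair looks identical on both strands
is_palindromic <- function(a1, a2) a1 == complement(a2)

is_palindromic(c("A", "C", "A"), c("T", "G", "G"))
complement(c("A", "G"))
```

For a non-palindromic SNP such as A/G, a strand mismatch is detectable because the complemented pair (T/C) differs from the original. For A/T or C/G the pair is unchanged by complementing, which is why allele frequencies are needed to infer the strand (action = 2) or the SNP is dropped (action = 3).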

Value

Data frame with harmonised effects and alleles.

Details

Expects data in the format generated by read_exposure_data() and extract_outcome_data(). This means the inputs must be data frames with the following columns:

outcome_dat:

SNP
beta.outcome
se.outcome
effect_allele.outcome
other_allele.outcome
eaf.outcome
outcome

exposure_dat:

SNP
beta.exposure
se.exposure
effect_allele.exposure
other_allele.exposure
eaf.exposure
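
When the inputs are built by hand instead of coming from read_exposure_data() or extract_outcome_data(), a quick check against the columns listed above can catch problems early. The helper `missing_columns()` below is a hypothetical base R sketch, not a package function, and the SNP name `rs1` is made up. It checks only the columns named in this section; the data produced by the package's own functions may carry further columns that this sketch does not know about.

```r
# Hypothetical helper (not part of the package): report which of the columns
# listed in this section are absent from a data frame.
missing_columns <- function(dat, type = c("exposure", "outcome")) {
  type <- match.arg(type)
  expected <- c(
    "SNP",
    paste0(c("beta.", "se.", "effect_allele.", "other_allele.", "eaf."), type)
  )
  # The outcome data additionally needs the 'outcome' column
  if (type == "outcome") expected <- c(expected, "outcome")
  setdiff(expected, names(dat))
}

# Toy exposure data with the allele frequency column deliberately left out
exposure_dat <- data.frame(
  SNP = "rs1", beta.exposure = 0.1, se.exposure = 0.02,
  effect_allele.exposure = "A", other_allele.exposure = "G"
)
missing_columns(exposure_dat, "exposure")
```

Here the call reports `eaf.exposure` as the only missing column; an empty character vector means all listed columns are present.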

The function also tries to harmonise INDELs. Harmonisation is most reliable when INDELs are coded as sequence strings. If they are coded as D/I in one dataset, the function tries to convert them to sequences, provided the other dataset has adequate information. If a variant is coded as D/I in one dataset and has INDEL alleles of equal length in the other, it is dropped. If one or both datasets have only one allele (i.e. the effect allele), harmonisation is inherently more ambiguous and more variants will be dropped.