R/harmonise.r
is_forward_strand.Rd
Assuming the reference data is all on the forward strand, check whether the GWAS is too. A threshold is used: for example, if more than 90% of alleles do not need to be flipped, the dataset is likely to be on the forward strand.
is_forward_strand(gwas_snp, gwas_a1, gwas_a2, ref_snp, ref_a1, ref_a2, threshold = 0.9)
Argument | Description |
---|---|
gwas_snp | Vector of SNP names for the dataset being checked |
gwas_a1 | Vector of alleles for the dataset being checked |
gwas_a2 | Vector of alleles for the dataset being checked |
ref_snp | Vector of SNP names for the reference dataset |
ref_a1 | Vector of alleles for the reference dataset |
ref_a2 | Vector of alleles for the reference dataset |
threshold | Default 0.9. If the proportion of alleles whose strands match is above this threshold, the dataset is declared to be on the forward strand |
1 = Forward strand; 2 = Not on forward strand
This function can be used to evaluate how strict harmonisation should be. The trade-off is that if you assume the dataset is not on the forward strand, palindromic SNPs within a particular frequency range are dropped. Alternatively, you could accept a small probability of error about whether palindromic SNPs are on the forward strand, and avoid dropping too many variants.
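The idea behind the check can be illustrated with a minimal base R sketch. It is not the package's implementation: `complement_alleles()` and `strand_check_sketch()` are hypothetical names, and the sketch assumes that a SNP counts as "matching" when its allele pair equals the reference pair in either order, and as "flipped" when it only matches after taking the complement of both alleles.

```r
# Hypothetical helper (not part of the package): complement each allele.
complement_alleles <- function(x) chartr("ACGT", "TGCA", toupper(x))

# Hypothetical sketch of the strand check, using base R only.
strand_check_sketch <- function(gwas_snp, gwas_a1, gwas_a2,
                                ref_snp, ref_a1, ref_a2, threshold = 0.9) {
  # Align the two datasets on shared SNP names.
  idx <- match(gwas_snp, ref_snp)
  keep <- !is.na(idx)
  g1 <- toupper(gwas_a1[keep])
  g2 <- toupper(gwas_a2[keep])
  r1 <- toupper(ref_a1[idx[keep]])
  r2 <- toupper(ref_a2[idx[keep]])

  # Two allele pairs agree if they are the same pair in either order.
  same_pair <- function(a1, a2, b1, b2) {
    (a1 == b1 & a2 == b2) | (a1 == b2 & a2 == b1)
  }

  direct <- same_pair(g1, g2, r1, r2)
  flipped <- !direct &
    same_pair(complement_alleles(g1), complement_alleles(g2), r1, r2)

  n_direct <- sum(direct, na.rm = TRUE)
  n_flipped <- sum(flipped, na.rm = TRUE)

  # Assumption of this sketch: with no comparable SNPs, do not assume forward strand.
  if (n_direct + n_flipped == 0) return(2)

  # Proportion of comparable SNPs that match without a strand flip.
  prop <- n_direct / (n_direct + n_flipped)
  if (prop > threshold) 1 else 2
}

ref_snp <- c("rs1", "rs2", "rs3", "rs4")
ref_a1 <- c("A", "C", "G", "T")
ref_a2 <- c("G", "T", "A", "C")

# Same strand as the reference (rs2 has its alleles in swapped order): returns 1
strand_check_sketch(ref_snp, c("A", "T", "G", "T"), c("G", "C", "A", "C"),
                    ref_snp, ref_a1, ref_a2)

# Every allele reported on the opposite strand: returns 2
strand_check_sketch(ref_snp, c("T", "G", "C", "A"), c("C", "A", "T", "G"),
                    ref_snp, ref_a1, ref_a2)
```

Note that palindromic SNPs (A/T or C/G) match the reference on either strand, so in this sketch they always count as direct matches and carry no information about strand. A stricter version might exclude them before computing the proportion.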