Check a GWAS dataset against a reference known to be on the forward strand

Assuming the reference data are all on the forward strand, check whether the GWAS is too. A threshold is used: e.g. if more than 90% of alleles do not need to be flipped, then it is likely that the dataset is on the forward strand.

is_forward_strand(
  gwas_snp,
  gwas_a1,
  gwas_a2,
  ref_snp,
  ref_a1,
  ref_a2,
  threshold = 0.9
)

Arguments

gwas_snp	Vector of SNP names for the dataset being checked
gwas_a1	Vector of alleles
gwas_a2	Vector of alleles
ref_snp	Vector of SNP names for the reference dataset
ref_a1	Vector of alleles
ref_a2	Vector of alleles
threshold	Default 0.9. If the proportion of SNPs whose allele strands match the reference is above this threshold, the dataset is declared to be on the forward strand

Value

1 = Forward strand; 2 = Not on forward strand

Details

This function can be used to evaluate how strict harmonisation should be. The trade-off is that if you assume the dataset is not on the forward strand, then palindromic SNPs within a particular frequency range are dropped. Alternatively, you could accept some small probability of error in whether palindromic SNPs are on the forward strand, and avoid dropping too many variants.
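
The threshold logic described above can be illustrated with a minimal base R sketch. This is not the package's own implementation: `strand_match_proportion` is a hypothetical helper, the toy data are made up, and excluding palindromic SNPs from the comparison is an assumption made here for clarity. It also assumes single-letter allele codes with no missing values.

```r
# Hypothetical helper for illustration; not the package's own code.
# Assumes single-letter allele codes and no missing values.
strand_match_proportion <- function(gwas_snp, gwas_a1, gwas_a2,
                                    ref_snp, ref_a1, ref_a2) {
  # Line up each GWAS SNP with the same SNP in the reference
  idx <- match(gwas_snp, ref_snp)
  keep <- !is.na(idx)
  g1 <- toupper(gwas_a1[keep])
  g2 <- toupper(gwas_a2[keep])
  r1 <- toupper(ref_a1[idx[keep]])
  r2 <- toupper(ref_a2[idx[keep]])

  # Palindromic pairs (A/T, C/G) read the same on both strands,
  # so they are left out of the comparison (an assumption of this sketch)
  palindromic <- paste0(g1, g2) %in% c("AT", "TA", "CG", "GC")

  # No flip is needed if the alleles agree directly or with a1/a2 swapped
  same <- (g1 == r1 & g2 == r2) | (g1 == r2 & g2 == r1)

  # Proportion of informative SNPs that already match the reference strand
  mean(same[!palindromic])
}

# Toy data: rs1 matches, rs2 matches with alleles swapped,
# rs3 would need a strand flip, rs4 is palindromic
gwas_snp <- c("rs1", "rs2", "rs3", "rs4")
gwas_a1  <- c("A", "C", "G", "A")
gwas_a2  <- c("G", "T", "A", "T")
ref_snp  <- c("rs1", "rs2", "rs3", "rs4")
ref_a1   <- c("A", "T", "C", "A")
ref_a2   <- c("G", "C", "T", "T")

prop <- strand_match_proportion(gwas_snp, gwas_a1, gwas_a2,
                                ref_snp, ref_a1, ref_a2)

# Mirror the documented return codes: 1 = forward strand, 2 = not
result <- if (prop > 0.9) 1 else 2
```

In this toy example two of the three informative SNPs match without flipping, so the proportion falls below the default threshold of 0.9 and the sketch returns 2. Lowering the threshold makes the check more permissive, which corresponds to accepting a higher chance of strand errors in exchange for keeping more palindromic SNPs.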