Assuming the reference data is all on the forward strand, check whether the GWAS is also. Use some threshold, e.g. if more than 90% of alleles do not need to be flipped, then it is likely that the dataset is on the forward strand.

is_forward_strand(
  gwas_snp,
  gwas_a1,
  gwas_a2,
  ref_snp,
  ref_a1,
  ref_a2,
  threshold = 0.9
)

Arguments

gwas_snp

Vector of SNP names for the dataset being checked

gwas_a1

Vector of A1 alleles for the dataset being checked, in the same order as gwas_snp

gwas_a2

Vector of A2 alleles for the dataset being checked, in the same order as gwas_snp

ref_snp

Vector of SNP names for the reference dataset

ref_a1

Vector of A1 alleles for the reference dataset, in the same order as ref_snp

ref_a2

Vector of A2 alleles for the reference dataset, in the same order as ref_snp

threshold

Default 0.9. If the proportion of SNPs whose allele strands match the reference is above this threshold, then declare the dataset to be on the forward strand

Value

1 = Forward strand; 2 = Not on forward strand
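
As a rough illustration of how a return value of this kind could be derived, the sketch below computes the proportion of shared SNPs whose allele pair agrees with the reference without a strand flip and compares it with the threshold. The helper name sketch_is_forward_strand is hypothetical, it uses base R only, and it is not the package's implementation. It assumes single-base A/C/G/T alleles and a forward-strand reference.

```r
# Hypothetical helper for illustration only; not the package's own code.
# Assumes single-base alleles and a reference that is on the forward strand.
sketch_is_forward_strand <- function(gwas_snp, gwas_a1, gwas_a2,
                                     ref_snp, ref_a1, ref_a2,
                                     threshold = 0.9) {
  g <- data.frame(SNP = gwas_snp, A1 = toupper(gwas_a1), A2 = toupper(gwas_a2),
                  stringsAsFactors = FALSE)
  r <- data.frame(SNP = ref_snp, A1 = toupper(ref_a1), A2 = toupper(ref_a2),
                  stringsAsFactors = FALSE)

  # Keep only SNPs present in both datasets, then drop rows with missing alleles
  m <- merge(g, r, by = "SNP", suffixes = c(".gwas", ".ref"))
  m <- m[complete.cases(m), ]
  if (nrow(m) == 0) stop("No overlapping SNPs with complete allele information")

  # A SNP matches if the allele pair agrees in either order (no strand flip needed)
  same <- (m$A1.gwas == m$A1.ref & m$A2.gwas == m$A2.ref) |
          (m$A1.gwas == m$A2.ref & m$A2.gwas == m$A1.ref)

  # 1 = forward strand, 2 = not on forward strand
  if (mean(same) > threshold) 1 else 2
}

# Toy data: four non-palindromic SNPs
snp    <- c("rs1", "rs2", "rs3", "rs4")
ref_a1 <- c("A", "T", "A", "T")
ref_a2 <- c("G", "C", "G", "C")
fwd_a1 <- c("A", "C", "G", "T")   # same strand, some allele orders swapped
fwd_a2 <- c("G", "T", "A", "C")
rev_a1 <- chartr("ACGT", "TGCA", fwd_a1)  # same alleles on the reverse strand
rev_a2 <- chartr("ACGT", "TGCA", fwd_a2)

res_fwd <- sketch_is_forward_strand(snp, fwd_a1, fwd_a2, snp, ref_a1, ref_a2)
res_rev <- sketch_is_forward_strand(snp, rev_a1, rev_a2, snp, ref_a1, ref_a2)
```

Note that palindromic SNPs (A/T, C/G) match the reference on either strand, so in this sketch they carry no strand information and inflate the match proportion.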

Details

This function can be used to evaluate how strict harmonisation should be. The trade-off is as follows: if you assume the dataset is not on the forward strand, then palindromic SNPs within a particular frequency range are dropped. Alternatively, you could accept some small probability of error in whether palindromic SNPs are on the forward strand, and avoid dropping too many variants.
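
To make the palindromic case concrete, here is a minimal sketch of how such SNPs could be flagged. The helper name sketch_is_palindromic is hypothetical and not part of the package, and single-base A/C/G/T alleles are assumed. A SNP is palindromic when its two alleles are complements of each other, so the allele pair looks the same on both strands.

```r
# Hypothetical helper for illustration only; assumes single-base alleles.
sketch_is_palindromic <- function(a1, a2) {
  a1 <- toupper(a1)
  a2 <- toupper(a2)
  # Palindromic if allele 1 equals the complement of allele 2 (A/T or C/G pairs)
  a1 == chartr("ACGT", "TGCA", a2)
}

pal <- sketch_is_palindromic(c("A", "C", "A", "G"), c("T", "G", "G", "C"))
# pal is TRUE TRUE FALSE TRUE
```

Only the SNPs flagged here are affected by the trade-off described above; for the rest, the strand can be resolved from the alleles alone.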