Tutorial
Assuming the GWAS summary stats have a hg19/b37 chromosome name & position you can use these files:
Download GWAS
# obtain test gwas summary stats
wget https://raw.githubusercontent.com/MRCIEU/gwas2vcfweb/master/app/tests/data/example.1k.txt
Create parameters file
{
"chr_col": 0,
"pos_col": 1,
"snp_col": 2,
"ea_col": 3,
"oa_col": 4,
"beta_col": 5,
"se_col": 6,
"ncontrol_col": 7,
"pval_col": 8,
"eaf_col": 9,
"delimiter": "\t",
"header": true,
"build": "GRCh37"
}
Map GWAS summary stats to GWAS-VCF
SumStatsFile=/data/example.1k.txt
RefGenomeFile=/data/human_g1k_v37.fasta
ParamFile=/data/params.json
DbSnpVcfFile=/data/dbsnp.v153.b37.vcf.gz
VcfFileOutPath=/data/out.vcf
ID="test"
python /app/main.py \
--data ${SumStatsFile} \
--json ${ParamFile} \
--id ${ID} \
--ref ${RefGenomeFile} \
--dbsnp ${DbSnpVcfFile} \
--out ${VcfFileOutPath} \
--alias /app/alias-b37.txt
Alias file
Some genome builds use the chr prefix on chromosome names i.e. chr1
while others just use 1
. This will cause issues if your GWAS summary statistics and FASTA you wish to map to have different chromosome names (although they use the same genome build).
One solution is to provide an alias file to map your GWAS summary stats chromosome name to another string. An example alias is provided in the repo alias-b37.txt
and alias-hg38.txt
. The format is source-chr\tdest-chr
one row per contig.