10 Running a GWAS workflow on BioData Catalyst
On the BioData Catalyst platform, locate the following public apps and copy them to your project. To find the apps, go to Public Gallery > Apps, and then use the search box. Click “copy”, select your project from the dropdown, and click “copy” again.
- Bcftools Merge and Filter
- VCF to GDS Converter
- KING robust
- PC-AiR
- PC-Relate
- GENESIS Null Model
- GENESIS Single Variant Association Testing
- GENESIS Aggregate Association Testing
10.1 Preparing the genotype data
10.1.1 Bcftools Merge and Filter
Run the Bcftools Merge and Filter tool to combine two separate VCF files into one merged file (see the task “1. Bcftools merge chr 1 subsets”).
- Inputs
  - Input variant files: 1KG_phase3_chr1.subset1.vcf and 1KG_phase3_chr1.subset2.vcf
- App Settings
  - Output name: 1KG_phase3_subset_chr1
This will create a file named 1KG_phase3_subset_chr1.merged.vcf.gz that contains all of the data from both input files.
10.1.2 VCF to GDS Converter
Run the VCF to GDS Converter workflow to convert the 1000 Genomes VCF file you just created to a GDS file (see the task “2. Convert chr 1 VCF to GDS”).
- Inputs
  - Variants Files: 1KG_phase3_subset_chr1.merged.vcf.gz
- App Settings
  - check GDS: No
This will create a GDS file named 1KG_phase3_subset_chr1.merged.gds that contains the same information as the input VCF file. Use the “view stats and logs” button to check on the status of your tasks.
10.3 Association Testing
10.3.1 Null Model
Fit a null model using the Null Model workflow (see the task “6. GENESIS Null Model run”).
- Inputs
  - Phenotype File: sample_phenotype_pcs.RData (note that the PCs are included in this file)
  - Relatedness matrix file: kinship.RData
- App Settings
  - Outcome: height
  - Covariates: age, sex, study, PC1, PC2, PC3, PC4
  - Group variate: study
  - Family: gaussian
  - Output prefix: height
This will create a height_null_model.RData file that contains the null model fit and a height_phenotypes.RData file that contains the phenotype data used in the analysis. It also creates a height_report.html null model report – review this report.
10.3.2 Single variant association test
Use the GENESIS Single Variant Association Testing workflow to run a single variant association test (see the task “7. GENESIS Single Variant Association Testing run”).
- Inputs
  - GDS files: 1KG_phase3_subset_chr1.merged.gds
  - Null model file: height_null_model.RData
  - Phenotype file: height_phenotypes.RData (it is recommended you use the file produced by the Null Model workflow)
- App Settings
  - MAC threshold: 5
  - Output prefix: height_single
  - memory GB: 32
This will create a height_single_chr1.RData file with the association test results as well as height_single_manh.png and height_single_qq.png files – review the QQ and Manhattan plots.
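The MAC threshold setting excludes variants whose minor allele count is below the given value before they are tested. As a rough illustration of that kind of filter (a sketch, not the GENESIS implementation), the Python below counts alleles from diploid genotype dosages (0, 1, or 2 copies of the alternate allele) and applies a threshold of 5. The function names, variant IDs, and toy genotypes are all hypothetical.

```python
from typing import Dict, List, Optional


def minor_allele_count(dosages: List[Optional[int]]) -> int:
    """Return the minor allele count for one variant.

    dosages holds the number of alternate alleles (0, 1, or 2) per diploid
    sample; None marks a missing genotype and is skipped.
    """
    observed = [d for d in dosages if d is not None]
    alt_count = sum(observed)
    ref_count = 2 * len(observed) - alt_count
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_count, ref_count)


def filter_by_mac(
    variants: Dict[str, List[Optional[int]]], threshold: int = 5
) -> List[str]:
    """Return IDs of variants whose minor allele count is >= threshold."""
    return [
        variant_id
        for variant_id, dosages in variants.items()
        if minor_allele_count(dosages) >= threshold
    ]


# Toy data (hypothetical): three variants genotyped in six samples.
toy_variants = {
    "var_common": [1, 2, 0, 1, 1, 0],       # 5 alt, 7 ref alleles
    "var_rare": [0, 0, 1, 0, 0, 0],         # 1 alt, 11 ref alleles
    "var_flipped": [2, 2, 2, 2, 1, None],   # 9 alt, 1 ref; one sample missing
}

kept = filter_by_mac(toy_variants, threshold=5)
print(kept)
```

Note that "var_flipped" is dropped even though its alternate allele is common: the count that matters is that of the rarer allele, which here is the reference allele.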
10.3.3 Aggregate variant test
Use the GENESIS Aggregate Association Testing workflow to run a burden association test (see the task “8. GENESIS Aggregate Association Testing run”).
- Inputs
  - GDS files: 1KG_phase3_subset_chr1.merged.gds
  - Null model file: height_null_model.RData
  - Phenotype file: height_phenotypes.RData (it is recommended you use the file produced by the Null Model workflow)
  - Variant group files: variants_by_gene_chr1.RData
- App Settings
  - Alt Freq Max: 0.1
  - Test: burden
  - Output prefix: height_burden
  - Memory GB: 32
This will create a height_burden_chr1.RData file with the association test results as well as height_burden_manh.png and height_burden_qq.png files – review the QQ and Manhattan plots.
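A burden test collapses the variants in each group (presumably genes here, judging by the name of the variant group file) into a single score per sample, and the Alt Freq Max setting limits each group to variants whose alternate allele frequency does not exceed 0.1. The Python sketch below illustrates the idea with an unweighted sum over toy data. It is a simplification, not the GENESIS code: the function names and genotypes are hypothetical, variant weights are ignored, and treating the frequency cutoff as inclusive is an assumption of this sketch.

```python
from typing import List


def alt_allele_frequency(dosages: List[int]) -> float:
    """Alternate allele frequency for one variant in diploid samples."""
    return sum(dosages) / (2 * len(dosages))


def burden_scores(
    genotypes: List[List[int]], alt_freq_max: float = 0.1
) -> List[int]:
    """Sum alternate allele dosages per sample over sufficiently rare variants.

    genotypes is a list of variants in one group; each variant is a list of
    per-sample dosages (0, 1, or 2 copies of the alternate allele).
    """
    if not genotypes:
        return []
    # Keep only variants at or below the frequency cutoff (assumed inclusive).
    rare = [v for v in genotypes if alt_allele_frequency(v) <= alt_freq_max]
    n_samples = len(genotypes[0])
    return [sum(v[i] for v in rare) for i in range(n_samples)]


# Toy group (hypothetical): three variants genotyped in ten samples.
gene_genotypes = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # alt freq 0.05, kept
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # alt freq 0.05, kept
    [1, 1, 2, 1, 0, 1, 1, 0, 1, 1],  # alt freq 0.45, excluded
]

scores = burden_scores(gene_genotypes, alt_freq_max=0.1)
print(scores)
```

In an actual burden test, the per-sample score would then be tested for association with the outcome under the fitted null model; this sketch stops at building the score.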
10.4 Analysis follow-up
In RStudio, locate the results of your association test under /sbgenomics/project-files/. Load one of these results files into R and explore it.