14 Analysis Pipeline
The DCC’s analysis pipeline is hosted on GitHub: https://github.com/UW-GAC/analysis_pipeline
14.1 Running on a local cluster
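To run a burden test on our local SGE cluster, we first create a config file called assoc_window_burden.config (the commands below read it from the testdata directory). In the gds_file and variant_include_file paths, the blank space marks where the pipeline inserts the chromosome number: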
out_prefix "test"
gds_file "testdata/1KG_phase3_subset_chr .gds"
phenotype_file "testdata/1KG_phase3_subset_annot.RData"
null_model_file "testdata/null_model.RData"
null_model_params "testdata/null_model.params"
variant_include_file "testdata/variant_include_chr .RData"
alt_freq_max "0.1"
test "burden"
test_type "score"
genome_build "hg19"
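Each line of the config file is a key followed by a quoted value, and per-chromosome files use a blank space as the chromosome placeholder. The Python sketch below illustrates how such a file could be read and how a per-chromosome path could be filled in. The helpers `read_config` and `file_for_chromosome` are hypothetical illustrations, not functions from assoc.py, and the pipeline's own parser may handle edge cases differently.

```python
import shlex


def read_config(text):
    """Parse lines of the form: key "value" into a dict (hypothetical helper)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # shlex.split removes the quotes but keeps spaces inside the quoted value
        key, value = shlex.split(line)
        config[key] = value
    return config


def file_for_chromosome(path, chrom):
    """Replace the blank-space placeholder with a chromosome number (hypothetical helper)."""
    return path.replace(" ", str(chrom))


example = '''out_prefix "test"
gds_file "testdata/1KG_phase3_subset_chr .gds"
test "burden"
'''
cfg = read_config(example)
print(file_for_chromosome(cfg["gds_file"], 7))  # testdata/1KG_phase3_subset_chr7.gds
```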
We will use the Python script assoc.py to submit all jobs. First, we look at the available options:
setenv PIPELINE /projects/topmed/working_code/analysis_pipeline_devel
$PIPELINE/assoc.py --help
Let’s run a sliding window test on chromosomes 1-10. We also specify the cluster type, although UW_Cluster is the default. The cluster file is a JSON file that overrides default values in the cluster configuration; here we use it to reduce the memory each job reserves on a cluster node. The positional argument window selects the sliding window analysis, and the last argument is our config file.
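Conceptually, a cluster file replaces only the defaults it mentions and leaves everything else unchanged. The Python sketch below shows that override behavior as a recursive dictionary merge. The keys `memory_limits`, `assoc_window`, `null_model`, and `queue` are hypothetical placeholders, not the actual schema of the pipeline's cluster configuration, and `apply_overrides` is an illustrative helper.

```python
import json


def apply_overrides(defaults, overrides):
    """Return a copy of defaults with values from overrides merged in recursively."""
    merged = dict(defaults)  # shallow copy so the defaults are not modified
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # merge nested sections key by key instead of replacing them wholesale
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults and a hypothetical override file's contents (memory in MB)
defaults = {"memory_limits": {"assoc_window": 8000, "null_model": 4000}, "queue": "all.q"}
overrides = json.loads('{"memory_limits": {"assoc_window": 1000}}')
print(apply_overrides(defaults, overrides))
```

Only the overridden memory value changes; the other memory setting and the queue keep their defaults.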
First, we use --print_only to print the commands that would be run, without actually submitting any jobs:
$PIPELINE/assoc.py \
--chromosomes 1-10 \
--cluster_type UW_Cluster \
--cluster_file test_cluster_cfg.json \
--print_only \
window \
testdata/assoc_window_burden.config
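The --chromosomes argument takes a range such as 1-10, which expands to one set of jobs per chromosome. The Python sketch below shows that expansion. `parse_chromosomes` is a hypothetical helper, not the parser used in assoc.py, and it handles only a single start-end range or a single chromosome label.

```python
def parse_chromosomes(spec):
    """Expand "1-10" into ["1", ..., "10"]; return a single label unchanged."""
    if "-" in spec:
        start, end = (int(x) for x in spec.split("-"))
        # range() excludes its upper bound, so add 1 to include the last chromosome
        return [str(c) for c in range(start, end + 1)]
    return [spec]


print(parse_chromosomes("1-10"))
```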
The default segment length is 10,000 kb. To use 50,000 kb segments instead, we add --segment_length 50000 and submit the jobs (this time without --print_only):
$PIPELINE/assoc.py \
--chromosomes 1-10 \
--cluster_type UW_Cluster \
--cluster_file test_cluster_cfg.json \
--segment_length 50000 \
window \
testdata/assoc_window_burden.config
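Longer segments mean fewer segments per chromosome. Assuming each segment is processed as a separate task, the Python sketch below estimates how many segments cover chromosome 1 in hg19 (249,250,621 bp). The pipeline defines its own segment boundaries, so its exact counts may differ; `count_segments` is a hypothetical helper for the arithmetic only.

```python
import math


def count_segments(chrom_length_bp, segment_length_kb=10000):
    """Number of fixed-length segments needed to cover a chromosome."""
    segment_bp = segment_length_kb * 1000  # convert kb to bp
    return math.ceil(chrom_length_bp / segment_bp)


CHR1_HG19_BP = 249250621
print(count_segments(CHR1_HG19_BP))         # 25 segments at the default 10,000 kb
print(count_segments(CHR1_HG19_BP, 50000))  # 5 segments at 50,000 kb
```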
We can use the qstat command to check the status of our jobs.