This site contains course materials for SISG Module QG4: WGS Data Analysis, June 11-13, 2025.
This module will provide an introduction to analyzing genotype data generated from whole genome sequencing (WGS). It will focus on extensions of standard GWAS analyses (e.g. rare-variant association tests) and “post-GWAS” follow-up analyses (e.g. conditional analysis, fine-mapping), and how WGS may improve results or be best utilized for these analyses; methods that incorporate variant annotation information will be highlighted.
Methods and examples will be informed by the instructors’ experience in large human genetics consortia (e.g. TOPMed), and, therefore, will focus on analyzing human data, but may be applicable/extendable to other organisms. A basic introduction to cloud computing will be provided, and students will perform hands-on exercises on a genomic analysis cloud platform.
After attending this module, participants will be able to:
Course material will be presented through lectures. Slides for lectures are linked in the schedule below.
Many of the lectures will be followed by hands-on tutorials/exercises. Students are encouraged to work through the tutorials together. Afterwards, the instructors will walk through the tutorials and lead a discussion.
To run the tutorials, log in to NHLBI BioData Catalyst powered by Seven Bridges with your username and password; we will use this platform for live demonstrations during the course.
If you are affiliated with a US-based institution, you will log in to the platform using eRA Commons credentials. eRA Commons is the system used by NIH to administer grants, and it also serves as a mechanism for authenticating researchers to work with controlled access data. To create a BioData Catalyst account, please follow the steps on this page.
If you are not affiliated with a US institution that is registered in eRA Commons, or you will not have an eRA Commons ID before the workshop, you will still be able to fully participate in the module exercises. Please see this document for instructions.
After you create an account, we will add you to the SISG 2025 WGS Analysis Module course project.
All of the R code and data can also be downloaded from the GitHub repository from which this site is built, so you can run the tutorials on your local machine. Download the complete workshop data and tutorials: https://github.com/UW-GAC/SISG_2025/archive/main.zip
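For those who prefer to fetch the materials from within R, the download can be sketched as below. This is a minimal example, not part of the official course setup: the URL is the one given above, while the folder name `sisg_2025_wgs` and the helper `fetch_materials()` are hypothetical names chosen for illustration.

```r
# URL taken from the course page; the destination folder name is an assumption.
zip_url  <- "https://github.com/UW-GAC/SISG_2025/archive/main.zip"
dest_dir <- "sisg_2025_wgs"  # hypothetical local folder name

# Hypothetical helper: download the archive and extract it into dest_dir.
fetch_materials <- function(url, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(dest_dir, basename(url))
  # mode = "wb" keeps the binary zip file intact on Windows
  download.file(url, destfile = zip_path, mode = "wb")
  # unzip() returns the paths of the extracted files
  unzip(zip_path, exdir = dest_dir)
}

# Only download in an interactive session, so sourcing this script
# does not trigger a network request.
if (interactive()) {
  extracted <- fetch_materials(zip_url, dest_dir)
  head(extracted)
}
```

After extraction, the tutorial `.Rmd` files can be opened from the unpacked folder inside `dest_dir`. Running them locally also requires installing the R packages each tutorial loads.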
The exact timing of the schedule is subject to change, depending on the amount of discussion we have in class.
Coffee breaks are daily from 10:00am-10:30am and 3:00pm-3:30pm. Lunch break is daily from 12:00pm-1:30pm.
Wednesday, June 11th
| Topic | Materials |
|---|---|
| Introduction | Slides |
| Intro to Cloud Computing for WGS Data Analysis | Lecture Slides |
| Intro to GDS Tutorial | .Rmd, .html |
| GWAS Crash Course | Lecture Slides |
| GWAS Tutorial | Slides, .Rmd, .html |
| Extra: Population Structure and Relatedness Tutorial | .Rmd, .html |
| Extra: GWAS: Advanced Model Extensions Tutorial | .Rmd, .html |
| Extra: GENESIS Model Explorer Tutorial | .Rmd, .html |
Thursday, June 12th
| Topic | Materials |
|---|---|
| Leveraging Multi-Ancestry Data | Lecture Slides |
| LD Exercise | .pdf, NEJM 2020, Nature 2021, KEY |
| TOPMed Telomere Length GWAS | Slides |
| Locus Zoom and Conditional Analysis Tutorial | .Rmd, .html |
| Variant Annotation | Lecture Slides |
| UCSC Genome Browser and FAVOR Tutorial | .pdf, chr16 SNPs, KEY |
| Annotation Explorer Tutorial | .Rmd, .html |
| 5:00pm-6:00pm: Tutorial Open Q&A Session | |
Friday, June 13th
| Topic | Materials |
|---|---|
| Multi-Variant Association Tests | Lecture Slides |
| Multi-Variant Association Tests Tutorial | .Rmd, .html |
| STAAR | Lecture Slides |
| STAAR Tutorial | .Rmd, .html |
| Recent Findings and Resources for WGS Analysis | Lecture Slides |
| Open Q&A | |
A detailed tutorial and relevant R scripts for the STAAR pipeline are available at https://github.com/xihaoli/STAARpipeline-Tutorial.
If you are new to R, you might find the following material helpful: