SISG 2025 Mod QG4 | WGS Data Analysis

This site contains course materials for SISG Module QG4: WGS Data Analysis, June 11-13, 2025.

Course Description

This module will provide an introduction to analyzing genotype data generated from whole genome sequencing (WGS). It will focus on extensions of standard GWAS analyses (e.g. rare-variant association tests) and “post-GWAS” follow-up analyses (e.g. conditional analysis, fine-mapping), and how WGS may improve results or be best utilized for these analyses; methods that incorporate variant annotation information will be highlighted.

Methods and examples will be informed by the instructors’ experience in large human genetics consortia (e.g. TOPMed), and, therefore, will focus on analyzing human data, but may be applicable/extendable to other organisms. A basic introduction to cloud computing will be provided, and students will perform hands-on exercises on a genomic analysis cloud platform.

Learning Objectives

After attending this module, participants will be able to:

  1. Understand how to perform association analyses for rare variants measured in WGS data using aggregate tests
  2. Access variant annotation resources and understand how to incorporate annotation information into analyses to improve power and inform results
  3. Understand the theory of, and how and when to perform, various “post-GWAS” follow-up analyses
  4. Leverage multi-ancestry WGS data
  5. Appreciate the utility of existing genomic analysis cloud platforms and get hands-on experience with cloud computing on one of these platforms

Course Format

Lectures

Course material will be presented through lectures. Slides for lectures are linked in the schedule below.

Tutorials

Many of the lectures will be followed by hands-on tutorials/exercises. Students are encouraged to work through the tutorials together. Afterwards, the instructors will walk through the tutorials and lead a discussion.

To run the tutorials, log into NHLBI BioData Catalyst powered by Seven Bridges with your username and password – we will use this platform for live demonstrations during the course.

Setting up a BioData Catalyst account

If you are affiliated with a US-based institution, you will log into the platform using eRA Commons credentials. eRA Commons is the system used by NIH to administer grants, and it also serves as a mechanism for authenticating researchers to work with controlled access data. To create a BioData Catalyst account, please follow the steps on this page.

If you are not affiliated with a US institution that is already registered in eRA Commons, or you do not already have an eRA Commons ID in advance of the workshop, you will still be able to fully participate in the module exercises. Please see this document for instructions.

After you create an account, we will add you to the SISG 2025 WGS Analysis Module course project.

All of the R code and data can also be downloaded from the GitHub repository from which this site is built, so you can run the tutorials on your local machine. Download the complete workshop data and tutorials: https://github.com/UW-GAC/SISG_2025/archive/main.zip

Course Schedule and Materials

The exact timing of the schedule is subject to change, depending on the amount of discussion we have in class.
Coffee breaks are daily from 10:00am-10:30am and 3:00pm-3:30pm. Lunch break is daily from 12:00pm-1:30pm.

Wednesday, June 11th

Topic Materials
Introduction Slides
Intro to Cloud Computing for WGS Data Analysis Lecture Slides
Intro to GDS Tutorial .Rmd | .html
GWAS Crash Course Lecture Slides
GWAS Tutorial Slides | .Rmd | .html
Extra: Population Structure and Relatedness Tutorial .Rmd | .html
Extra: GWAS: Advanced Model Extensions Tutorial .Rmd | .html
Extra: GENESIS Model Explorer Tutorial .Rmd | .html

Thursday, June 12th

Topic Materials
Leveraging Multi-Ancestry Data Lecture Slides
LD Exercise .pdf | NEJM 2020 | Nature 2021 | KEY
TOPMed Telomere Length GWAS Slides
Locus Zoom and Conditional Analysis Tutorial .Rmd | .html
Variant Annotation Lecture Slides
UCSC Genome Browser and FAVOR Tutorial .pdf | chr16 SNPS | KEY
Annotation Explorer Tutorial .Rmd | .html
5:00pm-6:00pm: Tutorial Open Q&A Session  

Friday, June 13th

Topic Materials
Multi-Variant Association Tests Lecture Slides
Multi-Variant Association Tests Tutorial .Rmd | .html
STAAR Lecture Slides
STAAR Tutorial .Rmd | .html
Recent Findings and Resources for WGS Analysis Lecture Slides
Open Q&A  

R packages used

Resources

A detailed tutorial and relevant R scripts for the STAAR pipeline are available at https://github.com/xihaoli/STAARpipeline-Tutorial.

If you are new to R, you might find the following material helpful: