SISG Module 17: Computational Pipeline for WGS Data
2020-08-02
1 Introduction
This site contains course materials for SISG Module 17: Computational Pipeline for WGS Data, July 29-31, 2020. The data used are located in the GitHub repository from which the site is built, as well as in the TOPMed analysis pipeline.
Videos and slides for lectures are linked below in the schedule. Links to live sessions can be found at https://si.biostat.washington.edu/suminst/sisg2020/modules/SM2017 (make sure to log in).
To work through the exercises, log into https://platform.sb.biodatacatalyst.nhlbi.nih.gov with your username and password.
Join the Slack channel here: https://uwbiostatisticssisg.slack.com
1.1 Schedule
NOTE: All times are Pacific Daylight Time (GMT-07:00)
Wednesday, July 29
- Zoom session 11:30-13:15 PDT - Introduction and Genomic Data Storage
  - Introduction (video) (slides)
  - Pre-recorded lecture - Using BioData Catalyst for SISG exercises (video - 11 min)
  - Pre-recorded lecture - Sequencing data formats (video - 15 min) (slides)
  - Pre-recorded lecture - Intro to Genomic Data Storage (video - 17 min) (slides)
  - Exercises in breakout rooms - GDS format in R
  - Discussion (video)
- Zoom session 13:45-14:30 PDT - Phenotype harmonization
  - Pre-recorded lecture - Phenotype harmonization (video - 19 min) (slides)
  - Exercises in breakout rooms - Harmonization in R
  - Discussion (video)
Thursday, July 30
- Zoom session 8:00-9:45 PDT - Association tests, Part I
  - Pre-recorded lecture - Association tests: Methods and motivation (video - 57 min) (slides)
  - Pre-recorded lecture - GENESIS for association tests (video - 11 min) (slides)
  - Exercises in breakout rooms - Association tests in R
  - Discussion (video)
- Zoom session 10:15-11:45 PDT - Association tests, Part II
  - Pre-recorded lecture - Aggregate tests (video - 46 min) (slides)
  - Exercises in breakout rooms - Sliding window tests
  - Discussion (video)
- Zoom session 12:45-14:30 PDT - Population Structure and Relatedness
  - Pre-recorded lecture - Population structure inference (video - 33 min) (slides)
  - Pre-recorded lecture - Relatedness inference (video - 25 min) (slides)
  - Pre-recorded lecture - R packages for PCA and relatedness (video - 9 min) (slides)
  - Exercises in breakout rooms - Population structure and relatedness in R
  - Discussion (video)
Friday, July 31
- Zoom session 8:00-9:30 PDT - Mixed models
  - Pre-recorded lecture - Mixed model association testing (video - 42 min) (slides)
  - Exercises in breakout rooms - Mixed models in R
  - Discussion (video)
- Zoom session 9:45-11:30 PDT - Variant annotation
  - Pre-recorded lecture - Variant annotation (video - 21 min) (slides)
  - Exercises in breakout rooms - Using variant annotation
  - Discussion (video) - includes 30 min of open questions and discussion
- Zoom session 12:30-13:45 PDT - Working in the cloud
  - Pre-recorded lecture - Analysis pipelines on the cloud (video - 24 min) (slides)
  - Pre-recorded lecture - Running a workflow on BioData Catalyst (video - 16 min)
  - Exercises in breakout rooms - Running a GWAS workflow
  - Discussion (video)
- Zoom session 14:00-14:30 PDT - Open session for questions/advice
  - Discussion (video)
Download the workshop data and exercises: https://github.com/UW-GAC/SISG_2020/archive/master.zip
1.2 R packages used
1.3 Resources
If you are new to R, you might find the following material helpful:
- Introduction to R materials from SISG Module 3
- Graphics with ggplot2 tutorial
- Data manipulation with dplyr