Instructors: Evangelina Lopez de Maturana, Oscar Gonzalez-Recio
This course introduces students to the prediction of complex traits using genomic information. Each day the course will start at 14:00 and end at 20:00 (CET).
For computing, we will use our AWS EC2 cloud, where most of the software needed for this course is already installed.
You will, therefore, only need a few applications installed on your laptop:
- SSH client, Windows: MobaXterm
- SSH client, Mac/Linux: not required, a terminal should be installed as standard
- FTP client (transfers files to/from the server), Windows/Mac/Linux: FileZilla Client
FileZilla is my recommendation, but any FTP client should be fine, including the ones built into Mac/Linux.
Please make sure that R and RStudio are installed on your laptop.
Once you have R and RStudio installed, please install the following list of packages using this command:
rpkgs <- c("BGLR", "snpReady", "data.table", "pheatmap", "rsample", "coda", "ggplot2", "ROCR", "tidyverse", "rmarkdown", "knitr", "pander")
install.packages(rpkgs)  # install every package named in rpkgs
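When you install snpReady, you will likely get a message saying that the ‘impute’ R package is required. It is distributed through Bioconductor, and you can install it as follows:
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("impute")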
The ultimate check of whether a package was installed successfully is to load it into your R session:
library(ggplot2)  # replace ggplot2 with the package you want to check
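To check the whole list at once rather than package by package, something like the following sketch may help. It assumes the same package names as the install command above; the variable names `is_installed` and `missing_pkgs` are only illustrative.

```r
# Package list copied from the install command above.
rpkgs <- c("BGLR", "snpReady", "data.table", "pheatmap", "rsample", "coda",
           "ggplot2", "ROCR", "tidyverse", "rmarkdown", "knitr", "pander")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
is_installed <- vapply(rpkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed (illustrative name).
missing_pkgs <- rpkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All course packages are installed.")
}
```

Any package reported as missing can be passed again to `install.packages()`.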
Day 1: Concepts review
- Presentation (E&O)
- General Introduction / Overview of the Course [General Introduction]
- Introduction to Genome-wide Prediction in Human genetics and Animal and Plant breeding. Breeding value vs Polygenic Risk Score. Factors affecting reliability of GWP. (E). Slides
- Review of Quantitative genetics. Slides
- Linear mixed models. Slides
- Genotype imputation procedures (design the reference population). Slides
- Lab 1: imputation. code training.ped testing.ped
Day 2: Imputation
- The ‘Curse’ of Dimensionality in large p small n problems. Regularization and shrinkage estimation. Slides
- Breakout-rooms: Design of analytical approaches. (E&O)
- Resemblance among relatives: Pedigree vs Genomic-based. (E). Slides
- Lab 2: building relationship matrices (E). code data
Day 3: Kernel and Bayesian regression methods for GWP
- GBLUP and Kernel-based regression models. (E) Slides
- Lab 3: GBLUP and RKHS. meta_data testing_data BGLR_VanRaden BGLR_UAR BGLR_UARadj BGLR_Gaussian
- Bayesian alphabet (Methods on SNP regression). (O) Slides
- Lab 4: Bayesian Lasso
- Review on post-Gibbs convergence and McMC chains inspection analysis. (E) Slides
- Hands-on Post Gibbs (E) code_toyGS code_postGibbs
Day 4: Machine Learning methods for GWP
- Predictive ability metrics: MSE, Pearson and Spearman correlations, AUC-ROC curves. (E) Slides data code_AUC code_Metrics
- Cross validation strategies (E) Slides data code_CV
- Machine Learning (Advantages and disadvantages). (O) Slides
- Random Forest (O)
- Lab 5: RanFog (O)
- Boosting (O)
- Lab 6: RanBoost (O)
- Other ML approaches and wrap up. (O)
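As a rough illustration of the predictive ability metrics listed for Day 4, the sketch below computes MSE, Pearson and Spearman correlations, and AUC in base R. Everything in it is an assumption made for illustration: the phenotypes and predictions are simulated, the binary outcome is obtained by splitting the phenotype at its median, and the helper `auc_rank()` is a hypothetical function that uses the rank-based (Mann-Whitney) formula instead of the ROCR package used in the course material.

```r
set.seed(1)

# Simulated data (assumption): observed phenotypes and noisy predictions.
n <- 200
y <- rnorm(n)
yhat <- 0.6 * y + rnorm(n, sd = 0.8)

# Continuous-trait metrics.
mse        <- mean((y - yhat)^2)
r_pearson  <- cor(y, yhat, method = "pearson")
r_spearman <- cor(y, yhat, method = "spearman")

# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) statistic.
# score = predicted values, label = 0/1 observed status.
auc_rank <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Binary outcome (assumption): cases are individuals above the median phenotype.
case <- as.integer(y > median(y))
auc  <- auc_rank(yhat, case)

print(c(MSE = mse, Pearson = r_pearson, Spearman = r_spearman, AUC = auc))
```

An AUC of 0.5 corresponds to random ranking of cases and controls, and 1 to perfect ranking.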
Day 5: Practical session
- Build your own Genome-enabled prediction. Breakout rooms
This is your reference population and the corresponding map file, and these are the candidate individuals and SNP map file to predict their genomic value.
Hackathon steps:
- Imputation
- Determine your predictive accuracy (internal), with different methods/models
- Predict yet-to-be observed phenotypes with your preferred method(s)
- Submit results to instructors for final check
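The internal accuracy step above could be sketched along these lines. This is only a minimal example under stated assumptions, not the course's own code: genotypes and phenotypes are simulated, ridge regression on SNP genotypes stands in for whichever method you prefer, the penalty `lambda = 100` is arbitrary, and `ridge_fit()` and `ridge_predict()` are hypothetical helpers written for this illustration.

```r
set.seed(42)

# Simulated reference population (assumption): n individuals, p SNPs coded 0/1/2.
n <- 120
p <- 300
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)
beta <- rnorm(p, sd = 0.1)
y <- as.vector(X %*% beta + rnorm(n))

# Hypothetical helper: ridge regression solved in its n x n dual form,
# b = Xc' (Xc Xc' + lambda I)^-1 (y - mean(y)), convenient when p > n.
ridge_fit <- function(X, y, lambda) {
  mu    <- mean(y)
  means <- colMeans(X)
  Xc    <- sweep(X, 2, means)                      # centre each SNP column
  alpha <- solve(tcrossprod(Xc) + diag(lambda, nrow(Xc)), y - mu)
  list(mu = mu, means = means, b = crossprod(Xc, alpha))
}

# Hypothetical helper: predict phenotypes for new genotypes.
ridge_predict <- function(fit, Xnew) {
  as.vector(fit$mu + sweep(Xnew, 2, fit$means) %*% fit$b)
}

# 5-fold cross-validation: each fold is predicted from the other four.
k    <- 5
fold <- sample(rep(seq_len(k), length.out = n))
acc  <- numeric(k)
for (f in seq_len(k)) {
  test   <- fold == f
  fit    <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambda = 100)
  acc[f] <- cor(y[test], ridge_predict(fit, X[test, , drop = FALSE]))
}

print(mean(acc))  # average correlation between observed and predicted phenotypes
```

Repeating the same loop with a different model or penalty gives a like-for-like comparison before predicting the candidate individuals.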
Day 1
- Code example to show the infinitesimal model
- Exercise on solving equations using residual updates.
- Imputation
Day 3
Day 4