aupadhaya/GenomeWidePrediction_course

Genomewide_prediction

Material for the Course "GENOME-WIDE PREDICTION OF COMPLEX TRAITS IN HUMANS, PLANTS AND ANIMALS (GWP)"

Instructors: Evangelina Lopez de Maturana, Oscar Gonzalez-Recio

This course introduces students to the prediction of complex traits using genomic information. Each day, the course will start at 14:00 and end at 20:00 (CET).

Preparatory steps:

For computing, we will use our AWS EC2 cloud, where most of the software needed for this course is already installed.

You will, therefore, only need a few applications installed on your laptop:

  • SSH client. Windows: MobaXterm. Mac/Linux: not required; a terminal is installed as standard.
  • FTP client, to transfer files to/from the server. Windows/Mac/Linux: FileZilla Client. This is my recommendation, but any FTP client should be fine, including the ones built into Mac/Linux.

Please make sure that R and RStudio are installed on your laptop.

Once you have R and RStudio installed, please install the following packages with this command:

 rpkgs <- c("BGLR", "snpReady", "data.table", "pheatmap", "rsample", "coda", "ggplot2", "ROCR", "tidyverse", "rmarkdown", "knitr", "pander")
 install.packages(rpkgs)

When installing snpReady, you will likely get a message saying that the ‘impute’ R package is required. impute is distributed through Bioconductor, so install it with BiocManager:

 if (!require("BiocManager", quietly = TRUE))
   install.packages("BiocManager")
 BiocManager::install("impute")

The ultimate check of whether a package was installed successfully is to load it into your R session with library(), for example:

 library(ggplot2)  # an error here means the package is not installed correctly
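To check every course package in one go, the following is a minimal sketch in base R. check_pkgs is a hypothetical helper, not part of the course material; it relies on requireNamespace(), which returns FALSE instead of stopping with an error when a package cannot be loaded, so a single missing package does not interrupt the check.

```r
# Same package list as the installation command above
rpkgs <- c("BGLR", "snpReady", "data.table", "pheatmap", "rsample", "coda",
           "ggplot2", "ROCR", "tidyverse", "rmarkdown", "knitr", "pander")

# Hypothetical helper: TRUE/FALSE per package, named by package.
# requireNamespace() loads the namespace without attaching it and
# returns FALSE (no error) when the package is missing.
check_pkgs <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_pkgs(rpkgs)
print(status)
if (!all(status)) {
  message("Missing packages: ", paste(names(status)[!status], collapse = ", "))
}
```

Any package reported as missing can then be reinstalled individually with install.packages() (or BiocManager::install() for impute).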

Content of the course

Day 1: Concepts review

  • Presentation (E&O)
  • General Introduction / Overview of the Course
  • Introduction to Genome-wide Prediction in Human genetics and Animal and Plant breeding. Breeding value vs Polygenic Risk Score. Factors affecting reliability of GWP. (E). Slides
  • Review of Quantitative genetics. Slides
  • Linear mixed models. Slides
  • Genotype imputation procedures (design the reference population). Slides
  • Lab 1: Imputation. code, training.ped, training.map, testing.ped, testing.map
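The factors affecting reliability of GWP can be illustrated with one widely cited deterministic approximation (Daetwyler et al., 2008), shown here as a sketch that may differ from the formulation used in the slides. It assumes that the accuracy r (correlation between predicted and true genetic values) depends on the reference population size N, the heritability h2 and the number of independent chromosome segments Me; expected_accuracy is a hypothetical helper name, and the Me value below is an arbitrary assumption.

```r
# Expected accuracy of genome-wide prediction (Daetwyler et al., 2008):
#   r = sqrt(N * h2 / (N * h2 + Me))
# N  = number of phenotyped and genotyped individuals in the reference population
# h2 = heritability of the trait
# Me = number of independent chromosome segments (assumed input)
expected_accuracy <- function(N, h2, Me) {
  stopifnot(N > 0, h2 > 0, h2 <= 1, Me > 0)
  sqrt(N * h2 / (N * h2 + Me))
}

# Accuracy rises with N and h2 and falls as Me grows
# (for example, in populations with less linkage disequilibrium).
grid <- expand.grid(N = c(1000, 5000, 20000), h2 = c(0.1, 0.3, 0.5))
grid$r <- with(grid, expected_accuracy(N, h2, Me = 1000))
print(grid)
```

In the breeding sense, reliability is the square of this accuracy, r^2.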

Day 2: Imputation

  • The ‘Curse’ of Dimensionality in large p, small n problems. Regularization and shrinkage estimation. Slides
  • Breakout-rooms: Design of analytical approaches. (E&O)
  • Resemblance among relatives: Pedigree vs Genomic-based. (E). Slides
  • Lab 2: building relationship matrices (E). code data
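As a taste of Lab 2, a genomic relationship matrix can be built from 0/1/2-coded SNP genotypes with VanRaden's (2008) first method, G = ZZ' / (2 Σ p(1 - p)). This is a sketch on simulated genotypes, not the lab's code: vanraden_G is a hypothetical name, and allele frequencies are estimated from the sample itself, whereas base-population frequencies would ideally be used.

```r
# VanRaden (2008) genomic relationship matrix, method 1:
#   G = Z Z' / (2 * sum(p * (1 - p))),  with Z = M - 2p (column-centred)
# M: n x m matrix of genotypes coded 0/1/2 (copies of the counted allele)
vanraden_G <- function(M) {
  p <- colMeans(M) / 2                    # allele frequency per SNP (from sample)
  Z <- sweep(M, 2, 2 * p)                 # subtract 2p from each SNP column
  tcrossprod(Z) / (2 * sum(p * (1 - p)))  # Z %*% t(Z), scaled
}

set.seed(1)
n <- 5; m <- 200
M <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n)  # simulated genotypes
M <- M[, apply(M, 2, var) > 0, drop = FALSE]               # drop monomorphic SNPs
G <- vanraden_G(M)
print(round(G, 2))
```

Diagonal elements relate to individual inbreeding and off-diagonal elements to genomic relationships between pairs, which is what replaces the pedigree-based matrix in genomic models.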

Day 3: Kernel and Bayesian regression methods for GWP

Day 4: Machine Learning methods for GWP

  • Predictive ability metrics: MSE, Pearson and Spearman correlations, AUC-ROC curves. (E) Slides data code_AUC code_Metrics
  • Cross validation strategies (E) Slides data code_CV
  • Machine Learning (Advantages and disadvantages). Slides (O)
  • Random Forest (O)
  • Lab 5: RanFog (O)
  • Boosting (O)
  • Lab 6: RanBoost (O)
  • Other ML approaches and wrap up. (O)
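The predictive ability metrics listed above can be computed in a few lines of base R. The sketch below uses simulated phenotypes and predictions (no course data) and computes AUC through the rank-based Mann-Whitney formulation rather than ROCR, which the course code may use instead; mse and auc are hypothetical helper names, and defining cases as the top 20% of phenotypes is purely an illustrative assumption.

```r
# Mean squared error between observed and predicted values
mse <- function(y, yhat) mean((y - yhat)^2)

# AUC via the Mann-Whitney rank formulation: the probability that a randomly
# chosen case scores higher than a randomly chosen control (ties count 1/2).
auc <- function(labels, scores) {
  r  <- rank(scores)                   # average ranks handle ties
  n1 <- sum(labels == 1); n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(42)
g    <- rnorm(200)                     # simulated true genetic values
y    <- g + rnorm(200)                 # simulated phenotypes
yhat <- 0.8 * g + rnorm(200, sd = 0.5) # simulated predictions

print(c(MSE      = mse(y, yhat),
        Pearson  = cor(y, yhat),
        Spearman = cor(y, yhat, method = "spearman")))

# Binary version of the trait (illustrative threshold): top 20% are cases
case <- as.integer(y > quantile(y, 0.8))
print(auc(case, yhat))
```

Pearson measures linear agreement, Spearman only rank agreement (often what matters for selection), and MSE also penalizes bias in scale or location.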

Day 5: Practical session

  • Build your own genome-enabled prediction (breakout rooms)

The files provided are your reference population with its corresponding map file, and the candidate individuals with their SNP map file, whose genomic values you have to predict.

Hackathon steps:

  • Imputation
  • Determine your predictive accuracy (internal), with different methods/models
  • Predict yet-to-be observed phenotypes with your preferred method(s)
  • Submit results to the instructors for a final check
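Internal predictive accuracy is typically estimated by k-fold cross-validation: each fold is predicted by a model trained on the remaining folds. The sketch below assumes ridge regression on centred SNP genotypes (closely related to SNP-BLUP) as a stand-in for whichever method you choose; ridge_fit, ridge_predict and kfold_cv are hypothetical helpers, the data are simulated, and lambda is fixed rather than tuned.

```r
# Ridge regression with an unpenalised intercept; assumes X is column-centred
ridge_fit <- function(X, y, lambda) {
  mu <- mean(y)
  b  <- solve(crossprod(X) + diag(lambda, ncol(X)), crossprod(X, y - mu))
  list(mu = mu, b = drop(b))
}
ridge_predict <- function(fit, X) drop(fit$mu + X %*% fit$b)

# k-fold CV: every individual is predicted exactly once, by a model
# that did not see its phenotype
kfold_cv <- function(X, y, k = 5, lambda = 10) {
  folds <- sample(rep(seq_len(k), length.out = nrow(X)))  # random fold labels
  yhat  <- numeric(nrow(X))
  for (f in seq_len(k)) {
    test <- folds == f
    fit  <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambda)
    yhat[test] <- ridge_predict(fit, X[test, , drop = FALSE])
  }
  c(cor = cor(y, yhat), mse = mean((y - yhat)^2))
}

set.seed(7)
n <- 300; m <- 500
X <- matrix(rbinom(n * m, 2, 0.4), n, m)    # simulated 0/1/2 genotypes
X <- scale(X, scale = FALSE)                # centre genotype columns
beta <- rnorm(m, sd = 0.05)                 # simulated SNP effects
y <- drop(X %*% beta) + rnorm(n)            # simulated phenotypes
cv <- kfold_cv(X, y, k = 5, lambda = 100)
print(cv)
```

In practice, ridge_fit and ridge_predict would be replaced by the model under evaluation while the fold loop stays the same; fixing the random seed (or the fold labels) lets different methods be compared on the same partitions.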

Organization of the code for the practical sessions

Day 1

  • Code example to show the infinitesimal model
  • Exercise on solving equations using residual updates.
  • Imputation
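The idea behind the residual-update exercise can be sketched as Gauss-Seidel iteration on ridge / SNP-BLUP equations, where lambda plays the role of the ratio of residual to SNP-effect variance. This is an illustrative version under those assumptions, not the course's exercise code; gs_residual is a hypothetical name. Each unknown is updated from the current residuals, which are then corrected in place, so the cost per marker is one pass over its genotype column.

```r
# Gauss-Seidel with residual updates for
#   n*mu + 1'X b = 1'y
#   X'1 mu + (X'X + lambda * I) b = X'y
# keeping e = y - mu - X b up to date instead of recomputing it.
gs_residual <- function(X, y, lambda, n_iter = 500, tol = 1e-10) {
  m   <- ncol(X)
  b   <- numeric(m)
  mu  <- 0
  e   <- y                           # residuals when mu = 0 and b = 0
  xtx <- colSums(X^2)                # x_j'x_j, computed once
  for (it in seq_len(n_iter)) {
    # intercept: its Gauss-Seidel update is the mean of the current residuals
    d_mu <- mean(e)
    mu   <- mu + d_mu
    e    <- e - d_mu
    max_change <- abs(d_mu)
    # SNP effects, one marker at a time
    for (j in seq_len(m)) {
      b_new <- (sum(X[, j] * e) + xtx[j] * b[j]) / (xtx[j] + lambda)
      e     <- e - X[, j] * (b_new - b[j])   # residual update
      max_change <- max(max_change, abs(b_new - b[j]))
      b[j]  <- b_new
    }
    if (max_change < tol) break
  }
  list(mu = mu, b = b, iterations = it)
}

set.seed(3)
X <- scale(matrix(rbinom(100 * 20, 2, 0.5), 100, 20), scale = FALSE)
y <- drop(X %*% rnorm(20, sd = 0.3)) + rnorm(100)
fit <- gs_residual(X, y, lambda = 5)

# Direct solve for comparison; with centred X, mu equals mean(y)
direct <- drop(solve(crossprod(X) + diag(5, 20), crossprod(X, y - mean(y))))
print(max(abs(fit$b - direct)))
```

The direct solve is only practical for a small number of markers; the residual-update scheme never builds X'X, which is why it scales to many SNPs.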

Day 3

Day 4
