This is the material for a short tutorial on machine learning for genetic data given at the 2018 Data Science@Polytechnique summer school.
http://www.ds3-datascience-polytechnique.fr/sessions/
Thank you to the following folks for their help in making this tutorial:
- Chloé-Agathe Azencott
- Benoit Playe
Slides by C.-A. Azencott introducing this tutorial are available in the 'slide' directory.
Just do the usual:
git clone https://github.com/jpvert/2018_DS3_tutorial.git
cd 2018_DS3_tutorial
This tutorial is made of two parts. The first part allows you to get familiar with concepts and techniques useful to analyse genetic data with machine learning methods. All methods are tested on a small dataset of simulated data, which makes life easier. In the second part, we apply the techniques to real data.
Each part is implemented as a Jupyter notebook, in Python. In addition, we provide the solutions to each notebook as a separate notebook, in which the missing parts have been filled in. The notebooks are the files matching *.ipynb, and their names should be self-explanatory.
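If you want a quick overview of which notebooks are available before opening Jupyter, the short sketch below lists them. It assumes the notebooks sit at the top level of the cloned directory, which this README does not state explicitly; `list_notebooks` is a hypothetical helper name, not something shipped with the tutorial.

```python
from pathlib import Path


def list_notebooks(repo_dir="."):
    """Return the sorted names of the *.ipynb files directly inside repo_dir.

    Assumption: notebooks live at the top level of the directory,
    so subdirectories are not searched.
    """
    return sorted(p.name for p in Path(repo_dir).glob("*.ipynb"))


if __name__ == "__main__":
    # Run from inside the cloned 2018_DS3_tutorial directory.
    for name in list_notebooks():
        print(name)
```

If the notebooks turn out to be stored in subdirectories, replacing `glob` with `rglob` would search recursively.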
To run a notebook, start the Jupyter notebook app, e.g., by typing at the command line:
jupyter notebook
and then open the notebook you want to study by clicking the corresponding file.
Once the notebook is open, we recommend studying and running the cells one by one to follow the tutorial, answering the questions as you go. To run a cell, click on it to highlight it, then press SHIFT+ENTER. This runs the instructions in the cell and moves to the next cell. You can of course modify the content of the cells and create new cells.