This course follows the Machine Learning and the High Dimensional & Deep Learning courses. In those courses, you acquired knowledge of machine learning and deep learning algorithms and their application to various types of data. This knowledge is essential to becoming a data scientist.
This course has three main objectives. You will:
- learn how to apply these algorithms efficiently using:
  - cloud computing with Google Cloud,
  - containers with Docker;
- discover new fields of artificial intelligence applied to (real) datasets that require specific algorithms:
  - Text
    - Algorithms: text processing, vectorizers, word embeddings, RNN
    - Libraries: NLTK, scikit-learn, Gensim
  - Video games
    - Algorithms: reinforcement learning (policy gradient, Q-learning, deep Q-learning)
    - Libraries: OpenAI Gym, TensorFlow
  - Movie ratings
    - Algorithms: recommendation systems (user/user and item/item filters, NMF, neural recommendation system)
    - Libraries: Surprise, TensorFlow
- learn how to efficiently share reproducible code:
  - build a GitHub repository.
NB: Some content from previous years (such as Spark) is still available in the repository but is no longer covered in this course.
- R Tutorial
- Python Tutorial
- Elementary statistic tools
- Data Exploration and Clustering.
- Machine Learning
- High Dimensional & Deep Learning
- Lectures: 10 hours
- Practical work: 30 hours
The course is divided into 5 topics (of varying length) over 5 days.
Course introduction + GitHub reminder: Slides/Video
- Session 1 - 02-11-20
- Session 2 - 16-11-20
- Session 3 - 30-11-20
- Session 4 - 07-12-20
- Session 5 - 14-12-20
- Session 6 - 04-01-21
  - Free time on project.
The evaluation is based on the DEFI-IA.
You will be evaluated on your ability to act as a data scientist, i.e., to:
- Handle a new dataset and explore it.
- Find a solution to the challenge's problem that achieves a high score (above the baseline).
- Explain the chosen algorithm.
- Write a complete pipeline that makes the results easy to reproduce.
- Justify the choice of the algorithms and of the environment (CPU/GPU, cloud, etc.).
- Share your work and make your results easily reproducible (Git, Docker, conda environment).
- Project (60%): a Git repository.
  - The repository should contain a clear Markdown README, which describes (33%):
    - Which results did you achieve? In what computation time? On which engine?
    - What do I have to install to be able to reproduce the results?
    - Which commands do I have to run to reproduce the results?
  - The code has to be easily reproducible (33%).
    - The required packages have to be well described (a requirements.txt file is best).
    - A conda or Docker command can be provided.
  - The code should be clear and easily readable (33%).
    - Final results can be run from a script, not a notebook.
    - Only the final code should be in this script.
  - Deadline: January 29, 2021.
- Report (40%), 10 pages maximum:
  - Quality of the presentation (25%).
  - In-depth explanation of the chosen algorithm (25%).
  - Choice of the tools and infrastructure used (25%).
  - Results you obtained (25%).
  - Deadline: January 29, 2021.
  - Groups of 4 to 5 people (DEFI IA's team).
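To illustrate the "script, not notebook" requirement above, here is a minimal sketch of what an entry-point script could look like. Everything in it is hypothetical: the file name (`main.py`), the `--seed` and `--output` options, and the `train_and_evaluate` placeholder, which stands in for your real pipeline and only returns a fake, seed-dependent score.

```python
"""Hypothetical entry point (e.g. main.py); every name here is illustrative."""
import argparse
import json
import random


def train_and_evaluate(seed: int) -> dict:
    """Placeholder for the real pipeline: returns a fake score that depends only on the seed."""
    rng = random.Random(seed)  # local generator, so the same seed gives the same result
    return {"seed": seed, "score": round(rng.random(), 4)}


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the command-line options (argv=None means: read sys.argv)."""
    parser = argparse.ArgumentParser(description="Reproduce the final results.")
    parser.add_argument("--seed", type=int, default=0, help="random seed")
    parser.add_argument("--output", default=None, help="optional JSON file for the results")
    return parser.parse_args(argv)


def main(argv=None) -> dict:
    """Run the pipeline once and optionally save the results to a JSON file."""
    args = parse_args(argv)
    results = train_and_evaluate(args.seed)
    if args.output is not None:
        with open(args.output, "w") as f:
            json.dump(results, f, indent=2)
    return results


if __name__ == "__main__":
    print(main())
```

With such a layout, the README can give a single command (for instance `python main.py --seed 0 --output results.json`, again with hypothetical names) that reproduces the reported results.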
All the libraries required for these modules are listed in requirements.txt.
(Under construction: only Session 1 is ready.)
To build a working environment with conda, run the following commands:
conda create -n AIF python=3.8
conda activate AIF
pip install -r requirements.txt
jupyter labextension install jupyterlab-plotly@4.12.0
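Once the environment is created, a quick sanity check can confirm that the packages listed in requirements.txt are actually installed. The sketch below is not part of the course material. It assumes that requirements.txt contains simple lines such as `name==version`, and the helper names `parse_requirements` and `missing_packages` are hypothetical.

```python
import re
from importlib import metadata  # standard library since Python 3.8
from pathlib import Path


def parse_requirements(text: str) -> list:
    """Extract package names from simple requirement lines.

    Comments, blank lines and pip options (lines starting with "-") are skipped.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names) -> list:
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        print("Missing packages:", missing_packages(parse_requirements(req.read_text())) or "none")
```

This only checks that each package is present, not that the installed version matches the pinned one.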