This Data Science course was taught at the Faculty of Science, Masaryk University, in the fall semester of 2020/2021.
I gave 12 lectures, each focused on one ML technique and one dataset (typically from kaggle.com). The emphasis was on coding and practicing data science skills rather than on theoretical background.
The course is now listed in the IS Course Catalogue; look for M7DataSP Data Science Practicum (Praktikum z pokročilé datové vědy).
The course is scheduled for Mondays, 12:00-13:30, and will be taught remotely through Google Meet/Hangouts. The first lecture will take place on October 5. To be invited to the classes, enroll in the course in IS. If you want to try the first few lectures without formally enrolling, send me an email.
No special knowledge is expected, but you should have at least one year of coding experience in either R or Python. I like diverse crowds; students from different faculties and specializations are encouraged to enroll (if you are still in doubt, let me know and I will pair you with a more experienced student). The course will be taught in English if at least two students are interested, otherwise in Czech.
- Intro, linear regression (one neuron), neural networks (NN), TensorFlow (TF)
- Logistic regression, softmax, cross-entropy
- Image data, convolutional NN
- ImageNet, fine-tuning, transfer learning, data augmentation
- TensorFlow.js, GitHub Pages, backpropagation
- Natural language processing (NLP), text preprocessing, dense NN
- Embeddings, recurrent NN (LSTM, GRU)
- Text classification, transformers, NLP methods on genomic data
- Recommenders / Collaborative filtering, optimization
- Tabular data, batch normalization
- Trees, random forest, XGBoost, LightGBM, CatBoost
- ML model interpretation, hyperparameter optimization, AutoML
Recordings of the lectures can be found in the teaching materials in IS (you need an MU G Suite account to access the videos).
- Create a GitHub account and your first repository
- Classify penguins based on their size
- Fashion MNIST classification
- Image classification app
- Text generation
- Genomic sequence classification
- Ratings prediction
- Blue Book for Bulldozers
For the students' solutions, see Assignments.md.
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd edition)
- Deep Learning with Python (2nd edition)
- TensorFlow 2 in 30 days
- Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD
- RStudio AI Blog
- The Missing Semester of Your CS Education
This work would have been impossible without the tutorials provided by TensorFlow and RStudio. I also drew a lot of inspiration from numerous Kaggle notebooks and blogs all over the internet. Sometimes, under time pressure before a lecture, I may have forgotten to properly link all my sources. If this is the case, I would be grateful if you could correct my mistake, either by a pull request or by sending me a message. Thank you.