This is a proposal for a Data Science course to be taught at the Faculty of Science, Masaryk University, in the fall semester 2020/2021.
I expect to give 12 lectures, each focused on one dataset (typically from kaggle.com) and one data science technique. The emphasis should be on coding and practicing data science skills, not the theoretical background.
The course is now in the IS Course Catalogue; look for M7DataSP Data Science Practicum (Praktikum z pokročilé datové vědy).
The course is scheduled for Mondays, 12:00-13:30, and will be taught remotely through Google Meet/Hangouts. The first lecture will take place on October 5. To be invited to the classes, enroll in the course in IS. If you want to try the first few lectures without formal enrollment, send me an email.
No special knowledge is expected, but you should have at least one year of coding experience in either R or Python. I like diverse crowds; students from different faculties and specializations are encouraged to enroll (if still in doubt, let me know and you can be paired with a more experienced student). The course will be taught in English if at least two students are interested, otherwise in Czech.
- Linear regression [data], git and GitHub
- Logistic regression [data], splitting data into train, validation and testing sets
- Unsupervised methods [data], visualizations
- Trees and forests [data]
- XGBoost & friends [data]
- Review
- TensorFlow, Keras, neural networks
- Classification of images [data]
- Fine-tuning, transfer learning [data]
- Neural networks applied to natural language processing [data]
- Neural networks applied to tabular data [data]
- Collaborative filtering [data]
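To give a flavour of the hands-on style, here is a minimal sketch of the train/validation/test split mentioned in the logistic regression lecture. It is an illustration only, not the actual lecture material: the function name `train_val_test_split`, the 60/20/20 proportions and the seed are assumptions chosen for the example, and it uses only the Python standard library.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and split them into train, validation and test lists.

    The fractions and the seed are illustrative defaults, not course settings.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    shuffled = list(rows)                  # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility

    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left over is training data
    return train, val, test


rows = list(range(100))
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))  # 60 20 20
```

The idea is that the model is fitted on the training set, tuned on the validation set, and evaluated only once on the test set, so the test score stays an honest estimate.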
Recordings of the lectures can be found in the teaching materials in IS (you need to have MU GSuite to access the videos).
Homework (in groups of 2-4 students) counts for 50% of the grade and the individual final project for the other 50%. To pass, you must earn at least 60% of the points.
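The grading arithmetic can be sketched as follows. This is only an illustration under stated assumptions: both components are expressed on a 0-100 scale, and the 60% threshold applies to the combined weighted score rather than to each component separately. The function names `final_score` and `passes` are hypothetical.

```python
def final_score(homework_pct, project_pct):
    """Combine homework and final project with equal 50/50 weights."""
    return 0.5 * homework_pct + 0.5 * project_pct


def passes(homework_pct, project_pct, threshold=60.0):
    """Assumption: the threshold is checked against the combined score."""
    return final_score(homework_pct, project_pct) >= threshold


print(final_score(70, 55))  # 62.5
print(passes(70, 55))       # True
print(passes(50, 60))       # False (combined score is 55.0)
```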
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd edition)
- Deep Learning with Python, Second Edition
- TensorFlow 2 in 30 days
- Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD
- RStudio AI Blog
- The Missing Semester of Your CS Education
This work would be impossible without the tutorials provided by TensorFlow and RStudio. I also get a lot of inspiration from numerous Kaggle notebooks and blogs all over the internet. Sometimes, under time pressure before a lecture, I might have forgotten to properly link all my sources. If this is the case, I would be grateful if you corrected my mistake, either by a pull request or by sending me a message. Thank you.