This repository includes teaching materials for the elective course Machine Learning, taught at the HSE-Moscow master's programme Applied Statistics with Network Analysis. The materials are organized in sections corresponding to lecture days. Each section provides a brief outline of the topics addressed, access to the lecture slides, an outline of the practical exercises and seminars, and references to the relevant literature.
For further information on the course, students can contact the lecturers via email: Nada Lavrač (nada.lavrac@ijs.si) and Ljupčo Todorovski (ljupco.todorovski@fu.uni-lj.si).
The grading for this course is based on two types of assignments: homework and a written exam. The schedule of the assignments is summarized in the following table:
Assignment | Weight | Submission Deadline |
---|---|---|
Homework 1 | 25% | 16th of February 2021 |
Homework 2 | 25% | 25th of February 2021 |
Written Exam | 50% | 2nd of March 2021 at 12:00 (noon) Moscow time |
Please note that the schedule is now final. We will organize an additional hour of discussion on the second homework, with feedback on the first one, on Tuesday, 23rd of February 2021 at 18:30 (6:30 pm) Moscow time.
- For the first homework, you can delay the submission for up to ten days; the ultimate deadline for submitting the first homework is therefore 26th of February. Each day of delay after 16th of February reduces your homework score by 1%: for example, if you submit your homework on 21st of February (five days of delay) and it was initially evaluated at 20%, your final score, due to the five days of delay, will be 15%.
- The ultimate deadline for submitting the second homework is 2nd of March. The same score-reduction rules apply, i.e., a 1% penalty per day of delay.
We have noticed that there are two clusters of students attending the course:
- Seventeen (17) students who have chosen the course officially. These students can work on the homework assignments and submit their solutions in groups with other students from this cluster only (up to three students per group). All students from this cluster are expected to take the written exam. This cluster includes the following students: Borisyuk Anna, Vidovic Milica, Vorobeva Maria, Danilova Kseniia, Eremenko Alexandra, Kuzina Maria, Makhsudova Elvina, Parkhaeva Olga, Khairullina Dinara, Shabanova Ekaterina, Shakhova Anna, Petrov Gleb, Vladimirova Ksenia, Kozlova Yulia, Li Ling, Stremousov Alexander, and Chzu Chongrui.
- Other students who attend the course voluntarily and are not on the official list of course students. These students can work on the homework assignments and submit their solutions in groups with other students from this cluster only (up to three students per group). Students from this cluster will not be able to take the written exam.
The written exam will comprise three types of questions: (1) multiple-choice questions, (2) questions requiring short answers related to the methodology and theory of machine learning, and (3) a practical exercise that requires performing a given learning task on given data and answering specific questions about the obtained models and results.
You can follow the lectures using the following Zoom link, https://fmf-uni-lj-si.zoom.us/j/97756216461, or join the Zoom meeting using the ID 977 5621 6461.
Date | Topic/Section |
---|---|
Thursday, 14th of January 2021 | Introduction to Machine Learning |
Tuesday, 19th of January 2021 | Learning Rules |
Thursday, 21st of January 2021 | Relational Learning |
Tuesday, 26th of January 2021, and Thursday, 28th of January 2021 | Learning from Heterogeneous Data |
Thursday, 28th of January 2021, and Tuesday, 2nd of February 2021 | Learning Ensembles |
Thursday, 4th of February 2021 | Artificial Neural Networks and Deep Learning |
Tuesday, 9th of February 2021, and Thursday, 11th of February 2021 | Embedding Complex Data Types |
Thursday, 11th of February 2021 | Dimensionality Reduction with Autoencoders |
Tuesday, 16th of February 2021 | Literature-Based Discovery and Support Vector Machines |
- Basic definitions and taxonomy of learning tasks
- Three generations of machine learning and data mining methods
- Understanding the error of machine learning models
- The curse of dimensionality (see the sketch after this list)
- Rough overview of the course topics
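To make the curse of dimensionality concrete, here is a small base-R sketch (the sample sizes are chosen arbitrarily) showing how pairwise distances between random points lose contrast as the number of dimensions grows:

```r
# Curse of dimensionality: as the number of dimensions grows, the ratio
# between the nearest and the farthest pairwise distance approaches one,
# making distance-based learning methods less discriminative.
set.seed(42)
for (d in c(2, 10, 100, 1000)) {
  x <- matrix(runif(100 * d), nrow = 100)  # 100 random points in [0, 1]^d
  dists <- as.vector(dist(x))              # all pairwise Euclidean distances
  cat(sprintf("d = %4d: min/max distance ratio = %.3f\n",
              d, min(dists) / max(dists)))
}
```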
First part, Nada Lavrač
Second part, Ljupčo Todorovski
Last update: 15th of January 2021, 9:10 CET
- James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Sections 1 and 2; check also the exercises at the end of Section 2.
- Bramer M (2007) Principles of Data Mining. Springer, Berlin. DOI:10.1007/978-1-84628-766-4. An introductory textbook for refreshing your knowledge of the basics of data mining. The first edition of the textbook is also available at ResearchGate: https://www.researchgate.net/publication/220688376_Principles_of_Data_Mining
- Learning rules from decision trees (see the sketch after this list)
- Covering algorithm and its variants
- Association rules and subgroup discovery
- Evaluating rules and rule sets
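As a small taste of rule learning in R, the sketch below uses the C50 package (our choice for illustration, not necessarily the package used in the seminar) to learn a classifier represented as a set of rules derived from a decision tree:

```r
library(C50)  # install.packages("C50") if needed

# rules = TRUE decomposes the learned decision tree into a rule set.
model <- C5.0(Species ~ ., data = iris, rules = TRUE)

# summary() prints each rule together with its coverage and
# confidence estimates.
summary(model)
```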
Learning Rules
Last update: 20th of January 2021, 15:00 CET
Learning Decision Trees and Rules in R
Last update: 28th of January 2021, 15:20 CET
- Fürnkranz J, Gamberger D and Lavrač N (2012) Foundations of Rule Learning. Springer, Berlin. DOI:10.1007/978-3-540-75197-7. Chapters 1 and 2, available here.
- Learning relational rules
- Inductive logic programming
- Propositionalization (see the sketch after this list)
- Wordification and Python-RDM
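The seminar itself relies on Python-RDM; purely to illustrate the idea of propositionalization, the following R sketch flattens a hypothetical one-to-many relation (invented for this example) into aggregate features that any propositional learner can consume:

```r
# Hypothetical relational data: one customer has many transactions.
customers <- data.frame(id = 1:3, churned = c(TRUE, FALSE, FALSE))
transactions <- data.frame(
  customer_id = c(1, 1, 2, 2, 2, 3),
  amount      = c(10, 5, 100, 80, 120, 40)
)

# Propositionalization: summarize each customer's transactions into
# fixed-length aggregate features (count, mean, max).
features <- aggregate(amount ~ customer_id, data = transactions,
                      FUN = function(a) c(n = length(a), mean = mean(a), max = max(a)))
flat <- do.call(data.frame, features)   # unpack the matrix column
propositional <- merge(customers, flat, by.x = "id", by.y = "customer_id")
print(propositional)  # a single table, ready for any propositional learner
```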
Relational Learning
Last update: 26th of January 2021, 20:10 CET
Relational Learning in Python
Last update: 26th of January 2021, 15:20 CET
- Džeroski S and Lavrač N (2001) Relational Data Mining. Springer, Berlin. DOI:10.1007/978-3-662-04599-2. Chapter 1, available here.
- Perovšek M, Vavpetič A, Kranjc J, Cestnik B and Lavrač N (2015) Wordification: Propositionalization by unfolding relational data into bags of words. Expert Systems with Applications. DOI:10.1016/j.eswa.2015.04.017. Available here.
- Železný F and Lavrač N (2006) Propositionalization-based relational subgroup discovery with RSD. Machine Learning. DOI:10.1007/s10994-006-8633-8. Available here.
- Semantic relational learning with ontologies
- Propositionalization, Hedwig and NetSDM
- Propositionalization of heterogeneous information networks (see the sketch after this list)
- TEHmINE and HINMINE
- Practical exercises with HINMINE
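One ingredient of HINMINE-style propositionalization is representing each network node by its personalized PageRank vector. The sketch below illustrates that idea with the igraph package on a small synthetic graph; it is not the HINMINE API itself:

```r
library(igraph)  # install.packages("igraph") if needed

set.seed(1)
g <- sample_gnp(20, 0.15)  # a small random graph standing in for a network

# One personalized PageRank vector per node: the random walk restarts at
# that node only. Stacking the vectors row-wise yields a propositional
# feature matrix describing each node by its position in the network.
ppr_features <- t(sapply(seq_len(vcount(g)), function(v) {
  reset <- numeric(vcount(g)); reset[v] <- 1
  page_rank(g, personalized = reset)$vector
}))
dim(ppr_features)  # 20 nodes x 20 features
```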
Semantic Relational Learning
Last update: 26th of January 2021, 20:20 CET
Heterogeneous Information Networks
Last update: 23rd of February 2021, 17:40 CET
HINMINE in Python
Last update: 28th of January 2021, 15:20 CET
- Kralj J, Robnik-Šikonja M and Lavrač N (2019) NetSDM: semantic data mining with network analysis. Journal of Machine Learning Research 20: 1-50.
- Kralj J, Robnik-Šikonja M and Lavrač N (2018) HINMINE: Heterogeneous Information Network Mining with Information Retrieval Heuristics. Journal of Intelligent Information Systems. DOI:10.1007/s10844-017-0444-9. Available here.
- Why ensembles: variance reduction
- Boosting, bagging, feature subspaces, random forests
- Out-of-bag error estimate, attribute importance
- Bagging and random forests in R
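As a minimal illustration of the seminar's topic, the following sketch uses the randomForest package (with iris as stand-in data) to fit a bagged ensemble, read off the out-of-bag error estimate, and inspect attribute importance:

```r
library(randomForest)  # install.packages("randomForest") if needed

set.seed(7)
rf <- randomForest(Species ~ ., data = iris,
                   ntree = 500,       # number of bagged trees
                   importance = TRUE) # track attribute importance

print(rf)       # includes the out-of-bag (OOB) error estimate
importance(rf)  # mean decrease in accuracy / Gini per attribute
```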
Learning Ensembles
Last update: 2nd of February 2021, 22:10 CET
Bagging and Random Forests in R
Last update: 4th of February 2021, 15:40 CET
- James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Section 8.2 (Bagging, Random Forest, Boosting), check also exercises 5 and 7-12 at the end of Section 8.
- General introduction to neural networks
- Feed-forward networks and backpropagation
- Towards deep networks: Convolutional networks
- Neural networks in R
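A minimal single-hidden-layer feed-forward network in R using the nnet package (iris as stand-in data; note that nnet fits the weights with quasi-Newton optimization rather than plain backpropagation):

```r
library(nnet)  # install.packages("nnet") if needed

set.seed(3)
# One hidden layer with 4 units; decay adds L2 regularization
# on the weights.
net <- nnet(Species ~ ., data = iris, size = 4, decay = 1e-3,
            maxit = 300, trace = FALSE)

# Training-set confusion matrix (for illustration only; use held-out
# data to estimate the true error).
table(predicted = predict(net, iris, type = "class"),
      actual    = iris$Species)
```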
Neural Networks and Deep Learning
Last update: 4th of February 2021, 15:40 CET
Neural Networks in R
Last update: 4th of February 2021, 15:40 CET
- Hastie T, Tibshirani R and Friedman J (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York. Available at https://web.stanford.edu/~hastie/ElemStatLearn/. Sections 11.2 to 11.8 of Chapter 11.
- Nielsen M (2019) Neural Networks and Deep Learning. Available at http://neuralnetworksanddeeplearning.com/. Excellent and highly recommended further reading.
- Complex data types: semi-structured data and networks
- Embedding of semi-structured data, bag-of-words (see the sketch after this list)
- Embedding of words and text documents, word2vec and doc2vec
- Classifying text documents in R
- Embedding network nodes, node2vec
- node2vec in R
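To illustrate the bag-of-words embedding from the list above, here is a base-R sketch on three toy documents invented for the purpose; each document becomes a fixed-length term-count vector:

```r
docs <- c("machine learning on networks",
          "deep learning for text",
          "text mining and machine learning")

# Tokenize, build a shared vocabulary, and count term occurrences:
# each document becomes one row of a document-term matrix.
tokens <- strsplit(tolower(docs), "\\s+")
vocab  <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(tok) table(factor(tok, levels = vocab))))
colnames(dtm) <- vocab
print(dtm)
```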
Embedding Semi-Structured Data
Last update: 9th of February 2021, 22:20 CET
Embedding Networks
Last update: 11th of February 2021, 11:10 CET
Classifying Text Documents in R
Last update: 9th of February 2021, 22:20 CET
Classifying Network Nodes in R
Last update: 11th of February 2021, 15:00 CET
- Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781.
- Le QV, Mikolov T (2014) Distributed representations of sentences and documents. arXiv:1405.4053.
- Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. arXiv:1607.00653.
- Perozzi B, Al-Rfou R, Skiena S (2014) DeepWalk: Online learning of social representations. arXiv:1403.6652.
- Classic methods for dimensionality reduction, PCA (see the sketch after this list)
- Autoencoders as general embedding approach
- Taxonomy of autoencoders: regularization and de-noising
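A minimal PCA sketch in R (iris as stand-in data): project onto the first two principal components and measure the reconstruction error of the two-dimensional embedding. A linear autoencoder trained with squared error would learn essentially the same subspace:

```r
# Classic linear dimensionality reduction with prcomp.
pca <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pca)           # proportion of variance explained per component

scores <- pca$x[, 1:2] # 4-dimensional data embedded in 2 dimensions

# Reconstruct from the 2-dimensional embedding and measure the loss.
recon <- scores %*% t(pca$rotation[, 1:2])
mean((scale(iris[, 1:4]) - recon)^2)  # mean squared reconstruction error
```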
Dimensionality Reduction and Autoencoders
Last update: 11th of February 2021, 13:20 CET
- James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Section 10.2: Principal Components Analysis.
- Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. MIT Press. Available at https://www.deeplearningbook.org/. Introductory part of Section 14.
- Charte D, Charte F, García S, del Jesus MJ, Herrera F (2018) A practical tutorial on autoencoders for nonlinear feature fusion: Taxonomy, models, software and guidelines. arXiv:1801.01586.
- Literature-based discovery
- Connecting unrelated terms across domains
- Support vector machines and kernels
- Linear support vector machine
- Non-linearity and kernel functions
- Selecting kernels, setting hyper-parameters
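As a minimal sketch of kernel selection and hyper-parameter setting, the following uses the e1071 package (iris as stand-in data; the grid values are arbitrary) to cross-validate the cost parameter and the RBF kernel width gamma:

```r
library(e1071)  # install.packages("e1071") if needed

set.seed(11)
# Grid-search cost and gamma for an RBF-kernel SVM using
# (by default 10-fold) cross-validation.
tuned <- tune(svm, Species ~ ., data = iris,
              kernel = "radial",
              ranges = list(cost = c(0.1, 1, 10), gamma = c(0.01, 0.1, 1)))
summary(tuned)

best <- tuned$best.model
table(predicted = predict(best, iris), actual = iris$Species)
```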
Literature-Based Discovery
Last update: 16th of February 2021, 17:30 CET
Support Vector Machines and Kernels
Last update: 16th of February 2021, 10:20 CET
- James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Section 9, check also exercises 1-8 in the same section.