Lectures on Machine Learning at HSE, Moscow

mrglazkov/hse-moscow-ml

Machine Learning @ Applied Statistics with Network Analysis, HSE, Moscow

This repository contains teaching materials for the elective course Machine Learning, taught in the HSE-Moscow master's programme Applied Statistics with Network Analysis. The materials are organized in sections corresponding to lecture days. Each section provides a brief outline of the topics addressed, access to the lecture slides, an outline of the practical exercises and seminars, and references to the relevant literature.

For further information on the course, students can contact the lecturers by email: Nada Lavrač (nada.lavrac@ijs.si) and Ljupčo Todorovski (ljupco.todorovski@fu.uni-lj.si).

Grading

Grading for this course is based on two types of assignments: homework and a written exam. The schedule of the assignments is summarized in the following table:

| Assignment | Grading | Submission Deadline (Date) |
| --- | --- | --- |
| Homework 1 | 25% | 16th of February 2021 |
| Homework 2 | 25% | 25th of February 2021 |
| Written Exam | 50% | 2nd of March 2021 @12:00 (12:00 pm) Moscow time |

Please note that the schedule is now final. We will organize an additional hour of discussion on the second homework, together with feedback on the first one, on Tuesday, 23rd of February 2021 at 18:30 (6:30 pm) Moscow time.

Handling Late Submissions

  • For the first homework, you can delay the submission by up to ten days; the ultimate deadline for submitting the first homework is therefore 26th of February 2021. Each day of delay after 16th of February reduces your homework score by one percentage point: for example, if you submit your homework on 21st of February (five days of delay) and it was initially evaluated at 20%, your final score will be 15%.
  • The ultimate deadline for submitting the second homework is 2nd of March 2021. The same penalty applies: one percentage point for each day of delay after 25th of February.
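
The penalty arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not an official grading script: the function name and its parameters are hypothetical, and two behaviours are assumptions not stated in the rules above, namely that a submission after the ultimate deadline is rejected and that a score never drops below zero.

```python
from datetime import date


def late_adjusted_score(score, deadline, ultimate_deadline, submitted,
                        penalty_per_day=1.0):
    """Return the homework score (in percentage points) after the late penalty.

    Assumption: submissions after the ultimate deadline are rejected.
    Assumption: the adjusted score is floored at zero.
    """
    if submitted > ultimate_deadline:
        raise ValueError("submitted after the ultimate deadline")
    # Days of delay; an early or on-time submission counts as zero days.
    days_late = max(0, (submitted - deadline).days)
    return max(0.0, score - penalty_per_day * days_late)


# The example from the first bullet: 20% submitted five days late.
hw1 = late_adjusted_score(
    score=20.0,
    deadline=date(2021, 2, 16),
    ultimate_deadline=date(2021, 2, 26),
    submitted=date(2021, 2, 21),
)
print(hw1)
```

With the dates of the second homework (deadline 25th of February, ultimate deadline 2nd of March), the same function allows at most five days of delay.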

Student Clusters and Groups

We have noticed that there are two clusters of students attending the course:

  1. Seventeen (17) students that have chosen the course officially. These students can work on the homework assignments and submit their solutions in groups with other students from this cluster only (up to three students per group). All students from this cluster are expected to take the written exam.
    This cluster includes the following students: Borisyuk Anna, Vidovic Milica, Vorobeva Maria, Danilova Kseniia, Eremenko Alexandra, Kuzina Maria, Makhsudova Elvina, Parkhaeva Olga, Khairullina Dinara, Shabanova Ekaterina, Shakhova Anna, Petrov Gleb, Vladimirova Ksenia, Kozlova Yulia, Li Ling, Stremousov Alexander, and Chzu Chongrui.
  2. Other students that attend the course voluntarily and are not on the official list of course students. These students can work on the homework assignments and submit their solutions in groups with other students from this cluster only (up to three students per group). The students from this cluster will not be able to take the written exam.

Other Information

The written exam will be composed of three types of questions: (1) multiple choice questions and (2) questions requiring short answers related to the methodology and theory of machine learning, as well as (3) a practical exercise that will require performing a certain learning task on given data and providing answers to specific questions related to the obtained models and results.

Tentative Course Schedule for the Academic Year 2020/21

You can follow the lectures using the following Zoom link, https://fmf-uni-lj-si.zoom.us/j/97756216461 or join the Zoom meeting using the ID 977 562 16461.

| Date | Topic/Section |
| --- | --- |
| Thursday, 14th of January 2021 | Introduction to Machine Learning |
| Tuesday, 19th of January 2021 | Learning Rules |
| Thursday, 21st of January 2021 | Relational Learning |
| Tuesday, 26th of January 2021; Thursday, 28th of January 2021 | Learning from Heterogeneous Data |
| Thursday, 28th of January 2021; Tuesday, 2nd of February 2021 | Learning Ensembles |
| Thursday, 4th of February 2021 | Artificial Neural Networks and Deep Learning |
| Tuesday, 9th of February 2021; Thursday, 11th of February 2021 | Embedding Complex Data Types |
| Thursday, 11th of February 2021 | Dimensionality Reduction with Autoencoders |
| Tuesday, 16th of February 2021 | Literature-Based Discovery and Support Vector Machines |

1: Introduction to Machine Learning

  • Basic definitions and taxonomy of learning tasks
  • Three generations of machine learning and data mining methods
  • Understanding the error of machine learning models
  • The curse of dimensionality
  • Rough overview of the course topics
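
One common way to illustrate the curse of dimensionality (not necessarily the one used in the lecture slides) assumes data spread uniformly over a unit hypercube. To capture a fixed fraction of the data, a sub-cube must have edge length `fraction ** (1 / dims)`, which approaches 1 as the number of dimensions grows, so "local" neighbourhoods stop being local. The function name below is hypothetical.

```python
def edge_length_for_fraction(fraction, dims):
    """Edge length of a sub-cube covering `fraction` of a unit hypercube's volume.

    Assumes uniformly distributed data, so volume fraction equals data fraction.
    """
    return fraction ** (1.0 / dims)


# Edge length needed to capture 10% of the data as the dimension grows.
for dims in (1, 2, 10, 100):
    print(dims, round(edge_length_for_fraction(0.1, dims), 3))
```

In ten dimensions, a neighbourhood holding 10% of the data already spans roughly 80% of the range of every attribute.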

Lecture Slides

First part, Nada Lavrač
Second part, Ljupčo Todorovski
Last update: 15th of January 2021, 9:10 CET

Literature

  • James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Sections 1 and 2, check also the exercises at the end of Section 2.

  • Bramer M (2007) Principles of Data Mining. Springer, Berlin. DOI:10.1007/978-1-84628-766-4. An introductory textbook for refreshing your knowledge on basics of data mining. The first edition of the textbook is also available at ResearchGate, https://www.researchgate.net/publication/220688376_Principles_of_Data_Mining

2: Learning Rules

  • Learning rules from decision trees
  • Covering algorithm and its variants
  • Association rules and subgroup discovery
  • Evaluating rules and rule sets

Lecture Slides

Learning Rules
Last update: 20th of January 2021, 15:00 CET

Exercise Materials

Learning Decision Trees and Rules in R
Last update: 28th of January 2021, 15:20 CET

Literature

3: Relational Learning

  • Learning relational rules
  • Inductive logic programming
  • Propositionalization
  • Wordification and Python-RDM

Lecture Slides

Relational Learning
Last update: 26th of January 2021, 20:10 CET

Exercise Materials

Relational Learning in Python
Last update: 26th of January 2021, 15:20 CET

Literature

4: Learning from Heterogeneous Data

  • Semantic relational learning with ontologies
    • Propositionalization, Hedwig and NetSDM
  • Propositionalization of heterogeneous information networks
    • TEHmINE and HINMINE
  • Practical exercises with HINMINE

Lecture Slides

Semantic Relational Learning
Last update: 26th of January 2021, 20:20 CET
Heterogeneous Information Networks
Last update: 23rd of February 2021, 17:40 CET

Exercise Materials

HINMINE in Python
Last update: 28th of January 2021, 15:20 CET

Literature

5: Learning Ensembles

  • Why ensembles: variance reduction
  • Boosting, bagging, feature subspaces, random forests
  • Out-of-bag error estimate, attribute importance
  • Bagging and random forests in R

Lecture Slides

Learning Ensembles
Last update: 2nd of February 2021, 22:10 CET

Exercise Materials

Bagging and Random Forests in R
Last update: 4th of February 2021, 15:40 CET

Literature

  • James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Section 8.2 (Bagging, Random Forest, Boosting), check also exercises 5 and 7-12 at the end of Section 8.

6: Neural Networks and Deep Learning

  • General introduction to NNs
  • Feed-forward networks and back propagation
  • Towards deep networks: Convolutional networks
  • Neural networks in R

Lecture Slides

Neural Networks and Deep Learning
Last update: 4th of February 2021, 15:40 CET

Exercise Materials

Neural Networks in R
Last update: 4th of February 2021, 15:40 CET

Literature

7: Embedding Complex Data Types

  • Complex data types: semi-structured data and networks
  • Embedding of semi-structured data, bag-of-words
  • Embedding of words and text documents, word2vec and doc2vec
  • Classifying text documents in R
  • Embedding network nodes, node2vec
  • node2vec in R
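
As a minimal sketch of the bag-of-words embedding listed above, the following Python snippet turns each document into a vector of word counts over a shared vocabulary. It uses only the standard library; the function name, the whitespace tokenizer, and the toy documents are assumptions made for illustration, and the course exercises themselves use R.

```python
from collections import Counter


def bag_of_words(documents):
    """Embed documents as word-count vectors over a shared, sorted vocabulary."""
    # Naive tokenization: lower-case and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["the cat sat", "the dog sat on the mat"]
vocab, X = bag_of_words(docs)
print(vocab)
print(X)
```

Every document maps to a vector of the same length regardless of its word count, which is what lets standard classifiers work on text. Word order is discarded, a limitation that word2vec and doc2vec address in part.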

Lecture Slides

Embedding Semi-Structured Data
Last update: 9th of February 2021, 22:20 CET
Embedding Networks
Last update: 11th of February 2021, 11:10 CET

Exercise Materials

Classifying Text Documents in R
Last update: 9th of February 2021, 22:20 CET
Classifying Network Nodes in R
Last update: 11th of February 2021, 15:00 CET

Literature

  • Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781.
  • Le QV, Mikolov T (2014) Distributed representations of sentences and documents. arXiv:1405.4053.
  • Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. arXiv:1607.00653.
  • Perozzi B, Al-Rfou R, Skiena S (2014) DeepWalk: Online learning of social representations. arXiv:1403.6652.

8: Dimensionality Reduction and Autoencoders

  • Classic methods for dimensionality reduction, PCA
  • Autoencoders as general embedding approach
  • Taxonomy of autoencoders: regularization and de-noising

Lecture Slides

Dim Reduction and Autoencoders
Last update: 11th of February 2021, 13:20 CET

Literature

  • James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Section 10.2: Principal Components Analysis.
  • Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. MIT Press. Available at https://www.deeplearningbook.org/. Introductory part of Section 14.
  • Charte D, Charte F, García S, del Jesus MJ, Herrera F (2018) A practical tutorial on autoencoders for nonlinear feature fusion: Taxonomy, models, software and guidelines. arXiv:1801.01586.

9: Literature-Based Discovery and Support Vector Machines

  • Literature-based discovery
    • Connecting unrelated terms across domains
  • Support vector machines and kernels
    • Linear support vector machine
    • Non-linearity and kernel functions
    • Selecting kernels, setting hyper-parameters

Lecture Slides

Literature-Based Discovery
Last update: 16th of February 2021, 17:30 CET
Support Vector Machines and Kernels
Last update: 16th of February 2021, 10:20 CET

Literature

  • James G, Witten D, Hastie T and Tibshirani R (2013) An Introduction to Statistical Learning. Springer, New York. Available at https://statlearning.com/. Section 9, check also exercises 1-8 in the same section.
