Task for IASA DATA SCIENCE CHAMP - forecasting the total income from the user on the thirtieth day of his life

vladyslav-honcharuk/IASA-DS-2022-Champ-AIM

 
 


IASA_DS_CHAMP

Team AIM (Artificial Intelligence in Medicine). Task for IASA DATA SCIENCE CHAMP: predicting the total income from a user on the thirtieth day of their life in the game.

The project focuses on developing an interpretable, robust machine learning model that predicts the potential revenue accumulated by a user playing the game. Our team presented a pipeline for producing the ML model and explained each mandatory step. We carried out univariate and multivariate analysis and examined correlation coefficients to find the features that contribute the least, so that we could exclude them from the very start. We also performed data preparation steps to improve data quality. In the end, we tested our regression model, a Light Gradient Boosting Machine (LightGBM), which reached an accuracy of approximately 80%. After that, we broke down the feature importances to see how we could have improved our data engineering to obtain a more accurate score.
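The correlation-based feature exclusion described above can be sketched as follows. This is a minimal illustration and not the notebook's actual code: the function name `weakly_correlated`, the threshold of 0.1, and the toy columns are all assumptions made for the example.

```python
import numpy as np


def weakly_correlated(X, y, names, threshold=0.1):
    """Return names of columns whose |Pearson r| with y is below threshold.

    Hypothetical helper: the name and threshold are illustrative only.
    """
    dropped = []
    for j, name in enumerate(names):
        col = X[:, j]
        if np.std(col) == 0:
            # A constant column has no defined correlation and no signal.
            dropped.append(name)
            continue
        r = np.corrcoef(col, y)[0, 1]
        if abs(r) < threshold:
            dropped.append(name)
    return dropped


# Toy data: one informative column, one uncorrelated column, one constant.
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
noise = np.array([1.0, -1.0, 0.0, 0.0, -1.0, 1.0])
constant = np.ones(6)
y = 2.0 * signal + 1.0

X = np.column_stack([signal, noise, constant])
names = ["signal", "noise", "constant"]
to_drop = weakly_correlated(X, y, names)
print(to_drop)
```

Pearson correlation only captures linear relationships, so a filter like this can discard features that matter to a tree-based model; it is best treated as a first pass rather than a final selection.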

The main goal of our project was to create a model that predicts the total revenue per user on the thirtieth day of their life, using various machine learning algorithms. From the business side, this income consists of three components: subscription-derived income, ad-derived income, and in-game microtransactions, such as buying tickets or diamonds.
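As a small illustration of the target definition, the sketch below assumes the total income is simply the sum of the three components. The class and field names are hypothetical and do not come from the dataset.

```python
from dataclasses import dataclass


@dataclass
class UserRevenue:
    """Hypothetical per-user revenue record; field names are illustrative."""

    subscription_income: float
    ad_income: float
    in_game_income: float  # microtransactions such as tickets or diamonds

    def total(self) -> float:
        # Assumption: total income is the plain sum of the three components.
        return self.subscription_income + self.ad_income + self.in_game_income


user = UserRevenue(subscription_income=4.99, ad_income=0.37, in_game_income=1.50)
print(round(user.total(), 2))
```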

Throughout the project, we carried out three main tasks. First, we explored the dataset to get familiar with it: we reviewed general and statistical summaries and plotted graphs. Second, we prepared the data by imputing missing values, removing redundant features, and building a correlation heatmap to identify the features most correlated with the target. Third, since the target is a continuous value for each user, we analyzed the most appropriate model families, such as tree-based and common linear algorithms. We then tried different algorithms with different numbers of variables to find the one that performed best on the training set, and used that model to predict user revenue on the test set.
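The imputation and model comparison steps can be sketched roughly as below. Several things here are assumptions rather than the team's actual setup: the data is synthetic, median imputation is one possible strategy, R² is used as the score because the README does not name its metric, and scikit-learn's `GradientBoostingRegressor` stands in for LightGBM to keep the example dependency-light.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data with a nonlinear target (not the competition data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(600, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=600)

# Knock out about 5% of the feature values to simulate missing data.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each candidate imputes missing values with the training-set median first.
candidates = {
    "linear": make_pipeline(
        SimpleImputer(strategy="median"), LinearRegression()
    ),
    "gradient_boosting": make_pipeline(
        SimpleImputer(strategy="median"),
        GradientBoostingRegressor(random_state=0),
    ),
}

# Fit every candidate and score it on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Keeping the imputer inside the pipeline means the medians are learned from the training split only, which avoids leaking test-set statistics into the model.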

Languages

  • Jupyter Notebook 100.0%