Team AIM (Artificial Intelligence in Medicine). Task for IASA DATA SCIENCE CHAMP: prediction of the total income from a user on the thirtieth day of the user's lifetime.
The project focuses on developing an interpretable, robust machine learning model that predicts the potential revenue accumulated by a user playing the game. Our team presents a pipeline for producing the ML model and explains its mandatory steps. We carried out univariate and multivariate analysis and examined correlation coefficients to identify the features that contribute the least, so that they could be excluded from the very start. We also performed data preparation steps to improve data quality. In the end, we tested our regression model, a Light Gradient Boosting Machine (LightGBM), which reached an accuracy of approximately 80%. Finally, we broke down the feature importances to see how the data engineering stage could be improved to obtain a more accurate score.
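The correlation-based screening mentioned above can be illustrated with a minimal sketch. It assumes Pearson correlation and a hypothetical cut-off of 0.1; the feature names and toy values are invented for illustration and are not taken from the competition dataset. Only the standard library is used, although in practice a data-frame library would usually do this work.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def screen_features(columns, target, threshold=0.1):
    """Keep the features whose absolute correlation with the target reaches the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Hypothetical toy data: three candidate features and the day-30 revenue target.
columns = {
    "sessions_day_1": [1, 2, 3, 4, 5],
    "constant_flag": [7, 7, 7, 7, 7],  # no variance, so it is screened out
    "ads_watched": [0, 1, 1, 3, 4],
}
revenue_day_30 = [0.5, 1.0, 1.4, 2.1, 2.4]

kept = screen_features(columns, revenue_day_30, threshold=0.1)
print(kept)
```

A low linear correlation does not prove that a feature is useless to a tree-based model, so a threshold like this is best treated as a first-pass filter.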
The main goal of our project was to create a model that predicts the total revenue per user on the thirtieth day of the user's lifetime using various machine learning algorithms. From the business perspective, this income consists of three components: subscription-derived income, ad-derived income, and in-game microtransactions, such as purchases of tickets or diamonds.
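As a small illustration of how the target is composed, the sketch below sums the three income components for one user. The class, field names, and amounts are hypothetical; the actual dataset columns may be named and aggregated differently.

```python
from dataclasses import dataclass


@dataclass
class UserRevenue:
    """Revenue accumulated by one user up to day 30 (hypothetical field names)."""
    subscription: float      # subscription-derived income
    ads: float               # ad-derived income
    in_app_purchases: float  # microtransactions such as tickets or diamonds

    @property
    def total(self) -> float:
        """Total income: the value the regression model is trained to predict."""
        return self.subscription + self.ads + self.in_app_purchases


user = UserRevenue(subscription=4.99, ads=0.37, in_app_purchases=1.98)
print(round(user.total, 2))
```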
Throughout the project, we carried out three main tasks. First, we explored the dataset to get familiar with it: we reviewed general and statistical summaries, plotted graphs, prepared the data by imputing missing values and removing redundant features, and built a correlation heatmap to identify the features most correlated with the target. Because we predict a continuous value for each user, the task is a regression problem. Second, we analyzed the most appropriate model architectures, such as tree-based and common linear algorithms. Finally, we tried different algorithms with different numbers of variables to find the best one on the training set, and then used that model to predict user revenue on the test set.
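The imputation and model-comparison steps can be sketched as follows. This is a simplified illustration under several assumptions: median imputation (the write-up does not say which strategy was used), a held-out validation split, RMSE as the comparison metric, and two toy candidates (a mean baseline and a one-feature least-squares line) standing in for the tree-based and linear models that were actually evaluated. All names and data are hypothetical.

```python
import math
import statistics


def median_of_observed(values):
    """Median of the non-missing entries of a column."""
    return statistics.median(v for v in values if v is not None)


def fill_missing(values, fill):
    """Replace missing entries (None) with the given fill value."""
    return [fill if v is None else v for v in values]


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean_baseline(x_train, y_train):
    """Candidate 1: always predict the mean training target."""
    mean_y = statistics.fmean(y_train)
    return lambda x: mean_y


def fit_simple_linear(x_train, y_train):
    """Candidate 2: ordinary least squares on a single feature."""
    mean_x = statistics.fmean(x_train)
    mean_y = statistics.fmean(y_train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
    sxx = sum((x - mean_x) ** 2 for x in x_train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Hypothetical toy data: one feature with gaps, and the revenue target.
x_train_raw = [1, 2, None, 4, 5, 6]
y_train = [1.1, 2.0, 3.2, 3.9, 5.1, 6.0]
x_valid_raw = [None, 8]
y_valid = [4.1, 8.2]

# The fill value is learned from the training rows only, then reused on validation rows.
fill = median_of_observed(x_train_raw)
x_train = fill_missing(x_train_raw, fill)
x_valid = fill_missing(x_valid_raw, fill)

candidates = {"mean_baseline": fit_mean_baseline, "linear": fit_simple_linear}
scores = {}
for name, fit in candidates.items():
    model = fit(x_train, y_train)
    scores[name] = rmse(y_valid, [model(x) for x in x_valid])

best_name = min(scores, key=scores.get)
print(best_name, scores)
```

Computing the fill value from the training rows alone keeps information from the validation rows out of the preprocessing step, which makes the comparison between candidates fairer.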