Predict popularity of music tracks based on features provided in the dataset. The target variable, popularity has five categories:
- Very High
- High
- Average
- Low
- Very Low
For each category there is an initial bid price (royalties to be paid) and expected revenue collection (in 10k $) as follows:
Popularity | Bid Price | Expected Revenue |
---|---|---|
Very High | 5 | 10 |
High | 4 | 8 |
Average | 3 | 6 |
Low | 2 | 4 |
Very Low | 1 | 2 |
Based on predictions, 10,000 (in 10K $) will be invested to place bids on 4,000 music tracks at the cost of a more popular music track. Vice versa is not possible.
-
Upon implementing the machine learning models, namely (Random Forest, XG Boost, AdaBoost) we got nearly the same revenue earned, but on analysing the confusion matrix we found out that most of the misclassification was where there was under-classification. On top of that bidding will be successful only if we bid on a less popular music track at the cost of a more popular music track, hence it would be desirable to classify the song one class above if not correct. Also, we were not able to maximize our bidding which had a ceiling of 2.5 times the number of datapoints.
-
Thus, we chose three models having nearly the same revenue generated (Random Forest, XGBoost, AdaBoost) and took the ceiling value of the average of the three predictions. Thereby even if one of the models predicted a certain datapoint to have a class one higher than the other two then we would bid assuming it to be such.
-
For example, if the actual category of a song was “high”, if two of the models predicted the song to be of category “average” and the other “high”. Then had we applied a single model we would have got a revenue of 2 * 4 = 8, on average it would be $8/3. But on applying a ceiling function on the predicted output we would have classified it correctly as “high” hence a revenue of $8.
-
Taking another example if the actual category was “high”, if two of the models predicted the song to be of category “high” and the other “average”. Then if had we applied a single model we would have got a revenue of 2 * (2 * 4) = 16, on average it would be $16/3. But on applying a ceiling function on the predicted output we would have classified it correctly as “high” hence a revenue of $8.