From c29c303b53450298377c61e4ba15a7bc9a969b38 Mon Sep 17 00:00:00 2001 From: Mariam <139501475+mariam7084@users.noreply.github.com> Date: Wed, 21 Feb 2024 16:55:34 +0530 Subject: [PATCH] Added README --- .../Model/README.md | 90 +++++++++++++++++++ 1 file changed, 90 insertions(+) create mode 100644 Google Playstore Analysis And Rating Predictor/Model/README.md diff --git a/Google Playstore Analysis And Rating Predictor/Model/README.md b/Google Playstore Analysis And Rating Predictor/Model/README.md new file mode 100644 index 000000000..fdd6097e7 --- /dev/null +++ b/Google Playstore Analysis And Rating Predictor/Model/README.md @@ -0,0 +1,90 @@ +

Google PlayStore Analysis and Rating Predictor

+ +**GOAL** + +To analyze the 'Google Playstore Dataset' Dataset using Exploratory Data analysis and make a regression model to predict the rating of the apps. + +**DATASET** + +https://www.kaggle.com/datasets/madhav000/playstore-analysis + +**DESCRIPTION** + +The problem is to identify the apps that are going to be good for Google to promote. App ratings, which are provided by the customers, is always a great indicator of the goodness of the app. The problem reduces to: predict which apps will have high ratings. + +The dataset contains the following columns: +- App : Applicaton Name +- Category: Category to which the app belongs +- Rating: Overall user rating of the app +- Reviews: Number of user reviews for the app +- Size: Size of the app +- Installs: Number of user downloads/installs for the app +- Type: Paid or Free +- Price: Price of the app +- Content Rating: Age group the app is targeted at - Children / Mature 21+ / Adult +- Genres: An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to Music, Game, Family genres. +- Last Updated: Date when the app was last updated on Play Store +- Current Ver: Current version of the app available on Play Store +- Android Ver: Minimum required Android version + +**WHAT I HAD DONE** + +* Checked for missing values and cleaned the data accordingly +* Analyzed the data, found insights and visualized them accordingly. +* Found detailed insights of different columns with one another using plotting libraries. +* Deployed four regression models viz., Linear Regression, Support Vector regression, Decision Tree Regression, Random FOrest Regression to predict the rating. +* Used RMSE(Root Mean Square Error) to evaluate the performance of the models. + + +**LIBRARIES NEEDED** + +1. Pandas +2. Matplotlib +3. Seaborn +4. Plotly +5. Numpy +6. WordCloud +7. Sklearn + +**VISUALIZATION** +![App distribution sunburst chart by category](<../Images/App distribution sunburst chart by category.png>) +![App size distribution by category](<../Images/App size distribution by category.png>) +![App distribution by category](<../Images/App distribution by category.png>) +![Average Price By category](<../Images/Average Price By category.png>) +![Category by Content Rating](<../Images/Category by Content Rating.png>) +![wordcloud for category column](<../Images/wordcloud for category column.png>) +![Total installs by category](<../Images/Total installs by category.png>) +![Number of apps by type](<../Images/Number of apps by type.png>) +![Word Cloud for Genre Column](<../Images/Word Cloud for Genre Column.png>) +![Number of applications by content rating](<../Images/Number of applications by content rating.png>) + +For more visualization refer the .ipynb file :) + +**Model Performances** + +|Model | RMSE | +| ------------------------- | -------------------| +|Linear Regression | 0.5474395094809866 | +|Support Vector Regression | 0.545564956206932 | +|Decision Tree Regression | 0.7552190124670595 | +|Random Forest Regression | 0.7552190124670595 | + +- reviews, type, installs and size columns were used to make the regression model with the rating column being the target vector +- high RMSE shows that the data has a wide variation in it. + +**CONCLUSION** +- Various Categories of apps have varied ratings +- Most installed apps belonged to the category of 'Game' +- Most of the apps on the playstore are rated for all age groups of audience +- More than 3/4th of the apps are free to install. +- Among the paid apps, finance apps were most expensive having an average price of $8, followed by Lifestyle apps at $6 and medical apps at $3 average. +- All the apps were less than 100 mb. +- Most of the available apps had 'Family' category. +- The Support Vector regression(with rbf kernel) had the minimum RMSE among the algorithms used making it the best model. + +**AUTHOR** + +- Code contributed by *Mariam* @ #JWoC_2024 + +[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mariam-m7084) +[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mariam7084/)