Commit c29c303 (1 parent: d7c2ea7)
Showing 1 changed file with 90 additions and 0 deletions.
Google Playstore Analysis And Rating Predictor/Model/README.md
<h1>Google PlayStore Analysis and Rating Predictor</h1>

**GOAL**

To analyze the 'Google Playstore Dataset' using exploratory data analysis and build a regression model that predicts app ratings.

**DATASET**

https://www.kaggle.com/datasets/madhav000/playstore-analysis

**DESCRIPTION**

The problem is to identify the apps that would be good for Google to promote. App ratings, which are provided by users, are a strong indicator of an app's quality. The problem therefore reduces to predicting which apps will have high ratings.

The dataset contains the following columns:
- App: Application name
- Category: Category to which the app belongs
- Rating: Overall user rating of the app
- Reviews: Number of user reviews for the app
- Size: Size of the app
- Installs: Number of user downloads/installs for the app
- Type: Paid or Free
- Price: Price of the app
- Content Rating: Age group the app is targeted at - Children / Mature 21+ / Adult
- Genres: An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to the Music, Game and Family genres.
- Last Updated: Date when the app was last updated on the Play Store
- Current Ver: Current version of the app available on the Play Store
- Android Ver: Minimum required Android version

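Several of these columns need to be numeric before they can feed a regression model. The sketch below shows one way to do that in plain Python. It assumes the raw values are strings in the style `'19M'`, `'10,000+'` and `'$4.99'`, which is an assumption about the raw file rather than something stated here; if your copy of the dataset already stores these columns as numbers, this step is unnecessary. The helper names are hypothetical.

```python
import math


def parse_size_mb(raw):
    """Convert a size string such as '19M' or '512k' to megabytes.

    Returns NaN for values that carry no number (e.g. 'Varies with device').
    """
    text = str(raw).strip().replace(",", "")
    if text.endswith(("M", "m")):
        return float(text[:-1])
    if text.endswith(("K", "k")):
        return float(text[:-1]) / 1024
    return math.nan


def parse_installs(raw):
    """Convert an installs string such as '10,000+' to an integer."""
    return int(str(raw).replace(",", "").replace("+", "").strip())


def parse_price(raw):
    """Convert a price string such as '$4.99' or '0' to a float."""
    return float(str(raw).replace("$", "").strip())
```

With pandas, each helper could be applied column-wise, for example `df["Size"].map(parse_size_mb)`, where `df` is a hypothetical DataFrame holding the dataset.
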
**WHAT I HAVE DONE**

* Checked for missing values and cleaned the data accordingly.
* Analyzed the data, extracted insights and visualized them.
* Explored relationships between different columns using plotting libraries.
* Trained four regression models to predict the rating: Linear Regression, Support Vector Regression, Decision Tree Regression and Random Forest Regression.
* Used RMSE (Root Mean Square Error) to evaluate the performance of the models.

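RMSE is the square root of the average squared difference between the actual and predicted ratings, so it is expressed in the same units as the rating itself. A minimal sketch in plain Python follows; the function name and the sample numbers are made up for illustration and are not taken from the notebook.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only, not data from the Play Store dataset.
actual = [4.0, 3.5, 4.5, 2.0]
predicted = [4.0, 4.0, 4.0, 3.0]
print(rmse(actual, predicted))  # about 0.612
```

Because the errors are squared before averaging, a few badly mispredicted apps raise the RMSE more than many small misses do.
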
**LIBRARIES NEEDED**

1. Pandas
2. Matplotlib
3. Seaborn
4. Plotly
5. Numpy
6. WordCloud
7. Sklearn

**VISUALIZATION**
![App distribution sunburst chart by category](<../Images/App distribution sunburst chart by category.png>)
![App size distribution by category](<../Images/App size distribution by category.png>)
![App distribution by category](<../Images/App distribution by category.png>)
![Average Price By category](<../Images/Average Price By category.png>)
![Category by Content Rating](<../Images/Category by Content Rating.png>)
![wordcloud for category column](<../Images/wordcloud for category column.png>)
![Total installs by category](<../Images/Total installs by category.png>)
![Number of apps by type](<../Images/Number of apps by type.png>)
![Word Cloud for Genre Column](<../Images/Word Cloud for Genre Column.png>)
![Number of applications by content rating](<../Images/Number of applications by content rating.png>)

For more visualizations, refer to the .ipynb file :)

**MODEL PERFORMANCES**

| Model                     | RMSE               |
| ------------------------- | ------------------ |
| Linear Regression         | 0.5474395094809866 |
| Support Vector Regression | 0.545564956206932  |
| Decision Tree Regression  | 0.7552190124670595 |
| Random Forest Regression  | 0.7552190124670595 |

- The Reviews, Type, Installs and Size columns were used as features for the regression models, with the Rating column as the target.
- The relatively high RMSE suggests that ratings vary widely and are only partly explained by these features.

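A comparison of this kind can be sketched with scikit-learn as follows. This is not the notebook's code: the data is synthetic (four random numeric features standing in for Reviews, Type, Installs and Size), and the train/test split and hyperparameters are assumptions, so the printed RMSE values will not match the table above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: four numeric features and a rating-like
# target clipped to the range [1, 5]. Not the Play Store dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.clip(4.0 + 0.3 * X[:, 0] + rng.normal(scale=0.5, size=300), 1.0, 5.0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Support Vector Regression": SVR(kernel="rbf"),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=42),
    "Random Forest Regression": RandomForestRegressor(
        n_estimators=100, random_state=42
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # RMSE is the square root of the mean squared error.
    results[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

# Print the models from lowest to highest RMSE.
for name, score in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {score:.4f}")
```

Evaluating every model on the same held-out split keeps the RMSE values directly comparable.
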
**CONCLUSION**
- Ratings vary across app categories.
- The most-installed apps belonged to the 'Game' category.
- Most of the apps on the Play Store are rated for all age groups.
- More than three-quarters of the apps are free to install.
- Among the paid apps, Finance apps were the most expensive, with an average price of $8, followed by Lifestyle apps at $6 and Medical apps at $3.
- All the apps were smaller than 100 MB.
- The largest share of the available apps belonged to the 'Family' category.
- Support Vector Regression (with an RBF kernel) had the lowest RMSE among the algorithms used, making it the best-performing model.

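Figures such as the share of free apps and the average price per category can be computed with a pandas filter and `groupby`. The sketch below uses a handful of made-up rows, not values from the dataset, and assumes the `Type` column holds the strings `'Paid'` and `'Free'` and that `Price` is already numeric.

```python
import pandas as pd

# Made-up rows for illustration only; these are NOT values from the dataset.
apps = pd.DataFrame(
    {
        "Category": ["FINANCE", "FINANCE", "LIFESTYLE", "MEDICAL", "GAME"],
        "Type": ["Paid", "Paid", "Paid", "Paid", "Free"],
        "Price": [9.99, 5.99, 4.99, 2.99, 0.0],
    }
)

# Fraction of apps that are free.
free_share = (apps["Type"] == "Free").mean()

# Average price per category, paid apps only, most expensive first.
paid = apps[apps["Type"] == "Paid"]
avg_price = paid.groupby("Category")["Price"].mean().sort_values(ascending=False)

print(f"Free share: {free_share:.0%}")
print(avg_price)
```

Filtering to paid apps before averaging matters: including free apps would pull every category's average price toward zero.
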
**AUTHOR**

- Code contributed by *Mariam* @ #JWoC_2024

[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mariam-m7084)
[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mariam7084/)