Google Playstore Analysis and Rating Prediction #588

Merged
merged 6 commits into from
Feb 21, 2024
10,841 changes: 10,841 additions & 0 deletions Google Playstore Analysis And Rating Predictor/Dataset/googleplaystore.csv

Large diffs are not rendered by default.


90 changes: 90 additions & 0 deletions Google Playstore Analysis And Rating Predictor/Model/README.md
@@ -0,0 +1,90 @@
<h1>Google PlayStore Analysis and Rating Predictor</h1>

**GOAL**

To analyze the Google Play Store dataset using exploratory data analysis (EDA) and build a regression model that predicts app ratings.

**DATASET**

https://www.kaggle.com/datasets/madhav000/playstore-analysis

**DESCRIPTION**

The problem is to identify the apps that would be good for Google to promote. App ratings, which are provided by customers, are a strong indicator of an app's quality. The problem therefore reduces to predicting which apps will have high ratings.

The dataset contains the following columns:
- App: Application name
- Category: Category to which the app belongs
- Rating: Overall user rating of the app
- Reviews: Number of user reviews for the app
- Size: Size of the app
- Installs: Number of user downloads/installs for the app
- Type: Paid or Free
- Price: Price of the app
- Content Rating: Age group the app is targeted at - Children / Mature 21+ / Adult
- Genres: An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to Music, Game, Family genres.
- Last Updated: Date when the app was last updated on Play Store
- Current Ver: Current version of the app available on Play Store
- Android Ver: Minimum required Android version
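
Several of these columns need to be converted to numbers before they can be used in a model. The sketch below shows one way to do that. It assumes the raw strings look like `10,000+` for Installs, `$4.99` for Price, and `19M`, `201k` or `Varies with device` for Size; these formats are not stated in this README, and the helper names are hypothetical, so check them against the CSV and the notebook before relying on them.

```python
from typing import Optional


def parse_installs(raw: str) -> Optional[int]:
    """Turn a string such as '10,000+' into 10000 (assumed format)."""
    cleaned = raw.replace(",", "").replace("+", "").strip()
    return int(cleaned) if cleaned.isdigit() else None


def parse_price(raw: str) -> Optional[float]:
    """Turn a string such as '$4.99' or '0' into a float (assumed format)."""
    cleaned = raw.replace("$", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        # Anything that is not a number is treated as missing.
        return None


def parse_size_mb(raw: str) -> Optional[float]:
    """Turn '19M' or '512k' into megabytes; other values become None."""
    text = raw.strip()
    try:
        if text.endswith("M"):
            return float(text[:-1])
        if text.endswith("k"):
            return float(text[:-1]) / 1024
    except ValueError:
        return None
    # For example 'Varies with device' (assumed placeholder value).
    return None


print(parse_installs("10,000+"), parse_price("$4.99"), parse_size_mb("19M"))
```

With pandas, each helper can be applied to a column through `Series.map`, for example `df["Installs"].map(parse_installs)`.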

**WHAT I HAVE DONE**

* Checked for missing values and cleaned the data accordingly.
* Analyzed the data, extracted insights and visualized them.
* Explored the relationships between columns using plotting libraries.
* Trained four regression models (Linear Regression, Support Vector Regression, Decision Tree Regression and Random Forest Regression) to predict the rating.
* Used RMSE (Root Mean Square Error) to evaluate the performance of the models.
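
For reference, RMSE is the square root of the mean of the squared differences between actual and predicted ratings. The function below is a minimal, library-free sketch of that definition; it is not taken from the notebook, and the sample ratings are made up for illustration.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between two equally long sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative ratings only, not values from the dataset.
actual = [4.0, 3.5, 5.0]
predicted = [4.5, 3.5, 4.0]
print(round(rmse(actual, predicted), 3))  # 0.645
```

Because RMSE is expressed in the same units as the target, a value of about 0.55 means predictions are typically off by roughly half a star on the 1 to 5 rating scale.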


**LIBRARIES NEEDED**

1. Pandas
2. Matplotlib
3. Seaborn
4. Plotly
5. Numpy
6. WordCloud
7. Scikit-learn (imported as `sklearn`)

**VISUALIZATION**
![App distribution sunburst chart by category](<../Images/App distribution sunburst chart by category.png>)
![App size distribution by category](<../Images/App size distribution by category.png>)
![App distribution by category](<../Images/App distribution by category.png>)
![Average Price By category](<../Images/Average Price By category.png>)
![Category by Content Rating](<../Images/Category by Content Rating.png>)
![wordcloud for category column](<../Images/wordcloud for category column.png>)
![Total installs by category](<../Images/Total installs by category.png>)
![Number of apps by type](<../Images/Number of apps by type.png>)
![Word Cloud for Genre Column](<../Images/Word Cloud for Genre Column.png>)
![Number of applications by content rating](<../Images/Number of applications by content rating.png>)

For more visualizations, refer to the .ipynb file :)

**MODEL PERFORMANCE**

|Model | RMSE |
| ------------------------- | -------------------|
|Linear Regression | 0.5474395094809866 |
|Support Vector Regression | 0.545564956206932 |
|Decision Tree Regression | 0.7552190124670595 |
|Random Forest Regression | 0.7552190124670595 |

- The Reviews, Type, Installs and Size columns were used as features for the regression models, with the Rating column as the target.
- The relatively high RMSE values suggest that ratings vary widely and are only partly explained by these features.
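
A comparison like the one in the table can be sketched with scikit-learn as follows. This is an illustration only: the data is synthetic (randomly generated stand-ins for the four feature columns), the hyperparameters other than the RBF kernel are assumptions, and the printed scores will not match the table. Refer to the notebook for the actual preprocessing and settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-ins for the four features (assumed encodings).
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.uniform(0, 6, n),    # log10 of the review count
    rng.integers(0, 2, n),   # type: 0 = free, 1 = paid
    rng.uniform(0, 9, n),    # log10 of the install count
    rng.uniform(1, 100, n),  # size in MB
])
# Synthetic ratings on the 1 to 5 scale with added noise.
y = np.clip(4.1 + 0.05 * X[:, 0] + rng.normal(0, 0.5, n), 1.0, 5.0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Support Vector Regression": SVR(kernel="rbf"),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(
        n_estimators=50, random_state=0
    ),
}

# Fit each model and record its RMSE on the held-out split.
rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    rmse_by_model[name] = float(np.sqrt(mse))

for name, score in rmse_by_model.items():
    print(f"{name}: {score:.3f}")
```

Evaluating every model on the same held-out split keeps the RMSE values directly comparable.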

**CONCLUSION**
- Ratings vary across app categories.
- The most-installed apps belonged to the 'Game' category.
- Most of the apps on the Play Store are rated for all age groups.
- More than three-quarters of the apps are free to install.
- Among paid apps, Finance apps were the most expensive, with an average price of $8, followed by Lifestyle apps at $6 and Medical apps at $3.
- All the apps were smaller than 100 MB.
- The largest share of available apps belonged to the 'Family' category.
- Support Vector Regression (with an RBF kernel) had the lowest RMSE among the algorithms used, making it the best-performing model.

**AUTHOR**

- Code contributed by *Mariam* @ #JWoC_2024

[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mariam-m7084)
[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mariam7084/)
@@ -0,0 +1,7 @@
Numpy == 1.25.2
Matplotlib == 3.7.1
Pandas == 1.5.3
Seaborn == 0.13.1
Plotly == 5.15.0
Wordcloud == 1.9.3
scikit-learn == 1.2.2