Merge pull request #588 from mariam7084/main
Google Playstore Analysis and Rating Prediction
abhisheks008 committed Feb 21, 2024
2 parents 452f52e + 38adfd6 commit 6c581aa
Showing 16 changed files with 14,089 additions and 0 deletions.
10,841 changes: 10,841 additions & 0 deletions Google Playstore Analysis And Rating Predictor/Dataset/googleplaystore.csv

Large diffs are not rendered by default.

90 changes: 90 additions & 0 deletions Google Playstore Analysis And Rating Predictor/Model/README.md
@@ -0,0 +1,90 @@
<h1>Google PlayStore Analysis and Rating Predictor</h1>

**GOAL**

To analyze the 'Google Playstore Dataset' using exploratory data analysis and build a regression model to predict app ratings.

**DATASET**

https://www.kaggle.com/datasets/madhav000/playstore-analysis

**DESCRIPTION**

The problem is to identify the apps that are going to be good for Google to promote. App ratings, which are provided by the customers, are a strong indicator of an app's quality. The problem therefore reduces to predicting which apps will have high ratings.

The dataset contains the following columns:
- App: Application name
- Category: Category to which the app belongs
- Rating: Overall user rating of the app
- Reviews: Number of user reviews for the app
- Size: Size of the app
- Installs: Number of user downloads/installs for the app
- Type: Paid or Free
- Price: Price of the app
- Content Rating: Age group the app is targeted at - Children / Mature 21+ / Adult
- Genres: An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to the Music, Game, and Family genres.
- Last Updated: Date when the app was last updated on Play Store
- Current Ver: Current version of the app available on Play Store
- Android Ver: Minimum required Android version

**WHAT I HAVE DONE**

* Checked for missing values and cleaned the data accordingly.
* Analyzed the data, found insights, and visualized them.
* Explored how the different columns relate to one another using plotting libraries.
* Trained four regression models to predict the rating: Linear Regression, Support Vector Regression, Decision Tree Regression, and Random Forest Regression.
* Used RMSE (Root Mean Square Error) to evaluate the performance of the models.
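
The model comparison described above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the data is synthetic (it is not the Play Store dataset), and the split ratio, random seeds, and hyperparameters are assumptions chosen only to make the example runnable.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the Play Store dataset): four numeric
# features and a noisy rating-like target, used only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 4.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.5, size=500)

# Assumed 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The four model families named in the README, with assumed settings.
models = {
    "Linear Regression": LinearRegression(),
    "Support Vector Regression": SVR(kernel="rbf"),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(
        n_estimators=100, random_state=0
    ),
}


def rmse(y_true, y_pred):
    """Root mean square error: sqrt of the mean squared residual."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = rmse(y_test, model.predict(X_test))

for name, score in results.items():
    print(f"{name}: RMSE = {score:.4f}")
```

Because RMSE is in the same units as the target, a value of about 0.5 on a 1 to 5 rating scale means predictions are off by roughly half a star on average.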


**LIBRARIES NEEDED**

1. Pandas
2. Matplotlib
3. Seaborn
4. Plotly
5. Numpy
6. WordCloud
7. Scikit-learn (sklearn)

**VISUALIZATION**
![App distribution sunburst chart by category](<../Images/App distribution sunburst chart by category.png>)
![App size distribution by category](<../Images/App size distribution by category.png>)
![App distribution by category](<../Images/App distribution by category.png>)
![Average Price By category](<../Images/Average Price By category.png>)
![Category by Content Rating](<../Images/Category by Content Rating.png>)
![wordcloud for category column](<../Images/wordcloud for category column.png>)
![Total installs by category](<../Images/Total installs by category.png>)
![Number of apps by type](<../Images/Number of apps by type.png>)
![Word Cloud for Genre Column](<../Images/Word Cloud for Genre Column.png>)
![Number of applications by content rating](<../Images/Number of applications by content rating.png>)

For more visualizations, refer to the .ipynb file :)

**MODEL PERFORMANCES**

|Model | RMSE |
| ------------------------- | -------------------|
|Linear Regression | 0.5474395094809866 |
|Support Vector Regression | 0.545564956206932 |
|Decision Tree Regression | 0.7552190124670595 |
|Random Forest Regression | 0.7552190124670595 |

- The Reviews, Type, Installs, and Size columns were used as features for the regression models, with the Rating column as the target.
- The relatively high RMSE suggests that the ratings vary widely in ways these features do not capture.
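
Turning those four columns into numeric features might be sketched as follows. The raw string formats (`"10,000+"` for installs, `"19M"` / `"512k"` / `"Varies with device"` for size), the 0/1 encoding of Type, and the helper names `size_to_mb` and `prepare_features` are all assumptions for illustration, not the notebook's actual preprocessing.

```python
import numpy as np
import pandas as pd


def size_to_mb(value):
    """Convert strings such as '19M' or '512k' to megabytes (NaN otherwise).

    Assumes 'M' means megabytes and 'k' means kilobytes (1024 k = 1 M).
    """
    text = str(value).strip()
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1024
    return np.nan  # e.g. 'Varies with device'


def prepare_features(df):
    """Return numeric Reviews, Installs, Size, Type, and Rating columns."""
    out = pd.DataFrame(index=df.index)
    out["Reviews"] = pd.to_numeric(df["Reviews"], errors="coerce")
    # Strip the thousands separators and trailing '+' from install counts.
    out["Installs"] = pd.to_numeric(
        df["Installs"].astype(str).str.replace(r"[+,]", "", regex=True),
        errors="coerce",
    )
    out["Size"] = df["Size"].map(size_to_mb)
    # Assumed binary encoding: Free -> 0, Paid -> 1.
    out["Type"] = df["Type"].map({"Free": 0, "Paid": 1})
    out["Rating"] = pd.to_numeric(df["Rating"], errors="coerce")
    # Drop rows where any value could not be parsed.
    return out.dropna()


# Tiny hand-made sample in the assumed raw format (not real dataset rows).
sample = pd.DataFrame(
    {
        "Reviews": ["159", "967", "12"],
        "Installs": ["10,000+", "500,000+", "100+"],
        "Size": ["19M", "512k", "Varies with device"],
        "Type": ["Free", "Paid", "Free"],
        "Rating": [4.1, 3.9, 4.5],
    }
)

features = prepare_features(sample)
X = features[["Reviews", "Type", "Installs", "Size"]]  # feature matrix
y = features["Rating"]  # target vector
print(features)
```

Dropping unparseable rows is the simplest choice here; imputing the missing sizes instead would keep more data at the cost of an extra assumption.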

**CONCLUSION**
- Ratings vary across app categories.
- The most-installed apps belonged to the 'Game' category.
- Most of the apps on the Play Store are rated for all age groups.
- More than three-quarters of the apps are free to install.
- Among paid apps, Finance apps were the most expensive, with an average price of $8, followed by Lifestyle apps at $6 and Medical apps at $3.
- All the apps were smaller than 100 MB.
- The 'Family' category contained the most apps.
- Support Vector Regression (with an RBF kernel) had the lowest RMSE among the algorithms used, making it the best-performing model.

**AUTHOR**

- Code contributed by *Mariam* @ #JWoC_2024

[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mariam-m7084)
[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mariam7084/)
@@ -0,0 +1,7 @@
Numpy == 1.25.2
Matplotlib == 3.7.1
Pandas == 1.5.3
Seaborn == 0.13.1
Plotly == 5.15.0
Wordcloud == 1.9.3
scikit-learn == 1.2.2
