Merge pull request #588 from mariam7084/main

Google Playstore Analysis and Rating Prediction
abhisheks008 · Feb 21, 2024 · 6c581aa
2 parents 452f52e + 38adfd6
commit 6c581aa
Showing 16 changed files with 14,089 additions and 0 deletions.
diff --git a/Google Playstore Analysis And Rating Predictor/Dataset/googleplaystore.csv b/Google Playstore Analysis And Rating Predictor/Dataset/googleplaystore.csv
diff --git a/...Playstore Analysis And Rating Predictor/Images/App distribution by category.png b/...Playstore Analysis And Rating Predictor/Images/App distribution by category.png
diff --git a/...sis And Rating Predictor/Images/App distribution sunburst chart by category.png b/...sis And Rating Predictor/Images/App distribution sunburst chart by category.png
diff --git a/...tore Analysis And Rating Predictor/Images/App size distribution by category.png b/...tore Analysis And Rating Predictor/Images/App size distribution by category.png
diff --git a/...le Playstore Analysis And Rating Predictor/Images/Average Price By category.png b/...le Playstore Analysis And Rating Predictor/Images/Average Price By category.png
diff --git a/...e Playstore Analysis And Rating Predictor/Images/Category by Content Rating.png b/...e Playstore Analysis And Rating Predictor/Images/Category by Content Rating.png
diff --git a/...le Playstore Analysis And Rating Predictor/Images/Distribution of App Sizes.png b/...le Playstore Analysis And Rating Predictor/Images/Distribution of App Sizes.png
diff --git a/...alysis And Rating Predictor/Images/Number of applications by content rating.png b/...alysis And Rating Predictor/Images/Number of applications by content rating.png
diff --git a/Google Playstore Analysis And Rating Predictor/Images/Number of apps by type.png b/Google Playstore Analysis And Rating Predictor/Images/Number of apps by type.png
diff --git a/...re Analysis And Rating Predictor/Images/Scatter plot of reviews vs installs.png b/...re Analysis And Rating Predictor/Images/Scatter plot of reviews vs installs.png
diff --git a/...e Playstore Analysis And Rating Predictor/Images/Total installs by category.png b/...e Playstore Analysis And Rating Predictor/Images/Total installs by category.png
diff --git a/... Playstore Analysis And Rating Predictor/Images/Word Cloud for Genre Column.png b/... Playstore Analysis And Rating Predictor/Images/Word Cloud for Genre Column.png
diff --git a/...laystore Analysis And Rating Predictor/Images/wordcloud for category column.png b/...laystore Analysis And Rating Predictor/Images/wordcloud for category column.png
diff --git a/...Analysis And Rating Predictor/Model/Google_Playstore_Analysis_and_Rating_Prediction.ipynb b/...Analysis And Rating Predictor/Model/Google_Playstore_Analysis_and_Rating_Prediction.ipynb
diff --git a/Google Playstore Analysis And Rating Predictor/Model/README.md b/Google Playstore Analysis And Rating Predictor/Model/README.md
@@ -0,0 +1,90 @@
+<h1>Google PlayStore Analysis and Rating Predictor</h1>
+
+**GOAL**
+
+To analyze the Google Play Store dataset using exploratory data analysis and build a regression model that predicts app ratings.
+
+**DATASET**
+
+https://www.kaggle.com/datasets/madhav000/playstore-analysis
+
+**DESCRIPTION**
+
+The problem is to identify the apps that would be good for Google to promote. App ratings, which are provided by users, are a strong indicator of an app's quality. The problem therefore reduces to predicting which apps will have high ratings.
+
+The dataset contains the following columns:
+- App: Application name
+- Category: Category to which the app belongs
+- Rating: Overall user rating of the app
+- Reviews: Number of user reviews for the app
+- Size: Size of the app
+- Installs: Number of user downloads/installs for the app
+- Type: Paid or Free
+- Price: Price of the app
+- Content Rating: Age group the app is targeted at - Children / Mature 21+ / Adult
+- Genres: An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to Music, Game, Family genres.
+- Last Updated: Date when the app was last updated on Play Store
+- Current Ver: Current version of the app available on Play Store
+- Android Ver: Minimum required Android version
+
+**WHAT I HAVE DONE**
+
+* Checked for missing values and cleaned the data accordingly
+* Analyzed the data, found insights and visualized them accordingly.
+* Explored the relationships between different columns using plotting libraries.
+* Trained four regression models (Linear Regression, Support Vector Regression, Decision Tree Regression and Random Forest Regression) to predict the rating.
+* Used RMSE (Root Mean Square Error) to evaluate the performance of the models.
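The cleaning step in the list above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names are hypothetical, and the raw string formats (`'10,000+'` for Installs, `'19M'` / `'201k'` / `'Varies with device'` for Size, `'$4.99'` for Price) are assumptions about how the CSV stores these columns.

```python
from typing import Optional


def parse_installs(raw: str) -> Optional[int]:
    """Turn an install bucket such as '10,000+' into 10000 (assumed format)."""
    digits = raw.strip().rstrip("+").replace(",", "")
    return int(digits) if digits.isdigit() else None


def parse_size_mb(raw: str) -> Optional[float]:
    """Turn '19M' or '201k' into megabytes; unparseable values become None."""
    text = raw.strip()
    if not text:
        return None
    suffix = text[-1].upper()
    try:
        value = float(text[:-1])
    except ValueError:
        # e.g. 'Varies with device' (assumed placeholder for unknown size)
        return None
    if suffix == "M":
        return value
    if suffix == "K":
        return value / 1024  # kilobytes to megabytes
    return None


def parse_price(raw: str) -> Optional[float]:
    """Turn '$4.99' or '0' into a float; unparseable values become None."""
    try:
        return float(raw.strip().lstrip("$"))
    except ValueError:
        return None
```

Values that come back as `None` can then be dropped or imputed, depending on how much data the analysis can afford to lose.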
+
+
+**LIBRARIES NEEDED**
+
+1. Pandas
+2. Matplotlib
+3. Seaborn
+4. Plotly
+5. Numpy
+6. WordCloud
+7. Scikit-learn
+
+**VISUALIZATION**
+![App distribution sunburst chart by category](<../Images/App distribution sunburst chart by category.png>)
+![App size distribution by category](<../Images/App size distribution by category.png>)
+![App distribution by category](<../Images/App distribution by category.png>)
+![Average Price By category](<../Images/Average Price By category.png>)
+![Category by Content Rating](<../Images/Category by Content Rating.png>)
+![wordcloud for category column](<../Images/wordcloud for category column.png>)
+![Total installs by category](<../Images/Total installs by category.png>)
+![Number of apps by type](<../Images/Number of apps by type.png>)
+![Word Cloud for Genre Column](<../Images/Word Cloud for Genre Column.png>)
+![Number of applications by content rating](<../Images/Number of applications by content rating.png>)
+
+For more visualizations, refer to the .ipynb file :)
+
+**MODEL PERFORMANCE**
+
+|Model | RMSE |
+| ------------------------- | -------------------|
+|Linear Regression | 0.5474395094809866 |
+|Support Vector Regression | 0.545564956206932 |
+|Decision Tree Regression | 0.7552190124670595 |
+|Random Forest Regression | 0.7552190124670595 |
+
+- The Reviews, Type, Installs and Size columns were used as features for the regression models, with the Rating column as the target.
+- An RMSE between roughly 0.55 and 0.76 is high for a rating on a 1 to 5 scale, which points to wide variation in the data.
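To put RMSE figures like those in the table in context, it can help to compare them with a trivial baseline that always predicts the mean rating; that baseline's RMSE equals the population standard deviation of the target. The sketch below uses only the standard library. The function names and the sample ratings are made up for illustration and are not taken from the notebook.

```python
import math
from statistics import fmean


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(fmean(squared_errors))


def mean_baseline_rmse(y_true):
    """RMSE of a model that always predicts the mean of y_true."""
    mean = fmean(y_true)
    return rmse(y_true, [mean] * len(y_true))


if __name__ == "__main__":
    # Made-up ratings and predictions, for illustration only.
    ratings = [4.1, 3.9, 4.7, 4.5, 3.2]
    predicted = [4.2, 4.0, 4.4, 4.3, 3.9]
    print("model RMSE:   ", rmse(ratings, predicted))
    print("baseline RMSE:", mean_baseline_rmse(ratings))
```

A model whose RMSE is close to the baseline's is adding little beyond predicting the average rating.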
+
+**CONCLUSION**
+- Ratings vary across app categories.
+- The most-installed apps belonged to the 'Game' category.
+- Most apps on the Play Store are rated for all age groups.
+- More than three quarters of the apps are free to install.
+- Among the paid apps, finance apps were the most expensive with an average price of $8, followed by lifestyle apps at $6 and medical apps at $3.
+- All the apps were smaller than 100 MB.
+- The largest share of apps belonged to the 'Family' category.
+- Support Vector Regression (with an RBF kernel) had the lowest RMSE among the algorithms used, making it the best model.
+
+**AUTHOR**
+
+- Code contributed by *Mariam* @ #JWoC_2024
+
+[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mariam-m7084) 
+[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mariam7084/)
diff --git a/Google Playstore Analysis And Rating Predictor/Requirements.txt b/Google Playstore Analysis And Rating Predictor/Requirements.txt
@@ -0,0 +1,7 @@
+Numpy == 1.25.2
+Matplotlib == 3.7.1
+Pandas == 1.5.3
+Seaborn == 0.13.1
+Plotly == 5.15.0
+Wordcloud == 1.9.3
+scikit-learn == 1.2.2