Big data analysis of YouTubers based on the increase and decrease of subscribers and the comments posted during the increase and decrease periods.
Many people dream of becoming a famous YouTuber these days. A channel's content certainly influences its fame, but the users who write comments also influence a channel's popularity. This project therefore analyzes the top 30 channels with increasing popularity and the top 30 with decreasing popularity, and compares the comments posted during the increase and decrease periods.
Since comments may vary drastically depending on a YouTuber's category, the YouTube channels are split into six categories:
- Autos & Vehicles
- Entertainment
- Gaming
- How to & Style
- Science & Technology
- Travel & Events
In the experiment, ratios and z-scores are calculated with 60 channels for each positive/negative status in each of the six categories. The test likewise uses 30 channels for each positive/negative status in each of the six categories.

Present ~ | Metric | Autos & Vehicles | Entertainment | Gaming | How to & Style | Science & Technology | Travel & Events
---|---|---|---|---|---|---|---
Duplicate | Ratio | 53.85% | 72.73% | 56.86% | 50.91% | 52.00% | 52.83%
Duplicate | Z-Score | 57.69% | 72.73% | 56.86% | 50.91% | 52.00% | 52.83%
No Duplicate | Ratio | 53.85% | 59.09% | 56.86% | 50.91% | 52.00% | 52.83%
No Duplicate | Z-Score | 55.77% | 59.09% | 56.86% | 50.91% | 54.00% | 52.83%

Present ~ 1 Week Ago | Metric | Autos & Vehicles | Entertainment | Gaming | How to & Style | Science & Technology | Travel & Events
---|---|---|---|---|---|---|---
Duplicate | Ratio | 45.45% | 59.09% | 52.94% | 45.45% | 48.00% | 50.94%
Duplicate | Z-Score | 48.08% | 59.09% | 52.94% | 47.27% | 50.00% | 50.94%
No Duplicate | Ratio | 48.08% | 59.09% | 52.94% | 45.45% | 48.00% | 50.94%
No Duplicate | Z-Score | 48.08% | 59.09% | 52.94% | 47.27% | 50.00% | 50.94%

1 Week Ago ~ 2 Weeks Ago | Metric | Autos & Vehicles | Entertainment | Gaming | How to & Style | Science & Technology | Travel & Events
---|---|---|---|---|---|---|---
Duplicate | Ratio | 21.15% | 13.64% | 15.69% | 21.82% | 32.00% | 39.62%
Duplicate | Z-Score | 19.23% | 22.73% | 11.76% | 20.00% | 34.00% | 39.62%
No Duplicate | Ratio | 21.15% | 15.91% | 15.69% | 21.82% | 32.00% | 39.62%
No Duplicate | Z-Score | 19.23% | 22.73% | 11.76% | 20.00% | 34.00% | 39.62%

To run this code, you must get a YouTube API key from the Google Developers Console and set it as the API_KEY environment variable.
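As a minimal sketch of how a script could pick up the key, assuming it is read through `os.environ` (the helper name `load_api_key` is hypothetical and not taken from this project's code):

```python
import os


def load_api_key(var_name="API_KEY"):
    """Return the YouTube API key stored in the given environment variable."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending requests without a key.
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the scripts."
        )
    return key
```

Failing at startup keeps a missing key from surfacing later as a less obvious API error.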
pip install -r requirements.txt
Scrape statistics for the top 30 increasing and decreasing channels in each category.
python web_scrape.py
Query up to 5 of the most recent videos and get 100 comments along with the statistics.
The YouTube Data API allows only 10,000 quota units per day by default. If you have not been granted additional quota, you must modify the code so it collects data for a single category at a time.
python api_query.py
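A rough estimate shows why the daily quota forces one category at a time. The sketch below uses the per-request costs documented for the YouTube Data API v3. It assumes, without having checked `api_query.py`, that each channel needs one `search.list` call to find its recent videos and that each video needs one `videos.list` and one `commentThreads.list` call. The figure of 60 channels per category (30 increasing plus 30 decreasing) is also an assumption.

```python
# Quota cost per request, as documented for the YouTube Data API v3.
SEARCH_LIST_COST = 100      # search.list
VIDEOS_LIST_COST = 1        # videos.list
COMMENT_THREADS_COST = 1    # commentThreads.list (up to 100 threads per page)
DAILY_QUOTA = 10_000


def units_per_category(channels=60, videos_per_channel=5):
    """Estimate the quota units needed to collect one category.

    Assumed call pattern (not verified against the project's code):
    one search.list per channel, then one videos.list and one
    commentThreads.list per video.
    """
    per_video = VIDEOS_LIST_COST + COMMENT_THREADS_COST
    per_channel = SEARCH_LIST_COST + videos_per_channel * per_video
    return channels * per_channel


one_category = units_per_category()
print(one_category)                     # 6600
print(one_category <= DAILY_QUOTA)      # True
print(6 * one_category <= DAILY_QUOTA)  # False
```

Under these assumptions a single category fits within one day's quota, while all six together do not.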
Preprocess the data with the following steps:
- Tokenize comments
- Remove punctuation
- Keep English only
- Remove stopwords
- Extract word stem
- Count words
- Calculate ratio
- Calculate z-score
python preprocess_data.py
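The steps above can be sketched with the standard library alone. This is an illustration, not the code in `preprocess_data.py`: the stopword list is a tiny placeholder, `stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, and the definitions of ratio and z-score are one plausible reading, since this README does not define them. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a full one.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "this", "to", "of", "i", "it"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(comment):
    """Tokenize, drop punctuation and non-English tokens, remove stopwords, stem."""
    # Keeping only runs of ASCII letters drops punctuation, digits,
    # and text in non-Latin scripts in a single pass.
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


def count_words(comments):
    """Count stemmed words across a collection of comments."""
    counts = Counter()
    for comment in comments:
        counts.update(preprocess(comment))
    return counts


def ratios(pos, neg):
    """Assumed definition: share of a word's occurrences found in increasing channels."""
    return {w: pos[w] / (pos[w] + neg[w]) for w in set(pos) | set(neg)}


def z_scores(counts):
    """Assumed definition: (count - mean) / population standard deviation."""
    values = list(counts.values())
    if not values:
        return {}
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {w: ((c - mean) / std if std else 0.0) for w, c in counts.items()}


pos = count_words(["Loving this video!", "Great editing, loved it"])
neg = count_words(["This video is boring", "Stopped watching..."])
print(preprocess("Loving this video!"))  # ['lov', 'video']
print(ratios(pos, neg)["video"])         # 0.5
print(round(z_scores(pos)["lov"], 3))    # 1.732
```

A word that appears equally often on both sides gets a ratio of 0.5, so values far from 0.5 mark words tied to one status.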
Visualize the data with the following:
- Wordcloud
- Horizontal Bar Graph
- Vertical Bar Graph
python visualize.py
Test the accuracy of the calculated ratios and z-scores.
You must first repeat step 1️⃣ and step 2️⃣ to collect data for testing.
python test.py
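One way such an accuracy test could work is sketched below. It is an assumption throughout, not a description of `test.py`. Each test channel is labeled by the mean score of its known words, and accuracy is the fraction of channels labeled correctly. The names `predict` and `accuracy`, the 0.5 threshold, and the sample data are all hypothetical.

```python
def predict(tokens, word_scores, threshold=0.5):
    """Label a channel from the mean score of the words it shares with training data."""
    known = [word_scores[t] for t in tokens if t in word_scores]
    if not known:
        return "decrease"  # arbitrary fallback when no word is recognized
    mean_score = sum(known) / len(known)
    return "increase" if mean_score > threshold else "decrease"


def accuracy(samples, word_scores):
    """Fraction of (tokens, true_label) samples that are predicted correctly."""
    correct = sum(predict(tokens, word_scores) == label for tokens, label in samples)
    return correct / len(samples)


# Made-up scores and test channels, for illustration only.
scores = {"lov": 1.0, "video": 0.5, "bor": 0.0}
samples = [
    (["lov", "video"], "increase"),  # mean 0.75, predicted increase (correct)
    (["bor", "video"], "decrease"),  # mean 0.25, predicted decrease (correct)
    (["bor"], "increase"),           # mean 0.00, predicted decrease (wrong)
]
print(f"{accuracy(samples, scores):.2%}")  # 66.67%
```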