Harshit-Sinha-49/YouTube-view-prediction-model

YouTube View Count Prediction and Viewer Analysis Model


Description:

This project predicts YouTube view counts, using the CatBoost algorithm to model and forecast viewership trends. Data was collected from a specific channel through the YouTube API and covers video metadata together with sentiment analysis of the comment section, giving a fuller picture of audience engagement.

Tech Stack


Language : Python
Libraries : Pandas, Matplotlib, Seaborn, NLTK
Platform : Jupyter Notebook

Output

View Count Analysis Output:

Views Plot

Scatter Plot

Views Histogram

Views Log Histogram

Seasonal Decomposition

Seasonal Pattern

Data Distribution Plot

Heatmap

CatBoost (without additional features)

Viewer Analysis Output:

Boxplot

Sentiment Distribution

Wordcloud


Data collection

Data was collected through the YouTube API, which provides video metadata (title, description, and tags) and engagement metrics (likes, dislikes, and comments). Sentiment analysis was then performed on the comment section to gauge audience sentiment and its effect on viewership.
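As a rough illustration of the collection step, the sketch below flattens one video record into a row suitable for a Pandas DataFrame and maps a sentiment score to a label. It assumes the record has the shape returned by the YouTube Data API v3 `videos.list` endpoint (`snippet` and `statistics` parts). The function names `flatten_video_item` and `label_sentiment` are hypothetical, and the 0.05 cutoff is the threshold conventionally used with NLTK's VADER compound score. The repository does not state which NLTK analyzer was used, so this is not necessarily the notebook's actual code.

```python
def flatten_video_item(item):
    """Turn one item of a videos.list-style response into a flat row."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})

    def to_int(value):
        # The API reports counts as strings; a missing count stays None.
        return int(value) if value is not None else None

    return {
        "video_id": item.get("id"),
        "title": snippet.get("title", ""),
        "description": snippet.get("description", ""),
        "tags": snippet.get("tags", []),
        "published_at": snippet.get("publishedAt"),
        "views": to_int(stats.get("viewCount")),
        "likes": to_int(stats.get("likeCount")),
        # Assumption: dislike counts may be absent from the response.
        "dislikes": to_int(stats.get("dislikeCount")),
        "comments": to_int(stats.get("commentCount")),
    }


def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

If VADER is the analyzer, the compound score for a comment could come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in `nltk.sentiment.vader`, after downloading the `vader_lexicon` resource.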

CatBoost algorithm

CatBoost was chosen because it handles categorical variables robustly and helps mitigate overfitting, which suits view count prediction across varied YouTube content. Feature engineering and model optimization were used to uncover the key factors influencing video popularity and viewership.
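A minimal sketch of how such a model might be trained is shown below. Several choices here are assumptions rather than details confirmed by the repository: the log-transformed target (suggested only by the "Views Log Histogram" output), the chronological train/validation split, the hyperparameter values, and the helper names `chronological_split`, `to_log_views`, `from_log_views`, and `train_view_model`. Rows are assumed to be dictionaries with a `published_at` ISO timestamp and a `views` count.

```python
import math


def chronological_split(rows, train_fraction=0.8):
    """Split rows so that validation videos are published after training videos."""
    ordered = sorted(rows, key=lambda r: r["published_at"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def to_log_views(views):
    """Compress the heavy right tail of view counts."""
    return math.log1p(views)


def from_log_views(value):
    """Invert to_log_views to get back a view count."""
    return math.expm1(value)


def train_view_model(train_rows, valid_rows, feature_names, cat_features):
    """Fit a CatBoostRegressor on log-transformed view counts."""
    # Imported lazily so the helpers above work without these packages installed.
    import pandas as pd
    from catboost import CatBoostRegressor

    X_train = pd.DataFrame(train_rows)[feature_names]
    X_valid = pd.DataFrame(valid_rows)[feature_names]
    y_train = [to_log_views(r["views"]) for r in train_rows]
    y_valid = [to_log_views(r["views"]) for r in valid_rows]

    # Hyperparameters are illustrative placeholders, not tuned values.
    model = CatBoostRegressor(
        iterations=500,
        learning_rate=0.05,
        depth=6,
        loss_function="RMSE",
        verbose=0,
    )
    model.fit(
        X_train,
        y_train,
        cat_features=cat_features,  # names of the categorical columns
        eval_set=(X_valid, y_valid),
        early_stopping_rounds=50,  # stop once validation error stops improving
    )
    return model
```

Predictions from a model trained this way are on the log scale, so `from_log_views` would be applied to each prediction before comparing it with actual view counts.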

Result

The analysis indicates that factors such as video length, title sentiment, and viewer interaction influence view counts. The CatBoost model also showed strong predictive performance on the collected channel data.
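The repository does not report specific metrics, so the following is only a sketch of how predictive performance could be quantified on held-out videos using standard regression metrics. The function names are hypothetical and the numbers in the usage note are made-up inputs, not results from this project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: the average size of a miss, in views."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Share of the variance in view counts explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For example, `r2_score([100, 200, 300], [110, 190, 300])` compares three actual view counts with three predictions. Values close to 1 indicate that the predictions track the actual counts closely.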
