This Git repository contains Python scripts (.py files) and notebooks to analyze Media Publishers in terms of their published articles.
**This is only a draft copy.**
- Extract Google Search URLs for the User's Search Term
- Extract News Articles data using the URLs from the metadata
- Process & Clean the text data
- Analyze the data using typical NLP metrics
- Using Pre-trained models, predict the Overall Sentiment & Sentiment Flow
- Using Pre-trained models, predict the Overall Emotion & Emotion Flow
- Apply Clustering Models and identify different Clusters
- Using the Clusters as the Labels, create a Supervised Classification Model
- Using the Model, identify the Feature Importance
- Using those Features, differentiate the Clusters and the Media houses
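
The cleaning and Sentiment Flow steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: it assumes that "Sentiment Flow" means one score per sentence in reading order and that "Overall Sentiment" is their mean. The word lists and the names `clean_text`, `split_sentences`, `toy_score`, `sentiment_flow` and `overall_sentiment` are hypothetical, and `toy_score` only stands in for a pre-trained model.

```python
import re
from statistics import mean
from typing import Callable, List

# Toy lexicon standing in for a pre-trained sentiment model (assumption).
POSITIVE = {"good", "great", "growth", "win", "strong"}
NEGATIVE = {"bad", "crisis", "loss", "weak", "fail"}


def clean_text(raw: str) -> str:
    """Replace HTML tags and URLs with spaces, then collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def toy_score(sentence: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = re.findall(r"[a-z']+", sentence.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def sentiment_flow(article: str, scorer: Callable[[str], float] = toy_score) -> List[float]:
    """Return one score per sentence, in reading order."""
    return [scorer(s) for s in split_sentences(clean_text(article))]


def overall_sentiment(flow: List[float]) -> float:
    """Mean of the per-sentence scores (0.0 for an empty article)."""
    return mean(flow) if flow else 0.0


if __name__ == "__main__":
    article = (
        "<p>The quarter started with a loss.</p> "
        "<p>Sales were weak according to https://example.com/q2 reports! "
        "Then a strong recovery brought great growth.</p>"
    )
    flow = sentiment_flow(article)
    print(flow, overall_sentiment(flow))
```

Because the scorer is passed in as a function, a pre-trained model could replace `toy_score` without touching the rest of the sketch. The same shape would work for Emotion Flow with a scorer that returns an emotion label per sentence.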
Install the required packages to run the Notebook:
pip install -r requirements.txt
Python, NLP, Text Processing, Transformers, Hugging Face, Web Scraping