Sentiment Analysis of Blog Authorship Corpus Data

This project uses text analytics to analyze and summarize the Blog Authorship Corpus, a large collection of blogger content.

The first step was to load the dataset into a data frame, remove null and duplicate values, and inspect randomly sampled rows to understand the structure and content of the data.
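The loading step could be sketched roughly as below. This is a minimal illustration using only the Python standard library as a stand-in for a data frame; the column names, the inline sample data, and the `load_rows` helper are assumptions for the example, not the project's actual code. With pandas, the same steps would typically map to `dropna`, `drop_duplicates`, and `sample`.

```python
import csv
import io
import random

# Hypothetical sample standing in for the corpus file; column names are assumed.
RAW = """id,gender,age,topic,text
1,male,15,Student,Hello world
2,female,33,Arts,
1,male,15,Student,Hello world
3,female,24,Technology,I love Python
"""

def load_rows(handle):
    """Read CSV rows, skipping rows with an empty field and exact duplicates."""
    seen = set()
    rows = []
    for row in csv.DictReader(handle):
        # Treat missing or whitespace-only fields as null values.
        if any(value is None or not str(value).strip() for value in row.values()):
            continue
        key = tuple(row.items())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        rows.append(row)
    return rows

rows = load_rows(io.StringIO(RAW))

# Inspect a couple of randomly chosen rows to get a feel for the data.
for row in random.sample(rows, k=min(2, len(rows))):
    print(row)
```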

Next, the data was cleaned: unimportant columns were dropped, numbers, symbols, and extra spaces were removed, and the text was restricted to upper and lowercase letters. Stop words, non-English words, misspelled words, and chat acronyms were then removed to reduce noise in the text.
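The cleaning steps above could be sketched roughly as follows. The stop-word and acronym sets and the `clean_text` helper are hypothetical placeholders, not the project's actual lists. Lowercasing the text and the optional `vocabulary` filter (a simple stand-in for removing non-English and misspelled words) are assumptions made for this example.

```python
import re

# Tiny illustrative sets; a real run would use full stop-word and acronym lists.
STOP_WORDS = {"the", "a", "is", "are", "and", "to", "of"}
CHAT_ACRONYMS = {"lol", "brb", "omg", "idk"}

def clean_text(text, vocabulary=None):
    """Return cleaned tokens from one blog post.

    vocabulary: optional set of accepted words; tokens outside it are
    dropped (assumed stand-in for non-English and misspelled word removal).
    """
    # Replace everything except letters and whitespace with a space.
    letters_only = re.sub(r"[^A-Za-z\s]", " ", text)
    # Lowercase (an assumption) so matching against the sets is case-insensitive;
    # split() also collapses runs of extra spaces.
    tokens = letters_only.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS and t not in CHAT_ACRONYMS]
    if vocabulary is not None:
        tokens = [t for t in tokens if t in vocabulary]
    return tokens

sample = "OMG!!! The 2 cats   are sooo cute, lol :)"
print(clean_text(sample))
print(clean_text(sample, vocabulary={"cats", "cute"}))
```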

To understand the content in more depth, polarity and subjectivity values were computed and compared across demographic groups, and their distributions were visualized.
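As an illustration of comparing polarity and subjectivity across groups, here is a minimal lexicon-based sketch. It is not the library or method the project used, which the text does not name: the `LEXICON` values, the `score` and `mean_scores_by_group` helpers, and the sample posts are all hypothetical. Polarity is assumed to lie in [-1, 1] and subjectivity in [0, 1].

```python
from collections import defaultdict
from statistics import mean

# Hypothetical lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
LEXICON = {
    "love": (0.5, 0.5),
    "great": (0.75, 0.75),
    "hate": (-0.75, 1.0),
    "boring": (-1.0, 1.0),
}

def score(tokens):
    """Average polarity and subjectivity over the tokens found in the lexicon."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    if not hits:
        return 0.0, 0.0  # no opinion words: treat as neutral and objective
    return mean(p for p, _ in hits), mean(s for _, s in hits)

def mean_scores_by_group(posts, key):
    """Average the per-post scores within each value of a demographic column."""
    grouped = defaultdict(list)
    for post in posts:
        grouped[post[key]].append(score(post["tokens"]))
    return {
        group: (mean(p for p, _ in scores), mean(s for _, s in scores))
        for group, scores in grouped.items()
    }

posts = [
    {"gender": "female", "tokens": ["love", "great", "movie"]},
    {"gender": "male", "tokens": ["hate", "homework"]},
    {"gender": "male", "tokens": ["walk", "home"]},
]
result = mean_scores_by_group(posts, "gender")
print(result)
```

The per-group pairs returned here are the kind of values whose distributions can then be plotted, for example as histograms per demographic group.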

Finally, trends in word usage across demographics and blog categories were identified using visualizations such as word clouds, which highlight the dominant themes in the corpus.
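A word cloud is essentially a rendering of word frequencies, so the underlying computation can be sketched as a per-group word count. The `top_words_by_group` helper, the `topic` column name, and the sample posts below are assumptions for illustration; the rendering step itself is left out.

```python
from collections import Counter, defaultdict

def top_words_by_group(posts, key, n=3):
    """Count token frequencies per group and return the n most common words."""
    counts = defaultdict(Counter)
    for post in posts:
        counts[post[key]].update(post["tokens"])
    return {group: counter.most_common(n) for group, counter in counts.items()}

# Hypothetical cleaned posts, each tagged with a blog category.
posts = [
    {"topic": "Technology", "tokens": ["python", "code", "python"]},
    {"topic": "Technology", "tokens": ["python", "laptop"]},
    {"topic": "Arts", "tokens": ["paint", "canvas", "paint"]},
]
top_words = top_words_by_group(posts, "topic", n=2)
for topic, words in top_words.items():
    print(topic, words)
```

The resulting word-to-count pairs per group are the usual input for a word cloud renderer, with larger counts drawn as larger words.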

Files in this repository:
Final Project Code.py
Final Project Report- Surya Vaddhiparthy.pdf
README.md

vaddhiparthy/BlogSentimentAnalysis
