Sentiment-analysis-project

Project Overview

This project performs sentiment analysis on the NLTK movie reviews dataset. It covers data preprocessing, feature extraction with TF-IDF vectorization, and two machine learning models: Multinomial Naive Bayes and Logistic Regression. The aim is to classify each movie review as positive or negative based on its content.

Technologies Used

Pandas
NLTK
Scikit-Learn
Seaborn
Matplotlib

Dataset

The dataset used in this project is the NLTK movie reviews dataset. It contains 2,000 movie reviews categorized into positive and negative sentiments.

Source: NLTK library
Categories: Positive, Negative
Number of Reviews: 2,000

Project Structure

sentiment-analysis/
│
├── data/
│ └── movie_reviews_dataset.csv
│
├── notebook/
│ └── sentiment_analysis.ipynb
│
├── results/
│ ├── distribution_of_sentiment_categories.png
│ └── classification_report.txt
│
└── Requirements.txt

Methodology

Data Collection:

The dataset used for this project is the NLTK movie reviews dataset, containing 2,000 labeled movie reviews (positive or negative).
Data Preprocessing:
1. Loading the Data: Reading the reviews and their labels through NLTK's movie_reviews corpus reader.
2. Cleaning the Text Data: Removing stopwords, converting to lowercase, and removing punctuation.
3. Tokenization: Converting text data into individual words.
4. DataFrame Creation: Converting the cleaned data into a Pandas DataFrame and saving it as a CSV file (movie_reviews_dataset.csv).
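The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `clean_tokens` and `load_reviews` and the column names `review` and `sentiment` are assumptions. `load_reviews` needs NLTK and Pandas installed and downloads the `movie_reviews` and `stopwords` corpora on first use.

```python
import string

# Translation table that deletes every ASCII punctuation character
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tokens(tokens, stop_words):
    """Lowercase each token, strip punctuation, and drop stopwords and empty tokens."""
    cleaned = []
    for token in tokens:
        token = token.lower().translate(_PUNCT_TABLE)
        if token and token not in stop_words:
            cleaned.append(token)
    return cleaned


def load_reviews():
    """Build a DataFrame of cleaned reviews (hypothetical helper, assumed column names)."""
    import nltk
    import pandas as pd
    from nltk.corpus import movie_reviews, stopwords

    nltk.download("movie_reviews")
    nltk.download("stopwords")
    stop_words = set(stopwords.words("english"))

    rows = []
    for category in movie_reviews.categories():  # "neg" and "pos"
        for fileid in movie_reviews.fileids(category):
            # movie_reviews.words() returns the review already split into tokens
            tokens = clean_tokens(movie_reviews.words(fileid), stop_words)
            rows.append({"review": " ".join(tokens), "sentiment": category})
    return pd.DataFrame(rows)
```

Calling `load_reviews().to_csv("data/movie_reviews_dataset.csv", index=False)` would then write the cleaned data to the CSV file shown in the project structure.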
Exploratory Data Analysis (EDA):
1. Checking for missing values and removing duplicates.
2. Visualizing the distribution of sentiment categories.
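As a rough sketch of these EDA checks, the snippet below uses a four-row toy DataFrame in place of the real CSV. The column names `review` and `sentiment` and the helper `plot_sentiment_distribution` are assumptions made for illustration; the notebook may name things differently.

```python
import pandas as pd

# Toy stand-in for the saved CSV (assumed column names, made-up rows)
df = pd.DataFrame({
    "review": ["great film", "dull plot", "great film", None],
    "sentiment": ["pos", "neg", "pos", "neg"],
})

missing_per_column = df.isnull().sum()   # number of missing values in each column
deduped = df.drop_duplicates()           # keep the first copy of each identical row
class_counts = deduped["sentiment"].value_counts()


def plot_sentiment_distribution(frame, path="distribution_of_sentiment_categories.png"):
    """Save a bar chart of review counts per sentiment label (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.countplot(x="sentiment", data=frame)
    ax.set_title("Distribution of Sentiment Categories")
    plt.savefig(path, bbox_inches="tight")
    plt.close()
```

On the real dataset, `pd.read_csv` on the saved CSV would replace the toy DataFrame, and `plot_sentiment_distribution(deduped)` would produce the chart.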
Feature Extraction:

Using TF-IDF vectorization to transform text data into numerical features.
Model Building and Training:
1. Splitting the data into training and testing sets (80-20 split).
2. Training a Multinomial Naive Bayes classifier and a Logistic Regression model on the TF-IDF features.
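The split and training steps might look like the sketch below, shown on ten made-up sentences. Several details are assumptions rather than facts about the notebook: the `random_state`, the stratified split, `max_iter=1000`, and the choice to fit the vectorizer on the training portion only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up stand-in data: five positive and five negative snippets
texts = [
    "great moving film", "wonderful acting", "brilliant and fun",
    "loved every scene", "superb direction",
    "dull boring film", "terrible acting", "awful and slow",
    "hated every scene", "weak direction",
]
labels = ["pos"] * 5 + ["neg"] * 5

# 80/20 split; stratify keeps both classes represented in each part (assumption)
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Fit TF-IDF on the training text only, then reuse the same vocabulary for the test text
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

nb_model = MultinomialNB().fit(X_train, y_train)
lr_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
```

Fitting the vectorizer after the split keeps test-set vocabulary and document frequencies out of training. Vectorizing the whole dataset first is also common in tutorials, and the notebook may do either.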
Model Evaluation:

Calculating the accuracy score and generating classification reports for both models.
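A minimal sketch of the evaluation step, using scikit-learn's `accuracy_score` and `classification_report`. The helper name `evaluate` is hypothetical, and the labels below are toy values chosen for illustration, not results from this project.

```python
from sklearn.metrics import accuracy_score, classification_report


def evaluate(name, y_true, y_pred):
    """Print and return the accuracy and the per-class precision/recall/F1 report."""
    accuracy = accuracy_score(y_true, y_pred)
    report = classification_report(y_true, y_pred, zero_division=0)
    print(f"{name} accuracy: {accuracy:.3f}")
    print(report)
    return accuracy, report


# Toy labels only: four of the five predictions match the true labels
y_true = ["pos", "neg", "pos", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]
toy_accuracy, toy_report = evaluate("toy model", y_true, y_pred)
```

In the project, the same helper would be called once per classifier with the true test labels and that model's predictions on the held-out 20% of reviews.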

Setup and Installations

Clone the repository:

git clone https://github.com/ellahu1003/sentiment-analysis-project.git
cd sentiment-analysis-project

Install the required libraries:
```
pip install -r Requirements.txt
```

Run the Jupyter Notebook:

jupyter notebook notebook/sentiment_analysis.ipynb

Requirements

The 'Requirements.txt' file lists all the Python packages required to run the project. Install these dependencies to avoid any compatibility issues.

Results

The Multinomial Naive Bayes model reached an accuracy of 0.785.
The Logistic Regression model reached an accuracy of 0.795.
Detailed classification reports for both models are available in classification_report.txt.
The distribution of the sentiment categories is visualized in distribution_of_sentiment_categories.png.

Conclusion

This project demonstrates how natural language processing and machine learning techniques can be applied to classify movie reviews as positive or negative. Multinomial Naive Bayes reached an accuracy of 0.785 and Logistic Regression 0.795, so Logistic Regression came out slightly ahead.
