News Sorting with NLP: BBC News Dataset 📰🔍

Overview

Welcome to the News Sorting project using Natural Language Processing (NLP) techniques applied to the BBC News Dataset. This project aims to classify news articles into predefined categories such as business, entertainment, politics, sport, and tech. By leveraging NLP, we'll extract features from the text data to build a machine learning model capable of accurately categorizing news articles.

Dataset

The BBC News Dataset consists of news articles published by the BBC, categorized into five predefined classes: business, entertainment, politics, sport, and tech. Each article contains textual content along with its corresponding category label. The dataset is available on Kaggle.

Project Structure

data/: Directory to store the dataset files.
notebooks/: Jupyter notebooks for data exploration, text preprocessing, model training, and evaluation.
models/: Saved models after training.
results/: Results and evaluation metrics.
README.md: This file, providing an overview of the project.

Setup

To run the project, you'll need Python 3.x and the following libraries:

numpy
pandas
scikit-learn
matplotlib
seaborn
nltk
wordcloud

You can install these dependencies using pip:

pip install numpy pandas scikit-learn matplotlib seaborn nltk wordcloud

Usage

Download the BBC News Dataset from Kaggle.
Place the dataset files in the data/ directory.
Open and run the Jupyter notebooks in the notebooks/ directory sequentially. These notebooks cover data preprocessing, text feature extraction, model training, and evaluation.
After training the models, evaluate their performance using appropriate evaluation metrics.
Experiment with different NLP techniques, models, and hyperparameters to improve performance.

Results

The project aims to achieve the following outcomes:

Develop an NLP model capable of accurately categorizing news articles into business, entertainment, politics, sport, and tech categories.
Evaluate the model's performance using metrics such as accuracy, precision, recall, and F1-score.
Visualize the results and gain insights into the classification performance.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
notebooks		notebooks
src/news_sorting_project		src/news_sorting_project
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt
setup.py		setup.py
template.py		template.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

News Sorting with NLP: BBC News Dataset 📰🔍

Overview

Dataset

Project Structure

Setup

Usage

Results

License

About

Releases

Packages

Languages

License

Rahul-404/bbc-news-sorting

Folders and files

Latest commit

History

Repository files navigation

News Sorting with NLP: BBC News Dataset 📰🔍

Overview

Dataset

Project Structure

Setup

Usage

Results

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages