Skip to content

Using NLP, this project categorizes BBC News articles into business, entertainment, politics, sport, and tech. It leverages text data to build a precise classification model.

License

Notifications You must be signed in to change notification settings

Rahul-404/bbc-news-sorting

Repository files navigation

News Sorting with NLP: BBC News Dataset 📰🔍

Overview

Welcome to the News Sorting project using Natural Language Processing (NLP) techniques applied to the BBC News Dataset. This project aims to classify news articles into predefined categories such as business, entertainment, politics, sport, and tech. By leveraging NLP, we'll extract features from the text data to build a machine learning model capable of accurately categorizing news articles.

Dataset

The BBC News Dataset consists of news articles published by the BBC, categorized into five predefined classes: business, entertainment, politics, sport, and tech. Each article contains textual content along with its corresponding category label. The dataset is available on Kaggle.

Project Structure

  • data/: Directory to store the dataset files.
  • notebooks/: Jupyter notebooks for data exploration, text preprocessing, model training, and evaluation.
  • models/: Saved models after training.
  • results/: Results and evaluation metrics.
  • README.md: This file, providing an overview of the project.

Setup

To run the project, you'll need Python 3.x and the following libraries:

  • numpy
  • pandas
  • scikit-learn
  • matplotlib
  • seaborn
  • nltk
  • wordcloud

You can install these dependencies using pip:

pip install numpy pandas scikit-learn matplotlib seaborn nltk wordcloud

Usage

  1. Download the BBC News Dataset from Kaggle.
  2. Place the dataset files in the data/ directory.
  3. Open and run the Jupyter notebooks in the notebooks/ directory sequentially. These notebooks cover data preprocessing, text feature extraction, model training, and evaluation.
  4. After training the models, evaluate their performance using appropriate evaluation metrics.
  5. Experiment with different NLP techniques, models, and hyperparameters to improve performance.

Results

The project aims to achieve the following outcomes:

  • Develop an NLP model capable of accurately categorizing news articles into business, entertainment, politics, sport, and tech categories.
  • Evaluate the model's performance using metrics such as accuracy, precision, recall, and F1-score.
  • Visualize the results and gain insights into the classification performance.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

Using NLP, this project categorizes BBC News articles into business, entertainment, politics, sport, and tech. It leverages text data to build a precise classification model.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published