Text Classification, Named Entity Recognition, and Transformers Overview Documentation #80

Merged
87 changes: 87 additions & 0 deletions NLP/Algorithms/Named_Entity_Recognition/README.md
@@ -0,0 +1,87 @@
# Named Entity Recognition (NER) Project

This project demonstrates a basic Named Entity Recognition (NER) algorithm using Python and the `spacy` library. The goal is to identify named entities in text and classify them into predefined categories.

## Directory Structure

```
ner_project/
├── data/
│   └── ner_data.txt
├── models/
│   └── ner_model
├── preprocess.py
├── train.py
├── recognize.py
└── README.md
```

- **data/ner_data.txt**: Contains the dataset used for training the NER model.
- **models/ner_model**: Stores the trained NER model.
- **preprocess.py**: Contains the code for preprocessing the text data.
- **train.py**: Script for training the NER model.
- **recognize.py**: Script for recognizing named entities in new text using the trained model.
- **README.md**: Project documentation.

## Dataset

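The dataset (`ner_data.txt`) contains sentences and their entity labels in IOB format: a `B-` prefix marks the first token of an entity, `I-` marks a continuation token, and `O` marks a token outside any entity. Each line holds one word and its label, separated by a space, and a blank line separates sentences.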
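
As an illustration of the file layout described above, here is a minimal sketch of a reader for it. The function name `read_iob`, the sample sentences, and the label names (`PER`, `ORG`, `LOC`) are assumptions made for this example; they are not taken from the project's `ner_data.txt` or `preprocess.py`.

```python
# Hypothetical sample in the "word label" layout, one token per line,
# with a blank line between sentences.
SAMPLE = """\
Alice B-PER
works O
at O
Google B-ORG

She O
lives O
in O
New B-LOC
York I-LOC
"""


def read_iob(lines):
    """Group "word label" lines into (tokens, tags) pairs, one per sentence."""
    sentences, tokens, tags = [], [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        # Split on the last space so the label is always the final field.
        word, tag = line.rsplit(" ", 1)
        tokens.append(word)
        tags.append(tag)
    if tokens:
        # The file may not end with a blank line.
        sentences.append((tokens, tags))
    return sentences


if __name__ == "__main__":
    for sentence_tokens, sentence_tags in read_iob(SAMPLE.splitlines()):
        print(list(zip(sentence_tokens, sentence_tags)))
```

To read the real file, the same function can be given an open file handle, for example `read_iob(open("data/ner_data.txt", encoding="utf-8"))`.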

## Preprocessing

The `preprocess.py` file contains functions to preprocess the text data. It reads the dataset and converts it into a format suitable for training with `spacy`.
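
The README does not say which training format `preprocess.py` produces. One common choice for spaCy is a `(text, {"entities": [(start, end, label)]})` pair with character offsets, and the sketch below assumes that format. The function name `iob_to_spacy` and the sample tokens are hypothetical, and tokens are assumed to be joined by single spaces.

```python
def iob_to_spacy(tokens, tags):
    """Turn one IOB-tagged sentence into (text, {"entities": [...]}).

    Offsets are character positions in the space-joined text.
    """
    entities = []
    current = None  # [start, end, label] of the entity being built
    offset = 0
    for token, tag in zip(tokens, tags):
        end = offset + len(token)
        if tag.startswith("I-") and current and current[2] == tag[2:]:
            # Continuation of the open entity: extend its end offset.
            current[1] = end
        else:
            if current:
                entities.append(tuple(current))
                current = None
            if tag != "O":
                # A B- tag, or a stray I- tag, starts a new entity.
                current = [offset, end, tag[2:]]
        offset = end + 1  # +1 for the space that joins tokens
    if current:
        entities.append(tuple(current))
    return " ".join(tokens), {"entities": entities}


if __name__ == "__main__":
    words = ["Alice", "works", "at", "Google", "in", "New", "York"]
    labels = ["B-PER", "O", "O", "B-ORG", "O", "B-LOC", "I-LOC"]
    print(iob_to_spacy(words, labels))
```

Joining tokens with a single space keeps the offsets easy to compute, at the cost of not reproducing the original spacing around punctuation.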

## Training the Model

The `train.py` script is used to train the NER model. It performs the following steps:

1. Load a blank English model.
2. Create the NER pipeline component and add it to the pipeline.
3. Add labels to the NER component.
4. Load the training data.
5. Train the model using the training data.
6. Save the trained model to `models/ner_model`.
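
The steps above can be sketched as follows. This is not the project's `train.py`; it is a minimal illustration that assumes spaCy v3, and the function name `train_ner`, the toy `TRAIN_DATA`, the label names, the iteration count, and the dropout rate are all placeholders chosen for this example.

```python
import random
from pathlib import Path

import spacy
from spacy.training import Example

# Hypothetical toy data in (text, {"entities": [(start, end, label)]}) form.
TRAIN_DATA = [
    ("Alice works at Google in London",
     {"entities": [(0, 5, "PER"), (15, 21, "ORG"), (25, 31, "LOC")]}),
    ("Bob lives in Paris",
     {"entities": [(0, 3, "PER"), (13, 18, "LOC")]}),
]


def train_ner(train_data, output_dir="models/ner_model", n_iter=20, seed=0):
    random.seed(seed)

    # Steps 1-2: blank English pipeline with a fresh NER component.
    nlp = spacy.blank("en")
    ner = nlp.add_pipe("ner")

    # Step 3: register every label that occurs in the data.
    for _text, annotations in train_data:
        for _start, _end, label in annotations["entities"]:
            ner.add_label(label)

    # Step 4: wrap the raw tuples in spaCy Example objects.
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]

    # Step 5: initialize the weights, then run a few update passes.
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        nlp.update(examples, sgd=optimizer, drop=0.2, losses=losses)

    # Step 6: save the pipeline to disk.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    nlp.to_disk(output_dir)
    return nlp


if __name__ == "__main__":
    trained = train_ner(TRAIN_DATA)
    print("Pipeline components:", trained.pipe_names)
```

With only a handful of sentences the resulting model will not generalize; the sketch is meant to show the order of the calls, not to produce a usable model.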

### Running the Training Script

To train the model, run:
```bash
python train.py
```

## Recognizing Named Entities

The `recognize.py` script is used to recognize named entities in new text using the trained model. It performs the following steps:

1. Load the trained model.
2. Process the input text.
3. Print the recognized entities and their labels.
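
A minimal sketch of these three steps is shown below. It assumes the model was saved with spaCy's `to_disk` under `models/ner_model`; the `model_dir` parameter and the returned list are additions made for this example and may differ from the actual `recognize.py`.

```python
from pathlib import Path

import spacy


def recognize_entities(text, model_dir="models/ner_model"):
    """Load a saved pipeline, run it on text, and print each entity."""
    # Step 1: load the pipeline saved by the training script.
    nlp = spacy.load(Path(model_dir))

    # Step 2: process the input text.
    doc = nlp(text)

    # Step 3: print each entity span with its label.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    for entity_text, label in entities:
        print(f"{entity_text}\t{label}")
    return entities


if __name__ == "__main__":
    recognize_entities("Alice works at Google in London")
```

Loading the model inside the function keeps the sketch short; a script that processes many texts would more likely load the pipeline once and reuse it.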

### Running the Recognition Script

To recognize named entities in new text, run:
```bash
python recognize.py
```

## Dependencies

The project requires the following Python libraries:

- spacy

You can install the dependencies using:
```bash
pip install spacy
```

## Example Usage

```python
# Example usage of the recognize.py script.
# recognize_entities is assumed to be defined earlier in recognize.py.
if __name__ == "__main__":
    text = "I love programming in Python. Machine learning is fascinating. Spacy is a useful library."
    recognize_entities(text)
```

This project provides a basic implementation of Named Entity Recognition using the `spacy` library. You can expand it by using more advanced models or preprocessing techniques based on your requirements.