This repository contains a brief introduction about feature extraction of text based data.

SrishtiVashishtha/Text-Based-Feature-Extraction-using-Python

Text-Based-Feature-Extraction-using-Python

This repository gives a brief introduction to feature extraction from text-based data.

The textual data is stored in the resort.txt file.

The pre-processing steps for the textual data are explained in the Pre-processing of Data.py file. The basic pre-processing steps include:

- Tokenization of words and sentences
- Removal of punctuation
- Removal of stop words
- Stemming of words
- Lemmatization of words
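The pre-processing steps listed above can be sketched with the Python standard library alone. This is a minimal illustration and not the code from Pre-processing of Data.py: the sample text, the `STOP_WORDS` set, the `LEMMAS` table, and all function names are assumptions made for this example, and `naive_stem` is a crude suffix stripper standing in for a real stemming algorithm. A dedicated NLP library would normally supply proper tokenizers, stop-word lists, stemmers, and lemmatizers.

```python
import re
import string

# Hypothetical sample text; the contents of resort.txt are not reproduced here.
text = "The resort was lovely. We enjoyed swimming in the pools!"

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "was", "we", "in", "a", "an", "is", "and", "of"}

# Tiny illustrative lemma table (an assumption); real lemmatizers use a dictionary.
LEMMAS = {"pools": "pool", "enjoyed": "enjoy", "swimming": "swim", "was": "be"}

PUNCTUATION = set(string.punctuation)


def sentence_tokenize(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def remove_punctuation(tokens):
    """Drop tokens that are punctuation characters."""
    return [t for t in tokens if t not in PUNCTUATION]


def remove_stop_words(tokens):
    """Drop tokens found in the illustrative stop-word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def naive_stem(word):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in the toy lemma table, falling back to lowercase."""
    return LEMMAS.get(word.lower(), word.lower())


sentences = sentence_tokenize(text)
cleaned = [
    remove_stop_words(remove_punctuation(word_tokenize(s))) for s in sentences
]
stems = [[naive_stem(w.lower()) for w in sent] for sent in cleaned]
lemmas = [[lemmatize(w) for w in sent] for sent in cleaned]
```

Stemming and lemmatization are shown as two alternative outputs from the same cleaned tokens: the toy stemmer turns "swimming" into the non-word "swimm", while the lemma table maps it to "swim", which illustrates the usual difference between the two steps.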

Binary features, where a word is encoded as 1 if it exists in a sentence and 0 if it does not, are explained in Binary Features.py.
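As a rough sketch of the binary encoding described above, the snippet below builds a vocabulary from already tokenized sentences and marks each word as present (1) or absent (0) per sentence. The sample sentences and the names `build_vocabulary` and `binary_features` are assumptions for illustration, not taken from Binary Features.py.

```python
# Hypothetical pre-processed sentences (lists of tokens).
tokenized_sentences = [
    ["pool", "clean", "pool"],
    ["room", "clean"],
]


def build_vocabulary(tokenized_sentences):
    """Return the sorted list of distinct words across all sentences."""
    return sorted({word for sent in tokenized_sentences for word in sent})


def binary_features(tokenized_sentences, vocabulary):
    """Return one 0/1 vector per sentence, aligned with the vocabulary."""
    vectors = []
    for sent in tokenized_sentences:
        present = set(sent)
        vectors.append([1 if word in present else 0 for word in vocabulary])
    return vectors


vocabulary = build_vocabulary(tokenized_sentences)
binary_matrix = binary_features(tokenized_sentences, vocabulary)

for sent, row in zip(tokenized_sentences, binary_matrix):
    print(sent, row)
```

Note that "pool" appears twice in the first sentence but is still encoded as 1, since a binary feature records presence only.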

The computation of the count vector, which stores the frequency of each word in a sentence, is explained in CountVector.py.
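A count vector can be sketched in a few lines with `collections.Counter`. The sample sentences and the function names below are assumptions for this example and may differ from what CountVector.py does.

```python
from collections import Counter

# Hypothetical pre-processed sentences (lists of tokens).
tokenized_sentences = [
    ["pool", "clean", "pool"],
    ["room", "clean"],
]


def build_vocabulary(tokenized_sentences):
    """Return the sorted list of distinct words across all sentences."""
    return sorted({word for sent in tokenized_sentences for word in sent})


def count_vectors(tokenized_sentences, vocabulary):
    """Return one vector of word counts per sentence, aligned with the vocabulary."""
    vectors = []
    for sent in tokenized_sentences:
        counts = Counter(sent)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vectors


vocabulary = build_vocabulary(tokenized_sentences)
count_matrix = count_vectors(tokenized_sentences, vocabulary)

for sent, row in zip(tokenized_sentences, count_matrix):
    print(sent, row)
```

Unlike the binary encoding, the repeated word "pool" in the first sentence is recorded with a count of 2.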

The calculation of the TF (Term Frequency) matrix and the TF-IDF (Term Frequency-Inverse Document Frequency) matrix is explained in TF_matrix.py and TF-IDF_Matrix.py, respectively.
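The sketch below shows one common way to compute both matrices, treating each sentence as a document. It assumes TF is the word count divided by the sentence length and IDF is the natural logarithm of (number of sentences / number of sentences containing the word). These formulas, the sample data, and the function names are assumptions for illustration; TF_matrix.py and TF-IDF_Matrix.py may use a different variant, and library implementations often apply smoothing.

```python
import math
from collections import Counter

# Hypothetical pre-processed sentences (lists of tokens).
tokenized_sentences = [
    ["pool", "clean", "pool"],
    ["room", "clean"],
]


def build_vocabulary(tokenized_sentences):
    """Return the sorted list of distinct words across all sentences."""
    return sorted({word for sent in tokenized_sentences for word in sent})


def tf_matrix(tokenized_sentences, vocabulary):
    """TF = count of word in sentence / number of tokens in sentence."""
    matrix = []
    for sent in tokenized_sentences:
        counts = Counter(sent)
        total = len(sent) or 1  # avoid division by zero for empty sentences
        matrix.append([counts[word] / total for word in vocabulary])
    return matrix


def idf_vector(tokenized_sentences, vocabulary):
    """IDF = ln(N / df), where df is the number of sentences containing the word."""
    n = len(tokenized_sentences)
    sentence_sets = [set(sent) for sent in tokenized_sentences]
    idf = []
    for word in vocabulary:
        df = sum(1 for s in sentence_sets if word in s)
        idf.append(math.log(n / df))  # df >= 1 because vocabulary comes from the data
    return idf


def tfidf_matrix(tf, idf):
    """Multiply each TF row element-wise by the IDF vector."""
    return [[t * i for t, i in zip(row, idf)] for row in tf]


vocabulary = build_vocabulary(tokenized_sentences)
tf = tf_matrix(tokenized_sentences, vocabulary)
idf = idf_vector(tokenized_sentences, vocabulary)
tfidf = tfidf_matrix(tf, idf)
```

With this unsmoothed variant, a word that occurs in every sentence (here "clean") gets an IDF of 0 and therefore a TF-IDF weight of 0, which is the intended effect of down-weighting words that do not distinguish one sentence from another.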
