ArbabKhan-sudo/Natural-Language-Processing

This repo covers the following text preprocessing steps:

Cleaning: Remove irrelevant items such as HTML tags, symbols, and non-alphabetic characters from the corpus (the dataset, in NLP terms).

Normalization: Convert all words to lowercase. Remove punctuation and extra spaces.

Tokenization: Split the text into words, also known as tokens.

Stop Words Removal: Remove the most common words (a, an, the, etc.).

Part-of-Speech Tagging: Identify the part of speech of each remaining word.

Named Entity Recognition: Recognize the named entities in the data.

Stemming and Lemmatization: Reduce words to a base form. Stemming strips affixes to produce a stem that may not be a real word; lemmatization maps each word to its dictionary form (lemma).
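The first four steps above, plus a very rough stemming pass, can be sketched with only the Python standard library. This is a minimal illustration, not the code used in this repo: the function names, the tiny `STOP_WORDS` set, and the `toy_stem` suffix rules are all assumptions made for the example. Part-of-speech tagging and named entity recognition need a trained model (for example from a library such as NLTK or spaCy) and are left out here.

```python
import html
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to"}


def clean(text):
    """Cleaning: drop HTML tags, decode entities, keep letters and whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)       # remove HTML tags
    text = html.unescape(text)                 # e.g. "&amp;" becomes "&"
    return re.sub(r"[^A-Za-z\s]", " ", text)   # remove non-alphabetic characters


def normalize(text):
    """Normalization: lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def tokenize(text):
    """Tokenization: split on whitespace into word tokens."""
    return text.split()


def remove_stop_words(tokens):
    """Stop-word removal: filter out very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def toy_stem(token):
    """Hypothetical suffix stripper, far cruder than a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run cleaning, normalization, tokenization, and stop-word removal in order."""
    return remove_stop_words(tokenize(normalize(clean(text))))


if __name__ == "__main__":
    sample = "<p>The cats are   RUNNING in the garden &amp; jumping!</p>"
    tokens = preprocess(sample)
    print(tokens)                          # ['cats', 'running', 'garden', 'jumping']
    print([toy_stem(t) for t in tokens])   # ['cat', 'runn', 'garden', 'jump']
```

Note that the toy stemmer turns "running" into "runn", which is not a dictionary word. A lemmatizer backed by a vocabulary would return "run" instead, at the cost of needing that extra resource.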

#######################################################################################################

Applications covered:

-> Speech-to-text conversion

-> Text Preprocessing

-> Language Modelling

-> Language Translation
