GitHub - ArbabKhan-sudo/Natural-Language-Processing

This repo covers text preprocessing through the following steps:

Cleaning: Remove irrelevant items such as HTML tags, symbols, and non-alphabetic characters from the corpus (the data set, in NLP terms).

Normalization: Convert all words to lowercase. Remove punctuation and extra spaces.

Tokenization: Split the text into words, also known as tokens.

Stop Words Removal: Remove the most common words (a, an, the, etc.).

Parts of Speech Tagging: Identify the parts of speech for the remaining words.

Named Entity Recognition: Recognize the named entities in the data.

Stemming and Lemmatization: Reduce words to a base form. Stemming strips suffixes to get a root, which may not be a real word; lemmatization maps each word to its dictionary form (lemma).
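
The steps above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the code from `Text_Preprocessing.ipynb`: the function names, the tiny `STOP_WORDS` set, and the `toy_stem` suffix stripper are all assumptions made for the example. POS tagging and named entity recognition are left out because they rely on trained models.

```python
import html
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "and", "or", "of", "to", "in", "on"}


def clean(text: str) -> str:
    """Cleaning: drop HTML tags, decode entities, keep letters and whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)       # remove HTML tags
    text = html.unescape(text)                 # e.g. "&amp;" becomes "&"
    return re.sub(r"[^A-Za-z\s]", " ", text)   # remove digits, symbols, punctuation


def normalize(text: str) -> str:
    """Normalization: lowercase and collapse extra whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def tokenize(text: str) -> list:
    """Tokenization: split on whitespace (enough once the text is cleaned)."""
    return text.split()


def remove_stop_words(tokens: list) -> list:
    """Stop-word removal against the small set defined above."""
    return [t for t in tokens if t not in STOP_WORDS]


def toy_stem(token: str) -> str:
    """Toy stemmer: strip one common suffix. Hypothetical helper, not Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list:
    """Run cleaning, normalization, tokenization, stop-word removal, stemming."""
    tokens = tokenize(normalize(clean(text)))
    return [toy_stem(t) for t in remove_stop_words(tokens)]


if __name__ == "__main__":
    sample = "<p>The cats were <b>jumping</b> over 2 fences &amp; walls!</p>"
    print(preprocess(sample))
    # prints ['cat', 'jump', 'over', 'fenc', 'wall']
```

Note that `fences` becomes `fenc`: a stem is not guaranteed to be a dictionary word, which is the main difference from lemmatization. For real work, a library stemmer or lemmatizer would replace `toy_stem`.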

#######################################################################################################

Applications covered:

-> Speech-to-text conversion

-> Text Preprocessing

-> Language Modelling

-> Language Translation

Files in this repo:

-> README.md

-> Text_Preprocessing.ipynb

-> Updates

-> Using_GPT_model_for_language_translation.ipynb

-> Using_Language_Modelling_for_MellodyGeneration.ipynb

-> speech_to_text.ipynb
