A system developed in Java that normalizes bilingual text found on social media and handles abbreviations, slang words, and intentionally misspelt words.

nehapai23/Text-Normalization-Of-Code-Mix

Text-Normalization-Of-Code-Mix

May 2017
  • Designed a system to efficiently preprocess impure (noisy) code-mixed text obtained from social media.
  • Performed data cleaning and preprocessing of the text.
  • Identified and converted various kinds of net lingo (abbreviations, slang words, intentionally misspelt words, and similar) using a dictionary-based approach and regular expressions.
  • Designed an algorithm for transliterating Romanized Hindi words into Devanagari script using syllabification.
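
As a rough illustration of the dictionary-and-regex step listed above, here is a minimal sketch. It is not the repository's actual code: the class name `NetLingoNormalizer`, the sample lexicon entries, and both regular expressions are assumptions made for this example. It needs Java 9 or later for `Map.of`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

class NetLingoNormalizer {
    // Hypothetical sample entries; the project's real dictionary is not shown here.
    private static final Map<String, String> LEXICON = Map.of(
            "u", "you",
            "r", "are",
            "gr8", "great",
            "thx", "thanks",
            "plz", "please",
            "bcoz", "because",
            "2moro", "tomorrow");

    // Assumed cleaning rule: drop URLs and @mentions.
    private static final Pattern URL_OR_MENTION = Pattern.compile("https?://\\S+|@\\w+");

    // A character repeated three or more times in a row, e.g. the "zzzz" in "plzzzz".
    private static final Pattern ELONGATION = Pattern.compile("(.)\\1{2,}");

    static String normalize(String text) {
        String cleaned = URL_OR_MENTION.matcher(text).replaceAll(" ");
        StringBuilder out = new StringBuilder();
        for (String token : cleaned.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Collapse each elongated run to a single character, then look the token up.
            String squeezed = ELONGATION.matcher(token.toLowerCase(Locale.ROOT)).replaceAll("$1");
            String replaced = LEXICON.getOrDefault(squeezed, squeezed);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(replaced);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Plzzzz come 2moro @bob")); // prints "please come tomorrow"
    }
}
```

Tokens that are not in the lexicon, including Romanized Hindi words, pass through unchanged apart from lowercasing. Collapsing a run to one character is a crude heuristic (it would turn "coool" into "col"), and punctuation attached to a token is not stripped, so a fuller version would need extra rules for both.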
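
The syllabification-based transliteration mentioned in the list above could look roughly like the sketch below. This is an assumed, heavily simplified scheme, not the algorithm from this project: the class `SyllableTransliterator`, the small mapping tables, and the syllable pattern (optional consonant onset followed by a vowel run, plus a trailing consonant group) are all hypothetical. Devanagari characters are written as Unicode escapes so the file compiles under any source encoding. It needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyllableTransliterator {
    private static final String VIRAMA = "\u094D"; // joins consonants into a cluster

    // Hypothetical, deliberately small tables (ka, kha, ga, cha, ja, ta, da, na, ...).
    private static final Map<String, String> CONSONANTS = Map.ofEntries(
            Map.entry("k", "\u0915"), Map.entry("kh", "\u0916"), Map.entry("g", "\u0917"),
            Map.entry("ch", "\u091A"), Map.entry("j", "\u091C"), Map.entry("t", "\u0924"),
            Map.entry("d", "\u0926"), Map.entry("n", "\u0928"), Map.entry("p", "\u092A"),
            Map.entry("b", "\u092C"), Map.entry("m", "\u092E"), Map.entry("y", "\u092F"),
            Map.entry("r", "\u0930"), Map.entry("l", "\u0932"), Map.entry("v", "\u0935"),
            Map.entry("sh", "\u0936"), Map.entry("s", "\u0938"), Map.entry("h", "\u0939"));

    // Dependent vowel signs used after a consonant; "a" is the inherent vowel (no sign).
    private static final Map<String, String> MATRAS = Map.of(
            "a", "", "aa", "\u093E", "i", "\u093F", "ee", "\u0940",
            "u", "\u0941", "oo", "\u0942", "e", "\u0947", "o", "\u094B");

    // Independent vowel letters used when no consonant precedes the vowel.
    private static final Map<String, String> INDEPENDENT = Map.of(
            "a", "\u0905", "aa", "\u0906", "i", "\u0907", "ee", "\u0908",
            "u", "\u0909", "oo", "\u090A", "e", "\u090F", "o", "\u0913");

    private static final Pattern SYLLABLE = Pattern.compile("[^aeiou]*[aeiou]+|[^aeiou]+");

    // Splits a Romanized word into syllables, e.g. "namaste" -> [na, ma, ste].
    static List<String> syllabify(String word) {
        List<String> syllables = new ArrayList<>();
        Matcher m = SYLLABLE.matcher(word.toLowerCase(Locale.ROOT));
        while (m.find()) {
            syllables.add(m.group());
        }
        return syllables;
    }

    static String transliterate(String word) {
        StringBuilder sb = new StringBuilder();
        for (String syl : syllabify(word)) {
            int split = 0;
            while (split < syl.length() && "aeiou".indexOf(syl.charAt(split)) < 0) {
                split++;
            }
            boolean afterConsonant = appendConsonants(syl.substring(0, split), sb);
            appendVowels(syl.substring(split), afterConsonant, sb);
        }
        return sb.toString();
    }

    // Emits the onset, preferring two-letter keys ("sh") and joining clusters with a virama.
    private static boolean appendConsonants(String onset, StringBuilder sb) {
        boolean prevConsonant = false;
        int i = 0;
        while (i < onset.length()) {
            boolean two = i + 2 <= onset.length() && CONSONANTS.containsKey(onset.substring(i, i + 2));
            int len = two ? 2 : 1;
            String glyph = CONSONANTS.get(onset.substring(i, i + len));
            if (glyph == null) {
                sb.append(onset, i, i + len); // unknown character: copy through unchanged
                prevConsonant = false;
            } else {
                if (prevConsonant) {
                    sb.append(VIRAMA);
                }
                sb.append(glyph);
                prevConsonant = true;
            }
            i += len;
        }
        return prevConsonant;
    }

    // Emits the vowel run: a matra right after a consonant, an independent vowel otherwise.
    private static void appendVowels(String nucleus, boolean afterConsonant, StringBuilder sb) {
        int i = 0;
        while (i < nucleus.length()) {
            boolean two = i + 2 <= nucleus.length() && MATRAS.containsKey(nucleus.substring(i, i + 2));
            int len = two ? 2 : 1;
            String key = nucleus.substring(i, i + len);
            sb.append(afterConsonant ? MATRAS.get(key) : INDEPENDENT.get(key));
            afterConsonant = false;
            i += len;
        }
    }
}
```

With these tables, `transliterate("namaste")` yields नमस्ते and `transliterate("dost")` yields दोस्त. Romanized Hindi is ambiguous (for example, "t" can stand for a dental or a retroflex consonant), and this sketch ignores aspirates beyond "kh", diphthongs, and nasalization, so a real system needs a larger table and disambiguation rules.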
