Language Identification and Named Entity Recognition in Hinglish Code Mixed Tweets

Kushagra Singh, Indira Sen, Ponnurangam Kumaraguru
ACL 2018, SRW
Link to paper

This repository contains:
(i) Seq2seq-based transliterator (Roman to Devanagari)
(ii) Language identification tool for Hindi-English code-switched text (English, Hindi, Rest)
(iii) CRF-based Named Entity Recognition tool for Hindi-English code-switched text (Person, Location, Organisation)

Check http://precog.iiitd.edu.in/resources.html for the annotated corpus.

Install the dependencies listed in requirements.txt inside a virtualenv.
Then follow the setup instructions in the README in the transliteration directory.
Export the following environment variables before running the demo files:

export TRANSLITERATION_DIR={{path_to_parent_dir}}/hindi-english-code-mixing-lidf-ner/transliteration
export HINGLISH_ROOT_DIR={{path_to_parent_dir}}/hindi-english-code-mixing-lidf-ner
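
A missing or mistyped export is an easy mistake to make, so it can help to check both variables before running the demo files. The snippet below is a minimal sketch and is not part of this repository: the helper name check_env is hypothetical, and the only thing it assumes is that both variables should point to existing directories.

```python
import os
from pathlib import Path

# The two variable names come from the export commands above.
REQUIRED_VARS = ("TRANSLITERATION_DIR", "HINGLISH_ROOT_DIR")


def check_env(environ=os.environ):
    """Hypothetical helper: map each problematic variable to a short description.

    Returns an empty dict when every required variable is set and
    points to an existing directory.
    """
    problems = {}
    for name in REQUIRED_VARS:
        value = environ.get(name)
        if not value:
            problems[name] = "not set"
        elif not Path(value).is_dir():
            problems[name] = "not a directory: " + value
    return problems


if __name__ == "__main__":
    found = check_env()
    for name, problem in found.items():
        print(f"{name}: {problem}")
    if not found:
        print("Environment looks ready for the demo files.")
```

Run it in the same shell session in which the variables were exported; any variable it lists needs to be fixed before the demos can locate the transliteration and project directories.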
