
Language Identification and Named Entity Recognition in Hinglish Code Mixed Tweets

Kushagra Singh, Indira Sen, Ponnurangam Kumaraguru
ACL 2018, SRW
Link to paper

This repository contains:
(i) A seq2seq-based transliterator (Roman to Devanagari script)
(ii) A language identification tool for Hindi-English code-switched text (English, Hindi, Rest)
(iii) A CRF-based named entity recognition tool for Hindi-English code-switched text (Person, Location, Organisation)

Check http://precog.iiitd.edu.in/resources.html for the annotated corpus.

  • Install the dependencies listed in the requirements.txt file inside a virtualenv.

  • Follow the setup instructions in the README inside the transliteration directory.

  • Export the following environment variables before running the demo files:

export TRANSLITERATION_DIR={{path_to_parent_dir}}/hindi-english-code-mixing-lidf-ner/transliteration
export HINGLISH_ROOT_DIR={{path_to_parent_dir}}/hindi-english-code-mixing-lidf-ner
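As a rough sketch, the two variables can be derived from a single parent directory and sanity-checked before running the demos. This assumes the repository was cloned into a directory named hindi-english-code-mixing-lidf-ner; PARENT_DIR and the value $HOME/code are hypothetical placeholders standing in for {{path_to_parent_dir}}, so adjust them to your own checkout.

```shell
# Hypothetical example value: replace with the directory that contains your clone.
PARENT_DIR="$HOME/code"

# Same variables as above, with TRANSLITERATION_DIR built from the root dir.
export HINGLISH_ROOT_DIR="$PARENT_DIR/hindi-english-code-mixing-lidf-ner"
export TRANSLITERATION_DIR="$HINGLISH_ROOT_DIR/transliteration"

# Print a warning (without aborting) if either path does not exist,
# which usually means PARENT_DIR points to the wrong place.
for d in "$HINGLISH_ROOT_DIR" "$TRANSLITERATION_DIR"; do
  if [ ! -d "$d" ]; then
    echo "warning: $d does not exist" >&2
  fi
done
```

Sourcing this from a file (for example with `source`) rather than executing it keeps the exported variables in the current shell.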