Skip to content

precog-iiith/hindi-english-code-mixing-lidf-ner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Language Identification and Named Entity Recognition in Hinglish Code Mixed Tweets

Kushagra Singh, Indira Sen, Ponnurangam Kumaraguru
ACL 2018, SRW
Link to paper

Repository contains
(i) Seq2seq based transliterator (Roman to Devanagri)
(ii) Language identification tool for Hindi-English code switched text (English, Hindi, Rest)
(iii) CRF based Named Entity Recogntion tool for Hindi-English code switched text (Person, Location, Organisation)

Check http://precog.iiitd.edu.in/resources.html for the annotated corpus.

  • Install dependencies using requirements.txt file in a virtualenv.

  • Check the README in transliteration dir and follow instructions to set up.

  • Export the following env variables before running demo files

export TRANSLITERATION_DIR={{path_to_parent_dir}}/hindi-english-code-mixing-lidf-ner/transliteration
export HINGLISH_ROOT_DIR={{path_to_parent_dir}}/hindi-english-code-mixing-lidf-ner

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published