code-switch classifier - next token prediction
Before the first run, do the following:
- extract bm_tagged/bm_tagged.zip
- create bm_tagged/ (or bm_tagged_w_cognatehood/), then run yuli_cognates/tag_cognatehood.py to generate the .csv files
- create corpus/, then run read_corpus.py to generate the .dat files
- run and explore uter_cls.ipynb
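The first-time steps above could be scripted roughly as follows. This is a minimal Python sketch, not part of the repository. setup_plan and run_setup are hypothetical names. It assumes that both scripts are run from the repository root without arguments and that the archive extracts into bm_tagged/. By default it performs a dry run that only prints the steps. Opening uter_cls.ipynb is left as a manual step.

```python
from __future__ import annotations

import subprocess
import sys
import zipfile
from pathlib import Path


def setup_plan(root: Path, with_cognatehood: bool = False) -> list[tuple[str, Path]]:
    """Return the first-time setup steps as (action, path) pairs.

    Hypothetical helper: the paths mirror the README, but the order and
    script invocation are assumptions.
    """
    out_dir = "bm_tagged_w_cognatehood" if with_cognatehood else "bm_tagged"
    return [
        ("unzip", root / "bm_tagged" / "bm_tagged.zip"),
        ("mkdir", root / out_dir),
        ("run", root / "yuli_cognates" / "tag_cognatehood.py"),  # -> .csv files
        ("mkdir", root / "corpus"),
        ("run", root / "read_corpus.py"),  # -> .dat files
    ]


def run_setup(root: Path, with_cognatehood: bool = False, dry_run: bool = True) -> None:
    """Print each step, and execute it unless dry_run is True."""
    for action, path in setup_plan(root, with_cognatehood):
        print(f"{action}: {path}")
        if dry_run:
            continue
        if action == "unzip":
            # Assumption: the archive's contents belong next to it, in bm_tagged/.
            with zipfile.ZipFile(path) as zf:
                zf.extractall(path.parent)
        elif action == "mkdir":
            # exist_ok=True because bm_tagged/ already exists if the zip lives there.
            path.mkdir(parents=True, exist_ok=True)
        elif action == "run":
            # Assumption: scripts take no arguments and run from the repo root.
            subprocess.run([sys.executable, str(path)], cwd=root, check=True)
```

For example, run_setup(Path("."), dry_run=False) would execute the steps in the current directory. Defaulting to a dry run lets you check the paths before anything is extracted or generated.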