Create dataset loader for IJELID (Indonesian-Javanese-English Code-Mixed Language Identification) #345

SamuelCahyawijaya · 2023-03-10T08:55:27Z

Dataset	ijelid
Description	IJELID (Indonesian-Javanese-English Language Identification) is a clean version of code-mixed Indonesian-Javanese-English data for token-level language identification. The dataset consists of tokenized tweets in which each token is paired with its language label. There are seven language labels: ID (Indonesian), JV (Javanese), EN (English), MIX_ID_EN (mixed Indonesian-English), MIX_ID_JV (mixed Indonesian-Javanese), MIX_JV_EN (mixed Javanese-English), and OTH (other).
License	CC-BY 4.0
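
For whoever picks this up, here is a minimal sketch of how the token/label pairs described above might be parsed. The on-disk layout is an assumption, not something confirmed by the dataset: it supposes one `token<TAB>label` pair per line with a blank line between tweets. The function name `read_token_label_lines` is hypothetical; only the seven labels come from the description.

```python
from typing import Iterable, List, Tuple

# The seven language labels listed in the dataset description.
LABELS = ["ID", "JV", "EN", "MIX_ID_EN", "MIX_ID_JV", "MIX_JV_EN", "OTH"]


def read_token_label_lines(
    lines: Iterable[str],
) -> List[Tuple[List[str], List[str]]]:
    """Group `token<TAB>label` lines into (tokens, labels) pairs, one per tweet.

    Assumed layout (not verified against the real files): one pair per line,
    with a blank line separating consecutive tweets.
    """
    tweets: List[Tuple[List[str], List[str]]] = []
    tokens: List[str] = []
    labels: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current tweet, if one is open.
            if tokens:
                tweets.append((tokens, labels))
                tokens, labels = [], []
            continue
        # Split on the last tab so a token containing a tab stays intact.
        token, label = line.rsplit("\t", 1)
        if label not in LABELS:
            raise ValueError(f"Unknown language label: {label!r}")
        tokens.append(token)
        labels.append(label)
    # Flush the final tweet when the input does not end with a blank line.
    if tokens:
        tweets.append((tokens, labels))
    return tweets


# Made-up sample lines for illustration only; not taken from IJELID.
sample = ["saya\tID\n", "love\tEN\n", "\n", "kowe\tJV\n"]
parsed = read_token_label_lines(sample)
```

If the released files use a different layout (for example CSV with a tweet id column), only the parsing step would need to change; the label list and the per-tweet `(tokens, labels)` shape should still apply.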
