add my own data to the pretrained model #61
Hi, thank you for this great work.
For continuing pre-training, I suggest you use this script: https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_mlm.py. It runs masked language modeling without the next-sentence-prediction task. You just have to provide it with a text file containing one sentence per line, I think. It works directly with all AraBERT models (note that for v1 and v2 you have to pre-segment the text data first). The task is a bit hard to set up, but good luck!
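A minimal sketch of the data-preparation step described above: writing the one-sentence-per-line text file that `run_mlm.py` consumes. The sentences here are placeholders (your real corpus would be Arabic text, and for AraBERT v1/v2 each line should first go through the models' pre-segmentation step); the filename `train.txt` is just an example.

```python
# Sketch (assumptions noted above): build the one-sentence-per-line
# training file expected by run_mlm.py. Placeholder sentences stand in
# for a real (pre-segmented, for v1/v2) Arabic corpus.
sentences = [
    "This is the first training sentence.",
    "This is the second training sentence.",
]

with open("train.txt", "w", encoding="utf-8") as f:
    for sent in sentences:
        # one stripped sentence per line, as the script expects
        f.write(sent.strip() + "\n")
```

The script can then be pointed at this file, e.g. with arguments along the lines of `--model_name_or_path aubmindlab/bert-base-arabertv2 --train_file train.txt --do_train --output_dir ./out` (flag names taken from the script's argument parser at the time of writing; check the version you download).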