add my own data to the pretrained model #61
Hi, thank you for this great work.
For continuing pre-training, I suggest you use this script: https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_mlm.py. It runs masked language modeling without the next-sentence-prediction task. You just have to provide it with a text file containing one sentence per line, I think. It works directly with all AraBERT models (note that for v1 and v2 you have to pre-segment the text data first). The task is a bit hard to set up, but good luck!
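A minimal sketch of the data-preparation step described above: writing the one-sentence-per-line text file that `run_mlm.py` consumes. The sentences here are placeholders (your real corpus would be Arabic text, and for AraBERT v1/v2 each line should first go through the models' pre-segmentation step); the filename `train.txt` is just an example.

```python
# Sketch (assumptions noted above): build the one-sentence-per-line
# training file expected by run_mlm.py. Placeholder sentences stand in
# for a real (pre-segmented, for v1/v2) Arabic corpus.
sentences = [
    "This is the first training sentence.",
    "This is the second training sentence.",
]

with open("train.txt", "w", encoding="utf-8") as f:
    for sent in sentences:
        # one stripped sentence per line, as the script expects
        f.write(sent.strip() + "\n")
```

The script can then be pointed at this file, e.g. with arguments along the lines of `--model_name_or_path aubmindlab/bert-base-arabertv2 --train_file train.txt --do_train --output_dir ./out` (flag names taken from the script's argument parser at the time of writing; check the version you download).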