
What is the difference between AraBERTv0.1 and AraBERTv1? #14

Answered by WaelMohammedAbed
hischen asked this question in Q&A
From their paper, I can say the difference is that v1 is pre-trained on Farasa-segmented data while v0.1 is not.
"To avoid this issue, we first segment the words using Farasa (Abdelali et al., 2016) into stems, prefixes and suffixes. For instance, “اللغة - Alloga” becomes ال+لغ+ة -Al+ log +a”. Then, we trained a SentencePiece (an unsupervised text tokenizer and detokenizer (Kudo, 2018)), in unigram mode, on the segmented pre-training dataset to produce a subword vocabulary of ∼60K tokens. To evaluate the impact of the proposed tokenization, we also trained SentencePiece on non-segmented text to create a second version of ARABERT (AraBERTv0.1) that does not require any segmentation"

Answer selected by WissamAntoun
This discussion was converted from issue #14 on December 09, 2020 13:42.