Training an Electra-base model #2
-
Hi @dumitrescustefan , thanks for your interest 🤗 I got access to the v3-32 TPUs through TFRC as part of a special Alpha TPU test. You're right, a single V100 is not really sufficient to train larger ELECTRA models: in one experiment an ELECTRA-small model took ~4 days on a V100, while the same model could be trained on a v3-8 within 8 (!) hours. A good intuition about training times can be found in our German's Next Language Model paper, where we report training times for our BERT and ELECTRA models. So training on a v3-8 TPU would take 8-9 days. The GC4 model took around 3.5 days on a v3-32 TPU, but note that the vocab size also affects training time, so with a smaller vocab (e.g. the "usual" 32k vocab) it could be done in less than 3.5 days. These training times should also be realistic for your Romanian model! I used the official ELECTRA repo and the same pre-processing steps as documented in the BERTurk Cheatsheet. If you're interested, I could help with training a model on a v3-32 TPU on your cleaned corpus :)
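
For reference, below is a minimal sketch of what such a pre-training run with the official ELECTRA repo (google-research/electra) roughly looks like. This is not taken from the thread: the corpus path, bucket, vocab, TPU name/zone/project and all hyperparameter values are placeholders, and the exact flags and defaults should be checked against `build_pretraining_dataset.py`, `run_pretraining.py` and the BERTurk Cheatsheet.

```python
# Rough sketch of launching a base-size ELECTRA pre-training run on a Cloud TPU
# with the scripts from the official google-research/electra repo.
# All paths, names and hyperparameter values below are illustrative placeholders.
import json
import subprocess

# 1) Convert the cleaned plain-text corpus into tfrecords.
subprocess.run([
    "python3", "build_pretraining_dataset.py",
    "--corpus-dir", "corpus/",                        # plain-text shards of the cleaned corpus
    "--vocab-file", "vocab.txt",                      # e.g. a 32k WordPiece vocab
    "--output-dir", "gs://my-bucket/pretrain_tfrecords",
    "--max-seq-length", "512",
    "--num-processes", "8",
], check=True)

# 2) Hyperparameters passed to run_pretraining.py for a base-size model on a v3-8.
hparams = {
    "model_size": "base",
    "vocab_size": 32000,        # a smaller vocab shortens training, as noted above
    "max_seq_length": 512,
    "train_batch_size": 256,
    "num_train_steps": 1000000,
    "use_tpu": True,
    "num_tpu_cores": 8,         # 32 for a v3-32 slice
    "tpu_name": "my-tpu",
    "tpu_zone": "europe-west4-a",
    "gcp_project": "my-gcp-project",
}

subprocess.run([
    "python3", "run_pretraining.py",
    "--data-dir", "gs://my-bucket",                   # expects pretrain_tfrecords/ under this dir
    "--model-name", "electra-base-romanian",
    "--hparams", json.dumps(hparams),
], check=True)
```

As the comment above suggests, the two knobs that most directly cut wall-clock time are the vocab size and the number of TPU cores (v3-8 vs. v3-32).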
-
Hi @stefan-it,
Congrats on the gc4lm! I'm also trying to train LMs for Romanian, and compute resources are pretty scarce. Since I've seen you mention that you trained an ELECTRA model on a v3-32 TPU, I'd like to ask how you were able to get access to it. I did manage to train the first BERT model for Romanian last year, but that was with help from a Finnish university (they provided the GPUs). Right now, with the single V100 I have access to, I'm not really able to train anything larger than ELECTRA-smalls.
I also applied for TFRC and only got 5 v3-8s, and with the $300 of free credits there was not enough time to train, for example, an ELECTRA-base. So, I'd like to ask you the following:
Right now I have a pretty large corpus for Romanian (~30GB, probably the largest available, which I've spent months gathering and cleaning). I started an ELECTRA-base training back in March, and it will probably run until September-October :). It'd be awesome if I were able to train these models faster.
Thank you very much for your time!