Training an Electra-base model #2
-
Hi @dumitrescustefan , thanks for your interest 🤗 I got access to the v3-32 TPUs through TFRC as part of a special Alpha TPU test. You're right, a single V100 is not really sufficient to train larger ELECTRA models: in one experiment an ELECTRA-small model took ~4 days on a V100, while the same model could be trained on a v3-8 within 8 (!) hours. A good intuition about training times can be found in our German's Next Language Model paper, where we report training times for our BERT and ELECTRA models. So training on a v3-8 TPU would take 8-9 days. The GC4 model took around 3.5 days on a v3-32 TPU, but note that the vocab size also affects training time, so with a smaller vocab (e.g. the "usual" 32k vocab) it could be done in less than 3.5 days. These training times should also be realistic for your Romanian model! I used the official ELECTRA repo and the same pre-processing steps as documented in the BERTurk Cheatsheet. If you're interested, I could help with training a model on a v3-32 TPU on your cleaned corpus :)
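
For reference, below is a minimal sketch of what such a pre-training run with the official ELECTRA repo (google-research/electra) roughly looks like. This is not taken from the thread: the corpus path, bucket, vocab, TPU name/zone/project and all hyperparameter values are placeholders, and the exact flags and defaults should be checked against `build_pretraining_dataset.py`, `run_pretraining.py` and the BERTurk Cheatsheet.

```python
# Rough sketch of launching a base-size ELECTRA pre-training run on a Cloud TPU
# with the scripts from the official google-research/electra repo.
# All paths, names and hyperparameter values below are illustrative placeholders.
import json
import subprocess

# 1) Convert the cleaned plain-text corpus into tfrecords.
subprocess.run([
    "python3", "build_pretraining_dataset.py",
    "--corpus-dir", "corpus/",                        # plain-text shards of the cleaned corpus
    "--vocab-file", "vocab.txt",                      # e.g. a 32k WordPiece vocab
    "--output-dir", "gs://my-bucket/pretrain_tfrecords",
    "--max-seq-length", "512",
    "--num-processes", "8",
], check=True)

# 2) Hyperparameters passed to run_pretraining.py for a base-size model on a v3-8.
hparams = {
    "model_size": "base",
    "vocab_size": 32000,        # a smaller vocab shortens training, as noted above
    "max_seq_length": 512,
    "train_batch_size": 256,
    "num_train_steps": 1000000,
    "use_tpu": True,
    "num_tpu_cores": 8,         # 32 for a v3-32 slice
    "tpu_name": "my-tpu",
    "tpu_zone": "europe-west4-a",
    "gcp_project": "my-gcp-project",
}

subprocess.run([
    "python3", "run_pretraining.py",
    "--data-dir", "gs://my-bucket",                   # expects pretrain_tfrecords/ under this dir
    "--model-name", "electra-base-romanian",
    "--hparams", json.dumps(hparams),
], check=True)
```

As the comment above suggests, the two knobs that most directly cut wall-clock time are the vocab size and the number of TPU cores (v3-8 vs. v3-32).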
-
Hi @stefan-it,
Congrats on the gc4lm! I'm also trying to train LMs for Romanian, and compute resources are pretty scarce. Since I've seen you mention that you trained an ELECTRA model on a v3-32 TPU, I'd like to ask how you were able to get access to it. I did manage to train the first BERT model for Romanian last year, but that was with help from a Finnish university (they provided the GPUs). Right now, with the single V100 I have access to, I'm not really able to train anything larger than ELECTRA-smalls.
I also applied for TFRC and only got 5 v3-8s, and with the $300 of free credits there was not enough time to train, for example, an ELECTRA-base. So, I'd like to ask you the following:
Right now I have a pretty large corpus for Romanian (~30GB, probably the largest available, which I've spent months gathering and cleaning). I started an ELECTRA-base training back in March, and it will probably run until September-October :). It'd be awesome if I were able to train these models faster.
Thank you very much for your time!