When I train AdapterFusion with the default configuration on a summarization task, the training loss suddenly increases after the first epoch and never converges.
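For context, here is a minimal sketch of the kind of AdapterFusion setup I mean, assuming the `adapters` library (formerly `adapter-transformers`); the base checkpoint and adapter names are placeholders, not my actual task adapters:

```python
from adapters import AutoAdapterModel
from adapters.composition import Fuse

# Hypothetical base model for a summarization task.
model = AutoAdapterModel.from_pretrained("facebook/bart-base")

# Add the adapters to be fused; "task_a" / "task_b" are placeholder names.
model.add_adapter("task_a")
model.add_adapter("task_b")

# Add a fusion layer over the adapters and activate it.
adapter_setup = Fuse("task_a", "task_b")
model.add_adapter_fusion(adapter_setup)
model.set_active_adapters(adapter_setup)

# Freeze everything except the fusion parameters (the default AdapterFusion recipe).
model.train_adapter_fusion(adapter_setup)
```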
Due to resource limitations, I could originally only train with a batch_size of 1. I then tried different hyperparameters and found that training behaves normally only when batch_size >= 16; I also tested batch_size = 8, and it is still not good enough. (I didn't actually increase the batch_size; I used gradient accumulation to reach the effective batch size, as sketched below.)
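A minimal sketch of that workaround, assuming the Hugging Face `TrainingArguments` and the `AdapterTrainer` from the `adapters` library (the output path, epoch count, and learning rate are assumptions, not the exact values I used):

```python
from adapters import AdapterTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./fusion_out",        # hypothetical output path
    per_device_train_batch_size=1,    # limited by GPU memory
    gradient_accumulation_steps=16,   # effective batch size = 1 * 16 = 16
    learning_rate=5e-5,               # assumed value for illustration
    num_train_epochs=10,              # assumed value for illustration
)

trainer = AdapterTrainer(
    model=model,                      # the fusion model from the sketch above
    args=training_args,
    train_dataset=train_dataset,      # assumed to be prepared elsewhere
)
trainer.train()
```

With gradient accumulation, the optimizer step is taken only after summing gradients over 16 micro-batches, so the update statistics match a batch_size of 16 even though each forward pass still processes a single example.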
I'm just recording the details here for anyone who runs into the same confusion when training AdapterFusion!