Lora rank & alpha #2037
BigDataMLexplorer asked this question in Q&A
-
When using rslora, the LoRA output will be scaled by a factor of lora_alpha / sqrt(r) instead of lora_alpha / r.
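For the numbers discussed below (rank 16, alpha 8), here is a quick sketch of the difference, assuming the usual LoRA scaling of alpha / r versus the rsLoRA scaling of alpha / sqrt(r):

```python
import math

r, alpha = 16, 8

standard_scale = alpha / r            # 8 / 16 = 0.5
rslora_scale = alpha / math.sqrt(r)   # 8 / 4  = 2.0

print(f"standard LoRA scaling: {standard_scale}")
print(f"rsLoRA scaling:        {rslora_scale}")
```

So enabling use_rslora while keeping alpha = 8 at rank 16 raises the effective scaling from 0.5 to 2.0; to reproduce the scaling you already tuned, alpha would have to drop to 2. In other words, an alpha found without rslora does not carry over unchanged.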
-
Hi, I'm training the Llama3 8B model. I ran many trials with LoRA rank = 16 and different alphas (32, 16 and 8). In my case the best result was with alpha 8. I did not use rslora in this testing.
Even assuming I have the best alpha value, can it still help to set use_rslora=True in this configuration? If I have alpha set to 8, what alpha will effectively be used with rslora? I didn't quite get it from the Hugging Face article.
In general, I read that a higher rank in LoRA should capture more nuances, because more parameters are trained.
That's why I also tried increasing the LoRA rank to 256 and leaving alpha at half of that (128), of course with a proper learning rate, otherwise the results would be very bad. I used use_rslora=True here. The result was 1 percentage point worse than rank 16, and it was also worse than rank 16 when I didn't use rslora.
Do you think I may have already reached the optimum or should I do something different when using rslora?
Thank you
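A minimal sketch of such a configuration with peft's LoraConfig; the target_modules, dropout, and task_type values below are placeholders, not your actual settings:

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=8,        # effective scale with rsLoRA: 8 / sqrt(16) = 2.0
    use_rslora=True,     # scale adapters by alpha / sqrt(r) instead of alpha / r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder list
    lora_dropout=0.05,   # placeholder value
    task_type="CAUSAL_LM",
)
```

Applying the same formula to the rank-256 run: with alpha 128, rslora gives an effective scale of 128 / sqrt(256) = 8, while plain LoRA gives 128 / 256 = 0.5, so those runs were trained at quite different effective adapter scales.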