Optimize DoRA in eval and no dropout #2122
base: main
Conversation
src/peft/tuners/lora/layer.py
Outdated
if isinstance(dropout, nn.Identity):
    print("no dropout, optimize here")
else:
    print("dropout, same ops")
@BenjaminBossan did you envision something like this?
My intuition was:
- Figure out whether there is dropout or not
- Use a flag for dropout
- Pass the flag to the `forward` of the DoRA layers, where I would need to skip the alignment step and reuse `x` (the base model outputs)

Let me know if I am on the right track.
Note: I could not figure out a way to check whether the model is in eval mode. How would you have done it?
Yes, I think the dropout check is valid as is. Regarding eval mode, I think that checking `self.training` should work.
On how to proceed, my thinking was that if we find that we can make this optimization, we pass the base result as an additional argument to the DoRA `forward` (with `None` as the default for that argument). There, we use the base result if it's given; if not, we calculate it like we currently do. Could be that I'm missing something, but that's my idea.
The good news is that since we have a working implementation, we can then compare the results using both approaches and they should be identical (of course not when there is dropout, but apart from that).
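To make the idea concrete, here is a minimal sketch of what the caller side could look like; the argument name `base_result` and its exact placement are assumptions based on this discussion, not the final implementation:

```python
# Sketch only: inside the LoRA Linear forward, after result = self.base_layer(x, *args, **kwargs)
if isinstance(dropout, nn.Identity) or not self.training:
    # no dropout is applied, so the DoRA path can reuse the base layer output
    base_result = result
else:
    x = dropout(x)
    base_result = None  # DoRA forward falls back to computing the base result itself

result = result + self.lora_magnitude_vector[active_adapter](
    x,
    lora_A=lora_A,
    lora_B=lora_B,
    scaling=scaling,
    base_layer=self.get_base_layer(),
    base_result=base_result,  # hypothetical new argument with default None
)
```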
Neat solution!
Thanks for implementing this task. I have a few comments:
I don't think we need the `do_optimize` argument. When `result` is passed, let's use it, else don't.
Also, let's refactor this a bit and avoid an early return. Instead, let's do something like:
if result is not None:
    # let's also add a comment to explain
    base_result = ...
else:
    base_result = F.linear(x, transpose(weight, self.fan_in_fan_out))
Then in this line, replace `F.linear(x, transpose(weight, self.fan_in_fan_out))` by `base_result`.
A small caveat I see with this approach is that we assume that we can just remove the bias to get the base result. This will work for normal `nn.Linear` layers but may not work for other types. But let's maybe not worry about that for now.
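For a plain `nn.Linear`, subtracting the bias does recover the bias-free product; a quick, purely illustrative sanity check:

```python
import torch
import torch.nn.functional as F

linear = torch.nn.Linear(4, 3)
x = torch.randn(2, 4)
with_bias = linear(x)                      # x @ W.T + b
without_bias = F.linear(x, linear.weight)  # x @ W.T
assert torch.allclose(with_bias - linear.bias, without_bias, atol=1e-6)
```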
We should also run an example with and without the optimization to ensure that we get the same results and also a speedup and memory improvement.
Thanks for the update. I left a comment where I think the calculation is not quite right.
Also, it would be great if you could check a DoRA example to see the changes caused by this PR. Probably one of the existing examples could be used. We should ensure that:
- The results are the same (assuming dropout is 0)
- Training is faster
- Memory usage is lower
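One way to check this could be a small training-step benchmark run once on `main` and once on this branch, then comparing the numbers; the model name and hyperparameters below are placeholders, not part of this PR:

```python
import time
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

torch.manual_seed(0)  # same seed so the adapter init matches across the two runs
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").cuda()
config = LoraConfig(r=16, lora_dropout=0.0, use_dora=True)  # dropout 0, so outputs should match
model = get_peft_model(base, config)

batch = torch.randint(0, base.config.vocab_size, (8, 256)).cuda()

torch.cuda.reset_peak_memory_stats()
start = time.perf_counter()
for _ in range(20):
    loss = model(input_ids=batch, labels=batch).loss
    loss.backward()
    model.zero_grad()
torch.cuda.synchronize()
print(f"time: {time.perf_counter() - start:.2f} s")
print(f"peak memory: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")
# for the "results are the same" check, compare model(input_ids=batch).logits between the two branches
```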
bias = base_layer.bias
if bias is not None:
    base_result = base_result - bias
result_dora = mag_norm_scale * base_result + mag_norm_scale * lora_result * scaling
Hmm, wait, should this not be the exact same calculation as in line 103? I.e. we should leave the condition after calculating the `base_result` and then do the same calculation of `dora_result` for both cases.
I am not sure that I follow. With the `base_result` in place:
- We first subtract the bias
- Compute the `dora_result`, where we scale the `base_result` with `mag_norm_scale`

But without the `base_result`:
- We compute the `base_result` with the linear forward
- Compute the `dora_result`, where we scale the `base_result` with `(1 - mag_norm_scale)`

Aren't they going to be different for each case?
Okay, so I'm a bit confused, let's try to resolve this.
In the old code, we basically have:
dora_result = (mag_norm_scale - 1) * base_result + mag_norm_scale * lora_result * lora_scale
(variable names slightly changed for clarity)
My thinking is that the `base_result` is either calculated right there (old code) or we use the `base_result` that is being passed as an argument, but the basic equation stays the same.
Of course, as you correctly noted, the bias needs to be subtracted first and then added back in the latter case.
In the currently proposed code, in one case we calculate `mag_norm_scale * base_result` and in the other `(mag_norm_scale - 1) * base_result`. This looks inconsistent to me.
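Writing out what each variant gives once the DoRA term is added back onto the full base layer output may make this concrete; this assumes the caller still does `result = result + result_dora`, with `result = base_result + bias`:

```python
# with (mag_norm_scale - 1), as in the old code:
#   result = (base_result + bias) + (mag_norm_scale - 1) * base_result + mag_norm_scale * lora_result * scaling
#          = mag_norm_scale * base_result + bias + mag_norm_scale * lora_result * scaling
#
# with mag_norm_scale:
#   result = (base_result + bias) + mag_norm_scale * base_result + mag_norm_scale * lora_result * scaling
#          = (1 + mag_norm_scale) * base_result + bias + mag_norm_scale * lora_result * scaling
#     -> base_result is counted twice in this case
```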
Fixes #2107