Implements Vera #763

Open · wants to merge 17 commits into main
Conversation

@julian-fong (Contributor) commented Dec 1, 2024

This PR aims to implement Vera, which introduces trainable parameters d and b while keeping the LoRA matrices A and B frozen, random, and shared across layers.

I've opted to put the Vera implementation under the LoRA implementation (like IA3).

Paper: https://arxiv.org/pdf/2310.11454

This PR includes:

  • A new Vera implementation under the lora methods file
  • A new adapter config named VeraConfig
  • A shared parameter initialization function that takes in the init_weights argument set from the VeraConfig
  • A new criterion inside the add_adapter function of the LoRALayer to check whether we should use the Vera class. Currently, it checks whether the VeraConfig parameters d and b are of type float; it's the simplest criterion I could think of at the moment.
  • A new parameter added inside add_adapter of the LoRALayer as suggested. It will now pass the name of the adapter into the __init__ function.

Things to note:

In the original Vera paper, the decomposition matrices B and A are frozen and shared across layers. As suggested, I've opted to create these matrices similarly to the PHMLayer implementation, using a new method named init_shared_vera_parameters. This function takes in the init_weights argument set from the VeraConfig to set up the initialization of B and A, while the other parameters from the VeraConfig are used inside the Vera module.
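For reference, a rough sketch of the kind of shared initialization I have in mind (the shapes, the hidden_size lookup, and the init_weights handling are assumptions loosely mirroring the existing LoRA init, not the exact code in this PR):

```python
import math

import torch
import torch.nn as nn


def init_shared_Vera_parameters(model_config, adapter_config, device):
    # Sketch only: shapes and init options are assumptions based on the LoRA module.
    hidden_size = model_config.hidden_size
    r = adapter_config.r

    parameters = nn.ParameterDict()

    # B and A are random, frozen, and shared across all layers (requires_grad=False).
    # Since they stay frozen, both need a non-zero random init (unlike LoRA's zero-init B).
    parameters["lora_A"] = nn.Parameter(torch.zeros(r, hidden_size, device=device), requires_grad=False)
    parameters["lora_B"] = nn.Parameter(torch.zeros(hidden_size, r, device=device), requires_grad=False)

    if adapter_config.init_weights == "lora":
        nn.init.kaiming_uniform_(parameters["lora_A"], a=math.sqrt(5))
        nn.init.kaiming_uniform_(parameters["lora_B"], a=math.sqrt(5))
    elif adapter_config.init_weights == "bert":
        nn.init.normal_(parameters["lora_A"], std=0.02)
        nn.init.normal_(parameters["lora_B"], std=0.02)
    else:
        raise ValueError("Unknown init_weights type: {}".format(adapter_config.init_weights))

    return parameters
```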

As I'm not 100% sure what the composition modes do ('add', 'scale'), I've opted to use the Vera class only if composition_mode is set to add and either d or b is of type float (and not None).
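As a sketch, the selection inside LoRALayer.add_adapter would then look roughly like this (the add/scale branches mirror the existing LoRA/IA3 dispatch; attribute names may differ from the actual diff):

```python
# Sketch of the dispatch in LoRALayer.add_adapter; the d/b float check is the
# provisional criterion described above, the rest mirrors the existing LoRA/IA3 branches.
if lora_config.composition_mode == "add" and (
    isinstance(lora_config.d, float) or isinstance(lora_config.b, float)
):
    lora_cls = Vera
elif lora_config.composition_mode == "add":
    lora_cls = LoRA
elif lora_config.composition_mode == "scale":
    lora_cls = IA3
else:
    raise ValueError(f"Unsupported composition_mode: {lora_config.composition_mode}")
```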

reviews appreciated!

@julian-fong (Contributor, Author) commented:

@calpt Regarding the paper: from my first reading, it seems like it doesn't mention any scaling via a constant alpha/r or any gating. Should I still include these in the Vera implementation to make it consistent with the LoRA and IA3 modules? I'm also assuming com and com_inv would need to be re-included for Vera in order to allow for integration with the LoRALayer class.

@calpt (Member) commented Dec 2, 2024

> @calpt Regarding the paper: from my first reading, it seems like it doesn't mention any scaling via a constant alpha/r or any gating. Should I still include these in the Vera implementation to make it consistent with the LoRA and IA3 modules? I'm also assuming com and com_inv would need to be re-included for Vera in order to allow for integration with the LoRALayer class.

Yes, it doesn't hurt to include these options even if they're not mentioned in the paper, unless this makes the implementation significantly more challenging.
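For reference, re-using the LoRA-style composition hooks in Vera could look roughly like this (a sketch; com_inv matches the hunk quoted further down, and self.scaling = alpha / r being set in __init__ is an assumption mirroring LoRA):

```python
# Sketch: composition hooks for Vera, mirroring the existing LoRA module.
def com(self, weights: torch.Tensor, added: torch.Tensor, scaling=None) -> torch.Tensor:
    """Composes the injected weights with the existing weights."""
    if scaling is None:
        scaling = self.scaling  # assumed to be alpha / r, set in __init__
    return weights + added * scaling

def com_inv(self, weights: torch.Tensor, added: torch.Tensor) -> torch.Tensor:
    """Inverts the composition of injected and existing weights."""
    return weights - added * self.scaling
```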

@julian-fong (Contributor, Author) commented:

I'll also soon include a new test module for the Vera module, similar to the IA3 and LoRA test modules.

@calpt (Member) left a comment

did a first review pass, thanks for working on this!
will review again once we have tests & docs added.

for adding tests, you might sync with @TimoImhof, since we have a larger test folder refactoring coming up in #740, so it might make sense to directly base off that?

gate = torch.mean(gate, dim=1).unsqueeze(-1)
hidden_states = hidden_states * gate
else:
gate = None
@calpt (Member):

as this is likely merged after #770, the same fix from there should be applied here

config: LoRAConfig,
gating_heads: int = 1,
):
super().__init__()
@calpt (Member):

we should also add an assert for composition mode "add" here (same as in LoRA init), just to make sure
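e.g. something along these lines right after super().__init__() (a sketch; the exact message wording is just a suggestion):

```python
super().__init__()
# Guard against misconfiguration, same as the assertion in the LoRA __init__ (sketch).
assert config.composition_mode == "add", "Vera module only supports composition_mode='add'."
```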

Comment on lines 511 to 512
d: Union[bool, float] = None
b: Union[bool, float] = None
@calpt (Member):

could we name these "vera_b" and "vera_d", to make more obvious what these are related to?
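i.e. something like this (a sketch; using Optional[float] assumes the bool option isn't actually needed, see the next comment):

```python
vera_d: Optional[float] = None
vera_b: Optional[float] = None
```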

@calpt (Member):

also why can these be bools? ie what happens when I set d=True, b=True?

@julian-fong (Contributor, Author):

Good catch, I think this is a typo based on a previous idea I had which I scrapped later. Thanks!

@@ -90,6 +94,7 @@ def com_inv(self, weights: torch.Tensor, added: torch.Tensor) -> torch.Tensor:
return weights - added * self.scaling

def forward(self, hidden_states: Optional[torch.Tensor], layer_input: torch.Tensor):
print("triggered")
@calpt (Member):

to remove?

if getattr(self, "lora_dropout"):
hidden_states = self.lora_dropout(hidden_states)

hidden_states = hidden_states @ self.vera_B @ lora_B @ self.vera_D @ lora_A
@calpt (Member):

shouldn't the order be reversed here? ie we matmul hidden states with lora_A -> vera_d -> lora_B -> vera_b, according to §3.1 (2) of the paper?
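i.e. something along these lines (a sketch of the suggested ordering; it assumes vera_D / vera_B hold the diagonal scaling vectors Λ_d / Λ_b and that the shared lora_A / lora_B are stored so that right-multiplication works out, which may require transposes):

```python
# Apply A first, then Λ_d, then B, then Λ_b, following h = W0 x + Λ_b B Λ_d A x (Eq. 2):
hidden_states = hidden_states @ lora_A @ self.vera_D @ lora_B @ self.vera_B
```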

Comment on lines 334 to 336
# if we're using Vera, then set the adapter name into the Vera object
if lora_cls == Vera:
lora.set_vera_adapter_name(name=adapter_name)
@calpt (Member):

feels a bit hacky to do this only for vera as the name is not specific to this type. what do you think of always passing the name directly to the __init__ method of each module class (for all LoRA, Vera, IA3) and setting self.name directly there?
that might be cleaner long-term as we might want to use the name in LoRA as well in the future.

@julian-fong (Contributor, Author):

I thought about that idea as well but opted to implement it this way first, since right now LoRA and IA3 don't use self.name. I'll refactor it as you said. Thanks!

self.name = name


def init_shared_Vera_parameters(model_config, adapter_config, device):
@calpt (Member):

nit: ideally lower-case "v" in the middle of method names

Comment on lines 540 to 545
Lora Config that applies vector-based random matrix adaptation. It adds
trainable matrices 'd' and 'b' while keeping the original LoRA matrices
frozen, random, and shared across layers. See more through their paper:
https://arxiv.org/pdf/2106.09685. Note that `r` will still be supplied
since we are still initializing decomposition matrices A and B.
The `composition_mode` parameter should also be set to `add`.
@calpt (Member) commented Dec 23, 2024:

the paper link still needs updating :)
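For reference, the corrected span would read as follows (only the link changes, now pointing to the VeRA paper from the PR description):

```python
    Lora Config that applies vector-based random matrix adaptation. It adds
    trainable matrices 'd' and 'b' while keeping the original LoRA matrices
    frozen, random, and shared across layers. See more through their paper:
    https://arxiv.org/pdf/2310.11454. Note that `r` will still be supplied
    since we are still initializing decomposition matrices A and B.
    The `composition_mode` parameter should also be set to `add`.
```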
