For each linear layer, we run SVD on its weight matrix W to get an A, B matrix pair such that the product AB approximates W, truncating the singular values at the predefined max_rank value as explained in the previous section. The only layer we keep full-rank is v_proj, because the rank of its weights tends to be higher.

We freeze all of these weights and use LoRA with the r parameter to create the new trainable parameters.
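Below is a minimal sketch of this step in PyTorch. The function name `factorize_linear`, the 768-dimensional toy weight, and the LoRA initialization constants are illustrative assumptions, not part of the original implementation:

```python
import torch

def factorize_linear(W: torch.Tensor, max_rank: int):
    """Truncated SVD of a weight matrix W (out_features x in_features).
    Returns A (out x max_rank) and B (max_rank x in) with A @ B ~= W."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :max_rank] * S[:max_rank]   # fold the kept singular values into the left factor
    B = Vh[:max_rank, :]
    return A, B

# Toy example on a single projection weight (v_proj would be skipped and kept full-rank)
W = torch.randn(768, 768)
A, B = factorize_linear(W, max_rank=64)
print((A @ B - W).norm() / W.norm())     # relative approximation error

# A and B stay frozen; LoRA adds the only trainable parameters, a rank-r pair A_L, B_L.
r = 8
A_L = torch.zeros(768, r, requires_grad=True)             # zero init keeps A_L @ B_L = 0 at start
B_L = (0.01 * torch.randn(r, 768)).requires_grad_(True)
```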
Inference mode
Since the rank of the sum of two matrices is less than or equal to the sum of their ranks
$$ {rank({\bf AB}+{\bf A_L} {\bf B_L} ) \le rank({\bf AB}) + rank({\bf A_LB_L})} $$
we can safely combine the four weight matrices by applying truncated SVD to the sum of their products, using the sum of their ranks as the truncation rank, to build the new low-rank pair:
$$ {{\bf AB} + {\bf A_LB_L} \Rightarrow {\bf \bar{A} \bar{B} }} $$
$$ { rank({\bf \bar{A} \bar{B}}) \le max\_rank + r } $$
Now we can use the new pairs and remove the original A, B and LoRA weights.
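A sketch of this merging step, reusing the toy shapes and naming from the snippet above (again an assumption, not the reference implementation). Because the sum has rank at most max_rank + r, truncating the SVD at that rank reconstructs it exactly up to floating-point error:

```python
import torch

def merge_low_rank(A, B, A_L, B_L):
    """Merge the frozen pair (A, B) and the trained LoRA pair (A_L, B_L)
    into one pair (A_bar, B_bar) of rank at most max_rank + r."""
    new_rank = A.shape[1] + A_L.shape[1]               # max_rank + r
    W_sum = A @ B + A_L @ B_L
    U, S, Vh = torch.linalg.svd(W_sum, full_matrices=False)
    A_bar = U[:, :new_rank] * S[:new_rank]
    B_bar = Vh[:new_rank, :]
    return A_bar, B_bar

# Usage with the toy shapes from the previous sketch
A, B = torch.randn(768, 64), torch.randn(64, 768)
A_L, B_L = torch.randn(768, 8), torch.randn(8, 768)
A_bar, B_bar = merge_low_rank(A, B, A_L, B_L)
print((A_bar @ B_bar - (A @ B + A_L @ B_L)).abs().max())  # ~0: the merge is lossless
```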
The illustration below shows the difference between the standard LoRA approach and the proposed low-rank LoRA merging method. Note that the result is a pair of matrices.