From 624c60f230906a1ae436f0a65ddd945b7e092b95 Mon Sep 17 00:00:00 2001
From: mobicham <37179323+mobicham@users.noreply.github.com>
Date: Thu, 2 Nov 2023 11:38:34 +0100
Subject: [PATCH] Update index.html - train/inference mode fix

---
 index.html | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/index.html b/index.html
index fa8c572..ae2f62c 100644
--- a/index.html
+++ b/index.html
@@ -135,8 +135,8 @@

Low-Rank Pruning of Llama2 Models

Training Mode

@@ -147,11 +147,10 @@

Inference mode

  • Since the rank of the sum of two matrices is less than or equal to the sum of their ranks, $$ rank({\bf AB}+{\bf A_L B_L}) \le rank({\bf AB}) + rank({\bf A_L B_L}) $$ we can safely combine the 4 weights by applying truncated SVD to the sum of their matrix products, using the sum of their ranks to build the new low-rank pair (a worked form of this step follows the list):
-   $$ {\bf AB} + {\bf A_L B_L} \Rightarrow {\bf \bar{A} \bar{B}} $$
-   $$ rank({\bf AB}) = \text{max\_rank} + r $$
+   $$ rank({\bf AB} + {\bf A_L B_L}) = \text{max\_rank} + r $$
  • Now we can use the new pairs and remove the older A, B and LoRA weights.
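  A worked form of the SVD step in the first bullet (a sketch; folding the singular values into the left factor is one possible split, not necessarily the article's exact choice):
  $$ {\bf AB} + {\bf A_L B_L} = {\bf U \Sigma V}^\top \approx {\bf U}_k {\bf \Sigma}_k {\bf V}_k^\top, \qquad k = \text{max\_rank} + r $$
  $$ {\bf \bar{A}} = {\bf U}_k {\bf \Sigma}_k, \qquad {\bf \bar{B}} = {\bf V}_k^\top $$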
@@ -159,11 +158,11 @@

    Inference mode

    The illustration below shows the difference between the standard LoRA approach and the proposed low-rank LoRA merging method. Note that the result is a pair of matrices.

    [Figure: the standard LoRA approach vs. the proposed low-rank LoRA merging]

    The code below summarizes the merging logic:

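    A minimal sketch of that merging step, assuming PyTorch and 2-D weight factors; the function name, argument names and the max_rank / lora_r parameters are illustrative rather than the article's original snippet:

    import torch

    def merge_lowrank_lora(A, B, A_L, B_L, max_rank, lora_r):
        # Reconstruct the combined update: pruned low-rank pair + trained LoRA pair
        W = A @ B + A_L @ B_L                      # (out_features, in_features)
        # Truncated SVD keeps k = max_rank + r singular components
        k = max_rank + lora_r
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        A_bar = U[:, :k] * S[:k]                   # fold singular values into the left factor
        B_bar = Vh[:k, :]
        return A_bar, B_bar

    # Example: a pruned pair of rank 32 merged with a LoRA pair of rank 8
    # A, B     = torch.randn(4096, 32), torch.randn(32, 4096)
    # A_L, B_L = torch.randn(4096, 8),  torch.randn(8, 4096)
    # A_bar, B_bar = merge_lowrank_lora(A, B, A_L, B_L, max_rank=32, lora_r=8)

    The older A, B and LoRA weights can then be discarded, and the layer's forward pass uses only the merged pair, as described in the list above.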

    Speed Benchmark

    @@ -260,4 +259,4 @@

    Conclusion
