Commit

Update index.html - train/inference mode fix
mobicham authored Nov 2, 2023
1 parent 857d8b7 commit 624c60f
Showing 1 changed file with 7 additions and 8 deletions.
index.html: 15 changes (7 additions & 8 deletions)
@@ -135,8 +135,8 @@ <h2 id="pruningllama2" class="">Low-Rank Pruning of Llama2 Models</h2>

<h4>Training Mode</h4>
<ul>
<li>For each linear layer, we run SVD on the weights of the linear layers <b>W</b> to get the <b>A</b>,<b>B</b> matrix pairs such that <b>AB</b>estimates <b>W</b> using the predefined max_rank value to truncate the singular values as explained in the previous section. The only layer that we keep full-rank is the <b>v_proj</b>. This is because the rank of the weights of this layer tends to be higher.</li>
<li>We freeze all the weights and use LoRA with the r parameter to create the new trainable parameters.
<li>For each linear layer, we run SVD on the weights of the linear layers <b>W</b> to get the <b>A</b>,<b>B</b> matrix pairs such that the matrix multiplication <b>AB</b> estimates <b>W</b> using the predefined max_rank value to truncate the singular values as explained in the previous section. The only layer that we keep full-rank is the <b>v_proj</b>. This is because the rank of the weights of this layer tends to be higher.</li>
<li>We freeze all the weights and use LoRA with the <b>r</b> parameter to create the new trainable parameters.
</li>
</ul>
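<p>For illustration, a minimal PyTorch-style sketch of the training-mode steps in the list above; the helper names (<code>lowrank_pair_from_weight</code>, <code>init_lora_pair</code>) and the LoRA initialization details are assumptions for this sketch, not code from this repository:</p>

<pre><code>
import torch

def lowrank_pair_from_weight(W, max_rank):
    # W: (out_features, in_features) weight of a linear layer.
    # Truncated SVD: keep the top max_rank singular values so that A @ B estimates W.
    U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
    k = min(max_rank, S.numel())
    A = U[:, :k] * S[:k]    # (out_features, k), singular values folded into A
    B = Vh[:k, :]           # (k, in_features)
    return A, B             # frozen during training; layers such as v_proj stay full-rank

def init_lora_pair(out_features, in_features, r):
    # Trainable LoRA pair: A_L (out_features x r) and B_L (r x in_features).
    # A_L starts at zero so the initial update A_L @ B_L is zero.
    A_L = torch.zeros(out_features, r, requires_grad=True)
    B_L = torch.randn(r, in_features) * 0.01
    B_L.requires_grad_()
    return A_L, B_L
</code></pre>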

@@ -147,23 +147,22 @@ <h4>Inference mode</h4>
<li><a href="https://www.ic.unicamp.br/~meidanis/PUB/Doutorado/2012-Biller/Marsaglia1964.pdf">Since the rank of the sum of two matrices is at most the sum of their ranks</a>
$$ {rank({\bf AB}+{\bf A_L} {\bf B_L} ) \le rank({\bf AB}) + rank({\bf A_LB_L})} $$
we can safely combine the 4 weights by applying truncated SVD to the sum of their matrix products, using the sum of their ranks, to build the new low-rank pair:
$$ {{\bf AB} + {\bf A_LB_L} \Rightarrow {\bf \bar{A} \bar{B} }} $$
$$ { rank({\bf AB}) = max rank + r } $$
$$ { rank({\bf AB} + {\bf A_LB_L} ) = maxrank + r } $$

</li>
<li>Now we can use the new pairs and remove the older A,B and LoRA weights.
<li>Now we can use the new pairs and remove the older <b>A</b>,<b>B</b> and LoRA weights.
</li>
</ul>


<p>The illustration below shows the difference between the standard LoRA approach and the proposed low-rank LoRA merging method. Note that the result is a pair of matrices.</p>


<figure><img style="width:480px" src="figs/merging.png" /></figure>
<figure><center></center><img style="width:480px" src="figs/merging.png"/></center></figure>

<p>The code below summarizes the merging logic:</p>

<figure><img style="width:640px" src="figs/pseudo-code.png" /></figure>
<figure><center></center><img style="width:640px" src="figs/pseudo-code.png" /></center></figure>

<h2 id="benchmark">Speed Benchmark</h2>

@@ -260,4 +259,4 @@ <h2 id="conclusion">Conclusion</h2>
</article>
</body>

</html>