Replies: 1 comment
This could certainly be interesting, thanks for bringing this up. Implementing this is not completely trivial, as we have to think about some issues like how to avoid copies, how to correctly save and load, and how to deal with models where the sizes of the weights would differ. Those are some of the challenges we ran into when implementing VeRA (see #1039), which also uses tied LoRA weights. In fact, at a very quick glance, the paper you cite could be considered a special case/variation of VeRA. It would probably make the most sense to figure out the VeRA PR first; then we can add this method either using a similar approach as VeRA, or as an optional argument for VeRA.
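To make the copy and save/load concerns a bit more concrete, here is a tiny standalone PyTorch sketch (an illustration only, not how PEFT or the VeRA PR handles it): if the same `nn.Parameter` objects are registered in several layers, the weights are tied without extra memory, but each layer still exposes them under its own state_dict key, so saving and loading has to preserve the tying rather than materialize independent copies.

```python
import torch
import torch.nn as nn


class LoraLayer(nn.Module):
    def __init__(self, lora_A: nn.Parameter, lora_B: nn.Parameter):
        super().__init__()
        # Assigning the *same* Parameter objects registers them in every layer,
        # so the weights are genuinely tied; deep-copying them per layer would
        # silently untie them.
        self.lora_A = lora_A
        self.lora_B = lora_B


shared_A = nn.Parameter(torch.randn(8, 32))
shared_B = nn.Parameter(torch.zeros(32, 8))
model = nn.ModuleList(LoraLayer(shared_A, shared_B) for _ in range(3))

# The same two tensors appear under six state_dict keys, which is what makes
# naive saving/loading tricky:
print(list(model.state_dict().keys()))
# ['0.lora_A', '0.lora_B', '1.lora_A', '1.lora_B', '2.lora_A', '2.lora_B']
print(len({p.data_ptr() for p in model.parameters()}))  # 2 -> really tied
```

Handling layers whose weight shapes differ (e.g. q_proj vs. k_proj under grouped-query attention) would presumably mean tying only within groups of modules that share a shape, which is one of the open questions mentioned above.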
NVIDIA proposed Tied-LoRA (https://arxiv.org/abs/2311.09578). The idea is to share the A, B matrices of the QKV LoRA across the layers, with the option to freeze some of them (and they add u, v vectors to process the data going into and out of A, B). Any thoughts on this approach, and maybe even on a generalization of it? E.g. splitting the layers into groups, so that, say, the first half of the layers gets one A, B pair and the second half gets its own parameters. If u, v are frozen and the number of groups equals the number of layers, we recover the current LoRA.
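For reference, here is a minimal self-contained PyTorch sketch of the tied setup described above: one shared A, B pair registered in every wrapped layer, plus per-layer scaling vectors u, v. The placement of u and v (after A and after B, mirroring VeRA) and all names here are assumptions for illustration, not the paper's reference implementation; with `train_uv=False` and a separate A, B per layer it collapses to plain LoRA, as noted above.

```python
import torch
import torch.nn as nn


class TiedLoRALinear(nn.Module):
    """Wraps a frozen base linear layer; the low-rank pair (A, B) is shared."""

    def __init__(self, base: nn.Linear, shared_A: nn.Parameter,
                 shared_B: nn.Parameter, train_uv: bool = True):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        # Shared across layers: the same Parameter objects are registered here.
        self.A = shared_A            # shape (r, in_features)
        self.B = shared_B            # shape (out_features, r)
        r = shared_A.shape[0]
        # Per-layer scaling vectors; freezing them (train_uv=False) and giving
        # each layer its own A, B would reduce this to plain LoRA.
        self.u = nn.Parameter(torch.ones(r), requires_grad=train_uv)
        self.v = nn.Parameter(torch.ones(base.out_features), requires_grad=train_uv)

    def forward(self, x):
        delta = (x @ self.A.T) * self.u      # (batch, r), scaled by u
        delta = (delta @ self.B.T) * self.v  # (batch, out), scaled by v
        return self.base(x) + delta


# Toy "model" of 4 linear layers that all tie the same A, B pair.
d, r = 16, 4
shared_A = nn.Parameter(torch.randn(r, d) * 0.01)
shared_B = nn.Parameter(torch.zeros(d, r))  # zero init, so delta starts at 0
layers = nn.ModuleList(
    TiedLoRALinear(nn.Linear(d, d), shared_A, shared_B) for _ in range(4)
)

x = torch.randn(2, d)
for layer in layers:
    x = layer(x)
print(x.shape)  # torch.Size([2, 16])
```

The layer-group generalization from the question would then just mean creating one (shared_A, shared_B) pair per group and passing the corresponding pair to each layer.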