Integrate X-LoRA #1491
Conversation
Let me know once this is ready for review.

Hi @BenjaminBossan, I think this is ready for review.
Thanks so much for working on this PR. I did not do an in-depth review yet, but from what I saw it already looks very solid.
Before working on the details, I think it's best to get the overall design polished. For this, I still have a few questions, please check out my comments. Together, I'm sure we can figure out a way to simplify some of the code.
Also, it would really help to have examples and/or tests to see X-LoRA in action. Do you have something that you could add? Later, we should also work on documentation, but this can wait for now.
Edit: For other readers, here is the paper link: https://arxiv.org/abs/2402.07148
src/peft/tuners/xlora/classifier.py
Outdated

```python
npy = result.numpy()
numpy.save(file, npy)

def flush_log_scalings(self, path: str):
```
Could you explain why we need this? Is this just for debugging/monitoring purposes?
This API enables the user to get a log of the scalings. It is useful for generating visualizations such as this.
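Based on the diff above, here is a self-contained sketch of what the method does; the logged tensors below are stand-ins for the real scalings:

```python
import numpy
import torch

# Stand-in for the scalings accumulated during generation.
scalings_log = [torch.rand(2, 3) for _ in range(4)]

# As in the diff: stack the logged tensors and persist them with numpy.
result = torch.stack(scalings_log)
numpy.save("scalings", result.numpy())  # writes scalings.npy

# A caller can then load the array and build visualizations from it.
loaded = numpy.load("scalings.npy")
```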
I see. In that case, there should be some kind of example provided to illustrate how to use this, I think otherwise users will not discover this feature.

Also, I'd suggest that this method only return the `indices_map` and allow the caller to decide themselves if they want to save it on disk as json or do something else with it.
Wouldn't returning the `indices_map` (or maybe the `seqlens_map` would be better, as it contains the tensors) make `flush_log_scalings` redundant to `get_scalings_log`? Of course, a dedicated method to calculate the `indices_map` may be helpful, and given the addition of such a method I think removing `flush_log_scalings` would be OK.

Additionally, there is actually a wrapper method on `XLoraModel` with the rest of the API which contains a docstring. This way, it should be easy to find. This method in the classifier is actually internal; I have prefixed it with `_` now.
src/peft/peft_model.py
Outdated
```diff
@@ -149,7 +155,7 @@ def active_adapters(self) -> list[str]:
         adapters = self.active_adapter
         if isinstance(adapters, str):
             adapters = [adapters]
-        return adapters
+        return list(filter(lambda x: len(x) > 0, adapters))
```
Not sure why this is needed. Is this because of `self.active_adapter = ""`? Could you please explain the reason behind that?
During some of the testing I did, `self.active_adapters` would return a function instead of executing the `@property`. Through debugging, I discovered that it was somehow connected to `self.active_adapter` not being set, as no adapter is initially loaded.

However, suppose that I could allow the default adapter to be loaded, and then delete it later. That should make these changes obsolete. I have pushed a commit to hopefully resolve this.
src/peft/peft_model.py
Outdated
```diff
@@ -123,7 +129,7 @@ def __init__(self, model: PreTrainedModel, peft_config: PeftConfig, adapter_name
         else:
             self._peft_config = None
         cls = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type]
-        self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
+        self.base_model = cls(model, {adapter_name: peft_config}, adapter_name, self)
```
I really want to avoid this, as this looks like a mixing of concerns. Surely, we can figure out a better way. Could you explain why this was needed?
This was needed because in `XLoraModel.__init__` we load adapters into the `PeftModel` via `load_adapter` and for use by the `XLoraClassifier`. This way, automatic loading is achieved. However, all that matters is that we are able to get some reference to the `PeftModel` into the constructor.

Perhaps we could add some code after we call `cls(model, ...)`, and check if `self.base_model` is an `XLoraModel`. Then, we could call a `__post_init__` which would take the `PeftModel` self, and do the adapter loading. Would this be more elegant?
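A minimal sketch of that idea, assuming a hypothetical `__post_init__` hook (all names here are illustrative, not the final API):

```python
# Inside PeftModel.__init__, after the tuner has been constructed:
self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
if isinstance(self.base_model, XLoraModel):
    # Hand the tuner a back-reference so it can call load_adapter itself,
    # instead of receiving the PeftModel in its constructor.
    self.base_model.__post_init__(self)
```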
I just pushed a commit (c5cdfc3), which is currently the latest, for easy reversion. Does this resolve the concern?
src/peft/tuners/xlora/model.py
Outdated
```python
    model_peft.get_nb_trainable_parameters,
    model_peft.generate,
)
model_peft.save_pretrained = peft_model_wrapper.save_pretrained  # type: ignore
```
I don't think I fully understand this yet. Above, we call `self.base_model.__xlora_post_init__(model, peft_config, adapter_name)`, but here it looks like this method requires 4 arguments. What is the `model_peft`, is this the `PeftModel`?

So what I suspect is happening here is that you want to modify the behavior of `generate` and `save_pretrained` of the `PeftModel` without actually modifying `PeftModel`, is this the goal?

When it comes to `generate`, the `PeftModel` calls `generate` on the `base_model` anyway, so could we not add the modification at that point?

When it comes to `save_pretrained`, I think we could check if we can somehow make the two approaches work together; I'd need to understand what the custom `save_pretrained` method does differently.

One easy way that we could allow custom `save_pretrained` methods without a lot of changes in `PeftModel` would be something like this:

```python
class PeftModel:
    def save_pretrained(self, *args, **kwargs):
        if hasattr(self.base_model, "_custom_save_pretrained"):
            return self.base_model._custom_save_pretrained(*args, **kwargs)
        # same code as right now
```

This way, we only add 2 extra lines in `PeftModel` but allow the underlying model to implement its own `save_pretrained` method.
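A sketch of what the opt-in could look like on the tuner side, assuming the hypothetical `_custom_save_pretrained` hook above (the class and attribute names are stand-ins, not the PR's actual code):

```python
import os

import torch
import torch.nn as nn

class XLoraModelSketch(nn.Module):
    """Illustrative stand-in for the tuner model."""

    def __init__(self):
        super().__init__()
        self.classifier = nn.Linear(8, 2)  # stand-in for the X-LoRA classifier

    def _custom_save_pretrained(self, save_directory, **kwargs):
        # Save the extra state (the classifier) that a plain LoraModel lacks;
        # the adapter weights themselves would still go through the normal path.
        os.makedirs(save_directory, exist_ok=True)
        torch.save(
            self.classifier.state_dict(),
            os.path.join(save_directory, "xlora_classifier.pt"),
        )
```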
src/peft/tuners/xlora/insertion.py
Outdated
```python
# TODO(EricLBuehler): Evaluate effectiveness and performance degradation
self.peft_model.base_model.eval()
if not self.config.use_trainable_adapters:
    for name, param in self.peft_model.base_model.named_parameters():
```
This looks strange to me, `generate` should normally not have anything to do with parameter updates. Could you explain why this is required?
Certainly! We discovered during training that the adapters were being set to trainable after each `generate` call. If you could provide any insight into why that may be the case, it would be great!

This should certainly not happen. Could you please open an issue and provide an example so that we can fix it?

Regarding the state of the PR, it's actually a bit hard for me to tell which comments have been addressed and which haven't (as GH removes them when the line has been changed, even if it may not have been addressed). Regardless of some designs which I think could be improved, I think the most efficient way forward would be if you could provide an example/tests so that I can see X-LoRA in action. This may help answer some questions I still have.

Thank you, I will work on some example code to reproduce the behavior and will raise an issue. Regarding examples of X-LoRA, we have put together some examples using the original X-LoRA repository. I hope this will work as a demonstration of X-LoRA in action; would it be better for me to provide some examples using this PR?

I took a look at these examples, thanks. But let's get something to work based on this PR, even if very basic. This helps me understand what code paths are taken and how the different parts interact. Otherwise, the review is much harder for me. Also, we will need something like this eventually because we want to add tests for X-LoRA. As I said, it can be very basic for a start.

I was able to put together a small example which shows how one would use the API as documented here, which I have attached in a plain text file as GH does not allow .py files to be attached. This is very basic and just shows creation, generation, and simple API usage. I hope this helps!

I tried to run your example but encountered some problems:
Thank you for trying it out. I ran the following code on a local machine with the latest installation from this branch to test the loading of a normal LoRA adapter, and it seemed to work, as after printing the model I can see the LoRA layers:

```python
from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer
from peft import LoraConfig

model_id = "HuggingFaceH4/zephyr-7b-beta"
model = AutoModelForCausalLM.from_pretrained(model_id)
lora_config = LoraConfig(
    target_modules=[
        "q_proj",
        "gate_proj",
        "o_proj",
        "v_proj",
        "k_proj",
    ],
    init_lora_weights=False,
)
model.add_adapter(lora_config, adapter_name="adapter_1")
print(model)
```

Strangely, when I printed out the model in the X-LoRA test script, it showed proper injection of adapters as well as the classifier. To begin fixing this, could you please provide a minimal reproducible example for the LoRA adapter loading so that I can find the error?
Using your exact branch with the merge conflict removed, when I run your script with this slight modification, I get the issue of no LoRA weights being applied:

```diff
- model.add_adapter(lora_config, adapter_name="adapter_1")
+ from peft import get_peft_model
+ model = get_peft_model(model, lora_config)
```
I found the bug: it was because of a mistake in a

Thanks for fixing this, it should now be working. There is still an issue with a merge conflict being unresolved as mentioned earlier:

I have now fixed the merge conflict.

Hi @BenjaminBossan, I'm not sure if you have had a chance to look at the updated PR, as the merge conflict has been resolved, and I think it is ready for another review. Would you prefer a new PR to be opened as a cleaner slate for further reviews?
Thanks a lot for working on my previous comments. I was busy with some other stuff and only today could do another detailed review of your PR. As I have to go now, it is unfortunately not 100% complete, but I still wanted to give you the feedback so you don't have to wait.

In addition to the individual comments I made, I have a few general questions:

- How is the X-LoRA adapter trained? Could you please provide an example? Eventually, we'll want to move this to unit tests.
- Could you please add the copyright notice to all new modules?
- X-LoRA only really works with transformers language models, right? Can we document this more clearly? Also, do you think it would be possible to make this work with other types of models?
- I'm not a fan of type annotations of the style `self.inner: nn.ModuleList = nn.ModuleList([])` or `model: nn.Module = self.model`, especially when followed by a `# type: ignore`. Same with the use of `typing.cast`. Is that because your IDE flags the code otherwise? Maybe you could deactivate the type checker for this project, as PEFT isn't really well annotated.
src/peft/peft_model.py
Outdated
```python
if not isinstance(config, XLoraConfig):
    raise TypeError(f"Expected 'XLoraConfig', got '{type(config)}' instead.")

device = infer_device()  # As in PeftModel.load_adapter, torch_device = infer_device(
```
comment is cut off?
src/peft/tuners/xlora/model.py
Outdated
```python
) -> None:
    super().__init__(model, config, adapter_name)

def _xlora_post_init(
```
I wonder if it wouldn't be better to convert this to a standalone function, something like `def post_init_lora(peft_model)`. Not sure if we need all the other arguments, can they not be derived from the `PeftModel` instance?
I'm not sure. Most of those are deeply nested by the time `post_init_lora` is called, so I thought it would increase readability to pass them this way. Would you prefer it if they are accessed through the `PeftModel`?
src/peft/peft_model.py
Outdated
```python
config.device = torch.device(device)

# If we are passed adapters in the kwargs, it is already in the config.
# If no adapters are passed, config.adapters is None
```
Do I understand correctly that this is for the case where we call `save_pretrained` on the X-LoRA model and then load this pretrained model again with `from_pretrained`? The only "new" thing added in that case would be the X-LoRA classifier, right?
Yes, that is correct. I simply load the weights for the classifier here.
src/peft/tuners/lora/layer.py
Outdated
```diff
@@ -434,7 +446,14 @@ def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor:
                 x = x.to(lora_A.weight.dtype)

                 if not self.use_dora[active_adapter]:
-                    result = result + lora_B(lora_A(dropout(x))) * scaling
+                    if _xlora_layer is not None:
```
I'm not so happy with this addition to the `LoraLayer`s. It makes reading and understanding them more complex and requires all LoRA layers to be updated (e.g. what about the bnb lora layers?).

I couldn't come up with an alternative yet, but I wonder if we could achieve something with wrapping and/or forward hooks. I'll continue to think about this tomorrow, but wanted to let you know already in case you have some ideas.
I agree, it is not very easy to read. Perhaps we could implement some sort of hook in the `LoraLayer` so that techniques such as DoRA and X-LoRA could use that instead of modifying the layer source?

Thank you for your review! I have updated the code with your suggestions.
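To illustrate the forward-hook idea floated above, a minimal self-contained sketch; the scaling callback is a stand-in for whatever the X-LoRA classifier would supply at runtime:

```python
import torch
import torch.nn as nn

def make_scaling_hook(get_scaling):
    # A forward hook receives (module, args, output); returning a value
    # replaces the module's output, so forward() itself needs no changes.
    def hook(module, args, output):
        return output * get_scaling()
    return hook

layer = nn.Linear(8, 8)
handle = layer.register_forward_hook(make_scaling_hook(lambda: 0.5))
out = layer(torch.randn(2, 8))  # output scaled by 0.5 via the hook
handle.remove()                 # detach the hook when no longer needed
```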
Thanks for all the updates. I left a few comments, but let's focus more on the bigger picture and not each detail.

When I tried to run your code, I encountered an error:

```
AttributeError: 'XLoraConfig' object has no attribute 'target_modules'
```

I also think some other parts of the code don't quite work. To avoid this in the future, let's start adding unit tests. Let's start simple and add a new file `tests/test_xlora.py` with a functional test based on the example you posted earlier. It should also contain a test of training the XLoRA classifier.

Regarding the issue of only working with transformers language models, I think it's fine for the start. We can think of generalizing this in a follow-up PR.

Again, thanks a lot for this contribution and your patience.
src/peft/tuners/xlora/classifier.py
Outdated
```python
logits = self.last.forward(hidden_state)

### Repeat to make layerwise scalings if the classifier layer does not
```
Could you please add this explanation as a comment? Thanks.
How is the state of the PR? Let me know if you need any help with it.
Thank you for your comments! I have fixed the embedding layer, and all tests pass when I run `pytest tests/test_xlora.py`. Here is the coverage:

```
src/peft/tuners/xlora/__init__.py      3    0   100%
src/peft/tuners/xlora/classifier.py   88    9    90%   74-77, 80, 118-120, 142-143
src/peft/tuners/xlora/config.py       35    7    80%   82-85, 87-90, 93, 96, 101
src/peft/tuners/xlora/layer.py       110   30    73%   112, 114, 159, 161, 170-171, 186, 194-223
src/peft/tuners/xlora/model.py       171   14    92%   72-82, 142, 150, 154, 159-160, 275, 307, 312, 315
```

I also updated `ruff` and ran `make style`, which produced no formatting changes but gave several errors such as the following, which I ignored as I have not modified that part of the codebase:

```
src/peft/tuners/tuners_utils.py:568:9: F811 Redefinition of unused `active_adapter` from line 503
    |
567 |     @property
568 |     def active_adapter(self) -> str | list[str]:
    |         ^^^^^^^^^^^^^^ F811
569 |         # use a property to ensure that active_adapter is not set directly, instead use the set_adapter method
570 |         return self._active_adapter
    |
    = help: Remove definition: `active_adapter`
```
```python
device = None
for module in base.modules():
    # Check the exact type because classes like OPTLearnedPositionalEmbedding inherit from nn.Embedding
```
Should we check for the exact type or use `isinstance`? My thought was that `isinstance` is not strict enough, but I cannot think of a case where a class is a subtype of a LoRA layer (at the moment) and we need to handle that.
I think `isinstance` should also work, though the only existing layer that would be affected is `AdaLoraLayer`, so it won't make a big difference either way.
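For context on the trade-off being discussed, a small self-contained illustration of exact-type checks versus `isinstance` (the subclass here just stands in for something like `OPTLearnedPositionalEmbedding`):

```python
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Embedding):
    """Stand-in subclass, as in the comment above."""

emb = LearnedPositionalEmbedding(10, 4)

print(type(emb) is nn.Embedding)      # False: exact-type check excludes subclasses
print(isinstance(emb, nn.Embedding))  # True: isinstance accepts subclasses
```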
It looks like some tests are failing on CI. The only state which should be shared is the saved model files, I think.

Yeah, really strange, I can't replicate the issue locally, and also some CI passes, some fails, so it looks flaky. How are the model files shared?

I made the scope of the saved LoRA adapter fixtures "function".

I added the scope information to the LoRA adapter saving too, and the tests pass locally now.

Does that mean you could reproduce the error locally? How did you run the tests, did you use CPU or GPU, what OS did you use? Ideally, we should fix the issue without changing the scope of the saved lora adapters to "function", as this means we need to create them again for each test, which is pretty wasteful. Do you have any suspicion what kind of side effect could be responsible for the test failure?

I ran the tests locally. Yeah, I'm curious if recreating them will be the solution though; perhaps something is being overwritten when we save with PyTorch? I noticed that when I flip the order of the tests, I can reproduce the failure.
Oh I see now, when I change the order I also get the error. The tests are dumping everything into the same temporary directory. There is an easy fix for this: let's use a separate temporary directory for each test using the `tmp_path` fixture.
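A minimal sketch of that fix, assuming pytest's built-in `tmp_path` fixture (the test body here is illustrative, not the actual test code):

```python
# Each test receives its own fresh temporary directory via tmp_path,
# so saved files can no longer leak between tests.
def test_save_produces_file(tmp_path):
    path = tmp_path / "adapter_model.bin"
    path.write_bytes(b"dummy")  # stand-in for model.save_pretrained(tmp_path)
    assert path.exists()
```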
@BenjaminBossan I updated the tests to use the `tmp_path` fixture.

Looks like the tests are failing with this error:

But the X-LoRA tests pass.
Tests are now passing, nice! As you correctly observed, the failing CI is unrelated, so no need to worry about that.

While thinking about this implementation a bit more, I think I came up with a further simplification, which would allow us to revert all changes to `lora/model.py`. This would be quite nice, because I don't want to make `LoraModel` more complex for X-LoRA, since the former should theoretically not have to know about the latter. Please check if my suggestion makes sense.
src/peft/tuners/lora/model.py
Outdated
```diff
@@ -164,7 +164,8 @@ def _prepare_model(self, peft_config: LoraConfig, model: nn.Module):
             model (`nn.Module`):
                 The model that is going to be adapted.
         """
-        if peft_config.layer_replication:
+        # Handle X-LoRA case
```
I think I found a way to allow us to remove all these changes to `lora/model.py`. The main issue is the following: During `XLoraModel.__init__`, we want to create the `LoraModel` instance based on the `XLoraConfig`. This `LoraModel` is not supposed to contain any actual LoRA adapter, as those will come later through the `config.adapters`. So what we need is to implement a way to create this "empty" `LoraModel` without needing to change `LoraModel` itself. Here is my idea:

First, let's make a change to `XLoraModel.__init__` by creating a copy of the `XLoraConfig` that "imitates" a normal `LoraConfig`:

```diff
modified src/peft/tuners/xlora/model.py
@@ -140,7 +140,15 @@ class XLoraModel(BaseTuner):
             conf = config[adapter_name]
         else:
             conf = config
-        lora_model = LoraModel(model, config.copy(), adapter_name)
+
+        # create an empty LoraModel
+        base_lora_config = copy.copy(conf)
+        base_lora_config.target_modules = DUMMY_TARGET_MODULES
+        # imitate a LoraConfig, fields might need to be updated if LoraConfig is updated
+        base_lora_config.layer_replication = None
+        base_lora_config.bias = "none"
+        lora_model = LoraModel(model, base_lora_config, adapter_name)
+
```

So we set the required attributes so that we no longer need the extra checks in `LoraModel` that this PR adds. Also, we have added `DUMMY_TARGET_MODULES`. What is this? It's a constant defined in `constants.py` as `DUMMY_TARGET_MODULES = "dummy-target-modules"`. This is a special value that allows us to create an "empty" `LoraModel`. For this to work, we also need a small change in `BaseTuner`:

```diff
modified src/peft/tuners/tuners_utils.py
@@ -395,6 +395,10 @@ class BaseTuner(nn.Module, ABC):
         self._prepare_model(peft_config, model)
         is_target_modules_in_base_model = False
         key_list = [key for key, _ in model.named_modules()]
+        if getattr(peft_config, "target_modules", None) == DUMMY_TARGET_MODULES:
+            # dummy adapter, we allow not matching any module
+            key_list = []
+            is_target_modules_in_base_model = True
```

I tested this locally and the tests pass after reverting all changes to `lora/model.py`.

I think this is the more elegant solution, because it should not be necessary for `LoraModel` to know about XLoRA.
All tests pass locally after the separation-of-concerns change.

Are test failures perhaps caused by the fact that we are downloading models during the testing?

Yes, you can ignore them, we have some strange issues with timeouts lately.

Ah, ok. Are there any other changes you would like me to make before merge?
I think I don't see any issues left that would require fixing, so finally this PR can be approved :-) I know it's been a long time in the making, so thanks a lot for your patience and your work on X-LoRA.
I'll give you the opportunity to also do a last check to see if we missed anything, since there were a lot of smaller changes recently. If you give the thumbs up, I can merge the PR.
Note that I still think it is very important to also add documentation and at least one example. Otherwise, users will have a very hard time discovering this method, which would be a pity, given the amount of work that went into it. So I hope you'll add those in a future PR. That one should be a lot less work ;-)
I think this is ready to merge, all the X-LoRA functionality is implemented! Perhaps I can do a follow-up PR to add docs. Thanks for all your help.

That would be really great.
Paper link: https://arxiv.org/abs/2402.07148

This PR integrates X-LoRA by creating a new tuner model type on the level of `LoraModel`. Please see #1472.

Changes

Although the new model type is a subclass of `LoraModel`, this is only an implementation detail to remove the need for nested `PeftModel`s. In a similar vein of thought, I have updated the signatures of the tuner `__init__` functions to allow the method swapping and ensure the `xLoRAModel` is a tuner and not on the "level" of `PeftModel`.

- Updated the tuner `__init__` functions to take a back reference (not used for all but `xLoRAModel`).
- Added `xLoRAModel` and `xLoRAConfig`.
- `xLoRAModel`.

Status

- `LoraModel` somewhere before layer swapping in `xLoRAModel.__init__`