Integrate X-LoRA #1491

Merged: 190 commits, Jul 5, 2024

Conversation

@EricLBuehler (Contributor) commented Feb 20, 2024

Paper link: https://arxiv.org/abs/2402.07148

This PR integrates X-LoRA by creating a new tuner model type on the level of LoraModel. Please see #1472.

Changes

Although the new model type is a subclass of LoraModel, this is only an implementation detail to remove the need for nested PeftModels. In a similar vein, I have updated the signatures of the tuner __init__ functions to allow the method swapping and to ensure that XLoraModel is a tuner and not on the "level" of PeftModel.

  • Update the signatures of the tuner __init__ functions to take a back reference (not used by all tuners, only XLoraModel).
  • Implement and export XLoraModel and XLoraConfig.
  • The special API for X-LoRA is located in XLoraModel (a rough usage sketch follows this list).
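
A rough usage sketch of the intended API (the adapter paths and config fields below are placeholders, and the exact fields may differ from what is finally merged):

from transformers import AutoModelForCausalLM
from peft import XLoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
config = XLoraConfig(
    task_type="CAUSAL_LM",
    hidden_size=base.config.hidden_size,
    adapters={"0": "./lora_adapter_0", "1": "./lora_adapter_1"},  # paths to separately trained LoRA adapters (placeholders)
)
model = get_peft_model(base, config)  # wraps the base model with XLoraModel as the tuner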

Status

  • Instantiate a LoraModel somewhere before layer swapping in XLoraModel.__init__
  • Fully integrate.
  • Test implementation.
  • Add documentation for methods.

@BenjaminBossan (Member)

Let me know once this is ready for review.

@EricLBuehler (Contributor, Author)

Hi @BenjaminBossan, I think this is ready for review.

@BenjaminBossan (Member) left a comment

Thanks so much for working on this PR. I did not do an in-depth review yet, but from what I saw it already looks very solid.

Before working on the details, I think it's best to get the overall design polished. For this, I still have a few questions, please check out my comments. Together, I'm sure we can figure out a way to simplify some of the code.

Also, it would really help to have examples and/or tests to see X-LoRA in action. Do you have something that you could add? Later, we should also work on documentation, but this can wait for now.

Edit: For other readers, here is the paper link: https://arxiv.org/abs/2402.07148

(Several inline review comments on src/peft/tuners/xlora/insertion.py and src/peft/tuners/xlora/model.py were marked as resolved.)

        npy = result.numpy()
        numpy.save(file, npy)

    def flush_log_scalings(self, path: str):

Member:

Could you explain why we need this? Is this just for debugging/monitoring purposes?

Contributor Author:

This API enables the user to get a log of the scalings. It is useful for generating visualizations such as this.

Member:

I see. In that case, there should be some kind of example provided to illustrate how to use this; otherwise, I think users will not discover this feature.

Also, I'd suggest for this method to only return the indices_map and allow the caller to decide themselves if they want to save it on disk as json or do something else with it.

Contributor Author:

Wouldn't returning the indices_map (or perhaps the seqlens_map, as it contains the tensors) make flush_log_scalings redundant with get_scalings_log? Of course, a dedicated method to calculate the indices_map may be helpful, and with such a method added I think removing flush_log_scalings would be OK.

Additionally, there is a wrapper method on XLoraModel, alongside the rest of the API, which contains a docstring, so it should be easy to find. The method on the classifier is internal, and I have now prefixed it with _.
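
For illustration, a hedged sketch of how the scalings-logging wrapper API is meant to be used (model and inputs are assumed to be an X-LoRA model and tokenized inputs; the method names follow this discussion and the xlora package, so they may differ slightly in the merged version):

model.enable_scalings_logging()                        # start recording classifier scalings
outputs = model.generate(**inputs, max_new_tokens=20)
scalings_log = model.get_scalings_log()                # list of scaling tensors recorded during generation
model.disable_scalings_logging()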

@@ -149,7 +155,7 @@ def active_adapters(self) -> list[str]:
         adapters = self.active_adapter
         if isinstance(adapters, str):
             adapters = [adapters]
-        return adapters
+        return list(filter(lambda x: len(x) > 0, adapters))

Member:

Not sure why this is needed. Is this because of self.active_adapter = ""? Could you please explain the reason behind that?

Contributor Author:

During some of the testing I did, self.active_adapters would return a function instead of executing the @property. Through debugging, I discovered that it was somehow connected to self.active_adapter not being set, as no adapter is initially loaded.

However, I could instead allow the default adapter to be loaded and then delete it later; that should make these changes obsolete. I have pushed a commit that hopefully resolves this.

@@ -123,7 +129,7 @@ def __init__(self, model: PreTrainedModel, peft_config: PeftConfig, adapter_name
         else:
             self._peft_config = None
             cls = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type]
-            self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
+            self.base_model = cls(model, {adapter_name: peft_config}, adapter_name, self)

Member:

I really want to avoid this, as this looks like a mixing of concerns. Surely, we can figure out a better way. Could you explain why this was needed?

Contributor Author:

This was needed because in XLoraModel.__init__ we load adapters into the PeftModel via load_adapter, for use by the XLoraClassifier. This way, automatic loading is achieved. However, all that matters is that we are able to get some reference to the PeftModel into the constructor.

Perhaps we could add some code after we call cls(model, ...) to check whether self.base_model is an XLoraModel. Then, we could call a __post_init__ which would take the PeftModel self and do the adapter loading. Would this be more elegant?
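
A hypothetical sketch of that proposal (the method name and the check are as described above, not final PR code):

self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
if isinstance(self.base_model, XLoraModel):
    # hand the tuner a reference to this PeftModel so it can load the X-LoRA adapters
    self.base_model.__post_init__(self)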

Contributor Author:

I just pushed a commit (c5cdfc3) which is the latest one currently for easy reversion. Does this resolve the concern?

(Further inline review comments on src/peft/peft_model.py, src/peft/tuners/xlora/classifier.py, and src/peft/tuners/xlora/model.py were marked as resolved.)

    model_peft.get_nb_trainable_parameters,
    model_peft.generate,
)
model_peft.save_pretrained = peft_model_wrapper.save_pretrained  # type: ignore

Member:

I don't think I fully understand this yet. Above, we call:

self.base_model.__xlora_post_init__(model, peft_config, adapter_name)

but here it looks like this method requires 4 arguments. What is model_peft? Is this the PeftModel?

So what I suspect is happening here is that you want to modify the behavior of generate and save_pretrained of the PeftModel without actually modifying PeftModel, is this the goal?

When it comes to generate, the PeftModel calls generate on the base_model anyway, could we not add the modification at that point?

When it comes to save_pretrained, I think we could check if we can somehow make the two approaches work together, I'd need to understand what the custom save_pretrained method does differently.

One easy way that we could allow custom save_pretrained methods without a lot of changes in PeftModel would be something like this:

class PeftModel:
    def save_pretrained(...):
        if hasattr(self.base_model, "_custom_save_pretrained"):
            return self.base_model._custom_save_pretrained(...)
        # same code as right now

This way, we only added 2 extra lines in PeftModel but would allow the underlying model to implement their own save_pretrained method.

# TODO(EricLBuehler): Evaluate effectiveness and performance degradation
self.peft_model.base_model.eval()
if not self.config.use_trainable_adapters:
    for name, param in self.peft_model.base_model.named_parameters():

Member:

This looks strange to me, generate should normally not have anything to do with parameter updates. Could you explain why this is required?

Contributor Author:

Certainly! We discovered during training that the adapters were being set to trainable after each generate call. If you could provide any insight into why that may be the case, that would be great!
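
For context, a hedged reconstruction of what the loop in the snippet above goes on to do (the "lora_" name filter and the requires_grad flip are assumptions, not the exact PR code):

self.peft_model.base_model.eval()
if not self.config.use_trainable_adapters:
    for name, param in self.peft_model.base_model.named_parameters():
        if "lora_" in name:
            # re-freeze adapter weights that were unexpectedly flipped to trainable
            param.requires_grad = False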

(Inline comment on src/peft/tuners/xlora/model.py marked as resolved.)

@BenjaminBossan (Member)

We discovered during training that the adapters were being set to trainable after each generate call.

This should certainly not happen. Could you please open an issue and provide an example so that we can fix it?

Regarding the state of the PR, it's actually a bit hard for me to tell which comments have been addressed and which haven't (as GH removes them when the line has been changed, even if it may not have been addressed). Regardless of some designs which I think could be improved, I think the most efficient way forward would be if you could provide an example/tests so that I can see X-LoRA in action. This may help answer some questions I still have.

@EricLBuehler (Contributor, Author)

Thank you, I will work on some example code to reproduce the behavior and will raise an issue.

Regarding examples of X-LoRA, we have put together some examples using the xlora package API here. Although this is slightly different from the XLoraModel proposed here (e.g., add_xlora_to_model and from_pretrained, which correspond to get_peft_model and PeftModel respectively in this PR), it documents the same methods.

I hope this will work as a demonstration of X-LoRA in action. Would it be better for me to provide some examples using this PR?

@BenjaminBossan (Member)

Regarding examples of X-LoRA, we have put together some examples using the xlora package API here. Although this is slightly different from the XLoraModel proposed here (e.g., add_xlora_to_model and from_pretrained, which correspond to get_peft_model and PeftModel respectively in this PR), it documents the same methods.

I hope this will work as a demonstration of X-LoRA in action. Would it be better for me to provide some examples using this PR?

I took a look at these examples, thanks. But let's get something to work based on this PR, even if very basic. This helps me understand what code paths are taken and how the different parts interact. Otherwise, the review is much harder for me. Also, we will need something like this eventually because we want to add tests for X-LoRA. As I said, it can be very basic for a start.

@EricLBuehler (Contributor, Author)

I was able to put together a small example which shows how one would use the API as documented here; I have attached it as a plain text file, as GH does not allow .py files to be attached.
example.txt

This is very basic and just shows creation, generation and simple API usage. I hope this helps!

@BenjaminBossan (Member)

I tried to run your example but encountered some problems:

  1. There is still a merge conflict in the forward method of lora.Linear. This is because we merged DoRA recently. My suggestion would be to apply X-LoRA only when not using DoRA. A quick and dirty solution should be enough for now, I think we'll rework that part in the future anyway.
  2. As I don't have the adapter checkpoints referenced in the script, I tried to create some checkpoints with randomly initialized LoRA weights. However, this led to a bizarre error when trying to load them. It turns out that when I tried to create a normal LoRA adapter, it was not applied to any layer, despite setting target_modules correctly. Could it be possible that something has been messed up?

@EricLBuehler (Contributor, Author) commented Feb 29, 2024

Thank you for trying it out. I ran the following code on a local machine with the latest installation from this branch to test the loading of a normal LoRA adapter, and it seemed to work, as after printing the model I can see lora.Linear in the specified layers.

from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer
from peft import LoraConfig

model_id = "HuggingFaceH4/zephyr-7b-beta"
model = AutoModelForCausalLM.from_pretrained(model_id)

lora_config = LoraConfig(
    target_modules=[
        "q_proj",
        "gate_proj",
        "o_proj",
        "v_proj",
        "k_proj"
    ],
    init_lora_weights=False
)

model.add_adapter(lora_config, adapter_name="adapter_1")
print(model)

Strangely, when I printed out the model in the X-LoRA test script, it showed proper injection of the adapters as well as the classifier. To begin fixing this, could you please provide a minimal reproducible example of the LoRA adapter loading so that I can find the error?

@BenjaminBossan (Member)

could you please provide a minimal reproducible example of the LoRA adapter loading so that I can find the error?

Using your exact branch with the merge conflict removed, when I run your script with this slight modification, I get the issue of no LoRA weights being applied:

- model.add_adapter(lora_config, adapter_name="adapter_1")
+ from peft import get_peft_model
+ model = get_peft_model(model, lora_config)

@EricLBuehler (Contributor, Author) commented Feb 29, 2024

I found the bug: it was because of a mistake in a hasattr check. I added this because an XLoraModel should not have a default adapter injected. Perhaps you could try it again?

@BenjaminBossan (Member)

Thanks for fixing this, it should now be working. There is still an issue with a merge conflict being unresolved as mentioned earlier:

  1. There is still a merge conflict in the forward method of lora.Linear. This is because we merged DoRA recently. My suggestion would be to apply X-LoRA only when not using DoRA. A quick and dirty solution should be enough for now, I think we'll rework that part in the future anyway.

@EricLBuehler (Contributor, Author)

I have now fixed the merge conflict.

@EricLBuehler (Contributor, Author)

Hi @BenjaminBossan, I'm not sure if you have had a chance to look at the updated PR; the merge conflict has been resolved, and I think it is ready for another review. Would you prefer that a new PR be opened as a cleaner slate for further reviews?

@BenjaminBossan (Member) left a comment

Thanks a lot for working on my previous comments. I was busy with some other stuff and only today could I do another detailed review of your PR. As I have to go now, it is unfortunately not 100% complete, but I still wanted to give you the feedback so you don't have to wait.

In addition to the individual comments I made, I have a few general questions:

  1. How is the X-LoRA adapter trained? Could you please provide an example? Eventually, we'll want to move this to unit tests.
  2. Could you please add the copyright notice to all new modules?
  3. X-LoRA only really works with transformers language models, right? Can we document this more clearly? Also, do you think it would be possible to make this work with other types of models?
  4. I'm not a fan of the type annotations of the style self.inner: nn.ModuleList = nn.ModuleList([]) or model: nn.Module = self.model, especially when followed by a # type: ignore. Same with the use of typing.cast. Is that because your IDE flags the code otherwise? Maybe you could deactivate the type checker for this project, as PEFT isn't really well annotated.

(Inline comment on src/peft/tuners/tuners_utils.py marked as resolved.)

if not isinstance(config, XLoraConfig):
    raise TypeError(f"Expected 'XLoraConfig', got '{type(config)}' instead.")

device = infer_device()  # As in PeftModel.load_adapter, torch_device = infer_device(

Member:

comment is cut off?

(Inline comments on src/peft/tuners/xlora/classifier.py, src/peft/tuners/xlora/config.py, and src/peft/tuners/xlora/model.py were marked as resolved.)

    ) -> None:
        super().__init__(model, config, adapter_name)

    def _xlora_post_init(

Member:

I wonder if it wouldn't be better to convert this to a standalone function, something like def post_init_lora(peft_model). Not sure if we need all the other arguments, can they not be derived from the PeftModel instance?

Contributor Author:

I'm not sure. Most of those are deeply nested by the time post_init_lora is called, so I thought it would improve readability to pass them this way. Would you prefer that they be accessed through the PeftModel?
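
A purely illustrative shape of the standalone variant being discussed; the attribute accesses are assumptions about what could be derived from the PeftModel instance:

def post_init_lora(peft_model):
    xlora_model = peft_model.base_model               # the XLoraModel tuner
    base_model = xlora_model.model                    # the wrapped transformers model
    xlora_config = peft_model.peft_config["default"]  # the XLoraConfig (adapter name assumed)
    # ... load the referenced LoRA adapters and attach the classifier here ...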

(Inline comment on src/peft/tuners/xlora/util.py marked as resolved.)

config.device = torch.device(device)

# If we are passed adapters in the kwargs, it is already in the config.
# If no adapters are passed, config.adapters is None

Member:

Do I understand correctly that this is for the case where we call save_pretrained on the X-LoRA model and then load this pretrained model again with from_pretrained? The only "new" thing added in that case would be the X-LoRA classifier, right?

Contributor Author:

Yes, that is correct. I simply load the weights for the classifier here.
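
A hedged sketch of the round trip being discussed, assuming model is the X-LoRA PeftModel created earlier (the paths are placeholders and the entry points follow the intended PR API):

model.save_pretrained("./xlora_checkpoint")  # saves the adapters and the X-LoRA classifier weights

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
restored = PeftModel.from_pretrained(base, "./xlora_checkpoint")  # the classifier weights are reloaded here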

@@ -434,7 +446,14 @@ def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor:
            x = x.to(lora_A.weight.dtype)

            if not self.use_dora[active_adapter]:
                result = result + lora_B(lora_A(dropout(x))) * scaling
                if _xlora_layer is not None:

Member:

I'm not so happy with this addition to the LoraLayers. It makes reading and understanding them more complex and requires all LoRA layers to be updated (e.g. what about the bnb lora layers?).

I couldn't come up with an alternative yet, but I wonder if we could achieve something with wrapping and/or forward hooks. I'll continue to think about this tomorrow but wanted to let you know already in case you have some ideas.

Contributor Author:

I agree, it is not very easy to read. Perhaps we could implement some sort of hook in the LoraLayer so that techniques such as DoRA and X-LoRA could use that instead of modifying the layer source?
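
Purely as an illustration of the hook mechanism (not part of this PR, and it simplifies the X-LoRA semantics to a single output scaling), something along these lines could post-process a layer's output without editing the layer source:

import torch

def xlora_scaling_hook(module: torch.nn.Module, inputs, output):
    # a hypothetical per-layer scaling that a classifier could have stored on the module
    scaling = getattr(module, "_xlora_scaling", None)
    return output if scaling is None else output * scaling

handle = lora_layer.register_forward_hook(xlora_scaling_hook)  # lora_layer: an injected LoRA layer (assumed)
# handle.remove() detaches the hook again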

@EricLBuehler (Contributor, Author)

Thank you for your review! I have updated the code with your suggestions.

  • How is the X-LoRA adapter trained? Could you please provide an example? Eventually, we'll want to move this to unit tests.
    Each LoRA adapter for an X-LoRA model is trained separately as a normal LoRA adapter. The weights are then loaded, and the X-LoRA model is trained by training only the classifier (see the sketch after this list). Would you recommend that I add a few sentences to the docstring detailing the above, or some other code example? If a code example would be better, what part of the X-LoRA training process should I show?
  • Could you please add the copyright notice to all new modules?
    I have added it to each new module.
  • X-LoRA only really works with transformers language models, right? Can we document this more clearly? Also, do you think it would be possible to make this work with other types of models?
    With the current implementation of the classifier, it will not work with other types of models, as it requires a sequence length. However, it would be possible to make it work with other types of models by changing the way the resulting scalings are reshaped/expanded. I have added a note in the XLoraModel docstring.
  • I'm not a fan of the type annotations of the style self.inner: nn.ModuleList = nn.ModuleList([]) or model: nn.Module = self.model, especially when followed by a # type: ignore. Same with the use of typing.cast. Is that because your IDE flags the code otherwise? Maybe you could deactivate the type checker for this project, as PEFT isn't really well annotated.
    Yes, my IDE would flag the code otherwise. I have removed these.
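
As referenced above, a rough sketch of that training flow, assuming model is an X-LoRA model created from separately trained LoRA checkpoints (the dataloader and optimizer settings are placeholders):

import torch

# By default only the X-LoRA classifier parameters should be trainable
# (the loaded LoRA adapters stay frozen unless use_trainable_adapters is set).
trainable_params = [p for _, p in model.named_parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)

for batch in dataloader:       # dataloader: a standard causal-LM dataloader (assumed), batches include labels
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()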

@BenjaminBossan (Member) left a comment

Thanks for all the updates. I left a few comments, but let's focus more on the bigger picture and not each detail.

When I tried to run your code, I encountered an error:

AttributeError: 'XLoraConfig' object has no attribute 'target_modules'

I also think some other parts of the code don't quite work. To avoid this in the future, let's start adding unit tests. Let's start simple and add a new file tests/test_xlora.py with a functional test based on the example you posted earlier. It should also contain a test of training the XLoRA classifier.

Regarding the issue of only working with transformers language models, I think it's fine for the start. We can think of generalizing this in a follow up PR.

Again, thanks a lot for this contribution and your patience.

(Inline comment on src/peft/tuners/xlora/classifier.py marked as resolved.)

logits = self.last.forward(hidden_state)

### Repeat to make layerwise scalings if the classifier layer does not

Member:

Could you please add this explanation as a comment? Thanks.

(Inline comments on src/peft/tuners/xlora/util.py, src/peft/tuners/xlora/model.py, and src/peft/tuners/lora/layer.py were marked as resolved.)

@BenjaminBossan (Member)

How is the state of the PR? Let me know if you need any help with it.

@EricLBuehler (Contributor, Author) left a comment

Thank you for your comments! I have fixed the embedding layer, and all tests pass when I run:

pytest tests/test_xlora.py

Here is the coverage:

src/peft/tuners/xlora/__init__.py                         3      0   100%
src/peft/tuners/xlora/classifier.py                      88      9    90%   74-77, 80, 118-120, 142-143
src/peft/tuners/xlora/config.py                          35      7    80%   82-85, 87-90, 93, 96, 101
src/peft/tuners/xlora/layer.py                          110     30    73%   112, 114, 159, 161, 170-171, 186, 194-223
src/peft/tuners/xlora/model.py                          171     14    92%   72-82, 142, 150, 154, 159-160, 275, 307, 312, 315

I also updated ruff and ran make style which produced no formatting changes but gave several errors such as the following, which I ignored as I have not modified that part of the codebase.

src/peft/tuners/tuners_utils.py:568:9: F811 Redefinition of unused `active_adapter` from line 503
    |
567 |     @property
568 |     def active_adapter(self) -> str | list[str]:
    |         ^^^^^^^^^^^^^^ F811
569 |         # use a property to ensure that active_adapter is not set directly, instead use the set_adapter method
570 |         return self._active_adapter
    |
    = help: Remove definition: `active_adapter`


device = None
for module in base.modules():
    # Check the exact type because classes like OPTLearnedPositionalEmbedding inherit from nn.Embedding

Contributor Author:

Should we check for the exact type or use isinstance? My thought was that isinstance is not strict enough, but I cannot think of a case (at the moment) where a class is a subtype of a LoRA layer and would need to be handled differently.

Member:

I think isinstance should also work, though the only existing layer that would be affected is AdaLoraLayer, so it won't make a big difference either way.
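
For illustration only, the difference between the two checks (the nn.Embedding example comes from the comment in the snippet above; the helper names are hypothetical):

import torch.nn as nn

def is_exact_embedding(module: nn.Module) -> bool:
    # exact type check: rejects subclasses such as OPTLearnedPositionalEmbedding
    return type(module) is nn.Embedding

def is_embedding_like(module: nn.Module) -> bool:
    # isinstance: also accepts subclasses
    return isinstance(module, nn.Embedding)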

@EricLBuehler (Contributor, Author) commented Jul 1, 2024

It looks like test_save_load_functional_pt failed in one CI run but passed in another? That seems super strange; could it be the order in which test_save_load_functional_pt and test_save_load_functional are executed?

The only state which should be shared are the saved model files, I think.

@BenjaminBossan (Member)

It looks like test_save_load_functional_pt failed in one CI run but passed in another? That seems super strange; could it be the order in which test_save_load_functional_pt and test_save_load_functional are executed?

Yeah, really strange. I can't replicate the issue locally, and some CI runs pass while others fail, so it looks flaky.

The only state which should be shared are the saved model files, I think.

How are the model files shared? The tmp_dir should be a separate one for the two tests. The only shared state I can spot at the moment is the tokenizer. Could you please try making its fixture "function" scoped instead of "class"? That shouldn't slow down the tests by much.

@EricLBuehler (Contributor, Author)

How are the model files shared? The tmp_dir should be a separate one for the two tests. The only shared state I can spot at the moment is the tokenizer. Could you please try making its fixture "function" scoped instead of "class"? That shouldn't slow down the tests by much.

I made the tmp_dir and the tokenizer both function scoped; perhaps an interaction in the tmp directory was the problem.

@EricLBuehler (Contributor, Author)

I added the scope information to the LoRA adapter saving too, and the tests pass locally now.

@BenjaminBossan (Member)

I added the scope information to the LoRA adapter saving too, and the tests pass locally now.

Does that mean you could reproduce the error locally? How did you run the tests, did you use CPU or GPU, what OS did you use?

Ideally, we should fix the issue without changing the scope of the saved lora adapters to "function", as this means we need to create them again for each test, which is pretty wasteful. Do you have any suspicion what kind of side effect could be responsible for the test failure?

@EricLBuehler (Contributor, Author)

Does that mean you could reproduce the error locally? How did you run the tests, did you use CPU or GPU, what OS did you use?

I ran pytest tests/test_xlora.py and all tests passed on my CUDA GPU with WSL2.

Ideally, we should fix the issue without changing the scope of the saved lora adapters to "function", as this means we need to create them again for each test, which is pretty wasteful. Do you have any suspicion what kind of side effect could be responsible for the test failure?

Yeah, I'm curious whether recreating them will be the solution though - perhaps something is being overwritten when we save with PyTorch? I noticed that when I flip the order of test_save_load_functional_pt and test_save_load_functional so that test_save_load_functional comes before test_save_load_functional_pt, I can sometimes reproduce the issue locally, which implies that it is sporadic.

@BenjaminBossan (Member)

Yeah, I'm curious whether recreating them will be the solution though - perhaps something is being overwritten when we save with PyTorch? I noticed that when I flip the order of test_save_load_functional_pt and test_save_load_functional so that test_save_load_functional comes before test_save_load_functional_pt, I can sometimes reproduce the issue locally, which implies that it is sporadic.

Oh I see now, when I change the order I also get the error. The tests are dumping everything into the same temporary directory. There is an easy fix for this: let's use a separate temporary directory for each test using the tmp_path fixture provided by pytest. Here are the steps to take (a rough fixture sketch follows the list):

  • Let's rename tmp_dir to lora_dir, just to avoid confusion with the name. Adjust saved_lora_adapters accordingly.
  • Let's create a lora_embedding_dir fixture which is the same as lora_dir but we will use it for saved_lora_embedding_adapters.
    • Now saved_lora_adapters and saved_lora_embedding_adapters are created once ("class" scope) and they have separate directories, so there should be no side effect from the tests.
  • Finally, replace all tmp_dir that are being used by the tests by tmp_path, i.e. the built-in pytest fixture. This now ensures that each test gets their own temporary directory.
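
Roughly, the fixture layout I have in mind (the bodies are illustrative only; the real tests will differ):

import pytest

@pytest.fixture(scope="class")
def lora_dir(tmp_path_factory):
    # one directory per test class for the saved LoRA adapters
    return tmp_path_factory.mktemp("lora")

@pytest.fixture(scope="class")
def lora_embedding_dir(tmp_path_factory):
    # separate directory for the saved LoRA embedding adapters
    return tmp_path_factory.mktemp("lora_embedding")

def test_save_load_functional(tmp_path, saved_lora_adapters):
    # each test writes into its own tmp_path provided by pytest
    ...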

@EricLBuehler (Contributor, Author) commented Jul 2, 2024

@BenjaminBossan I updated the tests to use the tmp_path fixture and did the renames of tmp_dir -> lora_dir and created lora_embedding_dir. The tests pass now on my machine when I invert the order (I left it inverted in the committed test).

@EricLBuehler (Contributor, Author)

Looks like the tests are failing with this error:

FAILED tests/test_decoder_models.py::PeftDecoderModelTester::test_inference_safetensors_14_test_hf_internal_testing_tiny_random_GPTNeoXForCausalLM_boft - requests.exceptions.ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 73f83a27-fedd-45f8-9c59-4b8a51388973)')

But the X-LoRA tests pass.

@BenjaminBossan (Member) left a comment

Tests are now passing, nice! As you correctly observed, the failing CI is unrelated, so no need to worry about that.

While thinking about this implementation a bit more, I think I came up with a further simplification, which would allow us to revert all changes to lora/model.py. This would be quite nice, because I don't want to make LoraModel more complex for X-LoRA, since the former should theoretically not have to know about the latter. Please check whether my suggestion makes sense.

@@ -164,7 +164,8 @@ def _prepare_model(self, peft_config: LoraConfig, model: nn.Module):
model (`nn.Module`):
The model that is going to be adapted.
"""
if peft_config.layer_replication:
# Handle X-LoRA case

Member:

I think I found a way to allow us to remove all these changes to lora/model.py. The main issue is the following: During XLoraModel.__init__, we want to create the LoraModel instance based on the XLoraConfig. This LoraModel is not supposed to contain any actual LoRA adapter, as those will come later through the config.adapters. So what we need is to implement a way to create this "empty" LoraModel without needing to change LoraModel itself. Here is my idea:

First, let's make a change to XLoraModel.__init__ by creating a copy of the XLoraConfig that "imitates" a normal LoraConfig:

modified   src/peft/tuners/xlora/model.py
@@ -140,7 +140,15 @@ class XLoraModel(BaseTuner):
             conf = config[adapter_name]
         else:
             conf = config
-        lora_model = LoraModel(model, config.copy(), adapter_name)
+
+        # create an empty LoraModel
+        base_lora_config = copy.copy(conf)
+        base_lora_config.target_modules = DUMMY_TARGET_MODULES
+        # imitate a LoraConfig, fields might need to be updated if LoraConfig is updated
+        base_lora_config.layer_replication = None
+        base_lora_config.bias = "none"
+        lora_model = LoraModel(model, base_lora_config, adapter_name)
+

So we set the required attributes so that we no longer need the extra checks in LoraModel that this PR adds. Also, we have added DUMMY_TARGET_MODULES. What is this? It's a constant defined in constants.py as DUMMY_TARGET_MODULES = "dummy-target-modules". This is a special value that allows us to create an "empty" LoraModel. For this to work, we also need a small change in BaseTuner:

modified   src/peft/tuners/tuners_utils.py
@@ -395,6 +395,10 @@ class BaseTuner(nn.Module, ABC):
         self._prepare_model(peft_config, model)
         is_target_modules_in_base_model = False
         key_list = [key for key, _ in model.named_modules()]
+        if getattr(peft_config, "target_modules", None) == DUMMY_TARGET_MODULES:
+            # dummy adapter, we allow not matching any module
+            key_list = []
+            is_target_modules_in_base_model = True

I tested this locally and the tests pass after reverting all changes to lora/model.py.

I think this is the more elegant solution, because it should not be necessary for LoraModel to know about XLoRA.

@EricLBuehler (Contributor, Author)

All tests pass locally after the separation of concerns change.

@EricLBuehler (Contributor, Author)

Are test failures perhaps caused by the fact that we are downloading models during the testing?

@BenjaminBossan (Member)

Are test failures perhaps caused by the fact that we are downloading models during the testing?

Yes, you can ignore them, we have some strange issues with timeouts lately.

@EricLBuehler (Contributor, Author)

Ah, ok. Are there any other changes you would like me to make before merge?

@BenjaminBossan (Member) left a comment

I think I don't see any issues left that would require fixing, so finally this PR can be approved :-) I know it's been a long time in the making, so thanks a lot for your patience and your work on X-LoRA.

I'll give you the opportunity to also do a last check to see if we missed anything, since there were a lot of smaller changes recently. If you give the thumbs up, I can merge the PR.

Note that I still think it is very important to also add documentation and at least one example. Otherwise, users will have a very hard time discovering this method, which would be a pity, given the amount of work that went into it. So I hope you'll add those in a future PR. That one should be a lot less work ;-)



@EricLBuehler (Contributor, Author)

I think this is ready to merge, all the X-LoRA functionality is implemented! Perhaps I can do a follow-up PR to add docs. Thanks for all your help.

@BenjaminBossan merged commit 58afb34 into huggingface:main on Jul 5, 2024
12 of 14 checks passed

@BenjaminBossan (Member)

Perhaps I can do a follow-up PR to add docs.

That would be really great.
