Lora Mask based on lora index #348
Conversation
vllm/worker/habana_model_runner.py (outdated)
end_pos = start_pos + self.lora_config.max_lora_rank
lora_mask[i, start_pos:end_pos] = ones
lora_mask = lora_mask.to('hpu')
lora_logits_mask = lora_mask
Could you explain why the logits mask now points to the mask? Before, it was left as None for the decode phase.
Is it because line 1906 can now reference an object that is None? If so, I would rather add a check there that lora_logits_mask is not None instead of changing what lora_logits_mask holds.
@michalkuligowski For the decode phase, lora_mask and logits_mask are the same. Previously we were setting it in execute_model; it was just moved into create_lora_mask for cleaner code.
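The decode-phase behavior described in this reply can be sketched roughly as follows. This is a pure-Python illustration with a hypothetical signature; the real habana_model_runner.py builds torch tensors and moves them to HPU.

```python
def create_lora_mask(batch_size, max_loras, max_lora_rank, lora_indices):
    """Decode-phase sketch: one token per sequence, so the mask has one
    row per sequence. Each active adapter occupies a contiguous slice of
    max_lora_rank columns, starting at its slot (lora_index)."""
    width = max_loras * max_lora_rank
    lora_mask = [[0] * width for _ in range(batch_size)]
    for i, lora_index in enumerate(lora_indices):
        if lora_index is None:  # sequence has no LoRA adapter
            continue
        start_pos = lora_index * max_lora_rank
        end_pos = start_pos + max_lora_rank
        for col in range(start_pos, end_pos):
            lora_mask[i][col] = 1
    # For decode, the logits mask aliases the lora mask; previously this
    # aliasing happened later, in execute_model.
    lora_logits_mask = lora_mask
    return lora_mask, lora_logits_mask
```

Returning both names from one function makes the decode-phase identity explicit instead of relying on a later assignment in execute_model.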
Force-pushed cdee227 to c2572b9.
Please also check that these changes do not break the long LoRA context test. We may need to port them to the long-lora-context branch to verify (maybe Sanju/Ruheena can do this check).
Force-pushed c2572b9 to cafde9c.
Force-pushed cafde9c to 9d62244.
Looks good to me.
Changes the filling of the LoRA mask from lora_id to lora_index. This is needed to ensure that the mask does not fail when a lora_id is greater than max_loras.
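To illustrate the summary above: a lora_id is a global adapter id that can grow past max_loras as adapters are swapped over the server's lifetime, while a lora_index is the adapter's slot in the currently active set, so it is always below max_loras. A minimal sketch of the difference, using a hypothetical helper in pure Python (the real code indexes torch tensors):

```python
def fill_mask_row(max_loras, max_lora_rank, lora_index):
    """Fill the slice of a mask row belonging to the adapter at slot
    lora_index. The row has max_loras * max_lora_rank columns, so a
    slot-based index always fits; a raw lora_id >= max_loras would
    point past the end of the row."""
    row = [0] * (max_loras * max_lora_rank)
    start_pos = lora_index * max_lora_rank
    # Guard that mirrors why indexing by lora_id can fail:
    assert start_pos + max_lora_rank <= len(row), "slot out of range"
    row[start_pos:start_pos + max_lora_rank] = [1] * max_lora_rank
    return row
```

With max_loras=2 and rank 4, slot 1 fills columns 4..7; passing a lora_id-like value such as 5 trips the range guard, which is exactly the failure the switch to lora_index avoids.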