[Experimental] Support Cross encoder models #10400
base: main
Conversation
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Can you add an example script to show how to use this API?
QQ: Why do we have to define a separate "cross_encoding" task for this? I think we can keep using the "embedding" task if we just override the `pooler` method instead of defining a new `classification_output` method.
Yes, I've added one in the PR description but it's a good idea to add it to the documentation.
I thought about this, and the only reason I kept it that way was that in the serving layer I need to know what task is being done, because I need to call the tokenizer with `text_pair`.
I think it may be simpler to make this a separate flag, similar to how we have a flag for multimodal models, rather than creating a new task for it. That way, we won't have to change our internals at all.
I think having a separate API for this would be cleaner as well; perhaps a Scoring API where we output a single score?
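For context on why the serving layer needs to distinguish this task: a cross encoder consumes both texts as one tokenized sequence rather than embedding each separately. A rough illustrative sketch of the BERT-style pair layout that the tokenizer's `text_pair` argument produces (whitespace "tokenization" here is a simplification, not the real tokenizer):

```python
def encode_pair(text: str, text_pair: str):
    # BERT-style pair input: one sequence with two segments separated by
    # [SEP], so attention can compare query and document token-by-token.
    # Real tokenizers also emit token_type_ids: 0 for the first segment
    # (including [CLS] and its [SEP]), 1 for the second segment.
    tokens = ["[CLS]"] + text.split() + ["[SEP]"] + text_pair.split() + ["[SEP]"]
    first_len = len(text.split()) + 2  # [CLS] + first text + [SEP]
    token_type_ids = [0] * first_len + [1] * (len(tokens) - first_len)
    return tokens, token_type_ids
```

This is exactly what a chat template would get wrong: the two inputs must be fused into one segment-tagged sequence, not rendered as conversation turns.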
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
I've removed the "cross_encoding" task. Pending TODOs:
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Pending TODOs:
```python
texts: Union[SingletonPrompt, Sequence[SingletonPrompt]],
text_pairs: Union[SingletonPrompt, Sequence[SingletonPrompt]],
```
Maybe `text_1` and `text_2`? `text_pairs` is a bit confusing to me, as it suggests that I should pass in a list of tuples.
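The reviewer's point is that with parallel-list parameters the caller passes two aligned sequences, not tuples. A minimal sketch of how such a signature pairs them up (hypothetical `score` helper; `SingletonPrompt` simplified to `str` for illustration):

```python
from typing import Sequence, Union

SingletonPrompt = str  # simplified stand-in for vLLM's prompt type

def score(texts: Union[SingletonPrompt, Sequence[SingletonPrompt]],
          text_pairs: Union[SingletonPrompt, Sequence[SingletonPrompt]]):
    # Normalize singleton inputs to lists, then zip the aligned arguments:
    # texts[i] is scored against text_pairs[i].
    if isinstance(texts, str):
        texts = [texts]
    if isinstance(text_pairs, str):
        text_pairs = [text_pairs]
    if len(texts) != len(text_pairs):
        raise ValueError("texts and text_pairs must have the same length")
    return list(zip(texts, text_pairs))
```

Naming them `text_1`/`text_2` (or accepting an actual list of tuples) would make the pairing contract harder to misread.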
```python
if (hasattr(config, "sbert_ce_default_activation_function")
        and config.sbert_ce_default_activation_function is not None):
    self.default_activation_function = import_from_string(
        config.sbert_ce_default_activation_function)()
else:
    self.default_activation_function = \
        nn.Sigmoid() if config.num_labels == 1 else nn.Identity()
```
Factor out this code to `transformer_utils`?
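The selection logic in that snippet can be sketched standalone. This is an illustrative simplification, not the PR's code: `_ACTIVATIONS` is a hypothetical stand-in for `import_from_string` (which resolves a dotted path), and plain-Python callables replace the `torch.nn` modules:

```python
import math
from types import SimpleNamespace

def sigmoid(x: float) -> float:
    # Logistic function: maps a raw logit to a score in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def identity(x: float) -> float:
    return x

# Hypothetical stand-in for import_from_string: name -> callable.
_ACTIVATIONS = {"sigmoid": sigmoid, "identity": identity}

def default_activation(config):
    # An explicit sbert_ce_default_activation_function on the config wins;
    # otherwise single-label models get a sigmoid (one relevance score)
    # and multi-label models get the identity (raw per-class logits).
    fn_name = getattr(config, "sbert_ce_default_activation_function", None)
    if fn_name is not None:
        return _ACTIVATIONS[fn_name]
    return sigmoid if config.num_labels == 1 else identity
```

The `num_labels == 1` branch matches the sentence-transformers CrossEncoder convention, which is why factoring it into a shared utility makes sense.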
This PR contains a proof of concept to add support for Cross Encoder models reusing most of what was done to support embedding models, as well as the chat embeddings API. It's a bit hacky, but I think it helps to have something concrete to iterate and refine the design.
Response:
I've added a new model type and a new task type. From the point of view of the models and the model runner this is not strictly necessary, because the inputs and outputs are compatible with the embedding models. However, at the serving level we need to know that the model has a different task, so that we know for sure we have to use the `text_pair` parameter of the tokenizer instead of trying to apply a chat template.

What's still missing:
- Support in the `LLM` class

cc: @DarkLight1337 @flaviabeo