Model compatibility #50
-
Hi! I noticed that you rely neither on Stanford ColBERT nor on RAGatouille in your dependencies. What models are compatible with
Replies: 10 comments 5 replies
-
Hello! Indeed, PyLate is a standalone library that uses sentence-transformers (and thus transformers) for the modeling, so we use neither Stanford ColBERT nor RAGatouille. Right now, the usable models are the ones trained using the library, as well as ColBERT-v2 and ColBERT-small from AnswerAI. For starters, I'll share a conversion script soon so people can translate their favorite models. Did you have any model in mind so I can help?
-
Makes sense! But how come
Edit: I'm trying to get up and running in German, which is unfortunately a bit difficult. I came across these models: AdrienB134/ColBERTv1.0-german-mmarcoDE: only ColBERTv1, but it seems ready to go.
-
It comes from the fact that I know Ben and I worked with him to add the weights for PyLate to the repository!
However, this approach comes at the cost of having to modify repositories and duplicate weights, and it actually created some issues.
Thus, I reworked the loading logic in #52, and you should now be able to load any existing stanford-nlp model!
I am still making some adjustments to handle more models (e.g., the very recent jina-colbert-v2), but it's already usable and should be even better very soon!
-
So by "stanford-nlp model", you mean the reference implementation? I guess this warning refers to the loading logic you mentioned. Is there any way to store the converted model on disk in order to save some initialization time?
Apart from that, kudos for this library. I got a small retriever running already :)
-
Yes indeed, the models built using this repository (and also the RAGatouille ones, as it uses the lib as backend). I am not 100% sure the model from Antoine Louis will be loaded correctly, as it has been trained with his own codebase (and I don't know how compatible the modeling is). You can save the model locally by using
My last PR (#54) allows loading the recent jina-colbert-v2 model, which is an amazing multilingual ColBERT. You could use this one once the PR is merged.
-
That's good to know. I haven't checked the results, but so far everything works without errors. How much work would it be to make sure
-
I honestly have no idea, as I did not dig much into the ColBERT-XM code.
-
Alright, I understand. I guess the following warning indicates that the conversion did not work properly...?
-
Not really, it just means that the query/document prefixes have been added to the vocabulary. Which model is it?
-
Still