How to pass instructions for Instruction-based embedding models #293
-
Hi there :) I am looking into deploying several embedding models using Infinity on Runpod Serverless (https://github.com/runpod-workers/worker-infinity-embedding?tab=readme-ov-file). However, many of the models I am testing are instruction-based, so I need to pass an instruction along with any text to embed. What is the correct way of doing this with the Infinity Runpod workers? For instance, InstructOR expects (instruction, text) tuples, while Salesforce's Mistral model expects the instruction to be part of the text itself (à la "Instruction: ... Text: ...").
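To make the difference concrete, here is a rough sketch of the two input conventions as I understand them (the instruction strings and model handles below are only illustrative):

```python
# INSTRUCTOR-style models take (instruction, text) pairs, e.g. as a list of
# [instruction, sentence] lists passed to the model's encode() call:
pairs = [
    ["Represent the document for retrieval:", "Infinity serves embedding models."],
]
# embeddings = instructor_model.encode(pairs)  # hypothetical model handle

# SFR-Embedding-Mistral-style models expect the instruction inlined in the text:
texts = [
    "Instruct: Given a query, retrieve relevant passages\nQuery: how to deploy infinity?",
]
# embeddings = client.embed(texts)  # hypothetical client call
```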
Replies: 2 comments
-
It's just a prompt template. Unless it is upstreamed in sentence-transformers and properly specified in config.json, I'd prefer not to accommodate prompt templates in infinity. Instructor expects a special tuple, but ultimately also formats it using a similar function. The Instructor models are quite outdated, and Mistral is too large; I would recommend using BERT/DeBERTa/... models!

For SFR-Embedding-Mistral (https://huggingface.co/Salesforce/SFR-Embedding-Mistral) the template is the following:

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery: {query}'
```

Please manage this piece of code client side.
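For example, a minimal client-side sketch, assuming your deployment exposes Infinity's OpenAI-compatible /embeddings route; the endpoint URL, task description, and model id are placeholders for your own setup:

```python
import requests

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f"Instruct: {task_description}\nQuery: {query}"

# Placeholder task description and queries.
task = "Given a web search query, retrieve relevant passages that answer the query"
queries = ["how do I pass instructions to an embedding model?"]

payload = {
    "model": "Salesforce/SFR-Embedding-Mistral",                 # placeholder model id
    "input": [get_detailed_instruct(task, q) for q in queries],  # template applied client side
}

# Placeholder endpoint; substitute your own deployment's URL.
resp = requests.post("https://<your-endpoint>/embeddings", json=payload, timeout=60)
resp.raise_for_status()
embeddings = [item["embedding"] for item in resp.json()["data"]]
```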
-
Thanks, agreed that it might be best to handle this on the client side.