-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend repository schema to include sentence embedder #15
Conversation
datastew/embedding.py
Outdated
|
||
|
||
class MPNetAdapter(EmbeddingModel): | ||
def __init__(self, model="sentence-transformers/all-mpnet-base-v2"): | ||
logging.getLogger().setLevel(logging.INFO) | ||
self.mpnet_model = SentenceTransformer(model) | ||
self.model = SentenceTransformer(model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Model property is a str in GPT4 and an object in MPNet, that shouldnt be the case if it is implementing the same class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed the issue by renaming the model attribute of GPT4Adapter to model_name.
datastew/repository/model.py
Outdated
@@ -18,6 +17,15 @@ def __init__(self, name: str, id: str) -> object: | |||
self.id = id | |||
|
|||
|
|||
class SentenceEmbedder(Base): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We dont need a seperate Table for a single string property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The embedding model can just be a single string property in the embedding class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored the code accordingly.
datastew/repository/weaviate.py
Outdated
for item in result['data']['Get']['Mapping']: | ||
sentence_embedders.add(item["hasSentenceEmbedder"]) | ||
except Exception as e: | ||
raise RuntimeError(f"Failed to fetch terminologies: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it failed to fetch terminologies here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A typo on my side, sorry. Fixed in the latest push.
Extended both SQL and Weaviate repository schemas to include sentence embedders.
Made necessary changes to functions utilizing these classes.
Added a new function allowing the user to filter based on the terminology and model.
Added new tests for each repository for the function retrieving all the stored sentence embedders.