Document Embeddings does not allow local models and therefore creates a privacy hazard.
I assume this was not a deliberate design choice by the Bioinformatics Lab at the University of Ljubljana, Slovenia, so please fix this and enable local open-source models.
Thanks, we would also prefer to have a local option. Do you know of any small models that are easily pip-installable? Preferably without something like a 1 GB dependency.
You could try small language models such as Gemini Nano or Orca-2-7B, or more generally use spaCy, as in:
# Install spacy
pip install -U spacy
# Download the small English model
python -m spacy download en_core_web_sm
import spacy
# Load the installed model
nlp = spacy.load("en_core_web_sm")
# Use the model
doc = nlp("This is a sentence.")
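To tie the snippet above back to document embeddings, here is a minimal sketch of how one vector per document could be computed locally. The helper names `load_spacy_pipeline` and `embed_documents` are made up for illustration and are not part of Orange or spaCy; the only spaCy calls used are `spacy.load` and the `Doc.vector` attribute. Keep in mind that `en_core_web_sm` ships without static word vectors, so the vectors it returns may be of limited quality for similarity tasks.

```python
from typing import Callable, Iterable, List


def load_spacy_pipeline(name: str = "en_core_web_sm"):
    """Load a locally installed spaCy pipeline (hypothetical helper).

    spaCy is imported lazily so the rest of the module can be used
    without it; the model must have been downloaded beforehand.
    """
    import spacy

    return spacy.load(name)


def embed_documents(nlp: Callable, texts: Iterable[str]) -> List[List[float]]:
    """Return one vector per text, computed entirely on the local machine.

    `nlp` is any callable that maps a string to an object exposing a
    `.vector` attribute, as spaCy's `Doc` does.
    """
    embeddings = []
    for text in texts:
        doc = nlp(text)
        # Convert to plain floats so the result does not depend on NumPy.
        embeddings.append([float(x) for x in doc.vector])
    return embeddings
```

With the model installed, `embed_documents(load_spacy_pipeline(), ["This is a sentence."])` would return a list containing a single vector, and no text leaves the machine.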
spaCy would be super beneficial for adding a named entity recognition option! Perhaps it could also provide a way to add Chinese tokenisation.
Note that spaCy would not cover the 17 languages that fastText does (Catalan, Croatian, Lithuanian, Macedonian, Ukrainian, Arabic, Azerbaijani, Bengali, Hindi, Tajik, Turkish, Norwegian Nynorsk, Nepali, Kazakh, Indonesian, Hungarian, Hebrew), or the other 25 languages that multilingual SBERT covers. Still, it would be great to have as an option!
spaCy's smallest English model is 12 MB, plus an additional 11 MB for the spaCy dependency itself.