Privacy violation by only offering online embeddings! #1057

Bardo-Konrad · 2024-04-23T11:16:55Z

Document Embeddings does not allow local models and therefore creates a privacy hazard.

I don't assume this was a malicious design choice by the Bioinformatics Lab at the University of Ljubljana, Slovenia, so please fix this and enable local open-source models.

markotoplak · 2024-04-23T11:36:43Z

Thanks, we would also prefer to have a local option. Do you know of any small models that are easily pip-installable? Preferably without something like a 1 GB dependency?

Bardo-Konrad · 2024-04-23T11:57:38Z

You could try small language models such as Gemini Nano, orca-2-7b, etc., and more generally use spaCy, as in:

# Install spaCy (run in a shell)
pip install -U spacy

# Download the small English model (run in a shell)
python -m spacy download en_core_web_sm

# The rest is Python
import spacy

# Load the installed model
nlp = spacy.load("en_core_web_sm")

# Use the model
doc = nlp("This is a sentence.")
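To tie this back to embeddings: a minimal sketch of how per-document vectors could be computed locally from a spaCy pipeline, so no text leaves the machine. The helper names `load_spacy_pipeline` and `embed_documents` are hypothetical and not part of Orange or spaCy; the only spaCy calls assumed are `spacy.load` and the `Doc.vector` attribute.

```python
from typing import Callable, Iterable, List


def load_spacy_pipeline(name: str = "en_core_web_sm"):
    """Load a locally installed spaCy pipeline (hypothetical helper).

    spaCy is imported lazily so the rest of this sketch works without it.
    """
    import spacy  # requires `pip install -U spacy` plus the model download

    return spacy.load(name)


def embed_documents(texts: Iterable[str], nlp: Callable) -> List[List[float]]:
    """Return one vector per document, computed entirely on this machine.

    `nlp` is any callable that maps a string to an object with a `.vector`
    attribute, which is what a loaded spaCy pipeline provides.
    """
    return [[float(x) for x in nlp(text).vector] for text in texts]


# Usage, assuming spaCy and the model are installed:
#   nlp = load_spacy_pipeline()
#   vectors = embed_documents(["This is a sentence."], nlp)
```

One caveat: as far as I know, the small (`sm`) pipelines ship without static word vectors, so `doc.vector` there falls back to an average of the pipeline's internal token representations. The larger `md` and `lg` models include pretrained vectors and would likely give better embeddings, at the cost of a bigger download.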

ajdapretnar · 2024-05-29T09:55:31Z

spaCy would be super beneficial for adding a named entity recognition option! Perhaps it would also give us a way to add Chinese tokenisation.
Note that spaCy would not cover 17 languages that fastText covers (Catalan, Croatian, Lithuanian, Macedonian, Ukrainian, Arabic, Azerbaijani, Bengali, Hindi, Tajik, Turkish, Norwegian Nynorsk, Nepali, Kazakh, Indonesian, Hungarian, Hebrew), nor the other 25 languages that multilingual SBERT covers. However, as an option, it would be great to have!
spaCy's English model is 12 MB (the smallest model), plus another 11 MB for the spaCy dependency itself.

Bardo-Konrad added the bug report label Apr 23, 2024

janezd transferred this issue from biolab/orange3 May 10, 2024
