Unable to load model in offline mode using local files #968
Comments
So, I didn't mess around with changing the cache dir or anything: I downloaded the model once using the default cache setup, unplugged my internet connection, ran again with HF_HUB_OFFLINE=1, and it works fine, loading everything from cache. The contents related to the model end up under /mycachedir/hub/models--timm--ViT-B-16-SigLIP-i18n-256/ ... but it's not a simple set of files; there are snapshot ids, refs, etc. If you pre-load the cache, copy it as-is into the Docker container, and make sure the base HF cache dir (not just a dir with the model files) matches, I feel it should work.
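A minimal sketch of that workflow, assuming the default cache location (the script name and second-run invocation are illustrative):

```python
import open_clip

MODEL = 'hf-hub:timm/ViT-B-16-SigLIP-i18n-256'

# First (online) run populates the default HF cache, e.g.
# ~/.cache/huggingface/hub/models--timm--ViT-B-16-SigLIP-i18n-256/
# which contains blobs/, refs/, and snapshots/ rather than a flat set of files.
model, _, preprocess = open_clip.create_model_and_transforms(MODEL)
tokenizer = open_clip.get_tokenizer(MODEL)

# Second run, e.g. `HF_HUB_OFFLINE=1 python load_model.py` with the network
# disconnected, should resolve everything from that cache.
```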
Also, not using the global cache dir but specifying a local one for this specific model instantiation: doing this in a Python console, first with a connection, then without, works for me. Though it's possible some other cache hit to the global cache happened that I didn't notice; I didn't do this in isolation as it would be in a container. NOTE: the tokenizer for SigLIP and other models where the tokenizer is on the hub needs a cache_dir arg too. TODO: I need to make the tokenizer cache_dir arg explicit and more noticeable; right now it's implicit, passed through kwargs to the underlying HF tokenizer wrapper (and it errors out on the models that don't pass through and need any tokenizer files).
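A sketch of that variant, pinning both the model and the hub-hosted tokenizer to a local ./cc folder:

```python
import open_clip

MODEL = 'hf-hub:timm/ViT-B-16-SigLIP-i18n-256'

# Both the model weights/config and the hub-hosted tokenizer files end up
# under ./cc instead of the global HF cache; cache_dir for get_tokenizer is
# currently forwarded implicitly through kwargs to the HF tokenizer wrapper.
model, _, preprocess = open_clip.create_model_and_transforms(MODEL, cache_dir='./cc')
tokenizer = open_clip.get_tokenizer(MODEL, cache_dir='./cc')
```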
As you were saying, running it locally after unplugging my internet connection, having first downloaded the models, works for me as well.
About that part, if I understand correctly, today we are not able to pass a cache_dir param to the get_tokenizer function, and it eventually takes the cache dir I have set as HF_CACHE_DIR?
@yarden4998 no, the cache_dir argument for get_tokenizer does work right now for models that pass through to an HF tokenizer, like this model; when I tried above, I verified both the tokenizer and model files ended up in the ./cc folder. It is just not clear that it does so. If you don't do this, it will try to load the tokenizer from the default cache dir, and if that fails it will get stuck. I am not sure why there would still be any request when HF_HUB_OFFLINE is set; it must be non-blocking, as I don't see any hangs when my network connection is killed.
@yarden4998 I made a few cache_dir related improvements (and fixed one instance where it was missing when getting the model config); they can be tried on this branch: #970
Description:
I'm attempting to load a pretrained model (ViT-B-16-SigLIP-i18n-256) entirely in offline mode within a Docker container running on AWS Lambda. Despite setting the appropriate environment variables for offline mode, the library still attempts to reach the Hugging Face Hub, leading to the following error:
"We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like timm/ViT-B-16-SigLIP-i18n-256 is not the path to a directory containing a file named config.json.\nCheckout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'."
The open_clip functions I have used:
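Roughly the standard loading calls, sketched here (the exact arguments in my code may differ):

```python
import open_clip

# Standard open_clip hub loading; this is the path that tries to reach the Hub.
model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:timm/ViT-B-16-SigLIP-i18n-256'
)
tokenizer = open_clip.get_tokenizer('hf-hub:timm/ViT-B-16-SigLIP-i18n-256')
```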
Steps Taken:
Downloaded the following model and tokenizer files into a local directory (a sketch of loading these directly follows the list):
- open_clip_config.json
- open_clip_pytorch_model.bin
- special_tokens_map.json
- tokenizer.json
- tokenizer_config.json
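If those files sit in a flat local directory rather than in the HF cache layout, one possible route (a sketch only, not confirmed in this thread; the local path is a placeholder) is to use the built-in config name and point pretrained at the checkpoint file, which should bypass hub resolution for the weights; the SigLIP tokenizer is hub-hosted, so it still needs a pre-populated cache_dir:

```python
import open_clip

LOCAL = '/opt/model'  # placeholder directory holding the files listed above

# Built-in config name + a local checkpoint path loads the weights without
# hub resolution (assumes the built-in config matches the downloaded weights).
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-16-SigLIP-i18n-256',
    pretrained=f'{LOCAL}/open_clip_pytorch_model.bin',
)

# The tokenizer for this model lives on the hub, so point cache_dir at a
# pre-populated HF cache to keep it offline (placeholder path).
tokenizer = open_clip.get_tokenizer(
    'ViT-B-16-SigLIP-i18n-256',
    cache_dir=f'{LOCAL}/hf_cache',
)
```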
Environment: Docker container image deployed to AWS Lambda.
Could you please provide guidance on why the offline mode isn’t functioning as expected, or if there are additional steps required to force the use of local files only?
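For reference, the pattern I'm aiming for looks roughly like this (paths are placeholders; as I understand it, huggingface_hub reads HF_HUB_OFFLINE at import time, so it has to be set before any HF-related import):

```python
import os

# Must be set before huggingface_hub / open_clip are imported, since
# huggingface_hub snapshots these env vars at import time.
os.environ['HF_HUB_OFFLINE'] = '1'
os.environ['TRANSFORMERS_OFFLINE'] = '1'

import open_clip  # noqa: E402  (deliberately imported after the env vars)

model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:timm/ViT-B-16-SigLIP-i18n-256',
    cache_dir='/opt/hf_cache',  # placeholder: pre-populated cache baked into the image
)
tokenizer = open_clip.get_tokenizer(
    'hf-hub:timm/ViT-B-16-SigLIP-i18n-256',
    cache_dir='/opt/hf_cache',
)
```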