
Error when using AutoTokenizer to load local files without network #31712

Closed
2 of 4 tasks
pppppkun opened this issue Jun 29, 2024 · 3 comments

@pppppkun

System Info

  • transformers version: 4.42.3
  • Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
  • Python version: 3.10.14
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.31.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA A100-PCIE-40GB

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Here are the results of my analysis and the corresponding steps to reproduce:

I examined the stack trace in step 4 and believe the issue stems from line 505 of transformers/dynamic_module_utils.py, in the get_class_from_dynamic_module function: the first argument, repo_id, is set incorrectly when calling get_cached_module_file. It should be the pretrained_model_name_or_path parameter (in my case, /home/xx/chatglm3-6b), but it instead receives THUDM/chatglm3-6b--tokenization_chatglm.ChatGLMTokenizer, as assigned on line 497.

I believe the logic in lines 496-499 needs adjustment: when pretrained_model_name_or_path is a local file path, repo_id should be set directly to pretrained_model_name_or_path. Whether or not my analysis is correct, I would like to fix this issue myself and contribute the patch.
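The adjustment described above could be sketched roughly as follows. This is a hypothetical, simplified stand-in for the real transformers code: the function name resolve_repo_id and its exact inputs are my own illustration, not the library's API.

```python
import os

def resolve_repo_id(class_reference: str, pretrained_model_name_or_path: str):
    """Hypothetical, simplified sketch of the repo_id selection inside
    get_class_from_dynamic_module (not the actual transformers source)."""
    if "--" in class_reference:
        # e.g. "THUDM/chatglm3-6b--tokenization_chatglm.ChatGLMTokenizer"
        repo_id, class_name = class_reference.split("--")
    else:
        repo_id, class_name = pretrained_model_name_or_path, class_reference
    # Proposed fix: if the user passed a local directory, resolve the
    # dynamic module from that directory instead of the Hub id derived
    # from class_reference, so no network access is attempted.
    if os.path.isdir(pretrained_model_name_or_path):
        repo_id = pretrained_model_name_or_path
    return repo_id, class_name
```

With this change, a call like resolve_repo_id("THUDM/chatglm3-6b--tokenization_chatglm.ChatGLMTokenizer", "/home/xx/chatglm3-6b") would return the local directory as repo_id whenever that directory exists.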

  1. The server is in a state where it cannot connect to the network.
  2. Cloned https://huggingface.co/THUDM/chatglm3-6b to the proxy machine and copied it from the proxy machine to the directory /home/xx/chatglm3-6b.
  3. Ran the following code:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('/home/xx/chatglm3-6b', trust_remote_code=True)
  4. Encountered the following error:
Could not locate the tokenization_chatglm.py inside THUDM/chatglm3-6b.
Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
  File "{}/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "{}/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 490, in _make_request
    raise new_e
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    self._validate_conn(conn)
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
    conn.connect()
  File "{}/lib/python3.10/site-packages/urllib3/connection.py", line 615, in connect
    self.sock = sock = self._new_conn()
  File "{}/lib/python3.10/site-packages/urllib3/connection.py", line 211, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fefc286a7a0>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "{}/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/chatglm3-6b/resolve/main/tokenization_chatglm.py (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefc286a7a0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
  File "{}/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 395, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "{}/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "{}/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "{}/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 66, in send
    return super().send(request, *args, **kwargs)
  File "{}/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/chatglm3-6b/resolve/main/tokenization_chatglm.py (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefc286a7a0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: a2a5cb2f-dfdd-4747-aad0-fe648d2bfc70)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
    resolved_file = hf_hub_download(
  File "{}/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1826, in _raise_on_head_call_error
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "{}/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 871, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
  File "{}/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 505, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
  File "{}/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 308, in get_cached_module_file
    resolved_module_file = cached_file(
  File "{}/lib/python3.10/site-packages/transformers/utils/hub.py", line 445, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like THUDM/chatglm3-6b is not the path to a directory containing a file named tokenization_chatglm.py.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
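
A possible interim workaround (an assumption on my part, and it may or may not sidestep this particular bug) is to put huggingface_hub into offline mode via the documented HF_HUB_OFFLINE environment variable, so that network lookups are skipped and only local files and the cache are consulted:

```python
import os

# Force huggingface_hub into offline mode before importing transformers;
# network lookups are then skipped in favor of local files and the cache.
os.environ["HF_HUB_OFFLINE"] = "1"

# The original call, with local_files_only=True as an extra hint
# (commented out here because it requires the model directory to exist):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(
#     "/home/xx/chatglm3-6b", trust_remote_code=True, local_files_only=True
# )
```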

Expected behavior

The tokenizer should be loaded correctly from the local directory, without any network access.

@ArthurZucker
Collaborator

cc @itazap can you have a look?

@itazap
Contributor

itazap commented Jul 11, 2024

Hello @pppppkun ! I'm not able to reproduce this issue, I cloned the repo and copied it to a local folder ('home/chatglm3-6b') and it correctly accesses it without network with 'home/chatglm3-6b' or an absolute path to this folder. I'm not sure if the \ is the problem or perhaps I misunderstood your issue! Let me know please! 🤗


github-actions bot commented Aug 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
