
Error when using AutoTokenizer to load local files without network #31712

Closed
2 of 4 tasks
pppppkun opened this issue Jun 29, 2024 · 3 comments

@pppppkun

System Info

  • transformers version: 4.42.3
  • Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
  • Python version: 3.10.14
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.31.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA A100-PCIE-40GB

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Here are the results of my analysis and the corresponding steps to reproduce:

I examined the stack trace in step 4 and believe the issue stems from line 505 of transformers/dynamic_module_utils.py, in the get_class_from_dynamic_module function: the first argument, repo_id, is set incorrectly when calling get_cached_module_file. It should be the pretrained_model_name_or_path parameter (in my case, /home/xx/chatglm3-6b), but it instead receives THUDM/chatglm3-6b--tokenization_chatglm.ChatGLMTokenizer, as assigned on line 497.

I believe the logic in lines 496-499 needs adjustment: when pretrained_model_name_or_path is a local file path, repo_id should be set directly to pretrained_model_name_or_path. Whether or not my analysis is correct, I would like to fix this issue myself and contribute the patch.
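The adjustment described above could be sketched roughly as follows. This is a hypothetical, simplified stand-in for the real transformers code: the function name resolve_repo_id and its exact inputs are my own illustration, not the library's API.

```python
import os

def resolve_repo_id(class_reference: str, pretrained_model_name_or_path: str):
    """Hypothetical, simplified sketch of the repo_id selection inside
    get_class_from_dynamic_module (not the actual transformers source)."""
    if "--" in class_reference:
        # e.g. "THUDM/chatglm3-6b--tokenization_chatglm.ChatGLMTokenizer"
        repo_id, class_name = class_reference.split("--")
    else:
        repo_id, class_name = pretrained_model_name_or_path, class_reference
    # Proposed fix: if the user passed a local directory, resolve the
    # dynamic module from that directory instead of the Hub id derived
    # from class_reference, so no network access is attempted.
    if os.path.isdir(pretrained_model_name_or_path):
        repo_id = pretrained_model_name_or_path
    return repo_id, class_name
```

With this change, a call like resolve_repo_id("THUDM/chatglm3-6b--tokenization_chatglm.ChatGLMTokenizer", "/home/xx/chatglm3-6b") would return the local directory as repo_id whenever that directory exists.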

  1. The server is in a state where it cannot connect to the network.
  2. Cloned https://huggingface.co/THUDM/chatglm3-6b to the proxy machine and copied it from the proxy machine to the directory /home/xx/chatglm3-6b.
  3. Ran the following code:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('/home/xx/chatglm3-6b', trust_remote_code=True)
  4. Encountered the following error:
Could not locate the tokenization_chatglm.py inside THUDM/chatglm3-6b.
Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
  File "{}/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "{}/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 490, in _make_request
    raise new_e
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    self._validate_conn(conn)
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
    conn.connect()
  File "{}/lib/python3.10/site-packages/urllib3/connection.py", line 615, in connect
    self.sock = sock = self._new_conn()
  File "{}/lib/python3.10/site-packages/urllib3/connection.py", line 211, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fefc286a7a0>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "{}/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/chatglm3-6b/resolve/main/tokenization_chatglm.py (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefc286a7a0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
  File "{}/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 395, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "{}/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "{}/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "{}/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 66, in send
    return super().send(request, *args, **kwargs)
  File "{}/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/chatglm3-6b/resolve/main/tokenization_chatglm.py (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefc286a7a0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: a2a5cb2f-dfdd-4747-aad0-fe648d2bfc70)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
    resolved_file = hf_hub_download(
  File "{}/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1826, in _raise_on_head_call_error
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "{}/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 871, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
  File "{}/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 505, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
  File "{}/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 308, in get_cached_module_file
    resolved_module_file = cached_file(
  File "{}/lib/python3.10/site-packages/transformers/utils/hub.py", line 445, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like THUDM/chatglm3-6b is not the path to a directory containing a file named tokenization_chatglm.py.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
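
A possible interim workaround (an assumption on my part, and it may or may not sidestep this particular bug) is to put huggingface_hub into offline mode via the documented HF_HUB_OFFLINE environment variable, so that network lookups are skipped and only local files and the cache are consulted:

```python
import os

# Force huggingface_hub into offline mode before importing transformers;
# network lookups are then skipped in favor of local files and the cache.
os.environ["HF_HUB_OFFLINE"] = "1"

# The original call, with local_files_only=True as an extra hint
# (commented out here because it requires the model directory to exist):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(
#     "/home/xx/chatglm3-6b", trust_remote_code=True, local_files_only=True
# )
```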

Expected behavior

The tokenizer should be loaded correctly from the local directory, without any network access.

@ArthurZucker
Collaborator

cc @itazap can you have a look?

@itazap
Contributor

itazap commented Jul 11, 2024

Hello @pppppkun ! I'm not able to reproduce this issue, I cloned the repo and copied it to a local folder ('home/chatglm3-6b') and it correctly accesses it without network with 'home/chatglm3-6b' or an absolute path to this folder. I'm not sure if the \ is the problem or perhaps I misunderstood your issue! Let me know please! 🤗


github-actions bot commented Aug 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
