NonMatchingChecksumError when loading cnn_dailymail dataset in Google Colab #8981

Open
Tkag0001 opened this issue Nov 30, 2024 · 1 comment
Labels
bug Something isn't working

Comments


Tkag0001 commented Nov 30, 2024

Description:
I encountered a NonMatchingChecksumError while loading the cnn_dailymail dataset with TensorFlow Datasets in Google Colab. The error indicates that the checksum of one of the downloaded files does not match the expected checksum.
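Conceptually, the failing check compares the size and SHA-256 digest of a downloaded file against values recorded for its URL. The sketch below is a simplified, hypothetical illustration of that kind of check using only the Python standard library; it is not TFDS's actual implementation, and `sha256_of_file` and `verify_download` are names chosen for this example.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_size: int, expected_sha256: str) -> None:
    """Hypothetical check: raise ValueError if the size or digest differs from the expected values."""
    size = os.path.getsize(path)
    checksum = sha256_of_file(path)
    if size != expected_size or checksum != expected_sha256:
        raise ValueError(
            f"wrong checksum for {path}: expected size={expected_size}, "
            f"sha256={expected_sha256}; got size={size}, sha256={checksum}"
        )
```

A mismatch in either value is enough to reject the file, which is why a truncated or substituted download fails even when the URL itself is correct.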

Environment information
Operating System: Google Colab

Python Version: 3.10.12

TensorFlow Datasets Version: 4.9.7

TensorFlow Version: 2.17.1

I also upgraded tfds-nightly, but the error still occurs.

About the code

import tensorflow_datasets as tfds

# Load the latest version of the cnn_dailymail dataset.
data, info = tfds.load(name='cnn_dailymail', with_info=True)
print(info)

I also tried pinning the dataset version, but that did not fix the error.

import tensorflow_datasets as tfds

# Load version 3.4.0 of the cnn_dailymail dataset.
data, info = tfds.load(name='cnn_dailymail:3.4.0', with_info=True)
print(info)

Link to the Colab notebook

About the log

Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /root/tensorflow_datasets/cnn_dailymail/3.4.0...
Dl Completed...:  80% 4/5 [00:00<00:00,  8.02 url/s]
Dl Size...: 100% 50967909/50967909 [00:00<00:00, 127601440.78 MiB/s]
Extraction completed...: 0/0 [00:00<?, ? file/s]
---------------------------------------------------------------------------
NonMatchingChecksumError                  Traceback (most recent call last)
<ipython-input-18-b60d40b51be8> in <cell line: 2>()
      1 # Load the latest version of the cnn_dailymail dataset.
----> 2 data, info = tfds.load(name='cnn_dailymail', with_info=True)
      3 print(info)

19 frames
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/download/download_manager.py in _register_or_validate_checksums(self, url, url_info, path)
    521             'https://www.tensorflow.org/datasets/overview#fixing_nonmatchingchecksumerror'
    522         )
--> 523         raise NonMatchingChecksumError(msg)
    524 
    525   def _is_checksum_registered(self, url: str) -> bool:

NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ, downloaded to /root/tensorflow_datasets/downloads/cnn_dailymail/ucexport_download_id_0BwmD_VLjROrfTHk4NFg2SndKG8BdJPpt2iRo6Dpzz23CByJuAePEilB-pxbcBCHaWDs.tmp.49d5b8b3b580447995507b4fcf0d1179/download, has wrong checksum:
* Expected: UrlInfo(size=151.23 MiB, checksum='e8fbc0027e54e0a916abd9c969eb35f708ed1467d7ef4e3b17a56739d65cb200', filename='cnn_stories.tgz')
* Got: UrlInfo(size=2.36 KiB, checksum='99bb99a237f4825f8282a9dd5a53aa7d11973e2e68123493711c93ded42f2673', filename='download')
To debug, see: https://www.tensorflow.org/datasets/overview#fixing_nonmatchingchecksumerror
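In this log the "Got" artifact is only 2.36 KiB and is named `download`, far smaller than the expected 151.23 MiB `cnn_stories.tgz`. One plausible explanation (an assumption, not confirmed in this thread) is that Google Drive returned a small HTML page, such as a warning or quota notice, instead of the archive. A quick way to test that hypothesis is to inspect the file's leading bytes: a gzip stream, including a `.tgz` archive, starts with the bytes `0x1f 0x8b`. The helper `sniff_download` below is hypothetical and written only for this illustration.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream, including .tgz archives


def sniff_download(path: str, n: int = 512) -> str:
    """Classify a downloaded file by its leading bytes: 'gzip', 'html', or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(n)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    # HTML pages usually begin, after optional whitespace, with a doctype or an <html> tag.
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html"
    return "unknown"
```

To use it, pass the path shown after "downloaded to" in the error message, if that temporary file still exists; a result of `"html"` would support the hypothesis above.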

Expected behavior
I expected the cnn_dailymail dataset to download and load without errors.

Tkag0001 added the bug (Something isn't working) label on Nov 30, 2024

4 participants
@fylux @Tkag0001 and others