CroissantBuilder does not work on Windows machines #5546
Comments
Hey @zwouter, thanks a lot for opening the issue! I don't have access to a Windows machine. Can you help us investigate? From the logs, it seems to come from
For some reason, it tries to load the
import mlcroissant as mlc
url = "http://huggingface.co/api/datasets/fashion_mnist/croissant"
ds = mlc.Dataset(url)
for x in ds.records(record_set="fashion_mnist"):
    print(x)
Thanks!
Hi @marcenacp, thanks for the reply! I have the latest versions of mlcroissant and tfds-nightly installed; I created a new virtual environment yesterday to test this. That piece of code does not print anything when I run it on Windows.
Weird! Can you please try to delete all local caches? (Caches are located in |
Yes, I just deleted the relevant caches; same results.
Sorry, that was a blind guess, as I cannot reproduce what happens on Windows. Could you please help us understand why the following snippet doesn't print anything?
import mlcroissant as mlc
url = "http://huggingface.co/api/datasets/fashion_mnist/croissant"
ds = mlc.Dataset(url)
for x in ds.records(record_set="fashion_mnist"):
    print(x)
    break
You can install mlcroissant in dev mode:
pip uninstall mlcroissant
git clone https://github.com/mlcommons/croissant
cd croissant/python/mlcroissant
pip install -e .[dev]
Adding prints/debug points in
For each
Thanks in advance for your help and contribution!
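As a starting point for that investigation, here is a minimal sketch of how the silent loop could be instrumented. It only tells apart "records() yielded nothing" from "records() yielded something", and it prints the platform so that Windows and WSL runs can be compared. The helper name `probe_records` and the wrapper `main` are hypothetical and not part of mlcroissant; the only mlcroissant calls used are the ones from the snippet above. Whether mlcroissant emits anything through the standard `logging` module is an assumption.

```python
import logging
import platform
from typing import Any, Iterable, Optional, Tuple


def probe_records(records: Iterable[Any]) -> Tuple[int, Optional[Any]]:
    """Pull at most one record and report what happened (hypothetical helper)."""
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        # The iterator ended without yielding: this is the "prints nothing" case.
        return 0, None
    return 1, first


def main() -> None:
    # Imported here so probe_records can be used without mlcroissant installed.
    import mlcroissant as mlc

    # Assumption: mlcroissant or its dependencies log via the standard logging module.
    logging.basicConfig(level=logging.DEBUG)
    print("platform:", platform.system(), platform.python_version())

    url = "http://huggingface.co/api/datasets/fashion_mnist/croissant"
    ds = mlc.Dataset(url)
    count, first = probe_records(ds.records(record_set="fashion_mnist"))
    if count == 0:
        print("records() yielded nothing")
    else:
        print("first record:", first)
```

Calling `main()` on both Windows and WSL and comparing the output would at least show whether the record iterator is empty on Windows or whether an exception is raised before any record is produced.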
No problem, I'm happy with any help I can get :) And thanks for the resources! Unfortunately, I don't think I have the time to completely debug this right now. |
Short description
When using a simple example code snippet of the CroissantBuilder to load datasets using the croissant format, it only seems to work on Linux.
The code snippet below correctly downloads and prepares a dataset on Colab or WSL, but results in an error on Windows. Everything was tested in a clean virtual environment.
Environment information
Operating System: Windows 11
Python version: 3.11.1
tensorflow-datasets/tfds-nightly version: tfds-nightly 4.9.6.dev202408050044
tensorflow/tf-nightly version: tensorflow 2.17.0
Does the issue still exist with the last tfds-nightly package (pip install --upgrade tfds-nightly)? Yes
Reproduction instructions
Link to logs
https://pastebin.com/fRrfn8jj
Expected behavior
A dataset builder is prepared such that I can use .as_data_source() later.