Using load_dataset to load imagenet-1K returns an empty dataset #7139

Open
fscdc opened this issue Sep 5, 2024 · 1 comment

fscdc commented Sep 5, 2024

Describe the bug

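import os

from datasets import load_dataset
from torchvision.transforms import (CenterCrop, Compose, RandomHorizontalFlip,
                                    RandomResizedCrop, Resize, ToTensor)


def get_dataset(data_path, train_folder="train", val_folder="val"):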
    traindir = os.path.join(data_path, train_folder)
    valdir = os.path.join(data_path, val_folder)

    def transform_val_examples(examples):
        transform = Compose([
            Resize(256),
            CenterCrop(224),
            ToTensor(),
        ])
        examples["image"] = [transform(image.convert("RGB")) for image in examples["image"]]
        return examples

    def transform_train_examples(examples):
        transform = Compose([
            RandomResizedCrop(224),
            RandomHorizontalFlip(),
            ToTensor(),
        ])
        examples["image"] = [transform(image.convert("RGB")) for image in examples["image"]]
        return examples

    # @fengsicheng: Loading from local folders is very slow for a large dataset like ImageNet-1K (but it avoids network problems)
    # train_set = load_dataset("imagefolder", data_dir=traindir, num_proc=4)
    # test_set = load_dataset("imagefolder", data_dir=valdir, num_proc=4)

    train_set = load_dataset("imagenet-1K", split="train", trust_remote_code=True)
    test_set = load_dataset("imagenet-1K", split="test", trust_remote_code=True)

    print(train_set["label"])

    train_set.set_transform(transform_train_examples)
    test_set.set_transform(transform_val_examples)

    return train_set, test_set

That is the code above, but the output of the print is a list of None:
[image]

Steps to reproduce the bug

  1. Run the code above.
  2. Look at the printed output.
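
Printing the whole label column of a dataset this size is hard to read. A minimal sketch of a summary helper is shown here; `summarize_labels` is a hypothetical name, not part of `datasets`, and it only assumes that `train_set["label"]` can be iterated as a sequence of values.

```python
from collections import Counter


def summarize_labels(labels, top=5):
    """Return (total, missing, most_common) for a sequence of labels.

    `missing` counts entries that are None; `most_common` lists the
    `top` most frequent non-None labels with their counts.
    """
    labels = list(labels)
    missing = sum(1 for label in labels if label is None)
    counts = Counter(label for label in labels if label is not None)
    return len(labels), missing, counts.most_common(top)


# Hypothetical usage inside get_dataset, replacing the raw print:
#   total, missing, common = summarize_labels(train_set["label"])
#   print(f"{missing} of {total} labels are None; most common: {common}")
```

If `missing` equals `total`, every label is None, which would match the symptom described in this report.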

Expected behavior

I do not know how to fix this. Can anyone help? This is urgent for me.

Environment info

  • datasets version: 2.21.0
  • Platform: Linux-5.4.0-190-generic-x86_64-with-glibc2.31
  • Python version: 3.10.14
  • huggingface_hub version: 0.24.6
  • PyArrow version: 17.0.0
  • Pandas version: 2.2.2
  • fsspec version: 2024.6.1

the-silent-geek commented

imagenet-1k is a gated dataset, which means you have to agree to share your contact information before you can access it. Have you done this yet? Once you have, run the login command below, sign in with your user token when prompted (you can find the token in your Hugging Face account settings), and then load the dataset:

huggingface-cli login
train_set = load_dataset('imagenet-1k', split='train', use_auth_token=True)
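
The same authenticated load can be sketched as a small Python helper. This assumes a `datasets` release that accepts the `token` argument, which, as far as I know, is the newer name for `use_auth_token`. `gated_load_kwargs` and `load_gated_split` are hypothetical names used only for illustration.

```python
def gated_load_kwargs(split, token=True):
    """Build keyword arguments for datasets.load_dataset on a gated repo.

    token=True asks the library to reuse the token stored by
    `huggingface-cli login`; a token string can be passed instead.
    """
    return {"split": split, "token": token}


def load_gated_split(repo_id, split):
    """Load one split of a gated dataset after access has been granted."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, **gated_load_kwargs(split))


# Hypothetical usage, after accepting the terms on the dataset page
# and running `huggingface-cli login`:
#   train_set = load_gated_split("imagenet-1k", "train")
```

If the goal is evaluation with labels, the `validation` split may be the one to load rather than `test`; to my knowledge the Hub copy's `test` split does not carry ground-truth labels.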
