Use load_dataset to load imagenet-1K but find an empty dataset #7139

fscdc · 2024-09-05T15:12:22Z

Describe the bug

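import os

from datasets import load_dataset
from torchvision.transforms import (
    CenterCrop,
    Compose,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)


def get_dataset(data_path, train_folder="train", val_folder="val"):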
    traindir = os.path.join(data_path, train_folder)
    valdir = os.path.join(data_path, val_folder)

    def transform_val_examples(examples):
        transform = Compose([
            Resize(256),
            CenterCrop(224),
            ToTensor(),
        ])
        examples["image"] = [transform(image.convert("RGB")) for image in examples["image"]]
        return examples

    def transform_train_examples(examples):
        transform = Compose([
            RandomResizedCrop(224),
            RandomHorizontalFlip(),
            ToTensor(),
        ])
        examples["image"] = [transform(image.convert("RGB")) for image in examples["image"]]
        return examples

    # @fengsicheng: This approach is very slow for a large dataset like ImageNet-1K (but it avoids network problems by using a local copy)
    # train_set = load_dataset("imagefolder", data_dir=traindir, num_proc=4)
    # test_set = load_dataset("imagefolder", data_dir=valdir, num_proc=4)

    train_set = load_dataset("imagenet-1K", split="train", trust_remote_code=True)
    test_set = load_dataset("imagenet-1K", split="test", trust_remote_code=True)

    print(train_set["label"])

    train_set.set_transform(transform_train_examples)
    test_set.set_transform(transform_val_examples)

    return train_set, test_set

The code is above, but the output of the print call is a list of None values:

Steps to reproduce the bug

Run the code above.
Look at the printed output.
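
Printing the whole label column is hard to read for a dataset of this size. As a rough sketch, a small helper such as the one below can summarize how many labels are missing. The name `summarize_labels` is hypothetical and not part of `datasets`; it only assumes the column can be read as a list of integers or None, for example `train_set["label"]`.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def summarize_labels(labels: Iterable[Optional[int]]) -> Dict[str, int]:
    """Count total, missing (None), and distinct non-missing labels."""
    labels = list(labels)
    missing = sum(1 for label in labels if label is None)
    counts = Counter(label for label in labels if label is not None)
    return {"total": len(labels), "missing": missing, "distinct": len(counts)}


# Example with a toy column; with the real data this would be
# summarize_labels(train_set["label"]).
print(summarize_labels([None, None, 3, 3, 7]))
```

If `missing` equals `total`, every label is None, which matches the symptom described above.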

Expected behavior

I do not know how to fix this. Can anyone help? This is urgent for me.

Environment info

datasets version: 2.21.0
Platform: Linux-5.4.0-190-generic-x86_64-with-glibc2.31
Python version: 3.10.14
huggingface_hub version: 0.24.6
PyArrow version: 17.0.0
Pandas version: 2.2.2
fsspec version: 2024.6.1

the-silent-geek · 2024-09-30T10:50:50Z

imagenet-1k is a gated dataset, which means you have to agree to share your contact information before you can access it. Have you done this yet? Once you have, run the command below and sign in with your user token (you can find it in your Hugging Face account settings) when prompted:

huggingface-cli login
train_set = load_dataset('imagenet-1k', split='train', token=True)
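
If an interactive login is not convenient (for example on a training cluster), the same idea can be sketched in Python. This is only an illustration under a few assumptions: the helper names `resolve_hf_token` and `load_imagenet_split` are made up for this example, the token is assumed to be available in the `HF_TOKEN` environment variable, and access to the gated dataset must already have been granted on the Hub. Extra keyword arguments such as `trust_remote_code=True` from the original snippet are passed through unchanged.

```python
import os
from typing import Optional


def resolve_hf_token(explicit: Optional[str] = None) -> Optional[str]:
    """Return the explicit token if given, else HF_TOKEN from the environment, else None."""
    token = explicit or os.environ.get("HF_TOKEN")
    if token and token.strip():
        return token.strip()
    return None


def load_imagenet_split(split: str, token: Optional[str] = None, **kwargs):
    """Load one split of the gated dataset (needs network access and accepted terms)."""
    # Imported lazily so resolve_hf_token can be used without `datasets` installed.
    from datasets import load_dataset

    # token=True falls back to the credentials stored by `huggingface-cli login`.
    return load_dataset(
        "imagenet-1k",
        split=split,
        token=resolve_hf_token(token) or True,
        **kwargs,
    )
```

A call would then look like `train_set = load_imagenet_split("train", trust_remote_code=True)`.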
