ffcv imagenet won't start #376

Open
VarusJ opened this issue May 20, 2024 · 1 comment

Comments


VarusJ commented May 20, 2024

Hi! I am training a ResNet-50 on ImageNet with FFCV. Here is my config:
[image]

I am having trouble getting training to start, as shown here:
[image]
It is stuck at 0 and never starts. I checked that CUDA is working fine. Please help! Thanks a lot!


VarusJ commented May 20, 2024

I set some breakpoints, and it turns out that nothing is ever yielded by the for loop over the FFCV loader:

for ix, (images, target) in enumerate(train_loader): .....
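
A loop that never runs its body can mean two different things: the loader is empty (iteration ends immediately) or the loader hangs before producing its first batch. The sketch below is one way to tell them apart, using only the standard library. `first_batch_or_timeout` is a hypothetical helper, not part of FFCV, and it assumes the loader can safely be iterated from a background thread.

```python
import threading


def first_batch_or_timeout(loader, timeout_s=60.0):
    """Try to pull one batch from `loader`.

    Returns ("ok", batch), ("empty", None) or ("timeout", None).
    Exceptions raised while fetching the batch are re-raised here.
    """
    result = {}

    def worker():
        try:
            result["batch"] = next(iter(loader))
            result["status"] = "ok"
        except StopIteration:
            # Iteration finished without producing anything.
            result["status"] = "empty"
        except Exception as exc:  # surface loader errors to the caller
            result["status"] = "error"
            result["error"] = exc

    # Daemon thread: a hung loader will not block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        return "timeout", None
    if result["status"] == "error":
        raise result["error"]
    return result["status"], result.get("batch")
```

Calling `first_batch_or_timeout(train_loader, 120.0)` before the training loop would report `"empty"` if the loader has no batches at all and `"timeout"` if it is stuck, which points to different causes.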

I define the train_loader as follows:

    def create_train_loader(self, train_dataset, num_workers, batch_size,
                            distributed, in_memory):
        this_device = f'cuda:{self.gpu}'
        train_path = Path(train_dataset)
        assert train_path.is_file()

        res = self.get_resolution(epoch=0)
        self.decoder = RandomResizedCropRGBImageDecoder((res, res))
        gaussian_kernel_size = 5
        sigma = 2
        image_pipeline: List[Operation] = [
            self.decoder,
            RandomHorizontalFlip(),
            ToTensor(),
            transforms.RandomApply([transforms.GaussianBlur(gaussian_kernel_size, sigma)], p=0.5),
            ToDevice(ch.device(this_device), non_blocking=True),
            ToTorchImage(),
            NormalizeImage(IMAGENET_MEAN, IMAGENET_STD, np.float16)
        ]

        label_pipeline: List[Operation] = [
            IntDecoder(),
            ToTensor(),
            Squeeze(),
            ToDevice(ch.device(this_device), non_blocking=True)
        ]

        order = OrderOption.RANDOM if distributed else OrderOption.QUASI_RANDOM
        loader = Loader(train_dataset,
                        batch_size=batch_size,
                        num_workers=num_workers,
                        order=order,
                        os_cache=in_memory,
                        drop_last=True,
                        pipelines={
                            'image': image_pipeline,
                            'label': label_pipeline
                        },
                        distributed=distributed)
        
        print("loader: ", loader)

        return loader
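
One cheap thing to rule out, given `drop_last=True` in the loader above: when the batch size is larger than the number of samples a process sees, a loader that drops the last partial batch has zero batches, so the loop body never runs. Below is a rough sketch of that arithmetic. `expected_batches` is a hypothetical helper, and the even split across `world_size` processes is an assumption, not FFCV's exact sharding logic.

```python
def expected_batches(num_samples, batch_size, world_size=1, drop_last=True):
    """Approximate number of batches per process per epoch.

    Assumes samples are split evenly across `world_size` processes
    (an approximation; the real sharding may round differently).
    """
    if batch_size <= 0 or world_size <= 0:
        raise ValueError("batch_size and world_size must be positive")
    per_process = num_samples // world_size
    if drop_last:
        # Partial final batch is discarded.
        return per_process // batch_size
    # Partial final batch is kept: ceiling division.
    return -(-per_process // batch_size)


if __name__ == "__main__":
    # Illustrative numbers only: a small subset with a large batch size.
    print(expected_batches(num_samples=500, batch_size=512))                   # 0
    print(expected_batches(num_samples=500, batch_size=512, drop_last=False))  # 1
```

If the loader supports `len()`, printing `len(train_loader)` next to the existing `print("loader: ", loader)` is a quick way to compare against this estimate.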

I could really use some insight!
