-
Hi @numpee. We plan on adding `QUASI_RANDOM` support to distributed mode soon, hopefully. However, in your situation nothing forces you to use these parameters if you don't have the resources required to do so (some machines have 700GB of RAM, but I assume yours doesn't). You can simply use JPEG, or a combination of JPEG+RAW, so that the dataset fits in your RAM (I have no idea how much RAM you have). The Benchmark section of our website has a paragraph that lists different settings and their respective ImageNet sizes.
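The JPEG+RAW mix mentioned above is controlled at write time. A minimal sketch of what that could look like with FFCV's `DatasetWriter` and `RGBImageField` — the path, resolution, and probability values here are placeholders, not recommendations:

```python
from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField

# 'proportion' mode stores each image as raw pixels with probability
# compress_probability, and as JPEG otherwise, trading file size for
# decode speed. All numeric values below are illustrative placeholders.
writer = DatasetWriter('/path/to/imagenet.beton', {
    'image': RGBImageField(write_mode='proportion',
                           max_resolution=512,
                           compress_probability=0.5,
                           jpeg_quality=90),
    'label': IntField(),
}, num_workers=16)

# my_dataset is any map-style dataset yielding (image, label) pairs
writer.from_indexed_dataset(my_dataset)
```

Raising `compress_probability` toward 1.0 (more RAW) grows the file; lowering it (more JPEG) shrinks it, which is how you can tune the `.beton` size to fit your RAM.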
-
Hi there,
I'm trying out FFCV for the first time and I've been writing the ImageNet dataset with the writing parameters listed here. It seems that at around 50k images, the writer uses around 27GB of storage. Assuming a linear scaling, I expect the ImageNet file to be ~700GB.
But is it ever feasible to load 700GB of data onto the RAM?
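The linear extrapolation above can be checked with quick arithmetic (using the standard ImageNet-1k training set size of 1,281,167 images):

```python
# Linear extrapolation of .beton file size from a partial write.
gb_per_image = 27 / 50_000           # observed: ~27GB for ~50k images
imagenet_train_images = 1_281_167    # ImageNet-1k training set size
estimated_gb = gb_per_image * imagenet_train_images
print(f"~{estimated_gb:.0f} GB")     # ≈ 692 GB, consistent with the ~700GB estimate
```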
According to the tuning guide, `os_cache` should always be set to `True` for distributed training, since only the `RANDOM` ordering is supported in distributed mode. Does that mean that if I don't have a machine that can fit the ImageNet dataset into RAM, I can't use FFCV (+ distributed)?

Thanks,