Hello.
I am trying to generate multiple compressed models using NNCF. I train on CPU and have access to many cores. Running the classification example with ResNet18 on CIFAR100 works best when the runs are executed sequentially: when I launch several instances with different configuration files, the performance of each individual run is terrible.
Why do the runs interfere with each other and degrade overall performance? All of them use the same dataset, and I keep only one copy of it. Could it be that training holds locks on the dataset files, so that only one run can access the data at any point in time?
I have tried different numbers of concurrently executed runs, and it does not seem to make much difference: starting from two running processes, I observe a noticeable performance loss in the individual runs.
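For reference, a minimal sketch of one way to launch such runs with an explicit per-process thread cap (the config file names and the main.py invocation are placeholders, not the exact NNCF sample command). If each PyTorch process sizes its OpenMP/MKL thread pools to all visible cores, concurrent runs oversubscribe the CPU, and capping the pools per process should change the picture:

```python
import os
import subprocess

# Placeholder NNCF config files, one per compressed model to train.
configs = ["int8.json", "sparsity.json", "pruning.json"]

cores = os.cpu_count() or 1
threads_per_run = max(1, cores // len(configs))

procs = []
for cfg in configs:
    env = os.environ.copy()
    # Cap the thread pools this child process may create. Without a cap,
    # each PyTorch/OpenMP runtime assumes it owns every core, and two or
    # more concurrent runs oversubscribe the CPU and thrash the caches.
    env["OMP_NUM_THREADS"] = str(threads_per_run)
    env["MKL_NUM_THREADS"] = str(threads_per_run)
    procs.append(subprocess.Popen(
        ["python", "main.py", "--config", cfg],  # illustrative command
        env=env,
    ))

# Wait for all training runs to finish.
for p in procs:
    p.wait()
```

Pinning each process to a disjoint core set (e.g. with taskset) would be a stricter variant of the same idea.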
Any leads will be appreciated. Thanks in advance!
Replies: 1 comment

Greetings, @i3abghany! Is this effect compression-specific? Does it reproduce if you run training via the classification example without the […]
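To make that control experiment concrete, a plain PyTorch training loop with no NNCF in the stack could look roughly like the sketch below (the model setup, transforms, and hyperparameters are illustrative, not the sample's actual settings). If several instances of this also slow one another down, the interference is not compression-specific:

```python
import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

# Plain ResNet18 on CIFAR100 with no NNCF involvement, as a control run.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])
train_set = torchvision.datasets.CIFAR100(root="./data", train=True,
                                          download=True, transform=transform)
loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(2):  # a couple of epochs is enough to compare throughput
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```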