Fixing pickle errors to allow for multiprocessing #37

dmklee · 2023-03-06T14:39:46Z

When training on multiple GPU's using ddp, there is an error with pickling escnn models. The solution was pointed out QUVA-Lab/e2cnn/pull/69 for the e2cnn library so I made similar modifications to escnn. This was discussed in issue #16.

I made a basic test file, test_pickling.py and I have fixed the problem for C(n) and D(n) on R2. I am happy to extend it to other groups if you want. The pickling error occurs because escnn uses local functions which cannot be pickled. The fix is to convert local functions to the __call__ methods of a class.

Signed-off-by: David Klee <klee.d@northeastern.edu>

codingS3b · 2023-07-17T09:32:49Z

This definitely helped me when using escnn together with pytorch lightning which, when saving checkpoints, also relied on pickling. Using your fork I was finally able to reload my trained lightning models, which failed before. Thanks!

Gabri95 · 2023-07-17T17:06:01Z

Hey!

Sorry for not having merged your PR yet, but I am not fully convinced that this is the best way to go at the moment.
In particular, by following this strategy, I should replace most functions to deal with subgroups and representations with new classes. This seems an overhead and it would introduce some redundancy.
Moreover, using lambda expression won't be possible anymore.

However, at the end, one needs to only pickle classes from escnn.nn, not from the other subpackages.
Objects like groups, irreps and gspaces don't really have a state and can just be regenerated from some ids.
That means that, theoretically, they don't need to be completely pickled, but one needs to only store these ids.

I had already implemented a similar trick for groups, that's what the static _generator() method and the _keys() property are for. You can see them used for caching purpose here.

Now, the question for me is how to integrate this with pickling.
I read here that one can customize how a class is being pickled via the __getstate__() and __setstate__() methods, but I need to understand this a little better to make sure it actually solves all the problems here before refactoring everything.

Please, feel free to comment on this or to give me some advice if you have some experience with this.

Thanks,
Gabriele

fixed pickling issue for r2

e3c3c79

Signed-off-by: David Klee <klee.d@northeastern.edu>

kalekundert mentioned this pull request Aug 28, 2023

Make groups, irreps, and gspaces into stateless, picklable singletons #78

Open

kalekundert mentioned this pull request Feb 21, 2024

Saving and loading escnn models #1

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing pickle errors to allow for multiprocessing #37

Fixing pickle errors to allow for multiprocessing #37

dmklee commented Mar 6, 2023

codingS3b commented Jul 17, 2023

Gabri95 commented Jul 17, 2023

Fixing pickle errors to allow for multiprocessing #37

Are you sure you want to change the base?

Fixing pickle errors to allow for multiprocessing #37

Conversation

dmklee commented Mar 6, 2023

codingS3b commented Jul 17, 2023

Gabri95 commented Jul 17, 2023