Documentation Suggestion #233
-
Hi guys, I'm trying to implement a WGAN-GP in Flax. I got it working, but I spent a few hours hunting for a bug that turned out to be me misunderstanding the library, so I figured I'd write you a note so you can include this gotcha in the documentation. The problem was that after applying gradients through an optimizer, I assumed my original copy of the model (in this case the generator) could be called on some inputs and would contain the updated weights. Only at the end of the MNIST example, where it says "At any particular optimization step, …", did I realize that the updated weights live only in optimizer.target. This behavior seems very functional, which is fine, but it is certainly different from what I'd expect coming from PyTorch or TF. It would be great if that were made clearer in the documentation. Maybe the optimizer / train step interface in the example could even be changed to make it clear that the "live" model is only in optimizer.target. Thanks!
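For concreteness, here is a minimal sketch of the gotcha using the legacy flax.optim / flax.nn.Model API this thread is about (flax.optim has since been replaced by optax; model, batch, and cross_entropy_loss are illustrative placeholders, not code from the thread):

```python
import jax
import flax

# Assumes `model` is a flax.nn.Model and that `batch` and
# `cross_entropy_loss` are defined elsewhere.
optimizer = flax.optim.Adam(learning_rate=1e-4).create(model)

def loss_fn(m):
    logits = m(batch['image'])
    return cross_entropy_loss(logits, batch['label'])

grad = jax.grad(loss_fn)(optimizer.target)
new_optimizer = optimizer.apply_gradient(grad)  # returns a NEW optimizer

# Gotcha: `model` still holds the ORIGINAL weights; the updated weights
# live only in the returned optimizer's target.
updated_model = new_optimizer.target
```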
-
Oh, also: I would be happy to contribute the code for the WGAN-GP as an example if that would be useful to you guys. I am pretty happy with the quality of the code.
-
Hi @virajmehta, nice to meet you!

First of all, super awesome that you're using Flax to implement GANs -- we would love to link to your example. Per our new example policy, we avoid adding new official examples because of the long-term maintenance expectations, but we definitely want to highlight your model by linking to it prominently. Take a look at flax/examples/README.md and please do file a pull request adding a link to your code!

Per your comment on the confusion around the functional API, I think you're hitting on a good point here. Our Model wrapper is a convenience that also leads to users not being fully aware of the functional approach. One proposal is to just scrap Model altogether, o…
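For reference, "fully aware of the functional approach" here means treating parameters as plain data that is passed around explicitly rather than stored on the model object. A sketch of that style (this is roughly the parameters-as-data API that Flax Linen later adopted, not the Model API under discussion; layer sizes are illustrative):

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(128)(x))
        return nn.Dense(10)(x)

model = MLP()                                  # the module holds no parameters
x = jnp.ones((1, 784))
params = model.init(jax.random.PRNGKey(0), x)  # parameters are plain data
logits = model.apply(params, x)                # passed explicitly on every call
```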
-
Sounds great! I'll figure out how best to open-source my code and will add a PR when I've figured that out.

On the functional API comment: I think the train step of the MNIST example could make clear, either in comments or in code structure, that the "live" model exists only in optimizer.target. For example, my GAN code returns the optimizer and, explicitly and separately, optimizer.target, so that I can overwrite self.generator or self.critic (or whatever) in my training class, and I think that pattern would pair nicely with an immutable Model. I don't think it's necessary to go that far, but an update there would have spared me the confusion. Thanks!
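A sketch of the pattern described above (names like Trainer, g_optimizer, and loss_fn are hypothetical, not from my actual code):

```python
import jax

def train_step(optimizer, batch):
    grad = jax.grad(loss_fn)(optimizer.target, batch)
    optimizer = optimizer.apply_gradient(grad)
    # Returning optimizer.target separately makes it explicit that the
    # updated "live" model is the one inside the new optimizer.
    return optimizer, optimizer.target

class Trainer:
    def step(self, batch):
        # Overwrite the stored model so self.generator never goes stale.
        self.g_optimizer, self.generator = train_step(self.g_optimizer, batch)
```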