Replies: 6 comments 2 replies
-
So, we have a lot of experience with Gin and tensor2tensor (and I personally worked on Trax as well). Though Gin can be a lifesaver when trying to bring sanity to an existing codebase that needs configuration management, we find that Gin tends to infect the entire codebase with its dependency-injection approach and introduces a number of issues, e.g. adding a lot of noise to stack traces, which makes debugging much less pleasant. If our users wish to use Gin, that's great: it can be a great way of organizing hparam settings, and we really don't want to dictate how users control their training loops. We just want to avoid adding that additional complexity to our examples for now. That said, we would like to settle on a better 'canonical' strategy for hparam management in Flax. The one demonstrated in the examples is a kick-the-can-down-the-road non-answer. A subset of design questions on the table:
So: we're still trying to find the right spot in the design space that will maximize utility for our users without getting in their way or precluding their own solutions. Further comments on the issue are most welcome!
-
Ack on timing, the importance of clear docs and clear debugging, not imposing a training strategy, and built-in support vs. canonical recommendations / anti-framework. A partial metaphor from another domain is OpenWC, which offers an opinionated docs site plus a generator. On your point above about unconstrained functional sub-classing: I know exactly what you mean. The reference to t2t was just noting its feature of range or set hparams. On hierarchy, it's worth acknowledging there are two kinds to consider, i.e. hierarchical configuration of a single model vs. a hierarchical lineage of configurations (perhaps the latter was the source of complexity in the t2t configs). Your reference to yaml vs. python config in the devops world is great food for thought (linked examples); in our case it's local + same language vs. remote + potentially a different one.

Free thinking... it may be useful to shift perspective from managing a library of independent hparam configs to specifying individual experiments, an experiment being the conjunction of settings, model, and training code (perhaps plus annotations for what is tunable), e.g. with each experiment specified by its own python module (a rough sketch follows below). In this latter pattern, arbitrary hierarchy, composition, etc. are supported in the natural way of writing a program, instead of looking for a non-program DSL that can configure arbitrary programs. The in-program specification of hparams won't break these out as cleanly as e.g. the Gin approach, but you could easily write an editor plugin that extracts and displays them.

... good to better understand the problem, and clearly the design context is complex. Thank you for the interesting discussion.
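To make the "experiment as a python module" idea concrete, here is a rough, hypothetical sketch. None of these names (`OptimizerConfig`, `ExperimentConfig`, the `tunable` annotation) come from Flax or t2t; they are purely illustrative.

```python
# experiments/baseline.py -- a hypothetical experiment module: the experiment
# *is* a program, so hierarchy and composition come for free from plain Python.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OptimizerConfig:
    learning_rate: float = 1e-3
    weight_decay: float = 0.0


@dataclass(frozen=True)
class ExperimentConfig:
    optimizer: OptimizerConfig = OptimizerConfig()
    batch_size: int = 128
    # Illustrative annotation of which settings an external tuner may vary.
    tunable: tuple = ("optimizer.learning_rate",)


# A derived experiment composes the baseline instead of copying it.
baseline = ExperimentConfig()
large_batch = replace(baseline, batch_size=1024)
```

The point is that "lineage" between experiments is expressed with ordinary Python (imports and `replace`) rather than a separate configuration DSL.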
-
I don't see why Flax has anything to do with hyper-parameter configuration. Flax is a library, and hparam configuration is something that should be chosen at the application level, not imposed by a library. One might wish to use Flax in applications that already have their own way of handling hyper-parameters, for example. Now if you mean "[...] in the Flax examples", then yeah, but it shouldn't "infect" the Flax library in any way imo.
-
We certainly aren't going to infect the core system with hparam-specific concerns; we just want to make sure we support underlying mechanisms that allow people to do what they want. Since the above was written, a pattern that seems to work well is to have a model-specific dataclass holding all the hparams, which the user feeds through the layers to minimize pointless "kwarg plumbing" (a sketch follows below). The one improvement the new api revision will make, without any special hparam logic, is allowing fine-grained hparams (e.g. quantization hparams for -every- parameter) via the same mechanism that feeds parameters and stateful variables into the module tree.
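A minimal sketch of that dataclass pattern, assuming the `flax.linen` API; `TransformerConfig` and its fields are illustrative, not part of Flax itself:

```python
from dataclasses import dataclass

import flax.linen as nn
import jax
import jax.numpy as jnp


@dataclass(frozen=True)
class TransformerConfig:
    # Hypothetical hyper-parameters; names are illustrative only.
    vocab_size: int = 1000
    hidden_dim: int = 128
    num_layers: int = 2
    dropout_rate: float = 0.1


class Block(nn.Module):
    # The whole config is threaded through as a single module attribute,
    # so adding a new hparam does not require touching every call site.
    config: TransformerConfig

    @nn.compact
    def __call__(self, x, *, train: bool = False):
        cfg = self.config
        x = nn.Dense(cfg.hidden_dim)(x)
        x = nn.relu(x)
        x = nn.Dropout(rate=cfg.dropout_rate, deterministic=not train)(x)
        return x


class Model(nn.Module):
    config: TransformerConfig

    @nn.compact
    def __call__(self, x, *, train: bool = False):
        for _ in range(self.config.num_layers):
            x = Block(self.config)(x, train=train)
        return nn.Dense(self.config.vocab_size)(x)


# Usage: the dataclass is the single source of truth for the model's hparams.
cfg = TransformerConfig(hidden_dim=64)
model = Model(cfg)
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 16, 32)))["params"]
```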
-
It's also important that these parameters can change during training without recompilation, i.e. that they are treated as inputs to the graph rather than as static values baked into the compiled computation.
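A minimal JAX sketch of that distinction (function and variable names are illustrative): a traced input can change every step without recompiling, while a static argument triggers a fresh compilation per value.

```python
from functools import partial

import jax
import jax.numpy as jnp


@jax.jit
def loss_traced(params, dropout_scale):
    # dropout_scale is an ordinary (traced) input: changing its value each
    # step reuses the same compiled graph.
    return jnp.sum(params * dropout_scale)


@partial(jax.jit, static_argnums=1)
def loss_static(params, num_layers):
    # num_layers is static: every new value triggers a recompilation, but it
    # can control Python-level structure such as loop length.
    out = params
    for _ in range(num_layers):
        out = out * 2.0
    return jnp.sum(out)


params = jnp.ones((4,))
loss_traced(params, 0.1)   # compiles once
loss_traced(params, 0.2)   # no recompilation
loss_static(params, 2)     # compiles
loss_static(params, 3)     # recompiles for the new static value
```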
-
Note that in the meantime ml_collections has been open sourced (https://pypi.org/project/ml-collections/) and we have started updating the examples to use it.
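A minimal sketch of the `ml_collections.ConfigDict` pattern, assuming the `ml_collections` package; the field names are illustrative, not taken from any particular example:

```python
import ml_collections


def get_config() -> ml_collections.ConfigDict:
    config = ml_collections.ConfigDict()
    config.learning_rate = 0.1
    config.batch_size = 128
    config.num_epochs = 10
    return config


config = get_config()
config.learning_rate = 0.01   # existing fields can be overridden
config.lock()                 # once locked, setting an unknown field raises an error
```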
-
Specifying hyper-parameters in a flat/non-hierarchical form is the most natural first approach, but as models grow in complexity such flat schemes can become unwieldy: both from the human perspectives of (1) understanding / intuiting how to improve them and (2) maintaining them, and from (3) the optimization perspective of a software tuner seeking to infer what a previous success indicates about what to try next.
Gin is an example of how to implement this (as is done in practice in the Trax library). This seems like a great approach, with only minor drawbacks from my perspective, e.g. that it feels a little unnatural when developing in a notebook to specify hparams as a block of strings instead of as python objects (see the sketch below).
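A minimal sketch of that string-based style, assuming the `gin-config` package; the `train` function and its parameters are illustrative:

```python
import gin


@gin.configurable
def train(learning_rate, num_layers):
    return learning_rate, num_layers


# Hparams are specified as a block of strings, which gin binds to the
# decorated function's parameters.
gin.parse_config("""
train.learning_rate = 1e-3
train.num_layers = 4
""")

print(train())  # (0.001, 4)
```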
Another requirement beyond hierarchy to consider is a means to specify allowable hyper-parameter ranges, as was done in the tensor2tensor library, which could then be used to configure tuning on CloudML; a generic sketch of the idea follows below.
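A generic, hypothetical sketch of the range idea (this is not the tensor2tensor API): tunable ranges are declared alongside the hparams so an external tuner can consume them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HParamSpec:
    default: float
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    scale: str = "linear"  # e.g. "linear" or "log"


# Illustrative search space; a tuner would read the ranges, while training
# code would read only the defaults (or tuner-supplied overrides).
search_space = {
    "learning_rate": HParamSpec(1e-3, 1e-5, 1e-1, scale="log"),
    "dropout_rate": HParamSpec(0.1, 0.0, 0.5),
}

defaults = {name: spec.default for name, spec in search_space.items()}
```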
A counterpoint to this feature would be the perspective that Flax is meant to be lower-level than such concerns, so this kind of configuration support need not be included in the core library.