Extend factories syntax to other config files? #3751
Replies: 12 comments
-
Posting some more questions about using
and in globals.yml
The config is resolved when it is loaded, before a session is run. The dataset factory placeholders are resolved later, while the pipeline is being executed. So it isn't currently possible to do this.
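To make the ordering concrete, here is a minimal sketch of the two stages using only the Python standard library. It is not Kedro's or OmegaConf's actual implementation: `resolve_at_load`, `match_factory`, the `GLOBALS` dict, and the example entry are all hypothetical stand-ins.

```python
import re

# Hypothetical stand-in for values that are known when the config is loaded.
GLOBALS = {"base_path": "data"}


def resolve_at_load(value: str, variables: dict) -> str:
    """Stage 1 (config load): substitute ${key} references only."""
    return re.sub(r"\$\{(\w+)\}", lambda m: str(variables[m.group(1)]), value)


def match_factory(pattern: str, dataset_name: str):
    """Stage 2 (pipeline run): match a dataset name against a factory pattern."""
    # Turn each {placeholder} into a named regex group; escape everything else.
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>[^.]+)", re.escape(pattern))
    found = re.fullmatch(regex, dataset_name)
    return found.groupdict() if found else None


entry = {"filepath": "${base_path}/{name}/predictions.parquet"}

# At load time only ${...} can be resolved; {name} is still unknown.
loaded = {key: resolve_at_load(value, GLOBALS) for key, value in entry.items()}

# Only when a concrete dataset name is requested do the placeholders get values.
placeholders = match_factory("{name}__predictions", "gas__predictions")
final = {key: value.format(**placeholders) for key, value in loaded.items()}
```

In this sketch, anything in `globals.yml` that depends on `{name}` would have to be evaluated in the first stage, before `placeholders` exists, which is the limitation described above.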
-
Hi, is it possible to partially resolve the config (only the keys without placeholders) before the session run? Then, before the pipeline run, we could resolve the placeholders and perform another OmegaConf resolution. This is similar to the catalog resolution; the only difference is that we would resolve with OmegaConf again after the placeholder resolution. My use case would be:
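As a rough illustration of the two-pass idea in this comment (not the poster's own use case), the sketch below defers any `${...}` reference that still contains a `{placeholder}` and resolves it in a second pass. It uses only the standard library; the helper names, the `root` config, and the `thresholds` key are assumptions made for the example, and real OmegaConf resolution works differently.

```python
import re

# Matches ${a.b} but deliberately not ${a.{x}}, so references that still
# contain a factory placeholder are left alone in the first pass.
INTERPOLATION = re.compile(r"\$\{([^${}]+)\}")


def lookup(root: dict, dotted: str):
    """Follow a dotted path such as 'thresholds.gas' through nested dicts."""
    node = root
    for part in dotted.split("."):
        node = node[part]
    return node


def resolve_pass(value: str, root: dict) -> str:
    """Resolve every ${...} reference that is free of placeholders."""
    return INTERPOLATION.sub(lambda m: str(lookup(root, m.group(1))), value)


def fill_placeholders(value: str, placeholders: dict) -> str:
    """Substitute {name}-style factory placeholders with concrete values."""
    for key, replacement in placeholders.items():
        value = value.replace("{" + key + "}", replacement)
    return value


# Hypothetical config values used by both passes.
root = {"base": "data", "thresholds": {"gas": 0.5, "power": 0.8}}
raw = "${base}/{name}:${thresholds.{name}}"

# Pass 1 (before the session run): only placeholder-free references resolve.
first = resolve_pass(raw, root)

# Pass 2 (before the pipeline run): fill placeholders, then resolve again.
second = resolve_pass(fill_placeholders(first, {"name": "gas"}), root)
```

Here `first` keeps the reference that depends on `{name}` intact, and `second` is fully resolved once the placeholder value is known.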
-
I have the same need to be able to define dataset factories but customize specific parameters. I would like to be able to do something like this (the example uses simple variable interpolation but ideally I would want to be using globals):

_dataset_config:
  country_technology_granularity:
    partition_method: datetime
    datetime_column: gas_date
    partition_by: [year, month, day]

'{signal_name}__predictions':
  type: axpo.kedro.datasets.pandas_arrow_dataset.ParquetArrowDataset
  path: abfs://container/{signal_name}/predictions/
  credentials: blob_storage
  versioned: true
  write_mode: append
  partition_method: {_dataset_config.{signal_name}.partition_method}
-
I would love to extend the factories pattern to other config files. When creating dynamic pipelines (reusing part of a pipeline many times as seen in https://getindata.com/blog/kedro-dynamic-pipelines/), you often end up with a lot of datasets having the same names but namespaced. Using the catalog factories pattern, the output of these pipelines could all be persisted with a single catalog entry like:

"{namespace}.{variant}.classification_experiment":
  type: "${_datasets.pickle}"
  backend: cloudpickle
  filepath: "${_base_path}/${_folders.mdl}/{namespace}/{variant}/classification_experiment_train.pkl"
  metadata:
    layer: Model

"{namespace}.{variant}.classification_base_model":
  type: "${_datasets.pickle}"
  filepath: "${_base_path}/${_folders.mdl}/{namespace}/{variant}/classification_base_model.pkl"
  metadata:
    layer: Model

However, since there's no similar functionality for parameters, you then have to create a parameterset for every single variant instead of reusing part of the parameters. For my use-cases, I would also need to be able to define parameter factories but customize specific parameters (similar to what @inigohidalgo mentioned but for parameters). Think of it like:
-
Thank you for the example, it's clear and easy to understand. Can you use a subkey instead of a string literal? I don't think you are supposed to use a.b.c as a key. You can use variable interpolation instead, @kasperjanehag.
I find this clearer than compiling a pattern, and it is similar to what a YAML anchor does.
-
@noklam, sorry good point. I guess you mean:
-
@kasperjanehag I mean something more like leveraging what OmegaConf already supports, so we may not need to introduce a new syntax. Assuming you don't need to override parameters (I am not sure if there is a way yet, but the point is that we can consider a different approach):

default:
  modelling:
    param_1: value_1
    param_2: value_2
    param_3: value_3
use_case_1:
  variant_1: ${default}
use_case_2:
  variant_2: ${default}
use_case_3:
  variant_3: ${default}

I find this more readable, and it is also similar to a YAML anchor.
-
@noklam thanks for the suggestion. The main problem I see with this approach (correct me if I'm wrong) is how polluted the parameter space gets. With your suggested approach (which we're also currently running in a few projects), wouldn't parameters be duplicated and exist both in
-
This is a fair point, though it is just a design choice. Factory patterns are also duplicated in the config, and we remove them from the resolved version because they are fairly easy to identify. We could take the same approach here: for example, if a key starts with _, we know that it is not a real config entry.
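A minimal sketch of that convention, assuming a hypothetical `_default` template key and two hypothetical helper functions (this is not how Kedro or OmegaConf resolve parameters): references are expanded first, then every top-level key starting with `_` is dropped from the resolved result.

```python
import copy


def resolve_references(config: dict) -> dict:
    """Replace string values of the form "${key}" with a copy of config[key].

    Only whole-value references to top-level keys are handled here.
    """
    def resolve(value):
        if isinstance(value, dict):
            return {key: resolve(item) for key, item in value.items()}
        if isinstance(value, str) and value.startswith("${") and value.endswith("}"):
            return copy.deepcopy(resolve(config[value[2:-1]]))
        return value

    return resolve(config)


def drop_templates(config: dict) -> dict:
    """Remove top-level keys starting with "_" from the resolved config."""
    return {key: value for key, value in config.items() if not key.startswith("_")}


raw = {
    "_default": {"modelling": {"param_1": "value_1", "param_2": "value_2"}},
    "use_case_1": {"variant_1": "${_default}"},
    "use_case_2": {"variant_2": "${_default}"},
}

params = drop_templates(resolve_references(raw))
```

With this approach the template exists in the source config but not in `params`, so each variant carries its own copy of the shared values without the parameter space also exposing the template itself.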
-
I'd be interested in this as well! Currently I'm creating a data processing pipeline for device measurement data, where each device is identified by its SAP number. In
which is later used in i.a.
Could it be somehow simplified so that my params are defined as:
(similarly to catalog)
-
Yes, @julnow, that's how we do it in our project as well. I would love a simpler solution for parameter factories, but we haven't figured out the right implementation yet.
-
Hiya, has there been any further discussion on this issue? This #3751 (comment) has been an important pain point for us in recent developments. (It only became a pain point because dataset factories are so great we want to use them everywhere 😅) We can address this on our side, but it will require significant development work on new Dataset implementations that can read some metadata from the data to be saved, all of which this feature would render unnecessary. Before starting that development, I wanted to get an idea of whether this will be resolved any time soon. Thank you!
-
Description
https://linen-slack.kedro.org/t/15867022/hey-all-this-can-be-done-in-catalog-yml-to-access-data-throu#52b21693-0b9f-43d0-ae46-b4f215720bbe
https://linen-slack.kedro.org/t/13222958/hi-everyone-is-there-a-way-to-have-truly-global-params-i-e-p#c273b11c-f60a-4ca1-bc87-9aab4301b327
Context
Opening this issue to collect feedback and use cases.
Maybe the answer is "no", and maybe the answer is "these issues arise from something else" (for example @noklam has suggested that namespaces are confusing). But I think it's important that we centralize the discussion.
Possible Implementation
Possible Alternatives