Enable Pydantic I/O types in workflow context #1189
base: main
Conversation
Force-pushed from 1d52ec1 to 3b97438, then from 3b97438 to ff3fdf6.
Codecov Report

@@ Coverage Diff @@
##              main   #1189     +/-   ##
=========================================
+ Coverage     81.7%   82.0%    +0.2%
=========================================
  Files           54      61       +7
  Lines         4208    4778     +570
  Branches       889    1021     +132
=========================================
+ Hits          3439    3919     +480
- Misses         574     625      +51
- Partials       195     234      +39

☔ View full report in Codecov by Sentry.
@@ -539,8 +545,7 @@ def _create_subnode(
     assert _context.pieces

     template_ref = None
-    _context.declaring = False
-    if _context.pieces[0] != self and isinstance(self, WorkflowTemplate):
+    if _context.pieces[0] is not self and isinstance(self, WorkflowTemplate):
Note: Pydantic's `!=` implementation raises an "incorrect type" validation error here, due to having been returned a string from TemplateInvocatorSubNodeMixin's modified `__getattribute__` implementation. We could temporarily disable the active context instead, but switching to `is not` is simpler (and faster). As a side-benefit, we can remove the temporary assignment of False to `_context.declaring`, which was previously working around this issue.
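For context, a minimal standalone sketch (plain Python, not hera's actual classes; names are illustrative) of why the identity check sidesteps the problem: `!=` falls back to `__eq__`, which reads field values and so goes through the modified `__getattribute__`, whereas `is not` never touches attributes at all.

```python
class Node:
    def __init__(self, value: int):
        object.__setattr__(self, "value", value)

    def __getattribute__(self, name: str):
        # Stand-in for the modified __getattribute__: field reads come back as
        # templated strings rather than the stored values.
        if name == "value":
            return "{{inputs.parameters.value}}"
        return object.__getattribute__(self, name)

    def __eq__(self, other: object) -> bool:
        # Pydantic-style equality compares field values, triggering attribute access.
        return isinstance(other, Node) and self.value == other.value


a, b = Node(1), Node(2)
a != b      # invokes __eq__, so field access (and any validation on it) runs
a is not b  # identity check only: no attribute access
```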
Force-pushed from c6ea26f to a6eab0f.
@@ -760,6 +755,27 @@ def script_wrapper(func: Callable[FuncIns, FuncR]) -> Callable:
     def task_wrapper(*args, **kwargs) -> Union[FuncR, Step, Task, None]:
         """Invokes a `Script` object's `__call__` method using the given SubNode (Step or Task) args/kwargs."""
         if _context.active:
+            if len(args) == 1 and isinstance(args[0], (InputV1, InputV2)):
Note this new behaviour only affects calls made in an active context to a `@script`-decorated function when passing a Pydantic I/O object by argument, which previously would not have functioned meaningfully (the object was ignored by `s.__call__`).
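To illustrate the scope, a hedged sketch of the two call shapes (reusing `triple` and `IntInput` from the tests further down; the surrounding workflow scaffolding and exact signatures are assumptions):

```python
from hera.workflows import DAG, Workflow

with Workflow(generate_name="pydantic-io-", entrypoint="d"):
    with DAG(name="d"):
        # New behaviour (this PR): a single Pydantic I/O argument is intercepted
        # and mapped onto the created Task's arguments.
        triple(IntInput(field=5))

        # Pre-existing behaviour, unchanged: plain kwargs are forwarded to Task/Step.
        # triple(arguments={"field": 5})
```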
-        if _context.declaring:
+        if _context.active:
             # Intercept the declaration to avoid validation on the templated strings
-            # We must then turn off declaring mode to be able to "construct" an instance
+            # We must then disable the active context to be able to "construct" an instance
             # of the InputMixin subclass.
-            _context.declaring = False
-            instance = cls.construct(**kwargs)
-            _context.declaring = True
-            return instance
+            with no_active_context():
+                return cls.construct(**kwargs)
I don't understand hera's context management 100% (especially active & declaring), so I'll defer review of the mixins to others. To me, this seems to refine the context state checking, and looks fine.
src/hera/workflows/_meta_mixins.py
Outdated
+            # Add dependencies based on context if not explicitly provided
+            current_task_depends = _context.pieces[-1]._current_task_depends
+            if current_task_depends and "depends" not in kwargs:
+                kwargs["depends"] = " && ".join(sorted(current_task_depends))
+                current_task_depends.clear()
Not 100% sure, but it seems this needs more discussion.

E.g. in this PR's example, the following code works as expected:

cut_task = cut(CutInput(strings=["hello", "world", "it's", "hera"], cut_after=1))
join_task = join(JoinInput(strings=cut_task.first_strings, joiner=" "))
cut_task >> join_task

However, this can restrict how DAGs are constructed. For example, consider the following:

A = task_a_fn(SomePydanticClass1(...))
B = task_b_fn(SomePydanticClass2(arg_1=A.return_1, ...))  # B depends on A's output
C = task_c_fn(SomePydanticClass3(...))  # C doesn't depend on A's output.
# But the user wants to construct the DAG as follows.
A >> [B, C]

In the above case, hera will raise an error (e.g. "already in some task's depends"):
hera/src/hera/workflows/task.py
Lines 83 to 84 in 2eec4c2
elif self.name in other._get_dependency_tasks():
    raise ValueError(f"{self.name} already in {other.name}'s depends: {other.depends}")
In your example, you could just write `A >> C` instead of `A >> [B, C]`. However, I agree that this is an existing limitation of the new Pydantic I/O approach which will need solving separately. My preference would be for user-provided dependency edges to not raise duplication errors (like in your example; it should be fine to be explicit), and to override automatic ones (e.g. if you do `A.on_success(B)`, that should overwrite the existing dependency between A and B). In the meantime, users can:

- use the old `A.get_parameter` API to avoid adding dependency edges
- pass in `requires=""` to disable the automatic generation of requires from dependency edges
- do `B.requires=""` later
Sounds great, but to me it seems better to have the same, expected behaviour even with Pydantic I/O. Users might expect to be able to write `A >> [B, C]`, but this raises an error. I assume (having checked only a few cases) that other nodes also don't generate the `depends` attribute based on returns/arguments. How about separating this feature into another PR, and supporting it regardless of Pydantic I/O? Maybe @elliotgunton can give more opinions?
> User might expect to be able to write `A >> [B, C]`

I agree - it would be good to write all my dependencies in one place (and as a bonus: in one statement) rather than spread out.

> user-provided dependency edges to not raise duplication errors

Yes, I think it might make sense (not sure about precedent/existing patterns) to make dependency-setting idempotent, and then use the last call (ignoring whether it was automatic or not):

A.on_success(B)
A.on_failure(B)

will end up with only B's depends: `A.Failed`.
Force-pushed from 9caeb6d to e22dd8d.
src/hera/workflows/_meta_mixins.py
Outdated
@@ -372,6 +372,12 @@ def __call__(self, *args, **kwargs) -> Union[None, Step, Task]:
             return Step(template=self, **kwargs)

+        if isinstance(_context.pieces[-1], DAG):
+            # Add dependencies based on context if not explicitly provided
- Note this is different from the implementation in the experimental decorator code, which does not allow a `depends` to be passed at all (hera/src/hera/workflows/_meta_mixins.py, lines 568 to 569 in 2eec4c2):

  depends=" && ".join(sorted(_context.pieces[-1]._current_task_depends)) or None, **kwargs,

  This is to allow a more gradual migration to the new syntax. Would it be reasonable for the existing code to change to match?
- I think this logic should probably be made common -- perhaps moved into DAG?
> Would it be reasonable for the existing code to change to match?

Yes, that would be good! 🙏
Done.
src/hera/workflows/_meta_mixins.py
Outdated
            current_task_depends = _context.pieces[-1]._current_task_depends
            if current_task_depends and "depends" not in kwargs:
                kwargs["depends"] = " && ".join(sorted(current_task_depends))
                current_task_depends.clear()
Pre-existing issue: if the user creates a Task explicitly rather than via a decorated function, the dependencies list won't be cleared. Perhaps this should be moved to the point where a Task is registered?
Yes, makes sense to move it to DAG's _add_sub
hera/src/hera/workflows/dag.py
Lines 50 to 56 in e7f38fa
def _add_sub(self, node: Any):
    if not isinstance(node, Task):
        raise InvalidType(type(node))
    if node.name in self._node_names:
        raise NodeNameConflict(f"Found multiple Task nodes with name: {node.name}")
    self._node_names.add(node.name)
    self.tasks.append(node)
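A rough sketch of what that relocation could look like, combining the quoted `_add_sub` with the depends-defaulting logic from the earlier diff (an illustration only, not necessarily the code that was merged):

```python
def _add_sub(self, node: Any):
    if not isinstance(node, Task):
        raise InvalidType(type(node))
    if node.name in self._node_names:
        raise NodeNameConflict(f"Found multiple Task nodes with name: {node.name}")
    # Default the dependency edge from the context, then clear it regardless of
    # whether the Task came from a decorated call or was constructed explicitly.
    if self._current_task_depends and node.depends is None:
        node.depends = " && ".join(sorted(self._current_task_depends))
    self._current_task_depends.clear()
    self._node_names.add(node.name)
    self.tasks.append(node)
```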
Done.
        if not output_class or output_class is NoneType:
            return None

        if not issubclass(output_class, (OutputV1, OutputV2)):
I'm not sure this should be a requirement, especially for gradual migrations and issues like #1191 (comment)
I'm not sure what to return that would match the type signature though. Perhaps this can be addressed in a later PR?
Setting the subnode's `_build_obj` means users can do `my_task.parameter_name` when passing values between tasks - that behaviour should really only be available for the new decorator contexts, as it's very experimental and unproven (see #1162). I think this PR is mixing up the two features, which should be kept separate IMO, so I'm struggling to understand the motivation of the PR.
Moved the new behaviour into the decorator_syntax experimental feature for now.
I would like to discuss the motivation behind this PR and how we might tidy it up, as it's mixing up the two experimental features in a way that is non-obvious.
tests/test_pydantic_io_syntax.py
Outdated
        result = triple(IntInput(field=5)).field

        assert result == "{{steps.triple.outputs.parameters.field}}"
This is not what the Pydantic IO feature flag was designed for, and shouldn't be possible in a normal Steps context.
Moved behind the decorator_syntax flag
I think I just have 2 issues to resolve for the general direction of the PR. The main one is the code implementing the new decorator behaviour. The second is whether you could motivate the PR with an issue showing how the change would work in the existing example, hera/examples/workflows/experimental/script_runner_io.py (lines 54 to 62 in e7f38fa).

I think the feature is a good stepping stone for users to migrate, but shouldn't be backported to the
Force-pushed from 0a7fb95 to d13e057.
Issue added: #1192
Updating blocking status of the PR for discussion in #1192. I don't think we should make this drastic of a feature change to the old syntax.
Hey @alicederyn, we discussed this PR and the motivation behind it in the maintainers' meeting (notes here). @samj1912 agreed that we don't want to mix the two syntaxes as that will confuse users, so I've moved the PR into the "draft" state. It seems the main goal is to avoid a big-bang migration, or at least to have a checkpoint approach as you explained. An alternative would be to work on #711, as the new decorators mean we can create a full Argo Workflow spec as Python stubs in Hera. This should mean we can construct the DAG too, so the only work to do would be to convert old script templates to use Pydantic IO classes, which should match the YAML (because it would be generated from it (super awesome!!)), and then you'd just copy/paste the function body into the stubs. Let's discuss when you're back, no rush to reply! 😄
Force-pushed from 4d247fd to af1bcef, and later from da10075 to f263f56.
Code to check if _SCRIPT_PYDANTIC_IO_FLAG is set and error if not occurs twice; factor out into a shared utility function. Signed-off-by: Alice Purcell <alicederyn@gmail.com>
_create_subnode temporarily disables _context.declaring to avoid triggering the altered `__getattribute__` behaviour in TemplateInvocatorSubNodeMixin in two situations:
- Pydantic's implementation of `!=`, which is attempting to read field values, and will raise a validation error due to the altered field values being of incorrect type.
- Accessing `__class__` on a Pydantic subnode object with no build_obj set.

Instead, fix this by:
- Using `is not` instead of `!=`, which is correct where we use it, and also faster.
- Fixing `__getattribute__` to no longer fail if used on an object with no build_obj set.

Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Extend experimental Pydantic I/O support to allow passing Pydantic types into `@script`-decorated functions when inside a `with` workflow context block, and using fields on the returned Pydantic output as shorthand for the associated Hera template in subsequent steps. Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Currently, TemplateDecoratorFuncMixin._create_subnode unconditionally adds a depends kwarg, causing a runtime error if the user also supplies one. Instead, prefer the user-supplied one. Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Move the logic that creates the right leaf node for Steps, Parallel and DAG to a _create_leaf_node method on those types. DAG now specifies how to default the depends parameter to Task based on its _current_task_depends field. This simplifies the duplicated logic in _meta_mixins to a simple isinstance check for any of those three types, followed by a _create_leaf_node call. Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Add a new experimental flag to gate use of Pydantic types in a workflow context manager Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Signed-off-by: Alice Purcell <alicederyn@gmail.com>
Force-pushed from f263f56 to 461785e.
Pull Request Checklist
Description of PR
Currently, `@script`-decorated functions that use Pydantic I/O types must still use pre-Pydantic syntax when creating workflows. This makes it harder to migrate to the new dag and steps decorators, as more code must be migrated in a single step before the YAML can be regenerated and tested.

This PR allows Pydantic input types to be passed in when in an active context, and will then configure the returned subnode to mimic the Pydantic output type. Tasks using fields from the returned object will automatically have a dependency added, if no dependency is explicitly specified.
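A hedged sketch of the syntax this enables, reusing the cut/join example discussed in the review thread above (the model definitions, script bodies, and the exact experimental flags to enable are assumptions, not copied from the PR):

```python
from typing import List

from hera.shared import global_config
from hera.workflows import DAG, Workflow, script
from hera.workflows.io.v2 import Input, Output

# Assumed flag name; the PR gates the new behaviour behind experimental features.
global_config.experimental_features["script_pydantic_io"] = True


class CutInput(Input):
    strings: List[str]
    cut_after: int


class CutOutput(Output):
    first_strings: List[str]


class JoinInput(Input):
    strings: List[str]
    joiner: str


@script(constructor="runner")
def cut(input: CutInput) -> CutOutput:
    return CutOutput(first_strings=input.strings[: input.cut_after])


@script(constructor="runner")
def join(input: JoinInput) -> Output:
    return Output(result=input.joiner.join(input.strings))


with Workflow(generate_name="pydantic-io-in-context-", entrypoint="d") as w:
    with DAG(name="d"):
        # Pydantic input passed directly inside an active context; the returned
        # object mimics CutOutput, so its fields resolve to templated strings.
        cut_task = cut(CutInput(strings=["hello", "world", "it's", "hera"], cut_after=1))
        # Using cut_task.first_strings adds the cut -> join dependency automatically.
        join(JoinInput(strings=cut_task.first_strings, joiner=" "))
```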