Enable Pydantic I/O types in workflow context #1189

alicederyn · 2024-09-04T22:02:35Z

Pull Request Checklist

Fixes Allow using Pydantic I/O in with-based code #1192
Tests added
Documentation/examples added
Good commit messages and/or PR title

Description of PR
Currently, @script-decorated functions that use Pydantic I/O types must still use pre-Pydantic syntax when creating workflows. This makes it harder to migrate to the new dag and steps decorators, as more code must be migrated in a single step before the YAML can be regenerated and tested.

This PR allows Pydantic input types to be passed in when in an active context, and will then configure the returned subnode to mimic the Pydantic output type. Tasks using fields from the returned object will automatically have a dependency added, if no dependency is explicitly specified.

codecov · 2024-09-04T22:45:41Z

Codecov Report

Attention: Patch coverage is 93.84615% with 4 lines in your changes missing coverage. Please review.

Project coverage is 82.0%. Comparing base (d553546) to head (4d247fd).
Report is 136 commits behind head on main.

Files with missing lines	Patch %	Lines
src/hera/workflows/_meta_mixins.py	81.8%	1 Missing and 1 partial ⚠️
src/hera/workflows/script.py	91.6%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #1189     +/-   ##
=======================================
+ Coverage   81.7%   82.0%   +0.2%     
=======================================
  Files         54      61      +7     
  Lines       4208    4778    +570     
  Branches     889    1021    +132     
=======================================
+ Hits        3439    3919    +480     
- Misses       574     625     +51     
- Partials     195     234     +39

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

alicederyn · 2024-09-04T22:46:41Z

src/hera/workflows/_meta_mixins.py

@@ -539,8 +545,7 @@ def _create_subnode(
        assert _context.pieces

        template_ref = None
-        _context.declaring = False
-        if _context.pieces[0] != self and isinstance(self, WorkflowTemplate):
+        if _context.pieces[0] is not self and isinstance(self, WorkflowTemplate):


Note: Pydantic's != implementation raises an "incorrect type" validation error here due to having been returned a string from TemplateInvocatorSubNodeMixin's modified __getattribute__ implementation. We could temporarily disable the active context instead, but switching to is not is simpler (and faster). As a side-benefit, we can remove the temporary assignment of False to _context.declaring, which was previously working around this issue.

alicederyn · 2024-09-04T23:11:34Z

src/hera/workflows/script.py

@@ -760,6 +755,27 @@ def script_wrapper(func: Callable[FuncIns, FuncR]) -> Callable:
        def task_wrapper(*args, **kwargs) -> Union[FuncR, Step, Task, None]:
            """Invokes a `Script` object's `__call__` method using the given SubNode (Step or Task) args/kwargs."""
            if _context.active:
+                if len(args) == 1 and isinstance(args[0], (InputV1, InputV2)):


Note this new behaviour only affects calls made in an active context to a @script-decorated function when passing a Pydantic I/O object by argument, which previously would not have functioned meaningfully (the object was ignored by s.__call__).

src/hera/workflows/script.py

examples/workflows/experimental/pydantic_io_in_steps_context.py

jeongukjae · 2024-09-05T10:30:46Z

src/hera/workflows/io/_io_mixins.py

-        if _context.declaring:
+        if _context.active:
            # Intercept the declaration to avoid validation on the templated strings
-            # We must then turn off declaring mode to be able to "construct" an instance
+            # We must then disable the active context to be able to "construct" an instance
            # of the InputMixin subclass.
-            _context.declaring = False
-            instance = cls.construct(**kwargs)
-            _context.declaring = True
-            return instance
+            with no_active_context():
+                return cls.construct(**kwargs)


I don't understand hera's context management 100% (especially active & declaring), so deferring mixins' review to others.
For me, it seems refine the context state checking, and looks fine.

jeongukjae · 2024-09-05T10:55:48Z

src/hera/workflows/_meta_mixins.py

+                # Add dependencies based on context if not explicitly provided
+                current_task_depends = _context.pieces[-1]._current_task_depends
+                if current_task_depends and "depends" not in kwargs:
+                    kwargs["depends"] = " && ".join(sorted(current_task_depends))
+                current_task_depends.clear()
+


Not 100% sure.
But this seems more discussions are needed.

e.g. in this pr's example, following code works as expected

cut_task = cut(CutInput(strings=["hello", "world", "it's", "hera"], cut_after=1)) join_task = join(JoinInput(strings=cut_task.first_strings, joiner=" ")) cut_task >> join_task

This code can possibly restrict the construction of DAGs.
for example, let's think about following example

A = task_a_fn(SomePydanticClass1(...)) B = task_b_fn(SomePydanticClass2(arg_1=A.return_1, ...)) # B depends on A's output C = task_c_fn(SomePydanticClass3(...)) # C doesn't depends on A's output. # But user want to construct DAG as followings. A >> [B, C]

In above case, hera will raise an error. (e.g. already in some task's depends)

hera/src/hera/workflows/task.py

Lines 83 to 84 in 2eec4c2

elif self.name in other._get_dependency_tasks():

raise ValueError(f"{self.name} already in {other.name}'s depends: {other.depends}")

In your example, you could just write A >> C instead of A >> [B, C]

However, I agree that this is an existing limitation of the new Pydantic I/O approach which will need solving separately. My preference would be for user-provided dependency edges to not raise duplication errors (like in your example — it should be fine to be explicit), and to override automatic ones (e.g. if you do A.on_success(B), that should overwrite the existing dependency between A and B). In the meantime, users can:

use the old A.get_parameter API to avoid adding dependency edges

pass in requires="" to disable the automatic generation of requires from dependency edges

do B.requires="" later

Sounds great, but for me it looks better to have same/expected behavior even with Pydantic I/O.
User might expect to be able to write A >> [B, C], but this raises an error.
I assume (checked very limited cases) other nodes are also doesn't generate depends attribute based on the returns/arguments.

How about separating this feature to another PR, and support this for regardless of the pydantic io?
Maybe @elliotgunton can give more opinions?

User might expect to be able to write A >> [B, C]

I agree - it would be good to write all my dependencies in one place (and as a bonus: in one statement) rather than spread out

user-provided dependency edges to not raise duplication errors

Yes I think it might make sense (not sure about precedent/existing patterns) to make dependency-setting idempotent, and then use the last call (ignoring whether it was automatic or not):

A.on_success(B) A.on_failure(B)

Will end up only with B's depends: A.Failed.

alicederyn · 2024-09-05T11:48:43Z

src/hera/workflows/_meta_mixins.py

@@ -372,6 +372,12 @@ def __call__(self, *args, **kwargs) -> Union[None, Step, Task]:
                return Step(template=self, **kwargs)

            if isinstance(_context.pieces[-1], DAG):
+                # Add dependencies based on context if not explicitly provided


Note this is different from the implementation in the experimental decorator code, which does not allow a depends to be passed at all:

hera/src/hera/workflows/_meta_mixins.py

Lines 568 to 569 in 2eec4c2

depends=" && ".join(sorted(_context.pieces[-1]._current_task_depends)) or None,

**kwargs,

This is to allow more gradual migration to the new syntax. Would it be reasonable for the existing code to change to match?

I think this logic should probably be made common -- perhaps moved into DAG?

Would it be reasonable for the existing code to change to match?

Yes that would be good! 🙏

alicederyn · 2024-09-05T11:52:56Z

src/hera/workflows/_meta_mixins.py

+                current_task_depends = _context.pieces[-1]._current_task_depends
+                if current_task_depends and "depends" not in kwargs:
+                    kwargs["depends"] = " && ".join(sorted(current_task_depends))
+                current_task_depends.clear()


Pre-existing issue: if the user creates a Task explicitly rather than via a decorated function, the dependencies list won't be cleared. Perhaps this should be moved to the point where a Task is registered?

Yes, makes sense to move it to DAG's _add_sub

hera/src/hera/workflows/dag.py

Lines 50 to 56 in e7f38fa

def _add_sub(self, node: Any):

if not isinstance(node, Task):

raise InvalidType(type(node))

if node.name in self._node_names:

raise NodeNameConflict(f"Found multiple Task nodes with name: {node.name}")

self._node_names.add(node.name)

self.tasks.append(node)

elliotgunton · 2024-09-05T14:36:37Z

src/hera/workflows/script.py

+                    if not output_class or output_class is NoneType:
+                        return None
+
+                    if not issubclass(output_class, (OutputV1, OutputV2)):


I'm not sure this should be a requirement, especially for gradual migrations and issues like #1191 (comment)

I'm not sure what to return that would match the type signature though. Perhaps this can be addressed in a later PR?

Setting the subnode._build_obj means users can do my_task.parameter_name when passes values between tasks - that behaviour should really only be available for the new decorator contexts as it's very experimental and unproven (see #1162). I think this PR is mixing up the two features which should be kept separate IMO, so I'm struggling to understand the motivation of the PR.

Moved the new behaviour into the decorator_syntax experimental feature for now.

elliotgunton

I would like to discuss the motivation behind this PR and how we might tidy it up as it's mixing up the two experimental features in a way that is non-obvious

elliotgunton · 2024-09-05T16:31:08Z

tests/test_pydantic_io_syntax.py

+            result = triple(IntInput(field=5)).field
+
+    assert result == "{{steps.triple.outputs.parameters.field}}"


This is not what the Pydantic IO feature flag was designed for and shouldn't be possible in a normal Steps context

Moved behind the decorator_syntax flag

elliotgunton · 2024-09-05T16:41:01Z

I think I just have 2 issues to resolve for the general direction of the PR - the main one being the code implementing the new decorator behaviour i.e. result = triple(IntInput(field=5)).field style, should be asserting on the "new decorators" feature flag.

The second is just if you could motivate the PR with an issue showing how the change in the existing example would work here

hera/examples/workflows/experimental/script_runner_io.py

Lines 54 to 62 in e7f38fa

    
           pydantic_io( 
        
               arguments=[ 
        
                   write_step.get_artifact("int-artifact").with_name("artifact-input"), 
        
                   { 
        
                       "param_int": 101, 
        
                       "an_object": MyObject(a_dict={"my-new-key": "my-new-value"}), 
        
                   }, 
        
               ] 
        
           )

I think the feature is a good stepping stone for users to migrate, but shouldn't be backported to the script_pydantic_io flag (as that's been out a while, whereas our funky new syntax is kinda buggy, and I actually wouldn't recommend migrating yet, having tried it out myself).

alicederyn · 2024-09-05T18:02:40Z

Issue added: #1192

elliotgunton

Updating blocking status of the PR for discussion in #1192. I don't think we should make this drastic of a feature change to the old syntax

elliotgunton · 2024-09-13T15:20:18Z

Hey @alicederyn, we discussed this PR and motivation behind it in the maintainer's meeting (notes here), @samj1912 agreed that we don't want to mix the two syntaxes as that will confuse users, so I've moved the PR into the "draft" state.

It seems the main goal is to avoid a big-bang migration, or at least have a checkpoint approach as you explained. An alternative would be to work on #711 as the new decorators mean we can create a full Argo Workflow spec as Python stubs in Hera. This should mean we can construct the DAG too, so the only work to do would be convert old script templates to use Pydantic IO classes, which should match the YAML (because it would be generated from it (super awesome!!)) and then you'd just copy/paste the function body into the stubs.

Let's discuss when you're back, no rush to reply! 😄

Code to check if _SCRIPT_PYDANTIC_IO_FLAG is set and error if not occurs twice; factor out into a shared utility function. Signed-off-by: Alice Purcell <alicederyn@gmail.com>

_create_subnode temporarily disables _context.declaring to avoid triggering the altered `__getattribute__` behaviour in TemplateInvocatorSubNodeMixin in two situations: - Pydantic's implementation of `!=`, which is attempting to read field values, and will raise a validation error due to the altered field names being of incorrect type. - Accessing `__class__` on a Pydantic subnode object with no build_obj set. Instead, fix this by: - Using `is not` instead of `!=`, which is correct where we use it, and also faster. - Fixing `__getattribute__` to no longer fail if used on an object with no build_obj set. Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Extend experimental Pydantic I/O support to allow passing Pydantic types into `@script`-decorated functions when inside a `with` workflow context block, and using fields on the returned Pydantic output as shorthand for the associated Hera template in subsequent steps. Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Currently, TemplateDecoratorFuncMixin._create_subnode unconditionally adds a depends kwarg, causing a runtime error if the user also supplies one. Instead, prefer the user-supplied one. Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Move the logic that creates the right leaf node for Steps, Parallel and DAG to a _create_leaf_node method on those types. DAG now specifies how to default the depends parameter to Task based on its _current_task_depends field. This simplifies the duplicated logic in _meta_mixins to a simple isinstance check for any of those three types, followed by a _create_leaf_node call. Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Add a new experimental flag to gate use of Pydantic types in a workflow context manager Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Signed-off-by: Alice Purcell <alicederyn@gmail.com>

elliotgunton · 2024-11-13T22:39:53Z

Closing as agreed in maintainer meeting discussion (I'd rather not leave the PR open indefinitely) - we need to consider the graduation criteria for the new decorators and then how to migrate once they are matured and ready to graduate.

alicederyn force-pushed the use.pydantic.types.in.context branch from 1d52ec1 to 3b97438 Compare September 4, 2024 22:09

alicederyn added type:enhancement A general enhancement semver:minor A change requiring a minor version bump labels Sep 4, 2024

alicederyn force-pushed the use.pydantic.types.in.context branch from 3b97438 to ff3fdf6 Compare September 4, 2024 22:43

alicederyn commented Sep 4, 2024

View reviewed changes

alicederyn force-pushed the use.pydantic.types.in.context branch 2 times, most recently from c6ea26f to a6eab0f Compare September 4, 2024 23:06

alicederyn commented Sep 4, 2024

View reviewed changes

jeongukjae reviewed Sep 5, 2024

View reviewed changes

alicederyn force-pushed the use.pydantic.types.in.context branch 3 times, most recently from 9caeb6d to e22dd8d Compare September 5, 2024 11:41

alicederyn commented Sep 5, 2024

View reviewed changes

elliotgunton reviewed Sep 5, 2024

View reviewed changes

elliotgunton requested changes Sep 5, 2024

View reviewed changes

alicederyn force-pushed the use.pydantic.types.in.context branch 2 times, most recently from 0a7fb95 to d13e057 Compare September 5, 2024 17:47

alicederyn requested a review from elliotgunton September 5, 2024 17:53

alicederyn marked this pull request as ready for review September 5, 2024 17:53

alicederyn requested review from flaviuvadan and sambhav as code owners September 5, 2024 17:53

elliotgunton requested changes Sep 10, 2024

View reviewed changes

elliotgunton marked this pull request as draft September 13, 2024 15:01

elliotgunton mentioned this pull request Sep 13, 2024

Allow using Pydantic I/O in with-based code #1192

Open

alicederyn force-pushed the use.pydantic.types.in.context branch from 4d247fd to af1bcef Compare September 24, 2024 09:57

alicederyn force-pushed the use.pydantic.types.in.context branch 2 times, most recently from da10075 to f263f56 Compare October 9, 2024 10:27

alicederyn added 12 commits October 16, 2024 10:11

Factor out common _SCRIPT_PYDANTIC_IO_FLAG check

9b0fa03

Code to check if _SCRIPT_PYDANTIC_IO_FLAG is set and error if not occurs twice; factor out into a shared utility function. Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Add tests for the new Pydantic I/O syntax

90fd3bb

Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Clear current task dependencies in DAG._add_sub

310658a

Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Allow passing depends in a @w.dag context

4d1d257

Currently, TemplateDecoratorFuncMixin._create_subnode unconditionally adds a depends kwarg, causing a runtime error if the user also supplies one. Instead, prefer the user-supplied one. Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Move new syntax behind decorator_syntax feature

f474926

Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Fix name of experimental feature message function

f90d3c9

Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Update new examples to Python 3.9 (logical merge conflict)

f201ef1

Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Create a new experimental flag

5969aa5

Add a new experimental flag to gate use of Pydantic types in a workflow context manager Signed-off-by: Alice Purcell <alicederyn@gmail.com>

Document incremental migration option

461785e

Signed-off-by: Alice Purcell <alicederyn@gmail.com>

alicederyn force-pushed the use.pydantic.types.in.context branch from f263f56 to 461785e Compare October 16, 2024 09:11

elliotgunton closed this Nov 13, 2024

alicederyn deleted the use.pydantic.types.in.context branch November 14, 2024 10:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable Pydantic I/O types in workflow context #1189

Enable Pydantic I/O types in workflow context #1189

alicederyn commented Sep 4, 2024 •

edited

Loading

codecov bot commented Sep 4, 2024 •

edited

Loading

alicederyn Sep 4, 2024 •

edited

Loading

alicederyn Sep 4, 2024 •

edited

Loading

jeongukjae Sep 5, 2024 •

edited

Loading

jeongukjae Sep 5, 2024

alicederyn Sep 5, 2024

jeongukjae Sep 5, 2024

elliotgunton Sep 5, 2024 •

edited

Loading

alicederyn Sep 5, 2024 •

edited

Loading

elliotgunton Sep 5, 2024

alicederyn Sep 5, 2024

alicederyn Sep 5, 2024 •

edited

Loading

elliotgunton Sep 5, 2024

alicederyn Sep 5, 2024

elliotgunton Sep 5, 2024

alicederyn Sep 5, 2024

elliotgunton Sep 5, 2024

alicederyn Sep 5, 2024

elliotgunton left a comment

elliotgunton Sep 5, 2024

alicederyn Sep 5, 2024 •

edited

Loading

elliotgunton commented Sep 5, 2024

alicederyn commented Sep 5, 2024

elliotgunton left a comment

elliotgunton commented Sep 13, 2024

elliotgunton commented Nov 13, 2024

	elif self.name in other._get_dependency_tasks():
	raise ValueError(f"{self.name} already in {other.name}'s depends: {other.depends}")

	depends=" && ".join(sorted(_context.pieces[-1]._current_task_depends)) or None,
	**kwargs,

	def _add_sub(self, node: Any):
	if not isinstance(node, Task):
	raise InvalidType(type(node))
	if node.name in self._node_names:
	raise NodeNameConflict(f"Found multiple Task nodes with name: {node.name}")
	self._node_names.add(node.name)
	self.tasks.append(node)

		result = triple(IntInput(field=5)).field

		assert result == "{{steps.triple.outputs.parameters.field}}"

Enable Pydantic I/O types in workflow context #1189

Enable Pydantic I/O types in workflow context #1189

Conversation

alicederyn commented Sep 4, 2024 • edited Loading

codecov bot commented Sep 4, 2024 • edited Loading

Codecov Report

alicederyn Sep 4, 2024 • edited Loading

Choose a reason for hiding this comment

alicederyn Sep 4, 2024 • edited Loading

Choose a reason for hiding this comment

jeongukjae Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

elliotgunton Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

alicederyn Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alicederyn Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

elliotgunton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alicederyn Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

elliotgunton commented Sep 5, 2024

alicederyn commented Sep 5, 2024

elliotgunton left a comment

Choose a reason for hiding this comment

elliotgunton commented Sep 13, 2024

elliotgunton commented Nov 13, 2024

alicederyn commented Sep 4, 2024 •

edited

Loading

codecov bot commented Sep 4, 2024 •

edited

Loading

alicederyn Sep 4, 2024 •

edited

Loading

alicederyn Sep 4, 2024 •

edited

Loading

jeongukjae Sep 5, 2024 •

edited

Loading

elliotgunton Sep 5, 2024 •

edited

Loading

alicederyn Sep 5, 2024 •

edited

Loading

alicederyn Sep 5, 2024 •

edited

Loading

alicederyn Sep 5, 2024 •

edited

Loading