TEP-0060: Propose remote resolution experimental controller #493
Conversation
/kind tep
Keeping this proposal open for a week, giving folks a chance to review if anyone is interested 🧑💻
/assign
In order to support an experimental controller independently performing
resolution we could add an arg to Tekton Pipelines' controller (e.g.
`--experimental-disable-in-tree-resolution`) that completely disables the
built-in resolution logic.
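For reference, a rough sketch of what the relevant slice of the controller Deployment might look like with the proposed flag added, assuming the default install where the controller runs as `tekton-pipelines-controller` in the `tekton-pipelines` namespace (the flag itself is only proposed here and does not exist yet):

```yaml
# Partial Deployment sketch (not a complete manifest): the proposed flag is
# appended to the controller's existing args.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tekton-pipelines-controller
  namespace: tekton-pipelines
spec:
  template:
    spec:
      containers:
        - name: tekton-pipelines-controller
          args:
            - "--experimental-disable-in-tree-resolution=true"  # proposed, not yet implemented
```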
i wonder if there's an alternative where we could leverage the "pending" state - e.g. if you want to use the experimental controller, you create "pending" pipelineruns, then the experimental resolution controller can do its thing (embedding the spec) and remove the "pending" status
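A minimal sketch of that flow, assuming the existing `spec.status: PipelineRunPending` setting from TEP-0015 (the resolver behaviour described here is hypothetical):

```yaml
# The run is created as pending; the experimental resolution controller would
# resolve the ref, embed the resolved spec, and then clear the pending status
# so the Pipelines controller picks it up.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: example-run          # hypothetical name
spec:
  status: PipelineRunPending
  pipelineRef:
    name: example-pipeline   # hypothetical reference to resolve
```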
Yeah agree this is an option for PipelineRuns and I considered it. We don't have that Pending feature for TaskRuns though. Should we gate this work on that being decided on and delivered first?
ah right - yeah that's kind of a pain 🤔
at the least i think the deciding part of adding "pending" for taskruns would be very brief in that the PR that was accepted for the pending tep actually was called "TEP-0015 - Add a pending setting to Tekton PipelineRun and TaskRuns" (#203) (more discussion in the PR that adds the feature tektoncd/pipeline#3522 (comment)) - so since we accepted adding pending PipelineRuns, im not sure why we wouldnt accept pending TaskRuns
I agree with you tho, making this TEP depend on that happening would be putting yet another obstacle in the path of this TEP and I'd rather accept the lesser evil of an extra config option than make it take even longer to start experimenting with this
(and as a side note, ive had this awful/wonderful idea recently that we'd be better off making the pending setting more like a semaphore - i.e. what if you have multiple controllers that need to sign off before executing a pending run? maybe there's a better way than that, e.g. a list of explicit controllers you want to process a run before it executes? LONG STORY SHORT i think you're making the right call)
(and as a side note, ive had this awful/wonderful idea recently that we'd be better off making the pending setting more like a semaphore - i.e. what if you have multiple controllers that need to sign off before executing a pending run? maybe there's a better way than that, e.g. a list of explicit controllers you want to process a run before it executes? LONG STORY SHORT i think you're making the right call)
YES. This is an excellent point. Very tricky edge case if a user wants both resolution and has their own pending processing logic.
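Purely as an illustration of the "semaphore" idea (nothing like this exists today; the annotation name and semantics are invented), a run might stay pending until every listed controller has signed off:

```yaml
# Hypothetical sketch only: multiple controllers must "sign off" before the run
# leaves the pending state.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: example-run
  annotations:
    example.tekton.dev/awaiting: "resolution-controller,policy-controller"
spec:
  status: PipelineRunPending   # cleared only once the awaiting list is empty
  pipelineRef:
    name: example-pipeline
```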
(I've also updated the doc to include mention of this alternative approach)
also other Tekton projects. Having the metadata copy performed by an external
controller doesn't make a ton of sense in that case: metadata copying is a
feature/requirement of Tekton Pipelines, not a general requirement for
resolution.
could you explain a bit more about this? im not totally following because:
- afaik the experimental resolution controllers know enough about pipelineruns + taskruns to modify them directly, so in that sense they're not generic enough to use for anything else (maybe im missing something tho)
- metadata like labels and annotations is pretty universal across k8s resources, so in that sense having some logic around this even in a very generic tool seems reasonable
The resolution controller I'm proposing for the experimental repo would aim to deliver the functionality that we've been discussing through TEP-0060, which has now largely boiled down to: (1) It should be a separate project (2) It should process the resources before Pipelines sees them and (3) it shouldn't be Pipelines-specific.
It's true that metadata is a universal feature of objects in k8s. But the behaviour where metadata is copied verbatim from Tasks to TaskRuns and Pipelines to PipelineRuns is quite specifically Tekton Pipelines I think? Let's say we open the system up in such a way that anyone can write a Resolver. Should we require their resolver to also perform this Pipelines-specific metadata copying action? I guess I'm not convinced that this is an action we want left to external resolvers to perform. Those labels / annotations can trip up the pipelines controller when they're missing and are part of our backwards-compat story I think.
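To make the copy being discussed concrete, a rough (non-normative) sketch of what ends up on a PipelineRun today: labels/annotations from the referenced Pipeline plus controller-added labels such as `tekton.dev/pipeline`:

```yaml
# Illustrative only: the owner label below is invented; tekton.dev/pipeline is
# the controller-added association label discussed in this thread.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: example-run
  labels:
    tekton.dev/pipeline: example-pipeline   # added by the Pipelines controller
    app.kubernetes.io/owner: my-team        # hypothetical label copied from the Pipeline
spec:
  pipelineRef:
    name: example-pipeline
```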
But the behaviour where metadata is copied verbatim from Tasks to TaskRuns and Pipelines to PipelineRuns is quite specifically Tekton Pipelines I think?
i wonder if one of the reasons we're copying around metadata like this is because we needed a better way to query pipelineruns and taskruns - which maybe is the role that tekton results fills now? might be something we want to consider for v1
I guess I'm not convinced that this is an action we want left to external resolvers to perform.
in that case im leaning more toward the approach where we embed the entire resolved yaml document (or as much of it as we know the controller needs) 🤔
Yeah, possibly the biggest reason for the metadata duplication is about querying the cluster for PipelineRuns associated to Pipelines (and TaskRuns to Tasks). I definitely agree that Results might be a better host for this kind of relational data.
Just to throw an additional wrinkle (because I'm also generally in favor of embedding the entire yaml doc): if we go the route of embedding the entire resolved yaml I imagine there will be some amount of time where we need to store both `pipelineSpec` and (hypothetical) `pipelineYAML` in single PipelineRuns' `status` for compatibility's sake. In turn I wonder if this would push PipelineRuns much more quickly towards etcd's size limits when very large Pipelines are involved. wdyt?
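A rough sketch of the compatibility window being described (the `pipelineYAML` field and its exact shape are hypothetical at this point in the discussion):

```yaml
# Both fields carried in a single PipelineRun status, roughly doubling the
# resolved payload stored in etcd for large Pipelines.
status:
  pipelineSpec:              # existing embedded spec, kept for older consumers
    tasks:
      - name: build
        taskRef:
          name: build-task
  pipelineYAML:              # hypothetical field holding the whole resolved document
    metadata:
      name: example-pipeline
      labels:
        app.kubernetes.io/owner: my-team
    spec:
      tasks:
        - name: build
          taskRef:
            name: build-task
```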
Sorry, didn't quite finish my thought inre: metadata duplication. While querying associations is one very big use case I think there are also probably others and it would probably behoove us to enumerate those and make sure they're accounted for before we decide to drop the copying behaviour.
I imagine there will be some amount of time where we need to store both `pipelineSpec` and (hypothetical) `pipelineYAML` in single PipelineRuns' `status` for compatibility's sake.
hmm I'm trying to think through the scenario where we'd need to do this - all I can think of is if the controller is downgraded during pipelinerun execution?
The current controller expects to find a pipeline spec in the pipelinerun status - the updated version would expect to find the entire pipelineYAML but could also support the pipeline spec if it found it (we could wait until we're outside the compatibility window to remove that support) - but most cases wouldnt require actually storing both i dont think?
scenarios i can think of:
1. upgrade from "pipeline spec" controller to new version:
   - completed pipelineruns would contain "pipeline spec" in status, doesnt matter cuz they're done
   - in progress pipelineruns would be using "pipeline spec" which would be okay as long as the new version still supported it
   - new pipelineruns would start using "pipelineYAML"
2. downgrade from new version of the controller to "pipeline spec" controller:
   - completed pipeline runs contain "pipelineYAML", doesnt matter cuz they're done (possible that on controller start up it gets upset about unexpected fields?)
   - in progress pipeline runs would be in trouble - the old controller wouldn't know how to read "pipelineYAML" out of their specs
   - new pipeline runs would use "pipeline spec"
Do we need to support the downgrade case in (2)? (specifically: support runs which execute during the update?)
I was specifically thinking of users who have written code that expects the `pipelineSpec` to be embedded in the status rather than scenarios where the system itself might be caught in an intermediate state. But it sounds like you don't think this is a situation we need to worry about?
OK, I've updated the proposal to pass the entire resolved yaml via a `taskYaml` field for taskruns and a `pipelineYaml` field for pipelineruns.
FYI @vdemeester and @dibyom - this is a change to the proposal so probably worth giving it one more look to confirm you're happy to keep approving.
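For TaskRuns the updated shape would look roughly like this (a sketch only; field names and casing follow the proposal text and may still change):

```yaml
# The resolver embeds the whole resolved Task document (metadata + spec) in the
# TaskRun status, instead of only the spec.
status:
  taskYaml:
    metadata:
      name: example-task
      labels:
        app.kubernetes.io/owner: my-team   # hypothetical label carried over
    spec:
      steps:
        - name: run
          image: alpine
          script: echo "hello"
```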
Resolving both `metadata` and `spec` under a `taskYAML` or `pipelineYAML` field sounds good to me.
users who have written code that expects the pipelineSpec to be embedded in the status
ah that's a good point, i was only thinking about this from the controller's perspective - i always forget about status as part of the api 😅
you're probably right, it's probably best to support both - maybe we can use (even more) config options to control whether we do that or not?
/assign vdemeester
The refactoring I'd started in pipelines#4108 ended up being far too invasive to deliver in a single PR. I've instead added the new resolving reconcilers behind flags in a way that's similar to what's described by this proposal. This protects pipelines' users from being exposed to the new changes until we're ready to commit to them. It also clears the way for external processes (like the experimental controller proposed here) to perform remote resolution.
needs re-approvals with the TEP changes, please take a look @tektoncd/core-maintainers
When the proposed experimental resolution controller sees an unresolved
`TaskRun` or `PipelineRun` it will read the resource and then resolve whatever
ref, bundle or inline document is included. The resolved `spec` will then be
embedded in the `status.taskSpec` field for `TaskRuns` or the
Should this be `status.taskYAML.spec` and `status.pipelineYAML.spec`?
Well spotted, thank you! Fixed and pushed.
looking great to me, i really like your thorough incremental approach
/approve
There are two approaches that I've considered for solutions to this:

1. Change the contract such that the _entire_ resolved YAML document is passed
at the risk of making this too complicated and unreliable, i wonder if we should identify some exceptions to this right off the bat
I was just looking at a Task in the dogfooding cluster and the metadata looks like this:
I'm wondering if we want to identify a list of metadata fields and specific annotations we don't copy, e.g.:
- the annotation `kubectl.kubernetes.io/last-applied-configuration`
- `managedFields` @_@

OR we start with only labels? we could pave the way for potentially embedding the entire thing by making the field able to store the entire yaml, but initially only store a certain subset that we know we need?
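A hypothetical before/after of that filtering (names illustrative only): the metadata as fetched from the cluster versus the subset a resolver might actually carry over:

```yaml
# As fetched from the cluster (noisy, server-managed fields included)...
metadata:
  name: example-task
  labels:
    app.kubernetes.io/owner: my-team
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"tekton.dev/v1beta1","kind":"Task"}'
  managedFields:
    - manager: kubectl
      operation: Apply
---
# ...versus the subset that would actually be copied into the run.
metadata:
  name: example-task
  labels:
    app.kubernetes.io/owner: my-team
```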
Updated with a note to decide precisely how we filter the copied metadata as part of this experimental phase.
Indeed, we should exclude storing `managedFields` and `last-applied-configuration`.
Updated with mention of managedFields in addition to last-applied-configuration.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish, dibyom, vdemeester
This commit updates the proposed remote resolution feature with a suggestion to develop a controller in our experimental repo to try out some of the Alternatives. In order to support such an experimental controller two changes are also proposed for Tekton Pipelines that would make such a controller much easier to develop and test.
Looking at the working group notes, it looks like the only reason this wasn't LGTM-ed in the meeting was b/c it needed a rebase, so: /lgtm |