Write to the action cache in the execution server #7974
resultDigest, err := digest.AddInvocationIDToDigest(actionResourceName.GetDigest(), actionResourceName.GetDigestFunction(), md.GetExecutionTask().GetInvocationId())
if err != nil {
	return status.UnavailableErrorf("Error uploading action result: %s", err.Error())
}
cacheableResourceName = digest.NewResourceName(resultDigest, actionResourceName.GetInstanceName(), rspb.CacheType_AC, actionResourceName.GetDigestFunction())
I think these failed ActionResults are no longer used, so we can short-circuit in this case. The UI now fetches the response stored by cacheExecuteResponse, which includes the complete response and works for both failed and successful actions.
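A minimal Go sketch of the suggested short-circuit. It assumes a hypothetical, trimmed-down `ActionResult` type and treats a non-zero exit code as "failed" (the real server may also consider execution errors). The helper name `shouldWriteActionResult` is illustrative, not taken from the codebase.

```go
package main

import "fmt"

// ActionResult is a hypothetical stand-in for the REAPI ActionResult message;
// only the exit code matters for this sketch.
type ActionResult struct {
	ExitCode int32
}

// shouldWriteActionResult reports whether a result should go to the action
// cache. Failed results are skipped instead of being written under a
// per-invocation key, since the UI reads the full stored response instead.
func shouldWriteActionResult(ar *ActionResult) bool {
	return ar != nil && ar.ExitCode == 0
}

func main() {
	for _, ar := range []*ActionResult{{ExitCode: 0}, {ExitCode: 1}, nil} {
		fmt.Println(shouldWriteActionResult(ar))
	}
}
```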
I sent #7985 to first remove these writes.
proto/execution_stats.proto
@@ -135,6 +135,9 @@ message Execution {
message ExecutionAuxiliaryMetadata {
  // Platform overrides set via remote header.
  build.bazel.remote.execution.v2.Platform platform_overrides = 1;

  // The ExecutionTask that the executor received from the scheduler.
  build.bazel.remote.execution.v2.ExecutionTask execution_task = 2;
Just as a note: once we start populating this, we should keep an eye on the Prometheus metrics for AC write size and make sure they look OK.
Also, I decided it's probably best not to write the auxiliary metadata in cacheExecuteResponse. It's not used and it can be large.
The UI uses the auxiliary metadata. (If you were looking for references with a case-sensitive search for AuxiliaryMetadata, you'll miss the UI references because they are lowerCamelCase.)
Re. metrics: Upload throughput also accounts for CAS blobs. You could either run a one-off query (Explore) restricted to the AC label, or update that dashboard to split those charts into AC vs. CAS.
Ah, that's a good point about the UI. I decided to play it safe and not remove any existing data. Instead, the ExecutionTask will come as a separate auxiliary metadata, which can easily be removed without messing with the other metadata. PTAL.
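A minimal Go sketch of how the app might pick the ExecutionTask out of the auxiliary metadata once it arrives as its own entry. REAPI's `auxiliary_metadata` is a repeated `google.protobuf.Any`; here `Any` is a hypothetical stdlib-only stand-in (real code would likely use the `anypb` package), and the type name is assumed from the proto package used for `ExecutionTask` in the diff. Because the task is a separate entry, dropping it later means removing one element without touching the other metadata.

```go
package main

import (
	"fmt"
	"strings"
)

// Any is a hypothetical stand-in for google.protobuf.Any: a type URL plus
// serialized message bytes.
type Any struct {
	TypeURL string
	Value   []byte
}

// executionTaskTypeName is an assumption based on the proto package in the diff.
const executionTaskTypeName = "build.bazel.remote.execution.v2.ExecutionTask"

// findExecutionTask returns the first entry whose type URL names ExecutionTask.
func findExecutionTask(aux []*Any) (*Any, bool) {
	for _, a := range aux {
		if a == nil {
			continue
		}
		// Type URLs look like "<prefix>/<fully.qualified.Name>"; compare only
		// the part after the last slash (or the whole string if there is none).
		name := a.TypeURL[strings.LastIndex(a.TypeURL, "/")+1:]
		if name == executionTaskTypeName {
			return a, true
		}
	}
	return nil, false
}

func main() {
	aux := []*Any{
		{TypeURL: "type.googleapis.com/example.OtherMetadata"},
		{TypeURL: "type.googleapis.com/" + executionTaskTypeName},
	}
	_, ok := findExecutionTask(aux)
	fmt.Println(ok)
}
```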
Now that I don't need the invocation ID, another option is to stop using the presence of the ExecutionTask as the trigger for cache writes from the app. All the necessary data is already available in PublishOperation. One downside is that we might double-write to the cache, especially with self-hosted executors that aren't getting updated.
I think the double write for older executor clients (and during the rollout) seems fine; if we notice it being a problem, we could ask people to upgrade and/or gate the logic behind some sort of version check.
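A small Go sketch of why the double write is tolerable, assuming both the older executor and the app store the same serialized ActionResult under the same AC key: the second write just overwrites an equivalent value, so the cost is an extra write rather than a correctness problem. `actionCache` is a hypothetical in-memory stand-in, not the real AC.

```go
package main

import (
	"fmt"
	"sync"
)

// actionCache is a hypothetical in-memory stand-in for the action cache.
type actionCache struct {
	mu      sync.Mutex
	entries map[string][]byte
	writes  int // counts writes so the cost of double-writing is visible
}

func newActionCache() *actionCache {
	return &actionCache{entries: make(map[string][]byte)}
}

// Put overwrites any existing entry for key.
func (c *actionCache) Put(key string, result []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = result
	c.writes++
}

// Get returns the stored result for key, if any.
func (c *actionCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.entries[key]
	return v, ok
}

func main() {
	ac := newActionCache()
	result := []byte("serialized ActionResult")
	ac.Put("action-digest", result) // an older executor still writes itself
	ac.Put("action-digest", result) // the app writes again from PublishOperation
	got, _ := ac.Get("action-digest")
	fmt.Println(string(got), ac.writes)
}
```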
Here is the version that doesn't avoid double writes: #8009. Still waiting for Tyler's thoughts on which one he prefers.
Tyler's OK with the double writes, so ignore this PR in favor of #8009.
It's not necessary for this change.
This makes it easier to remove it, and only it.
@@ -1007,31 +1008,61 @@ func (s *ExecutionServer) PublishOperation(stream repb.Execution_PublishOperatio
	return err
}

if op.GetName() == "" {
This change feels a little risky; is it needed?
I could revert this change, but as far as I can tell, PublishOperation can't do anything useful unless the name is an execution ID.
This is done when the executor sends a completed action to PublishOperation, and only when the executor sends the ExecutionTask in auxiliary metadata. This lets us avoid a period during which both the executor and the app write to the cache.

I didn't include the executor changes in this PR. My plan was to let this roll out, and then send an executor PR that would start populating ExecutionAuxiliaryMetadata.execution_task and stop writing to the cache. This avoids double writes, but it still leaves a window with no writes if the executor is updated before the app. Let me know what you think.
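A minimal Go sketch of the gating described above, using a hypothetical `publishedOp` summary in place of the real operation and metadata types; `appShouldWriteAC` is an illustrative name, not from the codebase.

```go
package main

import "fmt"

// publishedOp is a hypothetical summary of what the app sees in
// PublishOperation: whether the operation is complete and whether the
// executor attached an ExecutionTask as auxiliary metadata.
type publishedOp struct {
	Done             bool
	HasExecutionTask bool
}

// appShouldWriteAC returns true only for completed operations from executors
// that send the ExecutionTask, i.e. executors that no longer write the AC
// themselves, so the executor and the app never both write.
func appShouldWriteAC(op publishedOp) bool {
	return op.Done && op.HasExecutionTask
}

func main() {
	cases := []publishedOp{
		{Done: false, HasExecutionTask: true}, // still running: nothing to write yet
		{Done: true, HasExecutionTask: false}, // old executor: it writes the AC itself
		{Done: true, HasExecutionTask: true},  // new executor: the app writes the AC
	}
	for _, c := range cases {
		fmt.Println(c, appShouldWriteAC(c))
	}
}
```

The no-writes window corresponds to the third case hitting an app that predates this change: the updated executor has stopped writing, and the old app ignores the ExecutionTask, so neither side writes.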