Write to the action cache in the execution server #7974

vanja-p · 2024-11-27T21:06:30Z

This is done when the executor sends a completed action to PublishOperation, and only when the executor sends the ExecutionTask in auxiliary metadata. This allows us to avoid a period of time where both the executor and the app write to the cache.
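
A minimal Go sketch of the gating described above. `PublishedOperation`, `ExecutionTask`, `actionCache`, `mapAC`, and `maybeWriteAC` are hypothetical simplifications, not the real BuildBuddy protos or cache API. The only behavior taken from the PR is "write to the AC only for a completed action whose executor sent an ExecutionTask".

```go
package main

import "fmt"

// ExecutionTask and PublishedOperation are simplified, hypothetical
// stand-ins for the real protos; only the fields needed here are modeled.
type ExecutionTask struct {
	InvocationID string
}

type PublishedOperation struct {
	Done bool
	// ExecutionTask is nil when the executor did not send it in its
	// auxiliary metadata (i.e. an executor that still writes the AC itself).
	ExecutionTask *ExecutionTask
	ActionResult  []byte
}

// actionCache is a hypothetical minimal AC interface.
type actionCache interface {
	Put(key string, value []byte) error
}

// mapAC is an in-memory actionCache used for illustration only.
type mapAC map[string][]byte

func (m mapAC) Put(key string, value []byte) error {
	m[key] = value
	return nil
}

// maybeWriteAC writes the action result only for completed operations whose
// executor sent an ExecutionTask, so (per the rollout plan) the app and the
// executor do not both write the same result.
func maybeWriteAC(ac actionCache, actionKey string, op PublishedOperation) (bool, error) {
	if !op.Done || op.ExecutionTask == nil {
		return false, nil
	}
	if err := ac.Put(actionKey, op.ActionResult); err != nil {
		return false, fmt.Errorf("writing action result: %w", err)
	}
	return true, nil
}

func main() {
	ac := mapAC{}
	// Older executor: no ExecutionTask, so the server leaves the write to it.
	wrote, _ := maybeWriteAC(ac, "action-key", PublishedOperation{Done: true, ActionResult: []byte("result")})
	fmt.Println(wrote, len(ac)) // false 0
	// Newer executor: ExecutionTask present, so the server writes.
	wrote, _ = maybeWriteAC(ac, "action-key", PublishedOperation{
		Done:          true,
		ExecutionTask: &ExecutionTask{InvocationID: "inv-1"},
		ActionResult:  []byte("result"),
	})
	fmt.Println(wrote, len(ac)) // true 1
}
```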

I didn't include the executor changes in this PR. My plan was to let this roll out, and then send an executor PR that would start populating ExecutionAuxiliaryMetadata.execution_task and stop writing to the cache. This avoids double writes, but it still leaves a window with no writes at all if the executor is updated before the app. Let me know what you think.

bduffany · 2024-11-27T23:41:24Z

enterprise/server/remote_execution/execution_server/execution_server.go

+		resultDigest, err := digest.AddInvocationIDToDigest(actionResourceName.GetDigest(), actionResourceName.GetDigestFunction(), md.GetExecutionTask().GetInvocationId())
+		if err != nil {
+			return status.UnavailableErrorf("Error uploading action result: %s", err.Error())
+		}
+		cacheableResourceName = digest.NewResourceName(resultDigest, actionResourceName.GetInstanceName(), rspb.CacheType_AC, actionResourceName.GetDigestFunction())


I think these failed ActionResults are no longer used, so we can short-circuit in this case. The UI now fetches the response stored by cacheExecuteResponse, which includes the complete response and works for both failed and succeeded actions.

I sent #7985 to first remove these writes.

bduffany · 2024-11-27T23:47:12Z

proto/execution_stats.proto

@@ -135,6 +135,9 @@ message Execution {
 message ExecutionAuxiliaryMetadata {
  // Platform overrides set via remote header.
  build.bazel.remote.execution.v2.Platform platform_overrides = 1;
+
+  // The ExecutionTask that the executor received from the scheduler.
+  build.bazel.remote.execution.v2.ExecutionTask execution_task = 2;


Just as a note: once we start populating this, we probably want to keep an eye on the Prometheus metrics for AC write size and make sure they look OK.

Would that be the metric on the right of this screenshot?

Also, I decided it's probably best not to write the auxiliary metadata in cacheExecuteResponse. It's not used and it can be large.

The UI uses the auxiliary metadata. (If you searched case-sensitively for AuxiliaryMetadata, you'd miss the UI references because they are lowerCamelCase.)

Re: metrics, the Upload throughput chart includes CAS blobs as well. You could either run a one-off query (Explore) restricted to the AC label, or update that dashboard to split those charts by AC vs. CAS.

Ah, that's a good point about the UI. I decided to play it safe and not remove any existing data. Instead, the ExecutionTask will be sent as a separate auxiliary metadata entry, which can easily be removed without touching the other metadata. PTAL.
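
A sketch of why a separate auxiliary metadata entry is easy to remove on its own. It models `google.protobuf.Any` as a plain struct holding a type URL and serialized bytes; the `executionTaskTypeURL` value is inferred from the proto name in the diff rather than confirmed, and `findExecutionTask` / `withoutExecutionTask` are hypothetical helpers.

```go
package main

import "fmt"

// anyMsg is a simplified stand-in for google.protobuf.Any.
type anyMsg struct {
	TypeURL string
	Value   []byte
}

// Assumed type URL, derived from the proto name in the diff.
const executionTaskTypeURL = "type.googleapis.com/build.bazel.remote.execution.v2.ExecutionTask"

// findExecutionTask returns the serialized ExecutionTask from the auxiliary
// metadata list, if present. Other entries are not inspected.
func findExecutionTask(aux []anyMsg) ([]byte, bool) {
	for _, a := range aux {
		if a.TypeURL == executionTaskTypeURL {
			return a.Value, true
		}
	}
	return nil, false
}

// withoutExecutionTask returns a copy of aux with only the ExecutionTask
// entry dropped, leaving metadata the UI relies on untouched.
func withoutExecutionTask(aux []anyMsg) []anyMsg {
	out := make([]anyMsg, 0, len(aux))
	for _, a := range aux {
		if a.TypeURL != executionTaskTypeURL {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	aux := []anyMsg{
		// Hypothetical placeholder for the existing metadata entry.
		{TypeURL: "type.googleapis.com/example.OtherMetadata", Value: []byte("ui")},
		{TypeURL: executionTaskTypeURL, Value: []byte("task")},
	}
	task, ok := findExecutionTask(aux)
	fmt.Println(ok, string(task))               // true task
	fmt.Println(len(withoutExecutionTask(aux))) // 1
}
```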

Now that I don't need the invocation ID, another option is to not use the presence of the ExecutionTask to trigger cache writes from the app. All the necessary data is already available in PublishOperation. One downside of this is we might double write to the cache, especially in the case of self-hosted executors that aren't getting updated.

I think the double-write for older executor clients (and during the rollout) seems fine. If we notice it being a problem, we could ask people to upgrade and/or gate the logic behind some sort of version check.
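
If double writes ever did become a problem, the version check mentioned above could look roughly like this. Everything here is hypothetical: the `vMAJOR.MINOR.PATCH` version format, the cutoff release, and the `appShouldWriteAC` helper. Unparseable versions fall back to writing, on the assumption that an occasional double write is preferable to a missing write.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "vMAJOR.MINOR.PATCH" (the leading "v" is optional).
func parseVersion(s string) ([3]int, bool) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// appShouldWriteAC reports whether the app should write the AC for an
// executor at executorVersion, given the (hypothetical) first executor
// release that stops writing the AC itself.
func appShouldWriteAC(executorVersion, firstVersionWithoutExecutorWrites string) bool {
	ev, okE := parseVersion(executorVersion)
	mv, okM := parseVersion(firstVersionWithoutExecutorWrites)
	if !okE || !okM {
		// Unknown version: prefer a possible double write over no write.
		return true
	}
	for i := range ev {
		if ev[i] != mv[i] {
			// Older executors (ev < mv) still write the AC themselves.
			return ev[i] > mv[i]
		}
	}
	return true // Same version as the cutoff: executor no longer writes.
}

func main() {
	const cutoff = "v2.10.0" // hypothetical cutoff release
	for _, v := range []string{"v2.9.3", "v2.10.0", "dev"} {
		fmt.Println(v, appShouldWriteAC(v, cutoff))
	}
	// v2.9.3 false
	// v2.10.0 true
	// dev true
}
```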

Here is the version without avoiding double-writes: #8009. Still waiting for Tyler's thoughts on which one he prefers.

Tyler's OK with the double writes, so ignore this PR in favor of #8009.


bduffany · 2024-12-03T20:55:54Z

enterprise/server/remote_execution/execution_server/execution_server.go

@@ -1007,31 +1008,61 @@ func (s *ExecutionServer) PublishOperation(stream repb.Execution_PublishOperatio
 			return err
 		}

+		if op.GetName() == "" {


This change feels a little risky; is it needed?

I could revert this change, but as far as I can tell, there's no way for PublishOperation to do anything useful without the name being an execution ID.

Write to the cache in the execution server

9cc4995

vanja-p requested review from tylerwilliams and bduffany November 27, 2024 21:06

Fix BUILD ordering

0990e76

bduffany reviewed Nov 27, 2024


vanja-p added 7 commits December 2, 2024 10:58

Merge branch 'master' into vanja-cache

0b866aa

Remove rbetest.go from the PR

fbb38f5

it's not necessary for this change.

address feedback

a0e84f5

Merge branch 'master' into vanja-cache

e271cec

fix failing test.

51bdd59

Pass ExecutionTask as its own auxiliary metadata

0251520

This makes it easier to remove it, and only it

fix style

2d9dbf9

vanja-p requested a review from bduffany December 3, 2024 20:33

bduffany approved these changes Dec 3, 2024


vanja-p marked this pull request as draft December 4, 2024 22:18

vanja-p closed this Dec 6, 2024

vanja-p deleted the vanja-cache branch December 6, 2024 15:19



