Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: deprecate job move within priority group #9585

Merged
merged 14 commits into from
Jul 2, 2024

Conversation

kkunapuli
Copy link
Contributor

@kkunapuli kkunapuli commented Jun 27, 2024

Ticket

RM-360

Description

Deprecate job move within the same priority group. Jobs can still be re-ordered via CLI command by updating their priority.

Test Plan

Checklist

  • Changes have been manually QA'd
  • New features have been approved by the corresponding PM
  • User-facing API changes have the "User-facing API Change" label
  • Release notes have been added as a separate file under docs/release-notes/
    See Release Note for details.
  • Licenses have been included for new code which was copied and/or modified from any external code

@cla-bot cla-bot bot added the cla-signed label Jun 27, 2024
Copy link

netlify bot commented Jun 27, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit ca634f2
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/667edf2ba084ec0008ae81e5

Copy link

codecov bot commented Jun 27, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 44.07%. Comparing base (13e92e4) to head (7fbdb4b).
Report is 22 commits behind head on job-queue-move.

❗ There is a different number of reports uploaded between BASE (13e92e4) and HEAD (7fbdb4b). Click for more details.

HEAD has 5 uploads less than BASE
Flag BASE (13e92e4) HEAD (7fbdb4b)
harness 9 4
Additional details and impacted files
@@                Coverage Diff                 @@
##           job-queue-move    #9585      +/-   ##
==================================================
- Coverage           49.86%   44.07%   -5.79%     
==================================================
  Files                1247     1086     -161     
  Lines              162293   149901   -12392     
  Branches             2887     2888       +1     
==================================================
- Hits                80922    66066   -14856     
- Misses              81200    83664    +2464     
  Partials              171      171              
Flag Coverage Δ
harness 38.64% <ø> (-25.11%) ⬇️
web 46.22% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
master/internal/db/postgres_jobs.go 92.30% <ø> (+1.00%) ⬆️
...ster/internal/rm/agentrm/agent_resource_manager.go 49.66% <ø> (-0.45%) ⬇️
master/internal/rm/agentrm/resource_pool.go 39.74% <ø> (+0.61%) ⬆️
...nal/rm/dispatcherrm/dispatcher_resource_manager.go 18.78% <ø> (+0.01%) ⬆️
master/internal/rm/kubernetesrm/job.go 79.61% <ø> (+1.59%) ⬆️
master/internal/rm/kubernetesrm/jobs.go 70.19% <100.00%> (+0.53%) ⬆️
...nal/rm/kubernetesrm/kubernetes_resource_manager.go 31.22% <ø> (+0.40%) ⬆️
master/internal/rm/kubernetesrm/resource_pool.go 85.05% <ø> (+11.42%) ⬆️
master/internal/rm/multirm/multirm.go 82.95% <ø> (-0.38%) ⬇️
master/internal/rm/resource_manager_iface.go 77.77% <ø> (ø)
... and 3 more

... and 265 files with indirect coverage changes

@determined-ci determined-ci requested a review from a team June 28, 2024 15:33
@determined-ci determined-ci added the documentation Improvements or additions to documentation label Jun 28, 2024
@kkunapuli kkunapuli changed the title first pass chore: deprecate job move within priority group Jun 28, 2024
@kkunapuli kkunapuli marked this pull request as ready for review June 28, 2024 17:36
@kkunapuli kkunapuli requested review from a team as code owners June 28, 2024 17:36
@kkunapuli kkunapuli requested review from hamidzr and MikhailKardash and removed request for hamidzr June 28, 2024 17:36
@kkunapuli kkunapuli marked this pull request as draft June 28, 2024 17:36
@kkunapuli kkunapuli marked this pull request as ready for review June 28, 2024 17:37

- Agent and Kubernetes Resource Manager: Jobs can no longer be moved within the same priority
group. Update a job's priority via CLI command to move a job ahead, or behind, of another job.
Visit :ref:`modify-job-queue-cli` for details. Deprecation was announced in 0.33.0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Agent and Kubernetes Resource Manager: Jobs can no longer be moved within the same priority group. To reposition a job, update its priority using the CLI. For detailed instructions, visit :ref:modify-job-queue-cli. This change was announced in version 0.33.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self: adjust priority in CLI or webui

Copy link
Member

@tara-det-ai tara-det-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added suggestion

@kkunapuli kkunapuli requested review from NicholasBlaskey and removed request for carolinaecalderon June 28, 2024 22:48
Ahead: true,
ResourcePool: j.ResourcePool(),
})
return fmt.Errorf("action not supported - update priority to move job")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should update the proto definition to no longer have these options

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we leave it like this, we can have a more informative error message though right?

I was thinking we could leave it until the UI changes are done to actually remove the option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UI and the API being removed should happen at the same time (as in the same release) so I don't think the error message is that relevant here

We should update the proto definition comments / mark it as deprecated in proto at the least so our API documentation is accurate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UI updates need to be prioritized. That could be in a future release. My understanding from product is that it was an acceptable trade-off.

https://hpe-aiatscale.slack.com/archives/C06GMG83ZE0/p1718299032071299?thread_ts=1718298903.798219&cid=C06GMG83ZE0

I agree with marking the proto definition as deprecated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the UI happen removal happen first?

Can we just wait till the UI removal happens then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand what you're suggesting. Are you suggesting we shouldn't merge this PR until the corresponding front-end work is completed?

My understanding from previous conversations is that we were going to remove as much as possible from the backend, and leave it as a backlog task to update the front-end. https://hpe-aiatscale.slack.com/archives/C06GMG83ZE0/p1718311967982659?thread_ts=1718298903.798219&cid=C06GMG83ZE0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That previous conversation is talking about a slightly different situation. The previous conversation assumed that Kubernetes would still support this feature and the backlog ticket is to grey out it for agentrm / slurm cases. This change now has removed the feature from all resource managers. I think there is a gap between having a button that only works in certain situations verse a button that never works.

Have we asked the frontend team if by the release would be a reasonable deadline for their work on this?

@kkunapuli kkunapuli changed the base branch from main to job-queue-move July 2, 2024 18:19
@determined-ci determined-ci requested a review from a team July 2, 2024 18:46
@kkunapuli kkunapuli merged commit 95e090f into job-queue-move Jul 2, 2024
76 of 92 checks passed
@kkunapuli kkunapuli deleted the kunapuli/deprecate-job-queue-move branch July 2, 2024 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants