Recover updates to docs and release notes (#356)
* Recover updates to docs and release notes

* Add more features to release notes

Change-Id: I1783a3e890da9a0599b83452e77d956e58d83ec6

* Better wording

Change-Id: Idb8737e5ba5f55863438d601743a15e5967f6ea1
alculquicondor authored Aug 25, 2022
1 parent c87be91 commit 8016971
Showing 5 changed files with 71 additions and 52 deletions.
30 changes: 19 additions & 11 deletions CHANGELOG/CHANGELOG-0.2.md
@@ -3,24 +3,32 @@
Changes since `v0.1.0`:

### Features
- Bumped the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported and Queue is now named LocalQueue.
- Add webhooks to validate and add defaults to all kueue APIs.

- Upgrade the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported.
v1alpha2 includes the following changes:
- Rename Queue to LocalQueue.
- Remove ResourceFlavor.labels. Use ResourceFlavor.metadata.labels instead.
- Add webhooks to validate and to add defaults to all kueue APIs.
- Add internal cert manager to serve webhooks with TLS.
- Use finalizers to prevent ClusterQueues and ResourceFlavors in use from being
deleted prematurely.
- Support [codependent resources](/docs/concepts/cluster_queue.md#codepedent-resources)
by assigning the same flavor to codependent resources in a pod set.
- Support [pod overhead](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-overhead/)
in Workload pod sets.
- Default requests to limits if requests are not set in a Workload pod set, to
match internal defaulting for k8s Pods.
- Added [prometheus metrics](/docs/reference/metrics.md) to monitor health of
- Set requests to limits if requests are not set in a Workload pod set,
matching [internal defaulting for k8s Pods](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources).
- Add [prometheus metrics](/docs/reference/metrics.md) to monitor health of
the system and the status of ClusterQueues.
- Use Server Side Apply for Workload admission to reduce API conflicts.

### Bug fixes

- Prevent Workloads that don't match the ClusterQueue's namespaceSelector from
blocking other Workloads in a StrictFIFO ClusterQueue.
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
- Fix bug that caused Workloads that don't match the ClusterQueue's
namespaceSelector to block other Workloads in StrictFIFO ClusterQueues.
- Fix the number of pending workloads in BestEffortFIFO ClusterQueues status.
- Fix a bug in BestEffortFIFO ClusterQueues where a workload might not be
retried after a transient error.
- Fixed requeuing an out-of-date workload when failed to admit it.
- Fixed bug in a BestEffortFIFO ClusterQueue where unadmissible workloads
- Fix requeuing an out-of-date workload when failed to admit it.
- Fix a bug in BestEffortFIFO ClusterQueues where inadmissible workloads
were not removed from the ClusterQueue when removing the corresponding Queue.
17 changes: 8 additions & 9 deletions README.md
@@ -8,16 +8,15 @@ created) and when it should stop (as in active pods should be deleted).
## Why use Kueue

Kueue is a lean controller that you can install on top of a vanilla Kubernetes
cluster without replacing any components. It is compatible with cloud
environments where:
- Nodes and other compute resources can be scaled up and down.
cluster. Kueue does not replace any existing Kubernetes components. Kueue is
compatible with cloud environments where:
- Compute resources are elastic and can be scaled up and down.
- Compute resources are heterogeneous (in architecture, availability, price, etc.).

Kueue APIs allow you to express:
- Quotas and policies for fair sharing among tenants.
- Resource fungibility: if a [resource flavor](docs/concepts/cluster_queue.md#resourceflavor-object)
is fully utilized, run the [job](docs/concepts/workload.md) using a different
flavor.
is fully utilized, Kueue can admit the job using a different flavor.

The main design principle for Kueue is to avoid duplicating mature functionality
in [Kubernetes components](https://kubernetes.io/docs/concepts/overview/components/)
@@ -62,11 +61,11 @@ Learn more about:

<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->

Learn more about the architecture of Kueue in the design docs:
Learn more about the architecture of Kueue with the following design docs:

- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) (please join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
to get access) discusses the API proposal and a high-level description of how it
operates.
- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) discusses the API proposal and a high-level
description of how Kueue operates. Join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
to get document access.
- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
presents the detailed design of the controller.

69 changes: 40 additions & 29 deletions docs/concepts/cluster_queue.md
@@ -1,9 +1,9 @@
# Cluster Queue

A ClusterQueue is a cluster-scoped object that governs a pool of resources
such as CPU, memory and hardware accelerators. A `ClusterQueue` defines:
- The [resource _flavors_](#resourceflavor-object) that it manages, with usage
limits and order of consumption.
such as CPU, memory, and hardware accelerators. A ClusterQueue defines:
- The [resource _flavors_](#resourceflavor-object) that the ClusterQueue manages,
with usage limits and order of consumption.
- Fair sharing rules across the tenants of the cluster.

Only [cluster administrators](/docs/tasks#batch-administrator) should create `ClusterQueue` objects.
@@ -39,29 +39,29 @@ You can specify the quota as a [quantity](https://kubernetes.io/docs/reference/k
## Resources
In a ClusterQueue, you can define quotas for multiple [compute resources](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-types)
(cpu, memory, GPUs, etc.).
(CPU, memory, GPUs, etc.).
For each resource, you can define quotas for multiple _flavors_. A
flavor represents different variations of a resource. The variations can be
defined in a [ResourceFlavor object](#resourceflavor-object).
For each resource, you can define quotas for multiple _flavors_.
Flavors represent different variations of a resource (for example, different GPU
models). A flavor is defined using a [ResourceFlavor object](#resourceflavor-object).
In a process called [admission](.#admission), Kueue assigns
[Workload pod sets](workload.md#pod-sets) a flavor for each resource it requests.
In a process called [admission](.#admission), Kueue assigns to the
[Workload pod sets](workload.md#pod-sets) a flavor for each resource the pod set
requests.
Kueue assigns the first flavor in the ClusterQueue's `.spec.resources[*].flavors`
list that has enough unused `min` quota in the ClusterQueue or the
ClusterQueue's [cohort](#cohort).
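As a rough illustration of the first-fit rule above, the following Python sketch picks a flavor for a single resource. It is not Kueue's implementation: the function name `assign_flavor` and the precomputed `unused_min` map (unused `min` quota per flavor, already combined across the ClusterQueue and its cohort) are hypothetical inputs chosen for the example.

```python
from typing import Dict, List, Optional


def assign_flavor(request: int,
                  flavors: List[str],
                  unused_min: Dict[str, int]) -> Optional[str]:
    """Return the first flavor, in ClusterQueue order, whose unused `min`
    quota covers the request, or None if no flavor fits."""
    for flavor in flavors:
        # Flavors missing from the map are treated as having no unused quota.
        if request <= unused_min.get(flavor, 0):
            return flavor
    return None


# Hypothetical quotas: "on-demand" is listed first but only has 2 units left.
print(assign_flavor(3, ["on-demand", "spot"], {"on-demand": 2, "spot": 5}))  # spot
```

The order of the `flavors` list matters: a request that fits in both flavors always gets the one listed first.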

### Codependent resources

It is possible that multiple resources are tied to the same flavors. This is
typical for `cpu` and `memory`, where the flavors are generally tied to a
machine family or availability guarantees.
It is possible that multiple resources in a ClusterQueue have the same flavors.
This is typical for `cpu` and `memory`, where the flavors are generally tied to
a machine family or VM availability policies. When two or more resources in a
ClusterQueue match their flavors, they are said to be codependent resources.

If this is the case, the resources in the ClusterQueue must list the same
flavors in the same order. When two or more resources match their flavors,
they are said to be codependent. During admission, for each pod set in a
Workload, Kueue assigns the same flavor to the codependent resources that the
pod set requests.
To manage codependent resources, you should list the flavors in the ClusterQueue
resources in the same order. During admission, for each pod set in a Workload,
Kueue assigns the same flavor to the codependent resources that the pod set requests.
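A minimal sketch of the codependent case, assuming hypothetical precomputed inputs: the pod set's requests for all codependent resources must fit in one flavor before that flavor is chosen, so `cpu` and `memory` never end up split across flavors. The names `assign_codependent` and `unused` are illustrative, not Kueue APIs.

```python
from typing import Dict, List, Optional


def assign_codependent(requests: Dict[str, int],
                       flavors: List[str],
                       unused: Dict[str, Dict[str, int]]) -> Optional[str]:
    """Pick one flavor for all codependent resources of a pod set.

    requests: resource name -> amount requested by the pod set.
    flavors: the flavor list shared by the codependent resources, in order.
    unused: flavor -> resource -> unused quota (hypothetical precomputed input).
    """
    for flavor in flavors:
        available = unused.get(flavor, {})
        # Every codependent resource must fit in the same flavor.
        if all(amount <= available.get(res, 0) for res, amount in requests.items()):
            return flavor
    return None


# Hypothetical quotas; memory in GiB.
unused = {"spot": {"cpu": 8, "memory": 8}, "on-demand": {"cpu": 4, "memory": 32}}
print(assign_codependent({"cpu": 4, "memory": 16}, ["spot", "on-demand"], unused))  # on-demand
```

Here `spot` has enough `cpu` but not enough `memory`, so both resources are assigned `on-demand` instead of being split between the two flavors.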

An example of a ClusterQueue with codependent resources looks like the following:

@@ -150,8 +150,8 @@ Resources in a cluster are typically not homogeneous. Resources could differ in:
- architecture (ex: x86 vs ARM CPUs)
- brands and models (ex: Radeon 7000 vs Nvidia A100 vs T4 GPUs)

A ResourceFlavor is an object that represents these variations and allows you
to associate them with node labels and taints.
A ResourceFlavor is an object that represents these resource variations and
allows you to associate them with node labels and taints.

**Note**: If your cluster is homogeneous, you can use an [empty ResourceFlavor](#empty-resourceflavor)
instead of adding labels to custom ResourceFlavors.
@@ -197,8 +197,8 @@ steps:

For example, for a [batch/v1.Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/),
Kueue adds the labels to the `.spec.template.spec.nodeSelector` field. This
guarantees that the workload Pods run on the nodes associated to the flavor
that Kueue decided that the workload should use.
guarantees that the Workload's Pods can only be scheduled on the nodes
targeted by the flavor that Kueue assigned to the Workload.

### ResourceFlavor taints

@@ -208,7 +208,7 @@ with taints.
Taints on the ResourceFlavor work similarly to [node taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).
For Kueue to admit a workload to use the ResourceFlavor, the PodSpecs in the
workload should have a toleration for it. As opposed to the behavior for
[ResourceFlavor labels](#resourceflavor-labels), Kueue will not add tolerations
[ResourceFlavor labels](#resourceflavor-labels), Kueue does not add tolerations
for the flavor taints.
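The toleration check can be sketched in Python using the standard Kubernetes rules for matching one toleration against one taint: an empty `effect` matches every effect, and an empty `key` with the `Exists` operator matches every key. Treating every flavor taint as blocking, regardless of its effect, is a simplifying assumption of this sketch, and `Taint`, `Toleration`, and `pod_set_tolerates_flavor` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # empty matches any effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Kubernetes matching rules for one toleration against one taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return tol.operator == "Exists"


def pod_set_tolerates_flavor(tolerations: List[Toleration],
                             taints: List[Taint]) -> bool:
    """Assumption: every flavor taint must be tolerated, whatever its effect."""
    return all(any(tolerates(t, taint) for t in tolerations) for taint in taints)


spot_taints = [Taint(key="spot", value="true")]
print(pod_set_tolerates_flavor([], spot_taints))  # False
print(pod_set_tolerates_flavor([Toleration(key="spot", operator="Exists")], spot_taints))  # True
```

Because Kueue does not add tolerations for flavor taints, the tolerations passed in here are only the ones already present in the Workload's PodSpecs.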

### Empty ResourceFlavor
@@ -238,16 +238,27 @@ ClusterQueue.

### Flavors and borrowing semantics

When borrowing, Kueue satisfies the following admission semantics:
When a ClusterQueue is part of a cohort, Kueue satisfies the following admission
semantics:

- When assigning flavors, Kueue goes through the list of flavors in the
ClusterQueue's `.spec.resources[*].flavors`. For each flavor, Kueue attempts
to fit a Workload's pod set using the `min` quota of the ClusterQueue or the
unused `min` quota of other ClusterQueues in the cohort, up to the `max` quota
of the ClusterQueue. If the workload doesn't fit, Kueue proceeds evaluating the next
flavor in the list.
- A ClusterQueue can only borrow quota of flavors it defines and it can only
borrow quota for one flavor.
to fit a Workload's pod set according to the quota defined in the
ClusterQueue for the flavor and the unused quota in the cohort.
If the workload doesn't fit, Kueue evaluates the next flavor in the list.
- A Workload's pod set resource fits in a flavor defined for a ClusterQueue
resource if the sum of requests for the resource:
1. Is less than or equal to the unused `.quota.min` for the flavor in the
ClusterQueue; or
2. Is less than or equal to the sum of unused `.quota.min` for the flavor in
the ClusterQueues in the cohort, and
3. Is less than or equal to the unused `.quota.max` for the flavor in the
ClusterQueue.
In Kueue, when (2) and (3) are satisfied, but not (1), this is called
_borrowing quota_.
- A ClusterQueue can only borrow quota for flavors that the ClusterQueue defines.
- For each pod set resource in a Workload, a ClusterQueue can only borrow quota
for one flavor.
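The three fit conditions above can be transcribed directly into a small Python function. This is a hedged reading of the rules as written, not Kueue's code. It reads the list as `(1) or ((2) and (3))`, expects the caller to pass precomputed unused quotas, and uses `unused_max=None` for a ClusterQueue that sets no `max` for the flavor (an assumption).

```python
from typing import Optional, Tuple


def fits_flavor(request: int,
                unused_min: int,
                cohort_unused_min: int,
                unused_max: Optional[int]) -> Tuple[bool, bool]:
    """Return (fits, borrows) for one pod set resource in one flavor.

    request: sum of the pod set's requests for the resource.
    unused_min: unused .quota.min for the flavor in this ClusterQueue.
    cohort_unused_min: sum of unused .quota.min for the flavor across the cohort.
    unused_max: unused .quota.max for the flavor, or None if no max is set (assumption).
    """
    cond1 = request <= unused_min
    cond2 = request <= cohort_unused_min
    cond3 = unused_max is None or request <= unused_max
    if cond1:
        # Fits within the ClusterQueue's own min quota: no borrowing.
        return True, False
    if cond2 and cond3:
        # Conditions (2) and (3) hold but not (1): the workload borrows quota.
        return True, True
    return False, False


print(fits_flavor(8, unused_min=5, cohort_unused_min=10, unused_max=9))  # (True, True)
```

When a call returns `(False, False)`, the flavor assignment moves on to the next flavor in `.spec.resources[*].flavors`.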

### Borrowing example

4 changes: 2 additions & 2 deletions docs/concepts/local_queue.md
@@ -4,9 +4,9 @@ A `LocalQueue` is a namespaced object that groups closely related workloads
belonging to a single tenant. A `LocalQueue` points to one [`ClusterQueue`](cluster_queue.md)
from which resources are allocated to run its workloads.

Users submit jobs to a `LocalQueue`, instead of directly to a `ClusterQueue`.
Users submit jobs to a `LocalQueue`, instead of to a `ClusterQueue` directly.
Tenants can discover which queues they can submit jobs to by listing the
local queues in their namespace. The command looks similar to the following:
local queues in their namespace. The command is similar to the following:

```sh
kubectl get -n my-namespace localqueues
3 changes: 2 additions & 1 deletion docs/setup/install.md
@@ -50,7 +50,8 @@ kubectl delete -f https://github.com/kubernetes-sigs/kueue/releases/download/$VE

### Upgrading from 0.1 to 0.2

Upgrading from `0.1.x` to `0.2.y` is not supported due to breaking API changes.
Upgrading from `0.1.x` to `0.2.y` is not supported because of breaking API
changes.
To install Kueue `0.2.y`, [uninstall](#uninstall) the older version first.

## Install a custom-configured released version