KEP-5491: DRA: List Types for Attributes#5698
KEP-5491: DRA: List Types for Attributes#5698k8s-ci-robot merged 12 commits intokubernetes:masterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
6740a4d to
ff35ed6
Compare
ff35ed6 to
e418c60
Compare
| - Initial supported semantics are | ||
| - `NonEmptyIntersection` and `Identical` for `matchAttribute` | ||
| - `EmptyIntersection`, `PairwiseDisjoint`, and `AllDistinct` for `distinctAttribute` | ||
| - Maintain backward compatibility and inter-operability for scalar-only attributes. |
There was a problem hiding this comment.
We also need to figure out what to do with CEL expressions
There was a problem hiding this comment.
Agreed.
Figuring out how existing cel expressions comparing attributes to single-value scalars work in the presence of list attributes is important. A set of strawman examples using a hypothetical includes helper:
.attributes["foo"] is 1
`.attributes["foo"] == 1` cel -> true
`.attributes["foo"].includes(1)` cel -> true
.attributes["foo"] is [1]
`.attributes["foo"] == 1` cel -> ?, maybe true?
can maybe do with an overload if we want to
`.attributes["foo"].includes(1)` cel -> true
.attributes["foo"] is [1,2]
`.attributes["foo"] == 1` cel -> ?, probably false
`.attributes["foo"].includes(1)` cel -> true
There was a problem hiding this comment.
Wow, thanks for the great feedback! The compatibility I originally referred to only supposed the ResourceClaim's constraint objects.
Figuring out how existing cel expressions comparing attributes to single-value scalars work in the presence of list attributes is important.
But, I feel it's very difficult. If the attribute type is changed, I think it will be difficult to keep existing CEL expressions working without any modifications. This happens with any type changes, not only with scalar to list. It is because CEL is a strongly typed language, the types simply won’t match, i.e. it won’t compile.
Thus, even if we provide some helper functions/macros, I think we basically have to ask users to update thier CELs....
What do you think??
note: include could be written with standard exists macro, right?
.attributes["foo"].includes(1)
<==> .attributes["foo"].exists(x, x == 1)
I fully agree include is much easier to read though.
There was a problem hiding this comment.
So you can't override == in CEL? You can't do type coercion in operators?
If that's the case and so how it behaves today for other types, probably we should just fail and not try to solve this for users. It's unfortunate to need users to update CEL based on what version of the driver is running, though.
There was a problem hiding this comment.
If I'm reading this right we want a more convenient equivalent of type(attributes["foo"]) == list ? 1 in attributes["foo"] : attributes["foo"] == 1 ?
While it MIGHT be possible to define a CEL custom type that allows for a list type where attributes["foo"] == 1 is equivalent to attributes["foo"] == [1], it wouldn't be idiomatic. I don't recommend going that way. It would diverge from normal CEL type system expectations and feels confusing to anyone that already has an understanding of how the CEL type system is suppose to work.
The options I can think of are:
- Add a function like @liggitt suggested above. CEL supports function overloading so it's unsurprising in CEL to have a function operate differently depending on receiver and argument types.
- Overload the
inoperate to allow for1 in attributes["foo"].
I prefer adding a function. I consider overloading in less bad than overloading == but it's still surprising.
There was a problem hiding this comment.
Thanks, Joe. @everpeace let's add a helper function that can do the type-agnostic comparison in CEL.
@dom4ha once that's in, I think this is ready to go!
There was a problem hiding this comment.
Oh, thanks for the great discussion! OK. I'll add type-agnostic helper function (say .include). I'll update the KEP.
added: fixed in 2e25605
|
Such constraints will greatly complicate scheduling. Do we have any idea how we could approach it systematically to avoid checking all permutations? |
|
The simple version just changes the exist "equal" and "not equal" functions to do type coercions. So it has no impact on permutation evaluation. |
Co-authored-by: John Belamaric <[email protected]>
…matchAttribute semantics)
…monotinicity" to goal
…original one(introducing *Semantics fields) to Alternative seceion.
everpeace
left a comment
There was a problem hiding this comment.
@johnbelamaric @liggitt Thank you very much for the review!! I largely simplified the proposal so that it proposed just adding list type and no API changes on constraints side but just extending the semantics (No complex *Semantics fields).
Please take a look.
For the backward compatibility on existing CELs(in device selectors) issue, I also commented(I think it's hard...). Let's keep discussing.
| A good summary is probably at least a paragraph in length. | ||
| --> | ||
|
|
||
| The Device Resource Assignment (DRA) API currently allows scalar attribute values to describe device characteristics. However, many real-world device topologies require representing sets of relationships (e.g., multiple PCIe roots, NUMA nodes). This KEP introduces support for list-typed attributes in `ResourceSlice` and extends `ResourceClaim`'s `constraints` with a new declarative field, `matchSemantics`/`distinctSemantics`, that can configure _how_ attribute values should be evaluated. |
There was a problem hiding this comment.
OK. Let's remove *Semantics fields
|
|
||
| - Support typed-list in device attribute values. | ||
| - Define an extensible API in `ResourceClaim`'s `constraints` field which enables flexible matching semantics against single device attribute value. | ||
| - Initial supported semantics are |
There was a problem hiding this comment.
Right, we haven't had usecases for them. I'll remove this bullet item from the goal.
| - Initial supported semantics are | ||
| - `NonEmptyIntersection` and `Identical` for `matchAttribute` | ||
| - `EmptyIntersection`, `PairwiseDisjoint`, and `AllDistinct` for `distinctAttribute` | ||
| - Maintain backward compatibility and inter-operability for scalar-only attributes. |
There was a problem hiding this comment.
Wow, thanks for the great feedback! The compatibility I originally referred to only supposed the ResourceClaim's constraint objects.
Figuring out how existing cel expressions comparing attributes to single-value scalars work in the presence of list attributes is important.
But, I feel it's very difficult. If the attribute type is changed, I think it will be difficult to keep existing CEL expressions working without any modifications. This happens with any type changes, not only with scalar to list. It is because CEL is a strongly typed language, the types simply won’t match, i.e. it won’t compile.
Thus, even if we provide some helper functions/macros, I think we basically have to ask users to update thier CELs....
What do you think??
note: include could be written with standard exists macro, right?
.attributes["foo"].includes(1)
<==> .attributes["foo"].exists(x, x == 1)
I fully agree include is much easier to read though.
| - Redesigning the entire DRA matching model. | ||
| - Currently `Allocator`'s algorithm assumes [_monotonic_ constraints](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L274-L276) only. Monotonic means that once a constraint returns false, adding more devices will never cause it to return true. This allows to bound the computational complexity for searching device combinations which satisfies the specified constraints. This KEP does NOT intend to change this design. Thus, this KEP focuses to propose monotonic constraints only. |
There was a problem hiding this comment.
OK. I will move this in goal section.
| A good summary is probably at least a paragraph in length. | ||
| --> | ||
|
|
||
| The Device Resource Assignment (DRA) API currently allows scalar attribute values to describe device characteristics. However, many real-world device topologies require representing sets of relationships (e.g., multiple PCIe roots, NUMA nodes). This KEP introduces support for list-typed attributes in `ResourceSlice` and extends `ResourceClaim`'s `constraints` with a new declarative field, `matchSemantics`/`distinctSemantics`, that can configure _how_ attribute values should be evaluated. |
|
|
||
| - Support typed-list in device attribute values. | ||
| - Define an extensible API in `ResourceClaim`'s `constraints` field which enables flexible matching semantics against single device attribute value. | ||
| - Initial supported semantics are |
| - Redesigning the entire DRA matching model. | ||
| - Currently `Allocator`'s algorithm assumes [_monotonic_ constraints](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L274-L276) only. Monotonic means that once a constraint returns false, adding more devices will never cause it to return true. This allows to bound the computational complexity for searching device combinations which satisfies the specified constraints. This KEP does NOT intend to change this design. Thus, this KEP focuses to propose monotonic constraints only. |
| pool: | ||
| name: "cpu" | ||
| resourceSliceCount: 1 | ||
| allNodes: node-1 |
|
|
||
| - [x] Feature gate (also fill in values in `kep.yaml`) | ||
| - Feature gate name: `DRAListTypeAttributes` | ||
| - Components depending on the feature gate: kube-apiserver, kube-**scheduler** |
| NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`. | ||
| --> | ||
|
|
||
| Yes. When disabled, you can not create `ResourceClaim` with `matchSemantics`/`distinctSemantics` nor `DeviceAttribute` with `list`-type values. And, existing `list`-type attribute values are just ignored. But, if specified attribute in `matchAttribute`/`distinctAttribute` is `list` type, allocation will be failed. |
There was a problem hiding this comment.
Can you update this based on the simplified design?
There was a problem hiding this comment.
Thanks for the catch! I'll update.
added: fixed in 3ca5c9e
|
|
||
| ### Implementation (for evaluating constraints) | ||
|
|
||
| Since _non-empty intersection_ constraint is _monotonic_, we would not need updating [`Allocator.Allocate()` algorithm](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L135) and can keep using [`constraint` interface](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L703-L712). We will just extend the current [`matchAttributeConstraint`](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L721C6-L728) and [`distinctAttributeConstraint`](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/constraint.go#L34-L41) instances. Or, we could introduce `constraint` instances for proposed modes (e.g., `nonEmptyIntersectionMatchAttributeConstraint`, etc.). |
There was a problem hiding this comment.
Update this for the simplified design.
There was a problem hiding this comment.
Actually, I think this is already updated in e470001e0872739f4042aa347b5d07ca1ddfade6. We need to update allocator implementation so that it can handle list type.
| #### Story 2 | ||
|
|
||
| T.B.D. |
There was a problem hiding this comment.
I am not aware of use cases for distinct with lists, but because it's a new type, any existing operations need to be updated to handle that type. This is a way to do it that maintains the complimentary nature of the two types of constraints.
dom4ha
left a comment
There was a problem hiding this comment.
Simplified enhancement looks good and does not add much complexity. If you come up with a conclusion regarding CEL, I will have time tomorrow morning to approve this KEP.
| [experience reports]: https://github.com/golang/go/wiki/ExperienceReports | ||
| --> | ||
|
|
||
| The `ResourceSlice` API allows users to attach scalar attributes to devices. These can be used to allocate devices that share common topology within the node. For certain types of topological relationships, scalar values are insufficient. For example, a CPU may have adjacency to multiple PCIe roots. This enhancement proposes allowing attributes to be lists. The semantics of the `MatchAttribute` and `DistinctAttribute` constraints must adapt to the possibility of lists. For example, rather than defining an attribute "match" as equality, it would be defined as a non-empty intersection, treating scalars as single-element lists. Conversely, "distinct" attributes for lists would be defined as an empty intersection. |
There was a problem hiding this comment.
For example, a CPU may have adjacency to multiple PCIe roots
Is adjacency to PCIe roots is the same as running on a PCIe root? Can we have a device which is running on a given root and is adjacent to another two at the same time.
Can all those three roots be advertised as resource.kubernetes.io/pcieRoot or need to have separate labels like resource.kubernetes.io/adjacentPcieRoot for which we cannot construct a common matchAttribute constraint?
There was a problem hiding this comment.
One attribute is sufficient for all the known use cases, there are not two different types of relationships, it's just not a singular relationship for CPUs.
There was a problem hiding this comment.
The simple version just changes the exist "equal" and "not equal" functions to do type coercions. So it has no impact on permutation evaluation.
I see, so we don't make it any worse and keep using monotonic search algorithm and consider all values from the list rather than one.
However, matchAttribute functionality itself makes the search computationally expensive and it needs to be performed per each node. Do we know how it's doing in practices? I could easily imagine that it would generate many possible options to scan (even without CEL constraints #5254).
Another question is how users are handling these complex constraints and debug why they workload haven't scheduled. We faced very similar concern regarding a KEP proposing adding CEL expressions to tolerations #5822 and for now we've put it on hold. The questions are similar in both cases.
There was a problem hiding this comment.
I don't have any real world data on that; in simulations it's not too bad. The current naive algorithm is more-or-less a brute force search. But with the well-defined semantics of match it's possible to optimize the algorithm to reduce the search space quickly. That sort of thing can be achieved, for example, with a simple b-tree.
In practice, we expect these lists to be short, as they correspond for example roughly with number of PCIe switches per CPU. @everpeace is that accurate?
So, doing non-empty intersection is a more expensive operation, for sure. But I doubt it is material (we should measure it in scheduler_perf tests)
There was a problem hiding this comment.
However, matchAttribute functionality itself makes the search computationally expensive and it needs to be performed per each node. Do we know how it's doing in practices? I could easily imagine that it would generate many possible options to scan (even without CEL constraints #5254).
It's correct. matchAttribute/distinctAttribute could be basically large complexity. In DRA, Allocator is responsible for searching device comibination which satisfies the given constraint.
But, the Allocator implementation does NOT try all the combination in practice. They have an assumption that the given constraint is monotonic. Monotonic means that once a constraint returns false, adding more devices will never cause it to return true. This allows to bound the computational complexity for searching device combinations in practice.
Another question is how users are handling these complex constraints and debug why they workload haven't scheduled. We faced very similar concern regarding a KEP proposing adding CEL expressions to tolerations #5822 and for now we've put it on hold. The questions are similar in both cases.
Yes. I also recognize this issue. However, I think this issue should be resolved in an separate KEP. I'd like to let this KEP focus to add list type attribute and extending the semantics of matchAttribute/distinctAttribute. WDYT?
In practice, we expect these lists to be short, as they correspond for example roughly with number of PCIe switches per CPU. @everpeace is that accurate?
Although I'm not a specialist of hardware, as far as I know, Yes. Typical large GPU servers may have 2 to 4 PCIe root per socket. And this proposal limits the length of the list attribute with +k8s:maxItems=64
There was a problem hiding this comment.
@johnbelamaric @jpbetz @dom4ha Thanks for your review and discussion! I added some commits to address your comments! PTAL 🙇
If you come up with a conclusion regarding CEL, I will have time tomorrow morning to approve this KEP.
I think this is concluded in this thread that we will add some helper function, say include, that can work both for scalar and list. The KEP is also updated in 2e25605 for this.
There was a problem hiding this comment.
However, matchAttribute functionality itself makes the search computationally expensive and it needs to be performed per each node. Do we know how it's doing in practices? I could easily imagine that it would generate many possible options to scan (even without CEL constraints #5254).
It's correct. matchAttribute/distinctAttribute could be basically large complexity. In DRA, Allocator is responsible for searching device comibination which satisfies the given constraint.
But, the Allocator implementation does NOT try all the combination in practice. They have an assumption that the given constraint is monotonic. Monotonic means that once a constraint returns false, adding more devices will never cause it to return true. This allows to bound the computational complexity for searching device combinations in practice.
Another question is how users are handling these complex constraints and debug why they workload haven't scheduled. We faced very similar concern regarding a KEP proposing adding CEL expressions to tolerations #5822 and for now we've put it on hold. The questions are similar in both cases.
Yes. I also recognize this issue. However, I think this issue should be resolved in an separate KEP. I'd like to let this KEP focus to add list type attribute and extending the semantics of matchAttribute/distinctAttribute. WDYT?
In practice, we expect these lists to be short, as they correspond for example roughly with number of PCIe switches per CPU. @everpeace is that accurate?
Although I'm not a specialist of hardware, as far as I know, Yes. Typical large GPU servers may have 2 to 4 PCIe root per socket. And this proposal limits the length of the list attribute with +k8s:maxItems=64
|
|
||
| ### Implementation (for evaluating constraints) | ||
|
|
||
| Since _non-empty intersection_ constraint is _monotonic_, we would not need updating [`Allocator.Allocate()` algorithm](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L135) and can keep using [`constraint` interface](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L703-L712). We will just extend the current [`matchAttributeConstraint`](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L721C6-L728) and [`distinctAttributeConstraint`](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/constraint.go#L34-L41) instances. Or, we could introduce `constraint` instances for proposed modes (e.g., `nonEmptyIntersectionMatchAttributeConstraint`, etc.). |
There was a problem hiding this comment.
Actually, I think this is already updated in e470001e0872739f4042aa347b5d07ca1ddfade6. We need to update allocator implementation so that it can handle list type.
| NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`. | ||
| --> | ||
|
|
||
| Yes. When disabled, you can not create `ResourceClaim` with `matchSemantics`/`distinctSemantics` nor `DeviceAttribute` with `list`-type values. And, existing `list`-type attribute values are just ignored. But, if specified attribute in `matchAttribute`/`distinctAttribute` is `list` type, allocation will be failed. |
There was a problem hiding this comment.
Thanks for the catch! I'll update.
added: fixed in 3ca5c9e
| --> | ||
| Yes and no. It does add new fields, which increase the worst case size of `ResourceSlice` and `ResourceClaim` object. However, the increase size is bounded for most cases: | ||
| - `ResourceClaim`: linear to the number of constraints specified in the resource. | ||
| - `ResourceSlice`: linear to the number of devices defined in the resource. And, the number of list items is also bounded. |
There was a problem hiding this comment.
OK! Thanks for the info!! I'll keep it in mind. Would you mind sharing some pointer for this validation (varing max/min number of device)??
| - Initial supported semantics are | ||
| - `NonEmptyIntersection` and `Identical` for `matchAttribute` | ||
| - `EmptyIntersection`, `PairwiseDisjoint`, and `AllDistinct` for `distinctAttribute` | ||
| - Maintain backward compatibility and inter-operability for scalar-only attributes. |
There was a problem hiding this comment.
Oh, thanks for the great discussion! OK. I'll add type-agnostic helper function (say .include). I'll update the KEP.
added: fixed in 2e25605
|
/hold for sig scheduling approval @dom4ha if it looks good to you, please approve and unhold before the deadline. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dom4ha, everpeace, johnbelamaric The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@johnbelamaric @dom4ha Thank you very much for your approval!!! I think we need an extra |
|
/lgtm |
matchAttribute/distinctAttributeconstaint are extended as "non-empty intersection" semantics (scalar can be implicitly treated as single value list)./wg device-management
/sig scheduling