KREP-013: Multicluster support via multicluster-runtime#1223
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mjudeikis The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
jakobmoellerdev
left a comment
There was a problem hiding this comment.
I believe this is a good proposal, has an implementation backed proof of concept, and can work without maintaining any debt, building on existing primitives.
I'm in support of the proposal, well done.
| available on all clusters. Today this requires 50 KRO installations, 50 copies | ||
| of each RGD, and no centralized way to observe instance health across the fleet. | ||
|
|
||
| The hub-spoke model solves this: define RGDs once in the hub, KRO automatically |
There was a problem hiding this comment.
I am lacking a support for spoke-to-spoke communication and Meta RGDs that reconcile around multiple clusters. maybe out of scope for this proposal but should probably be mentioned right?
There was a problem hiding this comment.
Spoke to spoke is out of scope. This would require dependencies, like network connectivity between them. One could build this using mcr, but this is out of scope.
Meta RGDs that reconcile around multiple clusters
what do you mean by this? RGDs produced by RGDs?
There was a problem hiding this comment.
Maybe for MCR it is, but not for KRO. we need to put this into the proposal
There was a problem hiding this comment.
Can you give me a use case for spoke-to-spoke? Im having a hard time wrapping my head around why one would want this in the initial version. I think im missing something here.
There was a problem hiding this comment.
If you wanna maintain a graph across clusters this becomes interesting:
Example: Service Mesh Configuration between 2 clusters.
You could use KRO to:
- Setup both clusters with certificate / mTLS information
- Configure Ingress/Egress on both clusters
- Then establish connectivity between both
all from one graph
There was a problem hiding this comment.
But this is hub-to-spoke, as the graph still comes from a central hub cluster as orchestration for both leaf clusters. There would be no KRO running in any of the leaf/spoke clusters, so no KRO operations as such.
The connection establishment is a side-effect of how KRO can be used, not direct spoke-to-spoke communication by KRO itself.
|
|
||
| **Hub and Spoke Separation** | ||
|
|
||
| RGDs are authored and stored only in the hub cluster. The RGD controller runs |
There was a problem hiding this comment.
We will need to update this a little because GraphRevisions are now a thing!
There was a problem hiding this comment.
Ah, wrote this looking to old version of KRO :D updating to represent how this is handled
There was a problem hiding this comment.
TL:DR:
GraphRevisions are never distributed to spokes. They are an internal hub
concern — spokes only receive the generated CRDs. The revision registry is
hub-local in-memory state that instance controllers on the hub read from
directly.
| 1. RGD controller generates CRD (existing behavior) | ||
| 2. CRD is applied to the hub cluster (existing behavior) | ||
| 3. CRD is applied to all engaged spoke clusters via `ClusterClientFactory` | ||
| 4. When a new cluster is engaged, all existing CRDs are distributed to it |
There was a problem hiding this comment.
how would we control a rollout between multiple runtime clusters? (i.e. phased)
Also how would we control lifecycle differentiation (as in spoke A gets Graph A, spoke B gets Graph B)? Is such a thing possible with mcr?
There was a problem hiding this comment.
No, this would require to pull logic from KREP-012 in a way. One could build something like this (like deployment rollout strategies), but because it works on the reconcile loop level, it does not have it. We would need to add something at the API level.
There was a problem hiding this comment.
We can add this, but I would do this as follow-up
There was a problem hiding this comment.
For me the KREP will be incomplete without a vision on how to move to phased rollout between clusters TBH. even if it is just an outlook
There was a problem hiding this comment.
How would you imagine this? The idea I was thinking of is a separate object, like RGDRollout, which is based on selectors/labels, that would orchestrate it. Easiest, would not overload existing API and would be opt-in only when using multi-cluster mode.
| 3. CRD is applied to all engaged spoke clusters via `ClusterClientFactory` | ||
| 4. When a new cluster is engaged, all existing CRDs are distributed to it | ||
|
|
||
| CRD distribution uses server-side apply to handle conflicts gracefully. If a |
There was a problem hiding this comment.
nit: I feel like this is irrelevant to the Proposal
There was a problem hiding this comment.
I think we need to be explicit for this. We don't have a CRD distribution problem as it's a singleton cluster. After this was implemented, this is not the case anymore, and it becomes closer to a distributed system in traits than an operator.
| decoupled from the RGD definition - for example, "make all our platform RGDs | ||
| available on every cluster in the fleet". | ||
|
|
||
| ### Complementary, Not Competing |
There was a problem hiding this comment.
I think this is a good overview, thanks for this
| (`controller-runtime/pkg/metrics`). | ||
|
|
||
| **Low coupling** - Instance controller (already uses custom abstractions), | ||
| `DynamicController` (only needs `Runnable` interface and `ctrl.Request` type). |
There was a problem hiding this comment.
There is now stronger coupling in dynamic controller because the instance feeds back watch requests to DynamicController
There was a problem hiding this comment.
Added a section about the watch system.
TL:DR:
For multicluster, the watch system needs cluster awareness (keys, funcs)
|
This KREP now has many technical implementation details. I can split this a bit, but tried to keep it tied "to the ground" so impact is clearer |
|
@mjudeikis: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| compiled graphs from the hub-local in-memory registry: | ||
|
|
||
| ```go | ||
| func (c *Controller) Reconcile(ctx context.Context, clusterName string, req ctrl.Request) error { |
There was a problem hiding this comment.
clusterName should be the typed multicluster.ClusterName
| The instance controller's `Reconcile(ctx, req) error` and the RGD controller's | ||
| `Reconcile(ctx, obj) (Result, error)` differ. For multicluster, both need a | ||
| cluster name in the reconcile path. Rather than changing signatures, the cluster | ||
| name is carried in the context: |
There was a problem hiding this comment.
This conflicts with the seciont "Instance Controller Changes" above, or?
There is says the instance controller should expect the cluster name in its signature now.
Or are these two different controllers?
Summary
Multicluster support for KRO using
sigs.k8s.io/multicluster-runtime(MCR) as a drop-in forcontroller-runtime. Hub-spoke model: RGDs on hub, CRDs + instances on spokes. No RGD APIchanges.
Addresses #1060.
Key points
DynamicControllerinstances viaMulticlusterDynamicController--enable-multiclusterflag, zero change without itRelationship to KREP-012
Compares with #1064 (Cluster Targets). Different layer — KREP-012 is API-level (Target field in RGD), KREP-013 is runtime-level (reconcile loop). They're complementary: KREP-013 handles
"where do instances live", KREP-012 handles "where do specific resources go". KREP-012's Target can be built on top of KREP-013's
ClusterClientFactory.Also covers
POC: https://github.com/mjudeikis/kro/tree/mcr.poc