KREP-013: Multicluster support via multicluster-runtime #1223

Open
mjudeikis wants to merge 4 commits into kubernetes-sigs:main from mjudeikis:krep.mcr

@mjudeikis

Summary

Multicluster support for KRO using sigs.k8s.io/multicluster-runtime (MCR) as a drop-in for controller-runtime. Hub-spoke model: RGDs on hub, CRDs + instances on spokes. No RGD API
changes.

Addresses #1060.

Key points

  • MCR operates at the reconcile loop level, transparent to the RGD API
  • Pluggable cluster discovery (kubeconfig Secrets first, Cluster API later)
  • Per-cluster DynamicController instances via MulticlusterDynamicController
  • Opt-in via --enable-multicluster flag, zero change without it
  • Same-cluster locality: all child resources stay in the instance's cluster

Relationship to KREP-012

Compares with #1064 (Cluster Targets). Different layer — KREP-012 is API-level (Target field in RGD), KREP-013 is runtime-level (reconcile loop). They're complementary: KREP-013 handles
"where do instances live", KREP-012 handles "where do specific resources go". KREP-012's Target can be built on top of KREP-013's ClusterClientFactory.

Also covers

  • Code restructuring needed to make controller-runtime vs MCR pluggable (5 interface extractions, incremental migration path)
  • How KREP-012 Target functionality can be implemented within KREP-013's foundation

POC: https://github.com/mjudeikis/kro/tree/mcr.poc

@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mjudeikis
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

@k8s-ci-robot added the labels size/XL (denotes a PR that changes 500-999 lines, ignoring generated files) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) on Mar 31, 2026

@jakobmoellerdev left a comment

Member

I believe this is a good proposal: it has an implementation-backed proof of concept and can work without taking on any debt, building on existing primitives.

I'm in support of the proposal, well done.

available on all clusters. Today this requires 50 KRO installations, 50 copies
of each RGD, and no centralized way to observe instance health across the fleet.

The hub-spoke model solves this: define RGDs once in the hub, KRO automatically

Member

I am missing support for spoke-to-spoke communication and for meta RGDs that reconcile across multiple clusters. Maybe this is out of scope for this proposal, but it should probably be mentioned, right?

Author

Spoke-to-spoke is out of scope. It would require additional dependencies, such as network connectivity between the spokes. One could build this using MCR, but it is not part of this proposal.

> Meta RGDs that reconcile around multiple clusters

What do you mean by this? RGDs produced by RGDs?

Member

Maybe it is out of scope for MCR, but not for KRO. We need to put this into the proposal.

Author

Can you give me a use case for spoke-to-spoke? I'm having a hard time wrapping my head around why one would want this in the initial version. I think I'm missing something here.

Member

If you want to maintain a graph across clusters, this becomes interesting:

Example: Service Mesh Configuration between 2 clusters.

You could use KRO to:

  1. Set up both clusters with certificate / mTLS information
  2. Configure Ingress/Egress on both clusters
  3. Then establish connectivity between both

all from one graph

Author

But this is hub-to-spoke, as the graph still comes from a central hub cluster as orchestration for both leaf clusters. There would be no KRO running in any of the leaf/spoke clusters, so no KRO operations as such.
The connection establishment is a side-effect of how KRO can be used, not direct spoke-to-spoke communication by KRO itself.


**Hub and Spoke Separation**

RGDs are authored and stored only in the hub cluster. The RGD controller runs

Member

We will need to update this a little because GraphRevisions are now a thing!

Author

Ah, I wrote this while looking at an old version of KRO :D Updating it to reflect how this is handled.

Author

TL;DR:

GraphRevisions are never distributed to spokes. They are an internal hub
concern — spokes only receive the generated CRDs. The revision registry is
hub-local in-memory state that instance controllers on the hub read from
directly.

1. RGD controller generates CRD (existing behavior)
2. CRD is applied to the hub cluster (existing behavior)
3. CRD is applied to all engaged spoke clusters via `ClusterClientFactory`
4. When a new cluster is engaged, all existing CRDs are distributed to it

Member

How would we control a rollout across multiple runtime clusters (e.g. phased)?
Also, how would we control lifecycle differentiation (as in spoke A gets Graph A, spoke B gets Graph B)? Is such a thing possible with MCR?

Author

No, this would require pulling in logic from KREP-012 in a way. One could build something like this (similar to Deployment rollout strategies), but because MCR works at the reconcile-loop level, it does not provide it. We would need to add something at the API level.

Author

We can add this, but I would do it as a follow-up.

Member

For me, the KREP will be incomplete without a vision of how to move to phased rollout between clusters, TBH, even if it is just an outlook.

Author

How would you imagine this? The idea I was thinking of is a separate object, like RGDRollout, based on selectors/labels, that would orchestrate it. That seems the easiest option: it would not overload the existing API and would be opt-in, used only in multicluster mode.

3. CRD is applied to all engaged spoke clusters via `ClusterClientFactory`
4. When a new cluster is engaged, all existing CRDs are distributed to it

CRD distribution uses server-side apply to handle conflicts gracefully. If a

Member

nit: I feel like this is irrelevant to the Proposal

Author

I think we need to be explicit about this. Today we don't have a CRD distribution problem because KRO runs against a single cluster. Once this is implemented, that is no longer the case, and KRO becomes closer to a distributed system in its traits than to an operator.

decoupled from the RGD definition - for example, "make all our platform RGDs
available on every cluster in the fleet".

### Complementary, Not Competing

Member

I think this is a good overview, thanks for this

(`controller-runtime/pkg/metrics`).

**Low coupling** - Instance controller (already uses custom abstractions),
`DynamicController` (only needs `Runnable` interface and `ctrl.Request` type).

Member

There is now stronger coupling in dynamic controller because the instance feeds back watch requests to DynamicController

Author

Added a section about the watch system.

TL;DR:

For multicluster, the watch system needs cluster awareness (keys, funcs)

@k8s-ci-robot added the label size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files) and removed the label size/XL (denotes a PR that changes 500-999 lines, ignoring generated files) on Apr 1, 2026

@mjudeikis

Author

This KREP now has many technical implementation details. I can split it up a bit, but I tried to keep it grounded so that the impact is clearer.

@k8s-ci-robot

Contributor

@mjudeikis: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| presubmits-unit-tests | 696111f | link | true | `/test presubmits-unit-tests` |
| presubmits-integration-tests | 696111f | link | true | `/test presubmits-integration-tests` |
| presubmits-e2e-upgrade-tests | 696111f | link | true | `/test presubmits-e2e-upgrade-tests` |

compiled graphs from the hub-local in-memory registry:

```go
func (c *Controller) Reconcile(ctx context.Context, clusterName string, req ctrl.Request) error {
	// ...
}
```

Member

clusterName should be the typed multicluster.ClusterName

Comment on lines +583 to +586
The instance controller's `Reconcile(ctx, req) error` and the RGD controller's
`Reconcile(ctx, obj) (Result, error)` differ. For multicluster, both need a
cluster name in the reconcile path. Rather than changing signatures, the cluster
name is carried in the context:

Member

This conflicts with the section "Instance Controller Changes" above, doesn't it?
There it says the instance controller should now expect the cluster name in its signature.
Or are these two different controllers?

@ntnn added this to tbd on Apr 2, 2026
@ntnn moved this to Reviewing in tbd on Apr 2, 2026