KREP-013: Multicluster support via multicluster-runtime by mjudeikis · Pull Request #1223 · kubernetes-sigs/kro

mjudeikis · 2026-03-31T08:32:01Z

Summary

Multicluster support for KRO using sigs.k8s.io/multicluster-runtime (MCR) as a drop-in for controller-runtime. Hub-spoke model: RGDs on hub, CRDs + instances on spokes. No RGD API changes.

Addresses #1060.

Key points

MCR operates at the reconcile loop level, transparent to the RGD API
Pluggable cluster discovery (kubeconfig Secrets first, Cluster API later)
Per-cluster DynamicController instances via MulticlusterDynamicController
Opt-in via --enable-multicluster flag, zero change without it
Same-cluster locality: all child resources stay in the instance's cluster
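
To make the "per-cluster DynamicController instances" point concrete, here is a minimal sketch of the bookkeeping such a wrapper might need: one controller per engaged cluster, started on engage and stopped on disengage. The names `perClusterController` and `controllerSet` are hypothetical and are not taken from the POC; the real MulticlusterDynamicController may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// perClusterController stands in for one DynamicController bound to a single
// cluster. The type and its fields are illustrative, not kro's real API.
type perClusterController struct {
	cluster string
	stop    chan struct{}
}

// controllerSet keeps at most one controller per engaged cluster.
type controllerSet struct {
	mu          sync.Mutex
	controllers map[string]*perClusterController
}

func newControllerSet() *controllerSet {
	return &controllerSet{controllers: make(map[string]*perClusterController)}
}

// Engage creates a controller for the cluster unless one already exists.
// It reports whether a new controller was created.
func (s *controllerSet) Engage(cluster string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.controllers[cluster]; ok {
		return false
	}
	s.controllers[cluster] = &perClusterController{cluster: cluster, stop: make(chan struct{})}
	return true
}

// Disengage signals the cluster's controller to stop and forgets it.
// It reports whether a controller was actually removed.
func (s *controllerSet) Disengage(cluster string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.controllers[cluster]
	if !ok {
		return false
	}
	close(c.stop)
	delete(s.controllers, cluster)
	return true
}

// Clusters lists the engaged clusters in sorted order.
func (s *controllerSet) Clusters() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	names := make([]string, 0, len(s.controllers))
	for name := range s.controllers {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	set := newControllerSet()
	set.Engage("spoke-a")
	set.Engage("spoke-b")
	set.Engage("spoke-a") // no-op: already engaged
	set.Disengage("spoke-b")
	fmt.Println(set.Clusters())
}
```

Without `--enable-multicluster`, nothing like this set would be constructed, which is one way to read "zero change without it".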

Relationship to KREP-012

Compares with #1064 (Cluster Targets). The two operate at different layers: KREP-012 is API-level (Target field in RGD), KREP-013 is runtime-level (reconcile loop). They're complementary: KREP-013 handles "where do instances live", KREP-012 handles "where do specific resources go". KREP-012's Target can be built on top of KREP-013's ClusterClientFactory.
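
As a rough illustration of how a Target could sit on top of a client factory, the sketch below resolves the cluster for a child resource: with no target it falls back to the instance's own cluster (the same-cluster locality rule), otherwise it asks the factory for the targeted cluster. Only the name `ClusterClientFactory` comes from the KREP; its method set, the `Client` interface, and the helper functions are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Client is a stand-in for a per-cluster API client (assumed shape).
type Client interface {
	Apply(resource string) error
}

// ClusterClientFactory borrows its name from the KREP; the method is assumed.
type ClusterClientFactory interface {
	ClientFor(cluster string) (Client, error)
}

// recordingClient records what was applied instead of calling an API server.
type recordingClient struct {
	cluster string
	applied *[]string
}

func (c recordingClient) Apply(resource string) error {
	*c.applied = append(*c.applied, c.cluster+"/"+resource)
	return nil
}

// staticFactory is an in-memory factory over a fixed set of engaged clusters.
type staticFactory struct {
	known   map[string]bool
	applied []string
}

func (f *staticFactory) ClientFor(cluster string) (Client, error) {
	if !f.known[cluster] {
		return nil, errors.New("cluster not engaged: " + cluster)
	}
	return recordingClient{cluster: cluster, applied: &f.applied}, nil
}

// resolveCluster applies same-cluster locality by default: an empty target
// means the child resource stays in the instance's cluster.
func resolveCluster(instanceCluster, target string) string {
	if target == "" {
		return instanceCluster
	}
	return target
}

// applyChild applies one child resource to the resolved cluster.
func applyChild(f ClusterClientFactory, instanceCluster, target, resource string) error {
	c, err := f.ClientFor(resolveCluster(instanceCluster, target))
	if err != nil {
		return err
	}
	return c.Apply(resource)
}

func main() {
	f := &staticFactory{known: map[string]bool{"spoke-a": true, "spoke-b": true}}
	_ = applyChild(f, "spoke-a", "", "Deployment/web")     // stays in spoke-a
	_ = applyChild(f, "spoke-a", "spoke-b", "Service/web") // targeted at spoke-b
	fmt.Println(f.applied)
}
```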

Also covers

Code restructuring needed to make controller-runtime vs MCR pluggable (5 interface extractions, incremental migration path)
How KREP-012 Target functionality can be implemented within KREP-013's foundation

POC: https://github.com/mjudeikis/kro/tree/mcr.poc

k8s-ci-robot · 2026-03-31T08:32:10Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mjudeikis
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jakobmoellerdev

I believe this is a good proposal: it has an implementation-backed proof of concept and can work without taking on any debt, building on existing primitives.

I'm in support of the proposal, well done.

jakobmoellerdev · 2026-03-31T17:14:19Z

+available on all clusters. Today this requires 50 KRO installations, 50 copies
+of each RGD, and no centralized way to observe instance health across the fleet.
+
+The hub-spoke model solves this: define RGDs once in the hub, KRO automatically


I am missing support for spoke-to-spoke communication and Meta RGDs that reconcile around multiple clusters. Maybe this is out of scope for this proposal, but it should probably be mentioned, right?

Spoke-to-spoke is out of scope. It would require extra dependencies, like network connectivity between the spokes. One could build this using MCR, but not as part of this proposal.

"Meta RGDs that reconcile around multiple clusters"
What do you mean by this? RGDs produced by RGDs?

Maybe for MCR it is, but not for KRO. We need to put this into the proposal.

Can you give me a use case for spoke-to-spoke? I'm having a hard time wrapping my head around why one would want this in the initial version. I think I'm missing something here.

If you wanna maintain a graph across clusters this becomes interesting:

Example: Service Mesh Configuration between 2 clusters.

You could use KRO to:

Set up both clusters with certificate / mTLS information

Configure Ingress/Egress on both clusters

Then establish connectivity between both

all from one graph

But this is hub-to-spoke, as the graph still comes from a central hub cluster as orchestration for both leaf clusters. There would be no KRO running in any of the leaf/spoke clusters, so no KRO operations as such.
The connection establishment is a side-effect of how KRO can be used, not direct spoke-to-spoke communication by KRO itself.

jakobmoellerdev · 2026-03-31T17:17:37Z

+
+**Hub and Spoke Separation**
+
+RGDs are authored and stored only in the hub cluster. The RGD controller runs


We will need to update this a little because GraphRevisions are now a thing!

Ah, I wrote this looking at an old version of KRO :D Updating to reflect how this is handled.

TL;DR:

GraphRevisions are never distributed to spokes. They are an internal hub concern — spokes only receive the generated CRDs. The revision registry is hub-local in-memory state that instance controllers on the hub read from directly.
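
A minimal sketch of what "hub-local in-memory state" could look like: a registry on the hub that the RGD controller publishes compiled graphs into and that every instance reconcile reads from, regardless of which spoke the instance lives in. The types `compiledGraph` and `revisionRegistry`, and the choice to keep only the newest revision per RGD, are assumptions for illustration and not the POC's actual data structures.

```go
package main

import (
	"fmt"
	"sync"
)

// compiledGraph is a placeholder for whatever a GraphRevision compiles to.
type compiledGraph struct {
	RGD      string
	Revision int
	Nodes    []string
}

// revisionRegistry lives only in the hub process; nothing here is written to
// spoke clusters. It keeps the newest revision per RGD (an assumption).
type revisionRegistry struct {
	mu     sync.RWMutex
	latest map[string]compiledGraph
}

func newRevisionRegistry() *revisionRegistry {
	return &revisionRegistry{latest: make(map[string]compiledGraph)}
}

// Publish stores g unless an equal or newer revision is already present.
func (r *revisionRegistry) Publish(g compiledGraph) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if cur, ok := r.latest[g.RGD]; ok && cur.Revision >= g.Revision {
		return false
	}
	r.latest[g.RGD] = g
	return true
}

// Latest returns the newest compiled graph for an RGD, if any.
func (r *revisionRegistry) Latest(rgd string) (compiledGraph, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	g, ok := r.latest[rgd]
	return g, ok
}

// reconcileInstance models an instance reconcile for any cluster: the cluster
// only affects where resources are applied, not where the graph is read from.
func reconcileInstance(r *revisionRegistry, cluster, rgd string) string {
	g, ok := r.Latest(rgd)
	if !ok {
		return fmt.Sprintf("%s: no compiled graph for %s yet", cluster, rgd)
	}
	return fmt.Sprintf("%s: reconcile %s at revision %d", cluster, rgd, g.Revision)
}

func main() {
	reg := newRevisionRegistry()
	reg.Publish(compiledGraph{RGD: "webapp", Revision: 1, Nodes: []string{"deployment"}})
	reg.Publish(compiledGraph{RGD: "webapp", Revision: 2, Nodes: []string{"deployment", "service"}})
	fmt.Println(reconcileInstance(reg, "spoke-a", "webapp"))
	fmt.Println(reconcileInstance(reg, "spoke-b", "webapp"))
}
```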

jakobmoellerdev · 2026-03-31T17:20:03Z

+1. RGD controller generates CRD (existing behavior)
+2. CRD is applied to the hub cluster (existing behavior)
+3. CRD is applied to all engaged spoke clusters via `ClusterClientFactory`
+4. When a new cluster is engaged, all existing CRDs are distributed to it


How would we control a rollout between multiple runtime clusters (i.e. phased)?
Also, how would we control lifecycle differentiation (as in spoke A gets Graph A, spoke B gets Graph B)? Is such a thing possible with MCR?

No, this would require pulling in logic from KREP-012, in a way. One could build something like this (like deployment rollout strategies), but because MCR works at the reconcile loop level, it does not provide it. We would need to add something at the API level.

We can add this, but I would do it as a follow-up.

For me the KREP will be incomplete without a vision of how to move to phased rollout between clusters, TBH, even if it is just an outlook.

How would you imagine this? The idea I was thinking of is a separate object, like RGDRollout, based on selectors/labels, that would orchestrate it. That would be the easiest option: it would not overload the existing API and would be opt-in, only when using multicluster mode.
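
To give the RGDRollout idea something concrete to react to, here is a purely hypothetical sketch of selector-based phase planning. Nothing in it exists in KRO or MCR today: the `phase` and `cluster` types, the first-match rule, and the convention that an empty selector matches every cluster are all assumptions.

```go
package main

import "fmt"

// cluster is a spoke with labels, as a selector-based rollout would see it.
type cluster struct {
	Name   string
	Labels map[string]string
}

// phase is one step of a hypothetical RGDRollout: clusters whose labels match
// Selector receive the CRD in this phase.
type phase struct {
	Name     string
	Selector map[string]string
}

// matches reports whether every selector key/value is present in labels.
// An empty selector matches everything.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		got, ok := labels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// plan assigns each cluster to the first phase whose selector matches it.
// Clusters that match no phase are left out of the rollout.
func plan(phases []phase, clusters []cluster) map[string][]string {
	out := make(map[string][]string)
	for _, c := range clusters {
		for _, p := range phases {
			if matches(p.Selector, c.Labels) {
				out[p.Name] = append(out[p.Name], c.Name)
				break
			}
		}
	}
	return out
}

func main() {
	phases := []phase{
		{Name: "canary", Selector: map[string]string{"tier": "canary"}},
		{Name: "general", Selector: map[string]string{}},
	}
	clusters := []cluster{
		{Name: "spoke-a", Labels: map[string]string{"tier": "canary"}},
		{Name: "spoke-b", Labels: map[string]string{"tier": "prod"}},
		{Name: "spoke-c", Labels: nil},
	}
	fmt.Println(plan(phases, clusters))
}
```

A real design would also need gating between phases (for example, instance health in the canary clusters), which this sketch does not attempt.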

jakobmoellerdev · 2026-03-31T17:20:22Z

+3. CRD is applied to all engaged spoke clusters via `ClusterClientFactory`
+4. When a new cluster is engaged, all existing CRDs are distributed to it
+
+CRD distribution uses server-side apply to handle conflicts gracefully. If a


nit: I feel like this is irrelevant to the Proposal

I think we need to be explicit about this. Today we don't have a CRD distribution problem because there is a single cluster. Once this is implemented, that is no longer the case, and KRO becomes closer to a distributed system in its traits than an operator.
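
The distribution steps quoted in the hunk above (apply to all engaged spokes, backfill a newly engaged spoke) can be sketched with an in-memory model. The `distributor` type is hypothetical, and its idempotent map write merely stands in for server-side apply; it is not how the POC talks to API servers, and it is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"sort"
)

// distributor tracks generated CRDs on the hub and what each spoke has.
type distributor struct {
	crds     map[string]string            // CRD name -> schema version
	clusters map[string]map[string]string // cluster -> CRD name -> schema version
}

func newDistributor() *distributor {
	return &distributor{
		crds:     make(map[string]string),
		clusters: make(map[string]map[string]string),
	}
}

// apply is idempotent: re-applying the same CRD leaves the cluster unchanged.
func (d *distributor) apply(cluster, name, schema string) {
	d.clusters[cluster][name] = schema
}

// Publish records a generated CRD and applies it to every engaged cluster.
func (d *distributor) Publish(name, schema string) {
	d.crds[name] = schema
	for c := range d.clusters {
		d.apply(c, name, schema)
	}
}

// Engage registers a cluster and backfills all CRDs that already exist.
func (d *distributor) Engage(cluster string) {
	if _, ok := d.clusters[cluster]; !ok {
		d.clusters[cluster] = make(map[string]string)
	}
	for name, schema := range d.crds {
		d.apply(cluster, name, schema)
	}
}

// Installed lists the CRDs present on a cluster in sorted order.
func (d *distributor) Installed(cluster string) []string {
	names := make([]string, 0, len(d.clusters[cluster]))
	for name := range d.clusters[cluster] {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	d := newDistributor()
	d.Engage("spoke-a")
	d.Publish("webapps.kro.run", "v1")
	d.Engage("spoke-b") // late joiner receives the existing CRD
	d.Publish("databases.kro.run", "v1")
	fmt.Println(d.Installed("spoke-a"), d.Installed("spoke-b"))
}
```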

jakobmoellerdev · 2026-03-31T17:22:28Z

+decoupled from the RGD definition - for example, "make all our platform RGDs
+available on every cluster in the fleet".
+
+### Complementary, Not Competing


I think this is a good overview, thanks for this

jakobmoellerdev · 2026-03-31T17:23:20Z

+(`controller-runtime/pkg/metrics`).
+
+**Low coupling** - Instance controller (already uses custom abstractions),
+`DynamicController` (only needs `Runnable` interface and `ctrl.Request` type).


There is now stronger coupling in the dynamic controller, because the instance feeds watch requests back to DynamicController.

Added a section about the watch system.

TL;DR:

For multicluster, the watch system needs cluster awareness (keys, funcs)
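
One way to picture "cluster awareness (keys, funcs)" is a watch index whose key carries the cluster, so that the same resource name in two spokes does not collide. This is only a sketch: `watchKey`, `watchIndex`, and the string form of the resource are hypothetical and do not mirror kro's actual watch types.

```go
package main

import (
	"fmt"
	"sort"
)

// watchKey identifies a watched object. Including Cluster keeps identical
// namespace/name pairs in different spokes apart.
type watchKey struct {
	Cluster   string
	Resource  string // e.g. "apps/v1/deployments"
	Namespace string
	Name      string
}

// watchIndex maps a watched object to the instances that asked to watch it.
type watchIndex struct {
	interested map[watchKey]map[string]struct{}
}

func newWatchIndex() *watchIndex {
	return &watchIndex{interested: make(map[watchKey]map[string]struct{})}
}

// Register records that an instance wants events for the given object.
func (w *watchIndex) Register(key watchKey, instance string) {
	set, ok := w.interested[key]
	if !ok {
		set = make(map[string]struct{})
		w.interested[key] = set
	}
	set[instance] = struct{}{}
}

// InstancesFor returns the instances to requeue when key changes, sorted.
func (w *watchIndex) InstancesFor(key watchKey) []string {
	out := make([]string, 0, len(w.interested[key]))
	for instance := range w.interested[key] {
		out = append(out, instance)
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := newWatchIndex()
	inA := watchKey{Cluster: "spoke-a", Resource: "apps/v1/deployments", Namespace: "default", Name: "web"}
	inB := inA
	inB.Cluster = "spoke-b"
	idx.Register(inA, "default/app-1")
	idx.Register(inB, "default/app-2")
	fmt.Println(idx.InstancesFor(inA), idx.InstancesFor(inB))
}
```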

mjudeikis · 2026-04-01T06:10:55Z

This KREP now has many technical implementation details. I can split this up a bit, but I tried to keep it grounded so the impact is clearer.

k8s-ci-robot · 2026-04-01T06:18:19Z

@mjudeikis: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
presubmits-unit-tests	`696111f`	link	true	`/test presubmits-unit-tests`
presubmits-integration-tests	`696111f`	link	true	`/test presubmits-integration-tests`
presubmits-e2e-upgrade-tests	`696111f`	link	true	`/test presubmits-e2e-upgrade-tests`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

ntnn · 2026-04-02T08:45:41Z

+compiled graphs from the hub-local in-memory registry:
+
+```go
+func (c *Controller) Reconcile(ctx context.Context, clusterName string, req ctrl.Request) error {


clusterName should be the typed multicluster.ClusterName

ntnn · 2026-04-02T08:56:58Z

+The instance controller's `Reconcile(ctx, req) error` and the RGD controller's
+`Reconcile(ctx, obj) (Result, error)` differ. For multicluster, both need a
+cluster name in the reconcile path. Rather than changing signatures, the cluster
+name is carried in the context:


This conflicts with the section "Instance Controller Changes" above, or?
There it says the instance controller should expect the cluster name in its signature now.
Or are these two different controllers?
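
For reference, the context-based variant described in the quoted hunk could look roughly like the sketch below, which uses the standard `context.WithValue` pattern with an unexported key type. `ClusterName` here is a local type defined for the example (not MCR's), and `WithCluster`, `ClusterFrom`, `dispatch`, and `dispatchLocal` are hypothetical helpers.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ClusterName is a local typed name for this sketch.
type ClusterName string

// clusterKey is unexported so no other package can collide with this key.
type clusterKey struct{}

// WithCluster returns a context carrying the cluster name.
func WithCluster(ctx context.Context, name ClusterName) context.Context {
	return context.WithValue(ctx, clusterKey{}, name)
}

// ClusterFrom extracts the cluster name, reporting whether one was set.
func ClusterFrom(ctx context.Context) (ClusterName, bool) {
	name, ok := ctx.Value(clusterKey{}).(ClusterName)
	return name, ok
}

var errNoCluster = errors.New("no cluster name in context")

// reconcile keeps a cluster-free signature and reads the cluster from ctx.
func reconcile(ctx context.Context, req string) (string, error) {
	cluster, ok := ClusterFrom(ctx)
	if !ok {
		return "", errNoCluster
	}
	return string(cluster) + "/" + req, nil
}

// dispatch is what a multicluster wrapper might do before calling reconcile.
func dispatch(cluster ClusterName, req string) (string, error) {
	return reconcile(WithCluster(context.Background(), cluster), req)
}

// dispatchLocal calls reconcile without attaching any cluster name.
func dispatchLocal(req string) (string, error) {
	return reconcile(context.Background(), req)
}

func main() {
	fmt.Println(dispatch("spoke-a", "default/web"))
	fmt.Println(dispatchLocal("default/web"))
}
```

The trade-off behind the question above: a typed cluster name in the signature is checked at compile time, whereas a context value can be silently absent and has to be checked at run time, as `errNoCluster` shows.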

k8s-ci-robot requested review from a-hilaly and michaelhtm March 31, 2026 08:32

k8s-ci-robot added the size/XL and cncf-cla: yes labels Mar 31, 2026

jakobmoellerdev reviewed Mar 31, 2026


mjudeikis added 4 commits April 1, 2026 08:53

kro multicluster-support KREP

6949ea5

Address meta RGDs

3e4327b

Address GraphRevision

b6b1963

address watch system

696111f

mjudeikis force-pushed the krep.mcr branch from c2b93f8 to 696111f Compare April 1, 2026 06:09

k8s-ci-robot added the size/XXL label and removed the size/XL label Apr 1, 2026

ntnn reviewed Apr 2, 2026


ntnn added this to tbd Apr 2, 2026

ntnn moved this to Reviewing in tbd Apr 2, 2026

jakobmoellerdev mentioned this pull request Apr 14, 2026

EPIC: KRO Levelled Apply open-component-model/ocm-project#955

Open

21 tasks


4 participants