Restart driver pods in place when driver config is unchanged#2527
Restart driver pods in place when driver config is unchanged#2527rajathagasthya wants to merge 1 commit into
Conversation
ce94262 to
5165328
Compare
5165328 to
6144fb2
Compare
| // DaemonSet have the same DRIVER_CONFIG_DIGEST, i.e. the install-relevant config is | ||
| // unchanged (e.g. only a helm.sh/chart label changed). If either digest is missing, it | ||
| // returns false and the node takes the full upgrade flow. | ||
| func (r *UpgradeReconciler) driverPodRestartOnly(_ context.Context, pod *corev1.Pod, ds *appsv1.DaemonSet) (bool, error) { |
There was a problem hiding this comment.
If we are not going to use ctx why add it to the function signature of the predicate type?
There was a problem hiding this comment.
Instead of Pod and Daemonset as the arguments, can we have both the arguments of identical types instead? Having corev1.Pod on the LHS and the appsv1.DaemonSet on the RHS is not too intuitive IMO
There was a problem hiding this comment.
I kept the predicate signature for flexibility in case we change how we compare pod and daemonset in the future. But I can convert it to:
type RestartOnlyPredicate func(running, desired *corev1.PodSpec) (bool, error)I don't particularly have a strong opinion.
There was a problem hiding this comment.
Changed it to the above. Let me know if it looks good.
| } | ||
| desired := driverconfig.DriverConfigDigestFromPodSpec(&ds.Spec.Template.Spec) | ||
| running := driverconfig.DriverConfigDigestFromPodSpec(&pod.Spec) | ||
| if desired == "" || running == "" { |
There was a problem hiding this comment.
Let's do a len() == 0 check instead
There was a problem hiding this comment.
Why? Afaik, if str == "" is the most common and idiomatic way of checking for an empty string (I see this everywhere in this repo), while len() == 0 is idiomatic for slices/maps.
A patch chart upgrade can change only cosmetic pod-template metadata (e.g. the helm.sh/chart label) without changing the driver itself. The upgrade controller keys on the DaemonSet's controller revision hash, so such a change still evicts running GPU workloads and drains the node -- for no driver benefit. Register a RestartOnlyPredicate on the upgrade state manager (from the UpgradeReconciler) that compares DRIVER_CONFIG_DIGEST -- a hash of the install-relevant driver config, already set on the driver pod template -- between the running pod and the desired DaemonSet. When the digests match, the node is cordoned and the driver pod restarted in place, with no workload eviction or drain; the driver fast-path keeps the kernel modules loaded across the restart, so running GPU workloads are not disrupted. Cordoning keeps the node unschedulable if the restart fails, and the node is uncordoned on success. A missing or differing digest falls back to the full upgrade flow. The digest env name and a reader for it live in internal/config beside the digest definition; the restart-only routing decision is a method on the upgrade controller, registered in SetupWithManager. Depends on the RestartOnlyPredicate hook in k8s-operator-libs; the vendored dependency bump follows once that change is released. Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
6144fb2 to
731d85c
Compare
Description
A patch chart upgrade can change only cosmetic pod-template metadata (e.g. the
helm.sh/chartlabel) without changing the driver itself. The upgrade controller keys on the DaemonSet's controller revision hash, so such a change still evicts running GPU workloads and drains the node, causing disruption for running workloads.Register a
RestartOnlyPredicateon the upgrade state manager (from theUpgradeReconciler) that comparesDRIVER_CONFIG_DIGEST— a hash of the install-relevant driver config, already set on the driver pod template — between the running pod and the desired DaemonSet. When the digests match, the node is cordoned and the driver pod restarted in place, with no workload eviction or drain; the driver fast-path keeps the kernel modules loaded across the restart, so running GPU workloads are not disrupted. Cordoning keeps the node unschedulable if the restart fails (same as the full flow), and the node is uncordoned on success. A missing or differing digest falls back to the full upgrade flow.If the predicate returns an error or the cordon fails, the node stays in
upgrade-requiredand is retried on a later reconcile (with a Warning event), rather than being routed to the disruptive flow on an unknown answer.Known limitation: the first upgrade from a release without restart-only is still disruptive, because the old operator holds the leader-election lease and routes the upgrade before the new operator becomes leader. Steady-state (both sides have the code) is non-disruptive.
Related to #349
Checklist
make lint)make validate-generated-assets)make validate-modules)Testing
Unit tests:
internal/config:TestDriverConfigDigestFromPodSpec— digest reader, incl. nil/empty/container-precedence cases.controllers:TestDriverPodRestartOnly— predicate routing, incl. nil pod/DS and missing/equal/differing digests.Manual testing (single-node cluster, GPU workload running throughout):
upgrade-required → pod-restart-required(cordoned, nevercordon-required), the driver pod restarts via the fast path, and the GPU workload is not evicted.