[Feature]: Driver upgrades cannot free GPU consumers that lack an nvidia.com/gpu request or run as DaemonSets #2570

### What

This requests a supported way for the driver upgrade flow to free every consumer that holds the GPU kernel module open. The current flow cannot handle two cases:

1. Pods that use the GPU via `runtimeClassName: nvidia` with no `nvidia.com/gpu` resource request (exclusive mode, no time-slicing / MPS).
2. DaemonSet GPU consumers (for us, a standalone dcgm-exporter).

Both keep the module refcount above zero, so an in-place driver unload/upgrade stalls. This is the underlying cause of the stuck-driver symptom in #2549 (the init container "fails to unload the driver while workloads are active").

### Why the current flow doesn't cover these

There are two eviction paths, and neither can free these consumers:

- Node drain (`k8s-operator-libs` `pkg/upgrade/drain_manager.go`): `PodSelector` is configurable, and gpu-operator (`controllers/upgrade_controller.go`) sets it by default to `nvidia.com/gpu-driver-upgrade-drain.skip!=true`. But `IgnoreAllDaemonSets: true` is hardcoded, so DaemonSet consumers are never evicted by drain.
- Targeted pod deletion (`pkg/upgrade/pod_manager.go`): uses an injected `PodDeletionFilter`. gpu-operator's filter keys on the `nvidia.com/gpu` resource, so pods without that request are never selected. `IgnoreAllDaemonSets: true` is hardcoded here too.

Net: a DaemonSet GPU consumer cannot be freed by any path. A GPU pod without the resource request is reachable only via a broad node drain, which is heavier than needed, and in our testing the drain did not free our runtime-direct Deployments either.
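
To make the gap concrete, here is a minimal sketch of the two selection rules as described above. It is not the actual `k8s-operator-libs` or gpu-operator code: `Pod`, `selectedByPodDeletion` and `selectedByDrain` are hypothetical simplifications, and the sketch models only which pods get *selected*, not whether eviction then succeeds.

```go
package main

import "fmt"

const (
	gpuResource    = "nvidia.com/gpu"
	drainSkipLabel = "nvidia.com/gpu-driver-upgrade-drain.skip"
)

// Pod is a hypothetical, simplified stand-in for the pod fields that matter
// here; it is not the real corev1.Pod type.
type Pod struct {
	Name             string
	Labels           map[string]string
	OwnerKind        string // e.g. "ReplicaSet" or "DaemonSet"
	RuntimeClassName string // never consulted by either rule below
	ResourceRequests map[string]int64
}

// selectedByPodDeletion models a deletion filter keyed on the nvidia.com/gpu
// resource, with DaemonSet-owned pods always ignored.
func selectedByPodDeletion(p Pod) bool {
	if p.OwnerKind == "DaemonSet" {
		return false
	}
	return p.ResourceRequests[gpuResource] > 0
}

// selectedByDrain models the default drain selector (skip!=true), again with
// DaemonSet-owned pods always ignored.
func selectedByDrain(p Pod) bool {
	if p.OwnerKind == "DaemonSet" {
		return false
	}
	return p.Labels[drainSkipLabel] != "true"
}

func main() {
	pods := []Pod{
		{Name: "cuda-job", OwnerKind: "ReplicaSet", ResourceRequests: map[string]int64{gpuResource: 1}},
		{Name: "runtime-direct", OwnerKind: "ReplicaSet", RuntimeClassName: "nvidia"},
		{Name: "dcgm-exporter", OwnerKind: "DaemonSet", RuntimeClassName: "nvidia"},
	}
	for _, p := range pods {
		fmt.Printf("%-15s podDeletion=%-5t drain=%t\n",
			p.Name, selectedByPodDeletion(p), selectedByDrain(p))
	}
}
```

In this model the runtime-direct pod is reachable only through drain, and the DaemonSet pod is reachable through neither rule, which matches what we observe.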

### Proposal

1. Allow declaring additional GPU consumers to evict during an upgrade by label/selector, surfaced through `driver.upgradePolicy`, so they do not have to request `nvidia.com/gpu`. This complements the existing `nvidia.com/gpu-driver-upgrade-drain.skip` opt-out with an opt-in.
2. Provide handling for declared DaemonSet GPU consumers during the upgrade window - e.g. an opt-in to cordon/park them (or make the DaemonSet-ignore configurable for a named set), restored on `upgrade-done`. The operator already quiesces its own operands; this extends that to declared third-party consumers.
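
As a rough illustration of what the opt-in could look like at the filter level, here is a sketch under our own assumptions. `ConsumerPolicy`, its fields, and `shouldFree` are hypothetical names, not an existing or proposed API, and the `gpu-consumer` label is made up for the example.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified stand-in; it is not the real corev1.Pod type.
type Pod struct {
	Name        string
	Labels      map[string]string
	OwnerKind   string // e.g. "ReplicaSet" or "DaemonSet"
	RequestsGPU bool   // true if the pod requests nvidia.com/gpu
}

// ConsumerPolicy is a hypothetical shape for the opt-in: which extra pods
// count as GPU consumers, and whether declared DaemonSet pods are handled.
type ConsumerPolicy struct {
	MatchLabels      map[string]string
	HandleDaemonSets bool
}

// declared reports whether the pod carries every label in the selector.
// An empty selector declares nothing, so today's behaviour is the default.
func declared(sel, labels map[string]string) bool {
	if len(sel) == 0 {
		return false
	}
	for k, want := range sel {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// shouldFree keeps the existing resource-based rule and adds the opt-in:
// declared pods are freed without a GPU request, and DaemonSet pods are
// handled only when both declared and explicitly enabled.
func shouldFree(p Pod, pol ConsumerPolicy) bool {
	isDeclared := declared(pol.MatchLabels, p.Labels)
	if p.OwnerKind == "DaemonSet" {
		return isDeclared && pol.HandleDaemonSets
	}
	return p.RequestsGPU || isDeclared
}

func main() {
	pol := ConsumerPolicy{
		MatchLabels:      map[string]string{"gpu-consumer": "true"}, // made-up label
		HandleDaemonSets: true,
	}
	exporter := Pod{
		Name:      "dcgm-exporter",
		OwnerKind: "DaemonSet",
		Labels:    map[string]string{"gpu-consumer": "true"},
	}
	fmt.Println(exporter.Name, "freed during upgrade:", shouldFree(exporter, pol))
}
```

The sketch leaves out the existing `nvidia.com/gpu-driver-upgrade-drain.skip` opt-out; a real implementation would presumably keep honoring it.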

### Where the change lives

- Core: `NVIDIA/k8s-operator-libs` `pkg/upgrade` (drain/pod managers; the hardcoded `IgnoreAllDaemonSets: true`).
- Surface: `NVIDIA/gpu-operator` `driver.upgradePolicy` API and the injected `PodDeletionFilter` in `controllers/upgrade_controller.go`.

### Workaround today

We run a small sidecar controller that watches `nvidia.com/gpu-driver-upgrade-state` and, on any transition away from `upgrade-done`, deletes the runtime-direct pods and parks the DaemonSet consumer (non-matching nodeSelector), then restores them on `upgrade-done`. With those consumers freed, the operator unloads, rebuilds and reloads the driver in place with no node reboot (boot_id unchanged across the upgrade). Happy to share it.
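
For reference, the decision logic of that controller boils down to roughly the following. This is a simplified sketch, not the controller itself: `Action` and `decide` are names chosen for the example, the Kubernetes watch and the actual delete/park calls are omitted, and `upgrade-required` is used only as an example of a state other than `upgrade-done`.

```go
package main

import "fmt"

// upgradeDone is the value of the nvidia.com/gpu-driver-upgrade-state node
// label that marks a finished upgrade.
const upgradeDone = "upgrade-done"

// Action is what the sidecar does in response to a label change.
type Action int

const (
	NoOp             Action = iota
	FreeConsumers           // delete runtime-direct pods, park the DaemonSet consumer
	RestoreConsumers        // undo the parking so the consumers come back
)

func (a Action) String() string {
	return [...]string{"NoOp", "FreeConsumers", "RestoreConsumers"}[a]
}

// decide maps a change of the upgrade-state label to an action: leaving
// upgrade-done frees the consumers, reaching upgrade-done restores them.
// Assumption in this sketch: changes between two non-done states (including
// an unset label, modelled as "") need no further action.
func decide(prev, curr string) Action {
	switch {
	case prev == curr:
		return NoOp
	case prev == upgradeDone:
		return FreeConsumers
	case curr == upgradeDone:
		return RestoreConsumers
	default:
		return NoOp
	}
}

func main() {
	// Example label history for one node; the intermediate value is illustrative.
	states := []string{upgradeDone, "upgrade-required", upgradeDone}
	for i := 1; i < len(states); i++ {
		fmt.Printf("%s -> %s: %s\n", states[i-1], states[i], decide(states[i-1], states[i]))
	}
}
```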

Related: #2549.

