fix(gpu): select single CDI GPU defaults #1675
Kubernetes mirrors each limit into the matching request. VM accepts the fields
but currently ignores them.

GPU requests enter the driver layer through `SandboxSpec.gpu` and
PR Review Status
Validation: this is maintainer-authored, project-valid GPU runtime work that addresses #1477 by making default Docker/Podman GPU requests select a concrete CDI device when possible.
Review findings:
Docs: Fern docs were updated on the existing sandbox management page; no navigation change appears needed.
Next state:
Re-check After Author Update
I re-evaluated latest head
Disposition: partially resolved.
Remaining items:
Checks:
Next state:
Blocked
Gator is blocked by merge conflicts: GitHub reports
Next action: @elezar update the branch against current
Next state:
Prefer a single CDI-qualified device when Docker or Podman resolves the default GPU request to one GPU. Allow nvidia.com/gpu=all only as a WSL2 all-only compatibility fallback, using Docker daemon info and Podman's /dev/dxg probe to identify that case. Update driver docs, architecture notes, and GPU e2e coverage for the default selection behavior. Signed-off-by: Evan Lezar <elezar@nvidia.com>
Re-check After Author Update
I re-evaluated latest head
Disposition: resolved for the blocker and prior review-loop items.
Remaining items:
Checks:
Next state:
Maintainer Approval Needed
Gator validation and PR monitoring are complete for latest head
Validation: maintainer-authored GPU runtime work that addresses #1477 by making default Docker/Podman GPU requests select a concrete CDI device when possible.
Human maintainer approval or merge decision is now required.
Summary
Updates Docker and Podman GPU handling so a bare GPU request selects one concrete NVIDIA CDI device when possible.
Default `nvidia.com/gpu=all` fallback is allowed only for WSL2 all-only compatibility. Explicit GPU device requests pass through unchanged, including `nvidia.com/gpu=all`.
Related Issue
Closes #1477
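The default-selection rule described in the Summary can be sketched roughly as below. This is a minimal illustration and not the code in this PR: `GpuSelection`, `select_gpu_devices`, and the `wsl2_all_only` flag are hypothetical names, and picking the first concrete device from the discovered list is an assumption about how "one concrete device" is chosen.

```rust
/// Hypothetical outcome of resolving a GPU request to CDI device names.
#[derive(Debug, PartialEq, Eq)]
enum GpuSelection {
    /// CDI device names to hand to the container runtime.
    Devices(Vec<String>),
    /// No usable CDI device could be resolved.
    Unavailable,
}

/// CDI name that exposes every NVIDIA GPU at once.
const CDI_ALL: &str = "nvidia.com/gpu=all";

/// Resolve a GPU request (hypothetical helper, not the PR's API).
///
/// * `requested`: explicit CDI device names from the user; empty for a bare request.
/// * `discovered`: CDI device names the runtime reports as available.
/// * `wsl2_all_only`: true when the host was detected as WSL2 (assumed to be
///   derived from Docker daemon info or a `/dev/dxg` probe).
fn select_gpu_devices(
    requested: &[String],
    discovered: &[String],
    wsl2_all_only: bool,
) -> GpuSelection {
    // Explicit requests pass through unchanged, including nvidia.com/gpu=all.
    if !requested.is_empty() {
        return GpuSelection::Devices(requested.to_vec());
    }

    // Bare request: prefer one concrete device over the "all" alias.
    // Assumption: the first concrete entry is an acceptable default.
    if let Some(device) = discovered.iter().find(|d| d.as_str() != CDI_ALL) {
        return GpuSelection::Devices(vec![device.clone()]);
    }

    // Only "all" is available: allow it solely as the WSL2 compatibility fallback.
    if wsl2_all_only && discovered.iter().any(|d| d.as_str() == CDI_ALL) {
        return GpuSelection::Devices(vec![CDI_ALL.to_string()]);
    }

    GpuSelection::Unavailable
}
```

Under these assumptions, a bare request on a host that reports `nvidia.com/gpu=0` and `nvidia.com/gpu=all` resolves to `nvidia.com/gpu=0`, while a host that reports only `nvidia.com/gpu=all` resolves to it only when the WSL2 flag is set.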
Changes
`DiscoveredDevices` and Docker `/info` to allow WSL2 all-only fallback.
`/dev/nvidiaN` device nodes and uses `/dev/dxg` for WSL2 all-only fallback.
Testing
`mise exec -- cargo fmt --check`
`mise exec -- cargo check -p openshell-core -p openshell-driver-docker -p openshell-driver-podman --tests`
`mise exec -- cargo test -p openshell-core -p openshell-driver-docker -p openshell-driver-podman gpu --tests`
`mise exec -- cargo test -p openshell-driver-docker -p openshell-driver-podman wsl2 --tests`
`mise exec -- cargo test -p openshell-core -p openshell-driver-podman all_only --tests`
`mise run pre-commit`
Checklist