feat(kubernetes): WIP node enforcer isolation proof of concept #1827

Draft
TaylorMutch wants to merge 3 commits into main from node-enforcer-poc/tmutch

TaylorMutch commented Jun 9, 2026

Collaborator

Summary

Draft proof-of-concept for the Kubernetes node-enforcer topology. This PR is intentionally WIP and meant to capture the spike result, diagrams, local validation, and tradeoffs rather than propose a finalized production design.

See spike/README.md and the included diagrams for further investigation of this topology design.

Related Issue

N/A - spike proof of concept.

Changes

  • Adds the external node-enforcer topology for Kubernetes so sandbox workload pods can run without the filesystem lockdown and network namespace privileges currently held by the combined supervisor.
  • Adds Helm wiring, node-enforcer CI/dev values, and label isolation for the node-enforcer DaemonSet.
  • Moves the spike notes into spike/README.md and includes the generated architecture diagrams comparing current state, split-pod options, isolation backend framing, and the node-enforcer option.

Testing

  • mise run helm:test passed.
  • Kubernetes node-enforcer smoke e2e passed with deploy/helm/openshell/ci/values-node-enforcer.yaml.
  • Kubernetes bypass_detection e2e passed with the node-enforcer overlay.
  • Manual sandbox validation passed: the sandbox created successfully, node-enforcer enforcement logs were observed, and raw direct TCP from the sandbox user path was rejected with exit code 111.
  • The final full pre-commit rerun was skipped for spike scope: the initial post-diagram run timed out during the Rust test phase, and the phases that completed before the timeout had passed.
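The manual direct-TCP check above can be approximated with a small probe. This is a minimal sketch, not the script used for the validation: `probe_tcp` is a hypothetical helper, and treating 111 as the Linux value of `ECONNREFUSED` is an assumption about how the reported exit code was produced.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> int:
    """Attempt a raw TCP connect and return 0 on success, else the errno.

    Hypothetical helper for illustration. Name-resolution failures still
    raise, because connect_ex only converts connect-level errors to codes.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port))


def is_refused(code: int) -> bool:
    """True when the connect attempt was actively refused."""
    return code == errno.ECONNREFUSED
```

Run from the sandbox user path against a destination outside the allowed egress set, a refused connection would make `is_refused(probe_tcp(host, port))` true. On Linux `errno.ECONNREFUSED` is 111, which would line up with the exit code reported above if the check surfaces the errno directly.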

Checklist

  • Draft PR
  • Proof-of-concept status called out in the title and summary
  • Spike README and diagrams included
  • Production hardening complete

Add workload/enforcer supervisor roles and an external-enforcer network mode so Kubernetes sandbox pods can run without privileged netns setup while a privileged node DaemonSet owns coarse pod-netns egress enforcement.

Wire the mode through gateway config, Helm values/RBAC, the supervisor image, and Kubernetes driver pod rendering. Add workload registration, node-side nftables installation, docs, and focused tests for the prototype topology.
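To make the node-side nftables step more concrete, here is a minimal sketch of rendering a coarse egress ruleset for a pod network namespace. It is an illustration only and does not reflect the rules this PR actually installs: the function name, the table name, and the allowlist shape are all hypothetical.

```python
from ipaddress import ip_address


def render_egress_ruleset(table: str, allowed: list[tuple[str, int]]) -> str:
    """Render an nftables ruleset that rejects all egress except `allowed`.

    Hypothetical sketch: `allowed` holds (IP address, TCP port) pairs.
    """
    lines = [
        f"table inet {table} {{",
        "    chain output {",
        "        type filter hook output priority 0; policy drop;",
        '        oifname "lo" accept',  # keep loopback traffic working
        "        ct state established,related accept",
    ]
    for host, port in allowed:
        if not 0 < port < 65536:
            raise ValueError(f"invalid TCP port: {port}")
        addr = ip_address(host)  # raises ValueError on a malformed address
        family = "ip" if addr.version == 4 else "ip6"
        lines.append(f"        {family} daddr {addr} tcp dport {port} accept")
    # Reject instead of silently dropping so callers fail fast on connect.
    lines += ["        reject", "    }", "}"]
    return "\n".join(lines) + "\n"
```

A privileged process could feed the rendered text to `nft -f -` from inside the target pod's network namespace; how the DaemonSet in this PR enters the namespace and which destinations it allows are not shown here.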

copy-pr-bot Bot commented Jun 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
