27 changes: 27 additions & 0 deletions .agents/skills/debug-openshell-cluster/SKILL.md
@@ -181,6 +181,21 @@ helm -n openshell get values openshell | grep -E 'repository|tag|supervisorImage

The gateway image built from `deploy/docker/Dockerfile.gateway` and the scratch supervisor image built from `deploy/docker/Dockerfile.supervisor` should use the same build tag in branch and E2E deploys. A stale supervisor image can make sandbox behavior lag behind gateway policy or proto changes.

Check the sandbox supervisor topology rendered into gateway config:

```bash
kubectl -n openshell get configmap openshell-config -o jsonpath='{.data.gateway\.toml}' | grep -E 'supervisor_role|network_enforcement_mode|enforcer_endpoint|privileged'
```

The expected Kubernetes default is `supervisor_role = "workload"` and
`network_enforcement_mode = "soft-proxy"`. With these settings, sandbox pods
start unprivileged and the logs report that direct socket egress is not
kernel-blocked. Use `network_enforcement_mode = "supervisor-netns"` only when
the sandbox pod has the required capabilities or `server.sandboxPrivileged=true`
is set. Use `network_enforcement_mode = "external-enforcer"` with
`nodeEnforcer.enabled=true` to test node-enforcer enforcement; the enforcer
should log workload registration and the successful installation of sandbox
network egress enforcement.

For local/external pull mode (the default local path via `mise run cluster`), local images are tagged to the configured local registry base, pushed to that registry, and pulled by k3s via the `registries.yaml` mirror endpoint. The `cluster` task pushes prebuilt local tags (`openshell/*:dev`, falling back to `localhost:5000/openshell/*:dev` or `127.0.0.1:5000/openshell/*:dev`).

Gateway image builds stage a partial Rust workspace from `deploy/docker/Dockerfile.images`. If cargo fails with a missing manifest under `/build/crates/...`, or an imported symbol exists locally but is missing in the image build, verify that every current gateway dependency crate, including `openshell-driver-docker`, `openshell-driver-kubernetes`, and `openshell-ocsf`, is copied into the staged workspace there.
@@ -206,6 +221,18 @@ kubectl -n openshell get svc openshell -o wide
kubectl -n openshell get endpoints openshell
```

When the gateway is exposed through Envoy Gateway, deployment infrastructure may
need a `BackendTrafficPolicy` to disable Envoy's request and stream duration
timeouts for OpenShell's long-lived gRPC streams. A missing or rejected policy
commonly shows up as CLI failures around 15 seconds with `h2 protocol error:
error reading a body from connection`, especially on `sandbox create -- <cmd>`,
upload/download, sync, `WatchSandbox`, `ForwardTcp`, and `RelayStream` paths.

```bash
kubectl -n openshell get backendtrafficpolicy openshell-grpc-streams -o yaml
kubectl -n openshell get grpcroute openshell -o yaml
```

For local port-forward testing:

```bash
1 change: 1 addition & 0 deletions Cargo.lock

22 changes: 22 additions & 0 deletions architecture/compute-runtimes.md
@@ -87,6 +87,28 @@ runtime still owns GPU device injection.
## Deployment Shape

Kubernetes deployments use the Helm chart under `deploy/helm/openshell`.
The Kubernetes driver can set a default `runtimeClassName` for sandbox pods,
for example `gvisor` or a Kata Containers RuntimeClass, while preserving
per-sandbox template overrides. When a default RuntimeClass is configured, the
Kubernetes driver validates its existence at startup so missing cluster runtime
support fails before any sandbox pods are requested. Per-sandbox RuntimeClass
overrides are validated during sandbox admission/create because they are not
known at gateway startup. The Kubernetes driver can also set
`securityContext.privileged` on all sandbox pod containers as a deployment-wide,
short-term compatibility escape hatch for clusters that require privileged pod
admission; this weakens the container boundary and is not a replacement for a
stronger runtime isolation model. Kubernetes deployments also select an
explicit supervisor/network topology. The default is
`supervisor_role = "workload"` with `network_enforcement_mode = "soft-proxy"`,
which keeps sandbox pods unprivileged and relies on the proxy for cooperative
traffic while reporting that direct sockets are not kernel-blocked. The existing
hard supervisor-managed netns/veth/nft path remains available through
`network_enforcement_mode = "supervisor-netns"`. The experimental
`external-enforcer` mode registers workload supervisors with a privileged
node-side enforcer DaemonSet, which enters the pod network namespace and
installs coarse nftables egress rules so non-root sandbox processes must use the
proxy. Dynamic endpoint, binary, and L7 policy remain inside the workload
proxy.
Standalone local deployments start the gateway with a selected runtime such as
Docker, Podman, or VM. The CLI can register multiple gateways and switch between
them without changing the sandbox architecture.
8 changes: 8 additions & 0 deletions architecture/sandbox.md
@@ -12,6 +12,7 @@ Each sandbox workload has two trust levels:
|---|---|
| Supervisor | Starts as root inside the workload, prepares isolation, runs the proxy, fetches config, injects credentials, serves the relay socket, and launches child processes. |
| Agent child | Runs as an unprivileged user with filesystem, process, and network restrictions applied. |
| Node enforcer | Optional privileged host/node-side supervisor role for Kubernetes. It accepts workload registrations and installs coarse pod-netns egress rules while policy decisions stay in the workload proxy. |

The supervisor keeps enough privilege to manage the sandbox, but the agent child
loses that privilege before user code runs.
@@ -41,6 +42,13 @@ OpenShell uses overlapping controls rather than a single sandbox primitive:
| Network namespace | Forces ordinary agent egress through the local CONNECT proxy. |
| Policy proxy | Evaluates destination, binary identity, TLS/L7 rules, SSRF checks, and inference interception. |

The supervisor resolves an explicit network enforcement mode at startup.
`combined`/`supervisor-netns` preserves the local hard netns path. Kubernetes
defaults to `workload`/`soft-proxy`, which keeps the pod unprivileged and
reports that direct sockets are not kernel-blocked. `external-enforcer`
delegates the coarse direct-egress boundary to a host/node component while
leaving dynamic endpoint policy in the proxy.
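
The mode resolution described above could be sketched roughly as follows. This is an illustration only: the enums mirror the names added in `sandbox_env.rs`, but `resolve_mode`, `direct_egress_kernel_blocked`, and the mapping chosen for `auto` (including the `enforcer` role case) are assumptions. The diff states that `auto` resolves from the role and runtime hints without showing the actual logic.

```rust
// Hypothetical sketch: the enum names follow `sandbox_env.rs`, but both
// functions and the `auto` mapping are assumptions, not code from this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SupervisorRole {
    Workload,
    Enforcer,
    Combined,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NetworkEnforcementMode {
    Auto,
    SoftProxy,
    SupervisorNetns,
    ExternalEnforcer,
}

/// Resolve `Auto` to a concrete mode; explicit modes pass through unchanged.
fn resolve_mode(
    role: SupervisorRole,
    requested: NetworkEnforcementMode,
) -> NetworkEnforcementMode {
    match (requested, role) {
        // Assumed: the local combined topology keeps the hard netns path.
        (NetworkEnforcementMode::Auto, SupervisorRole::Combined) => {
            NetworkEnforcementMode::SupervisorNetns
        }
        // Assumed: an unprivileged workload supervisor falls back to the proxy.
        (NetworkEnforcementMode::Auto, SupervisorRole::Workload) => {
            NetworkEnforcementMode::SoftProxy
        }
        // Assumed: the enforcer role implies the external-enforcer topology.
        (NetworkEnforcementMode::Auto, SupervisorRole::Enforcer) => {
            NetworkEnforcementMode::ExternalEnforcer
        }
        (explicit, _) => explicit,
    }
}

/// True when a resolved mode blocks direct socket egress below the proxy.
/// Call this on the output of `resolve_mode`, not on `Auto`.
fn direct_egress_kernel_blocked(mode: NetworkEnforcementMode) -> bool {
    matches!(
        mode,
        NetworkEnforcementMode::SupervisorNetns | NetworkEnforcementMode::ExternalEnforcer
    )
}
```

Keeping the "is direct egress blocked" question separate from resolution makes it straightforward to report the soft-proxy limitation at startup, as the paragraph above describes.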

The supervisor may enrich baseline filesystem allowances for runtime-required
paths, such as proxy support files or GPU device paths when a GPU is present.

10 changes: 10 additions & 0 deletions crates/openshell-cli/src/main.rs
@@ -4387,6 +4387,16 @@ mod tests {
}
}

    #[test]
    fn sandbox_create_rejects_privileged_flag() {
        let err = Cli::try_parse_from(["openshell", "sandbox", "create", "--privileged"])
            .expect_err("privileged must not be a per-sandbox CLI flag");
        assert!(
            err.to_string().contains("--privileged"),
            "error should identify the rejected flag"
        );
    }

    #[test]
    fn service_expose_accepts_positional_target_port_and_service() {
        let cli = Cli::try_parse_from([
135 changes: 135 additions & 0 deletions crates/openshell-core/src/sandbox_env.rs
@@ -63,3 +63,138 @@ pub const USER_ENVIRONMENT: &str = "OPENSHELL_USER_ENVIRONMENT";
/// writes and rotates this file; the supervisor exchanges its contents
/// for a gateway JWT at startup and on refresh.
pub const K8S_SA_TOKEN_FILE: &str = "OPENSHELL_K8S_SA_TOKEN_FILE";

/// Runtime role selected for the sandbox supervisor binary.
pub const SUPERVISOR_ROLE: &str = "OPENSHELL_SUPERVISOR_ROLE";

/// Network enforcement mode selected for the sandbox supervisor binary.
pub const NETWORK_ENFORCEMENT_MODE: &str = "OPENSHELL_NETWORK_ENFORCEMENT_MODE";

/// Endpoint for an external node/host enforcer.
pub const ENFORCER_ENDPOINT: &str = "OPENSHELL_ENFORCER_ENDPOINT";

/// Node IP injected by Kubernetes when an external node enforcer is used.
pub const NODE_IP: &str = "OPENSHELL_NODE_IP";

/// Pod IP injected by Kubernetes for node-enforcer registration.
pub const POD_IP: &str = "OPENSHELL_POD_IP";

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default, serde::Serialize, serde::Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum SupervisorRole {
    /// Runs inside the sandbox/container and owns workload lifecycle.
    Workload,
    /// Runs as a privileged host/node-side enforcement component.
    Enforcer,
    /// Current local-style topology: one supervisor owns lifecycle and hard controls.
    #[default]
    Combined,
}

impl SupervisorRole {
    #[must_use]
    pub const fn as_str(self) -> &'static str {
        match self {
            Self::Workload => "workload",
            Self::Enforcer => "enforcer",
            Self::Combined => "combined",
        }
    }
}

impl std::fmt::Display for SupervisorRole {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(self.as_str())
    }
}

impl std::str::FromStr for SupervisorRole {
    type Err = String;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value.trim().to_ascii_lowercase().as_str() {
            "workload" => Ok(Self::Workload),
            "enforcer" => Ok(Self::Enforcer),
            "combined" => Ok(Self::Combined),
            other => Err(format!(
                "unknown supervisor role '{other}'; expected 'workload', 'enforcer', or 'combined'"
            )),
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default, serde::Serialize, serde::Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum NetworkEnforcementMode {
    /// Resolve from the supervisor role and runtime hints.
    #[default]
    Auto,
    /// Cooperative proxy environment only; direct sockets are not kernel-blocked.
    SoftProxy,
    /// Supervisor-managed netns/veth/nft enforcement.
    SupervisorNetns,
    /// Enforcement delegated to a node/host enforcer.
    ExternalEnforcer,
}

impl NetworkEnforcementMode {
    #[must_use]
    pub const fn as_str(self) -> &'static str {
        match self {
            Self::Auto => "auto",
            Self::SoftProxy => "soft-proxy",
            Self::SupervisorNetns => "supervisor-netns",
            Self::ExternalEnforcer => "external-enforcer",
        }
    }
}

impl std::fmt::Display for NetworkEnforcementMode {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(self.as_str())
    }
}

impl std::str::FromStr for NetworkEnforcementMode {
    type Err = String;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value.trim().to_ascii_lowercase().as_str() {
            "auto" => Ok(Self::Auto),
            "soft-proxy" => Ok(Self::SoftProxy),
            "supervisor-netns" => Ok(Self::SupervisorNetns),
            "external-enforcer" => Ok(Self::ExternalEnforcer),
            other => Err(format!(
                "unknown network enforcement mode '{other}'; expected 'auto', 'soft-proxy', 'supervisor-netns', or 'external-enforcer'"
            )),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::{NetworkEnforcementMode, SupervisorRole};

    #[test]
    fn supervisor_role_round_trips_kebab_case() {
        assert_eq!("workload".parse(), Ok(SupervisorRole::Workload));
        assert_eq!(SupervisorRole::Enforcer.to_string(), "enforcer");
        assert_eq!(
            serde_json::to_value(SupervisorRole::Combined).unwrap(),
            serde_json::json!("combined")
        );
    }

    #[test]
    fn network_enforcement_mode_round_trips_kebab_case() {
        assert_eq!("soft-proxy".parse(), Ok(NetworkEnforcementMode::SoftProxy));
        assert_eq!(
            NetworkEnforcementMode::ExternalEnforcer.to_string(),
            "external-enforcer"
        );
        assert_eq!(
            serde_json::to_value(NetworkEnforcementMode::SupervisorNetns).unwrap(),
            serde_json::json!("supervisor-netns")
        );
    }
}
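
For readers following how these constants and `FromStr` impls might fit together, here is a minimal, std-only sketch of turning the raw `OPENSHELL_SUPERVISOR_ROLE` value into a role. The enum and parser are trimmed copies of the ones above (serde derives omitted so the snippet compiles on its own). `role_from_env_value` is a hypothetical helper, and falling back to the `Default` variant when the variable is unset or empty is an assumption, since the supervisor's actual startup code is not part of this diff.

```rust
use std::str::FromStr;

// Trimmed copy of the enum above, without serde, so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum SupervisorRole {
    Workload,
    Enforcer,
    #[default]
    Combined,
}

impl FromStr for SupervisorRole {
    type Err = String;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value.trim().to_ascii_lowercase().as_str() {
            "workload" => Ok(Self::Workload),
            "enforcer" => Ok(Self::Enforcer),
            "combined" => Ok(Self::Combined),
            other => Err(format!("unknown supervisor role '{other}'")),
        }
    }
}

/// Hypothetical helper: parse the raw environment value, treating an unset
/// or empty variable as "use the default role" (an assumption, see above).
fn role_from_env_value(raw: Option<&str>) -> Result<SupervisorRole, String> {
    match raw.map(str::trim) {
        None | Some("") => Ok(SupervisorRole::default()),
        Some(value) => value.parse(),
    }
}
```

A caller could feed it the live environment with `role_from_env_value(std::env::var("OPENSHELL_SUPERVISOR_ROLE").ok().as_deref())`. Taking `Option<&str>` instead of reading the environment directly keeps the helper easy to unit test.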
24 changes: 24 additions & 0 deletions crates/openshell-driver-kubernetes/README.md
@@ -13,6 +13,30 @@ this driver. Kubernetes owns scheduling and pod lifecycle. The
`openshell-sandbox` supervisor inside each workload owns agent isolation,
credential injection, policy polling, logs, and the gateway relay.

Set `default_runtime_class_name` in the driver config to assign a default Kubernetes
RuntimeClass, such as `gvisor` or a Kata Containers RuntimeClass, to sandbox
pods. Per-sandbox template `runtime_class_name` values override the driver
default. When `default_runtime_class_name` is configured, the driver validates
that the cluster has that RuntimeClass during startup so a missing runtime fails
fast instead of surfacing later as pod sandbox creation errors. Per-sandbox
RuntimeClass overrides are validated during sandbox
admission/create. As a short-term compatibility escape hatch, the driver can set
`privileged = true` deployment-wide; the driver maps that to
`podTemplate.spec.containers[0].securityContext.privileged` for all sandbox pod
containers. Use it only for trusted clusters that require privileged pod
admission because it weakens the container boundary.

Kubernetes deployments default to `supervisor_role = "workload"` and
`network_enforcement_mode = "soft-proxy"`. In this mode the supervisor runs the
proxy, policy reload, relay, and agent lifecycle without creating a Linux
network namespace; proxy-aware traffic is enforced, but direct socket egress is
not kernel-blocked. Set `network_enforcement_mode = "supervisor-netns"` to use
the existing netns/veth/nft path when the sandbox pod has the required Linux
capabilities. Set `network_enforcement_mode = "external-enforcer"` to try the
node-enforcer topology; the workload supervisor registers with a node-side
enforcer, which installs coarse pod-netns egress rules while dynamic endpoint
policy stays inside the proxy.
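
The trade-offs between the three modes can be summarized as a small pre-flight check. The sketch below is illustrative only: `TopologyConfig` and `topology_warnings` are hypothetical names, the struct is a simplified stand-in for the relevant driver config fields, and the diff does not show the driver performing this validation. It also treats `privileged` as the only signal for `supervisor-netns`, although the text above allows granting the required capabilities by other means.

```rust
// Hypothetical sketch: `TopologyConfig` and `topology_warnings` are
// illustrative names, not part of the driver shown in this diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NetworkEnforcementMode {
    SoftProxy,
    SupervisorNetns,
    ExternalEnforcer,
}

/// Simplified stand-in for the relevant `KubernetesComputeConfig` fields.
struct TopologyConfig<'a> {
    network_enforcement_mode: NetworkEnforcementMode,
    enforcer_endpoint: &'a str,
    privileged: bool,
}

/// Collect human-readable caveats for the selected topology.
fn topology_warnings(cfg: &TopologyConfig<'_>) -> Vec<String> {
    let mut warnings = Vec::new();
    match cfg.network_enforcement_mode {
        // Soft proxy is cooperative: only proxy-aware traffic is enforced.
        NetworkEnforcementMode::SoftProxy => warnings.push(
            "soft-proxy: direct socket egress is not kernel-blocked".to_string(),
        ),
        // The netns/veth/nft path needs extra Linux capabilities in the pod.
        NetworkEnforcementMode::SupervisorNetns if !cfg.privileged => warnings.push(
            "supervisor-netns: pod is not privileged; it must otherwise have the required capabilities"
                .to_string(),
        ),
        // Assumed: a workload cannot register without an enforcer endpoint.
        NetworkEnforcementMode::ExternalEnforcer
            if cfg.enforcer_endpoint.trim().is_empty() =>
        {
            warnings.push("external-enforcer: enforcer_endpoint is empty".to_string());
        }
        _ => {}
    }
    warnings
}
```

With the Kubernetes defaults (`soft-proxy`, empty endpoint, `privileged = false`) this returns the single soft-proxy caveat, which matches the behavior the README describes for the default mode.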

## Sandbox Resource

The driver works with the `agents.x-k8s.io/v1alpha1` `Sandbox` custom resource.
53 changes: 53 additions & 0 deletions crates/openshell-driver-kubernetes/src/config.rs
@@ -2,6 +2,7 @@
// SPDX-License-Identifier: Apache-2.0

use openshell_core::config::DEFAULT_SUPERVISOR_IMAGE;
use openshell_core::sandbox_env::{NetworkEnforcementMode, SupervisorRole};
use serde::{Deserialize, Deserializer, Serialize};
use std::str::FromStr;

@@ -176,6 +177,19 @@ pub struct KubernetesComputeConfig {
        deserialize_with = "deserialize_optional_app_armor_profile"
    )]
    pub app_armor_profile: Option<AppArmorProfile>,
    /// Runtime role passed to the sandbox supervisor binary.
    pub supervisor_role: SupervisorRole,
    /// Network enforcement mode passed to the sandbox supervisor binary.
    pub network_enforcement_mode: NetworkEnforcementMode,
    /// Endpoint template for a node/host enforcer. Supports Kubernetes env
    /// expansion such as `http://$(OPENSHELL_NODE_IP):17671`.
    pub enforcer_endpoint: String,
    /// Set `securityContext.privileged` on sandbox pod containers.
    ///
    /// This is a deployment-wide compatibility escape hatch for clusters that
    /// require privileged pod admission. It weakens the container boundary and
    /// should stay disabled unless the Kubernetes environment is trusted.
    pub privileged: bool,
    pub workspace_default_storage_size: String,
    /// Default Kubernetes `runtimeClassName` for sandbox pods.
    /// Applied when a `CreateSandbox` request does not specify one.
@@ -221,6 +235,10 @@ impl Default for KubernetesComputeConfig {
            host_gateway_ip: String::new(),
            enable_user_namespaces: false,
            app_armor_profile: None,
            supervisor_role: SupervisorRole::Workload,
            network_enforcement_mode: NetworkEnforcementMode::SoftProxy,
            enforcer_endpoint: String::new(),
            privileged: false,
            workspace_default_storage_size: DEFAULT_WORKSPACE_STORAGE_SIZE.to_string(),
            default_runtime_class_name: String::new(),
            sa_token_ttl_secs: 3600,
@@ -362,4 +380,39 @@ mod tests {
        let cfg: KubernetesComputeConfig = serde_json::from_value(json).unwrap();
        assert_eq!(cfg.image_pull_secrets, ["regcred", "backup-regcred"]);
    }

    #[test]
    fn serde_override_privileged() {
        let json = serde_json::json!({
            "privileged": true
        });
        let cfg: KubernetesComputeConfig = serde_json::from_value(json).unwrap();
        assert!(cfg.privileged);
    }

    #[test]
    fn default_kubernetes_supervisor_mode_is_soft_workload() {
        let cfg = KubernetesComputeConfig::default();
        assert_eq!(cfg.supervisor_role, SupervisorRole::Workload);
        assert_eq!(
            cfg.network_enforcement_mode,
            NetworkEnforcementMode::SoftProxy
        );
    }

    #[test]
    fn serde_override_supervisor_network_mode() {
        let json = serde_json::json!({
            "supervisor_role": "combined",
            "network_enforcement_mode": "supervisor-netns",
            "enforcer_endpoint": "http://$(OPENSHELL_NODE_IP):17671"
        });
        let cfg: KubernetesComputeConfig = serde_json::from_value(json).unwrap();
        assert_eq!(cfg.supervisor_role, SupervisorRole::Combined);
        assert_eq!(
            cfg.network_enforcement_mode,
            NetworkEnforcementMode::SupervisorNetns
        );
        assert_eq!(cfg.enforcer_endpoint, "http://$(OPENSHELL_NODE_IP):17671");
    }
}
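
The `enforcer_endpoint` doc comment mentions Kubernetes env expansion. As a rough illustration of what that expansion does to a template such as `http://$(OPENSHELL_NODE_IP):17671`, the sketch below substitutes `$(NAME)` references from a map and leaves unresolved references untouched. `expand_env_refs` is a hypothetical helper written for this note. In a real deployment the kubelet performs the substitution, not the driver, and this sketch ignores the `$$(NAME)` escape that Kubernetes also supports.

```rust
use std::collections::HashMap;

/// Hypothetical helper: expand `$(NAME)` references the way Kubernetes does
/// for container env values. Unknown names are left as-is; the `$$(NAME)`
/// escape is intentionally not handled in this sketch.
fn expand_env_refs(template: &str, env: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("$(") {
        // Copy everything before the reference verbatim.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find(')') {
            Some(end) => {
                let name = &after[..end];
                match env.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        // Unresolved reference: keep the original text.
                        out.push_str("$(");
                        out.push_str(name);
                        out.push(')');
                    }
                }
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated reference: copy the remainder and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

For example, with `OPENSHELL_NODE_IP` mapped to `10.0.0.7` (an illustrative address), the template above expands to `http://10.0.0.7:17671`, which is the form the workload supervisor would see in `OPENSHELL_ENFORCER_ENDPOINT`.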