20 commits
91fd9d4
chore(gator): add gator gate skill
johntmyers Jun 3, 2026
1a21bba
chore(gator): add sandbox launcher scaffold
johntmyers Jun 3, 2026
ba80e91
chore(gator): add codex image and docs checks
johntmyers Jun 3, 2026
9825e16
chore(gator): fold approved provider policy rules
johntmyers Jun 3, 2026
567bc88
chore(gator): add deterministic reviewer runner
johntmyers Jun 4, 2026
3dd6607
chore(gator): clarify ok-to-test comments
johntmyers Jun 4, 2026
c3066ac
chore(gator): structure launcher harnesses
johntmyers Jun 4, 2026
9141c1b
chore(gator): require e2e for dependabot
johntmyers Jun 4, 2026
b875880
chore(gator): add codex refresh profile
johntmyers Jun 4, 2026
c7306cd
chore(gator): wip manifest agent launcher
johntmyers Jun 5, 2026
d810646
feat(agents): supervise watch cycles in sandbox
johntmyers Jun 5, 2026
3b111f1
fix(agents): preserve gateway refresh state
johntmyers Jun 5, 2026
1057af2
fix(gator): continue human response threads
johntmyers Jun 6, 2026
8b83535
fix(agents): keep watch supervisor retrying
johntmyers Jun 7, 2026
6846b3b
fix(agents): use refreshed Codex credential aliases
johntmyers Jun 8, 2026
34c571e
fix(gator): avoid misleading gh auth checks
johntmyers Jun 9, 2026
10bc74a
docs(agents): remove architecture build update
johntmyers Jun 9, 2026
87b2a10
fix(gator): use REST-backed GitHub writes
johntmyers Jun 9, 2026
c479d52
fix(agents): bake immutable agent payloads
johntmyers Jun 9, 2026
7c3a2eb
fix(agents): upload writable agent workspace
johntmyers Jun 9, 2026
691 changes: 691 additions & 0 deletions .agents/skills/gator-gate/SKILL.md

Large diffs are not rendered by default.

94 changes: 94 additions & 0 deletions openshell-agents/Dockerfile.gator
@@ -0,0 +1,94 @@
# syntax=docker/dockerfile:1

# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# Gator sandbox image.
#
# This mirrors the OpenShell Community base image's core system and developer
# tooling, but keeps the initial agent surface focused on Codex + GitHub tooling
# for the gator-gate workflow.

FROM nvcr.io/nvidia/base/ubuntu:noble-20251013 AS system

ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /sandbox

# Core system dependencies copied from the community base sandbox image.
# iproute2: network namespace management (ip netns, veth pairs)
# iptables: legacy bypass detection (kept for transition)
# nftables: bypass detection; log + reject rules for direct connection diagnostics
# dnsutils: dig, nslookup
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        dnsutils \
        iproute2 \
        iptables \
        nftables \
        iputils-ping \
        net-tools \
        netcat-openbsd \
        openssh-sftp-server \
        procps \
        traceroute \
    && rm -rf /var/lib/apt/lists/*

RUN groupadd -r supervisor && useradd -r -g supervisor -s /usr/sbin/nologin supervisor && \
    groupadd -r sandbox && useradd -r -g sandbox -d /sandbox -s /bin/bash sandbox

FROM system AS devtools

# Node.js 22 + build toolchain. Keep the default apt installs aligned with the
# community base image, then add the small CLI tools gator commonly needs.
RUN curl -fsSL https://deb.nodesource.com/setup_22.x | bash - && \
    apt-get install -y --no-install-recommends \
        build-essential \
        git \
        jq \
        less \
        nodejs=22.22.1-1nodesource1 \
        ripgrep \
        vim-tiny \
        nano \
    && rm -rf /var/lib/apt/lists/* \
    && npm install -g npm@11.11.0

# GitHub CLI
RUN curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg \
        -o /usr/share/keyrings/githubcli-archive-keyring.gpg && \
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \
        > /etc/apt/sources.list.d/github-cli.list && \
    apt-get update && apt-get install -y --no-install-recommends gh && \
    rm -rf /var/lib/apt/lists/*

COPY runtime/harnesses/codex/install-codex.sh /usr/local/bin/install-codex.sh
ARG CODEX_VERSION=latest
RUN chmod 755 /usr/local/bin/install-codex.sh && \
    /usr/local/bin/install-codex.sh "$CODEX_VERSION"

# Provider profiles include both /usr/bin and /usr/local/bin variants for common
# tools. Create the /usr/local/bin aliases in this image so sandbox symlink
# resolution does not warn about missing alternate paths during policy reloads.
RUN ln -sf /usr/bin/gh /usr/local/bin/gh && \
    ln -sf /usr/bin/git /usr/local/bin/git && \
    ln -sf /usr/bin/codex /usr/local/bin/codex

FROM devtools AS final

ENV PATH="/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin"

RUN mkdir -p /etc/openshell
COPY gator/policy.yaml /etc/openshell/policy.yaml

RUN printf 'export PATH="/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin"\nexport PS1="\\u@\\h:\\w\\$ "\n' \
        > /sandbox/.bashrc && \
    printf '[ -f ~/.bashrc ] && . ~/.bashrc\n' > /sandbox/.profile && \
    chown -R sandbox:sandbox /sandbox

USER sandbox

ENTRYPOINT ["/bin/bash"]
192 changes: 192 additions & 0 deletions openshell-agents/README.md
@@ -0,0 +1,192 @@
# OpenShell Agents

`openshell-agents/` contains repository-owned agent launchers. An agent is a
manifest plus prompt assets that the shared launcher turns into an OpenShell
sandbox run. Agents do not own harness implementations. Harness-specific setup
and execution live in `runtime/harnesses/<name>/`.

## Directory Layout

```text
openshell-agents/
  run.sh                    # Generic manifest-driven launcher
  runtime/                  # Shared in-sandbox runtime
    entrypoint.sh           # Starts the in-sandbox supervisor
    supervisor.sh           # Runs bounded harness cycles in once/watch mode
    subagent.sh             # Generic subagent dispatcher
    harnesses/
      codex/                # Codex install and execution adapter
  <agent>/
    agent.yaml              # Agent manifest
    prompts/                # Prompt templates rendered at launch
    providers/              # Provider profile YAML files for this agent
    policy.yaml             # Optional image policy source
```

Agent directories should contain agent-specific intent and payloads: manifests,
prompt templates, provider profiles, policies, and references to skills or
subagents. They should not contain `harnesses/codex`, `harnesses/opencode`, or
similar runtime code.

## Agent Manifest

Each agent has an `agent.yaml` manifest. The launcher currently reads these
sections:

- `id`, `display_name`, `description`: human and runtime identity.
- `sandbox`: default sandbox name prefix, gateway, source image or Dockerfile,
and background log directory.
- `harness`: default harness and per-harness settings such as model and
reasoning effort.
- `runtime`: in-sandbox run mode (`once` or `watch`), watch poll interval, and
transient failure logging threshold.
- `profile_paths`: ordered directories to scan for provider profile YAML files.
- `settings`: gateway settings to apply before launch.
- `providers`: provider instances to create or update, credential sources, and
optional refresh configuration.
- `skills`: files to inject into the sandbox payload.
- `subagents`: subagent definitions to inject into the sandbox payload.
- `prompt_template`: prompt template rendered into the immutable agent payload as
`agent-prompt.md`.

Manifest paths support these prefixes:

- `repo://path`: resolve from the repository root.
- `agent://path`: resolve from the agent directory.
- Relative paths without a prefix: resolve from the agent directory.
- Absolute paths: use as-is.
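
The prefix rules above can be sketched as a small POSIX shell function. This is an illustration only, not the launcher's actual code: the function name `resolve_manifest_path` and its argument order are hypothetical.

```shell
# Hypothetical helper (not taken from run.sh): map a manifest path to a
# filesystem path. $1 = manifest path, $2 = repository root, $3 = agent dir.
resolve_manifest_path() {
  ref=$1
  repo_root=$2
  agent_dir=$3
  case "$ref" in
    repo://*)  printf '%s/%s\n' "$repo_root" "${ref#repo://}" ;;   # repository root
    agent://*) printf '%s/%s\n' "$agent_dir" "${ref#agent://}" ;;  # agent directory
    /*)        printf '%s\n' "$ref" ;;                             # absolute: as-is
    *)         printf '%s/%s\n' "$agent_dir" "$ref" ;;             # bare relative
  esac
}

# Example calls with made-up root directories.
resolve_manifest_path "repo://.agents/skills/gator-gate/SKILL.md" /repo /repo/openshell-agents/gator
resolve_manifest_path "prompts/gator.md" /repo /repo/openshell-agents/gator
```

Matching the prefixes before the absolute-path and bare-relative cases keeps the four rules mutually exclusive, so the order of the first three branches does not matter.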

## Launch Order

`openshell-agents/run.sh` performs the launch in this order:

1. Parse CLI flags and select the agent directory from `--agent`.
2. Load `agent.yaml`, select the requested harness, and reject unsupported
harness names.
3. Resolve sandbox defaults from the manifest and CLI/environment overrides.
4. Build a temporary payload directory.
5. Copy `runtime/` into the payload so every agent uses the same in-sandbox
entrypoint and harness adapters.
6. Optionally copy a host Codex binary into the shared Codex runtime path when
`--codex-bin` is supplied.
7. Copy manifest-declared skills and subagents into the payload.
8. Render the prompt template with runtime values such as `{{HARNESS}}`,
`{{RUN_MODE}}`, `{{POLL_INTERVAL_SECONDS}}`, `{{SUBAGENT_COMMAND}}`, and
`{{USER_PROMPT}}`.
9. Build a temporary Docker context that bakes the rendered payload into
`/etc/openshell/agent-payload`.
10. Apply manifest-declared gateway settings.
11. Resolve provider profile IDs by scanning `profile_paths` in order.
12. Import each provider profile into the gateway. If an active profile already
exists, the launcher keeps going and uses it.
13. Resolve provider credentials from host commands, JSON files, or literal
manifest values.
14. Create or update each provider instance and attach every selected provider
to the sandbox.
15. Configure and rotate refresh-backed provider credentials when declared by
the manifest.
16. Run `openshell sandbox create` from that temporary Dockerfile source.
17. Inside the sandbox, run `/etc/openshell/agent-payload/runtime/entrypoint.sh`.
18. The runtime entrypoint starts
`/etc/openshell/agent-payload/runtime/supervisor.sh`.
19. The supervisor invokes
`/etc/openshell/agent-payload/runtime/harnesses/<harness>/exec.sh` as a
bounded child execution.
20. Harness adapters prepare harness-local auth/config and execute the agent
prompt headlessly.

The payload directory is baked into the image under `/etc/openshell`, which the
gator filesystem policy mounts read-only for agent processes. Prompts, skills,
subagent definitions, and runtime scripts are agent guts, not workspace state.
Agents should write session artifacts, checkouts, temporary files, and future
memory records under `/sandbox` or `/tmp` instead.
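
Step 8 of the launch order, prompt rendering, can be approximated with `sed`. The sketch below is an assumption about how such substitution might look, not the launcher's implementation: `escape_sed` and `render_prompt` are hypothetical names, and this version only handles single-line values.

```shell
# Hypothetical helper: escape a value for use as a sed replacement when '|'
# is the delimiter (backslash, ampersand, and '|' need a leading backslash).
escape_sed() {
  printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# Hypothetical renderer: reads a template on stdin and writes the rendered
# prompt to stdout. Arguments: harness, run mode, poll interval in seconds,
# subagent command, user prompt. Multi-line values are not supported here.
render_prompt() {
  sed \
    -e "s|{{HARNESS}}|$(escape_sed "$1")|g" \
    -e "s|{{RUN_MODE}}|$(escape_sed "$2")|g" \
    -e "s|{{POLL_INTERVAL_SECONDS}}|$(escape_sed "$3")|g" \
    -e "s|{{SUBAGENT_COMMAND}}|$(escape_sed "$4")|g" \
    -e "s|{{USER_PROMPT}}|$(escape_sed "$5")|g"
}

# Example: render a one-line template.
printf '%s\n' 'Run {{HARNESS}} in {{RUN_MODE}} mode: {{USER_PROMPT}}' |
  render_prompt codex watch 900 \
    'bash /etc/openshell/agent-payload/runtime/subagent.sh' \
    'Run gator on PR 1536'
```

Escaping matters because the user prompt is free text. Without it, an `&` in the prompt would be expanded by `sed` to the matched placeholder.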

## Runtime Modes

Agents can run in `once` or `watch` mode. In `once` mode the supervisor runs one
harness cycle and exits with the harness result unless the agent emits an
`OPENSHELL_AGENT_RESULT` sentinel.

In `watch` mode the sandbox stays alive while the supervisor repeatedly runs
bounded harness cycles. The harness must not sleep or poll indefinitely. Instead,
it performs one reconciliation cycle, then prints a final-line sentinel:

```text
OPENSHELL_AGENT_RESULT {"status":"waiting","next_poll_seconds":900,"reason":"checks_pending"}
```

Supported statuses are `complete`, `waiting`, `blocked`, `transient_failure`, and
`terminal_failure`. After a `waiting` or `blocked` cycle, the supervisor sleeps
without keeping the harness connected, then launches a fresh harness cycle inside
the same sandbox. In `watch` mode, missing or malformed result sentinels and
harness transport failures are retried indefinitely with bounded backoff; only
`complete` and `terminal_failure` stop the supervisor. This keeps long-lived
agents resilient to upstream model errors while leaving durable state ownership
to the agent domain.
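
As a rough illustration of the watch-mode rules above, the following sketch extracts the status from a final-line sentinel and maps it to a supervisor action. It is not the code in `supervisor.sh`: the function names are hypothetical, the `sed` pattern assumes compact JSON like the example sentinel (the image ships `jq`, which would be more robust), and treating `transient_failure` as a retry is an assumption based on the statement that only `complete` and `terminal_failure` stop the supervisor.

```shell
# Hypothetical helper: read harness output on stdin and print the status from
# the final-line sentinel. Prints nothing if the sentinel is missing or does
# not match the compact JSON shape assumed here.
sentinel_status() {
  tail -n 1 |
    sed -n 's/^OPENSHELL_AGENT_RESULT {.*"status":"\([a-z_]*\)".*}$/\1/p'
}

# Hypothetical mapping from status to watch-mode action.
watch_action() {
  case "$1" in
    complete|terminal_failure) echo stop ;;   # the only statuses that end the loop
    waiting|blocked)           echo sleep ;;  # sleep, then start a fresh cycle
    *)                         echo retry ;;  # assumed: transient, missing, malformed
  esac
}

# Example: classify a harness log whose last line is the sentinel.
status=$(printf '%s\n' 'checked PR state' \
  'OPENSHELL_AGENT_RESULT {"status":"waiting","next_poll_seconds":900,"reason":"checks_pending"}' |
  sentinel_status)
watch_action "$status"
```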

The shared runtime does not prescribe the durable state store. Gator uses GitHub
labels, comments, reviews, and checks. Other agents can use a repository branch,
issue tracker, object store, database, or another domain-specific store as long
as each cycle can reconcile from that state.

Use `--once` or `--watch` to override the manifest default. Use
`--poll-interval <seconds>` to override the watch sleep interval.

Refresh-backed providers are bootstrapped from manifest credential sources when
no gateway refresh state exists. Later launches preserve gateway-owned refresh
material and request a credential rotation first. If that rotation fails, the
launcher treats the host credential source as a repair source, replaces the
gateway refresh material, and retries rotation once. Use `--reset-refresh` to
skip the preserve-first path and intentionally replace gateway refresh material
from the host credential source before rotating.
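
The preserve-first flow can be summarized as control flow. In this sketch the gateway is simulated with two shell variables, and `seed_from_host`, `rotate`, and `refresh_credentials` are hypothetical stand-ins. The real launcher talks to the gateway, and none of these names come from `run.sh`.

```shell
# Simulated gateway state (assumption for illustration only).
GATEWAY_HAS_STATE=${GATEWAY_HAS_STATE:-no}   # does gateway refresh state exist?
ROTATE_OK=${ROTATE_OK:-yes}                  # would a rotation succeed right now?

# Hypothetical stand-ins for the gateway operations.
seed_from_host() { GATEWAY_HAS_STATE=yes; ROTATE_OK=yes; echo "seeded from host"; }
rotate()         { [ "$ROTATE_OK" = yes ] && echo "rotated"; }

# $1 = "yes" when --reset-refresh was passed, otherwise "no".
refresh_credentials() {
  reset=$1
  if [ "$reset" = yes ] || [ "$GATEWAY_HAS_STATE" = no ]; then
    seed_from_host          # bootstrap, or intentional reset
    rotate
    return
  fi
  if ! rotate; then         # preserve-first: try the gateway-owned material
    seed_from_host          # repair from the host credential source
    rotate                  # retry once
  fi
}

# Example: existing gateway state whose rotation fails, so the repair path runs.
GATEWAY_HAS_STATE=yes
ROTATE_OK=no
refresh_credentials no
```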

Long-lived harnesses must not persist revision-scoped provider placeholders such
as `openshell:resolve:env:v123_TOKEN` into files they reuse across refreshes.
Persist the current-name alias, for example `openshell:resolve:env:TOKEN`, so the
sandbox proxy resolves the latest gateway-refreshed credential on each request.
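
One way a harness adapter might enforce this before writing a long-lived file is to normalize placeholders through a filter. The sketch below assumes the revision prefix always has the form `v<digits>_`, as in the example above; the function name `to_current_alias` is hypothetical.

```shell
# Hypothetical filter: rewrite revision-scoped placeholders on stdin to their
# current-name aliases. Assumes the revision prefix is always "v<digits>_".
to_current_alias() {
  sed 's/openshell:resolve:env:v[0-9][0-9]*_/openshell:resolve:env:/g'
}

# Example: normalize a config line before persisting it.
printf '%s\n' 'access_token = "openshell:resolve:env:v123_TOKEN"' | to_current_alias
```

A credential whose real name begins with `v<digits>_` would be rewritten incorrectly by this pattern, so the filter is only safe if no such names exist.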

## Subagents

The launcher injects subagent definitions under
`/etc/openshell/agent-payload/subagents/`.
Prompt templates should refer to the generic command instead of a harness-specific
script:

```shell
bash /etc/openshell/agent-payload/runtime/subagent.sh <subagent-id> < task.md
```

The shared subagent dispatcher forwards the task to the active harness adapter.
For Codex, this runs a separate bounded `codex exec` invocation using the same
model and reasoning defaults as the parent harness.

## Providers

Listing a provider in `agent.yaml` means the provider is attached to the sandbox.
Provider profiles describe credential shape, endpoint policy, discovery metadata,
and refresh metadata. The launcher only creates provider instances and supplies
runtime credential values.

`profile_paths` are ordered. The first profile file with the requested `id` wins.
If the same directory contains duplicate profile IDs, the launcher fails. If a
later profile path contains a profile ID that was already found, the launcher
warns that the later file is shadowed.
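
The first-wins lookup can be sketched as follows. This is an approximation under stated assumptions: each profile is a `*.yaml` file with an unquoted top-level `id: <name>` line, and the function name `find_profile` is hypothetical. The launcher's real YAML handling is not shown in this README.

```shell
# Hypothetical lookup: print the first profile file whose "id:" line equals $1,
# scanning the remaining arguments (directories) in order.
find_profile() {
  want=$1
  shift
  found=""
  for dir in "$@"; do
    # List files in this directory that declare the requested id.
    matches=$(grep -l -x "id: $want" "$dir"/*.yaml 2>/dev/null || true)
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    if [ "$count" -gt 1 ]; then
      echo "error: duplicate profile id '$want' in $dir" >&2
      return 1
    fi
    [ "$count" -eq 0 ] && continue
    if [ -z "$found" ]; then
      found=$matches                      # first match wins
    else
      echo "warning: $matches is shadowed by $found" >&2
    fi
  done
  [ -n "$found" ] && printf '%s\n' "$found"
}

# Example with throwaway directories.
demo=$(mktemp -d)
mkdir -p "$demo/agent" "$demo/shared"
printf 'id: github-gator\n' > "$demo/agent/github-gator.yaml"
printf 'id: github-gator\n' > "$demo/shared/github.yaml"
find_profile github-gator "$demo/agent" "$demo/shared"
rm -rf "$demo"
```

The example prints the path under `agent/` and writes a shadow warning for the file under `shared/` to standard error.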

## Gator Example

`gator/` is the first manifest-driven agent. It uses:

- `gator/agent.yaml` for the launch contract.
- `gator/prompts/gator.md` for the rendered operator prompt.
- `gator/providers/` for scoped GitHub and Codex provider profiles.
- `Dockerfile.gator` for the local sandbox image.
- `runtime/harnesses/codex/` for Codex installation and execution.

Run it through the generic launcher:

```shell
./openshell-agents/run.sh \
  --agent gator \
  --gateway docker-dev \
  "Run gator on PR 1536 and keep watching until it closes or merges."
```
1 change: 1 addition & 0 deletions openshell-agents/gator/.gitignore
@@ -0,0 +1 @@
logs/
52 changes: 52 additions & 0 deletions openshell-agents/gator/README.md
@@ -0,0 +1,52 @@
# Gator Agent

Launch a headless sandbox agent that runs the `gator-gate` skill against OpenShell issues and pull requests. The default and currently only supported harness is Codex.

## Prerequisites

- `gh` is authenticated on the host and has access to `NVIDIA/OpenShell` and `NVIDIA/OpenShell-Community`.
- For `--harness codex`, `codex login` has created `$HOME/.codex/auth.json`.
- For `--harness codex`, local Codex auth must include an access token, refresh token, and account ID.
- A local gateway is available when using the default local Dockerfile source.

## Usage

```shell
./openshell-agents/run.sh \
  --agent gator \
  --gateway docker-dev \
  --harness codex \
  "Run gator on PR 1536 and keep watching until it closes or merges."
```

By default the launcher uses `openshell-agents/Dockerfile.gator` as the sandbox source. Local gateways build that Dockerfile with `openshell-agents/` as the build context, which lets the image use shared harness install scripts from `runtime/` and gator-specific policy from `gator/policy.yaml`. The launcher bakes rendered prompts, skills, subagents, and runtime files into `/etc/openshell/agent-payload`, so `--from` must point to a local Dockerfile or directory containing a Dockerfile.

Use `--harness codex` to select Codex explicitly. Other harness names are rejected until their support is added to `agent.yaml` and `openshell-agents/runtime/harnesses/<name>/`. Agent directories do not carry their own harness implementations; they provide prompt templates and optional skills or subagents for the shared runtime to inject.

Use `--codex-bin "$(command -v codex)"` only when the host executable is compatible with the sandbox OS and architecture.

The manifest-driven launcher at `openshell-agents/run.sh` reads `agent.yaml`, which defines the agent prompt template, provider profile IDs, provider credential sources, gateway settings, skills, subagents, sandbox defaults, runtime mode, and harness defaults. The shared sandbox entrypoint at `openshell-agents/runtime/entrypoint.sh` starts the in-sandbox supervisor, which invokes the selected harness adapter for bounded cycles.

The launcher:

- Scans `profile_paths` in manifest order and imports `providers/github-gator.yaml`.
- Creates or updates the `github-gator` provider from `gh auth token`.
- Selects the requested harness and bakes the common runtime into the immutable sandbox payload.
- For `--harness codex`, imports `providers/codex-gator.yaml`, creates or updates the `codex-gator` provider from `$HOME/.codex/auth.json`, and stores the refresh token as gateway-only refresh material.
- For `--harness codex`, configures gateway-managed refresh for `CODEX_AUTH_ACCESS_TOKEN` and rotates it before launching the sandbox.
- Enables `providers_v2_enabled`, `agent_policy_proposals_enabled`, and `proposal_approval_mode=auto` at gateway scope.
- Uses the gator image policy copied to `/etc/openshell/policy.yaml`.
- Bakes the current `.agents/skills/gator-gate/SKILL.md` into `/etc/openshell/agent-payload`.
- Bakes `.claude/agents/principal-engineer-reviewer.md` so the selected harness can run a deterministic independent reviewer execution through `/etc/openshell/agent-payload/runtime/subagent.sh principal-engineer-reviewer < task.md`.
- For `--harness codex`, optionally bakes a host Codex executable as `/etc/openshell/agent-payload/runtime/harnesses/codex/codex`.
- Starts the selected harness without a TTY.
- Runs gator in `watch` mode by default. The sandbox stays alive while the supervisor sleeps between bounded Codex cycles, so Codex is not connected during passive PR waits.
- Deletes the sandbox automatically after the supervisor exits. Pass `--keep` to preserve it for debugging.

The GitHub provider profile allows read-only GraphQL queries on `api.github.com/graphql` so `gh` read paths can use GraphQL when needed. Write operations remain REST-only and scoped to the two allowed repositories.

Set `GATOR_CODEX_ACCESS_CREDENTIAL_KEY` or pass `--codex-access-key` if the gator Codex profile uses a credential key other than `CODEX_AUTH_ACCESS_TOKEN` for the short-lived access token.

Use `--once` for a single reconciliation cycle. Use `--poll-interval <seconds>` to change the default 15-minute watch cadence.

The launcher preserves existing gateway-owned Codex refresh material by default so multiple gator sandboxes do not overwrite each other's refresh-token lineage from host Codex auth. If gateway rotation fails, the launcher automatically resets gateway refresh material from host Codex auth and retries once. After `codex logout && codex login`, you can also pass `--reset-refresh` to force that reset before rotation.