🚧 Status — POC, milestones P1 → P6 reached. You can now `bun run forge`, describe an agent in plain English or French, watch the builder draft the AGENT.md, approve it, then ask the builder to run that agent — it spins up its own Docker container with six native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) sandboxed under /workspace, streams the output, and tears the sandbox down. Recurring orchestration patterns are handled by skills: drop a SKILL.md in ~/.agent-forge/skills/ (or use the built-in scaffold-and-run) and the CLI auto-dispatches when a trigger phrase appears in your message. Next milestone: P5 — hardened sandbox + artifact extraction.
A conversational CLI where you describe the software you want and a builder LLM designs, writes and launches the agents that produce it — each agent isolated in its own Docker container, with a pixel-art TUI built on Ink.
The builder is the only conversational surface. Sub-agents are spawned on demand in disposable sandboxes; long-running agents and multi-agent teams come later (P5 and P7).
| Milestone | Scope | State |
|---|---|---|
| P1 | Hello agent in Docker (host script ↔ container ↔ LLM round-trip) | ✅ done |
| P2 | Conversational CLI (REPL Ink, EN/FR, slash commands, provider switch) | ✅ done |
| P3 | Builder writes AGENT.md, asks for permission, launches the agent in a fresh container, streams its output | ✅ done |
| P4 | Six native tools sandboxed under /workspace: Bash, FileWrite, FileRead, FileEdit, Grep, Glob; runtime tool-loop with maxTurns | ✅ done |
| P6 | Skill layer: SKILL.md format, catalog (built-in + ~/.agent-forge/skills/), server-side trigger matching, two-call runner (one for AGENT.md, one for the run prompt) | ✅ done |
| P5 | Hardened sandbox + persistent agents (docker exec) + artifact extraction back to host | next |
| P7 | TEAM.md — coordinated multi-agent runs | |
| P8 | Pixel-art dashboard (live agent activity) | |
| P9 | ★ POC validated: Next.js + Laravel + QA demo end-to-end | |
```bash
# 1. Build the base Docker image (one-time, ~600 MB, ~1 min)
bash scripts/docker/build-base.sh

# 2. Install JS deps and build the runtime bundle
bun install
bun run --cwd packages/runtime build

# 3. Configure your LLM provider (cloud — recommended)
cp .env.example .env
# edit .env and set FORGE_API_KEY=…

# 4. Launch the builder REPL
bun run forge
```

On the first run the CLI asks you to pick a language (EN / FR), then drops you into the conversational prompt.
```
▌▌ MISSION CONTROL ▐▐                                      1 action
╭──────────────────────────────────────────────────────────────╮
│ [DONE] write agents/haiku-writer/AGENT.md                    │
│                                                              │
│   1 ---                                                      │
│   2 name: haiku-writer                                       │
│   3 description: Writes a haiku in 5-7-5.                    │
│   4 sandbox:                                                 │
│   5   image: agent-forge/base:latest                         │
│   6   timeout: 60s                                           │
│   7   maxTurns: 1                                            │
│   8 ---                                                      │
│   …                                                          │
│ ✓ written /Users/you/.agent-forge/agents/haiku-writer/…      │
╰──────────────────────────────────────────────────────────────╯
      ▀▀▀
     ▀▀▀▀
    ▄ ▄ ▄
▌▌ AGENT FORGE ▐▐ v0.0.0 home · session: new · model: mistral-small-latest
─────────────────────────────────────────────────────────────────
❯ create an agent that writes haikus
▸ Done. The agent is forged. Want me to run it?
❯ describe what you want to build…
[⏎] send  [PgUp/PgDn] scroll  [Ctrl+E] live  [/help] commands
```
The TUI is split in two strict zones:
- Top zone (Mission Control) — every concrete action the builder takes. File writes, container launches, agent output. Syntax-highlighted, status-coloured (orange = pending, green = done, red = failed).
- Bottom zone (Conversation) — only the natural-language exchange between you and the builder. No code, no logs, no internals.
Agent Forge talks to any OpenAI-compatible chat endpoint via the Vercel AI SDK. Pick what fits.
Get a key at https://console.mistral.ai. The free tier is enough for the POC.
```
FORGE_BASE_URL=https://api.mistral.ai/v1
FORGE_API_KEY=…
FORGE_MODEL=mistral-small-latest
```

OpenAI:

```
FORGE_BASE_URL=https://api.openai.com/v1
FORGE_API_KEY=sk-…
FORGE_MODEL=gpt-4o-mini
```

Local MLX (Apple Silicon) — start a local OpenAI-compatible server, then point the CLI at it:

```bash
python3 -m venv ~/.agent-forge/mlx-venv
~/.agent-forge/mlx-venv/bin/pip install mlx-lm
~/.agent-forge/mlx-venv/bin/hf download mlx-community/Qwen2.5-7B-Instruct-4bit
~/.agent-forge/mlx-venv/bin/mlx_lm.server \
  --model mlx-community/Qwen2.5-7B-Instruct-4bit --port 8080
```

```
FORGE_BASE_URL=http://host.docker.internal:8080/v1
FORGE_MODEL=mlx-community/Qwen2.5-7B-Instruct-4bit
```

You can also switch on the fly inside the REPL: `/provider mistral`, `/model mistral-large-latest`, `/provider mlx`.
- Describe — `> create an agent that writes haikus on a given topic`
- Approve — the builder drafts an `AGENT.md`, Mission Control shows it, and a permission dialog asks `[Y] approve [N] decline [D] preview`. Press `Y`.
- Run — `> run haiku-writer on Docker`. Same dialog, same `Y`.
- Watch — Mission Control streams the container output live, the badge flips to `[DONE]`, and the container is removed (`docker run --rm`).
Every session is persisted to ~/.agent-forge/sessions/<id>/transcript.jsonl. Use /sessions to list, /session to show the current id.
Agents launched by the builder run inside a disposable container with /workspace mounted as their writable root. Six native tools are exposed, called via fenced `forge:*` blocks the agent emits in its reply:
| Tag | Tool | What it does |
|---|---|---|
| `forge:bash` | Bash | `bash -lc <command>` inside /workspace, 30 s default timeout (max 120 s), output clipped at 16 KB |
| `forge:write` | FileWrite | Create or overwrite a file under /workspace, parent dirs auto-created |
| `forge:read` | FileRead | Line-based offset/limit, 16 KB clip, fails on non-regular files |
| `forge:edit` | FileEdit | Exact-substring patch; refuses ambiguous matches unless `replaceAll: true` |
| `forge:grep` | Grep | Pure JS regex over an optional glob filter, skips binaries, 200 hits cap |
| `forge:glob` | Glob | Hand-rolled `*` / `**` / `?` matcher, 200 results cap |
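The FileEdit contract (exact-substring patch, refusal on ambiguous matches) can be sketched in a few lines. `applyEdit` is an illustrative name, not the runtime's actual export:

```typescript
// Sketch of FileEdit's exact-substring semantics: exactly one match is
// patched; zero matches or multiple matches are an error, unless the
// caller opts into replaceAll.
function applyEdit(
  content: string,
  oldText: string,
  newText: string,
  replaceAll = false,
): string {
  // split() with a string argument is a literal (non-regex) match count
  const count = content.split(oldText).length - 1;
  if (count === 0) throw new Error("old text not found");
  if (count > 1 && !replaceAll)
    throw new Error(`ambiguous: ${count} matches (pass replaceAll: true)`);
  return replaceAll
    ? content.split(oldText).join(newText)
    : content.replace(oldText, newText);
}

console.log(applyEdit("hello world", "world", "forge")); // → hello forge
```

Refusing ambiguous matches is the important design point: a patch that silently lands on the wrong occurrence is far worse than one that asks the agent to be more specific.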
The runtime parses one block per turn, executes it, feeds the structured result back as a system message, and loops up to maxTurns (capped at 10). All tools are sandboxed: path traversal, null bytes and absolute paths outside /workspace are refused.
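The sandboxing rule can be illustrated with a minimal path check. This is a sketch of the idea, not the runtime's actual code; `isPathAllowed` is a hypothetical name:

```typescript
import path from "node:path";

const ROOT = "/workspace";

// Reject null bytes, then resolve the requested path (relative paths
// are taken from /workspace) and verify it still falls under the root.
function isPathAllowed(requested: string): boolean {
  if (requested.includes("\0")) return false;
  const resolved = path.resolve(ROOT, requested);
  return resolved === ROOT || resolved.startsWith(ROOT + path.sep);
}

console.log(isPathAllowed("notes/a.txt"));   // → true
console.log(isPathAllowed("../etc/passwd")); // → false
console.log(isPathAllowed("/etc/passwd"));   // → false
```

Resolving before checking is what defeats `../` traversal: the comparison runs on the normalized absolute path, not on the string the agent supplied.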
Why a text-structured protocol instead of OpenAI `tool_calls`? Local LLMs (MLX, llama.cpp) don't all honour native tool-use, and a single protocol across builder and agents is easier to debug — the raw stream stays human-readable.
A single user message can mix two intents the LLM tends to collapse — "what the agent IS" and "what the agent should do RIGHT NOW". Skills keep them apart.
A skill is a SKILL.md file with a YAML frontmatter (name, description, triggers, actions) and a markdown body of instructions. The CLI loads skills from two sources:

- built-in: shipped under `packages/core/src/builder/skills/`
- user: drop a file into `~/.agent-forge/skills/<name>.md` (or `<name>/SKILL.md` for grouped assets); it overrides the built-in on name collision
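For illustration, a minimal user skill could look like this. The name, trigger phrases and `actions` values below are invented; only the four frontmatter keys come from the format described above:

```markdown
---
name: review-and-fix
description: Audit a codebase, then apply the fixes it finds.
triggers:
  - "audit then fix"
  - "review and fix"
actions:
  - agent
  - run
---
Instructions for the builder: first draft a generic reviewer
AGENT.md (role only, no task specifics), then write a run prompt
scoped to the user's concrete request.
```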
When you send a message, the CLI scans it server-side against every skill's trigger phrases (case-insensitive substring). If one matches, the skill runner takes over the turn : two narrow LLM calls, one for the AGENT.md (generic role only), one for the run prompt (the concrete task), then both blocks land as PROPOSED cards in Mission Control. You approve them in order. The LLM never has to make the meta-decision.
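The server-side matching step amounts to a case-insensitive substring scan. `matchSkill` and the pared-down `Skill` shape below are illustrative; the real catalog entries are Zod-validated:

```typescript
// Minimal sketch of trigger matching: first skill with any trigger
// phrase contained (case-insensitively) in the user message wins.
type Skill = { name: string; triggers: string[] };

function matchSkill(message: string, skills: Skill[]): Skill | undefined {
  const haystack = message.toLowerCase();
  return skills.find((skill) =>
    skill.triggers.some((t) => haystack.includes(t.toLowerCase())),
  );
}

const catalog: Skill[] = [
  { name: "scaffold-and-run", triggers: ["test it, then run", "create and run"] },
];

console.log(matchSkill("Create and run a haiku agent", catalog)?.name);
// → scaffold-and-run
```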
The built-in scaffold-and-run ships today: it triggers on phrases like `audite`, `teste`, `lance puis` (FR), `audit`, `test it, then run`, `create and run`. Type `/skills` in the REPL to list what's available.
| Command | Effect |
|---|---|
| `/help` | show all commands |
| `/clear` | clear the view (LLM context kept) |
| `/reset` | clear view AND LLM context |
| `/lang en\|fr` | switch UI language |
| `/provider <name>` | mlx \| openai \| anthropic \| mistral |
| `/model <name>` | switch model on the active provider |
| `/session` | show the current session id |
| `/sessions` | list persisted sessions |
| `/skills` | list available skills (built-in + user) |
| `/exit` | quit |
- `Tab` / `Shift+Tab` — cycle focus through action cards
- `Enter` — open the focused card in a full-screen detail view
- `Esc` — drop the focus (or close the detail view)
- `↑↓` / `PgUp` / `PgDn` / `g` / `G` — scroll inside the detail view
- `Ctrl+E` — return the chat transcript to live mode
```
┌─────────────────────────────────────────────────────────────┐
│ HOST                                                        │
│                                                             │
│  forge CLI (= the builder LLM)                              │
│   ├─ Ink TUI (Mission Control + conversation)               │
│   ├─ Skill catalog: built-in + ~/.agent-forge/skills/       │
│   ├─ Server-side trigger matcher + skill runner             │
│   ├─ AGENT.md / SKILL.md parsers (Zod-validated)            │
│   ├─ FileWrite tool (host, sandboxed under ~/.agent-forge)  │
│   └─ DockerLaunch tool (spawns one-shot containers)         │
└────────────────────┬────────────────────────────────────────┘
                     │ docker run --rm -i
                     │   -v <agent>/AGENT.md:/agent/AGENT.md:ro
                     │   -v <runtime-bundle>:/runtime:ro
                     │   -v <per-run-host-dir>:/workspace
                     ▼
┌─────────────────────────────────────────────────────────────┐
│ CONTAINER (one per agent run, disposable)                   │
│ agent-forge/base:latest                                     │
│                                                             │
│  Node runtime ── reads /agent/AGENT.md as system prompt     │
│   ├─ pipes the user prompt through stdin                    │
│   ├─ streams the LLM answer to stdout                       │
│   └─ tool loop: forge:bash / write / read /                 │
│                 edit / grep / glob, capped at maxTurns      │
│                                                             │
│  /workspace ── writable scratchpad, kept on host after exit │
└─────────────────────────────────────────────────────────────┘
```
Persistent agents (docker exec instead of docker run --rm) and multi-agent teams (one container, many processes coordinating via claude-presence) land in P5 and P7.
- TypeScript + Bun runtime + Bun workspaces
- Ink (React for terminals) for the TUI
- Vercel AI SDK (`ai`, `@ai-sdk/openai`) — provider-agnostic LLM calls
- `zod` — `AGENT.md` frontmatter validation
- `docker` CLI via `child_process.spawn` (Bun + dockerode hangs on attach)
- `biome` for lint/format
- Apache 2.0 license
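For illustration, the one-shot launch can be assembled as plain `docker run` arguments and handed to `spawn`. `buildRunArgs` is a hypothetical helper, not the actual DockerLaunch API:

```typescript
import { spawn } from "node:child_process";

// Build the argv for a disposable agent container, mirroring the
// mounts shown in the architecture diagram: AGENT.md and the runtime
// bundle read-only, a per-run host dir writable at /workspace.
function buildRunArgs(
  agentDir: string,
  runtimeDir: string,
  workDir: string,
): string[] {
  return [
    "run", "--rm", "-i",
    "-v", `${agentDir}/AGENT.md:/agent/AGENT.md:ro`,
    "-v", `${runtimeDir}:/runtime:ro`,
    "-v", `${workDir}:/workspace`,
    "agent-forge/base:latest",
  ];
}

// Usage (requires a local Docker daemon and the built base image):
// const child = spawn("docker", buildRunArgs(agentDir, runtimeDir, workDir));
```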
```
agent-forge/
├── packages/
│   ├── core/                    # builder LLM, schemas, skill layer
│   │   └── src/builder/skills/  # built-in SKILL.md files
│   ├── cli/                     # the `forge` binary (Ink REPL + Mission Control)
│   ├── runtime/                 # bundle that runs inside each agent container
│   │   └── src/tool-protocol.ts # forge:* parser + result renderers
│   └── tools-core/
│       ├── file-write.ts        # host-side FileWrite (~/.agent-forge)
│       ├── docker-launch.ts     # one-shot container launcher
│       └── runtime/             # in-container tools: bash, file-write,
│                                #   file-read, file-edit, grep, glob
├── docker/                      # Dockerfiles
├── scripts/                     # build helpers (docker, hooks)
├── demo-sprites/                # interactive mockup (UX reference)
└── assets/                      # README images
```
This project's architecture was informed by a public technical analysis of an existing reference coding-agent. The analysis (~6 400 lines, 13 documents) extracted patterns worth keeping and pitfalls to avoid. No code was copied — only architectural patterns inspired the design.
Project is in active POC phase. Feedback and ideas welcome via issues. Code contributions will open after the P9 milestone (POC validated).
Apache 2.0 — Copyright 2026 Georges Garnier

