🚧 Status — POC, milestones P1 → P6 reached. You can now `bun run forge`, describe an agent in plain English or French, watch the builder draft the AGENT.md, approve it, then ask the builder to run that agent — it spins up its own Docker container with six native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) sandboxed under /workspace, streams the output, and tears the sandbox down. Recurring orchestration patterns are handled by skills: drop a SKILL.md in ~/.agent-forge/skills/ (or use the built-in scaffold-and-run) and the CLI auto-dispatches when a trigger phrase appears in your message. Next milestone: P5 — hardened sandbox + artifact extraction.
A conversational CLI where you describe the software you want and a builder LLM designs, writes and launches the agents that produce it — each agent isolated in its own Docker container, with a pixel-art TUI built on Ink.
The builder is the only conversational surface. Sub-agents are spawned on demand in disposable sandboxes; long-running agents and multi-agent teams come later (P5 and P7).
| Milestone | Scope | State |
|---|---|---|
| P1 | Hello agent in Docker (host script ↔ container ↔ LLM round-trip) | ✅ done |
| P2 | Conversational CLI (REPL Ink, EN/FR, slash commands, provider switch) | ✅ done |
| P3 | Builder writes AGENT.md, asks for permission, launches the agent in a fresh container, streams its output | ✅ done |
| P4 | Six native tools sandboxed under /workspace: Bash, FileWrite, FileRead, FileEdit, Grep, Glob; runtime tool-loop with maxTurns | ✅ done |
| P6 | Skill layer: SKILL.md format, catalog (built-in + ~/.agent-forge/skills/), server-side trigger matching, two-call runner (one for AGENT.md, one for the run prompt) | ✅ done |
| P5 | Hardened sandbox + persistent agents (docker exec) + artifact extraction back to host | next |
| P7 | TEAM.md — coordinated multi-agent runs | |
| P8 | Pixel-art dashboard (live agent activity) | |
| P9 | ★ POC validated: Next.js + Laravel + QA demo end-to-end | |
```bash
# 1. Build the base Docker image (one-time, ~600 MB, ~1 min)
bash scripts/docker/build-base.sh

# 2. Install JS deps and build the runtime bundle
bun install
bun run --cwd packages/runtime build

# 3. Configure your LLM provider (cloud — recommended)
cp .env.example .env
# edit .env and set FORGE_API_KEY=…

# 4. Launch the builder REPL
bun run forge
```

On the first run the CLI asks you to pick a language (EN / FR), then drops you into the conversational prompt.
```
▌▌ MISSION CONTROL ▐▐                                      1 action
╭──────────────────────────────────────────────────────────────╮
│ [DONE] write agents/haiku-writer/AGENT.md                    │
│                                                              │
│   1 ---                                                      │
│   2 name: haiku-writer                                       │
│   3 description: Writes a haiku in 5-7-5.                    │
│   4 sandbox:                                                 │
│   5   image: agent-forge/base:latest                         │
│   6   timeout: 60s                                           │
│   7   maxTurns: 1                                            │
│   8 ---                                                      │
│   …                                                          │
│ ✓ written /Users/you/.agent-forge/agents/haiku-writer/…      │
╰──────────────────────────────────────────────────────────────╯
      ▀▀▀
     ▀▀▀▀
    ▄ ▄ ▄
▌▌ AGENT FORGE ▐▐ v0.0.0 home · session: new · model: mistral-small-latest
─────────────────────────────────────────────────────────────────
❯ create an agent that writes haikus
▸ Done. The agent is forged. Want me to run it?
❯ describe what you want to build…
[⏎] send  [PgUp/PgDn] scroll  [Ctrl+E] live  [/help] commands
```
The TUI is split in two strict zones:
- Top zone (Mission Control) — every concrete action the builder takes. File writes, container launches, agent output. Syntax-highlighted, status-coloured (orange = pending, green = done, red = failed).
- Bottom zone (Conversation) — only the natural-language exchange between you and the builder. No code, no logs, no internals.
Agent Forge talks to any OpenAI-compatible chat endpoint via the Vercel AI SDK. Pick what fits.
Get a key at https://console.mistral.ai. The free tier is enough for the POC.
```
FORGE_BASE_URL=https://api.mistral.ai/v1
FORGE_API_KEY=…
FORGE_MODEL=mistral-small-latest
```

OpenAI:

```
FORGE_BASE_URL=https://api.openai.com/v1
FORGE_API_KEY=sk-…
FORGE_MODEL=gpt-4o-mini
```

Local MLX (Apple Silicon) — start a local OpenAI-compatible server, then point the CLI at it:

```bash
python3 -m venv ~/.agent-forge/mlx-venv
~/.agent-forge/mlx-venv/bin/pip install mlx-lm
~/.agent-forge/mlx-venv/bin/hf download mlx-community/Qwen2.5-7B-Instruct-4bit
~/.agent-forge/mlx-venv/bin/mlx_lm.server \
  --model mlx-community/Qwen2.5-7B-Instruct-4bit --port 8080
```

```
FORGE_BASE_URL=http://host.docker.internal:8080/v1
FORGE_MODEL=mlx-community/Qwen2.5-7B-Instruct-4bit
```

You can also switch on the fly inside the REPL: `/provider mistral`, `/model mistral-large-latest`, `/provider mlx`.
- Describe — `> create an agent that writes haikus on a given topic`
- Approve — the builder drafts an `AGENT.md`, Mission Control shows it, and a permission dialog asks `[Y] approve [N] decline [D] preview`. Press `Y`.
- Run — `> run haiku-writer on Docker`. Same dialog, same `Y`.
- Watch — Mission Control streams the container output live, the badge flips to `[DONE]`, and the container is removed (`docker run --rm`).
Every session is persisted to ~/.agent-forge/sessions/<id>/transcript.jsonl. Use /sessions to list, /session to show the current id.
Agents launched by the builder run inside a disposable container with /workspace mounted as their writable root. Six native tools are exposed, called via fenced `forge:*` blocks the agent emits in its reply:
| Tag | Tool | What it does |
|---|---|---|
| `forge:bash` | Bash | `bash -lc <command>` inside /workspace, 30 s default timeout (max 120 s), output clipped at 16 KB |
| `forge:write` | FileWrite | Create or overwrite a file under /workspace, parent dirs auto-created |
| `forge:read` | FileRead | Line-based offset/limit, 16 KB clip, fails on non-regular files |
| `forge:edit` | FileEdit | Exact-substring patch; refuses ambiguous matches unless `replaceAll: true` |
| `forge:grep` | Grep | Pure JS regex over an optional glob filter, skips binaries, 200 hits cap |
| `forge:glob` | Glob | Hand-rolled `*` / `**` / `?` matcher, 200 results cap |
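The FileEdit contract (exact-substring patch, refusal on ambiguous matches) can be sketched in a few lines. `applyEdit` is an illustrative name, not the runtime's actual export:

```typescript
// Sketch of FileEdit's exact-substring semantics: exactly one match is
// patched; zero matches or multiple matches are an error, unless the
// caller opts into replaceAll.
function applyEdit(
  content: string,
  oldText: string,
  newText: string,
  replaceAll = false,
): string {
  // split() with a string argument is a literal (non-regex) match count
  const count = content.split(oldText).length - 1;
  if (count === 0) throw new Error("old text not found");
  if (count > 1 && !replaceAll)
    throw new Error(`ambiguous: ${count} matches (pass replaceAll: true)`);
  return replaceAll
    ? content.split(oldText).join(newText)
    : content.replace(oldText, newText);
}

console.log(applyEdit("hello world", "world", "forge")); // → hello forge
```

Refusing ambiguous matches is the important design point: a patch that silently lands on the wrong occurrence is far worse than one that asks the agent to be more specific.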
The runtime parses one block per turn, executes it, feeds the structured result back as a system message, and loops up to maxTurns (capped at 10). All tools are sandboxed: path traversal, null bytes and absolute paths outside /workspace are refused.
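The sandboxing rule can be illustrated with a minimal path check. This is a sketch of the idea, not the runtime's actual code; `isPathAllowed` is a hypothetical name:

```typescript
import path from "node:path";

const ROOT = "/workspace";

// Reject null bytes, then resolve the requested path (relative paths
// are taken from /workspace) and verify it still falls under the root.
function isPathAllowed(requested: string): boolean {
  if (requested.includes("\0")) return false;
  const resolved = path.resolve(ROOT, requested);
  return resolved === ROOT || resolved.startsWith(ROOT + path.sep);
}

console.log(isPathAllowed("notes/a.txt"));   // → true
console.log(isPathAllowed("../etc/passwd")); // → false
console.log(isPathAllowed("/etc/passwd"));   // → false
```

Resolving before checking is what defeats `../` traversal: the comparison runs on the normalized absolute path, not on the string the agent supplied.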
Why a text-structured protocol instead of OpenAI `tool_calls`? Local LLMs (MLX, llama.cpp) don't all honour native tool-use, and a single protocol across builder and agents is easier to debug — the raw stream stays human-readable.
A single user message can mix two intents the LLM tends to collapse — "what the agent IS" and "what the agent should do RIGHT NOW". Skills keep them apart.
A skill is a SKILL.md file with a YAML frontmatter (name, description, triggers, actions) and a markdown body of instructions. The CLI loads skills from two sources:

- built-in: shipped under `packages/core/src/builder/skills/`
- user: drop a file into `~/.agent-forge/skills/<name>.md` (or `<name>/SKILL.md` for grouped assets); it overrides the built-in on name collision
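For illustration, a minimal user skill could look like this. The name, trigger phrases and `actions` values below are invented; only the four frontmatter keys come from the format described above:

```markdown
---
name: review-and-fix
description: Audit a codebase, then apply the fixes it finds.
triggers:
  - "audit then fix"
  - "review and fix"
actions:
  - agent
  - run
---
Instructions for the builder: first draft a generic reviewer
AGENT.md (role only, no task specifics), then write a run prompt
scoped to the user's concrete request.
```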
When you send a message, the CLI scans it server-side against every skill's trigger phrases (case-insensitive substring). If one matches, the skill runner takes over the turn : two narrow LLM calls, one for the AGENT.md (generic role only), one for the run prompt (the concrete task), then both blocks land as PROPOSED cards in Mission Control. You approve them in order. The LLM never has to make the meta-decision.
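The server-side matching step amounts to a case-insensitive substring scan. `matchSkill` and the pared-down `Skill` shape below are illustrative; the real catalog entries are Zod-validated:

```typescript
// Minimal sketch of trigger matching: first skill with any trigger
// phrase contained (case-insensitively) in the user message wins.
type Skill = { name: string; triggers: string[] };

function matchSkill(message: string, skills: Skill[]): Skill | undefined {
  const haystack = message.toLowerCase();
  return skills.find((skill) =>
    skill.triggers.some((t) => haystack.includes(t.toLowerCase())),
  );
}

const catalog: Skill[] = [
  { name: "scaffold-and-run", triggers: ["test it, then run", "create and run"] },
];

console.log(matchSkill("Create and run a haiku agent", catalog)?.name);
// → scaffold-and-run
```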
The built-in scaffold-and-run ships today: it triggers on phrases like `audite`, `teste`, `lance puis` (FR), `audit`, `test it, then run`, `create and run`. Type `/skills` in the REPL to list what's available.
| Command | Effect |
|---|---|
| `/help` | show all commands |
| `/clear` | clear the view (LLM context kept) |
| `/reset` | clear view AND LLM context |
| `/lang en\|fr` | switch UI language |
| `/provider <name>` | mlx \| openai \| anthropic \| mistral |
| `/model <name>` | switch model on the active provider |
| `/session` | show the current session id |
| `/sessions` | list persisted sessions |
| `/skills` | list available skills (built-in + user) |
| `/exit` | quit |
- `Tab` / `Shift+Tab` — cycle focus through action cards
- `Enter` — open the focused card in a full-screen detail view
- `Esc` — drop the focus (or close the detail view)
- `↑↓` / `PgUp` / `PgDn` / `g` / `G` — scroll inside the detail view
- `Ctrl+E` — return the chat transcript to live mode
```
┌─────────────────────────────────────────────────────────────┐
│ HOST                                                        │
│                                                             │
│  forge CLI (= the builder LLM)                              │
│   ├─ Ink TUI (Mission Control + conversation)               │
│   ├─ Skill catalog: built-in + ~/.agent-forge/skills/       │
│   ├─ Server-side trigger matcher + skill runner             │
│   ├─ AGENT.md / SKILL.md parsers (Zod-validated)            │
│   ├─ FileWrite tool (host, sandboxed under ~/.agent-forge)  │
│   └─ DockerLaunch tool (spawns one-shot containers)         │
└────────────────────┬────────────────────────────────────────┘
                     │ docker run --rm -i
                     │   -v <agent>/AGENT.md:/agent/AGENT.md:ro
                     │   -v <runtime-bundle>:/runtime:ro
                     │   -v <per-run-host-dir>:/workspace
                     ▼
┌─────────────────────────────────────────────────────────────┐
│ CONTAINER (one per agent run, disposable)                   │
│ agent-forge/base:latest                                     │
│                                                             │
│  Node runtime ── reads /agent/AGENT.md as system prompt     │
│   ├─ pipes the user prompt through stdin                    │
│   ├─ streams the LLM answer to stdout                       │
│   └─ tool loop: forge:bash / write / read /                 │
│                 edit / grep / glob, capped at maxTurns      │
│                                                             │
│  /workspace ── writable scratchpad, kept on host after exit │
└─────────────────────────────────────────────────────────────┘
```
Persistent agents (docker exec instead of docker run --rm) and multi-agent teams (one container, many processes coordinating via claude-presence) land in P5 and P7.
- TypeScript + Bun runtime + Bun workspaces
- Ink (React for terminals) for the TUI
- Vercel AI SDK (`ai`, `@ai-sdk/openai`) — provider-agnostic LLM calls
- `zod` — `AGENT.md` frontmatter validation
- `docker` CLI via `child_process.spawn` (Bun + dockerode hangs on attach)
- `biome` for lint/format
- Apache 2.0 license
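For illustration, the one-shot launch can be assembled as plain `docker run` arguments and handed to `spawn`. `buildRunArgs` is a hypothetical helper, not the actual DockerLaunch API:

```typescript
import { spawn } from "node:child_process";

// Build the argv for a disposable agent container, mirroring the
// mounts shown in the architecture diagram: AGENT.md and the runtime
// bundle read-only, a per-run host dir writable at /workspace.
function buildRunArgs(
  agentDir: string,
  runtimeDir: string,
  workDir: string,
): string[] {
  return [
    "run", "--rm", "-i",
    "-v", `${agentDir}/AGENT.md:/agent/AGENT.md:ro`,
    "-v", `${runtimeDir}:/runtime:ro`,
    "-v", `${workDir}:/workspace`,
    "agent-forge/base:latest",
  ];
}

// Usage (requires a local Docker daemon and the built base image):
// const child = spawn("docker", buildRunArgs(agentDir, runtimeDir, workDir));
```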
```
agent-forge/
├── packages/
│   ├── core/                    # builder LLM, schemas, skill layer
│   │   └── src/builder/skills/  # built-in SKILL.md files
│   ├── cli/                     # the `forge` binary (Ink REPL + Mission Control)
│   ├── runtime/                 # bundle that runs inside each agent container
│   │   └── src/tool-protocol.ts # forge:* parser + result renderers
│   └── tools-core/
│       ├── file-write.ts        # host-side FileWrite (~/.agent-forge)
│       ├── docker-launch.ts     # one-shot container launcher
│       └── runtime/             # in-container tools: bash, file-write,
│                                #   file-read, file-edit, grep, glob
├── docker/                      # Dockerfiles
├── scripts/                     # build helpers (docker, hooks)
├── demo-sprites/                # interactive mockup (UX reference)
└── assets/                      # README images
```
This project's architecture was informed by a public technical analysis of an existing reference coding-agent. The analysis (~6 400 lines, 13 documents) extracted patterns worth keeping and pitfalls to avoid. No code was copied — only architectural patterns inspired the design.
Project is in active POC phase. Feedback and ideas welcome via issues. Code contributions will open after the P9 milestone (POC validated).
Apache 2.0 — Copyright 2026 Georges Garnier

