From 5a4687901b963387d30308e1027b1ac7366bb71e Mon Sep 17 00:00:00 2001
From: Nick Sullivan <nick@technick.ai>
Date: Mon, 4 May 2026 15:30:03 -0500
Subject: [PATCH 1/2] =?UTF-8?q?=E2=9C=A8=20Add=20verify-deliverables=20Sub?=
 =?UTF-8?q?agentStop=20hook?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cross-references a subagent's recent activity against its completion
claims. Fires advisory output via hookSpecificOutput.additionalContext
when the tool-call record or filesystem disagrees with what the
subagent said it did.

Two checks:
- strong completion claim + zero Write/Edit/MultiEdit/NotebookEdit
  calls + a file-path-shaped token in the message → flag
- a Write/Edit-style tool call's file_path is missing on disk → flag

Always advisory, never blocks. Fail-open in every error path so a hook
bug cannot break the harness. Honors stop_hook_active to avoid loops
and a VERIFY_DELIVERABLES_DISABLED=1 env kill switch.

Test harness covers 18 cases including honest writes, MultiEdit,
alt payload shape, negated phrasings, trailing-boundary regressions,
broken symlinks, long transcripts, relative paths, the escape hatch,
and several fail-open paths.

Bumps plugin version to 9.21.0 across marketplace.json and plugin.json.

Fixes #51
---
 .claude-plugin/marketplace.json               |   4 +-
 plugins/core/.claude-plugin/plugin.json       |   2 +-
 plugins/core/hooks/README.md                  |  85 +++++++
 plugins/core/hooks/hooks.json                 |  11 +
 .../hooks/tests/test-verify-deliverables.sh   | 215 ++++++++++++++++++
 plugins/core/hooks/verify-deliverables.sh     | 132 +++++++++++
 6 files changed, 446 insertions(+), 3 deletions(-)
 create mode 100644 plugins/core/hooks/README.md
 create mode 100755 plugins/core/hooks/tests/test-verify-deliverables.sh
 create mode 100755 plugins/core/hooks/verify-deliverables.sh

diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index 7ef97f4..4d7e2ba 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -6,7 +6,7 @@
   },
   "metadata": {
     "description": "Professional AI coding configurations, agents, skills, and context for Claude Code and Cursor",
-    "version": "9.20.0",
+    "version": "9.21.0",
     "license": "MIT",
     "repository": "https://github.com/TechNickAI/ai-coding-config"
   },
@@ -15,7 +15,7 @@
       "name": "ai-coding-config",
       "source": "./plugins/core",
       "description": "Commands, agents, skills, and context for AI-assisted development workflows",
-      "version": "9.20.0",
+      "version": "9.21.0",
       "tags": ["commands", "agents", "skills", "workflows", "essential"]
     }
   ]
diff --git a/plugins/core/.claude-plugin/plugin.json b/plugins/core/.claude-plugin/plugin.json
index c1e4583..6d4fcaf 100644
--- a/plugins/core/.claude-plugin/plugin.json
+++ b/plugins/core/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "ai-coding-config",
-  "version": "9.20.0",
+  "version": "9.21.0",
   "description": "Commands, agents, skills, and context for AI-assisted development workflows",
   "author": {
     "name": "TechNickAI",
diff --git a/plugins/core/hooks/README.md b/plugins/core/hooks/README.md
new file mode 100644
index 0000000..e21530e
--- /dev/null
+++ b/plugins/core/hooks/README.md
@@ -0,0 +1,85 @@
+# Hooks
+
+Lifecycle hooks that fire during Claude Code sessions. Each hook is a small bash
+script registered in `hooks.json` against a Claude Code event.
+
+## Conventions
+
+- Bash scripts (`.sh`), executable, with a `#!/bin/bash` shebang.
+- Read JSON payload from stdin via `cat`, parse fields with `jq -r '.field // empty'`.
+- Write any guidance for the model to stdout — Claude Code surfaces stdout as
+  context. Scripts that have nothing to say print nothing.
+- Always `exit 0` unless the hook contract specifically supports blocking. A
+  failing hook that returns non-zero can disrupt the harness.
+- Defensive parsing: handle missing fields, empty input, malformed JSON gracefully.
+  Fail open.
+- Match registration in `hooks.json`. Use `${CLAUDE_PLUGIN_ROOT}/hooks/<file>` for
+  the command path.
+
+## Installed Hooks
+
+### `session-multiplexing.sh` — `SessionStart`
+
+Counts recently-touched session marker files. When 3+ Claude Code sessions are
+active, prints a reminder that the user is juggling windows so the assistant
+includes explicit context (repo, branch, current task) in every status update.
+
+### `todo-persist.sh` — `PostToolUse` (TodoWrite)
+
+Mirrors the model's todo list to a human-readable `todos.md` in the project
+directory each time `TodoWrite` runs. Survives compaction and lets the user see
+in-flight work without the harness UI.
+
+### `verify-deliverables.sh` — `SubagentStop`
+
+Cross-references a subagent's recent activity against its completion claims.
+Surfaces a structured advisory back to the parent agent when the tool-call record
+or filesystem disagrees with what the subagent said it did. Always advisory,
+never blocking.
+
+**What it checks:**
+
+- **Completion claim with zero mutations.** If the subagent's final message
+  contains strong completion language ("done.", "implemented X", "all tests pass",
+  ✅) and a file-path-shaped token, but the recent transcript has zero
+  `Write`/`Edit`/`MultiEdit`/`NotebookEdit` calls — flagged as a likely lie.
+- **Tool call recorded but artifact missing.** For each path the subagent's
+  tool calls claim to have written, verifies the file exists on disk now. A
+  missing file means the tool failed silently, was deleted, or the call was
+  fabricated.
+
+**What it doesn't do:**
+
+- Doesn't verify *correctness* — only existence. Use `verify-fix` for behavioral
+  validation.
+- Doesn't block the subagent from stopping. Output is added to
+  `hookSpecificOutput.additionalContext` so the parent agent sees the
+  discrepancy and decides what to do.
+- Doesn't parse natural-language claims aggressively. Patterns are conservative
+  to keep false-positive rate low; we'd rather miss a few lies than nag on
+  honest analytical responses.
+
+**Escape hatches:**
+
+- Set `VERIFY_DELIVERABLES_DISABLED=1` in the environment to bypass entirely.
+- The hook respects `stop_hook_active=true` and exits silently when the harness
+  signals it has already fired once this stop cycle, so it cannot loop.
+- Any internal error or unreadable transcript causes a silent pass — a broken
+  hook never breaks the harness.
+
+**Why this exists:**
+
+The `<epistemic-honesty>` block in `CLAUDE.md`, the `verify-fix` skill, and the
+`verification-before-completion` superpower all try to prevent subagents from
+claiming work they haven't done. Prompts are advisory; this hook is structural.
+See [issue #51](https://github.com/TechNickAI/ai-coding-config/issues/51) for
+the design discussion.
+
+## Running Tests
+
+```bash
+bash plugins/core/hooks/tests/test-verify-deliverables.sh
+```
+
+Test fixtures generate synthetic transcript JSONL and pipe synthetic stdin
+payloads to the hook. No live Claude Code session required.
diff --git a/plugins/core/hooks/hooks.json b/plugins/core/hooks/hooks.json
index d900d8a..bbedab7 100644
--- a/plugins/core/hooks/hooks.json
+++ b/plugins/core/hooks/hooks.json
@@ -21,6 +21,17 @@
           }
         ]
       }
+    ],
+    "SubagentStop": [
+      {
+        "matcher": "*",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/verify-deliverables.sh"
+          }
+        ]
+      }
     ]
   }
 }
diff --git a/plugins/core/hooks/tests/test-verify-deliverables.sh b/plugins/core/hooks/tests/test-verify-deliverables.sh
new file mode 100755
index 0000000..f0129aa
--- /dev/null
+++ b/plugins/core/hooks/tests/test-verify-deliverables.sh
@@ -0,0 +1,215 @@
+#!/bin/bash
+# Test harness for verify-deliverables.sh
+#
+# Generates synthetic SubagentStop payloads and transcript JSONL fixtures,
+# pipes them to the hook, and asserts on its stdout. Doesn't require a
+# live Claude Code session.
+#
+# Run from any cwd:
+#   bash plugins/core/hooks/tests/test-verify-deliverables.sh
+
+set -u
+
+HOOK="$(cd "$(dirname "$0")/.." && pwd)/verify-deliverables.sh"
+if [ ! -x "$HOOK" ]; then
+    echo "ERROR: hook not executable at $HOOK"
+    exit 2
+fi
+
+TMPDIR=$(mktemp -d)
+trap 'rm -rf "$TMPDIR"' EXIT
+
+PASS=0
+FAIL=0
+
+# --- Helpers ---
+
+# Run a case. $expect is one of:
+#   silent   — assert empty stdout
+#   advisory — assert stdout contains "additionalContext"
+#   <substring> — assert stdout contains the literal substring
+run_case() {
+    local name="$1"
+    local payload="$2"
+    local expect="$3"
+    local output
+    output=$(printf '%s' "$payload" | bash "$HOOK" 2>&1)
+
+    case "$expect" in
+        silent)
+            if [ -z "$output" ]; then
+                echo "PASS: $name"
+                PASS=$((PASS + 1))
+            else
+                echo "FAIL: $name"
+                echo "    expected: silent"
+                echo "    got:      $output"
+                FAIL=$((FAIL + 1))
+            fi
+            ;;
+        advisory)
+            if echo "$output" | grep -q 'additionalContext'; then
+                echo "PASS: $name"
+                PASS=$((PASS + 1))
+            else
+                echo "FAIL: $name"
+                echo "    expected: advisory output"
+                echo "    got:      $output"
+                FAIL=$((FAIL + 1))
+            fi
+            ;;
+        *)
+            if echo "$output" | grep -qF "$expect"; then
+                echo "PASS: $name"
+                PASS=$((PASS + 1))
+            else
+                echo "FAIL: $name"
+                echo "    expected substring: $expect"
+                echo "    got:                $output"
+                FAIL=$((FAIL + 1))
+            fi
+            ;;
+    esac
+}
+
+write_transcript() {
+    local file="$TMPDIR/$1.jsonl"
+    shift
+    : > "$file"
+    for line in "$@"; do
+        printf '%s\n' "$line" >> "$file"
+    done
+    printf '%s' "$file"
+}
+
+make_payload() {
+    local transcript="$1"
+    local stop_hook_active="${2:-false}"
+    cat <<EOF
+{"transcript_path":"$transcript","stop_hook_active":$stop_hook_active,"cwd":"$TMPDIR","hook_event_name":"SubagentStop","session_id":"test-session"}
+EOF
+}
+
+# --- Fixtures ---
+
+# An honest write: tool call to foo.ts AND foo.ts exists on disk.
+echo "content" > "$TMPDIR/foo.ts"
+TX_HONEST=$(write_transcript "honest" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TMPDIR/foo.ts\",\"content\":\"content\"}}]}}" \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Created `foo.ts` as requested."}]}}'
+)
+
+# A liar: completion claim, zero mutating tool calls (only Read).
+TX_LIAR_NO_ACTION=$(write_transcript "liar-no-action" \
+    '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Read","input":{"file_path":"src/whatever.ts"}}]}}' \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Implemented the change in src/whatever.ts."}]}}'
+)
+
+# A ghost: tool call recorded but file is not on disk.
+TX_GHOST=$(write_transcript "ghost" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TMPDIR/ghost.md\",\"content\":\"x\"}}]}}" \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"All set."}]}}'
+)
+
+# Pure analyst: no completion claim, no mutations.
+TX_ANALYST=$(write_transcript "analyst" \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"I read the code and found 3 patterns worth examining further."}]}}'
+)
+
+# Trivial chatter: completion-ish word but no file path tokens.
+TX_TRIVIAL=$(write_transcript "trivial" \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"All done with the analysis."}]}}'
+)
+
+# Edit (not Write) to existing file. Should pass — Edit counts as mutating.
+echo "before" > "$TMPDIR/edited.ts"
+TX_EDIT=$(write_transcript "edit-honest" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Edit\",\"input\":{\"file_path\":\"$TMPDIR/edited.ts\",\"old_string\":\"before\",\"new_string\":\"after\"}}]}}" \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Updated `edited.ts`."}]}}'
+)
+
+# tool_input shape (alternative payload format) for Write.
+echo "alt-content" > "$TMPDIR/alt.ts"
+TX_ALT_SHAPE=$(write_transcript "alt-shape" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"tool_input\":{\"file_path\":\"$TMPDIR/alt.ts\"}}]}}" \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Wrote alt.ts."}]}}'
+)
+
+# MultiEdit should count as mutating.
+echo "before" > "$TMPDIR/multi.ts"
+TX_MULTIEDIT=$(write_transcript "multi-edit" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"MultiEdit\",\"input\":{\"file_path\":\"$TMPDIR/multi.ts\",\"edits\":[]}}]}}" \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Updated multi.ts in several places."}]}}'
+)
+
+# Negation: "not all tests pass" should not flag.
+TX_NEGATION=$(write_transcript "negated" \
+    '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Read","input":{"file_path":"src/foo.ts"}}]}}' \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"Investigated src/foo.ts — not all tests pass yet."}]}}'
+)
+
+# Trailing boundary: "fixed itself" should not match "fixed it".
+TX_FIXED_ITSELF=$(write_transcript "fixed-itself" \
+    '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Read","input":{"file_path":"src/x.ts"}}]}}' \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"After reading src/x.ts the issue fixed itself when I retried."}]}}'
+)
+
+# Broken symlink: tool created the symlink, target is gone. -L catches it.
+ln -s "/nonexistent/target" "$TMPDIR/broken-link"
+TX_BROKEN_SYMLINK=$(write_transcript "broken-symlink" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TMPDIR/broken-link\"}}]}}" \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"Done."}]}}'
+)
+
+# Long transcript: Write at start, final message after many Read calls.
+# Confirms the 200-line slice catches Writes that the old 50-line window missed.
+echo "long-content" > "$TMPDIR/long.ts"
+{
+    printf '%s\n' "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TMPDIR/long.ts\",\"content\":\"x\"}}]}}"
+    for i in $(seq 1 60); do
+        printf '%s\n' "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Read\",\"input\":{\"file_path\":\"src/file-$i.ts\"}}]}}"
+    done
+    printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Implemented in long.ts."}]}}'
+} > "$TMPDIR/long-transcript.jsonl"
+TX_LONG="$TMPDIR/long-transcript.jsonl"
+
+# Relative file_path with cwd — must resolve to $CWD/relative.ts.
+echo "relative-content" > "$TMPDIR/relative.ts"
+TX_RELATIVE=$(write_transcript "relative" \
+    '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Write","input":{"file_path":"relative.ts"}}]}}' \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"Done."}]}}'
+)
+
+# --- Cases ---
+
+run_case "honest completion (write + file exists)" "$(make_payload "$TX_HONEST")" silent
+run_case "edit + file exists" "$(make_payload "$TX_EDIT")" silent
+run_case "MultiEdit counts as mutating" "$(make_payload "$TX_MULTIEDIT")" silent
+run_case "alt payload shape (.tool_input.file_path)" "$(make_payload "$TX_ALT_SHAPE")" silent
+run_case "pure analyst (no claim, no mutations)" "$(make_payload "$TX_ANALYST")" silent
+run_case "trivial chatter (claim word but no file token)" "$(make_payload "$TX_TRIVIAL")" silent
+run_case "negated 'not all tests pass' (no flag)" "$(make_payload "$TX_NEGATION")" silent
+run_case "'fixed itself' does not match 'fixed it'" "$(make_payload "$TX_FIXED_ITSELF")" silent
+run_case "broken symlink counts as present" "$(make_payload "$TX_BROKEN_SYMLINK")" silent
+run_case "long transcript (Write at line 1, msg after 60 Reads)" "$(make_payload "$TX_LONG")" silent
+run_case "relative file_path resolves against cwd" "$(make_payload "$TX_RELATIVE")" silent
+run_case "liar — completion + 0 mutations" "$(make_payload "$TX_LIAR_NO_ACTION")" "made no file changes"
+run_case "ghost — tool call but file missing on disk" "$(make_payload "$TX_GHOST")" "not present on disk"
+run_case "escape hatch (stop_hook_active=true)" "$(make_payload "$TX_LIAR_NO_ACTION" true)" silent
+run_case "missing transcript path (fail-open)" '{"stop_hook_active":false}' silent
+run_case "non-existent transcript file (fail-open)" "$(make_payload "/nonexistent/transcript.jsonl")" silent
+run_case "empty stdin (fail-open)" '' silent
+
+# Kill switch — set env var, expect silent regardless of payload.
+KILL_OUT=$(printf '%s' "$(make_payload "$TX_LIAR_NO_ACTION")" | VERIFY_DELIVERABLES_DISABLED=1 bash "$HOOK" 2>&1)
+if [ -z "$KILL_OUT" ]; then
+    echo "PASS: kill switch (VERIFY_DELIVERABLES_DISABLED=1)"
+    PASS=$((PASS + 1))
+else
+    echo "FAIL: kill switch — got: $KILL_OUT"
+    FAIL=$((FAIL + 1))
+fi
+
+echo ""
+echo "Results: $PASS passed, $FAIL failed"
+[ "$FAIL" -eq 0 ]
diff --git a/plugins/core/hooks/verify-deliverables.sh b/plugins/core/hooks/verify-deliverables.sh
new file mode 100755
index 0000000..3af6ac7
--- /dev/null
+++ b/plugins/core/hooks/verify-deliverables.sh
@@ -0,0 +1,132 @@
+#!/bin/bash
+# SubagentStop hook: verify-deliverables
+#
+# Cross-references the subagent's recent activity against its completion claims.
+# When the subagent says it finished work but the tool-call record or filesystem
+# disagree, this hook surfaces the discrepancy to the parent agent via
+# hookSpecificOutput.additionalContext.
+#
+# Always advisory, never blocks. Exits 0 in every path so a hook bug cannot
+# break the harness. See plugins/core/hooks/README.md for design notes.
+#
+# Issue: https://github.com/TechNickAI/ai-coding-config/issues/51
+
+set -u
+
+# Kill switch for opt-out without uninstalling.
+[ "${VERIFY_DELIVERABLES_DISABLED:-}" = "1" ] && exit 0
+
+# Hook depends on jq for JSON parsing. If jq is missing, fail open silently
+# rather than producing a non-zero exit that disrupts the harness.
+command -v jq >/dev/null 2>&1 || exit 0
+
+# --- Read input ---
+
+INPUT=$(cat)
+[ -z "$INPUT" ] && exit 0
+
+TRANSCRIPT_PATH=$(echo "$INPUT" | jq -r '.transcript_path // empty' 2>/dev/null)
+STOP_HOOK_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active // false' 2>/dev/null)
+CWD=$(echo "$INPUT" | jq -r '.cwd // empty' 2>/dev/null)
+[ -z "$CWD" ] && CWD="$(pwd)"
+
+# Escape hatch: hook already fired this stop cycle. Prevents loops.
+[ "$STOP_HOOK_ACTIVE" = "true" ] && exit 0
+
+# Fail-open: no readable transcript -> silent pass.
+[ -z "$TRANSCRIPT_PATH" ] && exit 0
+[ ! -r "$TRANSCRIPT_PATH" ] && exit 0
+
+# --- Slice recent activity ---
+
+# 200 entries handles subagents that ran many Read calls between an early
+# Write and a final summary, while still bounding parse time.
+RECENT=$(tail -n 200 "$TRANSCRIPT_PATH" 2>/dev/null)
+[ -z "$RECENT" ] && exit 0
+
+# --- Extract signals ---
+
+LAST_ASSISTANT=$(echo "$RECENT" | jq -c 'select(.type == "assistant")' 2>/dev/null | tail -n 1)
+LAST_MSG=$(echo "$LAST_ASSISTANT" | jq -r '.message.content[]? | select(.type == "text") | .text' 2>/dev/null)
+
+# Mutating tool calls in the recent slice.
+MUTATING_CALLS=$(echo "$RECENT" | jq -r '
+    select(.type == "assistant")
+    | .message.content[]?
+    | select(.type == "tool_use")
+    | select(.name == "Write" or .name == "Edit" or .name == "MultiEdit" or .name == "NotebookEdit")
+    | .name
+' 2>/dev/null | wc -l | tr -d ' ')
+
+# File paths from those tool calls. Hedge between .input and .tool_input
+# since transcript shape can differ from hook-payload shape across versions.
+# Strip embedded newlines so the read loop iterates one path per line.
+CALLED_PATHS=$(echo "$RECENT" | jq -r '
+    select(.type == "assistant")
+    | .message.content[]?
+    | select(.type == "tool_use")
+    | select(.name == "Write" or .name == "Edit" or .name == "MultiEdit" or .name == "NotebookEdit")
+    | (.input.file_path // .tool_input.file_path // empty)
+    | gsub("[\\n\\r]"; " ")
+' 2>/dev/null | sort -u)
+
+# --- Check A: completion claim with zero mutations ---
+
+ISSUES=""
+
+if [ -n "$LAST_MSG" ] && [ "$MUTATING_CALLS" = "0" ]; then
+    # Conservative completion patterns. False positives are noisier than
+    # false negatives, so we only fire on strong signals. Trailing
+    # boundary keeps "fixed it" from matching inside "fixed itself"
+    # and "all tests pass" from matching inside "passenger".
+    if echo "$LAST_MSG" | grep -qiE '(^|[^a-z])(✅|done\.|complete\.|completed\.|finished\.|implemented |fixed it|created the |wrote the |added the |saved the |all tests pass(ed)?)([^a-zA-Z]|$)'; then
+        # Reject negated phrasing — "not all tests pass" should not trigger.
+        if ! echo "$LAST_MSG" | grep -qiE '(not|n.?t|cannot|couldn.?t|didn.?t|haven.?t|hasn.?t|wasn.?t)([[:space:]]+(yet|fully|quite|all|even))?[[:space:]]+(✅|done|complete|completed|finished|implemented|fixed|added|wrote|created|saved|all tests pass)'; then
+            # Require a file-path-shaped token. Pure analysis output
+            # ("found 3 patterns in the codebase") shouldn't trigger.
+            if echo "$LAST_MSG" | grep -qE '[a-zA-Z0-9_/-]+\.[a-zA-Z]{1,5}'; then
+                ISSUES="${ISSUES}- Subagent claims completion but made no file changes (Write/Edit/MultiEdit/NotebookEdit count: 0).\n"
+            fi
+        fi
+    fi
+fi
+
+# --- Check B: tool-call paths that don't exist on disk ---
+
+if [ -n "$CALLED_PATHS" ]; then
+    while IFS= read -r path; do
+        [ -z "$path" ] && continue
+        case "$path" in
+            /*) full_path="$path" ;;
+            *)  full_path="$CWD/$path" ;;
+        esac
+        # `-e` follows symlinks; a broken symlink would falsely flag.
+        # `-L` catches the symlink itself — tool created it successfully
+        # even if the target moved.
+        if [ ! -e "$full_path" ] && [ ! -L "$full_path" ]; then
+            ISSUES="${ISSUES}- Tool call recorded Write/Edit to '$path' but the file is not present on disk.\n"
+        fi
+    done <<EOF
+$CALLED_PATHS
+EOF
+fi
+
+# --- Emit advisory or pass silently ---
+
+if [ -z "$ISSUES" ]; then
+    exit 0
+fi
+
+MESSAGE=$(printf "[verify-deliverables] Subagent completion check found discrepancies:\n%bReview before treating the subagent's result as final." "$ISSUES")
+
+# Wrap so a jq failure (OOM, missing binary slipping past the early guard)
+# cannot leak a non-zero exit to the harness.
+jq -n --arg msg "$MESSAGE" '{
+  "continue": true,
+  "hookSpecificOutput": {
+    "hookEventName": "SubagentStop",
+    "additionalContext": $msg
+  }
+}' 2>/dev/null || true
+
+exit 0

From 29a6e0a64de84c9a75ad6282d923eb904781210d Mon Sep 17 00:00:00 2001
From: Nick Sullivan <nick@technick.ai>
Date: Mon, 4 May 2026 17:50:27 -0500
Subject: [PATCH 2/2] =?UTF-8?q?=F0=9F=90=9B=20Fix=20bot-review=20issues=20?=
 =?UTF-8?q?in=20verify-deliverables=20hook?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Read agent_transcript_path (subagent's own log) instead of
  transcript_path (parent session); add last_assistant_message shortcut
- Fix n.?t → n't in negation guard (n.?t matched 'nt' inside 'count',
  'front', 'environment', etc.)
- Replace TMPDIR with TEST_DIR in test harness (TMPDIR is a reserved
  env var that mktemp reads internally)
- Add NotebookEdit test case to complete mutation-type coverage
- Switch ISSUES separator from literal \n to $'\n'; drop printf %b in
  MESSAGE to prevent backslash interpretation in file paths
- Remove ✅ from completion pattern (locale portability; prose patterns
  are sufficient)
- Payload now includes agent_transcript_path in test fixtures

All 19 test cases pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../hooks/tests/test-verify-deliverables.sh   | 48 +++++++++++--------
 plugins/core/hooks/verify-deliverables.sh     | 25 ++++++----
 2 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/plugins/core/hooks/tests/test-verify-deliverables.sh b/plugins/core/hooks/tests/test-verify-deliverables.sh
index f0129aa..01863d0 100755
--- a/plugins/core/hooks/tests/test-verify-deliverables.sh
+++ b/plugins/core/hooks/tests/test-verify-deliverables.sh
@@ -16,8 +16,8 @@ if [ ! -x "$HOOK" ]; then
     exit 2
 fi
 
-TMPDIR=$(mktemp -d)
-trap 'rm -rf "$TMPDIR"' EXIT
+TEST_DIR=$(mktemp -d)
+trap 'rm -rf "$TEST_DIR"' EXIT
 
 PASS=0
 FAIL=0
@@ -73,7 +73,7 @@ run_case() {
 }
 
 write_transcript() {
-    local file="$TMPDIR/$1.jsonl"
+    local file="$TEST_DIR/$1.jsonl"
     shift
     : > "$file"
     for line in "$@"; do
@@ -86,16 +86,16 @@ make_payload() {
     local transcript="$1"
     local stop_hook_active="${2:-false}"
     cat <<EOF
-{"transcript_path":"$transcript","stop_hook_active":$stop_hook_active,"cwd":"$TMPDIR","hook_event_name":"SubagentStop","session_id":"test-session"}
+{"transcript_path":"$transcript","agent_transcript_path":"$transcript","stop_hook_active":$stop_hook_active,"cwd":"$TEST_DIR","hook_event_name":"SubagentStop","session_id":"test-session"}
 EOF
 }
 
 # --- Fixtures ---
 
 # An honest write: tool call to foo.ts AND foo.ts exists on disk.
-echo "content" > "$TMPDIR/foo.ts"
+echo "content" > "$TEST_DIR/foo.ts"
 TX_HONEST=$(write_transcript "honest" \
-    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TMPDIR/foo.ts\",\"content\":\"content\"}}]}}" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TEST_DIR/foo.ts\",\"content\":\"content\"}}]}}" \
     '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Created `foo.ts` as requested."}]}}'
 )
 
@@ -107,7 +107,7 @@ TX_LIAR_NO_ACTION=$(write_transcript "liar-no-action" \
 
 # A ghost: tool call recorded but file is not on disk.
 TX_GHOST=$(write_transcript "ghost" \
-    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TMPDIR/ghost.md\",\"content\":\"x\"}}]}}" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TEST_DIR/ghost.md\",\"content\":\"x\"}}]}}" \
     '{"type":"assistant","message":{"content":[{"type":"text","text":"All set."}]}}'
 )
 
@@ -122,26 +122,33 @@ TX_TRIVIAL=$(write_transcript "trivial" \
 )
 
 # Edit (not Write) to existing file. Should pass — Edit counts as mutating.
-echo "before" > "$TMPDIR/edited.ts"
+echo "before" > "$TEST_DIR/edited.ts"
 TX_EDIT=$(write_transcript "edit-honest" \
-    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Edit\",\"input\":{\"file_path\":\"$TMPDIR/edited.ts\",\"old_string\":\"before\",\"new_string\":\"after\"}}]}}" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Edit\",\"input\":{\"file_path\":\"$TEST_DIR/edited.ts\",\"old_string\":\"before\",\"new_string\":\"after\"}}]}}" \
     '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Updated `edited.ts`."}]}}'
 )
 
 # tool_input shape (alternative payload format) for Write.
-echo "alt-content" > "$TMPDIR/alt.ts"
+echo "alt-content" > "$TEST_DIR/alt.ts"
 TX_ALT_SHAPE=$(write_transcript "alt-shape" \
-    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"tool_input\":{\"file_path\":\"$TMPDIR/alt.ts\"}}]}}" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"tool_input\":{\"file_path\":\"$TEST_DIR/alt.ts\"}}]}}" \
     '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Wrote alt.ts."}]}}'
 )
 
 # MultiEdit should count as mutating.
-echo "before" > "$TMPDIR/multi.ts"
+echo "before" > "$TEST_DIR/multi.ts"
 TX_MULTIEDIT=$(write_transcript "multi-edit" \
-    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"MultiEdit\",\"input\":{\"file_path\":\"$TMPDIR/multi.ts\",\"edits\":[]}}]}}" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"MultiEdit\",\"input\":{\"file_path\":\"$TEST_DIR/multi.ts\",\"edits\":[]}}]}}" \
     '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Updated multi.ts in several places."}]}}'
 )
 
+# NotebookEdit should count as mutating.
+echo "before" > "$TEST_DIR/nb.ipynb"
+TX_NOTEBOOK=$(write_transcript "notebook-edit" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"NotebookEdit\",\"input\":{\"file_path\":\"$TEST_DIR/nb.ipynb\",\"cell_number\":0,\"new_source\":\"x\"}}]}}" \
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Updated the notebook."}]}}'
+)
+
 # Negation: "not all tests pass" should not flag.
 TX_NEGATION=$(write_transcript "negated" \
     '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Read","input":{"file_path":"src/foo.ts"}}]}}' \
@@ -155,26 +162,26 @@ TX_FIXED_ITSELF=$(write_transcript "fixed-itself" \
 )
 
 # Broken symlink: tool created the symlink, target is gone. -L catches it.
-ln -s "/nonexistent/target" "$TMPDIR/broken-link"
+ln -s "/nonexistent/target" "$TEST_DIR/broken-link"
 TX_BROKEN_SYMLINK=$(write_transcript "broken-symlink" \
-    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TMPDIR/broken-link\"}}]}}" \
+    "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TEST_DIR/broken-link\"}}]}}" \
     '{"type":"assistant","message":{"content":[{"type":"text","text":"Done."}]}}'
 )
 
 # Long transcript: Write at start, final message after many Read calls.
 # Confirms the 200-line slice catches Writes that the old 50-line window missed.
-echo "long-content" > "$TMPDIR/long.ts"
+echo "long-content" > "$TEST_DIR/long.ts"
 {
-    printf '%s\n' "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TMPDIR/long.ts\",\"content\":\"x\"}}]}}"
+    printf '%s\n' "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Write\",\"input\":{\"file_path\":\"$TEST_DIR/long.ts\",\"content\":\"x\"}}]}}"
     for i in $(seq 1 60); do
         printf '%s\n' "{\"type\":\"assistant\",\"message\":{\"content\":[{\"type\":\"tool_use\",\"name\":\"Read\",\"input\":{\"file_path\":\"src/file-$i.ts\"}}]}}"
     done
     printf '%s\n' '{"type":"assistant","message":{"content":[{"type":"text","text":"Done. Implemented in long.ts."}]}}'
-} > "$TMPDIR/long-transcript.jsonl"
-TX_LONG="$TMPDIR/long-transcript.jsonl"
+} > "$TEST_DIR/long-transcript.jsonl"
+TX_LONG="$TEST_DIR/long-transcript.jsonl"
 
 # Relative file_path with cwd — must resolve to $CWD/relative.ts.
-echo "relative-content" > "$TMPDIR/relative.ts"
+echo "relative-content" > "$TEST_DIR/relative.ts"
 TX_RELATIVE=$(write_transcript "relative" \
     '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Write","input":{"file_path":"relative.ts"}}]}}' \
     '{"type":"assistant","message":{"content":[{"type":"text","text":"Done."}]}}'
@@ -185,6 +192,7 @@ TX_RELATIVE=$(write_transcript "relative" \
 run_case "honest completion (write + file exists)" "$(make_payload "$TX_HONEST")" silent
 run_case "edit + file exists" "$(make_payload "$TX_EDIT")" silent
 run_case "MultiEdit counts as mutating" "$(make_payload "$TX_MULTIEDIT")" silent
+run_case "NotebookEdit counts as mutating" "$(make_payload "$TX_NOTEBOOK")" silent
 run_case "alt payload shape (.tool_input.file_path)" "$(make_payload "$TX_ALT_SHAPE")" silent
 run_case "pure analyst (no claim, no mutations)" "$(make_payload "$TX_ANALYST")" silent
 run_case "trivial chatter (claim word but no file token)" "$(make_payload "$TX_TRIVIAL")" silent
diff --git a/plugins/core/hooks/verify-deliverables.sh b/plugins/core/hooks/verify-deliverables.sh
index 3af6ac7..0e31257 100755
--- a/plugins/core/hooks/verify-deliverables.sh
+++ b/plugins/core/hooks/verify-deliverables.sh
@@ -25,7 +25,10 @@ command -v jq >/dev/null 2>&1 || exit 0
 INPUT=$(cat)
 [ -z "$INPUT" ] && exit 0
 
-TRANSCRIPT_PATH=$(echo "$INPUT" | jq -r '.transcript_path // empty' 2>/dev/null)
+# SubagentStop provides agent_transcript_path (subagent's own log); fall back
+# to transcript_path for callers that only set the session-level field.
+TRANSCRIPT_PATH=$(echo "$INPUT" | jq -r '.agent_transcript_path // empty' 2>/dev/null)
+[ -z "$TRANSCRIPT_PATH" ] && TRANSCRIPT_PATH=$(echo "$INPUT" | jq -r '.transcript_path // empty' 2>/dev/null)
 STOP_HOOK_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active // false' 2>/dev/null)
 CWD=$(echo "$INPUT" | jq -r '.cwd // empty' 2>/dev/null)
 [ -z "$CWD" ] && CWD="$(pwd)"
@@ -46,8 +49,13 @@ RECENT=$(tail -n 200 "$TRANSCRIPT_PATH" 2>/dev/null)
 
 # --- Extract signals ---
 
-LAST_ASSISTANT=$(echo "$RECENT" | jq -c 'select(.type == "assistant")' 2>/dev/null | tail -n 1)
-LAST_MSG=$(echo "$LAST_ASSISTANT" | jq -r '.message.content[]? | select(.type == "text") | .text' 2>/dev/null)
+# last_assistant_message is provided directly in the SubagentStop payload;
+# fall back to JSONL parsing when running against older harness versions.
+LAST_MSG=$(echo "$INPUT" | jq -r '.last_assistant_message // empty' 2>/dev/null)
+if [ -z "$LAST_MSG" ]; then
+    LAST_ASSISTANT=$(echo "$RECENT" | jq -c 'select(.type == "assistant")' 2>/dev/null | tail -n 1)
+    LAST_MSG=$(echo "$LAST_ASSISTANT" | jq -r '.message.content[]? | select(.type == "text") | .text' 2>/dev/null)
+fi
 
 # Mutating tool calls in the recent slice.
 MUTATING_CALLS=$(echo "$RECENT" | jq -r '
@@ -79,13 +87,13 @@ if [ -n "$LAST_MSG" ] && [ "$MUTATING_CALLS" = "0" ]; then
     # false negatives, so we only fire on strong signals. Trailing
     # boundary keeps "fixed it" from matching inside "fixed itself"
     # and "all tests pass" from matching inside "passenger".
-    if echo "$LAST_MSG" | grep -qiE '(^|[^a-z])(✅|done\.|complete\.|completed\.|finished\.|implemented |fixed it|created the |wrote the |added the |saved the |all tests pass(ed)?)([^a-zA-Z]|$)'; then
+    if echo "$LAST_MSG" | grep -qiE '(^|[^a-z])(done\.|complete\.|completed\.|finished\.|implemented |fixed it|created the |wrote the |added the |saved the |all tests pass(ed)?)([^a-zA-Z]|$)'; then
         # Reject negated phrasing — "not all tests pass" should not trigger.
-        if ! echo "$LAST_MSG" | grep -qiE '(not|n.?t|cannot|couldn.?t|didn.?t|haven.?t|hasn.?t|wasn.?t)([[:space:]]+(yet|fully|quite|all|even))?[[:space:]]+(✅|done|complete|completed|finished|implemented|fixed|added|wrote|created|saved|all tests pass)'; then
+        if ! echo "$LAST_MSG" | grep -qiE "(not|n't|cannot|couldn't|didn't|haven't|hasn't|wasn't)([[:space:]]+(yet|fully|quite|all|even))?[[:space:]]+(done|complete|completed|finished|implemented|fixed|added|wrote|created|saved|all tests pass)"; then
             # Require a file-path-shaped token. Pure analysis output
             # ("found 3 patterns in the codebase") shouldn't trigger.
             if echo "$LAST_MSG" | grep -qE '[a-zA-Z0-9_/-]+\.[a-zA-Z]{1,5}'; then
-                ISSUES="${ISSUES}- Subagent claims completion but made no file changes (Write/Edit/MultiEdit/NotebookEdit count: 0).\n"
+                ISSUES="${ISSUES}- Subagent claims completion but made no file changes (Write/Edit/MultiEdit/NotebookEdit count: 0)."$'\n'
             fi
         fi
     fi
@@ -104,7 +112,7 @@ if [ -n "$CALLED_PATHS" ]; then
         # `-L` catches the symlink itself — tool created it successfully
         # even if the target moved.
         if [ ! -e "$full_path" ] && [ ! -L "$full_path" ]; then
-            ISSUES="${ISSUES}- Tool call recorded Write/Edit to '$path' but the file is not present on disk.\n"
+            ISSUES="${ISSUES}- Tool call recorded Write/Edit to '$path' but the file is not present on disk."$'\n'
         fi
     done <<EOF
 $CALLED_PATHS
@@ -117,7 +125,8 @@ if [ -z "$ISSUES" ]; then
     exit 0
 fi
 
-MESSAGE=$(printf "[verify-deliverables] Subagent completion check found discrepancies:\n%bReview before treating the subagent's result as final." "$ISSUES")
+MESSAGE="[verify-deliverables] Subagent completion check found discrepancies:
+${ISSUES}Review before treating the subagent's result as final."
 
 # Wrap so a jq failure (OOM, missing binary slipping past the early guard)
 # cannot leak a non-zero exit to the harness.