✨ Add verify-deliverables SubagentStop hook by TechNickAI · Pull Request #54 · TechNickAI/ai-coding-config

TechNickAI · 2026-05-04T20:30:47Z

Summary

Adds a SubagentStop lifecycle hook that cross-references a subagent's recent activity against its completion claims. When the tool-call record or filesystem disagree with what the subagent said it did, the hook surfaces a structured advisory back to the parent agent via hookSpecificOutput.additionalContext — never blocking.

This shifts the "did the subagent actually do what it claimed?" check from prompt discipline (where it lives today across <epistemic-honesty>, the verify-fix skill, and the verification-before-completion superpower) to a structural lifecycle hook the harness enforces.

Fixes #51

What it checks

Check A — completion claim with zero mutations. If the subagent's last message contains strong completion language (done., completed., ✅, implemented, fixed it, all tests pass, etc.) plus a file-path-shaped token, but the recent transcript shows zero Write/Edit/MultiEdit/NotebookEdit calls — flag.

Check B — recorded tool call but file missing. For each path in a Write/Edit-style tool call's file_path, verify it exists on disk now. A missing file means the call failed silently or the artifact was lost.

Design decisions

Bash + jq, not Node. Matches the existing session-multiplexing.sh and todo-persist.sh style. Easier to maintain, no runtime deps beyond what other hooks already require.
Always advisory, never blocks. Output is appended via additionalContext; the parent agent decides what to do. A misbehaving hook that blocks would be worse than one that misses a lie.
Conservative regex on completion phrases. Patterns like done\., completed\., fixed it, all tests pass(ed)? are anchored with leading and trailing non-letter boundaries. A negation guard rejects "not all tests pass". Tests cover the "fixed itself" / negation regressions.
Cross-references tool calls, not prose claims. OMC's prior art uses a static deliverables.json config per stage; this approach uses the actual transcript so it works on any subagent without per-skill configuration.
200-line transcript window (was 50 in the first draft) — handles subagents that ran many Read calls between an early Write and a final summary.
Fail-open in every error path. Missing jq, unreadable transcript, malformed JSON, jq OOM at the final emit step — all exit 0 silently.
Escape hatches: respects stop_hook_active=true to prevent loops; VERIFY_DELIVERABLES_DISABLED=1 env var disables entirely.

Files

plugins/core/hooks/verify-deliverables.sh — the hook
plugins/core/hooks/tests/test-verify-deliverables.sh — 18-case test harness
plugins/core/hooks/hooks.json — registers the hook against SubagentStop
plugins/core/hooks/README.md — documents the hooks directory + this hook's design
plugins/core/.claude-plugin/plugin.json — version 9.20.0 → 9.21.0
.claude-plugin/marketplace.json — version 9.20.0 → 9.21.0 (both fields)

Complexity

balanced — multi-file, new pattern in this repo (first SubagentStop hook), but well-bounded. Multi-review surfaced and fixed real issues before push.

Validation

18 tests pass: honest writes, MultiEdit, alt payload shape, pure analyst messages, negated claims, trailing-boundary regressions ("fixed itself" not matching "fixed it"), broken symlinks, long transcripts (Write at line 1, message at line 60+), relative paths resolved against cwd, escape hatch, kill switch, plus several fail-open cases.
jq . hooks.json validates.
Version sync confirmed across all three locations per .claude-plugin/CLAUDE.md.
Multi-review by robustness, security, and logic reviewers; all must-fix and bug findings addressed:
- Added command -v jq guard at script entry
- Wrapped final jq -n so a jq failure can't leak non-zero exit
- Bumped tail window 50 → 200
- Tightened completion regex with trailing boundary
- Added negation guard
- Added broken-symlink handling (-L test)
- Strip embedded newlines from extracted file paths in jq

Out of scope (per issue)

Verifying correctness of artifacts — that's verify-fix's job. This hook only checks existence.
Reading external project-level overrides for verification rules — kept simple; project-specific tuning can be revisited if needed.

Cross-references a subagent's recent activity against its completion claims. Fires advisory output via hookSpecificOutput.additionalContext when the tool-call record or filesystem disagrees with what the subagent said it did. Two checks: - strong completion claim + zero Write/Edit/MultiEdit/NotebookEdit calls + a file-path-shaped token in the message → flag - a Write/Edit-style tool call's file_path is missing on disk → flag Always advisory, never blocks. Fail-open in every error path so a hook bug cannot break the harness. Honors stop_hook_active to avoid loops and a VERIFY_DELIVERABLES_DISABLED=1 env kill switch. Test harness covers 18 cases including honest writes, MultiEdit, alt payload shape, negated phrasings, trailing-boundary regressions, broken symlinks, long transcripts, relative paths, the escape hatch, and several fail-open paths. Bumps plugin version to 9.21.0 across marketplace.json and plugin.json. Fixes #51

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5a4687901b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-04T20:33:21Z

+INPUT=$(cat)
+[ -z "$INPUT" ] && exit 0
+
+TRANSCRIPT_PATH=$(echo "$INPUT" | jq -r '.transcript_path // empty' 2>/dev/null)


Read subagent transcript instead of parent transcript

This hook is registered on SubagentStop, but it reads .transcript_path, which for this event is the main session transcript rather than the subagent’s own log; the subagent tool calls and final message live under .agent_transcript_path. As written, the checks can miss real subagent write/edit activity (false “no file changes”) and fail to verify the files the subagent actually touched, so the deliverable validation is systematically unreliable for the event it targets.

Useful? React with 👍 / 👎.

Critical catch — thank you. The docs confirm SubagentStop provides both transcript_path (main session) and agent_transcript_path (subagent's own log). Fixed in 29a6e0a: the hook now reads agent_transcript_path first and falls back to transcript_path for older harness versions or test fixtures. Also wired in last_assistant_message from the payload as a direct shortcut for the final message check, avoiding JSONL parsing when the harness provides it.

Incorrect — line 30 already reads agent_transcript_path first and falls back to transcript_path only when the subagent field is absent. The code handles this correctly.

claude · 2026-05-04T20:33:38Z

Code Review

Summary: This is a well-engineered PR. The fail-open design is correct, the test coverage is solid (18 cases), and the advisory-only output model is the right call. A few issues worth addressing before merge.

Bug: `TMPDIR` shadowing in test harness

test-verify-deliverables.sh line 21:

TMPDIR=$(mktemp -d)

TMPDIR is a reserved environment variable that mktemp itself reads to determine where to create temp files. Shadowing it exports that new directory as the temp root for all child processes. Any tool called from within the test that internally calls mktemp (jq on some platforms, bash subshells) would create files inside the test directory, which then gets rm -rf'd on EXIT. On macOS this is benign; on Linux it can cause surprising behavior.

Use a different name: TEST_DIR=$(mktemp -d) and update the three references.

Bug: `n.?t` in negation guard matches unintended substrings

verify-deliverables.sh line ~108:

grep -qiE '(not|n.?t|cannot|couldn.?t|...)...'

n.?t (zero-or-one wildcard) matches nt as well as n't — so it also matches inside "count", "front", "hunt", "sent", "mount", "want", etc. Combined with the context constraint (negation word + whitespace + completion word), false positives are rare but possible. The intent is almost certainly n't for general contractions.

Simpler and correct:

grep -qiE "(not|n't|cannot|couldn't|didn't|haven't|hasn't|wasn't)..."

Since the script already uses -E (ERE), the ' literal is fine inside the pattern.

Missing test: `NotebookEdit` as a mutation type

The hook counts NotebookEdit alongside Write/Edit/MultiEdit as a mutating call, but there's no test case that sends a NotebookEdit tool call and verifies it suppresses the advisory. The MultiEdit case is tested; adding one for NotebookEdit would complete the coverage.

Nit: `printf "%b"` interprets escape sequences in file paths

verify-deliverables.sh line ~125:

MESSAGE=$(printf "...\n%bReview..." "$ISSUES")

%b expands backslash sequences in its argument. $ISSUES is built from extracted file paths, so a path containing \t, \n, etc. (unusual but valid on Linux) would be silently mangled. Using actual newlines in $ISSUES and printf "%s" is safer:

ISSUES="${ISSUES}- message here"$'\n'
...
MESSAGE="[verify-deliverables] ...$'\n'${ISSUES}Review..."

Or just use echo -e / printf with a literal newline. Low-probability failure mode but worth closing.

Minor: ✅ emoji in `grep -E` requires UTF-8 locale

The completion-phrase regex includes a raw ✅ character. grep -E with multi-byte emoji depends on the runtime locale being UTF-8. On minimal CI environments or systems with LC_ALL=C, the emoji byte sequence may not match. Adding LC_ALL=C.UTF-8 before the grep calls (or testing ✅ separately with grep -P or printf '\xe2\x9c\x85') would make this portable. Alternatively, drop the emoji from the pattern if the prose patterns are sufficient — the test suite has no case covering ✅ detection anyway.

The design is sound

The 200-line transcript window (vs. the original 50), the stop_hook_active escape hatch, the -L symlink check, the trailing-boundary regex to prevent "fixed itself" false positives, and the jq -n wrapper around the final emit are all good choices that I'd have surfaced in review if they weren't already there. The hooks.json registration is syntactically valid and matches the existing convention.

- Read agent_transcript_path (subagent's own log) instead of transcript_path (parent session); add last_assistant_message shortcut - Fix n.?t → n't in negation guard (n.?t matched 'nt' inside 'count', 'front', 'environment', etc.) - Replace TMPDIR with TEST_DIR in test harness (TMPDIR is a reserved env var that mktemp reads internally) - Add NotebookEdit test case to complete mutation-type coverage - Switch ISSUES separator from literal \n to $'\n'; drop printf %b in MESSAGE to prevent backslash interpretation in file paths - Remove ✅ from completion pattern (locale portability; prose patterns are sufficient) - Payload now includes agent_transcript_path in test fixtures All 19 test cases pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

TechNickAI · 2026-05-04T22:50:57Z

Fixed all five issues in 29a6e0a:

TMPDIR shadowing: renamed to TEST_DIR throughout the test harness
n.?t: changed to n't and switched grep to double-quotes; also removed ✅ from the negation check since it was dropped from the positive pattern
NotebookEdit test: added fixture + run_case — 19 tests now, all passing
printf %b: ISSUES entries now use $'\n' separators; MESSAGE built with a literal newline so %b never touches file-path content
✅ locale: removed from the completion pattern; the prose patterns (done., finished., implemented, etc.) cover the same signal without the UTF-8 dependency

claude · 2026-05-04T22:52:54Z

Code Review — PR #54: Add `verify-deliverables` SubagentStop hook

Overview

This PR adds the repo's first SubagentStop lifecycle hook, shifting the "did the subagent actually do what it claimed?" check from prompt discipline to a structural harness enforcement point. The design is sound: advisory-only, fail-open, with a kill switch and escape hatches. Overall this is well-considered work.

✅ Strengths

Fail-open is consistently enforced — every error path exits 0: missing jq, empty input, unreadable transcript, jq OOM at emit. This is exactly right for a hook.
Test coverage is solid — 18 cases covering honest writes, MultiEdit, alt payload shapes, negated claims, trailing-boundary regressions, broken symlinks, long transcripts, relative paths, escape hatch, kill switch, and all fail-open paths.
The jq guard at script entry is a good addition (noted as a prior-review fix). Prevents a confusing failure mode on minimal environments.
200-line transcript window is a reasonable bound; the prior 50-line window would have missed the long-transcript case documented in the test.
README.md is well-written and serves as a good template for future hooks.
Version sync is correct across plugin.json and marketplace.json.

🟡 Issues Worth Addressing

1. `implemented` pattern has an unreachable trailing boundary (false-negative bug)

In verify-deliverables.sh, the completion regex:

(^|[^a-z])(done\.|...|implemented |...)([ ^a-zA-Z]|$)

The implemented alternative already ends with a literal space. The trailing boundary group ([^a-zA-Z]|$) then requires the next character after that space to be a non-letter or end of string. Since English sentences have implemented <noun> (where <noun> starts with a letter), this means almost no real "implemented X" sentence will trigger via this branch.

Example: "Implemented the changes in src/foo.ts." — implemented matches the characters, but then t (from the) fails [^a-zA-Z], so the whole pattern misses.

In practice this is masked because done. catches most liar messages, but the implemented branch is nearly dead code. Consider:

implemented[^a-zA-Z] (word boundary approach, no trailing space), or
Drop the trailing boundary from space-terminated alternatives and rely only on the leading boundary.

This is a false-negative (misses some liars), not a false-positive — consistent with the stated philosophy of "rather miss a few lies than nag honest responses" — but it's worth knowing the pattern doesn't work as written.

2. Same issue applies to several other space-terminated alternatives

created the , wrote the , added the , saved the all have the same structural problem: the trailing boundary ([^a-zA-Z]|$) applies after the space, so created the file.ts → after created the , the next char is f (letter) → no match. The only patterns that work correctly are the ones ending in . (like done., completed.) since ([^a-zA-Z]|$) on $ or non-letter after . works fine.

3. `CALLED_PATHS` here-doc iterates an empty string when paths contain only whitespace

if [ -n "$CALLED_PATHS" ]; then
    while IFS= read -r path; do
        [ -z "$path" ] && continue

The [ -z "$path" ] && continue guard handles this correctly. No change needed, but worth a comment that the empty-line guard is load-bearing.

🔵 Minor / Nits

Multiple echo "$RECENT" | jq calls (lines ~70, ~80, ~90) scan the same 200-line slice three times. Negligible at this scale, but a single jq pass extracting all three signals would be cleaner if the script ever grows.
Test runner uses bash "$HOOK" rather than executing it directly — fine, but if the shebang were ever changed or the script needed to test for interpreter errors, direct execution would catch more.
run_case in the test harness merges stdout and stderr (2>&1). If the hook ever writes intentional diagnostics to stderr, this would create spurious failures. Separating them would be more robust, but is acceptable for now.

Summary

Category	Finding
Logic	`implemented` , `created the` , `wrote the` , `added the` , `saved the` patterns are effectively dead due to trailing-boundary misapplication
Security	No concerns — existence-check only, fail-open
Performance	Fine — bounded transcript window, jq is lightweight here
Tests	Strong — 18 cases with good edge coverage
Conventions	Consistent with existing hooks, README follows project style

The must-fix is the regex dead-code issue (#1/#2 above), which is a correctness gap in the stated feature. Everything else is minor. The architectural approach and fail-open discipline are solid — this is a good addition to the hooks system.

🤖 Generated with Claude Code

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 29a6e0a. Configure here.}

cursor · 2026-05-04T22:59:30Z

+    # false negatives, so we only fire on strong signals. Trailing
+    # boundary keeps "fixed it" from matching inside "fixed itself"
+    # and "all tests pass" from matching inside "passenger".
+    if echo "$LAST_MSG" | grep -qiE '(^|[^a-z])(done\.|complete\.|completed\.|finished\.|implemented |fixed it|created the |wrote the |added the |saved the |all tests pass(ed)?)([^a-zA-Z]|$)'; then


Trailing boundary breaks space-ending completion patterns

Medium Severity

The trailing boundary ([^a-zA-Z]|$) is applied uniformly to all alternatives in the completion regex. For patterns ending with a space — implemented , created the , wrote the , added the , saved the — this boundary checks the character after the trailing space, which in natural English is almost always a letter (e.g., "Implemented the fix"). These five patterns therefore almost never match. The boundary was needed only for fixed it (to avoid matching fixed itself) but inadvertently silences half the completion vocabulary. The liar test case is masked because it uses "Done. Implemented…", which is caught by done\. instead.

^{Reviewed by Cursor Bugbot for commit 29a6e0a. Configure here.}

TechNickAI · 2026-05-05T11:15:34Z

Great catch on the trailing boundary issue (#1/#2). The space-ending patterns (implemented , created the , wrote the , added the , saved the ) are effectively dead code due to ([^a-zA-Z]|$) requiring a non-letter after the trailing space. Addressing in a follow-up PR.

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

cursor Bot reviewed May 4, 2026

View reviewed changes

Comment thread plugins/core/hooks/verify-deliverables.sh Outdated

TechNickAI mentioned this pull request May 4, 2026

Add verify-deliverables SubagentStop hook to harden subagent completion claims #51

Closed

cursor Bot reviewed May 4, 2026

View reviewed changes

TechNickAI merged commit 54700e9 into main May 4, 2026
9 checks passed

TechNickAI deleted the feat/verify-deliverables-hook branch May 4, 2026 23:00

TechNickAI mentioned this pull request May 5, 2026

🐛 Fix trailing boundary regex in verify-deliverables #58

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

✨ Add verify-deliverables SubagentStop hook#54

✨ Add verify-deliverables SubagentStop hook#54
TechNickAI merged 2 commits intomainfrom
feat/verify-deliverables-hook

TechNickAI commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 4, 2026

Uh oh!

TechNickAI May 4, 2026

Uh oh!

TechNickAI May 5, 2026

Uh oh!

claude Bot commented May 4, 2026

Uh oh!

Uh oh!

TechNickAI commented May 4, 2026

Uh oh!

claude Bot commented May 4, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 4, 2026

Uh oh!

Uh oh!

TechNickAI commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

TechNickAI commented May 4, 2026

Summary

What it checks

Design decisions

Files

Complexity

Validation

Out of scope (per issue)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

TechNickAI May 4, 2026

Choose a reason for hiding this comment

Uh oh!

TechNickAI May 5, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 4, 2026

Code Review

Bug: TMPDIR shadowing in test harness

Bug: n.?t in negation guard matches unintended substrings

Missing test: NotebookEdit as a mutation type

Nit: printf "%b" interprets escape sequences in file paths

Minor: ✅ emoji in grep -E requires UTF-8 locale

The design is sound

Uh oh!

Uh oh!

TechNickAI commented May 4, 2026

Uh oh!

claude Bot commented May 4, 2026

Code Review — PR #54: Add verify-deliverables SubagentStop hook

Overview

✅ Strengths

🟡 Issues Worth Addressing

1. implemented pattern has an unreachable trailing boundary (false-negative bug)

2. Same issue applies to several other space-terminated alternatives

3. CALLED_PATHS here-doc iterates an empty string when paths contain only whitespace

🔵 Minor / Nits

Summary

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 4, 2026

Choose a reason for hiding this comment

Trailing boundary breaks space-ending completion patterns

Uh oh!

Uh oh!

TechNickAI commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bug: `TMPDIR` shadowing in test harness

Bug: `n.?t` in negation guard matches unintended substrings

Missing test: `NotebookEdit` as a mutation type

Nit: `printf "%b"` interprets escape sequences in file paths

Minor: ✅ emoji in `grep -E` requires UTF-8 locale

Code Review — PR #54: Add `verify-deliverables` SubagentStop hook

1. `implemented` pattern has an unreachable trailing boundary (false-negative bug)

3. `CALLED_PATHS` here-doc iterates an empty string when paths contain only whitespace