✨ Add verify-deliverables SubagentStop hook#54
Conversation
Cross-references a subagent's recent activity against its completion claims. Fires advisory output via hookSpecificOutput.additionalContext when the tool-call record or filesystem disagrees with what the subagent said it did. Two checks: - strong completion claim + zero Write/Edit/MultiEdit/NotebookEdit calls + a file-path-shaped token in the message → flag - a Write/Edit-style tool call's file_path is missing on disk → flag Always advisory, never blocks. Fail-open in every error path so a hook bug cannot break the harness. Honors stop_hook_active to avoid loops and a VERIFY_DELIVERABLES_DISABLED=1 env kill switch. Test harness covers 18 cases including honest writes, MultiEdit, alt payload shape, negated phrasings, trailing-boundary regressions, broken symlinks, long transcripts, relative paths, the escape hatch, and several fail-open paths. Bumps plugin version to 9.21.0 across marketplace.json and plugin.json. Fixes #51
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a4687901b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| INPUT=$(cat) | ||
| [ -z "$INPUT" ] && exit 0 | ||
|
|
||
| TRANSCRIPT_PATH=$(echo "$INPUT" | jq -r '.transcript_path // empty' 2>/dev/null) |
There was a problem hiding this comment.
Read subagent transcript instead of parent transcript
This hook is registered on SubagentStop, but it reads .transcript_path, which for this event is the main session transcript rather than the subagent’s own log; the subagent tool calls and final message live under .agent_transcript_path. As written, the checks can miss real subagent write/edit activity (false “no file changes”) and fail to verify the files the subagent actually touched, so the deliverable validation is systematically unreliable for the event it targets.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Critical catch — thank you. The docs confirm SubagentStop provides both transcript_path (main session) and agent_transcript_path (subagent's own log). Fixed in 29a6e0a: the hook now reads agent_transcript_path first and falls back to transcript_path for older harness versions or test fixtures. Also wired in last_assistant_message from the payload as a direct shortcut for the final message check, avoiding JSONL parsing when the harness provides it.
There was a problem hiding this comment.
Incorrect — line 30 already reads agent_transcript_path first and falls back to transcript_path only when the subagent field is absent. The code handles this correctly.
Code ReviewSummary: This is a well-engineered PR. The fail-open design is correct, the test coverage is solid (18 cases), and the advisory-only output model is the right call. A few issues worth addressing before merge. Bug:
|
- Read agent_transcript_path (subagent's own log) instead of transcript_path (parent session); add last_assistant_message shortcut - Fix n.?t → n't in negation guard (n.?t matched 'nt' inside 'count', 'front', 'environment', etc.) - Replace TMPDIR with TEST_DIR in test harness (TMPDIR is a reserved env var that mktemp reads internally) - Add NotebookEdit test case to complete mutation-type coverage - Switch ISSUES separator from literal \n to $'\n'; drop printf %b in MESSAGE to prevent backslash interpretation in file paths - Remove ✅ from completion pattern (locale portability; prose patterns are sufficient) - Payload now includes agent_transcript_path in test fixtures All 19 test cases pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Fixed all five issues in 29a6e0a:
|
Code Review — PR #54: Add
|
| Category | Finding |
|---|---|
| Logic | implemented , created the , wrote the , added the , saved the patterns are effectively dead due to trailing-boundary misapplication |
| Security | No concerns — existence-check only, fail-open |
| Performance | Fine — bounded transcript window, jq is lightweight here |
| Tests | Strong — 18 cases with good edge coverage |
| Conventions | Consistent with existing hooks, README follows project style |
The must-fix is the regex dead-code issue (#1/#2 above), which is a correctness gap in the stated feature. Everything else is minor. The architectural approach and fail-open discipline are solid — this is a good addition to the hooks system.
🤖 Generated with Claude Code
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 29a6e0a. Configure here.
| # false negatives, so we only fire on strong signals. Trailing | ||
| # boundary keeps "fixed it" from matching inside "fixed itself" | ||
| # and "all tests pass" from matching inside "passenger". | ||
| if echo "$LAST_MSG" | grep -qiE '(^|[^a-z])(done\.|complete\.|completed\.|finished\.|implemented |fixed it|created the |wrote the |added the |saved the |all tests pass(ed)?)([^a-zA-Z]|$)'; then |
There was a problem hiding this comment.
Trailing boundary breaks space-ending completion patterns
Medium Severity
The trailing boundary ([^a-zA-Z]|$) is applied uniformly to all alternatives in the completion regex. For patterns ending with a space — implemented , created the , wrote the , added the , saved the — this boundary checks the character after the trailing space, which in natural English is almost always a letter (e.g., "Implemented the fix"). These five patterns therefore almost never match. The boundary was needed only for fixed it (to avoid matching fixed itself) but inadvertently silences half the completion vocabulary. The liar test case is masked because it uses "Done. Implemented…", which is caught by done\. instead.
Reviewed by Cursor Bugbot for commit 29a6e0a. Configure here.


Summary
Adds a
SubagentStoplifecycle hook that cross-references a subagent's recent activity against its completion claims. When the tool-call record or filesystem disagree with what the subagent said it did, the hook surfaces a structured advisory back to the parent agent viahookSpecificOutput.additionalContext— never blocking.This shifts the "did the subagent actually do what it claimed?" check from prompt discipline (where it lives today across
<epistemic-honesty>, theverify-fixskill, and theverification-before-completionsuperpower) to a structural lifecycle hook the harness enforces.Fixes #51
What it checks
Check A — completion claim with zero mutations. If the subagent's last message contains strong completion language (
done.,completed.,✅,implemented,fixed it,all tests pass, etc.) plus a file-path-shaped token, but the recent transcript shows zeroWrite/Edit/MultiEdit/NotebookEditcalls — flag.Check B — recorded tool call but file missing. For each path in a Write/Edit-style tool call's
file_path, verify it exists on disk now. A missing file means the call failed silently or the artifact was lost.Design decisions
session-multiplexing.shandtodo-persist.shstyle. Easier to maintain, no runtime deps beyond what other hooks already require.additionalContext; the parent agent decides what to do. A misbehaving hook that blocks would be worse than one that misses a lie.done\.,completed\.,fixed it,all tests pass(ed)?are anchored with leading and trailing non-letter boundaries. A negation guard rejects "not all tests pass". Tests cover the "fixed itself" / negation regressions.deliverables.jsonconfig per stage; this approach uses the actual transcript so it works on any subagent without per-skill configuration.jq, unreadable transcript, malformed JSON, jq OOM at the final emit step — all exit 0 silently.stop_hook_active=trueto prevent loops;VERIFY_DELIVERABLES_DISABLED=1env var disables entirely.Files
plugins/core/hooks/verify-deliverables.sh— the hookplugins/core/hooks/tests/test-verify-deliverables.sh— 18-case test harnessplugins/core/hooks/hooks.json— registers the hook againstSubagentStopplugins/core/hooks/README.md— documents the hooks directory + this hook's designplugins/core/.claude-plugin/plugin.json— version 9.20.0 → 9.21.0.claude-plugin/marketplace.json— version 9.20.0 → 9.21.0 (both fields)Complexity
balanced — multi-file, new pattern in this repo (first SubagentStop hook), but well-bounded. Multi-review surfaced and fixed real issues before push.
Validation
jq . hooks.jsonvalidates..claude-plugin/CLAUDE.md.command -v jqguard at script entryjq -nso a jq failure can't leak non-zero exit-Ltest)Out of scope (per issue)
verify-fix's job. This hook only checks existence.