feat: report a performed action with no confirmation as done(), not abort #535
Open
lmorchard wants to merge 1 commit into
The done/abort guidance treated a form submit that produced no error and no explicit success message as 'unverified', pushing the agent to abort. Refine it so a performed action with no error but no confirmation is reported via done() with a caveat, while abort remains for unverified data and blocked core steps.
Contributor
Pull request overview
This PR adjusts the core action-loop system prompt guidance so that when an agent submits a form and observes neither a success confirmation nor a validation error, it reports the outcome via done() with an explicit caveat rather than treating it as “unverified” and calling abort().
Changes:
- Updates the “Before calling done()” checklist to distinguish validation errors (treat as not submitted; retry) from “no confirmation and no error” (treat as submitted; done() with caveat).
- Narrows the “abort on uncertainty” rule to apply to unverified data or outright blocked core steps, not to performed actions lacking explicit confirmation.
- Adds a unit test asserting the updated guidance text is present in the built system prompt.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/core/src/prompts.ts | Refines the action-loop prompt guidance around form submission verification and when to use done() vs abort(). |
| packages/core/test/prompts.test.ts | Adds a regression test that asserts the updated guidance appears in the generated prompt. |
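A regression test of this kind can be sketched as a plain substring check on the built prompt. This is a minimal illustration, not the test added in the PR: `buildSystemPrompt`, `REQUIRED_FRAGMENTS`, and `missingGuidance` are hypothetical names, and the stand-in builder below only mimics whatever `prompts.ts` actually exports. The two fragments are quoted from the revised guidance text.

```typescript
// Fragments quoted from the revised guidance; a regression test only needs
// enough text to fail if the wording is reverted.
const REQUIRED_FRAGMENTS: readonly string[] = [
  "Seeing NEITHER (no confirmation, but no error) is a normal outcome on many sites",
  'is NOT "unverified" merely because the site displayed no explicit success message',
];

// Hypothetical stand-in for the real prompt builder in packages/core/src/prompts.ts.
// The real function's name and signature are assumptions here.
function buildSystemPrompt(task: string): string {
  const guidance = [
    "3. Verify actions actually completed by checking the most recent page state:",
    "- If you submitted a form, look for a success message OR a validation error. " +
      "Seeing NEITHER (no confirmation, but no error) is a normal outcome on many sites.",
    "6. An action you actually performed " +
      'is NOT "unverified" merely because the site displayed no explicit success message.',
  ].join("\n");
  return `${guidance}\n\nTask: ${task}`;
}

// Returns the required fragments that are absent from the prompt.
// An empty result means the guidance is present.
function missingGuidance(prompt: string, required: readonly string[]): string[] {
  return required.filter((fragment) => !prompt.includes(fragment));
}
```

In a real suite the same check would sit inside the project's test runner; asserting on short, distinctive fragments rather than the whole paragraph keeps the test from breaking on unrelated wording edits.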
  4. Data grounding: every value in your answer must appear in a page snapshot, a tool result, or the task input. Do NOT use general knowledge to fill gaps. If a value was not found during this session, say so explicitly rather than inventing it.
  5. Blockers vs. obstacles: if you hit an unrecoverable block (paywall, login wall, access denied, payment declined) that prevented completing a core requirement, call abort() with the reason. Temporary obstacles you handled (dismissed popups, retried errors) don't change the outcome.
- 6. If anything is unverified, incomplete, or uncertain — call abort() with the reason rather than done() with an overclaiming answer.
+ 6. If the information or data the task asks you to return is unverified, or a core step was blocked outright, call abort() with the reason rather than done() with an overclaiming answer. But an action you actually performed — e.g. a form submit that returned no error and showed no validation message — is NOT "unverified" merely because the site displayed no explicit success message; report that with done() and the caveat, don't abort.

  - Does your answer match the requested format?
  3. Verify actions actually completed by checking the most recent page state:
- - If you submitted a form, did the next page confirm success?
+ - If you submitted a form, look for a success message OR a validation error. A validation error means it did NOT submit — fix and retry. Seeing NEITHER (no confirmation, but no error) is a normal outcome on many sites: treat the submission as completed and report it with done(), stating explicitly that no confirmation was shown.
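Read together, the revised item 6 and the form-submit check amount to a small decision procedure. The sketch below spells it out in TypeScript purely as an illustration: the agent follows the prompt text, not code, and every name here (`Outcome`, `Verdict`, `decide`) is hypothetical rather than part of `prompts.ts`.

```typescript
// Hypothetical summary of what the agent observed at the end of a task.
type Outcome = {
  coreStepBlocked: boolean;        // paywall, login wall, access denied, ...
  returnedDataVerified: boolean;   // every returned value was seen in this session
  formSubmit?: {
    validationError: boolean;      // the page showed a validation error
    successMessage: boolean;       // the page showed an explicit confirmation
  };
};

type Verdict =
  | { tool: "abort"; reason: string }
  | { tool: "retry"; reason: string }
  | { tool: "done"; caveat?: string };

function decide(outcome: Outcome): Verdict {
  // abort() stays reserved for blocked core steps and unverified data.
  if (outcome.coreStepBlocked) {
    return { tool: "abort", reason: "a core step was blocked outright" };
  }
  if (!outcome.returnedDataVerified) {
    return { tool: "abort", reason: "requested data could not be verified" };
  }
  const submit = outcome.formSubmit;
  if (submit !== undefined) {
    // A validation error means the form did not submit: fix and retry.
    if (submit.validationError) {
      return { tool: "retry", reason: "validation error: form did not submit" };
    }
    // Neither confirmation nor error: report as done, with an explicit caveat.
    if (!submit.successMessage) {
      return { tool: "done", caveat: "submitted; no confirmation shown, but no error either" };
    }
  }
  return { tool: "done" };
}
```

The ordering is a design choice of this sketch: the two abort conditions are checked first so that a missing confirmation alone can never reach `abort`.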
Summary
Refines the done()/abort() guidance in the action-loop prompt. Today, a form submit that returns no error but shows no explicit confirmation is treated as "unverified," pushing the agent to abort(). That's miscalibrated for action tasks: the honest outcome is "submitted; no confirmation shown, but no error either." This change lets the agent report such cases with done() (caveated), while keeping abort() for unverified data and blocked core steps.
Two scoped edits to the "Before calling done()" block in prompts.ts:
- [...] done() stating no confirmation was shown).
Why separate from #534
This is a global agent-behavior change (affects every task), distinct from the upload-files feature. Reviewers should weigh it on its own.
Validation
- [...] done() (previously aborted reliably). Reasoning shows the agent stating it submitted with no confirmation but no error.
- [...] abort(), zero fabrication.
- pnpm --filter pilo-core tests green.
Note for reviewers
Rigorous research-task regression measurement belongs in a main-vs-branch comparison eval (the local probes here are a sound sanity check, not a full sweep). Worth running before merge.
🤖 Generated with Claude Code