
feat: report a performed action with no confirmation as done(), not abort#535

Open
lmorchard wants to merge 1 commit into main from feat/done-abort-no-confirmation

@lmorchard

Collaborator

Summary

Refines the done()/abort() guidance in the action-loop prompt. Today, a form submit that returns no error but shows no explicit confirmation is treated as "unverified," pushing the agent to abort(). That's miscalibrated for action tasks: the honest outcome is "submitted; no confirmation shown, but no error either." This change lets the agent report such cases with done() (caveated), while keeping abort() for unverified data and blocked core steps.

Two scoped edits to the "Before calling done()" block in prompts.ts:

  • Form-submit check now distinguishes a validation error (did NOT submit → fix/retry) from "neither confirmation nor error" (normal on many sites → done() stating no confirmation was shown).
  • The abort-on-uncertain rule is narrowed to information/data you must return, or a blocked core step — explicitly not an action you performed that produced no error.
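The two edits can be read as a small decision procedure. The sketch below is only a model of the guidance, not code from prompts.ts (the actual change is prompt text that the agent interprets). All type and function names here are hypothetical, and the precedence of a validation error over a success message is an assumption.

```typescript
// Hypothetical model of the refined guidance; none of these names exist in prompts.ts.
type SubmitObservation = {
  successMessageShown: boolean;
  validationErrorShown: boolean;
};

type Decision =
  | { kind: "retry"; reason: string }
  | { kind: "done"; caveat: string | null };

// Form-submit check: distinguish a validation error from "neither confirmation nor error".
function decideAfterSubmit(obs: SubmitObservation): Decision {
  // A validation error means the form did NOT submit: fix the input and retry.
  // Assumption: an error wins if the page somehow shows both signals.
  if (obs.validationErrorShown) {
    return { kind: "retry", reason: "validation error: form did not submit" };
  }
  // Explicit confirmation: report success without a caveat.
  if (obs.successMessageShown) {
    return { kind: "done", caveat: null };
  }
  // Neither signal: treat as submitted, and state that no confirmation was shown.
  return {
    kind: "done",
    caveat: "Submitted; no confirmation was shown, but no error either.",
  };
}

type TaskState = {
  // For a pure action task with nothing to return, this is treated as true.
  returnedDataVerified: boolean;
  coreStepBlocked: boolean;
};

// Narrowed abort rule: only unverified returned data or a blocked core step.
// A performed action that merely lacks a success message does not appear here.
function shouldAbort(state: TaskState): boolean {
  return !state.returnedDataVerified || state.coreStepBlocked;
}
```

Under this reading, a submit with no confirmation and no error never reaches `shouldAbort` on its own; it yields `done` with a caveat, while a missing datum or a login wall still leads to abort.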

Why separate from #534

This is a global agent-behavior change (affects every task), distinct from the upload-files feature. Reviewers should weigh it on its own.

Validation

  • Target (action tasks): browser-use ember-form upload task, firewall on — 5 of 6 runs now pass via honest, caveated done() (previously aborted reliably). Reasoning shows the agent stating it submitted with no confirmation but no error.
  • Guardrail intact (data honesty): "report a datum that isn't on the page" probe — 3 of 3 correctly abort(), zero fabrication.
  • Unit test asserts the new guidance is present; pnpm --filter pilo-core tests green.
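For readers who want a picture of the "guidance is present" check, here is a minimal sketch. It does not reproduce the test in packages/core/test/prompts.test.ts: `buildActionLoopPrompt` is a hypothetical stand-in that returns an abbreviated excerpt of the new wording, and `missingGuidance` is an illustrative helper, not part of the repository.

```typescript
// Hypothetical stand-in for the real prompt builder; it returns only an
// abbreviated excerpt of the new form-submit guidance quoted in this PR.
function buildActionLoopPrompt(): string {
  return [
    "3. Verify actions actually completed by checking the most recent page state:",
    "- If you submitted a form, look for a success message OR a validation error.",
    "Seeing NEITHER (no confirmation, but no error) is a normal outcome on many sites: " +
      "treat the submission as completed and report it with done(), " +
      "stating explicitly that no confirmation was shown.",
  ].join("\n");
}

// The previous wording of the form-submit check, as quoted in the diff.
const OLD_FORM_CHECK = "- If you submitted a form, did the next page confirm success?";

// Phrases the regression check expects to find in the built prompt.
const REQUIRED_GUIDANCE: string[] = [
  "look for a success message OR a validation error",
  "stating explicitly that no confirmation was shown",
];

// Returns the required phrases that are absent from the prompt (empty means pass).
function missingGuidance(prompt: string, phrases: string[]): string[] {
  return phrases.filter((phrase) => !prompt.includes(phrase));
}

const missing = missingGuidance(buildActionLoopPrompt(), REQUIRED_GUIDANCE);
if (missing.length > 0) {
  throw new Error("Prompt is missing guidance: " + missing.join("; "));
}
```

A substring check like this only guards against the wording being dropped or reverted; it says nothing about how the agent behaves, which is what the probe runs above are for.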

Note for reviewers

The local probes here are a sound sanity check, not a full sweep. A rigorous measurement of research-task regressions needs a main-vs-branch comparison eval, which is worth running before merge.

🤖 Generated with Claude Code

The done/abort guidance treated a form submit that produced no error and no
explicit success message as 'unverified', pushing the agent to abort. Refine it
so a performed action with no error but no confirmation is reported via done()
with a caveat, while abort remains for unverified data and blocked core steps.
Copilot AI review requested due to automatic review settings June 10, 2026 00:47

Copilot AI left a comment

Contributor


Pull request overview

This PR adjusts the core action-loop system prompt guidance so that when an agent submits a form and observes neither a success confirmation nor a validation error, it reports the outcome via done() with an explicit caveat rather than treating it as “unverified” and calling abort().

Changes:

  • Updates the “Before calling done()” checklist to distinguish validation errors (treat as not submitted; retry) from “no confirmation and no error” (treat as submitted; done() with caveat).
  • Narrows the “abort on uncertainty” rule to apply to unverified data or outright blocked core steps, not to performed actions lacking explicit confirmation.
  • Adds a unit test asserting the updated guidance text is present in the built system prompt.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/core/src/prompts.ts | Refines the action-loop prompt guidance around form submission verification and when to use done() vs abort(). |
| packages/core/test/prompts.test.ts | Adds a regression test that asserts the updated guidance appears in the generated prompt. |


4. Data grounding: every value in your answer must appear in a page snapshot, a tool result, or the task input. Do NOT use general knowledge to fill gaps. If a value was not found during this session, say so explicitly rather than inventing it.
5. Blockers vs. obstacles: if you hit an unrecoverable block (paywall, login wall, access denied, payment declined) that prevented completing a core requirement, call abort() with the reason. Temporary obstacles you handled (dismissed popups, retried errors) don't change the outcome.
6. If anything is unverified, incomplete, or uncertain — call abort() with the reason rather than done() with an overclaiming answer.
6. If the information or data the task asks you to return is unverified, or a core step was blocked outright, call abort() with the reason rather than done() with an overclaiming answer. But an action you actually performed — e.g. a form submit that returned no error and showed no validation message — is NOT "unverified" merely because the site displayed no explicit success message; report that with done() and the caveat, don't abort.
- Does your answer match the requested format?
3. Verify actions actually completed by checking the most recent page state:
- If you submitted a form, did the next page confirm success?
- If you submitted a form, look for a success message OR a validation error. A validation error means it did NOT submit — fix and retry. Seeing NEITHER (no confirmation, but no error) is a normal outcome on many sites: treat the submission as completed and report it with done(), stating explicitly that no confirmation was shown.

2 participants