docs(ai-guard): add system prompt steering guidance to reduce false positives #37375
emmlejeail wants to merge 1 commit into
…ositives Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
/review
rtrieu left a comment
some minor suggestions for your review

`### Steer AI Guard with system prompt context {#system-prompt-context}`

`AI Guard evaluates the full conversation, including your system prompt, when assessing threats. Adding context about your agent's purpose, the data it handles, and the tools it is authorized to use helps AI Guard distinguish legitimate operations from genuine threats—reducing false positives without weakening security coverage.`

Suggested change:

`AI Guard evaluates the full conversation, including your system prompt, when assessing threats. Adding context about your agent's purpose, the data it handles, and the tools it is authorized to use helps AI Guard distinguish legitimate operations from genuine threats—reducing false positives without reducing security coverage.`
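
The paragraph above names three things worth stating in a system prompt: the agent's purpose, the data it handles, and the tools it is authorized to use. A minimal Python sketch of assembling such a prompt follows. It is not an AI Guard API: AI Guard evaluates the system prompt as part of the conversation, so how you send the prompt depends on your own model client. The `AgentContext` class, the `build_system_prompt` helper, and all prompt wording except the final restriction line (taken from the PR's financial reporting example) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Hypothetical container for the context the section recommends stating."""

    purpose: str
    authorized_data: list[str] = field(default_factory=list)
    authorized_tools: list[str] = field(default_factory=list)
    restrictions: list[str] = field(default_factory=list)


def build_system_prompt(ctx: AgentContext) -> str:
    """Render the agent context as plain-text system prompt lines."""
    lines = [ctx.purpose]
    if ctx.authorized_data:
        lines.append("You may access: " + ", ".join(ctx.authorized_data) + ".")
    if ctx.authorized_tools:
        lines.append("You are authorized to use: " + ", ".join(ctx.authorized_tools) + ".")
    # Out-of-scope statements go last, mirroring the closing line of the PR's example.
    lines.extend(ctx.restrictions)
    return "\n".join(lines)


# Hypothetical financial analyst agent; only the restriction line comes from the PR's example.
prompt = build_system_prompt(
    AgentContext(
        purpose="You are a financial analyst assistant that prepares internal reports.",
        authorized_data=["the company's financial database"],
        authorized_tools=[
            "SQL queries against the financial database",
            "file exports of report results",
        ],
        restrictions=[
            "Do not access external systems or process requests unrelated to financial reporting."
        ],
    )
)
print(prompt)
```

Keeping purpose, data, tools, and restrictions as separate fields makes it harder to forget one of them when the prompt is edited later; the rendered text is what your model client would send as the system message.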

`Do not access external systems or process requests unrelated to financial reporting.`

`` ``` ``

`With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as data exfiltration or destructive tool calls.`

Suggested change:

`With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as [data-exfiltration][signals] or [destructive-tool-call][signals].`

🤖 Automated review by Claude. AI-generated; verify before acting.
Overall the new section reads well and fits the page. A couple of minor wording/clarity suggestions inline.
Reviewed 77d7974a80f0c6d978157a22490b90d5f0e96750 — workflow run

`#### Example`

`A system prompt with minimal context is more likely to produce false positives for legitimate operations:`

Style: "produce false positives" reads awkwardly here — a system prompt doesn't produce the false positives, AI Guard does. Consider rephrasing.

Suggested change:

`A system prompt with minimal context is more likely to result in false positives for legitimate operations:`

`A system prompt with minimal context is more likely to produce false positives for legitimate operations:`

`` ``` ``

Suggestion: Add a language hint (text) to the fenced code block so it renders consistently and isn't picked up by syntax highlighters. Same applies to the code block at line 140.

Suggested change:

`` ```text ``

`- A **lower** value **increases** sensitivity: AI Guard flags threats even when the confidence is low, surfacing more potential attacks but also more false positives.`

`- A **higher** value **decreases** sensitivity: AI Guard only flags threats when the confidence is high, reducing noise but potentially missing some attacks.`

`### Steer AI Guard with system prompt context {#system-prompt-context}`

Suggestion: "Steer" is unusual phrasing for a docs heading. Consider a more direct, scannable verb that matches the imperative style used elsewhere on the page.

Suggested change:

`### Add context with your system prompt {#system-prompt-context}`
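
The sensitivity bullets quoted in this thread describe a threshold trade-off. The Python sketch below models it generically, assuming a detector that assigns each evaluated message a threat confidence score and flags the message when the score reaches the threshold. The `is_flagged` function, the `>=` comparison, and the sample scores are hypothetical; they do not describe how AI Guard computes or compares confidence internally.

```python
def is_flagged(confidence: float, threshold: float) -> bool:
    """Hypothetical rule: flag a message when its threat confidence reaches the threshold."""
    return confidence >= threshold


# Hypothetical confidence scores for five evaluated messages.
scores = [0.15, 0.40, 0.55, 0.80, 0.95]

for threshold in (0.3, 0.7):
    flagged = [s for s in scores if is_flagged(s, threshold)]
    # A lower threshold flags more messages; a higher one flags fewer.
    print(f"threshold={threshold}: flagged {len(flagged)} of {len(scores)}")
```

With these sample scores, the loop reports 4 of 5 messages flagged at 0.3 and 2 of 5 at 0.7: the lower value surfaces more potential attacks (and more false positives), while the higher value cuts noise at the risk of missing some attacks.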

`Do not access external systems or process requests unrelated to financial reporting.`

`` ``` ``

`With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as data exfiltration or destructive tool calls.`

Style: "AI Guard understands" anthropomorphizes the product. Consider a more neutral phrasing such as "AI Guard treats SQL queries and file exports as expected, authorized operations, and is less likely to flag them as data exfiltration or destructive tool calls."

Suggested change:

`With this context, AI Guard treats SQL queries and file exports as expected, authorized operations, and is less likely to flag them as data exfiltration or destructive tool calls.`

What does this PR do? What is the motivation?
Adds a new subsection, "Steer AI Guard with system prompt context", to the AI Guard setup page, under the "Configure AI Guard policies" section.
Customers can reduce false positives by adding context to their system prompt (which AI Guard receives as part of the conversation evaluation). This section explains what to include—agent purpose, authorized data, and authorized tools—and provides a before/after example using a financial analyst agent scenario.
Merge instructions
Merge readiness:
Additional notes
New section is anchored at `#system-prompt-context` for direct linking.