Skip to content

docs(ai-guard): add system prompt steering guidance to reduce false positives#37375

Open
emmlejeail wants to merge 1 commit into
masterfrom
emmanuelle.lejeail/add-ai-guard-steering
Open

docs(ai-guard): add system prompt steering guidance to reduce false positives#37375
emmlejeail wants to merge 1 commit into
masterfrom
emmanuelle.lejeail/add-ai-guard-steering

Conversation

@emmlejeail

Copy link
Copy Markdown
Contributor

What does this PR do? What is the motivation?

Adds a new subsection Steer AI Guard with system prompt context to the AI Guard setup page, under the "Configure AI Guard policies" section.

Customers can reduce false positives by adding context to their system prompt (which AI Guard receives as part of the conversation evaluation). This section explains what to include—agent purpose, authorized data, and authorized tools—and provides a before/after example using a financial analyst agent scenario.

Merge instructions

Merge readiness:

  • Ready for merge

Additional notes

New section is anchored at #system-prompt-context for direct linking.

…ositives

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Preview links (active after the build_preview check completes)

Modified Files

@emmlejeail emmlejeail marked this pull request as ready for review June 9, 2026 15:30
@emmlejeail emmlejeail requested a review from a team as a code owner June 9, 2026 15:30
@rtrieu

rtrieu commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

/review

@rtrieu rtrieu self-requested a review June 9, 2026 18:55

@rtrieu rtrieu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor suggestions for your review


### Steer AI Guard with system prompt context {#system-prompt-context}

AI Guard evaluates the full conversation, including your system prompt, when assessing threats. Adding context about your agent's purpose, the data it handles, and the tools it is authorized to use helps AI Guard distinguish legitimate operations from genuine threats—reducing false positives without weakening security coverage.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AI Guard evaluates the full conversation, including your system prompt, when assessing threats. Adding context about your agent's purpose, the data it handles, and the tools it is authorized to use helps AI Guard distinguish legitimate operations from genuine threats—reducing false positives without weakening security coverage.
AI Guard evaluates the full conversation, including your system prompt, when assessing threats. Adding context about your agent's purpose, the data it handles, and the tools it is authorized to use helps AI Guard distinguish legitimate operations from genuine threats—reducing false positives without reducing security coverage.

Do not access external systems or process requests unrelated to financial reporting.
```

With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as data exfiltration or destructive tool calls.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as data exfiltration or destructive tool calls.
With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as [data-exfiltration][signals] or [destructive-tool-call][signals].

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Automated review by Claude. AI-generated; verify before acting.

Overall the new section reads well and fits the page. A couple of minor wording/clarity suggestions inline.

Reviewed 77d7974a80f0c6d978157a22490b90d5f0e96750workflow run


#### Example

A system prompt with minimal context is more likely to produce false positives for legitimate operations:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: "produce false positives" reads awkwardly here — a system prompt doesn't produce the false positives, AI Guard does. Consider rephrasing.

Suggested change
A system prompt with minimal context is more likely to produce false positives for legitimate operations:
A system prompt with minimal context is more likely to result in false positives for legitimate operations:


A system prompt with minimal context is more likely to produce false positives for legitimate operations:

```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Add a language hint (text) to the fenced code block so it renders consistently and isn't picked up by syntax highlighters. Same applies to the code block at line 140.

Suggested change
```
```text

- A **lower** value **increases** sensitivity: AI Guard flags threats even when the confidence is low, surfacing more potential attacks but also more false positives.
- A **higher** value **decreases** sensitivity: AI Guard only flags threats when the confidence is high, reducing noise but potentially missing some attacks.

### Steer AI Guard with system prompt context {#system-prompt-context}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: "Steer" is unusual phrasing for a docs heading. Consider a more direct, scannable verb that matches the imperative style used elsewhere on the page.

Suggested change
### Steer AI Guard with system prompt context {#system-prompt-context}
### Add context with your system prompt {#system-prompt-context}

Do not access external systems or process requests unrelated to financial reporting.
```

With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as data exfiltration or destructive tool calls.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: "AI Guard understands" anthropomorphizes the product. Consider a more neutral phrasing such as "AI Guard treats SQL queries and file exports as expected, authorized operations, and is less likely to flag them as data exfiltration or destructive tool calls."

Suggested change
With this context, AI Guard understands that SQL queries and file exports are expected, authorized operationsand is less likely to flag them as data exfiltration or destructive tool calls.
With this context, AI Guard treats SQL queries and file exports as expected, authorized operations, and is less likely to flag them as data exfiltration or destructive tool calls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants