docs(ai-guard): add system prompt steering guidance to reduce false positives by emmlejeail · Pull Request #37375 · DataDog/documentation

emmlejeail · 2026-06-09T14:57:02Z

What does this PR do? What is the motivation?

Adds a new subsection, "Steer AI Guard with system prompt context", to the AI Guard setup page, under the "Configure AI Guard policies" section.

Customers can reduce false positives by adding context to their system prompt (which AI Guard receives as part of the conversation evaluation). This section explains what to include—agent purpose, authorized data, and authorized tools—and provides a before/after example using a financial analyst agent scenario.
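The following is a minimal Python sketch of how an agent might assemble a system prompt that carries this context: purpose, authorized data, and authorized tools. It assumes the agent builds its prompt in code. `AgentContext`, `build_system_prompt`, and the example wording are hypothetical illustrations, not part of AI Guard or any Datadog SDK. The financial analyst details loosely follow this PR's scenario; the section's exact before/after prompts are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Hypothetical container for the context this PR recommends adding to a system prompt."""
    purpose: str
    authorized_data: list[str] = field(default_factory=list)
    authorized_tools: list[str] = field(default_factory=list)
    restrictions: list[str] = field(default_factory=list)


def build_system_prompt(ctx: AgentContext) -> str:
    """Render the context as plain text, one statement per line.

    Per this PR, AI Guard receives the system prompt as part of the
    conversation it evaluates, so explicit statements about expected
    operations give it more to go on.
    """
    lines = [ctx.purpose]
    if ctx.authorized_data:
        lines.append("Authorized data: " + ", ".join(ctx.authorized_data) + ".")
    if ctx.authorized_tools:
        lines.append("Authorized tools: " + ", ".join(ctx.authorized_tools) + ".")
    # Explicit limits keep the authorized scope narrow.
    lines.extend(ctx.restrictions)
    return "\n".join(lines)


# Hypothetical financial analyst agent.
prompt = build_system_prompt(
    AgentContext(
        purpose="You are a financial analyst assistant that produces internal financial reports.",
        authorized_data=["quarterly revenue tables", "expense records"],
        authorized_tools=["run read-only SQL queries", "export reports to CSV files"],
        restrictions=[
            "Do not access external systems or process requests unrelated to financial reporting."
        ],
    )
)
print(prompt)
```

Keeping each statement on its own line makes the authorized operations easy to read and review, both for people maintaining the prompt and for anything evaluating the conversation.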

Merge instructions

Merge readiness:

Ready for merge

Additional notes

New section is anchored at #system-prompt-context for direct linking.


github-actions · 2026-06-09T15:01:46Z

Preview links (active after the `build_preview` check completes)

Modified Files

https://docs-staging.datadoghq.com/emmanuelle.lejeail/add-ai-guard-steering/security/ai_guard/setup/

rtrieu · 2026-06-09T18:55:29Z

/review

rtrieu

some minor suggestions for your review

rtrieu · 2026-06-09T18:57:13Z


+### Steer AI Guard with system prompt context {#system-prompt-context}
+
+AI Guard evaluates the full conversation, including your system prompt, when assessing threats. Adding context about your agent's purpose, the data it handles, and the tools it is authorized to use helps AI Guard distinguish legitimate operations from genuine threats—reducing false positives without weakening security coverage.


Suggested change

AI Guard evaluates the full conversation, including your system prompt, when assessing threats. Adding context about your agent's purpose, the data it handles, and the tools it is authorized to use helps AI Guard distinguish legitimate operations from genuine threats—reducing false positives without weakening security coverage.

AI Guard evaluates the full conversation, including your system prompt, when assessing threats. Adding context about your agent's purpose, the data it handles, and the tools it is authorized to use helps AI Guard distinguish legitimate operations from genuine threats—reducing false positives without reducing security coverage.

rtrieu · 2026-06-09T18:58:28Z

+Do not access external systems or process requests unrelated to financial reporting.
+```
+
+With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as data exfiltration or destructive tool calls.


Suggested change

With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as data exfiltration or destructive tool calls.

With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as [data-exfiltration][signals] or [destructive-tool-call][signals].

github-actions

🤖 Automated review by Claude. AI-generated; verify before acting.

Overall the new section reads well and fits the page. A couple of minor wording/clarity suggestions inline.

_Reviewed 77d7974a80f0c6d978157a22490b90d5f0e96750 (workflow run)_

github-actions · 2026-06-09T18:58:50Z

+
+#### Example
+
+A system prompt with minimal context is more likely to produce false positives for legitimate operations:


Style: "produce false positives" reads awkwardly here — a system prompt doesn't produce the false positives, AI Guard does. Consider rephrasing.

Suggested change

A system prompt with minimal context is more likely to produce false positives for legitimate operations:

A system prompt with minimal context is more likely to result in false positives for legitimate operations:

github-actions · 2026-06-09T18:58:50Z

+
+A system prompt with minimal context is more likely to produce false positives for legitimate operations:
+
+```


Suggestion: Add a language hint (text) to the fenced code block so it renders consistently and isn't picked up by syntax highlighters. Same applies to the code block at line 140.

Suggested change

```

```text

github-actions · 2026-06-09T18:58:50Z

 - A **lower** value **increases** sensitivity: AI Guard flags threats even when the confidence is low, surfacing more potential attacks but also more false positives.
 - A **higher** value **decreases** sensitivity: AI Guard only flags threats when the confidence is high, reducing noise but potentially missing some attacks.

+### Steer AI Guard with system prompt context {#system-prompt-context}


Suggestion: "Steer" is unusual phrasing for a docs heading. Consider a more direct, scannable verb that matches the imperative style used elsewhere on the page.

Suggested change

### Steer AI Guard with system prompt context {#system-prompt-context}

### Add context with your system prompt {#system-prompt-context}

github-actions · 2026-06-09T18:58:50Z

+Do not access external systems or process requests unrelated to financial reporting.
+```
+
+With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as data exfiltration or destructive tool calls.


Style: "AI Guard understands" anthropomorphizes the product. Consider a more neutral phrasing such as "AI Guard treats SQL queries and file exports as expected, authorized operations, and is less likely to flag them as data exfiltration or destructive tool calls."

Suggested change

With this context, AI Guard understands that SQL queries and file exports are expected, authorized operations—and is less likely to flag them as data exfiltration or destructive tool calls.

With this context, AI Guard treats SQL queries and file exports as expected, authorized operations, and is less likely to flag them as data exfiltration or destructive tool calls.

docs(ai-guard): add system prompt steering guidance to reduce false positives

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

77d7974

emmlejeail marked this pull request as ready for review June 9, 2026 15:30

emmlejeail requested a review from a team as a code owner June 9, 2026 15:30

rtrieu self-requested a review June 9, 2026 18:55

rtrieu requested changes Jun 9, 2026


github-actions Bot reviewed Jun 9, 2026


emmlejeail wants to merge 1 commit into master from emmanuelle.lejeail/add-ai-guard-steering
