FEAT: Adding Scenario run to the REST API #1696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Changes from all commits
Changes to `AttackSummary` (hunk `@@ -95,18 +95,24 @@`). The `attack_type` field changes from required (`Field(...)`) to defaulting to an empty string:

```python
class AttackSummary(BaseModel):
    attack_result_id: str = Field(..., description="Database-assigned unique ID for this AttackResult")
    conversation_id: str = Field(..., description="Primary conversation of this attack result")
    # before: attack_type: str = Field(..., description="Attack class name (...)")  — required, no default
    attack_type: str = Field("", description="Attack class name (e.g., 'CrescendoAttack', 'ManualAttack')")
```
Contributor (on the `attack_type` default change): Why the change?
```python
    # (AttackSummary, continued)
    attack_specific_params: Optional[dict[str, Any]] = Field(None, description="Additional attack-specific parameters")
    target: Optional[TargetInfo] = Field(None, description="Target information from the stored identifier")
    converters: list[str] = Field(
        default_factory=list, description="Request converter class names applied in this attack"
    )
    objective: str = Field("", description="Natural-language description of the attacker's objective")
    outcome: Optional[Literal["undetermined", "success", "failure"]] = Field(
        None, description="Attack outcome (null if not yet determined)"
    )
    outcome_reason: str | None = Field(None, description="Reason for the outcome")
    last_response: str | None = Field(None, description="Model response from the final turn")
    last_message_preview: Optional[str] = Field(
        None, description="Preview of the last message (truncated to ~100 chars)"
    )
    score_value: str | None = Field(None, description="Score value from the objective scorer")
    executed_turns: int = Field(0, ge=0, description="Number of turns executed")
```
Contributor (on `executed_turns`): nit: `num_executed_turns`
```python
    # (AttackSummary, continued)
    execution_time_ms: int = Field(0, ge=0, description="Execution time in milliseconds")
    message_count: int = Field(0, description="Total number of messages in the attack")
    related_conversation_ids: list[str] = Field(
        default_factory=list, description="IDs of related conversations within this attack"
    )
```
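The `last_message_preview` field is documented as truncated to ~100 characters. As a hedged sketch (not code from this PR — the helper name `make_preview` is made up), the backend could build such a preview like this:

```python
def make_preview(message: str, limit: int = 100) -> str:
    """Hypothetical helper: truncate a message to a roughly limit-char preview."""
    if len(message) <= limit:
        return message
    # Reserve three characters for the ellipsis marker so the
    # result stays within the limit.
    return message[: limit - 3] + "..."
```

Short messages pass through unchanged; long ones come back at exactly `limit` characters ending in `...`.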
Changes to the scenario response models (hunk `@@ -5,17 +5,20 @@`). The module docstring previously ended "...the metadata about available scenarios (listing), not scenario execution results.", and `ScenarioSummary` is renamed:

```python
"""
Scenario API response models.

Scenarios are multi-attack security testing campaigns. These models represent
the metadata about available scenarios (listing) and scenario execution (runs).
"""

from datetime import datetime
from enum import Enum
from typing import Any, Optional  # before: from typing import Optional

from pydantic import BaseModel, Field

from pyrit.backend.models.attacks import AttackSummary
from pyrit.backend.models.common import PaginationInfo


class RegisteredScenario(BaseModel):  # renamed from ScenarioSummary
    """Summary of a registered scenario."""

    scenario_name: str = Field(..., description="Registry key (e.g., 'foundry.red_team_agent')")
```

And from hunk `@@ -30,8 +33,103 @@`:

```python
    max_dataset_size: Optional[int] = Field(None, description="Maximum items per dataset (None means unlimited)")


class ListRegisteredScenarioResponse(BaseModel):  # renamed from ScenarioListResponse
```
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: ScenarioS plural? |
||||||
```python
    """Response for listing scenarios."""
```
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit:
Suggested change
|
||||||
```python
    # before: items: list[ScenarioSummary] = ...
    items: list[RegisteredScenario] = Field(..., description="List of scenario summaries")
    pagination: PaginationInfo = Field(..., description="Pagination metadata")
```
```python
# ============================================================================
# Scenario Run Models
# ============================================================================


class ScenarioRunStatus(str, Enum):
    """Status of a scenario run."""

    PENDING = "pending"
```
Contributor (on `PENDING`): nit: what does pending mean? Like scheduled?

Contributor: ... and do we need to have another status enum? We have one in PyRIT core, right?
```python
    INITIALIZING = "initializing"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"
```
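Because `ScenarioRunStatus` mixes in `str`, its members behave like plain strings, which is what makes them convenient in JSON API responses. A small stdlib-only check (re-declaring the enum locally so the snippet runs standalone):

```python
import json
from enum import Enum


class ScenarioRunStatus(str, Enum):
    PENDING = "pending"
    INITIALIZING = "initializing"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Members compare equal to their raw values and round-trip from wire strings.
assert ScenarioRunStatus.RUNNING == "running"
assert ScenarioRunStatus("failed") is ScenarioRunStatus.FAILED
# They also serialize as bare strings in JSON payloads.
assert json.dumps(ScenarioRunStatus.COMPLETED) == '"completed"'
```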
```python
class RunScenarioRequest(BaseModel):
    """Request body for starting a scenario run."""

    scenario_name: str = Field(..., description="Registry key of the scenario to run")
```
Contributor (on `scenario_name`): nit: (suggested change) — matching the `target_name` description below, which makes more sense to me.

Contributor: or, if it's not necessarily the name, rename to `scenario_registry_key`.
```python
    target_name: str = Field(..., description="Name of a registered target from the TargetRegistry")
    initializers: list[str] | None = Field(
        None, description="Initializer names to run before scenario (e.g., ['target', 'load_default_datasets'])"
    )
    strategies: list[str] | None = Field(None, description="Strategy names to use (uses scenario default if omitted)")
    dataset_names: list[str] | None = Field(None, description="Dataset names to use (uses scenario default if omitted)")
    max_dataset_size: int | None = Field(None, ge=1, description="Maximum items per dataset")
    max_concurrency: int = Field(10, ge=1, le=100, description="Maximum concurrent operations")
    max_retries: int = Field(0, ge=0, le=20, description="Maximum retry attempts on failure")
    labels: dict[str, str] | None = Field(None, description="Labels to attach to memory entries")
    scenario_params: dict[str, Any] | None = Field(
        None,
        description="Custom parameters for the scenario (passed to scenario.set_params_from_args). "
        "Keys are parameter names declared by the scenario's supported_parameters().",
    )
    initializer_args: dict[str, dict[str, Any]] | None = Field(
        None,
        description="Per-initializer arguments keyed by initializer name. "
        "Each value is a dict of args passed to that initializer's set_params_from_args(). "
        "Example: {'target': {'endpoint': 'https://...'}}.",
    )
    scenario_result_id: str | None = Field(
        None,
        description="Optional ID of an existing ScenarioResult to resume. "
        "If provided, the scenario will resume from prior progress instead of starting fresh.",
    )
```
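Putting `RunScenarioRequest` together, a client starting a run would POST a JSON body along these lines. The target name and endpoint URL here are made-up placeholders, and this hunk does not show the route itself:

```python
import json

body = {
    "scenario_name": "foundry.red_team_agent",    # registry key from the scenario listing
    "target_name": "my_target",                   # placeholder: a registered target name
    "initializers": ["target", "load_default_datasets"],
    "max_dataset_size": 4,
    "max_concurrency": 10,
    "labels": {"operation": "demo"},
    "initializer_args": {"target": {"endpoint": "https://example.invalid"}},
}
# Optional fields (strategies, dataset_names, scenario_params,
# scenario_result_id) can simply be omitted; the model defaults them to None.
payload = json.dumps(body)
```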
```python
class ScenarioRunSummary(BaseModel):
    """Response for a scenario run (status + result details)."""

    scenario_result_id: str = Field(..., description="UUID of the ScenarioResult in memory")
    scenario_name: str = Field(..., description="Registry key of the scenario being run")
    scenario_version: int = Field(0, ge=0, description="Version of the scenario")
    status: ScenarioRunStatus = Field(..., description="Current run status")
    created_at: datetime = Field(..., description="When the run was created")
    updated_at: datetime = Field(..., description="When the run status last changed")
    error: str | None = Field(None, description="Error message if status is FAILED")
    strategies_used: list[str] = Field(default_factory=list, description="Strategy names that were executed")
    total_attacks: int = Field(0, ge=0, description="Total number of atomic attacks")
    completed_attacks: int = Field(0, ge=0, description="Number of attacks that completed")
    objective_achieved_rate: int = Field(0, ge=0, le=100, description="Success rate as percentage (0-100)")
    labels: dict[str, str] = Field(default_factory=dict, description="Labels attached to this run")
    completed_at: datetime | None = Field(None, description="When the scenario finished")
```
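`objective_achieved_rate` is constrained to an integer percentage (0-100). The hunk does not show how the backend computes it; one plausible derivation from the success and total counts would be:

```python
def objective_achieved_rate(success_count: int, total_count: int) -> int:
    """Hypothetical sketch: integer success percentage; 0 when nothing has run yet."""
    if total_count == 0:
        # Avoid division by zero before any attacks have completed.
        return 0
    return round(100 * success_count / total_count)
```

Rounding keeps the value inside the declared `ge=0, le=100` bounds for any non-negative counts with `success_count <= total_count`.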
```python
class ScenarioRunListResponse(BaseModel):
    """Response for listing scenario runs."""

    items: list[ScenarioRunSummary] = Field(..., description="List of scenario runs")
```
```python
# ============================================================================
# Scenario Results Detail Models
# ============================================================================


class AtomicAttackResults(BaseModel):
    """Results grouped by atomic attack name."""

    atomic_attack_name: str = Field(..., description="Name of the atomic attack (strategy)")
    display_group: str | None = Field(None, description="Display group label for UI grouping")
    results: list[AttackSummary] = Field(..., description="Individual attack results")
    success_count: int = Field(0, ge=0, description="Number of successful attacks")
    failure_count: int = Field(0, ge=0, description="Number of failed attacks")
    total_count: int = Field(0, ge=0, description="Total number of attack results")
```
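To illustrate the grouping that `AtomicAttackResults` represents, here is a hedged sketch using plain dicts (so it runs without pydantic). The per-result `atomic_attack_name` key is a hypothetical stand-in for however the backend associates results with strategies; the tallies mirror the count fields above:

```python
# Flat attack results as plain dicts (stand-ins for AttackSummary objects).
flat_results = [
    {"atomic_attack_name": "tap", "outcome": "success"},
    {"atomic_attack_name": "tap", "outcome": "failure"},
    {"atomic_attack_name": "crescendo", "outcome": "success"},
    {"atomic_attack_name": "crescendo", "outcome": "undetermined"},
]

grouped: dict[str, dict] = {}
for r in flat_results:
    g = grouped.setdefault(
        r["atomic_attack_name"],
        {"results": [], "success_count": 0, "failure_count": 0, "total_count": 0},
    )
    g["results"].append(r)
    g["total_count"] += 1
    if r["outcome"] == "success":
        g["success_count"] += 1
    elif r["outcome"] == "failure":
        g["failure_count"] += 1
    # "undetermined" outcomes count toward total_count only.
```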
```python
class ScenarioRunDetail(BaseModel):
    """Full detailed results of a scenario run."""

    run: ScenarioRunSummary = Field(..., description="The scenario run summary")
```
rlundeen2 marked this conversation as resolved.
```python
    attacks: list[AtomicAttackResults] = Field(..., description="Results grouped by atomic attack")
```
Contributor: I struggle with this a bit. For example, if we use the same adversarial model, it's bound to become a bottleneck when there are too many runs that need it. But what if it's just a prompt-sending attack, which doesn't need it at all (maybe for scorers, depending on configuration)?

On the other hand, I could run different scenarios against different endpoints in parallel. Say it's 1000 TAP attacks against endpoint A in scenario A, and the same for B and C. Then I might be worried about running more than one TAP attack with the same endpoint at a time, but there's no reason to hold off on B and C. I suspect we don't control that kind of parallel execution, right?
On the other hand, I could run different scenarios against different endpoints in parallel. Let's say it's 1000 TAP attacks against endpoint A in scenario A, and same for B and C. Then I might be worried about running more than one TAP attack with the same endpoint at a time, but there's no reason to hold off on B and C. I suspect we don't control that kind of parallel execution, right?