FEAT: Adding Scenario run to the REST API#1696
FEAT: Adding Scenario run to the REST API#1696rlundeen2 wants to merge 12 commits intomicrosoft:mainfrom
Conversation
| class ScenarioRunStatus(str, Enum): | ||
| """Status of a scenario run.""" | ||
|
|
||
| PENDING = "pending" |
There was a problem hiding this comment.
nit: what does pending mean ? like scheduled ?
There was a problem hiding this comment.
... and do we need to have another status? We have on in PyRIT core, right?
| @@ -64,6 +64,7 @@ def __init__( | |||
| objective_scorer_identifier: "ComponentIdentifier", | |||
| scenario_run_state: ScenarioRunState = "CREATED", | |||
| labels: Optional[dict[str, str]] = None, | |||
There was a problem hiding this comment.
do we want an error message property here too ?
There was a problem hiding this comment.
100% and great catch; it is even in my notes. But it has some bits I'd like to tackle as a follow up
| active.task = task | ||
|
|
||
| response = self._build_response(scenario_result_id=scenario_result_id) | ||
| assert response is not None # guaranteed: we just inserted into DB via initialize_async |
There was a problem hiding this comment.
Why have it then? For typing reasons? Asserts don't run in production code.
| attack_result_id: str = Field(..., description="Database-assigned unique ID for this AttackResult") | ||
| conversation_id: str = Field(..., description="Primary conversation of this attack result") | ||
| attack_type: str = Field(..., description="Attack class name (e.g., 'CrescendoAttack', 'ManualAttack')") | ||
| attack_type: str = Field("", description="Attack class name (e.g., 'CrescendoAttack', 'ManualAttack')") |
|
|
||
|
|
||
| class ScenarioListResponse(BaseModel): | ||
| class ListRegisteredScenarioResponse(BaseModel): |
| Raises: | ||
| ValueError: If the run is not in a completed state. | ||
| """ | ||
| memory = CentralMemory.get_memory_instance() |
There was a problem hiding this comment.
This happens in 4 methods as the first command. Perhaps we should set it on the constructor as it's not going to change?
| _service_instance: ScenarioRunService | None = None | ||
|
|
||
|
|
||
| def get_scenario_run_service() -> ScenarioRunService: |
There was a problem hiding this comment.
I'm somewhat confused by this. It's quite different from how we do this in other services and I don't know why.
Is it because of max concurrent runs?
| objective_target_identifier: Mapped[dict[str, str]] = mapped_column(JSON, nullable=False) | ||
| objective_scorer_identifier: Mapped[Optional[dict[str, str]]] = mapped_column(JSON, nullable=True) | ||
| scenario_run_state: Mapped[Literal["CREATED", "IN_PROGRESS", "COMPLETED", "FAILED"]] = mapped_column( | ||
| scenario_run_state: Mapped[Literal["CREATED", "IN_PROGRESS", "COMPLETED", "FAILED", "CANCELLED"]] = mapped_column( |
There was a problem hiding this comment.
Should this Literal["CREATED", "IN_PROGRESS", "COMPLETED", "FAILED", "CANCELLED"] become a type so that we can use it in multiple places?
There was a problem hiding this comment.
actually it did. I see it in models/scenario_result.py. But not used here
| objective_scorer_identifier: "ComponentIdentifier", | ||
| scenario_run_state: ScenarioRunState = "CREATED", | ||
| labels: Optional[dict[str, str]] = None, | ||
| created_at: Optional[datetime] = None, |
There was a problem hiding this comment.
I thought the guidelines say we don't do Optional anymore but rather datetime | None following whatever the latest PEP is.
| self.created_at = created_at if created_at is not None else datetime.now(timezone.utc) | ||
| self.completion_time = completion_time if completion_time is not None else datetime.now(timezone.utc) |
There was a problem hiding this comment.
Would be nice if it matched. created_at and completed_at
| # ---------------------------- | ||
| # Maximum number of scenario runs that can execute concurrently in the backend. | ||
| # Applies only to the pyrit_backend server. | ||
| max_concurrent_scenario_runs: 3 |
There was a problem hiding this comment.
I struggle with this a bit. For example, if we use the same adversarial model it's bound to become a bottleneck when there are too many runs that need it. But what if it's just prompt sending attack which doesn't need it at all (maybe for scorers depending on configuration)?
On the other hand, I could run different scenarios against different endpoints in parallel. Let's say it's 1000 TAP attacks against endpoint A in scenario A, and same for B and C. Then I might be worried about running more than one TAP attack with the same endpoint at a time, but there's no reason to hold off on B and C. I suspect we don't control that kind of parallel execution, right?
| None, description="Preview of the last message (truncated to ~100 chars)" | ||
| ) | ||
| score_value: str | None = Field(None, description="Score value from the objective scorer") | ||
| executed_turns: int = Field(0, ge=0, description="Number of turns executed") |
There was a problem hiding this comment.
nit: num_executed_turns
| class RunScenarioRequest(BaseModel): | ||
| """Request body for starting a scenario run.""" | ||
|
|
||
| scenario_name: str = Field(..., description="Registry key of the scenario to run") |
There was a problem hiding this comment.
nit:
| scenario_name: str = Field(..., description="Registry key of the scenario to run") | |
| scenario_name: str = Field(..., description="Name of registered scenario from the ScenarioRegistry") |
matching the target_name description below which makes more sense to me
There was a problem hiding this comment.
or if its not necessarily the name, rename to scenario_registry_key
|
|
||
| class ScenarioListResponse(BaseModel): | ||
| class ListRegisteredScenarioResponse(BaseModel): | ||
| """Response for listing scenarios.""" |
There was a problem hiding this comment.
nit:
| """Response for listing scenarios.""" | |
| """Response for listing registered scenarios.""" |
| except KeyError as e: | ||
| raise ValueError(str(e)) from None | ||
|
|
||
| # Validate and run initializers |
There was a problem hiding this comment.
nit: make lines 215-226 an initializer validation function, 229-241 a resolve target function, and 253-266 a resolve strategy function
| identifier_filters (Optional[Sequence[IdentifierFilter]], optional): | ||
| A sequence of IdentifierFilter objects that allows filtering by identifier JSON properties. | ||
| Defaults to None. | ||
| limit (int | None): Maximum number of results to return. Defaults to None (no limit). |
There was a problem hiding this comment.
nit: limit is not super intuitive to me, i'd prefer something like max_results or result_limit
hannahwestra25
left a comment
There was a problem hiding this comment.
small nits but looks good!
Adds backend endpoints for running, monitoring, and cancelling scenario executions via the REST UI.
ScenarioRunServicewith in-memory run tracking, background asyncio execution, and cancellation support (max 3concurrent runs)