FEAT: Adding Scenario run to the REST API by rlundeen2 · Pull Request #1696 · microsoft/PyRIT

rlundeen2 · 2026-05-07T00:52:36Z

Adds backend endpoints for running, monitoring, and cancelling scenario executions via the REST UI.

New ScenarioRunService with in-memory run tracking, background asyncio execution, and cancellation support (max 3
concurrent runs)
New routes: POST /runs, GET /runs, GET /runs/{id}, DELETE /runs/{id}, GET /runs/{id}/results
Comprehensive unit tests for service and route layers

…io-rest

hannahwestra25 · 2026-05-07T20:02:25Z

+class ScenarioRunStatus(str, Enum):
+    """Status of a scenario run."""
+
+    PENDING = "pending"


nit: what does pending mean ? like scheduled ?

... and do we need to have another status? We have on in PyRIT core, right?

hannahwestra25 · 2026-05-07T20:19:41Z

@@ -64,6 +64,7 @@ def __init__(
        objective_scorer_identifier: "ComponentIdentifier",
        scenario_run_state: ScenarioRunState = "CREATED",
        labels: Optional[dict[str, str]] = None,


do we want an error message property here too ?

100% and great catch; it is even in my notes. But it has some bits I'd like to tackle as a follow up

romanlutz · 2026-05-08T13:11:07Z

+        active.task = task
+
+        response = self._build_response(scenario_result_id=scenario_result_id)
+        assert response is not None  # guaranteed: we just inserted into DB via initialize_async


Why have it then? For typing reasons? Asserts don't run in production code.

romanlutz · 2026-05-08T13:12:15Z

    attack_result_id: str = Field(..., description="Database-assigned unique ID for this AttackResult")
    conversation_id: str = Field(..., description="Primary conversation of this attack result")
-    attack_type: str = Field(..., description="Attack class name (e.g., 'CrescendoAttack', 'ManualAttack')")
+    attack_type: str = Field("", description="Attack class name (e.g., 'CrescendoAttack', 'ManualAttack')")


Why the change?

romanlutz · 2026-05-08T13:13:26Z



-class ScenarioListResponse(BaseModel):
+class ListRegisteredScenarioResponse(BaseModel):


nit: ScenarioS plural?

romanlutz · 2026-05-08T13:18:37Z

+        Raises:
+            ValueError: If the run is not in a completed state.
+        """
+        memory = CentralMemory.get_memory_instance()


This happens in 4 methods as the first command. Perhaps we should set it on the constructor as it's not going to change?

romanlutz · 2026-05-08T13:21:03Z

+_service_instance: ScenarioRunService | None = None
+
+
+def get_scenario_run_service() -> ScenarioRunService:


I'm somewhat confused by this. It's quite different from how we do this in other services and I don't know why.

Is it because of max concurrent runs?

romanlutz · 2026-05-08T13:22:10Z

    objective_target_identifier: Mapped[dict[str, str]] = mapped_column(JSON, nullable=False)
    objective_scorer_identifier: Mapped[Optional[dict[str, str]]] = mapped_column(JSON, nullable=True)
-    scenario_run_state: Mapped[Literal["CREATED", "IN_PROGRESS", "COMPLETED", "FAILED"]] = mapped_column(
+    scenario_run_state: Mapped[Literal["CREATED", "IN_PROGRESS", "COMPLETED", "FAILED", "CANCELLED"]] = mapped_column(


Should this Literal["CREATED", "IN_PROGRESS", "COMPLETED", "FAILED", "CANCELLED"] become a type so that we can use it in multiple places?

actually it did. I see it in models/scenario_result.py. But not used here

romanlutz · 2026-05-08T13:24:12Z

        objective_scorer_identifier: "ComponentIdentifier",
        scenario_run_state: ScenarioRunState = "CREATED",
        labels: Optional[dict[str, str]] = None,
+        created_at: Optional[datetime] = None,


I thought the guidelines say we don't do Optional anymore but rather datetime | None following whatever the latest PEP is.

romanlutz · 2026-05-08T13:24:50Z

+        self.created_at = created_at if created_at is not None else datetime.now(timezone.utc)
        self.completion_time = completion_time if completion_time is not None else datetime.now(timezone.utc)


Would be nice if it matched. created_at and completed_at

romanlutz · 2026-05-08T13:29:10Z

+# ----------------------------
+# Maximum number of scenario runs that can execute concurrently in the backend.
+# Applies only to the pyrit_backend server.
+max_concurrent_scenario_runs: 3


I struggle with this a bit. For example, if we use the same adversarial model it's bound to become a bottleneck when there are too many runs that need it. But what if it's just prompt sending attack which doesn't need it at all (maybe for scorers depending on configuration)?

On the other hand, I could run different scenarios against different endpoints in parallel. Let's say it's 1000 TAP attacks against endpoint A in scenario A, and same for B and C. Then I might be worried about running more than one TAP attack with the same endpoint at a time, but there's no reason to hold off on B and C. I suspect we don't control that kind of parallel execution, right?

hannahwestra25 · 2026-05-08T16:10:57Z

        None, description="Preview of the last message (truncated to ~100 chars)"
    )
+    score_value: str | None = Field(None, description="Score value from the objective scorer")
+    executed_turns: int = Field(0, ge=0, description="Number of turns executed")


nit: num_executed_turns

hannahwestra25 · 2026-05-08T16:14:50Z

+class RunScenarioRequest(BaseModel):
+    """Request body for starting a scenario run."""
+
+    scenario_name: str = Field(..., description="Registry key of the scenario to run")


nit:

Suggested change

scenario_name: str = Field(..., description="Registry key of the scenario to run")

scenario_name: str = Field(..., description="Name of registered scenario from the ScenarioRegistry")

matching the target_name description below which makes more sense to me

or if its not necessarily the name, rename to scenario_registry_key

hannahwestra25 · 2026-05-08T16:20:07Z


-class ScenarioListResponse(BaseModel):
+class ListRegisteredScenarioResponse(BaseModel):
    """Response for listing scenarios."""


nit:

Suggested change

"""Response for listing scenarios."""

"""Response for listing registered scenarios."""

hannahwestra25 · 2026-05-08T17:10:24Z

+        except KeyError as e:
+            raise ValueError(str(e)) from None
+
+        # Validate and run initializers


nit: make lines 215-226 an initializer validation function, 229-241 a resolve target function, and 253-266 a resolve strategy function

hannahwestra25 · 2026-05-08T17:13:58Z

            identifier_filters (Optional[Sequence[IdentifierFilter]], optional):
                A sequence of IdentifierFilter objects that allows filtering by identifier JSON properties.
                Defaults to None.
+            limit (int | None): Maximum number of results to return. Defaults to None (no limit).


nit: limit is not super intuitive to me, i'd prefer something like max_results or result_limit

hannahwestra25

small nits but looks good!

rlundeen2 added 7 commits May 6, 2026 12:55

adding basics

4a02725

Merge remote-tracking branch 'origin/main' into users/rludneen/scenar…

3a3c991

…io-rest

refactoring

88d71e6

refactoring again

eed310b

self pr review

15bdf35

pre-commit

33f36b2

test fix

f70a3b5

romanlutz reviewed May 7, 2026

View reviewed changes

Comment thread pyrit/backend/models/scenarios.py Outdated

romanlutz reviewed May 7, 2026

View reviewed changes

Comment thread pyrit/backend/models/scenarios.py Outdated

romanlutz reviewed May 7, 2026

View reviewed changes

Comment thread pyrit/backend/models/scenarios.py Outdated

romanlutz reviewed May 7, 2026

View reviewed changes

Comment thread pyrit/backend/models/scenarios.py Outdated

romanlutz reviewed May 7, 2026

View reviewed changes

Comment thread pyrit/backend/models/scenarios.py Outdated