MAINT unify how default adversarial and scorer targets are set in scenarios by behnam-o · Pull Request #1695 · microsoft/PyRIT

behnam-o · 2026-05-06T23:22:11Z

Changes:

1- Use a common logic for resolving default adversarial and scorer targets in scenarios
2- Have a fallback mechanism that uses the target identified by OPENAI_CHAT_*** env variables. This allows someone to only define 1 endpoint and get a runnable scenario

…io-targets

rlundeen2

Looks great!

hannahwestra25 · 2026-05-08T14:47:21Z

+        When overridden to return a non-empty sequence, the default objective scorer becomes
+        one ``SelfAskTrueFalseScorer`` per path AND-ed together with ``NOT(SelfAskRefusalScorer)``
+        instead of the scenario-level default.


nit: I feel like this part is confusing here because the logic for this is not in this specific function and this should be moved to the doc string for _get_default_objective_scorer

hannahwestra25 · 2026-05-08T14:50:29Z

+    @classmethod
+    def get_override_composite_scorer_questions_path(cls) -> Sequence[Path]:
+        """
+        Override to provide true/false question prompt paths for objective scoring.


Suggested change

Override to provide true/false question prompt paths for objective scoring.

Get the true/false question prompt paths for objective scoring.

I think the override part is redundant, if a class implements this, it is overridden inherently

hannahwestra25 · 2026-05-08T15:10:12Z

+        Returns:
+            Sequence[Path]: Paths to true/false question prompts, or an empty sequence to use the default scorer.
+        """
+        return []


so this function subclasses one input — a list of paths — but silently determines:

The per-path scorer type (always SelfAskTrueFalseScorer)

The aggregator (always AND)

That a NOT(SelfAskRefusalScorer) backstop is appended

The chat target (via a 3-step fallback inside _get_default_objective_scorer

From the subclass author's perspective in scam.py:88-95, leakage.py:104, cyber.py:64, they "return paths" — and have no idea (without reading the base class) what scorer actually gets built. The method name suggests the mechanism ("composite") while hiding the behavior. so why not have something like :

def _get_default_objective_scorer(self) -> TrueFalseScorer: return self._build_composite_with_refusal_backstop( chat_target=self._resolve_default_scorer_chat_target(), true_false_question_paths=[SCORER_SEED_PROMPT_PATH / "true_false_question" / "scams.yaml"], ) @staticmethod def _build_composite_with_refusal_backstop( *, chat_target: PromptChatTarget, true_false_question_paths: Sequence[Path], ) -> TrueFalseCompositeScorer: path_scorers = [ SelfAskTrueFalseScorer(chat_target=chat_target, true_false_question_path=p) for p in true_false_question_paths ] backstop = TrueFalseInverterScorer(scorer=SelfAskRefusalScorer(chat_target=chat_target)) return TrueFalseCompositeScorer( aggregator=TrueFalseScoreAggregator.AND, scorers=[*path_scorers, backstop], )

My reasoning behind this is ... as an author of a new scenario: "I don't want to think about what kind of scorer is used by default in my scenario ... just give me whatever all scenarios use, and btw, here's a list of questions I care about ... I'm happy with whatever other scenarios are doing with it"

and if a scenario author truely cares about the specific default scorer used for their scenario, they can still override _get_default_objective_scorer

(and if a user, as opposed to author, of a scenario wants a specific scorer, they can create one themselves and pass it as an init param)

that's why I take out the "construction of the scorers" out of individual scenarios, and only leave a way for them to specify what questions they want to be considered for scoring ... and if the maintainers decide that using a different type of scorer is more appropriate as default, scenario authors don't care and don't need to update their default scorer construction ...

I think maybe the method name could actually just be "additional_scoring_questions" or something like that, to indicate these are something the base scenario will "consider", but the author doesn't need to know how exactly.

Am I making sense? :D

"the subclass authors have no idea what scorer actually gets built." is the goal and a good thing in my opinion ...

gotcha yeah i guess i forgot that a user could pass in their own non-default scorer, so this is just the case that the user doesn't pass in any scorer and then we create a custom scorer using the paths. I do think the function should be renamed to default_true_false_question_paths (we have a weird inconsistency with using get_* and default_* in our class functions) and leave the scorer portion out of it but i won't die on that hill 😅

jsong468 · 2026-05-08T21:35:30Z

+OBJECTIVE_SCORER_CHAT_ENDPOINT="https://xxxxx.openai.azure.com/openai/v1"
+OBJECTIVE_SCORER_CHAT_KEY="xxxxx"
+OBJECTIVE_SCORER_CHAT_MODEL="deployment-name"


just making sure this is updated in our global env and probably should be whichever endpoint our best performing scorer uses

jsong468 · 2026-05-08T21:38:32Z

    TARGET_REQUIREMENTS: ClassVar[TargetRequirements] = TargetRequirements()

+    @classmethod
+    def get_override_composite_scorer_questions_path(cls) -> Sequence[Path]:


Suggested change

def get_override_composite_scorer_questions_path(cls) -> Sequence[Path]:

def get_override_composite_scorer_questions_paths(cls) -> Sequence[Path]:

also NIT: would it be more clear to have this be get_override_default_scorer_questions_paths (to make it more obvious it's for the default scorer) or maybe get_override_true_false_scorer_questions_paths since it applies to the true_false scorer portion of the composite.

jsong468 · 2026-05-08T21:44:33Z

+    """
+    return _get_default_chat_target(
+        preferred_target_key="adversarial_chat",
+        required_capabilities={CapabilityName.MULTI_TURN},


does SYSTEM_PROMPT also need to be a require_capability? It seems like most of our multi-turn attacks require it.

jsong468 · 2026-05-08T22:13:04Z

+            return registry_default_scorer
+
+        scorer = TrueFalseInverterScorer(scorer=SelfAskRefusalScorer(chat_target=chat_target))
+        logger.warning(f"Using fallback default objective scorer: {type(scorer).__name__}")


in all the logging for this method, would it also be helpful to output target information?

jsong468 · 2026-05-08T22:20:42Z

+MAIN: str = "main"
+FALLBACK: str = "fallback"


should these names be more descriptive and perhaps with docstring? maybe just me but it's not obvious that MAIN refers to objective_scorer_chat endpoint which is used for scenarios as a default target when the default scorer is not in the registry and FALLBACK refers to the default openai_chat endpoint.

jsong468 · 2026-05-08T22:35:31Z

    # Core scorer registration
    # ---------------------------------------------------------------------------

+    def _register_fallback_scorers(self) -> None:


maybe dumb question, but why do we need to register these? These are both technically just fallbacks for our scenario which are not retrieved from the scorer registry directly, and I'm a bit confused on what the metrics for these would tell us. For evaluating TrueFalseInverterScorer with Refusal, we already use the refusal scorer tagged as "best" and the defaults here would differ per user (i.e., not tied to a model name and my default objective_scorer model could be different from yours).

We already register and evaluate all the different refusal scorers (with different models) and choose the best one for the inverter scoring; adding these just seems like moving closer to what we used to do by trying all possible combinations without the intermediate _tag_best_per_category step. What do you think?
cc: @rlundeen2

romanlutz · 2026-05-08T23:47:21Z

+from pyrit.prompt_target import PromptTarget
 from pyrit.prompt_target.common.target_requirements import TargetRequirements
-from pyrit.registry import ScorerRegistry
+from pyrit.registry.object_registries.scorer_registry import ScorerRegistry


That's a VERY long import from another module. Can we make it work with pyrit.registry? There aren't that many, are there?

romanlutz · 2026-05-08T23:47:32Z

+from pyrit.score.true_false.self_ask_refusal_scorer import SelfAskRefusalScorer
+from pyrit.score.true_false.self_ask_true_false_scorer import SelfAskTrueFalseScorer
+from pyrit.score.true_false.true_false_composite_scorer import TrueFalseCompositeScorer
+from pyrit.score.true_false.true_false_inverter_scorer import TrueFalseInverterScorer
+from pyrit.score.true_false.true_false_score_aggregator import TrueFalseScoreAggregator


same here about short imports.

Behnam Ousat added 3 commits May 6, 2026 15:17

refactor default scenario targets

8477420

rename

5545543

log

b26f553

behnam-o changed the title ~~MAINT simplify how default scenario adversarial and scorer targets resolve~~ MAINT simplify how default adversarial and scorer targets are set in scenarios May 6, 2026

Merge branch 'main' of https://github.com/microsoft/PyRIT into scenar…

3e22987

…io-targets