Skip to content

MAINT unify how default adversarial and scorer targets are set in scenarios#1695

Open
behnam-o wants to merge 12 commits intomicrosoft:mainfrom
behnam-o:scenario-targets
Open

MAINT unify how default adversarial and scorer targets are set in scenarios#1695
behnam-o wants to merge 12 commits intomicrosoft:mainfrom
behnam-o:scenario-targets

Conversation

@behnam-o
Copy link
Copy Markdown
Contributor

@behnam-o behnam-o commented May 6, 2026

Changes:

1- Use a common logic for resolving default adversarial and scorer targets in scenarios
2- Have a fallback mechanism that uses the target identified by OPENAI_CHAT_*** env variables. This allows someone to only define 1 endpoint and get a runnable scenario

@behnam-o behnam-o changed the title MAINT simplify how default scenario adversarial and scorer targets resolve MAINT simplify how default adversarial and scorer targets are set in scenarios May 6, 2026
Comment thread pyrit/scenario/core/scenario_target_defaults.py
Comment thread pyrit/scenario/core/scenario_target_defaults.py
Comment thread pyrit/scenario/core/scenario.py Outdated
Comment thread pyrit/scenario/core/scenario.py Outdated
Comment thread .env_example
@behnam-o behnam-o marked this pull request as ready for review May 7, 2026 19:57
@behnam-o behnam-o changed the title MAINT simplify how default adversarial and scorer targets are set in scenarios MAINT unify how default adversarial and scorer targets are set in scenarios May 7, 2026
Copy link
Copy Markdown
Contributor

@rlundeen2 rlundeen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@behnam-o behnam-o added this pull request to the merge queue May 7, 2026
@behnam-o behnam-o removed this pull request from the merge queue due to a manual request May 7, 2026
Comment on lines +122 to +124
When overridden to return a non-empty sequence, the default objective scorer becomes
one ``SelfAskTrueFalseScorer`` per path AND-ed together with ``NOT(SelfAskRefusalScorer)``
instead of the scenario-level default.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I feel like this part is confusing here because the logic for this is not in this specific function and this should be moved to the doc string for _get_default_objective_scorer

@classmethod
def get_override_composite_scorer_questions_path(cls) -> Sequence[Path]:
"""
Override to provide true/false question prompt paths for objective scoring.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Override to provide true/false question prompt paths for objective scoring.
Get the true/false question prompt paths for objective scoring.

I think the override part is redundant, if a class implements this, it is overridden inherently

Returns:
Sequence[Path]: Paths to true/false question prompts, or an empty sequence to use the default scorer.
"""
return []
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this function subclasses one input — a list of paths — but silently determines:

  1. The per-path scorer type (always SelfAskTrueFalseScorer)
  2. The aggregator (always AND)
  3. That a NOT(SelfAskRefusalScorer) backstop is appended
  4. The chat target (via a 3-step fallback inside _get_default_objective_scorer

From the subclass author's perspective in scam.py:88-95, leakage.py:104, cyber.py:64, they "return paths" — and have no idea (without reading the base class) what scorer actually gets built. The method name suggests the mechanism ("composite") while hiding the behavior. so why not have something like :

def _get_default_objective_scorer(self) -> TrueFalseScorer:
    return self._build_composite_with_refusal_backstop(
        chat_target=self._resolve_default_scorer_chat_target(),
        true_false_question_paths=[SCORER_SEED_PROMPT_PATH / "true_false_question" / "scams.yaml"],
    )

@staticmethod
def _build_composite_with_refusal_backstop(
    *,
    chat_target: PromptChatTarget,
    true_false_question_paths: Sequence[Path],
) -> TrueFalseCompositeScorer:
    path_scorers = [
        SelfAskTrueFalseScorer(chat_target=chat_target, true_false_question_path=p)
        for p in true_false_question_paths
    ]
    backstop = TrueFalseInverterScorer(scorer=SelfAskRefusalScorer(chat_target=chat_target))
    return TrueFalseCompositeScorer(
        aggregator=TrueFalseScoreAggregator.AND,
        scorers=[*path_scorers, backstop],
    )

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning behind this is ... as an author of a new scenario: "I don't want to think about what kind of scorer is used by default in my scenario ... just give me whatever all scenarios use, and btw, here's a list of questions I care about ... I'm happy with whatever other scenarios are doing with it"

and if a scenario author truely cares about the specific default scorer used for their scenario, they can still override _get_default_objective_scorer

(and if a user, as opposed to author, of a scenario wants a specific scorer, they can create one themselves and pass it as an init param)

that's why I take out the "construction of the scorers" out of individual scenarios, and only leave a way for them to specify what questions they want to be considered for scoring ... and if the maintainers decide that using a different type of scorer is more appropriate as default, scenario authors don't care and don't need to update their default scorer construction ...

I think maybe the method name could actually just be "additional_scoring_questions" or something like that, to indicate these are something the base scenario will "consider", but the author doesn't need to know how exactly.

Am I making sense? :D

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the subclass authors have no idea what scorer actually gets built." is the goal and a good thing in my opinion ...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha yeah i guess i forgot that a user could pass in their own non-default scorer, so this is just the case that the user doesn't pass in any scorer and then we create a custom scorer using the paths. I do think the function should be renamed to default_true_false_question_paths (we have a weird inconsistency with using get_* and default_* in our class functions) and leave the scorer portion out of it but i won't die on that hill 😅

Comment thread .env_example
Comment on lines +83 to +85
OBJECTIVE_SCORER_CHAT_ENDPOINT="https://xxxxx.openai.azure.com/openai/v1"
OBJECTIVE_SCORER_CHAT_KEY="xxxxx"
OBJECTIVE_SCORER_CHAT_MODEL="deployment-name"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just making sure this is updated in our global env and probably should be whichever endpoint our best performing scorer uses

TARGET_REQUIREMENTS: ClassVar[TargetRequirements] = TargetRequirements()

@classmethod
def get_override_composite_scorer_questions_path(cls) -> Sequence[Path]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_override_composite_scorer_questions_path(cls) -> Sequence[Path]:
def get_override_composite_scorer_questions_paths(cls) -> Sequence[Path]:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also NIT: would it be more clear to have this be get_override_default_scorer_questions_paths (to make it more obvious it's for the default scorer) or maybe get_override_true_false_scorer_questions_paths since it applies to the true_false scorer portion of the composite.

"""
return _get_default_chat_target(
preferred_target_key="adversarial_chat",
required_capabilities={CapabilityName.MULTI_TURN},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does SYSTEM_PROMPT also need to be a require_capability? It seems like most of our multi-turn attacks require it.

return registry_default_scorer

scorer = TrueFalseInverterScorer(scorer=SelfAskRefusalScorer(chat_target=chat_target))
logger.warning(f"Using fallback default objective scorer: {type(scorer).__name__}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in all the logging for this method, would it also be helpful to output target information?

Comment on lines +91 to +92
MAIN: str = "main"
FALLBACK: str = "fallback"
Copy link
Copy Markdown
Contributor

@jsong468 jsong468 May 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these names be more descriptive and perhaps with docstring? maybe just me but it's not obvious that MAIN refers to objective_scorer_chat endpoint which is used for scenarios as a default target when the default scorer is not in the registry and FALLBACK refers to the default openai_chat endpoint.

# Core scorer registration
# ---------------------------------------------------------------------------

def _register_fallback_scorers(self) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe dumb question, but why do we need to register these? These are both technically just fallbacks for our scenario which are not retrieved from the scorer registry directly, and I'm a bit confused on what the metrics for these would tell us. For evaluating TrueFalseInverterScorer with Refusal, we already use the refusal scorer tagged as "best" and the defaults here would differ per user (i.e., not tied to a model name and my default objective_scorer model could be different from yours).

We already register and evaluate all the different refusal scorers (with different models) and choose the best one for the inverter scoring; adding these just seems like moving closer to what we used to do by trying all possible combinations without the intermediate _tag_best_per_category step. What do you think?
cc: @rlundeen2

from pyrit.prompt_target import PromptTarget
from pyrit.prompt_target.common.target_requirements import TargetRequirements
from pyrit.registry import ScorerRegistry
from pyrit.registry.object_registries.scorer_registry import ScorerRegistry
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a VERY long import from another module. Can we make it work with pyrit.registry? There aren't that many, are there?

Comment on lines +40 to +44
from pyrit.score.true_false.self_ask_refusal_scorer import SelfAskRefusalScorer
from pyrit.score.true_false.self_ask_true_false_scorer import SelfAskTrueFalseScorer
from pyrit.score.true_false.true_false_composite_scorer import TrueFalseCompositeScorer
from pyrit.score.true_false.true_false_inverter_scorer import TrueFalseInverterScorer
from pyrit.score.true_false.true_false_score_aggregator import TrueFalseScoreAggregator
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here about short imports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants