Add local trace replay regression harness by RitwijParmar · Pull Request #494 · braintrustdata/braintrust-sdk-python

Ritwij Aryan Parmar (RitwijParmar) · 2026-06-04T06:53:33Z

Summary

This adds a local trace replay path for the Python SDK. The goal is to make saved Braintrust trace exports useful as regression cases when iterating on an agent/task or scorer, without creating a new experiment just to sanity-check behavior.

What changed:

- added braintrust replay for JSON/JSONL span exports
- added ReplayTrace so replayed scorers can inspect spans with get_spans() and get_thread()-style access
- added reporting of current scores, baseline root-span scores, score deltas, derived trace metrics, and metric deltas
- added CI-oriented gates: --min-score, --min-score-delta, and --fail-on-error
- documented the workflow in the Python README
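
For readers unfamiliar with score gating, here is a minimal, self-contained sketch of how thresholds like these could be evaluated against a saved baseline. It is not the code in this PR. `GateConfig`, `score_deltas`, and `check_gates` are hypothetical names. The sketch assumes --min-score is a lower bound on each current score, --min-score-delta is a lower bound on current minus baseline, and --fail-on-error fails a trace whose replay raised an error; the real flag semantics may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GateConfig:
    # Hypothetical mirror of the three gates; None disables a threshold.
    min_score: Optional[float] = None
    min_score_delta: Optional[float] = None
    fail_on_error: bool = False


def score_deltas(current: Dict[str, float], baseline: Dict[str, float]) -> Dict[str, float]:
    """Return current - baseline for every score name present in both."""
    return {name: current[name] - baseline[name] for name in current if name in baseline}


def check_gates(
    current: Dict[str, float],
    baseline: Dict[str, float],
    error: Optional[str],
    gates: GateConfig,
) -> List[str]:
    """Return human-readable gate failures for one replayed trace."""
    failures: List[str] = []
    if gates.fail_on_error and error is not None:
        failures.append(f"error: {error}")
    if gates.min_score is not None:
        for name, value in sorted(current.items()):
            if value < gates.min_score:
                failures.append(f"{name}: score {value:.2f} < min {gates.min_score:.2f}")
    if gates.min_score_delta is not None:
        for name, delta in sorted(score_deltas(current, baseline).items()):
            if delta < gates.min_score_delta:
                failures.append(f"{name}: delta {delta:+.2f} < min {gates.min_score_delta:+.2f}")
    return failures


if __name__ == "__main__":
    # Made-up scores for illustration only.
    gates = GateConfig(min_score=0.6, min_score_delta=-0.1, fail_on_error=True)
    problems = check_gates(
        current={"accuracy": 0.5, "tone": 0.9},
        baseline={"accuracy": 0.75, "tone": 0.9},
        error=None,
        gates=gates,
    )
    for line in problems:
        print(line)
    # A CI wrapper would exit non-zero when any gate fails.
    raise SystemExit(1 if problems else 0)
```

In this sketch a trace can pass the absolute threshold and still fail the delta gate, which is the case a baseline comparison is meant to catch: a score that is acceptable in isolation but worse than what the saved trace recorded.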

Why

For agent/eval workflows, production traces often capture the hard cases: tool-call paths, bad intermediate states, or regressions that unit fixtures miss. This gives users a lightweight way to replay those traces locally and fail a check when a task or scorer change regresses against the saved baseline.

Tests

PYTHONPATH=py/src .venv/bin/python -m pytest py/src/braintrust/test_trace_replay.py -q
PYTHONPATH=py/src .venv/bin/python -m pytest py/src/braintrust/cli/test_push.py py/src/braintrust/test_trace_replay.py -q
PYTHONPATH=py/src .venv/bin/python -m compileall -q py/src/braintrust/trace_replay.py py/src/braintrust/test_trace_replay.py py/src/braintrust/cli/__main__.py py/src/braintrust/__init__.py
git diff --check

Commit: Add trace replay regression harness (b48af84)

Ritwij Aryan Parmar (RitwijParmar) wants to merge 1 commit into braintrustdata:main from RitwijParmar:codex/braintrust-trace-replay-regression
