Merged
5 changes: 5 additions & 0 deletions .changeset/late-buses-enjoy.md
@@ -0,0 +1,5 @@
+---
+"braintrust": minor
+---
+
+Add support for preserving explicit origin metadata on inline eval cases, so evals that run transformed or pre-resolved rows can retain their source-row provenance.
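Below is a minimal sketch of the use case the changeset describes. It assumes a caller that has already fetched dataset rows, transforms them, and passes the results inline. OriginSketch, SourceRow, InlineCase and toInlineCase are hypothetical local stand-ins, not the braintrust SDK's API. Only the origin field names (object_type, object_id, id, created, _xact_id) come from the diff.

```typescript
// Hypothetical local mirror of the origin fields set in the diff. In the SDK,
// EvalCaseOrigin is an alias for the generated ObjectReference type.
interface OriginSketch {
  object_type: string;
  object_id: string;
  id: string;
  created?: string | null;
  _xact_id?: string;
}

// Hypothetical shape of a row that was already fetched from a dataset.
interface SourceRow {
  id: string;
  _xact_id: string;
  created: string | null;
  input: string;
  expected: string;
}

// Hypothetical inline case holding only the fields this sketch needs.
interface InlineCase {
  input: string;
  expected: string;
  origin?: OriginSketch;
}

// Transform the row (here: trim and lowercase its input) while keeping an
// explicit pointer back to the dataset row it came from.
function toInlineCase(datasetId: string, row: SourceRow): InlineCase {
  return {
    input: row.input.trim().toLowerCase(),
    expected: row.expected,
    origin: {
      object_type: "dataset",
      object_id: datasetId,
      id: row.id,
      created: row.created,
      _xact_id: row._xact_id,
    },
  };
}

const rows: SourceRow[] = [
  { id: "row-1", _xact_id: "1000", created: null, input: "  Hello ", expected: "hi" },
];
const cases = rows.map((r) => toInlineCase("ds-123", r));
console.log(cases[0].origin?.id); // row-1
```

The transformed input no longer matches the stored row. The explicit origin is what lets the resulting eval span still point back to "row-1".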
22 changes: 12 additions & 10 deletions js/src/framework.ts
@@ -1169,6 +1169,17 @@ async function runEvaluatorInternal(
       : Dataset.isDataset(evaluator.data)
         ? evaluator.data
         : undefined;
+    const origin =
+      datum.origin ??
+      (eventDataset && datum.id && datum._xact_id
+        ? {
+            object_type: "dataset",
+            object_id: await eventDataset.id,
+            id: datum.id,
+            created: datum.created,
+            _xact_id: datum._xact_id,
+          }
+        : undefined);
 
     const baseEvent: StartSpanArgs = {
       name: "eval",

Review thread on the line "datum.origin ??":

Contributor:
One question I have here: is there some way we could take only the object_type and object_id from the base row rather than the whole origin, since technically the origin id should always be the row id? That way we wouldn't need to change the interface.

Something like that could convert dataset_id on the base row into object_type: "dataset", object_id: dataset_id.

Contributor Author:
I think we'd still have to change the API surface to do something like this. Right now we are passing these rows inline, so there is no dataset_id in the eval input:

z.object({ data: z.array(z.unknown()) }),

I don't think the existing shape contains any reference to the base dataset that we could use. I could add one instead of the origin change, but then we would still be implicitly changing the contract.

@@ -1179,16 +1190,7 @@ async function runEvaluatorInternal(
         input: datum.input,
         expected: "expected" in datum ? datum.expected : undefined,
         tags: datum.tags,
-        origin:
-          eventDataset && datum.id && datum._xact_id
-            ? {
-                object_type: "dataset",
-                object_id: await eventDataset.id,
-                id: datum.id,
-                created: datum.created,
-                _xact_id: datum._xact_id,
-              }
-            : undefined,
+        origin,
         ...(datum.upsert_id ? { id: datum.upsert_id } : {}),
       },
     };
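The resolution order in the new framework.ts code can be isolated into a small standalone function. This is a hypothetical sketch. resolveOrigin, OriginSketch and DatumSketch are illustrative names, and the datasetId parameter stands in for the awaited eventDataset.id.

```typescript
// Hypothetical mirror of the origin fields used in the diff.
interface OriginSketch {
  object_type: string;
  object_id: string;
  id: string;
  created?: string | null;
  _xact_id?: string;
}

// Hypothetical subset of an eval case's fields relevant to origin resolution.
interface DatumSketch {
  id?: string;
  _xact_id?: string;
  created?: string | null;
  origin?: OriginSketch;
}

// Same precedence as the diff: an explicit origin wins. Otherwise one is
// derived from the dataset, but only when a dataset is present and the row
// carries both id and _xact_id. In every other case there is no origin.
async function resolveOrigin(
  datum: DatumSketch,
  datasetId: Promise<string> | undefined,
): Promise<OriginSketch | undefined> {
  return (
    datum.origin ??
    (datasetId && datum.id && datum._xact_id
      ? {
          object_type: "dataset",
          object_id: await datasetId,
          id: datum.id,
          created: datum.created,
          _xact_id: datum._xact_id,
        }
      : undefined)
  );
}
```

Because ?? only falls back on null or undefined, any explicit origin object takes precedence over the dataset-derived one, even when the row also carries id and _xact_id.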
3 changes: 3 additions & 0 deletions js/src/logger.ts
@@ -77,6 +77,7 @@ import {
   type PromptType as PromptRow,
   type PromptSessionEventType as PromptSessionEvent,
   type RepoInfoType as RepoInfo,
+  type ObjectReferenceType as ObjectReference,
   type PromptBlockDataType as PromptBlockData,
   type ResponseFormatJsonSchemaType as ResponseFormatJsonSchema,
 } from "./generated_types";
@@ -6223,13 +6224,15 @@ export class ObjectFetcher<RecordType> implements AsyncIterable<
 
 export type BaseMetadata = Record<string, unknown> | void;
 export type DefaultMetadataType = void;
+export type EvalCaseOrigin = ObjectReference;
 export type EvalCase<Input, Expected, Metadata> = {
   input: Input;
   tags?: string[];
   // These fields are only set if the EvalCase is part of a Dataset.
   id?: string;
   _xact_id?: TransactionId;
   created?: string | null;
+  origin?: EvalCaseOrigin;
   // This field is used to help re-run a particular experiment row.
   upsert_id?: string;
   // The number of times to run the evaluator for this specific input.
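EvalCase now has an optional origin field. The review thread notes that inline rows arrive untyped, as z.array(z.unknown()). A caller building cases from such rows might therefore check the origin's shape before forwarding it. The sketch below shows one way to do that. The required and optional fields are an assumption based only on the fields the diff sets, not the exact generated ObjectReference schema. isOriginSketch and readOrigin are hypothetical helpers.

```typescript
// Assumed origin shape, modeled on the fields set in framework.ts.
interface OriginSketch {
  object_type: string;
  object_id: string;
  id: string;
  created?: string | null;
  _xact_id?: string;
}

// Type guard for the assumed shape: three required strings, plus an optional
// _xact_id string and an optional created string that may also be null.
function isOriginSketch(value: unknown): value is OriginSketch {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const optionalString = (x: unknown) => x === undefined || typeof x === "string";
  return (
    typeof v.object_type === "string" &&
    typeof v.object_id === "string" &&
    typeof v.id === "string" &&
    optionalString(v._xact_id) &&
    (v.created === null || optionalString(v.created))
  );
}

// Read an origin off an untyped inline row, dropping it if it fails the check.
function readOrigin(row: unknown): OriginSketch | undefined {
  if (typeof row !== "object" || row === null) return undefined;
  const origin = (row as { origin?: unknown }).origin;
  return isOriginSketch(origin) ? origin : undefined;
}
```

Dropping a malformed origin, rather than throwing, is one design choice among several. A caller that treats provenance as mandatory would raise an error instead.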