Summary
Anthropic SDK v1.44.0 added usage.output_tokens_details to Messages API responses. This nested object contains thinking_tokens — the number of output tokens consumed by extended thinking/reasoning. The Braintrust Ruby SDK does not capture this field. Users who enable extended thinking via anthropic.messages.create(thinking: {...}, ...) or via the beta messages API have no visibility into their thinking token consumption.
This is distinct from issue #164 (RubyLLM extended thinking), which covers the ruby_llm gem. This issue covers the direct anthropic gem instrumentation.
What is missing
The Anthropic Messages API now returns:
"usage": {
  "input_tokens": 2095,
  "output_tokens": 503,
  "cache_creation_input_tokens": 2051,
  "cache_read_input_tokens": 2051,
  "output_tokens_details": {
    "thinking_tokens": 312
  }
}
output_tokens_details.thinking_tokens is the count of tokens the model generated as internal reasoning (always ≤ output_tokens). Capturing it allows users to:
- Attribute cost to extended thinking vs. standard output
- Diagnose cases where reasoning dominates total output tokens
- Compare thinking token spend across requests
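As an illustration of the first two points, here is a minimal sketch of how the split could be computed once the field is available. The helper name `thinking_breakdown` is hypothetical and not part of either SDK; it assumes the usage payload is a Hash with string keys shaped like the example above, and that thinking_tokens is a subset of output_tokens.

```ruby
# Hypothetical helper, not part of the Braintrust or Anthropic SDKs.
# Splits output_tokens into a thinking portion and a visible portion.
def thinking_breakdown(usage)
  output = usage.fetch("output_tokens", 0)
  details = usage["output_tokens_details"] || {}
  thinking = details.fetch("thinking_tokens", 0)

  {
    thinking_tokens: thinking,
    visible_tokens: output - thinking,
    # Guard against division by zero when no output was produced.
    thinking_share: output.zero? ? 0.0 : thinking.fdiv(output)
  }
end

usage = {
  "input_tokens" => 2095,
  "output_tokens" => 503,
  "output_tokens_details" => { "thinking_tokens" => 312 }
}

breakdown = thinking_breakdown(usage)
puts breakdown[:visible_tokens]           # 191
puts breakdown[:thinking_share].round(2)  # 0.62
```

With the sample response, roughly 62% of output tokens went to reasoning, which is the kind of signal that is invisible today.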
Why it is dropped today
Common.parse_usage_tokens in lib/braintrust/contrib/anthropic/instrumentation/common.rb iterates over the top-level usage hash and skips any value that is not Numeric:
usage_hash.each do |key, value|
  next unless value.is_a?(Numeric) # ← skips output_tokens_details (a Hash)
  ...
end
output_tokens_details maps to {thinking_tokens: 312}, which fails the Numeric check and is silently dropped. No field in the existing field_map covers it.
The same gap applies to both the stable Messages API instrumentation (messages.rb) and the beta Messages API instrumentation (beta_messages.rb), since both delegate to Common.parse_usage_tokens.
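One possible shape for a fix is sketched below. This is not the actual code in common.rb: the constants `FIELD_MAP` and `NESTED_FIELD_MAP`, the method name `parse_usage_with_details`, and every target metric name (including `completion_reasoning_tokens`) are assumptions made for illustration. The idea is to keep the Numeric path as it is and add an explicit, allow-listed path for known nested hashes.

```ruby
# Hypothetical sketch; the real field_map and metric names in common.rb may differ.
FIELD_MAP = {
  "input_tokens" => "prompt_tokens",
  "output_tokens" => "completion_tokens",
  "cache_creation_input_tokens" => "prompt_cache_creation_tokens",
  "cache_read_input_tokens" => "prompt_cached_tokens"
}.freeze

# Allow-list of nested usage objects and the metric each inner field maps to.
# The target name "completion_reasoning_tokens" is an assumption.
NESTED_FIELD_MAP = {
  "output_tokens_details" => { "thinking_tokens" => "completion_reasoning_tokens" }
}.freeze

def parse_usage_with_details(usage_hash)
  metrics = {}

  usage_hash.each do |key, value|
    key = key.to_s

    if value.is_a?(Numeric)
      # Existing behavior: map known top-level counters, pass others through.
      metrics[FIELD_MAP.fetch(key, key)] = value.to_i
    elsif value.is_a?(Hash) && NESTED_FIELD_MAP.key?(key)
      # New behavior: descend one level into allow-listed nested objects.
      value.each do |nested_key, nested_value|
        next unless nested_value.is_a?(Numeric)

        target = NESTED_FIELD_MAP[key][nested_key.to_s]
        metrics[target] = nested_value.to_i if target
      end
    end
  end

  metrics
end

usage = {
  input_tokens: 2095,
  output_tokens: 503,
  output_tokens_details: { thinking_tokens: 312 }
}

metrics = parse_usage_with_details(usage)
puts metrics["completion_tokens"]            # 503
puts metrics["completion_reasoning_tokens"]  # 312
```

Because both messages.rb and beta_messages.rb delegate to the shared parser, a change of this kind in one place would cover the stable and beta paths. An allow-list is used here rather than generic recursion so that unknown nested objects do not silently produce new metric names.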
Braintrust docs status
not_found — The Braintrust Anthropic integration docs at https://www.braintrust.dev/docs/providers/anthropic list prompt_tokens, completion_tokens, and cache metrics as captured but do not mention thinking tokens or output_tokens_details.
Upstream sources
usage.output_tokens_details.thinking_tokens field: https://platform.claude.com/docs/en/api/messages
output_tokens_details and mid-conversation usage details: https://github.com/anthropics/anthropic-sdk-ruby/blob/main/CHANGELOG.md
Local files inspected
lib/braintrust/contrib/anthropic/instrumentation/common.rb: the parse_usage_tokens method (lines 14–48) has a 4-field field_map, and its Numeric guard silently drops nested objects like output_tokens_details
lib/braintrust/contrib/anthropic/instrumentation/messages.rb: set_metrics (line 132) calls Common.parse_usage_tokens; streaming output is also captured via finalize_stream_span
lib/braintrust/contrib/anthropic/instrumentation/beta_messages.rb: same parse_usage_tokens call pattern