feat(logs): trace span tree rewrite with resizable split, provider icons, and execution improvements#4292
feat(logs): trace span tree rewrite with resizable split, provider icons, and execution improvements#4292waleedlatif1 wants to merge 15 commits intostagingfrom
Conversation
…ion enrichment Unify tool calls under span.children, capture dual-clock timing, and surface per-iteration model content (assistant text, thinking, tool calls, finish reason, tokens, cost, ttft, provider, errors) across all 12 LLM providers. UI renders the new fields on model child spans; old logs degrade gracefully since every field is optional. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Wrap log-details drawer in Overview | Trace tabs; Overview unchanged - New TraceView with hierarchical tree on the left and detail pane on the right - Keyboard nav, span filter, expand/collapse all - Bump min drawer width 400->600 and clamp persisted widths on rehydrate Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Brings PR #4181 inline: persists workflowInput on successful runs, adds useRetryExecution mutation (streaming read-one-chunk-and-cancel), Retry entrypoints in the row context menu and the detail sidebar, and extractRetryInput with fallback to starter block state for older logs. Also surfaces the captured input in a new "Workflow Input" section above Workflow Output in the detail Overview tab, guarded so older logs without the field don't render an empty block. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…oped keyboard nav - Pad tree rows from panel edge so the root chevron isn't visually clipped. - Key DetailCodeSection by label so collapse state belongs to the section purpose, preventing isOpen from leaking across span changes when positional slots happened to align. - Ignore log-to-log arrow-key nav while the Trace tab is active so TraceView owns span navigation; filter inputs keep native caret movement. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Chevron at depth 0 and the timeline bars now sit on the same 14px left/right grid as the trace view's header strip and the rest of the log details panel, removing the stagger where bars extended further left than chevrons and the chevron appeared cramped against the panel edge. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Overview tab's scroll container (SModalTabsContent) was wrapped around a non-overflow inner div that held the scrollAreaRef, so the scroll-reset on log change targeted a non-scrolling element. Collapse the wrapper into the Tabs.Content element itself and move the ref there. Add min-h-0 to the Trace detail pane wrapper so its scrolling child can shrink inside the horizontal-flex row.
Tailwind's `.flex` utility overrides the UA `[hidden]` rule, so applying `flex` to SModalTabsContent caused the inactive Overview panel to still participate in the Tabs flex column and push the Trace view down. Keep SModalTabsContent as a plain overflow container (no `flex` class) with the scroll ref on it, and restore the inner flex-col wrapper for the Overview content so it still stacks with gap spacing.
- Tree pane now has top padding so the first row has breathing room under the header strip instead of sitting flush against the border. - DetailCodeSection dropped its wrapper `overflow-hidden`. Per CSS, a flex item with `overflow: hidden` resolves `min-height: auto` to `0`, so when Input and Output were both expanded the flex algorithm shrank each section below its content, cutting off rows. Without the clip, sections size to content and the surrounding pane's `overflow-y-auto` takes over. - Selected span row now scrolls into view on selection change, so arrow-key navigation always keeps the active row visible in the tree pane.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…, cleanup - Resizable tree/detail split in trace view (default 360px, drag to resize) - Resizable right panel in preview snapshot (280–600px) - Fix Gantt bar invisibility for late-run spans (clamp offsetPct to 100-MIN_BAR_PCT) - Propagate model+provider to child model spans in span-factory for correct icons - Fix icon contrast on light provider backgrounds (luminance-based color class) - Replace custom status badges with emcn Badge component - Lighten jump-to-error button to ghost variant - Remove double X button in modal snapshot (showBlockCloseButton prop) - Fix emcn subpath imports → barrel in trace-view, log-details, execution-snapshot - Fix hover: → hover-hover: on resize handles - Add body style cleanup on resize unmount - Fix React Query key factory naming (stats/stat convention) - Remove unnecessary useCallback/useMemo in preview and execution-snapshot
- logs: only scroll-into-view on keyboard nav, not on click selection - resource: stable scrollbar gutter, wider first column - credentials: toast success/error feedback, remove useMemo for personalEnvData, allow editing conflict rows, fix disabled state visibility, use --text-error token - integrations: use --text-error token for error state - input: increase right padding (px-2 → pl-2 pr-3)
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
PR SummaryMedium Risk Overview Upgrades trace-span rendering to show additional per-span data (e.g. thinking/tool calls, provider/model metadata, tokens/cost/TTFT/throughput, explicit error type/message) and normalizes child/tool spans handling; also updates Copilot log tools to extract tool calls from both legacy and new span shapes. Introduces retry execution from both the log row context menu and sidebar (fetches log detail, derives retry input with On the backend/executor side, improves block logging timing consistency and makes Reviewed by Cursor Bugbot for commit fbcee22. Configure here. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit fbcee22. Configure here.
| }) | ||
| } | ||
| } | ||
| }) |
There was a problem hiding this comment.
Fire-and-forget callbacks create cost tracking race condition
High Severity
The fireBlockCompleteCallback changed from awaiting onBlockComplete to fire-and-forget (void promise.catch(…)). The onBlockComplete handler in logging-session.ts accumulates costs on this.accumulatedCost after an internal await this.trackProgressWrite(…). Since the callback is no longer awaited, the workflow can finish and read accumulatedCost before the last block's cost accumulation completes — resulting in incomplete cost/token data in the final execution log. This is especially impactful for short workflows or single-block executions where the cost could be entirely missing.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit fbcee22. Configure here.
Greptile SummaryThis PR rewrites the trace span pipeline end-to-end: a new Confidence Score: 5/5Safe to merge; all remaining findings are P2 style/robustness suggestions with no data-loss or crash risk on the happy path. No P0 or P1 issues found. The three P2 comments cover: (1) a fragile heuristic in a legacy-log fallback that only affects old-format logs, (2) a fire-and-forget DB backpressure concern that is intentional per the PR description, and (3) a name-pattern false-positive in iteration grouping that requires a deliberately unusual block name to trigger. apps/sim/app/workspace/[workspaceId]/logs/utils.ts (extractRetryInput heuristic), apps/sim/lib/logs/execution/trace-spans/iteration-grouping.ts (name-match classification) Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[BlockExecutor.executeBlock] -->|captures startedAt/startTime| B[createBlockLog]
B --> C[fireBlockStartCallback\nvoid .catch]
C -->|async, non-blocking| D[(DB: progress marker)]
A --> E[execute block]
E -->|success| F[fireBlockCompleteCallback\nvoid .catch]
E -->|error| G[handleBlockError\nfireBlockCompleteCallback]
F -->|async, non-blocking| D
G -->|async, non-blocking| D
E --> H[BlockLog pushed to ctx.blockLogs]
H --> I[ExecutionResult.logs]
I --> J[buildTraceSpans]
J --> K[createSpanFromLog\nspan-factory.ts]
K -->|timeSegments present| L[buildChildrenFromTimeSegments]
K -->|no timeSegments| M[buildChildrenFromToolCalls]
J --> N[groupIterationBlocks\niteration-grouping.ts]
N --> O[groupIterationBlocksRecursive\nLoop/Parallel containers]
O --> P[TraceView\ntrace-view.tsx]
P --> Q[Resizable split\nTree pane left]
P --> R[Detail pane right]
Q --> S[Gantt bars]
Q --> T[Provider icons]
Reviews (1): Last reviewed commit: "chore(skills): add /ship command to clau..." | Re-trigger Greptile |
| blockStates?: Record< | ||
| string, | ||
| { output?: unknown; executed?: boolean; executionTime?: number } | ||
| > | ||
| } | ||
| | undefined | ||
| if (!executionState?.blockStates) return undefined | ||
|
|
||
| // Starter/trigger blocks are pre-populated with executed: false and | ||
| // executionTime: 0, which distinguishes them from blocks that actually ran. | ||
| for (const state of Object.values(executionState.blockStates)) { |
There was a problem hiding this comment.
extractRetryInput heuristic is fragile for failed runs
The condition executed === false && executionTime === 0 && output != null is intended to identify starter/trigger blocks, but it is not unique. In a run that failed early, any block that was skipped or never reached also satisfies executed: false, executionTime: 0. Object.values iteration order is insertion order (stable in V8, but not guaranteed to put the trigger block first). If the first matching state belongs to a non-trigger block, the wrong output is returned as the retry input — causing a retry with unexpected payload.
A safer approach is to also check that the blockId matches a known trigger block type, or to use the workflowInput field (added in this same PR) directly and only fall back to the heuristic for pre-migration logs.
| return redactApiKeys(result) | ||
| } | ||
|
|
||
| private async callOnBlockStart( | ||
| /** | ||
| * Fires the `onBlockStart` progress callback without blocking block execution. | ||
| * Any error is logged and swallowed so callback I/O never stalls the critical path. | ||
| */ | ||
| private fireBlockStartCallback( | ||
| ctx: ExecutionContext, | ||
| node: DAGNode, | ||
| block: SerializedBlock, | ||
| executionOrder: number | ||
| ): Promise<void> { | ||
| ): void { | ||
| if (!this.contextExtensions.onBlockStart) return | ||
|
|
||
| const blockId = node.metadata?.originalBlockId ?? node.id | ||
| const blockName = block.metadata?.name ?? blockId | ||
| const blockType = block.metadata?.id ?? DEFAULTS.BLOCK_TYPE |
There was a problem hiding this comment.
Fire-and-forget callbacks remove backpressure on DB writes
fireBlockStartCallback and fireBlockCompleteCallback now fire-and-forget (via void ...catch), removing the serial await that previously guaranteed each progress DB write finished before the next block began. Under heavy load or high-iteration-count loops, many unresolved promises can pile up simultaneously, potentially exhausting the DB connection pool and producing out-of-order progress records. Errors are only logged at warn, so callers have no signal that a write was dropped.
If the callbacks are purely best-effort streaming markers this is acceptable, but consider documenting that contract explicitly and adding a max-concurrency guard (e.g., a semaphore or a sliding window) for workflows with large fan-out.
| * | ||
| * Handles two cases: | ||
| * 1. **Flat** (backward compat): spans have loopId/parallelId + iterationIndex but no | ||
| * parentIterations. Grouped by immediate container -> iteration -> leaf. | ||
| * 2. **Nested** (new): spans have parentIterations chains. The outermost ancestor in the | ||
| * chain determines the top-level container. Iteration spans are peeled one level at a | ||
| * time and recursed. | ||
| */ | ||
| function groupIterationBlocksRecursive( |
There was a problem hiding this comment.
Name-pattern match can produce false positives for user-named blocks
The check span.name.match(/^(.+) \(iteration (\d+)\)$/) is used as the primary signal that a span belongs to an iteration. Any span whose user-supplied name ends with " (iteration N)" — e.g. a block named "Image (iteration 3)" — is silently reclassified as an iteration span and nested inside a loop container, even if it has no loopId or parallelId. This could corrupt the trace tree for such blocks.
Consider also requiring that the span has a loopId, parallelId, or non-empty parentIterations before treating it as an iteration span.


Summary
Type of Change
Testing
Tested manually with runs containing agents, loops, parallels, nested workflows, and error paths.
Checklist