fix: populate cwd and assistant_response in Chronicle session store #312293

digitarald wants to merge 1 commit into microsoft:main
Conversation
Pull request overview
This PR fixes missing data in the Chronicle local session store by (1) populating sessions.cwd at session initialization and (2) reliably extracting/storing turns.assistant_response from gen_ai.output.messages, including when that attribute has been truncated by truncateForOTel. It also updates the Chronicle intent’s local SQLite schema description so LLM-driven queries better match what’s actually stored.
Changes:
- Populate `sessions.cwd` from `vscode.workspace.workspaceFolders?.[0]` during session initialization.
- Replace JSON-parse-only assistant response extraction with `extractAssistantResponse()`, which supports truncated OTel JSON payloads, and store a truncated (~1000 char) value in `turns.assistant_response`.
- Add Vitest coverage for `extractAssistantResponse()` across valid JSON, truncated JSON, and escaping edge cases.
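The happy path of the extraction is plain JSON parsing of `gen_ai.output.messages`, keeping only assistant text parts. A minimal sketch under assumed names and message shapes (the real `extractAssistantResponse()` additionally has the truncated-JSON fallback discussed in the review below):

```typescript
// Sketch of the fast path only: parse gen_ai.output.messages as JSON and
// collect assistant text parts. Types and names here are illustrative.
interface MessagePart { type: string; content?: string }
interface OutputMessage { role: string; parts: MessagePart[] }

function extractAssistantResponseFast(outputMessagesRaw: string): string | undefined {
	let messages: OutputMessage[];
	try {
		messages = JSON.parse(outputMessagesRaw);
	} catch {
		// Truncated payloads are not valid JSON; the real code falls
		// through to a substring-based fallback here.
		return undefined;
	}
	const texts = messages
		.filter(m => m.role === 'assistant')
		.flatMap(m => m.parts ?? [])
		.filter(p => p.type === 'text' && typeof p.content === 'string')
		.map(p => p.content as string);
	return texts.length > 0 ? texts.join('') : undefined;
}
```

Note the `type === 'text'` filter: the first review comment below hinges on the fallback path not matching this behavior.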
Summary per file:
| File | Description |
|---|---|
| extensions/copilot/src/extension/intents/node/chronicleIntent.ts | Updates the local SQLite schema description for cwd and assistant_response. |
| extensions/copilot/src/extension/chronicle/vscode-node/sessionStoreTracker.ts | Sets cwd at session init; extracts and stores truncated assistant responses on turns. |
| extensions/copilot/src/extension/chronicle/common/test/extractAssistantResponse.spec.ts | Adds unit tests covering assistant response extraction, including truncated OTel JSON. |
| extensions/copilot/src/extension/chronicle/common/sessionStoreTracking.ts | Introduces extractAssistantResponse() with a fallback path for truncateForOTel-truncated JSON. |
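The `cwd` change in `sessionStoreTracker.ts` is essentially a one-liner against the VS Code API. A testable sketch with a stubbed workspace shape (the stub types and `fsPath` access are assumptions; the `workspaceFolders?.[0]` source is stated in the PR):

```typescript
// Minimal stub of the VS Code workspace shape, so the resolution logic
// can be shown standalone; the real tracker reads vscode.workspace directly.
interface WorkspaceFolderLike { uri: { fsPath: string } }
interface WorkspaceLike { workspaceFolders?: readonly WorkspaceFolderLike[] }

// sessions.cwd is populated from the first workspace folder at session init.
function resolveSessionCwd(workspace: WorkspaceLike): string | undefined {
	return workspace.workspaceFolders?.[0]?.uri.fsPath;
}
```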
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 2
```typescript
// Fallback: extract text from truncated JSON via substring.
// The JSON prefix is fixed: [{"role":"assistant","parts":[{"type":"text","content":"
// which is 63 chars. The content value follows, terminated by the truncation marker.
if (!outputMessagesRaw.includes(OTEL_TRUNCATION_MARKER)) {
	return undefined;
}
const contentPrefix = '"content":"';
const contentStart = outputMessagesRaw.indexOf(contentPrefix);
if (contentStart === -1) {
	return undefined;
}
const textStart = contentStart + contentPrefix.length;
```
The truncated-JSON fallback doesn’t actually use the “fixed prefix” it documents (including the hard-coded 63-char note), and it searches for the first occurrence of "content": which can diverge from the fast-path behavior (fast path only returns parts where type === 'text'). Consider anchoring the fallback to the same structure as the expected OTel output (e.g. locate the "role":"assistant" + "type":"text" + "content":" sequence, then terminate at the truncation suffix), and drop/update the misleading fixed-length comment so future changes don’t accidentally break extraction.
Suggested change:

```diff
-// Fallback: extract text from truncated JSON via substring.
-// The JSON prefix is fixed: [{"role":"assistant","parts":[{"type":"text","content":"
-// which is 63 chars. The content value follows, terminated by the truncation marker.
-if (!outputMessagesRaw.includes(OTEL_TRUNCATION_MARKER)) {
-	return undefined;
-}
-const contentPrefix = '"content":"';
-const contentStart = outputMessagesRaw.indexOf(contentPrefix);
-if (contentStart === -1) {
-	return undefined;
-}
-const textStart = contentStart + contentPrefix.length;
+// Fallback: extract text from truncated JSON by matching the serialized
+// assistant text-part prefix, then reading until the truncation marker.
+if (!outputMessagesRaw.includes(OTEL_TRUNCATION_MARKER)) {
+	return undefined;
+}
+const assistantTextContentPrefix = '"role":"assistant","parts":[{"type":"text","content":"';
+const prefixStart = outputMessagesRaw.indexOf(assistantTextContentPrefix);
+if (prefixStart === -1) {
+	return undefined;
+}
+const textStart = prefixStart + assistantTextContentPrefix.length;
```
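Fleshing out the suggested anchoring into a complete function might look like the sketch below. The marker value and the escape handling after `textStart` are assumptions for illustration; the real `OTEL_TRUNCATION_MARKER` constant and tail handling live in the Chronicle code:

```typescript
// Assumed marker value, purely for this sketch; the real constant is
// defined alongside truncateForOTel.
const OTEL_TRUNCATION_MARKER = '...[truncated]';

function extractFromTruncated(outputMessagesRaw: string): string | undefined {
	if (!outputMessagesRaw.includes(OTEL_TRUNCATION_MARKER)) {
		return undefined;
	}
	// Anchor on the serialized assistant text-part prefix, as the review
	// suggests, so the fallback matches the fast path's type === 'text' filter.
	const assistantTextContentPrefix = '"role":"assistant","parts":[{"type":"text","content":"';
	const prefixStart = outputMessagesRaw.indexOf(assistantTextContentPrefix);
	if (prefixStart === -1) {
		return undefined;
	}
	const textStart = prefixStart + assistantTextContentPrefix.length;
	const markerStart = outputMessagesRaw.indexOf(OTEL_TRUNCATION_MARKER, textStart);
	const rawText = outputMessagesRaw.slice(textStart, markerStart === -1 ? undefined : markerStart);
	// Undo JSON string escapes; a truncated payload can still contain \" or \n.
	try {
		return JSON.parse('"' + rawText + '"');
	} catch {
		return rawText; // payload was cut mid-escape; return the raw slice
	}
}
```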
```diff
-- **sessions**: id, cwd, repository, branch, summary, host_type, agent_name (who created the session, e.g. 'vscode', 'cli', 'CCA', 'CCR'), agent_description, created_at, updated_at
-- **turns**: session_id, turn_index, user_message, assistant_response, timestamp. The richest source of what actually happened — contains the user's prompts and the assistant's replies.
+- **sessions**: id, cwd (workspace folder path), repository, branch, summary, host_type, agent_name (who created the session, e.g. 'vscode', 'cli', 'CCA', 'CCR'), agent_description, created_at, updated_at
+- **turns**: session_id, turn_index, user_message, assistant_response (truncated to ~1000 chars — a summary of the assistant's reply, not the full response), timestamp. The richest source of what actually happened — contains the user's prompts and the assistant's replies.
```
The local schema description says assistant_response is “a summary of the assistant’s reply”, but the implementation stores a truncated prefix (first ~1000 chars + optional ellipsis), which may cut mid-sentence and isn’t a true summary. To avoid misleading the model when it writes queries/interpretations, consider rewording this to something like “first ~1000 characters of the assistant reply” (and mention the ellipsis behavior if relevant).
Suggested change:

```diff
-- **turns**: session_id, turn_index, user_message, assistant_response (truncated to ~1000 chars — a summary of the assistant's reply, not the full response), timestamp. The richest source of what actually happened — contains the user's prompts and the assistant's replies.
+- **turns**: session_id, turn_index, user_message, assistant_response (first ~1000 characters of the assistant reply, with an ellipsis if truncated — not the full response), timestamp. The richest source of what actually happened — contains the user's prompts and the assistant's replies.
```
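The store-side truncation the comment describes could be sketched as follows. The ~1000-char limit comes from the PR; the exact cut point and ellipsis character are assumptions:

```typescript
// Store only the first ~1000 characters of the reply, appending an
// ellipsis when the response was cut (limit from the PR; details assumed).
const ASSISTANT_RESPONSE_MAX_CHARS = 1000;

function truncateAssistantResponse(response: string): string {
	if (response.length <= ASSISTANT_RESPONSE_MAX_CHARS) {
		return response;
	}
	return response.slice(0, ASSISTANT_RESPONSE_MAX_CHARS) + '…';
}
```

A prefix cut like this can end mid-sentence, which is exactly why the review asks the schema description to say "first ~1000 characters" rather than "summary".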
33b91e1 to ba5336a
Two fields in the Chronicle local session store were always empty. This fixes both with a consumer-only approach (no OTel producer changes) and adds schema caveats for remaining data gaps.
Changes
- `cwd` — set from `vscode.workspace.workspaceFolders[0]` during session init.
- `assistant_response` — `truncateForOTel` produces invalid JSON for long responses (cuts mid-string, appends suffix). The new `extractAssistantResponse()` handles this.
- Schema caveats — updated the `_getSchemaDescription()` local branch to warn the LLM that:
  - `agent_name` / `agent_description` may be empty for older sessions
  - `summary` may contain raw JSON — prefer JOINing with `turns.user_message`
  - `assistant_response` may be empty for older sessions
  - `session_files` / `session_refs` may be empty for older sessions

What's NOT fixed (remaining from #312292)

- `agent_name` population — the code reads `GenAiAttr.AGENT_NAME` from spans, but this attribute may not be set by all producers
- `agent_description` — no known source for this field
- `summary` containing raw JSON — requires changes to the summarization pipeline
- `session_files` / `session_refs` empty — file/ref tracking was added but may need producer-side work

Testing

- `extractAssistantResponse.spec.ts` (valid JSON, truncated JSON, JSON escape handling, edge cases)

Partially addresses #312292