fix(deepseek): force tool_choice=required for thinking models + e2e regression suite#72

Open
roomote[bot] wants to merge 3 commits into main from feature/deepseek-v4-e2e-029jkypsvzc50

@roomote roomote Bot commented May 11, 2026

Opened on behalf of Elliott de Launay. View the task or mention @roomote for follow-up asks.

Related GitHub Issue

Addresses PostHog error 019e08dc: 24 occurrences of ConsecutiveMistakeError from deepseek-v4-pro (20) and deepseek-v4-flash (4) thinking models.

See also: #73 — follow-up to apply this fix holistically across all providers.

Description

Root cause: DeepSeek V4 thinking models sometimes return a plain text response instead of calling a tool when tool_choice: "auto" is sent, causing MODEL_NO_TOOLS_USED errors that accumulate into ConsecutiveMistakeError after 3 consecutive failures.
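The escalation described above can be sketched as a small counter. This is a minimal illustration, not the project's implementation: `MistakeTracker`, `recordTurn`, and `MISTAKE_LIMIT` are hypothetical names, the error class is a stand-in for the real `ConsecutiveMistakeError`, and resetting the counter after a successful tool call is an assumption implied by the word "consecutive".

```typescript
// Threshold taken from the description: three consecutive failures.
const MISTAKE_LIMIT = 3

// Stand-in for the real error type; only the name is borrowed from the PR.
class ConsecutiveMistakeError extends Error {
  constructor(message: string) {
    super(message)
    this.name = "ConsecutiveMistakeError"
  }
}

// Hypothetical tracker: counts turns in a row that ended without a tool call.
class MistakeTracker {
  private count = 0

  recordTurn(usedTool: boolean): void {
    if (usedTool) {
      // Assumption: a successful tool call clears the streak.
      this.count = 0
      return
    }
    this.count += 1
    if (this.count >= MISTAKE_LIMIT) {
      throw new ConsecutiveMistakeError(`no tool used in ${this.count} consecutive turns`)
    }
  }
}
```

Under this model a single plain-text reply is recoverable, but a thinking model that keeps answering in prose reaches the limit on its third consecutive turn.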

Fix (src/api/providers/deepseek.ts): Override tool_choice from "auto" to "required" inside DeepSeekHandler.createMessage when isThinkingModel is true and tools are present. Since attempt_completion is always included as a tool, every turn must end with a tool call — "required" is semantically correct here. Scoped to thinking models only to minimise blast radius; #73 tracks the global fix.
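A minimal sketch of the override, assuming simplified request types. `ChatRequest` and `applyThinkingToolChoice` are hypothetical names used for illustration; the actual change lives inside `DeepSeekHandler.createMessage` and may be shaped differently.

```typescript
// Simplified, hypothetical request shape; the real handler uses the SDK's types.
type ToolChoice = "auto" | "required" | "none"

interface ChatRequest {
  model: string
  tools?: { type: "function"; function: { name: string } }[]
  tool_choice?: ToolChoice
}

// Returns a copy of the request with tool_choice forced to "required" when
// the model is a thinking model, tools are present, and "auto" was requested.
// Every other request is returned unchanged.
function applyThinkingToolChoice(request: ChatRequest, isThinkingModel: boolean): ChatRequest {
  const hasTools = (request.tools?.length ?? 0) > 0
  if (isThinkingModel && hasTools && request.tool_choice === "auto") {
    return { ...request, tool_choice: "required" }
  }
  return request
}
```

Keeping the condition this narrow means non-thinking models and tool-less requests are untouched, which matches the stated goal of limiting the blast radius.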

E2E regression suite (apps/vscode-e2e/src/suite/providers/deepseek-v4.test.ts): Four tests covering deepseek-v4-flash and deepseek-v4-pro × reasoning on/off. Each test verifies:

  • Correct model is requested
  • max_completion_tokens is set
  • thinking and reasoning_effort params match the reasoning toggle
  • Task completes via the read_file → attempt_completion tool loop
  • Exact marker value is returned (no hallucination)

Fixtures use toolCallId matching for turn-2 to avoid cross-test count contamination from aimock's global match counters (documented in AGENTS.md).
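The idea behind toolCallId matching can be illustrated with a stateless matcher. This sketch does not use aimock's API: `Message`, `Fixture`, `matchesToolCallId`, and `pickFixture` are hypothetical, and the IDs are made up. The point is that a turn-2 fixture keyed on the tool result's ID selects the same response no matter how many requests earlier tests have sent, whereas a "second matching request" rule depends on shared counters.

```typescript
// Hypothetical, simplified message and fixture shapes.
interface Message {
  role: "user" | "assistant" | "tool"
  content?: string
  tool_call_id?: string
}

interface Fixture {
  name: string
  matches: (messages: Message[]) => boolean
}

// Turn 2 is recognised by the tool result it carries, not by request order.
function matchesToolCallId(id: string): (messages: Message[]) => boolean {
  return (messages) => messages.some((m) => m.role === "tool" && m.tool_call_id === id)
}

// First fixture whose predicate matches wins; no counters are consulted.
function pickFixture(fixtures: Fixture[], messages: Message[]): Fixture | undefined {
  return fixtures.find((f) => f.matches(messages))
}

const fixtures: Fixture[] = [
  { name: "turn-2", matches: matchesToolCallId("call_flash_on_1") },
  { name: "turn-1", matches: () => true },
]
```

Listing the more specific turn-2 fixture first lets the catch-all turn-1 fixture serve the opening request of the same test.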

Test Procedure

Mock mode (no API key needed):

USE_MOCK=true TEST_FILE=deepseek-v4.test pnpm --filter @roo-code/vscode-e2e test:run
# Expected: 4 passing

Record mode (requires DeepSeek API key):

DEEPSEEK_API_KEY=<key> TEST_FILE=deepseek-v4.test pnpm --filter @roo-code/vscode-e2e test:record

Unit tests:

cd src && pnpm test -- deepseek.spec

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see above).
  • Scope: Changes are focused on the linked issue.
  • Self-Review: Thorough self-review performed.
  • Testing: New and updated tests added.
  • Documentation Impact: No user-facing docs require updates; AGENTS.md updated with fixture guidance.
  • Contribution Guidelines: Read and agreed.

Screenshots / Videos

N/A — backend provider change, no UI impact.

Documentation Updates

  • No documentation updates required.

apps/vscode-e2e/AGENTS.md updated with multi-turn fixture and toolCallId matching guidance.

Additional Notes

If another provider shows similar no_tools_used flakiness, apply the same targeted override pattern and reference #73.

@roomote roomote Bot added the roomote:auto-resolve-conflicts Allow Roomote to auto-resolve merge conflicts for this PR label May 11, 2026

roomote Bot commented May 11, 2026

1 issue outstanding. Action required. See task

  • Tighten the DeepSeek fetch interceptor to match the real host or origin instead of url.includes("api.deepseek.com"); the current substring check trips CodeQL and can match arbitrary URLs.
  • Treat DEEPSEEK_API_KEY as a live-provider credential in apps/vscode-e2e/src/runTest.ts; otherwise TEST_FILE=deepseek-v4.test pnpm --filter @roo-code/vscode-e2e test:ci still forces aimock replay instead of exercising the real DeepSeek API.
  • CI is still pending after the review wait window: platform-unit-test (windows-latest).
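One way to tighten the interceptor check from the first item is to parse the URL and compare the hostname exactly. This is a sketch only: `isDeepSeekRequest` and `DEEPSEEK_HOST` are hypothetical names, and the interceptor in the test file may need to compare the full origin instead.

```typescript
const DEEPSEEK_HOST = "api.deepseek.com"

// True only when the request's parsed hostname is exactly the DeepSeek API host.
// Unparseable input is treated as a non-match rather than an error.
function isDeepSeekRequest(rawUrl: string): boolean {
  try {
    return new URL(rawUrl).hostname === DEEPSEEK_HOST
  } catch {
    return false
  }
}
```

Unlike `url.includes("api.deepseek.com")`, this rejects URLs that merely contain the string in a query parameter, a path, or a longer attacker-controlled hostname.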

Comment thread apps/vscode-e2e/src/suite/providers/deepseek-v4.test.ts Fixed

codecov Bot commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Comment thread apps/vscode-e2e/src/suite/providers/deepseek-v4.test.ts Outdated
const isDeepSeekTest = testFile?.includes("deepseek-v4") === true

if (isRecord && !process.env.OPENROUTER_API_KEY) {
if (isRecord && isDeepSeekTest && !process.env.DEEPSEEK_API_KEY) {

This adds the DeepSeek-specific record-mode key check, but the useMock gate below still ignores DEEPSEEK_API_KEY. Running DEEPSEEK_API_KEY=... TEST_FILE=deepseek-v4.test pnpm --filter @roo-code/vscode-e2e test:ci still starts aimock and injects AIMOCK_URL, so the new suite replays fixtures instead of exercising the live provider this PR is meant to validate. The DeepSeek key needs to count as a real-provider credential for DeepSeek-targeted runs.
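The gate this comment asks for might look like the sketch below. It is an assumption about the shape of the logic, not the contents of `runTest.ts`: `Env`, `hasLiveCredential`, and `shouldUseMock` are hypothetical names, and treating `USE_MOCK=true` as an unconditional override is inferred from the mock-mode command in the test procedure.

```typescript
// Hypothetical view of process.env, passed in so the logic is easy to test.
interface Env {
  [key: string]: string | undefined
}

// A DeepSeek-targeted run counts DEEPSEEK_API_KEY as its live credential;
// other runs keep relying on OPENROUTER_API_KEY.
function hasLiveCredential(env: Env, testFile: string | undefined): boolean {
  const isDeepSeekTest = testFile?.includes("deepseek-v4") === true
  return isDeepSeekTest ? Boolean(env.DEEPSEEK_API_KEY) : Boolean(env.OPENROUTER_API_KEY)
}

// Replay fixtures when mock mode is forced or no live credential is available.
function shouldUseMock(env: Env, testFile: string | undefined): boolean {
  if (env.USE_MOCK === "true") return true
  return !hasLiveCredential(env, testFile)
}
```

With a gate like this, a DeepSeek-targeted run that supplies `DEEPSEEK_API_KEY` would skip aimock and reach the live provider, while runs without the key would keep replaying fixtures.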

@edelauna edelauna changed the title [Chore] Add DeepSeek V4 agentic regression suite fix(deepseek): force tool_choice=required for thinking models + e2e regression suite May 11, 2026
@edelauna edelauna marked this pull request as ready for review May 11, 2026 16:32
@edelauna edelauna requested a review from hannesrudolph as a code owner May 11, 2026 16:32
@edelauna edelauna force-pushed the feature/deepseek-v4-e2e-029jkypsvzc50 branch from 6189beb to cdae5dd Compare May 11, 2026 16:32

3 participants