docs(examples): add chat reasoning_effort examples #511
Open

andreaonofrei01 wants to merge 4 commits into main from
Conversation
Add four examples in `examples/mistral/chat/` demonstrating `reasoning_effort` on `mistral-medium-3-5`:

- `reasoning_response_shape.py`: dump the raw response shape for `reasoning_effort="high"` vs `"none"` so users see the ThinkChunk / TextChunk JSON before consuming it.
- `reasoning.py`: single-turn call; iterate ThinkChunk and TextChunk in `message.content`.
- `reasoning_with_streaming.py`: handle streaming deltas, where chunks arrive as ThinkChunk lists during thinking and as plain string fragments after thinking ends.
- `reasoning_multi_turn.py`: 3-turn math chain run with two replay strategies (keep vs drop ThinkChunks); prints token usage so the cost difference is visible.
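The ThinkChunk/TextChunk branching that `reasoning.py` demonstrates can be sketched offline. The dataclasses below are stand-ins for the SDK's real types (the field names `thinking` and `text` are assumptions; check the classes shipped in the `mistralai` package), so the snippet runs without an API key:

```python
from dataclasses import dataclass

# Offline stand-ins for the SDK's ThinkChunk / TextChunk.
# Field names are assumptions, not the SDK's guaranteed schema.
@dataclass
class ThinkChunk:
    thinking: str

@dataclass
class TextChunk:
    text: str

def split_content(content):
    """Separate the reasoning trace from the final answer.

    With reasoning_effort="none", message.content is a plain str,
    so there is nothing to split.
    """
    if isinstance(content, str):
        return "", content
    reasoning, answer = [], []
    for chunk in content:
        if isinstance(chunk, ThinkChunk):
            reasoning.append(chunk.thinking)
        elif isinstance(chunk, TextChunk):
            answer.append(chunk.text)
    return "".join(reasoning), "".join(answer)

print(split_content([ThinkChunk("17 * 23 = 391"), TextChunk("391")]))
# prints ('17 * 23 = 391', '391')
```

In the real examples the list would come from `response.choices[0].message.content`; the split logic is the same either way.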
louis-sanna-dev approved these changes on May 4, 2026
- `reasoning_multi_turn.py`: drop the keep-vs-drop comparison and recommend keeping ThinkChunks across turns. Per reviewer feedback, dropping the reasoning trace degrades MM3.5 performance.
- All four files: replace `httpx.Client(timeout=...)` with the SDK's `timeout_ms` parameter; remove the `httpx` import.
Avoids the unnecessary `AssistantMessage(content=content)` re-wrap and forwards any future fields on `AssistantMessage` automatically. Verified end-to-end: 3/3 runs of the math chain produce 391 → 1173 → 1073, and history inspection confirms each `AssistantMessage` slot preserves `[ThinkChunk, TextChunk]`.
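The no-re-wrap replay can be illustrated with a stand-in message class (hypothetical; the real `AssistantMessage` lives in the SDK, and the `prefix` field here merely models "any future fields"): appending the response message object itself preserves every field, while rebuilding it from `content` alone silently drops the rest.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the SDK's AssistantMessage.
# `prefix` plays the role of a field the SDK might add later.
@dataclass
class FakeAssistantMessage:
    content: list
    prefix: bool = False

history = [{"role": "user", "content": "What is 17 * 23?"}]
msg = FakeAssistantMessage(content=["...thinking...", "391"], prefix=True)

# Replay the message object verbatim: every field survives.
history.append(msg)

# The re-wrap this commit removes would reset extra fields to defaults.
rewrapped = FakeAssistantMessage(content=msg.content)
print(history[-1].prefix, rewrapped.prefix)  # prints: True False
```

The design point is simply that the response message is already a valid history entry, so there is nothing to gain by reconstructing it.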
Summary
Adds four runnable examples in `examples/mistral/chat/` demonstrating `reasoning_effort` on `mistral-medium-3-5`. The SDK already supports the parameter and the ThinkChunk/TextChunk types, but there were no examples showing how to use them.

Files
- `reasoning_response_shape.py`: first stop. Calls the API once with `reasoning_effort="high"` and once with `"none"`, then dumps the raw `message.content` so the reader can see the ThinkChunk / TextChunk JSON before consuming it.
- `reasoning.py`: single-turn call. Iterates `message.content` and branches on `isinstance(chunk, ThinkChunk | TextChunk)`. Notes that `reasoning_effort="none"` returns a plain `str`.
- `reasoning_with_streaming.py`: the streaming version. Handles the three shapes that arrive on `delta.content`: ThinkChunk lists during the thinking phase, a transition list containing both a closing ThinkChunk and the first TextChunk, and plain string fragments after thinking ends.
- `reasoning_multi_turn.py`: 3-turn math chain (17 × 23 → × 3 → −100) run with two replay strategies (keep ThinkChunks vs drop them); prints per-turn token usage so the cost difference is visible.

Notes
- `httpx.Timeout(300.0)` because reasoning runs routinely exceed the default 60s read timeout.
- `mistral-medium-3-5` (the model docs explicitly list it as supporting `reasoning_effort`).

Test plan
- `python examples/mistral/chat/reasoning_response_shape.py`: verified output matches the documented ThinkChunk/TextChunk schema; `effort="none"` returns `str`.
- `python examples/mistral/chat/reasoning.py`: verified the John puzzle returns "John is 22 years old."
- `python examples/mistral/chat/reasoning_with_streaming.py`: verified the trains problem returns "11:16:40 am" and the thinking → final-answer transition renders cleanly.
- `python examples/mistral/chat/reasoning_multi_turn.py`: verified both strategies arrive at 1073; observed totals of ≈1656 tokens (keep) vs ≈637 tokens (drop), varying with temperature=0.7.
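As a footnote to the streaming test above: the three delta shapes `reasoning_with_streaming.py` handles can all be funneled through one normalizer. The chunk classes below are offline stand-ins for the SDK types (field names are assumptions), so this sketch runs without network access:

```python
from dataclasses import dataclass

# Offline stand-ins for the SDK chunk types (field names assumed).
@dataclass
class ThinkChunk:
    thinking: str

@dataclass
class TextChunk:
    text: str

def render_delta(delta_content):
    """Turn one streaming delta into printable text.

    Covers the three shapes described under Files: chunk lists during
    the thinking phase, the mixed transition list (closing ThinkChunk
    plus first TextChunk), and bare string fragments after thinking.
    """
    if isinstance(delta_content, str):  # after thinking ends
        return delta_content
    parts = []
    for chunk in delta_content or []:   # None-safe for empty deltas
        if isinstance(chunk, ThinkChunk):
            parts.append(chunk.thinking)
        elif isinstance(chunk, TextChunk):
            parts.append(chunk.text)
    return "".join(parts)

# Simulated stream: thinking phase, transition, then plain fragments.
deltas = [
    [ThinkChunk("Let the trains meet at time t...")],
    [ThinkChunk(" done."), TextChunk("11:16")],
    ":40 am",
]
print("".join(render_delta(d) for d in deltas))
```

In the real example each `delta_content` would come from an event's `data.choices[0].delta.content`; the branching is identical.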