Fix: agent engine sandbox code executor gemini 2.x incompatibility #5499
Open
JesserHamdaoui wants to merge 13 commits into google:main from
Steering prompt that tells Gemini 2.x to wrap code in tool_code fences instead of emitting native executable_code parts when no code_execution tool is declared on the request.
…L_CALL_RE constants Frozenset of Gemini 2.x error codes that indicate a native code_execution tool call was rejected, and a regex to extract the code payload from the error message.
Parses the code payload out of a Gemini UNEXPECTED_TOOL_CALL rejection error message using _UNEXPECTED_TOOL_CALL_RE.
Reconstructs the executable_code part that Gemini 2.x intended to emit when the API rejected the response with UNEXPECTED_TOOL_CALL or MALFORMED_FUNCTION_CALL, allowing the sandbox executor pipeline to proceed normally.
…-in executors Appends _NON_BUILTIN_EXECUTOR_INSTRUCTION to every LLM request that uses a non-BuiltInCodeExecutor, steering Gemini 2.x to output code in tool_code markdown fences rather than native executable_code parts which the API rejects as UNEXPECTED_TOOL_CALL.
…ost-processor When Gemini 2.x emits a native code_execution call and the API rejects it, llm_response.content is empty. For non-built-in executors, attempt to reconstruct the executable_code part from the error message via _maybe_recover_from_api_rejection so the sandbox executor pipeline can still run the code.
…ni-2.x-incompatibility
Collaborator
Hi @JesserHamdaoui, thank you for your contribution! We appreciate you taking the time to submit this pull request. Could you please fix the failing mypy-diff tests so that we can proceed with the review?
Contributor
Author
Hi @rohityan,
Problem:
AgentEngineSandboxCodeExecutor fails with Gemini 2.x models when the agent is asked to execute Python code. The observed errors are UNEXPECTED_TOOL_CALL (when no other tools are registered) or MALFORMED_FUNCTION_CALL (when other tools are present).

The failure is a contract mismatch across three layers that must agree for code execution to succeed:

- AgentEngineSandboxCodeExecutor: expects code in ```python / ```tool_code markdown fences inside text parts.
- extract_code_and_truncate_content: handles executable_code parts correctly; this layer is not the problem.
- Vertex AI API validator: an executable_code part is only valid if the request explicitly declared Tool(code_execution=ToolCodeExecution()); otherwise the response is rejected before ADK ever sees it.

Gemini 2.x models appear to be post-trained to satisfy "execute Python" requests by emitting structured native executable_code parts rather than markdown text. Because AgentEngineSandboxCodeExecutor does not declare the code_execution tool in the outgoing request (it was designed to receive markdown), the Vertex AI API validator rejects the model's response and returns content=null. The ADK post-processor never receives anything to route to the sandbox.

This is why passing code_executor=AgentEngineSandboxCodeExecutor(...) to the agent constructor does not help: code_executor is an ADK-side construct only. It tells ADK where to send code once a response arrives; it does not communicate anything to the Vertex AI API server, which has no knowledge of the attached sandbox and enforces the tool-declaration contract at response validation time.

Solution:
Two complementary fixes are applied entirely within google/adk/flows/llm_flows/_code_execution.py, with no changes to any other ADK file and no public API changes:

Layer 1 - Pre-processor steering (_run_pre_processor): When the configured executor is a BaseCodeExecutor but not a BuiltInCodeExecutor, append a system instruction (_NON_BUILTIN_EXECUTOR_INSTRUCTION) to every outgoing request. The instruction explicitly tells the model to wrap Python code in ```tool_code markdown fences and forbids native executable_code emission, reducing the frequency with which the model triggers the API validator.

Layer 2 - Response-processor recovery (_run_post_processor): When the API still rejects the response with UNEXPECTED_TOOL_CALL or MALFORMED_FUNCTION_CALL, parse the rejected code out of error_message via _extract_code_from_error_message, reconstruct a synthetic executable_code part on llm_response.content, clear the error fields, and let the existing extract_code_and_truncate_content → code_executor.execute_code pipeline handle it exactly as if the model had emitted the part cleanly. Note that extract_code_and_truncate_content already supports executable_code parts; this recovery path simply gives it the chance to run.

Together, Layer 1 stops most rejections at the request side and Layer 2 rescues the cases where the model still emits a native tool call despite the steering, so that the full User → Model → Executor → Sandbox flow completes on Gemini 2.x.
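The Layer 2 recovery described above can be sketched roughly as follows. This is an illustration only, written against plain stand-in types rather than the real ADK and google-genai classes. The error-message layout matched by REJECTED_CALL_RE, the StubLlmResponse class, and the dict-shaped part are all assumptions; the PR's actual _UNEXPECTED_TOOL_CALL_RE pattern and field handling may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Error codes treated as "the API rejected a native code_execution call".
RECOVERABLE_ERROR_CODES = frozenset(
    {"UNEXPECTED_TOOL_CALL", "MALFORMED_FUNCTION_CALL"}
)

# ASSUMPTION: the rejection message looks like "<prefix>: <code>".
# The real message format is not shown in this PR page.
REJECTED_CALL_RE = re.compile(
    r"(?:Unexpected tool call|Malformed function call):\s*(?P<code>.+)",
    re.DOTALL,
)


@dataclass
class StubLlmResponse:
    """Hypothetical stand-in for the ADK response object."""

    content: Optional[list] = None
    error_code: Optional[str] = None
    error_message: Optional[str] = None


def extract_code_from_error_message(message: Optional[str]) -> Optional[str]:
    """Return the rejected code payload, or None if it cannot be parsed."""
    if not message:
        return None
    match = REJECTED_CALL_RE.search(message)
    if match is None:
        return None
    code = match.group("code").strip()
    return code or None


def maybe_recover_from_api_rejection(response: StubLlmResponse) -> bool:
    """Rebuild an executable_code part in place; return True on success."""
    if response.error_code not in RECOVERABLE_ERROR_CODES:
        return False
    code = extract_code_from_error_message(response.error_message)
    if code is None:
        return False
    # Synthetic part, shaped as a plain dict for this sketch.
    response.content = [{"executable_code": {"language": "PYTHON", "code": code}}]
    # Clear the error so downstream processing treats the response as clean.
    response.error_code = None
    response.error_message = None
    return True
```

In this sketch the function returns a bool and leaves the response untouched on failure, so a caller can fall through to its normal error handling whenever the code is unrecognised or the message cannot be parsed.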
Testing Plan
Unit Tests:
Four new test groups were added to tests/unittests/flows/llm_flows/test_code_execution.py:

- test_extract_code_from_error_message_*: None input, non-matching error message
- test_maybe_recover_from_api_rejection_*: UNEXPECTED_TOOL_CALL recovery, MALFORMED_FUNCTION_CALL recovery, unrecognised error code, missing error code, unparseable message
- test_pre_processor_injects_instruction_*: covers the BuiltInCodeExecutor case
- test_post_processor_recovery_*: BuiltInCodeExecutor skips the recovery path entirely

Manual End-to-End (E2E) Tests:
Please refer to #5315 for the full reproduction script and setup steps. Using the reproduction case from that issue, the fix was verified against a live Vertex AI Agent Engine sandbox.
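For the pre-processor group listed under Unit Tests, the cases could take roughly the following shape. The executor classes, StubRequest, inject_steering, and the instruction text are hypothetical stand-ins so the sketch runs without ADK installed; the tests in the PR presumably exercise _run_pre_processor and _NON_BUILTIN_EXECUTOR_INSTRUCTION directly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Stand-ins for the ADK executor hierarchy (names mirror the PR text).
class BaseCodeExecutor:
    pass


class BuiltInCodeExecutor(BaseCodeExecutor):
    pass


class SandboxExecutor(BaseCodeExecutor):
    """Plays the role of AgentEngineSandboxCodeExecutor in this sketch."""


# ASSUMPTION: illustrative wording, not the PR's actual instruction text.
STEERING_INSTRUCTION = (
    "Wrap Python code in tool_code markdown fences; "
    "do not emit native executable_code parts."
)


@dataclass
class StubRequest:
    """Hypothetical request holding a list of system instructions."""

    instructions: List[str] = field(default_factory=list)


def inject_steering(
    request: StubRequest, executor: Optional[BaseCodeExecutor]
) -> None:
    """Append the steering instruction only for non-built-in executors."""
    if isinstance(executor, BaseCodeExecutor) and not isinstance(
        executor, BuiltInCodeExecutor
    ):
        request.instructions.append(STEERING_INSTRUCTION)


def test_injects_for_non_builtin_executor():
    request = StubRequest()
    inject_steering(request, SandboxExecutor())
    assert request.instructions == [STEERING_INSTRUCTION]


def test_skips_builtin_executor():
    request = StubRequest()
    inject_steering(request, BuiltInCodeExecutor())
    assert request.instructions == []


def test_skips_when_no_executor():
    request = StubRequest()
    inject_steering(request, None)
    assert request.instructions == []
```

The functions follow the pytest naming convention, so they can be collected by pytest or called directly.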