fix(optimization): handle None metric scores in LocalEvalSampler#5415

Open
JesserHamdaoui wants to merge 4 commits into google:main from JesserHamdaoui:fix/5403-LocalEvalSampler-TypeError

Conversation

@JesserHamdaoui

Fixes #5403


Summary

When running adk optimize, if a metric evaluation fails (e.g., due to a transient API error, missing rubrics, or a malformed LLM-judge response that raises JSONDecodeError), local_eval_service.py gracefully catches the exception and returns an EvaluationResult with a None score and NOT_EVALUATED status.

However, LocalEvalSampler._extract_eval_data then unconditionally rounds this value, raising TypeError: type NoneType doesn't define __round__ method and crashing the entire optimization loop rather than safely skipping or reporting the failed case.
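The failure mode is easy to reproduce in isolation; Python's built-in round has no fallback for None:

```python
# Minimal reproduction of the crash: round() cannot handle None,
# which is exactly what a NOT_EVALUATED metric result carries.
score = None  # stand-in for eval_metric_result.score after a failed eval

try:
    rounded = round(score, 2)
except TypeError as e:
    print(e)  # type NoneType doesn't define __round__ method
```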

Changes

  • google/adk/optimization/local_eval_sampler.py: Guarded the metric score rounding step in _extract_eval_data.
    • Before: "score": round(eval_metric_result.score, 2)
    • After: "score": round(eval_metric_result.score, 2) if eval_metric_result.score is not None else None
    • This preserves the None value in the diagnostic trace data for failed evals.
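A minimal sketch of the guarded extraction. EvalMetricResult here is an illustrative stand-in with only the fields relevant to the fix, and the dict keys other than "score" are assumptions, not the real _extract_eval_data output:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for ADK's eval metric result; only the fields
# relevant to the fix are modeled here.
@dataclass
class EvalMetricResult:
    metric_name: str
    score: Optional[float]
    eval_status: str  # e.g. "PASSED", "FAILED", "NOT_EVALUATED"

def extract_eval_data(result: EvalMetricResult) -> dict:
    # The fix: only round when a score was actually produced.
    return {
        "metric": result.metric_name,
        "score": round(result.score, 2) if result.score is not None else None,
        "status": result.eval_status,
    }

print(extract_eval_data(EvalMetricResult("tool_use", 0.7512, "PASSED")))
# {'metric': 'tool_use', 'score': 0.75, 'status': 'PASSED'}
print(extract_eval_data(EvalMetricResult("tool_use", None, "NOT_EVALUATED")))
# {'metric': 'tool_use', 'score': None, 'status': 'NOT_EVALUATED'}
```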

Huge shoutout to the issue author @msteiner-google for the detailed bug report, root cause analysis, and for suggesting the fix!


Motivation

Optimization loops can run for a long time and make dozens of LLM calls. If a single evaluation case fails due to an intermittent network issue or a temporary rate limit, the NOT_EVALUATED status is the correct fallback. Crashing the entire adk optimize run because of a missing None check wastes compute, time, and API quotas. By preserving None, the optimizer can safely continue and log that the metric did not produce a score.


Test plan

Unit Tests:

  • Added test_extract_eval_data_preserves_none_metric_score in tests/unittests/optimization/local_eval_sampler_test.py to verify that _extract_eval_data preserves "score": None and retains the proper NOT_EVALUATED status without throwing a TypeError.
  • Ran targeted test with uv run pytest tests/unittests/optimization/local_eval_sampler_test.py::test_extract_eval_data_preserves_none_metric_score -q (Result: 1 passed).

Manual Reproduction & Verification:

  • Simulated the failure: Created a local script to intentionally trigger the bug by forcing a None score during the evaluation step.
  • Verified the fix: Ran the simulation against the updated code. Before the fix, the script consistently crashed with TypeError: type NoneType doesn't define __round__ method. After applying the fix in this PR, the optimizer safely handled the None scores and ran to completion without crashing.

Used the hello_world example from the provided samples and followed the optimization documentation, then added a patch_and_run.py file in my local environment to force the eval failure:

# Simulated the issue by triggering an eval failure to force None scores
# and verifying the optimizer handles it gracefully.
import asyncio
import os

# ADK imports (LocalEvalSampler, LocalEvalSamplerConfig, EvalConfig,
# LocalEvalSetsManager, the GEPA optimizer classes) and the hello_world
# `agent` module are imported as in the optimization docs; omitted here.

sampler_config = LocalEvalSamplerConfig(
    eval_config=EvalConfig(
        criteria={"rubric_based_tool_use_quality_v1": 0.75}  # or a metric missing rubrics
    ),
    app_name="hello_world",
    train_eval_set="train_eval_set",
)
sampler = LocalEvalSampler(
    sampler_config,
    LocalEvalSetsManager(agents_dir=os.path.dirname(os.getcwd())),
)

opt_config = GEPARootAgentPromptOptimizerConfig(max_metric_calls=5)
optimizer = GEPARootAgentPromptOptimizer(config=opt_config)

# Before PR: crashes with TypeError on None. After PR: runs successfully.
result = asyncio.run(optimizer.optimize(agent.root_agent, sampler))

@JesserHamdaoui JesserHamdaoui changed the title Fix/5403 local eval sampler type error fix(optimization): handle None metric scores in LocalEvalSampler Apr 20, 2026
@adk-bot adk-bot added the eval [Component] This issue is related to evaluation label Apr 20, 2026
@rohityan rohityan self-assigned this Apr 20, 2026
@rohityan
Collaborator

Hi @JesserHamdaoui, thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.

@rohityan rohityan added the needs review [Status] The PR/issue is awaiting review from the maintainer label Apr 24, 2026
@rohityan
Collaborator

Hi @DeanChensj, can you please review this?


Labels

eval [Component] This issue is related to evaluation needs review [Status] The PR/issue is awaiting review from the maintainer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TypeError in LocalEvalSampler when metric evaluation fails

4 participants