Skip to content

refactor(engine): pass Python worker startup arguments by name#5597

Open
yangzhang75 wants to merge 2 commits into
apache:mainfrom
yangzhang75:refactor/python-worker-named-startup-args
Open

refactor(engine): pass Python worker startup arguments by name#5597
yangzhang75 wants to merge 2 commits into
apache:mainfrom
yangzhang75:refactor/python-worker-named-startup-args

Conversation

@yangzhang75

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

Passes Python worker startup configuration by name instead of by argv position, as proposed in the issue.

PythonWorkflowWorker (JVM) previously built ~19 positional command-line arguments, and texera_run_python_worker.py unpacked them positionally. Because the two sides agreed only by index, adding/removing/reordering one argument could silently misassign values.

  • PythonWorkflowWorker.scala: build a single JSON object of named startup-config keys and pass it as one argument.
  • texera_run_python_worker.py: parse that JSON and read each value by key; a missing or renamed key now raises a clear KeyError instead of silently misaligning.

The set of keys written on the Scala side and read on the Python side is identical (19 keys). No behavior change otherwise.

Any related issues, documentation, discussions?

Closes #5547

How was this PR tested?

  • Verified the JSON key set written by PythonWorkflowWorker.scala exactly matches the keys read in texera_run_python_worker.py (19 keys, no drift).
  • WorkflowExecutionService/compile (amber) succeeds.
  • scalafmtCheckAll passes; scalafix rules (RemoveUnused, ProcedureSyntax) are satisfied (the new import is used).
  • Python entry script passes py_compile.
  • End-to-end worker launch is exercised by the existing CI integration jobs (amber-integration), which start a real Python worker.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

@codecov-commenter

codecov-commenter commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 18.75000% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.39%. Comparing base (07ca5d4) to head (72affa3).

Files with missing lines Patch % Lines
...chitecture/pythonworker/PythonWorkflowWorker.scala 0.00% 25 Missing ⚠️
amber/src/main/python/texera_run_python_worker.py 85.71% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5597      +/-   ##
============================================
+ Coverage     52.38%   52.39%   +0.01%     
+ Complexity     2484     2480       -4     
============================================
  Files          1070     1070              
  Lines         41359    41369      +10     
  Branches       4441     4441              
============================================
+ Hits          21666    21677      +11     
+ Misses        18427    18423       -4     
- Partials       1266     1269       +3     
Flag Coverage Δ *Carryforward flag
access-control-service 64.61% <ø> (ø) Carriedforward from 1e6d0ca
agent-service 33.76% <ø> (ø) Carriedforward from 1e6d0ca
amber 53.25% <0.00%> (-0.06%) ⬇️
computing-unit-managing-service 1.65% <ø> (ø) Carriedforward from 1e6d0ca
config-service 56.06% <ø> (ø) Carriedforward from 1e6d0ca
file-service 38.21% <ø> (ø) Carriedforward from 1e6d0ca
frontend 46.91% <ø> (ø) Carriedforward from 1e6d0ca
pyamber 91.04% <85.71%> (+0.32%) ⬆️
python 90.82% <ø> (+0.06%) ⬆️ Carriedforward from 1e6d0ca
workflow-compiling-service 58.69% <ø> (ø) Carriedforward from 1e6d0ca

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xuang7

xuang7 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Could you assign a reviewer for this PR? @chenlica

@Yicong-Huang Yicong-Huang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change. We need to add tests to guard the behavior.

@yangzhang75

Copy link
Copy Markdown
Contributor Author

@Yicong-Huang I added unit tests for the mapping and missing-key behavior in the latest commit, CI's green. Could you take another look? Thanks!

Comment on lines +101 to +112
@pytest.mark.parametrize("missing_key", sorted(_full_config().keys()))
def test_main_raises_keyerror_when_a_field_is_missing(missing_key):
"""A missing/renamed key fails loudly rather than being silently misassigned."""
config = _full_config()
del config[missing_key]
with (
mock.patch.object(entry, "StorageConfig"),
mock.patch.object(entry, "PythonWorker"),
mock.patch.object(entry, "init_loguru_logger"),
):
with pytest.raises(KeyError):
entry.main(json.dumps(config))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to guard it a big tighter, so that we know once a drift happens.

please include tests about

  1. having extra configs. -> should fail
  2. having wrong types. -> should fail
  3. having wrong orders -> should succeed as json/dict is used
  4. having duplicate configs -> should fail on scala side. (do we have tests for scala side yet?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pass Python worker startup arguments by name instead of by position

4 participants