Fix job execution duration when runner assign time is not set#4472
Fix job execution duration when runner assign time is not set#4472nikola-jokic wants to merge 1 commit intomasterfrom
Conversation
|
Hello! Thank you for your contribution. Please review our contribution guidelines to understand the project's testing and code conventions. |
There was a problem hiding this comment.
Pull request overview
Fixes an edge case in gha_job_execution_duration_seconds where jobs completed before runner assignment could yield an epoch-sized duration.
Changes:
- Add a
RunnerAssignTime.IsZero()check when computing job execution duration. - Record an execution duration of
0when the runner was never assigned.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // If the runner assign time is not set, it means that the job is never picked up by the runner. | ||
| // Set execution duration to 0, to observe that the job is completed without being executed. |
There was a problem hiding this comment.
The added comment is a bit misleading/grammatically off: this case means the job was never picked up by a runner, and the wording “Set execution duration to 0, to observe…” reads awkwardly. Consider rephrasing to past tense and clarifying that there was no execution phase (especially if you keep recording 0s).
| // If the runner assign time is not set, it means that the job is never picked up by the runner. | |
| // Set execution duration to 0, to observe that the job is completed without being executed. | |
| // If the runner assign time is not set, the job was never picked up by a runner. | |
| // Leave execution duration at 0 to record that the job completed without an execution phase. |
| var executionDuration int64 | ||
| if !msg.RunnerAssignTime.IsZero() { | ||
| executionDuration = msg.FinishTime.Unix() - msg.RunnerAssignTime.Unix() | ||
| } | ||
|
|
||
| e.observeHistogram(MetricJobExecutionDurationSeconds, l, float64(executionDuration)) |
There was a problem hiding this comment.
This change fixes a production-facing metric edge case, but there are currently no unit tests covering RecordJobCompleted. Please add a test for the zero RunnerAssignTime scenario to ensure we don’t regress back to recording an epoch-sized duration (and to lock in the intended behavior: either skipping the observation or recording a specific value).
| // If the runner assign time is not set, it means that the job is never picked up by the runner. | ||
| // Set execution duration to 0, to observe that the job is completed without being executed. | ||
| var executionDuration int64 | ||
| if !msg.RunnerAssignTime.IsZero() { | ||
| executionDuration = msg.FinishTime.Unix() - msg.RunnerAssignTime.Unix() | ||
| } | ||
|
|
There was a problem hiding this comment.
Recording an execution duration of 0 when RunnerAssignTime is zero makes the gha_job_execution_duration_seconds histogram include jobs that never actually executed on a runner. This will show up as a spike in the smallest bucket (<=0.01s) and can still skew the bundled Grafana heatmap which sums buckets without filtering job_result (see docs/gha-runner-scale-set-controller/samples/grafana-dashboard/ARC-Autoscaling-Runner-Set-Monitoring.json:256). Consider skipping the histogram observation entirely when RunnerAssignTime.IsZero() (early return after incrementing the completed counter), or emit a separate metric for “completed before runner assignment” instead of encoding it as 0s execution time.
| // If the runner assign time is not set, it means that the job is never picked up by the runner. | |
| // Set execution duration to 0, to observe that the job is completed without being executed. | |
| var executionDuration int64 | |
| if !msg.RunnerAssignTime.IsZero() { | |
| executionDuration = msg.FinishTime.Unix() - msg.RunnerAssignTime.Unix() | |
| } | |
| // If the runner assign time is not set, it means that the job was never picked up by a runner. | |
| // Skip recording execution duration so jobs that never executed do not pollute the histogram. | |
| if msg.RunnerAssignTime.IsZero() { | |
| return | |
| } | |
| executionDuration := msg.FinishTime.Unix() - msg.RunnerAssignTime.Unix() |
Fixes #4444