Skip to content

Fix job execution duration when runner assign time is not set#4472

Open
nikola-jokic wants to merge 1 commit intomasterfrom
nikola-jokic/no-runner-assigned-time-job-completed
Open

Fix job execution duration when runner assign time is not set#4472
nikola-jokic wants to merge 1 commit intomasterfrom
nikola-jokic/no-runner-assigned-time-job-completed

Conversation

@nikola-jokic
Copy link
Copy Markdown
Collaborator

Fixes #4444

Copilot AI review requested due to automatic review settings April 24, 2026 08:23
@nikola-jokic nikola-jokic added the gha-runner-scale-set Related to the gha-runner-scale-set mode label Apr 24, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Hello! Thank you for your contribution.

Please review our contribution guidelines to understand the project's testing and code conventions.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an edge case in gha_job_execution_duration_seconds where jobs completed before runner assignment could yield an epoch-sized duration.

Changes:

  • Add a RunnerAssignTime.IsZero() check when computing job execution duration.
  • Record an execution duration of 0 when the runner was never assigned.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +499 to +500
// If the runner assign time is not set, it means that the job is never picked up by the runner.
// Set execution duration to 0, to observe that the job is completed without being executed.
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added comment is a bit misleading/grammatically off: this case means the job was never picked up by a runner, and the wording “Set execution duration to 0, to observe…” reads awkwardly. Consider rephrasing to past tense and clarifying that there was no execution phase (especially if you keep recording 0s).

Suggested change
// If the runner assign time is not set, it means that the job is never picked up by the runner.
// Set execution duration to 0, to observe that the job is completed without being executed.
// If the runner assign time is not set, the job was never picked up by a runner.
// Leave execution duration at 0 to record that the job completed without an execution phase.

Copilot uses AI. Check for mistakes.
Comment on lines +501 to 506
var executionDuration int64
if !msg.RunnerAssignTime.IsZero() {
executionDuration = msg.FinishTime.Unix() - msg.RunnerAssignTime.Unix()
}

e.observeHistogram(MetricJobExecutionDurationSeconds, l, float64(executionDuration))
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes a production-facing metric edge case, but there are currently no unit tests covering RecordJobCompleted. Please add a test for the zero RunnerAssignTime scenario to ensure we don’t regress back to recording an epoch-sized duration (and to lock in the intended behavior: either skipping the observation or recording a specific value).

Copilot uses AI. Check for mistakes.
Comment on lines +499 to +505
// If the runner assign time is not set, it means that the job is never picked up by the runner.
// Set execution duration to 0, to observe that the job is completed without being executed.
var executionDuration int64
if !msg.RunnerAssignTime.IsZero() {
executionDuration = msg.FinishTime.Unix() - msg.RunnerAssignTime.Unix()
}

Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recording an execution duration of 0 when RunnerAssignTime is zero makes the gha_job_execution_duration_seconds histogram include jobs that never actually executed on a runner. This will show up as a spike in the smallest bucket (<=0.01s) and can still skew the bundled Grafana heatmap which sums buckets without filtering job_result (see docs/gha-runner-scale-set-controller/samples/grafana-dashboard/ARC-Autoscaling-Runner-Set-Monitoring.json:256). Consider skipping the histogram observation entirely when RunnerAssignTime.IsZero() (early return after incrementing the completed counter), or emit a separate metric for “completed before runner assignment” instead of encoding it as 0s execution time.

Suggested change
// If the runner assign time is not set, it means that the job is never picked up by the runner.
// Set execution duration to 0, to observe that the job is completed without being executed.
var executionDuration int64
if !msg.RunnerAssignTime.IsZero() {
executionDuration = msg.FinishTime.Unix() - msg.RunnerAssignTime.Unix()
}
// If the runner assign time is not set, it means that the job was never picked up by a runner.
// Skip recording execution duration so jobs that never executed do not pollute the histogram.
if msg.RunnerAssignTime.IsZero() {
return
}
executionDuration := msg.FinishTime.Unix() - msg.RunnerAssignTime.Unix()

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gha-runner-scale-set Related to the gha-runner-scale-set mode

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gha_job_execution_duration_seconds records ~2023 years for jobs canceled before runner assignment

2 participants