
fix(metrics): drop redundant _seconds suffix on run_duration metric name #29

Merged
bdchatham merged 1 commit into `main` from `fix/run-duration-double-suffix`
Apr 28, 2026

@bdchatham bdchatham commented Apr 28, 2026

Summary

  • The metric was named `run_duration_seconds` and also annotated with `metric.WithUnit("s")`. The OTel Prometheus exporter appends `_seconds` to any metric with unit `s`, so it was exported as `seiload_run_duration_seconds_seconds`, a double suffix that broke PromQL queries assuming the canonical name.
  • Rename the metric to `run_duration` so the exporter produces `seiload_run_duration_seconds`, matching the convention already used by `block_time` in this file (`block_time` + unit `s` → `seiload_block_time_seconds`).
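
The naming behavior described above can be illustrated with a small stdlib-only model. This is a sketch, not the exporter's actual code: `exportedName` and `unitSuffixes` are hypothetical names, and treating `seiload` as a namespace prefix joined with underscores is an assumption based only on the exported names quoted in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// unitSuffixes is a simplified, hypothetical stand-in for the exporter's
// unit-to-suffix mapping; only the unit discussed in this PR is modeled.
var unitSuffixes = map[string]string{"s": "seconds"}

// exportedName models the behavior described in this PR: prefix the
// namespace and append the unit suffix whenever the unit is known,
// without checking whether the name already ends with that suffix.
func exportedName(namespace, name, unit string) string {
	parts := []string{namespace, name}
	if suffix, ok := unitSuffixes[unit]; ok {
		parts = append(parts, suffix)
	}
	return strings.Join(parts, "_")
}

func main() {
	// Old declaration: the name already carries the unit, so it is doubled.
	fmt.Println(exportedName("seiload", "run_duration_seconds", "s"))
	// New declaration: the unit appears exactly once.
	fmt.Println(exportedName("seiload", "run_duration", "s"))
	// Existing convention in the same file.
	fmt.Println(exportedName("seiload", "block_time", "s"))
}
```

Under this model the three calls yield `seiload_run_duration_seconds_seconds`, `seiload_run_duration_seconds`, and `seiload_block_time_seconds`, which are the names quoted in the summary.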

Why this matters

The harbor nightly alert `NightlyRunFailed` had to be written against the double-suffix name, because that is what was actually exported. This makes the alert harder to discover and review. Standardizing on `seiload_run_duration_seconds` aligns with OTel semantic conventions and Prometheus naming norms.

Coordination

A paired platform-repo PR will atomically:

  1. Bump the pinned seiload image tag in `.github/workflows/k8s_nightly.yml` to the SHA built from this PR
  2. Update the alert query in `clusters/harbor/monitoring/alerts/protocol/alerts-nightly.yaml` from `seiload_run_duration_seconds_seconds` → `seiload_run_duration_seconds`

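Both changes must land in the same platform-repo PR. Otherwise the alert query and the exported metric name will disagree, and the alert will either fire spuriously or never fire, depending on which side ships first.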

Test plan

  • `go build ./...` clean
  • After merge: image publishes; verify in Prometheus that `seiload_run_duration_seconds` is emitted (single suffix) on the next nightly run
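
The post-merge check could also be scripted. The sketch below is one possible approach, assuming access to Prometheus text-exposition output (for example, a saved scrape of the seiload metrics endpoint) rather than a query against Prometheus itself. `sampleNames` and `checkSingleSuffix` are hypothetical helpers, and the sample value in `main` is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sampleNames returns the set of sample names found in Prometheus text
// exposition output, ignoring comment lines and blank lines.
func sampleNames(exposition string) map[string]bool {
	names := make(map[string]bool)
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The sample name ends at the first '{' (labels) or space (value).
		end := strings.IndexAny(line, "{ ")
		if end < 0 {
			end = len(line)
		}
		names[line[:end]] = true
	}
	return names
}

// checkSingleSuffix returns an error if any sample still carries the
// doubled suffix, or if no sample with the canonical name is present.
func checkSingleSuffix(exposition string) error {
	const canonical = "seiload_run_duration_seconds"
	const doubled = canonical + "_seconds"
	found := false
	for name := range sampleNames(exposition) {
		// Test the doubled form first: it also has the canonical prefix.
		if strings.HasPrefix(name, doubled) {
			return fmt.Errorf("double-suffixed sample still exported: %s", name)
		}
		if strings.HasPrefix(name, canonical) {
			found = true
		}
	}
	if !found {
		return fmt.Errorf("no sample named %s* found", canonical)
	}
	return nil
}

func main() {
	// Made-up scrape output, for illustration only.
	scrape := "# TYPE seiload_run_duration_seconds gauge\nseiload_run_duration_seconds 42\n"
	if err := checkSingleSuffix(scrape); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok: single-suffix metric present")
}
```

Prefix matching is used so that derived series such as `_sum` or `_bucket` are covered whatever instrument type the metric uses, which this PR does not state.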

🤖 Generated with Claude Code

The metric was declared as `run_duration_seconds` AND given
`metric.WithUnit("s")`. The OTel Prometheus exporter appends `_seconds`
to any metric carrying unit `s`, so this exported as
`seiload_run_duration_seconds_seconds` — a double-suffix that broke
PromQL queries assuming the canonical OTel-conventional name.

Compare with `block_time` in this same file: name `block_time` + unit
`s` exports cleanly as `seiload_block_time_seconds`. Apply the same
convention to `run_duration` so it exports as
`seiload_run_duration_seconds`.

The harbor nightly alert (alerts-nightly.yaml) currently queries the
double-suffixed name; it will be updated in a paired platform-repo PR
that also bumps the pinned seiload image tag, atomically swapping
emitter and consumer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham bdchatham merged commit 9858af0 into main Apr 28, 2026
2 checks passed
@bdchatham bdchatham deleted the fix/run-duration-double-suffix branch April 28, 2026 16:37