[doc-only] docs(pathfinder): prepare 1.5.4 release notes#1981

Open
rwgk wants to merge 3 commits into NVIDIA:main from rwgk:pathfinder_release_notes_1.5.4

@rwgk rwgk commented Apr 27, 2026

Add cuda-pathfinder 1.5.4 release notes.

Made-with: Cursor

Add cuda-pathfinder 1.5.4 release notes and register 1.5.4 in nv-versions so the published docs include the new version entry.

Made-with: Cursor
@github-actions github-actions Bot added the cuda.pathfinder Everything related to the cuda.pathfinder module label Apr 27, 2026
@rwgk rwgk self-assigned this Apr 27, 2026
@rwgk rwgk added the P0 High priority - Must do! label Apr 27, 2026
@rwgk rwgk added this to the cuda.pathfinder next milestone Apr 27, 2026
@rwgk rwgk added documentation Improvements or additions to documentation and removed documentation Improvements or additions to documentation labels Apr 27, 2026
rwgk commented Apr 27, 2026

Analysis of a CI failure unrelated to this PR, but currently blocking it

CI failures:

The failures on PR #1981 do not appear to be caused by the changes in this PR.

This PR only touches:

  • cuda_pathfinder/docs/nv-versions.json
  • cuda_pathfinder/docs/source/release/1.5.4-notes.rst

It does not modify .github/, ci/, packaging metadata, or build logic.

What is actually failing

In both unpacked log bundles (attempt1 and attempt2), the hard failure occurs in the step "Download cuda.bindings build artifacts from the prior branch" in .github/workflows/build-wheel.yml.

That step resolves:

  • OLD_BRANCH=12.9.x from ci/versions.yml

and then runs:

gh run list -b ${OLD_BRANCH} -L 1 -w "ci.yml" -s success -R NVIDIA/cuda-python --json databaseId

The step then exits with:

LATEST_PRIOR_RUN_ID not found!

followed by:

##[error]Process completed with exit code 1.

The earlier setuptools / setuptools-scm messages in the logs are noisy and surprising, but they are not the immediate cause of the job failure here. The job continues past those messages and only fails when the gh run list ... lookup returns no run ID.
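
The failure mode described above can be sketched in a few lines of Python. This is only an illustration of the guard logic, not the actual step from build-wheel.yml: the function names extract_run_id and resolve_or_fail are hypothetical, and the only details taken from the logs are the databaseId JSON field and the error message.

```python
import json
from typing import Optional


def extract_run_id(gh_json_output: str) -> Optional[int]:
    """Return the first databaseId from `gh run list --json databaseId` output.

    Returns None when the lookup produced an empty list ("[]"), which is
    the situation that precedes "LATEST_PRIOR_RUN_ID not found!".
    """
    runs = json.loads(gh_json_output or "[]")
    if not runs:
        return None
    return int(runs[0]["databaseId"])


def resolve_or_fail(gh_json_output: str) -> int:
    """Mimic the guard: abort with the logged message if no run ID exists."""
    run_id = extract_run_id(gh_json_output)
    if run_id is None:
        # SystemExit with a string message exits the process with code 1.
        raise SystemExit("LATEST_PRIOR_RUN_ID not found!")
    return run_id
```

With an empty result from gh run list, resolve_or_fail("[]") aborts with the same message seen in the logs, regardless of anything the PR itself changed.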

Why this looks unrelated to PR #1981

  • PR #1981 does not change CI configuration or build scripts.
  • The failure happens when CI tries to download comparison artifacts from the backport branch, not when it processes the release-notes changes in this PR.
  • The same failure appears across multiple matrix entries and across both reruns, which is consistent with a shared infrastructure/workflow issue rather than a PR-content-specific regression.

What changed recently

The most relevant recent repo-side change I found is commit cfbda9fd0c (ensure the backport CI can fetch artifacts successfully, 2026-03-06), which changed the artifact lookup in:

  • .github/workflows/build-wheel.yml
  • .github/workflows/test-wheel-linux.yml
  • .github/workflows/test-wheel-windows.yml

from:

-s completed

to:

-s success

So there was a recent CI change directly in the area that is now failing.

By contrast, the backport-branch setting itself is not recent:

  • backport_branch: "12.9.x" was introduced much earlier in commit 1fc792df81 (2025-12-15).
  • The more recent ci/versions.yml edit in c85391ad6b only bumped the CUDA build version from 13.2.0 to 13.2.1.

Evidence that the branch and artifacts do exist

Successful CI runs do exist on 12.9.x. For example, run 24790231835 from 2026-04-22 is a successful CI run on that branch.

That successful run also contains the expected prior-branch artifacts, including linux-64 bindings artifacts such as:

  • cuda-bindings-python312-cuda12.9.1-linux-64-...
  • cuda-bindings-python314t-cuda12.9.1-linux-64-...

So this does not look like:

  • a missing 12.9.x branch
  • a missing successful CI run on that branch
  • or missing artifacts on that successful run

What seems brittle in gh run list

I reproduced the lookup behavior against the live repo and found that the current command is brittle in two separate ways:

  • gh run list ... -b 12.9.x -w CI returns the expected runs.
  • gh run list ... -b 12.9.x -w CI -s completed also returns the expected runs.
  • gh run list ... -b 12.9.x -w CI -s success returns [].
  • gh run list ... -b 12.9.x -w ci.yml returns [].
  • gh run list ... -b 12.9.x -w .github/workflows/ci.yml returns [].

So the current workflow command depends on two selectors that appear brittle under current gh behavior:

  • workflow selector: -w "ci.yml"
  • status selector: -s success

The March change from -s completed to -s success is the most plausible repo-side change that made this start failing, but the workflow selector also appears fragile now.
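
For anyone who wants to repeat the comparison, here is a rough sketch of a script that runs the five selector variants and counts the returned runs. The flags elided as "..." in the list above are not known; this sketch assumes -R, -L, and --json as in the workflow command quoted earlier. The names VARIANTS, build_command, count_runs, and main are hypothetical, and results will depend on the installed gh version and on authentication.

```python
import json
import subprocess
from typing import Dict, List

REPO = "NVIDIA/cuda-python"
BRANCH = "12.9.x"

# Selector flags for each variant tried in the comparison above.
VARIANTS: Dict[str, List[str]] = {
    "workflow name": ["-w", "CI"],
    "workflow name + completed": ["-w", "CI", "-s", "completed"],
    "workflow name + success": ["-w", "CI", "-s", "success"],
    "workflow file name": ["-w", "ci.yml"],
    "workflow file path": ["-w", ".github/workflows/ci.yml"],
}


def build_command(selectors: List[str], limit: int = 5) -> List[str]:
    """Build the gh argv for one variant (limit and JSON fields are assumptions)."""
    return [
        "gh", "run", "list",
        "-R", REPO,
        "-b", BRANCH,
        "-L", str(limit),
        *selectors,
        "--json", "databaseId,conclusion",
    ]


def count_runs(selectors: List[str]) -> int:
    """Run gh and count returned runs; needs an authenticated gh CLI."""
    result = subprocess.run(
        build_command(selectors), check=True, capture_output=True, text=True
    )
    return len(json.loads(result.stdout))


def main() -> None:
    for label, selectors in VARIANTS.items():
        print(f"{label}: {count_runs(selectors)} run(s)")
```

Calling main() prints one count per variant, which makes it easy to see which selector combinations come back empty.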

Bottom line

This looks like a CI infrastructure/workflow issue that is unrelated to the content of PR #1981, but currently blocks PR #1981 from going green.

The failure appears to come from brittle gh run list filtering in the prior-artifact lookup step, not from the release-notes changes in this PR.

Likely fix direction

The safest fix is probably to stop relying on:

gh run list ... -w "ci.yml" -s success

and instead use a more robust lookup, for example:

  • query workflow CI by name or workflow ID rather than ci.yml
  • query completed runs
  • then explicitly filter for conclusion == "success"

That same fix likely needs to be applied in:

  • .github/workflows/build-wheel.yml
  • .github/workflows/test-wheel-linux.yml
  • .github/workflows/test-wheel-windows.yml
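
A minimal sketch of that lookup, assuming it is written in Python rather than inline shell: fetch completed runs of the CI workflow, then filter for conclusion == "success" on the client side. The function names, the default limit of 20, and the choice of createdAt for ordering are assumptions for illustration; the shared helper this PR adds may look quite different.

```python
import json
import subprocess
from typing import Any, Dict, List, Optional


def fetch_completed_runs(
    branch: str, repo: str, workflow: str = "CI", limit: int = 20
) -> List[Dict[str, Any]]:
    """Query completed runs by workflow name; needs an authenticated gh CLI."""
    cmd = [
        "gh", "run", "list",
        "-R", repo,
        "-b", branch,
        "-w", workflow,
        "-s", "completed",
        "-L", str(limit),
        "--json", "databaseId,conclusion,createdAt",
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def pick_latest_successful(runs: List[Dict[str, Any]]) -> Optional[int]:
    """Return the databaseId of the newest run whose conclusion is "success"."""
    successful = [r for r in runs if r.get("conclusion") == "success"]
    if not successful:
        return None
    # createdAt values are ISO 8601 UTC timestamps in one format,
    # so plain string comparison orders them chronologically.
    newest = max(successful, key=lambda r: r["createdAt"])
    return int(newest["databaseId"])
```

A caller would use pick_latest_successful(fetch_completed_runs("12.9.x", "NVIDIA/cuda-python")) and treat None as the "not found" case. Filtering on the client keeps the result independent of how gh interprets -s success.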

Backport artifact downloads were relying on `gh run list -w ci.yml -s success`, which can fail to return runs even when the branch has successful CI artifacts. Move the lookup into a shared helper that queries completed `CI` runs and filters for successful results explicitly, so Linux and Windows workflows resolve prior-branch bindings artifacts reliably.

Made-with: Cursor
@github-actions github-actions Bot added the CI/CD CI/CD infrastructure label Apr 27, 2026
@rwgk rwgk removed the CI/CD CI/CD infrastructure label Apr 27, 2026