OCPBUGS-62517: Synchronize from Upstream Repositories #710
openshift-merge-bot[bot] merged 104 commits into openshift:main from
Conversation
- Rename local variable `tag` to `bundleTag` in catalog.Build to avoid shadowing the method's `tag` parameter
- Fail the step with an error on invalid LargeCRD field counts in the bundle option parser; use strconv.ParseInt with bitSize 0 to also reject non-positive values
- Look up channels by name in TestCatalog_FBCGeneration instead of assuming insertion order
- Use errgroup.Group with SetLimit(8) in the ScenarioCleanup deletion loop to bound concurrent goroutines and kubectl calls
- Clarify in the README that ClusterObjectSet deletion is conditional on the BoxcutterRuntime feature gate
- Add missing `text` language specifier to a fenced code block in the e2e-isolation design doc
- Remove extra blank line in setup.sh

Signed-off-by: Todd Short <tshort@redhat.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@openshift-bot: This pull request explicitly references no jira issue.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough

Adds catalogd HA tests and leader-failover steps, replaces the leader-gated catalog server with a runnable that serves on every pod and exposes readiness, implements feature-gated BundleRelease support across API/CRD/parsing/labels/controllers/tests, moves progress-deadline logic into reconcile with a deadline-aware rate limiter, and updates manifests, tests, tooling, and deps.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Runner
    participant K8s as Kubernetes API
    participant Old as catalogd Pod (old leader)
    participant New as catalogd Pod (follower/new leader)
    participant Manager as ctrl-runtime Manager
    participant Server as Catalog HTTP Server

    Test->>K8s: create catalog resource
    K8s-->>Old: schedule/start leader Pod
    Old->>Manager: runnable starts (no leader election)
    Manager->>Server: start Serve -> signal ready
    Server-->>Test: catalog reconciled / Serving True
    Test->>K8s: force-delete Old pod (--force, grace=0)
    K8s-->>Old: terminate
    New->>Manager: runnable started immediately
    Manager->>Server: start Serve -> signal ready
    Server-->>Test: new leader observed, Serving True
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ❌ 3 failed (2 warnings, 1 inconclusive) | ✅ 9 passed

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/steps/steps.go (1)
1661-1708: ⚠️ Potential issue | 🟠 Major

Reject unknown/malformed `contents` tokens instead of silently ignoring them.

parseContents now returns an error, but unknown tokens still fall through and are ignored. That can hide typos and let scenarios pass with unintended bundle contents.

🔧 Suggested hardening
```diff
 func parseContents(contents string) ([]catalog.BundleOption, error) {
 	contents = strings.TrimSpace(contents)
 	if contents == "" {
 		return nil, nil
 	}
 	if strings.EqualFold(contents, "BadImage") {
 		return []catalog.BundleOption{catalog.BadImage()}, nil
 	}
 	var opts []catalog.BundleOption
 	for _, part := range strings.Split(contents, ",") {
 		part = strings.TrimSpace(part)
+		if part == "" {
+			continue
+		}
 		switch {
 		case part == "CRD":
 			opts = append(opts, catalog.WithCRD())
 		case part == "Deployment":
 			opts = append(opts, catalog.WithDeployment())
 		case part == "ConfigMap":
 			opts = append(opts, catalog.WithConfigMap())
 		case strings.HasPrefix(part, "Property(") && strings.HasSuffix(part, ")"):
 			inner := part[len("Property(") : len(part)-1]
-			if k, v, ok := strings.Cut(inner, "="); ok {
-				opts = append(opts, catalog.WithBundleProperty(k, v))
-			}
+			k, v, ok := strings.Cut(inner, "=")
+			if !ok || strings.TrimSpace(k) == "" || strings.TrimSpace(v) == "" {
+				return nil, fmt.Errorf("invalid Property token %q: expected Property(type=value)", part)
+			}
+			opts = append(opts, catalog.WithBundleProperty(k, v))
 		case strings.HasPrefix(part, "InstallMode(") && strings.HasSuffix(part, ")"):
 			mode := part[len("InstallMode(") : len(part)-1]
 			opts = append(opts, catalog.WithInstallMode(v1alpha1.InstallModeType(mode)))
 		case strings.HasPrefix(part, "LargeCRD(") && strings.HasSuffix(part, ")"):
 			countStr := part[len("LargeCRD(") : len(part)-1]
 			count, err := strconv.ParseInt(countStr, 10, 0)
 			if err != nil || count <= 0 {
 				return nil, fmt.Errorf("invalid LargeCRD field count %q: must be a positive integer", countStr)
 			}
 			opts = append(opts, catalog.WithLargeCRD(int(count)))
 		case strings.HasPrefix(part, "ClusterRegistry(") && strings.HasSuffix(part, ")"):
 			host := part[len("ClusterRegistry(") : len(part)-1]
 			opts = append(opts, catalog.WithClusterRegistry(host))
 		case strings.HasPrefix(part, "StaticBundleDir(") && strings.HasSuffix(part, ")"):
 			dir := part[len("StaticBundleDir(") : len(part)-1]
 			absDir := filepath.Join(projectRootDir(), dir)
 			opts = append(opts, catalog.StaticBundleDir(absDir))
+		default:
+			return nil, fmt.Errorf("unknown bundle content token %q", part)
 		}
 	}
 	return opts, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/steps/steps.go` around lines 1661 - 1708, parseContents silently ignores unknown or malformed tokens which hides typos; update parseContents to return an error when a part does not match any recognized pattern (CRD, Deployment, ConfigMap, Property(...), InstallMode(...), LargeCRD(...), ClusterRegistry(...), StaticBundleDir(...), BadImage or empty) or when a recognized pattern is malformed (e.g., Property without "="). In the loop over parts in parseContents, after trimming each part, detect empty parts (from extra commas) and treat them as errors or skip consistently, and if none of the switch cases handled the part, return fmt.Errorf with the offending token; also return errors for malformed Property(...) and other pattern-parsing failures so callers get immediate feedback.
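The suggested hardening can be reduced to a self-contained sketch. `parseContentsSketch` and its reduced token set are hypothetical stand-ins for the real parseContents (which returns []catalog.BundleOption); the point is the empty-part skip and the `default` arm that rejects unknown tokens instead of ignoring them.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseContentsSketch models the hardened parser: empty parts from extra
// commas are skipped, and anything unrecognized is an error, not a no-op.
func parseContentsSketch(contents string) ([]string, error) {
	contents = strings.TrimSpace(contents)
	if contents == "" {
		return nil, nil
	}
	var opts []string
	for _, part := range strings.Split(contents, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue // tolerate extra commas
		}
		switch {
		case part == "CRD" || part == "Deployment" || part == "ConfigMap":
			opts = append(opts, part)
		case strings.HasPrefix(part, "LargeCRD(") && strings.HasSuffix(part, ")"):
			countStr := part[len("LargeCRD(") : len(part)-1]
			count, err := strconv.ParseInt(countStr, 10, 0)
			if err != nil || count <= 0 {
				return nil, fmt.Errorf("invalid LargeCRD field count %q: must be a positive integer", countStr)
			}
			opts = append(opts, part)
		default:
			// the key change: unknown tokens fail fast instead of being ignored
			return nil, fmt.Errorf("unknown bundle content token %q", part)
		}
	}
	return opts, nil
}

func main() {
	if _, err := parseContentsSketch("CRD, Deployment"); err != nil {
		panic(err)
	}
	if _, err := parseContentsSketch("CRD, Deplyoment"); err == nil {
		panic("typo should be rejected")
	}
	if _, err := parseContentsSketch("LargeCRD(0)"); err == nil {
		panic("non-positive count should be rejected")
	}
	fmt.Println("ok")
}
```

With this shape, a scenario containing a typo fails at parse time rather than silently running with fewer bundle options.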
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: bf6bbb55-1573-4b42-9326-cc2ab1b49755
📒 Files selected for processing (7)
- docs/designs/testing/2026-04-13-e2e-isolation/design.md
- test/e2e/README.md
- test/e2e/steps/hooks.go
- test/e2e/steps/steps.go
- test/extension-developer-e2e/setup.sh
- test/internal/catalog/catalog.go
- test/internal/catalog/catalog_test.go
💤 Files with no reviewable changes (1)
- test/extension-developer-e2e/setup.sh
* fix(catalogd): bind catalog HTTP port lazily; add readiness check

The catalog HTTP server has OnlyServeWhenLeader: true, so only the leader pod should serve catalog content. Previously, net.Listen was called eagerly at startup for all pods: the listen socket was bound on non-leaders even though http.Serve was never called, causing TCP connections to queue without being served. With replicas > 1 this made ~50% of catalog content requests fail silently.

Replace manager.Server with a custom Runnable (catalogServerRunnable) in serverutil that:

- Binds the catalog port lazily inside Start(), which is only called on the leader by controller-runtime's leader election machinery.
- Closes a ready channel once the listener is established, and registers a channel-select readiness check via AddReadyzCheck so non-leader pods fail the /readyz probe and are excluded from Service endpoints.

This keeps cmd/catalogd/main.go health/readiness setup identical to cmd/operator-controller/main.go (healthz.Ping for both liveness and readiness); the catalog-server readiness check is an implementation detail of serverutil.AddCatalogServerToManager.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(experimental): run catalogd and operator-controller with 2 replicas

The experimental e2e suite uses a 2-node kind cluster, making it a natural fit to validate HA behaviour. Set replicas=2 for both components in helm/experimental.yaml so the experimental and experimental-e2e manifests exercise the multi-replica path end-to-end. This is safe for operator-controller (no leader-only HTTP servers) and for catalogd now that the catalog server starts on all pods via NeedLeaderElection=false, preventing the rolling-update deadlock that would arise if the server were leader-only.

Also adds a @CatalogdHA experimental e2e scenario that force-deletes the catalogd leader pod and verifies that a new leader is elected and the catalog resumes serving. The scenario is gated on a 2-node cluster (detected in BeforeSuite and reflected in the featureGates map), so it is automatically skipped in the standard 1-node e2e suite. The experimental e2e timeout is bumped from 20m to 25m to accommodate leader re-election time (~163s worst case).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: Todd Short <tshort@redhat.com>

---------

Signed-off-by: Todd Short <tshort@redhat.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Bumps [github.com/google/go-containerregistry](https://github.com/google/go-containerregistry) from 0.21.4 to 0.21.5. - [Release notes](https://github.com/google/go-containerregistry/releases) - [Commits](google/go-containerregistry@v0.21.4...v0.21.5) --- updated-dependencies: - dependency-name: github.com/google/go-containerregistry dependency-version: 0.21.5 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
7e81373 to 7e83d75
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/operator-controller/catalogmetadata/client/client.go (1)
109-112: Align cache contract/docs with new non-200 behavior.

This change is reasonable for HA retries, but it now conflicts with the Cache contract comments (lines 30-35) that say population errors are cached. Please update those docs (and any related tests) to reflect that HTTP non-200 errors are returned directly and not cached.

Suggested doc update

```diff
 type Cache interface {
 	// Get returns cache for a specified catalog name and version (resolvedRef).
 	//
 	// Method behaviour is as follows:
 	//  - If cache exists, it returns a non-nil fs.FS and nil error
 	//  - If cache doesn't exist, it returns nil fs.FS and nil error
-	//  - If there was an error during cache population,
-	//    it returns nil fs.FS and the error from the cache population.
-	//    In other words - cache population errors are also cached.
+	//  - If there was a cached error during cache population,
+	//    it returns nil fs.FS and that cached error.
+	//    Note: not all population failures are cached (for example,
+	//    HTTP non-200 responses may be returned directly by PopulateCache).
 	Get(catalogName, resolvedRef string) (fs.FS, error)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operator-controller/catalogmetadata/client/client.go` around lines 109 - 112, Update the Cache contract comment to reflect the new behavior where HTTP non-200 responses are not cached: modify the comment on the Cache interface (the block that previously said "population errors are cached") to state that transient HTTP non-200 responses (e.g., non-leader 404s) are returned directly and not stored, and add a note that only successful 200 responses are populated into the cache; adjust any related unit/integration tests that assert caching of non-200 responses to expect immediate error returns from the function that currently does `return nil, fmt.Errorf("error: received unexpected response status code %d", resp.StatusCode)`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/catalogd/serverutil/serverutil.go`:
- Around line 96-99: The TLS setup assumes r.cw is non-nil when r.cfg.CertFile
and r.cfg.KeyFile are set; add a nil guard before using r.cw to avoid a panic:
check that r.cw != nil before constructing tls.Config with GetCertificate, and
if it is nil return a descriptive error (or load the certificate directly) so
tls.Config.GetCertificate is never dereferenced; update the block that creates
tls.Config (references: r.cfg.CertFile, r.cfg.KeyFile, r.cw, tls.Config,
GetCertificate) to validate r.cw and handle the misconfiguration path safely.
In `@test/e2e/steps/ha_steps.go`:
- Around line 45-47: The code reads sc.leaderPods["catalogd"] into oldLeader
without validating existence; change the step to first check presence and
non-empty value (e.g. look up with the comma-ok form on sc.leaderPods for key
"catalogd" or test oldLeader != "") and fail fast if missing (use the test
context's failure helper such as t.Fatalf or the step's error return) so the
step doesn't proceed when the prior leader-discovery did not record a leader.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c7e7f28d-de77-429b-8cf2-46b69cedc5ba
⛔ Files ignored due to path filters (6)
- go.sum is excluded by !**/*.sum
- vendor/github.com/docker/cli/AUTHORS is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/docker/cli/cli/config/configfile/file.go is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/docker/cli/cli/config/credentials/file_store.go is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/docker/cli/cli/config/memorystore/store.go is excluded by !**/vendor/**, !vendor/**
- vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (17)
- Makefile
- docs/designs/testing/2026-04-13-e2e-isolation/design.md
- go.mod
- helm/experimental.yaml
- internal/catalogd/serverutil/serverutil.go
- internal/operator-controller/catalogmetadata/client/client.go
- internal/operator-controller/catalogmetadata/client/client_test.go
- manifests/experimental-e2e.yaml
- manifests/experimental.yaml
- test/e2e/README.md
- test/e2e/features/ha.feature
- test/e2e/steps/ha_steps.go
- test/e2e/steps/hooks.go
- test/e2e/steps/steps.go
- test/extension-developer-e2e/setup.sh
- test/internal/catalog/catalog.go
- test/internal/catalog/catalog_test.go
💤 Files with no reviewable changes (2)
- test/extension-developer-e2e/setup.sh
- internal/operator-controller/catalogmetadata/client/client_test.go
✅ Files skipped from review due to trivial changes (5)
- docs/designs/testing/2026-04-13-e2e-isolation/design.md
- helm/experimental.yaml
- test/internal/catalog/catalog.go
- go.mod
- test/e2e/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- test/internal/catalog/catalog_test.go
- test/e2e/steps/hooks.go
- test/e2e/steps/steps.go
```diff
 if r.cfg.CertFile != "" && r.cfg.KeyFile != "" {
 	config := &tls.Config{
-		GetCertificate: tlsFileWatcher.GetCertificate,
+		GetCertificate: r.cw.GetCertificate,
 		MinVersion: tls.VersionTLS12,
```
Add a nil guard for certificate watcher when TLS is enabled.
When cert/key are configured, r.cw is assumed non-nil. A misconfiguration here can panic during TLS handshake.
🔧 Proposed defensive fix

```diff
 if r.cfg.CertFile != "" && r.cfg.KeyFile != "" {
+	if r.cw == nil {
+		return fmt.Errorf("catalog server TLS is enabled but certificate watcher is nil")
+	}
 	config := &tls.Config{
 		GetCertificate: r.cw.GetCertificate,
 		MinVersion: tls.VersionTLS12,
 	}
 	listener = tls.NewListener(listener, config)
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
if r.cfg.CertFile != "" && r.cfg.KeyFile != "" {
	if r.cw == nil {
		return fmt.Errorf("catalog server TLS is enabled but certificate watcher is nil")
	}
	config := &tls.Config{
		GetCertificate: r.cw.GetCertificate,
		MinVersion: tls.VersionTLS12,
	}
	listener = tls.NewListener(listener, config)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/catalogd/serverutil/serverutil.go` around lines 96 - 99, The TLS
setup assumes r.cw is non-nil when r.cfg.CertFile and r.cfg.KeyFile are set; add
a nil guard before using r.cw to avoid a panic: check that r.cw != nil before
constructing tls.Config with GetCertificate, and if it is nil return a
descriptive error (or load the certificate directly) so
tls.Config.GetCertificate is never dereferenced; update the block that creates
tls.Config (references: r.cfg.CertFile, r.cfg.KeyFile, r.cw, tls.Config,
GetCertificate) to validate r.cw and handle the misconfiguration path safely.
Bumps [marocchino/sticky-pull-request-comment](https://github.com/marocchino/sticky-pull-request-comment) from 3.0.3 to 3.0.4. - [Release notes](https://github.com/marocchino/sticky-pull-request-comment/releases) - [Commits](marocchino/sticky-pull-request-comment@v3.0.3...v3.0.4) --- updated-dependencies: - dependency-name: marocchino/sticky-pull-request-comment dependency-version: 3.0.4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github.com/cert-manager/cert-manager](https://github.com/cert-manager/cert-manager) from 1.20.1 to 1.20.2. - [Release notes](https://github.com/cert-manager/cert-manager/releases) - [Changelog](https://github.com/cert-manager/cert-manager/blob/master/RELEASE.md) - [Commits](cert-manager/cert-manager@v1.20.1...v1.20.2) --- updated-dependencies: - dependency-name: github.com/cert-manager/cert-manager dependency-version: 1.20.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
7e83d75 to b09c936
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openshift/operator-controller/manifests-experimental.yaml (1)
1236-1241: ⚠️ Potential issue | 🟠 Major

Enable BundleReleaseSupport here or remove the new release semantics from this manifest.

This OpenShift experimental deployment still starts operator-controller without --feature-gates=BundleReleaseSupport=true, so bundles with explicit pkg.Release will continue to use legacy build-metadata parsing here even though this same manifest now advertises status.install.bundle.release. That already diverges from manifests/experimental.yaml line 2735, which enables the gate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openshift/operator-controller/manifests-experimental.yaml` around lines 1236 - 1241, The manifest for the operator-controller starts without the BundleReleaseSupport feature gate while elsewhere the manifest set advertises status.install.bundle.release, causing inconsistent release semantics; update the operator-controller container args (the entries starting with --feature-gates in the operator-controller spec) to include --feature-gates=BundleReleaseSupport=true (or remove the new release/status.install.bundle.release behavior across manifests if you intend to keep the legacy semantics) so the operator-controller and experimental deployment use the same release parsing behavior.
♻️ Duplicate comments (1)
test/e2e/steps/ha_steps.go (1)
45-47: ⚠️ Potential issue | 🟠 Major

Fail fast when previous leader is missing.

On line 46, oldLeader is read but not validated. If it is empty, this step can pass without proving leader failover correctness.

Suggested fix

```diff
 func NewCatalogdLeaderIsElected(ctx context.Context) error {
 	sc := scenarioCtx(ctx)
 	oldLeader := sc.leaderPods["catalogd"]
+	if oldLeader == "" {
+		return fmt.Errorf("catalogd leader pod not found in scenario context; run 'catalogd is ready to reconcile resources' first")
+	}
 	waitFor(ctx, func() bool {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/steps/ha_steps.go` around lines 45 - 47, After reading oldLeader := sc.leaderPods["catalogd"] add a fast-fail check to ensure the previous leader is present: if oldLeader is empty or nil, immediately fail the test/step (e.g. t.Fatalf or require.FailNow) with a clear message like "previous leader missing" so the step cannot silently pass; place this check directly after the oldLeader assignment in the same function that calls scenarioCtx(ctx) so the rest of the leader-failover assertions only run when oldLeader is valid.
🧹 Nitpick comments (1)
internal/operator-controller/bundleutil/bundle_test.go (1)
101-108: Feature gate manipulation may cause test interference if run in parallel.

The pattern of modifying the global OperatorControllerFeatureGate and restoring it via t.Cleanup is correct for sequential test execution. However, if these tests or other tests in the package use t.Parallel(), the shared mutable state could cause flaky test results.

Consider whether parallel test execution is used in this package. If so, you may need to refactor to use test-local feature gate instances or ensure these tests don't run in parallel with others that depend on this gate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operator-controller/bundleutil/bundle_test.go` around lines 101 - 108, Test modifies the global OperatorControllerFeatureGate (used with features.BundleReleaseSupport) which can race with other parallel tests; to fix, either (A) avoid running this test in parallel by ensuring TestGetVersionAndRelease_WithBundleReleaseSupport and its subtests do not call t.Parallel(), or (B) eliminate global mutation by creating and using a test-local feature gate instance (instead of features.OperatorControllerFeatureGate) when setting BundleReleaseSupport, or (C) serialize access by protecting Set/restore with a package-level sync.Mutex around the calls to OperatorControllerFeatureGate.Set in the test and cleanup; update the test to use one of these approaches and keep the existing Set/cleanup logic (references: TestGetVersionAndRelease_WithBundleReleaseSupport, OperatorControllerFeatureGate, BundleReleaseSupport).
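One way to make the gate-flipping pattern safe alongside concurrent readers is the mutex-serialization option from the prompt. This is a toy model: `gate`, `globalGate`, and `withFeature` are illustrative stand-ins for OperatorControllerFeatureGate, and `withFeature`'s returned restore func plays the role that t.Cleanup plays in the real tests.

```go
package main

import (
	"fmt"
	"sync"
)

// gate is a toy model of a process-global feature gate. A RWMutex serializes
// reads and writes so a test that flips a gate cannot race with parallel
// tests reading it.
type gate struct {
	mu       sync.RWMutex
	features map[string]bool
}

func (g *gate) Enabled(name string) bool {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return g.features[name]
}

func (g *gate) set(name string, v bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.features[name] = v
}

var globalGate = &gate{features: map[string]bool{"BundleReleaseSupport": false}}

// withFeature flips a gate and returns a restore func; in a real test the
// restore func would be registered with t.Cleanup.
func withFeature(name string, v bool) (restore func()) {
	old := globalGate.Enabled(name)
	globalGate.set(name, v)
	return func() { globalGate.set(name, old) }
}

func main() {
	restore := withFeature("BundleReleaseSupport", true)
	enabled := globalGate.Enabled("BundleReleaseSupport")
	restore()
	fmt.Println(enabled, globalGate.Enabled("BundleReleaseSupport")) // true false
}
```

This removes the data race but not the logical interference: two parallel tests that want opposite gate values still conflict, which is why the review also suggests test-local gate instances or simply not calling t.Parallel() in these tests.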
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/v1/clusterextension_types.go`:
- Around line 489-491: The MaxLength=20 validation on the Release field (Release
*string `json:"release,omitempty"`) creates a mismatch with downstream
parsing/propagation; remove or change the `//
+kubebuilder:validation:MaxLength=20` annotation on the Release field in
clusterextension_types.go so the CRD allows the longer valid release strings the
parser accepts (or increase the limit to the parser’s maximum), or alternatively
implement the same 20-char check at all ingestion points before assigning to
status.install.bundle.release and return a clear validation error there; update
the annotation and any upstream validators consistent with the chosen approach.
In `@internal/operator-controller/catalogmetadata/filter/successors_test.go`:
- Around line 190-210: Tests currently only assert empty results because
installedBundle names are absent from bundleSet; add catalog fixtures with
bundles whose Name/Version differ only by Release/build metadata so SuccessorsOf
actually matches and parseInstalledBundleVersionRelease is exercised. Update the
two test cases (the ones with installedBundle Release non-empty and empty) to
include corresponding entries in the bundleSet (or declcfg.Bundle fixtures) that
are identical except for release/build metadata, then assert that SuccessorsOf
(or the function invoking parseInstalledBundleVersionRelease) returns the
expected matching declcfg.Bundle(s) verifying preservation/stripping of build
metadata and handling of explicit empty Release.
In `@manifests/experimental.yaml`:
- Around line 1264-1287: The CRD's release field currently limits release length
with maxLength: 20 which is stricter than the controller and will cause
status.install.bundle.release updates to be rejected; update the CRD to either
remove or raise the maxLength (e.g., to a much larger limit like 256) so it
matches the controller's accepted syntax, or alternatively add the same
truncation/validation logic into the release parser used by
BundleReleaseSupport/pkg.Release to enforce the 20-char cap consistently; adjust
the "release" field (maxLength) and keep the existing x-kubernetes-validations
regex intact so the schema and controller behavior align.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ad5bf2f2-1803-444a-bd59-4b5b5de90947
⛔ Files ignored due to path filters (9)
- api/v1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
- go.sum is excluded by !**/*.sum
- openshift/tests-extension/vendor/github.com/operator-framework/operator-controller/api/v1/clusterextension_types.go is excluded by !**/vendor/**
- openshift/tests-extension/vendor/github.com/operator-framework/operator-controller/api/v1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !**/zz_generated*
- vendor/github.com/docker/cli/AUTHORS is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/docker/cli/cli/config/configfile/file.go is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/docker/cli/cli/config/credentials/file_store.go is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/docker/cli/cli/config/memorystore/store.go is excluded by !**/vendor/**, !vendor/**
- vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (34)
- Makefile
- api/v1/clusterextension_types.go
- applyconfigurations/api/v1/bundlemetadata.go
- docs/api-reference/olmv1-api-reference.md
- docs/designs/testing/2026-04-13-e2e-isolation/design.md
- go.mod
- helm/experimental.yaml
- helm/olmv1/base/operator-controller/crd/experimental/olm.operatorframework.io_clusterextensions.yaml
- internal/catalogd/serverutil/serverutil.go
- internal/operator-controller/applier/boxcutter.go
- internal/operator-controller/bundle/versionrelease.go
- internal/operator-controller/bundleutil/bundle.go
- internal/operator-controller/bundleutil/bundle_test.go
- internal/operator-controller/catalogmetadata/client/client.go
- internal/operator-controller/catalogmetadata/client/client_test.go
- internal/operator-controller/catalogmetadata/filter/successors.go
- internal/operator-controller/catalogmetadata/filter/successors_test.go
- internal/operator-controller/controllers/boxcutter_reconcile_steps.go
- internal/operator-controller/controllers/clusterextension_controller.go
- internal/operator-controller/controllers/clusterextension_controller_test.go
- internal/operator-controller/controllers/clusterextension_reconcile_steps.go
- internal/operator-controller/features/features.go
- internal/operator-controller/labels/labels.go
- manifests/experimental-e2e.yaml
- manifests/experimental.yaml
- openshift/operator-controller/manifests-experimental.yaml
- test/e2e/README.md
- test/e2e/features/ha.feature
- test/e2e/steps/ha_steps.go
- test/e2e/steps/hooks.go
- test/e2e/steps/steps.go
- test/extension-developer-e2e/setup.sh
- test/internal/catalog/catalog.go
- test/internal/catalog/catalog_test.go
💤 Files with no reviewable changes (2)
- test/extension-developer-e2e/setup.sh
- internal/operator-controller/catalogmetadata/client/client_test.go
✅ Files skipped from review due to trivial changes (4)
- internal/operator-controller/labels/labels.go
- docs/designs/testing/2026-04-13-e2e-isolation/design.md
- test/e2e/README.md
- go.mod
🚧 Files skipped from review as they are similar to previous changes (4)
- internal/operator-controller/catalogmetadata/client/client.go
- helm/experimental.yaml
- test/e2e/features/ha.feature
- internal/catalogd/serverutil/serverutil.go
```go
// +kubebuilder:validation:MaxLength=20
// +kubebuilder:validation:XValidation:rule="self.matches(\"^$|^(0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*)(\\\\.(0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*))*$\")",message="release must be empty or consist of dot-separated identifiers (numeric without leading zeros, or alphanumeric)"
Release *string `json:"release,omitempty"`
```
MaxLength=20 introduces a schema/runtime mismatch that can block reconciliation.
Line 489 caps status.install.bundle.release to 20 chars, but release parsing/propagation paths accept valid longer values. A valid release string over 20 chars would fail status validation at write time instead of being handled deterministically earlier.
💡 Proposed fix (align API with current parser behavior)
```diff
- // +kubebuilder:validation:MaxLength=20
  // +kubebuilder:validation:XValidation:rule="self.matches(\"^$|^(0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*)(\\\\.(0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*))*$\")",message="release must be empty or consist of dot-separated identifiers (numeric without leading zeros, or alphanumeric)"
  Release *string `json:"release,omitempty"`
```

If the 20-char cap is required, enforce that same limit at all ingestion points before status assignment, and return a clear validation error there.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/v1/clusterextension_types.go` around lines 489 - 491, The MaxLength=20
validation on the Release field (Release *string `json:"release,omitempty"`)
creates a mismatch with downstream parsing/propagation; remove or change the `//
+kubebuilder:validation:MaxLength=20` annotation on the Release field in
clusterextension_types.go so the CRD allows the longer valid release strings the
parser accepts (or increase the limit to the parser’s maximum), or alternatively
implement the same 20-char check at all ingestion points before assigning to
status.install.bundle.release and return a clear validation error there; update
the annotation and any upstream validators consistent with the chosen approach.
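For illustration, the ingestion-time check the review asks for fits in a few lines. This is a standalone Go sketch: the `validateRelease` helper and `maxReleaseLen` constant are hypothetical names, not the controller's actual code, and the regex simply mirrors the CRD rule quoted above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Mirrors the CRD's x-kubernetes-validations rule: empty, or dot-separated
// identifiers that are numeric (no leading zeros) or alphanumeric.
var releasePattern = regexp.MustCompile(
	`^$|^(0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*)(\.(0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*))*$`)

// maxReleaseLen matches the schema's maxLength: 20 cap.
const maxReleaseLen = 20

// validateRelease rejects, at ingestion time, any release string that the
// CRD schema would otherwise reject later at status-write time.
func validateRelease(release string) error {
	if len(release) > maxReleaseLen {
		return fmt.Errorf("release %q exceeds %d characters", release, maxReleaseLen)
	}
	if !releasePattern.MatchString(release) {
		return fmt.Errorf("release %q must be dot-separated identifiers", release)
	}
	return nil
}

func main() {
	for _, r := range []string{"2", "1.el9", "3.alpha.1", "01", "a-release-longer-than-twenty-chars"} {
		fmt.Printf("%q -> %v\n", r, validateRelease(r))
	}
}
```

Enforcing the cap here, before the status update, turns a late CRD rejection into a deterministic, user-visible validation error.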
@grokspawn @rashmigottipati do we care about this?
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
This is already handled consistently in each location where a release value can be generated, EXCEPT for the legacy build-metadata approach — but SEPARATELY we have already verified that all available catalog metadata which uses that approach fits within these limits (operator-framework/operator-registry#1852).
We no longer 'grandfather' build-metadata approaches to infer release versions, nor do we allow new submissions using that approach to SQLite/legacy catalog formats, so I'm reasonably confident we've bolted the doors and locked the windows here.
```yaml
release:
  description: |-
    release is an optional field that identifies a specific release of this bundle's version.
    A release represents a re-publication of the same version, typically used to deliver
    packaging or metadata changes without changing the version number. When multiple
    releases exist for the same version, higher releases are preferred. An unset release
    is less preferred than all other release values.

    The value consists of dot-separated identifiers, where each identifier is either a
    numeric value (without leading zeros) or an alphanumeric string (e.g., "2", "1.el9",
    "3.alpha.1"). Releases are compared identifier by identifier: numeric identifiers are
    compared as integers, alphanumeric identifiers are compared lexically, and numeric
    identifiers always sort before alphanumeric identifiers.

    For bundles with explicit pkg.Release metadata, this field contains that release value.
    For registry+v1 bundles lacking an explicit release value, this field contains the release
    extracted from version's build metadata (e.g., '2' from '1.0.0+2').
    This field is omitted when the bundle's release value is unset.
  maxLength: 20
  type: string
  x-kubernetes-validations:
  - message: release must be empty or consist of dot-separated
      identifiers (numeric without leading zeros, or alphanumeric)
    rule: self.matches("^$|^(0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*)(\\.(0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*))*$")
```
maxLength: 20 is stricter than the controller now accepts.
The release parser accepts any syntactically valid dot-separated release string, and this manifest enables BundleReleaseSupport on Line 2735, so status.install.bundle.release can now be populated from explicit pkg.Release or derived build metadata. With the 20-character cap here, any longer valid release will reconcile until the status update is rejected by CRD validation. Please either enforce the same limit during parsing or relax the schema cap.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@manifests/experimental.yaml` around lines 1264 - 1287, The CRD's release
field currently limits release length with maxLength: 20 which is stricter than
the controller and will cause status.install.bundle.release updates to be
rejected; update the CRD to either remove or raise the maxLength (e.g., to a
much larger limit like 256) so it matches the controller's accepted syntax, or
alternatively add the same truncation/validation logic into the release parser
used by BundleReleaseSupport/pkg.Release to enforce the 20-char cap
consistently; adjust the "release" field (maxLength) and keep the existing
x-kubernetes-validations regex intact so the schema and controller behavior
align.
…2643)

* 🐛 fix: allow reconciliation of deadline-exceeded ClusterObjectSets

  Previously, a watch predicate dropped update events for ClusterObjectSets with ProgressDeadlineExceeded, preventing them from being archived or cleaned up. Remove the predicate and replace the inline deadline logic in Reconcile with an enforceProgressDeadline() helper and a progressDeadline() function that computes the absolute deadline from spec and metadata. Add a deadline-aware rate limiter that caps exponential backoff at the deadline and allows one immediate requeue when it passes, without inspecting status conditions. Add an e2e scenario that forces ProgressDeadlineExceeded, archives the COS, and verifies resource cleanup.

* test: add unit test for recovery from ProgressDeadlineExceeded to Succeeded
* test: add table-driven unit tests for durationUntilDeadline
* docs: add comment clarifying Blocked takes precedence over ProgressDeadlineExceeded
* refactor: hoist nonTerminalReasons map to package-level var
* refactor: pass logger into markAsProgressing instead of using global log.Log
* test: add e2e test for recovery from ProgressDeadlineExceeded to Succeeded
/retest

I have some upstream fixes: operator-framework/operator-controller#2679

/lgtm

/retest-required
Force-pushed from e4b8c55 to e048679 (Compare).
New changes are detected. LGTM label has been removed.
@openshift-bot: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/retitle OCPBUGS-62517: Synchronize from Upstream Repositories
@openshift-bot: This pull request references Jira Issue OCPBUGS-62517, which is valid. 3 validation(s) were run on this bug.

No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/steps/upgrade_steps.go (1)
32-37: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Populate `componentNamespaces` after install to keep per-component checks correct.

On Line 32 and Line 36, the code updates only `olmNamespace` and drops catalogd deployment info. In upgrade paths where `BeforeSuite` couldn't pre-detect deployments, this leaves `componentNamespaces` unset and `namespaceForComponent("catalogd")` can resolve to the wrong namespace.

Suggested fix

```diff
-	olm, _, err := detectOLMDeployments()
+	olm, catalogdDep, err := detectOLMDeployments()
 	if err != nil {
 		return err
 	}
 	olmNamespace = olm.Namespace
+	componentNamespaces["operator-controller"] = olm.Namespace
+	catalogdNS := olm.Namespace
+	if catalogdDep != nil {
+		catalogdNS = catalogdDep.Namespace
+	}
+	componentNamespaces["catalogd"] = catalogdNS
 	return nil
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/steps/upgrade_steps.go` around lines 32 - 37, The code calls detectOLMDeployments() and only assigns olmNamespace, leaving componentNamespaces unset; update the block that handles the returned values from detectOLMDeployments() (the call that returns olm, _, err) to capture the deployments result and populate componentNamespaces so namespaceForComponent("catalogd") resolves correctly — e.g., change the ignored variable to a meaningful name (deployments) and set componentNamespaces["catalogd"] = deployments["catalogd"].Namespace (and any other component entries returned) after assigning olmNamespace; ensure this uses the existing detectOLMDeployments, olmNamespace, componentNamespaces, namespaceForComponent, and catalogd identifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/operator-controller/catalogmetadata/client/client.go`:
- Around line 109-112: Update the cache contract comments for the Cache.Get and
Cache.Put methods in client.go to reflect that non-200 HTTP responses are
intentionally not written to the cache; specifically, change the wording in the
Cache.Get/Put doc comments (referencing the Cache interface and its Get/Put
methods) to state that only successful (200) responses are cached and that
callers should not expect non-200 responses (e.g., 404 from non-leader pods) to
be populated in the cache, and mention that callers should handle retries
without relying on cached population for error or non-200 cases.
In `@internal/operator-controller/controllers/clusterobjectset_controller.go`:
- Around line 339-345: SetupWithManager currently unconditionally sets c.Clock =
clock.RealClock{}, preventing injected fake clocks; change it to only default
c.Clock to clock.RealClock{} when c.Clock is nil so reconcile (which
dereferences c.Clock each run) and newDeadlineAwareRateLimiter still receive the
injected clock. Locate SetupWithManager and replace the unconditional assignment
with a nil-check (e.g., if c.Clock == nil { c.Clock = clock.RealClock{} }) so
callers can inject fake clocks for testing while leaving
controller.Options/RateLimiter setup unchanged.
In `@test/e2e/steps/steps.go`:
- Around line 219-226: The helper namespaceForComponent is implemented but not
used; update all namespace-sensitive lookups to call it instead of hardcoding
olmNamespace: replace occurrences in listReferredSecrets, SendMetricsRequest,
and helmReleaseSecretForExtension so they derive the namespace via
namespaceForComponent(componentName) (or accept/pass the appropriate component
string into those functions if needed) and use that returned namespace for API
queries; ensure any call sites that currently assume olmNamespace are updated to
provide the right component identifier so the lookups respect split vs
single-namespace layouts.
---
Outside diff comments:
In `@test/e2e/steps/upgrade_steps.go`:
- Around line 32-37: The code calls detectOLMDeployments() and only assigns
olmNamespace, leaving componentNamespaces unset; update the block that handles
the returned values from detectOLMDeployments() (the call that returns olm, _,
err) to capture the deployments result and populate componentNamespaces so
namespaceForComponent("catalogd") resolves correctly — e.g., change the ignored
variable to a meaningful name (deployments) and set
componentNamespaces["catalogd"] = deployments["catalogd"].Namespace (and any
other component entries returned) after assigning olmNamespace; ensure this uses
the existing detectOLMDeployments, olmNamespace, componentNamespaces,
namespaceForComponent, and catalogd identifiers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: add1d36f-391b-4aac-8252-f46bee9f74de
⛔ Files ignored due to path filters (23)
- `api/v1/zz_generated.deepcopy.go` is excluded by `!**/zz_generated*`
- `go.sum` is excluded by `!**/*.sum`
- `openshift/tests-extension/go.sum` is excluded by `!**/*.sum`
- `openshift/tests-extension/vendor/github.com/operator-framework/operator-controller/api/v1/clusterextension_types.go` is excluded by `!**/vendor/**`
- `openshift/tests-extension/vendor/github.com/operator-framework/operator-controller/api/v1/zz_generated.deepcopy.go` is excluded by `!**/vendor/**`, `!**/zz_generated*`
- `openshift/tests-extension/vendor/modules.txt` is excluded by `!**/vendor/**`
- `openshift/tests-extension/vendor/sigs.k8s.io/structured-merge-diff/v6/fieldpath/element.go` is excluded by `!**/vendor/**`
- `openshift/tests-extension/vendor/sigs.k8s.io/structured-merge-diff/v6/fieldpath/pathelementmap.go` is excluded by `!**/vendor/**`
- `openshift/tests-extension/vendor/sigs.k8s.io/structured-merge-diff/v6/fieldpath/set.go` is excluded by `!**/vendor/**`
- `openshift/tests-extension/vendor/sigs.k8s.io/structured-merge-diff/v6/value/allocator.go` is excluded by `!**/vendor/**`
- `openshift/tests-extension/vendor/sigs.k8s.io/structured-merge-diff/v6/value/jsontagutil.go` is excluded by `!**/vendor/**`
- `vendor/github.com/containerd/containerd/archive/tar_unix.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/github.com/containerd/containerd/version/version.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/github.com/docker/cli/AUTHORS` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/github.com/docker/cli/cli/config/configfile/file.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/github.com/docker/cli/cli/config/credentials/file_store.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/github.com/docker/cli/cli/config/memorystore/store.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/modules.txt` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/sigs.k8s.io/structured-merge-diff/v6/fieldpath/element.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/sigs.k8s.io/structured-merge-diff/v6/fieldpath/pathelementmap.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/sigs.k8s.io/structured-merge-diff/v6/fieldpath/set.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/sigs.k8s.io/structured-merge-diff/v6/value/allocator.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/sigs.k8s.io/structured-merge-diff/v6/value/jsontagutil.go` is excluded by `!**/vendor/**`, `!vendor/**`
📒 Files selected for processing (43)
- `Makefile`
- `api/v1/clusterextension_types.go`
- `applyconfigurations/api/v1/bundlemetadata.go`
- `docs/api-reference/olmv1-api-reference.md`
- `docs/designs/testing/2026-04-13-e2e-isolation/design.md`
- `go.mod`
- `helm/experimental.yaml`
- `helm/olmv1/base/operator-controller/crd/experimental/olm.operatorframework.io_clusterextensions.yaml`
- `internal/catalogd/serverutil/serverutil.go`
- `internal/operator-controller/applier/boxcutter.go`
- `internal/operator-controller/bundle/versionrelease.go`
- `internal/operator-controller/bundleutil/bundle.go`
- `internal/operator-controller/bundleutil/bundle_test.go`
- `internal/operator-controller/catalogmetadata/client/client.go`
- `internal/operator-controller/catalogmetadata/client/client_test.go`
- `internal/operator-controller/catalogmetadata/filter/successors.go`
- `internal/operator-controller/catalogmetadata/filter/successors_test.go`
- `internal/operator-controller/controllers/boxcutter_reconcile_steps.go`
- `internal/operator-controller/controllers/clusterextension_controller.go`
- `internal/operator-controller/controllers/clusterextension_controller_test.go`
- `internal/operator-controller/controllers/clusterextension_reconcile_steps.go`
- `internal/operator-controller/controllers/clusterobjectset_controller.go`
- `internal/operator-controller/controllers/clusterobjectset_controller_test.go`
- `internal/operator-controller/controllers/progress_deadline.go`
- `internal/operator-controller/controllers/progress_deadline_test.go`
- `internal/operator-controller/features/features.go`
- `internal/operator-controller/labels/labels.go`
- `manifests/experimental-e2e.yaml`
- `manifests/experimental.yaml`
- `openshift/operator-controller/manifests-experimental.yaml`
- `openshift/tests-extension/go.mod`
- `requirements.txt`
- `test/e2e/README.md`
- `test/e2e/features/ha.feature`
- `test/e2e/features/install.feature`
- `test/e2e/features/revision.feature`
- `test/e2e/steps/ha_steps.go`
- `test/e2e/steps/hooks.go`
- `test/e2e/steps/steps.go`
- `test/e2e/steps/upgrade_steps.go`
- `test/extension-developer-e2e/setup.sh`
- `test/internal/catalog/catalog.go`
- `test/internal/catalog/catalog_test.go`
💤 Files with no reviewable changes (2)
- test/extension-developer-e2e/setup.sh
- internal/operator-controller/catalogmetadata/client/client_test.go
✅ Files skipped from review due to trivial changes (10)
- requirements.txt
- docs/designs/testing/2026-04-13-e2e-isolation/design.md
- test/e2e/README.md
- helm/experimental.yaml
- test/internal/catalog/catalog.go
- internal/operator-controller/labels/labels.go
- test/e2e/features/ha.feature
- internal/operator-controller/bundle/versionrelease.go
- go.mod
- openshift/tests-extension/go.mod
🚧 Files skipped from review as they are similar to previous changes (7)
- internal/operator-controller/controllers/boxcutter_reconcile_steps.go
- docs/api-reference/olmv1-api-reference.md
- internal/operator-controller/controllers/progress_deadline.go
- test/e2e/steps/hooks.go
- test/e2e/features/install.feature
- internal/catalogd/serverutil/serverutil.go
- internal/operator-controller/bundleutil/bundle_test.go
```go
// Do not cache non-200 responses (e.g. 404 from a non-leader catalogd pod).
// Returning the error directly lets the next reconcile retry a fresh HTTP
// request and eventually hit the leader.
return nil, fmt.Errorf("error: received unexpected response status code %d", resp.StatusCode)
```
Update cache contract docs to match the new non-200 behavior.
This path now intentionally bypasses cache writes for non-200 responses, but the Cache.Get/Put contract comments still read as if population errors are always cached. Please align the comments to avoid incorrect assumptions by callers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/operator-controller/catalogmetadata/client/client.go` around lines
109 - 112, Update the cache contract comments for the Cache.Get and Cache.Put
methods in client.go to reflect that non-200 HTTP responses are intentionally
not written to the cache; specifically, change the wording in the Cache.Get/Put
doc comments (referencing the Cache interface and its Get/Put methods) to state
that only successful (200) responses are cached and that callers should not
expect non-200 responses (e.g., 404 from non-leader pods) to be populated in the
cache, and mention that callers should handle retries without relying on cached
population for error or non-200 cases.
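A toy illustration of the contract this comment describes — only 200 responses populate the cache, so a later call retries fresh and can reach the leader. The types and the `fetchThrough` helper are invented for the sketch; they are not the client's real interfaces.

```go
package main

import "fmt"

// response stands in for the interesting parts of an *http.Response.
type response struct {
	status int
	body   string
}

// fetchThrough returns a cached body if present; otherwise it fetches,
// caching the body only on a 200 so non-200 replies (e.g. a 404 from a
// non-leader pod) are retried fresh on the next call.
func fetchThrough(cache map[string]string, key string, fetch func() response) (string, error) {
	if body, ok := cache[key]; ok {
		return body, nil
	}
	resp := fetch()
	if resp.status != 200 {
		// Deliberately not cached: the next reconcile retries a fresh request.
		return "", fmt.Errorf("received unexpected response status code %d", resp.status)
	}
	cache[key] = resp.body
	return resp.body, nil
}

func main() {
	cache := map[string]string{}
	if _, err := fetchThrough(cache, "catalog", func() response { return response{404, ""} }); err != nil {
		fmt.Println(err, "| cached entries:", len(cache)) // cache stays empty after the 404
	}
	body, _ := fetchThrough(cache, "catalog", func() response { return response{200, "fbc-data"} })
	fmt.Println(body, "| cached entries:", len(cache))
}
```

This is why the doc-comment wording matters: callers must not assume an entry exists after an error, since error paths intentionally leave the cache untouched.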
```go
WithOptions(controller.Options{
	RateLimiter: newDeadlineAwareRateLimiter(
		workqueue.DefaultTypedControllerRateLimiter[ctrl.Request](),
		mgr.GetClient(),
		c.Clock,
	),
}).
```
Only default Clock when it is nil.
reconcile now dereferences c.Clock on every run, but SetupWithManager always overwrites whatever the caller injected with clock.RealClock{}. That makes fake-clock coverage impossible through the normal setup path and removes an easy way to exercise the deadline logic deterministically.
Suggested change

```diff
 func (c *ClusterObjectSetReconciler) SetupWithManager(mgr ctrl.Manager) error {
-	c.Clock = clock.RealClock{}
+	if c.Clock == nil {
+		c.Clock = clock.RealClock{}
+	}
 	return ctrl.NewControllerManagedBy(mgr).
 		WithOptions(controller.Options{
 			RateLimiter: newDeadlineAwareRateLimiter(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/operator-controller/controllers/clusterobjectset_controller.go`
around lines 339 - 345, SetupWithManager currently unconditionally sets c.Clock
= clock.RealClock{}, preventing injected fake clocks; change it to only default
c.Clock to clock.RealClock{} when c.Clock is nil so reconcile (which
dereferences c.Clock each run) and newDeadlineAwareRateLimiter still receive the
injected clock. Locate SetupWithManager and replace the unconditional assignment
with a nil-check (e.g., if c.Clock == nil { c.Clock = clock.RealClock{} }) so
callers can inject fake clocks for testing while leaving
controller.Options/RateLimiter setup unchanged.
```go
// namespaceForComponent returns the namespace for the named OLM component.
// Falls back to olmNamespace when the component has no dedicated entry (e.g. upstream single-namespace deployments).
func namespaceForComponent(component string) string {
	if ns, ok := componentNamespaces[component]; ok {
		return ns
	}
	return olmNamespace
}
```
Wire namespaceForComponent into the namespace-sensitive lookups.
This helper does not change behavior yet: listReferredSecrets (Lines 1097-1100), SendMetricsRequest (Line 1407), and helmReleaseSecretForExtension (Lines 1919-1921) still hardcode olmNamespace. In split/single-namespace layouts, these steps will keep querying olmv1-system and can false-fail the new scenarios.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/steps/steps.go` around lines 219 - 226, The helper
namespaceForComponent is implemented but not used; update all
namespace-sensitive lookups to call it instead of hardcoding olmNamespace:
replace occurrences in listReferredSecrets, SendMetricsRequest, and
helmReleaseSecretForExtension so they derive the namespace via
namespaceForComponent(componentName) (or accept/pass the appropriate component
string into those functions if needed) and use that returned namespace for API
queries; ensure any call sites that currently assume olmNamespace are updated to
provide the right component identifier so the lookups respect split vs
single-namespace layouts.
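To make the fallback concrete, here is a self-contained sketch of the helper quoted above exercising both layouts. The namespace values (`olmv1-system`, `catalogd-system`) are illustrative defaults, not guaranteed by the source.

```go
package main

import "fmt"

var (
	olmNamespace        = "olmv1-system" // shared fallback namespace
	componentNamespaces = map[string]string{}
)

// namespaceForComponent returns the component's dedicated namespace when one
// was detected, falling back to the shared OLM namespace otherwise.
func namespaceForComponent(component string) string {
	if ns, ok := componentNamespaces[component]; ok {
		return ns
	}
	return olmNamespace
}

func main() {
	// Single-namespace layout: no dedicated entry, so catalogd falls back.
	fmt.Println(namespaceForComponent("catalogd"))
	// Split layout: a detected deployment records a dedicated entry that wins.
	componentNamespaces["catalogd"] = "catalogd-system"
	fmt.Println(namespaceForComponent("catalogd"))
}
```

The review's point is that the fallback only helps if the lookups actually call this helper; any site that still hardcodes `olmNamespace` silently pins itself to the single-namespace layout.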
@openshift-bot: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/retitle OCPBUGS-62517: Synchronize from Upstream Repositories
@openshift-bot: This pull request references Jira Issue OCPBUGS-62517, which is valid. 3 validation(s) were run on this bug.

No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/test e2e-gcp-ovn-upgrade
@openshift-bot: all tests passed! Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval. This pull-request has been approved by: openshift-bot, rashmigottipati

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
/verified by grokspawn

Rationale: All 12 upstream commits are low-risk or have complete test coverage — no manual testing needed.

Low-risk commits (8):

Production changes with tests (4):

All functional CI jobs passed.

🤖 Generated by Claude Code
@grokspawn: This PR has been marked as verified by

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@openshift-bot: Jira Issue OCPBUGS-62517: Some pull requests linked via external trackers have merged.

The following pull request, linked via external tracker, has not merged. All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with

Jira Issue OCPBUGS-62517 has not been moved to the MODIFIED state. This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
The downstream repository has been updated with the following upstream commits:

- `pkg.Release` field (#2543)

The `vendor/` directory has been updated, and the following commits were carried:

- `@catalogd-update`

This pull request is expected to merge without any human intervention. If tests are failing here, changes must land upstream to fix any issues so that future downstreaming efforts succeed.
/assign @openshift/openshift-team-operator-runtime