Skip to content

Skip SecretHashes update on ansibleLimit-scoped dataplane deployments#1781

Open
lmiccini wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
lmiccini:nodeset_rmqu_finalizer_configmap
Open

Skip SecretHashes update on ansibleLimit-scoped dataplane deployments#1781
lmiccini wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
lmiccini:nodeset_rmqu_finalizer_configmap

Conversation

@lmiccini
Copy link
Copy Markdown
Contributor

@lmiccini lmiccini commented Jan 27, 2026

For servicesOverride deployments, merge SecretHashes into the existing map without resetting it, since all nodes were touched.
For ansibleLimit-scoped deployments, skip the update entirely as not all nodes received the secrets.

@softwarefactory-project-zuul
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/56ac80bd0e7547ad88350eb0206886b5

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 18m 47s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 23m 38s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 37m 31s
adoption-standalone-to-crc-ceph-provider FAILURE in 3h 01m 55s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 51m 23s
openstack-operator-docs-preview POST_FAILURE in 2m 32s

@stuggi stuggi requested a review from slagle January 28, 2026 08:13
Comment thread api/core/v1beta1/openstackcontrolplane_webhook.go Outdated
@softwarefactory-project-zuul
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/db62c9cd33b34a538c7eccf243769b6a

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 02m 26s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 20m 56s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 36m 03s
adoption-standalone-to-crc-ceph-provider FAILURE in 1h 46m 57s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 34m 08s
openstack-operator-docs-preview POST_FAILURE in 3m 15s

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from 3885c4a to c1fe8f8 Compare February 7, 2026 18:56
@softwarefactory-project-zuul
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b5d3972863e64857b2da5055f867ef55

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 20m 43s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 21m 41s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 36m 22s
adoption-standalone-to-crc-ceph-provider FAILURE in 2h 05m 30s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 43m 01s
✔️ openstack-operator-docs-preview SUCCESS in 3m 14s

@lmiccini
Copy link
Copy Markdown
Contributor Author

lmiccini commented Feb 8, 2026

/retest

@lmiccini
Copy link
Copy Markdown
Contributor Author

lmiccini commented Feb 8, 2026

recheck

@lmiccini
Copy link
Copy Markdown
Contributor Author

lmiccini commented Feb 8, 2026

/test openstack-operator-build-deploy-kuttl-4-18

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from cbfbb7c to f52529a Compare February 8, 2026 15:01
@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from f52529a to 017d2ca Compare February 10, 2026 06:41
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test functional

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 2 times, most recently from 97bb482 to b1d9350 Compare February 12, 2026 15:46
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test openstack-operator-build-deploy-kuttl-4-18

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from d974015 to 45f39f8 Compare April 24, 2026 15:49
@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch 4 times, most recently from 5b509f3 to 5880a65 Compare April 25, 2026 10:44
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test openstack-operator-build-deploy-kuttl-4-18

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from 5880a65 to 187eecc Compare April 27, 2026 07:16
@lmiccini lmiccini changed the title Add per-node secret rotation tracking with drift detection Add secret rotation tracking with composite hash drift detection Apr 27, 2026
@lmiccini lmiccini requested a review from slagle April 27, 2026 07:33
@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from 187eecc to 9f7cae7 Compare April 27, 2026 07:38
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test openstack-operator-build-deploy-kuttl-4-18

1 similar comment
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test openstack-operator-build-deploy-kuttl-4-18

@rabi
Copy link
Copy Markdown
Contributor

rabi commented Apr 28, 2026

Why can't infra-operator block deletion of there is mismatch between latest deployment and nodeset secret hashes? I am not sure if we should go this path as it is othrogonal to the original design. Also, users should be discouraged to do secret rotations using ansible_limits. It is there for scenarios where nodes are not reachable for some reason and users are expected to do full deployment across all nodes of the nodeset eventually to keep things in sync.

@lmiccini
Copy link
Copy Markdown
Contributor Author

Why can't infra-operator block deletion of there is mismatch between latest deployment and nodeset secret hashes? I am not sure if we should go this path as it is othrogonal to the original design. Also, users should be discouraged to do secret rotations using ansible_limits. It is there for scenarios where nodes are not reachable for some reason and users are expected to do full deployment across all nodes of the nodeset eventually to keep things in sync.

Thanks @rabi . One of the challenges with that approach is that the last deployment may not be representative of the overall status of the nodeset. AFAICU we would have to introspect the deployments to check whether ansible_limits was used and/or if only a subset of the services were deployed, and repeat this pattern not just in infra but other operators that need to know if the dataplane is in sync (keystone for appcred, possibly others?).
I thought having a generic "am I up to date" status field would be in line with the loosely coupled approach we have for the dataplane, plus we would be able to use this for credentials rotation (rabbitmq, appcred, etc) and also to facilitate the cleanup after we migrate from mirrored to quorum queues among others. I am sure there are more use cases we could use this for.

@rabi
Copy link
Copy Markdown
Contributor

rabi commented Apr 30, 2026

One of the challenges with that approach is that the last deployment may not be representative of the overall status of the nodeset

I am not sure if we need to solve that by adding per-node tracking or by introspecting all deployments. That would start to break the current NodeSet design and make it overly complex IMO.

For this requirement, I think we can keep it simple:

  • do not update NodeSet.Status.SecretHashes when ansibleLimit is set, since that deployment did not fully update the NodeSet

  • with servicesOverride, the last deployment SecretHashes map is only partial, so it should not fully match the NodeSet SecretHashes. This can be false negative but conservative.

  • block deletion whenever NodeSet.Status.SecretHashes and Deployment.Status.SecretHashes differ

With this approach, any scoped deployment, either via ansibleLimit or servicesOverride, keeps deletion blocked. Only a full unscoped deployment across all nodes brings the NodeSet back in sync.

We can add the ansibleLimit guard where SecretHashes are copied back to the NodeSet in internal/controller/dataplane/openstackdataplanenodeset_controller.go.

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from 9f7cae7 to e816aa3 Compare April 30, 2026 07:58
@lmiccini lmiccini changed the title Add secret rotation tracking with composite hash drift detection Skip SecretHashes update on scoped dataplane deployments Apr 30, 2026
@lmiccini lmiccini requested a review from rabi April 30, 2026 08:00
@lmiccini
Copy link
Copy Markdown
Contributor Author

/test functional

@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/44b91fa09db54b80829d1eb5022cd8ba

✔️ openstack-k8s-operators-content-provider SUCCESS in 12m 54s
podified-multinode-edpm-deployment-crc RETRY_LIMIT in 30s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 28s
openstack-operator-tempest-multinode RETRY_LIMIT in 30s
openstack-operator-edpm-baremetal-minor-update RETRY_LIMIT in 28s

}
for k, v := range deployment.Status.SecretHashes {
instance.Status.SecretHashes[k] = v
for k, v := range deployment.Status.SecretHashes {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine with me, but I think we can still update SecretHashes for servicesOverride deployments without resetting the full map, which is what we do for full deployments today.

@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from e816aa3 to 13d1c14 Compare April 30, 2026 12:11
@lmiccini lmiccini changed the title Skip SecretHashes update on scoped dataplane deployments Skip SecretHashes update on ansibleLimit-scoped dataplane deployments Apr 30, 2026
For servicesOverride deployments, merge SecretHashes into the existing
map without resetting it, since all nodes were touched. For
ansibleLimit-scoped deployments, skip the update entirely as not all
nodes received the secrets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lmiccini lmiccini force-pushed the nodeset_rmqu_finalizer_configmap branch from 13d1c14 to 5761424 Compare April 30, 2026 12:28
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 30, 2026

@lmiccini: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/openstack-operator-build-deploy-kuttl 1698305 link true /test openstack-operator-build-deploy-kuttl
ci/prow/openstack-operator-build-deploy-kuttl-4-18 5761424 link true /test openstack-operator-build-deploy-kuttl-4-18

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants