Conversation
The webhook server (cloudzero-webhook binary; previously known as the insights-controller) used to push metrics directly to the CloudZero cloud API and authenticated each request with the customer's API key. It now always pushes to the in-cluster aggregator, which is unauthenticated on the cluster-internal hop. The API key secret mount, the Authorization: Bearer header, the in-process secret monitor that polled the key file for rotation, and the legacy query-string encoding of cluster_name/cloud_account_id/region in the destination URL all became dead weight on this code path. Functional Change: Before: The webhook pod mounted the cloudzero-api-key secret, the rendered server-config.yaml included api_key_path:, and the pusher posted to http://aggregator/collector?cluster_name=...&cloud_account_id= ...®ion=... with an `Authorization: Bearer $KEY` header. A secret monitor polled the API key file every 10 seconds for in-process rotation. After: The webhook pod has no api-key volume or volumeMount, the rendered server-config.yaml has no api_key_path field, and the pusher posts to the bare collector URL with no Authorization header. The secret monitor is no longer wired into the webhook entry point. The aggregator (collector and shipper), the agent (Prometheus and validator init containers), the config-loader Job, and other components that legitimately still need the key are unchanged. Solution: 1. helm/templates/_cm_helpers.tpl: removed the api_key_path: line from the cloudzero-agent.insightsController.configuration helper. The aggregator's configuration helper still emits api_key_path because the shipper continues to need the key for uploads. 2. helm/templates/webhook-deploy.yaml: removed the cloudzero-api-key volumeMount and volume blocks (both gated on existingSecretName or apiKey). 3. app/config/webhook/settings.go: removed the unexported RemoteWrite.apiKey field, the GetAPIKey/SetAPIKey methods, the absFilePath and isValidURL helpers, and the Settings.mu mutex (it only protected the API-key read/write pair). Simplified setRemoteWriteURL() to assign Destination verbatim with a single URL validation; the aggregator's /collector handler does not parse cluster_name/cloud_account_id/region query parameters (only the shipper consumes those, on its own outbound path). The APIKeyPath field is kept as a deprecated tombstone so legacy server-config.yaml files and pod specs that still set API_KEY_PATH parse cleanly even under future strict-mode YAML/env decoders. 4. app/domain/pusher/pusher.go: removed the GetAPIKey lookup, the empty-key error branch, the Authorization header, and the apiKey parameter from pushMetrics. 5. app/functions/webhook/main.go: removed the monitor.NewSecretMonitor instantiation and its Run/Shutdown wiring. The monitor package is still imported for TLS reloading on SIGHUP, which is unrelated. Also updated the startup log message from "Starting CloudZero Insights Controller" to "Starting CloudZero Webhook Server" to use the current component name. 6. app/domain/monitor/secret_monitor_test.go: replaced the test's dependency on the webhook config.Settings (which no longer implements MonitoredAPIKey) with a small in-test file-backed fake. 7. app/config/webhook/settings_test.go and app/domain/pusher/pusher_test.go: removed API-key fixtures; the pusher test now positively asserts that no Authorization header is sent. 8. helm/tests/webhook_no_apikey_mount_test.yaml: new helm-unittest suite asserting the rendered webhook Deployment has no cloudzero-api-key volume or volumeMount, and the rendered webhook ConfigMap contains no api_key_path field and no cluster_name/cloud_account_id query string fragments. Validation: - Go unit tests pass for app/config/webhook, app/domain/pusher, and app/domain/monitor. - make lint-go is clean; make helm-lint is clean; make helm-test passes, including the new helm-unittest suite. - Generated chart manifests under tests/helm/template/ show only the intended deletions across all six variants (alloy, cert-manager, federated, istio, kubestate, manifest): one api_key_path: line in the webhook ConfigMap, one four-line volumeMount block, and one three-line volume block per variant. No other resources change. - KIND end-to-end: deployed the freshly built image, confirmed the webhook pod has no cloudzero-api-key volume or volumeMount, the rendered server-config.yaml has no api_key_path, and the destination URL has no query string. Exercised the admission flow with test pods and confirmed the pusher's "Pushing records to remote write endpoint" log fires with records flowing to the in-cluster aggregator. - Development EKS cluster end-to-end: helm upgrade rolled cleanly, webhook started with no API-key plumbing and no startup errors, admission requests exercised, "Pushing records to remote write endpoint" log confirmed the full data path through the new pusher code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The webhook server (cloudzero-webhook binary; previously known as the insights-controller) used to push metrics directly to the CloudZero cloud API and authenticated each request with the customer's API key. It now always pushes to the in-cluster aggregator, which is unauthenticated on the cluster-internal hop. The API key secret mount, the Authorization: Bearer header, the in-process secret monitor that polled the key file for rotation, and the legacy query-string encoding of cluster_name/cloud_account_id/region in the destination URL all became dead weight on this code path.
Functional Change:
Before: The webhook pod mounted the cloudzero-api-key secret, the rendered server-config.yaml included api_key_path:, and the pusher posted to http://aggregator/collector?cluster_name=...&cloud_account_id= ...®ion=... with an
Authorization: Bearer $KEYheader. A secret monitor polled the API key file every 10 seconds for in-process rotation.After: The webhook pod has no api-key volume or volumeMount, the rendered server-config.yaml has no api_key_path field, and the pusher posts to the bare collector URL with no Authorization header. The secret monitor is no longer wired into the webhook entry point. The aggregator (collector and shipper), the agent (Prometheus and validator init containers), the config-loader Job, and other components that legitimately still need the key are unchanged.
Solution:
Validation: