improvement(helm): helm chart updates with security, ESO, and docs overhaul#4565
Conversation
…s overhaul Comprehensive Helm chart improvements bringing the chart up to industry standards for security, secret management, and documentation. Security - Pod Security Standards "restricted" defaults on every pod and container (runAsNonRoot, allowPrivilegeEscalation=false, capabilities.drop=[ALL], seccompProfile=RuntimeDefault) - automountServiceAccountToken=false on ServiceAccount and every pod - NetworkPolicy egress blocks cloud metadata endpoints by default - Sensitive app/realtime env keys auto-partitioned into chart-managed Secret via envFrom; no more plaintext secrets on container specs Secret management - Three modes: inline, existingSecret, ExternalSecrets Operator (ESO) - ESO sync supports arbitrary sensitive keys - Fail-fast template rendering when ESO enabled but sensitive key unmapped - AWS/Azure/GCP example files document all three modes Reliability - Headless Services for both Postgres StatefulSets - HPA-aware replicas (omits spec.replicas when autoscaling.enabled) - PodDisruptionBudget auto-activates when replicaCount > 1 - Startup / liveness / readiness probes with distinct timings - CronJob ttlSecondsAfterFinished for automatic cleanup Chart hygiene - Image tags default to Chart.AppVersion; pullPolicy IfNotPresent - Optional image.digest pin for content-addressed deploys - kubeVersion >=1.25.0-0 enforced - Ollama pinned to 0.23.2; mount moved to /data Documentation - README rewritten in cert-manager / Bitnami style - NOTES.txt with post-install guidance - Example values files annotated with usage and secret-strategy guidance
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
PR SummaryHigh Risk Overview Reworks configuration and secret handling to support three mutually exclusive modes (inline, Hardens and operationalizes workloads: applies restricted PSS-aligned pod/container security contexts and Reviewed by Cursor Bugbot for commit b9ceff9. Configure here. |
Greptile SummaryThis PR is a comprehensive Helm chart overhaul that applies Pod Security Standards "restricted" defaults, introduces three secret-management modes (inline,
Confidence Score: 4/5Safe to merge for most deployments; users combining NetworkPolicy with an external database should add an egress extraRule for the DB host until the chart adds it natively. The chart overhaul is well-structured and the most critical security, ESO-coverage, and env-precedence bugs were addressed during the PR review cycle. The remaining items are the external DB egress gap in NetworkPolicy (pre-existing open comment), a constant checksum/secret hash in ESO/existingSecret mode, and the maxUnavailable: 0 PDB falsy-int edge case — all non-blocking for the vast majority of deployments. helm/sim/templates/networkpolicy.yaml — still missing an egress rule for externalDatabase hosts when networkPolicy.enabled=true and externalDatabase.enabled=true, which silently blocks app-to-database connectivity. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph SecretModes["Secret Resolution"]
ESO{{"externalSecrets.enabled?"}}
ESO -->|yes| ESOPath["ExternalSecret CR\nESO syncs to K8s Secret\nvalidateExternalSecretCoverage\nenforces remoteRefs coverage"]
ESO -->|no| ExistingQ{{"existingSecret.enabled?"}}
ExistingQ -->|yes| ExistingPath["Pre-existing K8s Secret\napp.env non-empty values\ninlined as env: entries"]
ExistingQ -->|no| InlinePath["Chart-managed Secret\napp.env + realtime.env union\napp.env wins on collision"]
end
InlinePath --> EnvFrom["envFrom secretRef"]
ESOPath --> EnvFrom
ExistingPath --> EnvFrom
EnvFrom --> Pod["Pod environment"]
Reviews (15): Last reviewed commit: "fix(helm): allow cron pods through app N..." | Re-trigger Greptile |
The sim.fullname helper collapses to the release name when the release name contains the chart name. With the documented release name 'sim', actual resources are 'sim-app', 'sim-postgresql', etc. — not the 'sim-sim-*' form previously documented. Fixes copy-paste commands in the pre-1.0.0 upgrade walkthrough and several troubleshooting snippets. Also expands the cronjobs component description to reflect the full set of 13 scheduled jobs (was understated as just Gmail/Outlook polling).
…defaults - Add app.envDefaults / realtime.envDefaults for chart-shipped operational tunables (rate limits, timeouts, IVM, feature-flag defaults, localhost URL fallbacks). Rendered inline on the container, not into the Secret - Remove operational defaults from app.env / realtime.env so the chart-managed Secret stays minimal and External Secrets Operator users only map keys they actually set, not every chart default - Skip an envDefaults key when the user explicitly sets it in env (K8s `env` overrides `envFrom`, so an inline default would otherwise mask a Secret value at runtime) - Relax values.schema.json to allow empty strings on NEXT_PUBLIC_APP_URL, BETTER_AUTH_URL, NEXT_PUBLIC_SUPPORT_EMAIL (defaults supplied via envDefaults)
…cret merge order, image guard - CronJobs reference CRON_SECRET via secretKeyRef; fail-fast at template time when cronjobs.enabled=true and app.env.CRON_SECRET is empty so users get a clear error instead of a CreateContainerConfigError loop - Default externalSecrets.apiVersion to "v1beta1" (supported by every ESO release since v0.7). The previous "v1" default targets only ESO v0.17+ - Swap merge order in secrets-app.yaml so app.env wins over realtime.env for shared keys (BETTER_AUTH_SECRET, BETTER_AUTH_URL, …) — both pods consume the same Secret via envFrom, so the app value must be canonical - Add `required` guard on sim.image so an empty tag + empty digest + empty Chart.AppVersion surfaces as a clear template-time error instead of rendering an invalid `repo:` reference
|
@greptile |
|
@cursor review |
Previously, enabling externalSecrets without mapping BETTER_AUTH_SECRET / ENCRYPTION_KEY / INTERNAL_API_SECRET (and CRON_SECRET when cronjobs are on) rendered cleanly but produced CrashLoopBackOff at runtime with cryptic missing-env errors. Fail at template time instead.
Previously the auto-enable predicate only checked the static app.replicaCount, which defaults to 1 even when autoscaling is on (HPA owns spec.replicas). PDB now also activates when autoscaling.enabled=true and minReplicas > 1.
|
@greptile |
|
@cursor review |
…alues; add StatefulSet upgrade NOTES - Realtime override-skip now considers keys set in either app.env or realtime.env. The shared app Secret is mounted via envFrom on the realtime pod, so a key set in app.env (e.g. NEXT_PUBLIC_APP_URL) would previously be masked by the realtime envDefault (inline env overrides envFrom in K8s). - NOTES.txt now prints a StatefulSet orphan-delete reminder on upgrade, surfacing the immutable serviceName issue documented in the README.
|
@cursor review |
|
@greptile |
- 7 helm-unittest suites covering smoke, validators, secret modes, envDefaults secret-mode-aware inlining (round-9 regression net), chart-computed env keys (round-8 regression net), NetworkPolicy shape, and PDB/HPA conditional rendering (38 tests, ~265ms). - ci/*.yaml render fixtures for default, production, existingSecret, ESO, and external-db install modes. - GitHub Actions workflow runs helm lint --strict, helm unittest, helm template across the ci matrix, and kubeconform validation against Kubernetes 1.30 schemas. - CONTRIBUTING.md documents how to run the same gates locally.
|
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 32763747 | Triggered | Generic Password | 716a677 | helm/sim/tests/validators_test.yaml | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
- New templates/tests/test-connection.yaml renders a Pod with helm.sh/hook=test that wgets the app Service (and realtime when enabled). Lets users run `helm test <release>` after install for a real in-cluster connectivity check. Restricted PSS context. - tests.* values block (image, timeoutSeconds, resources) is the knob to disable or tune the probe; documented in values.schema.json. - 3 helm-unittest tests cover the hook annotations, PSS context, and tests.enabled=false skip path (41 tests total). - New CI job spins up a kind v1.30 cluster and runs `kubectl apply --dry-run=server` against the rendered manifests for the CRD-free ci fixtures (default / existing-secret / external-db). Catches admission and validation issues the static kubeconform schema check can't see.
This is the 1.0.0 release of the chart — there is no pre-1.0.0 predecessor for users to upgrade from, so all of the dedicated upgrade narration was hypothetical. - Drop the 'Upgrading from a pre-1.0.0 build' README section and the matching troubleshooting entry. - Drop the .Release.IsUpgrade block from NOTES.txt: items 5 (StatefulSet orphan-delete), 6 (INTERNAL_API_SECRET 'new in 1.0.0'), 7 (networkPolicy.egress shape change). Each described a migration off a chart version that never shipped. - Delete references/upgrade-pre-1.0.0.md and remove the corresponding pointers from SKILL.md. - Anchor .helmignore patterns to chart root so /tests/ (unit suites) and /examples/ are dropped from the packaged tarball without also dropping templates/tests/ (the helm test hook).
The helm-unittest suites in helm/sim/tests/ and the helm test hook in helm/sim/templates/tests/ stay — those are chart-internal quality scaffolding, not CI. Removed: - .github/workflows/helm-chart.yml - helm/sim/ci/*.yaml (5 render fixtures used only by the workflow) - helm/sim/CONTRIBUTING.md (mostly documented those gates) - dead /ci/ and /CONTRIBUTING.md entries in .helmignore
- Add checksum/secret pod annotations on app, realtime, and copilot Deployments (plus checksum/config on app when branding ConfigMap is enabled). Closes the long-standing footgun where 'helm upgrade' with a changed Secret would silently leave pods running the old values until a manual rollout restart. - New top-level topologySpreadConstraints value (and sim.topologySpreadConstraints helper) applied to app and realtime Deployments. Mirrors how affinity and tolerations are plumbed; users supply their own labelSelector to mirror Bitnami convention. - 5 helm-unittest cases cover the checksum annotations and topology spread rendering (46 tests total).
Sprig 'merge' treats "" as a real value, so a default-empty app.env.BETTER_AUTH_URL would shadow a non-empty realtime.env override and the URL would never reach the rendered Secret. Replace 'merge' with an explicit two-pass overlay that filters empties before writing, mirroring the same pattern already used in deployment-realtime.yaml's existingSecret block. Adds two regression tests: realtime.env-only value reaches the Secret when app.env is empty, and app.env still wins on collision when both are non-empty (48 tests total).
|
@greptile |
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 570e5f0. Configure here.
…tring Greptile flagged that sim.topologySpreadConstraints helper docstring promised per-component config (.Values.app, .Values.realtime, ...) but call sites passed .Values, so any app.topologySpreadConstraints / realtime.topologySpreadConstraints set by the user was silently dropped. The single global key also prevented distinct app-vs-realtime spread rules. Pass .Values.app / .Values.realtime to the helper at each call site; move the top-level topologySpreadConstraints key into both component sections in values.yaml. Adds a regression test that app constraints don't leak onto the realtime pod.
|
@greptile |
|
@cursor review |
Cursor flagged that when networkPolicy.enabled=true and cronjobs.enabled=true (the recommended production config), the app NetworkPolicy only allowed ingress from realtime and the ingress controller — silently blocking every cron pod's HTTP call to /api/schedules/execute, webhook polls, etc. All 13 default cronjobs would fail. Tag cron pods with a stable simstudio.ai/component-group: cronjob label so the app NetworkPolicy can allow them with a single rule (no per-job enumeration). Rule is conditional on cronjobs.enabled. Adds positive and negative regression tests.
|
@greptile |
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit b9ceff9. Configure here.
Summary
automountServiceAccountToken: false, block cloud metadata endpoints via NetworkPolicy egressapp.env/realtime.envkeys into a chart-managed Secret viaenvFrom(no more plaintext secrets on container specs)existingSecret, and ExternalSecrets Operator (ESO) — with fail-fast rendering when ESO is enabled but a sensitive key is unmappedreplicas, auto PodDisruptionBudget, distinct startup/liveness/readiness probes, andttlSecondsAfterFinishedon CronJobsChart.AppVersionwithpullPolicy: IfNotPresent; optionalimage.digestpin; enforcekubeVersion >=1.25.0-0Type of Change
Testing
Tested manually —
helm lintclean,helm templaterenders 2055 lines without error across all three secret modes.Checklist