v0.6.75: scheduler claim-budget drain, helm chart hardening, mothership md polish #4568

Merged
TheodoreSpeaks merged 3 commits into main from staging on May 12, 2026

@TheodoreSpeaks
Collaborator

waleedlatif1 and others added 3 commits May 12, 2026 10:24
… design tokens (#4566)

* improvement(mothership): align markdown blockquote, img, em, del with design tokens

* fix(mothership): correctly scope blockquote paragraph margin reset to first/last child

* improvement(mothership): restore italic on blockquotes

* fix(mothership): widen img component prop type to satisfy Streamdown Components
…erhaul (#4565)

* improvement(helm): production-ready chart with security, ESO, and docs overhaul

Comprehensive Helm chart improvements bringing the chart up to industry
standards for security, secret management, and documentation.

Security
- Pod Security Standards "restricted" defaults on every pod and container
  (runAsNonRoot, allowPrivilegeEscalation=false, capabilities.drop=[ALL],
  seccompProfile=RuntimeDefault)
- automountServiceAccountToken=false on ServiceAccount and every pod
- NetworkPolicy egress blocks cloud metadata endpoints by default
- Sensitive app/realtime env keys auto-partitioned into chart-managed Secret
  via envFrom; no more plaintext secrets on container specs

Secret management
- Three modes: inline, existingSecret, ExternalSecrets Operator (ESO)
- ESO sync supports arbitrary sensitive keys
- Fail-fast template rendering when ESO enabled but sensitive key unmapped
- AWS/Azure/GCP example files document all three modes

Reliability
- Headless Services for both Postgres StatefulSets
- HPA-aware replicas (omits spec.replicas when autoscaling.enabled)
- PodDisruptionBudget auto-activates when replicaCount > 1
- Startup / liveness / readiness probes with distinct timings
- CronJob ttlSecondsAfterFinished for automatic cleanup

Chart hygiene
- Image tags default to Chart.AppVersion; pullPolicy IfNotPresent
- Optional image.digest pin for content-addressed deploys
- kubeVersion >=1.25.0-0 enforced
- Ollama pinned to 0.23.2; mount moved to /data

Documentation
- README rewritten in cert-manager / Bitnami style
- NOTES.txt with post-install guidance
- Example values files annotated with usage and secret-strategy guidance

* fix(helm): correct resource names in README (sim-sim-* → sim-*)

The sim.fullname helper collapses to the release name when the release
name contains the chart name. With the documented release name 'sim',
actual resources are 'sim-app', 'sim-postgresql', etc. — not the
'sim-sim-*' form previously documented. Fixes copy-paste commands in the
pre-1.0.0 upgrade walkthrough and several troubleshooting snippets.

Also expands the cronjobs component description to reflect the full set
of 13 scheduled jobs (was understated as just Gmail/Outlook polling).
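
The collapse rule described above matches the fullname helper that `helm create` scaffolds. A minimal Python sketch of that logic follows, assuming sim.fullname uses the stock scaffold; the chart's actual helper is not shown in this PR text, and `fullname` is an illustrative name.

```python
def fullname(release_name: str, chart_name: str, max_len: int = 63) -> str:
    """Model of the stock Helm fullname helper (assumed, not copied from the chart)."""
    if chart_name in release_name:
        # The release name already contains the chart name, so use it alone.
        base = release_name
    else:
        base = f"{release_name}-{chart_name}"
    # Kubernetes resource names are capped at 63 characters.
    base = base[:max_len]
    # Drop a single trailing dash left over from truncation.
    if base.endswith("-"):
        base = base[:-1]
    return base


# Release "sim" of chart "sim" collapses to "sim", so resources are "sim-app" etc.
print(fullname("sim", "sim") + "-app")
print(fullname("sim", "sim") + "-postgresql")
# A release name that does not contain the chart name keeps both parts.
print(fullname("prod", "sim") + "-app")
```

With the documented release name, this is why copy-paste commands must target `sim-app` rather than `sim-sim-app`.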

* improvement(helm): split app/realtime env into Secret-bound + inline defaults

- Add app.envDefaults / realtime.envDefaults for chart-shipped operational
  tunables (rate limits, timeouts, IVM, feature-flag defaults, localhost URL
  fallbacks). Rendered inline on the container, not into the Secret
- Remove operational defaults from app.env / realtime.env so the chart-managed
  Secret stays minimal and External Secrets Operator users only map keys they
  actually set, not every chart default
- Skip an envDefaults key when the user explicitly sets it in env (K8s `env`
  overrides `envFrom`, so an inline default would otherwise mask a Secret
  value at runtime)
- Relax values.schema.json to allow empty strings on NEXT_PUBLIC_APP_URL,
  BETTER_AUTH_URL, NEXT_PUBLIC_SUPPORT_EMAIL (defaults supplied via envDefaults)
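
The skip rule in the third bullet follows from how Kubernetes resolves a container's environment: a key listed in `env` overrides the same key loaded through `envFrom`. The sketch below models that precedence in Python. `render_inline_defaults` and `effective_env` are hypothetical names, the dictionaries stand in for the rendered Secret and container spec, and the `RATE_LIMIT` key and URL values are made up for illustration.

```python
def render_inline_defaults(env_defaults: dict, user_env: dict) -> dict:
    """Return the envDefaults to inline, skipping keys the user set in env.

    Assumption: an empty string in user_env counts as "not set".
    """
    return {k: v for k, v in env_defaults.items() if not user_env.get(k)}


def effective_env(secret_env_from: dict, inline_env: dict) -> dict:
    """Model what the container sees: inline env overrides envFrom."""
    return {**secret_env_from, **inline_env}


env_defaults = {
    "NEXT_PUBLIC_APP_URL": "http://localhost:3000",  # illustrative default
    "RATE_LIMIT": "100",                             # hypothetical tunable
}
# The user-set value ends up in the chart-managed Secret (loaded via envFrom).
user_env = {"NEXT_PUBLIC_APP_URL": "https://sim.example.com"}

inline = render_inline_defaults(env_defaults, user_env)
print(effective_env(user_env, inline))

# Without the skip, the inline default would mask the Secret value.
print(effective_env(user_env, env_defaults)["NEXT_PUBLIC_APP_URL"])
```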

* fix(helm): address PR review — cronjob validation, ESO apiVersion, secret merge order, image guard

- CronJobs reference CRON_SECRET via secretKeyRef; fail-fast at template
  time when cronjobs.enabled=true and app.env.CRON_SECRET is empty so users
  get a clear error instead of a CreateContainerConfigError loop
- Default externalSecrets.apiVersion to "v1beta1" (supported by every ESO
  release since v0.7). The previous "v1" default targets only ESO v0.17+
- Swap merge order in secrets-app.yaml so app.env wins over realtime.env
  for shared keys (BETTER_AUTH_SECRET, BETTER_AUTH_URL, …) — both pods
  consume the same Secret via envFrom, so the app value must be canonical
- Add `required` guard on sim.image so an empty tag + empty digest +
  empty Chart.AppVersion surfaces as a clear template-time error instead
  of rendering an invalid `repo:` reference
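
The image guard in the last bullet can be pictured as a small resolution function. This is a sketch under assumptions: the precedence (digest, then tag, then Chart.AppVersion) and the `repo@digest` output form are inferred from the commit text, not taken from the template, and `image_ref` and the repository name are hypothetical.

```python
def image_ref(repository: str, tag: str = "", digest: str = "", app_version: str = "") -> str:
    """Resolve an image reference, failing fast when nothing identifies the image."""
    if digest:
        # Content-addressed pin, e.g. repo@sha256:<hex>.
        return f"{repository}@{digest}"
    resolved_tag = tag or app_version
    if not resolved_tag:
        # Stands in for the template-time `required` error; avoids emitting "repo:".
        raise ValueError("set image.tag, image.digest, or Chart.AppVersion")
    return f"{repository}:{resolved_tag}"


repo = "ghcr.io/example/sim"  # hypothetical repository
print(image_ref(repo, app_version="0.6.75"))
print(image_ref(repo, tag="custom", app_version="0.6.75"))
print(image_ref(repo, digest="sha256:" + "0" * 64))
```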

* fix(helm): require critical secrets to be mapped when ESO is enabled

Previously, enabling externalSecrets without mapping BETTER_AUTH_SECRET /
ENCRYPTION_KEY / INTERNAL_API_SECRET (and CRON_SECRET when cronjobs are
on) rendered cleanly but produced CrashLoopBackOff at runtime with
cryptic missing-env errors. Fail at template time instead.

* fix(helm): auto-enable PDB when HPA minReplicas > 1

Previously the auto-enable predicate only checked the static
app.replicaCount, which defaults to 1 even when autoscaling is on
(HPA owns spec.replicas). PDB now also activates when
autoscaling.enabled=true and minReplicas > 1.
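
As a quick illustration, the auto-enable predicate described above can be written as a boolean expression. `pdb_auto_enabled` is a hypothetical name; the chart implements this in a template, and any separate explicit PDB toggle is not modeled here.

```python
def pdb_auto_enabled(replica_count: int, autoscaling_enabled: bool, min_replicas: int) -> bool:
    """PDB activates for static replicas > 1, or for HPA with minReplicas > 1."""
    return replica_count > 1 or (autoscaling_enabled and min_replicas > 1)


# Before the fix, only replica_count was checked, so this case returned False.
print(pdb_auto_enabled(replica_count=1, autoscaling_enabled=True, min_replicas=2))
# minReplicas is ignored when autoscaling is off.
print(pdb_auto_enabled(replica_count=1, autoscaling_enabled=False, min_replicas=2))
```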

* fix(helm): prevent realtime envDefaults from masking app.env Secret values; add StatefulSet upgrade NOTES

- Realtime override-skip now considers keys set in either app.env or
  realtime.env. The shared app Secret is mounted via envFrom on the
  realtime pod, so a key set in app.env (e.g. NEXT_PUBLIC_APP_URL) would
  previously be masked by the realtime envDefault (inline env overrides
  envFrom in K8s).
- NOTES.txt now prints a StatefulSet orphan-delete reminder on upgrade,
  surfacing the immutable serviceName issue documented in the README.

* feat(helm): add Claude Skill for chart deployment

Adds a skill at helm/sim/.claude/skills/sim-helm/ that teaches agents how
to deploy and troubleshoot the Sim Helm chart: install path selection
(inline / existingSecret / ESO), secret generation, the values.yaml
four-layer mental model, common-failure troubleshooting, and the
pre-1.0.0 StatefulSet orphan-delete upgrade procedure.

Skill is loadable by Claude Code, Codex, and OpenCode via the standard
skills convention (directory name matches frontmatter name).

* docs(helm): add CRON_SECRET to TL;DR, dry-run, and example install headers

The validateSecrets guard requires CRON_SECRET when cronjobs.enabled=true
(the default), but the quickstart and example file install commands
omitted it — users following the docs hit a hard template-render failure.
Adds CRON_SECRET to README TL;DR, validate-the-install dry-run snippet,
and the install command headers in all example values files.

* fix(helm): require INTERNAL_API_SECRET in inline secret mode

The ESO coverage validator already required INTERNAL_API_SECRET, but the
inline validateSecrets path only checked BETTER_AUTH_SECRET, ENCRYPTION_KEY,
and CRON_SECRET — letting inline installs render successfully and then
crash at runtime when the realtime↔app shared auth secret was missing.
Adds the same fail-fast check to the inline path.
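
The inline-mode check after this fix can be sketched as follows. The key names come from the commit messages; `missing_secrets` is a hypothetical helper, and the real validator lives in the chart's templates rather than in Python.

```python
ALWAYS_REQUIRED = ("BETTER_AUTH_SECRET", "ENCRYPTION_KEY", "INTERNAL_API_SECRET")


def missing_secrets(app_env: dict, cronjobs_enabled: bool) -> list:
    """List required keys that are absent or empty in app.env (inline mode)."""
    required = list(ALWAYS_REQUIRED)
    if cronjobs_enabled:
        # CRON_SECRET is only needed when the cron jobs are rendered.
        required.append("CRON_SECRET")
    return [key for key in required if not app_env.get(key)]


app_env = {"BETTER_AUTH_SECRET": "a", "ENCRYPTION_KEY": "b", "INTERNAL_API_SECRET": ""}
missing = missing_secrets(app_env, cronjobs_enabled=True)
if missing:
    # A template-time failure would surface here instead of a runtime crash.
    print("missing required secrets: " + ", ".join(missing))
```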

* docs(helm): surface INTERNAL_API_SECRET upgrade requirement in NOTES.txt

The new validateSecrets check makes app.env.INTERNAL_API_SECRET mandatory
on upgrade. Existing installs that never set it would hit a template
render failure with no in-context guidance. Adds an upgrade-only note
with the generation snippet and storage guidance alongside the existing
StatefulSet orphan-delete instructions.

* fix(helm): NetworkPolicy egress to OTEL collector + external-db example format

- Add app/realtime NetworkPolicy egress rules for the OpenTelemetry
  collector pod on ports 4317 (OTLP gRPC) and 4318 (OTLP HTTP) when
  telemetry.enabled=true. Without these, traces and metrics were silently
  dropped with connection-refused errors when both telemetry and
  networkPolicy were enabled.
- Migrate values-external-db.yaml from the legacy list-shaped egress
  format to the new {exceptCidrs, extraRules} object. The list form would
  replace the default object on merge and crash template rendering when
  the chart tried to access .exceptCidrs on a list.

* fix(helm): NOTES.txt no longer prints false secret warning for ESO users

The secrets-empty warning only checked app.secrets.existingSecret.enabled
before scanning app.env. ESO users intentionally leave app.env empty —
secrets come from the ESO-synced Secret — so every ESO install/upgrade
printed a misleading 'pods will fail to start' warning.

Reorders the branches so externalSecrets.enabled takes precedence: ESO
users now see a confirmation message with kubectl commands to verify the
ExternalSecret has synced. The empty-app.env warning only fires when
both ESO and existingSecret are disabled.

* fix(helm): existingSecret mode no longer drops app.env / realtime.env values

In existingSecret mode the chart-managed Secret is not rendered, so non-empty
values in app.env / realtime.env had nowhere to land — yet the envDefaults
skip logic still suppressed the matching defaults. Result: keys like
NEXT_PUBLIC_APP_URL, BETTER_AUTH_URL, and NODE_ENV silently went missing
on both pods (the example values-existing-secret.yaml hit this directly).

Both app and realtime deployments now inline non-empty values from app.env
(plus realtime.env on the realtime container) when existingSecret is enabled
and ESO is not. Inline / ESO modes are unchanged: inline still flows through
the chart-managed Secret, ESO still owns the synced Secret.

* fix(helm): correct realtime env overlay + filter chart-computed keys in existingSecret mode

Realtime: Sprig merge gives the first source precedence and treats "" as a
real value, so realtime.env empty defaults for shared keys shadowed
non-empty app.env values. Replace with deepCopy($appEnv) base + manual
non-empty overlay of $rtEnv.

Both deployments: exclude DATABASE_URL/SOCKET_SERVER_URL/OLLAMA_URL from
the existingSecret inline path so user-supplied values can't override
chart-computed ones via last-wins env semantics.

* fix(helm): skip envDefaults in existingSecret mode + document egress rename

In existingSecret mode the user's pre-existing Secret is the source of
truth (loaded via envFrom). Inlining localhost envDefaults for URL keys
(BETTER_AUTH_URL, NEXT_PUBLIC_APP_URL, ALLOWED_ORIGINS) silently shadowed
the Secret-bound values because K8s env always wins over envFrom. Skip
envDefaults entirely on both deployments when existingSecret is enabled.

Also call out the networkPolicy.egress shape change (list -> map with
exceptCidrs + extraRules) in the NOTES.txt upgrade block so operators
migrate their custom rules rather than silently losing them.

* fix(helm): copy-pasteable install commands in copilot + ESO examples

values-copilot.yaml: the install header was missing every required
copilot.server.env.* secret (AGENT_API_DB_ENCRYPTION_KEY, INTERNAL_API_SECRET,
LICENSE_KEY, SIM_BASE_URL, SIM_AGENT_API_KEY, REDIS_URL, one model key) plus
copilot.postgresql.auth.password. Pasting it as-is failed at template render.

values-external-secrets.yaml: NEXT_PUBLIC_APP_URL, BETTER_AUTH_URL, etc. were
declared under app.env / realtime.env. In ESO mode the chart-managed Secret
isn't rendered, so the validator (rightly) rejects keys in app.env that
aren't mapped under externalSecrets.remoteRefs. Moved non-secret URL/config
to envDefaults, which is inlined and not subject to the ESO mapping rule.

* polish(helm): configurable NetworkPolicy ingress peers + clearer API_ENCRYPTION_KEY comment

- networkPolicy.ingressFrom lets operators scope the ingress-controller
  rule to a specific namespace/podSelector. Defaults to a single empty
  peer (`- {}`), which is the explicit form of "any source" — same
  effective behavior as the old `from: []` but unambiguous across CNIs.
  To restrict, override with e.g.:
    networkPolicy:
      ingressFrom:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx

- API_ENCRYPTION_KEY comment: drop the "must be exactly 64 hex
  characters" phrasing that sat awkwardly next to `openssl rand -hex 32`.
  The generation command already produces the required length.

* test(helm): add helm-unittest suites + CI workflow + ci values matrix

- 7 helm-unittest suites covering smoke, validators, secret modes,
  envDefaults secret-mode-aware inlining (round-9 regression net),
  chart-computed env keys (round-8 regression net), NetworkPolicy
  shape, and PDB/HPA conditional rendering (38 tests, ~265ms).
- ci/*.yaml render fixtures for default, production, existingSecret,
  ESO, and external-db install modes.
- GitHub Actions workflow runs helm lint --strict, helm unittest,
  helm template across the ci matrix, and kubeconform validation
  against Kubernetes 1.30 schemas.
- CONTRIBUTING.md documents how to run the same gates locally.

* test(helm): add helm test hook + kind apiserver dry-run in CI

- New templates/tests/test-connection.yaml renders a Pod with
  helm.sh/hook=test that wgets the app Service (and realtime when
  enabled). Lets users run `helm test <release>` after install for
  a real in-cluster connectivity check. Restricted PSS context.
- tests.* values block (image, timeoutSeconds, resources) is the
  knob to disable or tune the probe; documented in values.schema.json.
- 3 helm-unittest tests cover the hook annotations, PSS context,
  and tests.enabled=false skip path (41 tests total).
- New CI job spins up a kind v1.30 cluster and runs
  `kubectl apply --dry-run=server` against the rendered manifests
  for the CRD-free ci fixtures (default / existing-secret /
  external-db). Catches admission and validation issues the static
  kubeconform schema check can't see.

* chore(helm): remove pre-1.0.0 upgrade fluff + tighten .helmignore

This is the 1.0.0 release of the chart — there is no pre-1.0.0
predecessor for users to upgrade from, so all of the dedicated upgrade
narration was hypothetical.

- Drop the 'Upgrading from a pre-1.0.0 build' README section and the
  matching troubleshooting entry.
- Drop the .Release.IsUpgrade block from NOTES.txt: items 5 (StatefulSet
  orphan-delete), 6 (INTERNAL_API_SECRET 'new in 1.0.0'), 7
  (networkPolicy.egress shape change). Each described a migration off a
  chart version that never shipped.
- Delete references/upgrade-pre-1.0.0.md and remove the corresponding
  pointers from SKILL.md.
- Anchor .helmignore patterns to chart root so /tests/ (unit suites)
  and /examples/ are dropped from the packaged tarball without also
  dropping templates/tests/ (the helm test hook).

* chore(helm): drop CI workflow + ci/ fixtures + CONTRIBUTING.md

The helm-unittest suites in helm/sim/tests/ and the helm test hook
in helm/sim/templates/tests/ stay — those are chart-internal quality
scaffolding, not CI. Removed:

- .github/workflows/helm-chart.yml
- helm/sim/ci/*.yaml (5 render fixtures used only by the workflow)
- helm/sim/CONTRIBUTING.md (mostly documented those gates)
- dead /ci/ and /CONTRIBUTING.md entries in .helmignore

* feat(helm): pod rollout on Secret change + topologySpreadConstraints

- Add checksum/secret pod annotations on app, realtime, and copilot
  Deployments (plus checksum/config on app when branding ConfigMap is
  enabled). Closes the long-standing footgun where 'helm upgrade' with
  a changed Secret would silently leave pods running the old values
  until a manual rollout restart.
- New top-level topologySpreadConstraints value (and sim.topologySpreadConstraints
  helper) applied to app and realtime Deployments. Mirrors how affinity
  and tolerations are plumbed; users supply their own labelSelector
  to mirror Bitnami convention.
- 5 helm-unittest cases cover the checksum annotations and topology
  spread rendering (46 tests total).
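
The checksum annotation works because annotations are part of the pod template: when the hash changes, the Deployment's template changes and Kubernetes rolls the pods. A minimal sketch of the idea, assuming the chart hashes the rendered Secret manifest with SHA-256 (the common Helm pattern; the exact input the chart hashes is not shown here):

```python
import hashlib


def secret_checksum(rendered_secret: str) -> str:
    """Hash the rendered Secret manifest text."""
    return hashlib.sha256(rendered_secret.encode("utf-8")).hexdigest()


def pod_annotations(rendered_secret: str) -> dict:
    """Annotations placed on the pod template, keyed as in the commit message."""
    return {"checksum/secret": secret_checksum(rendered_secret)}


# Illustrative manifests; only the value differs between upgrades.
before = pod_annotations("BETTER_AUTH_URL: https://old.example.com\n")
after = pod_annotations("BETTER_AUTH_URL: https://new.example.com\n")

# A differing annotation means a differing pod template, which triggers a rollout.
print(before != after)
```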

* fix(helm): drop empty-string shadowing in app/realtime env merge

Sprig 'merge' treats "" as a real value, so a default-empty
app.env.BETTER_AUTH_URL would shadow a non-empty realtime.env override
and the URL would never reach the rendered Secret. Replace 'merge'
with an explicit two-pass overlay that filters empties before writing,
mirroring the same pattern already used in deployment-realtime.yaml's
existingSecret block.

Adds two regression tests: realtime.env-only value reaches the Secret
when app.env is empty, and app.env still wins on collision when both
are non-empty (48 tests total).
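
The two-pass overlay can be modeled in a few lines. This is a sketch of the behavior the regression tests describe, not the template itself: `overlay_non_empty` and `secret_data` are hypothetical names, and the pass order (realtime first, app second) is an assumption chosen so that app.env wins on collision.

```python
def overlay_non_empty(*sources: dict) -> dict:
    """Write sources in order, skipping empty strings; later sources win."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            if value != "":
                merged[key] = value
    return merged


def secret_data(app_env: dict, realtime_env: dict) -> dict:
    # Realtime values go in first, app values second, so app.env is canonical.
    return overlay_non_empty(realtime_env, app_env)


# A default-empty app.env key no longer shadows the realtime.env value.
print(secret_data({"BETTER_AUTH_URL": ""}, {"BETTER_AUTH_URL": "https://rt.example.com"}))
# When both are non-empty, app.env wins.
print(secret_data({"BETTER_AUTH_URL": "https://app.example.com"},
                  {"BETTER_AUTH_URL": "https://rt.example.com"}))
```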

* fix(helm): make topologySpreadConstraints per-component to match docstring

Greptile flagged that sim.topologySpreadConstraints helper docstring promised
per-component config (.Values.app, .Values.realtime, ...) but call sites
passed .Values, so any app.topologySpreadConstraints / realtime.topologySpreadConstraints
set by the user was silently dropped. The single global key also prevented
distinct app-vs-realtime spread rules.

Pass .Values.app / .Values.realtime to the helper at each call site; move
the top-level topologySpreadConstraints key into both component sections in
values.yaml. Adds a regression test that app constraints don't leak onto
the realtime pod.

* fix(helm): allow cron pods through app NetworkPolicy

Cursor flagged that when networkPolicy.enabled=true and cronjobs.enabled=true
(the recommended production config), the app NetworkPolicy only allowed
ingress from realtime and the ingress controller — silently blocking every
cron pod's HTTP call to /api/schedules/execute, webhook polls, etc. All 13
default cronjobs would fail.

Tag cron pods with a stable simstudio.ai/component-group: cronjob label so
the app NetworkPolicy can allow them with a single rule (no per-job
enumeration). Rule is conditional on cronjobs.enabled. Adds positive and
negative regression tests.
…4567)

* improvement(scheduler): raise per-tick claim budget to drain backlog

MAX_CRON_CLAIMS 20 -> 100; reserved workflow/job slots 10/10 -> 50/50.
Throughput was capped at 20 schedules/tick which created a 20+ hour
backlog when due work exceeded ~1 item per cron-second.

* improvement(scheduler): raise per-tick claim budget to 200

Bumps MAX_CRON_CLAIMS 100 -> 200 (workflow/job split 100/100). Pairs
with the fire-and-forget cron Lambda change so per-tick processing
time is no longer bounded by the Lambda's 50s HTTP timeout.
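
To see why the per-tick budget matters, consider a rough drain estimate: a backlog only shrinks by the budget minus whatever new work arrives each tick. The sketch below uses the 20 and 200 budgets from the commits; the backlog size and arrival rate are made-up numbers, and the model ignores the workflow/job slot split.

```python
import math
from typing import Optional


def ticks_to_drain(backlog: int, claim_budget: int, arrivals_per_tick: int) -> Optional[int]:
    """Ticks needed to clear a backlog, or None if it never drains."""
    net_per_tick = claim_budget - arrivals_per_tick
    if net_per_tick <= 0:
        # New work arrives at least as fast as it is claimed.
        return None
    return math.ceil(backlog / net_per_tick)


backlog = 1200   # hypothetical number of overdue schedules
arrivals = 10    # hypothetical new due schedules per tick

print(ticks_to_drain(backlog, 20, arrivals))    # old budget
print(ticks_to_drain(backlog, 200, arrivals))   # new budget
print(ticks_to_drain(backlog, 20, 25))          # arrivals exceed the old budget
```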
@vercel

vercel Bot commented May 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | May 12, 2026 8:07pm |

@cursor

cursor Bot commented May 12, 2026

PR Summary

Medium Risk
Medium risk because it significantly changes Helm chart templating for secrets/env precedence, network policies, and workload specs, which can break or alter Kubernetes deployments if values are misconfigured. Scheduler claim-budget increase could also increase load by executing more due work per tick.

Overview
Raises the schedule executor claim budget (from 20/10 to 200/100) to drain larger backlogs faster.

Aligns markdown rendering across chat/file/note viewers by standardizing blockquote styling and adding support/styling for em, del, and images in workspace chat content.

Overhauls the helm/sim chart for safer production ops: bumps chart metadata/versioning, adds install/upgrade guidance and NOTES.txt, tightens secret validation (including ESO fail-fast coverage checks), switches cron jobs to source CRON_SECRET from the mounted Secret, and introduces more consistent PSS-restricted security contexts, topology spread hooks, network policy improvements (cron ingress, telemetry egress, configurable ingress peers, metadata IP block exceptions), plus headless Postgres services and updated example values/tests.

Reviewed by Cursor Bugbot for commit 05892f7. Bugbot is set up for automated code reviews on this repo. Configure here.

@gitguardian

gitguardian Bot commented May 12, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 32763747 | Triggered | Generic Password | 9d2dd8f | helm/sim/tests/validators_test.yaml |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@greptile-apps
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This release bundles three changes: a 10× increase to the scheduler's per-tick claim budget to drain backlogs faster; a Helm chart major-version overhaul (0.1.0 → 1.0.0) adding security hardening, External Secrets Operator improvements, and extensive documentation; and alignment of blockquote, em, del, and img markdown rendering across five UI surfaces to use consistent design tokens.

  • Scheduler (route.ts): MAX_CRON_CLAIMS raised from 20 → 200; no logic changes beyond the budget constants.
  • Helm chart v1.0.0: sim.image gains digest-pinning and Chart.AppVersion fallback; envDefaults introduced to separate operational tunables from secret-grade values; NetworkPolicy hardened with IMDS CIDR blocking; PostgreSQL StatefulSet given a proper headless governing service (immutable field — see inline comment); networkPolicy.egress schema changed from a flat list to an object (breaking for existing users with custom egress rules).
  • Markdown rendering: blockquote, em, del, and img components added or restyled across four files to use var(--divider) / var(--text-primary) design tokens.

Confidence Score: 3/5

Safe for new installs and the app/scheduler changes; existing PostgreSQL deployments will fail to upgrade without manual StatefulSet deletion.

The PostgreSQL StatefulSet spec.serviceName change from the ClusterIP service to the new headless service is a technically correct fix, but Kubernetes treats that field as immutable after creation. Any operator running helm upgrade on an existing release will hit a hard API rejection and need to delete and recreate the StatefulSet. Additionally, the networkPolicy.egress schema shifted from a flat list to a nested object — users who previously enabled NetworkPolicy with custom egress rules will find those rules silently absent after the upgrade. Both issues are contained to the Helm chart and do not affect the application or scheduler logic, which are straightforward and low-risk.

helm/sim/templates/statefulset-postgresql.yaml (immutable serviceName change), helm/sim/values.yaml (networkPolicy.egress schema change)

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/app/api/schedules/execute/route.ts | Raises per-tick claim budget 10x (MAX_CRON_CLAIMS 20→200, RESERVED_WORKFLOW_CLAIMS 10→100) to drain schedule backlogs; logic and concurrency guards are unchanged. |
| apps/sim/app/chat/components/message/components/markdown-renderer.tsx | Blockquote styling updated to use CSS design-token variables; inconsistent with all other elements in this file, which use concrete Tailwind palette classes with explicit dark-mode variants. |
| apps/sim/app/workspace/[workspaceId]/home/components/message-content/components/chat-content/chat-content.tsx | Adds blockquote, em, del, and img markdown components using design tokens; img includes a proper null-guard and lazy loading. |
| helm/sim/templates/statefulset-postgresql.yaml | Adds headless service as governing service; also adds automountServiceAccountToken, podManagementPolicy, and updateStrategy. The governing service rename is an immutable field change that will block helm upgrade on existing deployments. |
| helm/sim/templates/_helpers.tpl | Major refactor: sim.image supports digest pinning and Chart.AppVersion fallback; sim.securityContext renamed to sim.containerSecurityContext with secure defaults; INTERNAL_API_SECRET and CRON_SECRET validation checks added (correctly guarded by existingSecret/ESO); ESO coverage validation added. |
| helm/sim/templates/networkpolicy.yaml | Significant hardening: metadata-endpoint CIDR blocking (169.254.169.254/32), configurable ingressFrom peers, cron-pod-to-app ingress rule, telemetry egress, and OTEL ports; old flat egress list migrated to egress.extraRules. |
| helm/sim/templates/external-secret-app.yaml | Refactored from hardcoded 6-key allowlist to iterating externalSecrets.remoteRefs.app, supporting both string (legacy) and map (full remoteRef block) values; apiVersion default changed to v1beta1. |
| helm/sim/values.yaml | Large refactor separating env (secret-grade, Secret-bound) from envDefaults (operational tunables, inline env on container); image tag defaults to empty string (falls back to Chart.AppVersion); networkPolicy.egress schema changed from list to object, which is breaking for existing users with custom egress rules. |
| helm/sim/templates/services.yaml | Adds headless Service (*-postgresql-headless) for PostgreSQL StatefulSet with publishNotReadyAddresses=true; existing ClusterIP service unchanged. |
| helm/sim/templates/deployment-app.yaml | HPA-aware replica gate added; pod rollout triggered on config checksum; serviceAccount token auto-mount disabled; startupProbe and topology spread support added; env routing refactored for envDefaults vs existingSecret vs ESO modes. |
| helm/sim/templates/cronjobs.yaml | Auth token now sourced from a SecretKeyRef rather than a plain value; component-group label added for NetworkPolicy targeting; serviceAccount token auto-mount disabled; job TTL cleanup field added. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[helm upgrade] --> B{statefulset-postgresql spec.serviceName changed?}
    B -- existing install --> C[Forbidden: immutable field - helm upgrade fails]
    B -- fresh install --> D[StatefulSet created with headless governing service]
    E[app.env keys] --> F{secret mode?}
    F -- inline --> G[chart-managed Secret plus envDefaults inline env]
    F -- existingSecret --> H[user Secret plus app.env inline env]
    F -- ESO --> I[ExternalSecret syncs remoteRefs.app to Secret]
    J[networkPolicy.egress] --> K{key type}
    K -- list pre-1.0 --> L[Rules silently dropped - now expects egress.extraRules]
    K -- egress.extraRules --> M[Rules applied]

Reviews (1): Last reviewed commit: "improvement(scheduler): raise per-tick c..." | Re-trigger Greptile

     {{- include "sim.postgresql.labels" . | nindent 4 }}
 spec:
-  serviceName: {{ include "sim.fullname" . }}-postgresql
+  serviceName: {{ include "sim.fullname" . }}-postgresql-headless

P1 Immutable StatefulSet field change breaks helm upgrade

spec.serviceName is an immutable field on a StatefulSet — Kubernetes only allows mutations to replicas, template, updateStrategy, persistentVolumeClaimRetentionPolicy, and minReadySeconds. Changing it from <release>-postgresql to <release>-postgresql-headless will cause helm upgrade to fail with a Forbidden: updates to statefulset spec for fields other than … error on any installation that already has the PostgreSQL StatefulSet. Operators would need to delete the StatefulSet (and re-attach the PVC) before the upgrade can proceed. Upgrade documentation or a pre-upgrade hook that deletes the old StatefulSet gracefully should be provided.

Comment thread helm/sim/values.yaml
Comment on lines +970 to +984
   ingressFrom:
     - {}

   # Custom ingress rules appended to the policy
   ingress: []

-  # Custom egress rules
-  egress: []

+  # Egress configuration
+  egress:
+    # CIDRs excluded from broad HTTPS (443) egress.
+    # Defaults block AWS/GCP/Azure IMDS (169.254.169.254/32) and ECS task metadata
+    # (169.254.170.2/32). Add your cluster's API server CIDR for stronger isolation.
+    exceptCidrs:
+      - "169.254.169.254/32"
+      - "169.254.170.2/32"
+    # Custom egress rules appended to the policy

P1 Breaking schema change for networkPolicy.egress

The networkPolicy.egress key changed type from a plain list to a map with extraRules and exceptCidrs sub-keys. Any existing deployment that set networkPolicy.egress: [...] will have those rules silently dropped because the template now reads networkPolicy.egress.extraRules rather than networkPolicy.egress. Since networkPolicy defaults to enabled: false, only deployments that explicitly enabled it are affected, but those users will lose their custom egress rules without any error or warning.
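
For operators carrying a legacy list, the reshaping is mechanical. The sketch below shows one way to convert old values into the new object before upgrading. `migrate_egress` is a hypothetical helper, not part of the chart; the default CIDRs are the ones quoted in the values.yaml excerpt above, and the sample rule uses the standard NetworkPolicy egress rule shape.

```python
DEFAULT_EXCEPT_CIDRS = ["169.254.169.254/32", "169.254.170.2/32"]


def migrate_egress(egress):
    """Convert a legacy list-shaped networkPolicy.egress into the new object."""
    if isinstance(egress, list):
        # Legacy form: every entry was a custom rule.
        return {"exceptCidrs": list(DEFAULT_EXCEPT_CIDRS), "extraRules": list(egress)}
    if isinstance(egress, dict):
        # Already the new form: fill in any missing sub-keys.
        return {
            "exceptCidrs": list(egress.get("exceptCidrs", DEFAULT_EXCEPT_CIDRS)),
            "extraRules": list(egress.get("extraRules", [])),
        }
    raise TypeError("networkPolicy.egress must be a list or a mapping")


legacy = [
    {
        "to": [{"ipBlock": {"cidr": "10.0.0.0/8"}}],       # illustrative rule
        "ports": [{"protocol": "TCP", "port": 5432}],
    }
]
print(migrate_egress(legacy))
```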

Comment on lines 117 to 121
   blockquote: ({ children }: React.HTMLAttributes<HTMLQuoteElement>) => (
-    <blockquote className='my-4 border-gray-300 border-l-4 py-1 pl-4 font-sans text-gray-700 italic dark:border-gray-600 dark:text-gray-300'>
+    <blockquote className='my-4 break-words border-[var(--divider)] border-l-2 pl-4 font-sans text-[var(--text-primary)] italic [&>p]:my-2 [&>p:first-child]:mt-0 [&>p:last-child]:mb-0'>
       {children}
     </blockquote>
   ),

P2 CSS variable mismatch with the rest of the file's styling

Every other element in COMPONENTS (paragraphs, headings, lists, tables, img, etc.) uses concrete Tailwind palette classes with explicit dark: variants (e.g. text-gray-700 dark:text-gray-300, border-gray-300 dark:border-gray-600). The updated blockquote is now the sole element using var(--divider) and var(--text-primary). In the rendering context of this component (the public chat / landing page, where the wrapper already uses var(--landing-text)), if --divider or --text-primary are not defined, the border and text color will fall through to the browser default rather than the intended dark-mode-aware value.

Comment thread helm/sim/values.yaml
Comment on lines +1506 to +1509
# ESO API version. Default "v1beta1" — supported by every ESO release from
# v0.7+ (mid-2023) through current. Set to "v1" only when targeting ESO
# v0.17+ clusters where the v1 API has graduated.
apiVersion: "v1beta1"

P2 The default ESO apiVersion was changed from "v1" to "v1beta1". Existing installations that relied on the previous default (and are running ESO ≥ v0.17 where the v1 API is stable) will now render ExternalSecret resources at the v1beta1 API version on the next helm upgrade. While v1beta1 is functionally equivalent and this is a safe downgrade, the change is not surfaced to the user through a default-diff. Adding a note in NOTES.txt about this API version shift would reduce surprise.

Suggested change
- # ESO API version. Default "v1beta1" — supported by every ESO release from
- # v0.7+ (mid-2023) through current. Set to "v1" only when targeting ESO
- # v0.17+ clusters where the v1 API has graduated.
- apiVersion: "v1beta1"
+ # ESO API version. Use "v1beta1" for broad compatibility (ESO v0.7+).
+ # Set to "v1" when targeting ESO >= v0.17 clusters where the v1 API has graduated.
+ apiVersion: "v1beta1"

TheodoreSpeaks merged commit 3e9849b into main on May 12, 2026
30 of 31 checks passed