
v0.6.76: helm updates, media centering, lazy loading, security hardening#4576

Merged
waleedlatif1 merged 10 commits into main from staging
May 13, 2026


@waleedlatif1 waleedlatif1 commented May 13, 2026

waleedlatif1 and others added 8 commits May 12, 2026 14:23
#4569)

* fix(helm): preserve STS serviceName + networkPolicy.egress back-compat

Greptile flagged two real upgrade-breaking changes vs the prior chart:

1. statefulset-postgresql spec.serviceName flipped from <name>-postgresql
   to <name>-postgresql-headless. spec.serviceName is immutable, so any
   existing install would hit 'Forbidden: updates to statefulset spec ...'
   on helm upgrade. Revert to the original name (the headless Service in
   services.yaml is added alongside, not as a swap).

2. networkPolicy.egress changed from a list to a map ({extraRules, exceptCidrs}),
   silently dropping any custom egress list set by existing users. Restore
   the original list semantics for networkPolicy.egress and move cloud-metadata
   blocking to a sibling top-level field networkPolicy.egressExceptCidrs.

Adds NOTES.txt upgrade-notes entry covering both + the ESO v1→v1beta1 default
flip (functionally a no-op, but worth surfacing).

* docs(helm): update README egress reference to new key name

* fix(helm): revert copilot-postgresql STS serviceName too (same immutability issue)

Audit caught that the main fix in d5c2e8e missed statefulset-copilot-postgres.yaml,
which had the identical immutable-field rename from -copilot-postgresql to
-copilot-postgresql-headless. Same upgrade-break vector for anyone running
copilot.enabled=true on a prior chart version. Mirrors the fix and comment
from the main postgresql STS.

* improvement(helm): postgres startupProbe + otel-collector NetworkPolicy

- add startupProbe defaults for both postgresql + copilot-postgresql STSs
  to shield liveness from slow first-boot (pgvector init, WAL replay)
- render a dedicated NetworkPolicy for the otel-collector when
  telemetry.enabled=true (OTLP ingress from app/realtime/copilot, DNS +
  HTTPS egress for forwarding to external observability backends)
- document why copilot + copilot-postgresql intentionally do NOT ship
  dedicated NetworkPolicies (Redis URL is unknowable at render time)
- regression test pins the otel-collector NP at documentIndex 3

* test(helm): assert custom egress applied to realtime NP too

The prior test claimed coverage of both app and realtime NPs but only
asserted documentIndex 0. Split into two tests so a regression that drops
custom egress from realtime would fail loudly.

* docs(helm-skill): trim narrative bloat in values-model

Cut the historical 'Layer 2 was added in chart 1.0.0' note and the
generic 'single source of truth' framing. Kept the two actionable
points: ESO requires mapping Layer 1 keys; app.env overrides
envDefaults.
* fix(docs): restore media centering and full-width intro image

* fix(docs): drop overflow-hidden from intro media wrappers so focus ring is not clipped

* fix(docs): use inset focus ring on lightbox media so parent overflow-hidden cannot clip it

* fix(docs): drop focus ring on lightbox media to match original UI
…ydration, safer materialization, and batched parallel execution (#4560)

* improvement(resolver): lazy resolution for underlying fields greater than 10MB

* progress

* feat(parallel): batching

* codegen to allow inline substitution

* address comments

* ui inconsistencies

* cleanup redundant code

* address more comments

* address comments

* replace helper

* fix tests
…mode (#4573)

* improvement(workflow-block): support manual workflow ID via advanced mode

* fix(input-mapping): resolve workflowId via canonical hook for advanced mode

* fix(input-mapping): fall back to manualWorkflowId in preview context

* refactor(input-mapping): resolve workflowId via useDependsOnGate canonical pattern
…h, credential access (#4571)

* fix(security): harden HIGH deepsec findings across multiple attack surfaces

- Supabase tools (get_row, delete, update): validate table name with strict
  identifier regex and encodeURIComponent to prevent LLM-controlled path
  traversal to admin endpoints; add missing empty-filter guard to update
  matching the delete.ts pattern

- SFTP/SMTP/SharePoint upload routes: add verifyFileAccess ownership check
  before downloadFileFromStorage, matching the WordPress reference pattern;
  rejects files the requesting user does not own with 404

- Gmail labels, OneDrive folders, Wealthbox items (×2): replace bare
  resolveOAuthAccountId + workspace-only membership check with
  authorizeCredentialUse which enforces credentialMember table; use
  credentialOwnerUserId for token refresh instead of bare accountRow.userId

- A2A utils: thread pre-resolved IP from validateUrlWithDNS into A2A SDK
  via pinnedFetch (secureFetchWithPinnedIP) for JsonRpcTransportFactory,
  RestTransportFactory, and DefaultAgentCardResolver, closing the TOCTOU
  DNS rebinding window

- SSH utils: cap stdout/stderr accumulation at 16 MB with truncation marker
  to prevent OOM from unbounded command output

- Form DELETE route: replace db.delete() with db.update({archivedAt}) for
  true soft delete matching the schema's archivedAt column

- Workflow admin import: fix Array.isArray() guard that silently dropped
  all variables (export format is Record, not Array)

- Multipart upload: apply checkStorageQuota and MAX_WORKSPACE_FILE_SIZE to
  mothership context, closing the quota bypass for workspace-scoped storage
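The SSH bullet above describes capping accumulated output and appending a truncation marker. A minimal sketch of that idea follows. All names here (`createCappedOutput`, `TRUNCATION_MARKER`) are hypothetical, the marker text is invented for illustration, and the sketch counts UTF-16 code units rather than bytes, so it approximates the 16 MB cap rather than reproducing the real implementation.

```typescript
// Hypothetical helper: only the 16 MB figure and the idea of a truncation
// marker come from the commit message. Length is measured in UTF-16 code
// units here, which is an approximation of a byte cap.
const MAX_OUTPUT_CHARS = 16 * 1024 * 1024
const TRUNCATION_MARKER = '\n[output truncated]'

function createCappedOutput(limit: number = MAX_OUTPUT_CHARS) {
  let text = ''
  let truncated = false
  return {
    // Append a chunk, keeping only the part that still fits under the cap.
    append(chunk: string): void {
      if (truncated) return
      const remaining = limit - text.length
      if (chunk.length <= remaining) {
        text += chunk
        return
      }
      text += chunk.slice(0, remaining)
      truncated = true
    },
    // Return the accumulated text, with the marker if anything was dropped.
    value(): string {
      return truncated ? text + TRUNCATION_MARKER : text
    },
  }
}
```

Once the cap is hit, later chunks are discarded without further allocation, which is the property that bounds memory use.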

* fix(security): eliminate workspace env lost-update race with atomic JSONB ops

PUT: use `variables || excluded.variables` in onConflictDoUpdate so
concurrent writes merge atomically in the DB instead of last-writer-wins
at the application layer.

DELETE: replace the read-modify-write upsert with a single UPDATE that
removes keys via the JSONB `-` operator, preventing concurrent deletes
from resurrecting previously-removed secrets.

* fix(security): address audit findings from security fix review

- SMTP send: restructure attachment loop from Promise.all to sequential
  for...of so verifyFileAccess denial returns 404 instead of propagating
  as a generic 500 via the SMTP error classifier

- Supabase tools: extend table-name validation and encodeURIComponent to
  the five previously missed tools — insert, upsert, count, query,
  text_search — completing coverage across all nine Supabase tools

- Credential routes: remove unnecessary `request as any` casts in Gmail,
  OneDrive, and Wealthbox routes; authorizeCredentialUse already accepts
  NextRequest directly

- Form soft delete: also set isActive=false alongside archivedAt so that
  any future code paths querying by isActive see a consistent state

- SSH utils: fix exit code fallback from 0 to -1 so an abnormally closed
  connection that supplies no exit code is not reported as success

- Workspace env: capitalize EXCLUDED.variables in the onConflictDoUpdate
  set clause to make the pseudo-table reference unambiguous

* fix(security): address PR review comments and harden deepsec fixes

- fix(env): replace jsonb operators with transaction+FOR UPDATE read-modify-write
  - PUT: uses db.transaction + SELECT FOR UPDATE + JS merge to avoid lost-update race
  - DELETE: same pattern; fixes variable scope bug where current was referenced outside tx
  - removes broken || and - jsonb operators that fail on json-typed column

- fix(ssh): trim truncated output consistently with non-truncated path

- fix(gmail): remove redundant resolveOAuthAccountId call
  - adds credentialType field to CredentialAccessResult
  - authorizeCredentialUse now returns credentialType in all success paths
  - gmail/labels route uses authz.credentialType and authz.resolvedCredentialId directly

- fix(supabase): centralize table identifier validation
  - adds validateDatabaseIdentifier() to input-validation.ts
  - all 8 supabase tools use the shared util instead of inline regex
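The shared identifier check described above might look roughly like the sketch below. The real `validateDatabaseIdentifier()` signature and regex are not shown in this PR text, so `tableIdentifierError`, `buildTablePath`, and the pattern itself are assumptions made for illustration. The `/rest/v1/` prefix is used only as an example path.

```typescript
// Assumed pattern: a letter or underscore, then letters, digits, underscores.
// The actual regex used by the project may differ.
const IDENTIFIER_PATTERN = /^[A-Za-z_][A-Za-z0-9_]*$/

// Hypothetical stand-in for the shared validator: returns an error message,
// or null when the table name is acceptable.
function tableIdentifierError(table: string): string | null {
  if (!IDENTIFIER_PATTERN.test(table)) {
    return `Invalid table name: ${JSON.stringify(table)}`
  }
  return null
}

// Validate first, then encode, so a model-supplied name cannot introduce
// extra path segments such as "../auth/v1/admin/users".
function buildTablePath(table: string): string {
  const error = tableIdentifierError(table)
  if (error !== null) throw new Error(error)
  return `/rest/v1/${encodeURIComponent(table)}`
}
```

Validation and encoding are deliberately both applied: the regex rejects traversal attempts outright, and `encodeURIComponent` stays as a second layer if the pattern is ever loosened.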

* fix(workflows): fix VariableType assignment in admin workflow import route

The intermediate Record cast used 'string' for the type field which TypeScript
correctly rejected — WorkflowVariable.type is 'VariableType', not string.
Changed the cast to use VariableType so both branches typecheck correctly.
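The import bug described earlier (an `Array.isArray()` guard dropping variables exported as a Record) can be illustrated with a small normalizer. This is a sketch under assumptions: the members of `VariableType`, the fields of `WorkflowVariable`, and the function name `normalizeVariables` are hypothetical and not taken from the codebase.

```typescript
// Hypothetical shapes for illustration only.
type VariableType = 'string' | 'number' | 'boolean'

interface WorkflowVariable {
  id: string
  name: string
  type: VariableType
  value: unknown
}

// Accept both shapes: an array of variables, or a Record keyed by id.
// A guard that only accepts arrays would return [] for the Record form,
// which is the silent-drop behavior the fix addresses.
function normalizeVariables(input: unknown): WorkflowVariable[] {
  if (Array.isArray(input)) return input as WorkflowVariable[]
  if (input !== null && typeof input === 'object') {
    const record = input as Record<string, WorkflowVariable>
    return Object.keys(record).map((key) => record[key])
  }
  return []
}
```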

* fix(a2a): handle Request objects in pinnedFetch URL extraction

* fix(security): extract shared file-access guard; merge workspace/mothership branch

* fix(security): advisory lock for env first-insert race; handle all BodyInit types in pinnedFetch

* chore: remove inline comment from advisory lock

* fix(security): remove stray comment; narrow credentialType to literal union

* fix(security): add credentialId validation to wealthbox oauth route; fix null body override in pinnedFetch

* fix(security): stream A2A response body to unblock SSE; keep text/json/arrayBuffer for non-streaming callers

* fix(security): resolve credentialId guard on OneDrive, use assertToolFileAccess in WordPress, memoize body buffer to prevent silent empty reads, fix ArrayBuffer type cast

* fix(security): handle string[][] HeadersInit format in pinnedFetch

* fix(security): keep abort listener alive during body streaming; clean up in stream end/error/cancel

* chore: remove extraneous inline comment

* fix(security): cleanup abort listener when maxResponseBytes limit is exceeded
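One of the commits above handles the `string[][]` form of `HeadersInit` in pinnedFetch. A minimal sketch of normalizing that form alongside a plain record follows. The name `toHeaderRecord` and the local types are hypothetical, and `Headers` instances are left out to keep the example environment-independent; it does not reproduce the actual pinnedFetch code.

```typescript
type HeaderPairs = string[][]
type HeaderRecord = Record<string, string>

// Normalize either supported shape into a lowercase-keyed record.
// Repeated names in the pair form are joined with ", ", which matches how
// the Fetch Headers class combines appended values.
function toHeaderRecord(init: HeaderPairs | HeaderRecord | undefined): HeaderRecord {
  const out: HeaderRecord = {}
  if (!init) return out
  if (Array.isArray(init)) {
    for (const pair of init) {
      const name = pair[0]
      const value = pair[1]
      if (pair.length !== 2 || name === undefined || value === undefined) {
        throw new TypeError('Header entry must be a [name, value] pair')
      }
      const key = name.toLowerCase()
      const seen = Object.prototype.hasOwnProperty.call(out, key)
      out[key] = seen ? `${out[key]}, ${value}` : value
    }
    return out
  }
  for (const name of Object.keys(init)) {
    out[name.toLowerCase()] = init[name]
  }
  return out
}
```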
…conciling dropped SSE events (#4575)

* fix(console): match child-workflow inner blocks by instanceId when reconciling dropped SSE events

* fix(console): drop noisy warn when reconcile finds no matching entry
…o clear stuck-yellow task tiles (#4556)

* fix(mothership): reconcile stuck conversation_id against Redis lock to clear stuck-yellow task tiles

copilot_chats.conversation_id has no TTL/heartbeat, so when a stream
process dies before the clear path runs (pod OOM, SIGKILL, uncaught
throw, deploy mid-stream) the column is orphaned and the task tile
renders yellow forever. The Redis lock at copilot:chat-stream-lock:<chatId>
is the canonical liveness signal and self-heals via 60s TTL + 20s
heartbeat, but the mothership APIs weren't consulting it.

Adds read-time reconciliation: a batched MGET helper checks whether
each persisted conversation_id still has a live Redis lock, and both
GET /api/mothership/chats and GET /api/mothership/chats/[chatId]
rewrite the marker to null when the lock has expired. No DB writes;
stuck rows self-heal on next fetch.
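The read-time reconciliation described above can be sketched as follows. The lock key prefix comes from the commit message; everything else (`ChatRow`, `reconcile`, `reconcileWithLocks`, the injected `mget` function, and the assumption that any non-null lock value means the stream is live) is hypothetical and only illustrates the shape of the approach.

```typescript
interface ChatRow {
  id: string
  conversationId: string | null
}

const LOCK_PREFIX = 'copilot:chat-stream-lock:'

// Pure core: null out the marker for chats whose lock is no longer live.
// Nothing is written back to the database.
function reconcile(chats: ChatRow[], liveChatIds: Set<string>): ChatRow[] {
  return chats.map((chat) =>
    chat.conversationId !== null && !liveChatIds.has(chat.id)
      ? { ...chat, conversationId: null }
      : chat
  )
}

// Batched lookup: one MGET for all chats that still carry a marker.
// `mget` is injected so the sketch does not depend on a Redis client.
async function reconcileWithLocks(
  chats: ChatRow[],
  mget: (keys: string[]) => Promise<(string | null)[]>
): Promise<ChatRow[]> {
  const candidates = chats.filter((chat) => chat.conversationId !== null)
  if (candidates.length === 0) return chats
  const values = await mget(candidates.map((chat) => LOCK_PREFIX + chat.id))
  const live = new Set<string>()
  candidates.forEach((chat, i) => {
    // Assumption: a present (non-null) lock value means the stream is alive.
    if (values[i] !== null && values[i] !== undefined) live.add(chat.id)
  })
  return reconcile(chats, live)
}
```

Because the rewrite happens only in the response, a row orphaned by a crashed process looks cleared to the client as soon as the lock's TTL lapses.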

* test(mothership): clarify test name to reflect that getActiveChatStreamIds is called with empty candidateIds

* address comments

* fix state machine issue

* cleanup code and fix types

---------

Co-authored-by: Vikhyath Mondreti <vikhyath@simstudio.ai>
)

* improvement(grafana): align tools and block with official Grafana API spec

Validates and corrects the Grafana integration against the official API
docs: fixes wire-format field naming for provisioned alert rules
(missing_series_evals_to_resolve, keepFiringFor, orgID), adds
X-Disable-Provenance support, expands alert-rule params (isPaused,
notificationSettings, record, annotations, labels), corrects defaults
(execErrState=Error, dashboard overwrite=false), and centralizes alert-rule
output mapping in a shared utils module.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(grafana): correct wire-format casing for provisioned alert rule fields

Grafana's ProvisionedAlertRule schema (verified against upstream Go source
and swagger spec) uses keep_firing_for (snake_case) and
missingSeriesEvalsToResolve (camelCase) — the opposite of what prior audit
rounds assumed. POST/PUT bodies now send the correct field names; mapAlertRule
reads the correct primary names with the old casings kept as fallbacks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
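The fallback reading described in this commit (correct wire names first, old casings second) could be sketched like this. The wire field names are taken from the commit message; `firstDefined`, `mapAlertRuleTimingFields`, and the output property names are hypothetical and not the project's actual `mapAlertRule`.

```typescript
type RawAlertRule = Record<string, unknown>

// Return the first key whose value is present on the raw payload.
function firstDefined(raw: RawAlertRule, keys: string[]): unknown {
  for (const key of keys) {
    const value = raw[key]
    if (value !== undefined && value !== null) return value
  }
  return undefined
}

// Primary names per the commit message: keep_firing_for (snake_case) and
// missingSeriesEvalsToResolve (camelCase). The opposite casings are kept
// only as fallbacks.
function mapAlertRuleTimingFields(raw: RawAlertRule) {
  return {
    keepFiringFor: firstDefined(raw, ['keep_firing_for', 'keepFiringFor']),
    missingSeriesEvalsToResolve: firstDefined(raw, [
      'missingSeriesEvalsToResolve',
      'missing_series_evals_to_resolve',
    ]),
  }
}
```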

* fix(grafana): address PR review feedback

- Drop hardcoded orgID: 1 fallback; only send orgID when organizationId is
  provided, so token-scoped org context drives rule placement.
- Surface invalid JSON for notificationSettings/record on alert rule
  create/update instead of silently dropping the input.
- Fix execErrState description in update_alert_rule to include Error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(grafana): surface invalid JSON for annotations/labels/data on alert rules

Match the behavior of other JSON params (data, notificationSettings, record):
return a descriptive error instead of silently falling back to {} (create)
or keeping the existing value (update).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
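Surfacing invalid JSON instead of silently falling back, as this commit describes, might be sketched as below. The helper name `parseJsonObjectParam` is hypothetical, and this version throws rather than returning an error result; it also assumes object-shaped params such as annotations and labels, so it would not fit array-shaped inputs.

```typescript
// Parse an optional JSON-object parameter. An absent or blank value yields
// undefined (nothing to send); malformed input raises a descriptive error
// instead of being replaced with {} or ignored.
function parseJsonObjectParam(
  name: string,
  raw: string | undefined
): Record<string, unknown> | undefined {
  if (raw === undefined || raw.trim() === '') return undefined
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch (e) {
    throw new Error(`Invalid JSON for ${name}`)
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error(`${name} must be a JSON object`)
  }
  return parsed as Record<string, unknown>
}
```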

* docs(grafana): expose alert-rule output fields in generated docs

Move ALERT_RULE_OUTPUT_FIELDS from utils.ts to types.ts and rename to
SCREAMING_SNAKE_CASE so scripts/generate-docs.ts (which only resolves const
references from types.ts matching [A-Z][A-Z_0-9]+) can inline the per-field
rows into the generated alert-rule output tables.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
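The naming constraint mentioned in this commit can be checked with a one-line predicate. The character classes come from the commit message; anchoring the pattern to the whole name and the function name `isInlinableConstName` are assumptions, since the actual matching logic in scripts/generate-docs.ts is not shown here.

```typescript
// Pattern from the commit message, anchored here so the whole identifier
// must match (the anchoring is an assumption).
const CONST_NAME_PATTERN = /^[A-Z][A-Z_0-9]+$/

// Hypothetical predicate: would a const with this name be eligible for
// inlining by a generator that only resolves SCREAMING_SNAKE_CASE names?
function isInlinableConstName(name: string): boolean {
  return CONST_NAME_PATTERN.test(name)
}
```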

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

vercel Bot commented May 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | May 13, 2026 4:16am |



cursor Bot commented May 13, 2026

PR Summary

High Risk
High risk because it changes execution response serialization/streaming (including new lazy materialization hooks) and touches multiple authz/file-access paths for third-party tool routes, where regressions could impact data exposure or workflow execution reliability.

Overview
Improves execution handling for oversized outputs by compacting workflow/function route payloads and SSE block logs via durable large-value references, and adds isolated-vm brokers (sim.files.*, sim.values.read) to lazily hydrate large files/values under byte caps with clearer resource-limit errors.

Updates workflow runtime/editor behavior by introducing parallel batch size defaults/config merging to avoid partial-update clobbering, wiring streaming routes with additional execution context, and adding tests/fixes to reconcile Mothership chat activeStreamId against Redis locks (preventing “stuck-yellow” tiles).

Hardens several tool/API routes by centralizing credential authorization (authorizeCredentialUse), adding a shared assertToolFileAccess guard for storage downloads, soft-deleting forms, adding DB advisory locks for workspace env var updates, truncating SSH output at 16MB, and refreshing docs/UI (lightbox button layout, async API jobId/statusUrl, Grafana spec updates, and new guidance on large payloads/batching).

Reviewed by Cursor Bugbot for commit c21bb91.

…false (#4579)

* if preserveUserFileBase64 is on and the event still exceeds the threshold, re-compact the event with preserveUserFileBase64: false

* address comments
@waleedlatif1 waleedlatif1 merged commit 64d855a into main May 13, 2026
30 checks passed
3 participants