feat(appkit): send internal telemetry via AppkitLog schema#332
feat(appkit): send internal telemetry via AppkitLog schema#332calvarjorge wants to merge 11 commits intomainfrom
Conversation
Introduce the AppkitLog event family (APP_STARTUP, HEARTBEAT,
REQUEST_METRICS) and a TelemetryReporter singleton that owns the
shared dispatch state, periodic heartbeat, and request metrics
aggregation. The server plugin records each matched route via
res.on('finish') middleware; the reporter flushes one event per
endpoint on a periodic timer. The legacy observability_log
APP_STARTUP is kept as a fallback until the AppkitLog schema is
deployed end-to-end on the telemetry backend.
Errors propagate from the inner senders so consumers can see
exactly what was POSTed and how the endpoint responded; the only
catches live at the SDK's outermost boundaries (fire-and-forget
startup + interval timers).
Adds an Internal Telemetry tab in dev-playground that lets you
trigger each event on demand and renders the request, response,
and an equivalent curl command. Disable with
disableInternalTelemetry: true or APPKIT_TELEMETRY_DISABLED=true.
Also unblocks the pre-commit knip hook by ignoring the @cyclonedx/cdxgen
dependency, which is invoked dynamically via pnpm exec from the
release:sbom script.
Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
The dev-playground registers the jobs() plugin, whose manifest requires DATABRICKS_JOB_ID, but app.yaml never declared a binding for it. As a result, deploying the playground to a fresh Databricks App fails AppKit's startup resource validation with "Missing required resources: job:Job [jobs]". Add the missing entry alongside the other resource bindings. Co-authored-by: Isaac Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
The bare /telemetry endpoint rejects SP bearer tokens and 302s to /login.html?next_url=..., which the previous code tried to follow verbatim — but a relative location is not a valid fetch URL and threw a "Failed to parse URL" error that the legacy try/catch silently swallowed. Switch the dispatch URL to the SP-friendly /telemetry-ext endpoint, and harden redirect handling by resolving the location against the original request URL. Co-authored-by: Isaac Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
Now that the AppkitLog dispatch path is verified end-to-end against /telemetry-ext, the observability_log fallback is no longer needed. Remove sendStartupTelemetry, the dead StartupTelemetryParams / buildEntityId / buildLegacyStartupPayload helpers, and the second fire-and-forget block in createApp's bootstrap. Migrate the sender test suite to cover sendAppkitLogs (which retains all the URL, auth, redirect, and error-propagation guarantees) and rewrite the core internal-telemetry tests against the reporter mock. Co-authored-by: Isaac Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
Databricks Apps injects DATABRICKS_CLIENT_ID (the app's OAuth client UUID) into the runtime env, not DATABRICKS_APP_ID, so the old lookup always resolved to "" and the AppkitLog.app_id field went out empty. Switch the bootstrap to read DATABRICKS_CLIENT_ID so logs carry the actual per-app identifier. Co-authored-by: Isaac Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
Rename APPKIT_TELEMETRY_DISABLED to DISABLE_APPKIT_INTERNAL_TELEMETRY.
The new name makes it explicit that this controls AppKit's
internal/anonymized telemetry, not the user-facing OpenTelemetry
config exposed via createApp({ telemetry }).
Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
sender.ts only wrapped postTelemetry with buildAppkitPayload — one caller, no shared logic worth a dedicated file. Have the reporter's #send call postTelemetry directly and rename the test file from sender.test.ts to client.test.ts so the wire-format coverage (URL, auth, redirects, error propagation) clearly targets postTelemetry. Co-authored-by: Isaac Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
Document exactly what AppKit collects (event_name, app_id, appkit_version, plus per-event bodies for APP_STARTUP, HEARTBEAT, and REQUEST_METRICS), how it's sent, and the two ways to disable — disableInternalTelemetry on createApp and the DISABLE_APPKIT_INTERNAL_TELEMETRY env var. Bump faq.md's sidebar position to 8 to make room. Add a one-paragraph header in the package's index.ts pointing at the public doc. Co-authored-by: Isaac Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
- Refuse cross-origin redirects from /telemetry-ext so the live SP Authorization header cannot be replayed against a third party. - Fall back to serviceCtx.client.config.host when DATABRICKS_HOST is unset so the dispatch URL still resolves correctly when the SDK was given a pre-configured WorkspaceClient. - Redact Authorization / Cookie / Set-Cookie when surfacing the request in the dev-playground debug UI and in the printed curl, so the sensitive headers don't leak via the response or get copy-pasted into shared logs. - Revert the knip.json @cyclonedx/cdxgen exception. Earlier diagnosis was wrong — the warnings are notices, not errors, and the original pre-commit failures came from this branch's own unused exports (now trimmed). With the branch rebased on origin/ main, knip exits 0 against the unmodified config. Co-authored-by: Isaac Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
Both branches in fetchWithRedirect (cross-origin throw + same-origin follow) want to release the redirect's response body before moving on. Run the cancel once after we've parsed the target URL, before deciding what to do, instead of repeating it on each side. Co-authored-by: Isaac Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
The /telemetry-ext endpoint may not actually issue redirects in practice; the follow logic was added defensively. Inline a single fetch with redirect: "manual" so any 3xx surfaces directly to the caller (and the dev-playground UI), making it easy to verify whether the redirect path is exercised on real traffic. If the deployed app never produces a 3xx response, the helper function and its tests stay deleted; if it does, restore them. Co-authored-by: Isaac Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>
04a9847 to
08378b9
Compare
pkosiec
left a comment
There was a problem hiding this comment.
Nice work! Could you share internally a short demo how it works E2E? not sure how to test it properly.
Thanks!
| export { | ||
| TelemetryReporter, | ||
| type TelemetrySendRequest, | ||
| type TelemetrySendResponse, | ||
| type TelemetrySendResult, | ||
| } from "./internal-telemetry"; |
There was a problem hiding this comment.
Do we need to export those types? If so, why?
There was a problem hiding this comment.
Maybe we should rework this page as Privacy policy or sth like that? Internal telemetry might be confused with plugin's telemetry 🤔
Also, IMO it should be the last item on the sidebar
| cache?: CacheConfig; | ||
| client?: WorkspaceClient; | ||
| onPluginsReady?: (appkit: PluginMap<T>) => void | Promise<void>; | ||
| disableInternalTelemetry?: boolean; |
There was a problem hiding this comment.
I'd vote for removing that. IMO the environmental variable is enough 👍
| ``` | ||
|
|
||
| Either one fully disables the reporter — no events are emitted and no | ||
| network calls are made. |
There was a problem hiding this comment.
We could also support DO_NOT_TRACK=1 - https://donottrack.sh/
| ## Inspecting events locally | ||
|
|
||
| The dev-playground ships an **Internal Telemetry** tab that lets you | ||
| trigger each event type on demand and inspect the exact request, the | ||
| response, and a `curl` command you can replay. Use it to verify what your | ||
| deployed app would send before enabling telemetry in production. |
There was a problem hiding this comment.
Dev playground is only for us, AppKit maintainers and contributors. IMO we shouldn't describe it here.
|
|
||
| .databricks | ||
|
|
||
| .superset/config.json |
There was a problem hiding this comment.
We can ignore the whole dir, right?
BTW, personally I added such user-specific files to my global gitignore - you can consider such approach too 👍
| return this.#send([ | ||
| this.#wrap({ | ||
| event_name: "HEARTBEAT", | ||
| heartbeat_event: { placeholder: true }, |
There was a problem hiding this comment.
why placeholder: true? shouldn't it be an empty object for now?
Same for startup
| export { isInternalTelemetryEnabled } from "./config.js"; | ||
| export { TelemetryReporter } from "./reporter.js"; |
There was a problem hiding this comment.
Is there any reason why those exports are for the .js files?
| await params.client.config.authenticate(headers); | ||
|
|
||
| const controller = new AbortController(); | ||
| const timer = setTimeout(() => controller.abort(), TIMEOUT_MS); | ||
|
|
||
| try { | ||
| const response = await fetch(url, { | ||
| method: "POST", | ||
| headers, | ||
| body, | ||
| signal: controller.signal, | ||
| redirect: "manual", | ||
| }); | ||
| const responseBody = await response.text(); | ||
| return { | ||
| request: { url, method: "POST", headers: headersToObject(headers), body }, | ||
| response: { | ||
| status: response.status, | ||
| statusText: response.statusText, | ||
| body: responseBody, | ||
| }, | ||
| }; | ||
| } finally { | ||
| clearTimeout(timer); |
There was a problem hiding this comment.
Something to check, but I believe we could use the client to do an arbitrary request with all the necessary headers already - I think it was client.apiClient.request() but not sure, pls double check if that's possible
| response_count_http5xx?: number; | ||
| } | ||
|
|
||
| export interface AppkitLog { |
There was a problem hiding this comment.
Why Log? shouldn't we call it AppEvent / AppKitEvent etc.? 🤔
| appkitVersion: productVersion, | ||
| }); | ||
| reporter.start(); | ||
| reporter.sendStartup().catch(() => {}); |
There was a problem hiding this comment.
we are calling this (which is asynchronous) and basically ignoring the promise, is this on purpose?
Summary
AppkitLogevents (APP_STARTUP,HEARTBEAT,REQUEST_METRICS) and aTelemetryReportersingleton that owns shared dispatch state, the periodic heartbeat, and per-endpoint request metrics aggregation.res.on('finish')middleware; the reporter flushes one event per(method, route)on a periodic timer (configurable viaAPPKIT_TELEMETRY_HEARTBEAT_INTERVAL_MS/APPKIT_TELEMETRY_METRICS_FLUSH_INTERVAL_MS)./telemetry-ext?o=<workspaceId>with the workspace SP bearer token. Errors propagate from the inner senders so consumers can see exactly what was sent and how the endpoint responded; the only swallows live at the SDK's outermost boundaries (fire-and-forget startup + interval timers).curlcommand. Disable globally withdisableInternalTelemetry: trueoncreateApporAPPKIT_TELEMETRY_DISABLED=true.app.yamlwas missing theDATABRICKS_JOB_IDbinding the jobs plugin requires (deploys to a fresh app failed startup validation); knip false-positive on thepnpm exec cdxgen-invoked@cyclonedx/cdxgendep blocked all pre-commit hooks.