diff --git a/clickstack/managed-onboarding/instrument-application.mdx b/clickstack/managed-onboarding/instrument-application.mdx new file mode 100644 index 000000000..2c2b4ce04 --- /dev/null +++ b/clickstack/managed-onboarding/instrument-application.mdx @@ -0,0 +1,153 @@ +--- +slug: /use-cases/observability/clickstack/instrument-application +title: 'Instrument an application in 5 mins with Managed ClickStack' +description: 'Instrument a Node.js application with OpenTelemetry and send its logs, metrics, and traces into Managed ClickStack' +doc_type: 'guide' +keywords: ['clickstack', 'instrumentation', 'opentelemetry', 'managed', 'observability', 'sdk', 'nodejs'] +--- + +import { Image } from "/snippets/components/Image.jsx"; + +This guide shows how to instrument a small Node.js application with OpenTelemetry and send its logs, metrics, and traces into Managed ClickStack. The backend is instrumented with no changes to the application source code. + +The [HackerNews Analyzer](https://github.com/ClickHouse/hn-news-analyzer) is a Node.js app that queries the HackerNews dataset hosted in the public ClickHouse demo. Every chart, table, and search box is backed by a real ClickHouse query, so every interaction produces a trace whose main span is the HTTPS call from the backend out to ClickHouse. + +This guide assumes you've completed setting up your OpenTelemetry Collector and have a ClickStack collector running and reachable from the machine you run this application on. **Ensure you have recorded its OTLP endpoint** and the `OTLP_AUTH_TOKEN` you set when deploying it. + +## Prerequisites {#prerequisites} + +- A ClickStack collector reachable from this machine. If you haven't deployed one yet, see the [OpenTelemetry collector guide](/clickstack/ingesting-data/collector) first. +- The OTLP endpoint of that collector and the `OTLP_AUTH_TOKEN` you set on it. +- Node 18+ and npm. 
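+To confirm the Node 18+ requirement before you start, you can compare the major version reported by `node --version`. The following is a minimal shell sketch. `node_version_ok` is a hypothetical helper, not part of the repository, and it assumes the `vMAJOR.MINOR.PATCH` format that `node --version` prints:

```shell
# Hypothetical helper: succeeds if a Node version string (as printed by
# `node --version`, e.g. "v20.11.1") has a major version of 18 or later.
node_version_ok() {
  local version="${1#v}"        # strip the leading "v"
  local major="${version%%.*}"  # keep everything before the first dot
  # Reject anything that isn't a plain integer major version.
  [[ "$major" =~ ^[0-9]+$ ]] || return 1
  (( major >= 18 ))
}

# Example:
#   node_version_ok "$(node --version)" || echo "Node 18+ is required"
```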
+ + + + +Clone the repository, install dependencies, and create your `.env` file: + +```bash +git clone https://github.com/ClickHouse/hn-news-analyzer.git +cd hn-news-analyzer +npm install +cp .env.example .env +``` + +The ClickHouse data source defaults to the public read-only demo cluster, so the app runs without any further configuration. Start it: + +```bash +./run.sh +``` + +Open [http://localhost:5001](http://localhost:5001). You will see a year selector, summary statistics, an activity chart, top users and domains tables, and a search box. Click around: switch years, drill into stories. + +The HackerNews Analyzer application running locally + +At this point the application is running but uninstrumented. ClickStack shows no data: it is waiting for telemetry. This is the "before" state. + + + + +The application needs two values to reach the collector: + +- `OTEL_EXPORTER_OTLP_ENDPOINT`: the OTLP endpoint your collector exposes (commonly port `4318` for OTLP over HTTP). +- `OTEL_EXPORTER_OTLP_HEADERS`: the authorization header carrying your ingestion token, in the form `authorization=`. + +Open `.env` and set them: + +```bash +OTEL_SERVICE_NAME=hn-analyzer-api +OTEL_EXPORTER_OTLP_ENDPOINT=https://:4318 +OTEL_EXPORTER_OTLP_HEADERS=authorization= +``` + +The SDK uses `OTEL_EXPORTER_OTLP_HEADERS` to set the authorization header for all three signals: traces, metrics, and logs. If your collector runs locally and doesn't enforce auth, you can leave the value empty (`OTEL_EXPORTER_OTLP_HEADERS=authorization=`), but the variable must be present; the SDK skips initialization entirely if it's unset or fully empty. + + + + +Instrumentation has three parts: install the SDKs, switch the launch command, and enable the browser SDK. None of it changes the application's business logic. 
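+Before installing the SDKs, you can check that `.env` defines the three variables from the previous step. The sketch below is a hypothetical helper, not part of the repository. It treats a variable as set only when its line has a non-empty value, so `OTEL_EXPORTER_OTLP_HEADERS=authorization=` passes while a bare `OTEL_EXPORTER_OTLP_HEADERS=` does not. This matches the rule described for the headers variable:

```shell
# Hypothetical helper (not part of hn-news-analyzer): report which OpenTelemetry
# variables from this guide are missing or empty in an env file.
check_otel_env() {
  local env_file="${1:-.env}"
  local missing=0 name
  if [[ ! -f "$env_file" ]]; then
    echo "no such file: $env_file" >&2
    return 2
  fi
  for name in OTEL_SERVICE_NAME OTEL_EXPORTER_OTLP_ENDPOINT OTEL_EXPORTER_OTLP_HEADERS; do
    # Require NAME= followed by at least one non-space character, so
    # "authorization=" counts as set but a bare "NAME=" does not.
    if ! grep -Eq "^${name}=[^[:space:]]" "$env_file"; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_otel_env .env
```

The helper prints one `missing:` line per absent or empty variable and returns non-zero if any are missing.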
+ +### Install the SDKs {#install-sdks} + +Install both the backend and browser OpenTelemetry SDKs: + +```bash +npm install @hyperdx/node-opentelemetry @hyperdx/browser +``` + +### Use the opentelemetry-instrument CLI {#use-open-telemetry-cli} + +The application is launched by `run.sh`, which has two `exec` lines at the bottom: one active, one commented. Switch which one is active so Node is wrapped by `opentelemetry-instrument`: + +```diff + # BEFORE: plain node, no instrumentation, collector stays silent: +-exec node scripts/entrypoint.js ++# exec node scripts/entrypoint.js + + # AFTER: same source, wrapped by opentelemetry-instrument CLI. +-# exec npx opentelemetry-instrument scripts/entrypoint.js ++exec npx opentelemetry-instrument scripts/entrypoint.js +``` + +That is the entire backend change. The auto-instrumentation is loaded by `opentelemetry-instrument` at process start. + +### Enable the browser SDK {#enable-browser-sdk} + +To capture distributed traces (browser to backend) and session replays, enable the browser SDK in `src/web/telemetry.ts`. Uncomment the import and the `HyperDX.init({...})` block: + +```typescript +import HyperDX from '@hyperdx/browser'; + +export function initTelemetry(): void { + HyperDX.init({ + url: __OTLP_ENDPOINT__, + apiKey: __OTLP_AUTH_TOKEN__, + service: 'hn-analyzer-web', + tracePropagationTargets: [/localhost:5001/i, /\/api\//i], + consoleCapture: true, + advancedNetworkCapture: true, + }); +} +``` + +No extra `.env` edits are required. `__OTLP_ENDPOINT__` and `__OTLP_AUTH_TOKEN__` are compile-time constants injected by `vite.config.ts`: the endpoint is `OTEL_EXPORTER_OTLP_ENDPOINT` and the token is parsed out of `OTEL_EXPORTER_OTLP_HEADERS`, the same values the backend uses. + + +The ingestion token is baked into the public browser bundle and is readable by anyone inspecting the network tab.
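+To see which value ends up in the browser bundle, you can approximate the parsing step in shell. This is an illustrative sketch, not the code in `vite.config.ts`. `otlp_auth_token` is a hypothetical helper. It assumes `OTEL_EXPORTER_OTLP_HEADERS` is a comma-separated list of `key=value` pairs, which is the format the OpenTelemetry SDKs use. Values in that format may be percent-encoded, and this sketch doesn't decode them:

```shell
# Hypothetical helper: print the value of the authorization entry in an
# OTEL_EXPORTER_OTLP_HEADERS-style string such as "authorization=abc123".
# Illustrative only; this is not the parsing code used by vite.config.ts.
otlp_auth_token() {
  local headers="$1" pair key
  local -a pairs
  # Split on commas without glob expansion.
  IFS=',' read -r -a pairs <<< "$headers"
  for pair in "${pairs[@]}"; do
    key="${pair%%=*}"
    key="${key// /}"                 # drop stray spaces around the key
    case "$key" in
      [Aa]uthorization)
        printf '%s\n' "${pair#*=}"   # everything after the first "="
        return 0
        ;;
    esac
  done
  return 1                           # no authorization entry found
}

# Example:
#   otlp_auth_token "$OTEL_EXPORTER_OTLP_HEADERS"
```

Whatever this prints should be treated as public once the browser SDK is enabled.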
+ + + + + +Restart the application so the new launch command and freshly built browser bundle take effect: + +```bash +# Ctrl-C the previous run, then: +./run.sh +``` + +Reload the browser tab so Vite serves the updated bundle, then refresh the app a few times, switch years, and click into stories to generate traffic. + +Open the ClickStack UI: + +1. Go to **Search** and filter to the last 5 minutes. Logs for `hn-analyzer-api` stream in. + +ClickStack Logs + +2. Click into a request and walk up the trace. You will see the Express handler span, a child HTTP span pointing at the ClickHouse cluster with real network duration, and correlated `console.log` records on the same trace. + +ClickStack Traces + +3. Open **Session Replay** to play back a scrubbable video of a browser session, synced to the trace timeline. + +ClickStack Sessions + +Logs, metrics, traces, and session replays all land in the same UI, share the same query language, and are correlated automatically. + + + + +## Further reading {#further-reading} + +- [OpenTelemetry collector](/clickstack/ingesting-data/collector) for configuring and deploying the collector. +- [Going to production](/clickstack/managing/production) for recommendations when going to production. 
diff --git a/clickstack/managed-onboarding/monitoring-aws-cloudwatch-logs.mdx b/clickstack/managed-onboarding/monitoring-aws-cloudwatch-logs.mdx new file mode 100644 index 000000000..923564de7 --- /dev/null +++ b/clickstack/managed-onboarding/monitoring-aws-cloudwatch-logs.mdx @@ -0,0 +1,305 @@ +--- +slug: /use-cases/observability/clickstack/monitoring-aws-cloudwatch-logs +title: 'Monitoring AWS CloudWatch logs with Managed ClickStack' +description: 'Forward AWS CloudWatch logs into Managed ClickStack via the OpenTelemetry CloudWatch receiver' +doc_type: 'guide' +keywords: ['clickstack', 'aws', 'cloudwatch', 'logs', 'managed', 'observability', 'otel'] +--- + +import { Image } from "/snippets/components/Image.jsx"; + +This guide walks you through forwarding AWS CloudWatch logs into Managed ClickStack using the OpenTelemetry [`awscloudwatch` receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/awscloudwatchreceiver), then viewing them in the ClickStack UI. + +We'll run a separate collector that polls CloudWatch via the AWS API and forwards events to your ClickStack collector via OTLP. Keep this collector in the same AWS account and region as the log groups to minimise API latency and cost. + +This guide assumes you've completed setting up your OpenTelemetry Collector and have a ClickStack collector running. + +The ClickStack collector can be deployed either as a **Docker container** or as a **Helm release** in Kubernetes via the upstream OpenTelemetry Helm chart with the ClickStack collector image (see [Deploying the collector](/clickstack/ingesting-data/collector#configuring-the-collector)). **Ensure you have recorded its OTLP endpoint** and the `OTLP_AUTH_TOKEN` you set when deploying it. + + + + +You'll need: + +- An **AWS account** with one or more CloudWatch log groups and credentials with the IAM permissions below. 
+- A host with **Docker** installed, AWS API access, and outbound network access to your ClickStack collector. Typically this is an EC2 instance in the same AWS account and region as the log groups. +- The **OTLP endpoint** of your ClickStack collector, reachable from this host. If it's running in Docker on the same machine, use `http://host.docker.internal:4318` (see the callout in [Configure the CloudWatch receiver](#configure-receiver)). For a remote collector, use its full URL, for example `https://otel.example.com:4318`. +- The `OTLP_AUTH_TOKEN` value you set on your ClickStack collector. If you didn't secure it, you can drop the `authorization` header from the config below. + + + + +The receiver picks up AWS credentials from the standard environment variables. Export them on the host that will run the collector. + +**For AWS SSO users:** + +```shell +aws sso login --profile YOUR_PROFILE_NAME +eval $(aws configure export-credentials --profile YOUR_PROFILE_NAME --format env) +aws sts get-caller-identity +``` + +**For IAM users with long-term credentials:** + +```shell +export AWS_ACCESS_KEY_ID="your-access-key-id" +export AWS_SECRET_ACCESS_KEY="your-secret-access-key" +export AWS_REGION="us-east-1" +aws sts get-caller-identity +``` + +The credentials need the following IAM policy to read CloudWatch logs: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "CloudWatchLogsRead", + "Effect": "Allow", + "Action": [ + "logs:DescribeLogGroups", + "logs:FilterLogEvents" + ], + "Resource": "arn:aws:logs:*:YOUR_ACCOUNT_ID:log-group:*" + } + ] +} +``` + +Replace `YOUR_ACCOUNT_ID` with your AWS account ID. + + +**Production credentials:** For production, prefer instance-attached credentials over long-term keys: an IAM role on EC2, IRSA on EKS, or a task role on ECS. The same collector config below works without any credential env vars when the receiver can resolve credentials from the instance metadata service. 
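+If you'd rather not edit the policy by hand, a small helper can fill in the account ID for you. The following is a minimal sketch. `cloudwatch_read_policy` is hypothetical. It prints the CloudWatch read policy from this guide with the ID substituted, and it rejects anything that isn't a 12-digit AWS account ID. The policy name in the commented example is also a placeholder:

```shell
# Hypothetical helper: print the CloudWatch logs read policy from this guide
# with the AWS account ID filled in.
cloudwatch_read_policy() {
  local account_id="$1"
  if [[ ! "$account_id" =~ ^[0-9]{12}$ ]]; then
    echo "expected a 12-digit AWS account ID, got: '$account_id'" >&2
    return 1
  fi
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudWatchLogsRead",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:FilterLogEvents"
      ],
      "Resource": "arn:aws:logs:*:${account_id}:log-group:*"
    }
  ]
}
EOF
}

# Example (requires AWS credentials; the policy name is a placeholder):
#   cloudwatch_read_policy "$(aws sts get-caller-identity --query Account --output text)" > cloudwatch-logs-read.json
#   aws iam create-policy --policy-name ClickStackCloudWatchLogsRead \
#     --policy-document file://cloudwatch-logs-read.json
```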
+ + + + + +Export your ClickStack collector endpoint and auth token, then create an `otel-collector-config.yaml`. + + +**Same-host setup:** The example below assumes the ClickStack collector and this CloudWatch collector run on the same host, so the receiver connects to it via `host.docker.internal` (the Docker host's address from inside a container). If your ClickStack collector lives elsewhere (an in-cluster service, a public URL, a private IP), substitute its address in `OTEL_COLLECTOR_ENDPOINT` below. + + +```shell +export OTEL_COLLECTOR_ENDPOINT="http://host.docker.internal:4318" +export OTLP_AUTH_TOKEN="a-strong-shared-secret" +``` + + + +Before editing the config, list the log groups that exist in your region so you can pick real names (and confirm the region is correct): + +```shell +aws logs describe-log-groups --region eu-central-1 \ + --query 'logGroups[].logGroupName' --output table +``` + +Example output: + +```text +------------------------------- +| DescribeLogGroups | ++-----------------------------+ +| /aws-glue/jobs/error | +| /aws-glue/jobs/logs-v2 | +| /aws-glue/jobs/output | +| /aws-glue/sessions/error | +| /aws-glue/sessions/output | ++-----------------------------+ +``` + +Use the names from this list directly in the `groups.named` block of Example 1 below. For the account above, the named-groups section would become: + +```yaml +groups: + named: + /aws-glue/jobs/error: + /aws-glue/jobs/logs-v2: + /aws-glue/jobs/output: + /aws-glue/sessions/error: + /aws-glue/sessions/output: +``` + +Alternatively, if the groups you want share a common prefix (here `/aws-glue/`), use Example 2 with `prefix: /aws-glue/` instead of listing them individually. 
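+Rather than copying names by hand, you can pipe the CLI output straight into the `groups.named` layout shown above. The following is a minimal sketch. `named_groups_yaml` is a hypothetical helper that assumes log group names contain no whitespace. CloudWatch allows only letters, digits, `_`, `-`, `/`, `.`, and `#` in group names. You still need to indent the output under `logs:` when you paste it into the config:

```shell
# Hypothetical helper: turn log group names (whitespace-separated, as printed by
# `aws logs describe-log-groups ... --output text`) into a groups.named block.
named_groups_yaml() {
  printf 'groups:\n  named:\n'
  # Put one name per line, then emit each as an empty YAML mapping key.
  tr -s '[:space:]' '\n' | while IFS= read -r name; do
    if [ -n "$name" ]; then
      printf '    %s:\n' "$name"
    fi
  done
}

# Example:
#   aws logs describe-log-groups --region eu-central-1 \
#     --query 'logGroups[].logGroupName' --output text | named_groups_yaml
```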
+ + + +**Example 1: Named log groups (recommended)** + +```shell +cat > otel-collector-config.yaml <<'EOF' +receivers: + awscloudwatch: + region: eu-central-1 + logs: + poll_interval: 1m + max_events_per_request: 100 + groups: + named: + /aws-glue/jobs/error: + /aws-glue/jobs/output: + /aws-glue/sessions/error: + +processors: + batch: + timeout: 10s + +exporters: + otlphttp: + endpoint: ${OTEL_COLLECTOR_ENDPOINT} + headers: + authorization: ${OTLP_AUTH_TOKEN} + +service: + pipelines: + logs: + receivers: [awscloudwatch] + processors: [batch] + exporters: [otlphttp] +EOF +``` + +**Example 2: Autodiscover log groups by prefix** + +```shell +cat > otel-collector-config.yaml <<'EOF' +receivers: + awscloudwatch: + region: eu-central-1 + logs: + poll_interval: 1m + max_events_per_request: 100 + groups: + autodiscover: + limit: 100 + prefix: /aws-glue/ + +processors: + batch: + timeout: 10s + +exporters: + otlphttp: + endpoint: ${OTEL_COLLECTOR_ENDPOINT} + headers: + authorization: ${OTLP_AUTH_TOKEN} + +service: + pipelines: + logs: + receivers: [awscloudwatch] + processors: [batch] + exporters: [otlphttp] +EOF +``` + +Key settings to adjust: + +- `region` to match where your log groups live. +- `poll_interval` (`1m` default). Lower values give near-real-time logs at the cost of more AWS API calls. +- `groups.named` for an explicit list, or `groups.autodiscover.prefix` to pick up every group matching a prefix. + +For the full set of options, see the [CloudWatch receiver documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/awscloudwatchreceiver). + + +**Recent logs only:** On first run, the receiver checkpoints at the current time and only fetches logs from that point forward. Historical logs aren't backfilled. + + + + + +Create a `docker-compose.yaml` alongside `otel-collector-config.yaml`. 
The `extra_hosts` entry lets the container reach a ClickStack collector running on the same host via `host.docker.internal`; the long-form bind mount errors explicitly if the config file is missing, rather than silently creating an empty directory: + +```shell +cat > docker-compose.yaml <<'EOF' +services: + otel-collector: + image: otel/opentelemetry-collector-contrib:latest + command: ["--config=/etc/otel-config.yaml"] + volumes: + - type: bind + source: ./otel-collector-config.yaml + target: /etc/otel-config.yaml + read_only: true + environment: + - AWS_ACCESS_KEY_ID + - AWS_SECRET_ACCESS_KEY + - AWS_SESSION_TOKEN + - AWS_REGION + - OTEL_COLLECTOR_ENDPOINT + - OTLP_AUTH_TOKEN + extra_hosts: + - "host.docker.internal:host-gateway" + restart: unless-stopped +EOF +``` + +Start the collector: + +```shell +docker compose up -d +``` + +Tail its logs to confirm it's polling CloudWatch and exporting to your ClickStack collector: + +```shell +docker compose logs -f otel-collector +``` + + + + +Open your service in the [ClickHouse Cloud console](https://console.clickhouse.cloud) and select **ClickStack** from the left menu. + +Launch ClickStack + +In the **Search** view, switch the source to `Logs` and set the time range to **Last 15 minutes**. CloudWatch events should appear within a couple of poll intervals. 
+ +ClickStack Search view with CloudWatch logs + +Each event carries its AWS region, source log group, and log stream as resource attributes, alongside the original log line: + +- `ResourceAttributes['aws.region']`: the AWS region (for example `eu-central-1`) +- `ResourceAttributes['cloudwatch.log.group.name']`: the source log group +- `ResourceAttributes['cloudwatch.log.stream']`: the source log stream +- `Body`: the original log line + +To show these attributes as columns, change the search's select expression to `Timestamp, SeverityText as level, ResourceAttributes['aws.region'], ResourceAttributes['cloudwatch.log.group.name'], ResourceAttributes['cloudwatch.log.stream'], Body`: + +ClickStack Search view with CloudWatch logs and attributes + +Select a log entry to inspect its metadata: + +CloudWatch attributes in the log detail view + +If nothing shows up: + +- Run `aws sts get-caller-identity` on the collector host to confirm the credentials are valid. +- Tail the collector with `docker compose logs -f otel-collector` and look for `AccessDeniedException` (IAM), `security token` errors (expired SSO credentials), `ResourceNotFoundException` (log group name typo or wrong region), or `connection refused` (your ClickStack collector endpoint is unreachable from inside the container; see the `host.docker.internal` note in [Configure the CloudWatch receiver](#configure-receiver)). +- Verify `OTEL_COLLECTOR_ENDPOINT` is reachable from a container. The collector image is minimal and doesn't include `wget` or a shell, so run the check from a throwaway `curl` container instead: `docker run --rm --add-host=host.docker.internal:host-gateway curlimages/curl -sS -o /dev/null -w '%{http_code}\n' -X POST -H 'Content-Type: application/json' -d '{}' ${OTEL_COLLECTOR_ENDPOINT}/v1/logs`. Any HTTP status code, even `401`, means the endpoint is reachable; `000` means the connection failed. +- Confirm `OTLP_AUTH_TOKEN` matches the value set on your ClickStack collector. + + + + +A pre-built dashboard with log volume, severity breakdown, and error distribution is available for download. + +Download `cloudwatch-logs-dashboard.json`, then in the ClickStack UI navigate to **Dashboards** and click **Import**. + +Import dashboard button + +Upload the JSON file and click **Finish Import**.
+ +Finish import dialog + + + + +## Further reading {#further-reading} + +- [AWS CloudWatch logs integration reference](/clickstack/integration-examples/cloudwatch) for the demo dataset, full troubleshooting, and tuning options. +- [Securing the collector](/clickstack/ingesting-data/collector#securing-the-collector) with TLS on the OTLP endpoint and least-privilege ingestion users. +- [Processing, filtering, and enriching](/clickstack/ingesting-data/collector#processing-filtering-transforming-enriching) events at the collector. +- [Going to production](/clickstack/managing/production) for recommendations when going to production. diff --git a/clickstack/managed-onboarding/monitoring-kubernetes.mdx b/clickstack/managed-onboarding/monitoring-kubernetes.mdx new file mode 100644 index 000000000..28a30e2d3 --- /dev/null +++ b/clickstack/managed-onboarding/monitoring-kubernetes.mdx @@ -0,0 +1,335 @@ +--- +slug: /use-cases/observability/clickstack/monitoring-kubernetes +title: 'Monitoring Kubernetes with Managed ClickStack' +description: 'Collect logs, infrastructure metrics, and events from a Kubernetes cluster into Managed ClickStack' +doc_type: 'guide' +keywords: ['clickstack', 'kubernetes', 'k8s', 'managed', 'observability', 'logs', 'metrics', 'events', 'daemonset', 'helm'] +--- + +import { Image } from "/snippets/components/Image.jsx"; + +This guide walks you through collecting logs, infrastructure metrics, and Kubernetes events from a cluster into Managed ClickStack, then viewing them in the built-in Kubernetes dashboard. + +The pattern is the standard OpenTelemetry one: two collectors deployed via the [OpenTelemetry Helm chart](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector), each forwarding to your ClickStack gateway collector via OTLP. A **DaemonSet** runs on every node to collect container logs and kubelet metrics. A **Deployment** with a single replica collects Kubernetes events and cluster-wide metrics. 
For background on the gateway role, see [Collector roles](/clickstack/ingesting-data/collector#collector-roles). + +This guide assumes you've completed setting up your OpenTelemetry Collector and have a ClickStack gateway collector running. + +For a Kubernetes-resident workload, the gateway collector itself should be deployed **inside the same cluster using the upstream OpenTelemetry Helm chart with the ClickStack collector image**. Follow the Helm path in [Deploying the collector](/clickstack/ingesting-data/collector#configuring-the-collector) to install it. **Ensure you have recorded this OTLP endpoint**. + + + + +You'll need: + +- A **Kubernetes cluster** (v1.20+ recommended) with `kubectl` configured against it. +- **[Helm](https://helm.sh/) v3+**. +- The **OTLP endpoint** of your ClickStack gateway collector, reachable from inside the cluster, for example `http://clickstack-otel-collector.observability.svc.cluster.local:4318`. The collector should be deployed somewhere your DaemonSets and Deployment can reach it, typically in the same cluster or via a service of type `LoadBalancer`. +- The `OTLP_AUTH_TOKEN` value you set when deploying the gateway collector. If you didn't secure the collector, you can skip the secret step below and drop the `authorization` header from the manifests. + + +**Where the gateway runs:** For a cluster-local deployment, run the gateway collector as a Kubernetes `Deployment` or `StatefulSet` inside the same cluster and address it through its in-cluster service DNS. For a gateway running outside the cluster, use its externally reachable URL. 
+ + + + + +Pick the namespace you want the collectors to live in, then create a secret holding the `OTLP_AUTH_TOKEN` and a ConfigMap pointing at your gateway: + +```shell +export OTLP_AUTH_TOKEN="a-strong-shared-secret" +export OTEL_COLLECTOR_ENDPOINT="http://clickstack-otel-collector.observability.svc.cluster.local:4318" +export NAMESPACE=observability + +kubectl create namespace ${NAMESPACE} --dry-run=client -o yaml | kubectl apply -f - + +kubectl create secret generic clickstack-otlp-secret \ + --from-literal=OTLP_AUTH_TOKEN=${OTLP_AUTH_TOKEN} \ + -n ${NAMESPACE} + +kubectl create configmap otel-config-vars \ + --from-literal=YOUR_OTEL_COLLECTOR_ENDPOINT=${OTEL_COLLECTOR_ENDPOINT} \ + -n ${NAMESPACE} +``` + +Both collectors below read these values via `extraEnvs`, so the same secret and ConfigMap are reused across them. + + + + +```shell +helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts +helm repo update +``` + + + + +This is a single-replica Deployment that collects **Kubernetes events** and **cluster-wide metrics** (node counts, pod phases, deployment status, and so on). Running more than one replica would produce duplicates. + +Save the following as `k8s_deployment.yaml`: + + + +```yaml +# k8s_deployment.yaml +mode: deployment + +image: + repository: otel/opentelemetry-collector-contrib + tag: 0.123.0 + +# We only want one of these collectors - any more and we'd produce duplicate data +replicaCount: 1 + +presets: + kubernetesAttributes: + enabled: true + extractAllPodLabels: true + extractAllPodAnnotations: true + # Collects Kubernetes events via the k8sobjects receiver. + kubernetesEvents: + enabled: true + # Collects cluster-level metrics via the k8s_cluster receiver.
+ clusterMetrics: + enabled: true + +extraEnvs: + - name: OTLP_AUTH_TOKEN + valueFrom: + secretKeyRef: + name: clickstack-otlp-secret + key: OTLP_AUTH_TOKEN + optional: true + - name: YOUR_OTEL_COLLECTOR_ENDPOINT + valueFrom: + configMapKeyRef: + name: otel-config-vars + key: YOUR_OTEL_COLLECTOR_ENDPOINT + +config: + exporters: + otlphttp: + endpoint: "${env:YOUR_OTEL_COLLECTOR_ENDPOINT}" + compression: gzip + headers: + authorization: "${env:OTLP_AUTH_TOKEN}" + service: + pipelines: + logs: + exporters: + - otlphttp + metrics: + exporters: + - otlphttp +``` + + + +Install it: + +```shell +helm install k8s-otel-deployment open-telemetry/opentelemetry-collector \ + -f k8s_deployment.yaml \ + -n ${NAMESPACE} +``` + + + + +This is a DaemonSet that runs on every node to collect **container logs**, **host metrics**, and **kubelet metrics** (per-pod and per-container CPU and memory utilisation against requests and limits). + +Save the following as `k8s_daemonset.yaml`: + + + +```yaml +# k8s_daemonset.yaml +mode: daemonset + +image: + repository: otel/opentelemetry-collector-contrib + tag: 0.123.0 + +# Required to use the kubeletstats cpu/memory utilization metrics +clusterRole: + create: true + rules: + - apiGroups: + - '' + resources: + - nodes/proxy + verbs: + - get + +presets: + logsCollection: + enabled: true + hostMetrics: + enabled: true + kubernetesAttributes: + enabled: true + extractAllPodLabels: true + extractAllPodAnnotations: true + kubeletMetrics: + enabled: true + +extraEnvs: + - name: OTLP_AUTH_TOKEN + valueFrom: + secretKeyRef: + name: clickstack-otlp-secret + key: OTLP_AUTH_TOKEN + optional: true + - name: YOUR_OTEL_COLLECTOR_ENDPOINT + valueFrom: + configMapKeyRef: + name: otel-config-vars + key: YOUR_OTEL_COLLECTOR_ENDPOINT + +config: + receivers: + # Additional kubelet metrics expressed as utilisation against requests and limits. 
+ kubeletstats: + collection_interval: 20s + auth_type: 'serviceAccount' + endpoint: '${env:K8S_NODE_NAME}:10250' + insecure_skip_verify: true + metrics: + k8s.pod.cpu_limit_utilization: + enabled: true + k8s.pod.cpu_request_utilization: + enabled: true + k8s.pod.memory_limit_utilization: + enabled: true + k8s.pod.memory_request_utilization: + enabled: true + k8s.pod.uptime: + enabled: true + k8s.node.uptime: + enabled: true + k8s.container.cpu_limit_utilization: + enabled: true + k8s.container.cpu_request_utilization: + enabled: true + k8s.container.memory_limit_utilization: + enabled: true + k8s.container.memory_request_utilization: + enabled: true + container.uptime: + enabled: true + + exporters: + otlphttp: + endpoint: "${env:YOUR_OTEL_COLLECTOR_ENDPOINT}" + compression: gzip + headers: + authorization: "${env:OTLP_AUTH_TOKEN}" + + service: + pipelines: + logs: + exporters: + - otlphttp + metrics: + exporters: + - otlphttp +``` + + + +Install it: + +```shell +helm install k8s-otel-daemonset open-telemetry/opentelemetry-collector \ + -f k8s_daemonset.yaml \ + -n ${NAMESPACE} +``` + +Confirm both releases are healthy: + +```shell +kubectl get pods -n ${NAMESPACE} -l app.kubernetes.io/name=opentelemetry-collector +``` + +You should see one Deployment pod and one DaemonSet pod per node, all in `Running` state. + + + + +To correlate your application logs, metrics, and traces with Kubernetes metadata (pod name, namespace, node, deployment), forward the metadata into your application via `OTEL_RESOURCE_ATTRIBUTES`. The DaemonSet's `k8sattributes` processor will then enrich incoming telemetry with the matching pod and node attributes. 
+ +```yaml +# my_app_deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: app-deployment +spec: + replicas: 1 + selector: + matchLabels: + app: app + template: + metadata: + labels: + app: app + service.name: + spec: + containers: + - name: app-container + image: my-image + env: + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + - name: POD_UID + valueFrom: + fieldRef: + fieldPath: metadata.uid + - name: POD_NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + - name: DEPLOYMENT_NAME + valueFrom: + fieldRef: + fieldPath: metadata.labels['deployment'] + - name: OTEL_RESOURCE_ATTRIBUTES + value: k8s.pod.name=$(POD_NAME),k8s.pod.uid=$(POD_UID),k8s.namespace.name=$(POD_NAMESPACE),k8s.node.name=$(NODE_NAME),k8s.deployment.name=$(DEPLOYMENT_NAME) +``` + + + + +Open your service in the [ClickHouse Cloud console](https://console.clickhouse.cloud) and select **ClickStack** from the left menu. + +Launch ClickStack + +In the **Search** view, switch the source to `Logs` and set the time range to **Last 15 minutes**. Container logs from across the cluster should appear within a few seconds, enriched with attributes like `k8s.namespace.name`, `k8s.pod.name`, and `k8s.node.name`. + +ClickStack Search view with Kubernetes logs + +To see infrastructure metrics and events in context, open the built-in **Kubernetes** dashboard by navigating to **Dashboards** -> **Kubernetes**. The `Pods`, `Nodes`, and `Namespaces` tabs should all be populated. + +ClickStack Kubernetes dashboard + +If nothing shows up: + +- Verify the DaemonSet and Deployment pods are `Running` and tail their logs with `kubectl logs -n ${NAMESPACE} `. +- Confirm `YOUR_OTEL_COLLECTOR_ENDPOINT` is reachable from inside the cluster. The collector image doesn't include a shell or `curl`, so start a throwaway pod instead: `kubectl run otlp-check --rm -it --restart=Never --image=curlimages/curl -n ${NAMESPACE} -- curl -sS -o /dev/null -w '%{http_code}\n' -X POST -H 'Content-Type: application/json' -d '{}' ${OTEL_COLLECTOR_ENDPOINT}/v1/logs`. Any HTTP status code, even `401`, means the endpoint is reachable. +- Check that the `OTLP_AUTH_TOKEN` in the secret matches the value set on the gateway collector.
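+To check the last item without copying the token around, you can decode the value stored in the secret and compare it with the one you expect. The following is a minimal sketch. `secret_token_matches` is a hypothetical helper. It assumes GNU-style `base64 -d`, and it uses the `clickstack-otlp-secret` secret and the `OTLP_AUTH_TOKEN` key created earlier in this guide:

```shell
# Hypothetical helper: compare the base64-encoded token stored in a Kubernetes
# Secret with the token you expect the gateway collector to accept.
# Returns 0 on a match, 1 on a mismatch, 2 if the value can't be decoded.
secret_token_matches() {
  local encoded="$1" expected="$2" decoded
  decoded="$(printf '%s' "$encoded" | base64 -d)" || return 2
  [[ "$decoded" == "$expected" ]]
}

# Example (requires cluster access):
#   secret_token_matches \
#     "$(kubectl get secret clickstack-otlp-secret -n "$NAMESPACE" -o jsonpath='{.data.OTLP_AUTH_TOKEN}')" \
#     "$OTLP_AUTH_TOKEN" && echo "token matches" || echo "token differs"
```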
+ + + + +## Further reading {#further-reading} + +- [Kubernetes integration reference](/clickstack/integration-examples/kubernetes) for the full set of receivers, processors, and tuning options. +- [Securing the collector](/clickstack/ingesting-data/collector#securing-the-collector) with TLS on the OTLP endpoint and least-privilege ingestion users. +- [Estimating resources](/clickstack/ingesting-data/collector#estimating-resources) for gateway and agent deployments at your expected throughput. +- [Going to production](/clickstack/managing/production) for recommendations when going to production. diff --git a/clickstack/managed-onboarding/tuning-clickstack-schema.mdx b/clickstack/managed-onboarding/tuning-clickstack-schema.mdx new file mode 100644 index 000000000..460f4dfe6 --- /dev/null +++ b/clickstack/managed-onboarding/tuning-clickstack-schema.mdx @@ -0,0 +1,86 @@ +--- +slug: /use-cases/observability/clickstack/tuning-clickstack-schema +title: 'Tuning Managed ClickStack - refining your schema' +description: 'Refine your ClickStack schema for improved query performance and storage efficiency in Managed ClickStack' +doc_type: 'guide' +keywords: ['clickstack', 'tuning', 'schema', 'managed', 'observability', 'performance', 'optimization', 'storage'] +--- + +If you've been running ClickStack for a while, you've probably noticed that the default schema handles most observability workloads without any changes. This page is for when that's no longer enough: query latency starts to climb, or your access patterns have drifted away from the defaults. + +Four optimizations cover most of what helps in practice. They're listed roughly in order of effort. The first two are local `ALTER TABLE` changes you can roll out incrementally. The third pays off when the same aggregation runs over and over on a dashboard. The fourth needs a table migration, so it's the most involved. + +The summaries below are short on purpose. 
For the reasoning behind each change, benchmarks, and the recipes for rolling it out to existing data, see [Performance tuning](/clickstack/managing/performance-tuning). + + + + +Filtering on `LogAttributes['service.version']` asks ClickHouse to load and decode the whole `LogAttributes` Map for every row it examines. Promote the attribute to a `MATERIALIZED` column and the same filter becomes a column read, usually an order of magnitude faster. ClickStack rewrites the filter automatically once the column exists, so saved searches and dashboards keep working unchanged. + +Pick the attributes you actually query often. Each materialized column costs storage and insert time, so this is a "promote what you use" exercise rather than a "promote everything" one. + +```sql +ALTER TABLE otel_logs + ADD COLUMN ServiceVersion LowCardinality(String) + MATERIALIZED LogAttributes['service.version']; +``` + +The `ALTER` only changes the table definition. Rows inserted before it have no stored values for the new column. ClickHouse computes those values from the expression at query time, so these rows don't get the speed-up until you also run `ALTER TABLE otel_logs MATERIALIZE COLUMN ServiceVersion`. + +Read more: [Materialize frequently queried attributes](/clickstack/managing/performance-tuning#materialize-frequently-queried-attributes). + + + + +Skip indexes let ClickHouse rule out granules of data that can't match a filter, turning a scan into a small targeted read. Three types are worth knowing about: + +- **Text indexes** (`text(tokenizer = ...)`) on string columns and on `mapKeys`/`*AttributeItems` arrays. The default logs schema already ships these. +- **Min-max indexes** on numeric columns filtered by range. Trace `Duration` is the classic case. +- **Bloom filters** for high-cardinality equality lookups on ClickHouse versions that don't yet support text indexes. + +```sql +ALTER TABLE otel_traces ADD INDEX idx_duration Duration TYPE minmax GRANULARITY 1; +ALTER TABLE otel_traces MATERIALIZE INDEX idx_duration; +``` + +A skip index only pays for its evaluation cost if it actually prunes granules.
Confirm with `EXPLAIN indexes = 1` on a representative query before assuming it helped. + +Read more: [Adding skip indexes](/clickstack/managing/performance-tuning#adding-skip-indexes). + + + + +When the same aggregation runs again and again on a dashboard (top services by error rate, p99 latency per endpoint, request counts per minute), a materialized view computes the result at insert time and writes it to a small rollup table. Dashboards then read the rollup instead of the raw logs or traces, which is far cheaper. + +This pays off when the dashboard is hot and the underlying table is large. The cost is some insert-time CPU and a second table to maintain. + +Read more: [Exploiting materialized views](/clickstack/managing/performance-tuning#exploiting-materialized-views). + + + + +The primary key controls how rows are sorted on disk. Filters on the leading columns of that key let ClickHouse seek straight to the relevant region. Filters that don't constrain those leading columns prune far fewer granules and can end up scanning most of each partition. + +The default logs key `(toStartOfFiveMinutes(Timestamp), ServiceName, Timestamp)` favors "what happened in the last N minutes for service X". If most of your queries lead with a different column (a tenant ID, a customer ID, a region), changing the primary key to lead with that column is the highest-impact change you can make. + +```sql +CREATE TABLE otel_logs_v2 +( + -- same columns as otel_logs +) +ENGINE = MergeTree +ORDER BY (TenantId, ServiceName, Timestamp); +``` + +ClickHouse doesn't allow editing the primary key in place, so this is a table migration rather than a simple `ALTER`. The performance tuning guide walks through creating the new table, redirecting ingestion, and using a `Merge` table so existing dashboards keep working across old and new data. + +Read more: [Modifying the primary key](/clickstack/managing/performance-tuning#modifying-the-primary-key).
+ + + + +## Further reading {#further-reading} + +- [Performance tuning](/clickstack/managing/performance-tuning): full guide, including projections and row-lookup acceleration. +- [Tables and schemas used by ClickStack](/clickstack/ingesting-data/schemas): the canonical DDL the optimizations build on. +- [Going to production](/clickstack/managing/production): broader production recommendations. diff --git a/images/clickstack/cloudwatch/error-log-column-values-clickstack.png b/images/clickstack/cloudwatch/error-log-column-values-clickstack.png new file mode 100644 index 000000000..416d5e4f5 Binary files /dev/null and b/images/clickstack/cloudwatch/error-log-column-values-clickstack.png differ diff --git a/images/clickstack/cloudwatch/finish-clickstack-import.png b/images/clickstack/cloudwatch/finish-clickstack-import.png new file mode 100644 index 000000000..76f6760eb Binary files /dev/null and b/images/clickstack/cloudwatch/finish-clickstack-import.png differ diff --git a/images/clickstack/cloudwatch/log-search-attributes-clickstack.png b/images/clickstack/cloudwatch/log-search-attributes-clickstack.png new file mode 100644 index 000000000..26d83abb4 Binary files /dev/null and b/images/clickstack/cloudwatch/log-search-attributes-clickstack.png differ diff --git a/images/clickstack/cloudwatch/log-search-view-clickstack.png b/images/clickstack/cloudwatch/log-search-view-clickstack.png new file mode 100644 index 000000000..2ca04865a Binary files /dev/null and b/images/clickstack/cloudwatch/log-search-view-clickstack.png differ diff --git a/images/clickstack/getting-started/hackernews_main.png b/images/clickstack/getting-started/hackernews_main.png new file mode 100644 index 000000000..37110fc4e Binary files /dev/null and b/images/clickstack/getting-started/hackernews_main.png differ diff --git a/images/clickstack/getting-started/instrument_app_clickstack_logs.png b/images/clickstack/getting-started/instrument_app_clickstack_logs.png new file mode 100644 index 
000000000..16b8acba6 Binary files /dev/null and b/images/clickstack/getting-started/instrument_app_clickstack_logs.png differ diff --git a/images/clickstack/getting-started/instrument_app_clickstack_sessions.png b/images/clickstack/getting-started/instrument_app_clickstack_sessions.png new file mode 100644 index 000000000..b5cf8e562 Binary files /dev/null and b/images/clickstack/getting-started/instrument_app_clickstack_sessions.png differ diff --git a/images/clickstack/getting-started/instrument_app_clickstack_traces.png b/images/clickstack/getting-started/instrument_app_clickstack_traces.png new file mode 100644 index 000000000..51c064aaf Binary files /dev/null and b/images/clickstack/getting-started/instrument_app_clickstack_traces.png differ