diff --git a/products/kubernetes-operator/guides/configuration.mdx b/products/kubernetes-operator/guides/configuration.mdx index 7b9ac3d3c..f23d67c00 100644 --- a/products/kubernetes-operator/guides/configuration.mdx +++ b/products/kubernetes-operator/guides/configuration.mdx @@ -239,10 +239,10 @@ spec: resources: requests: cpu: "250m" - memory: "256Mi" + memory: "512Mi" limits: cpu: "1" - memory: "1Gi" + memory: "512Mi" ``` ### Environment variables {#environment-variables} @@ -322,6 +322,297 @@ spec: key: ``` +## External Secret {#external-secret} + +By default the operator creates and owns a Secret containing the cluster's internal credentials (interserver password, management password, keeper identity, cluster secret, named-collections key). The Secret is named after the cluster and lives in the cluster's namespace. + +If you want to manage these credentials yourself — for example, sourcing them from HashiCorp Vault, AWS Secrets Manager, or [External Secrets Operator](https://external-secrets.io/) — point the operator at a pre-existing Secret using `spec.externalSecret`: + +```yaml +apiVersion: clickhouse.com/v1alpha1 +kind: ClickHouseCluster +metadata: + name: sample +spec: + replicas: 2 + keeperClusterRef: + name: sample + dataVolumeClaimSpec: + resources: + requests: + storage: 10Gi + externalSecret: + name: my-clickhouse-credentials + policy: Observe +``` + + +The referenced Secret must reside in the **same namespace** as the ClickHouseCluster. The operator never deletes a Secret it did not create. 
+ + +### Required keys {#external-secret-required-keys} + +The Secret must contain the following keys: + +| Key | Format | When required | +|---|---|---| +| `interserver-password` | plaintext password | Always | +| `management-password` | plaintext password | Always | +| `keeper-identity` | `clickhouse:` | Always | +| `cluster-secret` | plaintext password | Always | +| `named-collections-key` | hex-encoded 16-byte AES key (32 hex chars) | ClickHouse `>= 25.12` only | + +A complete Secret looks like this: + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: my-clickhouse-credentials + namespace: sample +type: Opaque +stringData: + interserver-password: "a-strong-random-password" + management-password: "another-strong-password" + keeper-identity: "clickhouse:keeper-auth-password" + cluster-secret: "cluster-internal-secret" + named-collections-key: "0123456789abcdef0123456789abcdef" # 32 hex chars = 16 bytes +``` + +### Policy: Observe vs Manage {#external-secret-policy} + +`spec.externalSecret.policy` controls how the operator handles missing required keys: + +| Policy | Behavior on missing keys | +|---|---| +| `Observe` (default) | Reconciliation is **blocked** until every required key is present. The operator reports each missing key — and the format hint for it — via the `ExternalSecretValid` status condition and a `Warning` event. | +| `Manage` | The operator **generates** any missing required keys and writes them back to the same Secret. Useful for bootstrapping: create an empty Secret, let the operator fill it, then optionally tighten access. The operator still never deletes the Secret. | + + +Even with `policy: Manage` the Secret must already exist in the namespace — the operator never creates the Secret itself, it only writes generated keys into an existing one. If the referenced Secret is missing, reconciliation is blocked with the `ExternalSecretNotFound` reason regardless of policy. 
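For `policy: Observe`, where every required key must already be present, the Secret can be prepared ahead of time. The following is a minimal sketch, not an official tool: it generates random values in the formats from the table above and prints a manifest. The helper `rand_hex`, the variable names, and the 24-byte password length are illustrative assumptions; the Secret name and namespace are the ones used in the example above.

```shell
#!/bin/sh
# Sketch: print a Secret manifest with randomly generated credentials.
# rand_hex is a hypothetical helper; password lengths are arbitrary choices.

# Print N random bytes as lowercase hex (2*N characters).
rand_hex() {
  head -c "$1" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

INTERSERVER_PASSWORD=$(rand_hex 24)
MANAGEMENT_PASSWORD=$(rand_hex 24)
KEEPER_PASSWORD=$(rand_hex 24)
CLUSTER_SECRET=$(rand_hex 24)
# 16 random bytes -> 32 hex characters, the format of named-collections-key.
NAMED_COLLECTIONS_KEY=$(rand_hex 16)

# Build the manifest; keeper-identity uses the "clickhouse:" prefix from the table.
MANIFEST=$(cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: my-clickhouse-credentials
  namespace: sample
type: Opaque
stringData:
  interserver-password: "${INTERSERVER_PASSWORD}"
  management-password: "${MANAGEMENT_PASSWORD}"
  keeper-identity: "clickhouse:${KEEPER_PASSWORD}"
  cluster-secret: "${CLUSTER_SECRET}"
  named-collections-key: "${NAMED_COLLECTIONS_KEY}"
EOF
)

printf '%s\n' "$MANIFEST"
```

Piping the output into `kubectl apply -f -` creates the Secret. The script prints credentials to standard output, so avoid running it where output is logged.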
+ + +Pick `Observe` when an external system (Vault, ESO, sealed-secrets, GitOps) is the source of truth and you want the operator to fail loudly on misconfiguration. Pick `Manage` when you want self-sufficient bootstrapping but still want to retain ownership of the Secret object itself (for example, to back it up). + +### Status condition and troubleshooting {#external-secret-status} + +The operator exposes an `ExternalSecretValid` condition on `ClickHouseCluster.status.conditions`. Inspect it when reconciliation looks stuck: + +```bash +# Plain kubectl — works out of the box +kubectl describe clickhousecluster sample | sed -n '/Conditions:/,$p' + +# Same data as YAML +kubectl get clickhousecluster sample -o yaml | sed -n '/conditions:/,/^[^ ]/p' + +# Pretty-printed JSON (requires jq) +kubectl get clickhousecluster sample -o jsonpath='{.status.conditions}' | jq +``` + +Possible reasons: + +| `reason` | Meaning | Fix | +|---|---|---| +| `ExternalSecretNotFound` | The referenced Secret does not exist in the namespace. | Create the Secret, or fix `spec.externalSecret.name`. | +| `ExternalSecretInvalid` | The Secret exists but lacks required keys (only with `Observe`). The message lists each missing key together with its expected format. | Add the missing keys, or switch to `policy: Manage`. | +| `ExternalSecretValid` | All required keys are present and the operator is using the Secret. | — | + +The operator requeues reconciliation while the Secret is invalid, so once you add the missing keys the next reconcile picks them up automatically — no need to bounce pods. + + +The set of required keys depends on the running ClickHouse version. `named-collections-key` is only validated once the operator's version probe has detected ClickHouse `25.12` or newer. On older versions the key may be absent from the Secret. 
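The version-dependent key set can be checked locally before the operator reports `ExternalSecretInvalid`. The sketch below only mirrors the "Required keys" table; it is not the operator's validation code, and the function names `version_at_least_25_12`, `required_keys`, and `missing_keys` are hypothetical.

```shell
#!/bin/sh
# Sketch only: mirrors the "Required keys" table; not the operator's code.

# Succeeds when version $1 (major.minor[.patch...]) is at least 25.12.
version_at_least_25_12() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt 25 ] || { [ "$major" -eq 25 ] && [ "$minor" -ge 12 ]; }
}

# Print the keys required for ClickHouse version $1, one per line.
required_keys() {
  printf '%s\n' interserver-password management-password keeper-identity cluster-secret
  if version_at_least_25_12 "$1"; then
    printf '%s\n' named-collections-key
  fi
}

# Print the required keys for version $1 that are absent from the
# space-separated list of present keys in $2.
missing_keys() {
  for key in $(required_keys "$1"); do
    case " $2 " in
      *" $key "*) ;;
      *) printf '%s\n' "$key" ;;
    esac
  done
}
```

For example, `missing_keys 25.12.1.1 "interserver-password management-password keeper-identity cluster-secret"` prints `named-collections-key`, while the same key list with version `25.8.3.1` prints nothing. The key names present in a Secret are listed by `kubectl describe secret`.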
+ + +## Additional ports {#additional-ports} + +The operator exposes a fixed set of ports on every ClickHouse Pod and its headless Service: `8123` HTTP, `9000` native, `9009` interserver, `9001` management, `9363` Prometheus metrics, and the TLS variants `8443`/`9440` when TLS is enabled. To make ClickHouse listen on additional protocols — MySQL, PostgreSQL, gRPC, or any custom port — declare them in `spec.additionalPorts`: + +```yaml +spec: + additionalPorts: + - name: mysql + port: 9004 + - name: postgres + port: 9005 + - name: grpc + port: 9100 +``` + +The operator adds those ports to the Pod's `containerPorts` and to the headless Service. The complete example lives at [`examples/custom_protocols.yaml`](https://github.com/ClickHouse/clickhouse-operator/blob/main/examples/custom_protocols.yaml). + + +`additionalPorts` only opens the ports on the Kubernetes side. It does **not** configure the ClickHouse server to listen on them. You also have to enable the matching protocol in `spec.settings.extraConfig.protocols`. Without that, the port is open on the Service but nothing inside the pod is answering. + + +### End-to-end example: MySQL wire protocol {#additional-ports-mysql-example} + +To expose ClickHouse over the MySQL wire protocol on port `9004`: + +```yaml +apiVersion: clickhouse.com/v1alpha1 +kind: ClickHouseCluster +metadata: + name: sample +spec: + replicas: 1 + keeperClusterRef: + name: sample + dataVolumeClaimSpec: + resources: + requests: + storage: 2Gi + + # 1) Open the port on the Pod and the headless Service. + additionalPorts: + - name: mysql + port: 9004 + + # 2) Tell ClickHouse server to actually listen on it. 
+ settings: + extraConfig: + protocols: + mysql: + type: mysql + port: 9004 + description: "MySQL wire protocol" +``` + +After applying, verify from inside the cluster: + +```bash +kubectl exec sample-clickhouse-0-0-0 -- \ + clickhouse-client --port 9004 --query "SELECT 1" +``` + +### Field constraints {#additional-ports-constraints} + +| Field | Rule | +|---|---| +| `name` | Must match the DNS_LABEL pattern `^[a-z]([-a-z0-9]*[a-z0-9])?$`, max 63 characters. Uniqueness is enforced by the CRD as a list-map key. | +| `port` | Integer in `[1, 65535]`. The webhook rejects duplicate port numbers within the list. | + +### Reserved ports and names {#additional-ports-reserved} + +The validating webhook rejects `additionalPorts` entries that would collide with ports the operator binds itself. All TLS-related ports are reserved **unconditionally** so that flipping `spec.settings.tls.enabled` later cannot break a previously valid cluster. + +| Port | Reserved for | +|---|---| +| `8123` | HTTP | +| `8443` | HTTPS | +| `9000` | native TCP | +| `9440` | native TLS | +| `9009` | interserver | +| `9001` | management | +| `9363` | Prometheus metrics | + +The following names are also rejected — they are the operator's internal protocol-type identifiers (not the human-readable aliases): + +| Name | +|---| +| `http` | +| `http-secure` | +| `tcp` | +| `tcp-secure` | +| `interserver` | +| `management` | +| `prometheus` | + +A rejected request produces an error such as: + +``` +spec.additionalPorts[0].port: 8123 is reserved for the operator-managed HTTP port +spec.additionalPorts[0].name: "http" is reserved by the operator +``` + +## Version probe and upgrade channel {#version-probe-and-upgrade-channel} + +The operator does two independent things with cluster versions: + +1. **Version probe** — a Kubernetes `Job` that runs the container image once to detect the running ClickHouse / Keeper version. 
The detected version is recorded in `.status.version` and used by other reconciliation steps (for example, the External Secret `named-collections-key` is only required from ClickHouse `25.12`). +2. **Upgrade channel** — a periodic check against the public ClickHouse release feed (`https://clickhouse.com/data/version_date.tsv`). The operator reports whether a newer version is available via the `VersionUpgraded` status condition. It never upgrades the cluster on its own — the user is in control of the image tag. + +### Choosing a release channel {#upgrade-channel-choosing} + +`spec.upgradeChannel` selects which set of upstream releases the operator compares against. The same field exists on both `ClickHouseCluster` and `KeeperCluster`. + +```yaml +spec: + upgradeChannel: lts # or "stable", or "25.8", or omitted +``` + +Allowed values (validated by the CRD with the pattern `^(lts|stable|\d+\.\d+)?$`): + +| Value | Behavior | +|---|---| +| _empty_ (default) | The operator proposes only **minor** updates within the currently-running major.minor line. A cluster on `25.8.3.1` will be told about `25.8.4.x` but not `25.9.x`. | +| `stable` | Tracks the upstream `stable` channel — the latest release that ClickHouse Inc. flags as stable on the main release line. Receives major upgrades sooner than the `lts` channel. | +| `lts` | Tracks the upstream `lts` channel — long-term support releases. Receives major upgrades less frequently, with longer support windows. | +| `25.8` (or any `major.minor`) | Pins the channel to a specific major.minor line. Major upgrades beyond it are not proposed even if a newer version exists upstream. | + +For production, pinning the channel to an explicit `major.minor` (e.g. `25.8`) is generally preferred. 
It locks the cluster to the intended major release line and lets the operator surface a `WrongReleaseChannel` warning if any replica somehow drifts onto a different major — which matters especially when the image is referenced by a digest (`@sha256:...`) rather than by a human-readable tag. The empty default is fine for development clusters where major-version jumps are not a concern. + +### Status conditions {#version-status-conditions} + +Two conditions surface the result of the probe and the upgrade check: + +| Condition | Reason | Meaning | +|---|---|---| +| `VersionInSync` | `VersionMatch` | All replicas report the same version as the image | +| `VersionInSync` | `VersionMismatch` | Replicas are running different versions. This reason is suppressed during a planned rolling upgrade. It typically surfaces when a mutable image tag has been pinned (for example `latest` or a bare major like `26.3`) and the underlying registry has shifted between pulls, so different replicas ended up on different patches of the same tag. | +| `VersionInSync` | `VersionPending` | Version probe Job has not finished yet | +| `VersionInSync` | `VersionProbeFailed` | Probe Job failed; the operator cannot determine the running version | +| `VersionUpgraded` | `UpToDate` | The cluster is on the latest version available in the selected channel | +| `VersionUpgraded` | `MinorUpdateAvailable` | A newer patch is available in the same `major.minor` line | +| `VersionUpgraded` | `MajorUpdateAvailable` | A newer `major.minor` is available within the chosen channel | +| `VersionUpgraded` | `VersionOutdated` | The running version is out of date and will no longer receive fixes from the selected channel — typically because the major line has been dropped from `lts` or `stable` upstream | +| `VersionUpgraded` | `WrongReleaseChannel` | The running image does not belong to the selected `upgradeChannel`. 
Example: a cluster running `26.5` with `upgradeChannel: lts`, since `26.5` is not part of the upstream `lts` line. | +| `VersionUpgraded` | `UpgradeCheckFailed` | The operator could not reach the upstream release feed | + +Inspect them with: + +```bash +kubectl get clickhousecluster sample -o yaml | sed -n '/conditions:/,/^[^ ]/p' +``` + +### Overriding the version probe Job {#version-probe-template} + +The probe is implemented as a regular Kubernetes `Job`. If your cluster has admission policies that require specific tolerations, node selectors, or security contexts, or if you want to limit how long completed probe Jobs linger, override the template via `spec.versionProbeTemplate`: + +```yaml +spec: + versionProbeTemplate: + spec: + ttlSecondsAfterFinished: 600 # delete completed probe Jobs 10 minutes after completion + template: + spec: + nodeSelector: + kubernetes.io/arch: amd64 + tolerations: + - key: dedicated + operator: Equal + value: clickhouse + effect: NoSchedule + containers: + - name: version-probe + resources: + requests: + cpu: 50m + memory: 64Mi +``` + +The container name `version-probe` is the operator's default — the entry under `containers:` matches it by name, so the operator deep-merges the user-provided fields on top of the defaults. + +### Operator-wide controls {#version-operator-flags} + +Two flags on the operator manager control the upgrade-check loop globally: + +| Flag | Default | Effect | +|---|---|---| +| `--version-update-interval` | `24h` | How often the operator re-fetches the upstream version list | +| `--disable-version-update-checks` | `false` | Disables the upgrade checker entirely. The `VersionUpgraded` condition is not set, and no outbound HTTP traffic to `clickhouse.com` is generated | + +Set `--disable-version-update-checks=true` in air-gapped environments or when egress to `clickhouse.com` is not allowed. 
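The `spec.upgradeChannel` pattern described earlier in this section can be checked locally before applying a manifest. This is a small sketch under one assumption: POSIX extended regular expressions have no `\d`, so `[0-9]` stands in for it. The helper name `valid_upgrade_channel` is hypothetical, and the CRD validation remains the authority.

```shell
#!/bin/sh
# Sketch: local pre-check of a spec.upgradeChannel value.
# valid_upgrade_channel is a hypothetical helper, not part of the operator.
valid_upgrade_channel() {
  # Same shape as the CRD pattern ^(lts|stable|\d+\.\d+)?$,
  # with [0-9] replacing \d because grep -E does not support \d.
  printf '%s\n' "$1" | grep -Eq '^(lts|stable|[0-9]+\.[0-9]+)?$'
}
```

The function exits with status 0 for `lts`, `stable`, `25.8`, and the empty string, and non-zero for values such as `25.8.3` or `latest`.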
+ ## ClickHouse settings {#clickhouse-settings} ### Default user password {#default-user-password} @@ -426,6 +717,47 @@ spec: When enabled, the operator synchronizes Replicated and integration tables to new replicas. +### Server logging {#server-logging} + +Configure the ClickHouse server log through `spec.settings.logger`. Every field is optional with a safe default, so a cluster you never touch already logs at `trace` to both the container console and a rotated file on disk. + +```yaml +spec: + settings: + logger: + logToFile: true # Default: true. Set false to log only to the console + jsonLogs: false # Default: false. Set true for structured JSON log lines + level: trace # Default: trace + size: 1000M # Default: 1000M. Rotate a log file once it reaches this size + count: 50 # Default: 50. Number of rotated files to keep +``` + +| Field | Default | Description | +|---|---|---| +| `logToFile` | `true` | When `false`, the operator drops the file targets and the server logs only to the container console. | +| `jsonLogs` | `false` | When `true`, the operator adds `formatting.type: json` so each line is a JSON object. | +| `level` | `trace` | Log verbosity. One of `test`, `trace`, `debug`, `information`, `notice`, `warning`, `error`, `critical`, `fatal`. | +| `size` | `1000M` | Maximum size of a single log file before rotation. | +| `count` | `50` | Number of rotated log files the server retains. | + +The operator always keeps console logging on so that `kubectl logs` works, and layers file logging on top when `logToFile` is `true`. A cluster with the defaults renders this `logger` block: + +```yaml +logger: + console: true + level: trace + log: /var/log/clickhouse-server/clickhouse-server.log + errorlog: /var/log/clickhouse-server/clickhouse-server.err.log + size: 1000M + count: 50 +``` + +The same `spec.settings.logger` block applies to a `KeeperCluster`; the operator writes its files under `/var/log/clickhouse-keeper/` instead. 
+ + +Console logging stays on regardless of `logToFile`, so `kubectl logs` keeps working even when you disable file logging. Set `jsonLogs: true` when you ship logs to a structured log store that parses JSON. + + ## Custom configuration {#custom-configuration} ### Embedded extra configuration {#embedded-extra-configuration} @@ -442,8 +774,8 @@ spec: ``` #### Useful links: -* [YAML configuration examples](/concepts/features/configuration/server-config/configuration-files#example-1) -* [All server settings](/reference/settings/server-settings/settings) +* [YAML configuration examples](/core/concepts/features/configuration/server-config/configuration-files#example-1) +* [All server settings](/core/reference/settings/server-settings/settings) ### Embedded extra users configuration {#embedded-extra-users-configuration} @@ -475,7 +807,7 @@ spec: The `extraUsersConfig` is stored in k8s ConfigMap object. Avoid plain text secrets there. -#### See [documentation](/concepts/features/configuration/settings/settings-users) for all supported ClickHouse users configuration options. +#### See [documentation](/core/concepts/features/configuration/settings/settings-users) for all supported ClickHouse users configuration options. ### Configuration example {#configuration-example} diff --git a/products/kubernetes-operator/guides/introduction.mdx b/products/kubernetes-operator/guides/introduction.mdx index ab530d166..6c08a0fd9 100644 --- a/products/kubernetes-operator/guides/introduction.mdx +++ b/products/kubernetes-operator/guides/introduction.mdx @@ -101,7 +101,7 @@ The ClickHouse Operator automatically replicates database definitions across all ### What Gets Replicated {#what-gets-replicated} The operator synchronizes: -- [Replicated](/reference/engines/database-engines/replicated) database definitions +- [Replicated](/core/reference/engines/database-engines/replicated) database definitions - Integration database engines (PostgreSQL, MySQL, etc.) 
The operator does **not** synchronize: @@ -114,7 +114,7 @@ The operator does **not** synchronize: **Best practice** -Always use the [Replicated](/reference/engines/database-engines/replicated) database engine for production deployments. +Always use the [Replicated](/core/reference/engines/database-engines/replicated) database engine for production deployments. Benefits: diff --git a/products/kubernetes-operator/guides/monitoring.mdx b/products/kubernetes-operator/guides/monitoring.mdx new file mode 100644 index 000000000..a994d3d54 --- /dev/null +++ b/products/kubernetes-operator/guides/monitoring.mdx @@ -0,0 +1,384 @@ +--- +position: 3 +slug: /clickhouse-operator/guides/monitoring +title: 'Monitoring' +keywords: ['kubernetes', 'prometheus', 'monitoring', 'metrics'] +description: 'How to scrape, secure, and use the operator metrics and health endpoints.' +doc_type: 'guide' +--- + +# Monitoring the ClickHouse Operator + +The operator exposes Prometheus-compatible metrics and Kubernetes health probes so that you can observe its reconciliation activity, detect stalled controllers, and alert on failures. + +This guide covers what the operator exposes, how to scrape it, and which queries are useful day to day. + + +This guide is about the **operator process itself** (the controller manager). For ClickHouse server metrics (queries, parts, replication lag), use the [Prometheus endpoint in ClickHouse](/operations/server-configuration-parameters/settings#prometheus) to scrape it separately. + + +## Endpoints {#endpoints} + +The operator process exposes two HTTP endpoints inside the manager pod: + +| Endpoint | Default port | Path | Purpose | +|---|---|---|---| +| Metrics | `8080` (Helm) / `0` disabled (binary default) | `/metrics` | Prometheus exposition format | +| Health probe | `8081` | `/healthz`, `/readyz` | Kubernetes liveness and readiness | + +The metrics endpoint is **off by default** when running the operator binary directly (`--metrics-bind-address=0`). 
The Helm chart turns it on with `metrics.enable: true` and `metrics.port: 8080`. + +The health probe endpoint is always on; the deployment template wires `/healthz` and `/readyz` to the pod's liveness and readiness probes on port `8081`. + +## Operator binary flags {#operator-binary-flags} + +The relevant `manager` flags (defined in [`cmd/main.go`](https://github.com/ClickHouse/clickhouse-operator/blob/main/cmd/main.go)): + +| Flag | Default | Description | +|---|---|---| +| `--metrics-bind-address` | `0` (disabled) | Bind address for the metrics endpoint. Set to `:8443` for HTTPS or `:8080` for HTTP. Leave as `0` to disable the metrics server. | +| `--metrics-secure` | `true` | Serve metrics over HTTPS with authn/authz. Set to `false` for plain HTTP. | +| `--metrics-cert-path` | empty | Directory containing TLS cert files (`tls.crt`, `tls.key`) for the metrics server. | +| `--metrics-cert-name` | `tls.crt` | Cert file name inside `--metrics-cert-path`. | +| `--metrics-cert-key` | `tls.key` | Key file name inside `--metrics-cert-path`. | +| `--enable-http2` | `false` | Enable HTTP/2 for the metrics **and webhook** servers. Off by default to mitigate CVE-2023-44487 / CVE-2023-39325. | +| `--leader-elect` | `false` (binary) / `true` (Helm chart) | Enable leader election so only one replica reconciles at a time. The Helm chart sets this flag in `manager.args` by default. | +| `--health-probe-bind-address` | `:8081` | Bind address for `/healthz` and `/readyz`. | + + +The `8443` (HTTPS) / `8080` (HTTP) convention in the flag's help text is only a hint. The Helm chart serves HTTPS on `8080` because it sets both `metrics.port: 8080` and `metrics.secure: true`. There is no port-based mode detection — `--metrics-secure` is what selects HTTPS or HTTP. + + +## Enable metrics via Helm {#enable-metrics-via-helm} + +The chart already creates a `Service` for the metrics port and, optionally, a `ServiceMonitor` for prometheus-operator. 
+ +The metrics endpoint itself is on by default (`metrics.enable: true`, port `8080`, served over HTTPS via `metrics.secure: true`). The only setting you typically need to flip is `prometheus.enable` to have the chart create a `ServiceMonitor` for you: + +```yaml +# values.yaml — minimal override +prometheus: + enable: true +``` + +If you do not use cert-manager, additionally set `certManager.enable: false` and the ServiceMonitor will scrape with `insecureSkipVerify: true`, relying on bearer-token authentication only. + +The full set of metrics-related defaults is: + +```yaml +metrics: + enable: true + port: 8080 + secure: true # HTTPS with authn/authz enforced on every scrape + +certManager: + enable: true # Issues the metrics server certificate + +prometheus: + enable: false # Set to true to render the ServiceMonitor + scraping_annotations: false # Alternative: prometheus.io/scrape pod annotations +``` + +Apply: + +```bash +helm upgrade --install clickhouse-operator \ + oci://ghcr.io/clickhouse/clickhouse-operator-helm \ + -n clickhouse-operator-system --create-namespace \ + -f values.yaml +``` + +After install the chart creates: + +- `Service/metrics-service` — exposes port `8080` (HTTPS when `metrics.secure: true`). +- `ServiceMonitor/-controller-manager-metrics-monitor` — when `prometheus.enable: true`. +- `ClusterRole/-metrics-reader` — non-resource URL `/metrics` with `get` verb. + +## Securing the metrics endpoint {#securing-the-metrics-endpoint} + +When `metrics.secure: true` the metrics server enforces TLS **and** Kubernetes authentication/authorization on every scrape. Scrapers must: + +1. Present a valid Kubernetes bearer token. +2. Belong to a ServiceAccount bound to a ClusterRole granting `get` on the non-resource URL `/metrics`. 
+ +The chart ships such a ClusterRole: + +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: clickhouse-operator-metrics-reader +rules: + - nonResourceURLs: + - /metrics + verbs: + - get +``` + +Bind it to the ServiceAccount used by your scraper (typically Prometheus): + +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: prometheus-clickhouse-operator-metrics-reader +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: clickhouse-operator-metrics-reader +subjects: + - kind: ServiceAccount + name: + namespace: +``` + + +A `401 Unauthorized` from the metrics endpoint usually means the scraper is not sending a valid Kubernetes bearer token. A `403 Forbidden` usually means the token is valid but its ServiceAccount lacks the binding above. Disabling security by setting `metrics.secure: false` is **not recommended** in shared clusters because anyone with network reachability to the pod could scrape the endpoint. 
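When diagnosing a `403`, one way to confirm the binding is to ask the API server directly with `kubectl auth can-i`, which accepts non-resource URLs. The sketch below wraps that call; the function names `sa_user` and `check_metrics_access` are hypothetical, and the caller needs permission to impersonate the ServiceAccount.

```shell
#!/bin/sh
# Sketch: check whether a ServiceAccount may GET the non-resource URL /metrics.
# sa_user and check_metrics_access are hypothetical helper names.

# Print the username Kubernetes assigns to ServiceAccount $2 in namespace $1.
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Ask the API server whether that ServiceAccount may GET /metrics.
# Prints "yes" or "no"; requires kubectl and impersonation rights.
check_metrics_access() {
  kubectl auth can-i get /metrics --as="$(sa_user "$1" "$2")"
}
```

For a scraper running as ServiceAccount `prometheus` in namespace `monitoring` (both placeholders), `check_metrics_access monitoring prometheus` should print `yes` once the ClusterRoleBinding is in place.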
+ + +## ServiceMonitor reference {#servicemonitor-reference} + +The chart renders a ServiceMonitor of this shape when `prometheus.enable: true`: + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + name: -controller-manager-metrics-monitor + namespace: + labels: + control-plane: controller-manager +spec: + selector: + matchLabels: + control-plane: controller-manager + endpoints: + - path: /metrics + port: https # "http" when metrics.secure: false + scheme: https + bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token + tlsConfig: + serverName: -metrics-service..svc + ca: + secret: + name: metrics-server-cert + key: ca.crt + cert: + secret: + name: metrics-server-cert + key: tls.crt + keySecret: + name: metrics-server-cert + key: tls.key +``` + +If your Prometheus instance does not run cert-manager, set `tlsConfig.insecureSkipVerify: true` and rely on bearer-token authentication only — the chart already does this when `certManager.enable: false`. + +## Standalone Prometheus example {#standalone-prometheus-example} + +If you do not use kube-prometheus-stack, the repository ships a self-contained example at [`examples/prometheus_secure_metrics_scraper.yaml`](https://github.com/ClickHouse/clickhouse-operator/blob/main/examples/prometheus_secure_metrics_scraper.yaml). It creates a ServiceAccount, the necessary RBAC, and a `Prometheus` CR that selects the operator's ServiceMonitor. + +## Health probe endpoints {#health-probe-endpoints} + +| Path | Used by | Returns | +|---|---|---| +| `/healthz` | Kubernetes liveness probe | `200 OK` as long as the probe server is listening. | +| `/readyz` | Kubernetes readiness probe | `200 OK` as long as the probe server is listening. | + +Both endpoints are registered with the same trivial ping check (`healthz.Ping` from `sigs.k8s.io/controller-runtime`). A failing probe therefore means "the manager process is not serving HTTP on `:8081`" — not "controllers are unhealthy". 
To detect controller-level problems, use the [reconciliation metrics](#reconciliation-activity) instead. + +Both endpoints are served on port `8081` by default. They are wired to the deployment as: + +```yaml +livenessProbe: + httpGet: + path: /healthz + port: 8081 + initialDelaySeconds: 15 + periodSeconds: 20 +readinessProbe: + httpGet: + path: /readyz + port: 8081 + initialDelaySeconds: 5 +``` + +A repeatedly failing probe usually means the probe server itself never came up — for example, the manager exited early during startup. Check the manager logs for `unable to start manager`, RBAC failures, or `cache did not sync` errors. + +## Metrics catalog {#metrics-catalog} + +The operator does not register custom Prometheus collectors. Everything below is exposed by the underlying `controller-runtime` and `client-go` libraries. The most useful series, grouped by purpose: + +### Reconciliation activity {#reconciliation-activity} + +| Metric | Type | Labels | +|---|---|---| +| `controller_runtime_reconcile_total` | counter | `controller`, `result` (`success` / `error` / `requeue` / `requeue_after`) | +| `controller_runtime_reconcile_errors_total` | counter | `controller` | +| `controller_runtime_reconcile_time_seconds_bucket` | histogram | `controller` | +| `controller_runtime_active_workers` | gauge | `controller` | +| `controller_runtime_max_concurrent_reconciles` | gauge | `controller` | + +The `controller` label is derived by `controller-runtime` from the resource type registered with `For(...)`. With the current code in `internal/controller/clickhouse` and `internal/controller/keeper` this resolves to `clickhousecluster` and `keepercluster` respectively. If you have customized the operator, verify with a one-time scrape of `/metrics`. 
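The one-time check of the `controller` label can be done with standard text tools on a saved scrape. The following sketch assumes the exposition text is available on standard input; `controller_names` is a hypothetical helper, and the sample values are made up for illustration.

```shell
#!/bin/sh
# Sketch: list the distinct `controller` label values in a /metrics dump.
# controller_names is a hypothetical helper; it reads exposition text on stdin.
controller_names() {
  grep '^controller_runtime_reconcile_total{' \
    | sed -n 's/.*controller="\([^"]*\)".*/\1/p' \
    | sort -u
}

# Example input, trimmed to the relevant series (values are made up).
SAMPLE='# TYPE controller_runtime_reconcile_total counter
controller_runtime_reconcile_total{controller="clickhousecluster",result="success"} 42
controller_runtime_reconcile_total{controller="clickhousecluster",result="error"} 1
controller_runtime_reconcile_total{controller="keepercluster",result="success"} 17'

# Prints each controller name once, sorted.
printf '%s\n' "$SAMPLE" | controller_names
```

Against a real scrape, pipe the saved `/metrics` output into `controller_names` instead of the sample text.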
+ +### Work queue {#work-queue} + +| Metric | Type | Labels | +|---|---|---| +| `workqueue_depth` | gauge | `name` (= controller name) | +| `workqueue_adds_total` | counter | `name` | +| `workqueue_retries_total` | counter | `name` | +| `workqueue_unfinished_work_seconds` | gauge | `name` | +| `workqueue_longest_running_processor_seconds` | gauge | `name` | +| `workqueue_queue_duration_seconds_bucket` | histogram | `name` | +| `workqueue_work_duration_seconds_bucket` | histogram | `name` | + +### API server traffic {#api-server-traffic} + +| Metric | Type | Labels | +|---|---|---| +| `rest_client_requests_total` | counter | `code`, `method`, `host` | +| `rest_client_request_duration_seconds_bucket` | histogram | `verb`, `host`, `url` | + +### Leader election {#leader-election} + +| Metric | Type | Labels | +|---|---|---| +| `leader_election_master_status` | gauge | `name` (= `d4ceba06.clickhouse.com`) | + +The Helm chart enables `--leader-elect` by default, so this metric is present in standard Helm installs. When running the binary directly without the flag, the metric is absent. + +### Runtime {#runtime} + +Standard Go process and runtime collectors — `go_goroutines`, `go_memstats_*`, `process_cpu_seconds_total`, `process_resident_memory_bytes`, etc. + +## Useful PromQL queries {#useful-promql-queries} + +### Health overview + +```promql +# Reconciliation rate per controller +sum by (controller) (rate(controller_runtime_reconcile_total[5m])) + +# Error rate per controller (alert if > 0 sustained) +sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m])) + +# p99 reconcile latency +histogram_quantile( + 0.99, + sum by (le, controller) (rate(controller_runtime_reconcile_time_seconds_bucket[5m])) +) +``` + +### Backlog detection + +```promql +# Pending items in the work queue — a sustained value > 0 indicates a backlog, +# but short spikes during large reconciles are normal. 
+avg_over_time(workqueue_depth[10m]) + +# Reconciles that have been running for a long time +workqueue_longest_running_processor_seconds > 60 +``` + +### Throttling and API pressure + +```promql +# Failed (4xx/5xx) requests to the API server; code 429 indicates throttling +sum by (code, host) (rate(rest_client_requests_total{code=~"4..|5.."}[5m])) + +# 99th percentile API call duration +histogram_quantile( + 0.99, + sum by (le, verb) (rate(rest_client_request_duration_seconds_bucket[5m])) +) +``` + +### Leader status (HA deployment) + +```promql +# Should be exactly 1 across the replica set (Helm install enables --leader-elect by default) +sum(leader_election_master_status{name="d4ceba06.clickhouse.com"}) +``` + +## Suggested alerts {#suggested-alerts} + +A starting point for a PrometheusRule (tune thresholds for your environment): + +```yaml +groups: + - name: clickhouse-operator + rules: + - alert: ClickHouseOperatorReconcileErrors + # > 0.1 errors/s sustained = > ~6 errors/min, filters transient conflicts. 
+        expr: avg_over_time(workqueue_depth[10m]) > 5
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          summary: 'Operator work queue backlog sustained for 30m'
+
+      - alert: ClickHouseOperatorReconcileSlow
+        expr: |
+          histogram_quantile(
+            0.99,
+            sum by (le, controller) (rate(controller_runtime_reconcile_time_seconds_bucket[10m]))
+          ) > 30
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          summary: 'p99 reconcile latency for {{ $labels.controller }} > 30s'
+
+      - alert: ClickHouseOperatorNoLeader
+        expr: absent(leader_election_master_status{name="d4ceba06.clickhouse.com"}) == 1
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: 'No leader for the ClickHouse operator (HA deployment)'
+```
+
+The last rule is only meaningful when leader election is enabled.
+
+## Verifying the setup {#verifying-the-setup}
+
+A quick end-to-end check, assuming the chart was installed in `clickhouse-operator-system`:
+
+```bash
+NS=clickhouse-operator-system
+
+# The metrics Service exists and selects the manager pod
+kubectl -n $NS get svc -l control-plane=controller-manager
+
+# The ServiceMonitor exists (only with prometheus.enable=true)
+kubectl -n $NS get servicemonitor -l control-plane=controller-manager
+
+# Manager pod is Ready (readiness probe answers)
+kubectl -n $NS get pod -l control-plane=controller-manager
+
+# Direct scrape from inside the cluster (with the metrics-reader binding)
+kubectl -n $NS run curl-metrics --rm -it --restart=Never \
+  --image=curlimages/curl:8.10.1 -- sh -c '
+    TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
+    curl -sk -H "Authorization: Bearer $TOKEN" \
+      https://-metrics-service.'$NS'.svc:8080/metrics \
+      | head -20
+  '
+```
+
+If the scrape returns metrics in the Prometheus exposition format, the endpoint and RBAC are correctly wired.
+
+## Related guides {#related-guides}
+
+- [Installation](/products/kubernetes-operator/install/helm) — Helm values relevant to monitoring.
+- [Configuration](/products/kubernetes-operator/guides/configuration) — TLS configuration shared with the metrics server.
diff --git a/products/kubernetes-operator/install/olm.mdx b/products/kubernetes-operator/install/olm.mdx
index 25051f352..8ff3cb140 100644
--- a/products/kubernetes-operator/install/olm.mdx
+++ b/products/kubernetes-operator/install/olm.mdx
@@ -97,3 +97,4 @@ More info about uninstalling can be found in the [OLM documentation](https://olm
 ## Additional Resources {#additional-resources}
 
 - [Operator Lifecycle Manager Documentation](https://olm.operatorframework.io/docs)
+
diff --git a/products/kubernetes-operator/navigation.json b/products/kubernetes-operator/navigation.json
index 302acf737..f41a15419 100644
--- a/products/kubernetes-operator/navigation.json
+++ b/products/kubernetes-operator/navigation.json
@@ -17,7 +17,8 @@
         "expanded": true,
         "pages": [
           "products/kubernetes-operator/guides/introduction",
-          "products/kubernetes-operator/guides/configuration"
+          "products/kubernetes-operator/guides/configuration",
+          "products/kubernetes-operator/guides/monitoring"
         ]
       },
       {
diff --git a/products/kubernetes-operator/overview.mdx b/products/kubernetes-operator/overview.mdx
index ba1c47d29..55d5df1e5 100644
--- a/products/kubernetes-operator/overview.mdx
+++ b/products/kubernetes-operator/overview.mdx
@@ -34,7 +34,6 @@ Choose your preferred installation method:
 
 - **[Introduction](/products/kubernetes-operator/guides/introduction)** - General overview of ClickHouse Operator concepts
 - **[Configuration Guide](/products/kubernetes-operator/guides/configuration)** - Configure ClickHouse and Keeper clusters
-- **[Monitoring](/products/kubernetes-operator/guides/introduction)** - Monitor clickhouse-operator using Prometheus metrics
 
 ## Reference {#reference}
 
diff --git a/products/kubernetes-operator/reference/api-reference.mdx b/products/kubernetes-operator/reference/api-reference.mdx
index 43e59de85..8911ffd51 100644
--- a/products/kubernetes-operator/reference/api-reference.mdx
+++ b/products/kubernetes-operator/reference/api-reference.mdx
@@ -8,9 +8,20 @@ doc_type: 'reference'
 sidebarTitle: 'API reference'
 ---
 
-
 This document provides detailed API reference for the ClickHouse Operator custom resources.
 
+## AdditionalPort {#additionalport}
+
+AdditionalPort declares one extra TCP port to expose on the ClickHouse Pod and the operator-managed headless Service.
+
+| Field | Type | Description | Required | Default |
+|-------|------|-------------|----------|---------|
+| `name` | string | Name uniquely identifies the port within the list. Used as both the container port name and the Service port name. This must be a DNS_LABEL. | true | |
+| `port` | integer | Port is the TCP port number to expose. | true | |
+
+Appears in:
+- [ClickHouseClusterSpec](#clickhouseclusterspec)
+
 ## ClickHouseCluster {#clickhousecluster}
 
 ClickHouseCluster is the Schema for the `clickhouseclusters` API.
@@ -65,6 +76,7 @@ ClickHouseClusterSpec defines the desired state of ClickHouseCluster.
 | `upgradeChannel` | string | UpgradeChannel specifies the release channel for major version upgrade checks. When empty, only minor updates will be proposed. Allowed values are: stable, lts or specific major.minor version (e.g. 25.8). | false | |
 | `versionProbeTemplate` | [VersionProbeTemplate](#versionprobetemplate) | VersionProbeTemplate overrides for the version detection Job. | false | |
 | `externalSecret` | [ExternalSecret](#externalsecret) | ExternalSecret is an optional reference to an externally-managed Secret containing cluster secrets. The secret must reside in the same namespace as the cluster. | false | |
+| `additionalPorts` | [AdditionalPort](#additionalport) array | AdditionalPorts declares extra TCP ports to expose on the ClickHouse Pod and the operator-managed headless Service. The operator only adds the ports to the Kubernetes resources, it does not configure the ClickHouse server to listen on them. | false | |
 
 Appears in:
 - [ClickHouseCluster](#clickhousecluster)
@@ -152,7 +164,7 @@ ContainerTemplateSpec describes the container configuration overrides for the cl
 |-------|------|-------------|----------|---------|
 | `image` | [ContainerImage](#containerimage) | Image is the container image to be deployed. | true | |
 | `imagePullPolicy` | [PullPolicy](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#pullpolicy-v1-core) | ImagePullPolicy for the image, which defaults to IfNotPresent. | false | |
-| `resources` | [ResourceRequirements](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#resourcerequirements-v1-core) | Resources is the resource requirements for the server container. Deep-merged with operator defaults via SMP. Individual limits and requests override only matching keys; unset fields preserve operator defaults. | false | |
+| `resources` | [ResourceRequirements](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#resourcerequirements-v1-core) | Resources is the resource requirements for the server container. Applied as a whole: operator defaults are used only when all resource fields are empty. | false | |
 | `volumeMounts` | [VolumeMount](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#volumemount-v1-core) array | VolumeMounts is the list of volume mounts for the container. Concatenated with operator-generated mounts. Entries sharing a `mountPath` with an operator mount are merged into a projected volume. | false | |
 | `env` | [EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array | Env is the list of environment variables to set in the container. Merged with operator defaults by name. | false | |
 | `securityContext` | [SecurityContext](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#securitycontext-v1-core) | SecurityContext defines the security options the container should be run with. A non-nil SecurityContext fully replaces operator defaults; the user owns the entire struct. When nil, operator defaults are preserved. More info: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ | false | |