301 changes: 296 additions & 5 deletions products/kubernetes-operator/guides/configuration.mdx
@@ -239,10 +239,10 @@ spec:
resources:
requests:
cpu: "250m"
memory: "256Mi"
memory: "512Mi"
limits:
cpu: "1"
memory: "1Gi"
memory: "512Mi"
```

### Environment variables {#environment-variables}
@@ -322,6 +322,297 @@ spec:
key: <ca-certificate-key>
```

## External Secret {#external-secret}

By default the operator creates and owns a Secret containing the cluster's internal credentials (interserver password, management password, keeper identity, cluster secret, named-collections key). The Secret is named after the cluster and lives in the cluster's namespace.

If you want to manage these credentials yourself — for example, sourcing them from HashiCorp Vault, AWS Secrets Manager, or [External Secrets Operator](https://external-secrets.io/) — point the operator at a pre-existing Secret using `spec.externalSecret`:

```yaml
apiVersion: clickhouse.com/v1alpha1
kind: ClickHouseCluster
metadata:
  name: sample
spec:
  replicas: 2
  keeperClusterRef:
    name: sample
  dataVolumeClaimSpec:
    resources:
      requests:
        storage: 10Gi
  externalSecret:
    name: my-clickhouse-credentials
    policy: Observe
```

<Note>
The referenced Secret must reside in the **same namespace** as the ClickHouseCluster. The operator never deletes a Secret it did not create.
</Note>

### Required keys {#external-secret-required-keys}

The Secret must contain the following keys:

| Key | Format | When required |
|---|---|---|
| `interserver-password` | plaintext password | Always |
| `management-password` | plaintext password | Always |
| `keeper-identity` | `clickhouse:<password>` | Always |
| `cluster-secret` | plaintext password | Always |
| `named-collections-key` | hex-encoded 16-byte AES key (32 hex chars) | ClickHouse `>= 25.12` only |

A complete Secret looks like this:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-clickhouse-credentials
  namespace: sample
type: Opaque
stringData:
  interserver-password: "a-strong-random-password"
  management-password: "another-strong-password"
  keeper-identity: "clickhouse:keeper-auth-password"
  cluster-secret: "cluster-internal-secret"
  named-collections-key: "0123456789abcdef0123456789abcdef"  # 32 hex chars = 16 bytes
```
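
The values in such a Secret can be generated rather than written by hand. The following is a minimal sketch, assuming `kubectl` access to the `sample` namespace and that random hex strings are acceptable as passwords (the table above only asks for plaintext). The helper names `gen_hex` and `create_credentials_secret` and the 24-byte password length are illustrative choices, not part of the operator:

```bash
# Print N random bytes as lowercase hex (2*N characters).
gen_hex() {
  od -An -v -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# Create the Secret with freshly generated values.
# Secret name and namespace match the example above; lengths are an assumption.
create_credentials_secret() {
  kubectl create secret generic my-clickhouse-credentials \
    --namespace sample \
    --from-literal=interserver-password="$(gen_hex 24)" \
    --from-literal=management-password="$(gen_hex 24)" \
    --from-literal=keeper-identity="clickhouse:$(gen_hex 24)" \
    --from-literal=cluster-secret="$(gen_hex 24)" \
    --from-literal=named-collections-key="$(gen_hex 16)"
}
```

Calling `create_credentials_secret` once before applying the `ClickHouseCluster` manifest should leave a Secret that satisfies the required-keys table, including the 32-character hex key.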

### Policy: Observe vs Manage {#external-secret-policy}

`spec.externalSecret.policy` controls how the operator handles missing required keys:

| Policy | Behavior on missing keys |
|---|---|
| `Observe` (default) | Reconciliation is **blocked** until every required key is present. The operator reports each missing key — and the format hint for it — via the `ExternalSecretValid` status condition and a `Warning` event. |
| `Manage` | The operator **generates** any missing required keys and writes them back to the same Secret. Useful for bootstrapping: create an empty Secret, let the operator fill it, then optionally tighten access. The operator still never deletes the Secret. |

<Note>
Even with `policy: Manage` the Secret must already exist in the namespace — the operator never creates the Secret itself, it only writes generated keys into an existing one. If the referenced Secret is missing, reconciliation is blocked with the `ExternalSecretNotFound` reason regardless of policy.
</Note>

Pick `Observe` when an external system (Vault, ESO, sealed-secrets, GitOps) is the source of truth and you want the operator to fail loudly on misconfiguration. Pick `Manage` when you want self-sufficient bootstrapping but still want to retain ownership of the Secret object itself (for example, to back it up).
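
The `Manage` bootstrap flow (create an empty Secret, let the operator fill it) can be sketched as two `kubectl` calls. This assumes the `ClickHouseCluster` already exists and that your user may patch it; the function name `bootstrap_managed_secret` is hypothetical:

```bash
# Create an empty Secret, then point the cluster at it with policy Manage
# so that the operator generates the missing keys.
# Usage: bootstrap_managed_secret <secret-name> <cluster-name> <namespace>
bootstrap_managed_secret() {
  secret="$1"; cluster="$2"; namespace="$3"
  kubectl create secret generic "$secret" --namespace "$namespace" &&
  kubectl patch clickhousecluster "$cluster" --namespace "$namespace" --type merge \
    -p "{\"spec\":{\"externalSecret\":{\"name\":\"$secret\",\"policy\":\"Manage\"}}}"
}
```

For example, `bootstrap_managed_secret my-clickhouse-credentials sample sample`. Setting the same fields in the manifest and re-applying it is equivalent and usually preferable under GitOps.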

### Status condition and troubleshooting {#external-secret-status}

The operator exposes an `ExternalSecretValid` condition on `ClickHouseCluster.status.conditions`. Inspect it when reconciliation looks stuck:

```bash
# Plain kubectl — works out of the box
kubectl describe clickhousecluster sample | sed -n '/Conditions:/,$p'

# Same data as YAML
kubectl get clickhousecluster sample -o yaml | sed -n '/conditions:/,/^[^ ]/p'

# Pretty-printed JSON (requires jq)
kubectl get clickhousecluster sample -o jsonpath='{.status.conditions}' | jq
```

Possible reasons:

| `reason` | Meaning | Fix |
|---|---|---|
| `ExternalSecretNotFound` | The referenced Secret does not exist in the namespace. | Create the Secret, or fix `spec.externalSecret.name`. |
| `ExternalSecretInvalid` | The Secret exists but lacks required keys (only with `Observe`). The message lists each missing key together with its expected format. | Add the missing keys, or switch to `policy: Manage`. |
| `ExternalSecretValid` | All required keys are present and the operator is using the Secret. | — |

The operator requeues reconciliation while the Secret is invalid, so once you add the missing keys the next reconcile picks them up automatically — no need to bounce pods.

<Note>
The set of required keys depends on the running ClickHouse version. `named-collections-key` is only validated once the operator's version probe has detected ClickHouse `25.12` or newer. On older versions the key may be absent from the Secret.
</Note>
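
Before applying a cluster with `policy: Observe`, it can help to check the Secret yourself. The sketch below lists the required keys that are absent; it is a local pre-check only and does not replace the operator's own validation. The function names are illustrative, and `named-collections-key` has to be passed explicitly when the cluster runs ClickHouse `25.12` or newer:

```bash
# Keys that are always required, per the table in "Required keys".
REQUIRED_KEYS="interserver-password management-password keeper-identity cluster-secret"

# Print the key names stored in a Secret, one per line.
secret_key_names() {
  kubectl get secret "$1" -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}

# Read key names on stdin and print every required key that is absent.
# Extra required keys (for example named-collections-key) go in as arguments.
missing_keys() {
  present="$(cat)"
  for key in $REQUIRED_KEYS "$@"; do
    printf '%s\n' "$present" | grep -qxF -- "$key" || printf '%s\n' "$key"
  done
}
```

Usage: `secret_key_names my-clickhouse-credentials | missing_keys named-collections-key`. Empty output means every listed key is present.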

## Additional ports {#additional-ports}

The operator exposes a fixed set of ports on every ClickHouse Pod and its headless Service: `8123` HTTP, `9000` native, `9009` interserver, `9001` management, `9363` Prometheus metrics, and the TLS variants `8443`/`9440` when TLS is enabled. To make ClickHouse listen on additional protocols — MySQL, PostgreSQL, gRPC, or any custom port — declare them in `spec.additionalPorts`:

```yaml
spec:
  additionalPorts:
    - name: mysql
      port: 9004
    - name: postgres
      port: 9005
    - name: grpc
      port: 9100
```

The operator adds those ports to the Pod's `containerPorts` and to the headless Service. The complete example lives at [`examples/custom_protocols.yaml`](https://github.com/ClickHouse/clickhouse-operator/blob/main/examples/custom_protocols.yaml).

<Warning>
`additionalPorts` only opens the ports on the Kubernetes side. It does **not** configure the ClickHouse server to listen on them. You also have to enable the matching protocol in `spec.settings.extraConfig.protocols`. Without that, the port is open on the Service but nothing inside the pod is answering.
</Warning>

### End-to-end example: MySQL wire protocol {#additional-ports-mysql-example}

To expose ClickHouse over the MySQL wire protocol on port `9004`:

```yaml
apiVersion: clickhouse.com/v1alpha1
kind: ClickHouseCluster
metadata:
  name: sample
spec:
  replicas: 1
  keeperClusterRef:
    name: sample
  dataVolumeClaimSpec:
    resources:
      requests:
        storage: 2Gi

  # 1) Open the port on the Pod and the headless Service.
  additionalPorts:
    - name: mysql
      port: 9004

  # 2) Tell ClickHouse server to actually listen on it.
  settings:
    extraConfig:
      protocols:
        mysql:
          type: mysql
          port: 9004
          description: "MySQL wire protocol"
```

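After applying, verify with a MySQL client. `clickhouse-client` speaks only the native protocol, so it cannot test this port. One option is a local port-forward:

```bash
# Forward the MySQL port of the first replica to localhost.
kubectl port-forward pod/sample-clickhouse-0-0-0 9004:9004 &
# Connect with any MySQL client; adjust user and password to your cluster.
mysql --protocol=tcp --host=127.0.0.1 --port=9004 --user=default --password -e "SELECT 1"
```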

### Field constraints {#additional-ports-constraints}

| Field | Rule |
|---|---|
| `name` | Must match the DNS_LABEL pattern `^[a-z]([-a-z0-9]*[a-z0-9])?$`, max 63 characters. Uniqueness is enforced by the CRD as a list-map key. |
| `port` | Integer in `[1, 65535]`. The webhook rejects duplicate port numbers within the list. |

### Reserved ports and names {#additional-ports-reserved}

The validating webhook rejects `additionalPorts` entries that would collide with ports the operator binds itself. All TLS-related ports are reserved **unconditionally** so that flipping `spec.settings.tls.enabled` later cannot break a previously valid cluster.

| Port | Reserved for |
|---|---|
| `8123` | HTTP |
| `8443` | HTTPS |
| `9000` | native TCP |
| `9440` | native TLS |
| `9009` | interserver |
| `9001` | management |
| `9363` | Prometheus metrics |

The following names are also rejected — they are the operator's internal protocol-type identifiers (not the human-readable aliases):

| Name |
|---|
| `http` |
| `http-secure` |
| `tcp` |
| `tcp-secure` |
| `interserver` |
| `management` |
| `prometheus` |

A rejected request produces an error such as:

```
spec.additionalPorts[0].port: 8123 is reserved for the operator-managed HTTP port
spec.additionalPorts[0].name: "http" is reserved by the operator
```
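
To catch these rejections before submitting a manifest, the documented rules can be mirrored in a small local check. This is a sketch under the assumption that the tables above are complete; the function name and the messages are illustrative and do not reproduce the webhook's wording. Duplicate names or ports within the list are not checked here:

```bash
RESERVED_PORTS="8123 8443 9000 9440 9009 9001 9363"
RESERVED_NAMES="http http-secure tcp tcp-secure interserver management prometheus"

# Return 0 if the name/port pair passes the documented rules;
# otherwise print a reason and return 1.
# Usage: check_additional_port <name> <port>
check_additional_port() {
  name="$1"; port="$2"
  # name: DNS label pattern, at most 63 characters
  if ! printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z]([-a-z0-9]*[a-z0-9])?$' || [ "${#name}" -gt 63 ]; then
    echo "invalid name: $name"; return 1
  fi
  # port: integer in [1, 65535]
  if ! printf '%s\n' "$port" | LC_ALL=C grep -Eq '^[0-9]{1,5}$' || [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "invalid port: $port"; return 1
  fi
  for reserved in $RESERVED_NAMES; do
    if [ "$name" = "$reserved" ]; then echo "reserved name: $name"; return 1; fi
  done
  for reserved in $RESERVED_PORTS; do
    if [ "$port" -eq "$reserved" ]; then echo "reserved port: $port"; return 1; fi
  done
  return 0
}
```

For example, `check_additional_port mysql 9004` succeeds, while `check_additional_port http 9004` and `check_additional_port mysql 8123` both fail.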

## Version probe and upgrade channel {#version-probe-and-upgrade-channel}

The operator does two independent things with cluster versions:

1. **Version probe** — a Kubernetes `Job` that runs the container image once to detect the running ClickHouse / Keeper version. The detected version is recorded in `.status.version` and used by other reconciliation steps (e.g. the `External Secret` named-collections key is only required from ClickHouse `25.12`).
2. **Upgrade channel** — a periodic check against the public ClickHouse release feed (`https://clickhouse.com/data/version_date.tsv`). The operator reports whether a newer version is available via the `VersionUpgraded` status condition. It never upgrades the cluster on its own — the user is in control of the image tag.

### Choosing a release channel {#upgrade-channel-choosing}

`spec.upgradeChannel` selects which set of upstream releases the operator compares against. The same field exists on both `ClickHouseCluster` and `KeeperCluster`.

```yaml
spec:
  upgradeChannel: lts   # or "stable", or "25.8", or omitted
```

Allowed values (validated by the CRD with the pattern `^(lts|stable|\d+\.\d+)?$`):

| Value | Behavior |
|---|---|
| _empty_ (default) | The operator proposes only **minor** updates within the currently-running major.minor line. A cluster on `25.8.3.1` will be told about `25.8.4.x` but not `25.9.x`. |
| `stable` | Tracks the upstream `stable` channel — the latest release that ClickHouse Inc. flags as stable on the main release line. Receives major upgrades sooner than the `lts` channel. |
| `lts` | Tracks the upstream `lts` channel — long-term support releases. Receives major upgrades less frequently, with longer support windows. |
| `25.8` (or any `<major>.<minor>`) | Pins the channel to a specific major.minor line. Major upgrades beyond it are not proposed even if a newer version exists upstream. |

For production, pinning the channel to an explicit `<major>.<minor>` (e.g. `25.8`) is generally preferred. It locks the cluster to the intended major release line and lets the operator surface a `WrongReleaseChannel` warning if any replica somehow drifts onto a different major — which matters especially when the image is referenced by a digest (`@sha256:...`) rather than by a human-readable tag. The empty default is fine for development clusters where major-version jumps are not a concern.
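
When templating manifests, the channel value can be checked locally against the CRD pattern, and the `<major>.<minor>` line to pin can be derived from a full version string. A minimal sketch; both function names are hypothetical, and `\d` is written as `[0-9]` because POSIX extended regular expressions have no `\d`:

```bash
# Succeed when the value matches the CRD pattern ^(lts|stable|\d+\.\d+)?$
valid_upgrade_channel() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^(lts|stable|[0-9]+\.[0-9]+)?$'
}

# Print the major.minor line of a full version string.
release_line() {
  printf '%s\n' "$1" | cut -d. -f1,2
}
```

For example, `release_line 25.8.3.1` prints `25.8`, which `valid_upgrade_channel` accepts, whereas a value such as `latest` or `25.8.3` is rejected.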

### Status conditions {#version-status-conditions}

Two conditions surface the result of the probe and the upgrade check:

| Condition | Reason | Meaning |
|---|---|---|
| `VersionInSync` | `VersionMatch` | All replicas report the same version as the image |
| `VersionInSync` | `VersionMismatch` | Replicas are running different versions. This reason is suppressed during a planned rolling upgrade. It typically surfaces when a mutable image tag has been pinned (for example `latest` or a bare major like `26.3`) and the underlying registry has shifted between pulls, so different replicas ended up on different patches of the same tag. |
| `VersionInSync` | `VersionPending` | Version probe Job has not finished yet |
| `VersionInSync` | `VersionProbeFailed` | Probe Job failed; the operator cannot determine the running version |
| `VersionUpgraded` | `UpToDate` | The cluster is on the latest version available in the selected channel |
| `VersionUpgraded` | `MinorUpdateAvailable` | A newer patch is available in the same `major.minor` line |
| `VersionUpgraded` | `MajorUpdateAvailable` | A newer `major.minor` is available within the chosen channel |
| `VersionUpgraded` | `VersionOutdated` | The running version is out of date and will no longer receive fixes from the selected channel — typically because the major line has been dropped from `lts` or `stable` upstream |
| `VersionUpgraded` | `WrongReleaseChannel` | The running image does not belong to the selected `upgradeChannel`. Example: a cluster running `26.5` with `upgradeChannel: lts`, since `26.5` is not part of the upstream `lts` line. |
| `VersionUpgraded` | `UpgradeCheckFailed` | The operator could not reach the upstream release feed |

Inspect them with:

```bash
kubectl get clickhousecluster sample -o yaml | sed -n '/conditions:/,/^[^ ]/p'
```
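
For scripts or alerts it is often enough to read the `reason` of a single condition. The sketch below relies on kubectl's JSONPath filter syntax; the function name `condition_reason` is illustrative:

```bash
# Print the reason of one status condition of a ClickHouseCluster.
# Usage: condition_reason <cluster-name> <condition-type>
condition_reason() {
  kubectl get clickhousecluster "$1" \
    -o jsonpath="{.status.conditions[?(@.type==\"$2\")].reason}"
}
```

For example, `condition_reason sample VersionUpgraded` prints one of the reasons from the table above, such as `UpToDate`. The same helper works for `VersionInSync` and `ExternalSecretValid`.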

### Overriding the version probe Job {#version-probe-template}

The probe is implemented as a regular Kubernetes `Job`. If your cluster has admission policies that require specific tolerations, node selectors, or security contexts, or if you want to limit how long completed probe Jobs linger, override the template via `spec.versionProbeTemplate`:

```yaml
spec:
  versionProbeTemplate:
    spec:
      ttlSecondsAfterFinished: 600   # delete completed probe Jobs 10 minutes after completion
      template:
        spec:
          nodeSelector:
            kubernetes.io/arch: amd64
          tolerations:
            - key: dedicated
              operator: Equal
              value: clickhouse
              effect: NoSchedule
          containers:
            - name: version-probe
              resources:
                requests:
                  cpu: 50m
                  memory: 64Mi
```

The container name `version-probe` is the operator's default — the entry under `containers:` matches it by name, so the operator deep-merges the user-provided fields on top of the defaults.

### Operator-wide controls {#version-operator-flags}

Two flags on the operator manager control the upgrade-check loop globally:

| Flag | Default | Effect |
|---|---|---|
| `--version-update-interval` | `24h` | How often the operator re-fetches the upstream version list |
| `--disable-version-update-checks` | `false` | Disables the upgrade checker entirely. The `VersionUpgraded` condition is not set, and no outbound HTTP traffic to `clickhouse.com` is generated |

Set `--disable-version-update-checks=true` in air-gapped environments or when egress to `clickhouse.com` is not allowed.

## ClickHouse settings {#clickhouse-settings}

### Default user password {#default-user-password}
@@ -442,8 +733,8 @@ spec:
```

#### Useful links:
* [YAML configuration examples](/concepts/features/configuration/server-config/configuration-files#example-1)
* [All server settings](/reference/settings/server-settings/settings)
* [YAML configuration examples](/core/concepts/features/configuration/server-config/configuration-files#example-1)
* [All server settings](/core/reference/settings/server-settings/settings)

### Embedded extra users configuration {#embedded-extra-users-configuration}

@@ -475,7 +766,7 @@ spec:
The `extraUsersConfig` is stored in a Kubernetes ConfigMap object. Avoid putting plain-text secrets there.
</Note>

#### See [documentation](/concepts/features/configuration/settings/settings-users) for all supported ClickHouse users configuration options.
#### See [documentation](/core/concepts/features/configuration/settings/settings-users) for all supported ClickHouse users configuration options.

### Configuration example {#configuration-example}

4 changes: 2 additions & 2 deletions products/kubernetes-operator/guides/introduction.mdx
@@ -101,7 +101,7 @@ The ClickHouse Operator automatically replicates database definitions across all
### What Gets Replicated {#what-gets-replicated}

The operator synchronizes:
- [Replicated](/reference/engines/database-engines/replicated) database definitions
- [Replicated](/core/reference/engines/database-engines/replicated) database definitions
- Integration database engines (PostgreSQL, MySQL, etc.)

The operator does **not** synchronize:
Expand All @@ -114,7 +114,7 @@ The operator does **not** synchronize:
<Tip>
**Best practice**

Always use the [Replicated](/reference/engines/database-engines/replicated) database engine for production deployments.
Always use the [Replicated](/core/reference/engines/database-engines/replicated) database engine for production deployments.
</Tip>

Benefits: