diff --git a/products/kubernetes-operator/guides/configuration.mdx b/products/kubernetes-operator/guides/configuration.mdx
index 7b9ac3d3c..f23d67c00 100644
--- a/products/kubernetes-operator/guides/configuration.mdx
+++ b/products/kubernetes-operator/guides/configuration.mdx
@@ -239,10 +239,10 @@ spec:
     resources:
       requests:
         cpu: "250m"
-        memory: "256Mi"
+        memory: "512Mi"
       limits:
         cpu: "1"
-        memory: "1Gi"
+        memory: "512Mi"
 ```
 
 ### Environment variables {#environment-variables}
@@ -322,6 +322,297 @@ spec:
             key: <ca-certificate-key>
 ```
 
+## External Secret {#external-secret}
+
+By default the operator creates and owns a Secret containing the cluster's internal credentials (interserver password, management password, keeper identity, cluster secret, named-collections key). The Secret is named after the cluster and lives in the cluster's namespace.
+
+If you want to manage these credentials yourself — for example, sourcing them from HashiCorp Vault, AWS Secrets Manager, or [External Secrets Operator](https://external-secrets.io/) — point the operator at a pre-existing Secret using `spec.externalSecret`:
+
+```yaml
+apiVersion: clickhouse.com/v1alpha1
+kind: ClickHouseCluster
+metadata:
+  name: sample
+spec:
+  replicas: 2
+  keeperClusterRef:
+    name: sample
+  dataVolumeClaimSpec:
+    resources:
+      requests:
+        storage: 10Gi
+  externalSecret:
+    name: my-clickhouse-credentials
+    policy: Observe
+```
+
+<Note>
+The referenced Secret must reside in the **same namespace** as the ClickHouseCluster. The operator never deletes a Secret it did not create.
+</Note>
+
+### Required keys {#external-secret-required-keys}
+
+The Secret must contain the following keys:
+
+| Key | Format | When required |
+|---|---|---|
+| `interserver-password` | plaintext password | Always |
+| `management-password` | plaintext password | Always |
+| `keeper-identity` | `clickhouse:<password>` | Always |
+| `cluster-secret` | plaintext password | Always |
+| `named-collections-key` | hex-encoded 16-byte AES key (32 hex chars) | ClickHouse `>= 25.12` only |
+
+A complete Secret looks like this:
+
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: my-clickhouse-credentials
+  namespace: sample
+type: Opaque
+stringData:
+  interserver-password: "a-strong-random-password"
+  management-password: "another-strong-password"
+  keeper-identity: "clickhouse:keeper-auth-password"
+  cluster-secret: "cluster-internal-secret"
+  named-collections-key: "0123456789abcdef0123456789abcdef"   # 32 hex chars = 16 bytes
+```
+
+### Policy: Observe vs Manage {#external-secret-policy}
+
+`spec.externalSecret.policy` controls how the operator handles missing required keys:
+
+| Policy | Behavior on missing keys |
+|---|---|
+| `Observe` (default) | Reconciliation is **blocked** until every required key is present. The operator reports each missing key — and the format hint for it — via the `ExternalSecretValid` status condition and a `Warning` event. |
+| `Manage` | The operator **generates** any missing required keys and writes them back to the same Secret. Useful for bootstrapping: create an empty Secret, let the operator fill it, then optionally tighten access. The operator still never deletes the Secret. |
+
+<Note>
+Even with `policy: Manage` the Secret must already exist in the namespace — the operator never creates the Secret itself, it only writes generated keys into an existing one. If the referenced Secret is missing, reconciliation is blocked with the `ExternalSecretNotFound` reason regardless of policy.
+</Note>
+
+Pick `Observe` when an external system (Vault, ESO, sealed-secrets, GitOps) is the source of truth and you want the operator to fail loudly on misconfiguration. Pick `Manage` when you want self-sufficient bootstrapping but still want to retain ownership of the Secret object itself (for example, to back it up).
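+
+A minimal sketch of the `Manage` bootstrap flow described above, reusing the Secret and cluster names from the earlier examples. The Secret is created with no data and the cluster references it with `policy: Manage`. The second document is a fragment of the `ClickHouseCluster` spec, not a complete manifest.
+
+```yaml
+# An empty Secret: it must already exist when the operator reconciles the cluster.
+apiVersion: v1
+kind: Secret
+metadata:
+  name: my-clickhouse-credentials
+  namespace: sample
+type: Opaque
+---
+# Fragment of the ClickHouseCluster in the same namespace.
+spec:
+  externalSecret:
+    name: my-clickhouse-credentials
+    policy: Manage   # the operator generates and writes any missing required keys
+```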
+
+### Status condition and troubleshooting {#external-secret-status}
+
+The operator exposes an `ExternalSecretValid` condition on `ClickHouseCluster.status.conditions`. Inspect it when reconciliation looks stuck:
+
+```bash
+# Plain kubectl — works out of the box
+kubectl describe clickhousecluster sample | sed -n '/Conditions:/,$p'
+
+# Same data as YAML
+kubectl get clickhousecluster sample -o yaml | sed -n '/conditions:/,/^[^ ]/p'
+
+# Pretty-printed JSON (requires jq)
+kubectl get clickhousecluster sample -o jsonpath='{.status.conditions}' | jq
+```
+
+Possible reasons:
+
+| `reason` | Meaning | Fix |
+|---|---|---|
+| `ExternalSecretNotFound` | The referenced Secret does not exist in the namespace. | Create the Secret, or fix `spec.externalSecret.name`. |
+| `ExternalSecretInvalid` | The Secret exists but lacks required keys (only with `Observe`). The message lists each missing key together with its expected format. | Add the missing keys, or switch to `policy: Manage`. |
+| `ExternalSecretValid` | All required keys are present and the operator is using the Secret. | — |
+
+The operator requeues reconciliation while the Secret is invalid, so once you add the missing keys the next reconcile picks them up automatically — no need to bounce pods.
+
+<Note>
+The set of required keys depends on the running ClickHouse version. `named-collections-key` is only validated once the operator's version probe has detected ClickHouse `25.12` or newer. On older versions the key may be absent from the Secret.
+</Note>
+
+## Additional ports {#additional-ports}
+
+The operator exposes a fixed set of ports on every ClickHouse Pod and its headless Service: `8123` HTTP, `9000` native, `9009` interserver, `9001` management, `9363` Prometheus metrics, and the TLS variants `8443`/`9440` when TLS is enabled. To make ClickHouse listen on additional protocols — MySQL, PostgreSQL, gRPC, or any custom port — declare them in `spec.additionalPorts`:
+
+```yaml
+spec:
+  additionalPorts:
+    - name: mysql
+      port: 9004
+    - name: postgres
+      port: 9005
+    - name: grpc
+      port: 9100
+```
+
+The operator adds those ports to the Pod's `containerPorts` and to the headless Service. The complete example lives at [`examples/custom_protocols.yaml`](https://github.com/ClickHouse/clickhouse-operator/blob/main/examples/custom_protocols.yaml).
+
+<Warning>
+`additionalPorts` only opens the ports on the Kubernetes side. It does **not** configure the ClickHouse server to listen on them. You also have to enable the matching protocol in `spec.settings.extraConfig.protocols`. Without that, the port is open on the Service but nothing inside the pod is answering.
+</Warning>
+
+### End-to-end example: MySQL wire protocol {#additional-ports-mysql-example}
+
+To expose ClickHouse over the MySQL wire protocol on port `9004`:
+
+```yaml
+apiVersion: clickhouse.com/v1alpha1
+kind: ClickHouseCluster
+metadata:
+  name: sample
+spec:
+  replicas: 1
+  keeperClusterRef:
+    name: sample
+  dataVolumeClaimSpec:
+    resources:
+      requests:
+        storage: 2Gi
+
+  # 1) Open the port on the Pod and the headless Service.
+  additionalPorts:
+    - name: mysql
+      port: 9004
+
+  # 2) Tell ClickHouse server to actually listen on it.
+  settings:
+    extraConfig:
+      protocols:
+        mysql:
+          type: mysql
+          port: 9004
+          description: "MySQL wire protocol"
+```
+
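+After applying, verify with a MySQL client. `clickhouse-client` speaks only the native protocol, so it cannot test this port. The commands below assume the `default` user and a locally installed `mysql` client:
+
+```bash
+# Forward the MySQL port from the first replica to localhost.
+kubectl port-forward pod/sample-clickhouse-0-0-0 9004:9004 &
+mysql --protocol=tcp -h 127.0.0.1 -P 9004 -u default -p -e "SELECT 1"
+```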
+
+### Field constraints {#additional-ports-constraints}
+
+| Field | Rule |
+|---|---|
+| `name` | Must match the DNS_LABEL pattern `^[a-z]([-a-z0-9]*[a-z0-9])?$`, max 63 characters. Uniqueness is enforced by the CRD as a list-map key. |
+| `port` | Integer in `[1, 65535]`. The webhook rejects duplicate port numbers within the list. |
+
+### Reserved ports and names {#additional-ports-reserved}
+
+The validating webhook rejects `additionalPorts` entries that would collide with ports the operator binds itself. All TLS-related ports are reserved **unconditionally** so that flipping `spec.settings.tls.enabled` later cannot break a previously valid cluster.
+
+| Port | Reserved for |
+|---|---|
+| `8123` | HTTP |
+| `8443` | HTTPS |
+| `9000` | native TCP |
+| `9440` | native TLS |
+| `9009` | interserver |
+| `9001` | management |
+| `9363` | Prometheus metrics |
+
+The following names are also rejected — they are the operator's internal protocol-type identifiers (not the human-readable aliases):
+
+| Name |
+|---|
+| `http` |
+| `http-secure` |
+| `tcp` |
+| `tcp-secure` |
+| `interserver` |
+| `management` |
+| `prometheus` |
+
+A rejected request produces an error such as:
+
+```
+spec.additionalPorts[0].port: 8123 is reserved for the operator-managed HTTP port
+spec.additionalPorts[0].name: "http" is reserved by the operator
+```
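+
+The fragment below illustrates the rules from the constraint and reservation tables above. The names `web` and `mysql-alt` and the ports `9010` and `9011` are arbitrary examples. Any one of the entries marked as rejected would cause the whole request to fail validation.
+
+```yaml
+spec:
+  additionalPorts:
+    - name: mysql        # accepted: valid name, unreserved port
+      port: 9004
+    - name: http         # rejected: reserved name
+      port: 9010
+    - name: web
+      port: 8443         # rejected: reserved even while TLS is disabled
+    - name: MySQL        # rejected: uppercase letters do not match the name pattern
+      port: 9011
+    - name: mysql-alt
+      port: 9004         # rejected: duplicates the port used by "mysql"
+```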
+
+## Version probe and upgrade channel {#version-probe-and-upgrade-channel}
+
+The operator does two independent things with cluster versions:
+
+1. **Version probe** — a Kubernetes `Job` that runs the container image once to detect the running ClickHouse / Keeper version. The detected version is recorded in `.status.version` and used by other reconciliation steps (e.g. the `External Secret` named-collections key is only required from ClickHouse `25.12`).
+2. **Upgrade channel** — a periodic check against the public ClickHouse release feed (`https://clickhouse.com/data/version_date.tsv`). The operator reports whether a newer version is available via the `VersionUpgraded` status condition. It never upgrades the cluster on its own — the user is in control of the image tag.
+
+### Choosing a release channel {#upgrade-channel-choosing}
+
+`spec.upgradeChannel` selects which set of upstream releases the operator compares against. The same field exists on both `ClickHouseCluster` and `KeeperCluster`.
+
+```yaml
+spec:
+  upgradeChannel: lts   # or "stable", or "25.8", or omitted
+```
+
+Allowed values (validated by the CRD with the pattern `^(lts|stable|\d+\.\d+)?$`):
+
+| Value | Behavior |
+|---|---|
+| _empty_ (default) | The operator proposes only **minor** updates within the currently-running major.minor line. A cluster on `25.8.3.1` will be told about `25.8.4.x` but not `25.9.x`. |
+| `stable` | Tracks the upstream `stable` channel — the latest release that ClickHouse Inc. flags as stable on the main release line. Receives major upgrades sooner than the `lts` channel. |
+| `lts` | Tracks the upstream `lts` channel — long-term support releases. Receives major upgrades less frequently, with longer support windows. |
+| `25.8` (or any `<major>.<minor>`) | Pins the channel to a specific major.minor line. Major upgrades beyond it are not proposed even if a newer version exists upstream. |
+
+For production, pinning the channel to an explicit `<major>.<minor>` (e.g. `25.8`) is generally preferred. It locks the cluster to the intended major release line and lets the operator surface a `WrongReleaseChannel` warning if any replica somehow drifts onto a different major — which matters especially when the image is referenced by a digest (`@sha256:...`) rather than by a human-readable tag. The empty default is fine for development clusters where major-version jumps are not a concern.
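+
+When pinning to a `<major>.<minor>` line, quote the value. An unquoted `25.8` is parsed by YAML as a floating-point number, which a string-typed field rejects. This sketch assumes `upgradeChannel` is a plain string field, as the pattern validation above suggests.
+
+```yaml
+spec:
+  upgradeChannel: "25.8"   # quoted so YAML yields the string "25.8", not the number 25.8
+```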
+
+### Status conditions {#version-status-conditions}
+
+Two conditions surface the result of the probe and the upgrade check:
+
+| Condition | Reason | Meaning |
+|---|---|---|
+| `VersionInSync` | `VersionMatch` | All replicas report the same version as the image |
+| `VersionInSync` | `VersionMismatch` | Replicas are running different versions. This reason is suppressed during a planned rolling upgrade. It typically surfaces when a mutable image tag has been pinned (for example `latest` or a bare major like `26.3`) and the underlying registry has shifted between pulls, so different replicas ended up on different patches of the same tag. |
+| `VersionInSync` | `VersionPending` | Version probe Job has not finished yet |
+| `VersionInSync` | `VersionProbeFailed` | Probe Job failed; the operator cannot determine the running version |
+| `VersionUpgraded` | `UpToDate` | The cluster is on the latest version available in the selected channel |
+| `VersionUpgraded` | `MinorUpdateAvailable` | A newer patch is available in the same `major.minor` line |
+| `VersionUpgraded` | `MajorUpdateAvailable` | A newer `major.minor` is available within the chosen channel |
+| `VersionUpgraded` | `VersionOutdated` | The running version is out of date and will no longer receive fixes from the selected channel — typically because the major line has been dropped from `lts` or `stable` upstream |
+| `VersionUpgraded` | `WrongReleaseChannel` | The running image does not belong to the selected `upgradeChannel`. Example: a cluster running `26.5` with `upgradeChannel: lts`, since `26.5` is not part of the upstream `lts` line. |
+| `VersionUpgraded` | `UpgradeCheckFailed` | The operator could not reach the upstream release feed |
+
+Inspect them with:
+
+```bash
+kubectl get clickhousecluster sample -o yaml | sed -n '/conditions:/,/^[^ ]/p'
+```
+
+### Overriding the version probe Job {#version-probe-template}
+
+The probe is implemented as a regular Kubernetes `Job`. If your cluster has admission policies that require specific tolerations, node selectors, or security contexts, or if you want to limit how long completed probe Jobs linger, override the template via `spec.versionProbeTemplate`:
+
+```yaml
+spec:
+  versionProbeTemplate:
+    spec:
+      ttlSecondsAfterFinished: 600   # delete completed probe Jobs 10 minutes after completion
+      template:
+        spec:
+          nodeSelector:
+            kubernetes.io/arch: amd64
+          tolerations:
+            - key: dedicated
+              operator: Equal
+              value: clickhouse
+              effect: NoSchedule
+          containers:
+            - name: version-probe
+              resources:
+                requests:
+                  cpu: 50m
+                  memory: 64Mi
+```
+
+The container name `version-probe` is the operator's default — the entry under `containers:` matches it by name, so the operator deep-merges the user-provided fields on top of the defaults.
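+
+The same mechanism can carry security contexts, for example to satisfy a restricted Pod Security admission policy. This is a sketch: the fields are standard Kubernetes Pod and container settings, but whether the probe image runs correctly under these constraints is an assumption to verify in your environment.
+
+```yaml
+spec:
+  versionProbeTemplate:
+    spec:
+      template:
+        spec:
+          securityContext:
+            seccompProfile:
+              type: RuntimeDefault
+          containers:
+            - name: version-probe          # must match the operator's default container name
+              securityContext:
+                allowPrivilegeEscalation: false
+                capabilities:
+                  drop: ["ALL"]
+```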
+
+### Operator-wide controls {#version-operator-flags}
+
+Two flags on the operator manager control the upgrade-check loop globally:
+
+| Flag | Default | Effect |
+|---|---|---|
+| `--version-update-interval` | `24h` | How often the operator re-fetches the upstream version list |
+| `--disable-version-update-checks` | `false` | Disables the upgrade checker entirely. The `VersionUpgraded` condition is not set, and no outbound HTTP traffic to `clickhouse.com` is generated |
+
+Set `--disable-version-update-checks=true` in air-gapped environments or when egress to `clickhouse.com` is not allowed.
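+
+For Helm installs, one way to pass the flag is through the chart's `manager.args` value. The sketch below assumes that `manager.args` is a plain list of flags and that overriding it replaces the chart's default list, so check the chart's `values.yaml` and carry over any defaults you still need.
+
+```yaml
+# values.yaml (hypothetical override; the shape of manager.args is an assumption)
+manager:
+  args:
+    - --leader-elect                          # keep the chart's default leader election
+    - --disable-version-update-checks=true    # no outbound requests to clickhouse.com
+```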
+
 ## ClickHouse settings {#clickhouse-settings}
 
 ### Default user password {#default-user-password}
@@ -426,6 +717,47 @@ spec:
 
 When enabled, the operator synchronizes Replicated and integration tables to new replicas.
 
+### Server logging {#server-logging}
+
+Configure the ClickHouse server log through `spec.settings.logger`. Every field is optional with a safe default, so a cluster you never touch already logs at `trace` to both the container console and a rotated file on disk.
+
+```yaml
+spec:
+  settings:
+    logger:
+      logToFile: true   # Default: true. Set false to log only to the console
+      jsonLogs: false   # Default: false. Set true for structured JSON log lines
+      level: trace      # Default: trace
+      size: 1000M       # Default: 1000M. Rotate a log file once it reaches this size
+      count: 50         # Default: 50. Number of rotated files to keep
+```
+
+| Field | Default | Description |
+|---|---|---|
+| `logToFile` | `true` | When `false`, the operator drops the file targets and the server logs only to the container console. |
+| `jsonLogs` | `false` | When `true`, the operator adds `formatting.type: json` so each line is a JSON object. |
+| `level` | `trace` | Log verbosity. One of `test`, `trace`, `debug`, `information`, `notice`, `warning`, `error`, `critical`, `fatal`. |
+| `size` | `1000M` | Maximum size of a single log file before rotation. |
+| `count` | `50` | Number of rotated log files the server retains. |
+
+The operator always keeps console logging on so that `kubectl logs` works, and layers file logging on top when `logToFile` is `true`. A cluster with the defaults renders this `logger` block:
+
+```yaml
+logger:
+  console: true
+  level: trace
+  log: /var/log/clickhouse-server/clickhouse-server.log
+  errorlog: /var/log/clickhouse-server/clickhouse-server.err.log
+  size: 1000M
+  count: 50
+```
+
+The same `spec.settings.logger` block applies to a `KeeperCluster`; the operator writes its files under `/var/log/clickhouse-keeper/` instead.
+
+<Note>
+Set `jsonLogs: true` when you ship logs to a structured log store that parses JSON.
+</Note>
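+
+As an illustration of combining these fields, the following sketch configures console-only, JSON-formatted logging at a less verbose level. The chosen level is an example, not a recommendation from the operator.
+
+```yaml
+spec:
+  settings:
+    logger:
+      logToFile: false     # drop the file targets, keep the console
+      jsonLogs: true       # one JSON object per log line
+      level: information   # less verbose than the default trace
+```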
+
 ## Custom configuration {#custom-configuration}
 
 ### Embedded extra configuration {#embedded-extra-configuration}
@@ -442,8 +774,8 @@ spec:
 ```
 
 #### Useful links:
-* [YAML configuration examples](/concepts/features/configuration/server-config/configuration-files#example-1)
-* [All server settings](/reference/settings/server-settings/settings)
+* [YAML configuration examples](/core/concepts/features/configuration/server-config/configuration-files#example-1)
+* [All server settings](/core/reference/settings/server-settings/settings)
 
 ### Embedded extra users configuration {#embedded-extra-users-configuration}
 
@@ -475,7 +807,7 @@ spec:
 The `extraUsersConfig` is stored in k8s ConfigMap object. Avoid plain text secrets there.
 </Note>
 
-#### See [documentation](/concepts/features/configuration/settings/settings-users) for all supported ClickHouse users configuration options.
+#### See [documentation](/core/concepts/features/configuration/settings/settings-users) for all supported ClickHouse users configuration options.
 
 ### Configuration example {#configuration-example}
 
diff --git a/products/kubernetes-operator/guides/introduction.mdx b/products/kubernetes-operator/guides/introduction.mdx
index ab530d166..6c08a0fd9 100644
--- a/products/kubernetes-operator/guides/introduction.mdx
+++ b/products/kubernetes-operator/guides/introduction.mdx
@@ -101,7 +101,7 @@ The ClickHouse Operator automatically replicates database definitions across all
 ### What Gets Replicated {#what-gets-replicated}
 
 The operator synchronizes:
-- [Replicated](/reference/engines/database-engines/replicated) database definitions
+- [Replicated](/core/reference/engines/database-engines/replicated) database definitions
 - Integration database engines (PostgreSQL, MySQL, etc.)
 
 The operator does **not** synchronize:
@@ -114,7 +114,7 @@ The operator does **not** synchronize:
 <Tip>
 **Best practice**
 
-Always use the [Replicated](/reference/engines/database-engines/replicated) database engine for production deployments.
+Always use the [Replicated](/core/reference/engines/database-engines/replicated) database engine for production deployments.
 </Tip>
 
 Benefits:
diff --git a/products/kubernetes-operator/guides/monitoring.mdx b/products/kubernetes-operator/guides/monitoring.mdx
new file mode 100644
index 000000000..a994d3d54
--- /dev/null
+++ b/products/kubernetes-operator/guides/monitoring.mdx
@@ -0,0 +1,384 @@
+---
+position: 3
+slug: /clickhouse-operator/guides/monitoring
+title: 'Monitoring'
+keywords: ['kubernetes', 'prometheus', 'monitoring', 'metrics']
+description: 'How to scrape, secure, and use the operator metrics and health endpoints.'
+doc_type: 'guide'
+---
+
+# Monitoring the ClickHouse Operator
+
+The operator exposes Prometheus-compatible metrics and Kubernetes health probes so that you can observe its reconciliation activity, detect stalled controllers, and alert on failures.
+
+This guide covers what the operator exposes, how to scrape it, and which queries are useful day to day.
+
+<Note>
+This guide is about the **operator process itself** (the controller manager). For ClickHouse server metrics (queries, parts, replication lag), scrape the [Prometheus endpoint in ClickHouse](/operations/server-configuration-parameters/settings#prometheus) separately.
+</Note>
+
+## Endpoints {#endpoints}
+
+The operator process exposes two HTTP endpoints inside the manager pod:
+
+| Endpoint | Default port | Path | Purpose |
+|---|---|---|---|
+| Metrics | `8080` (Helm) / `0` disabled (binary default) | `/metrics` | Prometheus exposition format |
+| Health probe | `8081` | `/healthz`, `/readyz` | Kubernetes liveness and readiness |
+
+The metrics endpoint is **off by default** when running the operator binary directly (`--metrics-bind-address=0`). The Helm chart turns it on with `metrics.enable: true` and `metrics.port: 8080`.
+
+The health probe endpoint is always on; the deployment template wires `/healthz` and `/readyz` to the pod's liveness and readiness probes on port `8081`.
+
+## Operator binary flags {#operator-binary-flags}
+
+The relevant `manager` flags (defined in [`cmd/main.go`](https://github.com/ClickHouse/clickhouse-operator/blob/main/cmd/main.go)):
+
+| Flag | Default | Description |
+|---|---|---|
+| `--metrics-bind-address` | `0` (disabled) | Bind address for the metrics endpoint. Set to `:8443` for HTTPS or `:8080` for HTTP. Leave as `0` to disable the metrics server. |
+| `--metrics-secure` | `true` | Serve metrics over HTTPS with authn/authz. Set to `false` for plain HTTP. |
+| `--metrics-cert-path` | empty | Directory containing TLS cert files (`tls.crt`, `tls.key`) for the metrics server. |
+| `--metrics-cert-name` | `tls.crt` | Cert file name inside `--metrics-cert-path`. |
+| `--metrics-cert-key` | `tls.key` | Key file name inside `--metrics-cert-path`. |
+| `--enable-http2` | `false` | Enable HTTP/2 for the metrics **and webhook** servers. Off by default to mitigate CVE-2023-44487 / CVE-2023-39325. |
+| `--leader-elect` | `false` (binary) / `true` (Helm chart) | Enable leader election so only one replica reconciles at a time. The Helm chart sets this flag in `manager.args` by default. |
+| `--health-probe-bind-address` | `:8081` | Bind address for `/healthz` and `/readyz`. |
+
+<Note>
+The `8443` (HTTPS) / `8080` (HTTP) convention in the flag's help text is only a hint. The Helm chart serves HTTPS on `8080` because it sets both `metrics.port: 8080` and `metrics.secure: true`. There is no port-based mode detection — `--metrics-secure` is what selects HTTPS or HTTP.
+</Note>
+
+## Enable metrics via Helm {#enable-metrics-via-helm}
+
+The chart already creates a `Service` for the metrics port and, optionally, a `ServiceMonitor` for prometheus-operator.
+
+The metrics endpoint itself is on by default (`metrics.enable: true`, port `8080`, served over HTTPS via `metrics.secure: true`). The only setting you typically need to flip is `prometheus.enable` to have the chart create a `ServiceMonitor` for you:
+
+```yaml
+# values.yaml — minimal override
+prometheus:
+  enable: true
+```
+
+If you do not use cert-manager, additionally set `certManager.enable: false` and the ServiceMonitor will scrape with `insecureSkipVerify: true`, relying on bearer-token authentication only.
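+
+For a cluster without cert-manager, the two settings mentioned above combine into the following override:
+
+```yaml
+# values.yaml: ServiceMonitor without cert-manager
+prometheus:
+  enable: true    # render the ServiceMonitor
+certManager:
+  enable: false   # scrape with insecureSkipVerify, bearer-token authentication only
+```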
+
+The full set of metrics-related defaults is:
+
+```yaml
+metrics:
+  enable: true
+  port: 8080
+  secure: true            # HTTPS with authn/authz enforced on every scrape
+
+certManager:
+  enable: true            # Issues the metrics server certificate
+
+prometheus:
+  enable: false           # Set to true to render the ServiceMonitor
+  scraping_annotations: false   # Alternative: prometheus.io/scrape pod annotations
+```
+
+Apply:
+
+```bash
+helm upgrade --install clickhouse-operator \
+  oci://ghcr.io/clickhouse/clickhouse-operator-helm \
+  -n clickhouse-operator-system --create-namespace \
+  -f values.yaml
+```
+
+After install the chart creates:
+
+- `Service/<resource-prefix>-metrics-service` — exposes port `8080` (HTTPS when `metrics.secure: true`).
+- `ServiceMonitor/<resource-prefix>-controller-manager-metrics-monitor` — when `prometheus.enable: true`.
+- `ClusterRole/<resource-prefix>-metrics-reader` — non-resource URL `/metrics` with `get` verb.
+
+## Securing the metrics endpoint {#securing-the-metrics-endpoint}
+
+When `metrics.secure: true` the metrics server enforces TLS **and** Kubernetes authentication/authorization on every scrape. Scrapers must:
+
+1. Present a valid Kubernetes bearer token.
+2. Belong to a ServiceAccount bound to a ClusterRole granting `get` on the non-resource URL `/metrics`.
+
+The chart ships such a ClusterRole:
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: clickhouse-operator-metrics-reader
+rules:
+  - nonResourceURLs:
+      - /metrics
+    verbs:
+      - get
+```
+
+Bind it to the ServiceAccount used by your scraper (typically Prometheus):
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: prometheus-clickhouse-operator-metrics-reader
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: clickhouse-operator-metrics-reader
+subjects:
+  - kind: ServiceAccount
+    name: <prometheus-sa>
+    namespace: <prometheus-namespace>
+```
+
+<Warning>
+A `401 Unauthorized` response means the scraper did not present a valid Kubernetes bearer token. A `403 Forbidden` response means the token was accepted but its ServiceAccount lacks the binding above. Disabling security by setting `metrics.secure: false` is **not recommended** in shared clusters because anyone with network reachability to the pod could scrape the endpoint.
+</Warning>
+
+## ServiceMonitor reference {#servicemonitor-reference}
+
+The chart renders a ServiceMonitor of this shape when `prometheus.enable: true`:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: <release>-controller-manager-metrics-monitor
+  namespace: <operator-namespace>
+  labels:
+    control-plane: controller-manager
+spec:
+  selector:
+    matchLabels:
+      control-plane: controller-manager
+  endpoints:
+    - path: /metrics
+      port: https           # "http" when metrics.secure: false
+      scheme: https
+      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
+      tlsConfig:
+        serverName: <release>-metrics-service.<operator-namespace>.svc
+        ca:
+          secret:
+            name: metrics-server-cert
+            key: ca.crt
+        cert:
+          secret:
+            name: metrics-server-cert
+            key: tls.crt
+        keySecret:
+          name: metrics-server-cert
+          key: tls.key
+```
+
+If your cluster does not run cert-manager, set `tlsConfig.insecureSkipVerify: true` and rely on bearer-token authentication only. The chart already does this when `certManager.enable: false`.
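+
+A sketch of the endpoint section in that case, derived from the ServiceMonitor shape above. The exact output rendered by the chart may differ between chart versions.
+
+```yaml
+spec:
+  endpoints:
+    - path: /metrics
+      port: https
+      scheme: https
+      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
+      tlsConfig:
+        insecureSkipVerify: true   # no CA available to verify the metrics server certificate
+```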
+
+## Standalone Prometheus example {#standalone-prometheus-example}
+
+If you do not use kube-prometheus-stack, the repository ships a self-contained example at [`examples/prometheus_secure_metrics_scraper.yaml`](https://github.com/ClickHouse/clickhouse-operator/blob/main/examples/prometheus_secure_metrics_scraper.yaml). It creates a ServiceAccount, the necessary RBAC, and a `Prometheus` CR that selects the operator's ServiceMonitor.
+
+## Health probe endpoints {#health-probe-endpoints}
+
+| Path | Used by | Returns |
+|---|---|---|
+| `/healthz` | Kubernetes liveness probe | `200 OK` as long as the probe server is listening. |
+| `/readyz` | Kubernetes readiness probe | `200 OK` as long as the probe server is listening. |
+
+Both endpoints are registered with the same trivial ping check (`healthz.Ping` from `sigs.k8s.io/controller-runtime`). A failing probe therefore means "the manager process is not serving HTTP on `:8081`" — not "controllers are unhealthy". To detect controller-level problems, use the [reconciliation metrics](#reconciliation-activity) instead.
+
+Both endpoints are served on port `8081` by default. They are wired to the deployment as:
+
+```yaml
+livenessProbe:
+  httpGet:
+    path: /healthz
+    port: 8081
+  initialDelaySeconds: 15
+  periodSeconds: 20
+readinessProbe:
+  httpGet:
+    path: /readyz
+    port: 8081
+  initialDelaySeconds: 5
+```
+
+A repeatedly failing probe usually means the probe server itself never came up — for example, the manager exited early during startup. Check the manager logs for `unable to start manager`, RBAC failures, or `cache did not sync` errors.
+
+## Metrics catalog {#metrics-catalog}
+
+The operator does not register custom Prometheus collectors. Everything below is exposed by the underlying `controller-runtime` and `client-go` libraries. The most useful series, grouped by purpose:
+
+### Reconciliation activity {#reconciliation-activity}
+
+| Metric | Type | Labels |
+|---|---|---|
+| `controller_runtime_reconcile_total` | counter | `controller`, `result` (`success` / `error` / `requeue` / `requeue_after`) |
+| `controller_runtime_reconcile_errors_total` | counter | `controller` |
+| `controller_runtime_reconcile_time_seconds_bucket` | histogram | `controller` |
+| `controller_runtime_active_workers` | gauge | `controller` |
+| `controller_runtime_max_concurrent_reconciles` | gauge | `controller` |
+
+The `controller` label is derived by `controller-runtime` from the resource type registered with `For(...)`. With the current code in `internal/controller/clickhouse` and `internal/controller/keeper` this resolves to `clickhousecluster` and `keepercluster` respectively. If you have customized the operator, verify with a one-time scrape of `/metrics`.
+
+### Work queue {#work-queue}
+
+| Metric | Type | Labels |
+|---|---|---|
+| `workqueue_depth` | gauge | `name` (= controller name) |
+| `workqueue_adds_total` | counter | `name` |
+| `workqueue_retries_total` | counter | `name` |
+| `workqueue_unfinished_work_seconds` | gauge | `name` |
+| `workqueue_longest_running_processor_seconds` | gauge | `name` |
+| `workqueue_queue_duration_seconds_bucket` | histogram | `name` |
+| `workqueue_work_duration_seconds_bucket` | histogram | `name` |
+
+### API server traffic {#api-server-traffic}
+
+| Metric | Type | Labels |
+|---|---|---|
+| `rest_client_requests_total` | counter | `code`, `method`, `host` |
+| `rest_client_request_duration_seconds_bucket` | histogram | `verb`, `host`, `url` |
+
+### Leader election {#leader-election}
+
+| Metric | Type | Labels |
+|---|---|---|
+| `leader_election_master_status` | gauge | `name` (= `d4ceba06.clickhouse.com`) |
+
+The Helm chart enables `--leader-elect` by default, so this metric is present in standard Helm installs. When running the binary directly without the flag, the metric is absent.
+
+### Runtime {#runtime}
+
+Standard Go process and runtime collectors — `go_goroutines`, `go_memstats_*`, `process_cpu_seconds_total`, `process_resident_memory_bytes`, etc.
+
+## Useful PromQL queries {#useful-promql-queries}
+
+### Health overview
+
+```promql
+# Reconciliation rate per controller
+sum by (controller) (rate(controller_runtime_reconcile_total[5m]))
+
+# Error rate per controller (alert if > 0 sustained)
+sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m]))
+
+# p99 reconcile latency
+histogram_quantile(
+  0.99,
+  sum by (le, controller) (rate(controller_runtime_reconcile_time_seconds_bucket[5m]))
+)
+```
+
+### Backlog detection
+
+```promql
+# Pending items in the work queue — a sustained value > 0 indicates a backlog,
+# but short spikes during large reconciles are normal.
+avg_over_time(workqueue_depth[10m])
+
+# Reconciles that have been running for a long time
+workqueue_longest_running_processor_seconds > 60
+```
+
+### Throttling and API pressure
+
+```promql
+# Failed requests to the API server (4xx and 5xx); API-server throttling appears as code="429"
+sum by (code, host) (rate(rest_client_requests_total{code=~"4..|5.."}[5m]))
+
+# 99th percentile API call duration
+histogram_quantile(
+  0.99,
+  sum by (le, verb) (rate(rest_client_request_duration_seconds_bucket[5m]))
+)
+```
+
+### Leader status (HA deployment)
+
+```promql
+# Should be exactly 1 across the replica set (Helm install enables --leader-elect by default)
+sum(leader_election_master_status{name="d4ceba06.clickhouse.com"})
+```
+
+## Suggested alerts {#suggested-alerts}
+
+Starting point for a PrometheusRule (tune thresholds for your environment):
+
+```yaml
+groups:
+  - name: clickhouse-operator
+    rules:
+      - alert: ClickHouseOperatorReconcileErrors
+        # 0.1 errors/s sustained is about 6 errors/min; the threshold filters out transient conflicts.
+        expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m])) > 0.1
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          summary: 'ClickHouse operator is failing to reconcile {{ $labels.controller }}'
+
+      - alert: ClickHouseOperatorWorkqueueBacklog
+        # avg_over_time avoids alerting on transient bursts during large reconciles.
+        expr: avg_over_time(workqueue_depth[10m]) > 5
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          summary: 'Operator work queue backlog sustained for 30m'
+
+      - alert: ClickHouseOperatorReconcileSlow
+        expr: |
+          histogram_quantile(
+            0.99,
+            sum by (le, controller) (rate(controller_runtime_reconcile_time_seconds_bucket[10m]))
+          ) > 30
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          summary: 'p99 reconcile latency for {{ $labels.controller }} > 30s'
+
+      - alert: ClickHouseOperatorNoLeader
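+        # Fires when no replica reports leadership, or when the metric is missing entirely.
+        expr: |
+          sum(leader_election_master_status{name="d4ceba06.clickhouse.com"}) < 1
+            or absent(leader_election_master_status{name="d4ceba06.clickhouse.com"})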
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: 'No leader for the ClickHouse operator (HA deployment)'
+```
+
+The last rule is only meaningful when leader election is enabled.
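+
+If you run the Prometheus Operator, rule groups like these go under `spec.groups` of a `PrometheusRule` object. A minimal sketch, assuming the resource name `clickhouse-operator-alerts`, the `clickhouse-operator-system` namespace, and a `release: prometheus` label matched by your Prometheus instance's `ruleSelector`; all three are placeholders to adapt:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: clickhouse-operator-alerts    # hypothetical name
+  namespace: clickhouse-operator-system
+  labels:
+    release: prometheus               # assumption: must match your Prometheus ruleSelector
+spec:
+  groups:
+    - name: clickhouse-operator
+      rules:
+        - alert: ClickHouseOperatorReconcileErrors
+          expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m])) > 0.1
+          for: 15m
+          labels:
+            severity: warning
+```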
+
+## Verifying the setup {#verifying-the-setup}
+
+A quick end-to-end check, assuming the chart was installed in `clickhouse-operator-system`:
+
+```bash
+NS=clickhouse-operator-system
+
+# The metrics Service exists and selects the manager pod
+kubectl -n $NS get svc -l control-plane=controller-manager
+
+# The ServiceMonitor exists (only with prometheus.enable=true)
+kubectl -n $NS get servicemonitor -l control-plane=controller-manager
+
+# Manager pod is Ready (readiness probe answers)
+kubectl -n $NS get pod -l control-plane=controller-manager
+
+# Direct scrape from inside the cluster (the pod's ServiceAccount needs the metrics-reader binding)
+kubectl -n $NS run curl-metrics --rm -it --restart=Never \
+  --image=curlimages/curl:8.10.1 -- sh -c '
+    TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
+    curl -sk -H "Authorization: Bearer $TOKEN" \
+      https://<release>-metrics-service.'$NS'.svc:8080/metrics \
+      | head -20
+  '
+```
+
+If the scrape returns metrics in the Prometheus exposition format, the endpoint and RBAC are correctly wired.
+
+## Related guides {#related-guides}
+
+- [Installation](/products/kubernetes-operator/install/helm) — Helm values relevant to monitoring.
+- [Configuration](/products/kubernetes-operator/guides/configuration) — TLS configuration shared with the metrics server.
diff --git a/products/kubernetes-operator/install/olm.mdx b/products/kubernetes-operator/install/olm.mdx
index 25051f352..8ff3cb140 100644
--- a/products/kubernetes-operator/install/olm.mdx
+++ b/products/kubernetes-operator/install/olm.mdx
@@ -97,3 +97,4 @@ More info about uninstalling can be found in the [OLM documentation](https://olm
 ## Additional Resources {#additional-resources}
 
 - [Operator Lifecycle Manager Documentation](https://olm.operatorframework.io/docs)
+
diff --git a/products/kubernetes-operator/navigation.json b/products/kubernetes-operator/navigation.json
index 302acf737..f41a15419 100644
--- a/products/kubernetes-operator/navigation.json
+++ b/products/kubernetes-operator/navigation.json
@@ -17,7 +17,8 @@
       "expanded": true,
       "pages": [
         "products/kubernetes-operator/guides/introduction",
-        "products/kubernetes-operator/guides/configuration"
+        "products/kubernetes-operator/guides/configuration",
+        "products/kubernetes-operator/guides/monitoring"
       ]
     },
     {
diff --git a/products/kubernetes-operator/overview.mdx b/products/kubernetes-operator/overview.mdx
index ba1c47d29..55d5df1e5 100644
--- a/products/kubernetes-operator/overview.mdx
+++ b/products/kubernetes-operator/overview.mdx
@@ -34,7 +34,6 @@ Choose your preferred installation method:
 
 - **[Introduction](/products/kubernetes-operator/guides/introduction)** - General overview of ClickHouse Operator concepts
 - **[Configuration Guide](/products/kubernetes-operator/guides/configuration)** - Configure ClickHouse and Keeper clusters
-- **[Monitoring](/products/kubernetes-operator/guides/introduction)** - Monitor clickhouse-operator using Prometheus metrics
 
 ## Reference {#reference}
 
diff --git a/products/kubernetes-operator/reference/api-reference.mdx b/products/kubernetes-operator/reference/api-reference.mdx
index 43e59de85..8911ffd51 100644
--- a/products/kubernetes-operator/reference/api-reference.mdx
+++ b/products/kubernetes-operator/reference/api-reference.mdx
@@ -8,9 +8,20 @@ doc_type: 'reference'
 sidebarTitle: 'API reference'
 ---
 
-
 This document provides detailed API reference for the ClickHouse Operator custom resources.
 
+## AdditionalPort {#additionalport}
+
+AdditionalPort declares one extra TCP port to expose on the ClickHouse Pod and the operator-managed headless Service.
+
+| Field | Type | Description | Required | Default |
+|-------|------|-------------|----------|---------|
+| `name` | string | Name uniquely identifies the port within the list. Used as both the container port name and the Service port name.<br />This must be a DNS_LABEL. | true |  |
+| `port` | integer | Port is the TCP port number to expose. | true |  |
+
+Appears in:
+- [ClickHouseClusterSpec](#clickhouseclusterspec)
+
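+A minimal sketch of the field in use, assuming a cluster named `sample` and a hypothetical port `custom-tcp` on 9010; the other spec fields are omitted here. The operator only exposes the port, so the ClickHouse server must be configured separately to listen on it:
+
+```yaml
+apiVersion: clickhouse.com/v1alpha1
+kind: ClickHouseCluster
+metadata:
+  name: sample
+spec:
+  additionalPorts:
+    - name: custom-tcp   # hypothetical; must be a DNS_LABEL, unique within the list
+      port: 9010         # hypothetical TCP port number
+```
+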
 ## ClickHouseCluster {#clickhousecluster}
 
 ClickHouseCluster is the Schema for the `clickhouseclusters` API.
@@ -65,6 +76,7 @@ ClickHouseClusterSpec defines the desired state of ClickHouseCluster.
 | `upgradeChannel` | string | UpgradeChannel specifies the release channel for major version upgrade checks.<br />When empty, only minor updates will be proposed. Allowed values are: stable, lts or specific major.minor version (e.g. 25.8). | false |  |
 | `versionProbeTemplate` | [VersionProbeTemplate](#versionprobetemplate) | VersionProbeTemplate overrides for the version detection Job. | false |  |
 | `externalSecret` | [ExternalSecret](#externalsecret) | ExternalSecret is an optional reference to an externally-managed Secret containing cluster secrets.<br />The secret must reside in the same namespace as the cluster. | false |  |
+| `additionalPorts` | [AdditionalPort](#additionalport) array | AdditionalPorts declares extra TCP ports to expose on the ClickHouse Pod and the operator-managed headless Service.<br />The operator only adds the ports to the Kubernetes resources; it does not configure the ClickHouse server to listen on them. | false |  |
 
 Appears in:
 - [ClickHouseCluster](#clickhousecluster)
@@ -152,7 +164,7 @@ ContainerTemplateSpec describes the container configuration overrides for the cl
 |-------|------|-------------|----------|---------|
 | `image` | [ContainerImage](#containerimage) | Image is the container image to be deployed. | true |  |
 | `imagePullPolicy` | [PullPolicy](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#pullpolicy-v1-core) | ImagePullPolicy for the image, which defaults to IfNotPresent. | false |  |
-| `resources` | [ResourceRequirements](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#resourcerequirements-v1-core) | Resources is the resource requirements for the server container.<br />Deep-merged with operator defaults via SMP. Individual limits and requests override only matching<br />keys; unset fields preserve operator defaults. | false |  |
+| `resources` | [ResourceRequirements](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#resourcerequirements-v1-core) | Resources is the resource requirements for the server container.<br />Applied as a whole: operator defaults are used only when all resource fields are empty. | false |  |
 | `volumeMounts` | [VolumeMount](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#volumemount-v1-core) array | VolumeMounts is the list of volume mounts for the container.<br />Concatenated with operator-generated mounts. Entries sharing a `mountPath` with an operator<br />mount are merged into a projected volume. | false |  |
 | `env` | [EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array | Env is the list of environment variables to set in the container.<br />Merged with operator defaults by name. | false |  |
 | `securityContext` | [SecurityContext](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#securitycontext-v1-core) | SecurityContext defines the security options the container should be run with.<br />A non-nil SecurityContext fully replaces operator defaults; the user owns the<br />entire struct. When nil, operator defaults are preserved.<br />More info: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ | false |  |