
fix(pg_upgrade): use systemctl to start data.mount#2144

Open
valigula wants to merge 1 commit into develop from andres/inc-495-pg-upgrade-mount-race

@valigula commented May 7, 2026

`complete.sh` used `mount -a` to mount `/data` after the upgraded volume was re-attached. Because the fstab entry for `/data` includes `x-systemd.device-timeout`, systemd takes ownership of the `data.mount` unit, and `mount -a` from util-linux silently skips systemd-managed mounts, exiting 0 without mounting anything.

As a result, `copy_configs` ran against an unmounted `/data`, exhausted its 3 retries in ~6s, and marked the upgrade as failed.

Ref: INC-495

This PR replaces `retry 8 mount -a -v` with `systemctl start data.mount`, which explicitly activates the systemd unit. It then runs `retry 8 mountpoint -q /data` as a safety net that fails with an explicit message if `/data` is still not mounted.
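
For reviewers who want to see the shape of the change, here is a minimal sketch of the new mount step. It is not the code from `complete.sh`: the `retry` function below is a hypothetical stand-in for the repository's helper (its real signature and delay are not shown in this PR), `mount_data_volume` and `RETRY_DELAY` are names invented for illustration, and continuing after a non-zero `systemctl` exit is an assumption.

```shell
# Hypothetical stand-in for the repo's retry helper: retry <attempts> <command...>
retry() {
  retry_max=$1
  shift
  retry_n=1
  while [ "$retry_n" -le "$retry_max" ]; do
    "$@" && return 0
    retry_n=$((retry_n + 1))
    sleep "${RETRY_DELAY:-1}"   # RETRY_DELAY is an illustrative knob, not from the PR
  done
  return 1
}

# Illustrative wrapper: activate the systemd mount unit, then verify /data.
mount_data_volume() {
  # Ask systemd to start the mount unit generated from the /data fstab entry.
  if ! systemctl start data.mount; then
    # Assumption: keep going and let the mountpoint check decide the outcome.
    echo "WARN: systemctl start data.mount returned non-zero; verifying /data anyway" >&2
  fi

  # Safety net: only succeed once /data is actually a mount point.
  if ! retry 8 mountpoint -q /data; then
    echo "ERROR: /data is not a mount point after starting data.mount" >&2
    return 1
  fi
}
```

A caller would run something like `mount_data_volume || exit 1` before `copy_configs`, so that a missing mount fails immediately with a clear message instead of surfacing later as exhausted `copy_configs` retries.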

PR #2133 (Paul Cioanca) addressed the secondary race, in which the EBS API reports `in-use` before the NVMe device is visible to the OS, by adding `udevadm settle` and `wait_for_data_device`. This PR addresses the root cause that remained after that fix.
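
To illustrate the kind of wait that closes the secondary race, here is a rough sketch of polling for a device node after letting udev settle. It is not the `wait_for_data_device` implementation from PR #2133, which is not shown here: `wait_for_path`, `WAIT_DELAY`, the default of 30 attempts, and the 5-second settle timeout are all assumptions made for illustration.

```shell
# Hypothetical helper: wait_for_path <path> [attempts]
# Polls until <path> exists, letting udev drain its event queue between checks.
wait_for_path() {
  wait_path=$1
  wait_tries=${2:-30}
  while [ "$wait_tries" -gt 0 ]; do
    # Let udev finish processing pending device events, if udevadm is available.
    if command -v udevadm >/dev/null 2>&1; then
      udevadm settle --timeout=5 || true
    fi
    # -e checks that the path exists; -b would be a stricter block-device check.
    [ -e "$wait_path" ] && return 0
    wait_tries=$((wait_tries - 1))
    sleep "${WAIT_DELAY:-1}"   # WAIT_DELAY is an illustrative knob
  done
  echo "ERROR: $wait_path did not appear" >&2
  return 1
}
```

Even with a wait like this in place, the device being visible does not mean `/data` is mounted, which is why the mount step itself still needs the explicit check described above.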

@valigula valigula requested review from a team as code owners May 7, 2026 12:33