Upgrade cloud-hypervisor to v50.1 (CVE-2026-27211) #200

Open
ulziibay-kernel wants to merge 3 commits into main from hypeship/upgrade-ch-v50.1

@ulziibay-kernel ulziibay-kernel commented Apr 24, 2026

Summary

Upgrades the embedded Cloud Hypervisor binaries from v48.0/v49.0 to v50.1.

  • Fixes GHSA-jmr4-g2hv-mjj6 / CVE-2026-27211: arbitrary host file exfiltration via crafted QCOW2 disk headers. All versions 34.0–50.0 are affected; fix is in 50.1.
  • Drops embedded v48.0 and v49.0 binaries. SupportedVersions is now {V50_1}; GetVersion() in the cloud-hypervisor starter returns v50.1.
  • Updates Makefile (download-ch-binaries, download-ch-spec, ensure-ch-binaries), ParseVersion, tests, and README docs.
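A minimal sketch of how a single-version `SupportedVersions` set and `ParseVersion` might fit together. The `Version` type, the string value of `V50_1`, the function signatures, and the assumed `cloud-hypervisor v50.1` output format are illustrative guesses, not the actual contents of lib/vmm:

```go
package main

import (
	"fmt"
	"strings"
)

// Version is a hypothetical stand-in for the version type in lib/vmm.
type Version string

// V50_1 mirrors the constant named in this PR; its string value is assumed.
const V50_1 Version = "v50.1"

// SupportedVersions lists every embedded Cloud Hypervisor version.
var SupportedVersions = []Version{V50_1}

// IsVersionSupported reports whether v is one of the embedded versions.
func IsVersionSupported(v Version) bool {
	for _, s := range SupportedVersions {
		if s == v {
			return true
		}
	}
	return false
}

// ParseVersion extracts the version from the binary's version output,
// assumed here to look like "cloud-hypervisor v50.1".
func ParseVersion(out string) (Version, error) {
	fields := strings.Fields(out)
	if len(fields) < 2 || fields[0] != "cloud-hypervisor" {
		return "", fmt.Errorf("unexpected version output: %q", out)
	}
	v := Version(fields[1])
	if !IsVersionSupported(v) {
		return "", fmt.Errorf("unsupported cloud-hypervisor version %q", v)
	}
	return v, nil
}

func main() {
	v, err := ParseVersion("cloud-hypervisor v50.1\n")
	fmt.Println(v, err)

	// A dropped version is rejected rather than silently accepted.
	_, err = ParseVersion("cloud-hypervisor v49.0\n")
	fmt.Println(err)
}
```

Rejecting unknown versions at parse time means a stale v48.0 or v49.0 binary left on disk would fail loudly instead of being started.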

Compatibility notes for reviewer

  • Snapshot restore compatibility: lib/system/README.md notes that snapshot restore requires an exact CH version match. Dropping v48.0/v49.0 means existing standby instances snapshotted on those versions can no longer be restored by this binary. If a transition window is needed, we could keep v49.0 embedded temporarily behind a deprecation marker — happy to switch to that approach.
  • API spec: Cloud Hypervisor's OpenAPI version is still 0.3.0 at tag v50.1. The v50.1 spec adds /vm.resize-disk and an optional nested field; both are additive so I did not regenerate lib/vmm/vmm.go. Run make download-ch-spec && make generate-vmm-client if we want the new surface.

Test plan

  • make download-ch-binaries pulls v50.1 static binaries (x86_64 + aarch64)
  • Downloaded binary reports cloud-hypervisor v50.1
  • go build ./lib/vmm/... ./lib/hypervisor/cloudhypervisor/...
  • go vet clean on changed packages
  • go test ./lib/vmm/... -run 'TestIsVersionSupported|TestExtractBinary|TestParseVersion' passes
  • Full go test ./lib/vmm/... (requires KVM; run in CI / on a KVM-capable host)
  • Integration smoke: boot a VM with the new binary end-to-end

🤖 Generated with Claude Code


Note

Medium Risk
Upgrades the embedded hypervisor binary and drops older supported versions, which can break standby snapshot restore for instances created on prior Cloud Hypervisor versions and may expose subtle runtime behavior changes.

Overview
Updates embedded Cloud Hypervisor from v48.0/v49.0 to v50.1 (including Makefile download/ensure targets and the spec fetch URL), and makes v50.1 the sole SupportedVersions/default returned by the Cloud Hypervisor starter.

Refreshes the VMM client surface to match the newer spec (e.g., disk image_type, CPU nested, NIC offload flags, and a new /vm.resize-disk request/response), and updates parsing/tests/docs to expect v50.1 throughout.

Reviewed by Cursor Bugbot for commit 6cb1e07.

Fixes GHSA-jmr4-g2hv-mjj6 (CVE-2026-27211): VMM host file
exfiltration via malicious QCOW2 headers. Affects versions 34.0
through 50.0; fixed in 50.1.

- Drop embedded v48.0 and v49.0 binaries; embed v50.1 only
- Update Makefile downloads, spec source, and ensure-ch-binaries
  check to v50.1
- Update SupportedVersions, ParseVersion, and the default
  GetVersion() returned by the cloud-hypervisor Starter
- Update tests and docs to reference v50.1

Cloud Hypervisor API remains at v0.3.0 (new /vm.resize-disk
endpoint and optional `nested` field are additive, no regen
needed unless the new surface is used).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ulziibay-kernel ulziibay-kernel marked this pull request as ready for review April 27, 2026 19:02
@firetiger-agent

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR updates Cloud Hypervisor binaries and related build tooling, but does not modify API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) that the filter targets.

To monitor this PR anyway, reply with @firetiger monitor this.

CH v50.1 prevents sector-zero writes on autodetected raw images as
part of the CVE-2026-27211 fix. Without explicit image_type, the
overlay disk (vdb) fails with I/O errors because CH treats it as a
potential QCOW2 spoof:

  I/O error, dev vdb, sector 0 op 0x1:(WRITE)
  EXT4-fs (vdb): mount failed
  FATAL: dropping to shell for debugging

Fix:
1. Regenerate lib/vmm/vmm.go from the v50.1 OpenAPI spec to pick up
   the new image_type and backing_files fields in DiskConfig
2. Fix malformed enum in the upstream spec (type: enum [...] -> type:
   string with enum list) matching cloud-hypervisor PR #7734
3. Set ImageType: Raw on all disk configs in ToVMConfig so CH skips
   format autodetection and allows sector-zero writes on raw images

Made-with: Cursor
@ulziibay-kernel
Contributor Author

Changes between v49.0 and v50.1 -- reviewer notes

Pushed a fix for the CI failure (ceb572d). Root cause: v50.1 prevents sector-zero writes on autodetected raw images (part of CVE fix). Fix: regenerate VMM client from v50.1 spec + set ImageType: Raw on all disk configs.
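As a rough illustration of the fix described above, the sketch below forces an explicit raw image type on every disk so that Cloud Hypervisor does not fall back to format autodetection. `DiskConfig`, `DiskConfigImageType`, the `"Raw"` string value, the example paths, and the helper `withRawImageType` are hypothetical stand-ins; the generated types in lib/vmm/vmm.go and the real `ToVMConfig` may look different:

```go
package main

import "fmt"

// DiskConfigImageType and DiskConfig are hypothetical stand-ins for the
// types generated from the v50.1 OpenAPI spec; real names may differ.
type DiskConfigImageType string

// Raw is the assumed enum value for raw disk images.
const Raw DiskConfigImageType = "Raw"

type DiskConfig struct {
	Path      string
	ImageType *DiskConfigImageType // nil leaves format detection to CH
}

// withRawImageType returns a copy of disks with ImageType set to Raw on
// every entry. The input slice is left unmodified.
func withRawImageType(disks []DiskConfig) []DiskConfig {
	out := make([]DiskConfig, len(disks))
	for i, d := range disks {
		raw := Raw // fresh variable so each disk gets its own pointer
		d.ImageType = &raw
		out[i] = d
	}
	return out
}

func main() {
	disks := withRawImageType([]DiskConfig{
		{Path: "/var/lib/example/rootfs.img"},  // illustrative path
		{Path: "/var/lib/example/overlay.img"}, // illustrative path
	})
	for _, d := range disks {
		fmt.Printf("%s image_type=%s\n", d.Path, *d.ImageType)
	}
}
```

Applying the setting to every disk, not only the overlay, keeps the behavior uniform if another raw disk later needs sector-zero writes.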

Here's a full breakdown of potentially breaking changes between v49.0 (what we run today) and v50.1:

Breaking / High Impact

| Change | Version | Impact on us | Action needed |
| --- | --- | --- | --- |
| Sector-zero write prevention on autodetected raw images | v50.1 | BREAKING -- overlay disk (vdb) fails with I/O errors, VMs drop to debug shell | Fixed in ceb572d: set ImageType: Raw explicitly |
| backing_files defaults to off | v50.1 | Low risk -- we don't use QCOW2 backing files. But if any disk path references a QCOW2 with a backing chain, it would break | No action needed (we use raw images) |
| Snapshot restore requires exact CH version match | v50.0 | HIGH -- existing standby instances snapshotted on v48/v49 cannot be restored by v50.1 | Need a drain strategy: let existing standbys expire before deploy, or keep v49 briefly for restore |

Medium Impact

| Change | Version | Impact on us | Action needed |
| --- | --- | --- | --- |
| Byte-range advisory locks on block devices (was whole-file locks) | v50.0 | Low risk -- better compatibility with network storage. Could surface bugs if code relied on whole-file lock semantics | Monitor for lock-related errors |
| QCOW2 thread safety fix for num_queues > 1 | v50.1 | N/A -- we use raw images, not QCOW2 | None |
| Seccomp filter fixes | v50.0 | Positive -- fixes seccomp violations in vsock thread and others. Could unmask previously-silent blocked syscalls | Monitor for seccomp kills in CH vmm.log |
| CPUID fixes in guest | v50.0 | Positive -- several guest CPUID issues fixed. Unlikely to break anything but could change guest CPU feature visibility | None expected |
| Nested virtualization now configurable (`nested=on\|off`, default on) | v50.0 | No change in behavior (default matches previous implicit behavior) | |

Low Impact / Positive

| Change | Version | Notes |
| --- | --- | --- |
| Live migration performance improvement | v50.0 | Positive |
| Live disk resizing API (/vm.resize-disk) | v50.0 | New capability, not a breaking change |
| Windows guest snapshot/restore fix | v50.0 | N/A for us |
| Serial Manager fixes | v50.0 | Positive |
| Logging now includes event-monitor info | v50.0 | Positive -- more visibility |

Already handled from v49.0

| Change | Impact |
| --- | --- |
| Removed default IP/netmask for virtio-net | No impact -- hypeman always sets Ip, Mac, Mask explicitly in ToVMConfig |
| Vsock crash on malformed connect requests fix | Positive |
| Pause-resume race condition fix | Positive |

Key risk: Snapshot compatibility

The biggest operational risk is snapshot restore. Instances snapshotted on v48/v49 won't restore on v50.1. Options:

  1. Drain approach: stop creating new standbys, wait for existing ones to expire (based on --vmm-mtss-min-age-s), then deploy v50.1
  2. Keep v49 temporarily: embed v49 as a fallback for restore-only, route new VMs to v50.1 (more complex)
  3. Accept the break: force-delete stale standbys during deploy window

I'd recommend option 1 (drain) for production rollout.
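One way to gate the drain approach is to check, before deploy, whether any standby snapshot was taken on a different Cloud Hypervisor version than the one about to ship. The sketch below is illustrative only: the `Standby` record, its fields, and `splitRestorable` are hypothetical and do not reflect how hypeman actually tracks standbys.

```go
package main

import "fmt"

// Standby is a hypothetical record for a snapshotted standby instance.
type Standby struct {
	ID        string
	CHVersion string // Cloud Hypervisor version that took the snapshot
}

// splitRestorable partitions standbys into those the given binary version
// can restore (exact version match) and those that are stale.
func splitRestorable(standbys []Standby, binaryVersion string) (restorable, stale []Standby) {
	for _, s := range standbys {
		if s.CHVersion == binaryVersion {
			restorable = append(restorable, s)
		} else {
			stale = append(stale, s)
		}
	}
	return restorable, stale
}

func main() {
	standbys := []Standby{
		{ID: "sb-1", CHVersion: "v49.0"},
		{ID: "sb-2", CHVersion: "v50.1"},
	}
	_, stale := splitRestorable(standbys, "v50.1")
	if len(stale) > 0 {
		// Under the drain approach, deploy waits until this list is empty.
		fmt.Printf("drain not complete: %d stale standby(s)\n", len(stale))
		return
	}
	fmt.Println("safe to deploy v50.1")
}
```

The same partition would also serve option 3, where the stale list becomes the set of standbys to force-delete during the deploy window.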

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit ceb572d.

Comment thread lib/vmm/version.go