engineering: Azure Linux 4.0 (AZL4) full enablement stack#667

Draft
Britel wants to merge 42 commits into main from user/britel/azl4-7b-rollback-stage

Britel commented May 28, 2026

Collaborator

Summary

Full AZL4 enablement stack — all changes from PR-1 through PR-7b in a single cumulative PR against main. This PR includes:

Rust engine changes (PR-1 + PR-2 + PR-3)

  • Native /etc/default/grub editor (grub_defaults.rs, 609 lines)
  • AZL4 distro detection (osrelease.rs, AzL4 variant + ID_LIKE=fedora)
  • AZL4 GRUB update path (update_grub_config_native)
  • Generic EFI vendor-dir discovery for AZL4 ESP layouts
  • BLS kernel cmdline reader for AZL4
  • Chroot distro detection fix (use image distro, not host)
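
To illustrate what a native /etc/default/grub editor has to do, here is a minimal sketch. It is not the contents of grub_defaults.rs: the helper name `set_grub_default` and its always-double-quote behavior are assumptions made for this example, and it does not escape quotes inside the value.

```rust
/// Hypothetical helper (not from the PR): set `key` to `value` in the text of
/// an /etc/default/grub file. The first existing assignment of `key` is
/// replaced; if there is none, a new assignment is appended at the end.
pub fn set_grub_default(contents: &str, key: &str, value: &str) -> String {
    let new_line = format!("{key}=\"{value}\"");
    let prefix = format!("{key}=");
    let mut replaced = false;

    let mut out: Vec<String> = contents
        .lines()
        .map(|line| {
            // Only the first matching assignment is rewritten; comments such as
            // `#GRUB_CMDLINE_LINUX=...` do not match because of the leading `#`.
            if !replaced && line.trim_start().starts_with(&prefix) {
                replaced = true;
                new_line.clone()
            } else {
                line.to_string()
            }
        })
        .collect();

    if !replaced {
        out.push(new_line);
    }

    // Re-join with a trailing newline so the result is a well-formed text file.
    let mut result = out.join("\n");
    result.push('\n');
    result
}
```

Keeping the edit line-based leaves comments and unrelated settings untouched, which is presumably the point of editing the file natively instead of regenerating it.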

Build infrastructure (PR-5a)

  • Blob download for AZL4 base VHDX (no ADO feed exists yet)
  • Pinned-MIC container build apparatus (at upstream PR #698 merge commit)
  • BaseImage.AZL4_QEMU_GUEST + BlobImageManifest in testimages.py

Image configs + pipeline (PR-5b + PR-6 + PR-7a + PR-7b)

  • updateimg-grub-azl4.yaml: AZL4 COSI build config
  • baseimg-grub-azl4.yaml: AZL4 qcow2 base for VM rollback testing
  • Pipeline stages: build-image-azl4, build-pinned-mic, netlaunch-testing-azl4, vm-testing-azl4
  • E2E configs: base-azl4, rollback-azl4
  • SELinux xattr strip script, sfdisk hardening, offline-init PTUUID fallback

Validation

CI build 1127408 — all 4 AZL4 stages passed (image builds, BM-sim install, storm-trident rollback). All AZL3 stages also passed (no regressions).

Stacked PR breakdown

For reviewers who prefer smaller chunks, this stack is also available as individual branches:

  • PR-1: azl4-1-grub-native (Rust primitives)
  • PR-2: azl4-2-esp-layouts (ESP)
  • PR-3: azl4-3-configure-bls (BLS)
  • PR-5a: azl4-5a-builder-infra (Python builder)
  • PR-5b: azl4-5b-image-pipeline (YAML configs)
  • PR-6: azl4-6-bm-test (netlaunch stage)
  • PR-7a: azl4-7a-qcow2-rust (qcow2 + Rust)
  • PR-7b: azl4-7b-rollback-stage (this PR, cumulative)

Britel commented May 28, 2026

Collaborator Author

/azp run [GITHUB]-trident-pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Britel force-pushed the user/britel/azl4-7b-rollback-stage branch 4 times, most recently from 23e5322 to 659da62 (May 30, 2026 01:02)
Implements AzureLinuxRelease::AzL4 variant, VERSION_ID 4.x parsing,
ID_LIKE=fedora matching, updated GRUB match arms for AzL3|AzL4,
and image_distro() fallback to host os-release.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
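
The detection described in this commit could look roughly like the sketch below. Only the `AzL3`/`AzL4` variant names, VERSION_ID 4.x, and ID_LIKE=fedora come from the commit message. The function names, the `Other` variant, the `ID=azurelinux` check, and the way the three fields are combined are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the release enum; `Other` is hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum AzureLinuxRelease {
    AzL3,
    AzL4,
    Other,
}

/// Parse os-release text into KEY -> value, stripping surrounding quotes.
pub fn parse_os_release(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| {
            let line = line.trim();
            if line.is_empty() || line.starts_with('#') {
                return None;
            }
            let (key, value) = line.split_once('=')?;
            let value = value
                .trim()
                .trim_matches(|c: char| c == '"' || c == '\'');
            Some((key.trim().to_string(), value.to_string()))
        })
        .collect()
}

/// Assumed rule: ID must be `azurelinux`; major version 3 maps to AzL3, and
/// major version 4 maps to AzL4 only when ID_LIKE lists `fedora`.
pub fn detect_release(fields: &HashMap<String, String>) -> AzureLinuxRelease {
    let id = fields.get("ID").map(String::as_str).unwrap_or("");
    let major = fields
        .get("VERSION_ID")
        .map(String::as_str)
        .and_then(|v| v.split('.').next())
        .unwrap_or("");
    let like_fedora = fields
        .get("ID_LIKE")
        .map(|v| v.split_whitespace().any(|t| t == "fedora"))
        .unwrap_or(false);

    match (id, major) {
        ("azurelinux", "3") => AzureLinuxRelease::AzL3,
        ("azurelinux", "4") if like_fedora => AzureLinuxRelease::AzL4,
        _ => AzureLinuxRelease::Other,
    }
}
```

Matching on the major component only means any 4.x VERSION_ID is treated the same way.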
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from 1ca856a to 317b898 (June 3, 2026 01:05)
Britel and others added 9 commits June 3, 2026 16:41
image_distro() was falling back to the host os-release whenever the
image's distro was Distro::Other. This silently masked unrecognized
distros as the host distro, causing GRUB config to be written for
the wrong OS.

Now: if an image is mounted (self.image.is_some()), always use the
image's distro. Fallback to host only fires when no image is present
at all (functional tests, runtime operations).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
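
The rule in this commit can be reduced to a small sketch. The `Context` struct and its fields here are simplified stand-ins, not Trident's real types: the actual code reads os-release data, whereas this example stores the already-detected distro directly.

```rust
/// Simplified stand-in for the distro enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Distro {
    AzL3,
    AzL4,
    Other,
}

/// Hypothetical context: `image` is `Some` whenever an image is mounted.
pub struct Context {
    pub image: Option<Distro>,
    pub host: Distro,
}

impl Context {
    /// If an image is present, its distro always wins, even when it is
    /// `Distro::Other`. The host distro is used only when there is no image.
    pub fn image_distro(&self) -> Distro {
        match self.image {
            Some(distro) => distro,
            None => self.host,
        }
    }
}
```

With the old behavior, an image reporting `Distro::Other` on an AZL3 host would have been treated as AZL3. In this sketch it stays `Other`, so callers can reject or special-case it.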
Adds is_azl4_or_later() helper, generic EFI vendor-dir discovery
via grub-probe, and AZL4 ESP partition layout support.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove redundant ensure!(grub_noprefix) check from ESP setup.
  generate_boot_filepaths() already finds a working GRUB binary
  (noprefix, standard, or vendor-dir). The separate policy check
  was redundant.
- Simplify copy_boot_files to return () instead of bool
- Attribute grub search format variants to distro conventions
  (AZL3/Mariner vs AZL4/Fedora), not MIC internals
- Update mixed-forms test comment to reference cross-version A/B
  update scenario

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
No callers remain after the noprefix check removal. Can be re-added
if a future change needs version-range gating.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AZL3 ships two GRUB variants: grub2-efi-binary (prefix-relative config
lookup) and grub2-efi-binary-noprefix (root-device-relative lookup).
Trident's A/B update path requires the noprefix variant on AZL3.

Restore the noprefix check, but scope it to AZL3 only using
image_distro().is_azl3(). AZL4+ uses standard grubx64.efi in vendor
directories and does not need noprefix.

This replaces the previous generic ensure! + DISABLE_GRUB_NOPREFIX_CHECK
flag with a targeted distro check. No escape hatch needed since the
check only fires for AZL3.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep the original variable name and preserve the operator escape hatch.
Minimize diff from upstream.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep the same macro as upstream to minimize diff.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep the original if/else if chain with replace (first match). No
real-world grub config has multiple search lines. Minimizes diff
from upstream.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from 1de76ba to 5d0d1e8 (June 3, 2026 23:58)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch 4 times, most recently from 941764d to c4cecd1 (June 4, 2026 22:37)
AZL4 (Fedora-based) uses Boot Loader Spec entries instead of inline
linux commands in grub.cfg. When grub.cfg contains blscfg and no
inline linux lines, fall back to reading boot args from
/boot/loader/entries/*.conf.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
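
A minimal sketch of the fallback described in this commit, operating on file contents rather than on the filesystem. The function names are hypothetical, and the sketch does not expand GRUB variables such as `$kernelopts` that can appear in an `options` line.

```rust
/// Returns true when grub.cfg delegates to Boot Loader Spec entries: it
/// invokes `blscfg` and contains no inline `linux` (or `linuxefi`) command.
pub fn uses_bls(grub_cfg: &str) -> bool {
    let mut has_blscfg = false;
    let mut has_inline_linux = false;
    for line in grub_cfg.lines() {
        // Look only at the command word, i.e. the first token on the line,
        // so `insmod blscfg` is not mistaken for the `blscfg` command.
        match line.split_whitespace().next() {
            Some("blscfg") => has_blscfg = true,
            Some("linux") | Some("linuxefi") => has_inline_linux = true,
            _ => {}
        }
    }
    has_blscfg && !has_inline_linux
}

/// Extracts the kernel arguments from the first `options` line of one BLS
/// entry (the contents of a /boot/loader/entries/*.conf file).
pub fn bls_options(entry: &str) -> Option<Vec<String>> {
    entry.lines().find_map(|line| {
        let mut tokens = line.split_whitespace();
        match tokens.next() {
            Some("options") => Some(tokens.map(str::to_string).collect()),
            _ => None,
        }
    })
}
```

A caller would check `uses_bls` on the grub.cfg text first and only then read the entry files, so AZL3-style configs with inline `linux` lines keep their existing path.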
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from c4cecd1 to afb3c77 (June 5, 2026 18:42)
Britel and others added 3 commits June 5, 2026 12:37
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds AZL4 build pipeline stages with MCR-hosted MIC container,
BlobImageManifest class for ACG blob source downloads,
and service connection runbook.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
testimages.py runs docker with the short tag (imagecustomizer:1.4.0-1)
but docker pull uses the full MCR path. Without a local tag, docker run
fails with 'pull access denied'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from afb3c77 to 3767fd8 (June 5, 2026 19:37)
AZL4 base VHDXes may continue to come from blob storage rather than
the ADO feed. The trident-service RPM will come from an AZL4 package
repo, not ADO. Update comments to reflect this.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

/// Retrieves the distribution of the OS image.
///
/// Prefers the image's own os-release (e.g., from the COSI being installed).

Member

we use "servicing OS" and "target OS" to distinguish these ideas.

if self.image.is_some() {
    self.image_os_release().get_distro()
} else {
    self.host_os_release.get_distro()

Member

why do we need this now when we haven't needed it before?

@@ -0,0 +1,5 @@
compatible:

Member

do we need the duplicate directories (base and base-azl4)? I wonder if there is a way to do this without duplicating the files.

help="The image to download.",
choices=[c.image.name for c in artifacts.base_images],
)
parser_download_img.add_argument(

Member

curious: is this just temp? or is this how azl4 will be accessed in the future?

AZL4 Beta may not have complete SELinux policies. Testing whether
enforcing mode prevents services from starting after reboot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from a413009 to eca253f (June 9, 2026 04:28)
Testing whether netplan (match: enp*) conflicts with the image's
eth0 networking (net.ifnames=0) and prevents network after reboot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from eca253f to 473a057 (June 9, 2026 22:36)
Strip back to the config that passed in build 1133385 to confirm
the netlaunch timeout is caused by our additions, not an infra change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from 473a057 to e53ffb0 (June 9, 2026 23:28)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from e53ffb0 to fa9f4a0 (June 10, 2026 00:20)
os.users alone passed. Now testing swap + /home partitions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from fa9f4a0 to 0fc76f3 (June 10, 2026 01:02)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from 0fc76f3 to 4eab827 (June 10, 2026 01:43)
The COSI image user (MIC) must differ from the trident config user
(os.users) to avoid /home mount conflict. AZL3 uses testuser in the
COSI and testing-user in the trident config. AZL4 was using
testing-user in both, causing 'Mount path /mnt/newroot/home is not
empty' during install.

Also restore full test config (swap, /home, os.users, os.selinux,
os.netplan) and fix netplan match from enp* to eth* (AZL4 uses
net.ifnames=0).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from 4eab827 to 745568e (June 10, 2026 23:31)
COSI ESP only stores one set of boot files (~7MB). 64M was
unnecessarily large.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from 745568e to f5a3b53 (June 11, 2026 05:28)
Britel and others added 9 commits June 10, 2026 22:32
The COSI bakes /home/testuser onto root via MIC os.users. Trident's
newroot mount rejects non-empty mount points, so a separate /home
partition conflicts. AZL3 avoids this by only testing /home in
container mode. Container mode for AZL4 is a follow-up.

Keep swap, os.users, os.selinux, os.netplan, postConfigure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds AZL4 bare-metal simulated netlaunch pipeline stage
and SELinux xattr stripping script for test image prep.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds sfdisk partition-table helper, extended offline-init for AZL4
qcow2 images, base image COSI config, and test helper scripts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
osmodifier is now a Rust crate built into the trident binary (PR #638).
No separate osmodifier binary needs to be baked into test images.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Matches AZL3's 16M. Remove stale comment about needing 64M.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds AZL4 VM rollback test pipeline stage using storm-trident
for automated rollback validation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…k config

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Britel force-pushed the user/britel/azl4-7b-rollback-stage branch from f5a3b53 to 22e88da (June 11, 2026 05:32)