engineering: ACL COSI: handle shared BTRFS UUIDs and ESP space management#673
bfjelds wants to merge 40 commits into
ACL images ship with PARTUUID-based verity addons: templates for both A and B slots are stored in acl/uki-addons/ on the ESP, with slot A active by default. During an A/B update, trident must swap the active addon to match the target slot so the new UKI boots with the correct verity partition identity.

Add activate_verity_addon_for_target_volume() which:
- Checks for ACL verity addon templates on the image ESP
- Copies the correct slot template into the staged addon directory
- Is a silent no-op for non-ACL images (no template dir)
- Errors if template dir exists but the selected slot is missing

Called from copy_file_artifacts() after stage_uki_on_esp(), gated on ctx.image_distro().is_acl() to ensure only ACL images are affected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
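The activation behaviour described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function name `activate_verity_addon`, the `AddonActivation` enum, and the template file name `verity-<slot>.addon.efi` are assumptions made for the example; only the `acl/uki-addons/` directory and the `verity.addon.efi` target name appear in the PR text.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome of an activation attempt (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum AddonActivation {
    /// No template directory on the image ESP: not an ACL image, nothing to do.
    NotApplicable,
    /// The slot template was copied into the staged addon directory.
    Activated,
}

/// Copy the verity addon template for `slot` ("a" or "b") into the staged
/// addon directory. The template naming scheme is an assumption.
pub fn activate_verity_addon(
    image_esp: &Path,
    staged_addon_dir: &Path,
    slot: &str,
) -> io::Result<AddonActivation> {
    let templates = image_esp.join("acl").join("uki-addons");
    if !templates.is_dir() {
        // Silent no-op for images that ship no templates.
        return Ok(AddonActivation::NotApplicable);
    }

    let template = templates.join(format!("verity-{slot}.addon.efi"));
    if !template.is_file() {
        // Templates exist but the selected slot is missing: hard error.
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("verity addon template missing: {}", template.display()),
        ));
    }

    fs::create_dir_all(staged_addon_dir)?;
    fs::copy(&template, staged_addon_dir.join("verity.addon.efi"))?;
    Ok(AddonActivation::Activated)
}
```

Separating "no template directory" (success, no-op) from "directory present but slot missing" (error) keeps non-ACL images unaffected while still surfacing a malformed ACL image early.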
ACL uses identical FS UUIDs across A/B slots by design — partitions are distinguished by PARTUUID instead. The within-image uniqueness check is unaffected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Scan each UKI's .extra.d/ directory for *.addon.efi files and extract their .cmdline PE sections. Addons are stored as a new field on the boot entry so the COSI metadata captures the full effective cmdline (main UKI + addons). Both Go (mkcosi) and Rust (metadata deserialization) updated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
With PARTUUID-based verity addons, usrhash= moved from the main UKI cmdline to the verity addon cmdline. Update extractUsrhashFromUKIEntries to also search addon cmdlines when looking for the root hash. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
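The lookup order described here (main UKI cmdline first, then addon cmdlines) might look like the sketch below. The PR's actual change is in the Go function extractUsrhashFromUKIEntries; this Rust version only illustrates the logic, and the names `usrhash_in` and `extract_usrhash` are hypothetical.

```rust
/// Return the value of the first `usrhash=` token in a kernel cmdline.
/// An empty value is treated as absent.
fn usrhash_in(cmdline: &str) -> Option<&str> {
    cmdline
        .split_whitespace()
        .find_map(|tok| tok.strip_prefix("usrhash="))
        .filter(|value| !value.is_empty())
}

/// Search the main UKI cmdline first, then each addon cmdline in order.
pub fn extract_usrhash<'a>(
    main_cmdline: &'a str,
    addon_cmdlines: &[&'a str],
) -> Option<&'a str> {
    usrhash_in(main_cmdline)
        .or_else(|| addon_cmdlines.iter().copied().find_map(usrhash_in))
}
```

Checking the main cmdline first keeps the result unchanged for images that still carry usrhash= in the UKI itself.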
When staging an A/B update on ACL (Azure Container Linux) UKI images, the COSI image may share BTRFS filesystem UUIDs with the active OS. BTRFS maintains a kernel-global UUID registry and refuses to mount a filesystem whose UUID is already registered by another mounted device, causing the staging verity device mount to fail.

This change detects the UUID collision by checking the well-known ACL USR-A/USR-B partition UUIDs (by PARTUUID) before the mount loop. When a collision is detected, it bind-mounts the active /usr into the newroot instead of attempting to mount the staging verity device. This is safe because:
- USR is verity-protected and read-only
- Matching UUIDs means identical filesystem content
- The chroot only reads from /usr during provisioning
- After reboot, initramfs sets up the correct verity device normally

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the bind-mount workaround activates for ACL BTRFS UUID collisions, compare the staging USR verity root hash (from COSI metadata) against the active USR root hash (from /proc/cmdline usrhash= parameter) to cryptographically prove the filesystems are byte-identical. If the staging hash is available but the active hash cannot be read or does not match, the bind-mount is refused and the normal mount path proceeds (which will fail with the BTRFS UUID error, as expected for genuinely different content). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
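The decision this commit describes can be summarised as a small pure function. The sketch below is an assumption-laden illustration: `UsrMountPlan` and `plan_usr_mount` are hypothetical names, and it refuses the bind-mount when the staging hash is missing, which follows the later DR-001 fix in this PR rather than the behaviour at this commit.

```rust
/// How /usr should be provided inside the newroot (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum UsrMountPlan {
    /// Normal path: mount the staging verity device.
    MountStagingVerity,
    /// Workaround: bind-mount the active, verity-protected /usr.
    BindMountActiveUsr,
}

/// Hashes are compared after trimming and lowercasing.
fn normalize(hash: &str) -> String {
    hash.trim().to_ascii_lowercase()
}

pub fn plan_usr_mount(
    uuid_collision: bool,
    staging_hash: Option<&str>,
    active_hash: Option<&str>,
) -> UsrMountPlan {
    if !uuid_collision {
        return UsrMountPlan::MountStagingVerity;
    }
    match (staging_hash, active_hash) {
        // Only equal, non-empty root hashes prove identical content.
        (Some(s), Some(a)) if !normalize(s).is_empty() && normalize(s) == normalize(a) => {
            UsrMountPlan::BindMountActiveUsr
        }
        // Missing or mismatched hashes: fall back to the normal mount,
        // which is expected to fail with the BTRFS UUID error.
        _ => UsrMountPlan::MountStagingVerity,
    }
}
```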
When internalParams.forceAbUpdate is true, trident will proceed with
an A/B update even when the old and new OS image SHA384 hashes match.
This is useful for testing A/B update flows repeatedly with the same
COSI file.
Usage in trident-config.yaml:
  internalParams:
    forceAbUpdate: true
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
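As a rough sketch of the decision this flag changes (the function name and signature are hypothetical, not taken from the PR):

```rust
/// Decide whether an A/B update should proceed. With `force_ab_update`
/// set, matching image hashes no longer short-circuit the update.
pub fn should_run_ab_update(old_sha384: &str, new_sha384: &str, force_ab_update: bool) -> bool {
    force_ab_update || !old_sha384.eq_ignore_ascii_case(new_sha384)
}
```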
Replace the blanket ACL skip in validate_filesystem_uniqueness() with proper validation. When a duplicate FS UUID is found during A/B update on ACL, the update is only allowed if:
1. The duplicate is on the /usr mount point
2. The staging COSI has a verity root hash
3. The active system has a usrhash= in /proc/cmdline
4. The normalized hashes match (merkle tree proof of identical content)

If COSI partition metadata is available, also validates that the staging USR partition has a known ACL PARTUUID.

Extracts ACL constants and read_active_usr_roothash() into a shared engine::acl module used by both osimage.rs and newroot.rs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DiscoverablePartitionType does not have is_acl_usr() — that method lives on the HC PartitionType enum. Since we already check for known ACL USR PARTUUIDs, the part_type check was redundant. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The ESP (128 MB) can overflow when multiple UKIs accumulate across A/B updates. Before staging a new UKI, remove old UKIs for the target slot:
1. Trident-managed UKIs matching the target slot (all install indices)
2. Non-trident-managed (original install) UKIs, but only when trident already manages the other slot (proving it owns boot management)

The other slot's UKI is always preserved as the active/rollback path.

Also extract UKI_SLOT_A/UKI_SLOT_B constants to replace string literals.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In multi-OS configurations, the ESP has UKI pairs per OS instance (azla0/azlb0, azla1/azlb1, etc.). Cleanup must only remove UKIs for the specific slot+os-index being updated, not all UKIs for the same slot letter across different OS instances. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
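A minimal sketch of slot+os-index selection follows. It assumes a hypothetical UKI naming scheme `<name>-<slot><os-index>.efi` (for example `linux-azla0.efi`); the real file names and the helper names `is_target_uki` and `ukis_to_remove` are not taken from the PR. It uses exact equality on the suffix, in line with the DR-002 fix elsewhere in this PR.

```rust
/// True if `file_name` is a UKI for exactly this slot and OS index.
/// Assumes names of the form `<name>-<slot><os-index>.efi`.
pub fn is_target_uki(file_name: &str, slot: &str, os_index: usize) -> bool {
    let Some(stem) = file_name.strip_suffix(".efi") else {
        return false;
    };
    let Some((_, suffix)) = stem.rsplit_once('-') else {
        return false;
    };
    // Exact match, so "azla0" never matches "azla01" or "azla1".
    suffix == format!("{slot}{os_index}")
}

/// Pick the UKIs to remove before staging; everything else is preserved.
pub fn ukis_to_remove<'a>(names: &[&'a str], slot: &str, os_index: usize) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| is_target_uki(name, slot, os_index))
        .collect()
}
```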
In multiboot configurations, the original UKI has OS 0's partition references baked in. OS 1+ instances never depend on it, but only OS 0 should remove it since it's the owner. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move /proc/cmdline read out of validate_acl_duplicate_uuid into its caller (validate_filesystem_uniqueness). The function now accepts active_usr_roothash as Option<String>, making it fully testable in unit tests without filesystem access.

Add 7 unit tests covering all validation paths:
- matching hash (success)
- case-insensitive matching (success)
- wrong mount point (reject)
- no staging verity hash (reject)
- mismatched hashes (reject)
- no active hash / None (reject)
- empty active hash (reject)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
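Passing the active hash in as a parameter makes the validation a pure function. The sketch below shows one way such a function could be shaped; the signature, the `DuplicateUuidError` enum, and the trimming behaviour are assumptions for illustration, and only the `Option<String>` parameter and the seven outcomes come from the commit message.

```rust
use std::path::Path;

/// Reasons a duplicate FS UUID is rejected (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum DuplicateUuidError {
    WrongMountPoint,
    MissingStagingHash,
    MissingActiveHash,
    HashMismatch,
}

/// Allow a duplicate FS UUID only on /usr and only when the staging and
/// active verity root hashes are both present and equal (ignoring case).
pub fn validate_acl_duplicate_uuid(
    mount_point: &Path,
    staging_usr_roothash: Option<&str>,
    active_usr_roothash: Option<String>,
) -> Result<(), DuplicateUuidError> {
    if mount_point != Path::new("/usr") {
        return Err(DuplicateUuidError::WrongMountPoint);
    }
    let staging = staging_usr_roothash
        .map(str::trim)
        .filter(|h| !h.is_empty())
        .ok_or(DuplicateUuidError::MissingStagingHash)?;
    let active = active_usr_roothash
        .as_deref()
        .map(str::trim)
        .filter(|h| !h.is_empty())
        .ok_or(DuplicateUuidError::MissingActiveHash)?;
    if staging.eq_ignore_ascii_case(active) {
        Ok(())
    } else {
        Err(DuplicateUuidError::HashMismatch)
    }
}
```

Because nothing here touches /proc/cmdline, each of the seven listed cases reduces to a single call with literal arguments.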
DR-001 (High): Replace if-let with let-else for missing staging hash in detect_acl_btrfs_uuid_collision. None now logs a warning and refuses the bind-mount instead of silently proceeding unverified.

DR-002 (High): Replace suffix.contains() with exact suffix equality in cleanup_ukis_before_staging. Prevents azla0 from matching azla01.efi in multiboot with 10+ OS instances.

DR-003 (Medium): Extract verity_hashes_match() into engine::acl module, replacing duplicated normalize+compare logic in newroot.rs and osimage.rs. Rejects empty hashes so "" == "" cannot incorrectly pass.

DR-004 (Medium): Document pre-staging cleanup ordering rationale in esp.rs. Explains the crash-safety trade-off (active slot UKI preserved as A/B fallback).

DR-005 (Medium): Make remove_uki_and_addons idempotent by treating NotFound as success. Prevents orphaned addon dirs if UKI was already removed by a prior partial cleanup.

DR-006 (Medium): Document that cleanup_ukis_before_staging is intentionally universal (not ACL-gated). ESP space constraints apply to all UKI-based A/B updates.

DR-007 (Medium): Replace byte-index hash slicing with char-safe hash_preview() using chars().take(16). Prevents panics on non-ASCII input (defense in depth for hex hashes).

Adds unit tests for verity_hashes_match(), hash_preview(), cleanup_ukis_before_staging (exact suffix matching, multi-index cleanup), and remove_uki_and_addons (idempotency, addon directory cleanup).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
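The idempotent removal from DR-005 could be sketched as below. This is an illustration under assumptions: the function signature is invented for the example, and the addon directory is taken to be the UKI path with `.extra.d` appended, following the `.extra.d/` convention mentioned earlier in this PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Treat "already gone" as success so a retried cleanup can finish.
fn ignore_not_found(result: io::Result<()>) -> io::Result<()> {
    match result {
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}

/// Remove a UKI and its addon directory (`<uki>.extra.d`, assumed layout).
/// The addon directory is removed even if the UKI itself is already missing.
pub fn remove_uki_and_addons(uki: &Path) -> io::Result<()> {
    ignore_not_found(fs::remove_file(uki))?;

    let mut addon_dir = uki.as_os_str().to_owned();
    addon_dir.push(".extra.d");
    ignore_not_found(fs::remove_dir_all(PathBuf::from(addon_dir)))
}
```

Any error other than NotFound (for example a permission failure) still propagates, so only the partial-cleanup case is absorbed.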
/azp run [GITHUB]-trident-pr-e2e
Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR improves Trident’s reliability for Azure Container Linux (ACL) A/B updates by accommodating ACL’s intentional BTRFS filesystem UUID duplication (with verity root-hash verification and a mount-time workaround), and by proactively managing ESP space to avoid “no space left on device” failures across repeated updates. It also extends COSI/UKI metadata handling so verity-related UKI addon cmdlines are discoverable.
Changes:
- Add ACL-specific safety checks for duplicate filesystem UUIDs in A/B updates (validated via verity `usrhash=` root hash) and a BTRFS UUID-collision bind-mount workaround during newroot mounting.
- Extend mkcosi and Trident COSI metadata to include UKI addons and to extract `usrhash=` from addon cmdlines.
- Add pre-staging ESP cleanup for target-slot UKIs and introduce `forceAbUpdate` internal param to allow repeated A↔B testing with the same COSI.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/cmd/mkcosi/metadata/metadata.go | Extend systemd-boot entry metadata schema to include UKI addons. |
| tools/cmd/mkcosi/generator/generator.go | Discover .extra.d/*.addon.efi addons and extract addon .cmdline into metadata. |
| tools/cmd/mkcosi/generator/cih.go | Extract usrhash= from both UKI and addon cmdlines. |
| crates/trident/src/subsystems/storage/osimage.rs | Allow ACL duplicate FS UUIDs for /usr only when verity root hashes match; add unit tests. |
| crates/trident/src/subsystems/esp.rs | Add pre-staging UKI cleanup and ACL verity-addon activation hook during ESP deployment. |
| crates/trident/src/osimage/cosi/metadata.rs | Deserialize addons for bootloader entries from COSI metadata. |
| crates/trident/src/engine/newroot.rs | Add ACL BTRFS UUID-collision detection and bind-mount fallback for /usr. |
| crates/trident/src/engine/mod.rs | Expose new engine::acl helper module. |
| crates/trident/src/engine/context/image.rs | Add forceAbUpdate internal param support in A/B update decision logic. |
| crates/trident/src/engine/clean_install.rs | Thread the new optional staging /usr roothash parameter through clean install mount calls. |
| crates/trident/src/engine/boot/uki.rs | Add UKI pre-staging cleanup logic and ACL verity addon activation from templates; add tests. |
| crates/trident/src/engine/acl.rs | New ACL helper module for PARTUUID constants and usrhash parsing/comparison utilities. |
| crates/trident/src/engine/ab_update.rs | Extract staging /usr verity roothash from image metadata and pass into newroot mounting. |
| crates/trident_api/src/error.rs | Add a specific InvalidInput error variant for ACL duplicate-FS-UUID verification failures. |
| crates/trident_api/src/constants.rs | Add internal_params::FORCE_AB_UPDATE constant. |
///
/// Returns `None` if:
/// - The system is not ACL (PARTUUIDs not found)
/// - The partitions don't have BTRFS filesystems
i don't know what happens if an ext4 partition and a btrfs partition have the same UUID
is that an ACL possibility?
…ot consts
- Add osutils::verity_roothash module with VerityRootHash newtype that handles normalization, comparison, preview, and /proc/cmdline parsing
- Move verity hash logic out of engine::acl into osutils; acl.rs now re-exports VerityRootHash and ACL PARTUUID constants
- Add UKI_SLOT_A/UKI_SLOT_B to constants.rs (derived from AZURE_LINUX_INSTALL_ID_PREFIX + AB_VOLUME_A/B_NAME, lowercased) so changes to AZL prefix propagate to UKI filenames
- Update newroot.rs and osimage.rs to use VerityRootHash instead of raw string comparison functions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
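A newtype of the kind this commit describes might look roughly like the following. Only the type name VerityRootHash and its four responsibilities come from the commit message; the method names, the trim-and-lowercase normalization, and the 16-character preview length (borrowed from DR-007) are assumptions for this sketch.

```rust
/// A verity root hash stored in normalized (trimmed, lowercase) form, so
/// the derived equality compares normalized values.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VerityRootHash(String);

impl VerityRootHash {
    /// Normalize a raw hash string; empty input is rejected so that two
    /// missing hashes can never compare equal.
    pub fn new(raw: &str) -> Option<Self> {
        let normalized = raw.trim().to_ascii_lowercase();
        if normalized.is_empty() {
            None
        } else {
            Some(Self(normalized))
        }
    }

    /// Parse the first `usrhash=` token from a kernel command line.
    pub fn from_cmdline(cmdline: &str) -> Option<Self> {
        cmdline
            .split_whitespace()
            .find_map(|tok| tok.strip_prefix("usrhash="))
            .and_then(Self::new)
    }

    /// Short, char-safe prefix for log messages.
    pub fn preview(&self) -> String {
        self.0.chars().take(16).collect()
    }
}
```

Normalizing once in the constructor means callers cannot forget to normalize before comparing, which is the duplication the commit removes from newroot.rs and osimage.rs.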
…tring Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Callers in newroot.rs and osimage.rs now import VerityRootHash from osutils and ACL constants from trident_api directly. The thin re-export module added no value. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UKI_SLOT_A/B are derived from AZURE_LINUX_INSTALL_ID_PREFIX + AB_VOLUME_*_NAME lowercased. Rust const can't call to_lowercase(), so the values are hardcoded. This test catches drift if the parent consts ever change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace UKI_SLOT_A/UKI_SLOT_B constants with a uki_slot() helper that derives the slot name from AZURE_LINUX_INSTALL_ID_PREFIX + volume name, lowercased. This eliminates the drift risk if the parent consts change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
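A helper along these lines might look like the sketch below. The constant values ("AZL", "A", "B") and the `AbVolume` enum are assumptions chosen to reproduce the azla/azlb slot names seen elsewhere in this PR; the real constants live in trident_api and may differ.

```rust
// Assumed values for illustration; the real constants are defined in trident_api.
const AZURE_LINUX_INSTALL_ID_PREFIX: &str = "AZL";
const AB_VOLUME_A_NAME: &str = "A";
const AB_VOLUME_B_NAME: &str = "B";

/// Hypothetical stand-in for the A/B volume selector.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AbVolume {
    A,
    B,
}

/// Derive the UKI slot name at runtime so it cannot drift from the parent
/// constants (a `const` cannot call `to_lowercase()`).
pub fn uki_slot(volume: AbVolume) -> String {
    let name = match volume {
        AbVolume::A => AB_VOLUME_A_NAME,
        AbVolume::B => AB_VOLUME_B_NAME,
    };
    format!("{}{}", AZURE_LINUX_INSTALL_ID_PREFIX, name).to_lowercase()
}
```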
Summary
Enables trident to successfully perform A/B updates on ACL when the COSI image shares BTRFS filesystem UUIDs with the currently active OS. Also adds pre-staging ESP cleanup to prevent "no space left on device" failures during repeated updates.
Related PRs
Combined Validation
https://dev.azure.com/mariner-org/ACL/_build/results?buildId=1132645
Changes
ACL BTRFS UUID handling
- … `search --fs-uuid`), so shared UUIDs are safe
UKI enhancements
- … `verity.addon.efi` based on which slot is being updated
- `findUkiEntries` COSI metadata: ensures addon files are discovered during COSI parsing
- `usrhash=` parameter: extracts verity root hash from addon kernel cmdline
ESP space management
- … `azla0`) not just slot letter, safe for multiboot
- `install_index == 0`: only the OS that placed the original UKI can remove it
Internal testing support
- `forceAbUpdate` internal param: bypasses SHA384 identity check, allowing the same COSI to be applied repeatedly for A↔B cycle testing
Testing
- … `cargo check -p trident`, `cargo fmt`)
- `user/bfjelds/single-acl-build` updated with 5-cycle A↔B test and ESP diagnostics