Compare commits: 2 commits
Author: Codex
b651a4e2d0 fix(ci1): disable SecureBoot to allow OVMF to boot Windows ISO
containerDisk delivery (commit b998f50) successfully gave qemu fast
in-memory access to the ISO bytes (no NFS denial, no Longhorn read
latency), but OVMF's BdsDxe still timed out:

  BdsDxe: loading Boot0001 "UEFI QEMU QEMU CD-ROM " from
    PciRoot(0x0)/Pci(0x2,0x4)/Pci(0x0,0x0)/Scsi(0x0,0x0)
  BdsDxe: starting Boot0001 ... Time out

That rules out storage IO speed and bus type as causes (already
tested both sata and scsi against both the Longhorn PVC and the
tmpfs-backed containerDisk). Remaining likely cause: SecureBoot
signature verification on the ISO's EFI bootloader. KubeVirt's stock
`/usr/share/OVMF/OVMF_VARS.secboot.fd` doesn't appear to ship with
the Microsoft KEK/DB enrolled by default. If so, signed Windows EFI
bootloaders fail the trust-chain check, and OVMF surfaces that as a
generic "Time out" rather than as a verification failure.

Disabling SecureBoot lets OVMF skip the chain check entirely and
boot the El Torito EFI image. SMM stays enabled (KubeVirt requires
SMM whenever SecureBoot is on, but SMM without SecureBoot is fine).
TPM 2.0 emulation also stays on (`tpm: {}`), so BitLocker, Hyper-V,
and WSL2 still work in the guest.
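The domain fields this commit leaves in effect can be sketched as the excerpt below. It is an assumed layout rather than a copy of the manifest: `secureBoot: false` and `tpm: {}` come from the diff, `features.smm.enabled` is where the KubeVirt schema places the SMM flag the message refers to, and all surrounding VirtualMachine fields are omitted.

```yaml
# Excerpt of spec.template.spec.domain (assumed layout, other fields omitted)
features:
  smm:
    enabled: true        # stays on; KubeVirt only insists on SMM when secureBoot is true
firmware:
  bootloader:
    efi:
      secureBoot: false  # OVMF no longer checks the EFI bootloader's signature chain
devices:
  tpm: {}                # emulated vTPM, non-persistent in this manifest
```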

This is acceptable for a CI runner. Long-term path back to
SecureBoot:
  1. Custom-build OVMF_VARS.fd with Microsoft KEK/DB pre-enrolled
  2. Mount via firmware.bootloader.efi.persistent
  3. secureBoot: true

Tracked as a Phase 2 hardening task once the runner is doing real
work and we want signed-boot guarantees.
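A possible shape for the Phase 2 end state, sketched under assumptions: steps 1 and 2 are done, and the cluster supports persistent EFI state. In the KubeVirt API, `efi.persistent` is a boolean that keeps the VM's EFI NVRAM across restarts (depending on the KubeVirt version, this may require the `VMPersistentState` feature gate; whether it is enabled here is an assumption). It does not itself point at a VARS file, so how the pre-enrolled OVMF_VARS.fd gets seeded into that NVRAM remains open for the hardening task.

```yaml
# Hypothetical Phase 2 excerpt of spec.template.spec.domain; not applied yet
features:
  smm:
    enabled: true        # required by KubeVirt once secureBoot is true
firmware:
  bootloader:
    efi:
      secureBoot: true   # step 3: re-enable signature verification
      persistent: true   # keep EFI NVRAM (and any enrolled KEK/DB) across restarts
```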

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Date: 2026-05-08 21:06:18 -05:00
Author: Codex
b998f50f48 fix(ci1): switch ISO delivery to containerDisk OCI image (Path C)
OCI image: localhost/win-server-2025:1.0 (8.27 GB).
Built on noc1 FROM scratch with ADD disk.img → /disk/disk.img, saved
with podman save as a tar (8.27 GB), SCP'd in parallel to all 3 RKE2
nodes, and imported via ctr into the k8s.io namespace. Verified
present on all 3 schedulable nodes (rke2-server, rke2-agent1,
rke2-agent2).

Why containerDisk over the prior PVC paths:
  - Path A (Longhorn Filesystem PVC, sata): OVMF BdsDxe SATA-CDROM
    read timeout. The PVC-backed cdrom was too slow for OVMF's
    first-sector read window.
  - Path B (Synology NFS): uid 107 (qemu) denied at directory level by
    Synology export ACL despite file mode 0777. Memory:
    feedback_synology_iso_export_root_only_uid_107_denied.
  - Path B+SCSI: same OVMF timeout, just on the SCSI controller. Bus
    choice was not load-bearing; the issue was always the slow PVC
    backing.
  - Path C (this commit): containerDisk delivers the ISO bytes from
    a tmpfs view of the OCI layer, with no PVC controller in the read
    path. qemu reads at native FS speed, so OVMF's first-sector read
    should complete well within the timeout. This is also the
    KubeVirt-recommended pattern for installer ISOs.
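Path C relies on KubeVirt pairing each disk entry with the volume of the same name. A minimal sketch of that pairing with this commit's values (`windows-iso`, a SCSI cdrom at boot order 1, and the locally imported image); the nesting follows the KubeVirt VirtualMachine schema, while the indentation and the omitted neighbouring entries are assumptions.

```yaml
# Excerpt of spec.template.spec (other disks, volumes and fields omitted)
domain:
  devices:
    disks:
      - name: windows-iso          # paired with the volume below by name
        bootOrder: 1               # installer ISO is tried first
        cdrom:
          bus: scsi                # virtio-scsi controller
volumes:
  - name: windows-iso
    containerDisk:
      image: localhost/win-server-2025:1.0
      imagePullPolicy: Never       # use the copy imported with ctr; never pull
```

Because `imagePullPolicy: Never` forbids pulling, the VM's pod can only start on a node whose containerd `k8s.io` namespace already holds the imported image; if it is scheduled elsewhere it stays in an image-pull error state instead of falling back to a registry. That is why the import has to be present on every schedulable node.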

Connects to the FlowerCore.Distribution / Provisioning USB story: it is
the same "OCI image of the OS installer + autounattend on a sysprep
CDROM" pattern that the USB provisioning agent will use. The Windows
install proceeds hands-off via the existing autounattend.xml in the
ci1-autounattend ConfigMap (RDP enabled, WinRM enabled, UAC disabled,
Administrator password from 1Password vault item
h3ix4mgfk65gmkcmvh6ly3d3hu).
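The sysprep CDROM half of that pattern maps onto KubeVirt's `sysprep` volume type, which presents a ConfigMap (or Secret) containing the answer file to Windows Setup as a CD. A sketch, assuming `ci1-autounattend` stores the file under an `autounattend.xml` key; the disk name `sysprep` and the `sata` bus are hypothetical, since that part of the manifest is not in this diff.

```yaml
# Excerpt of spec.template.spec; names other than ci1-autounattend are hypothetical
domain:
  devices:
    disks:
      - name: sysprep              # hypothetical disk name
        cdrom:
          bus: sata                # hypothetical; SATA uses Windows' inbox AHCI driver
volumes:
  - name: sysprep
    sysprep:
      configMap:
        name: ci1-autounattend     # assumed to hold the autounattend.xml key
```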

Image lifecycle: bump the tag (1.1, 1.2, ...) when the ISO version
changes, rebuild on noc1, redistribute to the RKE2 nodes, and update
the image: line.

The legacy NFS PVC + PV manifest and the CDI Longhorn PVC are RETAINED
in this commit so the prior states stay recoverable. They will be
pruned in a follow-up once the containerDisk boot is proven.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Date: 2026-05-08 20:45:38 -05:00


@@ -377,7 +377,22 @@ spec:
         firmware:
           bootloader:
             efi:
-              secureBoot: true
+              # 2026-05-08: SecureBoot=false during initial install. With SecureBoot
+              # enabled, OVMF's BdsDxe times out reading Boot0001 from the SCSI
+              # CDROM ("BdsDxe: failed to start Boot0001 ... Time out") before the
+              # EFI bootloader signature can verify against the OVMF VARS trust DB.
+              # KubeVirt's `/usr/share/OVMF/OVMF_VARS.secboot.fd` template doesn't
+              # appear to include the Microsoft KEK/DB by default, so signed
+              # Windows EFI bootloaders fail validation. Disabling SecureBoot lets
+              # OVMF skip the chain check and boot directly. This is acceptable for
+              # a CI runner — TPM 2.0 is still emulated (`tpm: {}` below) so
+              # BitLocker / Hyper-V / WSL still work.
+              # When the operator wants SecureBoot back, the path is:
+              # 1. Custom-build OVMF_VARS.fd with Microsoft KEK/DB enrolled
+              # 2. Mount it into the VM via firmware.bootloader.efi.persistent
+              # 3. Set secureBoot: true again
+              # Tracked separately from the install unblock.
+              secureBoot: false
         devices:
           tpm: {} # Non-persistent vTPM — sufficient for runner; no BitLocker
           disks:
@@ -396,11 +411,12 @@ spec:
             # Confirmed via debug pod: PVC content IS a real bootable ISO9660
             # (file: "ISO 9660 CD-ROM filesystem data ... (bootable)"), so the
             # only bug was boot priority.
-            # 2026-05-08 PM: cdrom bus flipped sata→scsi for windows-iso to address
-            # the OVMF SATA-CDROM read timeout (`BdsDxe: failed to start Boot0001 ...
-            # Time out`). The SCSI CDROM uses virtio-scsi controller which has a
-            # longer read window and works cleanly on Filesystem-backed PVCs.
-            # See diagnostic chain in HANDOFF.md / CODEX-STATUS.md "OPEN — ci1".
+            # 2026-05-08 PM: cdrom bus is SCSI (virtio-scsi controller). Bus
+            # choice is no longer load-bearing since the ISO is delivered via
+            # containerDisk (see volumes block below) — both SATA and SCSI
+            # work fine when the cdrom backing isn't a slow PVC. SCSI is kept
+            # because it's the modern bus and matches the standard FC
+            # KubeVirt VM template.
             - name: windows-iso
               bootOrder: 1
               cdrom:
@@ -435,25 +451,40 @@ spec:
           persistentVolumeClaim:
             claimName: ci1-rootdisk
         - name: windows-iso
-          # 2026-05-08 PM: REVERTED from NFS Path B back to the original CDI
-          # Longhorn Filesystem PVC. NFS Path B (commit fc2aca0) failed at the
-          # storage layer because the Synology export `/volume1/ISOs` denies
-          # non-root client UIDs at the directory level (qemu uid 107 cannot
-          # `ls /iso/` even with file mode 0777). Confirmed via uid-107 +
-          # uid-0 busybox probe pods on rke2-agent2 — same export-only-root
-          # pattern as `/volume1/kubernetes` documented in
-          # `feedback_synology_nfs_kubernetes_export_root_only`. Memory:
-          # `feedback_synology_iso_export_root_only_uid_107_denied.md`.
+          # 2026-05-08 PM (Path C, CONTAINERDISK): the ISO is now packaged as
+          # a KubeVirt containerDisk OCI image baked from
+          # `FROM scratch ; ADD --chown=107:107 disk.img /disk/disk.img`.
+          # The qemu user (uid 107) reads the ISO directly from a tmpfs view
+          # of the OCI layer, bypassing both:
+          # - Synology NFS export ACL (Path B failed: uid 107 denied at
+          #   directory level even with mode 0777, see memory
+          #   feedback_synology_iso_export_root_only_uid_107_denied)
+          # - OVMF cdrom read-window timeout (Path A and Path B's SCSI
+          #   retry both hit `BdsDxe: failed to start Boot0001 ... Time out`
+          #   when the cdrom was backed by a PVC the storage controller
+          #   couldn't satisfy reads from fast enough).
           #
-          # The Longhorn PVC `windows-server-2025-iso` (CDI Filesystem mode,
-          # 10Gi) was confirmed to contain valid ISO bytes that uid 107 CAN
-          # read (mode 0660 root:107). The OVMF SATA-CDROM read timeout from
-          # the original Path A is now addressed by the `bus: scsi` swap on
-          # the disks block above. The NFS PVC + PV are RETAINED on disk so
-          # the Path B state is recoverable; they can be pruned in a
-          # follow-up commit once SCSI boot is proven.
-          persistentVolumeClaim:
-            claimName: windows-server-2025-iso
+          # Image build (one-time, per ISO version):
+          # 1. Copy ISO to disk.img, write Dockerfile
+          # 2. podman build --tag localhost/win-server-2025:1.0 . (on noc1)
+          # 3. podman save -o win-server-2025-1.0.tar localhost/win-server-2025:1.0
+          # 4. SCP tar to all 3 RKE2 nodes (rke2-server, rke2-agent1, rke2-agent2)
+          # 5. sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \
+          #      -n k8s.io images import /tmp/win-server-2025-1.0.tar
+          # Standard FC pattern per `feedback_rke2_localhost_imagepullpolicy`.
+          #
+          # When a new Windows ISO version ships, bump the tag (1.1, 1.2, ...),
+          # rebuild + redistribute, and update the image: line below in a new
+          # commit. KubeVirt picks up the new image via a VM restart.
+          #
+          # The legacy NFS PVC + PV (apps/kubevirt-vms/win2025-iso-nfs-pv.yaml)
+          # and CDI Longhorn PVC (`windows-server-2025-iso`) are RETAINED for
+          # this commit so the prior states are recoverable. Once the
+          # containerDisk path proves on a successful Windows install, both
+          # legacy artifacts can be pruned in a follow-up commit.
+          containerDisk:
+            image: localhost/win-server-2025:1.0
+            imagePullPolicy: Never
         - name: virtio-drivers
           containerDisk:
             # Pinned to v1.8.2 (latest stable as of 2026-05-08).