From 667777a65355693fd6b1d864508b5675c0c9a487 Mon Sep 17 00:00:00 2001
From: Codex
Date: Fri, 8 May 2026 21:35:00 -0500
Subject: [PATCH] revert(ci1): back to cdrom:scsi (virtio-blk disk hit QEMU flock)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The virtio-blk disk swap (commit 84c9feb) didn't help: qemu fails to
acquire the write lock on the rootdisk PVC because the previous
launcher's qemu process didn't release it cleanly. Same family of bug
as the "stale QEMU flock" already documented in
feedback_kubevirt_iso_first_install_bootorder_and_runstrategy, but now
triggered on rke2-agent1 instead of agent2.

OVMF cdrom timeout is the real blocker and remains open:

- ✅ Distribution pipeline (build → save → scp → ctr import on all 3
  RKE2 nodes) is proven. localhost/win-server-2025:1.0 lives in each
  node's containerd k8s.io namespace.
- ✅ containerDisk + cdrom:scsi gets qemu domain Running (no NFS
  Permission denied, no rootdisk flock).
- ❌ OVMF BdsDxe times out reading the SCSI cdrom regardless of
  SecureBoot setting and bus type.

Reverting the disk type to cdrom:scsi so the VM lands back on the
"qemu Running, OVMF stuck at Boot Manager" state — known-stable and
easier to attack than the QEMU-flock state we hit by trying virtio-blk
disk.

Operator decision for next architectural step (one of):

- Custom OVMF firmware build with longer Boot0001 timeout
- KubeVirt version bump (v1.5+ has OVMF fixes)
- Hyper-V/VirtualBox install + export VHD to ci1
- BIOS legacy boot (Win Server 2025 needs UEFI but install media has a
  BIOS path)
- DataVolume HTTP datasource (CDI internalizes ISO bytes via different
  code path)

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 apps/kubevirt-vms/ci1.yaml | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/apps/kubevirt-vms/ci1.yaml b/apps/kubevirt-vms/ci1.yaml
index 7058f72..3f9e459 100644
--- a/apps/kubevirt-vms/ci1.yaml
+++ b/apps/kubevirt-vms/ci1.yaml
@@ -411,24 +411,22 @@ spec:
             # Confirmed via debug pod: PVC content IS a real bootable ISO9660
             # (file: "ISO 9660 CD-ROM filesystem data ... (bootable)"), so the
             # only bug was boot priority.
-            # 2026-05-08 PM: ISO presented as a virtio-blk DISK (not cdrom).
-            # Both SATA and SCSI cdrom buses hit OVMF BdsDxe "starting Boot0001
-            # ... Time out" regardless of storage backend (NFS, Longhorn PVC,
-            # containerDisk tmpfs — all rule out IO speed). The qemu cdrom
-            # emulation path appears to have a deep-seated read window issue
-            # under KubeVirt v1.4.0's OVMF firmware.
-            #
-            # Workaround: present the ISO bytes as a regular virtio-blk disk
-            # (model="virtio-non-transitional"). UEFI/OVMF still recognizes
-            # ISO9660 + El Torito boot records on a regular disk, so it can
-            # boot the EFI bootloader the same way it would from a USB stick.
-            # This is also closer to the FlowerCore.Distribution USB-key
-            # pattern: the ISO bytes live on a block device, UEFI boots from
-            # the GPT/El Torito boot record, Windows installer runs.
+            # 2026-05-08 PM: cdrom bus SCSI + containerDisk delivery. This
+            # combination boots qemu cleanly and reaches OVMF, but OVMF
+            # BdsDxe still hits "starting Boot0001 ... Time out" on the
+            # cdrom — see HANDOFF.md / CODEX-STATUS.md "OPEN — ci1" for the
+            # full diagnostic chain. virtio-blk disk swap was attempted as a
+            # workaround but introduced a separate QEMU rootdisk flock issue
+            # without fixing the underlying OVMF cdrom problem; reverted.
+            # Operator decision needed for next architectural step (OVMF
+            # custom build with extended timeout, KubeVirt version bump,
+            # Hyper-V/VirtualBox-and-export, or BIOS legacy boot). The
+            # containerDisk distribution pipeline (build/save/scp/ctr import)
+            # is proven and ready to reuse for any of those.
             - name: windows-iso
               bootOrder: 1
-              disk:
-                bus: virtio
+              cdrom:
+                bus: scsi
             - name: rootdisk
               bootOrder: 2
               disk: