Compare commits

...

3 Commits

Author SHA1 Message Date
Codex
667777a653 revert(ci1): back to cdrom:scsi (virtio-blk disk hit QEMU flock)
The virtio-blk disk swap (commit 84c9feb) didn't help: qemu fails to
acquire the write lock on the rootdisk PVC because the previous
launcher's qemu process didn't release it cleanly. Same family of
bug as the "stale QEMU flock" already documented in
feedback_kubevirt_iso_first_install_bootorder_and_runstrategy, but
now triggered on rke2-agent1 instead of agent2.

OVMF cdrom timeout is the real blocker and remains open:
  - Distribution pipeline (build → save → scp → ctr import on all
    3 RKE2 nodes) is proven. localhost/win-server-2025:1.0 lives in
    each node's containerd k8s.io namespace.
  - containerDisk + cdrom:scsi gets qemu domain Running (no NFS
    Permission denied, no rootdisk flock).
  - OVMF BdsDxe times out reading the SCSI cdrom regardless of
    SecureBoot setting and bus type.

Reverting the disk type to cdrom:scsi so the VM lands back on the
"qemu Running, OVMF stuck at Boot Manager" state — known-stable and
easier to attack than the QEMU-flock state we hit by trying
virtio-blk disk.
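
For orientation, the restored wiring might look roughly like the fragment
below. This is a minimal sketch, not the actual manifest: the name
windows-iso, bootOrder: 1, the cdrom device and the image tag come from this
change set, while the surrounding layout, the secureBoot value and
imagePullPolicy: Never on the containerDisk are assumptions.

```yaml
# Hypothetical fragment of a KubeVirt VirtualMachine spec.
# The layout is assumed for illustration, not copied from the manifest.
spec:
  template:
    spec:
      domain:
        firmware:
          bootloader:
            efi:
              secureBoot: false    # assumption: either setting hit the same timeout
        devices:
          disks:
          - name: windows-iso
            bootOrder: 1
            cdrom:
              bus: scsi            # the state this commit reverts to
      volumes:
      - name: windows-iso
        containerDisk:
          image: localhost/win-server-2025:1.0
          imagePullPolicy: Never   # assumption: image is pre-imported on each node
```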

Operator decision for next architectural step (one of):
  - Custom OVMF firmware build with longer Boot0001 timeout
  - KubeVirt version bump (v1.5+ has OVMF fixes)
  - Hyper-V/VirtualBox install + export VHD to ci1
  - BIOS legacy boot (Win Server 2025 needs UEFI but install media
    has a BIOS path)
  - DataVolume HTTP datasource (CDI internalizes ISO bytes via
    different code path)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:35:00 -05:00
Codex
84c9feb893 fix(ci1): present ISO as virtio-blk disk instead of cdrom
OVMF BdsDxe "starting Boot0001 ... Time out" persists across:
  - SATA cdrom + Longhorn Filesystem PVC (Path A)
  - SATA cdrom + Synology NFS (Path B failed: storage perms)
  - SCSI cdrom + Longhorn (Path B variant)
  - SCSI cdrom + containerDisk tmpfs (Path C)
  - + SecureBoot=false

That rules out: storage IO speed, cdrom bus type, signature
verification. Remaining cause is deeper in qemu's cdrom device
emulation under KubeVirt v1.4.0's OVMF firmware — the cdrom read
window for OVMF's first-sector probe is too short to satisfy from
the cdrom controller path regardless of bus type.

Workaround: present the ISO bytes as a regular virtio-blk DISK
(not a cdrom). UEFI/OVMF still recognizes ISO9660 + El Torito
boot records on any block device, so it can find and boot the
EFI bootloader the same way it would from a USB stick. virtio-blk
has a different read path that doesn't hit the cdrom-specific
timeout.
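
The swap described above would amount to changing only the device type on the
disk entry, roughly as sketched here. The name windows-iso and bootOrder: 1
are taken from the VM spec touched in this change set; the exact surrounding
layout is an assumption, and the volume entry is left as it is.

```yaml
# Hypothetical sketch: the same disk entry, exposed as a virtio-blk disk
# instead of a cdrom. Only the device type under the entry differs.
devices:
  disks:
  - name: windows-iso
    bootOrder: 1
    disk:
      bus: virtio    # replaces "cdrom: {bus: scsi}"
```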

This also better aligns with the FlowerCore.Distribution USB-key
pattern: ISO bytes on a block device, UEFI boots from the El
Torito boot record, Windows installer takes over. The autounattend
ConfigMap (ci1-autounattend) drives unattended Windows setup once
the installer kicks off.

The containerDisk OCI image (localhost/win-server-2025:1.0)
remains unchanged — only the disk type in the VM spec changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:29:59 -05:00
Codex
427dbfcef2 [uc] Phase 1 auth gate deploy v20260509-4162dca-authgate
2026-05-08 21:16:54 -05:00
2 changed files with 13 additions and 7 deletions

@@ -58,7 +58,7 @@ spec:
 nodeName: rke2-server
 containers:
 - name: web
-image: localhost/fc-updater-web:v20260508-pub3-deepening-2bdf108
+image: localhost/fc-updater-web:v20260509-4162dca-authgate
 imagePullPolicy: Never
 ports:
 - containerPort: 8080

@@ -411,12 +411,18 @@ spec:
 # Confirmed via debug pod: PVC content IS a real bootable ISO9660
 # (file: "ISO 9660 CD-ROM filesystem data ... (bootable)"), so the
 # only bug was boot priority.
-# 2026-05-08 PM: cdrom bus is SCSI (virtio-scsi controller). Bus
-# choice is no longer load-bearing since the ISO is delivered via
-# containerDisk (see volumes block below) — both SATA and SCSI
-# work fine when the cdrom backing isn't a slow PVC. SCSI is kept
-# because it's the modern bus and matches the standard FC
-# KubeVirt VM template.
+# 2026-05-08 PM: cdrom bus SCSI + containerDisk delivery. This
+# combination boots qemu cleanly and reaches OVMF, but OVMF
+# BdsDxe still hits "starting Boot0001 ... Time out" on the
+# cdrom — see HANDOFF.md / CODEX-STATUS.md "OPEN — ci1" for the
+# full diagnostic chain. virtio-blk disk swap was attempted as a
+# workaround but introduced a separate QEMU rootdisk flock issue
+# without fixing the underlying OVMF cdrom problem; reverted.
+# Operator decision needed for next architectural step (OVMF
+# custom build with extended timeout, KubeVirt version bump,
+# Hyper-V/VirtualBox-and-export, or BIOS legacy boot). The
+# containerDisk distribution pipeline (build/save/scp/ctr import)
+# is proven and ready to reuse for any of those.
 - name: windows-iso
 bootOrder: 1
 cdrom: