From ba18c52130ee710a608a20b3e1d3ea34a9a0b5ef Mon Sep 17 00:00:00 2001
From: Codex
Date: Fri, 8 May 2026 15:18:38 -0500
Subject: [PATCH] docs(ci1): record open rootdisk-flock and
 SATA-CDROM-timeout issues
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Documenting the remaining 2 unresolved issues for the next operator
session, with the recovery paths from this session captured inline so
the next agent doesn't repeat the same blind alleys:

1. **rootdisk QEMU flock** — every new launcher pod fails QEMU start
   with `Failed to get "write" lock` on the rootdisk Filesystem-mode
   disk.img. Stale flock from a previous force-deleted virt-launcher
   pod. Longhorn engine on rke2-agent2 needs to release the lock;
   `kubectl patch volume.longhorn.io/ spec.nodeID=""` is reverted by
   the Longhorn controller. Operator-level recovery only.

2. **SATA CDROM read timeout** — even with bootOrder=1 (windows-iso
   first), OVMF UEFI fails Boot0001 with "Time out" reading the SATA
   CDROM backed by the Filesystem-mode PVC. Block-mode DataVolume
   migration was attempted but blocked by CDI v1.65.0's upload pod
   running with `capabilities.drop: [ALL]` and `runAsUser: 107`,
   preventing direct block-device writes (`blockdev: cannot open
   /dev/cdi-block-volume: Permission denied`). See ISO PVC header
   docstring for 3 forward paths.

Net commits during this session:
- 1c4145a: bootOrder swap (windows-iso=1, rootdisk=2)
- 87a7d7c: deprecated `running:` -> `runStrategy: Always`
- 0bf47df: ISO migration to Block-mode DataVolume (REVERTED)
- 9f6dc1a: revert to Filesystem PVC (CDI block-upload blocked)
- 1c4145a + 87a7d7c + 9f6dc1a are the live, correct configuration.

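For orientation, the live settings from 1c4145a and 87a7d7c would look
roughly like the fragment below. This is an illustrative sketch, not a
copy of ci1.yaml: the disk names, the SATA bus for the CDROM and
`runStrategy: Always` come from this message, while the VM name `ci1`,
the `virtio` bus for rootdisk and the omission of every other field
(volumes, resources, firmware) are assumptions.

```yaml
# Illustrative fragment only; not copied from apps/kubevirt-vms/ci1.yaml.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ci1                  # assumption: name inferred from the filename
spec:
  runStrategy: Always        # replaces the deprecated `running: true`
  template:
    spec:
      domain:
        devices:
          disks:
            - name: windows-iso
              bootOrder: 1   # installer ISO is tried first
              cdrom:
                bus: sata
            - name: rootdisk
              bootOrder: 2
              disk:
                bus: virtio  # assumption: bus is not stated in this message
```

Neither open issue is addressed by this fragment; it only shows where
the two confirmed-working settings sit in the VirtualMachine spec.
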
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 apps/kubevirt-vms/ci1.yaml | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/apps/kubevirt-vms/ci1.yaml b/apps/kubevirt-vms/ci1.yaml
index 3fa25a1..afad22a 100644
--- a/apps/kubevirt-vms/ci1.yaml
+++ b/apps/kubevirt-vms/ci1.yaml
@@ -307,6 +307,26 @@ spec:
   # Running and RunStrategy are mutually exclusive.
   # `Always` keeps a VMI running and restarts it if it crashes/exits — same
   # semantics as the old `running: true`.
+  #
+  # **2026-05-08 status: VM cannot start due to a stale QEMU flock on the
+  # rootdisk PVC** (qemu reports `Failed to get "write" lock` on
+  # `/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img`). The flock was
+  # left by a previous QEMU process during a force-deleted launcher pod
+  # cycle. Recovery requires either (a) a Longhorn engine restart on
+  # rke2-agent2, (b) a Longhorn volume detach via the longhorn-manager API
+  # (kubectl patch on `volume.longhorn.io/` does not work — the
+  # spec.nodeID is reconciled back), or (c) a node reboot of rke2-agent2.
+  #
+  # **Confirmed working:** the bootOrder swap (windows-iso=1, rootdisk=2)
+  # and the runStrategy migration (above). The ISO PVC was successfully
+  # repopulated via virtctl image-upload pvc on the Filesystem-mode PVC.
+  #
+  # **Open: SATA CDROM read timeout** — even with bootOrder=1, OVMF reported
+  # `BdsDxe: failed to start Boot0001 ... Time out` reading the SATA CDROM
+  # backed by the Filesystem-mode PVC. A switch to Block-mode DataVolume
+  # was attempted but blocked by a CDI v1.65.0 upload-pod permission issue
+  # (capability drop prevents writing to the underlying block device).
+  # See header docstring on the ISO PVC.
   runStrategy: Always # LIVE — ISO uploaded 2026-05-08, password in 1P
   template:
     metadata: