docs(ci1): record open rootdisk-flock and SATA-CDROM-timeout issues
Documenting the remaining 2 unresolved issues for the next operator session, with the recovery paths from this session captured inline so the next agent doesn't repeat the same blind alleys:

1. **rootdisk QEMU flock**: every new launcher pod fails QEMU start with `Failed to get "write" lock` on the rootdisk Filesystem-mode disk.img. Stale flock from a previous force-deleted virt-launcher pod. Longhorn engine on rke2-agent2 needs to release the lock; `kubectl patch volume.longhorn.io/<pvc-name> spec.nodeID=""` is reverted by the Longhorn controller. Operator-level recovery only.

2. **SATA CDROM read timeout**: even with bootOrder=1 (windows-iso first), OVMF UEFI fails Boot0001 with "Time out" reading the SATA CDROM backed by the Filesystem-mode PVC. Block-mode DataVolume migration was attempted but blocked by CDI v1.65.0's upload pod running with `capabilities.drop: [ALL]` and `runAsUser: 107`, preventing direct block-device writes (`blockdev: cannot open /dev/cdi-block-volume: Permission denied`). See ISO PVC header docstring for 3 forward paths.

Net commits during this session:
- 1c4145a: bootOrder swap (windows-iso=1, rootdisk=2)
- 87a7d7c: deprecated `running:` -> `runStrategy: Always`
- 0bf47df: ISO migration to Block-mode DataVolume (REVERTED)
- 9f6dc1a: revert to Filesystem PVC (CDI block-upload blocked)

1c4145a + 87a7d7c + 9f6dc1a are the live, correct configuration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -307,6 +307,26 @@ spec:
# Running and RunStrategy are mutually exclusive.
# `Always` keeps a VMI running and restarts it if it crashes/exits — same
# semantics as the old `running: true`.
#
# **2026-05-08 status: VM cannot start due to a stale QEMU flock on the
# rootdisk PVC** (qemu reports `Failed to get "write" lock` on
# `/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img`). The flock was
# left by a previous QEMU process during a force-deleted launcher pod
# cycle. Recovery requires either (a) a Longhorn engine restart on
# rke2-agent2, (b) a Longhorn volume detach via the longhorn-manager API
# (kubectl patch on `volume.longhorn.io/<pvc-name>` does not work — the
# spec.nodeID is reconciled back), or (c) a node reboot of rke2-agent2.
#
# **Confirmed working:** the bootOrder swap (windows-iso=1, rootdisk=2)
# and the runStrategy migration (above). The ISO PVC was successfully
# repopulated via virtctl image-upload pvc on the Filesystem-mode PVC.
#
# **Open: SATA CDROM read timeout** — even with bootOrder=1, OVMF reported
# `BdsDxe: failed to start Boot0001 ... Time out` reading the SATA CDROM
# backed by the Filesystem-mode PVC. A switch to Block-mode DataVolume
# was attempted but blocked by a CDI v1.65.0 upload-pod permission issue
# (capability drop prevents writing to the underlying block device).
# See header docstring on the ISO PVC.
runStrategy: Always # LIVE — ISO uploaded 2026-05-08, password in 1P
template:
metadata:
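
For orientation, the configuration this commit reports as live (the bootOrder swap plus the `runStrategy` migration) might look roughly like the fragment below. It is a minimal sketch, not the actual manifest: the VM name, both `claimName` values, and the rootdisk bus are assumptions made for illustration. Only the disk names `windows-iso` and `rootdisk`, their boot order, the SATA CDROM, and `runStrategy: Always` are taken from this commit.

```yaml
# Hypothetical sketch; fields marked "assumed" are not taken from the real manifest.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ci1-windows                      # assumed name
spec:
  runStrategy: Always                    # replaces the deprecated `running: true`
  template:
    spec:
      domain:
        devices:
          disks:
            - name: windows-iso
              bootOrder: 1               # installer ISO is tried first
              cdrom:
                bus: sata
            - name: rootdisk
              bootOrder: 2               # root disk is tried second
              disk:
                bus: sata                # assumed; the real bus is not stated here
      volumes:
        - name: windows-iso
          persistentVolumeClaim:
            claimName: windows-iso-pvc   # assumed claim name
        - name: rootdisk
          persistentVolumeClaim:
            claimName: rootdisk-pvc      # assumed claim name
```

The fragment omits CPU, memory, and firmware settings, so it would need those filled in from the real manifest before it could be applied.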