The bootOrder swap alone didn't fix the install. Even with `windows-iso` at
bootOrder:1, OVMF UEFI still timed out reading the SATA CDROM:

    BdsDxe: starting Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ... Sata(...)
    BdsDxe: failed to start Boot0001 ... : Time out
    BdsDxe: No bootable option or device was found.

Diagnosis (debug pod mounting the live PVC):
- /pvc/disk.img IS a valid bootable ISO9660 image: `file` reports
  "ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)".
- bytes 0..15: zeros (NOT QCOW2 magic 51 46 49 fb).
- bytes 32769..32773: "CD001", the ISO9660 primary volume descriptor
  identifier at the correct offset.
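
The byte checks above can be reproduced without `file`. A minimal sketch,
assuming the image is readable as a regular file; the function name
`classify_image` is hypothetical and only the two signatures named in the
diagnosis are checked:

```python
QCOW2_MAGIC = b"QFI\xfb"     # 51 46 49 fb at offset 0
ISO9660_ID = b"CD001"        # standard identifier of an ISO9660 volume descriptor
ISO9660_ID_OFFSET = 32769    # 16 sectors * 2048 bytes, plus 1 descriptor-type byte


def classify_image(path):
    """Return 'qcow2', 'iso9660', or 'unknown' based on the image's magic bytes."""
    with open(path, "rb") as f:
        # A QCOW2 container announces itself in the first four bytes.
        if f.read(4) == QCOW2_MAGIC:
            return "qcow2"
        # A raw ISO9660 image carries "CD001" right after the type byte
        # of the first volume descriptor.
        f.seek(ISO9660_ID_OFFSET)
        if f.read(5) == ISO9660_ID:
            return "iso9660"
    return "unknown"
```

For the PVC inspected here, `classify_image("/pvc/disk.img")` would be
expected to return `"iso9660"`, matching the `file` output.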

So content was fine. The bug is in how KubeVirt + QEMU + Longhorn expose a
Filesystem-mode PVC's `/disk.img` as a SATA CDROM. With a Block-mode PVC the
underlying volume IS the raw ISO9660 sectors, so OVMF reads them directly
with no QEMU file-emulation layer in between. This is the recommended
pattern for ISO install media on KubeVirt + Longhorn.

Migration:
- Replace `kind: PersistentVolumeClaim` with `kind: DataVolume` (CDI manages
  the underlying PVC + upload-target pod).
- Set `pvc.volumeMode: Block`.
- Annotate `cdi.kubevirt.io/storage.contentType: kubevirt` so CDI keeps raw
  bytes (no QCOW2 wrap).
- VM volume reference changes from `persistentVolumeClaim.claimName` to
  `dataVolume.name`. KubeVirt's VMI controller blocks VM start until DV
  phase is Succeeded (upload completed).
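
The migration steps can be sketched as a manifest built in Python and
emitted as JSON (which `kubectl apply -f -` accepts). This is an
illustration, not the committed manifest: the helper names, the
`ReadWriteOnce` access mode, and the `10Gi` size are assumptions; the
DataVolume name, namespace, volume name, and annotation come from this
message.

```python
import json


def iso_datavolume(name, namespace, size):
    """Build a Block-mode, upload-sourced DataVolume as a plain dict."""
    return {
        "apiVersion": "cdi.kubevirt.io/v1beta1",
        "kind": "DataVolume",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"cdi.kubevirt.io/storage.contentType": "kubevirt"},
        },
        "spec": {
            # Empty upload source: CDI starts an upload-target pod and waits.
            "source": {"upload": {}},
            "pvc": {
                "volumeMode": "Block",
                "accessModes": ["ReadWriteOnce"],  # assumption, not from the commit
                "resources": {"requests": {"storage": size}},
            },
        },
    }


def iso_volume_ref(volume_name, dv_name):
    """VM volume entry that replaces persistentVolumeClaim.claimName."""
    return {"name": volume_name, "dataVolume": {"name": dv_name}}


dv = iso_datavolume("windows-server-2025-iso", "kubevirt-vms", "10Gi")
vol = iso_volume_ref("windows-iso", "windows-server-2025-iso")
print(json.dumps(dv, indent=2))
print(json.dumps(vol, indent=2))
```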

Operator steps after this lands:
1. Wait for DV `phase: UploadReady`:
     kubectl get dv -n kubevirt-vms windows-server-2025-iso -w
2. Upload the ISO:
     virtctl image-upload dv windows-server-2025-iso -n kubevirt-vms \
       --image-path "...\en-us_windows_server_2025...iso" \
       --uploadproxy-url https://localhost:8443 --insecure --no-create
3. Re-flip runStrategy to Always (was set to Halted live-side during
   migration; this commit keeps the manifest at Always).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Required to clear OutOfSync state after the bootOrder fix. Live VM had
runStrategy: Halted (set during diagnosis to release the PVC for inspection).
Manifest had running: true. KubeVirt's validating webhook rejects the sync,
since the resulting object would carry both fields:

    admission webhook "virtualmachine-validator.kubevirt.io" denied the request:
    Running and RunStrategy are mutually exclusive.

Switching to runStrategy: Always preserves the original "auto-start +
auto-restart" semantics with the non-deprecated field, and gives ArgoCD a
clean diff target to flip Halted -> Always.
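
The rule the webhook enforces, and the field migration this commit makes,
can be sketched as below. This is an illustration on plain dicts, not
KubeVirt's validator code; `normalize_run_fields` is a hypothetical helper,
and mapping `running: false` to `Halted` is the usual equivalence rather
than something stated in this message.

```python
def normalize_run_fields(spec):
    """Return a copy of a VM spec that uses runStrategy instead of running.

    Raises ValueError when both fields are set, mirroring the webhook message.
    """
    if "running" in spec and "runStrategy" in spec:
        raise ValueError("Running and RunStrategy are mutually exclusive")
    out = dict(spec)
    if "running" in out:
        # running: true -> Always (auto-start + auto-restart)
        # running: false -> Halted
        out["runStrategy"] = "Always" if out.pop("running") else "Halted"
    return out


# The manifest before this commit, on its own, converts cleanly:
print(normalize_run_fields({"running": True}))

# Manifest field merged onto the live object is what the webhook refused:
try:
    normalize_run_fields({"running": True, "runStrategy": "Halted"})
except ValueError as err:
    print("rejected:", err)
```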
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Original order: rootdisk=1 (empty 200Gi virtio), windows-iso=2 (SATA CDROM).
UEFI tried the empty virtio disk first, got nothing, fell back to Boot0001
(the SATA CDROM) with a short timeout, and aborted with:

    BdsDxe: failed to start Boot0001 ... Time out
    BdsDxe: No bootable option or device was found.

VM had been running 38+ min with rootdisk actualSize stuck at 4.13 GiB and
no AgentConnected condition; the install never started.

Diagnosis via debug pod mounting the windows-server-2025-iso PVC:

    /pvc/disk.img: ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)
    bytes 0..15: zeros (NOT QCOW2 magic 51 46 49 fb)
    bytes 32769..32773: "CD001" (ISO9660 primary volume descriptor)

So the PVC content is a real bootable ISO, and the only fix needed is to make
the ISO bootOrder=1 for first install. After Windows installs, it writes its
own UEFI Boot#### entries pointing at the rootdisk EFI partition; UEFI then
boots from rootdisk going forward and the ISO at bootOrder:2 is a fallback
for re-install scenarios.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KubeVirt v1.4.0 + RKE2 containerd 2.1.5 cannot pull
quay.io/kubevirt/virtio-container-disk:latest:

    rpc error: code = Unimplemented
    desc = failed to pull and unpack image: not implemented:
    media type "application/vnd.docker.distribution.manifest.v1+prettyjws"
    is no longer supported since containerd v2.1, please rebuild the image as
    "application/vnd.docker.distribution.manifest.v2+json" or
    "application/vnd.oci.image.manifest.v1+json"

The :latest tag was last rebuilt with the v1 manifest schema. Tagged versions
v1.6.5+, v1.7.3, v1.8.2 are rebuilt with v2/OCI manifests.

Pinning to v1.8.2 (newest available, contains current Windows VirtIO drivers).
The image only contains the Windows VirtIO driver ISO mounted as a CDROM,
not the KubeVirt runtime, so it is decoupled from the cluster KubeVirt
version.
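
A tag can be screened for this failure before pinning by looking at the
media type its manifest is served with. A minimal sketch of that check:
`is_schema1` is a hypothetical helper, fetching the manifest is left out,
and the unsigned `v1+json` type does not appear in the error above (it is
included on the assumption that it is the other Docker schema 1 type).

```python
# Docker image manifest schema 1 media types.
SCHEMA1_MEDIA_TYPES = frozenset({
    "application/vnd.docker.distribution.manifest.v1+prettyjws",  # from the error above
    "application/vnd.docker.distribution.manifest.v1+json",       # assumption: unsigned variant
})


def is_schema1(media_type):
    """True if a manifest media type is a Docker schema 1 type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    base = media_type.split(";", 1)[0].strip().lower()
    return base in SCHEMA1_MEDIA_TYPES


for mt in (
    "application/vnd.docker.distribution.manifest.v1+prettyjws",
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
):
    print(mt, "->", "schema 1, rejected" if is_schema1(mt) else "ok")
```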
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 prereqs all satisfied:
- Multus CNI v4.2.2 thick-plugin DS Running on rke2-server/agent1/agent2
- CDI v1.65.0 operator + CR Deployed (cdi-apiserver/deployment/uploadproxy
  all Running 1/1)
- Windows Server 2025 ISO (7.7GiB, March 2026 update) uploaded via CDI
  virtctl image-upload to PVC windows-server-2025-iso. Verified via PVC
  annotations:
    cdi.kubevirt.io/storage.condition.running.message="Upload Complete"
    cdi.kubevirt.io/storage.pod.phase="Succeeded"
- Local Administrator password generated (26 char, FANTASTIC strength).
  Stored in 1Password vault IAmWorkin (qaphopopkryhbg353ukzhhuqoq) item
  h3ix4mgfk65gmkcmvh6ly3d3hu. UTF-16-LE base64 in autounattend.xml Value
  field matches the 1P "autounattend AdministratorPassword Value" field.
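
The "UTF-16-LE base64" encoding of the Value field can be sketched as
follows. This message only states UTF-16-LE plus base64; appending the
literal string "AdministratorPassword" before encoding is an assumption
about the convention used for non-plaintext unattend passwords, so the
suffix is a parameter and can be set to "" to get the bare encoding. Both
function names are hypothetical.

```python
import base64


def encode_unattend_value(password, suffix="AdministratorPassword"):
    """Encode a password as base64 over UTF-16-LE bytes.

    The default suffix is an assumed convention, not taken from the commit.
    """
    return base64.b64encode((password + suffix).encode("utf-16-le")).decode("ascii")


def decode_unattend_value(value, suffix="AdministratorPassword"):
    """Reverse encode_unattend_value, stripping the suffix when present."""
    text = base64.b64decode(value).decode("utf-16-le")
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


# Round trip with a throwaway example string (not a real credential).
encoded = encode_unattend_value("example-only")
print(encoded)
print(decode_unattend_value(encoded))
```

Decoding the Value stored in the manifest and comparing it with the vault
entry is one way to confirm the two fields still match after a rotation.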

Changes:
- ISO PVC bumped 6Gi → 10Gi (ISO is 7.7GiB, need headroom)
- Added labels app=ci-runner, flowercore.io/managed-by=bluejay-infra
- autounattend.xml AdministratorPassword Value: real base64-encoded password
- spec.running: false → true (VM starts on next ArgoCD sync)
- Header comment refreshed to LIVE state with prereq references

Network: still pod-network masquerade. Multus NAD prod-vlan57 is registered
but the VM doesn't use it yet (Phase 1.5 host bridge needed first).

Verify after sync:

    kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml -n kubevirt-vms get vm,vmi
    virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml vnc ci1 -n kubevirt-vms
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three new bluejay-infra apps that the ApplicationSet picks up
automatically (apps/* directory generator on main):

* apps/multus/multus.yaml: Multus CNI v4.2.2 thick-plugin daemonset (verbatim
  upstream, project-annotated). Enables KubeVirt VMs to attach additional
  network interfaces. Required by ci1 to bridge onto PROD VLAN 57.
* apps/cdi/{cdi-operator.yaml,cdi-cr.yaml,README.md}: Containerized Data
  Importer v1.65.0 (verbatim upstream). Operator + CR pattern. Enables
  populating PVCs from HTTP/registry/upload sources, used to load the Windows
  Server 2025 ISO into the windows-server-2025-iso PVC.
* apps/kubevirt-vms/prod-vlan57-nad.yaml: NetworkAttachmentDefinition for
  PROD VLAN 57 bridge. **Deploy gated on Phase 1.5 host work**: requires
  br-prod bridge enslaving enp86s0.57 on each RKE2 node (Puppet config-as-code).
  ci1.yaml continues to use pod-network masquerade until that lands; switching
  to multus.networkName: kubevirt-vms/prod-vlan57 is a one-line YAML edit
  followed by a GitOps push.

Cluster verification (2026-05-08):
- KubeVirt LIVE (3 nodes, virt-api/controller/handler/operator all Running)
- Calico CNI on /etc/cni/net.d + /opt/cni/bin (Multus default paths)
- ApplicationSet `bluejay-infra` already watches `apps/*` on main

Reproducibility: upstream YAMLs vendored verbatim with project header diffs
only. Bumping versions = re-curl + git push. No deploy-time internet fetch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stages a draft VirtualMachine + Namespace + ISO PVC + rootdisk PVC + sysprep
ConfigMap for the dedicated GitHub Actions self-hosted runner that replaces
the never-registered bluejay-ws-sandbox-1 placeholder.

Status: STAGED ONLY. spec.running = false. ISO PVC empty. Two operator
decisions still pending before this can boot:
1. Network choice: pod-network fallback (in this draft) vs Multus +
   PROD VLAN NAD (preferred, requires Multus install).
2. ISO path: manual upload via helper pod (Path A) vs CDI HTTP import
   (Path B, requires CDI install).

Cluster baseline 2026-05-08:
- KubeVirt operator: installed, healthy, 14d
- CDI: NOT installed
- Multus: NOT installed
- Calico-only CNI

See docs/infrastructure/windows-server-build-runner-plan.md "Phase 1 readiness
gate" for the full operator pickup checklist.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>