fix(ci1): revert ISO to Filesystem PVC; CDI v1.65.0 block-upload pod blocked by capability drop

The Block-mode DataVolume migration (commit 0bf47df) hit a CDI v1.65.0 limitation:
the upload-target pod runs as uid 107 with `capabilities.drop: [ALL]`, so it
cannot open the underlying block device:

  blockdev: cannot open /dev/cdi-block-volume: Permission denied
  Saving stream failed: Unable to transfer source data to target file:
  error determining if block device exists: exit status 1

Revert to a Filesystem-mode PVC populated with `virtctl image-upload pvc`,
which did work (the 7.7 GiB ISO uploaded with valid ISO9660 magic intact).
The OVMF boot timeout is still unresolved; the header comment in the manifest
captures the open issue and three paths to revisit.
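
For reference, the ISO9660 magic check mentioned above can be sketched as
follows. This is a minimal illustration, not necessarily how the check was
done for this commit (the manifest comment cites the output of `file`). The
helper names `iso_magic` and `is_iso9660` are hypothetical; the offset comes
from the ISO9660 layout, where the Primary Volume Descriptor starts at
sector 16 and its "CD001" identifier follows a 1-byte type field.

```shell
#!/bin/sh
# Print the 5-byte standard identifier of the ISO9660 Primary Volume
# Descriptor. The descriptor starts at sector 16 (16 * 2048 = 32768 bytes);
# the identifier follows the 1-byte type field, so it sits at offset 32769.
iso_magic() {
    dd if="$1" bs=1 skip=32769 count=5 2>/dev/null
}

# Succeed only if the image carries the "CD001" identifier.
# (Hypothetical helper name, introduced for this sketch.)
is_iso9660() {
    [ "$(iso_magic "$1")" = "CD001" ]
}
```

On a Filesystem-mode PVC the uploaded image is stored as `/disk.img` on the
mounted volume, so from a pod that mounts the PVC the check would be
`is_iso9660 <mount-path>/disk.img && echo "ISO9660 magic present"`.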

The bootOrder swap (1c4145a) and runStrategy migration (87a7d7c) stay landed —
those are correct improvements regardless of the volume-mode question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex
2026-05-08 14:32:52 -05:00
parent 0bf47dfa33
commit 9f6dc1a9d5


@@ -49,57 +49,49 @@ metadata:
pod-security.kubernetes.io/enforce: privileged
---
# ISO DataVolume — CDI manages an underlying PVC of the same name and exposes
# the upload-target pod once it's ready.
# ISO PVC — populated via CDI virtctl image-upload (CDI is now installed).
#
# **Why DataVolume + Block volumeMode** (vs the original `kind: PersistentVolumeClaim`
# + virtctl image-upload pvc): a `volumeMode: Filesystem` PVC stores the upload
# as `/disk.img` on a mounted ext4. KubeVirt then exposes that file as a SATA
# CDROM via QEMU. On 2026-05-08 this caused the OVMF UEFI firmware to fail
# Boot0001 with "Time out" reading the SATA CDROM, even with the install ISO
# at bootOrder:1 — see docs/infrastructure/feedback notes below. The ISO
# content WAS valid (`file` reported "ISO 9660 CD-ROM filesystem data ...
# (bootable)"), but the QEMU SATA emulation over a Filesystem-PVC backing was
# too slow / mis-attached for OVMF's CDROM read window.
# **Volume mode (2026-05-08 status):** Filesystem-mode PVC. A migration to
# `volumeMode: Block` via DataVolume was attempted to address an OVMF SATA
# CDROM read timeout, but CDI v1.65.0's upload-target pod runs as uid 107
# with `capabilities.drop: [ALL]` and cannot open the underlying block
# device (`blockdev: cannot open /dev/cdi-block-volume: Permission denied`).
# Reverted to Filesystem PVC pending one of:
# - CDI deployment override granting CAP_SYS_RAWIO to upload pod
# - Pre-populated PVC via privileged init pod that dd's the ISO directly
# - Migration to a different storage class that exposes block devices
# differently (e.g. iSCSI, where Longhorn's CSI mount path may behave
# differently)
#
# `volumeMode: Block` gives us a raw block device directly — KubeVirt attaches
# it to the VM as `/dev/sdX` style storage, OVMF reads ISO9660 sectors directly
# from the underlying block volume, no QEMU virtual file emulation needed.
# This is the recommended pattern for ISO install media on KubeVirt + Longhorn.
#
# Population workflow:
# 1. After this DataVolume is applied, CDI creates the PVC and an
# upload-target pod. Wait for `phase: UploadReady`.
# 2. From BLUEJAY-WS:
# kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml port-forward \
# -n cdi service/cdi-uploadproxy 8443:443 &
# virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml image-upload dv \
# Population workflow (this PVC, Filesystem mode):
# 1. virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml image-upload pvc \
# windows-server-2025-iso -n kubevirt-vms \
# --image-path "$env:USERPROFILE\Downloads\en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso" \
# --uploadproxy-url https://localhost:8443 --insecure --no-create
# (`--no-create` — the DV/PVC already exist, virtctl just streams bytes.)
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
# --size 10Gi --storage-class longhorn --access-mode ReadWriteOnce \
# --uploadproxy-url https://localhost:8443 --insecure
# (--uploadproxy-url uses port-forward in practice: `kubectl port-forward
# -n cdi service/cdi-uploadproxy 8443:443 &` first.)
#
# **Open boot issue:** even with the ISO at bootOrder:1, OVMF console showed:
# BdsDxe: starting Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ... Sata(...)
# BdsDxe: failed to start Boot0001 ... Time out
# Diagnosis confirmed PVC content IS a valid bootable ISO9660 image — the
# timeout is in OVMF reading from the SATA-CDROM-backed-by-filesystem-PVC.
# Block mode would likely fix it; see CDI permission issue above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: windows-server-2025-iso
namespace: kubevirt-vms
labels:
app: ci-runner
flowercore.io/managed-by: bluejay-infra
annotations:
# Tell CDI not to "convert" — keep raw bytes so the underlying block device
# IS the ISO9660 sectors verbatim, not a QCOW2 wrap.
cdi.kubevirt.io/storage.contentType: kubevirt
spec:
source:
upload: {}
pvc:
accessModes:
- ReadWriteOnce # Bump to ReadOnlyMany after population for multi-VM use
resources:
requests:
storage: 10Gi # Server 2025 ISO is 7.7GB; 10Gi for headroom
volumeMode: Block # CRITICAL — see header comment above
storageClassName: longhorn
---
@@ -410,11 +402,8 @@ spec:
persistentVolumeClaim:
claimName: ci1-rootdisk
- name: windows-iso
# Reference the DataVolume (defined above) — CDI creates the PVC of
# the same name with volumeMode: Block. The VMI controller blocks
# VM start until DV phase is Succeeded (i.e. upload completed).
dataVolume:
name: windows-server-2025-iso
persistentVolumeClaim:
claimName: windows-server-2025-iso
- name: virtio-drivers
containerDisk:
# Pinned to v1.8.2 (latest stable as of 2026-05-08).