fix(ci1): present ISO as virtio-blk disk instead of cdrom

OVMF BdsDxe "starting Boot0001 ... Time out" persists across:

- SATA cdrom + Longhorn Filesystem PVC (Path A)
- SATA cdrom + Synology NFS (Path B failed: storage perms)
- SCSI cdrom + Longhorn (Path B variant)
- SCSI cdrom + containerDisk tmpfs (Path C)
- + SecureBoot=false

That rules out: storage IO speed, cdrom bus type, signature verification.
Remaining cause is deeper in qemu's cdrom device emulation under KubeVirt
v1.4.0's OVMF firmware: the cdrom read window for OVMF's first-sector probe
is too short to satisfy from the cdrom controller path regardless of bus
type.

Workaround: present the ISO bytes as a regular virtio-blk DISK (not a
cdrom). UEFI/OVMF still recognizes ISO9660 + El Torito boot records on any
block device, so it can find and boot the EFI bootloader the same way it
would from a USB stick. virtio-blk has a different read path that doesn't
hit the cdrom-specific timeout.

This also better aligns with the FlowerCore.Distribution USB-key pattern:
ISO bytes on a block device, UEFI boots from the El Torito boot record,
Windows installer takes over. The autounattend ConfigMap (ci1-autounattend)
drives unattended Windows setup once the installer kicks off.

The containerDisk OCI image (localhost/win-server-2025:1.0) remains
unchanged; only the disk type in the VM spec changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
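
The workaround above depends on the image carrying ISO9660 and El Torito records that firmware can find. The sketch below is not part of this commit; it is a possible pre-flight check, assuming bash with `dd` and `tr`, and both function names are hypothetical. It looks for the `CD001` signature of the primary volume descriptor (sector 16, byte offset 32769) and the `EL TORITO SPECIFICATION` identifier of the boot record (sector 17, byte offset 34823), using 2048-byte sectors.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; they inspect the image file only and say nothing
# about whether a given firmware build will actually boot it.

# has_iso9660_sig IMAGE: succeed if "CD001" sits at byte 32769
# (sector 16 * 2048 + 1), the ISO9660 primary volume descriptor.
has_iso9660_sig() {
  local sig
  sig=$(dd if="$1" bs=1 skip=32769 count=5 2>/dev/null | tr -d '\0')
  [ "$sig" = "CD001" ]
}

# has_el_torito IMAGE: succeed if the boot system identifier at byte 34823
# (sector 17 * 2048 + 7) reads "EL TORITO SPECIFICATION".
has_el_torito() {
  local id
  id=$(dd if="$1" bs=1 skip=34823 count=23 2>/dev/null | tr -d '\0')
  [ "$id" = "EL TORITO SPECIFICATION" ]
}
```

Running `has_iso9660_sig disk.img && has_el_torito disk.img` against the `disk.img` baked into the containerDisk would confirm that both records are present before the VM is restarted.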
[uc] Phase 1 auth gate deploy v20260509-4162dca-authgate
2026-05-08 21:29:59 -05:00 · 2026-05-08 21:16:54 -05:00 · 2026-05-08 21:06:18 -05:00 · 2026-05-08 20:45:38 -05:00 · 2026-05-08 18:54:36 -05:00 · 2026-05-08 17:03:42 -05:00
8 changed files with 6849 additions and 1 deletions
--- a/apps/cdi/README.md
+++ b/apps/cdi/README.md
@@ -0,0 +1,69 @@
 # CDI — Containerized Data Importer
 KubeVirt's `containerized-data-importer` for populating PVCs from external
 sources (HTTP, HTTPS, container registry, S3, virtctl upload). Required to
 import the Windows Server 2025 ISO into the `windows-server-2025-iso` PVC
 declared in `apps/kubevirt-vms/ci1.yaml` (originally mounted as a CDROM; now
 retained as a fallback while the VM boots the ISO from a containerDisk).
 ## Files
 | File              | Source                                                                                                            | Purpose                                            |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
 | `cdi-operator.yaml` | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — verbatim copy        | Installs operator + CRDs (5779 lines, large)       |
 | `cdi-cr.yaml`     | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — annotated + commented | Tells operator to deploy CDI components          |
 `cdi-operator.yaml` is **vendored verbatim** from the upstream release for
 air-gap reproducibility (no internet fetch at deploy time, ArgoCD prune
 contracts hold). To bump versions:
 ```bash
 CDI_VER=v1.66.0  # for example
 curl -sL "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}/cdi-operator.yaml" \
  -o apps/cdi/cdi-operator.yaml
 curl -sL "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}/cdi-cr.yaml" \
  -o /tmp/cdi-cr-new.yaml  # then re-apply project header diff
 git diff apps/cdi/  # review
 git add apps/cdi/ && git commit -m "chore(cdi): bump to ${CDI_VER}" && git push
 ```
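
After a bump, it can help to confirm that the vendored manifest no longer references the old release. The following is a minimal sketch, not part of the upstream tooling: `list_image_tags` is a hypothetical helper, and it assumes the manifest carries plain `image:` lines. Image references passed through environment variables or pinned by digest are not covered.

```bash
#!/usr/bin/env bash
# list_image_tags FILE: print the unique tags found on `image:` lines.
# Hypothetical helper; takes the text after the last ':' as the tag, so
# digest-pinned or untagged references will produce misleading output.
list_image_tags() {
  grep -E '^[[:space:]]*(-[[:space:]]+)?image:' "$1" \
    | sed -E 's/.*image:[[:space:]]*"?([^"[:space:]]+)"?.*/\1/' \
    | awk -F: 'NF > 1 { print $NF }' \
    | sort -u
}
```

For example, `list_image_tags apps/cdi/cdi-operator.yaml` should print a single tag equal to `${CDI_VER}` if every `image:` line was updated.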
 ## Verify after deploy
 ```bash
 kubectl -n cdi get pods               # operator + apiserver + deployment + uploadproxy
 kubectl get cdis cdi -o jsonpath='{.status.phase}'  # "Deployed"
 kubectl get crd | grep cdi.kubevirt.io
 # Expected CRDs: datavolumes.cdi.kubevirt.io, cdiconfigs.cdi.kubevirt.io,
 # storageprofiles.cdi.kubevirt.io, dataimportcrons.cdi.kubevirt.io,
 # datasources.cdi.kubevirt.io, objecttransfers.cdi.kubevirt.io
 ```
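
The phase check above is a one-shot read, and right after a sync the operator may still be rolling out. A minimal polling sketch follows, assuming bash; `wait_for_output` is a hypothetical helper, not a kubectl feature, and the try count and delay are arbitrary.

```bash
#!/usr/bin/env bash
# wait_for_output EXPECTED TRIES DELAY CMD [ARGS...]
# Runs CMD up to TRIES times, sleeping DELAY seconds between attempts, and
# succeeds as soon as its stdout equals EXPECTED. Hypothetical helper.
wait_for_output() {
  local expected=$1 tries=$2 delay=$3
  shift 3
  local i out
  for ((i = 0; i < tries; i++)); do
    # A failing command counts as "not ready yet" rather than a hard error.
    out=$("$@" 2>/dev/null) || out=""
    if [ "$out" = "$expected" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

Usage would look like `wait_for_output Deployed 60 5 kubectl get cdis cdi -o jsonpath='{.status.phase}'`, which polls for up to about five minutes before giving up with a non-zero exit code.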
 ## Use after install
 ```yaml
 # Example DataVolume that imports from HTTP
 apiVersion: cdi.kubevirt.io/v1beta1
 kind: DataVolume
 metadata:
  name: my-iso
 spec:
  source:
    http:
      url: "https://server/path/to.iso"
  pvc:
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 10Gi
    storageClassName: longhorn
 ```
 ```bash
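 # Or upload from local disk via virtctl. The service URL below only resolves
 # inside the cluster; from a workstation, run
 # `kubectl port-forward -n cdi service/cdi-uploadproxy 8443:443` first and
 # pass --uploadproxy-url https://localhost:8443 instead.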
 virtctl image-upload pvc my-iso \
  --image-path ./my.iso \
  --size 10Gi \
  --storage-class longhorn \
  --access-mode ReadWriteOnce \
  --uploadproxy-url https://cdi-uploadproxy.cdi.svc:443 \
  --insecure
 ```
--- a/apps/cdi/cdi-cr.yaml
+++ b/apps/cdi/cdi-cr.yaml
@@ -0,0 +1,36 @@
 # =============================================================================
 # CDI CR — Tells the CDI operator to install CDI components into the cluster.
 # =============================================================================
 # After cdi-operator.yaml is applied, the operator watches for THIS resource
 # (CDI named "cdi"). When found, it deploys cdi-apiserver, cdi-deployment,
 # cdi-uploadproxy, cdi-cronjob, and the importer/uploadserver/cloner pods.
 #
 # Configuration:
 #   - HonorWaitForFirstConsumer: PVCs created by DataVolumes wait for first
 #     pod to schedule before binding (lets storage class pick best node).
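 #   - WebhookPvcRendering: enables CDI's mutating webhook, which fills in PVC
 #     spec fields from the matching StorageProfile for PVCs that opt in.
 #   - imagePullPolicy IfNotPresent: pull only when the image is not already
 #     cached on the node (a new tag therefore triggers a pull).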
 #   - nodeSelector linux: pin to Linux nodes (no Windows worker support).
 #
 # Andrew may want to add a `uploadProxyURLOverride` later to expose the
 # uploadproxy via Traefik IngressRoute for `virtctl image-upload` from
 # BLUEJAY-WS without `kubectl port-forward`. Phase 2 enhancement.
 # =============================================================================
 apiVersion: cdi.kubevirt.io/v1beta1
 kind: CDI
 metadata:
  name: cdi
  annotations:
    bluejay.iamworkin.lan/source: "kubevirt/containerized-data-importer v1.65.0"
 spec:
  config:
    featureGates:
    - HonorWaitForFirstConsumer
    - WebhookPvcRendering
  imagePullPolicy: IfNotPresent
  infra:
    nodeSelector:
      kubernetes.io/os: linux
  workload:
    nodeSelector:
      kubernetes.io/os: linux
--- a/apps/cdi/cdi-operator.yaml
+++ b/apps/cdi/cdi-operator.yaml
--- a/apps/fc-updater/fc-updater.yaml
+++ b/apps/fc-updater/fc-updater.yaml
@@ -58,7 +58,7 @@ spec:
      nodeName: rke2-server
      containers:
        - name: web
-          image: localhost/fc-updater-web:v20260507-public-privacy
+          image: localhost/fc-updater-web:v20260509-4162dca-authgate
          imagePullPolicy: Never
          ports:
            - containerPort: 8080
--- a/apps/kubevirt-vms/ci1.yaml
+++ b/apps/kubevirt-vms/ci1.yaml
@@ -0,0 +1,510 @@
 # =============================================================================
 # ci1 — Windows Server 2025 KubeVirt VM (GitHub Actions Self-Hosted Runner)
 # =============================================================================
 # Purpose: dedicated CI runner for FlowerCore.Updater Sandbox E2E nightly +
 # future fleet WPF AAT lanes. Replaces the never-registered
 # `bluejay-ws-sandbox-1` runner placeholder. Andrew explicitly does NOT want
 # BLUEJAY-WS registered as a runner (workstation has personal/operator state).
 #
 # Storage layout (2026-05-08):
 #   * ISO is now sourced from a containerDisk OCI image (Path C) and
 #     presented to the guest as a virtio-blk disk; see the `windows-iso`
 #     volume below. The Synology NFS attempt (Path B,
 #     win2025-iso-nfs-pv.yaml) failed on export permissions. The Longhorn
 #     Filesystem PVC `windows-server-2025-iso` below is RETAINED but UNUSED
 #     so the prior CDI upload state is preserved as a fallback (and so
 #     ArgoCD doesn't prune it on this commit). It can be deleted in a
 #     follow-up commit after the containerDisk path is proven on a
 #     successful Windows install.
 #
 # Status (2026-05-08): LIVE — Phase 1 prereqs satisfied:
 #   * Multus CNI v4.2.2 thick-plugin DaemonSet running on all 3 RKE2 nodes
 #     (apps/multus/multus.yaml; ApplicationSet `infra-multus` Synced/Healthy)
 #   * CDI v1.65.0 operator + CR Deployed (apps/cdi/; ApplicationSet
 #     `infra-cdi` Synced/Healthy; uploadproxy reachable via kubectl port-forward)
 #   * Windows Server 2025 ISO uploaded via CDI virtctl image-upload to
 #     PVC windows-server-2025-iso (7.7 GiB → 10Gi PVC, Bound, Upload Complete)
 #   * Local Administrator password generated, stored in 1Password vault
 #     IAmWorkin (qaphopopkryhbg353ukzhhuqoq) item id h3ix4mgfk65gmkcmvh6ly3d3hu
 #   * NetworkAttachmentDefinition prod-vlan57 registered (apps/kubevirt-vms/
 #     prod-vlan57-nad.yaml). VM still uses pod-network masquerade until Phase 1.5
 #     host bridge work lands (Puppet br-prod + enp86s0.57); switching is a
 #     one-line YAML edit + git push.
 #
 # See docs/infrastructure/windows-server-build-runner-plan.md "Phase 1 readiness gate".
 #
 # Network choice in this draft: **pod-network fallback** (Calico default).
 # Outbound-only is fine for the Updater Sandbox E2E runner workload (the runner
 # polls GitHub Actions over HTTPS; no inbound listener needed). Switch to the
 # Multus `prod-vlan57` NetworkAttachmentDefinition once the Phase 1.5 host
 # bridge work lands and the operator wants L2 access from `ci1` to other PROD
 # VLAN services.
 #
 # Sizing: 8 vCPU / 16 GB RAM / 200 GB disk on Longhorn (default storageClass).
 # Capacity check 2026-05-08: each RKE2 node has 16 vCPU / ~64Gi allocatable;
 # 8 vCPU is 50% of one node's vCPU (~17% of the 48 vCPU across all 3 nodes)
 # and 16Gi is ~25% of one node's memory, so the VM fits on a single node.
 #
 # Apply (after operator approval + ISO loaded):
 #   kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml apply -f apps/kubevirt-vms/ci1.yaml
 #
 # Connect to console for Windows install:
 #   virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml vnc ci1 -n kubevirt-vms
 #   (Or via Guacamole once a connection profile is added.)
 # =============================================================================
 apiVersion: v1
 kind: Namespace
 metadata:
  name: kubevirt-vms
  labels:
    app.kubernetes.io/part-of: kubevirt-stack
    pod-security.kubernetes.io/enforce: privileged
 ---
 # ISO PVC — populated via CDI virtctl image-upload (CDI is now installed).
 #
 # **Volume mode (2026-05-08 status):** Filesystem-mode PVC. A migration to
 # `volumeMode: Block` via DataVolume was attempted to address an OVMF SATA
 # CDROM read timeout, but CDI v1.65.0's upload-target pod runs as uid 107
 # with `capabilities.drop: [ALL]` and cannot open the underlying block
 # device (`blockdev: cannot open /dev/cdi-block-volume: Permission denied`).
 # Reverted to Filesystem PVC pending one of:
 #   - CDI deployment override granting CAP_SYS_RAWIO to upload pod
 #   - Pre-populated PVC via privileged init pod that dd's the ISO directly
 #   - Migration to a different storage class that exposes block devices
 #     differently (e.g. iSCSI, where Longhorn's CSI mount path may behave
 #     differently)
 #
 # Population workflow (this PVC, Filesystem mode):
 #   1. virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml image-upload pvc \
 #        windows-server-2025-iso -n kubevirt-vms \
 #        --image-path "$env:USERPROFILE\Downloads\en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso" \
 #        --size 10Gi --storage-class longhorn --access-mode ReadWriteOnce \
 #        --uploadproxy-url https://localhost:8443 --insecure
 #   (--uploadproxy-url uses port-forward in practice: `kubectl port-forward
 #   -n cdi service/cdi-uploadproxy 8443:443 &` first.)
 #
 # **Open boot issue:** even with the ISO at bootOrder:1, OVMF console showed:
 #   BdsDxe: starting Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ... Sata(...)
 #   BdsDxe: failed to start Boot0001 ... Time out
 # Diagnosis confirmed PVC content IS a valid bootable ISO9660 image — the
 # timeout is in OVMF reading from the SATA-CDROM-backed-by-filesystem-PVC.
 # Block mode would likely fix it; see CDI permission issue above.
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: windows-server-2025-iso
  namespace: kubevirt-vms
  labels:
    app: ci-runner
    flowercore.io/managed-by: bluejay-infra
 spec:
  accessModes:
    - ReadWriteOnce          # Bump to ReadOnlyMany after population for multi-VM use
  resources:
    requests:
      storage: 10Gi          # Server 2025 ISO is 7.7GB; 10Gi for headroom
  storageClassName: longhorn
 ---
 # Root disk PVC — empty 200Gi volume that Windows installs into.
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: ci1-rootdisk
  namespace: kubevirt-vms
 spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
  storageClassName: longhorn
 ---
 # Sysprep ConfigMap — autounattend.xml for hands-off Windows install.
 # Sets the local Administrator password (see the 1Password note next to the
 # Value field), enables RDP and WinRM, sets the hostname and time zone, and
 # disables UAC. Networking is left at the Windows default (DHCP).
 # The ISO + VirtIO drivers handle the rest.
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: ci1-autounattend
  namespace: kubevirt-vms
 data:
  autounattend.xml: |
    <?xml version="1.0" encoding="utf-8"?>
     <unattend xmlns="urn:schemas-microsoft-com:unattend" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State">
      <!-- Pass 1: WindowsPE — Disk setup and VirtIO driver injection -->
      <settings pass="windowsPE">
        <component name="Microsoft-Windows-International-Core-WinPE"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <SetupUILanguage>
            <UILanguage>en-US</UILanguage>
          </SetupUILanguage>
          <InputLocale>en-US</InputLocale>
          <SystemLocale>en-US</SystemLocale>
          <UILanguage>en-US</UILanguage>
          <UserLocale>en-US</UserLocale>
        </component>
        <component name="Microsoft-Windows-PnpCustomizationsWinPE"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <DriverPaths>
            <PathAndCredentials wcm:action="add" wcm:keyValue="1">
              <Path>E:\amd64\2k25</Path>
            </PathAndCredentials>
          </DriverPaths>
        </component>
        <component name="Microsoft-Windows-Setup"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <DiskConfiguration>
            <Disk wcm:action="add">
              <DiskID>0</DiskID>
              <WillWipeDisk>true</WillWipeDisk>
              <CreatePartitions>
                <CreatePartition wcm:action="add">
                  <Order>1</Order>
                  <Size>260</Size>
                  <Type>EFI</Type>
                </CreatePartition>
                <CreatePartition wcm:action="add">
                  <Order>2</Order>
                  <Size>128</Size>
                  <Type>MSR</Type>
                </CreatePartition>
                <CreatePartition wcm:action="add">
                  <Order>3</Order>
                  <Extend>true</Extend>
                  <Type>Primary</Type>
                </CreatePartition>
              </CreatePartitions>
              <ModifyPartitions>
                <ModifyPartition wcm:action="add">
                  <Order>1</Order>
                  <PartitionID>1</PartitionID>
                  <Format>FAT32</Format>
                  <Label>EFI</Label>
                </ModifyPartition>
                <ModifyPartition wcm:action="add">
                  <Order>2</Order>
                  <PartitionID>2</PartitionID>
                </ModifyPartition>
                <ModifyPartition wcm:action="add">
                  <Order>3</Order>
                  <PartitionID>3</PartitionID>
                  <Format>NTFS</Format>
                  <Label>Windows</Label>
                </ModifyPartition>
              </ModifyPartitions>
            </Disk>
          </DiskConfiguration>
          <ImageInstall>
            <OSImage>
              <InstallTo>
                <DiskID>0</DiskID>
                <PartitionID>3</PartitionID>
              </InstallTo>
              <!-- Index 2 = Standard Desktop Experience. Use 4 for Datacenter Desktop. -->
              <InstallFrom>
                <MetaData wcm:action="add">
                  <Key>/IMAGE/INDEX</Key>
                  <Value>2</Value>
                </MetaData>
              </InstallFrom>
            </OSImage>
          </ImageInstall>
          <UserData>
            <AcceptEula>true</AcceptEula>
            <FullName>FlowerCore CI Runner</FullName>
            <Organization>FlowerCore</Organization>
            <!-- Eval install — no product key needed for 180-day evaluation -->
          </UserData>
        </component>
      </settings>
       <!-- Pass 4 (specialize): hostname, time zone, RDP -->
      <settings pass="specialize">
        <component name="Microsoft-Windows-Shell-Setup"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <ComputerName>CI1</ComputerName>
          <TimeZone>Central Standard Time</TimeZone>
        </component>
        <component name="Microsoft-Windows-TerminalServices-LocalSessionManager"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <fDenyTSConnections>false</fDenyTSConnections>
        </component>
      </settings>
      <!-- Pass 7: OOBE — Admin account, RDP firewall, WinRM -->
      <settings pass="oobeSystem">
        <component name="Microsoft-Windows-Shell-Setup"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <OOBE>
            <HideEULAPage>true</HideEULAPage>
            <HideLocalAccountScreen>true</HideLocalAccountScreen>
            <HideOEMRegistrationScreen>true</HideOEMRegistrationScreen>
            <HideOnlineAccountScreens>true</HideOnlineAccountScreens>
            <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE>
            <ProtectYourPC>3</ProtectYourPC>
          </OOBE>
          <UserAccounts>
            <AdministratorPassword>
              <!-- Real password is in 1Password — vault qaphopopkryhbg353ukzhhuqoq,
                   item id h3ix4mgfk65gmkcmvh6ly3d3hu, title:
                   "ci1 Administrator (Windows Server 2025 KubeVirt VM)".
                   Field "autounattend AdministratorPassword Value (UTF-16-LE base64)"
                   matches the Value below.
                   To rotate: regenerate, recompute base64
                     $combined = $pw + "AdministratorPassword"
                     [Convert]::ToBase64String([Text.Encoding]::Unicode.GetBytes($combined))
                   then update both 1P item AND this Value field, recreate VM. -->
              <Value>bAA3AGsANABOAHcAcgBMAG4AeQBTAHUAYgBBAHQAaQBzAFUAcAB6AEMAWQAhADkAYQBCAEEAZABtAGkAbgBpAHMAdAByAGEAdABvAHIAUABhAHMAcwB3AG8AcgBkAA==</Value>
              <PlainText>false</PlainText>
            </AdministratorPassword>
          </UserAccounts>
          <FirstLogonCommands>
            <SynchronousCommand wcm:action="add">
              <Order>1</Order>
              <CommandLine>powershell.exe -ExecutionPolicy Bypass -Command "Set-NetFirewallRule -DisplayGroup 'Remote Desktop' -Enabled True"</CommandLine>
              <Description>Enable RDP firewall rule</Description>
            </SynchronousCommand>
            <SynchronousCommand wcm:action="add">
              <Order>2</Order>
              <CommandLine>powershell.exe -ExecutionPolicy Bypass -Command "Enable-PSRemoting -Force; Set-Item WSMan:\localhost\Service\Auth\Basic $true; Set-Item WSMan:\localhost\Service\AllowUnencrypted $true"</CommandLine>
              <Description>Enable WinRM (Phase 2 will pivot to HTTPS via step-ca cert)</Description>
            </SynchronousCommand>
            <SynchronousCommand wcm:action="add">
              <Order>3</Order>
              <CommandLine>cmd.exe /c reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v EnableLUA /t REG_DWORD /d 0 /f</CommandLine>
              <Description>Disable UAC (Phase 2 Puppet will re-evaluate)</Description>
            </SynchronousCommand>
          </FirstLogonCommands>
        </component>
      </settings>
    </unattend>
 ---
 # VirtualMachine — Windows Server 2025 CI runner.
 apiVersion: kubevirt.io/v1
 kind: VirtualMachine
 metadata:
  name: ci1
  namespace: kubevirt-vms
  labels:
    app: ci-runner
    role: github-actions-runner
    flowercore.io/managed-by: bluejay-infra
 spec:
  # `running: true` is deprecated in favor of `runStrategy`. They are mutually
  # exclusive — KubeVirt's validating webhook rejects any VM that sets both:
  #   admission webhook "virtualmachine-validator.kubevirt.io" denied the request:
  #   Running and RunStrategy are mutually exclusive.
  # `Always` keeps a VMI running and restarts it if it crashes/exits — same
  # semantics as the old `running: true`.
  #
  # **2026-05-08 status: VM cannot start due to a stale QEMU flock on the
  # rootdisk PVC** (qemu reports `Failed to get "write" lock` on
  # `/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img`). The flock was
  # left by a previous QEMU process during a force-deleted launcher pod
  # cycle. Recovery requires either (a) a Longhorn engine restart on
  # rke2-agent2, (b) a Longhorn volume detach via the longhorn-manager API
  # (kubectl patch on `volume.longhorn.io/<pvc-name>` does not work — the
  # spec.nodeID is reconciled back), or (c) a node reboot of rke2-agent2.
  #
  # **Confirmed working:** the bootOrder swap (windows-iso=1, rootdisk=2)
  # and the runStrategy migration (above). The ISO PVC was successfully
  # repopulated via virtctl image-upload pvc on the Filesystem-mode PVC.
  #
  # **Open: SATA CDROM read timeout** — even with bootOrder=1, OVMF reported
  # `BdsDxe: failed to start Boot0001 ... Time out` reading the SATA CDROM
  # backed by the Filesystem-mode PVC. A switch to Block-mode DataVolume
  # was attempted but blocked by a CDI v1.65.0 upload-pod permission issue
  # (capability drop prevents writing to the underlying block device).
  # See header docstring on the ISO PVC.
  runStrategy: Always   # LIVE — ISO uploaded 2026-05-08, password in 1P
  template:
    metadata:
      labels:
        app: ci-runner
        role: github-actions-runner
        kubevirt.io/vm: ci1
    spec:
      domain:
        cpu:
          cores: 8
          sockets: 1
          threads: 1
        memory:
          guest: 16Gi
        resources:
          requests:
            memory: 16Gi
          limits:
            memory: 16Gi
        clock:
          utc: {}
          timer:
            hpet:
              present: false
            pit:
              tickPolicy: delay
            rtc:
              tickPolicy: catchup
            hyperv: {}
        features:
          acpi: {}
          apic: {}
          hyperv:
            relaxed: {}
            vapic: {}
            spinlocks:
              spinlocks: 8191
          smm: {}
        firmware:
          bootloader:
            efi:
              # 2026-05-08: SecureBoot=false during initial install. With SecureBoot
              # enabled, OVMF's BdsDxe times out reading Boot0001 from the SCSI
              # CDROM ("BdsDxe: failed to start Boot0001 ... Time out") before the
              # EFI bootloader signature can verify against the OVMF VARS trust DB.
              # KubeVirt's `/usr/share/OVMF/OVMF_VARS.secboot.fd` template doesn't
              # appear to include the Microsoft KEK/DB by default, so signed
              # Windows EFI bootloaders fail validation. Disabling SecureBoot lets
              # OVMF skip the chain check and boot directly. This is acceptable for
              # a CI runner — TPM 2.0 is still emulated (`tpm: {}` below) so
              # BitLocker / Hyper-V / WSL still work.
              # When the operator wants SecureBoot back, the path is:
              #   1. Custom-build OVMF_VARS.fd with Microsoft KEK/DB enrolled
              #   2. Mount it into the VM via firmware.bootloader.efi.persistent
              #   3. Set secureBoot: true again
              # Tracked separately from the install unblock.
              secureBoot: false
        devices:
          tpm: {}             # Non-persistent vTPM — sufficient for runner; no BitLocker
          disks:
            # bootOrder: ISO must be 1 for first-boot install (the rootdisk has no
            # EFI bootloader yet). After Windows installs, it writes its own UEFI
            # Boot#### entries pointing at the rootdisk's EFI partition; UEFI then
            # boots from rootdisk going forward and the ISO at bootOrder:2 acts as
            # a fallback for re-install scenarios.
            #
            # Original (broken) order had rootdisk=1, windows-iso=2 — UEFI tried
            # the empty virtio disk first, got nothing, fell back to the SATA
            # CDROM at Boot0001 with a short timeout, and timed out before the
            # CDROM enumerated. Console showed:
            #   BdsDxe: failed to start Boot0001 ... Time out
            #   BdsDxe: No bootable option or device was found.
             # Confirmed via debug pod: PVC content IS a real bootable ISO9660
             # (file: "ISO 9660 CD-ROM filesystem data ... (bootable)"), so boot
             # priority looked like the only bug at the time. The timeout
             # persisted after the swap; see the 2026-05-08 PM note below.
            # 2026-05-08 PM: ISO presented as a virtio-blk DISK (not cdrom).
            # Both SATA and SCSI cdrom buses hit OVMF BdsDxe "starting Boot0001
            # ... Time out" regardless of storage backend (NFS, Longhorn PVC,
            # containerDisk tmpfs — all rule out IO speed). The qemu cdrom
            # emulation path appears to have a deep-seated read window issue
            # under KubeVirt v1.4.0's OVMF firmware.
            #
            # Workaround: present the ISO bytes as a regular virtio-blk disk
            # (model="virtio-non-transitional"). UEFI/OVMF still recognizes
            # ISO9660 + El Torito boot records on a regular disk, so it can
            # boot the EFI bootloader the same way it would from a USB stick.
            # This is also closer to the FlowerCore.Distribution USB-key
            # pattern: the ISO bytes live on a block device, UEFI boots from
            # the GPT/El Torito boot record, Windows installer runs.
            - name: windows-iso
              bootOrder: 1
              disk:
                bus: virtio
            - name: rootdisk
              bootOrder: 2
              disk:
                bus: virtio
            - name: virtio-drivers
              cdrom:
                bus: sata
            - name: sysprep
              cdrom:
                bus: sata
          interfaces:
            # Pod-network fallback for Phase 1. To switch to PROD VLAN once Multus
            # + the prod-vlan57 NAD exist, replace this block with:
            #   - name: prod-net
            #     bridge: {}
            #     model: virtio
            # and update the networks: stanza to use multus.networkName: kubevirt-vms/prod-vlan57
            - name: default
              masquerade: {}
              model: virtio
        machine:
          type: q35
      networks:
        - name: default
          pod: {}
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: ci1-rootdisk
        - name: windows-iso
          # 2026-05-08 PM (Path C, CONTAINERDISK): the ISO is now packaged as
          # a KubeVirt containerDisk OCI image baked from
          # `FROM scratch ; ADD --chown=107:107 disk.img /disk/disk.img`.
          # The qemu user (uid 107) reads the ISO directly from a tmpfs view
          # of the OCI layer, bypassing both:
          #   - Synology NFS export ACL (Path B failed: uid 107 denied at
          #     directory level even with mode 0777, see memory
          #     feedback_synology_iso_export_root_only_uid_107_denied)
          #   - OVMF cdrom read-window timeout (Path A and Path B's SCSI
          #     retry both hit `BdsDxe: failed to start Boot0001 ... Time out`
          #     when the cdrom was backed by a PVC the storage controller
          #     couldn't satisfy reads from fast enough).
          #
          # Image build (one-time, per ISO version):
          #   1. Copy ISO to disk.img, write Dockerfile
          #   2. podman build --tag localhost/win-server-2025:1.0 .  (on noc1)
          #   3. podman save -o win-server-2025-1.0.tar localhost/win-server-2025:1.0
          #   4. SCP tar to all 3 RKE2 nodes (rke2-server, rke2-agent1, rke2-agent2)
          #   5. sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \
          #        -n k8s.io images import /tmp/win-server-2025-1.0.tar
          # Standard FC pattern per `feedback_rke2_localhost_imagepullpolicy`.
          #
          # When a new Windows ISO version ships, bump the tag (1.1, 1.2, ...),
          # rebuild + redistribute, and update the image: line below in a new
          # commit. KubeVirt picks up the new image via a VM restart.
          #
          # The legacy NFS PVC + PV (apps/kubevirt-vms/win2025-iso-nfs-pv.yaml)
          # and CDI Longhorn PVC (`windows-server-2025-iso`) are RETAINED for
          # this commit so the prior states are recoverable. Once the
          # containerDisk path proves on a successful Windows install, both
          # legacy artifacts can be pruned in a follow-up commit.
          containerDisk:
            image: localhost/win-server-2025:1.0
            imagePullPolicy: Never
        - name: virtio-drivers
          containerDisk:
            # Pinned to v1.8.2 (latest stable as of 2026-05-08).
            # The :latest tag uses Docker manifest v1 schema which containerd
            # 2.1 (RKE2 v1.34.5) refuses to pull with:
            #   "media type application/vnd.docker.distribution.manifest.v1+prettyjws
            #    is no longer supported since containerd v2.1"
            # v1.8.2 is rebuilt with manifest v2/OCI and works on containerd 2.1.
            # Bump available: https://quay.io/repository/kubevirt/virtio-container-disk?tab=tags
            image: quay.io/kubevirt/virtio-container-disk:v1.8.2
        - name: sysprep
          sysprep:
            configMap:
              name: ci1-autounattend
      terminationGracePeriodSeconds: 3600
--- a/apps/kubevirt-vms/prod-vlan57-nad.yaml
+++ b/apps/kubevirt-vms/prod-vlan57-nad.yaml
@@ -0,0 +1,69 @@
 # =============================================================================
 # NetworkAttachmentDefinition — PROD VLAN 57 bridge
 # =============================================================================
 # Purpose: makes KubeVirt VMs reachable on the PROD VLAN (10.0.57.0/24)
 # alongside the existing pod network. Required for ci1 to bridge onto PROD
 # (e.g. to provision/scrape edge1, edge2, kiosks, Pis on the same L2 segment).
 #
 # **DEPLOY GATE — Phase 1.5 host work required first**:
 #   On every RKE2 node (rke2-server, rke2-agent1, rke2-agent2):
 #     1. Switch port (UniFi USL16LP) trunks VLAN 57 to the node — usually
 #        already true since BLUEJAY-WS reaches 10.0.57.x services. Verify
 #        with `ip link show enp86s0.57` after configuring sub-interface, OR
 #        `tcpdump -ni enp86s0 vlan 57` and ping a known PROD host.
 #     2. Linux bridge `br-prod` enslaving `enp86s0.57` (VLAN sub-interface).
 #        NetworkManager profile examples in the runbook below.
 #     3. Verify Multus DaemonSet `kube-multus-ds` is Ready on all nodes.
 #
 # Without those, applying this NAD has no effect except to create the NAD object.
 # A VM that requests this NAD with no bridge present will fail with:
 #   `error adding pod kubevirt-vms_ci1 to CNI network "prod-vlan57": failed to
 #    plumb VLAN: open /sys/class/net/br-prod/master: no such file or directory`
 #
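 # Sketch of step 2 (host bridge) with nmcli. This is an illustration, not the
 # Puppet-managed profile: the connection names, `bridge.stp no`, and the
 # disabled IP methods are assumptions; only `br-prod`, `enp86s0`, and VLAN 57
 # come from this file.
 #   nmcli connection add type bridge con-name br-prod ifname br-prod \
 #     bridge.stp no ipv4.method disabled ipv6.method disabled
 #   nmcli connection add type vlan con-name enp86s0.57 ifname enp86s0.57 \
 #     dev enp86s0 id 57 master br-prod slave-type bridge
 #   nmcli connection up br-prod
 #   ip -d link show enp86s0.57   # should list VLAN id 57 and "master br-prod"
 #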
 # Configuration notes:
 #   - cniVersion 0.3.1 to match Multus daemon-config.json
 #   - mtu 1500 (matches enp86s0 default; bump if jumbo frames configured)
 #   - bridge name `br-prod` is convention; if Puppet picks a different name
 #     (e.g. `br57`, `br-vlan57`), edit BOTH this NAD and the ci1.yaml
 #     interface block. Keep them in sync.
 #   - vlan: 0 because frames reaching the bridge are already untagged: br-prod
 #     sits on top of `enp86s0.57`, and the sub-interface strips the VLAN tag.
 #     If we instead used a VLAN-aware bridge with a trunk port, set vlan: 57
 #     here.
 #
 # Apply:
 #   kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml apply -f apps/kubevirt-vms/prod-vlan57-nad.yaml
 #
 # Then update ci1.yaml networks: stanza to:
 #   - name: prod-net
 #     multus:
 #       networkName: kubevirt-vms/prod-vlan57
 # and the interface block from `masquerade` to `bridge`.
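 #
 # For illustration, the two ci1.yaml pieces together might look like this (a
 # sketch assuming the interface is named `prod-net` to match the network
 # entry; ci1.yaml itself is authoritative):
 #   spec.template.spec.domain.devices.interfaces:
 #     - name: prod-net
 #       bridge: {}
 #   spec.template.spec.networks:
 #     - name: prod-net
 #       multus:
 #         networkName: kubevirt-vms/prod-vlan57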
 # =============================================================================
 ---
 # Namespace must exist already (created by ci1.yaml's first document).
 # This file creates a NAD in that same namespace.
 apiVersion: k8s.cni.cncf.io/v1
 kind: NetworkAttachmentDefinition
 metadata:
  name: prod-vlan57
  namespace: kubevirt-vms
  annotations:
    bluejay.iamworkin.lan/host-bridge: "br-prod (enslaves enp86s0.57)"
    bluejay.iamworkin.lan/cidr: "10.0.57.0/24"
    bluejay.iamworkin.lan/gateway: "10.0.57.1"
    bluejay.iamworkin.lan/dns: "10.0.56.1 (pfSense Unbound)"
 spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "prod-vlan57",
      "type": "bridge",
      "bridge": "br-prod",
      "ipam": {},
      "mtu": 1500,
      "vlan": 0,
      "promiscMode": true,
      "preserveDefaultVlan": false
    }
--- a/apps/kubevirt-vms/win2025-iso-nfs-pv.yaml
+++ b/apps/kubevirt-vms/win2025-iso-nfs-pv.yaml
@@ -0,0 +1,99 @@
 # =============================================================================
 # Windows Server 2025 ISO — Static NFS PV (Path B for SATA-CDROM timeout)
 # =============================================================================
 # Purpose: Mount the ISO from Synology NAS via NFS instead of from a Longhorn-
 # backed Filesystem PVC.
 #
 # Why: SATA-CDROM emulation reading from a Longhorn-backed Filesystem PVC is
 # too slow for OVMF's boot read window — the DVD-ROM enumeration times out
 # before the bootloader can be read. Symptom on the serial console:
 #   BdsDxe: failed to start Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ...
 #   BdsDxe: failed to start Boot0001 ... Time out
 #   BdsDxe: No bootable option or device was found
 # Diagnosis confirmed the ISO content is a perfectly valid bootable ISO9660
 # image — the bug is in the timing path between OVMF and Longhorn-backed
 # storage, not in the ISO itself.
 #
 # Block-mode PVC was tried (`volumeMode: Block` via DataVolume) and would
 # likely fix the timing, but CDI v1.65.0's upload-target pod cannot open the
 # block device due to runAsUser:107 + capabilities.drop:[ALL] and we got:
 #   blockdev: cannot open /dev/cdi-block-volume: Permission denied
 #
 # NFS-mounted ISO bypasses both issues: no Longhorn slowness, no CDI upload
 # pod permission concerns. The ISO is read directly from the NAS over a
 # native NFSv4.1 mount that QEMU's SATA emulator can read at full LAN speed.
 #
 # Layout on Synology:
 #   /volume1/ISOs/                                              (existing export, RKE2 ACL)
 #     en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso
 #     win2025-iso-disk/                                         (new subdir, 2026-05-08)
 #       disk.img -> hardlink to ../en-us_windows_server_2025_..._8e06425a.iso
 #
 # KubeVirt's launcher pod expects a PVC mounted at
 # /var/run/kubevirt-private/vmi-disks/<diskName>/disk.img — by mounting the
 # `win2025-iso-disk/` subdir as the NFS PV root, `disk.img` lives at the PV's
 # root and KubeVirt's CDROM emulator finds it without any path manipulation.
 #
 # A symlink would NOT work for sub-path NFS mounts (the relative target
 # `../...iso` falls outside the sub-mount root). A hardlink works because it
 # references the same inode regardless of mount point.
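 #
 # The hardlink can be created on the NAS roughly as follows (a sketch; assumes
 # shell access on the Synology and that both paths live on the same volume,
 # which hardlinks require):
 #   cd /volume1/ISOs
 #   mkdir -p win2025-iso-disk
 #   ln en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso \
 #      win2025-iso-disk/disk.img
 #   ls -li en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso \
 #      win2025-iso-disk/disk.img   # both entries should show the same inode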
 #
 # Memory references:
 #   - feedback_synology_nfs_volume1_kubernetes_export_scoped (Synology export
 #     scoping pattern — but /volume1/ISOs export, unlike /volume1/kubernetes,
 #     does support sub-path mounts because Synology NFS is configured with
 #     pseudo-fs in NFSv4.1)
 #   - feedback_kubevirt_iso_first_install_bootorder_and_runstrategy (boot
 #     order / runStrategy gotchas, separate from the storage timing issue)
 #
 # Validation (2026-05-08, from rke2-server / rke2-agent1 / rke2-agent2):
 #   mount -t nfs -o nfsvers=4.1,ro 10.0.58.3:/volume1/ISOs/win2025-iso-disk /tmp/m
 #   file /tmp/m/disk.img
 #     -> ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)
 # All 3 RKE2 nodes can mount and read.
 # =============================================================================
 apiVersion: v1
 kind: PersistentVolume
 metadata:
  name: windows-server-2025-iso-nfs
  labels:
    flowercore.io/iso: windows-server-2025
    flowercore.io/managed-by: bluejay-infra
 spec:
  capacity:
    storage: 8Gi
  accessModes:
    - ReadOnlyMany
  volumeMode: Filesystem
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""              # static, no provisioner
  mountOptions:
    - nfsvers=4.1
    - ro
    - hard
    - timeo=600
    - retrans=3
  nfs:
    server: 10.0.58.3               # BlueJayNAS Synology DS1621+ on HOME VLAN 58
    path: /volume1/ISOs/win2025-iso-disk
    readOnly: true
 ---
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: windows-server-2025-iso-nfs
  namespace: kubevirt-vms
  labels:
    app: ci-runner
    flowercore.io/managed-by: bluejay-infra
 spec:
  accessModes:
    - ReadOnlyMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: ""
  volumeName: windows-server-2025-iso-nfs
--- a/apps/multus/multus.yaml
+++ b/apps/multus/multus.yaml
@@ -0,0 +1,286 @@
 # =============================================================================
 # Multus CNI — Meta-CNI for multi-network attachment to pods/VMs
 # =============================================================================
 # Purpose: enable KubeVirt VMs (and any future workload) to attach additional
 # network interfaces beyond the default Calico-managed pod network. Required
 # for ci1 (Windows Server 2025 KubeVirt VM) to bridge onto PROD VLAN 57.
 #
 # Source: upstream k8snetworkplumbingwg/multus-cni v4.2.2
 #   https://github.com/k8snetworkplumbingwg/multus-cni/blob/v4.2.2/deployments/multus-daemonset-thick.yml
 #
 # Inlined verbatim (with project header + version pin annotation) for
 # reproducibility and air-gap safety. Bumping versions = edit this file +
 # git push. ArgoCD picks up via the bluejay-infra ApplicationSet
 # (apps/* directory generator on main).
 #
 # Why thick plugin (not thin):
 #   - Thick = daemon + thin shim binary; daemon handles NAD watch + CRD reads
 #     centrally so each pod's CNI ADD doesn't hit the K8s API server. Better
 #     for clusters with many NAD-using pods.
 #   - Thin = each CNI ADD process directly contacts K8s API. Simpler but
 #     scales worse and has more failure modes.
 #   - KubeVirt + multi-VM workload pattern fits thick perfectly.
 #
 # Cluster context (verified 2026-05-08):
 #   - RKE2 v1.34.5 on 3 nodes (rke2-server, rke2-agent1, rke2-agent2)
 #   - Calico CNI (Tigera-managed) at /etc/cni/net.d + /opt/cni/bin (default)
 #   - openSUSE Leap 16, kernel 6.12, containerd 2.1.5
 #   - host bridge for PROD VLAN 57 = `br-prod` (PUPPET HOST WORK — see Phase 1.5
 #     in docs/infrastructure/windows-server-build-runner-plan.md)
 #
 # Version pin: NOT pinned yet. The image tag below is upstream's floating
 # `snapshot-thick`, so the running image can drift from the v4.2.2 manifest
 # this file was copied from. Pinning to the v4.2.2 release at deploy time
 # would require a private mirror of the image; for now we trust upstream +
 # Calico's established pattern. Pin to a specific SHA256 digest once we
 # mirror to Gitea OCI.
 #
 # Apply (once committed to bluejay-infra main, ApplicationSet auto-syncs):
 #   git add apps/multus/multus.yaml && git commit && git push origin main
 #   # ArgoCD `infra-multus` Application appears within 3 min via ApplicationSet
 #
 # Verify:
 #   kubectl -n kube-system get ds kube-multus-ds
 #   kubectl -n kube-system rollout status ds kube-multus-ds
 #   kubectl get crd network-attachment-definitions.k8s.cni.cncf.io
 # =============================================================================
 ---
 apiVersion: apiextensions.k8s.io/v1
 kind: CustomResourceDefinition
 metadata:
  name: network-attachment-definitions.k8s.cni.cncf.io
  annotations:
    bluejay.iamworkin.lan/source: "k8snetworkplumbingwg/multus-cni v4.2.2"
 spec:
  group: k8s.cni.cncf.io
  scope: Namespaced
  names:
    plural: network-attachment-definitions
    singular: network-attachment-definition
    kind: NetworkAttachmentDefinition
    shortNames:
      - net-attach-def
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          description: 'NetworkAttachmentDefinition is a CRD schema specified by the Network Plumbing
            Working Group to express the intent for attaching pods to one or more logical or physical
            networks. More information available at: https://github.com/k8snetworkplumbingwg/multi-net-spec'
          type: object
          properties:
            apiVersion:
              type: string
            kind:
              type: string
            metadata:
              type: object
            spec:
              description: 'NetworkAttachmentDefinition spec defines the desired state of a network attachment'
              type: object
              properties:
                config:
                  description: 'NetworkAttachmentDefinition config is a JSON-formatted CNI configuration'
                  type: string
 ---
 kind: ClusterRole
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
  name: multus
 rules:
  - apiGroups: ["k8s.cni.cncf.io"]
    resources:
      - '*'
    verbs:
      - '*'
  - apiGroups:
      - ""
    resources:
      - pods
      - pods/status
    verbs:
      - get
      - list
      - update
      - watch
  - apiGroups:
      - ""
      - events.k8s.io
    resources:
      - events
    verbs:
      - create
      - patch
      - update
 ---
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
  name: multus
 roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: multus
 subjects:
  - kind: ServiceAccount
    name: multus
    namespace: kube-system
 ---
 apiVersion: v1
 kind: ServiceAccount
 metadata:
  name: multus
  namespace: kube-system
 ---
 kind: ConfigMap
 apiVersion: v1
 metadata:
  name: multus-daemon-config
  namespace: kube-system
  labels:
    tier: node
    app: multus
 data:
  daemon-config.json: |
    {
        "chrootDir": "/hostroot",
        "cniVersion": "0.3.1",
        "logLevel": "verbose",
        "logToStderr": true,
        "cniConfigDir": "/host/etc/cni/net.d",
        "multusAutoconfigDir": "/host/etc/cni/net.d",
        "multusConfigFile": "auto",
        "socketDir": "/host/run/multus/"
    }
 ---
 apiVersion: apps/v1
 kind: DaemonSet
 metadata:
  name: kube-multus-ds
  namespace: kube-system
  labels:
    tier: node
    app: multus
    name: multus
 spec:
  selector:
    matchLabels:
      name: multus
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        tier: node
        app: multus
        name: multus
    spec:
      hostNetwork: true
      hostPID: true
      tolerations:
        - operator: Exists
          effect: NoSchedule
        - operator: Exists
          effect: NoExecute
      serviceAccountName: multus
      containers:
        - name: kube-multus
          image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
          command: [ "/usr/src/multus-cni/bin/multus-daemon" ]
          resources:
            requests:
              cpu: "100m"
              memory: "50Mi"
            limits:
              cpu: "100m"
              memory: "50Mi"
          securityContext:
            privileged: true
          terminationMessagePolicy: FallbackToLogsOnError
          volumeMounts:
            - name: cni
              mountPath: /host/etc/cni/net.d
            # multus-daemon expects the cnibin path to be identical inside the pod and on the container host.
            # e.g. if the CNI binaries are in '/opt/cni/bin' on the container host, mount them at '/opt/cni/bin' in multus-daemon,
            # not at any other directory, like '/opt/bin' or '/usr/bin'.
            - name: cnibin
              mountPath: /opt/cni/bin
            - name: host-run
              mountPath: /host/run
            - name: host-var-lib-cni-multus
              mountPath: /var/lib/cni/multus
            - name: host-var-lib-kubelet
              mountPath: /var/lib/kubelet
              mountPropagation: HostToContainer
            - name: host-run-k8s-cni-cncf-io
              mountPath: /run/k8s.cni.cncf.io
            - name: host-run-netns
              mountPath: /run/netns
              mountPropagation: HostToContainer
            - name: multus-daemon-config
              mountPath: /etc/cni/net.d/multus.d
              readOnly: true
            - name: hostroot
              mountPath: /hostroot
              mountPropagation: HostToContainer
            - mountPath: /etc/cni/multus/net.d
              name: multus-conf-dir
          env:
            - name: MULTUS_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      initContainers:
        - name: install-multus-binary
          image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
          command:
            - "sh"
            - "-c"
            - "cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim && cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru"
          resources:
            requests:
              cpu: "10m"
              memory: "15Mi"
          securityContext:
            privileged: true
          terminationMessagePolicy: FallbackToLogsOnError
          volumeMounts:
            - name: cnibin
              mountPath: /host/opt/cni/bin
              mountPropagation: Bidirectional
      terminationGracePeriodSeconds: 10
      volumes:
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: cnibin
          hostPath:
            path: /opt/cni/bin
        - name: hostroot
          hostPath:
            path: /
        - name: multus-daemon-config
          configMap:
            name: multus-daemon-config
            items:
            - key: daemon-config.json
              path: daemon-config.json
        - name: host-run
          hostPath:
            path: /run
        - name: host-var-lib-cni-multus
          hostPath:
            path: /var/lib/cni/multus
        - name: host-var-lib-kubelet
          hostPath:
            path: /var/lib/kubelet
        - name: host-run-k8s-cni-cncf-io
          hostPath:
            path: /run/k8s.cni.cncf.io
        - name: host-run-netns
          hostPath:
            path: /run/netns/
        - name: multus-conf-dir
          hostPath:
            path: /etc/cni/multus/net.d