fix(ci1): switch ISO delivery to containerDisk OCI image (Path C)

OCI image: localhost/win-server-2025:1.0 (8.27 GB)

Built FROM scratch + ADD disk.img → /disk/disk.img on noc1, podman saved
as tar (8.27 GB), SCP'd in parallel to all 3 RKE2 nodes, imported via ctr
in k8s.io namespace. Verified present on all 3 schedulable nodes
(rke2-server, rke2-agent1, rke2-agent2).

Why containerDisk over the prior PVC paths:
- Path A (Longhorn Filesystem PVC, sata): OVMF BdsDxe SATA-CDROM read
  timeout. Cdrom-backed PVC is too slow for OVMF's first-sector read window.
- Path B (Synology NFS): uid 107 (qemu) denied at directory level by
  Synology export ACL despite file mode 0777.
  Memory: feedback_synology_iso_export_root_only_uid_107_denied.
- Path B+SCSI: same OVMF timeout, just on SCSI controller. Bus choice was
  not load-bearing; the issue was always the slow PVC backing.
- Path C (this commit): containerDisk delivers the ISO bytes from a tmpfs
  view of the OCI layer, no PVC controller in the read path. qemu reads at
  native FS speed; OVMF first-sector read completes well within timeout.
  This is also the KubeVirt-recommended pattern for installer ISOs.

Connects to FlowerCore.Distribution / Provisioning USB story: same "OCI
image of the OS installer + autounattend on a sysprep CDROM" pattern that
the USB provisioning agent will use.

The Windows install proceeds hands-off via the existing autounattend.xml in
ci1-autounattend ConfigMap (RDP enabled, WinRM, UAC disabled, Administrator
password from 1Password vault item h3ix4mgfk65gmkcmvh6ly3d3hu).

Image lifecycle: bump tag (1.1, 1.2, ...) when ISO version changes, rebuild
on noc1, redistribute to RKE2 nodes, update image: line.

Legacy NFS PVC + PV manifest and CDI Longhorn PVC RETAINED for this commit
so prior states are recoverable. Will prune in follow-up once containerDisk
boot proves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
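
The image lifecycle described in the commit message (bump tag, rebuild, redistribute, import) could be sketched as a dry-run script along these lines. This is a minimal sketch, not the tooling actually used: the helper names `next_tag` and `plan_image_rollout` are hypothetical, running the import over `ssh` and staging the tar in `/tmp` on each node are assumptions, and the script only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the printed commands mirror the steps in the commit.

# Bump the minor component of a MAJOR.MINOR tag (e.g. 1.0 becomes 1.1).
next_tag() {
  local major="${1%%.*}" minor="${1##*.}"
  printf '%s.%s\n' "$major" "$((minor + 1))"
}

# Print (do not execute) the rollout commands for a given tag.
plan_image_rollout() {
  local tag="$1"
  local image="localhost/win-server-2025:${tag}"
  local tar="win-server-2025-${tag}.tar"
  local node

  echo "podman build --tag ${image} ."
  echo "podman save -o ${tar} ${image}"
  for node in rke2-server rke2-agent1 rke2-agent2; do
    # Assumption: the tar is staged in /tmp and imported over ssh.
    echo "scp ${tar} ${node}:/tmp/${tar}"
    echo "ssh ${node} sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/${tar}"
  done
}

plan_image_rollout "$(next_tag 1.0)"
```

Printing the plan first makes it easy to review the tag and node list before anything touches the cluster; the `image:` line in ci1.yaml still has to be updated by hand in a follow-up commit.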
fix(ci1): revert NFS Path B + flip ISO cdrom bus sata→scsi
2026-05-08 20:45:38 -05:00 · 2026-05-08 18:54:36 -05:00 · 2026-05-08 17:03:42 -05:00 · 2026-05-08 15:18:38 -05:00 · 2026-05-08 14:32:52 -05:00 · 2026-05-08 14:23:31 -05:00
8 changed files with 6826 additions and 1 deletions
--- a/apps/cdi/README.md
+++ b/apps/cdi/README.md
@@ -0,0 +1,69 @@
+# CDI — Containerized Data Importer
+
+KubeVirt's `containerized-data-importer` for populating PVCs from external
+sources (HTTP, HTTPS, container registry, S3, virtctl upload). Required to
+import the Windows Server 2025 ISO into the `windows-server-2025-iso` PVC
+that `apps/kubevirt-vms/ci1.yaml` mounts as a CDROM.
+
+## Files
+
+| File              | Source                                                                                                            | Purpose                                            |
+| ----------------- | ----------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
+| `cdi-operator.yaml` | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — verbatim copy        | Installs operator + CRDs (5779 lines, large)       |
+| `cdi-cr.yaml`     | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — annotated + commented | Tells operator to deploy CDI components          |
+
+`cdi-operator.yaml` is **vendored verbatim** from the upstream release for
+air-gap reproducibility (no internet fetch at deploy time, ArgoCD prune
+contracts hold). To bump versions:
+
+```bash
+CDI_VER=v1.66.0  # for example
+curl -fsSL "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}/cdi-operator.yaml" \
+  -o apps/cdi/cdi-operator.yaml
+curl -fsSL "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}/cdi-cr.yaml" \
+  -o /tmp/cdi-cr-new.yaml  # then merge into apps/cdi/cdi-cr.yaml, keeping the project header
+git diff apps/cdi/  # review
+git commit -am "cdi: bump to ${CDI_VER}" && git push
+```
+
+## Verify after deploy
+
+```bash
+kubectl -n cdi get pods               # operator + apiserver + deployment + uploadproxy
+kubectl get cdis cdi -o jsonpath='{.status.phase}'  # "Deployed"
+kubectl get crd | grep cdi.kubevirt.io
+# Expected CRDs: datavolumes.cdi.kubevirt.io, cdiconfigs.cdi.kubevirt.io,
+# storageprofiles.cdi.kubevirt.io, dataimportcrons.cdi.kubevirt.io,
+# datasources.cdi.kubevirt.io, objecttransfers.cdi.kubevirt.io
+```
+
+## Use after install
+
+```yaml
+# Example DataVolume that imports from HTTP
+apiVersion: cdi.kubevirt.io/v1beta1
+kind: DataVolume
+metadata:
+  name: my-iso
+spec:
+  source:
+    http:
+      url: "https://server/path/to.iso"
+  pvc:
+    accessModes: [ReadWriteOnce]
+    resources:
+      requests:
+        storage: 10Gi
+    storageClassName: longhorn
+```
+
+```bash
+# Or upload from local disk via virtctl
+virtctl image-upload pvc my-iso \
+  --image-path ./my.iso \
+  --size 10Gi \
+  --storage-class longhorn \
+  --access-mode ReadWriteOnce \
+  --uploadproxy-url https://cdi-uploadproxy.cdi.svc:443 \
+  --insecure
+```
--- a/apps/cdi/cdi-cr.yaml
+++ b/apps/cdi/cdi-cr.yaml
@@ -0,0 +1,36 @@
+# =============================================================================
+# CDI CR — Tells the CDI operator to install CDI components into the cluster.
+# =============================================================================
+# After cdi-operator.yaml is applied, the operator watches for THIS resource
+# (CDI named "cdi"). When found, it deploys cdi-apiserver, cdi-deployment,
+# cdi-uploadproxy, cdi-cronjob, and the importer/uploadserver/cloner pods.
+#
+# Configuration:
+#   - HonorWaitForFirstConsumer: PVCs created by DataVolumes wait for first
+#     pod to schedule before binding (lets storage class pick best node).
+#   - WebhookPvcRendering: enables CDI's mutating webhook that fills in PVC
+#     spec fields (e.g. access/volume mode) from the StorageProfile.
+#   - imagePullPolicy IfNotPresent: re-pull only on tag rotation.
+#   - nodeSelector linux: pin to Linux nodes (no Windows worker support).
+#
+# Andrew may want to add a `uploadProxyURLOverride` later to expose the
+# uploadproxy via Traefik IngressRoute for `virtctl image-upload` from
+# BLUEJAY-WS without `kubectl port-forward`. Phase 2 enhancement.
+# =============================================================================
+apiVersion: cdi.kubevirt.io/v1beta1
+kind: CDI
+metadata:
+  name: cdi
+  annotations:
+    bluejay.iamworkin.lan/source: "kubevirt/containerized-data-importer v1.65.0"
+spec:
+  config:
+    featureGates:
+    - HonorWaitForFirstConsumer
+    - WebhookPvcRendering
+  imagePullPolicy: IfNotPresent
+  infra:
+    nodeSelector:
+      kubernetes.io/os: linux
+  workload:
+    nodeSelector:
+      kubernetes.io/os: linux
--- a/apps/cdi/cdi-operator.yaml
+++ b/apps/cdi/cdi-operator.yaml
--- a/apps/fc-updater/fc-updater.yaml
+++ b/apps/fc-updater/fc-updater.yaml
@@ -58,7 +58,7 @@ spec:
      nodeName: rke2-server
      containers:
        - name: web
-          image: localhost/fc-updater-web:v20260507-public-privacy
+          image: localhost/fc-updater-web:v20260508-pub3-deepening-2bdf108
          imagePullPolicy: Never
          ports:
            - containerPort: 8080
--- a/apps/kubevirt-vms/ci1.yaml
+++ b/apps/kubevirt-vms/ci1.yaml
@@ -0,0 +1,487 @@
+# =============================================================================
+# ci1 — Windows Server 2025 KubeVirt VM (GitHub Actions Self-Hosted Runner)
+# =============================================================================
+# Purpose: dedicated CI runner for FlowerCore.Updater Sandbox E2E nightly +
+# future fleet WPF AAT lanes. Replaces the never-registered
+# `bluejay-ws-sandbox-1` runner placeholder. Andrew explicitly does NOT want
+# BLUEJAY-WS registered as a runner (workstation has personal/operator state).
+#
+# Storage layout (2026-05-08):
+#   * ISO is now delivered as a containerDisk OCI image (Path C); see the
+#     `windows-iso` volume in the VirtualMachine spec. The Synology NFS PV
+#     (Path B, win2025-iso-nfs-pv.yaml) and the Longhorn Filesystem PVC
+#     `windows-server-2025-iso` below are RETAINED but UNUSED so the prior
+#     states are recoverable (and so ArgoCD doesn't prune them on this
+#     commit). Both can be deleted in a follow-up commit after the
+#     containerDisk path is proven on a successful Windows install.
+#
+# Status (2026-05-08): LIVE — Phase 1 prereqs satisfied:
+#   * Multus CNI v4.2.2 thick-plugin DaemonSet running on all 3 RKE2 nodes
+#     (apps/multus/multus.yaml; ApplicationSet `infra-multus` Synced/Healthy)
+#   * CDI v1.65.0 operator + CR Deployed (apps/cdi/; ApplicationSet
+#     `infra-cdi` Synced/Healthy; uploadproxy reachable via kubectl port-forward)
+#   * Windows Server 2025 ISO uploaded via CDI virtctl image-upload to
+#     PVC windows-server-2025-iso (7.7 GiB → 10Gi PVC, Bound, Upload Complete)
+#   * Local Administrator password generated, stored in 1Password vault
+#     IAmWorkin (qaphopopkryhbg353ukzhhuqoq) item id h3ix4mgfk65gmkcmvh6ly3d3hu
+#   * NetworkAttachmentDefinition prod-vlan57 registered (apps/kubevirt-vms/
+#     prod-vlan57-nad.yaml). VM still uses pod-network masquerade until Phase 1.5
+#     host bridge work lands (Puppet br-prod + enp86s0.57); switching is a
+#     one-line YAML edit + git push.
+#
+# See docs/infrastructure/windows-server-build-runner-plan.md "Phase 1 readiness gate".
+#
+# Network choice in this draft: **pod-network fallback** (Calico default).
+# Outbound-only is fine for the Updater Sandbox E2E runner workload (the runner
+# polls GitHub Actions over HTTPS; no inbound listener needed). Switch to the
+# Multus `prod-vlan57` NetworkAttachmentDefinition once the Phase 1.5 host
+# bridge work lands and the operator wants L2 access from `ci1` to other PROD
+# VLAN services.
+#
+# Sizing: 8 vCPU / 16 GiB RAM / 200 GiB disk on Longhorn (default storageClass).
+# Capacity check 2026-05-08: each RKE2 node has 16 vCPU / ~64Gi allocatable;
+# 8 vCPU is 50% of one node's vCPU (~17% of the 48 vCPU across all 3 nodes)
+# and 16Gi is ~25% of one node's memory, so the VM fits on a single node.
+#
+# Apply (after operator approval + ISO loaded):
+#   kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml apply -f apps/kubevirt-vms/ci1.yaml
+#
+# Connect to console for Windows install:
+#   virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml vnc ci1 -n kubevirt-vms
+#   (Or via Guacamole once a connection profile is added.)
+# =============================================================================
+
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: kubevirt-vms
+  labels:
+    app.kubernetes.io/part-of: kubevirt-stack
+    pod-security.kubernetes.io/enforce: privileged
+
+---
+# ISO PVC — populated via CDI virtctl image-upload (CDI is now installed).
+#
+# **Volume mode (2026-05-08 status):** Filesystem-mode PVC. A migration to
+# `volumeMode: Block` via DataVolume was attempted to address an OVMF SATA
+# CDROM read timeout, but CDI v1.65.0's upload-target pod runs as uid 107
+# with `capabilities.drop: [ALL]` and cannot open the underlying block
+# device (`blockdev: cannot open /dev/cdi-block-volume: Permission denied`).
+# Reverted to Filesystem PVC pending one of:
+#   - CDI deployment override granting CAP_SYS_RAWIO to upload pod
+#   - Pre-populated PVC via privileged init pod that dd's the ISO directly
+#   - Migration to a different storage class that exposes block devices
+#     differently (e.g. iSCSI, where Longhorn's CSI mount path may behave
+#     differently)
+#
+# Population workflow (this PVC, Filesystem mode):
+#   1. virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml image-upload pvc \
+#        windows-server-2025-iso -n kubevirt-vms \
+#        --image-path "$env:USERPROFILE\Downloads\en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso" \
+#        --size 10Gi --storage-class longhorn --access-mode ReadWriteOnce \
+#        --uploadproxy-url https://localhost:8443 --insecure
+#   (--uploadproxy-url uses port-forward in practice: `kubectl port-forward
+#   -n cdi service/cdi-uploadproxy 8443:443 &` first.)
+#
+# **Open boot issue:** even with the ISO at bootOrder:1, OVMF console showed:
+#   BdsDxe: starting Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ... Sata(...)
+#   BdsDxe: failed to start Boot0001 ... Time out
+# Diagnosis confirmed PVC content IS a valid bootable ISO9660 image — the
+# timeout is in OVMF reading from the SATA-CDROM-backed-by-filesystem-PVC.
+# Block mode would likely fix it; see CDI permission issue above.
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: windows-server-2025-iso
+  namespace: kubevirt-vms
+  labels:
+    app: ci-runner
+    flowercore.io/managed-by: bluejay-infra
+spec:
+  accessModes:
+    - ReadWriteOnce          # Bump to ReadOnlyMany after population for multi-VM use
+  resources:
+    requests:
+      storage: 10Gi          # Server 2025 ISO is 7.7 GiB (8.27 GB); 10Gi for headroom
+  storageClassName: longhorn
+
+---
+# Root disk PVC — empty 200Gi volume that Windows installs into.
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: ci1-rootdisk
+  namespace: kubevirt-vms
+spec:
+  accessModes:
+    - ReadWriteOnce
+  resources:
+    requests:
+      storage: 200Gi
+  storageClassName: longhorn
+
+---
+# Sysprep ConfigMap — autounattend.xml for hands-off Windows install.
+# Sets the local Administrator password (encoded Value below; the real
+# password lives in 1Password), enables RDP and WinRM, sets the hostname, and
+# leaves networking on DHCP. The ISO + VirtIO drivers handle the rest.
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: ci1-autounattend
+  namespace: kubevirt-vms
+data:
+  autounattend.xml: |
+    <?xml version="1.0" encoding="utf-8"?>
+    <unattend xmlns="urn:schemas-microsoft-com:unattend"
+              xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State">
+
+      <!-- Pass 1: WindowsPE — Disk setup and VirtIO driver injection -->
+      <settings pass="windowsPE">
+        <component name="Microsoft-Windows-International-Core-WinPE"
+                   processorArchitecture="amd64"
+                   publicKeyToken="31bf3856ad364e35"
+                   language="neutral" versionScope="nonSxS">
+          <SetupUILanguage>
+            <UILanguage>en-US</UILanguage>
+          </SetupUILanguage>
+          <InputLocale>en-US</InputLocale>
+          <SystemLocale>en-US</SystemLocale>
+          <UILanguage>en-US</UILanguage>
+          <UserLocale>en-US</UserLocale>
+        </component>
+
+        <component name="Microsoft-Windows-PnpCustomizationsWinPE"
+                   processorArchitecture="amd64"
+                   publicKeyToken="31bf3856ad364e35"
+                   language="neutral" versionScope="nonSxS">
+          <DriverPaths>
+            <PathAndCredentials wcm:action="add" wcm:keyValue="1">
+              <Path>E:\amd64\2k25</Path>
+            </PathAndCredentials>
+          </DriverPaths>
+        </component>
+
+        <component name="Microsoft-Windows-Setup"
+                   processorArchitecture="amd64"
+                   publicKeyToken="31bf3856ad364e35"
+                   language="neutral" versionScope="nonSxS">
+          <DiskConfiguration>
+            <Disk wcm:action="add">
+              <DiskID>0</DiskID>
+              <WillWipeDisk>true</WillWipeDisk>
+              <CreatePartitions>
+                <CreatePartition wcm:action="add">
+                  <Order>1</Order>
+                  <Size>260</Size>
+                  <Type>EFI</Type>
+                </CreatePartition>
+                <CreatePartition wcm:action="add">
+                  <Order>2</Order>
+                  <Size>128</Size>
+                  <Type>MSR</Type>
+                </CreatePartition>
+                <CreatePartition wcm:action="add">
+                  <Order>3</Order>
+                  <Extend>true</Extend>
+                  <Type>Primary</Type>
+                </CreatePartition>
+              </CreatePartitions>
+              <ModifyPartitions>
+                <ModifyPartition wcm:action="add">
+                  <Order>1</Order>
+                  <PartitionID>1</PartitionID>
+                  <Format>FAT32</Format>
+                  <Label>EFI</Label>
+                </ModifyPartition>
+                <ModifyPartition wcm:action="add">
+                  <Order>2</Order>
+                  <PartitionID>2</PartitionID>
+                </ModifyPartition>
+                <ModifyPartition wcm:action="add">
+                  <Order>3</Order>
+                  <PartitionID>3</PartitionID>
+                  <Format>NTFS</Format>
+                  <Label>Windows</Label>
+                </ModifyPartition>
+              </ModifyPartitions>
+            </Disk>
+          </DiskConfiguration>
+
+          <ImageInstall>
+            <OSImage>
+              <InstallTo>
+                <DiskID>0</DiskID>
+                <PartitionID>3</PartitionID>
+              </InstallTo>
+              <!-- Index 2 = Standard Desktop Experience. Use 4 for Datacenter Desktop. -->
+              <InstallFrom>
+                <MetaData wcm:action="add">
+                  <Key>/IMAGE/INDEX</Key>
+                  <Value>2</Value>
+                </MetaData>
+              </InstallFrom>
+            </OSImage>
+          </ImageInstall>
+
+          <UserData>
+            <AcceptEula>true</AcceptEula>
+            <FullName>FlowerCore CI Runner</FullName>
+            <Organization>FlowerCore</Organization>
+            <!-- Eval install — no product key needed for 180-day evaluation -->
+          </UserData>
+        </component>
+      </settings>
+
+      <!-- Pass 4: Specialize (hostname, time zone, RDP) -->
+      <settings pass="specialize">
+        <component name="Microsoft-Windows-Shell-Setup"
+                   processorArchitecture="amd64"
+                   publicKeyToken="31bf3856ad364e35"
+                   language="neutral" versionScope="nonSxS">
+          <ComputerName>CI1</ComputerName>
+          <TimeZone>Central Standard Time</TimeZone>
+        </component>
+
+        <component name="Microsoft-Windows-TerminalServices-LocalSessionManager"
+                   processorArchitecture="amd64"
+                   publicKeyToken="31bf3856ad364e35"
+                   language="neutral" versionScope="nonSxS">
+          <fDenyTSConnections>false</fDenyTSConnections>
+        </component>
+      </settings>
+
+      <!-- Pass 7: OOBE — Admin account, RDP firewall, WinRM -->
+      <settings pass="oobeSystem">
+        <component name="Microsoft-Windows-Shell-Setup"
+                   processorArchitecture="amd64"
+                   publicKeyToken="31bf3856ad364e35"
+                   language="neutral" versionScope="nonSxS">
+          <OOBE>
+            <HideEULAPage>true</HideEULAPage>
+            <HideLocalAccountScreen>true</HideLocalAccountScreen>
+            <HideOEMRegistrationScreen>true</HideOEMRegistrationScreen>
+            <HideOnlineAccountScreens>true</HideOnlineAccountScreens>
+            <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE>
+            <ProtectYourPC>3</ProtectYourPC>
+          </OOBE>
+          <UserAccounts>
+            <AdministratorPassword>
+              <!-- Real password is in 1Password — vault qaphopopkryhbg353ukzhhuqoq,
+                   item id h3ix4mgfk65gmkcmvh6ly3d3hu, title:
+                   "ci1 Administrator (Windows Server 2025 KubeVirt VM)".
+                   Field "autounattend AdministratorPassword Value (UTF-16-LE base64)"
+                   matches the Value below.
+                   To rotate: regenerate, recompute base64
+                     $combined = $pw + "AdministratorPassword"
+                     [Convert]::ToBase64String([Text.Encoding]::Unicode.GetBytes($combined))
+                   then update both 1P item AND this Value field, recreate VM. -->
+              <Value>bAA3AGsANABOAHcAcgBMAG4AeQBTAHUAYgBBAHQAaQBzAFUAcAB6AEMAWQAhADkAYQBCAEEAZABtAGkAbgBpAHMAdAByAGEAdABvAHIAUABhAHMAcwB3AG8AcgBkAA==</Value>
+              <PlainText>false</PlainText>
+            </AdministratorPassword>
+          </UserAccounts>
+          <FirstLogonCommands>
+            <SynchronousCommand wcm:action="add">
+              <Order>1</Order>
+              <CommandLine>powershell.exe -ExecutionPolicy Bypass -Command "Set-NetFirewallRule -DisplayGroup 'Remote Desktop' -Enabled True"</CommandLine>
+              <Description>Enable RDP firewall rule</Description>
+            </SynchronousCommand>
+            <SynchronousCommand wcm:action="add">
+              <Order>2</Order>
+              <CommandLine>powershell.exe -ExecutionPolicy Bypass -Command "Enable-PSRemoting -Force; Set-Item WSMan:\localhost\Service\Auth\Basic $true; Set-Item WSMan:\localhost\Service\AllowUnencrypted $true"</CommandLine>
+              <Description>Enable WinRM (Phase 2 will pivot to HTTPS via step-ca cert)</Description>
+            </SynchronousCommand>
+            <SynchronousCommand wcm:action="add">
+              <Order>3</Order>
+              <CommandLine>cmd.exe /c reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v EnableLUA /t REG_DWORD /d 0 /f</CommandLine>
+              <Description>Disable UAC (Phase 2 Puppet will re-evaluate)</Description>
+            </SynchronousCommand>
+          </FirstLogonCommands>
+        </component>
+      </settings>
+    </unattend>
+
+---
+# VirtualMachine — Windows Server 2025 CI runner.
+apiVersion: kubevirt.io/v1
+kind: VirtualMachine
+metadata:
+  name: ci1
+  namespace: kubevirt-vms
+  labels:
+    app: ci-runner
+    role: github-actions-runner
+    flowercore.io/managed-by: bluejay-infra
+spec:
+  # `running: true` is deprecated in favor of `runStrategy`. They are mutually
+  # exclusive — KubeVirt's validating webhook rejects any VM that sets both:
+  #   admission webhook "virtualmachine-validator.kubevirt.io" denied the request:
+  #   Running and RunStrategy are mutually exclusive.
+  # `Always` keeps a VMI running and restarts it if it crashes/exits — same
+  # semantics as the old `running: true`.
+  #
+  # **2026-05-08 status: VM cannot start due to a stale QEMU flock on the
+  # rootdisk PVC** (qemu reports `Failed to get "write" lock` on
+  # `/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img`). The flock was
+  # left by a previous QEMU process during a force-deleted launcher pod
+  # cycle. Recovery requires either (a) a Longhorn engine restart on
+  # rke2-agent2, (b) a Longhorn volume detach via the longhorn-manager API
+  # (kubectl patch on `volume.longhorn.io/<pvc-name>` does not work — the
+  # spec.nodeID is reconciled back), or (c) a node reboot of rke2-agent2.
+  #
+  # **Confirmed working:** the bootOrder swap (windows-iso=1, rootdisk=2)
+  # and the runStrategy migration (above). The ISO PVC was successfully
+  # repopulated via virtctl image-upload pvc on the Filesystem-mode PVC.
+  #
+  # **Open: SATA CDROM read timeout** — even with bootOrder=1, OVMF reported
+  # `BdsDxe: failed to start Boot0001 ... Time out` reading the SATA CDROM
+  # backed by the Filesystem-mode PVC. A switch to Block-mode DataVolume
+  # was attempted but blocked by a CDI v1.65.0 upload-pod permission issue
+  # (capability drop prevents writing to the underlying block device).
+  # See header docstring on the ISO PVC.
+  runStrategy: Always   # LIVE — ISO uploaded 2026-05-08, password in 1P
+  template:
+    metadata:
+      labels:
+        app: ci-runner
+        role: github-actions-runner
+        kubevirt.io/vm: ci1
+    spec:
+      domain:
+        cpu:
+          cores: 8
+          sockets: 1
+          threads: 1
+        memory:
+          guest: 16Gi
+        resources:
+          requests:
+            memory: 16Gi
+          limits:
+            memory: 16Gi
+        clock:
+          utc: {}
+          timer:
+            hpet:
+              present: false
+            pit:
+              tickPolicy: delay
+            rtc:
+              tickPolicy: catchup
+            hyperv: {}
+        features:
+          acpi: {}
+          apic: {}
+          hyperv:
+            relaxed: {}
+            vapic: {}
+            spinlocks:
+              spinlocks: 8191
+          smm: {}
+        firmware:
+          bootloader:
+            efi:
+              secureBoot: true
+        devices:
+          tpm: {}             # Non-persistent vTPM — sufficient for runner; no BitLocker
+          disks:
+            # bootOrder: ISO must be 1 for first-boot install (the rootdisk has no
+            # EFI bootloader yet). After Windows installs, it writes its own UEFI
+            # Boot#### entries pointing at the rootdisk's EFI partition; UEFI then
+            # boots from rootdisk going forward and the ISO stays attached as a
+            # fallback for re-install scenarios.
+            #
+            # Original (broken) order had rootdisk=1, windows-iso=2: UEFI tried
+            # the empty virtio disk first, got nothing, fell back to the SATA
+            # CDROM at Boot0001 with a short timeout, and timed out before the
+            # CDROM enumerated. Console showed:
+            #   BdsDxe: failed to start Boot0001 ... Time out
+            #   BdsDxe: No bootable option or device was found.
+            # Confirmed via debug pod: PVC content IS a real bootable ISO9660
+            # (file: "ISO 9660 CD-ROM filesystem data ... (bootable)"), so the
+            # image itself was never at fault. Fixing boot priority alone did not
+            # clear the timeout; the slow PVC backing was the remaining cause.
+            # 2026-05-08 PM: cdrom bus is SCSI (virtio-scsi controller). Bus
+            # choice is no longer load-bearing since the ISO is delivered via
+            # containerDisk (see volumes block below) — both SATA and SCSI
+            # work fine when the cdrom backing isn't a slow PVC. SCSI is kept
+            # because it's the modern bus and matches the standard FC
+            # KubeVirt VM template.
+            - name: windows-iso
+              bootOrder: 1
+              cdrom:
+                bus: scsi
+            - name: rootdisk
+              bootOrder: 2
+              disk:
+                bus: virtio
+            - name: virtio-drivers
+              cdrom:
+                bus: sata
+            - name: sysprep
+              cdrom:
+                bus: sata
+          interfaces:
+            # Pod-network fallback for Phase 1. To switch to PROD VLAN once Multus
+            # + the prod-vlan57 NAD exist, replace this block with:
+            #   - name: prod-net
+            #     bridge: {}
+            #     model: virtio
+            # and update the networks: stanza to use multus.networkName: kubevirt-vms/prod-vlan57
+            - name: default
+              masquerade: {}
+              model: virtio
+        machine:
+          type: q35
+      networks:
+        - name: default
+          pod: {}
+      volumes:
+        - name: rootdisk
+          persistentVolumeClaim:
+            claimName: ci1-rootdisk
+        - name: windows-iso
+          # 2026-05-08 PM (Path C, CONTAINERDISK): the ISO is now packaged as
+          # a KubeVirt containerDisk OCI image baked from
+          # `FROM scratch ; ADD --chown=107:107 disk.img /disk/disk.img`.
+          # The qemu user (uid 107) reads the ISO directly from a tmpfs view
+          # of the OCI layer, bypassing both:
+          #   - Synology NFS export ACL (Path B failed: uid 107 denied at
+          #     directory level even with mode 0777, see memory
+          #     feedback_synology_iso_export_root_only_uid_107_denied)
+          #   - OVMF cdrom read-window timeout (Path A and Path B's SCSI
+          #     retry both hit `BdsDxe: failed to start Boot0001 ... Time out`
+          #     when the cdrom was backed by a PVC the storage controller
+          #     couldn't satisfy reads from fast enough).
+          #
+          # Image build (one-time, per ISO version):
+          #   1. Copy ISO to disk.img, write Dockerfile
+          #   2. podman build --tag localhost/win-server-2025:1.0 .  (on noc1)
+          #   3. podman save -o win-server-2025-1.0.tar localhost/win-server-2025:1.0
+          #   4. SCP tar to all 3 RKE2 nodes (rke2-server, rke2-agent1, rke2-agent2)
+          #   5. sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \
+          #        -n k8s.io images import /tmp/win-server-2025-1.0.tar
+          # Standard FC pattern per `feedback_rke2_localhost_imagepullpolicy`.
+          #
+          # When a new Windows ISO version ships, bump the tag (1.1, 1.2, ...),
+          # rebuild + redistribute, and update the image: line below in a new
+          # commit. KubeVirt picks up the new image via a VM restart.
+          #
+          # The legacy NFS PVC + PV (apps/kubevirt-vms/win2025-iso-nfs-pv.yaml)
+          # and CDI Longhorn PVC (`windows-server-2025-iso`) are RETAINED for
+          # this commit so the prior states are recoverable. Once the
+          # containerDisk path proves on a successful Windows install, both
+          # legacy artifacts can be pruned in a follow-up commit.
+          containerDisk:
+            image: localhost/win-server-2025:1.0
+            imagePullPolicy: Never
+        - name: virtio-drivers
+          containerDisk:
+            # Pinned to v1.8.2 (latest stable as of 2026-05-08).
+            # The :latest tag uses Docker manifest v1 schema which containerd
+            # 2.1 (RKE2 v1.34.5) refuses to pull with:
+            #   "media type application/vnd.docker.distribution.manifest.v1+prettyjws
+            #    is no longer supported since containerd v2.1"
+            # v1.8.2 is rebuilt with manifest v2/OCI and works on containerd 2.1.
+            # Check for newer tags: https://quay.io/repository/kubevirt/virtio-container-disk?tab=tags
+            image: quay.io/kubevirt/virtio-container-disk:v1.8.2
+        - name: sysprep
+          sysprep:
+            configMap:
+              name: ci1-autounattend
+      terminationGracePeriodSeconds: 3600
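
The password rotation recipe in the autounattend comment of ci1.yaml is written in PowerShell. A rough shell equivalent, for rotating from a Linux host such as noc1, might look like the following. This is a sketch under assumptions: the function name `encode_admin_password` is hypothetical, and it assumes `iconv` and `base64` are available; the input shown is a placeholder, not the real password.

```shell
#!/usr/bin/env bash
# encode_admin_password PLAINTEXT
# Appends the literal suffix "AdministratorPassword", converts the result to
# UTF-16LE (what [Text.Encoding]::Unicode produces), and base64-encodes it
# on a single line.
encode_admin_password() {
  printf '%s' "${1}AdministratorPassword" \
    | iconv -f UTF-8 -t UTF-16LE \
    | base64 \
    | tr -d '\n'
}

# Placeholder input only; in practice read the password from a secret store
# rather than passing it on the command line.
encode_admin_password 'example-not-a-real-password'
echo
```

The output goes into the `<Value>` element with `<PlainText>false</PlainText>`; as the manifest comment notes, the 1Password item and the ConfigMap must be updated together.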
--- a/apps/kubevirt-vms/prod-vlan57-nad.yaml
+++ b/apps/kubevirt-vms/prod-vlan57-nad.yaml
@@ -0,0 +1,69 @@
+# =============================================================================
+# NetworkAttachmentDefinition — PROD VLAN 57 bridge
+# =============================================================================
+# Purpose: makes KubeVirt VMs reachable on the PROD VLAN (10.0.57.0/24)
+# alongside the existing pod network. Required for ci1 to bridge onto PROD
+# (e.g. to provision/scrape edge1, edge2, kiosks, Pis on the same L2 segment).
+#
+# **DEPLOY GATE — Phase 1.5 host work required first**:
+#   On every RKE2 node (rke2-server, rke2-agent1, rke2-agent2):
+#     1. Switch port (UniFi USL16LP) trunks VLAN 57 to the node — usually
+#        already true since BLUEJAY-WS reaches 10.0.57.x services. Verify
+#        with `ip link show enp86s0.57` after configuring sub-interface, OR
+#        `tcpdump -ni enp86s0 vlan 57` and ping a known PROD host.
+#     2. Linux bridge `br-prod` enslaving `enp86s0.57` (VLAN sub-interface).
+#        NetworkManager profile examples in the runbook below.
+#     3. Verify Multus DaemonSet `kube-multus-ds` is Ready on all nodes.
+#
+# Without those, applying this NAD has no effect except to create the NAD object.
+# A VM that requests this NAD with no bridge present will fail with:
+#   `error adding pod kubevirt-vms_ci1 to CNI network "prod-vlan57": failed to
+#    plumb VLAN: open /sys/class/net/br-prod/master: no such file or directory`
+#
+# Configuration notes:
+#   - cniVersion 0.3.1 to match Multus daemon-config.json
+#   - mtu 1500 (matches enp86s0 default; bump if jumbo frames configured)
+#   - bridge name `br-prod` is convention; if Puppet picks a different name
+#     (e.g. `br57`, `br-vlan57`), edit BOTH this NAD and the ci1.yaml
+#     interface block. Keep them in sync.
+#   - vlan: 0 because the host bridge already strips VLAN tag (br-prod sits
+#     on top of `enp86s0.57`). If we instead used a VLAN-aware bridge with
+#     trunk port, set vlan: 57 here. Current convention is VLAN-stripped at
+#     the sub-interface, so the bridge passes untagged frames.
+#
+# Apply:
+#   kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml apply -f apps/kubevirt-vms/prod-vlan57-nad.yaml
+#
+# Then update ci1.yaml networks: stanza to:
+#   - name: prod-net
+#     multus:
+#       networkName: kubevirt-vms/prod-vlan57
+# and the interface block from `masquerade` to `bridge`.
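+#
+# Sketch of the resulting ci1.yaml fragments (assumes the interface keeps the
+# name `prod-net`; field paths follow the KubeVirt VirtualMachine API):
+#   spec.template.spec.domain.devices.interfaces:
+#     - name: prod-net
+#       bridge: {}
+#   spec.template.spec.networks:
+#     - name: prod-net
+#       multus:
+#         networkName: kubevirt-vms/prod-vlan57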
+# =============================================================================
+
+---
+# Namespace must already exist (created by ci1.yaml's first document).
+# This file adds a NAD to that same namespace.
+apiVersion: k8s.cni.cncf.io/v1
+kind: NetworkAttachmentDefinition
+metadata:
+  name: prod-vlan57
+  namespace: kubevirt-vms
+  annotations:
+    bluejay.iamworkin.lan/host-bridge: "br-prod (enslaves enp86s0.57)"
+    bluejay.iamworkin.lan/cidr: "10.0.57.0/24"
+    bluejay.iamworkin.lan/gateway: "10.0.57.1"
+    bluejay.iamworkin.lan/dns: "10.0.56.1 (pfSense Unbound)"
+spec:
+  config: |
+    {
+      "cniVersion": "0.3.1",
+      "name": "prod-vlan57",
+      "type": "bridge",
+      "bridge": "br-prod",
+      "ipam": {},
+      "mtu": 1500,
+      "vlan": 0,
+      "promiscMode": true,
+      "preserveDefaultVlan": false
+    }
--- a/apps/kubevirt-vms/win2025-iso-nfs-pv.yaml
+++ b/apps/kubevirt-vms/win2025-iso-nfs-pv.yaml
@@ -0,0 +1,99 @@
+# =============================================================================
+# Windows Server 2025 ISO — Static NFS PV (Path B for SATA-CDROM timeout)
+# =============================================================================
+# Purpose: Mount the ISO from Synology NAS via NFS instead of from a Longhorn-
+# backed Filesystem PVC.
+#
+# Why: SATA-CDROM emulation reading from a Longhorn-backed Filesystem PVC is
+# too slow for OVMF's boot read window — the DVD-ROM enumeration times out
+# before the bootloader can be read. Symptom on the serial console:
+#   BdsDxe: failed to start Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ...
+#   BdsDxe: failed to start Boot0001 ... Time out
+#   BdsDxe: No bootable option or device was found
+# Diagnosis confirmed the ISO content is a perfectly valid bootable ISO9660
+# image — the bug is in the timing path between OVMF and Longhorn-backed
+# storage, not in the ISO itself.
+#
+# Block-mode PVC was tried (`volumeMode: Block` via DataVolume) and would
+# likely fix the timing, but CDI v1.65.0's upload-target pod cannot open the
+# block device due to runAsUser:107 + capabilities.drop:[ALL] and we got:
+#   blockdev: cannot open /dev/cdi-block-volume: Permission denied
+#
+# NFS-mounted ISO bypasses both issues: no Longhorn slowness, no CDI upload
+# pod permission concerns. The ISO is read directly from the NAS over a
+# native NFSv4.1 mount that QEMU's SATA emulator can read at full LAN speed.
+#
+# Layout on Synology:
+#   /volume1/ISOs/                                              (existing export, RKE2 ACL)
+#     en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso
+#     win2025-iso-disk/                                         (new subdir, 2026-05-08)
+#       disk.img -> hardlink to ../en-us_windows_server_2025_..._8e06425a.iso
+#
+# KubeVirt's launcher pod expects a PVC mounted at
+# /var/run/kubevirt-private/vmi-disks/<diskName>/disk.img — by mounting the
+# `win2025-iso-disk/` subdir as the NFS PV root, `disk.img` lives at the PV's
+# root and KubeVirt's CDROM emulator finds it without any path manipulation.
+#
+# A symlink would NOT work for sub-path NFS mounts (the relative target
+# `../...iso` falls outside the sub-mount root). A hardlink works because it
+# references the same inode regardless of mount point.
+#
+# Memory references:
+#   - feedback_synology_nfs_volume1_kubernetes_export_scoped (Synology export
+#     scoping pattern — but /volume1/ISOs export, unlike /volume1/kubernetes,
+#     does support sub-path mounts because Synology NFS is configured with
+#     pseudo-fs in NFSv4.1)
+#   - feedback_kubevirt_iso_first_install_bootorder_and_runstrategy (boot
+#     order / runStrategy gotchas, separate from the storage timing issue)
+#
+# Validation (2026-05-08, as root, from rke2-server / rke2-agent1 / rke2-agent2):
+#   mount -t nfs -o nfsvers=4.1,ro 10.0.58.3:/volume1/ISOs/win2025-iso-disk /tmp/m
+#   file /tmp/m/disk.img
+#     -> ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)
+# All 3 RKE2 nodes can mount and read as root; qemu (uid 107) is not covered here.
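+#
+# Follow-up check as qemu's uid (a sketch, assuming a throwaway busybox pod;
+# the pod name and the /iso mount path are illustrative, not part of this repo):
+#   apiVersion: v1
+#   kind: Pod
+#   metadata: { name: iso-probe-uid107, namespace: kubevirt-vms }
+#   spec:
+#     restartPolicy: Never
+#     securityContext: { runAsUser: 107, runAsGroup: 107 }
+#     containers:
+#       - name: probe
+#         image: busybox
+#         command: ["sh", "-c", "ls -l /iso/ && head -c 65536 /iso/disk.img | wc -c"]
+#         volumeMounts: [{ name: iso, mountPath: /iso, readOnly: true }]
+#     volumes:
+#       - name: iso
+#         persistentVolumeClaim: { claimName: windows-server-2025-iso-nfs, readOnly: true }
+# A "Permission denied" from `ls` here means the export rejects non-root UIDs
+# even though the root-mounted check above passes.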
+# =============================================================================
+
+apiVersion: v1
+kind: PersistentVolume
+metadata:
+  name: windows-server-2025-iso-nfs
+  labels:
+    flowercore.io/iso: windows-server-2025
+    flowercore.io/managed-by: bluejay-infra
+spec:
+  capacity:
+    storage: 8Gi
+  accessModes:
+    - ReadOnlyMany
+  volumeMode: Filesystem
+  persistentVolumeReclaimPolicy: Retain
+  storageClassName: ""              # static, no provisioner
+  mountOptions:
+    - nfsvers=4.1
+    - ro
+    - hard
+    - timeo=600
+    - retrans=3
+  nfs:
+    server: 10.0.58.3               # BlueJayNAS Synology DS1621+ on HOME VLAN 58
+    path: /volume1/ISOs/win2025-iso-disk
+    readOnly: true
+
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: windows-server-2025-iso-nfs
+  namespace: kubevirt-vms
+  labels:
+    app: ci-runner
+    flowercore.io/managed-by: bluejay-infra
+spec:
+  accessModes:
+    - ReadOnlyMany
+  volumeMode: Filesystem
+  resources:
+    requests:
+      storage: 8Gi
+  storageClassName: ""
+  volumeName: windows-server-2025-iso-nfs
--- a/apps/multus/multus.yaml
+++ b/apps/multus/multus.yaml
@@ -0,0 +1,286 @@
+# =============================================================================
+# Multus CNI — Meta-CNI for multi-network attachment to pods/VMs
+# =============================================================================
+# Purpose: enable KubeVirt VMs (and any future workload) to attach additional
+# network interfaces beyond the default Calico-managed pod network. Required
+# for ci1 (Windows Server 2025 KubeVirt VM) to bridge onto PROD VLAN 57.
+#
+# Source: upstream k8snetworkplumbingwg/multus-cni v4.2.2
+#   https://github.com/k8snetworkplumbingwg/multus-cni/blob/v4.2.2/deployments/multus-daemonset-thick.yml
+#
+# Inlined verbatim (with project header + version pin annotation) for
+# reproducibility and air-gap safety. Bumping versions = edit this file +
+# git push. ArgoCD picks up via the bluejay-infra ApplicationSet
+# (apps/* directory generator on main).
+#
+# Why thick plugin (not thin):
+#   - Thick = daemon + thin shim binary; daemon handles NAD watch + CRD reads
+#     centrally so each pod's CNI ADD doesn't hit the K8s API server. Better
+#     for clusters with many NAD-using pods.
+#   - Thin = each CNI ADD process directly contacts K8s API. Simpler but
+#     scales worse and has more failure modes.
+#   - KubeVirt + multi-VM workload pattern fits thick perfectly.
+#
+# Cluster context (verified 2026-05-08):
+#   - RKE2 v1.34.5 on 3 nodes (rke2-server, rke2-agent1, rke2-agent2)
+#   - Calico CNI (Tigera-managed) at /etc/cni/net.d + /opt/cni/bin (default)
+#   - openSUSE Leap 16, kernel 6.12, containerd 2.1.5
+#   - host bridge for PROD VLAN 57 = `br-prod` (PUPPET HOST WORK — see Phase 1.5
+#     in docs/infrastructure/windows-server-build-runner-plan.md)
+#
+# Version pin: NOT pinned yet. The DaemonSet pulls the floating upstream
+# `snapshot-thick` tag, so the running image can drift from the v4.2.2
+# manifest vendored here. Pinning would require a private mirror of the image,
+# so for now we trust upstream. Pin to a SHA256 once we mirror to Gitea OCI.
+#
+# Apply (once committed to bluejay-infra main, ApplicationSet auto-syncs):
+#   git add apps/multus/multus.yaml && git commit && git push origin main
+#   # ArgoCD `infra-multus` Application appears within 3 min via ApplicationSet
+#
+# Verify:
+#   kubectl -n kube-system get ds kube-multus-ds
+#   kubectl -n kube-system rollout status ds kube-multus-ds
+#   kubectl get crd network-attachment-definitions.k8s.cni.cncf.io
+# =============================================================================
+
+---
+apiVersion: apiextensions.k8s.io/v1
+kind: CustomResourceDefinition
+metadata:
+  name: network-attachment-definitions.k8s.cni.cncf.io
+  annotations:
+    bluejay.iamworkin.lan/source: "k8snetworkplumbingwg/multus-cni v4.2.2"
+spec:
+  group: k8s.cni.cncf.io
+  scope: Namespaced
+  names:
+    plural: network-attachment-definitions
+    singular: network-attachment-definition
+    kind: NetworkAttachmentDefinition
+    shortNames:
+      - net-attach-def
+  versions:
+    - name: v1
+      served: true
+      storage: true
+      schema:
+        openAPIV3Schema:
+          description: 'NetworkAttachmentDefinition is a CRD schema specified by the Network Plumbing
+            Working Group to express the intent for attaching pods to one or more logical or physical
+            networks. More information available at: https://github.com/k8snetworkplumbingwg/multi-net-spec'
+          type: object
+          properties:
+            apiVersion:
+              type: string
+            kind:
+              type: string
+            metadata:
+              type: object
+            spec:
+              description: 'NetworkAttachmentDefinition spec defines the desired state of a network attachment'
+              type: object
+              properties:
+                config:
+                  description: 'NetworkAttachmentDefinition config is a JSON-formatted CNI configuration'
+                  type: string
+---
+kind: ClusterRole
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: multus
+rules:
+  - apiGroups: ["k8s.cni.cncf.io"]
+    resources:
+      - '*'
+    verbs:
+      - '*'
+  - apiGroups:
+      - ""
+    resources:
+      - pods
+      - pods/status
+    verbs:
+      - get
+      - list
+      - update
+      - watch
+  - apiGroups:
+      - ""
+      - events.k8s.io
+    resources:
+      - events
+    verbs:
+      - create
+      - patch
+      - update
+---
+kind: ClusterRoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: multus
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: multus
+subjects:
+  - kind: ServiceAccount
+    name: multus
+    namespace: kube-system
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: multus
+  namespace: kube-system
+---
+kind: ConfigMap
+apiVersion: v1
+metadata:
+  name: multus-daemon-config
+  namespace: kube-system
+  labels:
+    tier: node
+    app: multus
+data:
+  daemon-config.json: |
+    {
+        "chrootDir": "/hostroot",
+        "cniVersion": "0.3.1",
+        "logLevel": "verbose",
+        "logToStderr": true,
+        "cniConfigDir": "/host/etc/cni/net.d",
+        "multusAutoconfigDir": "/host/etc/cni/net.d",
+        "multusConfigFile": "auto",
+        "socketDir": "/host/run/multus/"
+    }
+---
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: kube-multus-ds
+  namespace: kube-system
+  labels:
+    tier: node
+    app: multus
+    name: multus
+spec:
+  selector:
+    matchLabels:
+      name: multus
+  updateStrategy:
+    type: RollingUpdate
+  template:
+    metadata:
+      labels:
+        tier: node
+        app: multus
+        name: multus
+    spec:
+      hostNetwork: true
+      hostPID: true
+      tolerations:
+        - operator: Exists
+          effect: NoSchedule
+        - operator: Exists
+          effect: NoExecute
+      serviceAccountName: multus
+      containers:
+        - name: kube-multus
+          image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
+          command: [ "/usr/src/multus-cni/bin/multus-daemon" ]
+          resources:
+            requests:
+              cpu: "100m"
+              memory: "50Mi"
+            limits:
+              cpu: "100m"
+              memory: "50Mi"
+          securityContext:
+            privileged: true
+          terminationMessagePolicy: FallbackToLogsOnError
+          volumeMounts:
+            - name: cni
+              mountPath: /host/etc/cni/net.d
+            # multus-daemon expects that cnibin path must be identical between pod and container host.
+            # e.g. if the cni bin is in '/opt/cni/bin' on the container host side, then it should be mount to '/opt/cni/bin' in multus-daemon,
+            # not to any other directory, like '/opt/bin' or '/usr/bin'.
+            - name: cnibin
+              mountPath: /opt/cni/bin
+            - name: host-run
+              mountPath: /host/run
+            - name: host-var-lib-cni-multus
+              mountPath: /var/lib/cni/multus
+            - name: host-var-lib-kubelet
+              mountPath: /var/lib/kubelet
+              mountPropagation: HostToContainer
+            - name: host-run-k8s-cni-cncf-io
+              mountPath: /run/k8s.cni.cncf.io
+            - name: host-run-netns
+              mountPath: /run/netns
+              mountPropagation: HostToContainer
+            - name: multus-daemon-config
+              mountPath: /etc/cni/net.d/multus.d
+              readOnly: true
+            - name: hostroot
+              mountPath: /hostroot
+              mountPropagation: HostToContainer
+            - mountPath: /etc/cni/multus/net.d
+              name: multus-conf-dir
+          env:
+            - name: MULTUS_NODE_NAME
+              valueFrom:
+                fieldRef:
+                  fieldPath: spec.nodeName
+      initContainers:
+        - name: install-multus-binary
+          image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
+          command:
+            - "sh"
+            - "-c"
+            - "cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim && cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru"
+          resources:
+            requests:
+              cpu: "10m"
+              memory: "15Mi"
+          securityContext:
+            privileged: true
+          terminationMessagePolicy: FallbackToLogsOnError
+          volumeMounts:
+            - name: cnibin
+              mountPath: /host/opt/cni/bin
+              mountPropagation: Bidirectional
+      terminationGracePeriodSeconds: 10
+      volumes:
+        - name: cni
+          hostPath:
+            path: /etc/cni/net.d
+        - name: cnibin
+          hostPath:
+            path: /opt/cni/bin
+        - name: hostroot
+          hostPath:
+            path: /
+        - name: multus-daemon-config
+          configMap:
+            name: multus-daemon-config
+            items:
+            - key: daemon-config.json
+              path: daemon-config.json
+        - name: host-run
+          hostPath:
+            path: /run
+        - name: host-var-lib-cni-multus
+          hostPath:
+            path: /var/lib/cni/multus
+        - name: host-var-lib-kubelet
+          hostPath:
+            path: /var/lib/kubelet
+        - name: host-run-k8s-cni-cncf-io
+          hostPath:
+            path: /run/k8s.cni.cncf.io
+        - name: host-run-netns
+          hostPath:
+            path: /run/netns/
+        - name: multus-conf-dir
+          hostPath:
+            path: /etc/cni/multus/net.d
Author	SHA1	Message	Date
Codex	b998f50f48	fix(ci1): switch ISO delivery to containerDisk OCI image (Path C) OCI image: localhost/win-server-2025:1.0 (8.27 GB) Built FROM scratch + ADD disk.img → /disk/disk.img on noc1, podman saved as tar (8.27 GB), SCP'd in parallel to all 3 RKE2 nodes, imported via ctr in k8s.io namespace. Verified present on all 3 schedulable nodes (rke2-server, rke2-agent1, rke2-agent2). Why containerDisk over the prior PVC paths: - Path A (Longhorn Filesystem PVC, sata): OVMF BdsDxe SATA-CDROM read timeout. Cdrom-backed PVC is too slow for OVMF's first-sector read window. - Path B (Synology NFS): uid 107 (qemu) denied at directory level by Synology export ACL despite file mode 0777. Memory: feedback_synology_iso_export_root_only_uid_107_denied. - Path B+SCSI: same OVMF timeout, just on SCSI controller. Bus choice was not load-bearing — the issue was always the slow PVC backing. - Path C (this commit): containerDisk delivers the ISO bytes from a tmpfs view of the OCI layer, no PVC controller in the read path. qemu reads at native FS speed; OVMF first-sector read completes well within timeout. This is also the KubeVirt-recommended pattern for installer ISOs. Connects to FlowerCore.Distribution / Provisioning USB story: same "OCI image of the OS installer + autounattend on a sysprep CDROM" pattern that the USB provisioning agent will use. The Windows install proceeds hands-off via the existing autounattend.xml in ci1-autounattend ConfigMap (RDP enabled, WinRM, UAC disabled, Administrator password from 1Password vault item h3ix4mgfk65gmkcmvh6ly3d3hu). Image lifecycle: bump tag (1.1, 1.2, ...) when ISO version changes, rebuild on noc1, redistribute to RKE2 nodes, update image: line. Legacy NFS PVC + PV manifest and CDI Longhorn PVC RETAINED for this commit so prior states are recoverable. Will prune in follow-up once containerDisk boot proves. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:45:38 -05:00
Codex	8fd9ae1cd3	fix(ci1): revert NFS Path B + flip ISO cdrom bus sata→scsi NFS Path B (commit `fc2aca0`) failed at storage layer: Synology export `/volume1/ISOs` denies non-root client UIDs at the directory level. qemu uid 107 cannot `ls /iso/` even though disk.img is mode 0777. Diagnosed via uid-107 + uid-0 busybox probe pods on rke2-agent2: - libvirt error: "Cannot access storage file ... Permission denied" (virStorageSourceReportBrokenChain:1281, virError Code=38 Domain=18) - uid 107 pod: "ls: can't open '/iso/': Permission denied" - uid 0 pod (same mount): "drwxrwxrwx 1 root root 16 ... disk.img" - SELinux Enforcing + virt_use_nfs=on, no AVC denials → not SELinux - File mode 0777 with owner 107:107 → not POSIX Same export-only-root pattern as `/volume1/kubernetes`. Memory: feedback_synology_iso_export_root_only_uid_107_denied.md Existing CDI-uploaded Longhorn PVC `windows-server-2025-iso` (10Gi Filesystem mode) verified to contain valid ISO bytes readable by uid 107 (mode 0660 root:107, 9.85 GB sparse, 8.27 GB blocks ≈ original 7.7 GB ISO). Reverting to it. The original OVMF SATA-CDROM read timeout that drove yesterday's NFS pivot is now addressed by `cdrom: bus: scsi` (virtio-scsi has a longer read window than the IDE/SATA emulator). Per user-prompt diagnostic chain Step 5. NFS PVC + PV (apps/kubevirt-vms/win2025-iso-nfs-pv.yaml) RETAINED so Path B state is recoverable; can be pruned in follow-up once SCSI boot is proven. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 18:54:36 -05:00
Codex	fc2aca0e9e	fix(ci1): mount Windows ISO via Synology NFS (Path B for SATA-CDROM timeout) Previous fix attempts confirmed the Longhorn-backed Filesystem PVC contains a perfectly valid bootable ISO9660 image. The bug is that SATA-CDROM emulation reading from a Longhorn Filesystem PVC is too slow for OVMF's boot read window — DVD-ROM enumeration times out before the bootloader loads. Symptom on the serial console: BdsDxe: failed to start Boot0001 "UEFI QEMU DVD-ROM QM00001 " ... Time out BdsDxe: No bootable option or device was found Block-mode PVC (Path A) was attempted and would likely fix the timing, but CDI v1.65.0's upload-target pod cannot open the underlying block device (runAsUser:107 + capabilities.drop:[ALL]): blockdev: cannot open /dev/cdi-block-volume: Permission denied Path B (this change): mount the ISO directly from Synology NAS over NFSv4.1. Bypasses both the Longhorn slowness and the CDI permission issue. QEMU's SATA emulator reads at native LAN speed. Layout: /volume1/ISOs/ — existing Synology export, RKE2 ACL already granted /volume1/ISOs/win2025-iso-disk/disk.img — new subdir, hardlink to the ISO file, named so KubeVirt's launcher finds it at the PV root A hardlink (not symlink) is required because symlinks with relative targets pointing to the parent directory are broken when the NFS PV sub-mounts the subdir as its root. Validated 2026-05-08 from rke2-server, rke2-agent1, rke2-agent2: mount -t nfs -o nfsvers=4.1,ro 10.0.58.3:/volume1/ISOs/win2025-iso-disk file disk.img -> ISO 9660 CD-ROM filesystem data ... (bootable) The original Longhorn Filesystem ISO PVC is RETAINED unused (so ArgoCD doesn't prune the populated PVC and so we have a fallback). Can be removed in a follow-up commit after the NFS path is proven on a successful Windows install. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 17:03:42 -05:00
Codex	ba18c52130	docs(ci1): record open rootdisk-flock and SATA-CDROM-timeout issues Documenting the remaining 2 unresolved issues for the next operator session, with the recovery paths from this session captured inline so the next agent doesn't repeat the same blind alleys: 1. rootdisk QEMU flock — every new launcher pod fails QEMU start with `Failed to get "write" lock` on the rootdisk Filesystem-mode disk.img. Stale flock from a previous force-deleted virt-launcher pod. Longhorn engine on rke2-agent2 needs to release the lock; `kubectl patch volume.longhorn.io/<pvc-name> spec.nodeID=""` is reverted by the Longhorn controller. Operator-level recovery only. 2. SATA CDROM read timeout — even with bootOrder=1 (windows-iso first), OVMF UEFI fails Boot0001 with "Time out" reading the SATA CDROM backed by the Filesystem-mode PVC. Block-mode DataVolume migration was attempted but blocked by CDI v1.65.0's upload pod running with `capabilities.drop: [ALL]` and `runAsUser: 107`, preventing direct block-device writes (`blockdev: cannot open /dev/cdi-block-volume: Permission denied`). See ISO PVC header docstring for 3 forward paths. Net commits during this session: - `1c4145a`: bootOrder swap (windows-iso=1, rootdisk=2) - `87a7d7c`: deprecated `running:` -> `runStrategy: Always` - `0bf47df`: ISO migration to Block-mode DataVolume (REVERTED) - `9f6dc1a`: revert to Filesystem PVC (CDI block-upload blocked) - `1c4145a` + `87a7d7c` + `9f6dc1a` are the live, correct configuration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 15:18:38 -05:00
Codex	9f6dc1a9d5	fix(ci1): revert ISO to Filesystem PVC; CDI v1.65.0 block-upload pod blocked by capability drop The Block-mode DataVolume migration (commit `0bf47df`) hit a CDI v1.65.0 limitation: the upload-target pod runs as uid 107 with `capabilities.drop: [ALL]`, so it cannot open the underlying block device: blockdev: cannot open /dev/cdi-block-volume: Permission denied Saving stream failed: Unable to transfer source data to target file: error determining if block device exists: exit status 1 Reverting to a Filesystem-mode PVC + virtctl image-upload pvc, which DID work (uploaded the 7.7 GiB ISO with valid ISO9660 magic intact). Boot timeout is unresolved (header docstring captures the open issue + 3 paths to revisit). The bootOrder swap (`1c4145a`) and runStrategy migration (`87a7d7c`) stay landed — those are correct improvements regardless of the volume-mode question. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 14:32:52 -05:00
Codex	0bf47dfa33	fix(ci1): switch ISO from filesystem PVC to Block-mode DataVolume The bootOrder swap alone didn't fix the install: even with `windows-iso` at bootOrder:1, OVMF UEFI still timed out reading the SATA CDROM: BdsDxe: starting Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ... Sata(...) BdsDxe: failed to start Boot0001 ... : Time out BdsDxe: No bootable option or device was found. Diagnosis (debug pod mounting the live PVC): - /pvc/disk.img IS a valid bootable ISO9660 image; `file` reports "ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)". - bytes 0..15: zeros (NOT QCOW2 magic 51 46 49 fb). - bytes 32769..32773: "CD001", the ISO9660 primary volume descriptor at the correct offset. So content was fine. The bug is in how KubeVirt + QEMU + Longhorn expose a Filesystem-mode PVC's `/disk.img` as a SATA CDROM. With Block-mode the underlying volume IS the raw ISO9660 sectors, OVMF reads them directly, no QEMU file-emulation layer. This is the recommended pattern for ISO install media on KubeVirt + Longhorn. Migration: - Replace `kind: PersistentVolumeClaim` with `kind: DataVolume` (CDI manages the underlying PVC + upload-target pod). - Set `pvc.volumeMode: Block`. - Annotate `cdi.kubevirt.io/storage.contentType: kubevirt` so CDI keeps raw bytes (no QCOW2 wrap). - VM volume reference changes from `persistentVolumeClaim.claimName` to `dataVolume.name`. KubeVirt's VMI controller blocks VM start until DV phase is Succeeded (upload completed). Operator step after this lands: 1. Wait for DV `phase: UploadReady` kubectl get dv -n kubevirt-vms windows-server-2025-iso -w 2. virtctl image-upload dv windows-server-2025-iso -n kubevirt-vms \ --image-path "...\en-us_windows_server_2025...iso" \ --uploadproxy-url https://localhost:8443 --insecure --no-create 3. Re-flip runStrategy to Always (was set to Halted live-side during migration; this commit keeps the manifest at Always). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 14:23:31 -05:00
Codex	87a7d7c70a	fix(ci1): switch deprecated `running: true` -> `runStrategy: Always` Required to clear OutOfSync state after the bootOrder fix. Live VM had runStrategy: Halted (set during diagnosis to release the PVC for inspection). Manifest had running: true. KubeVirt's validating webhook rejects sync: admission webhook "virtualmachine-validator.kubevirt.io" denied the request: Running and RunStrategy are mutually exclusive. Switching to runStrategy: Always preserves the original "auto-start + auto-restart" semantics with the non-deprecated field, and gives ArgoCD a clean diff target to flip Halted -> Always. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 14:12:07 -05:00
Codex	1c4145a581	fix(ci1): swap bootOrder so Windows install ISO boots first Original order: rootdisk=1 (empty 200Gi virtio), windows-iso=2 (SATA CDROM). UEFI tried the empty virtio disk first, got nothing, fell back to Boot0001 (the SATA CDROM) with a short timeout, and aborted with: BdsDxe: failed to start Boot0001 ... Time out BdsDxe: No bootable option or device was found. VM had been running 38+ min with rootdisk actualSize stuck at 4.13 GiB and no AgentConnected condition — install never started. Diagnosis via debug pod mounting the windows-server-2025-iso PVC: /pvc/disk.img: ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable) bytes 0..15: zeros (NOT QCOW2 magic 51 46 49 fb) bytes 32769..32773: "CD001" (ISO9660 primary volume descriptor) So the PVC content is a real bootable ISO — the only fix needed is to make the ISO bootOrder=1 for first install. After Windows installs, it writes its own UEFI Boot#### entries pointing at the rootdisk EFI partition; UEFI then boots from rootdisk going forward and the ISO at bootOrder:2 is a fallback for re-install scenarios. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 14:10:17 -05:00
Codex	c50a403f74	fix(infra): pin virtio-container-disk to v1.8.2 (containerd 2.1 manifest fix) KubeVirt v1.4.0 + RKE2 containerd 2.1.5 cannot pull quay.io/kubevirt/virtio-container-disk:latest: rpc error: code = Unimplemented desc = failed to pull and unpack image: not implemented: media type "application/vnd.docker.distribution.manifest.v1+prettyjws" is no longer supported since containerd v2.1, please rebuild the image as "application/vnd.docker.distribution.manifest.v2+json" or "application/vnd.oci.image.manifest.v1+json" The :latest tag was last rebuilt with the v1 manifest schema. Tagged versions v1.6.5+, v1.7.3, v1.8.2 are rebuilt with v2/OCI manifests. Pinning to v1.8.2 (newest available, contains current Windows VirtIO drivers). The image only contains the Windows VirtIO driver ISO mounted as a CDROM — not the KubeVirt runtime — so it is decoupled from the cluster KubeVirt version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 13:28:22 -05:00
Codex	fb7bd10528	feat(infra): activate ci1 VM — running:true + 10Gi ISO PVC + 1P password Phase 1 prereqs all satisfied: - Multus CNI v4.2.2 thick-plugin DS Running on rke2-server/agent1/agent2 - CDI v1.65.0 operator + CR Deployed (cdi-apiserver/deployment/uploadproxy all Running 1/1) - Windows Server 2025 ISO (7.7GiB, March 2026 update) uploaded via CDI virtctl image-upload to PVC windows-server-2025-iso. Verified via PVC annotations: cdi.kubevirt.io/storage.condition.running.message="Upload Complete", storage.pod.phase="Succeeded" - Local Administrator password generated (26 char, FANTASTIC strength). Stored in 1Password vault IAmWorkin (qaphopopkryhbg353ukzhhuqoq) item h3ix4mgfk65gmkcmvh6ly3d3hu. UTF-16-LE base64 in autounattend.xml Value field matches the 1P "autounattend AdministratorPassword Value" field. Changes: - ISO PVC bumped 6Gi → 10Gi (ISO is 7.7GiB, need headroom) - Added labels app=ci-runner, flowercore.io/managed-by=bluejay-infra - autounattend.xml AdministratorPassword Value: real base64-encoded password - spec.running: false → true (VM starts on next ArgoCD sync) - Header comment refreshed to LIVE state with prereq references Network: still pod-network masquerade. Multus NAD prod-vlan57 is registered but the VM doesn't use it yet (Phase 1.5 host bridge needed first). Verify after sync: kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml -n kubevirt-vms get vm,vmi virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml vnc ci1 -n kubevirt-vms Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 13:24:46 -05:00
Codex	6c21d14a98	deploy(fc-updater): bump image to v20260508-pub3-deepening-2bdf108 Promotes the fleet to FlowerCore.Updater main @ 2bdf108 which combines: - PR #6 publish pre-signed releases (1a188f4) - PR #7 deeper public-host privacy enforcement (8cd8544) - PublishPreSignedAsync(stream, sig) Integration coverage (2bdf108) Live image already imported to rke2-server and rolled via deploy-web.ps1. This commit aligns the bluejay-infra source of truth so ArgoCD doesn't snap the spec back to the previous tag (per feedback_argocd_managed_image_overrides_do_not_stick).	2026-05-08 13:07:24 -05:00
Codex	b3529f8e96	feat(infra): add Multus CNI + CDI + PROD VLAN 57 NAD as GitOps prereqs for ci1 Adds three new bluejay-infra apps that auto-pickup via ApplicationSet (apps/* directory generator on main): * apps/multus/multus.yaml — Multus CNI v4.2.2 thick-plugin daemonset (verbatim upstream, project-annotated). Enables KubeVirt VMs to attach additional network interfaces. Required by ci1 to bridge onto PROD VLAN 57. * apps/cdi/{cdi-operator.yaml,cdi-cr.yaml,README.md} — Containerized Data Importer v1.65.0 (verbatim upstream). Operator + CR pattern. Enables populating PVCs from HTTP/registry/upload sources, used to load the Windows Server 2025 ISO into the windows-server-2025-iso PVC. * apps/kubevirt-vms/prod-vlan57-nad.yaml — NetworkAttachmentDefinition for PROD VLAN 57 bridge. Deploy gated on Phase 1.5 host work: requires br-prod bridge enslaving enp86s0.57 on each RKE2 node (Puppet config-as-code). ci1.yaml continues to use pod-network masquerade until that lands; switching to multus.networkName: kubevirt-vms/prod-vlan57 is a one-line YAML edit followed by a GitOps push. Cluster verification (2026-05-08): - KubeVirt LIVE (3 nodes, virt-api/controller/handler/operator all Running) - Calico CNI on /etc/cni/net.d + /opt/cni/bin (Multus default paths) - ApplicationSet `bluejay-infra` already watches `apps/*` on main Reproducibility: upstream YAMLs vendored verbatim with project header diffs only. Bumping versions = re-curl + git push. No deploy-time internet fetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 13:05:58 -05:00
Codex	00c11b4eaa	feat(infra): stage ci1 Windows Server 2025 KubeVirt VM (Phase 1, NOT YET APPLIED) Stages a draft VirtualMachine + Namespace + ISO PVC + rootdisk PVC + sysprep ConfigMap for the dedicated GitHub Actions self-hosted runner that replaces the never-registered bluejay-ws-sandbox-1 placeholder. Status: STAGED ONLY. spec.running = false. ISO PVC empty. Two operator decisions still pending before this can boot: 1. Network choice — pod-network fallback (in this draft) vs Multus + PROD VLAN NAD (preferred, requires Multus install). 2. ISO path — manual upload via helper pod (Path A) vs CDI HTTP import (Path B, requires CDI install). Cluster baseline 2026-05-08: - KubeVirt operator: installed, healthy, 14d - CDI: NOT installed - Multus: NOT installed - Calico-only CNI See docs/infrastructure/windows-server-build-runner-plan.md "Phase 1 readiness gate" for the full operator pickup checklist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 12:32:47 -05:00