From 6e581d2879359a2e2ca96de238bd9b44e29ed5db Mon Sep 17 00:00:00 2001
From: Andrew Stoltz
Date: Wed, 20 May 2026 11:27:21 -0500
Subject: [PATCH] docs(fc-build-windows): capture runner operator gate

---
 apps/fc-build-windows/README.md               | 263 ++++++++++++++++++
 apps/fc-build-windows/kustomization.yaml      |   4 +
 .../operator-gate-configmap.yaml              |  61 ++++
 3 files changed, 328 insertions(+)
 create mode 100644 apps/fc-build-windows/README.md
 create mode 100644 apps/fc-build-windows/kustomization.yaml
 create mode 100644 apps/fc-build-windows/operator-gate-configmap.yaml

diff --git a/apps/fc-build-windows/README.md b/apps/fc-build-windows/README.md
new file mode 100644
index 0000000..a4bcfcc
--- /dev/null
+++ b/apps/fc-build-windows/README.md
@@ -0,0 +1,263 @@
+# fc-build-windows runner gate
+
+Status: OPEN-WITH-OPERATOR-ACTION as of 2026-05-20.
+
+This directory is intentionally not a live runner deployment. It records the
+exact gate for bringing up the Windows self-hosted runner fleet without faking
+capacity in GitHub or Kubernetes.
+
+## Lane evidence
+
+- `D:\git\FlowerCore\FlowerCore.Notes\docs\dashboards\decisions-waiting.html`
+  lines 15078-15085: Q-MR-82 says the Updater Windows Sandbox E2E run is
+  queued and `bluejay-ws-sandbox-1` is offline.
+- `D:\git\FlowerCore\FlowerCore.Notes\memory\project_morning_routine_8_2026_05_20.md`:
+  Morning Routine #8 carries Q-MR-82 as the fleet-wide Windows runner gap.
+- `D:\git\FlowerCore\FlowerCore.Notes\docs\standards\sprint-37-codex-dispatch-log-2026-05-19.md`
+  lines 76, 84-85, and 97: keep BLUEJAY-WS out of runner plans, merge Linux
+  runner expansion separately, and keep true Windows-only workflows parked on
+  the Windows runner host substrate path.
+- `D:\git\FlowerCore\FlowerCore.Notes\docs\ai-agents\codex-prompts\2026-05-20-xxxxl-sprint-42-orchestrator-briefs.md`
+  lane Cx-5: land a deployment only if a Windows runner image/substrate is
+  ready; otherwise commit an operator-action gate.
+- `D:\git\FlowerCore\FlowerCore.Notes\memory\feedback_bluejay_ws_never_a_github_runner.md`:
+  BLUEJAY-WS is operator-only territory; Windows runners belong on a dedicated
+  KubeVirt Windows VM such as `ci1` or a sibling VM.
+
+## Live probe summary
+
+Commands run on 2026-05-20 from `D:\git\FlowerCore\bluejay-infra`:
+
+```powershell
+$env:KUBECONFIG="$env:USERPROFILE\.kube\rke2.yaml"
+kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.kubernetes\.io/os}{"\n"}{end}'
+```
+
+Result: `rke2-agent1`, `rke2-agent2`, and `rke2-server` all report
+`kubernetes.io/os=linux`. There is no Windows Kubernetes node, so Windows
+containers on RKE2 cannot satisfy `fc-build-windows`.
+
+```powershell
+kubectl -n kubevirt-vms get vm,vmi,pods -o wide
+```
+
+Result: KubeVirt is healthy and `ci1` is `Running` / `Ready=True` on
+`rke2-agent1` with VMI IP `10.42.103.35`.
+
+```powershell
+virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml port-forward vm/ci1.kubevirt-vms 15985:5985
+```
+
+Result during port tests: `dial tcp 10.42.103.35:5985: connect: no route to
+host`. The same result was seen for RDP 3389 and SSH 22. The VM exists, but it
+is not remotely reachable for runner bootstrap from this lane.
+
+```powershell
+gh api /repos/astoltz/FlowerCore.Updater/actions/runners `
+  --jq '.runners[]? | {name,status,busy,labels:[.labels[].name]}'
+gh run list --repo astoltz/FlowerCore.Updater `
+  --workflow "Updater Windows Sandbox E2E" --limit 5
+```
+
+Result: GitHub has one Updater runner, `bluejay-ws-sandbox-1`, with
+`status=offline`; run `26150689447` is still `queued`.
+
+## Feasibility classification
+
+### Option A: Windows containers on RKE2
+
+Not feasible without physical infrastructure work by the operator. Kubernetes
+Windows containers require a Windows node. The current cluster has Linux-only
+RKE2 nodes.
+
+### Option B: KubeVirt Windows VM
+
+Partially present, not deployable from this lane.
+
+`apps/kubevirt-vms/ci1.yaml` already defines a Windows Server 2025 KubeVirt VM
+using `localhost/fc-win-server-2025:v1`, and the live VM is running. However:
+
+- the guest is not reachable over RDP, WinRM, or SSH through `virtctl
+  port-forward`;
+- the current root disk is a `containerDisk`, so runner installation inside the
+  running guest is not a durable fleet state unless the first-boot automation
+  re-registers on every boot or the VM is moved to a persistent PVC-backed
+  disk;
+- FC.Updater `Updater Windows Sandbox E2E` uses
+  `[self-hosted, windows, windows-sandbox]`, while `fc-build-windows` build jobs
+  use `[self-hosted, windows, fc-build-windows]`. Do not advertise
+  `windows-sandbox` until Windows Sandbox has been proven in the guest.
+
+### Option C: bluejay-ws-sandbox-1
+
+Operator-only emergency fallback. GitHub shows it registered but offline. The
+current memory says BLUEJAY-WS must not be a fleet runner host, so this lane
+does not start or re-register it. If the operator deliberately overrides the
+policy to drain an emergency queue, start the existing visible runner console
+from the BLUEJAY-WS desktop and treat that as temporary break-glass, not the
+permanent Q-MR-82 closure.
+
+## Operator action plan
+
+### 1. Pick the Windows host class
+
+Use `ci1` or a sibling Windows Server 2025 VM for WPF build/test jobs that need
+`fc-build-windows`.
+
+Use a Windows 11 Pro/Enterprise KubeVirt VM for Updater or WorldBuilder
+Windows Sandbox gates, unless Windows Sandbox support is explicitly proven on
+the selected guest. The workflow labels must match the real capability:
+
+- WPF build runner: `self-hosted,windows,fc-build-windows,ci1`
+- Sandbox runner: `self-hosted,windows,windows-sandbox,ci-sandbox1`
+
+### 2. Make the VM reachable and durable
+
+From BLUEJAY-WS:
+
+```powershell
+$env:KUBECONFIG="$env:USERPROFILE\.kube\rke2.yaml"
+kubectl -n kubevirt-vms get vm,vmi,pods -o wide
+virtctl --kubeconfig $env:KUBECONFIG vnc ci1 -n kubevirt-vms
+virtctl --kubeconfig $env:KUBECONFIG port-forward vm/ci1.kubevirt-vms 13389:3389
+virtctl --kubeconfig $env:KUBECONFIG port-forward vm/ci1.kubevirt-vms 15985:5985
+```
+
+Before runner registration, fix the current port-forward failure. The expected
+state is that RDP or WinRM accepts a connection through the control plane.
+
+For durability, either:
+
+- move the runner VM to a persistent PVC-backed root disk; or
+- keep `containerDisk` and bake first-boot runner registration into the sysprep
+  flow using a non-expiring credential lookup path.
+
+Do not install a runner by hand into a transient VM and call Q-MR-82 closed.
+
+### 3. Install runner prerequisites inside the VM
+
+Run in an elevated PowerShell session in the Windows runner guest:
+
+```powershell
+winget install Microsoft.DotNet.SDK.10 --silent
+winget install Microsoft.DotNet.DesktopRuntime.8 --silent
+winget install Microsoft.PowerShell --silent
+winget install Git.Git --silent
+winget install Microsoft.VisualStudio.2022.BuildTools --silent
+winget install Google.Chrome --silent
+```
+
+For a Sandbox-capable runner only:
+
+```powershell
+Enable-WindowsOptionalFeature -Online -FeatureName Containers-DisposableClientVM -All
+Restart-Computer -Force
+```
+
+After reboot:
+
+```powershell
+Get-CimInstance -ClassName Win32_OptionalFeature -Filter "Name='Containers-DisposableClientVM'"
+Test-Path C:\Windows\System32\WindowsSandbox.exe
+```
+
+### 4. Register repo-scoped GitHub runners
+
+The `astoltz` account uses repo-scoped runners. Generate a fresh one-hour
+registration token per repo immediately before `config.cmd`.
+
+From a trusted operator shell with `gh` authenticated:
+
+```powershell
+$repos = @(
+  "FlowerCore.Updater",
+  "FlowerCore.WorldBuilder",
+  "FlowerCore.DeviceManagement"
+)
+
+foreach ($repo in $repos) {
+  $token = gh api -X POST "/repos/astoltz/$repo/actions/runners/registration-token" --jq .token
+  $repoSlug = $repo.ToLowerInvariant().Replace("flowercore.", "").Replace(".", "-")
+  $runnerDir = "C:\fc-ghr\$repoSlug-fc-build-windows"
+
+  New-Item -ItemType Directory -Force -Path $runnerDir | Out-Null
+  Set-Location $runnerDir
+
+  if (-not (Test-Path ".\config.cmd")) {
+    Invoke-WebRequest `
+      -Uri "https://github.com/actions/runner/releases/download/v2.323.0/actions-runner-win-x64-2.323.0.zip" `
+      -OutFile "actions-runner.zip"
+    Add-Type -AssemblyName System.IO.Compression.FileSystem
+    [System.IO.Compression.ZipFile]::ExtractToDirectory((Resolve-Path actions-runner.zip).Path, $runnerDir)
+  }
+
+  .\config.cmd `
+    --url "https://github.com/astoltz/$repo" `
+    --token $token `
+    --name "ci1-$repoSlug-fc-build-windows" `
+    --labels "self-hosted,windows,fc-build-windows,ci1" `
+    --work "_work" `
+    --unattended `
+    --replace `
+    --runasservice
+
+  Get-Service 'actions.runner.*'  # --runasservice installs and starts the service; confirm Running
+}
+```
+
+For Updater Sandbox E2E, register only after the guest proves Sandbox support,
+and use `windows-sandbox` labels:
+
+```powershell
+$token = gh api -X POST "/repos/astoltz/FlowerCore.Updater/actions/runners/registration-token" --jq .token
+.\config.cmd `
+  --url "https://github.com/astoltz/FlowerCore.Updater" `
+  --token $token `
+  --name "ci-sandbox1-updater" `
+  --labels "self-hosted,windows,windows-sandbox,ci-sandbox1" `
+  --work "_work" `
+  --unattended `
+  --replace
+```
+
+Keep registration tokens out of Git and logs. The durable credential source for
+automation should be the existing 1Password item named `GitHub PAT (Runner
+Registration)`, used only to mint short-lived repo registration tokens.
+
+### 5. Verify GitHub and workflow pickup
+
+```powershell
+gh api /repos/astoltz/FlowerCore.Updater/actions/runners `
+  --jq '.runners[] | select(any(.labels[]; .name == "windows-sandbox")) | {name,status,busy,labels:[.labels[].name]}'
+
+gh api /repos/astoltz/FlowerCore.DeviceManagement/actions/runners `
+  --jq '.runners[] | select(any(.labels[]; .name == "fc-build-windows")) | {name,status,busy,labels:[.labels[].name]}'
+
+gh run list --repo astoltz/FlowerCore.Updater `
+  --workflow "Updater Windows Sandbox E2E" --limit 3
+```
+
+Q-MR-82 can be marked resolved only after the Updater run moves from `queued` to
+`in_progress` or `completed` on an online runner, or after the affected WPF
+build repos show online `fc-build-windows` repo-scoped runners and their queued
+jobs start.
+
+## Break-glass BLUEJAY-WS command
+
+Only if the operator explicitly overrides the "BLUEJAY-WS is not a runner"
+policy to drain a queue:
+
+```powershell
+Set-Location C:\fc-ghr\updater-sandbox
+.\run.cmd
+```
+
+If a Windows service exists:
+
+```powershell
+Get-Service 'actions.runner.*'
+Start-Service 'actions.runner.*'
+```
+
+This does not close Q-MR-82 permanently. It is a temporary queue drain until a
+dedicated VM runner is online.
diff --git a/apps/fc-build-windows/kustomization.yaml b/apps/fc-build-windows/kustomization.yaml
new file mode 100644
index 0000000..4d613b7
--- /dev/null
+++ b/apps/fc-build-windows/kustomization.yaml
@@ -0,0 +1,4 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+  - operator-gate-configmap.yaml
diff --git a/apps/fc-build-windows/operator-gate-configmap.yaml b/apps/fc-build-windows/operator-gate-configmap.yaml
new file mode 100644
index 0000000..b10814c
--- /dev/null
+++ b/apps/fc-build-windows/operator-gate-configmap.yaml
@@ -0,0 +1,61 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: fc-build-windows-operator-gate
+  namespace: kubevirt-vms
+  labels:
+    app.kubernetes.io/name: fc-build-windows
+    app.kubernetes.io/component: operator-gate
+    app.kubernetes.io/part-of: github-runner
+    flowercore.io/q-card: Q-MR-82
+  annotations:
+    flowercore.io/outcome: OPEN-WITH-OPERATOR-ACTION
+    flowercore.io/live-runner: "false"
+data:
+  outcome: OPEN-WITH-OPERATOR-ACTION
+  gate.md: |
+    Do not treat this ConfigMap as runner capacity.
+
+    Current probe, 2026-05-20:
+    - RKE2 nodes are Linux-only; Windows containers require a Windows node.
+    - KubeVirt `ci1` is Running/Ready, but RDP 3389, WinRM 5985, and SSH 22
+      through `virtctl port-forward` return `connect: no route to host`.
+    - GitHub Updater runner list has only `bluejay-ws-sandbox-1`, status
+      offline. Updater Windows Sandbox E2E run 26150689447 remains queued.
+
+    Required operator action:
+    1. Make a dedicated Windows VM reachable and durable.
+    2. Install .NET 10 SDK, .NET 8 Desktop Runtime, Git, VS Build Tools, and
+       PowerShell 7.
+    3. Register repo-scoped runners with short-lived GitHub registration tokens.
+    4. Add `fc-build-windows` labels only to WPF build-capable guests.
+    5. Add `windows-sandbox` labels only after Sandbox support is proven.
+  registration-token-pattern.ps1: |
+    $repo = "FlowerCore.Updater"
+    $token = gh api -X POST "/repos/astoltz/$repo/actions/runners/registration-token" --jq .token
+    $runnerDir = "C:\fc-ghr\updater-fc-build-windows"
+
+    New-Item -ItemType Directory -Force -Path $runnerDir | Out-Null
+    Set-Location $runnerDir
+
+    # Install the Actions runner package here if config.cmd is absent.
+    .\config.cmd `
+      --url "https://github.com/astoltz/$repo" `
+      --token $token `
+      --name "ci1-updater-fc-build-windows" `
+      --labels "self-hosted,windows,fc-build-windows,ci1" `
+      --work "_work" `
+      --unattended `
+      --replace `
+      --runasservice
+
+    Get-Service 'actions.runner.*'  # --runasservice installs and starts the service; confirm Running
+  verification.ps1: |
+    gh api /repos/astoltz/FlowerCore.Updater/actions/runners `
+      --jq '.runners[] | {name,status,busy,labels:[.labels[].name]}'
+
+    gh run list --repo astoltz/FlowerCore.Updater `
+      --workflow "Updater Windows Sandbox E2E" --limit 3
+
+    $env:KUBECONFIG="$env:USERPROFILE\.kube\rke2.yaml"
+    kubectl -n kubevirt-vms get vm,vmi,pods -o wide