ci(github-runner): add Phase 2 ephemeral Linux runner K8s manifest

Namespace github-runner with a myoung34/github-runner:latest Deployment, a 5Gi Longhorn RWO NuGet cache PVC, a zero-privilege ServiceAccount, and a OnePasswordItem CRD for the registration token. EPHEMERAL=true mode re-registers after each job; the Recreate strategy avoids RWO multi-attach errors. Targets the fc-build-linux label; a single replica is pinned to the rke2-server node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

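Registration tokens expire after one hour, and an EPHEMERAL=true runner registers again after every job, so a static `RUNNER_TOKEN` eventually stops working. A minimal sketch of minting a fresh token from a PAT through GitHub's REST endpoint `POST /repos/{owner}/{repo}/actions/runners/registration-token`, assuming a repo-scoped runner. The `GITHUB_PAT` variable and both helper names are hypothetical, and the runner image may be able to do this itself when given a PAT.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Map https://github.com/OWNER/REPO to the REST endpoint that creates a
# repository-scoped runner registration token.
registration_token_url() {
  local repo_url="${1%/}"                        # tolerate a trailing slash
  local path="${repo_url#https://github.com/}"   # OWNER/REPO
  if [[ "$path" == "$repo_url" || "$path" != */* ]]; then
    echo "expected https://github.com/OWNER/REPO, got: $1" >&2
    return 1
  fi
  printf 'https://api.github.com/repos/%s/actions/runners/registration-token\n' "$path"
}

# Mint a fresh registration token (valid for about one hour).
# GITHUB_PAT is a hypothetical variable holding a PAT that may administer
# the repository; requires curl and jq.
mint_registration_token() {
  local url
  url=$(registration_token_url "$1")
  curl -fsS -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_PAT:?GITHUB_PAT must be set}" \
    "$url" | jq -r '.token'
}
```

For example, `RUNNER_TOKEN=$(mint_registration_token https://github.com/astoltz/FlowerCore.Common)` would run before each registration.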
Add Pi signage Phase 1 player artifacts
2026-05-14 12:46:25 -05:00 · 2026-05-14 01:46:09 +00:00 · 2026-05-13 20:32:48 -05:00 · 2026-05-12 16:58:18 -05:00 · 2026-05-12 09:26:03 -05:00 · 2026-05-11 19:02:58 -05:00
28 changed files with 1261 additions and 432 deletions
--- a/apps/fc-redis/fc-redis.yaml
+++ b/apps/fc-redis/fc-redis.yaml
@@ -0,0 +1,171 @@
 # fc-redis — SignalR backplane for cross-product event bus
 #
 # Lands per Q-SO-1 resolution (2026-05-11 PM): SignalR backplane in Phase A,
 # not Phase C as originally drafted. Operator directive: "Redis can be
 # deployed just fine as it's another FlowerCore technology we'll want to
 # manage."
 #
 # Phase A scope (this file):
 #   - Single Redis 7.x Alpine pod
 #   - 1Gi Longhorn RWO PVC for AOF persistence
 #   - ClusterIP Service at `redis.fc-redis.svc.cluster.local:6379`
 #   - No AUTH (in-cluster only; not exposed externally)
 #   - No IngressRoute (backplane is server-to-server only)
 #
 # Consumers (Phase A IMPL across FC services):
 #   - FlowerCore.Signage.Web (OpsConsoleHub)
 #   - FlowerCore.Scoreboard.Web (ScoreboardHub)
 #   - FlowerCore.SignalControl.Web
 #   - FlowerCore.DMS.Web
 #   - Any other product joining the cross-product event bus
 #
 # Each consumer adds:
 #   services.AddSignalR()
 #           .AddStackExchangeRedis(
 #               "redis.fc-redis.svc.cluster.local:6379",
 #               opts => opts.Configuration.ChannelPrefix =
 #                   StackExchange.Redis.RedisChannel.Literal("fc-opsconsole"));
 #
 # Phase B / C follow-ons (out of scope here):
 #   - Redis Sentinel for HA (3-node)
 #   - AUTH password from 1Password Connect (rotate via /rotate-password)
 #   - redis_exporter sidecar for Prometheus scrape
 #   - Network policies restricting which namespaces can dial 6379
 #
 # Design: docs/signage/operations-console-phase-2-design.md §3.5
 # Decision: Q-SO-1 (RESOLVED 2026-05-11 PM)
 # Memory: feedback_blooming_ui_pattern_no_iframes
 ---
 apiVersion: v1
 kind: Namespace
 metadata:
  name: fc-redis
  labels:
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
 ---
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: fc-redis-data
  namespace: fc-redis
 spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: fc-redis-config
  namespace: fc-redis
 data:
  redis.conf: |
    # Phase A — minimal config; no AUTH, no replication.
    bind 0.0.0.0
    protected-mode no
    port 6379
    tcp-backlog 511
    timeout 0
    tcp-keepalive 300
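    # Persistence: AOF with fsync once per second (Redis's default policy),
    # which bounds loss to about one second of writes across restarts.
    # SignalR backplane traffic is pub/sub and is never persisted, so AOF
    # only protects keys stored in Redis, not in-flight backplane messages.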
    appendonly yes
    appendfsync everysec
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb
    # Memory: cap the dataset at 256mb (below the 384Mi container limit)
    # and evict least-recently-used keys once the cap is reached.
    maxmemory-policy allkeys-lru
    maxmemory 256mb
    # Logging
    loglevel notice
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: fc-redis
  namespace: fc-redis
  labels:
    app: fc-redis
 spec:
  replicas: 1
  strategy:
    type: Recreate           # RWO PVC; do not do rolling update
  selector:
    matchLabels:
      app: fc-redis
  template:
    metadata:
      labels:
        app: fc-redis
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999       # uid of the redis user in redis:7-alpine
        runAsGroup: 999
        fsGroup: 999
      containers:
        - name: redis
          image: redis:7-alpine
          imagePullPolicy: IfNotPresent
          command: ["redis-server", "/etc/redis/redis.conf"]
          ports:
            - name: redis
              containerPort: 6379
          resources:
            requests:
              cpu: "50m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "384Mi"
          volumeMounts:
            - name: data
              mountPath: /data
            - name: config
              mountPath: /etc/redis
              readOnly: true
          livenessProbe:
            tcpSocket:
              port: 6379
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 2
            periodSeconds: 5
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: [ALL]
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: fc-redis-data
        - name: config
          configMap:
            name: fc-redis-config
 ---
 apiVersion: v1
 kind: Service
 metadata:
  name: redis
  namespace: fc-redis
 spec:
  type: ClusterIP
  selector:
    app: fc-redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
      protocol: TCP
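The readiness probe above relies on `redis-cli ping` exiting non-zero when Redis is unhealthy. A minimal sketch of a stricter check, assuming it would be inlined into an `exec` probe as `sh -c '[ "$(redis-cli ping)" = PONG ]'`: it succeeds only when the reply is exactly `PONG`, so an error reply during AOF replay counts as not ready regardless of redis-cli's exit status. The function name `redis_ready` is hypothetical.

```shell
#!/bin/sh
# POSIX sh, so it also runs under the busybox shell in redis:7-alpine.
# Succeeds only when PING is answered with exactly PONG; any other reply
# (for example an error such as LOADING while the AOF is replayed) or a
# connection failure is reported as not ready.
redis_ready() {
  reply=$(redis-cli -h "${1:-127.0.0.1}" -p "${2:-6379}" ping 2>/dev/null) || return 1
  [ "$reply" = "PONG" ]
}
```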
--- a/apps/fc-signage-appletv/README.md
+++ b/apps/fc-signage-appletv/README.md
@@ -0,0 +1,14 @@
 # fc-signage-appletv
 Apple TV signage is a sealed appliance running the `FlowerCore.Signage.Agent.AppleTv` tvOS app per ADR-134.
 This ApplicationSet entry is documentation and inventory metadata only. It intentionally creates no `Deployment`, `Service`, or `Pod`.
 The Apple TV app connects outbound to existing FC.Signage.Web surfaces:
 - `https://signage.iamworkin.lan/hub/signage` for SignalR live status.
 - `GET /api/v1/nodes/{nodeId}/state` for the 30-second polling fallback.
 - `POST /api/v1/nodes/register` and `POST /api/v1/nodes/{nodeId}/enroll` for pairing and mTLS enrollment.
 - `POST /api/v1/nodes/{nodeId}/heartbeat` for metrics, current content identity, and local audit excerpts.
 Distribution is via Apple Developer Enterprise Program or TestFlight plus FC.Distribution / UpdateCenter publishing once Apple credentials are available.
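The polling fallback listed above can be exercised from a workstation when debugging a node. A minimal sketch, assuming the state endpoint answers plain HTTPS GETs (this README does not say whether it requires the node's mTLS certificate) and that the node ID is already known; `node_state_url` and `poll_node_state` are hypothetical helpers, not part of the tvOS app.

```shell
#!/usr/bin/env bash
set -euo pipefail

SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"

# Polling-fallback endpoint for one node: GET /api/v1/nodes/{nodeId}/state.
node_state_url() {
  printf '%s/api/v1/nodes/%s/state\n' "${SIGNAGE_URL%/}" "$1"
}

# Fetch the node state every 30 seconds, mirroring the app's fallback
# cadence. -k matches the Pi scripts, which also skip server verification;
# add --cert/--key if the endpoint requires the node's mTLS certificate.
poll_node_state() {
  local node_id="$1"
  while true; do
    curl -fsSk --max-time 10 "$(node_state_url "$node_id")" || echo "poll failed" >&2
    sleep 30
  done
}
```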
--- a/apps/fc-signage-appletv/kustomization.yaml
+++ b/apps/fc-signage-appletv/kustomization.yaml
@@ -0,0 +1,5 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
  - manifest.yaml
--- a/apps/fc-signage-appletv/manifest.yaml
+++ b/apps/fc-signage-appletv/manifest.yaml
@@ -0,0 +1,26 @@
 # Apple TV signage is a sealed tvOS appliance. This ArgoCD app intentionally
 # carries documentation metadata only; no Deployment, Service, or Pod resources
 # are created for the player.
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: fc-signage-appletv-docs
  namespace: fc-signage
  labels:
    app.kubernetes.io/name: fc-signage-appletv
    app.kubernetes.io/part-of: flowercore-signage
    flowercore.io/manifest-kind: docs-only
 data:
  README: |
    FlowerCore.Signage.Agent.AppleTv is distributed through Apple Developer
    Enterprise Program or TestFlight, not Kubernetes.
    The app connects outbound to FC.Signage.Web:
    - SignalR: https://signage.iamworkin.lan/hub/signage
    - Polling fallback: GET /api/v1/nodes/{nodeId}/state
    - Enrollment: POST /api/v1/nodes/{nodeId}/enroll
    - Heartbeat: POST /api/v1/nodes/{nodeId}/heartbeat
    This placeholder gives ArgoCD and inventory dashboards a first-class
    Apple TV signage app entry without creating runtime pods.
--- a/apps/fc-signage-pi-player/README.md
+++ b/apps/fc-signage-pi-player/README.md
@@ -0,0 +1,17 @@
 # FlowerCore Signage Pi Player
 Phase 1 Raspberry Pi signage player packaging for Chromium kiosk deployments.
 This bundle is intentionally air-gap friendly: systemd units, shell scripts,
 udev rules, and Chromium managed policy are all checked into the repo and are
 installed by `FlowerCore.Puppet`.
 ## Scope
 - Bootstrap a stable node identity and mTLS client certificate.
 - Launch Chromium in kiosk mode against `FC.Signage.Web` player routes.
 - Restart the kiosk on HDMI hotplug.
 - Check daily and renew the mTLS certificate when fewer than 30 days remain.
 - Detect display capabilities at boot, daily, and on HDMI hotplug.
 Phase 2 native Avalonia rendering is documented separately in FlowerCore.Notes
 and remains deferred.
--- a/apps/fc-signage-pi-player/chromium-policies/flowercore-signage.json
+++ b/apps/fc-signage-pi-player/chromium-policies/flowercore-signage.json
@@ -0,0 +1,15 @@
 {
  "AutofillAddressEnabled": false,
  "AutofillCreditCardEnabled": false,
  "PasswordManagerEnabled": false,
  "BrowserSignin": 0,
  "MetricsReportingEnabled": false,
  "SafeBrowsingProtectionLevel": 0,
  "DefaultNotificationsSetting": 2,
  "DefaultPopupsSetting": 2,
  "BackgroundModeEnabled": false,
  "DefaultBrowserSettingEnabled": false,
  "PromotionalTabsEnabled": false,
  "CommandLineFlagSecurityWarningsEnabled": false,
  "ExtensionInstallBlocklist": ["*"]
 }
--- a/apps/fc-signage-pi-player/scripts/fc-signage-detect-display
+++ b/apps/fc-signage-pi-player/scripts/fc-signage-detect-display
@@ -0,0 +1,132 @@
 #!/usr/bin/env bash
 set -euo pipefail
 NODE_JSON="/etc/flowercore/signage-node.json"
 CERT_DIR="/etc/fc-signage-player"
 SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
 NODE_ID=$(jq -r '.nodeId' "$NODE_JSON")
 CONNECTORS=()
 for dir in /sys/class/drm/card*-HDMI-A-*; do
  [[ -e "$dir/status" ]] || continue
  if [[ "$(cat "$dir/status")" == "connected" ]]; then
    CONNECTORS+=("$(basename "$dir")")
  fi
 done
 if [[ ${#CONNECTORS[@]} -eq 0 ]]; then
  CAPABILITIES_JSON=$(jq -n --arg id "$NODE_ID" '{
    nodeId: $id,
    platform: "linux-arm64-pi",
    displayConnected: false,
    detectedAt: (now | todate),
    note: "No HDMI display detected"
  }')
 else
  PRIMARY="${CONNECTORS[0]}"
  EDID_PATH="/sys/class/drm/${PRIMARY}/edid"
  WIDTH=0
  HEIGHT=0
  REFRESH=60
  HDR=false
  AUDIO_HDMI=false
  MFG=""
  MODEL=""
  PHYSICAL_SIZE=null
  if [[ -s "$EDID_PATH" ]] && command -v edid-decode >/dev/null 2>&1; then
    EDID_INFO=$(edid-decode < "$EDID_PATH" 2>/dev/null || true)
    MFG=$(echo "$EDID_INFO" | grep -m1 -oP 'Manufacturer:\s*\K\S+' || true)
    MODEL=$(echo "$EDID_INFO" | grep -m1 -oP 'Model:\s*\K\S+' || true)
    PREF=$(echo "$EDID_INFO" | grep -m1 -oP '\d+x\d+\s*@\s*\d+(?:\.\d+)?\s*Hz' || true)
    if [[ -n "$PREF" ]]; then
      WIDTH=$(echo "$PREF" | grep -oP '^\d+')
      HEIGHT=$(echo "$PREF" | grep -oP 'x\K\d+')
      REFRESH=$(echo "$PREF" | grep -oP '@\s*\K[\d.]+' | cut -d. -f1)
    fi
    if echo "$EDID_INFO" | grep -qiE 'HDR (Static|Dynamic) Metadata Block'; then HDR=true; fi
    if echo "$EDID_INFO" | grep -qiE 'CEA Audio Block|Audio Format Descriptor'; then AUDIO_HDMI=true; fi
    PH_W=$(echo "$EDID_INFO" | grep -m1 -oP 'Maximum image size:\s*\K\d+\s*cm\s*x\s*\d+' || true)
    if [[ -n "$PH_W" ]]; then
      PH_CM_W=$(echo "$PH_W" | grep -oP '^\d+')
      PH_CM_H=$(echo "$PH_W" | grep -oP 'x\s*\K\d+')
      if (( PH_CM_W > 0 && PH_CM_H > 0 )); then
        PHYSICAL_SIZE=$(awk -v w="$PH_CM_W" -v h="$PH_CM_H" 'BEGIN { printf "%.1f", sqrt(w*w + h*h)/2.54 }')
      fi
    fi
  fi
  if [[ "$WIDTH" == "0" ]] && command -v kmsprint >/dev/null 2>&1; then
    # kmsprint names connectors without the sysfs "cardN-" prefix (e.g. HDMI-A-1).
    KMS=$(kmsprint 2>/dev/null | grep -A2 "${PRIMARY#card*-}" | grep -oP '\d+x\d+' | head -1 || true)
    if [[ -n "$KMS" ]]; then
      WIDTH=$(echo "$KMS" | grep -oP '^\d+')
      HEIGHT=$(echo "$KMS" | grep -oP 'x\K\d+')
    fi
  fi
  AUDIO_ALSA=false
  if aplay -l 2>/dev/null | grep -qi 'card.*HDMI'; then AUDIO_ALSA=true; fi
  HAS_AUDIO=false
  if [[ "$AUDIO_HDMI" == "true" && "$AUDIO_ALSA" == "true" ]]; then HAS_AUDIO=true; fi
  CAPABILITIES_JSON=$(jq -n \
    --arg id "$NODE_ID" \
    --argjson w "$WIDTH" \
    --argjson h "$HEIGHT" \
    --argjson r "$REFRESH" \
    --argjson hdr "$HDR" \
    --argjson audio "$HAS_AUDIO" \
    --arg connector "$PRIMARY" \
    --arg mfg "$MFG" \
    --arg model "$MODEL" \
    --argjson size "$PHYSICAL_SIZE" \
    '{
      nodeId: $id,
      platform: "linux-arm64-pi",
      displayConnected: true,
      detectedAt: (now | todate),
      hardware: {
        maxResolution: { width: $w, height: $h },
        nativeResolution: { width: $w, height: $h },
        refreshRateHz: $r,
        colorDepth: ($hdr | if . then "Color30Hdr" else "Color24" end),
        hasAudioOutput: $audio,
        audioChannelCount: ($audio | if . then 2 else 0 end),
        physicalSizeInches: $size,
        connector: $connector,
        manufacturer: $mfg,
        modelName: $model
      },
      render: { codecs: ["h264", "vp9", "mp4"] }
    }')
 fi
 ENDPOINT_CANDIDATES=(
  "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/capabilities"
  "${SIGNAGE_URL}/api/v1/displays/${NODE_ID}/capability-profile"
 )
 SUCCESS=false
 for url in "${ENDPOINT_CANDIDATES[@]}"; do
  HTTP_STATUS=$(curl -sk -o /tmp/cap-response.json -w "%{http_code}" \
    --max-time 10 \
    --cert "$CERT_DIR/client.crt" --key "$CERT_DIR/client.key" \
    -X POST "$url" \
    -H "Content-Type: application/json" \
    -d "$CAPABILITIES_JSON" || echo "000")
  if [[ "$HTTP_STATUS" == "200" || "$HTTP_STATUS" == "201" || "$HTTP_STATUS" == "204" ]]; then
    SUCCESS=true
    break
  fi
 done
 mkdir -p /var/log/fc-signage-player
 if [[ "$SUCCESS" != "true" ]]; then
  echo "[$(date -Is)] capability declare: no endpoint accepted the profile; logging locally" \
    | tee -a /var/log/fc-signage-player/capabilities.log
  echo "$CAPABILITIES_JSON" | tee -a /var/log/fc-signage-player/capabilities.log
 else
  echo "[$(date -Is)] capability declare: ok ($url)" | tee -a /var/log/fc-signage-player/capabilities.log
 fi
 echo "$CAPABILITIES_JSON"
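The physical-size branch above turns EDID centimetres into a diagonal in inches. A minimal standalone sketch of that parsing, assuming edid-decode prints a line of the form `Maximum image size: 60 cm x 34 cm` (the format the script's `grep -P` pattern expects; other edid-decode versions may differ). `edid_diagonal_inches` is a hypothetical helper, and it uses bash's `=~` instead of `grep -P` so it has no GNU grep dependency.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the screen diagonal in inches (one decimal) from an edid-decode line
# such as "Maximum image size: 60 cm x 34 cm", or "null" when the line is
# missing or reports a zero dimension. Same formula as the script:
# sqrt(w^2 + h^2) / 2.54.
edid_diagonal_inches() {
  local re='Maximum image size: *([0-9]+) *cm *x *([0-9]+) *cm'
  if [[ "$1" =~ $re ]]; then
    local w="${BASH_REMATCH[1]}" h="${BASH_REMATCH[2]}"
    if (( w > 0 && h > 0 )); then
      awk -v w="$w" -v h="$h" 'BEGIN { printf "%.1f\n", sqrt(w*w + h*h) / 2.54 }'
      return 0
    fi
  fi
  echo null
}
```

Both possible results (`27.2`-style numbers and `null`) are valid JSON, so the output can feed the script's `--argjson size` directly.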
--- a/apps/fc-signage-pi-player/scripts/flowercore-signage-bootstrap.sh
+++ b/apps/fc-signage-pi-player/scripts/flowercore-signage-bootstrap.sh
@@ -0,0 +1,144 @@
 #!/usr/bin/env bash
 set -euo pipefail
 NODE_JSON="/etc/flowercore/signage-node.json"
 CERT_DIR="/etc/fc-signage-player"
 SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
 SETUP_CODE_FILE="/etc/flowercore/signage-setup-code"
 mkdir -p /etc/flowercore "$CERT_DIR" /var/log/fc-signage-player
 chown fc-signage:fc-signage /etc/flowercore "$CERT_DIR" /var/log/fc-signage-player
 chmod 0750 "$CERT_DIR"
 if [[ -s "$NODE_JSON" && -s "$CERT_DIR/client.p12" ]]; then
  ENROLLED=$(jq -r '.enrolledAt // empty' "$NODE_JSON")
  if [[ -n "$ENROLLED" ]]; then
    echo "[$(date -Is)] bootstrap: already enrolled at $ENROLLED; skipping"
    exit 0
  fi
 fi
 if [[ -s "$NODE_JSON" ]]; then
  NODE_UUID=$(jq -r '.nodeUuid // empty' "$NODE_JSON")
  MACHINE_ID=$(jq -r '.machineId // empty' "$NODE_JSON")
 else
  NODE_UUID=$(uuidgen)
  MACHINE_ID=$(echo "$NODE_UUID" | tr -d '-' | cut -c1-16)
  jq -n --arg uuid "$NODE_UUID" --arg machine "$MACHINE_ID" --arg host "$(hostname -f)" --arg ts "$(date -Is)" \
    '{nodeUuid: $uuid, machineId: $machine, hostname: $host, platform: "linux-arm64-pi", createdAt: $ts}' \
    > "$NODE_JSON"
  chmod 0640 "$NODE_JSON"
  chown fc-signage:fc-signage "$NODE_JSON"
 fi
 SETUP_CODE=""
 if [[ -s "$SETUP_CODE_FILE" ]]; then
  SETUP_CODE=$(tr -d '\r\n\t ' < "$SETUP_CODE_FILE")
 fi
 MODEL=$(tr -d '\0' < /sys/firmware/devicetree/base/model 2>/dev/null || echo Unknown)
 REG_PAYLOAD=$(jq -n \
  --arg machine "$MACHINE_ID" \
  --arg name "$(hostname -f)" \
  --arg setup "$SETUP_CODE" \
  --arg resolution "1920x1080" \
  --arg model "$MODEL" \
  '{
    machineId: $machine,
    name: $name,
    setupCode: ($setup | if . == "" then null else . end),
    resolution: $resolution,
    hardwareModel: $model,
    platform: "linux-arm64-pi"
  }')
 for attempt in 1 2; do
  HTTP_STATUS=$(curl -sk -o /tmp/register-response.json -w "%{http_code}" \
    --max-time 15 \
    -X POST "${SIGNAGE_URL}/api/v1/nodes/register" \
    -H "Content-Type: application/json" \
    -d "$REG_PAYLOAD" || true)
  if [[ "$HTTP_STATUS" == "200" || "$HTTP_STATUS" == "201" ]]; then
    break
  fi
  echo "[$(date -Is)] bootstrap: register attempt $attempt returned $HTTP_STATUS" >&2
  sleep 5
 done
 if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "201" ]]; then
  echo "[$(date -Is)] bootstrap: register failed after 2 attempts" >&2
  exit 2
 fi
 NODE_ID=$(jq -r '.nodeId // empty' /tmp/register-response.json)
 if [[ -z "$NODE_ID" ]]; then
  echo "[$(date -Is)] bootstrap: register response did not include nodeId" >&2
  exit 2
 fi
 jq --arg id "$NODE_ID" '.nodeId = $id' "$NODE_JSON" > "${NODE_JSON}.tmp" && mv "${NODE_JSON}.tmp" "$NODE_JSON"
 if [[ -s "$SETUP_CODE_FILE" ]]; then
  curl -sk -X POST "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/approve-via-setup-code" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg code "$SETUP_CODE" '{setupCode: $code}')" \
    -o /dev/null || true
 fi
 STATUS=""
 DEADLINE=$(( $(date +%s) + 1800 ))
 while (( $(date +%s) < DEADLINE )); do
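  # Tolerate transient HTTP/JSON errors; under set -e a failed pipeline here would abort the wait.
  STATUS=$(curl -sk --max-time 5 "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/status" | jq -r '.status // empty' 2>/dev/null || true)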
  if [[ "$STATUS" == "Approved" || "$STATUS" == "Enrolled" || "$STATUS" == "Online" ]]; then
    break
  fi
  sleep 15
 done
 if [[ "$STATUS" != "Approved" && "$STATUS" != "Enrolled" && "$STATUS" != "Online" ]]; then
  echo "[$(date -Is)] bootstrap: approval not granted within 30min budget" >&2
  exit 3
 fi
 KEY_PATH="${CERT_DIR}/client.key"
 CSR_PATH="${CERT_DIR}/client.csr"
 openssl ecparam -genkey -name prime256v1 -out "$KEY_PATH"
 openssl req -new -key "$KEY_PATH" -out "$CSR_PATH" \
  -subj "/CN=${NODE_ID}/O=FlowerCore/OU=SignagePlayer-Pi"
 ENROLL_PAYLOAD=$(jq -n --arg csr "$(cat "$CSR_PATH")" '{certificateSigningRequest: $csr}')
 HTTP_STATUS=$(curl -sk -o /tmp/enroll-response.json -w "%{http_code}" \
  --max-time 15 \
  -X POST "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/enroll" \
  -H "Content-Type: application/json" \
  -d "$ENROLL_PAYLOAD" || true)
 if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "201" ]]; then
  echo "[$(date -Is)] bootstrap: enroll failed with HTTP $HTTP_STATUS" >&2
  exit 4
 fi
 jq -r '.clientCertificatePem // .signedCertificatePem' /tmp/enroll-response.json > "${CERT_DIR}/client.crt"
 jq -r '.caCertificatePem' /tmp/enroll-response.json > "${CERT_DIR}/ca-chain.pem"
 P12_PASS=$(openssl rand -hex 24)
 echo -n "$P12_PASS" > "${CERT_DIR}/client.p12.pass"
 chmod 0600 "${CERT_DIR}/client.p12.pass"
 openssl pkcs12 -export \
  -inkey "$KEY_PATH" \
  -in "${CERT_DIR}/client.crt" \
  -certfile "${CERT_DIR}/ca-chain.pem" \
  -out "${CERT_DIR}/client.p12" \
  -password "pass:${P12_PASS}"
 chown fc-signage:fc-signage "${CERT_DIR}"/* "$NODE_JSON"
 chmod 0640 "${CERT_DIR}/client.p12" "${CERT_DIR}/client.crt" "${CERT_DIR}/ca-chain.pem" "$KEY_PATH"
 chmod 0600 "${CERT_DIR}/client.p12.pass"
 EXPIRY=$(openssl x509 -in "${CERT_DIR}/client.crt" -enddate -noout | sed 's/notAfter=//')
 jq --arg ts "$(date -Is)" --arg exp "$EXPIRY" \
  '.enrolledAt = $ts | .certExpiry = $exp' "$NODE_JSON" > "${NODE_JSON}.tmp" \
  && mv "${NODE_JSON}.tmp" "$NODE_JSON"
 systemctl start flowercore-signage-detect-display.service || true
 systemctl start flowercore-signage-player-pi.service || true
 echo "[$(date -Is)] bootstrap: enrolled and kiosk started (NodeId=${NODE_ID})"
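The approval wait in the bootstrap script polls every 15 seconds for up to 30 minutes. A minimal sketch of that loop as a reusable function, assuming a caller-supplied command prints the node status; failures of that command are retried rather than aborting the script under `set -euo pipefail`. `wait_for_status` and the `fetch_status` wrapper in the note below are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a status command every INTERVAL seconds until it prints Approved,
# Enrolled, or Online, or until TIMEOUT seconds have passed. A failing or
# empty status command is retried instead of aborting under `set -e`.
# Usage: wait_for_status TIMEOUT INTERVAL CMD [ARGS...]
wait_for_status() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  local status
  while (( $(date +%s) < deadline )); do
    status=$("$@" 2>/dev/null || true)
    case "$status" in
      Approved|Enrolled|Online)
        echo "$status"
        return 0
        ;;
    esac
    sleep "$interval"
  done
  return 1   # deadline reached without an accepted status
}
```

In the bootstrap script, a hypothetical `fetch_status() { curl -sk --max-time 5 "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/status" | jq -r '.status // empty'; }` plus `STATUS=$(wait_for_status 1800 15 fetch_status) || exit 3` would replace the inline loop.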
--- a/apps/fc-signage-pi-player/scripts/flowercore-signage-hdmi-respond.sh
+++ b/apps/fc-signage-pi-player/scripts/flowercore-signage-hdmi-respond.sh
@@ -0,0 +1,6 @@
 #!/usr/bin/env bash
 set -euo pipefail
 sleep 2
 systemctl start flowercore-signage-detect-display.service || true
 systemctl restart flowercore-signage-player-pi.service
--- a/apps/fc-signage-pi-player/scripts/flowercore-signage-launch.sh
+++ b/apps/fc-signage-pi-player/scripts/flowercore-signage-launch.sh
@@ -0,0 +1,44 @@
 #!/usr/bin/env bash
 set -euo pipefail
 NODE_JSON="/etc/flowercore/signage-node.json"
 NODE_ID=$(jq -r '.nodeId' "$NODE_JSON")
 SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
 CERT_DIR="/etc/fc-signage-player"
 CERT_THUMB=$(openssl pkcs12 -in "$CERT_DIR/client.p12" -passin file:"$CERT_DIR/client.p12.pass" -nokeys -clcerts 2>/dev/null \
  | openssl x509 -fingerprint -sha256 -noout \
  | sed 's/.*=//' \
  | tr -d ':')
 PLAYER_URL="${SIGNAGE_URL}/player/${NODE_ID}/embed?token=${CERT_THUMB}"
 HTTP_STATUS=$(curl -sk -o /dev/null -w "%{http_code}" --max-time 5 \
  --cert-type P12 --cert "$CERT_DIR/client.p12:$(cat "$CERT_DIR/client.p12.pass")" \
  "$PLAYER_URL" || true)
 mkdir -p /var/log/fc-signage-player
 if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "301" && "$HTTP_STATUS" != "302" ]]; then
  echo "[$(date -Is)] /embed returned $HTTP_STATUS; falling back to /player/${NODE_ID}" \
    >> /var/log/fc-signage-player/url-divergence.log
  PLAYER_URL="${SIGNAGE_URL}/player/${NODE_ID}?token=${CERT_THUMB}"
 fi
 exec chromium-browser \
  --kiosk \
  --noerrdialogs \
  --disable-infobars \
  --disable-translate \
  --disable-features=TranslateUI,InfiniteSessionRestore \
  --autoplay-policy=no-user-gesture-required \
  --password-store=basic \
  --user-data-dir=/var/lib/fc-signage-player/profile \
  --disk-cache-dir=/var/lib/fc-signage-player/cache \
  --disk-cache-size=104857600 \
  --no-first-run \
  --no-default-browser-check \
  --check-for-update-interval=2592000 \
  --enable-features=OverlayScrollbar \
  --start-fullscreen \
  --window-position=0,0 \
  --window-size=1920,1080 \
  "$PLAYER_URL"
--- a/apps/fc-signage-pi-player/scripts/flowercore-signage-prelaunch.sh
+++ b/apps/fc-signage-pi-player/scripts/flowercore-signage-prelaunch.sh
@@ -0,0 +1,20 @@
 #!/usr/bin/env bash
 set -euo pipefail
 mkdir -p /var/log/fc-signage-player
 for f in /etc/flowercore/signage-node.json /etc/fc-signage-player/client.p12 /etc/fc-signage-player/client.p12.pass; do
  if [[ ! -r "$f" ]]; then
    echo "[$(date -Is)] prelaunch: missing or unreadable $f" >&2
    exit 1
  fi
 done
 if ! openssl pkcs12 -in /etc/fc-signage-player/client.p12 -passin file:/etc/fc-signage-player/client.p12.pass -nokeys -clcerts 2>/dev/null \
   | openssl x509 -checkend $((7*24*3600)) -noout; then
  echo "[$(date -Is)] prelaunch: client cert expires within 7 days" >&2
 fi
 echo "[$(date -Is)] prelaunch: ok" | tee -a /var/log/fc-signage-player/prelaunch.log
--- a/apps/fc-signage-pi-player/scripts/flowercore-signage-renew-cert.sh
+++ b/apps/fc-signage-pi-player/scripts/flowercore-signage-renew-cert.sh
@@ -0,0 +1,46 @@
 #!/usr/bin/env bash
 set -euo pipefail
 CERT_DIR="/etc/fc-signage-player"
 NODE_JSON="/etc/flowercore/signage-node.json"
 SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
 [[ -s "$CERT_DIR/client.crt" ]] || { echo "no cert to renew"; exit 0; }
 if openssl x509 -in "$CERT_DIR/client.crt" -checkend $((30*24*3600)) -noout; then
  exit 0
 fi
 NODE_ID=$(jq -r '.nodeId' "$NODE_JSON")
 NEW_KEY="$CERT_DIR/client.key.new"
 NEW_CSR="$CERT_DIR/client.csr.new"
 openssl ecparam -genkey -name prime256v1 -out "$NEW_KEY"
 openssl req -new -key "$NEW_KEY" -out "$NEW_CSR" \
  -subj "/CN=${NODE_ID}/O=FlowerCore/OU=SignagePlayer-Pi"
 HTTP_STATUS=$(curl -sk -o /tmp/renew-response.json -w "%{http_code}" \
  --max-time 15 \
  --cert "$CERT_DIR/client.crt" --key "$CERT_DIR/client.key" \
  -X POST "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/renew" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg csr "$(cat "$NEW_CSR")" '{certificateSigningRequest: $csr}')" || true)
 if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "201" ]]; then
  echo "[$(date -Is)] renew: failed HTTP $HTTP_STATUS; leaving old cert in place" >&2
  exit 5
 fi
 jq -r '.clientCertificatePem // .signedCertificatePem' /tmp/renew-response.json > "$CERT_DIR/client.crt.new"
 jq -r '.caCertificatePem' /tmp/renew-response.json > "$CERT_DIR/ca-chain.pem.new"
 P12_PASS=$(cat "$CERT_DIR/client.p12.pass")
 openssl pkcs12 -export -inkey "$NEW_KEY" -in "$CERT_DIR/client.crt.new" \
  -certfile "$CERT_DIR/ca-chain.pem.new" \
  -out "$CERT_DIR/client.p12.new" -password "pass:${P12_PASS}"
 mv "$CERT_DIR/client.key.new" "$CERT_DIR/client.key"
 mv "$CERT_DIR/client.crt.new" "$CERT_DIR/client.crt"
 mv "$CERT_DIR/ca-chain.pem.new" "$CERT_DIR/ca-chain.pem"
 mv "$CERT_DIR/client.p12.new" "$CERT_DIR/client.p12"
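 chown fc-signage:fc-signage "$CERT_DIR"/client.* "$CERT_DIR/ca-chain.pem"
 chmod 0640 "$CERT_DIR/client.key" "$CERT_DIR/client.crt" "$CERT_DIR/client.p12" "$CERT_DIR/ca-chain.pem"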
 systemctl restart flowercore-signage-player-pi.service
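The renewal script leaves the 30-day window to `openssl x509 -checkend`. A minimal sketch of the same decision as plain arithmetic, useful for logging how many days remain or for testing the threshold without a real certificate; it assumes the expiry is already a Unix timestamp, and `renewal_due` and `RENEW_WINDOW_DAYS` are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

RENEW_WINDOW_DAYS=30

# Succeed (exit 0) when a certificate that expires at EXPIRY_EPOCH is due for
# renewal at NOW_EPOCH, i.e. fewer than RENEW_WINDOW_DAYS remain. This is the
# complement of `openssl x509 -checkend $((30*24*3600))`, which succeeds only
# when the certificate is still valid that far ahead.
renewal_due() {
  local expiry_epoch="$1" now_epoch="$2"
  (( expiry_epoch - now_epoch < RENEW_WINDOW_DAYS * 24 * 3600 ))
}
```

With GNU date, the timestamp can come from `date -d "$(openssl x509 -in client.crt -enddate -noout | sed 's/notAfter=//')" +%s`.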
--- a/apps/fc-signage-pi-player/systemd/99-flowercore-signage-hdmi.rules
+++ b/apps/fc-signage-pi-player/systemd/99-flowercore-signage-hdmi.rules
@@ -0,0 +1,3 @@
 # Restart kiosk and redeclare capabilities when HDMI connect/disconnect changes DRM state.
 # DRM hotplug uevents are emitted on the cardN device (with HOTPLUG=1), not on the
 # connector; --no-block keeps udev from waiting while the responder restarts the kiosk.
 SUBSYSTEM=="drm", KERNEL=="card[0-9]*", ACTION=="change", ENV{HOTPLUG}=="1", RUN+="/usr/bin/systemctl --no-block start flowercore-signage-player-pi-hdmi.service"
--- a/apps/fc-signage-pi-player/systemd/flowercore-signage-bootstrap.service
+++ b/apps/fc-signage-pi-player/systemd/flowercore-signage-bootstrap.service
@@ -0,0 +1,16 @@
 [Unit]
 Description=FlowerCore Signage Pi: first-boot identity + mTLS enrollment
 Wants=network-online.target
 After=network-online.target
 Before=flowercore-signage-player-pi.service
 [Service]
 Type=oneshot
 ExecStart=/usr/local/bin/flowercore-signage-bootstrap.sh
 RemainAfterExit=yes
 StandardOutput=journal
 StandardError=journal
 TimeoutStartSec=2100
 [Install]
 WantedBy=multi-user.target
--- a/apps/fc-signage-pi-player/systemd/flowercore-signage-detect-display.service
+++ b/apps/fc-signage-pi-player/systemd/flowercore-signage-detect-display.service
@@ -0,0 +1,8 @@
 [Unit]
 Description=FlowerCore Signage Pi: detect connected display + declare capabilities
 After=flowercore-signage-bootstrap.service
 [Service]
 Type=oneshot
 User=fc-signage
 ExecStart=/usr/local/bin/fc-signage-detect-display
--- a/apps/fc-signage-pi-player/systemd/flowercore-signage-detect-display.timer
+++ b/apps/fc-signage-pi-player/systemd/flowercore-signage-detect-display.timer
@@ -0,0 +1,11 @@
 [Unit]
 Description=Daily FlowerCore Signage Pi display capability redeclaration
 [Timer]
 OnCalendar=daily
 RandomizedDelaySec=1h
 Persistent=true
 OnBootSec=30s
 [Install]
 WantedBy=timers.target
--- a/apps/fc-signage-pi-player/systemd/flowercore-signage-player-pi-hdmi.service
+++ b/apps/fc-signage-pi-player/systemd/flowercore-signage-player-pi-hdmi.service
@@ -0,0 +1,7 @@
 [Unit]
 Description=FlowerCore Signage Pi Player HDMI hotplug responder
 DefaultDependencies=no
 [Service]
 Type=oneshot
 ExecStart=/usr/local/bin/flowercore-signage-hdmi-respond.sh
--- a/apps/fc-signage-pi-player/systemd/flowercore-signage-player-pi.service
+++ b/apps/fc-signage-pi-player/systemd/flowercore-signage-player-pi.service
@@ -0,0 +1,30 @@
 [Unit]
 Description=FlowerCore Digital Signage Pi Player (Chromium kiosk)
 Documentation=https://github.com/astoltz/FlowerCore.Notes/blob/master/docs/standards/appletv-pi-signage-agents-design.md
 Wants=network-online.target
 After=network-online.target graphical.target
 ConditionPathExists=/etc/flowercore/signage-node.json
 ConditionPathExists=/etc/fc-signage-player/client.p12
 # Start-rate limiting belongs in [Unit]; StartLimitIntervalSec= is not recognized in [Service].
 StartLimitBurst=5
 StartLimitIntervalSec=300s
 [Service]
 Type=simple
 User=fc-signage
 Group=fc-signage
 WorkingDirectory=/var/lib/fc-signage-player
 EnvironmentFile=-/etc/flowercore/signage-player.env
 ExecStartPre=/usr/local/bin/flowercore-signage-prelaunch.sh
 ExecStart=/usr/local/bin/flowercore-signage-launch.sh
 Restart=always
 RestartSec=10s
 MemoryMax=2G
 MemoryHigh=1500M
 ProtectSystem=strict
 ProtectHome=true
 ReadWritePaths=/var/lib/fc-signage-player /var/log/fc-signage-player
 PrivateTmp=true
 NoNewPrivileges=true
 [Install]
 WantedBy=graphical.target
--- a/apps/fc-signage-pi-player/systemd/flowercore-signage-renew.service
+++ b/apps/fc-signage-pi-player/systemd/flowercore-signage-renew.service
@@ -0,0 +1,6 @@
 [Unit]
 Description=FlowerCore Signage Pi: cert renewal worker
 [Service]
 Type=oneshot
 ExecStart=/usr/local/bin/flowercore-signage-renew-cert.sh
--- a/apps/fc-signage-pi-player/systemd/flowercore-signage-renew.timer
+++ b/apps/fc-signage-pi-player/systemd/flowercore-signage-renew.timer
@@ -0,0 +1,10 @@
 [Unit]
 Description=Daily check for FlowerCore Signage Pi cert renewal
 [Timer]
 OnCalendar=daily
 RandomizedDelaySec=2h
 Persistent=true
 [Install]
 WantedBy=timers.target
--- a/apps/github-runner/github-runner.yaml
+++ b/apps/github-runner/github-runner.yaml
@@ -0,0 +1,196 @@
 # GitHub Actions self-hosted Linux runner — Phase 2 K8s deployment
 #
 # Phase 1 (current): BLUEJAY-WS registered manually as a Windows runner
 #   with label "fc-build-windows" via config.cmd (see docs/infrastructure/
 #   self-hosted-runner-fleet.md §WPF Build Runner).
 #
 # Phase 2 (this file): ephemeral Linux runner in RKE2 for non-WPF builds
 #   (Blazor Server, class libraries, operators, integration tests). Reduces
 #   billing for ubuntu-24.04 jobs that run on GitHub-hosted runners today.
 #
 # Runner image: myoung34/github-runner:latest
 #   EPHEMERAL=true: each runner process takes exactly one job and then
 #   exits; the kubelet restarts the container in the same pod
 #   (restartPolicy: Always) and the runner re-registers.
 #   Prevents job queue starvation when two jobs overlap.
 #
 # NuGet cache: 5Gi Longhorn RWO PVC mounted at /home/runner/.nuget/packages
 #   Persists NuGet packages across ephemeral pod restarts (not shared across
 #   simultaneous runner pods; single-replica constraint below).
 #
 # Credentials:
 #   OnePasswordItem "GitHub Runner Registration Token" → Secret
 #   github-runner-token with field "credential" used as RUNNER_TOKEN.
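 #   Operator must create/rotate the 1P item manually. Registration tokens
 #   expire after 1h, so a static token stops working for an EPHEMERAL
 #   runner that re-registers after every job. Use a PAT instead (classic
 #   `repo` scope for a repo-scoped runner or `admin:org` for an org-scoped
 #   runner; the myoung34 image accepts it as ACCESS_TOKEN and mints
 #   registration tokens itself) or a re-registration script. See
 #   docs/infrastructure/self-hosted-runner-fleet.md §Security.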
 #
 # Security model:
 #   - No ClusterRole / ClusterRoleBinding — runner has no K8s API access.
 #   - securityContext: runAsNonRoot with read-only root filesystem where
 #     possible (runner image needs /tmp and /home/runner writable).
 #   - Fork pull-request approval required on the GitHub repo settings.
 #   - RUNNER_ALLOW_RUNASROOT=false is the default.
 #
 # Cost: Phase 2 eliminates GitHub-hosted ubuntu-24.04 billing for the jobs
 #   moved onto this runner; break-even vs electricity is ~1,000 min/month at
 #   current TOU rates.
 #
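 # Node placement: rke2-server (10.0.56.11) only. Longhorn can attach an RWO
 #   volume on any node (one node at a time), so the pin is a scheduling
 #   choice: the server node has the most spare capacity for burst CI
 #   workloads, and a fixed node keeps the NuGet cache volume attached in one
 #   place across pod restarts.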
 #
 # Designs: docs/infrastructure/self-hosted-runner-fleet.md
 # Questions: Q-CI-1..5 (all Recommended defaults)
 ---
 apiVersion: v1
 kind: Namespace
 metadata:
  name: github-runner
  labels:
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
 ---
 # 1Password secret sync — creates github-runner-token K8s Secret.
 # Fields expected in the 1Password item:
 #   credential — GitHub runner registration token (or PAT for re-reg script)
 # Item path: IAmWorkin vault > "GitHub Runner Registration Token"
 # Operator MUST create this item before the Deployment will start cleanly.
 apiVersion: onepassword.com/v1
 kind: OnePasswordItem
 metadata:
  name: github-runner-token
  namespace: github-runner
  labels:
    app.kubernetes.io/component: credentials
    app.kubernetes.io/part-of: flowercore
 spec:
  itemPath: vaults/IAmWorkin/items/GitHub Runner Registration Token
 ---
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: github-runner-nuget-cache
  namespace: github-runner
  labels:
    app.kubernetes.io/component: cache
    app.kubernetes.io/part-of: flowercore
 spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
  volumeMode: Filesystem
 ---
 apiVersion: v1
 kind: ServiceAccount
 metadata:
  name: github-runner
  namespace: github-runner
  labels:
    app.kubernetes.io/component: runner
    app.kubernetes.io/part-of: flowercore
 # No ClusterRole or ClusterRoleBinding — runner has zero K8s API privileges.
 # CI jobs that need kubectl must supply their own kubeconfig via a secret
 # injected at the job level, not via this service account.
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: github-runner
  namespace: github-runner
  labels:
    app.kubernetes.io/name: github-runner
    app.kubernetes.io/component: runner
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/created-by: argocd
 spec:
  # Single replica enforced: the Longhorn RWO PVC can only be mounted by
  # one pod at a time. Each pod re-registers as an ephemeral runner after
  # completing a job (EPHEMERAL=true restarts the container, not the pod,
  # so the PVC stays attached between jobs).
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: github-runner
  # Use Recreate to avoid the Multi-Attach RWO error during rollouts.
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: github-runner
        app.kubernetes.io/component: runner
        app.kubernetes.io/part-of: flowercore
        flowercore.io/created-by: argocd
    spec:
      serviceAccountName: github-runner
      # Pin to rke2-server so the Longhorn RWO volume is always on the same node.
      nodeSelector:
        kubernetes.io/hostname: rke2-server
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        runAsGroup: 1001
        fsGroup: 1001
      containers:
        - name: runner
          image: myoung34/github-runner:latest
          imagePullPolicy: Always
          env:
            # GitHub org/repo targeting.
            # Set REPO_URL for a repo-scoped runner (narrowest scope, simplest setup).
            # For an org-scoped runner, set RUNNER_SCOPE=org and ORG_NAME instead.
            - name: REPO_URL
              value: "https://github.com/astoltz/FlowerCore.Common"
            - name: RUNNER_NAME_PREFIX
              value: "rke2-linux"
            - name: RUNNER_WORKDIR
              value: "/tmp/runner/work"
            # EPHEMERAL=true: runner deregisters after one job and the container
            # exits with code 0; the kubelet restarts it in place (restartPolicy:
            # Always) and a fresh registration occurs. Prevents stale runner accumulation.
            - name: EPHEMERAL
              value: "true"
            # Labels used by workflow files: runs-on: [self-hosted, linux, fc-build-linux]
            - name: LABELS
              value: "self-hosted,linux,fc-build-linux"
            # Registration token injected from 1Password via OnePasswordItem CRD.
            - name: RUNNER_TOKEN
              valueFrom:
                secretKeyRef:
                  name: github-runner-token
                  key: credential
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          volumeMounts:
            - name: nuget-cache
              mountPath: /home/runner/.nuget/packages
            - name: tmp
              mountPath: /tmp
          # Liveness: runner process is alive.
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - "pgrep -f Runner.Listener > /dev/null"
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 3
      volumes:
        - name: nuget-cache
          persistentVolumeClaim:
            claimName: github-runner-nuget-cache
        - name: tmp
          emptyDir: {}
      # Restart policy: Always (the only value a Deployment accepts). The
      # kubelet restarts the runner container after each ephemeral job exits,
      # which triggers a fresh registration.
      restartPolicy: Always
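The OnePasswordItem comment allows the credential to be either a registration token or a PAT for a re-registration script, but that script is not part of this change. Below is a minimal Python sketch of what such a script might call, assuming a PAT with admin access to the repository. GitHub's REST endpoint `POST /repos/{owner}/{repo}/actions/runners/registration-token` returns a short-lived token (GitHub documents a one-hour expiry). The function names are hypothetical, and the owner/repo pair is taken from `REPO_URL` above.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_registration_token_request(owner: str, repo: str, pat: str) -> urllib.request.Request:
    """Build, without sending, the POST that mints a repo-scoped runner registration token."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/actions/runners/registration-token"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {pat}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def fetch_registration_token(owner: str, repo: str, pat: str) -> str:
    """Send the request and return the "token" field from the JSON response."""
    request = build_registration_token_request(owner, repo, pat)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["token"]
```

Only `fetch_registration_token` touches the network. Keeping request construction separate lets the URL and headers be checked offline before a hypothetical script writes the returned token back to the 1Password item.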
--- a/apps/guacamole/guacamole.yaml
+++ b/apps/guacamole/guacamole.yaml
@@ -466,11 +466,11 @@ spec:
  itemPath: vaults/IAmWorkin/items/Guacamole JSON Auth
 ---
 ---
-# 1Password-backed credentials for Mac mini VNC access (Phase 1 — 2026-04-28)
+# 1Password-backed credentials for Mac mini VNC access (Phase 1 - 2026-04-28)
 # The operator mints Secret 'macmini-vnc-creds' with keys: username, password, VNC Password
 # Note: '1Password' field label 'VNC Password' -> K8s Secret key 'VNC Password' (space retained)
 # Guacamole VNC connection password is sourced from the 'VNC Password' field.
-# Actual IP is 10.0.56.115 (INFRA VLAN) — the 1P item 'IP' field is kept as backup reference.
+# Actual IP is 10.0.56.115 (INFRA VLAN) - the 1P item 'IP' field is kept as backup reference.
 apiVersion: onepassword.com/v1
 kind: OnePasswordItem
 metadata:
@@ -481,6 +481,7 @@ metadata:
    app.kubernetes.io/part-of: flowercore
 spec:
  itemPath: vaults/IAmWorkin/items/Mac Mini
 ---
 # Blue Jay Branding Extension (CSS + translations)
 apiVersion: v1
 kind: ConfigMap
--- a/apps/kubevirt-vms/ci1.yaml
+++ b/apps/kubevirt-vms/ci1.yaml
@@ -1,51 +1,9 @@
 # =============================================================================
-# ci1 — Windows Server 2025 KubeVirt VM (GitHub Actions Self-Hosted Runner)
+# ci1 - Windows Server 2025 KubeVirt VM (GitHub Actions Self-Hosted Runner)
 # =============================================================================
-# Purpose: dedicated CI runner for FlowerCore.Updater Sandbox E2E nightly +
-# future fleet WPF AAT lanes. Replaces the never-registered
-# `bluejay-ws-sandbox-1` runner placeholder. Andrew explicitly does NOT want
+# Boots from the sysprepped containerDisk template built by the Windows VM
+# sysprep pipeline. See docs/infrastructure/windows-vm-sysprep-pipeline.md.
+# Path A/B/C install history is preserved in git log only.
 # BLUEJAY-WS registered as a runner (workstation has personal/operator state).
 #
 # Storage layout (2026-05-08):
 #   * ISO is now sourced from Synology NFS (Path B) — see
 #     win2025-iso-nfs-pv.yaml. The Longhorn Filesystem PVC
 #     `windows-server-2025-iso` below is RETAINED but UNUSED so the prior
 #     CDI upload state is preserved as a fallback (and so ArgoCD doesn't
 #     prune it on this commit). It can be deleted in a follow-up commit
 #     after the NFS path is proven on a successful Windows install.
 #
 # Status (2026-05-08): LIVE — Phase 1 prereqs satisfied:
 #   * Multus CNI v4.2.2 thick-plugin DaemonSet running on all 3 RKE2 nodes
 #     (apps/multus/multus.yaml; ApplicationSet `infra-multus` Synced/Healthy)
 #   * CDI v1.65.0 operator + CR Deployed (apps/cdi/; ApplicationSet
 #     `infra-cdi` Synced/Healthy; uploadproxy reachable via kubectl port-forward)
 #   * Windows Server 2025 ISO uploaded via CDI virtctl image-upload to
 #     PVC windows-server-2025-iso (7.7 GiB → 10Gi PVC, Bound, Upload Complete)
 #   * Local Administrator password generated, stored in 1Password vault
 #     IAmWorkin (qaphopopkryhbg353ukzhhuqoq) item id h3ix4mgfk65gmkcmvh6ly3d3hu
 #   * NetworkAttachmentDefinition prod-vlan57 registered (apps/kubevirt-vms/
 #     prod-vlan57-nad.yaml). VM still uses pod-network masquerade until Phase 1.5
 #     host bridge work lands (Puppet br-prod + enp86s0.57); switching is a
 #     one-line YAML edit + git push.
 #
 # See docs/infrastructure/windows-server-build-runner-plan.md "Phase 1 readiness gate".
 #
 # Network choice in this draft: **pod-network fallback** (Calico default).
 # Outbound-only is fine for the Updater Sandbox E2E runner workload (the runner
 # polls GitHub Actions over HTTPS; no inbound listener needed). Switch to a
 # Multus PROD VLAN NetworkAttachmentDefinition once Multus is installed and the
 # operator wants L2 access from `ci1` to other PROD VLAN services.
 #
 # Sizing: 8 vCPU / 16 GB RAM / 200 GB disk on Longhorn (default storageClass).
 # Capacity check 2026-05-08: each RKE2 node has 16 vCPU / ~64Gi allocatable;
 # 8 vCPU is half of one node's allocatable (~17% of the 3-node cluster), fits.
 #
 # Apply (after operator approval + ISO loaded):
 #   kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml apply -f apps/kubevirt-vms/ci1.yaml
 #
 # Connect to console for Windows install:
 #   virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml vnc ci1 -n kubevirt-vms
 #   (Or via Guacamole once a connection profile is added.)
 # =============================================================================
 apiVersion: v1
@@ -57,248 +15,6 @@ metadata:
    pod-security.kubernetes.io/enforce: privileged
 ---
 # ISO PVC — populated via CDI virtctl image-upload (CDI is now installed).
 #
 # **Volume mode (2026-05-08 status):** Filesystem-mode PVC. A migration to
 # `volumeMode: Block` via DataVolume was attempted to address an OVMF SATA
 # CDROM read timeout, but CDI v1.65.0's upload-target pod runs as uid 107
 # with `capabilities.drop: [ALL]` and cannot open the underlying block
 # device (`blockdev: cannot open /dev/cdi-block-volume: Permission denied`).
 # Reverted to Filesystem PVC pending one of:
 #   - CDI deployment override granting CAP_SYS_RAWIO to upload pod
 #   - Pre-populated PVC via privileged init pod that dd's the ISO directly
 #   - Migration to a different storage class that exposes block devices
 #     differently (e.g. iSCSI, where Longhorn's CSI mount path may behave
 #     differently)
 #
 # Population workflow (this PVC, Filesystem mode):
 #   1. virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml image-upload pvc \
 #        windows-server-2025-iso -n kubevirt-vms \
 #        --image-path "$env:USERPROFILE\Downloads\en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso" \
 #        --size 10Gi --storage-class longhorn --access-mode ReadWriteOnce \
 #        --uploadproxy-url https://localhost:8443 --insecure
 #   (--uploadproxy-url uses port-forward in practice: `kubectl port-forward
 #   -n cdi service/cdi-uploadproxy 8443:443 &` first.)
 #
 # **Open boot issue:** even with the ISO at bootOrder:1, OVMF console showed:
 #   BdsDxe: starting Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ... Sata(...)
 #   BdsDxe: failed to start Boot0001 ... Time out
 # Diagnosis confirmed PVC content IS a valid bootable ISO9660 image — the
 # timeout is in OVMF reading from the SATA-CDROM-backed-by-filesystem-PVC.
 # Block mode would likely fix it; see CDI permission issue above.
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: windows-server-2025-iso
  namespace: kubevirt-vms
  labels:
    app: ci-runner
    flowercore.io/managed-by: bluejay-infra
 spec:
  accessModes:
    - ReadWriteOnce          # Bump to ReadOnlyMany after population for multi-VM use
  resources:
    requests:
      storage: 10Gi          # Server 2025 ISO is 7.7GB; 10Gi for headroom
  storageClassName: longhorn
 ---
 # Root disk PVC — empty 200Gi volume that Windows installs into.
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: ci1-rootdisk
  namespace: kubevirt-vms
 spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
  storageClassName: longhorn
 ---
 # Sysprep ConfigMap — autounattend.xml for hands-off Windows install.
 # Sets the local Administrator password (encoded Value below, sourced from
 # 1Password), enables RDP and WinRM, and sets the hostname; networking is DHCP.
 # The ISO + VirtIO drivers handle the rest.
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: ci1-autounattend
  namespace: kubevirt-vms
 data:
  autounattend.xml: |
    <?xml version="1.0" encoding="utf-8"?>
    <unattend xmlns="urn:schemas-microsoft-com:unattend" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State">
      <!-- Pass 1: WindowsPE — Disk setup and VirtIO driver injection -->
      <settings pass="windowsPE">
        <component name="Microsoft-Windows-International-Core-WinPE"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <SetupUILanguage>
            <UILanguage>en-US</UILanguage>
          </SetupUILanguage>
          <InputLocale>en-US</InputLocale>
          <SystemLocale>en-US</SystemLocale>
          <UILanguage>en-US</UILanguage>
          <UserLocale>en-US</UserLocale>
        </component>
        <component name="Microsoft-Windows-PnpCustomizationsWinPE"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <DriverPaths>
            <PathAndCredentials wcm:action="add" wcm:keyValue="1">
              <Path>E:\amd64\2k25</Path>
            </PathAndCredentials>
          </DriverPaths>
        </component>
        <component name="Microsoft-Windows-Setup"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <DiskConfiguration>
            <Disk wcm:action="add">
              <DiskID>0</DiskID>
              <WillWipeDisk>true</WillWipeDisk>
              <CreatePartitions>
                <CreatePartition wcm:action="add">
                  <Order>1</Order>
                  <Size>260</Size>
                  <Type>EFI</Type>
                </CreatePartition>
                <CreatePartition wcm:action="add">
                  <Order>2</Order>
                  <Size>128</Size>
                  <Type>MSR</Type>
                </CreatePartition>
                <CreatePartition wcm:action="add">
                  <Order>3</Order>
                  <Extend>true</Extend>
                  <Type>Primary</Type>
                </CreatePartition>
              </CreatePartitions>
              <ModifyPartitions>
                <ModifyPartition wcm:action="add">
                  <Order>1</Order>
                  <PartitionID>1</PartitionID>
                  <Format>FAT32</Format>
                  <Label>EFI</Label>
                </ModifyPartition>
                <ModifyPartition wcm:action="add">
                  <Order>2</Order>
                  <PartitionID>2</PartitionID>
                </ModifyPartition>
                <ModifyPartition wcm:action="add">
                  <Order>3</Order>
                  <PartitionID>3</PartitionID>
                  <Format>NTFS</Format>
                  <Label>Windows</Label>
                </ModifyPartition>
              </ModifyPartitions>
            </Disk>
          </DiskConfiguration>
          <ImageInstall>
            <OSImage>
              <InstallTo>
                <DiskID>0</DiskID>
                <PartitionID>3</PartitionID>
              </InstallTo>
              <!-- Index 2 = Standard Desktop Experience. Use 4 for Datacenter Desktop. -->
              <InstallFrom>
                <MetaData wcm:action="add">
                  <Key>/IMAGE/INDEX</Key>
                  <Value>2</Value>
                </MetaData>
              </InstallFrom>
            </OSImage>
          </ImageInstall>
          <UserData>
            <AcceptEula>true</AcceptEula>
            <FullName>FlowerCore CI Runner</FullName>
            <Organization>FlowerCore</Organization>
            <!-- Eval install — no product key needed for 180-day evaluation -->
          </UserData>
        </component>
      </settings>
      <!-- Pass 4: Specialize — Hostname, RDP, WinRM -->
      <settings pass="specialize">
        <component name="Microsoft-Windows-Shell-Setup"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <ComputerName>CI1</ComputerName>
          <TimeZone>Central Standard Time</TimeZone>
        </component>
        <component name="Microsoft-Windows-TerminalServices-LocalSessionManager"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <fDenyTSConnections>false</fDenyTSConnections>
        </component>
      </settings>
      <!-- Pass 7: OOBE — Admin account, RDP firewall, WinRM -->
      <settings pass="oobeSystem">
        <component name="Microsoft-Windows-Shell-Setup"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <OOBE>
            <HideEULAPage>true</HideEULAPage>
            <HideLocalAccountScreen>true</HideLocalAccountScreen>
            <HideOEMRegistrationScreen>true</HideOEMRegistrationScreen>
            <HideOnlineAccountScreens>true</HideOnlineAccountScreens>
            <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE>
            <ProtectYourPC>3</ProtectYourPC>
          </OOBE>
          <UserAccounts>
            <AdministratorPassword>
              <!-- Real password is in 1Password — vault qaphopopkryhbg353ukzhhuqoq,
                   item id h3ix4mgfk65gmkcmvh6ly3d3hu, title:
                   "ci1 Administrator (Windows Server 2025 KubeVirt VM)".
                   Field "autounattend AdministratorPassword Value (UTF-16-LE base64)"
                   matches the Value below.
                   To rotate: regenerate, recompute base64
                     $combined = $pw + "AdministratorPassword"
                     [Convert]::ToBase64String([Text.Encoding]::Unicode.GetBytes($combined))
                   then update both 1P item AND this Value field, recreate VM. -->
              <Value>bAA3AGsANABOAHcAcgBMAG4AeQBTAHUAYgBBAHQAaQBzAFUAcAB6AEMAWQAhADkAYQBCAEEAZABtAGkAbgBpAHMAdAByAGEAdABvAHIAUABhAHMAcwB3AG8AcgBkAA==</Value>
              <PlainText>false</PlainText>
            </AdministratorPassword>
          </UserAccounts>
          <FirstLogonCommands>
            <SynchronousCommand wcm:action="add">
              <Order>1</Order>
              <CommandLine>powershell.exe -ExecutionPolicy Bypass -Command "Set-NetFirewallRule -DisplayGroup 'Remote Desktop' -Enabled True"</CommandLine>
              <Description>Enable RDP firewall rule</Description>
            </SynchronousCommand>
            <SynchronousCommand wcm:action="add">
              <Order>2</Order>
              <CommandLine>powershell.exe -ExecutionPolicy Bypass -Command "Enable-PSRemoting -Force; Set-Item WSMan:\localhost\Service\Auth\Basic $true; Set-Item WSMan:\localhost\Service\AllowUnencrypted $true"</CommandLine>
              <Description>Enable WinRM (Phase 2 will pivot to HTTPS via step-ca cert)</Description>
            </SynchronousCommand>
            <SynchronousCommand wcm:action="add">
              <Order>3</Order>
              <CommandLine>cmd.exe /c reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v EnableLUA /t REG_DWORD /d 0 /f</CommandLine>
              <Description>Disable UAC (Phase 2 Puppet will re-evaluate)</Description>
            </SynchronousCommand>
          </FirstLogonCommands>
        </component>
      </settings>
    </unattend>
 ---
 # VirtualMachine — Windows Server 2025 CI runner.
 apiVersion: kubevirt.io/v1
 kind: VirtualMachine
 metadata:
@@ -309,33 +25,7 @@ metadata:
    role: github-actions-runner
    flowercore.io/managed-by: bluejay-infra
 spec:
-  # `running: true` is deprecated in favor of `runStrategy`. They are mutually
+  runStrategy: Always
  # exclusive — KubeVirt's validating webhook rejects any VM that sets both:
  #   admission webhook "virtualmachine-validator.kubevirt.io" denied the request:
  #   Running and RunStrategy are mutually exclusive.
  # `Always` keeps a VMI running and restarts it if it crashes/exits — same
  # semantics as the old `running: true`.
  #
  # **2026-05-08 status: VM cannot start due to a stale QEMU flock on the
  # rootdisk PVC** (qemu reports `Failed to get "write" lock` on
  # `/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img`). The flock was
  # left by a previous QEMU process during a force-deleted launcher pod
  # cycle. Recovery requires either (a) a Longhorn engine restart on
  # rke2-agent2, (b) a Longhorn volume detach via the longhorn-manager API
  # (kubectl patch on `volume.longhorn.io/<pvc-name>` does not work — the
  # spec.nodeID is reconciled back), or (c) a node reboot of rke2-agent2.
  #
  # **Confirmed working:** the bootOrder swap (windows-iso=1, rootdisk=2)
  # and the runStrategy migration (above). The ISO PVC was successfully
  # repopulated via virtctl image-upload pvc on the Filesystem-mode PVC.
  #
  # **Open: SATA CDROM read timeout** — even with bootOrder=1, OVMF reported
  # `BdsDxe: failed to start Boot0001 ... Time out` reading the SATA CDROM
  # backed by the Filesystem-mode PVC. A switch to Block-mode DataVolume
  # was attempted but blocked by a CDI v1.65.0 upload-pod permission issue
  # (capability drop prevents writing to the underlying block device).
  # See header docstring on the ISO PVC.
  runStrategy: Always   # LIVE — ISO uploaded 2026-05-08, password in 1P
  template:
    metadata:
      labels:
@@ -377,73 +67,16 @@ spec:
        firmware:
          bootloader:
            efi:
              # 2026-05-08: SecureBoot=false during initial install. With SecureBoot
              # enabled, OVMF's BdsDxe times out reading Boot0001 from the SCSI
              # CDROM ("BdsDxe: failed to start Boot0001 ... Time out") before the
              # EFI bootloader signature can verify against the OVMF VARS trust DB.
              # KubeVirt's `/usr/share/OVMF/OVMF_VARS.secboot.fd` template doesn't
              # appear to include the Microsoft KEK/DB by default, so signed
              # Windows EFI bootloaders fail validation. Disabling SecureBoot lets
              # OVMF skip the chain check and boot directly. This is acceptable for
              # a CI runner. TPM 2.0 is still emulated (`tpm: {}` below), but that
              # vTPM is non-persistent, so it is not suitable for BitLocker.
              # When the operator wants SecureBoot back, the path is:
              #   1. Custom-build OVMF_VARS.fd with Microsoft KEK/DB enrolled
              #   2. Mount it into the VM via firmware.bootloader.efi.persistent
              #   3. Set secureBoot: true again
              # Tracked separately from the install unblock.
              secureBoot: false
        devices:
-          tpm: {}             # Non-persistent vTPM — sufficient for runner; no BitLocker
+          tpm: {}
          disks:
            # bootOrder: ISO must be 1 for first-boot install (the rootdisk has no
            # EFI bootloader yet). After Windows installs, it writes its own UEFI
            # Boot#### entries pointing at the rootdisk's EFI partition; UEFI then
            # boots from rootdisk going forward, and the ISO (still bootOrder:1)
            # stays attached as the re-install path.
            #
            # Original (broken) order had rootdisk=1, windows-iso=2 — UEFI tried
            # the empty virtio disk first, got nothing, fell back to the SATA
            # CDROM at Boot0001 with a short timeout, and timed out before the
            # CDROM enumerated. Console showed:
            #   BdsDxe: failed to start Boot0001 ... Time out
            #   BdsDxe: No bootable option or device was found.
            # Confirmed via debug pod: PVC content IS a real bootable ISO9660
            # (file: "ISO 9660 CD-ROM filesystem data ... (bootable)"), so the
            # only bug was boot priority.
            # 2026-05-08 PM: cdrom bus SCSI + containerDisk delivery. This
            # combination boots qemu cleanly and reaches OVMF, but OVMF
            # BdsDxe still hits "starting Boot0001 ... Time out" on the
            # cdrom — see HANDOFF.md / CODEX-STATUS.md "OPEN — ci1" for the
            # full diagnostic chain. virtio-blk disk swap was attempted as a
            # workaround but introduced a separate QEMU rootdisk flock issue
            # without fixing the underlying OVMF cdrom problem; reverted.
            # Operator decision needed for next architectural step (OVMF
            # custom build with extended timeout, KubeVirt version bump,
            # Hyper-V/VirtualBox-and-export, or BIOS legacy boot). The
            # containerDisk distribution pipeline (build/save/scp/ctr import)
            # is proven and ready to reuse for any of those.
            - name: windows-iso
              bootOrder: 1
              cdrom:
                bus: scsi
            - name: rootdisk
              bootOrder: 2
              disk:
                bus: virtio
            - name: virtio-drivers
              cdrom:
                bus: sata
            - name: sysprep
              cdrom:
                bus: sata
          interfaces:
-            # Pod-network fallback for Phase 1. To switch to PROD VLAN once Multus
-            # + the prod-vlan57 NAD exist, replace this block with:
+            # Pod-network fallback for CI runner outbound traffic. Switch to
+            # prod-vlan57 once the bridge/NAD lane is ready for L2 access.
            #   - name: prod-net
            #     bridge: {}
            #     model: virtio
            # and update the networks: stanza to use multus.networkName: kubevirt-vms/prod-vlan57
            - name: default
              masquerade: {}
              model: virtio
@@ -454,55 +87,7 @@ spec:
          pod: {}
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: ci1-rootdisk
        - name: windows-iso
          # 2026-05-08 PM (Path C, CONTAINERDISK): the ISO is now packaged as
          # a KubeVirt containerDisk OCI image baked from
          # `FROM scratch ; ADD --chown=107:107 disk.img /disk/disk.img`.
          # The qemu user (uid 107) reads the ISO directly from a tmpfs view
          # of the OCI layer, bypassing both:
          #   - Synology NFS export ACL (Path B failed: uid 107 denied at
          #     directory level even with mode 0777, see memory
          #     feedback_synology_iso_export_root_only_uid_107_denied)
          #   - OVMF cdrom read-window timeout (Path A and Path B's SCSI
          #     retry both hit `BdsDxe: failed to start Boot0001 ... Time out`
          #     when the cdrom was backed by a PVC the storage controller
          #     couldn't satisfy reads from fast enough).
          #
          # Image build (one-time, per ISO version):
          #   1. Copy ISO to disk.img, write Dockerfile
          #   2. podman build --tag localhost/win-server-2025:1.0 .  (on noc1)
          #   3. podman save -o win-server-2025-1.0.tar localhost/win-server-2025:1.0
          #   4. SCP tar to all 3 RKE2 nodes (rke2-server, rke2-agent1, rke2-agent2)
          #   5. sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \
          #        -n k8s.io images import /tmp/win-server-2025-1.0.tar
          # Standard FC pattern per `feedback_rke2_localhost_imagepullpolicy`.
          #
          # When a new Windows ISO version ships, bump the tag (1.1, 1.2, ...),
          # rebuild + redistribute, and update the image: line below in a new
          # commit. KubeVirt picks up the new image via a VM restart.
          #
          # The legacy NFS PVC + PV (apps/kubevirt-vms/win2025-iso-nfs-pv.yaml)
          # and CDI Longhorn PVC (`windows-server-2025-iso`) are RETAINED for
          # this commit so the prior states are recoverable. Once the
          # containerDisk path proves on a successful Windows install, both
          # legacy artifacts can be pruned in a follow-up commit.
          containerDisk:
-            image: localhost/win-server-2025:1.0
+            image: localhost/fc-win-server-2025:v1
            imagePullPolicy: Never
        - name: virtio-drivers
          containerDisk:
            # Pinned to v1.8.2 (latest stable as of 2026-05-08).
            # The :latest tag uses Docker manifest v1 schema which containerd
            # 2.1 (RKE2 v1.34.5) refuses to pull with:
            #   "media type application/vnd.docker.distribution.manifest.v1+prettyjws
            #    is no longer supported since containerd v2.1"
            # v1.8.2 is rebuilt with manifest v2/OCI and works on containerd 2.1.
            # Bump available: https://quay.io/repository/kubevirt/virtio-container-disk?tab=tags
            image: quay.io/kubevirt/virtio-container-disk:v1.8.2
        - name: sysprep
          sysprep:
            configMap:
              name: ci1-autounattend
      terminationGracePeriodSeconds: 3600
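The rotation recipe in the `ci1-autounattend` comment is written in PowerShell. For cross-checking the 1Password field from a non-Windows host, here is a Python equivalent, sketched under the same rule the comment states: append `AdministratorPassword` to the password, encode the result as UTF-16-LE, then base64-encode the bytes. The helper names are hypothetical.

```python
import base64


def encode_unattend_password(password: str, field: str = "AdministratorPassword") -> str:
    """Return the <Value> text used when <PlainText> is false.

    Same steps as the PowerShell in the ConfigMap comment: password + field,
    UTF-16-LE bytes (no BOM), then base64.
    """
    combined = password + field
    return base64.b64encode(combined.encode("utf-16-le")).decode("ascii")


def decode_unattend_password(value: str, field: str = "AdministratorPassword") -> str:
    """Reverse the encoding; raise ValueError if the expected suffix is missing."""
    text = base64.b64decode(value).decode("utf-16-le")
    if not text.endswith(field):
        raise ValueError(f"value does not end with {field!r}")
    return text[: -len(field)]
```

Comparing `encode_unattend_password(pw)` against the 1Password "UTF-16-LE base64" field before recreating the VM catches a mismatch that would otherwise only surface as a failed logon after install.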
--- a/apps/kubevirt-vms/kustomization.yaml
+++ b/apps/kubevirt-vms/kustomization.yaml
@@ -0,0 +1,3 @@
 resources:
  - ci1.yaml
  - prod-vlan57-nad.yaml
--- a/apps/monitoring/noc-monitoring.yaml
+++ b/apps/monitoring/noc-monitoring.yaml
@@ -974,6 +974,39 @@ data:
              summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} replica mismatch"
              description: "Spec wants {{ $labels.spec_replicas }} but only {{ $value }} available. Likely a rollout stuck on probe failure, scheduling, or PVC."
          # Q-MR-3 (2026-05-11): multus memory pressure — catches the next OOM
          # cascade BEFORE multus is OOM-killed cluster-wide. The 2026-05-10
          # outage (21h) hit because no alert fired on the rising multus working
          # set — only downstream blackbox / Traefik / service alerts. With
          # 1Gi limit (bluejay-infra@eb8693e), 80% = ~800MiB; steady-state
          # runs ~150-250MiB so this only fires when an avalanche starts.
          - alert: MultusMemoryPressure
            expr: |
              container_memory_working_set_bytes{container="kube-multus"}
                / container_spec_memory_limit_bytes{container="kube-multus"} > 0.8
            for: 5m
            labels:
              severity: critical
              alert_channel: thermal_print
            annotations:
              summary: "kube-multus memory >80% of limit on {{ $labels.node }} for 5m"
              description: "kube-multus working set is {{ $value | humanizePercentage }} of its memory limit on node {{ $labels.node }}. If this keeps climbing, multus will OOM and all new pod networking will halt cluster-wide (precedent: 2026-05-10 outage)."
          # Q-MR-3 (2026-05-11): namespace pending-pod backlog — catches the
          # operator-leak avalanche pattern BEFORE it cascades into a multus
          # CNI OOM. Any FC operator (RemoteDesktop / Distribution / WorldBuilder)
          # emitting pods without ownerReferences will accumulate them when
          # the operator crashes. >25 pending pods in any namespace for 30m
          # is the signal to investigate the reconciler.
          - alert: NamespacePendingPodBacklog
            expr: sum by (namespace) (kube_pod_status_phase{phase="Pending"}) > 25
            for: 30m
            labels:
              severity: warning
            annotations:
              summary: "Namespace {{ $labels.namespace }} has {{ $value }} Pending pods for 30m"
              description: "Pending pod count in {{ $labels.namespace }} exceeds 25 sustained for 30m. Likely operator-leak avalanche pattern — children emitted without ownerReferences. Risk of multus CNI OOM cascade."
      # Longhorn storage health alerts. Required: longhorn scrape job
      # (added 2026-04-26 — see scrape_configs above). The K8s events
      # for "snapshot becomes not ready to use" are transient lifecycle
--- a/apps/multus/multus.yaml
+++ b/apps/multus/multus.yaml
@@ -188,13 +188,24 @@ spec:
        - name: kube-multus
          image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
          command: [ "/usr/src/multus-cni/bin/multus-daemon" ]
          # 2026-05-11: the upstream default 50Mi memory limit OOM-cascades when
          # an operator-owned namespace accumulates >100 pending pods retrying
          # CNI ADD. RemoteDesktop emitted 219 orphan rd-browser-only pods
          # (missing OwnerReferences); kubelet's CNI ADD avalanche pushed multus
          # over 50Mi, it was OOMKilled, and it restarted with an even bigger
          # backlog → loop. 21h cluster outage. See FlowerCore.Notes:
          #   feedback_multus_50mi_limit_oom_orphan_pod_avalanche.md
          # 1Gi limit / 512Mi request comfortably handles a 200+ pod CNI
          # catch-up burst on 64GB nodes (nodes are <25% used in steady state).
          # Drop back toward 256Mi only after the kube-multus working set (the
          # metric MultusMemoryPressure watches) stays well below 200Mi.
          resources:
            requests:
              cpu: "100m"
-              memory: "50Mi"
+              memory: "512Mi"
            limits:
              cpu: "100m"
-              memory: "50Mi"
+              memory: "1Gi"
          securityContext:
            privileged: true
          terminationMessagePolicy: FallbackToLogsOnError
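The MultusMemoryPressure rule described in commit 653d4472f5 fires when the kube-multus working set exceeds 80% of its memory limit for 5m. The sketch below uses two hypothetical helpers, `to_bytes` and `pressure_threshold`, and handles only the binary suffixes used in this manifest. It shows where that threshold lands for each limit. Under the old 50Mi limit it is 40Mi, which leaves almost no warning before an OOM kill. Under the new 1Gi limit it is roughly 819Mi.

```sh
#!/bin/sh
# to_bytes QUANTITY
# Converts a Kubernetes memory quantity with a binary suffix (Ki, Mi, Gi)
# to bytes. Plain integers pass through unchanged; decimal suffixes
# (k, M, G) and fractional values are deliberately not handled.
to_bytes() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *)   echo "$1" ;;
  esac
}

# pressure_threshold LIMIT [PERCENT]
# Prints the byte count at which a working set crosses PERCENT (default 80)
# of LIMIT, rounded down to a whole byte.
pressure_threshold() {
  pct="${2:-80}"
  echo $(( $(to_bytes "$1") * pct / 100 ))
}

# pressure_threshold 50Mi  -> 41943040   (40Mi, old limit)
# pressure_threshold 1Gi   -> 858993459  (~819Mi, new limit)
```

The multus commit cites a typical working set of about 256Mi, so the new threshold leaves wide headroom between steady state and the alert.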
--- a/apps/telephony/telephony.yaml
+++ b/apps/telephony/telephony.yaml
@@ -127,10 +127,13 @@ spec:
      initContainers:
        - name: fix-data-perms
          image: busybox:latest
-          # Also chown /shared-tts (hostPath /tmp/tts-audio) so the non-root
-          # app user (uid 1654) can write Piper .sln16 files that Asterisk
-          # reads at /var/lib/asterisk/sounds/tts. World-readable (755) is
-          # fine — Asterisk runs as a different uid in the other pod.
+          # Runs as root to chown hostPath /tmp/tts-audio, which can come up
+          # root-owned after a node reboot; otherwise it inherits pod-level
+          # runAsNonRoot:true / runAsUser:1654 and chown fails with EPERM
+          # (see Notes memory feedback_hostpath_initcontainer_chown_perms).
          securityContext:
            runAsUser: 0
            runAsNonRoot: false
          command: ["sh", "-c", "chown -R 1654:1654 /data && chown 1654:1654 /shared-tts && chmod 0755 /shared-tts"]
          volumeMounts:
            - name: telephony-data
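After a node reboot, you can check the init container's effect against what its command sets on /shared-tts: owner 1654:1654 and mode 0755. This is a minimal sketch. `check_owner_mode` is a hypothetical helper that assumes a `stat` supporting `-c`, as GNU coreutils does. The example path assumes the volume is mounted at /shared-tts, as it is in the init container; the app container's mount path may differ.

```sh
#!/bin/sh
# check_owner_mode PATH UID GID MODE
# Succeeds when PATH is owned by UID:GID with octal permission bits MODE
# (e.g. 755); otherwise prints the actual values to stderr and fails.
# Assumes GNU-style `stat -c` (%u = owner uid, %g = group gid, %a = octal mode).
check_owner_mode() {
  actual=$(stat -c '%u:%g %a' "$1") || return 1
  expected="$2:$3 $4"
  if [ "$actual" = "$expected" ]; then
    return 0
  fi
  echo "$1: expected $expected, got $actual" >&2
  return 1
}

# Example, from a shell with the volume mounted at /shared-tts:
#   check_owner_mode /shared-tts 1654 1654 755
```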
--- a/tests/bluejay-infra-lint/PiSignagePlayerArtifactTests.cs
+++ b/tests/bluejay-infra-lint/PiSignagePlayerArtifactTests.cs
@@ -0,0 +1,266 @@
 using System.Text.Json;
 using FluentAssertions;
 using Xunit;
 namespace BluejayInfraLint.Tests;
 [Trait("Category", "Unit")]
 public sealed class PiSignagePlayerArtifactTests
 {
    private static readonly string Root = FindRepoRoot();
    private static readonly string AppRoot = Path.Combine(Root, "apps", "fc-signage-pi-player");
    public static TheoryData<string> RequiredArtifacts => new()
    {
        "README.md",
        "systemd/flowercore-signage-player-pi.service",
        "systemd/flowercore-signage-player-pi-hdmi.service",
        "systemd/flowercore-signage-bootstrap.service",
        "systemd/flowercore-signage-renew.service",
        "systemd/flowercore-signage-renew.timer",
        "systemd/flowercore-signage-detect-display.service",
        "systemd/flowercore-signage-detect-display.timer",
        "systemd/99-flowercore-signage-hdmi.rules",
        "chromium-policies/flowercore-signage.json",
        "scripts/flowercore-signage-launch.sh",
        "scripts/flowercore-signage-prelaunch.sh",
        "scripts/flowercore-signage-bootstrap.sh",
        "scripts/flowercore-signage-renew-cert.sh",
        "scripts/flowercore-signage-hdmi-respond.sh",
        "scripts/fc-signage-detect-display",
    };
    [Theory]
    [MemberData(nameof(RequiredArtifacts))]
    public void RequiredArtifacts_ArePresent(string relativePath)
    {
        File.Exists(Path.Combine(AppRoot, relativePath)).Should().BeTrue(relativePath);
    }
    [Fact]
    public void PlayerService_UsesExpectedRestartAndMemoryGuards()
    {
        var unit = Read("systemd/flowercore-signage-player-pi.service");
        unit.Should().Contain("Restart=always");
        unit.Should().Contain("RestartSec=10s");
        unit.Should().Contain("StartLimitBurst=5");
        unit.Should().Contain("StartLimitIntervalSec=300s");
        unit.Should().Contain("MemoryMax=2G");
    }
    [Fact]
    public void PlayerService_IsGatedByNodeIdentityAndMtlsCertificate()
    {
        var unit = Read("systemd/flowercore-signage-player-pi.service");
        unit.Should().Contain("ConditionPathExists=/etc/flowercore/signage-node.json");
        unit.Should().Contain("ConditionPathExists=/etc/fc-signage-player/client.p12");
        unit.Should().Contain("ExecStartPre=/usr/local/bin/flowercore-signage-prelaunch.sh");
    }
    [Fact]
    public void LaunchScript_TriesEmbedThenFallsBackToBarePlayerRoute()
    {
        var script = Read("scripts/flowercore-signage-launch.sh");
        script.Should().Contain("/player/${NODE_ID}/embed?token=${CERT_THUMB}");
        script.Should().Contain("url-divergence.log");
        script.Should().Contain("/player/${NODE_ID}?token=${CERT_THUMB}");
    }
    [Fact]
    public void LaunchScript_DisablesChromiumPromptsAndRuntimeUpdates()
    {
        var script = Read("scripts/flowercore-signage-launch.sh");
        script.Should().Contain("--noerrdialogs");
        script.Should().Contain("--disable-infobars");
        script.Should().Contain("--password-store=basic");
        script.Should().Contain("--check-for-update-interval=2592000");
    }
    [Fact]
    public void PrelaunchScript_AbortsWhenRequiredFilesAreMissing()
    {
        var script = Read("scripts/flowercore-signage-prelaunch.sh");
        script.Should().Contain("for f in /etc/flowercore/signage-node.json /etc/fc-signage-player/client.p12 /etc/fc-signage-player/client.p12.pass");
        script.Should().Contain("exit 1");
        script.Should().Contain("-checkend $((7*24*3600))");
    }
    [Fact]
    public void BootstrapScript_IsIdempotentWhenAlreadyEnrolled()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("already enrolled");
        script.Should().Contain("exit 0");
        script.Should().Contain(".enrolledAt");
    }
    [Fact]
    public void BootstrapScript_GeneratesStableMachineIdFromUuid()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("uuidgen");
        script.Should().Contain("cut -c1-16");
        script.Should().Contain("machineId");
    }
    [Fact]
    public void BootstrapScript_RetriesRegisterOnceForFirstCallRace()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("for attempt in 1 2");
        script.Should().Contain("register attempt $attempt returned");
        script.Should().Contain("sleep 5");
    }
    [Fact]
    public void BootstrapScript_SupportsSetupCodeAndApprovalPollingBudget()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("signage-setup-code");
        script.Should().Contain("approve-via-setup-code");
        script.Should().Contain("+ 1800");
        script.Should().Contain("sleep 15");
    }
    [Fact]
    public void BootstrapScript_CsrSubjectIdentifiesPiPlayer()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("/CN=${NODE_ID}/O=FlowerCore/OU=SignagePlayer-Pi");
    }
    [Fact]
    public void BootstrapScript_PersistsCertificateAsP12WithRestrictivePermissions()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("openssl pkcs12 -export");
        script.Should().Contain("client.p12.pass");
        script.Should().Contain("chmod 0600");
        script.Should().Contain("chmod 0640");
    }
    [Fact]
    public void RenewScript_OnlyRunsWhenCertHasLessThanThirtyDays()
    {
        var script = Read("scripts/flowercore-signage-renew-cert.sh");
        script.Should().Contain("-checkend $((30*24*3600))");
        script.Should().Contain("exit 0");
        script.Should().Contain("/renew");
    }
    [Fact]
    public void RenewScript_AtomicallySwapsNewCertificateFiles()
    {
        var script = Read("scripts/flowercore-signage-renew-cert.sh");
        script.Should().Contain("client.key.new");
        script.Should().Contain("mv \"$CERT_DIR/client.key.new\" \"$CERT_DIR/client.key\"");
        script.Should().Contain("mv \"$CERT_DIR/client.p12.new\" \"$CERT_DIR/client.p12\"");
    }
    [Fact]
    public void HdmiRule_RestartsPlayerAndRunsCapabilityDetection()
    {
        var rule = Read("systemd/99-flowercore-signage-hdmi.rules");
        rule.Should().Contain("KERNEL==\"card?-HDMI-A-?\"");
        rule.Should().Contain("restart flowercore-signage-player-pi.service");
        rule.Should().Contain("start flowercore-signage-detect-display.service");
    }
    [Fact]
    public void DetectDisplayServiceAndTimer_RunAtBootAndDaily()
    {
        var service = Read("systemd/flowercore-signage-detect-display.service");
        var timer = Read("systemd/flowercore-signage-detect-display.timer");
        service.Should().Contain("ExecStart=/usr/local/bin/fc-signage-detect-display");
        timer.Should().Contain("OnBootSec=30s");
        timer.Should().Contain("OnCalendar=daily");
        timer.Should().Contain("RandomizedDelaySec=1h");
    }
    [Fact]
    public void DetectDisplayScript_EmitsDisconnectedProfileWhenNoHdmiIsPresent()
    {
        var script = Read("scripts/fc-signage-detect-display");
        script.Should().Contain("displayConnected: false");
        script.Should().Contain("No HDMI display detected");
    }
    [Fact]
    public void DetectDisplayScript_ParsesEdidForHdrResolutionAndAudio()
    {
        var script = Read("scripts/fc-signage-detect-display");
        script.Should().Contain("edid-decode");
        script.Should().Contain("HDR (Static|Dynamic) Metadata Block");
        script.Should().Contain("maxResolution");
        script.Should().Contain("hasAudioOutput");
    }
    [Fact]
    public void DetectDisplayScript_TriesBothForwardCompatibleCapabilityEndpoints()
    {
        var script = Read("scripts/fc-signage-detect-display");
        script.Should().Contain("/api/v1/nodes/${NODE_ID}/capabilities");
        script.Should().Contain("/api/v1/displays/${NODE_ID}/capability-profile");
        script.Should().Contain("no endpoint accepted the profile");
    }
    [Fact]
    public void ChromiumPolicy_IsValidJsonAndDisablesCredentialPrompts()
    {
        using var doc = JsonDocument.Parse(Read("chromium-policies/flowercore-signage.json"));
        var root = doc.RootElement;
        root.GetProperty("AutofillAddressEnabled").GetBoolean().Should().BeFalse();
        root.GetProperty("AutofillCreditCardEnabled").GetBoolean().Should().BeFalse();
        root.GetProperty("PasswordManagerEnabled").GetBoolean().Should().BeFalse();
        root.GetProperty("ExtensionInstallBlocklist")[0].GetString().Should().Be("*");
    }
    [Fact]
    public void RenewalTimer_UsesDailyCadenceWithTwoHourJitter()
    {
        var timer = Read("systemd/flowercore-signage-renew.timer");
        timer.Should().Contain("OnCalendar=daily");
        timer.Should().Contain("RandomizedDelaySec=2h");
        timer.Should().Contain("Persistent=true");
    }
    private static string Read(string relativePath)
        => File.ReadAllText(Path.Combine(AppRoot, relativePath.Replace('/', Path.DirectorySeparatorChar)));
    private static string FindRepoRoot()
    {
        var current = new DirectoryInfo(AppContext.BaseDirectory);
        while (current is not null)
        {
            if (Directory.Exists(Path.Combine(current.FullName, "apps"))
                && File.Exists(Path.Combine(current.FullName, "README.md")))
            {
                return current.FullName;
            }
            current = current.Parent;
        }
        throw new DirectoryNotFoundException("Could not find bluejay-infra root.");
    }
 }
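The prelaunch and renewal tests pin `openssl x509 -checkend` windows of 7 and 30 days. `-checkend N` exits 0 if the certificate will still be valid N seconds from now, and non-zero if it will have expired by then. That makes it a natural gate for "renew only inside the last 30 days". `needs_renewal` is a hypothetical helper that assumes a PEM certificate file. The scripts' actual file handling is not shown in this diff.

```sh
#!/bin/sh
# needs_renewal CERT_PEM [DAYS]
# Succeeds (exit 0) when CERT_PEM expires within DAYS days (default 30) or
# cannot be read; fails (exit 1) when it stays valid beyond that window.
needs_renewal() {
  days="${2:-30}"
  if openssl x509 -noout -in "$1" -checkend $((days * 24 * 3600)) >/dev/null; then
    return 1   # still valid beyond the window: nothing to do
  fi
  return 0     # expires inside the window (or unreadable): renew
}

# Hypothetical usage:
#   if needs_renewal "$CERT_DIR/client.crt" 30; then echo "renewing"; fi
```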
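The bootstrap tests describe two bounded waits. The register call is retried once after `sleep 5` to absorb a first-call race. Approval is then polled every 15 seconds against a deadline 1800 seconds (30 minutes) out. The generic sketch below covers both patterns. `retry_once`, `poll_until`, and the example command names are hypothetical and not taken from the script. It also assumes a `date` that supports `+%s`.

```sh
#!/bin/sh
# retry_once DELAY CMD...
# Runs CMD; on failure waits DELAY seconds and runs it exactly once more.
retry_once() {
  delay="$1"; shift
  for attempt in 1 2; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt failed" >&2
    if [ "$attempt" -eq 1 ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# poll_until BUDGET_SECS INTERVAL_SECS CMD...
# Re-runs CMD every INTERVAL_SECS until it succeeds or the budget is spent.
# The last attempt can start up to one interval after the deadline.
poll_until() {
  budget="$1"; interval="$2"; shift 2
  deadline=$(( $(date +%s) + budget ))
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical usage:
#   retry_once 5 register_node
#   poll_until 1800 15 approval_granted
```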
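`RenewScript_AtomicallySwapsNewCertificateFiles` checks for the write-then-rename idiom. New material is written beside the live file as `*.new` and then moved over it. Within one filesystem `mv` is a `rename(2)`, so readers see either the old file or the new one, never a partial write. Here is a minimal sketch with a hypothetical `replace_atomically` helper:

```sh
#!/bin/sh
# replace_atomically DEST MODE
# Reads new content from stdin, writes it to DEST.new (created under a 077
# umask so a key is never briefly world-readable), applies MODE, then
# renames it over DEST. On one filesystem the rename is atomic.
replace_atomically() {
  dest="$1"; mode="$2"
  tmp="$dest.new"
  ( umask 077 && cat > "$tmp" ) || return 1
  chmod "$mode" "$tmp" || return 1
  mv -f "$tmp" "$dest"
}

# Hypothetical usage:
#   replace_atomically "$CERT_DIR/client.key" 0600 < renewed.key
```

Each file is replaced atomically on its own. Swapping `client.key` and `client.p12` with two separate `mv` calls still leaves a short window where one file is new and the other is old. A reader that needs both should tolerate that or retry.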
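`DetectDisplayScript_TriesBothForwardCompatibleCapabilityEndpoints` expects the script to try the nodes endpoint first, then the displays endpoint. If both refuse, it logs `no endpoint accepted the profile`. The sketch below implements that fallback with the transport kept pluggable. `post_to_first`, `curl_post_json`, and `BASE` are hypothetical, and the real script's mTLS options are not shown.

```sh
#!/bin/sh
# post_to_first CMD PAYLOAD_FILE URL...
# Calls `CMD URL PAYLOAD_FILE` for each URL in order, prints the first URL
# that succeeds, and stops there. Fails with a log line if none succeed.
post_to_first() {
  cmd="$1"; payload="$2"; shift 2
  for url in "$@"; do
    if "$cmd" "$url" "$payload"; then
      echo "$url"
      return 0
    fi
  done
  echo "no endpoint accepted the profile" >&2
  return 1
}

# One possible CMD (assumed): curl -f turns HTTP status >= 400 into failure.
curl_post_json() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data-binary "@$2" "$1" >/dev/null
}

# Hypothetical usage:
#   post_to_first curl_post_json /tmp/profile.json \
#     "$BASE/api/v1/nodes/${NODE_ID}/capabilities" \
#     "$BASE/api/v1/displays/${NODE_ID}/capability-profile"
```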
Author	SHA1	Message	Date
Codex	e8094eb0bd	ci(github-runner): add Phase 2 ephemeral Linux runner K8s manifest Namespace github-runner with myoung34/github-runner:latest Deployment, 5Gi Longhorn RWO NuGet cache PVC, zero-privilege ServiceAccount, and OnePasswordItem CRD for the registration token. EPHEMERAL=true mode re-registers after each job; Recreate strategy avoids RWO multi-attach. Targets fc-build-linux label; single replica pinned to rke2-server node. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 12:46:25 -05:00
bluejay	8d87d9172c	Add Pi signage Phase 1 player artifacts Squash merge Sprint 14 Pi signage player artifacts.	2026-05-14 01:46:09 +00:00
Codex	cfd9743afa	Add Apple TV signage docs manifest	2026-05-13 20:32:48 -05:00
Andrew Stoltz	5029e209cd	kubevirt-vms: boot ci1 from server template	2026-05-12 16:58:18 -05:00
Codex	f298339152	fix(guacamole): add --- separator between macmini-vnc-creds OnePasswordItem and guacamole-branding ConfigMap Missing document separator caused YAML to merge the OnePasswordItem's top-level `spec: itemPath:` block into the ConfigMap that follows. Result: a ConfigMap with a `.spec` field whose K8s schema does not declare one, triggering ArgoCD's structured-merge diff to fail since 2026-05-11T15:30:54Z: Failed to compare desired state to live state: failed to calculate diff: error calculating structured merge diff: error building typed value from config resource: .spec: field not declared in schema App stayed Healthy (live K8s tolerated the extra field — ConfigMap ignored it) but ArgoCD's diff calc was broken, leaving the app stuck at sync=Unknown for all 21 resources. Adding the missing `---` separator makes the OnePasswordItem and ConfigMap proper sibling YAML documents, each with its own kind-correct schema. Diagnosed during 2026-05-12 morning routine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 09:26:03 -05:00
Codex	6e7d88db49	feat(fc-redis): add SignalR backplane for cross-product event bus (Q-SO-1 Phase A) Per Q-SO-1 operator resolution 2026-05-11 PM, Redis SignalR backplane lands in Phase A (was Phase C deferral). Treats Redis as a managed FC infrastructure component, not a deferred scaling escalation. Lands the minimal Phase A surface: - Namespace fc-redis - Single Redis 7-alpine pod with 1Gi Longhorn RWO PVC - ConfigMap with AOF persistence (everysec), 256Mi maxmemory, allkeys-lru - ClusterIP Service `redis.fc-redis.svc.cluster.local:6379` (in-cluster only) - No AUTH Phase A (Phase B add via 1Password Connect rotation) - No IngressRoute (backplane is server-to-server) Consumers (Phase A IMPL across FC services) add: services.AddSignalR().AddStackExchangeRedis( "redis.fc-redis.svc.cluster.local:6379", opts => opts.Configuration.ChannelPrefix = StackExchange.Redis.RedisChannel.Literal("fc-opsconsole")); Phase B/C follow-ons (not in this commit): Sentinel for HA, AUTH password from 1Password, redis_exporter sidecar for Prometheus, network policies. See FlowerCore.Notes/docs/signage/operations-console-phase-2-design.md section 3.5 (rewritten) and decisions-waiting.html Q-SO-1 (RESOLVED). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:02:58 -05:00
Codex	5ae50bd491	fix(telephony): init container runs as root to chown hostPath /tmp/tts-audio The fix-data-perms init container chowns /data (PVC) and /shared-tts (hostPath /tmp/tts-audio on rke2-agent1) to uid 1654 so the non-root telephony-web app can write Piper TTS .sln16 files. Without an explicit container-level securityContext override, the init container inherits pod-level runAsNonRoot:true / runAsUser:1654 and fails with 'chown: /shared-tts: Operation not permitted' the first time the hostPath comes up root-owned after a node reboot. Outage 2026-05-11 23:00 UTC: telephony-web in Init:CrashLoopBackOff for 9 hours (100+ restarts) until init container was bumped to runAsUser:0. Live cluster patched in the same operation; this commit makes the fix durable in git so ArgoCD sync preserves it. See Notes memory: feedback_hostpath_initcontainer_chown_perms Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:37:15 -05:00
Codex	653d4472f5	fix(monitoring): mirror Q-MR-3 MultusMemoryPressure + NamespacePendingPodBacklog alerts Two new preventive alert rules added to the kubernetes-state group of the K8s migration target ConfigMap. The live Podman Prometheus on noc1 has already been updated via FlowerCore.Notes/scripts/monitoring/alerts.yml + sudo cp + podman pod restart monitoring (this commit only locks it in the bluejay-infra K8s mirror so a future migration carries it forward). MultusMemoryPressure (critical, thermal_print): fires when kube-multus working set exceeds 80% of its memory limit for 5m. Catches the next multus OOM cascade BEFORE it kills the daemon cluster-wide. The 2026-05-10 21h outage hit because no alert fired on the rising multus working set; only downstream blackbox / Traefik / service alerts triggered, after the fact. NamespacePendingPodBacklog (warning): fires when any single namespace has >25 Pending pods sustained for 30m. Catches the operator-leak avalanche pattern (orphan pods from a crashed reconciler emitting children without ownerReferences) before it cascades into a CNI OOM. See FlowerCore.Notes: - feedback_multus_50mi_limit_oom_orphan_pod_avalanche - feedback_monitoring_k8s_target_vs_live_podman (workflow) Companion commits: - bluejay-infra@eb8693e (multus memory limit) - FlowerCore.RemoteDesktop@b02c59b (OwnerReferences fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 10:42:27 -05:00
Codex	eb8693e1ce	fix(multus): bump kube-multus-ds memory 50Mi/50Mi -> 1Gi/512Mi (prevent OOM cascade) Cluster outage 2026-05-10T17:43 through 2026-05-11 ~10:30 (~21h). Root cause: FlowerCore.RemoteDesktop emitted 219 orphan rd-browser-only-* pods in fc-desktop (missing OwnerReferences — see companion fix in FlowerCore.RemoteDesktop). Kubelet's continuous CNI ADD retries for those pending pods drove a request queue that exceeded the upstream default 50Mi limit on kube-multus-ds. Multus OOMKilled (exit 137), restarted with an even bigger backlog, OOMKilled again, positive feedback loop. Restart counts climbed to 276 / 412 / 261 across the 3 RKE2 nodes. Downstream blast radius: both Traefik pods stuck ContainerCreating (101m + 4h35m), all Longhorn CSI attacher/provisioner/instance-manager stuck, every Prometheus blackbox probe for *.iamworkin.lan failing, UpdateCenterPublicEdgeDown critical on update.flowercore.io, every ArgoCD app showing sync=Unknown because repo-server lost git connectivity. 45 firing Prometheus alerts. Recovery sequence (Q-MR-1 from FlowerCore.Notes morning routine): 1. kubectl patch kube-multus-ds memory live (this commit locks it in git so ArgoCD doesn't revert on next sync) 2. Force-delete the 219 orphan pods (kubectl --grace-period=0 --force) to break the avalanche 3. Rollout restart kube-multus-ds — STABLE after restart with new limit 4. Restart Traefik + Longhorn CSI to clear stuck ContainerCreating 5. Verify update.flowercore.io returns 200 + ArgoCD apps reconcile Tested incrementally: 256Mi limit was insufficient (still OOMed on catchup burst), 512Mi was insufficient on rke2-agent1 (most pods concentrated there), 1Gi/512Mi handled the full 200+ pending pod CNI catchup cleanly with 0 multus restarts after rollout. Nodes are 64GB with <25% used in steady-state, so the ~256Mi typical working-set is well within the new limit. 
Companion change: FlowerCore.RemoteDesktop must set OwnerReferences on every worker pod so future operator crashes don't leak orphans (Q-MR-2). Preventive alerts (Q-MR-3) MultusMemoryPressure + NamespacePendingPodBacklog are coming in a follow-up commit to apps/monitoring/. Memory: feedback_multus_50mi_limit_oom_orphan_pod_avalanche Decisions card: docs/dashboards/decisions-waiting.html Q-MR-1..3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 10:30:05 -05:00