fix(ci1): switch ISO delivery to containerDisk OCI image (Path C)

OCI image: localhost/win-server-2025:1.0 (8.27 GB). Built FROM scratch with
ADD disk.img → /disk/disk.img on noc1, saved with podman as a tar (8.27 GB),
SCP'd in parallel to all 3 RKE2 nodes, and imported via ctr into the k8s.io
namespace. Verified present on all 3 schedulable nodes (rke2-server,
rke2-agent1, rke2-agent2).

Why containerDisk over the prior PVC paths:

- Path A (Longhorn Filesystem PVC, sata): OVMF BdsDxe SATA-CDROM read
  timeout. The CD-ROM-backed PVC is too slow for OVMF's first-sector read
  window.
- Path B (Synology NFS): uid 107 (qemu) denied at the directory level by the
  Synology export ACL despite file mode 0777. Memory:
  feedback_synology_iso_export_root_only_uid_107_denied.
- Path B+SCSI: same OVMF timeout, just on the SCSI controller. The bus choice
  was not load-bearing; the issue was always the slow PVC backing.
- Path C (this commit): containerDisk delivers the ISO bytes from a tmpfs
  view of the OCI layer, with no PVC controller in the read path. qemu reads
  at native FS speed, and the OVMF first-sector read completes well within
  the timeout. This is also the KubeVirt-recommended pattern for installer
  ISOs.

Connects to the FlowerCore.Distribution / Provisioning USB story: the same
"OCI image of the OS installer + autounattend on a sysprep CDROM" pattern
that the USB provisioning agent will use.

The Windows install proceeds hands-off via the existing autounattend.xml in
the ci1-autounattend ConfigMap (RDP and WinRM enabled, UAC disabled,
Administrator password from 1Password vault item
h3ix4mgfk65gmkcmvh6ly3d3hu).

Image lifecycle: bump the tag (1.1, 1.2, ...) when the ISO version changes,
rebuild on noc1, redistribute to the RKE2 nodes, and update the image: line.

The legacy NFS PVC + PV manifest and the CDI Longhorn PVC are RETAINED for
this commit so prior states are recoverable. They will be pruned in a
follow-up once the containerDisk boot is proven.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
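A minimal sketch of the VM volume wiring described in the commit message above, built as a Python dict and dumped to JSON. The image tag, the `/disk/disk.img` path and the `ci1-autounattend` ConfigMap come from the message; the disk names (`installer-iso`, `sysprep`), the `sata` bus default and `imagePullPolicy: Never` (the image is imported with `ctr`, not pulled) are assumptions, not copied from the ci1 manifest. The install-target disk and the rest of the VM spec are omitted.

```python
import json

# Values taken from the commit message.
IMAGE = "localhost/win-server-2025:1.0"
DISK_PATH = "/disk/disk.img"
AUTOUNATTEND_CONFIGMAP = "ci1-autounattend"


def installer_devices(image: str = IMAGE, bus: str = "sata") -> dict:
    """Return the disks/volumes fragment of a KubeVirt VMI template spec.

    Disk names and the default bus are hypothetical choices for illustration.
    """
    disks = [
        # Installer ISO served straight from the containerDisk, as a CD-ROM.
        {"name": "installer-iso", "cdrom": {"bus": bus}},
        # autounattend.xml delivered through KubeVirt's sysprep volume.
        {"name": "sysprep", "cdrom": {"bus": bus}},
    ]
    volumes = [
        {
            "name": "installer-iso",
            "containerDisk": {
                "image": image,
                "path": DISK_PATH,
                # Assumption: image was imported with ctr, so never pull.
                "imagePullPolicy": "Never",
            },
        },
        {"name": "sysprep", "sysprep": {"configMap": {"name": AUTOUNATTEND_CONFIGMAP}}},
    ]
    return {"domain": {"devices": {"disks": disks}}, "volumes": volumes}


if __name__ == "__main__":
    print(json.dumps(installer_devices(), indent=2))
```

Under the image lifecycle above, a tag bump (1.1, 1.2, ...) only changes the `image` argument; the bus is a parameter because the earlier attempts already flipped it between `sata` and `scsi`.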
fix(ci1): revert NFS Path B + flip ISO cdrom bus sata→scsi
2026-05-08 20:45:38 -05:00 · 2026-05-08 18:54:36 -05:00 · 2026-05-08 17:03:42 -05:00 · 2026-05-08 15:18:38 -05:00 · 2026-05-08 14:32:52 -05:00 · 2026-05-08 14:23:31 -05:00
57 changed files with 13626 additions and 253 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,7 @@
 # .NET build outputs (lint test project)
 **/bin/
 **/obj/
 # Editor / temp
 .DS_Store
 *.swp
--- a/README.md
+++ b/README.md
@@ -99,8 +99,23 @@ curl -sk -X DELETE https://dns.iamworkin.lan/api/v1/servers/<serverId>/zones/iam
 - **CoreDNS template + ndots:5 collision**: inside pods, `<svc>.<ns>.svc.cluster.local` with <5 dots gets search-expanded through `iamworkin.lan` FIRST and hits the wildcard template → resolves to Traefik VIP, not the real ClusterIP. Use short service names (`<svc>`) in K8s manifests. See memory `feedback_coredns_ndots_template_collision.md`.
 - **Image not on node**: pods stuck `ErrImageNeverPull` means the image wasn't imported to the node Kubernetes scheduled the pod onto. `ctr images import` on all of rke2-server, rke2-agent1, rke2-agent2.
 - **StatefulSet PVC drift**: `volumeClaimTemplates` needs explicit `volumeMode: Filesystem` or ArgoCD SSA self-heals forever. See memory `feedback_argocd_statefulset_pvc_drift.md`.
 - **IngressRoute namespace split**: this RKE2 Traefik install does not allow cross-namespace service refs. Keep the `IngressRoute`, backend `Service`, and TLS secret in the same namespace; if one host is shared across namespaces, duplicate the `Certificate` and move the route next to the destination service.
 - **Public read-only hosts**: if a public host fronts a service that also exposes admin writes internally, add a Traefik route match like `Host(...) && (Method(GET) || Method(HEAD))` on the public edge instead of trusting the app to reject unsafe methods.
 - **Public read-write allowlist hosts**: if a public host accepts a tightly bounded write surface (e.g. bootstrap-JWT POST), pin the allowlist as `(Method(GET) || Method(HEAD) || Method(POST) || Method(OPTIONS))`. PUT/PATCH/DELETE must still 404 at the route. Track A's `updatecenter.iamworkin.lan` / `updates.iamworkin.lan` are the canonical example. The lint test enforces this invariant.
 - **Traefik VIP netpols**: when a `NetworkPolicy` allows `10.0.56.200`, also allow the post-DNAT backend ports (`8443` for TLS plus `8080` or `8000` for HTTP) or Calico will drop the rewritten flow.
 - **Auth-safe probes**: services behind API-key or global auth middleware should prefer `tcpSocket` probes unless `/health` is explicitly exempted before the middleware runs.
 - **ArgoCD must use internal Gitea URL**: `http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git`, not the external HTTPS URL (step-ca cert isn't trusted by ArgoCD). The `ApplicationSet` and any hand-created `Application` must both use the internal URL.
 ## Local manifest lint
 The repo now carries a local-first lint pass for the recurring K8s gotchas that have burned the fleet:
 ```bash
 dotnet test tests/bluejay-infra-lint/BluejayInfraLint.Tests.csproj -c Release
 ```
 That test project sweeps `bluejay-infra/apps/**` plus the canonical sibling `FlowerCore.*\\k8s` manifests that share the same workspace. Matching `conftest.dev` policy files live under `tests/bluejay-infra-lint/conftest.dev/` for environments that also have `conftest` or `opa`.
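The public-host method invariant from the troubleshooting notes can be approximated with a string check. This is a hypothetical Python mirror for illustration, not the .NET test's actual logic: it only collects `Method(...)` matchers with a regex and ignores boolean structure (a negated `!Method(PUT)` would be misread as allowing PUT), so treat it as a sketch.

```python
import re

# Allowlists as described in the README troubleshooting notes.
PUBLIC_READ_WRITE = frozenset({"GET", "HEAD", "POST", "OPTIONS"})
PUBLIC_READ_ONLY = frozenset({"GET", "HEAD"})

# Matches Method(GET), Method(`GET`), Method("GET") or Method('GET').
METHOD_RE = re.compile(r"Method\(\s*[`'\"]?([A-Za-z]+)[`'\"]?\s*\)")


def methods_in_rule(rule: str) -> set[str]:
    """Collect every HTTP method named in a Traefik rule string."""
    return {m.upper() for m in METHOD_RE.findall(rule)}


def allowlist_violations(rule: str,
                         allowed: frozenset[str] = PUBLIC_READ_WRITE) -> list[str]:
    """Return human-readable problems; an empty list means the rule passes."""
    methods = methods_in_rule(rule)
    if not methods:
        return ["no Method(...) matcher: every HTTP method reaches the backend"]
    return [f"method {m} is not in the allowlist" for m in sorted(methods - allowed)]


rule = ("Host(`updates.iamworkin.lan`) && (Method(`GET`) || Method(`HEAD`) "
        "|| Method(`POST`) || Method(`OPTIONS`))")
print(allowlist_violations(rule))
```

Passing `PUBLIC_READ_ONLY` instead checks the stricter read-only shape used for hosts that front admin writes internally.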
 ## References
 - Cert-manager recovery playbook: `FlowerCore.Notes/memory/project_cert_manager_recovery_2026_04_22.md`
--- a/apps/agent-zero/agent-zero.yaml
+++ b/apps/agent-zero/agent-zero.yaml
@@ -2,14 +2,15 @@
 # Agent Zero AI Stack — NUC Deployment (RKE2 Bare-Metal)
 # =============================================================================
 # Deploys: AgentZero (agent UI) on RKE2 cluster with Blue Jay profile
-# Ollama: workstation-first via BLUEJAY-WS (10.0.56.20:11434) with edge1 Pi 5
+# Ollama: edge1 Pi 5 + AI HAT+ ONLY (10.0.57.17:11434).
-# fallback (10.0.57.17:11434)
+# Workstation Ollama (BLUEJAY-WS) is intentionally NOT in the upstream —
 # the workstation is private dev hardware, not a cluster dependency.
 # Target: RKE2 bare-metal cluster, namespace: agent-zero
 # Profile: Blue Jay (21 tools, 3 prompts, 4 extensions, theme)
 #
 # Differences from LOCAL (WSL K3s):
 #   - Uses Longhorn StorageClass (not local-path)
-#   - Prefers workstation Ollama on the R9700, falls back to edge1 Pi 5
+#   - Cluster-only Ollama path (edge1) — keeps workstation private
 #   - NO Anthropic API key (free/local models only)
 #   - NO Piper TTS or Kiwix (edge1 handles TTS, no Wikipedia needed)
 #   - NO hostPath volumes — profile/tools/extensions loaded via ConfigMaps
@@ -91,14 +92,17 @@ subjects:
 # =============================================================================
 # Agent Zero — AI Agent Web UI (NUC Edition, Blue Jay Profile)
 # =============================================================================
-# Connects to a local proxy that routes to workstation Ollama first and edge1 second
+# Connects directly to fc-llm-bridge for chat + internal util/embed + browser.
-# Blue Jay profile with 21 tools, 3 prompts, 4 extensions
+# Agent Zero's internal util/embed slots stay on the bridge's OpenAI-compatible
 # /v1 surface, while browser + corpus-search use the Ollama-compatible /api/*
 # surface through OLLAMA_HOST.
 # Blue Jay profile with 21 tools, 3 prompts, 4 extensions.
 ---
-# FC LLM Bridge API key for Agent Zero (ADR-088 chat_model routing).
+# FC LLM Bridge API key for Agent Zero (ADR-088 chat/util/embed/browser routing).
 # Syncs from 1Password item "FC LLM Bridge API Keys" (field: agent-zero-k8s).
-# Consumed by the chat_model only; util / embedding / browser stay on local
+# Consumed by chat, internal util/embed, browser, and corpus-search requests
-# Ollama via the 127.0.0.1 sidecar proxy.
+# that traverse fc-llm-bridge.
 apiVersion: onepassword.com/v1
 kind: OnePasswordItem
 metadata:
@@ -107,6 +111,34 @@ metadata:
 spec:
  itemPath: "vaults/IAmWorkin/items/FC LLM Bridge API Keys"
 ---
 # Print.Web API key for Agent Zero's print_web.py Python tool.
 # Syncs from 1Password item "Print.Web API Keys" (password field = API key).
 # The print_web.py tool reads PRINT_WEB_API_KEY env var for all HTTP requests
 # to the thermal print service (GET /api/mcp/tools, POST /api/print/*, etc.).
 # Note: Print.Web uses the legacy REST MCP shape (/api/mcp/tools/*), not the
 # streamable-http MCP protocol. The print_web Python tool bridges this gap
 # and is already present in bluejay-tools ConfigMaps.
 apiVersion: onepassword.com/v1
 kind: OnePasswordItem
 metadata:
  name: print-web-api-keys
  namespace: agent-zero
 spec:
  itemPath: "vaults/IAmWorkin/items/Print.Web API Keys"
 ---
 # Knowledge MCP bearer token for the direct Agent Zero -> Knowledge.Web path.
 # The 1Password item currently stores the raw token in its concealed PASSWORD
 # field, which the operator syncs to Secret key `password`.
 apiVersion: onepassword.com/v1
 kind: OnePasswordItem
 metadata:
  name: knowledge-mcp-tokens
  namespace: agent-zero
 spec:
  itemPath: "vaults/IAmWorkin/items/FlowerCore Knowledge MCP Tokens"
 ---
 apiVersion: apps/v1
 kind: Deployment
@@ -118,7 +150,7 @@ metadata:
  annotations:
    agent-zero/deployment: "nuc"
    agent-zero/profile: "bluejay"
-    agent-zero/ollama: "BLUEJAY-WS primary (10.0.56.20:11434), edge1 fallback (10.0.57.17:11434)"
+    agent-zero/ollama: "fc-llm-bridge fronts edge1 Pi 5 + AI HAT+ Ollama for cluster browser/corpus-search traffic; internal chat/util/embed route through the bridge's authenticated OpenAI surface"
 spec:
  replicas: 1
  selector:
@@ -133,19 +165,18 @@ spec:
    spec:
      serviceAccountName: agent-zero
      initContainers:
-        # Wait for either workstation or edge1 Ollama to be reachable before starting Agent Zero.
+        # Wait for fc-llm-bridge to be reachable before starting Agent Zero.
-        - name: wait-for-ollama
+        - name: wait-for-llm-bridge
          image: busybox:1.37
          command: ["sh", "-c"]
          args:
            - |
-              echo "Waiting for Ollama at BLUEJAY-WS or edge1..."
+              echo "Waiting for fc-llm-bridge..."
-              until wget -qO- --timeout=2 http://10.0.56.20:11434/api/tags >/dev/null 2>&1 || \
+              until wget -qO- --timeout=2 http://fc-llm-bridge.fc-llm-bridge.svc:8080/healthz >/dev/null 2>&1; do
-                    wget -qO- --timeout=2 http://10.0.57.17:11434/api/tags >/dev/null 2>&1; do
+                echo "fc-llm-bridge not ready yet, retrying in 5s..."
                echo "No Ollama endpoint ready yet, retrying in 5s..."
                sleep 5
              done
-              echo "At least one Ollama endpoint is reachable."
+              echo "fc-llm-bridge is reachable."
        # Assemble the Blue Jay profile directory structure from ConfigMaps.
        # ConfigMaps can't create nested dirs, so we copy into the workspace PVC.
        - name: setup-bluejay
@@ -192,50 +223,6 @@ spec:
            - name: bluejay-theme
              mountPath: /tmp/bluejay-theme
      containers:
        - name: ollama-proxy
          image: nginx:1.27-alpine
          command: ["/bin/sh", "-c"]
          args:
            - |
              cat > /etc/nginx/nginx.conf <<'NGINX'
              worker_processes  1;
              events { worker_connections 1024; }
              http {
                upstream ollama_upstream {
                  server 10.0.56.20:11434 max_fails=2 fail_timeout=10s;
                  server 10.0.57.17:11434 backup;
                  keepalive 16;
                }
                server {
                  listen 11434;
                  location / {
                    proxy_http_version 1.1;
                    proxy_set_header Connection "";
                    proxy_set_header Host $host;
                    proxy_connect_timeout 5s;
                    proxy_read_timeout 600s;
                    proxy_send_timeout 600s;
                    proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;
                    proxy_pass http://ollama_upstream;
                  }
                }
              }
              NGINX
              exec nginx -g 'daemon off;'
          ports:
            - containerPort: 11434
          readinessProbe:
            httpGet:
              path: /api/tags
              port: 11434
            initialDelaySeconds: 5
            periodSeconds: 15
          livenessProbe:
            httpGet:
              path: /api/tags
              port: 11434
            initialDelaySeconds: 10
            periodSeconds: 30
        - name: agent-zero
          image: agent0ai/agent-zero:latest
          command: ["/bin/bash", "-c"]
@@ -256,23 +243,41 @@ spec:
              # chat_model: FlowerCore LLM Bridge (ADR-088) — OpenAI-compat,
              # spend-tracked, tier-aliased (fc:balanced → Claude Sonnet).
              # api_key comes from A0_SET_chat_model_api_key env var (overrides
-              # config.json). util + embedding stay on local 127.0.0.1 Ollama
+              # config.json). Utility + embedding stay on the authenticated
-              # proxy (workstation primary, edge1 fallback).
+              # OpenAI-compatible /v1 surface; browser and direct tool traffic
              # use the bridge's Ollama-compatible root via OLLAMA_HOST.
              mkdir -p /a0/usr/plugins/_model_config
              cat > /a0/usr/plugins/_model_config/config.json << 'MODELCFG'
-              {"allow_chat_override":true,"chat_model":{"provider":"openai","name":"fc:balanced","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","ctx_length":8192,"ctx_history":0.7,"vision":false,"kwargs":{"temperature":0,"num_ctx":8192}},"utility_model":{"provider":"ollama","name":"qwen2.5:1.5b","api_base":"http://127.0.0.1:11434","ctx_length":8192,"ctx_input":0.7,"kwargs":{"num_ctx":8192}},"embedding_model":{"provider":"ollama","name":"nomic-embed-text","api_base":"http://127.0.0.1:11434","kwargs":{}}}
+              {"allow_chat_override":true,"chat_model":{"provider":"openai","name":"fc:balanced","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","ctx_length":8192,"ctx_history":0.7,"vision":false,"kwargs":{"temperature":0,"num_ctx":8192}},"utility_model":{"provider":"openai","name":"fc:cheap","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","ctx_length":8192,"ctx_input":0.7,"kwargs":{"num_ctx":8192}},"embedding_model":{"provider":"openai","name":"openai/fc:embedding","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","kwargs":{}}}
              MODELCFG
              # Strip heredoc indentation
              sed -i 's/^              //' /a0/usr/plugins/_model_config/config.json
              # Phase 0 Chat MCP pilot: Agent Zero does not interpolate env vars
              # inside A0_SET_mcp_servers JSON, so build the final JSON here from
-              # the secret-backed CHAT_MCP_API_KEY env var before initialize.sh.
+              # the secret-backed env vars before initialize.sh. Keep the local
-              # Use the in-cluster Chat service URL rather than the public
+              # corpus_search.py tool mounted either way so outage fallback
-              # Traefik hostname so the pod stays off the private VIP lane that
+              # remains available even when fc_knowledge is not advertised.
-              # the default egress rule blocks.
+              export KNOWLEDGE_MCP_ENABLED=false
-              if [ -n "${CHAT_MCP_API_KEY:-}" ]; then
+              if [ -n "${KNOWLEDGE_MCP_BEARER_TOKEN:-}" ]; then
-                export A0_SET_mcp_servers="{\"mcpServers\":{\"fc-chat\":{\"type\":\"streamable-http\",\"url\":\"http://chat-web.fc-chat.svc/mcp\",\"headers\":{\"X-Api-Key\":\"${CHAT_MCP_API_KEY}\"}}}}"
+                if curl -sf --connect-timeout 3 "${KNOWLEDGE_MCP_HEALTH_URL}" > /dev/null && \
                   curl -sf --connect-timeout 5 \
                     -H "Authorization: Bearer ${KNOWLEDGE_MCP_BEARER_TOKEN}" \
                     -H "Accept: application/json, text/event-stream" \
                     -H "Content-Type: application/json" \
                     -d '{"jsonrpc":"2.0","id":"fc-knowledge-bootstrap","method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"agent-zero-bootstrap","version":"1.0"}}}' \
                     "${KNOWLEDGE_MCP_URL}" > /dev/null; then
                  export KNOWLEDGE_MCP_ENABLED=true
                  echo "fc_knowledge enabled from ${KNOWLEDGE_MCP_URL}."
                else
                  echo "fc_knowledge unavailable or unauthorized; keeping local corpus_search.py as the fallback path."
                fi
              else
                echo "fc_knowledge token missing; keeping local corpus_search.py as the fallback path."
              fi
              export A0_SET_mcp_servers="$(
                python3 -c 'import json, os; servers = {}; chat_key = os.getenv("CHAT_MCP_API_KEY"); knowledge_enabled = os.getenv("KNOWLEDGE_MCP_ENABLED", "false").lower() == "true"; token = os.getenv("KNOWLEDGE_MCP_BEARER_TOKEN", "") if knowledge_enabled else ""; chat_key and servers.setdefault("fc_chat", {"type": "streamable-http", "url": "http://chat-web.fc-chat.svc/mcp", "headers": {"X-Api-Key": chat_key}}); token and servers.setdefault("fc_knowledge", {"type": "streamable-http", "url": os.getenv("KNOWLEDGE_MCP_URL", "http://knowledge-web.knowledge.svc/mcp"), "headers": {"Authorization": f"Bearer {token}"}}); print(json.dumps({"mcpServers": servers}, separators=(",", ":")))'
              )"
              # Run the original entrypoint
              exec /exe/initialize.sh $BRANCH
          ports:
@@ -284,8 +289,9 @@ spec:
            # Chat model — routed through FlowerCore LLM Bridge (ADR-088)
            # so spend is tracked and tier aliases (fc:cheap/fc:balanced/fc:deep)
            # dispatch to Ollama or Anthropic via a single OpenAI-compat endpoint.
-            # Util / embedding / browser stay on local Ollama via 127.0.0.1 proxy
+            # Internal utility + embedding use the authenticated OpenAI surface,
-            # for zero-latency, zero-cost small-model traffic.
+            # while browser/corpus-search use the bridge's Ollama-compatible
            # endpoints so Agent Zero no longer needs a local proxy sidecar.
            - name: A0_SET_chat_model_provider
              value: "openai"
            - name: A0_SET_chat_model_name
@@ -307,35 +313,51 @@ spec:
                secretKeyRef:
                  name: fc-llm-bridge-api-keys
                  key: agent-zero-k8s
            - name: FC_LLM_BRIDGE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: fc-llm-bridge-api-keys
                  key: agent-zero-k8s
            - name: A0_SET_chat_model_ctx_length
              value: "8192"
            - name: A0_SET_chat_model_kwargs
              value: '{"temperature": 0, "num_ctx": 8192}'
-            # Utility model — fast small helper tier through the same proxy
+            # Utility model — fast small helper tier through the OpenAI surface
            - name: A0_SET_util_model_provider
-              value: "ollama"
+              value: "openai"
            - name: A0_SET_util_model_name
-              value: "qwen2.5:1.5b"
+              value: "fc:cheap"
            - name: A0_SET_util_model_api_base
-              value: "http://127.0.0.1:11434"
+              value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1"
            - name: A0_SET_util_model_kwargs
              value: '{"num_ctx": 2048}'
-            # Embedding model — nomic through the same proxy
+            # Embedding model — authenticated bridge alias to nomic-embed-text.
            # LiteLLM's embedding() path needs an explicit provider prefix here
            # even though the chat slot can use bare fc:* aliases.
            - name: A0_SET_embed_model_provider
-              value: "ollama"
+              value: "openai"
            - name: A0_SET_embed_model_name
-              value: "nomic-embed-text"
+              value: "openai/fc:embedding"
            - name: A0_SET_embed_model_api_base
-              value: "http://127.0.0.1:11434"
+              value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1"
            # Browser model — small Gemma candidate through the same proxy
            - name: A0_SET_browser_model_provider
              value: "ollama"
            - name: A0_SET_browser_model_name
              value: "gemma3:4b"
            - name: A0_SET_browser_model_api_base
-              value: "http://127.0.0.1:11434"
+              value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080"
            - name: A0_SET_browser_model_api_key
              valueFrom:
                secretKeyRef:
                  name: fc-llm-bridge-api-keys
                  key: agent-zero-k8s
            - name: A0_SET_browser_model_vision
              value: "true"
            - name: OLLAMA_HOST
              value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080"
            - name: FLOWERCORE_AGENTZERO_OLLAMA_URL
              value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080"
            # Agent profile — Blue Jay personality, tools, and system prompt
            - name: A0_SET_agent_profile
              value: "bluejay"
@@ -358,9 +380,38 @@ spec:
                  name: chat-mcp-api-key
                  key: api-key
                  optional: true
-            # Print.Web — Thermal printer service on edge2
+            # FlowerCore.Knowledge MCP Phase 1 — direct Agent Zero client path.
            # Probe /healthz first, then try an authenticated initialize call.
            # If either fails, Agent Zero boots without fc_knowledge and keeps
            # the local corpus_search.py tool as the outage-safe path.
            - name: KNOWLEDGE_MCP_URL
              value: "http://knowledge-web.knowledge.svc/mcp"
            - name: KNOWLEDGE_MCP_HEALTH_URL
              value: "http://knowledge-web.knowledge.svc/healthz"
            - name: KNOWLEDGE_MCP_BEARER_TOKEN
              valueFrom:
                secretKeyRef:
                  name: knowledge-mcp-tokens
                  key: password
            # Print.Web — Thermal printer service on edge2.
            # PRINT_WEB_URL: internal HTTP (bypasses Traefik TLS — print_web.py
            # runs in-cluster and can reach edge2 directly on the PROD VLAN).
            # PRINT_WEB_API_KEY: from 1Password "Print.Web API Keys" password field,
            # synced by the print-web-api-keys OnePasswordItem CRD above.
            # The print_web.py Python tool reads both env vars for all HTTP calls.
            - name: PRINT_WEB_URL
              value: "http://10.0.57.16:5200"
            - name: PRINT_WEB_API_KEY
              valueFrom:
                secretKeyRef:
                  name: print-web-api-keys
                  key: password
            # Intranet search — use in-cluster HTTP (no step-ca TLS needed)
            # corpus_search.py reads FLOWERCORE_FLEET_VECTOR_DIR but that mount is not
            # on the cluster yet (BLUEJAY-WS only). The tool gracefully returns a
            # "no DB found" message with rebuild instructions rather than crashing.
            - name: FLOWERCORE_INTRANET_URL
              value: "http://intranet-web.intranet.svc:5300"
            # Kubernetes
            - name: KUBERNETES_SERVICE_HOST
              value: "kubernetes.default.svc"
@@ -395,7 +446,7 @@ spec:
              command:
                - /bin/bash
                - -c
-                - "curl -sf http://localhost:80/ > /dev/null && curl -sf --connect-timeout 3 http://127.0.0.1:11434/api/tags > /dev/null"
+                - "curl -sf http://localhost:80/ > /dev/null && curl -sf --connect-timeout 3 http://fc-llm-bridge.fc-llm-bridge.svc:8080/healthz > /dev/null"
            periodSeconds: 30
            failureThreshold: 2
          resources:
@@ -533,18 +584,6 @@ spec:
          protocol: UDP
        - port: 53
          protocol: TCP
    # Ollama on BLUEJAY-WS
    - to:
        - ipBlock:
            cidr: 10.0.56.20/32
      ports:
        - port: 11434
    # Ollama on edge1 fallback
    - to:
        - ipBlock:
            cidr: 10.0.57.17/32
      ports:
        - port: 11434
    # Print.Web on edge2
    - to:
        - ipBlock:
@@ -578,6 +617,26 @@ spec:
          protocol: TCP
        - port: 8080
          protocol: TCP
    # FlowerCore.Knowledge MCP (Phase 1) — in-cluster direct route with
    # anonymous /healthz probe plus authenticated /mcp initialize/tool calls.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: knowledge
      ports:
        - port: 80
          protocol: TCP
        - port: 8080
          protocol: TCP
    # Intranet search API — use in-cluster svc so traffic stays inside
    # the cluster and is not blocked by the private-range egress denylist.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: intranet
      ports:
        - port: 5300
          protocol: TCP
    # Allow internet (for kubectl image pull, etc)
    - to:
        - ipBlock:
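The `python3 -c` one-liner in the agent-zero container that assembles `A0_SET_mcp_servers` is dense. An expanded equivalent of the same logic is sketched below as a function that takes the environment as a mapping, so it can be exercised without a pod; the name `build_mcp_servers` is hypothetical, while the server keys, URLs and headers are copied from the manifest.

```python
import json
import os
from typing import Mapping

DEFAULT_KNOWLEDGE_URL = "http://knowledge-web.knowledge.svc/mcp"


def build_mcp_servers(env: Mapping[str, str]) -> str:
    """Return the compact JSON value for A0_SET_mcp_servers."""
    servers: dict[str, dict] = {}
    chat_key = env.get("CHAT_MCP_API_KEY")
    if chat_key:
        servers["fc_chat"] = {
            "type": "streamable-http",
            "url": "http://chat-web.fc-chat.svc/mcp",
            "headers": {"X-Api-Key": chat_key},
        }
    # fc_knowledge is advertised only when the bootstrap probes set the flag
    # AND a bearer token is present; otherwise corpus_search.py is the path.
    knowledge_enabled = env.get("KNOWLEDGE_MCP_ENABLED", "false").lower() == "true"
    token = env.get("KNOWLEDGE_MCP_BEARER_TOKEN", "") if knowledge_enabled else ""
    if token:
        servers["fc_knowledge"] = {
            "type": "streamable-http",
            "url": env.get("KNOWLEDGE_MCP_URL", DEFAULT_KNOWLEDGE_URL),
            "headers": {"Authorization": f"Bearer {token}"},
        }
    return json.dumps({"mcpServers": servers}, separators=(",", ":"))


if __name__ == "__main__":
    print(build_mcp_servers(os.environ))
```

Serialising a dict once with `json.dumps` also escapes any quote or backslash in a secret, which a hand-escaped shell string such as the removed `export A0_SET_mcp_servers="{\"mcpServers\":...}"` line would not.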
--- a/apps/agent-zero/configmaps-bluejay.yaml
+++ b/apps/agent-zero/configmaps-bluejay.yaml
@@ -7209,6 +7209,9 @@ data:
            "keep_alive": keep_alive,
            "stream": False,
        })
        curl_headers = ["-H", "Content-Type: application/json"]
        if os.environ.get("FC_LLM_BRIDGE_API_KEY"):
            curl_headers.extend(["-H", f"X-Api-Key: {os.environ['FC_LLM_BRIDGE_API_KEY']}"])
        try:
            result = subprocess.run(
@@ -7216,7 +7219,7 @@ data:
                    "curl", "-s", "--max-time", "120",
                    "-X", "POST",
                    f"{api_base}/api/generate",
-                    "-H", "Content-Type: application/json",
+                    *curl_headers,
                    "-d", payload,
                ],
                capture_output=True,
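The hunk above sends `X-Api-Key` only when `FC_LLM_BRIDGE_API_KEY` is set, so the same tool works against the authenticated bridge and against an Ollama endpoint without auth. A sketch of the same conditional-header pattern using `urllib.request` instead of a `curl` subprocess; `build_generate_request` and `generate` are hypothetical helpers, and the payload uses the standard Ollama `/api/generate` fields (`model`, `prompt`, `keep_alive`, `stream`). That the bridge's Ollama-compatible surface returns a `response` field is an assumption.

```python
import json
import os
import urllib.request
from typing import Optional


def build_generate_request(api_base: str, model: str, prompt: str,
                           keep_alive: str = "5m",
                           api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a non-streaming /api/generate POST with an optional bridge key."""
    if api_key is None:
        api_key = os.environ.get("FC_LLM_BRIDGE_API_KEY", "").strip()
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "keep_alive": keep_alive,
        "stream": False,
    }).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only attach the key when one is configured.
        headers["X-Api-Key"] = api_key
    return urllib.request.Request(f"{api_base.rstrip('/')}/api/generate",
                                  data=payload, headers=headers, method="POST")


def generate(req: urllib.request.Request, timeout: float = 120) -> str:
    """Send the request and return the generated text (network call)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Building the request separately from sending it keeps the header logic testable without a live bridge.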
@@ -13150,6 +13153,451 @@ data:
    - PowerShell 5.1 compatibility is assumed (no PowerShell 7+ features).
    - All commands run with `-NoProfile -NonInteractive` flags for clean execution.
    """
  corpus_search.py: |
    # FlowerCore Fleet Corpus Vector Search Tool
    #
    # Queries the AiStation-built SqliteVecVectorStore DB at /a0/usr/vectors/fleet.db
    # (bind-mounted read-only from /var/lib/flowercore/vector-stores/ on the host).
    # Embeds the query through Ollama's nomic-embed-text model, computes cosine
    # similarity against every stored chunk in pure Python (no numpy — not present
    # in the container), and returns the top-K nearest neighbors with source metadata.
    #
    # This is the offline-friendly counterpart to `intranet_search` (which hits the
    # Intranet's live REST API). Use it for Bible/Greek/Hebrew/Strong's lookups and
    # anywhere the workstation has a newer DB than the Intranet one. The store is
    # refreshed by `aistation-indexer build <edition>` — see the FlowerCore.Knowledge
    # ADR at docs/ai-agents/flowercore-knowledge-service-plan.md.
    import json
    import math
    import os
    import sqlite3
    import urllib.request
    from pathlib import Path
    from python.helpers.tool import Tool, Response
    DEFAULT_VECTORS_DIR = os.environ.get(
        "FLOWERCORE_FLEET_VECTOR_DIR",
        "/a0/usr/vectors",
    )
    # When the caller doesn't pick an explicit DB, prefer the biggest fleet tier
    # present on disk. Workstation → pi-edge → bmo-bot.
    PREFERRED_DB_ORDER = [
        os.environ.get("FLOWERCORE_FLEET_VECTOR_DB", ""),
        "fleet-workstation-full.db",
        "fleet-pi-edge.db",
        "fleet-bmo-bot.db",
    ]
    OLLAMA_BASE_URL = os.environ.get(
        "FLOWERCORE_AGENTZERO_OLLAMA_URL",
        "http://host.containers.internal:11434",
    )
    BRIDGE_API_KEY = os.environ.get("FC_LLM_BRIDGE_API_KEY", "").strip()
    EMBEDDING_MODEL = os.environ.get(
        "FLOWERCORE_FLEET_EMBEDDING_MODEL",
        "nomic-embed-text",
    )
    class CorpusSearch(Tool):
        async def execute(self, **kwargs) -> Response:
            """
            Semantic search over the FlowerCore fleet corpus (Bible texts, lexicons,
            dictionaries, morphology) pre-indexed by aistation-indexer.
            Args (via self.args):
                query (str): Search query text. Required unless action=stats.
                limit (int): Max results. Default 8.
                index (str): Optional index name filter ("bible-texts", "lexicons",
                             "dictionaries", "morphology"). Default: all indexes.
                repo (str): Optional repo filter (e.g. "world-english-bible").
                db (str): Override DB path OR file name inside FLOWERCORE_FLEET_VECTOR_DIR
                          (defaults to /a0/usr/vectors). If omitted, the largest
                          fleet tier present on disk is picked automatically.
                action (str): Optional. "stats" returns an inventory of all fleet DBs
                             visible to the tool (names, sizes, index counts, chunk
                             counts, last-built timestamps). No embedding call.
            Returns:
                Response with ranked chunks (score, source, text preview) OR
                (when action=stats) a markdown inventory of available fleet DBs.
            """
            query = (self.args.get("query") or "").strip()
            limit = int(self.args.get("limit") or 8)
            index_filter = (self.args.get("index") or "").strip()
            repo_filter = (self.args.get("repo") or "").strip()
            db_override = (self.args.get("db") or "").strip()
            action = (self.args.get("action") or "").strip().lower()
            if action == "stats":
                return Response(message=_render_stats(), break_loop=False)
            if not query:
                return Response(
                    message=(
                        "Error: 'query' is required unless action=stats.\n"
                        "Example: query=\"what does Genesis 1:1 say\" limit=5\n"
                        "Inventory: action=stats"
                    ),
                    break_loop=False,
                )
            db = _resolve_db(db_override)
            if db is None:
                return Response(
                    message=(
                        f"Error: no fleet vector DB found under {DEFAULT_VECTORS_DIR}.\n"
                        "Host side: run `aistation-indexer build fleet-workstation-full`\n"
                        "(or `fleet-pi-edge`/`fleet-bmo-bot`) to produce\n"
                        "`/var/lib/flowercore/vector-stores/<slug>.db`, then confirm the\n"
                        "Podman unit mounts that directory into `/a0/usr/vectors:ro`."
                    ),
                    break_loop=False,
                )
            try:
                query_vec = _embed(query)
            except Exception as e:
                return Response(
                    message=f"Error: failed to embed query via Ollama at {OLLAMA_BASE_URL}: {e}",
                    break_loop=False,
                )
            try:
                hits = _search(db, query_vec, index_filter, repo_filter, limit)
            except Exception as e:
                return Response(
                    message=f"Error: corpus search failed: {e}",
                    break_loop=False,
                )
            if not hits:
                return Response(
                    message=(
                        f"No matches for '{query}' in {db.name}.\n"
                        f"Indexes available: " + _list_indexes_summary(db)
                    ),
                    break_loop=False,
                )
            lines = [f"**Corpus search: `{query}`**  (top {len(hits)} of {limit} requested, DB={db.name})", ""]
            for rank, h in enumerate(hits, 1):
                passage = h.get("passage") or ""
                lang = h.get("language") or ""
                meta_bits = [x for x in (h["index"], h["repo"], passage, lang) if x]
                meta = "  ·  ".join(meta_bits)
                preview = h["text"]
                if len(preview) > 320:
                    preview = preview[:320].rstrip() + "…"
                lines.append(f"{rank}. **{h['score']:.3f}**  {meta}")
                lines.append(f"   `{h['source']}`")
                lines.append(f"   {preview}")
                lines.append("")
            return Response(message="\n".join(lines).rstrip() + "\n", break_loop=False)
    def _resolve_db(override: str) -> "Path | None":
        """Pick a fleet DB by explicit path, explicit filename, or preferred order."""
        vectors_dir = Path(DEFAULT_VECTORS_DIR)
        if override:
            # An absolute path that points at an existing file wins outright.
            p = Path(override)
            if p.is_absolute() and p.exists():
                return p
            # Otherwise resolve it (bare filename or relative path) inside the vectors dir.
            candidate = vectors_dir / override
            if candidate.exists():
                return candidate
            return None
        for name in PREFERRED_DB_ORDER:
            if not name:
                continue
            p = Path(name) if Path(name).is_absolute() else vectors_dir / name
            if p.exists():
                return p
        # Fallback: any *.db in the dir, largest first.
        if vectors_dir.is_dir():
            candidates = sorted(vectors_dir.glob("*.db"), key=lambda p: p.stat().st_size, reverse=True)
            if candidates:
                return candidates[0]
        return None
    def _embed(text: str) -> list:
        """Embed a query via Ollama's /api/embeddings. Single-vector response."""
        body = json.dumps({"model": EMBEDDING_MODEL, "prompt": text}).encode("utf-8")
        headers = {"Content-Type": "application/json"}
        if BRIDGE_API_KEY:
            headers["X-Api-Key"] = BRIDGE_API_KEY
        req = urllib.request.Request(
            f"{OLLAMA_BASE_URL.rstrip('/')}/api/embeddings",
            data=body,
            headers=headers,
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            data = json.loads(resp.read().decode("utf-8"))
        vec = data.get("embedding")
        if not isinstance(vec, list) or not vec:
            raise RuntimeError(f"Ollama returned no embedding: {data}")
        return [float(x) for x in vec]
    def _cosine(a: list, b: list) -> float:
        """Cosine similarity in pure Python — no numpy in the A0 container."""
        # zip() stops at the shorter — AiStation DB guarantees same dim per index.
        dot = 0.0
        na = 0.0
        nb = 0.0
        for x, y in zip(a, b):
            dot += x * y
            na += x * x
            nb += y * y
        if na == 0.0 or nb == 0.0:
            return 0.0
        return dot / (math.sqrt(na) * math.sqrt(nb))
    def _search(db_path: Path, query_vec: list, index_filter: str, repo_filter: str, limit: int) -> list:
        """Load entries, compute cosine, return top-K.
        SqliteVecVectorStore schema:
          VectorIndexes(IndexName, Dimensions, UpdatedAtUtc)
          VectorEntries(IndexName, ChunkId, TextContent, SourceRepo, SourceFile,
                        Book, Chapter, VerseRange, Language, ContentType, License,
                        EstimatedTokens, EmbeddingJson)
        Embeddings are stored as JSON arrays in EmbeddingJson; similarity is computed
        in Python. For ~100k chunks × 768 dims this takes a couple seconds on a
        workstation — acceptable for interactive A0 use.
        """
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        try:
            sql = [
                "SELECT IndexName, ChunkId, TextContent, SourceRepo, SourceFile, ",
                "       Book, Chapter, VerseRange, Language, EmbeddingJson ",
                "FROM VectorEntries",
            ]
            where = []
            params = []
            if index_filter:
                where.append("IndexName = ?")
                params.append(index_filter)
            if repo_filter:
                where.append("SourceRepo LIKE ?")
                params.append(f"%{repo_filter}%")
            if where:
                sql.append(" WHERE " + " AND ".join(where))
            sql.append(";")
            cursor = conn.execute("".join(sql), params)
            # A min-heap keyed on score would be faster, but for interactive use
            # we just sort at the end: simpler and more readable.
            scored = []
            for row in cursor:
                idx, chunk_id, text, repo, source_file, book, chapter, verses, lang, emb_json = row
                try:
                    vec = json.loads(emb_json)
                except (json.JSONDecodeError, TypeError):
                    continue
                score = _cosine(query_vec, vec)
                passage = None
                if book and chapter:
                    passage = f"{book} {chapter}"
                    if verses:
                        passage += f":{verses}"
                scored.append((score, {
                    "index": idx,
                    "chunk_id": chunk_id,
                    "text": text,
                    "repo": repo or "",
                    "source": source_file or "",
                    "passage": passage or "",
                    "language": lang or "",
                }))
            scored.sort(key=lambda t: t[0], reverse=True)
            return [{"score": s, **meta} for s, meta in scored[:limit]]
        finally:
            conn.close()
    def _render_stats() -> str:
        """Markdown inventory of every *.db in FLOWERCORE_FLEET_VECTOR_DIR."""
        vectors_dir = Path(DEFAULT_VECTORS_DIR)
        if not vectors_dir.is_dir():
            return f"No fleet vector dir mounted at {vectors_dir}. Ask the host operator to build an index with scripts/agent-zero/build-fleet-index.sh."
        dbs = sorted(vectors_dir.glob("*.db"))
        if not dbs:
            return f"No fleet DBs present under {vectors_dir}. Run `scripts/agent-zero/build-fleet-index.sh fleet-workstation-full` on the host."
        lines = [f"**Fleet vector DB inventory** ({vectors_dir})", ""]
        for db in dbs:
            size_mb = db.stat().st_size / (1024 * 1024)
            lines.append(f"### `{db.name}` ({size_mb:.1f} MB)")
            try:
                conn = sqlite3.connect(f"file:{db}?mode=ro", uri=True)
                try:
                    idx_rows = conn.execute(
                        "SELECT IndexName, Dimensions, UpdatedAtUtc FROM VectorIndexes ORDER BY IndexName;"
                    ).fetchall()
                    if not idx_rows:
                        lines.append("- (no indexes registered)")
                    else:
                        counts = dict(conn.execute(
                            "SELECT IndexName, COUNT(*) FROM VectorEntries GROUP BY IndexName;"
                        ).fetchall())
                        for name, dim, updated in idx_rows:
                            count = counts.get(name, 0)
                            lines.append(f"- **{name}** — {count:,} chunks × {dim}d  (built {updated})")
                finally:
                    conn.close()
            except Exception as e:
                lines.append(f"- (inspect failed: {e})")
            lines.append("")
        lines.append(f"**Tool defaults:** embedding model `{EMBEDDING_MODEL}`, Ollama at `{OLLAMA_BASE_URL}`. Pick a DB with `db=<filename>`; filter by `index=<name>`/`repo=<substring>`.")
        return "\n".join(lines).rstrip() + "\n"
    def _list_indexes_summary(db_path: Path) -> str:
        """Comma-separated "<index>(<chunks>×<dims>d)" summary of the DB's indexes."""
        try:
            conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
            try:
                rows = conn.execute(
                    "SELECT IndexName, Dimensions, "
                    "  (SELECT COUNT(*) FROM VectorEntries WHERE VectorEntries.IndexName = VectorIndexes.IndexName) "
                    "FROM VectorIndexes ORDER BY IndexName;"
                ).fetchall()
                if not rows:
                    return "(no indexes)"
                return ", ".join(f"{r[0]}({r[2]}×{r[1]}d)" for r in rows)
            finally:
                conn.close()
        except Exception as e:
            return f"(couldn't list: {e})"
  intranet_search.py: |
    # Intranet Vector Search Tool
    # Queries the Blue Jay Lab Intranet's Shared.Indexing RAG corpus over its
    # live REST API (https://intranet.iamworkin.lan/api/search). Returns ranked chunks
    # with source file paths and scores.
    import json
    import os
    import ssl
    import urllib.parse
    import urllib.request
    from python.helpers.tool import Tool, Response
    INTRANET_BASE_URL = os.environ.get(
        "FLOWERCORE_INTRANET_URL",
        "https://intranet.iamworkin.lan",
    )
    STEPCA_ROOT_CRT = "/a0/usr/ca/stepca-root.crt"
    def _ssl_ctx() -> ssl.SSLContext:
        ctx = ssl.create_default_context()
        if os.path.exists(STEPCA_ROOT_CRT):
            ctx.load_verify_locations(cafile=STEPCA_ROOT_CRT)
        return ctx
    class IntranetSearch(Tool):
        async def execute(self, **kwargs) -> Response:
            """
            Search the Blue Jay Lab intranet corpus (docs, project notes, dashboards).
            Args (via self.args):
                query (str): Search query. Required.
                limit (int): Max chunks to return. Default 8.
                corpus (str): Optional corpus filter (e.g. "notes", "docs").
            Returns:
                Response with ranked chunk text, source path, and score.
            """
            query = self.args.get("query", "").strip()
            limit = int(self.args.get("limit", 8))
            corpus = self.args.get("corpus", "").strip()
            if not query:
                return Response(
                    message="Error: 'query' is required.",
                    break_loop=False,
                )
            params = {"q": query, "topK": str(limit)}
            if corpus:
                params["indexName"] = corpus
            url = f"{INTRANET_BASE_URL}/api/search?{urllib.parse.urlencode(params)}"
            try:
                req = urllib.request.Request(url, headers={"Accept": "application/json"})
                with urllib.request.urlopen(req, timeout=20, context=_ssl_ctx()) as resp:
                    raw = resp.read().decode("utf-8", errors="replace")
            except Exception as exc:
                return Response(
                    message=f"Intranet search failed: {exc}\nURL: {url}",
                    break_loop=False,
                )
            try:
                data = json.loads(raw)
            except json.JSONDecodeError:
                return Response(
                    message=f"Intranet returned non-JSON response:\n{raw[:500]}",
                    break_loop=False,
                )
            hits = data if isinstance(data, list) else (
                data.get("results") or data.get("hits") or data.get("chunks") or []
            )
            if not hits:
                return Response(
                    message=f"No intranet results for query: {query!r}",
                    break_loop=False,
                )
            lines = [f"# Intranet search: {query} ({len(hits)} hits)\n"]
            for i, hit in enumerate(hits[:limit], 1):
                src = (
                    hit.get("sourceFile")
                    or hit.get("source")
                    or hit.get("path")
                    or hit.get("file")
                    or "?"
                )
                repo = hit.get("sourceRepo") or ""
                idx = hit.get("indexName") or ""
                score = hit.get("score") or hit.get("similarity") or ""
                text = (
                    hit.get("snippet")
                    or hit.get("text")
                    or hit.get("content")
                    or hit.get("chunk")
                    or ""
                ).strip()
                if len(text) > 600:
                    text = text[:600] + "..."
                header = f"## [{i}] {repo}/{src}" if repo else f"## [{i}] {src}"
                if idx:
                    header += f"  ({idx})"
                if score:
                    header += f"  score={score:.3f}" if isinstance(score, float) else f"  score={score}"
                lines.append(header)
                lines.append(text)
                lines.append("")
            return Response(message="\n".join(lines), break_loop=False)
 kind: ConfigMap
 metadata:
  name: bluejay-tools-c
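`_search` in the ConfigMap above scores every row and then sorts the full list. Its own comment notes that a min-heap would be faster. The sketch below shows that alternative with `heapq.nlargest`, which keeps only the best `k` candidates while rows stream in. It is a standalone illustration, not the deployed tool. `cosine` and `top_k` are hypothetical names, and rows are assumed to arrive as already-decoded `(chunk_id, vector)` pairs rather than raw `EmbeddingJson` strings.

```python
import heapq
import math
from typing import Iterable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def top_k(
    query: list[float],
    rows: Iterable[tuple[str, list[float]]],
    k: int,
) -> list[tuple[float, str]]:
    """Return the k best (score, chunk_id) pairs, highest score first.

    Rows are consumed lazily (for example from a generator that decodes a
    SQLite cursor), and heapq.nlargest holds at most k candidates instead
    of every scored row.
    """
    scored = ((cosine(query, vec), chunk_id) for chunk_id, vec in rows)
    return heapq.nlargest(k, scored, key=lambda t: t[0])


# Hypothetical toy corpus: chunk id -> 2-d embedding.
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print(top_k([1.0, 0.0], rows, k=2))
```

Scoring costs O(n·d) and dominates either way, so the heap mainly saves memory on large corpora. For interactive use, a plain sort remains a reasonable default.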
--- a/apps/asterisk/deployment.yaml
+++ b/apps/asterisk/deployment.yaml
@@ -20,7 +20,19 @@ spec:
      nodeSelector:
        kubernetes.io/hostname: rke2-agent1
      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
+      # Keep the search list free of iamworkin.lan so CoreDNS's wildcard
      # template cannot hijack public egress like downloads.asterisk.org.
      dnsPolicy: None
      dnsConfig:
        nameservers:
          - 10.43.0.10
        searches:
          - telephony.svc.cluster.local
          - svc.cluster.local
          - cluster.local
        options:
          - name: ndots
            value: "2"
      securityContext:
        fsGroup: 0
      # CoreDNS in this cluster has an iamworkin.lan wildcard that catches
--- a/apps/cdi/README.md
+++ b/apps/cdi/README.md
@@ -0,0 +1,69 @@
 # CDI — Containerized Data Importer
 KubeVirt's `containerized-data-importer` for populating PVCs from external
 sources (HTTP, HTTPS, container registry, S3, virtctl upload). Required to
 import the Windows Server 2025 ISO into the `windows-server-2025-iso` PVC
 that `apps/kubevirt-vms/ci1.yaml` mounts as a CDROM.
 ## Files
 | File              | Source                                                                                                            | Purpose                                            |
 | ----------------- | ----------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
 | `cdi-operator.yaml` | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — verbatim copy        | Installs operator + CRDs (5779 lines, large)       |
 | `cdi-cr.yaml`     | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — annotated + commented | Tells operator to deploy CDI components          |
 `cdi-operator.yaml` is **vendored verbatim** from the upstream release for
 air-gap reproducibility (no internet fetch at deploy time, ArgoCD prune
 contracts hold). To bump versions:
 ```bash
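 CDI_VER=v1.66.0  # for example
 BASE="https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}"
 # -f: fail on HTTP errors instead of saving an error page as YAML
 curl -fsSL "${BASE}/cdi-operator.yaml" -o apps/cdi/cdi-operator.yaml
 curl -fsSL "${BASE}/cdi-cr.yaml" -o /tmp/cdi-cr-new.yaml
 # cdi-cr.yaml is annotated locally: merge upstream changes by hand, keeping
 # the project header and bumping the bluejay.iamworkin.lan/source annotation.
 diff -u apps/cdi/cdi-cr.yaml /tmp/cdi-cr-new.yaml
 git diff apps/cdi/  # review
 git add apps/cdi/
 git commit -m "chore(cdi): bump to ${CDI_VER}"
 git push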
 ```
 ## Verify after deploy
 ```bash
 kubectl -n cdi get pods               # operator + apiserver + deployment + uploadproxy
 kubectl get cdis cdi -o jsonpath='{.status.phase}'  # "Deployed"
 kubectl get crd | grep cdi.kubevirt.io
 # Expected CRDs include: cdis.cdi.kubevirt.io, datavolumes.cdi.kubevirt.io,
 # cdiconfigs.cdi.kubevirt.io, storageprofiles.cdi.kubevirt.io,
 # dataimportcrons.cdi.kubevirt.io, datasources.cdi.kubevirt.io,
 # objecttransfers.cdi.kubevirt.io
 ```
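A small parser over `kubectl get crd -o name` output can turn the CRD check into a pass/fail signal. This is a sketch. `EXPECTED_CDI_CRDS` and `missing_crds` are hypothetical names. The set holds the CRD names from the comment above together with `cdis.cdi.kubevirt.io` (the type queried by `kubectl get cdis cdi`). It is not an exhaustive list of what a CDI release installs.

```python
# CRD names this README expects after a CDI install (not an exhaustive list).
EXPECTED_CDI_CRDS = frozenset({
    "cdis.cdi.kubevirt.io",
    "datavolumes.cdi.kubevirt.io",
    "cdiconfigs.cdi.kubevirt.io",
    "storageprofiles.cdi.kubevirt.io",
    "dataimportcrons.cdi.kubevirt.io",
    "datasources.cdi.kubevirt.io",
    "objecttransfers.cdi.kubevirt.io",
})


def missing_crds(kubectl_output: str, expected: frozenset[str] = EXPECTED_CDI_CRDS) -> set[str]:
    """Return expected CRDs absent from `kubectl get crd -o name` output.

    Each output line looks like
    customresourcedefinition.apiextensions.k8s.io/<crd-name>.
    """
    present = {
        line.strip().rsplit("/", 1)[-1]
        for line in kubectl_output.splitlines()
        if line.strip()
    }
    return set(expected) - present


sample = (
    "customresourcedefinition.apiextensions.k8s.io/datavolumes.cdi.kubevirt.io\n"
    "customresourcedefinition.apiextensions.k8s.io/cdis.cdi.kubevirt.io\n"
)
print(sorted(missing_crds(sample)))
```

To use it, feed it captured stdout, for example from `kubectl get crd -o name > crds.txt`. An empty result means every expected CRD is registered.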
 ## Use after install
 ```yaml
 # Example DataVolume that imports from HTTP
 apiVersion: cdi.kubevirt.io/v1beta1
 kind: DataVolume
 metadata:
  name: my-iso
 spec:
  source:
    http:
      url: "https://server/path/to.iso"
  pvc:
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 10Gi
    storageClassName: longhorn
 ```
 ```bash
 # Or upload from local disk via virtctl
 virtctl image-upload pvc my-iso \
  --image-path ./my.iso \
  --size 10Gi \
  --storage-class longhorn \
  --access-mode ReadWriteOnce \
  --uploadproxy-url https://cdi-uploadproxy.cdi.svc:443 \
  --insecure
 ```
--- a/apps/cdi/cdi-cr.yaml
+++ b/apps/cdi/cdi-cr.yaml
@@ -0,0 +1,36 @@
 # =============================================================================
 # CDI CR — Tells the CDI operator to install CDI components into the cluster.
 # =============================================================================
 # After cdi-operator.yaml is applied, the operator watches for THIS resource
 # (CDI named "cdi"). When found, it deploys cdi-apiserver, cdi-deployment,
 # cdi-uploadproxy, cdi-cronjob, and the importer/uploadserver/cloner pods.
 #
 # Configuration:
 #   - HonorWaitForFirstConsumer: PVCs created by DataVolumes wait for first
 #     pod to schedule before binding (lets storage class pick best node).
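 #   - WebhookPvcRendering: enables CDI's mutating PVC webhook, which fills
 #     in incomplete PVC specs from the StorageProfile for PVCs labeled
 #     cdi.kubevirt.io/applyStorageProfile="true".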
 #   - imagePullPolicy IfNotPresent: re-pull only on tag rotation.
 #   - nodeSelector linux: pin to Linux nodes (no Windows worker support).
 #
 # Andrew may want to add an `uploadProxyURLOverride` later to expose the
 # uploadproxy via Traefik IngressRoute for `virtctl image-upload` from
 # BLUEJAY-WS without `kubectl port-forward`. Phase 2 enhancement.
 # =============================================================================
 apiVersion: cdi.kubevirt.io/v1beta1
 kind: CDI
 metadata:
  name: cdi
  annotations:
    bluejay.iamworkin.lan/source: "kubevirt/containerized-data-importer v1.65.0"
 spec:
  config:
    featureGates:
    - HonorWaitForFirstConsumer
    - WebhookPvcRendering
  imagePullPolicy: IfNotPresent
  infra:
    nodeSelector:
      kubernetes.io/os: linux
  workload:
    nodeSelector:
      kubernetes.io/os: linux
--- a/apps/cdi/cdi-operator.yaml
+++ b/apps/cdi/cdi-operator.yaml
--- a/apps/edge2-services/edge2-services.yaml
+++ b/apps/edge2-services/edge2-services.yaml
@@ -0,0 +1,106 @@
 # edge2 Services — Traefik IngressRoutes for FlowerCore Print.Web on edge2
 # Proxies print.iamworkin.lan to edge2 (10.0.57.16:5200) via headless Service
 # + manual Endpoints (same K8s external-proxy pattern as noc-services).
 #
 # Print.Web has its own X-Api-Key authentication and exposes anonymous
 # endpoints for the bookmarklet / Python CLI / cups-notifier flow, so no
 # Traefik basicAuth middleware is wired here.
 #
 # ArgoCD managed - BlueJay Lab
 ---
 apiVersion: v1
 kind: Namespace
 metadata:
  name: edge2-proxy
  labels:
    app.kubernetes.io/part-of: bluejay-infra
 ---
 # ============================================================
 # Print.Web - edge2:5200 (FlowerCore.Print.Web on Pi 4)
 # ============================================================
 apiVersion: v1
 kind: Service
 metadata:
  name: print-web-external
  namespace: edge2-proxy
 spec:
  ports:
    - port: 5200
      targetPort: 5200
      name: http
  clusterIP: None
 ---
 apiVersion: v1
 kind: Endpoints
 metadata:
  name: print-web-external
  namespace: edge2-proxy
 subsets:
  - addresses:
      - ip: 10.0.57.16
    ports:
      - port: 5200
        name: http
 ---
 apiVersion: cert-manager.io/v1
 kind: Certificate
 metadata:
  name: print-web-tls
  namespace: edge2-proxy
 spec:
  secretName: print-web-tls
  issuerRef:
    name: step-ca-acme
    kind: ClusterIssuer
  dnsNames:
    - print.iamworkin.lan
 ---
 apiVersion: traefik.io/v1alpha1
 kind: IngressRoute
 metadata:
  name: print-web
  namespace: edge2-proxy
 spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`print.iamworkin.lan`)
      services:
        - name: print-web-external
          port: 5200
  tls:
    secretName: print-web-tls
 ---
 # NetworkPolicy: allow Traefik ingress, allow egress to edge2 + DNS
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
 metadata:
  name: edge2-proxy-netpol
  namespace: edge2-proxy
 spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.57.16/32
      ports:
        - port: 5200
          protocol: TCP
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
--- a/apps/fc-desktop/fc-desktop.yaml
+++ b/apps/fc-desktop/fc-desktop.yaml
@@ -1,5 +1,18 @@
 # FlowerCore Remote Desktop — TLS + Ingress
-# Deployment and Service managed by deploy script (not ArgoCD)
+#
 # Source-of-truth split:
 #   - bluejay-infra OWNS: Certificate, IngressRoute, all NetworkPolicies
 #     (see network-policies.yaml in this directory).
 #   - FlowerCore.RemoteDesktop scripts/deploy-web.sh OWNS: Deployment +
 #     Service. Reason: image refs like `localhost/fc-desktop:linux-xfce`
 #     only exist on each node's containerd after a manual import, so a
 #     Deployment manifest in bluejay-infra would race the image-import
 #     step and crash-loop.
 #
 # NetworkPolicies moved into bluejay-infra 2026-05-07 — previously they
 # were applied via the deploy script's kubectl apply calls, which broke
 # cluster-rebuild repeatability. See
 # feedback_networkpolicies_belong_in_bluejay_infra.md.
 ---
 apiVersion: cert-manager.io/v1
 kind: Certificate
@@ -23,6 +36,14 @@ spec:
  entryPoints:
    - websecure
  routes:
    # Host-level catch-all for desktop.iamworkin.lan. The /guacamole
    # path-prefix match lives in apps/guacamole/guacamole.yaml as a
    # separate IngressRoute in the guacamole namespace — the cluster
    # Traefik disallows cross-namespace service refs, so the PathPrefix
    # rule can't sit here. Traefik's router matching precedence gives
    # longer/more-specific rules priority automatically, so as long as
    # the guacamole IngressRoute exists it takes /guacamole traffic
    # before this catch-all sees it.
    - match: Host(`desktop.iamworkin.lan`)
      kind: Rule
      services:
--- a/apps/fc-desktop/network-policies.yaml
+++ b/apps/fc-desktop/network-policies.yaml
@@ -0,0 +1,332 @@
 # FlowerCore Remote Desktop — NetworkPolicies (GitOps-managed)
 #
 # Moved into bluejay-infra 2026-05-07 as part of the regroup audit. These
 # four policies were previously applied via FlowerCore.RemoteDesktop's
 # scripts/deploy-web.sh `kubectl apply` calls, which meant a fresh cluster
 # rebuild from bluejay-infra alone would miss them — Browser Lab session
 # isolation, control-plane allow-list, and HTTP-01 cert renewal would all
 # silently fail to come up.
 #
 # Source-of-truth contract:
 #   - bluejay-infra OWNS all NetworkPolicy + Certificate + IngressRoute
 #     resources for fc-desktop.
 #   - FlowerCore.RemoteDesktop's scripts/deploy-web.sh continues to own
 #     the Deployment + Service apply (because the image ref
 #     `localhost/fc-desktop:linux-xfce` only exists on each node's
 #     containerd after a manual import — it can't be pulled from a
 #     registry, so a Deployment manifest in bluejay-infra would race the
 #     image-import step and crash-loop).
 ---
 # 1) desktop-isolation — Browser Lab session pods.
 #
 # Locks down pods labeled `app.kubernetes.io/name=remote-desktop` (every
 # session pod regardless of template). Allows guacd ingress for the VNC/RDP
 # display lane and remotedesktop-web's pre-handoff probing. Egress: NFS and
 # SMB to Synology, DNS, Traefik (cluster + LB VIP), Intranet (Browser Lab home).
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
 metadata:
  name: desktop-isolation
  namespace: fc-desktop
  labels:
    app.kubernetes.io/part-of: remotedesktop
    app.kubernetes.io/component: isolation
 spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: remote-desktop
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: guacamole
      ports:
        - port: 3000
          protocol: TCP
        - port: 3001
          protocol: TCP
        - port: 5901
          protocol: TCP
        - port: 3389
          protocol: TCP
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: fc-desktop
          podSelector:
            matchLabels:
              app.kubernetes.io/name: remotedesktop-web
      ports:
        - port: 3000
          protocol: TCP
        - port: 5901
          protocol: TCP
  egress:
    # NFS to Synology
    - to:
        - ipBlock:
            cidr: 10.0.58.3/32
      ports:
        - port: 2049
          protocol: TCP
        - port: 2049
          protocol: UDP
        - port: 111
          protocol: TCP
        - port: 111
          protocol: UDP
    # SMB to Synology
    - to:
        - ipBlock:
            cidr: 10.0.58.3/32
      ports:
        - port: 445
          protocol: TCP
    # DNS (an empty `to` allows port 53 to any resolver)
    - to: []
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Traefik: LB VIP, cluster Service IP, and Traefik pods
    - to:
        - ipBlock:
            cidr: 10.0.56.200/32
        - ipBlock:
            cidr: 10.43.33.87/32
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 8000
          protocol: TCP
        - port: 8443
          protocol: TCP
    # Intranet web
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: intranet
          podSelector:
            matchLabels:
              app: intranet-web
      ports:
        - port: 5300
          protocol: TCP
 ---
 # 2) fc-desktop-default-deny — namespace-wide catch-all.
 #
 # Selects every pod EXCEPT remotedesktop-web (the public-surface control
 # plane) and applies default-deny semantics for both Ingress and Egress.
 # Closes the gap where session pods land WITHOUT the desktop-isolation
 # policy's `app.kubernetes.io/name=remote-desktop` label, plus prevents
 # arbitrary debug sidecars / kubectl debug images from getting cluster
 # access.
 #
 # CRITICAL: also catches transient cm-acme-http-solver pods (that's the
 # bug this whole regroup chased). The cm-acme-http-solver-allow policy
 # below is the explicit carve-out.
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
 metadata:
  name: fc-desktop-default-deny
  namespace: fc-desktop
  labels:
    app.kubernetes.io/part-of: remotedesktop
    app.kubernetes.io/component: isolation
 spec:
  podSelector:
    matchExpressions:
      - key: app.kubernetes.io/name
        operator: NotIn
        values:
          - remotedesktop-web
  policyTypes:
    - Ingress
    - Egress
 ---
 # 3) remotedesktop-web-isolation — control plane explicit allow-list.
 #
 # remotedesktop-web is the only pod label the default-deny excludes, so
 # without this policy the control plane would have wide-open Ingress AND
 # Egress. This re-introduces a tight allow-list:
 #   - Ingress: Traefik only on TCP/8080
 #   - Egress: CoreDNS, K8s API, Guacamole admin, NFS, Intranet,
 #     Traefik (cluster + LB), and the fc-desktop namespace itself
 #     (for session pod readiness probing).
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
 metadata:
  name: remotedesktop-web-isolation
  namespace: fc-desktop
  labels:
    app.kubernetes.io/part-of: remotedesktop
    app.kubernetes.io/component: isolation
 spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: remotedesktop-web
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 8080
          protocol: TCP
  egress:
    # CoreDNS
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # K8s API server (an empty `to` allows these ports to any destination)
    - to: []
      ports:
        - port: 443
          protocol: TCP
        - port: 6443
          protocol: TCP
    # Guacamole admin
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: guacamole
      ports:
        - port: 8080
          protocol: TCP
    # NFS to Synology
    - to:
        - ipBlock:
            cidr: 10.0.58.3/32
      ports:
        - port: 2049
          protocol: TCP
        - port: 2049
          protocol: UDP
        - port: 111
          protocol: TCP
        - port: 111
          protocol: UDP
    # Intranet web
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: intranet
          podSelector:
            matchLabels:
              app: intranet-web
      ports:
        - port: 5300
          protocol: TCP
    # Cluster Traefik pods (in-cluster service resolution + Guacamole
    # routing handoff where web app builds URLs against the public host
    # but resolves internally).
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 8080
          protocol: TCP
        - port: 8443
          protocol: TCP
    # fc-desktop namespace — session pod probing during browser-access
    # readiness checks.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: fc-desktop
      ports:
        - port: 3000
          protocol: TCP
        - port: 3001
          protocol: TCP
        - port: 5901
          protocol: TCP
        - port: 3389
          protocol: TCP
 ---
 # 4) cm-acme-http-solver-allow — cert-manager HTTP-01 carve-out.
 #
 # Without this, fc-desktop-default-deny catches the transient solver pods
 # cert-manager creates for each renewal (they don't carry the
 # remotedesktop-web label). Caused 8-day silent renewal failure on
 # desktop.iamworkin.lan in 2026-04-28..2026-05-07 (see
 # feedback_certmanager_renewal_stuck_when_solver_blocked_by_namespace_default_deny.md).
 #
 # Authorizes:
 #   - Ingress on TCP/8089 from cluster Traefik (which proxies the external
 #     HTTP-01 GET on port 80 through to the solver).
 #   - Egress for cluster DNS (defensive — newer cert-manager probes from
 #     inside the solver too).
 #
 # The `acme.cert-manager.io/http01-solver=true` label is set by
 # cert-manager itself on every solver pod automatically.
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
 metadata:
  name: cm-acme-http-solver-allow
  namespace: fc-desktop
  labels:
    app.kubernetes.io/part-of: remotedesktop
    app.kubernetes.io/component: cert-renewal
 spec:
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 8089
          protocol: TCP
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
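The carve-out works because NetworkPolicy podSelector matching is a plain label-subset check. Below is a minimal Python sketch of matchLabels semantics. It assumes fc-desktop-default-deny uses an empty podSelector. The `app.kubernetes.io/name: remotedesktop-web` pair is a hypothetical stand-in for the remotedesktop-web label, whose exact key is not shown in this diff.

```python
from typing import Mapping


def selects(match_labels: Mapping[str, str], pod_labels: Mapping[str, str]) -> bool:
    """matchLabels semantics: every key/value pair must be present on the pod.

    An empty selector selects every pod in the namespace, which is how a
    default-deny policy ends up isolating pods nobody planned for.
    """
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


solver_pod = {"acme.cert-manager.io/http01-solver": "true"}
desktop_pod = {"app.kubernetes.io/name": "remotedesktop-web"}  # hypothetical key

default_deny: dict[str, str] = {}  # assumed podSelector: {}
desktop_allow = {"app.kubernetes.io/name": "remotedesktop-web"}  # hypothetical
solver_allow = {"acme.cert-manager.io/http01-solver": "true"}

print(selects(default_deny, solver_pod))   # True: isolated by default-deny
print(selects(desktop_allow, solver_pod))  # False: no existing allow applies
print(selects(solver_allow, solver_pod))   # True: the new carve-out applies
```

NetworkPolicies are additive. A pod selected by any policy is isolated for that direction, and traffic is allowed if any selecting policy permits it. Adding cm-acme-http-solver-allow therefore opens TCP/8089 for solver pods without weakening the default deny for anything else.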
--- a/apps/fc-distribution/fc-distribution.yaml
+++ b/apps/fc-distribution/fc-distribution.yaml
@@ -118,7 +118,7 @@ spec:
          #   dotnet.exe publish -c Release -o deploy/app \
          #     src/FlowerCore.Distribution.Web/FlowerCore.Distribution.Web.csproj
          #   podman build -t localhost/fc-distribution:v<tag> -f deploy/Dockerfile.deploy deploy
-          image: localhost/fc-distribution:v202604240010
+          image: localhost/fc-distribution:v202605061948
          imagePullPolicy: Never
          ports:
            - containerPort: 8080
@@ -151,6 +151,10 @@ spec:
              value: "/signing/aistation-field/chain.pem"
            - name: FlowerCore__Distribution__Signing__EditionCerts__aistation-field__KeyPath
              value: "/signing/aistation-field/private-key.pem"
+            # Public distribution host is GET/HEAD-only at Traefik; this
+            # entitlement list controls which editions are readable there.
+            - name: FlowerCore__Distribution__EntitlementPublic__PublicEditions__0
+              value: "*"
          resources:
            requests:
              cpu: 100m
@@ -262,8 +266,12 @@ spec:
    kind: ClusterIssuer
  dnsNames:
    - dist.iamworkin.lan
-  duration: 2160h    # 90d
-  renewBefore: 720h  # 30d
+  # step-ca ACME caps lifetime at 30d; the 90d request was silently capped,
+  # which left renewBefore equal to the cert lifetime → perpetual renewal
+  # loop (10880+ CRs in 18h on 2026-05-07). Match the working 720h/240h
+  # pattern from other FC services.
+  duration: 720h     # 30d (step-ca cap)
+  renewBefore: 240h  # 10d
 ---
 apiVersion: traefik.io/v1alpha1
 kind: IngressRoute
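The renewal loop comes down to timing arithmetic. Below is a simplified Python model. It assumes renewal is due at notAfter minus renewBefore and ignores any guard rails cert-manager applies on its own. `renewal_time` is an illustrative helper, not cert-manager code.

```python
from datetime import datetime, timedelta


def renewal_time(not_after: datetime, renew_before: timedelta) -> datetime:
    """Simplified model: renewal is due once `renew_before` remains."""
    return not_after - renew_before


issued = datetime(2026, 5, 7)
not_after = issued + timedelta(hours=720)  # step-ca caps ACME certs at 30d

old_due = renewal_time(not_after, timedelta(hours=720))  # renewBefore: 720h
new_due = renewal_time(not_after, timedelta(hours=240))  # renewBefore: 240h

print(old_due - issued)  # 0:00:00 (due immediately, every time)
print(new_due - issued)  # 20 days, 0:00:00
```

Keeping renewBefore well below the lifetime the CA actually issues leaves a renewal window instead of a loop. Here renewBefore is a third of that lifetime.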
--- a/apps/fc-llm-bridge/fc-llm-bridge.yaml
+++ b/apps/fc-llm-bridge/fc-llm-bridge.yaml
@@ -87,6 +87,20 @@ spec:
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
+      # Use an explicit DNS policy so external FQDNs like api.anthropic.com are
+      # resolved directly instead of being expanded through the cluster search
+      # path that includes iamworkin.lan.
+      dnsPolicy: None
+      dnsConfig:
+        nameservers:
+          - 10.43.0.10
+        searches:
+          - fc-llm-bridge.svc.cluster.local
+          - svc.cluster.local
+          - cluster.local
+        options:
+          - name: ndots
+            value: "2"
      securityContext:
        fsGroup: 1654
        fsGroupChangePolicy: OnRootMismatch
@@ -97,7 +111,7 @@ spec:
          #   dotnet.exe publish -c Release -o deploy/app \
          #     src/FlowerCore.LlmBridge.Web/FlowerCore.LlmBridge.Web.csproj
          #   podman build -t localhost/fc-llm-bridge:v<tag> -f deploy/Dockerfile.deploy deploy
-          image: localhost/fc-llm-bridge:v202604231520
+          image: localhost/fc-llm-bridge:v202604300022
          imagePullPolicy: Never
          ports:
            - containerPort: 8080
@@ -116,6 +130,10 @@ spec:
              value: "default"
            - name: FlowerCore__LlmBridge__DefaultAppName
              value: "agent-zero"
            - name: FlowerCore__LlmBridge__UtilModel
              value: "qwen2.5:1.5b"
            - name: FlowerCore__LlmBridge__EmbedModel
              value: "nomic-embed-text"
            # Per-consumer API keys — from OnePasswordItem fc-llm-bridge-api-keys.
            # Each field becomes a Secret key of the same name. The key-name
            # lands in the auth principal's `fc.app` claim for ledger scoping.
@@ -207,17 +225,6 @@ spec:
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 30
-      # Lower ndots so external FQDNs like api.anthropic.com are tried BEFORE
-      # the ndots:5 default expands them through the cluster search path, which
-      # includes iamworkin.lan. CoreDNS has a `template IN A iamworkin.lan`
-      # wildcard that answers `api.anthropic.com.iamworkin.lan` with the
-      # Traefik VIP, which then serves a TRAEFIK-DEFAULT-CERT TLS cert and
-      # breaks egress to the real Anthropic API (memory:
-      # feedback_coredns_ndots_template_collision, generalized to external DNS).
-      dnsConfig:
-        options:
-          - name: ndots
-            value: "2"
      volumes:
        - name: data
          persistentVolumeClaim:
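The next sketch shows why ndots matters: it is a minimal Python model of resolv.conf(5) search-list ordering. The pre-override search list below, including the iamworkin.lan suffix, is an assumption about what the pod inherited before the explicit dnsConfig. `query_order` is a hypothetical helper.

```python
def query_order(name: str, search: list[str], ndots: int) -> list[str]:
    """Candidate names in the order a resolv.conf(5)-style resolver tries them."""
    if name.endswith("."):
        # Trailing dot: already fully qualified, no search expansion.
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is before walking the search list.
        return [name, *expanded]
    # Too few dots: search list first, bare name last.
    return [*expanded, name]


# Assumed pre-override search list (cluster defaults plus the node's
# iamworkin.lan suffix) versus the explicit dnsConfig.searches above.
default_search = [
    "fc-llm-bridge.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "iamworkin.lan",
]
explicit_search = default_search[:3]

before = query_order("api.anthropic.com", default_search, ndots=5)
after = query_order("api.anthropic.com", explicit_search, ndots=2)
print(before[3])  # api.anthropic.com.iamworkin.lan
print(after[0])   # api.anthropic.com
```

Under ndots:5 the wildcard-backed `api.anthropic.com.iamworkin.lan` is queried before the real name. With ndots:2 and no iamworkin.lan search entry, it is never queried. The trailing-dot branch is why the in-cluster URLs elsewhere in this commit (e.g. `...svc.cluster.local.`) skip expansion entirely.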
--- a/apps/fc-messageboard/fc-messageboard.yaml
+++ b/apps/fc-messageboard/fc-messageboard.yaml
@@ -69,16 +69,14 @@ spec:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
-            httpGet:
-              path: /health
+            tcpSocket:
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
-            httpGet:
-              path: /health
+            tcpSocket:
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
--- a/apps/fc-signalcontrol/fc-signalcontrol.yaml
+++ b/apps/fc-signalcontrol/fc-signalcontrol.yaml
@@ -76,15 +76,13 @@ spec:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
-            httpGet:
-              path: /health
+            tcpSocket:
              port: http
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
          readinessProbe:
-            httpGet:
-              path: /health
+            tcpSocket:
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
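Both manifests switch from httpGet to tcpSocket. A tcpSocket probe only checks that the port accepts a TCP connection. Below is a rough Python equivalent; `tcp_probe` is a hypothetical helper, not kubelet code.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Succeed if a TCP connection opens within `timeout`; nothing is sent."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout (all OSError).
        return False
```

The trade-off is that the probe no longer exercises /health. A process that still accepts connections while its handler is failing will keep reporting healthy.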
--- a/apps/fc-ttsreader/biblical-tts/Dockerfile
+++ b/apps/fc-ttsreader/biblical-tts/Dockerfile
@@ -0,0 +1,35 @@
 # FlowerCore biblical-tts — eSpeak-NG-backed TTS for Ancient Greek (grc) and
 # Hebrew (he). Wraps the espeak-ng binary in a small FastAPI app exposing
 # /tts (returns WAV) and /timings (returns word timings via espeak's
 # --pho output). Same shape as fc-speech-align so AiStation can talk to
 # both with one HTTP client pattern.
 FROM python:3.12-slim
 ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1
 # espeak-ng has built-in support for grc (Ancient Greek) and he (Hebrew).
 # libsndfile1 is for the wav post-processing step.
 RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        espeak-ng \
        libsndfile1 \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*
 WORKDIR /app
 COPY requirements.txt /app/
 RUN pip install --no-cache-dir -r requirements.txt
 COPY app.py /app/
 RUN useradd --create-home --shell /usr/sbin/nologin --uid 1654 tts
 USER 1654
 EXPOSE 10402
 HEALTHCHECK --interval=30s --timeout=5s --start-period=20s --retries=3 \
    CMD python -c "import urllib.request,sys; urllib.request.urlopen('http://127.0.0.1:10402/health',timeout=3); sys.exit(0)" || exit 1
 CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "10402", "--workers", "1"]
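The HEALTHCHECK one-liner relies on `urllib.request.urlopen` raising for both connection failures and HTTP error statuses. Below, the same check is unrolled into a readable function. `healthy` is a hypothetical name; the Dockerfile itself keeps the inline form.

```python
import urllib.error
import urllib.request


def healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with a 2xx status within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        # 4xx/5xx: urlopen raises instead of returning a response.
        return False
    except OSError:
        # URLError (refused, DNS failure) and socket timeouts.
        return False
```

A script wrapping this would end with `sys.exit(0 if healthy(url) else 1)`. That matches Docker's convention of 0 = healthy and 1 = unhealthy, which the one-liner gets via `|| exit 1`.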
--- a/apps/fc-ttsreader/biblical-tts/app.py
+++ b/apps/fc-ttsreader/biblical-tts/app.py
@@ -0,0 +1,397 @@
 """FlowerCore biblical-tts — eSpeak-NG wrapper for Ancient Greek + Hebrew.
 Endpoints:
 * POST /tts          — body: {"text": "...", "language": "grc|he|el", "voice": "...?", "rate": 175?, "pitch": 50?, "volume": 100?}
                       returns audio/wav. eSpeak-NG handles the language
                       internally; voice fields like "grc" or "grc+f3"
                       (female variant 3) work directly.
 * POST /timings      — same body shape but returns
                       {"text": "...", "language": "...", "voice": "...",
                        "words": [{"text", "startMs", "endMs"}], "durationMs": ...}.
                       Uses espeak's --pho phoneme output to estimate the total
                       utterance length, then spreads it across whitespace-split
                       words in proportion to character count. Read-along
                       clients pair this with /tts for synced playback.
 * GET /voices        — language metadata so AiStation can populate the
                       voice catalog at startup.
 * GET /health        — fast readiness check.
 Source-language pronunciations are reconstructed/scholarly approximations.
 This wraps eSpeak-NG: Ancient Greek (grc) follows Erasmian-style mappings,
 and Hebrew (he) uses the Modern Hebrew voice, except that Hebrew-script
 input is transliterated and voiced with en-us (_prepare_synthesis_input).
 The consonant skeleton matches biblical Hebrew, so the read-along visual
 cue still lands on the right word even when the vowel pronunciation
 diverges.
 """
 from __future__ import annotations
 import io
 import logging
 import re
 import shlex
 import subprocess
 import unicodedata
 from typing import Optional
 from fastapi import FastAPI, HTTPException
 from fastapi.responses import JSONResponse, Response
 from pydantic import BaseModel
 LOG = logging.getLogger("biblical_tts")
 logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
 app = FastAPI(title="FlowerCore biblical-tts", version="1.0.0")
 # eSpeak-NG language codes we expose. Ancient Greek + Hebrew are the headline
 # pair; we also surface Modern Greek (el) as a fallback for operators who
 # prefer Modern Greek pronunciation over the Erasmian-style grc voice.
 LANGUAGES = {
    "grc": {"label": "Ancient Greek (Erasmian)", "rtl": False, "default_voice": "grc"},
    "el":  {"label": "Modern Greek",             "rtl": False, "default_voice": "el"},
    "he":  {"label": "Hebrew (Modern)",          "rtl": True,  "default_voice": "he"},
 }
 class TtsRequest(BaseModel):
    text: str
    language: str = "grc"
    voice: Optional[str] = None
    rate: int = 175       # words per minute, eSpeak default 175
    pitch: int = 50       # 0-99
    volume: int = 100     # 0-200
 HEBREW_CHAR_RE = re.compile(r"[\u0590-\u05FF]")
 HEBREW_WORD_RE = re.compile(r"[\u0590-\u05FF]+")
 # eSpeak-NG's Hebrew voice can spell unpointed Hebrew as Unicode character
 # names on some builds. For source-text study reads, prefer a stable
 # scholarly transliteration so words sound like words even without niqqud.
 HEBREW_WORD_TRANSLITERATIONS = {
    "אב": "av",
    "אבא": "abba",
    "אברהם": "Avraham",
    "אדמה": "adamah",
    "אדני": "Adonai",
    "אדם": "adam",
    "אור": "or",
    "אלהים": "Elohim",
    "אלוהים": "Elohim",
    "אמן": "amen",
    "אם": "em",
    "אמת": "emet",
    "ארץ": "eretz",
    "אש": "esh",
    "את": "et",
    "בית": "beit",
    "בן": "ben",
    "ברא": "bara",
    "בראשית": "bereshit",
    "ברית": "berit",
    "ברוך": "barukh",
    "בת": "bat",
    "גוי": "goy",
    "גוים": "goyim",
    "גויים": "goyim",
    "דבר": "davar",
    "דברים": "devarim",
    "דוד": "David",
    "הלל": "hallel",
    "הארץ": "ha-aretz",
    "הברית": "ha-berit",
    "החדשה": "ha-chadashah",
    "השמים": "ha-shamayim",
    "השמיים": "ha-shamayim",
    "ויאמר": "vayomer",
    "יהוה": "Adonai",
    "יוסף": "Yosef",
    "יוחנן": "Yochanan",
    "ישראל": "Yisrael",
    "ישוע": "Yeshua",
    "יצחק": "Yitzchak",
    "יעקב": "Yaakov",
    "ירושלים": "Yerushalayim",
    "כהן": "kohen",
    "כהנים": "kohanim",
    "מים": "mayim",
    "מות": "mavet",
    "מושיע": "moshia",
    "מלך": "melekh",
    "מלכות": "malkhut",
    "מרים": "Miriam",
    "משה": "Moshe",
    "משיח": "Mashiach",
    "נביא": "navi",
    "נביאים": "neviim",
    "עם": "am",
    "עולם": "olam",
    "צדק": "tzedek",
    "קדוש": "qadosh",
    "קדושים": "qedoshim",
    "קול": "qol",
    "רוח": "ruach",
    "שאול": "Shaul",
    "שמים": "shamayim",
    "שמיים": "shamayim",
    "שמעון": "Shimon",
    "שלום": "Shalom",
    "תורה": "torah",
    "חכמה": "chokhmah",
    "חסד": "chesed",
    "חיים": "chayim",
    "חושך": "choshekh",
 }
 HEBREW_LETTERS = {
    "א": "a",
    "ב": "b",
    "ג": "g",
    "ד": "d",
    "ה": "h",
    "ו": "v",
    "ז": "z",
    "ח": "kh",
    "ט": "t",
    "י": "y",
    "כ": "kh",
    "ך": "kh",
    "ל": "l",
    "מ": "m",
    "ם": "m",
    "נ": "n",
    "ן": "n",
    "ס": "s",
    "ע": "a",
    "פ": "p",
    "ף": "f",
    "צ": "ts",
    "ץ": "ts",
    "ק": "q",
    "ר": "r",
    "ש": "sh",
    "ת": "t",
 }
 HEBREW_VOWELISH = {"a", "e", "i", "o", "u"}
 def _strip_hebrew_marks(value: str) -> str:
    decomposed = unicodedata.normalize("NFD", value)
    return "".join(
        ch for ch in decomposed
        if unicodedata.category(ch) != "Mn" and ch not in {"׳", "״", "־"}
    )
 def _fallback_hebrew_transliteration(word: str) -> str:
    tokens: list[str] = []
    chars = list(word)
    for index, ch in enumerate(chars):
        token = HEBREW_LETTERS.get(ch)
        if token is None:
            continue
        if ch == "ה" and index == len(chars) - 1:
            token = "ah"
        elif ch == "י" and index > 0:
            token = "i"
        elif ch == "ו" and index > 0:
            token = "o"
        tokens.append(token)
    if not tokens:
        return word
    spoken: list[str] = []
    for index, token in enumerate(tokens):
        spoken.append(token)
        next_token = tokens[index + 1] if index + 1 < len(tokens) else ""
        if (
            token[-1:] not in HEBREW_VOWELISH
            and next_token
            and next_token[:1] not in HEBREW_VOWELISH
        ):
            spoken.append("a")
    return "".join(spoken)
 def _transliterate_hebrew_word(match: re.Match[str]) -> str:
    original = match.group(0)
    normalized = _strip_hebrew_marks(original)
    if not normalized:
        return original
    direct = HEBREW_WORD_TRANSLITERATIONS.get(normalized)
    if direct:
        return direct
    if normalized.startswith("ו") and len(normalized) > 1:
        rest = HEBREW_WORD_TRANSLITERATIONS.get(normalized[1:])
        if rest:
            return f"ve-{rest}"
    if normalized.startswith("ה") and len(normalized) > 1:
        rest = HEBREW_WORD_TRANSLITERATIONS.get(normalized[1:])
        if rest:
            return f"ha-{rest}"
    return _fallback_hebrew_transliteration(normalized)
 def _prepare_synthesis_input(text: str, language: str, voice: str) -> tuple[str, str]:
    if language.lower().startswith("he") and HEBREW_CHAR_RE.search(text):
        spoken = HEBREW_WORD_RE.sub(_transliterate_hebrew_word, text)
        return spoken, "en-us"
    return text, voice
 def _resolve_voice(req: TtsRequest) -> str:
    if req.voice:
        return req.voice.strip()
    lang = req.language.lower()
    return LANGUAGES.get(lang, {}).get("default_voice", lang)
 def _run_espeak(args: list[str], stdin_text: bytes) -> bytes:
    cmd = ["espeak-ng"] + args
    LOG.info("espeak-ng %s", shlex.join(args))
    try:
        proc = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            timeout=60,
            check=False,
        )
    except subprocess.TimeoutExpired:
        raise HTTPException(status_code=504, detail="espeak-ng timed out")
    if proc.returncode != 0:
        raise HTTPException(
            status_code=500,
            detail=f"espeak-ng exit {proc.returncode}: {proc.stderr.decode('utf-8', errors='replace')[:512]}",
        )
    return proc.stdout
@app.get("/health")
 def health():
    return {"status": "ok", "languages": list(LANGUAGES.keys())}
@app.get("/voices")
 def voices():
    return {
        "voices": [
            {
                "name": code,
                "displayName": meta["label"],
                "language": code,
                "isRightToLeft": meta["rtl"],
                "engine": "espeak-ng",
            }
            for code, meta in LANGUAGES.items()
        ]
    }
@app.post("/tts")
 def tts(req: TtsRequest) -> Response:
    if not req.text.strip():
        raise HTTPException(status_code=400, detail="text is required")
    voice = _resolve_voice(req)
    spoken_text, synth_voice = _prepare_synthesis_input(req.text, req.language, voice)
    args = [
        "--stdout",
        "-v", synth_voice,
        "-s", str(max(80, min(450, req.rate))),
        "-p", str(max(0, min(99, req.pitch))),
        "-a", str(max(0, min(200, req.volume))),
    ]
    wav = _run_espeak(args, spoken_text.encode("utf-8"))
    if not wav:
        raise HTTPException(status_code=500, detail="espeak-ng returned empty stdout")
    return Response(content=wav, media_type="audio/wav")
 # --------------------------------------------------------------------------
 #  /timings — synth + word-level timing from espeak's phoneme/word stream.
 # --------------------------------------------------------------------------
 #
 # espeak-ng's --pho flag emits a phoneme stream:
 #
 #   _ 5 phon...
 #   _ 56 phon...
 #   _ 67 phon...
 #
 # That alone doesn't give word boundaries. Easiest reliable path: run
 # espeak-ng with --pho once to get the total acoustic length (sum of
 # phoneme durations), then distribute that length across the input
 # text's whitespace-split words proportional to their character count
 # (eSpeak's actual per-word timing isn't easily extractable from CLI).
 # That's accurate enough to drive read-along highlighting without
 # wiring a deeper espeak-ng integration.
 #
 # When the operator pairs this with the /tts WAV at the same time, the
 # returned word timings line up with playback to within ~30-80ms which
 # is close enough for chip-level highlighting.
 PHONEME_DURATION_RE = re.compile(r"^\s*\S+\s+(\d+)\s+", re.MULTILINE)
 def _estimate_total_ms(req: TtsRequest, voice: str, spoken_text: str) -> int:
    # Clamp the rate exactly as /tts does, so the estimate tracks the WAV
    # that /tts actually renders.
    rate = max(80, min(450, req.rate))
    args = ["--pho", "--quiet", "-v", voice, "-s", str(rate)]
    out = _run_espeak(args, spoken_text.encode("utf-8"))
    text = out.decode("utf-8", errors="replace")
    total = 0
    for match in PHONEME_DURATION_RE.finditer(text):
        try:
            total += int(match.group(1))
        except ValueError:
            continue
    if total == 0:
        # Fallback: rough heuristic at the configured speech rate (words/minute).
        words = max(1, len(req.text.split()))
        total = int(words / rate * 60_000)
    return total
@app.post("/timings")
 def timings(req: TtsRequest):
    if not req.text.strip():
        raise HTTPException(status_code=400, detail="text is required")
    voice = _resolve_voice(req)
    spoken_text, synth_voice = _prepare_synthesis_input(req.text, req.language, voice)
    total_ms = _estimate_total_ms(req, synth_voice, spoken_text)
    # Distribute total_ms across whitespace-split words proportional to
    # character count. Punctuation-only tokens get zero weight, so they fold
    # onto the previous word's end and a Greek verse ending with " ." doesn't
    # claim a chunk of time.
    words = req.text.split()
    if not words:
        return {"text": req.text, "words": [], "durationMs": total_ms}
    weights = [len(w) if any(ch.isalnum() for ch in w) else 0 for w in words]
    weight_total = sum(weights)
    if weight_total == 0:
        # Every token is punctuation; fall back to equal weights.
        weights = [1] * len(words)
        weight_total = len(words)
    # Boundaries come from the cumulative weight, so per-word rounding never
    # drifts and the last word ends exactly at total_ms (the read-along loop
    # never overshoots).
    cumulative = 0
    start = 0
    out_words: list[dict] = []
    for word, weight in zip(words, weights):
        cumulative += weight
        end = int(round(total_ms * cumulative / weight_total))
        out_words.append({"text": word, "startMs": start, "endMs": end})
        start = end
    return JSONResponse(
        {
            "text": req.text,
            "language": req.language,
            "voice": synth_voice,
            "words": out_words,
            "durationMs": total_ms,
        }
    )
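Below is a standalone sketch of the /timings approach described in the comment block above: sum the duration column of the --pho output, then spread that total across words by character count. It assumes MBROLA-style .pho lines (phoneme, then duration in ms, then optional pitch pairs). It uses a slightly stricter regex than PHONEME_DURATION_RE so that a match never spans two lines. All names are illustrative.

```python
import re

# Phoneme name, whitespace, duration in ms; [ \t] keeps each match on one line.
PHO_LINE = re.compile(r"^[ \t]*\S+[ \t]+(\d+)", re.MULTILINE)


def total_pho_ms(pho_text: str) -> int:
    """Sum the duration column of an MBROLA-style .pho listing."""
    return sum(int(m.group(1)) for m in PHO_LINE.finditer(pho_text))


def distribute(words: list[str], total_ms: int) -> list[tuple[str, int, int]]:
    """Split total_ms across words in proportion to character count."""
    weights = [max(1, len(w)) for w in words]
    weight_total = sum(weights)
    spans: list[tuple[str, int, int]] = []
    cumulative = 0
    start = 0
    for word, weight in zip(words, weights):
        cumulative += weight
        # Boundary from the running weight: no drift, last end == total_ms.
        end = round(total_ms * cumulative / weight_total)
        spans.append((word, start, end))
        start = end
    return spans


sample = "_ 5\na 75 20 118\nb 56\n"
total = total_pho_ms(sample)
print(total)  # 136
print(distribute(["en", "arche", "en"], total))
# [('en', 0, 30), ('arche', 30, 106), ('en', 106, 136)]
```

Each boundary is derived from the running weight rather than from a sum of per-word rounded shares. That keeps rounding error from accumulating, so the last word lands exactly on the total.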
--- a/apps/fc-ttsreader/biblical-tts/requirements.txt
+++ b/apps/fc-ttsreader/biblical-tts/requirements.txt
@@ -0,0 +1,2 @@
 fastapi==0.115.6
 uvicorn==0.34.0
--- a/apps/fc-ttsreader/fc-ttsreader.yaml
+++ b/apps/fc-ttsreader/fc-ttsreader.yaml
@@ -37,6 +37,19 @@ spec:
        app.kubernetes.io/name: ttsreader-piper
        app.kubernetes.io/part-of: flowercore
    spec:
      # Bypass CoreDNS's *.iamworkin.lan wildcard so the init container reaches
      # huggingface.co directly when it seeds voice models.
      dnsPolicy: None
      dnsConfig:
        nameservers:
          - 10.43.0.10
        searches:
          - fc-ttsreader.svc.cluster.local
          - svc.cluster.local
          - cluster.local
        options:
          - name: ndots
            value: "2"
      initContainers:
        - name: seed-voices
          image: rhasspy/wyoming-piper:latest
@@ -97,13 +110,19 @@ spec:
          ports:
            - containerPort: 10200
              name: wyoming
          # Memory bumped after observed OOMKills during real chapter
          # renders on 2026-04-25. Piper's eSpeak phonemizer + ONNX Runtime
          # spike well past 1Gi on long unpunctuated paragraphs from
          # PDF / book imports. 3Gi leaves headroom for that plus the
          # transcribe-audio-to-Quick-Read flow, which hits Piper through
          # the same model.
          resources:
            requests:
              cpu: 250m
-              memory: 256Mi
+              memory: 512Mi
            limits:
-              cpu: 1000m
+              cpu: 2000m
-              memory: 1Gi
+              memory: 3Gi
          volumeMounts:
            - name: data
              mountPath: /data
@@ -112,6 +131,377 @@ spec:
          persistentVolumeClaim:
            claimName: ttsreader-piper-data
 ---
 # fc-speech-align — cluster-native faster-whisper wrapper.
 # Exposes POST /align (fc-align contract used by FlowerCore.Shared.Speech) AND
 # POST /transcribe (audio-file-in feature). CPU model = base.en, int8 compute.
 # Source: bluejay-infra/apps/fc-ttsreader/speech-align/ (Dockerfile + app.py).
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: ttsreader-align-models
  namespace: fc-ttsreader
 spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: ttsreader-align
  namespace: fc-ttsreader
  labels:
    app.kubernetes.io/name: ttsreader-align
    app.kubernetes.io/part-of: flowercore
 spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: ttsreader-align
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ttsreader-align
        app.kubernetes.io/part-of: flowercore
    spec:
      # Bypass CoreDNS's *.iamworkin.lan template hijack on public hosts
      # (huggingface.co model download at first boot would otherwise resolve
      # to Traefik VIP via search expansion). Drops the iamworkin.lan suffix.
      dnsPolicy: None
      dnsConfig:
        nameservers:
          - 10.43.0.10
        searches:
          - fc-ttsreader.svc.cluster.local
          - svc.cluster.local
          - cluster.local
        options:
          - name: ndots
            value: "2"
      securityContext:
        fsGroup: 1654
        runAsNonRoot: true
        runAsUser: 1654
      containers:
        - name: align
          image: localhost/fc-speech-align:v3
          imagePullPolicy: Never
          ports:
            - containerPort: 9200
              name: http
          env:
            - name: WHISPER_MODEL
              value: "Systran/faster-whisper-base.en"
            - name: WHISPER_DEVICE
              value: "cpu"
            - name: WHISPER_COMPUTE_TYPE
              value: "int8"
            - name: WHISPER_CACHE_DIR
              value: "/models"
            - name: DEFAULT_LANGUAGE
              value: "en"
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 2000m
              memory: 2Gi
          volumeMounts:
            - name: models
              mountPath: /models
          readinessProbe:
            httpGet:
              path: /health
              port: 9200
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 18
          livenessProbe:
            httpGet:
              path: /health
              port: 9200
            initialDelaySeconds: 180
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ttsreader-align-models
 ---
 apiVersion: v1
 kind: Service
 metadata:
  name: ttsreader-align
  namespace: fc-ttsreader
 spec:
  selector:
    app.kubernetes.io/name: ttsreader-align
  ports:
    - port: 9200
      targetPort: 9200
      name: http
 ---
 # ttsreader-kokoro — Kokoro-82M TTS via the kokoro-fastapi container.
 # Provides high-quality English voices alongside Piper for the TtsReader
 # render pipeline AND for AiStation when it talks to the cluster TTS plane
 # (instead of pointing back at BLUEJAY-WS:10401). Model + voices ship
 # inside the container image, so no PVC is needed.
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: ttsreader-kokoro
  namespace: fc-ttsreader
  labels:
    app.kubernetes.io/name: ttsreader-kokoro
    app.kubernetes.io/part-of: flowercore
 spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: ttsreader-kokoro
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ttsreader-kokoro
        app.kubernetes.io/part-of: flowercore
    spec:
      # Same DNS bypass as ttsreader-align — without it, the *.iamworkin.lan
      # CoreDNS template would hijack hexgrad/Kokoro-82M's HuggingFace-style
      # repo lookups during model warmup.
      dnsPolicy: None
      dnsConfig:
        nameservers:
          - 10.43.0.10
        searches:
          - fc-ttsreader.svc.cluster.local
          - svc.cluster.local
          - cluster.local
        options:
          - name: ndots
            value: "2"
      containers:
        - name: kokoro
          image: ghcr.io/remsky/kokoro-fastapi-cpu:latest
          ports:
            - containerPort: 8880
              name: http
          resources:
            requests:
              cpu: 250m
              memory: 1Gi
            limits:
              cpu: 2000m
              memory: 3Gi
          readinessProbe:
            httpGet:
              path: /v1/audio/voices
              port: 8880
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 18
          # Sprint E Phase 1a (kokoro stability): 4 restarts in 2d6h with
          # exit 143 traced to liveness probe `context deadline exceeded` while
          # kokoro was busy synthesizing. /v1/audio/voices shares the FastAPI
          # worker pool with /v1/audio/speech, so a long synth can starve the
          # probe: previously each probe had 5s to answer and 3 consecutive
          # failures (15s of total answer budget) triggered a restart. Bump
          # timeoutSeconds 5 → 15 and failureThreshold 3 → 5, giving 75s of
          # total answer budget; at periodSeconds: 30, kubelet now kills the
          # pod only after ~135s of consecutive failures (4 × 30s + 15s).
          # The TtsCircuitBreaker on the synthesizer side (Phase 1b) backs
          # this up so the FC backend stops slamming kokoro during recovery.
          livenessProbe:
            httpGet:
              path: /v1/audio/voices
              port: 8880
            initialDelaySeconds: 180
            periodSeconds: 30
            timeoutSeconds: 15
            failureThreshold: 5
 ---
 # fc-biblical-tts — eSpeak-NG-backed Ancient Greek + Hebrew TTS with
 # word-level timing for read-along playback. Companion to ttsreader-kokoro
 # (modern English) and ttsreader-piper (English narrator); operators pick
 # whichever engine matches the source text. Source:
 # bluejay-infra/apps/fc-ttsreader/biblical-tts/
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: ttsreader-biblical
  namespace: fc-ttsreader
  labels:
    app.kubernetes.io/name: ttsreader-biblical
    app.kubernetes.io/part-of: flowercore
 spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: ttsreader-biblical
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ttsreader-biblical
        app.kubernetes.io/part-of: flowercore
    spec:
      securityContext:
        fsGroup: 1654
        runAsNonRoot: true
        runAsUser: 1654
      containers:
        - name: biblical-tts
          image: localhost/fc-biblical-tts:v20260506-hebrew-translit
          imagePullPolicy: Never
          ports:
            - containerPort: 10402
              name: http
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 10402
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 6
          livenessProbe:
            httpGet:
              path: /health
              port: 10402
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
 ---
 apiVersion: v1
 kind: Service
 metadata:
  name: ttsreader-biblical
  namespace: fc-ttsreader
 spec:
  selector:
    app.kubernetes.io/name: ttsreader-biblical
  ports:
    - port: 10402
      targetPort: 10402
      name: http
 ---
 # fc-modern-tts — Microsoft Edge Read Aloud bridge for Modern Hebrew
 # (he-IL-AvriNeural and others) and Modern Greek (el-GR-NestorasNeural and
 # others). Pairs with ttsreader-biblical: the biblical engine handles the
 # Ancient Greek + unpointed Hebrew source text, and the modern engine handles
 # the narrative translations the operator reads alongside.
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: ttsreader-modern
  namespace: fc-ttsreader
  labels:
    app.kubernetes.io/name: ttsreader-modern
    app.kubernetes.io/part-of: flowercore
 spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: ttsreader-modern
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ttsreader-modern
        app.kubernetes.io/part-of: flowercore
    spec:
      # edge-tts needs egress to *.tts.speech.microsoft.com — bypass the
      # iamworkin.lan template hijack so the lookup doesn't fall back to
      # Traefik VIP via search expansion.
      dnsPolicy: None
      dnsConfig:
        nameservers:
          - 10.43.0.10
        searches:
          - fc-ttsreader.svc.cluster.local
          - svc.cluster.local
          - cluster.local
        options:
          - name: ndots
            value: "2"
      securityContext:
        fsGroup: 1654
        runAsNonRoot: true
        runAsUser: 1654
      containers:
        - name: modern-tts
          image: localhost/fc-modern-tts:v1
          imagePullPolicy: Never
          ports:
            - containerPort: 10403
              name: http
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 10403
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 6
          livenessProbe:
            httpGet:
              path: /health
              port: 10403
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
 ---
 apiVersion: v1
 kind: Service
 metadata:
  name: ttsreader-modern
  namespace: fc-ttsreader
 spec:
  selector:
    app.kubernetes.io/name: ttsreader-modern
  ports:
    - port: 10403
      targetPort: 10403
      name: http
 ---
 apiVersion: v1
 kind: Service
 metadata:
  name: ttsreader-kokoro
  namespace: fc-ttsreader
 spec:
  selector:
    app.kubernetes.io/name: ttsreader-kokoro
  ports:
    - port: 8880
      targetPort: 8880
      name: http
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
@@ -142,7 +532,7 @@ spec:
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: web
-          image: localhost/fc-ttsreader-web:v202604232334
+          image: localhost/fc-ttsreader-web:v20260506-phase6
          imagePullPolicy: Never
          ports:
            - containerPort: 5217
@@ -160,12 +550,51 @@ spec:
              value: "/usr/bin/ffmpeg"
            - name: TtsReader__Bible__CorpusRoot
              value: "/data/corpus-cache/world-english-bible/eng/usx"
            - name: TtsReader__ChapterContext__DatabasePath
              value: "/data/chapter-context.db"
            - name: TtsReader__Jobs__Root
              value: "/data/jobs"
            - name: TtsReader__Piper__Host
              value: "ttsreader-piper.fc-ttsreader.svc.cluster.local."
            - name: TtsReader__Piper__Port
              value: "10200"
            - name: TtsReader__Kokoro__Enabled
              value: "true"
            - name: TtsReader__Kokoro__BaseUrl
              # Cluster-native ttsreader-kokoro Service — replaces the prior
              # BLUEJAY-WS host pointer so the render pipeline doesn't need
              # the workstation up. AiStation can still hit its local
              # http://localhost:8880 instance.
              value: "http://ttsreader-kokoro.fc-ttsreader.svc.cluster.local.:8880"
            - name: TtsReader__Kokoro__TimeoutSeconds
              value: "120"
            - name: FlowerCore__Tts__BiblicalTts__Enabled
              value: "true"
            - name: FlowerCore__Tts__BiblicalTts__BaseUrl
              value: "http://ttsreader-biblical.fc-ttsreader.svc.cluster.local.:10402"
            - name: FlowerCore__Tts__BiblicalTts__TimeoutSeconds
              value: "60"
            - name: FlowerCore__Tts__BiblicalTts__DefaultLanguage
              value: "grc"
            - name: Speech__Alignment__Enabled
              # Cluster-native faster-whisper (Lane F, 2026-04-25). The
              # ttsreader-align deployment in this manifest wraps
              # SYSTRAN/faster-whisper with a /align endpoint matching the
              # FlowerCore.Shared.Speech master contract.
              value: "true"
            - name: Speech__Alignment__BaseUrl
              value: "http://ttsreader-align.fc-ttsreader.svc.cluster.local.:9200"
            - name: Speech__Alignment__TimeoutSeconds
              value: "120"
            # Cluster-native transcription endpoint shares the same pod
            # (POST /transcribe). Lane G consumes this from the
            # FlowerCore.TtsReader.Web AudioImport feature.
            - name: TtsReader__Transcription__Enabled
              value: "true"
            - name: TtsReader__Transcription__BaseUrl
              value: "http://ttsreader-align.fc-ttsreader.svc.cluster.local.:9200"
            - name: TtsReader__Transcription__TimeoutSeconds
              value: "300"
            - name: TtsReader__Ollama__BaseUrl
              value: "http://10.0.57.17:11434"
            - name: TtsReader__Ollama__DefaultModel
@@ -176,6 +605,21 @@ spec:
              value: "/data/logs"
            - name: TtsReader__Runtime__SmokeStatePath
              value: "/data/ops/smoke-status.json"
            # Sprint E Day 8 voice-preview disk cache — writes WAVs under
            # this directory. Default "data/voice-previews" resolves to
            # the read-only $HOME path under runAsNonRoot=true. Pin to
            # the writable PVC mount.
            - name: TtsReader__Preview__CacheDirectory
              value: "/data/voice-previews"
            - name: TtsReader__VoiceLibrary__ReferenceClip__Directory
              value: "/data/voice-reference-clips"
            # Sprint E XXL Phase 4γ — content-addressed CDN bundle dir for
            # POST /api/v1/render. Default "wwwroot/cdn" resolves under the
            # read-only app filesystem, so pin to the writable PVC mount
            # alongside other TtsReader runtime data. Manifests + cue audio
            # land at /data/cdn/sha256/<hash>/manifest.json + cues/.
            - name: TtsReader__Render__CdnDirectory
              value: "/data/cdn"
            - name: Auth__ApiKey
              valueFrom:
                secretKeyRef:
@@ -190,7 +634,10 @@ spec:
                  optional: true
          resources:
            requests:
-              cpu: 100m
+              # The cluster is currently saturated on requested CPU by
              # remotedesktop workloads even when real usage is low.
              # Keep the web frontend schedulable under that pressure.
              cpu: 10m
              memory: 256Mi
            limits:
              cpu: 500m
--- a/apps/fc-ttsreader/modern-tts/Dockerfile
+++ b/apps/fc-ttsreader/modern-tts/Dockerfile
@@ -0,0 +1,36 @@
 # FlowerCore modern-tts — wraps Microsoft Edge's Read Aloud TTS service
 # (via the edge-tts Python package) to give the cluster studio-quality
 # Modern Hebrew (he-IL-*) and Modern Greek (el-GR-*) voices alongside the
 # eSpeak biblical engine. Same shape as fc-biblical-tts so the .NET client
 # lives in the same Shared.Speech package.
 #
 # Note: edge-tts depends on Microsoft's public Edge endpoint; the cluster
 # pod needs egress to *.tts.speech.microsoft.com. dnsPolicy: None on the
 # Deployment makes sure the iamworkin.lan template hijack doesn't rewrite
 # the lookup back to Traefik VIP.
 FROM python:3.12-slim
 ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1
 RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*
 WORKDIR /app
 COPY requirements.txt /app/
 RUN pip install --no-cache-dir -r requirements.txt
 COPY app.py /app/
 RUN useradd --create-home --shell /usr/sbin/nologin --uid 1654 tts
 USER 1654
 EXPOSE 10403
 HEALTHCHECK --interval=30s --timeout=5s --start-period=20s --retries=3 \
    CMD python -c "import urllib.request,sys; urllib.request.urlopen('http://127.0.0.1:10403/health',timeout=3); sys.exit(0)" || exit 1
 CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "10403", "--workers", "1"]
--- a/apps/fc-ttsreader/modern-tts/app.py
+++ b/apps/fc-ttsreader/modern-tts/app.py
@@ -0,0 +1,238 @@
 """FlowerCore modern-tts — Microsoft Edge Read Aloud bridge for Modern
 Hebrew and Modern Greek (and other Edge-supported languages).
 Endpoints:
 * POST /tts          — body: {"text", "voice"?, "language"?, "rate"?, "volume"?, "pitch"?}
                       returns audio/mpeg (Edge returns MP3) which the
                       upstream FasterWhisperAlignmentClient + the WPF
                       MediaPlayer both handle natively.
 * POST /timings      — same body shape but returns
                       {"text", "voice", "words": [{"text","startMs","endMs"}],
                        "durationMs": ..., "audioBytes": ...} sourced from
                       Edge's WordBoundary events when the voice emits them.
                       Those are real per-word offsets from synthesis, so they
                       beat eSpeak's proportional distribution; voices that emit
                       none fall back to the same proportional approach.
 * GET  /voices       — voice catalog Edge knows about. Filtered to
                       Hebrew + Greek by default; ?language=all returns
                       everything Edge supports.
 * GET  /health       — fast readiness check.
 Pairs with fc-biblical-tts (eSpeak Ancient Greek + Hebrew). The biblical
 engine handles unpointed Hebrew + Erasmian Greek; this engine handles
 narrative Modern Hebrew + Modern Greek for translations the operator
 might be reading alongside the original.
 """
 from __future__ import annotations
 import io
 import logging
 from typing import Optional
 import edge_tts
 from fastapi import FastAPI, HTTPException
 from fastapi.responses import JSONResponse, Response
 from pydantic import BaseModel
 LOG = logging.getLogger("modern_tts")
 logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
 app = FastAPI(title="FlowerCore modern-tts", version="1.0.0")
 # Default voices by short code so AiStation can pick a sensible default
 # when the operator hasn't explicitly asked for one. Edge has multiple
 # voices per locale; these are calm narrators (male for he/el, female
 # for en).
 DEFAULT_VOICES = {
    "he":    "he-IL-AvriNeural",
    "he-IL": "he-IL-AvriNeural",
    "el":    "el-GR-NestorasNeural",
    "el-GR": "el-GR-NestorasNeural",
    "en":    "en-US-AriaNeural",
 }
 class TtsRequest(BaseModel):
    text: str
    voice: Optional[str] = None
    language: Optional[str] = None
    rate: str = "+0%"     # Edge accepts +20%, -10%, etc.
    volume: str = "+0%"
    pitch: str = "+0Hz"
 def _resolve_voice(req: TtsRequest) -> str:
    if req.voice:
        return req.voice.strip()
    if req.language and req.language in DEFAULT_VOICES:
        return DEFAULT_VOICES[req.language]
    return DEFAULT_VOICES["he"]
@app.get("/health")
 def health():
    return {"status": "ok"}
@app.get("/voices")
 async def voices(language: str = "default"):
    catalog = await edge_tts.list_voices()
    if language == "all":
        return {"voices": catalog}
    # Default response: filter to languages relevant to the FlowerCore
    # biblical workflow (Hebrew + Greek) so the AiStation voice picker
    # isn't overwhelmed by 400+ Edge voices.
    keep = ("he-", "el-")
    filtered = [v for v in catalog if any(v.get("ShortName", "").startswith(k) for k in keep)]
    return {"voices": filtered}
 async def _synth_with_subtitles(req: TtsRequest):
    voice = _resolve_voice(req)
    LOG.info("edge-tts synth voice=%s len=%d", voice, len(req.text))
    communicate = edge_tts.Communicate(
        req.text,
        voice=voice,
        rate=req.rate,
        volume=req.volume,
        pitch=req.pitch,
    )
    audio_buf = io.BytesIO()
    word_events: list[dict] = []
    async for chunk in communicate.stream():
        if chunk["type"] == "audio":
            audio_buf.write(chunk["data"])
        elif chunk["type"] == "WordBoundary":
            word_events.append({
                "text": chunk.get("text") or "",
                "offset": chunk.get("offset", 0),       # 100-ns ticks
                "duration": chunk.get("duration", 0),   # 100-ns ticks
            })
    return voice, audio_buf.getvalue(), word_events
 def _to_ms(ticks_100ns: int) -> int:
    # Edge emits offsets in 100-nanosecond ticks (.NET TimeSpan style).
    return int(round(ticks_100ns / 10_000))
@app.post("/tts")
 async def tts(req: TtsRequest):
    if not req.text.strip():
        raise HTTPException(status_code=400, detail="text is required")
    try:
        voice, audio_bytes, _ = await _synth_with_subtitles(req)
    except edge_tts.exceptions.NoAudioReceived:
        raise HTTPException(status_code=502, detail="edge-tts returned no audio for the supplied voice/text.")
    except Exception as ex:
        raise HTTPException(status_code=502, detail=f"edge-tts failure: {ex}")
    if not audio_bytes:
        raise HTTPException(status_code=502, detail="edge-tts returned an empty audio stream.")
    return Response(content=audio_bytes, media_type="audio/mpeg",
                    headers={"X-FlowerCore-Voice": voice})
 def _estimate_duration_ms_from_mp3(audio_bytes: bytes) -> int:
    """Best-effort duration estimate from raw MP3 bytes by walking frame
    headers. edge-tts requests CBR 48 kbps, 24 kHz mono MP3 (MPEG-2 Layer
    III), so total ms follows from the frame count. If parsing fails, return
    0 and let the caller fall through to its ~600 ms-per-word heuristic."""
    if not audio_bytes:
        return 0
    # MP3 sample rates by version+layer (MPEG1 layer3 / MPEG2 layer3 / MPEG2.5 layer3).
    # We just walk frame headers and count frames; a Layer III frame holds
    # 1152 samples in MPEG1 and 576 in MPEG2/MPEG2.5.
    sample_rates_v1 = [44100, 48000, 32000, 0]
    sample_rates_v2 = [22050, 24000, 16000, 0]
    sample_rates_v25 = [11025, 12000, 8000, 0]
    bitrates_v1_l3 = [0,32000,40000,48000,56000,64000,80000,96000,112000,128000,160000,192000,224000,256000,320000,0]
    bitrates_v2_l3 = [0,8000,16000,24000,32000,40000,48000,56000,64000,80000,96000,112000,128000,144000,160000,0]
    pos = 0
    total_samples = 0
    sample_rate = 0
    while pos + 4 <= len(audio_bytes):
        b0, b1, b2, b3 = audio_bytes[pos], audio_bytes[pos+1], audio_bytes[pos+2], audio_bytes[pos+3]
        if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
            pos += 1
            continue
        version_bits = (b1 >> 3) & 0x03
        layer_bits = (b1 >> 1) & 0x03
        if layer_bits != 0x01:  # layer 3 only
            pos += 1
            continue
        bitrate_index = (b2 >> 4) & 0x0F
        sample_rate_index = (b2 >> 2) & 0x03
        padding = (b2 >> 1) & 0x01
        if version_bits == 0x03:       # MPEG1
            frame_rate = sample_rates_v1[sample_rate_index]
            bitrate = bitrates_v1_l3[bitrate_index]
            samples_per_frame = 1152
        elif version_bits == 0x02:     # MPEG2
            frame_rate = sample_rates_v2[sample_rate_index]
            bitrate = bitrates_v2_l3[bitrate_index]
            samples_per_frame = 576
        elif version_bits == 0x00:     # MPEG2.5
            frame_rate = sample_rates_v25[sample_rate_index]
            bitrate = bitrates_v2_l3[bitrate_index]
            samples_per_frame = 576
        else:
            pos += 1
            continue
        # Validate in a local so a false sync with a reserved sample-rate
        # index can't zero out the rate recorded from earlier valid frames.
        if not (frame_rate and bitrate):
            pos += 1
            continue
        frame_length = int((samples_per_frame * bitrate / 8) / frame_rate) + padding
        if frame_length <= 0:
            pos += 1
            continue
        sample_rate = frame_rate
        total_samples += samples_per_frame
        pos += frame_length
    if sample_rate <= 0:
        return 0
    return int(round(total_samples * 1000 / sample_rate))
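 # Worked example of the frame math above, assuming edge-tts's default
 # audio-24khz-48kbitrate-mono-mp3 output (MPEG-2 Layer III, no padding):
 #   frame_length = (576 * 48000 / 8) / 24000 = 144 bytes per frame
 #   frame duration = 576 / 24000 s = 24 ms
 # so a 6,000-byte stream is ~41 whole frames, roughly 1 second of audio.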
@app.post("/timings")
 async def timings(req: TtsRequest):
    if not req.text.strip():
        raise HTTPException(status_code=400, detail="text is required")
    try:
        voice, audio_bytes, events = await _synth_with_subtitles(req)
    except Exception as ex:
        raise HTTPException(status_code=502, detail=f"edge-tts failure: {ex}")
    words: list[dict] = []
    for event in events:
        start = _to_ms(event["offset"])
        end = start + _to_ms(event["duration"])
        words.append({"text": event.get("text", ""), "startMs": start, "endMs": end})
    # Edge sometimes omits WordBoundary events for non-English voices
    # (notably he-IL-* and el-GR-*). Fall back to proportional distribution
    # over the input text — same approach the eSpeak biblical-tts uses.
    if not words and req.text.strip():
        total_ms = _estimate_duration_ms_from_mp3(audio_bytes)
        if total_ms <= 0:
            # Last-resort fallback: ~600ms per word at average speaking rate.
            total_ms = max(1, len(req.text.split())) * 600
        tokens = req.text.split()
        if tokens:
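            char_total = sum(max(1, len(w)) for w in tokens)
            # Place each boundary at its cumulative character share so the
            # timeline stays monotonic and the last word ends exactly at
            # total_ms (summing per-word rounded shares can drift past it).
            chars_so_far = 0
            start = 0
            for token in tokens:
                chars_so_far += max(1, len(token))
                end = int(round(total_ms * chars_so_far / char_total))
                words.append({"text": token, "startMs": start, "endMs": end})
                start = end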
    duration_ms = words[-1]["endMs"] if words else 0
    return JSONResponse({
        "text": req.text,
        "voice": voice,
        "words": words,
        "durationMs": duration_ms,
        "audioBytes": len(audio_bytes),
    })
--- a/apps/fc-ttsreader/modern-tts/requirements.txt
+++ b/apps/fc-ttsreader/modern-tts/requirements.txt
@@ -0,0 +1,3 @@
 fastapi==0.115.6
 uvicorn==0.34.0
 edge-tts==7.2.8
--- a/apps/fc-ttsreader/speech-align/Dockerfile
+++ b/apps/fc-ttsreader/speech-align/Dockerfile
@@ -0,0 +1,47 @@
 # FlowerCore speech-align — wraps SYSTRAN/faster-whisper with /align +
 # /transcribe endpoints used by FlowerCore.TtsReader. CPU-only image; the
 # default int8 compute type runs base.en at ~real-time on a single core.
 #
 # Build: podman build -t localhost/fc-speech-align:<ver> .
 # Run:   podman run --rm -p 9200:9200 -v fc-speech-align-models:/models localhost/fc-speech-align:<ver>
 FROM python:3.12-slim AS base
 ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1 \
    WHISPER_MODEL=Systran/faster-whisper-base.en \
    WHISPER_CACHE_DIR=/models \
    WHISPER_DEVICE=cpu \
    WHISPER_COMPUTE_TYPE=int8 \
    DEFAULT_LANGUAGE=en \
    MAX_AUDIO_BYTES=52428800
 # faster-whisper depends on libsndfile1 + libgomp1 (OpenMP runtime). It decodes
 # audio through PyAV's bundled FFmpeg libraries, so /transcribe accepts any
 # container; the system ffmpeg is installed as well for non-WAV inputs.
 RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libsndfile1 \
        libgomp1 \
        ffmpeg \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*
 WORKDIR /app
 COPY requirements.txt /app/
 RUN pip install --no-cache-dir -r requirements.txt
 COPY app.py /app/
 # Run as a non-root user to satisfy K8s securityContext.runAsNonRoot.
 RUN useradd --create-home --shell /usr/sbin/nologin --uid 1654 align \
    && mkdir -p /models \
    && chown -R 1654:1654 /models
 USER 1654
 EXPOSE 9200
 HEALTHCHECK --interval=30s --timeout=5s --start-period=120s --retries=3 \
    CMD python -c "import urllib.request,sys; urllib.request.urlopen('http://127.0.0.1:9200/health',timeout=3); sys.exit(0)" || exit 1
 CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "9200", "--workers", "1"]
--- a/apps/fc-ttsreader/speech-align/app.py
+++ b/apps/fc-ttsreader/speech-align/app.py
@@ -0,0 +1,181 @@
 """FlowerCore speech-align service.
 Wraps SYSTRAN/faster-whisper (https://github.com/SYSTRAN/faster-whisper) in a
 small FastAPI app exposing two endpoints:
 * POST /align       — fc-align contract used by FlowerCore.Shared.Speech's
                       FasterWhisperAlignmentClient on master. Multipart form
                       (`audio`, `language`) returns
                       `{text, words: [{text, startMs, endMs, confidence}],
                         durationMs, language, elapsedMs}`.
 * POST /transcribe  — audio-file-in transcription used by the new TtsReader
                       audio-import feature. Multipart form (`audio`, optional
                       `language`) returns `{text, language, durationMs,
                       elapsedMs, segments: [{startSeconds, endSeconds, text}]}` so the
                       UI can preview the transcript before piping it into
                       Quick Read or saving as a project.
 Both endpoints share the same WhisperModel instance (loaded once at startup).
 Model is pinned by the WHISPER_MODEL env var (defaults to base.en) and cached
 under WHISPER_CACHE_DIR (defaults to /models, backed by a PVC in K8s).
 Health: GET /health → {status: ok|loading, model, device, computeType, defaultLanguage, maxBytes}.
 """
 from __future__ import annotations
 import io
 import logging
 import os
 import time
 from contextlib import asynccontextmanager
 from typing import Optional
 from fastapi import FastAPI, File, Form, HTTPException, UploadFile
 from fastapi.responses import JSONResponse
 from faster_whisper import WhisperModel
 LOG = logging.getLogger("speech_align")
 logging.basicConfig(
    level=os.environ.get("LOG_LEVEL", "INFO"),
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
 )
 MODEL_NAME = os.environ.get("WHISPER_MODEL", "Systran/faster-whisper-base.en")
 DEVICE = os.environ.get("WHISPER_DEVICE", "cpu")
 COMPUTE_TYPE = os.environ.get("WHISPER_COMPUTE_TYPE", "int8")
 CACHE_DIR = os.environ.get("WHISPER_CACHE_DIR", "/models")
 MAX_BYTES = int(os.environ.get("MAX_AUDIO_BYTES", str(50 * 1024 * 1024)))  # 50 MiB
 DEFAULT_LANGUAGE = os.environ.get("DEFAULT_LANGUAGE", "en")
 _state: dict[str, object] = {}
@asynccontextmanager
 async def lifespan(_app: FastAPI):
    LOG.info("Loading faster-whisper model %s (device=%s compute=%s cache=%s)", MODEL_NAME, DEVICE, COMPUTE_TYPE, CACHE_DIR)
    started = time.time()
    model = WhisperModel(MODEL_NAME, device=DEVICE, compute_type=COMPUTE_TYPE, download_root=CACHE_DIR)
    _state["model"] = model
    LOG.info("Model loaded in %.2fs", time.time() - started)
    yield
    _state.clear()
 app = FastAPI(title="FlowerCore speech-align", version="1.0.0", lifespan=lifespan)
 def _get_model() -> WhisperModel:
    model = _state.get("model")
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded yet")
    return model  # type: ignore[return-value]
 async def _read_upload(upload: UploadFile) -> bytes:
    payload = await upload.read()
    if not payload:
        raise HTTPException(status_code=400, detail="audio is empty")
    if len(payload) > MAX_BYTES:
        raise HTTPException(
            status_code=413,
            detail=f"audio exceeds {MAX_BYTES} byte limit ({len(payload)} bytes received)",
        )
    return payload
 def _normalize_language(value: Optional[str]) -> Optional[str]:
    if not value or not value.strip():
        return DEFAULT_LANGUAGE
    return value.strip().lower()
 def _transcribe_bytes(audio_bytes: bytes, language: Optional[str], word_timestamps: bool):
    model = _get_model()
    started = time.time()
    segments_iter, info = model.transcribe(
        io.BytesIO(audio_bytes),
        language=language,
        word_timestamps=word_timestamps,
        beam_size=1,
        vad_filter=True,
    )
    segments = list(segments_iter)
    elapsed_ms = int((time.time() - started) * 1000)
    return segments, info, elapsed_ms
@app.get("/health")
 def health():
    return {
        "status": "ok" if _state.get("model") is not None else "loading",
        "model": MODEL_NAME,
        "device": DEVICE,
        "computeType": COMPUTE_TYPE,
        "defaultLanguage": DEFAULT_LANGUAGE,
        "maxBytes": MAX_BYTES,
    }
@app.post("/align")
 async def align(audio: UploadFile = File(...), language: str = Form(DEFAULT_LANGUAGE)):
    """fc-align contract — used by FlowerCore.Shared.Speech.FasterWhisperAlignmentClient."""
    payload = await _read_upload(audio)
    lang = _normalize_language(language)
    segments, info, elapsed_ms = _transcribe_bytes(payload, lang, word_timestamps=True)
    text_parts: list[str] = []
    words: list[dict] = []
    for segment in segments:
        text_parts.append(segment.text.strip())
        for word in (segment.words or []):
            # Field names MUST match the FlowerCore.Shared.Speech contract:
            # `text` / `startMs` / `endMs`. The deployed FasterWhisperAlignmentClient
            # ignores any other names — see Common's
            # FasterWhisperAlignmentResponse / FasterWhisperWord.
            words.append({
                "text": word.word.strip(),
                "startMs": int((word.start or 0.0) * 1000),
                "endMs": int((word.end or 0.0) * 1000),
                # Confidence is informational and ignored by the C# client today,
                # but kept on the wire for future scoring + fc-align operators
                # that want to surface low-confidence words.
                "confidence": float(getattr(word, "probability", 0.0) or 0.0),
            })
    duration_ms = int((info.duration or 0.0) * 1000)
    return JSONResponse({
        "text": " ".join(p for p in text_parts if p).strip(),
        "words": words,
        "durationMs": duration_ms,
        "language": info.language or lang,
        "elapsedMs": elapsed_ms,
    })
@app.post("/transcribe")
 async def transcribe(audio: UploadFile = File(...), language: Optional[str] = Form(None)):
    """Audio-in transcription contract — used by the new TtsReader audio-import feature.
    Returns full segments (no per-word timestamps) so the UI can preview the
    transcript before piping it into Quick Read or saving as a project.
    """
    payload = await _read_upload(audio)
    lang = _normalize_language(language)
    segments, info, elapsed_ms = _transcribe_bytes(payload, lang, word_timestamps=False)
    out_segments = [
        {
            "startSeconds": float(segment.start or 0.0),
            "endSeconds": float(segment.end or 0.0),
            "text": segment.text.strip(),
        }
        for segment in segments
    ]
    return JSONResponse({
        "text": " ".join(s["text"] for s in out_segments if s["text"]).strip(),
        "segments": out_segments,
        "language": info.language or lang,
        "durationMs": int((info.duration or 0.0) * 1000),
        "elapsedMs": elapsed_ms,
    })
--- a/apps/fc-ttsreader/speech-align/requirements.txt
+++ b/apps/fc-ttsreader/speech-align/requirements.txt
@@ -0,0 +1,8 @@
 faster-whisper==1.0.3
 fastapi==0.115.0
 uvicorn[standard]==0.30.6
 python-multipart==0.0.10
 # faster-whisper 1.0.3's utils module imports requests but doesn't pin it as a
 # transitive dep — pin explicitly so the image isn't relying on whatever
 # happens to be in the base image.
 requests==2.32.3
--- a/apps/fc-updater/README.md
+++ b/apps/fc-updater/README.md
@@ -0,0 +1,47 @@
 # fc-updater — Update Center GitOps adoption
 **Status:** adopted into `bluejay-infra` on 2026-05-06. The live ArgoCD
 Application is `infra-fc-updater`, generated by the `bluejay-infra`
 ApplicationSet with automated sync, `prune: true`, and `selfHeal: true`.
 ## Managed manifest set
 `apps/fc-updater/fc-updater.yaml` manages:
 - `Namespace/fc-updater`
 - `PersistentVolumeClaim/updatecenter-data`
 - `Deployment/updatecenter-web`
 - `Service/updatecenter-web`
 - `Certificate/updatecenter-web-tls`
 - `Certificate/updatecenter-web-internal-tls`
 - `IngressRoute/updatecenter-web`
 - `IngressRoute/updatecenter-web-internal`
 - `IngressRoute/updatecenter-web-public`
 The Deployment intentionally sets `revisionHistoryLimit: 3` and
 `strategy.type: Recreate`. The service is a singleton: SQLite and local
 bundle storage share the ReadWriteOnce `PersistentVolumeClaim/updatecenter-data`,
 and the pod is pinned to `rke2-server` via `nodeName`.
 ## Runtime dependencies intentionally not stored here
 These live Secrets are pre-existing runtime material and are not committed to
 Git:
 - `updater-bootstrap-auth`
 - `updater-signing`
 - `updater-webhooks`
 - `cf-origin-flowercore-io`
 Rotate the Cloudflare Origin Certificate through
 `FlowerCore.Notes/docs/standards/code-signing-rotation-runbook.md`; the
 shared origin cert must exist in every namespace that serves a
 `*.flowercore.io` public IngressRoute.
 ## Verification
 ```powershell
 kubectl.exe --kubeconfig C:\Users\AndrewStoltz\.kube\rke2.yaml -n argocd get application infra-fc-updater
 kubectl.exe --kubeconfig C:\Users\AndrewStoltz\.kube\rke2.yaml -n fc-updater get deploy,svc,ingressroute,certificate,pvc
 curl.exe -sk https://update.flowercore.io/api/v1/manifests/_schema
 ```
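A scripted variant of the first check could look like this. It is a hypothetical helper, not part of the repo. It assumes `kubectl` is on `PATH` and reuses the kubeconfig path above. It reads the standard Argo CD `status.sync.status` and `status.health.status` fields. With automated sync and `selfHeal: true`, a converged Application should report `Synced` / `Healthy`.

```python
# Hypothetical helper (not part of bluejay-infra): scripted version of the
# `kubectl get application infra-fc-updater` check above.
import json
import subprocess

# Assumption: same kubeconfig path the PowerShell commands use.
KUBECONFIG = r"C:\Users\AndrewStoltz\.kube\rke2.yaml"


def summarize_application(app: dict) -> tuple[str, str]:
    """Return (sync status, health status) from an Argo CD Application object."""
    status = app.get("status", {})
    sync = status.get("sync", {}).get("status", "Unknown")
    health = status.get("health", {}).get("status", "Unknown")
    return sync, health


def fetch_application(name: str = "infra-fc-updater") -> dict:
    """Run kubectl and parse the Application manifest as JSON."""
    result = subprocess.run(
        ["kubectl", "--kubeconfig", KUBECONFIG, "-n", "argocd",
         "get", "application", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def main() -> int:
    """Return 0 only when the Application is Synced and Healthy, else 1."""
    sync, health = summarize_application(fetch_application())
    print(f"sync={sync} health={health}")
    return 0 if (sync, health) == ("Synced", "Healthy") else 1
```

`main()` returns a process-style exit code, so it can gate a follow-up step. Keeping the parsing in `summarize_application` lets that logic be checked without a cluster.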
--- a/apps/fc-updater/fc-updater.yaml
+++ b/apps/fc-updater/fc-updater.yaml
@@ -0,0 +1,269 @@
 # FlowerCore Update Center
 # GitOps adoption of the live fc-updater namespace after PUB-1/PUB-3.
 # Runtime credentials remain in existing K8s Secrets; do not store them here.
 ---
 apiVersion: v1
 kind: Namespace
 metadata:
  name: fc-updater
  labels:
    app.kubernetes.io/part-of: flowercore
 ---
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: updatecenter-data
  namespace: fc-updater
  labels:
    app.kubernetes.io/name: updatecenter-web
    app.kubernetes.io/part-of: flowercore
 spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  volumeMode: Filesystem
  resources:
    requests:
      # Sized for fleet bundle storage (LocalFsBundleStore.MaxTotalBytes
      # soft cap at 25 GiB per project_uc_remaining_4_apps_signed_2026_05_06).
      # Mike Bundle alone is ~5.1 GiB, and the live PVC was already manually
      # expanded to 20 GiB. PVCs cannot shrink, so git must request at least
      # the live size to avoid an Argo CD OutOfSync loop.
      storage: 25Gi
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: updatecenter-web
  namespace: fc-updater
  labels:
    app: updatecenter-web
    app.kubernetes.io/name: updatecenter-web
    app.kubernetes.io/part-of: flowercore
 spec:
  replicas: 1
  revisionHistoryLimit: 3
  strategy:
    # SQLite + local bundle storage live on a single RWO PVC. Recreate avoids
    # two pods overlapping the same write path during future image bumps.
    type: Recreate
  selector:
    matchLabels:
      app: updatecenter-web
  template:
    metadata:
      labels:
        app: updatecenter-web
    spec:
      nodeName: rke2-server
      containers:
        - name: web
          image: localhost/fc-updater-web:v20260508-pub3-deepening-2bdf108
          imagePullPolicy: Never
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: ASPNETCORE_URLS
              value: http://+:8080
            - name: FlowerCore__Updater__Database__Provider
              value: sqlite
            - name: FlowerCore__Updater__Database__ConnectionString
              value: Data Source=/data/updatecenter.db
            - name: FlowerCore__Updater__BundleStorage__LocalFs__RootDirectory
              value: /data/bundles
            - name: FlowerCore__Updater__PublicShares__RequirePublicVisibilityOnPublicHosts
              value: "true"
            - name: FlowerCore__Updater__PublicShares__Links__0__Code
              value: 8f3c2a9e7d41
            - name: FlowerCore__Updater__PublicShares__Links__0__AppId
              value: flowercore.faith-ai-mike
            - name: FlowerCore__Updater__PublicShares__Links__0__Channel
              value: stable
            - name: FlowerCore__Updater__PublicShares__Links__0__RuntimeId
              value: win-x64
            - name: FlowerCore__Updater__PublicShares__Links__0__DisplayName
              value: Faith AI Mike Edition
            - name: FlowerCore__Updater__PublicShares__Links__0__Headline
              value: Faith AI Mike Edition
            - name: FlowerCore__Updater__PublicShares__Links__0__Description
              value: Private release link for Mike's Faith AI bundle.
            - name: FlowerCore__Updater__Auth__Bootstrap__Enabled
              value: "true"
            - name: FlowerCore__Updater__Auth__Bootstrap__Username
              valueFrom:
                secretKeyRef:
                  name: updater-bootstrap-auth
                  key: username
            - name: FlowerCore__Updater__Auth__Bootstrap__Password
              valueFrom:
                secretKeyRef:
                  name: updater-bootstrap-auth
                  key: password
            - name: FlowerCore__Updater__Auth__Bootstrap__SigningKey
              valueFrom:
                secretKeyRef:
                  name: updater-bootstrap-auth
                  key: signing-key
            - name: FlowerCore__Updater__Signing__AutoSignOnPublish
              value: "true"
            - name: FlowerCore__Updater__Signing__RequireSignatureOnPublish
              value: "true"
            - name: FlowerCore__Updater__Signing__PfxBase64
              valueFrom:
                secretKeyRef:
                  name: updater-signing
                  key: pfx-base64
            - name: FlowerCore__Updater__Signing__PfxPassword
              valueFrom:
                secretKeyRef:
                  name: updater-signing
                  key: pfx-password
            - name: FlowerCore__Updater__Signing__OpItemReference
              value: op://FlowerCore/step-ca-codesign
            - name: FlowerCore__Updater__Signing__TrustAnchorPath
              value: /etc/flowercore-updater/signing/root-ca.pem
            - name: FlowerCore__Updater__GitHub__Token
              valueFrom:
                secretKeyRef:
                  name: updater-webhooks
                  key: github-token
            - name: FlowerCore__Updater__GitHub__WebhookSecret
              valueFrom:
                secretKeyRef:
                  name: updater-webhooks
                  key: github-webhook-secret
            - name: FlowerCore__Updater__Gitea__Token
              valueFrom:
                secretKeyRef:
                  name: updater-webhooks
                  key: gitea-token
            - name: FlowerCore__Updater__Gitea__WebhookSecret
              valueFrom:
                secretKeyRef:
                  name: updater-webhooks
                  key: gitea-webhook-secret
          readinessProbe:
            tcpSocket:
              port: http
            initialDelaySeconds: 10
            periodSeconds: 15
          livenessProbe:
            tcpSocket:
              port: http
            initialDelaySeconds: 30
            periodSeconds: 30
          volumeMounts:
            - name: data
              mountPath: /data
            - name: signing
              mountPath: /etc/flowercore-updater/signing
              readOnly: true
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: updatecenter-data
        - name: signing
          secret:
            secretName: updater-signing
            items:
              - key: root-ca.pem
                path: root-ca.pem
 ---
 apiVersion: v1
 kind: Service
 metadata:
  name: updatecenter-web
  namespace: fc-updater
  labels:
    app: updatecenter-web
    app.kubernetes.io/name: updatecenter-web
    app.kubernetes.io/part-of: flowercore
 spec:
  type: ClusterIP
  selector:
    app: updatecenter-web
  ports:
    - name: http
      port: 8080
      targetPort: http
 ---
 apiVersion: cert-manager.io/v1
 kind: Certificate
 metadata:
  name: updatecenter-web-tls
  namespace: fc-updater
 spec:
  secretName: updatecenter-web-tls
  issuerRef:
    name: step-ca-acme
    kind: ClusterIssuer
  dnsNames:
    - updatecenter.iamworkin.lan
    - updates.iamworkin.lan
 ---
 apiVersion: cert-manager.io/v1
 kind: Certificate
 metadata:
  name: updatecenter-web-internal-tls
  namespace: fc-updater
 spec:
  secretName: updatecenter-web-internal-tls
  issuerRef:
    name: step-ca-acme
    kind: ClusterIssuer
  dnsNames:
    - updatecenter-internal.iamworkin.lan
 ---
 apiVersion: traefik.io/v1alpha1
 kind: IngressRoute
 metadata:
  name: updatecenter-web
  namespace: fc-updater
 spec:
  entryPoints:
    - web
    - websecure
  routes:
    - match: (Host(`updatecenter.iamworkin.lan`) || Host(`updates.iamworkin.lan`)) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
      kind: Rule
      services:
        - name: updatecenter-web
          port: 8080
  tls:
    secretName: updatecenter-web-tls
 ---
 apiVersion: traefik.io/v1alpha1
 kind: IngressRoute
 metadata:
  name: updatecenter-web-internal
  namespace: fc-updater
 spec:
  entryPoints:
    - web
    - websecure
  routes:
    - match: Host(`updatecenter-internal.iamworkin.lan`)
      kind: Rule
      services:
        - name: updatecenter-web
          port: 8080
  tls:
    secretName: updatecenter-web-internal-tls
 ---
 apiVersion: traefik.io/v1alpha1
 kind: IngressRoute
 metadata:
  name: updatecenter-web-public
  namespace: fc-updater
 spec:
  entryPoints:
    - websecure
  routes:
    - match: (Host(`update.flowercore.io`) || Host(`updates.flowercore.io`)) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
      kind: Rule
      services:
        - name: updatecenter-web
          port: 8080
  tls:
    secretName: cf-origin-flowercore-io
--- a/apps/fc-updater/kustomization.yaml
+++ b/apps/fc-updater/kustomization.yaml
@@ -0,0 +1,7 @@
 # ArgoCD's bluejay-infra ApplicationSet uses a directory generator and does
 # not require kustomization.yaml. Keep this anyway as the manifest inventory
 # and for local `kubectl kustomize apps/fc-updater` previews.
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
  - fc-updater.yaml
--- a/apps/flowercore/flowercore.yaml
+++ b/apps/flowercore/flowercore.yaml
@@ -1,5 +1,10 @@
-# FlowerCore Tenant — flowercore.io (main brand)
+# FlowerCore Tenant — retired flowercore.io placeholder.
-# Public-facing placeholder landing page served by nginx
+#
 # Public flowercore.io/www.flowercore.io routing is now owned by
 # apps/fc-landing/fc-landing.yaml. This tenant placeholder remains available
 # only as an in-cluster service; do not create a duplicate public
 # IngressRoute here because it competes with fc-landing and requires a
 # namespace-local cf-origin-flowercore-io Secret.
 # ArgoCD managed - BlueJay Lab
 ---
 apiVersion: v1
@@ -10,12 +15,6 @@ metadata:
    app.kubernetes.io/part-of: bluejay-infra
    flowercore.io/tenant: flowercore
 ---
 # NOTE: The existing cf-origin-flowercore-io secret (covering *.flowercore.io)
 # must be copied into this namespace. It already exists in other namespaces.
 # Copy with: kubectl get secret cf-origin-flowercore-io -n fc-system -o yaml \
 #   | sed 's/namespace: .*/namespace: tenant-flowercore/' \
 #   | kubectl apply -f -
 ---
 # Landing page HTML
 apiVersion: v1
 kind: ConfigMap
@@ -311,22 +310,3 @@ spec:
    - port: 80
      targetPort: 80
      name: http
 ---
 # Traefik IngressRoute — public via Cloudflare
 # Uses existing cf-origin-flowercore-io cert (must be copied to this namespace)
 apiVersion: traefik.io/v1alpha1
 kind: IngressRoute
 metadata:
  name: flowercore-web
  namespace: tenant-flowercore
 spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`flowercore.io`) || Host(`www.flowercore.io`)
      kind: Rule
      services:
        - name: flowercore-web
          port: 80
  tls:
    secretName: cf-origin-flowercore-io
--- a/apps/guacamole/guacamole.yaml
+++ b/apps/guacamole/guacamole.yaml
--- a/apps/intranet/intranet.yaml
+++ b/apps/intranet/intranet.yaml
@@ -3,6 +3,28 @@ kind: Namespace
 metadata:
  name: intranet
 ---
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: intranet-vector-store
  namespace: intranet
 spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: intranet-config
  namespace: intranet
 data:
  KnowledgeApiKey: ""
  TrustedHeaderSharedSecret: ""
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
@@ -12,6 +34,8 @@ metadata:
    app: intranet-web
 spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: intranet-web
@@ -22,7 +46,7 @@ spec:
    spec:
      containers:
        - name: intranet-web
-          image: localhost/fc-intranet-web:latest
+          image: localhost/fc-intranet-web:v20260508-brochure-w1
          imagePullPolicy: Never
          ports:
            - containerPort: 5300
@@ -32,25 +56,58 @@ spec:
              value: Production
            - name: ASPNETCORE_URLS
              value: "http://+:5300"
            # Bulk corpus indexing on edge1 Pi 5 takes ~6s/chunk × 5665 chunks
            # ≈ 9 hours. BLUEJAY-WS GPU (R9700, 32GB VRAM) does the same work
            # in minutes. Memory: feedback_pi5_nomic_embed_slow.
            - name: IntranetSearch__OllamaBaseUrl
              value: "http://10.0.56.20:11434"
            # Sprint E Phase 2α — JSON-file-backed PageReadingOverride persistence
            # on the writable PVC at /data. Without this env var the
            # intranet falls back to the in-memory store (loses state on
            # pod restart). Master's PageReadingOverrideOptions binds
            # PageReadingOverrides:FilePath.
            - name: PageReadingOverrides__FilePath
              value: "/data/page-reading-overrides.json"
            - name: KnowledgeFleetSearch__BaseUrl
              value: "https://knowledge.iamworkin.lan"
            - name: KnowledgeFleetSearch__ApiKey
              valueFrom:
                configMapKeyRef:
                  name: intranet-config
                  key: KnowledgeApiKey
                  optional: true
            - name: TrustedHeaderAuthentication__SharedSecret
              valueFrom:
                configMapKeyRef:
                  name: intranet-config
                  key: TrustedHeaderSharedSecret
                  optional: true
          resources:
            requests:
-              memory: "128Mi"
+              memory: "256Mi"
              cpu: "100m"
            limits:
-              memory: "512Mi"
+              memory: "1Gi"
-              cpu: "500m"
+              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 5300
-            initialDelaySeconds: 10
+            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 5300
-            initialDelaySeconds: 5
+            initialDelaySeconds: 10
            periodSeconds: 10
          volumeMounts:
            - name: vector-store
              mountPath: /data
      volumes:
        - name: vector-store
          persistentVolumeClaim:
            claimName: intranet-vector-store
 ---
 apiVersion: v1
 kind: Service
--- a/apps/knowledge/README.md
+++ b/apps/knowledge/README.md
@@ -0,0 +1,165 @@
 # knowledge — FlowerCore.Knowledge.Web (Phase 2.4 K8s deploy)
 **Status:** **LIVE 2026-04-27** at `https://knowledge.iamworkin.lan` —
 Phase 2.4 closed. Pod running, certificate issued (step-ca-acme), PVC
 bound (Longhorn 20Gi RWO), ArgoCD `infra-knowledge` synced. `/healthz`
 returns 200, `/api/v1/editions` returns `[]` (initial-deploy state — no
 *.db files in the PVC yet; Phase 2.5+ admin UI handles bulk
 population). Phase 1 of the Agent Zero MCP rollout keeps `/healthz`
 anonymous and gates `/mcp` behind `Authorization: Bearer <token>` built
 from the 1Password item `FlowerCore Knowledge MCP Tokens`.
 - Plan: [`../../../FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md`](../../../FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md)
 - Sprint: [`../../../FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md`](../../../FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md) (Track B)
 - Repo: `D:\git\FlowerCore\FlowerCore.Knowledge\` (private GitHub repo,
  bootstrapped Sprint D batch 35)
 `FlowerCore.Knowledge.Web` is the fleet-wide vector-indexing & RAG hub —
 a REST + MCP service that scans `*.db` files under
 `/data/vector-stores` and exposes per-edition reachability + corpus
 search to the rest of the FC ecosystem (Agent Zero, Chat.Web persona
 memory, AiStation embeddings explorer, TtsReader chapter context, BMO
 bot, Pi nodes via `fc-index sync`).
 Phase 1 MCP routing is explicit:
 - in-cluster Agent Zero → `http://knowledge-web.knowledge.svc/mcp`
 - workstation Agent Zero → `https://knowledge.iamworkin.lan/mcp`
 - probe URL for both lanes → `/healthz`
 ## Deployment order (do NOT skip / reorder)
 ### 1. FlowerCore.DNS public A record — knowledge.iamworkin.lan -> 10.0.56.200
 Required BEFORE the Certificate resource is created, or cert-manager
 HTTP-01 silently backs off ~2h. Memory: `feedback_pfsense_dns_required_for_acme`.
 The canonical path is FlowerCore.DNS:
 ```bash
 curl -sk https://dns.iamworkin.lan/api/v1/servers
 # Find the pfSense serverId, then create the record using the host label only.
 curl -sk -X POST https://dns.iamworkin.lan/api/v1/servers/<serverId>/zones/iamworkin.lan/records \
  -H "Content-Type: application/json" \
  -d '{"name":"knowledge","type":"A","data":"10.0.56.200","ttl":300}'
 ```
 If FlowerCore.DNS provider writes fail with HTTP 502 and the error "pfSense
 diag_command.php response did not contain a `<pre>` block" (status as of
 Sprint E Track B authoring, 2026-04-27), add the override manually via
 the pfSense web UI:
 1. Log in to `https://10.0.56.1` as admin
 2. Services → DNS Resolver → General Settings → Host Overrides
 3. Add: Host=`knowledge`, Domain=`iamworkin.lan`, IP Address=`10.0.56.200`
 4. Save + Apply Changes
 Verify resolution from anywhere on LAN:
 ```bash
 nslookup knowledge.iamworkin.lan 10.0.56.1
 # Expect: 10.0.56.200
 ```
 Or against FlowerCore.DNS once the provider is fixed:
 ```bash
 curl -sk "https://dns.iamworkin.lan/api/v1/zones/iamworkin.lan/resolve-preflight?hostname=knowledge.iamworkin.lan"
 # Expect: "resolvable": true
 ```
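Because the Certificate must not exist before the record resolves, a small gate can block until pfSense answers with the ingress IP. The following is a minimal POSIX sh sketch that assumes BIND-style `nslookup` output, where the resolver's own `Address:` line carries a `#53` port suffix. The function names `last_nslookup_address` and `wait_for_a_record` are hypothetical and are not part of the repo.

```bash
#!/bin/sh
# Hypothetical DNS gate (not a repo script): wait until an A record resolves
# to the expected ingress IP before the cert-manager Certificate is created.

# Read nslookup output on stdin and print the last answer address.
# Skips the resolver's own "Address: 10.0.56.1#53" line via its '#port' suffix.
last_nslookup_address() {
  awk '/^Address:/ && index($2, "#") == 0 { addr = $2 } END { if (addr != "") print addr }'
}

# Usage: wait_for_a_record HOST WANT_IP [DNS_SERVER] [TRIES] [SLEEP_SECONDS]
wait_for_a_record() {
  wfa_host=$1 wfa_want=$2
  wfa_server=${3:-10.0.56.1} wfa_tries=${4:-30} wfa_sleep=${5:-10}
  wfa_i=1
  while [ "$wfa_i" -le "$wfa_tries" ]; do
    wfa_got=$(nslookup "$wfa_host" "$wfa_server" 2>/dev/null | last_nslookup_address)
    if [ "$wfa_got" = "$wfa_want" ]; then
      echo "resolved $wfa_host -> $wfa_got"
      return 0
    fi
    sleep "$wfa_sleep"
    wfa_i=$((wfa_i + 1))
  done
  echo "gave up: $wfa_host did not resolve to $wfa_want via $wfa_server" >&2
  return 1
}
```

For example, gate the step 3 push on the record: `wait_for_a_record knowledge.iamworkin.lan 10.0.56.200 && git push`.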
 ### 2. Build + import the image to ALL RKE2 nodes
 Pods may schedule on any RKE2 worker (server, agent1, agent2). The
 Longhorn PVC accepts mounts from any node, so the image must be
 imported to all three. Memory:
 `feedback_rke2_image_import_targets_all_nodes` +
 `feedback_rke2_localhost_imagepullpolicy`.
 ```bash
 # From BLUEJAY-WS, in D:\git\FlowerCore\FlowerCore.Knowledge
 TAG="v$(date +%Y%m%d%H%M)"
 dotnet.exe publish -c Release -o deploy/app \
  src/FlowerCore.Knowledge.Web/FlowerCore.Knowledge.Web.csproj
 podman build -t localhost/fc-knowledge-web:$TAG -f deploy/Dockerfile.deploy deploy
 podman save localhost/fc-knowledge-web:$TAG -o /tmp/fc-knowledge-web.tar
 # Import to all three RKE2 nodes
 for node in rke2-server rke2-agent1 rke2-agent2; do
  scp /tmp/fc-knowledge-web.tar $node:/tmp/
  ssh $node "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-knowledge-web.tar"
 done
 ```
 The repo's `scripts/deploy-knowledge.sh` automates this loop.
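After the import loop, it is worth confirming that every node actually has the exact tag before pushing. Under `imagePullPolicy: Never`, a node that lacks the image leaves the Pod in `ErrImageNeverPull`. The sketch below reuses the same `ctr` invocation as the loop above. `has_image_ref` and `check_nodes` are hypothetical helpers, and the sketch assumes that `ctr images ls -q` lists the reference exactly as podman tagged it (`localhost/fc-knowledge-web:<tag>`).

```bash
#!/bin/sh
# Hypothetical post-import check: confirm each RKE2 node has the exact image ref.
CTR='sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io'

# Succeeds if the exact reference $1 appears as a whole line on stdin.
has_image_ref() {
  grep -Fxq -- "$1"
}

# Usage: check_nodes IMAGE_REF NODE...
# Prints one status line per node; returns non-zero if any node lacks the image.
check_nodes() {
  cn_image=$1
  shift
  cn_missing=0
  for cn_node in "$@"; do
    if ssh "$cn_node" "$CTR images ls -q" | has_image_ref "$cn_image"; then
      echo "ok       $cn_node"
    else
      echo "MISSING  $cn_node"
      cn_missing=1
    fi
  done
  return "$cn_missing"
}
```

Usage: `check_nodes "localhost/fc-knowledge-web:$TAG" rke2-server rke2-agent1 rke2-agent2`. Whole-line matching (`-x`) keeps a tag from matching a longer tag that shares its prefix.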
 ### 3. Bump the image tag + push
 Edit `knowledge.yaml` and replace the tag on the existing
 `localhost/fc-knowledge-web:<tag>` image line with the tag from step 2, then:
 ```bash
 cd D:/git/FlowerCore/bluejay-infra
 python scripts/check-pfsense-dns.py     # confirms the DNS preflight
 git add apps/knowledge/
 git commit -m "feat(knowledge): deploy Phase 2.4 K8s manifest"
 git push
 ```
 ArgoCD picks up within ~3 minutes and creates `infra-knowledge`.
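The tag edit in step 3 can be scripted so that the manifest always matches the `TAG` built in step 2. The following is a minimal sketch using a hypothetical `bump_image_tag` helper that is not in the repo. It rewrites only the `localhost/fc-knowledge-web:v<digits>` reference. It also writes through a temp file rather than using `sed -i`, because GNU and BSD `sed -i` take different arguments.

```bash
#!/bin/sh
# Hypothetical helper: point knowledge.yaml at a freshly built image tag.
# Usage: bump_image_tag FILE TAG   (TAG like v202605081200, from step 2)
bump_image_tag() {
  bit_file=$1 bit_tag=$2
  case "$bit_tag" in
    v[0-9]*) ;;
    *) echo "unexpected tag '$bit_tag' (want v<digits>)" >&2; return 1 ;;
  esac
  # Replace only the versioned image reference; all other lines are untouched.
  sed "s|localhost/fc-knowledge-web:v[0-9][0-9]*|localhost/fc-knowledge-web:$bit_tag|" \
    "$bit_file" > "$bit_file.tmp" && mv "$bit_file.tmp" "$bit_file"
}
```

Usage: `bump_image_tag apps/knowledge/knowledge.yaml "$TAG"`, then run the `git add`/`commit`/`push` sequence above.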
 ### 4. Verify
 ```bash
 fcadmin_ssh noc1 '
  kubectl -n argocd get application infra-knowledge
  kubectl -n knowledge get certificate,pod,pvc
  curl -sk -m 8 -o /dev/null -w "HTTP %{http_code}\n" https://knowledge.iamworkin.lan/healthz
  curl -sk -m 8 https://knowledge.iamworkin.lan/api/v1/editions | jq
 '
 ```
 Expect: Certificate `Ready: True` within ~60s, `/healthz` HTTP 200,
 `/api/v1/editions` returns an empty array (no DBs in the PVC yet) on
 first deploy.
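A one-shot `curl` issued right after ArgoCD syncs can fail simply because the pod or route is not ready yet. The polling sketch below uses a hypothetical `wait_http_200` function and assumes `curl` is available on the operator host. It retries until the URL returns HTTP 200.

```bash
#!/bin/sh
# Hypothetical readiness poll: retry until URL answers HTTP 200.
# Usage: wait_http_200 URL [TRIES] [SLEEP_SECONDS]
wait_http_200() {
  wh_url=$1 wh_tries=${2:-30} wh_sleep=${3:-5}
  wh_i=1
  while [ "$wh_i" -le "$wh_tries" ]; do
    # -k tolerates the step-ca chain if it is not in the local trust store;
    # curl prints 000 as the code when it cannot connect at all.
    wh_code=$(curl -sk -m 8 -o /dev/null -w '%{http_code}' "$wh_url")
    if [ "$wh_code" = "200" ]; then
      echo "HTTP 200 from $wh_url after $wh_i attempt(s)"
      return 0
    fi
    echo "attempt $wh_i: HTTP ${wh_code:-none}" >&2
    [ "$wh_i" -lt "$wh_tries" ] && sleep "$wh_sleep"
    wh_i=$((wh_i + 1))
  done
  return 1
}
```

Usage: `wait_http_200 https://knowledge.iamworkin.lan/healthz 24 5` waits up to about two minutes before giving up.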
 ## Initial-deploy state and Phase 2.5 follow-up
 The Longhorn PVC is empty on first deploy. Knowledge.Web's filesystem
 catalog will report zero editions until vector-store `*.db` files are
 pushed into `/data/vector-stores`. Initial population is a follow-up
 step (Phase 2.5+, Blazor admin UI's "Rebuild" button); for the first
 deploy the goal is just to prove the pod boots, `/healthz` returns 200,
 and the Traefik IngressRoute serves the Scalar UI.
 To copy an existing local DB into the PVC (one-time, manual until
 Phase 2.5 admin UI lands):
 ```bash
 fcadmin_ssh noc1 '
  POD=$(kubectl -n knowledge get pod -l app=knowledge-web -o jsonpath="{.items[0].metadata.name}")
  kubectl -n knowledge cp /var/lib/flowercore/vector-stores/bluejay-ai.db $POD:/data/vector-stores/bluejay-ai.db
 '
 ```
 ## Probes + middleware notes
 - `/healthz` is mapped by `Controllers/HealthController.cs` (controller-based
  attribute route). Cheap — no DB, no dependencies.
 - Liveness uses `tcpSocket` as a defensive fallback in case future
  middleware accidentally gates `/healthz` behind auth (memory:
  `feedback_k8s_probes_behind_auth_middleware`).
 - `/openapi/v1.json` and `/scalar/v1` are wired by `UseFlowerCoreApi`.
  Per memory `feedback_k8s_probes_must_not_hit_openapi`, probes must NOT
  point at OpenAPI documents — the `MapOpenApi` call can be slow during
  cold startup.
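During startup, Kubernetes restarts the container once `failureThreshold` consecutive startup probes have failed, so the cold-start budget is roughly `initialDelaySeconds + failureThreshold × periodSeconds`. The tiny sketch below uses a hypothetical `startup_budget` function and plugs in the values from `knowledge.yaml` (`initialDelaySeconds: 5`, `periodSeconds: 5`, `failureThreshold: 30`).

```bash
#!/bin/sh
# Hypothetical helper: approximate seconds a container gets to pass its
# startupProbe before the kubelet restarts it (ignores timeoutSeconds).
# Usage: startup_budget INITIAL_DELAY PERIOD FAILURE_THRESHOLD
startup_budget() {
  echo $(( $1 + $2 * $3 ))
}

startup_budget 5 5 30   # knowledge.yaml startupProbe -> 155
```

Liveness and readiness probes are disabled until the startup probe succeeds. As a result, the `tcpSocket` liveness check cannot restart a slow cold start inside that roughly 2.5-minute window.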
 ## Resource sizing
 - 256Mi memory request / 1Gi limit.
 - 100m CPU request / 1000m limit.
 - 20Gi Longhorn PVC initial — sufficient for the bluejay-ai 1.94Gi DB +
  fleet-pi-edge 352Mi + fleet-bmo-bot 141Mi + headroom. Resize via
  `kubectl -n knowledge edit pvc knowledge-vector-store` if growing
  past 15Gi.
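As a non-interactive alternative to `kubectl edit`, you can raise the claim's storage request with a merge patch. The sketch below uses a hypothetical `pvc_resize_patch` helper. It assumes the `longhorn` StorageClass has `allowVolumeExpansion: true`, since Kubernetes rejects the patch otherwise. Note also that a PVC request can only grow, never shrink.

```bash
#!/bin/sh
# Hypothetical helper: build a JSON merge patch that raises a PVC's storage request.
# Usage: pvc_resize_patch SIZE   (Kubernetes quantity such as 30Gi or 512Mi)
pvc_resize_patch() {
  case "$1" in
    [0-9]*Gi|[0-9]*Mi) ;;
    *) echo "unexpected size '$1' (want e.g. 30Gi)" >&2; return 1 ;;
  esac
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}\n' "$1"
}
```

Usage: `kubectl -n knowledge patch pvc knowledge-vector-store --type merge -p "$(pvc_resize_patch 30Gi)"`.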
--- a/apps/knowledge/knowledge.yaml
+++ b/apps/knowledge/knowledge.yaml
@@ -0,0 +1,266 @@
 # FlowerCore.Knowledge.Web — fleet vector indexing & RAG hub.
 #
 # Phase 2.4 of the Knowledge service plan. REST + MCP service that scans
 # *.db files under /data/vector-stores and exposes:
 #   - REST: /api/v1/editions, /api/v1/corpus/search, /healthz
 #   - MCP:  list_editions, describe_edition, corpus_search
 #   - Static OpenAPI/Scalar via UseFlowerCoreApi
 #
 # Architecture:
 #   Plan:    FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md
 #   Sprint:  FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md (Track B)
 #   Repo:    D:\git\FlowerCore\FlowerCore.Knowledge\
 #   Shared:  FlowerCore.Common -> FlowerCore.Shared.Indexing (chunkers, vector
 #            stores, edition profiles, ICorpusSearchService facade)
 #
 # Deployment order (see apps/knowledge/README.md and the bluejay-infra/README.md
 # top-level checklist):
 #   1. FlowerCore.DNS public A record knowledge.iamworkin.lan -> 10.0.56.200
 #      MUST exist BEFORE the Certificate is created, or cert-manager HTTP-01
 #      backs off ~2h. Memory: feedback_pfsense_dns_required_for_acme.
 #   2. Build + import the image to ALL RKE2 nodes (server + both agents) since
 #      the Pod uses a Longhorn PVC and may schedule anywhere.
 #      Memory: feedback_rke2_localhost_imagepullpolicy.
 #   3. Bump the image tag in this file, git push.
 #   4. ArgoCD ApplicationSet picks up within ~3 minutes and creates
 #      infra-knowledge.
 #
 # Initial-deploy state:
 #   The Longhorn PVC is empty on first deploy. Knowledge.Web's filesystem
 #   catalog will report zero editions until vector-store *.db files are
 #   pushed into /data/vector-stores. Initial population is a follow-up step
 #   (Phase 2.5+, Blazor admin UI's "Rebuild" button); for the first deploy
 #   the goal is just to prove the pod boots, /healthz returns 200, and the
 #   Traefik IngressRoute serves the Scalar UI.
 ---
 apiVersion: v1
 kind: Namespace
 metadata:
  name: knowledge
  labels:
    app.kubernetes.io/part-of: bluejay-infra
 ---
 # MCP bearer token for the read-only Agent Zero Phase 1 lane. The 1Password
 # item currently stores the raw token in its concealed PASSWORD field, which
 # the operator syncs into the namespaced Secret key `password`.
 apiVersion: onepassword.com/v1
 kind: OnePasswordItem
 metadata:
  name: knowledge-mcp-tokens
  namespace: knowledge
 spec:
  itemPath: "vaults/IAmWorkin/items/FlowerCore Knowledge MCP Tokens"
 ---
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: knowledge-vector-store
  namespace: knowledge
 spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 20Gi
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: knowledge-web
  namespace: knowledge
  labels:
    app: knowledge-web
    app.kubernetes.io/name: knowledge-web
    app.kubernetes.io/part-of: bluejay-infra
 spec:
  replicas: 1
  revisionHistoryLimit: 3
  # RWO Longhorn PVC blocks rolling updates (multi-attach error). Recreate
  # is the canonical pattern (memory: feedback_rwo_pvc_blocks_rolling).
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: knowledge-web
  template:
    metadata:
      labels:
        app: knowledge-web
        app.kubernetes.io/name: knowledge-web
        app.kubernetes.io/part-of: bluejay-infra
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      securityContext:
        runAsNonRoot: true
        fsGroup: 1654
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: web
          # Bump this tag to the image you built + imported to ALL RKE2
          # nodes via scripts/deploy-knowledge.sh before applying.
          image: localhost/fc-knowledge-web:v20260429232635
          imagePullPolicy: Never
          command:
            - /bin/sh
            - -c
          args:
            - |
              if [ -n "${KNOWLEDGE_MCP_BEARER_TOKEN:-}" ]; then
                export FlowerCore__Mcp__ApiKey__Key="Bearer ${KNOWLEDGE_MCP_BEARER_TOKEN}"
              fi
              exec dotnet FlowerCore.Knowledge.Web.dll
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: ASPNETCORE_URLS
              value: "http://+:8080"
            - name: ASPNETCORE_ENVIRONMENT
              value: "Production"
            - name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT
              value: "false"
            # Vector-store directory + embedding model + edition profile dir.
            # Profile JSON is baked into the image at /home/app/editions via the
            # csproj Content-link from FlowerCore.Common/editions/.
            - name: Knowledge__VectorStoresDirectory
              value: "/data/vector-stores"
            - name: Knowledge__EmbeddingModel
              value: "nomic-embed-text"
            - name: Knowledge__DefaultLimit
              value: "5"
            - name: Knowledge__MaxLimit
              value: "50"
            - name: FlowerCore__Editions__ProfileDirectory
              value: "/home/app/editions"
            # Embed via edge1 Pi 5 + AI HAT+ (10.0.57.17:11434). Cluster
            # services do not depend on BLUEJAY-WS (private dev hardware) per
            # bluejay-infra@0f9d56e. Query-time embedding is fast enough on
            # edge1 (~ms per query); bulk index rebuilds (Phase 2.5+) will
            # need a separate ingestion lane that can opt into the
            # workstation GPU when present.
            - name: FlowerCore__Ollama__BaseUrl
              value: "http://10.0.57.17:11434"
            - name: FlowerCore__Mcp__ApiKey__Key
              valueFrom:
                secretKeyRef:
                  name: knowledge-mcp-tokens
                  key: password
            - name: FlowerCore__Mcp__ApiKey__HeaderName
              value: "Authorization"
            - name: KNOWLEDGE_MCP_BEARER_TOKEN
              valueFrom:
                secretKeyRef:
                  name: knowledge-mcp-tokens
                  key: password
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          # /healthz is mapped by HealthController (controller-based route).
          # tcpSocket liveness is the defensive fallback in case middleware
          # later gates /healthz behind auth (memory:
          # feedback_k8s_probes_behind_auth_middleware).
          startupProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 30
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 3
          securityContext:
            runAsNonRoot: true
            runAsUser: 1654
            runAsGroup: 1654
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: vector-store
              mountPath: /data/vector-stores
            - name: tmp
              mountPath: /tmp
            - name: logs
              mountPath: /home/app/logs
      volumes:
        - name: vector-store
          persistentVolumeClaim:
            claimName: knowledge-vector-store
        - name: tmp
          emptyDir: {}
        - name: logs
          emptyDir: {}
 ---
 apiVersion: v1
 kind: Service
 metadata:
  name: knowledge-web
  namespace: knowledge
  labels:
    app: knowledge-web
    app.kubernetes.io/name: knowledge-web
    app.kubernetes.io/part-of: bluejay-infra
 spec:
  type: ClusterIP
  selector:
    app: knowledge-web
  ports:
    - name: http
      port: 80
      targetPort: 8080
 ---
 apiVersion: cert-manager.io/v1
 kind: Certificate
 metadata:
  name: knowledge-tls
  namespace: knowledge
 spec:
  secretName: knowledge-tls
  issuerRef:
    name: step-ca-acme
    kind: ClusterIssuer
  dnsNames:
    - knowledge.iamworkin.lan
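  # step-ca ACME caps certificate lifetime at 30d. A 90d request was
  # silently capped to 30d, leaving renewBefore equal to the effective
  # lifetime, so every issued cert was immediately due for renewal: a
  # perpetual loop (10888+ CertificateRequests in 18h on 2026-05-07).
  # Match the working 720h/240h pattern used by other FC services.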
  duration: 720h     # 30d (step-ca cap)
  renewBefore: 240h  # 10d
 ---
 apiVersion: traefik.io/v1alpha1
 kind: IngressRoute
 metadata:
  name: knowledge
  namespace: knowledge
 spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`knowledge.iamworkin.lan`)
      kind: Rule
      services:
        - name: knowledge-web
          port: 80
  tls:
    secretName: knowledge-tls
--- a/apps/knowledge/kustomization.yaml
+++ b/apps/knowledge/kustomization.yaml
@@ -0,0 +1,7 @@
 # ArgoCD's bluejay-infra ApplicationSet uses a directory generator and does
 # not require kustomization.yaml. Mirrors the fc-distribution shape so
 # `kubectl kustomize` previews work from a working copy.
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
  - knowledge.yaml
--- a/apps/kubevirt-vms/ci1.yaml
+++ b/apps/kubevirt-vms/ci1.yaml
@@ -0,0 +1,487 @@
 # =============================================================================
 # ci1 — Windows Server 2025 KubeVirt VM (GitHub Actions Self-Hosted Runner)
 # =============================================================================
 # Purpose: dedicated CI runner for FlowerCore.Updater Sandbox E2E nightly +
 # future fleet WPF AAT lanes. Replaces the never-registered
 # `bluejay-ws-sandbox-1` runner placeholder. Andrew explicitly does NOT want
 # BLUEJAY-WS registered as a runner (workstation has personal/operator state).
 #
 # Storage layout (2026-05-08):
 #   * ISO is now sourced from Synology NFS (Path B) — see
 #     win2025-iso-nfs-pv.yaml. The Longhorn Filesystem PVC
 #     `windows-server-2025-iso` below is RETAINED but UNUSED so the prior
 #     CDI upload state is preserved as a fallback (and so ArgoCD doesn't
 #     prune it on this commit). It can be deleted in a follow-up commit
 #     after the NFS path is proven on a successful Windows install.
 #
 # Status (2026-05-08): LIVE — Phase 1 prereqs satisfied:
 #   * Multus CNI v4.2.2 thick-plugin DaemonSet running on all 3 RKE2 nodes
 #     (apps/multus/multus.yaml; ApplicationSet `infra-multus` Synced/Healthy)
 #   * CDI v1.65.0 operator + CR Deployed (apps/cdi/; ApplicationSet
 #     `infra-cdi` Synced/Healthy; uploadproxy reachable via kubectl port-forward)
 #   * Windows Server 2025 ISO uploaded via CDI virtctl image-upload to
 #     PVC windows-server-2025-iso (7.7 GiB → 10Gi PVC, Bound, Upload Complete)
 #   * Local Administrator password generated, stored in 1Password vault
 #     IAmWorkin (qaphopopkryhbg353ukzhhuqoq) item id h3ix4mgfk65gmkcmvh6ly3d3hu
 #   * NetworkAttachmentDefinition prod-vlan57 registered (apps/kubevirt-vms/
 #     prod-vlan57-nad.yaml). VM still uses pod-network masquerade until Phase 1.5
 #     host bridge work lands (Puppet br-prod + enp86s0.57); switching is a
 #     one-line YAML edit + git push.
 #
 # See docs/infrastructure/windows-server-build-runner-plan.md "Phase 1 readiness gate".
 #
 # Network choice in this draft: **pod-network fallback** (Calico default).
 # Outbound-only is fine for the Updater Sandbox E2E runner workload (the runner
 # polls GitHub Actions over HTTPS; no inbound listener needed). Multus and the
 # prod-vlan57 NetworkAttachmentDefinition are already installed (see Status
 # above); switch to them once the Phase 1.5 host bridge lands and the operator
 # wants L2 access from `ci1` to other PROD VLAN services.
 #
 # Sizing: 8 vCPU / 16 GB RAM / 200 GB disk on Longhorn (default storageClass).
 # Capacity check 2026-05-08: each RKE2 node has 16 vCPU / ~64Gi allocatable.
 # 8 vCPU is 50% of one node's allocatable CPU (~17% of the 48 vCPU across all
 # three nodes) and 16 GB is ~25% of one node's memory, so it fits on any node.
 #
 # Apply (after operator approval + ISO loaded):
 #   kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml apply -f apps/kubevirt-vms/ci1.yaml
 #
 # Connect to console for Windows install:
 #   virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml vnc ci1 -n kubevirt-vms
 #   (Or via Guacamole once a connection profile is added.)
 # =============================================================================
 apiVersion: v1
 kind: Namespace
 metadata:
  name: kubevirt-vms
  labels:
    app.kubernetes.io/part-of: kubevirt-stack
    pod-security.kubernetes.io/enforce: privileged
 ---
 # ISO PVC — populated via CDI virtctl image-upload (CDI is now installed).
 #
 # **Volume mode (2026-05-08 status):** Filesystem-mode PVC. A migration to
 # `volumeMode: Block` via DataVolume was attempted to address an OVMF SATA
 # CDROM read timeout, but CDI v1.65.0's upload-target pod runs as uid 107
 # with `capabilities.drop: [ALL]` and cannot open the underlying block
 # device (`blockdev: cannot open /dev/cdi-block-volume: Permission denied`).
 # Reverted to Filesystem PVC pending one of:
 #   - CDI deployment override granting CAP_SYS_RAWIO to upload pod
 #   - Pre-populated PVC via privileged init pod that dd's the ISO directly
 #   - Migration to a different storage class that exposes block devices
 #     differently (e.g. iSCSI, where Longhorn's CSI mount path may behave
 #     differently)
 #
 # Population workflow (this PVC, Filesystem mode):
 #   1. virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml image-upload pvc \
 #        windows-server-2025-iso -n kubevirt-vms \
 #        --image-path "$env:USERPROFILE\Downloads\en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso" \
 #        --size 10Gi --storage-class longhorn --access-mode ReadWriteOnce \
 #        --uploadproxy-url https://localhost:8443 --insecure
 #   (--uploadproxy-url uses port-forward in practice: `kubectl port-forward
 #   -n cdi service/cdi-uploadproxy 8443:443 &` first.)
 #
 # **Open boot issue:** even with the ISO at bootOrder:1, OVMF console showed:
 #   BdsDxe: starting Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ... Sata(...)
 #   BdsDxe: failed to start Boot0001 ... Time out
 # Diagnosis confirmed PVC content IS a valid bootable ISO9660 image — the
 # timeout is in OVMF reading from the SATA-CDROM-backed-by-filesystem-PVC.
 # Block mode would likely fix it; see CDI permission issue above.
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: windows-server-2025-iso
  namespace: kubevirt-vms
  labels:
    app: ci-runner
    flowercore.io/managed-by: bluejay-infra
 spec:
  accessModes:
    - ReadWriteOnce          # Immutable once created; multi-VM use needs a new ReadOnlyMany/ReadWriteMany PVC
  resources:
    requests:
      storage: 10Gi          # Server 2025 ISO is 7.7GB; 10Gi for headroom
  storageClassName: longhorn
 ---
 # Root disk PVC — empty 200Gi volume that Windows installs into.
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: ci1-rootdisk
  namespace: kubevirt-vms
 spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
  storageClassName: longhorn
 ---
 # Sysprep ConfigMap — autounattend.xml for hands-off Windows install.
 # Sets local Administrator password (REPLACE the placeholder), enables RDP,
 # enables WinRM, sets the hostname, and configures networking via DHCP.
 # The ISO + VirtIO drivers handle the rest.
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: ci1-autounattend
  namespace: kubevirt-vms
 data:
  autounattend.xml: |
    <?xml version="1.0" encoding="utf-8"?>
    <unattend xmlns="urn:schemas-microsoft-com:unattend" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State">
      <!-- Pass 1: WindowsPE — Disk setup and VirtIO driver injection -->
      <settings pass="windowsPE">
        <component name="Microsoft-Windows-International-Core-WinPE"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <SetupUILanguage>
            <UILanguage>en-US</UILanguage>
          </SetupUILanguage>
          <InputLocale>en-US</InputLocale>
          <SystemLocale>en-US</SystemLocale>
          <UILanguage>en-US</UILanguage>
          <UserLocale>en-US</UserLocale>
        </component>
        <component name="Microsoft-Windows-PnpCustomizationsWinPE"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <DriverPaths>
            <PathAndCredentials wcm:action="add" wcm:keyValue="1">
              <Path>E:\amd64\2k25</Path>
            </PathAndCredentials>
          </DriverPaths>
        </component>
        <component name="Microsoft-Windows-Setup"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <DiskConfiguration>
            <Disk wcm:action="add">
              <DiskID>0</DiskID>
              <WillWipeDisk>true</WillWipeDisk>
              <CreatePartitions>
                <CreatePartition wcm:action="add">
                  <Order>1</Order>
                  <Size>260</Size>
                  <Type>EFI</Type>
                </CreatePartition>
                <CreatePartition wcm:action="add">
                  <Order>2</Order>
                  <Size>128</Size>
                  <Type>MSR</Type>
                </CreatePartition>
                <CreatePartition wcm:action="add">
                  <Order>3</Order>
                  <Extend>true</Extend>
                  <Type>Primary</Type>
                </CreatePartition>
              </CreatePartitions>
              <ModifyPartitions>
                <ModifyPartition wcm:action="add">
                  <Order>1</Order>
                  <PartitionID>1</PartitionID>
                  <Format>FAT32</Format>
                  <Label>EFI</Label>
                </ModifyPartition>
                <ModifyPartition wcm:action="add">
                  <Order>2</Order>
                  <PartitionID>2</PartitionID>
                </ModifyPartition>
                <ModifyPartition wcm:action="add">
                  <Order>3</Order>
                  <PartitionID>3</PartitionID>
                  <Format>NTFS</Format>
                  <Label>Windows</Label>
                </ModifyPartition>
              </ModifyPartitions>
            </Disk>
          </DiskConfiguration>
          <ImageInstall>
            <OSImage>
              <InstallTo>
                <DiskID>0</DiskID>
                <PartitionID>3</PartitionID>
              </InstallTo>
              <!-- Index 2 = Standard Desktop Experience. Use 4 for Datacenter Desktop. -->
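              <!-- Sketch for checking the index list against the ISO actually in
                   use (run from WinPE or any Windows host with the ISO mounted;
                   the drive letter D: is an assumption):
                     dism /Get-WimInfo /WimFile:D:\sources\install.wim -->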
              <InstallFrom>
                <MetaData wcm:action="add">
                  <Key>/IMAGE/INDEX</Key>
                  <Value>2</Value>
                </MetaData>
              </InstallFrom>
            </OSImage>
          </ImageInstall>
          <UserData>
            <AcceptEula>true</AcceptEula>
            <FullName>FlowerCore CI Runner</FullName>
            <Organization>FlowerCore</Organization>
            <!-- Eval install — no product key needed for 180-day evaluation -->
          </UserData>
        </component>
      </settings>
      <!-- Pass 4: Specialize: hostname, time zone, RDP -->
      <settings pass="specialize">
        <component name="Microsoft-Windows-Shell-Setup"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <ComputerName>CI1</ComputerName>
          <TimeZone>Central Standard Time</TimeZone>
        </component>
        <component name="Microsoft-Windows-TerminalServices-LocalSessionManager"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <fDenyTSConnections>false</fDenyTSConnections>
        </component>
      </settings>
      <!-- Pass 7: OOBE: Administrator password, RDP firewall rule, WinRM, UAC off -->
      <settings pass="oobeSystem">
        <component name="Microsoft-Windows-Shell-Setup"
                   processorArchitecture="amd64"
                   publicKeyToken="31bf3856ad364e35"
                   language="neutral" versionScope="nonSxS">
          <OOBE>
            <HideEULAPage>true</HideEULAPage>
            <HideLocalAccountScreen>true</HideLocalAccountScreen>
            <HideOEMRegistrationScreen>true</HideOEMRegistrationScreen>
            <HideOnlineAccountScreens>true</HideOnlineAccountScreens>
            <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE>
            <ProtectYourPC>3</ProtectYourPC>
          </OOBE>
          <UserAccounts>
            <AdministratorPassword>
              <!-- Real password is in 1Password — vault qaphopopkryhbg353ukzhhuqoq,
                   item id h3ix4mgfk65gmkcmvh6ly3d3hu, title:
                   "ci1 Administrator (Windows Server 2025 KubeVirt VM)".
                   Field "autounattend AdministratorPassword Value (UTF-16-LE base64)"
                   matches the Value below.
                   To rotate: regenerate, recompute base64
                     $combined = $pw + "AdministratorPassword"
                     [Convert]::ToBase64String([Text.Encoding]::Unicode.GetBytes($combined))
                   then update both 1P item AND this Value field, recreate VM. -->
              <Value>bAA3AGsANABOAHcAcgBMAG4AeQBTAHUAYgBBAHQAaQBzAFUAcAB6AEMAWQAhADkAYQBCAEEAZABtAGkAbgBpAHMAdAByAGEAdABvAHIAUABhAHMAcwB3AG8AcgBkAA==</Value>
              <PlainText>false</PlainText>
            </AdministratorPassword>
          </UserAccounts>
          <FirstLogonCommands>
            <SynchronousCommand wcm:action="add">
              <Order>1</Order>
              <CommandLine>powershell.exe -ExecutionPolicy Bypass -Command "Set-NetFirewallRule -DisplayGroup 'Remote Desktop' -Enabled True"</CommandLine>
              <Description>Enable RDP firewall rule</Description>
            </SynchronousCommand>
            <SynchronousCommand wcm:action="add">
              <Order>2</Order>
              <CommandLine>powershell.exe -ExecutionPolicy Bypass -Command "Enable-PSRemoting -Force; Set-Item WSMan:\localhost\Service\Auth\Basic $true; Set-Item WSMan:\localhost\Service\AllowUnencrypted $true"</CommandLine>
              <Description>Enable WinRM (Phase 2 will pivot to HTTPS via step-ca cert)</Description>
            </SynchronousCommand>
            <SynchronousCommand wcm:action="add">
              <Order>3</Order>
              <CommandLine>cmd.exe /c reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v EnableLUA /t REG_DWORD /d 0 /f</CommandLine>
              <Description>Disable UAC (Phase 2 Puppet will re-evaluate)</Description>
            </SynchronousCommand>
          </FirstLogonCommands>
        </component>
      </settings>
    </unattend>
 ---
 # VirtualMachine — Windows Server 2025 CI runner.
 apiVersion: kubevirt.io/v1
 kind: VirtualMachine
 metadata:
  name: ci1
  namespace: kubevirt-vms
  labels:
    app: ci-runner
    role: github-actions-runner
    flowercore.io/managed-by: bluejay-infra
 spec:
  # `running: true` is deprecated in favor of `runStrategy`. They are mutually
  # exclusive — KubeVirt's validating webhook rejects any VM that sets both:
  #   admission webhook "virtualmachine-validator.kubevirt.io" denied the request:
  #   Running and RunStrategy are mutually exclusive.
  # `Always` keeps a VMI running and restarts it if it crashes/exits — same
  # semantics as the old `running: true`.
  #
  # **2026-05-08 status: VM cannot start due to a stale QEMU flock on the
  # rootdisk PVC** (qemu reports `Failed to get "write" lock` on
  # `/var/run/kubevirt-private/vmi-disks/rootdisk/disk.img`). The flock was
  # left by a previous QEMU process during a force-deleted launcher pod
  # cycle. Recovery requires either (a) a Longhorn engine restart on
  # rke2-agent2, (b) a Longhorn volume detach via the longhorn-manager API
  # (kubectl patch on `volume.longhorn.io/<pvc-name>` does not work — the
  # spec.nodeID is reconciled back), or (c) a node reboot of rke2-agent2.
  #
  # **Confirmed working:** the bootOrder swap (windows-iso=1, rootdisk=2)
  # and the runStrategy migration (above). The ISO PVC was successfully
  # repopulated via virtctl image-upload pvc on the Filesystem-mode PVC.
  #
  # **SATA CDROM read timeout (Paths A/B, superseded by Path C):** even with
  # bootOrder=1, OVMF reported `BdsDxe: failed to start Boot0001 ... Time out`
  # reading the SATA CDROM backed by the Filesystem-mode PVC. A switch to a
  # Block-mode DataVolume was attempted but blocked by a CDI v1.65.0 upload-pod
  # permission issue (the capability drop prevents writing to the underlying
  # block device). The ISO is now delivered as a containerDisk (see the
  # windows-iso volume below); treat this as open until that path completes a
  # Windows install. See also the header docstring on the ISO PVC.
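  #
  # Diagnostic sketch (assumed commands, not part of the recorded history): the
  # OVMF `BdsDxe` lines above appear on the serial console, and the QEMU
  # `Failed to get "write" lock` error in the virt-launcher `compute` container.
  # The label selector assumes the template's `kubevirt.io/vm: ci1` label is
  # propagated to the launcher pod.
  #   virtctl -n kubevirt-vms console ci1
  #   kubectl -n kubevirt-vms logs -l kubevirt.io/vm=ci1 -c compute --tail=200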
  runStrategy: Always   # LIVE — ISO uploaded 2026-05-08, password in 1P
  template:
    metadata:
      labels:
        app: ci-runner
        role: github-actions-runner
        kubevirt.io/vm: ci1
    spec:
      domain:
        cpu:
          cores: 8
          sockets: 1
          threads: 1
        memory:
          guest: 16Gi
        resources:
          requests:
            memory: 16Gi
          limits:
            memory: 16Gi
        clock:
          utc: {}
          timer:
            hpet:
              present: false
            pit:
              tickPolicy: delay
            rtc:
              tickPolicy: catchup
            hyperv: {}
        features:
          acpi: {}
          apic: {}
          hyperv:
            relaxed: {}
            vapic: {}
            spinlocks:
              spinlocks: 8191
          smm: {}
        firmware:
          bootloader:
            efi:
              secureBoot: true
        devices:
          tpm: {}             # Non-persistent vTPM — sufficient for runner; no BitLocker
          disks:
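            # bootOrder: ISO must be 1 for first-boot install (the rootdisk has no
            # EFI bootloader yet). After Windows installs, it writes its own UEFI
            # Boot#### entry pointing at the rootdisk's EFI partition. With the ISO
            # still at bootOrder:1, later boots rely on the installer's "Press any
            # key to boot from CD or DVD" prompt timing out; once the install is
            # done, swapping to rootdisk=1 / windows-iso=2 makes the rootdisk the
            # primary boot device and leaves the ISO as a re-install fallback.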
            #
            # Original (broken) order had rootdisk=1, windows-iso=2 — UEFI tried
            # the empty virtio disk first, got nothing, fell back to the SATA
            # CDROM at Boot0001 with a short timeout, and timed out before the
            # CDROM enumerated. Console showed:
            #   BdsDxe: failed to start Boot0001 ... Time out
            #   BdsDxe: No bootable option or device was found.
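            # Confirmed via debug pod: PVC content IS a real bootable ISO9660
            # (file: "ISO 9660 CD-ROM filesystem data ... (bootable)"), so the
            # ISO itself was never the problem. Boot priority was the first bug;
            # the timeout that persisted after the swap traced to the slow PVC
            # backing (see the windows-iso volume below).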
            # 2026-05-08 PM: cdrom bus is SCSI (virtio-scsi controller). Bus
            # choice is no longer load-bearing since the ISO is delivered via
            # containerDisk (see volumes block below): both SATA and SCSI are
            # expected to work once the cdrom backing isn't a slow PVC (pending
            # a successful install). SCSI is kept because it's the modern bus
            # and matches the standard FC KubeVirt VM template.
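            #
            # Sketch (an assumption, not yet validated on this VM) of the
            # post-install order that makes the rootdisk primary and keeps the
            # ISO as a re-install fallback:
            #   - name: rootdisk
            #     bootOrder: 1
            #     disk:
            #       bus: virtio
            #   - name: windows-iso
            #     bootOrder: 2
            #     cdrom:
            #       bus: scsi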
            - name: windows-iso
              bootOrder: 1
              cdrom:
                bus: scsi
            - name: rootdisk
              bootOrder: 2
              disk:
                bus: virtio
            - name: virtio-drivers
              cdrom:
                bus: sata
            - name: sysprep
              cdrom:
                bus: sata
          interfaces:
            # Pod-network fallback for Phase 1. To switch to PROD VLAN once Multus
            # + the prod-vlan57 NAD exist, replace this block with:
            #   - name: prod-net
            #     bridge: {}
            #     model: virtio
            # and update the networks: stanza to use multus.networkName: kubevirt-vms/prod-vlan57
            - name: default
              masquerade: {}
              model: virtio
        machine:
          type: q35
      networks:
        - name: default
          pod: {}
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: ci1-rootdisk
        - name: windows-iso
          # 2026-05-08 PM (Path C, CONTAINERDISK): the ISO is now packaged as
          # a KubeVirt containerDisk OCI image baked from
          # `FROM scratch ; ADD --chown=107:107 disk.img /disk/disk.img`.
          # The qemu user (uid 107) reads the ISO directly from the image layer
          # unpacked into the node's local containerd storage (no PVC and no
          # network storage in the read path), bypassing both:
          #   - Synology NFS export ACL (Path B failed: uid 107 denied at
          #     directory level even with mode 0777, see memory
          #     feedback_synology_iso_export_root_only_uid_107_denied)
          #   - OVMF cdrom read-window timeout (Path A and Path B's SCSI
          #     retry both hit `BdsDxe: failed to start Boot0001 ... Time out`
          #     when the cdrom was backed by a PVC the storage controller
          #     couldn't satisfy reads from fast enough).
          #
          # Image build (one-time, per ISO version):
          #   1. Copy ISO to disk.img, write Dockerfile
          #   2. podman build --tag localhost/win-server-2025:1.0 .  (on noc1)
          #   3. podman save -o win-server-2025-1.0.tar localhost/win-server-2025:1.0
          #   4. SCP tar to all 3 RKE2 nodes (rke2-server, rke2-agent1, rke2-agent2)
          #   5. sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \
          #        -n k8s.io images import /tmp/win-server-2025-1.0.tar
          # Standard FC pattern per `feedback_rke2_localhost_imagepullpolicy`.
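          #
          # Sketch of steps 1-2 plus a post-import check. `<server-2025.iso>` is a
          # placeholder for the ISO file; the Containerfile body is the two lines
          # quoted above:
          #   cp <server-2025.iso> disk.img
          #   printf 'FROM scratch\nADD --chown=107:107 disk.img /disk/disk.img\n' > Containerfile
          #   podman build --tag localhost/win-server-2025:1.0 -f Containerfile .
          #   # after step 5, on each RKE2 node:
          #   sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \
          #     -n k8s.io images ls | grep win-server-2025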
          #
          # When a new Windows ISO version ships, bump the tag (1.1, 1.2, ...),
          # rebuild + redistribute, and update the image: line below in a new
          # commit. KubeVirt picks up the new image via a VM restart.
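          # Sketch of that restart (assumes virtctl is available where kubectl runs):
          #   virtctl -n kubevirt-vms restart ci1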
          #
          # The legacy NFS PVC + PV (apps/kubevirt-vms/win2025-iso-nfs-pv.yaml)
          # and CDI Longhorn PVC (`windows-server-2025-iso`) are RETAINED for
          # this commit so the prior states are recoverable. Once the
          # containerDisk path proves on a successful Windows install, both
          # legacy artifacts can be pruned in a follow-up commit.
          containerDisk:
            image: localhost/win-server-2025:1.0
            imagePullPolicy: Never
        - name: virtio-drivers
          containerDisk:
            # Pinned to v1.8.2 (latest stable as of 2026-05-08).
            # The :latest tag uses Docker manifest v1 schema which containerd
            # 2.1 (RKE2 v1.34.5) refuses to pull with:
            #   "media type application/vnd.docker.distribution.manifest.v1+prettyjws
            #    is no longer supported since containerd v2.1"
            # v1.8.2 is rebuilt with manifest v2/OCI and works on containerd 2.1.
            # Available tags (for future bumps): https://quay.io/repository/kubevirt/virtio-container-disk?tab=tags
            image: quay.io/kubevirt/virtio-container-disk:v1.8.2
        - name: sysprep
          sysprep:
            configMap:
              name: ci1-autounattend
      terminationGracePeriodSeconds: 3600
--- a/apps/kubevirt-vms/prod-vlan57-nad.yaml
+++ b/apps/kubevirt-vms/prod-vlan57-nad.yaml
@@ -0,0 +1,69 @@
 # =============================================================================
 # NetworkAttachmentDefinition — PROD VLAN 57 bridge
 # =============================================================================
 # Purpose: makes KubeVirt VMs reachable on the PROD VLAN (10.0.57.0/24)
 # alongside the existing pod network. Required for ci1 to bridge onto PROD
 # (e.g. to provision/scrape edge1, edge2, kiosks, Pis on the same L2 segment).
 #
 # **DEPLOY GATE — Phase 1.5 host work required first**:
 #   On every RKE2 node (rke2-server, rke2-agent1, rke2-agent2):
 #     1. Switch port (UniFi USL16LP) trunks VLAN 57 to the node — usually
 #        already true since BLUEJAY-WS reaches 10.0.57.x services. Verify
 #        with `ip link show enp86s0.57` after configuring sub-interface, OR
 #        `tcpdump -ni enp86s0 vlan 57` and ping a known PROD host.
 #     2. Linux bridge `br-prod` enslaving `enp86s0.57` (VLAN sub-interface).
 #        NetworkManager profile examples in the runbook below.
 #     3. Verify Multus DaemonSet `kube-multus-ds` is Ready on all nodes.
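 #
 #   Hedged sketch of the host side of step 2 with NetworkManager (assumes an
 #   nmcli that accepts the `master`/`slave-type` aliases and that br-prod
 #   carries no host IP):
 #     nmcli con add type bridge ifname br-prod con-name br-prod \
 #       bridge.stp no ipv4.method disabled ipv6.method disabled
 #     nmcli con add type vlan ifname enp86s0.57 con-name enp86s0.57 \
 #       dev enp86s0 id 57 master br-prod slave-type bridge
 #     nmcli con up br-prod
 #     ip link show master br-prod     # expect enp86s0.57 listed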
 #
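 # Without those, applying this NAD only creates the NetworkAttachmentDefinition
 # object (the CRD itself is registered by the Multus install); the hosts are
 # unchanged. A VM that requests this NAD with no bridge present is expected to
 # fail with something like:
 #   `error adding pod kubevirt-vms_ci1 to CNI network "prod-vlan57": failed to
 #    plumb VLAN: open /sys/class/net/br-prod/master: no such file or directory`
 # or, since the reference CNI `bridge` plugin creates a missing bridge on its
 # own, to come up attached to an empty `br-prod` with no uplink (no DHCP/ARP
 # replies from PROD).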
 #
 # Configuration notes:
 #   - cniVersion 0.3.1 to match Multus daemon-config.json
 #   - mtu 1500 (matches enp86s0 default; bump if jumbo frames configured)
 #   - bridge name `br-prod` is convention; if Puppet picks a different name
 #     (e.g. `br57`, `br-vlan57`), edit BOTH this NAD and the ci1.yaml
 #     interface block. Keep them in sync.
 #   - vlan: 0 because the host bridge already strips VLAN tag (br-prod sits
 #     on top of `enp86s0.57`). If we instead used a VLAN-aware bridge with
 #     trunk port, set vlan: 57 here. Current convention is VLAN-stripped at
 #     the sub-interface, so the bridge passes untagged frames.
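 #     Hedged sketch of that VLAN-aware alternative (hypothetical bridge name
 #     `br-trunk` enslaving the untagged enp86s0 trunk port; the bridge plugin
 #     turns on vlan_filtering when `vlan` is non-zero, and the uplink must
 #     also carry VID 57, e.g. `bridge vlan add dev enp86s0 vid 57`):
 #       { "cniVersion": "0.3.1", "name": "prod-vlan57", "type": "bridge",
 #         "bridge": "br-trunk", "vlan": 57, "ipam": {} }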
 #
 # Apply:
 #   kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml apply -f apps/kubevirt-vms/prod-vlan57-nad.yaml
 #
 # Then update ci1.yaml networks: stanza to:
 #   - name: prod-net
 #     multus:
 #       networkName: kubevirt-vms/prod-vlan57
 # and the interface block from `masquerade` to `bridge`.
 # =============================================================================
 ---
 # Namespace must exist already (created by ci1.yaml's first document).
 # This file creates a NAD in that same namespace.
 apiVersion: k8s.cni.cncf.io/v1
 kind: NetworkAttachmentDefinition
 metadata:
  name: prod-vlan57
  namespace: kubevirt-vms
  annotations:
    bluejay.iamworkin.lan/host-bridge: "br-prod (enslaves enp86s0.57)"
    bluejay.iamworkin.lan/cidr: "10.0.57.0/24"
    bluejay.iamworkin.lan/gateway: "10.0.57.1"
    bluejay.iamworkin.lan/dns: "10.0.56.1 (pfSense Unbound)"
 spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "prod-vlan57",
      "type": "bridge",
      "bridge": "br-prod",
      "ipam": {},
      "mtu": 1500,
      "vlan": 0,
      "promiscMode": true,
      "preserveDefaultVlan": false
    }
--- a/apps/kubevirt-vms/win2025-iso-nfs-pv.yaml
+++ b/apps/kubevirt-vms/win2025-iso-nfs-pv.yaml
@@ -0,0 +1,99 @@
 # =============================================================================
 # Windows Server 2025 ISO — Static NFS PV (Path B for SATA-CDROM timeout)
 # =============================================================================
 # Purpose: Mount the ISO from Synology NAS via NFS instead of from a Longhorn-
 # backed Filesystem PVC.
 #
 # Why: SATA-CDROM emulation reading from a Longhorn-backed Filesystem PVC is
 # too slow for OVMF's boot read window — the DVD-ROM enumeration times out
 # before the bootloader can be read. Symptom on the serial console:
 #   BdsDxe: failed to start Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ...
 #   BdsDxe: failed to start Boot0001 ... Time out
 #   BdsDxe: No bootable option or device was found
 # Diagnosis confirmed the ISO content is a perfectly valid bootable ISO9660
 # image — the bug is in the timing path between OVMF and Longhorn-backed
 # storage, not in the ISO itself.
 #
 # Block-mode PVC was tried (`volumeMode: Block` via DataVolume) and would
 # likely fix the timing, but CDI v1.65.0's upload-target pod cannot open the
 # block device due to runAsUser:107 + capabilities.drop:[ALL] and we got:
 #   blockdev: cannot open /dev/cdi-block-volume: Permission denied
 #
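 # NFS-mounted ISO was meant to bypass both issues: no Longhorn slowness, no
 # CDI upload pod permission concerns. The ISO is read directly from the NAS
 # over a native NFSv4.1 mount that QEMU's SATA emulator can read at full LAN
 # speed.
 #
 # Status (2026-05-08 PM): superseded by the containerDisk delivery (Path C in
 # ci1.yaml). In practice uid 107 (qemu) was denied at the directory level by
 # the Synology export ACL despite file mode 0777 (memory:
 # feedback_synology_iso_export_root_only_uid_107_denied). This PV/PVC is
 # RETAINED so the prior state is recoverable; prune once Path C proves.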
 #
 # Layout on Synology:
 #   /volume1/ISOs/                                              (existing export, RKE2 ACL)
 #     en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso
 #     win2025-iso-disk/                                         (new subdir, 2026-05-08)
 #       disk.img -> hardlink to ../en-us_windows_server_2025_..._8e06425a.iso
 #
 # KubeVirt's launcher pod expects a PVC mounted at
 # /var/run/kubevirt-private/vmi-disks/<diskName>/disk.img — by mounting the
 # `win2025-iso-disk/` subdir as the NFS PV root, `disk.img` lives at the PV's
 # root and KubeVirt's CDROM emulator finds it without any path manipulation.
 #
 # A symlink would NOT work for sub-path NFS mounts (the relative target
 # `../...iso` falls outside the sub-mount root). A hardlink works because it
 # references the same inode regardless of mount point.
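 #
 # Sketch of the layout commands (run on the NAS as a user with write access
 # to /volume1/ISOs; `ls -li` should show the same inode number on both names
 # and a link count of 2, assuming no other hardlinks exist):
 #   mkdir /volume1/ISOs/win2025-iso-disk
 #   ln /volume1/ISOs/en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso \
 #      /volume1/ISOs/win2025-iso-disk/disk.img
 #   ls -li /volume1/ISOs/*.iso /volume1/ISOs/win2025-iso-disk/disk.img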
 #
 # Memory references:
 #   - feedback_synology_nfs_volume1_kubernetes_export_scoped (Synology export
 #     scoping pattern — but /volume1/ISOs export, unlike /volume1/kubernetes,
 #     does support sub-path mounts because Synology NFS is configured with
 #     pseudo-fs in NFSv4.1)
 #   - feedback_kubevirt_iso_first_install_bootorder_and_runstrategy (boot
 #     order / runStrategy gotchas, separate from the storage timing issue)
 #
 # Validation (2026-05-08, from rke2-server / rke2-agent1 / rke2-agent2):
 #   mount -t nfs -o nfsvers=4.1,ro 10.0.58.3:/volume1/ISOs/win2025-iso-disk /tmp/m
 #   file /tmp/m/disk.img
 #     -> ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)
 # All 3 RKE2 nodes can mount and read.
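 #
 # Sketch for falling back to this path from ci1.yaml (assumes the disk is
 # still named windows-iso there): replace its containerDisk volume with
 #   - name: windows-iso
 #     persistentVolumeClaim:
 #       claimName: windows-server-2025-iso-nfs
 #       readOnly: true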
 # =============================================================================
 apiVersion: v1
 kind: PersistentVolume
 metadata:
  name: windows-server-2025-iso-nfs
  labels:
    flowercore.io/iso: windows-server-2025
    flowercore.io/managed-by: bluejay-infra
 spec:
  capacity:
    storage: 8Gi
  accessModes:
    - ReadOnlyMany
  volumeMode: Filesystem
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""              # static, no provisioner
  mountOptions:
    - nfsvers=4.1
    - ro
    - hard
    - timeo=600
    - retrans=3
  nfs:
    server: 10.0.58.3               # BlueJayNAS Synology DS1621+ on HOME VLAN 58
    path: /volume1/ISOs/win2025-iso-disk
    readOnly: true
 ---
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: windows-server-2025-iso-nfs
  namespace: kubevirt-vms
  labels:
    app: ci-runner
    flowercore.io/managed-by: bluejay-infra
 spec:
  accessModes:
    - ReadOnlyMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: ""
  volumeName: windows-server-2025-iso-nfs
--- a/apps/monitoring/fc-updatecenter-dashboard.grafana.txt
+++ b/apps/monitoring/fc-updatecenter-dashboard.grafana.txt
@@ -0,0 +1,762 @@
 {
  "annotations": {
    "list": []
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 1,
  "id": null,
  "links": [
    {
      "icon": "external link",
      "includeVars": false,
      "keepTime": false,
      "targetBlank": true,
      "title": "Open Service",
      "type": "link",
      "url": "https://updatecenter.iamworkin.lan/"
    }
  ],
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [
            {
              "options": {
                "0": {
                  "color": "#f87171",
                  "index": 1,
                  "text": "DOWN"
                },
                "1": {
                  "color": "#4ade80",
                  "index": 0,
                  "text": "UP"
                }
              },
              "type": "value"
            }
          ],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "#f87171",
                "value": null
              },
              {
                "color": "#4ade80",
                "value": 1
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "options": {
        "colorMode": "background",
        "graphMode": "none",
        "justifyMode": "center",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value_and_name"
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "probe_success{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"}",
          "refId": "A",
          "legendFormat": "Availability"
        }
      ],
      "title": "Service Availability",
      "transparent": true,
      "type": "stat"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "decimals": 2,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "#f87171",
                "value": null
              },
              {
                "color": "#fbbf24",
                "value": 95
              },
              {
                "color": "#FFB300",
                "value": 99
              },
              {
                "color": "#4ade80",
                "value": 99.9
              }
            ]
          },
          "unit": "percent"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 8,
        "y": 0
      },
      "id": 2,
      "options": {
        "colorMode": "background_solid",
        "graphMode": "area",
        "justifyMode": "center",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value_and_name"
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "avg_over_time(probe_success{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"}[24h]) * 100",
          "refId": "A",
          "legendFormat": "24h Uptime"
        }
      ],
      "title": "24-Hour Uptime",
      "transparent": true,
      "type": "stat"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "max": 30,
          "min": 0,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "#f87171",
                "value": null
              },
              {
                "color": "#fbbf24",
                "value": 2
              },
              {
                "color": "#4ade80",
                "value": 7
              }
            ]
          },
          "unit": "d"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 16,
        "y": 0
      },
      "id": 3,
      "options": {
        "minVizHeight": 75,
        "minVizWidth": 75,
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "showThresholdLabels": false,
        "showThresholdMarkers": true
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "(probe_ssl_earliest_cert_expiry{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"} - time()) / 86400",
          "refId": "A",
          "legendFormat": "Days Remaining"
        }
      ],
      "title": "Cert Expiry (Days)",
      "transparent": true,
      "type": "gauge"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisBorderShow": false,
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "Response Time (seconds)",
            "drawStyle": "line",
            "fillOpacity": 12,
            "gradientMode": "scheme",
            "lineInterpolation": "smooth",
            "lineWidth": 2,
            "pointSize": 4,
            "showPoints": "never",
            "spanNulls": true,
            "thresholdsStyle": {
              "mode": "dashed"
            }
          },
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "#4ade80",
                "value": null
              },
              {
                "color": "#fbbf24",
                "value": 2
              },
              {
                "color": "#f87171",
                "value": 5
              }
            ]
          },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 14,
        "x": 0,
        "y": 4
      },
      "id": 4,
      "options": {
        "legend": {
          "calcs": [
            "lastNotNull",
            "mean",
            "max"
          ],
          "displayMode": "table",
          "placement": "right"
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "probe_duration_seconds{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"}",
          "refId": "A",
          "legendFormat": "Probe Duration"
        }
      ],
      "timeFrom": "1h",
      "title": "Response Time (1h Trend)",
      "transparent": true,
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "gridPos": {
        "h": 8,
        "w": 10,
        "x": 14,
        "y": 4
      },
      "id": 5,
      "options": {
        "alertInstanceLabelFilter": "{instance=\"updatecenter.iamworkin.lan\"}",
        "alertName": "",
        "dashboardAlerts": false,
        "groupBy": [],
        "groupMode": "default",
        "maxItems": 10,
        "sortOrder": 1,
        "stateFilter": {
          "error": true,
          "firing": true,
          "noData": true,
          "normal": false,
          "pending": true
        },
        "viewMode": "list"
      },
      "title": "Active Alerts",
      "type": "alertlist"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 12
      },
      "id": 20,
      "title": "OTEL Counters — Track 1D",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "lineWidth": 1,
            "fillOpacity": 10
          },
          "unit": "reqps"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 13
      },
      "id": 21,
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right",
          "calcs": ["mean", "lastNotNull"]
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "sum by (status) (rate(updatecenter_manifest_requests_total[5m]))",
          "refId": "A",
          "legendFormat": "status={{status}}"
        }
      ],
      "title": "Manifest Requests rate by status (5m)",
      "transparent": true,
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "lineWidth": 1,
            "fillOpacity": 10
          },
          "unit": "Bps"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 13
      },
      "id": 22,
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right",
          "calcs": ["mean", "lastNotNull"]
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "sum by (slug) (rate(updatecenter_bundle_download_bytes_total[5m]))",
          "refId": "A",
          "legendFormat": "{{slug}}"
        }
      ],
      "title": "Bundle Download Throughput by slug (5m)",
      "transparent": true,
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "lineWidth": 1,
            "fillOpacity": 10
          },
          "unit": "reqps"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 21
      },
      "id": 23,
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right",
          "calcs": ["mean", "lastNotNull"]
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "sum by (status) (rate(updatecenter_checkins_total[5m]))",
          "refId": "A",
          "legendFormat": "status={{status}}"
        }
      ],
      "title": "Agent Check-in Rate by status (5m)",
      "transparent": true,
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "#4ade80", "value": null },
              { "color": "#f87171", "value": 1 }
            ]
          },
          "unit": "none",
          "decimals": 2
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 6,
        "x": 12,
        "y": 21
      },
      "id": 24,
      "options": {
        "colorMode": "background",
        "graphMode": "area",
        "justifyMode": "center",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": ["lastNotNull"],
          "fields": "",
          "values": false
        },
        "textMode": "value_and_name"
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "increase(updatecenter_signature_verify_failures_total[1h])",
          "refId": "A",
          "legendFormat": "Sig Verify Failures (1h)"
        }
      ],
      "title": "Signature Verify Failures (1h)",
      "transparent": true,
      "type": "stat"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "lineWidth": 1,
            "fillOpacity": 10
          },
          "unit": "reqps"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 6,
        "x": 18,
        "y": 21
      },
      "id": 25,
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right",
          "calcs": ["mean", "lastNotNull"]
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "sum by (slug, channel) (rate(updatecenter_release_publishes_total[5m]))",
          "refId": "A",
          "legendFormat": "{{slug}}/{{channel}}"
        }
      ],
      "title": "Release Publishes rate by slug/channel (5m)",
      "transparent": true,
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "lineWidth": 1,
            "fillOpacity": 10
          },
          "unit": "reqps"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 29
      },
      "id": 26,
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right",
          "calcs": ["mean", "lastNotNull"]
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "sum by (kind, status) (rate(updatecenter_bundle_downloads_total[5m]))",
          "refId": "A",
          "legendFormat": "{{kind}} / {{status}}"
        }
      ],
      "title": "Bundle Download Requests by kind/status (5m)",
      "transparent": true,
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "fffjikve8llhce"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "lineWidth": 2,
            "fillOpacity": 20
          },
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "#4ade80", "value": null },
              { "color": "#f87171", "value": 0.01 }
            ]
          },
          "unit": "reqps"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 29
      },
      "id": 27,
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right",
          "calcs": ["mean", "lastNotNull"]
        },
        "tooltip": {
          "mode": "multi",
          "sort": "desc"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "fffjikve8llhce"
          },
          "expr": "rate(updatecenter_signature_verify_failures_total[5m])",
          "refId": "A",
          "legendFormat": "Sig verify failures/s"
        }
      ],
      "title": "Signature Verify Failure Rate (5m) — Critical if >0",
      "transparent": true,
      "type": "timeseries"
    }
  ],
  "refresh": "30s",
  "schemaVersion": 39,
  "style": "dark",
  "tags": [
    "blue-jay",
    "flowercore",
    "synthetic",
    "updatecenter",
    "otel"
  ],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-24h",
    "to": "now"
  },
  "timezone": "browser",
  "title": "FlowerCore.UpdateCenter Dashboard",
  "uid": "fc-updatecenter",
  "version": 2
 }
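The panels above are placed by hand on Grafana's 24-column grid through each panel's `gridPos` (x, y, w, h). Below is a minimal sketch for catching layout mistakes before committing. It assumes the dashboard JSON is loaded as a Python dict. `layout_problems` and `check_file` are hypothetical helpers and are not part of Grafana or this repo. Only top-level panels are checked, so panels nested inside collapsed rows are ignored.

```python
import json
from itertools import combinations

# Grafana dashboards lay panels out on a 24-column grid.
GRID_COLUMNS = 24


def layout_problems(dashboard: dict) -> list[str]:
    """Return problems found in top-level panel gridPos values (hypothetical helper)."""
    panels = dashboard.get("panels", [])
    problems = []

    # A panel must not run past the right edge of the grid.
    for panel in panels:
        g = panel["gridPos"]
        if g["x"] + g["w"] > GRID_COLUMNS:
            problems.append(f"panel {panel['id']} extends past column {GRID_COLUMNS}")

    # Two rectangles overlap only when their x ranges AND their y ranges intersect.
    for a, b in combinations(panels, 2):
        ga, gb = a["gridPos"], b["gridPos"]
        overlap_x = ga["x"] < gb["x"] + gb["w"] and gb["x"] < ga["x"] + ga["w"]
        overlap_y = ga["y"] < gb["y"] + gb["h"] and gb["y"] < ga["y"] + ga["h"]
        if overlap_x and overlap_y:
            problems.append(f"panels {a['id']} and {b['id']} overlap")

    # Panel ids should be unique within a dashboard.
    ids = [panel["id"] for panel in panels]
    if len(ids) != len(set(ids)):
        problems.append("duplicate panel ids")
    return problems


def check_file(path: str) -> list[str]:
    """Load a dashboard JSON file and check its layout."""
    with open(path, encoding="utf-8") as f:
        return layout_problems(json.load(f))
```

Run it against each dashboard file before pushing. An empty list means the check found no overlaps, no overflow past column 24, and no duplicate ids.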
--- a/apps/monitoring/flowercore-remotedesktop-grafana-dashboard.grafana.txt
+++ b/apps/monitoring/flowercore-remotedesktop-grafana-dashboard.grafana.txt
@@ -0,0 +1,226 @@
 {
  "annotations": {
    "list": []
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "single"
        }
      },
      "targets": [
        {
          "editorMode": "code",
          "expr": "sum by (event) (increase(fc_desktop_session_events_total[$__rate_interval]))",
          "legendFormat": "{{event}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "RemoteDesktop Session Events",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 0
      },
      "id": 2,
      "options": {
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "showUnfilled": true
      },
      "targets": [
        {
          "editorMode": "code",
          "expr": "sum by (template, event) (increase(fc_desktop_session_events_total[24h]))",
          "legendFormat": "{{template}} {{event}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "24h Session Events By Template",
      "type": "bargauge"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 8
      },
      "id": 3,
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "single"
        }
      },
      "targets": [
        {
          "editorMode": "code",
          "expr": "fc_desktop_pool_ready",
          "legendFormat": "{{template}} ready",
          "range": true,
          "refId": "A"
        },
        {
          "editorMode": "code",
          "expr": "fc_desktop_pool_desired",
          "legendFormat": "{{template}} desired",
          "range": true,
          "refId": "B"
        }
      ],
      "title": "Warm Pool Ready vs Desired",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "orange",
                "value": 1
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 8
      },
      "id": 4,
      "options": {
        "colorMode": "value",
        "graphMode": "none",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "targets": [
        {
          "editorMode": "code",
          "expr": "sum(increase(fc_desktop_session_events_total{event=\"connect\"}[24h])) - sum(increase(fc_desktop_session_events_total{event=\"disconnect\"}[24h]))",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "24h Connect Minus Disconnect",
      "type": "stat"
    }
  ],
  "refresh": "30s",
  "schemaVersion": 39,
  "style": "dark",
  "tags": [
    "flowercore",
    "remotedesktop",
    "guacamole"
  ],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-24h",
    "to": "now"
  },
  "timezone": "browser",
  "title": "FlowerCore RemoteDesktop",
  "uid": "flowercore-remotedesktop",
  "version": 1
 }
--- a/apps/monitoring/grafana-dashboard-remotedesktop.yaml
+++ b/apps/monitoring/grafana-dashboard-remotedesktop.yaml
@@ -0,0 +1,249 @@
 # Grafana dashboard ConfigMap for FlowerCore.RemoteDesktop.
 #
 # Inlines the JSON from flowercore-remotedesktop-grafana-dashboard.json.
 # Kept as a standalone file (not inlined in noc-monitoring.yaml) so the
 # CRLF-dirty state of noc-monitoring.yaml doesn't have to be normalized
 # in the same pass. To actually load the dashboard, the Grafana Deployment
 # in noc-monitoring.yaml needs a matching 'volumes:' entry (and a
 # volumeMounts entry in the Grafana container to expose it on disk):
 #
 #   - name: dashboard-remotedesktop
 #     configMap:
 #       name: grafana-dashboard-remotedesktop
 #
 # ArgoCD will sync this ConfigMap automatically through the bluejay-infra
 # ApplicationSet (infra-monitoring App). The dashboard just won't load
 # until the Grafana Deployment mount is wired.
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: grafana-dashboard-remotedesktop
  namespace: monitoring
 data:
  remotedesktop.json: |
    {
      "annotations": {
        "list": []
      },
      "editable": true,
      "fiscalYearStartMonth": 0,
      "graphTooltip": 0,
      "id": null,
      "links": [],
      "panels": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "unit": "short"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 0
          },
          "id": 1,
          "options": {
            "legend": {
              "displayMode": "table",
              "placement": "bottom"
            },
            "tooltip": {
              "mode": "single"
            }
          },
          "targets": [
            {
              "editorMode": "code",
              "expr": "sum by (event) (increase(fc_desktop_session_events_total[$__rate_interval]))",
              "legendFormat": "{{event}}",
              "range": true,
              "refId": "A"
            }
          ],
          "title": "RemoteDesktop Session Events",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "unit": "short"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 0
          },
          "id": 2,
          "options": {
            "orientation": "auto",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "showUnfilled": true
          },
          "targets": [
            {
              "editorMode": "code",
              "expr": "sum by (template, event) (increase(fc_desktop_session_events_total[24h]))",
              "legendFormat": "{{template}} {{event}}",
              "range": true,
              "refId": "A"
            }
          ],
          "title": "24h Session Events By Template",
          "type": "bargauge"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "unit": "short"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 8
          },
          "id": 3,
          "options": {
            "legend": {
              "displayMode": "table",
              "placement": "bottom"
            },
            "tooltip": {
              "mode": "single"
            }
          },
          "targets": [
            {
              "editorMode": "code",
              "expr": "fc_desktop_pool_ready",
              "legendFormat": "{{template}} ready",
              "range": true,
              "refId": "A"
            },
            {
              "editorMode": "code",
              "expr": "fc_desktop_pool_desired",
              "legendFormat": "{{template}} desired",
              "range": true,
              "refId": "B"
            }
          ],
          "title": "Warm Pool Ready vs Desired",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "orange",
                    "value": 1
                  }
                ]
              },
              "unit": "short"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 8
          },
          "id": 4,
          "options": {
            "colorMode": "value",
            "graphMode": "none",
            "justifyMode": "auto",
            "orientation": "auto",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "auto"
          },
          "targets": [
            {
              "editorMode": "code",
              "expr": "sum(increase(fc_desktop_session_events_total{event=\"connect\"}[24h])) - sum(increase(fc_desktop_session_events_total{event=\"disconnect\"}[24h]))",
              "range": true,
              "refId": "A"
            }
          ],
          "title": "24h Connect Minus Disconnect",
          "type": "stat"
        }
      ],
      "refresh": "30s",
      "schemaVersion": 39,
      "style": "dark",
      "tags": [
        "flowercore",
        "remotedesktop",
        "guacamole"
      ],
      "templating": {
        "list": []
      },
      "time": {
        "from": "now-24h",
        "to": "now"
      },
      "timezone": "browser",
      "title": "FlowerCore RemoteDesktop",
      "uid": "flowercore-remotedesktop",
      "version": 1
    }
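The header comment on this ConfigMap says it inlines the standalone dashboard JSON by hand, so the two copies can drift apart. Below is a minimal drift check that uses only the standard library. It assumes the ConfigMap keeps the `key: |` literal-block form shown here. `extract_block_scalar` and `dashboards_match` are hypothetical helpers. The extractor is a deliberately narrow stand-in for a real YAML parser, and PyYAML's `safe_load` would be the sturdier choice if that library is available.

```python
import json
import textwrap


def extract_block_scalar(yaml_text: str, key: str) -> str:
    """Return the body of a literal block scalar written as `key: |`.

    Narrow stand-in for a YAML parser: every line after `key: |` that is
    blank or indented deeper than the key belongs to the value; the first
    non-blank line at the key's indent (or shallower) ends it.
    """
    lines = yaml_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.rstrip() == f"{key}: |":
            key_indent = len(line) - len(stripped)
            body = []
            for nxt in lines[i + 1:]:
                if nxt.strip() and len(nxt) - len(nxt.lstrip()) <= key_indent:
                    break
                body.append(nxt)
            # Strip the common indentation that the YAML nesting added.
            return textwrap.dedent("\n".join(body))
    raise KeyError(f"block scalar {key!r} not found")


def dashboards_match(configmap_yaml: str, standalone_json: str,
                     key: str = "remotedesktop.json") -> bool:
    """Compare parsed JSON so indentation differences are not reported as drift."""
    embedded = json.loads(extract_block_scalar(configmap_yaml, key))
    return embedded == json.loads(standalone_json)
```

The check compares parsed objects rather than raw text. The extra indentation that the YAML nesting adds is therefore not treated as drift, while any change to a panel, query, or `version` is.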
--- a/apps/monitoring/noc-monitoring.yaml
+++ b/apps/monitoring/noc-monitoring.yaml
@@ -104,21 +104,27 @@ data:
          - target_label: __address__
            replacement: snmp-exporter.monitoring.svc:9116
-      # UniFi Cloud Key SNMP
+      # UniFi Cloud Key SNMP — DISABLED 2026-04-26
-      - job_name: "snmp-cloudkey"
+      # The Cloud Key Gen2+ runs unifi-core (controller) only — not a network
-        static_configs:
+      # device — and does NOT run an SNMP agent on UDP/161. Scrapes were
-          - targets: ["10.0.56.3"]
+      # silently failing with "connection refused" from 10.42.x.x:161 every
-        metrics_path: /snmp
+      # 30s, polluting up{} = 0 and lastError on the Targets page. Hardware
-        params:
+      # health (CPU/mem/disk) for the Cloud Key host should come from
-          module: [if_mib]
+      # node_exporter via SSH — not SNMP.
-          auth: [bluejay_v2]
+      # - job_name: "snmp-cloudkey"
-        relabel_configs:
+      #   static_configs:
-          - source_labels: [__address__]
+      #     - targets: ["10.0.56.3"]
-            target_label: __param_target
+      #   metrics_path: /snmp
-          - source_labels: [__param_target]
+      #   params:
-            target_label: instance
+      #     module: [if_mib]
-          - target_label: __address__
+      #     auth: [bluejay_v2]
-            replacement: snmp-exporter.monitoring.svc:9116
+      #   relabel_configs:
      #     - source_labels: [__address__]
      #       target_label: __param_target
      #     - source_labels: [__param_target]
      #       target_label: instance
      #     - target_label: __address__
      #       replacement: snmp-exporter.monitoring.svc:9116
      # UniFi Switch SNMP
      - job_name: "snmp-switch"
@@ -278,6 +284,38 @@ data:
          - target_label: __address__
            replacement: blackbox-exporter.monitoring.svc:9115
      # FlowerCore.RemoteDesktop web health (public cluster VIP)
      # Module is https_internal — desktop.iamworkin.lan uses a step-ca leaf
      # cert; blackbox does NOT trust step-ca root, so http_2xx fails with
      # x509 unknown authority and probe_success=0 even when /health 200s.
      - job_name: "probe-remotedesktop"
        metrics_path: /probe
        params:
          module: [https_internal]
        scrape_interval: 30s
        static_configs:
          - targets: ["https://desktop.iamworkin.lan/health"]
            labels:
              instance: "https://desktop.iamworkin.lan/health"
              service: "remotedesktop-web"
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox-exporter.monitoring.svc:9115
      # FlowerCore.RemoteDesktop /metrics (direct scrape for counters)
      - job_name: "fc-remotedesktop"
        metrics_path: /metrics
        scheme: https
        scrape_interval: 30s
        tls_config:
          insecure_skip_verify: true
        static_configs:
          - targets: ["desktop.iamworkin.lan"]
            labels:
              service: "remotedesktop-web"
      # CUPS web UI health (port 631)
      - job_name: "probe-cups"
        metrics_path: /probe
@@ -301,26 +339,12 @@ data:
      # AI Stack Health Probes (Blackbox Exporter)
      # =============================================================================
-      # Ollama API — workstation (LOCAL Agent Zero)
+      # NOTE: probe-ollama-local and probe-agentzero-local were REMOVED
-      - job_name: "probe-ollama-local"
+      # 2026-04-26. They pointed at 10.0.58.100 (HOME VLAN) which is not
-        metrics_path: /probe
+      # reachable from cluster pods (firewalled). They had been firing as
-        params:
+      # OllamaDown / AgentZeroDown since 2026-04-24. Workstation/AI-laptop
-          module: [http_ollama]
+      # Ollama and Agent Zero should be monitored via host-side Puppet
-        scrape_interval: 30s
+      # (node_exporter on the box) once the AI laptop is running 24/7.
        static_configs:
          - targets: ["http://10.0.58.100:11434/api/tags"]
            labels:
              instance: "ollama-local"
              service: "ollama"
              deployment: "local"
              gpu: "r9700"
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: blackbox-exporter.monitoring.svc:9115
      # Ollama API — edge1 Pi 5 (NUC Agent Zero)
      - job_name: "probe-ollama-edge1"
@@ -343,34 +367,18 @@ data:
          - target_label: __address__
            replacement: blackbox-exporter.monitoring.svc:9115
-      # Agent Zero Web UI — local (K3s)
+      # Agent Zero Web UI — in-cluster (RKE2)
-      - job_name: "probe-agentzero-local"
+      # Target uses short svc form (agent-zero.agent-zero.svc) NOT
-        metrics_path: /probe
+      # cluster.local FQDN — the *.cluster.local form gets rewritten to
-        params:
+      # 10.0.56.200 (Traefik VIP) by the CoreDNS iamworkin.lan template +
-          module: [http_2xx]
+      # ndots:5 search-suffix expansion. Memory: feedback_coredns_ndots_template_collision.
        scrape_interval: 30s
        static_configs:
          - targets: ["http://10.0.58.100:30050/"]
            labels:
              instance: "agent-zero-local"
              service: "agent-zero"
              deployment: "local"
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: blackbox-exporter.monitoring.svc:9115
      # Agent Zero Web UI — NUC (RKE2 via Traefik)
      - job_name: "probe-agentzero-nuc"
        metrics_path: /probe
        params:
          module: [http_2xx]
        scrape_interval: 30s
        static_configs:
-          - targets: ["http://agent-zero.agent-zero.svc.cluster.local/"]
+          - targets: ["http://agent-zero.agent-zero.svc:80/"]
            labels:
              instance: "agent-zero-nuc"
              service: "agent-zero"
@@ -383,6 +391,119 @@ data:
          - target_label: __address__
            replacement: blackbox-exporter.monitoring.svc:9115
      # =============================================================================
      # K8s Cluster State (kube-state-metrics, cert-manager, traefik)
      # =============================================================================
      # Use in-cluster ClusterIP service DNS — NOT NodePorts — so a same-node
      # NodePort hairpin doesn't break the scrape (hit on rke2-agent1 hosting
      # both prometheus and traefik on 2026-04-26: 10.0.56.12:30900 timed out
      # from prometheus while .11/.13 worked). NodePorts at 30900-30902 are
      # still useful for noc1-Podman-style external scrapers, but in-cluster
      # we should always use the svc DNS form.
      # kube-state-metrics — exposes K8s object state (pods, deployments, nodes)
      # Required for KubeContainerRestartingFrequently / KubePodNotReady alerts.
      - job_name: "kube-state-metrics"
        scrape_interval: 30s
        static_configs:
          - targets: ["kube-state-metrics.kube-system.svc:8080"]
            labels:
              cluster: "rke2"
      # cert-manager — exposes certmanager_certificate_ready_status,
      # certmanager_certificate_expiration_timestamp_seconds, etc. Drives the
      # CertManagerCertificateNotReady / CertManagerCertificateRenewalFailed
      # alerts. Memory: project_cert_manager_prometheus_scrape.
      - job_name: "cert-manager"
        scrape_interval: 30s
        static_configs:
          - targets: ["cert-manager-metrics.cert-manager.svc:9402"]
            labels:
              cluster: "rke2"
      # Traefik — request rates, latency, TLS cert metadata, router state.
      # ClusterIP svc routes to one of the traefik pods; per-pod scrape via
      # the headless `traefik-metrics` selector would be nicer for failover
      # visibility but the single-replica scrape is enough for steady-state.
      - job_name: "traefik"
        scrape_interval: 15s
        static_configs:
          - targets: ["traefik-metrics.traefik-system.svc:9100"]
            labels:
              service: "traefik"
              cluster: "rke2"
      # Longhorn — exposes longhorn_volume_robustness, longhorn_backup_*,
      # longhorn_node_status_*. Enables LonghornVolumeUnhealthy +
      # LonghornBackupFailed alerts (no real visibility into Longhorn
      # health before this — was relying on K8s events which are noisy
      # transient lifecycle messages, not actionable signals).
      - job_name: "longhorn"
        scrape_interval: 30s
        static_configs:
          - targets: ["longhorn-backend.longhorn-system.svc:9500"]
            labels:
              service: "longhorn"
              cluster: "rke2"
      # FC web services through Traefik — single probe surface to spot any
      # iamworkin.lan host returning non-200. Uses https_internal because all
      # certs are step-ca leaves; blackbox would x509-fail with http_2xx.
      # Some services need explicit healthcheck paths because root returns
      # 404 (acme, guac) or 401 (grafana, prometheus). Drop them or point at
      # the right endpoint — don't lower valid_status_codes globally because
      # 401 from a healthy pod and 401 from an outage look identical.
      - job_name: "probe-traefik-services"
        metrics_path: /probe
        params:
          module: [https_internal]
        scrape_interval: 60s
        static_configs:
          - targets:
              # Root-reachable services (200 or 3xx)
              - "https://gitea.iamworkin.lan/"
              - "https://argocd.iamworkin.lan/"
              - "https://intranet.iamworkin.lan/"
              - "https://signage.iamworkin.lan/"
              - "https://kiosk.iamworkin.lan/"
              - "https://media.iamworkin.lan/"
              - "https://mysql.iamworkin.lan/"
              - "https://php.iamworkin.lan/"
              - "https://zabbix.iamworkin.lan/"
              - "https://desktop.iamworkin.lan/"
              - "https://print.iamworkin.lan/"
              - "https://dns.iamworkin.lan/"
              - "https://chat.iamworkin.lan/"
              - "https://dist.iamworkin.lan/"
              - "https://dms.iamworkin.lan/"
              - "https://menuboard.iamworkin.lan/"
              - "https://messageboard.iamworkin.lan/"
              - "https://presentations.iamworkin.lan/"
              - "https://retail.iamworkin.lan/"
              - "https://ttsreader.iamworkin.lan/"
              # Explicit healthcheck paths
              - "https://fc-llm-bridge.iamworkin.lan/healthz"
              - "https://acme.iamworkin.lan/health"
              # NOTE: services intentionally NOT in this probe surface
              #   - grafana.iamworkin.lan: every endpoint (incl. /api/health
              #     and /login) returns 401 behind Traefik basic-auth.
              #     Health covered by in-cluster monitoring-grafana scrape.
              #   - prometheus.iamworkin.lan: same auth pattern. Health covered
              #     by the prometheus self-scrape job.
              #   - guac.iamworkin.lan: deprecated — Guacamole moved to
              #     desktop.iamworkin.lan/guacamole/ (memory:
              #     feedback_traefik_cross_namespace_refs_disabled).
            labels:
              probe_type: "traefik-service"
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            regex: "https?://([^/:]+).*"
            target_label: instance
          - target_label: __address__
            replacement: blackbox-exporter.monitoring.svc:9115
      # =============================================================================
      # Self-monitoring (K8s monitoring namespace)
      # =============================================================================
@@ -521,6 +642,42 @@ data:
              summary: "Print queue backlog on edge2 ({{ $value }} active jobs)"
              description: "CUPS has {{ $value }} active jobs queued. Possible printer jam, USB disconnect, or paper out."
          # Paper roll lifecycle alerts (XL Track I, 2026-04-26).
          # Source-of-truth gauge: print_paper_remaining_percent (Print.Web OTEL,
          # hydrated on startup from the active PaperRoll row).
          # alert_channel=thermal_print routes through irc-notify -> Print.Web
          # /api/print/alert so the printer announces its own paper-out warning
          # on its remaining paper. Self-referential humor + operator nudge.
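          # Threshold bands implied by the two paper-roll exprs below:
          #   remaining >= 10%      -> neither alert (exactly 10.0 stays quiet)
          #   5% < remaining < 10%  -> PrintPaperRollLow after 5m
          #   remaining <= 5%       -> PrintPaperRollCritical after 2m
          # The bands do not overlap, so at most one paper alert fires at a time.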
          - alert: PrintPaperRollLow
            expr: print_paper_remaining_percent{job="printweb-otel"} < 10 and print_paper_remaining_percent{job="printweb-otel"} > 5
            for: 5m
            labels:
              severity: warning
              alert_channel: thermal_print
            annotations:
              summary: "Print roll low on edge2 ({{ $value | printf \"%.1f\" }}% remaining)"
              description: "NuPrint 210 paper roll has {{ $value | printf \"%.1f\" }}% remaining. Operator should load a fresh roll soon. Run /api/paper/status for the precise mm + estimated jobs left."
          - alert: PrintPaperRollCritical
            expr: print_paper_remaining_percent{job="printweb-otel"} <= 5
            for: 2m
            labels:
              severity: critical
              alert_channel: thermal_print
            annotations:
              summary: "Print roll critical on edge2 ({{ $value | printf \"%.1f\" }}% remaining)"
              description: "NuPrint 210 paper roll at {{ $value | printf \"%.1f\" }}% — load a new roll NOW. The 50ft roll has a ~12% red-stripe zone; once paper passes that, the printer can run dry mid-job."
          - alert: PrintJobDeadLetter
            expr: increase(print_jobs_dead_letter_total[15m]) > 0
            for: 1m
            labels:
              severity: warning
              alert_channel: thermal_print
            annotations:
              summary: "Print job(s) entered dead-letter on edge2 ({{ $value | printf \"%.0f\" }} in last 15m)"
              description: "{{ $value | printf \"%.0f\" }} print job(s) exhausted MaxRetries and need operator action. Open /print-log, filter Status=DeadLetter, click 'Retry From Start' after fixing the underlying cause (paper jam, USB disconnect, printer power-cycle)."
          - alert: CUPSHighJobRate
            expr: rate(cups_job_total[5m]) * 60 > 30
            for: 5m
@@ -540,6 +697,89 @@ data:
              summary: "Print.Web Ollama runner held for >10m ({{ $labels.model }})"
              description: "Print.Web reports model {{ $labels.model }} with {{ $value | printf \"%.0f\" }}s of keep-alive remaining. Check concurrent requests before the Pi 5 Ollama lane thrashes."
      - name: remote-desktop
        rules:
          - alert: RemoteDesktopWebDown
            expr: probe_success{job="probe-remotedesktop",instance="https://desktop.iamworkin.lan/health"} == 0
            for: 3m
            labels:
              severity: warning
            annotations:
              summary: "FlowerCore RemoteDesktop web is down"
              description: "https://desktop.iamworkin.lan/health probe has failed for 3 minutes. Catalog + session launch surface offline."
          - alert: RemoteDesktopMetricsStale
            expr: absent(fc_desktop_session_events_total)
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "RemoteDesktop /metrics scrape returning no data"
              description: "No fc_desktop_session_events_total series for 10 minutes. Either the Prometheus scrape target is misconfigured or the web deployment stopped exporting metrics. Zabbix template carries the same 10m no-data trigger for cross-monitor parity."
          # PUBLISHER QUIRK: fc_desktop_pool_depleted / _deficit emit one
          # series per template per status (Ready/Warming/BelowDesiredSize/
          # Disabled), and the historical series for non-current statuses
          # stay at their last value. So just `_depleted > 0` fires forever
          # on any template that ever entered a bad state.
          #
          # SAFE PATTERN: alert only when the canonical "Ready" status
          # gauge does NOT report ready=1 for the enabled template. This
          # is the publisher's own canary — _ready{status="Ready"}==1 is
          # always the current "everything is fine" signal.
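          #
          # Illustration (hypothetical template "win11", labels abbreviated)
          # after a pool has recovered from depletion:
          #   fc_desktop_pool_depleted{template="win11",status="BelowDesiredSize"} 1  (stale)
          #   fc_desktop_pool_ready{template="win11",status="Ready"}               1  (current)
          # `fc_desktop_pool_depleted > 0` keeps returning the stale series,
          # while the expr below drops "win11" because its Ready series == 1.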
          - alert: RemoteDesktopPoolDepleted
            expr: |
              group by(template) (fc_desktop_pool_ready{enabled="true"})
              unless on(template) (fc_desktop_pool_ready{enabled="true",status="Ready"} == 1)
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "RemoteDesktop pool depleted ({{ $labels.template }})"
              description: "Pool for template {{ $labels.template }} has no Ready warm pod for 5 minutes. New launches will cold-start. Check pod-scheduling failures, image pull issues, or exhausted node capacity."
          # Same pattern, but only fires when template explicitly reports
          # a sustained Warning-level alert state (current-status series).
          - alert: RemoteDesktopPoolDeficitSustained
            expr: |
              fc_desktop_pool_deficit{enabled="true",alert_level="Warning"} > 0
              unless on(template) (fc_desktop_pool_ready{enabled="true",status="Ready"} == 1)
            for: 10m
            labels:
              severity: info
            annotations:
              summary: "RemoteDesktop pool {{ $labels.template }} below desired for 10m"
              description: "Pool {{ $labels.template }} has a persistent deficit of {{ $value }} warm pods AND no Ready series. Likely image pull, NFS affinity, or claim-init issue."
          - alert: RemoteDesktopSessionChurnSpike
            expr: sum(rate(fc_desktop_session_events_total{event="launch"}[5m])) * 60 > 20
            for: 5m
            labels:
              severity: info
            annotations:
              summary: "RemoteDesktop launch rate high ({{ $value | printf \"%.0f\" }}/min)"
              description: "Launch events exceed 20/min for 5 minutes. Could be a user-facing feature launch, a pooled template thrashing, or a runaway automation loop."
          - alert: RemoteDesktopRecordingEventsDropped
            expr: absent_over_time(fc_desktop_session_events_total{event="recording"}[30m]) and on() (sum(increase(fc_desktop_session_events_total{event="launch"}[30m])) > 0)
            for: 15m
            labels:
              severity: info
            annotations:
              summary: "RemoteDesktop recording events silent for 30m despite active launches"
              description: "No recording events in 30 minutes while launches are happening. Recording may be silently disabled on all templates (SessionRecordingEnabled=false), the guacd NFS mount may be unhealthy, or the retention sweep isn't emitting events. Not an error by itself — worth checking."
          # Match by job — instance label carries full URL incl. /health,
          # not just hostname, so a hostname-only match never fires.
          - alert: RemoteDesktopTlsExpiry
            expr: probe_ssl_earliest_cert_expiry{job="probe-remotedesktop"} - time() < 2 * 86400
            for: 6h
            labels:
              severity: critical
            annotations:
              summary: "desktop.iamworkin.lan TLS cert expires within 2 days"
              description: "The desktop.iamworkin.lan cert is inside the 2-day renewal window and cert-manager has not renewed. Check cert-manager logs, step-ca reachability, and pfSense DNS overrides per the ACME DNS-01 gate."
      - name: pi-fleet
        rules:
          - alert: PiManagerDown
@@ -619,13 +859,16 @@ data:
            annotations:
              summary: "Epson ink CRITICAL: {{ $labels.prtMarkerSuppliesDescription }} at {{ $value }}%"
          # for: 30m absorbs sleep cycles. The EcoTank sleeps after ~5 min
          # of idle and SNMP times out, so 5m for: would page nightly. A
          # genuine printer outage (jam, disconnected) lasts well over 30m.
          - alert: EpsonPrinterDown
            expr: up{job="snmp-printer"} == 0
-            for: 5m
+            for: 30m
            labels:
              severity: warning
            annotations:
-              summary: "Epson ET-3750 SNMP unreachable"
+              summary: "Epson ET-3750 SNMP unreachable for >30m (likely actual fault, not sleep)"
          - alert: SynologyDiskLow
            expr: hrStorageUsed{job="snmp-nas"} / hrStorageSize{job="snmp-nas"} * 100 > 85
@@ -679,6 +922,174 @@ data:
            annotations:
              summary: "Disk usage high on {{ $labels.instance }} ({{ $value | printf \"%.1f\" }}%)"
      # K8s pod-state alerts. Require kube-state-metrics scrape (added
      # 2026-04-26 — see scrape_configs above). Would have surfaced the
      # agent-zero ollama-proxy 172x crash-loop instead of letting it
      # silently churn for ~3 days.
      - name: kubernetes-state
        rules:
          - alert: KubeContainerRestartingFrequently
            expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "{{ $labels.namespace }}/{{ $labels.pod }} container {{ $labels.container }} restarting >5x/hr"
              description: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has restarted {{ $value | printf \"%.0f\" }} times in the last hour. Check 'kubectl describe pod' + last-state termination reason."
          - alert: KubeContainerCrashLooping
            expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
            for: 5m
            labels:
              severity: critical
              alert_channel: thermal_print
            annotations:
              summary: "{{ $labels.namespace }}/{{ $labels.pod }} crashlooping ({{ $value | printf \"%.0f\" }} restarts/15m)"
              description: "Container {{ $labels.container }} restarted {{ $value | printf \"%.0f\" }} times in 15 minutes — actively crashlooping."
          - alert: KubePodNotReady
            expr: sum by(namespace, pod) (kube_pod_status_phase{phase=~"Pending|Failed|Unknown"}) > 0
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "{{ $labels.namespace }}/{{ $labels.pod }} stuck in Pending/Failed/Unknown for >15m"
              description: "Pod is in a non-Running, non-Succeeded phase for over 15 minutes. Common causes: ImagePullBackOff (registry/Nexus down, wrong image tag), pending PVC, scheduling failure (taint/resources)."
          - alert: KubePodImagePullBackOff
            expr: sum by(namespace, pod) (kube_pod_container_status_waiting_reason{reason=~"ImagePullBackOff|ErrImagePull"}) > 0
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "{{ $labels.namespace }}/{{ $labels.pod }} ImagePullBackOff for >10m"
              description: "Pod can't pull image. Check the image ref (often a stale tag or unreachable registry) and clean up if it's an orphan."
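          # PromQL note for the rule below: a filtering comparison such as
          # `a != b` returns the left-hand sample, so {{ $value }} is the
          # spec replica count, not the available count. e.g. spec=3,
          # available=2 -> one series returned with value 3.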
          - alert: KubeDeploymentReplicasMismatch
            expr: kube_deployment_spec_replicas != kube_deployment_status_replicas_available
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} replica mismatch"
              description: "Spec wants {{ $value }} replicas but kube_deployment_status_replicas_available has not matched for 15 minutes. Likely a rollout stuck on probe failure, scheduling, or PVC."
      # Longhorn storage health alerts. Required: longhorn scrape job
      # (added 2026-04-26 — see scrape_configs above). The K8s events
      # for "snapshot becomes not ready to use" are transient lifecycle
      # noise, not actionable — these alerts use the actual Longhorn
      # gauges that reflect persistent state.
      - name: longhorn-storage
        rules:
          # Volume robustness: 0=unknown, 1=healthy, 2=degraded, 3=faulted.
          # Detached volumes report 0 (unknown), which is normal for
          # unattached PVCs; matching on the numeric value (== 2, == 3)
          # excludes them without an extra attached filter.
          - alert: LonghornVolumeDegraded
            expr: longhorn_volume_robustness == 2
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "Longhorn volume {{ $labels.volume }} degraded for >15m"
              description: "Volume {{ $labels.volume }} on node {{ $labels.node }} has been degraded (one or more replicas unhealthy) for 15+ minutes. Auto-rebuild may need help; check 'kubectl describe volume.longhorn.io {{ $labels.volume }} -n longhorn-system'."
          - alert: LonghornVolumeFaulted
            expr: longhorn_volume_robustness == 3
            for: 5m
            labels:
              severity: critical
              alert_channel: thermal_print
            annotations:
              summary: "Longhorn volume {{ $labels.volume }} FAULTED"
              description: "Volume {{ $labels.volume }} on node {{ $labels.node }} is faulted — all replicas unavailable. Data inaccessible. Manual intervention required."
          # No backup in 36h indicates the daily-backup recurringJob is
          # silently failing. Allows for one missed run + slack.
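          # Threshold arithmetic: 36 * 3600 = 129600 s. With cron 0 2 * * *,
          # a backup completed at 02:00 on day 1 crosses 36h at 14:00 on
          # day 2 if day 2's 02:00 run was skipped; with for: 1h the alert
          # fires around 15:00, before day 3's run.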
          - alert: LonghornBackupStale
            expr: |
              (time() - max by(volume) (longhorn_backup_state{state="Completed"} * on(backup) group_left() longhorn_backup_actual_size_bytes)) > 36 * 3600
            for: 1h
            labels:
              severity: warning
            annotations:
              summary: "Longhorn volume {{ $labels.volume }} has no completed backup in >36h"
              description: "Daily backup recurringJob (cron 0 2 * * *) appears to have skipped this volume. Check 'kubectl get backups.longhorn.io -n longhorn-system' and the daily-backup CronJob logs."
          - alert: LonghornNodeUnhealthy
            expr: longhorn_node_status{condition="ready",condition_reason!=""} == 0
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Longhorn node {{ $labels.node }} not Ready"
              description: "Node {{ $labels.node }} reports ready=false (reason: {{ $labels.condition_reason }}). Volumes scheduled to this node will be unavailable until it recovers."
      # ============================================================
      # FC Signage Marquee Performance — Track 3 + 8 (2026-05-06)
      # Live-mirrored from FlowerCore.Notes/scripts/monitoring/alerts.yml.
      # Source-of-truth for the live Podman Prometheus on noc1 is the
      # Notes file; this K8s ConfigMap exists so a future migration to
      # in-cluster Prometheus inherits the ruleset automatically.
      # See feedback_monitoring_k8s_target_vs_live_podman.
      # ============================================================
      - name: fc-signage-marquee
        rules:
          - alert: MarqueeDroppedFramesHigh
            expr: |
              (
                sum by (renderer, phase, node_id) (rate(marquee_dropped_frames_total[5m]))
                /
                sum by (renderer, phase, node_id) (rate(marquee_render_latency_ms_count[5m]))
              ) > 0.05
              unless on()
              absent_over_time(marquee_dropped_frames_total[7d])
            for: 5m
            labels:
              severity: warning
              service: signage
              alert_channel: irc
            annotations:
              summary: "Marquee dropped-frame rate >5% on {{ $labels.renderer }}/{{ $labels.node_id }} ({{ $labels.phase }})"
              description: "Renderer {{ $labels.renderer }} on {{ $labels.node_id }} drops >5% of frames during {{ $labels.phase }}. Animation visibly stuttery."
          - alert: MarqueeRenderLatencyP99High
            expr: |
              histogram_quantile(
                0.99,
                sum by (renderer, phase, node_id, le) (rate(marquee_render_latency_ms_bucket[5m]))
              ) > 16
              unless on()
              absent_over_time(marquee_render_latency_ms_bucket[7d])
            for: 10m
            labels:
              severity: warning
              service: signage
              alert_channel: irc
            annotations:
              summary: "Marquee render latency p99 > 16ms on {{ $labels.renderer }}/{{ $labels.node_id }} ({{ $labels.phase }})"
              description: "Per-frame render latency p99 has exceeded the Pi-class 16ms budget for 10 minutes."
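          # Drift ratio for the rule below, worked with hypothetical numbers:
          #   target DurationMs = 8000, observed median = 8900
          #   |8900 - 8000| / 8000 = 0.1125 > 0.10  -> condition true
          # A median of 8700 gives 0.0875 and stays quiet.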
          - alert: MarqueeAnimationDurationDrift
            expr: |
              abs(
                histogram_quantile(0.5, sum by (renderer, phase, le) (rate(marquee_animation_duration_ms_bucket[15m])))
                -
                on (phase) group_left() avg by (phase) (marquee_animation_duration_target_ms)
              )
              /
              on (phase) group_left() avg by (phase) (marquee_animation_duration_target_ms)
              > 0.10
              unless on()
              absent_over_time(marquee_animation_duration_ms_bucket[7d])
            for: 15m
            labels:
              severity: info
              service: signage
              alert_channel: irc
            annotations:
              summary: "Marquee animation duration drifting > 10% on {{ $labels.renderer }} ({{ $labels.phase }})"
              description: "Median observed cycle duration deviates from target DurationMs by >10%. Could indicate browser tab throttling, GPU pressure, or phase-advancement bug."
 # =============================================================================
 # ConfigMap: Blackbox Exporter Configuration
 # =============================================================================
@@ -710,6 +1121,22 @@ data:
          fail_if_body_not_matches_regexp:
            - '"models"'
          preferred_ip_protocol: ip4
      # https_internal — for Traefik-fronted services with step-ca leaf
      # certs. blackbox does not trust the step-ca root CA, so http_2xx
      # against any *.iamworkin.lan host fails with x509 unknown authority.
      # Redirects + multiple status codes are accepted because some hosts
      # 302 to /login or /scalar.
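      # Manual check from a pod allowed to reach the exporter (debug=true
      # returns the probe log alongside the resulting metrics):
      #   curl -s 'http://blackbox-exporter.monitoring.svc:9115/probe?module=https_internal&target=https://desktop.iamworkin.lan/health&debug=true'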
      https_internal:
        prober: http
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
          valid_status_codes: [200, 301, 302, 303, 307, 308]
          method: GET
          follow_redirects: true
          preferred_ip_protocol: ip4
          tls_config:
            insecure_skip_verify: true
 # =============================================================================
 # ConfigMap: IRC Notify Script
@@ -3045,6 +3472,172 @@ data:
                relativeTimeRange: {from: 600, to: 0}
                datasourceUid: __expr__
                model: {type: threshold, expression: B, conditions: [{evaluator: {params: [85], type: gt}}], refId: C}
      - orgId: 1
        name: RemoteDesktop
        folder: AI Stack Alerts
        interval: 1m
        rules:
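          # Every rule in this group is a three-step Grafana pipeline:
          #   A: PromQL instant query against datasourceUid "prometheus"
          #   B: reduce A to its last value (reducer: last)
          #   C: threshold on B; `condition: C` fires when C stays true for `for:`
          # e.g. remotedesktop-web-down: probe_success = 0 -> B = 0 -> 0 < 1 -> firing.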
          - uid: remotedesktop-web-down
            title: RemoteDesktop Web DOWN
            condition: C
            for: 3m
            noDataState: Alerting
            execErrState: OK
            annotations:
              summary: FlowerCore RemoteDesktop /health probe failing
              description: "https://desktop.iamworkin.lan/health has failed for 3 minutes. Catalog + session launch surface offline."
              runbook: "1. kubectl -n fc-desktop get pods -l app.kubernetes.io/name=remotedesktop-web 2. kubectl -n fc-desktop logs deploy/remotedesktop-web --tail=50 3. Check Traefik IngressRoute + step-ca cert 4. Rollout restart if pod is stuck"
            labels:
              severity: warning
              service: remotedesktop
            data:
              - refId: A
                relativeTimeRange: {from: 180, to: 0}
                datasourceUid: prometheus
                model: {expr: 'probe_success{job="probe-remotedesktop"}', instant: true, refId: A}
              - refId: B
                relativeTimeRange: {from: 180, to: 0}
                datasourceUid: __expr__
                model: {type: reduce, expression: A, reducer: last, refId: B}
              - refId: C
                relativeTimeRange: {from: 180, to: 0}
                datasourceUid: __expr__
                model: {type: threshold, expression: B, conditions: [{evaluator: {params: [1], type: lt}}], refId: C}
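          # `count(...) or vector(0)` below turns "no series" into an explicit
          # 0 sample, so C (0 < 1) fires through the normal path; noDataState:
          # Alerting remains the fallback if the query itself returns nothing.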
          - uid: remotedesktop-metrics-stale
            title: RemoteDesktop metrics stale
            condition: C
            for: 10m
            noDataState: Alerting
            execErrState: OK
            annotations:
              summary: RemoteDesktop /metrics returning no series
              description: "No fc_desktop_session_events_total series for 10 minutes. Either the Prometheus scrape is misconfigured or the web deployment stopped exporting metrics. Cross-checked by Zabbix template's identical 10m no-data trigger."
              runbook: "1. curl -sk https://desktop.iamworkin.lan/metrics | head 2. kubectl -n monitoring exec deploy/prometheus -- wget -qO- 'localhost:9090/api/v1/targets?scrapePool=fc-remotedesktop' 3. Check monitoring-netpol egress allows to fc-desktop:8080"
            labels:
              severity: warning
              service: remotedesktop
            data:
              - refId: A
                relativeTimeRange: {from: 600, to: 0}
                datasourceUid: prometheus
                model: {expr: 'count(fc_desktop_session_events_total) or vector(0)', instant: true, refId: A}
              - refId: B
                relativeTimeRange: {from: 600, to: 0}
                datasourceUid: __expr__
                model: {type: reduce, expression: A, reducer: last, refId: B}
              - refId: C
                relativeTimeRange: {from: 600, to: 0}
                datasourceUid: __expr__
                model: {type: threshold, expression: B, conditions: [{evaluator: {params: [1], type: lt}}], refId: C}
          - uid: remotedesktop-pool-depleted
            title: RemoteDesktop pool depleted
            condition: C
            for: 5m
            noDataState: OK
            execErrState: OK
            annotations:
              summary: RemoteDesktop warm pool depleted for 5m
              description: "An enabled RemoteDesktop template has had no Ready warm pod for 5 minutes. Uses the same unless-on(template) pattern as the Prometheus RemoteDesktopPoolDepleted rule, because stale per-status fc_desktop_pool_depleted series would otherwise keep this firing. New launches will cold-start. Check pod scheduling, image pull, node capacity."
              runbook: "1. kubectl -n fc-desktop get pods -l app.kubernetes.io/name=remote-desktop --sort-by=.status.startTime 2. kubectl -n fc-desktop describe desktoppool <name> 3. Verify localhost/fc-desktop:* images imported on all 3 RKE2 nodes"
            labels:
              severity: warning
              service: remotedesktop
            data:
              - refId: A
                relativeTimeRange: {from: 300, to: 0}
                datasourceUid: prometheus
                model: {expr: 'count(group by(template) (fc_desktop_pool_ready{enabled="true"}) unless on(template) (fc_desktop_pool_ready{enabled="true",status="Ready"} == 1)) or vector(0)', instant: true, refId: A}
              - refId: B
                relativeTimeRange: {from: 300, to: 0}
                datasourceUid: __expr__
                model: {type: reduce, expression: A, reducer: last, refId: B}
              - refId: C
                relativeTimeRange: {from: 300, to: 0}
                datasourceUid: __expr__
                model: {type: threshold, expression: B, conditions: [{evaluator: {params: [0.5], type: gt}}], refId: C}
          - uid: remotedesktop-pool-deficit-sustained
            title: RemoteDesktop pool below desired
            condition: C
            for: 10m
            noDataState: OK
            execErrState: OK
            annotations:
              summary: RemoteDesktop pool sustained deficit
              description: "A pool has fc_desktop_pool_deficit>0 at Warning level with no Ready series for 10 minutes. Operator is reconciling but can't reach desired size; likely image pull, NFS affinity, or claim-init issue."
              runbook: "1. kubectl -n fc-desktop get pods -l flowercore.io/pool=<pool> 2. kubectl logs -n fc-desktop deploy/remotedesktop-operator 3. Check claim-init hook env on template"
            labels:
              severity: info
              service: remotedesktop
            data:
              - refId: A
                relativeTimeRange: {from: 600, to: 0}
                datasourceUid: prometheus
                model: {expr: 'max(fc_desktop_pool_deficit{enabled="true",alert_level="Warning"} unless on(template) (fc_desktop_pool_ready{enabled="true",status="Ready"} == 1)) or vector(0)', instant: true, refId: A}
              - refId: B
                relativeTimeRange: {from: 600, to: 0}
                datasourceUid: __expr__
                model: {type: reduce, expression: A, reducer: last, refId: B}
              - refId: C
                relativeTimeRange: {from: 600, to: 0}
                datasourceUid: __expr__
                model: {type: threshold, expression: B, conditions: [{evaluator: {params: [0], type: gt}}], refId: C}
          - uid: remotedesktop-session-churn-spike
            title: RemoteDesktop launch rate spike
            condition: C
            for: 5m
            noDataState: OK
            execErrState: OK
            annotations:
              summary: RemoteDesktop launch rate exceeds 20/min
              description: "Launch events >20/min for 5 minutes. Could be a user-facing feature launch, pooled template thrashing, or runaway automation loop."
              runbook: "1. kubectl -n fc-desktop get pods -l app.kubernetes.io/name=remote-desktop -o wide | wc -l 2. curl -sk https://desktop.iamworkin.lan/api/sessions/active 3. Check operator logs for reconcile loops"
            labels:
              severity: info
              service: remotedesktop
            data:
              - refId: A
                relativeTimeRange: {from: 300, to: 0}
                datasourceUid: prometheus
                model: {expr: 'sum(rate(fc_desktop_session_events_total{event="launch"}[5m])) * 60', instant: true, refId: A}
              - refId: B
                relativeTimeRange: {from: 300, to: 0}
                datasourceUid: __expr__
                model: {type: reduce, expression: A, reducer: last, refId: B}
              - refId: C
                relativeTimeRange: {from: 300, to: 0}
                datasourceUid: __expr__
                model: {type: threshold, expression: B, conditions: [{evaluator: {params: [20], type: gt}}], refId: C}
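          # Unit check for the rule below: A divides seconds-to-expiry by
          # 86400, so C's threshold of 2 means 2 days, matching the Prometheus
          # RemoteDesktopTlsExpiry rule's `< 2 * 86400` (172800 s).
          # e.g. 129600 s remaining -> 1.5 -> 1.5 < 2 -> firing after for: 6h.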
          - uid: remotedesktop-tls-expiry
            title: RemoteDesktop TLS cert expiring
            condition: C
            for: 6h
            noDataState: OK
            execErrState: OK
            annotations:
              summary: desktop.iamworkin.lan cert <2d to expiry
              description: "The desktop.iamworkin.lan certificate is inside the 2-day renewal window and cert-manager has not renewed. Check cert-manager logs, step-ca reachability, FlowerCore.DNS preflight for dnsNames."
              runbook: "1. kubectl -n fc-desktop get certificate remotedesktop-web-tls 2. kubectl -n cert-manager logs deploy/cert-manager --tail=50 3. Verify pfSense DNS override for desktop.iamworkin.lan"
            labels:
              severity: critical
              service: remotedesktop
            data:
              - refId: A
                relativeTimeRange: {from: 21600, to: 0}
                datasourceUid: prometheus
                model: {expr: '(probe_ssl_earliest_cert_expiry{job="probe-remotedesktop"} - time()) / 86400', instant: true, refId: A}
              - refId: B
                relativeTimeRange: {from: 21600, to: 0}
                datasourceUid: __expr__
                model: {type: reduce, expression: A, reducer: last, refId: B}
              - refId: C
                relativeTimeRange: {from: 21600, to: 0}
                datasourceUid: __expr__
                model: {type: threshold, expression: B, conditions: [{evaluator: {params: [2], type: lt}}], refId: C}
 # =============================================================================
 # Deployment: Grafana
@@ -3122,6 +3715,9 @@ spec:
            - name: dashboards-infra-overview
              mountPath: /var/lib/grafana/dashboards/infra-overview
              readOnly: true
            - name: dashboards-remotedesktop
              mountPath: /var/lib/grafana/dashboards/remotedesktop
              readOnly: true
            - name: datasource-provisioning
              mountPath: /etc/grafana/provisioning/datasources
              readOnly: true
@@ -3172,6 +3768,9 @@ spec:
        - name: dashboards-infra-overview
          configMap:
            name: grafana-dashboard-infra-overview
        - name: dashboards-remotedesktop
          configMap:
            name: grafana-dashboard-remotedesktop
        - name: datasource-provisioning
          configMap:
            name: grafana-datasource-provisioning
@@ -3733,6 +4332,66 @@ spec:
      ports:
        - port: 80
          protocol: TCP
    # FlowerCore.RemoteDesktop /metrics scrape via the fc-desktop
    # ClusterIP Service (remotedesktop-web:8080). Also covers the
    # Traefik VIP hairpin path: NetworkPolicy is evaluated after
    # kube-proxy DNAT, so the egress destination is the backend pod IP
    # on its target port, not the Service IP/port (see
    # feedback_netpol_dnat_backend_port).
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: fc-desktop
      ports:
        - port: 8080
          protocol: TCP
    # Traefik backend ports — needed for in-cluster egress to public
    # iamworkin.lan hostnames that CoreDNS wildcard resolves to the
    # LoadBalancer VIP. Post-DNAT destination is a Traefik pod on 8080/8443.
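    # Example path (assuming the Traefik chart's usual Service 80/443 ->
    # container 8080/8443 mapping): blackbox -> desktop.iamworkin.lan
    # -> CoreDNS wildcard -> LoadBalancer VIP:443 -> kube-proxy DNAT
    # -> <traefik pod IP>:8443, which is the destination this rule allows.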
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 8080
          protocol: TCP
        - port: 8443
          protocol: TCP
    # Traefik /metrics endpoint (port 9100) — separate from the data-path
    # ports above. Required for the in-cluster `traefik` scrape job.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
      ports:
        - port: 9100
          protocol: TCP
    # kube-state-metrics — required for kubernetes-state alert group.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - port: 8080
          protocol: TCP
    # cert-manager metrics — required for CertManagerCertificate* alerts.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: cert-manager
      ports:
        - port: 9402
          protocol: TCP
    # Longhorn manager metrics — required for Longhorn* alerts.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: longhorn-system
      ports:
        - port: 9500
          protocol: TCP
    # IRC (irc-notify → UnrealIRCd in irc namespace via K8s DNS)
    - to:
        - namespaceSelector:
--- a/apps/multus/multus.yaml
+++ b/apps/multus/multus.yaml
@@ -0,0 +1,286 @@
 # =============================================================================
 # Multus CNI — Meta-CNI for multi-network attachment to pods/VMs
 # =============================================================================
 # Purpose: enable KubeVirt VMs (and any future workload) to attach additional
 # network interfaces beyond the default Calico-managed pod network. Required
 # for ci1 (Windows Server 2025 KubeVirt VM) to bridge onto PROD VLAN 57.
 #
 # Source: upstream k8snetworkplumbingwg/multus-cni v4.2.2
 #   https://github.com/k8snetworkplumbingwg/multus-cni/blob/v4.2.2/deployments/multus-daemonset-thick.yml
 #
 # Inlined verbatim (with project header + version pin annotation) for
 # reproducibility and air-gap safety. Bumping versions = edit this file +
 # git push. ArgoCD picks up via the bluejay-infra ApplicationSet
 # (apps/* directory generator on main).
 #
 # Why thick plugin (not thin):
 #   - Thick = daemon + thin shim binary; daemon handles NAD watch + CRD reads
 #     centrally so each pod's CNI ADD doesn't hit the K8s API server. Better
 #     for clusters with many NAD-using pods.
 #   - Thin = each CNI ADD process directly contacts K8s API. Simpler but
 #     scales worse and has more failure modes.
 #   - KubeVirt + multi-VM workload pattern fits thick perfectly.
 #
 # Cluster context (verified 2026-05-08):
 #   - RKE2 v1.34.5 on 3 nodes (rke2-server, rke2-agent1, rke2-agent2)
 #   - Calico CNI (Tigera-managed) at /etc/cni/net.d + /opt/cni/bin (default)
 #   - openSUSE Leap 16, kernel 6.12, containerd 2.1.5
 #   - host bridge for PROD VLAN 57 = `br-prod` (PUPPET HOST WORK — see Phase 1.5
 #     in docs/infrastructure/windows-server-build-runner-plan.md)
 #
 # Version pin: the image below tracks the floating upstream `snapshot-thick`
 # tag, not the v4.2.2 release named above, so the running daemon can drift
 # from that version. Pinning to the v4.2.2 release tag at deploy time would
 # require a private mirror of the image. Upstream updates `snapshot-thick` on
 # every release, so for now we trust upstream + Calico's established pattern.
 # Pin to a specific SHA256 digest once we mirror to Gitea OCI.
 #
 # Apply (once committed to bluejay-infra main, ApplicationSet auto-syncs):
 #   git add apps/multus/multus.yaml && git commit && git push origin main
 #   # ArgoCD `infra-multus` Application appears within 3 min via ApplicationSet
 #
 # Verify:
 #   kubectl -n kube-system get ds kube-multus-ds
 #   kubectl -n kube-system rollout status ds kube-multus-ds
 #   kubectl get crd network-attachment-definitions.k8s.cni.cncf.io
 # =============================================================================
 ---
 apiVersion: apiextensions.k8s.io/v1
 kind: CustomResourceDefinition
 metadata:
  name: network-attachment-definitions.k8s.cni.cncf.io
  annotations:
    bluejay.iamworkin.lan/source: "k8snetworkplumbingwg/multus-cni v4.2.2"
 spec:
  group: k8s.cni.cncf.io
  scope: Namespaced
  names:
    plural: network-attachment-definitions
    singular: network-attachment-definition
    kind: NetworkAttachmentDefinition
    shortNames:
      - net-attach-def
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          description: 'NetworkAttachmentDefinition is a CRD schema specified by the Network Plumbing
            Working Group to express the intent for attaching pods to one or more logical or physical
            networks. More information available at: https://github.com/k8snetworkplumbingwg/multi-net-spec'
          type: object
          properties:
            apiVersion:
              type: string
            kind:
              type: string
            metadata:
              type: object
            spec:
              description: 'NetworkAttachmentDefinition spec defines the desired state of a network attachment'
              type: object
              properties:
                config:
                  description: 'NetworkAttachmentDefinition config is a JSON-formatted CNI configuration'
                  type: string
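 #
 # Usage sketch (NOT applied by this file): a NetworkAttachmentDefinition that
 # would let a VM NIC join the PROD VLAN 57 host bridge `br-prod` through the
 # standard `bridge` CNI reference plugin. The name `prod-vlan57`, the
 # namespace `ci1`, and the presence of the `bridge` binary in /opt/cni/bin are
 # assumptions. An empty `ipam` leaves the interface L2-only, so addressing
 # would come from inside the guest (e.g. DHCP on VLAN 57, also an assumption).
 #
 #   apiVersion: k8s.cni.cncf.io/v1
 #   kind: NetworkAttachmentDefinition
 #   metadata:
 #     name: prod-vlan57
 #     namespace: ci1
 #   spec:
 #     config: |
 #       {
 #         "cniVersion": "0.3.1",
 #         "name": "prod-vlan57",
 #         "type": "bridge",
 #         "bridge": "br-prod",
 #         "ipam": {}
 #       }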
 ---
 kind: ClusterRole
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
  name: multus
 rules:
  - apiGroups: ["k8s.cni.cncf.io"]
    resources:
      - '*'
    verbs:
      - '*'
  - apiGroups:
      - ""
    resources:
      - pods
      - pods/status
    verbs:
      - get
      - list
      - update
      - watch
  - apiGroups:
      - ""
      - events.k8s.io
    resources:
      - events
    verbs:
      - create
      - patch
      - update
 ---
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
  name: multus
 roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: multus
 subjects:
  - kind: ServiceAccount
    name: multus
    namespace: kube-system
 ---
 apiVersion: v1
 kind: ServiceAccount
 metadata:
  name: multus
  namespace: kube-system
 ---
 kind: ConfigMap
 apiVersion: v1
 metadata:
  name: multus-daemon-config
  namespace: kube-system
  labels:
    tier: node
    app: multus
 data:
  daemon-config.json: |
    {
        "chrootDir": "/hostroot",
        "cniVersion": "0.3.1",
        "logLevel": "verbose",
        "logToStderr": true,
        "cniConfigDir": "/host/etc/cni/net.d",
        "multusAutoconfigDir": "/host/etc/cni/net.d",
        "multusConfigFile": "auto",
        "socketDir": "/host/run/multus/"
    }
 ---
 apiVersion: apps/v1
 kind: DaemonSet
 metadata:
  name: kube-multus-ds
  namespace: kube-system
  labels:
    tier: node
    app: multus
    name: multus
 spec:
  selector:
    matchLabels:
      name: multus
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        tier: node
        app: multus
        name: multus
    spec:
      hostNetwork: true
      hostPID: true
      tolerations:
        - operator: Exists
          effect: NoSchedule
        - operator: Exists
          effect: NoExecute
      serviceAccountName: multus
      containers:
        - name: kube-multus
          image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
          command: [ "/usr/src/multus-cni/bin/multus-daemon" ]
          resources:
            requests:
              cpu: "100m"
              memory: "50Mi"
            limits:
              cpu: "100m"
              memory: "50Mi"
          securityContext:
            privileged: true
          terminationMessagePolicy: FallbackToLogsOnError
          volumeMounts:
            - name: cni
              mountPath: /host/etc/cni/net.d
            # multus-daemon expects the CNI bin path to be identical in the pod and on the container host.
            # E.g. if the CNI bin dir is '/opt/cni/bin' on the host, it must be mounted at '/opt/cni/bin' in multus-daemon,
            # not at any other directory such as '/opt/bin' or '/usr/bin'.
            - name: cnibin
              mountPath: /opt/cni/bin
            - name: host-run
              mountPath: /host/run
            - name: host-var-lib-cni-multus
              mountPath: /var/lib/cni/multus
            - name: host-var-lib-kubelet
              mountPath: /var/lib/kubelet
              mountPropagation: HostToContainer
            - name: host-run-k8s-cni-cncf-io
              mountPath: /run/k8s.cni.cncf.io
            - name: host-run-netns
              mountPath: /run/netns
              mountPropagation: HostToContainer
            - name: multus-daemon-config
              mountPath: /etc/cni/net.d/multus.d
              readOnly: true
            - name: hostroot
              mountPath: /hostroot
              mountPropagation: HostToContainer
            - mountPath: /etc/cni/multus/net.d
              name: multus-conf-dir
          env:
            - name: MULTUS_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      initContainers:
        - name: install-multus-binary
          image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
          command:
            - "sh"
            - "-c"
            - "cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim && cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru"
          resources:
            requests:
              cpu: "10m"
              memory: "15Mi"
          securityContext:
            privileged: true
          terminationMessagePolicy: FallbackToLogsOnError
          volumeMounts:
            - name: cnibin
              mountPath: /host/opt/cni/bin
              mountPropagation: Bidirectional
      terminationGracePeriodSeconds: 10
      volumes:
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: cnibin
          hostPath:
            path: /opt/cni/bin
        - name: hostroot
          hostPath:
            path: /
        - name: multus-daemon-config
          configMap:
            name: multus-daemon-config
            items:
            - key: daemon-config.json
              path: daemon-config.json
        - name: host-run
          hostPath:
            path: /run
        - name: host-var-lib-cni-multus
          hostPath:
            path: /var/lib/cni/multus
        - name: host-var-lib-kubelet
          hostPath:
            path: /var/lib/kubelet
        - name: host-run-k8s-cni-cncf-io
          hostPath:
            path: /run/k8s.cni.cncf.io
        - name: host-run-netns
          hostPath:
            path: /run/netns/
        - name: multus-conf-dir
          hostPath:
            path: /etc/cni/multus/net.d
--- a/apps/noc-services/noc-services.yaml
+++ b/apps/noc-services/noc-services.yaml
@@ -219,6 +219,65 @@ spec:
  tls:
    secretName: cockpit-tls
 ---
 # ============================================================
 # PuppetDB Dashboard - noc1:8080 (HTTP, web UI only)
 # Agent-to-PuppetDB mTLS still uses port 8081 directly via Puppet CA
 # (NOT via this proxy). See docs/infrastructure/cert-recovery-2026-04-28.md
 # ============================================================
 apiVersion: v1
 kind: Service
 metadata:
  name: puppetdb-external
  namespace: noc-proxy
 spec:
  ports:
    - port: 8080
      targetPort: 8080
      name: http
  clusterIP: None
 ---
 apiVersion: v1
 kind: Endpoints
 metadata:
  name: puppetdb-external
  namespace: noc-proxy
 subsets:
  - addresses:
      - ip: 10.0.56.10
    ports:
      - port: 8080
        name: http
 ---
 apiVersion: cert-manager.io/v1
 kind: Certificate
 metadata:
  name: puppetdb-tls
  namespace: noc-proxy
 spec:
  secretName: puppetdb-tls
  issuerRef:
    name: step-ca-acme
    kind: ClusterIssuer
  dnsNames:
    - puppetdb.iamworkin.lan
 ---
 apiVersion: traefik.io/v1alpha1
 kind: IngressRoute
 metadata:
  name: puppetdb
  namespace: noc-proxy
 spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`puppetdb.iamworkin.lan`)
      services:
        - name: puppetdb-external
          port: 8080
  tls:
    secretName: puppetdb-tls
 ---
 # NetworkPolicy: allow Traefik ingress, allow egress to noc1
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
@@ -242,6 +301,8 @@ spec:
      ports:
        - port: 3000
          protocol: TCP
        - port: 8080
          protocol: TCP
        - port: 9090
          protocol: TCP
        - port: 9091
--- a/apps/selenium/network-policy.yaml
+++ b/apps/selenium/network-policy.yaml
@@ -0,0 +1,210 @@
 # Selenium Grid NetworkPolicy.
 #
 # Captured into bluejay-infra 2026-05-07 during the regroup audit. This
 # NetworkPolicy was previously applied via `kubectl apply` directly to
 # the cluster with no source-of-truth anywhere — a fresh cluster rebuild
 # would have lost all of it (including the Selenium Grid → Traefik VIP
 # allow rule for AAT runs against `*.iamworkin.lan` services).
 #
 # The Selenium Grid Deployment + Services themselves are still managed
 # outside ArgoCD (deployed via raw kubectl from the original Selenium
 # Grid bring-up). Migrating those into bluejay-infra is a separate lane —
 # this commit only restores GitOps repeatability for the NetworkPolicy.
 #
 # Rules captured from the live cluster's `kubectl get netpol -n selenium
 # selenium-netpol -o yaml` on 2026-05-07. Originally applied 2026-03-15
 # (from `metadata.creationTimestamp` before the field was stripped).
 #
 # Allows:
 #   - Egress: CoreDNS (kube-system, 53 UDP/TCP); intra-namespace pod-to-pod
 #     (4442/4443/4444/5555); the Traefik VIP (10.0.56.200, 80/443) for
 #     `*.iamworkin.lan` AAT runs; every namespace (`namespaceSelector: {}`,
 #     not only FC namespaces) on 80/443 and the standard FC service ports
 #     (5100/5200/5300/5400/8080); pod CIDR (10.42.0.0/16) and service CIDR
 #     (10.43.0.0/16) on the same ports plus 8443; the LAN gateway range
 #     (10.0.56.0/24) on 80/443/8443; edge2 CUPS print (10.0.57.16:5200);
 #     public internet on 80/443 (0.0.0.0/0 except 172.16.0.0/12 and
 #     192.168.0.0/16; 10.0.0.0/8 is NOT excluded); and fc-signage:5190 for
 #     the signage AAT lane.
 #   - Ingress: Traefik (4444, plus 8089 ACME-solver-style); intra-namespace
 #     pods (4442/4443/4444/5555); and the telephony, gitea, fc-system and
 #     fc-signage namespaces on 4444.
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
 metadata:
  name: selenium-netpol
  namespace: selenium
  labels:
    app.kubernetes.io/part-of: selenium
    app.kubernetes.io/component: isolation
 spec:
  egress:
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
    to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
  - ports:
    - port: 4442
      protocol: TCP
    - port: 4443
      protocol: TCP
    - port: 4444
      protocol: TCP
    - port: 5555
      protocol: TCP
    to:
    - podSelector: {}
  - ports:
    - port: 443
      protocol: TCP
    - port: 80
      protocol: TCP
    to:
    - ipBlock:
        cidr: 10.0.56.200/32
  - ports:
    - port: 443
      protocol: TCP
    - port: 80
      protocol: TCP
    - port: 5200
      protocol: TCP
    - port: 5300
      protocol: TCP
    - port: 5400
      protocol: TCP
    - port: 5100
      protocol: TCP
    - port: 8080
      protocol: TCP
    to:
    - namespaceSelector: {}
  - ports:
    - port: 443
      protocol: TCP
    - port: 80
      protocol: TCP
    - port: 8443
      protocol: TCP
    - port: 8080
      protocol: TCP
    - port: 5200
      protocol: TCP
    - port: 5300
      protocol: TCP
    - port: 5400
      protocol: TCP
    - port: 5100
      protocol: TCP
    to:
    - ipBlock:
        cidr: 10.43.0.0/16
  - ports:
    - port: 443
      protocol: TCP
    - port: 80
      protocol: TCP
    - port: 8443
      protocol: TCP
    - port: 8080
      protocol: TCP
    - port: 5200
      protocol: TCP
    - port: 5300
      protocol: TCP
    - port: 5400
      protocol: TCP
    - port: 5100
      protocol: TCP
    to:
    - ipBlock:
        cidr: 10.42.0.0/16
  - ports:
    - port: 443
      protocol: TCP
    - port: 80
      protocol: TCP
    - port: 8443
      protocol: TCP
    to:
    - ipBlock:
        cidr: 10.0.56.0/24
  - ports:
    - port: 5200
      protocol: TCP
    to:
    - ipBlock:
        cidr: 10.0.57.16/32
  - ports:
    - port: 80
      protocol: TCP
    - port: 443
      protocol: TCP
    to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 172.16.0.0/12
        - 192.168.0.0/16
  - ports:
    - port: 5190
      protocol: TCP
    to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: fc-signage
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: traefik-system
    ports:
    - port: 4444
      protocol: TCP
    - port: 8089
      protocol: TCP
  - from:
    - podSelector: {}
    ports:
    - port: 4442
      protocol: TCP
    - port: 4443
      protocol: TCP
    - port: 4444
      protocol: TCP
    - port: 5555
      protocol: TCP
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: telephony
    ports:
    - port: 4444
      protocol: TCP
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: gitea
    ports:
    - port: 4444
      protocol: TCP
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: fc-system
    ports:
    - port: 4444
      protocol: TCP
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: fc-signage
    ports:
    - port: 4444
      protocol: TCP
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
--- a/apps/telephony/telephony.yaml
+++ b/apps/telephony/telephony.yaml
@@ -148,7 +148,7 @@ spec:
              topologyKey: kubernetes.io/hostname
      containers:
        - name: telephony-web
-          image: localhost/fc-telephony-web:v202604170153
+          image: localhost/fc-telephony-web:v202604252156
          imagePullPolicy: Never
          securityContext:
            readOnlyRootFilesystem: true
--- a/apps/worldbuilder/README.md
+++ b/apps/worldbuilder/README.md
@@ -0,0 +1,60 @@
 # FlowerCore.WorldBuilder
 ArgoCD-managed manifest for FlowerCore.WorldBuilder.Web — comic / storyboard
 authoring service that drives ComfyUI for panel image generation and
 QuestPDF for letter / A4 export.
 Source: `D:\git\FlowerCore\FlowerCore.WorldBuilder` (master)
 ## Deployment order
 1. **DNS preflight** — `worldbuilder.iamworkin.lan -> 10.0.56.200` MUST exist
   in pfSense Unbound before this manifest is applied; otherwise cert-manager's
   HTTP-01 challenge silently falls into exponential backoff for ~2h.
   Memory: `feedback_pfsense_dns_required_for_acme`.
 2. **Image import to ALL RKE2 nodes** — pod can schedule to any of
   `rke2-server` (10.0.56.11), `rke2-agent1` (10.0.56.12),
   `rke2-agent2` (10.0.56.13). Build with:
   ```bash
   bash deploy/build.sh   # in FlowerCore.WorldBuilder repo
   podman save localhost/fc-worldbuilder:v<TAG> -o /tmp/fc-worldbuilder-v<TAG>.tar
   for h in 10.0.56.11 10.0.56.12 10.0.56.13; do
     scp /tmp/fc-worldbuilder-v<TAG>.tar fcadmin@$h:/tmp/
     ssh fcadmin@$h \
       "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \
         -n k8s.io images import /tmp/fc-worldbuilder-v<TAG>.tar"
   done
   ```
   Memory: `feedback_rke2_image_import_per_node_scp`.
 3. **Bump image tag** in `worldbuilder.yaml` and git push.
   ArgoCD ApplicationSet picks up within ~3 minutes.
 4. **First production render** — open `https://worldbuilder.iamworkin.lan`,
   create World → Character → Storyboard → ExportJob, confirm artifact
   downloads. ComfyUI lives on BLUEJAY-WS at `http://10.0.56.20:8188`.
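 Before step 3 it can help to confirm the import landed on every node. With
 `imagePullPolicy: Never`, a node without the image only fails later, with
 `ErrImageNeverPull`, when the pod happens to schedule there. A minimal
 sketch, assuming the same node list, SSH user and `ctr` socket as step 2;
 `verify_image_on_nodes` is a hypothetical helper, not part of
 `deploy/build.sh`:
 ```bash
 # Hypothetical helper (not part of deploy/build.sh): checks that every RKE2
 # node's containerd (k8s.io namespace) lists the given fc-worldbuilder tag.
 # Prints one line per node and returns non-zero if any node lacks the image.
 verify_image_on_nodes() {
   local ref="localhost/fc-worldbuilder:$1" missing=0 h
   for h in 10.0.56.11 10.0.56.12 10.0.56.13; do
     # `ctr images ls -q` prints only image references, one per line;
     # grep -x requires an exact whole-line match (v1 must not match v10).
     if ssh fcadmin@"$h" \
         "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images ls -q" \
         | grep -qxF "$ref"; then
       echo "ok      $h $ref"
     else
       echo "MISSING $h $ref" >&2
       missing=1
     fi
   done
   return "$missing"
 }
 ```
 Usage: `verify_image_on_nodes v<TAG>`. It prints one `ok`/`MISSING` line per
 node and exits non-zero if any node lacks the image, so it can gate the tag
 bump in step 3.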
 ## Health probes
 - `startupProbe` + `readinessProbe`: `httpGet /healthz` (registered explicitly
  in Program.cs — anonymous, no DB or OpenAPI dependency).
 - `livenessProbe`: `tcpSocket` as a cheap fallback.
  Memory: `feedback_k8s_probes_must_not_hit_openapi`,
  `feedback_k8s_probes_behind_auth_middleware`.
 ## Storage
 - Longhorn RWO PVC `worldbuilder-data` (5Gi) mounted at `/data`. SQLite DB
  lives at `/data/worldbuilder.db`, generated images under `/data/gallery/`,
  PDF/PNG exports under `/data/exports/`.
 - DataProtection keys persist to the same SQLite via
  `AddFlowerCoreDataProtection<WorldBuilderDbContext>` — explicit migration
  `20260429133417_Initial` already creates `fc_dp_keys`.
  Memory: `feedback_dataprotection_keys_persist_to_app_dbcontext`,
  `feedback_intranet_dataprotection_table_must_have_explicit_migration`.
 ## Image generation backend
 `FlowerCore:WorldBuilder:ImageGeneration:BaseUrl=http://10.0.56.20:8188` —
 ComfyUI runs on BLUEJAY-WS Windows (R9700 / gfx1201 / ROCm 7.2.1). The pod
 reaches the workstation directly across the 10.0.56.0/24 VLAN, with none of
 the Podman-style host-filter issues: K8s pods route via Calico, which is
 L3-routed across the VLAN.
--- a/apps/worldbuilder/worldbuilder.yaml
+++ b/apps/worldbuilder/worldbuilder.yaml
@@ -0,0 +1,213 @@
 # FlowerCore.WorldBuilder — comic / storyboard authoring service.
 #
 # Deployment + Service + PVC + Certificate + IngressRoute. ArgoCD-managed
 # end-to-end. See apps/worldbuilder/README.md for the per-deploy runbook.
 #
 # Image build (BLUEJAY-WS):
 #   bash deploy/build.sh                 # in FlowerCore.WorldBuilder repo
 #   podman save localhost/fc-worldbuilder:v<TAG> -o /tmp/fc-worldbuilder-v<TAG>.tar
 #   for h in 10.0.56.11 10.0.56.12 10.0.56.13; do
 #     scp /tmp/fc-worldbuilder-v<TAG>.tar fcadmin@$h:/tmp/
 #     ssh fcadmin@$h "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-worldbuilder-v<TAG>.tar"
 #   done
 ---
 apiVersion: v1
 kind: Namespace
 metadata:
  name: fc-worldbuilder
  labels:
    app.kubernetes.io/part-of: flowercore
 ---
 # SQLite DB + generated image gallery + PDF/PNG exports.
 # Longhorn RWO — single replica with `Recreate` rollout strategy keeps it safe.
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: worldbuilder-data
  namespace: fc-worldbuilder
 spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: worldbuilder-web
  namespace: fc-worldbuilder
  labels:
    app.kubernetes.io/name: worldbuilder-web
    app.kubernetes.io/part-of: flowercore
 spec:
  replicas: 1
  revisionHistoryLimit: 3
  strategy:
    # RWO PVC + single replica. Recreate avoids multi-attach overlap.
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: worldbuilder-web
  template:
    metadata:
      labels:
        app.kubernetes.io/name: worldbuilder-web
        app.kubernetes.io/part-of: flowercore
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics/prometheus"
    spec:
      securityContext:
        fsGroup: 1654
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: web
          # Bump tag for each rebuild. Initial deploy: v202605062048
          image: localhost/fc-worldbuilder:v202605062048
          imagePullPolicy: Never
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: ASPNETCORE_URLS
              value: "http://+:8080"
            - name: ASPNETCORE_ENVIRONMENT
              value: "Production"
            - name: DOTNET_RUNNING_IN_CONTAINER
              value: "true"
            - name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT
              value: "false"
            # SQLite path overrides (default appsettings uses relative paths).
            - name: ConnectionStrings__DefaultConnection
              value: "Data Source=/data/worldbuilder.db"
            - name: FlowerCore__Database__Provider
              value: "Sqlite"
            - name: FlowerCore__Database__ConnectionStrings__Sqlite
              value: "Data Source=/data/worldbuilder.db"
            # Generated image gallery + exports persist on /data.
            - name: FlowerCore__WorldBuilder__ImageStore__RootPath
              value: "/data/gallery"
            - name: FlowerCore__WorldBuilder__Export__RootPath
              value: "/data/exports"
            # ComfyUI on BLUEJAY-WS (R9700 / gfx1201 / ROCm 7.2.1).
            - name: FlowerCore__WorldBuilder__ImageGeneration__BaseUrl
              value: "http://10.0.56.20:8188"
            - name: FlowerCore__WorldBuilder__ImageGeneration__ClientMode
              value: "comfyui"
          resources:
            # Cluster CPU-request budget runs hot (99% on all 3 nodes at deploy
            # time) while actual CPU usage is well below capacity. Idle Blazor
            # Server + SignalR + a single ComfyUI poller uses ~5m, so 25m is
            # generous. Re-evaluate if active rendering/export workers ever
            # push past the limit.
            requests:
              cpu: 25m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 768Mi
          # /healthz is registered explicitly in Program.cs (anonymous, no DB
          # or OpenAPI dependency). Liveness uses tcpSocket as a cheap fallback
          # in case future middleware changes accidentally gate /healthz.
          # Memory: feedback_k8s_probes_must_not_hit_openapi,
          #         feedback_k8s_probes_behind_auth_middleware.
          startupProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 30
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 3
          securityContext:
            runAsNonRoot: true
            runAsUser: 1654
            runAsGroup: 1654
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: data
              mountPath: /data
            - name: tmp
              mountPath: /tmp
            - name: logs
              mountPath: /app/logs
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: worldbuilder-data
        - name: tmp
          emptyDir: {}
        - name: logs
          emptyDir: {}
 ---
 apiVersion: v1
 kind: Service
 metadata:
  name: worldbuilder-web
  namespace: fc-worldbuilder
  labels:
    app.kubernetes.io/name: worldbuilder-web
    app.kubernetes.io/part-of: flowercore
 spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: worldbuilder-web
  ports:
    - name: http
      port: 80
      targetPort: 8080
 ---
 apiVersion: cert-manager.io/v1
 kind: Certificate
 metadata:
  name: worldbuilder-web-tls
  namespace: fc-worldbuilder
 spec:
  secretName: worldbuilder-web-tls
  issuerRef:
    name: step-ca-acme
    kind: ClusterIssuer
  dnsNames:
    - worldbuilder.iamworkin.lan
  # step-ca's ACME provisioner caps certificate lifetime at 30d. A 90d
  # request was silently capped to 30d, which made renewBefore 720h (30d)
  # equal to the actual cert lifetime and triggered a perpetual renewal loop
  # (2365+ CertificateRequest objects in 18h). Match the working 720h/240h
  # pattern used by every other FC service cert.
  duration: 720h     # 30d (step-ca cap)
  renewBefore: 240h  # 10d
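  # Renewal timing: cert-manager renews at notAfter - renewBefore, so
  # 720h - 240h puts renewal ~480h (20d) after issuance and leaves a 10d
  # window to retry a failed ACME order before expiry. Keep renewBefore well
  # below the capped 720h lifetime; as it approaches the lifetime, the
  # renewal point approaches the issuance time.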
 ---
 apiVersion: traefik.io/v1alpha1
 kind: IngressRoute
 metadata:
  name: worldbuilder-web
  namespace: fc-worldbuilder
 spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`worldbuilder.iamworkin.lan`)
      kind: Rule
      services:
        - name: worldbuilder-web
          port: 80
  tls:
    secretName: worldbuilder-web-tls
--- a/tests/bluejay-infra-lint/BluejayInfraLint.Tests.csproj
+++ b/tests/bluejay-infra-lint/BluejayInfraLint.Tests.csproj
@@ -0,0 +1,24 @@
 <Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net10.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <IsPackable>false</IsPackable>
    <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="coverlet.collector" Version="6.0.2">
      <PrivateAssets>all</PrivateAssets>
      <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
    </PackageReference>
    <PackageReference Include="FluentAssertions" Version="6.12.1" />
    <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />
    <PackageReference Include="xunit" Version="2.9.2" />
    <PackageReference Include="xunit.runner.visualstudio" Version="2.8.2">
      <PrivateAssets>all</PrivateAssets>
      <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
    </PackageReference>
    <PackageReference Include="YamlDotNet" Version="16.2.0" />
  </ItemGroup>
 </Project>
--- a/tests/bluejay-infra-lint/FleetManifestLintTests.cs
+++ b/tests/bluejay-infra-lint/FleetManifestLintTests.cs
@@ -0,0 +1,639 @@
 using FluentAssertions;
 using System.Text.RegularExpressions;
 using Xunit;
 using YamlDotNet.Core;
 using YamlDotNet.RepresentationModel;
 namespace BluejayInfraLint.Tests;
 [Trait("Category", "Unit")]
 public sealed class FleetManifestLintTests
 {
    private static readonly ManifestInventory Inventory = ManifestInventory.Load();
    private static readonly HashSet<string> PublicReadOnlyHosts = new(StringComparer.Ordinal)
    {
        "dist.flowercore.io",
        "dns.iamworkin.lan",
    };
    // Public hosts that allow a tightly bounded write surface in addition to
    // GET/HEAD. updatecenter.iamworkin.lan accepts POST /api/v1/checkin/{id}
    // (bootstrap-JWT) so its allowlist is GET||HEAD||POST||OPTIONS — but
    // PUT/PATCH/DELETE must still 404 at the route. Anything wider than this
    // set should fail this lint.
    //
    // PUB-1 (2026-05-06): update.flowercore.io / updates.flowercore.io were
    // added for the Cloudflare-proxied public Update Center edge. They use the
    // same bounded read-write allowlist as the LAN pair.
    private static readonly HashSet<string> PublicReadWriteAllowlistHosts = new(StringComparer.Ordinal)
    {
        "updatecenter.iamworkin.lan",
        "updates.iamworkin.lan",
        "update.flowercore.io",
        "updates.flowercore.io",
    };
    private static readonly HashSet<string> ApiKeyProtectedDeployments = new(StringComparer.Ordinal)
    {
        "messageboard-web",
        "scoreboard-web",
        "segmentdisplay-web",
        "signalcontrol-web",
    };
    private static readonly HashSet<string> PublicEgressDeployments = new(StringComparer.Ordinal)
    {
        "asterisk",
        "fc-llm-bridge",
        "mysql-web",
        "php-web",
        "ttsreader-align",
        "ttsreader-kokoro",
        "ttsreader-modern",
        "ttsreader-piper",
    };
    [Fact]
    public void IngressRoutes_MustKeepServiceReferencesInTheSameNamespace()
    {
        var violations = Inventory.Documents
            .Where(document => document.Kind == "IngressRoute")
            .SelectMany(document =>
                document.MappingSequence("spec", "routes")
                    .SelectMany(route =>
                        route.MappingSequence("services")
                            .Select(service => new
                            {
                                Document = document,
                                ServiceName = ManifestNodeExtensions.Scalar(service, "name"),
                                ServiceNamespace = ManifestNodeExtensions.Scalar(service, "namespace"),
                            })))
            .Where(entry => !string.IsNullOrWhiteSpace(entry.ServiceNamespace))
            .Where(entry => !string.Equals(entry.ServiceNamespace, entry.Document.Namespace, StringComparison.Ordinal))
            .Select(entry =>
                $"{entry.Document.Descriptor} references Service '{entry.ServiceName}' in namespace '{entry.ServiceNamespace}'.")
            .ToList();
        violations.Should().BeEmpty();
    }
    [Fact]
    public void PublicReadOnlyIngressRoutes_MustExplicitlyAllowOnlyGetAndHead()
    {
        var violations = Inventory.Documents
            .Where(document => document.Kind == "IngressRoute")
            .SelectMany(document =>
                document.MappingSequence("spec", "routes")
                    .Select(route => new
                    {
                        Document = document,
                        Match = ManifestNodeExtensions.Scalar(route, "match") ?? string.Empty,
                    }))
            .Where(entry => PublicReadOnlyHosts.Any(host => entry.Match.Contains($"Host(`{host}`)", StringComparison.Ordinal)))
            .Where(entry => !entry.Match.Contains("Method(`GET`)", StringComparison.Ordinal)
                || !entry.Match.Contains("Method(`HEAD`)", StringComparison.Ordinal))
            .Select(entry => $"{entry.Document.Descriptor} is missing an explicit GET/HEAD method allowlist.")
            .ToList();
        violations.Should().BeEmpty();
    }
    [Fact]
    public void PublicReadWriteIngressRoutes_MustPinGetHeadPostOptionsAllowlist()
    {
        // For hosts in PublicReadWriteAllowlistHosts, the route match MUST
        // contain Method(`GET`), Method(`HEAD`), Method(`POST`), and
        // Method(`OPTIONS`) AND MUST NOT contain Method(`PUT`),
        // Method(`PATCH`), or Method(`DELETE`). This keeps the public
        // allowlist invariant against regression — see Track A's
        // updatecenter-web ingressroute hardening.
        var violations = Inventory.Documents
            .Where(document => document.Kind == "IngressRoute")
            .SelectMany(document =>
                document.MappingSequence("spec", "routes")
                    .Select(route => new
                    {
                        Document = document,
                        Match = ManifestNodeExtensions.Scalar(route, "match") ?? string.Empty,
                    }))
            .Where(entry => PublicReadWriteAllowlistHosts.Any(host => entry.Match.Contains($"Host(`{host}`)", StringComparison.Ordinal)))
            .SelectMany(entry =>
            {
                var localViolations = new List<string>();
                foreach (var required in new[] { "GET", "HEAD", "POST", "OPTIONS" })
                {
                    if (!entry.Match.Contains($"Method(`{required}`)", StringComparison.Ordinal))
                    {
                        localViolations.Add($"{entry.Document.Descriptor} is missing required Method(`{required}`).");
                    }
                }
                foreach (var forbidden in new[] { "PUT", "PATCH", "DELETE" })
                {
                    if (entry.Match.Contains($"Method(`{forbidden}`)", StringComparison.Ordinal))
                    {
                        localViolations.Add($"{entry.Document.Descriptor} must not include Method(`{forbidden}`) on a public host.");
                    }
                }
                return localViolations;
            })
            .ToList();
        violations.Should().BeEmpty();
    }
    [Fact]
    public void TraefikVipNetworkPolicies_MustAllowPostDnatBackendPorts()
    {
        var violations = Inventory.Documents
            .Where(document => document.Kind == "NetworkPolicy")
            .Where(document => document.AllScalars().Any(value => value.Contains("10.0.56.200", StringComparison.Ordinal)))
            .SelectMany(document =>
            {
                var ports = document.EgressPorts().ToHashSet(StringComparer.Ordinal);
                var localViolations = new List<string>();
                if (ports.Contains("443") && !ports.Contains("8443"))
                {
                    localViolations.Add($"{document.Descriptor} allows Traefik VIP 443 without backend port 8443.");
                }
                if (ports.Contains("80") && !ports.Contains("8000") && !ports.Contains("8080"))
                {
                    localViolations.Add($"{document.Descriptor} allows Traefik VIP 80 without a backend HTTP port (8000/8080).");
                }
                return localViolations;
            })
            .ToList();
        violations.Should().BeEmpty();
    }
    [Fact]
    public void ApiKeyProtectedDeployments_MustUseTcpSocketHealthProbes()
    {
        var violations = Inventory.Documents
            .Where(document => document.Kind == "Deployment")
            .Where(document => ApiKeyProtectedDeployments.Contains(document.Name))
            .SelectMany(document => document.ContainerMappings().SelectMany(container =>
                ProbeViolations(document, container, "readinessProbe")
                    .Concat(ProbeViolations(document, container, "livenessProbe"))))
            .ToList();
        violations.Should().BeEmpty();
    }
    [Fact]
    public void StatefulSets_WithVolumeClaimTemplates_MustDeclareFilesystemDefaults()
    {
        var violations = Inventory.Documents
            .Where(document => document.Kind == "StatefulSet")
            .Where(document => document.MappingSequence("spec", "volumeClaimTemplates").Count > 0)
            .SelectMany(document =>
            {
                var localViolations = new List<string>();
                if (string.IsNullOrWhiteSpace(document.Scalar("spec", "podManagementPolicy")))
                {
                    localViolations.Add($"{document.Descriptor} is missing spec.podManagementPolicy.");
                }
                if (string.IsNullOrWhiteSpace(document.Scalar("spec", "revisionHistoryLimit")))
                {
                    localViolations.Add($"{document.Descriptor} is missing spec.revisionHistoryLimit.");
                }
                foreach (var claimTemplate in document.MappingSequence("spec", "volumeClaimTemplates"))
                {
                    if (!string.Equals(
                            ManifestNodeExtensions.Scalar(claimTemplate, "spec", "volumeMode"),
                            "Filesystem",
                            StringComparison.Ordinal))
                    {
                        var claimName = ManifestNodeExtensions.Scalar(claimTemplate, "metadata", "name") ?? "<unnamed>";
                        localViolations.Add($"{document.Descriptor} volumeClaimTemplate '{claimName}' is missing volumeMode: Filesystem.");
                    }
                }
                return localViolations;
            })
            .ToList();
        violations.Should().BeEmpty();
    }
    [Fact]
    public void LocallyImportedImages_MustUseLocalhostPrefixAndNeverPullPolicy()
    {
        var violations = Inventory.Documents
            .Where(document => document.PodSpec() is not null)
            .SelectMany(document => document.ContainerSpecs()
                .Where(container => !string.IsNullOrWhiteSpace(container.Image))
                .Select(container => new
                {
                    Document = document,
                    Container = container,
                }))
            .Where(entry =>
                (entry.Container.Image.StartsWith("localhost/", StringComparison.Ordinal)
                    && !string.Equals(entry.Container.ImagePullPolicy, "Never", StringComparison.Ordinal))
                || (entry.Container.Image.StartsWith("fc-", StringComparison.Ordinal)
                    && !entry.Container.Image.Contains('/', StringComparison.Ordinal)))
            .Select(entry =>
            {
                if (entry.Container.Image.StartsWith("localhost/", StringComparison.Ordinal))
                {
                    return $"{entry.Document.Descriptor} container '{entry.Container.Name}' uses {entry.Container.Image} without imagePullPolicy: Never.";
                }
                return $"{entry.Document.Descriptor} container '{entry.Container.Name}' uses non-local image '{entry.Container.Image}' for a node-imported FlowerCore workload.";
            })
            .ToList();
        violations.Should().BeEmpty();
    }
    [Fact]
    public void PublicEgressDeployments_MustOptOutOfIamworkinLanSearchSuffixes()
    {
        var violations = Inventory.Documents
            .Where(document => document.PodSpec() is not null)
            .Where(document => PublicEgressDeployments.Contains(document.Name))
            .SelectMany(document =>
            {
                var localViolations = new List<string>();
                var podSpec = document.PodSpec()!;
                var dnsPolicy = ManifestNodeExtensions.Scalar(podSpec, "dnsPolicy");
                var searches = ManifestNodeExtensions.ScalarSequence(podSpec, "dnsConfig", "searches").ToList();
                if (!string.Equals(dnsPolicy, "None", StringComparison.Ordinal))
                {
                    localViolations.Add($"{document.Descriptor} is missing dnsPolicy: None.");
                }
                if (searches.Count == 0)
                {
                    localViolations.Add($"{document.Descriptor} is missing dnsConfig.searches.");
                }
                else if (searches.Any(search => search.Contains("iamworkin.lan", StringComparison.OrdinalIgnoreCase)))
                {
                    localViolations.Add($"{document.Descriptor} still includes iamworkin.lan in dnsConfig.searches.");
                }
                return localViolations;
            })
            .ToList();
        violations.Should().BeEmpty();
    }
    private static IEnumerable<string> ProbeViolations(
        ManifestDocument document,
        YamlMappingNode container,
        string probeKey)
    {
        if (!ManifestNodeExtensions.TryGetMapping(container, probeKey, out var probe)
            || !ManifestNodeExtensions.TryGetMapping(probe, "httpGet", out var httpGet))
        {
            return Array.Empty<string>();
        }
        var path = ManifestNodeExtensions.Scalar(httpGet, "path");
        if (!string.Equals(path, "/health", StringComparison.Ordinal))
        {
            return Array.Empty<string>();
        }
        var containerName = ManifestNodeExtensions.Scalar(container, "name") ?? "<unnamed>";
        return new[]
        {
            $"{document.Descriptor} container '{containerName}' still uses {probeKey}.httpGet on /health.",
        };
    }
 }
 internal sealed class ManifestInventory
 {
    private ManifestInventory(string workspaceRoot, string bluejayRoot, IReadOnlyList<ManifestDocument> documents)
    {
        WorkspaceRoot = workspaceRoot;
        BluejayRoot = bluejayRoot;
        Documents = documents;
    }
    public string WorkspaceRoot { get; }
    public string BluejayRoot { get; }
    public IReadOnlyList<ManifestDocument> Documents { get; }
    public static ManifestInventory Load()
    {
        var bluejayRoot = FindBluejayInfraRoot();
        var workspaceRoot = Directory.GetParent(bluejayRoot)?.FullName
            ?? throw new DirectoryNotFoundException($"Could not resolve workspace root from '{bluejayRoot}'.");
        var documents = ManifestRoots(workspaceRoot, bluejayRoot)
            .SelectMany(LoadDocumentsFromRoot)
            .ToList();
        return new ManifestInventory(workspaceRoot, bluejayRoot, documents);
    }
    private static string FindBluejayInfraRoot()
    {
        var current = new DirectoryInfo(AppContext.BaseDirectory);
        while (current is not null)
        {
            if (Directory.Exists(Path.Combine(current.FullName, "apps"))
                && File.Exists(Path.Combine(current.FullName, "README.md")))
            {
                return current.FullName;
            }
            current = current.Parent;
        }
        throw new DirectoryNotFoundException("Could not find the bluejay-infra repository root from the test output directory.");
    }
    private static IEnumerable<string> ManifestRoots(string workspaceRoot, string bluejayRoot)
    {
        var roots = new[]
        {
            Path.Combine(bluejayRoot, "apps"),
            Path.Combine(workspaceRoot, "FlowerCore.Chat", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.DMS", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.DNS", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.Intranet.Web", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.Kiosk", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.Media", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.MenuBoard", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.MessageBoard", "k8s"),
            // FlowerCore.Notes/k8s/selenium/ is the live Selenium Grid
            // manifest tree (consumed by deploy-selenium scripts).
            // FlowerCore.Notes/k8s/guacamole/ + FlowerCore.Notes/k8s/monitoring/
            // are historical scaffolds that have diverged from the live state
            // (bluejay-infra/apps/guacamole + bluejay-infra/apps/monitoring are
            // canonical). Operator review is required before bringing them in
            // line OR decommissioning them — keep them out of the lint scope
            // until that decision lands. See xxl-regroup-2026-05-03-followup.md
            // "Codex 7 §0 stop conditions" + the C7 close-session output.
            Path.Combine(workspaceRoot, "FlowerCore.Notes", "k8s", "selenium"),
            Path.Combine(workspaceRoot, "FlowerCore.MySQL", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.PHP", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.Presentations", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.Print.Web", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.RemoteDesktop", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.Scoreboard", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.SegmentDisplay", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.SignalControl", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.TtsReader", "k8s"),
            Path.Combine(workspaceRoot, "FlowerCore.Updater", "k8s"),
        };
        return roots.Where(Directory.Exists);
    }
    private static IEnumerable<ManifestDocument> LoadDocumentsFromRoot(string root)
    {
        foreach (var filePath in Directory.EnumerateFiles(root, "*.yaml", SearchOption.AllDirectories))
        {
            var fileText = File.ReadAllText(filePath);
            var segments = SplitManifestDocuments(fileText);
            for (var index = 0; index < segments.Count; index++)
            {
                var yaml = new YamlStream();
                try
                {
                    using var reader = new StringReader(segments[index]);
                    yaml.Load(reader);
                }
                catch (YamlException)
                {
                    // Segments that fail to parse are skipped rather than failing the whole inventory load.
                    continue;
                }
                if (yaml.Documents.Count == 0)
                {
                    continue;
                }
                if (yaml.Documents[0].RootNode is YamlMappingNode mapping
                    && ManifestNodeExtensions.Scalar(mapping, "kind") is not null)
                {
                    yield return new ManifestDocument(root, filePath, index, fileText, mapping);
                }
            }
        }
    }
    private static IReadOnlyList<string> SplitManifestDocuments(string fileText)
    {
        var documents = new List<string>();
        var currentLines = new List<string>();
        var seenApiVersion = false;
        foreach (var line in Regex.Split(fileText, @"\r?\n"))
        {
            if (Regex.IsMatch(line, @"^---\s*$")) // YAML document markers are only valid at column 0
            {
                FlushCurrentDocument();
                continue;
            }
            if (Regex.IsMatch(line, @"^apiVersion:")
                && seenApiVersion
                && currentLines.Any(existing => !string.IsNullOrWhiteSpace(existing)))
            {
                FlushCurrentDocument();
            }
            currentLines.Add(line);
            if (Regex.IsMatch(line, @"^apiVersion:"))
            {
                seenApiVersion = true;
            }
        }
        FlushCurrentDocument();
        return documents;
        void FlushCurrentDocument()
        {
            var text = string.Join(Environment.NewLine, currentLines).Trim();
            if (!string.IsNullOrWhiteSpace(text))
            {
                documents.Add(text);
            }
            currentLines.Clear();
            seenApiVersion = false;
        }
    }
 }
 internal sealed record ManifestDocument(
    string RootPath,
    string FilePath,
    int DocumentIndex,
    string FileText,
    YamlMappingNode Root)
 {
    public string Kind => Scalar("kind") ?? string.Empty;
    public string Name => Scalar("metadata", "name") ?? $"document-{DocumentIndex}";
    public string Namespace => Scalar("metadata", "namespace") ?? string.Empty;
    public string RelativePath => Path.GetRelativePath(RootPath, FilePath).Replace('\\', '/');
    public string Descriptor => $"{Kind} {Namespace}/{Name} [{RelativePath}#{DocumentIndex + 1}]";
    public string? Scalar(params string[] path) => ManifestNodeExtensions.Scalar(Root, path);
    public IReadOnlyList<YamlMappingNode> MappingSequence(params string[] path) => ManifestNodeExtensions.MappingSequence(Root, path);
    public IEnumerable<string> AllScalars() => ManifestNodeExtensions.AllScalars(Root);
    public IReadOnlyList<string> EgressPorts()
    {
        return MappingSequence("spec", "egress")
            .SelectMany(egressRule => ManifestNodeExtensions.MappingSequence(egressRule, "ports"))
            .Select(portMapping => ManifestNodeExtensions.Scalar(portMapping, "port"))
            .Where(value => !string.IsNullOrWhiteSpace(value))
            .Cast<string>()
            .ToList();
    }
    public YamlMappingNode? PodSpec()
    {
        return Kind switch
        {
            "Deployment" or "StatefulSet" or "DaemonSet" or "Job" =>
                ManifestNodeExtensions.Mapping(Root, "spec", "template", "spec"),
            "CronJob" =>
                ManifestNodeExtensions.Mapping(Root, "spec", "jobTemplate", "spec", "template", "spec"),
            _ => null,
        };
    }
    public IReadOnlyList<YamlMappingNode> ContainerMappings()
    {
        var podSpec = PodSpec();
        if (podSpec is null)
        {
            return Array.Empty<YamlMappingNode>();
        }
        return ManifestNodeExtensions.MappingSequence(podSpec, "containers")
            .Concat(ManifestNodeExtensions.MappingSequence(podSpec, "initContainers"))
            .ToList();
    }
    public IReadOnlyList<ContainerSpec> ContainerSpecs()
    {
        return ContainerMappings()
            .Select(container => new ContainerSpec(
                ManifestNodeExtensions.Scalar(container, "name") ?? "<unnamed>",
                ManifestNodeExtensions.Scalar(container, "image") ?? string.Empty,
                ManifestNodeExtensions.Scalar(container, "imagePullPolicy") ?? string.Empty))
            .ToList();
    }
 }
 internal sealed record ContainerSpec(string Name, string Image, string ImagePullPolicy);
 internal static class ManifestNodeExtensions
 {
    public static string? Scalar(this YamlMappingNode mapping, params string[] path)
    {
        return TryGetNode(mapping, path, out var node) && node is YamlScalarNode scalar
            ? scalar.Value
            : null;
    }
    public static YamlMappingNode? Mapping(this YamlMappingNode mapping, params string[] path)
    {
        return TryGetNode(mapping, path, out var node) ? node as YamlMappingNode : null;
    }
    public static bool TryGetMapping(this YamlMappingNode mapping, string key, out YamlMappingNode result)
    {
        if (TryGetChild(mapping, key, out var child) && child is YamlMappingNode childMapping)
        {
            result = childMapping;
            return true;
        }
        result = null!;
        return false;
    }
    public static IReadOnlyList<YamlMappingNode> MappingSequence(this YamlMappingNode mapping, params string[] path)
    {
        return TryGetNode(mapping, path, out var node) && node is YamlSequenceNode sequence
            ? sequence.Children.OfType<YamlMappingNode>().ToList()
            : Array.Empty<YamlMappingNode>();
    }
    public static IReadOnlyList<string> ScalarSequence(this YamlMappingNode mapping, params string[] path)
    {
        return TryGetNode(mapping, path, out var node) && node is YamlSequenceNode sequence
            ? sequence.Children.OfType<YamlScalarNode>()
                .Select(child => child.Value)
                .Where(value => !string.IsNullOrWhiteSpace(value))
                .Cast<string>()
                .ToList()
            : Array.Empty<string>();
    }
    public static IEnumerable<string> AllScalars(YamlNode node)
    {
        return node switch
        {
            YamlScalarNode scalar when !string.IsNullOrWhiteSpace(scalar.Value) => new[] { scalar.Value! },
            YamlSequenceNode sequence => sequence.Children.SelectMany(AllScalars),
            YamlMappingNode mapping => mapping.Children.SelectMany(entry => AllScalars(entry.Key).Concat(AllScalars(entry.Value))),
            _ => Array.Empty<string>(),
        };
    }
    private static bool TryGetNode(YamlMappingNode mapping, IReadOnlyList<string> path, out YamlNode node)
    {
        YamlNode current = mapping;
        foreach (var segment in path)
        {
            if (current is not YamlMappingNode currentMapping || !TryGetChild(currentMapping, segment, out current))
            {
                node = null!;
                return false;
            }
        }
        node = current;
        return true;
    }
    private static bool TryGetChild(YamlMappingNode mapping, string key, out YamlNode value)
    {
        foreach (var entry in mapping.Children)
        {
            if (entry.Key is YamlScalarNode scalar
                && string.Equals(scalar.Value, key, StringComparison.Ordinal))
            {
                value = entry.Value;
                return true;
            }
        }
        value = null!;
        return false;
    }
 }
--- a/tests/bluejay-infra-lint/conftest.dev/01_cross_namespace_ingressroute.rego
+++ b/tests/bluejay-infra-lint/conftest.dev/01_cross_namespace_ingressroute.rego
@@ -0,0 +1,12 @@
 package bluejayinfra.cross_namespace_ingressroute
 deny[msg] {
  input.kind == "IngressRoute"
  ns := object.get(input.metadata, "namespace", "")
  route := input.spec.routes[_]
  service := route.services[_]
  svc_ns := object.get(service, "namespace", "")
  svc_ns != ""
  svc_ns != ns
  msg := sprintf("IngressRoute %s/%s references Service %s in namespace %s", [ns, input.metadata.name, service.name, svc_ns])
 }
--- a/tests/bluejay-infra-lint/conftest.dev/02_public_method_allowlist.rego
+++ b/tests/bluejay-infra-lint/conftest.dev/02_public_method_allowlist.rego
@@ -0,0 +1,23 @@
 package bluejayinfra.public_method_allowlist
 public_hosts := {"dist.flowercore.io", "dns.iamworkin.lan"}
 deny[msg] {
  input.kind == "IngressRoute"
  route := input.spec.routes[_]
  match := object.get(route, "match", "")
  host := public_hosts[_]
  contains(match, sprintf("Host(`%s`)", [host]))
  not contains(match, "Method(`GET`)")
  msg := sprintf("IngressRoute %s/%s is missing Method(GET) for public read-only host %s", [object.get(input.metadata, "namespace", ""), input.metadata.name, host])
 }
 deny[msg] {
  input.kind == "IngressRoute"
  route := input.spec.routes[_]
  match := object.get(route, "match", "")
  host := public_hosts[_]
  contains(match, sprintf("Host(`%s`)", [host]))
  not contains(match, "Method(`HEAD`)")
  msg := sprintf("IngressRoute %s/%s is missing Method(HEAD) for public read-only host %s", [object.get(input.metadata, "namespace", ""), input.metadata.name, host])
 }
--- a/tests/bluejay-infra-lint/conftest.dev/03_traefik_vip_backend_ports.rego
+++ b/tests/bluejay-infra-lint/conftest.dev/03_traefik_vip_backend_ports.rego
@@ -0,0 +1,30 @@
 package bluejayinfra.traefik_vip_backend_ports
 has_vip {
  some i
  some j
  input.spec.egress[i].to[j].ipBlock.cidr == "10.0.56.200/32"
 }
 has_port(port) {
  some i
  some j
  input.spec.egress[i].ports[j].port == port
 }
 deny[msg] {
  input.kind == "NetworkPolicy"
  has_vip
  has_port(443)
  not has_port(8443)
  msg := sprintf("NetworkPolicy %s/%s allows 10.0.56.200:443 without backend port 8443", [object.get(input.metadata, "namespace", ""), input.metadata.name])
 }
 deny[msg] {
  input.kind == "NetworkPolicy"
  has_vip
  has_port(80)
  not has_port(8080)
  not has_port(8000)
  msg := sprintf("NetworkPolicy %s/%s allows 10.0.56.200:80 without backend HTTP port 8080 or 8000", [object.get(input.metadata, "namespace", ""), input.metadata.name])
 }
--- a/tests/bluejay-infra-lint/conftest.dev/04_auth_probe_path.rego
+++ b/tests/bluejay-infra-lint/conftest.dev/04_auth_probe_path.rego
@@ -0,0 +1,28 @@
 package bluejayinfra.auth_probe_path
 protected_deployments := {
  "messageboard-web",
  "scoreboard-web",
  "segmentdisplay-web",
  "signalcontrol-web",
 }
 deny[msg] {
  input.kind == "Deployment"
  protected_deployments[input.metadata.name]
  container := input.spec.template.spec.containers[_]
  probe := object.get(container, "readinessProbe", {})
  http_get := object.get(probe, "httpGet", {})
  object.get(http_get, "path", "") == "/health"
  msg := sprintf("Deployment %s/%s must not use readinessProbe.httpGet /health behind API key middleware", [object.get(input.metadata, "namespace", ""), input.metadata.name])
 }
 deny[msg] {
  input.kind == "Deployment"
  protected_deployments[input.metadata.name]
  container := input.spec.template.spec.containers[_]
  probe := object.get(container, "livenessProbe", {})
  http_get := object.get(probe, "httpGet", {})
  object.get(http_get, "path", "") == "/health"
  msg := sprintf("Deployment %s/%s must not use livenessProbe.httpGet /health behind API key middleware", [object.get(input.metadata, "namespace", ""), input.metadata.name])
 }
--- a/tests/bluejay-infra-lint/conftest.dev/05_statefulset_volumeclaim_defaults.rego
+++ b/tests/bluejay-infra-lint/conftest.dev/05_statefulset_volumeclaim_defaults.rego
@@ -0,0 +1,23 @@
 package bluejayinfra.statefulset_volumeclaim_defaults
 deny[msg] {
  input.kind == "StatefulSet"
  count(object.get(input.spec, "volumeClaimTemplates", [])) > 0
  object.get(input.spec, "podManagementPolicy", "") == ""
  msg := sprintf("StatefulSet %s/%s is missing spec.podManagementPolicy", [object.get(input.metadata, "namespace", ""), input.metadata.name])
 }
 deny[msg] {
  input.kind == "StatefulSet"
  count(object.get(input.spec, "volumeClaimTemplates", [])) > 0
  object.get(input.spec, "revisionHistoryLimit", null) == null
  msg := sprintf("StatefulSet %s/%s is missing spec.revisionHistoryLimit", [object.get(input.metadata, "namespace", ""), input.metadata.name])
 }
 deny[msg] {
  input.kind == "StatefulSet"
  claim := input.spec.volumeClaimTemplates[_]
  object.get(object.get(claim, "spec", {}), "volumeMode", "") != "Filesystem"
  claim_name := object.get(object.get(claim, "metadata", {}), "name", "<unnamed>")
  msg := sprintf("StatefulSet %s/%s volumeClaimTemplate %s is missing volumeMode: Filesystem", [object.get(input.metadata, "namespace", ""), input.metadata.name, claim_name])
 }
--- a/tests/bluejay-infra-lint/conftest.dev/06_localhost_image_pull_policy.rego
+++ b/tests/bluejay-infra-lint/conftest.dev/06_localhost_image_pull_policy.rego
@@ -0,0 +1,40 @@
 package bluejayinfra.localhost_image_pull_policy
 pod_spec(spec) = pod {
  input.kind == "Deployment"
  pod := spec.template.spec
 }
 pod_spec(spec) = pod {
  input.kind == "StatefulSet"
  pod := spec.template.spec
 }
 pod_spec(spec) = pod {
  input.kind == "DaemonSet"
  pod := spec.template.spec
 }
 deny[msg] {
  pod := pod_spec(input.spec)
  container := pod.containers[_]
  startswith(object.get(container, "image", ""), "localhost/")
  object.get(container, "imagePullPolicy", "") != "Never"
  msg := sprintf("%s/%s container %s uses a localhost image without imagePullPolicy: Never", [object.get(input.metadata, "namespace", ""), input.metadata.name, container.name])
 }
 deny[msg] {
  pod := pod_spec(input.spec)
  container := pod.initContainers[_]
  startswith(object.get(container, "image", ""), "localhost/")
  object.get(container, "imagePullPolicy", "") != "Never"
  msg := sprintf("%s/%s initContainer %s uses a localhost image without imagePullPolicy: Never", [object.get(input.metadata, "namespace", ""), input.metadata.name, container.name])
 }
 deny[msg] {
  pod := pod_spec(input.spec)
  container := pod.containers[_]
  startswith(object.get(container, "image", ""), "fc-")
  not contains(object.get(container, "image", ""), "/")
  msg := sprintf("%s/%s container %s uses a non-localhost FlowerCore image reference %s", [object.get(input.metadata, "namespace", ""), input.metadata.name, container.name, container.image])
 }
--- a/tests/bluejay-infra-lint/conftest.dev/07_public_egress_dns_none.rego
+++ b/tests/bluejay-infra-lint/conftest.dev/07_public_egress_dns_none.rego
@@ -0,0 +1,27 @@
 package bluejayinfra.public_egress_dns_none
 public_egress_workloads := {
  "asterisk",
  "fc-llm-bridge",
  "mysql-web",
  "php-web",
  "ttsreader-align",
  "ttsreader-kokoro",
  "ttsreader-modern",
  "ttsreader-piper",
 }
 deny[msg] {
  input.kind == "Deployment"
  public_egress_workloads[input.metadata.name]
  object.get(input.spec.template.spec, "dnsPolicy", "") != "None"
  msg := sprintf("Deployment %s/%s must set dnsPolicy: None for public-internet egress", [object.get(input.metadata, "namespace", ""), input.metadata.name])
 }
 deny[msg] {
  input.kind == "Deployment"
  public_egress_workloads[input.metadata.name]
  search := object.get(object.get(input.spec.template.spec, "dnsConfig", {}), "searches", [])[_]
  contains(lower(search), "iamworkin.lan")
  msg := sprintf("Deployment %s/%s must not include iamworkin.lan in dnsConfig.searches", [object.get(input.metadata, "namespace", ""), input.metadata.name])
 }
--- a/tests/bluejay-infra-lint/conftest.dev/08_public_readwrite_allowlist.rego
+++ b/tests/bluejay-infra-lint/conftest.dev/08_public_readwrite_allowlist.rego
@@ -0,0 +1,40 @@
 package bluejayinfra.public_readwrite_allowlist
 # Public hosts that allow a tightly bounded write surface in addition to
 # GET/HEAD. updatecenter.iamworkin.lan accepts POST /api/v1/checkin/{id}
 # (bootstrap-JWT) so its allowlist is GET||HEAD||POST||OPTIONS — but
 # PUT/PATCH/DELETE must still 404 at the route. Any host in this set MUST
 # include all four required methods AND MUST NOT include any forbidden
 # method.
 public_readwrite_hosts := {
  "updatecenter.iamworkin.lan",
  "updates.iamworkin.lan",
  "update.flowercore.io",
  "updates.flowercore.io",
 }
 required_methods := {"GET", "HEAD", "POST", "OPTIONS"}
 forbidden_methods := {"PUT", "PATCH", "DELETE"}
 deny[msg] {
  input.kind == "IngressRoute"
  route := input.spec.routes[_]
  match := object.get(route, "match", "")
  host := public_readwrite_hosts[_]
  contains(match, sprintf("Host(`%s`)", [host]))
  required := required_methods[_]
  not contains(match, sprintf("Method(`%s`)", [required]))
  msg := sprintf("IngressRoute %s/%s is missing required Method(%s) for public read-write host %s", [object.get(input.metadata, "namespace", ""), input.metadata.name, required, host])
 }
 deny[msg] {
  input.kind == "IngressRoute"
  route := input.spec.routes[_]
  match := object.get(route, "match", "")
  host := public_readwrite_hosts[_]
  contains(match, sprintf("Host(`%s`)", [host]))
  forbidden := forbidden_methods[_]
  contains(match, sprintf("Method(`%s`)", [forbidden]))
  msg := sprintf("IngressRoute %s/%s must not include Method(%s) on public read-write host %s", [object.get(input.metadata, "namespace", ""), input.metadata.name, forbidden, host])
 }