feat(fc-devicemgmt): add Kubernetes deployment manifests

fix(guacamole): add --- separator between macmini-vnc-creds OnePasswordItem and guacamole-branding ConfigMap
A missing document separator caused YAML to merge the OnePasswordItem's top-level `spec: itemPath:` block into the ConfigMap that follows. Result: a ConfigMap carrying a `.spec` field that the K8s ConfigMap schema does not declare, which has made ArgoCD's structured-merge diff fail since 2026-05-11T15:30:54Z:

    Failed to compare desired state to live state: failed to calculate diff: error calculating structured merge diff: error building typed value from config resource: .spec: field not declared in schema

The app stayed Healthy (live K8s tolerated the extra field; the ConfigMap ignored it), but ArgoCD's diff calculation was broken, leaving the app stuck at sync=Unknown for all 21 resources.

Adding the missing `---` separator makes the OnePasswordItem and the ConfigMap proper sibling YAML documents, each validated against the schema for its own kind.

Diagnosed during the 2026-05-12 morning routine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
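A missing separator of this kind can be spotted in the raw manifest text before ArgoCD sees it. The following is a minimal Python sketch and not part of this commit: it splits a manifest on `---` lines and flags any document that declares a top-level `kind:` more than once. The function name `find_merged_documents` and the trimmed sample manifest are illustrative assumptions; the check uses only the standard library and is a line-based heuristic, not a full YAML parse.

```python
import re

# A YAML document separator line: "---", optionally followed by a comment.
SEPARATOR = re.compile(r"^---\s*(#.*)?$")
# An unindented "kind:" key, i.e. a top-level key of the current document.
TOP_LEVEL_KIND = re.compile(r"^kind:\s*(\S+)")


def find_merged_documents(manifest_text):
    """Return the list of kinds for every document that declares `kind:` more than once."""
    suspicious = []
    kinds = []
    # The trailing sentinel separator flushes the last document.
    for line in manifest_text.splitlines() + ["---"]:
        if SEPARATOR.match(line):
            if len(kinds) > 1:
                suspicious.append(kinds)
            kinds = []
            continue
        match = TOP_LEVEL_KIND.match(line)
        if match:
            kinds.append(match.group(1))
    return suspicious


# Hypothetical, trimmed-down reproduction of the broken manifest.
BROKEN = """\
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
  name: macmini-vnc-creds
spec:
  itemPath: vaults/IAmWorkin/items/Mac Mini
# Blue Jay Branding Extension (CSS + translations)
apiVersion: v1
kind: ConfigMap
metadata:
  name: guacamole-branding
"""

# The fix: insert the separator before the ConfigMap's leading comment.
FIXED = BROKEN.replace("# Blue Jay", "---\n# Blue Jay")

print(find_merged_documents(BROKEN))  # [['OnePasswordItem', 'ConfigMap']]
print(find_merged_documents(FIXED))   # []
```

A check like this could run in CI over `apps/*/*.yaml`; it would not catch every malformed manifest, only documents that were accidentally fused.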
2026-05-12 15:43:22 -05:00 · 2026-05-12 09:26:03 -05:00 · 2026-05-11 19:02:58 -05:00 · 2026-05-11 18:37:15 -05:00 · 2026-05-11 10:42:27 -05:00 · 2026-05-11 10:30:05 -05:00
19 changed files with 1185 additions and 24 deletions
--- a/apps/fc-devicemgmt/1password-item.yaml
+++ b/apps/fc-devicemgmt/1password-item.yaml
@@ -0,0 +1,26 @@
 # Runtime secrets for FlowerCore.DeviceManagement.
 #
 # OnePasswordItem operator syncs this item into a Kubernetes Secret with the
 # same name. Expected fields:
 #   DB-Password
 #   mtls-ca.pem
 #   mtls-client.crt
 #   mtls-client.key
 #   mtls-chain.pem
 #
 # Do not add literal secret values to this repo. Runtime pods consume the
 # synced Secret through env vars and read-only mounts.
 apiVersion: onepassword.com/v1
 kind: OnePasswordItem
 metadata:
  name: fc-devicemgmt-runtime
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt
    app.kubernetes.io/component: secrets
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
 spec:
  itemPath: "vaults/IAmWorkin/items/FlowerCore DeviceManagement Runtime"
--- a/apps/fc-devicemgmt/argocd-application.yaml
+++ b/apps/fc-devicemgmt/argocd-application.yaml
@@ -0,0 +1,33 @@
 # Explicit ArgoCD Application shape for bootstrap/review.
 #
 # The live bluejay-infra ApplicationSet already discovers apps/* directories
 # and creates this same Application name (`infra-fc-devicemgmt`) automatically.
 # Keep repoURL on the internal Gitea ClusterIP URL; ArgoCD does not trust the
 # external step-ca HTTPS endpoint.
 apiVersion: argoproj.io/v1alpha1
 kind: Application
 metadata:
  name: infra-fc-devicemgmt
  namespace: argocd
  labels:
    app.kubernetes.io/name: fc-devicemgmt
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
 spec:
  project: default
  source:
    repoURL: http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git
    targetRevision: main
    path: apps/fc-devicemgmt
  destination:
    server: https://kubernetes.default.svc
    namespace: fc-devicemgmt
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
--- a/apps/fc-devicemgmt/certificate-web.yaml
+++ b/apps/fc-devicemgmt/certificate-web.yaml
@@ -0,0 +1,30 @@
 # Certificate for devices.iamworkin.lan.
 #
 # Preflight gate: FlowerCore.DNS / pfSense must contain an explicit A record:
 #   devices.iamworkin.lan -> 10.0.56.200
 # before this Certificate is synced. step-ca ACME cannot see the CoreDNS
 # wildcard, so missing pfSense DNS produces cert-manager HTTP-01 backoff
 # (feedback_pfsense_dns_required_for_acme).
 apiVersion: cert-manager.io/v1
 kind: Certificate
 metadata:
  name: fc-devicemgmt-web-tls
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt-web
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
  annotations:
    flowercore.io/dns-preflight: "devices.iamworkin.lan must resolve to 10.0.56.200 before ACME sync"
 spec:
  secretName: fc-devicemgmt-web-tls
  issuerRef:
    name: step-ca-acme
    kind: ClusterIssuer
  dnsNames:
    - devices.iamworkin.lan
  duration: 720h
  renewBefore: 240h
--- a/apps/fc-devicemgmt/clusterrole-operator.yaml
+++ b/apps/fc-devicemgmt/clusterrole-operator.yaml
@@ -0,0 +1,81 @@
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
  name: fc-devicemgmt-operator
  labels:
    app.kubernetes.io/name: fc-devicemgmt-operator
    app.kubernetes.io/component: operator
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
 rules:
  - apiGroups:
      - devices.flowercore.io
    resources:
      - '*'
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
  - apiGroups:
      - devices.flowercore.io
    resources:
      - devices/status
      - devices/finalizers
      - devicegroups/status
      - devicegroups/finalizers
      - devicepolicies/status
      - devicepolicies/finalizers
      - remotecommands/status
      - remotecommands/finalizers
    verbs:
      - get
      - update
      - patch
  - apiGroups:
      - apps
    resources:
      - deployments
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - pods
      - services
      - configmaps
      - secrets
      - events
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
  - apiGroups:
      - networking.k8s.io
    resources:
      - networkpolicies
    verbs:
      - get
      - list
      - watch
--- a/apps/fc-devicemgmt/clusterrolebinding-operator.yaml
+++ b/apps/fc-devicemgmt/clusterrolebinding-operator.yaml
@@ -0,0 +1,19 @@
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
 metadata:
  name: fc-devicemgmt-operator
  labels:
    app.kubernetes.io/name: fc-devicemgmt-operator
    app.kubernetes.io/component: operator
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
 roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fc-devicemgmt-operator
 subjects:
  - kind: ServiceAccount
    name: fc-devicemgmt-operator
    namespace: fc-devicemgmt
--- a/apps/fc-devicemgmt/deployment-operator.yaml
+++ b/apps/fc-devicemgmt/deployment-operator.yaml
@@ -0,0 +1,109 @@
 # FlowerCore.DeviceManagement Operator.
 #
 # KubeOps controller for devices.flowercore.io resources. Operator-created
 # children must set OwnerReferences + traceability labels/annotations per
 # k8s-pod-ownership-and-traceability-standard.md. RBAC below grants
 # apps/deployments/get so the process can resolve its own Deployment UID.
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: fc-devicemgmt-operator
  namespace: fc-devicemgmt
  labels:
    app: fc-devicemgmt-operator
    app.kubernetes.io/name: fc-devicemgmt-operator
    app.kubernetes.io/component: operator
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
  annotations:
    flowercore.io/traceability-standard: k8s-pod-ownership-and-traceability-standard
 spec:
  replicas: 1
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: fc-devicemgmt-operator
  template:
    metadata:
      labels:
        app: fc-devicemgmt-operator
        app.kubernetes.io/name: fc-devicemgmt-operator
        app.kubernetes.io/component: operator
        app.kubernetes.io/part-of: flowercore
        app.kubernetes.io/managed-by: argocd
        flowercore.io/tenant-id: system
        flowercore.io/created-by: bluejay-infra
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
        flowercore.io/audit-trace-id: "runtime-activity-trace"
    spec:
      serviceAccountName: fc-devicemgmt-operator
      securityContext:
        fsGroup: 1654
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: operator
          image: localhost/fc-devicemgmt-operator:v20260512-cx5
          imagePullPolicy: Never
          ports:
            - name: metrics
              containerPort: 8080
          env:
            - name: ASPNETCORE_ENVIRONMENT
              value: "Production"
            - name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT
              value: "false"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: FLOWERCORE_KUBERNETES_OWNER_DEPLOYMENT
              value: "fc-devicemgmt-operator"
            - name: FlowerCore__Service__Name
              value: "FlowerCore.DeviceManagement.Operator"
            - name: FlowerCore__DeviceManagement__DefaultTenantId
              value: "system"
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 20
            periodSeconds: 30
          securityContext:
            runAsNonRoot: true
            runAsUser: 1654
            runAsGroup: 1654
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: logs
              mountPath: /app/logs
      volumes:
        - name: tmp
          emptyDir: {}
        - name: logs
          emptyDir: {}
--- a/apps/fc-devicemgmt/deployment-web.yaml
+++ b/apps/fc-devicemgmt/deployment-web.yaml
@@ -0,0 +1,135 @@
 # FlowerCore.DeviceManagement Web.
 #
 # Source repo is expected to ship FlowerCore.DeviceManagement.Web in a later
 # Sprint 9+ lane. This manifest is static-valid without requiring the image to
 # exist yet; import localhost/fc-devicemgmt-web:<tag> to all schedulable RKE2
 # nodes before letting ArgoCD sync a live rollout.
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: fc-devicemgmt-web
  namespace: fc-devicemgmt
  labels:
    app: fc-devicemgmt-web
    app.kubernetes.io/name: fc-devicemgmt-web
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
  annotations:
    flowercore.io/traceability-standard: k8s-pod-ownership-and-traceability-standard
 spec:
  replicas: 2
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: fc-devicemgmt-web
  template:
    metadata:
      labels:
        app: fc-devicemgmt-web
        app.kubernetes.io/name: fc-devicemgmt-web
        app.kubernetes.io/component: web
        app.kubernetes.io/part-of: flowercore
        app.kubernetes.io/managed-by: argocd
        flowercore.io/tenant-id: system
        flowercore.io/created-by: bluejay-infra
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
        flowercore.io/audit-trace-id: "runtime-activity-trace"
    spec:
      securityContext:
        fsGroup: 1654
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: web
          image: localhost/fc-devicemgmt-web:v20260512-cx5
          imagePullPolicy: Never
          ports:
            - name: http
              containerPort: 8080
          env:
            - name: ASPNETCORE_URLS
              value: "http://+:8080"
            - name: ASPNETCORE_ENVIRONMENT
              value: "Production"
            - name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT
              value: "false"
            - name: FlowerCore__Service__Name
              value: "FlowerCore.DeviceManagement.Web"
            - name: FlowerCore__DeviceManagement__DefaultTenantId
              value: "system"
            - name: FlowerCore__Database__Provider
              value: "MySql"
            - name: FlowerCore__Database__Host
              value: "mysql.fc-mysql.svc"
            - name: FlowerCore__Database__Database
              value: "flowercore_devicemgmt"
            - name: FlowerCore__Database__User
              value: "fc_devicemgmt"
            - name: FlowerCore__Database__Password
              valueFrom:
                secretKeyRef:
                  name: fc-devicemgmt-runtime
                  key: DB-Password
            - name: FlowerCore__DeviceManagement__AgentMtls__CaPath
              value: "/secrets/devicemgmt-mtls/mtls-ca.pem"
            - name: FlowerCore__DeviceManagement__AgentMtls__ClientCertificatePath
              value: "/secrets/devicemgmt-mtls/mtls-client.crt"
            - name: FlowerCore__DeviceManagement__AgentMtls__ClientKeyPath
              value: "/secrets/devicemgmt-mtls/mtls-client.key"
            - name: FlowerCore__EventBus__Redis__Configuration
              value: "redis.fc-redis.svc:6379"
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 768Mi
          startupProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 30
          readinessProbe:
            tcpSocket:
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 3
          securityContext:
            runAsNonRoot: true
            runAsUser: 1654
            runAsGroup: 1654
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: logs
              mountPath: /app/logs
            - name: devicemgmt-mtls
              mountPath: /secrets/devicemgmt-mtls
              readOnly: true
      volumes:
        - name: tmp
          emptyDir: {}
        - name: logs
          emptyDir: {}
        - name: devicemgmt-mtls
          secret:
            secretName: fc-devicemgmt-runtime
            defaultMode: 0400
--- a/apps/fc-devicemgmt/ingressroute-web.yaml
+++ b/apps/fc-devicemgmt/ingressroute-web.yaml
@@ -0,0 +1,55 @@
 # LAN ingress for FlowerCore.DeviceManagement Web.
 #
 # RKE2 Traefik has no built-in ACME resolver configured. Keep TLS certificate
 # ownership in cert-manager Certificate/fc-devicemgmt-web-tls.
 apiVersion: traefik.io/v1alpha1
 kind: IngressRoute
 metadata:
  name: fc-devicemgmt-web
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt-web
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
 spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`devices.iamworkin.lan`)
      kind: Rule
      services:
        - name: fc-devicemgmt-web
          port: 80
  tls:
    secretName: fc-devicemgmt-web-tls
 # Future public agent/update host gate (OFF by default):
 #
 # Do not enable `update.flowercore.io` here until Authentik OIDC Q-OIDC-1
 # resolves the public-device-management auth model and route ownership with
 # UpdateCenter. When enabled, use a separate public IngressRoute with an
 # explicit Method allowlist, public-host auth middleware, and public TLS
 # certificate strategy. Leaving this as comments keeps ArgoCD from stealing
 # live UpdateCenter traffic.
 #
 # apiVersion: traefik.io/v1alpha1
 # kind: IngressRoute
 # metadata:
 #   name: fc-devicemgmt-web-public
 #   namespace: fc-devicemgmt
 #   annotations:
 #     flowercore.io/public-host-gate: "disabled-until-Q-OIDC-1"
 # spec:
 #   entryPoints:
 #     - websecure
 #   routes:
 #     - match: Host(`update.flowercore.io`) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
 #       kind: Rule
 #       services:
 #         - name: fc-devicemgmt-web
 #           port: 80
 #   tls:
 #     secretName: fc-devicemgmt-public-tls
--- a/apps/fc-devicemgmt/namespace.yaml
+++ b/apps/fc-devicemgmt/namespace.yaml
@@ -0,0 +1,13 @@
 # FlowerCore.DeviceManagement namespace.
 #
 # ArgoCD discovers this directory as Application `infra-fc-devicemgmt`.
 apiVersion: v1
 kind: Namespace
 metadata:
  name: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
--- a/apps/fc-devicemgmt/network-policy.yaml
+++ b/apps/fc-devicemgmt/network-policy.yaml
@@ -0,0 +1,224 @@
 # FlowerCore.DeviceManagement NetworkPolicies.
 #
 # NetworkPolicies belong in bluejay-infra so ArgoCD owns rebuild state.
 # Rules include Traefik post-DNAT backend ports per
 # feedback_netpol_dnat_backend_port and Synology NFS egress for the requested
 # cold-tier / future artifact path.
 ---
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
 metadata:
  name: fc-devicemgmt-web-isolation
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt-web
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
 spec:
  podSelector:
    matchLabels:
      app: fc-devicemgmt-web
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # LAN edge: only cluster Traefik should reach the Web pod for
    # devices.iamworkin.lan.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 8080
          protocol: TCP
    # Direct LAN diagnostics are allowed only from FlowerCore LAN/VPN ranges.
    - from:
        - ipBlock:
            cidr: 10.0.56.0/24
        - ipBlock:
            cidr: 10.0.57.0/24
        - ipBlock:
            cidr: 10.0.58.0/24
        - ipBlock:
            cidr: 10.0.68.0/27
      ports:
        - port: 8080
          protocol: TCP
  egress:
    # CoreDNS.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Database namespace.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: fc-mysql
      ports:
        - port: 3306
          protocol: TCP
    # Redis backplane for multi-replica SignalR / live-status fan-out.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: fc-redis
      ports:
        - port: 6379
          protocol: TCP
    # Traefik VIP / in-cluster Traefik for self-callbacks and public URL
    # generation tests. Include post-DNAT backend ports 8443 + 8080.
    - to:
        - ipBlock:
            cidr: 10.0.56.200/32
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 8080
          protocol: TCP
        - port: 8443
          protocol: TCP
    # Agent egress: LAN/VPN devices may run DM Agent in Generic, Kiosk, Pi,
    # ThinClient, or Server mode. Keep this private-range only.
    - to:
        - ipBlock:
            cidr: 10.0.56.0/24
        - ipBlock:
            cidr: 10.0.57.0/24
        - ipBlock:
            cidr: 10.0.58.0/24
        - ipBlock:
            cidr: 10.0.68.0/27
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 8080
          protocol: TCP
        - port: 8443
          protocol: TCP
        - port: 5000
          protocol: TCP
        - port: 5001
          protocol: TCP
    # Synology NFS cold-tier / artifact mount allowance.
    - to:
        - ipBlock:
            cidr: 10.0.58.3/32
      ports:
        - port: 2049
          protocol: TCP
        - port: 2049
          protocol: UDP
        - port: 111
          protocol: TCP
        - port: 111
          protocol: UDP
 ---
 apiVersion: networking.k8s.io/v1
 kind: NetworkPolicy
 metadata:
  name: fc-devicemgmt-operator-isolation
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt-operator
    app.kubernetes.io/component: operator
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
 spec:
  podSelector:
    matchLabels:
      app: fc-devicemgmt-operator
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - port: 8080
          protocol: TCP
  egress:
    # CoreDNS.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Kubernetes API for KubeOps reconciliation and Deployment UID lookup.
    - to: []
      ports:
        - port: 443
          protocol: TCP
        - port: 6443
          protocol: TCP
    # Agent egress for operator-initiated probes / fallback command dispatch.
    - to:
        - ipBlock:
            cidr: 10.0.56.0/24
        - ipBlock:
            cidr: 10.0.57.0/24
        - ipBlock:
            cidr: 10.0.58.0/24
        - ipBlock:
            cidr: 10.0.68.0/27
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 8080
          protocol: TCP
        - port: 8443
          protocol: TCP
        - port: 5000
          protocol: TCP
        - port: 5001
          protocol: TCP
    # Synology NFS allowance for future cold-tier/audit archival jobs.
    - to:
        - ipBlock:
            cidr: 10.0.58.3/32
      ports:
        - port: 2049
          protocol: TCP
        - port: 2049
          protocol: UDP
        - port: 111
          protocol: TCP
        - port: 111
          protocol: UDP
--- a/apps/fc-devicemgmt/service-web.yaml
+++ b/apps/fc-devicemgmt/service-web.yaml
@@ -0,0 +1,22 @@
 apiVersion: v1
 kind: Service
 metadata:
  name: fc-devicemgmt-web
  namespace: fc-devicemgmt
  labels:
    app: fc-devicemgmt-web
    app.kubernetes.io/name: fc-devicemgmt-web
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
 spec:
  type: ClusterIP
  selector:
    app: fc-devicemgmt-web
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
--- a/apps/fc-devicemgmt/serviceaccount-operator.yaml
+++ b/apps/fc-devicemgmt/serviceaccount-operator.yaml
@@ -0,0 +1,12 @@
 apiVersion: v1
 kind: ServiceAccount
 metadata:
  name: fc-devicemgmt-operator
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt-operator
    app.kubernetes.io/component: operator
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
--- a/apps/fc-redis/fc-redis.yaml
+++ b/apps/fc-redis/fc-redis.yaml
@@ -0,0 +1,171 @@
 # fc-redis — SignalR backplane for cross-product event bus
 #
 # Lands per Q-SO-1 resolution (2026-05-11 PM): SignalR backplane in Phase A,
 # not Phase C as originally drafted. Operator directive: "Redis can be
 # deployed just fine as it's another FlowerCore technology we'll want to
 # manage."
 #
 # Phase A scope (this file):
 #   - Single Redis 7.x Alpine pod
 #   - 1Gi Longhorn RWO PVC for AOF persistence
 #   - ClusterIP Service at `redis.fc-redis.svc.cluster.local:6379`
 #   - No AUTH (in-cluster only; not exposed externally)
 #   - No IngressRoute (backplane is server-to-server only)
 #
 # Consumers (Phase A IMPL across FC services):
 #   - FlowerCore.Signage.Web (OpsConsoleHub)
 #   - FlowerCore.Scoreboard.Web (ScoreboardHub)
 #   - FlowerCore.SignalControl.Web
 #   - FlowerCore.DMS.Web
 #   - Any other product joining the cross-product event bus
 #
 # Each consumer adds:
 #   services.AddSignalR()
 #           .AddStackExchangeRedis(
 #               "redis.fc-redis.svc.cluster.local:6379",
 #               opts => opts.Configuration.ChannelPrefix =
 #                   StackExchange.Redis.RedisChannel.Literal("fc-opsconsole"));
 #
 # Phase B / C follow-ons (out of scope here):
 #   - Redis Sentinel for HA (3-node)
 #   - AUTH password from 1Password Connect (rotate via /rotate-password)
 #   - redis_exporter sidecar for Prometheus scrape
 #   - Network policies restricting which namespaces can dial 6379
 #
 # Design: docs/signage/operations-console-phase-2-design.md §3.5
 # Decision: Q-SO-1 (RESOLVED 2026-05-11 PM)
 # Memory: feedback_blooming_ui_pattern_no_iframes
 ---
 apiVersion: v1
 kind: Namespace
 metadata:
  name: fc-redis
  labels:
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
 ---
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: fc-redis-data
  namespace: fc-redis
 spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: fc-redis-config
  namespace: fc-redis
 data:
  redis.conf: |
    # Phase A — minimal config; no AUTH, no replication.
    bind 0.0.0.0
    protected-mode no
    port 6379
    tcp-backlog 511
    timeout 0
    tcp-keepalive 300
    # Persistence: AOF (fsync every second is the standard SignalR-backplane
    # durability sweet spot — the backplane only needs to survive Redis
    # restarts, not absolute zero loss).
    appendonly yes
    appendfsync everysec
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb
    # Reasonable defaults — let Redis pick most things.
    maxmemory-policy allkeys-lru
    maxmemory 256mb
    # Logging
    loglevel notice
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: fc-redis
  namespace: fc-redis
  labels:
    app: fc-redis
 spec:
  replicas: 1
  strategy:
    type: Recreate           # RWO PVC; do not do rolling update
  selector:
    matchLabels:
      app: fc-redis
  template:
    metadata:
      labels:
        app: fc-redis
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999       # redis:7-alpine default uid
        runAsGroup: 999
        fsGroup: 999
      containers:
        - name: redis
          image: redis:7-alpine
          imagePullPolicy: IfNotPresent
          command: ["redis-server", "/etc/redis/redis.conf"]
          ports:
            - name: redis
              containerPort: 6379
          resources:
            requests:
              cpu: "50m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "384Mi"
          volumeMounts:
            - name: data
              mountPath: /data
            - name: config
              mountPath: /etc/redis
              readOnly: true
          livenessProbe:
            tcpSocket:
              port: 6379
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 2
            periodSeconds: 5
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: [ALL]
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: fc-redis-data
        - name: config
          configMap:
            name: fc-redis-config
 ---
 apiVersion: v1
 kind: Service
 metadata:
  name: redis
  namespace: fc-redis
 spec:
  type: ClusterIP
  selector:
    app: fc-redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
      protocol: TCP
--- a/apps/guacamole/guacamole.yaml
+++ b/apps/guacamole/guacamole.yaml
@@ -466,11 +466,11 @@ spec:
  itemPath: vaults/IAmWorkin/items/Guacamole JSON Auth
 ---
 ---
-# 1Password-backed credentials for Mac mini VNC access (Phase 1 — 2026-04-28)
+# 1Password-backed credentials for Mac mini VNC access (Phase 1 - 2026-04-28)
 # The operator mints Secret 'macmini-vnc-creds' with keys: username, password, VNC Password
 # Note: '1Password' field label 'VNC Password' -> K8s Secret key 'VNC Password' (space retained)
 # Guacamole VNC connection password is sourced from the 'VNC Password' field.
-# Actual IP is 10.0.56.115 (INFRA VLAN) — the 1P item 'IP' field is kept as backup reference.
+# Actual IP is 10.0.56.115 (INFRA VLAN) - the 1P item 'IP' field is kept as backup reference.
 apiVersion: onepassword.com/v1
 kind: OnePasswordItem
 metadata:
@@ -481,6 +481,7 @@ metadata:
    app.kubernetes.io/part-of: flowercore
 spec:
  itemPath: vaults/IAmWorkin/items/Mac Mini
+---
 # Blue Jay Branding Extension (CSS + translations)
 apiVersion: v1
 kind: ConfigMap
--- a/apps/kubevirt-vms/ci1.yaml
+++ b/apps/kubevirt-vms/ci1.yaml
@@ -411,24 +411,22 @@ spec:
            # Confirmed via debug pod: PVC content IS a real bootable ISO9660
            # (file: "ISO 9660 CD-ROM filesystem data ... (bootable)"), so the
            # only bug was boot priority.
-            # 2026-05-08 PM: ISO presented as a virtio-blk DISK (not cdrom).
-            # Both SATA and SCSI cdrom buses hit OVMF BdsDxe "starting Boot0001
-            # ... Time out" regardless of storage backend (NFS, Longhorn PVC,
-            # containerDisk tmpfs — all rule out IO speed). The qemu cdrom
-            # emulation path appears to have a deep-seated read window issue
-            # under KubeVirt v1.4.0's OVMF firmware.
-            #
-            # Workaround: present the ISO bytes as a regular virtio-blk disk
-            # (model="virtio-non-transitional"). UEFI/OVMF still recognizes
-            # ISO9660 + El Torito boot records on a regular disk, so it can
-            # boot the EFI bootloader the same way it would from a USB stick.
-            # This is also closer to the FlowerCore.Distribution USB-key
+            # 2026-05-08 PM: cdrom bus SCSI + containerDisk delivery. This
+            # combination boots qemu cleanly and reaches OVMF, but OVMF
+            # BdsDxe still hits "starting Boot0001 ... Time out" on the
+            # cdrom — see HANDOFF.md / CODEX-STATUS.md "OPEN — ci1" for the
+            # full diagnostic chain. virtio-blk disk swap was attempted as a
+            # workaround but introduced a separate QEMU rootdisk flock issue
+            # without fixing the underlying OVMF cdrom problem; reverted.
+            # Operator decision needed for next architectural step (OVMF
+            # custom build with extended timeout, KubeVirt version bump,
+            # Hyper-V/VirtualBox-and-export, or BIOS legacy boot). The
+            # containerDisk distribution pipeline (build/save/scp/ctr import)
+            # is proven and ready to reuse for any of those.
            # pattern: the ISO bytes live on a block device, UEFI boots from
            # the GPT/El Torito boot record, Windows installer runs.
            - name: windows-iso
              bootOrder: 1
-              disk:
-                bus: virtio
+              cdrom:
+                bus: scsi
            - name: rootdisk
              bootOrder: 2
              disk:
--- a/apps/monitoring/noc-monitoring.yaml
+++ b/apps/monitoring/noc-monitoring.yaml
@@ -974,6 +974,39 @@ data:
              summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} replica mismatch"
              description: "Spec wants {{ $labels.spec_replicas }} but only {{ $value }} available. Likely a rollout stuck on probe failure, scheduling, or PVC."
+          # Q-MR-3 (2026-05-11): multus memory pressure — catches the next OOM
+          # cascade BEFORE multus is OOM-killed cluster-wide. The 2026-05-10
+          # outage (21h) hit because no alert fired on the rising multus working
+          # set — only downstream blackbox / Traefik / service alerts. With
+          # 1Gi limit (bluejay-infra@eb8693e), 80% = ~800MiB; steady-state
+          # runs ~150-250MiB so this only fires when an avalanche starts.
+          - alert: MultusMemoryPressure
+            expr: |
+              container_memory_working_set_bytes{container="kube-multus"}
+                / container_spec_memory_limit_bytes{container="kube-multus"} > 0.8
+            for: 5m
+            labels:
+              severity: critical
+              alert_channel: thermal_print
+            annotations:
+              summary: "kube-multus memory >80% of limit on {{ $labels.node }} for 5m"
+              description: "kube-multus working set is {{ $value | humanizePercentage }} of its memory limit on node {{ $labels.node }}. If this keeps climbing, multus will OOM and all new pod networking will halt cluster-wide (precedent: 2026-05-10 outage)."
+          # Q-MR-3 (2026-05-11): namespace pending-pod backlog — catches the
+          # operator-leak avalanche pattern BEFORE it cascades into a multus
+          # CNI OOM. Any FC operator (RemoteDesktop / Distribution / WorldBuilder)
+          # emitting pods without ownerReferences will accumulate them when
+          # the operator crashes. >25 pending pods in any namespace for 30m
+          # is the signal to investigate the reconciler.
+          - alert: NamespacePendingPodBacklog
+            expr: sum by (namespace) (kube_pod_status_phase{phase="Pending"}) > 25
+            for: 30m
+            labels:
+              severity: warning
+            annotations:
+              summary: "Namespace {{ $labels.namespace }} has {{ $value }} Pending pods for 30m"
+              description: "Pending pod count in {{ $labels.namespace }} exceeds 25 sustained for 30m. Likely operator-leak avalanche pattern — children emitted without ownerReferences. Risk of multus CNI OOM cascade."
      # Longhorn storage health alerts. Required: longhorn scrape job
      # (added 2026-04-26 — see scrape_configs above). The K8s events
      # for "snapshot becomes not ready to use" are transient lifecycle
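
To make the `NamespacePendingPodBacklog` condition concrete, here is a small sketch that mimics the `sum by (namespace) (...) > 25` aggregation in plain Python. It is illustrative only: the function name and the sample pod list are hypothetical (the 219 figure is taken from the multus commit message), and the `for: 30m` hold period is not modeled.

```python
from collections import Counter

PENDING_THRESHOLD = 25  # taken from the NamespacePendingPodBacklog expr


def namespaces_over_backlog(pods, threshold=PENDING_THRESHOLD):
    """Mimic: sum by (namespace) (kube_pod_status_phase{phase="Pending"}) > threshold.

    `pods` is an iterable of (namespace, phase) pairs.
    """
    pending = Counter(namespace for namespace, phase in pods if phase == "Pending")
    return {namespace: count for namespace, count in pending.items() if count > threshold}


# Hypothetical snapshot: an orphan-pod avalanche in fc-desktop, normal churn elsewhere.
sample = (
    [("fc-desktop", "Pending")] * 219
    + [("monitoring", "Pending")] * 3
    + [("monitoring", "Running")] * 10
)

print(namespaces_over_backlog(sample))  # {'fc-desktop': 219}
```

The comparison is strict, so a namespace sitting at exactly 25 Pending pods does not match; the rule only fires once the count exceeds 25 and stays there for the full `for` window.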
--- a/apps/multus/multus.yaml
+++ b/apps/multus/multus.yaml
@@ -188,13 +188,24 @@ spec:
        - name: kube-multus
          image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
          command: [ "/usr/src/multus-cni/bin/multus-daemon" ]
+          # 2026-05-11: upstream default of 50Mi memory limit OOM-cascades when
+          # an operator-owned namespace accumulates >100 pending pods retrying
+          # CNI ADD. RemoteDesktop emitted 219 orphan rd-browser-only pods
+          # (missing OwnerReferences), kubelet's CNI ADD avalanche pushed multus
+          # over 50Mi, OOMKilled, restarted with even bigger backlog → loop.
+          # 21h cluster outage. See FlowerCore.Notes:
+          #   feedback_multus_50mi_limit_oom_orphan_pod_avalanche.md
+          # 1Gi limit / 512Mi request comfortably handles a 200+ pod CNI
+          # catchup burst on 64GB nodes (nodes are <25% used in steady-state).
+          # Drop back toward 256Mi only after MultusMemoryPressure alert
+          # proves steady-state working set sits well below 200Mi.
          resources:
            requests:
              cpu: "100m"
-              memory: "50Mi"
+              memory: "512Mi"
            limits:
              cpu: "100m"
-              memory: "50Mi"
+              memory: "1Gi"
          securityContext:
            privileged: true
          terminationMessagePolicy: FallbackToLogsOnError
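
The relationship between the new limit and the `MultusMemoryPressure` threshold is simple arithmetic, sketched below. This is a rough illustration under stated assumptions: `parse_memory` is a hypothetical helper that handles only the binary suffixes used in this manifest (`Ki`, `Mi`, `Gi`), not the full Kubernetes quantity grammar, and the working-set values are made up.

```python
# Binary suffixes used by the quantities in this manifest (subset only).
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity):
    """Convert a quantity such as '512Mi' or '1Gi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def memory_pressure(working_set_bytes, limit, threshold=0.8):
    """Mimic the alert expr: working set / memory limit > threshold."""
    return working_set_bytes / parse_memory(limit) > threshold


# 80% of the new 1Gi limit, expressed in MiB.
print(parse_memory("1Gi") / UNITS["Mi"] * 0.8)  # 819.2

# Hypothetical working sets: steady state vs. an avalanche.
print(memory_pressure(250 * UNITS["Mi"], "1Gi"))  # False
print(memory_pressure(900 * UNITS["Mi"], "1Gi"))  # True
```

With the 1Gi limit the threshold works out to 819.2 MiB, which the alert comment rounds to roughly 800 MiB; under the old 50Mi limit the same 80% rule would have tripped at only 40 MiB.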
--- a/apps/telephony/telephony.yaml
+++ b/apps/telephony/telephony.yaml
@@ -127,10 +127,13 @@ spec:
      initContainers:
        - name: fix-data-perms
          image: busybox:latest
-          # Also chown /shared-tts (hostPath /tmp/tts-audio) so the non-root
-          # app user (uid 1654) can write Piper .sln16 files that Asterisk
-          # reads at /var/lib/asterisk/sounds/tts. World-readable (755) is
-          # fine — Asterisk runs as a different uid in the other pod.
+          # Must run as root to chown the hostPath /tmp/tts-audio that may be
+          # root-owned after node reboot. Pod-level runAsNonRoot:true would
+          # otherwise inherit and chown would fail with EPERM (see Notes memory
+          # feedback_hostpath_initcontainer_chown_perms).
+          securityContext:
+            runAsUser: 0
+            runAsNonRoot: false
          command: ["sh", "-c", "chown -R 1654:1654 /data && chown 1654:1654 /shared-tts && chmod 0755 /shared-tts"]
          volumeMounts:
            - name: telephony-data
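
The inheritance behaviour behind this fix can be modeled in a few lines. The sketch below is a simplified illustration, not Kubernetes code: it assumes only that a container-level `securityContext` field takes precedence over the pod-level field of the same name, and it ignores capabilities, SELinux, and everything else. The helper names are hypothetical; the pod-level values come from the telephony commit message.

```python
# Fields that exist at both pod and container level; the container value wins.
OVERRIDABLE = ("runAsUser", "runAsGroup", "runAsNonRoot")


def effective_security_context(pod_ctx, container_ctx):
    """Return the identity settings a container actually runs with (simplified model)."""
    merged = {key: pod_ctx[key] for key in OVERRIDABLE if key in pod_ctx}
    merged.update({key: container_ctx[key] for key in OVERRIDABLE if key in container_ctx})
    return merged


def runs_as_root(ctx):
    """True when the container is configured for uid 0 and root is not forbidden."""
    return ctx.get("runAsUser") == 0 and not ctx.get("runAsNonRoot", False)


# Pod-level settings as described in the telephony commit message.
pod = {"runAsNonRoot": True, "runAsUser": 1654}

before = effective_security_context(pod, {})
after = effective_security_context(pod, {"runAsUser": 0, "runAsNonRoot": False})

print(runs_as_root(before))  # False: chown on a root-owned hostPath fails with EPERM
print(runs_as_root(after))   # True: the init container can chown /shared-tts
```

Both overrides are needed together: setting only `runAsUser: 0` would leave the inherited `runAsNonRoot: true` in force, which conflicts with running as uid 0.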
--- a/tests/bluejay-infra-lint/FleetManifestLintTests.cs
+++ b/tests/bluejay-infra-lint/FleetManifestLintTests.cs
@@ -291,6 +291,184 @@ public sealed class FleetManifestLintTests
        violations.Should().BeEmpty();
    }
    [Fact]
    public void FcDeviceManagement_MustShipExpectedManifestSet()
    {
        var appRoot = Path.Combine(Inventory.BluejayRoot, "apps", "fc-devicemgmt");
        Directory.Exists(appRoot).Should().BeTrue("Sprint 8 Cx-5 owns apps/fc-devicemgmt.");
        var expectedFiles = new[]
        {
            "1password-item.yaml",
            "argocd-application.yaml",
            "certificate-web.yaml",
            "clusterrole-operator.yaml",
            "clusterrolebinding-operator.yaml",
            "deployment-operator.yaml",
            "deployment-web.yaml",
            "ingressroute-web.yaml",
            "namespace.yaml",
            "network-policy.yaml",
            "service-web.yaml",
            "serviceaccount-operator.yaml",
        };
        Directory.GetFiles(appRoot, "*.yaml")
            .Select(Path.GetFileName)
            .Should()
            .BeEquivalentTo(expectedFiles);
        foreach (var expectedFile in expectedFiles)
        {
            FcDeviceManagementDocuments()
                .Should()
                .Contain(document => document.RelativePath == $"fc-devicemgmt/{expectedFile}");
        }
    }
    [Fact]
    public void FcDeviceManagement_ObjectsMustCarryStandardTraceabilityLabels()
    {
        var requiredLabels = new[]
        {
            "app.kubernetes.io/name",
            "app.kubernetes.io/part-of",
            "app.kubernetes.io/managed-by",
            "flowercore.io/tenant-id",
            "flowercore.io/created-by",
        };
        var violations = FcDeviceManagementDocuments()
            .SelectMany(document => requiredLabels
                .Where(label => string.IsNullOrWhiteSpace(document.Scalar("metadata", "labels", label)))
                .Select(label => $"{document.Descriptor} is missing metadata.labels['{label}']."))
            .Concat(FcDeviceManagementDocuments()
                .Where(document => document.Kind == "Deployment")
                .SelectMany(document => requiredLabels
                    .Where(label => string.IsNullOrWhiteSpace(document.Scalar("spec", "template", "metadata", "labels", label)))
                    .Select(label => $"{document.Descriptor} pod template is missing metadata.labels['{label}'].")))
            .Concat(FcDeviceManagementDocuments()
                .Where(document => document.Kind == "Deployment")
                .Where(document => string.IsNullOrWhiteSpace(document.Scalar("spec", "template", "metadata", "annotations", "flowercore.io/audit-trace-id")))
                .Select(document => $"{document.Descriptor} pod template is missing flowercore.io/audit-trace-id."))
            .ToList();
        violations.Should().BeEmpty();
    }
    [Fact]
    public void FcDeviceManagement_IngressMustUseCertManagerAndKeepPublicHostDisabled()
    {
        var appText = string.Join(
            Environment.NewLine,
            Directory.GetFiles(Path.Combine(Inventory.BluejayRoot, "apps", "fc-devicemgmt"), "*.yaml")
                .Select(File.ReadAllText));
        appText.Should().NotContain("certResolver");
        appText.Should().Contain("update.flowercore.io");
        appText.Should().Contain("disabled-until-Q-OIDC-1");
        FcDeviceManagementDocuments()
            .Where(document => document.Kind == "IngressRoute")
            .SelectMany(document => document.MappingSequence("spec", "routes"))
            .Select(route => ManifestNodeExtensions.Scalar(route, "match") ?? string.Empty)
            .Should()
            .Contain(match => match.Contains("Host(`devices.iamworkin.lan`)", StringComparison.Ordinal))
            .And.NotContain(match => match.Contains("Host(`update.flowercore.io`)", StringComparison.Ordinal));
        var certificate = FcDeviceManagementDocuments()
            .Single(document => document.Kind == "Certificate" && document.Name == "fc-devicemgmt-web-tls");
        certificate.Scalar("spec", "issuerRef", "name").Should().Be("step-ca-acme");
        certificate.Scalar("spec", "issuerRef", "kind").Should().Be("ClusterIssuer");
        ManifestNodeExtensions.ScalarSequence(certificate.Root, "spec", "dnsNames")
            .Should()
            .ContainSingle("devices.iamworkin.lan");
    }
    [Fact]
    public void FcDeviceManagement_OperatorRbacMustCoverDevicesAndOwnerLookup()
    {
        var clusterRole = FcDeviceManagementDocuments()
            .Single(document => document.Kind == "ClusterRole" && document.Name == "fc-devicemgmt-operator");
        var allScalars = clusterRole.AllScalars().ToList();
        allScalars.Should().Contain("devices.flowercore.io");
        allScalars.Should().Contain("*");
        allScalars.Should().Contain("deployments");
        allScalars.Should().Contain("get");
        var operatorDeployment = FcDeviceManagementDocuments()
            .Single(document => document.Kind == "Deployment" && document.Name == "fc-devicemgmt-operator");
        operatorDeployment.AllScalars().Should().Contain("FLOWERCORE_KUBERNETES_OWNER_DEPLOYMENT");
        operatorDeployment.AllScalars().Should().Contain("fc-devicemgmt-operator");
    }
    [Fact]
    public void FcDeviceManagement_RuntimeSecretsMustUseOnePasswordItemPattern()
    {
        var item = FcDeviceManagementDocuments()
            .Single(document => document.Kind == "OnePasswordItem" && document.Name == "fc-devicemgmt-runtime");
        item.Scalar("spec", "itemPath")
            .Should()
            .Be("vaults/IAmWorkin/items/FlowerCore DeviceManagement Runtime");
        var appText = string.Join(
            Environment.NewLine,
            Directory.GetFiles(Path.Combine(Inventory.BluejayRoot, "apps", "fc-devicemgmt"), "*.yaml")
                .Select(File.ReadAllText));
        FcDeviceManagementDocuments().Should().NotContain(document => document.Kind == "Secret");
        appText.Should().Contain("secretKeyRef:");
        appText.Should().Contain("secretName: fc-devicemgmt-runtime");
        appText.Should().NotContain("stringData:");
        appText.Should().NotContain("from-literal");
        appText.Should().NotContain("tls.key:");
    }
    [Fact]
    public void FcDeviceManagement_NetworkPoliciesMustAllowLanAgentsSynologyAndDnatPorts()
    {
        var policies = FcDeviceManagementDocuments()
            .Where(document => document.Kind == "NetworkPolicy")
            .ToList();
        policies.Should().HaveCount(2);
        var combinedScalars = policies.SelectMany(policy => policy.AllScalars()).ToList();
        combinedScalars.Should().Contain("10.0.56.0/24");
        combinedScalars.Should().Contain("10.0.57.0/24");
        combinedScalars.Should().Contain("10.0.58.0/24");
        combinedScalars.Should().Contain("10.0.68.0/27");
        combinedScalars.Should().Contain("10.0.58.3/32");
        var combinedEgressPorts = policies.SelectMany(policy => policy.EgressPorts()).ToHashSet(StringComparer.Ordinal);
        combinedEgressPorts.Should().Contain(new[] { "80", "443", "8080", "8443", "2049", "111" });
        var traefikVipPolicies = policies
            .Where(policy => policy.AllScalars().Any(value => value.Contains("10.0.56.200", StringComparison.Ordinal)))
            .ToList();
        traefikVipPolicies.Should().ContainSingle();
        traefikVipPolicies[0].EgressPorts().Should().Contain(new[] { "80", "443", "8080", "8443" });
    }
    [Fact]
    public void FcDeviceManagement_ArgocdApplicationMustMatchApplicationSetDiscoveryConventions()
    {
        var application = FcDeviceManagementDocuments()
            .Single(document => document.Kind == "Application" && document.Name == "infra-fc-devicemgmt");
        application.Namespace.Should().Be("argocd");
        application.Scalar("spec", "source", "repoURL")
            .Should()
            .Be("http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git");
        application.Scalar("spec", "source", "path").Should().Be("apps/fc-devicemgmt");
        application.Scalar("spec", "destination", "namespace").Should().Be("fc-devicemgmt");
    }
    private static IEnumerable<string> ProbeViolations(
        ManifestDocument document,
        YamlMappingNode container,
@@ -314,6 +492,13 @@ public sealed class FleetManifestLintTests
            $"{document.Descriptor} container '{containerName}' still uses {probeKey}.httpGet on /health.",
        };
    }
    private static IReadOnlyList<ManifestDocument> FcDeviceManagementDocuments()
    {
        return Inventory.Documents
            .Where(document => document.RelativePath.StartsWith("fc-devicemgmt/", StringComparison.Ordinal))
            .ToList();
    }
 }
 internal sealed class ManifestInventory
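
For readers who want to try the traceability-label rule outside the C# test project, the following is a language-neutral sketch in Python of what `FcDeviceManagement_ObjectsMustCarryStandardTraceabilityLabels` checks for labels. It is an approximation under assumptions: manifests are represented as plain dicts, the `Kind/name` descriptor format and the sample Deployment are hypothetical, and the `flowercore.io/audit-trace-id` annotation check is left out.

```python
REQUIRED_LABELS = (
    "app.kubernetes.io/name",
    "app.kubernetes.io/part-of",
    "app.kubernetes.io/managed-by",
    "flowercore.io/tenant-id",
    "flowercore.io/created-by",
)


def missing_labels(labels):
    """Return the required labels that are absent or blank."""
    return [label for label in REQUIRED_LABELS if not str(labels.get(label) or "").strip()]


def label_violations(document):
    """Check object labels and, for Deployments, the pod template labels too."""
    metadata = document.get("metadata", {})
    descriptor = f"{document.get('kind')}/{metadata.get('name')}"  # hypothetical format
    violations = [
        f"{descriptor} is missing metadata.labels['{label}']."
        for label in missing_labels(metadata.get("labels") or {})
    ]
    if document.get("kind") == "Deployment":
        template = document.get("spec", {}).get("template", {}).get("metadata", {})
        violations += [
            f"{descriptor} pod template is missing metadata.labels['{label}']."
            for label in missing_labels(template.get("labels") or {})
        ]
    return violations


# Hypothetical Deployment: object labels complete, pod template missing two.
full = {label: "x" for label in REQUIRED_LABELS}
partial = {label: "x" for label in REQUIRED_LABELS[:3]}
deployment = {
    "kind": "Deployment",
    "metadata": {"name": "fc-devicemgmt-web", "labels": full},
    "spec": {"template": {"metadata": {"labels": partial}}},
}

for violation in label_violations(deployment):
    print(violation)
```

Run against the sample, this reports the two `flowercore.io/*` labels missing from the pod template and nothing for the object itself.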
Author	SHA1	Message	Date
Codex	211ecbf294	feat(fc-devicemgmt): add Kubernetes deployment manifests	2026-05-12 15:43:22 -05:00
Codex	f298339152	fix(guacamole): add --- separator between macmini-vnc-creds OnePasswordItem and guacamole-branding ConfigMap Missing document separator caused YAML to merge the OnePasswordItem's top-level `spec: itemPath:` block into the ConfigMap that follows. Result: a ConfigMap with a `.spec` field whose K8s schema does not declare one, triggering ArgoCD's structured-merge diff to fail since 2026-05-11T15:30:54Z: Failed to compare desired state to live state: failed to calculate diff: error calculating structured merge diff: error building typed value from config resource: .spec: field not declared in schema App stayed Healthy (live K8s tolerated the extra field — ConfigMap ignored it) but ArgoCD's diff calc was broken, leaving the app stuck at sync=Unknown for all 21 resources. Adding the missing `---` separator makes the OnePasswordItem and ConfigMap proper sibling YAML documents, each with its own kind-correct schema. Diagnosed during 2026-05-12 morning routine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 09:26:03 -05:00
Codex	6e7d88db49	feat(fc-redis): add SignalR backplane for cross-product event bus (Q-SO-1 Phase A) Per Q-SO-1 operator resolution 2026-05-11 PM, Redis SignalR backplane lands in Phase A (was Phase C deferral). Treats Redis as a managed FC infrastructure component, not a deferred scaling escalation. Lands the minimal Phase A surface: - Namespace fc-redis - Single Redis 7-alpine pod with 1Gi Longhorn RWO PVC - ConfigMap with AOF persistence (everysec), 256Mi maxmemory, allkeys-lru - ClusterIP Service `redis.fc-redis.svc.cluster.local:6379` (in-cluster only) - No AUTH Phase A (Phase B add via 1Password Connect rotation) - No IngressRoute (backplane is server-to-server) Consumers (Phase A IMPL across FC services) add: services.AddSignalR().AddStackExchangeRedis( "redis.fc-redis.svc.cluster.local:6379", opts => opts.Configuration.ChannelPrefix = StackExchange.Redis.RedisChannel.Literal("fc-opsconsole")); Phase B/C follow-ons (not in this commit): Sentinel for HA, AUTH password from 1Password, redis_exporter sidecar for Prometheus, network policies. See FlowerCore.Notes/docs/signage/operations-console-phase-2-design.md section 3.5 (rewritten) and decisions-waiting.html Q-SO-1 (RESOLVED). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 19:02:58 -05:00
Codex	5ae50bd491	fix(telephony): init container runs as root to chown hostPath /tmp/tts-audio The fix-data-perms init container chowns /data (PVC) and /shared-tts (hostPath /tmp/tts-audio on rke2-agent1) to uid 1654 so the non-root telephony-web app can write Piper TTS .sln16 files. Without an explicit container-level securityContext override, the init container inherits pod-level runAsNonRoot:true / runAsUser:1654 and fails with 'chown: /shared-tts: Operation not permitted' the first time the hostPath comes up root-owned after a node reboot. Outage 2026-05-11 23:00 UTC: telephony-web in Init:CrashLoopBackOff for 9 hours (100+ restarts) until init container was bumped to runAsUser:0. Live cluster patched in the same operation; this commit makes the fix durable in git so ArgoCD sync preserves it. See Notes memory: feedback_hostpath_initcontainer_chown_perms Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:37:15 -05:00
Codex	653d4472f5	fix(monitoring): mirror Q-MR-3 MultusMemoryPressure + NamespacePendingPodBacklog alerts Two new preventive alert rules added to the kubernetes-state group of the K8s migration target ConfigMap. The live Podman Prometheus on noc1 has already been updated via FlowerCore.Notes/scripts/monitoring/alerts.yml + sudo cp + podman pod restart monitoring (this commit only locks it in the bluejay-infra K8s mirror so a future migration carries it forward). MultusMemoryPressure (critical, thermal_print): fires when kube-multus working set exceeds 80% of its memory limit for 5m. Catches the next multus OOM cascade BEFORE it kills the daemon cluster-wide. The 2026-05-10 21h outage hit because no alert fired on the rising multus working set; only downstream blackbox / Traefik / service alerts triggered, after the fact. NamespacePendingPodBacklog (warning): fires when any single namespace has >25 Pending pods sustained for 30m. Catches the operator-leak avalanche pattern (orphan pods from a crashed reconciler emitting children without ownerReferences) before it cascades into a CNI OOM. See FlowerCore.Notes: - feedback_multus_50mi_limit_oom_orphan_pod_avalanche - feedback_monitoring_k8s_target_vs_live_podman (workflow) Companion commits: - bluejay-infra@eb8693e (multus memory limit) - FlowerCore.RemoteDesktop@b02c59b (OwnerReferences fix) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 10:42:27 -05:00
Codex	eb8693e1ce	fix(multus): bump kube-multus-ds memory 50Mi/50Mi -> 1Gi/512Mi (prevent OOM cascade) Cluster outage 2026-05-10T17:43 through 2026-05-11 ~10:30 (~21h). Root cause: FlowerCore.RemoteDesktop emitted 219 orphan rd-browser-only-* pods in fc-desktop (missing OwnerReferences — see companion fix in FlowerCore.RemoteDesktop). Kubelet's continuous CNI ADD retries for those pending pods drove a request queue that exceeded the upstream default 50Mi limit on kube-multus-ds. Multus OOMKilled (exit 137), restarted with an even bigger backlog, OOMKilled again, positive feedback loop. Restart counts climbed to 276 / 412 / 261 across the 3 RKE2 nodes. Downstream blast radius: both Traefik pods stuck ContainerCreating (101m + 4h35m), all Longhorn CSI attacher/provisioner/instance-manager stuck, every Prometheus blackbox probe for *.iamworkin.lan failing, UpdateCenterPublicEdgeDown critical on update.flowercore.io, every ArgoCD app showing sync=Unknown because repo-server lost git connectivity. 45 firing Prometheus alerts. Recovery sequence (Q-MR-1 from FlowerCore.Notes morning routine): 1. kubectl patch kube-multus-ds memory live (this commit locks it in git so ArgoCD doesn't revert on next sync) 2. Force-delete the 219 orphan pods (kubectl --grace-period=0 --force) to break the avalanche 3. Rollout restart kube-multus-ds — STABLE after restart with new limit 4. Restart Traefik + Longhorn CSI to clear stuck ContainerCreating 5. Verify update.flowercore.io returns 200 + ArgoCD apps reconcile Tested incrementally: 256Mi limit was insufficient (still OOMed on catchup burst), 512Mi was insufficient on rke2-agent1 (most pods concentrated there), 1Gi/512Mi handled the full 200+ pending pod CNI catchup cleanly with 0 multus restarts after rollout. Nodes are 64GB with <25% used in steady-state, so the ~256Mi typical working-set is well within the new limit. 
Companion change: FlowerCore.RemoteDesktop must set OwnerReferences on every worker pod so future operator crashes don't leak orphans (Q-MR-2). Preventive alerts (Q-MR-3) MultusMemoryPressure + NamespacePendingPodBacklog are coming in a follow-up commit to apps/monitoring/. Memory: feedback_multus_50mi_limit_oom_orphan_pod_avalanche Decisions card: docs/dashboards/decisions-waiting.html Q-MR-1..3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 10:30:05 -05:00
Codex	667777a653	revert(ci1): back to cdrom:scsi (virtio-blk disk hit QEMU flock) The virtio-blk disk swap (commit `84c9feb`) didn't help: qemu fails to acquire the write lock on the rootdisk PVC because the previous launcher's qemu process didn't release it cleanly. Same family of bug as the "stale QEMU flock" already documented in feedback_kubevirt_iso_first_install_bootorder_and_runstrategy, but now triggered on rke2-agent1 instead of agent2. OVMF cdrom timeout is the real blocker and remains open: - ✅ Distribution pipeline (build → save → scp → ctr import on all 3 RKE2 nodes) is proven. localhost/win-server-2025:1.0 lives in each node's containerd k8s.io namespace. - ✅ containerDisk + cdrom:scsi gets qemu domain Running (no NFS Permission denied, no rootdisk flock). - ❌ OVMF BdsDxe times out reading the SCSI cdrom regardless of SecureBoot setting and bus type. Reverting the disk type to cdrom:scsi so the VM lands back on the "qemu Running, OVMF stuck at Boot Manager" state — known-stable and easier to attack than the QEMU-flock state we hit by trying virtio-blk disk. Operator decision for next architectural step (one of): - Custom OVMF firmware build with longer Boot0001 timeout - KubeVirt version bump (v1.5+ has OVMF fixes) - Hyper-V/VirtualBox install + export VHD to ci1 - BIOS legacy boot (Win Server 2025 needs UEFI but install media has a BIOS path) - DataVolume HTTP datasource (CDI internalizes ISO bytes via different code path) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 21:35:00 -05:00