From d637fe9b30c8617a970d8eb9eea924bcea68d276 Mon Sep 17 00:00:00 2001
From: Codex
Date: Thu, 7 May 2026 10:27:20 -0500
Subject: [PATCH] fix(fc-desktop): land 4 NetworkPolicies into bluejay-infra (was deploy-script-only)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Repeatability gap caught during 2026-05-07 morning regroup. The four
fc-desktop NetworkPolicies (desktop-isolation, fc-desktop-default-deny,
remotedesktop-web-isolation, cm-acme-http-solver-allow) were applied via
FlowerCore.RemoteDesktop/scripts/deploy-web.sh `kubectl apply` calls.
That meant a fresh cluster rebuild from bluejay-infra alone would miss
all of them — Browser Lab session isolation, control-plane allow-list,
and HTTP-01 cert renewal would silently fail to come up.

Canonical FC GitOps pattern is for NetworkPolicies to live alongside
other resources in bluejay-infra. Verified by audit: 6 of 11 cluster
NetworkPolicies (agent-zero, edge2-services, monitoring, noc-services,
telephony, voice) already follow this pattern. fc-desktop was the
outlier; selenium-netpol is also unmanaged and tracked separately.

Source-of-truth split (now documented in fc-desktop.yaml):
- bluejay-infra OWNS: Certificate + IngressRoute + all NetworkPolicies.
- FlowerCore.RemoteDesktop scripts/deploy-web.sh OWNS: Deployment +
  Service ONLY (because `localhost/fc-desktop:linux-xfce` image refs
  require manual ctr import on each node — Deployment in bluejay-infra
  would race the image-import step).

Follow-up commits in FlowerCore.RemoteDesktop will:
- Remove the now-duplicate k8s/{networkpolicy,namespace-default-deny,
  web-networkpolicy,acme-http01-solver-allow}.yaml files.
- Drop the 3 `kubectl_apply_file` lines from scripts/deploy-web.sh.

The 4 NPs in this commit are byte-for-byte identical to what's running
in the cluster today (verified via kubectl get -o yaml diff).
ServerSideApply in the bluejay-infra ApplicationSet will adopt the
existing resources without recreating them.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 apps/fc-desktop/fc-desktop.yaml       |  15 +-
 apps/fc-desktop/network-policies.yaml | 332 ++++++++++++++++++++++++++
 2 files changed, 346 insertions(+), 1 deletion(-)
 create mode 100644 apps/fc-desktop/network-policies.yaml

diff --git a/apps/fc-desktop/fc-desktop.yaml b/apps/fc-desktop/fc-desktop.yaml
index 92846e2..6a45b07 100644
--- a/apps/fc-desktop/fc-desktop.yaml
+++ b/apps/fc-desktop/fc-desktop.yaml
@@ -1,5 +1,18 @@
 # FlowerCore Remote Desktop — TLS + Ingress
-# Deployment and Service managed by deploy script (not ArgoCD)
+#
+# Source-of-truth split:
+#   - bluejay-infra OWNS: Certificate, IngressRoute, all NetworkPolicies
+#     (see network-policies.yaml in this directory).
+#   - FlowerCore.RemoteDesktop scripts/deploy-web.sh OWNS: Deployment +
+#     Service. Reason: image refs like `localhost/fc-desktop:linux-xfce`
+#     only exist on each node's containerd after a manual import, so a
+#     Deployment manifest in bluejay-infra would race the image-import
+#     step and crash-loop.
+#
+# NetworkPolicies moved into bluejay-infra 2026-05-07 — previously they
+# were applied via the deploy script's kubectl apply calls, which broke
+# cluster-rebuild repeatability. See
+# feedback_networkpolicies_belong_in_bluejay_infra.md.
 ---
 apiVersion: cert-manager.io/v1
 kind: Certificate
diff --git a/apps/fc-desktop/network-policies.yaml b/apps/fc-desktop/network-policies.yaml
new file mode 100644
index 0000000..6ea22b0
--- /dev/null
+++ b/apps/fc-desktop/network-policies.yaml
@@ -0,0 +1,332 @@
+# FlowerCore Remote Desktop — NetworkPolicies (GitOps-managed)
+#
+# Moved into bluejay-infra 2026-05-07 as part of the regroup audit. These
+# four policies were previously applied via FlowerCore.RemoteDesktop's
+# scripts/deploy-web.sh `kubectl apply` calls, which meant a fresh cluster
+# rebuild from bluejay-infra alone would miss them — Browser Lab session
+# isolation, control-plane allow-list, and HTTP-01 cert renewal would all
+# silently fail to come up.
+#
+# Source-of-truth contract:
+#   - bluejay-infra OWNS all NetworkPolicy + Certificate + IngressRoute
+#     resources for fc-desktop.
+#   - FlowerCore.RemoteDesktop's scripts/deploy-web.sh continues to own
+#     the Deployment + Service apply (because the image ref
+#     `localhost/fc-desktop:linux-xfce` only exists on each node's
+#     containerd after a manual import — it can't be pulled from a
+#     registry, so a Deployment manifest in bluejay-infra would race the
+#     image-import step and crash-loop).
+---
+# 1) desktop-isolation — Browser Lab session pods.
+#
+# Locks down pods labeled `app.kubernetes.io/name=remote-desktop` (every
+# session pod regardless of template). Allows guacd ingress for the VNC/RDP
+# display lane and remotedesktop-web's pre-handoff probing. Egress: NFS to
+# Synology, DNS, Traefik (cluster + LB VIP), Intranet (Browser Lab home).
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: desktop-isolation
+  namespace: fc-desktop
+  labels:
+    app.kubernetes.io/part-of: remotedesktop
+    app.kubernetes.io/component: isolation
+spec:
+  podSelector:
+    matchLabels:
+      app.kubernetes.io/name: remote-desktop
+  policyTypes:
+    - Ingress
+    - Egress
+  ingress:
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: guacamole
+      ports:
+        - port: 3000
+          protocol: TCP
+        - port: 3001
+          protocol: TCP
+        - port: 5901
+          protocol: TCP
+        - port: 3389
+          protocol: TCP
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: fc-desktop
+          podSelector:
+            matchLabels:
+              app.kubernetes.io/name: remotedesktop-web
+      ports:
+        - port: 3000
+          protocol: TCP
+        - port: 5901
+          protocol: TCP
+  egress:
+    # NFS to Synology
+    - to:
+        - ipBlock:
+            cidr: 10.0.58.3/32
+      ports:
+        - port: 2049
+          protocol: TCP
+        - port: 2049
+          protocol: UDP
+        - port: 111
+          protocol: TCP
+        - port: 111
+          protocol: UDP
+    - to:
+        - ipBlock:
+            cidr: 10.0.58.3/32
+      ports:
+        - port: 445
+          protocol: TCP
+    - to: []
+      ports:
+        - port: 53
+          protocol: UDP
+        - port: 53
+          protocol: TCP
+    - to:
+        - ipBlock:
+            cidr: 10.0.56.200/32
+        - ipBlock:
+            cidr: 10.43.33.87/32
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: traefik-system
+          podSelector:
+            matchLabels:
+              app.kubernetes.io/name: traefik
+      ports:
+        - port: 80
+          protocol: TCP
+        - port: 443
+          protocol: TCP
+        - port: 8000
+          protocol: TCP
+        - port: 8443
+          protocol: TCP
+    - to:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: intranet
+          podSelector:
+            matchLabels:
+              app: intranet-web
+      ports:
+        - port: 5300
+          protocol: TCP
+---
+# 2) fc-desktop-default-deny — namespace-wide catch-all.
+#
+# Selects every pod EXCEPT remotedesktop-web (the public-surface control
+# plane) and applies default-deny semantics for both Ingress and Egress.
+# Closes the gap where session pods land WITHOUT the desktop-isolation
+# policy's `app.kubernetes.io/name=remote-desktop` label, plus prevents
+# arbitrary debug sidecars / kubectl debug images from getting cluster
+# access.
+#
+# CRITICAL: also catches transient cm-acme-http-solver pods (that's the
+# bug this whole regroup chased). The cm-acme-http-solver-allow policy
+# below is the explicit carve-out.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: fc-desktop-default-deny
+  namespace: fc-desktop
+  labels:
+    app.kubernetes.io/part-of: remotedesktop
+    app.kubernetes.io/component: isolation
+spec:
+  podSelector:
+    matchExpressions:
+      - key: app.kubernetes.io/name
+        operator: NotIn
+        values:
+          - remotedesktop-web
+  policyTypes:
+    - Ingress
+    - Egress
+---
+# 3) remotedesktop-web-isolation — control plane explicit allow-list.
+#
+# remotedesktop-web is the only pod label the default-deny excludes, so
+# without this policy the control plane would have wide-open Ingress AND
+# Egress. This re-introduces a tight allow-list:
+#   - Ingress: Traefik only on TCP/8080
+#   - Egress: CoreDNS, K8s API, Guacamole admin, NFS, Intranet,
+#     Traefik (cluster + LB), and the fc-desktop namespace itself
+#     (for session pod readiness probing).
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: remotedesktop-web-isolation
+  namespace: fc-desktop
+  labels:
+    app.kubernetes.io/part-of: remotedesktop
+    app.kubernetes.io/component: isolation
+spec:
+  podSelector:
+    matchLabels:
+      app.kubernetes.io/name: remotedesktop-web
+  policyTypes:
+    - Ingress
+    - Egress
+  ingress:
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: traefik-system
+          podSelector:
+            matchLabels:
+              app.kubernetes.io/name: traefik
+      ports:
+        - port: 8080
+          protocol: TCP
+  egress:
+    # CoreDNS
+    - to:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: kube-system
+          podSelector:
+            matchLabels:
+              k8s-app: kube-dns
+      ports:
+        - port: 53
+          protocol: UDP
+        - port: 53
+          protocol: TCP
+    # K8s API server
+    - to: []
+      ports:
+        - port: 443
+          protocol: TCP
+        - port: 6443
+          protocol: TCP
+    # Guacamole admin
+    - to:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: guacamole
+      ports:
+        - port: 8080
+          protocol: TCP
+    # NFS to Synology
+    - to:
+        - ipBlock:
+            cidr: 10.0.58.3/32
+      ports:
+        - port: 2049
+          protocol: TCP
+        - port: 2049
+          protocol: UDP
+        - port: 111
+          protocol: TCP
+        - port: 111
+          protocol: UDP
+    # Intranet web
+    - to:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: intranet
+          podSelector:
+            matchLabels:
+              app: intranet-web
+      ports:
+        - port: 5300
+          protocol: TCP
+    # Cluster Traefik pods (in-cluster service resolution + Guacamole
+    # routing handoff where web app builds URLs against the public host
+    # but resolves internally).
+    - to:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: traefik-system
+          podSelector:
+            matchLabels:
+              app.kubernetes.io/name: traefik
+      ports:
+        - port: 80
+          protocol: TCP
+        - port: 443
+          protocol: TCP
+        - port: 8080
+          protocol: TCP
+        - port: 8443
+          protocol: TCP
+    # fc-desktop namespace — session pod probing during browser-access
+    # readiness checks.
+    - to:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: fc-desktop
+      ports:
+        - port: 3000
+          protocol: TCP
+        - port: 3001
+          protocol: TCP
+        - port: 5901
+          protocol: TCP
+        - port: 3389
+          protocol: TCP
+---
+# 4) cm-acme-http-solver-allow — cert-manager HTTP-01 carve-out.
+#
+# Without this, fc-desktop-default-deny catches the transient solver pods
+# cert-manager creates for each renewal (they don't carry the
+# remotedesktop-web label). Caused 8-day silent renewal failure on
+# desktop.iamworkin.lan in 2026-04-28..2026-05-07 (see
+# feedback_certmanager_renewal_stuck_when_solver_blocked_by_namespace_default_deny.md).
+#
+# Authorizes:
+#   - Ingress on TCP/8089 from cluster Traefik (which proxies the external
+#     HTTP-01 GET on port 80 through to the solver).
+#   - Egress for cluster DNS (defensive — newer cert-manager probes from
+#     inside the solver too).
+#
+# The `acme.cert-manager.io/http01-solver=true` label is set by
+# cert-manager itself on every solver pod automatically.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: cm-acme-http-solver-allow
+  namespace: fc-desktop
+  labels:
+    app.kubernetes.io/part-of: remotedesktop
+    app.kubernetes.io/component: cert-renewal
+spec:
+  podSelector:
+    matchLabels:
+      acme.cert-manager.io/http01-solver: "true"
+  policyTypes:
+    - Ingress
+    - Egress
+  ingress:
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: traefik-system
+          podSelector:
+            matchLabels:
+              app.kubernetes.io/name: traefik
+      ports:
+        - port: 8089
+          protocol: TCP
+  egress:
+    - to:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: kube-system
+          podSelector:
+            matchLabels:
+              k8s-app: kube-dns
+      ports:
+        - port: 53
+          protocol: UDP
+        - port: 53
+          protocol: TCP