From 387097485efcd52fdfdc98153fd0442eb5a81626 Mon Sep 17 00:00:00 2001
From: Andrew Stoltz
Date: Sat, 13 Jun 2026 12:06:31 -0500
Subject: [PATCH] infra(public-tls): add gated Let's Encrypt issuers + tenant NetworkPolicy substrate

Cl-infra-2 (deep-regroup 2026-06-13). LE staging+prod ClusterIssuers
(HTTP-01 via Traefik, DNS-01 stub) + a per-tenant default-deny
NetworkPolicy template, under gated/public-tls/ OUTSIDE apps/ so the
ApplicationSet does NOT auto-apply them (an applied ACME ClusterIssuer
registers an account immediately). Internal *.iamworkin.lan TLS stays on
step-ca. Inert until the operator opens the web-hosting public-exposure
gate (R-1; 14/14 blockers red). Pairs with Codex Wh-C1 (hybrid public
TLS) + Wh-C2 (isolation).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 gated/public-tls/README.md                    | 39 ++++++++++
 gated/public-tls/letsencrypt-issuers.yaml     | 78 +++++++++++++++++++
 .../tenant-networkpolicy-template.yaml        | 59 ++++++++++++++
 3 files changed, 176 insertions(+)
 create mode 100644 gated/public-tls/README.md
 create mode 100644 gated/public-tls/letsencrypt-issuers.yaml
 create mode 100644 gated/public-tls/tenant-networkpolicy-template.yaml

diff --git a/gated/public-tls/README.md b/gated/public-tls/README.md
new file mode 100644
index 0000000..a59b32e
--- /dev/null
+++ b/gated/public-tls/README.md
@@ -0,0 +1,39 @@
+# Public-TLS substrate (gated)
+
+**Lane:** Cl-infra-2 (deep-regroup 2026-06-13). **Status:** authored, **NOT applied** — operator-gated.
+
+This directory holds the Let's Encrypt + isolation substrate for **public** multi-tenant
+web hosting. It lives **outside `apps/`** on purpose: the bluejay-infra ApplicationSet only
+reconciles `apps/*`, so nothing here is auto-applied. Applying a cert-manager ACME
+`ClusterIssuer` registers an ACME account immediately, so these stay inert until the
+operator opens the web-hosting public-exposure gate (**R-1**).
+
+## What's here
+
+| File | What | Activate when |
+|---|---|---|
+| `letsencrypt-issuers.yaml` | `letsencrypt-staging` + `letsencrypt-prod` ClusterIssuers (HTTP-01 via Traefik; DNS-01 stub for wildcards) | Public-go. Move to `apps/cluster-issuers/`, **staging first**. |
+| `tenant-networkpolicy-template.yaml` | Per-tenant default-deny + allowlist NetworkPolicy (Traefik ingress, CoreDNS, own-DB egress only) | Rendered per tenant at provision time (Wh-C2 isolation). |
+
+## The gate
+
+Public exposure is **NO-GO** until the §6 go/no-go checklist in
+[`docs/standards/web-hosting-production-readiness-plan.md`](../../../FlowerCore.Notes/docs/standards/web-hosting-production-readiness-plan.md)
+is green (currently 14/14 red) **and** the operator explicitly opens R-1. Internal
+`*.iamworkin.lan` TLS stays on **step-ca** (`apps/fc-dns/fc-dns.yaml` → `step-ca-dns01`);
+these LE issuers are **only** for public tenant domains.
+
+## Pairing
+
+- **Codex Wh-C1** consumes `letsencrypt-staging`/`-prod` for hybrid public TLS on
+  FlowerCore.PHP/MySQL/DNS.
+- **Codex Wh-C2** consumes the NetworkPolicy template for cross-tenant isolation suites.
+
+## Activation checklist (public-go)
+
+1. Wire a public DNS-01 solver (Cloudflare/Namecheap webhook) **or** confirm public tenant
+   domains route HTTP-01 to the cluster ingress.
+2. `git mv gated/public-tls/letsencrypt-issuers.yaml apps/cluster-issuers/` — staging only.
+3. Issue one **staging** cert for a throwaway public domain; verify the chain in a browser.
+4. Flip that tenant's Certificate `issuerRef` to `letsencrypt-prod`; mind LE rate limits.
+5. Render `tenant-networkpolicy-template.yaml` per tenant; run the Wh-C2 negative suites.
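
Note on checklist steps 3 and 4: the patch adds the issuers but no Certificate, so the following is only a sketch of what a staging request might look like. The resource name, the namespace `fc-tenant-example`, the domain `throwaway.example.com`, and the secret name are all hypothetical; only the issuer names `letsencrypt-staging` and `letsencrypt-prod` come from this patch.

```yaml
# Hypothetical staging Certificate for a throwaway public domain.
# Every name here except the issuer name is an assumption for illustration.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: throwaway-staging-cert
  namespace: fc-tenant-example
spec:
  secretName: throwaway-staging-tls   # Secret cert-manager writes the key pair to
  dnsNames:
    - throwaway.example.com
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: letsencrypt-staging         # step 4 would change this to letsencrypt-prod
```

Under these assumptions, step 4 is a one-line change to `issuerRef.name` once the staging chain has been verified.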
diff --git a/gated/public-tls/letsencrypt-issuers.yaml b/gated/public-tls/letsencrypt-issuers.yaml
new file mode 100644
index 0000000..6f1d33e
--- /dev/null
+++ b/gated/public-tls/letsencrypt-issuers.yaml
@@ -0,0 +1,78 @@
+# ============================================================================
+# Let's Encrypt ClusterIssuers — PUBLIC TLS substrate (Cl-infra-2, deep-regroup 2026-06-13)
+# ============================================================================
+# GATED. This file lives OUTSIDE apps/ on purpose, so the bluejay-infra
+# ApplicationSet does NOT auto-apply it. Applying a cert-manager ACME
+# ClusterIssuer registers an ACME account immediately, so we keep these inert
+# until the operator opens the web-hosting public-exposure gate (R-1; the §6
+# go/no-go checklist in docs/standards/web-hosting-production-readiness-plan.md
+# is currently 14/14 red).
+#
+# Pairs with Codex Wh-C1 (FlowerCore.PHP/MySQL/DNS hybrid public TLS) and
+# Wh-C2 (isolation). Internal *.iamworkin.lan certs STAY on step-ca
+# (apps/fc-dns/fc-dns.yaml: ClusterIssuer step-ca-dns01) — these LE issuers are
+# ONLY for public tenant domains.
+#
+# TO ACTIVATE (operator public-go):
+# 1. Confirm a public DNS-01 solver is wired (Cloudflare/Namecheap webhook) OR
+#    that public tenant domains route HTTP-01 to the cluster's public ingress.
+# 2. Move this file to apps/cluster-issuers/ (the ApplicationSet will create
+#    infra-cluster-issuers and apply it), staging FIRST.
+# 3. Issue ONE staging cert for a throwaway public domain, verify the chain,
+#    THEN switch that tenant's Certificate issuerRef to letsencrypt-prod.
+# 4. Mind LE prod rate limits (50 certs/registered-domain/week, 5 dupes/week).
+#
+# Registration email is for expiry notices only — adjust to a role address if
+# desired (astoltz@iamwork.in is the current operator contact).
+# ----------------------------------------------------------------------------
+apiVersion: cert-manager.io/v1
+kind: ClusterIssuer
+metadata:
+  name: letsencrypt-staging
+  labels:
+    app.kubernetes.io/part-of: flowercore
+    flowercore.io/created-by: bluejay-infra
+    flowercore.io/gate: public-tls
+spec:
+  acme:
+    # LE STAGING — untrusted certs, generous limits. Use this first, always.
+    server: https://acme-staging-v02.api.letsencrypt.org/directory
+    email: astoltz@iamwork.in
+    privateKeySecretRef:
+      name: letsencrypt-staging-account-key
+    solvers:
+      # HTTP-01 via Traefik. Requires the public tenant domain's :80 traffic to
+      # reach the cluster ingress. For wildcard / apex without inbound :80, swap
+      # to the dns01 solver block below (needs a public DNS provider webhook).
+      - http01:
+          ingress:
+            class: traefik
+      # --- DNS-01 alternative for wildcards (uncomment + wire a public DNS webhook) ---
+      # - dns01:
+      #     webhook:
+      #       groupName: acme.flowercore.io  # or the cloudflare/namecheap solver
+      #       solverName:
+---
+apiVersion: cert-manager.io/v1
+kind: ClusterIssuer
+metadata:
+  name: letsencrypt-prod
+  labels:
+    app.kubernetes.io/part-of: flowercore
+    flowercore.io/created-by: bluejay-infra
+    flowercore.io/gate: public-tls
+spec:
+  acme:
+    # LE PRODUCTION — trusted certs, strict rate limits. Only after staging proves out.
+    server: https://acme-v02.api.letsencrypt.org/directory
+    email: astoltz@iamwork.in
+    privateKeySecretRef:
+      name: letsencrypt-prod-account-key
+    solvers:
+      - http01:
+          ingress:
+            class: traefik
+      # - dns01:
+      #     webhook:
+      #       groupName: acme.flowercore.io
+      #       solverName:
diff --git a/gated/public-tls/tenant-networkpolicy-template.yaml b/gated/public-tls/tenant-networkpolicy-template.yaml
new file mode 100644
index 0000000..646530e
--- /dev/null
+++ b/gated/public-tls/tenant-networkpolicy-template.yaml
@@ -0,0 +1,59 @@
+# ============================================================================
+# Per-tenant NetworkPolicy TEMPLATE — web-hosting isolation (Cl-infra-2 / Wh-C2)
+# ============================================================================
+# GATED substrate (outside apps/, not auto-applied). Modeled on the canonical
+# default-deny + allowlist shape in apps/fc-devicemgmt/network-policy.yaml.
+#
+# Purpose: when a public multi-tenant site is provisioned, each tenant's pods
+# get a NetworkPolicy that (a) default-denies all ingress/egress, then (b) allows
+# only Traefik ingress + CoreDNS + that tenant's own DB. This enforces the
+# cross-tenant isolation Wh-C2 verifies with negative suites.
+#
+# Replace the {{TENANT}} placeholders and apply alongside the tenant's workload
+# (the MySQL/PHP managers should emit this when they create a tenant, or a
+# templating step in apps/ should render it). Kept here as the reference shape.
+# ----------------------------------------------------------------------------
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: tenant-{{TENANT}}-isolation
+  namespace: fc-tenant-{{TENANT}}
+  labels:
+    app.kubernetes.io/part-of: flowercore
+    flowercore.io/tenant-id: "{{TENANT}}"
+    flowercore.io/created-by: bluejay-infra
+    flowercore.io/gate: public-tls
+spec:
+  podSelector: {}  # all pods in the tenant namespace
+  policyTypes: [Ingress, Egress]
+  ingress:
+    # Only Traefik may reach tenant pods (public traffic terminates at Traefik).
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: traefik-system
+      ports:
+        - { protocol: TCP, port: 80 }
+        - { protocol: TCP, port: 443 }
+        - { protocol: TCP, port: 8080 }
+  egress:
+    # CoreDNS resolution.
+    - to:
+        - namespaceSelector: {}
+          podSelector:
+            matchLabels:
+              k8s-app: kube-dns
+      ports:
+        - { protocol: UDP, port: 53 }
+        - { protocol: TCP, port: 53 }
+    # This tenant's OWN MySQL only (NOT other tenants' DBs — that's the isolation).
+    - to:
+        - podSelector:
+            matchLabels:
+              flowercore.io/tenant-id: "{{TENANT}}"
+              app.kubernetes.io/name: mysql
+      ports:
+        - { protocol: TCP, port: 3306 }
+    # NOTE: deliberately NO blanket egress. Add per-tenant allowances explicitly
+    # (object storage, mail relay, etc.) so a compromised tenant pod cannot reach
+    # the rest of the fleet or other tenants.
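
Note on the closing NOTE in the template: it leaves extra egress to explicit per-tenant rules but gives no example of one. A minimal sketch of such an allowance follows, assuming a hypothetical mail relay reachable at a single address on the SMTP submission port. The address is taken from a documentation-only range and is not a real relay; the rule would be appended to the `egress:` list of a rendered tenant policy.

```yaml
# Hypothetical per-tenant allowance: outbound SMTP submission to one relay.
# 203.0.113.25 is a documentation address (RFC 5737), used here as a stand-in.
- to:
    - ipBlock:
        cidr: 203.0.113.25/32
  ports:
    - { protocol: TCP, port: 587 }
```

Pinning the rule to a /32 and a single port keeps the allowance as narrow as the rest of the template, so the default-deny posture still covers everything else.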