K8s gotcha sweep C7 — extend lint + cover Track A allowlist + scope Notes/k8s
Follow-up to 0b52093 (K8s manifest hardening) closing two real gaps the
prior sweep didn't catch:
1. Public read-write allowlist regression guard (Track A)
- New PublicReadWriteAllowlistHosts set tracks updatecenter.iamworkin.lan
+ updates.iamworkin.lan. The allowlist on those hosts is
GET||HEAD||POST||OPTIONS — POST is required for the bootstrap-JWT
check-in endpoint. PUT/PATCH/DELETE must still 404 at the route.
- New PublicReadWriteIngressRoutes_MustPinGetHeadPostOptionsAllowlist
test enforces the allowlist invariant (3 required methods present,
3 forbidden methods absent).
- Companion conftest.dev policy 08_public_readwrite_allowlist.rego.
2. Selenium NetworkPolicy DNAT backend port audit
- FlowerCore.Notes/k8s/selenium/06-networkpolicy.yaml allowed Traefik
VIP 10.0.56.200:443 + :80 but its 10.42.0.0/16 + 10.43.0.0/16 egress
rules didn't include the post-DNAT backend ports (8443 for Traefik
TLS, 8080 for HTTP). Per feedback_netpol_dnat_backend_port: kube-proxy
DNATs the destination to a backend pod IP+port BEFORE Calico
evaluates the FORWARD chain, so without those backend ports in the
pod CIDR rule, Selenium-driven browser AAT calls to
https://*.iamworkin.lan time out at connect.
- Lint inventory now includes FlowerCore.Notes/k8s/selenium/ so
regressions in this manifest fail fast.
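The item 2 audit can be sketched as a standalone check. This is a minimal Python sketch, not the actual lint test. It assumes each NetworkPolicy has already been parsed into a plain dict (for example by a YAML loader, not shown). The function and helper names are hypothetical. The VIP, the pod CIDR and the 8443/8080 backend ports come from the text above. It only models `ipBlock` peers. Rules without a `to` clause, `except` lists and `endPort` ranges are not handled.

```python
import ipaddress

# Values from item 2 above: Traefik VIP, pod CIDR, post-DNAT backend ports.
TRAEFIK_VIP = ipaddress.ip_address("10.0.56.200")
POD_CIDR = ipaddress.ip_network("10.42.0.0/16")
REQUIRED_BACKEND_PORTS = {8443, 8080}  # 8443 = Traefik TLS, 8080 = HTTP


def _rule_networks(rule):
    """ipBlock CIDRs targeted by one egress rule (other peer kinds are ignored)."""
    nets = []
    for peer in rule.get("to", []):
        cidr = peer.get("ipBlock", {}).get("cidr")
        if cidr:
            nets.append(ipaddress.ip_network(cidr, strict=False))
    return nets


def _rule_tcp_ports(rule):
    """Allowed TCP ports for one rule, or None if the rule opens every TCP port."""
    ports = rule.get("ports")
    if not ports:
        return None  # no ports list means all ports are allowed
    tcp = [p for p in ports if p.get("protocol", "TCP") == "TCP"]
    if any("port" not in p for p in tcp):
        return None  # a TCP entry without a port allows every TCP port
    # Named ports (strings) are kept as-is and will not match the numeric set.
    return {p["port"] for p in tcp}


def missing_dnat_backend_ports(netpol):
    """Backend ports still missing from pod-CIDR egress when the VIP is allowed."""
    egress = netpol.get("spec", {}).get("egress", [])
    if not any(TRAEFIK_VIP in net for rule in egress for net in _rule_networks(rule)):
        return set()  # policy never targets the Traefik VIP, nothing to audit
    covered = set()
    for rule in egress:
        if any(net.overlaps(POD_CIDR) for net in _rule_networks(rule)):
            ports = _rule_tcp_ports(rule)
            if ports is None:
                return set()  # rule already opens every TCP port to the pod CIDR
            covered |= ports
    return REQUIRED_BACKEND_PORTS - covered
```

Running it against a policy shaped like the pre-fix Selenium manifest (VIP on 443/80, pod and service CIDRs on 443/80 only) should report both backend ports as missing.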
Lint scope notes:
- FlowerCore.Notes/k8s/guacamole/ + monitoring/ are historical
scaffolds that have diverged from the live state (bluejay-infra/apps/
is canonical). Operator review is required before bringing them in
line OR decommissioning them, so they stay out of lint scope until that
decision lands (see xxl-regroup-2026-05-03-followup.md "Codex 7 §0").
README hardening:
- New "Public read-write allowlist hosts" entry under "Known gotchas"
documenting the GET||HEAD||POST||OPTIONS pattern + linking the lint.
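The Track A allowlist invariant from item 1 can be approximated with a textual check on the Traefik rule string. This is a minimal Python sketch, not the real `PublicReadWriteIngressRoutes_MustPinGetHeadPostOptionsAllowlist` test. The function name and regexes are hypothetical. It assumes single-host `Host(`...`)` matchers in backtick form, and it accepts `Method(...)` with or without backticks. Because it only collects the method names that appear in the rule, it does not evaluate boolean structure. A negated matcher such as `!Method(...)` would be misread.

```python
import re

# Track A hosts and method sets from item 1 above.
PUBLIC_READWRITE_ALLOWLIST_HOSTS = {"updatecenter.iamworkin.lan", "updates.iamworkin.lan"}
REQUIRED_METHODS = {"GET", "HEAD", "POST", "OPTIONS"}
FORBIDDEN_METHODS = {"PUT", "PATCH", "DELETE"}

_HOST_RE = re.compile(r"Host\(\s*`([^`]+)`\s*\)")
_METHOD_RE = re.compile(r"Method\(\s*`?([A-Za-z]+)`?\s*\)")


def allowlist_violations(rule):
    """Problems with a route rule for an allowlisted host ([] if fine or not in scope)."""
    hosts = set(_HOST_RE.findall(rule))
    if not hosts & PUBLIC_READWRITE_ALLOWLIST_HOSTS:
        return []  # not a Track A read-write host, out of scope for this check
    methods = {m.upper() for m in _METHOD_RE.findall(rule)}
    problems = [f"missing {m}" for m in sorted(REQUIRED_METHODS - methods)]
    problems += [f"forbidden {m}" for m in sorted(FORBIDDEN_METHODS & methods)]
    return problems
```

A rule with no `Method(...)` matcher at all reports every required method as missing. That is the intended failure, because such a route would also let PUT/PATCH/DELETE through.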
Tests: 8/8 lint tests pass.
Companion fix in FlowerCore.Updater repo on branch
codex/k8s-gotcha-fleet-sweep-c7 (k8s/web-deployment.yaml: localhost/ image
needs imagePullPolicy: Never). The FlowerCore.Updater fix targets a
deploy that is currently live, but the bug only bites when a pod is first
scheduled onto a fresh node, so it is not a live production-impact regression.
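The companion rule (a `localhost/` image needs `imagePullPolicy: Never`) is simple enough to lint. This is a minimal Python sketch that assumes the Deployment manifest has been parsed into a dict. The function name and the example image name are hypothetical. It is not the change made in FlowerCore.Updater.

```python
def localhost_image_violations(deployment):
    """Containers using a localhost/ image without imagePullPolicy: Never."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    problems = []
    # Check init containers as well, since they are pulled the same way.
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key, []):
            image = container.get("image", "")
            if image.startswith("localhost/") and container.get("imagePullPolicy") != "Never":
                name = container.get("name", "?")
                problems.append(f"{name}: {image} needs imagePullPolicy: Never")
    return problems
```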
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -101,6 +101,7 @@ curl -sk -X DELETE https://dns.iamworkin.lan/api/v1/servers/<serverId>/zones/iam
- **StatefulSet PVC drift**: `volumeClaimTemplates` needs explicit `volumeMode: Filesystem` or ArgoCD SSA self-heals forever. See memory `feedback_argocd_statefulset_pvc_drift.md`.
- **IngressRoute namespace split**: this RKE2 Traefik install does not allow cross-namespace service refs. Keep the `IngressRoute`, backend `Service`, and TLS secret in the same namespace; if one host is shared across namespaces, duplicate the `Certificate` and move the route next to the destination service.
- **Public read-only hosts**: if a public host fronts a service that also exposes admin writes internally, add a Traefik route match like `Host(...) && (Method(GET) || Method(HEAD))` on the public edge instead of trusting the app to reject unsafe methods.
- **Public read-write allowlist hosts**: if a public host accepts a tightly bounded write surface (e.g. bootstrap-JWT POST), pin the allowlist as `(Method(GET) || Method(HEAD) || Method(POST) || Method(OPTIONS))`. PUT/PATCH/DELETE must still 404 at the route. Track A's `updatecenter.iamworkin.lan` / `updates.iamworkin.lan` are the canonical example. The lint test enforces this invariant.
- **Traefik VIP netpols**: when a `NetworkPolicy` allows `10.0.56.200`, also allow the post-DNAT backend ports (`8443` for TLS plus `8080` or `8000` for HTTP) or Calico will drop the rewritten flow.
- **Auth-safe probes**: services behind API-key or global auth middleware should prefer `tcpSocket` probes unless `/health` is explicitly exempted before the middleware runs.
- **ArgoCD must use internal Gitea URL**: `http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git`, not the external HTTPS URL (step-ca cert isn't trusted by ArgoCD). The `ApplicationSet` and any hand-created `Application` must both use the internal URL.
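The StatefulSet PVC drift entry is mechanical enough to lint. This is a minimal Python sketch that assumes manifests are already parsed into dicts. The function name is hypothetical. It flags any `volumeClaimTemplates` entry that leaves `volumeMode` unset, which is the omission the entry warns about.

```python
def pvc_template_drift(manifest):
    """volumeClaimTemplates in a StatefulSet that omit an explicit volumeMode."""
    if manifest.get("kind") != "StatefulSet":
        return []
    problems = []
    for template in manifest.get("spec", {}).get("volumeClaimTemplates", []):
        name = template.get("metadata", {}).get("name", "?")
        # An unset volumeMode is what triggers the endless ArgoCD self-heal.
        if "volumeMode" not in template.get("spec", {}):
            problems.append(f"{name}: set volumeMode: Filesystem explicitly")
    return problems
```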