Three Certificates requested duration: 2160h (90d) with renewBefore: 720h (30d). step-ca's ACME provisioner caps cert lifetime at 30d, so it silently issued 720h certs, making renewBefore EQUAL to the actual cert lifetime. cert-manager treats the cert as needing immediate renewal the moment it's issued, creates a CertificateRequest, gets a new (still 30d) cert, marks it for immediate renewal, and loops.

Damage on 2026-05-07 ~20:30 (caught during regroup after 5h gap):
- fc-worldbuilder/worldbuilder-web-tls: 2365 CRs in 18h
- fc-distribution/fc-distribution-tls: 10880 CRs in 18h
- knowledge/knowledge-tls: 10888 CRs in 18h

Total: 24,133 stale CertificateRequest objects in etcd. Bulk-deleted all CRs + Orders in those 3 namespaces, then this commit fixes the source so ArgoCD sync stops re-creating the loop.

Fix: match the working 720h/240h pattern used by every other FC service cert (agent-zero, fc-dns, fc-llm-bridge, fc-php, traefik-system, etc.). 30d cert lifetime + 10d renewal headroom = renewal at day 20, which is the cert-manager standard 2/3-of-lifetime practice.

Side effect during loop: ALSO contributed to step-ca load and may have caused intermittent timeouts cluster-wide (the latest stuck challenge was timing out dialing step-ca:9443 even though step-ca itself was up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
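The renewal arithmetic in the commit message can be checked with a small shell sketch. `renewal_hour` and `check_renew_window` are hypothetical helper names, not part of cert-manager or step-ca. The check only encodes the condition described above: renewBefore must be strictly smaller than the lifetime the CA actually issues, not the lifetime that was requested.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the renewal-window arithmetic.
# All arguments are whole hours.

# Hour after issuance at which renewal is due: issued lifetime - renewBefore.
renewal_hour() {
  local issued_hours=$1 renew_before_hours=$2
  echo $(( issued_hours - renew_before_hours ))
}

# Fails when the cert would already be due for renewal at issuance.
check_renew_window() {
  local issued_hours=$1 renew_before_hours=$2
  if (( renew_before_hours >= issued_hours )); then
    echo "LOOP: renewBefore ${renew_before_hours}h >= issued lifetime ${issued_hours}h"
    return 1
  fi
  echo "OK: renew at hour $(renewal_hour "$issued_hours" "$renew_before_hours")"
}

check_renew_window 720 720 || true   # broken case: 30d cert, 30d renewBefore
check_renew_window 720 240           # fixed case: hour 480, i.e. day 20
```

The first argument is the issued lifetime (720h after step-ca's cap), which is why the 2160h request does not appear in the check.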
FlowerCore.WorldBuilder
ArgoCD-managed manifest for FlowerCore.WorldBuilder.Web — comic / storyboard authoring service that drives ComfyUI for panel image generation and QuestPDF for letter / A4 export.
Source: D:\git\FlowerCore\FlowerCore.WorldBuilder (master)
Deployment order
- DNS preflight: `worldbuilder.iamworkin.lan -> 10.0.56.200` MUST exist in pfSense Unbound before this manifest is applied, or cert-manager HTTP-01 silently exponential-backs-off ~2h. Memory: `feedback_pfsense_dns_required_for_acme`.
- Image import to ALL RKE2 nodes: pod can schedule to any of `rke2-server` (10.0.56.11), `rke2-agent1` (10.0.56.12), `rke2-agent2` (10.0.56.13). Build with:

  ```bash
  bash deploy/build.sh   # in FlowerCore.WorldBuilder repo
  podman save localhost/fc-worldbuilder:v<TAG> -o /tmp/fc-worldbuilder-v<TAG>.tar
  for h in 10.0.56.11 10.0.56.12 10.0.56.13; do
    scp /tmp/fc-worldbuilder-v<TAG>.tar fcadmin@$h:/tmp/
    ssh fcadmin@$h \
      "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \
       -n k8s.io images import /tmp/fc-worldbuilder-v<TAG>.tar"
  done
  ```

  Memory: `feedback_rke2_image_import_per_node_scp`.
- Bump image tag in `worldbuilder.yaml` and git push. ArgoCD ApplicationSet picks up within ~3 minutes.
- First production render: open `https://worldbuilder.iamworkin.lan`, create World → Character → Storyboard → ExportJob, confirm artifact downloads. ComfyUI lives on BLUEJAY-WS at `http://10.0.56.20:8188`.
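The DNS preflight in the first step can be scripted before applying the manifest. This is a minimal sketch: `resolve_ipv4` and `dns_preflight` are hypothetical names, and it assumes the machine running it resolves through pfSense Unbound (otherwise it checks the wrong resolver).

```bash
#!/usr/bin/env bash
# Sketch of a DNS preflight check; function names are hypothetical.

# Resolve a hostname to its IPv4 addresses via the system resolver.
resolve_ipv4() {
  getent ahostsv4 "$1" | awk '{print $1}' | sort -u
}

# Succeeds if expected_ip is among the addresses the host resolves to.
# RESOLVER may name a different lookup function (useful for testing).
dns_preflight() {
  local host=$1 expected_ip=$2
  local resolver=${RESOLVER:-resolve_ipv4}
  if "$resolver" "$host" | grep -qxF "$expected_ip"; then
    echo "OK: $host -> $expected_ip"
  else
    echo "FAIL: $host does not resolve to $expected_ip" >&2
    return 1
  fi
}

# Usage (values from the deployment notes above):
#   dns_preflight worldbuilder.iamworkin.lan 10.0.56.200
```

A non-zero exit here is a cue to fix the Unbound host override first, rather than waiting out the HTTP-01 backoff.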
Health probes
`startupProbe` + `readinessProbe`: `httpGet /healthz` (registered explicitly in Program.cs; anonymous, no DB or OpenAPI dependency). `livenessProbe`: `tcpSocket` as a cheap fallback. Memory: `feedback_k8s_probes_must_not_hit_openapi`, `feedback_k8s_probes_behind_auth_middleware`.
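Because the probes depend on `/healthz` answering without credentials, it can be useful to confirm that by hand after a deploy. The sketch below uses a hypothetical helper name, `healthz_ok`, and assumes the calling host trusts the certificate served at the base URL (for an internally issued cert, curl would otherwise need a CA bundle).

```bash
#!/usr/bin/env bash
# Sketch: confirm /healthz returns 200 to an unauthenticated request.
# healthz_ok is a hypothetical helper name.
healthz_ok() {
  local base_url=$1 code
  # -s: silent, -o /dev/null: discard body, -w: print only the status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${base_url}/healthz") || code=000
  if [ "$code" = "200" ]; then
    echo "OK: ${base_url}/healthz returned 200"
  else
    echo "FAIL: ${base_url}/healthz returned ${code}" >&2
    return 1
  fi
}

# Usage:
#   healthz_ok https://worldbuilder.iamworkin.lan
```

A 401 or 302 from this check would suggest the endpoint has ended up behind auth middleware, which is the failure mode the memory notes warn about.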
Storage
- Longhorn RWO PVC `worldbuilder-data` (5Gi) mounted at `/data`. SQLite DB lives at `/data/worldbuilder.db`, generated images under `/data/gallery/`, PDF/PNG exports under `/data/exports/`.
- DataProtection keys persist to the same SQLite via `AddFlowerCoreDataProtection<WorldBuilderDbContext>`; explicit migration `20260429133417_Initial` already creates `fc_dp_keys`. Memory: `feedback_dataprotection_keys_persist_to_app_dbcontext`, `feedback_intranet_dataprotection_table_must_have_explicit_migration`.
Image generation backend
FlowerCore:WorldBuilder:ImageGeneration:BaseUrl=http://10.0.56.20:8188 —
ComfyUI runs on BLUEJAY-WS Windows (R9700 / gfx1201 / ROCm 7.2.1). Pod reaches
the workstation directly across the 10.0.56.0/24 VLAN (no Podman-style host-
filter issues — K8s pods route via Calico, which is L3-routed across the
VLAN).
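If a render fails, a quick TCP check against the configured BaseUrl helps separate routing problems from ComfyUI problems. This is a sketch only: `url_host_port` and `tcp_reachable` are hypothetical helper names, it assumes bash and coreutils `timeout` are available where it runs, and it proves only that the port accepts connections, not that ComfyUI is healthy.

```bash
#!/usr/bin/env bash
# Sketch: TCP reachability check for the ComfyUI endpoint.
# url_host_port and tcp_reachable are hypothetical helper names.

# Split "scheme://host:port/path" into "host port".
# Assumes the URL carries an explicit port, as the BaseUrl above does.
url_host_port() {
  local rest=${1#*://}   # drop the scheme
  rest=${rest%%/*}       # drop any path
  echo "${rest%%:*} ${rest##*:}"
}

# Succeeds if a TCP connection opens within 3 seconds.
# Relies on bash's built-in /dev/tcp redirection.
tcp_reachable() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage:
#   read -r host port < <(url_host_port http://10.0.56.20:8188)
#   tcp_reachable "$host" "$port" && echo "ComfyUI port open"
```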