Three Certificates requested duration: 2160h (90d) with renewBefore: 720h
(30d). step-ca's ACME provisioner caps cert lifetime at 30d, so it silently
issued 720h certs — making renewBefore EQUAL to the actual cert lifetime.
cert-manager treats the cert as needing immediate renewal the moment it's
issued, creates a CertificateRequest, gets a new (still 30d) cert, marks
it for immediate renewal, and loops.

Damage on 2026-05-07 ~20:30 (caught during regroup after 5h gap):
- fc-worldbuilder/worldbuilder-web-tls: 2365 CRs in 18h
- fc-distribution/fc-distribution-tls: 10880 CRs in 18h
- knowledge/knowledge-tls: 10888 CRs in 18h
Total: 24,133 stale CertificateRequest objects in etcd.

Bulk-deleted all CRs and Orders in those 3 namespaces. This commit then
fixes the source so ArgoCD sync stops re-creating the loop.

Fix: match the working 720h/240h pattern used by every other FC service
cert (agent-zero, fc-dns, fc-llm-bridge, fc-php, traefik-system, etc.).
A 30d cert lifetime with 10d of renewal headroom means renewal at day 20,
which matches cert-manager's default of renewing 2/3 of the way through
the cert lifetime.
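
The arithmetic behind the loop and the fix can be sketched as follows.
This is a simplified model of the behavior described above (renewal time
taken as notAfter minus renewBefore, with the issuer capping the
lifetime), not cert-manager's or step-ca's actual code. The function
name, the issuer_cap parameter, and the issue timestamp are hypothetical
and only for illustration.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before, requested, renew_before, issuer_cap):
    """Return (not_after, renew_at) under a simplified model.

    Assumption: the issuer silently shortens the requested lifetime to
    issuer_cap, and renewal is scheduled at not_after - renew_before.
    """
    actual = min(requested, issuer_cap)
    not_after = not_before + actual
    return not_after, not_after - renew_before


issued = datetime(2026, 5, 7, tzinfo=timezone.utc)  # illustrative issue time
cap = timedelta(hours=720)  # 30d cap described for the ACME provisioner

# Old config: duration 2160h, renewBefore 720h
_, broken = renewal_time(issued, timedelta(hours=2160), timedelta(hours=720), cap)

# New config: duration 720h, renewBefore 240h
_, fixed = renewal_time(issued, timedelta(hours=720), timedelta(hours=240), cap)

print(broken - issued)  # 0:00:00 -> due for renewal the moment it is issued
print(fixed - issued)   # 20 days, 0:00:00 -> renewal at day 20 of 30
```

With the old values the renewal time coincides with the issue time, so
every fresh cert is already due. With the new values renewal falls 2/3 of
the way through the lifetime.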

Side effect: the loop also added load on step-ca and may have caused
intermittent timeouts cluster-wide (the latest stuck challenge was
timing out dialing step-ca:9443 even though step-ca itself was up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>