bluejay-infra/apps at 0f1dc5f871c3984ef45ceb7870c98a05c07809f2

bluejay/bluejay-infra

Codex 0f1dc5f871 fix(certs): kill cert-manager renewal loop on 3 broken Certificate specs

Three Certificates requested duration: 2160h (90d) with renewBefore: 720h
(30d). step-ca's ACME provisioner caps cert lifetime at 30d, so it silently
issued 720h certs, making renewBefore equal to the actual cert lifetime.
cert-manager therefore treats each cert as due for renewal the moment it
is issued: it creates a CertificateRequest (CR), gets a new (still 30d)
cert, marks that one for immediate renewal, and loops.
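
The loop condition described above can be sketched as a small check. This is a simplified model, not cert-manager's actual scheduling code: it assumes renewal is due at notAfter minus renewBefore and that the issuer clamps the lifetime to its cap. The names `ISSUER_CAP` and `renews_immediately` are hypothetical.

```python
from datetime import timedelta

# Assumption: step-ca clamps every issued cert to 30d, per the commit message.
ISSUER_CAP = timedelta(hours=720)


def renews_immediately(duration, renew_before, issuer_cap=ISSUER_CAP):
    """Return True if renewal is due at (or before) the moment of issuance.

    Simplified model: renewal time = notAfter - renewBefore, where the actual
    lifetime is the requested duration clamped to the issuer's cap.
    """
    actual_lifetime = min(duration, issuer_cap)
    return renew_before >= actual_lifetime


# The broken spec from this commit: 2160h requested, 720h renewBefore.
broken = renews_immediately(timedelta(hours=2160), timedelta(hours=720))
# The same spec against an issuer that would honour the full 90d.
uncapped = renews_immediately(
    timedelta(hours=2160), timedelta(hours=720), issuer_cap=timedelta(hours=2160)
)
print(broken, uncapped)  # True False
```

The spec only misbehaves once the issuer's cap is taken into account, which is why comparing renewBefore against the requested duration alone would not have flagged it.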

Damage on 2026-05-07 ~20:30 (caught during regroup after 5h gap):
  - fc-worldbuilder/worldbuilder-web-tls:  2365 CRs in 18h
  - fc-distribution/fc-distribution-tls:  10880 CRs in 18h
  - knowledge/knowledge-tls:              10888 CRs in 18h
  Total: 24,133 stale CertificateRequest objects in etcd.

All CRs and Orders in those 3 namespaces were bulk-deleted; this commit
fixes the source so ArgoCD sync stops re-creating the loop.

Fix: match the working duration: 720h / renewBefore: 240h pattern used by
every other FC service cert (agent-zero, fc-dns, fc-llm-bridge, fc-php,
traefik-system, etc.). A 30d lifetime with 10d of renewal headroom means
renewal at day 20, which matches cert-manager's default of renewing at
2/3 of the cert lifetime.
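
The arithmetic behind the 720h/240h pattern, as a quick check. The variable names are illustrative only, and the values are taken from the commit message.

```python
from datetime import timedelta
from fractions import Fraction

duration = timedelta(hours=720)      # 30d, within the step-ca cap
renew_before = timedelta(hours=240)  # 10d of renewal headroom

# Time after issuance at which renewal becomes due.
renew_after = duration - renew_before
renewal_day = renew_after // timedelta(days=1)
lifetime_fraction = Fraction(
    renew_after // timedelta(hours=1), duration // timedelta(hours=1)
)
print(renewal_day, lifetime_fraction)  # 20 2/3
```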

Side effect: the loop also added load on step-ca and may have caused
intermittent timeouts cluster-wide (the latest stuck challenge was timing
out dialing step-ca:9443 even though step-ca itself was up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-07 15:32:00 -05:00

fix(agent-zero): export knowledge mcp gate to python builder

2026-04-29 23:32:55 -05:00

Fix Traefik dashboard cert issuer: step-ca-acme

2026-03-10 01:12:08 -05:00

K8s manifest hardening + new bluejay-infra-lint test project

2026-05-04 03:18:04 -05:00

Fix Traefik dashboard cert issuer: step-ca-acme

2026-03-10 01:12:08 -05:00

edge2-services: print.iamworkin.lan Traefik HTTPS for Print.Web (XL Track C)

2026-04-26 14:37:33 -05:00

Fix Traefik dashboard cert issuer: step-ca-acme

2026-03-10 01:12:08 -05:00

apps: fc-chat refactor + fc-menuboard app split

2026-04-16 19:25:25 -05:00

fix(fc-desktop): land 4 NetworkPolicies into bluejay-infra (was deploy-script-only)

2026-05-07 10:27:20 -05:00

fc-distribution

fix(certs): kill cert-manager renewal loop on 3 broken Certificate specs

2026-05-07 15:32:00 -05:00

Add signage service ingress manifests

2026-04-09 15:09:08 -05:00

Add step-ca TLS certs for mysql, php, desktop, signage, fc-landing

2026-04-08 18:20:23 -05:00

K8s manifest hardening + new bluejay-infra-lint test project

2026-05-04 03:18:04 -05:00

apps: fc-chat refactor + fc-menuboard app split

2026-04-16 19:25:25 -05:00

fc-messageboard

K8s manifest hardening + new bluejay-infra-lint test project

2026-05-04 03:18:04 -05:00

Add step-ca TLS certs for mysql, php, desktop, signage, fc-landing

2026-04-08 18:20:23 -05:00

Add step-ca TLS certs for mysql, php, desktop, signage, fc-landing

2026-04-08 18:20:23 -05:00

fc-presentations

Add signage service ingress manifests

2026-04-09 15:09:08 -05:00

Add signage service ingress manifests

2026-04-09 15:09:08 -05:00

fc-segmentdisplay

feat(fc-segmentdisplay): switch tls certificate to dns01

2026-04-23 18:39:17 -05:00

Add step-ca TLS certs for mysql, php, desktop, signage, fc-landing

2026-04-08 18:20:23 -05:00

fc-signalcontrol

K8s manifest hardening + new bluejay-infra-lint test project

2026-05-04 03:18:04 -05:00

deploy(ttsreader): persist voice reference clips on pvc

2026-05-06 20:48:58 -05:00

fix(infra): unstick fc-updater + monitoring ArgoCD apps

2026-05-07 10:11:27 -05:00

Fix Traefik dashboard cert issuer: step-ca-acme

2026-03-10 01:12:08 -05:00

Adopt fc-updater into ArgoCD

2026-05-06 17:33:32 -05:00

Update telephony-web image to v20260324d, resolve merge conflicts

2026-03-24 15:55:52 -05:00

feat(guacamole): add macmini-vnc-creds OnePasswordItem + fix Mac mini connection IPs

2026-04-28 20:09:45 -05:00

deploy(fc-intranet-web): roll v20260506-1737 with Wave 2 specialist galleries

2026-05-06 17:38:22 -05:00

fix(irc): use short name for unrealircd in anope + thelounge configs

2026-04-22 21:23:38 -05:00

fix(certs): kill cert-manager renewal loop on 3 broken Certificate specs

2026-05-07 15:32:00 -05:00

Update telephony-web image to v20260324d, resolve merge conflicts

2026-03-24 15:55:52 -05:00

statefulsets: align guacamole and matrix drift defaults

2026-04-22 23:11:47 -05:00

fix(monitoring): rename bare Grafana dashboard JSONs out of *.json extension

2026-05-07 10:13:37 -05:00

feat(noc-services): wire puppetdb.iamworkin.lan through Traefik step-ca cert

2026-04-28 15:13:20 -05:00

Add infrastructure manifests for 9 services

2026-03-09 16:35:04 -05:00

fix(selenium): GitOps-capture selenium-netpol (was unmanaged anywhere)

2026-05-07 10:30:59 -05:00

Update telephony-web image to v20260324d, resolve merge conflicts

2026-03-24 15:55:52 -05:00

fc-telephony: bump web to v202604252156 (T7 step trail)

2026-04-25 21:56:14 -05:00

traefik-dashboard

Fix Traefik dashboard cert issuer: step-ca-acme

2026-03-10 01:12:08 -05:00

Update telephony-web image to v20260324d, resolve merge conflicts

2026-03-24 15:55:52 -05:00

fix(certs): kill cert-manager renewal loop on 3 broken Certificate specs

2026-05-07 15:32:00 -05:00

feat(zabbix): add RemoteDesktop monitoring template

2026-04-23 23:30:32 -05:00