Three Certificates requested duration: 2160h (90d) with renewBefore: 720h
(30d). step-ca's ACME provisioner caps cert lifetime at 30d, so it silently
issued 720h certs, making renewBefore EQUAL to the actual cert lifetime.
cert-manager therefore treats each cert as due for renewal the moment it's
issued: it creates a CertificateRequest, gets a new (still 30d) cert, marks
that one for immediate renewal too, and loops.
Damage on 2026-05-07 ~20:30 (caught during regroup after 5h gap):
- fc-worldbuilder/worldbuilder-web-tls: 2365 CRs in 18h
- fc-distribution/fc-distribution-tls: 10880 CRs in 18h
- knowledge/knowledge-tls: 10888 CRs in 18h
Total: 24,133 stale CertificateRequest objects in etcd.
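A quick back-of-the-envelope check on the counts above (same numbers, nothing newly measured) shows how fast the loop was spinning: the two worst Certificates were minting a CertificateRequest roughly every six seconds. This assumes the 18h window applies equally to all three Certificates; `seconds_per_cr` is an illustrative helper name.

```python
# CR counts copied from the damage report above; the shared 18h window
# for all three Certificates is an assumption.
CR_COUNTS = {
    "fc-worldbuilder/worldbuilder-web-tls": 2365,
    "fc-distribution/fc-distribution-tls": 10880,
    "knowledge/knowledge-tls": 10888,
}
WINDOW_SECONDS = 18 * 3600

def seconds_per_cr(count: int, window_s: int = WINDOW_SECONDS) -> float:
    """Average gap between CertificateRequests over the window."""
    return window_s / count

total = sum(CR_COUNTS.values())
print(f"total: {total:,}")  # total: 24,133
for cert, count in CR_COUNTS.items():
    # worldbuilder: ~27s between CRs; the other two: ~6s
    print(f"{cert}: one CR every ~{seconds_per_cr(count):.0f}s")
```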
Bulk-deleted all CRs + Orders in those 3 namespaces, then this commit
fixes the source so ArgoCD sync stops re-creating the loop.
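Fix: match the working duration: 720h / renewBefore: 240h pattern used by
every other FC service cert (agent-zero, fc-dns, fc-llm-bridge, fc-php,
traefik-system, etc.). Requesting 720h also matches the step-ca cap, so the
requested and issued lifetimes agree. A 30d cert lifetime with 10d of
renewal headroom means renewal at day 20 (30d - 10d), which matches
cert-manager's default of renewing at 2/3 of the cert's lifetime.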
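The renewal arithmetic behind both the loop and the fix, as a simplified model: renewal is due at notAfter - renewBefore, and step-ca clamps the issued lifetime to 30d. This is a sketch, not cert-manager's own renewal-time code; it ignores certificate backdating and clock skew, and `issued_lifetime` / `renewal_offset` are illustrative names.

```python
from datetime import timedelta

# step-ca ACME provisioner cap, per the notes above
STEP_CA_MAX = timedelta(hours=720)

def issued_lifetime(requested: timedelta, cap: timedelta = STEP_CA_MAX) -> timedelta:
    """Lifetime actually issued: the request, silently clamped to the cap."""
    return min(requested, cap)

def renewal_offset(lifetime: timedelta, renew_before: timedelta) -> timedelta:
    """How long after issuance renewal becomes due (notAfter - renewBefore)."""
    return lifetime - renew_before

# Broken: asked for 90d, got 30d, renewBefore 30d -> due at issuance (loop)
broken = renewal_offset(issued_lifetime(timedelta(hours=2160)), timedelta(hours=720))
# Fixed: ask for 30d, get 30d, renewBefore 10d -> due at day 20
lifetime = issued_lifetime(timedelta(hours=720))
fixed = renewal_offset(lifetime, timedelta(hours=240))

print(broken)                     # 0:00:00
print(fixed)                      # 20 days, 0:00:00
print(fixed * 3 == lifetime * 2)  # True: renewal at 2/3 of the lifetime
```

The broken case lands at zero because the requested 90d never reaches the cert; only the clamped 30d lifetime matters for the renewal point.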
Side effect: the loop ALSO added load on step-ca and may have caused the
intermittent cluster-wide timeouts (the latest stuck challenge was timing
out dialing step-ca:9443 even though step-ca itself was up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes after the Phase 2.4 deploy went live at
https://knowledge.iamworkin.lan:
1. **Ollama URL flip**: from BLUEJAY-WS (10.0.56.20:11434) to edge1 Pi 5
(10.0.57.17:11434). Honors the cluster-clean architecture from
bluejay-infra@0f9d56e ("Workstation is private dev hardware and should
not be in the cluster path"). Query-time embeddings (on the order of
milliseconds per query) are fast enough on edge1; bulk index rebuilds
(Phase 2.5+) will need a separate ingestion lane that can opt into the
workstation GPU when present. ArgoCD picks up the env-var change and
rolls the pod automatically, so no image rebuild is needed.
2. **README LIVE status**: flip the staged-not-yet-applied banner to
LIVE 2026-04-27. Pod running, certificate issued, PVC bound, /healthz
returns 200, /api/v1/editions returns [] (initial-deploy state). Bulk
population is left to the Phase 2.5+ admin UI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NOT YET APPLIED: the push to origin/main is gated on the DNS A record
knowledge.iamworkin.lan -> 10.0.56.200 being live. Per memory note
feedback_pfsense_dns_required_for_acme, applying the Certificate before
DNS is in place puts cert-manager into a ~2h HTTP-01 backoff that needs
`kubectl -n knowledge delete order <name>` to recover.
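The DNS gate can be checked mechanically before pushing. A minimal sketch, assuming the check runs on a machine that uses the same pfSense resolver as step-ca (if it does not, a pass here says nothing about what step-ca sees during HTTP-01 validation); `resolve_ipv4` and `dns_ready` are hypothetical helpers, not part of any FlowerCore script.

```python
import socket
from typing import Callable, Iterable

def resolve_ipv4(host: str) -> list[str]:
    """IPv4 addresses returned by this machine's resolver for host."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
    # info[4] is the sockaddr tuple (address, port) for AF_INET
    return sorted({info[4][0] for info in infos})

def dns_ready(host: str, expected_ip: str,
              resolver: Callable[[str], Iterable[str]] = resolve_ipv4) -> bool:
    """True only if host resolves and expected_ip is one of the answers.

    A missing record (gaierror) counts as "not ready": pushing the
    Certificate now would trigger the HTTP-01 backoff described above.
    """
    try:
        return expected_ip in set(resolver(host))
    except socket.gaierror:
        return False

# Usage (hypothetical preflight before `git push`):
#   if not dns_ready("knowledge.iamworkin.lan", "10.0.56.200"):
#       raise SystemExit("DNS A record not live yet; do not push")
```

The resolver is injectable so the gate logic can be exercised without touching real DNS.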
Manifests authored:
- apps/knowledge/knowledge.yaml: Namespace; PVC (knowledge-vector-store,
Longhorn, 20Gi RWO); Deployment (single replica, Recreate strategy,
placeholder image localhost/fc-knowledge-web:v202604272200, runAsNonRoot
as UID 1654, readOnlyRootFilesystem, drop ALL capabilities, /healthz
startupProbe + readinessProbe, tcpSocket livenessProbe); Service
(ClusterIP, port 80 -> 8080); Certificate (step-ca-acme ClusterIssuer,
90d duration); IngressRoute (knowledge.iamworkin.lan, websecure
entrypoint).
- apps/knowledge/kustomization.yaml: lets `kubectl kustomize` preview the
rendered manifests (matches the fc-distribution layout; the ApplicationSet
uses a directory generator).
- apps/knowledge/README.md: deployment-order checklist covering the DNS
preflight, the image build/import loop for all 3 RKE2 nodes, the push
procedure, smoke verification, initial-deploy-state notes (zero editions
until *.db files are pushed to the PVC), resource sizing, and probe +
middleware notes.
Companion artifacts (separate repos, separate commits):
- FlowerCore.Knowledge@eb91eb4 — Dockerfile.deploy at repo root
- FlowerCore.Notes@96cd443 — scripts/deploy-knowledge.sh
Apply order (from apps/knowledge/README.md):
1. Add DNS A record knowledge.iamworkin.lan -> 10.0.56.200 via
FlowerCore.DNS or pfSense web UI.
2. Run `bash scripts/deploy-knowledge.sh` from FlowerCore.Notes. This
builds the image and imports it on all 3 RKE2 nodes with
FLOWERCORE_DEPLOY_SKIP_ROLLOUT=1, since the Deployment doesn't exist on
the cluster yet and there is nothing to roll out.
3. Bump the image tag in this manifest to the freshly imported tag, then
`git push` from this repo to land it on main. ArgoCD picks up the change
within ~3 minutes and creates `infra-knowledge`.
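Step 3's tag bump is easy to get wrong by hand (a stale tag, or a typo pointing at an image no node has imported). A minimal sketch of a hypothetical helper, assuming the manifest contains exactly one `localhost/fc-knowledge-web:` reference and that tags follow the vYYYYMMDDHHMM shape of the placeholder; neither is guaranteed by deploy-knowledge.sh.

```python
import re
from pathlib import Path

# Assumption: tags look like the placeholder v202604272200 (vYYYYMMDDHHMM).
TAG_RE = re.compile(r"v\d{12}")
IMAGE_RE = re.compile(r"(localhost/fc-knowledge-web:)v\d{12}\b")

def bump_image_tag(manifest: str, new_tag: str) -> str:
    """Return manifest text with the single fc-knowledge-web tag replaced."""
    if not TAG_RE.fullmatch(new_tag):
        raise ValueError(f"unexpected tag format: {new_tag!r}")
    # A function replacement inserts the tag literally (no backref parsing).
    updated, n = IMAGE_RE.subn(lambda m: m.group(1) + new_tag, manifest)
    if n != 1:
        raise ValueError(f"expected exactly one image reference, found {n}")
    return updated

def bump_manifest_file(path: Path, new_tag: str) -> None:
    """Rewrite the manifest at path in place."""
    path.write_text(bump_image_tag(path.read_text(), new_tag))

# Usage (hypothetical tag):
#   bump_manifest_file(Path("apps/knowledge/knowledge.yaml"), "v202604281015")
```

Failing loudly on zero or multiple matches is deliberate: a silent no-op would push the placeholder tag and leave the pod in ImagePullBackOff or ErrImageNeverPull.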
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>