Commit Graph

17 Commits

Author SHA1 Message Date
Andrew Stoltz
3b40dfb185 fix(auth): deploy distribution root anonymous image 2026-06-04 01:19:16 -05:00
Andrew Stoltz
36039c1335 fix(auth): deploy distribution oidc image tag 2026-06-04 01:04:44 -05:00
Andrew Stoltz
933fea89d1 feat(auth): adopt oidc apps in gitops 2026-06-04 00:49:36 -05:00
Andrew Stoltz
13f9bb7710 fix(distribution): revert OIDC enforcement — enabling it gated /healthz probe (service down)
Flipping Auth__Enabled=true gated the /healthz readiness probe (302->NotReady->
no endpoints->distribution.iamworkin.lan down, healthz=000). Classic
feedback_k8s_probes_behind_auth_middleware. Revert to false (OIDC env block kept,
gate off) to restore service. Proper fix (AllowAnonymous /healthz + CA-trust +
idempotent Editions seed + OIDC-challenge wiring + browser-proof) -> falcon OIDC lane.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 23:47:29 -05:00
Andrew Stoltz
9a58fd2af6 oidc: flip enforcement ON for knowledge + distribution (no-live-proof, fix-forward)
Operator 2026-06-04: nothing is production yet, flip OIDC + fix-forward (no
browser-proof gate). knowledge: Auth__Enabled false->true (OIDC env already
wired). distribution: add OIDC env block (Authority/Audience/ClientId=distribution,
ClientSecret from distribution-oidc-client) + Enabled=true; public read/entitlement
+ Method() allowlist stay open (OIDC gates admin only). Clients already provisioned
(secrets present). ArgoCD deploys both.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 23:38:48 -05:00
Codex
0f1dc5f871 fix(certs): kill cert-manager renewal loop on 3 broken Certificate specs
Three Certificates requested duration: 2160h (90d) with renewBefore: 720h
(30d). step-ca's ACME provisioner caps cert lifetime at 30d, so it silently
issued 720h certs — making renewBefore EQUAL to the actual cert lifetime.
cert-manager treats the cert as needing immediate renewal the moment it's
issued, creates a CertificateRequest, gets a new (still 30d) cert, marks
it for immediate renewal, and loops.

Damage on 2026-05-07 ~20:30 (caught during regroup after 5h gap):
  - fc-worldbuilder/worldbuilder-web-tls:  2365 CRs in 18h
  - fc-distribution/fc-distribution-tls:  10880 CRs in 18h
  - knowledge/knowledge-tls:              10888 CRs in 18h
  Total: 24,133 stale CertificateRequest objects in etcd.

Bulk-deleted all CRs + Orders in those 3 namespaces, then this commit
fixes the source so ArgoCD sync stops re-creating the loop.

Fix: match the working 720h/240h pattern used by every other FC service
cert (agent-zero, fc-dns, fc-llm-bridge, fc-php, traefik-system, etc.).
30d cert lifetime + 10d renewal headroom = renewal at day 20, which is
the cert-manager standard 2/3-of-lifetime practice.

Side effect during loop: ALSO contributed to step-ca load and may have
caused intermittent timeouts cluster-wide (the latest stuck challenge
was timing out dialing step-ca:9443 even though step-ca itself was up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:32:00 -05:00
Codex
a4aa612373 deploy(fc-distribution): roll startup backfill fix 2026-05-06 19:51:11 -05:00
Codex
bf6f542569 deploy(fc-distribution): roll latest endpoint fix 2026-05-06 19:38:26 -05:00
Codex
e150b2102f deploy(fc-distribution): roll phase1 api image 2026-05-06 19:31:22 -05:00
Andrew Stoltz
436185818d fc-distribution: restrict public IngressRoute to GET+HEAD only
Live verification 2026-04-24 caught POST /blobs on dist.flowercore.io
returning 201 Created with the blob persisted — admin write operations
reachable on the public surface. Controller-level strict entitlement
was on, but that gates reads; writes weren't blocked at all.

Fix: add Method(GET) || Method(HEAD) to the Host match on the public
IngressRoute. POST/PUT/PATCH/DELETE now miss every route for
dist.flowercore.io and Traefik returns 404 before the pod sees the
request. Edge-level defense-in-depth on top of the controller's
strict-mode entitlement check.

The internal IngressRoute for dist.iamworkin.lan stays unrestricted —
admin POST /blobs + POST /manifests flows keep working from the lab.
2026-04-23 20:12:25 -05:00
Andrew Stoltz
c3cc404beb fc-distribution: add dist.flowercore.io public surface (Cloudflare A record + Origin Cert + profile-header middleware)
Lights up dist.flowercore.io end-to-end:
- cf-origin-flowercore-io Secret (literal *.flowercore.io Origin Cert,
  copied from the telephony/gitea-public/matrix/mail/flowercore/fc-landing
  pattern — not via OnePasswordItem yet).
- Traefik Middleware dist-public-profile-header: strips any caller-supplied
  X-FC-Distribution-Profile, injects 'public' so the controller's
  NamedEntitlementResolverRouter routes to the strict resolver.
- IngressRoute fc-distribution-public: Host(`dist.flowercore.io`) ->
  same backing Service as the internal dist.iamworkin.lan route.
  Middleware attached; cert secret cf-origin-flowercore-io.

Cloudflare DNS A record dist.flowercore.io -> 74.40.140.24 (proxied)
already created 2026-04-24 via Cloudflare API (record id
e9b957511556f37ff6763f4441acbc45).

Controller entitlement config is still DefaultAllow=false + empty
PublicEditions on the 'public' profile, so every public request
returns 403 by default. Populate FlowerCore__Distribution__EntitlementPublic__PublicEditions__0
via env var when ready to expose specific editions.
2026-04-23 20:10:29 -05:00
Andrew Stoltz
90627819cc fc-distribution: bump to v202604240010 (Phase 4 header-routing controller) 2026-04-23 19:23:35 -05:00
Andrew Stoltz
209bdc16cd fc-distribution: bump to v202604232310 (Front D entitlement wired into ManifestsController) 2026-04-23 18:11:21 -05:00
Andrew Stoltz
61538d3712 fc-distribution: bump to v202604232212 (disable 30MB body size limit on POST /blobs) 2026-04-23 17:11:56 -05:00
Andrew Stoltz
ccaac367af fc-distribution: bump to v202604232206 (adds POST /blobs endpoint) 2026-04-23 17:07:13 -05:00
Andrew Stoltz
5b6c7b97fc feat(fc-distribution): bump image to v202604232145 — cert chain endpoint
- Serves GET /manifests/{edition}/{version}.cert (leaf+intermediate PEM)
- Adds CertChainPem migration on startup (nullable column)
- ManifestSignService now embeds version-specific certChainUrl

Provisioning Agent's verify step will flip from ChainNotServed (Phase 2A
soft-pass) to Valid once a fresh edition is published with this image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:43:56 -05:00
Andrew Stoltz
8a960ffc73 feat(fc-distribution): K8s manifest for Phase 1 edition publisher
Adds apps/fc-distribution/{fc-distribution.yaml,kustomization.yaml,README.md}.
Ships the FlowerCore.Distribution service (Blazor + REST + MCP) backed by
Synology NFS for SQLite catalog + content-addressed blob root.

Contents:
- Namespace fc-distribution
- 3x OnePasswordItem (FlowerCore Code Signing CA informational + per-edition
  signing keys for kiosk-standard and aistation-field)
- Deployment: localhost/fc-distribution:v202604232000 (already imported to
  rke2-server via ctr), pinned to rke2-server nodeSelector because Synology
  NFS ACL restricts writes to that node, emptyDir for /tmp + /app/logs,
  inline NFS for /data (subPath distribution/data) and /blobs (subPath
  distribution/blobs), Secret volume mounts for /signing/<edition>.
  readOnlyRootFilesystem + runAsUser 1654 + drop ALL capabilities.
  Probes: startup + readiness on /healthz, liveness on tcpSocket (defense
  against future auth middleware accidentally gating /healthz).
- Service (ClusterIP :80 -> container :8080)
- Certificate (cert-manager ClusterIssuer step-ca-acme, dist.iamworkin.lan,
  90d / 30d renew). pfSense Unbound override dist.iamworkin.lan ->
  10.0.56.200 already in place (req'd for HTTP-01).
- IngressRoute (Traefik websecure, Host rule on dist.iamworkin.lan)

Env var keys align with the scaffold:
  FlowerCore__Database__ConnectionStrings__Sqlite
  FlowerCore__Distribution__Blobs__Root
  FlowerCore__Distribution__Signing__EditionCerts__<slug>__{CertPath,KeyPath}

Consumer: ProvisioningAgent (USB-side, Phase 2) — see
FlowerCore.Notes/docs/infrastructure/usb-provisioning-architecture.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:59:50 -05:00