Commit Graph

363 Commits

Author SHA1 Message Date
Andrew Stoltz
cae03296f5 intranet: bake Notes corpus into image, drop init container
Cluster egress to github.com is fronted by a step-ca TLS proxy that
returns 404 page not found for unmatched routes — git clone of the
public FlowerCore.Notes repo failed inside the pod even with
GIT_SSL_NO_VERIFY=true. Rather than chase the egress NetworkPolicy /
proxy config, bake the docs corpus directly into the image at
/srv/flowercore-notes/docs.

The corpus is just *.md + *.html (369 files, 2.7 MB uncompressed) —
small enough that re-baking on every deploy is fine and avoids any
runtime network dependency.

Manifest changes:
- Image bump: v202604240040search -> v202604240050corpus
- Removed initContainers (clone-notes-corpus is now redundant)
- Removed notes-corpus emptyDir + its volumeMounts
- Vector-store PVC mount stays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:50:00 -05:00
Andrew Stoltz
3c5c1a07bd fix(monitoring): netpol egress allows for fc-desktop + Traefik hairpin
Adds two egress allows to monitoring-netpol so Prometheus can scrape
FlowerCore.RemoteDesktop:

1. fc-desktop namespace on port 8080 — direct ClusterIP service
   target (remotedesktop-web.fc-desktop:8080).
2. traefik-system namespace pods on ports 8080 + 8443 — covers the
   Traefik VIP hairpin path for the `https://desktop.iamworkin.lan`
   scrape target (CoreDNS wildcard resolves iamworkin.lan hostnames
   to the LB VIP; after kube-proxy DNAT, egress needs the backend
   pod port allowed per feedback_netpol_dnat_backend_port).

Without these, the fc-remotedesktop scrape times out with "context
deadline exceeded" even though the monitoring-netpol already allows
the 10.0.56.0/24 CIDR — post-DNAT the destination is a 10.42.x.x
pod IP, not the VIP.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:47:50 -05:00
Andrew Stoltz
057595de3d intranet: GIT_SSL_NO_VERIFY=true in clone-notes-corpus init container
Cluster egress is fronted by a step-ca TLS proxy whose cert doesn't
match github.com. The init container's git clone failed with
"SSL: no alternative certificate subject name matches target hostname
'github.com'". The Notes repo is public — there is no secret to
protect on the wire — so GIT_SSL_NO_VERIFY=true is the right tradeoff
here. Tag at v202604240040search.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:46:20 -05:00
Andrew Stoltz
b02bb4be38 intranet: deploy v202604240040search with Notes corpus + vector store
Phase 3 lane 1 of FlowerCore.Shared.Indexing rollout — wires the new
search consumer in FlowerCore.Intranet.Web to live infrastructure.

Manifest changes:
- Image bump: localhost/fc-intranet-web:latest -> :v202604240040search.
  Built from FlowerCore.Intranet.Web@feat/shared-indexing-search and
  imported into all three RKE2 nodes (rke2-server, rke2-agent1, rke2-agent2)
  via ctr import. Both :latest and :v202604240040search tags are present.
- New PersistentVolumeClaim intranet-vector-store (1Gi, ReadWriteOnce,
  Longhorn) mounted at /data for the SQLite vector store
  (intranet-vectors.db).
- New emptyDir volume notes-corpus (1Gi sizeLimit) shared between the
  init container and main container, mounted at /srv/flowercore-notes
  (read-only in the main container).
- New init container clone-notes-corpus (alpine/git) that shallow-clones
  https://github.com/astoltz/FlowerCore.Notes.git
  (codex/notes-pimanager-live-drift) into /srv/flowercore-notes on every
  pod start. Re-clone is cheap (depth=1) and re-runs of git fetch +
  reset --hard are idempotent.
- Strategy switched to Recreate for the deployment, since the new RWO
  PVC blocks rolling updates — see CLAUDE.md memory "RWO PVC blocks K8s
  rolling updates".
- Resource bumps: memory 128Mi -> 256Mi req, 512Mi -> 1Gi limit; CPU
  500m -> 1000m limit. The DocsCorpusIndexer + Ollama HTTP calls add
  measurable load during the initial index build.
- initialDelaySeconds bumps on both probes (10s -> 30s liveness, 5s ->
  10s readiness) to account for startup-time Ollama probing and the
  slightly larger image.

The DocsCorpusIndexer waits 15s after host startup before its first
indexing pass, then loops every RescanInterval (default 1h). Its first
run will:
1. Embed all *.md under /srv/flowercore-notes/docs against
   nomic-embed-text on edge1 (10.0.57.17:11434).
2. Embed all *.html under /srv/flowercore-notes/docs/dashboards.
3. Persist chunks + embeddings to /data/intranet-vectors.db.

Verify after rollout:
- kubectl -n intranet logs deploy/intranet-web -c clone-notes-corpus
  (init container should show the docs/ listing).
- kubectl -n intranet logs deploy/intranet-web -f
  (DocsCorpusIndexer should log "Indexing docs root 'notes-md'..." then
  "Docs root 'notes-md' indexed: N files, M chunks, M stored").
- curl -sk https://intranet.iamworkin.lan/api/search/indexes
  -> ["notes-html","notes-md"]
- curl -sk 'https://intranet.iamworkin.lan/api/search?q=guacamole+single+host&topK=3'
  -> hits from docs/infrastructure/guacamole-customization-plan.md

Companion source on FlowerCore.Intranet.Web@feat/shared-indexing-search.
Depends on FlowerCore.Common@feat/shared-indexing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:42:03 -05:00
Andrew Stoltz
e44e9a0062 feat(monitoring): RemoteDesktop alerts + scrape jobs + dashboard mount
Three additions to the monitoring ConfigMap, each targeting
FlowerCore.RemoteDesktop:

- **Scrape jobs** (2 new):
  - probe-remotedesktop: blackbox http_2xx against
    https://desktop.iamworkin.lan/health every 30s. Feeds the
    RemoteDesktopWebDown alert.
  - fc-remotedesktop: direct /metrics scrape against
    desktop.iamworkin.lan for the fc_desktop_session_events_total
    and fc_desktop_pool_* series.

- **Alert group `remote-desktop`** (7 rules in alerts.yml):
  - RemoteDesktopWebDown (3m) — /health probe failing
  - RemoteDesktopMetricsStale (10m) — absent metrics series
  - RemoteDesktopPoolDepleted (5m) — pool deficit + depleted flag
  - RemoteDesktopPoolDeficitSustained (10m, info) — persistent
    below-desired pool size
  - RemoteDesktopSessionChurnSpike (5m, info) — launch rate
    >20/min
  - RemoteDesktopRecordingEventsDropped (15m, info) — 30m without
    recording events while launches active
  - RemoteDesktopTlsExpiry (6h, critical) — <2d cert renewal
    window; aligns with feedback_acme_expiry_alert_threshold

- **Grafana dashboard mount**: new volumeMounts + volumes entry for
  `dashboards-remotedesktop` backed by the grafana-dashboard-remotedesktop
  ConfigMap (previously added as a standalone file in d4210c8).
  Folder path /var/lib/grafana/dashboards/remotedesktop — picked up
  by the file-provider with foldersFromFilesStructure:true so the
  dashboard shows up in a "Remotedesktop" folder in Grafana.

No CRLF churn; pure 100-line insertion into LF-normalized file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:41:35 -05:00
Andrew Stoltz
297a2a9bbc ttsreader: bump image to v202604240023 for P3 one-click book render
New surfaces: POST /api/v1/bible/projects (one-click whole-book render),
GET /api/v1/bible/books, GET /api/v1/bible/books/{book}/preview, MCP
tools render_tts_reader_bible_book + list_tts_reader_bible_books,
Dashboard "Render a Bible book" card. 107/107 tests, +7 from previous.
2026-04-24 00:25:48 -05:00
Andrew Stoltz
d4210c819f feat(monitoring): RemoteDesktop Grafana dashboard ConfigMap
Wraps apps/monitoring/flowercore-remotedesktop-grafana-dashboard.json
as a ConfigMap manifest so ArgoCD syncs it into the cluster alongside
the existing grafana-dashboard-* ConfigMaps. Standalone file — does
NOT modify noc-monitoring.yaml. That keeps the CRLF churn on
noc-monitoring.yaml (sibling files apps/intranet/intranet.yaml and
apps/agent-zero/configmaps-bluejay.yaml also carry CRLF churn) out
of this commit.

Dashboard will be synced into the cluster but NOT loaded by Grafana
until a matching `volumes:` entry lands in the Grafana Deployment
in noc-monitoring.yaml:

    - name: dashboard-remotedesktop
      configMap:
        name: grafana-dashboard-remotedesktop

Plus a `volumeMounts:` entry in the grafana container:

    - name: dashboard-remotedesktop
      mountPath: /etc/grafana/provisioning/dashboards/remotedesktop
      readOnly: true

Those edits are deferred to the CRLF-normalization pass on
bluejay-infra so the review diff stays reviewable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:20:40 -05:00
Andrew Stoltz
fc0b67f670 ttsreader: bump image to v202604232334 for iTunes RSS + ID3 tags
Pulls in FlowerCore.TtsReader@9e2497f: P2.3 iTunes-namespace podcast
feed (author, summary, category, cover art, episode numbering,
duration, atom:self link, serial channel type for Bible projects) and
P2.4 ID3v2 tags on MP3 export + Vorbis comments on OGG (title, artist
with Piper voice humanized, album, track N/M, genre defaulting to
Religion & Spirituality for Bible or Audiobook for text sources,
date). Phones and podcast apps now show proper track info instead of
"Unknown - Unknown".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:37:53 -05:00
Andrew Stoltz
223e9a9232 feat(zabbix): add RemoteDesktop monitoring template
New Zabbix 7.2 template under `Templates/FlowerCore` that scrapes
the `/metrics` exposition from FlowerCore.RemoteDesktop and extracts:

- `fc_desktop_session_events_total` split by event (launch/connect/
  disconnect/recording), with a dedicated datapoint for the
  `browser_datasource="json"` slice to track delegated-auth launches.
- `fc_desktop_pool_ready` gauge sum for warm pools.

Trigger: `nodata(flowercore.remotedesktop.metrics,10m)=1` warns when
the public desktop host stops exposing metrics.

Follows the existing `flowercore-print-ollama.yaml` pattern — import
manually into Zabbix and link to the Print/Desktop host. Not a K8s
manifest; ArgoCD ignores.

Grafana dashboard JSON is drafted at
`apps/monitoring/flowercore-remotedesktop-grafana-dashboard.json`
but still needs a ConfigMap wrap + Grafana Deployment volume mount
in noc-monitoring.yaml before it ships (follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:30:32 -05:00
Andrew Stoltz
6c1375b21a ttsreader: bump image to v202604232310 for Media Session API + Bible lexicon
Pulls in FlowerCore.TtsReader@63e6b62: P1.1 Media Session API wiring in
fc-media-session.js + quick-player.js + rendered-chapter-player.js, and
P1.2 biblical-name pronunciation lexicon auto-seed on Bible-source
project creation plus apply-bible-defaults endpoint + MCP tool for
existing projects. Tests 81 -> 97 all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:15:53 -05:00
Andrew Stoltz
82529ed9b5 fix(guacamole): detect managed embeds from client URL 2026-04-23 21:02:05 -05:00
Andrew Stoltz
3ea8a56dab fix(guacamole): disable logout for managed embeds 2026-04-23 20:51:15 -05:00
Andrew Stoltz
9272abc225 Use absolute cluster DNS for ttsreader piper 2026-04-23 20:49:40 -05:00
Andrew Stoltz
436185818d fc-distribution: restrict public IngressRoute to GET+HEAD only
Live verification 2026-04-24 caught POST /blobs on dist.flowercore.io
returning 201 Created with the blob persisted — admin write operations
reachable on the public surface. Controller-level strict entitlement
was on, but that gates reads; writes weren't blocked at all.

Fix: add Method(GET) || Method(HEAD) to the Host match on the public
IngressRoute. POST/PUT/PATCH/DELETE now miss every route for
dist.flowercore.io and Traefik returns 404 before the pod sees the
request. Edge-level defense-in-depth on top of the controller's
strict-mode entitlement check.

The internal IngressRoute for dist.iamworkin.lan stays unrestricted —
admin POST /blobs + POST /manifests flows keep working from the lab.
2026-04-23 20:12:25 -05:00
Andrew Stoltz
c3cc404beb fc-distribution: add dist.flowercore.io public surface (Cloudflare A record + Origin Cert + profile-header middleware)
Lights up dist.flowercore.io end-to-end:
- cf-origin-flowercore-io Secret (literal *.flowercore.io Origin Cert,
  copied from the telephony/gitea-public/matrix/mail/flowercore/fc-landing
  pattern — not via OnePasswordItem yet).
- Traefik Middleware dist-public-profile-header: strips any caller-supplied
  X-FC-Distribution-Profile, injects 'public' so the controller's
  NamedEntitlementResolverRouter routes to the strict resolver.
- IngressRoute fc-distribution-public: Host(`dist.flowercore.io`) ->
  same backing Service as the internal dist.iamworkin.lan route.
  Middleware attached; cert secret cf-origin-flowercore-io.

Cloudflare DNS A record dist.flowercore.io -> 74.40.140.24 (proxied)
already created 2026-04-24 via Cloudflare API (record id
e9b957511556f37ff6763f4441acbc45).

Controller entitlement config is still DefaultAllow=false + empty
PublicEditions on the 'public' profile, so every public request
returns 403 by default. Populate FlowerCore__Distribution__EntitlementPublic__PublicEditions__0
via env var when ready to expose specific editions.
2026-04-23 20:10:29 -05:00
Andrew Stoltz
90627819cc fc-distribution: bump to v202604240010 (Phase 4 header-routing controller) 2026-04-23 19:23:35 -05:00
Andrew Stoltz
c97d486a3d feat(fc-segmentdisplay): switch tls certificate to dns01 2026-04-23 18:39:17 -05:00
Andrew Stoltz
209bdc16cd fc-distribution: bump to v202604232310 (Front D entitlement wired into ManifestsController) 2026-04-23 18:11:21 -05:00
Andrew Stoltz
3999634b06 Seed ttsreader piper voices before startup 2026-04-23 17:18:57 -05:00
Andrew Stoltz
61538d3712 fc-distribution: bump to v202604232212 (disable 30MB body size limit on POST /blobs) 2026-04-23 17:11:56 -05:00
Andrew Stoltz
ccaac367af fc-distribution: bump to v202604232206 (adds POST /blobs endpoint) 2026-04-23 17:07:13 -05:00
Andrew Stoltz
407d473b71 feat(infra): route dns preflight through flowercore dns 2026-04-23 17:03:22 -05:00
Andrew Stoltz
f9593e494a Allow ttsreader piper voice downloads 2026-04-23 16:50:21 -05:00
Andrew Stoltz
5b6c7b97fc feat(fc-distribution): bump image to v202604232145 — cert chain endpoint
- Serves GET /manifests/{edition}/{version}.cert (leaf+intermediate PEM)
- Adds CertChainPem migration on startup (nullable column)
- ManifestSignService now embeds version-specific certChainUrl

Provisioning Agent's verify step will flip from ChainNotServed (Phase 2A
soft-pass) to Valid once a fresh edition is published with this image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:43:56 -05:00
Andrew Stoltz
a76eeb5c39 Add dedicated selectable piper for ttsreader 2026-04-23 16:37:03 -05:00
Andrew Stoltz
8a960ffc73 feat(fc-distribution): K8s manifest for Phase 1 edition publisher
Adds apps/fc-distribution/{fc-distribution.yaml,kustomization.yaml,README.md}.
Ships the FlowerCore.Distribution service (Blazor + REST + MCP) backed by
Synology NFS for SQLite catalog + content-addressed blob root.

Contents:
- Namespace fc-distribution
- 3x OnePasswordItem (FlowerCore Code Signing CA informational + per-edition
  signing keys for kiosk-standard and aistation-field)
- Deployment: localhost/fc-distribution:v202604232000 (already imported to
  rke2-server via ctr), pinned to rke2-server nodeSelector because Synology
  NFS ACL restricts writes to that node, emptyDir for /tmp + /app/logs,
  inline NFS for /data (subPath distribution/data) and /blobs (subPath
  distribution/blobs), Secret volume mounts for /signing/<edition>.
  readOnlyRootFilesystem + runAsUser 1654 + drop ALL capabilities.
  Probes: startup + readiness on /healthz, liveness on tcpSocket (defense
  against future auth middleware accidentally gating /healthz).
- Service (ClusterIP :80 -> container :8080)
- Certificate (cert-manager ClusterIssuer step-ca-acme, dist.iamworkin.lan,
  90d / 30d renew). pfSense Unbound override dist.iamworkin.lan ->
  10.0.56.200 already in place (req'd for HTTP-01).
- IngressRoute (Traefik websecure, Host rule on dist.iamworkin.lan)

Env var keys align with the scaffold:
  FlowerCore__Database__ConnectionStrings__Sqlite
  FlowerCore__Distribution__Blobs__Root
  FlowerCore__Distribution__Signing__EditionCerts__<slug>__{CertPath,KeyPath}

Consumer: ProvisioningAgent (USB-side, Phase 2) — see
FlowerCore.Notes/docs/infrastructure/usb-provisioning-architecture.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:59:50 -05:00
Andrew Stoltz
686dbacc66 Bump TTS Reader image for render follow-along 2026-04-23 15:54:07 -05:00
Andrew Stoltz
5ccf055465 check-pfsense-dns: add live-cluster scan
Extends the pre-merge DNS gate to (optionally) scan live-cluster
Certificates + IngressRoutes via kubectl. Closes the coverage hole
where a service's IngressRoute gets deployed from its own repo (not
from bluejay-infra/apps/) and the manifests-only scan misses it —
fc-retail/retail-web-tls stuck Issuing for 15h on a missing pfSense
Unbound override was exactly this class of bug.

Auto mode: if kubectl is on PATH and usable, live-scan runs silently.
--live  forces it (and errors out if kubectl can't reach the cluster).
--no-live skips live entirely (CI path with no cluster access).

Immediate live-scan finding on 2026-04-23: 10 orphan *.iamworkin.lan
IngressRoutes from failed e2e / codex / smoke / deleteproof test runs
in fc-php + fc-tenant-default (2026-04-16/17). None have DNS overrides
so their Certificates have been failing to issue for 7 days — the new
CertManagerCertificateNotReady alert will catch them too. Cleanup
(delete abandoned IngressRoutes + Certificates + CertificateRequests)
is a separate task; this check now surfaces them.
2026-04-23 15:51:19 -05:00
Andrew Stoltz
4da60820c6 Deploy TTS Reader queue presentation fix 2026-04-23 15:13:21 -05:00
Andrew Stoltz
1cc4324cfb Deploy TTS Reader import and preview fixes 2026-04-23 14:28:08 -05:00
Andrew Stoltz
bfc755057e fix(agent-zero): use streamable http for chat mcp 2026-04-23 13:54:06 -05:00
Andrew Stoltz
d6008ee205 fix(agent-zero): allow chat mcp pod port 2026-04-23 13:29:36 -05:00
Andrew Stoltz
39fe6f1dba fix(agent-zero): route chat mcp in-cluster 2026-04-23 13:26:10 -05:00
Andrew Stoltz
90fcf0cd5d fix(agent-zero): expose openai provider key 2026-04-23 13:21:12 -05:00
Andrew Stoltz
ffef5c9126 Deploy TTS Reader annotation timeout fix 2026-04-23 13:06:17 -05:00
Andrew Stoltz
634e90a9ee Deploy TTS Reader quick hardening release 2026-04-23 12:47:45 -05:00
Andrew Stoltz
86ccca18e3 Add Chat MCP server to Agent Zero 2026-04-23 12:41:58 -05:00
Andrew Stoltz
1c5caf3f40 Deploy TTS Reader v20260423114016 2026-04-23 11:57:39 -05:00
Andrew Stoltz
d3db19b0ca guacamole: enable json auth for remotedesktop sso 2026-04-23 11:27:30 -05:00
Andrew Stoltz
702a6e4f52 fix(agent-zero): use short DNS name to avoid CoreDNS template hijack
The full FQDN fc-llm-bridge.fc-llm-bridge.svc.cluster.local has 4 dots,
which is less than the pod's ndots:5 threshold. The resolver then
applies every entry in the search list BEFORE falling through to the
bare FQDN, and the CoreDNS 'template iamworkin.lan' catch-all matches
"...svc.cluster.local.iamworkin.lan" and returns Traefik VIP
10.0.56.200. The egress NetworkPolicy blocks that VIP (0.0.0.0/0
EXCEPT 10.0.0.0/8), so curl hangs for 30-134s and returns HTTP 000.

Reference: feedback_coredns_ndots_template_collision memory.

Fix: use "fc-llm-bridge.fc-llm-bridge.svc" (2 dots, still <5 so search
expansion still fires, but the first suffix "...svc.cluster.local"
hits the Kubernetes plugin in CoreDNS and returns the real ClusterIP
10.43.67.125 before the iamworkin.lan template is ever consulted).

Verified: pod-exec curl fc:cheap → HTTP 200 with a real chat.completion
envelope (Ollama/gemma3:4b via bridge).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:02:09 -05:00
Andrew Stoltz
6cbb5d8792 fix(agent-zero): NetworkPolicy egress rule for fc-llm-bridge (ADR-088)
The chat_model flip (62db15c) pointed Agent Zero at
fc-llm-bridge.fc-llm-bridge.svc.cluster.local:8080 but the existing
agent-zero-netpol only allowed egress to specific node IPs
(10.0.56.20:11434, 10.0.57.17:11434, 10.0.57.16:5200, 10.0.56.11:6443)
plus public-internet (with RFC1918 exclusion). ClusterIP traffic to
10.43.0.0/16 was implicitly denied, so pod-exec curl to the bridge
timed out after 134s.

Adds an egress rule allowing TCP 8080 to the fc-llm-bridge namespace
(matched by kubernetes.io/metadata.name which K8s 1.22+ sets
automatically). No ingress changes needed — fc-llm-bridge has no
NetworkPolicy, so the ingress side is already open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:59:17 -05:00
Andrew Stoltz
62db15c69c feat(agent-zero): route chat_model through fc-llm-bridge (ADR-088)
Flips Agent Zero's chat_model from direct local Ollama (gemma3:12b via
the 127.0.0.1:11434 sidecar proxy) to the FlowerCore LLM Bridge
(fc:balanced tier, OpenAI-compatible, Anthropic Claude Sonnet under the
hood) so chat turns are spend-tracked and can dispatch to any provider
via a single tier alias.

Scope is intentionally minimal and reversible:
  - chat_model: ollama/gemma3:12b/127.0.0.1:11434
              → openai/fc:balanced/fc-llm-bridge internal service URL
  - utility_model, embedding_model, browser_model: UNCHANGED
    (stay on local 127.0.0.1 Ollama sidecar — no spend, low latency,
    not worth routing through the bridge for small-model traffic).

Auth: new A0_SET_chat_model_api_key env var wired to the
fc-llm-bridge-api-keys Secret (field: agent-zero-k8s). The Secret is
synced by a new OnePasswordItem pointing at "FC LLM Bridge API Keys"
in the IAmWorkin vault. Bearer-token auth is now accepted by the
bridge (FlowerCore.LlmBridge@3225f1f).

Rollback: revert this commit; old image v202604231449 is still present
on all RKE2 nodes, and Agent Zero's strategy: Recreate makes the flip
atomic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:54:27 -05:00
Andrew Stoltz
84634f59f0 chore(fc-llm-bridge): bump image to v202604231520
Ships the Bearer-token auth fix (FlowerCore.LlmBridge@3225f1f) so Agent
Zero's OpenAI provider can authenticate with Authorization: Bearer in
addition to the original X-Api-Key header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:51:57 -05:00
Andrew Stoltz
4cd5806fd0 fix(fc-llm-bridge): set dnsConfig ndots=2 to prevent CoreDNS wildcard hijack
Pods in this cluster inherit ndots=5. External FQDNs with <5 dots (like
api.anthropic.com) are expanded through the search path first, and the 4th
suffix `api.anthropic.com.iamworkin.lan` matches CoreDNS' `template IN A
iamworkin.lan` wildcard — resolves to Traefik VIP 10.0.56.200. TLS connect
lands on Traefik's default cert and the AnthropicClient rejects with
RemoteCertificateNameMismatch/RemoteCertificateChainErrors.

Setting ndots=2 makes the resolver try the bare FQDN first (3 dots in
api.anthropic.com), so the search path never fires.

Reference: memory feedback_coredns_ndots_template_collision. Wider follow-up:
the CoreDNS template plugin should add fallthrough for external public suffixes,
so every FC service calling external HTTPS APIs stops hitting this trap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:42:17 -05:00
Andrew Stoltz
11c48bef30 chore(fc-llm-bridge): bump to v202604231449 (Budget 1.0.1 multi-provider dispatcher)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:36:05 -05:00
Andrew Stoltz
a86e87050b fix(fc-llm-bridge): anthropic secret key is 'password' not 'credential'
The 1Password item "Claude API Key" stores the key in a standard Password
field (labeled `password`), so the OnePasswordItem operator creates the K8s
Secret with key `password`. Deployment was referencing `credential`, which
made the pod fail with CreateContainerConfigError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:29:32 -05:00
Andrew Stoltz
0214f94ac4 chore(fc-llm-bridge): bump image to v202604231424 (first live tag)
Built from FlowerCore.LlmBridge@6d285b5 (initial scaffold). Imported on all
three RKE2 nodes via podman save + ctr import. Replaces v00000000000000
placeholder — ArgoCD sync will roll the pod.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:28:05 -05:00
Andrew Stoltz
a1b8eb379d feat(fc-llm-bridge): stage ADR-088 manifests (not yet applied)
Staged but NOT applied. Do not git push until the two pre-requisites below
are done. See apps/fc-llm-bridge/README.md for the full order-of-ops.

Manifests (apps/fc-llm-bridge/fc-llm-bridge.yaml, 8 docs):
  - Namespace fc-llm-bridge
  - OnePasswordItem anthropic-api-key (existing Claude API Key item)
  - OnePasswordItem fc-llm-bridge-api-keys (NEW item, pending creation)
  - PersistentVolumeClaim fc-llm-bridge-data (2Gi longhorn)
  - Deployment fc-llm-bridge (port 8080, uid 1654, readOnlyRootFilesystem,
    tcpSocket probes to survive future ApiKeyAuthMiddleware reordering)
  - Service fc-llm-bridge ClusterIP
  - Certificate fc-llm-bridge-cert (step-ca-acme)
  - IngressRoute fc-llm-bridge (fc-llm-bridge.iamworkin.lan, websecure)

Pre-requisites BEFORE git push:
  1. pfSense Unbound override fc-llm-bridge.iamworkin.lan -> 10.0.56.200
     (currently NXDOMAIN -- verified via nslookup and check-pfsense-dns.py).
     Skipping this step puts cert-manager HTTP-01 into ~2h backoff.
  2. Create 1Password item `FC LLM Bridge API Keys` in vault IAmWorkin with
     password fields: agent-zero-ws, agent-zero-k8s, spare-1, spare-2.
  3. Build + import localhost/fc-llm-bridge:v<tag> to rke2-server +
     rke2-agent1 + rke2-agent2. Bump image tag from placeholder
     v00000000000000 before committing the apply.

Related: ADR-088 (FlowerCore.Notes/ARCHITECTURE.md), design doc at
FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 03:10:36 -05:00
Andrew Stoltz
9a1665907c fc-signalcontrol: align live port and selectors 2026-04-22 23:22:14 -05:00
Andrew Stoltz
899804215a statefulsets: align guacamole and matrix drift defaults 2026-04-22 23:11:47 -05:00