Andrew Stoltz a1b8eb379d feat(fc-llm-bridge): stage ADR-088 manifests (not yet applied)
Staged but NOT applied. Do not git push until the three pre-requisites below
are done. See apps/fc-llm-bridge/README.md for the full order-of-ops.

Manifests (apps/fc-llm-bridge/fc-llm-bridge.yaml, 8 docs):
  - Namespace fc-llm-bridge
  - OnePasswordItem anthropic-api-key (existing Claude API Key item)
  - OnePasswordItem fc-llm-bridge-api-keys (NEW item, pending creation)
  - PersistentVolumeClaim fc-llm-bridge-data (2Gi longhorn)
  - Deployment fc-llm-bridge (port 8080, uid 1654, readOnlyRootFilesystem,
    tcpSocket probes to survive future ApiKeyAuthMiddleware reordering)
  - Service fc-llm-bridge ClusterIP
  - Certificate fc-llm-bridge-cert (step-ca-acme)
  - IngressRoute fc-llm-bridge (fc-llm-bridge.iamworkin.lan, websecure)

Pre-requisites BEFORE git push:
  1. pfSense Unbound override fc-llm-bridge.iamworkin.lan -> 10.0.56.200
     (currently NXDOMAIN -- verified via nslookup and check-pfsense-dns.py).
     Skipping this step puts cert-manager HTTP-01 into ~2h backoff.
  2. Create 1Password item `FC LLM Bridge API Keys` in vault IAmWorkin with
     password fields: agent-zero-ws, agent-zero-k8s, spare-1, spare-2.
  3. Build + import localhost/fc-llm-bridge:v<tag> to rke2-server +
     rke2-agent1 + rke2-agent2. Bump image tag from placeholder
     v00000000000000 before committing the apply.

Related: ADR-088 (FlowerCore.Notes/ARCHITECTURE.md), design doc at
FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 03:10:36 -05:00

fc-llm-bridge — staged deployment (ADR-088)

Status: manifests staged, NOT YET APPLIED. Do not git push or sync ArgoCD until steps 1 through 4 below are done, in order.

Design: ../../../FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md
ADR: ADR-088 in ../../../FlowerCore.Notes/ARCHITECTURE.md

Deployment order (do NOT skip / reorder)

1. pfSense Unbound DNS override — REQUIRED FIRST

fc-llm-bridge.iamworkin.lan is not currently in pfSense Unbound. Verified with python bluejay-infra/scripts/check-pfsense-dns.py at staging time.

step-ca (the ACME CA on noc1) resolves names through pfSense Unbound (10.0.56.1), not cluster CoreDNS. If you apply this manifest before adding the DNS override, cert-manager's HTTP-01 challenge fails silently and the Order then sits in exponential backoff for ~2h, unless someone manually runs kubectl -n fc-llm-bridge delete order <order> to force a fresh attempt. See memory feedback_pfsense_dns_required_for_acme.md.

From FlowerCore.Notes:

# 1. Edit HOSTS list in scripts/pfsense-add-dns-overrides.py, append:
#    ("fc-llm-bridge", "10.0.56.200", "cert-manager HTTP-01 target (Traefik VIP)"),
# 2. Source creds + run:
source scripts/credential-helper.sh
export PFSENSE_PASS=$(get_cred "pfSense Admin")
python scripts/pfsense-add-dns-overrides.py

Verify:

nslookup fc-llm-bridge.iamworkin.lan 10.0.56.1
# Expect: Address: 10.0.56.200

Or run the full pre-merge gate from bluejay-infra:

python scripts/check-pfsense-dns.py
# Expect: OK   fc-llm-bridge.iamworkin.lan                   -> 10.0.56.200
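
The internals of check-pfsense-dns.py are not shown here. As a rough sketch of the same kind of check, assuming nslookup is on PATH, the answer section of its output can be parsed and compared against the expected address. The helper names answer_addresses and resolves_to are hypothetical, not part of either repo.

```python
import subprocess


def answer_addresses(nslookup_output: str) -> list:
    """Return the addresses from the answer section of nslookup output.

    The resolver's own "Address:" line is printed before the first "Name:"
    line, so only "Address" lines that follow a "Name:" line are counted.
    An NXDOMAIN reply has no "Name:" line and yields an empty list.
    """
    addresses = []
    in_answer = False
    for line in nslookup_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Name:"):
            in_answer = True
        elif in_answer and stripped.startswith("Address"):
            addresses.append(stripped.split(":", 1)[1].strip())
    return addresses


def resolves_to(host: str, expected_ip: str, resolver: str = "10.0.56.1") -> bool:
    """Ask `resolver` for `host` via nslookup and check for `expected_ip`."""
    result = subprocess.run(
        ["nslookup", host, resolver],
        capture_output=True, text=True, timeout=10, check=False,
    )
    return expected_ip in answer_addresses(result.stdout)
```

Calling resolves_to("fc-llm-bridge.iamworkin.lan", "10.0.56.200") should return True only once the override from step 1 is in place.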

2. Create the FC LLM Bridge API Keys 1Password item

The Claude API Key item in vault IAmWorkin already exists (id e5tth3y5mp3lhdavg35pxadzca, see docs/ai-agents/anthropic-integration.md).

The new item for per-consumer bridge API keys does NOT yet exist. Create it before the first apply of this manifest. The Deployment marks the individual key env vars optional: true, so missing keys will not crash the pod, but the bridge will reject every request with 401 until at least one key is populated.

| Field | Item position | Type | Purpose |
|---|---|---|---|
| credential | Top section | Password (random, 48 char) | Unused placeholder required by the 1Password schema for single-field items. Can be anything; this field is never read by K8s. |
| agent-zero-ws | "API Keys" section | Password (random, 48 char) | API key for the BLUEJAY-WS Agent Zero instance. |
| agent-zero-k8s | "API Keys" section | Password (random, 48 char) | API key for the K8s-hosted agent-zero Deployment. |
| spare-1 | "API Keys" section | Password (random, 48 char) | Reserve for future Agent Zero forks / smoke-test scripts. |
| spare-2 | "API Keys" section | Password (random, 48 char) | Reserve. |

Steps via the CLI (run from a machine with op signed in):

op item create \
  --category="API Credential" \
  --title="FC LLM Bridge API Keys" \
  --vault="IAmWorkin" \
  "API Keys.agent-zero-ws[password]=$(openssl rand -hex 24)" \
  "API Keys.agent-zero-k8s[password]=$(openssl rand -hex 24)" \
  "API Keys.spare-1[password]=$(openssl rand -hex 24)" \
  "API Keys.spare-2[password]=$(openssl rand -hex 24)"

OR via the 1Password GUI — create a new item titled exactly FC LLM Bridge API Keys in the IAmWorkin vault, add an API Keys section, add four password fields named agent-zero-ws, agent-zero-k8s, spare-1, spare-2 with openssl rand -hex 24 values.
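
If openssl is not available on the machine used for the GUI route, values of the same shape (24 random bytes, hex-encoded to 48 characters) can be produced with Python's standard library. This is a minimal sketch; generate_bridge_keys is a hypothetical helper name, and the field list simply mirrors the table above.

```python
import secrets

# Field names taken from the "API Keys" section described above.
FIELDS = ["agent-zero-ws", "agent-zero-k8s", "spare-1", "spare-2"]


def generate_bridge_keys(fields=FIELDS) -> dict:
    """Return one 48-character hex key per field (24 random bytes each)."""
    return {name: secrets.token_hex(24) for name in fields}
```

Printing the result of generate_bridge_keys() gives one value per field to paste into the corresponding 1Password password field.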

Mapping to K8s: The 1Password Connect operator syncs each field to a Secret key of the same name. The Deployment's env vars (FlowerCore__LlmBridge__ApiKeys__agent-zero-ws etc) reference those Secret keys. In FlowerCore.Shared.Api.Authentication.ApiKeyAuthMiddleware, the key name (e.g. agent-zero-k8s) becomes the fc.app claim on the ClaimsPrincipal, which is what IBudgetLedger uses to scope spend per consumer.
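
To illustrate the mapping, the following Python sketch mimics the lookup described above: it collects consumer keys from environment variables with the FlowerCore__LlmBridge__ApiKeys__ prefix (ASP.NET Core's environment-variable configuration provider reads `__` as the `:` separator), skips unset or empty values, and returns the consumer name for a matching key. It is not the actual ApiKeyAuthMiddleware code; the function names, and how the key reaches the middleware (header name and format), are assumptions left out of the sketch.

```python
import hmac
from typing import Optional

ENV_PREFIX = "FlowerCore__LlmBridge__ApiKeys__"


def load_api_keys(environ: dict) -> dict:
    """Collect {consumer name: key} from env vars, skipping empty values.

    Optional Secret keys that were never populated are absent or empty,
    so they contribute no entry here.
    """
    return {
        name[len(ENV_PREFIX):]: value
        for name, value in environ.items()
        if name.startswith(ENV_PREFIX) and value
    }


def authenticate(presented_key: str, api_keys: dict) -> Optional[str]:
    """Return the consumer name for a matching key, or None (the 401 case)."""
    for consumer, key in api_keys.items():
        # Constant-time comparison avoids leaking key contents via timing.
        if hmac.compare_digest(presented_key.encode(), key.encode()):
            return consumer
    return None
```

With no keys populated, load_api_keys returns an empty dict and authenticate returns None for every request, which corresponds to the blanket 401 behavior noted in this step. A non-None result plays the role of the fc.app claim value.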

3. Build + import the image to every RKE2 node

# From BLUEJAY-WS, in D:\git\FlowerCore\FlowerCore.LlmBridge
TAG="v$(date +%Y%m%d%H%M%S)"
dotnet.exe publish -c Release -o deploy/app \
  src/FlowerCore.LlmBridge.Web/FlowerCore.LlmBridge.Web.csproj
podman build -t localhost/fc-llm-bridge:$TAG -f deploy/Dockerfile.deploy deploy
podman save localhost/fc-llm-bridge:$TAG -o /tmp/fc-llm-bridge.tar

# SCP to each node and ctr import
for NODE in rke2-server rke2-agent1 rke2-agent2; do
  scp /tmp/fc-llm-bridge.tar $NODE:/tmp/
  ssh $NODE "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-llm-bridge.tar"
done

4. Bump the image tag in the manifest

Edit fc-llm-bridge.yaml, replace localhost/fc-llm-bridge:v00000000000000 with the tag from step 3.
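
The manual edit can also be scripted. The sketch below is one possible guard, assuming the tag keeps the v plus 14-digit timestamp shape produced in step 3. bump_image_tag is a hypothetical helper, not an existing script in bluejay-infra.

```python
import re

PLACEHOLDER_TAG = "v" + "0" * 14
PLACEHOLDER = "localhost/fc-llm-bridge:" + PLACEHOLDER_TAG
TAG_PATTERN = re.compile(r"v\d{14}")


def bump_image_tag(manifest_text: str, tag: str) -> str:
    """Replace the placeholder image reference with the real tag.

    Raises ValueError if the tag is malformed, is still the placeholder,
    or the placeholder is not present in the manifest text.
    """
    if not TAG_PATTERN.fullmatch(tag) or tag == PLACEHOLDER_TAG:
        raise ValueError(f"unexpected tag: {tag!r}")
    if PLACEHOLDER not in manifest_text:
        raise ValueError("placeholder image tag not found in manifest")
    return manifest_text.replace(PLACEHOLDER, f"localhost/fc-llm-bridge:{tag}")
```

Failing loudly when the placeholder is missing helps catch a manifest that was already bumped, or a typo in the image name, before the commit in step 5.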

5. Commit + push

cd D:/git/FlowerCore/bluejay-infra
# re-run the DNS gate
python scripts/check-pfsense-dns.py
git add apps/fc-llm-bridge/
git commit -m "feat(fc-llm-bridge): deploy ADR-088 Agent Zero bridge"
git push

ArgoCD picks up the change within ~3 minutes and creates the infra-fc-llm-bridge Application.

6. Verify

# Runs on noc1 via fcadmin_ssh
fcadmin_ssh noc1 '
  kubectl -n argocd get application infra-fc-llm-bridge
  kubectl -n fc-llm-bridge get certificate,pod
  curl -sk -m 8 -o /dev/null -w "HTTP %{http_code}\n" https://fc-llm-bridge.iamworkin.lan/healthz
'

Expect: Certificate Ready: True within ~60s, /healthz HTTP 200.
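
For scripted verification, the Ready state can be read from the Certificate's status conditions rather than from the table output. The sketch below assumes the JSON comes from kubectl -n fc-llm-bridge get certificate fc-llm-bridge-cert -o json; the function names are hypothetical.

```python
import json


def certificate_ready(certificate: dict) -> bool:
    """True if the Certificate has a condition of type Ready with status "True"."""
    conditions = certificate.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def certificate_ready_from_json(text: str) -> bool:
    """Parse the output of `kubectl get certificate ... -o json` and check Ready."""
    return certificate_ready(json.loads(text))
```

A Certificate that has no status yet (for example, just after the first sync) is reported as not ready rather than raising an error.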

7. Flip Agent Zero to the bridge

After the bridge passes a real chat smoke test, update the Agent Zero ConfigMap (apps/agent-zero/agent-zero.yaml) to route through the bridge:

  • A0_SET_chat_model_api_base / config.json > chat_model.api_base -> https://fc-llm-bridge.iamworkin.lan/v1
  • Add an A0_SET_chat_model_api_key env var wired to a K8s Secret sourced from FC LLM Bridge API Keys field agent-zero-k8s.
  • Set chat_model.name to fc:balanced (or a concrete model) — the bridge accepts both tier aliases and concrete model names.

Do the same for BLUEJAY-WS Agent Zero (agent-zero-ws key), or keep the workstation on direct Ollama and only route Anthropic calls through the bridge (the design doc describes this split as the preferred approach).

Current state at staging time (2026-04-23)

  • fc-llm-bridge.iamworkin.lan — NOT in pfSense Unbound (verified via nslookup fc-llm-bridge.iamworkin.lan 10.0.56.1: NXDOMAIN).
  • FC LLM Bridge API Keys — NOT created in 1Password (user action).
  • Claude API Key — already exists in IAmWorkin vault (e5tth3y5mp3lhdavg35pxadzca), also consumed by AiStation and Chat.Web.
  • localhost/fc-llm-bridge:v* image — not yet built; FlowerCore.LlmBridge repo has local commit 6d285b5 only, no remote.
  • ArgoCD infra-fc-llm-bridge Application — will be auto-created by the bluejay-infra ApplicationSet once the directory is on main.

Why tcpSocket probes (not /healthz)

The bridge runs ApiKeyAuthMiddleware. /healthz and /health are exempt via FlowerCore:LlmBridge:AuthExemptPaths, so an HTTP probe would work today. But a future change to the middleware registration order could silently make those paths return 401/404 to the kubelet, and the failing probes would then restart pods on every deploy. tcpSocket keeps probes robust against that regression. Memory: feedback_k8s_probes_behind_auth_middleware.md.
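
To make the difference concrete, a tcpSocket probe only checks that a TCP connection to the port can be opened; no HTTP request is sent, so the middleware pipeline is never involved. A minimal Python equivalent of that check is sketched below; tcp_probe is an illustrative name and the timeout value is an assumption, not a setting from the manifest.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Succeed if a TCP connection can be opened, as a tcpSocket probe does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

The trade-off is that this check only proves the process is listening on port 8080. It says nothing about whether the application behind the port is healthy, which is why step 6 still verifies /healthz over HTTPS.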