feat(fc-llm-bridge): stage ADR-088 manifests (not yet applied)

Staged but NOT applied. Do not git push until the two pre-requisites below
are done. See apps/fc-llm-bridge/README.md for the full order-of-ops.

Manifests (apps/fc-llm-bridge/fc-llm-bridge.yaml, 8 docs):
  - Namespace fc-llm-bridge
  - OnePasswordItem anthropic-api-key (existing Claude API Key item)
  - OnePasswordItem fc-llm-bridge-api-keys (NEW item, pending creation)
  - PersistentVolumeClaim fc-llm-bridge-data (2Gi longhorn)
  - Deployment fc-llm-bridge (port 8080, uid 1654, readOnlyRootFilesystem,
    tcpSocket probes to survive future ApiKeyAuthMiddleware reordering)
  - Service fc-llm-bridge ClusterIP
  - Certificate fc-llm-bridge-cert (step-ca-acme)
  - IngressRoute fc-llm-bridge (fc-llm-bridge.iamworkin.lan, websecure)

Pre-requisites BEFORE git push:
  1. pfSense Unbound override fc-llm-bridge.iamworkin.lan -> 10.0.56.200
     (currently NXDOMAIN -- verified via nslookup and check-pfsense-dns.py).
     Skipping this step puts cert-manager HTTP-01 into ~2h backoff.
  2. Create 1Password item `FC LLM Bridge API Keys` in vault IAmWorkin with
     password fields: agent-zero-ws, agent-zero-k8s, spare-1, spare-2.
  3. Build + import localhost/fc-llm-bridge:v<tag> to rke2-server +
     rke2-agent1 + rke2-agent2. Bump image tag from placeholder
     v00000000000000 before committing the apply.

Related: ADR-088 (FlowerCore.Notes/ARCHITECTURE.md), design doc at
FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Andrew Stoltz
2026-04-23 03:10:36 -05:00
parent 9a1665907c
commit a1b8eb379d
2 changed files with 439 additions and 0 deletions

View File

@@ -0,0 +1,174 @@
# fc-llm-bridge — staged deployment (ADR-088)
**Status:** manifests staged, **NOT YET APPLIED**. Do not `git push` or sync
ArgoCD until the two pre-requisites below are done, in order.
Design: [`../../../FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md`](../../../FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md)
ADR: ADR-088 in [`../../../FlowerCore.Notes/ARCHITECTURE.md`](../../../FlowerCore.Notes/ARCHITECTURE.md)
## Deployment order (do NOT skip / reorder)
### 1. pfSense Unbound DNS override — REQUIRED FIRST
`fc-llm-bridge.iamworkin.lan` is not currently in pfSense Unbound. Verified
with `python bluejay-infra/scripts/check-pfsense-dns.py` at staging time.
step-ca (the ACME CA on noc1) uses pfSense Unbound (10.0.56.1), **not**
cluster CoreDNS. If you apply this manifest before adding the DNS override,
cert-manager's HTTP-01 challenge silently fails for ~2h (exponential backoff)
until someone manually runs `kubectl -n fc-llm-bridge delete order <order>`
to bust the cache. See memory `feedback_pfsense_dns_required_for_acme.md`.
From `FlowerCore.Notes`:
```bash
# 1. Edit HOSTS list in scripts/pfsense-add-dns-overrides.py, append:
# ("fc-llm-bridge", "10.0.56.200", "cert-manager HTTP-01 target (Traefik VIP)"),
# 2. Source creds + run:
source scripts/credential-helper.sh
export PFSENSE_PASS=$(get_cred "pfSense Admin")
python scripts/pfsense-add-dns-overrides.py
```
Verify:
```bash
nslookup fc-llm-bridge.iamworkin.lan 10.0.56.1
# Expect: Address: 10.0.56.200
```
Or run the full pre-merge gate from `bluejay-infra`:
```bash
python scripts/check-pfsense-dns.py
# Expect: OK fc-llm-bridge.iamworkin.lan -> 10.0.56.200
```
### 2. Create the `FC LLM Bridge API Keys` 1Password item
The `Claude API Key` item in vault `IAmWorkin` already exists (id
`e5tth3y5mp3lhdavg35pxadzca`, see `docs/ai-agents/anthropic-integration.md`).
The new item for per-consumer bridge API keys does NOT yet exist. Create it
before the first apply of this manifest — the Deployment marks the individual
key env vars `optional: true` so missing keys will not crash the pod, but the
bridge will reject every request with 401 until at least one key is populated.
| Field | Item position | Type | Purpose |
|-------|---------------|------|---------|
| `credential` | Top section | Password (random, 48 char) | Unused placeholder required by the 1Password schema for single-field items. Can be anything — this file is never read by K8s. |
| `agent-zero-ws` | "API Keys" section | Password (random, 48 char) | API key for the BLUEJAY-WS Agent Zero instance. |
| `agent-zero-k8s` | "API Keys" section | Password (random, 48 char) | API key for the K8s-hosted `agent-zero` Deployment. |
| `spare-1` | "API Keys" section | Password (random, 48 char) | Reserve for future Agent Zero forks / smoke-test scripts. |
| `spare-2` | "API Keys" section | Password (random, 48 char) | Reserve. |
Steps via the CLI (run from a machine with `op` signed in):
```bash
op item create \
--category="API Credential" \
--title="FC LLM Bridge API Keys" \
--vault="IAmWorkin" \
"API Keys.agent-zero-ws[password]=$(openssl rand -hex 24)" \
"API Keys.agent-zero-k8s[password]=$(openssl rand -hex 24)" \
"API Keys.spare-1[password]=$(openssl rand -hex 24)" \
"API Keys.spare-2[password]=$(openssl rand -hex 24)"
```
OR via the 1Password GUI — create a new item titled exactly `FC LLM Bridge API
Keys` in the `IAmWorkin` vault, add an `API Keys` section, add four password
fields named `agent-zero-ws`, `agent-zero-k8s`, `spare-1`, `spare-2` with
`openssl rand -hex 24` values.
**Mapping to K8s:** The 1Password Connect operator syncs each field to a
Secret key of the same name. The Deployment's env vars
(`FlowerCore__LlmBridge__ApiKeys__agent-zero-ws` etc) reference those Secret
keys. In `FlowerCore.Shared.Api.Authentication.ApiKeyAuthMiddleware`, the key
name (e.g. `agent-zero-k8s`) becomes the `fc.app` claim on the
`ClaimsPrincipal`, which is what `IBudgetLedger` uses to scope spend per
consumer.
### 3. Build + import the image to every RKE2 node
```bash
# From BLUEJAY-WS, in D:\git\FlowerCore\FlowerCore.LlmBridge
TAG="v$(date +%Y%m%d%H%M%S)"
dotnet.exe publish -c Release -o deploy/app \
src/FlowerCore.LlmBridge.Web/FlowerCore.LlmBridge.Web.csproj
podman build -t localhost/fc-llm-bridge:$TAG -f deploy/Dockerfile.deploy deploy
podman save localhost/fc-llm-bridge:$TAG -o /tmp/fc-llm-bridge.tar
# SCP to each node and ctr import
for NODE in rke2-server rke2-agent1 rke2-agent2; do
scp /tmp/fc-llm-bridge.tar $NODE:/tmp/
ssh $NODE "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-llm-bridge.tar"
done
```
### 4. Bump the image tag in the manifest
Edit `fc-llm-bridge.yaml`, replace `localhost/fc-llm-bridge:v00000000000000`
with the tag from step 3.
### 5. Commit + push
```bash
cd D:/git/FlowerCore/bluejay-infra
# re-run the DNS gate
python scripts/check-pfsense-dns.py
git add apps/fc-llm-bridge/
git commit -m "feat(fc-llm-bridge): deploy ADR-088 Agent Zero bridge"
git push
```
ArgoCD picks up within ~3 minutes and creates `infra-fc-llm-bridge`.
### 6. Verify
```bash
# From noc1
fcadmin_ssh noc1 '
kubectl -n argocd get application infra-fc-llm-bridge
kubectl -n fc-llm-bridge get certificate,pod
curl -sk -m 8 -o /dev/null -w "HTTP %{http_code}\n" https://fc-llm-bridge.iamworkin.lan/healthz
'
```
Expect: Certificate `Ready: True` within ~60s, `/healthz` HTTP 200.
### 7. Flip Agent Zero to the bridge
After the bridge passes a real chat smoke test, update the Agent Zero
ConfigMap (`apps/agent-zero/agent-zero.yaml`) to route through the bridge:
- `A0_SET_chat_model_api_base` / `config.json > chat_model.api_base`
-> `https://fc-llm-bridge.iamworkin.lan/v1`
- Add an `A0_SET_chat_model_api_key` env var wired to a K8s Secret sourced
from `FC LLM Bridge API Keys` field `agent-zero-k8s`.
- Set `chat_model.name` to `fc:balanced` (or a concrete model) — the bridge
accepts both tier aliases and concrete model names.
Do the same for BLUEJAY-WS Agent Zero (`agent-zero-ws` key), or keep the
workstation on direct Ollama and only route Anthropic calls through the
bridge (the design doc describes this split as the preferred approach).
## Current state at staging time (2026-04-23)
- `fc-llm-bridge.iamworkin.lan` — NOT in pfSense Unbound (verified via
`nslookup fc-llm-bridge.iamworkin.lan 10.0.56.1`: NXDOMAIN).
- `FC LLM Bridge API Keys` — NOT created in 1Password (user action).
- `Claude API Key` — already exists in `IAmWorkin` vault
(`e5tth3y5mp3lhdavg35pxadzca`), also consumed by AiStation and Chat.Web.
- `localhost/fc-llm-bridge:v*` image — not yet built; `FlowerCore.LlmBridge`
repo has local commit `6d285b5` only, no remote.
- ArgoCD `infra-fc-llm-bridge` Application — will be auto-created by the
`bluejay-infra` ApplicationSet once the directory is on `main`.
## Why tcpSocket probes (not `/healthz`)
The bridge runs `ApiKeyAuthMiddleware`. `/healthz` and `/health` are exempt
via `FlowerCore:LlmBridge:AuthExemptPaths`, so an HTTP probe would work
today. But a future change to the middleware registration order could
silently turn kubelet probes into 401/404, which crashes pods on every
deploy. `tcpSocket` keeps probes robust against that regression. Memory:
`feedback_k8s_probes_behind_auth_middleware.md`.

View File

@@ -0,0 +1,265 @@
# FlowerCore.LlmBridge — OpenAI-compatible bridge for Agent Zero.
# Routes through FlowerCore.Shared.Chat (ILlmProviderClient) with budget
# enforcement, response caching, and tier-based model routing. Lets Agent
# Zero (Python) reach Anthropic and Ollama providers without re-implementing
# the C# budget/cache/router primitives.
#
# Design: FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md
# ADR: FlowerCore.Notes/ARCHITECTURE.md (ADR-088)
#
# Deployment order (see bluejay-infra/README.md):
# 1. pfSense DNS override for fc-llm-bridge.iamworkin.lan -> 10.0.56.200
# (REQUIRED before this is applied — cert-manager HTTP-01 will silently
# fail for ~2h backoff otherwise). Run scripts/pfsense-add-dns-overrides.py.
# 2. 1Password items `Claude API Key` (already exists) and
# `FC LLM Bridge API Keys` (create when first non-dev environment comes up).
# 3. Build + import image: localhost/fc-llm-bridge:v<YYYYMMDD><HHMM>
# Import to rke2-server, rke2-agent1, rke2-agent2 via ctr images import.
# 4. Bump the image tag below and git push; ArgoCD ApplicationSet picks up.
# 5. Flip Agent Zero chat.openai.base_url to https://fc-llm-bridge.iamworkin.lan/v1
# and api_key to the op://IAmWorkin/FC LLM Bridge API Keys/agent-zero-k8s value.
---
apiVersion: v1
kind: Namespace
metadata:
name: fc-llm-bridge
labels:
app.kubernetes.io/part-of: flowercore
---
# Claude (Anthropic) API key — shared across FC services.
# Existing 1Password item. `credential` field -> Secret `anthropic-api-key`.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
name: anthropic-api-key
namespace: fc-llm-bridge
spec:
itemPath: "vaults/IAmWorkin/items/Claude API Key"
---
# Per-consumer API keys for the bridge itself.
# NEW 1Password item — see apps/fc-llm-bridge/README.md for the field layout
# to create before first apply. Fields become Secret keys of the same name:
# agent-zero-ws, agent-zero-k8s, spare-1, spare-2
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
name: fc-llm-bridge-api-keys
namespace: fc-llm-bridge
spec:
itemPath: "vaults/IAmWorkin/items/FC LLM Bridge API Keys"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: fc-llm-bridge-data
namespace: fc-llm-bridge
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: fc-llm-bridge
namespace: fc-llm-bridge
labels:
app.kubernetes.io/name: fc-llm-bridge
app.kubernetes.io/part-of: flowercore
spec:
replicas: 1
revisionHistoryLimit: 3
strategy:
type: Recreate
selector:
matchLabels:
app.kubernetes.io/name: fc-llm-bridge
template:
metadata:
labels:
app.kubernetes.io/name: fc-llm-bridge
app.kubernetes.io/part-of: flowercore
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
securityContext:
fsGroup: 1654
fsGroupChangePolicy: OnRootMismatch
containers:
- name: web
# Placeholder tag — bump to the image you built + imported to every
# RKE2 node before applying. Build with:
# dotnet.exe publish -c Release -o deploy/app \
# src/FlowerCore.LlmBridge.Web/FlowerCore.LlmBridge.Web.csproj
# podman build -t localhost/fc-llm-bridge:v<tag> -f deploy/Dockerfile.deploy deploy
image: localhost/fc-llm-bridge:v00000000000000
imagePullPolicy: Never
ports:
- containerPort: 8080
name: http
env:
- name: ASPNETCORE_URLS
value: "http://+:8080"
- name: ASPNETCORE_ENVIRONMENT
value: "Production"
- name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT
value: "false"
# SQLite (budget ledger + response cache + data-protection keys)
- name: FlowerCore__LlmBridge__SqliteConnectionString
value: "Data Source=/data/llm-bridge.db"
- name: FlowerCore__LlmBridge__DefaultTenantId
value: "default"
- name: FlowerCore__LlmBridge__DefaultAppName
value: "agent-zero"
# Per-consumer API keys — from OnePasswordItem fc-llm-bridge-api-keys.
# Each field becomes a Secret key of the same name. The key-name
# lands in the auth principal's `fc.app` claim for ledger scoping.
- name: FlowerCore__LlmBridge__ApiKeys__agent-zero-ws
valueFrom:
secretKeyRef:
name: fc-llm-bridge-api-keys
key: agent-zero-ws
optional: true
- name: FlowerCore__LlmBridge__ApiKeys__agent-zero-k8s
valueFrom:
secretKeyRef:
name: fc-llm-bridge-api-keys
key: agent-zero-k8s
optional: true
- name: FlowerCore__LlmBridge__ApiKeys__spare-1
valueFrom:
secretKeyRef:
name: fc-llm-bridge-api-keys
key: spare-1
optional: true
- name: FlowerCore__LlmBridge__ApiKeys__spare-2
valueFrom:
secretKeyRef:
name: fc-llm-bridge-api-keys
key: spare-2
optional: true
# Shared.Chat — Ollama (edge1 Pi 5 + AI HAT+, matches bridge default)
- name: FlowerCore__Chat__OllamaBaseUrl
value: "http://10.0.57.17:11434"
- name: FlowerCore__Chat__HttpTimeout
value: "00:05:00"
# Shared.Chat — Anthropic
- name: FlowerCore__Chat__Anthropic__Enabled
value: "true"
- name: FlowerCore__Chat__Anthropic__ApiKey
valueFrom:
secretKeyRef:
name: anthropic-api-key
key: credential
- name: FlowerCore__Chat__Anthropic__OrganizationId
valueFrom:
secretKeyRef:
name: anthropic-api-key
key: organization_id
optional: true
- name: FlowerCore__Chat__Anthropic__BaseUrl
value: "https://api.anthropic.com"
- name: FlowerCore__Chat__Anthropic__DefaultModel
value: "claude-sonnet-4-6"
- name: FlowerCore__Chat__Anthropic__AnthropicVersion
value: "2023-06-01"
- name: FlowerCore__Chat__Anthropic__Timeout
value: "00:05:00"
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 1000m
memory: 768Mi
volumeMounts:
- name: data
mountPath: /data
- name: tmp
mountPath: /tmp
- name: app-data
mountPath: /app/data
securityContext:
runAsNonRoot: true
runAsUser: 1654
runAsGroup: 1654
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
# tcpSocket probes: the app runs ApiKeyAuthMiddleware. /healthz is
# registered as anonymous via AuthExemptPaths but tcpSocket avoids any
# future accidental middleware ordering regression
# (memory: feedback_k8s_probes_behind_auth_middleware).
readinessProbe:
tcpSocket:
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
tcpSocket:
port: 8080
initialDelaySeconds: 15
periodSeconds: 30
volumes:
- name: data
persistentVolumeClaim:
claimName: fc-llm-bridge-data
- name: tmp
emptyDir: {}
# The Dockerfile `WORKDIR /app` pairs with the default
# SqliteConnectionString "Data Source=data/llm-bridge.db" (relative).
# The env var above overrides to /data, so /app/data can be emptyDir.
- name: app-data
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: fc-llm-bridge
namespace: fc-llm-bridge
spec:
selector:
app.kubernetes.io/name: fc-llm-bridge
ports:
- port: 8080
targetPort: 8080
name: http
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: fc-llm-bridge-cert
namespace: fc-llm-bridge
spec:
secretName: fc-llm-bridge-tls
issuerRef:
name: step-ca-acme
kind: ClusterIssuer
dnsNames:
- fc-llm-bridge.iamworkin.lan
duration: 720h
renewBefore: 240h
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: fc-llm-bridge
namespace: fc-llm-bridge
spec:
entryPoints:
- websecure
routes:
- match: Host(`fc-llm-bridge.iamworkin.lan`)
kind: Rule
services:
- name: fc-llm-bridge
port: 8080
tls:
secretName: fc-llm-bridge-tls