feat(fc-llm-bridge): stage ADR-088 manifests (not yet applied)
Staged but NOT applied. Do not git push until the two pre-requisites below
are done. See apps/fc-llm-bridge/README.md for the full order-of-ops.
Manifests (apps/fc-llm-bridge/fc-llm-bridge.yaml, 8 docs):
- Namespace fc-llm-bridge
- OnePasswordItem anthropic-api-key (existing Claude API Key item)
- OnePasswordItem fc-llm-bridge-api-keys (NEW item, pending creation)
- PersistentVolumeClaim fc-llm-bridge-data (2Gi longhorn)
- Deployment fc-llm-bridge (port 8080, uid 1654, readOnlyRootFilesystem,
tcpSocket probes to survive future ApiKeyAuthMiddleware reordering)
- Service fc-llm-bridge ClusterIP
- Certificate fc-llm-bridge-cert (step-ca-acme)
- IngressRoute fc-llm-bridge (fc-llm-bridge.iamworkin.lan, websecure)
Pre-requisites BEFORE git push:
1. pfSense Unbound override fc-llm-bridge.iamworkin.lan -> 10.0.56.200
(currently NXDOMAIN -- verified via nslookup and check-pfsense-dns.py).
Skipping this step puts cert-manager HTTP-01 into ~2h backoff.
2. Create 1Password item `FC LLM Bridge API Keys` in vault IAmWorkin with
password fields: agent-zero-ws, agent-zero-k8s, spare-1, spare-2.
3. Build + import localhost/fc-llm-bridge:v<tag> to rke2-server +
rke2-agent1 + rke2-agent2. Bump image tag from placeholder
v00000000000000 before committing the apply.
Related: ADR-088 (FlowerCore.Notes/ARCHITECTURE.md), design doc at
FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
174
apps/fc-llm-bridge/README.md
Normal file
174
apps/fc-llm-bridge/README.md
Normal file
@@ -0,0 +1,174 @@
|
||||
# fc-llm-bridge — staged deployment (ADR-088)
|
||||
|
||||
**Status:** manifests staged, **NOT YET APPLIED**. Do not `git push` or sync
|
||||
ArgoCD until the two pre-requisites below are done, in order.
|
||||
|
||||
Design: [`../../../FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md`](../../../FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md)
|
||||
ADR: ADR-088 in [`../../../FlowerCore.Notes/ARCHITECTURE.md`](../../../FlowerCore.Notes/ARCHITECTURE.md)
|
||||
|
||||
## Deployment order (do NOT skip / reorder)
|
||||
|
||||
### 1. pfSense Unbound DNS override — REQUIRED FIRST
|
||||
|
||||
`fc-llm-bridge.iamworkin.lan` is not currently in pfSense Unbound. Verified
|
||||
with `python bluejay-infra/scripts/check-pfsense-dns.py` at staging time.
|
||||
|
||||
step-ca (the ACME CA on noc1) uses pfSense Unbound (10.0.56.1), **not**
|
||||
cluster CoreDNS. If you apply this manifest before adding the DNS override,
|
||||
cert-manager's HTTP-01 challenge silently fails for ~2h (exponential backoff)
|
||||
until someone manually runs `kubectl -n fc-llm-bridge delete order <order>`
|
||||
to bust the cache. See memory `feedback_pfsense_dns_required_for_acme.md`.
|
||||
|
||||
From `FlowerCore.Notes`:
|
||||
|
||||
```bash
|
||||
# 1. Edit HOSTS list in scripts/pfsense-add-dns-overrides.py, append:
|
||||
# ("fc-llm-bridge", "10.0.56.200", "cert-manager HTTP-01 target (Traefik VIP)"),
|
||||
# 2. Source creds + run:
|
||||
source scripts/credential-helper.sh
|
||||
export PFSENSE_PASS=$(get_cred "pfSense Admin")
|
||||
python scripts/pfsense-add-dns-overrides.py
|
||||
```
|
||||
|
||||
Verify:
|
||||
|
||||
```bash
|
||||
nslookup fc-llm-bridge.iamworkin.lan 10.0.56.1
|
||||
# Expect: Address: 10.0.56.200
|
||||
```
|
||||
|
||||
Or run the full pre-merge gate from `bluejay-infra`:
|
||||
|
||||
```bash
|
||||
python scripts/check-pfsense-dns.py
|
||||
# Expect: OK fc-llm-bridge.iamworkin.lan -> 10.0.56.200
|
||||
```
|
||||
|
||||
### 2. Create the `FC LLM Bridge API Keys` 1Password item
|
||||
|
||||
The `Claude API Key` item in vault `IAmWorkin` already exists (id
|
||||
`e5tth3y5mp3lhdavg35pxadzca`, see `docs/ai-agents/anthropic-integration.md`).
|
||||
|
||||
The new item for per-consumer bridge API keys does NOT yet exist. Create it
|
||||
before the first apply of this manifest — the Deployment marks the individual
|
||||
key env vars `optional: true` so missing keys will not crash the pod, but the
|
||||
bridge will reject every request with 401 until at least one key is populated.
|
||||
|
||||
| Field | Item position | Type | Purpose |
|
||||
|-------|---------------|------|---------|
|
||||
| `credential` | Top section | Password (random, 48 char) | Unused placeholder required by the 1Password schema for single-field items. Can be anything — this file is never read by K8s. |
|
||||
| `agent-zero-ws` | "API Keys" section | Password (random, 48 char) | API key for the BLUEJAY-WS Agent Zero instance. |
|
||||
| `agent-zero-k8s` | "API Keys" section | Password (random, 48 char) | API key for the K8s-hosted `agent-zero` Deployment. |
|
||||
| `spare-1` | "API Keys" section | Password (random, 48 char) | Reserve for future Agent Zero forks / smoke-test scripts. |
|
||||
| `spare-2` | "API Keys" section | Password (random, 48 char) | Reserve. |
|
||||
|
||||
Steps via the CLI (run from a machine with `op` signed in):
|
||||
|
||||
```bash
|
||||
op item create \
|
||||
--category="API Credential" \
|
||||
--title="FC LLM Bridge API Keys" \
|
||||
--vault="IAmWorkin" \
|
||||
"API Keys.agent-zero-ws[password]=$(openssl rand -hex 24)" \
|
||||
"API Keys.agent-zero-k8s[password]=$(openssl rand -hex 24)" \
|
||||
"API Keys.spare-1[password]=$(openssl rand -hex 24)" \
|
||||
"API Keys.spare-2[password]=$(openssl rand -hex 24)"
|
||||
```
|
||||
|
||||
OR via the 1Password GUI — create a new item titled exactly `FC LLM Bridge API
|
||||
Keys` in the `IAmWorkin` vault, add an `API Keys` section, add four password
|
||||
fields named `agent-zero-ws`, `agent-zero-k8s`, `spare-1`, `spare-2` with
|
||||
`openssl rand -hex 24` values.
|
||||
|
||||
**Mapping to K8s:** The 1Password Connect operator syncs each field to a
|
||||
Secret key of the same name. The Deployment's env vars
|
||||
(`FlowerCore__LlmBridge__ApiKeys__agent-zero-ws` etc) reference those Secret
|
||||
keys. In `FlowerCore.Shared.Api.Authentication.ApiKeyAuthMiddleware`, the key
|
||||
name (e.g. `agent-zero-k8s`) becomes the `fc.app` claim on the
|
||||
`ClaimsPrincipal`, which is what `IBudgetLedger` uses to scope spend per
|
||||
consumer.
|
||||
|
||||
### 3. Build + import the image to every RKE2 node
|
||||
|
||||
```bash
|
||||
# From BLUEJAY-WS, in D:\git\FlowerCore\FlowerCore.LlmBridge
|
||||
TAG="v$(date +%Y%m%d%H%M%S)"
|
||||
dotnet.exe publish -c Release -o deploy/app \
|
||||
src/FlowerCore.LlmBridge.Web/FlowerCore.LlmBridge.Web.csproj
|
||||
podman build -t localhost/fc-llm-bridge:$TAG -f deploy/Dockerfile.deploy deploy
|
||||
podman save localhost/fc-llm-bridge:$TAG -o /tmp/fc-llm-bridge.tar
|
||||
|
||||
# SCP to each node and ctr import
|
||||
for NODE in rke2-server rke2-agent1 rke2-agent2; do
|
||||
scp /tmp/fc-llm-bridge.tar $NODE:/tmp/
|
||||
ssh $NODE "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-llm-bridge.tar"
|
||||
done
|
||||
```
|
||||
|
||||
### 4. Bump the image tag in the manifest
|
||||
|
||||
Edit `fc-llm-bridge.yaml`, replace `localhost/fc-llm-bridge:v00000000000000`
|
||||
with the tag from step 3.
|
||||
|
||||
### 5. Commit + push
|
||||
|
||||
```bash
|
||||
cd D:/git/FlowerCore/bluejay-infra
|
||||
# re-run the DNS gate
|
||||
python scripts/check-pfsense-dns.py
|
||||
git add apps/fc-llm-bridge/
|
||||
git commit -m "feat(fc-llm-bridge): deploy ADR-088 Agent Zero bridge"
|
||||
git push
|
||||
```
|
||||
|
||||
ArgoCD picks up within ~3 minutes and creates `infra-fc-llm-bridge`.
|
||||
|
||||
### 6. Verify
|
||||
|
||||
```bash
|
||||
# From noc1
|
||||
fcadmin_ssh noc1 '
|
||||
kubectl -n argocd get application infra-fc-llm-bridge
|
||||
kubectl -n fc-llm-bridge get certificate,pod
|
||||
curl -sk -m 8 -o /dev/null -w "HTTP %{http_code}\n" https://fc-llm-bridge.iamworkin.lan/healthz
|
||||
'
|
||||
```
|
||||
|
||||
Expect: Certificate `Ready: True` within ~60s, `/healthz` HTTP 200.
|
||||
|
||||
### 7. Flip Agent Zero to the bridge
|
||||
|
||||
After the bridge passes a real chat smoke test, update the Agent Zero
|
||||
ConfigMap (`apps/agent-zero/agent-zero.yaml`) to route through the bridge:
|
||||
|
||||
- `A0_SET_chat_model_api_base` / `config.json > chat_model.api_base`
|
||||
-> `https://fc-llm-bridge.iamworkin.lan/v1`
|
||||
- Add an `A0_SET_chat_model_api_key` env var wired to a K8s Secret sourced
|
||||
from `FC LLM Bridge API Keys` field `agent-zero-k8s`.
|
||||
- Set `chat_model.name` to `fc:balanced` (or a concrete model) — the bridge
|
||||
accepts both tier aliases and concrete model names.
|
||||
|
||||
Do the same for BLUEJAY-WS Agent Zero (`agent-zero-ws` key), or keep the
|
||||
workstation on direct Ollama and only route Anthropic calls through the
|
||||
bridge (the design doc describes this split as the preferred approach).
|
||||
|
||||
## Current state at staging time (2026-04-23)
|
||||
|
||||
- `fc-llm-bridge.iamworkin.lan` — NOT in pfSense Unbound (verified via
|
||||
`nslookup fc-llm-bridge.iamworkin.lan 10.0.56.1`: NXDOMAIN).
|
||||
- `FC LLM Bridge API Keys` — NOT created in 1Password (user action).
|
||||
- `Claude API Key` — already exists in `IAmWorkin` vault
|
||||
(`e5tth3y5mp3lhdavg35pxadzca`), also consumed by AiStation and Chat.Web.
|
||||
- `localhost/fc-llm-bridge:v*` image — not yet built; `FlowerCore.LlmBridge`
|
||||
repo has local commit `6d285b5` only, no remote.
|
||||
- ArgoCD `infra-fc-llm-bridge` Application — will be auto-created by the
|
||||
`bluejay-infra` ApplicationSet once the directory is on `main`.
|
||||
|
||||
## Why tcpSocket probes (not `/healthz`)
|
||||
|
||||
The bridge runs `ApiKeyAuthMiddleware`. `/healthz` and `/health` are exempt
|
||||
via `FlowerCore:LlmBridge:AuthExemptPaths`, so an HTTP probe would work
|
||||
today. But a future change to the middleware registration order could
|
||||
silently turn kubelet probes into 401/404, which crashes pods on every
|
||||
deploy. `tcpSocket` keeps probes robust against that regression. Memory:
|
||||
`feedback_k8s_probes_behind_auth_middleware.md`.
|
||||
Reference in New Issue
Block a user