bluejay-infra/apps/fc-llm-bridge
Codex 0b52093b36 K8s manifest hardening + new bluejay-infra-lint test project
Manifest hardening (per documented memories):
- apps/asterisk/deployment.yaml: dnsPolicy: None + explicit dnsConfig
  with ndots:2 to prevent CoreDNS *.iamworkin.lan template from
  hijacking external egress (downloads.asterisk.org).
- apps/fc-llm-bridge/fc-llm-bridge.yaml: same dnsConfig pattern for
  api.anthropic.com egress.
- apps/fc-ttsreader/fc-ttsreader.yaml: same dnsConfig pattern for
  huggingface.co model seeding.
- apps/fc-messageboard/fc-messageboard.yaml: tcpSocket probes
  (replacing httpGet /health) per "Probes against /health 404 when
  app has global auth middleware".
- apps/fc-signalcontrol/fc-signalcontrol.yaml: same tcpSocket probe
  fix.

New lint project:
- tests/bluejay-infra-lint/BluejayInfraLint.Tests.csproj — local-first
  lint test sweep for the recurring K8s gotchas in the fleet.
- tests/bluejay-infra-lint/FleetManifestLintTests.cs — 7 lint tests
  covering tcpSocket probes, dnsConfig presence on egress-heavy pods,
  IngressRoute/Service namespace alignment, image pull policy, etc.
- tests/bluejay-infra-lint/conftest.dev/ — matching conftest policies
  for environments with conftest/opa.
- .gitignore — adds bin/ + obj/ + DS_Store/swp.

README.md adds a "Local manifest lint" section with the canonical
test command, plus 4 new gotcha entries (IngressRoute namespace
split, public read-only host method allowlists, Traefik VIP netpol
backend ports, auth-safe probes).

Tests: 7 / 7 lint tests passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 03:18:04 -05:00

fc-llm-bridge — staged deployment (ADR-088)

Status: manifests staged, NOT YET APPLIED. Do not git push or sync ArgoCD until the two pre-requisites below are done, in order.

Design: ../../../FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md
ADR: ADR-088 in ../../../FlowerCore.Notes/ARCHITECTURE.md

Deployment order (do NOT skip / reorder)

1. FlowerCore.DNS preflight — REQUIRED FIRST

fc-llm-bridge.iamworkin.lan must keep resolving to 10.0.56.200 through FlowerCore.DNS before this manifest is applied.

step-ca (the ACME CA on noc1) resolves names through pfSense Unbound (10.0.56.1), not cluster CoreDNS. If you apply this manifest before the DNS override exists, cert-manager's HTTP-01 challenge fails silently and stays failed for ~2h (exponential backoff) unless someone manually runs kubectl -n fc-llm-bridge delete order <order> to bust the cache. See memory feedback_pfsense_dns_required_for_acme.md.

Verify the record through the public preflight API:

curl -sk "https://dns.iamworkin.lan/api/v1/zones/iamworkin.lan/resolve-preflight?hostname=fc-llm-bridge.iamworkin.lan"
# Expect: "resolvable": true

Or run the repo's gate script, which wraps the same check:

python scripts/check-pfsense-dns.py
# Historical filename retained; implementation now calls FlowerCore.DNS
# resolve-preflight instead of raw resolver lookups.
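
For scripted use, the "resolvable": true expectation can be turned into an exit code. This is a minimal sketch, assuming the preflight response is JSON that contains the field exactly as shown above; preflight_ok is a hypothetical helper and not part of the repo:

```shell
# Hypothetical helper: succeeds only if the JSON on stdin reports
# "resolvable": true (whitespace around the colon is tolerated).
preflight_ok() {
  grep -Eq '"resolvable"[[:space:]]*:[[:space:]]*true'
}

# Usage, with the preflight URL from this README:
#   curl -sk "https://dns.iamworkin.lan/api/v1/zones/iamworkin.lan/resolve-preflight?hostname=fc-llm-bridge.iamworkin.lan" \
#     | preflight_ok || { echo "DNS preflight failed" >&2; exit 1; }
```

A plain text match keeps the check dependency-free; if jq is available, parsing the field properly would be the sturdier choice.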

If the record is missing, recreate it through FlowerCore.DNS before pushing:

curl -sk https://dns.iamworkin.lan/api/v1/servers
curl -sk -X POST https://dns.iamworkin.lan/api/v1/servers/<serverId>/zones/iamworkin.lan/records \
  -H "Content-Type: application/json" \
  -d '{"name":"fc-llm-bridge","type":"A","data":"10.0.56.200","ttl":300}'

2. Create the FC LLM Bridge API Keys 1Password item

The Claude API Key item in vault IAmWorkin already exists (id e5tth3y5mp3lhdavg35pxadzca, see docs/ai-agents/anthropic-integration.md).

The new item for per-consumer bridge API keys does NOT yet exist. Create it before the first apply of this manifest — the Deployment marks the individual key env vars optional: true so missing keys will not crash the pod, but the bridge will reject every request with 401 until at least one key is populated.

| Field | Item position | Type | Purpose |
| --- | --- | --- | --- |
| credential | Top section | Password (random, 48 char) | Unused placeholder required by the 1Password schema for single-field items. Can be anything; this field is never read by K8s. |
| agent-zero-ws | "API Keys" section | Password (random, 48 char) | API key for the BLUEJAY-WS Agent Zero instance. |
| agent-zero-k8s | "API Keys" section | Password (random, 48 char) | API key for the K8s-hosted agent-zero Deployment. |
| spare-1 | "API Keys" section | Password (random, 48 char) | Reserve for future Agent Zero forks / smoke-test scripts. |
| spare-2 | "API Keys" section | Password (random, 48 char) | Reserve. |

Steps via the CLI (run from a machine with op signed in):

op item create \
  --category="API Credential" \
  --title="FC LLM Bridge API Keys" \
  --vault="IAmWorkin" \
  "API Keys.agent-zero-ws[password]=$(openssl rand -hex 24)" \
  "API Keys.agent-zero-k8s[password]=$(openssl rand -hex 24)" \
  "API Keys.spare-1[password]=$(openssl rand -hex 24)" \
  "API Keys.spare-2[password]=$(openssl rand -hex 24)"

OR via the 1Password GUI — create a new item titled exactly FC LLM Bridge API Keys in the IAmWorkin vault, add an API Keys section, add four password fields named agent-zero-ws, agent-zero-k8s, spare-1, spare-2 with openssl rand -hex 24 values.
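
openssl rand -hex 24 emits 24 random bytes encoded as 48 hexadecimal characters, which is where the "48 char" in the table comes from. A quick sanity check before pasting a value into the GUI could look like this (the variable name key is illustrative):

```shell
key="$(openssl rand -hex 24)"
# 24 random bytes, hex-encoded, give 48 characters.
if [ "${#key}" -eq 48 ]; then
  echo "ok: ${#key} characters"
else
  echo "unexpected length: ${#key}" >&2
fi
```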

Mapping to K8s: The 1Password Connect operator syncs each field to a Secret key of the same name. The Deployment's env vars (FlowerCore__LlmBridge__ApiKeys__agent-zero-ws etc) reference those Secret keys. In FlowerCore.Shared.Api.Authentication.ApiKeyAuthMiddleware, the key name (e.g. agent-zero-k8s) becomes the fc.app claim on the ClaimsPrincipal, which is what IBudgetLedger uses to scope spend per consumer.
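
Because the consumer names contain hyphens, the resulting env var names are not valid shell identifiers and cannot be read as "$VAR"; they only show up in env output. The sketch below lists which consumer keys are populated, which helps when the bridge answers every request with 401. populated_bridge_keys is a hypothetical helper, and the usage comment assumes a Deployment named fc-llm-bridge whose image ships an env binary:

```shell
# Hypothetical helper: reads `env`-style NAME=value lines on stdin and
# prints the consumer name of every bridge API key with a non-empty value.
populated_bridge_keys() {
  sed -n 's/^FlowerCore__LlmBridge__ApiKeys__\([^=]*\)=..*$/\1/p'
}

# Usage (Deployment name and in-container `env` are assumptions):
#   kubectl -n fc-llm-bridge exec deploy/fc-llm-bridge -- env | populated_bridge_keys
```

The helper prints names only, never values, so its output is safe to paste into a ticket or chat.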

3. Build + import the image to every RKE2 node

# From BLUEJAY-WS, in D:\git\FlowerCore\FlowerCore.LlmBridge
TAG="v$(date +%Y%m%d%H%M%S)"
dotnet.exe publish -c Release -o deploy/app \
  src/FlowerCore.LlmBridge.Web/FlowerCore.LlmBridge.Web.csproj
podman build -t localhost/fc-llm-bridge:$TAG -f deploy/Dockerfile.deploy deploy
podman save localhost/fc-llm-bridge:$TAG -o /tmp/fc-llm-bridge.tar

# SCP to each node and ctr import
for NODE in rke2-server rke2-agent1 rke2-agent2; do
  scp /tmp/fc-llm-bridge.tar $NODE:/tmp/
  ssh $NODE "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-llm-bridge.tar"
done

4. Bump the image tag in the manifest

Edit fc-llm-bridge.yaml and replace the placeholder localhost/fc-llm-bridge:v00000000000000 with the tag from step 3.
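
The edit can also be scripted. This is a minimal sketch, assuming the manifest's image line carries a tag in the v<14 digits> form produced in step 3; bump_image_tag is a hypothetical helper:

```shell
# Hypothetical helper: bump_image_tag MANIFEST NEW_TAG
# Rewrites any localhost/fc-llm-bridge:v<14 digits> reference to NEW_TAG.
bump_image_tag() {
  manifest="$1"
  tag="$2"
  # -i.bak works with both GNU and BSD sed; the backup is removed afterwards
  # so that `git add apps/fc-llm-bridge/` does not pick it up.
  sed -i.bak -E "s#(localhost/fc-llm-bridge:)v[0-9]{14}#\1${tag}#" "$manifest" \
    && rm -f "$manifest.bak"
}

# Usage, with $TAG still set from step 3:
#   bump_image_tag apps/fc-llm-bridge/fc-llm-bridge.yaml "$TAG"
```

Run git diff afterwards to confirm that only the image line changed.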

5. Commit + push

cd D:/git/FlowerCore/bluejay-infra
# re-run the DNS gate
python scripts/check-pfsense-dns.py
git add apps/fc-llm-bridge/
git commit -m "feat(fc-llm-bridge): deploy ADR-088 Agent Zero bridge"
git push

ArgoCD picks up the change within ~3 minutes and creates the infra-fc-llm-bridge Application.

6. Verify

# From noc1
fcadmin_ssh noc1 '
  kubectl -n argocd get application infra-fc-llm-bridge
  kubectl -n fc-llm-bridge get certificate,pod
  curl -sk -m 8 -o /dev/null -w "HTTP %{http_code}\n" https://fc-llm-bridge.iamworkin.lan/healthz
'

Expect: Certificate Ready: True within ~60s, /healthz HTTP 200.
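
Since the certificate can take around a minute to become Ready, a single curl right after the push may fail spuriously. A small polling loop avoids that. This is a sketch only: retry_until is a hypothetical helper, and the 12 attempts at 5-second intervals are illustrative values:

```shell
# Hypothetical helper: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: poll /healthz for up to about a minute. curl -f exits non-zero on HTTP >= 400.
#   retry_until 12 5 curl -skf -m 8 -o /dev/null https://fc-llm-bridge.iamworkin.lan/healthz
```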

7. Flip Agent Zero to the bridge

After the bridge passes a real chat smoke test, update the Agent Zero ConfigMap (apps/agent-zero/agent-zero.yaml) to route through the bridge:

  • A0_SET_chat_model_api_base / config.json > chat_model.api_base -> https://fc-llm-bridge.iamworkin.lan/v1
  • Add an A0_SET_chat_model_api_key env var wired to a K8s Secret sourced from FC LLM Bridge API Keys field agent-zero-k8s.
  • Set chat_model.name to fc:balanced (or a concrete model) — the bridge accepts both tier aliases and concrete model names.

Do the same for BLUEJAY-WS Agent Zero (agent-zero-ws key), or keep the workstation on direct Ollama and only route Anthropic calls through the bridge (the design doc describes this split as the preferred approach).

Current state at staging time (2026-04-23)

  • fc-llm-bridge.iamworkin.lan — public FlowerCore.DNS preflight is now green and resolves to 10.0.56.200; keep python scripts/check-pfsense-dns.py green before push.
  • FC LLM Bridge API Keys — NOT created in 1Password (user action).
  • Claude API Key — already exists in IAmWorkin vault (e5tth3y5mp3lhdavg35pxadzca), also consumed by AiStation and Chat.Web.
  • localhost/fc-llm-bridge:v* image — not yet built; FlowerCore.LlmBridge repo has local commit 6d285b5 only, no remote.
  • ArgoCD infra-fc-llm-bridge Application — will be auto-created by the bluejay-infra ApplicationSet once the directory is on main.

Why tcpSocket probes (not /healthz)

The bridge runs ApiKeyAuthMiddleware. /healthz and /health are exempt via FlowerCore:LlmBridge:AuthExemptPaths, so an HTTP probe would work today. But a future change to the middleware registration order could silently make kubelet probes return 401/404, and the failing probes would then take pods down on every deploy. A tcpSocket probe only checks that the port accepts connections, so it stays robust against that regression. Memory: feedback_k8s_probes_behind_auth_middleware.md.