feat(agent-zero): route chat_model through fc-llm-bridge (ADR-088)

Flips Agent Zero's chat_model from direct local Ollama (gemma3:12b via
the 127.0.0.1:11434 sidecar proxy) to the FlowerCore LLM Bridge
(fc:balanced tier, OpenAI-compatible, Anthropic Claude Sonnet under the
hood) so chat turns are spend-tracked and can dispatch to any provider
via a single tier alias.

Scope is intentionally minimal and reversible:
  - chat_model: ollama/gemma3:12b/127.0.0.1:11434
              → openai/fc:balanced/fc-llm-bridge internal service URL
  - utility_model, embedding_model, browser_model: UNCHANGED
    (stay on local 127.0.0.1 Ollama sidecar — no spend, low latency,
    not worth routing through the bridge for small-model traffic).

Auth: new A0_SET_chat_model_api_key env var wired to the
fc-llm-bridge-api-keys Secret (field: agent-zero-k8s). The Secret is
synced by a new OnePasswordItem pointing at "FC LLM Bridge API Keys"
in the IAmWorkin vault. Bearer-token auth is now accepted by the
bridge (FlowerCore.LlmBridge@3225f1f).

Rollback: revert this commit; old image v202604231449 is still present
on all RKE2 nodes, and Agent Zero's strategy: Recreate makes the flip
atomic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Andrew Stoltz
2026-04-23 09:54:27 -05:00
parent 84634f59f0
commit 62db15c69c

View File

@@ -94,6 +94,19 @@ subjects:
# Connects to a local proxy that routes to workstation Ollama first and edge1 second
# Blue Jay profile with 21 tools, 3 prompts, 4 extensions
---
# FC LLM Bridge API key for Agent Zero (ADR-088 chat_model routing).
# Syncs from 1Password item "FC LLM Bridge API Keys" (field: agent-zero-k8s).
# Consumed by the chat_model only; util / embedding / browser stay on local
# Ollama via the 127.0.0.1 sidecar proxy.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
name: fc-llm-bridge-api-keys
namespace: agent-zero
spec:
itemPath: "vaults/IAmWorkin/items/FC LLM Bridge API Keys"
---
apiVersion: apps/v1
kind: Deployment
@@ -239,12 +252,15 @@ spec:
# Link Blue Jay profile from workspace into Agent Zero's expected path
ln -sfn /a0/work/.bluejay/agents/bluejay /a0/agents/bluejay
# Write model config BEFORE initialize.sh loads it
# The _model_config plugin reads config.json (NOT config.yaml)
# Default is OpenRouter; override to the local proxy, which prefers
# the workstation and falls back to edge1 automatically.
# The _model_config plugin reads config.json (NOT config.yaml).
# chat_model: FlowerCore LLM Bridge (ADR-088) — OpenAI-compat,
# spend-tracked, tier-aliased (fc:balanced → Claude Sonnet).
# api_key comes from A0_SET_chat_model_api_key env var (overrides
# config.json). util + embedding stay on local 127.0.0.1 Ollama
# proxy (workstation primary, edge1 fallback).
mkdir -p /a0/usr/plugins/_model_config
cat > /a0/usr/plugins/_model_config/config.json << 'MODELCFG'
{"allow_chat_override":true,"chat_model":{"provider":"ollama","name":"gemma3:12b","api_base":"http://127.0.0.1:11434","ctx_length":8192,"ctx_history":0.7,"vision":false,"kwargs":{"temperature":0,"num_ctx":8192}},"utility_model":{"provider":"ollama","name":"qwen2.5:1.5b","api_base":"http://127.0.0.1:11434","ctx_length":8192,"ctx_input":0.7,"kwargs":{"num_ctx":8192}},"embedding_model":{"provider":"ollama","name":"nomic-embed-text","api_base":"http://127.0.0.1:11434","kwargs":{}}}
{"allow_chat_override":true,"chat_model":{"provider":"openai","name":"fc:balanced","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc.cluster.local:8080/v1","ctx_length":8192,"ctx_history":0.7,"vision":false,"kwargs":{"temperature":0,"num_ctx":8192}},"utility_model":{"provider":"ollama","name":"qwen2.5:1.5b","api_base":"http://127.0.0.1:11434","ctx_length":8192,"ctx_input":0.7,"kwargs":{"num_ctx":8192}},"embedding_model":{"provider":"ollama","name":"nomic-embed-text","api_base":"http://127.0.0.1:11434","kwargs":{}}}
MODELCFG
# Strip heredoc indentation
sed -i 's/^ //' /a0/usr/plugins/_model_config/config.json
@@ -256,13 +272,22 @@ spec:
# Agent identity
- name: AGENT_NAME
value: "Blue Jay (NUC)"
# Chat model — workstation primary, edge1 fallback via local proxy
# Chat model — routed through FlowerCore LLM Bridge (ADR-088)
# so spend is tracked and tier aliases (fc:cheap/fc:balanced/fc:deep)
# dispatch to Ollama or Anthropic via a single OpenAI-compat endpoint.
# Util / embedding / browser stay on local Ollama via 127.0.0.1 proxy
# for zero-latency, zero-cost small-model traffic.
- name: A0_SET_chat_model_provider
value: "ollama"
value: "openai"
- name: A0_SET_chat_model_name
value: "gemma3:12b"
value: "fc:balanced"
- name: A0_SET_chat_model_api_base
value: "http://127.0.0.1:11434"
value: "http://fc-llm-bridge.fc-llm-bridge.svc.cluster.local:8080/v1"
- name: A0_SET_chat_model_api_key
valueFrom:
secretKeyRef:
name: fc-llm-bridge-api-keys
key: agent-zero-k8s
- name: A0_SET_chat_model_ctx_length
value: "8192"
- name: A0_SET_chat_model_kwargs