The PROD-VLAN VIP 10.0.57.201 MTU-black-holes Agent Zero's ~150KB requests
(full prompt + 108 MCP tools) -> connection reset mid-stream -> AZ 'same message
again' loop. Switch FlowerCore__Chat__OllamaBaseUrl to the INFRA-VLAN NodePort
10.0.56.14:30976 (same VLAN as the old cluster, carries 150KB fine). Verified:
150KB POST = 200 via NodePort, times out via VIP. NodePort pinned to 30976 on GX10.
Agent Zero's agentic tool-loop ran on cloud Anthropic Sonnet (the bridge's
Anthropic key is currently 401) + gemma3:4b util (gemma3 returns 400 "does not
support tools" — fatal for the loop). Repoint the bridge ModelRouter tiers:
Balanced -> Ollama qwen2.5:14b (AZ chat) and Cheap -> qwen2.5:7b (AZ util), both
on the GX10 VIP 10.0.57.201 (already the bridge OllamaBaseUrl). Env-only, no
rebuild; Wiring A keeps the budget ledger + cache. Also: AZ chat ctx -> 32768,
browser -> qwen2.5:7b (text/tool-capable, vision off), AGENT_NAME -> "Blue Jay"
(the NUC role is retired). qwen2.5:7b + :14b pulled + warm-pinned on the GX10.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Repoints fc-chat, fc-ttsreader, knowledge, fc-llm-bridge (off the slow edge1
Pi5 10.0.57.17) and intranet (off the reimaged BLUEJAY-AI test laptop
10.0.56.132) to the GX10 (DGX Spark / GB10) Ollama over the PROD MetalLB VIP
10.0.57.201. GX10 serves gemma3:12b/gemma3:4b/qwen2.5:1.5b/nomic-embed-text/
llama3.2:1b on local NVMe, warm-pinned (keep_alive=-1).
fc-chat default model qwen2.5-coder:7b -> gemma3:12b (the coder model won't
pull reliably on the GX10; gemma3:12b is the warm fleet default + a better
general-chat model). Other consumers keep their exact models. Inline comments
referencing edge1/BLUEJAY-AI are now historical; the values are the GX10 VIP.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ships the Bearer-token auth fix (FlowerCore.LlmBridge@3225f1f) so Agent
Zero's OpenAI provider can authenticate with Authorization: Bearer in
addition to the original X-Api-Key header.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pods in this cluster inherit ndots=5. External FQDNs with <5 dots (like
api.anthropic.com) are expanded through the search path first, and the 4th
suffix `api.anthropic.com.iamworkin.lan` matches CoreDNS' `template IN A
iamworkin.lan` wildcard — resolves to Traefik VIP 10.0.56.200. TLS connect
lands on Traefik's default cert and the AnthropicClient rejects with
RemoteCertificateNameMismatch/RemoteCertificateChainErrors.
Setting ndots=2 makes the resolver try the bare FQDN first (3 dots in
api.anthropic.com), so the search path never fires.
Reference: memory feedback_coredns_ndots_template_collision. Wider follow-up:
the CoreDNS template plugin should add fallthrough for external public suffixes,
so every FC service calling external HTTPS APIs stops hitting this trap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 1Password item "Claude API Key" stores the key in a standard Password
field (labeled `password`), so the OnePasswordItem operator creates the K8s
Secret with key `password`. Deployment was referencing `credential`, which
made the pod fail with CreateContainerConfigError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Built from FlowerCore.LlmBridge@6d285b5 (initial scaffold). Imported on all
three RKE2 nodes via podman save + ctr import. Replaces v00000000000000
placeholder — ArgoCD sync will roll the pod.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Staged but NOT applied. Do not git push until the two pre-requisites below
are done. See apps/fc-llm-bridge/README.md for the full order-of-ops.
Manifests (apps/fc-llm-bridge/fc-llm-bridge.yaml, 8 docs):
- Namespace fc-llm-bridge
- OnePasswordItem anthropic-api-key (existing Claude API Key item)
- OnePasswordItem fc-llm-bridge-api-keys (NEW item, pending creation)
- PersistentVolumeClaim fc-llm-bridge-data (2Gi longhorn)
- Deployment fc-llm-bridge (port 8080, uid 1654, readOnlyRootFilesystem,
tcpSocket probes to survive future ApiKeyAuthMiddleware reordering)
- Service fc-llm-bridge ClusterIP
- Certificate fc-llm-bridge-cert (step-ca-acme)
- IngressRoute fc-llm-bridge (fc-llm-bridge.iamworkin.lan, websecure)
Pre-requisites BEFORE git push:
1. pfSense Unbound override fc-llm-bridge.iamworkin.lan -> 10.0.56.200
(currently NXDOMAIN -- verified via nslookup and check-pfsense-dns.py).
Skipping this step puts cert-manager HTTP-01 into ~2h backoff.
2. Create 1Password item `FC LLM Bridge API Keys` in vault IAmWorkin with
password fields: agent-zero-ws, agent-zero-k8s, spare-1, spare-2.
3. Build + import localhost/fc-llm-bridge:v<tag> to rke2-server +
rke2-agent1 + rke2-agent2. Bump image tag from placeholder
v00000000000000 before committing the apply.
Related: ADR-088 (FlowerCore.Notes/ARCHITECTURE.md), design doc at
FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>