infra(ai): consolidate fleet Ollama consumers onto GX10 VIP 10.0.57.201

Repoints fc-chat, fc-ttsreader, knowledge, fc-llm-bridge (off the slow edge1 Pi5 10.0.57.17) and intranet (off the reimaged BLUEJAY-AI test laptop 10.0.56.132) to the GX10 (DGX Spark / GB10) Ollama over the PROD MetalLB VIP 10.0.57.201.

GX10 serves gemma3:12b/gemma3:4b/qwen2.5:1.5b/nomic-embed-text/llama3.2:1b on local NVMe, warm-pinned (keep_alive=-1).

fc-chat default model qwen2.5-coder:7b -> gemma3:12b (the coder model won't pull reliably on the GX10; gemma3:12b is the warm fleet default + a better general-chat model). Other consumers keep their exact models.

Inline comments referencing edge1/BLUEJAY-AI are now historical; the values are the GX10 VIP.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 00:54:36 -05:00
parent 303c450bc9
commit e0460bd881
5 changed files with 8 additions and 8 deletions
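Since the repoint only works if the VIP actually hosts every model a consumer names, a pre-rollout check along these lines may be useful. This is a minimal sketch, not part of the commit: it assumes the standard Ollama `GET /api/tags` listing, and `missing_models`, `fetch_tags`, and `REQUIRED_MODELS` are hypothetical names chosen for illustration. The model list covers only the two fc-chat values visible in this diff.

```python
import json
import urllib.request

# Assumption: GX10 VIP and port as given in the commit message.
OLLAMA_BASE_URL = "http://10.0.57.201:11434"

# Models the fc-chat config in this diff refers to (illustrative list only).
REQUIRED_MODELS = ["gemma3:12b", "gemma3:4b"]


def missing_models(tags_response: dict, required: list[str]) -> list[str]:
    """Return the required model names absent from an /api/tags response.

    Matching is exact, so a model listed as "name:latest" will not match
    a bare "name" entry in `required`.
    """
    hosted = {m.get("name") for m in tags_response.get("models", [])}
    return [name for name in required if name not in hosted]


def fetch_tags(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the locally available models from an Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

Running `missing_models(fetch_tags(OLLAMA_BASE_URL), REQUIRED_MODELS)` from inside a pod, rather than from a workstation, would also exercise the cluster route to the VIP; an empty list means both models are present.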
--- a/apps/fc-chat/fc-chat.yaml
+++ b/apps/fc-chat/fc-chat.yaml
@@ -34,10 +34,10 @@ data:
  # proved Chat pods time out reaching 10.0.56.20:11434. Keep generation and
  # behavior-rule checks on the cluster-routable edge1 endpoint until that route
  # is fixed; choose models that edge1 actually hosts.
-  FlowerCore__AI__OllamaBaseUrl: "http://10.0.57.17:11434"
-  FlowerCore__AI__DefaultModelName: "qwen2.5-coder:7b"
-  ChatOptions__BehaviorRuleEngine__OllamaBaseUrl: "http://10.0.57.17:11434"
-  ChatOptions__BehaviorRuleEngine__FallbackOllamaBaseUrl: "http://10.0.57.17:11434"
+  FlowerCore__AI__OllamaBaseUrl: "http://10.0.57.201:11434"
+  FlowerCore__AI__DefaultModelName: "gemma3:12b"
+  ChatOptions__BehaviorRuleEngine__OllamaBaseUrl: "http://10.0.57.201:11434"
+  ChatOptions__BehaviorRuleEngine__FallbackOllamaBaseUrl: "http://10.0.57.201:11434"
  ChatOptions__BehaviorRuleEngine__ModelName: "gemma3:4b"
  FlowerCore__AI__Memory__UseSharedIndexingAdapter: "true"
  FlowerCore__AI__Memory__UseOllamaEmbeddings: "true"