infra(ai): consolidate fleet Ollama consumers onto GX10 VIP 10.0.57.201

Repoints fc-chat, fc-ttsreader, knowledge, fc-llm-bridge (off the slow edge1
Pi5 10.0.57.17) and intranet (off the reimaged BLUEJAY-AI test laptop
10.0.56.132) to the GX10 (DGX Spark / GB10) Ollama over the PROD MetalLB VIP
10.0.57.201. GX10 serves gemma3:12b/gemma3:4b/qwen2.5:1.5b/nomic-embed-text/
llama3.2:1b on local NVMe, warm-pinned (keep_alive=-1).
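
A minimal sketch of how the model list above could be checked against the VIP, assuming the standard Ollama HTTP API (`GET /api/tags` for the model list, `keep_alive` of -1 on `POST /api/generate` to keep a generative model loaded). The helper names `missing_models`, `warm_pin_body`, and `fetch_tags` are hypothetical and are not part of this commit.

```python
import json
import urllib.request

GX10_VIP = "http://10.0.57.201:11434"  # VIP named in this commit
EXPECTED_MODELS = [
    "gemma3:12b",
    "gemma3:4b",
    "qwen2.5:1.5b",
    "nomic-embed-text",
    "llama3.2:1b",
]


def missing_models(tags_payload, expected):
    """Return the expected models that are absent from an /api/tags payload."""
    served = {m["name"] for m in tags_payload.get("models", [])}
    # Ollama lists untagged pulls as "<name>:latest", so accept either form.
    return [m for m in expected
            if m not in served and f"{m}:latest" not in served]


def warm_pin_body(model):
    """Request body that loads a generative model and keeps it resident."""
    # Assumption: sent to POST /api/generate with no prompt; -1 means no unload.
    return {"model": model, "keep_alive": -1}


def fetch_tags(base_url=GX10_VIP, timeout=5):
    """Fetch the model list from an Ollama endpoint (performs a network call)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

From a host that can route to the VIP, `missing_models(fetch_tags(), EXPECTED_MODELS)` should return an empty list if every model named above is present.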

fc-chat default model changes from qwen2.5-coder:7b to gemma3:12b: the coder
model won't pull reliably on the GX10, and gemma3:12b is both the warm fleet
default and a better general-chat model. Other consumers keep their existing
models. Inline comments that mention edge1/BLUEJAY-AI are left in place as
historical context; the configured values now point at the GX10 VIP.
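
Because the old hosts survive in comments, a plain text search for them will still produce hits. The following is a rough sketch of a check that ignores full-line comments and reports only configured values that still reference the retired endpoints. The function name `stale_value_lines` is hypothetical, and trailing comments on a value line are not handled.

```python
import re

# Retired endpoints named in this commit: edge1 Pi5 and the BLUEJAY-AI laptop.
OLD_HOST_RE = re.compile(r"(?<![\d.])(?:10\.0\.57\.17|10\.0\.56\.132)(?!\d)")


def stale_value_lines(manifest_text):
    """Return (line_number, stripped_line) for non-comment lines using old hosts."""
    hits = []
    for lineno, line in enumerate(manifest_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # historical comments may still mention the old hosts
        if OLD_HOST_RE.search(line):
            hits.append((lineno, line.strip()))
    return hits
```

Run over each changed manifest after this commit, the function should return an empty list, since only comments still mention the old addresses.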

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Andrew Stoltz
2026-06-14 00:54:36 -05:00
parent 303c450bc9
commit e0460bd881
5 changed files with 8 additions and 8 deletions

@@ -34,10 +34,10 @@ data:
   # proved Chat pods time out reaching 10.0.56.20:11434. Keep generation and
   # behavior-rule checks on the cluster-routable edge1 endpoint until that route
   # is fixed; choose models that edge1 actually hosts.
-  FlowerCore__AI__OllamaBaseUrl: "http://10.0.57.17:11434"
-  FlowerCore__AI__DefaultModelName: "qwen2.5-coder:7b"
-  ChatOptions__BehaviorRuleEngine__OllamaBaseUrl: "http://10.0.57.17:11434"
-  ChatOptions__BehaviorRuleEngine__FallbackOllamaBaseUrl: "http://10.0.57.17:11434"
+  FlowerCore__AI__OllamaBaseUrl: "http://10.0.57.201:11434"
+  FlowerCore__AI__DefaultModelName: "gemma3:12b"
+  ChatOptions__BehaviorRuleEngine__OllamaBaseUrl: "http://10.0.57.201:11434"
+  ChatOptions__BehaviorRuleEngine__FallbackOllamaBaseUrl: "http://10.0.57.201:11434"
   ChatOptions__BehaviorRuleEngine__ModelName: "gemma3:4b"
   FlowerCore__AI__Memory__UseSharedIndexingAdapter: "true"
   FlowerCore__AI__Memory__UseOllamaEmbeddings: "true"

@@ -166,7 +166,7 @@ spec:
         optional: true
   # Shared.Chat — Ollama (edge1 Pi 5 + AI HAT+, matches bridge default)
   - name: FlowerCore__Chat__OllamaBaseUrl
-    value: "http://10.0.57.17:11434"
+    value: "http://10.0.57.201:11434"
   - name: FlowerCore__Chat__HttpTimeout
     value: "00:05:00"
   # Shared.Chat — Anthropic

@@ -605,7 +605,7 @@ spec:
   - name: TtsReader__Transcription__TimeoutSeconds
     value: "300"
   - name: TtsReader__Ollama__BaseUrl
-    value: "http://10.0.57.17:11434"
+    value: "http://10.0.57.201:11434"
   - name: TtsReader__Ollama__DefaultModel
     value: "gemma3:4b"
   - name: TtsReader__Ollama__TimeoutSeconds

@@ -92,7 +92,7 @@ spec:
   # down. Bulk embed runs in the background; /health does not depend on it.
   # Memory: feedback_pi5_nomic_embed_slow.
   - name: IntranetSearch__OllamaBaseUrl
-    value: "http://10.0.56.132:11434"
+    value: "http://10.0.57.201:11434"
   # Notes docs corpus IS now mounted at /srv/flowercore-notes (see the
   # notes-corpus-clone initContainer + notes-corpus-sync sidecar), so the
   # IntranetSearch indexer is ENABLED. First-boot bulk embed of the corpus

@@ -168,7 +168,7 @@ spec:
   # need a separate ingestion lane that can opt into the
   # workstation GPU when present.
   - name: FlowerCore__Ollama__BaseUrl
-    value: "http://10.0.57.17:11434"
+    value: "http://10.0.57.201:11434"
   - name: FlowerCore__Mcp__ApiKey__Key
     valueFrom:
       secretKeyRef: