bluejay-infra

Author	SHA1	Message	Date
Andrew Stoltz	11f32f1a6e	deploy(dns): add GX10 fc-dns app	2026-06-17 02:12:40 -05:00
Andrew Stoltz	083e7f41cd	fix(fc-php): restore missing IngressRoute + TLS cert (php-web 404 on GX10) php.iamworkin.lan returned 404 on every path: the GX10 GitOps capture grabbed fc-php's deployment/service but NOT its IngressRoute (chicken-egg — php wasn't routed at capture time), so Traefik matched no route. Pod is 1/1 Running 37h — the 404 was pure missing-route, confirmed by diffing against the healthy sibling mysql-web (which has its IngressRoute). Mirrors the mysql-web / fc-network pattern: a cert-manager Certificate (step-ca-acme ClusterIssuer) to mint php-web-tls + an IngressRoute Host(php.iamworkin.lan)->php-web:5400. Additive only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 01:57:47 -05:00
Andrew Stoltz	336c4a6ec0	deploy(signage): roll GX10 F2 image	2026-06-17 01:25:04 -05:00
Andrew Stoltz	415fec9e4d	gx10-gitops: deploy-loop proof — mark knowledge svc managed-by gx10-argocd	2026-06-16 22:33:40 -05:00
Andrew Stoltz	6c0be8563d	gx10-gitops: capture live manifests for 32 product namespaces (ArgoCD adoption source)	2026-06-16 22:24:23 -05:00
Andrew Stoltz	0218b1f8b6	gx10-gitops: pilot — capture live knowledge manifests (adoption source)	2026-06-16 22:18:20 -05:00
Andrew Stoltz	4b58b0ca5f	deploy: align gateway key field	2026-06-16 21:08:03 -05:00
Andrew Stoltz	bd8adb2188	deploy: add MCP gateway for Agent Zero	2026-06-16 21:01:52 -05:00
Andrew Stoltz	d32abd62c8	deploy(chat): chat-web v20260616-circuit-mood-5711f2d Ships the Blazor circuit-resilience + mood + telemetry fixes to live chat: - FcAiChat reconnect-resync (stuck _generating after a circuit drop) - fc-blazor-start.js client serverTimeout 60s (fewer spurious 1006 reconnects) - ChatToolVisibility (tool plumbing hidden in personality chat) - mood empathy fix (avatar no longer "excited" on bad news) - fc-circuit-telemetry.js + /api/clientlog sink (kubectl-readable circuit data) Image built on noc1 + imported to rke2-server + rke2-agent1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 13:29:01 -05:00
Andrew Stoltz	204001a89d	deploy(dns): pin current DNS image	2026-06-16 11:45:13 -05:00
Andrew Stoltz	6950010ea4	ops(github-runner): scale tts-reader runner to 0 (crash-looping, memory relief) The github-runner-tts-reader pod was crash-looping (329 restarts) and consuming memory on the over-pressured old rke2 cluster (rke2-agent1 ~81%), contributing to Blazor SignalR circuit drops on ttsreader/chat. It provides no working CI in this state. Set replicas: 0 so ArgoCD stops re-creating it; restore to 1 once the runner is fixed or CI moves to a working host. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 11:27:40 -05:00
Andrew Stoltz	b28ab73a19	deploy(ttsreader): pin TT-3 endpoint fix image	2026-06-16 05:33:43 -05:00
Andrew Stoltz	09398d451f	deploy(ttsreader): pin TT-3 plane health image	2026-06-16 05:15:48 -05:00
Andrew Stoltz	3a7978ab1f	deploy(dns): pin DN-3b drift image	2026-06-15 20:56:30 -05:00
Andrew Stoltz	c0bfcb46fa	deploy(dns): pin DN-3 PowerDNS image	2026-06-15 20:33:18 -05:00
Andrew Stoltz	ebbf501038	deploy(dns): pin DN-2 MCP bridge image	2026-06-15 20:15:42 -05:00
Andrew Stoltz	d4f24f6f43	deploy(dns): wire MCP transport key	2026-06-15 19:58:52 -05:00
Andrew Stoltz	9f4805f1d6	deploy(dns): pin DN-2 entity CRUD image	2026-06-15 19:52:42 -05:00
Andrew Stoltz	b9a81fb4c0	deploy(dns): pin DN-1 rate-limit image	2026-06-15 19:07:56 -05:00
Andrew Stoltz	a4ccd30429	deploy(chat): require intranet fallback citation image	2026-06-15 18:25:19 -05:00
Andrew Stoltz	09b22e32c2	deploy(chat): pin activation hardened citation image	2026-06-15 18:21:54 -05:00
Andrew Stoltz	5bb136554d	deploy(chat): pin citation card fallback image	2026-06-15 18:16:15 -05:00
Andrew Stoltz	485710230b	deploy(chat): pin citation card image	2026-06-15 18:06:08 -05:00
Andrew Stoltz	f016375419	deploy(chat): pin citation route fix image	2026-06-15 17:50:21 -05:00
Andrew Stoltz	fc64638029	deploy(chat): pin citation fallback fix-forward	2026-06-15 17:32:39 -05:00
Andrew Stoltz	6b751b0fbe	deploy(chat): pin citation fallback image	2026-06-15 17:26:03 -05:00
Andrew Stoltz	a03dbe166d	deploy(library): pin RL5 fine action image	2026-06-15 15:56:20 -05:00
Andrew Stoltz	6febe1fdb3	deploy(dns): enable production auth profile	2026-06-15 15:08:03 -05:00
Andrew Stoltz	40fd35ba44	deploy(chat): pin CH-6 presence image	2026-06-14 19:26:31 -05:00
Andrew Stoltz	17654835e7	gx10/platform: step-ca-acme issuer + Traefik HelmChart (migration platform layer) Bootstrap manifests for the GX10 cluster platform layer (NUC->GX10 migration). Direct-applied to GX10 + LIVE: step-ca-acme ClusterIssuer Ready (ACME->noc1 step-ca), Traefik v3.6.10 via RKE2 HelmChart CRD at MetalLB VIP 10.0.57.202 (prod-pool, temp parallel-run; no clash with live old .200). Under gx10/ NOT apps/* to avoid the old ApplicationSet auto-deploying GX10 manifests to the OLD cluster.	2026-06-14 18:06:25 -05:00
Andrew Stoltz	63b8d4b667	Deploy Chat regroup CH-3 image	2026-06-14 18:01:43 -05:00
Andrew Stoltz	2c12f35f75	agent-zero: fix fc_dms netpol egress port (8080 = pod targetPort, not svc 80) NetworkPolicy matches the destination POD port. dms-web svc:80 -> containerPort 8080, so the egress must allow 8080 (the fc-chat rule already allows 80+8080, which is why chat worked and dms timed out). Add 8080 to the fc-dms egress.	2026-06-14 16:25:25 -05:00
Andrew Stoltz	e33fe81823	agent-zero: connect fc_dms MCP (product-manager fan-out, first server) AZ only had fc_chat (chat-session) + fc_knowledge (RAG) — so it had no product capabilities (the 'mysql manager' gap). Wire fc_dms (dynamic message signs, ~13 tools): OnePasswordItem dms-mcp-keys (1P 'FlowerCore DMS MCP Keys' field credential) -> DMS_MCP_API_KEY -> X-Api-Key; builder adds fc_dms; netpol egress fc-dms:80. Proven: dms-web/mcp returns 200 with this key. presentations/messageboard/ segmentdisplay/telephony 1P MCP-key items exist for the same pattern; mysql+signage need 1P items provisioned first (mysql/mcp 401s with no key). Watch context budget.	2026-06-14 16:19:34 -05:00
Andrew Stoltz	ef6afdd577	fc-llm-bridge: repoint Ollama to GX10 NodePort (fix AZ MTU black-hole) The PROD-VLAN VIP 10.0.57.201 MTU-black-holes Agent Zero's ~150KB requests (full prompt + 108 MCP tools) -> connection reset mid-stream -> AZ 'same message again' loop. Switch FlowerCore__Chat__OllamaBaseUrl to the INFRA-VLAN NodePort 10.0.56.14:30976 (same VLAN as the old cluster, carries 150KB fine). Verified: 150KB POST = 200 via NodePort, times out via VIP. NodePort pinned to 30976 on GX10.	2026-06-14 15:12:05 -05:00
Andrew Stoltz	62ca7dacf6	telephony: deploy ARI abort-fix image v20260614-arifix; drop 3600s band-aids Image -> v20260614-arifix (Telephony 86ff0d0: ReceiveAsync no longer cancelled). Remove the WebSocketKeepAliveTimeoutSeconds/WebSocketReceiveTimeoutSeconds=3600 band-aids; the code now disables the pong deadline by default and ignores the receive timeout (liveness = keepalive ping + WebSocketException/Close).	2026-06-14 14:36:11 -05:00
Andrew Stoltz	d03a92407d	gx10/tts: persist Piper /tts source + manifest (telephony TTS port baseline) Dockerfile (linux/arm64, en_US-amy-medium baked), tts_service.py (16kHz/16-bit/mono WAV, numpy resample 22050->16000), gx10-tts.yaml (CPU NodePort 30850, no GPU request), README (build/import/cutover/verify on the GX10 cluster).	2026-06-14 14:14:59 -05:00
Andrew Stoltz	e4d1735d35	telephony: make TTS cutover EFFECTIVE via Tts__PiperUrl env (overrides configmap) Root cause: the live deploy carried env Tts__PiperUrl=edge1 (drifted, not in git) which shadows appsettings Tts.PiperUrl. Codify Tts__PiperUrl=GX10 + Ari__ env to match live so git is source-of-truth; the configmap edit alone was inert.	2026-06-14 14:12:02 -05:00
Andrew Stoltz	15edcb7c71	telephony: cut TTS over to GX10 (10.0.56.14:30850, amy-medium); keep edge1 warm - Tts.PiperUrl edge1 10.0.57.17:8500 -> GX10 NodePort 10.0.56.14:30850 - add netpol egress to GX10 TTS; keep edge1 egress as rollback target - DefaultEngine piper / SampleRate 8000 unchanged (sln16 16kHz path)	2026-06-14 14:01:50 -05:00
Andrew Stoltz	284ca84166	agent-zero: GX10 system prompt rewrite (tool-calling + RAG rules, strip dead lanes) Sync the bluejay-profile ConfigMap's embedded system_prompt.md with the rewritten scripts/agent-zero/agents/bluejay/system_prompt.md: Ollama section -> GX10 hub (VIP 10.0.57.201, GB10/121GiB); model table with tool-calling flags (qwen2.5 = tools, gemma3 = 400-on-tools/vision-only, nomic = embed); new 'Models & Tool-Calling' + 'Knowledge & RAG' rule blocks; stripped dead WSL/R9700/.132/host.docker.internal/port-30050 lanes; de-pinned test counts; 'Blu' team is persona vocabulary not a fixed team. Personality preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 13:40:25 -05:00
Andrew Stoltz	7a86c40cf1	fix(telephony): ARI receive timeout 45s->3600s — the real false-abort root cause Cancelling ClientWebSocket.ReceiveAsync via CancellationToken ABORTS the socket (a half-read WS frame can't resume). The per-iteration iterationCts.CancelAfter(WebSocketReceiveTimeoutSeconds) therefore aborted a healthy idle ARI WebSocket every 45s (state=Aborted), not the keepalive pong (proven: loop persisted after pong-timeout 15s->3600s). A large receive timeout lets ReceiveAsync block harmlessly while the PBX is idle; real drops still surface immediately as WebSocketException -> reconnect. Proper code fix (stop using CancelAfter on the receive) tracked separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 13:04:13 -05:00
Andrew Stoltz	de5c9f39fd	deploy(devicemgmt): pin regroup web image	2026-06-14 12:52:30 -05:00
Andrew Stoltz	d5311de676	fix(telephony): stop ARI WebSocket false-abort loop (pong-timeout 15s->3600s) Asterisk res_http_websocket does not reliably answer client PING frames with PONG, so .NET KeepAliveTimeout (default 15s) aborted a healthy idle ARI WebSocket every ~45s (ping@30s + pong-wait@15s), dropping StasisStart events so the *100 IVR intermittently answered with no audio. Generous pong timeout stops the false aborts; genuine drops still caught by the 45s receive-timeout state re-check and TCP-level WebSocketException. Surfaced by FlowerCore.Telephony.SipTests Call_Star100_ReceivesAudibleAudioStream (0 RTP packets while ExtToExt RTP-hook passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 12:50:12 -05:00
Andrew Stoltz	7b4f57bb97	deploy(updater): pin regroup web image	2026-06-14 12:45:39 -05:00
Andrew Stoltz	c569c05ad7	deploy(retail-library): roll regroup web images	2026-06-14 12:38:57 -05:00
Andrew Stoltz	fc8297041a	deploy(fc-chat): roll effective-prompt debug reveal v20260614-debugreveal-d389e4b Influence Audit panel now surfaces the per-turn effective prompt (RagContextSnapshot) as an operator/debug row. FlowerCore.Chat d389e4b. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 12:33:37 -05:00
Andrew Stoltz	e1554757e8	deploy(fc-chat): roll user-bubble prompt-leak fix v20260614-bubblefix-37f57b0 Stored/displayed user message is now the raw prompt; injected scaffolding (mood contract + guidance + memory) goes to the model via ragContext as a system message and is captured in RagContextSnapshot for debug. FlowerCore.Chat 37f57b0 + FlowerCore.Common 4d741b3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 03:15:26 -05:00
Andrew Stoltz	0c8e6ee8ab	agent-zero(models): tool-capable qwen2.5 on GX10 via fc-llm-bridge (Wiring A) Agent Zero's agentic tool-loop ran on cloud Anthropic Sonnet (the bridge's Anthropic key is currently 401) + gemma3:4b util (gemma3 returns 400 "does not support tools" — fatal for the loop). Repoint the bridge ModelRouter tiers: Balanced -> Ollama qwen2.5:14b (AZ chat) and Cheap -> qwen2.5:7b (AZ util), both on the GX10 VIP 10.0.57.201 (already the bridge OllamaBaseUrl). Env-only, no rebuild; Wiring A keeps the budget ledger + cache. Also: AZ chat ctx -> 32768, browser -> qwen2.5:7b (text/tool-capable, vision off), AGENT_NAME -> "Blue Jay" (the NUC role is retired). qwen2.5:7b + :14b pulled + warm-pinned on the GX10. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 02:38:17 -05:00
Andrew Stoltz	9d5a1cce97	deploy(fc-chat): roll mood-signal build v20260614-moodsignal-a606892 Workstream A: set_mood structured signal replaces leaky [mood:X] text (FlowerCore.Chat a606892). Image built + imported to rke2-server and rke2-agent1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 02:21:47 -05:00
Andrew Stoltz	e0460bd881	infra(ai): consolidate fleet Ollama consumers onto GX10 VIP 10.0.57.201 Repoints fc-chat, fc-ttsreader, knowledge, fc-llm-bridge (off the slow edge1 Pi5 10.0.57.17) and intranet (off the reimaged BLUEJAY-AI test laptop 10.0.56.132) to the GX10 (DGX Spark / GB10) Ollama over the PROD MetalLB VIP 10.0.57.201. GX10 serves gemma3:12b/gemma3:4b/qwen2.5:1.5b/nomic-embed-text/ llama3.2:1b on local NVMe, warm-pinned (keep_alive=-1). fc-chat default model qwen2.5-coder:7b -> gemma3:12b (the coder model won't pull reliably on the GX10; gemma3:12b is the warm fleet default + a better general-chat model). Other consumers keep their exact models. Inline comments referencing edge1/BLUEJAY-AI are now historical; the values are the GX10 VIP. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 00:54:36 -05:00
Robot	303c450bc9	Cl-5: Admin console infra finding — rides DM.Web (zero new infra) Audit of apps/fc-devicemgmt/ confirms the admin/helpdesk console needs NO new infra: the existing host-matched IngressRoute (devices.iamworkin.lan, no path constraint) + step-ca-acme Certificate already cover admin routes served under FlowerCore:PathBase (ADR-204 routes-inside-DM.Web). ADMIN-CONSOLE-INFRA.md records the finding + the open Q-MP question (distinct admin hostname vs PathBase path) with the exact 3-step add if a separate host is later chosen. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 23:22:16 -05:00

580 Commits