bluejay-infra

Author	SHA1	Message	Date
Andrew Stoltz	cebd934872	deploy(php): roll non-root GX10 operator image	2026-06-17 05:22:36 -05:00
Andrew Stoltz	8d55ca1566	deploy(mysql): roll non-root GX10 operator image	2026-06-17 04:34:28 -05:00
Andrew Stoltz	b11f26b963	deploy(mysql): roll non-root GX10 web image	2026-06-17 04:08:23 -05:00
Andrew Stoltz	aa0525331d	deploy(updater): roll non-root GX10 image	2026-06-17 03:15:35 -05:00
Andrew Stoltz	9ce18e4acc	fix(irc): inject GX10 cloak keys from Secret	2026-06-17 02:39:55 -05:00
Andrew Stoltz	11f32f1a6e	deploy(dns): add GX10 fc-dns app	2026-06-17 02:12:40 -05:00
Andrew Stoltz	083e7f41cd	fix(fc-php): restore missing IngressRoute + TLS cert (php-web 404 on GX10) php.iamworkin.lan returned 404 on every path: the GX10 GitOps capture grabbed fc-php's deployment/service but NOT its IngressRoute (chicken-egg — php wasn't routed at capture time), so Traefik matched no route. Pod is 1/1 Running 37h — the 404 was pure missing-route, confirmed by diffing against the healthy sibling mysql-web (which has its IngressRoute). Mirrors the mysql-web / fc-network pattern: a cert-manager Certificate (step-ca-acme ClusterIssuer) to mint php-web-tls + an IngressRoute Host(php.iamworkin.lan)->php-web:5400. Additive only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 01:57:47 -05:00
Andrew Stoltz	336c4a6ec0	deploy(signage): roll GX10 F2 image	2026-06-17 01:25:04 -05:00
Andrew Stoltz	415fec9e4d	gx10-gitops: deploy-loop proof — mark knowledge svc managed-by gx10-argocd	2026-06-16 22:33:40 -05:00
Andrew Stoltz	6c0be8563d	gx10-gitops: capture live manifests for 32 product namespaces (ArgoCD adoption source)	2026-06-16 22:24:23 -05:00
Andrew Stoltz	0218b1f8b6	gx10-gitops: pilot — capture live knowledge manifests (adoption source)	2026-06-16 22:18:20 -05:00
Andrew Stoltz	4b58b0ca5f	deploy: align gateway key field	2026-06-16 21:08:03 -05:00
Andrew Stoltz	bd8adb2188	deploy: add MCP gateway for Agent Zero	2026-06-16 21:01:52 -05:00
Andrew Stoltz	d32abd62c8	deploy(chat): chat-web v20260616-circuit-mood-5711f2d Ships the Blazor circuit-resilience + mood + telemetry fixes to live chat: - FcAiChat reconnect-resync (stuck _generating after a circuit drop) - fc-blazor-start.js client serverTimeout 60s (fewer spurious 1006 reconnects) - ChatToolVisibility (tool plumbing hidden in personality chat) - mood empathy fix (avatar no longer "excited" on bad news) - fc-circuit-telemetry.js + /api/clientlog sink (kubectl-readable circuit data) Image built on noc1 + imported to rke2-server + rke2-agent1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 13:29:01 -05:00
Andrew Stoltz	204001a89d	deploy(dns): pin current DNS image	2026-06-16 11:45:13 -05:00
Andrew Stoltz	6950010ea4	ops(github-runner): scale tts-reader runner to 0 (crash-looping, memory relief) The github-runner-tts-reader pod was crash-looping (329 restarts) and consuming memory on the over-pressured old rke2 cluster (rke2-agent1 ~81%), contributing to Blazor SignalR circuit drops on ttsreader/chat. It provides no working CI in this state. Set replicas: 0 so ArgoCD stops re-creating it; restore to 1 once the runner is fixed or CI moves to a working host. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 11:27:40 -05:00
Andrew Stoltz	b28ab73a19	deploy(ttsreader): pin TT-3 endpoint fix image	2026-06-16 05:33:43 -05:00
Andrew Stoltz	09398d451f	deploy(ttsreader): pin TT-3 plane health image	2026-06-16 05:15:48 -05:00
Andrew Stoltz	3a7978ab1f	deploy(dns): pin DN-3b drift image	2026-06-15 20:56:30 -05:00
Andrew Stoltz	c0bfcb46fa	deploy(dns): pin DN-3 PowerDNS image	2026-06-15 20:33:18 -05:00
Andrew Stoltz	ebbf501038	deploy(dns): pin DN-2 MCP bridge image	2026-06-15 20:15:42 -05:00
Andrew Stoltz	d4f24f6f43	deploy(dns): wire MCP transport key	2026-06-15 19:58:52 -05:00
Andrew Stoltz	9f4805f1d6	deploy(dns): pin DN-2 entity CRUD image	2026-06-15 19:52:42 -05:00
Andrew Stoltz	b9a81fb4c0	deploy(dns): pin DN-1 rate-limit image	2026-06-15 19:07:56 -05:00
Andrew Stoltz	a4ccd30429	deploy(chat): require intranet fallback citation image	2026-06-15 18:25:19 -05:00
Andrew Stoltz	09b22e32c2	deploy(chat): pin activation hardened citation image	2026-06-15 18:21:54 -05:00
Andrew Stoltz	5bb136554d	deploy(chat): pin citation card fallback image	2026-06-15 18:16:15 -05:00
Andrew Stoltz	485710230b	deploy(chat): pin citation card image	2026-06-15 18:06:08 -05:00
Andrew Stoltz	f016375419	deploy(chat): pin citation route fix image	2026-06-15 17:50:21 -05:00
Andrew Stoltz	fc64638029	deploy(chat): pin citation fallback fix-forward	2026-06-15 17:32:39 -05:00
Andrew Stoltz	6b751b0fbe	deploy(chat): pin citation fallback image	2026-06-15 17:26:03 -05:00
Andrew Stoltz	a03dbe166d	deploy(library): pin RL5 fine action image	2026-06-15 15:56:20 -05:00
Andrew Stoltz	6febe1fdb3	deploy(dns): enable production auth profile	2026-06-15 15:08:03 -05:00
Andrew Stoltz	40fd35ba44	deploy(chat): pin CH-6 presence image	2026-06-14 19:26:31 -05:00
Andrew Stoltz	17654835e7	gx10/platform: step-ca-acme issuer + Traefik HelmChart (migration platform layer) Bootstrap manifests for the GX10 cluster platform layer (NUC->GX10 migration). Direct-applied to GX10 + LIVE: step-ca-acme ClusterIssuer Ready (ACME->noc1 step-ca), Traefik v3.6.10 via RKE2 HelmChart CRD at MetalLB VIP 10.0.57.202 (prod-pool, temp parallel-run; no clash with live old .200). Under gx10/ NOT apps/* to avoid the old ApplicationSet auto-deploying GX10 manifests to the OLD cluster.	2026-06-14 18:06:25 -05:00
Andrew Stoltz	63b8d4b667	Deploy Chat regroup CH-3 image	2026-06-14 18:01:43 -05:00
Andrew Stoltz	2c12f35f75	agent-zero: fix fc_dms netpol egress port (8080 = pod targetPort, not svc 80) NetworkPolicy matches the destination POD port. dms-web svc:80 -> containerPort 8080, so the egress must allow 8080 (the fc-chat rule already allows 80+8080, which is why chat worked and dms timed out). Add 8080 to the fc-dms egress.	2026-06-14 16:25:25 -05:00
Andrew Stoltz	e33fe81823	agent-zero: connect fc_dms MCP (product-manager fan-out, first server) AZ only had fc_chat (chat-session) + fc_knowledge (RAG) — so it had no product capabilities (the 'mysql manager' gap). Wire fc_dms (dynamic message signs, ~13 tools): OnePasswordItem dms-mcp-keys (1P 'FlowerCore DMS MCP Keys' field credential) -> DMS_MCP_API_KEY -> X-Api-Key; builder adds fc_dms; netpol egress fc-dms:80. Proven: dms-web/mcp returns 200 with this key. presentations/messageboard/ segmentdisplay/telephony 1P MCP-key items exist for the same pattern; mysql+signage need 1P items provisioned first (mysql/mcp 401s with no key). Watch context budget.	2026-06-14 16:19:34 -05:00
Andrew Stoltz	ef6afdd577	fc-llm-bridge: repoint Ollama to GX10 NodePort (fix AZ MTU black-hole) The PROD-VLAN VIP 10.0.57.201 MTU-black-holes Agent Zero's ~150KB requests (full prompt + 108 MCP tools) -> connection reset mid-stream -> AZ 'same message again' loop. Switch FlowerCore__Chat__OllamaBaseUrl to the INFRA-VLAN NodePort 10.0.56.14:30976 (same VLAN as the old cluster, carries 150KB fine). Verified: 150KB POST = 200 via NodePort, times out via VIP. NodePort pinned to 30976 on GX10.	2026-06-14 15:12:05 -05:00
Andrew Stoltz	62ca7dacf6	telephony: deploy ARI abort-fix image v20260614-arifix; drop 3600s band-aids Image -> v20260614-arifix (Telephony 86ff0d0: ReceiveAsync no longer cancelled). Remove the WebSocketKeepAliveTimeoutSeconds/WebSocketReceiveTimeoutSeconds=3600 band-aids; the code now disables the pong deadline by default and ignores the receive timeout (liveness = keepalive ping + WebSocketException/Close).	2026-06-14 14:36:11 -05:00
Andrew Stoltz	d03a92407d	gx10/tts: persist Piper /tts source + manifest (telephony TTS port baseline) Dockerfile (linux/arm64, en_US-amy-medium baked), tts_service.py (16kHz/16-bit/mono WAV, numpy resample 22050->16000), gx10-tts.yaml (CPU NodePort 30850, no GPU request), README (build/import/cutover/verify on the GX10 cluster).	2026-06-14 14:14:59 -05:00
Andrew Stoltz	e4d1735d35	telephony: make TTS cutover EFFECTIVE via Tts__PiperUrl env (overrides configmap) Root cause: the live deploy carried env Tts__PiperUrl=edge1 (drifted, not in git) which shadows appsettings Tts.PiperUrl. Codify Tts__PiperUrl=GX10 + Ari__ env to match live so git is source-of-truth; the configmap edit alone was inert.	2026-06-14 14:12:02 -05:00
Andrew Stoltz	15edcb7c71	telephony: cut TTS over to GX10 (10.0.56.14:30850, amy-medium); keep edge1 warm - Tts.PiperUrl edge1 10.0.57.17:8500 -> GX10 NodePort 10.0.56.14:30850 - add netpol egress to GX10 TTS; keep edge1 egress as rollback target - DefaultEngine piper / SampleRate 8000 unchanged (sln16 16kHz path)	2026-06-14 14:01:50 -05:00
Andrew Stoltz	284ca84166	agent-zero: GX10 system prompt rewrite (tool-calling + RAG rules, strip dead lanes) Sync the bluejay-profile ConfigMap's embedded system_prompt.md with the rewritten scripts/agent-zero/agents/bluejay/system_prompt.md: Ollama section -> GX10 hub (VIP 10.0.57.201, GB10/121GiB); model table with tool-calling flags (qwen2.5 = tools, gemma3 = 400-on-tools/vision-only, nomic = embed); new 'Models & Tool-Calling' + 'Knowledge & RAG' rule blocks; stripped dead WSL/R9700/.132/host.docker.internal/port-30050 lanes; de-pinned test counts; 'Blu' team is persona vocabulary not a fixed team. Personality preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 13:40:25 -05:00
Andrew Stoltz	7a86c40cf1	fix(telephony): ARI receive timeout 45s->3600s — the real false-abort root cause Cancelling ClientWebSocket.ReceiveAsync via CancellationToken ABORTS the socket (a half-read WS frame can't resume). The per-iteration iterationCts.CancelAfter(WebSocketReceiveTimeoutSeconds) therefore aborted a healthy idle ARI WebSocket every 45s (state=Aborted), not the keepalive pong (proven: loop persisted after pong-timeout 15s->3600s). A large receive timeout lets ReceiveAsync block harmlessly while the PBX is idle; real drops still surface immediately as WebSocketException -> reconnect. Proper code fix (stop using CancelAfter on the receive) tracked separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 13:04:13 -05:00
Andrew Stoltz	de5c9f39fd	deploy(devicemgmt): pin regroup web image	2026-06-14 12:52:30 -05:00
Andrew Stoltz	d5311de676	fix(telephony): stop ARI WebSocket false-abort loop (pong-timeout 15s->3600s) Asterisk res_http_websocket does not reliably answer client PING frames with PONG, so .NET KeepAliveTimeout (default 15s) aborted a healthy idle ARI WebSocket every ~45s (ping@30s + pong-wait@15s), dropping StasisStart events so the *100 IVR intermittently answered with no audio. Generous pong timeout stops the false aborts; genuine drops still caught by the 45s receive-timeout state re-check and TCP-level WebSocketException. Surfaced by FlowerCore.Telephony.SipTests Call_Star100_ReceivesAudibleAudioStream (0 RTP packets while ExtToExt RTP-hook passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 12:50:12 -05:00
Andrew Stoltz	7b4f57bb97	deploy(updater): pin regroup web image	2026-06-14 12:45:39 -05:00
Andrew Stoltz	c569c05ad7	deploy(retail-library): roll regroup web images	2026-06-14 12:38:57 -05:00
Andrew Stoltz	fc8297041a	deploy(fc-chat): roll effective-prompt debug reveal v20260614-debugreveal-d389e4b Influence Audit panel now surfaces the per-turn effective prompt (RagContextSnapshot) as an operator/debug row. FlowerCore.Chat d389e4b. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 12:33:37 -05:00

585 Commits