php.iamworkin.lan returned 404 on every path: the GX10 GitOps capture grabbed
fc-php's deployment/service but NOT its IngressRoute (chicken-egg — php wasn't
routed at capture time), so Traefik matched no route. Pod is 1/1 Running 37h —
the 404 was pure missing-route, confirmed by diffing against the healthy sibling
mysql-web (which has its IngressRoute).
Mirrors the mysql-web / fc-network pattern: a cert-manager Certificate (step-ca-acme
ClusterIssuer) to mint php-web-tls + an IngressRoute Host(php.iamworkin.lan)->php-web:5400.
Additive only.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The github-runner-tts-reader pod was crash-looping (329 restarts) and consuming
memory on the over-pressured old rke2 cluster (rke2-agent1 ~81%), contributing
to Blazor SignalR circuit drops on ttsreader/chat. It provides no working CI in
this state. Set replicas: 0 so ArgoCD stops re-creating it; restore to 1 once
the runner is fixed or CI moves to a working host.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bootstrap manifests for the GX10 cluster platform layer (NUC->GX10 migration).
Direct-applied to GX10 + LIVE: step-ca-acme ClusterIssuer Ready (ACME->noc1 step-ca),
Traefik v3.6.10 via RKE2 HelmChart CRD at MetalLB VIP 10.0.57.202 (prod-pool, temp
parallel-run; no clash with live old .200). Under gx10/ NOT apps/* to avoid the old
ApplicationSet auto-deploying GX10 manifests to the OLD cluster.
NetworkPolicy matches the destination POD port. dms-web svc:80 -> containerPort
8080, so the egress must allow 8080 (the fc-chat rule already allows 80+8080,
which is why chat worked and dms timed out). Add 8080 to the fc-dms egress.
AZ only had fc_chat (chat-session) + fc_knowledge (RAG) — so it had no product
capabilities (the 'mysql manager' gap). Wire fc_dms (dynamic message signs, ~13
tools): OnePasswordItem dms-mcp-keys (1P 'FlowerCore DMS MCP Keys' field credential)
-> DMS_MCP_API_KEY -> X-Api-Key; builder adds fc_dms; netpol egress fc-dms:80.
Proven: dms-web/mcp returns 200 with this key. presentations/messageboard/
segmentdisplay/telephony 1P MCP-key items exist for the same pattern; mysql+signage
need 1P items provisioned first (mysql/mcp 401s with no key). Watch context budget.
The PROD-VLAN VIP 10.0.57.201 MTU-black-holes Agent Zero's ~150KB requests
(full prompt + 108 MCP tools) -> connection reset mid-stream -> AZ 'same message
again' loop. Switch FlowerCore__Chat__OllamaBaseUrl to the INFRA-VLAN NodePort
10.0.56.14:30976 (same VLAN as the old cluster, carries 150KB fine). Verified:
150KB POST = 200 via NodePort, times out via VIP. NodePort pinned to 30976 on GX10.
Image -> v20260614-arifix (Telephony 86ff0d0: ReceiveAsync no longer cancelled).
Remove the WebSocketKeepAliveTimeoutSeconds/WebSocketReceiveTimeoutSeconds=3600
band-aids; the code now disables the pong deadline by default and ignores the
receive timeout (liveness = keepalive ping + WebSocketException/Close).
Root cause: the live deploy carried env Tts__PiperUrl=edge1 (drifted, not in git)
which shadows appsettings Tts.PiperUrl. Codify Tts__PiperUrl=GX10 + Ari__ env to
match live so git is source-of-truth; the configmap edit alone was inert.
Sync the bluejay-profile ConfigMap's embedded system_prompt.md with the
rewritten scripts/agent-zero/agents/bluejay/system_prompt.md: Ollama section
-> GX10 hub (VIP 10.0.57.201, GB10/121GiB); model table with tool-calling
flags (qwen2.5 = tools, gemma3 = 400-on-tools/vision-only, nomic = embed);
new 'Models & Tool-Calling' + 'Knowledge & RAG' rule blocks; stripped dead
WSL/R9700/.132/host.docker.internal/port-30050 lanes; de-pinned test counts;
'Blu' team is persona vocabulary not a fixed team. Personality preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cancelling ClientWebSocket.ReceiveAsync via CancellationToken ABORTS the
socket (a half-read WS frame can't resume). The per-iteration
iterationCts.CancelAfter(WebSocketReceiveTimeoutSeconds) therefore aborted a
healthy idle ARI WebSocket every 45s (state=Aborted), not the keepalive pong
(proven: loop persisted after pong-timeout 15s->3600s). A large receive
timeout lets ReceiveAsync block harmlessly while the PBX is idle; real drops
still surface immediately as WebSocketException -> reconnect. Proper code fix
(stop using CancelAfter on the receive) tracked separately.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Asterisk res_http_websocket does not reliably answer client PING frames
with PONG, so .NET KeepAliveTimeout (default 15s) aborted a healthy idle
ARI WebSocket every ~45s (ping@30s + pong-wait@15s), dropping StasisStart
events so the *100 IVR intermittently answered with no audio. Generous pong
timeout stops the false aborts; genuine drops still caught by the 45s
receive-timeout state re-check and TCP-level WebSocketException.
Surfaced by FlowerCore.Telephony.SipTests Call_Star100_ReceivesAudibleAudioStream
(0 RTP packets while ExtToExt RTP-hook passed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Influence Audit panel now surfaces the per-turn effective prompt
(RagContextSnapshot) as an operator/debug row. FlowerCore.Chat d389e4b.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>