353 Commits

Author SHA1 Message Date
Andrew Stoltz
3c1d212251 fc-messageboard: deploy latest web image via gitops 2026-04-22 15:48:05 -05:00
Andrew Stoltz
c0547a9964 fc-signalcontrol: switch probes to tcpSocket — middleware blocks /health
The app's ApiKeyAuthenticationMiddleware runs BEFORE /health is mapped, so
unauthenticated probe requests get 404. tcpSocket probes verify the listener
is up without auth, which is correct for an internal K8s probe (kubelet
talks to the pod IP directly, not externally).

Real fix is in the app: move /health before the middleware or mark it
[AllowAnonymous]. Tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 15:21:04 -05:00
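To illustrate what the tcpSocket probe in c0547a9964 checks, here is a minimal Python sketch. It only attempts a TCP connect and never sends an HTTP request, so no request reaches the auth middleware. The function name `tcp_probe`, its parameters, and the timeout are assumptions for illustration; this is not the kubelet's implementation.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # Success means only that something is listening; no bytes are sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as a failed probe.
        return False
```

The trade-off the commit notes still applies: a check like this proves the listener is up, not that the app behind it is healthy.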
Andrew Stoltz
973c1dae72 fc-signalcontrol: fix probe path /metrics/prometheus -> /health
The app exposes /health (Program.cs:91 maps a Healthy text response) but does
NOT expose /metrics/prometheus. K8s liveness/readiness probes against a 404
endpoint kept the pod in CrashLoopBackOff after the PVC mount was added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 15:15:07 -05:00
Andrew Stoltz
475737b36f fc-signalcontrol: add PVC + volumeMount for SQLite data dir
Live cluster had a Longhorn PVC `signalcontrol-data` mounted at /app/data
since 2026-04-14, but the bluejay-infra git manifest never declared it. As a
result, when ArgoCD recreated the Deployment from git (after deletion to fix
an unrelated selector-label mismatch caught during cert-manager recovery),
the new pod started without /app/data and crashed with `SQLite Error 14:
unable to open database file 'data/signalcontrol.db'`.

Bring git in line with reality: declare the PVC, mount it, and switch the
Deployment to `strategy.type: Recreate` (RWO PVC blocks rolling updates per
existing memory feedback_k8s_rwo_rollout.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 15:10:10 -05:00
Andrew Stoltz
3bb3801fbd fix(monitoring): use short service name for irc-notify IRC_HOST
CoreDNS iamworkin.lan template + ndots:5 was hijacking
unrealircd.irc.svc.cluster.local lookups → Traefik VIP → timeout.
Every alert since ~2026-04-09 silently failed with "IRC send failed: timed out",
which also killed the thermal-printer path (routed through irc-notify).

Same fix pattern as guacamole@28b7600.
2026-04-22 09:55:23 -05:00
Andrew Stoltz
28b76001a8 fix(guacamole): use short service name for GUAC_URL (CoreDNS template collision)
The guac-k8s-sync CronJob has been crash-looping (exit 7) since the
2026-04-11 run. Root cause: CoreDNS has an `*.iamworkin.lan`
template wildcard, and the Kubernetes pod resolv.conf ships with
`ndots:5` plus a search list that includes `iamworkin.lan`.

Resolving `guacamole.guacamole.svc.cluster.local` (4 dots < 5) goes
through search-suffix expansion BEFORE the bare FQDN. The iamworkin.lan
suffix makes it `guacamole.guacamole.svc.cluster.local.iamworkin.lan`,
which matches the template and answers with Traefik LB VIP
10.0.56.200. That VIP has no pod-network hairpin route, so curl exits
with 'No route to host'.

Using the short name `http://guacamole:8080` keeps the query at 0
dots, search expansion runs on the bare name, and the in-namespace
`guacamole.svc.cluster.local` suffix hits the Kubernetes CoreDNS
plugin directly (ClusterIP 10.43.229.31).

Alt fixes considered but not taken: trim the CoreDNS template regex
to exclude `.svc.cluster.local.` prefixes (cross-cutting, higher
blast radius); trailing-dot FQDN in the URL (curl/Java HTTP clients
handle inconsistently).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:52:53 -05:00
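The search-list behaviour described in 28b76001a8 can be sketched in Python. This is a simplified model of a glibc-style stub resolver, not the resolver's actual code. The function name `candidates` and the exact search list are assumptions: the commit only states that the list includes `iamworkin.lan`, and the three cluster suffixes are the usual ones for a pod in the `guacamole` namespace.

```python
def candidates(name, search, ndots):
    """List query names in the order a glibc-style stub resolver tries them."""
    if name.endswith("."):
        # Trailing dot: absolute name, no search expansion.
        return [name]
    expanded = [f"{name}.{suffix}." for suffix in search]
    as_is = [name + "."]
    # Names with fewer than `ndots` dots go through the search list first.
    if name.count(".") < ndots:
        return expanded + as_is
    return as_is + expanded


# Assumed search list for a pod in the guacamole namespace.
SEARCH = [
    "guacamole.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "iamworkin.lan",
]

long_name = candidates("guacamole.guacamole.svc.cluster.local", SEARCH, ndots=5)
short_name = candidates("guacamole", SEARCH, ndots=5)
```

Under these assumptions, `long_name` lists the `.iamworkin.lan` expansion ahead of the bare FQDN, which is where the template wildcard can answer first. `short_name` starts with `guacamole.guacamole.svc.cluster.local.`, so the in-cluster name is tried before the `iamworkin.lan` suffix is ever reached.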
Andrew Stoltz
0c67fa5356 asterisk: add *832 test-entry dialplan for VDAY workflow AATs
Lets live SIP AATs (ext 901–904, from-internal context) dial *832 to
exercise the Victory Day workflow + Fun Menu + AsteriskGameHandler path
without routing through Twilio. Mnemonic: *832 = V-D-A (8-3-2) from the
V-D-A-Y keypad pattern.

Maps to Stasis(flowercore-pbx,inbound-pstn,+15074618329) — same call-
type classification as a real Twilio-inbound call to the VDAY DID, so
InboundPstnHandler routes to the seeded VDAY workflow identically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:51:49 -05:00
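The mnemonic in 0c67fa5356 follows the standard phone keypad letter mapping, which a short Python sketch can check. The names `KEYPAD`, `LETTER_TO_DIGIT`, and `to_digits` are made up for illustration and do not come from the dialplan.

```python
# Standard letter groups on a phone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_digits(word: str) -> str:
    """Translate letters to the keypad digits that carry them."""
    return "".join(LETTER_TO_DIGIT[c] for c in word.upper())
```

`to_digits("VDA")` gives `832`, and `to_digits("VDAY")` gives `8329`, which matches the last four digits of the DID in the Stasis arguments.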
Andrew Stoltz
62e342cfb2 guacamole: consolidate nodeSelector — use rke2-server for guacd too
Previous commit 90deacd raced with the user's f0733ff (which had
already pinned the guacamole web Deployment to rke2-server for the
NFS ACL). That left two nodeSelector blocks on the web pod and an
inconsistent agent2 pin on guacd. Align both pods to rke2-server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:36:25 -05:00
Andrew Stoltz
90deacd154 guacamole: pin guacd + web to rke2-agent2 for NFS recordings mount
Synology NFS export at /volume1/kubernetes currently grants mount
permission only to 10.0.56.13 (rke2-agent2). rke2-agent1 gets
"access denied by server". guacd + guacamole web both need the
recordings volume, so co-locating is also efficient. Remove the
nodeSelector once the Synology NFS ACL opens to all cluster nodes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:35:13 -05:00
Andrew Stoltz
f0733ff89d feat(guacamole): wire 1Password vault extension + logback into deployment
Adds the 1Password vault JAR to the Guacamole pod so connection params
like ${OP:ItemTitle/fieldLabel} are resolved from 1Password Connect at
tunnel-open time. Credentials never land in MySQL — only token literals.

Deployment changes:
- env: OP_CONNECT_URL=http://10.0.56.10:8180, OP_VAULT_ID=..., plus
  OP_CONNECT_TOKEN from secret/guacamole-1password-token/credential.
- env: ENABLE_ENVIRONMENT_PROPERTIES=true so OP_* env vars render as
  op-connect-url / op-connect-token / op-vault-id properties the
  extension reads.
- volumeMount for guacamole-vault-jar at
  /etc/guacamole/extensions/guacamole-vault-1password-1.0.0.jar
- volumeMount for guacamole-logback so we see DEBUG token-inject lines.
- nodeSelector kubernetes.io/hostname=rke2-server — the Synology NFS
  export for /volume1/kubernetes currently only allows rke2-server.
  Followup: add rke2-agent1/2 to the export and remove this selector.

New ConfigMaps:
- guacamole-vault-jar (binaryData, ~312KB JAR, Gson shaded, built from
  FlowerCore.Notes/k8s/guacamole/extensions/1password-vault via mvn).
- guacamole-logback with DEBUG on io.flowercore.guacamole.vault — drop
  to INFO once resolution is proven stable.

Existing guacamole-properties: added onepassword-vault to extension-priority.

The guacamole-1password-token Secret is NOT in git — it holds a verbatim
copy of the onepassword-connect-operator bearer token. Followup task:
provision a scoped Connect token for Guacamole and rotate the copy out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:32:51 -05:00
Andrew Stoltz
313bdcb21a guacamole: NFS subPath — Synology exports /volume1/kubernetes root only
First pass used nfs.path=/volume1/kubernetes/guacamole/recordings,
which triggered "mount.nfs: access denied by server" on rke2-agent1.
Synology NFS export is scoped to /volume1/kubernetes; match the
working fc-desktop pattern: mount the export root and select the
subdirectory via volumeMount.subPath.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:23:49 -05:00
Andrew Stoltz
5f4818bd96 guacamole: wire session recording to Synology NFS
Phase 5 of docs/infrastructure/guacamole-customization-plan.md:

- Mount /volume1/kubernetes/guacamole/recordings (Synology 10.0.58.3)
  into both guacd (writer) and guacamole web (reader) at
  /var/lib/guacamole/recordings
- Set RECORDING_SEARCH_PATH env on guacamole web -- the Guacamole
  Docker entrypoint treats any RECORDING_* var as an enable signal
  for the history-recording-storage extension (symlinks the JAR
  from /opt/guacamole/environment/RECORDING_/extensions/ into
  GUACAMOLE_HOME/extensions/)

Per-connection recording still requires setting recording-path on
each connection in MySQL -- follow-up task. This commit enables
the plumbing; no sessions record yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:15:55 -05:00
Andrew Stoltz
fff998dab5 matrix, zabbix: add volumeMode to postgres PVC templates
Same ArgoCD + SSA self-heal loop pattern as guacamole (20e4130):
K8s defaults volumeMode=Filesystem on volumeClaimTemplates at
creation, git omits it, argocd-controller owns the atomic list so
every reconcile sees drift, and volumeClaimTemplates is immutable
so it can never reconcile. Adding the field closes both loops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:48:43 -05:00
Andrew Stoltz
20e4130c74 guacamole: add volumeMode to guac-mysql PVC template
Closes the infra-guacamole OutOfSync sync loop. K8s API sets
volumeMode=Filesystem as a default on volumeClaimTemplates at creation,
but the git manifest omitted it. ArgoCD uses ServerSideApply with
atomic ownership of volumeClaimTemplates, so every sync saw a
desired/live mismatch on that one field. volumeClaimTemplates is
immutable after creation so ArgoCD could never reconcile it --
autoHealAttemptsCount climbed to 6091. Adding the field to git
matches live and breaks the loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:29:40 -05:00
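The drift described in 20e4130c74 can be shown with a toy Python model. It assumes only what the commit states: the list is compared as a whole, and the live object carries a defaulted `volumeMode`. The function name, the template name `data`, and the access mode are hypothetical, and this is not how ArgoCD computes diffs internally.

```python
def is_in_sync(desired_templates, live_templates):
    """Toy model: an atomically owned list is in sync only if it matches whole."""
    return desired_templates == live_templates


# Manifest in git before the fix: volumeMode omitted (hypothetical values).
git_before = [
    {"metadata": {"name": "data"},
     "spec": {"accessModes": ["ReadWriteOnce"]}},
]

# Live object: the API server filled in the default at creation.
live = [
    {"metadata": {"name": "data"},
     "spec": {"accessModes": ["ReadWriteOnce"], "volumeMode": "Filesystem"}},
]

# Manifest in git after the fix: the defaulted field is declared explicitly.
git_after = [
    {"metadata": {"name": "data"},
     "spec": {"accessModes": ["ReadWriteOnce"], "volumeMode": "Filesystem"}},
]
```

In this model `is_in_sync(git_before, live)` is false on every pass. Because the live side cannot be changed after creation, the only way to reach a match is to change git, as `git_after` does.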
Andrew Stoltz
3cf675b8c3 ttsreader: wire operator secrets through 1password 2026-04-17 10:05:24 -05:00
Andrew Stoltz
2a9f2e4540 Improve TTS Reader workspace card layout 2026-04-17 03:57:23 -05:00
Andrew Stoltz
b15a35a258 Fix TTS Reader character layout 2026-04-17 03:48:03 -05:00
Andrew Stoltz
3f4985ee13 Deploy TTS Reader queue feedback fix 2026-04-17 03:34:28 -05:00
Andrew Stoltz
e535a8d34b Deploy TTS Reader voice preview update 2026-04-17 02:13:09 -05:00
Andrew Stoltz
6ddbd2cae5 Point TTS Reader at Pi Ollama defaults 2026-04-17 00:53:45 -05:00
Andrew Stoltz
e9608651f7 Bump TTS Reader image to v20260417001119 2026-04-17 00:33:29 -05:00
Andrew Stoltz
abdb7a806e Bump TTS Reader image to v20260416234817 2026-04-16 23:53:42 -05:00
Andrew Stoltz
7afb5043c4 Fix ttsreader forwarded header handling 2026-04-16 21:55:46 -05:00
Andrew Stoltz
1883953cb8 Enable live Piper and ffmpeg for fc-ttsreader 2026-04-16 21:18:43 -05:00
Andrew Stoltz
9c555db083 telephony: bump web image to v202604170153 2026-04-16 20:56:30 -05:00
Andrew Stoltz
cb349c6764 Configure TtsReader Bible corpus path 2026-04-16 20:44:23 -05:00
Andrew Stoltz
3888c4c3e0 Align fc-ttsreader with hardened runtime 2026-04-16 20:06:53 -05:00
Andrew Stoltz
7aec403e96 Pin telephony-web v202604170059 2026-04-16 20:03:01 -05:00
Andrew Stoltz
5685ab0550 Improve The Lounge MOTD contrast 2026-04-16 19:49:53 -05:00
Andrew Stoltz
d4d3455ef2 Style The Lounge as FlowerCore IRC 2026-04-16 19:45:52 -05:00
Andrew Stoltz
29d557003f fix: deploy responsive telephony debug menu 2026-04-16 19:45:49 -05:00
Andrew Stoltz
719aa8c1c6 fix: align desk phone dtmf mode with yealink provisioning 2026-04-16 19:36:37 -05:00
Andrew Stoltz
63cf5193ef Use recreate strategy for UnrealIRCd 2026-04-16 19:33:28 -05:00
Andrew Stoltz
ef0e1f2505 fix: update telephony web image tag 2026-04-16 19:30:36 -05:00
Andrew Stoltz
f8eb946704 Add IRC MOTD file 2026-04-16 19:29:43 -05:00
Andrew Stoltz
929449c55c apps: fc-chat refactor + fc-menuboard app split
- fc-chat.yaml: TLS/IngressRoute only (Deployment managed by deploy script, matches fc-signage/fc-mysql/fc-kiosk pattern)
- fc-menuboard: new app bundle

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:25:25 -05:00
Andrew Stoltz
9d0da584af Add The Lounge web IRC client 2026-04-16 19:10:10 -05:00
Andrew Stoltz
4f33d7a053 fix(telephony): chown /shared-tts in initContainer + harden security context
Two follow-ups to the Piper TTS wire-up landed in d3ffad9:

1. Telephony-web runs as uid 1654 (non-root), but the hostPath at
   /tmp/tts-audio is owned by root:root 0755. Pod couldn't write .sln16
   files — every Piper call would succeed at the HTTP layer and then
   fall back to the sound map when File.WriteAllBytesAsync threw
   "Permission denied." Extend the existing fix-data-perms initContainer
   to chown the shared-tts mount too (0755 world-readable, so the
   Asterisk pod — running as a different uid — can still read).

2. Pod security context now explicitly sets runAsNonRoot: true + runAsUser
   1654 + runAsGroup 1654 (cluster policy), matching the pattern used
   by every other FlowerCore service.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 16:29:21 -05:00
Andrew Stoltz
d3ffad9190 fix(telephony): PiperUrl 10.0.57.15 → .17 + shared-tts hostPath for TTS playback
Piper was never reachable on 10.0.57.15 — edge1's actual address is
10.0.57.17 (SSH config, project_edge1_sdcard memory). Every telephony
prompt hit the 8s HttpClient timeout and fell back to the built-in sound
map (vm-advopts, vm-goodbye, beep) instead of speaking the real workflow
text. Verified from noc1: `curl http://10.0.57.17:8500/health` returns
HTTP 200 in 6ms, `POST /tts` returns a 16kHz mono WAV in 606ms.

Changes:

- apps/telephony/telephony.yaml
  - `Tts.PiperUrl` → `http://10.0.57.17:8500`
  - NetworkPolicy egress allow → `10.0.57.17/32:8500`
  - Header comment now documents the POST /tts {"text":"..."} contract
  - telephony-web pod mounts `/shared-tts` from hostPath `/tmp/tts-audio`
    (rke2-agent1). This is where `AsteriskProvider.SpeakTextAsync` writes
    the synthesized .sln16 before calling ARI `Play sound:tts/<name>`.

- apps/asterisk/deployment.yaml
  - Asterisk pod mounts the same hostPath at
    `/var/lib/asterisk/sounds/tts` so it can read and play what
    telephony-web wrote. Both deployments have
    `nodeSelector: kubernetes.io/hostname: rke2-agent1` so the hostPath
    is guaranteed to be the same directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 16:19:48 -05:00
Andrew Stoltz
403d061664 fix(asterisk): hostAlias downloads.asterisk.org so sounds actually download
CoreDNS wildcard for iamworkin.lan catches unresolved names and returns
the Traefik VIP (10.0.56.200), so downloads.asterisk.org from inside a
pod returns 404 from Traefik rather than the real Sangoma mirror. Pin
the real IP (165.22.184.19 = oss-downloads.sangoma.com) via hostAliases
so curl reaches the actual server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 15:54:39 -05:00
Andrew Stoltz
45a2cb3f93 fix(asterisk): curl -k for sounds download — cluster TLS MITM
Cluster egress goes through a step-ca-fronted TLS proxy that install-sounds
doesn't trust ("SSL certificate problem: self-signed certificate"). The
Asterisk core sounds tarball is a public artifact; integrity is enforced
downstream when Asterisk plays the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 15:48:34 -05:00
Andrew Stoltz
e1922564ae fix(asterisk): actually install core sounds (en, ulaw 1.6.1)
The install-sounds init container was a stub that left /var/lib/asterisk/sounds/en
empty. Result: every SpeakText fallback path (vm-advopts, vm-goodbye, characters:*,
digits/*, beep, pbx-invalid) resolved to a missing file, Asterisk silently failed
each Playback, zero RTP was produced, and callers heard dead air. This is why
dialing *0 (Settings Menu) or *100 (Debug IVR) "picks up quietly" — there is
literally nothing to stream.

Replaced the stub with alpine:3.20 + curl + tar that downloads the pinned
asterisk-core-sounds-en-ulaw-1.6.1.tar.gz (~10 MB) from downloads.asterisk.org
and unpacks it into the sounds emptyDir. Idempotent — skips download if
vm-goodbye is already present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 15:42:58 -05:00
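The idempotent check in e1922564ae (skip the download if vm-goodbye is already present) can be sketched in Python. The init container itself uses curl and tar, so this only illustrates the control flow. The function name, the injected `fetch_and_unpack` callable, and the sentinel filename `vm-goodbye.ulaw` are assumptions.

```python
from pathlib import Path


def install_sounds(sounds_dir: Path, fetch_and_unpack,
                   sentinel: str = "vm-goodbye.ulaw") -> bool:
    """Populate sounds_dir unless the sentinel file exists.

    Returns True if fetch_and_unpack ran, False if it was skipped.
    """
    if (sounds_dir / sentinel).exists():
        # A previous run already unpacked the sounds; do nothing.
        return False
    sounds_dir.mkdir(parents=True, exist_ok=True)
    # Caller supplies the download and unpack step (hypothetical hook).
    fetch_and_unpack(sounds_dir)
    return True
```

Calling it twice with the same directory runs the fetch once, provided the fetch step actually writes the sentinel file.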
Andrew Stoltz
7762a0079a Add K8s deployment manifests for SignalControl, MessageBoard, Chat, TTS Reader
Full deployment manifests (Namespace, Deployment, Service, Certificate,
IngressRoute) for 4 new FlowerCore services with port 8080, ClusterIP
on port 80, cert-manager step-ca-acme TLS, and /metrics/prometheus
health probes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 20:23:55 -05:00
Andrew Stoltz
ab7435a43a Update Agent Zero, Asterisk, and Telephony K8s manifests
- Update agent-zero deployment configuration
- Update Asterisk configmap and deployment
- Update telephony service manifest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 19:12:08 -05:00
Andrew Stoltz
53234bfcc8 Fix K8s sync script: use grep instead of python3
bitnami/kubectl image doesn't have python3. Replaced all python3
JSON parsing with grep/cut for auth token and connection data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 23:02:02 -05:00
Andrew Stoltz
cf572c167f Update Guacamole: branding JAR, K8s sync CronJob
- Updated bluejay-branding-1.0.0.jar with gold accents, hover fix,
  icon fix, pinstripe patterns, Blue Jay SVG logo
- Added guac-k8s-sync CronJob: runs every 2min, auto-updates pod
  names in Kubernetes exec connections when pods restart
- Fixed secret reference (guacamole-credentials, not guacamole-db-credentials)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 22:49:48 -05:00
Andrew Stoltz
7d5d0f86e7 Add signage service ingress manifests 2026-04-09 15:09:08 -05:00
Andrew Stoltz
8f59322329 Add step-ca TLS certs for mysql, php, desktop, signage, fc-landing
RKE2 Traefik has no ACME certResolver configured, so IngressRoutes
using certResolver: step-ca silently fall back to the Traefik default
self-signed cert. Fix by using cert-manager Certificate resources with
the step-ca-acme ClusterIssuer and tls.secretName in IngressRoutes.

- fc-landing: Add Certificate, change tls: {} to tls.secretName
- fc-mysql: New app (Certificate + IngressRoute only)
- fc-php: New app (Certificate + IngressRoute only)
- fc-desktop: New app (Certificate + IngressRoute only)
- fc-signage: New app (Certificate + IngressRoute, plus HTTP route for players)

Deployments/Services for mysql/php/desktop/signage are managed by
deploy scripts, not ArgoCD. These apps only manage TLS + ingress.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:20:23 -05:00
Claude Code
8f8290e0da Increase ctx to 8192 (system prompt + 21 tools need >2048) 2026-04-08 20:07:27 +00:00
Claude Code
607192aaec Reduce ctx to 2048 for Pi 5 CPU speed 2026-04-08 19:40:52 +00:00