The guac-k8s-sync CronJob has been crash-looping (exit 7) since the
2026-04-11 run. Root cause: CoreDNS has an `*.iamworkin.lan`
template wildcard, and the Kubernetes pod resolv.conf ships with
`ndots:5` plus a search list that includes `iamworkin.lan`.
Resolving `guacamole.guacamole.svc.cluster.local` (4 dots < 5) goes
through search-suffix expansion BEFORE the bare FQDN. The iamworkin.lan
suffix makes it `guacamole.guacamole.svc.cluster.local.iamworkin.lan`,
which matches the template and answers with Traefik LB VIP
10.0.56.200. That VIP has no pod-network hairpin route, so curl exits
with 'No route to host'.
Fix: use the short name `http://guacamole:8080`. With 0 dots the
resolver still runs search expansion, but the in-namespace
`guacamole.svc.cluster.local` suffix is tried first, and the CoreDNS
kubernetes plugin answers it directly (ClusterIP 10.43.229.31).
Alt fixes considered but not taken: trim the CoreDNS template regex
to exclude `.svc.cluster.local.` prefixes (cross-cutting, higher
blast radius); trailing-dot FQDN in the URL (curl/Java HTTP clients
handle inconsistently).
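A minimal sketch of the lookup order described above, assuming
glibc-style resolver semantics (a name with fewer than `ndots` dots tries
the search suffixes first, and the first answer wins). The search-list
order and the `fake_coredns` stand-in are assumptions for illustration;
only the names and IPs come from this incident.

```python
from typing import Callable, List, Optional, Tuple

# Assumed pod search list; the source only confirms iamworkin.lan is in it.
SEARCH = ["guacamole.svc.cluster.local", "svc.cluster.local",
          "cluster.local", "iamworkin.lan"]


def candidate_queries(name: str, search: List[str], ndots: int) -> List[str]:
    """Return the absolute names a stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search expansion.
        return [name]
    expanded = [f"{name}.{suffix}." for suffix in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


def fake_coredns(qname: str) -> Optional[str]:
    """Hypothetical stand-in for the cluster DNS behaviour in this commit."""
    if qname == "guacamole.guacamole.svc.cluster.local.":
        return "10.43.229.31"   # kubernetes plugin: Service ClusterIP
    if qname.endswith(".iamworkin.lan."):
        return "10.0.56.200"    # template wildcard: Traefik LB VIP
    return None                 # NXDOMAIN


def first_answer(name: str, search: List[str], ndots: int,
                 lookup: Callable[[str], Optional[str]]
                 ) -> Optional[Tuple[str, str]]:
    """Walk the candidates and stop at the first one that resolves."""
    for qname in candidate_queries(name, search, ndots):
        ip = lookup(qname)
        if ip is not None:
            return qname, ip
    return None
```

Under these assumptions the 4-dot FQDN lands on the wildcard before the
bare name is ever asked, while the short name resolves on the first
suffix.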
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets live SIP AATs (ext 901–904, from-internal context) dial *832 to
exercise the Victory Day workflow + Fun Menu + AsteriskGameHandler path
without routing through Twilio. Mnemonic: *832 = V-D-A (8-3-2) from the
V-D-A-Y keypad pattern.
Maps to Stasis(flowercore-pbx,inbound-pstn,+15074618329) — same call-
type classification as a real Twilio-inbound call to the VDAY DID, so
InboundPstnHandler routes to the seeded VDAY workflow identically.
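The mnemonic can be checked with a small keypad lookup. This is only an
illustrative sketch using the standard phone keypad letter groups;
`keypad_digits` is a hypothetical helper, not part of the dialplan.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def keypad_digits(word: str) -> str:
    """Translate letters to the keypad digits they sit on."""
    return "".join(LETTER_TO_DIGIT[c] for c in word.upper())
```

`keypad_digits("VDA")` gives the feature code digits, and
`keypad_digits("VDAY")` matches the last four digits of the DID passed to
Stasis.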
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit 90deacd raced with the user's f0733ff (which had
already pinned the guacamole web Deployment to rke2-server for the
NFS ACL). That left two nodeSelector blocks on the web pod and an
inconsistent agent2 pin on guacd. Align both pods to rke2-server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Synology NFS export at /volume1/kubernetes currently grants mount
permission only to 10.0.56.13 (rke2-agent2). rke2-agent1 gets
"access denied by server". guacd + guacamole web both need the
recordings volume, so co-locating is also efficient. Remove the
nodeSelector once the Synology NFS ACL opens to all cluster nodes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the 1Password vault JAR to the Guacamole pod so connection params
like ${OP:ItemTitle/fieldLabel} are resolved from 1Password Connect at
tunnel-open time. Credentials never land in MySQL — only token literals.
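A rough Python sketch of the token substitution idea, for illustration
only. The real extension is a Java JAR; the regex, `resolve_tokens`, and
the `fetch_field` callback are hypothetical, and the token grammar beyond
the `${OP:ItemTitle/fieldLabel}` form shown above is an assumption.

```python
import re
from typing import Callable

# Assumed grammar: ${OP:<item title>/<field label>}
TOKEN = re.compile(r"\$\{OP:([^/}]+)/([^}]+)\}")


def resolve_tokens(value: str, fetch_field: Callable[[str, str], str]) -> str:
    """Replace each OP token with whatever fetch_field returns for it.

    fetch_field(item_title, field_label) stands in for the 1Password
    Connect lookup; values without tokens pass through unchanged.
    """
    def substitute(match: "re.Match[str]") -> str:
        return fetch_field(match.group(1), match.group(2))

    return TOKEN.sub(substitute, value)
```

Because substitution happens on the value read at tunnel-open time, the
stored parameter stays the literal token string.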
Deployment changes:
- env: OP_CONNECT_URL=http://10.0.56.10:8180, OP_VAULT_ID=..., plus
  OP_CONNECT_TOKEN from secret/guacamole-1password-token/credential.
- env: ENABLE_ENVIRONMENT_PROPERTIES=true so OP_* env vars render as
  op-connect-url / op-connect-token / op-vault-id properties the
  extension reads.
- volumeMount for guacamole-vault-jar at
  /etc/guacamole/extensions/guacamole-vault-1password-1.0.0.jar
- volumeMount for guacamole-logback so we see DEBUG token-inject lines.
- nodeSelector kubernetes.io/hostname=rke2-server: the Synology NFS
  export for /volume1/kubernetes currently only allows rke2-server.
  Followup: add rke2-agent1/2 to the export and remove this selector.
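The env-to-property mapping in the ENABLE_ENVIRONMENT_PROPERTIES bullet
follows Guacamole's naming rule: uppercase the property name and replace
dashes with underscores. A small sketch of that rule; `env_name_for` and
`lookup_property` are hypothetical helpers, not Guacamole APIs.

```python
from typing import Mapping, Optional


def env_name_for(property_name: str) -> str:
    """Map a guacamole.properties key to its environment variable name."""
    return property_name.upper().replace("-", "_")


def lookup_property(property_name: str,
                    environ: Mapping[str, str]) -> Optional[str]:
    """Read a property from an environment mapping, or None if unset."""
    return environ.get(env_name_for(property_name))
```

For example, `lookup_property("op-connect-url", os.environ)` would read
OP_CONNECT_URL inside the pod.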
New ConfigMaps:
- guacamole-vault-jar (binaryData, ~312KB JAR, Gson shaded, built from
FlowerCore.Notes/k8s/guacamole/extensions/1password-vault via mvn).
- guacamole-logback with DEBUG on io.flowercore.guacamole.vault — drop
to INFO once resolution is proven stable.
Existing guacamole-properties: added onepassword-vault to extension-priority.
The guacamole-1password-token Secret is NOT in git — it holds a verbatim
copy of the onepassword-connect-operator bearer token. Followup task:
provision a scoped Connect token for Guacamole and rotate the copy out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First pass used nfs.path=/volume1/kubernetes/guacamole/recordings,
which triggered "mount.nfs: access denied by server" on rke2-agent1.
Synology NFS export is scoped to /volume1/kubernetes; match the
working fc-desktop pattern: mount the export root and select the
subdirectory via volumeMount.subPath.
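A sketch of the export-root-plus-subPath split, written as a Python
helper that builds the two manifest fragments. The function name and the
`recordings` volume name are hypothetical; the server address and mount
path are taken from the Phase 5 commit and may differ in the manifest.

```python
from pathlib import PurePosixPath
from typing import Dict, Tuple


def nfs_volume_with_subpath(server: str, export_root: str, target_dir: str,
                            volume_name: str, mount_path: str
                            ) -> Tuple[Dict, Dict]:
    """Mount the export root and pick the subdirectory via subPath.

    Raises ValueError if target_dir is not under export_root.
    """
    sub_path = PurePosixPath(target_dir).relative_to(export_root)
    volume = {
        "name": volume_name,
        "nfs": {"server": server, "path": export_root},
    }
    volume_mount = {
        "name": volume_name,
        "mountPath": mount_path,
        "subPath": str(sub_path),
    }
    return volume, volume_mount
```

The NFS client then only ever asks the server for the exported path
itself, which is what the export ACL is checked against.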
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 5 of docs/infrastructure/guacamole-customization-plan.md:
- Mount /volume1/kubernetes/guacamole/recordings (Synology 10.0.58.3)
into both guacd (writer) and guacamole web (reader) at
/var/lib/guacamole/recordings
- Set RECORDING_SEARCH_PATH env on guacamole web -- the Guacamole
Docker entrypoint treats any RECORDING_* var as an enable signal
for the history-recording-storage extension (symlinks the JAR
from /opt/guacamole/environment/RECORDING_/extensions/ into
GUACAMOLE_HOME/extensions/)
Per-connection recording still requires setting recording-path on
each connection in MySQL -- follow-up task. This commit enables
the plumbing; no sessions record yet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same ArgoCD + SSA self-heal loop pattern as guacamole (20e4130):
K8s defaults volumeMode=Filesystem on volumeClaimTemplates at
creation, git omits it, argocd-controller owns the atomic list so
every reconcile sees drift, and volumeClaimTemplates is immutable
so it can never reconcile. Adding the field closes both loops.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the infra-guacamole OutOfSync loop. K8s API sets
volumeMode=Filesystem as a default on volumeClaimTemplates at creation,
but the git manifest omitted it. ArgoCD uses ServerSideApply with
atomic ownership of volumeClaimTemplates, so every sync saw a
desired/live mismatch on that one field. volumeClaimTemplates is
immutable after creation so ArgoCD could never reconcile it --
autoHealAttemptsCount climbed to 6091. Adding the field to git
matches live and breaks the loop.
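A simplified model of the drift, assuming the atomic list is compared as
a whole. `api_server_defaults` and `out_of_sync` are hypothetical
stand-ins for API-server defaulting and the ArgoCD diff, and the claim
name `data` is made up.

```python
import copy
from typing import Dict, List


def api_server_defaults(volume_claim_templates: List[Dict]) -> List[Dict]:
    """Return what the API server stores: input plus defaulted volumeMode."""
    live = copy.deepcopy(volume_claim_templates)
    for template in live:
        template.setdefault("spec", {}).setdefault("volumeMode", "Filesystem")
    return live


def out_of_sync(desired: List[Dict], live: List[Dict]) -> bool:
    """Atomic list: any field difference makes the whole list differ."""
    return desired != live


git_before = [{"metadata": {"name": "data"},
               "spec": {"accessModes": ["ReadWriteOnce"]}}]
git_after = [{"metadata": {"name": "data"},
              "spec": {"accessModes": ["ReadWriteOnce"],
                       "volumeMode": "Filesystem"}}]
live = api_server_defaults(git_before)
```

In this model `out_of_sync(git_before, live)` stays true on every
reconcile because the field cannot be removed from the live object, while
`git_after` matches it.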
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- fc-chat.yaml: TLS/IngressRoute only (Deployment managed by deploy script, matches fc-signage/fc-mysql/fc-kiosk pattern)
- fc-menuboard: new app bundle
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups to the Piper TTS wire-up landed in d3ffad9:
1. Telephony-web runs as uid 1654 (non-root), but the hostPath at
/tmp/tts-audio is owned by root:root 0755. Pod couldn't write .sln16
files — every Piper call would succeed at the HTTP layer and then
fall back to the sound map when File.WriteAllBytesAsync threw
"Permission denied." Extend the existing fix-data-perms initContainer
to chown the shared-tts mount too (0755 world-readable, so the
Asterisk pod — running as a different uid — can still read).
2. Pod security context now explicitly sets runAsNonRoot: true + runAsUser
1654 + runAsGroup 1654 (cluster policy), matching the pattern used
by every other FlowerCore service.
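The permission failure in item 1 follows from classic owner/group/other
mode bits. A minimal sketch, ignoring supplementary groups, ACLs, and
capabilities; the helper names and the Asterisk uid 1000 used in the
example are assumptions.

```python
def _class_bits(mode: int, owner_uid: int, owner_gid: int,
                uid: int, gid: int) -> int:
    """Pick the rwx triplet that applies to this uid/gid."""
    if uid == owner_uid:
        return (mode >> 6) & 0o7
    if gid == owner_gid:
        return (mode >> 3) & 0o7
    return mode & 0o7


def can_write(mode: int, owner_uid: int, owner_gid: int,
              uid: int, gid: int) -> bool:
    """True if the process may create files in the directory."""
    # Creating a file needs write and search (execute) on the directory.
    bits = _class_bits(mode, owner_uid, owner_gid, uid, gid)
    return uid == 0 or (bits & 0o3) == 0o3


def can_read_dir(mode: int, owner_uid: int, owner_gid: int,
                 uid: int, gid: int) -> bool:
    """True if the process may list and traverse the directory."""
    bits = _class_bits(mode, owner_uid, owner_gid, uid, gid)
    return uid == 0 or (bits & 0o5) == 0o5
```

With root:root 0755, `can_write(0o755, 0, 0, 1654, 1654)` is false; after
the chown to 1654 it is true, and a different uid such as 1000 can still
read via the "other" bits.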
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Piper was never reachable on 10.0.57.15 — edge1's actual address is
10.0.57.17 (SSH config, project_edge1_sdcard memory). Every telephony
prompt hit the 8s HttpClient timeout and fell back to the built-in sound
map (vm-advopts, vm-goodbye, beep) instead of speaking the real workflow
text. Verified from noc1: `curl http://10.0.57.17:8500/health` returns
HTTP 200 in 6ms, `POST /tts` returns a 16kHz mono WAV in 606ms.
Changes:
- apps/telephony/telephony.yaml
  - `Tts.PiperUrl` → `http://10.0.57.17:8500`
  - NetworkPolicy egress allow → `10.0.57.17/32:8500`
  - Header comment now documents the POST /tts {"text":"..."} contract
  - telephony-web pod mounts `/shared-tts` from hostPath `/tmp/tts-audio`
    (rke2-agent1). This is where `AsteriskProvider.SpeakTextAsync` writes
    the synthesized .sln16 before calling ARI `Play sound:tts/<name>`.
- apps/asterisk/deployment.yaml
  - Asterisk pod mounts the same hostPath at
    `/var/lib/asterisk/sounds/tts` so it can read and play what
    telephony-web wrote. Both deployments have
    `nodeSelector: kubernetes.io/hostname: rke2-agent1` so the hostPath
    is guaranteed to be the same directory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CoreDNS wildcard for iamworkin.lan catches unresolved names and returns
the Traefik VIP (10.0.56.200), so downloads.asterisk.org from inside a
pod returns 404 from Traefik rather than the real Sangoma mirror. Pin
the real IP (165.22.184.19 = oss-downloads.sangoma.com) via hostAliases
so curl reaches the actual server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cluster egress goes through a step-ca-fronted TLS proxy that install-sounds
doesn't trust ("SSL certificate problem: self-signed certificate"). The
Asterisk core sounds tarball is a public artifact; integrity is enforced
downstream when Asterisk plays the file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The install-sounds init container was a stub that left /var/lib/asterisk/sounds/en
empty. Result: every SpeakText fallback path (vm-advopts, vm-goodbye, characters:*,
digits/*, beep, pbx-invalid) resolved to a missing file, Asterisk silently failed
each Playback, zero RTP was produced, and callers heard dead air. This is why
dialing *0 (Settings Menu) or *100 (Debug IVR) "picks up quietly" — there is
literally nothing to stream.
Replaced the stub with alpine:3.20 + curl + tar that downloads the pinned
asterisk-core-sounds-en-ulaw-1.6.1.tar.gz (~10 MB) from downloads.asterisk.org
and unpacks it into the sounds emptyDir. Idempotent — skips download if
vm-goodbye is already present.
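The idempotency check can be sketched as below. The real init container
is a shell script; this Python version only illustrates the
skip-if-present logic. `install_sounds`, `already_installed`, and the
injected `fetch_and_unpack` callback are hypothetical, and matching on
`vm-goodbye*` is an assumption about the file's extension.

```python
from pathlib import Path
from typing import Callable


def already_installed(sounds_dir: Path) -> bool:
    """Treat any vm-goodbye.* file as proof the tarball was unpacked."""
    return sounds_dir.is_dir() and any(sounds_dir.glob("vm-goodbye*"))


def install_sounds(sounds_dir: Path,
                   fetch_and_unpack: Callable[[Path], None]) -> bool:
    """Download and unpack once; return False when the work was skipped."""
    if already_installed(sounds_dir):
        return False
    sounds_dir.mkdir(parents=True, exist_ok=True)
    fetch_and_unpack(sounds_dir)  # stands in for curl + tar
    return True
```

Keeping the download behind a callback makes the skip path easy to
exercise without network access.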
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full deployment manifests (Namespace, Deployment, Service, Certificate,
IngressRoute) for 4 new FlowerCore services: container port 8080,
ClusterIP Service on port 80, cert-manager step-ca-acme TLS, and health
probes against /metrics/prometheus.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bitnami/kubectl image doesn't have python3. Replaced all python3
JSON parsing with grep/cut for auth token and connection data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Updated bluejay-branding-1.0.0.jar with gold accents, hover fix,
icon fix, pinstripe patterns, Blue Jay SVG logo
- Added guac-k8s-sync CronJob: runs every 2min, auto-updates pod
names in Kubernetes exec connections when pods restart
- Fixed secret reference (guacamole-credentials, not guacamole-db-credentials)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RKE2 Traefik has no ACME certResolver configured, so IngressRoutes
using certResolver: step-ca silently fall back to the Traefik default
self-signed cert. Fix by using cert-manager Certificate resources with
the step-ca-acme ClusterIssuer and tls.secretName in IngressRoutes.
- fc-landing: Add Certificate, change tls: {} to tls.secretName
- fc-mysql: New app (Certificate + IngressRoute only)
- fc-php: New app (Certificate + IngressRoute only)
- fc-desktop: New app (Certificate + IngressRoute only)
- fc-signage: New app (Certificate + IngressRoute, plus HTTP route for players)
Deployments/Services for mysql/php/desktop/signage are managed by
deploy scripts, not ArgoCD. These apps only manage TLS + ingress.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>