fix(guacamole): use short service name for GUAC_URL (CoreDNS template collision)

The guac-k8s-sync CronJob has been crash-looping (exit 7) since the
2026-04-11 run. Root cause: CoreDNS has an `*.iamworkin.lan`
template wildcard, and the Kubernetes pod resolv.conf ships with
`ndots:5` plus a search list that includes `iamworkin.lan`.

Resolving `guacamole.guacamole.svc.cluster.local` (4 dots < 5) goes
through search-suffix expansion BEFORE the bare FQDN is tried. The
cluster-local suffixes come back NXDOMAIN, then the iamworkin.lan
suffix turns the query into
`guacamole.guacamole.svc.cluster.local.iamworkin.lan`, which matches
the template and answers with the Traefik LB VIP 10.0.56.200. That VIP
has no pod-network hairpin route, so curl fails with 'No route to
host' (exit 7).

Using the short name `http://guacamole:8080` keeps the query at 0
dots. Search expansion still runs, but the first suffix tried is the
pod's own namespace suffix, `guacamole.svc.cluster.local`, so the
query hits the Kubernetes CoreDNS plugin directly and resolves to
ClusterIP 10.43.229.31.
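
The lookup order behind both cases can be sketched in a few lines of
Python. This is a toy model, not the resolver's real code: it assumes
glibc-style `ndots` handling, assumes the pod search list is the usual
cluster suffixes followed by `iamworkin.lan`, and uses a hypothetical
`toy_dns` function as a stand-in for CoreDNS.

```python
# Assumed pod search list: namespace suffix first, host suffix last.
SEARCH = [
    "guacamole.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "iamworkin.lan",
]


def candidate_queries(name, search=SEARCH, ndots=5):
    """Names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search expansion.
        return [name]
    expanded = [f"{name}.{suffix}." for suffix in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    # Fewer dots than ndots: search suffixes first, bare name last.
    return expanded + [absolute]


def toy_dns(query):
    """Hypothetical stand-in for CoreDNS: one Service plus the wildcard."""
    if query == "guacamole.guacamole.svc.cluster.local.":
        return "10.43.229.31"  # ClusterIP from the kubernetes plugin
    if query.endswith(".iamworkin.lan."):
        return "10.0.56.200"   # template wildcard -> Traefik LB VIP
    return None                # NXDOMAIN


def resolve(name):
    """Return the first answer along the candidate list, or None."""
    for query in candidate_queries(name):
        answer = toy_dns(query)
        if answer is not None:
            return answer
    return None


print(resolve("guacamole.guacamole.svc.cluster.local"))  # 10.0.56.200
print(resolve("guacamole"))                              # 10.43.229.31
```

In this model the 4-dot name never reaches its bare form because the
wildcard answers first, while the short name is answered by the first
search suffix.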

Alt fixes considered but not taken: trim the CoreDNS template regex
to exclude `.svc.cluster.local.` prefixes (cross-cutting, higher
blast radius); trailing-dot FQDN in the URL (curl/Java HTTP clients
handle inconsistently).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Andrew Stoltz
2026-04-17 15:52:53 -05:00
parent 0c67fa5356
commit 28b76001a8

@@ -543,7 +543,15 @@ spec:
- /scripts/sync-k8s-connections.sh
env:
- name: GUAC_URL
-  value: "http://guacamole.guacamole.svc.cluster.local:8080"
+  # Short name on purpose: CoreDNS has an *.iamworkin.lan
+  # template wildcard that resolves
+  # guacamole.guacamole.svc.cluster.local.iamworkin.lan
+  # (search-expanded before the bare FQDN gets tried with
+  # ndots:5) to the Traefik LB VIP 10.0.56.200, which has
+  # no pod-network hairpin route. The short name leans on
+  # the namespace's own search suffix and hits the
+  # Kubernetes CoreDNS plugin directly.
+  value: "http://guacamole:8080"
- name: GUAC_ADMIN_USER
value: "guacadmin"
- name: GUAC_ADMIN_PASSWORD