bluejay-infra

Author	SHA1	Message	Date
Andrew Stoltz	e50e103ba0	fix(zabbix): bump web probe timeouts 5s→15s + add failureThreshold. The zabbix-web nginx+PHP-FPM container serves / at a ~3-5s baseline with occasional 6-7s spikes (the probe path renders the full dashboard via PHP). kube-probe was killing the container after 3 consecutive probes hit the 5s timeout (logged as HTTP 499), producing CrashLoopBackOff alert noise even though the app was serving real traffic fine. The 15s timeout absorbs the natural variance; explicit failureThreshold=3 documents the policy (it was the implicit default). Closes the firing PodCrashLoopBackOff (zabbix-web) and pending HTTPServiceSlow/HTTPServiceDegraded alerts. zabbix.iamworkin.lan remains slow at the application layer (separate work: PHP-FPM warm-up and the Zabbix server "host not found" agent lookup spam need their own fixes), but the pod restart loop stops.	2026-05-15 15:59:04 -05:00
Andrew Stoltz	1dc66738e6	zabbix: align postgres tracking label	2026-04-22 22:50:24 -05:00
Andrew Stoltz	5623a272c5	zabbix: include statefulset defaults	2026-04-22 22:39:31 -05:00
Andrew Stoltz	fff998dab5	matrix, zabbix: add volumeMode to postgres PVC templates. Same ArgoCD + SSA self-heal loop pattern as guacamole (`20e4130`): K8s defaults volumeMode=Filesystem on volumeClaimTemplates at creation, git omits it, argocd-controller owns the atomic list so every reconcile sees drift, and volumeClaimTemplates is immutable so it can never reconcile. Adding the field closes both loops. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 14:48:43 -05:00
Andrew M. Stoltz	f3fde15002	Update telephony-web image to v20260324d, resolve merge conflicts	2026-03-24 15:55:52 -05:00
Claude Code	efc3dc5b4e	Increase Zabbix web probe timeouts to 5s (prevents 502 during heavy dashboard queries)	2026-03-12 20:40:09 -05:00
Claude Code	518340b373	Tune Zabbix stack: PostgreSQL, web PHP-FPM, server caches. PostgreSQL 16: shared_buffers 128MB→256MB, work_mem 4MB→16MB; random_page_cost 4→1.1 (SSD/Longhorn), effective_io_concurrency→200; maintenance_work_mem→128MB, wal_buffers→8MB; max_connections 100→50, memory limit 512Mi→1Gi. Zabbix Web: PHP_FPM_PM_MAX_CHILDREN 50→10 (fixes 68x OOMKill); ZBX_MEMORYLIMIT 128M→256M, PM_MAX_REQUESTS→500; memory limit 512Mi→768Mi, request 128Mi→256Mi. Zabbix Server: ZBX_CACHESIZE→64M, ZBX_VALUECACHESIZE→64M; ZBX_HISTORYCACHESIZE→32M, ZBX_TRENDCACHESIZE→8M; ZBX_STARTPOLLERS→10, ZBX_STARTPOLLERSUNREACHABLE→3.	2026-03-12 19:21:15 -05:00
Andrew Stoltz	3199c509c0	Wire Zabbix/Matrix credentials to 1Password-synced secrets, add OnePasswordItem CRDs. Zabbix: remove hardcoded zabbix-db-secret and zabbix-admin-secret; reference zabbix-credentials (1Password) for DB-User, DB-Password, and admin password. Matrix: remove hardcoded matrix-db-secret; reference matrix-credentials for Postgres user/password; convert ConfigMap homeserver.yaml to a template with __DB_PASSWORD__/__DB_USER__ placeholders, injected via a busybox init container. Guacamole: add OnePasswordItem CRD for future use; MySQL DB creds remain in guac-db-secret (the 1Password item lacks DB-specific fields; gap documented). All three services now include OnePasswordItem CRD manifests for ArgoCD management.	2026-03-09 18:28:38 -05:00
Blue Jay	ef442e29eb	Add infrastructure manifests for 9 services: Zabbix, IRC, Mail, Guacamole, Matrix, TeamSpeak, Intranet, PKI Web, FC Landing. All with cert-manager TLS, Traefik IngressRoutes, Longhorn PVCs.	2026-03-09 16:35:04 -05:00

9 Commits
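
The probe change described in e50e103ba0 might look roughly like the fragment below. This is a sketch, not the repository's actual manifest: only the path /, the 15s timeout, and failureThreshold=3 come from the commit message. The probe type (liveness), the port, and periodSeconds are assumptions made for illustration.

```yaml
# Sketch of a zabbix-web container probe; fields marked "assumed" are
# not taken from the repository.
livenessProbe:            # assumed: the commit only says "web probe"
  httpGet:
    path: /               # from the commit: the probe hits the dashboard root
    port: 8080            # assumed port
  timeoutSeconds: 15      # from the commit: raised from 5s
  failureThreshold: 3     # from the commit: now explicit, same as the Kubernetes default
  periodSeconds: 10       # assumed: Kubernetes default, not mentioned in the commit
```

With these values a restart needs three consecutive probes to each exceed 15s, so the 6-7s spikes noted in the commit no longer count as failures.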