bluejay-infra

Author	SHA1	Message	Date
Andrew Stoltz	e44e9a0062	feat(monitoring): RemoteDesktop alerts + scrape jobs + dashboard mount Three additions to the monitoring ConfigMap, each targeting FlowerCore.RemoteDesktop: - Scrape jobs (2 new): - probe-remotedesktop: blackbox http_2xx against https://desktop.iamworkin.lan/health every 30s. Feeds the RemoteDesktopWebDown alert. - fc-remotedesktop: direct /metrics scrape against desktop.iamworkin.lan for the fc_desktop_session_events_total and fc_desktop_pool_* series. - Alert group `remote-desktop` (7 rules in alerts.yml): - RemoteDesktopWebDown (3m) — /health probe failing - RemoteDesktopMetricsStale (10m) — absent metrics series - RemoteDesktopPoolDepleted (5m) — pool deficit + depleted flag - RemoteDesktopPoolDeficitSustained (10m, info) — persistent below-desired pool size - RemoteDesktopSessionChurnSpike (5m, info) — launch rate >20/min - RemoteDesktopRecordingEventsDropped (15m, info) — 30m without recording events while launches active - RemoteDesktopTlsExpiry (6h, critical) — <2d cert renewal window; aligns with feedback_acme_expiry_alert_threshold - Grafana dashboard mount: new volumeMounts + volumes entry for `dashboards-remotedesktop` backed by the grafana-dashboard-remotedesktop ConfigMap (previously added as a standalone file in `d4210c8`). Folder path /var/lib/grafana/dashboards/remotedesktop — picked up by the file-provider with foldersFromFilesStructure:true so the dashboard shows up in a "Remotedesktop" folder in Grafana. No CRLF churn; pure 100-line insertion into LF-normalized file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 00:41:35 -05:00
Andrew Stoltz	297a2a9bbc	ttsreader: bump image to v202604240023 for P3 one-click book render New surfaces: POST /api/v1/bible/projects (one-click whole-book render), GET /api/v1/bible/books, GET /api/v1/bible/books/{book}/preview, MCP tools render_tts_reader_bible_book + list_tts_reader_bible_books, Dashboard "Render a Bible book" card. 107/107 tests, +7 from previous.	2026-04-24 00:25:48 -05:00
Andrew Stoltz	d4210c819f	feat(monitoring): RemoteDesktop Grafana dashboard ConfigMap Wraps apps/monitoring/flowercore-remotedesktop-grafana-dashboard.json as a ConfigMap manifest so ArgoCD syncs it into the cluster alongside the existing grafana-dashboard-* ConfigMaps. Standalone file — does NOT modify noc-monitoring.yaml. That keeps the CRLF churn on noc-monitoring.yaml (sibling files apps/intranet/intranet.yaml and apps/agent-zero/configmaps-bluejay.yaml also carry CRLF churn) out of this commit. Dashboard will be synced into the cluster but NOT loaded by Grafana until a matching `volumes:` entry lands in the Grafana Deployment in noc-monitoring.yaml: - name: dashboard-remotedesktop configMap: name: grafana-dashboard-remotedesktop Plus a `volumeMounts:` entry in the grafana container: - name: dashboard-remotedesktop mountPath: /etc/grafana/provisioning/dashboards/remotedesktop readOnly: true Those edits are deferred to the CRLF-normalization pass on bluejay-infra so the review diff stays reviewable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 00:20:40 -05:00
Andrew Stoltz	fc0b67f670	ttsreader: bump image to v202604232334 for iTunes RSS + ID3 tags Pulls in FlowerCore.TtsReader@9e2497f: P2.3 iTunes-namespace podcast feed (author, summary, category, cover art, episode numbering, duration, atom:self link, serial channel type for Bible projects) and P2.4 ID3v2 tags on MP3 export + Vorbis comments on OGG (title, artist with Piper voice humanized, album, track N/M, genre defaulting to Religion & Spirituality for Bible or Audiobook for text sources, date). Phones and podcast apps now show proper track info instead of "Unknown - Unknown". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:37:53 -05:00
Andrew Stoltz	223e9a9232	feat(zabbix): add RemoteDesktop monitoring template New Zabbix 7.2 template under `Templates/FlowerCore` that scrapes the `/metrics` exposition from FlowerCore.RemoteDesktop and extracts: - `fc_desktop_session_events_total` split by event (launch/connect/ disconnect/recording), with a dedicated datapoint for the `browser_datasource="json"` slice to track delegated-auth launches. - `fc_desktop_pool_ready` gauge sum for warm pools. Trigger: `nodata(flowercore.remotedesktop.metrics,10m)=1` warns when the public desktop host stops exposing metrics. Follows the existing `flowercore-print-ollama.yaml` pattern — import manually into Zabbix and link to the Print/Desktop host. Not a K8s manifest; ArgoCD ignores. Grafana dashboard JSON is drafted at `apps/monitoring/flowercore-remotedesktop-grafana-dashboard.json` but still needs a ConfigMap wrap + Grafana Deployment volume mount in noc-monitoring.yaml before it ships (follow-up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:30:32 -05:00
Andrew Stoltz	6c1375b21a	ttsreader: bump image to v202604232310 for Media Session API + Bible lexicon Pulls in FlowerCore.TtsReader@63e6b62: P1.1 Media Session API wiring in fc-media-session.js + quick-player.js + rendered-chapter-player.js, and P1.2 biblical-name pronunciation lexicon auto-seed on Bible-source project creation plus apply-bible-defaults endpoint + MCP tool for existing projects. Tests 81 -> 97 all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:15:53 -05:00
Andrew Stoltz	82529ed9b5	fix(guacamole): detect managed embeds from client URL	2026-04-23 21:02:05 -05:00
Andrew Stoltz	3ea8a56dab	fix(guacamole): disable logout for managed embeds	2026-04-23 20:51:15 -05:00
Andrew Stoltz	9272abc225	Use absolute cluster DNS for ttsreader piper	2026-04-23 20:49:40 -05:00
Andrew Stoltz	436185818d	fc-distribution: restrict public IngressRoute to GET+HEAD only Live verification 2026-04-24 caught POST /blobs on dist.flowercore.io returning 201 Created with the blob persisted: admin write operations reachable on the public surface. Controller-level strict entitlement was on, but that gates reads; writes weren't blocked at all. Fix: add Method(GET) || Method(HEAD) to the Host match on the public IngressRoute. POST/PUT/PATCH/DELETE now miss every route for dist.flowercore.io and Traefik returns 404 before the pod sees the request. Edge-level defense-in-depth on top of the controller's strict-mode entitlement check. The internal IngressRoute for dist.iamworkin.lan stays unrestricted, so admin POST /blobs + POST /manifests flows keep working from the lab.	2026-04-23 20:12:25 -05:00
Andrew Stoltz	c3cc404beb	fc-distribution: add dist.flowercore.io public surface (Cloudflare A record + Origin Cert + profile-header middleware) Lights up dist.flowercore.io end-to-end: - cf-origin-flowercore-io Secret (literal *.flowercore.io Origin Cert, copied from the telephony/gitea-public/matrix/mail/flowercore/fc-landing pattern — not via OnePasswordItem yet). - Traefik Middleware dist-public-profile-header: strips any caller-supplied X-FC-Distribution-Profile, injects 'public' so the controller's NamedEntitlementResolverRouter routes to the strict resolver. - IngressRoute fc-distribution-public: Host(`dist.flowercore.io`) -> same backing Service as the internal dist.iamworkin.lan route. Middleware attached; cert secret cf-origin-flowercore-io. Cloudflare DNS A record dist.flowercore.io -> 74.40.140.24 (proxied) already created 2026-04-24 via Cloudflare API (record id e9b957511556f37ff6763f4441acbc45). Controller entitlement config is still DefaultAllow=false + empty PublicEditions on the 'public' profile, so every public request returns 403 by default. Populate FlowerCore__Distribution__EntitlementPublic__PublicEditions__0 via env var when ready to expose specific editions.	2026-04-23 20:10:29 -05:00
Andrew Stoltz	90627819cc	fc-distribution: bump to v202604240010 (Phase 4 header-routing controller)	2026-04-23 19:23:35 -05:00
Andrew Stoltz	c97d486a3d	feat(fc-segmentdisplay): switch tls certificate to dns01	2026-04-23 18:39:17 -05:00
Andrew Stoltz	209bdc16cd	fc-distribution: bump to v202604232310 (Front D entitlement wired into ManifestsController)	2026-04-23 18:11:21 -05:00
Andrew Stoltz	3999634b06	Seed ttsreader piper voices before startup	2026-04-23 17:18:57 -05:00
Andrew Stoltz	61538d3712	fc-distribution: bump to v202604232212 (disable 30MB body size limit on POST /blobs)	2026-04-23 17:11:56 -05:00
Andrew Stoltz	ccaac367af	fc-distribution: bump to v202604232206 (adds POST /blobs endpoint)	2026-04-23 17:07:13 -05:00
Andrew Stoltz	407d473b71	feat(infra): route dns preflight through flowercore dns	2026-04-23 17:03:22 -05:00
Andrew Stoltz	f9593e494a	Allow ttsreader piper voice downloads	2026-04-23 16:50:21 -05:00
Andrew Stoltz	5b6c7b97fc	feat(fc-distribution): bump image to v202604232145 — cert chain endpoint - Serves GET /manifests/{edition}/{version}.cert (leaf+intermediate PEM) - Adds CertChainPem migration on startup (nullable column) - ManifestSignService now embeds version-specific certChainUrl Provisioning Agent's verify step will flip from ChainNotServed (Phase 2A soft-pass) to Valid once a fresh edition is published with this image. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:43:56 -05:00
Andrew Stoltz	a76eeb5c39	Add dedicated selectable piper for ttsreader	2026-04-23 16:37:03 -05:00
Andrew Stoltz	8a960ffc73	feat(fc-distribution): K8s manifest for Phase 1 edition publisher Adds apps/fc-distribution/{fc-distribution.yaml,kustomization.yaml,README.md}. Ships the FlowerCore.Distribution service (Blazor + REST + MCP) backed by Synology NFS for SQLite catalog + content-addressed blob root. Contents: - Namespace fc-distribution - 3x OnePasswordItem (FlowerCore Code Signing CA informational + per-edition signing keys for kiosk-standard and aistation-field) - Deployment: localhost/fc-distribution:v202604232000 (already imported to rke2-server via ctr), pinned to rke2-server nodeSelector because Synology NFS ACL restricts writes to that node, emptyDir for /tmp + /app/logs, inline NFS for /data (subPath distribution/data) and /blobs (subPath distribution/blobs), Secret volume mounts for /signing/<edition>. readOnlyRootFilesystem + runAsUser 1654 + drop ALL capabilities. Probes: startup + readiness on /healthz, liveness on tcpSocket (defense against future auth middleware accidentally gating /healthz). - Service (ClusterIP :80 -> container :8080) - Certificate (cert-manager ClusterIssuer step-ca-acme, dist.iamworkin.lan, 90d / 30d renew). pfSense Unbound override dist.iamworkin.lan -> 10.0.56.200 already in place (req'd for HTTP-01). - IngressRoute (Traefik websecure, Host rule on dist.iamworkin.lan) Env var keys align with the scaffold: FlowerCore__Database__ConnectionStrings__Sqlite FlowerCore__Distribution__Blobs__Root FlowerCore__Distribution__Signing__EditionCerts__<slug>__{CertPath,KeyPath} Consumer: ProvisioningAgent (USB-side, Phase 2) — see FlowerCore.Notes/docs/infrastructure/usb-provisioning-architecture.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:59:50 -05:00
Andrew Stoltz	686dbacc66	Bump TTS Reader image for render follow-along	2026-04-23 15:54:07 -05:00
Andrew Stoltz	4da60820c6	Deploy TTS Reader queue presentation fix	2026-04-23 15:13:21 -05:00
Andrew Stoltz	1cc4324cfb	Deploy TTS Reader import and preview fixes	2026-04-23 14:28:08 -05:00
Andrew Stoltz	bfc755057e	fix(agent-zero): use streamable http for chat mcp	2026-04-23 13:54:06 -05:00
Andrew Stoltz	d6008ee205	fix(agent-zero): allow chat mcp pod port	2026-04-23 13:29:36 -05:00
Andrew Stoltz	39fe6f1dba	fix(agent-zero): route chat mcp in-cluster	2026-04-23 13:26:10 -05:00
Andrew Stoltz	90fcf0cd5d	fix(agent-zero): expose openai provider key	2026-04-23 13:21:12 -05:00
Andrew Stoltz	ffef5c9126	Deploy TTS Reader annotation timeout fix	2026-04-23 13:06:17 -05:00
Andrew Stoltz	634e90a9ee	Deploy TTS Reader quick hardening release	2026-04-23 12:47:45 -05:00
Andrew Stoltz	86ccca18e3	Add Chat MCP server to Agent Zero	2026-04-23 12:41:58 -05:00
Andrew Stoltz	1c5caf3f40	Deploy TTS Reader v20260423114016	2026-04-23 11:57:39 -05:00
Andrew Stoltz	d3db19b0ca	guacamole: enable json auth for remotedesktop sso	2026-04-23 11:27:30 -05:00
Andrew Stoltz	702a6e4f52	fix(agent-zero): use short DNS name to avoid CoreDNS template hijack The full FQDN fc-llm-bridge.fc-llm-bridge.svc.cluster.local has 4 dots, which is less than the pod's ndots:5 threshold. The resolver then applies every entry in the search list BEFORE falling through to the bare FQDN, and the CoreDNS 'template iamworkin.lan' catch-all matches "...svc.cluster.local.iamworkin.lan" and returns Traefik VIP 10.0.56.200. The egress NetworkPolicy blocks that VIP (0.0.0.0/0 EXCEPT 10.0.0.0/8), so curl hangs for 30-134s and returns HTTP 000. Reference: feedback_coredns_ndots_template_collision memory. Fix: use "fc-llm-bridge.fc-llm-bridge.svc" (2 dots, still <5 so search expansion still fires, but the first suffix "...svc.cluster.local" hits the Kubernetes plugin in CoreDNS and returns the real ClusterIP 10.43.67.125 before the iamworkin.lan template is ever consulted). Verified: pod-exec curl fc:cheap → HTTP 200 with a real chat.completion envelope (Ollama/gemma3:4b via bridge). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:02:09 -05:00
Andrew Stoltz	6cbb5d8792	fix(agent-zero): NetworkPolicy egress rule for fc-llm-bridge (ADR-088) The chat_model flip (`62db15c`) pointed Agent Zero at fc-llm-bridge.fc-llm-bridge.svc.cluster.local:8080 but the existing agent-zero-netpol only allowed egress to specific node IPs (10.0.56.20:11434, 10.0.57.17:11434, 10.0.57.16:5200, 10.0.56.11:6443) plus public-internet (with RFC1918 exclusion). ClusterIP traffic to 10.43.0.0/16 was implicitly denied, so pod-exec curl to the bridge timed out after 134s. Adds an egress rule allowing TCP 8080 to the fc-llm-bridge namespace (matched by kubernetes.io/metadata.name which K8s 1.22+ sets automatically). No ingress changes needed — fc-llm-bridge has no NetworkPolicy, so the ingress side is already open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:59:17 -05:00
Andrew Stoltz	62db15c69c	feat(agent-zero): route chat_model through fc-llm-bridge (ADR-088) Flips Agent Zero's chat_model from direct local Ollama (gemma3:12b via the 127.0.0.1:11434 sidecar proxy) to the FlowerCore LLM Bridge (fc:balanced tier, OpenAI-compatible, Anthropic Claude Sonnet under the hood) so chat turns are spend-tracked and can dispatch to any provider via a single tier alias. Scope is intentionally minimal and reversible: - chat_model: ollama/gemma3:12b/127.0.0.1:11434 → openai/fc:balanced/fc-llm-bridge internal service URL - utility_model, embedding_model, browser_model: UNCHANGED (stay on local 127.0.0.1 Ollama sidecar — no spend, low latency, not worth routing through the bridge for small-model traffic). Auth: new A0_SET_chat_model_api_key env var wired to the fc-llm-bridge-api-keys Secret (field: agent-zero-k8s). The Secret is synced by a new OnePasswordItem pointing at "FC LLM Bridge API Keys" in the IAmWorkin vault. Bearer-token auth is now accepted by the bridge (FlowerCore.LlmBridge@3225f1f). Rollback: revert this commit; old image v202604231449 is still present on all RKE2 nodes, and Agent Zero's strategy: Recreate makes the flip atomic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:54:27 -05:00
Andrew Stoltz	84634f59f0	chore(fc-llm-bridge): bump image to v202604231520 Ships the Bearer-token auth fix (FlowerCore.LlmBridge@3225f1f) so Agent Zero's OpenAI provider can authenticate with Authorization: Bearer in addition to the original X-Api-Key header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:51:57 -05:00
Andrew Stoltz	4cd5806fd0	fix(fc-llm-bridge): set dnsConfig ndots=2 to prevent CoreDNS wildcard hijack Pods in this cluster inherit ndots=5. External FQDNs with <5 dots (like api.anthropic.com) are expanded through the search path first, and the 4th suffix `api.anthropic.com.iamworkin.lan` matches CoreDNS' `template IN A iamworkin.lan` wildcard, which resolves to Traefik VIP 10.0.56.200. TLS connect lands on Traefik's default cert and the AnthropicClient rejects with RemoteCertificateNameMismatch/RemoteCertificateChainErrors. Setting ndots=2 makes the resolver try the bare FQDN first (api.anthropic.com has 2 dots, which meets the ndots=2 threshold), so the search path never fires. Reference: memory feedback_coredns_ndots_template_collision. Wider follow-up: the CoreDNS template plugin should add fallthrough for external public suffixes, so every FC service calling external HTTPS APIs stops hitting this trap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:42:17 -05:00
Andrew Stoltz	11c48bef30	chore(fc-llm-bridge): bump to v202604231449 (Budget 1.0.1 multi-provider dispatcher) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:36:05 -05:00
Andrew Stoltz	a86e87050b	fix(fc-llm-bridge): anthropic secret key is 'password' not 'credential' The 1Password item "Claude API Key" stores the key in a standard Password field (labeled `password`), so the OnePasswordItem operator creates the K8s Secret with key `password`. Deployment was referencing `credential`, which made the pod fail with CreateContainerConfigError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:29:32 -05:00
Andrew Stoltz	0214f94ac4	chore(fc-llm-bridge): bump image to v202604231424 (first live tag) Built from FlowerCore.LlmBridge@6d285b5 (initial scaffold). Imported on all three RKE2 nodes via podman save + ctr import. Replaces v00000000000000 placeholder — ArgoCD sync will roll the pod. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:28:05 -05:00
Andrew Stoltz	a1b8eb379d	feat(fc-llm-bridge): stage ADR-088 manifests (not yet applied) Staged but NOT applied. Do not git push until the pre-requisites below are done. See apps/fc-llm-bridge/README.md for the full order-of-ops. Manifests (apps/fc-llm-bridge/fc-llm-bridge.yaml, 8 docs): - Namespace fc-llm-bridge - OnePasswordItem anthropic-api-key (existing Claude API Key item) - OnePasswordItem fc-llm-bridge-api-keys (NEW item, pending creation) - PersistentVolumeClaim fc-llm-bridge-data (2Gi longhorn) - Deployment fc-llm-bridge (port 8080, uid 1654, readOnlyRootFilesystem, tcpSocket probes to survive future ApiKeyAuthMiddleware reordering) - Service fc-llm-bridge ClusterIP - Certificate fc-llm-bridge-cert (step-ca-acme) - IngressRoute fc-llm-bridge (fc-llm-bridge.iamworkin.lan, websecure) Pre-requisites BEFORE git push: 1. pfSense Unbound override fc-llm-bridge.iamworkin.lan -> 10.0.56.200 (currently NXDOMAIN -- verified via nslookup and check-pfsense-dns.py). Skipping this step puts cert-manager HTTP-01 into ~2h backoff. 2. Create 1Password item `FC LLM Bridge API Keys` in vault IAmWorkin with password fields: agent-zero-ws, agent-zero-k8s, spare-1, spare-2. 3. Build + import localhost/fc-llm-bridge:v<tag> to rke2-server + rke2-agent1 + rke2-agent2. Bump image tag from placeholder v00000000000000 before committing the apply. Related: ADR-088 (FlowerCore.Notes/ARCHITECTURE.md), design doc at FlowerCore.Notes/docs/ai-agents/agent-zero-anthropic-bridge.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 03:10:36 -05:00
Andrew Stoltz	9a1665907c	fc-signalcontrol: align live port and selectors	2026-04-22 23:22:14 -05:00
Andrew Stoltz	899804215a	statefulsets: align guacamole and matrix drift defaults	2026-04-22 23:11:47 -05:00
Andrew Stoltz	1dc66738e6	zabbix: align postgres tracking label	2026-04-22 22:50:24 -05:00
Andrew Stoltz	5623a272c5	zabbix: include statefulset defaults	2026-04-22 22:39:31 -05:00
Andrew Stoltz	3d3f91160b	monitoring: add Print.Web Ollama Zabbix template	2026-04-22 22:07:40 -05:00
Andrew Stoltz	93f77c1844	fix(monitoring): use bluejay_v2 auth for snmp-nas (not public_v2) Synology NAS is configured with community bluejay_monitor (→ snmp.yml auth 'bluejay_v2'), not public. public_v2 was returning HTTP 500 from snmp-exporter for this target. Verified bluejay_v2 returns metrics. Keeps printer (10.0.58.107) on public_v2 — Epson ET-3750 uses community "public" as documented in its SNMP settings.	2026-04-22 21:32:14 -05:00
Andrew Stoltz	59efc460fd	fix(irc): use short name for unrealircd in anope + thelounge configs Same CoreDNS iamworkin.lan template + ndots:5 hijack as the irc-notify fix. Anope services (nickserv/chanserv/memo) have been disconnected from unrealircd for weeks ("Host is unreachable" every 3s). Thelounge server defaults pointed at the same broken FQDN. Short name unrealircd.irc.svc resolves to the ClusterIP directly.	2026-04-22 21:23:38 -05:00
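
Several commits above (59efc460fd, 702a6e4f52, 4cd5806fd0) describe the same ndots / search-list interaction. As a rough illustration only, the Python sketch below models the order in which a glibc-style stub resolver would try candidate names. It is not taken from the repository: the function name `candidate_queries` and the example search list are assumptions (the first entry stands in for the pod's own namespace, and `iamworkin.lan` is placed fourth to match the "4th suffix" wording in 4cd5806fd0). Real resolvers stop at the first candidate that returns an answer; the sketch only lists the order.

```python
from typing import List


def candidate_queries(name: str, search: List[str], ndots: int) -> List[str]:
    """List the names a glibc-style stub resolver would try, in order.

    Simplified model (an assumption, not the cluster's actual resolver code):
    - a trailing dot marks the name as absolute, so no search expansion;
    - a name with at least `ndots` dots is tried as-is first;
    - otherwise every search suffix is tried before the bare name.
    """
    if name.endswith("."):
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{suffix}." for suffix in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    # Hypothetical search list for a pod in the fc-llm-bridge namespace.
    search = [
        "fc-llm-bridge.svc.cluster.local",
        "svc.cluster.local",
        "cluster.local",
        "iamworkin.lan",
    ]
    for ndots in (5, 2):
        print(f"ndots={ndots}")
        for query in candidate_queries("api.anthropic.com", search, ndots):
            print("  ", query)
```

Under this model, with ndots=5 the candidate `api.anthropic.com.iamworkin.lan.` is tried before the bare name, which is where a catch-all template for that zone can answer first. With ndots=2 the bare name is tried first. For a short in-cluster name such as `unrealircd.irc.svc`, the `cluster.local` suffix produces the real service name before the `iamworkin.lan` suffix is reached.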

405 Commits