Single-host routing via desktop.iamworkin.lan/guacamole has been
live-proven (curl → 200) and the Codex single-host-guacamole-wip
merge flipped RemoteDesktop.Web's GuacamolePublicUrl + defaults to
the new path. Nothing else in FlowerCore actively requires the
legacy guac.iamworkin.lan URL.
Removed from the guacamole app:
- IngressRoute `guacamole` matching Host(guac.iamworkin.lan)
- Middleware `guac-add-prefix` (only the legacy route referenced it)
- Certificate `guacamole-tls` (only covered guac.iamworkin.lan)
ArgoCD prune will delete the live resources on next sync. The
pfSense DNS override for guac.iamworkin.lan should be removed
via FlowerCore.DNS as a follow-up operator step — not managed by
this repo.
The new `guacamole-desktop-path` IngressRoute + `desktop-guacamole-path-tls`
Certificate (added in e65de29) handle all Guacamole traffic going
forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-part fix on top of the live Shared.Indexing rollout:
1. Image bump v202604240050corpus -> v202604240108gpu, rebuilt from
FlowerCore.Intranet.Web@feat/shared-indexing-search (HEAD includes
the FilePatterns array-merge fix in IntranetSearchOptions). At
runtime each DocCorpusRoot now sees ONLY the patterns explicitly
set in appsettings.json — notes-md gets ["*.md"], notes-html gets
["*.html"], no accidental cross-bleed.
2. New IntranetSearch__OllamaBaseUrl env var pointing at
http://10.0.56.20:11434 (BLUEJAY-WS GPU, R9700 32GB VRAM). Verified
reachable from the cluster and nomic-embed-text:latest is pulled.
This is the workaround for memory feedback_pi5_nomic_embed_slow:
edge1 Pi 5 takes ~189s per 32-chunk batch, projecting full notes-md
indexing (5665 chunks) at ~9 hours; the GPU should land it in minutes.
Edge1 stays the chat default; this env var only redirects the
indexer's bulk embedding calls.
Image distributed to all three RKE2 nodes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cluster Traefik disallows cross-namespace service refs from
IngressRoutes, so the PathPrefix(/guacamole) rule I added to
fc-desktop IngressRoute in 292528e failed with:
"service guacamole/guacamole not in the parent resource namespace
fc-desktop"
Move the /guacamole path match into the guacamole namespace where
the Service actually lives:
- apps/guacamole/guacamole.yaml adds a new `guacamole-desktop-path`
IngressRoute matching `Host(desktop.iamworkin.lan) &&
PathPrefix(/guacamole)` → guacamole:8080 (no add-prefix middleware;
the browser already sends the /guacamole/* path that Guacamole's
servlet serves at).
- New Certificate `desktop-guacamole-path-tls` for desktop.iamworkin.lan
in the guacamole namespace, issued by step-ca-acme. Separate cert
from fc-desktop's remotedesktop-web-tls because Secret refs are
also scoped per-namespace; duplicating the cert is cheaper than
enabling cross-namespace secret refs cluster-wide.
- Revert the cross-namespace attempt in apps/fc-desktop/fc-desktop.yaml
back to a Host-only route. Traefik's router matching precedence
(longer/more-specific rule wins) handles the /guacamole vs
catch-all priority without explicit priority: fields.
Closes the single-host Guacamole URL regression Codex's branch
introduced — GuacamolePublicUrl=https://desktop.iamworkin.lan/guacamole
now resolves to the Guacamole webapp end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-host Guacamole routing — Traefik matches Host=desktop.iamworkin.lan
+ PathPrefix=/guacamole first (priority 20) and forwards to the
guacamole Service in the guacamole namespace on 8080. The existing
Host-only catch-all rule drops to priority 10 so Guacamole traffic
resolves to the more-specific match.
Mirrors the IngressRoute in FlowerCore.RemoteDesktop@master (merged
as part of codex/single-host-guacamole-wip). The RemoteDesktop repo
copy is deploy-ref only — ArgoCD owns the live IngressRoute via
this manifest. Without this change, GuacamolePublicUrl=
https://desktop.iamworkin.lan/guacamole returns 404 because Traefik
routes the whole Host to remotedesktop-web.
Unblocks the per-template AAT smoke against the new public URL
path + closes the final live piece of Codex's single-host routing
work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds TtsReader__Kokoro__Enabled=true + BaseUrl=http://10.0.56.20:10401
+ TimeoutSeconds=120 so the pod routes kokoro-tagged voices to the
Kokoro-FastAPI backend running on BLUEJAY-WS. Multi-engine router
falls through to Piper for piper-tagged and untagged voices.
Requires nftables on BLUEJAY-WS to permit tcp/10401 from 10.0.56/23
and 10.42.0.0/16. Applied to the live ruleset — Puppet Hiera path is
the durable fix (kokoro_server_enabled under profile::security::firewall).
Tests 107 → 114 (+7 MultiEngineSpeechSynthesizerTests).
Companion to the Prometheus alert rules landed in e44e9a0. The
Prometheus rules were loading but never delivered — the monitoring
stack has no Alertmanager configured; **Grafana** owns alert
routing via its built-in engine + webhook contact point to
irc-notify.monitoring.svc:9119. Without a matching Grafana alert,
the Prometheus rules just show up in the Prometheus UI and page
no one.
Adds 6 Grafana alert rules in a new `RemoteDesktop` group under
the AI Stack Alerts folder:
- remotedesktop-web-down (3m) — probe_success{job="probe-remotedesktop"} < 1
- remotedesktop-metrics-stale (10m) — fc_desktop_session_events_total series absent
- remotedesktop-pool-depleted (5m) — fc_desktop_pool_depleted > 0
- remotedesktop-pool-deficit-sustained (10m info) — fc_desktop_pool_deficit > 0
- remotedesktop-session-churn-spike (5m info) — launch rate > 20/min
- remotedesktop-tls-expiry (6h critical) — cert < 2 days to expiry
Each uses the standard Grafana 3-stage pipeline (query → reduce →
threshold) matching the existing AI Stack + Infrastructure alert
patterns. Labels: service=remotedesktop + severity (warning/info/critical).
Default route is `IRC #alerts` via the existing webhook contact point.
Parity with the Prometheus rules (which already fire internally
for the Prometheus UI + any future Alertmanager integration).
Grafana restart picks up the new provisioning on next reload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cluster egress to github.com is fronted by a step-ca TLS proxy that
returns 404 page not found for unmatched routes — git clone of the
public FlowerCore.Notes repo failed inside the pod even with
GIT_SSL_NO_VERIFY=true. Rather than chase the egress NetworkPolicy /
proxy config, bake the docs corpus directly into the image at
/srv/flowercore-notes/docs.
The corpus is just *.md + *.html (369 files, 2.7 MB uncompressed) —
small enough that re-baking on every deploy is fine and avoids any
runtime network dependency.
Manifest changes:
- Image bump: v202604240040search -> v202604240050corpus
- Removed initContainers (clone-notes-corpus is now redundant)
- Removed notes-corpus emptyDir + its volumeMounts
- Vector-store PVC mount stays.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two egress allows to monitoring-netpol so Prometheus can scrape
FlowerCore.RemoteDesktop:
1. fc-desktop namespace on port 8080 — direct ClusterIP service
target (remotedesktop-web.fc-desktop:8080).
2. traefik-system namespace pods on ports 8080 + 8443 — covers the
Traefik VIP hairpin path for the `https://desktop.iamworkin.lan`
scrape target (CoreDNS wildcard resolves iamworkin.lan hostnames
to the LB VIP; after kube-proxy DNAT, egress needs the backend
pod port allowed per feedback_netpol_dnat_backend_port).
Without these, the fc-remotedesktop scrape times out with "context
deadline exceeded" even though the monitoring-netpol already allows
the 10.0.56.0/24 CIDR — post-DNAT the destination is a 10.42.x.x
pod IP, not the VIP.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cluster egress is fronted by a step-ca TLS proxy whose cert doesn't
match github.com. The init container's git clone failed with
"SSL: no alternative certificate subject name matches target hostname
'github.com'". The Notes repo is public — there is no secret to
protect on the wire — so GIT_SSL_NO_VERIFY=true is the right tradeoff
here. Tag at v202604240040search.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 lane 1 of FlowerCore.Shared.Indexing rollout — wires the new
search consumer in FlowerCore.Intranet.Web to live infrastructure.
Manifest changes:
- Image bump: localhost/fc-intranet-web:latest -> :v202604240040search.
Built from FlowerCore.Intranet.Web@feat/shared-indexing-search and
imported into all three RKE2 nodes (rke2-server, rke2-agent1, rke2-agent2)
via ctr import. Both :latest and :v202604240040search tags are present.
- New PersistentVolumeClaim intranet-vector-store (1Gi, ReadWriteOnce,
Longhorn) mounted at /data for the SQLite vector store
(intranet-vectors.db).
- New emptyDir volume notes-corpus (1Gi sizeLimit) shared between the
init container and main container, mounted at /srv/flowercore-notes
(read-only in the main container).
- New init container clone-notes-corpus (alpine/git) that shallow-clones
https://github.com/astoltz/FlowerCore.Notes.git
(codex/notes-pimanager-live-drift) into /srv/flowercore-notes on every
pod start. Re-clone is cheap (depth=1) and re-runs of git fetch +
reset --hard are idempotent.
- Strategy switched to Recreate for the deployment, since the new RWO
PVC blocks rolling updates — see CLAUDE.md memory "RWO PVC blocks K8s
rolling updates".
- Resource bumps: memory 128Mi -> 256Mi req, 512Mi -> 1Gi limit; CPU
500m -> 1000m limit. The DocsCorpusIndexer + Ollama HTTP calls add
measurable load during the initial index build.
- initialDelaySeconds bumps on both probes (10s -> 30s liveness, 5s ->
10s readiness) to account for startup-time Ollama probing and the
slightly larger image.
The DocsCorpusIndexer waits 15s after host startup before its first
indexing pass, then loops every RescanInterval (default 1h). Its first
run will:
1. Embed all *.md under /srv/flowercore-notes/docs against
nomic-embed-text on edge1 (10.0.57.17:11434).
2. Embed all *.html under /srv/flowercore-notes/docs/dashboards.
3. Persist chunks + embeddings to /data/intranet-vectors.db.
Verify after rollout:
- kubectl -n intranet logs deploy/intranet-web -c clone-notes-corpus
(init container should show the docs/ listing).
- kubectl -n intranet logs deploy/intranet-web -f
(DocsCorpusIndexer should log "Indexing docs root 'notes-md'..." then
"Docs root 'notes-md' indexed: N files, M chunks, M stored").
- curl -sk https://intranet.iamworkin.lan/api/search/indexes
-> ["notes-html","notes-md"]
- curl -sk 'https://intranet.iamworkin.lan/api/search?q=guacamole+single+host&topK=3'
-> hits from docs/infrastructure/guacamole-customization-plan.md
Companion source on FlowerCore.Intranet.Web@feat/shared-indexing-search.
Depends on FlowerCore.Common@feat/shared-indexing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three additions to the monitoring ConfigMap, each targeting
FlowerCore.RemoteDesktop:
- **Scrape jobs** (2 new):
- probe-remotedesktop: blackbox http_2xx against
https://desktop.iamworkin.lan/health every 30s. Feeds the
RemoteDesktopWebDown alert.
- fc-remotedesktop: direct /metrics scrape against
desktop.iamworkin.lan for the fc_desktop_session_events_total
and fc_desktop_pool_* series.
- **Alert group `remote-desktop`** (7 rules in alerts.yml):
- RemoteDesktopWebDown (3m) — /health probe failing
- RemoteDesktopMetricsStale (10m) — absent metrics series
- RemoteDesktopPoolDepleted (5m) — pool deficit + depleted flag
- RemoteDesktopPoolDeficitSustained (10m, info) — persistent
below-desired pool size
- RemoteDesktopSessionChurnSpike (5m, info) — launch rate
>20/min
- RemoteDesktopRecordingEventsDropped (15m, info) — 30m without
recording events while launches active
- RemoteDesktopTlsExpiry (6h, critical) — <2d cert renewal
window; aligns with feedback_acme_expiry_alert_threshold
- **Grafana dashboard mount**: new volumeMounts + volumes entry for
`dashboards-remotedesktop` backed by the grafana-dashboard-remotedesktop
ConfigMap (previously added as a standalone file in d4210c8).
Folder path /var/lib/grafana/dashboards/remotedesktop — picked up
by the file-provider with foldersFromFilesStructure:true so the
dashboard shows up in a "Remotedesktop" folder in Grafana.
No CRLF churn; pure 100-line insertion into LF-normalized file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New surfaces: POST /api/v1/bible/projects (one-click whole-book render),
GET /api/v1/bible/books, GET /api/v1/bible/books/{book}/preview, MCP
tools render_tts_reader_bible_book + list_tts_reader_bible_books,
Dashboard "Render a Bible book" card. 107/107 tests, +7 from previous.
Wraps apps/monitoring/flowercore-remotedesktop-grafana-dashboard.json
as a ConfigMap manifest so ArgoCD syncs it into the cluster alongside
the existing grafana-dashboard-* ConfigMaps. Standalone file — does
NOT modify noc-monitoring.yaml. That keeps the CRLF churn on
noc-monitoring.yaml (sibling files apps/intranet/intranet.yaml and
apps/agent-zero/configmaps-bluejay.yaml also carry CRLF churn) out
of this commit.
Dashboard will be synced into the cluster but NOT loaded by Grafana
until a matching `volumes:` entry lands in the Grafana Deployment
in noc-monitoring.yaml:
- name: dashboard-remotedesktop
configMap:
name: grafana-dashboard-remotedesktop
Plus a `volumeMounts:` entry in the grafana container:
- name: dashboard-remotedesktop
mountPath: /etc/grafana/provisioning/dashboards/remotedesktop
readOnly: true
Those edits are deferred to the CRLF-normalization pass on
bluejay-infra so the review diff stays reviewable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in FlowerCore.TtsReader@9e2497f: P2.3 iTunes-namespace podcast
feed (author, summary, category, cover art, episode numbering,
duration, atom:self link, serial channel type for Bible projects) and
P2.4 ID3v2 tags on MP3 export + Vorbis comments on OGG (title, artist
with Piper voice humanized, album, track N/M, genre defaulting to
Religion & Spirituality for Bible or Audiobook for text sources,
date). Phones and podcast apps now show proper track info instead of
"Unknown - Unknown".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New Zabbix 7.2 template under `Templates/FlowerCore` that scrapes
the `/metrics` exposition from FlowerCore.RemoteDesktop and extracts:
- `fc_desktop_session_events_total` split by event (launch/connect/
disconnect/recording), with a dedicated datapoint for the
`browser_datasource="json"` slice to track delegated-auth launches.
- `fc_desktop_pool_ready` gauge sum for warm pools.
Trigger: `nodata(flowercore.remotedesktop.metrics,10m)=1` warns when
the public desktop host stops exposing metrics.
Follows the existing `flowercore-print-ollama.yaml` pattern — import
manually into Zabbix and link to the Print/Desktop host. Not a K8s
manifest; ArgoCD ignores.
Grafana dashboard JSON is drafted at
`apps/monitoring/flowercore-remotedesktop-grafana-dashboard.json`
but still needs a ConfigMap wrap + Grafana Deployment volume mount
in noc-monitoring.yaml before it ships (follow-up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in FlowerCore.TtsReader@63e6b62: P1.1 Media Session API wiring in
fc-media-session.js + quick-player.js + rendered-chapter-player.js, and
P1.2 biblical-name pronunciation lexicon auto-seed on Bible-source
project creation plus apply-bible-defaults endpoint + MCP tool for
existing projects. Tests 81 -> 97 all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live verification 2026-04-24 caught POST /blobs on dist.flowercore.io
returning 201 Created with the blob persisted — admin write operations
reachable on the public surface. Controller-level strict entitlement
was on, but that gates reads; writes weren't blocked at all.
Fix: add Method(GET) || Method(HEAD) to the Host match on the public
IngressRoute. POST/PUT/PATCH/DELETE now miss every route for
dist.flowercore.io and Traefik returns 404 before the pod sees the
request. Edge-level defense-in-depth on top of the controller's
strict-mode entitlement check.
The internal IngressRoute for dist.iamworkin.lan stays unrestricted —
admin POST /blobs + POST /manifests flows keep working from the lab.
Lights up dist.flowercore.io end-to-end:
- cf-origin-flowercore-io Secret (literal *.flowercore.io Origin Cert,
copied from the telephony/gitea-public/matrix/mail/flowercore/fc-landing
pattern — not via OnePasswordItem yet).
- Traefik Middleware dist-public-profile-header: strips any caller-supplied
X-FC-Distribution-Profile, injects 'public' so the controller's
NamedEntitlementResolverRouter routes to the strict resolver.
- IngressRoute fc-distribution-public: Host(`dist.flowercore.io`) ->
same backing Service as the internal dist.iamworkin.lan route.
Middleware attached; cert secret cf-origin-flowercore-io.
Cloudflare DNS A record dist.flowercore.io -> 74.40.140.24 (proxied)
already created 2026-04-24 via Cloudflare API (record id
e9b957511556f37ff6763f4441acbc45).
Controller entitlement config is still DefaultAllow=false + empty
PublicEditions on the 'public' profile, so every public request
returns 403 by default. Populate FlowerCore__Distribution__EntitlementPublic__PublicEditions__0
via env var when ready to expose specific editions.
- Serves GET /manifests/{edition}/{version}.cert (leaf+intermediate PEM)
- Adds CertChainPem migration on startup (nullable column)
- ManifestSignService now embeds version-specific certChainUrl
Provisioning Agent's verify step will flip from ChainNotServed (Phase 2A
soft-pass) to Valid once a fresh edition is published with this image.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The full FQDN fc-llm-bridge.fc-llm-bridge.svc.cluster.local has 4 dots,
which is less than the pod's ndots:5 threshold. The resolver then
applies every entry in the search list BEFORE falling through to the
bare FQDN, and the CoreDNS 'template iamworkin.lan' catch-all matches
"...svc.cluster.local.iamworkin.lan" and returns Traefik VIP
10.0.56.200. The egress NetworkPolicy blocks that VIP (0.0.0.0/0
EXCEPT 10.0.0.0/8), so curl hangs for 30-134s and returns HTTP 000.
Reference: feedback_coredns_ndots_template_collision memory.
Fix: use "fc-llm-bridge.fc-llm-bridge.svc" (2 dots, still <5 so search
expansion still fires, but the first suffix "...svc.cluster.local"
hits the Kubernetes plugin in CoreDNS and returns the real ClusterIP
10.43.67.125 before the iamworkin.lan template is ever consulted).
Verified: pod-exec curl fc:cheap → HTTP 200 with a real chat.completion
envelope (Ollama/gemma3:4b via bridge).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat_model flip (62db15c) pointed Agent Zero at
fc-llm-bridge.fc-llm-bridge.svc.cluster.local:8080 but the existing
agent-zero-netpol only allowed egress to specific node IPs
(10.0.56.20:11434, 10.0.57.17:11434, 10.0.57.16:5200, 10.0.56.11:6443)
plus public-internet (with RFC1918 exclusion). ClusterIP traffic to
10.43.0.0/16 was implicitly denied, so pod-exec curl to the bridge
timed out after 134s.
Adds an egress rule allowing TCP 8080 to the fc-llm-bridge namespace
(matched by kubernetes.io/metadata.name which K8s 1.22+ sets
automatically). No ingress changes needed — fc-llm-bridge has no
NetworkPolicy, so the ingress side is already open.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flips Agent Zero's chat_model from direct local Ollama (gemma3:12b via
the 127.0.0.1:11434 sidecar proxy) to the FlowerCore LLM Bridge
(fc:balanced tier, OpenAI-compatible, Anthropic Claude Sonnet under the
hood) so chat turns are spend-tracked and can dispatch to any provider
via a single tier alias.
Scope is intentionally minimal and reversible:
- chat_model: ollama/gemma3:12b/127.0.0.1:11434
→ openai/fc:balanced/fc-llm-bridge internal service URL
- utility_model, embedding_model, browser_model: UNCHANGED
(stay on local 127.0.0.1 Ollama sidecar — no spend, low latency,
not worth routing through the bridge for small-model traffic).
Auth: new A0_SET_chat_model_api_key env var wired to the
fc-llm-bridge-api-keys Secret (field: agent-zero-k8s). The Secret is
synced by a new OnePasswordItem pointing at "FC LLM Bridge API Keys"
in the IAmWorkin vault. Bearer-token auth is now accepted by the
bridge (FlowerCore.LlmBridge@3225f1f).
Rollback: revert this commit; old image v202604231449 is still present
on all RKE2 nodes, and Agent Zero's strategy: Recreate makes the flip
atomic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ships the Bearer-token auth fix (FlowerCore.LlmBridge@3225f1f) so Agent
Zero's OpenAI provider can authenticate with Authorization: Bearer in
addition to the original X-Api-Key header.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pods in this cluster inherit ndots=5. External FQDNs with <5 dots (like
api.anthropic.com) are expanded through the search path first, and the 4th
suffix `api.anthropic.com.iamworkin.lan` matches CoreDNS' `template IN A
iamworkin.lan` wildcard — resolves to Traefik VIP 10.0.56.200. TLS connect
lands on Traefik's default cert and the AnthropicClient rejects with
RemoteCertificateNameMismatch/RemoteCertificateChainErrors.
Setting ndots=2 makes the resolver try the bare FQDN first (3 dots in
api.anthropic.com), so the search path never fires.
Reference: memory feedback_coredns_ndots_template_collision. Wider follow-up:
the CoreDNS template plugin should add fallthrough for external public suffixes,
so every FC service calling external HTTPS APIs stops hitting this trap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>