Deploys fix for stale Failed manifest accumulation in TTS Reader Ops view
and atomic-write guard against empty/corrupt job manifests.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Sprint E XXL Phase 4γ MVP deploy — POST /api/v1/render endpoint.
Two changes:
1. Image tag v202604272339 → v202604280946 (TtsReader@d9e0a58 master tip
includes the new RenderController + RenderService + 9 tests).
2. New TtsReader__Render__CdnDirectory=/data/cdn env var. Default
wwwroot/cdn resolves under the read-only app filesystem when
runAsNonRoot=true; pin to the existing writable PVC mount alongside
other TtsReader runtime data. Manifests + cue audio land at
/data/cdn/sha256/<hash>/manifest.json + cues/.
Pre-existing PVC mount at /data/ already covers this — no PVC change
needed, just the env var override.
Pairs with TtsReader@d9e0a58 master tip (ready for image build + import).
Bump intranet image to v20260427-2353 (master @ 38b0148):
- Sprint E search lane: /search Blazor page + IntranetSearchService
+ DocsCorpusIndexer + Shared.Indexing wiring
- 7 new service pages: LocalAiAgents, AiTopology, Distribution, Dns,
Knowledge, LlmBridge, Provisioning
- PiManager drift docs
New env var: PageReadingOverrides__FilePath=/data/page-reading-overrides.json
so the persisted Lane 2α store lives on the writable PVC instead of
the default in-memory fallback (which loses state on pod restart).
Operator-edited overrides via the existing /api/v1/pages/{encoded}/overrides
controller will now survive across restarts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TtsReader@9333480: distinguishes partial-render (yellow Warning, audio
plays, 'Re-render N failed sentences' button) from full-fail (red
Danger, 'Try render again'). New TtsFallbackChainFailedException carries
both voices when Kokoro + Piper both fail; chapter breadcrumb names
the entire chain instead of just the requested voice. +8 tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kokoro pod has 4 restarts in 2d6h with exit 143 (SIGTERM from kubelet).
kubectl describe events all show:
Liveness probe failed: Get "http://10.42.229.109:8880/v1/audio/voices":
context deadline exceeded
The probe path /v1/audio/voices shares the FastAPI worker pool with
/v1/audio/speech. A long synth (Bible chapter, 30+ sentences) holds the
pool past the prior 5s × 3 = 15s probe window, kubelet kills the pod,
in-flight renders fail. Operator hits "fallback chain failed" toasts +
partial-render breadcrumbs during these windows.
Bump probe timeoutSeconds 5 → 15 and failureThreshold 3 → 5 → 75 s of
grace before kubelet gives up. Combined with the kokoro-side circuit
breaker landing in TtsReader (Sprint E Phase 1b), the FC backend will
also stop slamming kokoro during recovery so it can serve the probe
even faster.
The companion Prometheus alerts (KokoroPodFlapping, PiperPodFlapping)
land in FlowerCore.Notes/scripts/monitoring/alerts.yml.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Day 8 disk-cache warmer crashes on production with
'Read-only file system : /home/app/data' because the relative default
'data/voice-previews' resolves under runAsNonRoot HOME (read-only with
readOnlyRootFilesystem=true). Pin to /data/voice-previews so the cache
lands on the writable PVC mount alongside ttsreader.db, audio output,
and jobs root.
Image v202604272216 (already on nodes) is unaffected by this — only
the env routing changes. ArgoCD reconciles + rollout restart picks up
the new env without rebuild.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v202604272157 crash-looped on the production PVC because Database.EnsureCreated()
is a no-op on existing DBs and the VoiceProfiles table was missing. TtsReader@a9f0b73
adds an idempotent CREATE TABLE IF NOT EXISTS to the infra reconciler before
TtsReaderDataSeeder runs. Bumping the manifest to pick up that fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds 7 new pages (5 service pages, AI topology, opencode operator guide)
to https://intranet.iamworkin.lan:
/services/dns
/services/distribution
/services/llm-bridge
/services/knowledge
/services/provisioning
/services/ai-topology
/development/local-ai-agents
Plus topology corrections in /services/ai (AiStack.razor) and 6 new nav entries.
Source commit: FlowerCore.Intranet.Web@1598542 on
codex-wip-pre-readaloud-collision-2026-04-24.
Image built from artifacts/publish via Dockerfile.deploy on BLUEJAY-WS,
imported to all 3 RKE2 nodes (rke2-server + rke2-agent1 + rke2-agent2).
Build: 0 warnings, 0 errors, 197/197 tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes after the Phase 2.4 deploy went live at
https://knowledge.iamworkin.lan:
1. **Ollama URL flip**: from BLUEJAY-WS (10.0.56.20:11434) to edge1 Pi 5
(10.0.57.17:11434). Honors the cluster-clean architecture from
bluejay-infra@0f9d56e ("Workstation is private dev hardware and should
not be in the cluster path"). Query-time embeddings (~ms per query)
are fast enough on edge1; bulk index rebuilds (Phase 2.5+) will need a
separate ingestion lane that can opt into the workstation GPU when
present. ArgoCD picks up the env-var change and rolls the pod
automatically — no image rebuild needed.
2. **README LIVE status**: flip the staged-not-yet-applied banner to
LIVE 2026-04-27. Pod running, certificate issued, PVC bound,
/healthz 200, /api/v1/editions [] (initial-deploy state). Phase 2.5+
admin UI handles bulk population.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workstation (BLUEJAY-WS) is private dev hardware and should not be in the
cluster path. Repointing the nginx ollama-proxy sidecar so cluster Agent Zero
talks ONLY to edge1 Pi 5 + AI HAT+ (10.0.57.17:11434):
- nginx upstream: edge1 sole server, no workstation entry
- wait-for-ollama init container: only checks edge1
- NetworkPolicy egress: drop 10.0.56.20/32, keep 10.0.57.17/32
- Comments updated throughout to flag workstation as off-limits to cluster
- Annotation rewritten to document the architectural intent
Pulled qwen2.5:1.5b on edge1 first so Agent Zero's utility_model survives
the cutover (existing models on edge1: qwen3:4b, gemma3:4b, qwen2.5-coder:7b,
nomic-embed-text). Model count on edge1: 4 → 5.
Lets BLUEJAY-WS lock down its Ollama port to localhost without breaking
the cluster Agent Zero.
NOT YET APPLIED — push to origin/main is gated on the DNS A record
knowledge.iamworkin.lan -> 10.0.56.200 being live. Per memory
feedback_pfsense_dns_required_for_acme, applying the Certificate
without DNS in place puts cert-manager into ~2h HTTP-01 backoff and
needs `kubectl -n knowledge delete order <name>` recovery.
Manifests authored:
- apps/knowledge/knowledge.yaml — Namespace, PVC (knowledge-vector-store
Longhorn 20Gi RWO), Deployment (single replica, Recreate, image
localhost/fc-knowledge-web:v202604272200 placeholder, runAsNonRoot
1654, readOnlyRootFilesystem, drop ALL caps, /healthz startupProbe +
readinessProbe, tcpSocket livenessProbe), Service (ClusterIP port
80 -> 8080), Certificate (step-ca-acme ClusterIssuer, 90d duration),
IngressRoute (knowledge.iamworkin.lan, websecure entrypoint).
- apps/knowledge/kustomization.yaml — `kubectl kustomize` preview file
(matches fc-distribution shape; ApplicationSet uses dir generator).
- apps/knowledge/README.md — deployment order checklist with the DNS
preflight, image build/import loop for all 3 RKE2 nodes, push
procedure, smoke verification, initial-deploy-state notes
(zero editions until *.db files are pushed to the PVC), resource
sizing, probe + middleware notes.
Companion artifacts (separate repos, separate commits):
- FlowerCore.Knowledge@eb91eb4 — Dockerfile.deploy at repo root
- FlowerCore.Notes@96cd443 — scripts/deploy-knowledge.sh
Apply order (from apps/knowledge/README.md):
1. Add DNS A record knowledge.iamworkin.lan -> 10.0.56.200 via
FlowerCore.DNS or pfSense web UI.
2. Run `bash scripts/deploy-knowledge.sh` from FlowerCore.Notes — this
builds + imports the image to all 3 RKE2 nodes with
FLOWERCORE_DEPLOY_SKIP_ROLLOUT=1 (since the Deployment doesn't
exist yet on the cluster).
3. Bump the image tag in this manifest to match the freshly-imported
tag, then `git push` from this repo to land on main. ArgoCD picks
up within ~3 minutes and creates `infra-knowledge`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new Prometheus alert rules for the print-services group, all routed
to thermal_print via alert_channel label (Grafana contact point ->
irc-notify -> Print.Web /api/print/alert):
- PrintPaperRollLow (warning, 5-10% remaining, 5m for)
- PrintPaperRollCritical (critical, <=5% remaining, 2m for)
- PrintJobDeadLetter (warning, any new dead-letter in 15m)
Source-of-truth gauge is print_paper_remaining_percent (Print.Web OTEL),
which is hydrated from the active PaperRoll row at process startup
(Print.Web@<TBD> HydrateMetricsAsync) so the gauge isn't blind for an
arbitrary window after every deploy.
Self-referential humor: low-roll alerts route to the printer that's
running out of paper, so it announces its own paper-out warning on its
remaining paper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an IngressRoute + cert-manager Certificate that terminates HTTPS for
print.iamworkin.lan and proxies to edge2's Print.Web at 10.0.57.16:5200.
Same headless-Service-with-manual-Endpoints pattern as noc-services (used
for grafana/prometheus/cockpit on noc1). pfSense Unbound already resolves
print.iamworkin.lan to the Traefik VIP 10.0.56.200, so cert-manager
HTTP-01 should validate cleanly.
No basicAuth middleware: Print.Web has its own X-Api-Key authentication
and exposes anonymous endpoints for the bookmarklet / Python CLI /
cups-notifier flow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
agent-zero ollama-proxy had 172 historic restarts (now stable).
Root cause: liveness/readiness probes hit /api/tags which proxies
through to BLUEJAY-WS Ollama (10.0.56.20:11434). When the workstation
Ollama is slow or offline, nginx fails over to the edge1 backup —
but the failover takes >1s and the kube-probe default timeoutSeconds=1
gives up first. Three failed probes → kubelet kills the container.
Fix:
- Add nginx local healthz endpoint (200, no upstream).
- Liveness probe → /healthz (proves nginx itself is alive).
- Readiness probe stays on /api/tags but with timeoutSeconds=5 so
failover to backup completes before the probe times out.
This decouples liveness from upstream availability — kubelet only
restarts the proxy when nginx is genuinely dead, not when Ollama is
slow.
Longhorn coverage gap: K8s emits "snapshot becomes not ready to use"
events constantly during the hourly snapshot lifecycle (1047
snapshots, all readyToUse=true on inspect). Those events were the
only signal we had — purely transient lifecycle noise, not actionable.
Add:
- longhorn scrape job (longhorn-backend.longhorn-system.svc:9500)
- NetworkPolicy egress rule for longhorn-system port 9500
- 4 new alerts in 'longhorn-storage' group:
- LonghornVolumeDegraded (>15m) — replica unhealthy, auto-rebuild
- LonghornVolumeFaulted (>5m, critical, thermal print) — data loss
- LonghornBackupStale (no completed backup in >36h) — recurring job
silently failing
- LonghornNodeUnhealthy (>5m) — node ready=false
zabbix-web 7 restarts and Print.Web 12:55 stop investigated — both
are stable now, no actionable cause found in journal/events. Adding
KubeContainerRestartingFrequently in the previous commit will catch
recurrence of either.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Followup to 05a273d. After deploy, six PoolDepleted/Deficit alerts
went pending again because the publisher emits per-status gauge
series (fc_desktop_pool_depleted{template,status,alert_level}) and
the historical Warming/BelowDesiredSize series stay at value=1 even
after the template transitions to status=Ready. Filtering by
alert_level=Critical/Warning was not enough — those labels are baked
into the stale series too.
Replace with a join-based query: alert only when the canonical
"Ready" status gauge does NOT report ready=1 for the enabled
template. fc_desktop_pool_ready{status="Ready"}==1 is the publisher's
own current-state canary and never goes stale.
Verified against the live cluster — query returns 0 results when all
pools report healthy in their reconcile logs (no stale-label false
positives).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Followup to ab6ade4. Three issues uncovered after the rollout:
1. NodePort hairpin breaks scrape from same-node pod. Prometheus on
rke2-agent1 could reach traefik-metrics on .11/.13 NodePort 30900
but timed out on its OWN node's NodePort. Same problem would hit
kube-state-metrics + cert-manager whenever prometheus reschedules.
Fix: scrape via ClusterIP svc DNS instead of NodePort. NodePorts
stay in place for external/Podman scrapers.
2. probe-traefik-services failed for grafana, prometheus, guac with
non-200/3xx codes. grafana + prometheus are behind Traefik basic-
auth (every endpoint returns 401), so drop from probe surface —
health is covered by the in-cluster monitoring-* scrape jobs.
guac.iamworkin.lan was deprecated when Guacamole moved under
desktop.iamworkin.lan/guacamole/ — drop it.
3. acme path was wrong (root 404). Use /health.
Coverage adds (probe-traefik-services):
chat, dist, dms, menuboard, messageboard, presentations, retail,
ttsreader. All of these have IngressRoutes serving root at 200/3xx.
NetworkPolicy egress rules added so the new ClusterIP svc scrapes work:
- traefik-system: port 9100 (metrics) — separate from data-path 8080/8443
- kube-system: port 8080 (kube-state-metrics)
- cert-manager: port 9402 (controller metrics)
Out-of-band fix during this audit:
- Print.Web on edge2 was inactive (clean exit at 12:55 CDT, root cause
unclear — systemd Stopping signal). Restarted. Service back on 5200.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live audit on 2026-04-26 found 14 firing alerts caused by stale probe
targets, blackbox TLS verify failures, and stale state-as-label series.
Plus three K8s scrape sources (kube-state-metrics, cert-manager,
traefik) that exposed NodePorts but were not in any scrape config.
Fixes
- probe-remotedesktop: switch http_2xx -> https_internal. Blackbox does
not trust step-ca root, so /health was failing with x509 unknown
authority while the app served 200s.
- probe-agentzero-nuc: short svc form (agent-zero.agent-zero.svc:80)
instead of *.cluster.local. The FQDN form was being rewritten to the
Traefik VIP by the CoreDNS iamworkin.lan template + ndots:5 search
expansion, then 5s timeout.
- probe-agentzero-local + probe-ollama-local: removed. 10.0.58.100 is on
HOME VLAN and not reachable from cluster pods. Workstation/AI-laptop
Ollama monitoring belongs to host-side Puppet, not cluster blackbox.
- snmp-cloudkey: commented out. The Cloud Key Gen2+ runs unifi-core
(controller), not an SNMP agent. Was generating "connection refused"
every 30s.
- RemoteDesktopPoolDepleted / RemoteDesktopPoolDeficitSustained:
filter on alert_level=Critical / Warning|Critical + enabled=true.
The publisher emits one series per template per status without
resetting old series to 0, so the historical Warming/BelowDesiredSize
series stayed at 1 and the alert kept firing on stale labels.
- RemoteDesktopTlsExpiry: match by job, not hostname-only instance.
The probe sets instance=https://desktop.iamworkin.lan/health so a
hostname-only label match never fired.
- EpsonPrinterDown for: 5m -> 30m. EcoTank sleeps after ~5 min idle and
SNMP times out, so 5m guaranteed nightly noise.
Coverage adds
- kube-state-metrics scrape (NodePort 30901). Required for the new
pod-state alerts and a long list of standard K8s SLO queries.
- cert-manager scrape (NodePort 30902). Required for the
CertManagerCertificateNotReady / RenewalFailed alert pair documented
in project_cert_manager_prometheus_scrape.
- traefik scrape (NodePort 30900) on all three nodes.
- probe-traefik-services: HTTPS probe (https_internal) over the 17 main
iamworkin.lan hosts so any Traefik-fronted service returning non-200
shows up as a single named probe failure.
- blackbox-config: add the https_internal module that the new probes
reference (was only in the FlowerCore.Notes scripts/monitoring copy,
not in the live ConfigMap).
New alerts (kubernetes-state group)
- KubeContainerRestartingFrequently (>5 restarts/h)
- KubeContainerCrashLooping (>3 restarts/15m, thermal print)
- KubePodNotReady (Pending/Failed/Unknown >15m)
- KubePodImagePullBackOff (>10m)
- KubeDeploymentReplicasMismatch (>15m)
Without these, the agent-zero ollama-proxy 172x restart loop was
invisible for ~3 days. Same gap would have hidden the fc-php
php84-app-probe ImagePullBackOff orphan (cleaned up out of band).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the Deployment + Service for the fc-modern-tts container that
landed in the previous commit. Same shape as ttsreader-biblical:
runAsNonRoot uid 1654, dnsPolicy: None to bypass the iamworkin.lan
hijack on Microsoft endpoint lookups, /health probes, modest CPU/mem
since edge-tts is network-bound.
Service surfaces ttsreader-modern.fc-ttsreader.svc:10403 for the web
pod to call when the operator picks a he-IL-* or el-GR-* voice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a fourth TTS engine alongside Piper / Kokoro / biblical-tts: a
small FastAPI bridge to Microsoft Edge's Read Aloud TTS via the
edge-tts Python package. Provides studio-quality Modern Hebrew (he-IL)
and Modern Greek (el-GR) narrators for the cluster.
modern-tts/Dockerfile + app.py:
- Python 3.12 base + edge-tts==7.2.8 (older versions hit 403 from MS).
- POST /tts -> MP3 audio (audio/mpeg).
- POST /timings -> word-level timings. Edge sometimes omits WordBoundary
events for non-English voices; fall back to MP3-frame-walking duration
estimate + proportional distribution across whitespace-split words
(same approach biblical-tts uses for eSpeak).
- GET /voices?language=all|default — filtered to he-/el- by default so
the AiStation voice picker isn't overwhelmed by 400+ voices.
- GET /health for probes.
- Body shape mirrors BiblicalTtsRequest so the .NET client lives in the
same FlowerCore.Shared.Speech package.
K8s deployment in fc-ttsreader namespace:
- ttsreader-modern Deployment + Service on port 10403.
- localhost/fc-modern-tts:v1, imagePullPolicy: Never (built on noc1,
imported to all 3 RKE2 nodes via ctr).
- runAsNonRoot uid 1654 + fsGroup 1654.
- dnsPolicy: None to bypass the *.iamworkin.lan template hijack on
Microsoft endpoint lookups.
- Modest resources (100m/128Mi req, 1000m/512Mi limit) — edge-tts is
network-bound, not compute-bound.
- Probes against /health.
Verified live locally: container handles 'Καλημέρα Ελλάδα Πώς είστε'
in 2496ms, returns el-GR-NestorasNeural voice + 4 word timings.
Hebrew: 'בְּרֵאשִׁית בָּרָא אֱלֹהִים' returns he-IL-AvriNeural,
2472ms, 3 words.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a third TTS engine alongside Piper (modern English/multi-lang) and
Kokoro (high-quality English): a small FastAPI wrapper around eSpeak-NG
with built-in support for Ancient Greek (grc), Hebrew (he), and Modern
Greek (el). Same shape as fc-speech-align so AiStation talks to all the
TTS/alignment services with one HTTP client pattern.
biblical-tts/Dockerfile + app.py:
- Python 3.12 base + apt-get espeak-ng + libsndfile1 + ffmpeg-free deps.
- POST /tts -> WAV audio bytes (audio/wav).
- POST /timings -> word-level timings derived from espeak's --pho phoneme
duration stream, distributed across whitespace-split words proportional
to character count. Accuracy is good enough for chip-level read-along
highlighting (~30-80ms per-word jitter).
- GET /voices for catalog discovery, GET /health for probes.
- Body shape mirrors AlignmentRequest from FlowerCore.Shared.Speech so
the .NET BiblicalTtsClient round-trips it cleanly.
K8s deployment in fc-ttsreader namespace:
- ttsreader-biblical Deployment + Service on port 10402.
- localhost/fc-biblical-tts:v1, imagePullPolicy: Never (built on noc1,
imported to all 3 RKE2 nodes via ctr).
- runAsNonRoot uid 1654 to match the namespace's standard security ctx.
- Modest resources (100m/128Mi req, 1000m/512Mi limit) — eSpeak is
CPU-cheap.
- Probes hit /health which returns the supported language list.
Verified live: container started, /health returns ok with grc/el/he,
POST /timings on Ἐν ἀρχῇ ἦν ὁ λόγος returned 5 words / 1714ms.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cluster ttsreader-web was reaching across to BLUEJAY-WS:10401 for
Kokoro synthesis, which meant a workstation-down event broke render-
pipeline TTS. Add a cluster-native ttsreader-kokoro Deployment and
Service inside fc-ttsreader so the cluster owns the engine.
- Image: ghcr.io/remsky/kokoro-fastapi-cpu:latest. Model + 67 voices
ship inside the image, so no PVC is required.
- Port 8880 (the kokoro-fastapi default; the entrypoint hardcodes it).
- Resources: 250m/1Gi request, 2000m/3Gi limit. CPU-only inference
matches what AiStation runs locally on BLUEJAY-WS.
- dnsPolicy: None to bypass CoreDNS's *.iamworkin.lan template hijack
on huggingface.co lookups, same shape as ttsreader-align.
- Probes hit /v1/audio/voices since the kokoro server doesn't expose
/health; that endpoint is cheap (lists configured voice files).
ttsreader-web env var TtsReader__Kokoro__BaseUrl flips from the
workstation pointer to the cluster service:
http://ttsreader-kokoro.fc-ttsreader.svc.cluster.local.:8880.
AiStation keeps its local http://localhost:8880 since the workstation
operator still wants the audio to render on the local sound device
without a network hop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /align endpoint was returning Whisper-native word fields
(word/startSeconds/endSeconds/confidence), but FlowerCore.Shared.Speech's
FasterWhisperAlignmentClient on master deserializes
FasterWhisperWord against [JsonPropertyName("text")/("startMs")/("endMs")].
Result: ttsreader-web reported alignment.source="whisper" with words[]
present but every entry had Text="" and StartMs=EndMs=0 — visible in the
2026-04-25 hello-world smoke against ttsreader.iamworkin.lan.
Match the published Common contract instead of the Python model's native
shape: emit text/startMs/endMs (millisecond ints, not float seconds).
Confidence stays on the wire as informational; the deployed C# client
ignores it but a future fc-align operator UI can surface low-confidence
words. Bump tag to v3 and bump the Deployment image accordingly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- New ttsreader-align Deployment + Service + 5Gi PVC under
apps/fc-ttsreader/. Wraps SYSTRAN/faster-whisper in a small FastAPI app
exposing POST /align (fc-align contract used by Shared.Speech) AND
POST /transcribe (audio-in feature consumed by ttsreader-web Lane G).
Source: apps/fc-ttsreader/speech-align/ (Dockerfile + app.py +
requirements.txt). Built locally (apt-get RUN steps need BLUEJAY-WS,
not noc1) and ctr-imported to all 3 RKE2 nodes.
- ttsreader-web env: flip Speech__Alignment__Enabled=true and point
BaseUrl at http://ttsreader-align.fc-ttsreader.svc.cluster.local.:9200.
Add new TtsReader__Transcription__* env triplet pointing at the same
service (same /transcribe endpoint).
- Bump ttsreader-web image to v202604251046 (carries the
TranscriptionController + MCP tool + Quick.razor InputFile UI).
The cluster-wide pod cannot reach BLUEJAY-WS speaches on 10.0.56.20:9200
because the rootless+host-net podman setup binds 127.0.0.1 only on the
WSL machine; nothing on the LAN-facing interface. The openai-compatible
Backend value also relied on a Common change still on feat/shared-indexing
rather than master, so the deployed image's Shared.Speech only knows
the FC-native /align shape.
Disable Speech:Alignment for now. EstimatedAlignmentClient kicks in and
keeps /api/v1/voices/preview-with-timings returning word-aligned JSON,
just with uniform-distribution timings instead of real Whisper output.
Re-enable once: (a) Common's openai-compatible Backend lands on master
and a new TtsReader image ships, or (b) we point at a LAN-routable
backend (e.g. an aiohttp /align shim, or speaches running on a node
that's actually reachable from cluster pods).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>