fc-ttsreader: disable Whisper, fall back to estimator until backend is reachable

The cluster-wide pod cannot reach BLUEJAY-WS speaches on 10.0.56.20:9200
because the rootless+host-net podman setup binds only 127.0.0.1 on the
WSL machine; nothing listens on the LAN-facing interface. The
openai-compatible Backend value also relies on a Common change that is
still on feat/shared-indexing rather than master, so the deployed
image's Shared.Speech only knows the FC-native /align shape.

Disable Speech:Alignment for now. EstimatedAlignmentClient kicks in and
keeps /api/v1/voices/preview-with-timings returning word-aligned JSON,
just with uniform-distribution timings instead of real Whisper output.
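
For illustration only, a minimal Python sketch of what uniform-distribution
timings could look like, assuming the estimator simply divides the clip
duration into equal per-word slots. The real EstimatedAlignmentClient is
not shown in this commit and may weight words differently; the names
WordTiming and estimate_uniform_timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    """Hypothetical shape for one word-level timing entry."""
    word: str
    start: float  # seconds from the start of the clip
    end: float    # seconds from the start of the clip


def estimate_uniform_timings(text: str, duration_seconds: float) -> list[WordTiming]:
    """Assign every whitespace-separated word an equal share of the clip.

    Assumption: equal slots per word, with no weighting by word length
    and no pauses. This is an estimate, not an alignment.
    """
    words = text.split()
    if not words or duration_seconds <= 0:
        return []
    slot = duration_seconds / len(words)
    return [
        WordTiming(word=w, start=i * slot, end=(i + 1) * slot)
        for i, w in enumerate(words)
    ]
```

Timings produced this way are contiguous and cover the whole clip, which
is enough to drive word-level highlighting but will drift from the audio
wherever words differ in spoken length.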

Re-enable once: (a) Common's openai-compatible Backend lands on master
and a new TtsReader image ships, or (b) we point at a LAN-routable
backend (e.g. an aiohttp /align shim, or speaches running on a node
that's actually reachable from cluster pods).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Andrew Stoltz
2026-04-25 10:28:21 -05:00
parent 08aa7a5bff
commit 9df26620b8

@@ -173,18 +173,18 @@ spec:
 - name: TtsReader__Kokoro__TimeoutSeconds
   value: "120"
 - name: Speech__Alignment__Enabled
-  value: "true"
-- name: Speech__Alignment__Backend
-  # speaches container on BLUEJAY-WS speaks the OpenAI-compatible
-  # /v1/audio/transcriptions contract; FasterWhisperAlignmentClient
-  # adapts the verbose_json response into the FlowerCore shape.
-  # Switch to "fc-align" once a native /align backend is deployed.
-  value: "openai-compatible"
+  # Off until either:
+  # (a) a native /align backend is deployed inside the cluster, or
+  # (b) the BLUEJAY-WS host exposes the speaches container on the
+  # LAN-routable bind (10.0.56.20:9200, not just 127.0.0.1)
+  # AND Common ships the openai-compatible Backend support
+  # (currently on feat/shared-indexing, not on master).
+  # While disabled, /preview-with-timings still returns word timings
+  # via EstimatedAlignmentClient — slightly less accurate, but the
+  # UI can still drive word-level highlight playback.
+  value: "false"
 - name: Speech__Alignment__BaseUrl
   value: "http://10.0.56.20:9200"
-- name: Speech__Alignment__Model
-  # Tag understood by speaches (faster-whisper-server).
-  value: "Systran/faster-whisper-base.en"
 - name: Speech__Alignment__TimeoutSeconds
   value: "120"
 - name: TtsReader__Ollama__BaseUrl