From b51ee35bfa00a8175c276ec24193d9206033d861 Mon Sep 17 00:00:00 2001
From: Andrew Stoltz
Date: Sat, 25 Apr 2026 11:52:14 -0500
Subject: [PATCH] =?UTF-8?q?fc-speech-align:=20v3=20=E2=80=94=20emit=20Flow?=
 =?UTF-8?q?erCore.Shared.Speech=20word=20contract?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The /align endpoint was returning Whisper-native word fields
(word/startSeconds/endSeconds/confidence), but FlowerCore.Shared.Speech's
FasterWhisperAlignmentClient on master deserializes FasterWhisperWord
against [JsonPropertyName("text")/("startMs")/("endMs")]. Result:
ttsreader-web reported alignment.source="whisper" with words[] present,
but every entry had Text="" and StartMs=EndMs=0. This was visible in the
2026-04-25 hello-world smoke test against ttsreader.iamworkin.lan.

Match the published Common contract instead of the Python model's native
shape: emit text/startMs/endMs (integer milliseconds, not float seconds).
Confidence stays on the wire as informational; the deployed C# client
ignores it, but a future fc-align operator UI can surface low-confidence
words.

Bump the tag to v3 and bump the Deployment image accordingly.
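For reference, a minimal sketch of the per-word mapping as a standalone
function. It assumes a faster-whisper-style word object exposing `word`,
`start`, `end` and `probability` (the attributes the patch reads); the
names `seconds_to_ms`, `to_contract_word` and `FakeWord` are hypothetical.
One deliberate difference from the patch: the sketch converts seconds to
milliseconds with round() rather than int(), because int() truncates
toward zero and a product such as 1.005 * 1000 lands just below 1005 in
binary floating point.

```python
"""Sketch: map a Whisper word to the FlowerCore.Shared.Speech wire shape.

Assumption: `word` is any object with `word`, `start`, `end` and
`probability` attributes, as faster-whisper word objects have.
`seconds_to_ms`, `to_contract_word` and `FakeWord` are hypothetical names.
"""
import json
from dataclasses import dataclass
from typing import Any, Optional


def seconds_to_ms(seconds: Optional[float]) -> int:
    # A missing timestamp (None) maps to 0, mirroring `word.start or 0.0`.
    # round() rather than int(): int() truncates, so 1.005 * 1000
    # (1004.9999999999999 as a float) would become 1004.
    return round((seconds or 0.0) * 1000)


def to_contract_word(word: Any) -> dict:
    # Keys must match the C# client's [JsonPropertyName] attributes:
    # "text", "startMs", "endMs". "confidence" is extra and ignored there.
    return {
        "text": word.word.strip(),
        "startMs": seconds_to_ms(word.start),
        "endMs": seconds_to_ms(word.end),
        "confidence": float(getattr(word, "probability", 0.0) or 0.0),
    }


@dataclass
class FakeWord:
    # Hypothetical stand-in for a faster-whisper word, for local testing.
    word: str
    start: Optional[float]
    end: Optional[float]
    probability: Optional[float] = 0.0


if __name__ == "__main__":
    w = FakeWord(word=" hello", start=0.5, end=1.005, probability=0.92)
    print(json.dumps(to_contract_word(w)))
    # prints {"text": "hello", "startMs": 500, "endMs": 1005, "confidence": 0.92}
```

Keeping the field names in one function makes it easy to pin the
contract with a unit test, so a future rename on either side fails
loudly instead of silently producing empty words.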
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 apps/fc-ttsreader/fc-ttsreader.yaml   |  2 +-
 apps/fc-ttsreader/speech-align/app.py | 13 ++++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/apps/fc-ttsreader/fc-ttsreader.yaml b/apps/fc-ttsreader/fc-ttsreader.yaml
index d91aa58..b756e2c 100644
--- a/apps/fc-ttsreader/fc-ttsreader.yaml
+++ b/apps/fc-ttsreader/fc-ttsreader.yaml
@@ -169,7 +169,7 @@ spec:
         runAsUser: 1654
       containers:
         - name: align
-          image: localhost/fc-speech-align:v2
+          image: localhost/fc-speech-align:v3
           imagePullPolicy: Never
           ports:
             - containerPort: 9200
diff --git a/apps/fc-ttsreader/speech-align/app.py b/apps/fc-ttsreader/speech-align/app.py
index 092bb48..70652eb 100644
--- a/apps/fc-ttsreader/speech-align/app.py
+++ b/apps/fc-ttsreader/speech-align/app.py
@@ -128,10 +128,17 @@ async def align(audio: UploadFile = File(...), language: str = Form(DEFAULT_LANG
     for segment in segments:
         text_parts.append(segment.text.strip())
         for word in (segment.words or []):
+            # Field names MUST match the FlowerCore.Shared.Speech contract:
+            # `text` / `startMs` / `endMs`. The deployed FasterWhisperAlignmentClient
+            # ignores any other names — see Common's
+            # FasterWhisperAlignmentResponse / FasterWhisperWord.
             words.append({
-                "word": word.word.strip(),
-                "startSeconds": float(word.start or 0.0),
-                "endSeconds": float(word.end or 0.0),
+                "text": word.word.strip(),
+                "startMs": int((word.start or 0.0) * 1000),
+                "endMs": int((word.end or 0.0) * 1000),
+                # Confidence is informational and ignored by the C# client today,
+                # but kept on the wire for future scoring + fc-align operators
+                # that want to surface low-confidence words.
                 "confidence": float(getattr(word, "probability", 0.0) or 0.0),
             })
 