fc-speech-align: v3 — emit FlowerCore.Shared.Speech word contract

The /align endpoint was returning Whisper-native word fields
(word/startSeconds/endSeconds/confidence), but the
FasterWhisperAlignmentClient in FlowerCore.Shared.Speech on master
deserializes FasterWhisperWord using [JsonPropertyName("text")],
[JsonPropertyName("startMs")], and [JsonPropertyName("endMs")].
None of those names were on the wire, so the client ignored every field
and left each property at its default. Result: ttsreader-web reported
alignment.source="whisper" with words[] present, but every entry had
Text="" and StartMs=EndMs=0. This was visible in the 2026-04-25
hello-world smoke test against ttsreader.iamworkin.lan.
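To make the failure mode concrete, here is a small Python simulation of
the lenient binding described above. It is not the C# client:
FasterWhisperWord is mirrored as a hypothetical dataclass, lenient_parse
is a hypothetical helper that mimics System.Text.Json's default behavior
(unknown properties are skipped, missing properties keep their default
values), and the sample word values are made up.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class FasterWhisperWord:
    # Hypothetical Python mirror of the C# type: only the three wire names
    # bound via [JsonPropertyName] are read, everything else is skipped.
    text: str = ""
    start_ms: int = 0
    end_ms: int = 0


def lenient_parse(word: Mapping[str, Any]) -> FasterWhisperWord:
    # Mimics System.Text.Json defaults: unknown keys are ignored and missing
    # keys leave the property at its default instead of raising an error.
    return FasterWhisperWord(
        text=word.get("text", ""),
        start_ms=word.get("startMs", 0),
        end_ms=word.get("endMs", 0),
    )


# Made-up sample words in the v2 (Whisper-native) and v3 (contract) shapes.
v2_word = {"word": "hello", "startSeconds": 0.0, "endSeconds": 0.42, "confidence": 0.98}
v3_word = {"text": "hello", "startMs": 0, "endMs": 420, "confidence": 0.98}

print(lenient_parse(v2_word))  # FasterWhisperWord(text='', start_ms=0, end_ms=0)
print(lenient_parse(v3_word))  # FasterWhisperWord(text='hello', start_ms=0, end_ms=420)
```

The v2 word parses without any error, which is why the bug surfaced as
empty, zero-length words rather than as a deserialization failure.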

Match the published Common contract instead of the Python model's native
shape: emit text/startMs/endMs (millisecond ints, not float seconds).
Confidence stays on the wire as informational; the deployed C# client
ignores it, but a future fc-align operator UI could use it to surface
low-confidence words. Bump the image tag to v3 and update the
Deployment image to match.
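A smoke test can catch this kind of regression before a client does. A
minimal sketch, assuming the response's words[] list has already been
decoded from JSON into Python dicts; contract_problems and REQUIRED are
hypothetical names, and the checks (required keys, int milliseconds,
0 <= startMs <= endMs) are one reading of the contract above rather than
rules published by Common.

```python
from typing import Any, Dict, Iterable, List

# Wire names and types from the contract described above.
REQUIRED = {"text": str, "startMs": int, "endMs": int}


def contract_problems(words: Iterable[Dict[str, Any]]) -> List[str]:
    """Return human-readable contract violations; an empty list means OK."""
    problems: List[str] = []
    for i, word in enumerate(words):
        for key, expected in REQUIRED.items():
            value = word.get(key)
            # bool is a subclass of int in Python, so reject it explicitly;
            # float seconds (the v2 shape) fail the int check as intended.
            if not isinstance(value, expected) or isinstance(value, bool):
                problems.append(
                    f"words[{i}].{key}: expected {expected.__name__}, got {value!r}"
                )
        start, end = word.get("startMs"), word.get("endMs")
        if isinstance(start, int) and isinstance(end, int) and not 0 <= start <= end:
            problems.append(f"words[{i}]: bad range {start}..{end}")
    return problems


v2_words = [{"word": "hello", "startSeconds": 0.0, "endSeconds": 0.42}]
v3_words = [{"text": "hello", "startMs": 0, "endMs": 420, "confidence": 0.98}]

for line in contract_problems(v2_words):
    print(line)
# words[0].text: expected str, got None
# words[0].startMs: expected int, got None
# words[0].endMs: expected int, got None
print(contract_problems(v3_words))  # []
```

Extra keys such as confidence are deliberately allowed, matching the
client's behavior of ignoring names it does not bind.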

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Andrew Stoltz
2026-04-25 11:52:14 -05:00
parent 4abc2fa95d
commit b51ee35bfa
2 changed files with 11 additions and 4 deletions


@@ -169,7 +169,7 @@ spec:
         runAsUser: 1654
       containers:
         - name: align
-          image: localhost/fc-speech-align:v2
+          image: localhost/fc-speech-align:v3
           imagePullPolicy: Never
           ports:
             - containerPort: 9200


@@ -128,10 +128,17 @@ async def align(audio: UploadFile = File(...), language: str = Form(DEFAULT_LANG
     for segment in segments:
         text_parts.append(segment.text.strip())
         for word in (segment.words or []):
+            # Field names MUST match the FlowerCore.Shared.Speech contract:
+            # `text` / `startMs` / `endMs`. The deployed FasterWhisperAlignmentClient
+            # ignores any other names — see Common's
+            # FasterWhisperAlignmentResponse / FasterWhisperWord.
             words.append({
-                "word": word.word.strip(),
-                "startSeconds": float(word.start or 0.0),
-                "endSeconds": float(word.end or 0.0),
+                "text": word.word.strip(),
+                "startMs": int((word.start or 0.0) * 1000),
+                "endMs": int((word.end or 0.0) * 1000),
+                # Confidence is informational and ignored by the C# client today,
+                # but kept on the wire for future scoring + fc-align operators
+                # that want to surface low-confidence words.
                 "confidence": float(getattr(word, "probability", 0.0) or 0.0),
             })
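One detail in the new conversion: int() truncates toward zero, so a
timestamp whose float product lands a hair below a whole millisecond
loses 1 ms (in Python, int(1.001 * 1000) is 1000 because 1.001 * 1000
evaluates to 1000.9999999999999). Whether this matters depends on the
timestamps faster-whisper actually emits. A minimal sketch of a
round-to-nearest alternative, using a hypothetical seconds_to_ms helper
that is not part of this commit:

```python
from typing import Optional


def seconds_to_ms(seconds: Optional[float]) -> int:
    """Convert a Whisper timestamp in seconds to whole milliseconds.

    None (a word without a timestamp) maps to 0, matching the `or 0.0`
    fallback used in the /align handler.
    """
    # round() with no ndigits returns an int; int() would truncate instead.
    return round((seconds or 0.0) * 1000)


print(int(1.001 * 1000))     # 1000 (truncated)
print(seconds_to_ms(1.001))  # 1001
print(seconds_to_ms(None))   # 0
```

round() uses round-half-to-even, which differs from round-half-up only
on exact half-millisecond ties.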