fc-speech-align: v3 — emit FlowerCore.Shared.Speech word contract

The /align endpoint was returning Whisper-native word fields
(word/startSeconds/endSeconds/confidence), but the
FasterWhisperAlignmentClient in FlowerCore.Shared.Speech on master
deserializes FasterWhisperWord from the JSON properties text, startMs,
and endMs (set via [JsonPropertyName]). None of the emitted names
matched, so ttsreader-web reported alignment.source="whisper" with
words[] present, but every entry had Text="" and StartMs=EndMs=0. This
was visible in the 2026-04-25 hello-world smoke test against
ttsreader.iamworkin.lan.

Match the published Common contract instead of the Python model's native
shape: emit text/startMs/endMs (integer milliseconds, not float seconds).
confidence stays on the wire as informational: the deployed C# client
ignores it, but a future fc-align operator UI can surface low-confidence
words. Bump the image tag to v3 and update the Deployment image to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
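
The failure mode described in the message can be reproduced with a small sketch. It assumes the C# client binds leniently, meaning unknown JSON properties are ignored and missing ones keep their default values, which is consistent with the empty Text and zero timestamps reported above. `ContractWord` and `read_word` are hypothetical Python stand-ins for illustration, not the real FasterWhisperWord type.

```python
from dataclasses import dataclass


@dataclass
class ContractWord:
    """Hypothetical stand-in for the C# FasterWhisperWord contract type."""
    text: str = ""
    start_ms: int = 0
    end_ms: int = 0


def read_word(payload: dict) -> ContractWord:
    # Only the contract names are read. Unknown keys are ignored and
    # missing keys fall back to defaults (assumed lenient binding).
    return ContractWord(
        text=payload.get("text", ""),
        start_ms=payload.get("startMs", 0),
        end_ms=payload.get("endMs", 0),
    )


# Shape emitted before this commit (Whisper-native names).
v2_word = {"word": "hello", "startSeconds": 0.12, "endSeconds": 0.48, "confidence": 0.97}
# Shape emitted after this commit (contract names, integer milliseconds).
v3_word = {"text": "hello", "startMs": 120, "endMs": 480, "confidence": 0.97}

print(read_word(v2_word))
print(read_word(v3_word))
```

Under that assumption, the v2 payload yields an all-default word even though the JSON itself is well-formed, which is why the bug surfaced only in the consumer.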
@@ -169,7 +169,7 @@ spec:
         runAsUser: 1654
       containers:
         - name: align
-          image: localhost/fc-speech-align:v2
+          image: localhost/fc-speech-align:v3
           imagePullPolicy: Never
           ports:
             - containerPort: 9200
@@ -128,10 +128,17 @@ async def align(audio: UploadFile = File(...), language: str = Form(DEFAULT_LANG
     for segment in segments:
         text_parts.append(segment.text.strip())
         for word in (segment.words or []):
+            # Field names MUST match the FlowerCore.Shared.Speech contract:
+            # `text` / `startMs` / `endMs`. The deployed FasterWhisperAlignmentClient
+            # ignores any other names — see Common's
+            # FasterWhisperAlignmentResponse / FasterWhisperWord.
             words.append({
-                "word": word.word.strip(),
-                "startSeconds": float(word.start or 0.0),
-                "endSeconds": float(word.end or 0.0),
+                "text": word.word.strip(),
+                "startMs": int((word.start or 0.0) * 1000),
+                "endMs": int((word.end or 0.0) * 1000),
+                # Confidence is informational and ignored by the C# client today,
+                # but kept on the wire for future scoring + fc-align operators
+                # that want to surface low-confidence words.
                 "confidence": float(getattr(word, "probability", 0.0) or 0.0),
             })
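
One detail of the seconds-to-milliseconds conversion in the hunk above: `int()` truncates toward zero, so a float product that lands just under a whole millisecond loses 1 ms. The sketch below shows a rounding variant as a possible follow-up, not what this commit does. `seconds_to_ms` and `to_contract_word` are hypothetical helper names, and `SimpleNamespace` stands in for a word object carrying the `word`, `start`, `end`, and `probability` attributes used in the diff.

```python
from types import SimpleNamespace


def seconds_to_ms(value):
    """Convert float seconds (or None) to integer milliseconds, rounding to nearest."""
    return round((value or 0.0) * 1000)


def to_contract_word(word):
    """Map a word object to the text/startMs/endMs contract shape."""
    return {
        "text": word.word.strip(),
        "startMs": seconds_to_ms(word.start),
        "endMs": seconds_to_ms(word.end),
        # Informational only; a missing or None probability becomes 0.0.
        "confidence": float(getattr(word, "probability", 0.0) or 0.0),
    }


sample = SimpleNamespace(word=" hello", start=1.001, end=1.5, probability=0.97)
print(to_contract_word(sample))

# Truncation versus rounding for the same input.
print(int(1.001 * 1000), seconds_to_ms(1.001))
```

With IEEE-754 doubles, `1.001 * 1000` evaluates to slightly less than 1001, so truncation yields 1000 while rounding yields 1001. Whether a 1 ms difference matters depends on how the consumer uses the timestamps.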