fc-speech-align: v3 — emit FlowerCore.Shared.Speech word contract
The /align endpoint was returning Whisper-native word fields
(word/startSeconds/endSeconds/confidence), but FlowerCore.Shared.Speech's
FasterWhisperAlignmentClient on master deserializes
FasterWhisperWord against [JsonPropertyName("text")/("startMs")/("endMs")].
Result: ttsreader-web reported alignment.source="whisper" with words[] present, but every entry had Text="" and StartMs=EndMs=0. This showed up in the
2026-04-25 hello-world smoke test against ttsreader.iamworkin.lan.
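The failure was silent because of how System.Text.Json binds [JsonPropertyName] members. By default it skips JSON properties that have no matching member. Members that have no matching property keep their initial values. So the v2 payload deserialized without error into empty words. The Python sketch below imitates that behaviour. ContractWord and parse_like_client are hypothetical stand-ins, not the real C# types. The "" / 0 defaults come from the smoke-test symptoms, not from Common's source.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the C# FasterWhisperWord; the defaults mirror the
# observed symptoms (Text="", StartMs=EndMs=0), not Common's actual source.
@dataclass
class ContractWord:
    text: str = ""
    start_ms: int = 0
    end_ms: int = 0


# JSON property name -> attribute, as declared by [JsonPropertyName(...)].
CONTRACT_FIELDS = {"text": "text", "startMs": "start_ms", "endMs": "end_ms"}


def parse_like_client(payload: dict) -> ContractWord:
    """Bind only the contract's property names and silently skip the rest."""
    word = ContractWord()
    for json_name, attr in CONTRACT_FIELDS.items():
        if json_name in payload:
            setattr(word, attr, payload[json_name])
    return word


v2_word = {"word": "hello", "startSeconds": 0.12, "endSeconds": 0.48, "confidence": 0.97}
v3_word = {"text": "hello", "startMs": 120, "endMs": 480, "confidence": 0.97}

print(parse_like_client(v2_word))  # ContractWord(text='', start_ms=0, end_ms=0)
print(parse_like_client(v3_word))  # ContractWord(text='hello', start_ms=120, end_ms=480)
```

No exception is raised for the v2 shape, so the client reports a populated words[] that carries no usable data.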
Match the published Common contract instead of the Python model's native
shape: emit text/startMs/endMs (millisecond ints, not float seconds).
Confidence stays on the wire as informational. The deployed C# client
ignores it, but a future fc-align operator UI can surface low-confidence
words. Bump the image tag to v3 and point the Deployment at it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
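Here is a minimal sketch of the seconds-to-milliseconds mapping described above, written as a standalone helper. to_contract_word is a hypothetical name; the commit inlines this logic in align(). The sketch differs from the commit in one deliberate way: it uses round() where the commit uses int(). int() truncates toward zero. If binary floating point lands a product just below a whole millisecond, int() comes out 1 ms short, while round() returns the nearest int.

```python
from typing import Any, Optional


def to_contract_word(
    word: str,
    start: Optional[float],
    end: Optional[float],
    probability: Optional[float] = None,
) -> dict[str, Any]:
    """Map one faster-whisper word (times in seconds) onto text/startMs/endMs."""
    return {
        "text": word.strip(),
        # round() returns the nearest int; int() would truncate toward zero.
        "startMs": round((start or 0.0) * 1000),
        "endMs": round((end or 0.0) * 1000),
        # Informational only: the deployed C# client ignores it.
        "confidence": float(probability or 0.0),
    }


print(to_contract_word(" hello", 0.12, 0.48, 0.97))
# {'text': 'hello', 'startMs': 120, 'endMs': 480, 'confidence': 0.97}
```

Missing timestamps fall back to 0 ms, matching the `or 0.0` guards in the commit.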
@@ -169,7 +169,7 @@ spec:
   runAsUser: 1654
 containers:
 - name: align
-  image: localhost/fc-speech-align:v2
+  image: localhost/fc-speech-align:v3
   imagePullPolicy: Never
   ports:
   - containerPort: 9200
@@ -128,10 +128,17 @@ async def align(audio: UploadFile = File(...), language: str = Form(DEFAULT_LANG
 for segment in segments:
     text_parts.append(segment.text.strip())
     for word in (segment.words or []):
+        # Field names MUST match the FlowerCore.Shared.Speech contract:
+        # `text` / `startMs` / `endMs`. The deployed FasterWhisperAlignmentClient
+        # ignores any other names — see Common's
+        # FasterWhisperAlignmentResponse / FasterWhisperWord.
         words.append({
-            "word": word.word.strip(),
-            "startSeconds": float(word.start or 0.0),
-            "endSeconds": float(word.end or 0.0),
+            "text": word.word.strip(),
+            "startMs": int((word.start or 0.0) * 1000),
+            "endMs": int((word.end or 0.0) * 1000),
+            # Confidence is informational and ignored by the C# client today,
+            # but kept on the wire for future scoring + fc-align operators
+            # that want to surface low-confidence words.
             "confidence": float(getattr(word, "probability", 0.0) or 0.0),
         })
 
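A contract check like the one below could back the hello-world smoke test. It would catch a field-name or unit regression before the C# client silently reads zeros. contract_problems is hypothetical. It also assumes the word list sits under a top-level "words" key, which the diff does not show.

```python
import json

# Contract fields and the JSON types the C# client expects for them.
REQUIRED_FIELDS = {"text": str, "startMs": int, "endMs": int}


def contract_problems(body: str) -> list[str]:
    """List the ways words in an /align JSON body miss the text/startMs/endMs contract.

    Assumption: the words live under a top-level "words" key (hypothetical).
    """
    problems: list[str] = []
    for i, word in enumerate(json.loads(body).get("words", [])):
        for key, expected in REQUIRED_FIELDS.items():
            if key not in word:
                problems.append(f"words[{i}]: missing {key!r}")
            # bool is a subclass of int in Python, so reject it explicitly.
            elif isinstance(word[key], bool) or not isinstance(word[key], expected):
                problems.append(f"words[{i}]: {key!r} should be {expected.__name__}")
        start, end = word.get("startMs"), word.get("endMs")
        if isinstance(start, int) and isinstance(end, int) and end < start:
            problems.append(f"words[{i}]: endMs < startMs")
    return problems


v2_body = '{"words": [{"word": "hello", "startSeconds": 0.12, "endSeconds": 0.48}]}'
v3_body = '{"words": [{"text": "hello", "startMs": 120, "endMs": 480, "confidence": 0.97}]}'

print(contract_problems(v2_body))
# ["words[0]: missing 'text'", "words[0]: missing 'startMs'", "words[0]: missing 'endMs'"]
print(contract_problems(v3_body))  # []
```

The check also flags float milliseconds (for example `120.0`), which the v3 code avoids by emitting ints.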