gx10/tts: persist Piper /tts source + manifest (telephony TTS port baseline)

Dockerfile (linux/arm64, en_US-amy-medium baked), tts_service.py (16kHz/16-bit/mono
WAV, numpy resample 22050->16000), gx10-tts.yaml (CPU NodePort 30850, no GPU request),
README (build/import/cutover/verify on the GX10 cluster).
Andrew Stoltz
2026-06-14 14:14:59 -05:00
parent e4d1735d35
commit d03a92407d
4 changed files with 324 additions and 0 deletions

31
gx10/tts/Dockerfile Normal file

@@ -0,0 +1,31 @@
# GX10 Piper TTS — linux/arm64 (built natively on the GX10 / DGX Spark, aarch64).
# Serves the telephony /tts contract: POST {"text"} -> 16 kHz/16-bit/mono WAV.
# Voice baked into the image so there is no runtime HuggingFace dependency.
FROM python:3.12-slim
# espeak-ng is the phonemizer backend piper-tts uses at synthesis time.
RUN apt-get update \
    && apt-get install -y --no-install-recommends espeak-ng ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir piper-tts flask numpy
# Bake the voice model (en_US-amy-medium, 22.05 kHz native) into the image.
ARG PIPER_VOICE=en_US-amy-medium
ARG VOICE_BASE=https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium
# -f makes curl exit non-zero on HTTP errors, so an error page is never saved as the model.
RUN mkdir -p /voices \
    && curl -fsSL -o "/voices/${PIPER_VOICE}.onnx" "${VOICE_BASE}/${PIPER_VOICE}.onnx" \
    && curl -fsSL -o "/voices/${PIPER_VOICE}.onnx.json" "${VOICE_BASE}/${PIPER_VOICE}.onnx.json" \
    && test -s "/voices/${PIPER_VOICE}.onnx" \
    && test -s "/voices/${PIPER_VOICE}.onnx.json"
COPY tts_service.py /app/tts_service.py
WORKDIR /app
ENV TTS_PORT=8500 \
    PIPER_VOICE=en_US-amy-medium \
    VOICES_DIR=/voices \
    TARGET_RATE=16000
EXPOSE 8500
CMD ["python", "tts_service.py"]

59
gx10/tts/README.md Normal file

@@ -0,0 +1,59 @@
# GX10 Piper TTS — telephony `/tts` endpoint
CPU Piper TTS serving the telephony `/tts` contract on the **GX10 RKE2 cluster**
(ASUS Ascent GX10 / NVIDIA DGX Spark, ARM64, `10.0.56.14`). This is the
telephony-TTS-port-to-GX10 (P1) baseline: it keeps parity with the edge1 wire contract
at higher voice quality, carries zero GPU/aarch64 risk, and moves telephony off the
slow edge1 Pi 5.
## What it is
- `tts_service.py` — Flask app: `POST /tts {"text"}` → **16 kHz / 16-bit / mono WAV**
(canonical 44-byte header) + `GET /health`. Voice `en_US-amy-medium` (22.05 kHz
native) is numpy-resampled to 16 kHz so it drops straight onto Asterisk's
`.sln16` path (telephony strips the 44-byte header). Same wire contract as the
edge1 `speech-pipeline` `/tts`, just the TTS half (no STT/Wyoming).
- `Dockerfile` — `linux/arm64`, voice baked in (no runtime HuggingFace dep).
- `gx10-tts.yaml` — Namespace `tts` + Deployment (CPU-only, **no GPU request** so it
co-resides with the GPU-holding Ollama pod) + NodePort Service.
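
The wire contract above can be checked offline. The following is a minimal sketch, not part of this commit: `check_tts_wav` and `make_wav` are hypothetical helper names, and the check assumes the header is exactly the 44 bytes that the Python `wave` module writes for PCM data.

```python
import io
import wave

WAV_HEADER_BYTES = 44  # canonical PCM WAV header size named in the contract


def check_tts_wav(wav_bytes, rate=16000):
    """Return the raw PCM payload if wav_bytes is 16-bit mono at `rate`."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        fmt = (
            wav_file.getframerate(),
            wav_file.getsampwidth(),
            wav_file.getnchannels(),
        )
        frames = wav_file.readframes(wav_file.getnframes())
    if fmt != (rate, 2, 1):
        raise ValueError(f"unexpected WAV format (rate, width, channels): {fmt}")
    # Telephony strips a fixed 44 bytes, so the PCM data must start right there.
    if wav_bytes[WAV_HEADER_BYTES:] != frames:
        raise ValueError("header is not the canonical 44 bytes")
    return frames


def make_wav(pcm_bytes, rate=16000):
    """Build a WAV with the stdlib wave module, as tts_service.py does."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(pcm_bytes)
    return buf.getvalue()
```

Passing the body of a `POST /tts` response to `check_tts_wav` would confirm that what telephony writes to `.sln16` is pure 16 kHz signed-linear PCM.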
## This cluster is NOT under the old-cluster ArgoCD (yet)
Apply manually with the GX10's own kubectl:
```bash
ssh -J noc1 -i ~/.ssh/fcadmin_ed25519 bluejay@10.0.56.14
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
K=/var/lib/rancher/rke2/bin/kubectl
$K apply -f gx10-tts.yaml
```
## Build + import (native arm64 on the GX10)
```bash
docker build -t localhost/fc-gx10-tts:v20260614 .
docker save localhost/fc-gx10-tts:v20260614 -o /tmp/t.tar
sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/t.tar
# manifest uses imagePullPolicy: Never (image lives in containerd, no registry)
```
## Telephony cutover (reversible)
Endpoint telephony hits: **`http://10.0.56.14:30850`** (NodePort, MGMT VLAN 56).
In `apps/telephony/telephony.yaml`:
1. Deployment env `Tts__PiperUrl=http://10.0.56.14:30850`. **This is the real lever**:
env vars override `appsettings.Production.json`, so the configmap `Tts` block alone
is inert (it was shadowed by a drifted live env `Tts__PiperUrl=edge1`).
2. NetworkPolicy egress to `10.0.56.14/32:30850` (telephony-web is `hostNetwork`, so this
only matters for non-hostNetwork pods; harmless either way).
3. edge1 (`10.0.57.17:8500`) stays warm — **rollback = set `Tts__PiperUrl` back to it**.
The TTS circuit breaker + `MapTextToSound` canned-prompt fallback mean a bad endpoint
degrades gracefully, never to silence.
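
Before flipping `Tts__PiperUrl`, it may help to confirm that the GX10 endpoint actually has its voice loaded. `/health` answers `200` with `"status": "ok"` even when the model failed to load, so the `loaded` field is the one that matters. A minimal sketch follows; `health_problems` is a hypothetical helper, and the field names are the ones `tts_service.py` returns.

```python
def health_problems(health, expected_rate=16000):
    """Return reasons the endpoint is not ready; an empty list means ready.

    `health` is the decoded JSON body of GET /health.
    """
    problems = []
    if health.get("status") != "ok":
        problems.append(f"status is {health.get('status')!r}")
    if health.get("loaded") is not True:
        problems.append("Piper voice is not loaded; /tts would return 503")
    if health.get("target_rate") != expected_rate:
        problems.append(
            f"target_rate is {health.get('target_rate')!r}, expected {expected_rate}"
        )
    return problems
```

`native_rate` is deliberately not checked: the service only fills it in after the first successful synthesis, so it is `null` on a freshly started pod.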
## Verify (not a manual call)
```bash
FLOWERCORE_SIP_TEST_MODE=required dotnet.exe test \
FlowerCore.Telephony/tests/FlowerCore.Telephony.SipTests/FlowerCore.Telephony.SipTests.csproj \
--filter FullyQualifiedName~Call_Star100_ReceivesAudibleAudioStream
```
A passing audible test alone is NOT sufficient (edge1 also produces audible audio) —
confirm the **GX10 TTS pod's own access log** (`kubectl -n tts logs deploy/gx10-tts`)
shows `POST /tts 200` during the call, and telephony-web logs target `10.0.56.14:30850`.
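
The log check can be scripted. This is a sketch under one assumption: that the pod's access log uses the usual Werkzeug request-line shape (`"POST /tts HTTP/1.1" 200 -`). `count_tts_hits` is a hypothetical helper; adjust the pattern if the real log lines differ.

```python
import re

# Assumed access-log shape: ... "POST /tts HTTP/1.1" 200 -
TTS_OK = re.compile(r'"POST /tts HTTP/[\d.]+" 200\b')


def count_tts_hits(log_text):
    """Count successful POST /tts requests in captured pod log text."""
    return sum(1 for line in log_text.splitlines() if TTS_OK.search(line))
```

Feeding it the output of `kubectl -n tts logs deploy/gx10-tts` captured around the test call should give a count of at least one if telephony really reached the GX10 pod.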
## Voice upgrade (follow-on)
Operator's pick is **Kokoro**; it needs either GPU time-slicing (Ollama holds the GB10 GPU,
and MPS has been ruled out on GB10) or Kokoro-CPU behind a `/tts` shim. This Piper baseline
stays as the floor.

81
gx10/tts/gx10-tts.yaml Normal file

@@ -0,0 +1,81 @@
# GX10 Piper TTS — telephony /tts endpoint on the GX10 RKE2 cluster.
# Applied DIRECTLY via the GX10's own kubectl (KUBECONFIG=/etc/rancher/rke2/rke2.yaml);
# the GX10 cluster is NOT yet under the old-cluster ArgoCD. CPU-only (no GPU request)
# so it co-resides with the GPU-holding Ollama pod without contending for the GB10.
# Image is imported into RKE2 containerd (imagePullPolicy: Never).
# Telephony reaches it at http://10.0.56.14:30850 (NodePort, MGMT VLAN 56).
apiVersion: v1
kind: Namespace
metadata:
  name: tts
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gx10-tts
  namespace: tts
  labels:
    app: gx10-tts
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gx10-tts
  template:
    metadata:
      labels:
        app: gx10-tts
    spec:
      containers:
        - name: tts
          image: localhost/fc-gx10-tts:v20260614
          imagePullPolicy: Never
          ports:
            - containerPort: 8500
              name: http
          env:
            - name: TTS_PORT
              value: "8500"
            - name: PIPER_VOICE
              value: "en_US-amy-medium"
            - name: TARGET_RATE
              value: "16000"
          readinessProbe:
            httpGet:
              path: /health
              port: 8500
            initialDelaySeconds: 3
            periodSeconds: 5
            timeoutSeconds: 3
          livenessProbe:
            httpGet:
              path: /health
              port: 8500
            initialDelaySeconds: 10
            periodSeconds: 20
            timeoutSeconds: 5
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "4"
              memory: "2Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: gx10-tts
  namespace: tts
  labels:
    app: gx10-tts
spec:
  type: NodePort
  selector:
    app: gx10-tts
  ports:
    - name: http
      port: 8500
      targetPort: 8500
      nodePort: 30850
      protocol: TCP

153
gx10/tts/tts_service.py Normal file

@@ -0,0 +1,153 @@
#!/usr/bin/env python3
"""GX10 Piper TTS microservice — telephony /tts contract.
POST /tts {"text": "..."} -> 16 kHz / 16-bit / mono WAV (canonical 44-byte header)
GET /health -> JSON status
The telephony AsteriskProvider strips the 44-byte WAV header and writes the
remainder as a `.sln16` (signed-linear 16 kHz) file that Asterisk transcodes to
any codec. So the response MUST be 16 kHz / 16-bit / mono. The en_US-amy-medium
voice is 22.05 kHz native, so we resample to 16 kHz (a 22.05 kHz stream treated
as 16 kHz plays ~1.38x too slow and pitched down). This is a drop-in upgrade over
edge1's en_US-amy-low (16 kHz native, lower quality), keeping the exact wire contract.
"""
import io
import logging
import os
import sys
import threading
import wave

import numpy as np
from flask import Flask, Response, jsonify, request

API_PORT = int(os.environ.get("TTS_PORT", "8500"))
PIPER_VOICE = os.environ.get("PIPER_VOICE", "en_US-amy-medium")
VOICES_DIR = os.environ.get("VOICES_DIR", "/voices")
TARGET_RATE = int(os.environ.get("TARGET_RATE", "16000"))

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    stream=sys.stdout,
)
log = logging.getLogger("gx10-tts")

piper_voice_obj = None
piper_loaded = False
piper_lock = threading.Lock()
native_rate = None  # filled in after the first successful synthesis

app = Flask(__name__)


def load_piper():
    """Load the Piper voice model once at startup (shared, lock-guarded)."""
    global piper_voice_obj, piper_loaded
    try:
        from piper import PiperVoice

        model_path = os.path.join(VOICES_DIR, f"{PIPER_VOICE}.onnx")
        if not os.path.isfile(model_path):
            log.error("Piper voice model not found at %s — TTS disabled", model_path)
            piper_loaded = False
            return
        log.info("Loading Piper voice %s from %s", PIPER_VOICE, model_path)
        piper_voice_obj = PiperVoice.load(model_path)
        piper_loaded = True
        log.info("Piper voice loaded")
    except Exception as exc:  # noqa: BLE001 — fail-soft, /health reports it
        log.error("Failed to load Piper: %s", exc)
        piper_loaded = False


def synthesize_chunks(text):
    """Run Piper synthesis under a lock because the loaded voice is shared."""
    with piper_lock:
        return list(piper_voice_obj.synthesize(text))


def resample_i16(pcm_i16, src_rate, dst_rate):
    """Linear-interpolation resample of int16 PCM (matches edge1's STT resample)."""
    if src_rate == dst_rate or len(pcm_i16) == 0:
        return pcm_i16
    audio = pcm_i16.astype(np.float32)
    target_len = int(round(len(audio) * dst_rate / src_rate))
    if target_len <= 0:
        return np.zeros(0, dtype=np.int16)
    # Sample the source at evenly spaced fractional positions.
    idx = np.linspace(0, len(audio) - 1, target_len)
    res = np.interp(idx, np.arange(len(audio)), audio)
    return np.clip(np.round(res), -32768, 32767).astype(np.int16)


@app.route("/health", methods=["GET"])
def health():
    return jsonify({
        "status": "ok",
        "voice": PIPER_VOICE,
        "loaded": piper_loaded,
        "target_rate": TARGET_RATE,
        "native_rate": native_rate,
    })


@app.route("/tts", methods=["POST"])
def tts():
    """Text -> 16 kHz/16-bit/mono WAV. Mirrors the edge1 speech-pipeline contract."""
    global native_rate

    if not piper_loaded:
        return jsonify({"error": "Piper TTS model not loaded"}), 503

    data = request.get_json(silent=True)
    if not isinstance(data, dict) or "text" not in data:
        return jsonify({"error": "Missing required field: text"}), 400

    # Reject non-string values here; calling .strip() on them would raise.
    if not isinstance(data["text"], str):
        return jsonify({"error": "Field 'text' must be a string"}), 400
    text = data["text"].strip()
    if not text:
        return jsonify({"error": "Text field is empty"}), 400
    if len(text) > 10000:
        return jsonify({"error": "Text too long (max 10000 characters)"}), 400

    try:
        chunks = synthesize_chunks(text)
        if not chunks:
            return jsonify({"error": "No audio produced"}), 500

        first = chunks[0]
        native_rate = first.sample_rate
        if first.sample_width != 2 or first.sample_channels != 1:
            return jsonify({
                "error": f"Unexpected PCM format: width={first.sample_width} "
                         f"channels={first.sample_channels} (need 16-bit mono)"
            }), 500

        pcm = np.frombuffer(
            b"".join(c.audio_int16_bytes for c in chunks), dtype=np.int16
        )
        out = resample_i16(pcm, native_rate, TARGET_RATE)

        wav_buffer = io.BytesIO()
        with wave.open(wav_buffer, "wb") as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(TARGET_RATE)
            wav_file.writeframes(out.tobytes())
        wav_buffer.seek(0)

        return Response(
            wav_buffer.read(),
            mimetype="audio/wav",
            headers={"Content-Disposition": 'inline; filename="speech.wav"'},
        )
    except Exception as exc:  # noqa: BLE001
        log.error("TTS synthesis failed: %s", exc)
        return jsonify({"error": f"Synthesis failed: {exc}"}), 500


if __name__ == "__main__":
    log.info(
        "GX10 TTS starting on port %d (voice=%s -> %d Hz)",
        API_PORT, PIPER_VOICE, TARGET_RATE,
    )
    load_piper()
    app.run(host="0.0.0.0", port=API_PORT, threaded=True)
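
The arithmetic behind the resample step can be worked through on its own. This is an illustrative sketch, not part of the committed file: `resampled_length` mirrors the `target_len` expression in `resample_i16`, and `rate_mismatch` is a hypothetical helper giving the factor by which playback timing would be off if the native-rate stream were passed through unresampled.

```python
def resampled_length(n_samples, src_rate, dst_rate):
    """Output sample count, using the same rounding as resample_i16."""
    return int(round(n_samples * dst_rate / src_rate))


def rate_mismatch(native_rate, assumed_rate):
    """Ratio between the stream's true rate and the rate the player assumes."""
    return native_rate / assumed_rate


# One second of en_US-amy-medium audio (22050 samples) becomes 16000 samples.
one_second = resampled_length(22050, 22050, 16000)

# Without resampling, timing would be off by 22050 / 16000, about 1.38.
mismatch = round(rate_mismatch(22050, 16000), 2)
```

The duration is preserved because the sample count shrinks by the same ratio as the rate: 16000 samples at 16 kHz is still one second.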