deploy(worldbuilder): enable live gpu backend

This commit is contained in:
Andrew Stoltz
2026-06-11 16:05:40 -05:00
parent 7103658342
commit c1a43c64b3
2 changed files with 29 additions and 33 deletions

View File

@@ -12,28 +12,27 @@ Source: `D:\git\FlowerCore\FlowerCore.WorldBuilder` (master)
in pfSense Unbound before this manifest is applied, or cert-manager in pfSense Unbound before this manifest is applied, or cert-manager
HTTP-01 silently exponential-backs-off ~2h. HTTP-01 silently exponential-backs-off ~2h.
Memory: `feedback_pfsense_dns_required_for_acme`. Memory: `feedback_pfsense_dns_required_for_acme`.
2. **Image import to ALL RKE2 nodes** — pod can schedule to any of 2. **Image import to ALL Ready RKE2 nodes** — pod can currently schedule to
`rke2-server` (10.0.56.11), `rke2-agent1` (10.0.56.12), `rke2-server` (10.0.56.11) and `rke2-agent1` (10.0.56.12). Build with:
`rke2-agent2` (10.0.56.13). Build with:
```bash ```bash
bash deploy/build.sh # in FlowerCore.WorldBuilder repo bash deploy/build.sh # in FlowerCore.WorldBuilder repo
podman save localhost/fc-worldbuilder:v<TAG> -o /tmp/fc-worldbuilder-v<TAG>.tar mkdir -p artifacts/deploy
for h in 10.0.56.11 10.0.56.12 10.0.56.13; do podman save localhost/fc-worldbuilder:v<TAG> -o artifacts/deploy/fc-worldbuilder-v<TAG>.tar
scp /tmp/fc-worldbuilder-v<TAG>.tar fcadmin@$h:/tmp/ for h in 10.0.56.11 10.0.56.12; do
ssh fcadmin@$h "mkdir -p /home/fcadmin/.fcv"
scp artifacts/deploy/fc-worldbuilder-v<TAG>.tar fcadmin@$h:/home/fcadmin/.fcv/
ssh fcadmin@$h \ ssh fcadmin@$h \
"sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \ "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \
-n k8s.io images import /tmp/fc-worldbuilder-v<TAG>.tar" -n k8s.io images import /home/fcadmin/.fcv/fc-worldbuilder-v<TAG>.tar"
done done
``` ```
Memory: `feedback_rke2_image_import_per_node_scp`. Memory: `feedback_rke2_image_import_per_node_scp`.
3. **Bump image tag** in `worldbuilder.yaml` and git push. 3. **Bump image tag** in `worldbuilder.yaml` and git push.
ArgoCD ApplicationSet picks up within ~3 minutes. ArgoCD ApplicationSet picks up within ~3 minutes.
4. **First production render** — open 4. **First production render** — verify
`https://worldbuilder.iamworkin.lan/studio/c32e0000-0000-4000-8000-000000000004` `https://worldbuilder.iamworkin.lan/healthz`, open
and confirm the Cyberpunk Blue Jay demo prompt loads with five seeded fake `https://worldbuilder.iamworkin.lan/settings`, and confirm the image backend
generated images. This Sprint 32 visitor-safe profile uses reports ComfyUI before running an operator-owned render lane.
`ClientMode=fake`; switch the image-generation env vars back to ComfyUI only
for an operator-owned GPU render lane.
## Health probes ## Health probes
@@ -56,13 +55,8 @@ Source: `D:\git\FlowerCore\FlowerCore.WorldBuilder` (master)
## Image generation backend ## Image generation backend
Sprint 32 pins the Kubernetes profile to The live internal profile now uses
`FlowerCore:WorldBuilder:ImageGeneration:ClientMode=fake` with `FlowerCore:WorldBuilder:ImageGeneration:ClientMode=comfyui` with
`BaseUrl=http://127.0.0.1:1`. That keeps the public/internal visitor demo `BaseUrl=http://10.0.56.20:8188` on BLUEJAY-WS (R9700 / gfx1201 / ROCm 7.2).
deterministic, avoids GPU exposure, and still exercises the studio/gallery Keep the public host pre-staging disabled unless the five safe-to-expose gates
surface with persisted generated-image metadata. are rechecked; the live GPU lane is operator-owned and internal-only.
The previous ComfyUI backend target was `http://10.0.56.20:8188` on
BLUEJAY-WS (R9700 / gfx1201 / ROCm 7.2.1). Re-enable it only in an
operator-owned follow-up that also verifies workstation reachability and image
import freshness.

View File

@@ -5,10 +5,10 @@
# #
# Image build (BLUEJAY-WS): # Image build (BLUEJAY-WS):
# bash deploy/build.sh # in FlowerCore.WorldBuilder repo # bash deploy/build.sh # in FlowerCore.WorldBuilder repo
# podman save localhost/fc-worldbuilder:v<TAG> -o /tmp/fc-worldbuilder-v<TAG>.tar # podman save localhost/fc-worldbuilder:v<TAG> -o artifacts/deploy/fc-worldbuilder-v<TAG>.tar
# for h in 10.0.56.11 10.0.56.12 10.0.56.13; do # for h in 10.0.56.11 10.0.56.12; do
# scp /tmp/fc-worldbuilder-v<TAG>.tar fcadmin@$h:/tmp/ # scp artifacts/deploy/fc-worldbuilder-v<TAG>.tar fcadmin@$h:/home/fcadmin/.fcv/
# ssh fcadmin@$h "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-worldbuilder-v<TAG>.tar" # ssh fcadmin@$h "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /home/fcadmin/.fcv/fc-worldbuilder-v<TAG>.tar"
# done # done
--- ---
apiVersion: v1 apiVersion: v1
@@ -90,7 +90,7 @@ spec:
containers: containers:
- name: web - name: web
# Bump tag for each rebuild. Initial deploy: v202605062048 # Bump tag for each rebuild. Initial deploy: v202605062048
image: localhost/fc-worldbuilder:v202605062048 image: localhost/fc-worldbuilder:v20260611-b4a0025-gpu
imagePullPolicy: Never imagePullPolicy: Never
ports: ports:
- containerPort: 8080 - containerPort: 8080
@@ -117,14 +117,16 @@ spec:
value: "/data/gallery" value: "/data/gallery"
- name: FlowerCore__WorldBuilder__Export__RootPath - name: FlowerCore__WorldBuilder__Export__RootPath
value: "/data/exports" value: "/data/exports"
# Visitor-safe Sprint 32 profile: fake backend keeps public demo # Operator-approved live GPU lane. Internal-only host targets
# rendering deterministic and avoids exposing BLUEJAY-WS GPU. # BLUEJAY-WS ComfyUI; keep public host pre-staging disabled below.
- name: FlowerCore__WorldBuilder__ImageGeneration__BaseUrl - name: FlowerCore__WorldBuilder__ImageGeneration__BaseUrl
value: "http://127.0.0.1:1" value: "http://10.0.56.20:8188"
- name: FlowerCore__WorldBuilder__ImageGeneration__ClientMode - name: FlowerCore__WorldBuilder__ImageGeneration__ClientMode
value: "fake" value: "comfyui"
- name: FlowerCore__WorldBuilder__ImageGeneration__BackendId - name: FlowerCore__WorldBuilder__ImageGeneration__BackendId
value: "fake" value: "comfyui"
- name: FlowerCore__WorldBuilder__ImageGeneration__VisitorSafe
value: "false"
resources: resources:
# Cluster CPU-request budget runs hot (99% on all 3 nodes at deploy # Cluster CPU-request budget runs hot (99% on all 3 nodes at deploy
# time) while actual CPU usage is well below capacity. Idle Blazor # time) while actual CPU usage is well below capacity. Idle Blazor