Compare commits

74 Commits

Author SHA1 Message Date
Andrew Stoltz
46bbd00d09 Add step-ca agent issuer manifest 2026-05-19 17:52:58 -05:00
Andrew Stoltz
ca574c2280 brochure: delete apps/brochure/ — full prune per operator decision 2026-05-19
Removes the apps/brochure/ directory entirely from the bluejay-infra
ApplicationSet glob. ArgoCD will:

  1. See infra-brochure has no git source -> mark for delete
  2. Prune the brochure namespace + Deployment + Service + Certificate
     + Secret + IngressRoute (all generated from the now-gone
     apps/brochure/brochure.yaml)
  3. Remove the infra-brochure Application from argocd ns

Operator decision 2026-05-19 (follow-up to 09387f9 ARCHIVED banner
commit): "Yes, prune argo for brochure. Probably fully deleted there."

The brochure subdomain project was a planning-chain misinterpretation
of "make TtsReader + AI Station production-ready" — see
memory/project_brochure_split_misinterpretation_archived_2026_05_19.md
in FlowerCore.Notes for the full decision record.

Reusable artifacts that were the operator's archive concern stay alive
in their actual homes:

- FlowerCore.Intranet.Web PR #8 content-NuGet carve-out: still in
  Intranet's master, may transfer to TtsReader / AI Station prod work
- Sprint 32 Cl-5 substrate (public-twin design ideas): SUPERSEDED banner
  in-place in FlowerCore.Notes docs/standards/, history preserved
- magpie-doc-writer + wren-walkthrough skill output: unchanged in
  Intranet's flowercore-whats-new/walkthroughs/galleries directories

Companion Notes-side commit updates the "scaled to 0 + ARCHIVED banner"
language in mvp-readiness.html + fleet-roadmap-2026-05-19-sprint36-v2.md
+ memory record to reflect full deletion instead.

Wrong-codebase image localhost/fc-brochure-web:v20260524-sprint32 is
being removed from rke2-server / rke2-agent1 / rke2-agent2 in a
follow-up step (reclaims ~800MB per node).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:42:30 -05:00
Andrew Stoltz
09387f90e1 brochure: ARCHIVED 2026-05-19 — was a misinterpretation, do not re-enable
The brochure split project was a misinterpretation of an operator request
to make TtsReader + AI Station production-ready. Somewhere in the planning
chain it spun up into a separate "showcase brochure product" with its own
host, repo, NuGet, and Codex pack — none of which the operator actually
wanted. The project itself is pointless and a waste of credits.

Archive (not delete) per operator decision 2026-05-19, because some work
shipped under the misinterpretation may still have reusable value:

- FlowerCore.Intranet.Web PR #8 (merged) introduced FlowerCore.Brochure.Content
  content-NuGet carve-out — pattern may apply to TtsReader/AiStation production
  polish.
- Sprint 32 Cl-5 substrate has design ideas for public-twin vs operator-host
  separation that may transfer.
- magpie-doc-writer / wren-walkthrough skills still author useful Intranet
  content — those skills stay active.

These manifests stay at replicas: 0 for ArgoCD continuity. Cleanup options
(move out of apps/* glob, or delete entirely) are documented in README.md
for an operator-explicit future call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:34:28 -05:00
Andrew Stoltz
e641ceab48 monitoring(irc-notify): criticals also batch hourly — fix per-fire spam
The first batching pass (bacac06) left critical-severity alerts on the
immediate-print path. That is still per-event spam for any persistent
critical (e.g. PrintPaperRollCritical fires on every 30s Grafana evaluation
cycle while paper is <5%). Caught immediately after deploy: the CUPS queue
grew from 0 to 8 jobs in 8 minutes from a single firing PrintPaperRollCritical.

This commit aligns with the operator's verbatim ask ("one alert an hour"):

- Critical-severity alerts now go into the digest buffer, NOT the
  immediate-print path. The digest payload already shows severity tags
  per alertname, so the operator still sees "[critical] X" in the printout.
- The explicit `alert_channel=thermal_print_immediate` label still bypasses
  batching, but only on NEW fingerprint arrival — it triggers a flush of
  the CURRENT digest (with the new alert included), then clears. Repeat
  webhooks for the same fingerprint dedupe in the buffer until the next
  hourly tick OR until the alert resolves. No fingerprint can spam.
- `add_to_digest` now returns bool (True = buffer grew, False = dedup /
  resolution / disabled) so the immediate-label path can flush only on
  state transitions.

Net effect: max 1 thermal print per BATCH_INTERVAL_MIN per alert fingerprint,
regardless of severity. Rules that genuinely need same-second paper opt in
via `alert_channel=thermal_print_immediate` (currently zero rules use this).
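
A minimal sketch of the dedupe behaviour described above, not the actual notify.py code. The `DigestBuffer` class, its `seen` set, and the `handle` / `hourly_tick` helpers are hypothetical names introduced for illustration. The sketch assumes a fingerprint stays suppressed after an immediate flush until the hourly tick or a resolution, which is one way to get "no fingerprint can spam".

```python
from dataclasses import dataclass, field

IMMEDIATE_LABEL = "thermal_print_immediate"


@dataclass
class DigestBuffer:
    """Hypothetical in-memory digest buffer keyed by alert fingerprint."""

    enabled: bool = True
    pending: dict = field(default_factory=dict)  # fingerprint -> alert awaiting print
    seen: set = field(default_factory=set)       # fingerprints already printed this interval
    printed: list = field(default_factory=list)  # stand-in for the printer: one entry per digest

    def add_to_digest(self, alert: dict) -> bool:
        """Return True only when the buffer grew (a new fingerprint arrived)."""
        if not self.enabled:
            return False
        fingerprint = alert["fingerprint"]
        if alert.get("status") == "resolved":
            # A resolution drops the alert and never counts as growth.
            self.pending.pop(fingerprint, None)
            self.seen.discard(fingerprint)
            return False
        if fingerprint in self.pending or fingerprint in self.seen:
            return False  # repeat webhook for a known alert: dedupe
        self.pending[fingerprint] = alert
        return True

    def flush(self) -> None:
        """Print one digest for everything pending, then clear the pending set."""
        if self.pending:
            self.printed.append(sorted(self.pending))
            self.seen.update(self.pending)
            self.pending.clear()

    def hourly_tick(self) -> None:
        """Scheduled flush; afterwards still-firing alerts may print again."""
        self.flush()
        self.seen.clear()

    def handle(self, alert: dict) -> None:
        """Buffer the alert; flush right away only on a new immediate-label alert."""
        grew = self.add_to_digest(alert)
        channel = alert.get("labels", {}).get("alert_channel")
        if grew and channel == IMMEDIATE_LABEL:
            self.flush()
```

Because `handle` flushes only when `add_to_digest` reports growth, a rule that re-fires every evaluation cycle produces at most one print per interval in this sketch.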

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:22:25 -05:00
Andrew Stoltz
c263426ea5 fc-devicemgmt: operator image fix + Web scaled to 0
OPERATOR (PodCrashLoopBackOff cleared):
- Bumped image to v20260519-sp34cl3-fix (built from astoltz/FlowerCore.DeviceManagement@d9a3685
  after Sprint 34 Cl-3 stranded branch was merged via PR #19 squash).
- The v20260512-cx5 image was the broken Sprint 8 scaffold: generic Host
  builder, no kubeops, no Kestrel on :8080, no AddController chain. Readiness
  probe dial-tcp 8080 failed every restart.
- The new image ships the AddController chain for all 4 reconcilers
  (DeviceCrd / DeviceGroupCrd / DevicePolicyCrd / RemoteCommandCrd) plus
  Kestrel on :8080 and /healthz.
- Image saved + scp'd + ctr-imported on rke2-server / rke2-agent1 / rke2-agent2
  before this commit. SHA256: 2cc79ee0a2313c550268d1244f805ae41b396362148dd5603061cc15b6f7fa7e

WEB (DeploymentReplicasMismatch cleared via scale-to-0):
- Web pod cannot start. Two upstream gaps must close first:
  1) MySQL DB instance + user `fc_devicemgmt` / database `flowercore_devicemgmt`
     are not provisioned in fc-mysql. Cluster has zero MySqlInstanceCrds and
     no `mysql.fc-mysql.svc:3306` Service.
  2) 1Password vault item `IAmWorkin/FlowerCore DeviceManagement Runtime` is
     missing (5 fields: DB-Password + 4 mTLS PEMs). OnePasswordItem CRD has
     been stuck Ready=False since 2026-05-18T02:58.
- Same pattern as the brochure-web scale-to-0 in 914fed0 — make the cluster
  clean and quiet, let operator restart deploy on a real schedule.

Re-enable path is fully documented in the deployment-web.yaml header comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:11:09 -05:00
Andrew Stoltz
bacac067cf monitoring(irc-notify): hourly digest batching for thermal printer
The thermal printer drained overnight (2026-05-18/19) because the old
notify.py POSTed one print job per Grafana webhook fire. With 9
concurrently-firing alerts (zabbix-postgres + fc-devicemgmt + brochure
+ PrintPaperRollLow), every evaluation cycle stamped fresh CUPS jobs
onto the queue until the operator physically powered the printer off.

This refactor:

- Adds env-var config: THERMAL_PRINT_ENABLED (master kill switch),
  BATCH_INTERVAL_MIN (default 60), BATCH_MAX_PENDING (default 50).
- IRC delivery stays per-event (operator wants the live stream).
- Thermal routing now:
  * critical/disaster/page severity OR alert_channel=thermal_print_immediate
    -> print immediately
  * alert_channel=thermal_print -> enqueue into hourly digest
  * RESOLVED -> remove from digest buffer (no resolution-spam prints)
  * else -> IRC only, no thermal
- Background digest_loop thread flushes the buffer hourly (or sooner
  if buffer hits BATCH_MAX_PENDING). Digest payload is a single
  Print.Web /api/print/alert POST listing distinct alertnames + per-rule
  target counts.
- New POST /flush endpoint (manual operator force-flush; useful for
  testing without waiting an hour).
- GET / returns config + buffer depth + per-stat counters for observability.

Net effect: max 1 thermal print per BATCH_INTERVAL_MIN for batched
warnings, plus immediate prints for criticals. Closes the 2026-05-18/19
alert-storm incident.
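
The routing rules listed in this commit message could be sketched roughly as follows. This is an illustration, not the notify.py source: the function names, the returned action strings, and the alert dict shape are assumptions, and checking RESOLVED before severity is a guess at the ordering.

```python
IMMEDIATE_SEVERITIES = {"critical", "disaster", "page"}


def route_thermal(alert: dict, thermal_enabled: bool = True) -> str:
    """Classify one webhook alert into a thermal-printer action (hypothetical names)."""
    if not thermal_enabled:
        return "irc_only"  # THERMAL_PRINT_ENABLED acts as the master kill switch
    if alert.get("status") == "resolved":
        return "remove_from_digest"  # no resolution-spam prints
    labels = alert.get("labels", {})
    severity = labels.get("severity", "")
    channel = labels.get("alert_channel", "")
    if severity in IMMEDIATE_SEVERITIES or channel == "thermal_print_immediate":
        return "print_immediately"
    if channel == "thermal_print":
        return "enqueue_digest"
    return "irc_only"


def should_flush(pending: int, minutes_since_flush: float,
                 interval_min: int = 60, max_pending: int = 50) -> bool:
    """Flush on the BATCH_INTERVAL_MIN tick, or sooner at BATCH_MAX_PENDING."""
    if pending == 0:
        return False  # nothing to print
    return pending >= max_pending or minutes_since_flush >= interval_min
```

IRC delivery is not modelled here because it stays per-event regardless of the thermal decision.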

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:56:14 -05:00
914fed08d8 fix(brochure): scale brochure-web to 0 — wrong codebase shipped (Intranet.Web binary in fc-brochure-web image, CrashLoopBackOff 296 restarts on /data read-only). Re-enable after Sprint 34 Cx-3 rebuild per docs/ai-agents/codex-prompts/2026-05-18-fc-brochure-web-rebuild-pack.md 2026-05-19 14:45:01 +00:00
Andrew Stoltz
200aeab032 ttsreader: deploy study mode repair image 2026-05-18 16:33:08 -05:00
Andrew Stoltz
8182616d4c ttsreader: point render piper to edge1 demo endpoint 2026-05-18 16:06:37 -05:00
Andrew Stoltz
f0862ac03c ttsreader: deploy sprint36 demo audio image 2026-05-18 16:04:59 -05:00
Andrew Stoltz
46c392605e monitoring: mirror PuppetServiceFailed alert from Notes (Sprint 33 Cx-7 Phase B)
Mirrors the live `puppet` alert group from
FlowerCore.Notes/scripts/monitoring/alerts.yml into the K8s ConfigMap so a
future in-cluster Prometheus inherits the ruleset automatically.

Source of truth remains the Notes file (live Podman Prometheus on noc1).
See feedback_monitoring_k8s_target_vs_live_podman.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 11:11:07 -05:00
89b147bbdd docs(openvox): document quadlet durability smoke (#12) 2026-05-18 04:53:02 +00:00
d7238a5e3b feat(brochure): add public brochure GitOps app (#13) 2026-05-18 04:52:37 +00:00
fc444a02a1 feat(chat): add public twin ingress (#11) 2026-05-18 04:52:20 +00:00
83d4883d55 feat(worldbuilder): pin k8s demo to fake backend (#10) 2026-05-18 04:52:11 +00:00
f8fe3b2688 feat(github-runner): add final long-tail runners (#9) 2026-05-18 04:52:01 +00:00
f2ab892ebc feat(github-runner): add Marquee + TtsReader per-repo runners (#8) 2026-05-18 03:27:14 +00:00
fef68a9560 feat(fc-devicemgmt): add Kubernetes deployment manifests (#1)
Sprint 8 IMPL lane Cx-5: fc-devicemgmt K8s manifests (rebased onto main 2026-05-18; 13 files, +944).

Namespace + Web Deployment (replicas:2, MySQL backend) + Operator Deployment (replicas:1, KubeOps leader-elect) + Service + Certificate (step-ca-acme ClusterIssuer) + Traefik IngressRoute (devices.iamworkin.lan internal) + ServiceAccount + ClusterRole + ClusterRoleBinding + NetworkPolicy (CNI DNAT-aware backend ports) + OnePasswordItem (5-field consolidated) + ArgoCD Application bootstrap shape + lint coverage.

Follow-ups (not merge blockers):
- localhost/fc-devicemgmt-{web,operator}:v20260512-cx5 must be imported to all 3 RKE2 nodes; pods will ErrImageNeverPull until imported.
- 1Password vault item 'FlowerCore DeviceManagement Runtime' must be created with 5 fields before pods can start.
- DNS devices.iamworkin.lan -> 10.0.56.200 already present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:56:23 +00:00
Andrew Stoltz
6fe77225ae fix(github-runner): dedupe DOTNET_INSTALL_DIR+NUGET_PACKAGES on base+sharedpos
PR #5 rebase concatenated PR #5 env additions onto PR #7 env additions on
the base + sharedpos Deployments, producing duplicate-key validation
errors in ArgoCD's structured merge. The DOTNET_INSTALL_DIR and
NUGET_PACKAGES values are identical between PR #5 and PR #7; keep the
PR #7 originals and retain only the unique new env vars from PR #5
(DOTNET_CLI_TELEMETRY_OPTOUT, DOTNET_NOLOGO, DOTNET_GENERATE_ASPNET_CERTIFICATE).

No behavioral change — same final env var set, no duplicates.
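
The dedupe rule above (first occurrence wins, later duplicates dropped) can be sketched like this. The env values are placeholders rather than the real manifest values, and `dedupe_env` is a hypothetical helper, not part of the repo.

```python
def dedupe_env(env: list) -> list:
    """Keep the first entry for each env var name, preserving order."""
    seen = set()
    result = []
    for entry in env:
        if entry["name"] in seen:
            continue  # duplicate key that would fail the structured merge
        seen.add(entry["name"])
        result.append(entry)
    return result


# Placeholder values: PR #7 entries come first, PR #5 additions were appended.
concatenated = [
    {"name": "DOTNET_INSTALL_DIR", "value": "<pr7-value>"},
    {"name": "NUGET_PACKAGES", "value": "<pr7-value>"},
    {"name": "DOTNET_INSTALL_DIR", "value": "<pr7-value>"},
    {"name": "NUGET_PACKAGES", "value": "<pr7-value>"},
    {"name": "DOTNET_CLI_TELEMETRY_OPTOUT", "value": "<pr5-value>"},
    {"name": "DOTNET_NOLOGO", "value": "<pr5-value>"},
    {"name": "DOTNET_GENERATE_ASPNET_CERTIFICATE", "value": "<pr5-value>"},
]

deduped_names = [entry["name"] for entry in dedupe_env(concatenated)]
```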
2026-05-17 21:53:05 -05:00
634b9c4169 feat(github-runner): harden Linux runner fleet (#5) 2026-05-18 02:51:02 +00:00
b8c7e59005 Tighten Pi signage HDMI settle coverage (#3) 2026-05-18 02:35:17 +00:00
65ac8d6f01 feat(github-runner): pod-env DOTNET_INSTALL_DIR + initContainer for non-root runner (#7) 2026-05-18 02:25:18 +00:00
35844e0dbd chore(github-runner): un-park github-runner-sharedpos (Shared.Pos non-root build fixed) (#6) 2026-05-18 02:20:00 +00:00
b1e307151e chore(github-runner): un-park github-runner-sharedpos (replicas 1) after Shared.Pos CI fix merged 2026-05-17 21:54:16 +00:00
12b07219c7 chore(github-runner): park github-runner-sharedpos (replicas 0) until Cx-1 non-root fix
Shared.Pos build fails on non-root runner (setup-dotnet /usr/share/dotnet denied); churning runner drove HighCPU on rke2-agent2. Re-enable in the Sprint 30+ Cx-1 Linux-runner-fleet lane (DOTNET_INSTALL_DIR on pod env).
2026-05-17 21:50:35 +00:00
9fd32c4415 feat(monitoring): MacMiniRunnerOffline alert (Sprint 28 reconcile) 2026-05-17 19:50:29 +00:00
ad670fb344 feat(github-runner): add Shared.Pos repo-scoped Linux runner (unstick stuck publish) 2026-05-17 19:50:23 +00:00
Codex
6f6ca50987 fix(github-runner): switch RUNNER_TOKEN -> ACCESS_TOKEN; set RUN_AS_ROOT=false
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:08:56 +00:00
Codex
c7be58c1f7 chore(github-runner): bump replicas 0 -> 1 (PAT provisioned)
Operator provisioned GitHub PAT (Runner Registration) 1P item. OnePasswordItem CRD already materialized the secret. Unblocks CI fleet-wide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:04:03 +00:00
Codex
a1f5a393cd chore(github-runner): rename 1P item to GitHub PAT (Runner Registration)
Renames the OnePasswordItem.itemPath from "GitHub Runner Registration
Token" to "GitHub PAT (Runner Registration)" so the runner 1P entry
sits next to its siblings — GitHub PAT (Gitea Mirrors) and GitHub PAT
(NuGet Packages) — under a consistent "GitHub PAT (...)" naming pattern
and API_CREDENTIAL category.

The existing "credential" field is still the one consumed (RUNNER_TOKEN env).
Comment block clarified to require Administration:read/write fine-grained
PAT scope on target repos.

Old 1P item renamed to "[DEPRECATED 2026-05-16] GitHub Runner
Registration" — kept as recovery backup; can be hard-deleted after the
first successful runner pod start against the new item path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:01:41 +00:00
Codex
710340d8be chore(github-runner): rename 1P item to GitHub PAT (Runner Registration)
Renames the OnePasswordItem.itemPath from "GitHub Runner Registration
Token" to "GitHub PAT (Runner Registration)" so the runner 1P entry
sits next to its siblings — GitHub PAT (Gitea Mirrors) and GitHub PAT
(NuGet Packages) — under a consistent "GitHub PAT (...)" naming pattern
and API_CREDENTIAL category.

The existing "credential" field is still the one consumed (RUNNER_TOKEN env).
Comment block clarified to require Administration:read/write fine-grained
PAT scope on target repos.

Old 1P item renamed to "[DEPRECATED 2026-05-16] GitHub Runner
Registration" — kept as recovery backup; can be hard-deleted after the
first successful runner pod start against the new item path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 10:27:58 -05:00
Andrew Stoltz
7d2daaa4f8 chore(github-runner): replicas 1 → 0 until 1Password token provisioned
github-runner-token OnePasswordItem exists but the underlying 1Password
vault item hasn't been created yet, so the operator can't mint the K8s
Secret. Pod stuck in CreateContainerConfigError → DeploymentReplicasMismatch
alert fires.

Scaling to 0 keeps the manifest infrastructure intact but stops trying
to schedule until operator:
1. Creates "GitHub Runner Registration Token" item in IAmWorkin vault
2. Generates a token at github.com/astoltz/<repo>/settings/actions/runners/new
3. Updates the OnePasswordItem itemPath to point at it
4. Bumps replicas back to 1 via PR
2026-05-15 16:18:19 -05:00
Andrew Stoltz
e50e103ba0 fix(zabbix): bump web probe timeouts 5s→15s + add failureThreshold
zabbix-web nginx+PHP-FPM container serves / at ~3-5s baseline with
occasional 6-7s spikes (probe path renders full dashboard via PHP).
kube-probe was killing the container after 3 consecutive 5s-timeout
499s, producing CrashLoopBackOff alert noise even though the app
was serving real traffic fine.

15s timeout absorbs the natural variance; explicit failureThreshold=3
documents the policy (was implicit default).

Closes the firing PodCrashLoopBackOff (zabbix-web) + pending
HTTPServiceSlow/HTTPServiceDegraded alerts. zabbix.iamworkin.lan
remains slow at the application layer (separate work — PHP-FPM
warm-up + Zabbix server "host not found" agent lookup spam need
their own fixes) but the pod restart loop stops.
2026-05-15 15:59:04 -05:00
Codex
e8094eb0bd ci(github-runner): add Phase 2 ephemeral Linux runner K8s manifest
Namespace github-runner with myoung34/github-runner:latest Deployment,
5Gi Longhorn RWO NuGet cache PVC, zero-privilege ServiceAccount, and
OnePasswordItem CRD for the registration token. EPHEMERAL=true mode
re-registers after each job; Recreate strategy avoids RWO multi-attach.
Targets fc-build-linux label; single replica pinned to rke2-server node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 12:46:25 -05:00
8d87d9172c Add Pi signage Phase 1 player artifacts
Squash merge Sprint 14 Pi signage player artifacts.
2026-05-14 01:46:09 +00:00
Codex
cfd9743afa Add Apple TV signage docs manifest 2026-05-13 20:32:48 -05:00
Andrew Stoltz
5029e209cd kubevirt-vms: boot ci1 from server template 2026-05-12 16:58:18 -05:00
Codex
f298339152 fix(guacamole): add --- separator between macmini-vnc-creds OnePasswordItem and guacamole-branding ConfigMap
Missing document separator caused YAML to merge the OnePasswordItem's
top-level `spec: itemPath:` block into the ConfigMap that follows.
Result: a ConfigMap carrying a `.spec` field that the ConfigMap schema
does not declare, which has made ArgoCD's structured-merge diff fail
since 2026-05-11T15:30:54Z:

  Failed to compare desired state to live state: failed to calculate
  diff: error calculating structured merge diff: error building typed
  value from config resource: .spec: field not declared in schema

App stayed Healthy (live K8s tolerated the extra field — ConfigMap
ignored it) but ArgoCD's diff calc was broken, leaving the app stuck at
sync=Unknown for all 21 resources. Adding the missing `---` separator
makes the OnePasswordItem and ConfigMap proper sibling YAML documents,
each with its own kind-correct schema.
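
The merge effect can be illustrated with plain dicts. This is a sketch under assumptions: it models a parser that lets the later duplicate key win when two documents collapse into one mapping, and the `itemPath` and `data` values are placeholders. It makes no claim about which parser ArgoCD uses.

```python
# Placeholder manifests; only the key layout matters for the illustration.
onepassword_item = {
    "apiVersion": "onepassword.com/v1",
    "kind": "OnePasswordItem",
    "metadata": {"name": "macmini-vnc-creds"},
    "spec": {"itemPath": "<placeholder>"},
}
config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "guacamole-branding"},
    "data": {"<placeholder-key>": "<placeholder-value>"},
}


def merge_without_separator(first: dict, second: dict) -> dict:
    """Model one mapping where later duplicate keys overwrite earlier ones."""
    merged = dict(first)
    merged.update(second)
    return merged


def split_with_separator(first: dict, second: dict) -> list:
    """Model two sibling documents, as produced once `---` is present."""
    return [dict(first), dict(second)]


merged = merge_without_separator(onepassword_item, config_map)
documents = split_with_separator(onepassword_item, config_map)
```

In the merged mapping `kind` ends up as ConfigMap while the stray `spec` survives, which is the shape the schema check rejects.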

Diagnosed during 2026-05-12 morning routine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 09:26:03 -05:00
Codex
6e7d88db49 feat(fc-redis): add SignalR backplane for cross-product event bus (Q-SO-1 Phase A)
Per Q-SO-1 operator resolution 2026-05-11 PM, Redis SignalR backplane lands
in Phase A (was Phase C deferral). Treats Redis as a managed FC infrastructure
component, not a deferred scaling escalation.

Lands the minimal Phase A surface:
- Namespace fc-redis
- Single Redis 7-alpine pod with 1Gi Longhorn RWO PVC
- ConfigMap with AOF persistence (everysec), 256Mi maxmemory, allkeys-lru
- ClusterIP Service `redis.fc-redis.svc.cluster.local:6379` (in-cluster only)
- No AUTH in Phase A (Phase B adds it via 1Password Connect rotation)
- No IngressRoute (backplane is server-to-server)

Consumers (Phase A IMPL across FC services) add:
  services.AddSignalR().AddStackExchangeRedis(
      "redis.fc-redis.svc.cluster.local:6379",
      opts => opts.Configuration.ChannelPrefix =
          StackExchange.Redis.RedisChannel.Literal("fc-opsconsole"));

Phase B/C follow-ons (not in this commit): Sentinel for HA, AUTH password
from 1Password, redis_exporter sidecar for Prometheus, network policies.

See FlowerCore.Notes/docs/signage/operations-console-phase-2-design.md
section 3.5 (rewritten) and decisions-waiting.html Q-SO-1 (RESOLVED).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:02:58 -05:00
Codex
5ae50bd491 fix(telephony): init container runs as root to chown hostPath /tmp/tts-audio
The fix-data-perms init container chowns /data (PVC) and /shared-tts
(hostPath /tmp/tts-audio on rke2-agent1) to uid 1654 so the non-root
telephony-web app can write Piper TTS .sln16 files.

Without an explicit container-level securityContext override, the init
container inherits pod-level runAsNonRoot:true / runAsUser:1654 and
fails with 'chown: /shared-tts: Operation not permitted' the first
time the hostPath comes up root-owned after a node reboot.

Outage 2026-05-11 23:00 UTC: telephony-web in Init:CrashLoopBackOff for
9 hours (100+ restarts) until init container was bumped to runAsUser:0.
Live cluster patched in the same operation; this commit makes the fix
durable in git so ArgoCD sync preserves it.

See Notes memory: feedback_hostpath_initcontainer_chown_perms

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 18:37:15 -05:00
Codex
653d4472f5 fix(monitoring): mirror Q-MR-3 MultusMemoryPressure + NamespacePendingPodBacklog alerts
Two new preventive alert rules added to the kubernetes-state group of the
K8s migration target ConfigMap. The live Podman Prometheus on noc1 has
already been updated via FlowerCore.Notes/scripts/monitoring/alerts.yml +
sudo cp + podman pod restart monitoring (this commit only locks it in
the bluejay-infra K8s mirror so a future migration carries it forward).

MultusMemoryPressure (critical, thermal_print): fires when kube-multus
working set exceeds 80% of its memory limit for 5m. Catches the next
multus OOM cascade BEFORE it kills the daemon cluster-wide. The 2026-05-10
21h outage hit because no alert fired on the rising multus working set;
only downstream blackbox / Traefik / service alerts triggered, after the
fact.

NamespacePendingPodBacklog (warning): fires when any single namespace has
>25 Pending pods sustained for 30m. Catches the operator-leak avalanche
pattern (orphan pods from a crashed reconciler emitting children without
ownerReferences) before it cascades into a CNI OOM.
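
Both rules share the shape "value above a threshold, sustained for a hold period", roughly what a Prometheus `for:` clause expresses. A minimal sketch of that check follows. The `sustained` helper and the sample data are hypothetical and are not the alerts.yml expressions.

```python
def sustained(samples: list, threshold: float, hold_seconds: int) -> bool:
    """True if the value has stayed above threshold for hold_seconds up to the latest sample.

    samples is a list of (timestamp_seconds, value) pairs in ascending time order.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # start of the current breach
        else:
            breach_start = None  # any dip resets the hold timer
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= hold_seconds


# MultusMemoryPressure shape: working set / memory limit above 0.80 for 5m.
multus_ratio = [(0, 0.70), (60, 0.85), (120, 0.90), (300, 0.88), (360, 0.91)]
multus_firing = sustained(multus_ratio, 0.80, 5 * 60)

# NamespacePendingPodBacklog shape: more than 25 Pending pods for 30m.
pending_pods = [(0, 30), (900, 40), (1800, 219)]
backlog_firing = sustained(pending_pods, 25, 30 * 60)
```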

See FlowerCore.Notes:
  - feedback_multus_50mi_limit_oom_orphan_pod_avalanche
  - feedback_monitoring_k8s_target_vs_live_podman (workflow)

Companion commits:
  - bluejay-infra@eb8693e (multus memory limit)
  - FlowerCore.RemoteDesktop@b02c59b (OwnerReferences fix)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 10:42:27 -05:00
Codex
eb8693e1ce fix(multus): bump kube-multus-ds memory 50Mi/50Mi -> 1Gi/512Mi (prevent OOM cascade)
Cluster outage 2026-05-10T17:43 through 2026-05-11 ~10:30 (~21h). Root cause:
FlowerCore.RemoteDesktop emitted 219 orphan rd-browser-only-* pods in fc-desktop
(missing OwnerReferences — see companion fix in FlowerCore.RemoteDesktop).
Kubelet's continuous CNI ADD retries for those pending pods drove a request
queue that exceeded the upstream default 50Mi limit on kube-multus-ds. Multus
OOMKilled (exit 137), restarted with an even bigger backlog, OOMKilled again,
positive feedback loop. Restart counts climbed to 276 / 412 / 261 across the
3 RKE2 nodes.

Downstream blast radius: both Traefik pods stuck ContainerCreating (101m +
4h35m), all Longhorn CSI attacher/provisioner/instance-manager stuck, every
Prometheus blackbox probe for *.iamworkin.lan failing, UpdateCenterPublicEdgeDown
critical on update.flowercore.io, every ArgoCD app showing sync=Unknown
because repo-server lost git connectivity. 45 firing Prometheus alerts.

Recovery sequence (Q-MR-1 from FlowerCore.Notes morning routine):
1. kubectl patch kube-multus-ds memory live (this commit locks it in git so
   ArgoCD doesn't revert on next sync)
2. Force-delete the 219 orphan pods (kubectl --grace-period=0 --force) to
   break the avalanche
3. Rollout restart kube-multus-ds — STABLE after restart with new limit
4. Restart Traefik + Longhorn CSI to clear stuck ContainerCreating
5. Verify update.flowercore.io returns 200 + ArgoCD apps reconcile

Tested incrementally: 256Mi limit was insufficient (still OOMed on catchup
burst), 512Mi was insufficient on rke2-agent1 (most pods concentrated there),
1Gi/512Mi handled the full 200+ pending pod CNI catchup cleanly with 0 multus
restarts after rollout. Nodes are 64GB with <25% used in steady-state, so the
~256Mi typical working-set is well within the new limit.

Companion change: FlowerCore.RemoteDesktop must set OwnerReferences on every
worker pod so future operator crashes don't leak orphans (Q-MR-2). Preventive
alerts (Q-MR-3) MultusMemoryPressure + NamespacePendingPodBacklog are coming
in a follow-up commit to apps/monitoring/.

Memory: feedback_multus_50mi_limit_oom_orphan_pod_avalanche
Decisions card: docs/dashboards/decisions-waiting.html Q-MR-1..3

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 10:30:05 -05:00
Codex
667777a653 revert(ci1): back to cdrom:scsi (virtio-blk disk hit QEMU flock)
The virtio-blk disk swap (commit 84c9feb) didn't help: qemu fails to
acquire the write lock on the rootdisk PVC because the previous
launcher's qemu process didn't release it cleanly. Same family of
bug as the "stale QEMU flock" already documented in
feedback_kubevirt_iso_first_install_bootorder_and_runstrategy, but
now triggered on rke2-agent1 instead of agent2.

OVMF cdrom timeout is the real blocker and remains open:
  - Distribution pipeline (build → save → scp → ctr import on all
    3 RKE2 nodes) is proven. localhost/win-server-2025:1.0 lives in
    each node's containerd k8s.io namespace.
  - containerDisk + cdrom:scsi gets qemu domain Running (no NFS
    Permission denied, no rootdisk flock).
  - OVMF BdsDxe times out reading the SCSI cdrom regardless of
    SecureBoot setting and bus type.

Reverting the disk type to cdrom:scsi so the VM lands back on the
"qemu Running, OVMF stuck at Boot Manager" state — known-stable and
easier to attack than the QEMU-flock state we hit by trying
virtio-blk disk.

Operator decision for next architectural step (one of):
  - Custom OVMF firmware build with longer Boot0001 timeout
  - KubeVirt version bump (v1.5+ has OVMF fixes)
  - Hyper-V/VirtualBox install + export VHD to ci1
  - BIOS legacy boot (Win Server 2025 needs UEFI but install media
    has a BIOS path)
  - DataVolume HTTP datasource (CDI internalizes ISO bytes via
    different code path)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:35:00 -05:00
Codex
84c9feb893 fix(ci1): present ISO as virtio-blk disk instead of cdrom
OVMF BdsDxe "starting Boot0001 ... Time out" persists across:
  - SATA cdrom + Longhorn Filesystem PVC (Path A)
  - SATA cdrom + Synology NFS (Path B failed: storage perms)
  - SCSI cdrom + Longhorn (Path B variant)
  - SCSI cdrom + containerDisk tmpfs (Path C)
  - + SecureBoot=false

That rules out: storage IO speed, cdrom bus type, signature
verification. Remaining cause is deeper in qemu's cdrom device
emulation under KubeVirt v1.4.0's OVMF firmware — the cdrom read
window for OVMF's first-sector probe is too short to satisfy from
the cdrom controller path regardless of bus type.

Workaround: present the ISO bytes as a regular virtio-blk DISK
(not a cdrom). UEFI/OVMF still recognizes ISO9660 + El Torito
boot records on any block device, so it can find and boot the
EFI bootloader the same way it would from a USB stick. virtio-blk
has a different read path that doesn't hit the cdrom-specific
timeout.

This also better aligns with the FlowerCore.Distribution USB-key
pattern: ISO bytes on a block device, UEFI boots from the El
Torito boot record, Windows installer takes over. The autounattend
ConfigMap (ci1-autounattend) drives unattended Windows setup once
the installer kicks off.

The containerDisk OCI image (localhost/win-server-2025:1.0)
remains unchanged — only the disk type in the VM spec changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:29:59 -05:00
Codex
427dbfcef2 [uc] Phase 1 auth gate deploy v20260509-4162dca-authgate 2026-05-08 21:16:54 -05:00
Codex
b651a4e2d0 fix(ci1): disable SecureBoot to allow OVMF to boot Windows ISO
containerDisk delivery (commit b998f50) successfully gave qemu fast
in-memory access to the ISO bytes (no NFS denial, no Longhorn read
latency), but OVMF's BdsDxe still timed out:

  BdsDxe: loading Boot0001 "UEFI QEMU QEMU CD-ROM " from
    PciRoot(0x0)/Pci(0x2,0x4)/Pci(0x0,0x0)/Scsi(0x0,0x0)
  BdsDxe: starting Boot0001 ... Time out

That rules out storage IO speed and bus type as causes (already
tested both sata and scsi against both Longhorn-PVC and tmpfs-backed
containerDisk). Remaining likely cause: SecureBoot signature
verification on the ISO's EFI bootloader. KubeVirt's stock
`/usr/share/OVMF/OVMF_VARS.secboot.fd` doesn't appear to ship with
the Microsoft KEK/DB enrolled by default, so signed Windows EFI
bootloaders fail the trust-chain check and OVMF reports a generic
"Time out" rather than a verification failure.

Disabling SecureBoot lets OVMF skip the chain check entirely and
boot the El Torito EFI image. SMM stays enabled (KubeVirt only
requires it WITH SecureBoot, not the inverse). TPM 2.0 emulation
also stays on (`tpm: {}`), so BitLocker, Hyper-V, and WSL2 still
work in the guest.

This is acceptable for a CI runner. Long-term path back to
SecureBoot:
  1. Custom-build OVMF_VARS.fd with Microsoft KEK/DB pre-enrolled
  2. Mount via firmware.bootloader.efi.persistent
  3. secureBoot: true

Tracked as a Phase 2 hardening task once the runner is doing real
work and we want signed-boot guarantees.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:06:18 -05:00
Codex
b998f50f48 fix(ci1): switch ISO delivery to containerDisk OCI image (Path C)
OCI image: localhost/win-server-2025:1.0 (8.27 GB)
Built FROM scratch + ADD disk.img → /disk/disk.img on noc1, podman
saved as tar (8.27 GB), SCP'd in parallel to all 3 RKE2 nodes,
imported via ctr in k8s.io namespace. Verified present on all 3
schedulable nodes (rke2-server, rke2-agent1, rke2-agent2).

Why containerDisk over the prior PVC paths:
  - Path A (Longhorn Filesystem PVC, sata): OVMF BdsDxe SATA-CDROM
    read timeout. Cdrom-backed PVC is too slow for OVMF's first-sector
    read window.
  - Path B (Synology NFS): uid 107 (qemu) denied at directory level by
    Synology export ACL despite file mode 0777. Memory:
    feedback_synology_iso_export_root_only_uid_107_denied.
  - Path B+SCSI: same OVMF timeout, just on SCSI controller. Bus
    choice was not load-bearing — the issue was always the slow PVC
    backing.
  - Path C (this commit): containerDisk delivers the ISO bytes from
    a tmpfs view of the OCI layer, no PVC controller in the read path.
    qemu reads at native FS speed; OVMF first-sector read completes
    well within timeout. This is also the KubeVirt-recommended pattern
    for installer ISOs.

Connects to FlowerCore.Distribution / Provisioning USB story: same
"OCI image of the OS installer + autounattend on a sysprep CDROM"
pattern that the USB provisioning agent will use. The Windows
install proceeds hands-off via the existing autounattend.xml in
ci1-autounattend ConfigMap (RDP enabled, WinRM, UAC disabled,
Administrator password from 1Password vault item
h3ix4mgfk65gmkcmvh6ly3d3hu).

Image lifecycle: bump tag (1.1, 1.2, ...) when ISO version changes,
rebuild on noc1, redistribute to RKE2 nodes, update image: line.

Legacy NFS PVC + PV manifest and CDI Longhorn PVC RETAINED for this
commit so prior states are recoverable. Will prune in follow-up
once containerDisk boot proves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:45:38 -05:00
Codex
8fd9ae1cd3 fix(ci1): revert NFS Path B + flip ISO cdrom bus sata→scsi
NFS Path B (commit fc2aca0) failed at storage layer: Synology export
`/volume1/ISOs` denies non-root client UIDs at the directory level.
qemu uid 107 cannot `ls /iso/` even though disk.img is mode 0777.

Diagnosed via uid-107 + uid-0 busybox probe pods on rke2-agent2:
- libvirt error: "Cannot access storage file ... Permission denied"
  (virStorageSourceReportBrokenChain:1281, virError Code=38 Domain=18)
- uid 107 pod: "ls: can't open '/iso/': Permission denied"
- uid 0 pod (same mount): "drwxrwxrwx 1 root root 16 ... disk.img"
- SELinux Enforcing + virt_use_nfs=on, no AVC denials → not SELinux
- File mode 0777 with owner 107:107 → not POSIX

Same export-only-root pattern as `/volume1/kubernetes`. Memory:
feedback_synology_iso_export_root_only_uid_107_denied.md

Existing CDI-uploaded Longhorn PVC `windows-server-2025-iso` (10Gi
Filesystem mode) verified to contain valid ISO bytes readable by
uid 107 (mode 0660 root:107, 9.85 GB sparse, 8.27 GB blocks ≈
original 7.7 GiB ISO). Reverting to it.

The original OVMF SATA-CDROM read timeout that drove yesterday's
NFS pivot is now addressed by `cdrom: bus: scsi` (virtio-scsi has
a longer read window than the IDE/SATA emulator). Per user-prompt
diagnostic chain Step 5.

NFS PVC + PV (apps/kubevirt-vms/win2025-iso-nfs-pv.yaml) RETAINED
so Path B state is recoverable; can be pruned in follow-up once
SCSI boot is proven.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 18:54:36 -05:00
Codex
fc2aca0e9e fix(ci1): mount Windows ISO via Synology NFS (Path B for SATA-CDROM timeout)
Previous fix attempts confirmed the Longhorn-backed Filesystem PVC contains
a perfectly valid bootable ISO9660 image. The bug is that SATA-CDROM
emulation reading from a Longhorn Filesystem PVC is too slow for OVMF's
boot read window — DVD-ROM enumeration times out before the bootloader
loads. Symptom on the serial console:
  BdsDxe: failed to start Boot0001 "UEFI QEMU DVD-ROM QM00001 " ... Time out
  BdsDxe: No bootable option or device was found

Block-mode PVC (Path A) was attempted and would likely fix the timing,
but CDI v1.65.0's upload-target pod cannot open the underlying block
device (runAsUser:107 + capabilities.drop:[ALL]):
  blockdev: cannot open /dev/cdi-block-volume: Permission denied

Path B (this change): mount the ISO directly from Synology NAS over
NFSv4.1. Bypasses both the Longhorn slowness and the CDI permission
issue. QEMU's SATA emulator reads at native LAN speed.

Layout:
  /volume1/ISOs/ — existing Synology export, RKE2 ACL already granted
  /volume1/ISOs/win2025-iso-disk/disk.img — new subdir, hardlink to the
    ISO file, named so KubeVirt's launcher finds it at the PV root

A hardlink (not symlink) is required because symlinks with relative
targets pointing to the parent directory are broken when the NFS PV
sub-mounts the subdir as its root.
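
The hardlink-versus-symlink point above can be illustrated locally. This is a minimal sketch, assuming a POSIX filesystem; the file name `installer.iso` and the helper names are hypothetical, and a temp directory stands in for the NFS export. It shows that a hardlink inside the subdir shares the ISO's inode, while a relative symlink resolves to a path outside the subdir and so has nothing to point at once that subdir is mounted as the root.

```python
import os
import tempfile


def build_layout(export_root: str) -> str:
    """Create a stand-in export: an ISO at the root, plus a subdir holding
    one hardlink and one relative symlink to it. Returns the subdir path."""
    iso = os.path.join(export_root, "installer.iso")  # hypothetical name
    with open(iso, "wb") as f:
        f.write(b"ISO-BYTES")
    subdir = os.path.join(export_root, "win2025-iso-disk")
    os.mkdir(subdir)
    os.link(iso, os.path.join(subdir, "disk.img"))
    os.symlink(os.path.join("..", "installer.iso"),
               os.path.join(subdir, "link.img"))
    return subdir


def escapes_root(root: str, entry: str) -> bool:
    """True if entry is a symlink whose target resolves outside root."""
    path = os.path.join(root, entry)
    if not os.path.islink(path):
        return False
    root_real = os.path.realpath(root)
    target = os.path.realpath(path)
    return os.path.commonpath([root_real, target]) != root_real


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sub = build_layout(tmp)
        for name in ("disk.img", "link.img"):
            print(name, "escapes mount root:", escapes_root(sub, name))
```

A check like `escapes_root` could be run over a candidate PV subdirectory before pointing a PersistentVolume at it.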

Validated 2026-05-08 from rke2-server, rke2-agent1, rke2-agent2:
  mount -t nfs -o nfsvers=4.1,ro 10.0.58.3:/volume1/ISOs/win2025-iso-disk
  file disk.img -> ISO 9660 CD-ROM filesystem data ... (bootable)

The original Longhorn Filesystem ISO PVC is RETAINED unused (so ArgoCD
doesn't prune the populated PVC and so we have a fallback). Can be
removed in a follow-up commit after the NFS path is proven on a
successful Windows install.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:03:42 -05:00
Codex
ba18c52130 docs(ci1): record open rootdisk-flock and SATA-CDROM-timeout issues
Documenting the remaining 2 unresolved issues for the next operator
session, with the recovery paths from this session captured inline so
the next agent doesn't repeat the same blind alleys:

1. **rootdisk QEMU flock** — every new launcher pod fails QEMU start with
   `Failed to get "write" lock` on the rootdisk Filesystem-mode disk.img.
   Stale flock from a previous force-deleted virt-launcher pod. Longhorn
   engine on rke2-agent2 needs to release the lock; `kubectl patch
   volume.longhorn.io/<pvc-name> spec.nodeID=""` is reverted by the
   Longhorn controller. Operator-level recovery only.

2. **SATA CDROM read timeout** — even with bootOrder=1 (windows-iso first),
   OVMF UEFI fails Boot0001 with "Time out" reading the SATA CDROM backed
   by the Filesystem-mode PVC. Block-mode DataVolume migration was
   attempted but blocked by CDI v1.65.0's upload pod running with
   `capabilities.drop: [ALL]` and `runAsUser: 107`, preventing direct
   block-device writes (`blockdev: cannot open /dev/cdi-block-volume:
   Permission denied`). See ISO PVC header docstring for 3 forward paths.

Net commits during this session:
- 1c4145a: bootOrder swap (windows-iso=1, rootdisk=2)
- 87a7d7c: deprecated `running:` -> `runStrategy: Always`
- 0bf47df: ISO migration to Block-mode DataVolume (REVERTED)
- 9f6dc1a: revert to Filesystem PVC (CDI block-upload blocked)
- 1c4145a + 87a7d7c + 9f6dc1a are the live, correct configuration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:18:38 -05:00
Codex
9f6dc1a9d5 fix(ci1): revert ISO to Filesystem PVC; CDI v1.65.0 block-upload pod blocked by capability drop
The Block-mode DataVolume migration (commit 0bf47df) hit a CDI v1.65.0 limitation:
the upload-target pod runs as uid 107 with `capabilities.drop: [ALL]`, so it
cannot open the underlying block device:

  blockdev: cannot open /dev/cdi-block-volume: Permission denied
  Saving stream failed: Unable to transfer source data to target file:
  error determining if block device exists: exit status 1

Reverting to a Filesystem-mode PVC + virtctl image-upload pvc, which DID work
(uploaded the 7.7 GiB ISO with valid ISO9660 magic intact). Boot timeout is
unresolved (header docstring captures the open issue + 3 paths to revisit).

The bootOrder swap (1c4145a) and runStrategy migration (87a7d7c) stay landed —
those are correct improvements regardless of the volume-mode question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:32:52 -05:00
Codex
0bf47dfa33 fix(ci1): switch ISO from filesystem PVC to Block-mode DataVolume
The bootOrder swap alone didn't fix the install — even with `windows-iso` at
bootOrder:1, OVMF UEFI still timed out reading the SATA CDROM:

  BdsDxe: starting Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ... Sata(...)
  BdsDxe: failed to start Boot0001 ... : Time out
  BdsDxe: No bootable option or device was found.

Diagnosis (debug pod mounting the live PVC):
- /pvc/disk.img IS a valid bootable ISO9660 image — `file` reports
  "ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)".
- bytes 0..15: zeros (NOT QCOW2 magic 51 46 49 fb).
- bytes 32769..32773: "CD001" — ISO9660 primary volume descriptor at the
  correct offset.
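
The byte-level checks above can be reproduced with a short script. This is a minimal sketch that only inspects the magic bytes quoted in this message (QCOW2 `51 46 49 fb` at offset 0, `CD001` at offset 32769); the function names are hypothetical, and it does not validate the rest of the image.

```python
import os
import tempfile

QCOW2_MAGIC = bytes([0x51, 0x46, 0x49, 0xFB])  # "QFI\xfb" at offset 0
ISO9660_OFFSET = 32769                          # 16 * 2048 + 1
ISO9660_MAGIC = b"CD001"


def classify_image(path: str) -> str:
    """Return "qcow2", "iso9660", or "unknown" from magic bytes only."""
    with open(path, "rb") as f:
        head = f.read(len(QCOW2_MAGIC))
        f.seek(ISO9660_OFFSET)
        descriptor = f.read(len(ISO9660_MAGIC))
    if head == QCOW2_MAGIC:
        return "qcow2"
    if descriptor == ISO9660_MAGIC:
        return "iso9660"
    return "unknown"


def write_fake_iso(path: str) -> None:
    """Write a tiny stand-in: zeros, then "CD001" at the ISO9660 offset."""
    with open(path, "wb") as f:
        f.write(b"\x00" * ISO9660_OFFSET + ISO9660_MAGIC)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "disk.img")
        write_fake_iso(sample)
        print(classify_image(sample))
```

Against the real PVC, `classify_image("/pvc/disk.img")` would be run from the debug pod that mounts the volume.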

So content was fine. The bug is in how KubeVirt + QEMU + Longhorn expose a
Filesystem-mode PVC's `/disk.img` as a SATA CDROM. With Block-mode the
underlying volume IS the raw ISO9660 sectors, OVMF reads them directly,
no QEMU file-emulation layer. This is the recommended pattern for ISO
install media on KubeVirt + Longhorn.

Migration:
- Replace `kind: PersistentVolumeClaim` with `kind: DataVolume` (CDI manages
  the underlying PVC + upload-target pod).
- Set `pvc.volumeMode: Block`.
- Annotate `cdi.kubevirt.io/storage.contentType: kubevirt` so CDI keeps raw
  bytes (no QCOW2 wrap).
- VM volume reference changes from `persistentVolumeClaim.claimName` to
  `dataVolume.name`. KubeVirt's VMI controller blocks VM start until DV
  phase is Succeeded (upload completed).

Operator step after this lands:
1. Wait for DV `phase: UploadReady`
   kubectl get dv -n kubevirt-vms windows-server-2025-iso -w
2. virtctl image-upload dv windows-server-2025-iso -n kubevirt-vms \
     --image-path "...\en-us_windows_server_2025...iso" \
     --uploadproxy-url https://localhost:8443 --insecure --no-create
3. Re-flip runStrategy to Always (was set to Halted live-side during
   migration; this commit keeps the manifest at Always).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:23:31 -05:00
Codex
87a7d7c70a fix(ci1): switch deprecated running: true -> runStrategy: Always
Required to clear OutOfSync state after the bootOrder fix. Live VM had
runStrategy: Halted (set during diagnosis to release the PVC for inspection).
Manifest had running: true. KubeVirt's validating webhook rejects sync:
  admission webhook "virtualmachine-validator.kubevirt.io" denied the request:
  Running and RunStrategy are mutually exclusive.

Switching to runStrategy: Always preserves the original "auto-start +
auto-restart" semantics with the non-deprecated field, and gives ArgoCD a
clean diff target to flip Halted -> Always.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:12:07 -05:00
Codex
1c4145a581 fix(ci1): swap bootOrder so Windows install ISO boots first
Original order: rootdisk=1 (empty 200Gi virtio), windows-iso=2 (SATA CDROM).
UEFI tried the empty virtio disk first, got nothing, fell back to Boot0001
(the SATA CDROM) with a short timeout, and aborted with:
  BdsDxe: failed to start Boot0001 ... Time out
  BdsDxe: No bootable option or device was found.

VM had been running 38+ min with rootdisk actualSize stuck at 4.13 GiB and
no AgentConnected condition — install never started.

Diagnosis via debug pod mounting the windows-server-2025-iso PVC:
  /pvc/disk.img: ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)
  bytes 0..15: zeros (NOT QCOW2 magic 51 46 49 fb)
  bytes 32769..32773: "CD001" (ISO9660 primary volume descriptor)

So the PVC content is a real bootable ISO — the only fix needed is to make
the ISO bootOrder=1 for first install. After Windows installs, it writes its
own UEFI Boot#### entries pointing at the rootdisk EFI partition; UEFI then
boots from rootdisk going forward and the ISO at bootOrder:2 is a fallback
for re-install scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 14:10:17 -05:00
Codex
c50a403f74 fix(infra): pin virtio-container-disk to v1.8.2 (containerd 2.1 manifest fix)
KubeVirt v1.4.0 + RKE2 containerd 2.1.5 cannot pull
quay.io/kubevirt/virtio-container-disk:latest:
  rpc error: code = Unimplemented
  desc = failed to pull and unpack image: not implemented:
  media type "application/vnd.docker.distribution.manifest.v1+prettyjws"
  is no longer supported since containerd v2.1, please rebuild the image as
  "application/vnd.docker.distribution.manifest.v2+json" or
  "application/vnd.oci.image.manifest.v1+json"

The :latest tag was last rebuilt with the v1 manifest schema. Tagged versions
v1.6.5+, v1.7.3, v1.8.2 are rebuilt with v2/OCI manifests.

Pinning to v1.8.2 (newest available, contains current Windows VirtIO drivers).
The image only contains the Windows VirtIO driver ISO mounted as a CDROM —
not the KubeVirt runtime — so it is decoupled from the cluster KubeVirt
version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:28:22 -05:00
Codex
fb7bd10528 feat(infra): activate ci1 VM — running:true + 10Gi ISO PVC + 1P password
Phase 1 prereqs all satisfied:
- Multus CNI v4.2.2 thick-plugin DS Running on rke2-server/agent1/agent2
- CDI v1.65.0 operator + CR Deployed (cdi-apiserver/deployment/uploadproxy
  all Running 1/1)
- Windows Server 2025 ISO (7.7GiB, March 2026 update) uploaded via CDI
  virtctl image-upload to PVC windows-server-2025-iso. Verified via PVC
  annotations: cdi.kubevirt.io/storage.condition.running.message="Upload
  Complete", storage.pod.phase="Succeeded"
- Local Administrator password generated (26 char, FANTASTIC strength).
  Stored in 1Password vault IAmWorkin (qaphopopkryhbg353ukzhhuqoq) item
  h3ix4mgfk65gmkcmvh6ly3d3hu. UTF-16-LE base64 in autounattend.xml Value
  field matches the 1P "autounattend AdministratorPassword Value" field.
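
The "UTF-16-LE base64" encoding mentioned in the last prereq can be sketched as follows. This is an illustration, not the script used for this commit. The `suffix` default is an assumption: answer files generated by Windows tooling commonly append the element name (`AdministratorPassword`) to the password before encoding, so verify against the real `autounattend.xml` before relying on it.

```python
import base64


def encode_unattend_password(password: str,
                             suffix: str = "AdministratorPassword") -> str:
    """Base64 of the UTF-16-LE bytes of password + suffix.
    The suffix convention is an assumption; pass suffix="" to disable it."""
    raw = (password + suffix).encode("utf-16-le")
    return base64.b64encode(raw).decode("ascii")


def decode_unattend_password(value: str,
                             suffix: str = "AdministratorPassword") -> str:
    """Inverse of encode_unattend_password, stripping the suffix if present."""
    text = base64.b64decode(value).decode("utf-16-le")
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


if __name__ == "__main__":
    sample = "not-the-real-password"  # placeholder, never a real secret
    encoded = encode_unattend_password(sample)
    print(decode_unattend_password(encoded) == sample)
```

A round-trip check like this is one way to confirm that the Value field and the vault entry agree without printing either.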

Changes:
- ISO PVC bumped 6Gi → 10Gi (ISO is 7.7GiB, need headroom)
- Added labels app=ci-runner, flowercore.io/managed-by=bluejay-infra
- autounattend.xml AdministratorPassword Value: real base64-encoded password
- spec.running: false → true (VM starts on next ArgoCD sync)
- Header comment refreshed to LIVE state with prereq references

Network: still pod-network masquerade. Multus NAD prod-vlan57 is registered
but the VM doesn't use it yet (Phase 1.5 host bridge needed first).

Verify after sync:
  kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml -n kubevirt-vms get vm,vmi
  virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml vnc ci1 -n kubevirt-vms

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:24:46 -05:00
Codex
6c21d14a98 deploy(fc-updater): bump image to v20260508-pub3-deepening-2bdf108
Promotes the fleet to FlowerCore.Updater main @ 2bdf108 which combines:
- PR #6 publish pre-signed releases (1a188f4)
- PR #7 deeper public-host privacy enforcement (8cd8544)
- PublishPreSignedAsync(stream, sig) Integration coverage (2bdf108)

Live image already imported to rke2-server and rolled via deploy-web.ps1.
This commit aligns the bluejay-infra source of truth so ArgoCD doesn't
snap the spec back to the previous tag (per
feedback_argocd_managed_image_overrides_do_not_stick).
2026-05-08 13:07:24 -05:00
Codex
b3529f8e96 feat(infra): add Multus CNI + CDI + PROD VLAN 57 NAD as GitOps prereqs for ci1
Adds three new bluejay-infra apps that auto-pickup via ApplicationSet (apps/*
directory generator on main):

* apps/multus/multus.yaml — Multus CNI v4.2.2 thick-plugin daemonset (verbatim
  upstream, project-annotated). Enables KubeVirt VMs to attach additional
  network interfaces. Required by ci1 to bridge onto PROD VLAN 57.

* apps/cdi/{cdi-operator.yaml,cdi-cr.yaml,README.md} — Containerized Data
  Importer v1.65.0 (verbatim upstream). Operator + CR pattern. Enables
  populating PVCs from HTTP/registry/upload sources, used to load the Windows
  Server 2025 ISO into the windows-server-2025-iso PVC.

* apps/kubevirt-vms/prod-vlan57-nad.yaml — NetworkAttachmentDefinition for
  PROD VLAN 57 bridge. **Deploy gated on Phase 1.5 host work**: requires
  br-prod bridge enslaving enp86s0.57 on each RKE2 node (Puppet config-as-code).
  ci1.yaml continues to use pod-network masquerade until that lands; switching
  to multus.networkName: kubevirt-vms/prod-vlan57 is a one-line YAML edit
  followed by a GitOps push.

Cluster verification (2026-05-08):
- KubeVirt LIVE (3 nodes, virt-api/controller/handler/operator all Running)
- Calico CNI on /etc/cni/net.d + /opt/cni/bin (Multus default paths)
- ApplicationSet `bluejay-infra` already watches `apps/*` on main

Reproducibility: upstream YAMLs vendored verbatim with project header diffs
only. Bumping versions = re-curl + git push. No deploy-time internet fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:05:58 -05:00
Codex
00c11b4eaa feat(infra): stage ci1 Windows Server 2025 KubeVirt VM (Phase 1, NOT YET APPLIED)
Stages a draft VirtualMachine + Namespace + ISO PVC + rootdisk PVC + sysprep
ConfigMap for the dedicated GitHub Actions self-hosted runner that replaces
the never-registered bluejay-ws-sandbox-1 placeholder.

Status: STAGED ONLY. spec.running = false. ISO PVC empty. Two operator
decisions still pending before this can boot:
  1. Network choice — pod-network fallback (in this draft) vs Multus +
     PROD VLAN NAD (preferred, requires Multus install).
  2. ISO path — manual upload via helper pod (Path A) vs CDI HTTP import
     (Path B, requires CDI install).

Cluster baseline 2026-05-08:
  - KubeVirt operator: installed, healthy, 14d
  - CDI: NOT installed
  - Multus: NOT installed
  - Calico-only CNI

See docs/infrastructure/windows-server-build-runner-plan.md "Phase 1 readiness
gate" for the full operator pickup checklist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:32:47 -05:00
Codex
04881f46f0 deploy(intranet): promote brochure wave 1 image 2026-05-08 11:12:56 -05:00
Codex
c0038e4859 deploy(intranet): bump image to v20260508-7bad3a5 (Theme picker + FcThemedRoot)
FlowerCore.Intranet.Web@7bad3a5 'feat(theme): add /admin/theme picker page + wrap routes in FcThemedRoot'.
Image built, distributed to all 3 RKE2 nodes (10.0.56.11/12/13), 366/366 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:20:11 -05:00
Codex
dee48831c6 Deploy updater public privacy hardening 2026-05-07 17:12:33 -05:00
Codex
0f1dc5f871 fix(certs): kill cert-manager renewal loop on 3 broken Certificate specs
Three Certificates requested duration: 2160h (90d) with renewBefore: 720h
(30d). step-ca's ACME provisioner caps cert lifetime at 30d, so it silently
issued 720h certs — making renewBefore EQUAL to the actual cert lifetime.
cert-manager treats the cert as needing immediate renewal the moment it's
issued, creates a CertificateRequest, gets a new (still 30d) cert, marks
it for immediate renewal, and loops.

Damage on 2026-05-07 ~20:30 (caught during regroup after 5h gap):
  - fc-worldbuilder/worldbuilder-web-tls:  2365 CRs in 18h
  - fc-distribution/fc-distribution-tls:  10880 CRs in 18h
  - knowledge/knowledge-tls:              10888 CRs in 18h
  Total: 24,133 stale CertificateRequest objects in etcd.

Bulk-deleted all CRs + Orders in those 3 namespaces, then this commit
fixes the source so ArgoCD sync stops re-creating the loop.

Fix: match the working 720h/240h pattern used by every other FC service
cert (agent-zero, fc-dns, fc-llm-bridge, fc-php, traefik-system, etc.).
30d cert lifetime + 10d renewal headroom = renewal at day 20, which is
the cert-manager standard 2/3-of-lifetime practice.
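
The arithmetic behind the loop and the fix can be checked with a few lines. This is a simplified model, assuming renewal is scheduled at (issued lifetime minus renewBefore) and that the CA silently caps the lifetime; the function name is hypothetical and this is not cert-manager's code.

```python
def renewal_point_hours(requested_h: int, ca_cap_h: int, renew_before_h: int) -> int:
    """Hours after issuance at which renewal is due.

    The CA issues min(requested, cap); a result of 0 means the certificate
    is due for renewal the moment it is issued, which is the loop condition.
    """
    issued_h = min(requested_h, ca_cap_h)
    return max(0, issued_h - renew_before_h)


if __name__ == "__main__":
    broken = renewal_point_hours(requested_h=2160, ca_cap_h=720, renew_before_h=720)
    fixed = renewal_point_hours(requested_h=720, ca_cap_h=720, renew_before_h=240)
    print("broken spec renews after", broken, "hours")
    print("fixed spec renews after", fixed, "hours, day", fixed // 24)
```

Running the same function over every Certificate spec in the repo would flag any other spec whose renewBefore meets or exceeds the capped lifetime.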

Side effect during loop: ALSO contributed to step-ca load and may have
caused intermittent timeouts cluster-wide (the latest stuck challenge
was timing out dialing step-ca:9443 even though step-ca itself was up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:32:00 -05:00
Codex
11c5f6e6cc fix(selenium): GitOps-capture selenium-netpol (was unmanaged anywhere)
Captured during 2026-05-07 regroup audit. selenium-netpol was applied via
raw `kubectl apply` to the cluster on 2026-03-15 with no source-of-truth
file anywhere — neither in bluejay-infra nor in any FC service repo. A
cluster rebuild from bluejay-infra would have lost it entirely (including
the Selenium Grid → Traefik VIP allow rule that gates AAT runs against
*.iamworkin.lan services).

Captured byte-for-byte from `kubectl get netpol -n selenium selenium-netpol
-o yaml`. ServerSideApply via ArgoCD will adopt the existing resource
without recreation.

The Selenium Grid Deployment + Services themselves are still managed
outside ArgoCD (deployed via raw kubectl from the original bring-up).
Migrating those into bluejay-infra is a separate lane — this commit only
restores GitOps repeatability for the NetworkPolicy.

See feedback_networkpolicies_belong_in_bluejay_infra.md for the canonical
pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 10:30:59 -05:00
Codex
d637fe9b30 fix(fc-desktop): land 4 NetworkPolicies into bluejay-infra (was deploy-script-only)
Repeatability gap caught during 2026-05-07 morning regroup. The four
fc-desktop NetworkPolicies (desktop-isolation, fc-desktop-default-deny,
remotedesktop-web-isolation, cm-acme-http-solver-allow) were applied via
FlowerCore.RemoteDesktop/scripts/deploy-web.sh `kubectl apply` calls.
That meant a fresh cluster rebuild from bluejay-infra alone would miss
all of them — Browser Lab session isolation, control-plane allow-list,
and HTTP-01 cert renewal would silently fail to come up.

Canonical FC GitOps pattern is for NetworkPolicies to live alongside
other resources in bluejay-infra. Verified by audit: 6 of 11 cluster
NetworkPolicies (agent-zero, edge2-services, monitoring, noc-services,
telephony, voice) already follow this pattern. fc-desktop was the
outlier; selenium-netpol is also unmanaged and tracked separately.

Source-of-truth split (now documented in fc-desktop.yaml):
  - bluejay-infra OWNS: Certificate + IngressRoute + all NetworkPolicies.
  - FlowerCore.RemoteDesktop scripts/deploy-web.sh OWNS: Deployment +
    Service ONLY (because `localhost/fc-desktop:linux-xfce` image refs
    require manual ctr import on each node — Deployment in bluejay-infra
    would race the image-import step).

Follow-up commits in FlowerCore.RemoteDesktop will:
  - Remove the now-duplicate k8s/{networkpolicy,namespace-default-deny,
    web-networkpolicy,acme-http01-solver-allow}.yaml files.
  - Drop the 3 `kubectl_apply_file` lines from scripts/deploy-web.sh.

The 4 NPs in this commit are byte-for-byte identical to what's running in
the cluster today (verified via kubectl get -o yaml diff). ServerSideApply
in the bluejay-infra ApplicationSet will adopt the existing resources
without recreating them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 10:27:20 -05:00
Codex
5bfe41beca fix(monitoring): rename bare Grafana dashboard JSONs out of *.json extension
ArgoCD's directory-driven manifest parser scans *.yaml AND *.json by
default. Bare Grafana dashboard JSONs (no apiVersion/kind/metadata)
poisoned manifest generation for the entire infra-monitoring Application
("Object 'Kind' is missing in <dashboard JSON>"), leaving sync state
Unknown.

These files are SOURCE for the file-provisioning path on noc1
(/opt/monitoring/grafana/dashboards/) and also get inlined into ConfigMap
wrappers like grafana-dashboard-remotedesktop.yaml. They are NOT K8s
manifests and shouldn't be in ArgoCD's manifest tree.

.argocdignore is for repo-level GitOps source eligibility, not for
filtering manifests within a directory-mode Application — the cleanest
fix is the *.txt extension that ArgoCD's parser skips.
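
A pre-commit style check can catch this class of file before ArgoCD does. This is a minimal sketch, not ArgoCD's parser: it only flags `*.json` files that are not objects carrying both `apiVersion` and `kind`, and the function name is hypothetical.

```python
import json
import os
import tempfile


def find_non_manifest_json(root: str) -> list:
    """Return sorted paths of *.json files under root that are not
    Kubernetes-style objects (missing apiVersion or kind, or unparseable)."""
    offenders = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".json"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    doc = json.load(f)
            except (OSError, ValueError):
                offenders.append(path)
                continue
            is_manifest = (isinstance(doc, dict)
                           and bool(doc.get("apiVersion"))
                           and bool(doc.get("kind")))
            if not is_manifest:
                offenders.append(path)
    return sorted(offenders)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "dashboard.json"), "w", encoding="utf-8") as f:
            json.dump({"title": "example dashboard", "panels": []}, f)
        with open(os.path.join(tmp, "configmap.json"), "w", encoding="utf-8") as f:
            json.dump({"apiVersion": "v1", "kind": "ConfigMap"}, f)
        for path in find_non_manifest_json(tmp):
            print("not a manifest:", os.path.basename(path))
```

Pointed at `apps/`, a non-empty result would mean some directory-mode Application is about to hit the same "Kind is missing" failure.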

Reverts the .argocdignore from the previous commit (didn't take effect).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 10:13:37 -05:00
Codex
df22774674 fix(infra): unstick fc-updater + monitoring ArgoCD apps
fc-updater PVC: bump updatecenter-data storage 10Gi → 25Gi.
The cluster PVC was already manually expanded to 20Gi to fit Mike Bundle
(~5.1 GiB) plus the LocalFsBundleStore.MaxTotalBytes soft cap of 25 GiB
(see project_uc_remaining_4_apps_signed_2026_05_06). PVCs cannot shrink,
so ArgoCD couldn't sync the smaller git value (OutOfSync, retried 5x with
"field can not be less than status.capacity"). Setting git to 25Gi gives
headroom matching the soft cap.
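
The shrink rejection above comes down to a quantity comparison. This is a minimal sketch, assuming whole-number binary-suffix quantities such as `10Gi`; it is not the Kubernetes quantity parser, and the helper names are hypothetical.

```python
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert a quantity like "25Gi" to bytes (integer values only)."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def would_shrink(git_request: str, live_capacity: str) -> bool:
    """True if syncing git_request would ask for less than the live capacity."""
    return to_bytes(git_request) < to_bytes(live_capacity)


if __name__ == "__main__":
    print("10Gi vs live 20Gi shrinks:", would_shrink("10Gi", "20Gi"))
    print("25Gi vs live 20Gi shrinks:", would_shrink("25Gi", "20Gi"))
```

Comparing the git value against `status.capacity.storage` this way before a push would surface the conflict without waiting for ArgoCD's retries.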

monitoring .argocdignore: skip bare dashboard JSON files.
Both fc-updatecenter-dashboard.json and flowercore-remotedesktop-grafana-
dashboard.json live in apps/monitoring/ as source-of-truth for file-
provisioning to noc1's /opt/monitoring/grafana/dashboards/. ArgoCD's
manifest parser tries to unmarshal every file and chokes on bare dashboard
JSON ("Object 'Kind' is missing"), which then poisoned the whole
infra-monitoring Application status (Unknown sync, no comparison possible).
The .argocdignore tells ArgoCD to skip *.json — actual K8s deploys happen
via ConfigMap wrappers like grafana-dashboard-remotedesktop.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 10:11:27 -05:00
Codex
c4065b15a3 deploy(ttsreader): persist voice reference clips on pvc 2026-05-06 20:48:58 -05:00
Codex
a4aa612373 deploy(fc-distribution): roll startup backfill fix 2026-05-06 19:51:11 -05:00
Codex
c2eb37dee9 deploy(ttsreader): enable phase6 biblical routing 2026-05-06 19:46:25 -05:00
Codex
bf6f542569 deploy(fc-distribution): roll latest endpoint fix 2026-05-06 19:38:26 -05:00
Codex
e150b2102f deploy(fc-distribution): roll phase1 api image 2026-05-06 19:31:22 -05:00
Codex
33a765b0bc deploy(fc-intranet-web): roll v20260506-1737 with Wave 2 specialist galleries
6 Wave 2 product galleries landed on intranet master c083016:
- /specialists/telephony  (7 sections + Overview, +11 tests)
- /specialists/library    (8 sections + Overview, +17 tests)
- /specialists/retail     (6 sections + Overview, +16 tests)
- /specialists/mysql      (6 sections + Overview, +22 tests)
- /specialists/php        (6 sections + Overview, +9 tests)
- /specialists/pimanager  (7 sections + Overview, +11 tests)

NavMenu.razor wired with new Specialists section.

Test ledger: 280 -> 366 (+86) full project, 0W/0E build.

Sources: 6 sibling-depth worktrees claude/intranet-w2-{name} dispatched
2026-05-06 per intranet-xxxl-sprint-2026-05-05.md §4 Phase 2.
Inherits Q-IK-1..15 + Q-IS-1..12 + Q-IX-1..7 verbatim per Q-IW-5.
6 Q-IW-1..6 cards on Notes decisions-waiting.html.
2026-05-06 17:38:22 -05:00
Codex
5484ed7db6 Adopt fc-updater into ArgoCD 2026-05-06 17:33:32 -05:00
75 changed files with 14527 additions and 126 deletions


@@ -118,6 +118,7 @@ That test project sweeps `bluejay-infra/apps/**` plus the canonical sibling `Flo
## References
- OpenVox noc1 durability runbook: `docs/runbooks/openvoxserver-quadlet-durability.md`
- Cert-manager recovery playbook: `FlowerCore.Notes/memory/project_cert_manager_recovery_2026_04_22.md`
- Why pfSense DNS is required: `FlowerCore.Notes/memory/feedback_pfsense_dns_required_for_acme.md`
- Public DNS operator host: `https://dns.iamworkin.lan`

apps/cdi/README.md (new file, 69 lines)

@@ -0,0 +1,69 @@
# CDI — Containerized Data Importer
KubeVirt's `containerized-data-importer` for populating PVCs from external
sources (HTTP, HTTPS, container registry, S3, virtctl upload). Required to
import the Windows Server 2025 ISO into the `windows-server-2025-iso` PVC
that `apps/kubevirt-vms/ci1.yaml` mounts as a CDROM.
## Files
| File | Source | Purpose |
| ----------------- | ----------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| `cdi-operator.yaml` | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — verbatim copy | Installs operator + CRDs (5779 lines, large) |
| `cdi-cr.yaml` | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — annotated + commented | Tells operator to deploy CDI components |
`cdi-operator.yaml` is **vendored verbatim** from the upstream release for
air-gap reproducibility (no internet fetch at deploy time, ArgoCD prune
contracts hold). To bump versions:
```bash
CDI_VER=v1.66.0 # for example
# -f makes curl fail on HTTP errors instead of saving an error page
curl -fsSL "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}/cdi-operator.yaml" \
  -o apps/cdi/cdi-operator.yaml
curl -fsSL "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}/cdi-cr.yaml" \
  -o /tmp/cdi-cr-new.yaml # then re-apply project header diff
git diff apps/cdi/ # review
git add apps/cdi/ && git commit -m "chore(cdi): bump to ${CDI_VER}" && git push
```
## Verify after deploy
```bash
kubectl -n cdi get pods # operator + apiserver + deployment + uploadproxy
kubectl get cdis cdi -o jsonpath='{.status.phase}' # "Deployed"
kubectl get crd | grep cdi.kubevirt.io
# Expected CRDs: datavolumes.cdi.kubevirt.io, cdiconfigs.cdi.kubevirt.io,
# storageprofiles.cdi.kubevirt.io, dataimportcrons.cdi.kubevirt.io,
# datasources.cdi.kubevirt.io, objecttransfers.cdi.kubevirt.io
```
## Use after install
```yaml
# Example DataVolume that imports from HTTP
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: my-iso
spec:
  source:
    http:
      url: "https://server/path/to.iso"
  pvc:
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 10Gi
    storageClassName: longhorn
```
```bash
# Or upload from local disk via virtctl
virtctl image-upload pvc my-iso \
  --image-path ./my.iso \
  --size 10Gi \
  --storage-class longhorn \
  --access-mode ReadWriteOnce \
  --uploadproxy-url https://cdi-uploadproxy.cdi.svc:443 \
  --insecure
```

apps/cdi/cdi-cr.yaml (new file, 36 lines)

@@ -0,0 +1,36 @@
# =============================================================================
# CDI CR — Tells the CDI operator to install CDI components into the cluster.
# =============================================================================
# After cdi-operator.yaml is applied, the operator watches for THIS resource
# (CDI named "cdi"). When found, it deploys cdi-apiserver, cdi-deployment,
# cdi-uploadproxy, cdi-cronjob, and the importer/uploadserver/cloner pods.
#
# Configuration:
# - HonorWaitForFirstConsumer: PVCs created by DataVolumes wait for first
# pod to schedule before binding (lets storage class pick best node).
# - WebhookPvcRendering: validates PVC creation against CDI policies.
# - imagePullPolicy IfNotPresent: re-pull only on tag rotation.
# - nodeSelector linux: pin to Linux nodes (no Windows worker support).
#
# Andrew may want to add an `uploadProxyURLOverride` later to expose the
# uploadproxy via Traefik IngressRoute for `virtctl image-upload` from
# BLUEJAY-WS without `kubectl port-forward`. Phase 2 enhancement.
# =============================================================================
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
  annotations:
    bluejay.iamworkin.lan/source: "kubevirt/containerized-data-importer v1.65.0"
spec:
  config:
    featureGates:
      - HonorWaitForFirstConsumer
      - WebhookPvcRendering
  imagePullPolicy: IfNotPresent
  infra:
    nodeSelector:
      kubernetes.io/os: linux
  workload:
    nodeSelector:
      kubernetes.io/os: linux

apps/cdi/cdi-operator.yaml (5779 lines)

File diff suppressed because it is too large.


@@ -30,3 +30,41 @@ spec:
port: 80
tls:
secretName: chat-web-tls
---
# Public host profile marker. The app treats this header as authoritative for
# the public twin, while the internal chat.iamworkin.lan route does not attach
# it and keeps the operator-oriented UI.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: chat-public-profile-header
  namespace: fc-chat
spec:
  headers:
    customRequestHeaders:
      X-FC-Chat-Host-Profile: "public"
---
# Public Cloudflare-fronted twin for the anonymous chat surface. Operator
# paths are intentionally absent from the allowlist below, so /admin,
# /operator, /console, /ops, /api/operator, and /operatorhub miss this route
# and return Traefik 404 before reaching the pod. Operator action still needed:
# create/verify Cloudflare DNS chat.flowercore.io -> public Traefik endpoint
# and mirror the cf-origin-flowercore-io TLS secret into namespace fc-chat.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: chat-web-public
  namespace: fc-chat
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`chat.flowercore.io`) && (Path(`/`) || Path(`/chat`) || PathPrefix(`/_blazor`) || PathPrefix(`/_framework`) || PathPrefix(`/_content`) || PathPrefix(`/avatars`) || PathPrefix(`/css`) || PathPrefix(`/js`) || PathPrefix(`/favicon`) || PathPrefix(`/chathub`)) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
      kind: Rule
      middlewares:
        - name: chat-public-profile-header
      services:
        - name: chat-web
          port: 80
  tls:
    secretName: cf-origin-flowercore-io


@@ -1,5 +1,18 @@
# FlowerCore Remote Desktop — TLS + Ingress
# Deployment and Service managed by deploy script (not ArgoCD)
#
# Source-of-truth split:
# - bluejay-infra OWNS: Certificate, IngressRoute, all NetworkPolicies
# (see network-policies.yaml in this directory).
# - FlowerCore.RemoteDesktop scripts/deploy-web.sh OWNS: Deployment +
# Service. Reason: image refs like `localhost/fc-desktop:linux-xfce`
# only exist on each node's containerd after a manual import, so a
# Deployment manifest in bluejay-infra would race the image-import
# step and crash-loop.
#
# NetworkPolicies moved into bluejay-infra 2026-05-07 — previously they
# were applied via the deploy script's kubectl apply calls, which broke
# cluster-rebuild repeatability. See
# feedback_networkpolicies_belong_in_bluejay_infra.md.
---
apiVersion: cert-manager.io/v1
kind: Certificate


@@ -0,0 +1,332 @@
# FlowerCore Remote Desktop — NetworkPolicies (GitOps-managed)
#
# Moved into bluejay-infra 2026-05-07 as part of the regroup audit. These
# four policies were previously applied via FlowerCore.RemoteDesktop's
# scripts/deploy-web.sh `kubectl apply` calls, which meant a fresh cluster
# rebuild from bluejay-infra alone would miss them — Browser Lab session
# isolation, control-plane allow-list, and HTTP-01 cert renewal would all
# silently fail to come up.
#
# Source-of-truth contract:
# - bluejay-infra OWNS all NetworkPolicy + Certificate + IngressRoute
# resources for fc-desktop.
# - FlowerCore.RemoteDesktop's scripts/deploy-web.sh continues to own
# the Deployment + Service apply (because the image ref
# `localhost/fc-desktop:linux-xfce` only exists on each node's
# containerd after a manual import — it can't be pulled from a
# registry, so a Deployment manifest in bluejay-infra would race the
# image-import step and crash-loop).
---
# 1) desktop-isolation — Browser Lab session pods.
#
# Locks down pods labeled `app.kubernetes.io/name=remote-desktop` (every
# session pod regardless of template). Allows guacd ingress for the VNC/RDP
# display lane and remotedesktop-web's pre-handoff probing. Egress: NFS to
# Synology, DNS, Traefik (cluster + LB VIP), Intranet (Browser Lab home).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: desktop-isolation
  namespace: fc-desktop
  labels:
    app.kubernetes.io/part-of: remotedesktop
    app.kubernetes.io/component: isolation
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: remote-desktop
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: guacamole
      ports:
        - port: 3000
          protocol: TCP
        - port: 3001
          protocol: TCP
        - port: 5901
          protocol: TCP
        - port: 3389
          protocol: TCP
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: fc-desktop
          podSelector:
            matchLabels:
              app.kubernetes.io/name: remotedesktop-web
      ports:
        - port: 3000
          protocol: TCP
        - port: 5901
          protocol: TCP
  egress:
    # NFS to Synology
    - to:
        - ipBlock:
            cidr: 10.0.58.3/32
      ports:
        - port: 2049
          protocol: TCP
        - port: 2049
          protocol: UDP
        - port: 111
          protocol: TCP
        - port: 111
          protocol: UDP
    - to:
        - ipBlock:
            cidr: 10.0.58.3/32
      ports:
        - port: 445
          protocol: TCP
    - to: []
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    - to:
        - ipBlock:
            cidr: 10.0.56.200/32
        - ipBlock:
            cidr: 10.43.33.87/32
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 8000
          protocol: TCP
        - port: 8443
          protocol: TCP
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: intranet
          podSelector:
            matchLabels:
              app: intranet-web
      ports:
        - port: 5300
          protocol: TCP
---
# 2) fc-desktop-default-deny — namespace-wide catch-all.
#
# Selects every pod EXCEPT remotedesktop-web (the public-surface control
# plane) and applies default-deny semantics for both Ingress and Egress.
# Closes the gap where session pods land WITHOUT the desktop-isolation
# policy's `app.kubernetes.io/name=remote-desktop` label, plus prevents
# arbitrary debug sidecars / kubectl debug images from getting cluster
# access.
#
# CRITICAL: also catches transient cm-acme-http-solver pods (that's the
# bug this whole regroup chased). The cm-acme-http-solver-allow policy
# below is the explicit carve-out.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: fc-desktop-default-deny
  namespace: fc-desktop
  labels:
    app.kubernetes.io/part-of: remotedesktop
    app.kubernetes.io/component: isolation
spec:
  podSelector:
    matchExpressions:
      - key: app.kubernetes.io/name
        operator: NotIn
        values:
          - remotedesktop-web
  policyTypes:
    - Ingress
    - Egress
---
# 3) remotedesktop-web-isolation — control plane explicit allow-list.
#
# remotedesktop-web is the only `app.kubernetes.io/name` value the
# default-deny excludes, so without this policy the control plane would have
# wide-open Ingress AND Egress. This re-introduces a tight allow-list:
# - Ingress: Traefik only on TCP/8080
# - Egress: CoreDNS, K8s API, Guacamole admin, NFS, Intranet,
# Traefik (cluster + LB), and the fc-desktop namespace itself
# (for session pod readiness probing).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: remotedesktop-web-isolation
  namespace: fc-desktop
  labels:
    app.kubernetes.io/part-of: remotedesktop
    app.kubernetes.io/component: isolation
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: remotedesktop-web
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 8080
          protocol: TCP
  egress:
    # CoreDNS
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # K8s API server
    - to: []
      ports:
        - port: 443
          protocol: TCP
        - port: 6443
          protocol: TCP
    # Guacamole admin
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: guacamole
      ports:
        - port: 8080
          protocol: TCP
    # NFS to Synology
    - to:
        - ipBlock:
            cidr: 10.0.58.3/32
      ports:
        - port: 2049
          protocol: TCP
        - port: 2049
          protocol: UDP
        - port: 111
          protocol: TCP
        - port: 111
          protocol: UDP
    # Intranet web
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: intranet
          podSelector:
            matchLabels:
              app: intranet-web
      ports:
        - port: 5300
          protocol: TCP
    # Cluster Traefik pods (in-cluster service resolution + Guacamole
    # routing handoff where web app builds URLs against the public host
    # but resolves internally).
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 8080
          protocol: TCP
        - port: 8443
          protocol: TCP
    # fc-desktop namespace — session pod probing during browser-access
    # readiness checks.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: fc-desktop
      ports:
        - port: 3000
          protocol: TCP
        - port: 3001
          protocol: TCP
        - port: 5901
          protocol: TCP
        - port: 3389
          protocol: TCP
---
# 4) cm-acme-http-solver-allow — cert-manager HTTP-01 carve-out.
#
# Without this, fc-desktop-default-deny catches the transient solver pods
# cert-manager creates for each renewal (they don't carry the
# remotedesktop-web label). This caused an 8-day silent renewal failure on
# desktop.iamworkin.lan from 2026-04-28 to 2026-05-07 (see
# feedback_certmanager_renewal_stuck_when_solver_blocked_by_namespace_default_deny.md).
#
# Authorizes:
# - Ingress on TCP/8089 from cluster Traefik (which proxies the external
# HTTP-01 GET on port 80 through to the solver).
# - Egress for cluster DNS (defensive — newer cert-manager probes from
# inside the solver too).
#
# The `acme.cert-manager.io/http01-solver=true` label is set by
# cert-manager itself on every solver pod automatically.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cm-acme-http-solver-allow
  namespace: fc-desktop
  labels:
    app.kubernetes.io/part-of: remotedesktop
    app.kubernetes.io/component: cert-renewal
spec:
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 8089
          protocol: TCP
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
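
The interaction between the default-deny and the solver carve-out can be illustrated with a throwaway pod. This is a minimal sketch, assuming a hypothetical pod named `netpol-probe` and a `busybox:1.36` image that the nodes can pull; neither is part of this repo. Kubernetes set-based selectors treat `NotIn` as matching objects that lack the key entirely, so a pod without any `app.kubernetes.io/name` label is selected by `fc-desktop-default-deny`.

```yaml
# Hypothetical probe pod; the name and image are assumptions for illustration.
# It carries no app.kubernetes.io/name label, so the NotIn selector in
# fc-desktop-default-deny selects it and all Ingress/Egress is denied.
apiVersion: v1
kind: Pod
metadata:
  name: netpol-probe
  namespace: fc-desktop
spec:
  restartPolicy: Never
  containers:
    - name: probe
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
```

Because NetworkPolicies are additive, a pod that also carries `acme.cert-manager.io/http01-solver: "true"` is selected by both the default-deny and `cm-acme-http-solver-allow`, and ends up with exactly the TCP/8089 ingress and DNS egress the carve-out grants.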


@@ -0,0 +1,26 @@
# Runtime secrets for FlowerCore.DeviceManagement.
#
# OnePasswordItem operator syncs this item into a Kubernetes Secret with the
# same name. Expected fields:
# DB-Password
# mtls-ca.pem
# mtls-client.crt
# mtls-client.key
# mtls-chain.pem
#
# Do not add literal secret values to this repo. Runtime pods consume the
# synced Secret through env vars and read-only mounts.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
  name: fc-devicemgmt-runtime
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt
    app.kubernetes.io/component: secrets
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
spec:
  itemPath: "vaults/IAmWorkin/items/FlowerCore DeviceManagement Runtime"


@@ -0,0 +1,33 @@
# Explicit ArgoCD Application shape for bootstrap/review.
#
# The live bluejay-infra ApplicationSet already discovers apps/* directories
# and creates this same Application name (`infra-fc-devicemgmt`) automatically.
# Keep repoURL on the internal Gitea ClusterIP URL; ArgoCD does not trust the
# external step-ca HTTPS endpoint.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infra-fc-devicemgmt
  namespace: argocd
  labels:
    app.kubernetes.io/name: fc-devicemgmt
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
spec:
  project: default
  source:
    repoURL: http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git
    targetRevision: main
    path: apps/fc-devicemgmt
  destination:
    server: https://kubernetes.default.svc
    namespace: fc-devicemgmt
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
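
The header comment says the live ApplicationSet discovers `apps/*` directories and generates this Application automatically. The real ApplicationSet is not part of this diff; the following is only a sketch of what a git directory generator producing `infra-<directory>` names could look like. The ApplicationSet name and the destination-namespace templating are assumptions.

```yaml
# Sketch only: the ApplicationSet name and the namespace templating are
# assumptions; the live definition may differ.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: bluejay-infra
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git
        revision: main
        directories:
          - path: apps/*
  template:
    metadata:
      # apps/fc-devicemgmt -> infra-fc-devicemgmt
      name: 'infra-{{path.basename}}'
    spec:
      project: default
      source:
        repoURL: http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

With a generator like this, deleting a directory under `apps/` removes the generated Application, which matches the prune behavior described for `apps/brochure/` in the commit message.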


@@ -0,0 +1,30 @@
# Certificate for devices.iamworkin.lan.
#
# Preflight gate: FlowerCore.DNS / pfSense must contain an explicit A record:
# devices.iamworkin.lan -> 10.0.56.200
# before this Certificate is synced. step-ca ACME cannot see the CoreDNS
# wildcard, so missing pfSense DNS produces cert-manager HTTP-01 backoff
# (feedback_pfsense_dns_required_for_acme).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: fc-devicemgmt-web-tls
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt-web
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
  annotations:
    flowercore.io/dns-preflight: "devices.iamworkin.lan must resolve to 10.0.56.200 before ACME sync"
spec:
  secretName: fc-devicemgmt-web-tls
  issuerRef:
    name: step-ca-acme
    kind: ClusterIssuer
  dnsNames:
    - devices.iamworkin.lan
  duration: 720h
  renewBefore: 240h
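
The `duration` and `renewBefore` values determine when cert-manager starts renewal: it schedules renewal at the certificate's expiry minus `renewBefore`. The fragment below restates the two fields with the arithmetic worked out in comments. It is a sketch that assumes the issuer actually grants the requested 720h lifetime.

```yaml
# Sketch only: the same two fields as in the Certificate, with the timing
# worked out. Assumes the issuer honors the requested lifetime.
spec:
  duration: 720h      # 30 days requested lifetime
  renewBefore: 240h   # renewal starts 10 days before expiry
  # 720h - 240h = 480h, so renewal begins about 20 days after issuance
  # and leaves a 10-day window for retries before the certificate expires.
```

If the issuer caps the lifetime below the requested `duration`, the renewal point moves earlier by the same amount; once the granted lifetime is no longer than `renewBefore`, every newly issued certificate is immediately due for renewal.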


@@ -0,0 +1,31 @@
# Step issuer for FlowerCore.DeviceManagement runtime mTLS leaves.
#
# Requires the smallstep step-issuer CRDs/controller:
# stepclusterissuers.certmanager.step.sm
# The provisioner password lives in the live cert-manager Secret below; do not
# commit the password or generated private key material to this repo.
apiVersion: certmanager.step.sm/v1beta1
kind: StepClusterIssuer
metadata:
  name: step-ca-agent
  labels:
    app.kubernetes.io/name: step-ca-agent
    app.kubernetes.io/component: pki
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
  annotations:
    flowercore.io/traceability-standard: k8s-pod-ownership-and-traceability-standard
    flowercore.io/provisioner-source: profile::pki::stepca
    flowercore.io/secret-source: cert-manager/step-ca-agent-provisioner-password
spec:
  url: https://10.0.56.10:9443
  caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJ4RENDQVdxZ0F3SUJBZ0lSQVBZMzU3RzZvdzZ6TUFMNSs0YlMya2t3Q2dZSUtvWkl6ajBFQXdJd1FERWEKTUJnR0ExVUVDaE1SU1VGdFYyOXlhMmx1SUVGRFRVVWdRMEV4SWpBZ0JnTlZCQU1UR1VsQmJWZHZjbXRwYmlCQgpRMDFGSUVOQklGSnZiM1FnUTBFd0hoY05Nall3TXpBNE1UZ3dOekV4V2hjTk16WXdNekExTVRnd056RXhXakJBCk1Sb3dHQVlEVlFRS0V4RkpRVzFYYjNKcmFXNGdRVU5OUlNCRFFURWlNQ0FHQTFVRUF4TVpTVUZ0VjI5eWEybHUKSUVGRFRVVWdRMEVnVW05dmRDQkRRVEJaTUJNR0J5cUdTTTQ5QWdFR0NDcUdTTTQ5QXdFSEEwSUFCSjJuMDRYMQpKWm81WmRxL2kxSWR2OCtmcXdaeUF6Qmg3d2hicWowU1dzSkw4VVdSYWJDTXFZQ3M3K2RYTzB4UlN6cWt3RkRMCngrdm9vT2FpOFJnUk5oYWpSVEJETUE0R0ExVWREd0VCL3dRRUF3SUJCakFTQmdOVkhSTUJBZjhFQ0RBR0FRSC8KQWdFQk1CMEdBMVVkRGdRV0JCUm51UFBRUjZpTS9INnZPbHVpVTNTeWdheXo4akFLQmdncWhrak9QUVFEQWdOSQpBREJGQWlFQXJRSzlkWVBHbUFac2RZbmp6aXVGVlZFNU5LWlVjY2VZdkdmR0MrdExYVXNDSUF1ZEYyekpyQ1JxCjNtSzUwWlpFVC9md1RrSndpRUY0ODI0bWpQOHAxQ0tNCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  provisioner:
    name: step-ca-agent
    kid: RF3A9welUYVOWBX8tr19aWyA2kQlxoGZN1dRwTElUEM
    passwordRef:
      name: step-ca-agent-provisioner-password
      namespace: cert-manager
      key: password
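
A cert-manager Certificate selects a step-issuer by setting `issuerRef.group` to `certmanager.step.sm`. The following is a minimal sketch of how an mTLS client leaf could be requested from `step-ca-agent`. The Certificate name, Secret name, and common name are hypothetical, and the lifetime values simply mirror the 720h/240h pattern used elsewhere in this diff; the provisioner may enforce different limits.

```yaml
# Sketch only: metadata.name, secretName, and commonName are hypothetical.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: fc-devicemgmt-agent-mtls
  namespace: fc-devicemgmt
spec:
  secretName: fc-devicemgmt-agent-mtls
  commonName: fc-devicemgmt-web.fc-devicemgmt.svc
  duration: 720h
  renewBefore: 240h
  usages:
    - client auth
  issuerRef:
    # The group distinguishes step-issuer kinds from built-in cert-manager
    # issuers such as ClusterIssuer.
    group: certmanager.step.sm
    kind: StepClusterIssuer
    name: step-ca-agent
```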


@@ -0,0 +1,81 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fc-devicemgmt-operator
  labels:
    app.kubernetes.io/name: fc-devicemgmt-operator
    app.kubernetes.io/component: operator
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
rules:
  - apiGroups:
      - devices.flowercore.io
    resources:
      - '*'
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
  - apiGroups:
      - devices.flowercore.io
    resources:
      - devices/status
      - devices/finalizers
      - devicegroups/status
      - devicegroups/finalizers
      - devicepolicies/status
      - devicepolicies/finalizers
      - remotecommands/status
      - remotecommands/finalizers
    verbs:
      - get
      - update
      - patch
  - apiGroups:
      - apps
    resources:
      - deployments
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - pods
      - services
      - configmaps
      - secrets
      - events
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
  - apiGroups:
      - networking.k8s.io
    resources:
      - networkpolicies
    verbs:
      - get
      - list
      - watch


@@ -0,0 +1,19 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fc-devicemgmt-operator
  labels:
    app.kubernetes.io/name: fc-devicemgmt-operator
    app.kubernetes.io/component: operator
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fc-devicemgmt-operator
subjects:
  - kind: ServiceAccount
    name: fc-devicemgmt-operator
    namespace: fc-devicemgmt


@@ -0,0 +1,109 @@
# FlowerCore.DeviceManagement Operator.
#
# KubeOps controller for devices.flowercore.io resources. Operator-created
# children must set OwnerReferences + traceability labels/annotations per
# k8s-pod-ownership-and-traceability-standard.md. RBAC below grants
# apps/deployments/get so the process can resolve its own Deployment UID.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fc-devicemgmt-operator
  namespace: fc-devicemgmt
  labels:
    app: fc-devicemgmt-operator
    app.kubernetes.io/name: fc-devicemgmt-operator
    app.kubernetes.io/component: operator
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
  annotations:
    flowercore.io/traceability-standard: k8s-pod-ownership-and-traceability-standard
spec:
  replicas: 1
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: fc-devicemgmt-operator
  template:
    metadata:
      labels:
        app: fc-devicemgmt-operator
        app.kubernetes.io/name: fc-devicemgmt-operator
        app.kubernetes.io/component: operator
        app.kubernetes.io/part-of: flowercore
        app.kubernetes.io/managed-by: argocd
        flowercore.io/tenant-id: system
        flowercore.io/created-by: bluejay-infra
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
        flowercore.io/audit-trace-id: "runtime-activity-trace"
    spec:
      serviceAccountName: fc-devicemgmt-operator
      securityContext:
        fsGroup: 1654
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: operator
          image: localhost/fc-devicemgmt-operator:v20260519-sp34cl3-fix
          imagePullPolicy: Never
          ports:
            - name: metrics
              containerPort: 8080
          env:
            - name: ASPNETCORE_ENVIRONMENT
              value: "Production"
            - name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT
              value: "false"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: FLOWERCORE_KUBERNETES_OWNER_DEPLOYMENT
              value: "fc-devicemgmt-operator"
            - name: FlowerCore__Service__Name
              value: "FlowerCore.DeviceManagement.Operator"
            - name: FlowerCore__DeviceManagement__DefaultTenantId
              value: "system"
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 20
            periodSeconds: 30
          securityContext:
            runAsNonRoot: true
            runAsUser: 1654
            runAsGroup: 1654
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: logs
              mountPath: /app/logs
      volumes:
        - name: tmp
          emptyDir: {}
        - name: logs
          emptyDir: {}
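
The header comment requires operator-created children to carry OwnerReferences back to the operator Deployment, which is why the ClusterRole grants `apps/deployments/get`. The following is a sketch of what such a child object could look like. The ConfigMap name is hypothetical, the `uid` is a placeholder that the operator would resolve at runtime, and the exact labels required by the traceability standard are not shown in this diff.

```yaml
# Sketch only: the child name is hypothetical and the uid is a placeholder.
# An ownerReference needs the owner's real UID, read from the live Deployment.
apiVersion: v1
kind: ConfigMap
metadata:
  name: device-sync-state
  namespace: fc-devicemgmt        # namespaced owners must share the namespace
  labels:
    flowercore.io/tenant-id: system
    flowercore.io/created-by: fc-devicemgmt-operator   # assumed label value
  ownerReferences:
    - apiVersion: apps/v1
      kind: Deployment
      name: fc-devicemgmt-operator
      uid: 00000000-0000-0000-0000-000000000000
data: {}
```

With the reference in place, deleting the operator Deployment lets the Kubernetes garbage collector remove the child, so no orphaned objects are left behind.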


@@ -0,0 +1,151 @@
# FlowerCore.DeviceManagement Web.
#
# Source repo is expected to ship FlowerCore.DeviceManagement.Web in a later
# Sprint 9+ lane. This manifest is static-valid without requiring the image to
# exist yet; import localhost/fc-devicemgmt-web:<tag> to all schedulable RKE2
# nodes before letting ArgoCD sync a live rollout.
#
# SCALED TO 0 — 2026-05-19 morning-routine cleanup.
# The Web pod cannot start until TWO upstream gaps close:
# 1. MySQL DB instance `flowercore_devicemgmt` (user `fc_devicemgmt`) is
# provisioned via fc-mysql Manager. The cluster currently has ZERO
# MySqlInstanceCrds and no `mysql.fc-mysql.svc:3306` Service, so the
# deployment-web container env `FlowerCore__Database__Host=mysql.fc-mysql.svc`
# points at nothing. Provision via the fc-mysql Manager UI/REST/MCP.
# 2. 1Password vault item `IAmWorkin/FlowerCore DeviceManagement Runtime`
# with 5 fields (DB-Password, mtls-ca.pem, mtls-client.crt, mtls-client.key,
# mtls-chain.pem) — see apps/fc-devicemgmt/1password-item.yaml. Mint mTLS
# from step-ca-agent ClusterIssuer per ADR-126; DB-Password must match the
# password configured for the MySQL user.
# Re-enable: change replicas back to 2 after both gaps close. The image tag
# in this file (v20260512-cx5) MAY also need a refresh — it predates the
# Sprint 34 Cl-3 operator fix; Web may have an analogous bug.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fc-devicemgmt-web
  namespace: fc-devicemgmt
  labels:
    app: fc-devicemgmt-web
    app.kubernetes.io/name: fc-devicemgmt-web
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
  annotations:
    flowercore.io/traceability-standard: k8s-pod-ownership-and-traceability-standard
spec:
  replicas: 0
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: fc-devicemgmt-web
  template:
    metadata:
      labels:
        app: fc-devicemgmt-web
        app.kubernetes.io/name: fc-devicemgmt-web
        app.kubernetes.io/component: web
        app.kubernetes.io/part-of: flowercore
        app.kubernetes.io/managed-by: argocd
        flowercore.io/tenant-id: system
        flowercore.io/created-by: bluejay-infra
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
        flowercore.io/audit-trace-id: "runtime-activity-trace"
    spec:
      securityContext:
        fsGroup: 1654
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: web
          image: localhost/fc-devicemgmt-web:v20260512-cx5
          imagePullPolicy: Never
          ports:
            - name: http
              containerPort: 8080
          env:
            - name: ASPNETCORE_URLS
              value: "http://+:8080"
            - name: ASPNETCORE_ENVIRONMENT
              value: "Production"
            - name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT
              value: "false"
            - name: FlowerCore__Service__Name
              value: "FlowerCore.DeviceManagement.Web"
            - name: FlowerCore__DeviceManagement__DefaultTenantId
              value: "system"
            - name: FlowerCore__Database__Provider
              value: "MySql"
            - name: FlowerCore__Database__Host
              value: "mysql.fc-mysql.svc"
            - name: FlowerCore__Database__Database
              value: "flowercore_devicemgmt"
            - name: FlowerCore__Database__User
              value: "fc_devicemgmt"
            - name: FlowerCore__Database__Password
              valueFrom:
                secretKeyRef:
                  name: fc-devicemgmt-runtime
                  key: DB-Password
            - name: FlowerCore__DeviceManagement__AgentMtls__CaPath
              value: "/secrets/devicemgmt-mtls/mtls-ca.pem"
            - name: FlowerCore__DeviceManagement__AgentMtls__ClientCertificatePath
              value: "/secrets/devicemgmt-mtls/mtls-client.crt"
            - name: FlowerCore__DeviceManagement__AgentMtls__ClientKeyPath
              value: "/secrets/devicemgmt-mtls/mtls-client.key"
            - name: FlowerCore__EventBus__Redis__Configuration
              value: "redis.fc-redis.svc:6379"
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 768Mi
          startupProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 30
          readinessProbe:
            tcpSocket:
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 3
          securityContext:
            runAsNonRoot: true
            runAsUser: 1654
            runAsGroup: 1654
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: logs
              mountPath: /app/logs
            - name: devicemgmt-mtls
              mountPath: /secrets/devicemgmt-mtls
              readOnly: true
      volumes:
        - name: tmp
          emptyDir: {}
        - name: logs
          emptyDir: {}
        - name: devicemgmt-mtls
          secret:
            secretName: fc-devicemgmt-runtime
            defaultMode: 0400


@@ -0,0 +1,55 @@
# LAN ingress for FlowerCore.DeviceManagement Web.
#
# RKE2 Traefik has no built-in ACME resolver configured. Keep TLS certificate
# ownership in cert-manager Certificate/fc-devicemgmt-web-tls.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: fc-devicemgmt-web
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt-web
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`devices.iamworkin.lan`)
      kind: Rule
      services:
        - name: fc-devicemgmt-web
          port: 80
  tls:
    secretName: fc-devicemgmt-web-tls
# Future public agent/update host gate (OFF by default):
#
# Do not enable `update.flowercore.io` here until Authentik OIDC Q-OIDC-1
# resolves the public-device-management auth model and route ownership with
# UpdateCenter. When enabled, use a separate public IngressRoute with an
# explicit Method allowlist, public-host auth middleware, and public TLS
# certificate strategy. Leaving this as comments keeps ArgoCD from stealing
# live UpdateCenter traffic.
#
# apiVersion: traefik.io/v1alpha1
# kind: IngressRoute
# metadata:
#   name: fc-devicemgmt-web-public
#   namespace: fc-devicemgmt
#   annotations:
#     flowercore.io/public-host-gate: "disabled-until-Q-OIDC-1"
# spec:
#   entryPoints:
#     - websecure
#   routes:
#     - match: Host(`update.flowercore.io`) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
#       kind: Rule
#       services:
#         - name: fc-devicemgmt-web
#           port: 80
#   tls:
#     secretName: fc-devicemgmt-public-tls


@@ -0,0 +1,13 @@
# FlowerCore.DeviceManagement namespace.
#
# ArgoCD discovers this directory as Application `infra-fc-devicemgmt`.
apiVersion: v1
kind: Namespace
metadata:
  name: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra


@@ -0,0 +1,224 @@
# FlowerCore.DeviceManagement NetworkPolicies.
#
# NetworkPolicies belong in bluejay-infra so ArgoCD owns rebuild state.
# Rules include Traefik post-DNAT backend ports per
# feedback_netpol_dnat_backend_port and Synology NFS egress for the requested
# cold-tier / future artifact path.
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: fc-devicemgmt-web-isolation
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt-web
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
spec:
  podSelector:
    matchLabels:
      app: fc-devicemgmt-web
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # LAN edge: only cluster Traefik should reach the Web pod for
    # devices.iamworkin.lan.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 8080
          protocol: TCP
    # Direct LAN diagnostics are allowed only from FlowerCore LAN/VPN ranges.
    - from:
        - ipBlock:
            cidr: 10.0.56.0/24
        - ipBlock:
            cidr: 10.0.57.0/24
        - ipBlock:
            cidr: 10.0.58.0/24
        - ipBlock:
            cidr: 10.0.68.0/27
      ports:
        - port: 8080
          protocol: TCP
  egress:
    # CoreDNS.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Database namespace.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: fc-mysql
      ports:
        - port: 3306
          protocol: TCP
    # Redis backplane for multi-replica SignalR / live-status fan-out.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: fc-redis
      ports:
        - port: 6379
          protocol: TCP
    # Traefik VIP / in-cluster Traefik for self-callbacks and public URL
    # generation tests. Include post-DNAT backend ports 8443 + 8080.
    - to:
        - ipBlock:
            cidr: 10.0.56.200/32
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 8080
          protocol: TCP
        - port: 8443
          protocol: TCP
    # Agent egress: LAN/VPN devices may run DM Agent in Generic, Kiosk, Pi,
    # ThinClient, or Server mode. Keep this private-range only.
    - to:
        - ipBlock:
            cidr: 10.0.56.0/24
        - ipBlock:
            cidr: 10.0.57.0/24
        - ipBlock:
            cidr: 10.0.58.0/24
        - ipBlock:
            cidr: 10.0.68.0/27
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 8080
          protocol: TCP
        - port: 8443
          protocol: TCP
        - port: 5000
          protocol: TCP
        - port: 5001
          protocol: TCP
    # Synology NFS cold-tier / artifact mount allowance.
    - to:
        - ipBlock:
            cidr: 10.0.58.3/32
      ports:
        - port: 2049
          protocol: TCP
        - port: 2049
          protocol: UDP
        - port: 111
          protocol: TCP
        - port: 111
          protocol: UDP
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: fc-devicemgmt-operator-isolation
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt-operator
    app.kubernetes.io/component: operator
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
spec:
  podSelector:
    matchLabels:
      app: fc-devicemgmt-operator
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - port: 8080
          protocol: TCP
  egress:
    # CoreDNS.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Kubernetes API for KubeOps reconciliation and Deployment UID lookup.
    - to: []
      ports:
        - port: 443
          protocol: TCP
        - port: 6443
          protocol: TCP
    # Agent egress for operator-initiated probes / fallback command dispatch.
    - to:
        - ipBlock:
            cidr: 10.0.56.0/24
        - ipBlock:
            cidr: 10.0.57.0/24
        - ipBlock:
            cidr: 10.0.58.0/24
        - ipBlock:
            cidr: 10.0.68.0/27
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 8080
          protocol: TCP
        - port: 8443
          protocol: TCP
        - port: 5000
          protocol: TCP
        - port: 5001
          protocol: TCP
    # Synology NFS allowance for future cold-tier/audit archival jobs.
    - to:
        - ipBlock:
            cidr: 10.0.58.3/32
      ports:
        - port: 2049
          protocol: TCP
        - port: 2049
          protocol: UDP
        - port: 111
          protocol: TCP
        - port: 111
          protocol: UDP
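
The header comment refers to "post-DNAT backend ports". Traffic sent to a Service port is rewritten to the backing pod's `targetPort`, and with many CNIs the egress rule is evaluated against that rewritten port, so a policy that only lists 80/443 can still drop the connection. The Service below is a sketch of the kind of mapping that makes this necessary; the Service name and the specific `targetPort` values are assumptions, not taken from the live Traefik install.

```yaml
# Sketch only: the name and port mapping are assumptions for illustration.
# A client dials 443, but after DNAT the packet targets the pod on 8443,
# so the NetworkPolicy egress rule lists both the front and backend ports.
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: traefik-system
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: traefik
  ports:
    - name: web
      port: 80
      targetPort: 8080
      protocol: TCP
    - name: websecure
      port: 443
      targetPort: 8443
      protocol: TCP
```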


@@ -0,0 +1,22 @@
apiVersion: v1
kind: Service
metadata:
  name: fc-devicemgmt-web
  namespace: fc-devicemgmt
  labels:
    app: fc-devicemgmt-web
    app.kubernetes.io/name: fc-devicemgmt-web
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
spec:
  type: ClusterIP
  selector:
    app: fc-devicemgmt-web
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP


@@ -0,0 +1,12 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fc-devicemgmt-operator
  namespace: fc-devicemgmt
  labels:
    app.kubernetes.io/name: fc-devicemgmt-operator
    app.kubernetes.io/component: operator
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra


@@ -118,7 +118,7 @@ spec:
# dotnet.exe publish -c Release -o deploy/app \
# src/FlowerCore.Distribution.Web/FlowerCore.Distribution.Web.csproj
# podman build -t localhost/fc-distribution:v<tag> -f deploy/Dockerfile.deploy deploy
image: localhost/fc-distribution:v202604240010
image: localhost/fc-distribution:v202605061948
imagePullPolicy: Never
ports:
- containerPort: 8080
@@ -151,6 +151,10 @@ spec:
value: "/signing/aistation-field/chain.pem"
- name: FlowerCore__Distribution__Signing__EditionCerts__aistation-field__KeyPath
value: "/signing/aistation-field/private-key.pem"
# Public distribution host is GET/HEAD-only at Traefik; this
# entitlement list controls which editions are readable there.
- name: FlowerCore__Distribution__EntitlementPublic__PublicEditions__0
value: "*"
resources:
requests:
cpu: 100m
@@ -262,8 +266,12 @@ spec:
kind: ClusterIssuer
dnsNames:
- dist.iamworkin.lan
duration: 2160h # 90d
renewBefore: 720h # 30d
# step-ca ACME caps certificate lifetime at 30d. The earlier 90d request
# was silently capped to 30d, so renewBefore (720h) equaled the actual
# lifetime and cert-manager renewed in a perpetual loop (10880+ CRs in 18h
# on 2026-05-07). Match the working 720h/240h pattern from other FC
# services.
duration: 720h # 30d (step-ca cap)
renewBefore: 240h # 10d
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute

apps/fc-redis/fc-redis.yaml Normal file

@@ -0,0 +1,171 @@
# fc-redis — SignalR backplane for cross-product event bus
#
# Lands per Q-SO-1 resolution (2026-05-11 PM): SignalR backplane in Phase A,
# not Phase C as originally drafted. Operator directive: "Redis can be
# deployed just fine as it's another FlowerCore technology we'll want to
# manage."
#
# Phase A scope (this file):
# - Single Redis 7.x Alpine pod
# - 1Gi Longhorn RWO PVC for AOF persistence
# - ClusterIP Service at `redis.fc-redis.svc.cluster.local:6379`
# - No AUTH (in-cluster only; not exposed externally)
# - No IngressRoute (backplane is server-to-server only)
#
# Consumers (Phase A IMPL across FC services):
# - FlowerCore.Signage.Web (OpsConsoleHub)
# - FlowerCore.Scoreboard.Web (ScoreboardHub)
# - FlowerCore.SignalControl.Web
# - FlowerCore.DMS.Web
# - Any other product joining the cross-product event bus
#
# Each consumer adds:
#   services.AddSignalR()
#       .AddStackExchangeRedis(
#           "redis.fc-redis.svc.cluster.local:6379",
#           opts => opts.Configuration.ChannelPrefix =
#               StackExchange.Redis.RedisChannel.Literal("fc-opsconsole"));
#
# Phase B / C follow-ons (out of scope here):
# - Redis Sentinel for HA (3-node)
# - AUTH password from 1Password Connect (rotate via /rotate-password)
# - redis_exporter sidecar for Prometheus scrape
# - Network policies restricting which namespaces can dial 6379
#
# Design: docs/signage/operations-console-phase-2-design.md §3.5
# Decision: Q-SO-1 (RESOLVED 2026-05-11 PM)
# Memory: feedback_blooming_ui_pattern_no_iframes
---
apiVersion: v1
kind: Namespace
metadata:
  name: fc-redis
  labels:
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fc-redis-data
  namespace: fc-redis
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fc-redis-config
  namespace: fc-redis
data:
  redis.conf: |
    # Phase A — minimal config; no AUTH, no replication.
    bind 0.0.0.0
    protected-mode no
    port 6379
    tcp-backlog 511
    timeout 0
    tcp-keepalive 300
    # Persistence: AOF (fsync every second is the standard SignalR-backplane
    # durability sweet spot — the backplane only needs to survive Redis
    # restarts, not absolute zero loss).
    appendonly yes
    appendfsync everysec
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb
    # Reasonable defaults — let Redis pick most things.
    maxmemory-policy allkeys-lru
    maxmemory 256mb
    # Logging
    loglevel notice
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fc-redis
  namespace: fc-redis
  labels:
    app: fc-redis
spec:
  replicas: 1
  strategy:
    type: Recreate # RWO PVC; do not do rolling update
  selector:
    matchLabels:
      app: fc-redis
  template:
    metadata:
      labels:
        app: fc-redis
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999 # redis:7-alpine default uid
        runAsGroup: 999
        fsGroup: 999
      containers:
        - name: redis
          image: redis:7-alpine
          imagePullPolicy: IfNotPresent
          command: ["redis-server", "/etc/redis/redis.conf"]
          ports:
            - name: redis
              containerPort: 6379
          resources:
            requests:
              cpu: "50m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "384Mi"
          volumeMounts:
            - name: data
              mountPath: /data
            - name: config
              mountPath: /etc/redis
              readOnly: true
          livenessProbe:
            tcpSocket:
              port: 6379
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 2
            periodSeconds: 5
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: [ALL]
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: fc-redis-data
        - name: config
          configMap:
            name: fc-redis-config
---
apiVersion: v1
kind: Service
metadata:
name: redis
namespace: fc-redis
spec:
type: ClusterIP
selector:
app: fc-redis
ports:
- name: redis
port: 6379
targetPort: 6379
protocol: TCP

@@ -0,0 +1,14 @@
# fc-signage-appletv

Apple TV signage is a sealed appliance running the `FlowerCore.Signage.Agent.AppleTv` tvOS app per ADR-134.

This ApplicationSet entry is documentation and inventory metadata only. It intentionally creates no `Deployment`, `Service`, or `Pod`.

The Apple TV app connects outbound to existing FC.Signage.Web surfaces:

- `https://signage.iamworkin.lan/hub/signage` for SignalR live status.
- `GET /api/v1/nodes/{nodeId}/state` for the 30 second polling fallback.
- `POST /api/v1/nodes/register` and `POST /api/v1/nodes/{nodeId}/enroll` for pairing and mTLS enrollment.
- `POST /api/v1/nodes/{nodeId}/heartbeat` for metrics, current content identity, and local audit excerpts.

Distribution is via Apple Developer Enterprise Program or TestFlight plus FC.Distribution / UpdateCenter publishing once Apple credentials are available.
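
The outbound surface listed above can be sketched as a small URL table. This is an illustration only, written in Python rather than taken from the tvOS app. It assumes the REST paths are served from the same host as the SignalR hub (the README gives them as relative paths) and that state polling runs only while SignalR is disconnected. `node_endpoints` and `next_poll_delay` are hypothetical helper names.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote

# Assumption: REST endpoints share the host of the documented SignalR hub URL.
BASE_URL = "https://signage.iamworkin.lan"


def node_endpoints(node_id: str, base_url: str = BASE_URL) -> Dict[str, Tuple[str, str]]:
    """Map each documented operation to its (HTTP method, URL)."""
    node = quote(node_id, safe="")  # keep unusual ids from breaking the path
    api = f"{base_url}/api/v1/nodes"
    return {
        "register": ("POST", f"{api}/register"),
        "enroll": ("POST", f"{api}/{node}/enroll"),
        "state": ("GET", f"{api}/{node}/state"),
        "heartbeat": ("POST", f"{api}/{node}/heartbeat"),
    }


def next_poll_delay(signalr_connected: bool, interval_seconds: float = 30.0) -> Optional[float]:
    """Seconds until the next state poll, or None while SignalR is live.

    Assumption: polling is a fallback, so it is skipped when the hub is connected.
    """
    return None if signalr_connected else interval_seconds
```

A caller would look up `node_endpoints(node_id)["state"]` and schedule the request only when `next_poll_delay(...)` returns a number.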

@@ -0,0 +1,5 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- manifest.yaml

@@ -0,0 +1,26 @@
# Apple TV signage is a sealed tvOS appliance. This ArgoCD app intentionally
# carries documentation metadata only; no Deployment, Service, or Pod resources
# are created for the player.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fc-signage-appletv-docs
  namespace: fc-signage
  labels:
    app.kubernetes.io/name: fc-signage-appletv
    app.kubernetes.io/part-of: flowercore-signage
    flowercore.io/manifest-kind: docs-only
data:
  README: |
    FlowerCore.Signage.Agent.AppleTv is distributed through Apple Developer
    Enterprise Program or TestFlight, not Kubernetes.
    The app connects outbound to FC.Signage.Web:
    - SignalR: https://signage.iamworkin.lan/hub/signage
    - Polling fallback: GET /api/v1/nodes/{nodeId}/state
    - Enrollment: POST /api/v1/nodes/{nodeId}/enroll
    - Heartbeat: POST /api/v1/nodes/{nodeId}/heartbeat
    This placeholder gives ArgoCD and inventory dashboards a first-class
    Apple TV signage app entry without creating runtime pods.

@@ -0,0 +1,17 @@
# FlowerCore Signage Pi Player

Phase 1 Raspberry Pi signage player packaging for Chromium kiosk deployments.
This bundle is intentionally air-gap friendly: systemd units, shell scripts,
udev rules, and Chromium managed policy are all checked into the repo and are
installed by `FlowerCore.Puppet`.

## Scope

- Bootstrap a stable node identity and mTLS client certificate.
- Launch Chromium in kiosk mode against `FC.Signage.Web` player routes.
- Restart the kiosk on HDMI hotplug.
- Renew mTLS certificates daily when fewer than 30 days remain.
- Detect display capabilities at boot, daily, and on HDMI hotplug.

Phase 2 native Avalonia rendering is documented separately in Notes and remains
deferred.
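
The 30-day renewal rule from the scope list can be illustrated as follows. This is a minimal sketch, not the packaged implementation (the bundle performs the check in shell). It assumes the expiry is available in the `notAfter` text form that `openssl x509 -enddate` prints, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RENEW_WINDOW = timedelta(days=30)  # "fewer than 30 days remain"


def parse_openssl_not_after(value: str) -> datetime:
    """Parse text such as 'Jun  1 12:00:00 2026 GMT' into an aware UTC datetime.

    Assumption: English month abbreviations (the C locale).
    """
    parsed = datetime.strptime(value.strip(), "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def should_renew(not_after: datetime, now: Optional[datetime] = None) -> bool:
    """True when the certificate has fewer than 30 days of validity left."""
    current = now if now is not None else datetime.now(timezone.utc)
    return not_after - current < RENEW_WINDOW
```

Running the check once a day keeps a healthy certificate untouched until it enters the window, after which every daily run attempts renewal until one succeeds.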

@@ -0,0 +1,15 @@
{
"AutofillAddressEnabled": false,
"AutofillCreditCardEnabled": false,
"PasswordManagerEnabled": false,
"BrowserSignin": 0,
"MetricsReportingEnabled": false,
"SafeBrowsingProtectionLevel": 0,
"DefaultNotificationsSetting": 2,
"DefaultPopupsSetting": 2,
"BackgroundModeEnabled": false,
"DefaultBrowserSettingEnabled": false,
"PromotionalTabsEnabled": false,
"CommandLineFlagSecurityWarningsEnabled": false,
"ExtensionInstallBlocklist": ["*"]
}

@@ -0,0 +1,132 @@
#!/usr/bin/env bash
set -euo pipefail
NODE_JSON="/etc/flowercore/signage-node.json"
CERT_DIR="/etc/fc-signage-player"
SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
NODE_ID=$(jq -r '.nodeId' "$NODE_JSON")
CONNECTORS=()
for dir in /sys/class/drm/card*-HDMI-A-*; do
[[ -e "$dir/status" ]] || continue
if [[ "$(cat "$dir/status")" == "connected" ]]; then
CONNECTORS+=("$(basename "$dir")")
fi
done
if [[ ${#CONNECTORS[@]} -eq 0 ]]; then
CAPABILITIES_JSON=$(jq -n --arg id "$NODE_ID" '{
nodeId: $id,
platform: "linux-arm64-pi",
displayConnected: false,
detectedAt: (now | todate),
note: "No HDMI display detected"
}')
else
PRIMARY="${CONNECTORS[0]}"
EDID_PATH="/sys/class/drm/${PRIMARY}/edid"
WIDTH=0
HEIGHT=0
REFRESH=60
HDR=false
AUDIO_HDMI=false
MFG=""
MODEL=""
PHYSICAL_SIZE=null
if [[ -s "$EDID_PATH" ]] && command -v edid-decode >/dev/null 2>&1; then
EDID_INFO=$(edid-decode < "$EDID_PATH" 2>/dev/null || true)
MFG=$(echo "$EDID_INFO" | grep -m1 -oP 'Manufacturer:\s*\K\S+' || true)
MODEL=$(echo "$EDID_INFO" | grep -m1 -oP 'Model:\s*\K\S+' || true)
PREF=$(echo "$EDID_INFO" | grep -m1 -oP '\d+x\d+\s*@\s*\d+(?:\.\d+)?\s*Hz' || true)
if [[ -n "$PREF" ]]; then
WIDTH=$(echo "$PREF" | grep -oP '^\d+')
HEIGHT=$(echo "$PREF" | grep -oP 'x\K\d+')
REFRESH=$(echo "$PREF" | grep -oP '@\s*\K[\d.]+' | cut -d. -f1)
fi
if echo "$EDID_INFO" | grep -qiE 'HDR (Static|Dynamic) Metadata Block'; then HDR=true; fi
if echo "$EDID_INFO" | grep -qiE 'CEA Audio Block|Audio Format Descriptor'; then AUDIO_HDMI=true; fi
PH_W=$(echo "$EDID_INFO" | grep -m1 -oP 'Maximum image size:\s*\K\d+\s*cm\s*x\s*\d+' || true)
if [[ -n "$PH_W" ]]; then
PH_CM_W=$(echo "$PH_W" | grep -oP '^\d+')
PH_CM_H=$(echo "$PH_W" | grep -oP 'x\s*\K\d+')
if (( PH_CM_W > 0 && PH_CM_H > 0 )); then
PHYSICAL_SIZE=$(awk -v w="$PH_CM_W" -v h="$PH_CM_H" 'BEGIN { printf "%.1f", sqrt(w*w + h*h)/2.54 }')
fi
fi
fi
if [[ "$WIDTH" == "0" ]] && command -v kmsprint >/dev/null 2>&1; then
KMS=$(kmsprint 2>/dev/null | grep -A2 "$PRIMARY" | grep -oP '\d+x\d+' | head -1 || true)
if [[ -n "$KMS" ]]; then
WIDTH=$(echo "$KMS" | grep -oP '^\d+')
HEIGHT=$(echo "$KMS" | grep -oP 'x\K\d+')
fi
fi
AUDIO_ALSA=false
if aplay -l 2>/dev/null | grep -qi 'card.*HDMI'; then AUDIO_ALSA=true; fi
HAS_AUDIO=false
if [[ "$AUDIO_HDMI" == "true" && "$AUDIO_ALSA" == "true" ]]; then HAS_AUDIO=true; fi
CAPABILITIES_JSON=$(jq -n \
--arg id "$NODE_ID" \
--argjson w "$WIDTH" \
--argjson h "$HEIGHT" \
--argjson r "$REFRESH" \
--argjson hdr "$HDR" \
--argjson audio "$HAS_AUDIO" \
--arg connector "$PRIMARY" \
--arg mfg "$MFG" \
--arg model "$MODEL" \
--argjson size "$PHYSICAL_SIZE" \
'{
nodeId: $id,
platform: "linux-arm64-pi",
displayConnected: true,
detectedAt: (now | todate),
hardware: {
maxResolution: { width: $w, height: $h },
nativeResolution: { width: $w, height: $h },
refreshRateHz: $r,
colorDepth: ($hdr | if . then "Color30Hdr" else "Color24" end),
hasAudioOutput: $audio,
audioChannelCount: ($audio | if . then 2 else 0 end),
physicalSizeInches: $size,
connector: $connector,
manufacturer: $mfg,
modelName: $model
},
render: { codecs: ["h264", "vp9", "mp4"] }
}')
fi
ENDPOINT_CANDIDATES=(
"${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/capabilities"
"${SIGNAGE_URL}/api/v1/displays/${NODE_ID}/capability-profile"
)
SUCCESS=false
for url in "${ENDPOINT_CANDIDATES[@]}"; do
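# curl -w already prints 000 when the request fails, so only swallow the exit
# code here; appending another "000" would log a status of 000000.
HTTP_STATUS=$(curl -sk -o /tmp/cap-response.json -w "%{http_code}" \
--max-time 10 \
--cert "$CERT_DIR/client.crt" --key "$CERT_DIR/client.key" \
-X POST "$url" \
-H "Content-Type: application/json" \
-d "$CAPABILITIES_JSON" || true)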
if [[ "$HTTP_STATUS" == "200" || "$HTTP_STATUS" == "201" || "$HTTP_STATUS" == "204" ]]; then
SUCCESS=true
break
fi
done
mkdir -p /var/log/fc-signage-player
if [[ "$SUCCESS" != "true" ]]; then
echo "[$(date -Is)] capability declare: no endpoint accepted the profile; logging locally" \
| tee -a /var/log/fc-signage-player/capabilities.log
echo "$CAPABILITIES_JSON" | tee -a /var/log/fc-signage-player/capabilities.log
else
echo "[$(date -Is)] capability declare: ok ($url)" | tee -a /var/log/fc-signage-player/capabilities.log
fi
echo "$CAPABILITIES_JSON"

@@ -0,0 +1,144 @@
#!/usr/bin/env bash
set -euo pipefail
NODE_JSON="/etc/flowercore/signage-node.json"
CERT_DIR="/etc/fc-signage-player"
SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
SETUP_CODE_FILE="/etc/flowercore/signage-setup-code"
mkdir -p /etc/flowercore "$CERT_DIR" /var/log/fc-signage-player
chown fc-signage:fc-signage /etc/flowercore "$CERT_DIR" /var/log/fc-signage-player
chmod 0750 "$CERT_DIR"
if [[ -s "$NODE_JSON" && -s "$CERT_DIR/client.p12" ]]; then
ENROLLED=$(jq -r '.enrolledAt // empty' "$NODE_JSON")
if [[ -n "$ENROLLED" ]]; then
echo "[$(date -Is)] bootstrap: already enrolled at $ENROLLED; skipping"
exit 0
fi
fi
if [[ -s "$NODE_JSON" ]]; then
NODE_UUID=$(jq -r '.nodeUuid // empty' "$NODE_JSON")
MACHINE_ID=$(jq -r '.machineId // empty' "$NODE_JSON")
else
NODE_UUID=$(uuidgen)
MACHINE_ID=$(echo "$NODE_UUID" | tr -d '-' | cut -c1-16)
jq -n --arg uuid "$NODE_UUID" --arg machine "$MACHINE_ID" --arg host "$(hostname -f)" --arg ts "$(date -Is)" \
'{nodeUuid: $uuid, machineId: $machine, hostname: $host, platform: "linux-arm64-pi", createdAt: $ts}' \
> "$NODE_JSON"
chmod 0640 "$NODE_JSON"
chown fc-signage:fc-signage "$NODE_JSON"
fi
SETUP_CODE=""
if [[ -s "$SETUP_CODE_FILE" ]]; then
SETUP_CODE=$(tr -d '\r\n\t ' < "$SETUP_CODE_FILE")
fi
MODEL=$(tr -d '\0' < /sys/firmware/devicetree/base/model 2>/dev/null || echo Unknown)
REG_PAYLOAD=$(jq -n \
--arg machine "$MACHINE_ID" \
--arg name "$(hostname -f)" \
--arg setup "$SETUP_CODE" \
--arg resolution "1920x1080" \
--arg model "$MODEL" \
'{
machineId: $machine,
name: $name,
setupCode: ($setup | if . == "" then null else . end),
resolution: $resolution,
hardwareModel: $model,
platform: "linux-arm64-pi"
}')
for attempt in 1 2; do
# curl -w already prints 000 on a failed request; only swallow the exit code.
HTTP_STATUS=$(curl -sk -o /tmp/register-response.json -w "%{http_code}" \
--max-time 15 \
-X POST "${SIGNAGE_URL}/api/v1/nodes/register" \
-H "Content-Type: application/json" \
-d "$REG_PAYLOAD" || true)
if [[ "$HTTP_STATUS" == "200" || "$HTTP_STATUS" == "201" ]]; then
break
fi
echo "[$(date -Is)] bootstrap: register attempt $attempt returned $HTTP_STATUS" >&2
sleep 5
done
if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "201" ]]; then
echo "[$(date -Is)] bootstrap: register failed after 2 attempts" >&2
exit 2
fi
NODE_ID=$(jq -r '.nodeId // empty' /tmp/register-response.json)
if [[ -z "$NODE_ID" ]]; then
echo "[$(date -Is)] bootstrap: register response did not include nodeId" >&2
exit 2
fi
jq --arg id "$NODE_ID" '.nodeId = $id' "$NODE_JSON" > "${NODE_JSON}.tmp" && mv "${NODE_JSON}.tmp" "$NODE_JSON"
if [[ -s "$SETUP_CODE_FILE" ]]; then
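# Build the body with jq so a setup code containing quotes or backslashes
# still produces valid JSON.
curl -sk --max-time 15 -X POST "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/approve-via-setup-code" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg code "$SETUP_CODE" '{setupCode: $code}')" \
-o /dev/null || true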
fi
STATUS=""
DEADLINE=$(( $(date +%s) + 1800 ))
while (( $(date +%s) < DEADLINE )); do
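# "|| true" keeps a transient curl or jq failure from aborting the whole
# bootstrap under "set -euo pipefail"; the loop simply polls again.
STATUS=$(curl -sk --max-time 5 "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/status" | jq -r '.status // empty' || true)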
if [[ "$STATUS" == "Approved" || "$STATUS" == "Enrolled" || "$STATUS" == "Online" ]]; then
break
fi
sleep 15
done
if [[ "$STATUS" != "Approved" && "$STATUS" != "Enrolled" && "$STATUS" != "Online" ]]; then
echo "[$(date -Is)] bootstrap: approval not granted within 30min budget" >&2
exit 3
fi
KEY_PATH="${CERT_DIR}/client.key"
CSR_PATH="${CERT_DIR}/client.csr"
openssl ecparam -genkey -name prime256v1 -out "$KEY_PATH"
openssl req -new -key "$KEY_PATH" -out "$CSR_PATH" \
-subj "/CN=${NODE_ID}/O=FlowerCore/OU=SignagePlayer-Pi"
ENROLL_PAYLOAD=$(jq -n --arg csr "$(cat "$CSR_PATH")" '{certificateSigningRequest: $csr}')
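# "|| true" lets a connection failure reach the error message below instead of
# exiting silently under "set -e".
HTTP_STATUS=$(curl -sk -o /tmp/enroll-response.json -w "%{http_code}" \
--max-time 15 \
-X POST "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/enroll" \
-H "Content-Type: application/json" \
-d "$ENROLL_PAYLOAD" || true)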
if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "201" ]]; then
echo "[$(date -Is)] bootstrap: enroll failed with HTTP $HTTP_STATUS" >&2
exit 4
fi
jq -r '.clientCertificatePem // .signedCertificatePem' /tmp/enroll-response.json > "${CERT_DIR}/client.crt"
jq -r '.caCertificatePem' /tmp/enroll-response.json > "${CERT_DIR}/ca-chain.pem"
P12_PASS=$(openssl rand -hex 24)
echo -n "$P12_PASS" > "${CERT_DIR}/client.p12.pass"
chmod 0600 "${CERT_DIR}/client.p12.pass"
openssl pkcs12 -export \
-inkey "$KEY_PATH" \
-in "${CERT_DIR}/client.crt" \
-certfile "${CERT_DIR}/ca-chain.pem" \
-out "${CERT_DIR}/client.p12" \
-password "pass:${P12_PASS}"
chown fc-signage:fc-signage "${CERT_DIR}"/* "$NODE_JSON"
chmod 0640 "${CERT_DIR}/client.p12" "${CERT_DIR}/client.crt" "${CERT_DIR}/ca-chain.pem" "$KEY_PATH"
chmod 0600 "${CERT_DIR}/client.p12.pass"
EXPIRY=$(openssl x509 -in "${CERT_DIR}/client.crt" -enddate -noout | sed 's/notAfter=//')
jq --arg ts "$(date -Is)" --arg exp "$EXPIRY" \
'.enrolledAt = $ts | .certExpiry = $exp' "$NODE_JSON" > "${NODE_JSON}.tmp" \
&& mv "${NODE_JSON}.tmp" "$NODE_JSON"
systemctl start flowercore-signage-detect-display.service || true
systemctl start flowercore-signage-player-pi.service || true
echo "[$(date -Is)] bootstrap: enrolled and kiosk started (NodeId=${NODE_ID})"

@@ -0,0 +1,6 @@
#!/usr/bin/env bash
set -euo pipefail
sleep 2
systemctl start flowercore-signage-detect-display.service || true
systemctl restart flowercore-signage-player-pi.service

@@ -0,0 +1,44 @@
#!/usr/bin/env bash
set -euo pipefail
NODE_JSON="/etc/flowercore/signage-node.json"
NODE_ID=$(jq -r '.nodeId' "$NODE_JSON")
SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
CERT_DIR="/etc/fc-signage-player"
CERT_THUMB=$(openssl pkcs12 -in "$CERT_DIR/client.p12" -passin file:"$CERT_DIR/client.p12.pass" -nodes -nokeys 2>/dev/null \
| openssl x509 -fingerprint -sha256 -noout \
| sed 's/.*=//' \
| tr -d ':')
PLAYER_URL="${SIGNAGE_URL}/player/${NODE_ID}/embed?token=${CERT_THUMB}"
# curl -w already prints 000 on a failed request; only swallow the exit code.
HTTP_STATUS=$(curl -sk -o /dev/null -w "%{http_code}" --max-time 5 \
--cert-type P12 --cert "$CERT_DIR/client.p12:$(cat "$CERT_DIR/client.p12.pass")" \
"$PLAYER_URL" || true)
mkdir -p /var/log/fc-signage-player
if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "301" && "$HTTP_STATUS" != "302" ]]; then
echo "[$(date -Is)] /embed returned $HTTP_STATUS; falling back to /player/${NODE_ID}" \
>> /var/log/fc-signage-player/url-divergence.log
PLAYER_URL="${SIGNAGE_URL}/player/${NODE_ID}?token=${CERT_THUMB}"
fi
exec chromium-browser \
--kiosk \
--noerrdialogs \
--disable-infobars \
--disable-translate \
--disable-features=TranslateUI,InfiniteSessionRestore \
--autoplay-policy=no-user-gesture-required \
--password-store=basic \
--user-data-dir=/var/lib/fc-signage-player/profile \
--disk-cache-dir=/var/lib/fc-signage-player/cache \
--disk-cache-size=104857600 \
--no-first-run \
--no-default-browser-check \
--check-for-update-interval=2592000 \
--enable-features=OverlayScrollbar \
--start-fullscreen \
--window-position=0,0 \
--window-size=1920,1080 \
"$PLAYER_URL"

@@ -0,0 +1,20 @@
#!/usr/bin/env bash
set -euo pipefail
mkdir -p /var/log/fc-signage-player
for f in /etc/flowercore/signage-node.json /etc/fc-signage-player/client.p12 /etc/fc-signage-player/client.p12.pass; do
if [[ ! -r "$f" ]]; then
echo "[$(date -Is)] prelaunch: missing or unreadable $f" >&2
exit 1
fi
done
# Warn, without blocking launch, when the client cert is within 7 days of expiry.
if ! openssl pkcs12 -in /etc/fc-signage-player/client.p12 -passin file:/etc/fc-signage-player/client.p12.pass -nokeys -clcerts 2>/dev/null \
| openssl x509 -checkend $((7*24*3600)) -noout; then
echo "[$(date -Is)] prelaunch: client cert expires within 7 days" >&2
fi
echo "[$(date -Is)] prelaunch: ok" | tee -a /var/log/fc-signage-player/prelaunch.log

@@ -0,0 +1,46 @@
#!/usr/bin/env bash
set -euo pipefail
CERT_DIR="/etc/fc-signage-player"
NODE_JSON="/etc/flowercore/signage-node.json"
SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
[[ -s "$CERT_DIR/client.crt" ]] || { echo "no cert to renew"; exit 0; }
if openssl x509 -in "$CERT_DIR/client.crt" -checkend $((30*24*3600)) -noout; then
exit 0
fi
NODE_ID=$(jq -r '.nodeId' "$NODE_JSON")
NEW_KEY="$CERT_DIR/client.key.new"
NEW_CSR="$CERT_DIR/client.csr.new"
openssl ecparam -genkey -name prime256v1 -out "$NEW_KEY"
openssl req -new -key "$NEW_KEY" -out "$NEW_CSR" \
-subj "/CN=${NODE_ID}/O=FlowerCore/OU=SignagePlayer-Pi"
# "|| true" lets a connection failure reach the "leaving old cert in place"
# message below instead of exiting silently under "set -e".
HTTP_STATUS=$(curl -sk -o /tmp/renew-response.json -w "%{http_code}" \
--cert "$CERT_DIR/client.crt" --key "$CERT_DIR/client.key" \
-X POST "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/renew" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg csr "$(cat "$NEW_CSR")" '{certificateSigningRequest: $csr}')" || true)
if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "201" ]]; then
echo "[$(date -Is)] renew: failed HTTP $HTTP_STATUS; leaving old cert in place" >&2
exit 5
fi
jq -r '.clientCertificatePem // .signedCertificatePem' /tmp/renew-response.json > "$CERT_DIR/client.crt.new"
jq -r '.caCertificatePem' /tmp/renew-response.json > "$CERT_DIR/ca-chain.pem.new"
P12_PASS=$(cat "$CERT_DIR/client.p12.pass")
openssl pkcs12 -export -inkey "$NEW_KEY" -in "$CERT_DIR/client.crt.new" \
-certfile "$CERT_DIR/ca-chain.pem.new" \
-out "$CERT_DIR/client.p12.new" -password "pass:${P12_PASS}"
mv "$CERT_DIR/client.key.new" "$CERT_DIR/client.key"
mv "$CERT_DIR/client.crt.new" "$CERT_DIR/client.crt"
mv "$CERT_DIR/ca-chain.pem.new" "$CERT_DIR/ca-chain.pem"
mv "$CERT_DIR/client.p12.new" "$CERT_DIR/client.p12"
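# Match the ownership and modes that bootstrap applies; files written above
# would otherwise keep the default umask and miss ca-chain.pem entirely.
chown fc-signage:fc-signage "$CERT_DIR"/client.* "$CERT_DIR/ca-chain.pem"
chmod 0640 "$CERT_DIR/client.key" "$CERT_DIR/client.crt" "$CERT_DIR/client.p12" "$CERT_DIR/ca-chain.pem"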
systemctl restart flowercore-signage-player-pi.service

@@ -0,0 +1,2 @@
# Settle DRM for 2s, redeclare display capabilities, then restart Chromium.
SUBSYSTEM=="drm", KERNEL=="card?-HDMI-A-?", ACTION=="change", RUN+="/usr/bin/systemctl start flowercore-signage-player-pi-hdmi.service"

@@ -0,0 +1,16 @@
[Unit]
Description=FlowerCore Signage Pi: first-boot identity + mTLS enrollment
Wants=network-online.target
After=network-online.target
Before=flowercore-signage-player-pi.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/flowercore-signage-bootstrap.sh
RemainAfterExit=yes
StandardOutput=journal
StandardError=journal
TimeoutStartSec=2100
[Install]
WantedBy=multi-user.target

@@ -0,0 +1,8 @@
[Unit]
Description=FlowerCore Signage Pi: detect connected display + declare capabilities
After=flowercore-signage-bootstrap.service
[Service]
Type=oneshot
User=fc-signage
ExecStart=/usr/local/bin/fc-signage-detect-display

@@ -0,0 +1,11 @@
[Unit]
Description=Daily FlowerCore Signage Pi display capability redeclaration
[Timer]
OnCalendar=daily
RandomizedDelaySec=1h
Persistent=true
OnBootSec=30s
[Install]
WantedBy=timers.target

@@ -0,0 +1,7 @@
[Unit]
Description=FlowerCore Signage Pi Player HDMI hotplug responder
DefaultDependencies=no
[Service]
Type=oneshot
ExecStart=/usr/local/bin/flowercore-signage-hdmi-respond.sh

@@ -0,0 +1,30 @@
[Unit]
Description=FlowerCore Digital Signage Pi Player (Chromium kiosk)
Documentation=https://github.com/astoltz/FlowerCore.Notes/blob/master/docs/standards/appletv-pi-signage-agents-design.md
Wants=network-online.target
After=network-online.target graphical.target
ConditionPathExists=/etc/flowercore/signage-node.json
ConditionPathExists=/etc/fc-signage-player/client.p12
# Start-rate limiting belongs in [Unit]; systemd does not recognize
# StartLimitIntervalSec= under [Service].
StartLimitBurst=5
StartLimitIntervalSec=300s
[Service]
Type=simple
User=fc-signage
Group=fc-signage
WorkingDirectory=/var/lib/fc-signage-player
EnvironmentFile=-/etc/flowercore/signage-player.env
ExecStartPre=/usr/local/bin/flowercore-signage-prelaunch.sh
ExecStart=/usr/local/bin/flowercore-signage-launch.sh
Restart=always
RestartSec=10s
MemoryMax=2G
MemoryHigh=1500M
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/fc-signage-player /var/log/fc-signage-player
PrivateTmp=true
NoNewPrivileges=true
[Install]
WantedBy=graphical.target

@@ -0,0 +1,6 @@
[Unit]
Description=FlowerCore Signage Pi: cert renewal worker
[Service]
Type=oneshot
ExecStart=/usr/local/bin/flowercore-signage-renew-cert.sh

@@ -0,0 +1,10 @@
[Unit]
Description=Daily check for FlowerCore Signage Pi cert renewal
[Timer]
OnCalendar=daily
RandomizedDelaySec=2h
Persistent=true
[Install]
WantedBy=timers.target

@@ -0,0 +1,22 @@
#!/usr/bin/env bats
setup() {
APP_ROOT="$(cd "$BATS_TEST_DIRNAME/.." && pwd)"
DETECT="$APP_ROOT/scripts/fc-signage-detect-display"
}
@test "display detection emits graceful disconnected profile when no hdmi connector is present" {
script="$(cat "$DETECT")"
[[ "$script" == *"displayConnected: false"* ]]
[[ "$script" == *"No HDMI display detected"* ]]
}
@test "display detection parses edid, falls back to kmsprint, and logs endpoint failures locally" {
script="$(cat "$DETECT")"
[[ "$script" == *"edid-decode"* ]]
[[ "$script" == *"HDR (Static|Dynamic) Metadata Block"* ]]
[[ "$script" == *"kmsprint"* ]]
[[ "$script" == *"/api/v1/nodes/\${NODE_ID}/capabilities"* ]]
[[ "$script" == *"/api/v1/displays/\${NODE_ID}/capability-profile"* ]]
[[ "$script" == *"capabilities.log"* ]]
}

@@ -0,0 +1,64 @@
#!/usr/bin/env bats
setup() {
APP_ROOT="$(cd "$BATS_TEST_DIRNAME/.." && pwd)"
BOOTSTRAP="$APP_ROOT/scripts/flowercore-signage-bootstrap.sh"
RENEW="$APP_ROOT/scripts/flowercore-signage-renew-cert.sh"
}
@test "bootstrap is idempotent when node is already enrolled" {
script="$(cat "$BOOTSTRAP")"
[[ "$script" == *'[[ -s "$NODE_JSON" && -s "$CERT_DIR/client.p12" ]]'* ]]
[[ "$script" == *"already enrolled"* ]]
[[ "$script" == *"exit 0"* ]]
}
@test "bootstrap generates a stable node uuid and machine id" {
script="$(cat "$BOOTSTRAP")"
[[ "$script" == *"uuidgen"* ]]
[[ "$script" == *"nodeUuid"* ]]
[[ "$script" == *"machineId"* ]]
[[ "$script" == *"cut -c1-16"* ]]
}
@test "bootstrap posts to the canonical register endpoint" {
grep -q '/api/v1/nodes/register' "$BOOTSTRAP"
grep -q '"linux-arm64-pi"' "$BOOTSTRAP"
}
@test "bootstrap retries registration once for first-call races" {
script="$(cat "$BOOTSTRAP")"
[[ "$script" == *"for attempt in 1 2"* ]]
[[ "$script" == *"register attempt \$attempt returned"* ]]
[[ "$script" == *"sleep 5"* ]]
}
@test "bootstrap supports setup-code approval with manual polling fallback" {
script="$(cat "$BOOTSTRAP")"
[[ "$script" == *"signage-setup-code"* ]]
[[ "$script" == *"approve-via-setup-code"* ]]
[[ "$script" == *"+ 1800"* ]]
[[ "$script" == *"sleep 15"* ]]
}
@test "bootstrap generates an ecdsa p256 csr for the signage pi subject" {
script="$(cat "$BOOTSTRAP")"
[[ "$script" == *"ecparam -genkey -name prime256v1"* ]]
[[ "$script" == *'/CN=${NODE_ID}/O=FlowerCore/OU=SignagePlayer-Pi'* ]]
}
@test "bootstrap writes pkcs12 bundle with restrictive permissions" {
script="$(cat "$BOOTSTRAP")"
[[ "$script" == *"openssl pkcs12 -export"* ]]
[[ "$script" == *"client.p12.pass"* ]]
[[ "$script" == *"chmod 0640"* ]]
[[ "$script" == *"chmod 0600"* ]]
}
@test "renewal only calls renew endpoint inside the thirty-day window and swaps atomically" {
script="$(cat "$RENEW")"
[[ "$script" == *'-checkend $((30*24*3600))'* ]]
[[ "$script" == *"/api/v1/nodes/\${NODE_ID}/renew"* ]]
[[ "$script" == *"client.key.new"* ]]
[[ "$script" == *'mv "$CERT_DIR/client.p12.new" "$CERT_DIR/client.p12"'* ]]
}

@@ -0,0 +1,68 @@
#!/usr/bin/env bats
setup() {
APP_ROOT="$(cd "$BATS_TEST_DIRNAME/.." && pwd)"
}
@test "player unit exists" {
[ -f "$APP_ROOT/systemd/flowercore-signage-player-pi.service" ]
}
@test "player unit uses simple chromium service with restart backoff" {
unit="$(cat "$APP_ROOT/systemd/flowercore-signage-player-pi.service")"
[[ "$unit" == *"Type=simple"* ]]
[[ "$unit" == *"Restart=always"* ]]
[[ "$unit" == *"RestartSec=10s"* ]]
[[ "$unit" == *"StartLimitBurst=5"* ]]
[[ "$unit" == *"StartLimitIntervalSec=300s"* ]]
}
@test "player unit caps chromium memory at two gigabytes" {
grep -q '^MemoryMax=2G$' "$APP_ROOT/systemd/flowercore-signage-player-pi.service"
grep -q '^MemoryHigh=1500M$' "$APP_ROOT/systemd/flowercore-signage-player-pi.service"
}
@test "player unit condition-gates startup on identity and p12 certificate" {
grep -q '^ConditionPathExists=/etc/flowercore/signage-node.json$' "$APP_ROOT/systemd/flowercore-signage-player-pi.service"
grep -q '^ConditionPathExists=/etc/fc-signage-player/client.p12$' "$APP_ROOT/systemd/flowercore-signage-player-pi.service"
}
@test "player unit runs prelaunch checks before chromium" {
grep -q '^ExecStartPre=/usr/local/bin/flowercore-signage-prelaunch.sh$' "$APP_ROOT/systemd/flowercore-signage-player-pi.service"
grep -q '^ExecStart=/usr/local/bin/flowercore-signage-launch.sh$' "$APP_ROOT/systemd/flowercore-signage-player-pi.service"
}
@test "hdmi udev rule routes through the two-second settle service" {
rule="$(cat "$APP_ROOT/systemd/99-flowercore-signage-hdmi.rules")"
[[ "$rule" == *'KERNEL=="card?-HDMI-A-?"'* ]]
[[ "$rule" == *"systemctl start flowercore-signage-player-pi-hdmi.service"* ]]
[[ "$rule" != *"systemctl restart flowercore-signage-player-pi.service"* ]]
}
@test "hdmi responder settles, declares display, then restarts chromium" {
responder="$(cat "$APP_ROOT/scripts/flowercore-signage-hdmi-respond.sh")"
[[ "$responder" == *"sleep 2"* ]]
[[ "$responder" == *"systemctl start flowercore-signage-detect-display.service"* ]]
[[ "$responder" == *"systemctl restart flowercore-signage-player-pi.service"* ]]
}
@test "chromium policy json is valid and disables credential prompts" {
command -v jq >/dev/null || skip "jq not installed"
jq -e '.AutofillAddressEnabled == false and .AutofillCreditCardEnabled == false and .PasswordManagerEnabled == false' \
"$APP_ROOT/chromium-policies/flowercore-signage.json" >/dev/null
}
@test "launch script tries embed URL and logs bare-player fallback" {
launch="$(cat "$APP_ROOT/scripts/flowercore-signage-launch.sh")"
[[ "$launch" == *'/player/${NODE_ID}/embed?token=${CERT_THUMB}'* ]]
[[ "$launch" == *"url-divergence.log"* ]]
[[ "$launch" == *'/player/${NODE_ID}?token=${CERT_THUMB}'* ]]
}
@test "prelaunch script validates required node and cert files" {
prelaunch="$(cat "$APP_ROOT/scripts/flowercore-signage-prelaunch.sh")"
[[ "$prelaunch" == *"/etc/flowercore/signage-node.json"* ]]
[[ "$prelaunch" == *"/etc/fc-signage-player/client.p12"* ]]
[[ "$prelaunch" == *"/etc/fc-signage-player/client.p12.pass"* ]]
[[ "$prelaunch" == *"exit 1"* ]]
}

@@ -30,6 +30,7 @@ import logging
import re
import shlex
import subprocess
import unicodedata
from typing import Optional
from fastapi import FastAPI, HTTPException
@@ -60,6 +61,189 @@ class TtsRequest(BaseModel):
volume: int = 100 # 0-200
HEBREW_CHAR_RE = re.compile(r"[\u0590-\u05FF]")
HEBREW_WORD_RE = re.compile(r"[\u0590-\u05FF]+")
# eSpeak-NG's Hebrew voice can spell unpointed Hebrew as Unicode character
# names on some builds. For source-text study reads, prefer a stable
# scholarly transliteration so words sound like words even without niqqud.
HEBREW_WORD_TRANSLITERATIONS = {
"אב": "av",
"אבא": "abba",
"אברהם": "Avraham",
"אדמה": "adamah",
"אדני": "Adonai",
"אדם": "adam",
"אור": "or",
"אלהים": "Elohim",
"אלוהים": "Elohim",
"אמן": "amen",
"אם": "em",
"אמת": "emet",
"ארץ": "eretz",
"אש": "esh",
"את": "et",
"בית": "beit",
"בן": "ben",
"ברא": "bara",
"בראשית": "bereshit",
"ברית": "berit",
"ברוך": "barukh",
"בת": "bat",
"גוי": "goy",
"גוים": "goyim",
"גויים": "goyim",
"דבר": "davar",
"דברים": "devarim",
"דוד": "David",
"הלל": "hallel",
"הארץ": "ha-aretz",
"הברית": "ha-berit",
"החדשה": "ha-chadashah",
"השמים": "ha-shamayim",
"השמיים": "ha-shamayim",
"ויאמר": "vayomer",
"יהוה": "Adonai",
"יוסף": "Yosef",
"יוחנן": "Yochanan",
"ישראל": "Yisrael",
"ישוע": "Yeshua",
"יצחק": "Yitzchak",
"יעקב": "Yaakov",
"ירושלים": "Yerushalayim",
"כהן": "kohen",
"כהנים": "kohanim",
"מים": "mayim",
"מות": "mavet",
"מושיע": "moshia",
"מלך": "melekh",
"מלכות": "malkhut",
"מרים": "Miriam",
"משה": "Moshe",
"משיח": "Mashiach",
"נביא": "navi",
"נביאים": "neviim",
"עם": "am",
"עולם": "olam",
"צדק": "tzedek",
"קדוש": "qadosh",
"קדושים": "qedoshim",
"קול": "qol",
"רוח": "ruach",
"שאול": "Shaul",
"שמים": "shamayim",
"שמיים": "shamayim",
"שמעון": "Shimon",
"שלום": "Shalom",
"תורה": "torah",
"חכמה": "chokhmah",
"חסד": "chesed",
"חיים": "chayim",
"חושך": "choshekh",
}
HEBREW_LETTERS = {
"א": "a",
"ב": "b",
"ג": "g",
"ד": "d",
"ה": "h",
"ו": "v",
"ז": "z",
"ח": "kh",
"ט": "t",
"י": "y",
"כ": "kh",
"ך": "kh",
"ל": "l",
"מ": "m",
"ם": "m",
"נ": "n",
"ן": "n",
"ס": "s",
"ע": "a",
"פ": "p",
"ף": "f",
"צ": "ts",
"ץ": "ts",
"ק": "q",
"ר": "r",
"ש": "sh",
"ת": "t",
}
HEBREW_VOWELISH = {"a", "e", "i", "o", "u"}
def _strip_hebrew_marks(value: str) -> str:
    decomposed = unicodedata.normalize("NFD", value)
    return "".join(
        ch for ch in decomposed
        if unicodedata.category(ch) != "Mn" and ch not in {"׳", "״", "־"}
    )


def _fallback_hebrew_transliteration(word: str) -> str:
    tokens: list[str] = []
    chars = list(word)
    for index, ch in enumerate(chars):
        token = HEBREW_LETTERS.get(ch)
        if token is None:
            continue
        if ch == "ה" and index == len(chars) - 1:
            token = "ah"
        elif ch == "י" and index > 0:
            token = "i"
        elif ch == "ו" and index > 0:
            token = "o"
        tokens.append(token)
    if not tokens:
        return word
    spoken: list[str] = []
    for index, token in enumerate(tokens):
        spoken.append(token)
        next_token = tokens[index + 1] if index + 1 < len(tokens) else ""
        if (
            token[-1:] not in HEBREW_VOWELISH
            and next_token
            and next_token[:1] not in HEBREW_VOWELISH
        ):
            spoken.append("a")
    return "".join(spoken)


def _transliterate_hebrew_word(match: re.Match[str]) -> str:
    original = match.group(0)
    normalized = _strip_hebrew_marks(original)
    if not normalized:
        return original
    direct = HEBREW_WORD_TRANSLITERATIONS.get(normalized)
    if direct:
        return direct
    if normalized.startswith("ו") and len(normalized) > 1:
        rest = HEBREW_WORD_TRANSLITERATIONS.get(normalized[1:])
        if rest:
            return f"ve-{rest}"
    if normalized.startswith("ה") and len(normalized) > 1:
        rest = HEBREW_WORD_TRANSLITERATIONS.get(normalized[1:])
        if rest:
            return f"ha-{rest}"
    return _fallback_hebrew_transliteration(normalized)


def _prepare_synthesis_input(text: str, language: str, voice: str) -> tuple[str, str]:
    if language.lower().startswith("he") and HEBREW_CHAR_RE.search(text):
        spoken = HEBREW_WORD_RE.sub(_transliterate_hebrew_word, text)
        return spoken, "en-us"
    return text, voice


def _resolve_voice(req: TtsRequest) -> str:
    if req.voice:
        return req.voice.strip()
@@ -115,14 +299,15 @@ def tts(req: TtsRequest) -> Response:
         raise HTTPException(status_code=400, detail="text is required")
     voice = _resolve_voice(req)
+    spoken_text, synth_voice = _prepare_synthesis_input(req.text, req.language, voice)
     args = [
         "--stdout",
-        "-v", voice,
+        "-v", synth_voice,
         "-s", str(max(80, min(450, req.rate))),
         "-p", str(max(0, min(99, req.pitch))),
         "-a", str(max(0, min(200, req.volume))),
     ]
-    wav = _run_espeak(args, req.text.encode("utf-8"))
+    wav = _run_espeak(args, spoken_text.encode("utf-8"))
     if not wav:
         raise HTTPException(status_code=500, detail="espeak-ng returned empty stdout")
     return Response(content=wav, media_type="audio/wav")
@@ -153,9 +338,9 @@ def tts(req: TtsRequest) -> Response:
 PHONEME_DURATION_RE = re.compile(r"^\s*\S+\s+(\d+)\s+", re.MULTILINE)
-def _estimate_total_ms(req: TtsRequest, voice: str) -> int:
+def _estimate_total_ms(req: TtsRequest, voice: str, spoken_text: str) -> int:
     args = ["--pho", "--quiet", "-v", voice, "-s", str(req.rate)]
-    out = _run_espeak(args, req.text.encode("utf-8"))
+    out = _run_espeak(args, spoken_text.encode("utf-8"))
     text = out.decode("utf-8", errors="replace")
     total = 0
     for match in PHONEME_DURATION_RE.finditer(text):
@@ -175,7 +360,8 @@ def timings(req: TtsRequest):
     if not req.text.strip():
         raise HTTPException(status_code=400, detail="text is required")
     voice = _resolve_voice(req)
-    total_ms = _estimate_total_ms(req, voice)
+    spoken_text, synth_voice = _prepare_synthesis_input(req.text, req.language, voice)
+    total_ms = _estimate_total_ms(req, synth_voice, spoken_text)
     # Distribute total_ms across whitespace-split words proportional to
     # character count. Punctuation-only tokens are folded into the previous
@@ -204,7 +390,7 @@ def timings(req: TtsRequest):
 {
     "text": req.text,
     "language": req.language,
-    "voice": voice,
+    "voice": synth_voice,
     "words": out_words,
     "durationMs": total_ms,
 }
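
The lookup order used by the transliteration change (strip marks, try the word table, try a one-letter prefix, then fall back) can be illustrated with a self-contained sketch. It is a simplified stand-in, not the service code: the word table is a two-entry excerpt written with Unicode escapes, the function names are hypothetical, and unknown words are returned unchanged here, whereas the service applies its letter-by-letter fallback.

```python
import re
import unicodedata

HEBREW_WORD_RE = re.compile(r"[\u0590-\u05FF]+")

# Two-entry excerpt of the word table (assumption: escapes used for clarity).
WORDS = {
    "\u05d0\u05e8\u05e5": "eretz",         # alef, resh, final tsadi
    "\u05e9\u05dc\u05d5\u05dd": "Shalom",  # shin, lamed, vav, final mem
}
# Prefix letters tried after a direct lookup misses: vav, then he.
PREFIXES = (("\u05d5", "ve-"), ("\u05d4", "ha-"))


def strip_marks(value: str) -> str:
    """Drop combining marks (niqqud, cantillation) so pointed text matches the table."""
    decomposed = unicodedata.normalize("NFD", value)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def transliterate(match: "re.Match[str]") -> str:
    word = strip_marks(match.group(0))
    if word in WORDS:
        return WORDS[word]
    for prefix, spoken in PREFIXES:
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in WORDS:
            return spoken + WORDS[rest]
    # Simplification: leave unknown words as they are.
    return match.group(0)


def speak(text: str) -> str:
    """Replace each run of Hebrew characters with its transliteration."""
    return HEBREW_WORD_RE.sub(transliterate, text)
```

Because only Hebrew runs are substituted, surrounding Latin text and punctuation pass through untouched, which is what allows the result to be handed to an English voice.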

@@ -359,7 +359,7 @@ spec:
runAsUser: 1654
containers:
- name: biblical-tts
image: localhost/fc-biblical-tts:v1
image: localhost/fc-biblical-tts:v20260506-hebrew-translit
imagePullPolicy: Never
ports:
- containerPort: 10402
@@ -532,7 +532,7 @@ spec:
fsGroupChangePolicy: OnRootMismatch
containers:
- name: web
image: localhost/fc-ttsreader-web:v202605061500
image: localhost/fc-ttsreader-web:v20260518-sprint36-demo-finish-b132cbf
imagePullPolicy: Never
ports:
- containerPort: 5217
@@ -555,9 +555,13 @@ spec:
- name: TtsReader__Jobs__Root
value: "/data/jobs"
- name: TtsReader__Piper__Host
value: "ttsreader-piper.fc-ttsreader.svc.cluster.local."
value: "10.0.57.17"
- name: TtsReader__Piper__Port
value: "10200"
value: "8500"
- name: TtsReader__Piper__Transport
value: "http"
- name: TtsReader__Piper__HttpPath
value: "/tts"
- name: TtsReader__Kokoro__Enabled
value: "true"
- name: TtsReader__Kokoro__BaseUrl
@@ -568,6 +572,14 @@ spec:
value: "http://ttsreader-kokoro.fc-ttsreader.svc.cluster.local.:8880"
- name: TtsReader__Kokoro__TimeoutSeconds
value: "120"
- name: FlowerCore__Tts__BiblicalTts__Enabled
value: "true"
- name: FlowerCore__Tts__BiblicalTts__BaseUrl
value: "http://ttsreader-biblical.fc-ttsreader.svc.cluster.local.:10402"
- name: FlowerCore__Tts__BiblicalTts__TimeoutSeconds
value: "60"
- name: FlowerCore__Tts__BiblicalTts__DefaultLanguage
value: "grc"
- name: Speech__Alignment__Enabled
# Cluster-native faster-whisper (Lane F, 2026-04-25). The
# ttsreader-align deployment in this manifest wraps
@@ -603,6 +615,8 @@ spec:
# the writable PVC mount.
- name: TtsReader__Preview__CacheDirectory
value: "/data/voice-previews"
- name: TtsReader__VoiceLibrary__ReferenceClip__Directory
value: "/data/voice-reference-clips"
# Sprint E XXL Phase 4γ — content-addressed CDN bundle dir for
# POST /api/v1/render. Default "wwwroot/cdn" resolves under the
# read-only app filesystem, so pin to the writable PVC mount

47
apps/fc-updater/README.md Normal file

@@ -0,0 +1,47 @@
# fc-updater — Update Center GitOps adoption
**Status:** adopted into `bluejay-infra` on 2026-05-06. The live ArgoCD
Application is `infra-fc-updater`, generated by the `bluejay-infra`
ApplicationSet with automated sync, `prune: true`, and `selfHeal: true`.
## Managed manifest set
`apps/fc-updater/fc-updater.yaml` manages:
- `Namespace/fc-updater`
- `PersistentVolumeClaim/updatecenter-data`
- `Deployment/updatecenter-web`
- `Service/updatecenter-web`
- `Certificate/updatecenter-web-tls`
- `Certificate/updatecenter-web-internal-tls`
- `IngressRoute/updatecenter-web`
- `IngressRoute/updatecenter-web-internal`
- `IngressRoute/updatecenter-web-public`
The Deployment intentionally sets `revisionHistoryLimit: 3` and
`strategy.type: Recreate`. The service runs as a singleton that keeps its
SQLite database and local bundle storage on
`PersistentVolumeClaim/updatecenter-data`, and the pod is pinned to
`rke2-server`.
## Runtime dependencies intentionally not stored here
These live Secrets are pre-existing runtime material and are not committed to
Git:
- `updater-bootstrap-auth`
- `updater-signing`
- `updater-webhooks`
- `cf-origin-flowercore-io`
Rotate the Cloudflare Origin Certificate through
`FlowerCore.Notes/docs/standards/code-signing-rotation-runbook.md`; the
shared origin cert must exist in every namespace that serves a
`*.flowercore.io` public IngressRoute.
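
As a rough illustration of the rule above, the sketch below lists the namespaces that would need a copy of the shared origin cert. It assumes the IngressRoute manifests have already been parsed into plain dictionaries; the function name `namespaces_needing_origin_cert` is hypothetical and not part of this repository.

```python
def namespaces_needing_origin_cert(ingress_routes, secret_name="cf-origin-flowercore-io"):
    """Return the sorted namespaces whose IngressRoutes reference secret_name."""
    needed = set()
    for route in ingress_routes:
        # spec.tls may be missing or null on plain-HTTP routes.
        tls = route.get("spec", {}).get("tls") or {}
        if tls.get("secretName") == secret_name:
            needed.add(route["metadata"]["namespace"])
    return sorted(needed)
```

Comparing that list against the namespaces where the Secret actually exists is one way to catch a missing copy before a rotation.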
## Verification
```powershell
kubectl.exe --kubeconfig C:\Users\AndrewStoltz\.kube\rke2.yaml -n argocd get application infra-fc-updater
kubectl.exe --kubeconfig C:\Users\AndrewStoltz\.kube\rke2.yaml -n fc-updater get deploy,svc,ingressroute,certificate,pvc
curl.exe -sk https://update.flowercore.io/api/v1/manifests/_schema
```


@@ -0,0 +1,269 @@
# FlowerCore Update Center
# GitOps adoption of the live fc-updater namespace after PUB-1/PUB-3.
# Runtime credentials remain in existing K8s Secrets; do not store them here.
---
apiVersion: v1
kind: Namespace
metadata:
name: fc-updater
labels:
app.kubernetes.io/part-of: flowercore
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: updatecenter-data
namespace: fc-updater
labels:
app.kubernetes.io/name: updatecenter-web
app.kubernetes.io/part-of: flowercore
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
volumeMode: Filesystem
resources:
requests:
# Sized for fleet bundle storage (LocalFsBundleStore.MaxTotalBytes
# soft cap at 25 GiB per project_uc_remaining_4_apps_signed_2026_05_06).
# Mike Bundle alone is ~5.1 GiB; cluster live capacity is already
# 20 GiB after a manual expand. PVCs cannot shrink, so git must track
# at least the live size to avoid the OutOfSync loop.
storage: 25Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: updatecenter-web
namespace: fc-updater
labels:
app: updatecenter-web
app.kubernetes.io/name: updatecenter-web
app.kubernetes.io/part-of: flowercore
spec:
replicas: 1
revisionHistoryLimit: 3
strategy:
# SQLite + local bundle storage live on a single RWO PVC. Recreate avoids
# two pods overlapping the same write path during future image bumps.
type: Recreate
selector:
matchLabels:
app: updatecenter-web
template:
metadata:
labels:
app: updatecenter-web
spec:
nodeName: rke2-server
containers:
- name: web
image: localhost/fc-updater-web:v20260509-4162dca-authgate
imagePullPolicy: Never
ports:
- containerPort: 8080
name: http
env:
- name: ASPNETCORE_URLS
value: http://+:8080
- name: FlowerCore__Updater__Database__Provider
value: sqlite
- name: FlowerCore__Updater__Database__ConnectionString
value: Data Source=/data/updatecenter.db
- name: FlowerCore__Updater__BundleStorage__LocalFs__RootDirectory
value: /data/bundles
- name: FlowerCore__Updater__PublicShares__RequirePublicVisibilityOnPublicHosts
value: "true"
- name: FlowerCore__Updater__PublicShares__Links__0__Code
value: 8f3c2a9e7d41
- name: FlowerCore__Updater__PublicShares__Links__0__AppId
value: flowercore.faith-ai-mike
- name: FlowerCore__Updater__PublicShares__Links__0__Channel
value: stable
- name: FlowerCore__Updater__PublicShares__Links__0__RuntimeId
value: win-x64
- name: FlowerCore__Updater__PublicShares__Links__0__DisplayName
value: Faith AI Mike Edition
- name: FlowerCore__Updater__PublicShares__Links__0__Headline
value: Faith AI Mike Edition
- name: FlowerCore__Updater__PublicShares__Links__0__Description
value: Private release link for Mike's Faith AI bundle.
- name: FlowerCore__Updater__Auth__Bootstrap__Enabled
value: "true"
- name: FlowerCore__Updater__Auth__Bootstrap__Username
valueFrom:
secretKeyRef:
name: updater-bootstrap-auth
key: username
- name: FlowerCore__Updater__Auth__Bootstrap__Password
valueFrom:
secretKeyRef:
name: updater-bootstrap-auth
key: password
- name: FlowerCore__Updater__Auth__Bootstrap__SigningKey
valueFrom:
secretKeyRef:
name: updater-bootstrap-auth
key: signing-key
- name: FlowerCore__Updater__Signing__AutoSignOnPublish
value: "true"
- name: FlowerCore__Updater__Signing__RequireSignatureOnPublish
value: "true"
- name: FlowerCore__Updater__Signing__PfxBase64
valueFrom:
secretKeyRef:
name: updater-signing
key: pfx-base64
- name: FlowerCore__Updater__Signing__PfxPassword
valueFrom:
secretKeyRef:
name: updater-signing
key: pfx-password
- name: FlowerCore__Updater__Signing__OpItemReference
value: op://FlowerCore/step-ca-codesign
- name: FlowerCore__Updater__Signing__TrustAnchorPath
value: /etc/flowercore-updater/signing/root-ca.pem
- name: FlowerCore__Updater__GitHub__Token
valueFrom:
secretKeyRef:
name: updater-webhooks
key: github-token
- name: FlowerCore__Updater__GitHub__WebhookSecret
valueFrom:
secretKeyRef:
name: updater-webhooks
key: github-webhook-secret
- name: FlowerCore__Updater__Gitea__Token
valueFrom:
secretKeyRef:
name: updater-webhooks
key: gitea-token
- name: FlowerCore__Updater__Gitea__WebhookSecret
valueFrom:
secretKeyRef:
name: updater-webhooks
key: gitea-webhook-secret
readinessProbe:
tcpSocket:
port: http
initialDelaySeconds: 10
periodSeconds: 15
livenessProbe:
tcpSocket:
port: http
initialDelaySeconds: 30
periodSeconds: 30
volumeMounts:
- name: data
mountPath: /data
- name: signing
mountPath: /etc/flowercore-updater/signing
readOnly: true
volumes:
- name: data
persistentVolumeClaim:
claimName: updatecenter-data
- name: signing
secret:
secretName: updater-signing
items:
- key: root-ca.pem
path: root-ca.pem
---
apiVersion: v1
kind: Service
metadata:
name: updatecenter-web
namespace: fc-updater
labels:
app: updatecenter-web
app.kubernetes.io/name: updatecenter-web
app.kubernetes.io/part-of: flowercore
spec:
type: ClusterIP
selector:
app: updatecenter-web
ports:
- name: http
port: 8080
targetPort: http
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: updatecenter-web-tls
namespace: fc-updater
spec:
secretName: updatecenter-web-tls
issuerRef:
name: step-ca-acme
kind: ClusterIssuer
dnsNames:
- updatecenter.iamworkin.lan
- updates.iamworkin.lan
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: updatecenter-web-internal-tls
namespace: fc-updater
spec:
secretName: updatecenter-web-internal-tls
issuerRef:
name: step-ca-acme
kind: ClusterIssuer
dnsNames:
- updatecenter-internal.iamworkin.lan
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: updatecenter-web
namespace: fc-updater
spec:
entryPoints:
- web
- websecure
routes:
- match: (Host(`updatecenter.iamworkin.lan`) || Host(`updates.iamworkin.lan`)) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
kind: Rule
services:
- name: updatecenter-web
port: 8080
tls:
secretName: updatecenter-web-tls
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: updatecenter-web-internal
namespace: fc-updater
spec:
entryPoints:
- web
- websecure
routes:
- match: Host(`updatecenter-internal.iamworkin.lan`)
kind: Rule
services:
- name: updatecenter-web
port: 8080
tls:
secretName: updatecenter-web-internal-tls
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: updatecenter-web-public
namespace: fc-updater
spec:
entryPoints:
- websecure
routes:
- match: (Host(`update.flowercore.io`) || Host(`updates.flowercore.io`)) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
kind: Rule
services:
- name: updatecenter-web
port: 8080
tls:
secretName: cf-origin-flowercore-io


@@ -0,0 +1,7 @@
# ArgoCD's bluejay-infra ApplicationSet uses a directory generator and does
# not require kustomization.yaml. Keep this anyway as the manifest inventory
# and for local `kubectl kustomize apps/fc-updater` previews.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- fc-updater.yaml


@@ -1,6 +1,11 @@
# FlowerCore Tenant — flowercore.io (main brand)
# Public-facing placeholder landing page served by nginx
# ArgoCD managed - BlueJay Lab
# FlowerCore Tenant — retired flowercore.io placeholder.
#
# Public flowercore.io/www.flowercore.io routing is now owned by
# apps/fc-landing/fc-landing.yaml. This tenant placeholder remains available
# only as an in-cluster service; do not create a duplicate public
# IngressRoute here because it competes with fc-landing and requires a
# namespace-local cf-origin-flowercore-io Secret.
# ArgoCD managed - BlueJay Lab
---
apiVersion: v1
kind: Namespace
@@ -10,15 +15,9 @@ metadata:
app.kubernetes.io/part-of: bluejay-infra
flowercore.io/tenant: flowercore
---
# NOTE: The existing cf-origin-flowercore-io secret (covering *.flowercore.io)
# must be copied into this namespace. It already exists in other namespaces.
# Copy with: kubectl get secret cf-origin-flowercore-io -n fc-system -o yaml \
# | sed 's/namespace: .*/namespace: tenant-flowercore/' \
# | kubectl apply -f -
---
# Landing page HTML
apiVersion: v1
kind: ConfigMap
# Landing page HTML
apiVersion: v1
kind: ConfigMap
metadata:
name: flowercore-web-html
namespace: tenant-flowercore
@@ -308,25 +307,6 @@ spec:
selector:
app: flowercore-web
ports:
- port: 80
targetPort: 80
name: http
---
# Traefik IngressRoute — public via Cloudflare
# Uses existing cf-origin-flowercore-io cert (must be copied to this namespace)
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: flowercore-web
namespace: tenant-flowercore
spec:
entryPoints:
- websecure
routes:
- match: Host(`flowercore.io`) || Host(`www.flowercore.io`)
kind: Rule
services:
- name: flowercore-web
port: 80
tls:
secretName: cf-origin-flowercore-io
- port: 80
targetPort: 80
name: http


@@ -0,0 +1,76 @@
# GitHub Runner Fleet
ArgoCD owns `apps/github-runner/github-runner.yaml`. Do not patch live runner
Deployments with `kubectl`; update this manifest and let ArgoCD reconcile.
## Runner Shape
All repo-scoped Linux runners use:
- `ACCESS_TOKEN` from the `github-runner-token` Secret
- `RUN_AS_ROOT=false`
- `EPHEMERAL=true`
- `LABELS=self-hosted,linux,fc-build-linux`
- writable non-root paths under `/home/runner` for .NET, NuGet, XDG cache, and
Actions tool cache
`github-runner` for `FlowerCore.Common` is single-replica because it retains the
original Longhorn ReadWriteOnce NuGet PVC. Every other repo-scoped runner uses
two replicas with per-pod `emptyDir` caches. That is the safe backlog-drain
strategy: no two pods share one RWO PVC.
Sprint 32 final long-tail wave adds 16 two-replica Deployments:
`FlowerCore.Knowledge`, `FlowerCore.LlmBridge`, `FlowerCore.Media`,
`FlowerCore.Presentations`, `FlowerCore.RemoteDesktop`, `FlowerCore.DNS`,
`FlowerCore.Distribution`, `FlowerCore.Scoreboard`,
`FlowerCore.SegmentDisplay`, `FlowerCore.Signage.Contracts`,
`FlowerCore.SignalControl`, `FlowerCore.Intranet.Web`,
`FlowerCore.Provisioning`, `FlowerCore.Redis`, `FlowerCore.MessageBoard`, and
`FlowerCore.MenuBoard`.
## Post-Merge Proof
After the PR is merged and ArgoCD syncs, verify the runner fleet:
```bash
kubectl -n github-runner get deploy,pods,pvc
```
Verify GitHub registration for the repo-scoped runners:
```bash
for repo in FlowerCore.Common FlowerCore.Shared.Pos FlowerCore.Puppet FlowerCore.Signage \
FlowerCore.DMS FlowerCore.Telephony FlowerCore.Print.Web FlowerCore.Chat \
FlowerCore.MySQL FlowerCore.Kiosk.Linux FlowerCore.Marquee FlowerCore.TtsReader \
FlowerCore.Knowledge FlowerCore.LlmBridge FlowerCore.Media \
FlowerCore.Presentations FlowerCore.RemoteDesktop FlowerCore.DNS \
FlowerCore.Distribution FlowerCore.Scoreboard FlowerCore.SegmentDisplay \
FlowerCore.Signage.Contracts FlowerCore.SignalControl FlowerCore.Intranet.Web \
FlowerCore.Provisioning FlowerCore.Redis FlowerCore.MessageBoard \
FlowerCore.MenuBoard; do
echo "=== $repo ==="
gh api "/repos/astoltz/$repo/actions/runners" \
--jq '.runners[] | select(.labels[].name == "fc-build-linux") | {name,status,busy,labels:[.labels[].name]}'
done
```
Shared.Pos publish proof after the runner pod is online:
```bash
gh run list --repo astoltz/FlowerCore.Shared.Pos \
--workflow "Build, Test & Publish" --branch main --limit 5
```
If the latest run is still queued after runner registration, rerun the workflow
from GitHub Actions and verify it lands on an `rke2-linux-*` runner.
## Failure Notes
- `actions/setup-dotnet` permission error at `/usr/share/dotnet`: check that
`DOTNET_INSTALL_DIR=/home/runner/.dotnet` and related cache env vars are
present on the runner pod.
- `404` during runner registration: the fine-grained PAT is valid but missing
repository access for that repo. Add the repo to the PAT access list; the PAT
value does not change.
- `Multi-Attach` volume error: only the Common runner uses a RWO PVC and it must
stay single-replica. New multi-replica runners use `emptyDir`.
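
The Multi-Attach rule above (a ReadWriteOnce PVC implies a single replica, while multi-replica runners use `emptyDir`) can be expressed as a small lint. This is a sketch under assumptions: the `name`/`replicas`/`cache` dictionary shape and the `multi_attach_risks` helper are hypothetical, not something the manifest or ArgoCD provides.

```python
def multi_attach_risks(runners):
    """Return names of runners that pair a RWO PVC with more than one replica."""
    return [
        r["name"]
        for r in runners
        if r["cache"] == "rwo-pvc" and r["replicas"] > 1
    ]


# Example shaped after the README: Common keeps the RWO PVC and one replica,
# the other repo-scoped runners use emptyDir with two replicas.
fleet = [
    {"name": "github-runner", "replicas": 1, "cache": "rwo-pvc"},
    {"name": "github-runner-sharedpos", "replicas": 2, "cache": "emptyDir"},
]
```

An empty result for `multi_attach_risks(fleet)` means no Deployment in the list would put two pods on one RWO volume.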

File diff suppressed because it is too large.


@@ -466,11 +466,11 @@ spec:
itemPath: vaults/IAmWorkin/items/Guacamole JSON Auth
---
---
# 1Password-backed credentials for Mac mini VNC access (Phase 1 2026-04-28)
# 1Password-backed credentials for Mac mini VNC access (Phase 1 - 2026-04-28)
# The operator mints Secret 'macmini-vnc-creds' with keys: username, password, VNC Password
# Note: '1Password' field label 'VNC Password' -> K8s Secret key 'VNC Password' (space retained)
# Guacamole VNC connection password is sourced from the 'VNC Password' field.
# Actual IP is 10.0.56.115 (INFRA VLAN) the 1P item 'IP' field is kept as backup reference.
# Actual IP is 10.0.56.115 (INFRA VLAN) - the 1P item 'IP' field is kept as backup reference.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
@@ -481,6 +481,7 @@ metadata:
app.kubernetes.io/part-of: flowercore
spec:
itemPath: vaults/IAmWorkin/items/Mac Mini
---
# Blue Jay Branding Extension (CSS + translations)
apiVersion: v1
kind: ConfigMap


@@ -46,7 +46,7 @@ spec:
spec:
containers:
- name: intranet-web
image: localhost/fc-intranet-web:v20260506-2120
image: localhost/fc-intranet-web:v20260508-brochure-w1
imagePullPolicy: Never
ports:
- containerPort: 5300


@@ -241,8 +241,12 @@ spec:
kind: ClusterIssuer
dnsNames:
- knowledge.iamworkin.lan
duration: 2160h # 90d
renewBefore: 720h # 30d
# step-ca ACME caps certificate lifetime at 30d. The previously requested
# 90d was silently capped, so renewBefore (720h) equaled the issued
# lifetime and triggered a perpetual renewal loop (10888+ CRs in 18h on
# 2026-05-07). Match the working 720h/240h pattern from other FC services.
duration: 720h # 30d (step-ca cap)
renewBefore: 240h # 10d
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute


@@ -0,0 +1,93 @@
# =============================================================================
# ci1 - Windows Server 2025 KubeVirt VM (GitHub Actions Self-Hosted Runner)
# =============================================================================
# Boots from the sysprepped containerDisk template built by the Windows VM
# sysprep pipeline. See docs/infrastructure/windows-vm-sysprep-pipeline.md.
# Path A/B/C install history is preserved in git log only.
# =============================================================================
apiVersion: v1
kind: Namespace
metadata:
name: kubevirt-vms
labels:
app.kubernetes.io/part-of: kubevirt-stack
pod-security.kubernetes.io/enforce: privileged
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: ci1
namespace: kubevirt-vms
labels:
app: ci-runner
role: github-actions-runner
flowercore.io/managed-by: bluejay-infra
spec:
runStrategy: Always
template:
metadata:
labels:
app: ci-runner
role: github-actions-runner
kubevirt.io/vm: ci1
spec:
domain:
cpu:
cores: 8
sockets: 1
threads: 1
memory:
guest: 16Gi
resources:
requests:
memory: 16Gi
limits:
memory: 16Gi
clock:
utc: {}
timer:
hpet:
present: false
pit:
tickPolicy: delay
rtc:
tickPolicy: catchup
hyperv: {}
features:
acpi: {}
apic: {}
hyperv:
relaxed: {}
vapic: {}
spinlocks:
spinlocks: 8191
smm: {}
firmware:
bootloader:
efi:
secureBoot: false
devices:
tpm: {}
disks:
- name: rootdisk
disk:
bus: virtio
interfaces:
# Pod-network fallback for CI runner outbound traffic. Switch to
# prod-vlan57 once the bridge/NAD lane is ready for L2 access.
- name: default
masquerade: {}
model: virtio
machine:
type: q35
networks:
- name: default
pod: {}
volumes:
- name: rootdisk
containerDisk:
image: localhost/fc-win-server-2025:v1
imagePullPolicy: Never
terminationGracePeriodSeconds: 3600


@@ -0,0 +1,3 @@
resources:
- ci1.yaml
- prod-vlan57-nad.yaml


@@ -0,0 +1,69 @@
# =============================================================================
# NetworkAttachmentDefinition — PROD VLAN 57 bridge
# =============================================================================
# Purpose: makes KubeVirt VMs reachable on the PROD VLAN (10.0.57.0/24)
# alongside the existing pod network. Required for ci1 to bridge onto PROD
# (e.g. to provision/scrape edge1, edge2, kiosks, Pis on the same L2 segment).
#
# **DEPLOY GATE — Phase 1.5 host work required first**:
# On every RKE2 node (rke2-server, rke2-agent1, rke2-agent2):
# 1. Switch port (UniFi USL16LP) trunks VLAN 57 to the node — usually
# already true since BLUEJAY-WS reaches 10.0.57.x services. Verify
# with `ip link show enp86s0.57` after configuring sub-interface, OR
# `tcpdump -ni enp86s0 vlan 57` and ping a known PROD host.
# 2. Linux bridge `br-prod` enslaving `enp86s0.57` (VLAN sub-interface).
# NetworkManager profile examples in the runbook below.
# 3. Verify Multus DaemonSet `kube-multus-ds` is Ready on all nodes.
#
# Without those, applying this NAD has no effect except to create the NAD object.
# A VM that requests this NAD with no bridge present will fail with:
# `error adding pod kubevirt-vms_ci1 to CNI network "prod-vlan57": failed to
# plumb VLAN: open /sys/class/net/br-prod/master: no such file or directory`
#
# Configuration notes:
# - cniVersion 0.3.1 to match Multus daemon-config.json
# - mtu 1500 (matches enp86s0 default; bump if jumbo frames configured)
# - bridge name `br-prod` is convention; if Puppet picks a different name
# (e.g. `br57`, `br-vlan57`), edit BOTH this NAD and the ci1.yaml
# interface block. Keep them in sync.
# - vlan: 0 because the host bridge already strips VLAN tag (br-prod sits
# on top of `enp86s0.57`). If we instead used a VLAN-aware bridge with
# trunk port, set vlan: 57 here. Current convention is VLAN-stripped at
# the sub-interface, so the bridge passes untagged frames.
#
# Apply:
# kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml apply -f apps/kubevirt-vms/prod-vlan57-nad.yaml
#
# Then update ci1.yaml networks: stanza to:
# - name: prod-net
# multus:
# networkName: kubevirt-vms/prod-vlan57
# and the interface block from `masquerade` to `bridge`.
# =============================================================================
---
# Namespace must exist already (created by ci1.yaml's first document).
# This file imports a NAD into that same namespace.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: prod-vlan57
namespace: kubevirt-vms
annotations:
bluejay.iamworkin.lan/host-bridge: "br-prod (enslaves enp86s0.57)"
bluejay.iamworkin.lan/cidr: "10.0.57.0/24"
bluejay.iamworkin.lan/gateway: "10.0.57.1"
bluejay.iamworkin.lan/dns: "10.0.56.1 (pfSense Unbound)"
spec:
config: |
{
"cniVersion": "0.3.1",
"name": "prod-vlan57",
"type": "bridge",
"bridge": "br-prod",
"ipam": {},
"mtu": 1500,
"vlan": 0,
"promiscMode": true,
"preserveDefaultVlan": false
}
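
Because the NAD config is JSON embedded in a YAML string, a typo there is easy to miss in review. The sketch below parses a copy of the config from this file and checks the conventions stated in the header comments (bridge type, `br-prod` name, `vlan: 0` with the tag stripped at the sub-interface). The `check_nad` helper is hypothetical and is not part of Multus or this repository.

```python
import json

# Copied from the spec.config block of the manifest.
NAD_CONFIG = """
{
  "cniVersion": "0.3.1",
  "name": "prod-vlan57",
  "type": "bridge",
  "bridge": "br-prod",
  "ipam": {},
  "mtu": 1500,
  "vlan": 0,
  "promiscMode": true,
  "preserveDefaultVlan": false
}
"""


def check_nad(config_text, expected_bridge="br-prod"):
    """Return a list of convention violations found in the NAD config."""
    cfg = json.loads(config_text)
    problems = []
    if cfg.get("type") != "bridge":
        problems.append("type should be 'bridge'")
    if cfg.get("bridge") != expected_bridge:
        problems.append("bridge name differs from the host bridge")
    if cfg.get("vlan", 0) != 0:
        problems.append("vlan should be 0 when the sub-interface strips the tag")
    return problems
```

If Puppet ends up naming the bridge differently, `expected_bridge` is the one value to change alongside the NAD and `ci1.yaml`.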


@@ -0,0 +1,99 @@
# =============================================================================
# Windows Server 2025 ISO — Static NFS PV (Path B for SATA-CDROM timeout)
# =============================================================================
# Purpose: Mount the ISO from Synology NAS via NFS instead of from a Longhorn-
# backed Filesystem PVC.
#
# Why: SATA-CDROM emulation reading from a Longhorn-backed Filesystem PVC is
# too slow for OVMF's boot read window — the DVD-ROM enumeration times out
# before the bootloader can be read. Symptom on the serial console:
# BdsDxe: failed to start Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ...
# BdsDxe: failed to start Boot0001 ... Time out
# BdsDxe: No bootable option or device was found
# Diagnosis confirmed the ISO content is a perfectly valid bootable ISO9660
# image — the bug is in the timing path between OVMF and Longhorn-backed
# storage, not in the ISO itself.
#
# Block-mode PVC was tried (`volumeMode: Block` via DataVolume) and would
# likely fix the timing, but CDI v1.65.0's upload-target pod cannot open the
# block device due to runAsUser:107 + capabilities.drop:[ALL] and we got:
# blockdev: cannot open /dev/cdi-block-volume: Permission denied
#
# NFS-mounted ISO bypasses both issues: no Longhorn slowness, no CDI upload
# pod permission concerns. The ISO is read directly from the NAS over a
# native NFSv4.1 mount that QEMU's SATA emulator can read at full LAN speed.
#
# Layout on Synology:
# /volume1/ISOs/ (existing export, RKE2 ACL)
# en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso
# win2025-iso-disk/ (new subdir, 2026-05-08)
# disk.img -> hardlink to ../en-us_windows_server_2025_..._8e06425a.iso
#
# KubeVirt's launcher pod expects a PVC mounted at
# /var/run/kubevirt-private/vmi-disks/<diskName>/disk.img — by mounting the
# `win2025-iso-disk/` subdir as the NFS PV root, `disk.img` lives at the PV's
# root and KubeVirt's CDROM emulator finds it without any path manipulation.
#
# A symlink would NOT work for sub-path NFS mounts (the relative target
# `../...iso` falls outside the sub-mount root). A hardlink works because it
# references the same inode regardless of mount point.
#
# Memory references:
# - feedback_synology_nfs_volume1_kubernetes_export_scoped (Synology export
# scoping pattern — but /volume1/ISOs export, unlike /volume1/kubernetes,
# does support sub-path mounts because Synology NFS is configured with
# pseudo-fs in NFSv4.1)
# - feedback_kubevirt_iso_first_install_bootorder_and_runstrategy (boot
# order / runStrategy gotchas, separate from the storage timing issue)
#
# Validation (2026-05-08, from rke2-server / rke2-agent1 / rke2-agent2):
# mount -t nfs -o nfsvers=4.1,ro 10.0.58.3:/volume1/ISOs/win2025-iso-disk /tmp/m
# file /tmp/m/disk.img
# -> ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)
# All 3 RKE2 nodes can mount and read.
# =============================================================================
apiVersion: v1
kind: PersistentVolume
metadata:
name: windows-server-2025-iso-nfs
labels:
flowercore.io/iso: windows-server-2025
flowercore.io/managed-by: bluejay-infra
spec:
capacity:
storage: 8Gi
accessModes:
- ReadOnlyMany
volumeMode: Filesystem
persistentVolumeReclaimPolicy: Retain
storageClassName: "" # static, no provisioner
mountOptions:
- nfsvers=4.1
- ro
- hard
- timeo=600
- retrans=3
nfs:
server: 10.0.58.3 # BlueJayNAS Synology DS1621+ on HOME VLAN 58
path: /volume1/ISOs/win2025-iso-disk
readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: windows-server-2025-iso-nfs
namespace: kubevirt-vms
labels:
app: ci-runner
flowercore.io/managed-by: bluejay-infra
spec:
accessModes:
- ReadOnlyMany
volumeMode: Filesystem
resources:
requests:
storage: 8Gi
storageClassName: ""
volumeName: windows-server-2025-iso-nfs
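
The hardlink-versus-symlink point in the header comments can be illustrated locally. This is a sketch only: it uses a temporary directory rather than an NFS mount, the file names are placeholders, and both helper names are hypothetical. It shows that a hardlink shares the inode of the original file, and that a relative symlink target such as `../image.iso` resolves to a path outside the sub-directory used as the mount root.

```python
import os
import posixpath
import tempfile


def hardlink_shares_inode():
    """Create a file plus a hardlink in a subdir and compare inode numbers."""
    with tempfile.TemporaryDirectory() as root:
        iso = os.path.join(root, "image.iso")
        sub = os.path.join(root, "win2025-iso-disk")
        os.mkdir(sub)
        with open(iso, "wb") as handle:
            handle.write(b"demo")
        link = os.path.join(sub, "disk.img")
        os.link(iso, link)  # hardlink: second name for the same inode
        return os.stat(iso).st_ino == os.stat(link).st_ino


def symlink_target_escapes(mount_root, link_dir, target):
    """True if a relative symlink target resolves outside mount_root."""
    resolved = posixpath.normpath(posixpath.join(link_dir, target))
    prefix = mount_root.rstrip("/") + "/"
    return not (resolved == mount_root or resolved.startswith(prefix))
```

When only the sub-directory is mounted, a target that escapes the mount root is not reachable from inside the mount, whereas the hardlinked `disk.img` is an ordinary file there.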


@@ -75,6 +75,20 @@ data:
cluster: "rke2"
role: "agent"
# Mac mini macOS runner node (INFRA VLAN)
- job_name: "macmini-node"
scrape_timeout: 15s
static_configs:
- targets: ["10.0.56.115:9100"]
labels:
instance: "macmini"
host: "macmini.iamworkin.lan"
vlan: "infra"
arch: "arm64"
role: "macos-runner"
puppet_managed: "true"
puppet_server: "puppet.iamworkin.lan"
# In-cluster node-exporter DaemonSet
- job_name: "k8s-node-exporter"
kubernetes_sd_configs:
@@ -697,6 +711,36 @@ data:
summary: "Print.Web Ollama runner held for >10m ({{ $labels.model }})"
description: "Print.Web reports model {{ $labels.model }} with {{ $value | printf \"%.0f\" }}s of keep-alive remaining. Check concurrent requests before the Pi 5 Ollama lane thrashes."
- name: macmini-runners
rules:
- alert: MacMiniRunnerOffline
expr: (flowercore_github_runner_online{runner=~"macmini-.*"} == 0) or absent(flowercore_github_runner_online{runner=~"macmini-.*"})
for: 10m
labels:
severity: warning
service: github-runner
annotations:
summary: "Mac mini GitHub runner offline ({{ $labels.runner }})"
description: "A macmini-* GitHub Actions runner has not reported online for more than 10 minutes. Puppet manages its LaunchDaemon under /Library/LaunchDaemons/io.flowercore.github-runner-<slug>.plist; runners survive reboot and do not require a GUI session."
- name: linux-runners
rules:
- alert: LinuxRunnerOffline
expr: |
kube_deployment_status_replicas_ready{
namespace="github-runner",
deployment=~"github-runner(|-(sharedpos|puppet|signage|dms|telephony|print-web|chat|mysql|kiosk-linux))"
} == 0
for: 5m
labels:
severity: warning
alert_channel: irc
service: github-runner
team: ci
annotations:
summary: "Linux CI runner offline: {{ $labels.deployment }}"
description: "Deployment {{ $labels.deployment }} in namespace github-runner has 0 ready replicas for more than 5 minutes. CI jobs targeting this repo will queue until the runner pod restarts and re-registers with GitHub. Check pods with: kubectl -n github-runner get pods -l app.kubernetes.io/name={{ $labels.deployment }}. Check logs with: kubectl -n github-runner logs -l app.kubernetes.io/name={{ $labels.deployment }} --tail=50. Common causes: PAT missing repo access, runner CrashLoopBackOff, or node/resource pressure."
- name: remote-desktop
rules:
- alert: RemoteDesktopWebDown
@@ -922,6 +966,52 @@ data:
annotations:
summary: "Disk usage high on {{ $labels.instance }} ({{ $value | printf \"%.1f\" }}%)"
# Puppet agent + service alerts.
# Mirror of FlowerCore.Notes/scripts/monitoring/alerts.yml `puppet` group
# so a future migration to in-cluster Prometheus inherits the ruleset.
# Source-of-truth for the live Podman Prometheus on noc1 is the Notes file.
# See feedback_monitoring_k8s_target_vs_live_podman.
- name: puppet
rules:
- alert: PuppetAgentReportStale
expr: puppet_last_run_age_seconds > 7200
for: 30m
labels:
severity: warning
alert_channel: irc
annotations:
summary: "Puppet agent {{ $labels.instance }} hasn't reported in over 2h"
description: "Last run age: {{ $value | humanizeDuration }}. The puppet agent on {{ $labels.instance }} may be stopped, the node may be powered off, or noc1 may be unreachable from this node."
runbook: "1. SSH to node (via noc1 jumpbox if needed) 2. sudo systemctl status puppet 3. sudo puppet agent -t --noop to force a run 4. Check r10k: ssh fcadmin@10.0.56.10 'sudo podman logs openvoxserver --tail 50' 5. Verify noc1 reachability: ping puppet.iamworkin.lan"
- alert: PuppetAgentReportCritical
expr: puppet_last_run_age_seconds > 86400
for: 1h
labels:
severity: critical
alert_channel: irc
annotations:
summary: "Puppet agent {{ $labels.instance }} silent for over 24h — node is unmanaged"
description: "Last run age: {{ $value | humanizeDuration }}. Node {{ $labels.instance }} has not submitted a Puppet report in over 24 hours. Config drift is accumulating — investigate immediately. If intentional (maintenance), add to the exclusion filter or silence in Grafana."
runbook: "URGENT: 1. Check node power state 2. SSH via noc1 jumpbox: ssh fcadmin@10.0.56.10 then ssh <node> 3. sudo systemctl status puppet 4. sudo systemctl start puppet + sudo puppet agent -t 5. Check for network partitions (VLAN connectivity to 10.0.56.10) 6. If node was recently reimaged: sudo puppet agent -t to re-register with new SSL cert"
# Sprint 33 Cx-7 Phase B (2026-05-25 postmortem follow-up):
# Detects puppet.service in failed state — distinct from PuppetAgentReportStale
# which catches "agent hasn't run." This catches "systemd gave up restarting it"
# (CA-verify loop or other fatal exit). Requires node-exporter systemd collector
# enabled with --collector.systemd. If `node_systemd_unit_state` has no series
# for a node, the collector is disabled there — flag in postmortem follow-up.
- alert: PuppetServiceFailed
expr: node_systemd_unit_state{name="puppet.service",state="failed"} == 1
for: 5m
labels:
severity: warning
alert_channel: irc
annotations:
summary: "Puppet service failed on {{ $labels.instance }}"
description: "puppet.service on {{ $labels.instance }} has been in failed state for 5+ minutes. systemd has stopped auto-restarting (CA-verify-loop or other exit). Manual `systemctl status puppet` confirms. Run `sudo systemctl start puppet` to recover; investigate journal for root cause."
runbook_url: "https://github.com/astoltz/FlowerCore.Notes/blob/master/memory/feedback_puppet_service_dead_after_ca_loop_alert_misreads.md"
# K8s pod-state alerts. Require kube-state-metrics scrape (added
# 2026-04-26 — see scrape_configs above). Would have surfaced the
# agent-zero ollama-proxy 172x crash-loop instead of letting it
@@ -974,6 +1064,39 @@ data:
summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} replica mismatch"
description: "Spec wants {{ $labels.spec_replicas }} but only {{ $value }} available. Likely a rollout stuck on probe failure, scheduling, or PVC."
# Q-MR-3 (2026-05-11): multus memory pressure — catches the next OOM
# cascade BEFORE multus is OOM-killed cluster-wide. The 2026-05-10
# outage (21h) hit because no alert fired on the rising multus working
# set — only downstream blackbox / Traefik / service alerts. With
# 1Gi limit (bluejay-infra@eb8693e), 80% = ~819MiB; steady-state
# runs ~150-250MiB so this only fires when an avalanche starts.
- alert: MultusMemoryPressure
expr: |
container_memory_working_set_bytes{container="kube-multus"}
/ container_spec_memory_limit_bytes{container="kube-multus"} > 0.8
for: 5m
labels:
severity: critical
alert_channel: thermal_print
annotations:
summary: "kube-multus memory >80% of limit on {{ $labels.node }} for 5m"
description: "kube-multus working set is {{ $value | humanizePercentage }} of its memory limit on node {{ $labels.node }}. If this keeps climbing, multus will OOM and all new pod networking will halt cluster-wide (precedent: 2026-05-10 outage)."
# Q-MR-3 (2026-05-11): namespace pending-pod backlog — catches the
# operator-leak avalanche pattern BEFORE it cascades into a multus
# CNI OOM. Any FC operator (RemoteDesktop / Distribution / WorldBuilder)
# emitting pods without ownerReferences will accumulate them when
# the operator crashes. >25 pending pods in any namespace for 30m
# is the signal to investigate the reconciler.
- alert: NamespacePendingPodBacklog
expr: sum by (namespace) (kube_pod_status_phase{phase="Pending"}) > 25
for: 30m
labels:
severity: warning
annotations:
summary: "Namespace {{ $labels.namespace }} has {{ $value }} Pending pods for 30m"
description: "Pending pod count in {{ $labels.namespace }} exceeds 25 sustained for 30m. Likely operator-leak avalanche pattern — children emitted without ownerReferences. Risk of multus CNI OOM cascade."
# Longhorn storage health alerts. Required: longhorn scrape job
# (added 2026-04-26 — see scrape_configs above). The K8s events
# for "snapshot becomes not ready to use" are transient lifecycle
@@ -1150,24 +1273,55 @@ metadata:
data:
notify.py: |
#!/usr/bin/env python3
"""HTTP->IRC alert relay with thermal printer forwarding for Grafana webhooks.
Listens on :9119, posts to #alerts on UnrealIRCd via raw IRC protocol.
Alerts tagged alert_channel=thermal_print also POST to Print.Web /api/print/alert.
"""HTTP->IRC alert relay with thermal-printer DIGEST forwarding.
Listens on :9119, posts to #alerts on UnrealIRCd, forwards to Print.Web
/api/print/alert. Thermal printing is BATCHED into hourly digests by
default so the printer no longer spam-fires per Grafana webhook.
Routing (per Grafana webhook alert):
- IRC: always per-event (operator likes the stream)
- Thermal printer:
* label alert_channel=thermal_print_immediate -> first arrival of a new
fingerprint flushes the current digest NOW
* severity in {critical,disaster,page} OR
label alert_channel=thermal_print -> enqueue into hourly digest
* everything else -> IRC only
- RESOLVED webhooks remove the alert from the digest buffer
Env vars (defaults preserve old behavior on first deploy):
THERMAL_PRINT_ENABLED default "true" - master kill switch
BATCH_INTERVAL_MIN default "60" - minutes between digest prints
BATCH_MAX_PENDING default "50" - force-flush threshold
HTTP surface:
POST / - Grafana webhook entry
POST /flush - manual digest flush (idempotent)
GET / - status + config + buffer depth + stats
"""
import json, socket, sys, time
import json, os, socket, sys, threading, time
from collections import defaultdict
from datetime import datetime, timezone
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.request import Request, urlopen
from urllib.error import URLError
IRC_HOST = "unrealircd.irc.svc" # short name: CoreDNS ndots:5 + iamworkin.lan template hijacks full .cluster.local (see memory)
IRC_PORT = 6667
IRC_NICK = "grafana-bot"
IRC_CHANNEL = "#alerts"
PRINT_WEB_URL = "http://10.0.57.16:5200/api/print/alert"
PRINT_ENABLED = True
THERMAL_PRINT_ENABLED = os.environ.get("THERMAL_PRINT_ENABLED", "true").lower() == "true"
BATCH_INTERVAL_MIN = int(os.environ.get("BATCH_INTERVAL_MIN", "60"))
BATCH_MAX_PENDING = int(os.environ.get("BATCH_MAX_PENDING", "50"))
IRC_HOST = os.environ.get("IRC_HOST", "unrealircd.irc.svc")
IRC_PORT = int(os.environ.get("IRC_PORT", "6667"))
IRC_NICK = os.environ.get("IRC_NICK", "grafana-bot")
IRC_CHANNEL = os.environ.get("IRC_CHANNEL", "#alerts")
PRINT_WEB_URL = os.environ.get("PRINT_WEB_URL", "http://10.0.57.16:5200/api/print/alert")
_buffer_lock = threading.Lock()
_buffer = {} # fingerprint -> {"alert": dict, "first_seen": float, "last_seen": float}
_last_flush_time = time.time()
_stats = {"webhooks_received": 0, "irc_sent": 0, "print_immediate": 0,
"digest_flushed": 0, "buffer_dedup": 0, "buffer_added": 0,
"buffer_resolved": 0, "started_at": time.time()}
def send_irc(message):
"""Connect, handle PING, join, send, quit."""
try:
sock = socket.create_connection((IRC_HOST, IRC_PORT), timeout=15)
sock.sendall(f"NICK {IRC_NICK}\r\n".encode())
@@ -1200,52 +1354,137 @@ data:
time.sleep(0.5)
sock.sendall(b"QUIT :alert delivered\r\n")
sock.close()
_stats["irc_sent"] += 1
return True
except Exception as e:
print(f"[irc-notify] IRC send failed: {e}", file=sys.stderr)
return False
def send_thermal_print(alert):
if not PRINT_ENABLED: return
labels = alert.get("labels", {})
annotations = alert.get("annotations", {})
status = alert.get("status", "firing").upper()
summary = annotations.get("summary", "")
description = annotations.get("description", "")
runbook = annotations.get("runbook", "")
# Build a useful message: summary + description + runbook steps
parts = []
if summary: parts.append(summary)
if description and description != summary: parts.append(description)
if runbook: parts.append("STEPS: " + runbook)
message = " | ".join(parts) if parts else labels.get("alertname", "Unknown alert")
payload = {
"title": labels.get("alertname", "Unknown"),
"severity": labels.get("severity", "warning").capitalize(),
"host": labels.get("instance", labels.get("host", "unknown")),
"message": message,
"eventId": alert.get("fingerprint", ""),
"source": "Grafana",
"status": "RESOLVED" if status == "RESOLVED" else "PROBLEM",
"acknowledged": False
}
def post_thermal(payload, kind):
if not THERMAL_PRINT_ENABLED:
print(f"[irc-notify] thermal disabled; skip {kind} ({payload.get('title','?')[:40]})", file=sys.stderr)
return False
try:
req = Request(PRINT_WEB_URL, data=json.dumps(payload).encode("utf-8"),
headers={"Content-Type": "application/json"}, method="POST")
resp = urlopen(req, timeout=10)
print(f"[irc-notify] Thermal print sent: {resp.read().decode()}", file=sys.stderr)
if kind == "immediate": _stats["print_immediate"] += 1
print(f"[irc-notify] thermal {kind} sent: {payload.get('title','?')[:50]}", file=sys.stderr)
return True
except Exception as e:
print(f"[irc-notify] Thermal print failed: {e}", file=sys.stderr)
print(f"[irc-notify] thermal {kind} failed: {e}", file=sys.stderr)
return False
def should_print(alert):
def fingerprint_of(alert):
fp = alert.get("fingerprint", "")
if fp: return fp
labels = alert.get("labels", {})
if labels.get("alert_channel") == "thermal_print": return True
if labels.get("severity", "").lower() in ("critical", "disaster"): return True
if alert.get("status", "").upper() == "RESOLVED": return False
return False
target = labels.get("pod") or labels.get("instance") or labels.get("deployment") or labels.get("statefulset") or labels.get("namespace") or ""
return f"{labels.get('alertname','?')}/{labels.get('namespace','')}/{target}"
def is_critical(alert):
return alert.get("labels", {}).get("severity", "").lower() in ("critical", "disaster", "page")
def is_immediate_label(alert):
return alert.get("labels", {}).get("alert_channel") == "thermal_print_immediate"
def is_batched_label(alert):
return alert.get("labels", {}).get("alert_channel") == "thermal_print"
def add_to_digest(alert):
"""Add an alert to the digest buffer. Returns True if the buffer GREW
(new fingerprint), False if it was a dedup, resolution, or no-op.
"""
if not THERMAL_PRINT_ENABLED: return False
fp = fingerprint_of(alert)
status = alert.get("status", "firing").lower()
with _buffer_lock:
if status == "resolved":
if fp in _buffer:
del _buffer[fp]
_stats["buffer_resolved"] += 1
return False
if fp in _buffer:
_buffer[fp]["last_seen"] = time.time()
_buffer[fp]["alert"] = alert
_stats["buffer_dedup"] += 1
return False
_buffer[fp] = {"alert": alert, "first_seen": time.time(), "last_seen": time.time()}
_stats["buffer_added"] += 1
return True
def build_digest_payload():
with _buffer_lock:
items = list(_buffer.values())
if not items: return None
by_name = defaultdict(list)
for item in items:
labels = item["alert"].get("labels", {})
by_name[labels.get("alertname", "Unknown")].append(item)
lines = []
for name, group in sorted(by_name.items()):
targets = []
for it in group[:5]:
labels = it["alert"].get("labels", {})
t = (labels.get("pod") or labels.get("instance") or labels.get("deployment")
or labels.get("statefulset") or labels.get("namespace") or "?")
targets.append(t)
more = f" (+{len(group)-5})" if len(group) > 5 else ""
sevs = sorted({it["alert"].get("labels", {}).get("severity", "warning") for it in group})
lines.append(f"[{'/'.join(sevs)}] {name} x{len(group)}: {', '.join(targets)}{more}")
now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
title = f"Alert digest: {len(items)} firing"
body = "\n".join([
f"=== {title} ===",
f"as of {now}",
"",
*lines,
"",
"Stream: #alerts (IRC) | Triage: grafana-noc1.iamworkin.lan",
"Force-flush: POST irc-notify.monitoring.svc:9119/flush",
])
return {"title": title, "severity": "Warning", "host": "monitoring",
"message": body, "eventId": f"digest-{int(time.time())}",
"source": "Grafana digest", "status": "PROBLEM", "acknowledged": False}
def flush_digest():
payload = build_digest_payload()
if payload is None:
print("[irc-notify] flush: buffer empty, no digest sent", file=sys.stderr)
return False
sent = post_thermal(payload, "digest")
with _buffer_lock:
_buffer.clear()
if sent: _stats["digest_flushed"] += 1
return sent
def digest_loop():
global _last_flush_time
while True:
try:
now = time.time()
elapsed = now - _last_flush_time
if elapsed >= BATCH_INTERVAL_MIN * 60:
print(f"[irc-notify] digest tick: interval reached ({BATCH_INTERVAL_MIN}m); buffer={len(_buffer)}", file=sys.stderr)
flush_digest()
_last_flush_time = now
elif len(_buffer) >= BATCH_MAX_PENDING:
print(f"[irc-notify] digest tick: buffer full ({len(_buffer)}); force flush", file=sys.stderr)
flush_digest()
_last_flush_time = now
time.sleep(15)
except Exception as e:
print(f"[irc-notify] digest loop error: {e}", file=sys.stderr)
time.sleep(60)
class Handler(BaseHTTPRequestHandler):
def do_POST(self):
if self.path == "/flush":
ok = flush_digest()
self.send_response(200); self.send_header("Content-Type", "application/json"); self.end_headers()
self.wfile.write(json.dumps({"flushed": ok, "buffer_after": len(_buffer)}).encode())
return
_stats["webhooks_received"] += 1
length = int(self.headers.get("Content-Length", 0))
body = json.loads(self.rfile.read(length)) if length else {}
for alert in body.get("alerts", []):
@@ -1260,22 +1499,56 @@ data:
msg = f"{icon}{sev_tag} {name}: {summary}"
if desc: msg += f"\n {desc}"
send_irc(msg)
if should_print(alert): send_thermal_print(alert)
self.send_response(200)
self.send_header("Content-Type", "application/json")
self.end_headers()
# Thermal routing — EVERYTHING (including criticals) goes into
# the hourly digest. Only the explicit `alert_channel=thermal_print_immediate`
# label bypasses, and even that flushes-the-current-digest rather
# than printing a standalone job, so the same fingerprint can't
# spam the printer per webhook cycle.
if status == "RESOLVED":
add_to_digest(alert) # removes from buffer
continue
if is_immediate_label(alert):
# Explicit opt-in for "paper this NOW" — first arrival of a
# new fingerprint triggers an immediate digest flush; repeat
# webhooks for the same fingerprint dedupe in the buffer
# until the next interval or until the alert resolves.
new_in_buffer = add_to_digest(alert)
if new_in_buffer:
global _last_flush_time
flush_digest()
_last_flush_time = time.time()
elif is_critical(alert) or is_batched_label(alert):
add_to_digest(alert)
# else: IRC-only (warnings without thermal_print label)
self.send_response(200); self.send_header("Content-Type", "application/json"); self.end_headers()
self.wfile.write(b'{"status":"ok"}')
def do_GET(self):
self.send_response(200)
self.send_header("Content-Type", "application/json")
self.end_headers()
self.wfile.write(json.dumps({"service":"irc-notify","thermal_print":PRINT_ENABLED}).encode())
self.send_response(200); self.send_header("Content-Type", "application/json"); self.end_headers()
with _buffer_lock:
alertnames = sorted({it["alert"].get("labels", {}).get("alertname", "?") for it in _buffer.values()})
depth = len(_buffer)
info = {
"service": "irc-notify",
"config": {"thermal_print_enabled": THERMAL_PRINT_ENABLED,
"batch_interval_min": BATCH_INTERVAL_MIN,
"batch_max_pending": BATCH_MAX_PENDING,
"irc_target": f"{IRC_HOST}:{IRC_PORT} {IRC_CHANNEL}",
"print_web_url": PRINT_WEB_URL},
"buffer": {"depth": depth, "alertnames": alertnames,
"seconds_since_last_flush": int(time.time() - _last_flush_time),
"seconds_until_next_flush": max(0, int(BATCH_INTERVAL_MIN*60 - (time.time() - _last_flush_time)))},
"stats": _stats,
}
self.wfile.write(json.dumps(info, indent=2).encode())
def log_message(self, format, *args):
print(f"[irc-notify] {args[0]}", file=sys.stderr)
if __name__ == "__main__":
threading.Thread(target=digest_loop, daemon=True).start()
server = HTTPServer(("0.0.0.0", 9119), Handler)
print(f"IRC alert relay :9119 -> {IRC_HOST}:{IRC_PORT} {IRC_CHANNEL} (thermal: {PRINT_ENABLED})")
print(f"[irc-notify] :9119 -> IRC {IRC_HOST}:{IRC_PORT} {IRC_CHANNEL} | thermal={'ON' if THERMAL_PRINT_ENABLED else 'OFF'} | digest={BATCH_INTERVAL_MIN}m max={BATCH_MAX_PENDING}", file=sys.stderr)
server.serve_forever()
# =============================================================================
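The thermal routing in `Handler.do_POST` above reduces to a small decision function. A minimal sketch, assuming the hypothetical name `route_thermal` and string return values that do not exist in notify.py; it mirrors the branch order in the handler and assumes `status` is the upper-cased webhook status:

```python
def route_thermal(alert):
    """Return the thermal action for one Grafana webhook alert.

    Hypothetical helper for illustration only; notify.py inlines this
    logic in Handler.do_POST. IRC delivery happens for every alert and
    is not modelled here.
    """
    labels = alert.get("labels", {})
    status = alert.get("status", "firing").upper()
    channel = labels.get("alert_channel")
    severity = labels.get("severity", "").lower()

    if status == "RESOLVED":
        # Resolved alerts are dropped from the digest buffer.
        return "remove_from_digest"
    if channel == "thermal_print_immediate":
        # Buffered first; a flush happens only for a new fingerprint.
        return "digest_then_flush_if_new"
    if severity in ("critical", "disaster", "page") or channel == "thermal_print":
        return "digest"
    return "irc_only"
```

Under this model a `severity: warning` alert with `alert_channel: irc` (for example PuppetServiceFailed) never reaches the printer, while a `severity: critical` alert waits for the next digest unless it carries the immediate label.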
@@ -3362,6 +3635,39 @@ data:
relativeTimeRange: {from: 120, to: 0}
datasourceUid: __expr__
model: {type: threshold, expression: B, conditions: [{evaluator: {params: [600], type: gt}}], refId: C}
- orgId: 1
name: CI Runners
folder: CI Alerts
interval: 1m
rules:
- uid: linux-runner-offline
title: LinuxRunnerOffline
condition: C
for: 5m
noDataState: OK
execErrState: Error
annotations:
summary: "Linux CI runner offline: {{ $labels.deployment }}"
description: "A github-runner namespace Deployment has 0 ready replicas for more than 5 minutes. CI jobs targeting that repo will queue until the runner pod restarts and re-registers."
runbook: "1. kubectl -n github-runner get pods -l app.kubernetes.io/name={{ $labels.deployment }} 2. kubectl -n github-runner logs -l app.kubernetes.io/name={{ $labels.deployment }} --tail=50 3. Verify PAT repo access if registration returns 404 4. Verify no RWO PVC is shared by scaled runners"
labels:
severity: warning
service: github-runner
alert_channel: irc
team: ci
data:
- refId: A
relativeTimeRange: {from: 300, to: 0}
datasourceUid: prometheus
model: {expr: 'kube_deployment_status_replicas_ready{namespace="github-runner",deployment=~"github-runner(|-(sharedpos|puppet|signage|dms|telephony|print-web|chat|mysql|kiosk-linux))"} == bool 0', instant: true, refId: A}
- refId: B
relativeTimeRange: {from: 300, to: 0}
datasourceUid: __expr__
model: {type: reduce, expression: A, reducer: last, refId: B}
- refId: C
relativeTimeRange: {from: 300, to: 0}
datasourceUid: __expr__
model: {type: threshold, expression: B, conditions: [{evaluator: {params: [0], type: gt}}], refId: C}
- orgId: 1
name: Infrastructure
folder: AI Stack Alerts
@@ -3394,6 +3700,32 @@ data:
relativeTimeRange: {from: 120, to: 0}
datasourceUid: __expr__
model: {type: threshold, expression: B, conditions: [{evaluator: {params: [1], type: lt}}], refId: C}
- uid: macmini-runner-offline
title: MacMiniRunnerOffline
condition: C
for: 10m
noDataState: Alerting
execErrState: OK
annotations:
summary: Mac mini GitHub runner offline
description: "One or more macmini-* GitHub Actions runners have not reported online for more than 10 minutes. LaunchDaemons survive reboot and do not require the bluejay GUI session."
runbook: "1. ssh fcadmin@macmini.iamworkin.lan 2. launchctl print system/io.flowercore.github-runner-<slug> 3. Check /Users/fcadmin/Library/Logs/github-runners/<slug>/stderr.log 4. Re-register the repo runner if .runner is missing"
labels:
severity: warning
service: github-runner
data:
- refId: A
relativeTimeRange: {from: 600, to: 0}
datasourceUid: prometheus
model: {expr: 'min(flowercore_github_runner_online{runner=~"macmini-.*"}) or vector(0)', instant: true, refId: A}
- refId: B
relativeTimeRange: {from: 600, to: 0}
datasourceUid: __expr__
model: {type: reduce, expression: A, reducer: last, refId: B}
- refId: C
relativeTimeRange: {from: 600, to: 0}
datasourceUid: __expr__
model: {type: threshold, expression: B, conditions: [{evaluator: {params: [1], type: lt}}], refId: C}
- uid: high-cpu
title: High CPU (>85%)
condition: C
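Each provisioned rule in this file chains three stages: a Prometheus query (refId A), a `reduce` with `reducer: last` (refId B), and a `threshold` (refId C, the alert condition). A rough Python model of that chain for a single series; `evaluate_rule` and its arguments are illustrative names, not Grafana API, and the no-data case is simplified to `None`:

```python
import operator

# Threshold evaluator types used by the rules in this file.
EVALUATORS = {"gt": operator.gt, "lt": operator.lt}

def evaluate_rule(samples, evaluator, param):
    """Model A -> B (reduce last) -> C (threshold) for one series.

    Returns None when the query produced no samples; Grafana handles
    that case through the rule's noDataState, not the threshold.
    """
    if not samples:
        return None
    reduced = samples[-1]  # reducer: last
    return EVALUATORS[evaluator](reduced, param)
```

For example, `evaluate_rule([1, 1, 0], "lt", 1)` is `True`, which is the shape of a runner-offline rule that fires when the value drops below 1. A query whose result is the value 0 can never satisfy a `gt 0` threshold: `evaluate_rule([0], "gt", 0)` is `False`.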

apps/multus/multus.yaml

@@ -0,0 +1,297 @@
# =============================================================================
# Multus CNI — Meta-CNI for multi-network attachment to pods/VMs
# =============================================================================
# Purpose: enable KubeVirt VMs (and any future workload) to attach additional
# network interfaces beyond the default Calico-managed pod network. Required
# for ci1 (Windows Server 2025 KubeVirt VM) to bridge onto PROD VLAN 57.
#
# Source: upstream k8snetworkplumbingwg/multus-cni v4.2.2
# https://github.com/k8snetworkplumbingwg/multus-cni/blob/v4.2.2/deployments/multus-daemonset-thick.yml
#
# Inlined verbatim (with project header + version pin annotation) for
# reproducibility and air-gap safety. Bumping versions = edit this file +
# git push. ArgoCD picks up via the bluejay-infra ApplicationSet
# (apps/* directory generator on main).
#
# Why thick plugin (not thin):
# - Thick = daemon + thin shim binary; daemon handles NAD watch + CRD reads
# centrally so each pod's CNI ADD doesn't hit the K8s API server. Better
# for clusters with many NAD-using pods.
# - Thin = each CNI ADD process directly contacts K8s API. Simpler but
# scales worse and has more failure modes.
# - KubeVirt + multi-VM workload pattern fits thick perfectly.
#
# Cluster context (verified 2026-05-08):
# - RKE2 v1.34.5 on 3 nodes (rke2-server, rke2-agent1, rke2-agent2)
# - Calico CNI (Tigera-managed) at /etc/cni/net.d + /opt/cni/bin (default)
# - openSUSE Leap 16, kernel 6.12, containerd 2.1.5
# - host bridge for PROD VLAN 57 = `br-prod` (PUPPET HOST WORK — see Phase 1.5
# in docs/infrastructure/windows-server-build-runner-plan.md)
#
# Version pin: the DaemonSet image uses upstream's floating `snapshot-thick`
# tag, not a v4.2.2 image, so the running build can drift from the manifest
# inlined here. Pinning the release at deploy time would require a private
# mirror of the image, so for now we trust upstream + Calico's established
# pattern. Pin to a specific SHA256 digest once we mirror to Gitea OCI.
#
# Apply (once committed to bluejay-infra main, ApplicationSet auto-syncs):
# git add apps/multus/multus.yaml && git commit && git push origin main
# # ArgoCD `infra-multus` Application appears within 3 min via ApplicationSet
#
# Verify:
# kubectl -n kube-system get ds kube-multus-ds
# kubectl -n kube-system rollout status ds kube-multus-ds
# kubectl get crd network-attachment-definitions.k8s.cni.cncf.io
# =============================================================================
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: network-attachment-definitions.k8s.cni.cncf.io
annotations:
bluejay.iamworkin.lan/source: "k8snetworkplumbingwg/multus-cni v4.2.2"
spec:
group: k8s.cni.cncf.io
scope: Namespaced
names:
plural: network-attachment-definitions
singular: network-attachment-definition
kind: NetworkAttachmentDefinition
shortNames:
- net-attach-def
versions:
- name: v1
served: true
storage: true
schema:
openAPIV3Schema:
description: 'NetworkAttachmentDefinition is a CRD schema specified by the Network Plumbing
Working Group to express the intent for attaching pods to one or more logical or physical
networks. More information available at: https://github.com/k8snetworkplumbingwg/multi-net-spec'
type: object
properties:
apiVersion:
type: string
kind:
type: string
metadata:
type: object
spec:
description: 'NetworkAttachmentDefinition spec defines the desired state of a network attachment'
type: object
properties:
config:
description: 'NetworkAttachmentDefinition config is a JSON-formatted CNI configuration'
type: string
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: multus
rules:
- apiGroups: ["k8s.cni.cncf.io"]
resources:
- '*'
verbs:
- '*'
- apiGroups:
- ""
resources:
- pods
- pods/status
verbs:
- get
- list
- update
- watch
- apiGroups:
- ""
- events.k8s.io
resources:
- events
verbs:
- create
- patch
- update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: multus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: multus
subjects:
- kind: ServiceAccount
name: multus
namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: multus
namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
name: multus-daemon-config
namespace: kube-system
labels:
tier: node
app: multus
data:
daemon-config.json: |
{
"chrootDir": "/hostroot",
"cniVersion": "0.3.1",
"logLevel": "verbose",
"logToStderr": true,
"cniConfigDir": "/host/etc/cni/net.d",
"multusAutoconfigDir": "/host/etc/cni/net.d",
"multusConfigFile": "auto",
"socketDir": "/host/run/multus/"
}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-multus-ds
namespace: kube-system
labels:
tier: node
app: multus
name: multus
spec:
selector:
matchLabels:
name: multus
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
tier: node
app: multus
name: multus
spec:
hostNetwork: true
hostPID: true
tolerations:
- operator: Exists
effect: NoSchedule
- operator: Exists
effect: NoExecute
serviceAccountName: multus
containers:
- name: kube-multus
image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
command: [ "/usr/src/multus-cni/bin/multus-daemon" ]
# 2026-05-11: upstream default of 50Mi memory limit OOM-cascades when
# an operator-owned namespace accumulates >100 pending pods retrying
# CNI ADD. RemoteDesktop emitted 219 orphan rd-browser-only pods
# (missing OwnerReferences), kubelet's CNI ADD avalanche pushed multus
# over 50Mi, OOMKilled, restarted with even bigger backlog → loop.
# 21h cluster outage. See FlowerCore.Notes:
# feedback_multus_50mi_limit_oom_orphan_pod_avalanche.md
# 1Gi limit / 512Mi request comfortably handles a 200+ pod CNI
# catchup burst on 64GB nodes (nodes are <25% used in steady-state).
# Drop back toward 256Mi only after MultusMemoryPressure alert
# proves steady-state working set sits well below 200Mi.
resources:
requests:
cpu: "100m"
memory: "512Mi"
limits:
cpu: "100m"
memory: "1Gi"
securityContext:
privileged: true
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- name: cni
mountPath: /host/etc/cni/net.d
# multus-daemon expects the cnibin path to be identical in the pod and on the container host.
# e.g. if the CNI binaries are in '/opt/cni/bin' on the container host, mount them at '/opt/cni/bin' in multus-daemon,
# not at any other directory, like '/opt/bin' or '/usr/bin'.
- name: cnibin
mountPath: /opt/cni/bin
- name: host-run
mountPath: /host/run
- name: host-var-lib-cni-multus
mountPath: /var/lib/cni/multus
- name: host-var-lib-kubelet
mountPath: /var/lib/kubelet
mountPropagation: HostToContainer
- name: host-run-k8s-cni-cncf-io
mountPath: /run/k8s.cni.cncf.io
- name: host-run-netns
mountPath: /run/netns
mountPropagation: HostToContainer
- name: multus-daemon-config
mountPath: /etc/cni/net.d/multus.d
readOnly: true
- name: hostroot
mountPath: /hostroot
mountPropagation: HostToContainer
- mountPath: /etc/cni/multus/net.d
name: multus-conf-dir
env:
- name: MULTUS_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
initContainers:
- name: install-multus-binary
image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
command:
- "sh"
- "-c"
- "cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim && cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru"
resources:
requests:
cpu: "10m"
memory: "15Mi"
securityContext:
privileged: true
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- name: cnibin
mountPath: /host/opt/cni/bin
mountPropagation: Bidirectional
terminationGracePeriodSeconds: 10
volumes:
- name: cni
hostPath:
path: /etc/cni/net.d
- name: cnibin
hostPath:
path: /opt/cni/bin
- name: hostroot
hostPath:
path: /
- name: multus-daemon-config
configMap:
name: multus-daemon-config
items:
- key: daemon-config.json
path: daemon-config.json
- name: host-run
hostPath:
path: /run
- name: host-var-lib-cni-multus
hostPath:
path: /var/lib/cni/multus
- name: host-var-lib-kubelet
hostPath:
path: /var/lib/kubelet
- name: host-run-k8s-cni-cncf-io
hostPath:
path: /run/k8s.cni.cncf.io
- name: host-run-netns
hostPath:
path: /run/netns/
- name: multus-conf-dir
hostPath:
path: /etc/cni/multus/net.d
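The resource comment in the DaemonSet ties the 1Gi limit to the MultusMemoryPressure alert, which the monitoring rules fire above 80% of the limit. A small sketch of that arithmetic, assuming binary units (1Gi = 1024Mi); the function and variable names are illustrative:

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def pressure_ratio(working_set_bytes, limit_bytes):
    """Working set as a fraction of the container memory limit."""
    return working_set_bytes / limit_bytes

limit = 1 * GIB
threshold_mib = 0.8 * limit / MIB  # alert threshold expressed in MiB

# Upper end of the steady-state range quoted in the alert comment (~250MiB).
steady_high = pressure_ratio(250 * MIB, limit)
```

With these numbers the threshold sits at roughly 819MiB and the quoted steady state stays under 25% of the limit, so the alert should only trip during a CNI ADD backlog like the one described above.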


@@ -0,0 +1,210 @@
# Selenium Grid NetworkPolicy.
#
# Captured into bluejay-infra 2026-05-07 during the regroup audit. This
# NetworkPolicy was previously applied via `kubectl apply` directly to
# the cluster with no source-of-truth anywhere — a fresh cluster rebuild
# would have lost all of it (including the Selenium Grid → Traefik VIP
# allow rule for AAT runs against `*.iamworkin.lan` services).
#
# The Selenium Grid Deployment + Services themselves are still managed
# outside ArgoCD (deployed via raw kubectl from the original Selenium
# Grid bring-up). Migrating those into bluejay-infra is a separate lane —
# this commit only restores GitOps repeatability for the NetworkPolicy.
#
# Rules captured from the live cluster's `kubectl get netpol -n selenium
# selenium-netpol -o yaml` on 2026-05-07. Originally applied 2026-03-15
# (from `metadata.creationTimestamp` before the field was stripped).
#
# Allows:
# - Egress: DNS (53 UDP/TCP) to kube-system, intra-namespace pod-to-pod
# (4442/4443/4444/5555), Traefik VIP (10.0.56.200) on 80/443 for
# `*.iamworkin.lan` AAT runs, all namespaces on 80/443 plus the standard
# FC service ports (5100/5200/5300/5400/8080), pod CIDR (10.42.0.0/16) +
# service CIDR (10.43.0.0/16) on those ports plus 8443, LAN gateway
# range (10.0.56.0/24) on 80/443/8443, edge2 CUPS print
# (10.0.57.16:5200), public internet 80/443 (the `except` list covers
# only 172.16.0.0/12 and 192.168.0.0/16, so 10.0.0.0/8 is NOT excluded),
# and fc-signage:5190 for the signage AAT lane.
# - Ingress: Traefik (4444 + 8089 ACME-solver-style), intra-namespace
# pods (4442/4443/4444/5555), telephony / gitea / fc-system / fc-signage
# namespaces on 4444.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: selenium-netpol
namespace: selenium
labels:
app.kubernetes.io/part-of: selenium
app.kubernetes.io/component: isolation
spec:
egress:
- ports:
- port: 53
protocol: UDP
- port: 53
protocol: TCP
to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
- ports:
- port: 4442
protocol: TCP
- port: 4443
protocol: TCP
- port: 4444
protocol: TCP
- port: 5555
protocol: TCP
to:
- podSelector: {}
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
to:
- ipBlock:
cidr: 10.0.56.200/32
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
- port: 5200
protocol: TCP
- port: 5300
protocol: TCP
- port: 5400
protocol: TCP
- port: 5100
protocol: TCP
- port: 8080
protocol: TCP
to:
- namespaceSelector: {}
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
- port: 8443
protocol: TCP
- port: 8080
protocol: TCP
- port: 5200
protocol: TCP
- port: 5300
protocol: TCP
- port: 5400
protocol: TCP
- port: 5100
protocol: TCP
to:
- ipBlock:
cidr: 10.43.0.0/16
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
- port: 8443
protocol: TCP
- port: 8080
protocol: TCP
- port: 5200
protocol: TCP
- port: 5300
protocol: TCP
- port: 5400
protocol: TCP
- port: 5100
protocol: TCP
to:
- ipBlock:
cidr: 10.42.0.0/16
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
- port: 8443
protocol: TCP
to:
- ipBlock:
cidr: 10.0.56.0/24
- ports:
- port: 5200
protocol: TCP
to:
- ipBlock:
cidr: 10.0.57.16/32
- ports:
- port: 80
protocol: TCP
- port: 443
protocol: TCP
to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 172.16.0.0/12
- 192.168.0.0/16
- ports:
- port: 5190
protocol: TCP
to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: fc-signage
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: traefik-system
ports:
- port: 4444
protocol: TCP
- port: 8089
protocol: TCP
- from:
- podSelector: {}
ports:
- port: 4442
protocol: TCP
- port: 4443
protocol: TCP
- port: 4444
protocol: TCP
- port: 5555
protocol: TCP
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: telephony
ports:
- port: 4444
protocol: TCP
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: gitea
ports:
- port: 4444
protocol: TCP
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: fc-system
ports:
- port: 4444
protocol: TCP
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: fc-signage
ports:
- port: 4444
protocol: TCP
podSelector: {}
policyTypes:
- Ingress
- Egress
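The internet egress rule above combines `cidr: 0.0.0.0/0` with an `except` list. A minimal sketch of how that single `ipBlock` matches a destination, using only the values captured in the policy; `ipblock_matches` is a hypothetical helper, and it models one rule in isolation (NetworkPolicy rules are additive, so other rules may allow traffic this one does not):

```python
import ipaddress

# Values copied from the 80/443 internet egress rule.
CIDR = ipaddress.ip_network("0.0.0.0/0")
EXCEPT = [
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def ipblock_matches(dest):
    """True if dest is inside cidr and outside every except range."""
    addr = ipaddress.ip_address(dest)
    return addr in CIDR and not any(addr in net for net in EXCEPT)
```

Under this model `ipblock_matches("192.168.1.10")` is `False`, while `ipblock_matches("10.0.57.16")` is `True` because 10.0.0.0/8 is not in the `except` list.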


@@ -127,10 +127,13 @@ spec:
initContainers:
- name: fix-data-perms
image: busybox:latest
# Also chown /shared-tts (hostPath /tmp/tts-audio) so the non-root
# app user (uid 1654) can write Piper .sln16 files that Asterisk
# reads at /var/lib/asterisk/sounds/tts. World-readable (755) is
# fine — Asterisk runs as a different uid in the other pod.
# Must run as root to chown the hostPath /tmp/tts-audio that may be
# root-owned after node reboot. Pod-level runAsNonRoot:true would
# otherwise inherit and chown would fail with EPERM (see Notes memory
# feedback_hostpath_initcontainer_chown_perms).
securityContext:
runAsUser: 0
runAsNonRoot: false
command: ["sh", "-c", "chown -R 1654:1654 /data && chown 1654:1654 /shared-tts && chmod 0755 /shared-tts"]
volumeMounts:
- name: telephony-data


@@ -28,9 +28,12 @@ Source: `D:\git\FlowerCore\FlowerCore.WorldBuilder` (master)
Memory: `feedback_rke2_image_import_per_node_scp`.
3. **Bump image tag** in `worldbuilder.yaml` and git push.
ArgoCD ApplicationSet picks up within ~3 minutes.
4. **First production render** — open `https://worldbuilder.iamworkin.lan`,
create World → Character → Storyboard → ExportJob, confirm artifact
downloads. ComfyUI lives on BLUEJAY-WS at `http://10.0.56.20:8188`.
4. **First production render** — open
`https://worldbuilder.iamworkin.lan/studio/c32e0000-0000-4000-8000-000000000004`
and confirm the Cyberpunk Blue Jay demo prompt loads with five seeded fake
generated images. This Sprint 32 visitor-safe profile uses
`ClientMode=fake`; switch the image-generation env vars back to ComfyUI only
for an operator-owned GPU render lane.
## Health probes
@@ -53,8 +56,13 @@ Source: `D:\git\FlowerCore\FlowerCore.WorldBuilder` (master)
## Image generation backend
`FlowerCore:WorldBuilder:ImageGeneration:BaseUrl=http://10.0.56.20:8188` —
ComfyUI runs on BLUEJAY-WS Windows (R9700 / gfx1201 / ROCm 7.2.1). Pod reaches
the workstation directly across the 10.0.56.0/24 VLAN (no Podman-style host-
filter issues — K8s pods route via Calico, which is L3-routed across the
VLAN).
Sprint 32 pins the Kubernetes profile to
`FlowerCore:WorldBuilder:ImageGeneration:ClientMode=fake` with
`BaseUrl=http://127.0.0.1:1`. That keeps the public/internal visitor demo
deterministic, avoids GPU exposure, and still exercises the studio/gallery
surface with persisted generated-image metadata.
The previous ComfyUI backend target was `http://10.0.56.20:8188` on
BLUEJAY-WS (R9700 / gfx1201 / ROCm 7.2.1). Re-enable it only in an
operator-owned follow-up that also verifies workstation reachability and image
import freshness.
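The `FlowerCore:WorldBuilder:ImageGeneration:*` keys above are set in `worldbuilder.yaml` as environment variables, where .NET configuration reads `__` as the `:` separator. A small sketch of that mapping; `to_env_name` is a hypothetical helper, and the values are the Sprint 32 ones quoted in this section:

```python
def to_env_name(config_key):
    """Map a .NET configuration key to its environment-variable form."""
    return config_key.replace(":", "__")

# Sprint 32 visitor-safe profile as described above.
SPRINT32_PROFILE = {
    "FlowerCore:WorldBuilder:ImageGeneration:ClientMode": "fake",
    "FlowerCore:WorldBuilder:ImageGeneration:BaseUrl": "http://127.0.0.1:1",
}

env = {to_env_name(key): value for key, value in SPRINT32_PROFILE.items()}
```

Switching back to the ComfyUI lane would mean changing these same variables, which is why the section asks for an operator-owned follow-up instead of an ad hoc edit.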


@@ -16,7 +16,11 @@ kind: Namespace
metadata:
name: fc-worldbuilder
labels:
app.kubernetes.io/name: fc-worldbuilder
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
---
# SQLite DB + generated image gallery + PDF/PNG exports.
# Longhorn RWO — single replica with `Recreate` rollout strategy keeps it safe.
@@ -25,6 +29,13 @@ kind: PersistentVolumeClaim
metadata:
name: worldbuilder-data
namespace: fc-worldbuilder
labels:
app.kubernetes.io/name: worldbuilder-data
app.kubernetes.io/component: storage
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
spec:
accessModes:
- ReadWriteOnce
@@ -40,7 +51,13 @@ metadata:
namespace: fc-worldbuilder
labels:
app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/component: web
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
annotations:
flowercore.io/traceability-standard: k8s-pod-ownership-and-traceability-standard
spec:
replicas: 1
revisionHistoryLimit: 3
@@ -54,11 +71,16 @@ spec:
metadata:
labels:
app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/component: web
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics/prometheus"
flowercore.io/audit-trace-id: "worldbuilder-runtime-demo"
spec:
securityContext:
fsGroup: 1654
@@ -92,11 +114,14 @@ spec:
value: "/data/gallery"
- name: FlowerCore__WorldBuilder__Export__RootPath
value: "/data/exports"
# ComfyUI on BLUEJAY-WS (R9700 / gfx1201 / ROCm 7.2.1).
# Visitor-safe Sprint 32 profile: fake backend keeps public demo
# rendering deterministic and avoids exposing BLUEJAY-WS GPU.
- name: FlowerCore__WorldBuilder__ImageGeneration__BaseUrl
value: "http://10.0.56.20:8188"
value: "http://127.0.0.1:1"
- name: FlowerCore__WorldBuilder__ImageGeneration__ClientMode
value: "comfyui"
value: "fake"
- name: FlowerCore__WorldBuilder__ImageGeneration__BackendId
value: "fake"
resources:
# Cluster CPU-request budget runs hot (99% on all 3 nodes at deploy
# time) while actual CPU usage is well below capacity. Idle Blazor
@@ -165,7 +190,11 @@ metadata:
namespace: fc-worldbuilder
labels:
app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/component: web
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
spec:
type: ClusterIP
selector:
@@ -180,6 +209,13 @@ kind: Certificate
metadata:
name: worldbuilder-web-tls
namespace: fc-worldbuilder
labels:
app.kubernetes.io/name: worldbuilder-web-tls
app.kubernetes.io/component: ingress
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
spec:
secretName: worldbuilder-web-tls
issuerRef:
@@ -187,14 +223,26 @@ spec:
kind: ClusterIssuer
dnsNames:
- worldbuilder.iamworkin.lan
duration: 2160h # 90d
renewBefore: 720h # 30d
# The step-ca ACME provisioner caps certificate lifetime at 30d. The
# previous 90d request was silently capped to 30d, so renewBefore 720h
# (30d) equaled the actual cert lifetime. That triggered a perpetual
# renewal loop that generated 2365+ CertificateRequest objects in 18h.
# Match the working 720h/240h pattern used by every other FC service cert.
duration: 720h # 30d (step-ca cap)
renewBefore: 240h # 10d
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: worldbuilder-web
namespace: fc-worldbuilder
labels:
app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/component: ingress
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
spec:
entryPoints:
- websecure


@@ -305,15 +305,17 @@ spec:
path: /
port: 8080
initialDelaySeconds: 60
timeoutSeconds: 5
timeoutSeconds: 15
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /
port: 8080
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 5
timeoutSeconds: 15
failureThreshold: 3
---
apiVersion: v1
kind: Service


@@ -0,0 +1,84 @@
# openvoxserver Quadlet Durability
This runbook documents the noc1 `openvoxserver` durability fix for the Puppet control-repo deploy path. The service is a noc1 host artifact, not an ArgoCD application, so discovery always starts on noc1 rather than in `apps/*`.
## Current State
As of the Sprint 32 Cx-12 apply on 2026-05-17:
- `/etc/containers/systemd/openvoxserver.container` has a `GIT_SSH_COMMAND` environment entry that points at the persisted serverdata deploy key.
- `/etc/systemd/system/openvoxserver-safeconfig.service` is enabled and active, and reapplies `git config --global --add safe.directory '*'` inside the running container.
- `/opt/puppet/r10k-deploy.sh` self-heals before each fetch by setting `safe.directory`, the repo-local `core.sshCommand`, and the persisted `known_hosts` file when needed.
- `puppet-deploy.service` exits `0/SUCCESS` after the apply and the control repo reports `HEAD == origin/master`.
- `systemctl cat openvoxserver` does not currently resolve to a generated unit on noc1. The container is running through Podman with `restart=always`, so destructive recreate smoke must not run until the generated unit is present.
## Discovery
Run every command through noc1 as `fcadmin`; do not assume BLUEJAY-WS can reach container-local surfaces directly.
```bash
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "hostname && sudo -n true"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo find /etc/containers/systemd /usr/share/containers/systemd /etc/systemd/system -name 'openvoxserver*' 2>/dev/null"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo sed -n '1,220p' /etc/containers/systemd/openvoxserver.container"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo systemctl cat puppet-deploy.service"
```
If a future noc1 profile manages these files, update the Puppet control repo and let `puppet-deploy.service` apply the change. On 2026-05-17, `puppet` was not installed on the noc1 host itself, so Cx-12 used a direct noc1 host edit.
## Durable Fix Shape
The Quadlet keeps the deploy key as a path reference only:
```ini
Environment=GIT_SSH_COMMAND=ssh -i /opt/puppetlabs/server/data/puppetserver/.puppet-deploy-key -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o UserKnownHostsFile=/opt/puppetlabs/server/data/puppetserver/.known_hosts
```
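One way to confirm the path-only property is to scan the Quadlet for inline key material. The sketch below is illustrative rather than part of the repo: the function name `quadlet_is_path_only` is hypothetical, and it assumes a PEM-style `BEGIN ... PRIVATE KEY` header is the only form of inline key worth catching.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bluejay-infra): succeeds only when the
# given Quadlet file sets GIT_SSH_COMMAND and carries no inline private key.
quadlet_is_path_only() {
  local file="$1"
  # The Environment entry must be present.
  grep -q '^Environment=GIT_SSH_COMMAND=' "$file" || return 1
  # Any PEM private-key header means key material was pasted inline.
  if grep -q 'BEGIN .*PRIVATE KEY' "$file"; then
    return 1
  fi
  return 0
}
```

On noc1 this would be called as `quadlet_is_path_only /etc/containers/systemd/openvoxserver.container` from a shell that can read the file.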
The safeconfig service is intentionally independent of `openvoxserver.service` until the generated unit exists. It waits for the `openvoxserver` container name and then runs:
```bash
/usr/bin/podman exec openvoxserver git config --global --add safe.directory '*'
```
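The wait-then-apply behaviour described above can be sketched as a bounded polling loop. This is a minimal sketch under stated assumptions, not the contents of the real unit: the `wait_for` helper, the attempt count, and the delay are hypothetical, and `podman container exists` is used here as the readiness check.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of wait-then-apply; the real safeconfig unit may differ.

# wait_for ATTEMPTS DELAY_SECONDS COMMAND...
# Re-runs COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Assumed usage on noc1 (attempt count and delay are illustrative):
#   wait_for 30 2 podman container exists openvoxserver &&
#     podman exec openvoxserver git config --global --add safe.directory '*'
```

Bounding the loop means the unit fails visibly if the container never appears, instead of hanging indefinitely.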
The deploy script self-heals inside the container before it fetches the control repo:
```bash
git config --global --add safe.directory "*" 2>/dev/null || true
DEPLOY_KEY="/opt/puppetlabs/server/data/puppetserver/.puppet-deploy-key"
KNOWN_HOSTS="/opt/puppetlabs/server/data/puppetserver/.known_hosts"
REPO="/etc/puppetlabs/code/environments/production"
export GIT_SSH_COMMAND="ssh -i $DEPLOY_KEY -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o UserKnownHostsFile=$KNOWN_HOSTS"
git -C "$REPO" config core.sshCommand "$GIT_SSH_COMMAND" 2>/dev/null || true
```
## Validation
Non-destructive validation:
```bash
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo grep -n 'GIT_SSH_COMMAND' /etc/containers/systemd/openvoxserver.container"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo systemctl status openvoxserver-safeconfig.service --no-pager -l"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo systemctl start puppet-deploy.service && sudo systemctl status puppet-deploy.service --no-pager -l"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo podman exec openvoxserver git -C /etc/puppetlabs/code/environments/production config --get core.sshCommand"
```
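The last validation command only prints `core.sshCommand`; deciding whether the value is acceptable is left to the reader. A small check along these lines could make that explicit. The function name `ssh_command_is_durable` is hypothetical, and the two properties it tests (the persisted key path and `StrictHostKeyChecking=yes`) are taken from the Quadlet entry shown earlier in this runbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks a core.sshCommand value for the persisted
# deploy-key path and strict host-key checking.
ssh_command_is_durable() {
  local value="$1"
  case "$value" in
    *"/.puppet-deploy-key"*) ;;
    *) return 1 ;;
  esac
  case "$value" in
    *"StrictHostKeyChecking=yes"*) ;;
    *) return 1 ;;
  esac
  return 0
}
```

It would take the output of the `podman exec ... config --get core.sshCommand` command above as its single argument.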
Destructive recreate smoke is opt-in only:
```bash
scp -i ~/.ssh/fcadmin_ed25519 scripts/monitoring/openvox-recreate-smoke.sh fcadmin@10.0.56.10:/tmp/openvox-recreate-smoke.sh
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "chmod +x /tmp/openvox-recreate-smoke.sh && sudo OPENVOX_RECREATE_SMOKE=1 /tmp/openvox-recreate-smoke.sh"
```
Do not run the smoke during normal sprint work. It stops and removes the production container before starting it again through systemd, and it now refuses to continue unless `systemctl cat openvoxserver` succeeds.
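
The smoke script distinguishes its refusals by exit status: `64` when the opt-in variable is unset, `65` when `systemctl cat openvoxserver` fails, and `1` when the post-recreate `core.sshCommand` check fails. A wrapper could translate those into operator-facing messages, as in the sketch below. The function name `describe_smoke_exit` and the message wording are hypothetical; only the status codes come from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps the smoke script's exit status to a message.
# Codes 64 and 65 come from the script itself; the wording is illustrative.
describe_smoke_exit() {
  case "$1" in
    0)  echo "pass" ;;
    64) echo "skipped: OPENVOX_RECREATE_SMOKE=1 not set" ;;
    65) echo "skipped: no systemd unit for openvoxserver" ;;
    *)  echo "failed with status $1" ;;
  esac
}
```

Because the script runs under `set -euo pipefail`, any other non-zero status (for example from a failed `systemctl start`) falls through to the generic failure branch.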
## Credential Rotation Note
When rotating the Puppet deploy key, update the persisted serverdata copy on noc1:
```bash
sudo install -m 0600 -o root -g root <new-deploy-key> /opt/puppet/serverdata/.puppet-deploy-key
sudo podman exec openvoxserver sh -c "ssh-keyscan github.com > /opt/puppetlabs/server/data/puppetserver/.known_hosts"
sudo systemctl start openvoxserver-safeconfig.service
sudo systemctl start puppet-deploy.service
```
Never commit the deploy key or print it in logs.


@@ -0,0 +1,48 @@
#!/usr/bin/env bash
set -euo pipefail
if [ "${OPENVOX_RECREATE_SMOKE:-}" != "1" ]; then
echo "SKIP: set OPENVOX_RECREATE_SMOKE=1 to run the destructive openvoxserver recreate smoke." >&2
exit 64
fi
SUDO="${SUDO:-sudo}"
REPO="/etc/puppetlabs/code/environments/production"
CORE_SSH_COMMAND_FRAGMENT=".puppet-deploy-key"
if ! $SUDO systemctl cat openvoxserver >/dev/null 2>&1; then
echo "SKIP: systemctl cat openvoxserver failed; refusing to remove a container without a verified systemd recreate path." >&2
exit 65
fi
before="$($SUDO podman exec openvoxserver git -C "$REPO" rev-parse --short HEAD)"
echo "Before recreate: $before"
$SUDO systemctl stop openvoxserver
$SUDO podman rm openvoxserver 2>/dev/null || true
$SUDO systemctl start openvoxserver
sleep 50
$SUDO systemctl start puppet-deploy.service
sleep 5
$SUDO systemctl status puppet-deploy.service --no-pager -l
after="$($SUDO podman exec openvoxserver git -C "$REPO" rev-parse --short origin/master)"
echo "After recreate origin/master: $after"
$SUDO test -d /opt/puppet/code/environments/production/site-modules/profile/manifests
core_ssh="$($SUDO podman exec openvoxserver git -C "$REPO" config --get core.sshCommand)"
case "$core_ssh" in
*"$CORE_SSH_COMMAND_FRAGMENT"*) ;;
*)
echo "FAIL: core.sshCommand does not reference the persisted deploy key." >&2
exit 1
;;
esac
$SUDO podman exec openvoxserver git -C "$REPO" status --short --branch
echo "PASS: openvoxserver recreate smoke completed without git safety or deploy-key failure."


@@ -13,6 +13,7 @@ public sealed class FleetManifestLintTests
private static readonly HashSet<string> PublicReadOnlyHosts = new(StringComparer.Ordinal)
{
"brochure.flowercore.io",
"dist.flowercore.io",
"dns.iamworkin.lan",
};
@@ -22,10 +23,16 @@ public sealed class FleetManifestLintTests
// (bootstrap-JWT) so its allowlist is GET||HEAD||POST||OPTIONS — but
// PUT/PATCH/DELETE must still 404 at the route. Anything wider than this
// set should fail this lint.
//
// PUB-1 (2026-05-06): update.flowercore.io / updates.flowercore.io were
// added for the Cloudflare-proxied public Update Center edge. They use the
// same bounded read-write allowlist as the LAN pair.
private static readonly HashSet<string> PublicReadWriteAllowlistHosts = new(StringComparer.Ordinal)
{
"updatecenter.iamworkin.lan",
"updates.iamworkin.lan",
"update.flowercore.io",
"updates.flowercore.io",
};
private static readonly HashSet<string> ApiKeyProtectedDeployments = new(StringComparer.Ordinal)
@@ -48,6 +55,43 @@ public sealed class FleetManifestLintTests
"ttsreader-piper",
};
private static readonly IReadOnlyDictionary<string, string> LinuxRunnerRepos = new Dictionary<string, string>(StringComparer.Ordinal)
{
["github-runner"] = "https://github.com/astoltz/FlowerCore.Common",
["github-runner-sharedpos"] = "https://github.com/astoltz/FlowerCore.Shared.Pos",
["github-runner-puppet"] = "https://github.com/astoltz/FlowerCore.Puppet",
["github-runner-signage"] = "https://github.com/astoltz/FlowerCore.Signage",
["github-runner-dms"] = "https://github.com/astoltz/FlowerCore.DMS",
["github-runner-telephony"] = "https://github.com/astoltz/FlowerCore.Telephony",
["github-runner-print-web"] = "https://github.com/astoltz/FlowerCore.Print.Web",
["github-runner-chat"] = "https://github.com/astoltz/FlowerCore.Chat",
["github-runner-mysql"] = "https://github.com/astoltz/FlowerCore.MySQL",
["github-runner-kiosk-linux"] = "https://github.com/astoltz/FlowerCore.Kiosk.Linux",
};
private static readonly HashSet<string> ScaledLinuxRunnerDeployments = new(StringComparer.Ordinal)
{
"github-runner-sharedpos",
"github-runner-puppet",
"github-runner-signage",
"github-runner-dms",
"github-runner-telephony",
"github-runner-print-web",
"github-runner-chat",
"github-runner-mysql",
"github-runner-kiosk-linux",
};
private static readonly IReadOnlyDictionary<string, string> WritableRunnerEnv = new Dictionary<string, string>(StringComparer.Ordinal)
{
["HOME"] = "/home/runner",
["DOTNET_INSTALL_DIR"] = "/home/runner/.dotnet",
["DOTNET_CLI_HOME"] = "/home/runner",
["NUGET_PACKAGES"] = "/home/runner/.nuget/packages",
["XDG_CACHE_HOME"] = "/home/runner/.cache",
["RUNNER_TOOL_CACHE"] = "/home/runner/_tool",
};
[Fact]
public void IngressRoutes_MustKeepServiceReferencesInTheSameNamespace()
{
@@ -181,6 +225,98 @@ public sealed class FleetManifestLintTests
violations.Should().BeEmpty();
}
[Fact]
public void GitHubRunnerFleet_MustRegisterRequiredReposAsRepoScopedDeployments()
{
var deployments = GitHubRunnerDeployments();
foreach (var expectedRunner in LinuxRunnerRepos)
{
deployments.Should().ContainKey(expectedRunner.Key);
var container = deployments[expectedRunner.Key].ContainerMappings().Should().ContainSingle().Subject;
EnvValue(container, "REPO_URL").Should().Be(expectedRunner.Value);
EnvValue(container, "EPHEMERAL").Should().Be("true");
EnvValue(container, "LABELS").Should().Be("self-hosted,linux,fc-build-linux");
EnvValue(container, "RUN_AS_ROOT").Should().Be("false");
EnvValue(container, "ACCESS_TOKEN").Should().BeNull("ACCESS_TOKEN must come from github-runner-token Secret, not a literal");
EnvSecretName(container, "ACCESS_TOKEN").Should().Be("github-runner-token");
EnvSecretKey(container, "ACCESS_TOKEN").Should().Be("credential");
}
}
[Fact]
public void GitHubRunnerFleet_MustSetWritableNonRootDotnetAndCachePaths()
{
foreach (var deployment in GitHubRunnerDeployments().Values)
{
var container = deployment.ContainerMappings().Should().ContainSingle().Subject;
foreach (var expectedEnv in WritableRunnerEnv)
{
EnvValue(container, expectedEnv.Key).Should().Be(expectedEnv.Value, $"{deployment.Name} must keep .NET paths writable for uid 1001");
}
var mounts = ManifestNodeExtensions.MappingSequence(container, "volumeMounts")
.ToDictionary(
mount => ManifestNodeExtensions.Scalar(mount, "name") ?? string.Empty,
mount => ManifestNodeExtensions.Scalar(mount, "mountPath") ?? string.Empty,
StringComparer.Ordinal);
mounts.Should().Contain("runner-home", "/home/runner");
mounts.Should().Contain("nuget-cache", "/home/runner/.nuget/packages");
mounts.Should().Contain("tmp", "/tmp");
}
}
[Fact]
public void GitHubRunnerFleet_MustAvoidRwoMultiAttachForScaledDeployments()
{
var deployments = GitHubRunnerDeployments();
foreach (var deploymentName in ScaledLinuxRunnerDeployments)
{
var deployment = deployments[deploymentName];
ReplicaCount(deployment).Should().Be(2);
var volumes = deployment.MappingSequence("spec", "template", "spec", "volumes");
var claimNames = volumes
.Select(volume => ManifestNodeExtensions.Scalar(volume, "persistentVolumeClaim", "claimName"))
.Where(value => !string.IsNullOrWhiteSpace(value))
.ToList();
claimNames.Should().BeEmpty($"{deploymentName} is scaled and must not share a RWO PVC");
volumes.Should().Contain(volume =>
string.Equals(ManifestNodeExtensions.Scalar(volume, "name"), "nuget-cache", StringComparison.Ordinal)
&& ManifestNodeExtensions.Mapping(volume, "emptyDir") != null);
}
var common = deployments["github-runner"];
ReplicaCount(common).Should().Be(1);
common.MappingSequence("spec", "template", "spec", "volumes")
.Select(volume => ManifestNodeExtensions.Scalar(volume, "persistentVolumeClaim", "claimName"))
.Where(value => !string.IsNullOrWhiteSpace(value))
.Should()
.ContainSingle()
.Which
.Should()
.Be("github-runner-nuget-cache");
}
[Fact]
public void Monitoring_MustAlertWhenLinuxRunnerDeploymentIsUnavailable()
{
var monitoring = File.ReadAllText(Path.Combine(Inventory.BluejayRoot, "apps", "monitoring", "noc-monitoring.yaml"));
monitoring.Should().Contain("MacMiniRunnerOffline");
monitoring.Should().Contain("LinuxRunnerOffline");
monitoring.Should().Contain("kube_deployment_status_replicas_ready");
monitoring.Should().Contain("github-runner(|-(sharedpos|puppet|signage|dms|telephony|print-web|chat|mysql|kiosk-linux))");
monitoring.Should().Contain("folder: CI Alerts");
monitoring.Should().Contain("uid: linux-runner-offline");
monitoring.Should().Contain("alert_channel: irc");
}
[Fact]
public void StatefulSets_WithVolumeClaimTemplates_MustDeclareFilesystemDefaults()
{
@@ -285,6 +421,232 @@ public sealed class FleetManifestLintTests
violations.Should().BeEmpty();
}
[Fact]
public void FcDeviceManagement_MustShipExpectedManifestSet()
{
var appRoot = Path.Combine(Inventory.BluejayRoot, "apps", "fc-devicemgmt");
Directory.Exists(appRoot).Should().BeTrue("Sprint 8 Cx-5 owns apps/fc-devicemgmt.");
var expectedFiles = new[]
{
"1password-item.yaml",
"argocd-application.yaml",
"certificate-web.yaml",
"clusterissuer-step-ca-agent.yaml",
"clusterrole-operator.yaml",
"clusterrolebinding-operator.yaml",
"deployment-operator.yaml",
"deployment-web.yaml",
"ingressroute-web.yaml",
"namespace.yaml",
"network-policy.yaml",
"service-web.yaml",
"serviceaccount-operator.yaml",
};
Directory.GetFiles(appRoot, "*.yaml")
.Select(Path.GetFileName)
.Should()
.BeEquivalentTo(expectedFiles);
foreach (var expectedFile in expectedFiles)
{
FcDeviceManagementDocuments()
.Should()
.Contain(document => document.RelativePath == $"fc-devicemgmt/{expectedFile}");
}
}
[Fact]
public void FcDeviceManagement_ObjectsMustCarryStandardTraceabilityLabels()
{
var requiredLabels = new[]
{
"app.kubernetes.io/name",
"app.kubernetes.io/part-of",
"app.kubernetes.io/managed-by",
"flowercore.io/tenant-id",
"flowercore.io/created-by",
};
var violations = FcDeviceManagementDocuments()
.SelectMany(document => requiredLabels
.Where(label => string.IsNullOrWhiteSpace(document.Scalar("metadata", "labels", label)))
.Select(label => $"{document.Descriptor} is missing metadata.labels['{label}']."))
.Concat(FcDeviceManagementDocuments()
.Where(document => document.Kind == "Deployment")
.SelectMany(document => requiredLabels
.Where(label => string.IsNullOrWhiteSpace(document.Scalar("spec", "template", "metadata", "labels", label)))
.Select(label => $"{document.Descriptor} pod template is missing metadata.labels['{label}'].")))
.Concat(FcDeviceManagementDocuments()
.Where(document => document.Kind == "Deployment")
.Where(document => string.IsNullOrWhiteSpace(document.Scalar("spec", "template", "metadata", "annotations", "flowercore.io/audit-trace-id")))
.Select(document => $"{document.Descriptor} pod template is missing flowercore.io/audit-trace-id."))
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void FcDeviceManagement_IngressMustUseCertManagerAndKeepPublicHostDisabled()
{
var appText = string.Join(
Environment.NewLine,
Directory.GetFiles(Path.Combine(Inventory.BluejayRoot, "apps", "fc-devicemgmt"), "*.yaml")
.Select(File.ReadAllText));
appText.Should().NotContain("certResolver");
appText.Should().Contain("update.flowercore.io");
appText.Should().Contain("disabled-until-Q-OIDC-1");
FcDeviceManagementDocuments()
.Where(document => document.Kind == "IngressRoute")
.SelectMany(document => document.MappingSequence("spec", "routes"))
.Select(route => ManifestNodeExtensions.Scalar(route, "match") ?? string.Empty)
.Should()
.Contain(match => match.Contains("Host(`devices.iamworkin.lan`)", StringComparison.Ordinal))
.And.NotContain(match => match.Contains("Host(`update.flowercore.io`)", StringComparison.Ordinal));
var certificate = FcDeviceManagementDocuments()
.Single(document => document.Kind == "Certificate" && document.Name == "fc-devicemgmt-web-tls");
certificate.Scalar("spec", "issuerRef", "name").Should().Be("step-ca-acme");
certificate.Scalar("spec", "issuerRef", "kind").Should().Be("ClusterIssuer");
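// ContainSingle(string) treats its argument as the "because" reason, not an
// expected value, so assert the single element explicitly.
ManifestNodeExtensions.ScalarSequence(certificate.Root, "spec", "dnsNames")
.Should()
.ContainSingle()
.Which
.Should()
.Be("devices.iamworkin.lan");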
}
[Fact]
public void FcDeviceManagement_StepCaAgentIssuerMustTargetNocProvisioner()
{
var issuer = FcDeviceManagementDocuments()
.Single(document => document.Kind == "StepClusterIssuer" && document.Name == "step-ca-agent");
issuer.Scalar("apiVersion").Should().Be("certmanager.step.sm/v1beta1");
issuer.Scalar("spec", "url").Should().Be("https://10.0.56.10:9443");
issuer.Scalar("spec", "caBundle").Should().NotBeNullOrWhiteSpace();
issuer.Scalar("spec", "provisioner", "name").Should().Be("step-ca-agent");
issuer.Scalar("spec", "provisioner", "kid").Should().Be("RF3A9welUYVOWBX8tr19aWyA2kQlxoGZN1dRwTElUEM");
}
[Fact]
public void FcDeviceManagement_StepCaAgentIssuerMustReferencePasswordSecretOnly()
{
var issuer = FcDeviceManagementDocuments()
.Single(document => document.Kind == "StepClusterIssuer" && document.Name == "step-ca-agent");
issuer.Scalar("spec", "provisioner", "passwordRef", "name")
.Should()
.Be("step-ca-agent-provisioner-password");
issuer.Scalar("spec", "provisioner", "passwordRef", "namespace").Should().Be("cert-manager");
issuer.Scalar("spec", "provisioner", "passwordRef", "key").Should().Be("password");
var issuerText = File.ReadAllText(Path.Combine(Inventory.BluejayRoot, "apps", "fc-devicemgmt", "clusterissuer-step-ca-agent.yaml"));
issuerText.Should().NotContain("stringData:");
issuerText.Should().NotContain("password:");
issuerText.Should().NotContain("privateKey");
}
[Fact]
public void FcDeviceManagement_StepCaAgentIssuerMustCarryTraceabilityMetadata()
{
var issuer = FcDeviceManagementDocuments()
.Single(document => document.Kind == "StepClusterIssuer" && document.Name == "step-ca-agent");
issuer.Scalar("metadata", "labels", "app.kubernetes.io/managed-by").Should().Be("argocd");
issuer.Scalar("metadata", "labels", "flowercore.io/tenant-id").Should().Be("system");
issuer.Scalar("metadata", "annotations", "flowercore.io/provisioner-source")
.Should()
.Be("profile::pki::stepca");
issuer.Scalar("metadata", "annotations", "flowercore.io/secret-source")
.Should()
.Be("cert-manager/step-ca-agent-provisioner-password");
}
[Fact]
public void FcDeviceManagement_OperatorRbacMustCoverDevicesAndOwnerLookup()
{
var clusterRole = FcDeviceManagementDocuments()
.Single(document => document.Kind == "ClusterRole" && document.Name == "fc-devicemgmt-operator");
var allScalars = clusterRole.AllScalars().ToList();
allScalars.Should().Contain("devices.flowercore.io");
allScalars.Should().Contain("*");
allScalars.Should().Contain("deployments");
allScalars.Should().Contain("get");
var operatorDeployment = FcDeviceManagementDocuments()
.Single(document => document.Kind == "Deployment" && document.Name == "fc-devicemgmt-operator");
operatorDeployment.AllScalars().Should().Contain("FLOWERCORE_KUBERNETES_OWNER_DEPLOYMENT");
operatorDeployment.AllScalars().Should().Contain("fc-devicemgmt-operator");
}
[Fact]
public void FcDeviceManagement_RuntimeSecretsMustUseOnePasswordItemPattern()
{
var item = FcDeviceManagementDocuments()
.Single(document => document.Kind == "OnePasswordItem" && document.Name == "fc-devicemgmt-runtime");
item.Scalar("spec", "itemPath")
.Should()
.Be("vaults/IAmWorkin/items/FlowerCore DeviceManagement Runtime");
var appText = string.Join(
Environment.NewLine,
Directory.GetFiles(Path.Combine(Inventory.BluejayRoot, "apps", "fc-devicemgmt"), "*.yaml")
.Select(File.ReadAllText));
FcDeviceManagementDocuments().Should().NotContain(document => document.Kind == "Secret");
appText.Should().Contain("secretKeyRef:");
appText.Should().Contain("secretName: fc-devicemgmt-runtime");
appText.Should().NotContain("stringData:");
appText.Should().NotContain("from-literal");
appText.Should().NotContain("tls.key:");
}
[Fact]
public void FcDeviceManagement_NetworkPoliciesMustAllowLanAgentsSynologyAndDnatPorts()
{
var policies = FcDeviceManagementDocuments()
.Where(document => document.Kind == "NetworkPolicy")
.ToList();
policies.Should().HaveCount(2);
var combinedScalars = policies.SelectMany(policy => policy.AllScalars()).ToList();
combinedScalars.Should().Contain("10.0.56.0/24");
combinedScalars.Should().Contain("10.0.57.0/24");
combinedScalars.Should().Contain("10.0.58.0/24");
combinedScalars.Should().Contain("10.0.68.0/27");
combinedScalars.Should().Contain("10.0.58.3/32");
var combinedEgressPorts = policies.SelectMany(policy => policy.EgressPorts()).ToHashSet(StringComparer.Ordinal);
combinedEgressPorts.Should().Contain(new[] { "80", "443", "8080", "8443", "2049", "111" });
var traefikVipPolicies = policies
.Where(policy => policy.AllScalars().Any(value => value.Contains("10.0.56.200", StringComparison.Ordinal)))
.ToList();
traefikVipPolicies.Should().ContainSingle();
traefikVipPolicies[0].EgressPorts().Should().Contain(new[] { "80", "443", "8080", "8443" });
}
[Fact]
public void FcDeviceManagement_ArgocdApplicationMustMatchApplicationSetDiscoveryConventions()
{
var application = FcDeviceManagementDocuments()
.Single(document => document.Kind == "Application" && document.Name == "infra-fc-devicemgmt");
application.Namespace.Should().Be("argocd");
application.Scalar("spec", "source", "repoURL")
.Should()
.Be("http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git");
application.Scalar("spec", "source", "path").Should().Be("apps/fc-devicemgmt");
application.Scalar("spec", "destination", "namespace").Should().Be("fc-devicemgmt");
}
private static IEnumerable<string> ProbeViolations(
ManifestDocument document,
YamlMappingNode container,
@@ -308,6 +670,51 @@ public sealed class FleetManifestLintTests
$"{document.Descriptor} container '{containerName}' still uses {probeKey}.httpGet on /health.",
};
}
private static IReadOnlyDictionary<string, ManifestDocument> GitHubRunnerDeployments()
{
return Inventory.Documents
.Where(document => document.Kind == "Deployment")
.Where(document => document.Namespace == "github-runner")
.ToDictionary(document => document.Name, StringComparer.Ordinal);
}
private static int ReplicaCount(ManifestDocument document)
{
return int.TryParse(document.Scalar("spec", "replicas"), out var replicas) ? replicas : 1;
}
private static string? EnvValue(YamlMappingNode container, string name)
{
return EnvMapping(container, name) is { } env ? ManifestNodeExtensions.Scalar(env, "value") : null;
}
private static string? EnvSecretName(YamlMappingNode container, string name)
{
return EnvMapping(container, name) is { } env
? ManifestNodeExtensions.Scalar(env, "valueFrom", "secretKeyRef", "name")
: null;
}
private static string? EnvSecretKey(YamlMappingNode container, string name)
{
return EnvMapping(container, name) is { } env
? ManifestNodeExtensions.Scalar(env, "valueFrom", "secretKeyRef", "key")
: null;
}
private static YamlMappingNode? EnvMapping(YamlMappingNode container, string name)
{
return ManifestNodeExtensions.MappingSequence(container, "env")
.SingleOrDefault(env => string.Equals(ManifestNodeExtensions.Scalar(env, "name"), name, StringComparison.Ordinal));
}
private static IReadOnlyList<ManifestDocument> FcDeviceManagementDocuments()
{
return Inventory.Documents
.Where(document => document.RelativePath.StartsWith("fc-devicemgmt/", StringComparison.Ordinal))
.ToList();
}
}
internal sealed class ManifestInventory


@@ -0,0 +1,99 @@
using FluentAssertions;
using Xunit;
namespace BluejayInfraLint.Tests;
[Trait("Category", "Unit")]
public sealed class OpenVoxServerDurabilityTests
{
private static readonly string Root = FindRepoRoot();
private static readonly string RunbookPath = Path.Combine(Root, "docs", "runbooks", "openvoxserver-quadlet-durability.md");
private static readonly string SmokePath = Path.Combine(Root, "scripts", "monitoring", "openvox-recreate-smoke.sh");
[Fact]
public void Runbook_DocumentsHostArtifactAndNonArgoPath()
{
var runbook = File.ReadAllText(RunbookPath);
runbook.Should().Contain("noc1 host artifact");
runbook.Should().Contain("not an ArgoCD application");
runbook.Should().Contain("systemctl cat openvoxserver");
runbook.Should().Contain("/etc/containers/systemd/openvoxserver.container");
}
[Fact]
public void Runbook_DocumentsCx12LiveApplyState()
{
var runbook = File.ReadAllText(RunbookPath);
runbook.Should().Contain("Sprint 32 Cx-12");
runbook.Should().Contain("openvoxserver-safeconfig.service");
runbook.Should().Contain("/opt/puppet/r10k-deploy.sh");
runbook.Should().Contain("HEAD == origin/master");
}
[Fact]
public void SmokeScript_IsExplicitlyOptIn()
{
var smoke = File.ReadAllText(SmokePath);
smoke.Should().Contain("OPENVOX_RECREATE_SMOKE");
smoke.Should().Contain("exit 64");
smoke.IndexOf("OPENVOX_RECREATE_SMOKE", StringComparison.Ordinal)
.Should().BeLessThan(smoke.IndexOf("systemctl stop openvoxserver", StringComparison.Ordinal));
}
[Fact]
public void SmokeScript_RequiresGeneratedSystemdUnitBeforeRemovingContainer()
{
var smoke = File.ReadAllText(SmokePath);
smoke.Should().Contain("systemctl cat openvoxserver");
smoke.Should().Contain("refusing to remove a container without a verified systemd recreate path");
smoke.IndexOf("systemctl cat openvoxserver", StringComparison.Ordinal)
.Should().BeLessThan(smoke.IndexOf("podman rm openvoxserver", StringComparison.Ordinal));
}
[Fact]
public void Artifacts_DoNotStoreSecretsOrPaidRunnerLabels()
{
var forbidden = new[]
{
"BEGIN OPENSSH PRIVATE KEY",
"BEGIN RSA PRIVATE KEY",
"ubuntu-latest",
"windows-latest",
"macos-latest",
};
var violations = new[] { RunbookPath, SmokePath }
.SelectMany(path =>
{
var text = File.ReadAllText(path);
return forbidden
.Where(token => text.Contains(token, StringComparison.OrdinalIgnoreCase))
.Select(token => $"{Path.GetRelativePath(Root, path)} contains forbidden token {token}");
})
.ToList();
violations.Should().BeEmpty();
}
private static string FindRepoRoot()
{
var current = new DirectoryInfo(AppContext.BaseDirectory);
while (current is not null)
{
if (Directory.Exists(Path.Combine(current.FullName, "apps"))
&& Directory.Exists(Path.Combine(current.FullName, "scripts"))
&& File.Exists(Path.Combine(current.FullName, "README.md")))
{
return current.FullName;
}
current = current.Parent;
}
throw new DirectoryNotFoundException("Could not find bluejay-infra root.");
}
}


@@ -0,0 +1,269 @@
using System.Text.Json;
using FluentAssertions;
using Xunit;

namespace BluejayInfraLint.Tests;

// Static checks over the files under apps/fc-signage-pi-player. Each test reads
// an artifact as text and asserts on its content; no script or unit is executed.
[Trait("Category", "Unit")]
public sealed class PiSignagePlayerArtifactTests
{
    private static readonly string Root = FindRepoRoot();
    private static readonly string AppRoot = Path.Combine(Root, "apps", "fc-signage-pi-player");

    public static TheoryData<string> RequiredArtifacts => new()
    {
        "README.md",
        "systemd/flowercore-signage-player-pi.service",
        "systemd/flowercore-signage-player-pi-hdmi.service",
        "systemd/flowercore-signage-bootstrap.service",
        "systemd/flowercore-signage-renew.service",
        "systemd/flowercore-signage-renew.timer",
        "systemd/flowercore-signage-detect-display.service",
        "systemd/flowercore-signage-detect-display.timer",
        "systemd/99-flowercore-signage-hdmi.rules",
        "chromium-policies/flowercore-signage.json",
        "scripts/flowercore-signage-launch.sh",
        "scripts/flowercore-signage-prelaunch.sh",
        "scripts/flowercore-signage-bootstrap.sh",
        "scripts/flowercore-signage-renew-cert.sh",
        "scripts/flowercore-signage-hdmi-respond.sh",
        "scripts/fc-signage-detect-display",
    };

    [Theory]
    [MemberData(nameof(RequiredArtifacts))]
    public void RequiredArtifacts_ArePresent(string relativePath)
    {
        File.Exists(Path.Combine(AppRoot, relativePath)).Should().BeTrue(relativePath);
    }

    [Fact]
    public void PlayerService_UsesExpectedRestartAndMemoryGuards()
    {
        var unit = Read("systemd/flowercore-signage-player-pi.service");
        unit.Should().Contain("Restart=always");
        unit.Should().Contain("RestartSec=10s");
        unit.Should().Contain("StartLimitBurst=5");
        unit.Should().Contain("StartLimitIntervalSec=300s");
        unit.Should().Contain("MemoryMax=2G");
    }

    [Fact]
    public void PlayerService_IsGatedByNodeIdentityAndMtlsCertificate()
    {
        var unit = Read("systemd/flowercore-signage-player-pi.service");
        unit.Should().Contain("ConditionPathExists=/etc/flowercore/signage-node.json");
        unit.Should().Contain("ConditionPathExists=/etc/fc-signage-player/client.p12");
        unit.Should().Contain("ExecStartPre=/usr/local/bin/flowercore-signage-prelaunch.sh");
    }

    [Fact]
    public void LaunchScript_TriesEmbedThenFallsBackToBarePlayerRoute()
    {
        var script = Read("scripts/flowercore-signage-launch.sh");
        script.Should().Contain("/player/${NODE_ID}/embed?token=${CERT_THUMB}");
        script.Should().Contain("url-divergence.log");
        script.Should().Contain("/player/${NODE_ID}?token=${CERT_THUMB}");
    }

    [Fact]
    public void LaunchScript_DisablesChromiumPromptsAndRuntimeUpdates()
    {
        var script = Read("scripts/flowercore-signage-launch.sh");
        script.Should().Contain("--noerrdialogs");
        script.Should().Contain("--disable-infobars");
        script.Should().Contain("--password-store=basic");
        script.Should().Contain("--check-for-update-interval=2592000");
    }

    [Fact]
    public void PrelaunchScript_AbortsWhenRequiredFilesAreMissing()
    {
        var script = Read("scripts/flowercore-signage-prelaunch.sh");
        script.Should().Contain("for f in /etc/flowercore/signage-node.json /etc/fc-signage-player/client.p12 /etc/fc-signage-player/client.p12.pass");
        script.Should().Contain("exit 1");
        script.Should().Contain("-checkend $((7*24*3600))");
    }

    [Fact]
    public void BootstrapScript_IsIdempotentWhenAlreadyEnrolled()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("already enrolled");
        script.Should().Contain("exit 0");
        script.Should().Contain(".enrolledAt");
    }

    [Fact]
    public void BootstrapScript_GeneratesStableMachineIdFromUuid()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("uuidgen");
        script.Should().Contain("cut -c1-16");
        script.Should().Contain("machineId");
    }

    [Fact]
    public void BootstrapScript_RetriesRegisterOnceForFirstCallRace()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("for attempt in 1 2");
        script.Should().Contain("register attempt $attempt returned");
        script.Should().Contain("sleep 5");
    }

    [Fact]
    public void BootstrapScript_SupportsSetupCodeAndApprovalPollingBudget()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("signage-setup-code");
        script.Should().Contain("approve-via-setup-code");
        script.Should().Contain("+ 1800");
        script.Should().Contain("sleep 15");
    }

    [Fact]
    public void BootstrapScript_CsrSubjectIdentifiesPiPlayer()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("/CN=${NODE_ID}/O=FlowerCore/OU=SignagePlayer-Pi");
    }

    [Fact]
    public void BootstrapScript_PersistsCertificateAsP12WithRestrictivePermissions()
    {
        var script = Read("scripts/flowercore-signage-bootstrap.sh");
        script.Should().Contain("openssl pkcs12 -export");
        script.Should().Contain("client.p12.pass");
        script.Should().Contain("chmod 0600");
        script.Should().Contain("chmod 0640");
    }

    [Fact]
    public void RenewScript_OnlyRunsWhenCertHasLessThanThirtyDays()
    {
        var script = Read("scripts/flowercore-signage-renew-cert.sh");
        script.Should().Contain("-checkend $((30*24*3600))");
        script.Should().Contain("exit 0");
        script.Should().Contain("/renew");
    }

    [Fact]
    public void RenewScript_AtomicallySwapsNewCertificateFiles()
    {
        var script = Read("scripts/flowercore-signage-renew-cert.sh");
        script.Should().Contain("client.key.new");
        script.Should().Contain("mv \"$CERT_DIR/client.key.new\" \"$CERT_DIR/client.key\"");
        script.Should().Contain("mv \"$CERT_DIR/client.p12.new\" \"$CERT_DIR/client.p12\"");
    }

    [Fact]
    public void HdmiRule_RestartsPlayerAndRunsCapabilityDetection()
    {
        var rule = Read("systemd/99-flowercore-signage-hdmi.rules");
        var responder = Read("scripts/flowercore-signage-hdmi-respond.sh");
        rule.Should().Contain("KERNEL==\"card?-HDMI-A-?\"");
        rule.Should().Contain("start flowercore-signage-player-pi-hdmi.service");
        responder.Should().Contain("sleep 2");
        responder.Should().Contain("start flowercore-signage-detect-display.service");
        responder.Should().Contain("restart flowercore-signage-player-pi.service");
    }

    [Fact]
    public void DetectDisplayServiceAndTimer_RunAtBootAndDaily()
    {
        var service = Read("systemd/flowercore-signage-detect-display.service");
        var timer = Read("systemd/flowercore-signage-detect-display.timer");
        service.Should().Contain("ExecStart=/usr/local/bin/fc-signage-detect-display");
        timer.Should().Contain("OnBootSec=30s");
        timer.Should().Contain("OnCalendar=daily");
        timer.Should().Contain("RandomizedDelaySec=1h");
    }

    [Fact]
    public void DetectDisplayScript_EmitsDisconnectedProfileWhenNoHdmiIsPresent()
    {
        var script = Read("scripts/fc-signage-detect-display");
        script.Should().Contain("displayConnected: false");
        script.Should().Contain("No HDMI display detected");
    }

    [Fact]
    public void DetectDisplayScript_ParsesEdidForHdrResolutionAndAudio()
    {
        var script = Read("scripts/fc-signage-detect-display");
        script.Should().Contain("edid-decode");
        script.Should().Contain("HDR (Static|Dynamic) Metadata Block");
        script.Should().Contain("maxResolution");
        script.Should().Contain("hasAudioOutput");
    }

    [Fact]
    public void DetectDisplayScript_TriesBothForwardCompatibleCapabilityEndpoints()
    {
        var script = Read("scripts/fc-signage-detect-display");
        script.Should().Contain("/api/v1/nodes/${NODE_ID}/capabilities");
        script.Should().Contain("/api/v1/displays/${NODE_ID}/capability-profile");
        script.Should().Contain("no endpoint accepted the profile");
    }

    [Fact]
    public void ChromiumPolicy_IsValidJsonAndDisablesCredentialPrompts()
    {
        using var doc = JsonDocument.Parse(Read("chromium-policies/flowercore-signage.json"));
        var root = doc.RootElement;
        root.GetProperty("AutofillAddressEnabled").GetBoolean().Should().BeFalse();
        root.GetProperty("AutofillCreditCardEnabled").GetBoolean().Should().BeFalse();
        root.GetProperty("PasswordManagerEnabled").GetBoolean().Should().BeFalse();
        root.GetProperty("ExtensionInstallBlocklist")[0].GetString().Should().Be("*");
    }

    [Fact]
    public void RenewalTimer_UsesDailyCadenceWithTwoHourJitter()
    {
        var timer = Read("systemd/flowercore-signage-renew.timer");
        timer.Should().Contain("OnCalendar=daily");
        timer.Should().Contain("RandomizedDelaySec=2h");
        timer.Should().Contain("Persistent=true");
    }

    // Reads an artifact relative to AppRoot, converting '/' to the platform separator.
    private static string Read(string relativePath)
        => File.ReadAllText(Path.Combine(AppRoot, relativePath.Replace('/', Path.DirectorySeparatorChar)));

    // Walks up from the test output directory until a directory containing both
    // an "apps" folder and a README.md is found; that directory is taken as the repo root.
    private static string FindRepoRoot()
    {
        var current = new DirectoryInfo(AppContext.BaseDirectory);
        while (current is not null)
        {
            if (Directory.Exists(Path.Combine(current.FullName, "apps"))
                && File.Exists(Path.Combine(current.FullName, "README.md")))
            {
                return current.FullName;
            }

            current = current.Parent;
        }

        throw new DirectoryNotFoundException("Could not find bluejay-infra root.");
    }
}

@@ -1,6 +1,6 @@
package bluejayinfra.public_method_allowlist
-public_hosts := {"dist.flowercore.io", "dns.iamworkin.lan"}
+public_hosts := {"brochure.flowercore.io", "dist.flowercore.io", "dns.iamworkin.lan"}
deny[msg] {
input.kind == "IngressRoute"

@@ -6,7 +6,12 @@ package bluejayinfra.public_readwrite_allowlist
# PUT/PATCH/DELETE must still 404 at the route. Any host in this set MUST
# include all four required methods AND MUST NOT include any forbidden
# method.
-public_readwrite_hosts := {"updatecenter.iamworkin.lan", "updates.iamworkin.lan"}
+public_readwrite_hosts := {
+    "updatecenter.iamworkin.lan",
+    "updates.iamworkin.lan",
+    "update.flowercore.io",
+    "updates.flowercore.io",
+}
required_methods := {"GET", "HEAD", "POST", "OPTIONS"}