Compare commits

..

20 Commits

Author SHA1 Message Date
Andrew Stoltz
30e16bfcfb feat: add qt sdk remotedesktop warm pool 2026-05-19 12:30:32 -05:00
Andrew Stoltz
ca574c2280 brochure: delete apps/brochure/ — full prune per operator decision 2026-05-19
Removes the apps/brochure/ directory entirely from the bluejay-infra
ApplicationSet glob. ArgoCD will:

  1. See infra-brochure has no git source -> mark for delete
  2. Prune the brochure namespace + Deployment + Service + Certificate
     + Secret + IngressRoute (all generated from the now-gone
     apps/brochure/brochure.yaml)
  3. Remove the infra-brochure Application from argocd ns

Operator decision 2026-05-19 (follow-up to 09387f9 ARCHIVED banner
commit): "Yes, prune argo for brochure. Probably fully deleted there."

The brochure subdomain project was a planning-chain misinterpretation
of "make TtsReader + AI Station production-ready" — see
memory/project_brochure_split_misinterpretation_archived_2026_05_19.md
in FlowerCore.Notes for the full decision record.

Reusable artifacts that were the operator's archive concern stay alive
in their actual homes:

- FlowerCore.Intranet.Web PR #8 content-NuGet carve-out: still in
  Intranet's master, may transfer to TtsReader / AI Station prod work
- Sprint 32 Cl-5 substrate (public-twin design ideas): SUPERSEDED banner
  in-place in FlowerCore.Notes docs/standards/, history preserved
- magpie-doc-writer + wren-walkthrough skill output: unchanged in
  Intranet's flowercore-whats-new/walkthroughs/galleries directories

Companion Notes-side commit updates the "scaled to 0 + ARCHIVED banner"
language in mvp-readiness.html + fleet-roadmap-2026-05-19-sprint36-v2.md
+ memory record to reflect full deletion instead.

Wrong-codebase image localhost/fc-brochure-web:v20260524-sprint32 is
being removed from rke2-server / rke2-agent1 / rke2-agent2 in a
follow-up step (reclaims ~800MB per node).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:42:30 -05:00
Andrew Stoltz
09387f90e1 brochure: ARCHIVED 2026-05-19 — was a misinterpretation, do not re-enable
The brochure split project was a misinterpretation of an operator request
to make TtsReader + AI Station production-ready. Somewhere in the planning
chain it spun up into a separate "showcase brochure product" with its own
host, repo, NuGet, and Codex pack — none of which the operator actually
wanted. The project itself is pointless and a waste of credits.

Archive (not delete) per operator decision 2026-05-19, because some work
shipped under the misinterpretation may still have reusable value:

- FlowerCore.Intranet.Web PR #8 (merged) introduced FlowerCore.Brochure.Content
  content-NuGet carve-out — pattern may apply to TtsReader/AiStation production
  polish.
- Sprint 32 Cl-5 substrate has design ideas for public-twin vs operator-host
  separation that may transfer.
- magpie-doc-writer / wren-walkthrough skills still author useful Intranet
  content — those skills stay active.

These manifests stay at replicas: 0 for ArgoCD continuity. Cleanup options
(move out of apps/* glob, or delete entirely) are documented in README.md
for an operator-explicit future call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:34:28 -05:00
Andrew Stoltz
e641ceab48 monitoring(irc-notify): criticals also batch hourly — fix per-fire spam
The first batching pass (bacac06) left critical-severity alerts on the
immediate-print path. That's still per-event spam for any persistent
critical (e.g. PrintPaperRollCritical fires every 30s Grafana evaluation
cycle when paper is <5%). Caught immediately after deploy: CUPS queue grew
0 → 8 jobs in 8 minutes from a single firing PrintPaperRollCritical.

This commit aligns with the operator's verbatim ask ("one alert an hour"):

- Critical-severity alerts now go into the digest buffer, NOT the
  immediate-print path. The digest payload already shows severity tags
  per alertname, so the operator still sees "[critical] X" in the printout.
- The explicit `alert_channel=thermal_print_immediate` label still bypasses
  batching, but only on NEW fingerprint arrival — it triggers a flush of
  the CURRENT digest (with the new alert included), then clears. Repeat
  webhooks for the same fingerprint dedupe in the buffer until the next
  hourly tick OR until the alert resolves. No fingerprint can spam.
- `add_to_digest` now returns bool (True = buffer grew, False = dedup /
  resolution / disabled) so the immediate-label path can flush only on
  state transitions.

Net effect: max 1 thermal print per BATCH_INTERVAL_MIN per alert fingerprint,
regardless of severity. Rules that genuinely need same-second paper opt in
via `alert_channel=thermal_print_immediate` (currently zero rules use this).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:22:25 -05:00
Andrew Stoltz
c263426ea5 fc-devicemgmt: operator image fix + Web scaled to 0
OPERATOR (PodCrashLoopBackOff cleared):
- Bumped image to v20260519-sp34cl3-fix (built from astoltz/FlowerCore.DeviceManagement@d9a3685
  after Sprint 34 Cl-3 stranded branch was merged via PR #19 squash).
- The v20260512-cx5 image was the broken Sprint 8 scaffold: generic Host
  builder, no kubeops, no Kestrel on :8080, no AddController chain. Readiness
  probe dial-tcp 8080 failed every restart.
- The new image ships the AddController chain for all 4 reconcilers
  (DeviceCrd / DeviceGroupCrd / DevicePolicyCrd / RemoteCommandCrd) plus
  Kestrel on :8080 and /healthz.
- Image saved + scp'd + ctr-imported on rke2-server / rke2-agent1 / rke2-agent2
  before this commit. SHA256: 2cc79ee0a2313c550268d1244f805ae41b396362148dd5603061cc15b6f7fa7e

WEB (DeploymentReplicasMismatch cleared via scale-to-0):
- Web pod cannot start. Two upstream gaps must close first:
  1) MySQL DB instance + user `fc_devicemgmt` / database `flowercore_devicemgmt`
     are not provisioned in fc-mysql. Cluster has zero MySqlInstanceCrds and
     no `mysql.fc-mysql.svc:3306` Service.
  2) 1Password vault item `IAmWorkin/FlowerCore DeviceManagement Runtime` is
     missing (5 fields: DB-Password + 4 mTLS PEMs). OnePasswordItem CRD has
     been stuck Ready=False since 2026-05-18T02:58.
- Same pattern as the brochure-web scale-to-0 in 914fed0 — make the cluster
  clean and quiet, let operator restart deploy on a real schedule.

Re-enable path is fully documented in the deployment-web.yaml header comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:11:09 -05:00
Andrew Stoltz
bacac067cf monitoring(irc-notify): hourly digest batching for thermal printer
The thermal printer drained overnight (2026-05-18/19) because the old
notify.py POSTed one print job per Grafana webhook fire. With 9
concurrently-firing alerts (zabbix-postgres + fc-devicemgmt + brochure
+ PrintPaperRollLow), every evaluation cycle stamped fresh CUPS jobs
onto the queue until the operator physically powered the printer off.

This refactor:

- Adds env-var config: THERMAL_PRINT_ENABLED (master kill switch),
  BATCH_INTERVAL_MIN (default 60), BATCH_MAX_PENDING (default 50).
- IRC delivery stays per-event (operator wants the live stream).
- Thermal routing now:
  * critical/disaster/page severity OR alert_channel=thermal_print_immediate
    -> print immediately
  * alert_channel=thermal_print -> enqueue into hourly digest
  * RESOLVED -> remove from digest buffer (no resolution-spam prints)
  * else -> IRC only, no thermal
- Background digest_loop thread flushes the buffer hourly (or sooner
  if buffer hits BATCH_MAX_PENDING). Digest payload is a single
  Print.Web /api/print/alert POST listing distinct alertnames + per-rule
  target counts.
- New POST /flush endpoint (manual operator force-flush; useful for
  testing without waiting an hour).
- GET / returns config + buffer depth + per-stat counters for observability.

Net effect: max 1 thermal print per BATCH_INTERVAL_MIN for batched
warnings, plus immediate prints for criticals. Closes the 2026-05-18/19
alert-storm incident.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 09:56:14 -05:00
914fed08d8 fix(brochure): scale brochure-web to 0 — wrong codebase shipped (Intranet.Web binary in fc-brochure-web image, CrashLoopBackOff 296 restarts on /data read-only). Re-enable after Sprint 34 Cx-3 rebuild per docs/ai-agents/codex-prompts/2026-05-18-fc-brochure-web-rebuild-pack.md 2026-05-19 14:45:01 +00:00
Andrew Stoltz
200aeab032 ttsreader: deploy study mode repair image 2026-05-18 16:33:08 -05:00
Andrew Stoltz
8182616d4c ttsreader: point render piper to edge1 demo endpoint 2026-05-18 16:06:37 -05:00
Andrew Stoltz
f0862ac03c ttsreader: deploy sprint36 demo audio image 2026-05-18 16:04:59 -05:00
Andrew Stoltz
46c392605e monitoring: mirror PuppetServiceFailed alert from Notes (Sprint 33 Cx-7 Phase B)
Mirrors the live `puppet` alert group from
FlowerCore.Notes/scripts/monitoring/alerts.yml into the K8s ConfigMap so a
future in-cluster Prometheus inherits the ruleset automatically.

Source of truth remains the Notes file (live Podman Prometheus on noc1).
See feedback_monitoring_k8s_target_vs_live_podman.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 11:11:07 -05:00
89b147bbdd docs(openvox): document quadlet durability smoke (#12) 2026-05-18 04:53:02 +00:00
d7238a5e3b feat(brochure): add public brochure GitOps app (#13) 2026-05-18 04:52:37 +00:00
fc444a02a1 feat(chat): add public twin ingress (#11) 2026-05-18 04:52:20 +00:00
83d4883d55 feat(worldbuilder): pin k8s demo to fake backend (#10) 2026-05-18 04:52:11 +00:00
f8fe3b2688 feat(github-runner): add final long-tail runners (#9) 2026-05-18 04:52:01 +00:00
f2ab892ebc feat(github-runner): add Marquee + TtsReader per-repo runners (#8) 2026-05-18 03:27:14 +00:00
fef68a9560 feat(fc-devicemgmt): add Kubernetes deployment manifests (#1)
Sprint 8 IMPL lane Cx-5: fc-devicemgmt K8s manifests (rebased onto main 2026-05-18; 13 files, +944).

Namespace + Web Deployment (replicas:2, MySQL backend) + Operator Deployment (replicas:1, KubeOps leader-elect) + Service + Certificate (step-ca-acme ClusterIssuer) + Traefik IngressRoute (devices.iamworkin.lan internal) + ServiceAccount + ClusterRole + ClusterRoleBinding + NetworkPolicy (CNI DNAT-aware backend ports) + OnePasswordItem (5-field consolidated) + ArgoCD Application bootstrap shape + lint coverage.

Follow-ups (not merge blockers):
- localhost/fc-devicemgmt-{web,operator}:v20260512-cx5 must be imported to all 3 RKE2 nodes; pods will ErrImageNeverPull until imported.
- 1Password vault item 'FlowerCore DeviceManagement Runtime' must be created with 5 fields before pods can start.
- DNS devices.iamworkin.lan -> 10.0.56.200 already present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:56:23 +00:00
Andrew Stoltz
6fe77225ae fix(github-runner): dedupe DOTNET_INSTALL_DIR+NUGET_PACKAGES on base+sharedpos
PR #5 rebase concatenated PR #5 env additions onto PR #7 env additions on
the base + sharedpos Deployments, producing duplicate-key validation
errors in ArgoCD's structured merge. The DOTNET_INSTALL_DIR and
NUGET_PACKAGES values are identical between PR #5 and PR #7; keep the
PR #7 originals and retain only the unique new env vars from PR #5
(DOTNET_CLI_TELEMETRY_OPTOUT, DOTNET_NOLOGO, DOTNET_GENERATE_ASPNET_CERTIFICATE).

No behavioral change — same final env var set, no duplicates.
2026-05-17 21:53:05 -05:00
634b9c4169 feat(github-runner): harden Linux runner fleet (#5) 2026-05-18 02:51:02 +00:00
17 changed files with 4363 additions and 122 deletions

2
.gitattributes vendored Normal file
View File

@@ -0,0 +1,2 @@
*.yaml text eol=lf
*.yml text eol=lf

View File

@@ -118,6 +118,7 @@ That test project sweeps `bluejay-infra/apps/**` plus the canonical sibling `Flo
## References ## References
- OpenVox noc1 durability runbook: `docs/runbooks/openvoxserver-quadlet-durability.md`
- Cert-manager recovery playbook: `FlowerCore.Notes/memory/project_cert_manager_recovery_2026_04_22.md` - Cert-manager recovery playbook: `FlowerCore.Notes/memory/project_cert_manager_recovery_2026_04_22.md`
- Why pfSense DNS is required: `FlowerCore.Notes/memory/feedback_pfsense_dns_required_for_acme.md` - Why pfSense DNS is required: `FlowerCore.Notes/memory/feedback_pfsense_dns_required_for_acme.md`
- Public DNS operator host: `https://dns.iamworkin.lan` - Public DNS operator host: `https://dns.iamworkin.lan`

View File

@@ -30,3 +30,41 @@ spec:
port: 80 port: 80
tls: tls:
secretName: chat-web-tls secretName: chat-web-tls
---
# Public host profile marker. The app treats this header as authoritative for
# the public twin, while the internal chat.iamworkin.lan route does not attach
# it and keeps the operator-oriented UI.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: chat-public-profile-header
namespace: fc-chat
spec:
headers:
customRequestHeaders:
X-FC-Chat-Host-Profile: "public"
---
# Public Cloudflare-fronted twin for the anonymous chat surface. Operator
# paths are intentionally absent from the allowlist below, so /admin,
# /operator, /console, /ops, /api/operator, and /operatorhub miss this route
# and return Traefik 404 before reaching the pod. Operator action still needed:
# create/verify Cloudflare DNS chat.flowercore.io -> public Traefik endpoint
# and mirror the cf-origin-flowercore-io TLS secret into namespace fc-chat.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: chat-web-public
namespace: fc-chat
spec:
entryPoints:
- websecure
routes:
- match: Host(`chat.flowercore.io`) && (Path(`/`) || Path(`/chat`) || PathPrefix(`/_blazor`) || PathPrefix(`/_framework`) || PathPrefix(`/_content`) || PathPrefix(`/avatars`) || PathPrefix(`/css`) || PathPrefix(`/js`) || PathPrefix(`/favicon`) || PathPrefix(`/chathub`)) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
kind: Rule
middlewares:
- name: chat-public-profile-header
services:
- name: chat-web
port: 80
tls:
secretName: cf-origin-flowercore-io

View File

@@ -0,0 +1,30 @@
# FlowerCore RemoteDesktop warm-pool posture.
#
# The RemoteDesktop Web and Operator Deployments remain owned by
# FlowerCore.RemoteDesktop. bluejay-infra owns these GitOps pool intents so
# rebuilds preserve the operational posture without baking it into service code.
---
apiVersion: flowercore.io/v1
kind: RemoteDesktopPoolCrd
metadata:
name: qt-sdk-pool
namespace: fc-desktop
labels:
app.kubernetes.io/name: remotedesktop-pool
app.kubernetes.io/component: warm-pool
app.kubernetes.io/part-of: flowercore-remotedesktop
flowercore.io/template: dev-workstation
flowercore.io/image: localhost-fc-desktop-qt-sdk
annotations:
flowercore.io/deficit-tolerance: "0"
flowercore.io/scale-mode: ManualScaleOnDemand
flowercore.io/image-ref: localhost/fc-desktop:qt-sdk
flowercore.io/image-pull-policy: Never
spec:
templateSlug: dev-workstation
desiredSize: 0
enabled: false
userVolumeMode: LateAttach
deficitTolerance: 0
scaleMode: ManualScaleOnDemand
reconcileNow: false

View File

@@ -47,7 +47,7 @@ spec:
fsGroupChangePolicy: OnRootMismatch fsGroupChangePolicy: OnRootMismatch
containers: containers:
- name: operator - name: operator
image: localhost/fc-devicemgmt-operator:v20260512-cx5 image: localhost/fc-devicemgmt-operator:v20260519-sp34cl3-fix
imagePullPolicy: Never imagePullPolicy: Never
ports: ports:
- name: metrics - name: metrics

View File

@@ -4,6 +4,22 @@
# Sprint 9+ lane. This manifest is static-valid without requiring the image to # Sprint 9+ lane. This manifest is static-valid without requiring the image to
# exist yet; import localhost/fc-devicemgmt-web:<tag> to all schedulable RKE2 # exist yet; import localhost/fc-devicemgmt-web:<tag> to all schedulable RKE2
# nodes before letting ArgoCD sync a live rollout. # nodes before letting ArgoCD sync a live rollout.
#
# SCALED TO 0 — 2026-05-19 morning-routine cleanup.
# The Web pod cannot start until TWO upstream gaps close:
# 1. MySQL DB instance `flowercore_devicemgmt` (user `fc_devicemgmt`) is
# provisioned via fc-mysql Manager. The cluster currently has ZERO
# MySqlInstanceCrds and no `mysql.fc-mysql.svc:3306` Service, so the
# deployment-web container env `FlowerCore__Database__Host=mysql.fc-mysql.svc`
# points at nothing. Provision via the fc-mysql Manager UI/REST/MCP.
# 2. 1Password vault item `IAmWorkin/FlowerCore DeviceManagement Runtime`
# with 5 fields (DB-Password, mtls-ca.pem, mtls-client.crt, mtls-client.key,
# mtls-chain.pem) — see apps/fc-devicemgmt/1password-item.yaml. Mint mTLS
# from step-ca-agent ClusterIssuer per ADR-126; DB-Password must match the
# password configured for the MySQL user.
# Re-enable: change replicas back to 2 after both gaps close. The image tag
# in this file (v20260512-cx5) MAY also need a refresh — it predates the
# Sprint 34 Cl-3 operator fix; Web may have an analogous bug.
apiVersion: apps/v1 apiVersion: apps/v1
kind: Deployment kind: Deployment
metadata: metadata:
@@ -20,7 +36,7 @@ metadata:
annotations: annotations:
flowercore.io/traceability-standard: k8s-pod-ownership-and-traceability-standard flowercore.io/traceability-standard: k8s-pod-ownership-and-traceability-standard
spec: spec:
replicas: 2 replicas: 0
revisionHistoryLimit: 3 revisionHistoryLimit: 3
selector: selector:
matchLabels: matchLabels:

View File

@@ -532,7 +532,7 @@ spec:
fsGroupChangePolicy: OnRootMismatch fsGroupChangePolicy: OnRootMismatch
containers: containers:
- name: web - name: web
image: localhost/fc-ttsreader-web:v20260506-phase6 image: localhost/fc-ttsreader-web:v20260518-sprint36-demo-finish-b132cbf
imagePullPolicy: Never imagePullPolicy: Never
ports: ports:
- containerPort: 5217 - containerPort: 5217
@@ -555,9 +555,13 @@ spec:
- name: TtsReader__Jobs__Root - name: TtsReader__Jobs__Root
value: "/data/jobs" value: "/data/jobs"
- name: TtsReader__Piper__Host - name: TtsReader__Piper__Host
value: "ttsreader-piper.fc-ttsreader.svc.cluster.local." value: "10.0.57.17"
- name: TtsReader__Piper__Port - name: TtsReader__Piper__Port
value: "10200" value: "8500"
- name: TtsReader__Piper__Transport
value: "http"
- name: TtsReader__Piper__HttpPath
value: "/tts"
- name: TtsReader__Kokoro__Enabled - name: TtsReader__Kokoro__Enabled
value: "true" value: "true"
- name: TtsReader__Kokoro__BaseUrl - name: TtsReader__Kokoro__BaseUrl

View File

@@ -0,0 +1,76 @@
# GitHub Runner Fleet
ArgoCD owns `apps/github-runner/github-runner.yaml`. Do not patch live runner
Deployments with `kubectl`; update this manifest and let ArgoCD reconcile.
## Runner Shape
All repo-scoped Linux runners use:
- `ACCESS_TOKEN` from the `github-runner-token` Secret
- `RUN_AS_ROOT=false`
- `EPHEMERAL=true`
- `LABELS=self-hosted,linux,fc-build-linux`
- writable non-root paths under `/home/runner` for .NET, NuGet, XDG cache, and
Actions tool cache
`github-runner` for `FlowerCore.Common` is single-replica because it retains the
original Longhorn ReadWriteOnce NuGet PVC. Every other repo-scoped runner uses
two replicas with per-pod `emptyDir` caches. That is the safe backlog-drain
strategy: no two pods share one RWO PVC.
Sprint 32 final long-tail wave adds 16 two-replica Deployments:
`FlowerCore.Knowledge`, `FlowerCore.LlmBridge`, `FlowerCore.Media`,
`FlowerCore.Presentations`, `FlowerCore.RemoteDesktop`, `FlowerCore.DNS`,
`FlowerCore.Distribution`, `FlowerCore.Scoreboard`,
`FlowerCore.SegmentDisplay`, `FlowerCore.Signage.Contracts`,
`FlowerCore.SignalControl`, `FlowerCore.Intranet.Web`,
`FlowerCore.Provisioning`, `FlowerCore.Redis`, `FlowerCore.MessageBoard`, and
`FlowerCore.MenuBoard`.
## Post-Merge Proof
After the PR is merged and ArgoCD syncs, verify the runner fleet:
```bash
kubectl -n github-runner get deploy,pods,pvc
```
Verify GitHub registration for the repo-scoped runners:
```bash
for repo in FlowerCore.Common FlowerCore.Shared.Pos FlowerCore.Puppet FlowerCore.Signage \
FlowerCore.DMS FlowerCore.Telephony FlowerCore.Print.Web FlowerCore.Chat \
FlowerCore.MySQL FlowerCore.Kiosk.Linux FlowerCore.Marquee FlowerCore.TtsReader \
FlowerCore.Knowledge FlowerCore.LlmBridge FlowerCore.Media \
FlowerCore.Presentations FlowerCore.RemoteDesktop FlowerCore.DNS \
FlowerCore.Distribution FlowerCore.Scoreboard FlowerCore.SegmentDisplay \
FlowerCore.Signage.Contracts FlowerCore.SignalControl FlowerCore.Intranet.Web \
FlowerCore.Provisioning FlowerCore.Redis FlowerCore.MessageBoard \
FlowerCore.MenuBoard; do
echo "=== $repo ==="
gh api "/repos/astoltz/$repo/actions/runners" \
--jq '.runners[] | select(.labels[].name == "fc-build-linux") | {name,status,busy,labels:[.labels[].name]}'
done
```
Shared.Pos publish proof after the runner pod is online:
```bash
gh run list --repo astoltz/FlowerCore.Shared.Pos \
--workflow "Build, Test & Publish" --branch main --limit 5
```
If the latest run is still queued after runner registration, rerun the workflow
from GitHub Actions and verify it lands on an `rke2-linux-*` runner.
## Failure Notes
- `actions/setup-dotnet` permission error at `/usr/share/dotnet`: check that
`DOTNET_INSTALL_DIR=/home/runner/.dotnet` and related cache env vars are
present on the runner pod.
- `404` during runner registration: the fine-grained PAT is valid but missing
repository access for that repo. Add the repo to the PAT access list; the PAT
value does not change.
- `Multi-Attach` volume error: only the Common runner uses a RWO PVC and it must
stay single-replica. New multi-replica runners use `emptyDir`.

File diff suppressed because it is too large Load Diff

View File

@@ -723,6 +723,24 @@ data:
summary: "Mac mini GitHub runner offline ({{ $labels.runner }})" summary: "Mac mini GitHub runner offline ({{ $labels.runner }})"
description: "A macmini-* GitHub Actions runner has not reported online for more than 10 minutes. Puppet manages its LaunchDaemon under /Library/LaunchDaemons/io.flowercore.github-runner-<slug>.plist; runners survive reboot and do not require a GUI session." description: "A macmini-* GitHub Actions runner has not reported online for more than 10 minutes. Puppet manages its LaunchDaemon under /Library/LaunchDaemons/io.flowercore.github-runner-<slug>.plist; runners survive reboot and do not require a GUI session."
- name: linux-runners
rules:
- alert: LinuxRunnerOffline
expr: |
kube_deployment_status_replicas_ready{
namespace="github-runner",
deployment=~"github-runner(|-(sharedpos|puppet|signage|dms|telephony|print-web|chat|mysql|kiosk-linux))"
} == 0
for: 5m
labels:
severity: warning
alert_channel: irc
service: github-runner
team: ci
annotations:
summary: "Linux CI runner offline: {{ $labels.deployment }}"
description: "Deployment {{ $labels.deployment }} in namespace github-runner has 0 ready replicas for more than 5 minutes. CI jobs targeting this repo will queue until the runner pod restarts and re-registers with GitHub. Check pods with: kubectl -n github-runner get pods -l app.kubernetes.io/name={{ $labels.deployment }}. Check logs with: kubectl -n github-runner logs -l app.kubernetes.io/name={{ $labels.deployment }} --tail=50. Common causes: PAT missing repo access, runner CrashLoopBackOff, or node/resource pressure."
- name: remote-desktop - name: remote-desktop
rules: rules:
- alert: RemoteDesktopWebDown - alert: RemoteDesktopWebDown
@@ -948,6 +966,52 @@ data:
annotations: annotations:
summary: "Disk usage high on {{ $labels.instance }} ({{ $value | printf \"%.1f\" }}%)" summary: "Disk usage high on {{ $labels.instance }} ({{ $value | printf \"%.1f\" }}%)"
# Puppet agent + service alerts.
# Mirror of FlowerCore.Notes/scripts/monitoring/alerts.yml `puppet` group
# so a future migration to in-cluster Prometheus inherits the ruleset.
# Source-of-truth for the live Podman Prometheus on noc1 is the Notes file.
# See feedback_monitoring_k8s_target_vs_live_podman.
- name: puppet
rules:
- alert: PuppetAgentReportStale
expr: puppet_last_run_age_seconds > 7200
for: 30m
labels:
severity: warning
alert_channel: irc
annotations:
summary: "Puppet agent {{ $labels.instance }} hasn't reported in over 2h"
description: "Last run age: {{ $value | humanizeDuration }}. The puppet agent on {{ $labels.instance }} may be stopped, the node may be powered off, or noc1 may be unreachable from this node."
runbook: "1. SSH to node (via noc1 jumpbox if needed) 2. sudo systemctl status puppet 3. sudo puppet agent -t --noop to force a run 4. Check r10k: ssh fcadmin@10.0.56.10 'sudo podman logs openvoxserver --tail 50' 5. Verify noc1 reachability: ping puppet.iamworkin.lan"
- alert: PuppetAgentReportCritical
expr: puppet_last_run_age_seconds > 86400
for: 1h
labels:
severity: critical
alert_channel: irc
annotations:
summary: "Puppet agent {{ $labels.instance }} silent for over 24h — node is unmanaged"
description: "Last run age: {{ $value | humanizeDuration }}. Node {{ $labels.instance }} has not submitted a Puppet report in over 24 hours. Config drift is accumulating — investigate immediately. If intentional (maintenance), add to the exclusion filter or silence in Grafana."
runbook: "URGENT: 1. Check node power state 2. SSH via noc1 jumpbox: ssh fcadmin@10.0.56.10 then ssh <node> 3. sudo systemctl status puppet 4. sudo systemctl start puppet + sudo puppet agent -t 5. Check for network partitions (VLAN connectivity to 10.0.56.10) 6. If node was recently reimaged: sudo puppet agent -t to re-register with new SSL cert"
# Sprint 33 Cx-7 Phase B (2026-05-25 postmortem follow-up):
# Detects puppet.service in failed state — distinct from PuppetAgentReportStale
# which catches "agent hasn't run." This catches "systemd gave up restarting it"
# (CA-verify loop or other fatal exit). Requires node-exporter systemd collector
# enabled with --collector.systemd. If `node_systemd_unit_state` has no series
# for a node, the collector is disabled there — flag in postmortem follow-up.
- alert: PuppetServiceFailed
expr: node_systemd_unit_state{name="puppet.service",state="failed"} == 1
for: 5m
labels:
severity: warning
alert_channel: irc
annotations:
summary: "Puppet service failed on {{ $labels.instance }}"
description: "puppet.service on {{ $labels.instance }} has been in failed state for 5+ minutes. systemd has stopped auto-restarting (CA-verify-loop or other exit). Manual `systemctl status puppet` confirms. Run `sudo systemctl start puppet` to recover; investigate journal for root cause."
runbook_url: "https://github.com/astoltz/FlowerCore.Notes/blob/master/memory/feedback_puppet_service_dead_after_ca_loop_alert_misreads.md"
# K8s pod-state alerts. Require kube-state-metrics scrape (added # K8s pod-state alerts. Require kube-state-metrics scrape (added
# 2026-04-26 — see scrape_configs above). Would have surfaced the # 2026-04-26 — see scrape_configs above). Would have surfaced the
# agent-zero ollama-proxy 172x crash-loop instead of letting it # agent-zero ollama-proxy 172x crash-loop instead of letting it
@@ -1209,24 +1273,55 @@ metadata:
data: data:
notify.py: | notify.py: |
#!/usr/bin/env python3 #!/usr/bin/env python3
"""HTTP->IRC alert relay with thermal printer forwarding for Grafana webhooks. """HTTP->IRC alert relay with thermal-printer DIGEST forwarding.
Listens on :9119, posts to #alerts on UnrealIRCd via raw IRC protocol.
Alerts tagged alert_channel=thermal_print also POST to Print.Web /api/print/alert. Listens on :9119, posts to #alerts on UnrealIRCd, forwards to Print.Web
/api/print/alert. Thermal printing is BATCHED into hourly digests by
default so the printer no longer spam-fires per Grafana webhook.
Routing (per Grafana webhook alert):
- IRC: always per-event (operator likes the stream)
- Thermal printer:
* severity in {critical,disaster,page} OR
label alert_channel=thermal_print_immediate -> print NOW
* label alert_channel=thermal_print -> enqueue into hourly digest
* everything else -> IRC only
- RESOLVED webhooks remove the alert from the digest buffer
Env vars (defaults preserve old behavior on first deploy):
THERMAL_PRINT_ENABLED default "true" - master kill switch
BATCH_INTERVAL_MIN default "60" - minutes between digest prints
BATCH_MAX_PENDING default "50" - force-flush threshold
HTTP surface:
POST / - Grafana webhook entry
POST /flush - manual digest flush (idempotent)
GET / - status + config + buffer depth + stats
""" """
import json, socket, sys, time import json, os, socket, sys, threading, time
from collections import defaultdict
from datetime import datetime, timezone
from http.server import HTTPServer, BaseHTTPRequestHandler from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.request import Request, urlopen from urllib.request import Request, urlopen
from urllib.error import URLError
IRC_HOST = "unrealircd.irc.svc" # short name: CoreDNS ndots:5 + iamworkin.lan template hijacks full .cluster.local (see memory) THERMAL_PRINT_ENABLED = os.environ.get("THERMAL_PRINT_ENABLED", "true").lower() == "true"
IRC_PORT = 6667 BATCH_INTERVAL_MIN = int(os.environ.get("BATCH_INTERVAL_MIN", "60"))
IRC_NICK = "grafana-bot" BATCH_MAX_PENDING = int(os.environ.get("BATCH_MAX_PENDING", "50"))
IRC_CHANNEL = "#alerts"
PRINT_WEB_URL = "http://10.0.57.16:5200/api/print/alert" IRC_HOST = os.environ.get("IRC_HOST", "unrealircd.irc.svc")
PRINT_ENABLED = True IRC_PORT = int(os.environ.get("IRC_PORT", "6667"))
IRC_NICK = os.environ.get("IRC_NICK", "grafana-bot")
IRC_CHANNEL = os.environ.get("IRC_CHANNEL", "#alerts")
PRINT_WEB_URL = os.environ.get("PRINT_WEB_URL", "http://10.0.57.16:5200/api/print/alert")
_buffer_lock = threading.Lock()
_buffer = {} # fingerprint -> {"alert": dict, "first_seen": float, "last_seen": float}
_last_flush_time = time.time()
_stats = {"webhooks_received": 0, "irc_sent": 0, "print_immediate": 0,
"digest_flushed": 0, "buffer_dedup": 0, "buffer_added": 0,
"buffer_resolved": 0, "started_at": time.time()}
def send_irc(message): def send_irc(message):
"""Connect, handle PING, join, send, quit."""
try: try:
sock = socket.create_connection((IRC_HOST, IRC_PORT), timeout=15) sock = socket.create_connection((IRC_HOST, IRC_PORT), timeout=15)
sock.sendall(f"NICK {IRC_NICK}\r\n".encode()) sock.sendall(f"NICK {IRC_NICK}\r\n".encode())
@@ -1259,52 +1354,137 @@ data:
time.sleep(0.5) time.sleep(0.5)
sock.sendall(b"QUIT :alert delivered\r\n") sock.sendall(b"QUIT :alert delivered\r\n")
sock.close() sock.close()
_stats["irc_sent"] += 1
return True return True
except Exception as e: except Exception as e:
print(f"[irc-notify] IRC send failed: {e}", file=sys.stderr) print(f"[irc-notify] IRC send failed: {e}", file=sys.stderr)
return False return False
def send_thermal_print(alert): def post_thermal(payload, kind):
if not PRINT_ENABLED: return if not THERMAL_PRINT_ENABLED:
labels = alert.get("labels", {}) print(f"[irc-notify] thermal disabled; skip {kind} ({payload.get('title','?')[:40]})", file=sys.stderr)
annotations = alert.get("annotations", {}) return False
status = alert.get("status", "firing").upper()
summary = annotations.get("summary", "")
description = annotations.get("description", "")
runbook = annotations.get("runbook", "")
# Build a useful message: summary + description + runbook steps
parts = []
if summary: parts.append(summary)
if description and description != summary: parts.append(description)
if runbook: parts.append("STEPS: " + runbook)
message = " | ".join(parts) if parts else labels.get("alertname", "Unknown alert")
payload = {
"title": labels.get("alertname", "Unknown"),
"severity": labels.get("severity", "warning").capitalize(),
"host": labels.get("instance", labels.get("host", "unknown")),
"message": message,
"eventId": alert.get("fingerprint", ""),
"source": "Grafana",
"status": "RESOLVED" if status == "RESOLVED" else "PROBLEM",
"acknowledged": False
}
try: try:
req = Request(PRINT_WEB_URL, data=json.dumps(payload).encode("utf-8"), req = Request(PRINT_WEB_URL, data=json.dumps(payload).encode("utf-8"),
headers={"Content-Type": "application/json"}, method="POST") headers={"Content-Type": "application/json"}, method="POST")
resp = urlopen(req, timeout=10) resp = urlopen(req, timeout=10)
print(f"[irc-notify] Thermal print sent: {resp.read().decode()}", file=sys.stderr) if kind == "immediate": _stats["print_immediate"] += 1
print(f"[irc-notify] thermal {kind} sent: {payload.get('title','?')[:50]}", file=sys.stderr)
return True
except Exception as e: except Exception as e:
print(f"[irc-notify] Thermal print failed: {e}", file=sys.stderr) print(f"[irc-notify] thermal {kind} failed: {e}", file=sys.stderr)
return False
def should_print(alert): def fingerprint_of(alert):
fp = alert.get("fingerprint", "")
if fp: return fp
labels = alert.get("labels", {}) labels = alert.get("labels", {})
if labels.get("alert_channel") == "thermal_print": return True target = labels.get("pod") or labels.get("instance") or labels.get("deployment") or labels.get("statefulset") or labels.get("namespace") or ""
if labels.get("severity", "").lower() in ("critical", "disaster"): return True return f"{labels.get('alertname','?')}/{labels.get('namespace','')}/{target}"
if alert.get("status", "").upper() == "RESOLVED": return False
return False def is_critical(alert):
return alert.get("labels", {}).get("severity", "").lower() in ("critical", "disaster", "page")
def is_immediate_label(alert):
return alert.get("labels", {}).get("alert_channel") == "thermal_print_immediate"
def is_batched_label(alert):
return alert.get("labels", {}).get("alert_channel") == "thermal_print"
def add_to_digest(alert):
"""Add an alert to the digest buffer. Returns True if the buffer GREW
(new fingerprint), False if it was a dedup, resolution, or no-op.
"""
if not THERMAL_PRINT_ENABLED: return False
fp = fingerprint_of(alert)
status = alert.get("status", "firing").lower()
with _buffer_lock:
if status == "resolved":
if fp in _buffer:
del _buffer[fp]
_stats["buffer_resolved"] += 1
return False
if fp in _buffer:
_buffer[fp]["last_seen"] = time.time()
_buffer[fp]["alert"] = alert
_stats["buffer_dedup"] += 1
return False
_buffer[fp] = {"alert": alert, "first_seen": time.time(), "last_seen": time.time()}
_stats["buffer_added"] += 1
return True
def build_digest_payload():
with _buffer_lock:
items = list(_buffer.values())
if not items: return None
by_name = defaultdict(list)
for item in items:
labels = item["alert"].get("labels", {})
by_name[labels.get("alertname", "Unknown")].append(item)
lines = []
for name, group in sorted(by_name.items()):
targets = []
for it in group[:5]:
labels = it["alert"].get("labels", {})
t = (labels.get("pod") or labels.get("instance") or labels.get("deployment")
or labels.get("statefulset") or labels.get("namespace") or "?")
targets.append(t)
more = f" (+{len(group)-5})" if len(group) > 5 else ""
sevs = sorted({it["alert"].get("labels", {}).get("severity", "warning") for it in group})
lines.append(f"[{'/'.join(sevs)}] {name} x{len(group)}: {', '.join(targets)}{more}")
now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
title = f"Alert digest: {len(items)} firing"
body = "\n".join([
f"=== {title} ===",
f"as of {now}",
"",
*lines,
"",
"Stream: #alerts (IRC) | Triage: grafana-noc1.iamworkin.lan",
"Force-flush: POST irc-notify.monitoring.svc:9119/flush",
])
return {"title": title, "severity": "Warning", "host": "monitoring",
"message": body, "eventId": f"digest-{int(time.time())}",
"source": "Grafana digest", "status": "PROBLEM", "acknowledged": False}
def flush_digest():
payload = build_digest_payload()
if payload is None:
print("[irc-notify] flush: buffer empty, no digest sent", file=sys.stderr)
return False
sent = post_thermal(payload, "digest")
with _buffer_lock:
_buffer.clear()
if sent: _stats["digest_flushed"] += 1
return sent
def digest_loop():
global _last_flush_time
while True:
try:
now = time.time()
elapsed = now - _last_flush_time
if elapsed >= BATCH_INTERVAL_MIN * 60:
print(f"[irc-notify] digest tick: interval reached ({BATCH_INTERVAL_MIN}m); buffer={len(_buffer)}", file=sys.stderr)
flush_digest()
_last_flush_time = now
elif len(_buffer) >= BATCH_MAX_PENDING:
print(f"[irc-notify] digest tick: buffer full ({len(_buffer)}); force flush", file=sys.stderr)
flush_digest()
_last_flush_time = now
time.sleep(15)
except Exception as e:
print(f"[irc-notify] digest loop error: {e}", file=sys.stderr)
time.sleep(60)
class Handler(BaseHTTPRequestHandler): class Handler(BaseHTTPRequestHandler):
def do_POST(self): def do_POST(self):
if self.path == "/flush":
ok = flush_digest()
self.send_response(200); self.send_header("Content-Type", "application/json"); self.end_headers()
self.wfile.write(json.dumps({"flushed": ok, "buffer_after": len(_buffer)}).encode())
return
_stats["webhooks_received"] += 1
length = int(self.headers.get("Content-Length", 0)) length = int(self.headers.get("Content-Length", 0))
body = json.loads(self.rfile.read(length)) if length else {} body = json.loads(self.rfile.read(length)) if length else {}
for alert in body.get("alerts", []): for alert in body.get("alerts", []):
@@ -1319,22 +1499,56 @@ data:
msg = f"{icon}{sev_tag} {name}: {summary}" msg = f"{icon}{sev_tag} {name}: {summary}"
if desc: msg += f"\n {desc}" if desc: msg += f"\n {desc}"
send_irc(msg) send_irc(msg)
if should_print(alert): send_thermal_print(alert) # Thermal routing — EVERYTHING (including criticals) goes into
self.send_response(200) # the hourly digest. Only the explicit `alert_channel=thermal_print_immediate`
self.send_header("Content-Type", "application/json") # label bypasses, and even that flushes-the-current-digest rather
self.end_headers() # than printing a standalone job, so the same fingerprint can't
# spam the printer per webhook cycle.
if status == "RESOLVED":
add_to_digest(alert) # removes from buffer
continue
if is_immediate_label(alert):
# Explicit opt-in for "paper this NOW" — first arrival of a
# new fingerprint triggers an immediate digest flush; repeat
# webhooks for the same fingerprint dedupe in the buffer
# until the next interval or until the alert resolves.
new_in_buffer = add_to_digest(alert)
if new_in_buffer:
global _last_flush_time
flush_digest()
_last_flush_time = time.time()
elif is_critical(alert) or is_batched_label(alert):
add_to_digest(alert)
# else: IRC-only (warnings without thermal_print label)
self.send_response(200); self.send_header("Content-Type", "application/json"); self.end_headers()
self.wfile.write(b'{"status":"ok"}') self.wfile.write(b'{"status":"ok"}')
def do_GET(self): def do_GET(self):
self.send_response(200) self.send_response(200); self.send_header("Content-Type", "application/json"); self.end_headers()
self.send_header("Content-Type", "application/json") with _buffer_lock:
self.end_headers() alertnames = sorted({it["alert"].get("labels", {}).get("alertname", "?") for it in _buffer.values()})
self.wfile.write(json.dumps({"service":"irc-notify","thermal_print":PRINT_ENABLED}).encode()) depth = len(_buffer)
info = {
"service": "irc-notify",
"config": {"thermal_print_enabled": THERMAL_PRINT_ENABLED,
"batch_interval_min": BATCH_INTERVAL_MIN,
"batch_max_pending": BATCH_MAX_PENDING,
"irc_target": f"{IRC_HOST}:{IRC_PORT} {IRC_CHANNEL}",
"print_web_url": PRINT_WEB_URL},
"buffer": {"depth": depth, "alertnames": alertnames,
"seconds_since_last_flush": int(time.time() - _last_flush_time),
"seconds_until_next_flush": max(0, int(BATCH_INTERVAL_MIN*60 - (time.time() - _last_flush_time)))},
"stats": _stats,
}
self.wfile.write(json.dumps(info, indent=2).encode())
def log_message(self, format, *args): def log_message(self, format, *args):
print(f"[irc-notify] {args[0]}", file=sys.stderr) print(f"[irc-notify] {args[0]}", file=sys.stderr)
if __name__ == "__main__": if __name__ == "__main__":
threading.Thread(target=digest_loop, daemon=True).start()
server = HTTPServer(("0.0.0.0", 9119), Handler) server = HTTPServer(("0.0.0.0", 9119), Handler)
print(f"IRC alert relay :9119 -> {IRC_HOST}:{IRC_PORT} {IRC_CHANNEL} (thermal: {PRINT_ENABLED})") print(f"[irc-notify] :9119 -> IRC {IRC_HOST}:{IRC_PORT} {IRC_CHANNEL} | thermal={'ON' if THERMAL_PRINT_ENABLED else 'OFF'} | digest={BATCH_INTERVAL_MIN}m max={BATCH_MAX_PENDING}", file=sys.stderr)
server.serve_forever() server.serve_forever()
# ============================================================================= # =============================================================================
@@ -3421,6 +3635,39 @@ data:
relativeTimeRange: {from: 120, to: 0} relativeTimeRange: {from: 120, to: 0}
datasourceUid: __expr__ datasourceUid: __expr__
model: {type: threshold, expression: B, conditions: [{evaluator: {params: [600], type: gt}}], refId: C} model: {type: threshold, expression: B, conditions: [{evaluator: {params: [600], type: gt}}], refId: C}
- orgId: 1
name: CI Runners
folder: CI Alerts
interval: 1m
rules:
- uid: linux-runner-offline
title: LinuxRunnerOffline
condition: C
for: 5m
noDataState: OK
execErrState: Error
annotations:
summary: "Linux CI runner offline: {{ $labels.deployment }}"
description: "A github-runner namespace Deployment has 0 ready replicas for more than 5 minutes. CI jobs targeting that repo will queue until the runner pod restarts and re-registers."
runbook: "1. kubectl -n github-runner get pods -l app.kubernetes.io/name={{ $labels.deployment }} 2. kubectl -n github-runner logs -l app.kubernetes.io/name={{ $labels.deployment }} --tail=50 3. Verify PAT repo access if registration returns 404 4. Verify no RWO PVC is shared by scaled runners"
labels:
severity: warning
service: github-runner
alert_channel: irc
team: ci
data:
- refId: A
relativeTimeRange: {from: 300, to: 0}
datasourceUid: prometheus
model: {expr: 'kube_deployment_status_replicas_ready{namespace="github-runner",deployment=~"github-runner(|-(sharedpos|puppet|signage|dms|telephony|print-web|chat|mysql|kiosk-linux))"} == 0', instant: true, refId: A}
- refId: B
relativeTimeRange: {from: 300, to: 0}
datasourceUid: __expr__
model: {type: reduce, expression: A, reducer: last, refId: B}
- refId: C
relativeTimeRange: {from: 300, to: 0}
datasourceUid: __expr__
model: {type: threshold, expression: B, conditions: [{evaluator: {params: [0], type: gt}}], refId: C}
- orgId: 1 - orgId: 1
name: Infrastructure name: Infrastructure
folder: AI Stack Alerts folder: AI Stack Alerts

View File

@@ -28,9 +28,12 @@ Source: `D:\git\FlowerCore\FlowerCore.WorldBuilder` (master)
Memory: `feedback_rke2_image_import_per_node_scp`. Memory: `feedback_rke2_image_import_per_node_scp`.
3. **Bump image tag** in `worldbuilder.yaml` and git push. 3. **Bump image tag** in `worldbuilder.yaml` and git push.
ArgoCD ApplicationSet picks up within ~3 minutes. ArgoCD ApplicationSet picks up within ~3 minutes.
4. **First production render** — open `https://worldbuilder.iamworkin.lan`, 4. **First production render** — open
create World → Character → Storyboard → ExportJob, confirm artifact `https://worldbuilder.iamworkin.lan/studio/c32e0000-0000-4000-8000-000000000004`
downloads. ComfyUI lives on BLUEJAY-WS at `http://10.0.56.20:8188`. and confirm the Cyberpunk Blue Jay demo prompt loads with five seeded fake
generated images. This Sprint 32 visitor-safe profile uses
`ClientMode=fake`; switch the image-generation env vars back to ComfyUI only
for an operator-owned GPU render lane.
## Health probes ## Health probes
@@ -53,8 +56,13 @@ Source: `D:\git\FlowerCore\FlowerCore.WorldBuilder` (master)
## Image generation backend ## Image generation backend
`FlowerCore:WorldBuilder:ImageGeneration:BaseUrl=http://10.0.56.20:8188` — Sprint 32 pins the Kubernetes profile to
ComfyUI runs on BLUEJAY-WS Windows (R9700 / gfx1201 / ROCm 7.2.1). Pod reaches `FlowerCore:WorldBuilder:ImageGeneration:ClientMode=fake` with
the workstation directly across the 10.0.56.0/24 VLAN (no Podman-style host- `BaseUrl=http://127.0.0.1:1`. That keeps the public/internal visitor demo
filter issues — K8s pods route via Calico, which is L3-routed across the deterministic, avoids GPU exposure, and still exercises the studio/gallery
VLAN). surface with persisted generated-image metadata.
The previous ComfyUI backend target was `http://10.0.56.20:8188` on
BLUEJAY-WS (R9700 / gfx1201 / ROCm 7.2.1). Re-enable it only in an
operator-owned follow-up that also verifies workstation reachability and image
import freshness.

View File

@@ -16,7 +16,11 @@ kind: Namespace
metadata: metadata:
name: fc-worldbuilder name: fc-worldbuilder
labels: labels:
app.kubernetes.io/name: fc-worldbuilder
app.kubernetes.io/part-of: flowercore app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
--- ---
# SQLite DB + generated image gallery + PDF/PNG exports. # SQLite DB + generated image gallery + PDF/PNG exports.
# Longhorn RWO — single replica with `Recreate` rollout strategy keeps it safe. # Longhorn RWO — single replica with `Recreate` rollout strategy keeps it safe.
@@ -25,6 +29,13 @@ kind: PersistentVolumeClaim
metadata: metadata:
name: worldbuilder-data name: worldbuilder-data
namespace: fc-worldbuilder namespace: fc-worldbuilder
labels:
app.kubernetes.io/name: worldbuilder-data
app.kubernetes.io/component: storage
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
spec: spec:
accessModes: accessModes:
- ReadWriteOnce - ReadWriteOnce
@@ -40,7 +51,13 @@ metadata:
namespace: fc-worldbuilder namespace: fc-worldbuilder
labels: labels:
app.kubernetes.io/name: worldbuilder-web app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/component: web
app.kubernetes.io/part-of: flowercore app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
annotations:
flowercore.io/traceability-standard: k8s-pod-ownership-and-traceability-standard
spec: spec:
replicas: 1 replicas: 1
revisionHistoryLimit: 3 revisionHistoryLimit: 3
@@ -54,11 +71,16 @@ spec:
metadata: metadata:
labels: labels:
app.kubernetes.io/name: worldbuilder-web app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/component: web
app.kubernetes.io/part-of: flowercore app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
annotations: annotations:
prometheus.io/scrape: "true" prometheus.io/scrape: "true"
prometheus.io/port: "8080" prometheus.io/port: "8080"
prometheus.io/path: "/metrics/prometheus" prometheus.io/path: "/metrics/prometheus"
flowercore.io/audit-trace-id: "worldbuilder-runtime-demo"
spec: spec:
securityContext: securityContext:
fsGroup: 1654 fsGroup: 1654
@@ -92,11 +114,14 @@ spec:
value: "/data/gallery" value: "/data/gallery"
- name: FlowerCore__WorldBuilder__Export__RootPath - name: FlowerCore__WorldBuilder__Export__RootPath
value: "/data/exports" value: "/data/exports"
# ComfyUI on BLUEJAY-WS (R9700 / gfx1201 / ROCm 7.2.1). # Visitor-safe Sprint 32 profile: fake backend keeps public demo
# rendering deterministic and avoids exposing BLUEJAY-WS GPU.
- name: FlowerCore__WorldBuilder__ImageGeneration__BaseUrl - name: FlowerCore__WorldBuilder__ImageGeneration__BaseUrl
value: "http://10.0.56.20:8188" value: "http://127.0.0.1:1"
- name: FlowerCore__WorldBuilder__ImageGeneration__ClientMode - name: FlowerCore__WorldBuilder__ImageGeneration__ClientMode
value: "comfyui" value: "fake"
- name: FlowerCore__WorldBuilder__ImageGeneration__BackendId
value: "fake"
resources: resources:
# Cluster CPU-request budget runs hot (99% on all 3 nodes at deploy # Cluster CPU-request budget runs hot (99% on all 3 nodes at deploy
# time) while actual CPU usage is well below capacity. Idle Blazor # time) while actual CPU usage is well below capacity. Idle Blazor
@@ -165,7 +190,11 @@ metadata:
namespace: fc-worldbuilder namespace: fc-worldbuilder
labels: labels:
app.kubernetes.io/name: worldbuilder-web app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/component: web
app.kubernetes.io/part-of: flowercore app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
spec: spec:
type: ClusterIP type: ClusterIP
selector: selector:
@@ -180,6 +209,13 @@ kind: Certificate
metadata: metadata:
name: worldbuilder-web-tls name: worldbuilder-web-tls
namespace: fc-worldbuilder namespace: fc-worldbuilder
labels:
app.kubernetes.io/name: worldbuilder-web-tls
app.kubernetes.io/component: ingress
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
spec: spec:
secretName: worldbuilder-web-tls secretName: worldbuilder-web-tls
issuerRef: issuerRef:
@@ -200,6 +236,13 @@ kind: IngressRoute
metadata: metadata:
name: worldbuilder-web name: worldbuilder-web
namespace: fc-worldbuilder namespace: fc-worldbuilder
labels:
app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/component: ingress
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/tenant-id: system
flowercore.io/created-by: bluejay-infra
spec: spec:
entryPoints: entryPoints:
- websecure - websecure

View File

@@ -0,0 +1,84 @@
# openvoxserver Quadlet Durability
This runbook documents the noc1 `openvoxserver` durability fix for the Puppet control-repo deploy path. The service is a noc1 host artifact, not an ArgoCD application, so discovery always starts on noc1 rather than in `apps/*`.
## Current State
As of the Sprint 32 Cx-12 apply on 2026-05-17:
- `/etc/containers/systemd/openvoxserver.container` has a `GIT_SSH_COMMAND` environment entry that points at the persisted serverdata deploy key.
- `/etc/systemd/system/openvoxserver-safeconfig.service` is enabled and active, and reapplies `git config --global --add safe.directory *` inside the running container.
- `/opt/puppet/r10k-deploy.sh` self-heals before each fetch by setting `safe.directory`, the repo-local `core.sshCommand`, and the persisted `known_hosts` file when needed.
- `puppet-deploy.service` exits `0/SUCCESS` after the apply and the control repo reports `HEAD == origin/master`.
- `systemctl cat openvoxserver` does not currently resolve to a generated unit on noc1. The container is running through Podman with `restart=always`, so destructive recreate smoke must not run until the generated unit is present.
## Discovery
Run every command through noc1 as `fcadmin`; do not assume BLUEJAY-WS can reach container-local surfaces directly.
```bash
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "hostname && sudo -n true"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo find /etc/containers/systemd /usr/share/containers/systemd /etc/systemd/system -name 'openvoxserver*' 2>/dev/null"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo sed -n '1,220p' /etc/containers/systemd/openvoxserver.container"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo systemctl cat puppet-deploy.service"
```
If a future noc1 profile manages these files, update the Puppet control repo and let `puppet-deploy.service` apply the change. On 2026-05-17, host `puppet` was not installed, so Cx-12 used a direct noc1 host edit.
## Durable Fix Shape
The Quadlet keeps the deploy key as a path reference only:
```ini
Environment=GIT_SSH_COMMAND=ssh -i /opt/puppetlabs/server/data/puppetserver/.puppet-deploy-key -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o UserKnownHostsFile=/opt/puppetlabs/server/data/puppetserver/.known_hosts
```
The safeconfig service is intentionally independent of `openvoxserver.service` until the generated unit exists. It waits for the `openvoxserver` container name and then runs:
```bash
/usr/bin/podman exec openvoxserver git config --global --add safe.directory *
```
The deploy script self-heals inside the container before it fetches the control repo:
```bash
git config --global --add safe.directory "*" 2>/dev/null || true
DEPLOY_KEY="/opt/puppetlabs/server/data/puppetserver/.puppet-deploy-key"
KNOWN_HOSTS="/opt/puppetlabs/server/data/puppetserver/.known_hosts"
REPO="/etc/puppetlabs/code/environments/production"
export GIT_SSH_COMMAND="ssh -i $DEPLOY_KEY -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o UserKnownHostsFile=$KNOWN_HOSTS"
git -C "$REPO" config core.sshCommand "$GIT_SSH_COMMAND" 2>/dev/null || true
```
## Validation
Non-destructive validation:
```bash
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo grep -n 'GIT_SSH_COMMAND' /etc/containers/systemd/openvoxserver.container"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo systemctl status openvoxserver-safeconfig.service --no-pager -l"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo systemctl start puppet-deploy.service && sudo systemctl status puppet-deploy.service --no-pager -l"
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "sudo podman exec openvoxserver git -C /etc/puppetlabs/code/environments/production config --get core.sshCommand"
```
Destructive recreate smoke is opt-in only:
```bash
scp scripts/monitoring/openvox-recreate-smoke.sh fcadmin@10.0.56.10:/tmp/openvox-recreate-smoke.sh
ssh -i ~/.ssh/fcadmin_ed25519 fcadmin@10.0.56.10 "chmod +x /tmp/openvox-recreate-smoke.sh && sudo OPENVOX_RECREATE_SMOKE=1 /tmp/openvox-recreate-smoke.sh"
```
Do not run the smoke during normal sprint work. It stops and removes the production container before starting it again through systemd, and it now refuses to continue unless `systemctl cat openvoxserver` succeeds.
## Credential Rotation Note
When rotating the Puppet deploy key, update the persisted serverdata copy on noc1:
```bash
sudo install -m 0600 -o root -g root <new-deploy-key> /opt/puppet/serverdata/.puppet-deploy-key
sudo podman exec openvoxserver sh -c "ssh-keyscan github.com > /opt/puppetlabs/server/data/puppetserver/.known_hosts"
sudo systemctl start openvoxserver-safeconfig.service
sudo systemctl start puppet-deploy.service
```
Never commit the deploy key or print it in logs.

View File

@@ -0,0 +1,48 @@
#!/usr/bin/env bash
set -euo pipefail
if [ "${OPENVOX_RECREATE_SMOKE:-}" != "1" ]; then
echo "SKIP: set OPENVOX_RECREATE_SMOKE=1 to run the destructive openvoxserver recreate smoke." >&2
exit 64
fi
SUDO="${SUDO:-sudo}"
REPO="/etc/puppetlabs/code/environments/production"
CORE_SSH_COMMAND_FRAGMENT=".puppet-deploy-key"
if ! $SUDO systemctl cat openvoxserver >/dev/null 2>&1; then
echo "SKIP: systemctl cat openvoxserver failed; refusing to remove a container without a verified systemd recreate path." >&2
exit 65
fi
before="$($SUDO podman exec openvoxserver git -C "$REPO" rev-parse --short HEAD)"
echo "Before recreate: $before"
$SUDO systemctl stop openvoxserver
$SUDO podman rm openvoxserver 2>/dev/null || true
$SUDO systemctl start openvoxserver
sleep 50
$SUDO systemctl start puppet-deploy.service
sleep 5
$SUDO systemctl status puppet-deploy.service --no-pager -l
after="$($SUDO podman exec openvoxserver git -C "$REPO" rev-parse --short origin/master)"
echo "After recreate origin/master: $after"
$SUDO test -d /opt/puppet/code/environments/production/site-modules/profile/manifests
core_ssh="$($SUDO podman exec openvoxserver git -C "$REPO" config --get core.sshCommand)"
case "$core_ssh" in
*"$CORE_SSH_COMMAND_FRAGMENT"*) ;;
*)
echo "FAIL: core.sshCommand does not reference the persisted deploy key." >&2
exit 1
;;
esac
$SUDO podman exec openvoxserver git -C "$REPO" status --short --branch
echo "PASS: openvoxserver recreate smoke completed without git safety or deploy-key failure."

View File

@@ -13,6 +13,7 @@ public sealed class FleetManifestLintTests
private static readonly HashSet<string> PublicReadOnlyHosts = new(StringComparer.Ordinal) private static readonly HashSet<string> PublicReadOnlyHosts = new(StringComparer.Ordinal)
{ {
"brochure.flowercore.io",
"dist.flowercore.io", "dist.flowercore.io",
"dns.iamworkin.lan", "dns.iamworkin.lan",
}; };
@@ -54,6 +55,43 @@ public sealed class FleetManifestLintTests
"ttsreader-piper", "ttsreader-piper",
}; };
private static readonly IReadOnlyDictionary<string, string> LinuxRunnerRepos = new Dictionary<string, string>(StringComparer.Ordinal)
{
["github-runner"] = "https://github.com/astoltz/FlowerCore.Common",
["github-runner-sharedpos"] = "https://github.com/astoltz/FlowerCore.Shared.Pos",
["github-runner-puppet"] = "https://github.com/astoltz/FlowerCore.Puppet",
["github-runner-signage"] = "https://github.com/astoltz/FlowerCore.Signage",
["github-runner-dms"] = "https://github.com/astoltz/FlowerCore.DMS",
["github-runner-telephony"] = "https://github.com/astoltz/FlowerCore.Telephony",
["github-runner-print-web"] = "https://github.com/astoltz/FlowerCore.Print.Web",
["github-runner-chat"] = "https://github.com/astoltz/FlowerCore.Chat",
["github-runner-mysql"] = "https://github.com/astoltz/FlowerCore.MySQL",
["github-runner-kiosk-linux"] = "https://github.com/astoltz/FlowerCore.Kiosk.Linux",
};
private static readonly HashSet<string> ScaledLinuxRunnerDeployments = new(StringComparer.Ordinal)
{
"github-runner-sharedpos",
"github-runner-puppet",
"github-runner-signage",
"github-runner-dms",
"github-runner-telephony",
"github-runner-print-web",
"github-runner-chat",
"github-runner-mysql",
"github-runner-kiosk-linux",
};
private static readonly IReadOnlyDictionary<string, string> WritableRunnerEnv = new Dictionary<string, string>(StringComparer.Ordinal)
{
["HOME"] = "/home/runner",
["DOTNET_INSTALL_DIR"] = "/home/runner/.dotnet",
["DOTNET_CLI_HOME"] = "/home/runner",
["NUGET_PACKAGES"] = "/home/runner/.nuget/packages",
["XDG_CACHE_HOME"] = "/home/runner/.cache",
["RUNNER_TOOL_CACHE"] = "/home/runner/_tool",
};
[Fact] [Fact]
public void IngressRoutes_MustKeepServiceReferencesInTheSameNamespace() public void IngressRoutes_MustKeepServiceReferencesInTheSameNamespace()
{ {
@@ -187,6 +225,98 @@ public sealed class FleetManifestLintTests
violations.Should().BeEmpty(); violations.Should().BeEmpty();
} }
[Fact]
public void GitHubRunnerFleet_MustRegisterRequiredReposAsRepoScopedDeployments()
{
var deployments = GitHubRunnerDeployments();
foreach (var expectedRunner in LinuxRunnerRepos)
{
deployments.Should().ContainKey(expectedRunner.Key);
var container = deployments[expectedRunner.Key].ContainerMappings().Should().ContainSingle().Subject;
EnvValue(container, "REPO_URL").Should().Be(expectedRunner.Value);
EnvValue(container, "EPHEMERAL").Should().Be("true");
EnvValue(container, "LABELS").Should().Be("self-hosted,linux,fc-build-linux");
EnvValue(container, "RUN_AS_ROOT").Should().Be("false");
EnvValue(container, "ACCESS_TOKEN").Should().BeNull("ACCESS_TOKEN must come from github-runner-token Secret, not a literal");
EnvSecretName(container, "ACCESS_TOKEN").Should().Be("github-runner-token");
EnvSecretKey(container, "ACCESS_TOKEN").Should().Be("credential");
}
}
[Fact]
public void GitHubRunnerFleet_MustSetWritableNonRootDotnetAndCachePaths()
{
foreach (var deployment in GitHubRunnerDeployments().Values)
{
var container = deployment.ContainerMappings().Should().ContainSingle().Subject;
foreach (var expectedEnv in WritableRunnerEnv)
{
EnvValue(container, expectedEnv.Key).Should().Be(expectedEnv.Value, $"{deployment.Name} must keep .NET paths writable for uid 1001");
}
var mounts = ManifestNodeExtensions.MappingSequence(container, "volumeMounts")
.ToDictionary(
mount => ManifestNodeExtensions.Scalar(mount, "name") ?? string.Empty,
mount => ManifestNodeExtensions.Scalar(mount, "mountPath") ?? string.Empty,
StringComparer.Ordinal);
mounts.Should().Contain("runner-home", "/home/runner");
mounts.Should().Contain("nuget-cache", "/home/runner/.nuget/packages");
mounts.Should().Contain("tmp", "/tmp");
}
}
[Fact]
public void GitHubRunnerFleet_MustAvoidRwoMultiAttachForScaledDeployments()
{
var deployments = GitHubRunnerDeployments();
foreach (var deploymentName in ScaledLinuxRunnerDeployments)
{
var deployment = deployments[deploymentName];
ReplicaCount(deployment).Should().Be(2);
var volumes = deployment.MappingSequence("spec", "template", "spec", "volumes");
var claimNames = volumes
.Select(volume => ManifestNodeExtensions.Scalar(volume, "persistentVolumeClaim", "claimName"))
.Where(value => !string.IsNullOrWhiteSpace(value))
.ToList();
claimNames.Should().BeEmpty($"{deploymentName} is scaled and must not share a RWO PVC");
volumes.Should().Contain(volume =>
string.Equals(ManifestNodeExtensions.Scalar(volume, "name"), "nuget-cache", StringComparison.Ordinal)
&& ManifestNodeExtensions.Mapping(volume, "emptyDir") != null);
}
var common = deployments["github-runner"];
ReplicaCount(common).Should().Be(1);
common.MappingSequence("spec", "template", "spec", "volumes")
.Select(volume => ManifestNodeExtensions.Scalar(volume, "persistentVolumeClaim", "claimName"))
.Where(value => !string.IsNullOrWhiteSpace(value))
.Should()
.ContainSingle()
.Which
.Should()
.Be("github-runner-nuget-cache");
}
[Fact]
public void Monitoring_MustAlertWhenLinuxRunnerDeploymentIsUnavailable()
{
var monitoring = File.ReadAllText(Path.Combine(Inventory.BluejayRoot, "apps", "monitoring", "noc-monitoring.yaml"));
monitoring.Should().Contain("MacMiniRunnerOffline");
monitoring.Should().Contain("LinuxRunnerOffline");
monitoring.Should().Contain("kube_deployment_status_replicas_ready");
monitoring.Should().Contain("github-runner(|-(sharedpos|puppet|signage|dms|telephony|print-web|chat|mysql|kiosk-linux))");
monitoring.Should().Contain("folder: CI Alerts");
monitoring.Should().Contain("uid: linux-runner-offline");
monitoring.Should().Contain("alert_channel: irc");
}
[Fact] [Fact]
public void StatefulSets_WithVolumeClaimTemplates_MustDeclareFilesystemDefaults() public void StatefulSets_WithVolumeClaimTemplates_MustDeclareFilesystemDefaults()
{ {
@@ -493,6 +623,44 @@ public sealed class FleetManifestLintTests
}; };
} }
private static IReadOnlyDictionary<string, ManifestDocument> GitHubRunnerDeployments()
{
return Inventory.Documents
.Where(document => document.Kind == "Deployment")
.Where(document => document.Namespace == "github-runner")
.ToDictionary(document => document.Name, StringComparer.Ordinal);
}
private static int ReplicaCount(ManifestDocument document)
{
return int.TryParse(document.Scalar("spec", "replicas"), out var replicas) ? replicas : 1;
}
private static string? EnvValue(YamlMappingNode container, string name)
{
return EnvMapping(container, name) is { } env ? ManifestNodeExtensions.Scalar(env, "value") : null;
}
private static string? EnvSecretName(YamlMappingNode container, string name)
{
return EnvMapping(container, name) is { } env
? ManifestNodeExtensions.Scalar(env, "valueFrom", "secretKeyRef", "name")
: null;
}
private static string? EnvSecretKey(YamlMappingNode container, string name)
{
return EnvMapping(container, name) is { } env
? ManifestNodeExtensions.Scalar(env, "valueFrom", "secretKeyRef", "key")
: null;
}
private static YamlMappingNode? EnvMapping(YamlMappingNode container, string name)
{
return ManifestNodeExtensions.MappingSequence(container, "env")
.SingleOrDefault(env => string.Equals(ManifestNodeExtensions.Scalar(env, "name"), name, StringComparison.Ordinal));
}
private static IReadOnlyList<ManifestDocument> FcDeviceManagementDocuments() private static IReadOnlyList<ManifestDocument> FcDeviceManagementDocuments()
{ {
return Inventory.Documents return Inventory.Documents

View File

@@ -0,0 +1,99 @@
using FluentAssertions;
using Xunit;
namespace BluejayInfraLint.Tests;
[Trait("Category", "Unit")]
public sealed class OpenVoxServerDurabilityTests
{
private static readonly string Root = FindRepoRoot();
private static readonly string RunbookPath = Path.Combine(Root, "docs", "runbooks", "openvoxserver-quadlet-durability.md");
private static readonly string SmokePath = Path.Combine(Root, "scripts", "monitoring", "openvox-recreate-smoke.sh");
[Fact]
public void Runbook_DocumentsHostArtifactAndNonArgoPath()
{
var runbook = File.ReadAllText(RunbookPath);
runbook.Should().Contain("noc1 host artifact");
runbook.Should().Contain("not an ArgoCD application");
runbook.Should().Contain("systemctl cat openvoxserver");
runbook.Should().Contain("/etc/containers/systemd/openvoxserver.container");
}
[Fact]
public void Runbook_DocumentsCx12LiveApplyState()
{
var runbook = File.ReadAllText(RunbookPath);
runbook.Should().Contain("Sprint 32 Cx-12");
runbook.Should().Contain("openvoxserver-safeconfig.service");
runbook.Should().Contain("/opt/puppet/r10k-deploy.sh");
runbook.Should().Contain("HEAD == origin/master");
}
[Fact]
public void SmokeScript_IsExplicitlyOptIn()
{
var smoke = File.ReadAllText(SmokePath);
smoke.Should().Contain("OPENVOX_RECREATE_SMOKE");
smoke.Should().Contain("exit 64");
smoke.IndexOf("OPENVOX_RECREATE_SMOKE", StringComparison.Ordinal)
.Should().BeLessThan(smoke.IndexOf("systemctl stop openvoxserver", StringComparison.Ordinal));
}
[Fact]
public void SmokeScript_RequiresGeneratedSystemdUnitBeforeRemovingContainer()
{
var smoke = File.ReadAllText(SmokePath);
smoke.Should().Contain("systemctl cat openvoxserver");
smoke.Should().Contain("refusing to remove a container without a verified systemd recreate path");
smoke.IndexOf("systemctl cat openvoxserver", StringComparison.Ordinal)
.Should().BeLessThan(smoke.IndexOf("podman rm openvoxserver", StringComparison.Ordinal));
}
[Fact]
public void Artifacts_DoNotStoreSecretsOrPaidRunnerLabels()
{
var forbidden = new[]
{
"BEGIN OPENSSH PRIVATE KEY",
"BEGIN RSA PRIVATE KEY",
"ubuntu-latest",
"windows-latest",
"macos-latest",
};
var violations = new[] { RunbookPath, SmokePath }
.SelectMany(path =>
{
var text = File.ReadAllText(path);
return forbidden
.Where(token => text.Contains(token, StringComparison.OrdinalIgnoreCase))
.Select(token => $"{Path.GetRelativePath(Root, path)} contains forbidden token {token}");
})
.ToList();
violations.Should().BeEmpty();
}
private static string FindRepoRoot()
{
var current = new DirectoryInfo(AppContext.BaseDirectory);
while (current is not null)
{
if (Directory.Exists(Path.Combine(current.FullName, "apps"))
&& Directory.Exists(Path.Combine(current.FullName, "scripts"))
&& File.Exists(Path.Combine(current.FullName, "README.md")))
{
return current.FullName;
}
current = current.Parent;
}
throw new DirectoryNotFoundException("Could not find bluejay-infra root.");
}
}

View File

@@ -1,6 +1,6 @@
package bluejayinfra.public_method_allowlist package bluejayinfra.public_method_allowlist
public_hosts := {"dist.flowercore.io", "dns.iamworkin.lan"} public_hosts := {"brochure.flowercore.io", "dist.flowercore.io", "dns.iamworkin.lan"}
deny[msg] { deny[msg] {
input.kind == "IngressRoute" input.kind == "IngressRoute"