Compare commits

..

2 Commits

Author SHA1 Message Date
Andrew Stoltz
f61901ccbd chore(bridge): bump fc-llm-bridge image tag v202604292028 2026-04-29 20:33:29 -05:00
Andrew Stoltz
4a309cbf0b refactor(agent-zero): drop ollama-proxy sidecar (Phase 3) 2026-04-29 20:27:28 -05:00
69 changed files with 128 additions and 12197 deletions

7
.gitignore vendored
View File

@@ -1,7 +0,0 @@
# .NET build outputs (lint test project)
**/bin/
**/obj/
# Editor / temp
.DS_Store
*.swp

View File

@@ -99,23 +99,8 @@ curl -sk -X DELETE https://dns.iamworkin.lan/api/v1/servers/<serverId>/zones/iam
- **CoreDNS template + ndots:5 collision**: inside pods, `<svc>.<ns>.svc.cluster.local` with <5 dots gets search-expanded through `iamworkin.lan` FIRST and hits the wildcard template → resolves to Traefik VIP, not the real ClusterIP. Use short service names (`<svc>`) in K8s manifests. See memory `feedback_coredns_ndots_template_collision.md`.
- **Image not on node**: pods stuck `ErrImageNeverPull` means the image wasn't imported to the node Kubernetes scheduled the pod onto. `ctr images import` on all of rke2-server, rke2-agent1, rke2-agent2.
- **StatefulSet PVC drift**: `volumeClaimTemplates` needs explicit `volumeMode: Filesystem` or ArgoCD SSA self-heals forever. See memory `feedback_argocd_statefulset_pvc_drift.md`.
- **IngressRoute namespace split**: this RKE2 Traefik install does not allow cross-namespace service refs. Keep the `IngressRoute`, backend `Service`, and TLS secret in the same namespace; if one host is shared across namespaces, duplicate the `Certificate` and move the route next to the destination service.
- **Public read-only hosts**: if a public host fronts a service that also exposes admin writes internally, add a Traefik route match like `Host(...) && (Method(GET) || Method(HEAD))` on the public edge instead of trusting the app to reject unsafe methods.
- **Public read-write allowlist hosts**: if a public host accepts a tightly bounded write surface (e.g. bootstrap-JWT POST), pin the allowlist as `(Method(GET) || Method(HEAD) || Method(POST) || Method(OPTIONS))`. PUT/PATCH/DELETE must still 404 at the route. Track A's `updatecenter.iamworkin.lan` / `updates.iamworkin.lan` are the canonical example. The lint test enforces this invariant.
- **Traefik VIP netpols**: when a `NetworkPolicy` allows `10.0.56.200`, also allow the post-DNAT backend ports (`8443` for TLS plus `8080` or `8000` for HTTP) or Calico will drop the rewritten flow.
- **Auth-safe probes**: services behind API-key or global auth middleware should prefer `tcpSocket` probes unless `/health` is explicitly exempted before the middleware runs.
- **ArgoCD must use internal Gitea URL**: `http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git`, not the external HTTPS URL (step-ca cert isn't trusted by ArgoCD). The `ApplicationSet` and any hand-created `Application` must both use the internal URL.
## Local manifest lint
The repo now carries a local-first lint pass for the recurring K8s gotchas that have burned the fleet:
```bash
dotnet test tests/bluejay-infra-lint/BluejayInfraLint.Tests.csproj -c Release
```
That test project sweeps `bluejay-infra/apps/**` plus the canonical sibling `FlowerCore.*\\k8s` manifests that share the same workspace. Matching `conftest.dev` policy files live under `tests/bluejay-infra-lint/conftest.dev/` for environments that also have `conftest` or `opa`.
## References
- Cert-manager recovery playbook: `FlowerCore.Notes/memory/project_cert_manager_recovery_2026_04_22.md`

View File

@@ -92,16 +92,13 @@ subjects:
# =============================================================================
# Agent Zero — AI Agent Web UI (NUC Edition, Blue Jay Profile)
# =============================================================================
# Connects directly to fc-llm-bridge for chat + internal util/embed + browser.
# Agent Zero's internal util/embed slots stay on the bridge's OpenAI-compatible
# /v1 surface, while browser + corpus-search use the Ollama-compatible /api/*
# surface through OLLAMA_HOST.
# Connects directly to fc-llm-bridge for chat + util + embeddings + browser.
# Blue Jay profile with 21 tools, 3 prompts, 4 extensions.
---
# FC LLM Bridge API key for Agent Zero (ADR-088 chat/util/embed/browser routing).
# Syncs from 1Password item "FC LLM Bridge API Keys" (field: agent-zero-k8s).
# Consumed by chat, internal util/embed, browser, and corpus-search requests
# Consumed by chat, util, embeddings, browser, and corpus-search requests
# that traverse fc-llm-bridge.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
@@ -127,18 +124,6 @@ metadata:
spec:
itemPath: "vaults/IAmWorkin/items/Print.Web API Keys"
---
# Knowledge MCP bearer token for the direct Agent Zero -> Knowledge.Web path.
# The 1Password item currently stores the raw token in its concealed PASSWORD
# field, which the operator syncs to Secret key `password`.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
name: knowledge-mcp-tokens
namespace: agent-zero
spec:
itemPath: "vaults/IAmWorkin/items/FlowerCore Knowledge MCP Tokens"
---
apiVersion: apps/v1
kind: Deployment
@@ -150,7 +135,7 @@ metadata:
annotations:
agent-zero/deployment: "nuc"
agent-zero/profile: "bluejay"
agent-zero/ollama: "fc-llm-bridge fronts edge1 Pi 5 + AI HAT+ Ollama for cluster browser/corpus-search traffic; internal chat/util/embed route through the bridge's authenticated OpenAI surface"
agent-zero/ollama: "edge1 Pi 5 + AI HAT+ only (10.0.57.17:11434) — workstation Ollama is private dev hardware, not a cluster dependency"
spec:
replicas: 1
selector:
@@ -243,41 +228,23 @@ spec:
# chat_model: FlowerCore LLM Bridge (ADR-088) — OpenAI-compat,
# spend-tracked, tier-aliased (fc:balanced → Claude Sonnet).
# api_key comes from A0_SET_chat_model_api_key env var (overrides
# config.json). Utility + embedding stay on the authenticated
# OpenAI-compatible /v1 surface; browser and direct tool traffic
# use the bridge's Ollama-compatible root via OLLAMA_HOST.
# config.json). Utility / embedding / browser all point at the
# same bridge root and use Ollama-compatible endpoints there.
mkdir -p /a0/usr/plugins/_model_config
cat > /a0/usr/plugins/_model_config/config.json << 'MODELCFG'
{"allow_chat_override":true,"chat_model":{"provider":"openai","name":"fc:balanced","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","ctx_length":8192,"ctx_history":0.7,"vision":false,"kwargs":{"temperature":0,"num_ctx":8192}},"utility_model":{"provider":"openai","name":"fc:cheap","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","ctx_length":8192,"ctx_input":0.7,"kwargs":{"num_ctx":8192}},"embedding_model":{"provider":"openai","name":"openai/fc:embedding","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","kwargs":{}}}
{"allow_chat_override":true,"chat_model":{"provider":"openai","name":"fc:balanced","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","ctx_length":8192,"ctx_history":0.7,"vision":false,"kwargs":{"temperature":0,"num_ctx":8192}},"utility_model":{"provider":"ollama","name":"qwen2.5:1.5b","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080","ctx_length":8192,"ctx_input":0.7,"kwargs":{"num_ctx":8192}},"embedding_model":{"provider":"ollama","name":"nomic-embed-text","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080","kwargs":{}}}
MODELCFG
# Strip heredoc indentation
sed -i 's/^ //' /a0/usr/plugins/_model_config/config.json
# Phase 0 Chat MCP pilot: Agent Zero does not interpolate env vars
# inside A0_SET_mcp_servers JSON, so build the final JSON here from
# the secret-backed env vars before initialize.sh. Keep the local
# corpus_search.py tool mounted either way so outage fallback
# remains available even when fc_knowledge is not advertised.
export KNOWLEDGE_MCP_ENABLED=false
if [ -n "${KNOWLEDGE_MCP_BEARER_TOKEN:-}" ]; then
if curl -sf --connect-timeout 3 "${KNOWLEDGE_MCP_HEALTH_URL}" > /dev/null && \
curl -sf --connect-timeout 5 \
-H "Authorization: Bearer ${KNOWLEDGE_MCP_BEARER_TOKEN}" \
-H "Accept: application/json, text/event-stream" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":"fc-knowledge-bootstrap","method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"agent-zero-bootstrap","version":"1.0"}}}' \
"${KNOWLEDGE_MCP_URL}" > /dev/null; then
export KNOWLEDGE_MCP_ENABLED=true
echo "fc_knowledge enabled from ${KNOWLEDGE_MCP_URL}."
else
echo "fc_knowledge unavailable or unauthorized; keeping local corpus_search.py as the fallback path."
fi
else
echo "fc_knowledge token missing; keeping local corpus_search.py as the fallback path."
# the secret-backed CHAT_MCP_API_KEY env var before initialize.sh.
# Use the in-cluster Chat service URL rather than the public
# Traefik hostname so the pod stays off the private VIP lane that
# the default egress rule blocks.
if [ -n "${CHAT_MCP_API_KEY:-}" ]; then
export A0_SET_mcp_servers="{\"mcpServers\":{\"fc-chat\":{\"type\":\"streamable-http\",\"url\":\"http://chat-web.fc-chat.svc/mcp\",\"headers\":{\"X-Api-Key\":\"${CHAT_MCP_API_KEY}\"}}}}"
fi
export A0_SET_mcp_servers="$(
python3 -c 'import json, os; servers = {}; chat_key = os.getenv("CHAT_MCP_API_KEY"); knowledge_enabled = os.getenv("KNOWLEDGE_MCP_ENABLED", "false").lower() == "true"; token = os.getenv("KNOWLEDGE_MCP_BEARER_TOKEN", "") if knowledge_enabled else ""; chat_key and servers.setdefault("fc_chat", {"type": "streamable-http", "url": "http://chat-web.fc-chat.svc/mcp", "headers": {"X-Api-Key": chat_key}}); token and servers.setdefault("fc_knowledge", {"type": "streamable-http", "url": os.getenv("KNOWLEDGE_MCP_URL", "http://knowledge-web.knowledge.svc/mcp"), "headers": {"Authorization": f"Bearer {token}"}}); print(json.dumps({"mcpServers": servers}, separators=(",", ":")))'
)"
# Run the original entrypoint
exec /exe/initialize.sh $BRANCH
ports:
@@ -289,9 +256,8 @@ spec:
# Chat model — routed through FlowerCore LLM Bridge (ADR-088)
# so spend is tracked and tier aliases (fc:cheap/fc:balanced/fc:deep)
# dispatch to Ollama or Anthropic via a single OpenAI-compat endpoint.
# Internal utility + embedding use the authenticated OpenAI surface,
# while browser/corpus-search use the bridge's Ollama-compatible
# endpoints so Agent Zero no longer needs a local proxy sidecar.
# Utility / embedding / browser now traverse fc-llm-bridge too so
# Agent Zero no longer needs a local Ollama proxy sidecar.
- name: A0_SET_chat_model_provider
value: "openai"
- name: A0_SET_chat_model_name
@@ -322,24 +288,32 @@ spec:
value: "8192"
- name: A0_SET_chat_model_kwargs
value: '{"temperature": 0, "num_ctx": 8192}'
# Utility model — fast small helper tier through the OpenAI surface
# Utility model — fast small helper tier through the same proxy
- name: A0_SET_util_model_provider
value: "openai"
value: "ollama"
- name: A0_SET_util_model_name
value: "fc:cheap"
value: "qwen2.5:1.5b"
- name: A0_SET_util_model_api_base
value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1"
value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080"
- name: A0_SET_util_model_api_key
valueFrom:
secretKeyRef:
name: fc-llm-bridge-api-keys
key: agent-zero-k8s
- name: A0_SET_util_model_kwargs
value: '{"num_ctx": 2048}'
# Embedding model — authenticated bridge alias to nomic-embed-text.
# LiteLLM's embedding() path needs an explicit provider prefix here
# even though the chat slot can use bare fc:* aliases.
# Embedding model — nomic through the same proxy
- name: A0_SET_embed_model_provider
value: "openai"
value: "ollama"
- name: A0_SET_embed_model_name
value: "openai/fc:embedding"
value: "nomic-embed-text"
- name: A0_SET_embed_model_api_base
value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1"
value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080"
- name: A0_SET_embed_model_api_key
valueFrom:
secretKeyRef:
name: fc-llm-bridge-api-keys
key: agent-zero-k8s
# Browser model — small Gemma candidate through the same proxy
- name: A0_SET_browser_model_provider
value: "ollama"
@@ -380,19 +354,6 @@ spec:
name: chat-mcp-api-key
key: api-key
optional: true
# FlowerCore.Knowledge MCP Phase 1 — direct Agent Zero client path.
# Probe /healthz first, then try an authenticated initialize call.
# If either fails, Agent Zero boots without fc_knowledge and keeps
# the local corpus_search.py tool as the outage-safe path.
- name: KNOWLEDGE_MCP_URL
value: "http://knowledge-web.knowledge.svc/mcp"
- name: KNOWLEDGE_MCP_HEALTH_URL
value: "http://knowledge-web.knowledge.svc/healthz"
- name: KNOWLEDGE_MCP_BEARER_TOKEN
valueFrom:
secretKeyRef:
name: knowledge-mcp-tokens
key: password
# Print.Web — Thermal printer service on edge2.
# PRINT_WEB_URL: internal HTTP (bypasses Traefik TLS — print_web.py
# runs in-cluster and can reach edge2 directly on the PROD VLAN).
@@ -617,17 +578,6 @@ spec:
protocol: TCP
- port: 8080
protocol: TCP
# FlowerCore.Knowledge MCP (Phase 1) — in-cluster direct route with
# anonymous /healthz probe plus authenticated /mcp initialize/tool calls.
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: knowledge
ports:
- port: 80
protocol: TCP
- port: 8080
protocol: TCP
# Intranet search API — use in-cluster svc so traffic stays inside
# the cluster and is not blocked by the private-range egress denylist.
- to:

View File

@@ -16,25 +16,13 @@ spec:
metadata:
labels:
app: asterisk
spec:
nodeSelector:
kubernetes.io/hostname: rke2-agent1
hostNetwork: true
# Keep the search list free of iamworkin.lan so CoreDNS's wildcard
# template cannot hijack public egress like downloads.asterisk.org.
dnsPolicy: None
dnsConfig:
nameservers:
- 10.43.0.10
searches:
- telephony.svc.cluster.local
- svc.cluster.local
- cluster.local
options:
- name: ndots
value: "2"
securityContext:
fsGroup: 0
spec:
nodeSelector:
kubernetes.io/hostname: rke2-agent1
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
securityContext:
fsGroup: 0
# CoreDNS in this cluster has an iamworkin.lan wildcard that catches
# any unresolved name and returns 10.0.56.200 (Traefik VIP), which
# means downloads.asterisk.org inside the pod resolves to Traefik and

View File

@@ -1,69 +0,0 @@
# CDI — Containerized Data Importer
KubeVirt's `containerized-data-importer` for populating PVCs from external
sources (HTTP, HTTPS, container registry, S3, virtctl upload). Required to
import the Windows Server 2025 ISO into the `windows-server-2025-iso` PVC
that `apps/kubevirt-vms/ci1.yaml` mounts as a CDROM.
## Files
| File | Source | Purpose |
| ----------------- | ----------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| `cdi-operator.yaml` | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — verbatim copy | Installs operator + CRDs (5779 lines, large) |
| `cdi-cr.yaml` | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — annotated + commented | Tells operator to deploy CDI components |
`cdi-operator.yaml` is **vendored verbatim** from the upstream release for
air-gap reproducibility (no internet fetch at deploy time, ArgoCD prune
contracts hold). To bump versions:
```bash
CDI_VER=v1.66.0 # for example
curl -sL "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}/cdi-operator.yaml" \
-o apps/cdi/cdi-operator.yaml
curl -sL "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}/cdi-cr.yaml" \
-o /tmp/cdi-cr-new.yaml # then re-apply project header diff
git diff apps/cdi/ # review
git commit + push
```
## Verify after deploy
```bash
kubectl -n cdi get pods # operator + apiserver + deployment + uploadproxy
kubectl get cdis cdi -o jsonpath='{.status.phase}' # "Deployed"
kubectl get crd | grep cdi.kubevirt.io
# Expected CRDs: datavolumes.cdi.kubevirt.io, cdiconfigs.cdi.kubevirt.io,
# storageprofiles.cdi.kubevirt.io, dataimportcrons.cdi.kubevirt.io,
# datasources.cdi.kubevirt.io, objecttransfers.cdi.kubevirt.io
```
## Use after install
```yaml
# Example DataVolume that imports from HTTP
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
name: my-iso
spec:
source:
http:
url: "https://server/path/to.iso"
pvc:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 10Gi
storageClassName: longhorn
```
```bash
# Or upload from local disk via virtctl
virtctl image-upload pvc my-iso \
--image-path ./my.iso \
--size 10Gi \
--storage-class longhorn \
--access-mode ReadWriteOnce \
--uploadproxy-url https://cdi-uploadproxy.cdi.svc:443 \
--insecure
```

View File

@@ -1,36 +0,0 @@
# =============================================================================
# CDI CR — Tells the CDI operator to install CDI components into the cluster.
# =============================================================================
# After cdi-operator.yaml is applied, the operator watches for THIS resource
# (CDI named "cdi"). When found, it deploys cdi-apiserver, cdi-deployment,
# cdi-uploadproxy, cdi-cronjob, and the importer/uploadserver/cloner pods.
#
# Configuration:
# - HonorWaitForFirstConsumer: PVCs created by DataVolumes wait for first
# pod to schedule before binding (lets storage class pick best node).
# - WebhookPvcRendering: validates PVC creation against CDI policies.
# - imagePullPolicy IfNotPresent: re-pull only on tag rotation.
# - nodeSelector linux: pin to Linux nodes (no Windows worker support).
#
# Andrew may want to add a `uploadProxyURLOverride` later to expose the
# uploadproxy via Traefik IngressRoute for `virtctl image-upload` from
# BLUEJAY-WS without `kubectl port-forward`. Phase 2 enhancement.
# =============================================================================
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
name: cdi
annotations:
bluejay.iamworkin.lan/source: "kubevirt/containerized-data-importer v1.65.0"
spec:
config:
featureGates:
- HonorWaitForFirstConsumer
- WebhookPvcRendering
imagePullPolicy: IfNotPresent
infra:
nodeSelector:
kubernetes.io/os: linux
workload:
nodeSelector:
kubernetes.io/os: linux

File diff suppressed because it is too large Load Diff

View File

@@ -1,18 +1,5 @@
# FlowerCore Remote Desktop — TLS + Ingress
#
# Source-of-truth split:
# - bluejay-infra OWNS: Certificate, IngressRoute, all NetworkPolicies
# (see network-policies.yaml in this directory).
# - FlowerCore.RemoteDesktop scripts/deploy-web.sh OWNS: Deployment +
# Service. Reason: image refs like `localhost/fc-desktop:linux-xfce`
# only exist on each node's containerd after a manual import, so a
# Deployment manifest in bluejay-infra would race the image-import
# step and crash-loop.
#
# NetworkPolicies moved into bluejay-infra 2026-05-07 — previously they
# were applied via the deploy script's kubectl apply calls, which broke
# cluster-rebuild repeatability. See
# feedback_networkpolicies_belong_in_bluejay_infra.md.
# Deployment and Service managed by deploy script (not ArgoCD)
---
apiVersion: cert-manager.io/v1
kind: Certificate

View File

@@ -1,332 +0,0 @@
# FlowerCore Remote Desktop — NetworkPolicies (GitOps-managed)
#
# Moved into bluejay-infra 2026-05-07 as part of the regroup audit. These
# four policies were previously applied via FlowerCore.RemoteDesktop's
# scripts/deploy-web.sh `kubectl apply` calls, which meant a fresh cluster
# rebuild from bluejay-infra alone would miss them — Browser Lab session
# isolation, control-plane allow-list, and HTTP-01 cert renewal would all
# silently fail to come up.
#
# Source-of-truth contract:
# - bluejay-infra OWNS all NetworkPolicy + Certificate + IngressRoute
# resources for fc-desktop.
# - FlowerCore.RemoteDesktop's scripts/deploy-web.sh continues to own
# the Deployment + Service apply (because the image ref
# `localhost/fc-desktop:linux-xfce` only exists on each node's
# containerd after a manual import — it can't be pulled from a
# registry, so a Deployment manifest in bluejay-infra would race the
# image-import step and crash-loop).
---
# 1) desktop-isolation — Browser Lab session pods.
#
# Locks down pods labeled `app.kubernetes.io/name=remote-desktop` (every
# session pod regardless of template). Allows guacd ingress for the VNC/RDP
# display lane and remotedesktop-web's pre-handoff probing. Egress: NFS to
# Synology, DNS, Traefik (cluster + LB VIP), Intranet (Browser Lab home).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: desktop-isolation
namespace: fc-desktop
labels:
app.kubernetes.io/part-of: remotedesktop
app.kubernetes.io/component: isolation
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: remote-desktop
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: guacamole
ports:
- port: 3000
protocol: TCP
- port: 3001
protocol: TCP
- port: 5901
protocol: TCP
- port: 3389
protocol: TCP
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: fc-desktop
podSelector:
matchLabels:
app.kubernetes.io/name: remotedesktop-web
ports:
- port: 3000
protocol: TCP
- port: 5901
protocol: TCP
egress:
# NFS to Synology
- to:
- ipBlock:
cidr: 10.0.58.3/32
ports:
- port: 2049
protocol: TCP
- port: 2049
protocol: UDP
- port: 111
protocol: TCP
- port: 111
protocol: UDP
- to:
- ipBlock:
cidr: 10.0.58.3/32
ports:
- port: 445
protocol: TCP
- to: []
ports:
- port: 53
protocol: UDP
- port: 53
protocol: TCP
- to:
- ipBlock:
cidr: 10.0.56.200/32
- ipBlock:
cidr: 10.43.33.87/32
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: traefik-system
podSelector:
matchLabels:
app.kubernetes.io/name: traefik
ports:
- port: 80
protocol: TCP
- port: 443
protocol: TCP
- port: 8000
protocol: TCP
- port: 8443
protocol: TCP
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: intranet
podSelector:
matchLabels:
app: intranet-web
ports:
- port: 5300
protocol: TCP
---
# 2) fc-desktop-default-deny — namespace-wide catch-all.
#
# Selects every pod EXCEPT remotedesktop-web (the public-surface control
# plane) and applies default-deny semantics for both Ingress and Egress.
# Closes the gap where session pods land WITHOUT the desktop-isolation
# policy's `app.kubernetes.io/name=remote-desktop` label, plus prevents
# arbitrary debug sidecars / kubectl debug images from getting cluster
# access.
#
# CRITICAL: also catches transient cm-acme-http-solver pods (that's the
# bug this whole regroup chased). The cm-acme-http-solver-allow policy
# below is the explicit carve-out.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: fc-desktop-default-deny
namespace: fc-desktop
labels:
app.kubernetes.io/part-of: remotedesktop
app.kubernetes.io/component: isolation
spec:
podSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: NotIn
values:
- remotedesktop-web
policyTypes:
- Ingress
- Egress
---
# 3) remotedesktop-web-isolation — control plane explicit allow-list.
#
# remotedesktop-web is the only pod label the default-deny excludes, so
# without this policy the control plane would have wide-open Ingress AND
# Egress. This re-introduces a tight allow-list:
# - Ingress: Traefik only on TCP/8080
# - Egress: CoreDNS, K8s API, Guacamole admin, NFS, Intranet,
# Traefik (cluster + LB), and the fc-desktop namespace itself
# (for session pod readiness probing).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: remotedesktop-web-isolation
namespace: fc-desktop
labels:
app.kubernetes.io/part-of: remotedesktop
app.kubernetes.io/component: isolation
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: remotedesktop-web
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: traefik-system
podSelector:
matchLabels:
app.kubernetes.io/name: traefik
ports:
- port: 8080
protocol: TCP
egress:
# CoreDNS
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- port: 53
protocol: UDP
- port: 53
protocol: TCP
# K8s API server
- to: []
ports:
- port: 443
protocol: TCP
- port: 6443
protocol: TCP
# Guacamole admin
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: guacamole
ports:
- port: 8080
protocol: TCP
# NFS to Synology
- to:
- ipBlock:
cidr: 10.0.58.3/32
ports:
- port: 2049
protocol: TCP
- port: 2049
protocol: UDP
- port: 111
protocol: TCP
- port: 111
protocol: UDP
# Intranet web
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: intranet
podSelector:
matchLabels:
app: intranet-web
ports:
- port: 5300
protocol: TCP
# Cluster Traefik pods (in-cluster service resolution + Guacamole
# routing handoff where web app builds URLs against the public host
# but resolves internally).
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: traefik-system
podSelector:
matchLabels:
app.kubernetes.io/name: traefik
ports:
- port: 80
protocol: TCP
- port: 443
protocol: TCP
- port: 8080
protocol: TCP
- port: 8443
protocol: TCP
# fc-desktop namespace — session pod probing during browser-access
# readiness checks.
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: fc-desktop
ports:
- port: 3000
protocol: TCP
- port: 3001
protocol: TCP
- port: 5901
protocol: TCP
- port: 3389
protocol: TCP
---
# 4) cm-acme-http-solver-allow — cert-manager HTTP-01 carve-out.
#
# Without this, fc-desktop-default-deny catches the transient solver pods
# cert-manager creates for each renewal (they don't carry the
# remotedesktop-web label). Caused 8-day silent renewal failure on
# desktop.iamworkin.lan in 2026-04-28..2026-05-07 (see
# feedback_certmanager_renewal_stuck_when_solver_blocked_by_namespace_default_deny.md).
#
# Authorizes:
# - Ingress on TCP/8089 from cluster Traefik (which proxies the external
# HTTP-01 GET on port 80 through to the solver).
# - Egress for cluster DNS (defensive — newer cert-manager probes from
# inside the solver too).
#
# The `acme.cert-manager.io/http01-solver=true` label is set by
# cert-manager itself on every solver pod automatically.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: cm-acme-http-solver-allow
namespace: fc-desktop
labels:
app.kubernetes.io/part-of: remotedesktop
app.kubernetes.io/component: cert-renewal
spec:
podSelector:
matchLabels:
acme.cert-manager.io/http01-solver: "true"
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: traefik-system
podSelector:
matchLabels:
app.kubernetes.io/name: traefik
ports:
- port: 8089
protocol: TCP
egress:
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- port: 53
protocol: UDP
- port: 53
protocol: TCP

View File

@@ -118,7 +118,7 @@ spec:
# dotnet.exe publish -c Release -o deploy/app \
# src/FlowerCore.Distribution.Web/FlowerCore.Distribution.Web.csproj
# podman build -t localhost/fc-distribution:v<tag> -f deploy/Dockerfile.deploy deploy
image: localhost/fc-distribution:v202605061948
image: localhost/fc-distribution:v202604240010
imagePullPolicy: Never
ports:
- containerPort: 8080
@@ -151,10 +151,6 @@ spec:
value: "/signing/aistation-field/chain.pem"
- name: FlowerCore__Distribution__Signing__EditionCerts__aistation-field__KeyPath
value: "/signing/aistation-field/private-key.pem"
# Public distribution host is GET/HEAD-only at Traefik; this
# entitlement list controls which editions are readable there.
- name: FlowerCore__Distribution__EntitlementPublic__PublicEditions__0
value: "*"
resources:
requests:
cpu: 100m
@@ -266,12 +262,8 @@ spec:
kind: ClusterIssuer
dnsNames:
- dist.iamworkin.lan
# step-ca ACME caps lifetime at 30d; requesting 90d silently capped
# made renewBefore=cert-lifetime → perpetual renewal loop (10880+ CRs
# in 18h on 2026-05-07). Match working 720h/240h pattern from other
# FC services.
duration: 720h # 30d (step-ca cap)
renewBefore: 240h # 10d
duration: 2160h # 90d
renewBefore: 720h # 30d
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute

View File

@@ -87,20 +87,6 @@ spec:
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
# Use an explicit DNS policy so external FQDNs like api.anthropic.com are
# resolved directly instead of being expanded through the cluster search
# path that includes iamworkin.lan.
dnsPolicy: None
dnsConfig:
nameservers:
- 10.43.0.10
searches:
- fc-llm-bridge.svc.cluster.local
- svc.cluster.local
- cluster.local
options:
- name: ndots
value: "2"
securityContext:
fsGroup: 1654
fsGroupChangePolicy: OnRootMismatch
@@ -111,7 +97,7 @@ spec:
# dotnet.exe publish -c Release -o deploy/app \
# src/FlowerCore.LlmBridge.Web/FlowerCore.LlmBridge.Web.csproj
# podman build -t localhost/fc-llm-bridge:v<tag> -f deploy/Dockerfile.deploy deploy
image: localhost/fc-llm-bridge:v202604300022
image: localhost/fc-llm-bridge:v202604292028
imagePullPolicy: Never
ports:
- containerPort: 8080
@@ -225,6 +211,17 @@ spec:
port: 8080
initialDelaySeconds: 15
periodSeconds: 30
# Lower ndots so external FQDNs like api.anthropic.com are tried BEFORE
# the ndots:5 default expands them through the cluster search path, which
# includes iamworkin.lan. CoreDNS has a `template IN A iamworkin.lan`
# wildcard that answers `api.anthropic.com.iamworkin.lan` with the
# Traefik VIP, which then serves a TRAEFIK-DEFAULT-CERT TLS cert and
# breaks egress to the real Anthropic API (memory:
# feedback_coredns_ndots_template_collision, generalized to external DNS).
dnsConfig:
options:
- name: ndots
value: "2"
volumes:
- name: data
persistentVolumeClaim:

View File

@@ -69,14 +69,16 @@ spec:
memory: "512Mi"
cpu: "500m"
livenessProbe:
tcpSocket:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 10
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
tcpSocket:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 10
periodSeconds: 10

View File

@@ -1,171 +0,0 @@
# fc-redis — SignalR backplane for cross-product event bus
#
# Lands per Q-SO-1 resolution (2026-05-11 PM): SignalR backplane in Phase A,
# not Phase C as originally drafted. Operator directive: "Redis can be
# deployed just fine as it's another FlowerCore technology we'll want to
# manage."
#
# Phase A scope (this file):
# - Single Redis 7.x Alpine pod
# - 1Gi Longhorn RWO PVC for AOF persistence
# - ClusterIP Service at `redis.fc-redis.svc.cluster.local:6379`
# - No AUTH (in-cluster only; not exposed externally)
# - No IngressRoute (backplane is server-to-server only)
#
# Consumers (Phase A IMPL across FC services):
# - FlowerCore.Signage.Web (OpsConsoleHub)
# - FlowerCore.Scoreboard.Web (ScoreboardHub)
# - FlowerCore.SignalControl.Web
# - FlowerCore.DMS.Web
# - Any other product joining the cross-product event bus
#
# Each consumer adds:
# services.AddSignalR()
# .AddStackExchangeRedis(
# "redis.fc-redis.svc.cluster.local:6379",
# opts => opts.Configuration.ChannelPrefix =
# StackExchange.Redis.RedisChannel.Literal("fc-opsconsole"));
#
# Phase B / C follow-ons (out of scope here):
# - Redis Sentinel for HA (3-node)
# - AUTH password from 1Password Connect (rotate via /rotate-password)
# - redis_exporter sidecar for Prometheus scrape
# - Network policies restricting which namespaces can dial 6379
#
# Design: docs/signage/operations-console-phase-2-design.md §3.5
# Decision: Q-SO-1 (RESOLVED 2026-05-11 PM)
# Memory: feedback_blooming_ui_pattern_no_iframes
---
apiVersion: v1
kind: Namespace
metadata:
name: fc-redis
labels:
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: fc-redis-data
namespace: fc-redis
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 1Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
name: fc-redis-config
namespace: fc-redis
data:
redis.conf: |
# Phase A — minimal config; no AUTH, no replication.
bind 0.0.0.0
protected-mode no
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
# Persistence: AOF (fsync every second is the standard SignalR-backplane
# durability sweet spot — the backplane only needs to survive Redis
# restarts, not absolute zero loss).
appendonly yes
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
# Reasonable defaults — let Redis pick most things.
maxmemory-policy allkeys-lru
maxmemory 256mb
# Logging
loglevel notice
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: fc-redis
namespace: fc-redis
labels:
app: fc-redis
spec:
replicas: 1
strategy:
type: Recreate # RWO PVC; do not do rolling update
selector:
matchLabels:
app: fc-redis
template:
metadata:
labels:
app: fc-redis
spec:
securityContext:
runAsNonRoot: true
runAsUser: 999 # redis:7-alpine default uid
runAsGroup: 999
fsGroup: 999
containers:
- name: redis
image: redis:7-alpine
imagePullPolicy: IfNotPresent
command: ["redis-server", "/etc/redis/redis.conf"]
ports:
- name: redis
containerPort: 6379
resources:
requests:
cpu: "50m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "384Mi"
volumeMounts:
- name: data
mountPath: /data
- name: config
mountPath: /etc/redis
readOnly: true
livenessProbe:
tcpSocket:
port: 6379
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
exec:
command: ["redis-cli", "ping"]
initialDelaySeconds: 2
periodSeconds: 5
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: [ALL]
volumes:
- name: data
persistentVolumeClaim:
claimName: fc-redis-data
- name: config
configMap:
name: fc-redis-config
---
apiVersion: v1
kind: Service
metadata:
name: redis
namespace: fc-redis
spec:
type: ClusterIP
selector:
app: fc-redis
ports:
- name: redis
port: 6379
targetPort: 6379
protocol: TCP

View File

@@ -1,14 +0,0 @@
# fc-signage-appletv
Apple TV signage is a sealed appliance running the `FlowerCore.Signage.Agent.AppleTv` tvOS app per ADR-134.
This ApplicationSet entry is documentation and inventory metadata only. It intentionally creates no `Deployment`, `Service`, or `Pod`.
The Apple TV app connects outbound to existing FC.Signage.Web surfaces:
- `https://signage.iamworkin.lan/hub/signage` for SignalR live status.
- `GET /api/v1/nodes/{nodeId}/state` for the 30 second polling fallback.
- `POST /api/v1/nodes/register` and `POST /api/v1/nodes/{nodeId}/enroll` for pairing and mTLS enrollment.
- `POST /api/v1/nodes/{nodeId}/heartbeat` for metrics, current content identity, and local audit excerpts.
Distribution is via Apple Developer Enterprise Program or TestFlight plus FC.Distribution / UpdateCenter publishing once Apple credentials are available.

View File

@@ -1,5 +0,0 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- manifest.yaml

View File

@@ -1,26 +0,0 @@
# Apple TV signage is a sealed tvOS appliance. This ArgoCD app intentionally
# carries documentation metadata only; no Deployment, Service, or Pod resources
# are created for the player.
---
apiVersion: v1
kind: ConfigMap
metadata:
name: fc-signage-appletv-docs
namespace: fc-signage
labels:
app.kubernetes.io/name: fc-signage-appletv
app.kubernetes.io/part-of: flowercore-signage
flowercore.io/manifest-kind: docs-only
data:
README: |
FlowerCore.Signage.Agent.AppleTv is distributed through Apple Developer
Enterprise Program or TestFlight, not Kubernetes.
The app connects outbound to FC.Signage.Web:
- SignalR: https://signage.iamworkin.lan/hub/signage
- Polling fallback: GET /api/v1/nodes/{nodeId}/state
- Enrollment: POST /api/v1/nodes/{nodeId}/enroll
- Heartbeat: POST /api/v1/nodes/{nodeId}/heartbeat
This placeholder gives ArgoCD and inventory dashboards a first-class
Apple TV signage app entry without creating runtime pods.

View File

@@ -1,17 +0,0 @@
# FlowerCore Signage Pi Player
Phase 1 Raspberry Pi signage player packaging for Chromium kiosk deployments.
This bundle is intentionally air-gap friendly: systemd units, shell scripts,
udev rules, and Chromium managed policy are all checked into the repo and are
installed by `FlowerCore.Puppet`.
## Scope
- Bootstrap a stable node identity and mTLS client certificate.
- Launch Chromium in kiosk mode against `FC.Signage.Web` player routes.
- Restart the kiosk on HDMI hotplug.
- Renew mTLS certificates daily when fewer than 30 days remain.
- Detect display capabilities at boot, daily, and on HDMI hotplug.
Phase 2 native Avalonia rendering is documented separately in Notes and remains
deferred.

View File

@@ -1,15 +0,0 @@
{
"AutofillAddressEnabled": false,
"AutofillCreditCardEnabled": false,
"PasswordManagerEnabled": false,
"BrowserSignin": 0,
"MetricsReportingEnabled": false,
"SafeBrowsingProtectionLevel": 0,
"DefaultNotificationsSetting": 2,
"DefaultPopupsSetting": 2,
"BackgroundModeEnabled": false,
"DefaultBrowserSettingEnabled": false,
"PromotionalTabsEnabled": false,
"CommandLineFlagSecurityWarningsEnabled": false,
"ExtensionInstallBlocklist": ["*"]
}

View File

@@ -1,132 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
NODE_JSON="/etc/flowercore/signage-node.json"
CERT_DIR="/etc/fc-signage-player"
SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
NODE_ID=$(jq -r '.nodeId' "$NODE_JSON")
CONNECTORS=()
for dir in /sys/class/drm/card*-HDMI-A-*; do
[[ -e "$dir/status" ]] || continue
if [[ "$(cat "$dir/status")" == "connected" ]]; then
CONNECTORS+=("$(basename "$dir")")
fi
done
if [[ ${#CONNECTORS[@]} -eq 0 ]]; then
CAPABILITIES_JSON=$(jq -n --arg id "$NODE_ID" '{
nodeId: $id,
platform: "linux-arm64-pi",
displayConnected: false,
detectedAt: (now | todate),
note: "No HDMI display detected"
}')
else
PRIMARY="${CONNECTORS[0]}"
EDID_PATH="/sys/class/drm/${PRIMARY}/edid"
WIDTH=0
HEIGHT=0
REFRESH=60
HDR=false
AUDIO_HDMI=false
MFG=""
MODEL=""
PHYSICAL_SIZE=null
if [[ -s "$EDID_PATH" ]] && command -v edid-decode >/dev/null 2>&1; then
EDID_INFO=$(edid-decode < "$EDID_PATH" 2>/dev/null || true)
MFG=$(echo "$EDID_INFO" | grep -m1 -oP 'Manufacturer:\s*\K\S+' || true)
MODEL=$(echo "$EDID_INFO" | grep -m1 -oP 'Model:\s*\K\S+' || true)
PREF=$(echo "$EDID_INFO" | grep -m1 -oP '\d+x\d+\s*@\s*\d+(?:\.\d+)?\s*Hz' || true)
if [[ -n "$PREF" ]]; then
WIDTH=$(echo "$PREF" | grep -oP '^\d+')
HEIGHT=$(echo "$PREF" | grep -oP 'x\K\d+')
REFRESH=$(echo "$PREF" | grep -oP '@\s*\K[\d.]+' | cut -d. -f1)
fi
if echo "$EDID_INFO" | grep -qiE 'HDR (Static|Dynamic) Metadata Block'; then HDR=true; fi
if echo "$EDID_INFO" | grep -qiE 'CEA Audio Block|Audio Format Descriptor'; then AUDIO_HDMI=true; fi
PH_W=$(echo "$EDID_INFO" | grep -m1 -oP 'Maximum image size:\s*\K\d+\s*cm\s*x\s*\d+' || true)
if [[ -n "$PH_W" ]]; then
PH_CM_W=$(echo "$PH_W" | grep -oP '^\d+')
PH_CM_H=$(echo "$PH_W" | grep -oP 'x\s*\K\d+')
if (( PH_CM_W > 0 && PH_CM_H > 0 )); then
PHYSICAL_SIZE=$(awk -v w="$PH_CM_W" -v h="$PH_CM_H" 'BEGIN { printf "%.1f", sqrt(w*w + h*h)/2.54 }')
fi
fi
fi
if [[ "$WIDTH" == "0" ]] && command -v kmsprint >/dev/null 2>&1; then
KMS=$(kmsprint 2>/dev/null | grep -A2 "$PRIMARY" | grep -oP '\d+x\d+' | head -1 || true)
if [[ -n "$KMS" ]]; then
WIDTH=$(echo "$KMS" | grep -oP '^\d+')
HEIGHT=$(echo "$KMS" | grep -oP 'x\K\d+')
fi
fi
AUDIO_ALSA=false
if aplay -l 2>/dev/null | grep -qi 'card.*HDMI'; then AUDIO_ALSA=true; fi
HAS_AUDIO=false
if [[ "$AUDIO_HDMI" == "true" && "$AUDIO_ALSA" == "true" ]]; then HAS_AUDIO=true; fi
CAPABILITIES_JSON=$(jq -n \
--arg id "$NODE_ID" \
--argjson w "$WIDTH" \
--argjson h "$HEIGHT" \
--argjson r "$REFRESH" \
--argjson hdr "$HDR" \
--argjson audio "$HAS_AUDIO" \
--arg connector "$PRIMARY" \
--arg mfg "$MFG" \
--arg model "$MODEL" \
--argjson size "$PHYSICAL_SIZE" \
'{
nodeId: $id,
platform: "linux-arm64-pi",
displayConnected: true,
detectedAt: (now | todate),
hardware: {
maxResolution: { width: $w, height: $h },
nativeResolution: { width: $w, height: $h },
refreshRateHz: $r,
colorDepth: ($hdr | if . then "Color30Hdr" else "Color24" end),
hasAudioOutput: $audio,
audioChannelCount: ($audio | if . then 2 else 0 end),
physicalSizeInches: $size,
connector: $connector,
manufacturer: $mfg,
modelName: $model
},
render: { codecs: ["h264", "vp9", "mp4"] }
}')
fi
ENDPOINT_CANDIDATES=(
"${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/capabilities"
"${SIGNAGE_URL}/api/v1/displays/${NODE_ID}/capability-profile"
)
SUCCESS=false
for url in "${ENDPOINT_CANDIDATES[@]}"; do
HTTP_STATUS=$(curl -sk -o /tmp/cap-response.json -w "%{http_code}" \
--max-time 10 \
--cert "$CERT_DIR/client.crt" --key "$CERT_DIR/client.key" \
-X POST "$url" \
-H "Content-Type: application/json" \
-d "$CAPABILITIES_JSON" || echo "000")
if [[ "$HTTP_STATUS" == "200" || "$HTTP_STATUS" == "201" || "$HTTP_STATUS" == "204" ]]; then
SUCCESS=true
break
fi
done
mkdir -p /var/log/fc-signage-player
if [[ "$SUCCESS" != "true" ]]; then
echo "[$(date -Is)] capability declare: no endpoint accepted the profile; logging locally" \
| tee -a /var/log/fc-signage-player/capabilities.log
echo "$CAPABILITIES_JSON" | tee -a /var/log/fc-signage-player/capabilities.log
else
echo "[$(date -Is)] capability declare: ok ($url)" | tee -a /var/log/fc-signage-player/capabilities.log
fi
echo "$CAPABILITIES_JSON"

View File

@@ -1,144 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
NODE_JSON="/etc/flowercore/signage-node.json"
CERT_DIR="/etc/fc-signage-player"
SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
SETUP_CODE_FILE="/etc/flowercore/signage-setup-code"
mkdir -p /etc/flowercore "$CERT_DIR" /var/log/fc-signage-player
chown fc-signage:fc-signage /etc/flowercore "$CERT_DIR" /var/log/fc-signage-player
chmod 0750 "$CERT_DIR"
if [[ -s "$NODE_JSON" && -s "$CERT_DIR/client.p12" ]]; then
ENROLLED=$(jq -r '.enrolledAt // empty' "$NODE_JSON")
if [[ -n "$ENROLLED" ]]; then
echo "[$(date -Is)] bootstrap: already enrolled at $ENROLLED; skipping"
exit 0
fi
fi
if [[ -s "$NODE_JSON" ]]; then
NODE_UUID=$(jq -r '.nodeUuid // empty' "$NODE_JSON")
MACHINE_ID=$(jq -r '.machineId // empty' "$NODE_JSON")
else
NODE_UUID=$(uuidgen)
MACHINE_ID=$(echo "$NODE_UUID" | tr -d '-' | cut -c1-16)
jq -n --arg uuid "$NODE_UUID" --arg machine "$MACHINE_ID" --arg host "$(hostname -f)" --arg ts "$(date -Is)" \
'{nodeUuid: $uuid, machineId: $machine, hostname: $host, platform: "linux-arm64-pi", createdAt: $ts}' \
> "$NODE_JSON"
chmod 0640 "$NODE_JSON"
chown fc-signage:fc-signage "$NODE_JSON"
fi
SETUP_CODE=""
if [[ -s "$SETUP_CODE_FILE" ]]; then
SETUP_CODE=$(tr -d '\r\n\t ' < "$SETUP_CODE_FILE")
fi
MODEL=$(tr -d '\0' < /sys/firmware/devicetree/base/model 2>/dev/null || echo Unknown)
REG_PAYLOAD=$(jq -n \
--arg machine "$MACHINE_ID" \
--arg name "$(hostname -f)" \
--arg setup "$SETUP_CODE" \
--arg resolution "1920x1080" \
--arg model "$MODEL" \
'{
machineId: $machine,
name: $name,
setupCode: ($setup | if . == "" then null else . end),
resolution: $resolution,
hardwareModel: $model,
platform: "linux-arm64-pi"
}')
for attempt in 1 2; do
HTTP_STATUS=$(curl -sk -o /tmp/register-response.json -w "%{http_code}" \
--max-time 15 \
-X POST "${SIGNAGE_URL}/api/v1/nodes/register" \
-H "Content-Type: application/json" \
-d "$REG_PAYLOAD" || echo "000")
if [[ "$HTTP_STATUS" == "200" || "$HTTP_STATUS" == "201" ]]; then
break
fi
echo "[$(date -Is)] bootstrap: register attempt $attempt returned $HTTP_STATUS" >&2
sleep 5
done
if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "201" ]]; then
echo "[$(date -Is)] bootstrap: register failed after 2 attempts" >&2
exit 2
fi
NODE_ID=$(jq -r '.nodeId // empty' /tmp/register-response.json)
if [[ -z "$NODE_ID" ]]; then
echo "[$(date -Is)] bootstrap: register response did not include nodeId" >&2
exit 2
fi
jq --arg id "$NODE_ID" '.nodeId = $id' "$NODE_JSON" > "${NODE_JSON}.tmp" && mv "${NODE_JSON}.tmp" "$NODE_JSON"
if [[ -s "$SETUP_CODE_FILE" ]]; then
curl -sk -X POST "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/approve-via-setup-code" \
-H "Content-Type: application/json" \
-d "{\"setupCode\":\"${SETUP_CODE}\"}" \
-o /dev/null || true
fi
STATUS=""
DEADLINE=$(( $(date +%s) + 1800 ))
while (( $(date +%s) < DEADLINE )); do
STATUS=$(curl -sk --max-time 5 "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/status" | jq -r '.status // empty')
if [[ "$STATUS" == "Approved" || "$STATUS" == "Enrolled" || "$STATUS" == "Online" ]]; then
break
fi
sleep 15
done
if [[ "$STATUS" != "Approved" && "$STATUS" != "Enrolled" && "$STATUS" != "Online" ]]; then
echo "[$(date -Is)] bootstrap: approval not granted within 30min budget" >&2
exit 3
fi
KEY_PATH="${CERT_DIR}/client.key"
CSR_PATH="${CERT_DIR}/client.csr"
openssl ecparam -genkey -name prime256v1 -out "$KEY_PATH"
openssl req -new -key "$KEY_PATH" -out "$CSR_PATH" \
-subj "/CN=${NODE_ID}/O=FlowerCore/OU=SignagePlayer-Pi"
ENROLL_PAYLOAD=$(jq -n --arg csr "$(cat "$CSR_PATH")" '{certificateSigningRequest: $csr}')
HTTP_STATUS=$(curl -sk -o /tmp/enroll-response.json -w "%{http_code}" \
--max-time 15 \
-X POST "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/enroll" \
-H "Content-Type: application/json" \
-d "$ENROLL_PAYLOAD")
if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "201" ]]; then
echo "[$(date -Is)] bootstrap: enroll failed with HTTP $HTTP_STATUS" >&2
exit 4
fi
jq -r '.clientCertificatePem // .signedCertificatePem' /tmp/enroll-response.json > "${CERT_DIR}/client.crt"
jq -r '.caCertificatePem' /tmp/enroll-response.json > "${CERT_DIR}/ca-chain.pem"
P12_PASS=$(openssl rand -hex 24)
echo -n "$P12_PASS" > "${CERT_DIR}/client.p12.pass"
chmod 0600 "${CERT_DIR}/client.p12.pass"
openssl pkcs12 -export \
-inkey "$KEY_PATH" \
-in "${CERT_DIR}/client.crt" \
-certfile "${CERT_DIR}/ca-chain.pem" \
-out "${CERT_DIR}/client.p12" \
-password "pass:${P12_PASS}"
chown fc-signage:fc-signage "${CERT_DIR}"/* "$NODE_JSON"
chmod 0640 "${CERT_DIR}/client.p12" "${CERT_DIR}/client.crt" "${CERT_DIR}/ca-chain.pem" "$KEY_PATH"
chmod 0600 "${CERT_DIR}/client.p12.pass"
EXPIRY=$(openssl x509 -in "${CERT_DIR}/client.crt" -enddate -noout | sed 's/notAfter=//')
jq --arg ts "$(date -Is)" --arg exp "$EXPIRY" \
'.enrolledAt = $ts | .certExpiry = $exp' "$NODE_JSON" > "${NODE_JSON}.tmp" \
&& mv "${NODE_JSON}.tmp" "$NODE_JSON"
systemctl start flowercore-signage-detect-display.service || true
systemctl start flowercore-signage-player-pi.service || true
echo "[$(date -Is)] bootstrap: enrolled and kiosk started (NodeId=${NODE_ID})"

View File

@@ -1,6 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
sleep 2
systemctl start flowercore-signage-detect-display.service || true
systemctl restart flowercore-signage-player-pi.service

View File

@@ -1,44 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
NODE_JSON="/etc/flowercore/signage-node.json"
NODE_ID=$(jq -r '.nodeId' "$NODE_JSON")
SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
CERT_DIR="/etc/fc-signage-player"
CERT_THUMB=$(openssl pkcs12 -in "$CERT_DIR/client.p12" -passin file:"$CERT_DIR/client.p12.pass" -nodes -nokeys 2>/dev/null \
| openssl x509 -fingerprint -sha256 -noout \
| sed 's/.*=//' \
| tr -d ':')
PLAYER_URL="${SIGNAGE_URL}/player/${NODE_ID}/embed?token=${CERT_THUMB}"
HTTP_STATUS=$(curl -sk -o /dev/null -w "%{http_code}" --max-time 5 \
--cert-type P12 --cert "$CERT_DIR/client.p12:$(cat "$CERT_DIR/client.p12.pass")" \
"$PLAYER_URL" || echo "000")
mkdir -p /var/log/fc-signage-player
if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "301" && "$HTTP_STATUS" != "302" ]]; then
echo "[$(date -Is)] /embed returned $HTTP_STATUS; falling back to /player/${NODE_ID}" \
>> /var/log/fc-signage-player/url-divergence.log
PLAYER_URL="${SIGNAGE_URL}/player/${NODE_ID}?token=${CERT_THUMB}"
fi
exec chromium-browser \
--kiosk \
--noerrdialogs \
--disable-infobars \
--disable-translate \
--disable-features=TranslateUI,InfiniteSessionRestore \
--autoplay-policy=no-user-gesture-required \
--password-store=basic \
--user-data-dir=/var/lib/fc-signage-player/profile \
--disk-cache-dir=/var/lib/fc-signage-player/cache \
--disk-cache-size=104857600 \
--no-first-run \
--no-default-browser-check \
--check-for-update-interval=2592000 \
--enable-features=OverlayScrollbar \
--start-fullscreen \
--window-position=0,0 \
--window-size=1920,1080 \
"$PLAYER_URL"

View File

@@ -1,20 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
mkdir -p /var/log/fc-signage-player
for f in /etc/flowercore/signage-node.json /etc/fc-signage-player/client.p12 /etc/fc-signage-player/client.p12.pass; do
if [[ ! -r "$f" ]]; then
echo "[$(date -Is)] prelaunch: missing or unreadable $f" >&2
exit 1
fi
done
if openssl pkcs12 -in /etc/fc-signage-player/client.p12 -passin file:/etc/fc-signage-player/client.p12.pass -nokeys -clcerts 2>/dev/null \
| openssl x509 -checkend $((7*24*3600)) -noout; then
:
else
echo "[$(date -Is)] prelaunch: client cert expires within 7 days" >&2
fi
echo "[$(date -Is)] prelaunch: ok" | tee -a /var/log/fc-signage-player/prelaunch.log

View File

@@ -1,46 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
CERT_DIR="/etc/fc-signage-player"
NODE_JSON="/etc/flowercore/signage-node.json"
SIGNAGE_URL="${FC_SIGNAGE_URL:-https://signage.iamworkin.lan}"
[[ -s "$CERT_DIR/client.crt" ]] || { echo "no cert to renew"; exit 0; }
if openssl x509 -in "$CERT_DIR/client.crt" -checkend $((30*24*3600)) -noout; then
exit 0
fi
NODE_ID=$(jq -r '.nodeId' "$NODE_JSON")
NEW_KEY="$CERT_DIR/client.key.new"
NEW_CSR="$CERT_DIR/client.csr.new"
openssl ecparam -genkey -name prime256v1 -out "$NEW_KEY"
openssl req -new -key "$NEW_KEY" -out "$NEW_CSR" \
-subj "/CN=${NODE_ID}/O=FlowerCore/OU=SignagePlayer-Pi"
HTTP_STATUS=$(curl -sk -o /tmp/renew-response.json -w "%{http_code}" \
--cert "$CERT_DIR/client.crt" --key "$CERT_DIR/client.key" \
-X POST "${SIGNAGE_URL}/api/v1/nodes/${NODE_ID}/renew" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg csr "$(cat "$NEW_CSR")" '{certificateSigningRequest: $csr}')")
if [[ "$HTTP_STATUS" != "200" && "$HTTP_STATUS" != "201" ]]; then
echo "[$(date -Is)] renew: failed HTTP $HTTP_STATUS; leaving old cert in place" >&2
exit 5
fi
jq -r '.clientCertificatePem // .signedCertificatePem' /tmp/renew-response.json > "$CERT_DIR/client.crt.new"
jq -r '.caCertificatePem' /tmp/renew-response.json > "$CERT_DIR/ca-chain.pem.new"
P12_PASS=$(cat "$CERT_DIR/client.p12.pass")
openssl pkcs12 -export -inkey "$NEW_KEY" -in "$CERT_DIR/client.crt.new" \
-certfile "$CERT_DIR/ca-chain.pem.new" \
-out "$CERT_DIR/client.p12.new" -password "pass:${P12_PASS}"
mv "$CERT_DIR/client.key.new" "$CERT_DIR/client.key"
mv "$CERT_DIR/client.crt.new" "$CERT_DIR/client.crt"
mv "$CERT_DIR/ca-chain.pem.new" "$CERT_DIR/ca-chain.pem"
mv "$CERT_DIR/client.p12.new" "$CERT_DIR/client.p12"
chown fc-signage:fc-signage "$CERT_DIR"/client.*
systemctl restart flowercore-signage-player-pi.service

View File

@@ -1,3 +0,0 @@
# Restart kiosk and redeclare capabilities when HDMI connect/disconnect changes DRM state.
SUBSYSTEM=="drm", KERNEL=="card?-HDMI-A-?", ACTION=="change", RUN+="/usr/bin/systemctl restart flowercore-signage-player-pi.service"
SUBSYSTEM=="drm", KERNEL=="card?-HDMI-A-?", ACTION=="change", RUN+="/usr/bin/systemctl start flowercore-signage-detect-display.service"

View File

@@ -1,16 +0,0 @@
[Unit]
Description=FlowerCore Signage Pi: first-boot identity + mTLS enrollment
Wants=network-online.target
After=network-online.target
Before=flowercore-signage-player-pi.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/flowercore-signage-bootstrap.sh
RemainAfterExit=yes
StandardOutput=journal
StandardError=journal
TimeoutStartSec=2100
[Install]
WantedBy=multi-user.target

View File

@@ -1,8 +0,0 @@
[Unit]
Description=FlowerCore Signage Pi: detect connected display + declare capabilities
After=flowercore-signage-bootstrap.service
[Service]
Type=oneshot
User=fc-signage
ExecStart=/usr/local/bin/fc-signage-detect-display

View File

@@ -1,11 +0,0 @@
[Unit]
Description=Daily FlowerCore Signage Pi display capability redeclaration
[Timer]
OnCalendar=daily
RandomizedDelaySec=1h
Persistent=true
OnBootSec=30s
[Install]
WantedBy=timers.target

View File

@@ -1,7 +0,0 @@
[Unit]
Description=FlowerCore Signage Pi Player HDMI hotplug responder
DefaultDependencies=no
[Service]
Type=oneshot
ExecStart=/usr/local/bin/flowercore-signage-hdmi-respond.sh

View File

@@ -1,30 +0,0 @@
[Unit]
Description=FlowerCore Digital Signage Pi Player (Chromium kiosk)
Documentation=https://github.com/astoltz/FlowerCore.Notes/blob/master/docs/standards/appletv-pi-signage-agents-design.md
Wants=network-online.target
After=network-online.target graphical.target
ConditionPathExists=/etc/flowercore/signage-node.json
ConditionPathExists=/etc/fc-signage-player/client.p12
[Service]
Type=simple
User=fc-signage
Group=fc-signage
WorkingDirectory=/var/lib/fc-signage-player
EnvironmentFile=-/etc/flowercore/signage-player.env
ExecStartPre=/usr/local/bin/flowercore-signage-prelaunch.sh
ExecStart=/usr/local/bin/flowercore-signage-launch.sh
Restart=always
RestartSec=10s
StartLimitBurst=5
StartLimitIntervalSec=300s
MemoryMax=2G
MemoryHigh=1500M
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/fc-signage-player /var/log/fc-signage-player
PrivateTmp=true
NoNewPrivileges=true
[Install]
WantedBy=graphical.target

View File

@@ -1,6 +0,0 @@
[Unit]
Description=FlowerCore Signage Pi: cert renewal worker
[Service]
Type=oneshot
ExecStart=/usr/local/bin/flowercore-signage-renew-cert.sh

View File

@@ -1,10 +0,0 @@
[Unit]
Description=Daily check for FlowerCore Signage Pi cert renewal
[Timer]
OnCalendar=daily
RandomizedDelaySec=2h
Persistent=true
[Install]
WantedBy=timers.target

View File

@@ -76,13 +76,15 @@ spec:
memory: "512Mi"
cpu: "500m"
livenessProbe:
tcpSocket:
httpGet:
path: /health
port: http
initialDelaySeconds: 30
periodSeconds: 30
timeoutSeconds: 5
readinessProbe:
tcpSocket:
httpGet:
path: /health
port: http
initialDelaySeconds: 10
periodSeconds: 10

View File

@@ -30,7 +30,6 @@ import logging
import re
import shlex
import subprocess
import unicodedata
from typing import Optional
from fastapi import FastAPI, HTTPException
@@ -61,189 +60,6 @@ class TtsRequest(BaseModel):
volume: int = 100 # 0-200
HEBREW_CHAR_RE = re.compile(r"[\u0590-\u05FF]")
HEBREW_WORD_RE = re.compile(r"[\u0590-\u05FF]+")
# eSpeak-NG's Hebrew voice can spell unpointed Hebrew as Unicode character
# names on some builds. For source-text study reads, prefer a stable
# scholarly transliteration so words sound like words even without niqqud.
HEBREW_WORD_TRANSLITERATIONS = {
"אב": "av",
"אבא": "abba",
"אברהם": "Avraham",
"אדמה": "adamah",
"אדני": "Adonai",
"אדם": "adam",
"אור": "or",
"אלהים": "Elohim",
"אלוהים": "Elohim",
"אמן": "amen",
"אם": "em",
"אמת": "emet",
"ארץ": "eretz",
"אש": "esh",
"את": "et",
"בית": "beit",
"בן": "ben",
"ברא": "bara",
"בראשית": "bereshit",
"ברית": "berit",
"ברוך": "barukh",
"בת": "bat",
"גוי": "goy",
"גוים": "goyim",
"גויים": "goyim",
"דבר": "davar",
"דברים": "devarim",
"דוד": "David",
"הלל": "hallel",
"הארץ": "ha-aretz",
"הברית": "ha-berit",
"החדשה": "ha-chadashah",
"השמים": "ha-shamayim",
"השמיים": "ha-shamayim",
"ויאמר": "vayomer",
"יהוה": "Adonai",
"יוסף": "Yosef",
"יוחנן": "Yochanan",
"ישראל": "Yisrael",
"ישוע": "Yeshua",
"יצחק": "Yitzchak",
"יעקב": "Yaakov",
"ירושלים": "Yerushalayim",
"כהן": "kohen",
"כהנים": "kohanim",
"מים": "mayim",
"מות": "mavet",
"מושיע": "moshia",
"מלך": "melekh",
"מלכות": "malkhut",
"מרים": "Miriam",
"משה": "Moshe",
"משיח": "Mashiach",
"נביא": "navi",
"נביאים": "neviim",
"עם": "am",
"עולם": "olam",
"צדק": "tzedek",
"קדוש": "qadosh",
"קדושים": "qedoshim",
"קול": "qol",
"רוח": "ruach",
"שאול": "Shaul",
"שמים": "shamayim",
"שמיים": "shamayim",
"שמעון": "Shimon",
"שלום": "Shalom",
"תורה": "torah",
"חכמה": "chokhmah",
"חסד": "chesed",
"חיים": "chayim",
"חושך": "choshekh",
}
HEBREW_LETTERS = {
"א": "a",
"ב": "b",
"ג": "g",
"ד": "d",
"ה": "h",
"ו": "v",
"ז": "z",
"ח": "kh",
"ט": "t",
"י": "y",
"כ": "kh",
"ך": "kh",
"ל": "l",
"מ": "m",
"ם": "m",
"נ": "n",
"ן": "n",
"ס": "s",
"ע": "a",
"פ": "p",
"ף": "f",
"צ": "ts",
"ץ": "ts",
"ק": "q",
"ר": "r",
"ש": "sh",
"ת": "t",
}
HEBREW_VOWELISH = {"a", "e", "i", "o", "u"}
def _strip_hebrew_marks(value: str) -> str:
decomposed = unicodedata.normalize("NFD", value)
return "".join(
ch for ch in decomposed
if unicodedata.category(ch) != "Mn" and ch not in {"׳", "״", "־"}
)
def _fallback_hebrew_transliteration(word: str) -> str:
tokens: list[str] = []
chars = list(word)
for index, ch in enumerate(chars):
token = HEBREW_LETTERS.get(ch)
if token is None:
continue
if ch == "ה" and index == len(chars) - 1:
token = "ah"
elif ch == "י" and index > 0:
token = "i"
elif ch == "ו" and index > 0:
token = "o"
tokens.append(token)
if not tokens:
return word
spoken: list[str] = []
for index, token in enumerate(tokens):
spoken.append(token)
next_token = tokens[index + 1] if index + 1 < len(tokens) else ""
if (
token[-1:] not in HEBREW_VOWELISH
and next_token
and next_token[:1] not in HEBREW_VOWELISH
):
spoken.append("a")
return "".join(spoken)
def _transliterate_hebrew_word(match: re.Match[str]) -> str:
original = match.group(0)
normalized = _strip_hebrew_marks(original)
if not normalized:
return original
direct = HEBREW_WORD_TRANSLITERATIONS.get(normalized)
if direct:
return direct
if normalized.startswith("ו") and len(normalized) > 1:
rest = HEBREW_WORD_TRANSLITERATIONS.get(normalized[1:])
if rest:
return f"ve-{rest}"
if normalized.startswith("ה") and len(normalized) > 1:
rest = HEBREW_WORD_TRANSLITERATIONS.get(normalized[1:])
if rest:
return f"ha-{rest}"
return _fallback_hebrew_transliteration(normalized)
def _prepare_synthesis_input(text: str, language: str, voice: str) -> tuple[str, str]:
if language.lower().startswith("he") and HEBREW_CHAR_RE.search(text):
spoken = HEBREW_WORD_RE.sub(_transliterate_hebrew_word, text)
return spoken, "en-us"
return text, voice
def _resolve_voice(req: TtsRequest) -> str:
if req.voice:
return req.voice.strip()
@@ -299,15 +115,14 @@ def tts(req: TtsRequest) -> Response:
raise HTTPException(status_code=400, detail="text is required")
voice = _resolve_voice(req)
spoken_text, synth_voice = _prepare_synthesis_input(req.text, req.language, voice)
args = [
"--stdout",
"-v", synth_voice,
"-v", voice,
"-s", str(max(80, min(450, req.rate))),
"-p", str(max(0, min(99, req.pitch))),
"-a", str(max(0, min(200, req.volume))),
]
wav = _run_espeak(args, spoken_text.encode("utf-8"))
wav = _run_espeak(args, req.text.encode("utf-8"))
if not wav:
raise HTTPException(status_code=500, detail="espeak-ng returned empty stdout")
return Response(content=wav, media_type="audio/wav")
@@ -338,9 +153,9 @@ def tts(req: TtsRequest) -> Response:
PHONEME_DURATION_RE = re.compile(r"^\s*\S+\s+(\d+)\s+", re.MULTILINE)
def _estimate_total_ms(req: TtsRequest, voice: str, spoken_text: str) -> int:
def _estimate_total_ms(req: TtsRequest, voice: str) -> int:
args = ["--pho", "--quiet", "-v", voice, "-s", str(req.rate)]
out = _run_espeak(args, spoken_text.encode("utf-8"))
out = _run_espeak(args, req.text.encode("utf-8"))
text = out.decode("utf-8", errors="replace")
total = 0
for match in PHONEME_DURATION_RE.finditer(text):
@@ -360,8 +175,7 @@ def timings(req: TtsRequest):
if not req.text.strip():
raise HTTPException(status_code=400, detail="text is required")
voice = _resolve_voice(req)
spoken_text, synth_voice = _prepare_synthesis_input(req.text, req.language, voice)
total_ms = _estimate_total_ms(req, synth_voice, spoken_text)
total_ms = _estimate_total_ms(req, voice)
# Distribute total_ms across whitespace-split words proportional to
# character count. Punctuation-only tokens are folded into the previous
@@ -390,7 +204,7 @@ def timings(req: TtsRequest):
{
"text": req.text,
"language": req.language,
"voice": synth_voice,
"voice": voice,
"words": out_words,
"durationMs": total_ms,
}

View File

@@ -37,19 +37,6 @@ spec:
app.kubernetes.io/name: ttsreader-piper
app.kubernetes.io/part-of: flowercore
spec:
# Bypass CoreDNS's *.iamworkin.lan wildcard so the init container reaches
# huggingface.co directly when it seeds voice models.
dnsPolicy: None
dnsConfig:
nameservers:
- 10.43.0.10
searches:
- fc-ttsreader.svc.cluster.local
- svc.cluster.local
- cluster.local
options:
- name: ndots
value: "2"
initContainers:
- name: seed-voices
image: rhasspy/wyoming-piper:latest
@@ -359,7 +346,7 @@ spec:
runAsUser: 1654
containers:
- name: biblical-tts
image: localhost/fc-biblical-tts:v20260506-hebrew-translit
image: localhost/fc-biblical-tts:v1
imagePullPolicy: Never
ports:
- containerPort: 10402
@@ -532,7 +519,7 @@ spec:
fsGroupChangePolicy: OnRootMismatch
containers:
- name: web
image: localhost/fc-ttsreader-web:v20260506-phase6
image: localhost/fc-ttsreader-web:v202604291817
imagePullPolicy: Never
ports:
- containerPort: 5217
@@ -550,8 +537,6 @@ spec:
value: "/usr/bin/ffmpeg"
- name: TtsReader__Bible__CorpusRoot
value: "/data/corpus-cache/world-english-bible/eng/usx"
- name: TtsReader__ChapterContext__DatabasePath
value: "/data/chapter-context.db"
- name: TtsReader__Jobs__Root
value: "/data/jobs"
- name: TtsReader__Piper__Host
@@ -568,14 +553,6 @@ spec:
value: "http://ttsreader-kokoro.fc-ttsreader.svc.cluster.local.:8880"
- name: TtsReader__Kokoro__TimeoutSeconds
value: "120"
- name: FlowerCore__Tts__BiblicalTts__Enabled
value: "true"
- name: FlowerCore__Tts__BiblicalTts__BaseUrl
value: "http://ttsreader-biblical.fc-ttsreader.svc.cluster.local.:10402"
- name: FlowerCore__Tts__BiblicalTts__TimeoutSeconds
value: "60"
- name: FlowerCore__Tts__BiblicalTts__DefaultLanguage
value: "grc"
- name: Speech__Alignment__Enabled
# Cluster-native faster-whisper (Lane F, 2026-04-25). The
# ttsreader-align deployment in this manifest wraps
@@ -611,8 +588,6 @@ spec:
# the writable PVC mount.
- name: TtsReader__Preview__CacheDirectory
value: "/data/voice-previews"
- name: TtsReader__VoiceLibrary__ReferenceClip__Directory
value: "/data/voice-reference-clips"
# Sprint E XXL Phase 4γ — content-addressed CDN bundle dir for
# POST /api/v1/render. Default "wwwroot/cdn" resolves under the
# read-only app filesystem, so pin to the writable PVC mount
@@ -634,10 +609,7 @@ spec:
optional: true
resources:
requests:
# The cluster is currently saturated on requested CPU by
# remotedesktop workloads even when real usage is low.
# Keep the web frontend schedulable under that pressure.
cpu: 10m
cpu: 100m
memory: 256Mi
limits:
cpu: 500m

View File

@@ -1,47 +0,0 @@
# fc-updater — Update Center GitOps adoption
**Status:** adopted into `bluejay-infra` on 2026-05-06. The live ArgoCD
Application is `infra-fc-updater`, generated by the `bluejay-infra`
ApplicationSet with automated sync, `prune: true`, and `selfHeal: true`.
## Managed manifest set
`apps/fc-updater/fc-updater.yaml` manages:
- `Namespace/fc-updater`
- `PersistentVolumeClaim/updatecenter-data`
- `Deployment/updatecenter-web`
- `Service/updatecenter-web`
- `Certificate/updatecenter-web-tls`
- `Certificate/updatecenter-web-internal-tls`
- `IngressRoute/updatecenter-web`
- `IngressRoute/updatecenter-web-internal`
- `IngressRoute/updatecenter-web-public`
The Deployment intentionally sets `revisionHistoryLimit: 3` and
`strategy.type: Recreate`. The service is singleton + SQLite/local bundle
storage on `PersistentVolumeClaim/updatecenter-data`, pinned to
`rke2-server`.
## Runtime dependencies intentionally not stored here
These live Secrets are pre-existing runtime material and are not committed to
Git:
- `updater-bootstrap-auth`
- `updater-signing`
- `updater-webhooks`
- `cf-origin-flowercore-io`
Rotate the Cloudflare Origin Certificate through
`FlowerCore.Notes/docs/standards/code-signing-rotation-runbook.md`; the
shared origin cert must exist in every namespace that serves a
`*.flowercore.io` public IngressRoute.
## Verification
```powershell
kubectl.exe --kubeconfig C:\Users\AndrewStoltz\.kube\rke2.yaml -n argocd get application infra-fc-updater
kubectl.exe --kubeconfig C:\Users\AndrewStoltz\.kube\rke2.yaml -n fc-updater get deploy,svc,ingressroute,certificate,pvc
curl.exe -sk https://update.flowercore.io/api/v1/manifests/_schema
```

View File

@@ -1,269 +0,0 @@
# FlowerCore Update Center
# GitOps adoption of the live fc-updater namespace after PUB-1/PUB-3.
# Runtime credentials remain in existing K8s Secrets; do not store them here.
---
apiVersion: v1
kind: Namespace
metadata:
name: fc-updater
labels:
app.kubernetes.io/part-of: flowercore
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: updatecenter-data
namespace: fc-updater
labels:
app.kubernetes.io/name: updatecenter-web
app.kubernetes.io/part-of: flowercore
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
volumeMode: Filesystem
resources:
requests:
# Sized for fleet bundle storage (LocalFsBundleStore.MaxTotalBytes
# soft cap at 25 GiB per project_uc_remaining_4_apps_signed_2026_05_06).
# Mike Bundle alone is ~5.1 GiB; cluster live capacity is already
# 20 GiB after a manual expand. PVCs cannot shrink, so git must track
# at least the live size to avoid the OutOfSync loop.
storage: 25Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: updatecenter-web
namespace: fc-updater
labels:
app: updatecenter-web
app.kubernetes.io/name: updatecenter-web
app.kubernetes.io/part-of: flowercore
spec:
replicas: 1
revisionHistoryLimit: 3
strategy:
# SQLite + local bundle storage live on a single RWO PVC. Recreate avoids
# two pods overlapping the same write path during future image bumps.
type: Recreate
selector:
matchLabels:
app: updatecenter-web
template:
metadata:
labels:
app: updatecenter-web
spec:
nodeName: rke2-server
containers:
- name: web
image: localhost/fc-updater-web:v20260509-4162dca-authgate
imagePullPolicy: Never
ports:
- containerPort: 8080
name: http
env:
- name: ASPNETCORE_URLS
value: http://+:8080
- name: FlowerCore__Updater__Database__Provider
value: sqlite
- name: FlowerCore__Updater__Database__ConnectionString
value: Data Source=/data/updatecenter.db
- name: FlowerCore__Updater__BundleStorage__LocalFs__RootDirectory
value: /data/bundles
- name: FlowerCore__Updater__PublicShares__RequirePublicVisibilityOnPublicHosts
value: "true"
- name: FlowerCore__Updater__PublicShares__Links__0__Code
value: 8f3c2a9e7d41
- name: FlowerCore__Updater__PublicShares__Links__0__AppId
value: flowercore.faith-ai-mike
- name: FlowerCore__Updater__PublicShares__Links__0__Channel
value: stable
- name: FlowerCore__Updater__PublicShares__Links__0__RuntimeId
value: win-x64
- name: FlowerCore__Updater__PublicShares__Links__0__DisplayName
value: Faith AI Mike Edition
- name: FlowerCore__Updater__PublicShares__Links__0__Headline
value: Faith AI Mike Edition
- name: FlowerCore__Updater__PublicShares__Links__0__Description
value: Private release link for Mike's Faith AI bundle.
- name: FlowerCore__Updater__Auth__Bootstrap__Enabled
value: "true"
- name: FlowerCore__Updater__Auth__Bootstrap__Username
valueFrom:
secretKeyRef:
name: updater-bootstrap-auth
key: username
- name: FlowerCore__Updater__Auth__Bootstrap__Password
valueFrom:
secretKeyRef:
name: updater-bootstrap-auth
key: password
- name: FlowerCore__Updater__Auth__Bootstrap__SigningKey
valueFrom:
secretKeyRef:
name: updater-bootstrap-auth
key: signing-key
- name: FlowerCore__Updater__Signing__AutoSignOnPublish
value: "true"
- name: FlowerCore__Updater__Signing__RequireSignatureOnPublish
value: "true"
- name: FlowerCore__Updater__Signing__PfxBase64
valueFrom:
secretKeyRef:
name: updater-signing
key: pfx-base64
- name: FlowerCore__Updater__Signing__PfxPassword
valueFrom:
secretKeyRef:
name: updater-signing
key: pfx-password
- name: FlowerCore__Updater__Signing__OpItemReference
value: op://FlowerCore/step-ca-codesign
- name: FlowerCore__Updater__Signing__TrustAnchorPath
value: /etc/flowercore-updater/signing/root-ca.pem
- name: FlowerCore__Updater__GitHub__Token
valueFrom:
secretKeyRef:
name: updater-webhooks
key: github-token
- name: FlowerCore__Updater__GitHub__WebhookSecret
valueFrom:
secretKeyRef:
name: updater-webhooks
key: github-webhook-secret
- name: FlowerCore__Updater__Gitea__Token
valueFrom:
secretKeyRef:
name: updater-webhooks
key: gitea-token
- name: FlowerCore__Updater__Gitea__WebhookSecret
valueFrom:
secretKeyRef:
name: updater-webhooks
key: gitea-webhook-secret
readinessProbe:
tcpSocket:
port: http
initialDelaySeconds: 10
periodSeconds: 15
livenessProbe:
tcpSocket:
port: http
initialDelaySeconds: 30
periodSeconds: 30
volumeMounts:
- name: data
mountPath: /data
- name: signing
mountPath: /etc/flowercore-updater/signing
readOnly: true
volumes:
- name: data
persistentVolumeClaim:
claimName: updatecenter-data
- name: signing
secret:
secretName: updater-signing
items:
- key: root-ca.pem
path: root-ca.pem
---
apiVersion: v1
kind: Service
metadata:
name: updatecenter-web
namespace: fc-updater
labels:
app: updatecenter-web
app.kubernetes.io/name: updatecenter-web
app.kubernetes.io/part-of: flowercore
spec:
type: ClusterIP
selector:
app: updatecenter-web
ports:
- name: http
port: 8080
targetPort: http
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: updatecenter-web-tls
namespace: fc-updater
spec:
secretName: updatecenter-web-tls
issuerRef:
name: step-ca-acme
kind: ClusterIssuer
dnsNames:
- updatecenter.iamworkin.lan
- updates.iamworkin.lan
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: updatecenter-web-internal-tls
namespace: fc-updater
spec:
secretName: updatecenter-web-internal-tls
issuerRef:
name: step-ca-acme
kind: ClusterIssuer
dnsNames:
- updatecenter-internal.iamworkin.lan
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: updatecenter-web
namespace: fc-updater
spec:
entryPoints:
- web
- websecure
routes:
- match: (Host(`updatecenter.iamworkin.lan`) || Host(`updates.iamworkin.lan`)) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
kind: Rule
services:
- name: updatecenter-web
port: 8080
tls:
secretName: updatecenter-web-tls
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: updatecenter-web-internal
namespace: fc-updater
spec:
entryPoints:
- web
- websecure
routes:
- match: Host(`updatecenter-internal.iamworkin.lan`)
kind: Rule
services:
- name: updatecenter-web
port: 8080
tls:
secretName: updatecenter-web-internal-tls
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: updatecenter-web-public
namespace: fc-updater
spec:
entryPoints:
- websecure
routes:
- match: (Host(`update.flowercore.io`) || Host(`updates.flowercore.io`)) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
kind: Rule
services:
- name: updatecenter-web
port: 8080
tls:
secretName: cf-origin-flowercore-io

View File

@@ -1,7 +0,0 @@
# ArgoCD's bluejay-infra ApplicationSet uses a directory generator and does
# not require kustomization.yaml. Keep this anyway as the manifest inventory
# and for local `kubectl kustomize apps/fc-updater` previews.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- fc-updater.yaml

View File

@@ -1,11 +1,6 @@
# FlowerCore Tenant — retired flowercore.io placeholder.
#
# Public flowercore.io/www.flowercore.io routing is now owned by
# apps/fc-landing/fc-landing.yaml. This tenant placeholder remains available
# only as an in-cluster service; do not create a duplicate public
# IngressRoute here because it competes with fc-landing and requires a
# namespace-local cf-origin-flowercore-io Secret.
# ArgoCD managed - BlueJay Lab
# FlowerCore Tenant — flowercore.io (main brand)
# Public-facing placeholder landing page served by nginx
# ArgoCD managed - BlueJay Lab
---
apiVersion: v1
kind: Namespace
@@ -15,9 +10,15 @@ metadata:
app.kubernetes.io/part-of: bluejay-infra
flowercore.io/tenant: flowercore
---
# Landing page HTML
apiVersion: v1
kind: ConfigMap
# NOTE: The existing cf-origin-flowercore-io secret (covering *.flowercore.io)
# must be copied into this namespace. It already exists in other namespaces.
# Copy with: kubectl get secret cf-origin-flowercore-io -n fc-system -o yaml \
# | sed 's/namespace: .*/namespace: tenant-flowercore/' \
# | kubectl apply -f -
---
# Landing page HTML
apiVersion: v1
kind: ConfigMap
metadata:
name: flowercore-web-html
namespace: tenant-flowercore
@@ -307,6 +308,25 @@ spec:
selector:
app: flowercore-web
ports:
- port: 80
targetPort: 80
name: http
- port: 80
targetPort: 80
name: http
---
# Traefik IngressRoute — public via Cloudflare
# Uses existing cf-origin-flowercore-io cert (must be copied to this namespace)
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: flowercore-web
namespace: tenant-flowercore
spec:
entryPoints:
- websecure
routes:
- match: Host(`flowercore.io`) || Host(`www.flowercore.io`)
kind: Rule
services:
- name: flowercore-web
port: 80
tls:
secretName: cf-origin-flowercore-io

View File

@@ -1,38 +0,0 @@
# github-runner
ArgoCD-managed repo-scoped Linux GitHub Actions runners for FlowerCore.
`astoltz` is a GitHub user account, not an organization, so each repository
needs its own runner registration. The existing Common runner remains
`Deployment/github-runner`; Sprint 29 adds one single-replica Deployment for
each top Linux-cost repo:
- `FlowerCore.Puppet`
- `FlowerCore.Signage`
- `FlowerCore.DMS`
- `FlowerCore.Telephony`
- `FlowerCore.Print.Web`
- `FlowerCore.Chat`
- `FlowerCore.MySQL`
- `FlowerCore.Kiosk.Linux`
Each runner uses `myoung34/github-runner:latest`, `EPHEMERAL=true`, and labels
`self-hosted,linux,fc-build-linux`. The shared `github-runner-token` Secret is
synced from the existing 1Password item `GitHub PAT (Runner Registration)` and
is consumed as `ACCESS_TOKEN`.
Do not `kubectl apply` this app over ArgoCD. Merge to `main`, let
`infra-github-runner` sync, then verify from `noc1`:
```bash
kubectl -n github-runner get deploy,pods,pvc
for repo in FlowerCore.Puppet FlowerCore.Signage FlowerCore.DMS FlowerCore.Telephony FlowerCore.Print.Web FlowerCore.Chat FlowerCore.MySQL FlowerCore.Kiosk.Linux; do
gh api "/repos/astoltz/$repo/actions/runners" \
--jq '.runners[] | select((.labels[].name == "fc-build-linux") and (.status == "online")) | {name,status,busy,labels:[.labels[].name]}'
done
```
`LinuxRunnerOffline` is declared in `apps/monitoring/noc-monitoring.yaml` and
fires when any Common or top-8 Linux runner deployment has no available replica
for 10 minutes.

View File

@@ -1,999 +0,0 @@
# GitHub Actions self-hosted Linux runner fleet
#
# ArgoCD owns this namespace. Do not kubectl-apply ad hoc runner changes over
# it; update this manifest and let the bluejay-infra ApplicationSet reconcile.
#
# astoltz is a GitHub user account, not an org, so runners must be repo-scoped.
# Each Deployment below registers exactly one ephemeral myoung34/github-runner
# instance against one private FlowerCore repo using the shared PAT from the
# github-runner-token Secret.
#
# Current shape:
# - Common runner preserved from the phase-2 pilot.
# - Sprint 29 top-8 Linux-cost repos added first:
# Puppet, Signage, DMS, Telephony, Print.Web, Chat, MySQL, Kiosk.Linux.
#
# Security:
# - No ClusterRole / ClusterRoleBinding.
# - ServiceAccount has no K8s API privileges.
# - Self-hosted runners are for private repos and trusted branches only.
# - Fork pull-request approval must remain required in GitHub repo settings.
---
apiVersion: v1
kind: Namespace
metadata:
name: github-runner
labels:
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
---
# 1Password secret sync — creates github-runner-token K8s Secret.
# Fields expected in the 1Password item:
# credential — GitHub fine-grained PAT with Administration:read/write on
# each target repo. myoung34/github-runner uses ACCESS_TOKEN to
# mint fresh short-lived registration tokens at pod startup.
# Item path: IAmWorkin vault > "GitHub PAT (Runner Registration)"
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
name: github-runner-token
namespace: github-runner
labels:
app.kubernetes.io/component: credentials
app.kubernetes.io/part-of: flowercore
spec:
itemPath: vaults/IAmWorkin/items/GitHub PAT (Runner Registration)
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: github-runner
namespace: github-runner
labels:
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: github-runner-nuget-cache
namespace: github-runner
labels:
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: flowercore
flowercore.io/github-repo: FlowerCore.Common
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: github-runner-puppet-nuget-cache
namespace: github-runner
labels:
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: flowercore
flowercore.io/github-repo: FlowerCore.Puppet
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: github-runner-signage-nuget-cache
namespace: github-runner
labels:
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: flowercore
flowercore.io/github-repo: FlowerCore.Signage
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: github-runner-dms-nuget-cache
namespace: github-runner
labels:
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: flowercore
flowercore.io/github-repo: FlowerCore.DMS
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: github-runner-telephony-nuget-cache
namespace: github-runner
labels:
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: flowercore
flowercore.io/github-repo: FlowerCore.Telephony
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: github-runner-print-web-nuget-cache
namespace: github-runner
labels:
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: flowercore
flowercore.io/github-repo: FlowerCore.Print.Web
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: github-runner-chat-nuget-cache
namespace: github-runner
labels:
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: flowercore
flowercore.io/github-repo: FlowerCore.Chat
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: github-runner-mysql-nuget-cache
namespace: github-runner
labels:
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: flowercore
flowercore.io/github-repo: FlowerCore.MySQL
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: github-runner-kiosk-linux-nuget-cache
namespace: github-runner
labels:
app.kubernetes.io/component: cache
app.kubernetes.io/part-of: flowercore
flowercore.io/github-repo: FlowerCore.Kiosk.Linux
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
volumeMode: Filesystem
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-runner
namespace: github-runner
labels:
app.kubernetes.io/name: github-runner
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Common
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: github-runner
strategy:
type: Recreate
template:
metadata:
labels:
app.kubernetes.io/name: github-runner
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Common
spec:
serviceAccountName: github-runner
nodeSelector:
kubernetes.io/hostname: rke2-server
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
containers:
- name: runner
image: myoung34/github-runner:latest
imagePullPolicy: Always
env:
- name: REPO_URL
value: "https://github.com/astoltz/FlowerCore.Common"
- name: RUNNER_NAME_PREFIX
value: "rke2-linux"
- name: RUNNER_WORKDIR
value: "/tmp/runner/work"
- name: EPHEMERAL
value: "true"
- name: LABELS
value: "self-hosted,linux,fc-build-linux"
- name: ACCESS_TOKEN
valueFrom:
secretKeyRef:
name: github-runner-token
key: credential
- name: RUN_AS_ROOT
value: "false"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
volumeMounts:
- name: nuget-cache
mountPath: /home/runner/.nuget/packages
- name: tmp
mountPath: /tmp
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "pgrep -f Runner.Listener > /dev/null"
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 3
volumes:
- name: nuget-cache
persistentVolumeClaim:
claimName: github-runner-nuget-cache
- name: tmp
emptyDir: {}
restartPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-runner-puppet
namespace: github-runner
labels:
app.kubernetes.io/name: github-runner-puppet
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Puppet
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: github-runner-puppet
strategy:
type: Recreate
template:
metadata:
labels:
app.kubernetes.io/name: github-runner-puppet
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Puppet
spec:
serviceAccountName: github-runner
nodeSelector:
kubernetes.io/hostname: rke2-server
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
containers:
- name: runner
image: myoung34/github-runner:latest
imagePullPolicy: Always
env:
- name: REPO_URL
value: "https://github.com/astoltz/FlowerCore.Puppet"
- name: RUNNER_NAME_PREFIX
value: "rke2-linux-puppet"
- name: RUNNER_WORKDIR
value: "/tmp/runner/work"
- name: EPHEMERAL
value: "true"
- name: LABELS
value: "self-hosted,linux,fc-build-linux"
- name: ACCESS_TOKEN
valueFrom:
secretKeyRef:
name: github-runner-token
key: credential
- name: RUN_AS_ROOT
value: "false"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
volumeMounts:
- name: nuget-cache
mountPath: /home/runner/.nuget/packages
- name: tmp
mountPath: /tmp
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "pgrep -f Runner.Listener > /dev/null"
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 3
volumes:
- name: nuget-cache
persistentVolumeClaim:
claimName: github-runner-puppet-nuget-cache
- name: tmp
emptyDir: {}
restartPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-runner-signage
namespace: github-runner
labels:
app.kubernetes.io/name: github-runner-signage
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Signage
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: github-runner-signage
strategy:
type: Recreate
template:
metadata:
labels:
app.kubernetes.io/name: github-runner-signage
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Signage
spec:
serviceAccountName: github-runner
nodeSelector:
kubernetes.io/hostname: rke2-server
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
containers:
- name: runner
image: myoung34/github-runner:latest
imagePullPolicy: Always
env:
- name: REPO_URL
value: "https://github.com/astoltz/FlowerCore.Signage"
- name: RUNNER_NAME_PREFIX
value: "rke2-linux-signage"
- name: RUNNER_WORKDIR
value: "/tmp/runner/work"
- name: EPHEMERAL
value: "true"
- name: LABELS
value: "self-hosted,linux,fc-build-linux"
- name: ACCESS_TOKEN
valueFrom:
secretKeyRef:
name: github-runner-token
key: credential
- name: RUN_AS_ROOT
value: "false"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
volumeMounts:
- name: nuget-cache
mountPath: /home/runner/.nuget/packages
- name: tmp
mountPath: /tmp
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "pgrep -f Runner.Listener > /dev/null"
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 3
volumes:
- name: nuget-cache
persistentVolumeClaim:
claimName: github-runner-signage-nuget-cache
- name: tmp
emptyDir: {}
restartPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-runner-dms
namespace: github-runner
labels:
app.kubernetes.io/name: github-runner-dms
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.DMS
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: github-runner-dms
strategy:
type: Recreate
template:
metadata:
labels:
app.kubernetes.io/name: github-runner-dms
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.DMS
spec:
serviceAccountName: github-runner
nodeSelector:
kubernetes.io/hostname: rke2-server
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
containers:
- name: runner
image: myoung34/github-runner:latest
imagePullPolicy: Always
env:
- name: REPO_URL
value: "https://github.com/astoltz/FlowerCore.DMS"
- name: RUNNER_NAME_PREFIX
value: "rke2-linux-dms"
- name: RUNNER_WORKDIR
value: "/tmp/runner/work"
- name: EPHEMERAL
value: "true"
- name: LABELS
value: "self-hosted,linux,fc-build-linux"
- name: ACCESS_TOKEN
valueFrom:
secretKeyRef:
name: github-runner-token
key: credential
- name: RUN_AS_ROOT
value: "false"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
volumeMounts:
- name: nuget-cache
mountPath: /home/runner/.nuget/packages
- name: tmp
mountPath: /tmp
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "pgrep -f Runner.Listener > /dev/null"
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 3
volumes:
- name: nuget-cache
persistentVolumeClaim:
claimName: github-runner-dms-nuget-cache
- name: tmp
emptyDir: {}
restartPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-runner-telephony
namespace: github-runner
labels:
app.kubernetes.io/name: github-runner-telephony
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Telephony
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: github-runner-telephony
strategy:
type: Recreate
template:
metadata:
labels:
app.kubernetes.io/name: github-runner-telephony
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Telephony
spec:
serviceAccountName: github-runner
nodeSelector:
kubernetes.io/hostname: rke2-server
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
containers:
- name: runner
image: myoung34/github-runner:latest
imagePullPolicy: Always
env:
- name: REPO_URL
value: "https://github.com/astoltz/FlowerCore.Telephony"
- name: RUNNER_NAME_PREFIX
value: "rke2-linux-telephony"
- name: RUNNER_WORKDIR
value: "/tmp/runner/work"
- name: EPHEMERAL
value: "true"
- name: LABELS
value: "self-hosted,linux,fc-build-linux"
- name: ACCESS_TOKEN
valueFrom:
secretKeyRef:
name: github-runner-token
key: credential
- name: RUN_AS_ROOT
value: "false"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
volumeMounts:
- name: nuget-cache
mountPath: /home/runner/.nuget/packages
- name: tmp
mountPath: /tmp
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "pgrep -f Runner.Listener > /dev/null"
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 3
volumes:
- name: nuget-cache
persistentVolumeClaim:
claimName: github-runner-telephony-nuget-cache
- name: tmp
emptyDir: {}
restartPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-runner-print-web
namespace: github-runner
labels:
app.kubernetes.io/name: github-runner-print-web
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Print.Web
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: github-runner-print-web
strategy:
type: Recreate
template:
metadata:
labels:
app.kubernetes.io/name: github-runner-print-web
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Print.Web
spec:
serviceAccountName: github-runner
nodeSelector:
kubernetes.io/hostname: rke2-server
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
containers:
- name: runner
image: myoung34/github-runner:latest
imagePullPolicy: Always
env:
- name: REPO_URL
value: "https://github.com/astoltz/FlowerCore.Print.Web"
- name: RUNNER_NAME_PREFIX
value: "rke2-linux-print-web"
- name: RUNNER_WORKDIR
value: "/tmp/runner/work"
- name: EPHEMERAL
value: "true"
- name: LABELS
value: "self-hosted,linux,fc-build-linux"
- name: ACCESS_TOKEN
valueFrom:
secretKeyRef:
name: github-runner-token
key: credential
- name: RUN_AS_ROOT
value: "false"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
volumeMounts:
- name: nuget-cache
mountPath: /home/runner/.nuget/packages
- name: tmp
mountPath: /tmp
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "pgrep -f Runner.Listener > /dev/null"
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 3
volumes:
- name: nuget-cache
persistentVolumeClaim:
claimName: github-runner-print-web-nuget-cache
- name: tmp
emptyDir: {}
restartPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-runner-chat
namespace: github-runner
labels:
app.kubernetes.io/name: github-runner-chat
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Chat
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: github-runner-chat
strategy:
type: Recreate
template:
metadata:
labels:
app.kubernetes.io/name: github-runner-chat
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Chat
spec:
serviceAccountName: github-runner
nodeSelector:
kubernetes.io/hostname: rke2-server
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
containers:
- name: runner
image: myoung34/github-runner:latest
imagePullPolicy: Always
env:
- name: REPO_URL
value: "https://github.com/astoltz/FlowerCore.Chat"
- name: RUNNER_NAME_PREFIX
value: "rke2-linux-chat"
- name: RUNNER_WORKDIR
value: "/tmp/runner/work"
- name: EPHEMERAL
value: "true"
- name: LABELS
value: "self-hosted,linux,fc-build-linux"
- name: ACCESS_TOKEN
valueFrom:
secretKeyRef:
name: github-runner-token
key: credential
- name: RUN_AS_ROOT
value: "false"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
volumeMounts:
- name: nuget-cache
mountPath: /home/runner/.nuget/packages
- name: tmp
mountPath: /tmp
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "pgrep -f Runner.Listener > /dev/null"
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 3
volumes:
- name: nuget-cache
persistentVolumeClaim:
claimName: github-runner-chat-nuget-cache
- name: tmp
emptyDir: {}
restartPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-runner-mysql
namespace: github-runner
labels:
app.kubernetes.io/name: github-runner-mysql
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.MySQL
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: github-runner-mysql
strategy:
type: Recreate
template:
metadata:
labels:
app.kubernetes.io/name: github-runner-mysql
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.MySQL
spec:
serviceAccountName: github-runner
nodeSelector:
kubernetes.io/hostname: rke2-server
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
containers:
- name: runner
image: myoung34/github-runner:latest
imagePullPolicy: Always
env:
- name: REPO_URL
value: "https://github.com/astoltz/FlowerCore.MySQL"
- name: RUNNER_NAME_PREFIX
value: "rke2-linux-mysql"
- name: RUNNER_WORKDIR
value: "/tmp/runner/work"
- name: EPHEMERAL
value: "true"
- name: LABELS
value: "self-hosted,linux,fc-build-linux"
- name: ACCESS_TOKEN
valueFrom:
secretKeyRef:
name: github-runner-token
key: credential
- name: RUN_AS_ROOT
value: "false"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
volumeMounts:
- name: nuget-cache
mountPath: /home/runner/.nuget/packages
- name: tmp
mountPath: /tmp
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "pgrep -f Runner.Listener > /dev/null"
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 3
volumes:
- name: nuget-cache
persistentVolumeClaim:
claimName: github-runner-mysql-nuget-cache
- name: tmp
emptyDir: {}
restartPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: github-runner-kiosk-linux
namespace: github-runner
labels:
app.kubernetes.io/name: github-runner-kiosk-linux
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
app.kubernetes.io/managed-by: argocd
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Kiosk.Linux
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: github-runner-kiosk-linux
strategy:
type: Recreate
template:
metadata:
labels:
app.kubernetes.io/name: github-runner-kiosk-linux
app.kubernetes.io/component: runner
app.kubernetes.io/part-of: flowercore
flowercore.io/created-by: argocd
flowercore.io/github-repo: FlowerCore.Kiosk.Linux
spec:
serviceAccountName: github-runner
nodeSelector:
kubernetes.io/hostname: rke2-server
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
containers:
- name: runner
image: myoung34/github-runner:latest
imagePullPolicy: Always
env:
- name: REPO_URL
value: "https://github.com/astoltz/FlowerCore.Kiosk.Linux"
- name: RUNNER_NAME_PREFIX
value: "rke2-linux-kiosk-linux"
- name: RUNNER_WORKDIR
value: "/tmp/runner/work"
- name: EPHEMERAL
value: "true"
- name: LABELS
value: "self-hosted,linux,fc-build-linux"
- name: ACCESS_TOKEN
valueFrom:
secretKeyRef:
name: github-runner-token
key: credential
- name: RUN_AS_ROOT
value: "false"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
volumeMounts:
- name: nuget-cache
mountPath: /home/runner/.nuget/packages
- name: tmp
mountPath: /tmp
livenessProbe:
exec:
command:
- /bin/sh
- -c
- "pgrep -f Runner.Listener > /dev/null"
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 3
volumes:
- name: nuget-cache
persistentVolumeClaim:
claimName: github-runner-kiosk-linux-nuget-cache
- name: tmp
emptyDir: {}
restartPolicy: Always

View File

@@ -466,11 +466,11 @@ spec:
itemPath: vaults/IAmWorkin/items/Guacamole JSON Auth
---
---
# 1Password-backed credentials for Mac mini VNC access (Phase 1 <EFBFBD> 2026-04-28)
# 1Password-backed credentials for Mac mini VNC access (Phase 1 2026-04-28)
# The operator mints Secret 'macmini-vnc-creds' with keys: username, password, VNC Password
# Note: '1Password' field label 'VNC Password' -> K8s Secret key 'VNC Password' (space retained)
# Guacamole VNC connection password is sourced from the 'VNC Password' field.
# Actual IP is 10.0.56.115 (INFRA VLAN) <EFBFBD> the 1P item 'IP' field is kept as backup reference.
# Actual IP is 10.0.56.115 (INFRA VLAN) the 1P item 'IP' field is kept as backup reference.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
@@ -481,7 +481,6 @@ metadata:
app.kubernetes.io/part-of: flowercore
spec:
itemPath: vaults/IAmWorkin/items/Mac Mini
---
# Blue Jay Branding Extension (CSS + translations)
apiVersion: v1
kind: ConfigMap

View File

@@ -46,7 +46,7 @@ spec:
spec:
containers:
- name: intranet-web
image: localhost/fc-intranet-web:v20260508-brochure-w1
image: localhost/fc-intranet-web:v20260429-1646
imagePullPolicy: Never
ports:
- containerPort: 5300

View File

@@ -5,9 +5,7 @@ Phase 2.4 closed. Pod running, certificate issued (step-ca-acme), PVC
bound (Longhorn 20Gi RWO), ArgoCD `infra-knowledge` synced. `/healthz`
returns 200, `/api/v1/editions` returns `[]` (initial-deploy state — no
*.db files in the PVC yet; Phase 2.5+ admin UI handles bulk
population). Phase 1 of the Agent Zero MCP rollout keeps `/healthz`
anonymous and gates `/mcp` behind `Authorization: Bearer <token>` built
from the 1Password item `FlowerCore Knowledge MCP Tokens`.
population).
- Plan: [`../../../FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md`](../../../FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md)
- Sprint: [`../../../FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md`](../../../FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md) (Track B)
@@ -21,12 +19,6 @@ search to the rest of the FC ecosystem (Agent Zero, Chat.Web persona
memory, AiStation embeddings explorer, TtsReader chapter context, BMO
bot, Pi nodes via `fc-index sync`).
Phase 1 MCP routing is explicit:
- in-cluster Agent Zero → `http://knowledge-web.knowledge.svc/mcp`
- workstation Agent Zero → `https://knowledge.iamworkin.lan/mcp`
- probe URL for both lanes → `/healthz`
## Deployment order (do NOT skip / reorder)
### 1. FlowerCore.DNS public A record — knowledge.iamworkin.lan -> 10.0.56.200

View File

@@ -40,16 +40,16 @@ metadata:
labels:
app.kubernetes.io/part-of: bluejay-infra
---
# MCP bearer token for the read-only Agent Zero Phase 1 lane. The 1Password
# item currently stores the raw token in its concealed PASSWORD field, which
# the operator syncs into the namespaced Secret key `password`.
# MCP API key — synced from 1Password so /mcp stays gated without baking
# secrets into Git. The PASSWORD category maps the concealed field to Secret
# key `password`, which the Deployment reads into FlowerCore:Mcp:ApiKey:Key.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
name: knowledge-mcp-tokens
name: knowledge-mcp-api-key
namespace: knowledge
spec:
itemPath: "vaults/IAmWorkin/items/FlowerCore Knowledge MCP Tokens"
itemPath: "vaults/IAmWorkin/items/KnowledgeApiKey"
---
apiVersion: v1
kind: PersistentVolumeClaim
@@ -102,17 +102,8 @@ spec:
- name: web
# Placeholder tag — bump to the image you built + imported to ALL
# RKE2 nodes via scripts/deploy-knowledge.sh before applying.
image: localhost/fc-knowledge-web:v20260429232635
image: localhost/fc-knowledge-web:v202604272200
imagePullPolicy: Never
command:
- /bin/sh
- -c
args:
- |
if [ -n "${KNOWLEDGE_MCP_BEARER_TOKEN:-}" ]; then
export FlowerCore__Mcp__ApiKey__Key="Bearer ${KNOWLEDGE_MCP_BEARER_TOKEN}"
fi
exec dotnet FlowerCore.Knowledge.Web.dll
ports:
- containerPort: 8080
name: http
@@ -124,7 +115,7 @@ spec:
- name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT
value: "false"
# Vector-store directory + embedding model + edition profile dir.
# Profile JSON is baked into the image at /home/app/editions via the
# Profile JSON is baked into the image at /app/editions via the
# csproj Content-link from FlowerCore.Common/editions/.
- name: Knowledge__VectorStoresDirectory
value: "/data/vector-stores"
@@ -135,7 +126,7 @@ spec:
- name: Knowledge__MaxLimit
value: "50"
- name: FlowerCore__Editions__ProfileDirectory
value: "/home/app/editions"
value: "/app/editions"
# Embed via edge1 Pi 5 + AI HAT+ (10.0.57.17:11434). Cluster
# services do not depend on BLUEJAY-WS (private dev hardware) per
# bluejay-infra@0f9d56e. Query-time embedding is fast enough on
@@ -147,14 +138,7 @@ spec:
- name: FlowerCore__Mcp__ApiKey__Key
valueFrom:
secretKeyRef:
name: knowledge-mcp-tokens
key: password
- name: FlowerCore__Mcp__ApiKey__HeaderName
value: "Authorization"
- name: KNOWLEDGE_MCP_BEARER_TOKEN
valueFrom:
secretKeyRef:
name: knowledge-mcp-tokens
name: knowledge-mcp-api-key
key: password
resources:
requests:
@@ -201,7 +185,7 @@ spec:
- name: tmp
mountPath: /tmp
- name: logs
mountPath: /home/app/logs
mountPath: /app/logs
volumes:
- name: vector-store
persistentVolumeClaim:
@@ -241,12 +225,8 @@ spec:
kind: ClusterIssuer
dnsNames:
- knowledge.iamworkin.lan
# step-ca ACME caps lifetime at 30d; requesting 90d silently capped
# made renewBefore=cert-lifetime → perpetual renewal loop (10888+ CRs
# in 18h on 2026-05-07). Match working 720h/240h pattern from other
# FC services.
duration: 720h # 30d (step-ca cap)
renewBefore: 240h # 10d
duration: 2160h # 90d
renewBefore: 720h # 30d
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute

View File

@@ -1,93 +0,0 @@
# =============================================================================
# ci1 - Windows Server 2025 KubeVirt VM (GitHub Actions Self-Hosted Runner)
# =============================================================================
# Boots from the sysprepped containerDisk template built by the Windows VM
# sysprep pipeline. See docs/infrastructure/windows-vm-sysprep-pipeline.md.
# Path A/B/C install history is preserved in git log only.
# =============================================================================
apiVersion: v1
kind: Namespace
metadata:
name: kubevirt-vms
labels:
app.kubernetes.io/part-of: kubevirt-stack
pod-security.kubernetes.io/enforce: privileged
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: ci1
namespace: kubevirt-vms
labels:
app: ci-runner
role: github-actions-runner
flowercore.io/managed-by: bluejay-infra
spec:
runStrategy: Always
template:
metadata:
labels:
app: ci-runner
role: github-actions-runner
kubevirt.io/vm: ci1
spec:
domain:
cpu:
cores: 8
sockets: 1
threads: 1
memory:
guest: 16Gi
resources:
requests:
memory: 16Gi
limits:
memory: 16Gi
clock:
utc: {}
timer:
hpet:
present: false
pit:
tickPolicy: delay
rtc:
tickPolicy: catchup
hyperv: {}
features:
acpi: {}
apic: {}
hyperv:
relaxed: {}
vapic: {}
spinlocks:
spinlocks: 8191
smm: {}
firmware:
bootloader:
efi:
secureBoot: false
devices:
tpm: {}
disks:
- name: rootdisk
disk:
bus: virtio
interfaces:
# Pod-network fallback for CI runner outbound traffic. Switch to
# prod-vlan57 once the bridge/NAD lane is ready for L2 access.
- name: default
masquerade: {}
model: virtio
machine:
type: q35
networks:
- name: default
pod: {}
volumes:
- name: rootdisk
containerDisk:
image: localhost/fc-win-server-2025:v1
imagePullPolicy: Never
terminationGracePeriodSeconds: 3600

View File

@@ -1,3 +0,0 @@
resources:
- ci1.yaml
- prod-vlan57-nad.yaml

View File

@@ -1,69 +0,0 @@
# =============================================================================
# NetworkAttachmentDefinition — PROD VLAN 57 bridge
# =============================================================================
# Purpose: makes KubeVirt VMs reachable on the PROD VLAN (10.0.57.0/24)
# alongside the existing pod network. Required for ci1 to bridge onto PROD
# (e.g. to provision/scrape edge1, edge2, kiosks, Pis on the same L2 segment).
#
# **DEPLOY GATE — Phase 1.5 host work required first**:
# On every RKE2 node (rke2-server, rke2-agent1, rke2-agent2):
# 1. Switch port (UniFi USL16LP) trunks VLAN 57 to the node — usually
# already true since BLUEJAY-WS reaches 10.0.57.x services. Verify
# with `ip link show enp86s0.57` after configuring sub-interface, OR
# `tcpdump -ni enp86s0 vlan 57` and ping a known PROD host.
# 2. Linux bridge `br-prod` enslaving `enp86s0.57` (VLAN sub-interface).
# NetworkManager profile examples in the runbook below.
# 3. Verify Multus DaemonSet `kube-multus-ds` is Ready on all nodes.
#
# Without those, applying this NAD has no effect except to register the CRD.
# A VM that requests this NAD with no bridge present will fail with:
# `error adding pod kubevirt-vms_ci1 to CNI network "prod-vlan57": failed to
# plumb VLAN: open /sys/class/net/br-prod/master: no such file or directory`
#
# Configuration notes:
# - cniVersion 0.3.1 to match Multus daemon-config.json
# - mtu 1500 (matches enp86s0 default; bump if jumbo frames configured)
# - bridge name `br-prod` is convention; if Puppet picks a different name
# (e.g. `br57`, `br-vlan57`), edit BOTH this NAD and the ci1.yaml
# interface block. Keep them in sync.
# - vlan: 0 because the host bridge already strips VLAN tag (br-prod sits
# on top of `enp86s0.57`). If we instead used a VLAN-aware bridge with
# trunk port, set vlan: 57 here. Current convention is VLAN-stripped at
# the sub-interface, so the bridge passes untagged frames.
#
# Apply:
# kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml apply -f apps/kubevirt-vms/prod-vlan57-nad.yaml
#
# Then update ci1.yaml networks: stanza to:
# - name: prod-net
# multus:
# networkName: kubevirt-vms/prod-vlan57
# and the interface block from `masquerade` to `bridge`.
# =============================================================================
---
# Namespace must exist already (created by ci1.yaml's first document).
# This file imports a NAD into that same namespace.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: prod-vlan57
namespace: kubevirt-vms
annotations:
bluejay.iamworkin.lan/host-bridge: "br-prod (enslaves enp86s0.57)"
bluejay.iamworkin.lan/cidr: "10.0.57.0/24"
bluejay.iamworkin.lan/gateway: "10.0.57.1"
bluejay.iamworkin.lan/dns: "10.0.56.1 (pfSense Unbound)"
spec:
config: |
{
"cniVersion": "0.3.1",
"name": "prod-vlan57",
"type": "bridge",
"bridge": "br-prod",
"ipam": {},
"mtu": 1500,
"vlan": 0,
"promiscMode": true,
"preserveDefaultVlan": false
}

View File

@@ -1,99 +0,0 @@
# =============================================================================
# Windows Server 2025 ISO — Static NFS PV (Path B for SATA-CDROM timeout)
# =============================================================================
# Purpose: Mount the ISO from Synology NAS via NFS instead of from a Longhorn-
# backed Filesystem PVC.
#
# Why: SATA-CDROM emulation reading from a Longhorn-backed Filesystem PVC is
# too slow for OVMF's boot read window — the DVD-ROM enumeration times out
# before the bootloader can be read. Symptom on the serial console:
# BdsDxe: failed to start Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ...
# BdsDxe: failed to start Boot0001 ... Time out
# BdsDxe: No bootable option or device was found
# Diagnosis confirmed the ISO content is a perfectly valid bootable ISO9660
# image — the bug is in the timing path between OVMF and Longhorn-backed
# storage, not in the ISO itself.
#
# Block-mode PVC was tried (`volumeMode: Block` via DataVolume) and would
# likely fix the timing, but CDI v1.65.0's upload-target pod cannot open the
# block device due to runAsUser:107 + capabilities.drop:[ALL] and we got:
# blockdev: cannot open /dev/cdi-block-volume: Permission denied
#
# NFS-mounted ISO bypasses both issues: no Longhorn slowness, no CDI upload
# pod permission concerns. The ISO is read directly from the NAS over a
# native NFSv4.1 mount that QEMU's SATA emulator can read at full LAN speed.
#
# Layout on Synology:
# /volume1/ISOs/ (existing export, RKE2 ACL)
# en-us_windows_server_2025_updated_march_2026_x64_dvd_8e06425a.iso
# win2025-iso-disk/ (new subdir, 2026-05-08)
# disk.img -> hardlink to ../en-us_windows_server_2025_..._8e06425a.iso
#
# KubeVirt's launcher pod expects a PVC mounted at
# /var/run/kubevirt-private/vmi-disks/<diskName>/disk.img — by mounting the
# `win2025-iso-disk/` subdir as the NFS PV root, `disk.img` lives at the PV's
# root and KubeVirt's CDROM emulator finds it without any path manipulation.
#
# A symlink would NOT work for sub-path NFS mounts (the relative target
# `../...iso` falls outside the sub-mount root). A hardlink works because it
# references the same inode regardless of mount point.
#
# Memory references:
# - feedback_synology_nfs_volume1_kubernetes_export_scoped (Synology export
# scoping pattern — but /volume1/ISOs export, unlike /volume1/kubernetes,
# does support sub-path mounts because Synology NFS is configured with
# pseudo-fs in NFSv4.1)
# - feedback_kubevirt_iso_first_install_bootorder_and_runstrategy (boot
# order / runStrategy gotchas, separate from the storage timing issue)
#
# Validation (2026-05-08, from rke2-server / rke2-agent1 / rke2-agent2):
# mount -t nfs -o nfsvers=4.1,ro 10.0.58.3:/volume1/ISOs/win2025-iso-disk /tmp/m
# file /tmp/m/disk.img
# -> ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)
# All 3 RKE2 nodes can mount and read.
# =============================================================================
apiVersion: v1
kind: PersistentVolume
metadata:
name: windows-server-2025-iso-nfs
labels:
flowercore.io/iso: windows-server-2025
flowercore.io/managed-by: bluejay-infra
spec:
capacity:
storage: 8Gi
accessModes:
- ReadOnlyMany
volumeMode: Filesystem
persistentVolumeReclaimPolicy: Retain
storageClassName: "" # static, no provisioner
mountOptions:
- nfsvers=4.1
- ro
- hard
- timeo=600
- retrans=3
nfs:
server: 10.0.58.3 # BlueJayNAS Synology DS1621+ on HOME VLAN 58
path: /volume1/ISOs/win2025-iso-disk
readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: windows-server-2025-iso-nfs
namespace: kubevirt-vms
labels:
app: ci-runner
flowercore.io/managed-by: bluejay-infra
spec:
accessModes:
- ReadOnlyMany
volumeMode: Filesystem
resources:
requests:
storage: 8Gi
storageClassName: ""
volumeName: windows-server-2025-iso-nfs

View File

@@ -1,762 +0,0 @@
{
"annotations": {
"list": []
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [
{
"icon": "external link",
"includeVars": false,
"keepTime": false,
"targetBlank": true,
"title": "Open Service",
"type": "link",
"url": "https://updatecenter.iamworkin.lan/"
}
],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [
{
"options": {
"0": {
"color": "#f87171",
"index": 1,
"text": "DOWN"
},
"1": {
"color": "#4ade80",
"index": 0,
"text": "UP"
}
},
"type": "value"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "#f87171",
"value": null
},
{
"color": "#4ade80",
"value": 1
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 8,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "center",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "value_and_name"
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "probe_success{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"}",
"refId": "A",
"legendFormat": "Availability"
}
],
"title": "Service Availability",
"transparent": true,
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"decimals": 2,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "#f87171",
"value": null
},
{
"color": "#fbbf24",
"value": 95
},
{
"color": "#FFB300",
"value": 99
},
{
"color": "#4ade80",
"value": 99.9
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 8,
"x": 8,
"y": 0
},
"id": 2,
"options": {
"colorMode": "background_solid",
"graphMode": "area",
"justifyMode": "center",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "value_and_name"
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "avg_over_time(probe_success{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"}[24h]) * 100",
"refId": "A",
"legendFormat": "24h Uptime"
}
],
"title": "24-Hour Uptime",
"transparent": true,
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"max": 30,
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "#f87171",
"value": null
},
{
"color": "#fbbf24",
"value": 2
},
{
"color": "#4ade80",
"value": 7
}
]
},
"unit": "d"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 8,
"x": 16,
"y": 0
},
"id": 3,
"options": {
"minVizHeight": 75,
"minVizWidth": 75,
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "(probe_ssl_earliest_cert_expiry{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"} - time()) / 86400",
"refId": "A",
"legendFormat": "Days Remaining"
}
],
"title": "Cert Expiry (Days)",
"transparent": true,
"type": "gauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Response Time (seconds)",
"drawStyle": "line",
"fillOpacity": 12,
"gradientMode": "scheme",
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 4,
"showPoints": "never",
"spanNulls": true,
"thresholdsStyle": {
"mode": "dashed"
}
},
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "#4ade80",
"value": null
},
{
"color": "#fbbf24",
"value": 2
},
{
"color": "#f87171",
"value": 5
}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 14,
"x": 0,
"y": 4
},
"id": 4,
"options": {
"legend": {
"calcs": [
"lastNotNull",
"mean",
"max"
],
"displayMode": "table",
"placement": "right"
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "probe_duration_seconds{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"}",
"refId": "A",
"legendFormat": "Probe Duration"
}
],
"timeFrom": "1h",
"title": "Response Time (1h Trend)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"gridPos": {
"h": 8,
"w": 10,
"x": 14,
"y": 4
},
"id": 5,
"options": {
"alertInstanceLabelFilter": "{instance=\"updatecenter.iamworkin.lan\"}",
"alertName": "",
"dashboardAlerts": false,
"groupBy": [],
"groupMode": "default",
"maxItems": 10,
"sortOrder": 1,
"stateFilter": {
"error": true,
"firing": true,
"noData": true,
"normal": false,
"pending": true
},
"viewMode": "list"
},
"title": "Active Alerts",
"type": "alertlist"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 12
},
"id": 20,
"title": "OTEL Counters — Track 1D",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 13
},
"id": 21,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "sum by (status) (rate(updatecenter_manifest_requests_total[5m]))",
"refId": "A",
"legendFormat": "status={{status}}"
}
],
"title": "Manifest Requests rate by status (5m)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
},
"unit": "Bps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 13
},
"id": 22,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "sum by (slug) (rate(updatecenter_bundle_download_bytes_total[5m]))",
"refId": "A",
"legendFormat": "{{slug}}"
}
],
"title": "Bundle Download Throughput by slug (5m)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 21
},
"id": 23,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "sum by (status) (rate(updatecenter_checkins_total[5m]))",
"refId": "A",
"legendFormat": "status={{status}}"
}
],
"title": "Agent Check-in Rate by status (5m)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "#4ade80", "value": null },
{ "color": "#f87171", "value": 1 }
]
},
"unit": "none",
"decimals": 2
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 12,
"y": 21
},
"id": 24,
"options": {
"colorMode": "background",
"graphMode": "area",
"justifyMode": "center",
"orientation": "auto",
"reduceOptions": {
"calcs": ["sum"],
"fields": "",
"values": false
},
"textMode": "value_and_name"
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "increase(updatecenter_signature_verify_failures_total[1h])",
"refId": "A",
"legendFormat": "Sig Verify Failures (1h)"
}
],
"title": "Signature Verify Failures (1h)",
"transparent": true,
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 18,
"y": 21
},
"id": 25,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "sum by (slug, channel) (rate(updatecenter_release_publishes_total[5m]))",
"refId": "A",
"legendFormat": "{{slug}}/{{channel}}"
}
],
"title": "Release Publishes rate by slug/channel (5m)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 29
},
"id": 26,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "sum by (kind, status) (rate(updatecenter_bundle_downloads_total[5m]))",
"refId": "A",
"legendFormat": "{{kind}} / {{status}}"
}
],
"title": "Bundle Download Requests by kind/status (5m)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 2,
"fillOpacity": 20
},
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "#4ade80", "value": null },
{ "color": "#f87171", "value": 0.01 }
]
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 29
},
"id": 27,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "rate(updatecenter_signature_verify_failures_total[5m])",
"refId": "A",
"legendFormat": "Sig verify failures/s"
}
],
"title": "Signature Verify Failure Rate (5m) — Critical if >0",
"transparent": true,
"type": "timeseries"
}
],
"refresh": "30s",
"schemaVersion": 39,
"style": "dark",
"tags": [
"blue-jay",
"flowercore",
"synthetic",
"updatecenter",
"otel"
],
"templating": {
"list": []
},
"time": {
"from": "now-24h",
"to": "now"
},
"timezone": "browser",
"title": "FlowerCore.UpdateCenter Dashboard",
"uid": "fc-updatecenter",
"version": 2
}

View File

@@ -1,226 +0,0 @@
{
"annotations": {
"list": []
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"displayMode": "table",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"editorMode": "code",
"expr": "sum by (event) (increase(fc_desktop_session_events_total[$__rate_interval]))",
"legendFormat": "{{event}}",
"range": true,
"refId": "A"
}
],
"title": "RemoteDesktop Session Events",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 2,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showUnfilled": true
},
"targets": [
{
"editorMode": "code",
"expr": "sum by (template, event) (increase(fc_desktop_session_events_total[24h]))",
"legendFormat": "{{template}} {{event}}",
"range": true,
"refId": "A"
}
],
"title": "24h Session Events By Template",
"type": "bargauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 3,
"options": {
"legend": {
"displayMode": "table",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"editorMode": "code",
"expr": "fc_desktop_pool_ready",
"legendFormat": "{{template}} ready",
"range": true,
"refId": "A"
},
{
"editorMode": "code",
"expr": "fc_desktop_pool_desired",
"legendFormat": "{{template}} desired",
"range": true,
"refId": "B"
}
],
"title": "Warm Pool Ready vs Desired",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "orange",
"value": 1
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
},
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"editorMode": "code",
"expr": "sum(increase(fc_desktop_session_events_total{event=\"connect\",browser_datasource=\"json\"}[24h])) - sum(increase(fc_desktop_session_events_total{event=\"disconnect\"}[24h]))",
"range": true,
"refId": "A"
}
],
"title": "24h Connect Minus Disconnect",
"type": "stat"
}
],
"refresh": "30s",
"schemaVersion": 39,
"style": "dark",
"tags": [
"flowercore",
"remotedesktop",
"guacamole"
],
"templating": {
"list": []
},
"time": {
"from": "now-24h",
"to": "now"
},
"timezone": "browser",
"title": "FlowerCore RemoteDesktop",
"uid": "flowercore-remotedesktop",
"version": 1
}

View File

@@ -974,52 +974,6 @@ data:
summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} replica mismatch"
description: "Spec wants {{ $labels.spec_replicas }} but only {{ $value }} available. Likely a rollout stuck on probe failure, scheduling, or PVC."
- alert: LinuxRunnerOffline
expr: |
kube_deployment_status_replicas_available{namespace="github-runner",deployment=~"github-runner(|-(puppet|signage|dms|telephony|print-web|chat|mysql|kiosk-linux))"} < 1
for: 10m
labels:
severity: warning
service: github-runner
alert_channel: thermal_print
annotations:
summary: "Linux GitHub Actions runner offline: {{ $labels.deployment }}"
description: "{{ $labels.deployment }} has no available runner pod for 10 minutes. GitHub jobs using [self-hosted, linux, fc-build-linux] for its repo will queue at $0 until the runner returns."
runbook_url: "https://gitea.iamworkin.lan/bluejay/FlowerCore.Notes/src/branch/master/docs/infrastructure/self-hosted-runner-fleet.md"
# Q-MR-3 (2026-05-11): multus memory pressure — catches the next OOM
# cascade BEFORE multus is OOM-killed cluster-wide. The 2026-05-10
# outage (21h) hit because no alert fired on the rising multus working
# set — only downstream blackbox / Traefik / service alerts. With
# 1Gi limit (bluejay-infra@eb8693e), 80% = ~800MiB; steady-state
# runs ~150-250MiB so this only fires when an avalanche starts.
- alert: MultusMemoryPressure
expr: |
container_memory_working_set_bytes{container="kube-multus"}
/ container_spec_memory_limit_bytes{container="kube-multus"} > 0.8
for: 5m
labels:
severity: critical
alert_channel: thermal_print
annotations:
summary: "kube-multus memory >80% of limit on {{ $labels.node }} for 5m"
description: "kube-multus working set is {{ $value | humanizePercentage }} of its memory limit on node {{ $labels.node }}. If this keeps climbing, multus will OOM and all new pod networking will halt cluster-wide (precedent: 2026-05-10 outage)."
# Q-MR-3 (2026-05-11): namespace pending-pod backlog — catches the
# operator-leak avalanche pattern BEFORE it cascades into a multus
# CNI OOM. Any FC operator (RemoteDesktop / Distribution / WorldBuilder)
# emitting pods without ownerReferences will accumulate them when
# the operator crashes. >25 pending pods in any namespace for 30m
# is the signal to investigate the reconciler.
- alert: NamespacePendingPodBacklog
expr: sum by (namespace) (kube_pod_status_phase{phase="Pending"}) > 25
for: 30m
labels:
severity: warning
annotations:
summary: "Namespace {{ $labels.namespace }} has {{ $value }} Pending pods for 30m"
description: "Pending pod count in {{ $labels.namespace }} exceeds 25 sustained for 30m. Likely operator-leak avalanche pattern — children emitted without ownerReferences. Risk of multus CNI OOM cascade."
# Longhorn storage health alerts. Required: longhorn scrape job
# (added 2026-04-26 — see scrape_configs above). The K8s events
# for "snapshot becomes not ready to use" are transient lifecycle
@@ -1070,72 +1024,6 @@ data:
summary: "Longhorn node {{ $labels.node }} not Ready"
description: "Node {{ $labels.node }} reports ready=false (reason: {{ $labels.condition_reason }}). Volumes scheduled to this node will be unavailable until it recovers."
# ============================================================
# FC Signage Marquee Performance — Track 3 + 8 (2026-05-06)
# Live-mirrored from FlowerCore.Notes/scripts/monitoring/alerts.yml.
# Source-of-truth for the live Podman Prometheus on noc1 is the
# Notes file; this K8s ConfigMap exists so a future migration to
# in-cluster Prometheus inherits the ruleset automatically.
# See feedback_monitoring_k8s_target_vs_live_podman.
# ============================================================
- name: fc-signage-marquee
rules:
- alert: MarqueeDroppedFramesHigh
expr: |
(
sum by (renderer, phase, node_id) (rate(marquee_dropped_frames_total[5m]))
/
sum by (renderer, phase, node_id) (rate(marquee_render_latency_ms_count[5m]))
) > 0.05
unless on()
absent_over_time(marquee_dropped_frames_total[7d])
for: 5m
labels:
severity: warning
service: signage
alert_channel: irc
annotations:
summary: "Marquee dropped-frame rate >5% on {{ $labels.renderer }}/{{ $labels.node_id }} ({{ $labels.phase }})"
description: "Renderer {{ $labels.renderer }} on {{ $labels.node_id }} drops >5% of frames during {{ $labels.phase }}. Animation visibly stuttery."
- alert: MarqueeRenderLatencyP99High
expr: |
histogram_quantile(
0.99,
sum by (renderer, phase, node_id, le) (rate(marquee_render_latency_ms_bucket[5m]))
) > 16
unless on()
absent_over_time(marquee_render_latency_ms_bucket[7d])
for: 10m
labels:
severity: warning
service: signage
alert_channel: irc
annotations:
summary: "Marquee render latency p99 > 16ms on {{ $labels.renderer }}/{{ $labels.node_id }} ({{ $labels.phase }})"
description: "Per-frame render latency p99 has exceeded the Pi-class 16ms budget for 10 minutes."
- alert: MarqueeAnimationDurationDrift
expr: |
abs(
histogram_quantile(0.5, sum by (renderer, phase, le) (rate(marquee_animation_duration_ms_bucket[15m])))
-
on (phase) group_left() avg by (phase) (marquee_animation_duration_target_ms)
)
/
on (phase) group_left() avg by (phase) (marquee_animation_duration_target_ms)
> 0.10
unless on()
absent_over_time(marquee_animation_duration_ms_bucket[7d])
for: 15m
labels:
severity: info
service: signage
alert_channel: irc
annotations:
summary: "Marquee animation duration drifting > 10% on {{ $labels.renderer }} ({{ $labels.phase }})"
description: "Median observed cycle duration deviates from target DurationMs by >10%. Could indicate browser tab throttling, GPU pressure, or phase-advancement bug."
# =============================================================================
# ConfigMap: Blackbox Exporter Configuration
# =============================================================================
@@ -3440,33 +3328,6 @@ data:
relativeTimeRange: {from: 120, to: 0}
datasourceUid: __expr__
model: {type: threshold, expression: B, conditions: [{evaluator: {params: [1], type: lt}}], refId: C}
- uid: linux-runner-offline
title: LinuxRunnerOffline
condition: C
for: 10m
noDataState: Alerting
execErrState: OK
annotations:
summary: Linux GitHub Actions runner offline
description: "A repo-scoped fc-build-linux runner deployment has no available pod. Jobs will queue at $0 until ArgoCD/K8s returns the runner."
runbook_url: "https://gitea.iamworkin.lan/bluejay/FlowerCore.Notes/src/branch/master/docs/infrastructure/self-hosted-runner-fleet.md"
labels:
severity: warning
service: github-runner
alert_channel: thermal_print
data:
- refId: A
relativeTimeRange: {from: 600, to: 0}
datasourceUid: prometheus
model: {expr: 'min by(deployment) (kube_deployment_status_replicas_available{namespace="github-runner",deployment=~"github-runner(|-(puppet|signage|dms|telephony|print-web|chat|mysql|kiosk-linux))"})', instant: true, refId: A}
- refId: B
relativeTimeRange: {from: 600, to: 0}
datasourceUid: __expr__
model: {type: reduce, expression: A, reducer: last, refId: B}
- refId: C
relativeTimeRange: {from: 600, to: 0}
datasourceUid: __expr__
model: {type: threshold, expression: B, conditions: [{evaluator: {params: [1], type: lt}}], refId: C}
- uid: high-cpu
title: High CPU (>85%)
condition: C

View File

@@ -1,297 +0,0 @@
# =============================================================================
# Multus CNI — Meta-CNI for multi-network attachment to pods/VMs
# =============================================================================
# Purpose: enable KubeVirt VMs (and any future workload) to attach additional
# network interfaces beyond the default Calico-managed pod network. Required
# for ci1 (Windows Server 2025 KubeVirt VM) to bridge onto PROD VLAN 57.
#
# Source: upstream k8snetworkplumbingwg/multus-cni v4.2.2
# https://github.com/k8snetworkplumbingwg/multus-cni/blob/v4.2.2/deployments/multus-daemonset-thick.yml
#
# Inlined verbatim (with project header + version pin annotation) for
# reproducibility and air-gap safety. Bumping versions = edit this file +
# git push. ArgoCD picks up via the bluejay-infra ApplicationSet
# (apps/* directory generator on main).
#
# Why thick plugin (not thin):
# - Thick = daemon + thin shim binary; daemon handles NAD watch + CRD reads
# centrally so each pod's CNI ADD doesn't hit the K8s API server. Better
# for clusters with many NAD-using pods.
# - Thin = each CNI ADD process directly contacts K8s API. Simpler but
# scales worse and has more failure modes.
# - KubeVirt + multi-VM workload pattern fits thick perfectly.
#
# Cluster context (verified 2026-05-08):
# - RKE2 v1.34.5 on 3 nodes (rke2-server, rke2-agent1, rke2-agent2)
# - Calico CNI (Tigera-managed) at /etc/cni/net.d + /opt/cni/bin (default)
# - openSUSE Leap 16, kernel 6.12, containerd 2.1.5
# - host bridge for PROD VLAN 57 = `br-prod` (PUPPET HOST WORK — see Phase 1.5
# in docs/infrastructure/windows-server-build-runner-plan.md)
#
# Version pin: snapshot-thick → pinning to v4.2.2 release tag at deploy time
# would require a private mirror of the image. Upstream `snapshot-thick` tag
# is updated on every release, so for now we trust upstream + Calico's
# established pattern. Pin to a specific SHA256 once we mirror to Gitea OCI.
#
# Apply (once committed to bluejay-infra main, ApplicationSet auto-syncs):
# git add apps/multus/multus.yaml && git commit && git push origin main
# # ArgoCD `infra-multus` Application appears within 3 min via ApplicationSet
#
# Verify:
# kubectl -n kube-system get ds kube-multus-ds
# kubectl -n kube-system rollout status ds kube-multus-ds
# kubectl get crd network-attachment-definitions.k8s.cni.cncf.io
# =============================================================================
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: network-attachment-definitions.k8s.cni.cncf.io
annotations:
bluejay.iamworkin.lan/source: "k8snetworkplumbingwg/multus-cni v4.2.2"
spec:
group: k8s.cni.cncf.io
scope: Namespaced
names:
plural: network-attachment-definitions
singular: network-attachment-definition
kind: NetworkAttachmentDefinition
shortNames:
- net-attach-def
versions:
- name: v1
served: true
storage: true
schema:
openAPIV3Schema:
description: 'NetworkAttachmentDefinition is a CRD schema specified by the Network Plumbing
Working Group to express the intent for attaching pods to one or more logical or physical
networks. More information available at: https://github.com/k8snetworkplumbingwg/multi-net-spec'
type: object
properties:
apiVersion:
type: string
kind:
type: string
metadata:
type: object
spec:
description: 'NetworkAttachmentDefinition spec defines the desired state of a network attachment'
type: object
properties:
config:
description: 'NetworkAttachmentDefinition config is a JSON-formatted CNI configuration'
type: string
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: multus
rules:
- apiGroups: ["k8s.cni.cncf.io"]
resources:
- '*'
verbs:
- '*'
- apiGroups:
- ""
resources:
- pods
- pods/status
verbs:
- get
- list
- update
- watch
- apiGroups:
- ""
- events.k8s.io
resources:
- events
verbs:
- create
- patch
- update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: multus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: multus
subjects:
- kind: ServiceAccount
name: multus
namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: multus
namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
name: multus-daemon-config
namespace: kube-system
labels:
tier: node
app: multus
data:
daemon-config.json: |
{
"chrootDir": "/hostroot",
"cniVersion": "0.3.1",
"logLevel": "verbose",
"logToStderr": true,
"cniConfigDir": "/host/etc/cni/net.d",
"multusAutoconfigDir": "/host/etc/cni/net.d",
"multusConfigFile": "auto",
"socketDir": "/host/run/multus/"
}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-multus-ds
namespace: kube-system
labels:
tier: node
app: multus
name: multus
spec:
selector:
matchLabels:
name: multus
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
tier: node
app: multus
name: multus
spec:
hostNetwork: true
hostPID: true
tolerations:
- operator: Exists
effect: NoSchedule
- operator: Exists
effect: NoExecute
serviceAccountName: multus
containers:
- name: kube-multus
image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
command: [ "/usr/src/multus-cni/bin/multus-daemon" ]
# 2026-05-11: upstream default of 50Mi memory limit OOM-cascades when
# an operator-owned namespace accumulates >100 pending pods retrying
# CNI ADD. RemoteDesktop emitted 219 orphan rd-browser-only pods
# (missing OwnerReferences), kubelet's CNI ADD avalanche pushed multus
# over 50Mi, OOMKilled, restarted with even bigger backlog → loop.
# 21h cluster outage. See FlowerCore.Notes:
# feedback_multus_50mi_limit_oom_orphan_pod_avalanche.md
# 1Gi limit / 512Mi request comfortably handles a 200+ pod CNI
# catchup burst on 64GB nodes (nodes are <25% used in steady-state).
# Drop back toward 256Mi only after MultusMemoryPressure alert
# proves steady-state working set sits well below 200Mi.
resources:
requests:
cpu: "100m"
memory: "512Mi"
limits:
cpu: "100m"
memory: "1Gi"
securityContext:
privileged: true
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- name: cni
mountPath: /host/etc/cni/net.d
# multus-daemon expects that cnibin path must be identical between pod and container host.
# e.g. if the cni bin is in '/opt/cni/bin' on the container host side, then it should be mount to '/opt/cni/bin' in multus-daemon,
# not to any other directory, like '/opt/bin' or '/usr/bin'.
- name: cnibin
mountPath: /opt/cni/bin
- name: host-run
mountPath: /host/run
- name: host-var-lib-cni-multus
mountPath: /var/lib/cni/multus
- name: host-var-lib-kubelet
mountPath: /var/lib/kubelet
mountPropagation: HostToContainer
- name: host-run-k8s-cni-cncf-io
mountPath: /run/k8s.cni.cncf.io
- name: host-run-netns
mountPath: /run/netns
mountPropagation: HostToContainer
- name: multus-daemon-config
mountPath: /etc/cni/net.d/multus.d
readOnly: true
- name: hostroot
mountPath: /hostroot
mountPropagation: HostToContainer
- mountPath: /etc/cni/multus/net.d
name: multus-conf-dir
env:
- name: MULTUS_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
initContainers:
- name: install-multus-binary
image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
command:
- "sh"
- "-c"
- "cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim && cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru"
resources:
requests:
cpu: "10m"
memory: "15Mi"
securityContext:
privileged: true
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- name: cnibin
mountPath: /host/opt/cni/bin
mountPropagation: Bidirectional
terminationGracePeriodSeconds: 10
volumes:
- name: cni
hostPath:
path: /etc/cni/net.d
- name: cnibin
hostPath:
path: /opt/cni/bin
- name: hostroot
hostPath:
path: /
- name: multus-daemon-config
configMap:
name: multus-daemon-config
items:
- key: daemon-config.json
path: daemon-config.json
- name: host-run
hostPath:
path: /run
- name: host-var-lib-cni-multus
hostPath:
path: /var/lib/cni/multus
- name: host-var-lib-kubelet
hostPath:
path: /var/lib/kubelet
- name: host-run-k8s-cni-cncf-io
hostPath:
path: /run/k8s.cni.cncf.io
- name: host-run-netns
hostPath:
path: /run/netns/
- name: multus-conf-dir
hostPath:
path: /etc/cni/multus/net.d

View File

@@ -1,210 +0,0 @@
# Selenium Grid NetworkPolicy.
#
# Captured into bluejay-infra 2026-05-07 during the regroup audit. This
# NetworkPolicy was previously applied via `kubectl apply` directly to
# the cluster with no source-of-truth anywhere — a fresh cluster rebuild
# would have lost all of it (including the Selenium Grid → Traefik VIP
# allow rule for AAT runs against `*.iamworkin.lan` services).
#
# The Selenium Grid Deployment + Services themselves are still managed
# outside ArgoCD (deployed via raw kubectl from the original Selenium
# Grid bring-up). Migrating those into bluejay-infra is a separate lane —
# this commit only restores GitOps repeatability for the NetworkPolicy.
#
# Rules captured from the live cluster's `kubectl get netpol -n selenium
# selenium-netpol -o yaml` on 2026-05-07. Originally applied 2026-03-15
# (from `metadata.creationTimestamp` before the field was stripped).
#
# Allows:
# - Egress: CoreDNS, intra-namespace pod-to-pod (4442/4443/4444/5555),
# Traefik VIP for `*.iamworkin.lan` AAT runs, all FC namespaces on
# standard FC service ports (5100/5200/5300/5400/8080), pod CIDR
# (10.42.0.0/16) + service CIDR (10.43.0.0/16) for the same ports,
# LAN gateway range (10.0.56.0/24) for HTTPS, edge2 CUPS print
# (10.0.57.16:5200), public internet 80/443 (excluding RFC1918), and
# fc-signage:5190 for the signage AAT lane.
# - Ingress: Traefik (4444 + 8089 ACME-solver-style), intra-pod,
# telephony / gitea / fc-system / fc-signage namespaces on 4444.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: selenium-netpol
namespace: selenium
labels:
app.kubernetes.io/part-of: selenium
app.kubernetes.io/component: isolation
spec:
egress:
- ports:
- port: 53
protocol: UDP
- port: 53
protocol: TCP
to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
- ports:
- port: 4442
protocol: TCP
- port: 4443
protocol: TCP
- port: 4444
protocol: TCP
- port: 5555
protocol: TCP
to:
- podSelector: {}
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
to:
- ipBlock:
cidr: 10.0.56.200/32
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
- port: 5200
protocol: TCP
- port: 5300
protocol: TCP
- port: 5400
protocol: TCP
- port: 5100
protocol: TCP
- port: 8080
protocol: TCP
to:
- namespaceSelector: {}
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
- port: 8443
protocol: TCP
- port: 8080
protocol: TCP
- port: 5200
protocol: TCP
- port: 5300
protocol: TCP
- port: 5400
protocol: TCP
- port: 5100
protocol: TCP
to:
- ipBlock:
cidr: 10.43.0.0/16
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
- port: 8443
protocol: TCP
- port: 8080
protocol: TCP
- port: 5200
protocol: TCP
- port: 5300
protocol: TCP
- port: 5400
protocol: TCP
- port: 5100
protocol: TCP
to:
- ipBlock:
cidr: 10.42.0.0/16
- ports:
- port: 443
protocol: TCP
- port: 80
protocol: TCP
- port: 8443
protocol: TCP
to:
- ipBlock:
cidr: 10.0.56.0/24
- ports:
- port: 5200
protocol: TCP
to:
- ipBlock:
cidr: 10.0.57.16/32
- ports:
- port: 80
protocol: TCP
- port: 443
protocol: TCP
to:
- ipBlock:
cidr: 0.0.0.0/0
except:
- 172.16.0.0/12
- 192.168.0.0/16
- ports:
- port: 5190
protocol: TCP
to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: fc-signage
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: traefik-system
ports:
- port: 4444
protocol: TCP
- port: 8089
protocol: TCP
- from:
- podSelector: {}
ports:
- port: 4442
protocol: TCP
- port: 4443
protocol: TCP
- port: 4444
protocol: TCP
- port: 5555
protocol: TCP
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: telephony
ports:
- port: 4444
protocol: TCP
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: gitea
ports:
- port: 4444
protocol: TCP
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: fc-system
ports:
- port: 4444
protocol: TCP
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: fc-signage
ports:
- port: 4444
protocol: TCP
podSelector: {}
policyTypes:
- Ingress
- Egress

View File

@@ -127,13 +127,10 @@ spec:
initContainers:
- name: fix-data-perms
image: busybox:latest
# Must run as root to chown the hostPath /tmp/tts-audio that may be
# root-owned after node reboot. Pod-level runAsNonRoot:true would
# otherwise inherit and chown would fail with EPERM (see Notes memory
# feedback_hostpath_initcontainer_chown_perms).
securityContext:
runAsUser: 0
runAsNonRoot: false
# Also chown /shared-tts (hostPath /tmp/tts-audio) so the non-root
# app user (uid 1654) can write Piper .sln16 files that Asterisk
# reads at /var/lib/asterisk/sounds/tts. World-readable (755) is
# fine — Asterisk runs as a different uid in the other pod.
command: ["sh", "-c", "chown -R 1654:1654 /data && chown 1654:1654 /shared-tts && chmod 0755 /shared-tts"]
volumeMounts:
- name: telephony-data

View File

@@ -1,60 +0,0 @@
# FlowerCore.WorldBuilder
ArgoCD-managed manifest for FlowerCore.WorldBuilder.Web — comic / storyboard
authoring service that drives ComfyUI for panel image generation and
QuestPDF for letter / A4 export.
Source: `D:\git\FlowerCore\FlowerCore.WorldBuilder` (master)
## Deployment order
1. **DNS preflight**`worldbuilder.iamworkin.lan -> 10.0.56.200` MUST exist
in pfSense Unbound before this manifest is applied, or cert-manager
HTTP-01 silently exponential-backs-off ~2h.
Memory: `feedback_pfsense_dns_required_for_acme`.
2. **Image import to ALL RKE2 nodes** — pod can schedule to any of
`rke2-server` (10.0.56.11), `rke2-agent1` (10.0.56.12),
`rke2-agent2` (10.0.56.13). Build with:
```bash
bash deploy/build.sh # in FlowerCore.WorldBuilder repo
podman save localhost/fc-worldbuilder:v<TAG> -o /tmp/fc-worldbuilder-v<TAG>.tar
for h in 10.0.56.11 10.0.56.12 10.0.56.13; do
scp /tmp/fc-worldbuilder-v<TAG>.tar fcadmin@$h:/tmp/
ssh fcadmin@$h \
"sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock \
-n k8s.io images import /tmp/fc-worldbuilder-v<TAG>.tar"
done
```
Memory: `feedback_rke2_image_import_per_node_scp`.
3. **Bump image tag** in `worldbuilder.yaml` and git push.
ArgoCD ApplicationSet picks up within ~3 minutes.
4. **First production render** — open `https://worldbuilder.iamworkin.lan`,
create World → Character → Storyboard → ExportJob, confirm artifact
downloads. ComfyUI lives on BLUEJAY-WS at `http://10.0.56.20:8188`.
## Health probes
- `startupProbe` + `readinessProbe`: `httpGet /healthz` (registered explicitly
in Program.cs — anonymous, no DB or OpenAPI dependency).
- `livenessProbe`: `tcpSocket` as a cheap fallback.
Memory: `feedback_k8s_probes_must_not_hit_openapi`,
`feedback_k8s_probes_behind_auth_middleware`.
## Storage
- Longhorn RWO PVC `worldbuilder-data` (5Gi) mounted at `/data`. SQLite DB
lives at `/data/worldbuilder.db`, generated images under `/data/gallery/`,
PDF/PNG exports under `/data/exports/`.
- DataProtection keys persist to the same SQLite via
`AddFlowerCoreDataProtection<WorldBuilderDbContext>` — explicit migration
`20260429133417_Initial` already creates `fc_dp_keys`.
Memory: `feedback_dataprotection_keys_persist_to_app_dbcontext`,
`feedback_intranet_dataprotection_table_must_have_explicit_migration`.
## Image generation backend
`FlowerCore:WorldBuilder:ImageGeneration:BaseUrl=http://10.0.56.20:8188` —
ComfyUI runs on BLUEJAY-WS Windows (R9700 / gfx1201 / ROCm 7.2.1). Pod reaches
the workstation directly across the 10.0.56.0/24 VLAN (no Podman-style host-
filter issues — K8s pods route via Calico, which is L3-routed across the
VLAN).

View File

@@ -1,213 +0,0 @@
# FlowerCore.WorldBuilder — comic / storyboard authoring service.
#
# Deployment + Service + PVC + Certificate + IngressRoute. ArgoCD-managed
# end-to-end. See apps/worldbuilder/README.md for the per-deploy runbook.
#
# Image build (BLUEJAY-WS):
# bash deploy/build.sh # in FlowerCore.WorldBuilder repo
# podman save localhost/fc-worldbuilder:v<TAG> -o /tmp/fc-worldbuilder-v<TAG>.tar
# for h in 10.0.56.11 10.0.56.12 10.0.56.13; do
# scp /tmp/fc-worldbuilder-v<TAG>.tar fcadmin@$h:/tmp/
# ssh fcadmin@$h "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-worldbuilder-v<TAG>.tar"
# done
---
apiVersion: v1
kind: Namespace
metadata:
name: fc-worldbuilder
labels:
app.kubernetes.io/part-of: flowercore
---
# SQLite DB + generated image gallery + PDF/PNG exports.
# Longhorn RWO — single replica with `Recreate` rollout strategy keeps it safe.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: worldbuilder-data
namespace: fc-worldbuilder
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: worldbuilder-web
namespace: fc-worldbuilder
labels:
app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/part-of: flowercore
spec:
replicas: 1
revisionHistoryLimit: 3
strategy:
# RWO PVC + single replica. Recreate avoids multi-attach overlap.
type: Recreate
selector:
matchLabels:
app.kubernetes.io/name: worldbuilder-web
template:
metadata:
labels:
app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/part-of: flowercore
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics/prometheus"
spec:
securityContext:
fsGroup: 1654
fsGroupChangePolicy: OnRootMismatch
containers:
- name: web
# Bump tag for each rebuild. Initial deploy: v202605062048
image: localhost/fc-worldbuilder:v202605062048
imagePullPolicy: Never
ports:
- containerPort: 8080
name: http
env:
- name: ASPNETCORE_URLS
value: "http://+:8080"
- name: ASPNETCORE_ENVIRONMENT
value: "Production"
- name: DOTNET_RUNNING_IN_CONTAINER
value: "true"
- name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT
value: "false"
# SQLite path overrides (default appsettings uses relative paths).
- name: ConnectionStrings__DefaultConnection
value: "Data Source=/data/worldbuilder.db"
- name: FlowerCore__Database__Provider
value: "Sqlite"
- name: FlowerCore__Database__ConnectionStrings__Sqlite
value: "Data Source=/data/worldbuilder.db"
# Generated image gallery + exports persist on /data.
- name: FlowerCore__WorldBuilder__ImageStore__RootPath
value: "/data/gallery"
- name: FlowerCore__WorldBuilder__Export__RootPath
value: "/data/exports"
# ComfyUI on BLUEJAY-WS (R9700 / gfx1201 / ROCm 7.2.1).
- name: FlowerCore__WorldBuilder__ImageGeneration__BaseUrl
value: "http://10.0.56.20:8188"
- name: FlowerCore__WorldBuilder__ImageGeneration__ClientMode
value: "comfyui"
resources:
# Cluster CPU-request budget runs hot (99% on all 3 nodes at deploy
# time) while actual CPU usage is well below capacity. Idle Blazor
# Server + SignalR + a single ComfyUI poller uses ~5m, so 25m is
# generous. Re-evaluate if active rendering/export workers ever
# push past the limit.
requests:
cpu: 25m
memory: 256Mi
limits:
cpu: 1000m
memory: 768Mi
# /healthz is registered explicitly in Program.cs (anonymous, no DB
# or OpenAPI dependency). Liveness uses tcpSocket as a cheap fallback
# in case future middleware changes accidentally gate /healthz.
# Memory: feedback_k8s_probes_must_not_hit_openapi,
# feedback_k8s_probes_behind_auth_middleware.
startupProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 30
readinessProbe:
httpGet:
path: /healthz
port: 8080
periodSeconds: 10
failureThreshold: 3
livenessProbe:
tcpSocket:
port: 8080
initialDelaySeconds: 30
periodSeconds: 30
failureThreshold: 3
securityContext:
runAsNonRoot: true
runAsUser: 1654
runAsGroup: 1654
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumeMounts:
- name: data
mountPath: /data
- name: tmp
mountPath: /tmp
- name: logs
mountPath: /app/logs
volumes:
- name: data
persistentVolumeClaim:
claimName: worldbuilder-data
- name: tmp
emptyDir: {}
- name: logs
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: worldbuilder-web
namespace: fc-worldbuilder
labels:
app.kubernetes.io/name: worldbuilder-web
app.kubernetes.io/part-of: flowercore
spec:
type: ClusterIP
selector:
app.kubernetes.io/name: worldbuilder-web
ports:
- name: http
port: 80
targetPort: 8080
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: worldbuilder-web-tls
namespace: fc-worldbuilder
spec:
secretName: worldbuilder-web-tls
issuerRef:
name: step-ca-acme
kind: ClusterIssuer
dnsNames:
- worldbuilder.iamworkin.lan
# step-ca ACME provisioner caps lifetime at 30d. Requesting 90d
# silently capped to 30d, making renewBefore 720h (30d) equal to the
# actual cert lifetime — triggered a perpetual renewal loop that
# generated 2365+ CertificateRequest objects in 18h. Match the working
# 720h/240h pattern used by every other FC service cert.
duration: 720h # 30d (step-ca cap)
renewBefore: 240h # 10d
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: worldbuilder-web
namespace: fc-worldbuilder
spec:
entryPoints:
- websecure
routes:
- match: Host(`worldbuilder.iamworkin.lan`)
kind: Rule
services:
- name: worldbuilder-web
port: 80
tls:
secretName: worldbuilder-web-tls

View File

@@ -305,17 +305,15 @@ spec:
path: /
port: 8080
initialDelaySeconds: 60
timeoutSeconds: 15
timeoutSeconds: 5
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /
port: 8080
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 15
failureThreshold: 3
timeoutSeconds: 5
---
apiVersion: v1
kind: Service

View File

@@ -1,24 +0,0 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<IsPackable>false</IsPackable>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="coverlet.collector" Version="6.0.2">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
<PackageReference Include="FluentAssertions" Version="6.12.1" />
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />
<PackageReference Include="xunit" Version="2.9.2" />
<PackageReference Include="xunit.runner.visualstudio" Version="2.8.2">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
<PackageReference Include="YamlDotNet" Version="16.2.0" />
</ItemGroup>
</Project>

View File

@@ -1,746 +0,0 @@
using FluentAssertions;
using System.Text.RegularExpressions;
using Xunit;
using YamlDotNet.Core;
using YamlDotNet.RepresentationModel;
namespace BluejayInfraLint.Tests;
[Trait("Category", "Unit")]
public sealed class FleetManifestLintTests
{
private static readonly ManifestInventory Inventory = ManifestInventory.Load();
private static readonly HashSet<string> PublicReadOnlyHosts = new(StringComparer.Ordinal)
{
"dist.flowercore.io",
"dns.iamworkin.lan",
};
// Public hosts that allow a tightly bounded write surface in addition to
// GET/HEAD. updatecenter.iamworkin.lan accepts POST /api/v1/checkin/{id}
// (bootstrap-JWT) so its allowlist is GET||HEAD||POST||OPTIONS — but
// PUT/PATCH/DELETE must still 404 at the route. Anything wider than this
// set should fail this lint.
//
// PUB-1 (2026-05-06): update.flowercore.io / updates.flowercore.io were
// added for the Cloudflare-proxied public Update Center edge. They use the
// same bounded read-write allowlist as the LAN pair.
private static readonly HashSet<string> PublicReadWriteAllowlistHosts = new(StringComparer.Ordinal)
{
"updatecenter.iamworkin.lan",
"updates.iamworkin.lan",
"update.flowercore.io",
"updates.flowercore.io",
};
private static readonly HashSet<string> ApiKeyProtectedDeployments = new(StringComparer.Ordinal)
{
"messageboard-web",
"scoreboard-web",
"segmentdisplay-web",
"signalcontrol-web",
};
private static readonly HashSet<string> PublicEgressDeployments = new(StringComparer.Ordinal)
{
"asterisk",
"fc-llm-bridge",
"mysql-web",
"php-web",
"ttsreader-align",
"ttsreader-kokoro",
"ttsreader-modern",
"ttsreader-piper",
};
private static readonly IReadOnlyDictionary<string, string> TopLinuxRunnerRepos = new Dictionary<string, string>(StringComparer.Ordinal)
{
["github-runner-puppet"] = "https://github.com/astoltz/FlowerCore.Puppet",
["github-runner-signage"] = "https://github.com/astoltz/FlowerCore.Signage",
["github-runner-dms"] = "https://github.com/astoltz/FlowerCore.DMS",
["github-runner-telephony"] = "https://github.com/astoltz/FlowerCore.Telephony",
["github-runner-print-web"] = "https://github.com/astoltz/FlowerCore.Print.Web",
["github-runner-chat"] = "https://github.com/astoltz/FlowerCore.Chat",
["github-runner-mysql"] = "https://github.com/astoltz/FlowerCore.MySQL",
["github-runner-kiosk-linux"] = "https://github.com/astoltz/FlowerCore.Kiosk.Linux",
};
[Fact]
public void IngressRoutes_MustKeepServiceReferencesInTheSameNamespace()
{
var violations = Inventory.Documents
.Where(document => document.Kind == "IngressRoute")
.SelectMany(document =>
document.MappingSequence("spec", "routes")
.SelectMany(route =>
route.MappingSequence("services")
.Select(service => new
{
Document = document,
ServiceName = ManifestNodeExtensions.Scalar(service, "name"),
ServiceNamespace = ManifestNodeExtensions.Scalar(service, "namespace"),
})))
.Where(entry => !string.IsNullOrWhiteSpace(entry.ServiceNamespace))
.Where(entry => !string.Equals(entry.ServiceNamespace, entry.Document.Namespace, StringComparison.Ordinal))
.Select(entry =>
$"{entry.Document.Descriptor} references Service '{entry.ServiceName}' in namespace '{entry.ServiceNamespace}'.")
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void PublicReadOnlyIngressRoutes_MustExplicitlyAllowOnlyGetAndHead()
{
var violations = Inventory.Documents
.Where(document => document.Kind == "IngressRoute")
.SelectMany(document =>
document.MappingSequence("spec", "routes")
.Select(route => new
{
Document = document,
Match = ManifestNodeExtensions.Scalar(route, "match") ?? string.Empty,
}))
.Where(entry => PublicReadOnlyHosts.Any(host => entry.Match.Contains($"Host(`{host}`)", StringComparison.Ordinal)))
.Where(entry => !entry.Match.Contains("Method(`GET`)", StringComparison.Ordinal)
|| !entry.Match.Contains("Method(`HEAD`)", StringComparison.Ordinal))
.Select(entry => $"{entry.Document.Descriptor} is missing an explicit GET/HEAD method allowlist.")
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void PublicReadWriteIngressRoutes_MustPinGetHeadPostOptionsAllowlist()
{
// For hosts in PublicReadWriteAllowlistHosts, the route match MUST
// contain Method(`GET`), Method(`HEAD`), Method(`POST`), and
// Method(`OPTIONS`) AND MUST NOT contain Method(`PUT`),
// Method(`PATCH`), or Method(`DELETE`). This keeps the public
// allowlist invariant against regression — see Track A's
// updatecenter-web ingressroute hardening.
var violations = Inventory.Documents
.Where(document => document.Kind == "IngressRoute")
.SelectMany(document =>
document.MappingSequence("spec", "routes")
.Select(route => new
{
Document = document,
Match = ManifestNodeExtensions.Scalar(route, "match") ?? string.Empty,
}))
.Where(entry => PublicReadWriteAllowlistHosts.Any(host => entry.Match.Contains($"Host(`{host}`)", StringComparison.Ordinal)))
.SelectMany(entry =>
{
var localViolations = new List<string>();
foreach (var required in new[] { "GET", "HEAD", "POST", "OPTIONS" })
{
if (!entry.Match.Contains($"Method(`{required}`)", StringComparison.Ordinal))
{
localViolations.Add($"{entry.Document.Descriptor} is missing required Method(`{required}`).");
}
}
foreach (var forbidden in new[] { "PUT", "PATCH", "DELETE" })
{
if (entry.Match.Contains($"Method(`{forbidden}`)", StringComparison.Ordinal))
{
localViolations.Add($"{entry.Document.Descriptor} must not include Method(`{forbidden}`) on a public host.");
}
}
return localViolations;
})
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void TraefikVipNetworkPolicies_MustAllowPostDnatBackendPorts()
{
var violations = Inventory.Documents
.Where(document => document.Kind == "NetworkPolicy")
.Where(document => document.AllScalars().Any(value => value.Contains("10.0.56.200", StringComparison.Ordinal)))
.SelectMany(document =>
{
var ports = document.EgressPorts().ToHashSet(StringComparer.Ordinal);
var localViolations = new List<string>();
if (ports.Contains("443") && !ports.Contains("8443"))
{
localViolations.Add($"{document.Descriptor} allows Traefik VIP 443 without backend port 8443.");
}
if (ports.Contains("80") && !ports.Contains("8000") && !ports.Contains("8080"))
{
localViolations.Add($"{document.Descriptor} allows Traefik VIP 80 without a backend HTTP port (8000/8080).");
}
return localViolations;
})
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void ApiKeyProtectedDeployments_MustUseTcpSocketHealthProbes()
{
var violations = Inventory.Documents
.Where(document => document.Kind == "Deployment")
.Where(document => ApiKeyProtectedDeployments.Contains(document.Name))
.SelectMany(document => document.ContainerMappings().SelectMany(container =>
ProbeViolations(document, container, "readinessProbe")
.Concat(ProbeViolations(document, container, "livenessProbe"))))
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void GitHubRunnerFleet_MustRegisterTopLinuxReposAsRepoScopedDeployments()
{
var deployments = Inventory.Documents
.Where(document => document.Kind == "Deployment")
.Where(document => document.Namespace == "github-runner")
.ToDictionary(document => document.Name, StringComparer.Ordinal);
foreach (var expectedRunner in TopLinuxRunnerRepos)
{
deployments.Should().ContainKey(expectedRunner.Key);
var container = deployments[expectedRunner.Key].ContainerMappings().Should().ContainSingle().Subject;
EnvValue(container, "REPO_URL").Should().Be(expectedRunner.Value);
EnvValue(container, "EPHEMERAL").Should().Be("true");
EnvValue(container, "LABELS").Should().Be("self-hosted,linux,fc-build-linux");
EnvValue(container, "ACCESS_TOKEN").Should().BeNull("ACCESS_TOKEN must come from github-runner-token Secret, not a literal");
EnvSecretName(container, "ACCESS_TOKEN").Should().Be("github-runner-token");
EnvSecretKey(container, "ACCESS_TOKEN").Should().Be("credential");
}
}
[Fact]
public void GitHubRunnerFleet_MustPreserveExistingCommonRunnerShape()
{
var common = Inventory.Documents
.Single(document => document.Kind == "Deployment"
&& document.Namespace == "github-runner"
&& document.Name == "github-runner");
var container = common.ContainerMappings().Should().ContainSingle().Subject;
EnvValue(container, "REPO_URL").Should().Be("https://github.com/astoltz/FlowerCore.Common");
EnvValue(container, "RUNNER_NAME_PREFIX").Should().Be("rke2-linux");
EnvValue(container, "LABELS").Should().Be("self-hosted,linux,fc-build-linux");
var claimNames = common.MappingSequence("spec", "template", "spec", "volumes")
.Select(volume => ManifestNodeExtensions.Scalar(volume, "persistentVolumeClaim", "claimName"))
.Where(value => !string.IsNullOrWhiteSpace(value))
.ToList();
claimNames.Should().Contain("github-runner-nuget-cache");
}
[Fact]
public void GitHubRunnerFleet_MustUseOneRwoCachePerRepoScopedDeployment()
{
var pvcNames = Inventory.Documents
.Where(document => document.Kind == "PersistentVolumeClaim")
.Where(document => document.Namespace == "github-runner")
.Select(document => document.Name)
.ToHashSet(StringComparer.Ordinal);
foreach (var deploymentName in TopLinuxRunnerRepos.Keys)
{
var suffix = deploymentName["github-runner-".Length..];
pvcNames.Should().Contain($"github-runner-{suffix}-nuget-cache");
}
}
[Fact]
public void Monitoring_MustAlertWhenTopLinuxRunnerDeploymentIsUnavailable()
{
var monitoring = File.ReadAllText(Path.Combine(Inventory.BluejayRoot, "apps", "monitoring", "noc-monitoring.yaml"));
monitoring.Should().Contain("LinuxRunnerOffline");
monitoring.Should().Contain("kube_deployment_status_replicas_available{namespace=\"github-runner\"");
monitoring.Should().Contain("github-runner(|-(puppet|signage|dms|telephony|print-web|chat|mysql|kiosk-linux))");
monitoring.Should().Contain("runbook_url: \"https://gitea.iamworkin.lan/bluejay/FlowerCore.Notes/src/branch/master/docs/infrastructure/self-hosted-runner-fleet.md\"");
}
[Fact]
public void StatefulSets_WithVolumeClaimTemplates_MustDeclareFilesystemDefaults()
{
var violations = Inventory.Documents
.Where(document => document.Kind == "StatefulSet")
.Where(document => document.MappingSequence("spec", "volumeClaimTemplates").Count > 0)
.SelectMany(document =>
{
var localViolations = new List<string>();
if (string.IsNullOrWhiteSpace(document.Scalar("spec", "podManagementPolicy")))
{
localViolations.Add($"{document.Descriptor} is missing spec.podManagementPolicy.");
}
if (string.IsNullOrWhiteSpace(document.Scalar("spec", "revisionHistoryLimit")))
{
localViolations.Add($"{document.Descriptor} is missing spec.revisionHistoryLimit.");
}
foreach (var claimTemplate in document.MappingSequence("spec", "volumeClaimTemplates"))
{
if (!string.Equals(
ManifestNodeExtensions.Scalar(claimTemplate, "spec", "volumeMode"),
"Filesystem",
StringComparison.Ordinal))
{
var claimName = ManifestNodeExtensions.Scalar(claimTemplate, "metadata", "name") ?? "<unnamed>";
localViolations.Add($"{document.Descriptor} volumeClaimTemplate '{claimName}' is missing volumeMode: Filesystem.");
}
}
return localViolations;
})
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void LocallyImportedImages_MustUseLocalhostPrefixAndNeverPullPolicy()
{
var violations = Inventory.Documents
.Where(document => document.PodSpec() is not null)
.SelectMany(document => document.ContainerSpecs()
.Where(container => !string.IsNullOrWhiteSpace(container.Image))
.Select(container => new
{
Document = document,
Container = container,
}))
.Where(entry =>
(entry.Container.Image.StartsWith("localhost/", StringComparison.Ordinal)
&& !string.Equals(entry.Container.ImagePullPolicy, "Never", StringComparison.Ordinal))
|| (entry.Container.Image.StartsWith("fc-", StringComparison.Ordinal)
&& !entry.Container.Image.Contains('/', StringComparison.Ordinal)))
.Select(entry =>
{
if (entry.Container.Image.StartsWith("localhost/", StringComparison.Ordinal))
{
return $"{entry.Document.Descriptor} container '{entry.Container.Name}' uses {entry.Container.Image} without imagePullPolicy: Never.";
}
return $"{entry.Document.Descriptor} container '{entry.Container.Name}' uses non-local image '{entry.Container.Image}' for a node-imported FlowerCore workload.";
})
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void PublicEgressDeployments_MustOptOutOfIamworkinLanSearchSuffixes()
{
var violations = Inventory.Documents
.Where(document => document.PodSpec() is not null)
.Where(document => PublicEgressDeployments.Contains(document.Name))
.SelectMany(document =>
{
var localViolations = new List<string>();
var podSpec = document.PodSpec()!;
var dnsPolicy = ManifestNodeExtensions.Scalar(podSpec, "dnsPolicy");
var searches = ManifestNodeExtensions.ScalarSequence(podSpec, "dnsConfig", "searches").ToList();
if (!string.Equals(dnsPolicy, "None", StringComparison.Ordinal))
{
localViolations.Add($"{document.Descriptor} is missing dnsPolicy: None.");
}
if (searches.Count == 0)
{
localViolations.Add($"{document.Descriptor} is missing dnsConfig.searches.");
}
else if (searches.Any(search => search.Contains("iamworkin.lan", StringComparison.OrdinalIgnoreCase)))
{
localViolations.Add($"{document.Descriptor} still includes iamworkin.lan in dnsConfig.searches.");
}
return localViolations;
})
.ToList();
violations.Should().BeEmpty();
}
private static IEnumerable<string> ProbeViolations(
ManifestDocument document,
YamlMappingNode container,
string probeKey)
{
if (!ManifestNodeExtensions.TryGetMapping(container, probeKey, out var probe)
|| !ManifestNodeExtensions.TryGetMapping(probe, "httpGet", out var httpGet))
{
return Array.Empty<string>();
}
var path = ManifestNodeExtensions.Scalar(httpGet, "path");
if (!string.Equals(path, "/health", StringComparison.Ordinal))
{
return Array.Empty<string>();
}
var containerName = ManifestNodeExtensions.Scalar(container, "name") ?? "<unnamed>";
return new[]
{
$"{document.Descriptor} container '{containerName}' still uses {probeKey}.httpGet on /health.",
};
}
private static string? EnvValue(YamlMappingNode container, string name)
{
return EnvMapping(container, name) is { } env ? ManifestNodeExtensions.Scalar(env, "value") : null;
}
private static string? EnvSecretName(YamlMappingNode container, string name)
{
return EnvMapping(container, name) is { } env
? ManifestNodeExtensions.Scalar(env, "valueFrom", "secretKeyRef", "name")
: null;
}
private static string? EnvSecretKey(YamlMappingNode container, string name)
{
return EnvMapping(container, name) is { } env
? ManifestNodeExtensions.Scalar(env, "valueFrom", "secretKeyRef", "key")
: null;
}
private static YamlMappingNode? EnvMapping(YamlMappingNode container, string name)
{
return ManifestNodeExtensions.MappingSequence(container, "env")
.SingleOrDefault(env => string.Equals(ManifestNodeExtensions.Scalar(env, "name"), name, StringComparison.Ordinal));
}
}
internal sealed class ManifestInventory
{
private ManifestInventory(string workspaceRoot, string bluejayRoot, IReadOnlyList<ManifestDocument> documents)
{
WorkspaceRoot = workspaceRoot;
BluejayRoot = bluejayRoot;
Documents = documents;
}
public string WorkspaceRoot { get; }
public string BluejayRoot { get; }
public IReadOnlyList<ManifestDocument> Documents { get; }
public static ManifestInventory Load()
{
var bluejayRoot = FindBluejayInfraRoot();
var workspaceRoot = Directory.GetParent(bluejayRoot)?.FullName
?? throw new DirectoryNotFoundException($"Could not resolve workspace root from '{bluejayRoot}'.");
var documents = ManifestRoots(workspaceRoot, bluejayRoot)
.SelectMany(LoadDocumentsFromRoot)
.ToList();
return new ManifestInventory(workspaceRoot, bluejayRoot, documents);
}
private static string FindBluejayInfraRoot()
{
var current = new DirectoryInfo(AppContext.BaseDirectory);
while (current is not null)
{
if (Directory.Exists(Path.Combine(current.FullName, "apps"))
&& File.Exists(Path.Combine(current.FullName, "README.md")))
{
return current.FullName;
}
current = current.Parent;
}
throw new DirectoryNotFoundException("Could not find the bluejay-infra repository root from the test output directory.");
}
private static IEnumerable<string> ManifestRoots(string workspaceRoot, string bluejayRoot)
{
var roots = new[]
{
Path.Combine(bluejayRoot, "apps"),
Path.Combine(workspaceRoot, "FlowerCore.Chat", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.DMS", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.DNS", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Intranet.Web", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Kiosk", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Media", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.MenuBoard", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.MessageBoard", "k8s"),
// FlowerCore.Notes/k8s/selenium/ is the live Selenium Grid
// manifest tree (consumed by deploy-selenium scripts).
// FlowerCore.Notes/k8s/guacamole/ + FlowerCore.Notes/k8s/monitoring/
// are historical scaffolds that have diverged from the live state
// (bluejay-infra/apps/guacamole + bluejay-infra/apps/monitoring are
// canonical). Operator review is required before bringing them in
// line OR decommissioning them — keep them out of the lint scope
// until that decision lands. See xxl-regroup-2026-05-03-followup.md
// "Codex 7 §0 stop conditions" + the C7 close-session output.
Path.Combine(workspaceRoot, "FlowerCore.Notes", "k8s", "selenium"),
Path.Combine(workspaceRoot, "FlowerCore.MySQL", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.PHP", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Presentations", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Print.Web", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.RemoteDesktop", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Scoreboard", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.SegmentDisplay", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.SignalControl", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.TtsReader", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Updater", "k8s"),
};
return roots.Where(Directory.Exists);
}
private static IEnumerable<ManifestDocument> LoadDocumentsFromRoot(string root)
{
foreach (var filePath in Directory.EnumerateFiles(root, "*.yaml", SearchOption.AllDirectories))
{
var fileText = File.ReadAllText(filePath);
var segments = SplitManifestDocuments(fileText);
for (var index = 0; index < segments.Count; index++)
{
var yaml = new YamlStream();
try
{
using var reader = new StringReader(segments[index]);
yaml.Load(reader);
}
catch (YamlException exception)
{
_ = exception;
continue;
}
if (yaml.Documents.Count == 0)
{
continue;
}
if (yaml.Documents[0].RootNode is YamlMappingNode mapping
&& ManifestNodeExtensions.Scalar(mapping, "kind") is not null)
{
yield return new ManifestDocument(root, filePath, index, fileText, mapping);
}
}
}
}
private static IReadOnlyList<string> SplitManifestDocuments(string fileText)
{
var documents = new List<string>();
var currentLines = new List<string>();
var seenApiVersion = false;
foreach (var line in Regex.Split(fileText, @"\r?\n"))
{
if (Regex.IsMatch(line, @"^\s*---\s*$"))
{
FlushCurrentDocument();
continue;
}
if (Regex.IsMatch(line, @"^\s*apiVersion:\s*")
&& seenApiVersion
&& currentLines.Any(existing => !string.IsNullOrWhiteSpace(existing)))
{
FlushCurrentDocument();
}
currentLines.Add(line);
if (Regex.IsMatch(line, @"^\s*apiVersion:\s*"))
{
seenApiVersion = true;
}
}
FlushCurrentDocument();
return documents;
void FlushCurrentDocument()
{
var text = string.Join(Environment.NewLine, currentLines).Trim();
if (!string.IsNullOrWhiteSpace(text))
{
documents.Add(text);
}
currentLines.Clear();
seenApiVersion = false;
}
}
}
internal sealed record ManifestDocument(
string RootPath,
string FilePath,
int DocumentIndex,
string FileText,
YamlMappingNode Root)
{
public string Kind => Scalar("kind") ?? string.Empty;
public string Name => Scalar("metadata", "name") ?? $"document-{DocumentIndex}";
public string Namespace => Scalar("metadata", "namespace") ?? string.Empty;
public string RelativePath => Path.GetRelativePath(RootPath, FilePath).Replace('\\', '/');
public string Descriptor => $"{Kind} {Namespace}/{Name} [{RelativePath}#{DocumentIndex + 1}]";
public string? Scalar(params string[] path) => ManifestNodeExtensions.Scalar(Root, path);
public IReadOnlyList<YamlMappingNode> MappingSequence(params string[] path) => ManifestNodeExtensions.MappingSequence(Root, path);
public IEnumerable<string> AllScalars() => ManifestNodeExtensions.AllScalars(Root);
public IReadOnlyList<string> EgressPorts()
{
return MappingSequence("spec", "egress")
.SelectMany(egressRule => ManifestNodeExtensions.MappingSequence(egressRule, "ports"))
.Select(portMapping => ManifestNodeExtensions.Scalar(portMapping, "port"))
.Where(value => !string.IsNullOrWhiteSpace(value))
.Cast<string>()
.ToList();
}
public YamlMappingNode? PodSpec()
{
return Kind switch
{
"Deployment" or "StatefulSet" or "DaemonSet" or "Job" =>
ManifestNodeExtensions.Mapping(Root, "spec", "template", "spec"),
"CronJob" =>
ManifestNodeExtensions.Mapping(Root, "spec", "jobTemplate", "spec", "template", "spec"),
_ => null,
};
}
public IReadOnlyList<YamlMappingNode> ContainerMappings()
{
var podSpec = PodSpec();
if (podSpec is null)
{
return Array.Empty<YamlMappingNode>();
}
return ManifestNodeExtensions.MappingSequence(podSpec, "containers")
.Concat(ManifestNodeExtensions.MappingSequence(podSpec, "initContainers"))
.ToList();
}
public IReadOnlyList<ContainerSpec> ContainerSpecs()
{
return ContainerMappings()
.Select(container => new ContainerSpec(
ManifestNodeExtensions.Scalar(container, "name") ?? "<unnamed>",
ManifestNodeExtensions.Scalar(container, "image") ?? string.Empty,
ManifestNodeExtensions.Scalar(container, "imagePullPolicy") ?? string.Empty))
.ToList();
}
}
internal sealed record ContainerSpec(string Name, string Image, string ImagePullPolicy);
internal static class ManifestNodeExtensions
{
public static string? Scalar(this YamlMappingNode mapping, params string[] path)
{
return TryGetNode(mapping, path, out var node) && node is YamlScalarNode scalar
? scalar.Value
: null;
}
public static YamlMappingNode? Mapping(this YamlMappingNode mapping, params string[] path)
{
return TryGetNode(mapping, path, out var node) ? node as YamlMappingNode : null;
}
public static bool TryGetMapping(this YamlMappingNode mapping, string key, out YamlMappingNode result)
{
if (TryGetChild(mapping, key, out var child) && child is YamlMappingNode childMapping)
{
result = childMapping;
return true;
}
result = null!;
return false;
}
public static IReadOnlyList<YamlMappingNode> MappingSequence(this YamlMappingNode mapping, params string[] path)
{
return TryGetNode(mapping, path, out var node) && node is YamlSequenceNode sequence
? sequence.Children.OfType<YamlMappingNode>().ToList()
: Array.Empty<YamlMappingNode>();
}
public static IReadOnlyList<string> ScalarSequence(this YamlMappingNode mapping, params string[] path)
{
return TryGetNode(mapping, path, out var node) && node is YamlSequenceNode sequence
? sequence.Children.OfType<YamlScalarNode>()
.Select(child => child.Value)
.Where(value => !string.IsNullOrWhiteSpace(value))
.Cast<string>()
.ToList()
: Array.Empty<string>();
}
public static IEnumerable<string> AllScalars(YamlNode node)
{
return node switch
{
YamlScalarNode scalar when !string.IsNullOrWhiteSpace(scalar.Value) => new[] { scalar.Value! },
YamlSequenceNode sequence => sequence.Children.SelectMany(AllScalars),
YamlMappingNode mapping => mapping.Children.SelectMany(entry => AllScalars(entry.Key).Concat(AllScalars(entry.Value))),
_ => Array.Empty<string>(),
};
}
private static bool TryGetNode(YamlMappingNode mapping, IReadOnlyList<string> path, out YamlNode node)
{
YamlNode current = mapping;
foreach (var segment in path)
{
if (current is not YamlMappingNode currentMapping || !TryGetChild(currentMapping, segment, out current))
{
node = null!;
return false;
}
}
node = current;
return true;
}
private static bool TryGetChild(YamlMappingNode mapping, string key, out YamlNode value)
{
foreach (var entry in mapping.Children)
{
if (entry.Key is YamlScalarNode scalar
&& string.Equals(scalar.Value, key, StringComparison.Ordinal))
{
value = entry.Value;
return true;
}
}
value = null!;
return false;
}
}

View File

@@ -1,266 +0,0 @@
using System.Text.Json;
using FluentAssertions;
using Xunit;
namespace BluejayInfraLint.Tests;
[Trait("Category", "Unit")]
public sealed class PiSignagePlayerArtifactTests
{
private static readonly string Root = FindRepoRoot();
private static readonly string AppRoot = Path.Combine(Root, "apps", "fc-signage-pi-player");
public static TheoryData<string> RequiredArtifacts => new()
{
"README.md",
"systemd/flowercore-signage-player-pi.service",
"systemd/flowercore-signage-player-pi-hdmi.service",
"systemd/flowercore-signage-bootstrap.service",
"systemd/flowercore-signage-renew.service",
"systemd/flowercore-signage-renew.timer",
"systemd/flowercore-signage-detect-display.service",
"systemd/flowercore-signage-detect-display.timer",
"systemd/99-flowercore-signage-hdmi.rules",
"chromium-policies/flowercore-signage.json",
"scripts/flowercore-signage-launch.sh",
"scripts/flowercore-signage-prelaunch.sh",
"scripts/flowercore-signage-bootstrap.sh",
"scripts/flowercore-signage-renew-cert.sh",
"scripts/flowercore-signage-hdmi-respond.sh",
"scripts/fc-signage-detect-display",
};
[Theory]
[MemberData(nameof(RequiredArtifacts))]
public void RequiredArtifacts_ArePresent(string relativePath)
{
File.Exists(Path.Combine(AppRoot, relativePath)).Should().BeTrue(relativePath);
}
[Fact]
public void PlayerService_UsesExpectedRestartAndMemoryGuards()
{
var unit = Read("systemd/flowercore-signage-player-pi.service");
unit.Should().Contain("Restart=always");
unit.Should().Contain("RestartSec=10s");
unit.Should().Contain("StartLimitBurst=5");
unit.Should().Contain("StartLimitIntervalSec=300s");
unit.Should().Contain("MemoryMax=2G");
}
[Fact]
public void PlayerService_IsGatedByNodeIdentityAndMtlsCertificate()
{
var unit = Read("systemd/flowercore-signage-player-pi.service");
unit.Should().Contain("ConditionPathExists=/etc/flowercore/signage-node.json");
unit.Should().Contain("ConditionPathExists=/etc/fc-signage-player/client.p12");
unit.Should().Contain("ExecStartPre=/usr/local/bin/flowercore-signage-prelaunch.sh");
}
[Fact]
public void LaunchScript_TriesEmbedThenFallsBackToBarePlayerRoute()
{
var script = Read("scripts/flowercore-signage-launch.sh");
script.Should().Contain("/player/${NODE_ID}/embed?token=${CERT_THUMB}");
script.Should().Contain("url-divergence.log");
script.Should().Contain("/player/${NODE_ID}?token=${CERT_THUMB}");
}
[Fact]
public void LaunchScript_DisablesChromiumPromptsAndRuntimeUpdates()
{
var script = Read("scripts/flowercore-signage-launch.sh");
script.Should().Contain("--noerrdialogs");
script.Should().Contain("--disable-infobars");
script.Should().Contain("--password-store=basic");
script.Should().Contain("--check-for-update-interval=2592000");
}
[Fact]
public void PrelaunchScript_AbortsWhenRequiredFilesAreMissing()
{
var script = Read("scripts/flowercore-signage-prelaunch.sh");
script.Should().Contain("for f in /etc/flowercore/signage-node.json /etc/fc-signage-player/client.p12 /etc/fc-signage-player/client.p12.pass");
script.Should().Contain("exit 1");
script.Should().Contain("-checkend $((7*24*3600))");
}
[Fact]
public void BootstrapScript_IsIdempotentWhenAlreadyEnrolled()
{
var script = Read("scripts/flowercore-signage-bootstrap.sh");
script.Should().Contain("already enrolled");
script.Should().Contain("exit 0");
script.Should().Contain(".enrolledAt");
}
[Fact]
public void BootstrapScript_GeneratesStableMachineIdFromUuid()
{
var script = Read("scripts/flowercore-signage-bootstrap.sh");
script.Should().Contain("uuidgen");
script.Should().Contain("cut -c1-16");
script.Should().Contain("machineId");
}
[Fact]
public void BootstrapScript_RetriesRegisterOnceForFirstCallRace()
{
var script = Read("scripts/flowercore-signage-bootstrap.sh");
script.Should().Contain("for attempt in 1 2");
script.Should().Contain("register attempt $attempt returned");
script.Should().Contain("sleep 5");
}
[Fact]
public void BootstrapScript_SupportsSetupCodeAndApprovalPollingBudget()
{
var script = Read("scripts/flowercore-signage-bootstrap.sh");
script.Should().Contain("signage-setup-code");
script.Should().Contain("approve-via-setup-code");
script.Should().Contain("+ 1800");
script.Should().Contain("sleep 15");
}
[Fact]
public void BootstrapScript_CsrSubjectIdentifiesPiPlayer()
{
var script = Read("scripts/flowercore-signage-bootstrap.sh");
script.Should().Contain("/CN=${NODE_ID}/O=FlowerCore/OU=SignagePlayer-Pi");
}
[Fact]
public void BootstrapScript_PersistsCertificateAsP12WithRestrictivePermissions()
{
var script = Read("scripts/flowercore-signage-bootstrap.sh");
script.Should().Contain("openssl pkcs12 -export");
script.Should().Contain("client.p12.pass");
script.Should().Contain("chmod 0600");
script.Should().Contain("chmod 0640");
}
[Fact]
public void RenewScript_OnlyRunsWhenCertHasLessThanThirtyDays()
{
var script = Read("scripts/flowercore-signage-renew-cert.sh");
script.Should().Contain("-checkend $((30*24*3600))");
script.Should().Contain("exit 0");
script.Should().Contain("/renew");
}
[Fact]
public void RenewScript_AtomicallySwapsNewCertificateFiles()
{
var script = Read("scripts/flowercore-signage-renew-cert.sh");
script.Should().Contain("client.key.new");
script.Should().Contain("mv \"$CERT_DIR/client.key.new\" \"$CERT_DIR/client.key\"");
script.Should().Contain("mv \"$CERT_DIR/client.p12.new\" \"$CERT_DIR/client.p12\"");
}
[Fact]
public void HdmiRule_RestartsPlayerAndRunsCapabilityDetection()
{
var rule = Read("systemd/99-flowercore-signage-hdmi.rules");
rule.Should().Contain("KERNEL==\"card?-HDMI-A-?\"");
rule.Should().Contain("restart flowercore-signage-player-pi.service");
rule.Should().Contain("start flowercore-signage-detect-display.service");
}
[Fact]
public void DetectDisplayServiceAndTimer_RunAtBootAndDaily()
{
var service = Read("systemd/flowercore-signage-detect-display.service");
var timer = Read("systemd/flowercore-signage-detect-display.timer");
service.Should().Contain("ExecStart=/usr/local/bin/fc-signage-detect-display");
timer.Should().Contain("OnBootSec=30s");
timer.Should().Contain("OnCalendar=daily");
timer.Should().Contain("RandomizedDelaySec=1h");
}
[Fact]
public void DetectDisplayScript_EmitsDisconnectedProfileWhenNoHdmiIsPresent()
{
var script = Read("scripts/fc-signage-detect-display");
script.Should().Contain("displayConnected: false");
script.Should().Contain("No HDMI display detected");
}
[Fact]
public void DetectDisplayScript_ParsesEdidForHdrResolutionAndAudio()
{
var script = Read("scripts/fc-signage-detect-display");
script.Should().Contain("edid-decode");
script.Should().Contain("HDR (Static|Dynamic) Metadata Block");
script.Should().Contain("maxResolution");
script.Should().Contain("hasAudioOutput");
}
[Fact]
public void DetectDisplayScript_TriesBothForwardCompatibleCapabilityEndpoints()
{
var script = Read("scripts/fc-signage-detect-display");
script.Should().Contain("/api/v1/nodes/${NODE_ID}/capabilities");
script.Should().Contain("/api/v1/displays/${NODE_ID}/capability-profile");
script.Should().Contain("no endpoint accepted the profile");
}
[Fact]
public void ChromiumPolicy_IsValidJsonAndDisablesCredentialPrompts()
{
using var doc = JsonDocument.Parse(Read("chromium-policies/flowercore-signage.json"));
var root = doc.RootElement;
root.GetProperty("AutofillAddressEnabled").GetBoolean().Should().BeFalse();
root.GetProperty("AutofillCreditCardEnabled").GetBoolean().Should().BeFalse();
root.GetProperty("PasswordManagerEnabled").GetBoolean().Should().BeFalse();
root.GetProperty("ExtensionInstallBlocklist")[0].GetString().Should().Be("*");
}
[Fact]
public void RenewalTimer_UsesDailyCadenceWithTwoHourJitter()
{
var timer = Read("systemd/flowercore-signage-renew.timer");
timer.Should().Contain("OnCalendar=daily");
timer.Should().Contain("RandomizedDelaySec=2h");
timer.Should().Contain("Persistent=true");
}
private static string Read(string relativePath)
=> File.ReadAllText(Path.Combine(AppRoot, relativePath.Replace('/', Path.DirectorySeparatorChar)));
private static string FindRepoRoot()
{
var current = new DirectoryInfo(AppContext.BaseDirectory);
while (current is not null)
{
if (Directory.Exists(Path.Combine(current.FullName, "apps"))
&& File.Exists(Path.Combine(current.FullName, "README.md")))
{
return current.FullName;
}
current = current.Parent;
}
throw new DirectoryNotFoundException("Could not find bluejay-infra root.");
}
}

View File

@@ -1,12 +0,0 @@
package bluejayinfra.cross_namespace_ingressroute
deny[msg] {
input.kind == "IngressRoute"
ns := object.get(input.metadata, "namespace", "")
route := input.spec.routes[_]
service := route.services[_]
svc_ns := object.get(service, "namespace", "")
svc_ns != ""
svc_ns != ns
msg := sprintf("IngressRoute %s/%s references Service %s in namespace %s", [ns, input.metadata.name, service.name, svc_ns])
}

View File

@@ -1,23 +0,0 @@
package bluejayinfra.public_method_allowlist
public_hosts := {"dist.flowercore.io", "dns.iamworkin.lan"}
deny[msg] {
input.kind == "IngressRoute"
route := input.spec.routes[_]
match := object.get(route, "match", "")
host := public_hosts[_]
contains(match, sprintf("Host(`%s`)", [host]))
not contains(match, "Method(`GET`)")
msg := sprintf("IngressRoute %s/%s is missing Method(GET) for public read-only host %s", [input.metadata.namespace, input.metadata.name, host])
}
deny[msg] {
input.kind == "IngressRoute"
route := input.spec.routes[_]
match := object.get(route, "match", "")
host := public_hosts[_]
contains(match, sprintf("Host(`%s`)", [host]))
not contains(match, "Method(`HEAD`)")
msg := sprintf("IngressRoute %s/%s is missing Method(HEAD) for public read-only host %s", [input.metadata.namespace, input.metadata.name, host])
}

View File

@@ -1,30 +0,0 @@
package bluejayinfra.traefik_vip_backend_ports
has_vip {
some i
some j
input.spec.egress[i].to[j].ipBlock.cidr == "10.0.56.200/32"
}
has_port(port) {
some i
some j
input.spec.egress[i].ports[j].port == port
}
deny[msg] {
input.kind == "NetworkPolicy"
has_vip
has_port(443)
not has_port(8443)
msg := sprintf("NetworkPolicy %s/%s allows 10.0.56.200:443 without backend port 8443", [input.metadata.namespace, input.metadata.name])
}
deny[msg] {
input.kind == "NetworkPolicy"
has_vip
has_port(80)
not has_port(8080)
not has_port(8000)
msg := sprintf("NetworkPolicy %s/%s allows 10.0.56.200:80 without backend HTTP port 8080 or 8000", [input.metadata.namespace, input.metadata.name])
}

View File

@@ -1,28 +0,0 @@
package bluejayinfra.auth_probe_path
protected_deployments := {
"messageboard-web",
"scoreboard-web",
"segmentdisplay-web",
"signalcontrol-web",
}
deny[msg] {
input.kind == "Deployment"
protected_deployments[input.metadata.name]
container := input.spec.template.spec.containers[_]
probe := object.get(container, "readinessProbe", {})
http_get := object.get(probe, "httpGet", {})
object.get(http_get, "path", "") == "/health"
msg := sprintf("Deployment %s/%s must not use readinessProbe.httpGet /health behind API key middleware", [input.metadata.namespace, input.metadata.name])
}
deny[msg] {
input.kind == "Deployment"
protected_deployments[input.metadata.name]
container := input.spec.template.spec.containers[_]
probe := object.get(container, "livenessProbe", {})
http_get := object.get(probe, "httpGet", {})
object.get(http_get, "path", "") == "/health"
msg := sprintf("Deployment %s/%s must not use livenessProbe.httpGet /health behind API key middleware", [input.metadata.namespace, input.metadata.name])
}

View File

@@ -1,23 +0,0 @@
package bluejayinfra.statefulset_volumeclaim_defaults
deny[msg] {
input.kind == "StatefulSet"
count(object.get(input.spec, "volumeClaimTemplates", [])) > 0
object.get(input.spec, "podManagementPolicy", "") == ""
msg := sprintf("StatefulSet %s/%s is missing spec.podManagementPolicy", [input.metadata.namespace, input.metadata.name])
}
deny[msg] {
input.kind == "StatefulSet"
count(object.get(input.spec, "volumeClaimTemplates", [])) > 0
object.get(input.spec, "revisionHistoryLimit", 0) == 0
msg := sprintf("StatefulSet %s/%s is missing spec.revisionHistoryLimit", [input.metadata.namespace, input.metadata.name])
}
deny[msg] {
input.kind == "StatefulSet"
claim := input.spec.volumeClaimTemplates[_]
object.get(claim.spec, "volumeMode", "") != "Filesystem"
claim_name := object.get(claim.metadata, "name", "<unnamed>")
msg := sprintf("StatefulSet %s/%s volumeClaimTemplate %s is missing volumeMode: Filesystem", [input.metadata.namespace, input.metadata.name, claim_name])
}

View File

@@ -1,40 +0,0 @@
package bluejayinfra.localhost_image_pull_policy
pod_spec(spec) = pod {
input.kind == "Deployment"
pod := spec.template.spec
}
pod_spec(spec) = pod {
input.kind == "StatefulSet"
pod := spec.template.spec
}
pod_spec(spec) = pod {
input.kind == "DaemonSet"
pod := spec.template.spec
}
deny[msg] {
pod := pod_spec(input.spec)
container := pod.containers[_]
startswith(object.get(container, "image", ""), "localhost/")
object.get(container, "imagePullPolicy", "") != "Never"
msg := sprintf("%s/%s container %s uses a localhost image without imagePullPolicy: Never", [input.metadata.namespace, input.metadata.name, container.name])
}
deny[msg] {
pod := pod_spec(input.spec)
container := pod.initContainers[_]
startswith(object.get(container, "image", ""), "localhost/")
object.get(container, "imagePullPolicy", "") != "Never"
msg := sprintf("%s/%s initContainer %s uses a localhost image without imagePullPolicy: Never", [input.metadata.namespace, input.metadata.name, container.name])
}
deny[msg] {
pod := pod_spec(input.spec)
container := pod.containers[_]
startswith(object.get(container, "image", ""), "fc-")
not contains(object.get(container, "image", ""), "/")
msg := sprintf("%s/%s container %s uses a non-localhost FlowerCore image reference %s", [input.metadata.namespace, input.metadata.name, container.name, container.image])
}

View File

@@ -1,27 +0,0 @@
package bluejayinfra.public_egress_dns_none
public_egress_workloads := {
"asterisk",
"fc-llm-bridge",
"mysql-web",
"php-web",
"ttsreader-align",
"ttsreader-kokoro",
"ttsreader-modern",
"ttsreader-piper",
}
deny[msg] {
input.kind == "Deployment"
public_egress_workloads[input.metadata.name]
object.get(input.spec.template.spec, "dnsPolicy", "") != "None"
msg := sprintf("Deployment %s/%s must set dnsPolicy: None for public-internet egress", [input.metadata.namespace, input.metadata.name])
}
deny[msg] {
input.kind == "Deployment"
public_egress_workloads[input.metadata.name]
search := object.get(object.get(input.spec.template.spec, "dnsConfig", {}), "searches", [])[_]
contains(lower(search), "iamworkin.lan")
msg := sprintf("Deployment %s/%s must not include iamworkin.lan in dnsConfig.searches", [input.metadata.namespace, input.metadata.name])
}

View File

@@ -1,40 +0,0 @@
package bluejayinfra.public_readwrite_allowlist
# Public hosts that allow a tightly bounded write surface in addition to
# GET/HEAD. updatecenter.iamworkin.lan accepts POST /api/v1/checkin/{id}
# (bootstrap-JWT) so its allowlist is GET||HEAD||POST||OPTIONS — but
# PUT/PATCH/DELETE must still 404 at the route. Any host in this set MUST
# include all four required methods AND MUST NOT include any forbidden
# method.
public_readwrite_hosts := {
"updatecenter.iamworkin.lan",
"updates.iamworkin.lan",
"update.flowercore.io",
"updates.flowercore.io",
}
required_methods := {"GET", "HEAD", "POST", "OPTIONS"}
forbidden_methods := {"PUT", "PATCH", "DELETE"}
deny[msg] {
input.kind == "IngressRoute"
route := input.spec.routes[_]
match := object.get(route, "match", "")
host := public_readwrite_hosts[_]
contains(match, sprintf("Host(`%s`)", [host]))
required := required_methods[_]
not contains(match, sprintf("Method(`%s`)", [required]))
msg := sprintf("IngressRoute %s/%s is missing required Method(%s) for public read-write host %s", [input.metadata.namespace, input.metadata.name, required, host])
}
deny[msg] {
input.kind == "IngressRoute"
route := input.spec.routes[_]
match := object.get(route, "match", "")
host := public_readwrite_hosts[_]
contains(match, sprintf("Host(`%s`)", [host]))
forbidden := forbidden_methods[_]
contains(match, sprintf("Method(`%s`)", [forbidden]))
msg := sprintf("IngressRoute %s/%s must not include Method(%s) on public read-write host %s", [input.metadata.namespace, input.metadata.name, forbidden, host])
}