Compare commits

9 Commits

Author SHA1 Message Date
1f1f6823db runners: right-size replica counts per 14d CI activity (#24) 2026-05-26 00:01:47 +00:00
Andrew Stoltz
b92f74b63a runners: right-size replica counts per 14d CI activity data
Drop 2 → 1 for 10 deploys based on trailing-14d run counts:
  - LlmBridge, Media, Knowledge, Intranet.Web, DNS  (0 runs each)
  - Presentations (6), Redis (3), Provisioning (3),
    MessageBoard (3), MenuBoard (3)

Bump 2 → 3 for Print.Web: 12 runs in trailing 5d, and the
help-screenshots AAT job holds a runner 30+ min, creating
head-of-line blocking for parallel PRs.

Net change: -9 replicas (≈ -9 GiB committed memory).
Aligns with Sprint 33 morning-routine capacity audit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 18:55:47 -05:00
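
The net figure in the message above can be re-derived from the per-deployment deltas. A minimal sketch: the function and variable names are illustrative, not part of the repo, and the roughly 1 GiB per replica is only what the "≈ -9 GiB" figure implies.

```bash
#!/usr/bin/env bash
# Deltas as stated in the commit message.
dropped_deploys=10   # each goes 2 -> 1
bumped_deploys=1     # Print.Web goes 2 -> 3

# Hypothetical helper: sums the replica deltas across all touched Deployments.
net_replica_change() {
  echo $(( dropped_deploys * (1 - 2) + bumped_deploys * (3 - 2) ))
}

net_replica_change   # prints -9
```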
Andrew Stoltz
cb7f7dbc4d authentik: generous startup/liveness probes for first-boot migration
The server pod was being killed by the liveness probe at 60s while still
waiting on the migration DB lock (the worker pod also runs migrations against
the same DB). Add a startupProbe with a 10.5 min budget so liveness doesn't
fire until migrations finish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 16:03:03 -05:00
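
The 10.5 min budget follows from the startupProbe settings in the manifest (initialDelaySeconds 30, periodSeconds 15, failureThreshold 40). A rough sketch of that arithmetic, using a hypothetical helper name; kubelet timing is approximate, so treat the result as an upper-bound estimate rather than an exact deadline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate time before a failing startupProbe
# causes the container to be restarted.
startup_budget_seconds() {
  local initial_delay=$1 period=$2 failure_threshold=$3
  echo $(( initial_delay + period * failure_threshold ))
}

budget=$(startup_budget_seconds 30 15 40)
printf '%ds = %dm%02ds\n' "$budget" $(( budget / 60 )) $(( budget % 60 ))
# prints: 630s = 10m30s
```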
Andrew Stoltz
03126d5584 authentik: add fsGroup:1000 to server + worker so non-root uid can write /media
PermissionError: [Errno 13] Permission denied: '/media/public' in tenant_files
migration because Authentik container runs as uid 1000 but Longhorn PVC mounts
root:root by default. fsGroup on Pod securityContext recursively chgrps the
PVC mount to gid 1000 + chmods g+rwx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 15:58:35 -05:00
Andrew Stoltz
495e884c41 authentik: initial deployment at id.iamworkin.lan
Stack:
  - PostgreSQL 16 StatefulSet (Longhorn RWO 5Gi)
  - Redis 7 Deployment (no persistence)
  - Authentik server + worker (ghcr.io/goauthentik/server:2024.12.3)
  - Shared media PVC (Longhorn RWO 2Gi) between server+worker
  - Certificate via step-ca-acme ClusterIssuer
  - Traefik IngressRoute at id.iamworkin.lan

Secrets sourced from 1Password item 'authentik-credentials' (IAmWorkin
vault, id y6i74ch22q5wvm7znquq4nhhcu) via OnePasswordItem CRD. Fields:
AUTHENTIK_SECRET_KEY, POSTGRES_PASSWORD, REDIS_PASSWORD,
BOOTSTRAP_ADMIN_PASSWORD, BOOTSTRAP_ADMIN_TOKEN, BOOTSTRAP_ADMIN_EMAIL.

DNS A record id.iamworkin.lan -> 10.0.56.200 added via
scripts/pfsense-add-id-host.py (FlowerCore.DNS service was 502'ing on
pfSense diag_command.php response parsing).

Closes the immediate gap from PiManager OIDC Cohort 3 wire-up: PiManager
(a87cd6f) configures id.iamworkin.lan as JWT authority but the backend
was never deployed. Pirelay specifically is on Mode:apikey until this
backend is bootstrapped and a pimanager service-account exists.

Post-deploy bootstrap (manual once pods Ready):
  1. Login at https://id.iamworkin.lan/if/admin/ as akadmin
     using BOOTSTRAP_ADMIN_PASSWORD from 1Password.
  2. Create OAuth2/OpenID Provider for pimanager (issuer
     https://id.iamworkin.lan/application/o/pimanager/, audience 'pimanager').
  3. Create Application binding the provider.
  4. Create service account user 'pimanager-service-account', generate
     long-lived token, store in 1Password as 'pimanager-service-account'.
  5. Re-enable jwt mode on pirelay + un-mask puppet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 15:50:10 -05:00
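
Once bootstrap steps 2 and 3 are done, the provider's OIDC discovery document is a quick way to confirm the issuer is live. A small sketch: the helper name is made up for illustration, and the curl call in the comment assumes the client trusts the step-ca root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds the per-application discovery URL from the
# issuer path named in the bootstrap steps.
oidc_discovery_url() {
  local host=$1 slug=$2
  echo "https://${host}/application/o/${slug}/.well-known/openid-configuration"
}

oidc_discovery_url id.iamworkin.lan pimanager

# To query it from a machine that trusts the step-ca root:
#   curl -fsS "$(oidc_discovery_url id.iamworkin.lan pimanager)"
```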
Andrew Stoltz
65aa1e6104 fix(monitoring): point probe-printweb at /health (Q-MR-90)
Root path requires API key auth — `/` returned 401 to the blackbox
probe, firing PrintWebDown despite `/health` reporting Healthy.
Pattern: feedback_k8s_probes_behind_auth_middleware.

Mirrors FlowerCore.Notes scripts/monitoring/prometheus.yml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 14:52:02 -05:00
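
Why the 401 fired the alert: the `http_2xx` blackbox module only counts 2xx responses as success by default. A tiny sketch of that rule, with an illustrative function name that is not taken from the repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mirrors the default "2xx means up" rule.
probe_passes_http_2xx() {
  local status=$1
  [ "$status" -ge 200 ] && [ "$status" -lt 300 ]
}

for status in 401 200; do
  if probe_passes_http_2xx "$status"; then
    echo "$status: up"
  else
    echo "$status: down"
  fi
done
# prints "401: down" then "200: up"
```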
Andrew Stoltz
7f2a3b76b4 feat(github-runner): bake Ruby 3.3 into Linux self-hosted runner image (Q-MR-81) 2026-05-20 11:45:43 -05:00
ea73f00461 fix(fc-devicemgmt): remove self-referential Application resource (Q-MR-79)
ApplicationSet already creates infra-fc-devicemgmt; removing the in-repo Application child clears the self-reference drift.
2026-05-20 16:20:01 +00:00
Andrew Stoltz
25ace30a03 fix(fc-devicemgmt): remove self-referential Application resource (Q-MR-79) 2026-05-20 11:18:25 -05:00
11 changed files with 874 additions and 354 deletions

@@ -0,0 +1,448 @@
# Authentik OIDC backend
# ArgoCD-managed. BlueJay Lab.
#
# Stack:
# - PostgreSQL 16 StatefulSet (single replica, Longhorn RWO 5Gi)
# - Redis 7 Deployment (no persistence — session/cache only)
# - Authentik server + worker Deployments (image ghcr.io/goauthentik/server:2024.12.3)
# - Media PVC shared between server + worker (Longhorn RWO 2Gi)
# - Certificate via step-ca-acme ClusterIssuer
# - Traefik IngressRoute at id.iamworkin.lan
#
# Secrets come from 1Password item "authentik-credentials" (IAmWorkin vault, id y6i74ch22q5wvm7znquq4nhhcu)
# via the OnePasswordItem CRD, materialized into k8s Secret authentik/authentik-credentials.
#
# Why the discovery URL is /application/o/pimanager/ : Authentik issues per-application OIDC providers.
# The pimanager OIDC application/provider is created after the cluster pods are healthy (manual or
# via API once the bootstrap token is available — see Notes substrate).
---
apiVersion: v1
kind: Namespace
metadata:
  name: authentik
  labels:
    app.kubernetes.io/part-of: bluejay-infra
---
# 1Password operator pulls the authentik-credentials item into a k8s Secret of the same name.
# Field labels in 1P become Secret keys: AUTHENTIK_SECRET_KEY, POSTGRES_PASSWORD, REDIS_PASSWORD,
# BOOTSTRAP_ADMIN_PASSWORD, BOOTSTRAP_ADMIN_TOKEN, BOOTSTRAP_ADMIN_EMAIL.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
  name: authentik-credentials
  namespace: authentik
spec:
  itemPath: "vaults/IAmWorkin/items/authentik-credentials"
---
# Shared media volume for server + worker pods.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: authentik-media
  namespace: authentik
spec:
  storageClassName: longhorn
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 2Gi
---
# PostgreSQL 16 StatefulSet — Authentik's primary store.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: authentik-postgres
  namespace: authentik
  labels:
    app: authentik-postgres
    argocd.argoproj.io/instance: infra-authentik
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: OrderedReady
  serviceName: authentik-postgres
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: authentik-postgres
  template:
    metadata:
      labels:
        app: authentik-postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16-alpine
          ports:
            - containerPort: 5432
              name: postgres
          env:
            - name: POSTGRES_USER
              value: authentik
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: POSTGRES_PASSWORD
            - name: POSTGRES_DB
              value: authentik
            - name: POSTGRES_INITDB_ARGS
              value: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "authentik"]
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "authentik"]
            initialDelaySeconds: 30
            periodSeconds: 30
          resources:
            requests: { cpu: 100m, memory: 256Mi }
            limits: { cpu: 1000m, memory: 1Gi }
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: pgdata
      spec:
        storageClassName: longhorn
        accessModes: [ReadWriteOnce]
        volumeMode: Filesystem
        resources:
          requests:
            storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: authentik-postgres
  namespace: authentik
spec:
  clusterIP: None
  selector:
    app: authentik-postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
---
# Redis 7 — session storage + Celery broker. No persistence needed (cache).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: authentik-redis
  namespace: authentik
  labels:
    app: authentik-redis
    argocd.argoproj.io/instance: infra-authentik
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: authentik-redis
  template:
    metadata:
      labels:
        app: authentik-redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          args:
            - "--save"
            - ""
            - "--appendonly"
            - "no"
            - "--requirepass"
            - "$(REDIS_PASSWORD)"
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: REDIS_PASSWORD
          ports:
            - containerPort: 6379
              name: redis
          readinessProbe:
            tcpSocket: { port: 6379 }
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            tcpSocket: { port: 6379 }
            initialDelaySeconds: 30
            periodSeconds: 30
          resources:
            requests: { cpu: 50m, memory: 64Mi }
            limits: { cpu: 500m, memory: 256Mi }
---
apiVersion: v1
kind: Service
metadata:
  name: authentik-redis
  namespace: authentik
spec:
  selector:
    app: authentik-redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
---
# Authentik server Deployment — HTTP frontend on :9000.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: authentik-server
  namespace: authentik
  labels:
    app: authentik-server
    argocd.argoproj.io/instance: infra-authentik
spec:
  replicas: 1
  strategy:
    type: Recreate # shares /media RWO PVC with worker
  selector:
    matchLabels:
      app: authentik-server
  template:
    metadata:
      labels:
        app: authentik-server
    spec:
      securityContext:
        # Authentik image runs as uid 1000 "authentik" but the Longhorn PVC mounts
        # root:root by default. fsGroup recursively chgrp + chmod g+rwx so the
        # non-root container can mkdir /media/public during the tenant_files migration.
        fsGroup: 1000
      containers:
        - name: server
          image: ghcr.io/goauthentik/server:2024.12.3
          args: ["server"]
          ports:
            - containerPort: 9000
              name: http
            - containerPort: 9443
              name: https
          env:
            - name: AUTHENTIK_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: AUTHENTIK_SECRET_KEY
            - name: AUTHENTIK_REDIS__HOST
              value: authentik-redis
            - name: AUTHENTIK_REDIS__PASSWORD
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: REDIS_PASSWORD
            - name: AUTHENTIK_POSTGRESQL__HOST
              value: authentik-postgres
            - name: AUTHENTIK_POSTGRESQL__NAME
              value: authentik
            - name: AUTHENTIK_POSTGRESQL__USER
              value: authentik
            - name: AUTHENTIK_POSTGRESQL__PASSWORD
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: POSTGRES_PASSWORD
            - name: AUTHENTIK_BOOTSTRAP_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: BOOTSTRAP_ADMIN_PASSWORD
            - name: AUTHENTIK_BOOTSTRAP_TOKEN
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: BOOTSTRAP_ADMIN_TOKEN
            - name: AUTHENTIK_BOOTSTRAP_EMAIL
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: BOOTSTRAP_ADMIN_EMAIL
            - name: AUTHENTIK_DISABLE_UPDATE_CHECK
              value: "true"
            - name: AUTHENTIK_ERROR_REPORTING__ENABLED
              value: "false"
            - name: AUTHENTIK_LOG_LEVEL
              value: info
          # First-boot Authentik can take 3+ min on the migration phase
          # (waiting on DB lock while worker also runs migrations). Initial
          # delays are generous so kubelet doesn't kill the pod mid-migration;
          # periodSeconds keeps post-startup probing responsive.
          readinessProbe:
            httpGet:
              path: /-/health/ready/
              port: 9000
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 12
          livenessProbe:
            httpGet:
              path: /-/health/live/
              port: 9000
            initialDelaySeconds: 300
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /-/health/live/
              port: 9000
            initialDelaySeconds: 30
            periodSeconds: 15
            timeoutSeconds: 10
            failureThreshold: 40 # 30s + 40*15s = 10.5 min budget
          resources:
            requests: { cpu: 150m, memory: 512Mi }
            limits: { cpu: 1500m, memory: 1Gi }
          volumeMounts:
            - name: media
              mountPath: /media
      volumes:
        - name: media
          persistentVolumeClaim:
            claimName: authentik-media
---
# Authentik worker Deployment — runs Celery background tasks.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: authentik-worker
  namespace: authentik
  labels:
    app: authentik-worker
    argocd.argoproj.io/instance: infra-authentik
spec:
  replicas: 1
  strategy:
    type: Recreate # shares /media RWO PVC with server
  selector:
    matchLabels:
      app: authentik-worker
  template:
    metadata:
      labels:
        app: authentik-worker
    spec:
      securityContext:
        # Same as server pod: non-root uid 1000 needs PVC group write.
        fsGroup: 1000
      containers:
        - name: worker
          image: ghcr.io/goauthentik/server:2024.12.3
          args: ["worker"]
          env:
            - name: AUTHENTIK_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: AUTHENTIK_SECRET_KEY
            - name: AUTHENTIK_REDIS__HOST
              value: authentik-redis
            - name: AUTHENTIK_REDIS__PASSWORD
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: REDIS_PASSWORD
            - name: AUTHENTIK_POSTGRESQL__HOST
              value: authentik-postgres
            - name: AUTHENTIK_POSTGRESQL__NAME
              value: authentik
            - name: AUTHENTIK_POSTGRESQL__USER
              value: authentik
            - name: AUTHENTIK_POSTGRESQL__PASSWORD
              valueFrom:
                secretKeyRef:
                  name: authentik-credentials
                  key: POSTGRES_PASSWORD
            - name: AUTHENTIK_DISABLE_UPDATE_CHECK
              value: "true"
            - name: AUTHENTIK_ERROR_REPORTING__ENABLED
              value: "false"
            - name: AUTHENTIK_LOG_LEVEL
              value: info
          resources:
            requests: { cpu: 100m, memory: 256Mi }
            limits: { cpu: 1000m, memory: 768Mi }
          volumeMounts:
            - name: media
              mountPath: /media
      volumes:
        - name: media
          persistentVolumeClaim:
            claimName: authentik-media
---
apiVersion: v1
kind: Service
metadata:
  name: authentik-server
  namespace: authentik
spec:
  selector:
    app: authentik-server
  ports:
    - name: http
      port: 9000
      targetPort: 9000
    - name: https
      port: 9443
      targetPort: 9443
---
# step-ca leaf certificate for id.iamworkin.lan.
# The step-ca container resolver uses pfSense Unbound, so the public A record for id.iamworkin.lan
# MUST exist before this Certificate is applied; otherwise the cert-manager HTTP-01 challenge
# fails and silently backs off for 2h. Added 2026-05-25 via scripts/pfsense-add-id-host.py.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: authentik-tls
  namespace: authentik
spec:
  secretName: authentik-tls
  dnsNames:
    - id.iamworkin.lan
  issuerRef:
    name: step-ca-acme
    kind: ClusterIssuer
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: authentik
  namespace: authentik
spec:
  entryPoints: [websecure]
  routes:
    - match: Host(`id.iamworkin.lan`)
      kind: Rule
      services:
        - name: authentik-server
          port: 9000
  tls:
    secretName: authentik-tls

@@ -1,13 +1,10 @@
# FlowerCore Remote Desktop — TLS + Ingress
#
# Source-of-truth split:
-# - bluejay-infra OWNS: Certificate, IngressRoute, all NetworkPolicies,
-# and the explicit RemoteDesktopPoolCrd warm-pool intent in
-# remotedesktop-pools.yaml.
+# - bluejay-infra OWNS: Certificate, IngressRoute, all NetworkPolicies
+# (see network-policies.yaml in this directory).
-# - FlowerCore.RemoteDesktop OWNS: CRD definition/operator Deployment and
-# scripts/deploy-web.sh Deployment + Service. Reason: image refs like
-# `localhost/fc-desktop:linux-xfce`
+# - FlowerCore.RemoteDesktop scripts/deploy-web.sh OWNS: Deployment +
+# Service. Reason: image refs like `localhost/fc-desktop:linux-xfce`
# only exist on each node's containerd after a manual import, so a
# Deployment manifest in bluejay-infra would race the image-import
# step and crash-loop.

@@ -1,101 +0,0 @@
# FlowerCore RemoteDesktop warm-pool intent.
#
# These CRDs are deliberately explicit. The RemoteDesktop warmup loop no
# longer scans template defaults to decide what to warm; every enabled pool
# here represents operator/GitOps intent and prevents a repeat of the
# orphan-pool leak from 2026-05-08.
---
apiVersion: flowercore.io/v1
kind: RemoteDesktopPoolCrd
metadata:
  name: browser-lab-pool
  namespace: fc-desktop
  labels:
    app.kubernetes.io/name: remotedesktop-pool
    app.kubernetes.io/part-of: flowercore-remotedesktop
    app.kubernetes.io/managed-by: bluejay-infra
spec:
  templateSlug: browser-only
  desiredSize: 1
  enabled: true
  reconcileNow: true
---
apiVersion: flowercore.io/v1
kind: RemoteDesktopPoolCrd
metadata:
  name: opensuse-xfce-pool
  namespace: fc-desktop
  labels:
    app.kubernetes.io/name: remotedesktop-pool
    app.kubernetes.io/part-of: flowercore-remotedesktop
    app.kubernetes.io/managed-by: bluejay-infra
spec:
  templateSlug: opensuse-xfce
  desiredSize: 1
  enabled: true
  userVolumeMode: LateAttach
  reconcileNow: true
---
apiVersion: flowercore.io/v1
kind: RemoteDesktopPoolCrd
metadata:
  name: dev-workstation-pool
  namespace: fc-desktop
  labels:
    app.kubernetes.io/name: remotedesktop-pool
    app.kubernetes.io/part-of: flowercore-remotedesktop
    app.kubernetes.io/managed-by: bluejay-infra
spec:
  templateSlug: dev-workstation
  desiredSize: 1
  enabled: true
  userVolumeMode: LateAttach
  reconcileNow: true
---
apiVersion: flowercore.io/v1
kind: RemoteDesktopPoolCrd
metadata:
  name: ai-station-pool
  namespace: fc-desktop
  labels:
    app.kubernetes.io/name: remotedesktop-pool
    app.kubernetes.io/part-of: flowercore-remotedesktop
    app.kubernetes.io/managed-by: bluejay-infra
spec:
  templateSlug: ai-station
  desiredSize: 1
  enabled: true
  userVolumeMode: LateAttach
  reconcileNow: true
---
apiVersion: flowercore.io/v1
kind: RemoteDesktopPoolCrd
metadata:
  name: linux-xfce-pool
  namespace: fc-desktop
  labels:
    app.kubernetes.io/name: remotedesktop-pool
    app.kubernetes.io/part-of: flowercore-remotedesktop
    app.kubernetes.io/managed-by: bluejay-infra
spec:
  templateSlug: linux-xfce
  desiredSize: 1
  enabled: true
  userVolumeMode: LateAttach
  reconcileNow: true
---
apiVersion: flowercore.io/v1
kind: RemoteDesktopPoolCrd
metadata:
  name: linux-xfce-rdp-pool
  namespace: fc-desktop
  labels:
    app.kubernetes.io/name: remotedesktop-pool
    app.kubernetes.io/part-of: flowercore-remotedesktop
    app.kubernetes.io/managed-by: bluejay-infra
spec:
  templateSlug: linux-xfce-rdp
  desiredSize: 1
  enabled: true
  userVolumeMode: LateAttach
  reconcileNow: true

@@ -1,33 +0,0 @@
# Explicit ArgoCD Application shape for bootstrap/review.
#
# The live bluejay-infra ApplicationSet already discovers apps/* directories
# and creates this same Application name (`infra-fc-devicemgmt`) automatically.
# Keep repoURL on the internal Gitea ClusterIP URL; ArgoCD does not trust the
# external step-ca HTTPS endpoint.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infra-fc-devicemgmt
  namespace: argocd
  labels:
    app.kubernetes.io/name: fc-devicemgmt
    app.kubernetes.io/part-of: flowercore
    app.kubernetes.io/managed-by: argocd
    flowercore.io/tenant-id: system
    flowercore.io/created-by: bluejay-infra
spec:
  project: default
  source:
    repoURL: http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git
    targetRevision: main
    path: apps/fc-devicemgmt
  destination:
    server: https://kubernetes.default.svc
    namespace: fc-devicemgmt
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true

apps/github-runner/.gitattributes

@@ -0,0 +1,2 @@
*.sh text eol=lf
Dockerfile text eol=lf

@@ -0,0 +1,44 @@
FROM myoung34/github-runner:latest
ARG RUBY_VERSION=3.3.11
ARG RUBY_MINOR=3.3
ARG RUBY_BUILD_VERSION=v20260326
ARG RUNNER_UID=1001
ARG RUNNER_GID=1001
ENV RUNNER_TOOL_CACHE=/home/runner/_tool
ENV RUNNER_RUBY_TOOLCACHE=/opt/runner-toolcache
ENV PATH="/home/runner/_tool/Ruby/${RUBY_MINOR}/x64/bin:/opt/runner-toolcache/Ruby/${RUBY_MINOR}/x64/bin:${PATH}"
USER root
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        autoconf \
        bison \
        build-essential \
        ca-certificates \
        curl \
        libdb-dev \
        libffi-dev \
        libgdbm-dev \
        libgmp-dev \
        libncurses-dev \
        libreadline-dev \
        libssl-dev \
        libyaml-dev \
        patch \
        pkg-config \
        uuid-dev \
        zlib1g-dev \
    && curl -fsSL "https://github.com/rbenv/ruby-build/archive/refs/tags/${RUBY_BUILD_VERSION}.tar.gz" -o /tmp/ruby-build.tar.gz \
    && mkdir -p /tmp/ruby-build \
    && tar -xzf /tmp/ruby-build.tar.gz --strip-components=1 -C /tmp/ruby-build \
    && /tmp/ruby-build/install.sh \
    && rm -rf /tmp/ruby-build /tmp/ruby-build.tar.gz /var/lib/apt/lists/*
COPY install-ruby-toolcache.sh /usr/local/bin/install-ruby-toolcache.sh
RUN chmod +x /usr/local/bin/install-ruby-toolcache.sh \
    && RUBY_VERSION="${RUBY_VERSION}" RUBY_MINOR="${RUBY_MINOR}" TOOLCACHE_ROOT="${RUNNER_RUBY_TOOLCACHE}" RUNNER_UID="${RUNNER_UID}" RUNNER_GID="${RUNNER_GID}" /usr/local/bin/install-ruby-toolcache.sh \
    && ruby -v

@@ -7,12 +7,17 @@ Deployments with `kubectl`; update this manifest and let ArgoCD reconcile.
All repo-scoped Linux runners use:
- `localhost/fc-github-runner:v20260520-ruby3.3.11`, derived from
`myoung34/github-runner:latest`
- `ACCESS_TOKEN` from the `github-runner-token` Secret
- `RUN_AS_ROOT=false`
- `EPHEMERAL=true`
- `LABELS=self-hosted,linux,fc-build-linux`
- writable non-root paths under `/home/runner` for .NET, NuGet, XDG cache, and
Actions tool cache
- Ruby 3.3.11 seeded into `/home/runner/_tool/Ruby/3.3/x64` from the baked
`/opt/runner-toolcache` copy so `ruby/setup-ruby@v1` can discover it on
self-hosted `ubuntu-20.04-x64` runners
`github-runner` for `FlowerCore.Common` is single-replica because it retains the
original Longhorn ReadWriteOnce NuGet PVC. Every other repo-scoped runner uses
@@ -28,6 +33,34 @@ Sprint 32 final long-tail wave adds 16 two-replica Deployments:
`FlowerCore.Provisioning`, `FlowerCore.Redis`, `FlowerCore.MessageBoard`, and
`FlowerCore.MenuBoard`.
## Image Build
Ruby is baked with a pinned `ruby-build` release and Ruby patch version. The pod
still mounts an `emptyDir` over `/home/runner`, so the `setup-runner-home` init
container copies the baked toolcache from `/opt/runner-toolcache/Ruby` into
`/home/runner/_tool/Ruby` before the runner container starts.
```bash
cd apps/github-runner
podman build -t localhost/fc-github-runner:v20260520-ruby3.3.11 .
podman run --rm localhost/fc-github-runner:v20260520-ruby3.3.11 ruby -v
podman run --rm localhost/fc-github-runner:v20260520-ruby3.3.11 \
test -f /opt/runner-toolcache/Ruby/3.3/x64.complete
podman save localhost/fc-github-runner:v20260520-ruby3.3.11 \
-o fc-github-runner-v20260520-ruby3.3.11.tar
```
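
The copy step described above can be tried outside the cluster. This is only a sketch of what a `setup-runner-home`-style init step might do, run against scratch directories. The function name `seed_ruby_toolcache` is hypothetical and the real init container may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the baked Ruby toolcache into the runner's
# tool cache, preserving the minor-version symlink (cp -a keeps symlinks).
seed_ruby_toolcache() {
  local baked_root=$1 runner_tool_cache=$2
  mkdir -p "$runner_tool_cache"
  cp -a "$baked_root/Ruby" "$runner_tool_cache/"
}

# Scratch layout imitating the baked image: 3.3.11 plus a 3.3 -> 3.3.11 symlink.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
mkdir -p "$scratch/opt/runner-toolcache/Ruby/3.3.11/x64/bin"
touch "$scratch/opt/runner-toolcache/Ruby/3.3.11/x64.complete"
ln -sfn 3.3.11 "$scratch/opt/runner-toolcache/Ruby/3.3"

seed_ruby_toolcache "$scratch/opt/runner-toolcache" "$scratch/home/runner/_tool"
test -f "$scratch/home/runner/_tool/Ruby/3.3/x64.complete" && echo "toolcache seeded"
```

Because the `3.3` entry is a relative symlink, the `x64.complete` marker resolves at the destination without a second copy of the Ruby tree.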
Import the saved image on every schedulable RKE2 node before ArgoCD rolls the
Deployments:
```bash
for node in rke2-server rke2-agent1 rke2-agent2; do
  scp fc-github-runner-v20260520-ruby3.3.11.tar "$node:/tmp/"
  ssh "$node" 'sudo ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images rm localhost/fc-github-runner:v20260520-ruby3.3.11 || true'
  ssh "$node" 'sudo ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-github-runner-v20260520-ruby3.3.11.tar'
done
```
## Post-Merge Proof
After the PR is merged and ArgoCD syncs, verify the runner fleet:
@@ -36,6 +69,14 @@ After the PR is merged and ArgoCD syncs, verify the runner fleet:
kubectl -n github-runner get deploy,pods,pvc
```
Verify the Ruby toolcache in a fresh pod:
```bash
kubectl -n github-runner exec deploy/github-runner-puppet -c runner -- ruby -v
kubectl -n github-runner exec deploy/github-runner-puppet -c runner -- sh -c \
'echo "$RUNNER_TOOL_CACHE" && test -f "$RUNNER_TOOL_CACHE/Ruby/3.3/x64.complete"'
```
Verify GitHub registration for the repo-scoped runners:
```bash
@@ -69,6 +110,10 @@ from GitHub Actions and verify it lands on an `rke2-linux-*` runner.
- `actions/setup-dotnet` permission error at `/usr/share/dotnet`: check that
`DOTNET_INSTALL_DIR=/home/runner/.dotnet` and related cache env vars are
present on the runner pod.
- `ruby/setup-ruby@v1` says self-hosted runners must install Ruby in
`$RUNNER_TOOL_CACHE`: check that the init container copied
`/opt/runner-toolcache/Ruby` into `/home/runner/_tool/Ruby` and that
`/home/runner/_tool/Ruby/3.3/x64.complete` exists.
- `404` during runner registration: the fine-grained PAT is valid but missing
repository access for that repo. Add the repo to the PAT access list; the PAT
value does not change.
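
For the `ruby/setup-ruby@v1` item above, a small check like the following could be run inside the runner container. It is a sketch only: `check_ruby_toolcache` is a hypothetical helper, and the default path assumes `RUNNER_TOOL_CACHE` is `/home/runner/_tool` as documented above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the completion marker that
# setup-ruby looks for is present under the given tool cache root.
check_ruby_toolcache() {
  local root=$1 minor=$2
  local marker="$root/Ruby/$minor/x64.complete"
  if [ -f "$marker" ]; then
    echo "ok: $marker"
  else
    echo "missing: $marker"
    return 1
  fi
}

check_ruby_toolcache "${RUNNER_TOOL_CACHE:-/home/runner/_tool}" 3.3 \
  || echo "init container copy did not run or did not finish"
```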

File diff suppressed because it is too large.

@@ -0,0 +1,19 @@
#!/usr/bin/env bash
set -euo pipefail
RUBY_VERSION="${RUBY_VERSION:-3.3.11}"
RUBY_MINOR="${RUBY_MINOR:-3.3}"
TOOLCACHE_ROOT="${TOOLCACHE_ROOT:-/opt/runner-toolcache}"
RUNNER_UID="${RUNNER_UID:-1001}"
RUNNER_GID="${RUNNER_GID:-1001}"
RUBY_PREFIX="${TOOLCACHE_ROOT}/Ruby/${RUBY_VERSION}/x64"
mkdir -p "${TOOLCACHE_ROOT}/Ruby"
RUBY_CONFIGURE_OPTS="${RUBY_CONFIGURE_OPTS:---disable-install-doc --disable-yjit}" ruby-build "${RUBY_VERSION}" "${RUBY_PREFIX}"
touch "${TOOLCACHE_ROOT}/Ruby/${RUBY_VERSION}/x64.complete"
ln -sfn "${RUBY_VERSION}" "${TOOLCACHE_ROOT}/Ruby/${RUBY_MINOR}"
"${RUBY_PREFIX}/bin/ruby" -v
chown -R "${RUNNER_UID}:${RUNNER_GID}" "${TOOLCACHE_ROOT}"
chmod -R a+rX "${TOOLCACHE_ROOT}"

@@ -280,13 +280,14 @@ data:
printer_model: "NuPrint 210"
# Print.Web health (Blazor app on edge2:5200)
# Target `/health` (anonymous) — root path requires API key auth and returns 401.
- job_name: "probe-printweb"
metrics_path: /probe
params:
module: [http_2xx]
scrape_interval: 30s
static_configs:
-- targets: ["http://10.0.57.16:5200/"]
+- targets: ["http://10.0.57.16:5200/health"]
labels:
instance: "print-web"
service: "print-web"

@@ -387,38 +387,6 @@ public sealed class FleetManifestLintTests
        violations.Should().BeEmpty();
    }
    [Fact]
    public void RemoteDesktopPoolCrds_MustExplicitlyOptInHookReadyTemplates()
    {
        var expectedModes = new Dictionary<string, string?>(StringComparer.Ordinal)
        {
            ["browser-only"] = null,
            ["opensuse-xfce"] = "LateAttach",
            ["dev-workstation"] = "LateAttach",
            ["ai-station"] = "LateAttach",
            ["linux-xfce"] = "LateAttach",
            ["linux-xfce-rdp"] = "LateAttach",
        };
        var pools = Inventory.Documents
            .Where(document => document.Kind == "RemoteDesktopPoolCrd")
            .Where(document => document.RelativePath == "fc-desktop/remotedesktop-pools.yaml")
            .ToDictionary(
                document => document.Scalar("spec", "templateSlug") ?? string.Empty,
                StringComparer.Ordinal);
        pools.Keys.Should().BeEquivalentTo(expectedModes.Keys);
        foreach (var expected in expectedModes)
        {
            var pool = pools[expected.Key];
            pool.Namespace.Should().Be("fc-desktop");
            pool.Scalar("spec", "desiredSize").Should().Be("1");
            pool.Scalar("spec", "enabled").Should().Be("true");
            pool.Scalar("spec", "reconcileNow").Should().Be("true");
            pool.Scalar("spec", "userVolumeMode").Should().Be(expected.Value);
        }
    }
    [Fact]
    public void PublicEgressDeployments_MustOptOutOfIamworkinLanSearchSuffixes()
    {