Without the IAmWorkin step-ca root CA in the runner image's system trust store, .NET HttpClient calls from CI tests against `*.iamworkin.lan` (e.g. `https://selenium.iamworkin.lan/session`) fail with `The remote certificate is invalid because of errors in the certificate chain: PartialChain`. FlowerCore.Print.Web's `WebScreenshotService` unit tests hit this on every build.

Drop the step-ca root PEM into `/usr/local/share/ca-certificates/`, run `update-ca-certificates` once during apt install, and let OpenSSL and .NET-on-Linux read the regenerated `/etc/ssl/certs/ca-certificates.crt` automatically. No `SSL_CERT_FILE` env var and no per-Deployment volume mount are needed.

The image was rebuilt, saved, and imported on all 3 schedulable RKE2 nodes (rke2-server, rke2-agent1, rke2-agent2) before this PR, and verified with `ctr images list -q | grep stepca` on each node.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub Runner Fleet
ArgoCD owns apps/github-runner/github-runner.yaml. Do not patch live runner
Deployments with kubectl; update this manifest and let ArgoCD reconcile.
Runner Shape
All repo-scoped Linux runners use:
- localhost/fc-github-runner:v20260525-ruby3.3.11-stepca, derived from myoung34/github-runner:latest
- ACCESS_TOKEN from the github-runner-token Secret
- RUN_AS_ROOT=false
- EPHEMERAL=true
- LABELS=self-hosted,linux,fc-build-linux
- writable non-root paths under /home/runner for .NET, NuGet, XDG cache, and Actions tool cache
- Ruby 3.3.11 seeded into /home/runner/_tool/Ruby/3.3/x64 from the baked /opt/runner-toolcache copy so ruby/setup-ruby@v1 can discover it on self-hosted ubuntu-20.04-x64 runners
github-runner for FlowerCore.Common is single-replica because it retains the
original Longhorn ReadWriteOnce NuGet PVC. Every other repo-scoped runner uses
two replicas with per-pod emptyDir caches. That is the safe backlog-drain
strategy: no two pods share one RWO PVC.
Sprint 32 final long-tail wave adds 16 two-replica Deployments:
FlowerCore.Knowledge, FlowerCore.LlmBridge, FlowerCore.Media,
FlowerCore.Presentations, FlowerCore.RemoteDesktop, FlowerCore.DNS,
FlowerCore.Distribution, FlowerCore.Scoreboard,
FlowerCore.SegmentDisplay, FlowerCore.Signage.Contracts,
FlowerCore.SignalControl, FlowerCore.Intranet.Web,
FlowerCore.Provisioning, FlowerCore.Redis, FlowerCore.MessageBoard, and
FlowerCore.MenuBoard.
Image Build
Ruby is baked with a pinned ruby-build release and Ruby patch version. The pod
still mounts an emptyDir over /home/runner, so the setup-runner-home init
container copies the baked toolcache from /opt/runner-toolcache/Ruby into
/home/runner/_tool/Ruby before the runner container starts.
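The copy step described above can be sketched as a small shell function. This is an illustration only: the real setup-runner-home script is not shown here, and the function name seed_toolcache and its argument handling are assumptions. The default paths are the ones named in this section.

```shell
#!/bin/sh
# Hypothetical sketch of the toolcache seeding done by an init container.
# seed_toolcache [SRC] [DST]
#   SRC defaults to the baked copy, DST to the path under the emptyDir mount.
seed_toolcache() {
  src="${1:-/opt/runner-toolcache/Ruby}"
  dst="${2:-/home/runner/_tool/Ruby}"

  # Fail loudly if the image does not contain the baked toolcache.
  if [ ! -d "$src" ]; then
    echo "missing baked toolcache: $src" >&2
    return 1
  fi

  mkdir -p "$dst"
  # "$src/." copies the directory contents (including marker files such as
  # 3.3/x64.complete) and -a preserves modes, timestamps, and symlinks.
  cp -a "$src/." "$dst/"
}
```

An init container could call `seed_toolcache` with no arguments. Passing explicit paths makes the function easy to try against a scratch directory first.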
The IAmWorkin step-ca root CA is also baked into the system trust store
(/usr/local/share/ca-certificates/iamworkin-step-ca-root.crt, registered by
update-ca-certificates). Without it, .NET HttpClient calls from CI tests
against *.iamworkin.lan (e.g. https://selenium.iamworkin.lan/session)
fail with PartialChain. To refresh the bundled cert when the root rotates,
re-extract from the cluster and overwrite step-ca-root.crt:
kubectl get secret -n cert-manager step-ca-root \
  -o jsonpath='{.data.ca\.crt}' | base64 -d > step-ca-root.crt
cd apps/github-runner
podman build -t localhost/fc-github-runner:v20260525-ruby3.3.11-stepca .
podman run --rm localhost/fc-github-runner:v20260525-ruby3.3.11-stepca ruby -v
podman run --rm localhost/fc-github-runner:v20260525-ruby3.3.11-stepca \
  test -f /opt/runner-toolcache/Ruby/3.3/x64.complete
podman save localhost/fc-github-runner:v20260525-ruby3.3.11-stepca \
  -o fc-github-runner-v20260525-ruby3.3.11-stepca.tar
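The smoke tests above cover Ruby but not the trust store. One possible extra check, sketched below, is to confirm that the root certificate actually landed in the regenerated bundle. The helper name cert_in_bundle is hypothetical. It is a heuristic that compares only the first base64 line of the certificate body, so treat a match as a quick sanity check rather than a full chain validation.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PEM certificate in CERT appears in BUNDLE.
# cert_in_bundle CERT BUNDLE
# Exit status: 0 = found, 1 = not found, 2 = CERT has no readable PEM body.
cert_in_bundle() {
  cert="$1"
  bundle="$2"

  # Take the first base64 line after the BEGIN marker as a fingerprint.
  first_line=$(awk '/-----BEGIN CERTIFICATE-----/ { getline; print; exit }' "$cert" 2>/dev/null)
  [ -n "$first_line" ] || return 2

  # -F: fixed string (base64 contains "+" and "/"), -q: status only.
  grep -qF -- "$first_line" "$bundle"
}

# Example wiring (not run here; needs podman and the built image):
#   image=localhost/fc-github-runner:v20260525-ruby3.3.11-stepca
#   podman run --rm "$image" cat /etc/ssl/certs/ca-certificates.crt > /tmp/bundle.crt
#   cert_in_bundle step-ca-root.crt /tmp/bundle.crt && echo "step-ca root is trusted"
```

Because the check runs on the build host against a copy of the bundle, it does not depend on openssl or any other tool being present inside the image.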
Import the saved image on every schedulable RKE2 node before ArgoCD rolls the Deployments:
for node in rke2-server rke2-agent1 rke2-agent2; do
  scp fc-github-runner-v20260525-ruby3.3.11-stepca.tar "$node:/tmp/"
  ssh "$node" 'sudo ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images rm localhost/fc-github-runner:v20260525-ruby3.3.11-stepca || true'
  ssh "$node" 'sudo ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-github-runner-v20260525-ruby3.3.11-stepca.tar'
done
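After the import, each node can be checked for the exact image reference. The sketch below is one way to do that, assuming the same SSH access and containerd socket path as the import loop. The helper name image_listed is hypothetical. It matches the full reference, which is stricter than grepping for stepca.

```shell
#!/bin/sh
# Hypothetical helper: reads image references on stdin, one per line (the
# format printed by `ctr images list -q`), and succeeds only if the exact
# reference given as $1 is among them.
image_listed() {
  # -x: whole-line match, -F: fixed string, -q: status only.
  grep -qxF -- "$1"
}

# Example wiring (not run here; needs SSH access to the nodes):
#   image=localhost/fc-github-runner:v20260525-ruby3.3.11-stepca
#   for node in rke2-server rke2-agent1 rke2-agent2; do
#     ssh "$node" 'sudo ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images list -q' \
#       | image_listed "$image" || echo "MISSING on $node" >&2
#   done
```

A whole-line match avoids a false positive from an older tag that merely shares a prefix with the new one.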
Post-Merge Proof
After the PR is merged and ArgoCD syncs, verify the runner fleet:
kubectl -n github-runner get deploy,pods,pvc
Verify the Ruby toolcache in a fresh pod:
kubectl -n github-runner exec deploy/github-runner-puppet -c runner -- ruby -v
kubectl -n github-runner exec deploy/github-runner-puppet -c runner -- sh -c \
  'echo "$RUNNER_TOOL_CACHE" && test -f "$RUNNER_TOOL_CACHE/Ruby/3.3/x64.complete"'
Verify GitHub registration for the repo-scoped runners:
for repo in FlowerCore.Common FlowerCore.Shared.Pos FlowerCore.Puppet FlowerCore.Signage \
    FlowerCore.DMS FlowerCore.Telephony FlowerCore.Print.Web FlowerCore.Chat \
    FlowerCore.MySQL FlowerCore.Kiosk.Linux FlowerCore.Marquee FlowerCore.TtsReader \
    FlowerCore.Knowledge FlowerCore.LlmBridge FlowerCore.Media \
    FlowerCore.Presentations FlowerCore.RemoteDesktop FlowerCore.DNS \
    FlowerCore.Distribution FlowerCore.Scoreboard FlowerCore.SegmentDisplay \
    FlowerCore.Signage.Contracts FlowerCore.SignalControl FlowerCore.Intranet.Web \
    FlowerCore.Provisioning FlowerCore.Redis FlowerCore.MessageBoard \
    FlowerCore.MenuBoard; do
  echo "=== $repo ==="
  gh api "/repos/astoltz/$repo/actions/runners" \
    --jq '.runners[] | select(.labels[].name == "fc-build-linux") | {name,status,busy,labels:[.labels[].name]}'
done
Shared.Pos publish proof after the runner pod is online:
gh run list --repo astoltz/FlowerCore.Shared.Pos \
  --workflow "Build, Test & Publish" --branch main --limit 5
If the latest run is still queued after runner registration, rerun the workflow
from GitHub Actions and verify it lands on an rke2-linux-* runner.
Failure Notes
- actions/setup-dotnet permission error at /usr/share/dotnet: check that DOTNET_INSTALL_DIR=/home/runner/.dotnet and related cache env vars are present on the runner pod.
- ruby/setup-ruby@v1 says self-hosted runners must install Ruby in $RUNNER_TOOL_CACHE: check that the init container copied /opt/runner-toolcache/Ruby into /home/runner/_tool/Ruby and that /home/runner/_tool/Ruby/3.3/x64.complete exists.
- 404 during runner registration: the fine-grained PAT is valid but missing repository access for that repo. Add the repo to the PAT access list; the PAT value does not change.
- Multi-Attach volume error: only the Common runner uses a RWO PVC and it must stay single-replica. New multi-replica runners use emptyDir.
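The first two checks above can be combined into one function to run inside the runner container. This is a sketch under assumptions: the name check_runner_home is hypothetical, and it covers only DOTNET_INSTALL_DIR and the Ruby completion marker, not the other cache env vars, which are not listed here.

```shell
#!/bin/sh
# Hypothetical diagnostic for the setup-dotnet and setup-ruby failure modes.
# check_runner_home [HOME_DIR]   (HOME_DIR defaults to /home/runner)
# Prints one line per problem to stderr and returns 1 if any check fails.
check_runner_home() {
  home="${1:-/home/runner}"
  rc=0

  # setup-dotnet falls back to /usr/share/dotnet when this is unset or wrong.
  if [ "${DOTNET_INSTALL_DIR:-}" != "$home/.dotnet" ]; then
    echo "DOTNET_INSTALL_DIR is '${DOTNET_INSTALL_DIR:-}', expected '$home/.dotnet'" >&2
    rc=1
  fi

  # The marker named in the failure note; its absence suggests the toolcache
  # was never seeded into the emptyDir.
  if [ ! -f "$home/_tool/Ruby/3.3/x64.complete" ]; then
    echo "missing $home/_tool/Ruby/3.3/x64.complete" >&2
    rc=1
  fi

  return "$rc"
}
```

Both checks run even if the first one fails, so a single invocation reports every problem it can see.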