GitHub Runner Fleet
ArgoCD owns apps/github-runner/github-runner.yaml. Do not patch live runner
Deployments with kubectl; update this manifest and let ArgoCD reconcile.
Runner Shape
All repo-scoped Linux runners use:
- localhost/fc-github-runner:v20260520-ruby3.3.11, derived from myoung34/github-runner:latest
- ACCESS_TOKEN from the github-runner-token Secret
- RUN_AS_ROOT=false
- EPHEMERAL=true
- LABELS=self-hosted,linux,fc-build-linux
- writable non-root paths under /home/runner for .NET, NuGet, XDG cache, and Actions tool cache
- Ruby 3.3.11 seeded into /home/runner/_tool/Ruby/3.3/x64 from the baked /opt/runner-toolcache copy so ruby/setup-ruby@v1 can discover it on self-hosted ubuntu-20.04-x64 runners
The github-runner Deployment for FlowerCore.Common stays single-replica because
it retains the original Longhorn ReadWriteOnce NuGet PVC. Every other
repo-scoped runner uses two replicas with per-pod emptyDir caches. That is the
safe backlog-drain strategy: no two pods ever share one RWO PVC, so a replica
scheduled on another node cannot hit a Multi-Attach error.
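The replica rule can be checked mechanically. The following is a minimal sketch, not part of the manifest: find_shared_pvc is a hypothetical helper that reads NAME REPLICAS PVC rows and reports any Deployment that combines more than one replica with a PVC. It assumes the rows come from kubectl custom columns, where a missing claim prints as <none>.

```shell
# Hypothetical helper: reads "NAME REPLICAS PVC" rows on stdin and prints every
# Deployment that runs more than one replica while mounting a PVC.
# Exits 1 if any such Deployment is found, 0 otherwise.
find_shared_pvc() {
    awk '$2 > 1 && $3 != "<none>" { print $1; bad = 1 } END { exit bad + 0 }'
}

# Possible usage, assuming claims are declared under spec.template.spec.volumes:
#   kubectl -n github-runner get deploy --no-headers \
#     -o custom-columns='NAME:.metadata.name,REPLICAS:.spec.replicas,PVC:.spec.template.spec.volumes[*].persistentVolumeClaim.claimName' \
#     | find_shared_pvc
```

An empty result with exit status 0 means every multi-replica runner is on emptyDir only.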
The final long-tail wave of Sprint 32 adds 16 two-replica Deployments:
FlowerCore.Knowledge, FlowerCore.LlmBridge, FlowerCore.Media,
FlowerCore.Presentations, FlowerCore.RemoteDesktop, FlowerCore.DNS,
FlowerCore.Distribution, FlowerCore.Scoreboard,
FlowerCore.SegmentDisplay, FlowerCore.Signage.Contracts,
FlowerCore.SignalControl, FlowerCore.Intranet.Web,
FlowerCore.Provisioning, FlowerCore.Redis, FlowerCore.MessageBoard, and
FlowerCore.MenuBoard.
Image Build
Ruby is baked into the image using a pinned ruby-build release and a pinned
Ruby patch version. The pod still mounts an emptyDir over /home/runner, which
hides anything the image placed there, so the setup-runner-home init container
copies the baked toolcache from /opt/runner-toolcache/Ruby into
/home/runner/_tool/Ruby before the runner container starts.
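The actual setup-runner-home script is not reproduced in this document. As a rough sketch of the copy step only, assuming a POSIX shell and a cp that supports -a, it could look like the following; seed_toolcache is a hypothetical name.

```shell
# Hypothetical sketch of the seeding step; not the real init container script.
# seed_toolcache SRC DST copies the contents of SRC into DST.
seed_toolcache() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "missing baked toolcache: $src" >&2
        return 1
    fi
    mkdir -p "$dst"
    # "$src/." copies the directory contents (including the x64.complete
    # marker file) instead of nesting SRC inside DST; -a keeps modes and links.
    cp -a "$src/." "$dst/"
}
```

Inside the init container this would be called as seed_toolcache /opt/runner-toolcache/Ruby /home/runner/_tool/Ruby. Copying the marker file matters because the toolcache lookup treats a version directory without it as incomplete.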
cd apps/github-runner
podman build -t localhost/fc-github-runner:v20260520-ruby3.3.11 .
podman run --rm localhost/fc-github-runner:v20260520-ruby3.3.11 ruby -v
podman run --rm localhost/fc-github-runner:v20260520-ruby3.3.11 \
  test -f /opt/runner-toolcache/Ruby/3.3/x64.complete
podman save localhost/fc-github-runner:v20260520-ruby3.3.11 \
  -o fc-github-runner-v20260520-ruby3.3.11.tar
Import the saved image on every schedulable RKE2 node before ArgoCD rolls the Deployments:
for node in rke2-server rke2-agent1 rke2-agent2; do
  scp fc-github-runner-v20260520-ruby3.3.11.tar "$node:/tmp/"
  ssh "$node" 'sudo ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images rm localhost/fc-github-runner:v20260520-ruby3.3.11 || true'
  ssh "$node" 'sudo ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-github-runner-v20260520-ruby3.3.11.tar'
done
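Because the image lives only in each node's local containerd store, a node that missed the import will fail to start runner pods scheduled on it. A possible check before the rollout is sketched below; node_has_image is a hypothetical helper, and it assumes the same ssh and sudo access as the import loop.

```shell
# Hypothetical helper: succeeds when IMAGE is present in the k8s.io namespace
# of the containerd instance on NODE.
node_has_image() {
    node=$1
    image=$2
    # "images ls -q" prints one image reference per line.
    refs=$(ssh "$node" 'sudo ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images ls -q') || return 1
    # -F literal match, -x whole line, -q no output.
    printf '%s\n' "$refs" | grep -Fxq -- "$image"
}
```

Looping over the same three nodes and calling node_has_image "$node" localhost/fc-github-runner:v20260520-ruby3.3.11 would flag any node that still needs the tarball.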
Post-Merge Proof
After the PR is merged and ArgoCD syncs, verify the runner fleet:
kubectl -n github-runner get deploy,pods,pvc
Verify the Ruby toolcache in a fresh pod:
kubectl -n github-runner exec deploy/github-runner-puppet -c runner -- ruby -v
kubectl -n github-runner exec deploy/github-runner-puppet -c runner -- sh -c \
  'echo "$RUNNER_TOOL_CACHE" && test -f "$RUNNER_TOOL_CACHE/Ruby/3.3/x64.complete"'
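The commands above probe a single Deployment. To cover the whole fleet, the same marker test could be looped over every Deployment in the namespace. This is a sketch under the assumption that every runner pod names its main container runner, as github-runner-puppet does; check_ruby_toolcache is a hypothetical name.

```shell
# Hypothetical helper: runs the toolcache marker test in one pod of every
# Deployment in the namespace and returns non-zero if any of them fails.
check_ruby_toolcache() {
    ns=${1:-github-runner}
    failed=0
    # "-o name" prints entries such as deployment.apps/github-runner-puppet.
    for deploy in $(kubectl -n "$ns" get deploy -o name); do
        if kubectl -n "$ns" exec "$deploy" -c runner -- \
            sh -c 'test -f "$RUNNER_TOOL_CACHE/Ruby/3.3/x64.complete"'; then
            echo "ok   $deploy"
        else
            echo "FAIL $deploy"
            failed=1
        fi
    done
    return "$failed"
}
```

Running check_ruby_toolcache with no argument targets the github-runner namespace.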
Verify GitHub registration for the repo-scoped runners:
for repo in FlowerCore.Common FlowerCore.Shared.Pos FlowerCore.Puppet FlowerCore.Signage \
    FlowerCore.DMS FlowerCore.Telephony FlowerCore.Print.Web FlowerCore.Chat \
    FlowerCore.MySQL FlowerCore.Kiosk.Linux FlowerCore.Marquee FlowerCore.TtsReader \
    FlowerCore.Knowledge FlowerCore.LlmBridge FlowerCore.Media \
    FlowerCore.Presentations FlowerCore.RemoteDesktop FlowerCore.DNS \
    FlowerCore.Distribution FlowerCore.Scoreboard FlowerCore.SegmentDisplay \
    FlowerCore.Signage.Contracts FlowerCore.SignalControl FlowerCore.Intranet.Web \
    FlowerCore.Provisioning FlowerCore.Redis FlowerCore.MessageBoard \
    FlowerCore.MenuBoard; do
  echo "=== $repo ==="
  gh api "/repos/astoltz/$repo/actions/runners" \
    --jq '.runners[] | select(.labels[].name == "fc-build-linux") | {name,status,busy,labels:[.labels[].name]}'
done
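The loop above prints every matching runner, which is a lot to scan across 28 repositories. A complementary sketch that prints only the repositories with no online fc-build-linux runner is shown below. missing_runners is a hypothetical helper; it relies on the status and labels fields of the same API response and treats a failed API call as zero runners.

```shell
# Hypothetical helper: prints each repo that has no online runner carrying the
# fc-build-linux label. Returns 1 if at least one repo is printed.
missing_runners() {
    rc=0
    for repo in "$@"; do
        count=$(gh api "/repos/astoltz/$repo/actions/runners" \
            --jq '[.runners[] | select(.status == "online") | select(any(.labels[]; .name == "fc-build-linux"))] | length') || count=0
        if [ "${count:-0}" -eq 0 ]; then
            echo "$repo"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, missing_runners FlowerCore.Common FlowerCore.Puppet prints nothing and returns 0 when both repositories have a registered, online runner.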
Confirm the Shared.Pos publish workflow after the runner pod is online:
gh run list --repo astoltz/FlowerCore.Shared.Pos \
  --workflow "Build, Test & Publish" --branch main --limit 5
If the latest run is still queued after runner registration, rerun the workflow
from GitHub Actions and verify it lands on an rke2-linux-* runner.
Failure Notes
- actions/setup-dotnet permission error at /usr/share/dotnet: check that DOTNET_INSTALL_DIR=/home/runner/.dotnet and related cache env vars are present on the runner pod.
- ruby/setup-ruby@v1 says self-hosted runners must install Ruby in $RUNNER_TOOL_CACHE: check that the init container copied /opt/runner-toolcache/Ruby into /home/runner/_tool/Ruby and that /home/runner/_tool/Ruby/3.3/x64.complete exists.
- 404 during runner registration: the fine-grained PAT is valid but missing repository access for that repo. Add the repo to the PAT access list; the PAT value does not change.
- Multi-Attach volume error: only the Common runner uses a RWO PVC and it must stay single-replica. New multi-replica runners use emptyDir.
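For the environment-variable checks in these notes, a small sketch that compares one variable on a live pod against its expected value may save some typing. check_runner_env is a hypothetical helper; it assumes the container is named runner and that printenv is available in the image.

```shell
# Hypothetical helper: check_runner_env DEPLOY VAR EXPECTED
# Reads VAR from the runner container of DEPLOY and compares it to EXPECTED.
check_runner_env() {
    deploy=$1
    var=$2
    expected=$3
    # An unset variable or a failed exec leaves "actual" empty.
    actual=$(kubectl -n github-runner exec "deploy/$deploy" -c runner -- printenv "$var") || actual=
    if [ "$actual" = "$expected" ]; then
        echo "ok   $var=$actual"
    else
        echo "FAIL $var='$actual' (expected '$expected')"
        return 1
    fi
}
```

For the setup-dotnet case this would be check_runner_env github-runner-puppet DOTNET_INSTALL_DIR /home/runner/.dotnet. If the value is wrong, the fix belongs in apps/github-runner/github-runner.yaml, not in the live Deployment.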