# GitHub Runner Fleet

ArgoCD owns `apps/github-runner/github-runner.yaml`. Do not patch live runner Deployments with `kubectl`; update this manifest and let ArgoCD reconcile.

## Runner Shape

All repo-scoped Linux runners use:

- `localhost/fc-github-runner:v20260520-ruby3.3.11`, derived from `myoung34/github-runner:latest`
- `ACCESS_TOKEN` from the `github-runner-token` Secret
- `RUN_AS_ROOT=false`
- `EPHEMERAL=true`
- `LABELS=self-hosted,linux,fc-build-linux`
- writable non-root paths under `/home/runner` for .NET, NuGet, XDG cache, and Actions tool cache
- Ruby 3.3.11 seeded into `/home/runner/_tool/Ruby/3.3/x64` from the baked `/opt/runner-toolcache` copy so `ruby/setup-ruby@v1` can discover it on self-hosted `ubuntu-20.04-x64` runners

`github-runner` for `FlowerCore.Common` is single-replica because it retains the original Longhorn ReadWriteOnce NuGet PVC. Every other repo-scoped runner uses two replicas with per-pod `emptyDir` caches. That is the safe backlog-drain strategy: no two pods share one RWO PVC.

Sprint 32 final long-tail wave adds 16 two-replica Deployments: `FlowerCore.Knowledge`, `FlowerCore.LlmBridge`, `FlowerCore.Media`, `FlowerCore.Presentations`, `FlowerCore.RemoteDesktop`, `FlowerCore.DNS`, `FlowerCore.Distribution`, `FlowerCore.Scoreboard`, `FlowerCore.SegmentDisplay`, `FlowerCore.Signage.Contracts`, `FlowerCore.SignalControl`, `FlowerCore.Intranet.Web`, `FlowerCore.Provisioning`, `FlowerCore.Redis`, `FlowerCore.MessageBoard`, and `FlowerCore.MenuBoard`.

## Image Build

Ruby is baked with a pinned `ruby-build` release and Ruby patch version. The pod still mounts an `emptyDir` over `/home/runner`, so the `setup-runner-home` init container copies the baked toolcache from `/opt/runner-toolcache/Ruby` into `/home/runner/_tool/Ruby` before the runner container starts.

```bash
cd apps/github-runner
podman build -t localhost/fc-github-runner:v20260520-ruby3.3.11 .
podman run --rm localhost/fc-github-runner:v20260520-ruby3.3.11 ruby -v
podman run --rm localhost/fc-github-runner:v20260520-ruby3.3.11 \
  test -f /opt/runner-toolcache/Ruby/3.3/x64.complete
podman save localhost/fc-github-runner:v20260520-ruby3.3.11 \
  -o fc-github-runner-v20260520-ruby3.3.11.tar
```

Import the saved image on every schedulable RKE2 node before ArgoCD rolls the Deployments:

```bash
for node in rke2-server rke2-agent1 rke2-agent2; do
  scp fc-github-runner-v20260520-ruby3.3.11.tar "$node:/tmp/"
  ssh "$node" 'sudo ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images rm localhost/fc-github-runner:v20260520-ruby3.3.11 || true'
  ssh "$node" 'sudo ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-github-runner-v20260520-ruby3.3.11.tar'
done
```

## Post-Merge Proof

After the PR is merged and ArgoCD syncs, verify the runner fleet:

```bash
kubectl -n github-runner get deploy,pods,pvc
```

Verify the Ruby toolcache in a fresh pod:

```bash
kubectl -n github-runner exec deploy/github-runner-puppet -c runner -- ruby -v
kubectl -n github-runner exec deploy/github-runner-puppet -c runner -- sh -c \
  'echo "$RUNNER_TOOL_CACHE" && test -f "$RUNNER_TOOL_CACHE/Ruby/3.3/x64.complete"'
```

Verify GitHub registration for the repo-scoped runners:

```bash
for repo in FlowerCore.Common FlowerCore.Shared.Pos FlowerCore.Puppet FlowerCore.Signage \
  FlowerCore.DMS FlowerCore.Telephony FlowerCore.Print.Web FlowerCore.Chat \
  FlowerCore.MySQL FlowerCore.Kiosk.Linux FlowerCore.Marquee FlowerCore.TtsReader \
  FlowerCore.Knowledge FlowerCore.LlmBridge FlowerCore.Media \
  FlowerCore.Presentations FlowerCore.RemoteDesktop FlowerCore.DNS \
  FlowerCore.Distribution FlowerCore.Scoreboard FlowerCore.SegmentDisplay \
  FlowerCore.Signage.Contracts FlowerCore.SignalControl FlowerCore.Intranet.Web \
  FlowerCore.Provisioning FlowerCore.Redis FlowerCore.MessageBoard \
  FlowerCore.MenuBoard; do
  echo "=== $repo ==="
  gh api "/repos/astoltz/$repo/actions/runners" \
    --jq '.runners[] | select(.labels[].name == "fc-build-linux") | {name,status,busy,labels:[.labels[].name]}'
done
```

Shared.Pos publish proof after the runner pod is online:

```bash
gh run list --repo astoltz/FlowerCore.Shared.Pos \
  --workflow "Build, Test & Publish" --branch main --limit 5
```

If the latest run is still queued after runner registration, rerun the workflow from GitHub Actions and verify it lands on an `rke2-linux-*` runner.

## Failure Notes

- `actions/setup-dotnet` permission error at `/usr/share/dotnet`: check that `DOTNET_INSTALL_DIR=/home/runner/.dotnet` and related cache env vars are present on the runner pod.
- `ruby/setup-ruby@v1` says self-hosted runners must install Ruby in `$RUNNER_TOOL_CACHE`: check that the init container copied `/opt/runner-toolcache/Ruby` into `/home/runner/_tool/Ruby` and that `/home/runner/_tool/Ruby/3.3/x64.complete` exists.
- `404` during runner registration: the fine-grained PAT is valid but missing repository access for that repo. Add the repo to the PAT access list; the PAT value does not change.
- `Multi-Attach` volume error: only the Common runner uses a RWO PVC and it must stay single-replica. New multi-replica runners use `emptyDir`.
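
The first two failure notes above can be checked together from inside a runner container. The following is a minimal sketch, not something that ships in this repo: the function name `check_runner_home` and its two positional arguments are hypothetical, and it only tests the marker file and the `DOTNET_INSTALL_DIR` value named in those notes.

```bash
# Hypothetical helper (not part of this repo): checks the Ruby toolcache
# marker and the .NET install dir described in the failure notes.
# Usage: check_runner_home TOOL_CACHE_DIR DOTNET_DIR
check_runner_home() {
  local tool_cache="$1" dotnet_dir="$2" rc=0

  # Marker file named in the ruby/setup-ruby failure note.
  if [ -f "$tool_cache/Ruby/3.3/x64.complete" ]; then
    echo "ok: Ruby toolcache marker present"
  else
    echo "FAIL: missing $tool_cache/Ruby/3.3/x64.complete"
    rc=1
  fi

  # setup-dotnet must install under the writable runner home,
  # not /usr/share/dotnet.
  if [ "$dotnet_dir" = "/home/runner/.dotnet" ]; then
    echo "ok: DOTNET_INSTALL_DIR=$dotnet_dir"
  else
    echo "FAIL: DOTNET_INSTALL_DIR is '${dotnet_dir:-unset}', expected /home/runner/.dotnet"
    rc=1
  fi

  # Non-zero when any check failed, so callers can gate on it.
  return "$rc"
}
```

Inside the pod this would be called as `check_runner_home "$RUNNER_TOOL_CACHE" "$DOTNET_INSTALL_DIR"`. A non-zero exit status means at least one check failed, and the printed lines say which one.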