bluejay-infra/apps/github-runner/README.md
2026-05-17 21:50:29 -05:00

GitHub Runner Fleet

ArgoCD owns apps/github-runner/github-runner.yaml. Do not patch live runner Deployments with kubectl; update this manifest and let ArgoCD reconcile.

Runner Shape

All repo-scoped Linux runners use:

  • ACCESS_TOKEN from the github-runner-token Secret
  • RUN_AS_ROOT=false
  • EPHEMERAL=true
  • LABELS=self-hosted,linux,fc-build-linux
  • writable non-root paths under /home/runner for .NET, NuGet, XDG cache, and Actions tool cache
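
As a quick check of the settings listed above, the following is a minimal sketch that compares a pod's environment against expected KEY=VALUE pairs. `check_runner_env` is a hypothetical helper, not something shipped in this repo, and the usage line assumes the runner image provides an `env` binary. It prints only mismatches, so the ACCESS_TOKEN value is never echoed.

```shell
# Hypothetical helper (not part of this repo).
# Reads `env` output on stdin and confirms that each KEY=VALUE argument
# appears as an exact, whole line. Reports mismatches on stderr only.
check_runner_env() {
  env_dump=$(cat)
  status=0
  for pair in "$@"; do
    if ! printf '%s\n' "$env_dump" | grep -Fxq -- "$pair"; then
      echo "missing or different: $pair" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage against the Common runner Deployment:
# kubectl -n github-runner exec deploy/github-runner -- env \
#   | check_runner_env RUN_AS_ROOT=false EPHEMERAL=true \
#       LABELS=self-hosted,linux,fc-build-linux
```

The helper returns non-zero when any pair is absent or has a different value, so it can gate a post-sync script.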

github-runner for FlowerCore.Common stays single-replica because it retains the original Longhorn ReadWriteOnce (RWO) NuGet PVC. github-runner-sharedpos and the top Linux-cost repo runners run two replicas with per-pod emptyDir caches. This is the safe backlog-drain strategy: no two pods ever share one RWO PVC.
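
The replica rule above can be checked mechanically. The sketch below flags any Deployment that has more than one replica and also mounts a PVC. `find_shared_pvc_risks` is a hypothetical helper; it does not inspect access modes, so it treats every PVC as if it were RWO, and the jsonpath in the usage comment assumes kubectl skips volumes that have no persistentVolumeClaim field.

```shell
# Hypothetical helper (not part of this repo).
# Input: tab-separated lines of "name<TAB>replicas<TAB>claim names".
# Prints each Deployment with replicas > 1 and a non-empty claim column,
# and exits non-zero if any such Deployment was found.
find_shared_pvc_risks() {
  awk -F '\t' '
    $2 + 0 > 1 && $3 != "" { print $1; found = 1 }
    END { exit (found ? 1 : 0) }
  '
}

# Usage:
# kubectl -n github-runner get deploy -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.replicas}{"\t"}{.spec.template.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' \
#   | find_shared_pvc_risks
```

No output and a zero exit status means every multi-replica runner is on emptyDir only.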

Post-Merge Proof

After the PR is merged and ArgoCD syncs, verify the runner fleet:

kubectl -n github-runner get deploy,pods,pvc

Verify GitHub registration for the repo-scoped runners:

for repo in FlowerCore.Common FlowerCore.Shared.Pos FlowerCore.Puppet FlowerCore.Signage \
            FlowerCore.DMS FlowerCore.Telephony FlowerCore.Print.Web FlowerCore.Chat \
            FlowerCore.MySQL FlowerCore.Kiosk.Linux; do
  echo "=== $repo ==="
  gh api "/repos/astoltz/$repo/actions/runners" \
    --jq '.runners[] | select(.labels[].name == "fc-build-linux") | {name,status,busy,labels:[.labels[].name]}'
done

Shared.Pos publish proof after the runner pod is online:

gh run list --repo astoltz/FlowerCore.Shared.Pos \
  --workflow "Build, Test & Publish" --branch main --limit 5

If the latest run is still queued after runner registration, rerun the workflow from GitHub Actions and verify it lands on an rke2-linux-* runner.
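
To confirm where a rerun landed without opening the web UI, one option is to read the job list from the REST API and check each `runner_name`. This is a sketch under assumptions: `all_runners_match` is a hypothetical helper, and `RUN_ID` is a placeholder for a run's databaseId (for example from `gh run list --json databaseId`). A job that has not been picked up yet reports `null` as its runner name and is flagged as unexpected.

```shell
# Hypothetical helper (not part of this repo).
# Reads runner names on stdin, one per line, and fails if any name
# does not start with the prefix given as the first argument.
all_runners_match() {
  prefix=$1
  awk -v p="$prefix" '
    index($0, p) != 1 { print "unexpected runner: " $0; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage (RUN_ID is a placeholder you supply):
# gh api "/repos/astoltz/FlowerCore.Shared.Pos/actions/runs/$RUN_ID/jobs" \
#   --jq '.jobs[].runner_name' | all_runners_match rke2-linux-
```

Empty input also passes, so check that the run has at least one job before trusting a zero exit status.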

Failure Notes

  • actions/setup-dotnet permission error at /usr/share/dotnet: check that DOTNET_INSTALL_DIR=/home/runner/.dotnet and related cache env vars are present on the runner pod.
  • 404 during runner registration: the fine-grained PAT is valid but missing repository access for that repo. Add the repo to the PAT access list; the PAT value does not change.
  • Multi-Attach volume error: only the Common runner uses an RWO PVC, and it must stay single-replica. New multi-replica runners use emptyDir.