bluejay-infra/apps/github-runner/README.md
2026-05-18 17:44:29 -05:00


GitHub Runner Fleet

ArgoCD owns apps/github-runner/github-runner.yaml. Do not patch live runner Deployments with kubectl; update this manifest and let ArgoCD reconcile.

Runner Shape

All repo-scoped Linux runners use:

  • ACCESS_TOKEN from the github-runner-token Secret
  • RUN_AS_ROOT=false
  • EPHEMERAL=true
  • LABELS=self-hosted,linux,fc-build-linux
  • writable non-root paths under /home/runner for .NET, NuGet, XDG cache, and Actions tool cache
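The runner shape above can be spot-checked from inside a pod. The following is a minimal sketch that assumes the runner image ships the `env` command. `check_runner_env` is a hypothetical helper name. Of the cache variables, it checks only DOTNET_INSTALL_DIR, because that is the only one this README names. `<runner-deployment>` is a placeholder.

```shell
# Hypothetical helper: compare a runner pod's environment with the runner shape.
# Reads KEY=VALUE lines on stdin, prints one line per mismatch, and returns
# non-zero if anything is off. The ACCESS_TOKEN value itself is never printed.
check_runner_env() {
  local line key value failures=0
  local have_token=no run_as_root="" ephemeral="" labels="" dotnet_dir=""
  while IFS= read -r line; do
    key=${line%%=*}
    value=${line#*=}
    case $key in
      ACCESS_TOKEN) if [ -n "$value" ]; then have_token=yes; fi ;;
      RUN_AS_ROOT) run_as_root=$value ;;
      EPHEMERAL) ephemeral=$value ;;
      LABELS) labels=$value ;;
      DOTNET_INSTALL_DIR) dotnet_dir=$value ;;
    esac
  done
  if [ "$have_token" != yes ]; then echo "ACCESS_TOKEN missing or empty"; failures=$((failures + 1)); fi
  if [ "$run_as_root" != false ]; then echo "RUN_AS_ROOT should be false"; failures=$((failures + 1)); fi
  if [ "$ephemeral" != true ]; then echo "EPHEMERAL should be true"; failures=$((failures + 1)); fi
  if [ "$labels" != "self-hosted,linux,fc-build-linux" ]; then
    echo "unexpected LABELS: $labels"; failures=$((failures + 1))
  fi
  # .NET must install under the non-root home directory, not /usr/share/dotnet.
  case $dotnet_dir in
    /home/runner/*) ;;
    *) echo "DOTNET_INSTALL_DIR not under /home/runner: $dotnet_dir"; failures=$((failures + 1)) ;;
  esac
  [ "$failures" -eq 0 ]
}

# Usage (needs cluster access):
#   kubectl -n github-runner exec deploy/<runner-deployment> -- env | check_runner_env
```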

The github-runner Deployment for FlowerCore.Common is single-replica because it keeps the original Longhorn ReadWriteOnce (RWO) NuGet PVC. Every other repo-scoped runner uses two replicas with per-pod emptyDir caches. This is the safe way to drain a job backlog: no two pods ever share one RWO PVC.
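The RWO rule can be checked mechanically. This sketch assumes, as described above, that the Common runner's NuGet PVC is the only PVC any runner Deployment mounts. Under that assumption, it treats "mounts a PVC" as "mounts an RWO PVC" and flags any such Deployment that runs more than one replica. `check_rwo_replicas` is a hypothetical helper name.

```shell
# Hypothetical helper: flag Deployments that mount a PVC but run more than one replica.
# Input lines: <deployment> <replicas> [<claimName> ...]
# Assumes every PVC mounted here is ReadWriteOnce, so replicas > 1 is unsafe.
check_rwo_replicas() {
  local name replicas claims bad=0
  while read -r name replicas claims; do
    [ -n "$name" ] || continue
    if [ -n "$claims" ] && [ "${replicas:-1}" -gt 1 ]; then
      echo "$name: $replicas replicas share PVC(s) $claims"
      bad=1
    fi
  done
  [ "$bad" -eq 0 ]
}

# Usage (needs cluster access; jsonpath skips volumes without a PVC):
#   kubectl -n github-runner get deploy -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.replicas}{" "}{.spec.template.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' \
#     | check_rwo_replicas
```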

The Sprint 32 final long-tail wave added 16 two-replica Deployments: FlowerCore.Knowledge, FlowerCore.LlmBridge, FlowerCore.Media, FlowerCore.Presentations, FlowerCore.RemoteDesktop, FlowerCore.DNS, FlowerCore.Distribution, FlowerCore.Scoreboard, FlowerCore.SegmentDisplay, FlowerCore.Signage.Contracts, FlowerCore.SignalControl, FlowerCore.Intranet.Web, FlowerCore.Provisioning, FlowerCore.Redis, FlowerCore.MessageBoard, and FlowerCore.MenuBoard.

Sprint 37 Cx-2 closes the audited Linux runner gaps for FlowerCore.DeviceManagement and FlowerCore.WorldBuilder with the same two-replica emptyDir pattern.

Post-Merge Proof

After the PR is merged and ArgoCD syncs, verify the runner fleet:

kubectl -n github-runner get deploy,pods,pvc
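To read that output faster, a small filter can list only the Deployments whose READY column (ready/desired) is short. This is a sketch that parses `kubectl get deploy --no-headers`, so pass `deploy` alone rather than `deploy,pods,pvc`. `deployments_not_ready` is a hypothetical helper name. With EPHEMERAL=true, a runner pod that has just finished a job may briefly show as not ready, so re-run the check before treating a mismatch as a failure.

```shell
# Hypothetical helper: print Deployments whose READY count is below the desired count.
# Reads `kubectl get deploy --no-headers` lines: NAME READY UP-TO-DATE AVAILABLE AGE.
deployments_not_ready() {
  local name ready rest have want bad=0
  while read -r name ready rest; do
    [ -n "$name" ] || continue
    have=${ready%/*}   # ready replicas, left of the slash
    want=${ready#*/}   # desired replicas, right of the slash
    if [ "$have" != "$want" ]; then
      echo "$name $ready"
      bad=1
    fi
  done
  [ "$bad" -eq 0 ]
}

# Usage (needs cluster access):
#   kubectl -n github-runner get deploy --no-headers | deployments_not_ready
```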

Verify GitHub registration for the repo-scoped runners:

for repo in FlowerCore.Common FlowerCore.Shared.Pos FlowerCore.Puppet FlowerCore.Signage \
            FlowerCore.DMS FlowerCore.Telephony FlowerCore.Print.Web FlowerCore.Chat \
            FlowerCore.MySQL FlowerCore.Kiosk.Linux FlowerCore.Marquee FlowerCore.TtsReader \
            FlowerCore.Knowledge FlowerCore.LlmBridge FlowerCore.Media \
            FlowerCore.Presentations FlowerCore.RemoteDesktop FlowerCore.DNS \
            FlowerCore.Distribution FlowerCore.Scoreboard FlowerCore.SegmentDisplay \
            FlowerCore.Signage.Contracts FlowerCore.SignalControl FlowerCore.Intranet.Web \
            FlowerCore.Provisioning FlowerCore.Redis FlowerCore.MessageBoard \
            FlowerCore.MenuBoard FlowerCore.DeviceManagement FlowerCore.WorldBuilder; do
  echo "=== $repo ==="
  gh api "/repos/astoltz/$repo/actions/runners" \
    --jq '.runners[] | select(.labels[].name == "fc-build-linux") | {name,status,busy,labels:[.labels[].name]}'
done
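With 30 repos, missing registrations are easy to miss in the loop's JSON output. The following sketch reduces the check to a list of repos that have no online fc-build-linux runner, including repos that returned no runners at all. It relies on the `status` field (`online`/`offline`) of the runners API response. `repos_without_online_runner` and the shortened `repos` variable in the usage comment are hypothetical.

```shell
# Hypothetical helper: print expected repos that have no online fc-build-linux runner.
# Arguments: expected repo names. Stdin: "<repo> <status>" lines, one per runner.
repos_without_online_runner() {
  local online repo status missing=0
  # Collect the repos that report at least one online runner.
  online=$(while read -r repo status; do
    if [ "$status" = online ]; then echo "$repo"; fi
  done)
  for repo in "$@"; do
    if ! printf '%s\n' "$online" | grep -qxF -- "$repo"; then
      echo "$repo"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage (needs gh auth; extend repos with the full list from the loop above):
#   repos="FlowerCore.Common FlowerCore.Shared.Pos"
#   for repo in $repos; do
#     gh api "/repos/astoltz/$repo/actions/runners" \
#       --jq '.runners[] | select(.labels[].name == "fc-build-linux") | .status' \
#       | sed "s/^/$repo /"
#   done | repos_without_online_runner $repos
```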

To prove that FlowerCore.Shared.Pos can publish once its runner pod is online, list the latest main-branch runs of its publish workflow:

gh run list --repo astoltz/FlowerCore.Shared.Pos \
  --workflow "Build, Test & Publish" --branch main --limit 5

If the latest run is still queued after runner registration, rerun the workflow from GitHub Actions and verify it lands on an rke2-linux-* runner.
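Both conditions in that step, whether the run is still queued and which runner picked it up, can be checked without opening the GitHub UI. The following sketch reads per-job `status` and `runner_name` from the REST endpoint `/repos/{owner}/{repo}/actions/runs/{run_id}/jobs`. The `-` placeholder for an unassigned runner and the helper name `check_run_jobs` are assumptions of this sketch.

```shell
# Hypothetical helper: check the jobs of one workflow run.
# Stdin: "<status> <runner_name>" lines, with "-" for a runner not yet assigned.
# Returns 2 if any job is still queued, 1 if a job ran outside rke2-linux-*, else 0.
check_run_jobs() {
  local status runner queued=0 off_fleet=0
  while read -r status runner; do
    [ -n "$status" ] || continue
    if [ "$status" = queued ]; then
      echo "queued: no runner has picked up this job yet"
      queued=1
      continue
    fi
    case $runner in
      rke2-linux-*) ;;
      *) echo "unexpected runner: $runner"; off_fleet=1 ;;
    esac
  done
  [ "$queued" -eq 0 ] || return 2
  [ "$off_fleet" -eq 0 ] || return 1
  return 0
}

# Usage (needs gh auth):
#   repo=astoltz/FlowerCore.Shared.Pos
#   run_id=$(gh run list --repo "$repo" --workflow "Build, Test & Publish" \
#     --branch main --limit 1 --json databaseId --jq '.[0].databaseId')
#   gh api "/repos/$repo/actions/runs/$run_id/jobs" \
#     --jq '.jobs[] | "\(.status) \(.runner_name // "-")"' | check_run_jobs
```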

Sprint 37 Cx-2 Gap Audit

The 2026-05-18 GitHub workflow scan found 10 remaining repos whose workflows use runs-on: [self-hosted, linux, fc-build-linux] but that have no K8s runner Deployment: FlowerCore.AiStation.Linux, FlowerCore.PHP, FlowerCore.PiManager, FlowerCore.Shared.Barcodes, FlowerCore.Shared.Lookup, FlowerCore.Shared.Nodes, FlowerCore.Shared.PrintClient, FlowerCore.Shared.Relay, FlowerCore.Shared.ShowRunner, and FlowerCore.Shared.Storage.
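The audit amounts to a set difference between repos that request the fc-build-linux label and repos that already have a runner Deployment. This README does not say how either list was produced. The sketch below therefore assumes two plain-text files with one repo name per line. `runner_gaps` is a hypothetical helper name.

```shell
# Hypothetical helper: print repos listed in $1 but not in $2.
# $1: repos whose workflows use runs-on: [self-hosted, linux, fc-build-linux]
# $2: repos that already have a K8s runner Deployment
runner_gaps() {
  local rc=0
  sort -u -- "$1" | grep -vxF -f "$2" || rc=$?
  # grep exits 1 when it prints nothing, i.e. every repo already has a runner.
  [ "$rc" -le 1 ]
}
```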

Mixed/platform repos also have Linux workflow legs, but they need owner review before any Linux runner Deployment is added: FlowerCore.Library.Mac, FlowerCore.Signage.Agent.AppleTv, and FlowerCore.Signage.Player.Wpf.

Failure Notes

  • actions/setup-dotnet permission error at /usr/share/dotnet: the runner runs as non-root (RUN_AS_ROOT=false) and cannot write there. Check that DOTNET_INSTALL_DIR=/home/runner/.dotnet and the related cache env vars are present on the runner pod.
  • 404 during runner registration: the fine-grained PAT is valid but lacks repository access for that repo. Add the repo to the PAT's repository access list. The PAT value does not change, so the github-runner-token Secret needs no update.
  • Multi-Attach volume error: only the Common runner uses an RWO PVC, and it must stay single-replica. New multi-replica runners must use emptyDir.
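The three notes can be matched against captured text as a first triage step. Multi-Attach errors usually show up in pod events (`kubectl describe pod`), registration 404s in runner logs, and setup-dotnet failures in the Actions job log. The following is a sketch only: the match patterns are loose heuristics, not exact GitHub or Kubernetes message formats. `triage_runner_failure` and `<runner-pod>` are hypothetical.

```shell
# Hypothetical helper: map log or event text on stdin to one of the failure notes.
# Patterns are heuristics; order matters because the first match wins.
triage_runner_failure() {
  local text
  text=$(cat)
  case $text in
    *Multi-Attach*)
      echo "Multi-Attach: keep the Common runner single-replica; other runners use emptyDir" ;;
    */usr/share/dotnet*)
      echo "setup-dotnet: check DOTNET_INSTALL_DIR=/home/runner/.dotnet and the cache env vars" ;;
    *404*)
      echo "404: add the repo to the fine-grained PAT's repository access list" ;;
    *)
      echo "no known failure signature"
      return 1 ;;
  esac
}

# Usage (needs cluster access):
#   kubectl -n github-runner describe pod <runner-pod> | triage_runner_failure
```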