bluejay-infra/apps/github-runner
Andrew Stoltz a36036aa1a feat(github-runner): add Marquee + TtsReader per-repo runners
Closes the Marquee 1.2.0 publish queue (PR #7 on Marquee, scheme aligned
with FC.Common canonical csproj-baked Version) and the TtsReader audio-404
FcAlert CI queue (PR #18 on TtsReader merged from this session). Mirrors
the Sprint 29 Cx-1 per-repo Deployment pattern from PR #5 verbatim.

Marquee + TtsReader weren't in the Sprint 29 Cx-1 top-8 cost-driven set
but have operator-relevant CI queued. Long-tail repos (Common shared
sublibs / Distribution / DNS / Knowledge / LlmBridge / Media / Presentations
/ RemoteDesktop / SegmentDisplay / Signage.Contracts / SignalControl / etc.)
still deferred to Q-CI-58 Sprint 31+ batch as planned.
2026-05-17 22:26:28 -05:00

GitHub Runner Fleet

ArgoCD owns apps/github-runner/github-runner.yaml. Do not patch live runner Deployments with kubectl; update this manifest and let ArgoCD reconcile.

Runner Shape

All repo-scoped Linux runners use:

  • ACCESS_TOKEN from the github-runner-token Secret
  • RUN_AS_ROOT=false
  • EPHEMERAL=true
  • LABELS=self-hosted,linux,fc-build-linux
  • writable non-root paths under /home/runner for .NET, NuGet, XDG cache, and Actions tool cache
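The literal settings above can be spot-checked against a live Deployment without changing it. This is a minimal sketch, not part of the manifest: the helper name check_runner_env is hypothetical, and it assumes the three literal values are plain env entries on the runner container, so they appear as NAME=value lines. ACCESS_TOKEN is deliberately left out because it comes from a Secret reference rather than a literal value.

```shell
# Hypothetical helper: reads NAME=value lines on stdin and reports any
# runner setting that is absent or differs from the values in this README.
check_runner_env() {
  local input want missing=0
  input="$(cat)"
  for want in \
    'RUN_AS_ROOT=false' \
    'EPHEMERAL=true' \
    'LABELS=self-hosted,linux,fc-build-linux'; do
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! printf '%s\n' "$input" | grep -qxF -- "$want"; then
      echo "missing or different: $want" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (read-only; the Deployment name is taken from this README):
#   kubectl -n github-runner set env deploy/github-runner-sharedpos --list | check_runner_env
```

The check only reads; any drift it reports should be fixed in the manifest so ArgoCD can reconcile it.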

The github-runner Deployment for FlowerCore.Common stays at a single replica because it retains the original Longhorn ReadWriteOnce (RWO) NuGet PVC, which only one pod can mount at a time. github-runner-sharedpos and the top Linux-cost repo runners use two replicas, each pod with its own emptyDir caches. This is what makes draining the backlog with extra replicas safe: no two pods ever share one RWO PVC.
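The replica rule above can be checked mechanically. The sketch below is an illustration under stated assumptions: the helper name find_shared_pvc_risks is hypothetical, it assumes PVCs are declared under spec.template.spec.volumes of each Deployment, and it conservatively treats every PVC as RWO.

```shell
# Hypothetical helper: reads "name<TAB>replicas<TAB>pvc-claims" rows on stdin
# and prints every Deployment that mounts a PVC while running more than one
# replica. Exits 1 if any such Deployment is found, 0 otherwise.
find_shared_pvc_risks() {
  awk -F '\t' '
    $2 + 0 > 1 && $3 != "" { print $1; found = 1 }
    END { exit (found ? 1 : 0) }
  '
}

# Example (read-only): build the rows with a kubectl jsonpath template.
#   kubectl -n github-runner get deploy -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.replicas}{"\t"}{.spec.template.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' \
#     | find_shared_pvc_risks
```

An empty result with exit status 0 is the expected state for the fleet described here.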

Post-Merge Proof

After the PR is merged and ArgoCD syncs, verify the runner fleet:

kubectl -n github-runner get deploy,pods,pvc

Verify GitHub registration for the repo-scoped runners:

for repo in FlowerCore.Common FlowerCore.Shared.Pos FlowerCore.Puppet FlowerCore.Signage \
            FlowerCore.DMS FlowerCore.Telephony FlowerCore.Print.Web FlowerCore.Chat \
            FlowerCore.MySQL FlowerCore.Kiosk.Linux; do
  echo "=== $repo ==="
  gh api "/repos/astoltz/$repo/actions/runners" \
    --jq '.runners[] | select(any(.labels[]; .name == "fc-build-linux")) | {name,status,busy,labels:[.labels[].name]}'
done

Shared.Pos publish proof after the runner pod is online:

gh run list --repo astoltz/FlowerCore.Shared.Pos \
  --workflow "Build, Test & Publish" --branch main --limit 5

If the latest run is still queued after runner registration, rerun the workflow from GitHub Actions and verify it lands on an rke2-linux-* runner.
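To confirm where a rerun actually landed, the job list of the run can be compared against the runner naming above. This is a sketch only: the helper names is_fleet_runner and report_off_fleet_jobs are hypothetical, RUN_ID stands for the numeric ID of the rerun, and the example relies on the runner_name field that the GitHub REST API returns for workflow jobs.

```shell
# Hypothetical helper: succeeds only if the runner name matches the
# rke2-linux-* naming mentioned in this README.
is_fleet_runner() {
  case "$1" in
    rke2-linux-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: reads "job<TAB>runner" rows on stdin and reports
# every job that ran elsewhere or has no runner assigned yet.
report_off_fleet_jobs() {
  local job runner status=0
  while IFS="$(printf '\t')" read -r job runner; do
    if ! is_fleet_runner "$runner"; then
      echo "job '$job' ran on '${runner:-<unassigned>}'"
      status=1
    fi
  done
  return "$status"
}

# Example (RUN_ID is a placeholder for the rerun's ID):
#   gh api "/repos/astoltz/FlowerCore.Shared.Pos/actions/runs/$RUN_ID/jobs" \
#     --jq '.jobs[] | [.name, (.runner_name // "")] | @tsv' | report_off_fleet_jobs
```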

Failure Notes

  • actions/setup-dotnet permission error at /usr/share/dotnet: check that DOTNET_INSTALL_DIR=/home/runner/.dotnet and related cache env vars are present on the runner pod.
  • 404 during runner registration: the fine-grained PAT is valid but missing repository access for that repo. Add the repo to the PAT access list; the PAT value does not change.
  • Multi-Attach volume error: only the Common runner uses a RWO PVC and it must stay single-replica. New multi-replica runners use emptyDir.
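For the 404 case, one way to find which repos are missing from the PAT access list is to probe each repo with the same token. This is a rough sketch, not a documented procedure: the helper name pat_missing_repos is hypothetical, it uses the runner-listing endpoint as a proxy for registration access, and it reports any failed request, not only 404s.

```shell
# Hypothetical helper: prints each repo whose runner list cannot be read
# with the token gh is currently using (for example via GH_TOKEN).
# Usage: pat_missing_repos OWNER REPO...
pat_missing_repos() {
  local owner="$1" repo rc=0
  shift
  for repo in "$@"; do
    # gh api exits non-zero on HTTP errors such as 404.
    if ! gh api "/repos/$owner/$repo/actions/runners" --silent >/dev/null 2>&1; then
      echo "$repo"
      rc=1
    fi
  done
  return "$rc"
}

# Example (export GH_TOKEN with the runner PAT first):
#   pat_missing_repos astoltz FlowerCore.Common FlowerCore.Shared.Pos
```

Any repo printed here is a candidate for the PAT access list; a network or rate-limit failure would show up the same way, so confirm against the error in the runner pod logs.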