# GitHub Runner Fleet

ArgoCD owns `apps/github-runner/github-runner.yaml`. Do not patch live runner
Deployments with `kubectl`; update this manifest and let ArgoCD reconcile.

## Runner Shape

All repo-scoped Linux runners use:

- `ACCESS_TOKEN` from the `github-runner-token` Secret
- `RUN_AS_ROOT=false`
- `EPHEMERAL=true`
- `LABELS=self-hosted,linux,fc-build-linux`
- writable non-root paths under `/home/runner` for .NET, NuGet, XDG cache, and
  Actions tool cache

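One way to confirm that a Deployment matches this shape is to dump its literal env values and compare them with the settings listed above. This is a minimal sketch, not part of the manifest: `runner_env` and `check_runner_env` are hypothetical helper names, and it assumes the runner is the first container in the pod template and that these three settings are plain `value:` entries rather than `valueFrom` references.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not from the repo.

# Pure helper: reads NAME=VALUE lines on stdin and reports every expected
# setting that is missing or different. Returns non-zero on any mismatch.
check_runner_env() {
  local actual rc=0 want
  actual="$(cat)"
  for want in \
    "RUN_AS_ROOT=false" \
    "EPHEMERAL=true" \
    "LABELS=self-hosted,linux,fc-build-linux"; do
    if ! grep -qxF -- "$want" <<<"$actual"; then
      echo "MISMATCH: expected $want"
      rc=1
    fi
  done
  return "$rc"
}

# Dump literal env values from a Deployment's first container (assumption).
# Secret-backed vars such as ACCESS_TOKEN print with an empty value here.
runner_env() {
  kubectl -n github-runner get deploy "$1" \
    -o jsonpath='{range .spec.template.spec.containers[0].env[*]}{.name}={.value}{"\n"}{end}'
}
```

Assuming `github-runner` is the Deployment name of the Common runner, `runner_env github-runner | check_runner_env` prints nothing and exits 0 when the three settings match.
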
`github-runner` for `FlowerCore.Common` is single-replica because it retains the
original Longhorn ReadWriteOnce NuGet PVC. Every other repo-scoped runner uses
two replicas with per-pod `emptyDir` caches. That is the safe backlog-drain
strategy: no two pods share one RWO PVC.

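The rule that no two pods share one RWO PVC can be spot-checked from the cluster. The following is a minimal sketch, assuming the caches are declared as pod-template volumes and that `kubectl` prints `<none>` in a custom column when no volume references a PVC. The function names are illustrative, not from the repo.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not from the repo.

# Print one "NAME REPLICAS PVC" row per Deployment in the runner namespace.
list_runner_volumes() {
  kubectl -n github-runner get deploy --no-headers \
    -o custom-columns='NAME:.metadata.name,REPLICAS:.spec.replicas,PVC:.spec.template.spec.volumes[*].persistentVolumeClaim.claimName'
}

# Pure helper: from those rows, print every Deployment that mounts a PVC
# while running more than one replica.
find_rwo_risks() {
  awk '$3 != "<none>" && $2 > 1 { print $1 }'
}
```

Running `list_runner_volumes | find_rwo_risks` should print nothing when only the single-replica Common runner mounts a PVC.
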
The Sprint 32 final long-tail wave adds 16 two-replica Deployments:
`FlowerCore.Knowledge`, `FlowerCore.LlmBridge`, `FlowerCore.Media`,
`FlowerCore.Presentations`, `FlowerCore.RemoteDesktop`, `FlowerCore.DNS`,
`FlowerCore.Distribution`, `FlowerCore.Scoreboard`,
`FlowerCore.SegmentDisplay`, `FlowerCore.Signage.Contracts`,
`FlowerCore.SignalControl`, `FlowerCore.Intranet.Web`,
`FlowerCore.Provisioning`, `FlowerCore.Redis`, `FlowerCore.MessageBoard`, and
`FlowerCore.MenuBoard`.

## Post-Merge Proof

After the PR is merged and ArgoCD syncs, verify the runner fleet:

```bash
kubectl -n github-runner get deploy,pods,pvc
```

Verify GitHub registration for the repo-scoped runners:

```bash
for repo in FlowerCore.Common FlowerCore.Shared.Pos FlowerCore.Puppet FlowerCore.Signage \
    FlowerCore.DMS FlowerCore.Telephony FlowerCore.Print.Web FlowerCore.Chat \
    FlowerCore.MySQL FlowerCore.Kiosk.Linux FlowerCore.Marquee FlowerCore.TtsReader \
    FlowerCore.Knowledge FlowerCore.LlmBridge FlowerCore.Media \
    FlowerCore.Presentations FlowerCore.RemoteDesktop FlowerCore.DNS \
    FlowerCore.Distribution FlowerCore.Scoreboard FlowerCore.SegmentDisplay \
    FlowerCore.Signage.Contracts FlowerCore.SignalControl FlowerCore.Intranet.Web \
    FlowerCore.Provisioning FlowerCore.Redis FlowerCore.MessageBoard \
    FlowerCore.MenuBoard; do
  echo "=== $repo ==="
  gh api "/repos/astoltz/$repo/actions/runners" \
    --jq '.runners[] | select(.labels[].name == "fc-build-linux") | {name,status,busy,labels:[.labels[].name]}'
done
```

Shared.Pos publish proof after the runner pod is online:

```bash
gh run list --repo astoltz/FlowerCore.Shared.Pos \
  --workflow "Build, Test & Publish" --branch main --limit 5
```

If the latest run is still queued after runner registration, rerun the workflow
from GitHub Actions and verify it lands on an `rke2-linux-*` runner.

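To see which runner picked up a rerun, the jobs endpoint of the GitHub REST API reports a `runner_name` per job. The following is a minimal sketch with illustrative helper names (`run_job_runners`, `is_rke2_runner`); it assumes the run ID is taken from the `gh run list` output above.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not from the repo.

# Print job name, runner name, and status (tab-separated) for one run.
run_job_runners() {
  local run_id="$1"
  gh api "/repos/astoltz/FlowerCore.Shared.Pos/actions/runs/$run_id/jobs" \
    --jq '.jobs[] | [.name, .runner_name, .status] | @tsv'
}

# Pure helper: succeed if the runner name matches the rke2-linux-* pattern.
is_rke2_runner() {
  [[ "$1" == rke2-linux-* ]]
}
```

The newest run ID can be read with `gh run list --repo astoltz/FlowerCore.Shared.Pos --limit 1 --json databaseId --jq '.[0].databaseId'` and passed to `run_job_runners`. A job that is still queued has no runner assigned yet, so its runner name is empty.
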
## Failure Notes

- `actions/setup-dotnet` permission error at `/usr/share/dotnet`: check that
  `DOTNET_INSTALL_DIR=/home/runner/.dotnet` and related cache env vars are
  present on the runner pod.
- `404` during runner registration: the fine-grained PAT is valid but missing
  repository access for that repo. Add the repo to the PAT access list; the PAT
  value does not change.
- `Multi-Attach` volume error: only the Common runner uses an RWO PVC and it must
  stay single-replica. New multi-replica runners use `emptyDir`.
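
The first and last notes can be checked from the cluster. The following is a minimal sketch with illustrative helper names; it assumes the runner image provides `printenv` and that Multi-Attach errors are recorded as events with the reason `FailedAttachVolume`.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not from the repo.

# Print DOTNET_INSTALL_DIR as seen inside a pod of the given Deployment.
dotnet_install_dir() {
  kubectl -n github-runner exec "deploy/$1" -- printenv DOTNET_INSTALL_DIR
}

# Pure helper: succeed only for paths under /home/runner.
under_runner_home() {
  [[ "$1" == /home/runner/* ]]
}

# List volume attach failures in the runner namespace.
attach_failures() {
  kubectl -n github-runner get events --field-selector reason=FailedAttachVolume
}
```

For example, `under_runner_home "$(dotnet_install_dir github-runner)" || echo "DOTNET_INSTALL_DIR is not under /home/runner"` flags a pod that would hit the `/usr/share/dotnet` permission error, assuming `github-runner` is the Deployment name.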