Compare commits

3 Commits

Author SHA1 Message Date
Codex
a0f8fd1790 chore(github-runner): gate token provisioning 2026-05-15 17:32:14 -05:00
Andrew Stoltz
7d2daaa4f8 chore(github-runner): replicas 1 → 0 until 1Password token provisioned
github-runner-token OnePasswordItem exists but the underlying 1Password
vault item hasn't been created yet, so the operator can't mint the K8s
Secret. Pod stuck in CreateContainerConfigError → DeploymentReplicasMismatch
alert fires.

Scaling to 0 keeps the manifest infrastructure intact but stops trying
to schedule until operator:
1. Creates "GitHub Runner Registration Token" item in IAmWorkin vault
2. Generates a token at github.com/astoltz/<repo>/settings/actions/runners/new
3. Updates the OnePasswordItem itemPath to point at it
4. Bumps replicas back to 1 via PR
2026-05-15 16:18:19 -05:00
Andrew Stoltz
e50e103ba0 fix(zabbix): bump web probe timeouts 5s→15s + add failureThreshold
zabbix-web nginx+PHP-FPM container serves / at ~3-5s baseline with
occasional 6-7s spikes (probe path renders full dashboard via PHP).
kube-probe was killing the container after 3 consecutive 5s-timeout
499s, producing CrashLoopBackOff alert noise even though the app
was serving real traffic fine.

15s timeout absorbs the natural variance; explicit failureThreshold=3
documents the policy (was implicit default).

Closes the firing PodCrashLoopBackOff (zabbix-web) + pending
HTTPServiceSlow/HTTPServiceDegraded alerts. zabbix.iamworkin.lan
remains slow at the application layer (separate work — PHP-FPM
warm-up + Zabbix server "host not found" agent lookup spam need
their own fixes) but the pod restart loop stops.
2026-05-15 15:59:04 -05:00
2 changed files with 19 additions and 5 deletions

View File

@@ -24,6 +24,8 @@
 # expire after 1h — use a fine-grained PAT with admin:org_hook scope
 # or a re-registration script. See docs/infrastructure/
 # self-hosted-runner-fleet.md §Security.
+# Until that item exists and the Secret contains key "credential", this
+# deployment intentionally stays at replicas: 0.
 #
 # Security model:
 # - No ClusterRole / ClusterRoleBinding — runner has no K8s API access.
@@ -53,13 +55,18 @@ metadata:
 # 1Password secret sync — creates github-runner-token K8s Secret.
 # Fields expected in the 1Password item:
 # credential — GitHub runner registration token (or PAT for re-reg script)
-# Item path: IAmWorkin vault > "GitHub Runner Registration Token"
-# Operator MUST create this item before the Deployment will start cleanly.
+# Item path convention: vaults/IAmWorkin/items/<exact item title>
+# Current required title: "GitHub Runner Registration Token"
+# Operator MUST create this item before replicas can be raised above 0.
 apiVersion: onepassword.com/v1
 kind: OnePasswordItem
 metadata:
 name: github-runner-token
 namespace: github-runner
+annotations:
+flowercore.io/operator-action: "Create IAmWorkin item 'GitHub Runner Registration Token' with field 'credential'."
+flowercore.io/replica-gate: "Keep Deployment replicas at 0 until github-runner-token Secret exists with key credential."
+flowercore.io/provisioning-status: "awaiting-operator-secret-provisioning"
 labels:
 app.kubernetes.io/component: credentials
 app.kubernetes.io/part-of: flowercore
@@ -100,6 +107,8 @@ kind: Deployment
 metadata:
 name: github-runner
 namespace: github-runner
+annotations:
+flowercore.io/replica-gate: "Scale to 1 only after the 1Password item exists and github-runner-token has key credential."
 labels:
 app.kubernetes.io/name: github-runner
 app.kubernetes.io/component: runner
@@ -111,7 +120,10 @@ spec:
 # one pod at a time. Each pod re-registers as an ephemeral runner after
 # completing a job (EPHEMERAL=true restarts the container, not the pod,
 # so the PVC stays attached between jobs).
-replicas: 1
+# Intentionally 0 while the GitHub runner token item is absent. Follow-up
+# PR should set replicas: 1 only after operator provisioning and Secret
+# sync verification.
+replicas: 0
 selector:
 matchLabels:
 app.kubernetes.io/name: github-runner
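
The replica gate in this change (keep replicas at 0 until the github-runner-token Secret exists with key "credential") could be checked mechanically before the follow-up PR raises replicas. The following is a minimal sketch, not part of this repository: the function name `replica_gate_open` is hypothetical, and it assumes the Secret has already been fetched as a JSON object, for example from `kubectl get secret github-runner-token -n github-runner -o json`.

```python
import base64
import binascii
from typing import Any, Mapping, Optional


def replica_gate_open(secret: Optional[Mapping[str, Any]], key: str = "credential") -> bool:
    """Return True only if the Secret exists and holds a non-empty value for `key`.

    `secret` is the parsed JSON of a Kubernetes Secret, or None when the
    Secret has not been created yet. Hypothetical helper for illustration.
    """
    if not secret:
        return False  # Secret not synced yet, so the gate stays closed.

    # Secret values live base64-encoded under the "data" field.
    encoded = (secret.get("data") or {}).get(key)
    if not encoded:
        return False  # Key missing or empty.

    try:
        decoded = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return False  # Value is not valid base64.

    # Treat a whitespace-only token as absent.
    return bool(decoded.strip())
```

Under these assumptions, replicas would be raised to 1 only when the function returns True for the synced Secret; a missing Secret, a missing key, or an empty value all keep the gate closed.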

View File

@@ -305,15 +305,17 @@ spec:
 path: /
 port: 8080
 initialDelaySeconds: 60
-timeoutSeconds: 5
+timeoutSeconds: 15
 periodSeconds: 10
+failureThreshold: 3
 readinessProbe:
 httpGet:
 path: /
 port: 8080
 initialDelaySeconds: 30
 periodSeconds: 5
-timeoutSeconds: 5
+timeoutSeconds: 15
+failureThreshold: 3
 ---
 apiVersion: v1
 kind: Service
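
The effect of the timeout bump can be illustrated with a simplified model of the consecutive-failure rule: a probe attempt fails when the response takes longer than timeoutSeconds, and the container is restarted once failureThreshold attempts fail in a row. The sketch below is an approximation for illustration only. The function name `would_restart` is hypothetical, and the sample latencies are invented to match the 3-5s baseline and 6-7s spikes described in the commit message; they are not measured data.

```python
from typing import Iterable


def would_restart(latencies_s: Iterable[float], timeout_s: float, failure_threshold: int = 3) -> bool:
    """Return True if `failure_threshold` consecutive probes exceed `timeout_s`.

    Simplified model: each latency is one probe attempt, and any successful
    attempt resets the consecutive-failure counter.
    """
    consecutive_failures = 0
    for latency in latencies_s:
        if latency > timeout_s:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return True
        else:
            consecutive_failures = 0
    return False


# Hypothetical response times in seconds: one baseline hit, three spikes, one baseline hit.
samples = [4.0, 6.5, 6.0, 7.0, 3.5]

print(would_restart(samples, timeout_s=5))   # True: three spikes in a row exceed 5s
print(would_restart(samples, timeout_s=15))  # False: no sample exceeds 15s
```

In this model the same traffic that triggers a restart at a 5s timeout passes cleanly at 15s, which matches the reasoning in the commit message; it does not model periodSeconds or initialDelaySeconds.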