bluejay-infra/apps at bacac067cff81808f9c6ae676c8fb0f7f355c8d6

bluejay/bluejay-infra

Andrew Stoltz bacac067cf monitoring(irc-notify): hourly digest batching for thermal printer

The thermal printer drained overnight (2026-05-18/19) because the old
notify.py POSTed one print job per Grafana webhook fire. With 9
concurrently-firing alerts (zabbix-postgres + fc-devicemgmt + brochure
+ PrintPaperRollLow), every evaluation cycle stamped fresh CUPS jobs
onto the queue until the operator physically powered the printer off.

This refactor:

- Adds env-var config: THERMAL_PRINT_ENABLED (master kill switch),
  BATCH_INTERVAL_MIN (default 60), BATCH_MAX_PENDING (default 50).
- IRC delivery stays per-event (operator wants the live stream).
- Thermal routing now:
  * critical/disaster/page severity OR alert_channel=thermal_print_immediate
    -> print immediately
  * alert_channel=thermal_print -> enqueue into hourly digest
  * RESOLVED -> remove from digest buffer (no resolution-spam prints)
  * else -> IRC only, no thermal
- Background digest_loop thread flushes the buffer hourly (or sooner
  if buffer hits BATCH_MAX_PENDING). Digest payload is a single
  Print.Web /api/print/alert POST listing distinct alertnames + per-rule
  target counts.
- New POST /flush endpoint (manual operator force-flush; useful for
  testing without waiting an hour).
- GET / returns config + buffer depth + per-stat counters for observability.
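The thermal routing rules above can be sketched as a single pure function. This is an illustration only, not the code in notify.py: the names `Route`, `IMMEDIATE_SEVERITIES`, and `route_thermal` are hypothetical, and two behaviours are assumptions the commit message does not spell out: that RESOLVED events are checked before severity, and that THERMAL_PRINT_ENABLED=false degrades everything to IRC only.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical routing outcomes for the thermal printer path."""
    PRINT_NOW = "print_now"   # print immediately
    ENQUEUE = "enqueue"       # add to the hourly digest buffer
    DEQUEUE = "dequeue"       # drop from the digest buffer, print nothing
    IRC_ONLY = "irc_only"     # no thermal output at all


# Severities that bypass batching, as listed in the commit message.
IMMEDIATE_SEVERITIES = {"critical", "disaster", "page"}


def route_thermal(status: str, severity: str, alert_channel: str,
                  thermal_enabled: bool = True) -> Route:
    """Decide what the thermal path does with one alert event.

    IRC delivery is per-event regardless of the value returned here.
    """
    # Assumption: the master kill switch suppresses all thermal output.
    if not thermal_enabled:
        return Route.IRC_ONLY
    # Assumption: resolutions are handled first so they never print.
    if status.lower() == "resolved":
        return Route.DEQUEUE
    if (severity.lower() in IMMEDIATE_SEVERITIES
            or alert_channel == "thermal_print_immediate"):
        return Route.PRINT_NOW
    if alert_channel == "thermal_print":
        return Route.ENQUEUE
    return Route.IRC_ONLY
```

Keeping the decision free of side effects means the routing table can be unit-tested without a printer, an IRC connection, or a running digest thread.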

Net effect: max 1 thermal print per BATCH_INTERVAL_MIN for batched
warnings, plus immediate prints for criticals. Closes the 2026-05-18/19
alert-storm incident.
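A minimal sketch of the digest buffer behind that bound, under stated assumptions: `DigestBuffer` is a hypothetical name, keying pending alerts on an (alertname, target) pair is a guess at how distinct alerts are identified, and the Print.Web POST and the digest_loop timer are left out. Only the env-var names and defaults come from the commit message.

```python
import os
import threading
from collections import Counter

# Env-var names and defaults as described in the commit message.
BATCH_INTERVAL_MIN = int(os.environ.get("BATCH_INTERVAL_MIN", "60"))
BATCH_MAX_PENDING = int(os.environ.get("BATCH_MAX_PENDING", "50"))


class DigestBuffer:
    """Thread-safe set of pending alerts awaiting the next digest print."""

    def __init__(self, max_pending: int = BATCH_MAX_PENDING) -> None:
        self._lock = threading.Lock()
        # Assumption: one entry per (alertname, target); re-fires of the
        # same alert on later evaluation cycles do not grow the buffer.
        self._pending: set[tuple[str, str]] = set()
        self.max_pending = max_pending

    def add(self, alertname: str, target: str) -> bool:
        """Enqueue an alert; return True if an early flush is due."""
        with self._lock:
            self._pending.add((alertname, target))
            return len(self._pending) >= self.max_pending

    def remove(self, alertname: str, target: str) -> None:
        """Drop a resolved alert so it never reaches paper."""
        with self._lock:
            self._pending.discard((alertname, target))

    def flush(self) -> dict[str, int]:
        """Empty the buffer and return target counts per alertname."""
        with self._lock:
            counts = Counter(name for name, _ in self._pending)
            self._pending.clear()
        return dict(sorted(counts.items()))
```

A background thread would call `flush()` every BATCH_INTERVAL_MIN minutes, or as soon as `add()` returns True, and send one print job only when the returned mapping is non-empty.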

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-19 09:56:14 -05:00

fix(agent-zero): export knowledge mcp gate to python builder

2026-04-29 23:32:55 -05:00

Fix Traefik dashboard cert issuer: step-ca-acme

2026-03-10 01:12:08 -05:00

K8s manifest hardening + new bluejay-infra-lint test project

2026-05-04 03:18:04 -05:00

fix(brochure): scale brochure-web to 0 — wrong codebase shipped (Intranet.Web binary in fc-brochure-web image, CrashLoopBackOff 296 restarts on /data read-only). Re-enable after Sprint 34 Cx-3 rebuild per docs/ai-agents/codex-prompts/2026-05-18-fc-brochure-web-rebuild-pack.md

2026-05-19 14:45:01 +00:00

feat(infra): add Multus CNI + CDI + PROD VLAN 57 NAD as GitOps prereqs for ci1

2026-05-08 13:05:58 -05:00

Fix Traefik dashboard cert issuer: step-ca-acme

2026-03-10 01:12:08 -05:00

edge2-services: print.iamworkin.lan Traefik HTTPS for Print.Web (XL Track C)

2026-04-26 14:37:33 -05:00

Fix Traefik dashboard cert issuer: step-ca-acme

2026-03-10 01:12:08 -05:00

feat(chat): add public twin ingress (#11)

2026-05-18 04:52:20 +00:00

fix(fc-desktop): land 4 NetworkPolicies into bluejay-infra (was deploy-script-only)

2026-05-07 10:27:20 -05:00

feat(fc-devicemgmt): add Kubernetes deployment manifests (#1)

2026-05-18 02:56:23 +00:00

fc-distribution

fix(certs): kill cert-manager renewal loop on 3 broken Certificate specs

2026-05-07 15:32:00 -05:00

Add signage service ingress manifests

2026-04-09 15:09:08 -05:00

Add step-ca TLS certs for mysql, php, desktop, signage, fc-landing

2026-04-08 18:20:23 -05:00

K8s manifest hardening + new bluejay-infra-lint test project

2026-05-04 03:18:04 -05:00

apps: fc-chat refactor + fc-menuboard app split

2026-04-16 19:25:25 -05:00

fc-messageboard

K8s manifest hardening + new bluejay-infra-lint test project

2026-05-04 03:18:04 -05:00

Add step-ca TLS certs for mysql, php, desktop, signage, fc-landing

2026-04-08 18:20:23 -05:00

Add step-ca TLS certs for mysql, php, desktop, signage, fc-landing

2026-04-08 18:20:23 -05:00

fc-presentations

Add signage service ingress manifests

2026-04-09 15:09:08 -05:00

feat(fc-redis): add SignalR backplane for cross-product event bus (Q-SO-1 Phase A)

2026-05-11 19:02:58 -05:00

Add signage service ingress manifests

2026-04-09 15:09:08 -05:00

fc-segmentdisplay

feat(fc-segmentdisplay): switch tls certificate to dns01

2026-04-23 18:39:17 -05:00

Add step-ca TLS certs for mysql, php, desktop, signage, fc-landing

2026-04-08 18:20:23 -05:00

fc-signage-appletv

Add Apple TV signage docs manifest

2026-05-13 20:32:48 -05:00

fc-signage-pi-player

Tighten Pi signage HDMI settle coverage (#3)

2026-05-18 02:35:17 +00:00

fc-signalcontrol

K8s manifest hardening + new bluejay-infra-lint test project

2026-05-04 03:18:04 -05:00

ttsreader: deploy study mode repair image

2026-05-18 16:33:08 -05:00

[uc] Phase 1 auth gate deploy v20260509-4162dca-authgate

2026-05-08 21:16:54 -05:00

Fix Traefik dashboard cert issuer: step-ca-acme

2026-03-10 01:12:08 -05:00

Adopt fc-updater into ArgoCD

2026-05-06 17:33:32 -05:00

Update telephony-web image to v20260324d, resolve merge conflicts

2026-03-24 15:55:52 -05:00

feat(github-runner): add final long-tail runners (#9)

2026-05-18 04:52:01 +00:00

fix(guacamole): add --- separator between macmini-vnc-creds OnePasswordItem and guacamole-branding ConfigMap

2026-05-12 09:26:03 -05:00

deploy(intranet): promote brochure wave 1 image

2026-05-08 11:12:56 -05:00

fix(irc): use short name for unrealircd in anope + thelounge configs

2026-04-22 21:23:38 -05:00

fix(certs): kill cert-manager renewal loop on 3 broken Certificate specs

2026-05-07 15:32:00 -05:00

kubevirt-vms: boot ci1 from server template

2026-05-12 16:58:18 -05:00

Update telephony-web image to v20260324d, resolve merge conflicts

2026-03-24 15:55:52 -05:00

statefulsets: align guacamole and matrix drift defaults

2026-04-22 23:11:47 -05:00

monitoring(irc-notify): hourly digest batching for thermal printer

2026-05-19 09:56:14 -05:00

fix(multus): bump kube-multus-ds memory 50Mi/50Mi -> 1Gi/512Mi (prevent OOM cascade)

2026-05-11 10:30:05 -05:00

feat(noc-services): wire puppetdb.iamworkin.lan through Traefik step-ca cert

2026-04-28 15:13:20 -05:00

Add infrastructure manifests for 9 services

2026-03-09 16:35:04 -05:00

fix(selenium): GitOps-capture selenium-netpol (was unmanaged anywhere)

2026-05-07 10:30:59 -05:00

Update telephony-web image to v20260324d, resolve merge conflicts

2026-03-24 15:55:52 -05:00

fix(telephony): init container runs as root to chown hostPath /tmp/tts-audio

2026-05-11 18:37:15 -05:00

traefik-dashboard

Fix Traefik dashboard cert issuer: step-ca-acme

2026-03-10 01:12:08 -05:00

Update telephony-web image to v20260324d, resolve merge conflicts

2026-03-24 15:55:52 -05:00

feat(worldbuilder): pin k8s demo to fake backend (#10)

2026-05-18 04:52:11 +00:00

fix(zabbix): bump web probe timeouts 5s→15s + add failureThreshold

2026-05-15 15:59:04 -05:00