monitoring: paper-roll lifecycle alerts (XL Track I)
Three new Prometheus alert rules for the print-services group, all routed to thermal_print via alert_channel label (Grafana contact point -> irc-notify -> Print.Web /api/print/alert): - PrintPaperRollLow (warning, 5-10% remaining, 5m for) - PrintPaperRollCritical (critical, <=5% remaining, 2m for) - PrintJobDeadLetter (warning, any new dead-letter in 15m) Source-of-truth gauge is print_paper_remaining_percent (Print.Web OTEL), which is hydrated from the active PaperRoll row at process startup (Print.Web@<TBD> HydrateMetricsAsync) so the gauge isn't blind for an arbitrary window after every deploy. Self-referential humor: low-roll alerts route to the printer that's running out of paper, so it announces its own paper-out warning on its remaining paper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -642,6 +642,42 @@ data:
|
|||||||
summary: "Print queue backlog on edge2 ({{ $value }} active jobs)"
|
summary: "Print queue backlog on edge2 ({{ $value }} active jobs)"
|
||||||
description: "CUPS has {{ $value }} active jobs queued. Possible printer jam, USB disconnect, or paper out."
|
description: "CUPS has {{ $value }} active jobs queued. Possible printer jam, USB disconnect, or paper out."
|
||||||
|
|
||||||
|
# Paper roll lifecycle alerts (XL Track I, 2026-04-26).
|
||||||
|
# Source-of-truth gauge: print_paper_remaining_percent (Print.Web OTEL,
|
||||||
|
# hydrated on startup from the active PaperRoll row).
|
||||||
|
# alert_channel=thermal_print routes through irc-notify -> Print.Web
|
||||||
|
# /api/print/alert so the printer announces its own paper-out warning
|
||||||
|
# on its remaining paper. Self-referential humor + operator nudge.
|
||||||
|
- alert: PrintPaperRollLow
|
||||||
|
expr: print_paper_remaining_percent{job="printweb-otel"} < 10 and print_paper_remaining_percent{job="printweb-otel"} > 5
|
||||||
|
for: 5m
|
||||||
|
labels:
|
||||||
|
severity: warning
|
||||||
|
alert_channel: thermal_print
|
||||||
|
annotations:
|
||||||
|
summary: "Print roll low on edge2 ({{ $value | printf \"%.1f\" }}% remaining)"
|
||||||
|
description: "NuPrint 210 paper roll has {{ $value | printf \"%.1f\" }}% remaining. Operator should load a fresh roll soon. Run /api/paper/status for the precise mm + estimated jobs left."
|
||||||
|
|
||||||
|
- alert: PrintPaperRollCritical
|
||||||
|
expr: print_paper_remaining_percent{job="printweb-otel"} <= 5
|
||||||
|
for: 2m
|
||||||
|
labels:
|
||||||
|
severity: critical
|
||||||
|
alert_channel: thermal_print
|
||||||
|
annotations:
|
||||||
|
summary: "Print roll critical on edge2 ({{ $value | printf \"%.1f\" }}% remaining)"
|
||||||
|
description: "NuPrint 210 paper roll at {{ $value | printf \"%.1f\" }}% — load a new roll NOW. The 50ft roll has a ~12% red-stripe zone; once paper passes that, the printer can run dry mid-job."
|
||||||
|
|
||||||
|
- alert: PrintJobDeadLetter
|
||||||
|
expr: increase(print_jobs_dead_letter_total[15m]) > 0
|
||||||
|
for: 1m
|
||||||
|
labels:
|
||||||
|
severity: warning
|
||||||
|
alert_channel: thermal_print
|
||||||
|
annotations:
|
||||||
|
summary: "Print job(s) entered dead-letter on edge2 ({{ $value | printf \"%.0f\" }} in last 15m)"
|
||||||
|
description: "{{ $value | printf \"%.0f\" }} print job(s) exhausted MaxRetries and need operator action. Open /print-log, filter Status=DeadLetter, click 'Retry From Start' after fixing the underlying cause (paper jam, USB disconnect, printer power-cycle)."
|
||||||
|
|
||||||
- alert: CUPSHighJobRate
|
- alert: CUPSHighJobRate
|
||||||
expr: rate(cups_job_total[5m]) * 60 > 30
|
expr: rate(cups_job_total[5m]) * 60 > 30
|
||||||
for: 5m
|
for: 5m
|
||||||
|
|||||||
Reference in New Issue
Block a user