selenium: right-size hub + chrome + edge memory limits #28

Merged
bluejay merged 1 commit from ops/selenium-right-size-memory-2026-05-25 into main 2026-05-26 01:12:21 +00:00

Stops the Edge node OOMKill cadence (51 restarts in 5d on 1Gi).

Sizing rationale

| pod | usage now | old limit | new limit | restarts |
|---|---|---|---|---|
| hub | 766Mi | 1Gi | 1.5Gi | 42 (frozen 7d ago by ArgoCD migration) |
| chrome (max 2 sessions) | 684Mi | 1Gi | 2Gi | 0 (pod 46m old; one zombie -> OOM) |
| edge (max 1 session) | 489Mi | 1Gi | **2Gi** | **51 in 5d** |
| firefox (max 1 session) | 1038Mi | 2Gi | 2Gi | 0 in 9d |

Firefox is the reference: it was already at 2Gi and has been stable for 9 days. Match Edge + Chrome to that. Hub bumped 1Gi -> 1.5Gi for ~50% headroom; CPU left alone on all three (observed utilization is far under existing caps).
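
For reviewers who want to check the headroom figures, here is a small sketch that recomputes utilization against the old and new limits. The numbers are copied from the table above; the dictionary layout and function names are illustrative only, and the usage values are point-in-time readings rather than peaks.

```python
# Memory figures copied from the sizing table, all expressed in Mi.
GI = 1024  # 1Gi = 1024Mi

PODS = {
    # name: (usage_now, old_limit, new_limit)
    "hub": (766, 1 * GI, int(1.5 * GI)),
    "chrome": (684, 1 * GI, 2 * GI),
    "edge": (489, 1 * GI, 2 * GI),
    "firefox": (1038, 2 * GI, 2 * GI),
}


def utilization(usage_mi: int, limit_mi: int) -> float:
    """Return usage as a percentage of the limit."""
    return 100.0 * usage_mi / limit_mi


def report() -> dict:
    """Map pod name -> (percent of old limit, percent of new limit)."""
    return {
        name: (round(utilization(usage, old), 1), round(utilization(usage, new), 1))
        for name, (usage, old, new) in PODS.items()
    }


if __name__ == "__main__":
    for name, (old_pct, new_pct) in report().items():
        print(f"{name:8s} old {old_pct:5.1f}%  new {new_pct:5.1f}%")
```

With the new limits the hub sits at roughly half of its cap, which is where the "~50% headroom" figure comes from.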

Test plan

  • ArgoCD reconciles infra-selenium
  • Three pods rolling-restart with new limits
  • kubectl describe pods -n selenium shows new limits applied; kubectl top pods -n selenium shows usage under them
  • Edge restartCount stays flat for 24h
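
The last two checks in the test plan could be scripted roughly as follows. This is a sketch, not part of the change: it assumes the input is the parsed output of `kubectl get pods -n selenium -o json`, and the pod name in the sample is hypothetical. It reads only the standard pod fields `spec.containers[].resources.limits.memory` and `status.containerStatuses[].restartCount`.

```python
import json


def summarize_pods(pod_list: dict) -> dict:
    """Map pod name -> (memory limit per container, total restartCount)."""
    summary = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        limits = [
            c.get("resources", {}).get("limits", {}).get("memory")
            for c in pod["spec"]["containers"]
        ]
        restarts = sum(
            s.get("restartCount", 0)
            for s in pod.get("status", {}).get("containerStatuses", [])
        )
        summary[name] = (limits, restarts)
    return summary


def restarts_flat(before: dict, after: dict) -> bool:
    """True if no pod present in both snapshots gained restarts."""
    return all(after[n][1] == before[n][1] for n in before if n in after)


# Hypothetical sample shaped like `kubectl get pods -o json` output.
SAMPLE = json.loads("""
{"items": [{
  "metadata": {"name": "selenium-edge-0"},
  "spec": {"containers": [{"name": "edge",
            "resources": {"limits": {"memory": "2Gi"}}}]},
  "status": {"containerStatuses": [{"name": "edge", "restartCount": 0}]}
}]}
""")

if __name__ == "__main__":
    print(summarize_pods(SAMPLE))
```

Taking one snapshot after the rollout and another 24h later, then passing both summaries to `restarts_flat`, covers the final checkbox. A recreated pod gets a new name and a reset counter, so the comparison only makes sense for pods that exist in both snapshots.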
bluejay added 1 commit 2026-05-26 01:12:11 +00:00
Edge node has been OOMKilled 51 times in 5 days (~1 every 2.4h) on a
1Gi memory limit. Chrome runs maxSessions=2 on the same 1Gi cap and
was idling at 684Mi — first concurrent session pushing the node to
~900Mi+ would be the next OOM. Hub was running at 766Mi against a 1Gi
limit (75%); no recent restarts but no headroom either.
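
The cadence and headroom numbers in this paragraph can be reproduced with a few lines of arithmetic. The sketch below uses only figures quoted above; the helper names are illustrative.

```python
def mean_hours_between(restarts: int, days: float) -> float:
    """Average interval between restarts, in hours."""
    return days * 24 / restarts


def headroom_mi(usage_mi: int, limit_mi: int) -> int:
    """Memory left before the limit is hit, in Mi."""
    return limit_mi - usage_mi


if __name__ == "__main__":
    # Edge: 51 OOMKills in 5 days.
    print(round(mean_hours_between(51, 5), 1))   # hours between kills
    # Chrome idling at 684Mi against the old 1Gi (1024Mi) limit.
    print(headroom_mi(684, 1024))                # Mi left when idle
    # Hub at 766Mi against 1Gi, as a percentage of the limit.
    print(round(100 * 766 / 1024))
```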

Firefox node has been running at 2Gi memory limit for 9 days with
zero restarts — that is the right size for a Selenium 4.27 browser
node under our session profile (screen recording sidecar + 1080p
rendering + page captures). Match it.

Changes:
- Hub:    limit 1Gi -> 1.5Gi, request 512Mi -> 1Gi
- Chrome: limit 1Gi -> 2Gi,   request 512Mi -> 1Gi
- Edge:   limit 1Gi -> 2Gi,   request 512Mi -> 1Gi
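
As a sanity check on the values listed above, the following sketch builds the memory part of a Kubernetes `resources` stanza for each component and confirms that every request stays at or below its limit. The helper names and the dictionary of changes are illustrative; the actual manifests in this repo may be structured differently.

```python
def to_mi(quantity: str) -> float:
    """Convert a memory quantity with a Mi or Gi suffix to Mi."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return float(quantity[:-2])
    raise ValueError(f"unsupported quantity: {quantity}")


def memory_resources(request: str, limit: str) -> dict:
    """Build the memory portion of a container resources stanza."""
    if to_mi(request) > to_mi(limit):
        raise ValueError("request must not exceed limit")
    return {
        "resources": {
            "requests": {"memory": request},
            "limits": {"memory": limit},
        }
    }


# New values from the change list: name -> (request, limit).
CHANGES = {
    "hub": ("1Gi", "1.5Gi"),
    "chrome": ("1Gi", "2Gi"),
    "edge": ("1Gi", "2Gi"),
}

if __name__ == "__main__":
    for name, (request, limit) in CHANGES.items():
        print(name, memory_resources(request, limit))
```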

CPU left alone on all three — observed utilization is well under the
existing limits (hub 54m / 500m, chrome 185m / 1, edge 11m / 1).
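
To put the CPU readings in proportion, this sketch converts the quoted usage and limit pairs to millicores and prints utilization as a percentage. The parsing helper is illustrative and handles only the two forms that appear above (a value with an `m` suffix, or whole cores).

```python
def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '54m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


# Observed usage and existing limit, as quoted in the commit message.
CPU = {
    "hub": ("54m", "500m"),
    "chrome": ("185m", "1"),
    "edge": ("11m", "1"),
}


def cpu_utilization() -> dict:
    """Map component name -> usage as a percentage of its CPU limit."""
    return {
        name: round(100 * cpu_millicores(used) / cpu_millicores(limit), 1)
        for name, (used, limit) in CPU.items()
    }


if __name__ == "__main__":
    for name, pct in cpu_utilization().items():
        print(f"{name:7s} {pct:4.1f}% of CPU limit")
```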

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bluejay merged commit 74333cc26b into main 2026-05-26 01:12:21 +00:00

Reference: bluejay/bluejay-infra#28