bluejay-infra/docs/gx10-tenant-landing/CUTOVER-RUNBOOK.md
Andrew Stoltz eae7b4ed7a infra(cx2-5): DNS auth/NetPol substrate, air-gap landing, arm64 ARC runner + tenant landing manifests
- fc-dns: add OnePasswordItem CRD for DNS API keys + NetworkPolicy for Phase 0 auth hardening; bump dns-web image tag
- fc-landing: rewrite landing HTML to remove CDN dependencies (air-gap safe); add preview.html standalone preview
- github-runner: add TOOLCACHE_ARCH to install-ruby-toolcache.sh for arm64 support; add Dockerfile.arm64 for arm64 ARC runner image
- docs/gx10-tenant-landing: per-user Deployment+IngressRoute manifests (andrew/dustin/erik/fit/matt) + CUTOVER-RUNBOOK.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 11:53:26 -05:00


# GX10 Tenant Landing-Site Migration — Cutover Runbook
Date: 2026-06-16. This runbook migrates the 5 per-tenant public landing sites from the OLD RKE2
cluster (Traefik at `10.0.56.200`) to the GX10 ARM64 cluster (VIP `10.0.57.202` /
NodePort `10.0.56.14:32491`).
## Deployed on GX10 (DONE — staged-verified, NOT yet receiving public traffic)
| Domain(s) | GX10 namespace | Workload | TLS secret (in namespace + `traefik-system`) | Live content replicated |
|-----------------------------------|--------------------|---------------|-------------------------------------|-------------------------|
| bluejay.dev, www.bluejay.dev | `fc-tenant-andrew` | nginx:alpine | `cf-origin-bluejay-dev` | "Blue Jay" (custom) |
| timeforta.co, www.timeforta.co | `fc-tenant-dustin` | nginx:alpine | `cf-origin-timeforta-co` | "Coming Soon" (generic) |
| erckak.dev, www.erckak.dev | `fc-tenant-erik` | nginx:alpine | `cf-origin-erckak-dev` | "Erckak" (custom) |
| flowerinsider.xyz, www.* | `fc-tenant-fit` | nginx:alpine | `cf-origin-flowerinsider-xyz` | "Flower Insider" (custom) |
| matt.flowercore.io | `fc-tenant-matt` | nginx:alpine | `cf-origin-flowercore-io` | "Coming Soon" (generic) |

All nginx pods are 1/1 Running, and every IngressRoute has priority 100 so that it overrides the GX10
`public-catchall` route. Each site replicates EXACTLY what was live on OLD at migration
time, so the cutover does not change the content visitors see.

Staged verification (every host returned HTTP 200 with the correct content and the SNI-correct cert):
```
curl -sk --resolve <host>:32491:10.0.56.14 https://<host>:32491/
```
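The staged check above can be repeated for every tenant host with a small loop. This is a minimal sketch, not part of the original procedure: the NodePort address comes from this runbook, the host list contains only the hostnames spelled out in the table, and the `RUN` switch and both function names are hypothetical. By default it only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Sketch: staged check of each tenant host against the GX10 NodePort.
# NODE_IP / NODE_PORT are taken from this runbook; RUN is a hypothetical switch.
NODE_IP="10.0.56.14"
NODE_PORT="32491"
# Only hostnames spelled out in the table; extend as needed.
HOSTS="bluejay.dev www.bluejay.dev timeforta.co www.timeforta.co erckak.dev www.erckak.dev flowerinsider.xyz matt.flowercore.io"

# Print the staged-verification curl command for one host.
staged_cmd() {
  printf 'curl -sk --resolve %s:%s:%s https://%s:%s/\n' \
    "$1" "$NODE_PORT" "$NODE_IP" "$1" "$NODE_PORT"
}

# Fetch only the HTTP status code for one host via the NodePort.
staged_status() {
  curl -sk -o /dev/null -w '%{http_code}' \
    --resolve "$1:$NODE_PORT:$NODE_IP" "https://$1:$NODE_PORT/"
}

for h in $HOSTS; do
  if [ "${RUN:-0}" = "1" ]; then
    echo "$h $(staged_status "$h")"   # live check: host followed by status code
  else
    staged_cmd "$h"                   # dry run: show what would be executed
  fi
done
```

Because `-k` skips certificate validation, this confirms routing and status only; inspecting the served certificate still needs a verbose run such as `curl -vk` with the same `--resolve` argument.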
## Public routing reality (why NO automatic cutover happened)
Every tenant domain enters the network through Cloudflare (proxied) → a dedicated
pfSense WAN IP in 74.40.140.16/28 → pfSense port-forward. ALL FIVE currently forward
to OLD Traefik `10.0.56.200:443`:
| Domain | CF origin WAN IP | pfSense rdr today |
|-------------------|------------------|--------------------|
| bluejay.dev | 74.40.140.17 | → 10.0.56.200:443 |
| matt.flowercore.io| 74.40.140.19 | → 10.0.56.200:443 |
| timeforta.co | 74.40.140.21 | → 10.0.56.200:443 |
| erckak.dev | 74.40.140.23 | → 10.0.56.200:443 |
| flowerinsider.xyz | 74.40.140.25 | → 10.0.56.200:443 |

(Contrast: the main flowercore.io uses WAN `.24`, which already forwards to GX10 `10.0.56.14:32491`.)

NOTE: matt.flowercore.io is bound to WAN `.19` (the MATT VPN IP), NOT `.24`, so the
"*.flowercore.io already NATs to GX10" assumption does NOT cover matt.

Because none of these five port-forwards targets GX10 yet, no cutover was performed (live sites untouched).
## OPERATOR ACTION — cutover = repoint the pfSense port-forward target
For each domain, change the HTTPS (and HTTP) port-forward TARGET from
`10.0.56.200` to `10.0.56.14:32491` (HTTPS) / `10.0.56.14:30776` (HTTP). In pfSense
(Firewall → NAT → Port Forward), edit the rules with these descriptions:
- `ANDREW: HTTPS to Traefik` 74.40.140.17:443 → change target `10.0.56.200:443` to `10.0.56.14:32491`
- `MATT: HTTPS to Traefik` 74.40.140.19:443 → change target `10.0.56.200:443` to `10.0.56.14:32491`
- `DUSTIN: HTTPS to Traefik` 74.40.140.21:443 → change target `10.0.56.200:443` to `10.0.56.14:32491`
- `ERIK: HTTPS to Traefik` 74.40.140.23:443 → change target `10.0.56.200:443` to `10.0.56.14:32491`
- `FIT: HTTPS to Traefik` 74.40.140.25:443 → change target `10.0.56.200:443` to `10.0.56.14:32491`
- (The corresponding `:80` HTTP rules can be retargeted to `10.0.56.14:30776` likewise; this is optional because the sites are HTTPS-only.)

No Cloudflare DNS change is required: the WAN IPs stay the same and only the internal
NAT target moves. Each domain can be flipped independently (per-tenant blast radius).

Post-flip verify (external):
```
curl -sI https://<host>/ # expect HTTP 200, Server: cloudflare, unchanged content
```
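Since each tenant is flipped independently, it can help to print the change and check the response one tenant at a time. The following is a sketch under stated assumptions: the tenant records are copied from the tables in this runbook, while `flip_line` and `check_headers` are hypothetical helper names. It only prints the intended pfSense change (it does not touch pfSense) and validates `curl -sI` output piped into it.

```shell
#!/bin/sh
# Sketch: per-tenant flip line plus a header check for the external verify step.
OLD_TARGET="10.0.56.200:443"
NEW_TARGET="10.0.56.14:32491"
# tenant|WAN IP|host, copied from the tables in this runbook.
TENANTS="ANDREW|74.40.140.17|bluejay.dev
MATT|74.40.140.19|matt.flowercore.io
DUSTIN|74.40.140.21|timeforta.co
ERIK|74.40.140.23|erckak.dev
FIT|74.40.140.25|flowerinsider.xyz"

# Print the pfSense change for one "tenant|wan|host" record.
flip_line() {
  echo "$1" | awk -F '|' -v src="$OLD_TARGET" -v dst="$NEW_TARGET" \
    '{ printf "%s: HTTPS to Traefik %s:443 %s -> %s (then verify https://%s/)\n", $1, $2, src, dst, $3 }'
}

# Read `curl -sI` output on stdin; succeed only for HTTP 200 served via Cloudflare.
check_headers() {
  awk '
    { sub(/\r$/, "") }                      # drop the CR that ends each header line
    NR == 1 && $2 == "200" { status_ok = 1 }
    tolower($1) == "server:" && tolower($2) == "cloudflare" { server_ok = 1 }
    END { exit !(status_ok && server_ok) }
  '
}

# Print the full flip plan, one line per tenant.
echo "$TENANTS" | while IFS= read -r rec; do flip_line "$rec"; done
```

After flipping one rule, a check such as `curl -sI https://bluejay.dev/ | check_headers && echo "bluejay.dev OK"` confirms the status and `Server` header. A HEAD request does not show the page body, so comparing content still requires a normal `curl -s` fetch.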
## Rollback
The OLD cluster is left fully intact (ArgoCD apps infra-andrew/dustin/erik/fit Synced+Healthy,
pods Running). To roll back any domain, revert that domain's pfSense port-forward target to
`10.0.56.200:443`.
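Rollback only helps if OLD Traefik still answers for the host. As a hedged sketch, the staged-verification technique can be pointed at the OLD address from this runbook before reverting a rule; `old_cmd` is a hypothetical helper that prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build a direct check against OLD Traefik for one host.
OLD_IP="10.0.56.200"   # OLD Traefik address from this runbook
OLD_PORT="443"

# Print a curl command that bypasses Cloudflare and pfSense and hits OLD directly.
old_cmd() {
  printf 'curl -skI --resolve %s:%s:%s https://%s/\n' \
    "$1" "$OLD_PORT" "$OLD_IP" "$1"
}

old_cmd erckak.dev
```

Run the printed command from a machine that can reach `10.0.56.200`; a `200` status line suggests the OLD site is still being served and the rollback target is usable.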
## Notes
- The OLD cluster has DUPLICATE namespaces per tenant (`tenant-X` custom page +
  `fc-tenant-X` generic landing), both with IngressRoutes claiming the same host.
  Traefik non-deterministically picked a winner; the live content was: andrew/erik/fit =
  custom (`tenant-X`), dustin/matt = generic (`fc-tenant-X`). GX10 consolidates to ONE
  namespace per tenant (`fc-tenant-X`) serving the content that was actually live.
- `infra-worldbuilder` (worldbuilder.iamworkin.lan, internal .NET app) was ALREADY
  migrated to GX10 (`fc-worldbuilder`, 1/1 Running), so no action is needed.
- `infra-flowercore` (tenant-flowercore/flowercore-web demo) has NO public route and is
  superseded by the production `fc-system/fc-landing-public` (flowercore.io root) already
  live on GX10, so it was intentionally NOT migrated.
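The duplicate-host situation described in the first note can be detected mechanically. This is a sketch, not the method used for this migration: `dup_hosts` is a hypothetical helper that reads `namespace<TAB>match-rule` lines and reports any host claimed from more than one namespace. It only recognises single-host rules of the form ``Host(`name`)``.

```shell
#!/bin/sh
# Sketch: find hosts claimed by IngressRoutes in two or more namespaces.
# Input on stdin: "namespace<TAB>match rule(s)", one IngressRoute per line.
dup_hosts() {
  awk -F '\t' '
    {
      rule = $2
      # Pull every Host(`name`) out of the match rule, left to right.
      while (match(rule, /Host\(`[^`]+`\)/)) {
        host = substr(rule, RSTART + 6, RLENGTH - 8)
        if (!((host, $1) in seen)) {          # count each namespace once per host
          seen[host, $1] = 1
          count[host]++
          owners[host] = owners[host] " " $1
        }
        rule = substr(rule, RSTART + RLENGTH)
      }
    }
    END { for (h in count) if (count[h] > 1) print h ":" owners[h] }
  '
}

# Example input mirroring the tenant-X / fc-tenant-X duplication on OLD.
printf 'tenant-andrew\tHost(`bluejay.dev`)\nfc-tenant-andrew\tHost(`bluejay.dev`) || Host(`www.bluejay.dev`)\n' | dup_hosts
```

Assuming the Traefik IngressRoute CRD is installed and `kubectl` points at the cluster in question, the input could be produced with `kubectl get ingressroute -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.spec.routes[*].match}{"\n"}{end}'` piped into `dup_hosts`.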