infra(cx2-5): DNS auth/NetPol substrate, air-gap landing, arm64 ARC runner + tenant landing manifests
- fc-dns: add OnePasswordItem CRD for DNS API keys + NetworkPolicy for Phase 0 auth hardening; bump dns-web image tag
- fc-landing: rewrite landing HTML to remove CDN dependencies (air-gap safe); add preview.html standalone preview
- github-runner: add TOOLCACHE_ARCH to install-ruby-toolcache.sh for arm64 support; add Dockerfile.arm64 for arm64 ARC runner image
- docs/gx10-tenant-landing: per-user Deployment+IngressRoute manifests (andrew/dustin/erik/fit/matt) + CUTOVER-RUNBOOK.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
83
docs/gx10-tenant-landing/CUTOVER-RUNBOOK.md
Normal file
@@ -0,0 +1,83 @@
# GX10 Tenant Landing-Site Migration: Cutover Runbook

Date: 2026-06-16. Migrates the 5 per-tenant public landing sites from the OLD RKE2
cluster (`10.0.56.200` Traefik) to the GX10 ARM64 cluster (`10.0.57.202` VIP /
NodePort `10.0.56.14:32491`).

## Deployed on GX10 (DONE: staged-verified, NOT yet receiving public traffic)

| Domain(s)                         | GX10 ns            | Workload     | TLS secret (in ns + traefik-system) | Live content replicated   |
|-----------------------------------|--------------------|--------------|-------------------------------------|---------------------------|
| bluejay.dev, www.bluejay.dev      | `fc-tenant-andrew` | nginx:alpine | `cf-origin-bluejay-dev`             | "Blue Jay" (custom)       |
| timeforta.co, www.timeforta.co    | `fc-tenant-dustin` | nginx:alpine | `cf-origin-timeforta-co`            | "Coming Soon" (generic)   |
| erckak.dev, www.erckak.dev        | `fc-tenant-erik`   | nginx:alpine | `cf-origin-erckak-dev`              | "Erckak" (custom)         |
| flowerinsider.xyz, www.*          | `fc-tenant-fit`    | nginx:alpine | `cf-origin-flowerinsider-xyz`       | "Flower Insider" (custom) |
| matt.flowercore.io                | `fc-tenant-matt`   | nginx:alpine | `cf-origin-flowercore-io`           | "Coming Soon" (generic)   |

All nginx pods are 1/1 Running, and the IngressRoutes use priority 100 (overriding the GX10
`public-catchall`). Each site replicates EXACTLY what was live on OLD at migration
time, so cutover is content-invisible.

Staged verification (all HTTP 200, correct content, SNI-correct cert):
```
curl -sk --resolve <host>:32491:10.0.56.14 https://<host>:32491/
```
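
The staged check above can be repeated for every tenant host in one pass. The following is a minimal sketch, not part of the original runbook: the function names and the `--run` switch are hypothetical, and `www.flowerinsider.xyz` is assumed to be what `www.*` stands for in the table. It only confirms the HTTP status code; content and certificate still need to be checked by eye.

```shell
#!/usr/bin/env bash
# Sketch: run the staged check for every tenant host against the GX10 NodePort.
# Helper names and the --run switch are illustrative assumptions.
set -u

GX10_NODE="10.0.56.14"
GX10_HTTPS_PORT="32491"
# Hosts from the table above (www.flowerinsider.xyz assumed for "www.*").
HOSTS="bluejay.dev www.bluejay.dev timeforta.co www.timeforta.co erckak.dev www.erckak.dev flowerinsider.xyz www.flowerinsider.xyz matt.flowercore.io"

# Build the --resolve value that pins <host>:<port> to the GX10 node.
resolve_arg() {
  printf '%s:%s:%s' "$1" "$GX10_HTTPS_PORT" "$GX10_NODE"
}

# Print the HTTP status code for one host (curl prints 000 if it cannot connect).
staged_status() {
  curl -sk -o /dev/null -w '%{http_code}' --max-time 10 \
    --resolve "$(resolve_arg "$1")" "https://$1:$GX10_HTTPS_PORT/"
}

# Check every host; return non-zero if any host did not answer 200.
check_all() {
  local host code rc=0
  for host in $HOSTS; do
    code="$(staged_status "$host")"
    if [ "$code" = "200" ]; then
      echo "OK   $host"
    else
      echo "FAIL $host (HTTP $code)"
      rc=1
    fi
  done
  return "$rc"
}

# Only touch the network when asked to explicitly.
if [ "${1:-}" = "--run" ]; then
  check_all
fi
```

Saved under any name and invoked with `--run`, it prints one `OK`/`FAIL` line per host and exits non-zero if any host fails.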

## Public routing reality (why NO automatic cutover happened)

Every tenant domain enters the network through Cloudflare (proxied) → a dedicated
pfSense WAN IP in 74.40.140.16/28 → pfSense port-forward. ALL FIVE currently forward
to OLD Traefik `10.0.56.200:443`:

| Domain             | CF origin WAN IP | pfSense rdr today  |
|--------------------|------------------|--------------------|
| bluejay.dev        | 74.40.140.17     | → 10.0.56.200:443  |
| matt.flowercore.io | 74.40.140.19     | → 10.0.56.200:443  |
| timeforta.co       | 74.40.140.21     | → 10.0.56.200:443  |
| erckak.dev         | 74.40.140.23     | → 10.0.56.200:443  |
| flowerinsider.xyz  | 74.40.140.25     | → 10.0.56.200:443  |

(Contrast: main flowercore.io = WAN `.24` → already GX10 `10.0.56.14:32491`.)
NOTE: matt.flowercore.io is bound to WAN `.19` (the MATT VPN IP), NOT `.24`, so the
"*.flowercore.io already NATs to GX10" assumption does NOT cover matt.

Because none of these NAT to GX10 yet, no cutover was performed (live sites untouched).

## OPERATOR ACTION: cutover = repoint the pfSense port-forward target

For each domain, change the HTTPS (and HTTP) port-forward TARGET from
`10.0.56.200` to `10.0.56.14:32491` (HTTPS) / `10.0.56.14:30776` (HTTP). In pfSense
(Firewall → NAT → Port Forward), edit the rules with these descriptions:

- `ANDREW: HTTPS to Traefik` 74.40.140.17:443 → change target `10.0.56.200:443` to `10.0.56.14:32491`
- `MATT: HTTPS to Traefik` 74.40.140.19:443 → change target `10.0.56.200:443` to `10.0.56.14:32491`
- `DUSTIN: HTTPS to Traefik` 74.40.140.21:443 → change target `10.0.56.200:443` to `10.0.56.14:32491`
- `ERIK: HTTPS to Traefik` 74.40.140.23:443 → change target `10.0.56.200:443` to `10.0.56.14:32491`
- `FIT: HTTPS to Traefik` 74.40.140.25:443 → change target `10.0.56.200:443` to `10.0.56.14:32491`
- (the corresponding `:80 → 10.0.56.14:30776` HTTP rules likewise; optional, since the sites are HTTPS-only)

No Cloudflare DNS change is required: the WAN IPs stay the same, only the internal
NAT target moves. Each can be flipped independently (per-tenant blast radius).
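
Because each rule is flipped by hand and independently, it can help to print the changes as a checklist before opening the pfSense UI. The sketch below is illustrative only: it prints text and does not talk to pfSense, and the function names are hypothetical. The tenant labels, WAN IPs, and targets are copied from the rule list above.

```shell
#!/usr/bin/env bash
# Sketch: print the per-tenant HTTPS port-forward change as a checklist.
# It only formats text; nothing here modifies pfSense.
set -u

OLD_TARGET="10.0.56.200:443"
NEW_TARGET="10.0.56.14:32491"

# TENANT:WAN_IP pairs copied from the rule list above.
RULES="ANDREW:74.40.140.17 MATT:74.40.140.19 DUSTIN:74.40.140.21 ERIK:74.40.140.23 FIT:74.40.140.25"

# Format one checklist line for a "TENANT:WAN_IP" pair.
plan_line() {
  local tenant="${1%%:*}" wan="${1#*:}"
  printf '[ ] %s: HTTPS to Traefik  %s:443  %s -> %s\n' \
    "$tenant" "$wan" "$OLD_TARGET" "$NEW_TARGET"
}

# Print one line per tenant rule.
print_plan() {
  local rule
  for rule in $RULES; do
    plan_line "$rule"
  done
}

print_plan
```

Swapping the values of `OLD_TARGET` and `NEW_TARGET` yields the matching rollback checklist.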

Post-flip verify (external):
```
curl -sI https://<host>/ # expect HTTP 200, Server: cloudflare, unchanged content
```
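
The external check can be scripted per domain after each flip. This is a rough sketch under stated assumptions: the function names and the `--run` switch are hypothetical, and the host list is the apex names from the routing table. It checks only the status line and the `Server` header; `curl -I` sends a HEAD request, so "unchanged content" still has to be confirmed separately.

```shell
#!/usr/bin/env bash
# Sketch: verify each tenant domain from outside after its pfSense rule is flipped.
# Helper names and the --run switch are illustrative assumptions.
set -u

HOSTS="bluejay.dev matt.flowercore.io timeforta.co erckak.dev flowerinsider.xyz"

# Succeeds when the response headers on stdin show a 200 status line
# and a "Server: cloudflare" header (header name matched case-insensitively).
headers_ok() {
  local headers
  headers="$(tr -d '\r')"
  printf '%s\n' "$headers" | head -n 1 | grep -Eq '^HTTP/[0-9.]+ 200' || return 1
  printf '%s\n' "$headers" | grep -Eiq '^server: *cloudflare$' || return 1
}

# Check every host; return non-zero if any host fails.
verify_all() {
  local host rc=0
  for host in $HOSTS; do
    if curl -sI --max-time 10 "https://$host/" | headers_ok; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      rc=1
    fi
  done
  return "$rc"
}

# Only touch the network when asked to explicitly.
if [ "${1:-}" = "--run" ]; then
  verify_all
fi
```

Since each tenant can be flipped independently, trimming `HOSTS` to the single domain just changed keeps the output focused on that tenant.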

## Rollback

OLD cluster left fully intact (ArgoCD apps infra-andrew/dustin/erik/fit Synced+Healthy,
pods Running). To roll back any domain: revert that pfSense port-forward target to
`10.0.56.200:443`.

## Notes
- The OLD cluster has DUPLICATE namespaces per tenant (`tenant-X` custom page +
  `fc-tenant-X` generic landing), both with IngressRoutes claiming the same host.
  Traefik non-deterministically picked a winner; live content was: andrew/erik/fit =
  custom (`tenant-X`), dustin/matt = generic (`fc-tenant-X`). GX10 consolidates to ONE
  namespace per tenant (`fc-tenant-X`) serving the content that was actually live.
- `infra-worldbuilder` (worldbuilder.iamworkin.lan, internal .NET app) was ALREADY
  migrated to GX10 (`fc-worldbuilder`, 1/1 Running); no action needed.
- `infra-flowercore` (tenant-flowercore/flowercore-web demo) has NO public route and is
  superseded by the production `fc-system/fc-landing-public` (flowercore.io root) already
  live on GX10; intentionally NOT migrated.
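
The duplicate-host situation described in the first note can be detected mechanically. The sketch below is an assumption-laden illustration, not something the migration actually ran: `dup_hosts` and the `--run` switch are hypothetical names, and the parser only understands single-host matchers of the form ``Host(`name`)``, so multi-argument matchers would be missed. It lists hosts that are claimed by IngressRoutes in more than one namespace.

```shell
#!/usr/bin/env bash
# Sketch: find hosts claimed by IngressRoutes in more than one namespace.
# dup_hosts and the --run switch are illustrative assumptions.
set -u

# Reads "namespace<TAB>match-rule" lines on stdin and prints, sorted,
# every host that appears in rules from two or more namespaces.
dup_hosts() {
  awk -F'\t' '
    {
      rule = $2
      # Walk every Host(`...`) matcher in the rule text.
      while (match(rule, /Host\(`[^`]+`\)/)) {
        host = substr(rule, RSTART + 6, RLENGTH - 8)
        if (!((host, $1) in seen)) { seen[host, $1] = 1; count[host]++ }
        rule = substr(rule, RSTART + RLENGTH)
      }
    }
    END { for (h in count) if (count[h] > 1) print h }
  ' | sort
}

# Only query the cluster when asked to explicitly (uses the current kubectl context).
if [ "${1:-}" = "--run" ]; then
  kubectl get ingressroute -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.spec.routes[*].match}{"\n"}{end}' \
    | dup_hosts
fi
```

On a cluster consolidated to one namespace per tenant, as described for GX10, the expected result is empty output.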