feat(infra): add Multus CNI + CDI + PROD VLAN 57 NAD as GitOps prereqs for ci1

Adds three new bluejay-infra apps, picked up automatically by the
ApplicationSet (apps/* directory generator on main):

* apps/multus/multus.yaml — Multus CNI v4.2.2 thick-plugin daemonset (verbatim
  upstream, project-annotated). Enables KubeVirt VMs to attach additional
  network interfaces. Required by ci1 to bridge onto PROD VLAN 57.

* apps/cdi/{cdi-operator.yaml,cdi-cr.yaml,README.md} — Containerized Data
  Importer v1.65.0 (verbatim upstream). Operator + CR pattern. Enables
  populating PVCs from HTTP/registry/upload sources, used to load the Windows
  Server 2025 ISO into the windows-server-2025-iso PVC.

* apps/kubevirt-vms/prod-vlan57-nad.yaml — NetworkAttachmentDefinition for
  PROD VLAN 57 bridge. **Deploy gated on Phase 1.5 host work**: requires
  br-prod bridge enslaving enp86s0.57 on each RKE2 node (Puppet config-as-code).
  ci1.yaml continues to use pod-network masquerade until that lands; switching
  to multus.networkName: kubevirt-vms/prod-vlan57 is a one-line YAML edit
  followed by a GitOps push.

Cluster verification (2026-05-08):
- KubeVirt LIVE (3 nodes, virt-api/controller/handler/operator all Running)
- Calico CNI on /etc/cni/net.d + /opt/cni/bin (Multus default paths)
- ApplicationSet `bluejay-infra` already watches `apps/*` on main

Reproducibility: upstream YAMLs vendored verbatim with project header diffs
only. Bumping versions = re-curl + git push. No deploy-time internet fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex
2026-05-08 13:05:58 -05:00
parent 00c11b4eaa
commit b3529f8e96
5 changed files with 6239 additions and 0 deletions

69
apps/cdi/README.md Normal file

@@ -0,0 +1,69 @@
# CDI — Containerized Data Importer
KubeVirt's `containerized-data-importer` for populating PVCs from external
sources (HTTP, HTTPS, container registry, S3, virtctl upload). Required to
import the Windows Server 2025 ISO into the `windows-server-2025-iso` PVC
that `apps/kubevirt-vms/ci1.yaml` mounts as a CDROM.
## Files
| File | Source | Purpose |
| ----------------- | ----------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| `cdi-operator.yaml` | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — verbatim copy | Installs operator + CRDs (5779 lines, large) |
| `cdi-cr.yaml` | [`v1.65.0`](https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.65.0) — annotated + commented | Tells operator to deploy CDI components |

`cdi-operator.yaml` is **vendored verbatim** from the upstream release for
air-gap reproducibility: deploys need no internet fetch for manifests, and
ArgoCD syncs and prunes strictly against what is committed. To bump versions:
```bash
CDI_VER=v1.66.0 # for example
# -f makes curl fail on an HTTP error instead of saving the error page as YAML
curl -fsSL "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}/cdi-operator.yaml" \
  -o apps/cdi/cdi-operator.yaml
curl -fsSL "https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VER}/cdi-cr.yaml" \
  -o /tmp/cdi-cr-new.yaml # then re-apply project header diff
git diff apps/cdi/ # review
git add apps/cdi/ && git commit && git push origin main
```
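After a bump it helps to confirm that every CDI image reference in the vendored manifest carries the new tag. The function below is a minimal sketch of that check, not part of the repo: `cdi_tag_mismatches` is a hypothetical helper, and it assumes upstream image references have the form `quay.io/kubevirt/cdi-<component>:<tag>`.

```bash
# Sketch: print CDI image references whose tag differs from the expected one.
# Assumption: images are referenced as quay.io/kubevirt/cdi-<component>:<tag>.
cdi_tag_mismatches() {
  # $1 = manifest path, $2 = expected tag (e.g. v1.66.0)
  grep -oE 'quay\.io/kubevirt/cdi-[a-z-]+:[A-Za-z0-9._-]+' "$1" \
    | sort -u \
    | awk -F: -v want="$2" '$NF != want'
}
```

Usage: `cdi_tag_mismatches apps/cdi/cdi-operator.yaml "$CDI_VER"` prints nothing when every matched image reference uses `$CDI_VER`.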
## Verify after deploy
```bash
kubectl -n cdi get pods # operator + apiserver + deployment + uploadproxy
kubectl get cdis cdi -o jsonpath='{.status.phase}' # "Deployed"
kubectl get crd | grep cdi.kubevirt.io
# Expected CRDs: datavolumes.cdi.kubevirt.io, cdiconfigs.cdi.kubevirt.io,
# storageprofiles.cdi.kubevirt.io, dataimportcrons.cdi.kubevirt.io,
# datasources.cdi.kubevirt.io, objecttransfers.cdi.kubevirt.io
```
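The CRD check above can be made mechanical. This is a minimal sketch under the assumption that the six CRDs listed in the comment are the ones that matter here; `missing_cdi_crds` is a hypothetical helper that reads CRD names on stdin and prints any expected name it does not find.

```bash
# Sketch: report expected CDI CRDs that are absent from the input.
# Input: one CRD name per line, with or without the
# "customresourcedefinition.apiextensions.k8s.io/" prefix that
# `kubectl get crd -o name` emits.
missing_cdi_crds() {
  present="$(cat)"
  for crd in datavolumes cdiconfigs storageprofiles dataimportcrons datasources objecttransfers; do
    printf '%s\n' "$present" | grep -qE "(^|/)${crd}\.cdi\.kubevirt\.io\$" \
      || echo "${crd}.cdi.kubevirt.io"
  done
}
```

Usage: `kubectl get crd -o name | missing_cdi_crds` prints nothing when all six CRDs are installed.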
## Use after install
```yaml
# Example DataVolume that imports from HTTP
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: my-iso
spec:
  source:
    http:
      url: "https://server/path/to.iso"
  pvc:
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 10Gi
    storageClassName: longhorn
```
```bash
# Or upload from local disk via virtctl
virtctl image-upload pvc my-iso \
  --image-path ./my.iso \
  --size 10Gi \
  --storage-class longhorn \
  --access-mode ReadWriteOnce \
  --uploadproxy-url https://cdi-uploadproxy.cdi.svc:443 \
  --insecure
```
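If the HTTP import is used for more than one image, the DataVolume manifest can be generated rather than hand-edited. The following is only a sketch: `render_datavolume` is a hypothetical helper, and the field layout simply mirrors the example DataVolume above.

```bash
# Sketch: render an HTTP-import DataVolume manifest to stdout.
render_datavolume() {
  # $1 = name, $2 = source URL, $3 = size (e.g. 10Gi),
  # $4 = storage class (optional, defaults to longhorn)
  cat <<EOF
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: $1
spec:
  source:
    http:
      url: "$2"
  pvc:
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: $3
    storageClassName: ${4:-longhorn}
EOF
}
```

Usage, with `ISO_URL` and the 10Gi size as placeholders: `render_datavolume windows-server-2025-iso "$ISO_URL" 10Gi | kubectl -n kubevirt-vms apply -f -`.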

36
apps/cdi/cdi-cr.yaml Normal file

@@ -0,0 +1,36 @@
# =============================================================================
# CDI CR — Tells the CDI operator to install CDI components into the cluster.
# =============================================================================
# After cdi-operator.yaml is applied, the operator watches for THIS resource
# (CDI named "cdi"). When found, it deploys cdi-apiserver, cdi-deployment,
# cdi-uploadproxy, cdi-cronjob, and the importer/uploadserver/cloner pods.
#
# Configuration:
# - HonorWaitForFirstConsumer: for WaitForFirstConsumer storage classes, PVCs
# created by DataVolumes stay unbound until the first consumer pod is
# scheduled, so the volume is provisioned on the node that will use it.
# - WebhookPvcRendering: enables CDI's mutating webhook that fills in PVC
# spec fields from the matching StorageProfile for PVCs that opt in.
# - imagePullPolicy IfNotPresent: pull an image only when the node does not
# already have it cached.
# - nodeSelector linux: pin to Linux nodes (no Windows worker support).
#
# Andrew may want to add an `uploadProxyURLOverride` later to expose the
# uploadproxy via Traefik IngressRoute for `virtctl image-upload` from
# BLUEJAY-WS without `kubectl port-forward`. Phase 2 enhancement.
# =============================================================================
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
  annotations:
    bluejay.iamworkin.lan/source: "kubevirt/containerized-data-importer v1.65.0"
spec:
  config:
    featureGates:
      - HonorWaitForFirstConsumer
      - WebhookPvcRendering
  imagePullPolicy: IfNotPresent
  infra:
    nodeSelector:
      kubernetes.io/os: linux
  workload:
    nodeSelector:
      kubernetes.io/os: linux

5779
apps/cdi/cdi-operator.yaml Normal file

File diff suppressed because it is too large

69
apps/kubevirt-vms/prod-vlan57-nad.yaml Normal file

@@ -0,0 +1,69 @@
# =============================================================================
# NetworkAttachmentDefinition — PROD VLAN 57 bridge
# =============================================================================
# Purpose: makes KubeVirt VMs reachable on the PROD VLAN (10.0.57.0/24)
# alongside the existing pod network. Required for ci1 to bridge onto PROD
# (e.g. to provision/scrape edge1, edge2, kiosks, Pis on the same L2 segment).
#
# **DEPLOY GATE — Phase 1.5 host work required first**:
# On every RKE2 node (rke2-server, rke2-agent1, rke2-agent2):
# 1. Switch port (UniFi USL16LP) trunks VLAN 57 to the node — usually
# already true since BLUEJAY-WS reaches 10.0.57.x services. Verify
# with `ip link show enp86s0.57` after configuring sub-interface, OR
# `tcpdump -ni enp86s0 vlan 57` and ping a known PROD host.
# 2. Linux bridge `br-prod` enslaving `enp86s0.57` (VLAN sub-interface).
# NetworkManager profile examples in the runbook below.
# 3. Verify Multus DaemonSet `kube-multus-ds` is Ready on all nodes.
#
# Without those, applying this NAD only creates the API object; it plumbs
# nothing on the hosts (the NAD CRD itself is installed by apps/multus).
# The reference CNI `bridge` plugin creates a missing bridge on demand, so a
# VM that requests this NAD before `br-prod` exists is likely to start with an
# interface on an empty, uplink-less bridge rather than fail at CNI ADD.
# Either way it has no path to VLAN 57.
#
# Configuration notes:
# - cniVersion 0.3.1 to match Multus daemon-config.json
# - mtu 1500 (matches enp86s0 default; bump if jumbo frames configured)
# - bridge name `br-prod` is convention; if Puppet picks a different name
# (e.g. `br57`, `br-vlan57`), edit BOTH this NAD and the ci1.yaml
# interface block. Keep them in sync.
# - vlan: 0 because the host bridge already strips VLAN tag (br-prod sits
# on top of `enp86s0.57`). If we instead used a VLAN-aware bridge with
# trunk port, set vlan: 57 here. Current convention is VLAN-stripped at
# the sub-interface, so the bridge passes untagged frames.
#
# Apply:
# kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml apply -f apps/kubevirt-vms/prod-vlan57-nad.yaml
#
# Then update ci1.yaml networks: stanza to:
# - name: prod-net
# multus:
# networkName: kubevirt-vms/prod-vlan57
# and the interface block from `masquerade` to `bridge`.
# =============================================================================
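# -----------------------------------------------------------------------------
# Host bridge sketch (illustrative only, NOT the Puppet-managed config).
# The deploy gate above needs `br-prod` enslaving `enp86s0.57` on each node.
# Assuming NetworkManager manages enp86s0 and the bridge itself needs no host
# IP (both are assumptions), the equivalent manual steps would look like:
#
#   nmcli connection add type bridge con-name br-prod ifname br-prod \
#     ipv4.method disabled ipv6.method disabled
#   nmcli connection add type vlan con-name enp86s0.57 ifname enp86s0.57 \
#     dev enp86s0 id 57 master br-prod slave-type bridge
#   nmcli connection up br-prod
#   ip -d link show enp86s0.57   # expect VLAN id 57 and "master br-prod"
#
# The connection names are hypothetical; Puppet may choose different ones.
# -----------------------------------------------------------------------------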
---
# Namespace must exist already (created by ci1.yaml's first document).
# This file imports a NAD into that same namespace.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: prod-vlan57
  namespace: kubevirt-vms
  annotations:
    bluejay.iamworkin.lan/host-bridge: "br-prod (enslaves enp86s0.57)"
    bluejay.iamworkin.lan/cidr: "10.0.57.0/24"
    bluejay.iamworkin.lan/gateway: "10.0.57.1"
    bluejay.iamworkin.lan/dns: "10.0.56.1 (pfSense Unbound)"
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "prod-vlan57",
      "type": "bridge",
      "bridge": "br-prod",
      "ipam": {},
      "mtu": 1500,
      "vlan": 0,
      "promiscMode": true,
      "preserveDefaultVlan": false
    }

286
apps/multus/multus.yaml Normal file

@@ -0,0 +1,286 @@
# =============================================================================
# Multus CNI — Meta-CNI for multi-network attachment to pods/VMs
# =============================================================================
# Purpose: enable KubeVirt VMs (and any future workload) to attach additional
# network interfaces beyond the default Calico-managed pod network. Required
# for ci1 (Windows Server 2025 KubeVirt VM) to bridge onto PROD VLAN 57.
#
# Source: upstream k8snetworkplumbingwg/multus-cni v4.2.2
# https://github.com/k8snetworkplumbingwg/multus-cni/blob/v4.2.2/deployments/multus-daemonset-thick.yml
#
# Inlined verbatim (with project header + version pin annotation) for
# reproducibility and air-gap safety. Bumping versions = edit this file +
# git push. ArgoCD picks up via the bluejay-infra ApplicationSet
# (apps/* directory generator on main).
#
# Why thick plugin (not thin):
# - Thick = daemon + thin shim binary; daemon handles NAD watch + CRD reads
# centrally so each pod's CNI ADD doesn't hit the K8s API server. Better
# for clusters with many NAD-using pods.
# - Thin = each CNI ADD process directly contacts K8s API. Simpler but
# scales worse and has more failure modes.
# - KubeVirt + multi-VM workload pattern fits thick perfectly.
#
# Cluster context (verified 2026-05-08):
# - RKE2 v1.34.5 on 3 nodes (rke2-server, rke2-agent1, rke2-agent2)
# - Calico CNI (Tigera-managed) at /etc/cni/net.d + /opt/cni/bin (default)
# - openSUSE Leap 16, kernel 6.12, containerd 2.1.5
# - host bridge for PROD VLAN 57 = `br-prod` (PUPPET HOST WORK — see Phase 1.5
# in docs/infrastructure/windows-server-build-runner-plan.md)
#
# Image tag: the DaemonSet below uses the floating `snapshot-thick` tag, not a
# v4.2.2-specific tag or digest, so the image a node pulls can change without
# any change in Git. Pinning is deferred until the image is mirrored; for now
# we trust upstream. Pin to a specific SHA256 once we mirror to Gitea OCI.
#
# Apply (once committed to bluejay-infra main, ApplicationSet auto-syncs):
# git add apps/multus/multus.yaml && git commit && git push origin main
# # ArgoCD `infra-multus` Application appears within 3 min via ApplicationSet
#
# Verify:
# kubectl -n kube-system get ds kube-multus-ds
# kubectl -n kube-system rollout status ds kube-multus-ds
# kubectl get crd network-attachment-definitions.k8s.cni.cncf.io
# =============================================================================
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: network-attachment-definitions.k8s.cni.cncf.io
  annotations:
    bluejay.iamworkin.lan/source: "k8snetworkplumbingwg/multus-cni v4.2.2"
spec:
  group: k8s.cni.cncf.io
  scope: Namespaced
  names:
    plural: network-attachment-definitions
    singular: network-attachment-definition
    kind: NetworkAttachmentDefinition
    shortNames:
    - net-attach-def
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          description: 'NetworkAttachmentDefinition is a CRD schema specified by the Network Plumbing
            Working Group to express the intent for attaching pods to one or more logical or physical
            networks. More information available at: https://github.com/k8snetworkplumbingwg/multi-net-spec'
          type: object
          properties:
            apiVersion:
              type: string
            kind:
              type: string
            metadata:
              type: object
            spec:
              description: 'NetworkAttachmentDefinition spec defines the desired state of a network attachment'
              type: object
              properties:
                config:
                  description: 'NetworkAttachmentDefinition config is a JSON-formatted CNI configuration'
                  type: string
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: multus
rules:
  - apiGroups: ["k8s.cni.cncf.io"]
    resources:
      - '*'
    verbs:
      - '*'
  - apiGroups:
      - ""
    resources:
      - pods
      - pods/status
    verbs:
      - get
      - list
      - update
      - watch
  - apiGroups:
      - ""
      - events.k8s.io
    resources:
      - events
    verbs:
      - create
      - patch
      - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: multus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: multus
subjects:
- kind: ServiceAccount
  name: multus
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: multus
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: multus-daemon-config
  namespace: kube-system
  labels:
    tier: node
    app: multus
data:
  daemon-config.json: |
    {
        "chrootDir": "/hostroot",
        "cniVersion": "0.3.1",
        "logLevel": "verbose",
        "logToStderr": true,
        "cniConfigDir": "/host/etc/cni/net.d",
        "multusAutoconfigDir": "/host/etc/cni/net.d",
        "multusConfigFile": "auto",
        "socketDir": "/host/run/multus/"
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-multus-ds
  namespace: kube-system
  labels:
    tier: node
    app: multus
    name: multus
spec:
  selector:
    matchLabels:
      name: multus
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        tier: node
        app: multus
        name: multus
    spec:
      hostNetwork: true
      hostPID: true
      tolerations:
        - operator: Exists
          effect: NoSchedule
        - operator: Exists
          effect: NoExecute
      serviceAccountName: multus
      containers:
        - name: kube-multus
          image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
          command: [ "/usr/src/multus-cni/bin/multus-daemon" ]
          resources:
            requests:
              cpu: "100m"
              memory: "50Mi"
            limits:
              cpu: "100m"
              memory: "50Mi"
          securityContext:
            privileged: true
          terminationMessagePolicy: FallbackToLogsOnError
          volumeMounts:
            - name: cni
              mountPath: /host/etc/cni/net.d
            # multus-daemon expects that cnibin path must be identical between pod and container host.
            # e.g. if the cni bin is in '/opt/cni/bin' on the container host side, then it should be mount to '/opt/cni/bin' in multus-daemon,
            # not to any other directory, like '/opt/bin' or '/usr/bin'.
            - name: cnibin
              mountPath: /opt/cni/bin
            - name: host-run
              mountPath: /host/run
            - name: host-var-lib-cni-multus
              mountPath: /var/lib/cni/multus
            - name: host-var-lib-kubelet
              mountPath: /var/lib/kubelet
              mountPropagation: HostToContainer
            - name: host-run-k8s-cni-cncf-io
              mountPath: /run/k8s.cni.cncf.io
            - name: host-run-netns
              mountPath: /run/netns
              mountPropagation: HostToContainer
            - name: multus-daemon-config
              mountPath: /etc/cni/net.d/multus.d
              readOnly: true
            - name: hostroot
              mountPath: /hostroot
              mountPropagation: HostToContainer
            - mountPath: /etc/cni/multus/net.d
              name: multus-conf-dir
          env:
            - name: MULTUS_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      initContainers:
        - name: install-multus-binary
          image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
          command:
            - "sh"
            - "-c"
            - "cp /usr/src/multus-cni/bin/multus-shim /host/opt/cni/bin/multus-shim && cp /usr/src/multus-cni/bin/passthru /host/opt/cni/bin/passthru"
          resources:
            requests:
              cpu: "10m"
              memory: "15Mi"
          securityContext:
            privileged: true
          terminationMessagePolicy: FallbackToLogsOnError
          volumeMounts:
            - name: cnibin
              mountPath: /host/opt/cni/bin
              mountPropagation: Bidirectional
      terminationGracePeriodSeconds: 10
      volumes:
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: cnibin
          hostPath:
            path: /opt/cni/bin
        - name: hostroot
          hostPath:
            path: /
        - name: multus-daemon-config
          configMap:
            name: multus-daemon-config
            items:
            - key: daemon-config.json
              path: daemon-config.json
        - name: host-run
          hostPath:
            path: /run
        - name: host-var-lib-cni-multus
          hostPath:
            path: /var/lib/cni/multus
        - name: host-var-lib-kubelet
          hostPath:
            path: /var/lib/kubelet
        - name: host-run-k8s-cni-cncf-io
          hostPath:
            path: /run/k8s.cni.cncf.io
        - name: host-run-netns
          hostPath:
            path: /run/netns/
        - name: multus-conf-dir
          hostPath:
            path: /etc/cni/multus/net.d