bluejay-infra

Author	SHA1	Message	Date
Codex	0bf47dfa33	fix(ci1): switch ISO from filesystem PVC to Block-mode DataVolume The bootOrder swap alone didn't fix the install: even with `windows-iso` at bootOrder:1, OVMF UEFI still timed out reading the SATA CDROM: BdsDxe: starting Boot0001 "UEFI QEMU DVD-ROM QM00001 " from ... Sata(...) BdsDxe: failed to start Boot0001 ... : Time out BdsDxe: No bootable option or device was found. Diagnosis (debug pod mounting the live PVC): - /pvc/disk.img IS a valid bootable ISO9660 image; `file` reports "ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable)". - bytes 0..15: zeros (NOT QCOW2 magic 51 46 49 fb). - bytes 32769..32773: "CD001" (ISO9660 primary volume descriptor at the correct offset). So content was fine. The bug is in how KubeVirt + QEMU + Longhorn expose a Filesystem-mode PVC's `/disk.img` as a SATA CDROM. With Block mode the underlying volume IS the raw ISO9660 sectors, so OVMF reads them directly with no QEMU file-emulation layer. This is the recommended pattern for ISO install media on KubeVirt + Longhorn. Migration: - Replace `kind: PersistentVolumeClaim` with `kind: DataVolume` (CDI manages the underlying PVC + upload-target pod). - Set `pvc.volumeMode: Block`. - Annotate `cdi.kubevirt.io/storage.contentType: kubevirt` so CDI keeps raw bytes (no QCOW2 wrap). - VM volume reference changes from `persistentVolumeClaim.claimName` to `dataVolume.name`. KubeVirt's VMI controller blocks VM start until DV phase is Succeeded (upload completed). Operator steps after this lands: 1. Wait for DV `phase: UploadReady`: kubectl get dv -n kubevirt-vms windows-server-2025-iso -w 2. virtctl image-upload dv windows-server-2025-iso -n kubevirt-vms \ --image-path "...\en-us_windows_server_2025...iso" \ --uploadproxy-url https://localhost:8443 --insecure --no-create 3. Re-flip runStrategy to Always (was set to Halted live-side during migration; this commit keeps the manifest at Always). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 14:23:31 -05:00
Codex	87a7d7c70a	fix(ci1): switch deprecated `running: true` -> `runStrategy: Always` Required to clear OutOfSync state after the bootOrder fix. Live VM had runStrategy: Halted (set during diagnosis to release the PVC for inspection). Manifest had running: true. KubeVirt's validating webhook rejects sync: admission webhook "virtualmachine-validator.kubevirt.io" denied the request: Running and RunStrategy are mutually exclusive. Switching to runStrategy: Always preserves the original "auto-start + auto-restart" semantics with the non-deprecated field, and gives ArgoCD a clean diff target to flip Halted -> Always. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 14:12:07 -05:00
Codex	1c4145a581	fix(ci1): swap bootOrder so Windows install ISO boots first Original order: rootdisk=1 (empty 200Gi virtio), windows-iso=2 (SATA CDROM). UEFI tried the empty virtio disk first, got nothing, fell back to Boot0001 (the SATA CDROM) with a short timeout, and aborted with: BdsDxe: failed to start Boot0001 ... Time out BdsDxe: No bootable option or device was found. VM had been running 38+ min with rootdisk actualSize stuck at 4.13 GiB and no AgentConnected condition — install never started. Diagnosis via debug pod mounting the windows-server-2025-iso PVC: /pvc/disk.img: ISO 9660 CD-ROM filesystem data 'SSS_X64FRE_EN-US_DV9' (bootable) bytes 0..15: zeros (NOT QCOW2 magic 51 46 49 fb) bytes 32769..32773: "CD001" (ISO9660 primary volume descriptor) So the PVC content is a real bootable ISO — the only fix needed is to make the ISO bootOrder=1 for first install. After Windows installs, it writes its own UEFI Boot#### entries pointing at the rootdisk EFI partition; UEFI then boots from rootdisk going forward and the ISO at bootOrder:2 is a fallback for re-install scenarios. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 14:10:17 -05:00
Codex	c50a403f74	fix(infra): pin virtio-container-disk to v1.8.2 (containerd 2.1 manifest fix) KubeVirt v1.4.0 + RKE2 containerd 2.1.5 cannot pull quay.io/kubevirt/virtio-container-disk:latest: rpc error: code = Unimplemented desc = failed to pull and unpack image: not implemented: media type "application/vnd.docker.distribution.manifest.v1+prettyjws" is no longer supported since containerd v2.1, please rebuild the image as "application/vnd.docker.distribution.manifest.v2+json" or "application/vnd.oci.image.manifest.v1+json" The :latest tag was last rebuilt with the v1 manifest schema. Tagged versions v1.6.5+, v1.7.3, v1.8.2 are rebuilt with v2/OCI manifests. Pinning to v1.8.2 (newest available, contains current Windows VirtIO drivers). The image only contains the Windows VirtIO driver ISO mounted as a CDROM — not the KubeVirt runtime — so it is decoupled from the cluster KubeVirt version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 13:28:22 -05:00
Codex	fb7bd10528	feat(infra): activate ci1 VM — running:true + 10Gi ISO PVC + 1P password Phase 1 prereqs all satisfied: - Multus CNI v4.2.2 thick-plugin DS Running on rke2-server/agent1/agent2 - CDI v1.65.0 operator + CR Deployed (cdi-apiserver/deployment/uploadproxy all Running 1/1) - Windows Server 2025 ISO (7.7GiB, March 2026 update) uploaded via CDI virtctl image-upload to PVC windows-server-2025-iso. Verified via PVC annotations: cdi.kubevirt.io/storage.condition.running.message="Upload Complete", storage.pod.phase="Succeeded" - Local Administrator password generated (26 char, FANTASTIC strength). Stored in 1Password vault IAmWorkin (qaphopopkryhbg353ukzhhuqoq) item h3ix4mgfk65gmkcmvh6ly3d3hu. UTF-16-LE base64 in autounattend.xml Value field matches the 1P "autounattend AdministratorPassword Value" field. Changes: - ISO PVC bumped 6Gi → 10Gi (ISO is 7.7GiB, need headroom) - Added labels app=ci-runner, flowercore.io/managed-by=bluejay-infra - autounattend.xml AdministratorPassword Value: real base64-encoded password - spec.running: false → true (VM starts on next ArgoCD sync) - Header comment refreshed to LIVE state with prereq references Network: still pod-network masquerade. Multus NAD prod-vlan57 is registered but the VM doesn't use it yet (Phase 1.5 host bridge needed first). Verify after sync: kubectl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml -n kubevirt-vms get vm,vmi virtctl --kubeconfig $env:USERPROFILE\.kube\rke2.yaml vnc ci1 -n kubevirt-vms Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 13:24:46 -05:00
Codex	6c21d14a98	deploy(fc-updater): bump image to v20260508-pub3-deepening-2bdf108 Promotes the fleet to FlowerCore.Updater main @ 2bdf108 which combines: - PR #6 publish pre-signed releases (1a188f4) - PR #7 deeper public-host privacy enforcement (8cd8544) - PublishPreSignedAsync(stream, sig) Integration coverage (2bdf108) Live image already imported to rke2-server and rolled via deploy-web.ps1. This commit aligns the bluejay-infra source of truth so ArgoCD doesn't snap the spec back to the previous tag (per feedback_argocd_managed_image_overrides_do_not_stick).	2026-05-08 13:07:24 -05:00
Codex	b3529f8e96	feat(infra): add Multus CNI + CDI + PROD VLAN 57 NAD as GitOps prereqs for ci1 Adds three new bluejay-infra apps that auto-pickup via ApplicationSet (apps/* directory generator on main): * apps/multus/multus.yaml — Multus CNI v4.2.2 thick-plugin daemonset (verbatim upstream, project-annotated). Enables KubeVirt VMs to attach additional network interfaces. Required by ci1 to bridge onto PROD VLAN 57. * apps/cdi/{cdi-operator.yaml,cdi-cr.yaml,README.md} — Containerized Data Importer v1.65.0 (verbatim upstream). Operator + CR pattern. Enables populating PVCs from HTTP/registry/upload sources, used to load the Windows Server 2025 ISO into the windows-server-2025-iso PVC. * apps/kubevirt-vms/prod-vlan57-nad.yaml — NetworkAttachmentDefinition for PROD VLAN 57 bridge. Deploy gated on Phase 1.5 host work: requires br-prod bridge enslaving enp86s0.57 on each RKE2 node (Puppet config-as-code). ci1.yaml continues to use pod-network masquerade until that lands; switching to multus.networkName: kubevirt-vms/prod-vlan57 is a one-line YAML edit followed by a GitOps push. Cluster verification (2026-05-08): - KubeVirt LIVE (3 nodes, virt-api/controller/handler/operator all Running) - Calico CNI on /etc/cni/net.d + /opt/cni/bin (Multus default paths) - ApplicationSet `bluejay-infra` already watches `apps/*` on main Reproducibility: upstream YAMLs vendored verbatim with project header diffs only. Bumping versions = re-curl + git push. No deploy-time internet fetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 13:05:58 -05:00
Codex	00c11b4eaa	feat(infra): stage ci1 Windows Server 2025 KubeVirt VM (Phase 1, NOT YET APPLIED) Stages a draft VirtualMachine + Namespace + ISO PVC + rootdisk PVC + sysprep ConfigMap for the dedicated GitHub Actions self-hosted runner that replaces the never-registered bluejay-ws-sandbox-1 placeholder. Status: STAGED ONLY. spec.running = false. ISO PVC empty. Two operator decisions still pending before this can boot: 1. Network choice — pod-network fallback (in this draft) vs Multus + PROD VLAN NAD (preferred, requires Multus install). 2. ISO path — manual upload via helper pod (Path A) vs CDI HTTP import (Path B, requires CDI install). Cluster baseline 2026-05-08: - KubeVirt operator: installed, healthy, 14d - CDI: NOT installed - Multus: NOT installed - Calico-only CNI See docs/infrastructure/windows-server-build-runner-plan.md "Phase 1 readiness gate" for the full operator pickup checklist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 12:32:47 -05:00
Codex	04881f46f0	deploy(intranet): promote brochure wave 1 image	2026-05-08 11:12:56 -05:00
Codex	c0038e4859	deploy(intranet): bump image to v20260508-7bad3a5 (Theme picker + FcThemedRoot) FlowerCore.Intranet.Web@7bad3a5 'feat(theme): add /admin/theme picker page + wrap routes in FcThemedRoot'. Image built, distributed to all 3 RKE2 nodes (10.0.56.11/12/13), 366/366 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 10:20:11 -05:00
Codex	dee48831c6	Deploy updater public privacy hardening	2026-05-07 17:12:33 -05:00
Codex	0f1dc5f871	fix(certs): kill cert-manager renewal loop on 3 broken Certificate specs Three Certificates requested duration: 2160h (90d) with renewBefore: 720h (30d). step-ca's ACME provisioner caps cert lifetime at 30d, so it silently issued 720h certs — making renewBefore EQUAL to the actual cert lifetime. cert-manager treats the cert as needing immediate renewal the moment it's issued, creates a CertificateRequest, gets a new (still 30d) cert, marks it for immediate renewal, and loops. Damage on 2026-05-07 ~20:30 (caught during regroup after 5h gap): - fc-worldbuilder/worldbuilder-web-tls: 2365 CRs in 18h - fc-distribution/fc-distribution-tls: 10880 CRs in 18h - knowledge/knowledge-tls: 10888 CRs in 18h Total: 24,133 stale CertificateRequest objects in etcd. Bulk-deleted all CRs + Orders in those 3 namespaces, then this commit fixes the source so ArgoCD sync stops re-creating the loop. Fix: match the working 720h/240h pattern used by every other FC service cert (agent-zero, fc-dns, fc-llm-bridge, fc-php, traefik-system, etc.). 30d cert lifetime + 10d renewal headroom = renewal at day 20, which is the cert-manager standard 2/3-of-lifetime practice. Side effect during loop: ALSO contributed to step-ca load and may have caused intermittent timeouts cluster-wide (the latest stuck challenge was timing out dialing step-ca:9443 even though step-ca itself was up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 15:32:00 -05:00
Codex	11c5f6e6cc	fix(selenium): GitOps-capture selenium-netpol (was unmanaged anywhere) Captured during 2026-05-07 regroup audit. selenium-netpol was applied via raw `kubectl apply` to the cluster on 2026-03-15 with no source-of-truth file anywhere — neither in bluejay-infra nor in any FC service repo. A cluster rebuild from bluejay-infra would have lost it entirely (including the Selenium Grid → Traefik VIP allow rule that gates AAT runs against *.iamworkin.lan services). Captured byte-for-byte from `kubectl get netpol -n selenium selenium-netpol -o yaml`. ServerSideApply via ArgoCD will adopt the existing resource without recreation. The Selenium Grid Deployment + Services themselves are still managed outside ArgoCD (deployed via raw kubectl from the original bring-up). Migrating those into bluejay-infra is a separate lane — this commit only restores GitOps repeatability for the NetworkPolicy. See feedback_networkpolicies_belong_in_bluejay_infra.md for the canonical pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 10:30:59 -05:00
Codex	d637fe9b30	fix(fc-desktop): land 4 NetworkPolicies into bluejay-infra (was deploy-script-only) Repeatability gap caught during 2026-05-07 morning regroup. The four fc-desktop NetworkPolicies (desktop-isolation, fc-desktop-default-deny, remotedesktop-web-isolation, cm-acme-http-solver-allow) were applied via FlowerCore.RemoteDesktop/scripts/deploy-web.sh `kubectl apply` calls. That meant a fresh cluster rebuild from bluejay-infra alone would miss all of them: Browser Lab session isolation, control-plane allow-list, and HTTP-01 cert renewal would silently fail to come up. Canonical FC GitOps pattern is for NetworkPolicies to live alongside other resources in bluejay-infra. Verified by audit: 6 of 11 cluster NetworkPolicies (agent-zero, edge2-services, monitoring, noc-services, telephony, voice) already follow this pattern. fc-desktop was the outlier; selenium-netpol is also unmanaged and tracked separately. Source-of-truth split (now documented in fc-desktop.yaml): - bluejay-infra OWNS: Certificate + IngressRoute + all NetworkPolicies. - FlowerCore.RemoteDesktop scripts/deploy-web.sh OWNS: Deployment + Service ONLY (because `localhost/fc-desktop:linux-xfce` image refs require manual ctr import on each node, and a Deployment in bluejay-infra would race the image-import step). Follow-up commits in FlowerCore.RemoteDesktop will: - Remove the now-duplicate k8s/{networkpolicy,namespace-default-deny,web-networkpolicy,acme-http01-solver-allow}.yaml files. - Drop the 3 `kubectl_apply_file` lines from scripts/deploy-web.sh. The 4 NPs in this commit are byte-for-byte identical to what's running in the cluster today (verified via kubectl get -o yaml diff). ServerSideApply in the bluejay-infra ApplicationSet will adopt the existing resources without recreating them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 10:27:20 -05:00
Codex	5bfe41beca	fix(monitoring): rename bare Grafana dashboard JSONs out of .json extension ArgoCD's directory-driven manifest parser scans .yaml AND .json by default. Bare Grafana dashboard JSONs (no apiVersion/kind/metadata) poisoned manifest generation for the entire infra-monitoring Application ("Object 'Kind' is missing in <dashboard JSON>"), leaving sync state Unknown. These files are SOURCE for the file-provisioning path on noc1 (/opt/monitoring/grafana/dashboards/) and also get inlined into ConfigMap wrappers like grafana-dashboard-remotedesktop.yaml. They are NOT K8s manifests and shouldn't be in ArgoCD's manifest tree. .argocdignore is for repo-level GitOps source eligibility, not for filtering manifests within a directory-mode Application — the cleanest fix is the .txt extension that ArgoCD's parser skips. Reverts the .argocdignore from the previous commit (didn't take effect). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 10:13:37 -05:00
Codex	df22774674	fix(infra): unstick fc-updater + monitoring ArgoCD apps fc-updater PVC: bump updatecenter-data storage 10Gi → 25Gi. The cluster PVC was already manually expanded to 20Gi to fit Mike Bundle (~5.1 GiB) plus the LocalFsBundleStore.MaxTotalBytes soft cap of 25 GiB (see project_uc_remaining_4_apps_signed_2026_05_06). PVCs cannot shrink, so ArgoCD couldn't sync the smaller git value (OutOfSync, retried 5x with "field can not be less than status.capacity"). Setting git to 25Gi gives headroom matching the soft cap. monitoring .argocdignore: skip bare dashboard JSON files. Both fc-updatecenter-dashboard.json and flowercore-remotedesktop-grafana-dashboard.json live in apps/monitoring/ as source-of-truth for file-provisioning to noc1's /opt/monitoring/grafana/dashboards/. ArgoCD's manifest parser tries to unmarshal every file and chokes on bare dashboard JSON ("Object 'Kind' is missing"), which then poisoned the whole infra-monitoring Application status (Unknown sync, no comparison possible). The .argocdignore tells ArgoCD to skip *.json; actual K8s deploys happen via ConfigMap wrappers like grafana-dashboard-remotedesktop.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 10:11:27 -05:00
Codex	c4065b15a3	deploy(ttsreader): persist voice reference clips on pvc	2026-05-06 20:48:58 -05:00
Codex	a4aa612373	deploy(fc-distribution): roll startup backfill fix	2026-05-06 19:51:11 -05:00
Codex	c2eb37dee9	deploy(ttsreader): enable phase6 biblical routing	2026-05-06 19:46:25 -05:00
Codex	bf6f542569	deploy(fc-distribution): roll latest endpoint fix	2026-05-06 19:38:26 -05:00
Codex	e150b2102f	deploy(fc-distribution): roll phase1 api image	2026-05-06 19:31:22 -05:00
Codex	33a765b0bc	deploy(fc-intranet-web): roll v20260506-1737 with Wave 2 specialist galleries 6 Wave 2 product galleries landed on intranet master c083016: - /specialists/telephony (7 sections + Overview, +11 tests) - /specialists/library (8 sections + Overview, +17 tests) - /specialists/retail (6 sections + Overview, +16 tests) - /specialists/mysql (6 sections + Overview, +22 tests) - /specialists/php (6 sections + Overview, +9 tests) - /specialists/pimanager (7 sections + Overview, +11 tests) NavMenu.razor wired with new Specialists section. Test ledger: 280 -> 366 (+86) full project, 0W/0E build. Sources: 6 sibling-depth worktrees claude/intranet-w2-{name} dispatched 2026-05-06 per intranet-xxxl-sprint-2026-05-05.md §4 Phase 2. Inherits Q-IK-1..15 + Q-IS-1..12 + Q-IX-1..7 verbatim per Q-IW-5. 6 Q-IW-1..6 cards on Notes decisions-waiting.html.	2026-05-06 17:38:22 -05:00
Codex	5484ed7db6	Adopt fc-updater into ArgoCD	2026-05-06 17:33:32 -05:00
Codex	2aa84349ea	merge claude/bluejay-infra-worldbuilder: roll fc-intranet-web v20260506-2120 with WorldBuilder LIVE flip	2026-05-06 16:22:51 -05:00
Codex	851f8e673b	deploy(fc-intranet-web): roll v20260506-2120 with WorldBuilder LIVE flip WorldBuilder live runtime promotion lands in the Intranet at /services/world-builder + ServiceRegistry homepage tile. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 16:22:43 -05:00
Codex	f78f8c8192	Merge claude/bluejay-infra-ttsreader-4delta: bump fc-ttsreader image for Phase 4delta enrichment landing	2026-05-06 16:04:57 -05:00
Codex	9b255fefc1	merge claude/bluejay-infra-worldbuilder: cpu request fix	2026-05-06 16:04:32 -05:00
Codex	6a89a76e39	fc-ttsreader: bump image to v202605061500 (Phase 4delta enrichment pipeline) Phase 4delta server-side HTML overlay enrichment landed in FlowerCore.TtsReader@8f23e15 (master @6091618). Adds 9-pass enrichment + SQLite-backed cache + 4 REST endpoints (/api/v1/enrich/{html,jsonld,both,passes}) + RenderRequest.sourceJsonLd. Tests 476 -> 522 (+46). Image already imported to all RKE2 nodes via deploy.sh; this bumps the bluejay-infra-managed tag so ArgoCD reconciles the live deployment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 16:04:31 -05:00
Codex	2489464d4f	fix(worldbuilder): cpu request 100m -> 25m to clear scheduler Cluster CPU-request budget at 99% on all 3 RKE2 nodes at deploy time. 0/3 nodes available; "3 Insufficient cpu". Actual CPU usage on the nodes is 10/52/19%, so the cluster is request-overprovisioned but has plenty of real headroom. Idle Blazor + SignalR + ComfyUI poller is ~5m. 25m unblocks scheduling and stays generous for expected runtime. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 16:04:25 -05:00
Codex	4b777b16ac	monitoring: mirror fc-signage-marquee alert group into noc-monitoring K8s target Mirror of FlowerCore.Notes/scripts/monitoring/alerts.yml fc-signage-marquee group into the K8s migration target apps/monitoring/noc-monitoring.yaml so that future migration of the noc1 Podman monitoring stack into RKE2 inherits the marquee alert ruleset automatically. Three rules added: - MarqueeDroppedFramesHigh (5% / 5min / warning) - MarqueeRenderLatencyP99High (16ms / 10min / warning) - MarqueeAnimationDurationDrift (10% / 15min / info) All three gated with `unless on() absent_over_time(metric[7d])` so they don't fire during the metric-not-yet-published window before Track 3 IR-21 source IMPL ships the MarqueeMeter into Common + Web + WPF. Live source-of-truth (the noc1 Podman Prometheus reads from /opt/monitoring/prometheus/alerts.yml) was updated and reloaded in the same session — Notes commit 300daa0 carries the matching alerts.yml + Grafana fc-signage-dashboard.json change. Per feedback_monitoring_k8s_target_vs_live_podman: this file is the forward-looking K8s migration target, NOT what the live Podman Prometheus reads. ArgoCD-syncing this file does NOT push alerts to the live monitoring stack. Companion to: - FlowerCore.Notes 300daa0 (live alerts.yml + Grafana panels deployed) - docs/signage/marquee-performance-telemetry-design.md (Track 3 IR-21 spec) - docs/signage/marquee-animation-phases.md (Track 6 13-phase coverage matrix) Memory: project_marquee_vr_promotion_landed_2026_05_06 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 16:01:44 -05:00
Codex	8c60e3a4d3	merge claude/bluejay-infra-worldbuilder: fc-worldbuilder ArgoCD app	2026-05-06 15:57:34 -05:00
Codex	df02b4c3c3	feat(worldbuilder): add fc-worldbuilder ArgoCD app FlowerCore.WorldBuilder runtime deploy: Namespace + Longhorn PVC + Deployment + Service + step-ca Certificate + Traefik IngressRoute. ArgoCD ApplicationSet picks up apps/worldbuilder/ within ~3 minutes. Source: D:\git\FlowerCore\FlowerCore.WorldBuilder@6ed6d26 Initial image: localhost/fc-worldbuilder:v202605062048 (already imported on all 3 RKE2 nodes via ctr images import). DNS preflight done: worldbuilder.iamworkin.lan -> 10.0.56.200 (Traefik VIP) in pfSense Unbound (FlowerCore.DNS provider was 502 at deploy time, fell back to direct pfSense PHP exec via diag_command.php). ImageGen backend: BLUEJAY-WS http://10.0.56.20:8188 (R9700 / gfx1201 / ROCm 7.2.1). One real branding render confirmed working 2026-05-06T20:36:47Z. Memory references in README: - feedback_pfsense_dns_required_for_acme - feedback_rke2_image_import_per_node_scp - feedback_k8s_probes_must_not_hit_openapi - feedback_k8s_probes_behind_auth_middleware - feedback_dataprotection_keys_persist_to_app_dbcontext Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 15:56:59 -05:00
Codex	c0dceafffd	deploy(ttsreader): roll web v20260506-47a88ae	2026-05-06 14:40:57 -05:00
Codex	490db8f9e6	deploy(fc-intranet-web): roll v20260505-1108 with fleet-search seam landed Bumps tag to bring live pod up to FlowerCore.Intranet.Web@a9ede80 (master tip post-fleet-search-resurrect merge). Image imported to all 3 RKE2 nodes via scripts/deploy.sh v20260505-1108. Closes the source-vs-deployed gap that existed since 2026-04-29: the KnowledgeFleetSearchController + Service + TrustedHeader auth handler were running on the deployed pod but never landed on master. Surgical extraction from stale codex/fleet-knowledge-search branch (12-file rebase conflict made full merge non-trivial) brings the source up to match production. +7 tests (280/280 vs 273), 0W/0E build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 10:18:36 -05:00
Codex	1926bdaf3b	merge claude/bluejay-infra-update-center-monitoring-2026-05-05: Update Center Operations dashboard mirror (Phase 1D)	2026-05-05 11:01:06 -05:00
Codex	ca8d062826	feat(monitoring): mirror Update Center Operations dashboard (Track 1D) Adds fc-updatecenter-dashboard.json (uid: fc-updatecenter, version: 2) to apps/monitoring/ — mirrors the dashboard deployed to noc1 at /opt/monitoring/grafana/dashboards/fc-updatecenter-dashboard.json. 13 panels: 5 existing probe/availability panels + 1 OTEL row header + 7 new panels for the 6 OTEL counters added to FlowerCore.Updater.Web: updatecenter_manifest_requests_total updatecenter_bundle_download_bytes_total updatecenter_bundle_downloads_total updatecenter_checkins_total updatecenter_release_publishes_total updatecenter_signature_verify_failures_total Live on Grafana at https://grafana.iamworkin.lan/d/fc-updatecenter Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 10:54:39 -05:00
Codex	1889462fc4	deploy(fc-intranet-web): roll v20260505-1041 with fc_dp_keys migration Bumps tag to include the new AddDataProtectionKeys EF migration that closes the fc_dp_keys table-creation gap from v20260505-1023. Master tip a82d7d4. Previous tag v20260505-1023 crash-looped on every page load with 'no such table: fc_dp_keys' due to eb9fe6d (DataProtection-in-DB) registering the DI but missing the table-creation migration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 10:35:42 -05:00
Codex	523ba61232	deploy(fc-intranet-web): roll Phase 0 closeout image v20260505-1023 Bumps intranet image tag to bring live pod up to FlowerCore.Intranet.Web@ea80c25 (post-XXXL Phase 0 closeout merge). Image imported to all 3 RKE2 nodes via scripts/deploy.sh v20260505-1023. Carries the 8 commits from claude/intranet-fleet-fixes: - Range processing for read-aloud audio - Blazor SignalR receive limit raise (8 MB) - ASP.NET footgun sweep (PR #3) - Self-contained linux-x64 publish (transitive deps) - Blazor error-ui banner proof + AAT - DataProtection-in-DB + FcReconnectModal adoption - Custom .bj-reconnect CSS removal - Library PNG privacy withdrawal + WorldBuilder design page + Overview enrichment Tests: 273/273 passed, 32 AAT skipped, 0W/0E build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 10:27:14 -05:00
Codex	53f67c8713	merge claude/k8s-manifest-hardening: K8s gotcha sweep (C7) + lint extensions	2026-05-04 23:00:34 -05:00
Codex	6b9cf3d12c	K8s gotcha sweep C7: extend lint + cover Track A allowlist + scope Notes/k8s Follow-up to `0b52093` (K8s manifest hardening) closing two real gaps the prior sweep didn't catch: 1. Public read-write allowlist regression guard (Track A) - New PublicReadWriteAllowlistHosts set tracks updatecenter.iamworkin.lan + updates.iamworkin.lan. The allowlist on those hosts is GET||HEAD||POST||OPTIONS; POST is required for the bootstrap-JWT check-in endpoint. PUT/PATCH/DELETE must still 404 at the route. - New PublicReadWriteIngressRoutes_MustPinGetHeadPostOptionsAllowlist test enforces the allowlist invariant (3 required methods present, 3 forbidden methods absent). - Companion conftest.dev policy 08_public_readwrite_allowlist.rego. 2. Selenium NetworkPolicy DNAT backend port audit - FlowerCore.Notes/k8s/selenium/06-networkpolicy.yaml allowed Traefik VIP 10.0.56.200:443 + :80 but its 10.42.0.0/16 + 10.43.0.0/16 egress rules didn't include the post-DNAT backend ports (8443 for Traefik TLS, 8080 for HTTP). Per feedback_netpol_dnat_backend_port: kube-proxy DNATs the destination to a backend pod IP+port BEFORE Calico evaluates the FORWARD chain, so without those backend ports in the pod CIDR rule, Selenium-driven browser AAT calls to https://*.iamworkin.lan time out at connect. - Lint inventory now includes FlowerCore.Notes/k8s/selenium/ so regressions in this manifest fail fast. Lint scope notes: - FlowerCore.Notes/k8s/guacamole/ + monitoring/ are historical scaffolds that have diverged from the live state (bluejay-infra/apps/ is canonical). Operator review is required before bringing them in line OR decommissioning them, so they are kept out of lint scope until that decision lands (see xxl-regroup-2026-05-03-followup.md "Codex 7 §0"). README hardening: - New "Public read-write allowlist hosts" entry under "Known gotchas" documenting the GET||HEAD||POST||OPTIONS pattern + linking the lint. Tests: 8/8 lint tests pass. Companion fix in FlowerCore.Updater repo on branch codex/k8s-gotcha-fleet-sweep-c7 (k8s/web-deployment.yaml: localhost/ image needs imagePullPolicy: Never). The FlowerCore.Updater fix applies to a deploy that's currently live but bites only when a pod is first scheduled onto a fresh node, so it is not a live production-impact regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:57:59 -05:00
Codex	0b52093b36	K8s manifest hardening + new bluejay-infra-lint test project Manifest hardening (per documented memories): - apps/asterisk/deployment.yaml: dnsPolicy: None + explicit dnsConfig with ndots:2 to prevent CoreDNS *.iamworkin.lan template from hijacking external egress (downloads.asterisk.org). - apps/fc-llm-bridge/fc-llm-bridge.yaml: same dnsConfig pattern for api.anthropic.com egress. - apps/fc-ttsreader/fc-ttsreader.yaml: same dnsConfig pattern for huggingface.co model seeding. - apps/fc-messageboard/fc-messageboard.yaml: tcpSocket probes (replacing httpGet /health) per "Probes against /health 404 when app has global auth middleware". - apps/fc-signalcontrol/fc-signalcontrol.yaml: same tcpSocket probe fix. New lint project: - tests/bluejay-infra-lint/BluejayInfraLint.Tests.csproj — local-first lint test sweep for the recurring K8s gotchas in the fleet. - tests/bluejay-infra-lint/FleetManifestLintTests.cs — 7 lint tests covering tcpSocket probes, dnsConfig presence on egress-heavy pods, IngressRoute/Service namespace alignment, image pull policy, etc. - tests/bluejay-infra-lint/conftest.dev/ — matching conftest policies for environments with conftest/opa. - .gitignore — adds bin/ + obj/ + DS_Store/swp. README.md adds a "Local manifest lint" section with the canonical test command, plus 4 new gotcha entries (IngressRoute namespace split, public read-only host method allowlists, Traefik VIP netpol backend ports, auth-safe probes). Tests: 7 / 7 lint tests passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 03:18:04 -05:00
Codex	7a9098d3bd	fix(fc-ttsreader): lower web cpu request	2026-05-04 02:28:11 -05:00
Andrew Stoltz	57d7ba46a7	feat(monitoring): add fc-remotedesktop grafana dashboard JSON-provisioned dashboard for FlowerCore.RemoteDesktop session metrics, matches the Apr 23 staging done in the codex/ttsreader-release-b6ca2d5 worktree. Drop into apps/monitoring so ArgoCD-managed Grafana provisioning picks it up alongside the other FC service dashboards.	2026-04-30 14:32:54 -05:00
Andrew Stoltz	9ec2e2d52e	deploy(ttsreader): bump web image to b6ca2d5	2026-04-30 12:43:48 -05:00
Andrew Stoltz	b4d62a8a50	deploy(fc-ttsreader): roll chapter-context image	2026-04-30 02:31:55 -05:00
Andrew Stoltz	fbbc07023b	deploy(fc-llm-bridge): roll fc:vision image v202604300022 Source: FlowerCore.LlmBridge@8dd181c (feat: fc:vision route + image content forwarding). Adds: - fc:vision tier alias parsing (TryParseTier handles fc:vision, FC:VISION, openai/fc:vision, vision) - Image content forwarding: OpenAi image_url shape (https URL + data:[mediaType];base64,... URI) and Anthropic image/source passthrough are now promoted to LlmContentBlocks. Text-only content-parts arrays still flatten to the legacy joined string. - DefaultRoutes seeder + appsettings.json gain Vision -> Anthropic + claude-sonnet-4-6. Image built on BLUEJAY-WS, podman save + ctr import to all 3 RKE2 nodes (rke2-server, rke2-agent1, rke2-agent2). Bridge tests: 62/62 green (was 51/51, +11). Backwards-compatible with current chat / util / embed callers; existing routes unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 00:26:45 -05:00
Andrew Stoltz	4b0eef0fb0	deploy(fc-llm-bridge): roll alias-fix image v20260430001132	2026-04-30 00:13:48 -05:00
Andrew Stoltz	bb09a3786f	fix(knowledge): pin live manifest to bundled edition path	2026-04-29 23:37:02 -05:00
Andrew Stoltz	006dbcf671	fix(agent-zero): export knowledge mcp gate to python builder	2026-04-29 23:32:55 -05:00
Andrew Stoltz	1be71d6ba7	fix(agent-zero): export mcp servers without python indent errors	2026-04-29 23:19:48 -05:00

456 Commits
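Commits 0bf47dfa33 and 1c4145a581 describe two byte checks: the QCOW2 magic at offset 0 and "CD001" at bytes 32769..32773. Both can be reproduced with a short Python sketch. It assumes the standard 2048-byte ISO 9660 sector size, so the primary volume descriptor starts at sector 16 (byte 32768) with a type byte of 1 followed by "CD001". The function names are hypothetical; this is not the script that was run in the debug pod.

```python
import sys

QCOW2_MAGIC = b"QFI\xfb"         # 51 46 49 fb, the QCOW2 header magic at byte 0
ISO_PVD_OFFSET = 16 * 2048       # ISO 9660 primary volume descriptor: sector 16, 2048-byte sectors
ISO_STD_ID = b"CD001"            # standard identifier, bytes 1..5 of the descriptor


def classify_header(first4: bytes, pvd: bytes) -> str:
    """Classify an image from its first 4 bytes and the 6 bytes at ISO_PVD_OFFSET."""
    if first4 == QCOW2_MAGIC:
        return "qcow2"
    # Descriptor type 1 = primary volume descriptor; "CD001" lands on absolute
    # bytes 32769..32773, matching the offsets quoted in the commit messages.
    if len(pvd) >= 6 and pvd[0] == 1 and pvd[1:6] == ISO_STD_ID:
        return "iso9660"
    return "unknown"


def classify_image(path: str) -> str:
    """Read only the bytes classify_header needs from a file such as /pvc/disk.img."""
    with open(path, "rb") as f:
        first4 = f.read(4)
        f.seek(ISO_PVD_OFFSET)
        pvd = f.read(6)
    return classify_header(first4, pvd)


if __name__ == "__main__":
    for image in sys.argv[1:]:
        print(image, classify_image(image))
```

To use it, run the script in a pod that mounts the PVC and pass /pvc/disk.img as the argument.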
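The migration list in commit 0bf47dfa33 can be sketched as a Python function that emits the DataVolume as a dict. The dict can be serialized to JSON, which kubectl also accepts. The sketch uses CDI's `cdi.kubevirt.io/v1beta1` DataVolume with an `upload` source, which gives `virtctl image-upload dv ... --no-create` a target to fill. The following are all illustrative assumptions, not the contents of ci1.yaml:

- the `ReadWriteOnce` access mode
- the optional storage class
- the VM disk name `windows-iso`
- both function names

```python
import json
from typing import Optional


def iso_upload_datavolume(name: str, namespace: str, size: str,
                          storage_class: Optional[str] = None) -> dict:
    """Block-mode DataVolume awaiting an upload, per the ci1 migration notes."""
    pvc = {
        "accessModes": ["ReadWriteOnce"],          # assumption: single-node attach
        "volumeMode": "Block",                     # raw ISO sectors, no /disk.img file
        "resources": {"requests": {"storage": size}},
    }
    if storage_class:
        pvc["storageClassName"] = storage_class    # e.g. a Longhorn class (hypothetical)
    return {
        "apiVersion": "cdi.kubevirt.io/v1beta1",
        "kind": "DataVolume",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Annotation as given in the commit message: keep raw bytes, no QCOW2 wrap.
            "annotations": {"cdi.kubevirt.io/storage.contentType": "kubevirt"},
        },
        "spec": {"source": {"upload": {}}, "pvc": pvc},
    }


def vm_iso_volume(disk_name: str, dv_name: str) -> dict:
    """VM volumes entry: dataVolume.name replaces persistentVolumeClaim.claimName."""
    return {"name": disk_name, "dataVolume": {"name": dv_name}}


if __name__ == "__main__":
    dv = iso_upload_datavolume("windows-server-2025-iso", "kubevirt-vms", "10Gi")
    print(json.dumps(dv, indent=2))
    print(json.dumps(vm_iso_volume("windows-iso", dv["metadata"]["name"]), indent=2))
```

The printed DataVolume can be piped to `kubectl apply -f -`. Once the DV reports `UploadReady`, the `virtctl image-upload dv` step from the commit fills it.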
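Commit fb7bd10528 stores the Administrator password in autounattend.xml as UTF-16-LE text, then base64. A minimal Python sketch of that encoding follows. The `suffix` parameter reflects an assumption. Unattend files that Windows System Image Manager produces with `PlainText` set to false are commonly described as encoding the password with the element name (`AdministratorPassword`) appended. Check which form your file uses before relying on it. The password below is a placeholder.

```python
import base64


def encode_unattend_password(password: str, suffix: str = "AdministratorPassword") -> str:
    """Base64 of UTF-16-LE(password + suffix), for a <Value> with <PlainText>false</PlainText>."""
    return base64.b64encode((password + suffix).encode("utf-16-le")).decode("ascii")


def decode_unattend_password(value: str, suffix: str = "AdministratorPassword") -> str:
    """Inverse of encode_unattend_password; strips the suffix if present."""
    text = base64.b64decode(value).decode("utf-16-le")
    return text[: -len(suffix)] if suffix and text.endswith(suffix) else text


if __name__ == "__main__":
    value = encode_unattend_password("placeholder-not-the-real-password")
    print(value)
    print(decode_unattend_password(value))
```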
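The renewal-loop arithmetic in commit 0f1dc5f871 can be checked with a few lines of Python. This is a simplified model. It assumes renewal becomes due at issuance + (issued lifetime - renewBefore), with the issued lifetime capped at 720h by step-ca as the commit describes. cert-manager's real scheduling code has its own handling of edge cases, such as renewBefore at or above the lifetime. Treat this as a sanity check on the numbers, not a reimplementation.

```python
from datetime import timedelta

STEP_CA_MAX_LIFETIME = timedelta(hours=720)  # 30d cap described in the commit


def renewal_due_after(requested: timedelta, renew_before: timedelta,
                      ca_cap: timedelta = STEP_CA_MAX_LIFETIME) -> timedelta:
    """Time from issuance until renewal is due, for the cert the CA actually issues."""
    issued_lifetime = min(requested, ca_cap)
    return issued_lifetime - renew_before


def renews_immediately(requested: timedelta, renew_before: timedelta,
                       ca_cap: timedelta = STEP_CA_MAX_LIFETIME) -> bool:
    """True when a fresh cert is already due for renewal, i.e. the loop condition."""
    return renewal_due_after(requested, renew_before, ca_cap) <= timedelta(0)


if __name__ == "__main__":
    broken = (timedelta(hours=2160), timedelta(hours=720))  # 90d requested, 30d renewBefore
    fixed = (timedelta(hours=720), timedelta(hours=240))    # 30d requested, 10d renewBefore
    print("broken loops:", renews_immediately(*broken))
    print("fixed renews after:", renewal_due_after(*fixed))
```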
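Commits 5bfe41beca and df22774674 trace the infra-monitoring sync failure to .json files that have no `apiVersion` or `kind`. A pre-push check for that condition might look like the Python sketch below. It only approximates what ArgoCD's directory source does: it checks top-level keys and also flags unparsable JSON. The function name and the default directory argument are hypothetical.

```python
import json
import sys
from pathlib import Path


def bare_json_files(directory: str) -> list:
    """Names of *.json files in `directory` that are not Kubernetes manifests."""
    offenders = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            offenders.append(path.name)  # unparsable JSON is flagged as well
            continue
        # A manifest object needs at least apiVersion and kind at the top level.
        if not (isinstance(doc, dict) and "apiVersion" in doc and "kind" in doc):
            offenders.append(path.name)
    return offenders


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "apps/monitoring"
    found = bare_json_files(target)
    for name in found:
        print("not a manifest:", name)
    if found:
        raise SystemExit(1)
```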
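The PVC rule in commit df22774674 reduces to a quantity comparison: git may not request less than the bound `status.capacity`. The Python sketch below accepts only integer values with binary suffixes (Ki, Mi, Gi, Ti), or plain byte counts. Real Kubernetes quantities also allow decimal suffixes and fractions, which this sketch deliberately rejects. The function names are hypothetical.

```python
import re

_BINARY_SUFFIX = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity: str) -> int:
    """Parse '25Gi'-style storage quantities; rejects anything else."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    value, suffix = match.groups()
    return int(value) * (_BINARY_SUFFIX[suffix] if suffix else 1)


def resize_allowed(desired: str, current_capacity: str) -> bool:
    """PVCs can grow but not shrink: desired must be >= the bound capacity."""
    return quantity_to_bytes(desired) >= quantity_to_bytes(current_capacity)


if __name__ == "__main__":
    for desired in ("10Gi", "25Gi"):
        print(desired, "vs 20Gi capacity:", resize_allowed(desired, "20Gi"))
```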
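The allowlist invariant in commit 6b9cf3d12c can be sketched in Python against a Traefik match rule string. The sketch makes these assumptions:

- The rule uses one Method(...) matcher per method.
- The required set is the full GET/HEAD/POST/OPTIONS list named in the commit. The real C# test's exact required set may differ.
- The function name is hypothetical.

This is not a port of PublicReadWriteIngressRoutes_MustPinGetHeadPostOptionsAllowlist.

```python
import re

REQUIRED_METHODS = {"GET", "HEAD", "POST", "OPTIONS"}  # POST: bootstrap-JWT check-in
FORBIDDEN_METHODS = {"PUT", "PATCH", "DELETE"}        # must keep 404ing at the route
_METHOD_MATCHER = re.compile(r"Method\(`([A-Z]+)`\)")


def allowlist_problems(rule: str) -> list:
    """Return human-readable violations; an empty list means the rule passes."""
    methods = set(_METHOD_MATCHER.findall(rule))
    problems = [f"missing {m}" for m in sorted(REQUIRED_METHODS - methods)]
    problems += [f"forbidden {m}" for m in sorted(FORBIDDEN_METHODS & methods)]
    return problems


if __name__ == "__main__":
    rule = ("Host(`updates.iamworkin.lan`) && (Method(`GET`) || Method(`HEAD`) "
            "|| Method(`POST`) || Method(`OPTIONS`))")
    print(allowlist_problems(rule) or "ok")
```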
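The dnsConfig change in commit 0b52093b36 rests on the resolver rule from resolv.conf(5). A name with at least `ndots` dots is tried as an absolute name first; a name with fewer dots goes through every search domain first. Kubernetes' ClusterFirst policy writes ndots:5. The Python sketch below lists the resulting query order. Its search list is a hypothetical example, not the cluster's actual resolv.conf; the `iamworkin.lan` entry stands in for the zone the CoreDNS *.iamworkin.lan template answers.

```python
def query_order(name: str, search: list, ndots: int) -> list:
    """Order in which a glibc-style stub resolver tries names (first answer wins)."""
    if name.endswith("."):
        return [name[:-1]]                      # fully qualified: no search expansion
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded                # absolute name first
    return expanded + [name]                    # search domains first


if __name__ == "__main__":
    # Hypothetical pod search list; the last entry is the wildcard-answered zone.
    search = ["fc-llm-bridge.svc.cluster.local", "svc.cluster.local",
              "cluster.local", "iamworkin.lan"]
    for ndots in (5, 2):
        print(ndots, query_order("api.anthropic.com", search, ndots)[:2])
```

With ndots:5, api.anthropic.com.iamworkin.lan is queried before the real name, so a template that answers every *.iamworkin.lan query wins. With ndots:2, the absolute name goes first.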