Follow-up to 0b52093 (K8s manifest hardening) closing two real gaps the
prior sweep didn't catch:
1. Public read-write allowlist regression guard (Track A)
- New PublicReadWriteAllowlistHosts set tracks updatecenter.iamworkin.lan
+ updates.iamworkin.lan. The allowlist on those hosts is
GET||HEAD||POST||OPTIONS — POST is required for the bootstrap-JWT
check-in endpoint. PUT/PATCH/DELETE must still 404 at the route.
- New PublicReadWriteIngressRoutes_MustPinGetHeadPostOptionsAllowlist
test enforces the allowlist invariant (3 required methods present,
3 forbidden methods absent).
- Companion conftest.dev policy 08_public_readwrite_allowlist.rego.
2. Selenium NetworkPolicy DNAT backend port audit
- FlowerCore.Notes/k8s/selenium/06-networkpolicy.yaml allowed Traefik
VIP 10.0.56.200:443 + :80 but its 10.42.0.0/16 + 10.43.0.0/16 egress
rules didn't include the post-DNAT backend ports (8443 for Traefik
TLS, 8080 for HTTP). Per feedback_netpol_dnat_backend_port: kube-proxy
DNATs the destination to a backend pod IP+port BEFORE Calico
evaluates the FORWARD chain, so without those backend ports in the
pod CIDR rule, Selenium-driven browser AAT calls to
https://*.iamworkin.lan time out at connect.
- Lint inventory now includes FlowerCore.Notes/k8s/selenium/ so
regressions in this manifest fail fast.
Lint scope notes:
- FlowerCore.Notes/k8s/guacamole/ + monitoring/ are historical
scaffolds that have diverged from the live state (bluejay-infra/apps/
is canonical). Operator review is required before bringing them in
line OR decommissioning them — kept out of lint scope until that
decision lands (see xxl-regroup-2026-05-03-followup.md "Codex 7 §0").
README hardening:
- New "Public read-write allowlist hosts" entry under "Known gotchas"
documenting the GET||HEAD||POST||OPTIONS pattern + linking the lint.
Tests: 8/8 lint tests pass.
Companion fix in FlowerCore.Updater repo on branch
codex/k8s-gotcha-fleet-sweep-c7 (k8s/web-deployment.yaml: localhost/ image
needs imagePullPolicy: Never). The FlowerCore.Updater fix applies to a
deploy that's currently live but bites only on first scheduled-pod
landing on a fresh node — not a live production-impact regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JSON-provisioned dashboard for FlowerCore.RemoteDesktop session metrics,
matches the Apr 23 staging done in the codex/ttsreader-release-b6ca2d5
worktree. Drop into apps/monitoring so ArgoCD-managed Grafana provisioning
picks it up alongside the other FC service dashboards.
Per-profile MoodAnnotationModelOverride picker — Profiles page now shows
a model dropdown from IModelRegistry instead of a free-text field; model
override null-falls-back to global TtsReader:Ollama:DefaultModel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior commit b71f9e4 created a stray YAML document between the
bluejay-tools-c and bluejay-profile sections. kubectl applied the stray
block's data to bluejay-profile (wrong ConfigMap, wrong mount target).
The setup-bluejay initContainer copies bluejay-tools-{a,b,c} to the tools
directory; bluejay-profile is copied to the agent profile directory. Tools
must live in one of the three tools ConfigMaps.
Fix: insert corpus_search.py and intranet_search.py directly into the
bluejay-tools-c YAML document (before kind/metadata, matching the
data-first layout the rest of the file uses). Also fix two mojibake
characters (→ and ·) that were corrupted in the prior commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add corpus_search.py to bluejay-tools-c: semantic vector search over
fleet SQLite-vec DBs (fleet-workstation-full, fleet-pi-edge, fleet-bmo-bot).
Returns offline-friendly results for Bible/Greek/Hebrew/Strongs corpora.
Cluster pod degrades gracefully (no DB mounted yet — BLUEJAY-WS only for now).
- Add intranet_search.py to bluejay-tools-c: live RAG search over the
intranet vector store via GET /api/search?q=...&topK=N. Uses in-cluster
service URL (http://intranet-web.intranet.svc:5300) to bypass Traefik TLS
and the private-range egress denylist.
- Fix intranet_search.py param name: was 'limit', now 'topK' matching the
SearchController's [FromQuery] parameter name.
- NetworkPolicy: add egress rule for intranet namespace port 5300 (without
this the pod's TCP connection to the search endpoint was dropped).
- agent-zero.yaml: set FLOWERCORE_INTRANET_URL env var to in-cluster service
URL so intranet_search uses internal routing, not the public Traefik VIP.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `print-web-api-keys` OnePasswordItem CRD that syncs from 1Password
"Print.Web API Keys" vault item (password field). Mount as PRINT_WEB_API_KEY
env var in the agent-zero container.
The print_web.py Python tool (already in bluejay-tools ConfigMaps) reads
PRINT_WEB_URL and PRINT_WEB_API_KEY env vars for all HTTP calls to the
thermal print service on edge2. Previously the key was unset so every API
call was rejected with 401.
Note: Print.Web uses the legacy REST MCP shape (/api/mcp/tools/*) not the
streamable-http protocol. The Python tool bridges this gap — no /mcp endpoint
exists on Print.Web today. Network policy already allows 10.0.57.16:5200.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of Mac mini onboarding (2026-04-28):
- Add OnePasswordItem CRD 'macmini-vnc-creds' in guacamole namespace bound to
vault item 'Mac Mini' — operator mints Secret with username/password/VNC Password fields
- Mac mini discovered at 10.0.56.115 (INFRA VLAN) — not 10.0.57.50 stored in 1P IP field
- Guacamole connections updated via API (not stored here): VNC conn #10, SSH conns #9/#33
corrected from old IP 10.0.57.50 → 10.0.56.115
- macOS: 26.4.1 (Sequoia), Apple M1, 16 GB, user: bluejay (admin group)
- VNC port 5900 confirmed open; SSH works via noc1 jumpbox with password auth
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ProgressHub endpoint at /hubs/progress with project-scoped
group broadcasting for JobStarted, CueProgress, JobCompleted, and
JobFailed events.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deploys fix for stale Failed manifest accumulation in TTS Reader Ops view
and atomic-write guard against empty/corrupt job manifests.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Sprint E XXL Phase 4γ MVP deploy — POST /api/v1/render endpoint.
Two changes:
1. Image tag v202604272339 → v202604280946 (TtsReader@d9e0a58 master tip
includes the new RenderController + RenderService + 9 tests).
2. New TtsReader__Render__CdnDirectory=/data/cdn env var. Default
wwwroot/cdn resolves under the read-only app filesystem when
runAsNonRoot=true; pin to the existing writable PVC mount alongside
other TtsReader runtime data. Manifests + cue audio land at
/data/cdn/sha256/<hash>/manifest.json + cues/.
Pre-existing PVC mount at /data/ already covers this — no PVC change
needed, just the env var override.
Pairs with TtsReader@d9e0a58 master tip (ready for image build + import).
Bump intranet image to v20260427-2353 (master @ 38b0148):
- Sprint E search lane: /search Blazor page + IntranetSearchService
+ DocsCorpusIndexer + Shared.Indexing wiring
- 7 new service pages: LocalAiAgents, AiTopology, Distribution, Dns,
Knowledge, LlmBridge, Provisioning
- PiManager drift docs
New env var: PageReadingOverrides__FilePath=/data/page-reading-overrides.json
so the persisted Lane 2α store lives on the writable PVC instead of
the default in-memory fallback (which loses state on pod restart).
Operator-edited overrides via the existing /api/v1/pages/{encoded}/overrides
controller will now survive across restarts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TtsReader@9333480: distinguishes partial-render (yellow Warning, audio
plays, 'Re-render N failed sentences' button) from full-fail (red
Danger, 'Try render again'). New TtsFallbackChainFailedException carries
both voices when Kokoro + Piper both fail; chapter breadcrumb names
the entire chain instead of just the requested voice. +8 tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kokoro pod has 4 restarts in 2d6h with exit 143 (SIGTERM from kubelet).
kubectl describe events all show:
Liveness probe failed: Get "http://10.42.229.109:8880/v1/audio/voices":
context deadline exceeded
The probe path /v1/audio/voices shares the FastAPI worker pool with
/v1/audio/speech. A long synth (Bible chapter, 30+ sentences) holds the
pool past the prior 5s × 3 = 15s probe window, kubelet kills the pod,
in-flight renders fail. Operator hits "fallback chain failed" toasts +
partial-render breadcrumbs during these windows.
Bump probe timeoutSeconds 5 → 15 and failureThreshold 3 → 5 → 75 s of
grace before kubelet gives up. Combined with the kokoro-side circuit
breaker landing in TtsReader (Sprint E Phase 1b), the FC backend will
also stop slamming kokoro during recovery so it can serve the probe
even faster.
The companion Prometheus alerts (KokoroPodFlapping, PiperPodFlapping)
land in FlowerCore.Notes/scripts/monitoring/alerts.yml.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Day 8 disk-cache warmer crashes on production with
'Read-only file system : /home/app/data' because the relative default
'data/voice-previews' resolves under runAsNonRoot HOME (read-only with
readOnlyRootFilesystem=true). Pin to /data/voice-previews so the cache
lands on the writable PVC mount alongside ttsreader.db, audio output,
and jobs root.
Image v202604272216 (already on nodes) is unaffected by this — only
the env routing changes. ArgoCD reconciles + rollout restart picks up
the new env without rebuild.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v202604272157 crash-looped on the production PVC because Database.EnsureCreated()
is a no-op on existing DBs and the VoiceProfiles table was missing. TtsReader@a9f0b73
adds an idempotent CREATE TABLE IF NOT EXISTS to the infra reconciler before
TtsReaderDataSeeder runs. Bumping the manifest to pick up that fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds 7 new pages (5 service pages, AI topology, opencode operator guide)
to https://intranet.iamworkin.lan:
/services/dns
/services/distribution
/services/llm-bridge
/services/knowledge
/services/provisioning
/services/ai-topology
/development/local-ai-agents
Plus topology corrections in /services/ai (AiStack.razor) and 6 new nav entries.
Source commit: FlowerCore.Intranet.Web@1598542 on
codex-wip-pre-readaloud-collision-2026-04-24.
Image built from artifacts/publish via Dockerfile.deploy on BLUEJAY-WS,
imported to all 3 RKE2 nodes (rke2-server + rke2-agent1 + rke2-agent2).
Build: 0 warnings, 0 errors, 197/197 tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>