Compare commits

58 Commits

Author SHA1 Message Date
Codex
05af0c48f9 Mirror voice stack monitoring assets 2026-05-06 17:35:59 -05:00
Codex
c0dceafffd deploy(ttsreader): roll web v20260506-47a88ae 2026-05-06 14:40:57 -05:00
Codex
490db8f9e6 deploy(fc-intranet-web): roll v20260505-1108 with fleet-search seam landed
Bumps tag to bring live pod up to FlowerCore.Intranet.Web@a9ede80 (master tip
post-fleet-search-resurrect merge). Image imported to all 3 RKE2 nodes via
scripts/deploy.sh v20260505-1108.

Closes the source-vs-deployed gap that existed since 2026-04-29: the
KnowledgeFleetSearchController + Service + TrustedHeader auth handler were
running on the deployed pod but never landed on master. Surgical extraction
from stale codex/fleet-knowledge-search branch (12-file rebase conflict made
full merge non-trivial) brings the source up to match production.

+7 tests (280/280 vs 273), 0W/0E build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 10:18:36 -05:00
Codex
1926bdaf3b merge claude/bluejay-infra-update-center-monitoring-2026-05-05: Update Center Operations dashboard mirror (Phase 1D) 2026-05-05 11:01:06 -05:00
Codex
ca8d062826 feat(monitoring): mirror Update Center Operations dashboard (Track 1D)
Adds fc-updatecenter-dashboard.json (uid: fc-updatecenter, version: 2)
to apps/monitoring/ — mirrors the dashboard deployed to noc1 at
/opt/monitoring/grafana/dashboards/fc-updatecenter-dashboard.json.

13 panels: 5 existing probe/availability panels + 1 OTEL row header
+ 7 new panels for the 6 OTEL counters added to FlowerCore.Updater.Web:

  updatecenter_manifest_requests_total
  updatecenter_bundle_download_bytes_total
  updatecenter_bundle_downloads_total
  updatecenter_checkins_total
  updatecenter_release_publishes_total
  updatecenter_signature_verify_failures_total

Live on Grafana at https://grafana.iamworkin.lan/d/fc-updatecenter

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:54:39 -05:00
Codex
1889462fc4 deploy(fc-intranet-web): roll v20260505-1041 with fc_dp_keys migration
Bumps tag to include the new AddDataProtectionKeys EF migration that closes
the fc_dp_keys table-creation gap from v20260505-1023. Master tip a82d7d4.

Previous tag v20260505-1023 crash-looped on every page load with
'no such table: fc_dp_keys' due to eb9fe6d (DataProtection-in-DB)
registering the DI but missing the table-creation migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:35:42 -05:00
Codex
523ba61232 deploy(fc-intranet-web): roll Phase 0 closeout image v20260505-1023
Bumps intranet image tag to bring live pod up to FlowerCore.Intranet.Web@ea80c25
(post-XXXL Phase 0 closeout merge). Image imported to all 3 RKE2 nodes via
scripts/deploy.sh v20260505-1023.

Carries the 8 commits from claude/intranet-fleet-fixes:
- Range processing for read-aloud audio
- Blazor SignalR receive limit raise (8 MB)
- ASP.NET footgun sweep (PR #3)
- Self-contained linux-x64 publish (transitive deps)
- Blazor error-ui banner proof + AAT
- DataProtection-in-DB + FcReconnectModal adoption
- Custom .bj-reconnect CSS removal
- Library PNG privacy withdrawal + WorldBuilder design page + Overview enrichment

Tests: 273/273 passed, 32 AAT skipped, 0W/0E build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:27:14 -05:00
Codex
53f67c8713 merge claude/k8s-manifest-hardening: K8s gotcha sweep (C7) + lint extensions 2026-05-04 23:00:34 -05:00
Codex
6b9cf3d12c K8s gotcha sweep C7 — extend lint + cover Track A allowlist + scope Notes/k8s
Follow-up to 0b52093 (K8s manifest hardening) closing two real gaps the
prior sweep didn't catch:

1. Public read-write allowlist regression guard (Track A)
   - New PublicReadWriteAllowlistHosts set tracks updatecenter.iamworkin.lan
     + updates.iamworkin.lan. The allowlist on those hosts is
     GET||HEAD||POST||OPTIONS — POST is required for the bootstrap-JWT
     check-in endpoint. PUT/PATCH/DELETE must still 404 at the route.
   - New PublicReadWriteIngressRoutes_MustPinGetHeadPostOptionsAllowlist
     test enforces the allowlist invariant (3 required methods present,
     3 forbidden methods absent).
   - Companion conftest.dev policy 08_public_readwrite_allowlist.rego.

2. Selenium NetworkPolicy DNAT backend port audit
   - FlowerCore.Notes/k8s/selenium/06-networkpolicy.yaml allowed Traefik
     VIP 10.0.56.200:443 + :80 but its 10.42.0.0/16 + 10.43.0.0/16 egress
     rules didn't include the post-DNAT backend ports (8443 for Traefik
     TLS, 8080 for HTTP). Per feedback_netpol_dnat_backend_port: kube-proxy
     DNATs the destination to a backend pod IP+port BEFORE Calico
     evaluates the FORWARD chain, so without those backend ports in the
     pod CIDR rule, Selenium-driven browser AAT calls to
     https://*.iamworkin.lan time out at connect.
   - Lint inventory now includes FlowerCore.Notes/k8s/selenium/ so
     regressions in this manifest fail fast.

Lint scope notes:
   - FlowerCore.Notes/k8s/guacamole/ + monitoring/ are historical
     scaffolds that have diverged from the live state (bluejay-infra/apps/
     is canonical). Operator review is required before bringing them in
     line OR decommissioning them — kept out of lint scope until that
     decision lands (see xxl-regroup-2026-05-03-followup.md "Codex 7 §0").

README hardening:
   - New "Public read-write allowlist hosts" entry under "Known gotchas"
     documenting the GET||HEAD||POST||OPTIONS pattern + linking the lint.

Tests: 8/8 lint tests pass.

Companion fix in FlowerCore.Updater repo on branch
codex/k8s-gotcha-fleet-sweep-c7 (k8s/web-deployment.yaml: localhost/ image
needs imagePullPolicy: Never). The FlowerCore.Updater fix applies to a
deploy that's currently live but bites only on first scheduled-pod
landing on a fresh node — not a live production-impact regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:57:59 -05:00
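The Selenium NetworkPolicy gap described in 6b9cf3d12c can be sketched as the egress rules below. This is a minimal illustration, not the contents of 06-networkpolicy.yaml: the policy name, namespace, and podSelector are assumptions, while the VIP, CIDRs, and ports are taken from the commit message. The real manifest may carry additional ports in the pod/service CIDR rule.

```yaml
# Hypothetical sketch: metadata and selector values are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: selenium-egress-example   # assumed name
  namespace: selenium             # assumed namespace
spec:
  podSelector: {}                 # assumed: selects every pod in the namespace
  policyTypes:
    - Egress
  egress:
    # Pre-DNAT view: the Traefik VIP on its published ports.
    - to:
        - ipBlock:
            cidr: 10.0.56.200/32
      ports:
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 80
    # Post-DNAT view: kube-proxy has already rewritten the destination to a
    # backend pod IP and port, so the backend ports must be allowed as well.
    - to:
        - ipBlock:
            cidr: 10.42.0.0/16
        - ipBlock:
            cidr: 10.43.0.0/16
      ports:
        - protocol: TCP
          port: 8443   # Traefik TLS backend port
        - protocol: TCP
          port: 8080   # Traefik HTTP backend port
```

Without the second rule, a connection to the VIP on 443 is rewritten to port 8443 on a pod IP and then dropped, which matches the connect timeouts the commit describes.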
Codex
0b52093b36 K8s manifest hardening + new bluejay-infra-lint test project
Manifest hardening (per documented memories):
- apps/asterisk/deployment.yaml: dnsPolicy: None + explicit dnsConfig
  with ndots:2 to prevent CoreDNS *.iamworkin.lan template from
  hijacking external egress (downloads.asterisk.org).
- apps/fc-llm-bridge/fc-llm-bridge.yaml: same dnsConfig pattern for
  api.anthropic.com egress.
- apps/fc-ttsreader/fc-ttsreader.yaml: same dnsConfig pattern for
  huggingface.co model seeding.
- apps/fc-messageboard/fc-messageboard.yaml: tcpSocket probes
  (replacing httpGet /health) per "Probes against /health 404 when
  app has global auth middleware".
- apps/fc-signalcontrol/fc-signalcontrol.yaml: same tcpSocket probe
  fix.

New lint project:
- tests/bluejay-infra-lint/BluejayInfraLint.Tests.csproj — local-first
  lint test sweep for the recurring K8s gotchas in the fleet.
- tests/bluejay-infra-lint/FleetManifestLintTests.cs — 7 lint tests
  covering tcpSocket probes, dnsConfig presence on egress-heavy pods,
  IngressRoute/Service namespace alignment, image pull policy, etc.
- tests/bluejay-infra-lint/conftest.dev/ — matching conftest policies
  for environments with conftest/opa.
- .gitignore: adds bin/, obj/, .DS_Store, and *.swp.

README.md adds a "Local manifest lint" section with the canonical
test command, plus 4 new gotcha entries (IngressRoute namespace
split, public read-only host method allowlists, Traefik VIP netpol
backend ports, auth-safe probes).

Tests: 7 / 7 lint tests passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 03:18:04 -05:00
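The two hardening patterns in 0b52093b36 (explicit dnsConfig with ndots:2, and tcpSocket probes in place of httpGet /health) can be sketched as one pod template fragment. Treat it as an illustration under stated assumptions: the nameserver address, container name, image, and port are hypothetical and not taken from the manifests. With `dnsPolicy: None`, Kubernetes requires at least one nameserver in `dnsConfig`.

```yaml
# Hypothetical pod template fragment: names, image, port, and nameserver are assumptions.
spec:
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 10.43.0.10            # assumed cluster DNS ClusterIP
    options:
      - name: ndots
        value: "2"            # names with at least two dots are tried as absolute names first
  containers:
    - name: example           # assumed container name
      image: localhost/example:latest   # assumed image
      ports:
        - containerPort: 8080 # assumed port
      # tcpSocket only checks that the port accepts connections, so it is not
      # affected by auth middleware that would answer 404 on /health.
      readinessProbe:
        tcpSocket:
          port: 8080
      livenessProbe:
        tcpSocket:
          port: 8080
```

The trade-off of a tcpSocket probe is that it confirms the listener is up, not that the application is healthy behind it.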
Codex
7a9098d3bd fix(fc-ttsreader): lower web cpu request 2026-05-04 02:28:11 -05:00
Andrew Stoltz
57d7ba46a7 feat(monitoring): add fc-remotedesktop grafana dashboard
JSON-provisioned dashboard for FlowerCore.RemoteDesktop session metrics,
matches the Apr 23 staging done in the codex/ttsreader-release-b6ca2d5
worktree. Drop into apps/monitoring so ArgoCD-managed Grafana provisioning
picks it up alongside the other FC service dashboards.
2026-04-30 14:32:54 -05:00
Andrew Stoltz
9ec2e2d52e deploy(ttsreader): bump web image to b6ca2d5 2026-04-30 12:43:48 -05:00
Andrew Stoltz
b4d62a8a50 deploy(fc-ttsreader): roll chapter-context image 2026-04-30 02:31:55 -05:00
Andrew Stoltz
fbbc07023b deploy(fc-llm-bridge): roll fc:vision image v202604300022
Source: FlowerCore.LlmBridge@8dd181c (feat: fc:vision route + image
content forwarding). Adds:

- fc:vision tier alias parsing (TryParseTier handles fc:vision,
  FC:VISION, openai/fc:vision, vision)
- Image content forwarding: OpenAi image_url shape (https URL +
  data:[mediaType];base64,... URI) and Anthropic image/source
  passthrough are now promoted to LlmContentBlocks. Text-only
  content-parts arrays still flatten to the legacy joined string.
- DefaultRoutes seeder + appsettings.json gain Vision -> Anthropic +
  claude-sonnet-4-6.

Image built on BLUEJAY-WS, podman save + ctr import to all 3 RKE2
nodes (rke2-server, rke2-agent1, rke2-agent2). Bridge tests: 62/62
green (was 51/51, +11). Backwards-compatible with current chat /
util / embed callers; existing routes unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:26:45 -05:00
Andrew Stoltz
4b0eef0fb0 deploy(fc-llm-bridge): roll alias-fix image v20260430001132 2026-04-30 00:13:48 -05:00
Andrew Stoltz
bb09a3786f fix(knowledge): pin live manifest to bundled edition path 2026-04-29 23:37:02 -05:00
Andrew Stoltz
006dbcf671 fix(agent-zero): export knowledge mcp gate to python builder 2026-04-29 23:32:55 -05:00
Andrew Stoltz
1be71d6ba7 fix(agent-zero): export mcp servers without python indent errors 2026-04-29 23:19:48 -05:00
Andrew Stoltz
0c8026c912 fix(agent-zero): avoid heredoc break in mcp bootstrap 2026-04-29 23:16:54 -05:00
Andrew Stoltz
621ae47e00 fix(agent-zero): repair fc knowledge mcp manifest 2026-04-29 23:11:57 -05:00
Andrew Stoltz
ae6b8c0142 fix(knowledge): keep mcp key env on new token secret 2026-04-29 23:06:07 -05:00
Andrew Stoltz
da55220218 feat(agent-zero): wire fc_knowledge phase1 rollout 2026-04-29 22:59:19 -05:00
Andrew Stoltz
b1ad253dd6 fix(agent-zero): prefix bridge embedding alias for litellm 2026-04-29 21:14:12 -05:00
Andrew Stoltz
ee935f6e07 fix(agent-zero): keep internal util/embed on bridge v1 2026-04-29 21:09:04 -05:00
Andrew Stoltz
2853ee2024 chore(bridge): bump fc-llm-bridge image tag v202604292028 2026-04-29 20:50:55 -05:00
Andrew Stoltz
b4a34e16ca refactor(agent-zero): drop ollama-proxy sidecar (Phase 3) 2026-04-29 20:50:55 -05:00
Andrew Stoltz
0d5a1fd530 fix(agent-zero): route util and embed through llm bridge 2026-04-29 19:14:01 -05:00
Andrew Stoltz
1b633f57b2 chore(infra): wire knowledge MCP api key secret 2026-04-29 18:04:43 -05:00
Andrew Stoltz
ee8afd0a08 deploy(intranet): promote auth-gated intranet image 2026-04-29 17:11:17 -05:00
Andrew Stoltz
cf35884eae deploy(intranet): harden knowledge search rollout 2026-04-29 16:43:09 -05:00
Andrew Stoltz
9881767b11 deploy(intranet): bump intranet web for knowledge search lane 2026-04-29 16:21:27 -05:00
Andrew Stoltz
c9bf23834b chore(ttsreader): bump image to v202604291817
Per-profile MoodAnnotationModelOverride picker — Profiles page now shows
a model dropdown from IModelRegistry instead of a free-text field; model
override null-falls-back to global TtsReader:Ollama:DefaultModel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 13:21:40 -05:00
Andrew Stoltz
174002023d fix(agent-zero): move corpus_search + intranet_search into bluejay-tools-c
The prior commit b71f9e4 created a stray YAML document between the
bluejay-tools-c and bluejay-profile sections. kubectl applied the stray
block's data to bluejay-profile (wrong ConfigMap, wrong mount target).

The setup-bluejay initContainer copies bluejay-tools-{a,b,c} to the tools
directory; bluejay-profile is copied to the agent profile directory. Tools
must live in one of the three tools ConfigMaps.

Fix: insert corpus_search.py and intranet_search.py directly into the
bluejay-tools-c YAML document (before kind/metadata, matching the
data-first layout the rest of the file uses). Also fix two mojibake
characters (→ and ·) that were corrupted in the prior commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:49:23 -05:00
Andrew Stoltz
b71f9e4ec9 feat(agent-zero): add corpus_search + intranet_search to cluster configmaps
- Add corpus_search.py to bluejay-tools-c: semantic vector search over
  fleet SQLite-vec DBs (fleet-workstation-full, fleet-pi-edge, fleet-bmo-bot).
  Returns offline-friendly results for Bible/Greek/Hebrew/Strongs corpora.
  Cluster pod degrades gracefully (no DB mounted yet — BLUEJAY-WS only for now).

- Add intranet_search.py to bluejay-tools-c: live RAG search over the
  intranet vector store via GET /api/search?q=...&topK=N. Uses in-cluster
  service URL (http://intranet-web.intranet.svc:5300) to bypass Traefik TLS
  and the private-range egress denylist.

- Fix intranet_search.py param name: was 'limit', now 'topK' matching the
  SearchController's [FromQuery] parameter name.

- NetworkPolicy: add egress rule for intranet namespace port 5300 (without
  this the pod's TCP connection to the search endpoint was dropped).

- agent-zero.yaml: set FLOWERCORE_INTRANET_URL env var to in-cluster service
  URL so intranet_search uses internal routing, not the public Traefik VIP.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:34:31 -05:00
Andrew Stoltz
f1431f7324 feat(agent-zero): wire Print.Web API key to pod via 1Password OnePasswordItem
Add `print-web-api-keys` OnePasswordItem CRD that syncs from 1Password
"Print.Web API Keys" vault item (password field). Mount as PRINT_WEB_API_KEY
env var in the agent-zero container.

The print_web.py Python tool (already in bluejay-tools ConfigMaps) reads
PRINT_WEB_URL and PRINT_WEB_API_KEY env vars for all HTTP calls to the
thermal print service on edge2. Previously the key was unset so every API
call was rejected with 401.

Note: Print.Web uses the legacy REST MCP shape (/api/mcp/tools/*) not the
streamable-http protocol. The Python tool bridges this gap — no /mcp endpoint
exists on Print.Web today. Network policy already allows 10.0.57.16:5200.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 20:36:36 -05:00
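The wiring described in f1431f7324 can be sketched as a container env entry that reads the Secret minted by the 1Password operator. This is a sketch under assumptions: it assumes the operator creates a Secret with the same name as the OnePasswordItem and exposes the item's password field under the key `password`; the container name is hypothetical.

```yaml
# Hypothetical container fragment: container name and Secret key are assumptions.
containers:
  - name: agent-zero              # assumed container name
    env:
      - name: PRINT_WEB_API_KEY
        valueFrom:
          secretKeyRef:
            name: print-web-api-keys   # Secret synced from the OnePasswordItem
            key: password              # assumed key for the item's password field
```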
Andrew Stoltz
35bd055cb4 feat(guacamole): add macmini-vnc-creds OnePasswordItem + fix Mac mini connection IPs
Phase 1 of Mac mini onboarding (2026-04-28):
- Add OnePasswordItem CRD 'macmini-vnc-creds' in guacamole namespace bound to
  vault item 'Mac Mini' — operator mints Secret with username/password/VNC Password fields
- Mac mini discovered at 10.0.56.115 (INFRA VLAN) — not 10.0.57.50 stored in 1P IP field
- Guacamole connections updated via API (not stored here): VNC conn #10, SSH conns #9/#33
  corrected from old IP 10.0.57.50 → 10.0.56.115
- macOS: 26.4.1 (Tahoe), Apple M1, 16 GB, user: bluejay (admin group)
- VNC port 5900 confirmed open; SSH works via noc1 jumpbox with password auth

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 20:09:45 -05:00
Andrew Stoltz
f604ab419e feat(ttsreader): bump image to v202604281923 (SignalR ProgressHub)
Adds ProgressHub endpoint at /hubs/progress with project-scoped
group broadcasting for JobStarted, CueProgress, JobCompleted, and
JobFailed events.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 19:30:41 -05:00
Andrew Stoltz
b2786252b0 chore(ttsreader): bump web image to v202604281831 (ops failed-manifest cleanup)
Deploys fix for stale Failed manifest accumulation in TTS Reader Ops view
and atomic-write guard against empty/corrupt job manifests.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 18:31:53 -05:00
Andrew Stoltz
45ee40920d fix(ttsreader): bump image to v202604281638 (Range support + Ollama timeout 240s) 2026-04-28 16:44:57 -05:00
Andrew Stoltz
8ad7eb714b fix(ttsreader): bump image to v202604281542 (annotation few-shot prompt + UI hint) 2026-04-28 15:46:28 -05:00
Andrew Stoltz
3cb44c3104 feat(noc-services): wire puppetdb.iamworkin.lan through Traefik step-ca cert 2026-04-28 15:13:20 -05:00
Andrew Stoltz
2400329acd fix(intranet): bump image to v20260428-1500 (Monitoring crash patch + Lane 11 anatomy refresh) 2026-04-28 14:59:27 -05:00
Andrew Stoltz
c17af882cc fix(ttsreader): bump image to v202604281444 for UX polish (cross-chapter Bible passage, /profiles dedup, /ops table) 2026-04-28 14:48:13 -05:00
Andrew Stoltz
76b1938afa fix(ttsreader): bump image to v202604281434 for live playback regression patch (study-player + speech override synth) 2026-04-28 14:43:06 -05:00
Andrew Stoltz
ced04a6148 intranet: bump web image to v20260428-0953
Sprint E XXL Intranet docs depth + read-aloud-root sweep deploy.

Image tag v20260427-2353 → v20260428-0953:
- Track A (Intranet.Web@c4f3d78): 7 service pages deepened toward
  PrintService.razor's 8-tab depth standard. Workflows / Verified
  Surfaces / Recent Verified Changes added.
- Read-aloud-root sweep (Intranet.Web@787982c): data-read-aloud-root
  wrappers added to 6 older /services/* pages so the read-aloud
  overlay scopes content extraction precisely instead of falling back
  to <main> with layout chrome included.
2026-04-28 09:54:27 -05:00
Andrew Stoltz
f2258b92a2 fc-ttsreader: bump web image to v202604280946 + add Render__CdnDirectory env
Sprint E XXL Phase 4γ MVP deploy — POST /api/v1/render endpoint.

Two changes:
1. Image tag v202604272339 → v202604280946 (TtsReader@d9e0a58 master tip
   includes the new RenderController + RenderService + 9 tests).
2. New TtsReader__Render__CdnDirectory=/data/cdn env var. Default
   wwwroot/cdn resolves under the read-only app filesystem when
   runAsNonRoot=true; pin to the existing writable PVC mount alongside
   other TtsReader runtime data. Manifests + cue audio land at
   /data/cdn/sha256/<hash>/manifest.json + cues/.

Pre-existing PVC mount at /data/ already covers this — no PVC change
needed, just the env var override.

Pairs with TtsReader@d9e0a58 master tip (ready for image build + import).
2026-04-28 09:47:46 -05:00
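The env override from f2258b92a2 can be sketched as the container fragment below. The env var name, its value, and the /data mount path come from the commit message; the container name and volume name are assumptions made for illustration.

```yaml
# Hypothetical container fragment: container and volume names are assumptions.
containers:
  - name: web                     # assumed container name
    securityContext:
      runAsNonRoot: true
    env:
      # Redirect the CDN output away from the default wwwroot/cdn, which
      # resolves under the read-only app filesystem.
      - name: TtsReader__Render__CdnDirectory
        value: /data/cdn
    volumeMounts:
      - name: data                # assumed name of the existing PVC volume
        mountPath: /data
```

Because /data/cdn sits under the existing mount, no additional volume or PVC is needed, which is the point the commit makes.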
Andrew Stoltz
979a7c7b25 feat(intranet): bump fc-intranet-web to v20260427-2353 + persist PageReadingOverrides
Bump intranet image to v20260427-2353 (master @ 38b0148):
- Sprint E search lane: /search Blazor page + IntranetSearchService
  + DocsCorpusIndexer + Shared.Indexing wiring
- 7 new service pages: LocalAiAgents, AiTopology, Distribution, Dns,
  Knowledge, LlmBridge, Provisioning
- PiManager drift docs

New env var: PageReadingOverrides__FilePath=/data/page-reading-overrides.json
so the persisted Lane 2α store lives on the writable PVC instead of
the default in-memory fallback (which loses state on pod restart).
Operator-edited overrides via the existing /api/v1/pages/{encoded}/overrides
controller will now survive across restarts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 23:54:17 -05:00
Andrew Stoltz
0df8f7b936 chore(ttsreader): bump fc-ttsreader-web to v202604272339 (Sprint E Phase C — partial-render UX)
TtsReader@9333480: distinguishes partial-render (yellow Warning, audio
plays, 'Re-render N failed sentences' button) from full-fail (red
Danger, 'Try render again'). New TtsFallbackChainFailedException carries
both voices when Kokoro + Piper both fail; chapter breadcrumb names
the entire chain instead of just the requested voice. +8 tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 23:40:19 -05:00
Andrew Stoltz
38558641c1 fix(ttsreader-kokoro): bump liveness probe timeouts (Sprint E Phase 1a)
Kokoro pod has 4 restarts in 2d6h with exit 143 (SIGTERM from kubelet).
kubectl describe events all show:

  Liveness probe failed: Get "http://10.42.229.109:8880/v1/audio/voices":
    context deadline exceeded

The probe path /v1/audio/voices shares the FastAPI worker pool with
/v1/audio/speech. A long synth (Bible chapter, 30+ sentences) holds the
pool past the prior 5s × 3 = 15s probe window, kubelet kills the pod,
in-flight renders fail. Operator hits "fallback chain failed" toasts +
partial-render breadcrumbs during these windows.

Bump probe timeoutSeconds 5 → 15 and failureThreshold 3 → 5, giving 15 s × 5 = 75 s of
grace before kubelet gives up. Combined with the kokoro-side circuit
breaker landing in TtsReader (Sprint E Phase 1b), the FC backend will
also stop slamming kokoro during recovery so it can serve the probe
even faster.

The companion Prometheus alerts (KokoroPodFlapping, PiperPodFlapping)
land in FlowerCore.Notes/scripts/monitoring/alerts.yml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 23:28:07 -05:00
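The probe change in 38558641c1 can be sketched as follows. The path, port, and the new timeoutSeconds and failureThreshold values come from the commit message; the container name is an assumption, and periodSeconds is left out because the commit does not state it.

```yaml
# Hypothetical container fragment: container name is an assumption.
containers:
  - name: kokoro                  # assumed container name
    livenessProbe:
      httpGet:
        path: /v1/audio/voices
        port: 8880
      timeoutSeconds: 15          # was 5
      failureThreshold: 5         # was 3
```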
Andrew Stoltz
63d905b4df chore(ttsreader): bump fc-ttsreader-web to v202604272236 (Thinking + Feedback ALTERs) 2026-04-27 22:37:08 -05:00
Andrew Stoltz
d95f4e0caf chore(ttsreader): bump fc-ttsreader-web to v202604272228 (ChatSessions IsFavorite ALTER hotfix) 2026-04-27 22:28:56 -05:00
Andrew Stoltz
7bc565d17e fix(ttsreader): pin VoicePreview CacheDirectory to /data PVC
Day 8 disk-cache warmer crashes on production with
'Read-only file system : /home/app/data' because the relative default
'data/voice-previews' resolves under runAsNonRoot HOME (read-only with
readOnlyRootFilesystem=true). Pin to /data/voice-previews so the cache
lands on the writable PVC mount alongside ttsreader.db, audio output,
and jobs root.

Image v202604272216 (already on nodes) is unaffected by this — only
the env routing changes. ArgoCD reconciles + rollout restart picks up
the new env without rebuild.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 22:24:04 -05:00
Andrew Stoltz
dfe9c3b67e chore(ttsreader): bump fc-ttsreader-web to v202604272216 (brace-escape fix) 2026-04-27 22:16:19 -05:00
Andrew Stoltz
37f8db89e4 chore(ttsreader): bump fc-ttsreader-web to v202604272208 (Day 10 + VoiceProfiles hotfix)
v202604272157 crash-looped on the production PVC because Database.EnsureCreated()
is a no-op on existing DBs and the VoiceProfiles table was missing. TtsReader@a9f0b73
adds an idempotent CREATE TABLE IF NOT EXISTS to the infra reconciler before
TtsReaderDataSeeder runs. Bumping the manifest to pick up that fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 22:09:08 -05:00
Andrew Stoltz
00c7d8df24 chore(ttsreader): bump fc-ttsreader-web to v202604272157 (Sprint E Day 10 UX polish)
Compact project page (Setup chip strip + chapter inspect-toggle drawer)
+ render feedback (rolling ETA strip + active-chapter pulse) + Bible
Dashboard navigates to /projects/{id} on queue. Source TtsReader@79de78b.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 21:58:12 -05:00
Andrew Stoltz
c6811eadd8 intranet: bump image to v20260427-newpages-and-topology
Adds 7 new pages (5 service pages, AI topology, opencode operator guide)
to https://intranet.iamworkin.lan:
  /services/dns
  /services/distribution
  /services/llm-bridge
  /services/knowledge
  /services/provisioning
  /services/ai-topology
  /development/local-ai-agents

Plus topology corrections in /services/ai (AiStack.razor) and 6 new nav entries.

Source commit: FlowerCore.Intranet.Web@1598542 on
codex-wip-pre-readaloud-collision-2026-04-24.

Image built from artifacts/publish via Dockerfile.deploy on BLUEJAY-WS,
imported to all 3 RKE2 nodes (rke2-server + rke2-agent1 + rke2-agent2).

Build: 0 warnings, 0 errors, 197/197 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:52:34 -05:00
Andrew Stoltz
4d9d537d83 fix(knowledge): repoint Ollama at edge1 + flip README to LIVE (Sprint E B7)
Two changes after the Phase 2.4 deploy went live at
https://knowledge.iamworkin.lan:

1. **Ollama URL flip**: from BLUEJAY-WS (10.0.56.20:11434) to edge1 Pi 5
   (10.0.57.17:11434). Honors the cluster-clean architecture from
   bluejay-infra@0f9d56e ("Workstation is private dev hardware and should
   not be in the cluster path"). Query-time embeddings (~ms per query)
   are fast enough on edge1; bulk index rebuilds (Phase 2.5+) will need a
   separate ingestion lane that can opt into the workstation GPU when
   present. ArgoCD picks up the env-var change and rolls the pod
   automatically — no image rebuild needed.

2. **README LIVE status**: flip the staged-not-yet-applied banner to
   LIVE 2026-04-27. Pod running, certificate issued, PVC bound,
   /healthz 200, /api/v1/editions [] (initial-deploy state). Phase 2.5+
   admin UI handles bulk population.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:56:35 -05:00
27 changed files with 2868 additions and 153 deletions

.gitignore

@@ -0,0 +1,7 @@
+# .NET build outputs (lint test project)
+**/bin/
+**/obj/
+# Editor / temp
+.DS_Store
+*.swp

@@ -99,8 +99,23 @@ curl -sk -X DELETE https://dns.iamworkin.lan/api/v1/servers/<serverId>/zones/iam
 - **CoreDNS template + ndots:5 collision**: inside pods, `<svc>.<ns>.svc.cluster.local` with <5 dots gets search-expanded through `iamworkin.lan` FIRST and hits the wildcard template → resolves to Traefik VIP, not the real ClusterIP. Use short service names (`<svc>`) in K8s manifests. See memory `feedback_coredns_ndots_template_collision.md`.
 - **Image not on node**: pods stuck `ErrImageNeverPull` means the image wasn't imported to the node Kubernetes scheduled the pod onto. `ctr images import` on all of rke2-server, rke2-agent1, rke2-agent2.
 - **StatefulSet PVC drift**: `volumeClaimTemplates` needs explicit `volumeMode: Filesystem` or ArgoCD SSA self-heals forever. See memory `feedback_argocd_statefulset_pvc_drift.md`.
+- **IngressRoute namespace split**: this RKE2 Traefik install does not allow cross-namespace service refs. Keep the `IngressRoute`, backend `Service`, and TLS secret in the same namespace; if one host is shared across namespaces, duplicate the `Certificate` and move the route next to the destination service.
+- **Public read-only hosts**: if a public host fronts a service that also exposes admin writes internally, add a Traefik route match like `Host(...) && (Method(GET) || Method(HEAD))` on the public edge instead of trusting the app to reject unsafe methods.
+- **Public read-write allowlist hosts**: if a public host accepts a tightly bounded write surface (e.g. bootstrap-JWT POST), pin the allowlist as `(Method(GET) || Method(HEAD) || Method(POST) || Method(OPTIONS))`. PUT/PATCH/DELETE must still 404 at the route. Track A's `updatecenter.iamworkin.lan` / `updates.iamworkin.lan` are the canonical example. The lint test enforces this invariant.
+- **Traefik VIP netpols**: when a `NetworkPolicy` allows `10.0.56.200`, also allow the post-DNAT backend ports (`8443` for TLS plus `8080` or `8000` for HTTP) or Calico will drop the rewritten flow.
+- **Auth-safe probes**: services behind API-key or global auth middleware should prefer `tcpSocket` probes unless `/health` is explicitly exempted before the middleware runs.
 - **ArgoCD must use internal Gitea URL**: `http://gitea-clusterip.gitea.svc.cluster.local:3000/bluejay/bluejay-infra.git`, not the external HTTPS URL (step-ca cert isn't trusted by ArgoCD). The `ApplicationSet` and any hand-created `Application` must both use the internal URL.
+## Local manifest lint
+The repo now carries a local-first lint pass for the recurring K8s gotchas that have burned the fleet:
+```bash
+dotnet test tests/bluejay-infra-lint/BluejayInfraLint.Tests.csproj -c Release
+```
+That test project sweeps `bluejay-infra/apps/**` plus the canonical sibling `FlowerCore.*\\k8s` manifests that share the same workspace. Matching `conftest.dev` policy files live under `tests/bluejay-infra-lint/conftest.dev/` for environments that also have `conftest` or `opa`.
 ## References
 - Cert-manager recovery playbook: `FlowerCore.Notes/memory/project_cert_manager_recovery_2026_04_22.md`

@@ -92,14 +92,17 @@ subjects:
 # =============================================================================
 # Agent Zero — AI Agent Web UI (NUC Edition, Blue Jay Profile)
 # =============================================================================
-# Connects to a local nginx proxy that routes to edge1 Pi 5 + AI HAT+ Ollama only
-# Blue Jay profile with 21 tools, 3 prompts, 4 extensions
+# Connects directly to fc-llm-bridge for chat + internal util/embed + browser.
+# Agent Zero's internal util/embed slots stay on the bridge's OpenAI-compatible
+# /v1 surface, while browser + corpus-search use the Ollama-compatible /api/*
+# surface through OLLAMA_HOST.
+# Blue Jay profile with 21 tools, 3 prompts, 4 extensions.
 ---
-# FC LLM Bridge API key for Agent Zero (ADR-088 chat_model routing).
+# FC LLM Bridge API key for Agent Zero (ADR-088 chat/util/embed/browser routing).
 # Syncs from 1Password item "FC LLM Bridge API Keys" (field: agent-zero-k8s).
-# Consumed by the chat_model only; util / embedding / browser stay on local
-# Ollama via the 127.0.0.1 sidecar proxy.
+# Consumed by chat, internal util/embed, browser, and corpus-search requests
+# that traverse fc-llm-bridge.
 apiVersion: onepassword.com/v1
 kind: OnePasswordItem
 metadata:
@@ -108,6 +111,34 @@ metadata:
 spec:
 itemPath: "vaults/IAmWorkin/items/FC LLM Bridge API Keys"
+---
+# Print.Web API key for Agent Zero's print_web.py Python tool.
+# Syncs from 1Password item "Print.Web API Keys" (password field = API key).
+# The print_web.py tool reads PRINT_WEB_API_KEY env var for all HTTP requests
+# to the thermal print service (GET /api/mcp/tools, POST /api/print/*, etc.).
+# Note: Print.Web uses the legacy REST MCP shape (/api/mcp/tools/*), not the
+# streamable-http MCP protocol. The print_web Python tool bridges this gap
+# and is already present in bluejay-tools ConfigMaps.
+apiVersion: onepassword.com/v1
+kind: OnePasswordItem
+metadata:
+name: print-web-api-keys
+namespace: agent-zero
+spec:
+itemPath: "vaults/IAmWorkin/items/Print.Web API Keys"
+---
+# Knowledge MCP bearer token for the direct Agent Zero -> Knowledge.Web path.
+# The 1Password item currently stores the raw token in its concealed PASSWORD
+# field, which the operator syncs to Secret key `password`.
+apiVersion: onepassword.com/v1
+kind: OnePasswordItem
+metadata:
+name: knowledge-mcp-tokens
+namespace: agent-zero
+spec:
+itemPath: "vaults/IAmWorkin/items/FlowerCore Knowledge MCP Tokens"
 ---
 apiVersion: apps/v1
 kind: Deployment
@@ -119,7 +150,7 @@ metadata:
annotations: annotations:
agent-zero/deployment: "nuc" agent-zero/deployment: "nuc"
agent-zero/profile: "bluejay" agent-zero/profile: "bluejay"
agent-zero/ollama: "edge1 Pi 5 + AI HAT+ only (10.0.57.17:11434) — workstation Ollama is private dev hardware, not a cluster dependency" agent-zero/ollama: "fc-llm-bridge fronts edge1 Pi 5 + AI HAT+ Ollama for cluster browser/corpus-search traffic; internal chat/util/embed route through the bridge's authenticated OpenAI surface"
spec: spec:
replicas: 1 replicas: 1
selector: selector:
@@ -134,19 +165,18 @@ spec:
spec: spec:
serviceAccountName: agent-zero serviceAccountName: agent-zero
initContainers: initContainers:
# Wait for edge1 Ollama to be reachable before starting Agent Zero. # Wait for fc-llm-bridge to be reachable before starting Agent Zero.
# (Workstation Ollama is intentionally NOT in the cluster path.) - name: wait-for-llm-bridge
- name: wait-for-ollama
image: busybox:1.37 image: busybox:1.37
command: ["sh", "-c"] command: ["sh", "-c"]
args: args:
- | - |
echo "Waiting for edge1 Ollama (10.0.57.17:11434)..." echo "Waiting for fc-llm-bridge..."
until wget -qO- --timeout=2 http://10.0.57.17:11434/api/tags >/dev/null 2>&1; do until wget -qO- --timeout=2 http://fc-llm-bridge.fc-llm-bridge.svc:8080/healthz >/dev/null 2>&1; do
echo "edge1 Ollama not ready yet, retrying in 5s..." echo "fc-llm-bridge not ready yet, retrying in 5s..."
sleep 5 sleep 5
done done
echo "edge1 Ollama is reachable." echo "fc-llm-bridge is reachable."
# Assemble the Blue Jay profile directory structure from ConfigMaps. # Assemble the Blue Jay profile directory structure from ConfigMaps.
# ConfigMaps can't create nested dirs, so we copy into the workspace PVC. # ConfigMaps can't create nested dirs, so we copy into the workspace PVC.
- name: setup-bluejay - name: setup-bluejay
@@ -193,73 +223,6 @@ spec:
- name: bluejay-theme - name: bluejay-theme
mountPath: /tmp/bluejay-theme mountPath: /tmp/bluejay-theme
containers: containers:
- name: ollama-proxy
image: nginx:1.27-alpine
command: ["/bin/sh", "-c"]
args:
- |
cat > /etc/nginx/nginx.conf <<'NGINX'
worker_processes 1;
events { worker_connections 1024; }
http {
upstream ollama_upstream {
# edge1 Pi 5 + AI HAT+ is the SOLE upstream.
# Workstation Ollama (BLUEJAY-WS) is private dev hardware and
# MUST NOT be added back here without explicit operator decision —
# adding it would expose the workstation to cluster traffic.
server 10.0.57.17:11434 max_fails=2 fail_timeout=10s;
keepalive 16;
}
server {
listen 11434;
# Local healthcheck — proves nginx itself is alive.
# Must NOT depend on upstream so liveness doesn't restart
# the container when edge1 is slow/offline.
location = /healthz {
access_log off;
return 200 'ok\n';
default_type text/plain;
}
location / {
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_connect_timeout 5s;
proxy_read_timeout 600s;
proxy_send_timeout 600s;
proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;
proxy_pass http://ollama_upstream;
}
}
}
NGINX
exec nginx -g 'daemon off;'
ports:
- containerPort: 11434
# Readiness probe DOES check upstream so K8s only routes traffic
# when edge1 Ollama is reachable. timeoutSeconds=5 absorbs the Pi's
# slower TCP handshake under load (was timeoutSeconds=1 default →
# 172 historic restarts when the workstation primary path went down,
# before the cluster was repointed to edge1-only on 2026-04-27).
readinessProbe:
httpGet:
path: /api/tags
port: 11434
initialDelaySeconds: 5
periodSeconds: 15
timeoutSeconds: 5
failureThreshold: 3
# Liveness probe hits ONLY local healthz — restarts the container
# only when nginx itself is dead. Decoupling liveness from upstream
# eliminates restart-loops caused by transient upstream outages.
livenessProbe:
httpGet:
path: /healthz
port: 11434
initialDelaySeconds: 10
periodSeconds: 30
timeoutSeconds: 3
failureThreshold: 3
- name: agent-zero - name: agent-zero
image: agent0ai/agent-zero:latest image: agent0ai/agent-zero:latest
command: ["/bin/bash", "-c"] command: ["/bin/bash", "-c"]
@@ -280,24 +243,41 @@ spec:
# chat_model: FlowerCore LLM Bridge (ADR-088) — OpenAI-compat, # chat_model: FlowerCore LLM Bridge (ADR-088) — OpenAI-compat,
# spend-tracked, tier-aliased (fc:balanced → Claude Sonnet). # spend-tracked, tier-aliased (fc:balanced → Claude Sonnet).
# api_key comes from A0_SET_chat_model_api_key env var (overrides # api_key comes from A0_SET_chat_model_api_key env var (overrides
# config.json). util + embedding go to local 127.0.0.1 nginx # config.json). Utility + embedding stay on the authenticated
# proxy which routes to edge1 Pi 5 + AI HAT+ ONLY (workstation # OpenAI-compatible /v1 surface; browser and direct tool traffic
# is private dev hardware, intentionally not in the cluster path). # use the bridge's Ollama-compatible root via OLLAMA_HOST.
mkdir -p /a0/usr/plugins/_model_config mkdir -p /a0/usr/plugins/_model_config
cat > /a0/usr/plugins/_model_config/config.json << 'MODELCFG' cat > /a0/usr/plugins/_model_config/config.json << 'MODELCFG'
{"allow_chat_override":true,"chat_model":{"provider":"openai","name":"fc:balanced","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","ctx_length":8192,"ctx_history":0.7,"vision":false,"kwargs":{"temperature":0,"num_ctx":8192}},"utility_model":{"provider":"ollama","name":"qwen2.5:1.5b","api_base":"http://127.0.0.1:11434","ctx_length":8192,"ctx_input":0.7,"kwargs":{"num_ctx":8192}},"embedding_model":{"provider":"ollama","name":"nomic-embed-text","api_base":"http://127.0.0.1:11434","kwargs":{}}} {"allow_chat_override":true,"chat_model":{"provider":"openai","name":"fc:balanced","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","ctx_length":8192,"ctx_history":0.7,"vision":false,"kwargs":{"temperature":0,"num_ctx":8192}},"utility_model":{"provider":"openai","name":"fc:cheap","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","ctx_length":8192,"ctx_input":0.7,"kwargs":{"num_ctx":8192}},"embedding_model":{"provider":"openai","name":"openai/fc:embedding","api_base":"http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1","kwargs":{}}}
MODELCFG MODELCFG
# Strip heredoc indentation # Strip heredoc indentation
sed -i 's/^ //' /a0/usr/plugins/_model_config/config.json sed -i 's/^ //' /a0/usr/plugins/_model_config/config.json
# Phase 0 Chat MCP pilot: Agent Zero does not interpolate env vars # Phase 0 Chat MCP pilot: Agent Zero does not interpolate env vars
# inside A0_SET_mcp_servers JSON, so build the final JSON here from # inside A0_SET_mcp_servers JSON, so build the final JSON here from
# the secret-backed CHAT_MCP_API_KEY env var before initialize.sh. # the secret-backed env vars before initialize.sh. Keep the local
# Use the in-cluster Chat service URL rather than the public # corpus_search.py tool mounted either way so outage fallback
# Traefik hostname so the pod stays off the private VIP lane that # remains available even when fc_knowledge is not advertised.
# the default egress rule blocks. export KNOWLEDGE_MCP_ENABLED=false
if [ -n "${CHAT_MCP_API_KEY:-}" ]; then if [ -n "${KNOWLEDGE_MCP_BEARER_TOKEN:-}" ]; then
export A0_SET_mcp_servers="{\"mcpServers\":{\"fc-chat\":{\"type\":\"streamable-http\",\"url\":\"http://chat-web.fc-chat.svc/mcp\",\"headers\":{\"X-Api-Key\":\"${CHAT_MCP_API_KEY}\"}}}}" if curl -sf --connect-timeout 3 "${KNOWLEDGE_MCP_HEALTH_URL}" > /dev/null && \
curl -sf --connect-timeout 5 \
-H "Authorization: Bearer ${KNOWLEDGE_MCP_BEARER_TOKEN}" \
-H "Accept: application/json, text/event-stream" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":"fc-knowledge-bootstrap","method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"agent-zero-bootstrap","version":"1.0"}}}' \
"${KNOWLEDGE_MCP_URL}" > /dev/null; then
export KNOWLEDGE_MCP_ENABLED=true
echo "fc_knowledge enabled from ${KNOWLEDGE_MCP_URL}."
else
echo "fc_knowledge unavailable or unauthorized; keeping local corpus_search.py as the fallback path."
fi fi
else
echo "fc_knowledge token missing; keeping local corpus_search.py as the fallback path."
fi
export A0_SET_mcp_servers="$(
python3 -c 'import json, os; servers = {}; chat_key = os.getenv("CHAT_MCP_API_KEY"); knowledge_enabled = os.getenv("KNOWLEDGE_MCP_ENABLED", "false").lower() == "true"; token = os.getenv("KNOWLEDGE_MCP_BEARER_TOKEN", "") if knowledge_enabled else ""; chat_key and servers.setdefault("fc_chat", {"type": "streamable-http", "url": "http://chat-web.fc-chat.svc/mcp", "headers": {"X-Api-Key": chat_key}}); token and servers.setdefault("fc_knowledge", {"type": "streamable-http", "url": os.getenv("KNOWLEDGE_MCP_URL", "http://knowledge-web.knowledge.svc/mcp"), "headers": {"Authorization": f"Bearer {token}"}}); print(json.dumps({"mcpServers": servers}, separators=(",", ":")))'
)"
# Run the original entrypoint # Run the original entrypoint
exec /exe/initialize.sh $BRANCH exec /exe/initialize.sh $BRANCH
ports: ports:
@@ -309,8 +289,9 @@ spec:
# Chat model — routed through FlowerCore LLM Bridge (ADR-088) # Chat model — routed through FlowerCore LLM Bridge (ADR-088)
# so spend is tracked and tier aliases (fc:cheap/fc:balanced/fc:deep) # so spend is tracked and tier aliases (fc:cheap/fc:balanced/fc:deep)
# dispatch to Ollama or Anthropic via a single OpenAI-compat endpoint. # dispatch to Ollama or Anthropic via a single OpenAI-compat endpoint.
# Util / embedding / browser stay on local Ollama via 127.0.0.1 proxy # Internal utility + embedding use the authenticated OpenAI surface,
# for zero-latency, zero-cost small-model traffic. # while browser/corpus-search use the bridge's Ollama-compatible
# endpoints so Agent Zero no longer needs a local proxy sidecar.
- name: A0_SET_chat_model_provider - name: A0_SET_chat_model_provider
value: "openai" value: "openai"
- name: A0_SET_chat_model_name - name: A0_SET_chat_model_name
@@ -332,35 +313,51 @@ spec:
secretKeyRef: secretKeyRef:
name: fc-llm-bridge-api-keys name: fc-llm-bridge-api-keys
key: agent-zero-k8s key: agent-zero-k8s
- name: FC_LLM_BRIDGE_API_KEY
valueFrom:
secretKeyRef:
name: fc-llm-bridge-api-keys
key: agent-zero-k8s
- name: A0_SET_chat_model_ctx_length - name: A0_SET_chat_model_ctx_length
value: "8192" value: "8192"
- name: A0_SET_chat_model_kwargs - name: A0_SET_chat_model_kwargs
value: '{"temperature": 0, "num_ctx": 8192}' value: '{"temperature": 0, "num_ctx": 8192}'
# Utility model — fast small helper tier through the same proxy # Utility model — fast small helper tier through the OpenAI surface
- name: A0_SET_util_model_provider - name: A0_SET_util_model_provider
value: "ollama" value: "openai"
- name: A0_SET_util_model_name - name: A0_SET_util_model_name
value: "qwen2.5:1.5b" value: "fc:cheap"
- name: A0_SET_util_model_api_base - name: A0_SET_util_model_api_base
value: "http://127.0.0.1:11434" value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1"
- name: A0_SET_util_model_kwargs - name: A0_SET_util_model_kwargs
value: '{"num_ctx": 2048}' value: '{"num_ctx": 2048}'
# Embedding model — nomic through the same proxy # Embedding model — authenticated bridge alias to nomic-embed-text.
# LiteLLM's embedding() path needs an explicit provider prefix here
# even though the chat slot can use bare fc:* aliases.
- name: A0_SET_embed_model_provider - name: A0_SET_embed_model_provider
value: "ollama" value: "openai"
- name: A0_SET_embed_model_name - name: A0_SET_embed_model_name
value: "nomic-embed-text" value: "openai/fc:embedding"
- name: A0_SET_embed_model_api_base - name: A0_SET_embed_model_api_base
value: "http://127.0.0.1:11434" value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080/v1"
# Browser model — small Gemma candidate through the same proxy # Browser model — small Gemma candidate through the same proxy
- name: A0_SET_browser_model_provider - name: A0_SET_browser_model_provider
value: "ollama" value: "ollama"
- name: A0_SET_browser_model_name - name: A0_SET_browser_model_name
value: "gemma3:4b" value: "gemma3:4b"
- name: A0_SET_browser_model_api_base - name: A0_SET_browser_model_api_base
value: "http://127.0.0.1:11434" value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080"
- name: A0_SET_browser_model_api_key
valueFrom:
secretKeyRef:
name: fc-llm-bridge-api-keys
key: agent-zero-k8s
- name: A0_SET_browser_model_vision - name: A0_SET_browser_model_vision
value: "true" value: "true"
- name: OLLAMA_HOST
value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080"
- name: FLOWERCORE_AGENTZERO_OLLAMA_URL
value: "http://fc-llm-bridge.fc-llm-bridge.svc:8080"
# Agent profile — Blue Jay personality, tools, and system prompt # Agent profile — Blue Jay personality, tools, and system prompt
- name: A0_SET_agent_profile - name: A0_SET_agent_profile
value: "bluejay" value: "bluejay"
@@ -383,9 +380,38 @@ spec:
name: chat-mcp-api-key name: chat-mcp-api-key
key: api-key key: api-key
optional: true optional: true
# Print.Web — Thermal printer service on edge2 # FlowerCore.Knowledge MCP Phase 1 — direct Agent Zero client path.
# Probe /healthz first, then try an authenticated initialize call.
# If either fails, Agent Zero boots without fc_knowledge and keeps
# the local corpus_search.py tool as the outage-safe path.
- name: KNOWLEDGE_MCP_URL
value: "http://knowledge-web.knowledge.svc/mcp"
- name: KNOWLEDGE_MCP_HEALTH_URL
value: "http://knowledge-web.knowledge.svc/healthz"
- name: KNOWLEDGE_MCP_BEARER_TOKEN
valueFrom:
secretKeyRef:
name: knowledge-mcp-tokens
key: password
# Print.Web — Thermal printer service on edge2.
# PRINT_WEB_URL: internal HTTP (bypasses Traefik TLS — print_web.py
# runs in-cluster and can reach edge2 directly on the PROD VLAN).
# PRINT_WEB_API_KEY: from 1Password "Print.Web API Keys" password field,
# synced by the print-web-api-keys OnePasswordItem CRD above.
# The print_web.py Python tool reads both env vars for all HTTP calls.
- name: PRINT_WEB_URL - name: PRINT_WEB_URL
value: "http://10.0.57.16:5200" value: "http://10.0.57.16:5200"
- name: PRINT_WEB_API_KEY
valueFrom:
secretKeyRef:
name: print-web-api-keys
key: password
# Intranet search — use in-cluster HTTP (no step-ca TLS needed)
# corpus_search.py reads FLOWERCORE_FLEET_VECTOR_DIR but that mount is not
# on the cluster yet (BLUEJAY-WS only). The tool gracefully returns a
# "no DB found" message with rebuild instructions rather than crashing.
- name: FLOWERCORE_INTRANET_URL
value: "http://intranet-web.intranet.svc:5300"
# Kubernetes # Kubernetes
- name: KUBERNETES_SERVICE_HOST - name: KUBERNETES_SERVICE_HOST
value: "kubernetes.default.svc" value: "kubernetes.default.svc"
@@ -420,7 +446,7 @@ spec:
command: command:
- /bin/bash - /bin/bash
- -c - -c
- "curl -sf http://localhost:80/ > /dev/null && curl -sf --connect-timeout 3 http://127.0.0.1:11434/api/tags > /dev/null" - "curl -sf http://localhost:80/ > /dev/null && curl -sf --connect-timeout 3 http://fc-llm-bridge.fc-llm-bridge.svc:8080/healthz > /dev/null"
periodSeconds: 30 periodSeconds: 30
failureThreshold: 2 failureThreshold: 2
resources: resources:
@@ -558,13 +584,6 @@ spec:
protocol: UDP protocol: UDP
- port: 53 - port: 53
protocol: TCP protocol: TCP
# Ollama on edge1 Pi 5 + AI HAT+ (sole upstream — workstation
# is private dev hardware and intentionally not allowlisted)
- to:
- ipBlock:
cidr: 10.0.57.17/32
ports:
- port: 11434
# Print.Web on edge2 # Print.Web on edge2
- to: - to:
- ipBlock: - ipBlock:
@@ -598,6 +617,26 @@ spec:
protocol: TCP protocol: TCP
- port: 8080 - port: 8080
protocol: TCP protocol: TCP
# FlowerCore.Knowledge MCP (Phase 1) — in-cluster direct route with
# anonymous /healthz probe plus authenticated /mcp initialize/tool calls.
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: knowledge
ports:
- port: 80
protocol: TCP
- port: 8080
protocol: TCP
# Intranet search API — use in-cluster svc so traffic stays inside
# the cluster and is not blocked by the private-range egress denylist.
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: intranet
ports:
- port: 5300
protocol: TCP
# Allow internet (for kubectl image pull, etc) # Allow internet (for kubectl image pull, etc)
- to: - to:
- ipBlock: - ipBlock:
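
The inline `python3 -c` one-liner that assembles `A0_SET_mcp_servers` in this manifest is dense. A more readable sketch of the same logic follows, assuming the same environment variable names and service URLs as the manifest. The function name `build_mcp_servers` and its `env` parameter are hypothetical, added here only so the logic can be exercised without touching the real environment.

```python
import json
import os


def build_mcp_servers(env=None):
    """Return the compact JSON string intended for A0_SET_mcp_servers.

    `env` is a hypothetical injection point for testing; defaults to os.environ.
    """
    env = os.environ if env is None else env
    servers = {}

    # fc_chat is advertised only when the secret-backed API key is present.
    chat_key = env.get("CHAT_MCP_API_KEY")
    if chat_key:
        servers["fc_chat"] = {
            "type": "streamable-http",
            "url": "http://chat-web.fc-chat.svc/mcp",
            "headers": {"X-Api-Key": chat_key},
        }

    # fc_knowledge is advertised only when the bootstrap probe set
    # KNOWLEDGE_MCP_ENABLED=true and a bearer token is available.
    enabled = env.get("KNOWLEDGE_MCP_ENABLED", "false").lower() == "true"
    token = env.get("KNOWLEDGE_MCP_BEARER_TOKEN", "") if enabled else ""
    if token:
        servers["fc_knowledge"] = {
            "type": "streamable-http",
            "url": env.get("KNOWLEDGE_MCP_URL", "http://knowledge-web.knowledge.svc/mcp"),
            "headers": {"Authorization": f"Bearer {token}"},
        }

    return json.dumps({"mcpServers": servers}, separators=(",", ":"))
```

With neither secret present the result is an empty `mcpServers` map, which matches the intent that Agent Zero still boots and relies on the local corpus_search.py tool.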


@@ -7209,6 +7209,9 @@ data:
"keep_alive": keep_alive, "keep_alive": keep_alive,
"stream": False, "stream": False,
}) })
curl_headers = ["-H", "Content-Type: application/json"]
if os.environ.get("FC_LLM_BRIDGE_API_KEY"):
curl_headers.extend(["-H", f"X-Api-Key: {os.environ['FC_LLM_BRIDGE_API_KEY']}"])
try: try:
result = subprocess.run( result = subprocess.run(
@@ -7216,7 +7219,7 @@ data:
"curl", "-s", "--max-time", "120", "curl", "-s", "--max-time", "120",
"-X", "POST", "-X", "POST",
f"{api_base}/api/generate", f"{api_base}/api/generate",
"-H", "Content-Type: application/json", *curl_headers,
"-d", payload, "-d", payload,
], ],
capture_output=True, capture_output=True,
@@ -13150,6 +13153,451 @@ data:
- PowerShell 5.1 compatibility is assumed (no PowerShell 7+ features). - PowerShell 5.1 compatibility is assumed (no PowerShell 7+ features).
- All commands run with `-NoProfile -NonInteractive` flags for clean execution. - All commands run with `-NoProfile -NonInteractive` flags for clean execution.
""" """
corpus_search.py: |
# FlowerCore Fleet Corpus Vector Search Tool
#
# Queries the AiStation-built SqliteVecVectorStore DB at /a0/usr/vectors/fleet.db
# (bind-mounted read-only from /var/lib/flowercore/vector-stores/ on the host).
# Embeds the query through Ollama's nomic-embed-text model, computes cosine
# similarity against every stored chunk in pure Python (no numpy — not present
# in the container), and returns the top-K nearest neighbors with source metadata.
#
# This is the offline-friendly counterpart to `intranet_search` (which hits the
# Intranet's live REST API). Use it for Bible/Greek/Hebrew/Strong's lookups and
# anywhere the workstation has a newer DB than the Intranet one. The store is
# refreshed by `aistation-indexer build <edition>` — see the FlowerCore.Knowledge
# ADR at docs/ai-agents/flowercore-knowledge-service-plan.md.
import json
import math
import os
import sqlite3
import urllib.request
from pathlib import Path
from python.helpers.tool import Tool, Response
DEFAULT_VECTORS_DIR = os.environ.get(
"FLOWERCORE_FLEET_VECTOR_DIR",
"/a0/usr/vectors",
)
# When the caller doesn't pick an explicit DB, prefer the biggest fleet tier
# present on disk. Workstation → pi-edge → bmo-bot.
PREFERRED_DB_ORDER = [
os.environ.get("FLOWERCORE_FLEET_VECTOR_DB", ""),
"fleet-workstation-full.db",
"fleet-pi-edge.db",
"fleet-bmo-bot.db",
]
OLLAMA_BASE_URL = os.environ.get(
"FLOWERCORE_AGENTZERO_OLLAMA_URL",
"http://host.containers.internal:11434",
)
BRIDGE_API_KEY = os.environ.get("FC_LLM_BRIDGE_API_KEY", "").strip()
EMBEDDING_MODEL = os.environ.get(
"FLOWERCORE_FLEET_EMBEDDING_MODEL",
"nomic-embed-text",
)
class CorpusSearch(Tool):
async def execute(self, **kwargs) -> Response:
"""
Semantic search over the FlowerCore fleet corpus (Bible texts, lexicons,
dictionaries, morphology) pre-indexed by aistation-indexer.
Args (via self.args):
query (str): Search query text. Required unless action=stats.
limit (int): Max results. Default 8.
index (str): Optional index name filter ("bible-texts", "lexicons",
"dictionaries", "morphology"). Default: all indexes.
repo (str): Optional repo filter (e.g. "world-english-bible").
db (str): Override DB path OR file name inside FLOWERCORE_FLEET_VECTOR_DIR
(defaults to /a0/usr/vectors). If omitted, the largest
fleet tier present on disk is picked automatically.
action (str): Optional. "stats" returns an inventory of all fleet DBs
visible to the tool (names, sizes, index counts, chunk
counts, last-built timestamps). No embedding call.
Returns:
Response with ranked chunks (score, source, text preview) OR
(when action=stats) a markdown inventory of available fleet DBs.
"""
query = (self.args.get("query") or "").strip()
limit = int(self.args.get("limit") or 8)
index_filter = (self.args.get("index") or "").strip()
repo_filter = (self.args.get("repo") or "").strip()
db_override = (self.args.get("db") or "").strip()
action = (self.args.get("action") or "").strip().lower()
if action == "stats":
return Response(message=_render_stats(), break_loop=False)
if not query:
return Response(
message=(
"Error: 'query' is required unless action=stats.\n"
"Example: query=\"what does Genesis 1:1 say\" limit=5\n"
"Inventory: action=stats"
),
break_loop=False,
)
db = _resolve_db(db_override)
if db is None:
return Response(
message=(
f"Error: no fleet vector DB found under {DEFAULT_VECTORS_DIR}.\n"
"Host side: run `aistation-indexer build fleet-workstation-full`\n"
"(or `fleet-pi-edge`/`fleet-bmo-bot`) to produce\n"
"`/var/lib/flowercore/vector-stores/<slug>.db`, then confirm the\n"
"Podman unit mounts that directory into `/a0/usr/vectors:ro`."
),
break_loop=False,
)
try:
query_vec = _embed(query)
except Exception as e:
return Response(
message=f"Error: failed to embed query via Ollama at {OLLAMA_BASE_URL}: {e}",
break_loop=False,
)
try:
hits = _search(db, query_vec, index_filter, repo_filter, limit)
except Exception as e:
return Response(
message=f"Error: corpus search failed: {e}",
break_loop=False,
)
if not hits:
return Response(
message=(
f"No matches for '{query}' in {db.name}.\n"
f"Indexes available: " + _list_indexes_summary(db)
),
break_loop=False,
)
lines = [f"**Corpus search: `{query}`** (top {len(hits)} of {limit} requested, DB={db.name})", ""]
for rank, h in enumerate(hits, 1):
passage = h.get("passage") or ""
lang = h.get("language") or ""
meta_bits = [x for x in (h["index"], h["repo"], passage, lang) if x]
meta = " · ".join(meta_bits)
preview = h["text"]
if len(preview) > 320:
preview = preview[:320].rstrip() + "…"
lines.append(f"{rank}. **{h['score']:.3f}** {meta}")
lines.append(f" `{h['source']}`")
lines.append(f" {preview}")
lines.append("")
return Response(message="\n".join(lines).rstrip() + "\n", break_loop=False)
def _resolve_db(override: str) -> "Path | None":
"""Pick a fleet DB by explicit path, explicit filename, or preferred order."""
vectors_dir = Path(DEFAULT_VECTORS_DIR)
if override:
# An absolute path that points at a real file wins outright.
p = Path(override)
if p.is_absolute() and p.exists():
return p
# Otherwise treat it as a filename within the vectors dir.
candidate = vectors_dir / override
if candidate.exists():
return candidate
return None
for name in PREFERRED_DB_ORDER:
if not name:
continue
p = Path(name) if Path(name).is_absolute() else vectors_dir / name
if p.exists():
return p
# Fallback: any *.db in the dir, largest first.
if vectors_dir.is_dir():
candidates = sorted(vectors_dir.glob("*.db"), key=lambda p: p.stat().st_size, reverse=True)
if candidates:
return candidates[0]
return None
def _embed(text: str) -> list:
"""Embed a query via Ollama's /api/embeddings. Single-vector response."""
body = json.dumps({"model": EMBEDDING_MODEL, "prompt": text}).encode("utf-8")
headers = {"Content-Type": "application/json"}
if BRIDGE_API_KEY:
headers["X-Api-Key"] = BRIDGE_API_KEY
req = urllib.request.Request(
f"{OLLAMA_BASE_URL.rstrip('/')}/api/embeddings",
data=body,
headers=headers,
)
with urllib.request.urlopen(req, timeout=60) as resp:
data = json.loads(resp.read().decode("utf-8"))
vec = data.get("embedding")
if not isinstance(vec, list) or not vec:
raise RuntimeError(f"Ollama returned no embedding: {data}")
return [float(x) for x in vec]
def _cosine(a: list, b: list) -> float:
"""Cosine similarity in pure Python — no numpy in the A0 container."""
# zip() stops at the shorter — AiStation DB guarantees same dim per index.
dot = 0.0
na = 0.0
nb = 0.0
for x, y in zip(a, b):
dot += x * y
na += x * x
nb += y * y
if na == 0.0 or nb == 0.0:
return 0.0
return dot / (math.sqrt(na) * math.sqrt(nb))
def _search(db_path: Path, query_vec: list, index_filter: str, repo_filter: str, limit: int) -> list:
"""Load entries, compute cosine, return top-K.
SqliteVecVectorStore schema:
VectorIndexes(IndexName, Dimensions, UpdatedAtUtc)
VectorEntries(IndexName, ChunkId, TextContent, SourceRepo, SourceFile,
Book, Chapter, VerseRange, Language, ContentType, License,
EstimatedTokens, EmbeddingJson)
Embeddings are stored as JSON arrays in EmbeddingJson; similarity is computed
in Python. For ~100k chunks × 768 dims this takes a couple seconds on a
workstation — acceptable for interactive A0 use.
"""
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
try:
sql = [
"SELECT IndexName, ChunkId, TextContent, SourceRepo, SourceFile, ",
" Book, Chapter, VerseRange, Language, EmbeddingJson ",
"FROM VectorEntries",
]
where = []
params = []
if index_filter:
where.append("IndexName = ?")
params.append(index_filter)
if repo_filter:
where.append("SourceRepo LIKE ?")
params.append(f"%{repo_filter}%")
if where:
sql.append(" WHERE " + " AND ".join(where))
sql.append(";")
cursor = conn.execute("".join(sql), params)
# Min-heap by (score, ...) would be faster but for interactive use we
# just sort at the end — simpler and readable.
scored = []
for row in cursor:
idx, chunk_id, text, repo, source_file, book, chapter, verses, lang, emb_json = row
try:
vec = json.loads(emb_json)
except (json.JSONDecodeError, TypeError):
continue
score = _cosine(query_vec, vec)
passage = None
if book and chapter:
passage = f"{book} {chapter}"
if verses:
passage += f":{verses}"
scored.append((score, {
"index": idx,
"chunk_id": chunk_id,
"text": text,
"repo": repo or "",
"source": source_file or "",
"passage": passage or "",
"language": lang or "",
}))
scored.sort(key=lambda t: t[0], reverse=True)
return [{"score": s, **meta} for s, meta in scored[:limit]]
finally:
conn.close()
def _render_stats() -> str:
"""Markdown inventory of every *.db in FLOWERCORE_FLEET_VECTOR_DIR."""
vectors_dir = Path(DEFAULT_VECTORS_DIR)
if not vectors_dir.is_dir():
return f"No fleet vector dir mounted at {vectors_dir}. Ask the host operator to build an index with scripts/agent-zero/build-fleet-index.sh."
dbs = sorted(vectors_dir.glob("*.db"))
if not dbs:
return f"No fleet DBs present under {vectors_dir}. Run `scripts/agent-zero/build-fleet-index.sh fleet-workstation-full` on the host."
lines = [f"**Fleet vector DB inventory** ({vectors_dir})", ""]
for db in dbs:
size_mb = db.stat().st_size / (1024 * 1024)
lines.append(f"### `{db.name}` ({size_mb:.1f} MB)")
try:
conn = sqlite3.connect(f"file:{db}?mode=ro", uri=True)
try:
idx_rows = conn.execute(
"SELECT IndexName, Dimensions, UpdatedAtUtc FROM VectorIndexes ORDER BY IndexName;"
).fetchall()
if not idx_rows:
lines.append("- (no indexes registered)")
else:
counts = dict(conn.execute(
"SELECT IndexName, COUNT(*) FROM VectorEntries GROUP BY IndexName;"
).fetchall())
for name, dim, updated in idx_rows:
count = counts.get(name, 0)
lines.append(f"- **{name}** — {count:,} chunks × {dim}d (built {updated})")
finally:
conn.close()
except Exception as e:
lines.append(f"- (inspect failed: {e})")
lines.append("")
lines.append(f"**Tool defaults:** embedding model `{EMBEDDING_MODEL}`, Ollama at `{OLLAMA_BASE_URL}`. Pick a DB with `db=<filename>`; filter by `index=<name>`/`repo=<substring>`.")
return "\n".join(lines).rstrip() + "\n"
def _list_indexes_summary(db_path: Path) -> str:
try:
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
try:
rows = conn.execute(
"SELECT IndexName, Dimensions, "
" (SELECT COUNT(*) FROM VectorEntries WHERE VectorEntries.IndexName = VectorIndexes.IndexName) "
"FROM VectorIndexes ORDER BY IndexName;"
).fetchall()
if not rows:
return "(no indexes)"
return ", ".join(f"{r[0]}({r[2]}×{r[1]}d)" for r in rows)
finally:
conn.close()
except Exception as e:
return f"(couldn't list: {e})"
intranet_search.py: |
# Intranet Vector Search Tool
# Queries the Blue Jay Lab Intranet's Shared.Indexing RAG corpus over its
# live REST API (https://intranet.iamworkin.lan/api/search). Returns ranked chunks
# with source file paths and scores.
import json
import os
import ssl
import urllib.parse
import urllib.request
from python.helpers.tool import Tool, Response
INTRANET_BASE_URL = os.environ.get(
"FLOWERCORE_INTRANET_URL",
"https://intranet.iamworkin.lan",
)
STEPCA_ROOT_CRT = "/a0/usr/ca/stepca-root.crt"
def _ssl_ctx() -> ssl.SSLContext:
ctx = ssl.create_default_context()
if os.path.exists(STEPCA_ROOT_CRT):
ctx.load_verify_locations(cafile=STEPCA_ROOT_CRT)
return ctx
class IntranetSearch(Tool):
async def execute(self, **kwargs) -> Response:
"""
Search the Blue Jay Lab intranet corpus (docs, project notes, dashboards).
Args (via self.args):
query (str): Search query. Required.
limit (int): Max chunks to return. Default 8.
corpus (str): Optional corpus filter (e.g. "notes", "docs").
Returns:
Response with ranked chunk text, source path, and score.
"""
# Guard against explicit None values, matching corpus_search.py.
query = (self.args.get("query") or "").strip()
limit = int(self.args.get("limit") or 8)
corpus = (self.args.get("corpus") or "").strip()
if not query:
return Response(
message="Error: 'query' is required.",
break_loop=False,
)
params = {"q": query, "topK": str(limit)}
if corpus:
params["indexName"] = corpus
url = f"{INTRANET_BASE_URL}/api/search?{urllib.parse.urlencode(params)}"
try:
req = urllib.request.Request(url, headers={"Accept": "application/json"})
with urllib.request.urlopen(req, timeout=20, context=_ssl_ctx()) as resp:
raw = resp.read().decode("utf-8", errors="replace")
except Exception as exc:
return Response(
message=f"Intranet search failed: {exc}\nURL: {url}",
break_loop=False,
)
try:
data = json.loads(raw)
except json.JSONDecodeError:
return Response(
message=f"Intranet returned non-JSON response:\n{raw[:500]}",
break_loop=False,
)
hits = data if isinstance(data, list) else (
data.get("results") or data.get("hits") or data.get("chunks") or []
)
if not hits:
return Response(
message=f"No intranet results for query: {query!r}",
break_loop=False,
)
lines = [f"# Intranet search: {query} ({len(hits)} hits)\n"]
for i, hit in enumerate(hits[:limit], 1):
src = (
hit.get("sourceFile")
or hit.get("source")
or hit.get("path")
or hit.get("file")
or "?"
)
repo = hit.get("sourceRepo") or ""
idx = hit.get("indexName") or ""
score = hit.get("score") or hit.get("similarity") or ""
text = (
hit.get("snippet")
or hit.get("text")
or hit.get("content")
or hit.get("chunk")
or ""
).strip()
if len(text) > 600:
text = text[:600] + "..."
header = f"## [{i}] {repo}/{src}" if repo else f"## [{i}] {src}"
if idx:
header += f" ({idx})"
if score:
header += f" score={score:.3f}" if isinstance(score, float) else f" score={score}"
lines.append(header)
lines.append(text)
lines.append("")
return Response(message="\n".join(lines), break_loop=False)
kind: ConfigMap kind: ConfigMap
metadata: metadata:
name: bluejay-tools-c name: bluejay-tools-c
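
The comment in `_search` notes that a min-heap would be faster than sorting every scored row. A minimal sketch of that alternative, using the standard library's `heapq.nlargest`, could look like the following. The names `cosine` and `top_k` and the `(vector, metadata)` input shape are assumptions for illustration and are not part of corpus_search.py.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity in pure Python, mirroring the tool's zip-based loop."""
    dot = na = nb = 0.0
    for x, y in zip(a, b):
        dot += x * y
        na += x * x
        nb += y * y
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (math.sqrt(na) * math.sqrt(nb))


def top_k(query_vec, entries, limit):
    """Return the `limit` best matches from an iterable of (vector, metadata) pairs.

    heapq.nlargest keeps only `limit` candidates in memory at a time,
    instead of building and sorting the full scored list.
    """
    scored = ((cosine(query_vec, vec), meta) for vec, meta in entries)
    best = heapq.nlargest(limit, scored, key=lambda t: t[0])
    return [{"score": s, **meta} for s, meta in best]
```

For an interactive tool the full sort is likely fine, as the original comment says; the heap variant mainly reduces peak memory when the corpus is large relative to `limit`.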


@@ -20,7 +20,19 @@ spec:
nodeSelector: nodeSelector:
kubernetes.io/hostname: rke2-agent1 kubernetes.io/hostname: rke2-agent1
hostNetwork: true hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet # Keep the search list free of iamworkin.lan so CoreDNS's wildcard
# template cannot hijack public egress like downloads.asterisk.org.
dnsPolicy: None
dnsConfig:
nameservers:
- 10.43.0.10
searches:
- telephony.svc.cluster.local
- svc.cluster.local
- cluster.local
options:
- name: ndots
value: "2"
securityContext: securityContext:
fsGroup: 0 fsGroup: 0
# CoreDNS in this cluster has an iamworkin.lan wildcard that catches # CoreDNS in this cluster has an iamworkin.lan wildcard that catches


@@ -87,6 +87,20 @@ spec:
prometheus.io/port: "8080" prometheus.io/port: "8080"
prometheus.io/path: "/metrics" prometheus.io/path: "/metrics"
spec: spec:
# Use an explicit DNS policy so external FQDNs like api.anthropic.com are
# resolved directly instead of being expanded through the cluster search
# path that includes iamworkin.lan.
dnsPolicy: None
dnsConfig:
nameservers:
- 10.43.0.10
searches:
- fc-llm-bridge.svc.cluster.local
- svc.cluster.local
- cluster.local
options:
- name: ndots
value: "2"
securityContext: securityContext:
fsGroup: 1654 fsGroup: 1654
fsGroupChangePolicy: OnRootMismatch fsGroupChangePolicy: OnRootMismatch
@@ -97,7 +111,7 @@ spec:
# dotnet.exe publish -c Release -o deploy/app \ # dotnet.exe publish -c Release -o deploy/app \
# src/FlowerCore.LlmBridge.Web/FlowerCore.LlmBridge.Web.csproj # src/FlowerCore.LlmBridge.Web/FlowerCore.LlmBridge.Web.csproj
# podman build -t localhost/fc-llm-bridge:v<tag> -f deploy/Dockerfile.deploy deploy # podman build -t localhost/fc-llm-bridge:v<tag> -f deploy/Dockerfile.deploy deploy
image: localhost/fc-llm-bridge:v202604231520 image: localhost/fc-llm-bridge:v202604300022
imagePullPolicy: Never imagePullPolicy: Never
ports: ports:
- containerPort: 8080 - containerPort: 8080
@@ -116,6 +130,10 @@ spec:
value: "default" value: "default"
- name: FlowerCore__LlmBridge__DefaultAppName - name: FlowerCore__LlmBridge__DefaultAppName
value: "agent-zero" value: "agent-zero"
- name: FlowerCore__LlmBridge__UtilModel
value: "qwen2.5:1.5b"
- name: FlowerCore__LlmBridge__EmbedModel
value: "nomic-embed-text"
# Per-consumer API keys — from OnePasswordItem fc-llm-bridge-api-keys. # Per-consumer API keys — from OnePasswordItem fc-llm-bridge-api-keys.
# Each field becomes a Secret key of the same name. The key-name # Each field becomes a Secret key of the same name. The key-name
# lands in the auth principal's `fc.app` claim for ledger scoping. # lands in the auth principal's `fc.app` claim for ledger scoping.
@@ -207,17 +225,6 @@ spec:
port: 8080 port: 8080
initialDelaySeconds: 15 initialDelaySeconds: 15
periodSeconds: 30 periodSeconds: 30
# Lower ndots so external FQDNs like api.anthropic.com are tried BEFORE
# the ndots:5 default expands them through the cluster search path, which
# includes iamworkin.lan. CoreDNS has a `template IN A iamworkin.lan`
# wildcard that answers `api.anthropic.com.iamworkin.lan` with the
# Traefik VIP, which then serves a TRAEFIK-DEFAULT-CERT TLS cert and
# breaks egress to the real Anthropic API (memory:
# feedback_coredns_ndots_template_collision, generalized to external DNS).
dnsConfig:
options:
- name: ndots
value: "2"
volumes: volumes:
- name: data - name: data
persistentVolumeClaim: persistentVolumeClaim:
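
The ndots comments in this manifest describe resolver ordering: a name with fewer dots than `ndots` is walked through the search list before it is tried as-is. The sketch below models that ordering in a simplified, glibc-style way. The function name `candidate_names` is hypothetical, and the four-entry search list that includes `iamworkin.lan` is an assumption about what a pod would inherit by default, based only on the comments in this diff.

```python
def candidate_names(name: str, searches: list[str], ndots: int) -> list[str]:
    """Return the lookup order a stub resolver would try (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search expansion.
        return [name]
    expanded = [f"{name}.{suffix}." for suffix in searches]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then fall back to the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as-is.
    return expanded + [absolute]


# Assumed default search list (includes the wildcard zone) vs. the pinned list.
default_searches = [
    "fc-llm-bridge.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "iamworkin.lan",
]
pinned_searches = default_searches[:3]

print(candidate_names("api.anthropic.com", default_searches, ndots=5))
print(candidate_names("api.anthropic.com", pinned_searches, ndots=2))
```

Under the assumed defaults, the wildcard-zone candidate appears before the real name, which is the collision the removed comment describes. With the pinned `dnsConfig`, the real name is tried first and no candidate ends in `iamworkin.lan`.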


@@ -69,16 +69,14 @@ spec:
memory: "512Mi" memory: "512Mi"
cpu: "500m" cpu: "500m"
livenessProbe: livenessProbe:
httpGet: tcpSocket:
path: /health
port: 8080 port: 8080
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 30 periodSeconds: 30
timeoutSeconds: 5 timeoutSeconds: 5
failureThreshold: 3 failureThreshold: 3
readinessProbe: readinessProbe:
httpGet: tcpSocket:
path: /health
port: 8080 port: 8080
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 10 periodSeconds: 10


@@ -76,15 +76,13 @@ spec:
memory: "512Mi" memory: "512Mi"
cpu: "500m" cpu: "500m"
livenessProbe: livenessProbe:
httpGet: tcpSocket:
path: /health
port: http port: http
initialDelaySeconds: 30 initialDelaySeconds: 30
periodSeconds: 30 periodSeconds: 30
timeoutSeconds: 5 timeoutSeconds: 5
readinessProbe: readinessProbe:
httpGet: tcpSocket:
path: /health
port: http port: http
initialDelaySeconds: 10 initialDelaySeconds: 10
periodSeconds: 10 periodSeconds: 10
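
Switching these probes from `httpGet` to `tcpSocket` changes what is being checked: the probe succeeds as soon as the port accepts a connection and no longer looks at the `/health` response. A minimal sketch of that weaker check follows. The helper name `tcp_probe` is hypothetical and this is not kubelet's implementation.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # Success only means something is listening; the application behind
        # the socket may still be unable to serve requests.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

A call such as `tcp_probe("127.0.0.1", 8080)` would report a listening but unhealthy process as ready. That is the trade-off of dropping the HTTP check.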


@@ -37,6 +37,19 @@ spec:
app.kubernetes.io/name: ttsreader-piper app.kubernetes.io/name: ttsreader-piper
app.kubernetes.io/part-of: flowercore app.kubernetes.io/part-of: flowercore
spec: spec:
# Bypass CoreDNS's *.iamworkin.lan wildcard so the init container reaches
# huggingface.co directly when it seeds voice models.
dnsPolicy: None
dnsConfig:
nameservers:
- 10.43.0.10
searches:
- fc-ttsreader.svc.cluster.local
- svc.cluster.local
- cluster.local
options:
- name: ndots
value: "2"
initContainers: initContainers:
- name: seed-voices - name: seed-voices
image: rhasspy/wyoming-piper:latest image: rhasspy/wyoming-piper:latest
@@ -296,14 +309,23 @@ spec:
periodSeconds: 10 periodSeconds: 10
timeoutSeconds: 5 timeoutSeconds: 5
failureThreshold: 18 failureThreshold: 18
# Sprint E Phase 1a (kokoro stability) — 4 restarts in 2d6h with
# exit 143 traced to liveness probe `context deadline exceeded` while
# kokoro was busy synthesizing. /v1/audio/voices shares the FastAPI
# worker pool with /v1/audio/speech, so a long synth can starve the
# probe out within the prior 5s × 3 = 15s window. Bump timeoutSeconds
# 5 → 15 and failureThreshold 3 → 5, giving 15s × 5 = 75s of probe
# timeout budget before kubelet kills the pod. The TtsCircuitBreaker on
# the synthesizer side (Phase 1b)
# backs this up so the FC backend stops slamming kokoro during
# recovery.
livenessProbe: livenessProbe:
httpGet: httpGet:
path: /v1/audio/voices path: /v1/audio/voices
port: 8880 port: 8880
initialDelaySeconds: 180 initialDelaySeconds: 180
periodSeconds: 30 periodSeconds: 30
timeoutSeconds: 5 timeoutSeconds: 15
failureThreshold: 3 failureThreshold: 5
--- ---
# fc-biblical-tts — eSpeak-NG-backed Ancient Greek + Hebrew TTS with # fc-biblical-tts — eSpeak-NG-backed Ancient Greek + Hebrew TTS with
# word-level timing for read-along playback. Companion to ttsreader-kokoro # word-level timing for read-along playback. Companion to ttsreader-kokoro
@@ -510,7 +532,7 @@ spec:
fsGroupChangePolicy: OnRootMismatch fsGroupChangePolicy: OnRootMismatch
containers: containers:
- name: web - name: web
image: localhost/fc-ttsreader-web:v202604252002 image: localhost/fc-ttsreader-web:v20260506-47a88ae
imagePullPolicy: Never imagePullPolicy: Never
ports: ports:
- containerPort: 5217 - containerPort: 5217
@@ -528,6 +550,8 @@ spec:
value: "/usr/bin/ffmpeg" value: "/usr/bin/ffmpeg"
- name: TtsReader__Bible__CorpusRoot - name: TtsReader__Bible__CorpusRoot
value: "/data/corpus-cache/world-english-bible/eng/usx" value: "/data/corpus-cache/world-english-bible/eng/usx"
- name: TtsReader__ChapterContext__DatabasePath
value: "/data/chapter-context.db"
- name: TtsReader__Jobs__Root - name: TtsReader__Jobs__Root
value: "/data/jobs" value: "/data/jobs"
- name: TtsReader__Piper__Host - name: TtsReader__Piper__Host
@@ -573,6 +597,19 @@ spec:
value: "/data/logs" value: "/data/logs"
- name: TtsReader__Runtime__SmokeStatePath - name: TtsReader__Runtime__SmokeStatePath
value: "/data/ops/smoke-status.json" value: "/data/ops/smoke-status.json"
# Sprint E Day 8 voice-preview disk cache — writes WAVs under
# this directory. Default "data/voice-previews" resolves to
# the read-only $HOME path under runAsNonRoot=true. Pin to
# the writable PVC mount.
- name: TtsReader__Preview__CacheDirectory
value: "/data/voice-previews"
# Sprint E XXL Phase 4γ — content-addressed CDN bundle dir for
# POST /api/v1/render. Default "wwwroot/cdn" resolves under the
# read-only app filesystem, so pin to the writable PVC mount
# alongside other TtsReader runtime data. Manifests + cue audio
# land at /data/cdn/sha256/<hash>/manifest.json + cues/.
- name: TtsReader__Render__CdnDirectory
value: "/data/cdn"
- name: Auth__ApiKey - name: Auth__ApiKey
valueFrom: valueFrom:
secretKeyRef: secretKeyRef:
@@ -587,7 +624,10 @@ spec:
optional: true optional: true
resources: resources:
requests: requests:
cpu: 100m # The cluster is currently saturated on requested CPU by
# remotedesktop workloads even when real usage is low.
# Keep the web frontend schedulable under that pressure.
cpu: 10m
memory: 256Mi memory: 256Mi
limits: limits:
cpu: 500m cpu: 500m
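
The kokoro liveness probe comment in this file reasons in terms of `timeoutSeconds × failureThreshold`. The sketch below reproduces that arithmetic and also estimates the wall-clock time before a restart, given that consecutive probes start `periodSeconds` apart. The `Probe` class and the estimate formula are illustrative assumptions, not kubelet's exact scheduling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int

    def timeout_budget(self) -> int:
        # The figure used in the manifest comment: per-probe timeout times
        # the number of consecutive failures tolerated.
        return self.timeout_seconds * self.failure_threshold

    def approx_seconds_to_restart(self) -> int:
        # Rough estimate: probes start one period apart, and the last one
        # must also run out its timeout before it counts as failed.
        return (self.failure_threshold - 1) * self.period_seconds + self.timeout_seconds


before = Probe(period_seconds=30, timeout_seconds=5, failure_threshold=3)
after = Probe(period_seconds=30, timeout_seconds=15, failure_threshold=5)

for label, probe in (("before", before), ("after", after)):
    print(label, probe.timeout_budget(), probe.approx_seconds_to_restart())
```

Under this model the change raises the timeout budget from 15s to 75s. Because `periodSeconds` stays at 30, the estimated time a stalled pod survives before a restart is longer than the budget alone suggests.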


@@ -465,6 +465,22 @@ metadata:
spec: spec:
itemPath: vaults/IAmWorkin/items/Guacamole JSON Auth itemPath: vaults/IAmWorkin/items/Guacamole JSON Auth
--- ---
# 1Password-backed credentials for Mac mini VNC access (Phase 1 — 2026-04-28)
# The operator mints Secret 'macmini-vnc-creds' with keys: username, password, VNC Password
# Note: '1Password' field label 'VNC Password' -> K8s Secret key 'VNC Password' (space retained)
# Guacamole VNC connection password is sourced from the 'VNC Password' field.
# Actual IP is 10.0.56.115 (INFRA VLAN) — the 1P item 'IP' field is kept as backup reference.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
name: macmini-vnc-creds
namespace: guacamole
labels:
app.kubernetes.io/component: credentials
app.kubernetes.io/part-of: flowercore
spec:
itemPath: vaults/IAmWorkin/items/Mac Mini
---
# Blue Jay Branding Extension (CSS + translations) # Blue Jay Branding Extension (CSS + translations)
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap


@@ -16,6 +16,15 @@ spec:
requests: requests:
storage: 1Gi storage: 1Gi
--- ---
apiVersion: v1
kind: ConfigMap
metadata:
name: intranet-config
namespace: intranet
data:
KnowledgeApiKey: ""
TrustedHeaderSharedSecret: ""
---
apiVersion: apps/v1 apiVersion: apps/v1
kind: Deployment kind: Deployment
metadata: metadata:
@@ -37,7 +46,7 @@ spec:
spec: spec:
containers: containers:
- name: intranet-web - name: intranet-web
image: localhost/fc-intranet-web:v202604242354overridefix image: localhost/fc-intranet-web:v20260505-1108
imagePullPolicy: Never imagePullPolicy: Never
ports: ports:
- containerPort: 5300 - containerPort: 5300
@@ -52,6 +61,27 @@ spec:
# in minutes. Memory: feedback_pi5_nomic_embed_slow. # in minutes. Memory: feedback_pi5_nomic_embed_slow.
- name: IntranetSearch__OllamaBaseUrl - name: IntranetSearch__OllamaBaseUrl
value: "http://10.0.56.20:11434" value: "http://10.0.56.20:11434"
# Sprint E Phase 2α — JSON-file-backed PageReadingOverride persistence
# on the writable PVC at /data. Without this env var the
# intranet falls back to the in-memory store (loses state on
# pod restart). Master's PageReadingOverrideOptions binds
# PageReadingOverrides:FilePath.
- name: PageReadingOverrides__FilePath
value: "/data/page-reading-overrides.json"
- name: KnowledgeFleetSearch__BaseUrl
value: "https://knowledge.iamworkin.lan"
- name: KnowledgeFleetSearch__ApiKey
valueFrom:
configMapKeyRef:
name: intranet-config
key: KnowledgeApiKey
optional: true
- name: TrustedHeaderAuthentication__SharedSecret
valueFrom:
configMapKeyRef:
name: intranet-config
key: TrustedHeaderSharedSecret
optional: true
resources: resources:
requests: requests:
memory: "256Mi" memory: "256Mi"


@@ -1,7 +1,13 @@
# knowledge — FlowerCore.Knowledge.Web (Phase 2.4 K8s deploy) # knowledge — FlowerCore.Knowledge.Web (Phase 2.4 K8s deploy)
**Status:** manifests staged, **NOT YET APPLIED**. Image must be built + **Status:** **LIVE 2026-04-27** at `https://knowledge.iamworkin.lan`
imported AND DNS record provisioned before `git push`. Phase 2.4 closed. Pod running, certificate issued (step-ca-acme), PVC
bound (Longhorn 20Gi RWO), ArgoCD `infra-knowledge` synced. `/healthz`
returns 200, `/api/v1/editions` returns `[]` (initial-deploy state — no
*.db files in the PVC yet; Phase 2.5+ admin UI handles bulk
population). Phase 1 of the Agent Zero MCP rollout keeps `/healthz`
anonymous and gates `/mcp` behind `Authorization: Bearer <token>` built
from the 1Password item `FlowerCore Knowledge MCP Tokens`.
- Plan: [`../../../FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md`](../../../FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md) - Plan: [`../../../FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md`](../../../FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md)
- Sprint: [`../../../FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md`](../../../FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md) (Track B) - Sprint: [`../../../FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md`](../../../FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md) (Track B)
@@ -15,6 +21,12 @@ search to the rest of the FC ecosystem (Agent Zero, Chat.Web persona
memory, AiStation embeddings explorer, TtsReader chapter context, BMO memory, AiStation embeddings explorer, TtsReader chapter context, BMO
bot, Pi nodes via `fc-index sync`). bot, Pi nodes via `fc-index sync`).
Phase 1 MCP routing is explicit:
- in-cluster Agent Zero → `http://knowledge-web.knowledge.svc/mcp`
- workstation Agent Zero → `https://knowledge.iamworkin.lan/mcp`
- probe URL for both lanes → `/healthz`
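
The routing rules above can be sketched as a small lookup. This is only an illustration: the lane names `in-cluster` and `workstation`, and the helpers `endpoint` and `mcp_headers`, are hypothetical. The base URLs, the `/mcp` and `/healthz` paths, and the `Authorization: Bearer <token>` header come from this README.

```python
# Base URL per lane, as listed in the Phase 1 routing rules.
BASE_URLS = {
    "in-cluster": "http://knowledge-web.knowledge.svc",
    "workstation": "https://knowledge.iamworkin.lan",
}


def endpoint(lane: str, path: str) -> str:
    """Join a lane's base URL with a path such as /mcp or /healthz."""
    return f"{BASE_URLS[lane]}/{path.lstrip('/')}"


def mcp_headers(token: str) -> dict[str, str]:
    """Headers for the gated /mcp route; /healthz stays anonymous."""
    return {"Authorization": f"Bearer {token}"}


for lane in BASE_URLS:
    print(lane, endpoint(lane, "/mcp"), endpoint(lane, "/healthz"))
```

A client would send `mcp_headers(token)` only on `/mcp` requests and probe `/healthz` without credentials.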
## Deployment order (do NOT skip / reorder) ## Deployment order (do NOT skip / reorder)
### 1. FlowerCore.DNS public A record — knowledge.iamworkin.lan -> 10.0.56.200 ### 1. FlowerCore.DNS public A record — knowledge.iamworkin.lan -> 10.0.56.200


@@ -40,6 +40,17 @@ metadata:
labels: labels:
app.kubernetes.io/part-of: bluejay-infra app.kubernetes.io/part-of: bluejay-infra
--- ---
# MCP bearer token for the read-only Agent Zero Phase 1 lane. The 1Password
# item currently stores the raw token in its concealed PASSWORD field, which
# the operator syncs into the namespaced Secret key `password`.
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
name: knowledge-mcp-tokens
namespace: knowledge
spec:
itemPath: "vaults/IAmWorkin/items/FlowerCore Knowledge MCP Tokens"
---
apiVersion: v1 apiVersion: v1
kind: PersistentVolumeClaim kind: PersistentVolumeClaim
metadata: metadata:
@@ -91,8 +102,17 @@ spec:
- name: web - name: web
# Placeholder tag — bump to the image you built + imported to ALL # Placeholder tag — bump to the image you built + imported to ALL
# RKE2 nodes via scripts/deploy-knowledge.sh before applying. # RKE2 nodes via scripts/deploy-knowledge.sh before applying.
image: localhost/fc-knowledge-web:v202604272200 image: localhost/fc-knowledge-web:v20260429232635
imagePullPolicy: Never imagePullPolicy: Never
command:
- /bin/sh
- -c
args:
- |
if [ -n "${KNOWLEDGE_MCP_BEARER_TOKEN:-}" ]; then
export FlowerCore__Mcp__ApiKey__Key="Bearer ${KNOWLEDGE_MCP_BEARER_TOKEN}"
fi
exec dotnet FlowerCore.Knowledge.Web.dll
ports: ports:
- containerPort: 8080 - containerPort: 8080
name: http name: http
@@ -104,7 +124,7 @@ spec:
- name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT - name: DOTNET_SYSTEM_GLOBALIZATION_INVARIANT
value: "false" value: "false"
# Vector-store directory + embedding model + edition profile dir. # Vector-store directory + embedding model + edition profile dir.
# Profile JSON is baked into the image at /app/editions via the # Profile JSON is baked into the image at /home/app/editions via the
# csproj Content-link from FlowerCore.Common/editions/. # csproj Content-link from FlowerCore.Common/editions/.
- name: Knowledge__VectorStoresDirectory - name: Knowledge__VectorStoresDirectory
value: "/data/vector-stores" value: "/data/vector-stores"
@@ -115,12 +135,27 @@ spec:
- name: Knowledge__MaxLimit - name: Knowledge__MaxLimit
value: "50" value: "50"
- name: FlowerCore__Editions__ProfileDirectory - name: FlowerCore__Editions__ProfileDirectory
value: "/app/editions" value: "/home/app/editions"
# Embed via BLUEJAY-WS GPU (R9700, 32GB VRAM). Pi5 Ollama is # Embed via edge1 Pi 5 + AI HAT+ (10.0.57.17:11434). Cluster
# ~4-5x slower; use the workstation while we have it. # services do not depend on BLUEJAY-WS (private dev hardware) per
# Memory: feedback_pi5_nomic_embed_slow. # bluejay-infra@0f9d56e. Query-time embedding is fast enough on
# edge1 (~ms per query); bulk index rebuilds (Phase 2.5+) will
# need a separate ingestion lane that can opt into the
# workstation GPU when present.
- name: FlowerCore__Ollama__BaseUrl - name: FlowerCore__Ollama__BaseUrl
value: "http://10.0.56.20:11434" value: "http://10.0.57.17:11434"
- name: FlowerCore__Mcp__ApiKey__Key
valueFrom:
secretKeyRef:
name: knowledge-mcp-tokens
key: password
- name: FlowerCore__Mcp__ApiKey__HeaderName
value: "Authorization"
- name: KNOWLEDGE_MCP_BEARER_TOKEN
valueFrom:
secretKeyRef:
name: knowledge-mcp-tokens
key: password
resources: resources:
requests: requests:
cpu: 100m cpu: 100m
@@ -166,7 +201,7 @@ spec:
- name: tmp - name: tmp
mountPath: /tmp mountPath: /tmp
- name: logs - name: logs
mountPath: /app/logs mountPath: /home/app/logs
volumes: volumes:
- name: vector-store - name: vector-store
persistentVolumeClaim: persistentVolumeClaim:


@@ -0,0 +1,762 @@
{
"annotations": {
"list": []
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": null,
"links": [
{
"icon": "external link",
"includeVars": false,
"keepTime": false,
"targetBlank": true,
"title": "Open Service",
"type": "link",
"url": "https://updatecenter.iamworkin.lan/"
}
],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [
{
"options": {
"0": {
"color": "#f87171",
"index": 1,
"text": "DOWN"
},
"1": {
"color": "#4ade80",
"index": 0,
"text": "UP"
}
},
"type": "value"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "#f87171",
"value": null
},
{
"color": "#4ade80",
"value": 1
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 8,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "center",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "value_and_name"
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "probe_success{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"}",
"refId": "A",
"legendFormat": "Availability"
}
],
"title": "Service Availability",
"transparent": true,
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"decimals": 2,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "#f87171",
"value": null
},
{
"color": "#fbbf24",
"value": 95
},
{
"color": "#FFB300",
"value": 99
},
{
"color": "#4ade80",
"value": 99.9
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 8,
"x": 8,
"y": 0
},
"id": 2,
"options": {
"colorMode": "background_solid",
"graphMode": "area",
"justifyMode": "center",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "value_and_name"
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "avg_over_time(probe_success{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"}[24h]) * 100",
"refId": "A",
"legendFormat": "24h Uptime"
}
],
"title": "24-Hour Uptime",
"transparent": true,
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"max": 30,
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "#f87171",
"value": null
},
{
"color": "#fbbf24",
"value": 2
},
{
"color": "#4ade80",
"value": 7
}
]
},
"unit": "d"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 8,
"x": 16,
"y": 0
},
"id": 3,
"options": {
"minVizHeight": 75,
"minVizWidth": 75,
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "(probe_ssl_earliest_cert_expiry{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"} - time()) / 86400",
"refId": "A",
"legendFormat": "Days Remaining"
}
],
"title": "Cert Expiry (Days)",
"transparent": true,
"type": "gauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Response Time (seconds)",
"drawStyle": "line",
"fillOpacity": 12,
"gradientMode": "scheme",
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 4,
"showPoints": "never",
"spanNulls": true,
"thresholdsStyle": {
"mode": "dashed"
}
},
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "#4ade80",
"value": null
},
{
"color": "#fbbf24",
"value": 2
},
{
"color": "#f87171",
"value": 5
}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 14,
"x": 0,
"y": 4
},
"id": 4,
"options": {
"legend": {
"calcs": [
"lastNotNull",
"mean",
"max"
],
"displayMode": "table",
"placement": "right"
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "probe_duration_seconds{job=\"probe-traefik-services\",instance=\"updatecenter.iamworkin.lan\"}",
"refId": "A",
"legendFormat": "Probe Duration"
}
],
"timeFrom": "1h",
"title": "Response Time (1h Trend)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"gridPos": {
"h": 8,
"w": 10,
"x": 14,
"y": 4
},
"id": 5,
"options": {
"alertInstanceLabelFilter": "{instance=\"updatecenter.iamworkin.lan\"}",
"alertName": "",
"dashboardAlerts": false,
"groupBy": [],
"groupMode": "default",
"maxItems": 10,
"sortOrder": 1,
"stateFilter": {
"error": true,
"firing": true,
"noData": true,
"normal": false,
"pending": true
},
"viewMode": "list"
},
"title": "Active Alerts",
"type": "alertlist"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 12
},
"id": 20,
"title": "OTEL Counters — Track 1D",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 13
},
"id": 21,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "sum by (status) (rate(updatecenter_manifest_requests_total[5m]))",
"refId": "A",
"legendFormat": "status={{status}}"
}
],
"title": "Manifest Requests rate by status (5m)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
},
"unit": "Bps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 13
},
"id": 22,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "sum by (slug) (rate(updatecenter_bundle_download_bytes_total[5m]))",
"refId": "A",
"legendFormat": "{{slug}}"
}
],
"title": "Bundle Download Throughput by slug (5m)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 21
},
"id": 23,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "sum by (status) (rate(updatecenter_checkins_total[5m]))",
"refId": "A",
"legendFormat": "status={{status}}"
}
],
"title": "Agent Check-in Rate by status (5m)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "#4ade80", "value": null },
{ "color": "#f87171", "value": 1 }
]
},
"unit": "none",
"decimals": 2
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 12,
"y": 21
},
"id": 24,
"options": {
"colorMode": "background",
"graphMode": "area",
"justifyMode": "center",
"orientation": "auto",
"reduceOptions": {
"calcs": ["sum"],
"fields": "",
"values": false
},
"textMode": "value_and_name"
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "increase(updatecenter_signature_verify_failures_total[1h])",
"refId": "A",
"legendFormat": "Sig Verify Failures (1h)"
}
],
"title": "Signature Verify Failures (1h)",
"transparent": true,
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 6,
"x": 18,
"y": 21
},
"id": 25,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "sum by (slug, channel) (rate(updatecenter_release_publishes_total[5m]))",
"refId": "A",
"legendFormat": "{{slug}}/{{channel}}"
}
],
"title": "Release Publishes rate by slug/channel (5m)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1,
"fillOpacity": 10
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 29
},
"id": 26,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "sum by (kind, status) (rate(updatecenter_bundle_downloads_total[5m]))",
"refId": "A",
"legendFormat": "{{kind}} / {{status}}"
}
],
"title": "Bundle Download Requests by kind/status (5m)",
"transparent": true,
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 2,
"fillOpacity": 20
},
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "#4ade80", "value": null },
{ "color": "#f87171", "value": 0.01 }
]
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 29
},
"id": 27,
"options": {
"legend": {
"displayMode": "table",
"placement": "right",
"calcs": ["mean", "lastNotNull"]
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "fffjikve8llhce"
},
"expr": "rate(updatecenter_signature_verify_failures_total[5m])",
"refId": "A",
"legendFormat": "Sig verify failures/s"
}
],
"title": "Signature Verify Failure Rate (5m) — Critical if >0",
"transparent": true,
"type": "timeseries"
}
],
"refresh": "30s",
"schemaVersion": 39,
"style": "dark",
"tags": [
"blue-jay",
"flowercore",
"synthetic",
"updatecenter",
"otel"
],
"templating": {
"list": []
},
"time": {
"from": "now-24h",
"to": "now"
},
"timezone": "browser",
"title": "FlowerCore.UpdateCenter Dashboard",
"uid": "fc-updatecenter",
"version": 2
}
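
When reviewing a mirrored dashboard like the one above, it can help to list each panel's PromQL expression next to its title. The following is a minimal sketch using only the standard library. The function name `panel_queries` and the inline sample are hypothetical, and the sample is a trimmed stand-in rather than the full dashboard file.

```python
import json


def panel_queries(dashboard: dict) -> list[tuple[str, str]]:
    """Return (panel title, PromQL expression) pairs for every target."""
    pairs = []
    for panel in dashboard.get("panels", []):
        # Row and alert-list panels carry no targets, so they are skipped.
        for target in panel.get("targets", []):
            expr = target.get("expr")
            if expr:
                pairs.append((panel.get("title", ""), expr))
    return pairs


# Trimmed sample shaped like the dashboard JSON above.
sample = json.loads("""
{
  "panels": [
    {"title": "OTEL Counters", "type": "row"},
    {"title": "Agent Check-in Rate by status (5m)",
     "targets": [{"expr": "sum by (status) (rate(updatecenter_checkins_total[5m]))"}]}
  ]
}
""")

for title, expr in panel_queries(sample):
    print(f"{title}: {expr}")
```

Running the same function over the real file, loaded with `json.load`, would give a quick check that each counter named in the commit message is referenced by at least one panel.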


@@ -0,0 +1,226 @@
{
"annotations": {
"list": []
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"displayMode": "table",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"editorMode": "code",
"expr": "sum by (event) (increase(fc_desktop_session_events_total[$__rate_interval]))",
"legendFormat": "{{event}}",
"range": true,
"refId": "A"
}
],
"title": "RemoteDesktop Session Events",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 2,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showUnfilled": true
},
"targets": [
{
"editorMode": "code",
"expr": "sum by (template, event) (increase(fc_desktop_session_events_total[24h]))",
"legendFormat": "{{template}} {{event}}",
"range": true,
"refId": "A"
}
],
"title": "24h Session Events By Template",
"type": "bargauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 3,
"options": {
"legend": {
"displayMode": "table",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"editorMode": "code",
"expr": "fc_desktop_pool_ready",
"legendFormat": "{{template}} ready",
"range": true,
"refId": "A"
},
{
"editorMode": "code",
"expr": "fc_desktop_pool_desired",
"legendFormat": "{{template}} desired",
"range": true,
"refId": "B"
}
],
"title": "Warm Pool Ready vs Desired",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "orange",
"value": 1
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
},
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"editorMode": "code",
"expr": "sum(increase(fc_desktop_session_events_total{event=\"connect\",browser_datasource=\"json\"}[24h])) - sum(increase(fc_desktop_session_events_total{event=\"disconnect\"}[24h]))",
"range": true,
"refId": "A"
}
],
"title": "24h Connect Minus Disconnect",
"type": "stat"
}
],
"refresh": "30s",
"schemaVersion": 39,
"style": "dark",
"tags": [
"flowercore",
"remotedesktop",
"guacamole"
],
"templating": {
"list": []
},
"time": {
"from": "now-24h",
"to": "now"
},
"timezone": "browser",
"title": "FlowerCore RemoteDesktop",
"uid": "flowercore-remotedesktop",
"version": 1
}

File diff suppressed because one or more lines are too long


@@ -219,6 +219,65 @@ spec:
tls: tls:
secretName: cockpit-tls secretName: cockpit-tls
--- ---
# ============================================================
# PuppetDB Dashboard - noc1:8080 (HTTP, web UI only)
# Agent-to-PuppetDB mTLS still uses port 8081 directly via Puppet CA
# (NOT via this proxy). See docs/infrastructure/cert-recovery-2026-04-28.md
# ============================================================
apiVersion: v1
kind: Service
metadata:
name: puppetdb-external
namespace: noc-proxy
spec:
ports:
- port: 8080
targetPort: 8080
name: http
clusterIP: None
---
apiVersion: v1
kind: Endpoints
metadata:
name: puppetdb-external
namespace: noc-proxy
subsets:
- addresses:
- ip: 10.0.56.10
ports:
- port: 8080
name: http
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: puppetdb-tls
namespace: noc-proxy
spec:
secretName: puppetdb-tls
issuerRef:
name: step-ca-acme
kind: ClusterIssuer
dnsNames:
- puppetdb.iamworkin.lan
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: puppetdb
namespace: noc-proxy
spec:
entryPoints:
- websecure
routes:
- kind: Rule
match: Host(`puppetdb.iamworkin.lan`)
services:
- name: puppetdb-external
port: 8080
tls:
secretName: puppetdb-tls
---
# NetworkPolicy: allow Traefik ingress, allow egress to noc1 # NetworkPolicy: allow Traefik ingress, allow egress to noc1
apiVersion: networking.k8s.io/v1 apiVersion: networking.k8s.io/v1
kind: NetworkPolicy kind: NetworkPolicy
@@ -242,6 +301,8 @@ spec:
ports: ports:
- port: 3000 - port: 3000
protocol: TCP protocol: TCP
- port: 8080
protocol: TCP
- port: 9090 - port: 9090
protocol: TCP protocol: TCP
- port: 9091 - port: 9091


@@ -0,0 +1,24 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<IsPackable>false</IsPackable>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="coverlet.collector" Version="6.0.2">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
<PackageReference Include="FluentAssertions" Version="6.12.1" />
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0" />
<PackageReference Include="xunit" Version="2.9.2" />
<PackageReference Include="xunit.runner.visualstudio" Version="2.8.2">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
<PackageReference Include="YamlDotNet" Version="16.2.0" />
</ItemGroup>
</Project>

@@ -0,0 +1,633 @@
using FluentAssertions;
using System.Text.RegularExpressions;
using Xunit;
using YamlDotNet.Core;
using YamlDotNet.RepresentationModel;
namespace BluejayInfraLint.Tests;
[Trait("Category", "Unit")]
public sealed class FleetManifestLintTests
{
private static readonly ManifestInventory Inventory = ManifestInventory.Load();
private static readonly HashSet<string> PublicReadOnlyHosts = new(StringComparer.Ordinal)
{
"dist.flowercore.io",
"dns.iamworkin.lan",
};
// Public hosts that allow a tightly bounded write surface in addition to
// GET/HEAD. updatecenter.iamworkin.lan accepts POST /api/v1/checkin/{id}
// (bootstrap-JWT), so its allowlist is GET||HEAD||POST||OPTIONS, while
// PUT/PATCH/DELETE must still 404 at the route. The lint below requires
// all four allowed methods and rejects any of the three forbidden ones.
private static readonly HashSet<string> PublicReadWriteAllowlistHosts = new(StringComparer.Ordinal)
{
"updatecenter.iamworkin.lan",
"updates.iamworkin.lan",
};
private static readonly HashSet<string> ApiKeyProtectedDeployments = new(StringComparer.Ordinal)
{
"messageboard-web",
"scoreboard-web",
"segmentdisplay-web",
"signalcontrol-web",
};
private static readonly HashSet<string> PublicEgressDeployments = new(StringComparer.Ordinal)
{
"asterisk",
"fc-llm-bridge",
"mysql-web",
"php-web",
"ttsreader-align",
"ttsreader-kokoro",
"ttsreader-modern",
"ttsreader-piper",
};
[Fact]
public void IngressRoutes_MustKeepServiceReferencesInTheSameNamespace()
{
var violations = Inventory.Documents
.Where(document => document.Kind == "IngressRoute")
.SelectMany(document =>
document.MappingSequence("spec", "routes")
.SelectMany(route =>
route.MappingSequence("services")
.Select(service => new
{
Document = document,
ServiceName = ManifestNodeExtensions.Scalar(service, "name"),
ServiceNamespace = ManifestNodeExtensions.Scalar(service, "namespace"),
})))
.Where(entry => !string.IsNullOrWhiteSpace(entry.ServiceNamespace))
.Where(entry => !string.Equals(entry.ServiceNamespace, entry.Document.Namespace, StringComparison.Ordinal))
.Select(entry =>
$"{entry.Document.Descriptor} references Service '{entry.ServiceName}' in namespace '{entry.ServiceNamespace}'.")
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void PublicReadOnlyIngressRoutes_MustExplicitlyAllowOnlyGetAndHead()
{
var violations = Inventory.Documents
.Where(document => document.Kind == "IngressRoute")
.SelectMany(document =>
document.MappingSequence("spec", "routes")
.Select(route => new
{
Document = document,
Match = ManifestNodeExtensions.Scalar(route, "match") ?? string.Empty,
}))
.Where(entry => PublicReadOnlyHosts.Any(host => entry.Match.Contains($"Host(`{host}`)", StringComparison.Ordinal)))
.Where(entry => !entry.Match.Contains("Method(`GET`)", StringComparison.Ordinal)
|| !entry.Match.Contains("Method(`HEAD`)", StringComparison.Ordinal))
.Select(entry => $"{entry.Document.Descriptor} is missing an explicit GET/HEAD method allowlist.")
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void PublicReadWriteIngressRoutes_MustPinGetHeadPostOptionsAllowlist()
{
// For hosts in PublicReadWriteAllowlistHosts, the route match MUST
// contain Method(`GET`), Method(`HEAD`), Method(`POST`), and
// Method(`OPTIONS`) AND MUST NOT contain Method(`PUT`),
// Method(`PATCH`), or Method(`DELETE`). This keeps the public
// allowlist invariant against regression — see Track A's
// updatecenter-web ingressroute hardening.
var violations = Inventory.Documents
.Where(document => document.Kind == "IngressRoute")
.SelectMany(document =>
document.MappingSequence("spec", "routes")
.Select(route => new
{
Document = document,
Match = ManifestNodeExtensions.Scalar(route, "match") ?? string.Empty,
}))
.Where(entry => PublicReadWriteAllowlistHosts.Any(host => entry.Match.Contains($"Host(`{host}`)", StringComparison.Ordinal)))
.SelectMany(entry =>
{
var localViolations = new List<string>();
foreach (var required in new[] { "GET", "HEAD", "POST", "OPTIONS" })
{
if (!entry.Match.Contains($"Method(`{required}`)", StringComparison.Ordinal))
{
localViolations.Add($"{entry.Document.Descriptor} is missing required Method(`{required}`).");
}
}
foreach (var forbidden in new[] { "PUT", "PATCH", "DELETE" })
{
if (entry.Match.Contains($"Method(`{forbidden}`)", StringComparison.Ordinal))
{
localViolations.Add($"{entry.Document.Descriptor} must not include Method(`{forbidden}`) on a public host.");
}
}
return localViolations;
})
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void TraefikVipNetworkPolicies_MustAllowPostDnatBackendPorts()
{
var violations = Inventory.Documents
.Where(document => document.Kind == "NetworkPolicy")
.Where(document => document.AllScalars().Any(value => value.Contains("10.0.56.200", StringComparison.Ordinal)))
.SelectMany(document =>
{
var ports = document.EgressPorts().ToHashSet(StringComparer.Ordinal);
var localViolations = new List<string>();
if (ports.Contains("443") && !ports.Contains("8443"))
{
localViolations.Add($"{document.Descriptor} allows Traefik VIP 443 without backend port 8443.");
}
if (ports.Contains("80") && !ports.Contains("8000") && !ports.Contains("8080"))
{
localViolations.Add($"{document.Descriptor} allows Traefik VIP 80 without a backend HTTP port (8000/8080).");
}
return localViolations;
})
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void ApiKeyProtectedDeployments_MustUseTcpSocketHealthProbes()
{
var violations = Inventory.Documents
.Where(document => document.Kind == "Deployment")
.Where(document => ApiKeyProtectedDeployments.Contains(document.Name))
.SelectMany(document => document.ContainerMappings().SelectMany(container =>
ProbeViolations(document, container, "readinessProbe")
.Concat(ProbeViolations(document, container, "livenessProbe"))))
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void StatefulSets_WithVolumeClaimTemplates_MustDeclareFilesystemDefaults()
{
var violations = Inventory.Documents
.Where(document => document.Kind == "StatefulSet")
.Where(document => document.MappingSequence("spec", "volumeClaimTemplates").Count > 0)
.SelectMany(document =>
{
var localViolations = new List<string>();
if (string.IsNullOrWhiteSpace(document.Scalar("spec", "podManagementPolicy")))
{
localViolations.Add($"{document.Descriptor} is missing spec.podManagementPolicy.");
}
if (string.IsNullOrWhiteSpace(document.Scalar("spec", "revisionHistoryLimit")))
{
localViolations.Add($"{document.Descriptor} is missing spec.revisionHistoryLimit.");
}
foreach (var claimTemplate in document.MappingSequence("spec", "volumeClaimTemplates"))
{
if (!string.Equals(
ManifestNodeExtensions.Scalar(claimTemplate, "spec", "volumeMode"),
"Filesystem",
StringComparison.Ordinal))
{
var claimName = ManifestNodeExtensions.Scalar(claimTemplate, "metadata", "name") ?? "<unnamed>";
localViolations.Add($"{document.Descriptor} volumeClaimTemplate '{claimName}' is missing volumeMode: Filesystem.");
}
}
return localViolations;
})
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void LocallyImportedImages_MustUseLocalhostPrefixAndNeverPullPolicy()
{
var violations = Inventory.Documents
.Where(document => document.PodSpec() is not null)
.SelectMany(document => document.ContainerSpecs()
.Where(container => !string.IsNullOrWhiteSpace(container.Image))
.Select(container => new
{
Document = document,
Container = container,
}))
.Where(entry =>
(entry.Container.Image.StartsWith("localhost/", StringComparison.Ordinal)
&& !string.Equals(entry.Container.ImagePullPolicy, "Never", StringComparison.Ordinal))
|| (entry.Container.Image.StartsWith("fc-", StringComparison.Ordinal)
&& !entry.Container.Image.Contains('/', StringComparison.Ordinal)))
.Select(entry =>
{
if (entry.Container.Image.StartsWith("localhost/", StringComparison.Ordinal))
{
return $"{entry.Document.Descriptor} container '{entry.Container.Name}' uses {entry.Container.Image} without imagePullPolicy: Never.";
}
return $"{entry.Document.Descriptor} container '{entry.Container.Name}' uses non-local image '{entry.Container.Image}' for a node-imported FlowerCore workload.";
})
.ToList();
violations.Should().BeEmpty();
}
[Fact]
public void PublicEgressDeployments_MustOptOutOfIamworkinLanSearchSuffixes()
{
var violations = Inventory.Documents
.Where(document => document.PodSpec() is not null)
.Where(document => PublicEgressDeployments.Contains(document.Name))
.SelectMany(document =>
{
var localViolations = new List<string>();
var podSpec = document.PodSpec()!;
var dnsPolicy = ManifestNodeExtensions.Scalar(podSpec, "dnsPolicy");
var searches = ManifestNodeExtensions.ScalarSequence(podSpec, "dnsConfig", "searches").ToList();
if (!string.Equals(dnsPolicy, "None", StringComparison.Ordinal))
{
localViolations.Add($"{document.Descriptor} is missing dnsPolicy: None.");
}
if (searches.Count == 0)
{
localViolations.Add($"{document.Descriptor} is missing dnsConfig.searches.");
}
else if (searches.Any(search => search.Contains("iamworkin.lan", StringComparison.OrdinalIgnoreCase)))
{
localViolations.Add($"{document.Descriptor} still includes iamworkin.lan in dnsConfig.searches.");
}
return localViolations;
})
.ToList();
violations.Should().BeEmpty();
}
private static IEnumerable<string> ProbeViolations(
ManifestDocument document,
YamlMappingNode container,
string probeKey)
{
if (!ManifestNodeExtensions.TryGetMapping(container, probeKey, out var probe)
|| !ManifestNodeExtensions.TryGetMapping(probe, "httpGet", out var httpGet))
{
return Array.Empty<string>();
}
var path = ManifestNodeExtensions.Scalar(httpGet, "path");
if (!string.Equals(path, "/health", StringComparison.Ordinal))
{
return Array.Empty<string>();
}
var containerName = ManifestNodeExtensions.Scalar(container, "name") ?? "<unnamed>";
return new[]
{
$"{document.Descriptor} container '{containerName}' still uses {probeKey}.httpGet on /health.",
};
}
}
internal sealed class ManifestInventory
{
private ManifestInventory(string workspaceRoot, string bluejayRoot, IReadOnlyList<ManifestDocument> documents)
{
WorkspaceRoot = workspaceRoot;
BluejayRoot = bluejayRoot;
Documents = documents;
}
public string WorkspaceRoot { get; }
public string BluejayRoot { get; }
public IReadOnlyList<ManifestDocument> Documents { get; }
public static ManifestInventory Load()
{
var bluejayRoot = FindBluejayInfraRoot();
var workspaceRoot = Directory.GetParent(bluejayRoot)?.FullName
?? throw new DirectoryNotFoundException($"Could not resolve workspace root from '{bluejayRoot}'.");
var documents = ManifestRoots(workspaceRoot, bluejayRoot)
.SelectMany(LoadDocumentsFromRoot)
.ToList();
return new ManifestInventory(workspaceRoot, bluejayRoot, documents);
}
private static string FindBluejayInfraRoot()
{
var current = new DirectoryInfo(AppContext.BaseDirectory);
while (current is not null)
{
if (Directory.Exists(Path.Combine(current.FullName, "apps"))
&& File.Exists(Path.Combine(current.FullName, "README.md")))
{
return current.FullName;
}
current = current.Parent;
}
throw new DirectoryNotFoundException("Could not find the bluejay-infra repository root from the test output directory.");
}
private static IEnumerable<string> ManifestRoots(string workspaceRoot, string bluejayRoot)
{
var roots = new[]
{
Path.Combine(bluejayRoot, "apps"),
Path.Combine(workspaceRoot, "FlowerCore.Chat", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.DMS", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.DNS", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Intranet.Web", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Kiosk", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Media", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.MenuBoard", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.MessageBoard", "k8s"),
// FlowerCore.Notes/k8s/selenium/ is the live Selenium Grid
// manifest tree (consumed by deploy-selenium scripts).
// FlowerCore.Notes/k8s/guacamole/ + FlowerCore.Notes/k8s/monitoring/
// are historical scaffolds that have diverged from the live state
// (bluejay-infra/apps/guacamole + bluejay-infra/apps/monitoring are
// canonical). Operator review is required before bringing them in
// line OR decommissioning them — keep them out of the lint scope
// until that decision lands. See xxl-regroup-2026-05-03-followup.md
// "Codex 7 §0 stop conditions" + the C7 close-session output.
Path.Combine(workspaceRoot, "FlowerCore.Notes", "k8s", "selenium"),
Path.Combine(workspaceRoot, "FlowerCore.MySQL", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.PHP", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Presentations", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Print.Web", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.RemoteDesktop", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Scoreboard", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.SegmentDisplay", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.SignalControl", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.TtsReader", "k8s"),
Path.Combine(workspaceRoot, "FlowerCore.Updater", "k8s"),
};
return roots.Where(Directory.Exists);
}
private static IEnumerable<ManifestDocument> LoadDocumentsFromRoot(string root)
{
foreach (var filePath in Directory.EnumerateFiles(root, "*.yaml", SearchOption.AllDirectories))
{
var fileText = File.ReadAllText(filePath);
var segments = SplitManifestDocuments(fileText);
for (var index = 0; index < segments.Count; index++)
{
var yaml = new YamlStream();
try
{
using var reader = new StringReader(segments[index]);
yaml.Load(reader);
}
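catch (YamlException)
{
// Unparsable segments are skipped rather than failing the whole
// inventory load, so they are not covered by any lint rule.
continue;
}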
if (yaml.Documents.Count == 0)
{
continue;
}
if (yaml.Documents[0].RootNode is YamlMappingNode mapping
&& ManifestNodeExtensions.Scalar(mapping, "kind") is not null)
{
yield return new ManifestDocument(root, filePath, index, fileText, mapping);
}
}
}
}
private static IReadOnlyList<string> SplitManifestDocuments(string fileText)
{
var documents = new List<string>();
var currentLines = new List<string>();
var seenApiVersion = false;
foreach (var line in Regex.Split(fileText, @"\r?\n"))
{
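// YAML document markers and top-level keys start in column 0. Anchoring
// there keeps nested apiVersion fields (for example fieldRef.apiVersion)
// and indented '---' lines inside block scalars from splitting a document.
if (Regex.IsMatch(line, @"^---\s*$"))
{
FlushCurrentDocument();
continue;
}
var startsApiVersion = Regex.IsMatch(line, @"^apiVersion:\s*");
if (startsApiVersion
&& seenApiVersion
&& currentLines.Any(existing => !string.IsNullOrWhiteSpace(existing)))
{
FlushCurrentDocument();
}
currentLines.Add(line);
if (startsApiVersion)
{
seenApiVersion = true;
}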
}
FlushCurrentDocument();
return documents;
void FlushCurrentDocument()
{
var text = string.Join(Environment.NewLine, currentLines).Trim();
if (!string.IsNullOrWhiteSpace(text))
{
documents.Add(text);
}
currentLines.Clear();
seenApiVersion = false;
}
}
}
internal sealed record ManifestDocument(
string RootPath,
string FilePath,
int DocumentIndex,
string FileText,
YamlMappingNode Root)
{
public string Kind => Scalar("kind") ?? string.Empty;
public string Name => Scalar("metadata", "name") ?? $"document-{DocumentIndex}";
public string Namespace => Scalar("metadata", "namespace") ?? string.Empty;
public string RelativePath => Path.GetRelativePath(RootPath, FilePath).Replace('\\', '/');
public string Descriptor => $"{Kind} {Namespace}/{Name} [{RelativePath}#{DocumentIndex + 1}]";
public string? Scalar(params string[] path) => ManifestNodeExtensions.Scalar(Root, path);
public IReadOnlyList<YamlMappingNode> MappingSequence(params string[] path) => ManifestNodeExtensions.MappingSequence(Root, path);
public IEnumerable<string> AllScalars() => ManifestNodeExtensions.AllScalars(Root);
public IReadOnlyList<string> EgressPorts()
{
return MappingSequence("spec", "egress")
.SelectMany(egressRule => ManifestNodeExtensions.MappingSequence(egressRule, "ports"))
.Select(portMapping => ManifestNodeExtensions.Scalar(portMapping, "port"))
.Where(value => !string.IsNullOrWhiteSpace(value))
.Cast<string>()
.ToList();
}
public YamlMappingNode? PodSpec()
{
return Kind switch
{
"Deployment" or "StatefulSet" or "DaemonSet" or "Job" =>
ManifestNodeExtensions.Mapping(Root, "spec", "template", "spec"),
"CronJob" =>
ManifestNodeExtensions.Mapping(Root, "spec", "jobTemplate", "spec", "template", "spec"),
_ => null,
};
}
public IReadOnlyList<YamlMappingNode> ContainerMappings()
{
var podSpec = PodSpec();
if (podSpec is null)
{
return Array.Empty<YamlMappingNode>();
}
return ManifestNodeExtensions.MappingSequence(podSpec, "containers")
.Concat(ManifestNodeExtensions.MappingSequence(podSpec, "initContainers"))
.ToList();
}
public IReadOnlyList<ContainerSpec> ContainerSpecs()
{
return ContainerMappings()
.Select(container => new ContainerSpec(
ManifestNodeExtensions.Scalar(container, "name") ?? "<unnamed>",
ManifestNodeExtensions.Scalar(container, "image") ?? string.Empty,
ManifestNodeExtensions.Scalar(container, "imagePullPolicy") ?? string.Empty))
.ToList();
}
}
internal sealed record ContainerSpec(string Name, string Image, string ImagePullPolicy);
internal static class ManifestNodeExtensions
{
public static string? Scalar(this YamlMappingNode mapping, params string[] path)
{
return TryGetNode(mapping, path, out var node) && node is YamlScalarNode scalar
? scalar.Value
: null;
}
public static YamlMappingNode? Mapping(this YamlMappingNode mapping, params string[] path)
{
return TryGetNode(mapping, path, out var node) ? node as YamlMappingNode : null;
}
public static bool TryGetMapping(this YamlMappingNode mapping, string key, out YamlMappingNode result)
{
if (TryGetChild(mapping, key, out var child) && child is YamlMappingNode childMapping)
{
result = childMapping;
return true;
}
result = null!;
return false;
}
public static IReadOnlyList<YamlMappingNode> MappingSequence(this YamlMappingNode mapping, params string[] path)
{
return TryGetNode(mapping, path, out var node) && node is YamlSequenceNode sequence
? sequence.Children.OfType<YamlMappingNode>().ToList()
: Array.Empty<YamlMappingNode>();
}
public static IReadOnlyList<string> ScalarSequence(this YamlMappingNode mapping, params string[] path)
{
return TryGetNode(mapping, path, out var node) && node is YamlSequenceNode sequence
? sequence.Children.OfType<YamlScalarNode>()
.Select(child => child.Value)
.Where(value => !string.IsNullOrWhiteSpace(value))
.Cast<string>()
.ToList()
: Array.Empty<string>();
}
public static IEnumerable<string> AllScalars(YamlNode node)
{
return node switch
{
YamlScalarNode scalar when !string.IsNullOrWhiteSpace(scalar.Value) => new[] { scalar.Value! },
YamlSequenceNode sequence => sequence.Children.SelectMany(AllScalars),
YamlMappingNode mapping => mapping.Children.SelectMany(entry => AllScalars(entry.Key).Concat(AllScalars(entry.Value))),
_ => Array.Empty<string>(),
};
}
private static bool TryGetNode(YamlMappingNode mapping, IReadOnlyList<string> path, out YamlNode node)
{
YamlNode current = mapping;
foreach (var segment in path)
{
if (current is not YamlMappingNode currentMapping || !TryGetChild(currentMapping, segment, out current))
{
node = null!;
return false;
}
}
node = current;
return true;
}
private static bool TryGetChild(YamlMappingNode mapping, string key, out YamlNode value)
{
foreach (var entry in mapping.Children)
{
if (entry.Key is YamlScalarNode scalar
&& string.Equals(scalar.Value, key, StringComparison.Ordinal))
{
value = entry.Value;
return true;
}
}
value = null!;
return false;
}
}

@@ -0,0 +1,12 @@
package bluejayinfra.cross_namespace_ingressroute
deny[msg] {
input.kind == "IngressRoute"
ns := object.get(input.metadata, "namespace", "")
route := input.spec.routes[_]
service := route.services[_]
svc_ns := object.get(service, "namespace", "")
svc_ns != ""
svc_ns != ns
msg := sprintf("IngressRoute %s/%s references Service %s in namespace %s", [ns, input.metadata.name, service.name, svc_ns])
}
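
To illustrate what this rule rejects, here is a hypothetical IngressRoute whose service entry names a different namespace than the route itself. The names `example-app`, `frontend`, `backend`, the host, and the port are placeholders for illustration and do not come from this repository.

```yaml
# Hypothetical manifest: every name, host, and port here is illustrative.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: example-app
  namespace: frontend
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`example.iamworkin.lan`)
      services:
        - name: example-app
          namespace: backend   # differs from metadata.namespace, so deny fires
          port: 8080
```

Removing the service-level `namespace` key, or setting it to `frontend`, would satisfy both the Rego rule and the matching C# test, since both skip entries whose service namespace is empty.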

@@ -0,0 +1,23 @@
package bluejayinfra.public_method_allowlist
public_hosts := {"dist.flowercore.io", "dns.iamworkin.lan"}
deny[msg] {
input.kind == "IngressRoute"
route := input.spec.routes[_]
match := object.get(route, "match", "")
host := public_hosts[_]
contains(match, sprintf("Host(`%s`)", [host]))
not contains(match, "Method(`GET`)")
msg := sprintf("IngressRoute %s/%s is missing Method(GET) for public read-only host %s", [input.metadata.namespace, input.metadata.name, host])
}
deny[msg] {
input.kind == "IngressRoute"
route := input.spec.routes[_]
match := object.get(route, "match", "")
host := public_hosts[_]
contains(match, sprintf("Host(`%s`)", [host]))
not contains(match, "Method(`HEAD`)")
msg := sprintf("IngressRoute %s/%s is missing Method(HEAD) for public read-only host %s", [input.metadata.namespace, input.metadata.name, host])
}
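
As a sketch of a route that passes both rules above, the fragment below pins a read-only host to GET and HEAD. The service name and port are assumed placeholders. Note that the rules only test for the presence of the two `Method(...)` substrings; they do not verify that no other method is allowed.

```yaml
# Hypothetical route fragment: service name and port are illustrative.
routes:
  - kind: Rule
    match: Host(`dns.iamworkin.lan`) && (Method(`GET`) || Method(`HEAD`))
    services:
      - name: example-dns-web
        port: 8080
```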

@@ -0,0 +1,30 @@
package bluejayinfra.traefik_vip_backend_ports
has_vip {
some i
some j
input.spec.egress[i].to[j].ipBlock.cidr == "10.0.56.200/32"
}
has_port(port) {
some i
some j
input.spec.egress[i].ports[j].port == port
}
deny[msg] {
input.kind == "NetworkPolicy"
has_vip
has_port(443)
not has_port(8443)
msg := sprintf("NetworkPolicy %s/%s allows 10.0.56.200:443 without backend port 8443", [input.metadata.namespace, input.metadata.name])
}
deny[msg] {
input.kind == "NetworkPolicy"
has_vip
has_port(80)
not has_port(8080)
not has_port(8000)
msg := sprintf("NetworkPolicy %s/%s allows 10.0.56.200:80 without backend HTTP port 8080 or 8000", [input.metadata.namespace, input.metadata.name])
}
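
For reference, a minimal NetworkPolicy that would pass the 443 rule might look like the sketch below. The policy name, namespace, and empty `podSelector` are assumptions for illustration; only the VIP CIDR and the 443/8443 pairing come from the rule itself.

```yaml
# Hypothetical policy: name, namespace, and selector are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-allow-traefik-vip
  namespace: example
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.56.200/32
      ports:
        - port: 443
          protocol: TCP
        - port: 8443   # backend port required alongside 443
          protocol: TCP
```

Because `has_port` scans every egress rule in the policy, the backend port does not have to sit in the same rule as the VIP `ipBlock` for the check to pass.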

@@ -0,0 +1,28 @@
package bluejayinfra.auth_probe_path
protected_deployments := {
"messageboard-web",
"scoreboard-web",
"segmentdisplay-web",
"signalcontrol-web",
}
deny[msg] {
input.kind == "Deployment"
protected_deployments[input.metadata.name]
container := input.spec.template.spec.containers[_]
probe := object.get(container, "readinessProbe", {})
http_get := object.get(probe, "httpGet", {})
object.get(http_get, "path", "") == "/health"
msg := sprintf("Deployment %s/%s must not use readinessProbe.httpGet /health behind API key middleware", [input.metadata.namespace, input.metadata.name])
}
deny[msg] {
input.kind == "Deployment"
protected_deployments[input.metadata.name]
container := input.spec.template.spec.containers[_]
probe := object.get(container, "livenessProbe", {})
http_get := object.get(probe, "httpGet", {})
object.get(http_get, "path", "") == "/health"
msg := sprintf("Deployment %s/%s must not use livenessProbe.httpGet /health behind API key middleware", [input.metadata.namespace, input.metadata.name])
}
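
The C# counterpart of this policy is named `ApiKeyProtectedDeployments_MustUseTcpSocketHealthProbes`, which suggests the intended replacement is a `tcpSocket` probe. A container fragment along those lines could look like this sketch; the container name, image, and port are assumed values.

```yaml
# Hypothetical container fragment: name, image, and port are illustrative.
containers:
  - name: web
    image: localhost/example-web:v1
    imagePullPolicy: Never
    ports:
      - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080   # checks that the port accepts connections, no HTTP request
    livenessProbe:
      tcpSocket:
        port: 8080
```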

@@ -0,0 +1,23 @@
package bluejayinfra.statefulset_volumeclaim_defaults
deny[msg] {
input.kind == "StatefulSet"
count(object.get(input.spec, "volumeClaimTemplates", [])) > 0
object.get(input.spec, "podManagementPolicy", "") == ""
msg := sprintf("StatefulSet %s/%s is missing spec.podManagementPolicy", [input.metadata.namespace, input.metadata.name])
}
deny[msg] {
input.kind == "StatefulSet"
count(object.get(input.spec, "volumeClaimTemplates", [])) > 0
# Use a null default so an explicit revisionHistoryLimit: 0 is not reported as missing.
object.get(input.spec, "revisionHistoryLimit", null) == null
msg := sprintf("StatefulSet %s/%s is missing spec.revisionHistoryLimit", [input.metadata.namespace, input.metadata.name])
}
deny[msg] {
input.kind == "StatefulSet"
claim := input.spec.volumeClaimTemplates[_]
object.get(claim.spec, "volumeMode", "") != "Filesystem"
claim_name := object.get(claim.metadata, "name", "<unnamed>")
msg := sprintf("StatefulSet %s/%s volumeClaimTemplate %s is missing volumeMode: Filesystem", [input.metadata.namespace, input.metadata.name, claim_name])
}
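
A StatefulSet that satisfies all three rules declares the fields explicitly, as in the sketch below. The values shown (`OrderedReady`, `10`, `Filesystem`) are the Kubernetes API defaults; the workload name, namespace, image, and storage size are placeholders.

```yaml
# Hypothetical manifest: names, image, and sizes are illustrative.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db
  namespace: example
spec:
  serviceName: example-db
  podManagementPolicy: OrderedReady
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
        - name: db
          image: localhost/example-db:v1
          imagePullPolicy: Never
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        volumeMode: Filesystem
        resources:
          requests:
            storage: 1Gi
```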

@@ -0,0 +1,40 @@
package bluejayinfra.localhost_image_pull_policy
pod_spec(spec) = pod {
input.kind == "Deployment"
pod := spec.template.spec
}
pod_spec(spec) = pod {
input.kind == "StatefulSet"
pod := spec.template.spec
}
pod_spec(spec) = pod {
input.kind == "DaemonSet"
pod := spec.template.spec
}
deny[msg] {
pod := pod_spec(input.spec)
container := pod.containers[_]
startswith(object.get(container, "image", ""), "localhost/")
object.get(container, "imagePullPolicy", "") != "Never"
msg := sprintf("%s/%s container %s uses a localhost image without imagePullPolicy: Never", [input.metadata.namespace, input.metadata.name, container.name])
}
deny[msg] {
pod := pod_spec(input.spec)
container := pod.initContainers[_]
startswith(object.get(container, "image", ""), "localhost/")
object.get(container, "imagePullPolicy", "") != "Never"
msg := sprintf("%s/%s initContainer %s uses a localhost image without imagePullPolicy: Never", [input.metadata.namespace, input.metadata.name, container.name])
}
deny[msg] {
pod := pod_spec(input.spec)
container := pod.containers[_]
startswith(object.get(container, "image", ""), "fc-")
not contains(object.get(container, "image", ""), "/")
msg := sprintf("%s/%s container %s uses a non-localhost FlowerCore image reference %s", [input.metadata.namespace, input.metadata.name, container.name, container.image])
}
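
To make the two accepted and rejected shapes concrete, the fragment below shows a container reference that passes these rules. The image name and tag are made up for illustration.

```yaml
# Hypothetical container fragment: image name and tag are illustrative.
containers:
  - name: web
    image: localhost/fc-example-web:v20260101-0000
    imagePullPolicy: Never   # required for any localhost/ image
```

By contrast, a bare reference such as `fc-example-web:v20260101-0000` (starting with `fc-` and containing no `/`) would be denied by the third rule, and the `localhost/` form above would be denied by the first rule if `imagePullPolicy` were omitted or set to anything other than `Never`.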

@@ -0,0 +1,27 @@
package bluejayinfra.public_egress_dns_none
public_egress_workloads := {
"asterisk",
"fc-llm-bridge",
"mysql-web",
"php-web",
"ttsreader-align",
"ttsreader-kokoro",
"ttsreader-modern",
"ttsreader-piper",
}
deny[msg] {
input.kind == "Deployment"
public_egress_workloads[input.metadata.name]
object.get(input.spec.template.spec, "dnsPolicy", "") != "None"
msg := sprintf("Deployment %s/%s must set dnsPolicy: None for public-internet egress", [input.metadata.namespace, input.metadata.name])
}
deny[msg] {
input.kind == "Deployment"
public_egress_workloads[input.metadata.name]
search := object.get(object.get(input.spec.template.spec, "dnsConfig", {}), "searches", [])[_]
contains(lower(search), "iamworkin.lan")
msg := sprintf("Deployment %s/%s must not include iamworkin.lan in dnsConfig.searches", [input.metadata.namespace, input.metadata.name])
}
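
A pod template that passes both rules, and the stricter C# test that also requires a non-empty `dnsConfig.searches`, might look like the sketch below. The nameserver address, the `example` namespace in the search list, and the `ndots` option are assumptions; Kubernetes requires at least one nameserver when `dnsPolicy` is `None`.

```yaml
# Hypothetical pod template fragment: nameserver IP and search list are illustrative.
spec:
  template:
    spec:
      dnsPolicy: None
      dnsConfig:
        nameservers:
          - 10.43.0.10   # assumed cluster DNS Service IP
        searches:        # cluster suffixes only, no iamworkin.lan entry
          - example.svc.cluster.local
          - svc.cluster.local
          - cluster.local
        options:
          - name: ndots
            value: "5"
```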

@@ -0,0 +1,35 @@
package bluejayinfra.public_readwrite_allowlist
# Public hosts that allow a tightly bounded write surface in addition to
# GET/HEAD. updatecenter.iamworkin.lan accepts POST /api/v1/checkin/{id}
# (bootstrap-JWT) so its allowlist is GET||HEAD||POST||OPTIONS — but
# PUT/PATCH/DELETE must still 404 at the route. Any host in this set MUST
# include all four required methods AND MUST NOT include any forbidden
# method.
public_readwrite_hosts := {"updatecenter.iamworkin.lan", "updates.iamworkin.lan"}
required_methods := {"GET", "HEAD", "POST", "OPTIONS"}
forbidden_methods := {"PUT", "PATCH", "DELETE"}
deny[msg] {
input.kind == "IngressRoute"
route := input.spec.routes[_]
match := object.get(route, "match", "")
host := public_readwrite_hosts[_]
contains(match, sprintf("Host(`%s`)", [host]))
required := required_methods[_]
not contains(match, sprintf("Method(`%s`)", [required]))
msg := sprintf("IngressRoute %s/%s is missing required Method(%s) for public read-write host %s", [input.metadata.namespace, input.metadata.name, required, host])
}
deny[msg] {
input.kind == "IngressRoute"
route := input.spec.routes[_]
match := object.get(route, "match", "")
host := public_readwrite_hosts[_]
contains(match, sprintf("Host(`%s`)", [host]))
forbidden := forbidden_methods[_]
contains(match, sprintf("Method(`%s`)", [forbidden]))
msg := sprintf("IngressRoute %s/%s must not include Method(%s) on public read-write host %s", [input.metadata.namespace, input.metadata.name, forbidden, host])
}
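
For illustration, an IngressRoute that satisfies both rules for a read-write host could be written as follows. The resource name, namespace, service name, and port are assumed placeholders; only the host and the four-method allowlist come from the policy.

```yaml
# Hypothetical manifest: name, namespace, service, and port are illustrative.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: example-updatecenter
  namespace: example
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`updatecenter.iamworkin.lan`) && (Method(`GET`) || Method(`HEAD`) || Method(`POST`) || Method(`OPTIONS`))
      services:
        - name: example-updatecenter-web
          port: 8080
```

Since the checks are substring matches on the `match` expression, adding a `Method(...)` matcher for PUT, PATCH, or DELETE anywhere in the rule would trigger the second deny, even inside a negation.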