NOT YET APPLIED: push to origin/main is gated on the DNS A record
knowledge.iamworkin.lan -> 10.0.56.200 being live. Per memory
feedback_pfsense_dns_required_for_acme, applying the Certificate without DNS
in place puts cert-manager into ~2h HTTP-01 backoff and needs
`kubectl -n knowledge delete order <name>` recovery.

Manifests authored:
- apps/knowledge/knowledge.yaml: Namespace, PVC (knowledge-vector-store
  Longhorn 20Gi RWO), Deployment (single replica, Recreate, image
  localhost/fc-knowledge-web:v202604272200 placeholder, runAsNonRoot 1654,
  readOnlyRootFilesystem, drop ALL caps, /healthz startupProbe +
  readinessProbe, tcpSocket livenessProbe), Service (ClusterIP port 80 ->
  8080), Certificate (step-ca-acme ClusterIssuer, 90d duration), IngressRoute
  (knowledge.iamworkin.lan, websecure entrypoint).
- apps/knowledge/kustomization.yaml: `kubectl kustomize` preview file (matches
  fc-distribution shape; ApplicationSet uses dir generator).
- apps/knowledge/README.md: deployment order checklist with the DNS preflight,
  image build/import loop for all 3 RKE2 nodes, push procedure, smoke
  verification, initial-deploy-state notes (zero editions until *.db files
  are pushed to the PVC), resource sizing, probe + middleware notes.

Companion artifacts (separate repos, separate commits):
- FlowerCore.Knowledge@eb91eb4: Dockerfile.deploy at repo root
- FlowerCore.Notes@96cd443: scripts/deploy-knowledge.sh

Apply order (from apps/knowledge/README.md):
1. Add DNS A record knowledge.iamworkin.lan -> 10.0.56.200 via FlowerCore.DNS
   or pfSense web UI.
2. Run `bash scripts/deploy-knowledge.sh` from FlowerCore.Notes. This builds +
   imports the image to all 3 RKE2 nodes with FLOWERCORE_DEPLOY_SKIP_ROLLOUT=1
   (since the Deployment doesn't exist yet on the cluster).
3. Bump the image tag in this manifest to match the freshly-imported tag, then
   `git push` from this repo to land on main. ArgoCD picks up within ~3
   minutes and creates `infra-knowledge`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
154 lines
5.9 KiB
Markdown
# knowledge — FlowerCore.Knowledge.Web (Phase 2.4 K8s deploy)

**Status:** manifests staged, **NOT YET APPLIED**. The image must be built and
imported, and the DNS record provisioned, before `git push`.

- Plan: [`../../../FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md`](../../../FlowerCore.Notes/docs/ai-agents/flowercore-knowledge-service-plan.md)
- Sprint: [`../../../FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md`](../../../FlowerCore.Notes/docs/ai-station/sprint-e-xxl-plan.md) (Track B)
- Repo: `D:\git\FlowerCore\FlowerCore.Knowledge\` (private GitHub repo,
  bootstrapped Sprint D batch 35)

`FlowerCore.Knowledge.Web` is the fleet-wide vector-indexing and RAG hub:
a REST + MCP service that scans `*.db` files under
`/data/vector-stores` and exposes per-edition reachability and corpus
search to the rest of the FC ecosystem (Agent Zero, Chat.Web persona
memory, AiStation embeddings explorer, TtsReader chapter context, BMO
bot, Pi nodes via `fc-index sync`).

## Deployment order (do NOT skip / reorder)
### 1. FlowerCore.DNS public A record — knowledge.iamworkin.lan -> 10.0.56.200

The record is required BEFORE the Certificate resource is created; otherwise
cert-manager's HTTP-01 challenge silently backs off for ~2h, and recovery
needs `kubectl -n knowledge delete order <name>`. Memory:
`feedback_pfsense_dns_required_for_acme`.

The canonical path is FlowerCore.DNS:

```bash
curl -sk https://dns.iamworkin.lan/api/v1/servers
# Find the pfSense serverId, then create the record using the host label only.
# Replace <serverId> below; the URL is quoted so the shell does not treat
# the angle brackets as redirections.

curl -sk -X POST "https://dns.iamworkin.lan/api/v1/servers/<serverId>/zones/iamworkin.lan/records" \
  -H "Content-Type: application/json" \
  -d '{"name":"knowledge","type":"A","data":"10.0.56.200","ttl":300}'
```

If FlowerCore.DNS provider writes are failing with HTTP 502 and "pfSense
diag_command.php response did not contain a `<pre>` block" (the status as of
Sprint E Track B authoring, 2026-04-27), add the override manually via
the pfSense web UI:

1. Log in to `https://10.0.56.1` as admin
2. Services → DNS Resolver → General Settings → Host Overrides
3. Add: Host=`knowledge`, Domain=`iamworkin.lan`, IP Address=`10.0.56.200`
4. Save + Apply Changes

Verify resolution from anywhere on the LAN:

```bash
nslookup knowledge.iamworkin.lan 10.0.56.1
# Expect: 10.0.56.200
```

Or against FlowerCore.DNS once the provider is fixed:

```bash
curl -sk "https://dns.iamworkin.lan/api/v1/zones/iamworkin.lan/resolve-preflight?hostname=knowledge.iamworkin.lan"
# Expect: "resolvable": true
```
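
Because a missed DNS step costs a ~2h backoff, it may help to turn the manual check into a gate that fails loudly. The following is a minimal sketch, not part of the repo: `resolve_a` and `require_dns` are hypothetical helper names, and it assumes `dig` is installed (the checks above use `nslookup`).

```bash
# Sketch: refuse to continue unless the A record resolves to the expected IP.
# resolve_a and require_dns are hypothetical helpers; dig is assumed present.

# Print the first IPv4 answer the given resolver returns for a hostname.
resolve_a() {
  local host="$1" resolver="$2"
  # dig +short prints one answer per line; keep the first IPv4-looking line.
  dig +short A "$host" "@$resolver" \
    | grep -E -m 1 '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' || true
}

# Return 0 only when the record matches; print a diagnostic otherwise.
require_dns() {
  local host="$1" expected="$2" resolver="$3" actual
  actual="$(resolve_a "$host" "$resolver")"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $host -> $actual"
  else
    echo "FAIL: $host -> ${actual:-<no answer>} (expected $expected)" >&2
    return 1
  fi
}
```

A typical call would be `require_dns knowledge.iamworkin.lan 10.0.56.200 10.0.56.1 || exit 1` ahead of the push in step 3.
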
### 2. Build + import the image to ALL RKE2 nodes

Pods may schedule on any RKE2 node (server, agent1, agent2): the
Longhorn PVC accepts mounts from any node, so nothing pins the pod to one
host. Because the `localhost/` image is imported into each node's containerd
rather than pulled, it must be present on all three. Memory:
`feedback_rke2_image_import_targets_all_nodes` +
`feedback_rke2_localhost_imagepullpolicy`.

```bash
# From BLUEJAY-WS, in D:\git\FlowerCore\FlowerCore.Knowledge
TAG="v$(date +%Y%m%d%H%M)"
dotnet.exe publish -c Release -o deploy/app \
  src/FlowerCore.Knowledge.Web/FlowerCore.Knowledge.Web.csproj
podman build -t "localhost/fc-knowledge-web:$TAG" -f deploy/Dockerfile.deploy deploy
podman save "localhost/fc-knowledge-web:$TAG" -o /tmp/fc-knowledge-web.tar

# Import to all three RKE2 nodes
for node in rke2-server rke2-agent1 rke2-agent2; do
  scp /tmp/fc-knowledge-web.tar "$node:/tmp/"
  ssh "$node" "sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io images import /tmp/fc-knowledge-web.tar"
done

# Note the tag; step 3 needs it.
echo "Imported tag: $TAG"
```

`scripts/deploy-knowledge.sh` in FlowerCore.Notes automates this loop.
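
A pod that lands on a node without the image will fail to start, so it can be worth confirming the import on every node before moving on. This is a rough sketch under assumptions: `image_on_node` and `check_all_nodes` are hypothetical helper names, and it assumes containerd lists the imported reference exactly as `localhost/fc-knowledge-web:<tag>`.

```bash
# Sketch: confirm the tag is present in containerd on every node after import.
# image_on_node and check_all_nodes are hypothetical helper names.
CTR="sudo /var/lib/rancher/rke2/bin/ctr -a /run/k3s/containerd/containerd.sock -n k8s.io"

# Succeed when the node's image list contains the exact reference.
image_on_node() {
  local node="$1" ref="$2"
  # "images ls -q" prints one image reference per line; match the whole line.
  ssh "$node" "$CTR images ls -q" | grep -Fxq "$ref"
}

# Report each node and return non-zero if any node lacks the image.
check_all_nodes() {
  local ref="$1" missing=0 node
  shift
  for node in "$@"; do
    if image_on_node "$node" "$ref"; then
      echo "present: $node"
    else
      echo "MISSING: $node"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_all_nodes "localhost/fc-knowledge-web:$TAG" rke2-server rke2-agent1 rke2-agent2`.
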
### 3. Bump the image tag + push

Edit `knowledge.yaml`, replace the placeholder
`localhost/fc-knowledge-web:v202604272200` with the tag from step 2, then:

```bash
cd D:/git/FlowerCore/bluejay-infra
python scripts/check-pfsense-dns.py   # confirms the DNS preflight
git add apps/knowledge/
git commit -m "feat(knowledge): deploy Phase 2.4 K8s manifest"
git push
```

ArgoCD picks up the change within ~3 minutes and creates the
`infra-knowledge` Application.
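
The manual tag edit is easy to get wrong by one digit. As a hedged sketch, the replacement could be scripted; `bump_image_tag` is a hypothetical helper name, and the 12-digit check simply mirrors the `date +%Y%m%d%H%M` format used in step 2.

```bash
# Sketch: rewrite the fc-knowledge-web image tag in a manifest file.
# bump_image_tag is a hypothetical helper name, not part of the repo.
bump_image_tag() {
  local file="$1" tag="$2" tmp
  # Accept only tags shaped like the step 2 output: "v" plus 12 digits.
  if ! printf '%s\n' "$tag" | grep -Eq '^v[0-9]{12}$'; then
    echo "tag must look like vYYYYMMDDHHMM, got: $tag" >&2
    return 2
  fi
  tmp="$(mktemp)" || return 1
  # Replace only the tag portion that follows the fixed image name.
  sed -E "s#(localhost/fc-knowledge-web:)v[0-9]+#\\1${tag}#" "$file" > "$tmp" || return 1
  # Write back through the original path so file permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage would be `bump_image_tag apps/knowledge/knowledge.yaml "$TAG"`, followed by `git diff` to confirm that only the image line changed.
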
### 4. Verify

```bash
fcadmin_ssh noc1 '
  kubectl -n argocd get application infra-knowledge
  kubectl -n knowledge get certificate,pod,pvc
  curl -sk -m 8 -o /dev/null -w "HTTP %{http_code}\n" https://knowledge.iamworkin.lan/healthz
  curl -sk -m 8 https://knowledge.iamworkin.lan/api/v1/editions | jq
'
```

Expect: Certificate `Ready: True` within ~60s, `/healthz` returning HTTP 200,
and, on first deploy, `/api/v1/editions` returning an empty array (no DBs in
the PVC yet).
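
If the Certificate does not reach `Ready: True`, the usual suspect is the DNS preflight from step 1. The sketch below wraps the recovery command in two small helpers. `show_acme_state` and `clear_stuck_order` are hypothetical names, and the resource kinds listed are the standard cert-manager ones, assumed to be installed on this cluster.

```bash
# Sketch: inspect and clear a stuck ACME order in the knowledge namespace.
# show_acme_state and clear_stuck_order are hypothetical helper names.

# List the cert-manager objects involved in issuance, newest state first hand.
show_acme_state() {
  kubectl -n knowledge get certificate,certificaterequest,order,challenge
}

# Delete one named Order so cert-manager creates a fresh one.
clear_stuck_order() {
  local order="$1"
  if [ -z "$order" ]; then
    echo "usage: clear_stuck_order <order-name>" >&2
    return 2
  fi
  kubectl -n knowledge delete order "$order"
}
```

Fix DNS first, then run `show_acme_state` to find the Order name and pass it to `clear_stuck_order`; deleting the Order while the record is still missing would only restart the backoff.
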
## Initial-deploy state and Phase 2.5 follow-up

The Longhorn PVC is empty on first deploy. Knowledge.Web's filesystem
catalog will report zero editions until vector-store `*.db` files are
pushed into `/data/vector-stores`. Initial population is a follow-up
step (Phase 2.5+, Blazor admin UI's "Rebuild" button); for the first
deploy the goal is just to prove the pod boots, `/healthz` returns 200,
and the Traefik IngressRoute serves the Scalar UI.
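
The first-deploy goal above (pod boots, `/healthz` returns 200) can be checked with a short polling loop instead of repeated manual `curl` calls. This is only a sketch: `wait_for_healthz` is a hypothetical helper name, and the default attempt count and delay are arbitrary.

```bash
# Sketch: poll a health URL until it returns 200 or the attempts run out.
# wait_for_healthz is a hypothetical helper name; defaults are arbitrary.
wait_for_healthz() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" code="" i=1
  while [ "$i" -le "$attempts" ]; do
    # -k skips TLS verification, matching the curl calls in the Verify step.
    code="$(curl -sk -m 8 -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$code" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts (last HTTP code: ${code:-none})" >&2
  return 1
}
```

For example: `wait_for_healthz https://knowledge.iamworkin.lan/healthz 30 2`.
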

To copy an existing local DB into the PVC (one-time, manual until
Phase 2.5 admin UI lands):

```bash
fcadmin_ssh noc1 '
  POD=$(kubectl -n knowledge get pod -l app=knowledge-web -o jsonpath="{.items[0].metadata.name}")
  kubectl -n knowledge cp /var/lib/flowercore/vector-stores/bluejay-ai.db $POD:/data/vector-stores/bluejay-ai.db
'
```

## Probes + middleware notes

- `/healthz` is mapped by `Controllers/HealthController.cs` (controller-based
  attribute route). Cheap: no DB, no dependencies.
- Liveness uses `tcpSocket` as a defensive fallback in case future
  middleware accidentally gates `/healthz` behind auth (memory:
  `feedback_k8s_probes_behind_auth_middleware`).
- `/openapi/v1.json` and `/scalar/v1` are wired by `UseFlowerCoreApi`.
  Per memory `feedback_k8s_probes_must_not_hit_openapi`, probes must NOT
  point at OpenAPI documents, because the `MapOpenApi` call can be slow
  during cold startup.
## Resource sizing

- 256Mi memory request / 1Gi limit.
- 100m CPU request / 1000m limit.
- 20Gi Longhorn PVC initially: sufficient for the bluejay-ai 1.94Gi DB +
  fleet-pi-edge 352Mi + fleet-bmo-bot 141Mi + headroom. Resize via
  `kubectl -n knowledge edit pvc knowledge-vector-store` if usage grows
  past 15Gi.
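
The headroom claim can be checked with quick arithmetic. The sketch below sums the three listed database sizes in MiB and compares the total with the PVC size; `pvc_usage` is a hypothetical helper name, and 1.94Gi is entered as 1986.56 MiB (1.94 × 1024).

```bash
# Sketch: sum database sizes (MiB) and compare the total with the PVC size.
# pvc_usage is a hypothetical helper name, not part of the repo.
pvc_usage() {
  local pvc_gib="$1"
  shift
  # Remaining arguments are sizes in MiB, one per database.
  printf '%s\n' "$@" | awk -v pvc="$pvc_gib" '
    { total += $1 }
    END {
      gib = total / 1024
      printf "%.2f GiB used of %d GiB (%.1f%%)\n", gib, pvc, 100 * gib / pvc
    }'
}

# bluejay-ai, fleet-pi-edge, fleet-bmo-bot against the 20Gi PVC
pvc_usage 20 1986.56 352 141
# prints: 2.42 GiB used of 20 GiB (12.1%)
```

The three known databases therefore occupy roughly an eighth of the volume, well below the 15Gi resize threshold.
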