intranet: v202604240135longchunk — long-chunk handling fix

Image bump v202604240108gpu -> v202604240135longchunk, rebuilt from
FlowerCore.Intranet.Web@feat/shared-indexing-search HEAD, which
transitively picks up FlowerCore.Common@feat/shared-indexing@105af75:

- MarkdownChunker hard-caps oversized heading-bounded sections at
  ChunkSizeTokens × 4 chars and splits them with overlap (the same
  pattern as JsonArticleChunker). This stops the indexer from producing
  chunks above nomic-embed-text's 8192-token input limit in the first
  place.

- IndexBuilder gains an IndexingOptions.MaxEmbeddingTokens safety
  filter (default 8000): chunks above the cap are logged at warning
  level and dropped before any batch is sent. The new
  IndexBuildResult.ChunksDropped counts how many were skipped.
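The two mechanisms above can be illustrated with a minimal Python sketch. It assumes the same 4-characters-per-token heuristic that the commit uses for the hard cap. This is not the FlowerCore code, which is .NET and not shown here. The names split_oversized, estimate_tokens, drop_oversized and FilterResult, and the overlap_chars parameter, are all hypothetical. The character-based token estimate also stands in for whatever counting IndexBuilder actually performs.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("indexing_sketch")

# Heuristic from the commit: the hard cap is ChunkSizeTokens x 4 chars.
CHARS_PER_TOKEN = 4


def split_oversized(section: str, chunk_size_tokens: int, overlap_chars: int) -> list[str]:
    """Split one heading-bounded section into windows of at most
    chunk_size_tokens * CHARS_PER_TOKEN characters; consecutive windows
    share overlap_chars characters. Sections under the cap come back whole."""
    max_chars = chunk_size_tokens * CHARS_PER_TOKEN
    if not 0 <= overlap_chars < max_chars:
        raise ValueError("overlap_chars must be in [0, max_chars)")
    if len(section) <= max_chars:
        return [section]
    step = max_chars - overlap_chars  # how far each window advances
    chunks: list[str] = []
    start = 0
    while True:
        chunks.append(section[start:start + max_chars])
        if start + max_chars >= len(section):  # this window reached the end
            break
        start += step
    return chunks


def estimate_tokens(text: str) -> int:
    """Hypothetical estimator: ceil(len(text) / CHARS_PER_TOKEN).
    A real implementation would ask the embedding model's tokenizer."""
    return -(-len(text) // CHARS_PER_TOKEN)


@dataclass
class FilterResult:
    kept: list[str] = field(default_factory=list)
    chunks_dropped: int = 0


def drop_oversized(chunks: list[str], max_embedding_tokens: int = 8000) -> FilterResult:
    """Safety filter run before batching: warn about and drop any chunk
    whose estimated token count exceeds max_embedding_tokens."""
    result = FilterResult()
    for i, chunk in enumerate(chunks):
        tokens = estimate_tokens(chunk)
        if tokens > max_embedding_tokens:
            log.warning("dropping chunk %d: ~%d tokens > cap %d", i, tokens, max_embedding_tokens)
            result.chunks_dropped += 1
        else:
            result.kept.append(chunk)
    return result


if __name__ == "__main__":
    pieces = split_oversized("x" * 10_000, chunk_size_tokens=512, overlap_chars=200)
    print(len(pieces), [len(p) for p in pieces])  # 6 [2048, 2048, 2048, 2048, 2048, 760]
    print(drop_oversized(["a" * 100, "b" * 40_000]).chunks_dropped)  # 1
```

Splitting on raw character offsets can cut mid-word or mid-sentence. The overlap means text near each cut appears in both neighbouring chunks. In this sketch, the filter's 8000 default sits below the 8192 model limit. That leaves some margin if the 4-chars-per-token estimate undercounts.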

Goal: notes-md should index 2541/2541 chunks (vs. 2080/2541 last pass)
with zero "Failed to embed batch" 400s.
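For scale, the counts above put the previous pass's shortfall at:

```latex
\frac{2080}{2541} \approx 81.9\%, \qquad 2541 - 2080 = 461 \text{ chunks } (\approx 18.1\%) \text{ not indexed on the last pass}
```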

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Andrew Stoltz
2026-04-24 01:28:00 -05:00
parent a37fc83584
commit 37ce0aed85


@@ -37,7 +37,7 @@ spec:
 spec:
   containers:
   - name: intranet-web
-    image: localhost/fc-intranet-web:v202604240108gpu
+    image: localhost/fc-intranet-web:v202604240135longchunk
     imagePullPolicy: Never
     ports:
     - containerPort: 5300