Files
bluejay-infra/apps
Andrew Stoltz 37ce0aed85 intranet: v202604240135longchunk — long-chunk handling fix
Image bump v202604240108gpu -> v202604240135longchunk, rebuilt from
FlowerCore.Intranet.Web@feat/shared-indexing-search HEAD which transitively
picks up FlowerCore.Common@feat/shared-indexing@105af75:

- MarkdownChunker hard-caps oversized heading-bounded sections at
  ChunkSizeTokens × 4 chars and splits with overlap (same pattern as
  JsonArticleChunker). Stops the indexer from producing chunks above
  nomic-embed-text's 8192-token input limit at the source.

- IndexBuilder gains IndexingOptions.MaxEmbeddingTokens (default 8000)
  safety filter — chunks above the cap are warn-logged and dropped
  before any batch is sent. New IndexBuildResult.ChunksDropped tracks
  how many got skipped.

Goal: notes-md should index 2541/2541 chunks (vs. 2080/2541 last pass)
with zero "Failed to embed batch" 400s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 01:28:00 -05:00
..