Root cause analysis: extraction can appear stuck at '正在提取xx-xx楼层'
("extracting floors xx-xx") indefinitely when the LLM API connection goes
half-open (the server stops sending data but keeps the TCP connection alive).
The SSE stream's reader.read() would then block forever, because there was no
per-chunk idle timeout.
Changes:
1. llm/llm.js: Add LLM_STREAM_IDLE_TIMEOUT_MS (90s default) to
parseDedicatedStreamingResponse. When no SSE data arrives within the idle
window, the read loop aborts with a clear timeout error instead of hanging
forever. The idle timeout can also be set per request; in that case it
defaults to 30% of the configured request timeout, with a 30s floor.
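The idle-timeout mechanism in item 1 can be sketched as racing each
reader.read() against a resettable timer. This is a minimal illustration, not
the actual llm.js code; readWithIdleTimeout and consumeStream are hypothetical
helper names, and only LLM_STREAM_IDLE_TIMEOUT_MS comes from the change above.

```javascript
// 90s default idle window, matching the constant described above.
const LLM_STREAM_IDLE_TIMEOUT_MS = 90_000;

// Race one reader.read() against an idle timer, so a half-open connection
// surfaces as a rejection instead of blocking forever.
function readWithIdleTimeout(reader, idleMs = LLM_STREAM_IDLE_TIMEOUT_MS) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`SSE stream idle for ${idleMs} ms`)),
      idleMs,
    );
  });
  // Whichever settles first wins; always clear the timer afterwards so the
  // losing timeout promise can never fire later.
  return Promise.race([reader.read(), timeout])
    .finally(() => clearTimeout(timer));
}

// Hypothetical read loop: each chunk gets a fresh idle window.
async function consumeStream(reader, idleMs) {
  for (;;) {
    const { done, value } = await readWithIdleTimeout(reader, idleMs);
    if (done) break;
    // ...feed `value` to the SSE parser...
  }
}
```

Because the timer is recreated per read() call, any single gap longer than the
idle window aborts the loop, while a slow but steadily streaming response is
unaffected.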
2. index.js: Add an EXTRACTION_VECTOR_SYNC_TIMEOUT_MS (120s) timeout wrapper
around syncVectorState in handleExtractionSuccess. Vector sync now uses a
combined AbortSignal (extraction signal + timeout), so either a user abort or
the 120s timeout breaks out of the wait. A vector sync timeout is treated as
non-fatal and does not abort the entire extraction batch.
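The combined-signal wrapper in item 2 can be sketched as follows, assuming
AbortSignal.any and AbortSignal.timeout are available (Node >= 20 or a modern
browser). syncVectorStateBounded is an illustrative name, and the passed-in
syncVectorState is a stand-in for the real function; only the constant and the
non-fatal-timeout behavior come from the change above.

```javascript
// 120s cap on vector sync, matching the constant described above.
const EXTRACTION_VECTOR_SYNC_TIMEOUT_MS = 120_000;

async function syncVectorStateBounded(syncVectorState, extractionSignal) {
  // AbortSignal.any fires when either the extraction's own signal or the
  // timeout signal aborts, so a user abort and the 120s cap both break out.
  const signal = AbortSignal.any([
    extractionSignal,
    AbortSignal.timeout(EXTRACTION_VECTOR_SYNC_TIMEOUT_MS),
  ]);
  try {
    await syncVectorState({ signal });
  } catch (err) {
    // User abort stays fatal and propagates to the caller...
    if (extractionSignal.aborted) throw err;
    // ...but a sync timeout/failure is non-fatal: log and let the
    // extraction batch continue.
    console.warn('vector sync failed (non-fatal):', err?.message);
  }
}
```

Distinguishing the two abort causes by checking extractionSignal.aborted keeps
user cancellation authoritative while downgrading the timeout to a warning.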
3. ST's /api/backends/chat-completions/generate endpoint injects max_tokens
from the preset configuration, which conflicted with the max_completion_tokens
the extension passed on its own, causing the upstream API to return 400.
Switching the extension to max_tokens lets ST handle the field uniformly and
removes the conflict.
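The payload change in item 3 amounts to emitting max_tokens instead of
max_completion_tokens in the request body. A minimal sketch, with
buildGeneratePayload and every field other than the two token limits being
illustrative rather than taken from the actual extension:

```javascript
// Hypothetical builder for the body sent to ST's
// /api/backends/chat-completions/generate endpoint.
function buildGeneratePayload(prompt, limit) {
  return {
    messages: [{ role: 'user', content: prompt }],
    // Before: `max_completion_tokens: limit` collided with the max_tokens
    // that ST injects from the preset, and the upstream API returned 400.
    // After: send max_tokens and let ST reconcile it with the preset.
    max_tokens: limit,
  };
}
```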
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>