Files
ST-Bionic-Memory-Ecology/docs/usage/configuration.en.md

205 lines
7.9 KiB
Markdown

# Configuration
[中文](configuration.md) · **English**
This page is split out from the [README](../../README.en.md) as the main ST-BME user configuration reference, preserving setting names, defaults, and tables for quick lookup by feature.
### Memory LLM
The memory LLM is used for:
- Memory extraction.
- Recall reranking.
- Consolidation.
- Compression.
- Small summaries.
- Summary rollup.
- Reflection.
- ENA Planner planning.
Configuration options:
- **Leave blank**
- Reuse the current SillyTavern chat model.
- **Fill in OpenAI-compatible config**
- Use an independent model for memory tasks.
- Useful when you want to separate the main chat model from the background maintenance model.
Security recommendations:
- Do not publicly share exported `extension_settings` or browser storage that contains API keys.
- Debug logs are off by default; enable them temporarily only when troubleshooting.
### Embedding
Embedding is the core of smart recall.
#### Backend mode
Backend mode is recommended first:
- Reuse SillyTavern backend's embedding provider.
- Usually avoids storing the embedding API key directly in the browser.
- Can use sources already supported by SillyTavern, such as OpenAI, Cohere, Mistral, Ollama, LlamaCpp, and vLLM.
#### Direct mode
In direct mode, the browser requests the embedding service directly:
- Requires filling in the API URL, key, and model.
- May hit CORS restrictions.
- Suitable for a self-hosted gateway or independent embedding service.
> After switching embedding mode or model, run "rebuild vectors".
### Extraction settings
| Setting | Default | Description |
| --- | --- | --- |
| 每 N 条回复提取 | `1` | Trigger extraction every N assistant replies |
| 提取上下文轮数 | `2` | Number of conversation rounds to look back during extraction |
| 自动延后最新助手 | `false` | Allows the latest reply to stabilize before extraction |
| Assistant 排除标签 | `think,analysis,reasoning` | Excludes reasoning tags by default |
| 提取消息上限 | `0` | `0` means unlimited |
| 提取 Prompt 结构模式 | `both` | Provides both transcript and structured messages |
| 提取世界书模式 | `active` | Reuses the currently active world info context |
| 包含故事时间 | `true` | Provides the story timeline during extraction |
| 包含总结快照 | `true` | Provides active summaries during extraction |
| 手动提取模式 | `pending` | Default extraction mode in the panel |
### Recall settings
| Setting | Default | Description |
| --- | --- | --- |
| 启用召回 | `true` | Automatically retrieve memories before generation |
| 向量预筛 | `true` | Use embedding to find candidates first |
| 图扩散 | `true` | Diffuse along graph relations to related nodes |
| LLM 精排 | `true` | Let the LLM select final results from candidates |
| 召回 Top-K | `20` | Vector prefilter count |
| 最终节点上限 | `12` | Maximum number of nodes kept before injection |
| 图扩散 Top-K | `100` | Graph diffusion candidate count |
| LLM 候选池 | `30` | Candidate pool size for reranking |
| 多意图拆分 | `true` | Split one input into multiple retrieval intents |
| 上下文混合查询 | `true` | Blend the current input, previous assistant reply, and previous user message |
| 词法增强 | `true` | Weight exact keyword matches |
| 时序链接 | `true` | Mutually boost temporally nearby nodes |
| 多样性采样 | `true` | Avoid overly homogeneous recall results |
### Cognitive and spatial settings
| Setting | Default | Description |
| --- | --- | --- |
| Scoped Memory | `true` | Enable scoped memory |
| POV Memory | `true` | Enable character/user POV memory |
| 区域目标 | `true` | Distinguish current region, adjacent regions, and global |
| 认知记忆 | `true` | Enable subjective/objective cognitive attribution |
| 空间邻接 | `true` | Allow adjacency relations between regions |
| 故事时间线 | `true` | Enable story timeline tags |
| 注入故事时间标签 | `true` | Hint the current story time in injection |
| 软时间引导 | `true` | Guide by prompting, without forcing rewrites |
### Maintenance settings
| Setting | Default | Description |
| --- | --- | --- |
| 启用整合 | `true` | Similar/conflicting memory analysis and merge |
| 整合阈值 | `0.85` | Similarity trigger threshold |
| 启用小总结 | `true` | Compatible with the old `synopsis` name |
| 启用层级总结 | `true` | Use a small summary + rollup summary system |
| 小总结频率 | `3` | Generate a small summary every N extractions |
| 总结折叠扇入 | `3` | Roll up summaries when this many exist at the same layer |
| 启用智能触发 | `false` | Enhance extraction only in high-information scenes |
| 启用主动遗忘 | `false` | Periodically lower the priority of low-value nodes |
| 启用概率召回 | `false` | Allow a small number of weakly related memories to enter by probability |
| 启用反思 | `true` | Periodically summarize long-term trends |
| 启用自动压缩 | `true` | Compress similar memories by extraction cycle |
### Task presets and regex cleanup
Task preset types:
- **`extract`**
- Memory extraction.
- **`recall`**
- Recall reranking.
- **`compress`**
- Memory compression.
- **`synopsis`**
- Small summary generation.
- **`summary_rollup`**
- Summary rollup.
- **`reflection`**
- Long-term reflection.
- **`consolidation`**
- Memory consolidation.
- **`planner`**
- ENA Planner planning.
Regex cleanup reduces polluted tags from entering extraction, recall, and injection:
- `thinking` / `think` / `analysis` / `reasoning`
- `choice`
- `UpdateVariable`
- `status_current_variable`
- `StatusPlaceHolderImpl`
Users can adjust global regex rules and task-local rules in "Task presets". When an empty rule set is explicitly saved, the plugin will not automatically add the default rules back.
### ENA Planner
ENA Planner is now integrated through the `planner` task preset. For deeper implementation and flow details, see the [ENA Planner feature doc](../features/ena-planner.md). It can use:
- Character card blocks.
- World info blocks.
- Recent chat blocks.
- BME recalled memory blocks.
- Historical `<plot>` blocks.
- Current player input blocks.
Recommendations:
- Configure the base API and enabled state in "Config → ENA Planner".
- Adjust the planning prompt structure and generation parameters in "Config → Task presets → planner".
### Hide old turns and render limit
These are two separate features; for deeper implementation and boundary details, see the [Hide old turns and render limit feature doc](../features/hide-and-render.md):
- **Hide old turns**
- Controls context tokens.
- Does not delete chat content.
- Uses SillyTavern's hide mechanism so earlier turns no longer participate in the main reply or ST-BME reads.
- **Limit rendered chat turns**
- Reduces lag in very long chat UIs.
- Syncs to SillyTavern's `chat_truncation`.
- Only controls how many recent turns the frontend loads at most.
- It is not context hiding and is not message deletion.
Important notes:
- If you need to run "rerun extraction range" or full history recovery on very old turns, temporarily disable the render limit or increase the count and refresh.
- When ST-BME detects that the current `context.chat` is likely only a recent N-turn render slice, it pauses destructive history recovery to avoid wrongly clearing the runtime graph.
### Native acceleration
Native acceleration is currently a gradual rollout capability. For deeper implementation and fallback strategy details, see the [Native acceleration feature doc](../features/native-acceleration.md). It covers:
- Graph layout.
- Persist Delta.
- Snapshot Hydrate.
Default strategy:
- Automatically activates based on thresholds for node count, edge count, record count, structural changes, and serialized size.
- `Fail-open` is enabled by default; when Native is unavailable or fails, ST-BME falls back to JS.
- You can use "globally force-disable Native" to fall back to JS everywhere.