mirror of
https://github.com/Youzini-afk/ST-Bionic-Memory-Ecology.git
synced 2026-06-13 18:31:16 +08:00
docs: document split extraction pipeline
2
.gitignore
vendored
@@ -10,10 +10,8 @@ Thumbs.db
skip-trivial-user-input-plan.md
CLAUDE.md
AGENTS.md
plans/fix-regex-stage-alias-override.md
猫妖恬恬.json
plan_global_task_regex.md
docs/BME六大功能全景解析.xlsx
ST-BME_backup_6f78abcb-9aea-45b1-a8ad-fbbd8e4075f0-cx4dad.json
plans/mvu-extra-analysis-guard.md
tests/.tmp-*/
@@ -28,7 +28,7 @@ Quick links: [Configuration](docs/usage/configuration.md) · [Panel guide](docs/

## Core capabilities

- **Automatic memory extraction** — After each AI reply, ST-BME extracts structured nodes and relations from the conversation (characters, events, locations, rules, plot threads, reflections, subjective memories), excluding reasoning tags like `think`/`analysis`/`reasoning` by default.
- **Automatic memory extraction** — After each AI reply, ST-BME extracts structured nodes and relations from the conversation (characters, events, locations, rules, plot threads, reflections, subjective memories), using a default two-stage objective + subjective/POV commit pipeline and excluding reasoning tags like `think`/`analysis`/`reasoning`.
- **Multi-layer hybrid recall** — Before generation, relevant memories are recalled through vector prefilter, graph diffusion, lexical boosting, multi-intent splitting, DPP diversity sampling, and optional LLM reranking; per-message persistent recall cards are supported.
- **Cognitive architecture** — Character POV / user POV / objective world memory, spatial region weighting, and a story timeline.
- **Summarization & maintenance** — Small summaries, summary rollup, reflection, consolidation, automatic compression, active forgetting — all logged and reversible.
@@ -49,7 +49,7 @@ ST-BME can be understood as three pipelines: **write** (conversation → memory)
flowchart LR
subgraph Write["Write: conversation → memory"]
A["AI reply"] --> B["Structured message preprocessing"]
B --> C["LLM extracts nodes/edges"]
B --> C["LLM objective extraction + subjective/POV extraction"]
C --> D["Nearest-neighbor reconciliation + cognitive scoping"]
D --> E["Write graph + vector sync + timeline"]
E --> F["Consolidate / compress / summarize / reflect"]
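@@ -26,7 +26,7 @@ ST-BME (Bionic Memory Ecology) is a **third-party SillyTavern frontend extension**

## Core capabilities

- **Automatic memory extraction**: After an AI reply, structured nodes and relations (characters, events, locations, rules, main plot threads, reflections, subjective memories) are extracted from the conversation automatically; reasoning tags such as `think`/`analysis`/`reasoning` are excluded by default.
- **Automatic memory extraction**: After an AI reply, structured nodes and relations (characters, events, locations, rules, main plot threads, reflections, subjective memories) are extracted from the conversation automatically; by default this runs the two-stage objective-facts + subjective/POV commit pipeline and excludes reasoning tags such as `think`/`analysis`/`reasoning`.
- **Multi-layer hybrid recall**: Relevant memories are recalled automatically before generation; the chain includes vector prefiltering, graph diffusion, lexical boosting, multi-intent splitting, DPP diversity sampling, and optional LLM reranking; per-message persistent recall cards are supported.
- **Cognitive architecture**: Character POV / user POV / objective world memory, spatial region weighting, and a story timeline.
- **Summarization & maintenance**: Small summaries, summary rollup, reflection, consolidation, automatic compression, and active forgetting, with logs and rollback.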
@@ -47,7 +47,7 @@ ST-BME can be understood as three pipelines: **write** (conversation → memory), **read
flowchart LR
subgraph Write["Write: conversation → memory"]
A["AI reply"] --> B["Structured message preprocessing"]
B --> C["LLM extracts nodes/edges"]
B --> C["LLM objective extraction + subjective/POV extraction"]
C --> D["Nearest-neighbor reconciliation + cognitive scoping"]
D --> E["Write graph + vector sync + timeline"]
E --> F["Consolidate / compress / summarize / reflect"]
@@ -43,9 +43,22 @@
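
An optional cap on recent messages is available through `extractRecentMessageCap` (default 0 = unlimited). The prompt mode `extractPromptStructuredMode` defaults to `"both"` (options: `transcript` / `structured` / `both`).

## 3. Building the extraction prompt
## 3. Default split-v1 extraction pipeline

`buildTaskPrompt(settings, "extract", ...)` assembles the prompt in layers:
The default `extractPipelineVersion` is `"split-v1"`. The same batch of structured input goes through two LLM stages with narrower responsibilities:

1. **Objective stage** (`extract_objective`): keeps only objective graph operations, such as events, characters, locations, rules, threads, regions, and story time. Any `pov_memory` and cognition updates in this stage's output are filtered out.
2. **Subjective/POV stage** (`extract_subjective`): keeps only `pov_memory` and cognition updates. Any objective nodes, region updates, and batch story time in this stage's output are filtered out.

Only after both stages pass validation are their results merged into a single commit plan and written to the graph in one step; if the subjective stage fails or produces invalid output, the objective stage's results are not persisted on their own. This keeps default extraction within the atomic boundary of "one batch, one commit, one persistence".

To avoid breaking existing users' custom extraction prompts, the runtime first checks the old `extractPrompt` and `taskProfiles.extract`: if it detects a legacy-style customization, a prompt migrated from an old prompt, a stale default template, or a modified default `extract` profile, it automatically falls back to the `legacy-single` single-request extraction path.

> The default prompt text has not been changed at this stage; `extract_objective` / `extract_subjective` only split the engineering pipeline and the task types. The corresponding task profiles can later be replaced with genuinely shorter, more focused objective/subjective prompts.

## 4. Building the extraction prompt

The default split pipeline still reuses the same extraction-prompt context builder; the legacy path uses `buildTaskPrompt(settings, "extract", ...)`, while each split stage enters the LLM call with its own task type. The context layers include:

1. Current conversation (structured + transcript)
2. Graph state context (`buildTaskGraphStats()`: topK 12, diffusionTopK 48, multi-intent on, max text 1200)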
@@ -56,11 +69,11 @@

LLM JSON call, maxRetries 2.

## 4. Normalizing LLM operations
## 5. Normalizing LLM operations

Extract the operations array from any of several possible container keys, then normalize each operation's `action` / `type` / `nodeId` / `ref` / `links` / `clusters` / `scope` / `storyTime` / `fields`, along with `cognitionUpdates` / `regionUpdates` / `batchStoryTime`.

## 5. Writing to the graph
## 6. Writing to the graph

Iterate over the normalized operations:

@@ -95,7 +108,7 @@ update operations trigger temporal processing:

> The current default post-processing prefers the **hierarchical summary** over `generateSynopsis()`. For hierarchical summaries, see [`consolidation-and-compression.md`](consolidation-and-compression.md).

## 6. Post-processing
## 7. Post-processing

After a successful extraction, `handleExtractionSuccessController()` (`maintenance/extraction-success-controller.js`) runs these steps in order: consolidation/dedup → hierarchical summary → reflection → sleep forgetting → compression → vector sync. See [`consolidation-and-compression.md`](consolidation-and-compression.md) for details.

@@ -107,6 +120,7 @@ update operations trigger temporal processing:
| `extractEvery` | 1 | Extract every N assistant messages |
| `extractContextTurns` | 2 | Number of context turns |
| `extractAutoDelayLatestAssistant` | false | Lag-one delayed extraction |
| `extractPipelineVersion` | "split-v1" | Default two-stage objective + subjective/POV extraction; old custom prompts automatically fall back to legacy |
| `extractPromptStructuredMode` | "both" | Prompt mode |
| `enableSmartTrigger` | false | Smart trigger |
| Excluded tags | think,analysis,reasoning | Filtered out during extraction |
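The split-v1 rules documented in the file diff above (stage-specific filtering, then one merged commit) can be illustrated with a minimal JavaScript sketch. Only `pov_memory`, `cognitionUpdates`, `regionUpdates`, and `batchStoryTime` come from the docs; the function names, the `operations` container key, the treatment of `pov_memory` as an operation `type`, and the `isValid` callback are assumptions made for illustration, not ST-BME's actual implementation.

```javascript
// Hypothetical sketch of the split-v1 commit boundary (not ST-BME's real code).

// Objective stage: drop pov_memory operations and ignore cognition updates.
function filterObjective(result) {
  return {
    operations: (result.operations ?? []).filter((op) => op.type !== "pov_memory"),
    regionUpdates: result.regionUpdates ?? [],
    batchStoryTime: result.batchStoryTime ?? null,
  };
}

// Subjective/POV stage: keep only pov_memory operations and cognition updates.
function filterSubjective(result) {
  return {
    operations: (result.operations ?? []).filter((op) => op.type === "pov_memory"),
    cognitionUpdates: result.cognitionUpdates ?? [],
  };
}

// Merge into one commit plan only if BOTH stages validate.
// Returning null means nothing may be written, not even the objective half.
function buildCommitPlan(objectiveResult, subjectiveResult, isValid) {
  if (!isValid(objectiveResult) || !isValid(subjectiveResult)) {
    return null;
  }
  const objective = filterObjective(objectiveResult);
  const subjective = filterSubjective(subjectiveResult);
  return {
    operations: [...objective.operations, ...subjective.operations],
    cognitionUpdates: subjective.cognitionUpdates,
    regionUpdates: objective.regionUpdates,
    batchStoryTime: objective.batchStoryTime,
  };
}
```

In this sketch the caller persists the returned plan in a single write, and treats `null` as "skip the whole batch", which mirrors the "one batch, one commit, one persistence" boundary described in the docs.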
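@@ -37,7 +37,7 @@ ST-BME's operation can be summarized as three relatively independent pipelines.
Assistant message lands in the chat
→ Auto-extraction planning (enough messages to trigger? smart trigger?)
→ Build structured extraction input (filter out think/analysis, etc.)
→ LLM extraction → normalize operations (create/update/delete/link)
→ LLM objective extraction + subjective/POV extraction → normalize operations (create/update/delete/link)
→ Write graph nodes and relations (including temporal edges)
→ Post-processing: consolidation/dedup → hierarchical summary → reflection → sleep forgetting → compression
→ Vector sync (generate embeddings for new nodes)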
@@ -72,6 +72,7 @@ In direct mode, the browser requests the embedding service directly:
| 每 N 条回复提取 | `1` | Trigger extraction every N assistant replies |
| 提取上下文轮数 | `2` | Number of conversation rounds to look back during extraction |
| 自动延后最新助手 | `false` | Allows the latest reply to stabilize before extraction |
| Extraction pipeline version | `split-v1` | Default two-stage extraction: objective facts, then subjective/POV. Old custom extraction prompts automatically fall back to the legacy single-call path. |
| Assistant 排除标签 | `think,analysis,reasoning` | Excludes reasoning tags by default |
| 提取消息上限 | `0` | `0` means unlimited |
| 提取 Prompt 结构模式 | `both` | Provides both transcript and structured messages |
@@ -134,6 +135,9 @@ Task preset types:
- **`extract`**
- Memory extraction.

- **`extract_objective` / `extract_subjective`**
- Objective and subjective/POV stages for the default `split-v1` extraction pipeline. This version only splits task type and commit boundaries; it does not rewrite prompt text here. Old custom `extract` prompts/profiles automatically fall back to the legacy single-call path.

- **`recall`**
- Recall reranking.

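The fallback rule mentioned above (old custom `extract` prompts/profiles fall back to the legacy single-call path) could be sketched as follows. The extraction-pipeline docs name four triggers: a legacy-style customization, a prompt migrated from an old prompt, a stale default template, and a modified default `extract` profile. How ST-BME detects them is not documented here, so the function name and the boolean flags below are hypothetical stand-ins for that detection, not the project's actual API.

```javascript
// Hypothetical sketch: choose the extraction path from settings plus
// already-computed legacy signals (the flag names are assumptions).
function resolveExtractPipelineVersion(settings, legacySignals) {
  const requested = settings.extractPipelineVersion ?? "split-v1";
  if (requested !== "split-v1") {
    return requested; // an explicit non-default choice is left untouched
  }
  const mustFallBack = Boolean(
    legacySignals.hasLegacyCustomPrompt ||
      legacySignals.migratedFromLegacyPrompt ||
      legacySignals.hasStaleDefaultTemplate ||
      legacySignals.hasModifiedDefaultProfile
  );
  return mustFallBack ? "legacy-single" : "split-v1";
}
```

Keeping the decision in one pure function makes it easy to see that any single trigger is enough to select `legacy-single`, while untouched defaults stay on `split-v1`.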
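@@ -72,6 +72,7 @@ Embedding is the core of smart recall.
| 每 N 条回复提取 | `1` | Triggers extraction once every N assistant replies |
| 提取上下文轮数 | `2` | Number of conversation rounds to look back during extraction |
| 自动延后最新助手 | `false` | Lets the latest reply stabilize before extraction |
| 提取管线版本 | `split-v1` | Split by default into an objective-facts stage + a subjective/POV stage; old custom extraction prompts automatically fall back to the single-request legacy path |
| Assistant 排除标签 | `think,analysis,reasoning` | Excludes reasoning tags by default |
| 提取消息上限 | `0` | `0` means unlimited |
| 提取 Prompt 结构模式 | `both` | Provides both transcript and structured messages |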
@@ -134,6 +135,9 @@ Embedding is the core of smart recall.
- **`extract`**
- Memory extraction.

- **`extract_objective` / `extract_subjective`**
- The objective stage and the subjective/POV stage of the default `split-v1` extraction pipeline. The current version only splits the task type and the commit boundary and does not rewrite the prompt text here; old custom `extract` prompts/profiles automatically fall back to the legacy single-request path.

- **`recall`**
- Recall reranking.
