taozuqin/ST-Bionic-Memory-Ecology

Fork 0

mirror of https://github.com/Youzini-afk/ST-Bionic-Memory-Ecology.git synced 2026-06-14 02:40:45 +08:00

Files

youzini f67358e024 docs: add English README + bilingual usage manual

2026-05-31 18:20:31 +00:00

7.9 KiB

Raw Blame History

Configuration

中文 · English

This page is split out from the README as the main ST-BME user configuration reference, preserving setting names, defaults, and tables for quick lookup by feature.

Memory LLM

The memory LLM is used for:

Memory extraction.
Recall reranking.
Consolidation.
Compression.
Small summaries.
Summary rollup.
Reflection.
ENA Planner planning.

Configuration options:

Leave blank
- Reuse the current SillyTavern chat model.
Fill in OpenAI-compatible config
- Use an independent model for memory tasks.
- Useful when you want to separate the main chat model from the background maintenance model.

Security recommendations:

Do not publicly share exported extension_settings or browser storage that contains API keys.
Debug logs are off by default; enable them temporarily only when troubleshooting.

Embedding

Embedding is the core of smart recall.

Backend mode

Backend mode is recommended first:

Reuse SillyTavern backend's embedding provider.
Usually avoids storing the embedding API key directly in the browser.
Can use sources already supported by SillyTavern, such as OpenAI, Cohere, Mistral, Ollama, LlamaCpp, and vLLM.

Direct mode

In direct mode, the browser requests the embedding service directly:

Requires filling in the API URL, key, and model.
May hit CORS restrictions.
Suitable for a self-hosted gateway or independent embedding service.

After switching embedding mode or model, run "rebuild vectors".

Extraction settings

Setting	Default	Description
每 N 条回复提取	`1`	Trigger extraction every N assistant replies
提取上下文轮数	`2`	Number of conversation rounds to look back during extraction
自动延后最新助手	`false`	Allows the latest reply to stabilize before extraction
Assistant 排除标签	`think,analysis,reasoning`	Excludes reasoning tags by default
提取消息上限	`0`	`0` means unlimited
提取 Prompt 结构模式	`both`	Provides both transcript and structured messages
提取世界书模式	`active`	Reuses the currently active world info context
包含故事时间	`true`	Provides the story timeline during extraction
包含总结快照	`true`	Provides active summaries during extraction
手动提取模式	`pending`	Default extraction mode in the panel

Recall settings

Setting	Default	Description
启用召回	`true`	Automatically retrieve memories before generation
向量预筛	`true`	Use embedding to find candidates first
图扩散	`true`	Diffuse along graph relations to related nodes
LLM 精排	`true`	Let the LLM select final results from candidates
召回 Top-K	`20`	Vector prefilter count
最终节点上限	`12`	Maximum number of nodes kept before injection
图扩散 Top-K	`100`	Graph diffusion candidate count
LLM 候选池	`30`	Candidate pool size for reranking
多意图拆分	`true`	Split one input into multiple retrieval intents
上下文混合查询	`true`	Blend the current input, previous assistant reply, and previous user message
词法增强	`true`	Weight exact keyword matches
时序链接	`true`	Mutually boost temporally nearby nodes
多样性采样	`true`	Avoid overly homogeneous recall results

Cognitive and spatial settings

Setting	Default	Description
Scoped Memory	`true`	Enable scoped memory
POV Memory	`true`	Enable character/user POV memory
区域目标	`true`	Distinguish current region, adjacent regions, and global
认知记忆	`true`	Enable subjective/objective cognitive attribution
空间邻接	`true`	Allow adjacency relations between regions
故事时间线	`true`	Enable story timeline tags
注入故事时间标签	`true`	Hint the current story time in injection
软时间引导	`true`	Guide by prompting, without forcing rewrites

Maintenance settings

Setting	Default	Description
启用整合	`true`	Similar/conflicting memory analysis and merge
整合阈值	`0.85`	Similarity trigger threshold
启用小总结	`true`	Compatible with the old `synopsis` name
启用层级总结	`true`	Use a small summary + rollup summary system
小总结频率	`3`	Generate a small summary every N extractions
总结折叠扇入	`3`	Roll up summaries when this many exist at the same layer
启用智能触发	`false`	Enhance extraction only in high-information scenes
启用主动遗忘	`false`	Periodically lower the priority of low-value nodes
启用概率召回	`false`	Allow a small number of weakly related memories to enter by probability
启用反思	`true`	Periodically summarize long-term trends
启用自动压缩	`true`	Compress similar memories by extraction cycle

Task presets and regex cleanup

Task preset types:

extract
- Memory extraction.
recall
- Recall reranking.
compress
- Memory compression.
synopsis
- Small summary generation.
summary_rollup
- Summary rollup.
reflection
- Long-term reflection.
consolidation
- Memory consolidation.
planner
- ENA Planner planning.

Regex cleanup reduces polluted tags from entering extraction, recall, and injection:

thinking / think / analysis / reasoning
choice
UpdateVariable
status_current_variable
StatusPlaceHolderImpl

Users can adjust global regex rules and task-local rules in "Task presets". When an empty rule set is explicitly saved, the plugin will not automatically add the default rules back.

ENA Planner

ENA Planner is now integrated through the planner task preset. For deeper implementation and flow details, see the ENA Planner feature doc. It can use:

Character card blocks.
World info blocks.
Recent chat blocks.
BME recalled memory blocks.
Historical <plot> blocks.
Current player input blocks.

Recommendations:

Configure the base API and enabled state in "Config → ENA Planner".
Adjust the planning prompt structure and generation parameters in "Config → Task presets → planner".

Hide old turns and render limit

These are two separate features; for deeper implementation and boundary details, see the Hide old turns and render limit feature doc:

Hide old turns
- Controls context tokens.
- Does not delete chat content.
- Uses SillyTavern's hide mechanism so earlier turns no longer participate in the main reply or ST-BME reads.
Limit rendered chat turns
- Reduces lag in very long chat UIs.
- Syncs to SillyTavern's chat_truncation.
- Only controls how many recent turns the frontend loads at most.
- It is not context hiding and is not message deletion.

Important notes:

If you need to run "rerun extraction range" or full history recovery on very old turns, temporarily disable the render limit or increase the count and refresh.
When ST-BME detects that the current context.chat is likely only a recent N-turn render slice, it pauses destructive history recovery to avoid wrongly clearing the runtime graph.

Native acceleration

Native acceleration is currently a gradual rollout capability. For deeper implementation and fallback strategy details, see the Native acceleration feature doc. It covers:

Graph layout.
Persist Delta.
Snapshot Hydrate.

Default strategy:

Automatically activates based on thresholds for node count, edge count, record count, structural changes, and serialized size.
Fail-open is enabled by default; when Native is unavailable or fails, ST-BME falls back to JS.
You can use "globally force-disable Native" to fall back to JS everywhere.

7.9 KiB Raw Blame History