Files
ST-Bionic-Memory-Ecology/reroll-history-review-notes.md
2026-04-03 00:22:53 +08:00

229 lines
7.9 KiB
Markdown

# ST-BME Reroll / History Recovery Review Notes
## 1. Background and user pain points
This note summarizes the issues found during a real debugging session, the user expectations behind the fixes, and the exact code changes that were made.
Primary user pain points:
- A new chat sometimes showed recall activity, but extraction state stayed at `processed floor = -1`, which made it look like nothing was recorded.
- Manual extraction worked, but automatic behavior felt inconsistent.
- The worst issue: after each new floor, ST-BME could enter history recovery, roll back, and replay extraction again, making every turn slow.
- The user explicitly does not want reroll/swipe to degrade into rebuilding everything before the changed floor.
The user's strongest expectations:
- Normal extraction should stay incremental.
- Hidden old floors do not need to be re-read just for extraction context.
- If floor 16 is rerolled/swiped, only memories related to the affected suffix should be updated.
- Floors before that prefix should not be replayed again.
- `swipe/reroll` should not silently fall back to generic full history recovery.
## 2. Important product/logic expectations from the user
These were treated as design constraints during the fix:
- Do not change the "hide old floors" strategy just to feed more extraction context.
- Do not make reroll depend on re-reading all previous floors.
- Do not allow reroll/swipe to degrade into full rebuild.
- If targeted reroll rollback cannot be done safely, fail closed and report failure instead of rebuilding the whole prefix.
## 3. What the code already intended to do
The repo already had two different recovery concepts.
### A. Generic history mutation recovery
Used for broad history mutations such as edit, delete, or hash mismatch.
Relevant code:
- `inspectHistoryMutation(...)` in `index.js`
- `recoverHistoryIfNeeded(...)` in `index.js`
- `detectHistoryMutation(...)` in `runtime-state.js`
This path can do larger recovery or replay work because it is trying to revalidate processed history integrity.
### B. Dedicated reroll rollback
Used for targeted suffix rollback around a reroll/swipe boundary.
Relevant code:
- `rollbackGraphForReroll(...)` in `index.js`
- `onRerollController(...)` in `extraction-controller.js`
This path is closer to the desired product behavior:
- find a recovery point near the target floor
- rollback only affected journals and suffix state
- prune processed hashes from the affected floor onward
- re-extract only the affected suffix
## 4. Issue 1 that was fixed: recovery loop caused by lost processed hashes
### Symptom
The panel showed a pattern like:
- history recovery starts
- replay completes
- next turn immediately triggers another recovery
- reason mentions missing `processedMessageHashes`
### Root cause
After successful history recovery, dirty state was cleared, but processed message hashes were not restored. On the next integrity recheck, the system saw:
- there is already processed progress
- but `processedMessageHashes` is empty or missing
That was interpreted as another dirty history condition, so recovery started again.
### Fix
After a successful recovery replay, restore processed hashes from current chat state before saving the graph.
Relevant change:
- `index.js`, `recoverHistoryIfNeeded(...)`
- added call: `updateProcessedHistorySnapshot(chat, recoveredLastProcessedFloor)`
Why this matters:
- recovery replay now leaves history state internally consistent
- the next hash recheck does not immediately trigger another replay loop
### Regression test added
- `tests/p0-regressions.mjs`
- `testHistoryRecoverySuccessRestoresProcessedHashesAfterReplay()`
## 5. Issue 2 that was fixed: host swipe event was routed into generic history recovery
### Symptom
Even though the repo already had dedicated reroll rollback code, the actual host `MESSAGE_SWIPED` event still went through generic history mutation recheck first.
That meant a user swipe could enter the broader history recovery pipeline instead of the suffix-only reroll path.
### Why this was wrong
This conflicts with the desired behavior:
- `swipe/reroll` is a targeted suffix change
- it should not be treated like a generic broad history mutation
- it should not have a path that silently escalates into full replay of earlier floors
### Fix
Changed event routing so that `MESSAGE_SWIPED` directly calls `onReroll(...)` instead of scheduling generic history mutation recheck.
Relevant changes:
- `event-binding.js`
- `onMessageSwipedController(...)` is now async
- it calls `runtime.onReroll({ fromFloor, meta })`
- it no longer schedules `scheduleHistoryMutationRecheck("message-swiped", ...)`
- `index.js`
- `onMessageSwiped(...)` is now async
- runtime wiring now passes `onReroll`
### Resulting behavior
For swipe/reroll:
- route to dedicated suffix rollback
- if rollback succeeds, only affected suffix is re-extracted
- if rollback cannot be done safely, fail as reroll rollback failure
- do not silently drop into generic history recovery fallback
### Regression test added
- `tests/p0-regressions.mjs`
- `testSwipeRoutesToRerollWithoutHistoryRecoveryFallback()`
This test specifically asserts:
- `onReroll` is called
- `scheduleHistoryMutationRecheck` is not called
## 6. Things intentionally not changed
These were discussed and intentionally left alone:
- The "hide old floors" behavior was not redesigned.
- No attempt was made to force extraction to re-read very old floors.
- No attempt was made to make reroll reconstruct the entire prior conversation.
This matches the user's explicit preference:
- previous floors should already be recorded
- reroll should only repair the affected suffix
- old hidden floors should not become the reason for broader replay behavior
## 7. Current intended invariants after the fixes
These are the important invariants another reviewer should validate:
1. Normal extraction remains incremental.
2. Generic edit/delete/hash corruption can still use broader recovery if needed.
3. `swipe/reroll` must use dedicated suffix rollback logic.
4. `swipe/reroll` must not silently degrade into generic full history recovery.
5. Successful recovery replay must leave `processedMessageHashes` populated consistently with processed floor state.
## 8. Files directly involved
- `index.js`
- `event-binding.js`
- `extraction-controller.js`
- `runtime-state.js`
- `chat-history.js`
- `tests/p0-regressions.mjs`
## 9. Concrete code locations worth reviewing
Suggested review targets:
- `index.js`
- `recoverHistoryIfNeeded(...)`
- `rollbackGraphForReroll(...)`
- `onMessageSwiped(...)`
- `event-binding.js`
- `onMessageSwipedController(...)`
- `extraction-controller.js`
- `onRerollController(...)`
- `runtime-state.js`
- `detectHistoryMutation(...)`
- `clearHistoryDirty(...)`
- `chat-history.js`
- processed hash pruning and extraction window helpers
## 10. What was verified locally
Executed successfully:
- `node --check event-binding.js`
- `node --check index.js`
- `node --check tests/p0-regressions.mjs`
- `node tests/p0-regressions.mjs`
## 11. What I want the reviewing AI to focus on
Please review for these questions:
- Is there any remaining code path where `MESSAGE_SWIPED` can still end up in `recoverHistoryIfNeeded(...)` instead of dedicated reroll rollback?
- Is there any remaining reroll path that can still escalate into prefix/full rebuild instead of suffix-only repair?
- After reroll rollback, are `historyState`, `processedMessageHashes`, and vector repair state all kept mutually consistent?
- Are there any race conditions between swipe-triggered reroll, auto extraction, and delayed hide application?
- Does the current failure mode truly fail closed, or is there still an implicit generic fallback somewhere else?
## 12. Short conclusion
The key product decision behind these fixes is:
- generic history corruption and targeted reroll are not the same thing
- reroll/swipe should be handled as suffix repair, not broad recovery
The current changes aim to enforce exactly that distinction.