fix(server): paginate large thread history to stop the server running out of memory#3510
fix(server): paginate large thread history to stop the server running out of memory#3510olafura wants to merge 5 commits into
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Address review (PR pingdotgg#3510, Cursor Bugbot): - Reconnect / checkpoint revert can re-snapshot or filter the live activity window, but the prepended `olderActivities` weren't invalidated — leaving gaps or showing reverted history. Reset the lazy-load state when the live window's oldest activity id changes (it's stable while activities only append, so this doesn't fire during a normal turn), in addition to on thread switch. Web + mobile. - Web only deduped a new older page against already-loaded older pages, not the live window (mobile already did both). Dedup against both so a boundary overlap can't produce duplicate ids / React keys. Not changed — the "unsequenced cursor hides sequenced history" finding is a false positive: NULL-sequence (legacy) rows always sort oldest in the window/cursor ordering, so the oldest-loaded row is only unsequenced once every sequenced row is already loaded; the unsequenced cursor can never strand sequenced rows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks Bugbot — addressed in cddc7e1:
|
ApprovabilityVerdict: Needs human review This PR introduces a new pagination system for thread activities across server, web, and mobile with significant runtime behavior changes. An unresolved bug report identifies potential stale state after checkpoint reverts, requiring human attention. You can customize Macroscope's approvability policy. Learn more. |
Address review (PR pingdotgg#3510, Cursor Bugbot): - Reconnect / checkpoint revert can re-snapshot or filter the live activity window, but the prepended `olderActivities` weren't invalidated — leaving gaps or showing reverted history. Reset the lazy-load state when the live window's oldest activity id changes (it's stable while activities only append, so this doesn't fire during a normal turn), in addition to on thread switch. Web + mobile. - Web only deduped a new older page against already-loaded older pages, not the live window (mobile already did both). Dedup against both so a boundary overlap can't produce duplicate ids / React keys. Not changed — the "unsequenced cursor hides sequenced history" finding is a false positive: NULL-sequence (legacy) rows always sort oldest in the window/cursor ordering, so the oldest-loaded row is only unsequenced once every sequenced row is already loaded; the unsequenced cursor can never strand sequenced rows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cddc7e1 to
75d25e2
Compare
|
Addressed the latest review in
Server/web/mobile typecheck, lint, and the projection + timeline suites pass. |
Address review (PR pingdotgg#3510, Cursor Bugbot): - Reconnect / checkpoint revert can re-snapshot or filter the live activity window, but the prepended `olderActivities` weren't invalidated — leaving gaps or showing reverted history. Reset the lazy-load state when the live window's oldest activity id changes (it's stable while activities only append, so this doesn't fire during a normal turn), in addition to on thread switch. Web + mobile. - Web only deduped a new older page against already-loaded older pages, not the live window (mobile already did both). Dedup against both so a boundary overlap can't produce duplicate ids / React keys. Not changed — the "unsequenced cursor hides sequenced history" finding is a false positive: NULL-sequence (legacy) rows always sort oldest in the window/cursor ordering, so the oldest-loaded row is only unsequenced once every sequenced row is already loaded; the unsequenced cursor can never strand sequenced rows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
75d25e2 to
618df70
Compare
|
Addressed the latest review round: Cursor — "Append clears lazy-loaded history" (Medium) — The reset effect used macroscope — mixed-thread flash (Low, The two earlier macroscope Mediums (
|
| } | ||
| return oldest === null ? null : oldest.id; | ||
| } | ||
|
|
There was a problem hiding this comment.
Stale hasMore after revert
Medium Severity
After a checkpoint revert, the thread reducer drops activities but keeps the prior hasMoreActivities value. Lazy-load reset clears prepended pages yet hasMoreOlderActivities still reads that stale flag when no older page was loaded, so the UI can offer older history or call pagination when the retained set no longer has anything beyond the window.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit 618df70. Configure here.
There was a problem hiding this comment.
Took another careful look — this is a false positive. thread.reverted keeps activities matching turnId === null || retainedTurnIds.has(turnId), i.e. it drops activities for the reverted (newer) turns and keeps the older/retained ones. hasMoreActivities describes whether rows exist older than the window's oldest boundary, which a revert of newer turns does not change — so the older rows it advertises still exist on the server and "load older" remains correct. And even in the worst case where the flag were momentarily stale-true, the lazy-load no-ops safely: an empty/all-overlap page now respects the server's page.hasMore (see the cursor fix in ead248b4) and stops without looping. No code change needed here.
618df70 to
c8e6474
Compare
|
Latest round: Cursor — "Wrong pagination cursor after live sort" (High). Valid consistency gap I introduced last round. I'd made the reshape sentinel order-independent ( Cursor — "Stale hasMore after revert" (Medium). False positive.
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit c8e6474. Configure here.
c8e6474 to
848cff6
Compare
Self-review (10-angle, max effort) of
|
| Sev | Finding | Fix |
|---|---|---|
| Low (hardening) | The internal SQL-input schemas typed beforeSequence/limit as bare Schema.Number, looser than the contract's NonNegativeInt. The WHERE clause (sequence < beforeSequence OR sequence IS NULL) is only equivalent to the old COALESCE(sequence,-1) < beforeSequence for non-negative cursors — a negative one silently returns only unsequenced rows. |
Tightened both internal schemas to NonNegativeInt so any future non-RPC caller is validated, not just the RPC boundary. Typecheck + 11 pager tests green. |
Latent / by-design (NOT changed — flagging for a decision)
- Window vs reducer NULL-ordering disagreement (highest-severity-if-triggered): the server window orders
sequence DESC(SQLite → NULLs last/oldest) while the client reducer'sactivityOrderusessequence ?? MAX_SAFE_INTEGER(NULLs newest). They only agree today because every row is NULL. Ifsequenceis ever partially backfilled, a thread with mixed rows would (a) drop the newest unsequenced rows from the 500-window and (b) paginate in a different order than it displays. The robust fix is to order both sides bycreated_atconsistently — but that's an index-strategy change (the covering index leads withsequence) and a no-op today, so I've left it. Recommend either ordering bycreated_atend-to-end, or guaranteeingsequenceis all-or-nothing. - Speculative sequence machinery: given the above, the entire sequenced-cursor arm (second SQL query, union input arm,
"beforeSequence" in inputbranch, the sequence-leading index) is dead in practice. A single(created_at, activity_id)keyset would cover 100% of today's reality with ~half the surface. Kept as forward-looking, but worth a conscious call. - web/mobile duplication (AGENTS.md "Duplicate logic across multiple files … should be avoided"): the ~120-line lazy-load state machine (gen guard, in-flight ref, reshape
useLayoutEffect, dedup) is near-verbatim inChatView.tsxanduse-thread-composer-state.ts. Should become one shared hook — deferred because it's a cross-app React-architecture refactor, not a surgical fix. - Asymmetry: activities are windowed to 500 but
messagesin the samegetThreadDetailByIdare still unbounded, so the heap-OOM rationale is only half-applied. handleScroll/loadOlderidentity churn on every live append (perf, not correctness);THREAD_DETAIL_ACTIVITY_WINDOWoverloaded as window size + default page + max page.
Verified safe (no action)
getSnapshot→getCommandReadModel (only the CLI reads .projects); toReversed() (Node/lib supports it); mapThreadActivityRow typing; hasMoreActivities optional doesn't break fixtures/decoders; window slice/hasMore arithmetic; the empty-page stop guard; sequence===0 cursor; created_at tie-break cursor. No CLAUDE.md/AGENTS.md violations beyond the duplication note.
848cff6 to
ead248b
Compare
Addressed the latest review round (
|
A busy SQLite database materialised hundreds of MB of activity payloads into the
Node heap and crashed the server. Bound the reads and add an on-demand
pagination path so deep history is still recoverable.
- /api/orchestration/snapshot and the `t3` CLI offline path called getSnapshot(),
loading every thread's full activity/message/checkpoint history even though
they only read `.projects`. Point both at getCommandReadModel() (same shape,
without the heavy per-thread tables).
- getThreadDetailById windows activities to the most recent 500 (it fetches
WINDOW+1 to detect truncation and sets `hasMoreActivities` so clients can
lazy-load). Live activities still stream in via the event subscription.
- New orchestration.getThreadActivities RPC pages older activities on demand:
cursor is {beforeSequence} for sequenced rows or {beforeCreatedAt,
beforeActivityId} for legacy unsequenced (NULL) rows, output {activities,
hasMore}. The sequenced query orders by `sequence DESC` with
`(sequence < ? OR sequence IS NULL)` so the (thread_id, sequence, created_at,
activity_id) index satisfies the ORDER BY instead of a filesort. Unsequenced
rows page by a (created_at, activity_id) cursor consistent with the window's
ordering.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The server windows thread detail to the most recent 500 activities and reports `hasMoreActivities`. The web timeline now fetches older pages on demand via the getThreadActivities RPC: - client-runtime: loadThreadActivities command on orchestrationEnvironment. - ChatView keeps per-thread older pages in local state, prepends them ahead of the live window, and derives hasMore from the server flag (no client-side window-size constant). A thread-keyed in-flight ref coalesces the duplicate dispatches a fast scroll-to-top would otherwise fire, and a request-key guard discards results after a thread switch. - MessagesTimeline triggers a load on reaching the top (maintainVisibleContentPosition anchors the viewport on prepend) with a "Load older history" header and loading indicator. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mobile consumed the windowed thread detail with no way to fetch older activities, silently truncating history. Mirror the web lazy-load: - useThreadComposerState: older-activity state + a thread-keyed in-flight ref guard, a reset effect on thread switch, and loadThreadActivities with the same union cursor. Older pages are deduped (against prior pages and the live window) and prepended into the feed; initial hasMore comes from the server's hasMoreActivities flag. - ThreadFeed: onStartReached triggers the load, maintainVisibleContentPosition anchors the viewport on prepend, and the header shows a spinner while loading. - Forward the values through ThreadRouteScreen → ThreadDetailScreen. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address review (PR pingdotgg#3510, Cursor Bugbot): - Reconnect / checkpoint revert can re-snapshot or filter the live activity window, but the prepended `olderActivities` weren't invalidated — leaving gaps or showing reverted history. Reset the lazy-load state when the live window's oldest activity id changes (it's stable while activities only append, so this doesn't fire during a normal turn), in addition to on thread switch. Web + mobile. - Web only deduped a new older page against already-loaded older pages, not the live window (mobile already did both). Dedup against both so a boundary overlap can't produce duplicate ids / React keys. Not changed — the "unsequenced cursor hides sequenced history" finding is a false positive: NULL-sequence (legacy) rows always sort oldest in the window/cursor ordering, so the oldest-loaded row is only unsequenced once every sequenced row is already loaded; the unsequenced cursor can never strand sequenced rows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Detect a live-window reshape from an order-independent oldest boundary (`liveWindowOldestActivityId`: min by createdAt/id) instead of `activities[0]`. The reducer sorts unsequenced rows to the end while the server snapshot lists legacy unsequenced rows first, so the first live append re-sorts the array and shifts index 0 — which previously looked like a reshape and wrongly cleared the user's scrolled-up history. Run the older-pages reset in useLayoutEffect so the cleared state commits before paint; otherwise a thread switch renders one frame with the previous thread's lazy-loaded pages still merged in, flashing stale work-log and approval rows. Applied to both web (ChatView) and mobile (use-thread-composer-state); the shared sentinel helper lives in client-runtime with unit tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ead248b to
311b469
Compare
Address review (PR pingdotgg#3510, Cursor Bugbot): - Reconnect / checkpoint revert can re-snapshot or filter the live activity window, but the prepended `olderActivities` weren't invalidated — leaving gaps or showing reverted history. Reset the lazy-load state when the live window's oldest activity id changes (it's stable while activities only append, so this doesn't fire during a normal turn), in addition to on thread switch. Web + mobile. - Web only deduped a new older page against already-loaded older pages, not the live window (mobile already did both). Dedup against both so a boundary overlap can't produce duplicate ids / React keys. Not changed — the "unsequenced cursor hides sequenced history" finding is a false positive: NULL-sequence (legacy) rows always sort oldest in the window/cursor ordering, so the oldest-loaded row is only unsequenced once every sequenced row is already loaded; the unsequenced cursor can never strand sequenced rows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


Fixes the server-side root cause behind #2761 (and likely #996).
What Changed
The server used to load a thread's entire history — every tool activity, message, and checkpoint — into memory at once. On a busy database that's hundreds of MB, which exhausted the Node heap and crashed the server (and made large threads slow to load / churn reconnects, as reported in #2761).
Now the server loads only the most recent slice of a thread's activity and tells the client whether older history exists; the web and mobile apps fetch older pages on demand as you scroll up. Deep history is still fully reachable — it's paged in instead of loaded all at once.
t3CLI offline path stop loading every thread's full history (they only read the project list). Thread detail is bounded to the most recent activities, with ahasMoreActivitiesflag and a cursor-paginatedgetThreadActivitiesRPC for older pages. The older-page query is shaped to use the existing index instead of a full sort.Three commits, one per layer (server / web / mobile), so the mobile commit can be dropped if you'd prefer a smaller PR.
Why
Programs in a thread emit a lot of tool activity over time. The server treated "open a thread" (and "load the snapshot") as "read all of it from SQLite into memory." For a long-lived thread or a busy database, that single read materialised more than the heap could hold, so the process ran out of memory and died — and even when it didn't crash, shipping one giant snapshot blocked the connection (#2761). Bounding the read + paging older history on demand keeps memory flat regardless of how much history a thread accumulates.
UI Changes
The web and mobile timelines gain a small "Load older history" affordance and a loading spinner when you scroll to the top of a thread longer than the window. It's a minor, additive interaction. I wasn't able to attach before/after screenshots or a scroll video here — happy to add them, or the server commit alone (no UI) carries the actual reliability fix if you'd rather review that first.
Checklist
🤖 Generated with Claude Code
Note
Medium Risk
Changes core read paths and pagination semantics for long threads (including legacy unsequenced rows); memory/OOM fix is high impact but clients must page correctly to see full history.
Overview
Server no longer materializes full thread activity history on open. Thread detail returns at most 500 recent activities plus
hasMoreActivities, with a newgetThreadActivitiescursor RPC (beforeSequenceor legacy unsequencedbeforeCreatedAt/beforeActivityId). Orchestration snapshot andt3CLI offline project resolution usegetCommandReadModelinstead of the full snapshot to avoid loading huge per-thread tables.Web and mobile merge lazily fetched older pages into the feed (dedup, in-flight guards, reset on window reshape) using shared helpers
liveWindowOldestActivityId/oldestActivityByChronology. Timelines load more at scroll-top (maintainVisibleContentPosition, header spinner or “Load older history” on web).Behavior: threads with >500 activities need backward paging for older tool history; initial detail is a window, not the full log.
Reviewed by Cursor Bugbot for commit 311b469. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Paginate large thread history to prevent server memory exhaustion
hasMoreActivitiesset onOrchestrationThreadwhen older entries exist.getThreadActivitiesPageRPC (orchestration.getThreadActivities) that pages older activities using either abeforeSequenceorbeforeCreatedAt/beforeActivityIdcursor.MessagesTimeline) and mobile (ThreadFeed) UIs display a 'Load older history' button or spinner at the top of the feed and trigger paging when scrolled to the start.ChatViewContentanduseThreadComposerState) manages pagination state, deduplicates pages, and resets older-page history when the live activity window reshapes.Macroscope summarized 311b469.