Skip to content

fix: stage uploads to temp before opening a db connection#65

Merged
bigint merged 11 commits into
mainfrom
burst-100rps-fix
May 24, 2026
Merged

fix: stage uploads to temp before opening a db connection#65
bigint merged 11 commits into
mainfrom
burst-100rps-fix

Conversation

@bigint

@bigint bigint commented May 24, 2026

Copy link
Copy Markdown
Owner

No description provided.

@bigint bigint marked this pull request as ready for review May 24, 2026 15:17
@bigint

bigint commented May 24, 2026

Copy link
Copy Markdown
Owner Author

@greptile

@greptile-apps

greptile-apps Bot commented May 24, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a connection-pool pressure issue by staging file uploads to a temp path before opening a DB session in upload_document, alongside a broad cleanup removing the backup feature and document webhook events (document.processing, document.ready, document.failed, document.deleted).

  • Upload fix: upload_document now uses async with session_factory()() opened only after stream_upload_to_temp completes, so large uploads no longer hold a DB connection during the streaming phase. persist_document already calls session.rollback() before re-raising IntegrityError, so the subsequent dedup SELECT inside the catch block executes on a clean session.
  • Backup removal: Migrations 0004/0005 clean up backup settings, webhook subscriptions, deliveries, audit/access log entries, and drop the backup_jobs table; all corresponding Python code and SDK bindings are removed.
  • Maintenance lock caching: active_lock_state() adds a 5-second Redis cache in front of the active_lock() DB query; the middleware and ensure_writes_allowed() now use the cached path, and the cache is invalidated on acquire/release.

Confidence Score: 5/5

Safe to merge — the upload fix is well-scoped and the session lifecycle is correctly managed throughout

The core upload change correctly delays DB connection acquisition until after the file is fully buffered to disk. persist_document performs its own rollback before re-raising any exception, so the session is always in a clean state when the outer catch blocks re-query it. The backup and document-webhook removals are backed by proper Alembic migrations. No correctness issues found.

No files require special attention

Important Files Changed

Filename Overview
api/bigrag/routers/documents/crud.py Core fix: file upload now staged to temp before opening a DB session; persist_document rolls back the session internally on IntegrityError so the subsequent dedup SELECT in the catch block is safe
api/bigrag/services/webhook/events.py Removed DOCUMENT_EVENTS, BACKUP_EVENTS, and DOCUMENT_STEP_EVENTS; VALID_EVENTS is now only COLLECTION_EVENTS
api/bigrag/services/maintenance.py Added Redis-cached active_lock_state() and cache invalidation on acquire/release; ensure_writes_allowed and middleware now use the cached path; is_active() still bypasses the cache (pre-existing, noted in previous review)
api/alembic/versions/0005_remove_document_webhook_events.py New migration cleanly removes document.* webhook deliveries, strips those event types from existing subscriptions, and deletes webhooks that become empty; downgrade is intentionally a no-op
api/alembic/versions/0004_remove_backups.py Migration removes backup settings, backup webhook deliveries/subscriptions, backup audit/access log entries, and drops the backup_jobs table; downgrade recreates the table but cannot restore deleted rows
api/bigrag/services/document_elements/persist.py Returns DocumentElementCounts(total, pending_enrichment) instead of a bare int, enabling the caller to drive multimodal enrichment from the correct subset count
api/bigrag/services/queue_finalize.py Removed _fanout_webhook_event calls for "complete" and "failed" steps; _publish_progress is now called synchronously (was always sync, the await was only around _fanout_webhook_event); uses pending_enrichment_count from new counts dataclass
api/bigrag/routers/webhooks/serializers.py Added _supported_events() filter so API responses omit events no longer in VALID_EVENTS; also applied in _webhook_to_dict for delivery dispatch

Reviews (3): Last reviewed commit: "Update" | Re-trigger Greptile

@greptile-apps

greptile-apps Bot commented May 24, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a DB connection leak in upload_document by staging the uploaded file to a temp path before opening any database session, then opens an explicit session_factory()() context only for the DB work. It also removes the entire backup system and all document-lifecycle webhook events (document.processing, document.ready, document.failed, document.deleted), replacing the ingestion-fanout webhook mechanism with simple synchronous progress publishes.

  • Upload fix: upload_document no longer accepts a DI-injected session; the file is written to temp first, then a short-lived async with session_factory()() as session handles dedup checks and document persistence. With expire_on_commit=False configured on the engine, accessing the returned doc object after the session closes is safe.
  • Backup removal: All backup service code, the backup_jobs table, the backup Dramatiq queue/actor, and the backup S3 setting specs are deleted; migration 0004 drops the table and cleans up related audit/access log rows.
  • Webhook event pruning: VALID_EVENTS is narrowed to collection and connector-sync events; serializers and the dispatcher cache both apply a _supported_events filter so stale subscriptions no longer appear in API responses or trigger deliveries.

Confidence Score: 3/5

The upload fix and backup removal are clean, but deploying as-is will leave existing webhooks subscribed to document events in a permanently broken state — active in the DB, empty via the API, and never firing — with no migration path to clean them up.

The migration removes backup.* event subscriptions from webhooks and deletes newly empty webhooks, but performs no equivalent cleanup for document.* events (document.processing, document.ready, document.failed, document.deleted), which are also removed from VALID_EVENTS in this same PR. Webhooks that subscribed only to document events will appear active but show an empty events list and never deliver, with the stale data silently persisting in the database.

api/alembic/versions/0004_remove_backups.py needs a follow-up block (mirroring the existing backup-event cleanup) that strips document.* events from the webhooks table and deletes any webhooks that become empty as a result.

Important Files Changed

Filename Overview
api/alembic/versions/0004_remove_backups.py New migration removes backup settings, audit/access log rows, and the backup_jobs table. Cleans up backup.* webhook subscriptions but omits the parallel cleanup for document.* events (document.processing, document.ready, document.failed, document.deleted), which are also removed from VALID_EVENTS in this PR.
api/bigrag/routers/documents/crud.py Core fix: DB session is now opened inside a manual session_factory()() block after the file is staged to temp, rather than being injected via Depends(get_session). With expire_on_commit=False, accessing the returned doc object after the session closes is safe. Dedup and integrity-error paths are preserved correctly.
api/bigrag/services/webhook/events.py Removes DOCUMENT_EVENTS (document.processing/ready/failed/deleted) and BACKUP_EVENTS from VALID_EVENTS, leaving only COLLECTION_EVENTS and CONNECTOR_SYNC_EVENTS.
api/bigrag/services/webhook/dispatcher.py Removes _handle_event and _get_collection_for_document methods (used for document ingestion webhook fanout). Adds VALID_EVENTS filter when building webhook cache so stale event types are excluded from dispatch matching.
api/bigrag/routers/webhooks/serializers.py Adds _supported_events helper to filter webhook events to VALID_EVENTS before serializing API responses. Applied in both _webhook_response and _webhook_to_dict.
api/bigrag/services/document_elements/persist.py replace_document_elements now returns a DocumentElementCounts dataclass with total and pending_enrichment fields instead of a plain int, allowing finalize_success to only trigger multimodal enrichment when there are actually pending elements.
api/bigrag/services/queue_finalize.py Replaces await queue._fanout_webhook_event(queue._publish_progress(...)) calls with synchronous queue._publish_progress(...) calls, removing webhook fanout from ingestion finalization.
api/bigrag/services/queue_processing.py Updates element_count usage to element_counts.pending_enrichment for enrichment gating and element_counts.total for DB column update, following the DocumentElementCounts refactor.

Fix All in Codex Fix All in Claude Code

Reviews (2): Last reviewed commit: "Update" | Re-trigger Greptile

Comment thread api/alembic/versions/0004_remove_backups.py Outdated
@bigint bigint merged commit 28c03fb into main May 24, 2026
5 checks passed
@bigint bigint deleted the burst-100rps-fix branch May 25, 2026 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant