fix(postgres): right-size pools & bound brainztableinator concurrency to fit shared PgBouncer cap by SimplicityGuy · Pull Request #396 · SimplicityGuy/discogsography

SimplicityGuy · 2026-06-21T20:13:52Z

Problem

In production, all long-lived services reach a shared PostgreSQL instance through a PgBouncer pooler running in session-pooling mode (POSTGRES_HOST=pgbouncer:6432). In session mode, each client connection pins a dedicated Postgres backend for its whole lifetime, and those backends count against a hard per-database cap of 45. During a MusicBrainz bulk import, brainztableinator constantly churns on ⚠️ Connection pool exhausted (attempt 1/5)… while PgBouncer sits at 45/45 with ~18 clients queued and Postgres shows ~29 backends idle in transaction. Collectively, the app wants ~63 connections against the 45 cap.
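
A minimal asyncio model makes the arithmetic concrete. It assumes each client keeps its server connection for its whole session, which is what session pooling does, and models the pooler as an asyncio.Semaphore sized to the per-database cap; a client that cannot get a backend simply waits. simulate_session_pooling is a hypothetical name, not code from the repository.

```python
import asyncio


async def simulate_session_pooling(cap: int, clients: int) -> tuple[int, int]:
    """Model a session-mode pooler as a semaphore of `cap` server backends.

    Every client acquires a backend and keeps it until its session ends.
    Returns (backends in use, clients queued) at the moment every client has
    either been given a backend or started waiting for one.
    """
    backends = asyncio.Semaphore(cap)
    sessions_over = asyncio.Event()  # sessions stay open until this is set
    in_use = 0

    async def client() -> None:
        nonlocal in_use
        async with backends:  # the backend stays pinned for the whole session
            in_use += 1
            await sessions_over.wait()
            in_use -= 1

    tasks = [asyncio.create_task(client()) for _ in range(clients)]
    await asyncio.sleep(0)  # let every client try to connect once
    snapshot = (in_use, clients - in_use)
    sessions_over.set()  # end all sessions so queued clients can finish
    await asyncio.gather(*tasks)
    return snapshot


print(asyncio.run(simulate_session_pooling(cap=45, clients=63)))  # (45, 18)
```

With the figures above (a 45 cap and ~63 wanted connections) the model pins all 45 backends and queues 18 clients, the same ~18 queued clients PgBouncer reported.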

Full write-up: docs/postgres-pool-exhaustion-analysis.md.

Root causes (with evidence)

Uncoordinated, oversized pool maxima. The sum of per-service pool maxima was 115 (api 10 + tableinator 50 + brainztableinator 50 + insights 5), about 2.5× the cap, and brainztableinator's max=50 alone exceeds it. Sizes were copy-pasted "to match prefetch" rather than budgeted against the shared backend pool.
No concurrency bound in brainztableinator. It opens one transaction per message, and with prefetch_count=200 × 4 consumers up to 800 handlers can be in flight, each grabbing a pooled connection, so the pool is held at its ceiling. (tableinator never exhausts its pool despite the same max=50, because its BatchProcessor semaphore caps concurrent flushes at 2.)
Per-row child inserts widen the idle-in-transaction window. Each relationship and external link was a separate INSERT issued from a Python loop inside one open transaction: N relationships plus M links meant N+M sequential round-trips, with the backend pinned between statements.
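
The second root cause, and why tableinator escapes it, can be reproduced with a small asyncio sketch. It assumes every handler needs exactly one pooled connection for one transaction; the pool is an asyncio.Semaphore(pool_max), and the optional gate stands in for a concurrency bound such as tableinator's BatchProcessor semaphore. peak_pool_demand is a hypothetical helper, not the services' code.

```python
from __future__ import annotations

import asyncio


async def peak_pool_demand(
    handlers: int, pool_max: int, gate: int | None = None
) -> tuple[int, int]:
    """Return (peak connections in use, peak handlers queued on the pool).

    `gate`, when set, caps how many handlers may use the pool at once,
    like a semaphore placed in front of the pool.
    """
    pool = asyncio.Semaphore(pool_max)
    limiter = asyncio.Semaphore(gate) if gate is not None else None
    in_use = queued = peak_in_use = peak_queued = 0

    async def run_transaction() -> None:
        nonlocal in_use, queued, peak_in_use, peak_queued
        must_wait = pool.locked()  # every connection is already checked out
        if must_wait:
            queued += 1
            peak_queued = max(peak_queued, queued)
        async with pool:
            if must_wait:
                queued -= 1
            in_use += 1
            peak_in_use = max(peak_in_use, in_use)
            await asyncio.sleep(0)  # yield to the loop: stands in for the transaction
            in_use -= 1

    async def handler() -> None:
        if limiter is None:
            await run_transaction()
        else:
            async with limiter:  # bounded concurrency in front of the pool
                await run_transaction()

    await asyncio.gather(*(handler() for _ in range(handlers)))
    return peak_in_use, peak_queued


print(asyncio.run(peak_pool_demand(800, 50)))          # (50, 750)
print(asyncio.run(peak_pool_demand(800, 50, gate=2)))  # (2, 0)
```

Ungated, 800 handlers (200 × 4) against a 50-connection pool leave the pool at its ceiling with 750 handlers queued behind it, which is the state in which the resilient pool reports exhaustion and retries. With a gate of 2 the pool never has more than 2 connections checked out.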

Fixes (app-side — the app was over-pooling)

Budget-aware pool sizing via resolve_postgres_pool_sizes() in common/config.py, with per-service min/max defaults (api 2/8, tableinator 2/12, brainztableinator 2/12, insights 1/4) and shared POSTGRES_POOL_MIN_SIZE / POSTGRES_POOL_MAX_SIZE overrides. The new sum of maxima is 36 (8 + 12 + 12 + 4) ≤ 45, and the idle footprint drops from 13 → 8.
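Couple brainztableinator's prefetch to pool capacity: channel-global QoS (global_=True) with prefetch_count equal to the pool max (_channel_prefetch), so RabbitMQ stops delivering once that many messages are unacknowledged on the channel and applies backpressure at the broker instead of in the pool's retry loop.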
Batch child-row inserts: _insert_relationships and _insert_external_links each use a single executemany, collapsing N+M round-trips to 2 and shrinking the transaction window.
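
For the shape of the sizing logic, here is a hypothetical re-implementation; it is not the code in common/config.py. It assumes that a malformed or non-positive override falls back to the service default and that "clamping" means pulling min_size down to max_size; the real function's rules (its tests cover defaults, env override, invalid values and clamping) may differ. The defaults are the ones quoted above.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Per-service (min, max) defaults quoted in this PR; the maxima sum to 36.
DEFAULT_POOL_SIZES: dict[str, tuple[int, int]] = {
    "api": (2, 8),
    "tableinator": (2, 12),
    "brainztableinator": (2, 12),
    "insights": (1, 4),
}


def _positive_int(env: Mapping[str, str], name: str) -> int | None:
    """Parse env[name] as a positive int; None if unset, malformed, or < 1."""
    raw = env.get(name)
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if value > 0 else None


def resolve_pool_sizes(service: str, env: Mapping[str, str] | None = None) -> tuple[int, int]:
    """Return (min_size, max_size): service defaults, then shared env overrides."""
    env = os.environ if env is None else env
    default_min, default_max = DEFAULT_POOL_SIZES[service]
    min_size = _positive_int(env, "POSTGRES_POOL_MIN_SIZE") or default_min
    max_size = _positive_int(env, "POSTGRES_POOL_MAX_SIZE") or default_max
    # Clamp so the pool is always constructible: min never exceeds max.
    return min(min_size, max_size), max_size


print(resolve_pool_sizes("brainztableinator", {"POSTGRES_POOL_MAX_SIZE": "20"}))  # (2, 20)
```

Taking an explicit mapping (defaulting to os.environ) keeps the function easy to unit-test without mutating the process environment.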
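
Coupling prefetch to the pool can be illustrated with a simulation of channel-global QoS. The broker's unacknowledged-message window is modelled as a semaphore shared by every consumer on the channel, and a handler acks only after releasing its connection, so there are never more live handlers than connections. channel_prefetch and consume are hypothetical stand-ins for _channel_prefetch and the consumer loop; the real helper's rules may differ. If the service uses aio-pika (an assumption based on the global_ parameter name), the QoS call itself would be await channel.set_qos(prefetch_count=..., global_=True).

```python
import asyncio


def channel_prefetch(pool_max: int) -> int:
    """Hypothetical helper: allow one unacked message per pooled connection."""
    return max(1, pool_max)


async def consume(messages: int, pool_max: int, prefetch: int) -> tuple[int, int]:
    """Simulate channel-global QoS feeding handlers that share one pool.

    The broker keeps at most `prefetch` messages unacknowledged across every
    consumer on the channel; a handler acks only after releasing its
    connection. Returns (peak unacked messages, times the pool was exhausted).
    """
    window = asyncio.Semaphore(prefetch)  # broker-side unacked-message window
    pool = asyncio.Semaphore(pool_max)    # application connection pool
    unacked = peak_unacked = exhausted = 0

    async def handle() -> None:
        nonlocal unacked, exhausted
        if pool.locked():  # no free connection: the app would retry here
            exhausted += 1
        async with pool:
            await asyncio.sleep(0)  # stand-in for the transaction
        unacked -= 1
        window.release()  # ack: the broker may deliver another message

    tasks = []
    for _ in range(messages):
        await window.acquire()  # delivery blocks once the window is full
        unacked += 1
        peak_unacked = max(peak_unacked, unacked)
        tasks.append(asyncio.create_task(handle()))
    await asyncio.gather(*tasks)
    return peak_unacked, exhausted


print(asyncio.run(consume(1000, pool_max=12, prefetch=channel_prefetch(12))))  # (12, 0)
```

With RabbitMQ's default per-consumer QoS (global_=False) each consumer gets its own window, so the effective limit would be prefetch × consumers; the global flag is what lets one window cover all four consumers. A window of 800, as before the fix, makes the same workload find the pool exhausted hundreds of times.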
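
A minimal sketch of the batched pattern follows. It uses the standard-library sqlite3 module as a stand-in so it runs anywhere; the PR does not say which Postgres driver the services use, though asyncpg's Connection.executemany and psycopg's Cursor.executemany take the same statement-plus-rows shape (with $1 and %s placeholders respectively). The table layout, the insert_relationships name and the "missing target_id is invalid" rule are assumptions for illustration.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterable, Mapping


def insert_relationships(
    conn: sqlite3.Connection,
    entity_id: str,
    relationships: Iterable[Mapping[str, str]],
) -> int:
    """Insert one entity's child rows with a single executemany call.

    Rows without a target_id are dropped as invalid (an assumed rule).
    An empty batch issues no statement. Returns the number of rows inserted.
    """
    rows = [
        (entity_id, rel.get("type"), rel["target_id"])
        for rel in relationships
        if rel.get("target_id")
    ]
    if not rows:  # empty batch: nothing to send
        return 0
    conn.executemany(
        "INSERT INTO relationships (entity_id, type, target_id) VALUES (?, ?, ?)",
        rows,
    )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationships (entity_id TEXT, type TEXT, target_id TEXT)")
with conn:  # one short transaction around the whole batch
    inserted = insert_relationships(
        conn,
        "artist-1",
        [
            {"type": "member of", "target_id": "band-7"},
            {"type": "collaboration", "target_id": ""},  # invalid: no target
            {"type": "tribute", "target_id": "artist-9"},
        ],
    )
print(inserted)  # 2
```

Building the full row list first and sending it in one call keeps the time between the transaction's first and last statement short, which is what shrinks the idle-in-transaction window.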

The resilient pool's 5-retry "exhausted" path already surfaces a clear hard failure (common/postgres_resilient.py:543) and is left unchanged — it should now rarely trigger.

When to raise the cap instead

Raise the cap only if 12 concurrent writers prove insufficient after these fixes. In that case, raise the PgBouncer cap and POSTGRES_POOL_MAX_SIZE together, keeping the sum of service maxima under the new cap. The 45 cap was not the limiting factor; the uncoordinated 115 of demand was.
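
The budgeting rule reduces to one inequality: the sum of service maxima must not exceed the backend cap. A check like the following could be run whenever either side changes; pool_headroom is a hypothetical helper, not part of the repository, and the two size maps are the before and after figures quoted in this PR.

```python
from __future__ import annotations

from collections.abc import Mapping


def pool_headroom(maxima: Mapping[str, int], backend_cap: int) -> int:
    """Return how many backends remain unbudgeted; raise if over the cap."""
    total = sum(maxima.values())
    if total > backend_cap:
        raise ValueError(f"pool maxima sum to {total}, over the backend cap of {backend_cap}")
    return backend_cap - total


OLD = {"api": 10, "tableinator": 50, "brainztableinator": 50, "insights": 5}
NEW = {"api": 8, "tableinator": 12, "brainztableinator": 12, "insights": 4}

print(pool_headroom(NEW, 45))  # 9
```

The old sizes fail the check (115 > 45), while the new ones leave 9 backends of headroom. When the cap is raised, bump it and the relevant maxima together and re-run the check.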

Tests & verification

New: resolve_postgres_pool_sizes unit tests (defaults, env override, invalid values, clamping); _channel_prefetch coupling tests; batched-insert tests (single executemany, invalid-row filtering, empty-batch no-op).
Updated: pool-size assertions in the tableinator/brainztableinator main tests, the process-with-relationships/links tests, and the stale test_batch_performance placeholder.
2494 tests passed across common/brainztableinator/tableinator/api/insights; ruff, ruff format, mypy, and bandit are all green.

Does not change POSTGRES_HOST handling (host:port via PgBouncer remains the intended deployment).

🤖 Generated with Claude Code

… to fit shared PgBouncer cap

Under a PgBouncer pooler in session mode, every pooled connection pins a dedicated Postgres backend for its lifetime against a hard per-database cap (45). The sum of service pool maxima was 115 (api 10 + tableinator 50 + brainztableinator 50 + insights 5), so a MusicBrainz bulk import drove brainztableinator into a constant "pool exhausted" retry churn while backends sat idle-in-transaction.

Root causes & fixes:
- Pool maxima uncoordinated and oversized (brainztableinator's 50 alone > cap). Add resolve_postgres_pool_sizes() with budget-aware, env-overridable defaults (POSTGRES_POOL_MIN_SIZE/MAX_SIZE); new sum of maxima ~36.
- brainztableinator had no concurrency bound: prefetch_count=200 x 4 consumers (up to 800 in-flight handlers) each grabbing a connection. Couple prefetch to pool capacity via channel-global QoS (_channel_prefetch).
- Per-row child inserts kept the transaction open across N+M round-trips (idle-in-transaction). Batch relationships/external_links via executemany.

Adds/updates tests for pool sizing, prefetch coupling, and batched inserts. Adds docs/postgres-pool-exhaustion-analysis.md and updates configuration.md + CLAUDE.md. Does not change POSTGRES_HOST handling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014oJXBmfpaaPShUEDtakJiL

codecov · 2026-06-21T20:17:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


github-actions · 2026-06-21T20:18:02Z

E2E Coverage (webkit)

Totals
Statements:	46.17% ( 1267 / 2744 )
Lines:	46.17% ( 1267 / 2744 )

github-actions · 2026-06-21T20:19:01Z

E2E Coverage (chromium)

Totals
Statements:	46.17% ( 1267 / 2744 )
Lines:	46.17% ( 1267 / 2744 )

github-actions · 2026-06-21T20:19:24Z

E2E Coverage (firefox)

Totals
Statements:	46.17% ( 1267 / 2744 )
Lines:	46.17% ( 1267 / 2744 )

github-actions · 2026-06-21T20:20:14Z

E2E Coverage (webkit - iPhone 15)

Totals
Statements:	46.17% ( 1267 / 2744 )
Lines:	46.17% ( 1267 / 2744 )

github-actions · 2026-06-21T20:20:52Z

E2E Coverage (webkit - iPad Pro 11)

Totals
Statements:	46.17% ( 1267 / 2744 )
Lines:	46.17% ( 1267 / 2744 )

SimplicityGuy merged commit 5710f0c into main Jun 21, 2026
57 checks passed

SimplicityGuy deleted the worktree-fix+pg-pool-exhaustion branch June 21, 2026 20:40

SimplicityGuy merged 1 commit into main from worktree-fix+pg-pool-exhaustion
