Add Hypothesis property tests for blob/bubble parsing (#71) by bradjin8 · Pull Request #81 · cppalliance/cppa-cursor-browser

bradjin8 · 2026-05-27T14:19:04Z

Summary

Adds tests/test_blob_parsing_fuzz.py with Hypothesis property tests for the blob/bubble parsing pipeline (closes Add fuzz/property-based tests for blob parsing #71).
Covers Bubble.from_dict (including adversarial non-dict inputs and invalid bubble_id types), workspace-tabs-style KV assembly (json.loads → Bubble.from_dict), CLI blob chain parsing (classify_blob_data, _extract_blob_refs, traverse_blobs on generated store.db fixtures), and text extraction (extract_text_from_bubble, messages_to_bubbles).
Extracts shared classify_blob_data() in utils/cli_chat_reader.py so traverse_blobs and fuzz tests use the same JSON-vs-chain classification logic.
Fixes extract_text_from_bubble to always return str when text is a non-string truthy value.
Adds hypothesis>=6.100,<7 to [project.optional-dependencies].dev in pyproject.toml and installs it in CI alongside pytest.
Documents pip install -e ".[dev]" in the README and ignores .hypothesis/ in .gitignore.

Test plan

python -m unittest tests.test_blob_parsing_fuzz -v
python -m unittest discover tests -v (full suite, including new fuzz tests)
CI green on the unittest matrix (Linux / Windows / macOS × Python 3.10–3.13)

Notes

Dev dependencies (hypothesis, pytest, mypy) are not added to requirements.txt / requirements-lock.txt (same pattern as pytest today).
Eval context: addresses Test 31 / “Unverified Dark Path” coverage for parser invariants without crashing on schema drift inputs.

Summary by CodeRabbit

Documentation
- Added development setup instructions to README with pip install guidance for contributors
Bug Fixes
- More robust text extraction with consistent string handling
- Improved blob parsing and traversal for more reliable message/attachment handling
Tests
- Added comprehensive property/fuzz tests covering parsing, blob traversal, and text extraction
Chores
- CI test workflow now installs property-testing tooling
- .gitignore updated to exclude local Hypothesis artifacts

coderabbitai · 2026-05-27T14:19:17Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cdad0fa6-040b-4e74-9d37-16409d9169df

📥 Commits

Reviewing files that changed from the base of the PR and between 24a144b and ca33259.

📒 Files selected for processing (5)

.github/workflows/tests.yml
tests/test_blob_parsing_fuzz.py
tests/test_cli_chat_reader.py
tests/test_models.py
utils/cli_chat_reader.py

🚧 Files skipped from review as they are similar to previous changes (1)

utils/cli_chat_reader.py

📝 Walkthrough

Walkthrough

Adds Hypothesis to dev dependencies and CI, ignores Hypothesis artifacts, centralizes blob classification and publicizes extract_blob_refs, coerces bubble text to str, and adds Hypothesis fuzz tests for bubble parsing, blob-chain traversal, and text extraction.

Changes

Property-Based Testing for Blob Parsing

Layer / File(s)	Summary
Hypothesis configuration & docs `pyproject.toml`, `.github/workflows/tests.yml`, `.gitignore`, `README.md`	Adds `hypothesis>=6.100,<7` to dev extras, installs Hypothesis in the unittest CI job, ignores `.hypothesis/`, and documents `pip install -e ".[dev]"` for development installs.
Blob classification & text extraction `utils/cli_chat_reader.py`, `utils/text_extract.py`	Renames `_extract_blob_refs` → `extract_blob_refs`, adds `classify_blob_data(data: bytes)` and wires it into `traverse_blobs` to separate JSON message blobs from binary chain nodes; `extract_text_from_bubble()` now coerces `bubble["text"]` to `str` and guarantees a string return.
Hypothesis fuzz test suite `tests/test_blob_parsing_fuzz.py`, `tests/test_cli_chat_reader.py`, `tests/test_models.py`	Adds comprehensive Hypothesis strategies, helpers, and tests that build temporary `store.db` artifacts and assert Bubble.from_dict, workspace assembly, extract_blob_refs/classify_blob_data/traverse_blobs, and extract_text_from_bubble never raise unexpected exceptions and are idempotent; updates existing tests to import the public `extract_blob_refs`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

cppalliance/cppa-cursor-browser#23: Modifies utils/cli_chat_reader.py including traverse_blobs-related logic; code-level related.

Suggested reviewers

timon0305
wpak-ai

Poem

🐰 In bytes and blobs I nimbly play,

I chase the edges where errors may stray.
I test each bubble, chain, and line,
Repeat and assert till outputs align.
🥕 A rabbit's hop for safer parsing time!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding Hypothesis property tests for blob/bubble parsing, directly matching the primary objective of the PR.
Linked Issues check	✅ Passed	The PR successfully implements all acceptance criteria from issue `#71`: hypothesis added to dev dependencies, 4+ property tests covering Bubble.from_dict/blob parsing/text extraction, `@given` decorators with Hypothesis strategies used throughout, parsing verified to never raise unhandled exceptions (only SchemaError), and tests should complete within time bounds.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with issue `#71` objectives: test infrastructure (fuzz tests, Hypothesis dependency), refactoring of shared blob classification logic (classify_blob_data), and bug fixes for robustness (extract_text_from_bubble string coercion). No unrelated changes detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/fuzz-property-based-tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_blob_parsing_fuzz.py`:
- Around line 158-160: In _assemble_workspace_bubble the branch that checks if
not isinstance(bubble_id, str) calls Bubble.from_dict(parsed,
bubble_id=bubble_id) which will raise SchemaError via require_non_empty_str, so
the subsequent return None is dead code; remove the unreachable "return None"
after Bubble.from_dict in _assemble_workspace_bubble so control flow relies on
the existing SchemaError handling and the outer except SchemaError that already
returns None.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e10ebaad-910d-447a-a14b-601a0a2b0f21

📥 Commits

Reviewing files that changed from the base of the PR and between adb75a4 and a4ab1af.

📒 Files selected for processing (7)

.github/workflows/tests.yml
.gitignore
README.md
pyproject.toml
tests/test_blob_parsing_fuzz.py
utils/cli_chat_reader.py
utils/text_extract.py

…e_id isn’t a str.

bradjin8 · 2026-05-27T14:46:03Z

@coderabbitai review this PR.

coderabbitai · 2026-05-27T14:48:53Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== tests/test_blob_parsing_fuzz.py ==="
cat tests/test_blob_parsing_fuzz.py

echo ""
echo "=== utils/cli_chat_reader.py ==="
cat utils/cli_chat_reader.py

echo ""
echo "=== utils/text_extract.py ==="
cat utils/text_extract.py

echo ""
echo "=== pyproject.toml ==="
cat pyproject.toml

echo ""
echo "=== .github/workflows/tests.yml ==="
cat .github/workflows/tests.yml

echo ""
echo "=== README.md ==="
cat README.md

echo ""
echo "=== .gitignore (last 20 lines) ==="
tail -20 .gitignore

echo ""
echo "=== models/conversation.py (Bubble class) ==="
cat models/conversation.py 2>/dev/null || fd -e py -g '*conversation*' --exec cat {}

echo ""
echo "=== services/workspace_tabs.py ==="
cat services/workspace_tabs.py 2>/dev/null || fd -e py -g '*workspace_tabs*' --exec cat {}

echo ""
echo "=== Directory structure ==="
tree -L 3 --gitignore 2>/dev/null || find . -maxdepth 3 -not -path './.git/*' | sort

Length of output: 94416

@bradjin8 Here is the full review of PR #81.

PR Review: Add Hypothesis property tests for blob/bubble parsing (`#71`)

Overview

This is a well-structured PR. The classify_blob_data() refactor is a clean improvement, the extract_text_from_bubble str-coercion fix is correct, and the test design (no-raise + idempotency + adversarial assertions) follows Hypothesis best practices. A few items worth addressing before merge:

`tests/test_blob_parsing_fuzz.py`

🔴 `test_traverse_blobs_never_raises` may never reach the BFS code path

The meta fixture passes only two fields:

meta = {"latestRootBlobId": root_id, "createdAt": 1_700_000_000_000}

Inside traverse_blobs, the meta is validated through CliSessionMeta.from_dict(...), which is caught as SchemaError and causes an early return []. If CliSessionMeta requires additional mandatory fields (e.g. agentId, mode, name), the BFS traversal and classify_blob_data logic will never actually be exercised by this test. You'd want to verify against models/cli_session.py and either include all required fields or patch the schema validator:

# Option A – complete the fixture
meta = {
    "latestRootBlobId": root_id,
    "createdAt": 1_700_000_000_000,
    "agentId": "test-agent",
    "mode": "agent",
    "name": "test",
}

# Option B – bypass schema validation so BFS is always reachable
from unittest.mock import patch
with patch("utils.cli_chat_reader.CliSessionMeta.from_dict", return_value=mock_meta):
    messages = traverse_blobs(db_path)

🟡 `test_traverse_blobs_never_raises` can trigger `HealthCheck.too_slow`

deadline=None only disables the per-example time budget; it does not suppress HealthCheck.too_slow. With 40 examples × SQLite DB creation per example, slow CI runners (especially Windows) can trip this. Add the suppression explicitly:

`@settings`(
    max_examples=40,
    deadline=None,
    suppress_health_check=[HealthCheck.too_slow],
)

You'll also need to add the import:

from hypothesis import HealthCheck, given, settings

🟡 `_assemble_workspace_bubble` reimplements rather than wraps production logic

The test helper comments say "Mirror workspace_tabs KV bubble load" but it is an independent reimplementation. If _loads_kv_value_logged or the unsupported-type branch in workspace_tabs.py changes, this test would not catch the regression. Consider either wrapping the actual function under test or adding a comment acknowledging the intentional decoupling:

# NOTE: This intentionally re-implements the KV-to-Bubble conversion rather than
# importing _loads_kv_value_logged, which has side effects (logging, payload hashing).
# Keep in sync with workspace_tabs.assemble_workspace_tabs if the KV load logic changes.

🟡 `@st.composite` return type annotation

The annotated return type -> dict on a @st.composite strategy is technically incorrect — it returns SearchStrategy[dict], not a bare dict. Type checkers may warn. Fix:

# Before
`@st.composite`
def _cli_message(draw) -> dict:

# After
from hypothesis.strategies import SearchStrategy

`@st.composite`
def _cli_message(draw) -> SearchStrategy[dict]:

Or simply omit the return annotation since Hypothesis's type stubs handle inference.

🟢 Minor: importing private symbol `_extract_blob_refs`

from utils.cli_chat_reader import (
    ...
    _extract_blob_refs,
)

Testing a private function directly is valid here since classify_blob_data delegates to it, but a brief comment acknowledging this is testing an internal helper (and that renaming it would require updating the import) avoids future confusion.

`utils/cli_chat_reader.py`

🟢 `classify_blob_data` — clean extraction

The new function correctly handles all three failure modes (UnicodeDecodeError, json.JSONDecodeError, TypeError). The isinstance(data, bytes) guard in traverse_blobs' load loop before calling classify_blob_data keeps the type contract honest.

🟢 `_extract_blob_refs` boundary condition

while i + 33 < len(data) is equivalent to i + 34 <= len(data), which is the exact minimum needed before accessing data[i+2:i+34]. No off-by-one.

`utils/text_extract.py`

🟢 `extract_text_from_bubble` fix is correct

if bubble.get("text") and str(bubble["text"]).strip():
    text = str(bubble["text"])

The str() coercion handles schema-drifted payloads where text is an int or bool, and the final guard return text if isinstance(text, str) else "" is now a redundant-but-harmless safety net. Looks good.

CI / tooling

🟢 `tests.yml` hypothesis install is consistent with `pyproject.toml`

Both use hypothesis>=6.100,<7. The mypy job is unaffected because ignore_missing_imports = true covers the uninstalled package.

🟡 Hypothesis database committed vs. gitignored

.hypothesis/ is correctly gitignored for most projects. Just worth knowing: committing it would speed up CI re-runs by replaying previously-found failures. The current choice (ignore it) is the standard default and is fine.

Summary

Severity	Item
🔴	`test_traverse_blobs_never_raises`: meta fixture may always trigger `SchemaError`, never reaching BFS
🟡	Missing `suppress_health_check=[HealthCheck.too_slow]` for SQLite-creating test
🟡	`_assemble_workspace_bubble` reimplements production logic without a sync note
🟡	`@st.composite _cli_message` return annotation should be `SearchStrategy[dict]`
🟢	`classify_blob_data` refactor — clean, correct
🟢	`extract_text_from_bubble` str-coercion fix — correct
🟢	Bounded strategies + `deadline=None` — appropriate for CI

bradjin8 added 2 commits May 27, 2026 09:59

initial implementation

665b70b

improved for issues

a4ab1af

bradjin8 self-assigned this May 27, 2026

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

Comment thread tests/test_blob_parsing_fuzz.py Outdated

fix: Remove dead return None in _assemble_workspace_bubble when bubbl…

8d0f297

…e_id isn’t a str.

fix: resolve findings from coderabbitai's review report

24a144b

bradjin8 requested a review from timon0305 May 27, 2026 15:07

timon0305 approved these changes May 27, 2026

View reviewed changes

bradjin8 added 5 commits May 27, 2026 12:22

fix: _assemble_workspace_bubble doesn't truly mirror production.

b2352c8

fix: classify_blob_data widens the except list unnecessarily

49734fe

fix: _BUBBLE_LIKE doesn't directly cover the bug text_extract.py fixes

6290213

fix: test_traverse_blobs_never_raises skips the meta-parse error path

0289d3e

fix: comment slightly misleading

ca33259

bradjin8 requested a review from wpak-ai May 27, 2026 16:59

wpak-ai approved these changes May 27, 2026

View reviewed changes

wpak-ai merged commit d77d6df into master May 27, 2026
16 checks passed

wpak-ai deleted the feat/fuzz-property-based-tests branch May 27, 2026 17:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Hypothesis property tests for blob/bubble parsing (#71)#81

Add Hypothesis property tests for blob/bubble parsing (#71)#81
wpak-ai merged 9 commits into
masterfrom
feat/fuzz-property-based-tests

bradjin8 commented May 27, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 27, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

bradjin8 commented May 27, 2026

Uh oh!

coderabbitai Bot commented May 27, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

bradjin8 commented May 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bradjin8 commented May 27, 2026

Uh oh!

coderabbitai Bot commented May 27, 2026

PR Review: Add Hypothesis property tests for blob/bubble parsing (#71)

Overview

tests/test_blob_parsing_fuzz.py

🔴 test_traverse_blobs_never_raises may never reach the BFS code path

🟡 test_traverse_blobs_never_raises can trigger HealthCheck.too_slow

🟡 _assemble_workspace_bubble reimplements rather than wraps production logic

🟡 @st.composite return type annotation

🟢 Minor: importing private symbol _extract_blob_refs

utils/cli_chat_reader.py

🟢 classify_blob_data — clean extraction

🟢 _extract_blob_refs boundary condition

utils/text_extract.py

🟢 extract_text_from_bubble fix is correct

CI / tooling

🟢 tests.yml hypothesis install is consistent with pyproject.toml

🟡 Hypothesis database committed vs. gitignored

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bradjin8 commented May 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 27, 2026 •

edited

Loading

PR Review: Add Hypothesis property tests for blob/bubble parsing (`#71`)

`tests/test_blob_parsing_fuzz.py`

🔴 `test_traverse_blobs_never_raises` may never reach the BFS code path

🟡 `test_traverse_blobs_never_raises` can trigger `HealthCheck.too_slow`

🟡 `_assemble_workspace_bubble` reimplements rather than wraps production logic

🟡 `@st.composite` return type annotation

🟢 Minor: importing private symbol `_extract_blob_refs`

`utils/cli_chat_reader.py`

🟢 `classify_blob_data` — clean extraction

🟢 `_extract_blob_refs` boundary condition

`utils/text_extract.py`

🟢 `extract_text_from_bubble` fix is correct

🟢 `tests.yml` hypothesis install is consistent with `pyproject.toml`