
[EXPERIMENTAL]: Integrate cp-measure #982

Open

timtreis wants to merge 50 commits into main from feature/add_cpmeasure



@timtreis
Member

@timtreis timtreis commented Mar 28, 2025

@timtreis timtreis marked this pull request as draft March 28, 2025 16:52
@timtreis timtreis added labels enhancement ✨ (New feature or request), image 🔬, squidpy2.0 (Everything related to a Squidpy 2.0 release), sdata compat 🌌, and release-added Mar 28, 2025
@timtreis
Member Author

timtreis commented May 16, 2025

Note to self:

  • Doesn't correctly parse str names of channels; the log refers to the channels as '0' and '1' instead of by name:
    `INFO Calculating 'cpmeasure' correlation features between channels '0' and '1'.`

  • For permutations, should show the total number of iterations. More generally, the progress bar should include a "step n of m" readout so one can tell how far along the featurisation is, since a run can easily take more than a day depending on the number of cells and cpu_cores. It should perhaps also show the total runtime so far for the steps that are done.

  • Fails if labels and image don't have the same dimensions, even when a transformation aligns them.
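A minimal sketch of the requested "step n of m" readout with elapsed time, using only the Python standard library. `format_progress`, `with_progress` and the logger name are hypothetical illustrations, not squidpy API; the message format simply mirrors the `Tile {n}/{total} done (elapsed ...)` line mentioned in a later commit.

```python
import logging
import time
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")
logger = logging.getLogger("featurize_progress")  # hypothetical logger name


def format_progress(step: int, total: int, elapsed_s: float) -> str:
    """Render a 'step n of m' readout plus the wall-clock time spent so far."""
    return f"Tile {step}/{total} done (elapsed {elapsed_s:.1f}s)"


def with_progress(items: Iterable[T], total: int, every: int = 1) -> Iterator[T]:
    """Pass items through unchanged, logging progress every `every` steps and at the end.

    Logging (rather than only drawing a progress bar) keeps the readout
    visible in non-interactive runs such as CI or batch-scheduler logs.
    """
    start = time.monotonic()
    for step, item in enumerate(items, start=1):
        if step % every == 0 or step == total:
            logger.info(format_progress(step, total, time.monotonic() - start))
        yield item
```

Wrapping the result iterator, rather than the work itself, keeps the reporting independent of how the tiles are scheduled.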

@LucaMarconato
Member

Looking forward to this PR 👀🥸

timtreis and others added 12 commits January 27, 2026 15:52
Introduces _tiling.py with build_tile_specs() and extract_tile() that
split a label image into overlapping tiles where each cell is assigned
to exactly one tile by centroid. Non-owned cells are zeroed out so
downstream processing never double-counts.

Includes 31 tests: deterministic brick-pattern grid (touching and
non-touching), coverage verification, and visual regression tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
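The centroid-ownership rule can be sketched with NumPy and SciPy as below. This is an illustration under simplifying assumptions (an in-memory 2D label array, square tiles of `tile_size` with a fixed `overlap` margin); `assign_owner_tiles` and `extract_owned_tile` are hypothetical names, not the signatures of `build_tile_specs()` or `extract_tile()`.

```python
import numpy as np
from scipy import ndimage


def assign_owner_tiles(labels: np.ndarray, tile_size: int) -> dict[int, tuple[int, int]]:
    """Map each cell id to the (row, col) of the tile that contains its centroid."""
    ids = np.unique(labels)
    ids = ids[ids != 0]  # 0 is background
    # Unweighted centroid of each label region.
    centroids = ndimage.center_of_mass(labels > 0, labels, ids)
    return {
        int(cid): (int(cy // tile_size), int(cx // tile_size))
        for cid, (cy, cx) in zip(ids, centroids)
    }


def extract_owned_tile(
    labels: np.ndarray,
    tile: tuple[int, int],
    tile_size: int,
    overlap: int,
    owners: dict[int, tuple[int, int]],
) -> np.ndarray:
    """Crop a tile plus `overlap` pixels of margin and zero out cells it does not own.

    Owned cells are only complete if `overlap` covers how far any cell can
    extend beyond the tile's core region.
    """
    row, col = tile
    h, w = labels.shape
    y0, x0 = max(row * tile_size - overlap, 0), max(col * tile_size - overlap, 0)
    y1 = min((row + 1) * tile_size + overlap, h)
    x1 = min((col + 1) * tile_size + overlap, w)
    crop = labels[y0:y1, x0:x1]
    owned = [cid for cid, owner in owners.items() if owner == tile]
    # Vectorized zeroing of non-owned ids, no per-id loop.
    return np.where(np.isin(crop, owned), crop, 0)
```

Because every id maps to exactly one tile, summing per-tile results can never double-count a cell, even though neighbouring crops overlap.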
@timtreis
Member Author

timtreis commented Apr 8, 2026

Refactoring in anticipation of afermg/cp_measure#38 being merged, so that this behaviour can be upstreamed.

timtreis and others added 3 commits May 14, 2026 16:30
Wires the in-progress cp_measure.featurizer + lazy-tiling refactor onto a
working _tiling.py and closes out the six open notes on the PR.

_tiling.py:
* build_tile_specs now takes (shape, cell_info), so it is agnostic to
  whether labels are in memory, dask-backed, or multiscale.
* compute_cell_info is public; new compute_cell_info_multiscale (read
  coarsest scale, rescale to target) and compute_cell_info_tiled
  (stream tiles, merge boundary-spanning cells via additive accumulators).
* extract_tile_lazy slices an xr.DataArray and materializes only the crop;
  extract_tile retained for in-memory callers.
* verify_coverage takes a label_ids set.

_feature.py:
* Channel names: read via spatialdata.models.get_channel_names so c_coords
  set at parse time flow through to output column suffixes.
* Progress: tqdm wrapper around joblib.Parallel(return_as='generator_unordered')
  + periodic logg.info('Tile {n}/{total} done (elapsed ...)') so non-TTY
  runs (CI, slurm) also see progress.
* Alignment: _align_to_image_grid replaces the dim-mismatch raise with a
  coordinate-system aware crop. Identity-or-integer-pixel-translation is
  honored as a 1-to-1 pixel alignment; the overlap rectangle is processed
  and out-of-extent cells are counted, not crashed on. Non-pixel-aligned
  transforms either raise with a spatialdata.rasterize hint
  (align_mode='strict', default) or trigger materialization via
  spatialdata.rasterize (align_mode='rasterize') with a warning.
* DropReport: per-run counter for cells dropped due to extent, partial
  boundary intersection, cp_measure no-data, or empty tiles. Emitted via
  logg.info(report.summary()) at the end of every run.

Tests: 39 in test_tiling.py (was 30; new coverage for the lazy/multiscale
helpers + verify_coverage edge cases), 35 in test_calculate_image_features
including a TestPR982Concerns class with one regression test per open note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
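The "identity or integer-pixel translation" test and the overlap crop described under *Alignment* could look roughly like this. It is a sketch assuming the labels-to-image transform is already available as a 3x3 homogeneous affine matrix in (y, x) order; `pixel_offset_if_aligned` and `overlap_windows` are hypothetical helpers, not the PR's `_align_to_image_grid`.

```python
import numpy as np


def pixel_offset_if_aligned(affine, atol: float = 1e-6):
    """Return the integer (dy, dx) shift if `affine` is identity or an integer-pixel
    translation, else None (a caller would then raise or rasterize)."""
    affine = np.asarray(affine, dtype=float)
    if affine.shape != (3, 3):
        raise ValueError("expected a 3x3 homogeneous affine matrix in (y, x) order")
    linear, shift = affine[:2, :2], affine[:2, 2]
    if not np.allclose(linear, np.eye(2), atol=atol) or not np.allclose(affine[2], [0.0, 0.0, 1.0], atol=atol):
        return None  # scaling, rotation, shear or a malformed last row
    rounded = np.round(shift)
    if not np.allclose(shift, rounded, atol=atol):
        return None  # sub-pixel translation
    return int(rounded[0]), int(rounded[1])


def overlap_windows(labels_shape, image_shape, offset):
    """Slices (labels_slices, image_slices) covering the overlap rectangle, where
    labels pixel (y, x) lands on image pixel (y + dy, x + dx); None if disjoint."""
    dy, dx = offset
    ly0, lx0 = max(0, -dy), max(0, -dx)
    ly1 = min(labels_shape[0], image_shape[0] - dy)
    lx1 = min(labels_shape[1], image_shape[1] - dx)
    if ly1 <= ly0 or lx1 <= lx0:
        return None
    labels_sl = (slice(ly0, ly1), slice(lx0, lx1))
    image_sl = (slice(ly0 + dy, ly1 + dy), slice(lx0 + dx, lx1 + dx))
    return labels_sl, image_sl
```

Cells whose pixels fall partly or wholly outside `labels_sl` are the ones that would be counted as dropped rather than causing a crash.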
* compute_cell_info_tiled: replace per-id np.where with scipy.ndimage.find_objects
  and np.bincount sums. One vectorized pass per tile instead of O(n_cells) scans.
* _zero_non_owned: replace per-id rewrite loop with np.isin + np.where.
* _classify_dropped_cells: drop the full-array .values + per-cell np.where; use
  compute_cell_info_tiled bboxes for inside/partial/outside classification, so
  the full label array is no longer materialized.
* CellInfo: add bbox_y0/bbox_x0 fields so callers can do bbox math without
  reconstructing from the centroid (which is area-weighted, not bbox-centered).
* _relabel_contiguous: replaced by skimage.segmentation.relabel_sequential.
* _align_to_image_grid: flatten nested if/else with elif chain; extract
  _rasterize_to_image_grid so the shapes-key path and the align_mode='rasterize'
  path no longer duplicate the rasterize call.
* DropReport: empty_tile_drop -> empty_tiles (the counter increments per tile).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
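A sketch of the `find_objects`/`bincount` approach, combined with the additive merge that `compute_cell_info_tiled` is described as using for boundary-spanning cells: per-chunk areas and coordinate sums simply add up, and bounding boxes merge by min/max. The function name, the return layout and the non-overlapping in-memory chunking are assumptions for illustration, not the PR's `CellInfo` interface.

```python
import numpy as np
from scipy import ndimage


def cell_stats_tiled(labels: np.ndarray, chunk: int):
    """Stream non-overlapping chunks and merge per-cell statistics additively.

    Returns (ids, area, centroids[y, x], bboxes[y0, x0, y1, x1]) with
    half-open bounding boxes in global pixel coordinates.
    """
    h, w = labels.shape
    max_label = int(labels.max())
    n = max_label + 1
    area = np.zeros(n, dtype=np.int64)
    sum_y = np.zeros(n)
    sum_x = np.zeros(n)
    lo = np.full((n, 2), np.iinfo(np.int64).max, dtype=np.int64)
    hi = np.full((n, 2), -1, dtype=np.int64)

    for ty in range(0, h, chunk):
        for tx in range(0, w, chunk):
            tile = np.asarray(labels[ty:ty + chunk, tx:tx + chunk], dtype=np.intp)
            flat = tile.ravel()
            yy, xx = np.indices(tile.shape)
            # One bincount per statistic instead of one np.where scan per cell.
            area += np.bincount(flat, minlength=n)
            sum_y += np.bincount(flat, weights=(yy + ty).ravel(), minlength=n)
            sum_x += np.bincount(flat, weights=(xx + tx).ravel(), minlength=n)
            # find_objects yields one (slice_y, slice_x) per label 1..max_label, None if absent.
            for cid, sl in enumerate(ndimage.find_objects(tile, max_label=max_label), start=1):
                if sl is not None:
                    lo[cid] = np.minimum(lo[cid], (sl[0].start + ty, sl[1].start + tx))
                    hi[cid] = np.maximum(hi[cid], (sl[0].stop + ty, sl[1].stop + tx))

    ids = np.flatnonzero(area[1:]) + 1
    centroids = np.column_stack([sum_y[ids], sum_x[ids]]) / area[ids, None]
    bboxes = np.column_stack([lo[ids], hi[ids]])
    return ids, area[ids], centroids, bboxes
```

Since sums and min/max are associative, the chunk size changes only memory use, not the result; the centroid here is area-weighted, which is why a separate bbox origin is worth storing.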
timtreis added a commit that referenced this pull request May 26, 2026
Five Sphinx warnings (treated as errors by ``-W``) on the docs build:

- ``:mod:`spatialdata_plot``` and ``:meth:`spatialdata.SpatialData.pl.show```
  have no intersphinx targets.
- ``:class:`dask.distributed.Client``` has no intersphinx target either.
- ``:func:`~squidpy.experimental.im.calculate_image_features``` pointed at
  a function that does not exist on this branch (planned for PR #982).

Downgrade all five to plain double-backtick literals.  Re-phrase the
calculate_image_features reference to describe the shared tiling
infrastructure (``squidpy.experimental.im._tiling``) without claiming
a public function that has not yet shipped.
timtreis and others added 5 commits May 26, 2026 23:09
* _featurize_tile: accept a pre-built cp_config and drop the per-tile
  _build_cp_config rebuild. Config is now constructed once in
  calculate_image_features and reused across every tile (matters on
  100kx100k images with thousands of tiles).
* pyproject: cp-measure>=0.1.4 -> >=0.1.19 to pick up the granularity
  correctness fix (#44, #47), 3D-only feature filtering (#35), and
  static typing (#45). No upper cap left in place; bump when upstream
  ships a breaking release.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* pyproject: cp-measure>=0.1.19,<0.2 -- pre-1.0 dep, cap upper bound so a
  future 0.2.x release doesn't silently break installs.
* _featurize_tile: cp_config is now keyword-only with a default of None
  and falls back to _build_cp_config when not supplied. Preserves the
  pre-hoist call signature for direct (test/notebook) callers while the
  caller-built reuse path in calculate_image_features still skips the
  per-tile rebuild.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Resolves add/add conflict on src/squidpy/experimental/im/_tiling.py
between our cp_measure-driven lazy tiling refactor and PR #1157's tiling
QC additions. Unified into one module:

* Kept our superset definitions of CellInfo (with bbox_y0/bbox_x0
  defaults), TileSpec, build_tile_specs((shape, cell_info, ...)),
  compute_cell_info, compute_cell_info_multiscale,
  compute_cell_info_tiled, extract_tile, extract_tile_lazy,
  verify_coverage, and the array-returning _zero_non_owned.
* Added extract_labels_tile_lazy(labels_da, spec) -- the labels-only
  crop variant from main, needed by tl/_tiling_qc.py. Implemented on
  top of our _zero_non_owned return style.
* __all__ now exports the new symbol.

Auto-merge restored main's new files (tl/_tiling_qc.py,
pl/_tiling_qc.py, conftest.py, tests/_images/TilingQCVisual_*.png,
test_tiling_qc.py); our earlier deletion of the old tl/_tiling_qc.py
no longer applies -- the new QC implementation supersedes it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Aligns with main's tl/_tiling_qc.py tests that already call
``compute_cell_info_tiled(labels_da, chunk_size=...)`` and with the
numpy/dask convention. Internal body and our own test in
tests/experimental/test_tiling.py updated accordingly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Convention in the scverse ecosystem is to address channels by name only.
Passing an int now raises TypeError; passing a non-existent name still
raises ValueError as before. A channel whose name happens to be the
string "0" is still accepted -- the check discriminates on Python type
(isinstance(ch, str)), not on the string contents.

* _prepare_lazy: type hint list[str] | list[int] | None -> list[str] | None
  and add an isinstance(ch, str) guard before lookup.
* calculate_image_features: same type hint update.
* Docstring clarified that integer indices are not accepted.
* test_channel_selection_by_index renamed to test_channel_selection_rejects_int
  and now asserts TypeError.
* test_concern4_channel_subset_by_index renamed to ..._by_name and passes
  ["c0", "c2"] instead of [0, 2].

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
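A sketch of the name-only channel lookup, discriminating on Python type rather than on string contents. `resolve_channels` is a hypothetical helper; in practice the available names would come from the image's channel coordinates (which the PR reads via `spatialdata.models.get_channel_names`).

```python
from collections.abc import Sequence
from typing import Optional


def resolve_channels(requested: Optional[Sequence[str]], available: Sequence[str]) -> list[int]:
    """Translate channel names into positional indices; ints are rejected outright."""
    if requested is None:
        return list(range(len(available)))  # no selection means all channels
    index = {name: i for i, name in enumerate(available)}
    resolved = []
    for ch in requested:
        # Type check, not content check: the string "0" is a valid name, the int 0 is not.
        if not isinstance(ch, str):
            raise TypeError(f"Channels must be given by name (str), got {type(ch).__name__}: {ch!r}")
        if ch not in index:
            raise ValueError(f"Channel {ch!r} not found; available: {list(available)}")
        resolved.append(index[ch])
    return resolved
```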
timtreis pushed a commit to timtreis/squidpy that referenced this pull request May 27, 2026
- Trim DropReport to `empty_tiles` (the only field this code ever
  increments)
- Narrow `align_mode` Literal to "strict" and add a runtime guard that
  catches dynamic callers passing other values
- Exclude `cpmeasure:*` flag names from the "available" list in the
  unknown-feature error (they always raise NotImplementedError; listing
  them as available is misleading)
- Raise on ambiguous mixes of `skimage:label` with `skimage:label:<prop>`
  (and same for `skimage:label+image`); the previous order-dependent
  behaviour silently expanded the narrowing form
- Collapse `pd.concat -> replace([inf,-inf],0) -> fillna(0) -> .values.astype(float32)`
  into a single numpy pass via `np.nan_to_num`; saves two full-table copies
- Use `pd.Categorical.from_codes` for the region column to avoid
  allocating an N-element Python list for a one-level categorical
- Hoist the labels_key/shapes_key XOR pick to one local
- Add `experimental.im.calculate_image_features` to docs/api.md

Also: remove all references to the multi-PR split (follow-up PRs,
PR-2, PR-4, "in this PR", "cp_measure-as-default behaviour from PR
scverse#982", TestPR982Concerns, "Concern N" markers) from code and tests.
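A sketch of the two table-assembly changes: stacking per-tile feature blocks and replacing NaN/±inf with 0 in one `np.nan_to_num` pass, and building a single-level region column from integer codes with `pd.Categorical.from_codes`. The helper names and the float32 target are assumptions drawn from the commit message, not the actual code. One subtle difference: casting to float32 before sanitizing means values too large for float32 also end up as 0 instead of inf.

```python
import numpy as np
import pandas as pd


def stack_features(blocks) -> np.ndarray:
    """Concatenate per-tile feature blocks row-wise and zero out NaN/±inf in place."""
    stacked = np.concatenate([np.asarray(b, dtype=np.float32) for b in blocks], axis=0)
    # copy=False sanitizes the freshly concatenated array in place (no extra table copy).
    return np.nan_to_num(stacked, copy=False, nan=0.0, posinf=0.0, neginf=0.0)


def region_column(n_obs: int, region: str) -> pd.Categorical:
    """One-level categorical built from int8 codes instead of an n_obs-long list of strings."""
    return pd.Categorical.from_codes(np.zeros(n_obs, dtype=np.int8), categories=[region])
```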
timtreis pushed a commit to timtreis/squidpy that referenced this pull request May 27, 2026