Add OME-Arrow demonstration notebook by d33bs · Pull Request #210 · cytomining/CytoDataFrame

d33bs · 2026-04-15T20:25:40Z

Description

This PR adds a brief demonstration notebook which shows how OME-Arrow interacts with CytoDataFrame.

What kind of change(s) are included?

Documentation (changes docs or other related content)
Bug fix (fixes an issue).
Enhancement (adds functionality).
Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

I have read and followed the CONTRIBUTING.md guidelines.
I have searched for existing content to ensure this is not a duplicate.
I have performed a self-review of these additions (including spelling, grammar, and related).
These changes pass all pre-commit checks.
I have added comments to my code to help provide understanding
I have added a test which covers the code changes found within this PR
I have deleted all non-relevant text in this pull request template.

Summary by CodeRabbit

Documentation
- Added a new example demonstrating how to use OME-Arrow and CytoDataFrame together, including image cropping and data export to Parquet format.
Chores
- Updated development tool versions and configurations.

coderabbitai · 2026-04-15T20:25:48Z

📝 Walkthrough

Walkthrough

The PR adds a new documentation example demonstrating OME-Arrow and CytoDataFrame interoperability, updates the examples index, and upgrades pre-commit hook versions alongside configuration adjustments to support the new example with broader Ruff ignore patterns.

Changes

Cohort / File(s)	Summary
Pre-commit Hook Updates `.pre-commit-config.yaml`	Bumped `pyproject-fmt` from `v2.21.0` to `v2.21.1` and `ruff-pre-commit` from `v0.15.8` to `v0.15.10`.
Documentation Example `docs/src/examples.md`, `docs/src/examples/omearrow_and_cytodataframe.py`	Added new example page showcasing OME-Arrow and CytoDataFrame interoperability: loading OME-TIFF, cropping images, creating PyArrow tables, and persisting to Parquet. Linked in examples index.
Tooling Configuration `pyproject.toml`	Expanded Ruff per-file ignores to apply `E501` and `E402` rules across all `docs/src/examples/\\*.py` files. Reorganized tool configuration sections without changing values.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

A rabbit hopped through code with glee,
New examples for all eyes to see,
OME-Arrow danced with CytoDataFrame,
Version bumps aligned their aim,
Configuration tweaks complete the game! 🐰✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add OME-Arrow demonstration notebook' directly and clearly describes the main change: adding a new example notebook demonstrating OME-Arrow functionality with CytoDataFrame.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (3)

docs/src/examples/omearrow_and_cytodataframe.py (2)

53-58: Avoid fixed output filenames in docs execution.

Writing example.ome.parquet to the current directory can leave artifacts and conflict across repeated/parallel runs. Prefer a temporary directory.

♻️ Suggested temporary output usage

+from pathlib import Path
+from tempfile import TemporaryDirectory
+
 # write a parquet table
-pq.write_table(table, "example.ome.parquet")
-
-# read the parquet table with cytodataframe
-# (showing the OME-Arrow image that was written)
-CytoDataFrame("example.ome.parquet")
+with TemporaryDirectory() as tmp_dir:
+    parquet_path = Path(tmp_dir) / "example.ome.parquet"
+    pq.write_table(table, parquet_path)
+
+    # read the parquet table with cytodataframe
+    # (showing the OME-Arrow image that was written)
+    CytoDataFrame(str(parquet_path))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/src/examples/omearrow_and_cytodataframe.py` around lines 53 - 58, The
snippet writes a fixed filename via pq.write_table("example.ome.parquet") which
can leave artifacts and clash across runs; change it to write into a temporary
directory or temp file and pass that path to CytoDataFrame instead. Use Python's
tempfile (e.g., TemporaryDirectory or NamedTemporaryFile) to create a unique
path, join the temp path with the parquet filename, call pq.write_table(table,
temp_path) and then construct CytoDataFrame(temp_path); ensure the temp resource
remains available for the read operation.

26-29: Make the sample-image path execution-context safe.

The current relative path only works from specific working directories. A small fallback lookup makes the example much more robust.

♻️ Suggested path handling

+from pathlib import Path
+
 # load a tiff using OME Arrow
-oa_img = OMEArrow(
-    "../../../tests/data/cytotable/JUMP_plate_BR00117006/images/orig/r01c01f01p01-ch2sk1fk1fl1.tiff"
-)
+candidate_paths = [
+    Path("../../../tests/data/cytotable/JUMP_plate_BR00117006/images/orig/r01c01f01p01-ch2sk1fk1fl1.tiff"),
+    Path("tests/data/cytotable/JUMP_plate_BR00117006/images/orig/r01c01f01p01-ch2sk1fk1fl1.tiff"),
+]
+tiff_path = next((p for p in candidate_paths if p.exists()), None)
+if tiff_path is None:
+    raise FileNotFoundError("Could not locate demo TIFF file in expected repo paths.")
+oa_img = OMEArrow(str(tiff_path))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/src/examples/omearrow_and_cytodataframe.py` around lines 26 - 29, The
example uses a hard-coded relative TIFF path which fails outside specific
working directories; update the OMEArrow instantiation to resolve the sample
image relative to the example file location (use __file__ or
Path(__file__).resolve().parent) and fall back to a small search (e.g., glob for
the filename pattern) if the file is not found, then pass that resolved path
into OMEArrow so OMEArrow(...) always receives an absolute,
execution-context-safe path.

pyproject.toml (1)

113-113: Scope E501 ignore more narrowly.

Applying E501 to all docs/src/examples/*.py may mask accidental long lines in future examples. Consider keeping the glob for E402 and restricting E501 to files that specifically need it.

♻️ Suggested config adjustment

-lint.per-file-ignores."docs/src/examples/*.py" = [ "E402", "E501" ]
+lint.per-file-ignores."docs/src/examples/*.py" = [ "E402" ]
+lint.per-file-ignores."docs/src/examples/omearrow_and_cytodataframe.py" = [ "E501" ]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 113, The per-file ignore rule
lint.per-file-ignores."docs/src/examples/*.py" currently silences both E402 and
E501 for every example file; narrow E501 so long-line checks still run by
keeping the glob entry for E402 only and moving E501 to specific files or
tighter globs (e.g., add separate lint.per-file-ignores entries for the
individual example filenames that truly need E501 or a more specific pattern),
updating the lint.per-file-ignores mapping accordingly so only E402 is applied
to "docs/src/examples/*.py" and E501 is applied only where necessary.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/src/examples/omearrow_and_cytodataframe.py`:
- Around line 53-58: The snippet writes a fixed filename via
pq.write_table("example.ome.parquet") which can leave artifacts and clash across
runs; change it to write into a temporary directory or temp file and pass that
path to CytoDataFrame instead. Use Python's tempfile (e.g., TemporaryDirectory
or NamedTemporaryFile) to create a unique path, join the temp path with the
parquet filename, call pq.write_table(table, temp_path) and then construct
CytoDataFrame(temp_path); ensure the temp resource remains available for the
read operation.
- Around line 26-29: The example uses a hard-coded relative TIFF path which
fails outside specific working directories; update the OMEArrow instantiation to
resolve the sample image relative to the example file location (use __file__ or
Path(__file__).resolve().parent) and fall back to a small search (e.g., glob for
the filename pattern) if the file is not found, then pass that resolved path
into OMEArrow so OMEArrow(...) always receives an absolute,
execution-context-safe path.

In `@pyproject.toml`:
- Line 113: The per-file ignore rule
lint.per-file-ignores."docs/src/examples/*.py" currently silences both E402 and
E501 for every example file; narrow E501 so long-line checks still run by
keeping the glob entry for E402 only and moving E501 to specific files or
tighter globs (e.g., add separate lint.per-file-ignores entries for the
individual example filenames that truly need E501 or a more specific pattern),
updating the lint.per-file-ignores mapping accordingly so only E402 is applied
to "docs/src/examples/*.py" and E501 is applied only where necessary.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b07eb67b-0dc8-4ddc-a2ba-2b4e7c05e60c

📥 Commits

Reviewing files that changed from the base of the PR and between 00c5943 and 0e84970.

📒 Files selected for processing (5)

.pre-commit-config.yaml
docs/src/examples.md
docs/src/examples/omearrow_and_cytodataframe.ipynb
docs/src/examples/omearrow_and_cytodataframe.py
pyproject.toml

gwaybio

might be beyond scope of this PR, but I also wonder if the benchmark notebook belongs somewhere more prominently? (i can be convinced either way - it may actually make sense to only highlight those results in a future paper and therefore bundle that analysis notebook in the same location of other paper

d33bs · 2026-04-16T15:43:24Z

Thanks @gwaybio !

I also wonder if the benchmark notebook belongs somewhere more prominently?

The benchmarks for OME-Arrow currently reside in another repo: https://github.com/WayScience/ome-arrow-benchmarks , where it felt healthy to focus on only benchmarking work (because it's so different from the package-specific focus). Within OME-Arrow we do include some benchmarking in the form of a smoke-test to check that we don't regress on performance as changes proceed.

Regarding prominence / ease-of-understanding, I do think a paper would eventually help here. The modular / component-based repo design (as opposed to monorepo) has some flaws when it comes to how easy it is to find things. I feel it ends up making the work more sustainable over time though - because we don't know which parts will survive and in what ways they might need to change.

d33bs added 5 commits February 16, 2026 21:25

show simple demo of ome-arrow with cytodataframe

c86e2b3

remove colab widgets for rendering

dfa9fc2

Merge remote-tracking branch 'upstream/main' into ome-arrow-demo

6803f15

Update omearrow_and_cytodataframe.ipynb

aec6380

linting and include with docs

0e84970

d33bs marked this pull request as ready for review April 15, 2026 20:28

d33bs requested a review from jenna-tomkinson as a code owner April 15, 2026 20:28

coderabbitai bot reviewed Apr 15, 2026

View reviewed changes

gwaybio approved these changes Apr 15, 2026

View reviewed changes

d33bs merged commit 91cad90 into cytomining:main Apr 16, 2026
9 checks passed

d33bs deleted the ome-arrow-demo branch April 16, 2026 15:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add OME-Arrow demonstration notebook#210

Add OME-Arrow demonstration notebook#210
d33bs merged 5 commits intocytomining:mainfrom
d33bs:ome-arrow-demo

d33bs commented Apr 15, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 15, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

gwaybio left a comment

Uh oh!

d33bs commented Apr 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

d33bs commented Apr 15, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

What kind of change(s) are included?

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

gwaybio left a comment

Choose a reason for hiding this comment

Uh oh!

d33bs commented Apr 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

d33bs commented Apr 15, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 15, 2026 •

edited

Loading