Add OME-Arrow demonstration notebook#210
Conversation
📝 WalkthroughWalkthroughThe PR adds a new documentation example demonstrating OME-Arrow and CytoDataFrame interoperability, updates the examples index, and upgrades pre-commit hook versions alongside configuration adjustments to support the new example with broader Ruff ignore patterns. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
docs/src/examples/omearrow_and_cytodataframe.py (2)
53-58: Avoid fixed output filenames in docs execution.Writing
example.ome.parquetto the current directory can leave artifacts and conflict across repeated/parallel runs. Prefer a temporary directory.♻️ Suggested temporary output usage
+from pathlib import Path +from tempfile import TemporaryDirectory + # write a parquet table -pq.write_table(table, "example.ome.parquet") - -# read the parquet table with cytodataframe -# (showing the OME-Arrow image that was written) -CytoDataFrame("example.ome.parquet") +with TemporaryDirectory() as tmp_dir: + parquet_path = Path(tmp_dir) / "example.ome.parquet" + pq.write_table(table, parquet_path) + + # read the parquet table with cytodataframe + # (showing the OME-Arrow image that was written) + CytoDataFrame(str(parquet_path))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/examples/omearrow_and_cytodataframe.py` around lines 53 - 58, The snippet writes a fixed filename via pq.write_table("example.ome.parquet") which can leave artifacts and clash across runs; change it to write into a temporary directory or temp file and pass that path to CytoDataFrame instead. Use Python's tempfile (e.g., TemporaryDirectory or NamedTemporaryFile) to create a unique path, join the temp path with the parquet filename, call pq.write_table(table, temp_path) and then construct CytoDataFrame(temp_path); ensure the temp resource remains available for the read operation.
26-29: Make the sample-image path execution-context safe.The current relative path only works from specific working directories. A small fallback lookup makes the example much more robust.
♻️ Suggested path handling
+from pathlib import Path + # load a tiff using OME Arrow -oa_img = OMEArrow( - "../../../tests/data/cytotable/JUMP_plate_BR00117006/images/orig/r01c01f01p01-ch2sk1fk1fl1.tiff" -) +candidate_paths = [ + Path("../../../tests/data/cytotable/JUMP_plate_BR00117006/images/orig/r01c01f01p01-ch2sk1fk1fl1.tiff"), + Path("tests/data/cytotable/JUMP_plate_BR00117006/images/orig/r01c01f01p01-ch2sk1fk1fl1.tiff"), +] +tiff_path = next((p for p in candidate_paths if p.exists()), None) +if tiff_path is None: + raise FileNotFoundError("Could not locate demo TIFF file in expected repo paths.") +oa_img = OMEArrow(str(tiff_path))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/examples/omearrow_and_cytodataframe.py` around lines 26 - 29, The example uses a hard-coded relative TIFF path which fails outside specific working directories; update the OMEArrow instantiation to resolve the sample image relative to the example file location (use __file__ or Path(__file__).resolve().parent) and fall back to a small search (e.g., glob for the filename pattern) if the file is not found, then pass that resolved path into OMEArrow so OMEArrow(...) always receives an absolute, execution-context-safe path.pyproject.toml (1)
113-113: ScopeE501ignore more narrowly.Applying
E501to alldocs/src/examples/*.pymay mask accidental long lines in future examples. Consider keeping the glob forE402and restrictingE501to files that specifically need it.♻️ Suggested config adjustment
-lint.per-file-ignores."docs/src/examples/*.py" = [ "E402", "E501" ] +lint.per-file-ignores."docs/src/examples/*.py" = [ "E402" ] +lint.per-file-ignores."docs/src/examples/omearrow_and_cytodataframe.py" = [ "E501" ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` at line 113, The per-file ignore rule lint.per-file-ignores."docs/src/examples/*.py" currently silences both E402 and E501 for every example file; narrow E501 so long-line checks still run by keeping the glob entry for E402 only and moving E501 to specific files or tighter globs (e.g., add separate lint.per-file-ignores entries for the individual example filenames that truly need E501 or a more specific pattern), updating the lint.per-file-ignores mapping accordingly so only E402 is applied to "docs/src/examples/*.py" and E501 is applied only where necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@docs/src/examples/omearrow_and_cytodataframe.py`:
- Around line 53-58: The snippet writes a fixed filename via
pq.write_table("example.ome.parquet") which can leave artifacts and clash across
runs; change it to write into a temporary directory or temp file and pass that
path to CytoDataFrame instead. Use Python's tempfile (e.g., TemporaryDirectory
or NamedTemporaryFile) to create a unique path, join the temp path with the
parquet filename, call pq.write_table(table, temp_path) and then construct
CytoDataFrame(temp_path); ensure the temp resource remains available for the
read operation.
- Around line 26-29: The example uses a hard-coded relative TIFF path which
fails outside specific working directories; update the OMEArrow instantiation to
resolve the sample image relative to the example file location (use __file__ or
Path(__file__).resolve().parent) and fall back to a small search (e.g., glob for
the filename pattern) if the file is not found, then pass that resolved path
into OMEArrow so OMEArrow(...) always receives an absolute,
execution-context-safe path.
In `@pyproject.toml`:
- Line 113: The per-file ignore rule
lint.per-file-ignores."docs/src/examples/*.py" currently silences both E402 and
E501 for every example file; narrow E501 so long-line checks still run by
keeping the glob entry for E402 only and moving E501 to specific files or
tighter globs (e.g., add separate lint.per-file-ignores entries for the
individual example filenames that truly need E501 or a more specific pattern),
updating the lint.per-file-ignores mapping accordingly so only E402 is applied
to "docs/src/examples/*.py" and E501 is applied only where necessary.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b07eb67b-0dc8-4ddc-a2ba-2b4e7c05e60c
📒 Files selected for processing (5)
.pre-commit-config.yamldocs/src/examples.mddocs/src/examples/omearrow_and_cytodataframe.ipynbdocs/src/examples/omearrow_and_cytodataframe.pypyproject.toml
gwaybio
left a comment
There was a problem hiding this comment.
might be beyond scope of this PR, but I also wonder if the benchmark notebook belongs somewhere more prominently? (i can be convinced either way - it may actually make sense to only highlight those results in a future paper and therefore bundle that analysis notebook in the same location of other paper
|
Thanks @gwaybio !
The benchmarks for OME-Arrow currently reside in another repo: https://github.com/WayScience/ome-arrow-benchmarks , where it felt healthy to focus on only benchmarking work (because it's so different from the package-specific focus). Within OME-Arrow we do include some benchmarking in the form of a smoke-test to check that we don't regress on performance as changes proceed. Regarding prominence / ease-of-understanding, I do think a paper would eventually help here. The modular / component-based repo design (as opposed to monorepo) has some flaws when it comes to how easy it is to find things. I feel it ends up making the work more sustainable over time though - because we don't know which parts will survive and in what ways they might need to change. |
Description
This PR adds a brief demonstration notebook which shows how OME-Arrow interacts with CytoDataFrame.
What kind of change(s) are included?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.
Summary by CodeRabbit
Documentation
Chores