
Commit 4a71309

committed
Added test coverage to missing zip file regex, and bumped version number
1 parent 92ff9cb commit 4a71309

File tree

5 files changed: +114, -3 lines


CHANGELOG.md

Lines changed: 8 additions & 0 deletions

@@ -4,6 +4,14 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.7.0] - 2026-02-26
+### Modified
+- The following flow functions can now take zip files, not just folders.
+  This speeds up data loading especially on networked filesystems.
+  - `rd.flow.load_csv`
+  - `rd.flow.load_csv_with_metadata`
+  - `rd.flow.load_groups_with_metadata`
+
 ## [0.6.1] - 2026-02-13
 ### Modified
 - Collected new modules for export. Now `qpcr` and `ddpcr` modules can be used.

README.md

Lines changed: 6 additions & 2 deletions

@@ -116,9 +116,13 @@ Following the steps described above, the full process for publishing a release i
 ## Changelog
 See the [CHANGELOG](CHANGELOG.md) for detailed changes.
 ```
-## [0.6.1] - 2026-02-13
+## [0.7.0] - 2026-02-26
 ### Modified
-- Collected new modules for export. Now `qpcr` and `ddpcr` modules can be used.
+- The following flow functions can now take zip files, not just folders.
+  This speeds up data loading especially on networked filesystems.
+  - `rd.flow.load_csv`
+  - `rd.flow.load_csv_with_metadata`
+  - `rd.flow.load_groups_with_metadata`
 ````
 
 ## License
Lines changed: 97 additions & 0 deletions (new file)

==================
Loading flow data
==================

``rushd`` provides several convenient ways to load flow data.
We recommend that you put all metadata into a YAML file and load it based
on auto-generated well-ID information, but you can also load metadata
directly from CSV filenames.

Metadata in YAML
----------------
You can load extra metadata using a YAML file that defines
which wells had which conditions / treatments / cell lines / etc.

For example, this YAML file specifies circuit syntax and dox treatment:

.. code-block:: yaml

    metadata:
      syntax:
        - tandem: A1-A12
        - convergent: B1-B12
        - divergent: C1-C12
      dox_ng:
        - 0: A1-C6
        - 1000: A7-C12

You are required to place all metadata conditions inside a top-level
key called ``metadata``. Within this, you can define arbitrary mappings
that map onto ranges of wells.
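To make the range notation concrete, here is a minimal sketch of how a rectangular well range like ``A1-C6`` could expand into individual well IDs. The helper name and implementation are our own illustration (not rushd's internal code), and it assumes single-letter rows and rectangular ranges:

```python
import itertools
import string

def expand_well_range(well_range: str) -> list[str]:
    """Illustrative expansion of a rectangular range like 'A1-C6' into well IDs."""
    start, end = well_range.split("-")
    rows = string.ascii_uppercase
    row_span = rows[rows.index(start[0]) : rows.index(end[0]) + 1]
    col_span = range(int(start[1:]), int(end[1:]) + 1)
    return [f"{r}{c}" for r, c in itertools.product(row_span, col_span)]

# "A1-C6" spans rows A-C and columns 1-6, i.e. 18 wells
wells = expand_well_range("A1-C6")
```

Every well in the expanded range then receives the mapped value (for example, ``dox_ng = 0``) as a metadata column.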
To load a plate of flow data, you can use ``rushd`` by specifying
a path to this YAML file and a path to the folder containing the ``.csv`` files:

.. code-block:: python

    df = rd.flow.load_csv_with_metadata(rd.datadir/"exp01"/"metadata.yaml", rd.datadir/"exp01"/"csv_export")

Alternatively, you can specify a ``.zip`` file of the data. Let's assume you zipped all of the CSVs into
one file, called ``csvs.zip``. Then, you can load this as:

.. code-block:: python

    df = rd.flow.load_csv_with_metadata(rd.datadir/"exp01"/"metadata.yaml", rd.datadir/"exp01"/"csvs.zip")
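If your export tool did not produce an archive directly, you can bundle the CSVs yourself with Python's standard ``zipfile`` module. A small sketch, built in memory for illustration, with filenames mirroring the ones used in this commit's tests:

```python
import io
import zipfile

# Build a flat archive of exported CSVs. In practice you would write
# csvs.zip next to your data instead of using an in-memory buffer.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("export_A1_singlets.csv", "channel1,channel2\n1,2")
    zf.writestr("export_A2_singlets.csv", "channel1,channel2\n10,20")

with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()  # CSVs sit at the top level of the archive
```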
Finally, you can zip the metadata and CSVs together. Let's say you have the following zip file:

.. code-block:: text

    exp01.zip/
    ├── metadata.yaml
    └── export/
        ├── export_A1_singlets.csv
        ├── export_A2_singlets.csv
        ├── ...
        └── export_G12_singlets.csv

You can load this by specifying the path to the metadata file and the CSVs each as a tuple of (zip path, path within the zip):

.. code-block:: python

    df = rd.flow.load_csv_with_metadata((rd.datadir/"exp01.zip", "metadata.yaml"), (rd.datadir/"exp01.zip", "export"))
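The combined layout above can likewise be assembled with ``zipfile``, placing the metadata at the archive root and the CSVs under ``export/``, as exercised by this commit's ``test_data_metadata_zip`` test (built in memory here for illustration):

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    # Metadata at the root of the archive, CSVs under export/
    zf.writestr("metadata.yaml", "metadata:\n  dox_ng:\n    - 0: A1-C6\n    - 1000: A7-C12\n")
    zf.writestr("export/export_A1_singlets.csv", "channel1,channel2\n1,2")
    zf.writestr("export/export_G12_singlets.csv", "channel1,channel2\n10,20")

with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
```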
Check out the documentation for this function for more things you can do, like specifying only certain
columns to be loaded:

.. autofunction:: rushd.flow.load_csv_with_metadata
   :noindex:

In addition, you can use any of these data-loading techniques with the multi-plate loading function:

.. autofunction:: rushd.flow.load_groups_with_metadata
   :noindex:

Metadata in filenames
---------------------
If you ran a tube experiment or otherwise have metadata specified in filenames, you can use a function that just
loads CSVs and extracts the metadata from the filenames.

Let's say that we have some files that have metadata in their filenames, like:

- ``export_BFP_100_singlets.csv``
- ``export_GFP_1000_singlets.csv``

where we want to extract the construct and the dox concentration. Developing regexes is beyond the scope
of this documentation; use https://regex101.com to evaluate the regex. In this case, a regex that works is
``^.*export_(?P<construct>.+)_(?P<dox>[0-9]+)_(?P<population>.+)\.csv``:

.. code-block:: python

    regex = r"^.*export_(?P<construct>.+)_(?P<dox>[0-9]+)_(?P<population>.+)\.csv"
    df = rd.flow.load_csv(rd.datadir/"exp02", regex)
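Before handing the regex to ``rd.flow.load_csv``, you can sanity-check its named groups against an example filename with Python's standard ``re`` module. Each named group becomes a metadata column when loading, and files that fail to match (like the invalid file added to this commit's tests) are skipped:

```python
import re

regex = r"^.*export_(?P<construct>.+)_(?P<dox>[0-9]+)_(?P<population>.+)\.csv"

# The named groups capture the construct, dox concentration, and population
match = re.match(regex, "export_BFP_100_singlets.csv")
fields = match.groupdict()
# → {'construct': 'BFP', 'dox': '100', 'population': 'singlets'}

# A filename without the export_ pattern does not match at all
no_match = re.match(regex, "some_invalid_file.csv")
# → None
```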
You can see more details of this function below:

.. autofunction:: rushd.flow.load_csv
   :noindex:

setup.py

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@
 
 setuptools.setup(
     name="rushd",
-    version="0.6.1",
+    version="0.7.0",
     author="Christopher Johnstone, Kasey Love, Conrad Oakes",
     author_email="[email protected]",
     description="Package for maintaining robust, reproducible data management.",

tests/test_flow.py

Lines changed: 2 additions & 0 deletions

@@ -634,6 +634,7 @@ def test_csv_valid_custom_regex_zip_file(tmp_path: Path):
     with zipfile.ZipFile(tmp_path / "test.zip", "w") as zip:
         zip.writestr("export_BFP_100_singlets.csv", "channel1,channel2\n1,2")
         zip.writestr("export_GFP_1000_singlets.csv", "channel1,channel2\n10,20")
+        zip.writestr("some_invalid_file.csv", "channel1,channel2\n100,200")
 
     with zipfile.ZipFile(tmp_path / "test_subdir.zip", "w") as zip:
         zip.writestr("export/export_BFP_100_singlets.csv", "channel1,channel2\n1,2")

@@ -744,6 +745,7 @@ def test_data_metadata_zip(tmp_path: Path):
         )
         zip.writestr("export/export_A1_singlets.csv", "channel1,channel2\n1,2")
         zip.writestr("export/export_G12_singlets.csv", "channel1,channel2\n10,20")
+        zip.writestr("export/some_invalid_file.csv", "channel1,channel2\n100,200")
 
     df = flow.load_csv_with_metadata(
         (tmp_path / "data.zip", "export"), (tmp_path / "data.zip", "metadata.yaml")
