
Commit 4a71309

committed
Added test coverage to missing zip file regex, and bumped version number
1 parent 92ff9cb commit 4a71309

File tree

5 files changed: +114, -3 lines


CHANGELOG.md

Lines changed: 8 additions & 0 deletions

@@ -4,6 +4,14 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.7.0] - 2026-02-26
+### Modified
+- The following flow functions can now take zip files, not just folders.
+  This speeds up data loading especially on networked filesystems.
+  - `rd.flow.load_csv`
+  - `rd.flow.load_csv_with_metadata`
+  - `rd.flow.load_groups_with_metadata`
+
 ## [0.6.1] - 2026-02-13
 ### Modified
 - Collected new modules for export. Now `qpcr` and `ddpcr` modules can be used.

README.md

Lines changed: 6 additions & 2 deletions

@@ -116,9 +116,13 @@ Following the steps described above, the full process for publishing a release i
 ## Changelog
 See the [CHANGELOG](CHANGELOG.md) for detailed changes.
 ```
-## [0.6.1] - 2026-02-13
+## [0.7.0] - 2026-02-26
 ### Modified
-- Collected new modules for export. Now `qpcr` and `ddpcr` modules can be used.
+- The following flow functions can now take zip files, not just folders.
+  This speeds up data loading especially on networked filesystems.
+  - `rd.flow.load_csv`
+  - `rd.flow.load_csv_with_metadata`
+  - `rd.flow.load_groups_with_metadata`
 ````
 
 ## License
Lines changed: 97 additions & 0 deletions (new file)

==================
Loading flow data
==================

``rushd`` provides several convenient ways to load flow data.
We recommend that you put all metadata into a YAML file and load it based
on auto-generated well-ID information, but you can also load metadata
directly from CSV filenames.

Metadata in YAML
----------------
You can load extra metadata using a YAML file that defines
which wells had which conditions / treatments / cell lines / etc.

For example, this YAML file specifies circuit syntax and dox treatment:

.. code-block:: yaml

    metadata:
      syntax:
        - tandem: A1-A12
        - convergent: B1-B12
        - divergent: C1-C12
      dox_ng:
        - 0: A1-C6
        - 1000: A7-C12

You are required to place all metadata conditions inside a top-level
key called ``metadata``. Within this, you can define arbitrary mappings
that map onto ranges of wells.
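To make the range notation concrete, here is a minimal sketch of how a rectangular well range like ``A1-C6`` could expand into individual well IDs. The helper name and implementation are our own illustration (not rushd's internal code), and it assumes single-letter rows and rectangular ranges:

```python
import itertools
import string

def expand_well_range(well_range: str) -> list[str]:
    """Illustrative expansion of a rectangular range like 'A1-C6' into well IDs."""
    start, end = well_range.split("-")
    rows = string.ascii_uppercase
    row_span = rows[rows.index(start[0]) : rows.index(end[0]) + 1]
    col_span = range(int(start[1:]), int(end[1:]) + 1)
    return [f"{r}{c}" for r, c in itertools.product(row_span, col_span)]

# "A1-C6" spans rows A-C and columns 1-6, i.e. 18 wells
wells = expand_well_range("A1-C6")
```

Every well in the expanded range then receives the mapped value (for example, ``dox_ng = 0``) as a metadata column.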
To load a plate of flow data, you can use ``rushd`` by specifying
a path to this YAML file and a path to the folder containing the ``.csv`` files:

.. code-block:: python

    df = rd.flow.load_csv_with_metadata(rd.datadir/"exp01"/"metadata.yaml", rd.datadir/"exp01"/"csv_export")

Alternatively, you can specify a ``.zip`` file of the data. Let's assume you zipped all of the CSVs into
one file, called ``csvs.zip``. Then, you can load this as:

.. code-block:: python

    df = rd.flow.load_csv_with_metadata(rd.datadir/"exp01"/"metadata.yaml", rd.datadir/"exp01"/"csvs.zip")
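If your export tool did not produce an archive directly, you can bundle the CSVs yourself with Python's standard ``zipfile`` module. A small sketch, built in memory for illustration, with filenames mirroring the ones used in this commit's tests:

```python
import io
import zipfile

# Build a flat archive of exported CSVs. In practice you would write
# csvs.zip next to your data instead of using an in-memory buffer.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("export_A1_singlets.csv", "channel1,channel2\n1,2")
    zf.writestr("export_A2_singlets.csv", "channel1,channel2\n10,20")

with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()  # CSVs sit at the top level of the archive
```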
Finally, you can zip the metadata and CSVs together. Let's say you have the following zip file:

.. code-block:: text

    exp01.zip/
    ├── metadata.yaml
    └── export/
        ├── export_A1_singlets.csv
        ├── export_A2_singlets.csv
        ├── ...
        └── export_G12_singlets.csv

You can load this by specifying the path to the metadata file and the CSVs each as a tuple of (zip path, path within the zip):

.. code-block:: python

    df = rd.flow.load_csv_with_metadata((rd.datadir/"exp01.zip", "metadata.yaml"), (rd.datadir/"exp01.zip", "export"))
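The combined layout above can likewise be assembled with ``zipfile``, placing the metadata at the archive root and the CSVs under ``export/``, as exercised by this commit's ``test_data_metadata_zip`` test (built in memory here for illustration):

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    # Metadata at the root of the archive, CSVs under export/
    zf.writestr("metadata.yaml", "metadata:\n  dox_ng:\n    - 0: A1-C6\n    - 1000: A7-C12\n")
    zf.writestr("export/export_A1_singlets.csv", "channel1,channel2\n1,2")
    zf.writestr("export/export_G12_singlets.csv", "channel1,channel2\n10,20")

with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
```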
Check out the documentation for this function for more things you can do, like specifying only certain
columns to be loaded:

.. autofunction:: rushd.flow.load_csv_with_metadata
   :noindex:

In addition, you can use any of these data-loading techniques with the multi-plate loading function:

.. autofunction:: rushd.flow.load_groups_with_metadata
   :noindex:

Metadata in filenames
---------------------
If you ran a tube experiment or otherwise have metadata specified in filenames, you can use a function that just
loads CSVs and extracts the metadata from the filenames.

Let's say that we have some files that have metadata in their filenames, like:

- ``export_BFP_100_singlets.csv``
- ``export_GFP_1000_singlets.csv``

where we want to extract the construct and the dox concentration. Developing regexes is beyond the scope
of this documentation; use https://regex101.com to evaluate the regex. In this case, a regex that works is
``^.*export_(?P<construct>.+)_(?P<dox>[0-9]+)_(?P<population>.+)\.csv``:

.. code-block:: python

    regex = r"^.*export_(?P<construct>.+)_(?P<dox>[0-9]+)_(?P<population>.+)\.csv"
    df = rd.flow.load_csv(rd.datadir/"exp02", regex)
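Before handing the regex to ``rd.flow.load_csv``, you can sanity-check its named groups against an example filename with Python's standard ``re`` module. Each named group becomes a metadata column when loading, and files that fail to match (like the invalid file added to this commit's tests) are skipped:

```python
import re

regex = r"^.*export_(?P<construct>.+)_(?P<dox>[0-9]+)_(?P<population>.+)\.csv"

# The named groups capture the construct, dox concentration, and population
match = re.match(regex, "export_BFP_100_singlets.csv")
fields = match.groupdict()
# → {'construct': 'BFP', 'dox': '100', 'population': 'singlets'}

# A filename without the export_ pattern does not match at all
no_match = re.match(regex, "some_invalid_file.csv")
# → None
```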
You can see more details of this function below:

.. autofunction:: rushd.flow.load_csv
   :noindex:

setup.py

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@
 
 setuptools.setup(
     name="rushd",
-    version="0.6.1",
+    version="0.7.0",
     author="Christopher Johnstone, Kasey Love, Conrad Oakes",
     author_email="[email protected]",
     description="Package for maintaining robust, reproducible data management.",

tests/test_flow.py

Lines changed: 2 additions & 0 deletions

@@ -634,6 +634,7 @@ def test_csv_valid_custom_regex_zip_file(tmp_path: Path):
     with zipfile.ZipFile(tmp_path / "test.zip", "w") as zip:
         zip.writestr("export_BFP_100_singlets.csv", "channel1,channel2\n1,2")
         zip.writestr("export_GFP_1000_singlets.csv", "channel1,channel2\n10,20")
+        zip.writestr("some_invalid_file.csv", "channel1,channel2\n100,200")
 
     with zipfile.ZipFile(tmp_path / "test_subdir.zip", "w") as zip:
         zip.writestr("export/export_BFP_100_singlets.csv", "channel1,channel2\n1,2")

@@ -744,6 +745,7 @@ def test_data_metadata_zip(tmp_path: Path):
         )
         zip.writestr("export/export_A1_singlets.csv", "channel1,channel2\n1,2")
         zip.writestr("export/export_G12_singlets.csv", "channel1,channel2\n10,20")
+        zip.writestr("export/some_invalid_file.csv", "channel1,channel2\n100,200")
 
     df = flow.load_csv_with_metadata(
         (tmp_path / "data.zip", "export"), (tmp_path / "data.zip", "metadata.yaml")
