pepkit
diff --git a/_typos.toml b/_typos.toml
Lines changed: 3 additions & 1 deletion
diff --git a/docs/geofetch/changelog.md b/docs/geofetch/changelog.md
Lines changed: 6 additions & 0 deletions
diff --git a/docs/looper/advanced-guide/advanced-run-options.md b/docs/looper/advanced-guide/advanced-run-options.md
Lines changed: 19 additions & 9 deletions
diff --git a/docs/looper/changelog.md b/docs/looper/changelog.md
Lines changed: 7 additions & 0 deletions
diff --git a/docs/looper/code/python-api.md b/docs/looper/code/python-api.md
Lines changed: 92 additions & 0 deletions
diff --git a/docs/looper/developer-tutorial/developer-pipestat.md b/docs/looper/developer-tutorial/developer-pipestat.md
Lines changed: 45 additions & 9 deletions
diff --git a/docs/looper/developer-tutorial/pipeline-interface-specification.md b/docs/looper/developer-tutorial/pipeline-interface-specification.md
Lines changed: 38 additions & 0 deletions
diff --git a/docs/looper/developer-tutorial/pre-submission-hooks.md b/docs/looper/developer-tutorial/pre-submission-hooks.md
Lines changed: 10 additions & 2 deletions
@@ -4,4 +4,6 @@ extend-exclude = ["*.ipynb", "*.svg"]
 [default.extend-words]
 opf = "opf"
 PN="PN"
-Sur="Sur"
+Sur="Sur"
+certifi = "certifi"
+Tru = "Tru"
@@ -1,5 +1,11 @@
 # Changelog
 
+
+## [0.12.9] -- 2025-12-01
+- Improved error handling
+- Fixed incorrect return for processed series metadata
+
+
 ## [0.12.8] -- 2025-07-08
 - Updated docs
 - Fixed parsing nested items. [[143](https://github.com/pepkit/geofetch/issues/143)]
 
@@ -18,19 +18,23 @@ Let's introduce some of the more advanced capabilities of `looper run`.
 
 ## Grouping many jobs into one
 
-By default, `looper` will translate each row in your `sample_table` into a single job. But perhaps you are running a project with tens of thousands of rows, and each job only takes mere minutes to run; in this case, you'd rather just submit a single job to process many samples. `Looper` makes this easy with the `--lump` and `--lumpn` command line arguments.
+By default, `looper` will translate each row in your `sample_table` into a single job. But perhaps you are running a project with tens of thousands of rows, and each job only takes mere minutes to run; in this case, you'd rather just submit a single job to process many samples. `Looper` makes this easy with the `--lump` and `--lump-n` command line arguments.
 
-### Lumping jobs by job count: `--lumpn`
+### Lumping jobs by job count: `--lump-n`
 
-It's quite simple: if you want to run 100 samples in a single job submission script, just tell looper `--lumpn 100`.
+It's quite simple: if you want to run 100 samples in a single job submission script, just tell looper `--lump-n 100`.
 
 ### Lumping jobs by input file size: `--lump`
 
-But what if your samples are quite different in terms of input file size? For example, your project may include many small samples, which you'd like to lump together with 10 jobs to 1, but you also have a few control samples that are very large and should have their own dedicated job. If you just use `--lumpn` with 10 samples per job, you could end up lumping your control samples together, which would be terrible. To alleviate this problem, `looper` provides the `--lump` argument, which uses input file size to group samples together. By default, you specify an argument in number of gigabytes. Looper will go through your samples and accumulate them until the total input file size reaches your limit, at which point it finalizes and submits the job. This will keep larger files in independent runs and smaller files grouped together.
+But what if your samples are quite different in terms of input file size? For example, your project may include many small samples, which you'd like to lump together with 10 jobs to 1, but you also have a few control samples that are very large and should have their own dedicated job. If you just use `--lump-n` with 10 samples per job, you could end up lumping your control samples together, which would be terrible. To alleviate this problem, `looper` provides the `--lump` argument, which uses input file size to group samples together. By default, you specify an argument in number of gigabytes. Looper will go through your samples and accumulate them until the total input file size reaches your limit, at which point it finalizes and submits the job. This will keep larger files in independent runs and smaller files grouped together.
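The accumulation rule described above can be sketched in a few lines of Python. This is only an illustration of the prose, not looper's actual implementation; the function name `lump_by_size` and the sample sizes are hypothetical.

```python
def lump_by_size(samples, limit_gb):
    """Group (name, size_gb) pairs; close a group once its total reaches limit_gb."""
    jobs, current, total = [], [], 0.0
    for name, size_gb in samples:
        current.append(name)
        total += size_gb
        if total >= limit_gb:
            # Limit reached: finalize this job and start a new one.
            jobs.append(current)
            current, total = [], 0.0
    if current:
        # Leftover samples that never reached the limit still get a job.
        jobs.append(current)
    return jobs


# Hypothetical input sizes in gigabytes.
samples = [
    ("ctrl1", 12.0),
    ("s1", 1.0),
    ("s2", 1.5),
    ("s3", 2.0),
    ("ctrl2", 15.0),
    ("s4", 0.5),
]
jobs = lump_by_size(samples, limit_gb=4.0)
print(jobs)  # [['ctrl1'], ['s1', 's2', 's3'], ['ctrl2'], ['s4']]
```

Under this literal reading, a large sample that follows a partly filled group would join that group rather than start its own; how looper treats that case is not stated here.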
 
-### Lumping samples into number of jobs: `--lumpj`
+### Lumping samples into a number of jobs: `--lump-j`
 
-Or you can lump samples into number of jobs.
+If you want to split your samples across a specific number of jobs, use `--lump-j`. For example, `--lump-j 10` will distribute all your samples evenly across 10 jobs.
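One plausible reading of "evenly" is sketched below: samples are cut into contiguous groups whose sizes differ by at most one. The helper `lump_into_jobs` is hypothetical, and looper's real chunking may differ.

```python
def lump_into_jobs(samples, n_jobs):
    """Split samples into at most n_jobs contiguous groups of near-equal size."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    if not samples:
        return []
    n_jobs = min(n_jobs, len(samples))  # never create empty jobs
    base, extra = divmod(len(samples), n_jobs)
    jobs, start = [], 0
    for i in range(n_jobs):
        # The first `extra` jobs take one additional sample.
        size = base + (1 if i < extra else 0)
        jobs.append(samples[start:start + size])
        start += size
    return jobs


samples = [f"sample{i}" for i in range(1, 26)]  # 25 hypothetical samples
jobs = lump_into_jobs(samples, n_jobs=10)
print([len(job) for job in jobs])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```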
 
 
 ## Running project-level pipelines
@@ -251,22 +255,28 @@ For example, to choose only samples where the `species` attribute is `human`, `m
 
 ```console
 looper run \
-  --sel-attr species
+  --sel-attr species \
   --sel-incl human mouse fly
 ```
 
-Similarly, to submit only one sample, with `sample_name` as `sample`, you could use:
+Similarly, to submit only one sample, with `sample_name` as `sample1`, you could use:
 
 ```console
 looper run \
-  --sel-attr sample_name
+  --sel-attr sample_name \
   --sel-incl sample1
 ```
 
 ### Sample selection by exclusion
 
-If more convenient to *exclude* samples by filter, you can use the analogous arguments `--sel-attr` with `--sel-excl`.
-This will 
+If it's more convenient to *exclude* samples by filter, you can use the analogous arguments `--sel-attr` with `--sel-excl`.
+This will exclude any samples matching the specified values. For example, to run all samples *except* those where `species` is `rat`:
+
+```console
+looper run \
+  --sel-attr species \
+  --sel-excl rat
+```
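The inclusion and exclusion semantics can be modelled as a simple filter over sample records. The sketch below is an assumption-laden illustration: `select_samples` and the sample dictionaries are hypothetical and do not reproduce looper's internal selection code.

```python
def select_samples(samples, attr, incl=None, excl=None):
    """Filter sample dicts by one attribute, in the spirit of --sel-attr."""
    selected = []
    for sample in samples:
        value = sample.get(attr)
        if incl is not None and value not in incl:
            continue  # --sel-incl: keep only listed values
        if excl is not None and value in excl:
            continue  # --sel-excl: drop listed values
        selected.append(sample)
    return selected


samples = [
    {"sample_name": "sample1", "species": "human"},
    {"sample_name": "sample2", "species": "mouse"},
    {"sample_name": "sample3", "species": "rat"},
]
included = select_samples(samples, "species", incl={"human", "mouse", "fly"})
excluded = select_samples(samples, "species", excl={"rat"})
print([s["sample_name"] for s in included])  # ['sample1', 'sample2']
print([s["sample_name"] for s in excluded])  # ['sample1', 'sample2']
```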
 
 ### Toggling sample jobs through the sample table
 
 
@@ -2,6 +2,13 @@
 
 This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.
 
+## [Unreleased]
+
+### Added
+- Added `inject_env_vars` pipeline interface property for injecting environment variables into submission scripts
+- Added `pipestat_config_required` pipeline interface property to control pipestat handoff validation
+- Added validation that pipestat-enabled interfaces (with `output_schema`) pass config to the pipeline via CLI (`{pipestat.*}`) or environment variable (`PIPESTAT_CONFIG` in `inject_env_vars`)
+
 ## [2.0.3] -- 2025-09-23
 ### Fixed
 - Fixed [#543](https://github.com/pepkit/looper/issues/543)
 
@@ -0,0 +1,92 @@
+# Looper Python API
+
+The looper Python API provides classes for managing pipeline submissions and compute configurations.
+
+## Project
+
+The main class for working with looper projects. Extends peppy's Project with pipeline submission capabilities.
+
+::: looper.project.Project
+    options:
+      members:
+        - __init__
+        - amendments
+        - cli_pifaces
+        - config
+        - config_file
+        - get_sample_piface
+        - get_schemas
+        - list_amendments
+        - make_project_dirs
+        - output_dir
+        - pipeline_interfaces
+        - populate_pipeline_outputs
+        - results_folder
+        - sample_table
+        - sample_table_index
+        - samples
+        - selected_compute_package
+        - set_sample_piface
+        - submission_folder
+        - subsample_table
+      heading_level: 3
+
+## PipelineInterface
+
+Parses and holds information from a pipeline interface YAML file, including resource specifications and command templates.
+
+::: looper.pipeline_interface.PipelineInterface
+    options:
+      members:
+        - __init__
+        - choose_resource_package
+        - get_pipeline_schemas
+        - pipeline_name
+        - render_var_templates
+      heading_level: 3
+
+## SubmissionConductor
+
+Collects and submits pipeline jobs. Manages job pooling based on file size or command count limits.
+
+::: looper.conductor.SubmissionConductor
+    options:
+      members:
+        - __init__
+        - add_sample
+        - failed_samples
+        - is_project_submittable
+        - num_cmd_submissions
+        - num_job_submissions
+        - submit
+        - write_script
+      heading_level: 3
+
+## ComputingConfiguration
+
+Manages compute environment settings from divvy configuration files. Handles resource packages and submission templates.
+
+::: looper.divvy.ComputingConfiguration
+    options:
+      members:
+        - __init__
+        - activate_package
+        - clean_start
+        - default_config_file
+        - get_active_package
+        - get_adapters
+        - list_compute_packages
+        - reset_active_settings
+        - template
+        - templates_folder
+        - update_packages
+        - write_script
+      heading_level: 3
+
+## Utility Functions
+
+### select_divvy_config
+
+::: looper.divvy.select_divvy_config
+    options:
+      heading_level: 3
@@ -86,31 +86,67 @@ We added one new line, which runs `pipestat` and provides it with this informati
 ## Connect pipestat to looper
 
 Next, we need to update our pipeline interface so looper passes all the necessary information to the pipeline.
-Since the sample name (`-r $2`) and the pipestat config file (`-c $3`) weren't previously passed to the pipeline, we need to adjust the pipeline interface, to make sure the command template specifies all the inputs our pipeline needs:
+Since the sample name (`-r $2`) and the pipestat config file (`-c $3`) weren't previously passed to the pipeline, we need to adjust the pipeline interface so that the command template specifies all the inputs our pipeline needs.
 
-```yaml  title="pipeline/pipeline_interface.yaml" hl_lines="5"
+First, we need to tell looper we're dealing with a pipestat-compatible pipeline by adding the `output_schema` in the pipeline interface:
+
+```yaml  title="pipeline/pipeline_interface.yaml" hl_lines="2"
 pipeline_name: count_lines
+output_schema: pipestat_output_schema.yaml
 sample_interface:
   command_template: >
     pipeline/count_lines.sh {sample.file_path} {sample.sample_name} {pipestat.config_file}
 ```
 
-Now, looper will pass the sample_name and the pipestat config file as additional arguments to `count_lines.sh`.
-The `{sample.sample_name}` will just take the appropriate value from the sample table, just like we did previously with `{sample.file_path}`
-The `{pipestat.config_file}` is automatically provided by looper.
-Looper generates this config file based on the looper configuration and the pipeline interface.
-To read more about pipestat config files, see here: [pipestat configuration](../../pipestat/config.md).
+### Pipestat configuration handoff
 
-Next, we need to tell looper we're dealing with a pipestat-compatible pipeline. Specify this by adding the `output_schema` in the pipeline interface to the pipestat output schema we created earlier:
+When looper runs a pipestat-enabled pipeline, it creates a merged configuration file containing:
 
-```yaml  title="pipeline/pipeline_interface.yaml" hl_lines="2"
+- **Pipeline-author settings**: `output_schema`, `pipeline_name` (from pipeline interface)
+- **Pipeline-runner settings**: `results_file_path`, `output_dir`, `flag_file_dir` (from looper config)
+
+This merged config must be passed to the pipeline. Looper supports two mechanisms:
+
+#### Option 1: CLI argument (explicit)
+
+Pass `{pipestat.config_file}` in your command template:
+
+```yaml  title="pipeline/pipeline_interface.yaml"
 pipeline_name: count_lines
 output_schema: pipestat_output_schema.yaml
 sample_interface:
   command_template: >
     pipeline/count_lines.sh {sample.file_path} {sample.sample_name} {pipestat.config_file}
 ```
 
+The `{pipestat.config_file}` is automatically provided by looper. Looper generates this config file based on the looper configuration and the pipeline interface. To read more about pipestat config files, see here: [pipestat configuration](../../pipestat/config.md).
+
+#### Option 2: Environment variable injection (automatic)
+
+Use `inject_env_vars` to set `PIPESTAT_CONFIG`:
+
+```yaml  title="pipeline/pipeline_interface.yaml"
+pipeline_name: count_lines
+output_schema: pipestat_output_schema.yaml
+inject_env_vars:
+  PIPESTAT_CONFIG: "{pipestat.config_file}"
+sample_interface:
+  command_template: >
+    pipeline/count_lines.sh {sample.file_path} {sample.sample_name}
+```
+
+With this approach, your pipeline reads `PIPESTAT_CONFIG` from the environment. Pipestat checks this variable automatically, so no code changes are needed if your pipeline already calls `PipestatManager()` without an explicit config.
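For a pipeline that wants to support both handoff mechanisms, the lookup order might look like the sketch below. The helper `resolve_pipestat_config` is hypothetical; it shows only the precedence a pipeline script could apply and says nothing about pipestat's internals.

```python
import os


def resolve_pipestat_config(cli_value=None, environ=None):
    """Prefer an explicit CLI value; otherwise fall back to PIPESTAT_CONFIG."""
    environ = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    return environ.get("PIPESTAT_CONFIG")  # None if the variable is unset


# Hypothetical paths, passed via a fake environment for illustration.
fake_env = {"PIPESTAT_CONFIG": "results/pipestat_config.yaml"}
print(resolve_pipestat_config(environ=fake_env))
print(resolve_pipestat_config("explicit.yaml", environ=fake_env))
```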
+
+#### Validation
+
+Looper validates that pipestat-enabled interfaces (those with `output_schema`) use one of these mechanisms. If neither is found, looper raises an error with guidance on how to fix it.
+
+To disable this validation (if your pipeline handles config differently):
+
+```yaml
+pipestat_config_required: false
+```
+
 Finally, we need to configure where the pipestat results will be stored.
 Pipestat offers several ways to store results, including a simple file for a basic pipeline, or a relational database, or even PEPhub.
 We'll start with the simplest option and configure pipestat to use a results file.
 
@@ -26,6 +26,8 @@ A pipeline interface may contain the following keys:
 - `compute` (RECOMMENDED) - Settings for computing resources
 - `var_templates` (OPTIONAL) - A mapping of [Jinja2](https://jinja.palletsprojects.com/) templates and corresponding names, typically used to parameterize plugins
 - `pre_submit` (OPTIONAL) - A mapping that defines the pre-submission tasks to be executed.
+- `inject_env_vars` (OPTIONAL) - Environment variables to inject into submission scripts.
+- `pipestat_config_required` (OPTIONAL) - Set to `false` to disable pipestat config handoff validation.
 
 ## Example pipeline interface
 
@@ -317,6 +319,42 @@ var_templates:
 
 This section can consist of two subsections: `python_functions` and/or `command_templates`, which specify the pre-submission tasks to be run before the main pipeline command is submitted. Please refer to the [pre-submission hooks system](pre-submission-hooks.md) section for a detailed explanation of this feature and syntax.
 
+### inject_env_vars
+
+The `inject_env_vars` section allows you to inject environment variables into submission scripts. This is useful for passing configuration to pipelines without modifying the command template. Keys are environment variable names; values are Jinja2 templates that are rendered with the available namespaces.
+
+```yaml
+pipeline_name: my_pipeline
+output_schema: output_schema.yaml
+inject_env_vars:
+  PIPESTAT_CONFIG: "{pipestat.config_file}"
+  MY_CUSTOM_VAR: "{looper.output_dir}/config.yaml"
+sample_interface:
+  command_template: >
+    python pipeline.py
+```
+
+These variables are exported at the top of each submission script before the pipeline command runs. This works for both direct execution and cluster submission.
+
+This is particularly useful for pipestat-compatible pipelines, where you can pass the pipestat config via the `PIPESTAT_CONFIG` environment variable instead of a CLI argument.
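To make the rendering step concrete, here is a rough sketch of how `inject_env_vars` entries could turn into `export` lines. It substitutes a plain regular expression for the real Jinja2 rendering, and `render_exports`, the namespace values, and the paths are all hypothetical.

```python
import re
import shlex


def render_exports(inject_env_vars, namespaces):
    """Fill `{namespace.attribute}` placeholders and emit shell export lines."""

    def lookup(match):
        namespace, attribute = match.group(1), match.group(2)
        return str(namespaces[namespace][attribute])

    lines = []
    for name, template in inject_env_vars.items():
        value = re.sub(r"\{(\w+)\.(\w+)\}", lookup, template)
        # Quote the value so spaces or shell metacharacters stay literal.
        lines.append(f"export {name}={shlex.quote(value)}")
    return lines


namespaces = {
    "pipestat": {"config_file": "results/pipestat_config.yaml"},
    "looper": {"output_dir": "results"},
}
exports = render_exports(
    {
        "PIPESTAT_CONFIG": "{pipestat.config_file}",
        "MY_CUSTOM_VAR": "{looper.output_dir}/config.yaml",
    },
    namespaces,
)
print("\n".join(exports))
# export PIPESTAT_CONFIG=results/pipestat_config.yaml
# export MY_CUSTOM_VAR=results/config.yaml
```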
+
+### pipestat_config_required
+
+When a pipeline interface declares `output_schema` (indicating pipestat compatibility), looper validates that the pipestat configuration is actually passed to the pipeline. This validation ensures the pipeline will receive the merged config that looper creates.
+
+Looper accepts two handoff mechanisms:
+
+1. **CLI argument**: Use `{pipestat.config_file}` (or any `{pipestat.*}` variable) in your `command_template`
+2. **Environment variable**: Set `PIPESTAT_CONFIG` in the `inject_env_vars` section
+
+If neither mechanism is detected, looper raises an error with guidance on how to fix it.
+
+To disable this validation (if your pipeline handles pipestat configuration differently), set:
+
+```yaml
+pipestat_config_required: false
+```
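The rule described in this section can be expressed as a small check over a parsed pipeline interface. The sketch below follows the prose only; `check_pipestat_handoff` is a hypothetical name, the check for `project_interface` is an assumption, and looper's real error type and message may differ.

```python
def check_pipestat_handoff(piface):
    """Raise ValueError if a pipestat-enabled interface never hands off the config."""
    if "output_schema" not in piface:
        return  # not pipestat-enabled, nothing to validate
    if piface.get("pipestat_config_required", True) is False:
        return  # validation explicitly disabled
    templates = []
    for key in ("sample_interface", "project_interface"):
        section = piface.get(key) or {}
        templates.append(section.get("command_template", ""))
    uses_cli = any("{pipestat." in template for template in templates)
    uses_env = "PIPESTAT_CONFIG" in (piface.get("inject_env_vars") or {})
    if not (uses_cli or uses_env):
        raise ValueError(
            "output_schema is set, but the pipestat config is not passed via "
            "a {pipestat.*} variable or PIPESTAT_CONFIG in inject_env_vars"
        )


ok_interface = {
    "pipeline_name": "count_lines",
    "output_schema": "pipestat_output_schema.yaml",
    "inject_env_vars": {"PIPESTAT_CONFIG": "{pipestat.config_file}"},
    "sample_interface": {"command_template": "pipeline/count_lines.sh {sample.file_path}"},
}
check_pipestat_handoff(ok_interface)  # passes silently
```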
+
 ## Validating a pipeline interface
 
 A pipeline interface can be validated using JSON Schema against [schema.databio.org/pipelines/pipeline_interface.yaml](http://schema.databio.org/pipelines/pipeline_interface.yaml). Looper automatically validates pipeline interfaces at submission initialization stage.
@@ -1,5 +1,10 @@
 # Pre-submission hooks
 
+!!! success "Learning objectives"
+    - What are pre-submission hooks and when should I use them?
+    - How do I use looper's built-in pre-submission plugins?
+    - How do I write my own custom pre-submission hook?
+
 ## Purpose
 
 Sometimes we need to run a set-up task *before* submitting the main pipeline. For example, we may need to generate a particular representation of the sample metadata to be consumed by a pipeline run. Some pre-submission tasks may depend on information outside of the sample, such as compute settings. For this purpose, looper provides **pre-submission hooks**, which allow users to run arbitrary shell commands or Python functions before submitting the actual pipeline. These hooks have access to all of the job submission settings looper uses to populate the primary command template. They can be used in two ways: 1) to simply run required tasks, producing required output before the pipeline is run; and 2) to modify the job submission settings, which can then be used in the actual submission template.
@@ -22,9 +27,12 @@ pre_submit:
 
 Because the looper variables are the input to each task, and are also potentially modified by each task, the order of execution is critical. Execution order follows two rules: First, `python_functions` are *always* executed before `command_templates`; and second, the user-specified order in the pipeline interface is preserved within each subsection.
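The two ordering rules can be illustrated with a short sketch. `order_hooks` is a hypothetical helper and the command template is made up; only `looper.write_sample_yaml` comes from this documentation.

```python
def order_hooks(pre_submit):
    """Return hooks in execution order: python_functions first, then command_templates."""
    python_functions = list(pre_submit.get("python_functions", []))
    command_templates = list(pre_submit.get("command_templates", []))
    # Order within each subsection is kept exactly as the user wrote it.
    return (
        [("python_function", name) for name in python_functions]
        + [("command_template", template) for template in command_templates]
    )


pre_submit = {
    # Listed first here, but still executed after the Python functions.
    "command_templates": ["hypothetical_tool.sh --name {sample.sample_name}"],
    "python_functions": ["looper.write_sample_yaml"],
}
for kind, hook in order_hooks(pre_submit):
    print(kind, hook)
```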
 
-## Built-in pre-submission functions
+## Built-in pre-submission plugins
+
+Looper ships with several built-in plugins that you can use as pre-submission functions without installing additional software. These plugins produce various representations of the sample metadata, which can be useful for different types of pipelines.
 
-Looper ships with several included plugins that you can use as pre-submission functions without installing additional software. These plugins produce various representations of the sample metadata, which can be useful for different types of pipelines. The included plugins are described below:
+!!! note "Reference"
+    The following section documents the built-in plugins. Skip to [Writing your own pre-submission hooks](#writing-your-own-pre-submission-hooks) if you want to create custom hooks.
 
 
 ### Included plugin: `looper.write_sample_yaml`