
Commit c6901f7

update looper docs, switch to mkdocstrings

2 parents: 9d5cf1e + 3e29627

29 files changed: +922 −257 lines

_typos.toml

Lines changed: 3 additions & 1 deletion
@@ -4,4 +4,6 @@ extend-exclude = ["*.ipynb", "*.svg"]
 [default.extend-words]
 opf = "opf"
 PN="PN"
-Sur="Sur"
+Sur="Sur"
+certifi = "certifi"
+Tru = "Tru"

docs/geofetch/changelog.md

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
 # Changelog
 
+
+## [0.12.9] -- 2025-12-01
+- Improved error handling
+- Fixed incorrect return for processed series metadata
+
+
 ## [0.12.8] -- 2025-07-08
 - Updated docs
 - Fixed parsing nested items. [[143](https://github.com/pepkit/geofetch/issues/143)]

docs/looper/advanced-guide/advanced-run-options.md

Lines changed: 19 additions & 9 deletions
@@ -18,19 +18,23 @@ Let's introduce some of the more advanced capabilities of `looper run`.
 
 ## Grouping many jobs into one
 
-By default, `looper` will translate each row in your `sample_table` into a single job. But perhaps you are running a project with tens of thousands of rows, and each job only takes mere minutes to run; in this case, you'd rather just submit a single job to process many samples. `Looper` makes this easy with the `--lump` and `--lumpn` command line arguments.
+By default, `looper` will translate each row in your `sample_table` into a single job. But perhaps you are running a project with tens of thousands of rows, and each job only takes mere minutes to run; in this case, you'd rather just submit a single job to process many samples. `Looper` makes this easy with the `--lump` and `--lump-n` command line arguments.
 
-### Lumping jobs by job count: `--lumpn`
+### Lumping jobs by job count: `--lump-n`
 
-It's quite simple: if you want to run 100 samples in a single job submission script, just tell looper `--lumpn 100`.
+It's quite simple: if you want to run 100 samples in a single job submission script, just tell looper `--lump-n 100`.
 
 ### Lumping jobs by input file size: `--lump`
 
-But what if your samples are quite different in terms of input file size? For example, your project may include many small samples, which you'd like to lump together with 10 jobs to 1, but you also have a few control samples that are very large and should have their own dedicated job. If you just use `--lumpn` with 10 samples per job, you could end up lumping your control samples together, which would be terrible. To alleviate this problem, `looper` provides the `--lump` argument, which uses input file size to group samples together. By default, you specify an argument in number of gigabytes. Looper will go through your samples and accumulate them until the total input file size reaches your limit, at which point it finalizes and submits the job. This will keep larger files in independent runs and smaller files grouped together.
+But what if your samples are quite different in terms of input file size? For example, your project may include many small samples, which you'd like to lump together with 10 jobs to 1, but you also have a few control samples that are very large and should have their own dedicated job. If you just use `--lump-n` with 10 samples per job, you could end up lumping your control samples together, which would be terrible. To alleviate this problem, `looper` provides the `--lump` argument, which uses input file size to group samples together. By default, you specify an argument in number of gigabytes. Looper will go through your samples and accumulate them until the total input file size reaches your limit, at which point it finalizes and submits the job. This will keep larger files in independent runs and smaller files grouped together.
 
-### Lumping samples into number of jobs: `--lumpj`
+### Lumping samples into a number of jobs: `--lump-j`
 
-Or you can lump samples into number of jobs.
+If you want to split your samples across a specific number of jobs, use `--lump-j`. For example, `--lump-j 10` will distribute all your samples evenly across 10 jobs.
 
 
 ## Running project-level pipelines
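The accumulate-until-limit behavior of `--lump` described above can be sketched in a few lines of Python. This is an illustrative model only, not looper's actual implementation; the function name and sample tuples are invented for the example:

```python
# Illustrative sketch of size-based lumping (NOT looper's actual code).
# Samples accumulate into one job until adding the next sample would
# push the total input size past the --lump limit; very large samples
# therefore end up in jobs of their own.

def lump_by_size(samples, limit_gb):
    """Group (name, size_gb) pairs into jobs capped near limit_gb."""
    jobs, current, current_size = [], [], 0.0
    for name, size_gb in samples:
        if current and current_size + size_gb > limit_gb:
            jobs.append(current)  # finalize the accumulated job
            current, current_size = [], 0.0
        current.append(name)
        current_size += size_gb
    if current:
        jobs.append(current)
    return jobs

samples = [("s1", 0.5), ("s2", 0.5), ("ctrl", 90.0), ("s3", 0.4)]
print(lump_by_size(samples, limit_gb=1.0))
# → [['s1', 's2'], ['ctrl'], ['s3']]
```

Note how the large control sample lands in its own job while the small samples are grouped, which is the behavior the paragraph above motivates.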
@@ -251,22 +255,28 @@ For example, to choose only samples where the `species` attribute is `human`, `m
 
 ```console
 looper run \
-  --sel-attr species
+  --sel-attr species \
   --sel-incl human mouse fly
 ```
 
 Similarly, to submit only one sample, with `sample_name` as `sample1`, you could use:
 
 ```console
 looper run \
-  --sel-attr sample_name
+  --sel-attr sample_name \
   --sel-incl sample1
 ```
 
 ### Sample selection by exclusion
 
-If more convenient to *exclude* samples by filter, you can use the analogous arguments `--sel-attr` with `--sel-excl`.
-This will
+If it's more convenient to *exclude* samples by filter, you can use the analogous arguments `--sel-attr` with `--sel-excl`.
+This will exclude any samples matching the specified values. For example, to run all samples *except* those where `species` is `rat`:
+
+```console
+looper run \
+  --sel-attr species \
+  --sel-excl rat
+```
 
 ### Toggling sample jobs through the sample table
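The include/exclude selection logic above can be modeled as a simple attribute filter. This is a conceptual sketch, not looper's implementation; the function and the sample dicts are invented for illustration:

```python
# Conceptual model of --sel-attr with --sel-incl / --sel-excl
# (NOT looper's actual code): keep samples whose attribute value is
# in the include list, or drop those whose value is in the exclude list.

def select_samples(samples, attr, incl=None, excl=None):
    """Filter sample dicts on one attribute; incl takes precedence."""
    if incl is not None:
        return [s for s in samples if s.get(attr) in incl]
    if excl is not None:
        return [s for s in samples if s.get(attr) not in excl]
    return list(samples)

samples = [
    {"sample_name": "a", "species": "human"},
    {"sample_name": "b", "species": "rat"},
    {"sample_name": "c", "species": "mouse"},
]
print(select_samples(samples, "species", excl=["rat"]))
# → samples "a" and "c" (everything except rat)
```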

docs/looper/changelog.md

Lines changed: 7 additions & 0 deletions
@@ -2,6 +2,13 @@
 
 This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.
 
+## [Unreleased]
+
+### Added
+- Added `inject_env_vars` pipeline interface property for injecting environment variables into submission scripts
+- Added `pipestat_config_required` pipeline interface property to control pipestat handoff validation
+- Added validation that pipestat-enabled interfaces (with `output_schema`) pass config to the pipeline via CLI (`{pipestat.*}`) or environment variable (`PIPESTAT_CONFIG` in `inject_env_vars`)
+
 ## [2.0.3] -- 2025-09-23
 ### Fixed
 - Fixed [#543](https://github.com/pepkit/looper/issues/543)

docs/looper/code/python-api.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+# Looper Python API
+
+The looper Python API provides classes for managing pipeline submissions and compute configurations.
+
+## Project
+
+The main class for working with looper projects. Extends peppy's Project with pipeline submission capabilities.
+
+::: looper.project.Project
+    options:
+      members:
+        - __init__
+        - amendments
+        - cli_pifaces
+        - config
+        - config_file
+        - get_sample_piface
+        - get_schemas
+        - list_amendments
+        - make_project_dirs
+        - output_dir
+        - pipeline_interfaces
+        - populate_pipeline_outputs
+        - results_folder
+        - sample_table
+        - sample_table_index
+        - samples
+        - selected_compute_package
+        - set_sample_piface
+        - submission_folder
+        - subsample_table
+      heading_level: 3
+
+## PipelineInterface
+
+Parses and holds information from a pipeline interface YAML file, including resource specifications and command templates.
+
+::: looper.pipeline_interface.PipelineInterface
+    options:
+      members:
+        - __init__
+        - choose_resource_package
+        - get_pipeline_schemas
+        - pipeline_name
+        - render_var_templates
+      heading_level: 3
+
+## SubmissionConductor
+
+Collects and submits pipeline jobs. Manages job pooling based on file size or command count limits.
+
+::: looper.conductor.SubmissionConductor
+    options:
+      members:
+        - __init__
+        - add_sample
+        - failed_samples
+        - is_project_submittable
+        - num_cmd_submissions
+        - num_job_submissions
+        - submit
+        - write_script
+      heading_level: 3
+
+## ComputingConfiguration
+
+Manages compute environment settings from divvy configuration files. Handles resource packages and submission templates.
+
+::: looper.divvy.ComputingConfiguration
+    options:
+      members:
+        - __init__
+        - activate_package
+        - clean_start
+        - default_config_file
+        - get_active_package
+        - get_adapters
+        - list_compute_packages
+        - reset_active_settings
+        - template
+        - templates_folder
+        - update_packages
+        - write_script
+      heading_level: 3
+
+## Utility Functions
+
+### select_divvy_config
+
+::: looper.divvy.select_divvy_config
+    options:
+      heading_level: 3
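The `:::` blocks in this new page are mkdocstrings directives, which matches the commit message's switch to mkdocstrings. Rendering them requires enabling the plugin in `mkdocs.yml`; a minimal sketch follows (the `docstring_style` option and the exact handler settings are assumptions, not taken from this commit):

```yaml
# mkdocs.yml (sketch) — enable mkdocstrings with the Python handler.
# Options shown are illustrative; looper's actual config may differ.
plugins:
  - search
  - mkdocstrings:
      handlers:
        python:
          options:
            docstring_style: google
```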

docs/looper/developer-tutorial/developer-pipestat.md

Lines changed: 45 additions & 9 deletions
@@ -86,31 +86,67 @@ We added one new line, which runs `pipestat` and provides it with this informati
 ## Connect pipestat to looper
 
 Next, we need to update our pipeline interface so looper passes all the necessary information to the pipeline.
-Since the sample name (`-r $2`) and the pipestat config file (`-c $3`) weren't previously passed to the pipeline, we need to adjust the pipeline interface, to make sure the command template specifies all the inputs our pipeline needs:
+Since the sample name (`-r $2`) and the pipestat config file (`-c $3`) weren't previously passed to the pipeline, we need to adjust the pipeline interface, to make sure the command template specifies all the inputs our pipeline needs.
 
-```yaml title="pipeline/pipeline_interface.yaml" hl_lines="5"
+First, we need to tell looper we're dealing with a pipestat-compatible pipeline by adding the `output_schema` in the pipeline interface:
+
+```yaml title="pipeline/pipeline_interface.yaml" hl_lines="2"
 pipeline_name: count_lines
+output_schema: pipestat_output_schema.yaml
 sample_interface:
   command_template: >
     pipeline/count_lines.sh {sample.file_path} {sample.sample_name} {pipestat.config_file}
 ```
 
-Now, looper will pass the sample_name and the pipestat config file as additional arguments to `count_lines.sh`.
-The `{sample.sample_name}` will just take the appropriate value from the sample table, just like we did previously with `{sample.file_path}`
-The `{pipestat.config_file}` is automatically provided by looper.
-Looper generates this config file based on the looper configuration and the pipeline interface.
-To read more about pipestat config files, see here: [pipestat configuration](../../pipestat/config.md).
+### Pipestat configuration handoff
 
-Next, we need to tell looper we're dealing with a pipestat-compatible pipeline. Specify this by adding the `output_schema` in the pipeline interface to the pipestat output schema we created earlier:
+When looper runs a pipestat-enabled pipeline, it creates a merged configuration file containing:
 
-```yaml title="pipeline/pipeline_interface.yaml" hl_lines="2"
+- **Pipeline-author settings**: `output_schema`, `pipeline_name` (from pipeline interface)
+- **Pipeline-runner settings**: `results_file_path`, `output_dir`, `flag_file_dir` (from looper config)
+
+This merged config must be passed to the pipeline. Looper supports two mechanisms:
+
+#### Option 1: CLI argument (explicit)
+
+Pass `{pipestat.config_file}` in your command template:
+
+```yaml title="pipeline/pipeline_interface.yaml"
 pipeline_name: count_lines
 output_schema: pipestat_output_schema.yaml
 sample_interface:
   command_template: >
     pipeline/count_lines.sh {sample.file_path} {sample.sample_name} {pipestat.config_file}
 ```
 
+The `{pipestat.config_file}` is automatically provided by looper. Looper generates this config file based on the looper configuration and the pipeline interface. To read more about pipestat config files, see here: [pipestat configuration](../../pipestat/config.md).
+
+#### Option 2: Environment variable injection (automatic)
+
+Use `inject_env_vars` to set `PIPESTAT_CONFIG`:
+
+```yaml title="pipeline/pipeline_interface.yaml"
+pipeline_name: count_lines
+output_schema: pipestat_output_schema.yaml
+inject_env_vars:
+  PIPESTAT_CONFIG: "{pipestat.config_file}"
+sample_interface:
+  command_template: >
+    pipeline/count_lines.sh {sample.file_path} {sample.sample_name}
+```
+
+With this approach, your pipeline reads `PIPESTAT_CONFIG` from the environment. Pipestat checks this variable automatically, so no code changes are needed if your pipeline already uses `PipestatManager()` without explicit config.
+
+#### Validation
+
+Looper validates that pipestat-enabled interfaces (those with `output_schema`) use one of these mechanisms. If neither is found, looper raises an error with guidance on how to fix it.
+
+To disable this validation (if your pipeline handles config differently):
+
+```yaml
+pipestat_config_required: false
+```
+
 Finally, we need to configure where the pipestat results will be stored.
 Pipestat offers several ways to store results, including a simple file for a basic pipeline, or a relational database, or even PEPhub.
 We'll start with the simplest option and configure pipestat to use a results file.
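The env-var handoff in Option 2 implies the pipeline resolves its config from `PIPESTAT_CONFIG` when no explicit path is given. A minimal sketch of that resolution logic (the function name is invented; pipestat itself performs an equivalent lookup internally):

```python
import os

# Sketch: resolve the pipestat config path the way a pipeline might,
# preferring an explicit CLI argument over the injected env var.
def resolve_pipestat_config(cli_arg=None):
    """Return a config path from the CLI arg, else PIPESTAT_CONFIG."""
    if cli_arg:
        return cli_arg
    # PIPESTAT_CONFIG is set by looper via inject_env_vars (Option 2)
    return os.environ.get("PIPESTAT_CONFIG")

os.environ["PIPESTAT_CONFIG"] = "/tmp/pipestat_config.yaml"  # hypothetical path
print(resolve_pipestat_config())             # falls back to the env var
print(resolve_pipestat_config("cli.yaml"))   # explicit argument wins
```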

docs/looper/developer-tutorial/pipeline-interface-specification.md

Lines changed: 38 additions & 0 deletions
@@ -26,6 +26,8 @@ A pipeline interface may contain the following keys:
 - `compute` (RECOMMENDED) - Settings for computing resources
 - `var_templates` (OPTIONAL) - A mapping of [Jinja2](https://jinja.palletsprojects.com/) templates and corresponding names, typically used to parameterize plugins
 - `pre_submit` (OPTIONAL) - A mapping that defines the pre-submission tasks to be executed.
+- `inject_env_vars` (OPTIONAL) - Environment variables to inject into submission scripts.
+- `pipestat_config_required` (OPTIONAL) - Set to `false` to disable pipestat config handoff validation.
 
 ## Example pipeline interface
 
@@ -317,6 +319,42 @@ var_templates:
 
 This section can consist of two subsections: `python_functions` and/or `command_templates`, which specify the pre-submission tasks to be run before the main pipeline command is submitted. Please refer to the [pre-submission hooks system](pre-submission-hooks.md) section for a detailed explanation of this feature and syntax.
 
+### inject_env_vars
+
+The `inject_env_vars` section allows you to inject environment variables into submission scripts. This is useful for passing configuration to pipelines without modifying the command template. Keys are environment variable names; values are Jinja2 templates that will be rendered with the available namespaces.
+
+```yaml
+pipeline_name: my_pipeline
+output_schema: output_schema.yaml
+inject_env_vars:
+  PIPESTAT_CONFIG: "{pipestat.config_file}"
+  MY_CUSTOM_VAR: "{looper.output_dir}/config.yaml"
+sample_interface:
+  command_template: >
+    python pipeline.py
+```
+
+These variables are exported at the top of each submission script before the pipeline command runs. This works for both direct execution and cluster submission.
+
+This is particularly useful for pipestat-compatible pipelines, where you can pass the pipestat config via the `PIPESTAT_CONFIG` environment variable instead of a CLI argument.
+
+### pipestat_config_required
+
+When a pipeline interface declares `output_schema` (indicating pipestat compatibility), looper validates that the pipestat configuration is actually passed to the pipeline. This validation ensures the pipeline will receive the merged config that looper creates.
+
+Looper accepts two handoff mechanisms:
+
+1. **CLI argument**: Use `{pipestat.config_file}` (or any `{pipestat.*}` variable) in your `command_template`
+2. **Environment variable**: Set `PIPESTAT_CONFIG` in the `inject_env_vars` section
+
+If neither mechanism is detected, looper raises an error with guidance on how to fix it.
+
+To disable this validation (if your pipeline handles pipestat configuration differently), set:
+
+```yaml
+pipestat_config_required: false
+```
+
 ## Validating a pipeline interface
 
 A pipeline interface can be validated using JSON Schema against [schema.databio.org/pipelines/pipeline_interface.yaml](http://schema.databio.org/pipelines/pipeline_interface.yaml). Looper automatically validates pipeline interfaces at submission initialization stage.
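The docs above say each `inject_env_vars` entry is rendered against looper's namespaces and exported at the top of the submission script. A simplified model of that rendering step (NOT looper's actual template engine, which uses Jinja2; the helper name and namespace data are invented):

```python
import shlex

# Simplified sketch: turn inject_env_vars entries into `export` lines
# for a submission script by substituting {namespace.attr} placeholders.
def render_env_exports(inject_env_vars, namespaces):
    """Render placeholder templates and emit shell export lines."""
    lines = []
    for name, template in inject_env_vars.items():
        value = template
        for ns, attrs in namespaces.items():
            for attr, val in attrs.items():
                value = value.replace("{%s.%s}" % (ns, attr), str(val))
        # shlex.quote keeps the rendered value safe for the shell
        lines.append("export %s=%s" % (name, shlex.quote(value)))
    return lines

namespaces = {"pipestat": {"config_file": "/out/pipestat_config.yaml"}}
print(render_env_exports({"PIPESTAT_CONFIG": "{pipestat.config_file}"}, namespaces))
# → ['export PIPESTAT_CONFIG=/out/pipestat_config.yaml']
```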

docs/looper/developer-tutorial/pre-submission-hooks.md

Lines changed: 10 additions & 2 deletions
@@ -1,5 +1,10 @@
 # Pre-submission hooks
 
+!!! success "Learning objectives"
+    - What are pre-submission hooks and when should I use them?
+    - How do I use looper's built-in pre-submission plugins?
+    - How do I write my own custom pre-submission hook?
+
 ## Purpose
 
 Sometimes we need to run a set-up task *before* submitting the main pipeline. For example, we may need to generate a particular representation of the sample metadata to be consumed by a pipeline run. Some pre-submission tasks may depend on information outside of the sample, such as compute settings. For this purpose, looper provides **pre-submission hooks**, which allow users to run arbitrary shell commands or Python functions before submitting the actual pipeline. These hooks have access to all of the job submission settings looper uses to populate the primary command template. They can be used in two ways: 1) to simply run required tasks, producing required output before the pipeline is run; and 2) to modify the job submission settings, which can then be used in the actual submission template.
@@ -22,9 +27,12 @@ pre_submit:
 
 Because the looper variables are the input to each task, and are also potentially modified by each task, the order of execution is critical. Execution order follows two rules: First, `python_functions` are *always* executed before `command_templates`; and second, the user-specified order in the pipeline interface is preserved within each subsection.
 
-## Built-in pre-submission functions
+## Built-in pre-submission plugins
+
+Looper ships with several included plugins that you can use as pre-submission functions without installing additional software. These plugins produce various representations of the sample metadata, which can be useful for different types of pipelines.
 
-Looper ships with several included plugins that you can use as pre-submission functions without installing additional software. These plugins produce various representations of the sample metadata, which can be useful for different types of pipelines. The included plugins are described below:
+!!! note "Reference"
+    The following section documents the built-in plugins. Skip to [Writing your own pre-submission hooks](#writing-your-own-pre-submission-hooks) if you want to create custom hooks.
 
 
 ### Included plugin: `looper.write_sample_yaml`
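The two ordering rules stated in the updated docs (all `python_functions` before any `command_templates`, declared order preserved within each subsection) can be expressed directly. A tiny sketch of the resulting execution order (illustrative only; the hook names in the example are invented):

```python
# Sketch of pre-submission hook ordering (NOT looper's scheduler):
# python_functions always run before command_templates, and the
# declared order within each subsection is preserved.

def execution_order(pre_submit):
    """Flatten a pre_submit mapping into the order hooks would run."""
    order = []
    order.extend(pre_submit.get("python_functions", []))
    order.extend(pre_submit.get("command_templates", []))
    return order

pre_submit = {
    "command_templates": ["touch {looper.output_dir}/setup.flag"],
    "python_functions": ["looper.write_sample_yaml", "my_pkg.my_hook"],
}
print(execution_order(pre_submit))
# → the two python functions first, then the command template
```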
