docs/looper/advanced-guide/advanced-run-options.md
Lines changed: 19 additions & 9 deletions
@@ -18,19 +18,23 @@ Let's introduce some of the more advanced capabilities of `looper run`.

 ## Grouping many jobs into one

-By default, `looper` will translate each row in your `sample_table` into a single job. But perhaps you are running a project with tens of thousands of rows, and each job only takes mere minutes to run; in this case, you'd rather just submit a single job to process many samples. `Looper` makes this easy with the `--lump` and `--lumpn` command line arguments.
+By default, `looper` will translate each row in your `sample_table` into a single job. But perhaps you are running a project with tens of thousands of rows, and each job only takes mere minutes to run; in this case, you'd rather just submit a single job to process many samples. `Looper` makes this easy with the `--lump` and `--lump-n` command line arguments.

-### Lumping jobs by job count: `--lumpn`
+### Lumping jobs by job count: `--lump-n`

-It's quite simple: if you want to run 100 samples in a single job submission script, just tell looper `--lumpn 100`.
+It's quite simple: if you want to run 100 samples in a single job submission script, just tell looper `--lump-n 100`.

 ### Lumping jobs by input file size: `--lump`

-But what if your samples are quite different in terms of input file size? For example, your project may include many small samples, which you'd like to lump together with 10 jobs to 1, but you also have a few control samples that are very large and should have their own dedicated job. If you just use `--lumpn` with 10 samples per job, you could end up lumping your control samples together, which would be terrible. To alleviate this problem, `looper` provides the `--lump` argument, which uses input file size to group samples together. By default, you specify an argument in number of gigabytes. Looper will go through your samples and accumulate them until the total input file size reaches your limit, at which point it finalizes and submits the job. This will keep larger files in independent runs and smaller files grouped together.
+But what if your samples are quite different in terms of input file size? For example, your project may include many small samples, which you'd like to lump together with 10 jobs to 1, but you also have a few control samples that are very large and should have their own dedicated job. If you just use `--lump-n` with 10 samples per job, you could end up lumping your control samples together, which would be terrible. To alleviate this problem, `looper` provides the `--lump` argument, which uses input file size to group samples together. By default, you specify an argument in number of gigabytes. Looper will go through your samples and accumulate them until the total input file size reaches your limit, at which point it finalizes and submits the job. This will keep larger files in independent runs and smaller files grouped together.
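The size-based accumulation described for `--lump` can be sketched in a few lines of Python. This is only an illustration of the idea as worded in the paragraph, not looper's actual implementation: the function name `lump_by_size`, the `(name, size_gb)` tuples, and the sample data are all hypothetical.

```python
def lump_by_size(samples, limit_gb):
    """Group (name, size_gb) pairs, closing a job once its total reaches limit_gb."""
    jobs, current, total = [], [], 0.0
    for name, size_gb in samples:
        current.append(name)
        total += size_gb
        if total >= limit_gb:
            # The limit has been reached: finalize this job and start a new one.
            jobs.append(current)
            current, total = [], 0.0
    if current:
        # Leftover samples that never reached the limit form a final, smaller job.
        jobs.append(current)
    return jobs


# Hypothetical project: one large control sample and four small samples.
samples = [("ctrl1", 12.0), ("s1", 4.0), ("s2", 3.0), ("s3", 3.5), ("s4", 1.0)]
jobs = lump_by_size(samples, limit_gb=10)
print(jobs)
```

In this literal reading, a large sample closes whichever group it lands in, so the grouping depends on sample order; looper's real behavior may differ in such details.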

+<<<<<<< HEAD
 ### Lumping samples into number of jobs: `--lumpj`
+=======
+### Lumping jobs by job count: `--lump-j`
+>>>>>>> master

-Or you can lump samples into number of jobs.
+If you want to split your samples across a specific number of jobs, use `--lump-j`. For example, `--lump-j 10` will distribute all your samples evenly across 10 jobs.
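To make the difference between `--lump-n` and `--lump-j` concrete, here is a small Python sketch. It assumes "samples per job" semantics for the first and "as even as possible across a fixed number of jobs" for the second; the helper names `lump_by_count` and `lump_into_jobs` are hypothetical, and how looper itself computes the split is not shown in this diff.

```python
def lump_by_count(samples, n):
    """--lump-n style grouping: at most n samples per job."""
    return [samples[i:i + n] for i in range(0, len(samples), n)]


def lump_into_jobs(samples, j):
    """--lump-j style grouping: spread samples over j jobs, sizes differing by at most one."""
    base, extra = divmod(len(samples), j)
    jobs, start = [], 0
    for index in range(j):
        size = base + (1 if index < extra else 0)
        if size:  # skip empty jobs when there are fewer samples than jobs
            jobs.append(samples[start:start + size])
        start += size
    return jobs


samples = [f"sample{i}" for i in range(1, 26)]  # 25 hypothetical samples
by_count = lump_by_count(samples, 10)
by_jobs = lump_into_jobs(samples, 10)
print([len(job) for job in by_count])  # [10, 10, 5]
print([len(job) for job in by_jobs])   # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

With 25 samples, a count of 10 yields three jobs, while asking for 10 jobs yields ten smaller ones.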


 ## Running project-level pipelines
@@ -251,22 +255,28 @@ For example, to choose only samples where the `species` attribute is `human`, `m

 ```console
 looper run \
-  --sel-attr species
+  --sel-attr species \
   --sel-incl human mouse fly
 ```

 Similarly, to submit only one sample, with `sample_name` as `sample`, you could use:

 ```console
 looper run \
-  --sel-attr sample_name
+  --sel-attr sample_name \
   --sel-incl sample1
 ```

 ### Sample selection by exclusion

-If more convenient to *exclude* samples by filter, you can use the analogous arguments `--sel-attr` with `--sel-excl`.
-This will
+If it's more convenient to *exclude* samples by filter, you can use the analogous arguments `--sel-attr` with `--sel-excl`.
+This will exclude any samples matching the specified values. For example, to run all samples *except* those where `species` is `rat`:
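As a rough illustration of what the inclusion and exclusion filters do, here is a plain-Python sketch of the selection logic. It is not looper's code: `select_samples` and the sample dictionaries are hypothetical, and the sketch only mirrors the behavior described for `--sel-attr` combined with `--sel-incl` or `--sel-excl`.

```python
def select_samples(samples, attr, include=None, exclude=None):
    """Keep samples whose `attr` value is in `include` and not in `exclude`."""
    selected = []
    for sample in samples:
        value = sample.get(attr)
        if include is not None and value not in include:
            continue  # inclusion filter: value must be one of the listed values
        if exclude is not None and value in exclude:
            continue  # exclusion filter: value must not be one of the listed values
        selected.append(sample)
    return selected


samples = [
    {"sample_name": "sample1", "species": "human"},
    {"sample_name": "sample2", "species": "mouse"},
    {"sample_name": "sample3", "species": "rat"},
]
included = select_samples(samples, "species", include={"human", "mouse", "fly"})
excluded = select_samples(samples, "species", exclude={"rat"})
print([s["sample_name"] for s in included])
print([s["sample_name"] for s in excluded])
```

For this sample table both filters select `sample1` and `sample2`, which shows why exclusion can be the shorter way to express a selection.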
docs/looper/changelog.md
Lines changed: 7 additions & 0 deletions
@@ -2,6 +2,13 @@

 This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.

+## [Unreleased]
+
+### Added
+- Added `inject_env_vars` pipeline interface property for injecting environment variables into submission scripts
+- Added `pipestat_config_required` pipeline interface property to control pipestat handoff validation
+- Added validation that pipestat-enabled interfaces (with `output_schema`) pass config to the pipeline via CLI (`{pipestat.*}`) or environment variable (`PIPESTAT_CONFIG` in `inject_env_vars`)
docs/looper/developer-tutorial/developer-pipestat.md
Lines changed: 45 additions & 9 deletions
@@ -86,31 +86,67 @@ We added one new line, which runs `pipestat` and provides it with this informati
 ## Connect pipestat to looper

 Next, we need to update our pipeline interface so looper passes all the necessary information to the pipeline.
-Since the sample name (`-r $2`) and the pipestat config file (`-c $3`) weren't previously passed to the pipeline, we need to adjust the pipeline interface, to make sure the command template specifies all the inputs our pipeline needs:
+Since the sample name (`-r $2`) and the pipestat config file (`-c $3`) weren't previously passed to the pipeline, we need to adjust the pipeline interface, to make sure the command template specifies all the inputs our pipeline needs.
Now, looper will pass the sample_name and the pipestat config file as additional arguments to `count_lines.sh`.
-The `{sample.sample_name}` will just take the appropriate value from the sample table, just like we did previously with `{sample.file_path}`
-The `{pipestat.config_file}` is automatically provided by looper.
-Looper generates this config file based on the looper configuration and the pipeline interface.
-To read more about pipestat config files, see here: [pipestat configuration](../../pipestat/config.md).
+### Pipestat configuration handoff

-Next, we need to tell looper we're dealing with a pipestat-compatible pipeline. Specify this by adding the `output_schema` in the pipeline interface to the pipestat output schema we created earlier:
+When looper runs a pipestat-enabled pipeline, it creates a merged configuration file containing:
The `{pipestat.config_file}` is automatically provided by looper. Looper generates this config file based on the looper configuration and the pipeline interface. To read more about pipestat config files, see here: [pipestat configuration](../../pipestat/config.md).
With this approach, your pipeline reads `PIPESTAT_CONFIG` from the environment. Pipestat checks this variable automatically, so no code changes are needed if your pipeline already uses `PipestatManager()` without explicit config.
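For a pipeline that does not rely on pipestat's own lookup, reading the variable by hand is straightforward. The sketch below uses only the standard library; the helper `resolve_pipestat_config` is hypothetical, and the rule that an explicit path wins over the environment is an assumption made for illustration, not documented pipestat behavior.

```python
import os


def resolve_pipestat_config(explicit_path=None, environ=os.environ):
    """Return an explicit config path if given, else the PIPESTAT_CONFIG value (or None)."""
    if explicit_path:
        return explicit_path  # assumption: a CLI-provided path takes precedence
    return environ.get("PIPESTAT_CONFIG")


# Simulated environments (hypothetical paths) instead of mutating os.environ.
from_env = resolve_pipestat_config(environ={"PIPESTAT_CONFIG": "/tmp/results/pipestat_config.yaml"})
from_cli = resolve_pipestat_config("cli_config.yaml", environ={"PIPESTAT_CONFIG": "/tmp/other.yaml"})
missing = resolve_pipestat_config(environ={})
print(from_env, from_cli, missing)
```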
+
+#### Validation
+
+Looper validates that pipestat-enabled interfaces (those with `output_schema`) use one of these mechanisms. If neither is found, looper raises an error with guidance on how to fix it.
+
+To disable this validation (if your pipeline handles config differently):
+
+```yaml
+pipestat_config_required: false
+```
+
 Finally, we need to configure where the pipestat results will be stored.
 Pipestat offers several ways to store results, including a simple file for a basic pipeline, or a relational database, or even PEPhub.
 We'll start with the simplest option and configure pipestat to use a results file.
docs/looper/developer-tutorial/pipeline-interface-specification.md
Lines changed: 38 additions & 0 deletions
@@ -26,6 +26,8 @@ A pipeline interface may contain the following keys:
 - `compute` (RECOMMENDED) - Settings for computing resources
 - `var_templates` (OPTIONAL) - A mapping of [Jinja2](https://jinja.palletsprojects.com/) templates and corresponding names, typically used to parameterize plugins
 - `pre_submit` (OPTIONAL) - A mapping that defines the pre-submission tasks to be executed.
+- `inject_env_vars` (OPTIONAL) - Environment variables to inject into submission scripts.
+- `pipestat_config_required` (OPTIONAL) - Set to `false` to disable pipestat config handoff validation.

 ## Example pipeline interface

@@ -317,6 +319,42 @@ var_templates:

 This section can consist of two subsections: `python_functions` and/or `command_templates`, which specify the pre-submission tasks to be run before the main pipeline command is submitted. Please refer to the [pre-submission hooks system](pre-submission-hooks.md) section for a detailed explanation of this feature and syntax.

+### inject_env_vars
+
+The `inject_env_vars` section allows you to inject environment variables into submission scripts. This is useful for passing configuration to pipelines without modifying the command template. Keys are environment variable names, values are Jinja2 templates that will be rendered with the available namespaces.
+
+```yaml
+pipeline_name: my_pipeline
+output_schema: output_schema.yaml
+inject_env_vars:
+  PIPESTAT_CONFIG: "{pipestat.config_file}"
+  MY_CUSTOM_VAR: "{looper.output_dir}/config.yaml"
+sample_interface:
+  command_template: >
+    python pipeline.py
+```
+
+These variables are exported at the top of each submission script before the pipeline command runs. This works for both direct execution and cluster submission.
+
+This is particularly useful for pipestat-compatible pipelines, where you can pass the pipestat config via the `PIPESTAT_CONFIG` environment variable instead of a CLI argument.
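To picture what "exported at the top of each submission script" might look like, here is a hedged Python sketch that turns an `inject_env_vars` mapping into shell `export` lines. It is not looper's implementation: the real templates are rendered with Jinja2, whereas this sketch substitutes `{namespace.attribute}` placeholders with a regular expression, and the function name, namespace values, and exact `export` format are all assumptions.

```python
import re
import shlex


def render_env_exports(inject_env_vars, namespaces):
    """Fill {namespace.attr} placeholders and return one shell export line per variable."""
    def lookup(match):
        namespace, attr = match.group(1), match.group(2)
        return str(namespaces[namespace][attr])

    lines = []
    for name, template in inject_env_vars.items():
        value = re.sub(r"\{(\w+)\.(\w+)\}", lookup, template)
        # shlex.quote keeps the value safe if it contains spaces or shell metacharacters.
        lines.append(f"export {name}={shlex.quote(value)}")
    return lines


# Hypothetical namespace values standing in for what looper would provide.
namespaces = {
    "pipestat": {"config_file": "/tmp/results/pipestat_config.yaml"},
    "looper": {"output_dir": "/tmp/results"},
}
inject_env_vars = {
    "PIPESTAT_CONFIG": "{pipestat.config_file}",
    "MY_CUSTOM_VAR": "{looper.output_dir}/config.yaml",
}
exports = render_env_exports(inject_env_vars, namespaces)
print("\n".join(exports))
```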
+
+### pipestat_config_required
+
+When a pipeline interface declares `output_schema` (indicating pipestat compatibility), looper validates that the pipestat configuration is actually passed to the pipeline. This validation ensures the pipeline will receive the merged config that looper creates.
+
+Looper accepts two handoff mechanisms:
+
+1. **CLI argument**: Use `{pipestat.config_file}` (or any `{pipestat.*}` variable) in your `command_template`
+2. **Environment variable**: Set `PIPESTAT_CONFIG` in the `inject_env_vars` section
+
+If neither mechanism is detected, looper raises an error with guidance on how to fix it.
+
+To disable this validation (if your pipeline handles pipestat configuration differently), set:
+
+```yaml
+pipestat_config_required: false
+```
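The validation rule above can be expressed as a short check over the parsed pipeline interface. The following Python sketch is an assumption-laden illustration rather than looper's code: `check_pipestat_handoff` is a hypothetical name, only `sample_interface` is inspected, and the error type and message are invented for the example.

```python
import re


def check_pipestat_handoff(interface):
    """Raise ValueError if a pipestat-enabled interface never hands the config to the pipeline."""
    if "output_schema" not in interface:
        return  # not pipestat-enabled, nothing to validate
    if not interface.get("pipestat_config_required", True):
        return  # validation explicitly disabled
    command = interface.get("sample_interface", {}).get("command_template", "")
    uses_cli = re.search(r"\{pipestat\.\w+\}", command) is not None
    uses_env = "PIPESTAT_CONFIG" in interface.get("inject_env_vars", {})
    if not (uses_cli or uses_env):
        raise ValueError(
            "pipestat config is never passed: add {pipestat.config_file} to "
            "command_template or set PIPESTAT_CONFIG in inject_env_vars"
        )


good = {
    "output_schema": "output_schema.yaml",
    "inject_env_vars": {"PIPESTAT_CONFIG": "{pipestat.config_file}"},
    "sample_interface": {"command_template": "python pipeline.py"},
}
bad = {
    "output_schema": "output_schema.yaml",
    "sample_interface": {"command_template": "python pipeline.py"},
}
check_pipestat_handoff(good)  # passes silently
try:
    check_pipestat_handoff(bad)
    raised = False
except ValueError:
    raised = True
check_pipestat_handoff({**bad, "pipestat_config_required": False})  # validation skipped
print(raised)
```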
+
 ## Validating a pipeline interface

 A pipeline interface can be validated using JSON Schema against [schema.databio.org/pipelines/pipeline_interface.yaml](http://schema.databio.org/pipelines/pipeline_interface.yaml). Looper automatically validates pipeline interfaces at submission initialization stage.
docs/looper/developer-tutorial/pre-submission-hooks.md
Lines changed: 10 additions & 2 deletions
@@ -1,5 +1,10 @@
 # Pre-submission hooks

+!!! success "Learning objectives"
+    - What are pre-submission hooks and when should I use them?
+    - How do I use looper's built-in pre-submission plugins?
+    - How do I write my own custom pre-submission hook?
+
 ## Purpose

 Sometimes we need to run a set-up task *before* submitting the main pipeline. For example, we may need to generate a particular representation of the sample metadata to be consumed by a pipeline run. Some pre-submission tasks may depend on information outside of the sample, such as compute settings. For this purpose, looper provides **pre-submission hooks**, which allow users to run arbitrary shell commands or Python functions before submitting the actual pipeline. These hooks have access to all of the job submission settings looper uses to populate the primary command template. They can be used in two ways: 1) to simply run required tasks, producing required output before the pipeline is run; and 2) to modify the job submission settings, which can then be used in the actual submission template.
@@ -22,9 +27,12 @@ pre_submit:

 Because the looper variables are the input to each task, and are also potentially modified by each task, the order of execution is critical. Execution order follows two rules: First, `python_functions` are *always* executed before `command_templates`; and second, the user-specified order in the pipeline interface is preserved within each subsection.
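The two ordering rules can be captured in a tiny Python sketch. It only builds the execution plan and does not run anything; `order_hooks`, the hook names, and the commands are hypothetical and stand in for whatever a real `pre_submit` section lists.

```python
def order_hooks(pre_submit):
    """Return hooks in execution order: all python_functions first, then command_templates."""
    ordered = [("python", name) for name in pre_submit.get("python_functions", [])]
    ordered += [("command", cmd) for cmd in pre_submit.get("command_templates", [])]
    return ordered


# command_templates is listed first here on purpose: the plan still starts with Python hooks.
pre_submit = {
    "command_templates": ["echo first-command", "echo second-command"],
    "python_functions": ["hook_a", "hook_b"],
}
plan = order_hooks(pre_submit)
for kind, hook in plan:
    print(kind, hook)
```

Within each subsection the list order is untouched, which matches the second rule.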

-## Built-in pre-submission functions
+## Built-in pre-submission plugins
+
+Looper ships with several included plugins that you can use as pre-submission functions without installing additional software. These plugins produce various representations of the sample metadata, which can be useful for different types of pipelines.

-Looper ships with several included plugins that you can use as pre-submission functions without installing additional software. These plugins produce various representations of the sample metadata, which can be useful for different types of pipelines. The included plugins are described below:
+!!! note "Reference"
+    The following section documents the built-in plugins. Skip to [Writing your own pre-submission hooks](#writing-your-own-pre-submission-hooks) if you want to create custom hooks.