docs/pipestat/README.md

Pipestat standardizes reporting of pipeline results.
## How does pipestat work?
A pipeline author defines all the outputs produced by a pipeline by writing a JSON-schema. The pipeline then uses pipestat to report outputs as the pipeline runs, via either the Python API or the command-line interface. The user configures results to be stored in a [YAML-formatted file](https://yaml.org/spec/1.2/spec.html), a [PostgreSQL database](https://www.postgresql.org/), or on [PEPhub](https://pephub.databio.org/). The results are recorded according to the pipestat specification in a standard, pipeline-agnostic way. This way, downstream software can use the specification to create universal tools for analyzing, monitoring, and visualizing pipeline results that work with any pipeline or workflow.
<!-- TODO: This needs a graphical representation here. -->
## Installing pipestat
### Minimal install for file backend
Install pipestat from PyPI with `pip`:
```console
pip install pipestat
```
Confirm installation by calling `pipestat -h` on the command line. If the `pipestat` executable is not in your `$PATH`, append this to your `.bashrc` or `.profile` (or `.bash_profile` on macOS):
```console
export PATH=~/.local/bin:$PATH
```
### Optional dependencies for database backend
Pipestat can use either a file or a database as the backend for recording results. The default installation provides only the file backend. To install the dependencies required for the database backend:
```console
pip install pipestat['dbbackend']
```
### Optional dependencies for pipestat reader
To install dependencies for the included `pipestatreader` submodule:
```console
pip install pipestat['pipestatreader']
```
## Set environment variables
Pipestat can be configured through environment variables instead of explicit arguments. For a minimal file-backed setup, set:
```console
export PIPESTAT_RESULTS_SCHEMA=output_schema.yaml
export PIPESTAT_RECORD_IDENTIFIER=my_record
export PIPESTAT_RESULTS_FILE=results_file.yaml
```
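As a sketch of how such environment-variable configuration typically works, here is a hypothetical resolver that prefers an explicit argument, then the environment, then a default (this is illustrative, not pipestat's actual implementation):

```python
import os

def resolve_setting(env_var, explicit=None, default=None):
    """Prefer an explicit argument, then the environment, then a default."""
    if explicit is not None:
        return explicit
    return os.environ.get(env_var, default)

# Stand-in for the shell export above:
os.environ["PIPESTAT_RESULTS_FILE"] = "results_file.yaml"

print(resolve_setting("PIPESTAT_RESULTS_FILE"))  # results_file.yaml
```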
When setting environment variables like this, you will need to provide an `output_schema.yaml` file in your current working directory with the following example data:
```yaml
title: An example Pipestat output schema
description: A pipeline using pipestat to report sample and project results.
type: object
properties:
  pipeline_name: "default_pipeline_name"
  samples:
    type: object
    properties:
      result_name:
        type: string
        description: "ResultName"
```
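To illustrate what the schema declaration buys you, here is a hypothetical validator that checks a reported value against the declared JSON-schema type (pipestat's real validation is more thorough; the type map below is an assumption covering only basic types):

```python
# Map basic JSON-schema type names to Python types (illustrative subset).
JSON_SCHEMA_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}

def check_result(schema_props, result_name, value):
    """Return True if `value` matches the type declared for `result_name`."""
    declared = schema_props[result_name]["type"]
    return isinstance(value, JSON_SCHEMA_TYPES[declared])

# The `samples` properties from the example schema above:
sample_props = {"result_name": {"type": "string", "description": "ResultName"}}
print(check_result(sample_props, "result_name", "hello"))  # True
print(check_result(sample_props, "result_name", 1.1))      # False
```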
## Pipeline results reporting and retrieval
These examples assume the above environment variables are set.
### Command-line usage
85
-
86
-
```console
87
-
# Report a result:
88
-
pipestat report -i result_name -v 1.1
89
-
90
-
# Retrieve the result:
91
-
pipestat retrieve -r my_record
92
-
```
### Python usage
```python
import pipestat

# Report a result
psm = pipestat.PipestatManager()
psm.report(values={"result_name": 1.1})

# Retrieve a result
psm = pipestat.PipestatManager()
psm.retrieve_one(result_identifier="result_name")
```
## Pipeline status management
### Command-line usage
```console
# Set status
pipestat status set running

# Get status
pipestat status get
```
### Python usage
123
-
124
-
125
-
```python
import pipestat

# Set status
psm = pipestat.PipestatManager()
psm.set_status(status_identifier="running")

# Get status
psm = pipestat.PipestatManager()
psm.get_status()
```
## Quick start
Check out the [quickstart guide](./code/api-quickstar.md), then see [API usage](./code/python-tutorial.md) and [CLI usage](./code/cli.md) for details.
## Backend types

The pipestat specification describes three backend types for storing results: a [YAML-formatted file](https://yaml.org/spec/1.2/spec.html), a [PostgreSQL database](https://www.postgresql.org/), or [PEPhub](https://pephub.databio.org/). This flexibility makes pipestat useful for a wide variety of use cases. Some users just need a simple text file for smaller-scale needs, which is convenient and universal, requiring no database infrastructure. For larger-scale systems, a database backend is necessary. The pipestat specification provides a layer that spans all three possibilities, so reports can be made in the same way regardless of which backend a particular use case employs.
By using the `pipestat` package to write results, the pipeline author need not be concerned with database connections or race-free file writing, as these tasks are already implemented. The user who runs the pipeline simply configures the pipestat backend as required.
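Race-free file writing is commonly implemented by writing to a temporary file and atomically renaming it into place; a minimal sketch of that general pattern (this illustrates the technique, not pipestat's actual code):

```python
import os
import tempfile

def atomic_write(path, text):
    """Write `text` to `path` so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())   # ensure data hits disk before the rename
        os.replace(tmp, path)      # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

atomic_write("results_file.yaml", "default_pipeline_name: {}\n")
```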
All three backends organize the results in a hierarchy which is *always* structured this way:

## File
The changes reported using the `report` method of `PipestatManager` will be securely written to the file. Currently only [YAML](https://yaml.org/) format is supported.

For the YAML file backend, each file represents a namespace. The file always begins with a single top-level key which indicates the namespace. Second-level keys correspond to the record identifiers; third-level keys correspond to result identifiers, which point to the reported values. The values can be any of the allowed pipestat data types, which include both basic and advanced data types.
```yaml
default_pipeline_name:
  project: {}
  sample:
    sample_1:
      meta:
        pipestat_modified_time: '2025-10-01 12:48:58'
        pipestat_created_time: '2025-10-01 12:48:58'
      number_of_things: '12'
```
## PostgreSQL database
This option lets you back `PipestatManager` with a fully fledged database. To use it, pass a path to a pipestat config file to the constructor, where the config file has the following (example) values:
```yaml
schema_path: sample_output_schema.yaml
database:
  dialect: postgresql
  driver: psycopg
  name: pipestat-test
  user: postgres
  password: pipestat-password
  host: 127.0.0.1
  port: 5432
```
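These fields are the usual ingredients of a SQLAlchemy-style connection URL, which is a reasonable mental model for how such a config is consumed (the exact URL pipestat builds internally is an assumption here):

```python
# Example values from the config file above.
config = {
    "dialect": "postgresql",
    "driver": "psycopg",
    "name": "pipestat-test",
    "user": "postgres",
    "password": "pipestat-password",
    "host": "127.0.0.1",
    "port": 5432,
}

def connection_url(db):
    """Assemble a dialect+driver://user:password@host:port/name URL."""
    return (
        f"{db['dialect']}+{db['driver']}://{db['user']}:{db['password']}"
        f"@{db['host']}:{db['port']}/{db['name']}"
    )

print(connection_url(config))
# postgresql+psycopg://postgres:pipestat-password@127.0.0.1:5432/pipestat-test
```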
For the PostgreSQL backend, the name of the database is configurable and defined in the [config file](config.md) in `database.name`. The database is structured like this:
- The namespace corresponds to the name of the table.
- The record identifier is indicated in the *unique* `record_identifier` column in that table.
- Each result is specified as a column in the table, with the column name corresponding to the result identifier.
- The values in the cells for a record and result identifier correspond to the actual data values reported for the given result.

## PEP on PEPhub
This option lets you use [PEPhub](https://pephub.databio.org/) as a backend for results.
docs/pipestat/code/python-tutorial.md
## Back-end types
Three types of back-ends are currently supported:
1. a **file** (pass a file path to the constructor)
   The changes reported using the `report` method of `PipestatManager` will be securely written to the file. Currently only [YAML](https://yaml.org/) format is supported.
2. a **PostgreSQL database** (pass a path to the pipestat config to the constructor)
   This option lets you back `PipestatManager` with a fully fledged database.
3. a **PEP on PEPhub** (pass a PEP path to the constructor, e.g. `psm = PipestatManager(pephub_path=pephubpath)`)
   This option lets you use PEPhub as a backend for results.

You can also place the PEPhub path in the configuration file:
```yaml
pephub_path: "databio/pipestat_demo:default"
schema_path: sample_output_schema.yaml
```
Apart from that, there are many other *optional* configuration points that have defaults. Please refer to the [environment variables reference](http://pipestat.databio.org/en/dev/env_vars/) to learn about the optional configuration options and their meaning.