Commit c41fd70

Merge pull request #46 from pepkit/dev_pipestat_docs
Update and polish pipestat docs
2 parents 1564565 + c81c1df commit c41fd70

13 files changed

+421
-167
lines changed

docs/pipestat/README.md

Lines changed: 3 additions & 116 deletions
Original file line number | Diff line number | Diff line change
@@ -14,122 +14,9 @@ Pipestat standardizes reporting of pipeline results. It provides 1) a standard s
1414

1515
## How does pipestat work?
1616

17-
A pipeline author defines all the outputs produced by a pipeline by writing a JSON-schema. The pipeline then uses pipestat to report pipeline outputs as the pipeline runs, either via the Python API or command line interface. The user configures results to be stored either in a [YAML-formatted file](https://yaml.org/spec/1.2/spec.html) or a [PostgreSQL database](https://www.postgresql.org/). The results are recorded according to the pipestat specification, in a standard, pipeline-agnostic way. This way, downstream software can use this specification to create universal tools for analyzing, monitoring, and visualizing pipeline results that will work with any pipeline or workflow.
17+
A pipeline author defines all the outputs produced by a pipeline by writing a JSON Schema. The pipeline then uses pipestat to report pipeline outputs as the pipeline runs, either via the Python API or the command-line interface. The user configures results to be stored in a [YAML-formatted file](https://yaml.org/spec/1.2/spec.html), in a [PostgreSQL database](https://www.postgresql.org/), or on [PEPhub](https://pephub.databio.org/). The results are recorded according to the pipestat specification, in a standard, pipeline-agnostic way. This way, downstream software can use this specification to create universal tools for analyzing, monitoring, and visualizing pipeline results that will work with any pipeline or workflow.
1818

1919
<!-- TODO: This needs a graphical representation here. -->
2020

21-
22-
## Installing pipestat
23-
24-
### Minimal install for file backend
25-
26-
Install pipestat from PyPI with `pip`:
27-
28-
```
29-
pip install pipestat
30-
```
31-
32-
Confirm installation by calling `pipestat -h` on the command line. If the `pipestat` executable is not in your `$PATH`, append this to your `.bashrc` or `.profile` (or `.bash_profile` on macOS):
33-
34-
```console
35-
export PATH=~/.local/bin:$PATH
36-
```
37-
38-
### Optional dependencies for database backend
39-
40-
Pipestat can use either a file or a database as the backend for recording results. The default installation only provides file backend. To install dependencies required for the database backend:
41-
42-
```
43-
pip install pipestat['dbbackend']
44-
```
45-
46-
### Optional dependencies for pipestat reader
47-
48-
To install dependencies for the included `pipestatreader` submodule:
49-
50-
```
51-
pip install pipestat['pipestatreader']
52-
```
53-
54-
## Set environment variables
55-
56-
<!-- TODO: What is going on here? This needs a sentence of explanation before jumping into a code block -->
57-
58-
```console
59-
export PIPESTAT_RESULTS_SCHEMA=output_schema.yaml
60-
export PIPESTAT_RECORD_IDENTIFIER=my_record
61-
export PIPESTAT_RESULTS_FILE=results_file.yaml
62-
```
63-
64-
When setting environment variables like this, you will need to provide an `output_schema.yaml` file in your current working directory with the following example data:
65-
66-
```yaml
67-
title: An example Pipestat output schema
68-
description: A pipeline using pipestat to report sample and project results.
69-
type: object
70-
properties:
71-
pipeline_name: "default_pipeline_name"
72-
samples:
73-
type: object
74-
properties:
75-
result_name:
76-
type: string
77-
description: "ResultName"
78-
```
79-
80-
## Pipeline results reporting and retrieval
81-
82-
These examples assume the above environment variables are set.
83-
84-
### Command-line usage
85-
86-
```console
87-
# Report a result:
88-
pipestat report -i result_name -v 1.1
89-
90-
# Retrieve the result:
91-
pipestat retrieve -r my_record
92-
```
93-
94-
### Python usage
95-
96-
```python
97-
import pipestat
98-
99-
# Report a result
100-
psm = pipestat.PipestatManager()
101-
psm.report(values={"result_name": 1.1})
102-
103-
# Retrieve a result
104-
psm = pipestat.PipestatManager()
105-
psm.retrieve_one(result_identifier="result_name")
106-
```
107-
108-
## Pipeline status management
109-
110-
### From command line:
111-
112-
113-
114-
```console
115-
# Set status
116-
pipestat status set running
117-
118-
# Get status
119-
pipestat status get
120-
```
121-
122-
### Python usage
123-
124-
125-
```python
126-
import pipestat
127-
128-
# Set status
129-
psm = pipestat.PipestatManager()
130-
psm.set_status(status_identifier="running")
131-
132-
# Get status
133-
psm = pipestat.PipestatManager()
134-
psm.get_status()
135-
```
21+
## Quick start
22+
Check out the [quickstart guide](./code/api-quickstar.md). See [API Usage](./code/python-tutorial.md) and [CLI Usage](./code/cli.md).

docs/pipestat/backends.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
1+
# Back-end types
2+
3+
4+
The pipestat specification describes three backend types for storing results: a [YAML-formatted file](https://yaml.org/spec/1.2/spec.html), a [PostgreSQL database](https://www.postgresql.org/), or [PEPhub](https://pephub.databio.org/). This flexibility makes pipestat useful for a wide variety of use cases. Some users just need a simple text file for smaller-scale needs, which is convenient and universal, requiring no database infrastructure. For larger-scale systems, a database back-end is necessary. The pipestat specification provides a layer that spans the three possibilities, so that reports can be made in the same way, regardless of which back-end is used in a particular use case.
5+
6+
By using the `pipestat` package to write results, the pipeline author need not be concerned with database connections or race-free file writing, as these tasks are already implemented. The user who runs the pipeline will simply configure the pipestat backend as required.
7+
8+
All backends organize the results in a hierarchy which is *always* structured this way:
9+
10+
![Result hierarchy](img/result_hierarchy.svg)
11+
12+
13+
14+
## File
15+
16+
The changes reported using the `report` method of `PipestatManager` will be securely written to the file. Currently only [YAML](https://yaml.org/) format is supported.
17+
18+
Example:
19+
20+
```python
21+
psm = PipestatManager(results_file_path="result_file.yaml", schema_path=schema_file)
22+
```
23+
24+
For the YAML file backend, each file represents a namespace. The file always begins with a single top-level key which indicates the namespace. Beneath it, the `project` and `sample` keys separate project-level from sample-level results. Within each of these, keys correspond to record identifiers, and the keys under each record correspond to result identifiers, which point to the reported values. The values can be any of the allowed pipestat data types, which include both basic and advanced data types.
25+
26+
```yaml
27+
default_pipeline_name:
28+
project: {}
29+
sample:
30+
sample_1:
31+
meta:
32+
pipestat_modified_time: '2025-10-01 12:48:58'
33+
pipestat_created_time: '2025-10-01 12:48:58'
34+
number_of_things: '12'
35+
```
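
As a rough illustration of how this nesting can be navigated, the sketch below mirrors the example file as a plain Python dictionary and walks it key by key. The `lookup` helper and the `results` variable are hypothetical names used only for this example; they are not part of the pipestat API, and pipestat itself should be used to read real results files.

```python
# Hypothetical in-memory mirror of the example YAML results file above.
results = {
    "default_pipeline_name": {
        "project": {},
        "sample": {
            "sample_1": {
                "meta": {
                    "pipestat_modified_time": "2025-10-01 12:48:58",
                    "pipestat_created_time": "2025-10-01 12:48:58",
                },
                "number_of_things": "12",
            }
        },
    }
}


def lookup(data, namespace, level, record_identifier, result_identifier):
    """Walk namespace -> level -> record -> result and return the stored value.

    Illustrative helper only; not a pipestat function.
    """
    return data[namespace][level][record_identifier][result_identifier]


value = lookup(
    results, "default_pipeline_name", "sample", "sample_1", "number_of_things"
)
print(value)
```

Each argument to `lookup` corresponds to one level of nesting in the file, so a missing key at any level raises a `KeyError` that points at the level where the path broke.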
36+
37+
## PostgreSQL database
38+
This option lets the user back `PipestatManager` with a fully fledged database.
39+
40+
Example:
41+
42+
```python
43+
psm = PipestatManager(config_file="config_file.yaml", schema_path=schema_file)
44+
```
45+
where the config file has the following (example) values:
46+
47+
```yaml
48+
schema_path: sample_output_schema.yaml
49+
database:
50+
dialect: postgresql
51+
driver: psycopg
52+
name: pipestat-test
53+
user: postgres
54+
password: pipestat-password
55+
host: 127.0.0.1
56+
port: 5432
57+
58+
```
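
The `database` fields above resemble the components of a SQLAlchemy-style connection URL (`dialect+driver://user:password@host:port/name`). The sketch below shows how such a URL could be assembled from these values. This is an assumption made for illustration: `build_db_url` is a hypothetical helper, not a pipestat function, and how pipestat itself consumes these fields is not shown in this document.

```python
from urllib.parse import quote

# Values copied from the example config file above.
database = {
    "dialect": "postgresql",
    "driver": "psycopg",
    "name": "pipestat-test",
    "user": "postgres",
    "password": "pipestat-password",
    "host": "127.0.0.1",
    "port": 5432,
}


def build_db_url(db):
    """Assemble a SQLAlchemy-style URL from the config fields.

    Hypothetical helper for illustration; not part of pipestat.
    """
    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    user = quote(db["user"], safe="")
    password = quote(db["password"], safe="")
    return (
        f"{db['dialect']}+{db['driver']}://{user}:{password}"
        f"@{db['host']}:{db['port']}/{db['name']}"
    )


url = build_db_url(database)
print(url)
```

Keeping the fields separate in the config file, rather than storing one URL string, makes it easier to override a single value such as the host or password per environment.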
59+
60+
For the PostgreSQL backend, the name of the database is configurable and defined in the [config file](config.md) in `database.name`. The database is structured like this:
61+
62+
- The namespace corresponds to the name of the table.
63+
- The record identifier is indicated in the *unique* `record_identifier` column in that table.
64+
- Each result is specified as a column in the table, with the column name corresponding to the result identifier.
65+
- The values in the cells for a record and result identifier correspond to the actual data values reported for the given result.
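
The layout described in the list above can be sketched with a small in-memory table. The example uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server. The table and column names are taken from the YAML example earlier in this page, and the schema is an illustrative assumption, not the exact schema pipestat creates.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# The namespace becomes the table name; each result identifier becomes a column.
conn.execute(
    'CREATE TABLE "default_pipeline_name" ('
    "record_identifier TEXT UNIQUE, "
    "number_of_things TEXT)"
)

# One row per record; the cell holds the value reported for that result.
conn.execute(
    'INSERT INTO "default_pipeline_name" VALUES (?, ?)',
    ("sample_1", "12"),
)

row = conn.execute(
    'SELECT number_of_things FROM "default_pipeline_name" '
    "WHERE record_identifier = ?",
    ("sample_1",),
).fetchone()
print(row[0])
```

Because `record_identifier` is declared `UNIQUE`, inserting a second row for `sample_1` fails, which matches the one-row-per-record structure described above.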
66+
67+
![RDB hierarchy](img/db_hierarchy.svg)
68+
69+
70+
71+
## PEP on PEPhub
72+
This option gives the user the possibility to use [PEPhub](https://pephub.databio.org/) as a backend for results.
73+
74+
```python
75+
psm = PipestatManager(pephub_path=pephubpath, schema_path="sample_output_schema.yaml")
76+
```
77+
78+
79+
All three backends *can* be configured using the config file. However, the PostgreSQL backend *must* use a config file.

docs/pipestat/code/python-tutorial.md

Lines changed: 4 additions & 1 deletion
@@ -15,14 +15,17 @@ To make your Python pipeline pipestat-compatible, you first need to initialize p
1515

1616
## Back-end types
1717

18-
Two types of back-ends are currently supported:
18+
Three types of back-ends are currently supported:
1919

2020
1. a **file** (pass a file path to the constructor)
2121
The changes reported using the `report` method of `PipestatManager` will be securely written to the file. Currently only [YAML](https://yaml.org/) format is supported.
2222

2323
2. a **PostgreSQL database** (pass a path to the pipestat config to the constructor)
2424
This option gives the user the possibility to use a fully fledged database to back `PipestatManager`.
2525

26+
3. a **PEP on PEPhub** (pass a PEP path to the constructor, e.g. `psm = PipestatManager(pephub_path=pephubpath)`)
27+
This option gives the user the possibility to use PEPhub as a backend for results.
28+
2629

2730
## Initializing a pipestat session
2831

docs/pipestat/code/reporting-objects.md

Lines changed: 0 additions & 10 deletions
@@ -5,10 +5,6 @@ This tutorial will show you how pipestat can report not just primitive types, bu
55
First create a `pipestat.PipestatManager` object with our example schema:
66

77

8-
```python
9-
10-
```
11-
128

139
```python
1410
import pipestat
@@ -93,9 +89,3 @@ psm.retrieve_one("sample1", "mydict")['toplevel']['value']
9389

9490
456
9591

96-
97-
98-
99-
```python
100-
101-
```

docs/pipestat/configuration.md

Lines changed: 7 additions & 0 deletions
@@ -44,6 +44,13 @@ Beginning with v0.10.0, there is also support for reporting results directly to
4444
psm = PipestatManager(pephub_path="databio/pipestat_demo:default", schema_path=my_schema_file_path)
4545
```
4646

47+
You can also place this in the configuration file:
48+
49+
```yaml
50+
pephub_path: "databio/pipestat_demo:default"
51+
schema_path: sample_output_schema.yaml
52+
53+
```
4754
4855
Apart from that, there are many other *optional* configuration points that have defaults. Please refer to the [environment variables reference](http://pipestat.databio.org/en/dev/env_vars/) to learn about the optional configuration options and their meaning.
4956

docs/pipestat/install.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
2+
# Installing pipestat
3+
4+
### Minimal install for file backend
5+
6+
Install pipestat from PyPI with `pip`:
7+
8+
```
9+
pip install pipestat
10+
```
11+
12+
Confirm installation by calling `pipestat -h` on the command line. If the `pipestat` executable is not in your `$PATH`, append this to your `.bashrc` or `.profile` (or `.bash_profile` on macOS):
13+
14+
```console
15+
export PATH=~/.local/bin:$PATH
16+
```
17+
18+
### Optional dependencies for database backend
19+
20+
Pipestat can use either a file or a database as the backend for recording results. The default installation provides only the file backend. To install the dependencies required for the database backend:
21+
22+
```
23+
pip install "pipestat[dbbackend]"
24+
```
25+
26+
### Optional dependencies for pipestat reader
27+
28+
To install dependencies for the included `pipestatreader` submodule:
29+
30+
```
31+
pip install "pipestat[pipestatreader]"
32+
```
