pepkit
diff --git a/docs/pipestat/README.md b/docs/pipestat/README.md
Lines changed: 3 additions & 116 deletions
diff --git a/docs/pipestat/backends.md b/docs/pipestat/backends.md
Lines changed: 79 additions & 0 deletions
diff --git a/docs/pipestat/code/python-tutorial.md b/docs/pipestat/code/python-tutorial.md
Lines changed: 4 additions & 1 deletion
diff --git a/docs/pipestat/code/reporting-objects.md b/docs/pipestat/code/reporting-objects.md
Lines changed: 0 additions & 10 deletions
diff --git a/docs/pipestat/configuration.md b/docs/pipestat/configuration.md
Lines changed: 7 additions & 0 deletions
diff --git a/docs/pipestat/install.md b/docs/pipestat/install.md
Lines changed: 32 additions & 0 deletions
@@ -14,122 +14,9 @@ Pipestat standardizes reporting of pipeline results. It provides 1) a standard s
 
 ## How does pipestat work?
 
-A pipeline author defines all the outputs produced by a pipeline by writing a JSON-schema. The pipeline then uses pipestat to report pipeline outputs as the pipeline runs, either via the Python API or command line interface. The user configures results to be stored either in a [YAML-formatted file](https://yaml.org/spec/1.2/spec.html) or a [PostgreSQL database](https://www.postgresql.org/). The results are recorded according to the pipestat specification, in a standard, pipeline-agnostic way. This way, downstream software can use this specification to create universal tools for analyzing, monitoring, and visualizing pipeline results that will work with any pipeline or workflow.
+A pipeline author defines all the outputs produced by a pipeline by writing a JSON-schema. The pipeline then uses pipestat to report pipeline outputs as the pipeline runs, either via the Python API or command line interface. The user configures results to be stored in a [YAML-formatted file](https://yaml.org/spec/1.2/spec.html), in a [PostgreSQL database](https://www.postgresql.org/), or on [PEPhub](https://pephub.databio.org/). The results are recorded according to the pipestat specification, in a standard, pipeline-agnostic way. This way, downstream software can use this specification to create universal tools for analyzing, monitoring, and visualizing pipeline results that will work with any pipeline or workflow.
 
 <!-- TODO: This needs a graphical representation here. -->
 
-
-## Installing pipestat
-
-### Minimal install for file backend
-
-Install pipestat from PyPI with `pip`: 
-
-```
-pip install pipestat
-```
-
-Confirm installation by calling `pipestat -h` on the command line. If the `pipestat` executable is not in your `$PATH`, append this to your `.bashrc` or `.profile` (or `.bash_profile` on macOS):
-
-```console
-export PATH=~/.local/bin:$PATH
-```
-
-### Optional dependencies for database backend
-
-Pipestat can use either a file or a database as the backend for recording results. The default installation only provides file backend. To install dependencies required for the database backend:
-
-```
-pip install pipestat['dbbackend']
-```
-
-### Optional dependencies for pipestat reader
-
-To install dependencies for the included `pipestatreader` submodule:
-
-```
-pip install pipestat['pipestatreader']
-```
-
-## Set environment variables
-
-<!-- TODO: What is going on here? This needs a sentence of explanation before jumping into a code block -->
-
-```console
-export PIPESTAT_RESULTS_SCHEMA=output_schema.yaml
-export PIPESTAT_RECORD_IDENTIFIER=my_record
-export PIPESTAT_RESULTS_FILE=results_file.yaml
-```
-
-When setting environment variables like this, you will need to provide an `output_schema.yaml` file in your current working directory with the following example data:
-
-```yaml
-title: An example Pipestat output schema
-description: A pipeline using pipestat to report sample and project results.
-type: object
-properties:
-  pipeline_name: "default_pipeline_name"
-  samples:
-    type: object
-    properties:
-        result_name:
-          type: string
-          description: "ResultName"
-```
-
-## Pipeline results reporting and retrieval
-
-These examples assume the above environment variables are set.
-
-### Command-line usage
-
-```console
-# Report a result:
-pipestat report -i result_name -v 1.1
-
-# Retrieve the result:
-pipestat retrieve -r my_record
-```
-
-### Python usage
-
-```python
-import pipestat
-
-# Report a result
-psm = pipestat.PipestatManager()
-psm.report(values={"result_name": 1.1})
-
-# Retrieve a result
-psm = pipestat.PipestatManager()
-psm.retrieve_one(result_identifier="result_name")
-```
-
-## Pipeline status management
-
-### From command line:
-
-
-
-```console
-# Set status
-pipestat status set running
-
-# Get status
-pipestat status get
-```
-
-### Python usage
-
-
-```python
-import pipestat
-
-# Set status
-psm = pipestat.PipestatManager()
-psm.set_status(status_identifier="running")
-
-# Get status
-psm = pipestat.PipestatManager()
-psm.get_status()
-```
+## Quick start
+Check out the [quickstart guide](./code/api-quickstar.md). See [API Usage](./code/python-tutorial.md) and [CLI Usage](./code/cli.md).
@@ -0,0 +1,79 @@
+# Back-end types
+
+
+The pipestat specification describes three backend types for storing results: a [YAML-formatted file](https://yaml.org/spec/1.2/spec.html), a [PostgreSQL database](https://www.postgresql.org/), or a PEP on [PEPhub](https://pephub.databio.org/). This flexibility makes pipestat useful for a wide variety of use cases. Some users need only a simple text file for smaller-scale work, which is convenient and universal and requires no database infrastructure. Larger-scale systems need a database back-end. The pipestat specification provides a layer that spans all three possibilities, so that results are reported in the same way regardless of which back-end is used.
+
+By using the `pipestat` package to write results, the pipeline author need not be concerned with database connections or race-free file writing, as these tasks are already implemented. The user who runs the pipeline simply configures the pipestat backend as required.
+
+All backends organize the results in a hierarchy which is *always* structured this way:
+
+![Result hierarchy](img/result_hierarchy.svg)
+
+
+
+## File
+
+The changes reported using the `report` method of `PipestatManager` will be securely written to the file. Currently only [YAML](https://yaml.org/) format is supported.
+
+Example:
+
+```python
+psm = PipestatManager(results_file_path="result_file.yaml", schema_path=schema_file)
+```
+
+For the YAML file backend, each file represents a namespace. The file always begins with a single top-level key which indicates the namespace. Second-level keys correspond to the record identifiers; third-level keys correspond to result identifiers, which point to the reported values. The values can then be any of the allowed pipestat data types, which include both basic and advanced data types.
+
+```yaml
+default_pipeline_name:
+  project: {}
+  sample:
+    sample_1:
+      meta:
+        pipestat_modified_time: '2025-10-01 12:48:58'
+        pipestat_created_time: '2025-10-01 12:48:58'
+      number_of_things: '12'
+```
+
+## PostgreSQL database
+This option lets the user back `PipestatManager` with a fully fledged database.
+
+Example:
+
+```python
+psm = PipestatManager(config_file="config_file.yaml", schema_path=schema_file)
+```
+where the config file has the following (example) values:
+
+```yaml
+schema_path: sample_output_schema.yaml
+database:
+  dialect: postgresql
+  driver: psycopg
+  name: pipestat-test
+  user: postgres
+  password: pipestat-password
+  host: 127.0.0.1
+  port: 5432
+
+```
+
+For the PostgreSQL backend, the name of the database is configurable and defined in the [config file](config.md) in `database.name`. The database is structured like this:
+
+- The namespace corresponds to the name of the table.
+- The record identifier is indicated in the *unique* `record_identifier` column in that table.
+- Each result is specified as a column in the table, with the column name corresponding to the result identifier.
+- The values in the cells for a record and result identifier correspond to the actual data values reported for the given result.
+
+![RDB hierarchy](img/db_hierarchy.svg)
+
+
+
+## PEP on PEPhub
+This option lets the user store results on [PEPhub](https://pephub.databio.org/), using it as the backend.
+
+```python
+psm = PipestatManager(pephub_path=pephubpath, schema_path="sample_output_schema.yaml")
+```
+
+
+All three backends *can* be configured using the config file. However, the PostgreSQL backend *must* use a config file.
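As a rough illustration of the hierarchy described in this file (namespace, then record identifier, then result identifier), the sketch below stores the same value in both layouts. It does not call pipestat: the file layout is modeled as a plain nested dict mirroring the YAML example, and the standard-library `sqlite3` module stands in for PostgreSQL. The helpers `lookup_in_file` and `lookup_in_db` are hypothetical names introduced only for this example.

```python
import sqlite3

# File-backend layout, mirroring the YAML example:
# namespace -> level ("project"/"sample") -> record identifier -> result identifier
file_results = {
    "default_pipeline_name": {
        "project": {},
        "sample": {
            "sample_1": {"number_of_things": "12"},
        },
    }
}


def lookup_in_file(results, namespace, record_identifier, result_identifier):
    """Hypothetical helper: walk the nested mapping down to one value."""
    return results[namespace]["sample"][record_identifier][result_identifier]


# Database layout: one table per namespace, a unique record_identifier
# column, and one column per result identifier.
# sqlite3 is used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "default_pipeline_name" ('
    "record_identifier TEXT UNIQUE, number_of_things TEXT)"
)
conn.execute(
    'INSERT INTO "default_pipeline_name" VALUES (?, ?)',
    ("sample_1", "12"),
)


def lookup_in_db(connection, namespace, record_identifier, result_identifier):
    """Hypothetical helper: select one result column for one record."""
    # Identifiers are interpolated for illustration only; never build SQL
    # this way from untrusted input.
    query = (
        f'SELECT "{result_identifier}" FROM "{namespace}" '
        "WHERE record_identifier = ?"
    )
    row = connection.execute(query, (record_identifier,)).fetchone()
    return row[0]


file_value = lookup_in_file(
    file_results, "default_pipeline_name", "sample_1", "number_of_things"
)
db_value = lookup_in_db(
    conn, "default_pipeline_name", "sample_1", "number_of_things"
)
```

Both lookups address the value with the same three identifiers, which is the point of the shared hierarchy; only the storage layout differs.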
@@ -15,14 +15,17 @@ To make your Python pipeline pipestat-compatible, you first need to initialize p
 
 ## Back-end types
 
-Two types of back-ends are currently supported:
+Three types of back-ends are currently supported:
 
 1. a **file** (pass a file path to the constructor)  
 The changes reported using the `report` method of `PipestatManager` will be securely written to the file. Currently only [YAML](https://yaml.org/) format is supported. 
 
 2. a **PostgreSQL database** (pass a path to the pipestat config to the constructor)
 This option gives the user the possibility to use a fully fledged database to back `PipestatManager`. 
 
+3. a **PEP on PEPhub** (pass a PEP path to the constructor, e.g. `psm = PipestatManager(pephub_path=pephubpath)`)
+This option gives the user the possibility to use PEPhub as a backend for results. 
+
 
 ## Initializing a pipestat session
 
 
@@ -5,10 +5,6 @@ This tutorial will show you how pipestat can report not just primitive types, bu
 First create a `pipestat.PipestatManager` object with our example schema:
 
 
-```python
-
-```
-
 
 ```python
 import pipestat
@@ -93,9 +89,3 @@ psm.retrieve_one("sample1", "mydict")['toplevel']['value']
 
     456
 
-
-
-
-```python
-
-```
@@ -44,6 +44,13 @@ Beginning with v0.10.0, there is also support for reporting results directly to
 psm = PipestatManager(pephub_path="databio/pipestat_demo:default", schema_path=my_schema_file_path)
 ```
 
+You can also place this in the configuration file:
+
+```yaml
+pephub_path: "databio/pipestat_demo:default"
+schema_path: sample_output_schema.yaml
+
+```
 
 Apart from that, there are many other *optional* configuration points that have defaults. Please refer to the [environment variables reference](http://pipestat.databio.org/en/dev/env_vars/) to learn about the optional configuration options and their meaning.
 
 
@@ -0,0 +1,32 @@
+
+# Installing pipestat
+
+### Minimal install for file backend
+
+Install pipestat from PyPI with `pip`: 
+
+```
+pip install pipestat
+```
+
+Confirm installation by calling `pipestat -h` on the command line. If the `pipestat` executable is not in your `$PATH`, append this to your `.bashrc` or `.profile` (or `.bash_profile` on macOS):
+
+```console
+export PATH=~/.local/bin:$PATH
+```
+
+### Optional dependencies for database backend
+
+Pipestat can use either a file or a database as the backend for recording results. The default installation provides only the file backend. To install the dependencies required for the database backend:
+
+```
+pip install "pipestat[dbbackend]"
+```
+
+### Optional dependencies for pipestat reader
+
+To install dependencies for the included `pipestatreader` submodule:
+
+```
+pip install "pipestat[pipestatreader]"
+```
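As a complement to `pipestat -h`, the installation can also be checked from within Python. The sketch below is a minimal example using only the standard-library `importlib.metadata`; `installed_version` is a hypothetical helper name, and the only assumption about pipestat is its distribution name on PyPI, as used in the `pip install` commands above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


pipestat_version = installed_version("pipestat")
if pipestat_version is None:
    print("pipestat is not installed in this environment")
else:
    print(f"pipestat {pipestat_version} is installed")
```

This reports only whether the base distribution is present; it does not verify that the optional extras were installed.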