nims-mdpf/rdetoolkit

Japanese documentation

RDEToolKit

RDEToolKit is a fundamental Python package for building workflows for RDE structured programs. Its modules let you build the processes that register research and experimental data into RDE. Combined with the Python modules you already use in your research or experiments, it covers a wide range of tasks, from data registration to processing and visualization.

Documents

See the documentation for more details.

Contributing

If you wish to make changes, please read the following document first:

Requirements

  • Python: 3.10 or higher

Note: Python 3.9 support was removed in rdetoolkit 1.6.x. If you need Python 3.9 support, use rdetoolkit 1.5.x or earlier.
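
Because older interpreters are not supported, an entry-point script can fail fast with a readable message instead of a later import error. The following is a minimal sketch using only the standard library; the helper names `is_supported_python` and `require_supported_python` are hypothetical and not part of rdetoolkit.

```python
import sys

# Minimum interpreter version stated in the requirements above.
MIN_PYTHON = (3, 10)


def is_supported_python(version_info=None) -> bool:
    """Return True if the given (or running) interpreter meets MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def require_supported_python() -> None:
    """Exit with a readable message on an unsupported interpreter."""
    if not is_supported_python():
        sys.exit(
            "Python 3.10 or higher is required; "
            "on Python 3.9, use rdetoolkit 1.5.x or earlier."
        )
```

Calling `require_supported_python()` at the top of a script is one possible placement; it does nothing on a supported interpreter.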

Install

To install, run the following command:

pip install rdetoolkit

Usage

Below is an example of building an RDE-structured program.

Create a Project

First, prepare the necessary files for the RDE-structured program. Run the following command in your terminal or shell:

python3 -m rdetoolkit init

If the command runs successfully, the following files and directories will be generated.

In this example, development proceeds within a directory named container.

  • requirements.txt
    • Add any Python packages you wish to use for building the structured program. Run pip install as needed.
  • modules
    • Store programs you want to use for structuring processing here. Details are explained in a later section.
  • main.py
    • Defines the entry point for the structured program.
  • data/inputdata
    • Place data files to be processed here.
  • data/invoice
    • Holds invoice.json, which is required for local execution even if the file is empty.
  • data/tasksupport
    • Place supporting files for structuring processing here.

container
├── data
│   ├── inputdata
│   ├── invoice
│   │   └── invoice.json
│   └── tasksupport
│       ├── invoice.schema.json
│       └── metadata-def.json
├── main.py
├── modules
└── requirements.txt
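
To confirm that a project directory matches the tree above, a small standard-library check can help. This is only a sketch: the entry list is copied from the tree, while the helper name `missing_entries` is hypothetical and not something rdetoolkit provides.

```python
from pathlib import Path

# Files and directories listed in the tree above, relative to the project root.
EXPECTED_ENTRIES = (
    "main.py",
    "requirements.txt",
    "modules",
    "data/inputdata",
    "data/invoice/invoice.json",
    "data/tasksupport/invoice.schema.json",
    "data/tasksupport/metadata-def.json",
)


def missing_entries(project_root):
    """Return the expected entries that do not exist under project_root."""
    root = Path(project_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

For a complete project, `missing_entries("container")` returns an empty list; otherwise it returns the relative paths that still need to be created.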

Implementing Structuring Processing

You can process input data (e.g., data transformation, visualization, creation of CSV files for machine learning) and register the results into RDE. By following the format below, you can incorporate your own processing into the RDE structured workflow.

The recommended signature for the dataset() function accepts a single RdeDatasetPaths argument that bundles both input and output locations. The legacy two-argument style (RdeInputDirPaths, RdeOutputResourcePath) remains available for backward compatibility.

from rdetoolkit.models.rde2types import RdeDatasetPaths

def dataset(paths: RdeDatasetPaths) -> None:
    ...

In this example, we define a dummy function display_message() under modules to demonstrate how to implement custom structuring processing. Create a file named modules/modules.py as follows:

# modules/modules.py
from rdetoolkit.models.rde2types import RdeDatasetPaths


def display_message(path):
    print(f"Test Message!: {path}")


def dataset(paths: RdeDatasetPaths) -> None:
    display_message(paths.inputdata)
    display_message(paths.struct)
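
To go one step beyond printing paths, the sketch below swaps the dummy function for a small piece of real processing: it writes a CSV listing of the input files into the structured-output directory. It assumes that `paths.inputdata` and `paths.struct` are path-like directory locations; the attribute names come from the example above, but their exact types are an assumption. The helper name `write_file_listing` and the output file name are hypothetical. The type annotation is omitted so the block runs without rdetoolkit installed.

```python
import csv
from pathlib import Path


def write_file_listing(input_dir, output_dir, name="file_listing.csv"):
    """Write one CSV row (file name, size in bytes) per file in input_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / name
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "size_bytes"])
        # Sort for a stable row order; skip subdirectories.
        for path in sorted(p for p in input_dir.iterdir() if p.is_file()):
            writer.writerow([path.name, path.stat().st_size])
    return out_path


def dataset(paths) -> None:
    # `paths` is expected to be an RdeDatasetPaths; only the two attributes
    # used in the example above are accessed here (assumed to be directories).
    write_file_listing(paths.inputdata, paths.struct)
```

For a quick dry run, any object exposing `inputdata` and `struct` attributes works, for example a `types.SimpleNamespace` pointing at two local directories.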

About the Entry Point

Next, use rdetoolkit.workflows.run() to define the entry point. The main tasks performed in the entry point are:

  • Checking input files
  • Obtaining various directory paths as specified by RDE structure
  • Executing user-defined structuring processing

import rdetoolkit
from modules.modules import dataset  # User-defined structuring processing function

# Pass the user-defined structuring processing function as an argument
rdetoolkit.workflows.run(custom_dataset_function=dataset)

If you do not need a custom structuring processing function, define the entry point as follows:

import rdetoolkit

rdetoolkit.workflows.run()

Running in a Local Environment

To debug or test the RDE structured process in your local environment, add the necessary input data to the data directory. The data directory must sit at the same level as main.py, as in the layout below:

container/
├── main.py
├── requirements.txt
├── modules/
│   └── modules.py
└── data/
    ├── inputdata/
    │   └── <experimental data to process>
    ├── invoice/
    │   └── invoice.json
    └── tasksupport/
        ├── metadata-def.json
        └── invoice.schema.json
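
Staging input data for a local run can be scripted. The sketch below copies a file into data/inputdata under the project root, creating the directory if needed. The helper name `stage_input` is hypothetical, and the default project root `container` simply follows the example layout above.

```python
import shutil
from pathlib import Path


def stage_input(source_file, project_root="container"):
    """Copy source_file into <project_root>/data/inputdata; return the new path."""
    inputdata = Path(project_root) / "data" / "inputdata"
    inputdata.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata such as the modification time.
    return Path(shutil.copy2(source_file, inputdata))
```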

Validating RDE Files

RDEToolKit provides validation commands to verify the structure and correctness of your RDE project files. These commands help catch configuration errors early and can be integrated into CI/CD pipelines.

Validate Invoice Schema

rdetoolkit validate invoice-schema data/tasksupport/invoice.schema.json

Validate Invoice Data

rdetoolkit validate invoice data/invoice/invoice.json \
  --schema data/tasksupport/invoice.schema.json

Validate Metadata Definition

rdetoolkit validate metadata-def data/tasksupport/metadata-def.json

Validate Metadata Data

rdetoolkit validate metadata data/metadata.json \
  --schema data/tasksupport/metadata-def.json

Batch Validation

Validate all standard files in your project at once:

# Validate all files in current directory
rdetoolkit validate all

# Validate all files in specific project
rdetoolkit validate all /path/to/project

# Use JSON output for CI/CD integration
rdetoolkit validate all --format json
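
For CI/CD use, the commands above can be wrapped in a small Python helper. This is a sketch under two assumptions that the text above does not confirm: that the CLI exits with status 0 only when validation passes, and that the project path and `--format json` can be combined in one call. The function names are hypothetical, and the JSON output is returned unparsed because its schema is not described here.

```python
import subprocess


def build_validate_command(project_dir=None, json_output=False):
    """Build the `rdetoolkit validate all` command line shown above."""
    command = ["rdetoolkit", "validate", "all"]
    if project_dir is not None:
        command.append(str(project_dir))
    if json_output:
        command += ["--format", "json"]
    return command


def run_validation(project_dir=None):
    """Run batch validation and return (passed, captured stdout).

    Assumption: exit status 0 means every file validated successfully.
    """
    result = subprocess.run(
        build_validate_command(project_dir, json_output=True),
        capture_output=True,
        text=True,
        check=False,  # inspect the return code instead of raising
    )
    return result.returncode == 0, result.stdout
```

A CI step could call `run_validation()` and fail the job when the first element of the returned tuple is false.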

For more details, see the validation documentation.
