RDEToolKit is a fundamental Python package for creating workflows of RDE-structured programs. By utilizing various modules provided by RDEToolKit, you can easily build processes for registering research and experimental data into RDE. Additionally, by combining RDEToolKit with Python modules used in your research or experiments, you can achieve a wide range of tasks, from data registration to processing and visualization.
See the documentation for more details.
If you wish to make changes, please read the following document first:
- Python: 3.10 or higher
!!! note "Python 3.9 Support Removed"
    Python 3.9 support was removed in rdetoolkit 1.6.x. If you need Python 3.9 support, use rdetoolkit 1.5.x or earlier.
To install, run the following command:
pip install rdetoolkit
Below is an example of building an RDE-structured program.
First, prepare the necessary files for the RDE-structured program. Run the following command in your terminal or shell:
python3 -m rdetoolkit init
If the command runs successfully, the following files and directories will be generated.
In this example, development proceeds within a directory named container.
- requirements.txt
  - Add any Python packages you wish to use for building the structured program. Run pip install as needed.
- modules
  - Store programs you want to use for structuring processing here. Details are explained in a later section.
- main.py
  - Defines the entry point for the structured program.
- data/inputdata
  - Place data files to be processed here.
- data/invoice
  - Required even as an empty file for local execution.
- data/tasksupport
  - Place supporting files for structuring processing here.
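As a quick sanity check after running the init command, the entries listed above can be verified with a few lines of standard-library Python. This is only a sketch: the helper name `missing_entries` and the constant `EXPECTED_ENTRIES` are hypothetical and not part of RDEToolKit; the entry names are taken from the list above.

```python
from pathlib import Path

# Entries described in the list above, relative to the project directory.
EXPECTED_ENTRIES = (
    "requirements.txt",
    "modules",
    "main.py",
    "data/inputdata",
    "data/invoice",
    "data/tasksupport",
)


def missing_entries(project_dir: Path) -> list[str]:
    """Return the expected entries that do not exist under project_dir."""
    return [name for name in EXPECTED_ENTRIES if not (project_dir / name).exists()]
```

Calling `missing_entries(Path("container"))` returns an empty list when everything is in place. The generated layout itself is shown in the tree that follows.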
container
├── data
│   ├── inputdata
│   ├── invoice
│   │   └── invoice.json
│   └── tasksupport
│       ├── invoice.schema.json
│       └── metadata-def.json
├── main.py
├── modules
└── requirements.txt
You can process input data (e.g., data transformation, visualization, creation of CSV files for machine learning) and register the results into RDE. By following the format below, you can incorporate your own processing into the RDE structured workflow.
The recommended signature for the dataset() function accepts a single
RdeDatasetPaths argument that bundles both input and output locations. The
legacy two-argument style (RdeInputDirPaths, RdeOutputResourcePath) remains
available for backward compatibility.
from rdetoolkit.models.rde2types import RdeDatasetPaths

def dataset(paths: RdeDatasetPaths) -> None:
    ...
In this example, we define a dummy function display_message() under modules to demonstrate how to implement custom structuring processing. Create a file named modules/modules.py as follows:
# modules/modules.py
from rdetoolkit.models.rde2types import RdeDatasetPaths

def display_message(path):
    print(f"Test Message!: {path}")

def dataset(paths: RdeDatasetPaths) -> None:
    display_message(paths.inputdata)
    display_message(paths.struct)
Next, use rdetoolkit.workflows.run() to define the entry point. The main tasks performed in the entry point are:
- Checking input files
- Obtaining the directory paths specified by the RDE structure
- Executing user-defined structuring processing
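To make the three tasks above concrete, here is a minimal stand-in written with the standard library only. It is not RDEToolKit's implementation: `SketchPaths` and `run_sketch` are hypothetical names, and the output directory `structured` is an assumption made for illustration. Only the attribute names `inputdata` and `struct` are borrowed from the example earlier in this document.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class SketchPaths:
    """Hypothetical stand-in for the paths object passed to dataset()."""

    inputdata: Path
    struct: Path


def run_sketch(
    data_dir: Path,
    custom_dataset_function: Optional[Callable[[SketchPaths], None]] = None,
) -> list[Path]:
    """Illustrate the three entry-point tasks; returns the input files found."""
    # 1. Check input files.
    input_dir = data_dir / "inputdata"
    if not input_dir.is_dir():
        raise FileNotFoundError(f"input directory not found: {input_dir}")
    input_files = sorted(p for p in input_dir.iterdir() if p.is_file())

    # 2. Obtain the directory paths (the output location is an assumption).
    struct_dir = data_dir / "structured"
    struct_dir.mkdir(parents=True, exist_ok=True)
    paths = SketchPaths(inputdata=input_dir, struct=struct_dir)

    # 3. Execute the user-defined structuring processing, if one was given.
    if custom_dataset_function is not None:
        custom_dataset_function(paths)
    return input_files
```

In a real project none of this is written by hand; main.py simply delegates to the toolkit, as in the entry point that follows.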
import rdetoolkit
from modules.modules import dataset # User-defined structuring processing function
# Pass the user-defined structuring processing function as an argument
rdetoolkit.workflows.run(custom_dataset_function=dataset)
If you do not wish to pass a custom structuring processing function, define the entry point as follows:
import rdetoolkit
rdetoolkit.workflows.run()
To debug or test the RDE structured process in your local environment, add the necessary input data to the data directory. The data directory must sit at the same level as main.py, as in the layout below:
container/
├── main.py
├── requirements.txt
├── modules/
│   └── modules.py
└── data/
    ├── inputdata/
    │   └── <experimental data to process>
    ├── invoice/
    │   └── invoice.json
    └── tasksupport/
        ├── metadata-def.json
        └── invoice.schema.json
RDEToolKit provides validation commands to verify the structure and correctness of your RDE project files. These commands help catch configuration errors early and can be integrated into CI/CD pipelines.
rdetoolkit validate invoice-schema data/tasksupport/invoice.schema.json
rdetoolkit validate invoice data/invoice/invoice.json \
    --schema data/tasksupport/invoice.schema.json
rdetoolkit validate metadata-def data/tasksupport/metadata-def.json
rdetoolkit validate metadata data/metadata.json \
    --schema data/tasksupport/metadata-def.json
Validate all standard files in your project at once:
# Validate all files in current directory
rdetoolkit validate all
# Validate all files in specific project
rdetoolkit validate all /path/to/project
# Use JSON output for CI/CD integration
rdetoolkit validate all --format json
For more details, see the validation documentation.
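For CI/CD use, the JSON-format command can be wrapped in a small Python script. The sketch below is hedged: `run_validation` is a hypothetical helper, the structure of the JSON report is not assumed (it is parsed and returned as-is), and treating a non-zero exit code as a failure is an assumption rather than documented behavior.

```python
import json
import subprocess
import sys
from typing import Any, Optional, Sequence

# Command taken from the examples above.
DEFAULT_COMMAND = ("rdetoolkit", "validate", "all", "--format", "json")


def run_validation(
    command: Sequence[str] = DEFAULT_COMMAND,
) -> tuple[int, Optional[Any]]:
    """Run the command and return (exit code, parsed JSON output or None)."""
    result = subprocess.run(
        list(command), capture_output=True, text=True, check=False
    )
    try:
        report = json.loads(result.stdout)
    except json.JSONDecodeError:
        # Output was not valid JSON; leave interpretation to the caller.
        report = None
    return result.returncode, report
```

A CI step could call `code, report = run_validation()` and finish with `sys.exit(code)`, so that the pipeline fails whenever the command reports a non-zero exit code (again, an assumption about how failures are signaled).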