An enhanced RNA sequence design framework integrating SAMFEO optimization with SamplingDesign probability weights.
This repository implements a hybrid approach to RNA sequence design. It builds upon the SAMFEO (Structure-Aware Multi-Frontier Ensemble Optimization) framework but replaces the mutation operator.
In standard evolutionary algorithms, nucleotide mutation is typically uniform: each replacement base is drawn at random with no regard for its structural context. This project integrates the SamplingDesign algorithm to extract structural probability weights. Instead of mutating nucleotides uniformly, the algorithm samples new bases according to these weights, guiding the search toward sequences that fold more favorably into the target structure.
Original approaches often use a uniform mutation strategy, in which every candidate nucleotide is equally likely.
We use SamplingDesign to calculate joint probability weights for each group of coupled positions:

- Canonical Pairs: $P(N_i, N_j) \propto W_{pair}(N_i, N_j)$
- Mismatches & Tri-mismatches: $P(N_i, N_j, \dots) \propto W_{motif}(N_i, N_j, \dots)$
- Single Bases: $P(N_i) \propto W_{single}(N_i)$

By sampling these grouped values directly from the SamplingDesign output, the algorithm preserves critical structural correlations that random mutation would otherwise destroy.
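
The grouped sampling described above can be sketched as follows. This is a minimal illustration, not the repository's `mutate_structured` implementation: the function name `mutate_group`, the `weights` dictionary layout, and the example values are all assumptions made for this sketch.

```python
import random

# Hypothetical weights for illustration; real values would come from SamplingDesign output.
# Each key is a tuple of positions that are resampled together.
weights = {
    (0,): {"A": 0.99, "C": 0.01},
    (1, 10): {"AU": 0.5, "GC": 0.5},
}


def mutate_group(seq, positions, dist, rng=random):
    """Resample all positions of one group jointly from its weight table."""
    choices = list(dist.keys())
    probs = list(dist.values())
    # random.choices accepts relative weights, so they only need to be proportional.
    new_bases = rng.choices(choices, weights=probs, k=1)[0]
    out = list(seq)
    for pos, base in zip(positions, new_bases):
        out[pos] = base
    return "".join(out)


rng = random.Random(0)
seq = "GGGGAAAACCCC"  # toy sequence for the structure ((((....))))
mutated = mutate_group(seq, (1, 10), weights[(1, 10)], rng)
print(mutated)
```

Because positions 1 and 10 are drawn as one unit, the result always carries a pair listed in the weight table, which is the correlation that independent per-position mutation would break.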
We evaluated SAMFEO-SD on the Eterna100 benchmark using Boltzmann probability as the objective function. Across 5 independent runs:
| Metric | Score |
|---|---|
| Average NED (Normalized Ensemble Defect) | 0.043 |
| Average PD (Probability Defect) | 0.580 |
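
For readers unfamiliar with the two metrics, the sketch below computes them from their standard definitions. It assumes a base-pair probability matrix `bpp` and a target-structure probability are already available (for example from a folding library); the function names are hypothetical and this is not the repository's evaluation code.

```python
def parse_pairs(structure):
    """Return partner[i] = j for paired positions and -1 for unpaired ones."""
    partner = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def normalized_ensemble_defect(bpp, structure):
    """Expected fraction of positions whose pairing state differs from the target.

    bpp[i][j] is the equilibrium probability that i pairs with j (symmetric matrix).
    """
    n = len(structure)
    partner = parse_pairs(structure)
    correct = 0.0
    for i in range(n):
        if partner[i] >= 0:
            correct += bpp[i][partner[i]]      # probability of the target pair
        else:
            correct += 1.0 - sum(bpp[i])       # probability that i stays unpaired
    return (n - correct) / n


def probability_defect(target_probability):
    """One minus the Boltzmann probability of the target structure."""
    return 1.0 - target_probability
```

Both values lie in [0, 1], and lower is better.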
- Clone the repository

      git clone https://github.com/your-username/SAMFEO-SD.git
      cd SAMFEO-SD

- Install Python Dependencies

      pip install numpy pandas

- External Dependencies: This project relies on RNA folding libraries (likely ViennaRNA) through the helper modules in the `utils/` directory. Ensure your environment is set up to support these calls.

Weight File Path:
The integration with SamplingDesign relies on pre-calculated weight files. Currently, the code in `main.py` (function `samfeo`) points to a specific directory structure:

    path = f"D:/ether/results/eterna100_targeted_time/{id}.txt"

Before running, you must either:
- Place your SamplingDesign output files in this directory.
- Or modify `main.py` to point to your actual data directory.
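
If you choose to modify `main.py`, one option is to build the path from a configurable directory rather than a hard-coded drive letter. The following is a sketch only: the environment variable name `SAMFEO_SD_WEIGHT_DIR`, the helper `weight_file_path`, and the default directory are assumptions and do not exist in the repository.

```python
import os
from pathlib import Path

# Hypothetical environment variable name; the repository does not define it.
WEIGHT_DIR_ENV = "SAMFEO_SD_WEIGHT_DIR"


def weight_file_path(puzzle_id, default_dir="results/eterna100_targeted_time"):
    """Build the path to one puzzle's weight file under a configurable directory."""
    base = Path(os.environ.get(WEIGHT_DIR_ENV, default_dir))
    return base / f"{puzzle_id}.txt"
```

With this helper, setting `SAMFEO_SD_WEIGHT_DIR` before launching the script would redirect all weight-file lookups without further code changes.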
To reproduce the results or run the design on the Eterna100 dataset using parallel processing:

    python main.py --t 1 --k 10 --object pd --para --repeat 5 --path data/eterna/eterna100_v1.txt

The input file should be a text file where each line contains an ID and a dot-bracket structure:

    Puzzle_1 ((((....))))
    Puzzle_2 ((......))..

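
A minimal sketch of reading this input format is shown below. The function names are hypothetical and the balance check is an added safeguard for illustration; the repository's own loader may behave differently.

```python
def is_balanced(structure):
    """Check that every ')' closes an earlier '(' and none are left open."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def read_puzzles(lines):
    """Parse 'ID structure' lines into (id, structure) tuples, skipping blank lines."""
    puzzles = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The structure is taken as the last whitespace-separated token.
        puzzle_id, structure = line.rsplit(maxsplit=1)
        if not is_balanced(structure):
            raise ValueError(f"unbalanced structure for {puzzle_id}: {structure}")
        puzzles.append((puzzle_id, structure))
    return puzzles


puzzles = read_puzzles(["Puzzle_1 ((((....))))", "", "Puzzle_2 ((......)).."])
```
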
The code expects weight files to contain a "Final Distribution" section. This distribution may define probabilities for single bases, pairs, or larger groups (mismatches):

    Final Distribution
    (0,): A 0.99, C 0.01
    (1, 10): AU 0.5, GC 0.5
    (2, 5, 9): AGA 0.4, UCU 0.3
    ...

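
The "Final Distribution" format above can be parsed along these lines. This is a sketch that assumes the section looks exactly like the sample (a tuple of positions, a colon, then comma-separated `bases probability` entries); the function name `parse_final_distribution` is hypothetical and the repository's parser may differ.

```python
import ast


def parse_final_distribution(text):
    """Map each position tuple to a {bases: probability} dictionary."""
    dist = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Final Distribution":
            in_section = True
            continue
        # Ignore everything before the section and any line that is not an entry.
        if not in_section or not line.startswith("("):
            continue
        key_part, _, value_part = line.partition("):")
        positions = ast.literal_eval(key_part + ")")  # e.g. "(1, 10)" -> (1, 10)
        entries = {}
        for item in value_part.split(","):
            if not item.strip():
                continue
            bases, prob = item.split()
            entries[bases] = float(prob)
        dist[positions] = entries
    return dist


sample = """Final Distribution
(0,): A 0.99, C 0.01
(1, 10): AU 0.5, GC 0.5
(2, 5, 9): AGA 0.4, UCU 0.3
"""
dist = parse_final_distribution(sample)
```

The resulting dictionary keeps coupled positions together under one key, so a pair or mismatch group can be resampled as a single unit.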
| Flag | Description | Default |
|---|---|---|
| `-p`, `--path` | Path to the input file containing puzzle IDs and structures. | `''` |
| `--object`, `-o` | Objective function: `pd` (Probability Defect) or `ned` (Normalized Ensemble Defect). | `pd` |
| `--step` | Number of optimization steps (iterations). | 5000 |
| `--k` | Population size (beam width). | 10 |
| `--t` | Temperature for Boltzmann selection. | 1.0 |
| `--para` | Enable parallel processing (multiprocessing). | False |
| `--repeat` | Number of times to repeat the experiment (runs). | 1 |
| `--worker_count` | Number of CPU workers for parallel mode. | 10 |
| `--nosm` | Disable structured mutation (revert to random/traditional mutation). | False |
| `--nomfe` | Skip MFE (Minimum Free Energy) check (faster, less accurate). | False |
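
To illustrate what the `--t` temperature controls, here is a generic sketch of Boltzmann selection over a small population. It assumes that lower objective values are better (as with a defect) and that candidates are `(sequence, objective)` pairs; the function name `boltzmann_select` is hypothetical and this is not taken from `main.py`.

```python
import math
import random


def boltzmann_select(candidates, t=1.0, rng=random):
    """Pick one (sequence, objective) pair with probability proportional to exp(-objective / t)."""
    best = min(obj for _, obj in candidates)
    # Subtracting the minimum keeps every exponent non-positive, which avoids overflow.
    weights = [math.exp(-(obj - best) / t) for _, obj in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


population = [("GGGGAAAACCCC", 0.1), ("GCGCAAAAGCGC", 0.9)]
chosen = boltzmann_select(population, t=1.0, rng=random.Random(0))
```

A small temperature concentrates selection on the best candidate, while a large one makes the choice closer to uniform.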
- `main.py`: The core entry point containing the `samfeo` loop and `mutate_structured` logic.
- `/utils`: Contains helper modules (`vienna.py`, `structure.py`) for energy calculations.

- SAMFEO Approach (ISMB 2023): Zhou, T., Dai, N., Li, S., Ward, M., Mathews, D.H. and Huang, L., 2023. RNA design via structure-aware multifrontier ensemble optimization. Bioinformatics, 39(Supplement_1), pp.i563-i571. https://github.com/shanry/SAMFEO
- SamplingDesign: Tang, W.Y., Dai, N., Zhou, T., Mathews, D.H. and Huang, L. SamplingDesign: RNA Design via Continuous Optimization with Coupled Variables and Monte-Carlo Sampling. https://github.com/weiyutang1010/SamplingDesign