SAMFEO-SD

An enhanced RNA sequence design framework integrating SAMFEO optimization with SamplingDesign probability weights.

🧬 Overview

This repository implements a hybrid approach to RNA sequence design. It builds upon the SAMFEO (Self-Adaptive Multi-objective Feature Extraction and Optimization) framework but fundamentally alters the mutation operator.

In standard evolutionary algorithms, nucleotide mutation is typically stochastic (random). This project integrates the SamplingDesign algorithm to extract structural probability weights. Instead of mutating nucleotides randomly, the algorithm samples new bases according to these calculated weights, guiding the search process toward more thermodynamically favorable and structurally sound conformations.

🚀 Key Innovation

The Problem

Original approaches often utilize a Uniform Mutation Strategy: $$P(A) = P(U) = P(G) = P(C) = 0.25$$ or simply swap base pairs uniformly. This "blind" mutation treats positions independently or as simple pairs, often disrupting complex stable substructures (like specific mismatch loops or triloops) and leading to slow convergence.

The Solution: Structured Motif Mutation

We utilize SamplingDesign to calculate specific joint probability weights ($W$) for coupled structural elements. The mutation operator is not limited to single nucleotides; it simultaneously updates entire motifs based on the target secondary structure:

Canonical Pairs: $P(N_i, N_j) \propto W_{pair}(N_i, N_j)$
Mismatches & Tri-mismatches: $P(N_i, N_j, ...) \propto W_{motif}(N_i, N_j, ...)$
Single Bases: $P(N_i) \propto W_{single}(N_i)$

By sampling these grouped values directly from the SamplingDesign output, the algorithm preserves critical structural correlations that random mutation would otherwise destroy.

📊 Performance

We evaluated SAMFEO-SD on the Eterna100 benchmark using Boltzmann probability as the objective function. Across 5 independent runs:

Metric	Score
Average NED (Normalized Ensemble Defect)	0.043
Average PD (Probability Defect)	0.580

📦 Installation

Clone the repository

git clone https://github.com/your-username/SAMFEO-SD.git
cd SAMFEO-SD

Install Python Dependencies
```
pip install numpy pandas
```
External Dependencies This project relies on RNA folding libraries (likely ViennaRNA) found in the utils/ directory. Ensure your environment is set up to support these calls.

⚙️ Configuration (Important)

Weight File Path: The integration with SamplingDesign relies on pre-calculated weight files. Currently, the code in main.py (function samfeo) points to a specific directory structure:

path = f"D:/ether/results/eterna100_targeted_time/{id}.txt"

Before running, you must either:

Place your SamplingDesign output files in this directory.
Or modify main.py to point to your actual data directory.

💻 Usage

To reproduce the results or run the design on the Eterna100 dataset using parallel processing:

python main.py --t 1 --k 10 --object pd --para --repeat 5 --path data/eterna/eterna100_v1.txt

📄 Input Formats

Puzzle File (`-p`)

The input file should be a text file where each line contains an ID and a dot-bracket structure:

Puzzle_1  ((((....))))
Puzzle_2  ((......))..

SamplingDesign Output

The code expects weight files to contain a "Final Distribution" section. This distribution may define probabilities for single bases, pairs, or larger groups (mismatches):

Final Distribution
(0,): A 0.99, C 0.01
(1, 10): AU 0.5, GC 0.5
(2, 5, 9): AGA 0.4, UCU 0.3
...

🔧 Arguments

Flag	Description	Default
`-p`, `--path`	Path to the input file containing puzzle IDs and structures.	`''`
`--object`, `-o`	Objective function: `pd` (Probability Defect) or `ned` (Normalized Ensemble Defect).	`pd`
`--step`	Number of optimization steps (iterations).	`5000`
`--k`	Population size (beam width).	`10`
`--t`	Temperature for Boltzmann selection.	`1.0`
`--para`	Enable parallel processing (multiprocessing).	`False`
`--repeat`	Number of times to repeat the experiment (runs).	`1`
`--worker_count`	Number of CPU workers for parallel mode.	`10`
`--nosm`	Disable structured mutation (revert to random/traditional mutation).	`False`
`--nomfe`	Skip MFE (Minimum Free Energy) check (faster, less accurate).	`False`

📂 Repository Structure

main.py: The core entry point containing the samfeo loop and mutate_structured logic.
/utils: Contains helper modules (vienna.py, structure.py) for energy calculations.

🔗 References

SAMFEO Approach (ISMB 2023): Zhou, T., Dai, N., Li, S., Ward, M., Mathews, D.H. and Huang, L., 2023. RNA design via structure-aware multifrontier ensemble optimization. Bioinformatics, 39(Supplement_1), pp.i563-i571. https://github.com/shanry/SAMFEO
SamplingDesign: SamplingDesign: RNA Design via Continuous Optimization with Coupled Variables and Monte-Carlo Sampling. Wei Yu Tang, Ning Dai, Tianshuo Zhou, David H. Mathews, and Liang Huang*. https://github.com/weiyutang1010/SamplingDesign

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data		data
utils		utils
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SAMFEO-SD

🧬 Overview

🚀 Key Innovation

The Problem

The Solution: Structured Motif Mutation

📊 Performance

📦 Installation

⚙️ Configuration (Important)

💻 Usage

📄 Input Formats

Puzzle File (`-p`)

SamplingDesign Output

🔧 Arguments

📂 Repository Structure

🔗 References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SAMFEO-SD

🧬 Overview

🚀 Key Innovation

The Problem

The Solution: Structured Motif Mutation

📊 Performance

📦 Installation

⚙️ Configuration (Important)

💻 Usage

📄 Input Formats

Puzzle File (-p)

SamplingDesign Output

🔧 Arguments

📂 Repository Structure

🔗 References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Puzzle File (`-p`)

Packages