Protenix provides various pre-trained models to suit different computational resources, inference speeds, and prediction accuracy requirements. This document details the characteristics and configurations of these models.
Model names follow the format:

`protenix_{model_size}_{features}_{version}`

- `model_size`: Size of the model: `base`, `mini` (lightweight), or `tiny` (minimal).
- `features`: Functional characteristics such as `default`, `constraint` (distance constraints), or `esm` (includes ESM embeddings). Multiple features are separated by `-`.
- `version`: Version number, e.g., `v0.5.0`, `v1.0.0`.
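The naming scheme can be split mechanically into its three fields. The following is a minimal sketch, not part of Protenix itself: `ModelName` and `parse_model_name` are hypothetical helpers, and the regular expression assumes that the size and feature fields never contain underscores.

```python
import re
from typing import NamedTuple, Tuple


class ModelName(NamedTuple):
    """Parsed pieces of a name like protenix_{model_size}_{features}_{version}."""
    size: str
    features: Tuple[str, ...]
    version: str


# Assumption: size is lowercase letters, features are alphanumeric tokens
# joined by "-", and the version looks like v<major>.<minor>.<patch>.
_NAME_RE = re.compile(
    r"^protenix_(?P<size>[a-z]+)_(?P<features>[A-Za-z0-9-]+)_(?P<version>v\d+\.\d+\.\d+)$"
)


def parse_model_name(name: str) -> ModelName:
    """Split a model name into (size, features, version) or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised model name: {name!r}")
    features = tuple(match["features"].split("-"))
    return ModelName(match["size"], features, match["version"])


if __name__ == "__main__":
    print(parse_model_name("protenix_base_default_v1.0.0"))
    print(parse_model_name("protenix_mini_esm_v0.5.0"))
```

Because features are separated by `-` rather than `_`, splitting on the underscore-delimited fields first keeps the parse unambiguous even if a name carries several features.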
| Model Name | ESM | MSA | Constraint | RNA MSA | Template | Params | Training Data Cutoff |
|---|---|---|---|---|---|---|---|
| `protenix_base_default_v1.0.0` | ❌ | ✅ | ❌ | ✅ | ✅ | 368.48 M | 2021-09-30 |
| `protenix_base_20250630_v1.0.0` \* | ❌ | ✅ | ❌ | ✅ | ✅ | 368.48 M | 2025-06-30 |
| `protenix_base_default_v0.5.0` | ❌ | ✅ | ❌ | ❌ | ❌ | 368.09 M | 2021-09-30 |
| `protenix_base_constraint_v0.5.0` | ❌ | ✅ | ✅ | ❌ | ❌ | 368.30 M | 2021-09-30 |
| `protenix_mini_esm_v0.5.0` | ✅ | ✅ | ❌ | ❌ | ❌ | 135.22 M | 2021-09-30 |
| `protenix_mini_ism_v0.5.0` | ✅ | ✅ | ❌ | ❌ | ❌ | 135.22 M | 2021-09-30 |
| `protenix_mini_default_v0.5.0` | ❌ | ✅ | ❌ | ❌ | ❌ | 134.06 M | 2021-09-30 |
| `protenix_tiny_default_v0.5.0` | ❌ | ✅ | ❌ | ❌ | ❌ | 109.50 M | 2021-09-30 |

\*Note: `protenix_base_20250630_v1.0.0` is trained on wwPDB data up to the 2025-06-30 cutoff and is released to the community for practical applications. For fair benchmarking of model improvements across versions, use `protenix_base_default_v1.0.0` (trained on the 2021-09-30 cutoff).
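For scripted model selection, the table can be held as a small lookup structure. This is an illustrative sketch only: `ModelInfo`, `MODELS`, and `models_supporting` are hypothetical names introduced here, and the values are copied from the table above rather than read from any Protenix API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelInfo:
    """One row of the model table (field names are illustrative)."""
    esm: bool
    msa: bool
    constraint: bool
    rna_msa: bool
    template: bool
    params_m: float  # parameter count in millions
    cutoff: str      # training data cutoff, ISO date


# Values transcribed from the table; insertion order follows the table rows.
MODELS: Dict[str, ModelInfo] = {
    "protenix_base_default_v1.0.0": ModelInfo(False, True, False, True, True, 368.48, "2021-09-30"),
    "protenix_base_20250630_v1.0.0": ModelInfo(False, True, False, True, True, 368.48, "2025-06-30"),
    "protenix_base_default_v0.5.0": ModelInfo(False, True, False, False, False, 368.09, "2021-09-30"),
    "protenix_base_constraint_v0.5.0": ModelInfo(False, True, True, False, False, 368.30, "2021-09-30"),
    "protenix_mini_esm_v0.5.0": ModelInfo(True, True, False, False, False, 135.22, "2021-09-30"),
    "protenix_mini_ism_v0.5.0": ModelInfo(True, True, False, False, False, 135.22, "2021-09-30"),
    "protenix_mini_default_v0.5.0": ModelInfo(False, True, False, False, False, 134.06, "2021-09-30"),
    "protenix_tiny_default_v0.5.0": ModelInfo(False, True, False, False, False, 109.50, "2021-09-30"),
}


def models_supporting(**required: bool) -> List[str]:
    """Return names of models whose feature flags match every keyword given."""
    return [
        name
        for name, info in MODELS.items()
        if all(getattr(info, flag) == value for flag, value in required.items())
    ]


if __name__ == "__main__":
    print(models_supporting(constraint=True))
    print(models_supporting(rna_msa=True, template=True))
    print(min(MODELS, key=lambda name: MODELS[name].params_m))
```

Filtering on `cutoff == "2021-09-30"` in the same way would restrict a benchmark to models trained on the earlier cutoff, in line with the note above.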

- Characteristics: The `base` models are full-parameter models with the highest prediction accuracy.
- Key Configurations:
  - `N_cycle`: 10 (number of recycling iterations).
  - `sample_diffusion.N_step`: 200 (diffusion steps for higher-quality sampling).
- Use Case: Scientific research requiring maximum precision.

- Characteristics: The `mini` and `tiny` models have far fewer parameters than `base` (about 110 M to 135 M versus about 368 M) and run inference faster.
- Key Configurations:
  - `N_cycle`: 4
  - `sample_diffusion.N_step`: 5
- Difference: `mini` has more layers in the Pairformer and Transformer modules than `tiny`.
- Use Case: High-throughput screening or scenarios with limited computational resources.
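The key configurations quoted for the full-size and lightweight models can be compared side by side. The sketch below is an assumption-laden illustration: `INFERENCE_PRESETS`, `preset_for`, and `to_nested` are hypothetical helpers, and only the keys `N_cycle` and `sample_diffusion.N_step` and their values come from this document. How Protenix actually consumes these settings is not shown here.

```python
from typing import Any, Dict

# Values quoted in this document: 10 cycles and 200 diffusion steps for the
# full-size models, 4 cycles and 5 diffusion steps for the lightweight ones.
INFERENCE_PRESETS: Dict[str, Dict[str, int]] = {
    "base": {"N_cycle": 10, "sample_diffusion.N_step": 200},
    "mini": {"N_cycle": 4, "sample_diffusion.N_step": 5},
    "tiny": {"N_cycle": 4, "sample_diffusion.N_step": 5},
}


def preset_for(model_name: str) -> Dict[str, int]:
    """Look up the preset from the size field of protenix_{size}_{features}_{version}."""
    size = model_name.split("_")[1]
    return dict(INFERENCE_PRESETS[size])  # copy so callers can modify it safely


def to_nested(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand dotted keys, e.g. 'sample_diffusion.N_step', into nested dicts."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


if __name__ == "__main__":
    base = preset_for("protenix_base_default_v1.0.0")
    mini = preset_for("protenix_mini_default_v0.5.0")
    print(to_nested(mini))
    # Ratio of diffusion steps between the two presets.
    print(base["sample_diffusion.N_step"] // mini["sample_diffusion.N_step"])
```

With these numbers the lightweight presets run 40 times fewer diffusion steps and 2.5 times fewer recycling iterations per prediction, which is where much of the speed difference described above would come from.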

- Characteristics: The `constraint` models accept additional experimental constraints during inference (e.g., pocket, contact).
- Included Features:
  - `pocket_embedder`: Handles binding pocket information.
  - `contact_embedder`: Handles contact point information.
- Use Case: Predictions with available structural priors.

- Characteristics: The `esm` and `ism` models integrate a single-sequence protein language model (ESM2-3B), which improves predictions when MSAs are unavailable.
- Difference: `esm` uses standard ESM2 embeddings, while `ism` uses ISM embeddings.
- Note: For efficiency, these models do not use MSA by default.
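The use cases listed for each model family can be condensed into a simple decision rule. This is one possible heuristic and not an official recommendation: `choose_model` is a hypothetical helper, and the priority order (constraints first, then missing MSAs, then compute budget) is an assumption made for illustration.

```python
def choose_model(has_msa: bool = True,
                 has_constraints: bool = False,
                 low_compute: bool = False) -> str:
    """Pick a model name from the use cases described in this document.

    The priority order is an illustrative assumption, not project guidance.
    """
    if has_constraints:
        # Structural priors (pocket or contact) need the constraint model.
        return "protenix_base_constraint_v0.5.0"
    if not has_msa:
        # Language-model embeddings help when no MSA is available.
        return "protenix_mini_esm_v0.5.0"
    if low_compute:
        # Lightweight model for high-throughput or resource-limited runs.
        return "protenix_mini_default_v0.5.0"
    # Default: full-size model for maximum accuracy.
    return "protenix_base_default_v1.0.0"


if __name__ == "__main__":
    print(choose_model())
    print(choose_model(has_msa=False))
    print(choose_model(has_constraints=True))
    print(choose_model(low_compute=True))
```

A real pipeline might weigh these factors differently, for example preferring `protenix_tiny_default_v0.5.0` when throughput matters more than accuracy.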