Roadmap for LightRFT v0.1.2 #28

# 🗺️ Roadmap for LightRFT v0.1.2
**Expected Release:** Feb. 2026

### ✨ New Features
*   **Algorithms**
    *   Add support for **GSPO** and **GMPO** algorithms (#22).
    *   Add support for **NeighbourGRPO**.
    *   Implement **On-policy Distillation**.
*   **Multimodal Support & Demos**
    *   **T2I Pipeline:** Add a rejection sampling pipeline to the T2I (Text-to-Image) demo (#3).
    *   **VLM Demos:**
        *   **Meme RL Training:** VLM demo using a **Reward Model**.
        *   **Metaphorstar (Chenhao's Work):** VLM RL training demo using **Rule Reward**.
    *   **Omni Models:**
        *   **Omni RL Training (Jieyi's Work):** Demo using **Rule Reward**.
    *   **Generative Media:**
        *   **T2I/T2V RL Training:** Demo for Text-to-Image/Video models using a **Reward Model**.
*   **Training Strategies**
    *   Implement **Partial Rollout** in the training process (#29).
    *   Add **PPO** support.
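
For readers unfamiliar with GSPO, the sketch below illustrates its core idea as described in the GSPO paper: one length-normalized, sequence-level importance ratio per response, clipped against group-normalized advantages. This is not LightRFT's implementation. The function names and the `clip_eps` default are illustrative assumptions.

```python
import math
from typing import List


def sequence_ratio(new_logps: List[float], old_logps: List[float]) -> float:
    """Length-normalized sequence-level ratio: exp(mean_t(log pi_new - log pi_old))."""
    assert new_logps and len(new_logps) == len(old_logps)
    mean_log_ratio = sum(n - o for n, o in zip(new_logps, old_logps)) / len(new_logps)
    return math.exp(mean_log_ratio)


def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of responses to the same prompt."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_sequence_objective(
    ratios: List[float], advantages: List[float], clip_eps: float = 0.2
) -> float:
    """Average of min(s * A, clip(s, 1 - eps, 1 + eps) * A) over the group.

    clip_eps is an arbitrary illustrative value, not a recommended setting.
    """
    terms = []
    for s, a in zip(ratios, advantages):
        clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(s * a, clipped * a))
    return sum(terms) / len(terms)
```

Because the ratio is computed once per response rather than once per token, every token in a response shares the same clipping decision. That is the main difference from token-level GRPO-style objectives.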

### ♻️ Refactoring & Optimization
*   **Core Logic (Loss & Filtering)**
    *   **Modular Loss-Filter:** Refactor implementation into `metrics`, `filters`, `weights`, and `manager` modules (#17).
    *   **Advantage Calculation:** Refactor the core advantage calculation logic for better performance and maintainability (#16).
    *   **Loss Calculation:** Move loss calculation logic from Trainer to Model scope.
*   **Architecture & Interfaces**
    *   **Dataset & Reward:** Refactor Dataset and Reward modules for better modularity (#13).
    *   **Model Interface:** Standardize `generate` methods and hyperparameters across all models (aligning with `grm_vl`).
    *   **Token Alignment:** Unify token interfaces between Actor and Reward Model to minimize conversion overhead.
    *   **Critic:** Refactor and enhance Critic model implementation.
*   **Data Pipeline**
    *   **Dataclasses:** Unify dataset return formats using Dataclasses to simplify Trainer/ExpMaker.
    *   **Logic Separation:** Remove strategy logic from Datasets and standardize batch padding locations.
*   **Performance**
    *   Improve the efficiency of entropy and logit calculations.
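
As a rough illustration of the Data Pipeline items above, the following sketch shows what a dataclass-based return format with a single padding location could look like. All names here (`PromptSample`, `PromptBatch`, `collate`) are hypothetical and do not come from the LightRFT codebase. Left-padding is assumed only because it is common for generation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptSample:
    """One dataset item (hypothetical fields)."""
    prompt_ids: List[int]
    label: str


@dataclass
class PromptBatch:
    """What the Trainer/ExpMaker would receive (hypothetical fields)."""
    input_ids: List[List[int]]
    attention_mask: List[List[int]]
    labels: List[str]


def collate(samples: List[PromptSample], pad_id: int = 0) -> PromptBatch:
    """Pad in exactly one place, so datasets never pad on their own."""
    max_len = max(len(s.prompt_ids) for s in samples)
    input_ids, attention_mask = [], []
    for s in samples:
        pad = max_len - len(s.prompt_ids)
        # Left-pad so that generated tokens directly follow the prompt.
        input_ids.append([pad_id] * pad + s.prompt_ids)
        attention_mask.append([0] * pad + [1] * len(s.prompt_ids))
    return PromptBatch(input_ids, attention_mask, [s.label for s in samples])
```

With typed fields, downstream code accesses `batch.input_ids` instead of unpacking positional tuples, which is the kind of simplification the Dataclasses item aims for.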

### ⚙️ Compatibility & Dependencies
*   **Configuration**
    *   **LoRA Simplification:** Drastically simplify LoRA configuration.
        *   *Implementation:* Restrict entry-level arguments to only `use_lora` and `lora_rank`. Move all other detailed parameters into the specific LoRA initialization function.
    *   **DeepSpeed:** Clarify `ds_config` handling and integration within Model initialization.
*   **Dependencies**
    *   **vLLM:** Add support for the latest version of vLLM.
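
The LoRA simplification above can be sketched as follows, under stated assumptions. Only `use_lora` and `lora_rank` come from the roadmap. The helper names, the `argparse` front end, and the derived defaults (`lora_alpha`, `lora_dropout`) are hypothetical placeholders, not LightRFT's actual values.

```python
import argparse
from typing import Any, Dict, List, Optional


def add_lora_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Expose only the two entry-level LoRA arguments."""
    parser.add_argument("--use_lora", action="store_true")
    parser.add_argument("--lora_rank", type=int, default=8)  # default is an assumption
    return parser


def build_lora_kwargs(lora_rank: int) -> Dict[str, Any]:
    """Keep every detailed LoRA parameter inside the initialization function."""
    if lora_rank <= 0:
        raise ValueError("lora_rank must be positive")
    return {
        "r": lora_rank,
        "lora_alpha": 2 * lora_rank,  # placeholder heuristic
        "lora_dropout": 0.0,          # placeholder default
    }


def maybe_lora_kwargs(argv: Optional[List[str]] = None) -> Optional[Dict[str, Any]]:
    """Return LoRA settings if enabled on the command line, else None."""
    args = add_lora_args(argparse.ArgumentParser()).parse_args(argv)
    return build_lora_kwargs(args.lora_rank) if args.use_lora else None
```

For example, `maybe_lora_kwargs(["--use_lora", "--lora_rank", "16"])` yields a settings dictionary with `r` set to 16, while omitting `--use_lora` yields `None`.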

### 🐛 Bug Fixes & Maintenance
*   **Fixes**
    *   Fix issues related to `fire` library usage.
*   **Code Style**
    *   *(Ongoing improvements)*

### 📚 Documentation
*   **Tutorials & Best Practices**
    *   **GSM8K:** Create a comprehensive, step-by-step tutorial for the simplest GSM8K demo.
    *   **Best Practices:** Add 2-3 articles expanding on best practices for training and configuration.
    *   **LoRA Example:** Add a **Geo3K LoRA** training demo to showcase the new simplified LoRA workflow.
*   **Tools & Deployment**
    *   **Project Assistant:** Develop an LLM Q&A Assistant for the project (referencing the [SGLang Cookbook](https://cookbook.sglang.io) implementation).
*   **Content Updates**
    *   *(Placeholder for general updates)*

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues and discussions, and this feature hasn't been requested before.
