Labels: bug, documentation, enhancement, polish, refactor, style
Description
🗺️ Roadmap for LightRFT v0.1.2
Expected Release: Feb. 2026
✨ New Features
- Algorithms
- Add support for GSPO and GMPO algorithms (feature(sunjx): add GSPO and GMPO algorithms support #22).
- Add support for NeighbourGRPO.
- Implement On-policy Distillation.
- Multimodal Support & Demos
- T2I Pipeline: Add a rejection sampling pipeline to the T2I (Text-to-Image) demo (feature(sunjx): add rejective sampling pipeline in t2i demo #3).
- VLM Demos:
- Meme RL Training: VLM demo using a Reward Model.
- Metaphorstar (Chenhao's Work): VLM RL training demo using Rule Reward.
- Omni Models:
- Omni RL Training (Jieyi's Work): Demo using Rule Reward.
- Generative Media:
- T2I/T2V RL Training: Demo for Text-to-Image/Video models using a Reward Model.
- Training Strategies
- Implement Partial Rollout in the training process (feature(luyd): add partial rollout in training process #29).
- Add PPO support.
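For the GSPO item above, the following is a minimal sketch of a sequence-level clipped objective as it is commonly described: one length-normalized importance ratio per response, combined with group-normalized advantages. It is not LightRFT's implementation. The function names (`sequence_ratio`, `group_advantages`, `gspo_objective`) and the default `clip_eps` are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def sequence_ratio(new_logps, old_logps):
    """Length-normalized sequence-level importance ratio.

    Each argument holds the per-token log-probabilities of one response
    under the current and the old policy, respectively.
    """
    assert new_logps and len(new_logps) == len(old_logps)
    mean_log_ratio = sum(n - o for n, o in zip(new_logps, old_logps)) / len(new_logps)
    return math.exp(mean_log_ratio)


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def gspo_objective(new_logps, old_logps, rewards, clip_eps=0.2):
    """Mean clipped surrogate over one group (to be maximized).

    clip_eps is an illustrative value, not a recommended setting.
    """
    advantages = group_advantages(rewards)
    terms = []
    for n, o, adv in zip(new_logps, old_logps, advantages):
        ratio = sequence_ratio(n, o)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Clipping acts on the whole sequence, not on individual tokens.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

Because the ratio is computed once per response, every token in a response shares the same clipping decision, which is the main difference from token-level clipping in GRPO-style objectives.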
♻️ Refactoring & Optimization
- Core Logic (Loss & Filtering)
- Modular Loss-Filter: Refactor implementation into `metrics`, `filters`, `weights`, and `manager` modules (refactor(sunjx): refactor loss-filter implementation #17).
- Refactor the core advantage calculation logic for better performance and maintainability (refactor(sunjx): refactor advantage calculation logic #16).
- Loss Calculation: Move loss calculation logic from Trainer to Model scope.
- Architecture & Interfaces
- Dataset & Reward: Refactor Dataset and Reward modules for better modularity (refactor(sunjx): refactor dataset and reward module #13).
- Model Interface: Standardize `generate` methods and hyperparameters across all models (aligning with `grm_vl`).
- Token Alignment: Unify token interfaces between Actor and Reward Model to minimize conversion overhead.
- Critic: Refactor and enhance Critic model implementation.
- Data Pipeline
- Dataclasses: Unify dataset return formats using Dataclasses to simplify Trainer/ExpMaker.
- Logic Separation: Remove strategy logic from Datasets and standardize batch padding locations.
- Performance
- Optimize efficiency for entropy and logit calculations.
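The Data Pipeline items above (a unified dataclass return format and a single, standard padding location) could look roughly like the sketch below. All names here (`PromptSample`, `PromptBatch`, `collate`) and the choice of left-padding are assumptions made for illustration. They are not taken from the LightRFT codebase.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PromptSample:
    """One dataset item (hypothetical schema)."""
    input_ids: List[int]
    reference: str = ""
    extra: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PromptBatch:
    """What a Trainer or ExpMaker would receive (hypothetical schema)."""
    input_ids: List[List[int]]
    attention_mask: List[List[int]]
    references: List[str]


def collate(samples: List[PromptSample], pad_id: int = 0) -> PromptBatch:
    """Left-pad every sample to the longest one in the batch.

    Padding happens only here, so datasets return unpadded samples
    and contain no strategy-specific logic.
    """
    max_len = max(len(s.input_ids) for s in samples)
    ids, mask = [], []
    for s in samples:
        n_pad = max_len - len(s.input_ids)
        ids.append([pad_id] * n_pad + s.input_ids)
        mask.append([0] * n_pad + [1] * len(s.input_ids))
    return PromptBatch(ids, mask, [s.reference for s in samples])
```

With typed fields, downstream code reads `batch.attention_mask` instead of indexing into a tuple or dict whose layout differs per dataset.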
⚙️ Compatibility & Dependencies
- Configuration
- LoRA Simplification: Drastically simplify LoRA configuration.
- Implementation: Restrict entry-level arguments to only `use_lora` and `lora_rank`. Move all other detailed parameters into the specific LoRA initialization function.
- DeepSpeed: Clarify `ds_config` handling and integration within Model initialization.
- Dependencies
- vLLM: Add support for the latest version of vLLM.
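One way to read the LoRA Simplification item above: the command line exposes only `use_lora` and `lora_rank`, and every other knob lives in the initialization function. The sketch below uses only the standard library. `LoraSettings`, `build_arg_parser`, `init_lora`, and all default values (alpha, dropout, target modules) are hypothetical and not LightRFT's actual API.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class LoraSettings:
    """Resolved LoRA parameters (hypothetical container)."""
    rank: int
    alpha: int
    dropout: float
    target_modules: Tuple[str, ...]


def build_arg_parser() -> argparse.ArgumentParser:
    """Entry-level CLI: only the two LoRA arguments named in the roadmap."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_lora", action="store_true")
    parser.add_argument("--lora_rank", type=int, default=8)
    return parser


def init_lora(
    use_lora: bool,
    lora_rank: int,
    *,
    alpha: Optional[int] = None,
    dropout: float = 0.0,
    target_modules: Sequence[str] = ("q_proj", "v_proj"),
) -> Optional[LoraSettings]:
    """Detailed parameters are keyword-only and defaulted here (assumed values)."""
    if not use_lora:
        return None
    if lora_rank <= 0:
        raise ValueError("lora_rank must be positive")
    return LoraSettings(
        rank=lora_rank,
        alpha=alpha if alpha is not None else 2 * lora_rank,
        dropout=dropout,
        target_modules=tuple(target_modules),
    )
```

A launch script would then call `init_lora(args.use_lora, args.lora_rank)` and pass the result to model construction, keeping the training entry point free of LoRA-specific flags.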
🐛 Bug Fixes & Maintenance
- Fixes
- Fix issues related to `fire` library usage.
- Code Style
- (Ongoing improvements)
📚 Documentation
- Tutorials & Best Practices
- GSM8K: Create a comprehensive, step-by-step tutorial for the simplest GSM8K demo.
- Best Practices: Add 2-3 articles expanding on best practices for training and configuration.
- LoRA Example: Add a Geo3K LoRA training demo to showcase the new simplified LoRA workflow.
- Tools & Deployment
- Project Assistant: Develop an LLM Q&A Assistant for the project (referencing the SGLang Cookbook implementation).
- Content Updates
- (Placeholder for general updates)
Before submitting a new issue...
- Make sure you already searched for relevant issues and discussions, and this feature hasn't been requested before.