xashru/rlvr-seq-generalization

RLVR Sequence Generalization

This repository contains reinforcement learning experiments on sequence generalization, using the activity and lis datasets. The training loops, actor/reference coordination, and evaluation utilities build on the verl framework; the data pipelines and rewards are customized for this project.

Data Layout

  • Parquet files for the activity task live in seqdata/activity/.
  • Parquet files for the lis task live in seqdata/lis/.
  • Both folders include train/test splits as well as *_reason.parquet variants, which are used with the format-based reward for explicit reasoning.
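
As a quick way to check the layout described above, the following sketch groups the Parquet files of one task into plain and *_reason variants. The helper name find_splits is hypothetical and not part of this repository; only the seqdata/<task>/ layout and the *_reason.parquet suffix come from the description above.

```python
from pathlib import Path


def find_splits(root, task):
    """Group the Parquet files under <root>/<task>/ into plain and *_reason variants.

    Hypothetical helper for illustration; it only inspects file names.
    """
    task_dir = Path(root) / task
    files = sorted(task_dir.glob("*.parquet"))
    return {
        "plain": [p for p in files if not p.stem.endswith("_reason")],
        "reason": [p for p in files if p.stem.endswith("_reason")],
    }


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for task in ("activity", "lis"):
        splits = find_splits("seqdata", task)
        print(task, {kind: [p.name for p in paths] for kind, paths in splits.items()})
```

Each returned path can then be loaded with pandas.read_parquet(path), provided pandas and a Parquet engine such as pyarrow are installed.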

Experiment Launchers

  • Shell scripts in myscripts/ are the primary entrypoints for running GRPO training. Each script pins the dataset split, model checkpoint, rollout configuration, and reward function selection for a specific experiment (e.g. bash myscripts/activity_answer_qwen2-7b.sh).
  • The scripts assume the directory layout above. To swap in a new dataset, replace the files in the seqdata folders; the launch configs do not need to change.

Rewards

  • Custom reward shaping lives in verl/utils/reward_score/myreward.py. The launcher scripts reference functions from this module via the custom_reward_function overrides passed to verl.
  • Modify or extend this module when introducing new rewards; every launcher script that references it picks up the changes automatically.
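
To illustrate what a new reward in that module might look like, here is a minimal sketch of a format-based reward. Everything in it is an assumption rather than the contents of myreward.py: the function name, the <think>/<answer> tag format, and the 0.1 partial credit are hypothetical, and the (data_source, solution_str, ground_truth, extra_info) signature follows what upstream verl is understood to pass to custom reward functions, so verify it against the verl documentation before relying on it.

```python
import re

# Hypothetical response format: reasoning inside <think>...</think>,
# followed by the final answer inside <answer>...</answer>.
_FORMAT = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$",
    re.DOTALL,
)


def reasoning_format_reward(data_source, solution_str, ground_truth, extra_info=None):
    """Return 1.0 for a well-formed, correct response, 0.1 for a well-formed
    but incorrect one, and 0.0 when the format is not followed.

    The signature is assumed to match verl's custom reward hook; the reward
    values are illustrative only.
    """
    match = _FORMAT.match(solution_str)
    if match is None:
        return 0.0
    answer = match.group(2).strip()
    return 1.0 if answer == str(ground_truth).strip() else 0.1
```

A launcher script would then select such a function through its custom_reward_function overrides (in upstream verl these are typically a path to the module and the function name).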

Evaluation

  • Use python pass_k.py --task activity --model Qwen/Qwen2.5-7B-Instruct --k 256 (or the myscripts/pass_k.sh helper) to measure pass@k on saved models.
  • Adjust the --task flag to switch between the activity and lis datasets, or change --model and --k to evaluate other checkpoints and sampling depths.
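
For readers unfamiliar with the metric, the sketch below shows the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn for a problem and c the number of correct ones. It is an assumption that pass_k.py computes the metric this way; the function names here are hypothetical and not taken from the script.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples drawn, c correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n, k):
    """Average the per-problem estimate over a dataset."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

For example, pass_at_k(4, 1, 2) evaluates to 0.5: of the six ways to choose two of four samples, three include the single correct one.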

Environment Notes

  • Dependencies and CLI flags for verl follow the upstream project. Refer to the official verl documentation to customize distributed launch parameters or model backends beyond what the scripts provide.
