This repository contains reinforcement learning experiments for sequence generalization on the activity and lis datasets. Training loops, actor/reference coordination, and evaluation utilities build on top of the verl framework while customizing data pipelines and rewards for this project.
- Parquet files for the activity task live in `seqdata/activity/`.
- Parquet files for the lis task live in `seqdata/lis/`.
- Both folders include train/test splits as well as `*_reason.parquet` variants, which are used for the explicit reasoning-format reward.
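A small helper makes the layout above concrete. This is only an illustrative sketch: the exact split filenames (`train.parquet`, `test.parquet`) are assumed from the description, and `split_path` is not a function in this repository.

```python
from pathlib import Path

def split_path(task: str, split: str, reason: bool = False) -> Path:
    """Resolve a parquet split under seqdata/ following the layout above.

    Assumed filenames: <split>.parquet and <split>_reason.parquet.
    """
    name = f"{split}_reason.parquet" if reason else f"{split}.parquet"
    return Path("seqdata") / task / name
```

For example, `split_path("activity", "train", reason=True)` resolves to `seqdata/activity/train_reason.parquet`.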
- Shell scripts in `myscripts/` are the primary entrypoints for running GRPO training. Each script pins the dataset split, model checkpoint, rollout configuration, and reward function selection for a specific experiment (e.g. `bash myscripts/activity_answer_qwen2-7b.sh`).
- The scripts assume the directory layout above; update the `seqdata` folders to swap in new datasets without touching the launch configs.
- Custom reward shaping lives in `verl/utils/reward_score/myreward.py`. The launcher scripts reference functions from this module via the `custom_reward_function` overrides passed to verl.
- Modify or extend this module when introducing new rewards; all scripts pick up the changes automatically.
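A new reward in `myreward.py` would follow the callable shape that verl's `custom_reward_function` hook expects, which at the time of writing is `(data_source, solution_str, ground_truth, extra_info)`. The sketch below is a hypothetical example, not one of this repository's actual rewards; the `<answer>` tag convention and the 0.1 format bonus are assumptions for illustration.

```python
import re

def format_reward(data_source, solution_str, ground_truth, extra_info=None):
    """Hypothetical reasoning-format reward: full credit for a correct
    answer inside <answer>...</answer> tags, a small bonus for the
    format alone, and zero otherwise. (Illustrative values only.)"""
    m = re.search(r"<answer>(.*?)</answer>", solution_str, re.DOTALL)
    if m is None:
        return 0.0
    answer = m.group(1).strip()
    if answer == str(ground_truth).strip():
        return 1.0
    return 0.1  # assumed format bonus, not the repo's actual shaping
```

Once such a function is defined in `myreward.py`, the launcher scripts can select it by name through their `custom_reward_function` overrides.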
- Use `python pass_k.py --task activity --model Qwen/Qwen2.5-7B-Instruct --k 256` (or the `myscripts/pass_k.sh` helper) to measure pass@k metrics on the saved models.
- Adjust the `--task` flag to switch between the activity and lis datasets, or change `--model`/`--k` as needed for alternative checkpoints and sampling depths.
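For reference, the standard unbiased pass@k estimator (Chen et al., 2021) computes, from `n` sampled completions of which `c` are correct, the probability that at least one of `k` draws is correct. Whether `pass_k.py` uses this exact estimator is not confirmed here; this is the conventional definition.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n samples with c correct:
    1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some draw must be correct
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For instance, with 2 samples and 1 correct, `pass_at_k(2, 1, 1)` gives 0.5.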
- verl dependencies and CLI flags follow the upstream project. Refer to the official verl documentation if you need to customize distributed launch parameters or model backends beyond what the scripts provide.