Reward not increasing when training Qwen2.5-0.5B / 1.5B Instruct models on MuSiQue with the ReSearch code #76

@prasadke20

Hi, I am trying to use smaller LLMs to train ReSearch on the MuSiQue data, but the reward does not increase.
Here is the config I am using:
bash train.sh \
    --train_batch_size 48 \
    --ppo_mini_batch_size 48 \
    --prompt_template_name re_search_template_sys \
    --actor_model_path Qwen/Qwen2.5-0.5B-Instruct \
    --search_url url \
    --nnodes 1 \
    --n_gpus_per_node 2 \
    --save_freq 5 \
    --test_freq 5 \
    --total_epochs 2 \
    --save_path ReSearch \
    --train_files ReSearch/data/musique/train.parquet \
    --test_files ReSearch/data/musique/test.parquet \
    --apply_chat True

Also, after a certain point in training, the search queries sent to the retriever are blank.

Image
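
To put a number on the blank searches, something like the following rough sketch could be run over a sample of rollout texts. It assumes the model wraps each query in `<search>...</search>` tags; `extract_queries` and `blank_search_rate` are hypothetical helper names for illustration, not functions from the ReSearch code, and the sample strings are made up.

```python
import re
from typing import List

# Assumption: each search query appears as <search>...</search> in the rollout text.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)


def extract_queries(rollout: str) -> List[str]:
    """Return every search query in a rollout, with surrounding whitespace removed."""
    return [query.strip() for query in SEARCH_RE.findall(rollout)]


def blank_search_rate(rollouts: List[str]) -> float:
    """Fraction of search queries that are empty after stripping whitespace."""
    queries = [q for rollout in rollouts for q in extract_queries(rollout)]
    if not queries:
        return 0.0  # no searches issued at all
    blank = sum(1 for q in queries if not q)
    return blank / len(queries)


if __name__ == "__main__":
    # Made-up rollouts for illustration only.
    samples = [
        "<think>need a fact</think><search> capital of France </search>",
        "<search></search>",
        "<search>\n  \n</search><search>who wrote Hamlet</search>",
    ]
    print(f"blank search rate: {blank_search_rate(samples):.2f}")
```

Logging this rate at each `--test_freq` step would show whether the blank queries start appearing at the same point where the reward stops improving.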
