Reward not increasing when training Qwen2.5 0.5B and 1.5B Instruct models on MuSiQue with the ReSearch code #76

Hi, I am trying to train ReSearch with smaller LLMs on the MuSiQue data, but the reward does not increase.
Here is the config I am using:
bash train.sh --train_batch_size 48 --ppo_mini_batch_size 48 --prompt_template_name re_search_template_sys --actor_model_path Qwen/Qwen2.5-0.5B-Instruct --search_url url --nnodes 1 --n_gpus_per_node 2 --save_freq 5 --test_freq 5 --total_epochs 2 --save_path ReSearch --train_files ReSearch/data/musique/train.parquet --test_files ReSearch/data/musique/test.parquet --apply_chat True

Also, after a certain point in training, the search queries sent to the retriever are blank.

![Image](https://github.com/user-attachments/assets/bbc955bb-87a2-43f3-b6bb-234bd7eb6022)
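To put a number on the blank searches, the fraction of empty queries could be measured over a batch of rollout texts. The snippet below is only a minimal sketch: it assumes the prompt template wraps each query in `<search>...</search>` tags, which is not confirmed anywhere in this issue, and `extract_queries` and `blank_search_ratio` are hypothetical helper names, not functions from the ReSearch code.

```python
import re

# Assumption: each query appears between <search> and </search> tags.
# Adjust the pattern if the prompt template uses different markers.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)


def extract_queries(rollout: str) -> list[str]:
    """Return every search query in one rollout, with whitespace stripped."""
    return [q.strip() for q in SEARCH_RE.findall(rollout)]


def blank_search_ratio(rollouts: list[str]) -> float:
    """Fraction of search calls whose query is empty after stripping.

    Returns 0.0 when the rollouts contain no search calls at all.
    """
    queries = [q for r in rollouts for q in extract_queries(r)]
    if not queries:
        return 0.0
    blank = sum(1 for q in queries if not q)
    return blank / len(queries)


if __name__ == "__main__":
    # Made-up rollouts, purely for illustration.
    samples = [
        "<think>need facts</think><search>who founded X</search>",
        "<think>hmm</think><search>   </search>",
        "<search></search><search>capital of Y</search>",
    ]
    print(f"blank search ratio: {blank_search_ratio(samples):.2f}")
```

Logging this ratio per training step would show whether the blank queries start at the same time the reward stops moving.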
