Hi, I am trying to train ReSearch on the MuSiQue data with a smaller LLM (Qwen2.5-0.5B-Instruct), but I am facing an issue: the reward does not increase during training.
Here is the command I am using:
bash train.sh --train_batch_size 48 --ppo_mini_batch_size 48 --prompt_template_name re_search_template_sys --actor_model_path Qwen/Qwen2.5-0.5B-Instruct --search_url url --nnodes 1 --n_gpus_per_node 2 --save_freq 5 --test_freq 5 --total_epochs 2 --save_path ReSearch --train_files ReSearch/data/musique/train.parquet --test_files ReSearch/data/musique/test.parquet --apply_chat True
Also, after a certain point in training, the search queries sent to the retriever are blank (empty strings).
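In case it helps with reproducing this, here is a rough sketch of how the blank searches could be counted in the generated responses. It assumes the search calls show up in the rollout text as `<search>...</search>` tags and that the responses are available as plain strings; `count_blank_searches` is just a hypothetical helper name, not something from the repo.

```python
import re

# Assumption: each search call appears in the response as <search>query</search>.
SEARCH_TAG = re.compile(r"<search>(.*?)</search>", re.DOTALL)


def count_blank_searches(responses):
    """Return (blank, total) search-call counts over an iterable of response strings."""
    blank = 0
    total = 0
    for text in responses:
        for query in SEARCH_TAG.findall(text):
            total += 1
            # A query made only of whitespace counts as blank.
            if not query.strip():
                blank += 1
    return blank, total
```

Running this on the responses saved at each step should show whether the share of blank queries grows as training goes on.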
