This should not be in the main repo folder; it probably belongs in src/rank_llm/scripts.
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--dataset",
    type=str,
    default="msp_open_ai_ada2_random_s5000_gpt4_da0_mr20_sampled_mix.jsonl",
)
```
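A minimal sketch of how this option behaves, assuming a stand-alone `argparse` parser (the `build_parser` name, the `DEFAULT_CALIBRATION_FILE` constant, and the help text are hypothetical, not taken from the PR): the default file is used only when `--dataset` is omitted, so changing the default would not affect callers who already pass the flag explicitly.

```python
import argparse

# Hypothetical constant mirroring the default value in the diff above.
DEFAULT_CALIBRATION_FILE = "msp_open_ai_ada2_random_s5000_gpt4_da0_mr20_sampled_mix.jsonl"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the --dataset option (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Calibration dataset selection sketch")
    parser.add_argument(
        "--dataset",
        type=str,
        default=DEFAULT_CALIBRATION_FILE,
        help="JSONL file used as calibration data (hypothetical help text)",
    )
    return parser


# No flag: argparse falls back to the default file.
print(build_parser().parse_args([]).dataset)
# Explicit flag: the caller's file overrides the default.
print(build_parser().parse_args(["--dataset", "subset.jsonl"]).dataset)
```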
Is it possible to use RankZephyr's training data, or a subset of it, as the default calibration dataset?
This would probably require changes to the dataset-loading logic too.
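If the default were switched to a subset of the training data, the loading logic would need a way to read the JSONL file and take a reproducible sample. The sketch below assumes only that each line is one JSON object; it deliberately avoids assuming any field names, since the file's schema is not given in this thread. `parse_jsonl_lines`, `sample_records`, and `load_calibration_records` are hypothetical helpers, not existing rank_llm functions.

```python
import json
import random
from typing import Any, Dict, Iterable, List, Optional


def parse_jsonl_lines(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON Lines text, one object per non-blank line (hypothetical helper)."""
    return [json.loads(line) for line in lines if line.strip()]


def sample_records(
    records: List[Dict[str, Any]],
    num_samples: Optional[int] = None,
    seed: int = 42,
) -> List[Dict[str, Any]]:
    """Return all records, or a reproducible random subset of size num_samples."""
    if num_samples is None or num_samples >= len(records):
        return list(records)
    # A dedicated RNG instance avoids changing the global random state.
    return random.Random(seed).sample(records, num_samples)


def load_calibration_records(
    path: str, num_samples: Optional[int] = None, seed: int = 42
) -> List[Dict[str, Any]]:
    """Read a JSONL calibration file and optionally subsample it."""
    with open(path, "r", encoding="utf-8") as f:
        return sample_records(parse_jsonl_lines(f), num_samples, seed)
```

Seeding a dedicated `random.Random` instance keeps the calibration subset identical between runs, which makes it easier to compare quantized checkpoints built from the same data.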
This is the file that @ronakice shared. Wasn't this one used for training?
No, this is not the data we used for fine-tuning RankZephyr, but I'll leave it to Ronak to decide whether we want to use the training dataset or the one we shared with you.
`/u3/rpradeep/RankVicuna/data/msp_open_ai_ada2_random_s5000_gpt4_da0_mr20_sampled_mix.jsonl`
This is the file I used to train RankZephyr, right @sahel-sh?
Either way, something is off with the AWQ quantization. I would advise against merging until this is properly sorted out.
sahel-sh left a comment
Sorry for being slow on this. LGTM!
Details on the insights gathered and other experimental information are here: https://docs.google.com/document/d/1BHpN9lDVGjtjIAFMxjUxNuu1K4KJOgIaOZXkWF1_K8c/edit
Pull Request Checklist
Reference Issue
ref: castorini/ura-projects#4
Checklist Items
Before submitting your pull request, please review these items:
PR Type
What kind of change does this PR introduce?