Dear author,
I'm confused by the lines below: texts is built from solutions rather than completions, so the reward calculation seems to depend on the ground truth instead of the model's predictions.
texts = [item['text'] for item in solutions]
https://github.com/bio-mlhui/MedGround-R1/blob/37b210dd7d6ee71179b3013d7f8042af1de0d5d3/open-r1-multimodal/src/open_r1/grpo_rec.py#L265
inputs = processor(
text=texts,
images=image,
return_tensors="pt",
padding=True
)
https://github.com/bio-mlhui/MedGround-R1/blob/37b210dd7d6ee71179b3013d7f8042af1de0d5d3/open-r1-multimodal/src/open_r1/grpo_rec.py#L268
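To illustrate why this matters, here is a toy sketch. All data, names, and the reward function are hypothetical and are not taken from grpo_rec.py. If the reward depends only on texts taken from the solutions, every completion in a group gets the same reward. The group-normalized (GRPO-style) advantages are then all zero, and the policy gets no learning signal.

```python
import statistics

# Hypothetical data shapes (assumptions, not taken from grpo_rec.py):
# each solution is a dict with a 'text' key, and each completion is a
# TRL-style conversational completion, i.e. a list of message dicts.
solutions = [{'text': '[10, 20, 30, 40]'}] * 3
completions = [
    [{'role': 'assistant', 'content': '[10, 20, 30, 40]'}],
    [{'role': 'assistant', 'content': '[12, 18, 33, 41]'}],
    [{'role': 'assistant', 'content': '[0, 0, 5, 5]'}],
]

def toy_reward(text, solution_text):
    # Hypothetical stand-in for the real reward: 1.0 on exact match, else 0.0.
    return 1.0 if text == solution_text else 0.0

def group_advantages(rewards, eps=1e-4):
    # Simplified GRPO-style normalization within one group: (r - mean) / (std + eps).
    mean = statistics.mean(rewards)
    std = statistics.stdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Texts taken from the solutions (what the current lines appear to do).
texts_gt = [item['text'] for item in solutions]
# Texts taken from the model's completions (what I expected).
texts_pred = [c[0]['content'] for c in completions]

rewards_gt = [toy_reward(t, s['text']) for t, s in zip(texts_gt, solutions)]
rewards_pred = [toy_reward(t, s['text']) for t, s in zip(texts_pred, solutions)]

print(rewards_gt, group_advantages(rewards_gt))      # identical rewards, zero advantages
print(rewards_pred, group_advantages(rewards_pred))  # rewards differ per completion
```

With the solution texts, every completion scores the same regardless of what the model generated. With the completion texts, the rewards and advantages vary across the group.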
Yours,
Jerry