Hi, thank you for the outstanding study. I have been working to reproduce the results.
However, I am seeing a significant performance gap and would like to confirm whether the warnings below can affect evaluation quality.
In my experiment, the F1 score is about 20 points lower than what is reported in the paper.
I also ran two additional experiments on different datasets, and both showed a similar gap.
WARNING:graphr1:Some nodes are missing, maybe the storage is damaged
WARNING:graphr1:Some edges are missing, maybe the storage is damaged
I would like to know whether these warnings could cause incomplete graph construction or lead to degraded F1/EM scores.
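To gauge how often these warnings fire, a small script along these lines could tally them from a captured log. This is only a sketch: the function name `count_missing_warnings` is hypothetical and not part of graphr1, and it assumes the log text contains the two messages quoted above verbatim.

```python
import re
from collections import Counter

# Warning text copied from the two log messages quoted in this issue.
# The capture group records whether the warning is about nodes or edges.
WARNING_PATTERN = re.compile(
    r"WARNING:graphr1:Some (nodes|edges) are missing, maybe the storage is damaged"
)


def count_missing_warnings(log_text: str) -> Counter:
    """Count 'nodes' and 'edges' missing-storage warnings in a log dump.

    Hypothetical helper for illustration; it only scans text and does not
    inspect the graph storage itself.
    """
    # finditer also catches several warnings that ended up on a single line.
    return Counter(match.group(1) for match in WARNING_PATTERN.finditer(log_text))


if __name__ == "__main__":
    # Example input: in practice this would be the captured stderr/stdout
    # of a training or evaluation run.
    sample_log = (
        "WARNING:graphr1:Some nodes are missing, maybe the storage is damaged\n"
        "WARNING:graphr1:Some edges are missing, maybe the storage is damaged\n"
    )
    counts = count_missing_warnings(sample_log)
    print(f"nodes warnings: {counts['nodes']}, edges warnings: {counts['edges']}")
```

Comparing these counts against the number of evaluated questions would show whether the warnings are rare or affect most queries.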
Experiment Setup:
- Model: Qwen2.5-3B-Instruct
- Datasets: 2WikiMultiHopQA, NQ
- Training & Evaluation: Same hyperparameters and pipeline as the official implementation
Questions:
- Do these graphr1 warnings indicate corrupted storage or missing nodes/edges that can affect retrieval or reasoning steps?
- Could this be the source of the large performance drop?
- Is there a recommended fix?