Hi, thanks for releasing LEANN — the idea of learning evidence aggregation over noisy neighborhoods is very close to what many RAG practitioners are struggling with.
I maintain an open-source RAG debugging framework called WFGY ProblemMap (MIT).
It encodes 16 common failure modes for retrieval + reasoning systems, with concrete descriptions and fixes:
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md
This ProblemMap has already been picked up by several research-style projects:
- Harvard MIMS Lab – ToolUniverse (LLM tools benchmark; WFGY listed under robustness / RAG debugging).
- QCRI LLM Lab – Multimodal-RAG-Survey (multimodal RAG survey repo).
- Univ. of Innsbruck Data Science Group – Rankify (RAG toolkit with merged troubleshooting docs based on WFGY).
Given LEANN’s focus on noisy neighborhoods, a few failure modes show up especially often:
- No.1 hallucination & chunk drift — retrieved neighbors are “plausible but wrong”.
- No.2 interpretation collapse — neighbors are good, but aggregation + reasoning go off.
- No.5 semantic ≠ embedding — cosine similarity selects the wrong local neighborhood.
- No.6 logic collapse & recovery — the system gets into dead-ends and needs controlled reset.
Proposal
I’d like to propose a small, documentation-focused contribution:
- Add a “Debugging LEANN with WFGY ProblemMap” section to the README or a separate doc file. For each of the above failure modes, I can:
  - Describe the symptom in LEANN terms (noisy neighbors, unstable evidence weighting, etc.).
  - Suggest simple experiments and ablations to detect the issue (e.g. neighbor inspection, temperature / top-k sweeps, disabling certain heads).
  - Link to the relevant ProblemMap entries for deeper explanation.
- Optionally add a short “troubleshooting” table that LEANN users can follow when they see odd behavior:
  - Column 1: Symptom (e.g. “answers jump when I add a small number of documents”).
  - Column 2: Likely ProblemMap mode(s) (No.1, No.5, etc.).
  - Column 3: Concrete checks to run in LEANN (what to log, what to visualize).
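
To give a sense of what a "concrete check" could look like, here is a minimal sketch of a neighbor-stability test combined with a top-k sweep, aimed at the "answers jump when I add a small number of documents" symptom. It is an illustration only: it does not call any LEANN API, the vectors are random stand-ins for real embeddings, and `cosine`, `top_k_neighbors`, and `neighbor_overlap` are hypothetical helper names I made up for this example.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_neighbors(query, corpus, k):
    """Return the ids of the k corpus vectors most similar to the query."""
    ranked = sorted(
        corpus,
        key=lambda doc_id: cosine(query, corpus[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def neighbor_overlap(before, after):
    """Jaccard overlap of two neighbor id lists (1.0 means identical sets)."""
    a, b = set(before), set(after)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Assumption: random vectors stand in for real document embeddings.
    rng = random.Random(0)
    dim = 16
    corpus = {f"doc{i}": [rng.gauss(0, 1) for _ in range(dim)] for i in range(200)}
    query = [rng.gauss(0, 1) for _ in range(dim)]

    # Simulate "adding a small number of documents" to the index.
    extra = {f"new{i}": [rng.gauss(0, 1) for _ in range(dim)] for i in range(5)}
    grown = {**corpus, **extra}

    # Sweep k and log how much the retrieved neighborhood changes.
    for k in (1, 5, 10, 20):
        before = top_k_neighbors(query, corpus, k)
        after = top_k_neighbors(query, grown, k)
        print(f"k={k:2d}  overlap={neighbor_overlap(before, after):.2f}")
```

In a real write-up, the stand-in vectors would be replaced by whatever embeddings and retrieval call LEANN actually exposes. A low overlap at small k after a tiny corpus change would point toward No.1 or No.5, while stable neighbors with unstable answers would point toward No.2.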
All content would be MIT-compatible and clearly marked as optional external guidance.
If you are open to this, I can prepare a PR with a first draft and adjust based on your feedback.
Thanks for considering — I think LEANN plus a clear failure-mode map would help a lot of people who are trying to move from “it runs” to “it is debuggable”.