feat(vision-metrics): split vqa #649

Open
davidberenstein1957 wants to merge 3 commits into feat/vlm-pr-3d-oneig-reasoning from feat/vlm-pr-4a-vqa

@davidberenstein1957 davidberenstein1957 commented Apr 28, 2026

Member

Summary

Splits vqa into its own stacked PR, adds VQAMetric, and wires up the GenAI Bench benchmark entry, with focused VQA test coverage.

Stack Position

Files

  • src/pruna/evaluation/metrics/metric_vqa.py
  • src/pruna/evaluation/benchmarks.py
  • tests/evaluation/test_vision_metrics.py

Test Plan

uv run pytest tests/evaluation/test_vision_metrics.py -k vqa

Review Focus

  • Yes-probability scoring
  • GenAI Bench mapping
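
For reviewers unfamiliar with yes-probability scoring, here is a minimal sketch of the general idea, not the code in metric_vqa.py. It assumes the VLM is asked a yes/no question about the image and that the score is the probability of "Yes" renormalized over the "Yes" and "No" answer tokens. The function name, the token strings, and the two-way renormalization are all illustrative assumptions; the actual VQAMetric may normalize over the full vocabulary or handle tokenization differently.

```python
import math


def yes_probability(
    logits: dict[str, float],
    yes_token: str = "Yes",
    no_token: str = "No",
) -> float:
    """Return P("Yes") from a two-way softmax over the yes/no answer logits.

    Hypothetical helper: `logits` maps answer tokens to the raw logits the
    model assigned to them at the first generated position.
    """
    yes_logit = logits[yes_token]
    no_logit = logits[no_token]
    # Subtract the max logit before exponentiating for numerical stability.
    shift = max(yes_logit, no_logit)
    exp_yes = math.exp(yes_logit - shift)
    exp_no = math.exp(no_logit - shift)
    return exp_yes / (exp_yes + exp_no)


# Equal logits mean the model is undecided between the two answers.
undecided = yes_probability({"Yes": 0.0, "No": 0.0})
# A higher "Yes" logit pushes the score towards 1.
confident = yes_probability({"Yes": 2.0, "No": 0.0})
```

Under these assumptions the score is a continuous value in [0, 1] rather than a hard yes/no decision, which is what makes it usable as a graded alignment signal.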

Review Flow (Order)

Review the stack in this exact order:

  1. feat(vendor): add LLM2Vec embedding model #637 vendor
  2. feat(infrastructure): add VLM base classes and utilities #638 infrastructure
  3. feat(text-metrics): split qa_accuracy #645 qa_accuracy
  4. feat(text-metrics): split oneig_alignment #646 oneig_alignment
  5. feat(text-metrics): split text_score pair #647 text_score pair
  6. feat(text-metrics): split oneig_reasoning #648 oneig_reasoning
  7. feat(vision-metrics): split vqa #649 vqa
  8. feat(vision-metrics): split vie_score #650 vie_score
  9. feat(vision-metrics): split img_edit_score #651 img_edit_score
  10. feat(e2e-tests): stacked e2e after split metrics #641 e2e tests

This PR in the flow (7/10)

@github-actions

This PR has been inactive for 10 days and is now marked as stale.

@github-actions github-actions Bot added the stale label May 19, 2026
@davidberenstein1957 davidberenstein1957 force-pushed the feat/vlm-pr-3d-oneig-reasoning branch from c853dd2 to fc9d0bb June 2, 2026 17:30
@github-actions github-actions Bot removed the stale label Jun 19, 2026
@github-actions

This PR has been inactive for 10 days and is now marked as stale. It will be closed in 7 days if there is no further activity.

@github-actions github-actions Bot added the stale label Jun 30, 2026
@davidberenstein1957 davidberenstein1957 force-pushed the feat/vlm-pr-3d-oneig-reasoning branch from bb46ea1 to ec2c190 July 2, 2026 13:47
davidberenstein1957 and others added 2 commits July 2, 2026 15:50
Introduces VQAMetric with GenAI Bench benchmark wiring and focused VQA unit coverage as the first vision metric stack PR.

Made-with: Cursor
Co-authored-by: Cursor <cursoragent@cursor.com>
- fix benchmarks UTF-8 corruption (1-5 ratings)
- sync OneIG subset dataset loaders for benchmark registration
- ruff check/format on changed VLM src files