feat(vision-metrics): split vqa by davidberenstein1957 · Pull Request #649 · PrunaAI/pruna

davidberenstein1957 · 2026-04-28T13:04:07Z

Summary

Splits the vqa metric into its own stacked PR, adds VQAMetric, and wires up the GenAI Bench benchmark entry with focused VQA test coverage.

Stack Position

Base: PR feat(text-metrics): split oneig_reasoning #648 (feat/vlm-pr-3d-oneig-reasoning)
Next: PR feat(vision-metrics): split vie_score #650 (feat/vlm-pr-4b-vie-score)
Final integration: PR feat(e2e-tests): stacked e2e after split metrics #641 (feat/vlm-pr-5-e2e-tests)
Canonical umbrella reference: PR feat(evaluation): add VLMMetrics #545 (feat/metrics-vlm-support)

Files

src/pruna/evaluation/metrics/metric_vqa.py
src/pruna/evaluation/benchmarks.py
tests/evaluation/test_vision_metrics.py

Test Plan

uv run pytest tests/evaluation/test_vision_metrics.py -k vqa

Review Focus

Yes-probability scoring
GenAI Bench mapping
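
For reviewers unfamiliar with the first focus item, here is a minimal sketch of what yes-probability scoring usually means: the judge model is asked a yes/no question about the image, and the score is the probability mass on "yes". The function names below are hypothetical and are not taken from `metric_vqa.py`. Normalizing over only the "yes" and "no" logits, rather than the full vocabulary, is an assumption; this PR does not state which one VQAMetric uses.

```python
import math


def yes_probability(logit_yes: float, logit_no: float) -> float:
    """Two-way softmax over the "yes" and "no" answer logits.

    Assumption: the score is normalized over these two tokens only.
    """
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


def mean_yes_score(logit_pairs: list[tuple[float, float]]) -> float:
    """Average the per-sample yes-probabilities into one score."""
    scores = [yes_probability(yes, no) for yes, no in logit_pairs]
    return sum(scores) / len(scores)
```

Equal logits give a score of 0.5, so a useful thing to check in review is how the real implementation behaves when the model is undecided or answers with a token other than "yes" or "no".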

Review Flow (Order)

Review the stack in this exact order:

feat(vendor): add LLM2Vec embedding model #637 vendor
feat(infrastructure): add VLM base classes and utilities #638 infrastructure
feat(text-metrics): split qa_accuracy #645 qa_accuracy
feat(text-metrics): split oneig_alignment #646 oneig_alignment
feat(text-metrics): split text_score pair #647 text_score pair
feat(text-metrics): split oneig_reasoning #648 oneig_reasoning
feat(vision-metrics): split vqa #649 vqa
feat(vision-metrics): split vie_score #650 vie_score
feat(vision-metrics): split img_edit_score #651 img_edit_score
feat(e2e-tests): stacked e2e after split metrics #641 e2e tests

This PR in the flow (7/10)

Review after PR feat(text-metrics): split oneig_reasoning #648.
Next PR to review: feat(vision-metrics): split vie_score #650.
Confirm this PR's tests and scope before continuing.

github-actions · 2026-05-19T00:29:27Z

This PR has been inactive for 10 days and is now marked as stale.

github-actions · 2026-06-30T00:30:33Z

This PR has been inactive for 10 days and is now marked as stale. It will be closed in 7 days if there is no further activity.

This was referenced Apr 28, 2026

feat(text-metrics): add text-based VLM judge metrics #639 (Closed)
feat(vision-metrics): add vision-based VLM judge metrics #640 (Closed)

davidberenstein1957 force-pushed the feat/vlm-pr-3d-oneig-reasoning branch from e586366 to c853dd2 Compare May 8, 2026 09:01

davidberenstein1957 force-pushed the feat/vlm-pr-4a-vqa branch from 3d76a02 to 92109f0 Compare May 8, 2026 09:01

github-actions Bot added the stale label May 19, 2026

davidberenstein1957 force-pushed the feat/vlm-pr-3d-oneig-reasoning branch from c853dd2 to fc9d0bb Compare June 2, 2026 17:30

davidberenstein1957 force-pushed the feat/vlm-pr-4a-vqa branch from 92109f0 to e79c31d Compare June 2, 2026 17:30

github-actions Bot removed the stale label Jun 19, 2026

github-actions Bot added the stale label Jun 30, 2026

davidberenstein1957 force-pushed the feat/vlm-pr-3d-oneig-reasoning branch from bb46ea1 to ec2c190 Compare July 2, 2026 13:47

davidberenstein1957 and others added 2 commits July 2, 2026 15:50

feat(vision-metrics): split vqa into dedicated branch

5876f7e

Introduces VQAMetric with GenAI Bench benchmark wiring and focused VQA unit coverage as the first vision metric stack PR.

Made-with: Cursor

chore(metrics): export VQAMetric from __init__

9f5bcc3

Co-authored-by: Cursor <cursoragent@cursor.com>

davidberenstein1957 force-pushed the feat/vlm-pr-4a-vqa branch from 76d1456 to 9f5bcc3 Compare July 2, 2026 13:51

fix(ci): lint, format, and benchmark registration

48885f2

- fix benchmarks UTF-8 corruption (1-5 ratings)
- sync OneIG subset dataset loaders for benchmark registration
- ruff check/format on changed VLM src files

davidberenstein1957 wants to merge 3 commits into feat/vlm-pr-3d-oneig-reasoning from feat/vlm-pr-4a-vqa
