Skip to content

refactor(task): unify detection into resolve_task/TaskResolution; fix inspect source#878

Merged
timenick merged 30 commits into
mainfrom
zhiwang/unify-task-detection
Jun 16, 2026
Merged

refactor(task): unify detection into resolve_task/TaskResolution; fix inspect source#878
timenick merged 30 commits into
mainfrom
zhiwang/unify-task-detection

Conversation

@timenick

@timenick timenick commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

What

Unifies WinML's task detection into a single resolver. detect_task (inspect/eval), _detect_task_and_class_from_config (build via -m), and resolve_loader_config step 2 (config/build via --model-type) were three implementations of the same decision over the same MODEL_CLASS_MAPPING data, and had drifted. Modality was reconstructed from config field names — a heuristic that misroutes models whose fields aren't modality-exclusive.

All three are replaced by a single resolve_task(config, *, task=None, model_class=None) -> TaskResolution in a new loader/resolution.py, consumed by every entry point. loader/task.py is reduced to data tables + boundary utils. Modality derives from the model class's main_input_name. Addresses #877.

API change: legacy detect_task / resolve_task_and_model_class / _detect_task_and_class_from_config are removed; callers use resolve_task + TaskResolution fields. resolve_loader_config returns the TaskResolution as a 4th tuple element.

Why (concrete bugs fixed)

  • Inspect reported a misleading task source. winml inspect recomputed provenance post-hoc by scanning MODEL_CLASS_MAPPING, so a model detected via Optimum (e.g. TAPEX/BART) was mislabeled HF_MODEL_CLASS_MAPPING. It now reports the resolver's real source (tasks-manager).
  • AST misroute: ASTModel carries patch_size, so the old config-field heuristic upgraded it to image-feature-extraction → image dataset → fail. main_input_name (input_values) keeps it feature-extraction.
  • sam2/sam2-video disagreement: feature-extraction via inspect/--model-type but mask-generation via -m (different artifacts for the same model). Now mask-generation everywhere.
  • load_hf_model returned an inconsistent task (Optimum-canonical in the user_script branch, modality-aware otherwise). Both now return the modality-aware task.
  • Calibration crash: a backbone that stays feature-extraction but exports non-text inputs routed to a text dataset and crashed; calibration falls back to RandomDataset on a modality mismatch.

How

  • resolve_task (5 stages): user override → detection (override / no-architectures / TasksManager + fill-mask→seq2seq correction / default) → model class → modality upgrade (detection path only) → composite tag. Returns TaskResolution(task, optimum_task, model_class, source, composite) with the invariant optimum_task == to_optimum_task(task).
  • TaskSource enum replaces stringly-typed provenance with semantic values (tasks-manager, sentinel-default, model-id-default, wrapped-library, hf-task-default, user-task, user-class) — no internal data-structure names leak into inspect output.
  • resolve_composite + the seq2seq bridge move from commands/config.py into the resolver; TaskResolution.composite carries sub-components so winml config no-task routing matches explicit --task.
  • _resolve_task_override (model-id default → (model_type, None) sentinel reverse-lookup) is the single override source; a default-task override is declared only by an explicit sentinel, not by a lone (model_type, task) class-fix entry (e.g. segformer image-segmentation) — the architecture head decides otherwise.
  • Modality derives from the architecture class's main_input_name (pixel_valuesimage-feature-extraction; text/audio stay feature-extraction).
  • --model-type fallback (first ONNX-exportable task) and the timm wrapped-library stage merge into one no-architectures stage in resolve_task.
  • Detection helpers relocate from task.py into resolution.py. Calibration degrades to RandomDataset (real input specs from the ONNX) on a modality mismatch.

Behavior changes

case before after
inspect task source (seq2seq, e.g. TAPEX) mislabeled HF_MODEL_CLASS_MAPPING tasks-manager (real source)
AST (ASTModel) image-feature-extraction feature-extraction
sam2 / sam2-video inspect/--model-type: feature-extraction; -m: mask-generation mask-generation everywhere
segformer classification ckpt (build) image-segmentation image-classification
prajjwal1/bert-tiny (inspect) head-detected feature-extraction (model-id default)
load_hf_model (user_script branch) Optimum-canonical task modality-aware task
undetectable config detect_task fell back; loader path raised unified to hf-task-default fallback
task source strings TasksManager / HF_MODEL_CLASS_MAPPING / … cleaned kebab-case TaskSource values

Out of scope (deferred)

  • Inspect rendering the full composite structure (encoder+decoder + available pipeline tasks) in the formatter.
  • A real audio-feature-extraction evaluator + dataset (and the task name itself).
  • Video task support — optimum has no ONNX export config for videomae/timesformer/vivit.

timenick added 7 commits June 12, 2026 11:22
Single source of truth for model-type / model-id task overrides, unifying
the detect_task short-circuit, the (model_type, None) sentinel reverse-lookup,
and the model-id default. Not yet wired into the entry points.
…entry points

detect_task, _detect_task_and_class_from_config, and resolve_loader_config
step 2 now all consult _resolve_task_override first. Fixes:
- sam2/sam2-video resolve to mask-generation on every entry point (was
  feature-extraction via inspect and --model-type)
- model-id default (bert-tiny) now applies on the detect path too
Removes the now-redundant sentinel block and the detect-path short-circuit.
Replace the config-field heuristic (_TASK_MODALITY_DISAMBIGUATION, which keyed
on top-level image_size/patch_size) with the architecture class's main_input_name
(pixel_values -> image-feature-extraction; text/audio stay feature-extraction).
Fixes AST (carries patch_size but is audio) being misrouted to image-feature-extraction.
Uses the arch class from config.architectures, not the resolved/Auto class.
…smatch

universal_calib_dataset now degrades to RandomDataset (reading real input specs
from the ONNX) when the task-specific dataset either fails to construct or produces
none of the model's input tensors. Keeps build from crashing when a backbone stays
feature-extraction but the ONNX wants non-text inputs (audio/image/video).
sam2 -> mask-generation via detect_task, resolve_task_and_model_class, and
resolve_loader_config(model_type); detect_task agrees with resolve_task_and_model_class
for AST (feature-extraction), ViT (image-feature-extraction), and BART-mnli.
…entry class-fixes

Review found a regression: the single-real-task branch in _resolve_task_override,
now consulted by the build path, forced segformer classification checkpoints to
image-segmentation (its MODEL_CLASS_MAPPING entry is a class-fix for the segmentation
task, not a default-task declaration). Remove that branch so a default comes only from
an explicit (model_type, None) sentinel; otherwise the architecture head decides.
segformer is the only single-entry-no-sentinel type. Also label detect_task source as
MODEL_TASK_MAPPING for model-id overrides (was mislabeled HF_MODEL_CLASS_MAPPING).
@timenick timenick requested a review from a team as a code owner June 12, 2026 04:25
timenick added 16 commits June 12, 2026 17:13
…tection

# Conflicts:
#	src/winml/modelkit/eval/tensor_similarity_evaluator.py

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Summary

This PR correctly unifies task detection by replacing scattered, divergent paths with a single resolve_task() 5-stage resolver. The sentinel-based override mechanism for SAM/SAM2, the modality upgrade via main_input_name, and the composite component tagging are all well-designed. Below are the substantive issues I found.


Bug: USER_CLASS path skips modality upgrade (resolution.py line 367–369)

When model_class is provided without --task, the USER_CLASS stage returns opt_task directly as TaskResolution.task — no call to _resolve_task_modality. Example:

resolve_task(vit_config, model_class="ViTModel")
TaskResolution(task="feature-extraction", ...) ← should be "image-feature-extraction"

The class docstring says task is always WinML modality-aware. The detection path (Stage 3) applies _resolve_task_modality; the USER_CLASS path does not. This causes TASK_DATASET_MAPPING to route vision models to TextDataset for calibration.

Fix:
`python

resolution.py, after opt_task is set in the USER_CLASS branch:

surfaced = _resolve_task_modality(config, opt_task)
return TaskResolution(surfaced, to_optimum_task(surfaced), resolved, TaskSource.USER_CLASS, None)
`

No existing test covers this combination; it should be added to test_resolve_task.py.


Warning: Stage 1b KeyError propagates uncaught (resolution.py line 417)

python resolved = TasksManager.get_model_class_for_task(opt_task, framework="pt")

This line has no try/except. Stage 2's error handling does not protect it — resolved is set here, before Stage 2 checks if resolved is None. If the first supported task from get_supported_tasks cannot be looked up under framework="pt" (e.g. a timm_wrapper whose classes are under the timm library, not transformers), a raw KeyError escapes. Stage 2's except Exception pattern should be applied here too.


Warning: assert eliminated by -O (resolution.py line 447)

python assert source is not None

Python's -O flag strips assert statements. If an unexpected code path were to leave source = None, this guard disappears in optimised builds. Replace with:

python if source is None: raise RuntimeError("resolve_task: internal invariant violated — source was not set")


Warning: image-feature-extraction not pinned in TASK_SYNONYM_EXTENSIONS (task.py line 222)

to_optimum_task("image-feature-extraction") relies on Optimum's TasksManager.map_from_synonym to collapse it to "feature-extraction". The docstring documents the collapse; the tests verify it works today. But unlike "mask-generation" (which is pinned in TASK_SYNONYM_EXTENSIONS to prevent Optimum from mis-normalising it), "image-feature-extraction" has no fallback. If a future Optimum update changes the mapping, all downstream Optimum API calls will silently receive the unknown task string and fail.

Suggested addition:

python TASK_SYNONYM_EXTENSIONS: dict[str, str] = { "image-feature-extraction": "feature-extraction", # add this "next-sentence-prediction": "text-classification", "mask-generation": "mask-generation", }


Info: audio feature-extraction lands in TextDataset (datasets/__init__.py line 44)

_FEATURE_MODALITY_BY_MAIN_INPUT only upgrades pixel_valuesimage-feature-extraction. Audio models (input_values, input_features) stay as "feature-extraction" and hit TextDataset. The new RandomDataset fallback handles the mismatch gracefully, but it would help to add a # TODO: add audio-feature-extraction once AudioDataset exists comment at line 44 so this gap is visible.


Info: missing test for USER_CLASS with vision feature-extraction

test_resolve_task.py has no case for resolve_task(vit_config, model_class="ViTModel"). Even if the modality gap is intentional (user is being explicit), the expected behaviour should be pinned by a test so future refactors cannot silently change it.

@timenick timenick changed the title fix(task): unify task detection across entry points; derive modality from main_input_name refactor(task): unify detection into resolve_task/TaskResolution; fix inspect source Jun 15, 2026

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: fix(task): unify task detection across entry points

The unified resolve_task() 5-stage resolver is a sound design — sentinel-based overrides, the TaskResolution/TaskSource dataclasses, and the composite tagging are all well-structured. The cross-entry-point tests in test_task_entry_point_consistency.py are a good addition.

Five issues below (one bug, three warnings, one info).


⚠️ task.py line 222 — TASK_SYNONYM_EXTENSIONS missing image-feature-extraction entry

to_optimum_task("image-feature-extraction") relies on Optimum's map_from_synonym to collapse this to "feature-extraction". Comments in clip.py and blip.py confirm Optimum knows the synonym today, but unlike "mask-generation" (which is explicitly pinned here to prevent Optimum mis-normalisation), there is no fallback. If Optimum changes the mapping, all downstream Optimum API calls silently receive the unknown string "image-feature-extraction" and fail.

Suggested fix:

TASK_SYNONYM_EXTENSIONS: dict[str, str] = {
    "image-feature-extraction": "feature-extraction",  # add
    "next-sentence-prediction": "text-classification",
    "mask-generation": "mask-generation",
}

Comment thread src/winml/modelkit/loader/resolution.py Outdated
Comment thread src/winml/modelkit/loader/resolution.py Outdated
Comment thread src/winml/modelkit/loader/resolution.py Outdated
Comment thread tests/unit/loader/test_resolve_task.py
…den Stage 1b + invariant; pin image-feature-extraction synonym; add tests
@timenick

Copy link
Copy Markdown
Collaborator Author

Thanks for the thorough review — all six addressed in 5d47ce9e.

1. USER_CLASS skips modality (bug) — Fixed, with one refinement to your suggested patch. Applying _resolve_task_modality unconditionally would also upgrade a user-given --task (e.g. --model-class ViTModel --task feature-extraction), which should stay verbatim like the USER_TASK path. So I apply modality only when the task is inferred (no --task):

if task is not None:
    opt_task = normalize_task(task); surfaced = opt_task          # preserve user intent
else:
    opt_task = _infer_task_from_architecture(config)
    surfaced = _resolve_task_modality(config, opt_task)           # inferred -> modality-aware

resolve_task(vit_cfg, model_class="ViTModel") now returns image-feature-extraction. Pinned by two new tests.

2. Stage 1b KeyError uncaught — Fixed by removing the eager get_model_class_for_task from 1b and letting Stage 2 resolve the class uniformly (it already has the try/except + arch fallback) — DRY, and the lookup can no longer escape as a raw KeyError. timm routing tests still green. (Legacy had the same un-wrapped call, so not a regression, but worth hardening.)

3. assert stripped by -O — Replaced with if source is None: raise RuntimeError(...).

4. image-feature-extraction not pinned — Agreed; confirmed it's absent from Optimum's get_all_tasks(), so it's a WinML-owned name. Pinned in TASK_SYNONYM_EXTENSIONSfeature-extraction.

5. audio → TextDataset (info) — Added a # TODO at _FEATURE_MODALITY_BY_MAIN_INPUT for input_values/input_featuresaudio-feature-extraction once an AudioDataset exists.

6. missing USER_CLASS test — Added test_user_class_inferred_task_is_modality_aware + test_user_class_with_explicit_task_preserved_verbatim.

Generated with Claude Code

…the validate-gate cross-modality warning (test_build_validate_task); collapse stays guarded by test_task_boundary

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after 5d47ce9

All five issues from the previous review have been addressed:

  • Bug: USER_CLASS modality upgrade — fixed (inferred-task branch now calls _resolve_task_modality).
  • Warning: Stage 1b raw KeyError — fixed (call removed; Stage 2 handles class resolution).
  • Warning: assert source is not None stripped by -O — fixed (if/raise RuntimeError).
  • Warning: image-feature-extraction not pinned in TASK_SYNONYM_EXTENSIONS — fixed.
  • Info: Missing test for USER_CLASS + vision feature-extraction — fixed (two new tests added).

One new issue introduced by the fix:

⚠️ USER_CLASS explicit-task branch normalizes away the user's original task string (inconsistent with USER_TASK)

In the explicit-task sub-branch (line 358–361), surfaced = normalize_task(task). The comment says "preserve it verbatim (like USER_TASK)", but the USER_TASK path (line 381–398) stores original = task and returns that, not the normalized form. So the two paths are inconsistent:

# USER_TASK: resolve_task(cfg, task="causal-lm") -> TaskResolution(task="causal-lm", ...)
# USER_CLASS: resolve_task(cfg, task="causal-lm", model_class="X") -> TaskResolution(task="text-generation", ...)

The new test test_user_class_with_explicit_task_preserved_verbatim only checks task="feature-extraction" (already canonical, no normalization happens) so it does not catch this divergence.

Fix: use surfaced = task (the original user string) instead of surfaced = opt_task (normalized), keeping opt_task = normalize_task(task) for the Optimum API call only — identical to what USER_TASK does with original vs normalized.

Comment thread src/winml/modelkit/loader/resolution.py Outdated
Comment thread tests/unit/loader/test_resolve_task.py
@timenick

Copy link
Copy Markdown
Collaborator Author

Correction on item 4 (image-feature-extraction synonym pin): reverted in fddd4254. TASK_SYNONYM_EXTENSIONS is also consulted by commands.build._validate_task_supported_for_model — adding the entry moved image-feature-extraction from the Optimum-synonym branch (which emits the cross-modality visibility warning) to the WinML-extension branch (silent), breaking test_silently_accepts_cross_modality_feature_extraction and silencing a deliberately-added safety warning.

The forward-looking robustness concern is already guarded: test_task_boundary.py asserts to_optimum_task("image-feature-extraction") == "feature-extraction", so a future Optimum change would fail CI. Left a NOTE in TASK_SYNONYM_EXTENSIONS so it isn't re-added. Items 1/2/3/5/6 stand.

Generated with Claude Code

timenick added 3 commits June 15, 2026 10:27
…im (match USER_TASK), not the normalized alias; add alias test
…e broke the intentional test_task_normalized_with_model_class; USER_TASK/USER_CLASS divergence is by design

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after 275b022

Status of previous open items:

  • Warning (resolution.py:361) — Addressed as an intentional design decision, documented in code. The test_task_normalized_with_model_class test pins the normalization behavior.
  • Info (test_resolve_task.py:144) — Test renamed to test_user_class_explicit_task_is_not_modality_upgraded, more accurately describing the behavior.

One new warning found in the current state — see inline comment.

Comment thread src/winml/modelkit/loader/resolution.py Outdated
Comment thread src/winml/modelkit/loader/task.py
…th inferred branch + USER_TASK

The USER_CLASS explicit-task branch surfaced normalize_task(task) directly, collapsing the WinML modality-aware image-feature-extraction to feature-extraction -- which routes a ViT to a text dataset/evaluator (config --task image-feature-extraction --model-class ViTModel -> build/quantize calibration). Re-apply _resolve_task_modality after normalizing so the surfaced task stays modality-aware, matching the USER_CLASS inferred branch and USER_TASK; adding --model-class must not collapse the modality.

opt_task (the Optimum class lookup) is unchanged, so alias canonicalization (masked-lm -> fill-mask) and the optimum_task==to_optimum_task(task) invariant still hold; _resolve_task_modality is a no-op for non-feature-extraction tasks.

Update test_user_class_explicit_feature_extraction_is_modality_aware (was pinning the collapse) and add test_user_class_explicit_image_feature_extraction_preserved as the regression pin.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fourth-pass review — commit e65968d

Previous warning RESOLVED:
esolution.py:361 — surfaced = _resolve_task_modality(config, opt_task) is now applied in the USER_CLASS explicit-task branch. The image-feature-extraction modality is preserved when combined with --model-class, backed by two new tests ( est_user_class_explicit_feature_extraction_is_modality_aware, est_user_class_explicit_image_feature_extraction_preserved). ✅

One info finding remains — see inline comment on
esolution.py:361.

Open item from previous pass:

  • 🔵 Info ( ask.py:226) — TASK_SYNONYM_EXTENSIONS dual role (Optimum translation + build-validator bypass) is a structural design smell. Low priority; recommend a follow-up issue to separate the concerns.

Comment thread src/winml/modelkit/loader/resolution.py

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! All bugs and warnings resolved across 4 review rounds. Two remaining info items (misleading comment on resolution.py:361 and TASK_SYNONYM_EXTENSIONS dual-role coupling) are low-priority and can be tracked as follow-up issues.

@timenick timenick merged commit 1d4feca into main Jun 16, 2026
9 checks passed
@timenick timenick deleted the zhiwang/unify-task-detection branch June 16, 2026 03:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants