research: add autoconfig POC with QNN NPU catalog sweep by DingmaomaoBJTU · Pull Request #891 · microsoft/winml-cli

DingmaomaoBJTU · 2026-06-15T02:30:48Z

What this is

Adds research/autoconfig/, an experimental system that automatically searches for the best-performing winml-cli build configuration (EP, opset, graph optimizations) for a given model on Windows hardware, without requiring the user to understand ORT/EP optimizer internals.

The core loop (autoconfig.py) is Explorer → Optimizer → Reviewer:

Explorer proposes the next hypothesis, pruning already-refuted configs from ep_knowledge/
Optimizer runs winml build + winml perf (two-phase: 200-iter CV screen → 3×500-iter full bench)
Reviewer evaluates, updates the KB, decides keep/discard
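The Explorer → Optimizer → Reviewer loop above can be sketched in a few lines of Python. This is a minimal sketch, not the code in autoconfig.py. The `KnowledgeBase` class is a hypothetical in-memory stand-in for ep_knowledge/, each hypothesis is a set of knob changes applied to the baseline, and `optimizer` is any callable returning a latency in ms (in the POC, that step is winml build + winml perf).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

# A hypothesis is a frozen set of (knob, value) pairs, e.g. {("opset", 21)};
# the empty set stands for the unmodified baseline config.
Hypothesis = FrozenSet[Tuple[str, object]]
BASELINE: Hypothesis = frozenset()


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory stand-in for ep_knowledge/."""
    refuted: Set[Hypothesis] = field(default_factory=set)
    results: Dict[Hypothesis, float] = field(default_factory=dict)


def explorer(candidates: List[Hypothesis], kb: KnowledgeBase) -> Optional[Hypothesis]:
    """Propose the next untried hypothesis, pruning ones the KB already refuted."""
    for h in candidates:
        if h not in kb.refuted and h not in kb.results:
            return h
    return None


def reviewer(h: Hypothesis, latency_ms: float, best_ms: float,
             kb: KnowledgeBase, min_gain: float = 0.0) -> bool:
    """Record the measurement; keep h only if it beats the best by min_gain."""
    kb.results[h] = latency_ms
    if latency_ms < best_ms * (1.0 - min_gain):
        return True
    kb.refuted.add(h)
    return False


def search(candidates: List[Hypothesis],
           optimizer: Callable[[Hypothesis], float],
           kb: KnowledgeBase) -> Tuple[Hypothesis, float]:
    """Run Explorer -> Optimizer -> Reviewer until no candidates remain."""
    best_h, best_ms = BASELINE, optimizer(BASELINE)
    kb.results[BASELINE] = best_ms
    while (h := explorer(candidates, kb)) is not None:
        latency_ms = optimizer(h)  # POC: winml build + winml perf
        if reviewer(h, latency_ms, best_ms, kb):
            best_h, best_ms = h, latency_ms
    return best_h, best_ms
```

A real search would likely compose kept knobs with later hypotheses; this sketch evaluates each hypothesis independently against the current best to keep the control flow visible.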

catalog_qnn_sweep.py sweeps a fixed hypothesis matrix (h0–h5: baseline, opset 17–21, conv fusions) across a catalog of 8 models on QNN NPU.
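A sweep over a fixed hypothesis matrix reduces to a nested loop over models and hypotheses, normalized to the baseline. The sketch below is illustrative only: the hypothesis names and config dicts are hypothetical (they are not the POC's h0–h5 definitions), and `bench` is any callable returning latency in ms. The metric here is the relative latency change `latency / baseline - 1`, so negative values are speedups; with the ResNet-18 conv-fusion numbers reported under npu-006 (132.3 ms vs 2.72 ms) it yields +47.64, i.e. a 4764% regression.

```python
from typing import Callable, Dict, List


def sweep(models: List[str],
          hypotheses: Dict[str, dict],
          bench: Callable[[str, dict], float],
          baseline: str = "baseline") -> Dict[str, Dict[str, float]]:
    """Benchmark every (model, hypothesis) pair.

    Returns table[model][hypothesis] = latency / baseline_latency - 1,
    so -0.26 means 26% lower latency and +47.6 means ~48.6x slower.
    """
    table: Dict[str, Dict[str, float]] = {}
    for model in models:
        base_ms = bench(model, hypotheses[baseline])
        table[model] = {
            # Reuse the baseline measurement instead of benchmarking it twice.
            name: (base_ms if name == baseline else bench(model, cfg)) / base_ms - 1.0
            for name, cfg in hypotheses.items()
        }
    return table
```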

Key findings — 8-model QNN NPU catalog sweep

npu-001: opset 21 NHWC bypass — architecture-specific (+26–31% on Conv+residual)

Opset ≥ 21 bypasses ORT's NHWC layout transformer for QNN EP. This gives a large speedup on Conv + residual models but is neutral or slightly harmful for pure transformers:

Architecture	Models	opset 21 vs opset 17
Conv + residual	MobileViT-small, DINOv2-small	+26–31% speedup
Pure transformer	ViT-base, YOLOS-small	neutral / slight regression
BERT-family NLP	DistilBERT, MiniLM, RoBERTa	neutral (within DVFS noise)

Root cause (confirmed in ORT source) is the IsSupportedOpset() gate in layout_transformation.cc: with the NHWC layout transform bypassed, fewer uncancellable Transpose nodes remain in the HTP graph for Conv+residual models.
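Because the opset 21 gain is architecture-specific, a per-model decision is safer than a global default. A minimal sketch, assuming Phase-B latencies for opset 17 and opset 21 are already measured: adopt opset 21 only when it is faster by at least the 10% noise floor from npu-007 below, and otherwise keep opset 17. The function names and the threshold default are illustrative, not part of winml-cli.

```python
from typing import Dict, Tuple


def choose_opset(lat17_ms: float, lat21_ms: float, noise: float = 0.10) -> Tuple[int, str]:
    """Pick opset 21 only if it beats opset 17 by at least the noise floor."""
    gain = 1.0 - lat21_ms / lat17_ms  # fraction of latency removed by opset 21
    if gain >= noise:
        return 21, f"opset 21 is {gain:.0%} faster"
    if gain <= -noise:
        return 17, f"opset 21 is {-gain:.0%} slower"
    return 17, "difference within noise; keep opset 17"


def choose_per_model(measured: Dict[str, Tuple[float, float]]) -> Dict[str, int]:
    """measured[model] = (opset-17 latency ms, opset-21 latency ms)."""
    return {m: choose_opset(a, b)[0] for m, (a, b) in measured.items()}
```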

npu-006: Conv fusions cause a ~49× slowdown (4764% regression) on ResNet-18 QNN NPU

conv_bn_fusion, conv_add_fusion, conv_activation_fusion produce fused op nodes that QNN EP cannot execute natively, causing CPU fallback for every fused Conv:

Model	conv fusions vs baseline
ResNet-18	132.3 ms vs 2.72 ms — 4764% regression
MobileViT-small	neutral (no residual-fused Conv)

Feature gap: winml should detect when the target EP would CPU-fallback fused Conv ops and suppress incompatible fusions automatically.
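The proposed detection could start as a simple graph scan plus a config patch. A minimal sketch, with several assumptions: nodes are represented by a small `Node` record (with the `onnx` package, each `NodeProto` in `model.graph.node` exposes the same `name` and `op_type` fields); the detector looks only for `FusedConv`, the com.microsoft contrib op that ORT's Conv+activation fusion emits, which may not cover every fused form; and the config keys reuse the fusion names from npu-006 with a hypothetical `qnn_npu` target string borrowed from the ep_knowledge/ folder names.

```python
from collections import namedtuple
from typing import Dict, Iterable, List

# Minimal node record mirroring the NodeProto fields this check needs.
Node = namedtuple("Node", ["name", "op_type"])

# Fusion names from npu-006; treating them as build-config keys is an assumption.
CONV_FUSIONS = ("conv_bn_fusion", "conv_add_fusion", "conv_activation_fusion")


def find_fused_convs(nodes: Iterable[Node]) -> List[str]:
    """Return the names of nodes emitted as the FusedConv contrib op."""
    return [n.name for n in nodes if n.op_type == "FusedConv"]


def suppress_conv_fusions(config: Dict[str, bool], target_ep: str) -> Dict[str, bool]:
    """Return a copy of config with every conv fusion disabled for QNN NPU."""
    patched = dict(config)  # never mutate the caller's config
    if target_ep == "qnn_npu":
        for key in CONV_FUSIONS:
            patched[key] = False
    return patched
```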

npu-007: DVFS thermal noise — CV < 15% gate must be disabled for QNN NPU

On QNN NPU, the coefficient of variation (CV = stdev / mean) is consistently 0.10–2.0+ across all models because of DVFS thermal throttling, and the Phase-A CV < 15% gate blocks all models. Reliable comparison requires ≥ 1500 total iterations (e.g. 3×500), and differences < 10% should be treated as noise.
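The statistics behind npu-007 are easy to make explicit. A minimal sketch using only the standard library: `cv` is the coefficient of variation used by the Phase-A gate, `session_averaged_ms` refuses to compare fewer than 1500 total iterations, and `is_real_difference` applies the 10% noise floor. Averaging per-session medians is an assumption about what "session-level averaging" means here, not necessarily what the POC does.

```python
import statistics
from typing import List, Sequence


def cv(samples: Sequence[float]) -> float:
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


def passes_phase_a(samples: Sequence[float], max_cv: float = 0.15) -> bool:
    """The CV < 15% Phase-A screen described above."""
    return cv(samples) < max_cv


def session_averaged_ms(sessions: List[Sequence[float]], min_total: int = 1500) -> float:
    """Average of per-session medians, refusing to answer on too few iterations."""
    total = sum(len(s) for s in sessions)
    if total < min_total:
        raise ValueError(f"need >= {min_total} total iterations, got {total}")
    return statistics.mean([statistics.median(s) for s in sessions])


def is_real_difference(baseline_ms: float, candidate_ms: float, noise: float = 0.10) -> bool:
    """Relative differences below `noise` are treated as DVFS noise."""
    return abs(candidate_ms - baseline_ms) / baseline_ms >= noise
```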

Feature gap: winml perf should support --thermal-stabilization mode and report confidence intervals.
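One way a perf report could carry a confidence interval rather than a bare p50 is a percentile bootstrap over the per-iteration latencies. This is a sketch of the statistical technique only: the option above is a proposal, so nothing here reflects an existing winml perf flag, and the resample count and seed are arbitrary choices.

```python
import random
import statistics
from typing import Sequence, Tuple


def p50_with_ci(samples: Sequence[float], level: float = 0.95,
                n_boot: int = 2000, seed: int = 0) -> Tuple[float, float, float]:
    """Return (p50, ci_low, ci_high) using a percentile bootstrap of the median."""
    rng = random.Random(seed)  # fixed seed makes reports reproducible
    n = len(samples)
    # Resample with replacement and record the median of each resample.
    boot_medians = sorted(
        statistics.median(rng.choices(samples, k=n)) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    low = boot_medians[int(tail * n_boot)]
    high = boot_medians[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return statistics.median(samples), low, high
```

A wide interval is itself a useful signal: if two configs' intervals overlap heavily, the comparison is likely inside DVFS noise.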

Feature gaps identified

FusedConv detection in winml analyze — detect Conv ops that would CPU-fallback on QNN NPU after fusion (npu-006); warn or suppress incompatible fusions in the generated build config
DVFS-aware perf — winml perf --thermal-stabilization; report confidence intervals, not just p50
Budget-aware sweep — --quick flag for large models (YOLOS 78ms × 3×500 iters = 207s/hypothesis exhausts 20-min budget after 2 hypotheses)
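A --quick mode could pick the Phase-B iteration count from the time budget instead of always running 3×500. A minimal sketch under stated assumptions: the function name, the fixed per-hypothesis overhead parameter, and the 300-iteration floor are hypothetical. Any count below 1500 gives up the reliability that npu-007 calls for, so quick-mode results are best treated as a screen.

```python
from typing import Tuple


def plan_phase_b_iters(latency_ms: float, budget_s: float, n_hypotheses: int,
                       overhead_s: float = 0.0, full_iters: int = 1500,
                       min_iters: int = 300) -> Tuple[int, bool]:
    """Return (iterations per hypothesis, whether that fits the budget).

    Caps at full_iters (3x500) and never goes below min_iters; if even
    min_iters does not fit, the second element is False so the caller can warn.
    """
    per_hyp_s = budget_s / n_hypotheses - overhead_s
    affordable = int(max(per_hyp_s, 0.0) * 1000.0 / latency_ms)
    if affordable < min_iters:
        return min_iters, False
    return min(full_iters, affordable), True
```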

What's included

autoconfig.py — adaptive single-model config search loop (ConvNeXt CPU baseline)
catalog_qnn_sweep.py — generalized multi-model QNN NPU sweep
analyze_graph.py — ONNX graph pattern analysis helper
autoconfig_diagram.html — Explorer/Optimizer/Reviewer architecture diagram
gen_report_v3.py — HTML report generator
ep_knowledge/ — empirical KB with confirmed findings per EP (cpu, dml, qnn_gpu, qnn_npu)
catalog-qnn-sweep/ — per-model results.json + SUMMARY.md for 8 catalog models

Status: research POC — not production code.

Adds research/autoconfig/ — an automated config search POC that sweeps opset versions (17-21), execution providers, and graph optimizations to find the best winml-cli build config for a given model on Windows hardware.

Key findings from 8-model QNN NPU catalog sweep:
- npu-001: opset 21 bypass gives +25-31% on Conv+residual models (MobileViT, DINOv2)
- npu-006: conv fusions (conv-bn/add/activation) cause a ~49× (4764%) regression on ResNet-18 QNN NPU
- npu-007: DVFS thermal noise requires session-level averaging (3x500 iters) for reliable results

Includes ep_knowledge/ KB with confirmed findings per EP, and catalog-qnn-sweep/ with per-model benchmark results and cross-model pattern analysis.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds research/autoconfig/docs/agent-design.md — strategic design for the agent layer of winml-cli, covering:
- winml-cli vs Olive distinction (UX + Windows-first + explainability)
- Why autoconfig search is a sub-tool, not the agent entry point
- 5 agent types: Diagnostic, Decision Guidance, Cross-Device Confidence, Regression Detection, Model Recommendation
- Autoconfig's role within the agent framework
- Key concerns and open questions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds research/autoconfig/docs/skills-design.md — full design doc for the winml-cli skills/agent layer, including:
- 11 skill designs (use-winml-cli, optimize-for-device, ep-compatibility-check, debug-accuracy-drop, and others)
- Competitive analysis (Apple coremltools, ExecuTorch, AI Hub, NVIDIA ModelOpt, OpenVINO, Olive)
- Top 5 feature gaps
- Validation confidence levels (L1-L5)
- Structured output requirements
- QNN NPU catalog sweep findings (npu-001/006/007)
- FusedConv unfuse feature request

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>



…ping skills
- Split skill catalog into two ranked categories by the 'does it touch code?' discriminator: User (config-only) and Contributor (code changes)
- Merge overlapping skills (12 -> 9):
  - check-model-feasibility = find-a-model + ep-compatibility-check
  - ship-to-winapp = validate-before-ship + prepare-for-winapp
  - autoconfig absorbs optimize-for-device as its manual mode
- Add self-contained HTML render of the design doc for easier reading

DingmaomaoBJTU requested a review from a team as a code owner June 15, 2026 02:30

github-actions Bot and others added 2 commits June 15, 2026 10:32

github-advanced-security AI found potential problems Jun 15, 2026


Comment thread research/autoconfig/gen_report_v3.py

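import json
from pathlib import Path

# Load the ablation results; the with-block closes the file handle.
with open(Path("ablation-search") / "results.json", encoding="utf-8") as f:
    results = json.load(f)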

DingmaomaoBJTU wants to merge 4 commits into main from dingmaomaobjtu/research-autoconfig-poc