
research: add autoconfig POC with QNN NPU catalog sweep #891

Open
DingmaomaoBJTU wants to merge 4 commits into main from dingmaomaobjtu/research-autoconfig-poc

@DingmaomaoBJTU (Collaborator)

What this is

Adds research/autoconfig/ — an experimental automated config search system that finds the optimal winml-cli build configuration (EP, opset, graph optimizations) for a given model on Windows hardware without requiring the user to understand ORT/EP optimizer internals.

The core loop (autoconfig.py) is Explorer → Optimizer → Reviewer:

  • Explorer proposes the next hypothesis, pruning already-refuted configs from ep_knowledge/
  • Optimizer runs winml build + winml perf (two-phase: 200-iter CV screen → 3×500-iter full bench)
  • Reviewer evaluates, updates the KB, decides keep/discard
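The Explorer → Optimizer → Reviewer loop can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code in autoconfig.py: `Hypothesis`, `KnowledgeBase`, `run_search`, and the `benchmark` callable are hypothetical names, and `benchmark` stands in for the winml build + winml perf step, returning a latency in ms or None when the build or run fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Hypothesis:
    """One candidate build config (hypothetical shape)."""
    ep: str
    opset: int
    fusions: tuple[str, ...] = ()


@dataclass
class KnowledgeBase:
    """Stand-in for ep_knowledge/: configs already shown not to help."""
    refuted: set[Hypothesis] = field(default_factory=set)


def run_search(
    candidates: Iterable[Hypothesis],
    benchmark: Callable[[Hypothesis], Optional[float]],
    kb: KnowledgeBase,
    max_trials: int = 10,
) -> tuple[Optional[Hypothesis], Optional[float]]:
    best: Optional[Hypothesis] = None
    best_ms: Optional[float] = None
    trials = 0
    for hyp in candidates:
        # Explorer: skip hypotheses the KB has already refuted.
        if hyp in kb.refuted:
            continue
        if trials >= max_trials:
            break
        trials += 1
        # Optimizer: build + benchmark (None means the build/run failed).
        latency_ms = benchmark(hyp)
        # Reviewer: keep a new best, otherwise record the hypothesis as refuted.
        if latency_ms is not None and (best_ms is None or latency_ms < best_ms):
            best, best_ms = hyp, latency_ms
        else:
            kb.refuted.add(hyp)
    return best, best_ms
```

Proposal order here is simply the order of `candidates`; a smarter Explorer would rank hypotheses using the KB, which this sketch does not attempt.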

catalog_qnn_sweep.py sweeps a fixed hypothesis matrix (h0–h5: baseline, opset 17–21, conv fusions) across a catalog of 8 models on QNN NPU.

Key findings — 8-model QNN NPU catalog sweep

npu-001: opset 21 NHWC bypass — architecture-specific (+26–31% on Conv+residual)

Opset ≥ 21 bypasses ORT's NHWC layout transformer for QNN EP. This gives a large speedup on Conv + residual models but is neutral or slightly harmful for pure transformers:

| Architecture | Models | opset 21 vs opset 17 |
|---|---|---|
| Conv + residual | MobileViT-small, DINOv2-small | +26–31% speedup |
| Pure transformer | ViT-base, YOLOS-small | neutral / slight regression |
| BERT-family NLP | DistilBERT, MiniLM, RoBERTa | neutral (within DVFS noise) |

Root cause, confirmed in the ORT source: the IsSupportedOpset() gate in layout_transformation.cc. Bypassing the NHWC layout transform leaves fewer uncancellable Transpose nodes in the HTP graph for Conv+residual models.
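A pre-flight check for this effect only needs the model's default-domain opset. The sketch below is hypothetical: the threshold of 21 comes from the npu-001 observation above rather than from an ORT constant, and `OpsetId` is a stand-in with the same `domain`/`version` fields as the entries of `onnx.ModelProto.opset_import`, so a real model could be checked with `expects_nhwc_bypass(onnx.load(path).opset_import)`.

```python
from typing import Iterable, NamedTuple

# Threshold from the npu-001 observation, not an ORT constant.
NHWC_BYPASS_MIN_OPSET = 21


class OpsetId(NamedTuple):
    """Stand-in for onnx.OperatorSetIdProto (same .domain/.version fields)."""
    domain: str
    version: int


def default_domain_opset(opset_imports: Iterable[OpsetId]) -> int:
    """Return the ai.onnx opset version; the default domain may be '' or 'ai.onnx'."""
    for entry in opset_imports:
        if entry.domain in ("", "ai.onnx"):
            return entry.version
    raise ValueError("model has no default-domain opset_import")


def expects_nhwc_bypass(opset_imports: Iterable[OpsetId],
                        threshold: int = NHWC_BYPASS_MIN_OPSET) -> bool:
    """True if, per npu-001, the NHWC layout transform would be skipped."""
    return default_domain_opset(opset_imports) >= threshold
```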

npu-006: Conv fusions cause a ~4800% regression (≈49× slower) on ResNet-18 QNN NPU

conv_bn_fusion, conv_add_fusion, conv_activation_fusion produce fused op nodes that QNN EP cannot execute natively, causing CPU fallback for every fused Conv:

| Model | Conv fusions vs baseline |
|---|---|
| ResNet-18 | 132.3 ms vs 2.72 ms (4764% regression) |
| MobileViT-small | neutral (no residual-fused Conv) |
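The regression figure follows from the two ResNet-18 latencies: the relative increase is (132.3 − 2.72) / 2.72 ≈ 47.6, i.e. 4764%, which is the same thing as the fused build being about 48.6× slower. A tiny helper (hypothetical names) keeps the two conventions apart:

```python
def regression_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Relative slowdown in percent (positive means the candidate is slower)."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


def slowdown_factor(baseline_ms: float, candidate_ms: float) -> float:
    """How many times slower the candidate is than the baseline."""
    return candidate_ms / baseline_ms


# ResNet-18 on QNN NPU, numbers from the table above.
print(f"{regression_pct(2.72, 132.3):.0f}% regression, "
      f"{slowdown_factor(2.72, 132.3):.1f}x slower")
# prints: 4764% regression, 48.6x slower
```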

Feature gap: winml should detect when the target EP would CPU-fallback fused Conv ops and suppress incompatible fusions automatically.
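The detection half of this gap reduces to scanning an optimized graph for fused Conv nodes. A hedged sketch, assuming the offending nodes are ORT's contrib `FusedConv` ops (domain `com.microsoft`), which matches the FusedConv naming used in this PR but is not verified here for every fusion listed; `Node`, `find_fallback_risk_nodes`, and `report` are hypothetical. `Node` mirrors the `op_type`/`domain`/`name` fields of `onnx.NodeProto`, so the same functions accept `onnx.load(path).graph.node` for a model saved after ORT optimization (for example via `SessionOptions.optimized_model_filepath`).

```python
from typing import Iterable, List, NamedTuple


class Node(NamedTuple):
    """Stand-in for onnx.NodeProto; only the fields used below."""
    op_type: str
    domain: str
    name: str = ""


# Assumption: the fused nodes that fall back are ORT contrib FusedConv ops.
FALLBACK_RISK_OPS = {("FusedConv", "com.microsoft")}


def find_fallback_risk_nodes(nodes: Iterable[Node]) -> List[Node]:
    """Return nodes whose (op_type, domain) is on the fallback-risk list."""
    return [n for n in nodes if (n.op_type, n.domain) in FALLBACK_RISK_OPS]


def report(nodes: Iterable[Node]) -> str:
    """One-line summary suitable for an analyze-style warning."""
    risky = find_fallback_risk_nodes(nodes)
    if not risky:
        return "ok: no FusedConv nodes"
    names = ", ".join(n.name or "<unnamed>" for n in risky)
    return f"warning: {len(risky)} FusedConv node(s) may fall back to CPU on QNN NPU: {names}"
```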

npu-007: DVFS thermal noise — CV < 15% gate must be disabled for QNN NPU

QNN NPU CV is consistently 0.10–2.0+ across all models due to DVFS thermal throttling. The Phase-A CV gate blocks all models. Reliable comparison requires ≥ 1500 total iterations; differences < 10% are unreliable.
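The gating rules above can be expressed as a few small functions. This is a hedged sketch, not the winml perf implementation: the 15% gate and 10% noise floor are the values quoted in npu-007, the `enabled` switch models disabling the gate for QNN NPU, and "session-level averaging" is taken here to mean median per session, then mean across sessions, which is an assumption about how the 3×500-iteration runs are combined. All function names are hypothetical.

```python
import statistics

PHASE_A_CV_GATE = 0.15  # "CV < 15%" screen from npu-007
NOISE_FLOOR = 0.10      # differences below 10% treated as noise on QNN NPU


def coefficient_of_variation(latencies_ms):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(latencies_ms) / statistics.mean(latencies_ms)


def passes_cv_gate(latencies_ms, threshold=PHASE_A_CV_GATE, enabled=True):
    # enabled=False models switching the gate off for QNN NPU.
    if not enabled:
        return True
    return coefficient_of_variation(latencies_ms) < threshold


def session_averaged_p50(sessions):
    """Median of each session, then the mean across sessions (assumed aggregation)."""
    return statistics.mean([statistics.median(s) for s in sessions])


def compare(baseline_ms, candidate_ms, noise_floor=NOISE_FLOOR):
    """Classify a candidate vs baseline, refusing to call sub-noise differences."""
    rel = (baseline_ms - candidate_ms) / baseline_ms
    if abs(rel) < noise_floor:
        return "inconclusive"
    return "faster" if rel > 0 else "slower"
```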

Feature gap: winml perf should support --thermal-stabilization mode and report confidence intervals.
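One way to report a confidence interval instead of a bare p50 is a percentile bootstrap over per-iteration latencies. This is a swapped-in technique for illustration, not a description of the proposed --thermal-stabilization mode, and `bootstrap_p50_ci` is a hypothetical name. The bootstrap assumes independent samples, while DVFS throttling makes consecutive iterations correlated, so an interval computed this way is likely too narrow; resampling whole sessions or blocks of iterations would be the more conservative choice.

```python
import random
import statistics


def bootstrap_p50_ci(latencies_ms, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the median latency."""
    rng = random.Random(seed)  # fixed seed so reports are reproducible
    n = len(latencies_ms)
    medians = sorted(
        statistics.median(rng.choices(latencies_ms, k=n))
        for _ in range(n_resamples)
    )
    lo = medians[int(alpha / 2 * n_resamples)]
    hi = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```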

Feature gaps identified

  1. FusedConv detection in winml analyze — detect Conv ops that would CPU-fallback on QNN NPU after fusion (npu-006); warn or suppress incompatible fusions in the generated build config
  2. DVFS-aware perf: winml perf --thermal-stabilization; report CIs, not just p50
  3. Budget-aware sweep: a --quick flag for large models (YOLOS at 78 ms/iter with 3×500 iters: 207 s/hypothesis, which exhausts the 20-min budget after 2 hypotheses)
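The budget arithmetic behind item 3 can be sketched as follows. All names are hypothetical and the --quick policy shown is an illustration, not a proposed spec. Counting only benchmark iterations (200 screen + 3×500), YOLOS at 78 ms/iter needs about 133 s; the 207 s reported above presumably also covers build and session setup, which is why the sketch carries an `overhead_s` term.

```python
import math

FULL_ITERS_PER_SESSION = 500  # Phase B: 3 sessions x 500 iters


def bench_seconds(latency_ms, screen_iters=200, sessions=3,
                  iters_per_session=FULL_ITERS_PER_SESSION, overhead_s=0.0):
    """Estimated wall time for one hypothesis: Phase A + Phase B iterations plus fixed overhead."""
    total_iters = screen_iters + sessions * iters_per_session
    return latency_ms * total_iters / 1000.0 + overhead_s


def hypotheses_that_fit(budget_s, per_hypothesis_s):
    """How many hypotheses a fixed time budget can cover."""
    return math.floor(budget_s / per_hypothesis_s)


def quick_iters_per_session(latency_ms, n_hypotheses, budget_s, screen_iters=200,
                            sessions=3, overhead_s=0.0, min_iters=100):
    """Hypothetical --quick policy: shrink Phase B so n_hypotheses fit the budget."""
    per_hyp_s = budget_s / n_hypotheses - overhead_s - latency_ms * screen_iters / 1000.0
    iters = int(per_hyp_s * 1000.0 / (latency_ms * sessions))
    # Never exceed the full protocol, never drop below a minimum sample size.
    return max(min_iters, min(iters, FULL_ITERS_PER_SESSION))
```

Clamping to `min_iters` means a tight budget degrades statistical power rather than skipping hypotheses; given npu-007, results from reduced runs would need the same noise-floor caution.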

What's included

  • autoconfig.py — adaptive single-model config search loop (ConvNext CPU baseline)
  • catalog_qnn_sweep.py — generalized multi-model QNN NPU sweep
  • analyze_graph.py — ONNX graph pattern analysis helper
  • autoconfig_diagram.html — Explorer/Optimizer/Reviewer architecture diagram
  • gen_report_v3.py — HTML report generator
  • ep_knowledge/ — empirical KB with confirmed findings per EP (cpu, dml, qnn_gpu, qnn_npu)
  • catalog-qnn-sweep/ — per-model results.json + SUMMARY.md for 8 catalog models

Status: research POC — not production code.

Adds research/autoconfig/ — an automated config search POC that sweeps
opset versions (17-21), execution providers, and graph optimizations to
find the best winml-cli build config for a given model on Windows hardware.

Key findings from 8-model QNN NPU catalog sweep:
- npu-001: opset 21 bypass gives +26-31% on Conv+residual models (MobileViT, DINOv2)
- npu-006: conv fusions (conv-bn/add/activation) cause a ~4800% regression (~49x slower) on ResNet-18 QNN NPU
- npu-007: DVFS thermal noise requires session-level averaging (3x500 iters) for reliable results

Includes ep_knowledge/ KB with confirmed findings per EP, and catalog-qnn-sweep/
with per-model benchmark results and cross-model pattern analysis.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DingmaomaoBJTU requested a review from a team as a code owner on June 15, 2026 02:30
github-actions Bot and others added 2 commits June 15, 2026 10:32
Adds research/autoconfig/docs/agent-design.md — strategic design for
the agent layer of winml-cli, covering:

- winml-cli vs Olive distinction (UX + Windows-first + explainability)
- Why autoconfig search is a sub-tool, not the agent entry point
- 5 agent types: Diagnostic, Decision Guidance, Cross-Device Confidence,
  Regression Detection, Model Recommendation
- Autoconfig's role within the agent framework
- Key concerns and open questions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds research/autoconfig/docs/skills-design.md — full design doc for
the winml-cli skills/agent layer, including:

- 11 skill designs (use-winml-cli, optimize-for-device,
  ep-compatibility-check, debug-accuracy-drop, and others)
- Competitive analysis (Apple coremltools, ExecuTorch, AI Hub,
  NVIDIA ModelOpt, OpenVINO, Olive)
- Top 5 feature gaps
- Validation confidence levels (L1-L5)
- Structured output requirements
- QNN NPU catalog sweep findings (npu-001/006/007)
- FusedConv unfuse feature request

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
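import json
from pathlib import Path

# Load the ablation-search results. Path joining avoids the Windows-only
# backslash literal, and the with-block closes the file handle.
with (Path("ablation-search") / "results.json").open(encoding="utf-8") as f:
    results = json.load(f)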
…ping skills

- Split skill catalog into two ranked categories by the 'does it touch code?'
  discriminator: User (config-only) and Contributor (code changes)
- Merge overlapping skills (12 -> 9):
  - check-model-feasibility = find-a-model + ep-compatibility-check
  - ship-to-winapp = validate-before-ship + prepare-for-winapp
  - autoconfig absorbs optimize-for-device as its manual mode
- Add self-contained HTML render of the design doc for easier reading