research: add autoconfig POC with QNN NPU catalog sweep #891
Open
DingmaomaoBJTU wants to merge 4 commits into
Adds research/autoconfig/: an automated config search POC that sweeps opset versions (17-21), execution providers, and graph optimizations to find the best winml-cli build config for a given model on Windows hardware.

Key findings from 8-model QNN NPU catalog sweep:
- npu-001: opset 21 bypass gives +25-31% on Conv+residual models (MobileViT, DINOv2)
- npu-006: conv fusions (conv-bn/add/activation) cause 4900% regression on ResNet-18 QNN NPU
- npu-007: DVFS thermal noise requires session-level averaging (3x500 iters) for reliable results

Includes ep_knowledge/ KB with confirmed findings per EP, and catalog-qnn-sweep/ with per-model benchmark results and cross-model pattern analysis.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds research/autoconfig/docs/agent-design.md: strategic design for the agent layer of winml-cli, covering:
- winml-cli vs Olive distinction (UX + Windows-first + explainability)
- Why autoconfig search is a sub-tool, not the agent entry point
- 5 agent types: Diagnostic, Decision Guidance, Cross-Device Confidence, Regression Detection, Model Recommendation
- Autoconfig's role within the agent framework
- Key concerns and open questions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds research/autoconfig/docs/skills-design.md: full design doc for the winml-cli skills/agent layer, including:
- 11 skill designs (use-winml-cli, optimize-for-device, ep-compatibility-check, debug-accuracy-drop, and others)
- Competitive analysis (Apple coremltools, ExecuTorch, AI Hub, NVIDIA ModelOpt, OpenVINO, Olive)
- Top 5 feature gaps
- Validation confidence levels (L1-L5)
- Structured output requirements
- QNN NPU catalog sweep findings (npu-001/006/007)
- FusedConv unfuse feature request

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    import json

    results = json.load(open(r"ablation-search\results.json"))
…ping skills

- Split skill catalog into two ranked categories by the 'does it touch code?' discriminator: User (config-only) and Contributor (code changes)
- Merge overlapping skills (12 -> 9):
  - check-model-feasibility = find-a-model + ep-compatibility-check
  - ship-to-winapp = validate-before-ship + prepare-for-winapp
  - autoconfig absorbs optimize-for-device as its manual mode
- Add self-contained HTML render of the design doc for easier reading
What this is
Adds research/autoconfig/, an experimental automated config search system that finds the optimal winml-cli build configuration (EP, opset, graph optimizations) for a given model on Windows hardware without requiring the user to understand ORT/EP optimizer internals.

The core loop (autoconfig.py) is Explorer → Optimizer → Reviewer:
- ep_knowledge/
- winml build + winml perf (two-phase: 200-iter CV screen → 3×500-iter full bench)

catalog_qnn_sweep.py sweeps a fixed hypothesis matrix (h0–h5: baseline, opset 17–21, conv fusions) across a catalog of 8 models on QNN NPU.

Key findings: 8-model QNN NPU catalog sweep
npu-001: opset 21 NHWC bypass — architecture-specific (+26–31% on Conv+residual)
Opset ≥ 21 bypasses ORT's NHWC layout transformer for QNN EP. This gives a large speedup on Conv+residual models but is neutral or slightly harmful for pure transformers.
Root cause confirmed in ORT source: the IsSupportedOpset() gate in layout_transformation.cc. Bypassing the NHWC layout transform leaves fewer uncancellable Transpose nodes in the HTP graph for Conv+residual models.

npu-006: Conv fusions cause ~4900% regression on ResNet-18 QNN NPU
conv_bn_fusion, conv_add_fusion, and conv_activation_fusion produce fused op nodes that QNN EP cannot execute natively, causing CPU fallback for every fused Conv.

Feature gap: winml should detect when the target EP would CPU-fallback fused Conv ops and suppress incompatible fusions automatically.

npu-007: DVFS thermal noise (CV < 15% gate must be disabled for QNN NPU)
QNN NPU CV is consistently 0.10–2.0+ across all models due to DVFS thermal throttling. The Phase-A CV gate blocks all models. Reliable comparison requires ≥ 1500 total iterations; differences < 10% are unreliable.
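The noise handling described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in autoconfig.py: the function names, the choice to average per-session p50 values, and the synthetic latencies are all assumptions made for the example. Only the 10% threshold and the idea of averaging over several sessions come from the finding itself.

```python
import statistics


def coefficient_of_variation(latencies_ms):
    """Return sample standard deviation divided by the mean (0.15 means 15%)."""
    return statistics.stdev(latencies_ms) / statistics.fmean(latencies_ms)


def session_level_latency(sessions_ms):
    """Average the per-session medians (p50) across independent sessions.

    Assumption: each inner list holds the latencies of one benchmark session.
    """
    return statistics.fmean([statistics.median(s) for s in sessions_ms])


def is_reliable_difference(baseline_ms, candidate_ms, min_rel_diff=0.10):
    """Treat relative changes below min_rel_diff (10% here) as noise."""
    return abs(candidate_ms - baseline_ms) / baseline_ms >= min_rel_diff


# Synthetic placeholder latencies: three short "sessions" per config,
# each with an occasional throttled outlier. Not measured data.
baseline_sessions = [[4.0, 4.2, 9.0], [4.1, 4.3, 4.2], [4.0, 8.5, 4.1]]
candidate_sessions = [[3.0, 3.1, 7.0], [3.2, 3.0, 3.1], [3.1, 3.0, 6.5]]

baseline = session_level_latency(baseline_sessions)
candidate = session_level_latency(candidate_sessions)
cv = coefficient_of_variation(baseline_sessions[0])

print(f"CV of first baseline session: {cv:.2f}")
print(f"baseline={baseline:.2f} ms, candidate={candidate:.2f} ms")
print(f"reliable difference: {is_reliable_difference(baseline, candidate)}")
```

Using the median inside each session keeps a single throttled iteration from skewing that session's number, while averaging across sessions smooths out session-to-session drift. A real run would use 500 iterations per session rather than three.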
Feature gap: winml perf should support a --thermal-stabilization mode and report confidence intervals.

Feature gaps identified
- winml analyze: detect Conv ops that would CPU-fallback on QNN NPU after fusion (npu-006); warn or suppress incompatible fusions in the generated build config
- winml perf --thermal-stabilization; report CI, not just p50
- --quick flag for large models (YOLOS 78ms × 3×500 iters = 207s/hypothesis exhausts 20-min budget after 2 hypotheses)

What's included
- autoconfig.py: adaptive single-model config search loop (ConvNext CPU baseline)
- catalog_qnn_sweep.py: generalized multi-model QNN NPU sweep
- analyze_graph.py: ONNX graph pattern analysis helper
- autoconfig_diagram.html: Explorer/Optimizer/Reviewer architecture diagram
- gen_report_v3.py: HTML report generator
- ep_knowledge/: empirical KB with confirmed findings per EP (cpu, dml, qnn_gpu, qnn_npu)
- catalog-qnn-sweep/: per-model results.json + SUMMARY.md for 8 catalog models

Status: research POC, not production code.