
Analyse Measured vs Reported Runtime (v2) – Issue #396 (#432)

Open

Vamsipriya22 wants to merge 6 commits into main from priya/run-v2-runtimes

Conversation

@Vamsipriya22 (Contributor)

This PR relates to issue #396.

A new notebook has been added:

notebooks/analyze_runtime_discrepancies_v2.ipynb

The notebook analyses the difference between:

  • Runtime (s) (measured)
  • Reported Runtime (s) (solver-reported)

for all successful, non-reference benchmarks in the v2 results.


What was done

  • Filtered results with Status == "ok"
  • Computed:
    • runtime-difference
    • runtime-difference-% (absolute difference divided by the maximum of the two runtimes)
  • Merged benchmark metadata:
    • bench-size
    • solver-version
    • Num. variables
    • Num. constraints
    • size category (S / M / L)
  • Re-ran selected benchmarks (S, M, L) using run_solver.py, to check whether the discrepancies can be reproduced, with:
    • the same conda environment
    • the same solver version
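The filtering and difference computation described above can be sketched in pandas. The column names Status, Runtime (s), and Reported Runtime (s) come from the results table; the sample rows below are illustrative, not real v2 data:

```python
import pandas as pd

# Illustrative sample; the real notebook loads the v2 results file.
df = pd.DataFrame({
    "Benchmark": ["a", "b", "c"],
    "Status": ["ok", "ok", "TO"],
    "Runtime (s)": [0.010, 120.0, 3600.0],
    "Reported Runtime (s)": [0.002, 110.0, 3600.0],
})

# Keep only successful, non-timed-out runs.
ok = df[df["Status"] == "ok"].copy()

# Absolute difference between measured and solver-reported runtime.
ok["runtime-difference"] = (
    ok["Runtime (s)"] - ok["Reported Runtime (s)"]
).abs()

# Percentage difference: absolute difference divided by the
# maximum of the two runtimes, as described in the PR.
ok["runtime-difference-%"] = 100 * ok["runtime-difference"] / ok[
    ["Runtime (s)", "Reported Runtime (s)"]
].max(axis=1)

print(ok[["Benchmark", "runtime-difference", "runtime-difference-%"]])
```

Dividing by the larger of the two runtimes keeps the percentage bounded at 100% and symmetric in the two measurements.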

Key Findings

Small (S) Benchmarks

  • Absolute runtime difference is extremely small (often a few milliseconds).
  • Percentage difference appears very large (sometimes ~90%+), because the runtime itself is very small.
  • Likely causes:
    • Solver startup overhead
    • Measurement noise dominating short runtimes
[Screenshot from 2026-02-18 18-38-52]

Medium (M) Benchmarks

  • Moderate runtime differences (~5–10%).
  • Absolute differences typically around ~0.5–1 second.
  • Likely causes:
    • LP/MPS file parsing time
    • Preprocessing outside solver internal timing
[Screenshot from 2026-02-18 18-39-55]

Large (L) Benchmarks

  • Large absolute runtime differences (tens to hundreds of seconds).
  • Percentage differences are generally small (~1–17%).
  • Likely causes:
    • Slow parsing of large LP/MPS files
    • Solver initialization overhead
    • License validation time (especially Gurobi)
    • Additional setup not included in solver-reported runtime
[Screenshot from 2026-02-18 18-39-29]

Observations

  • Many benchmarks (including large ones) show zero difference, especially for highs-hipo-1.12.0-hipo.
[Screenshot from 2026-02-18 19-16-00]
  • No systematic inconsistency between measured and reported runtimes was observed.
  • Differences are primarily due to overhead outside the solver’s internal timing.

@vercel bot commented Feb 18, 2026

The latest updates on your projects:

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| solver-benchmark | Ready | Preview, Comment | Mar 5, 2026 3:11pm |

@eantonini (Member) left a comment:

Everything looks good, thanks for performing this analysis and adding the key conclusions to the Jupyter notebook.

@siddharth-krishna (Member) left a comment:

Thanks, Priya, for the analysis and detailed notes. I notice that in the notebook many of the bench-sizes with large runtime-difference-% have really small runtime. Can we:

  • Filter out the instances with runtime < 1 min: even if there is a big discrepancy here, people don't really care about such quick solution times.
  • Show the top 5 bench-sizes sorted by runtime-difference-%, in each category (S/M/L). I see this in the PR description but not in the Jupyter notebook. Also, the screenshot in the PR description for Medium instances includes S and L instances too, and the screenshot for Large instances isn't sorted by runtime-difference-%.
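Both requested changes could be implemented in the notebook along these lines. The column names bench-size, size, Runtime (s), and runtime-difference-% are assumptions matching the PR description; the sample rows are made up:

```python
import pandas as pd

# Illustrative frame standing in for the notebook's merged results.
ok = pd.DataFrame({
    "bench-size": ["a", "b", "c", "d", "e", "f"],
    "size": ["S", "M", "M", "L", "L", "L"],
    "Runtime (s)": [0.5, 90.0, 300.0, 45.0, 800.0, 7200.0],
    "runtime-difference-%": [90.0, 8.0, 5.0, 50.0, 17.0, 1.0],
})

# 1. Drop instances faster than one minute: large relative
#    discrepancies on sub-minute runs are not interesting.
slow = ok[ok["Runtime (s)"] >= 60]

# 2. Top 5 rows by runtime-difference-% within each size category.
top5 = (
    slow.sort_values("runtime-difference-%", ascending=False)
        .groupby("size", sort=False)
        .head(5)
)
print(top5)
```

Sorting first and then taking `groupby(...).head(5)` keeps each category's rows in descending discrepancy order without needing a per-group sort.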

"cell_type": "markdown",
"metadata": {},
"source": [
"# 4. Identify Benchmark-Solver Pairs with Largest Runtime Discrepancies"
A Member commented on this notebook cell:

All the items in this table have runtime ≤ 1 s. Can we first filter to those with runtime > 1 min and then check the largest runtime-difference-%?


Successfully merging this pull request may close these issues.

Analyze measured vs reported runtime in v2

3 participants