feat: report absolute eval time in nanoseconds instead of relative to reference #1383

Open

ArpanC6 wants to merge 3 commits into TuringLang:main from ArpanC6:improve/benchmark-absolute-time

Conversation

ArpanC6 (Contributor) commented May 3, 2026

Closes #1374 (partially)

What was the problem?

The benchmark reported t(eval) as a ratio relative to an arbitrary
reference function (simple_assume_observe_non_model), which made
results hard to interpret and added unnecessary noise.

What did I fix?

  • Removed the reference time baseline
  • Now reporting t(eval) in absolute nanoseconds (see the sketch after this list)
  • t(grad)/t(eval) ratio is kept as-is
  • Updated column headers accordingly
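
A minimal sketch of the new reporting shape, assuming BenchmarkTools as the harness (the repo's actual benchmark code may differ; `report_row`, `run_eval`, and `run_grad` are hypothetical names):

```julia
# Hedged sketch, not the repo's actual code: report t(eval) in absolute
# nanoseconds and keep the t(grad)/t(eval) ratio unchanged.
using BenchmarkTools

# `run_eval` / `run_grad` stand in for the model-evaluation and
# gradient-evaluation closures being benchmarked (hypothetical names).
function report_row(name, run_eval, run_grad)
    t_eval = median(@benchmark $run_eval()).time  # BenchmarkTools times are in ns
    t_grad = median(@benchmark $run_grad()).time
    # Before: t_eval was divided by the time of the arbitrary reference
    # (simple_assume_observe_non_model). Now it is reported directly in ns.
    (; name, t_eval_ns = t_eval, grad_over_eval = t_grad / t_eval)
end
```

The table then prints the `t_eval_ns` column as-is instead of a unitless ratio against the reference function.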

Note

Introducing Stan as an external baseline is a separate, more involved task left for a future PR.

codecov bot commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.35%. Comparing base (10a3651) to head (03079d5).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1383   +/-   ##
=======================================
  Coverage   82.35%   82.35%           
=======================================
  Files          50       50           
  Lines        3531     3531           
=======================================
  Hits         2908     2908           
  Misses        623      623           

☔ View full report in Codecov by Sentry.

ArpanC6 (Contributor, Author) commented May 3, 2026

Hello @yebai, this PR partially addresses #1374.

I've gone through the issue carefully and made the following changes:

  • Removed the arbitrary reference time baseline (simple_assume_observe_non_model)
  • t(eval) is now reported in absolute nanoseconds, which is more meaningful and interpretable
  • t(grad)/t(eval) ratio is kept as-is since it was already useful
  • Updated column headers accordingly

I also noted @penelopeysm's point that introducing an external baseline doesn't simplify comparative benchmarks: you still need two checkouts regardless (see the illustration below). So I've kept this PR focused on absolute time reporting for now.
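
With absolute nanoseconds, the two-checkout comparison reduces to a plain division; a quick illustration with made-up numbers:

```julia
# Illustrative numbers only: median t(eval) from two separate checkouts.
t_eval_main = 1520.0  # ns, hypothetical result on main
t_eval_pr   = 1480.0  # ns, hypothetical result on this branch
println("t(eval) ratio vs. main: ", round(t_eval_pr / t_eval_main, digits = 3))
```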

Note: The Benchmarking / combine results check is failing, but this appears to be a pre-existing CI issue unrelated to this PR.

Introducing Stan as an external baseline would be a separate, more involved task; I'm happy to work on that in a follow-up PR.
