Add benchmarks for end-to-end metrics SDK usage#7768
dashpole wants to merge 2 commits into open-telemetry:main
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff            @@
##             main   #7768      +/-  ##
=========================================
  Coverage    81.7%   81.7%
=========================================
  Files         304     304
  Lines       23283   23283
=========================================
+ Hits        19030   19032       +2
+ Misses       3866    3864       -2
  Partials      387     387
Force-pushed from a4632e9 to 472ef54
Force-pushed from f11a492 to 7c8a0ad
I added more benchmark cases to try to make it comprehensive, but the results are now unreadable. I'll probably get rid of the varying cardinality, since it doesn't impact any of the results, only do 1 and 10 attributes, and get rid of the no-op results (since soon we can just use the Enabled() method anyway).
I've updated the description to explain the scenarios and why they were chosen, and removed some of the unnecessary permutations of the benchmark. @MrAlias, you had some issues with the framing of the "Dynamic" benchmark case, which I tried to address above. I did implement varying degrees of cardinality for the test, but it did not impact the results at all, so I removed it.
I would probably rather have this just in the metrics SDK package. @MrAlias, are you ok with that? Otherwise I can keep it in a new module.
Done
Force-pushed from 3160196 to 6ddded9
There's still a lint issue.
fixed |
Objective
Part of #7743. I need a benchmark that demonstrates the performance of using our API, SDK, and attribute packages together when following our performance guide: https://github.com/open-telemetry/opentelemetry-go/blob/main/CONTRIBUTING.md#attribute-and-option-allocation-management.
I settled on benchmarking three scenarios: "Precomputed", "Dynamic", and "Naive".
In the "Precomputed" scenario, it is assumed that the attribute set being measured against is known ahead of time, and that the instrumentation author can enumerate all possible sets, and precompute whatever they want, and keep references to it.
In the "Dynamic" scenario, it is assumed that the attribute set being measured against is not known ahead of time, and that it is not feasible to enumerate all possible attribute sets ahead of time. However, this scenario still assumes bounded cardinality, as writing metrics with an unbounded cardinality is not the intended use of the API. I had originally written these benchmarks with varying overall cardinality, but the cardinality does not impact the test results, as long as it is reasonable and bounded (e.g. < 100,000).
In the "Naive" scenario, it is assumed the user uses the API in the simplest, most ergonomic way. This is an attempt to measure the "default" experience of our API + SDK that users get when they use it.
I also found that relative benchmark results did not change when different levels of parallelism were used, so all benchmark results are single-threaded.
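For context, a single-threaded, end-to-end benchmark of this kind could be structured roughly as below; this is a sketch under the same hypothetical names, not the PR's code. It wires a real SDK MeterProvider to a ManualReader and drives the measurement with a plain b.N loop rather than b.RunParallel.

```go
package scenarios

import (
	"context"
	"testing"

	"go.opentelemetry.io/otel/attribute"
	api "go.opentelemetry.io/otel/metric"
	sdk "go.opentelemetry.io/otel/sdk/metric"
)

// BenchmarkNaiveAdd measures the full API + SDK path for the "Naive" scenario.
func BenchmarkNaiveAdd(b *testing.B) {
	provider := sdk.NewMeterProvider(sdk.WithReader(sdk.NewManualReader()))
	meter := provider.Meter("benchmark")
	counter, err := meter.Int64Counter("requests")
	if err != nil {
		b.Fatal(err)
	}

	ctx := context.Background()
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		counter.Add(ctx, 1, api.WithAttributes(attribute.String("http.method", "GET")))
	}
}
```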
Results
Observations