sdk/trace/x: Add experimental TraceIDRatioBased sampler#8123
sdk/trace/x: Add experimental TraceIDRatioBased sampler#8123yuanyuanzhao3 wants to merge 3 commits intoopen-telemetry:mainfrom
Conversation
Add a threshold-based TraceIDRatioBased sampler that conforms to the OpenTelemetry specification. It uses the least significant 56 bits of the trace ID for deterministic sampling decisions and propagates the sampling threshold via the `th` sub-key in the W3C `ot` tracestate vendor key. When an explicit `rv` (randomness value) is present in the tracestate, it is used instead of the trace ID.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8123 +/- ##
=======================================
+ Coverage 81.8% 82.2% +0.4%
=======================================
Files 308 311 +3
Lines 23709 24316 +607
=======================================
+ Hits 19396 20007 +611
+ Misses 3935 3929 -6
- Partials 378 380 +2
🚀 New features to boost your workflow:
|
dashpole
left a comment
There was a problem hiding this comment.
I'm a bit confused by this still. I see we already have a TraceIDRatioBased sampler in the SDK. I also see that the sampler is marked deprecated in the spec: https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceidratiobased, so i'm not sure why we would be adding new features to it. The changes here seem to align with the ProbabilitySampler specification, not the TraceIDRatioBased sampler specification, so i'm not sure why this is named TraceIDRatioBased, and links to that specification.
@jmacd maybe you can clarify? Is the intent that SDKs introduce a ProbabilitySampler artifact along-side the TraceIDRatioBased sampler? Or is the intent that we only keep TraceIDRatioBased, but update it to follow the ProbabilitySampler spec (is that breaking)?
sdk/trace/x/sampler.go
Outdated
|
|
||
| const ( | ||
| // DefaultSamplingPrecision is the default precision for threshold encoding. | ||
| DefaultSamplingPrecision = 4 |
There was a problem hiding this comment.
Does this need to be exported? Is it useful for users in some way?
sdk/trace/x/sampler.go
Outdated
| // delegate of a ParentBased sampler. | ||
| // | ||
| //nolint:revive // XTraceIDRatioBased matches the OpenTelemetry sampler naming convention. | ||
| func XTraceIDRatioBased(fraction float64) sdktrace.Sampler { |
There was a problem hiding this comment.
No need to prefix this with X, since it is already in an experimental package.
There was a problem hiding this comment.
Mainly wanted to make it explicit that this is intended to become part of the TraceIDRatioBased but is now different (I'm aware it is under a different package). I can rename if that's what we want.
sdk/trace/x/sampler.go
Outdated
| return &xTraceIDRatioSampler{ | ||
| threshold: threshold, | ||
| thkv: "th:" + tvalue, | ||
| description: fmt.Sprintf("XTraceIDRatioBased{%g}", fraction), |
There was a problem hiding this comment.
This is required to be TraceIdRatioBased{%g} by the specification.
Not @jmacd, but here's my understanding: This is intended to be a new API-level compatible replacement for the original TraceIDRatioBased sampler. The difference lies in the underlying sampling algorithm and the information it passes through The new spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md is what this PR is about and it is a kind of probability sampling. I do not know why the place https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceidratiobased is marking TraceIdRatioBased as deprecated. |
| hbits = 4 | ||
| ) | ||
| if fraction > probabilityOneThreshold { | ||
| return sdktrace.AlwaysSample() |
There was a problem hiding this comment.
This doesn't propagate th properly, right? It also doesn't return the correct description.
Same comment for NeverSample below.
There was a problem hiding this comment.
I compared with https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/208b7c0565fe51033fa8d8b3d96a3c3dcba79a3f/pkg/sampling/probability.go#L33, which is an ancestor of this. It returns an error in these cases.
I realize this may have happened because in an earlier implementation of this, I had assumed that AlwaysSample() would return a sampler that sets ot=th:0. @yuanyuanzhao3 sadly note that https://opentelemetry.io/docs/specs/otel/trace/sdk/#alwayson does not dictate to set th:0, and I'm not sure myself whether this counts as oversight or just the path of least resistance. It suggests we should have ProbabilitySampler with fraction=1 fix this, later we can discuss modifying the OTel SDK (which is I think what we want).
The sdk/trace package (and experimental features in sdk/trace/x) needs to follow the specification in https://opentelemetry.io/docs/specs/otel/trace/sdk. The only place that tracestate-probability-sampling.md is referenced from in the SDK spec is from https://opentelemetry.io/docs/specs/otel/trace/sdk/#probabilitysampler. So if you are implementing https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md, I assume it should follow the ProbabilitySampler specification in the SDK. If my understanding is correct, please make sure it follows the ProbabilitySampler specification. Other than naming this ProbabilitySampler, i don't see the compatibility warning described here: https://opentelemetry.io/docs/specs/otel/trace/sdk/#compatibility-warnings-for-probabilitysampler |
|
I can probably explain some confusion -- there was a late-entering change in the sampling specification as it moved from OTEP 235 to the SDK specification, where we decided on the name @yuanyuanzhao3 see open-telemetry/opentelemetry-specification#4627 for details. Sorry for the confusion! Thank you @dashpole for your guidance. |
Description
Adds an experimental
XTraceIDRatioBasedsampler ingo.opentelemetry.io/otel/sdk/trace/xthat conforms to the OpenTelemetry specification's threshold-based sampling algorithm.Features
thhandling: Encodes and propagates the sampling threshold in the W3Cottracestate vendor key for consistent downstream samplingTraceFlags.IsRandom()andWithRandom()(trace: add Random Trace ID Flag #8012) for proper indication ofthvalue for extrapolated metrics supportFiles
sdk/trace/x/sampler.go— Core sampler implementation (XTraceIDRatioBased)sdk/trace/x/sampler_test.go— Tests for sampler behaviorsdk/trace/x/tracestate.go— Tracestateth/rvkey helperssdk/trace/x/tracestate_test.go— Tests for tracestate helperssdk/trace/x/README.md— Documentation for the experimental featureRelated
Co-Author
Joshua MacDonald [email protected]