Skip to content

sdk/trace/x: Add experimental TraceIDRatioBased sampler#8123

Open
yuanyuanzhao3 wants to merge 3 commits intoopen-telemetry:mainfrom
yuanyuanzhao3:traceidratio-sampler-x
Open

sdk/trace/x: Add experimental TraceIDRatioBased sampler#8123
yuanyuanzhao3 wants to merge 3 commits intoopen-telemetry:mainfrom
yuanyuanzhao3:traceidratio-sampler-x

Conversation

@yuanyuanzhao3
Copy link
Copy Markdown
Contributor

Description

Adds an experimental XTraceIDRatioBased sampler in go.opentelemetry.io/otel/sdk/trace/x that conforms to the OpenTelemetry specification's threshold-based sampling algorithm.

Features

  • Threshold-based sampling: Uses the least significant 56 bits of the trace ID (per W3C Trace Context Level 2 Random Trace ID Flag) for deterministic sampling decisions
  • Tracestate th handling: Encodes and propagates the sampling threshold in the W3C ot tracestate vendor key for consistent downstream sampling
  • Random bit support: Integrates with TraceFlags.IsRandom() and WithRandom() (trace: add Random Trace ID Flag #8012) for proper indication of th value for extrapolated metrics support

Files

  • sdk/trace/x/sampler.go — Core sampler implementation (XTraceIDRatioBased)
  • sdk/trace/x/sampler_test.go — Tests for sampler behavior
  • sdk/trace/x/tracestate.go — Tracestate th/rv key helpers
  • sdk/trace/x/tracestate_test.go — Tests for tracestate helpers
  • sdk/trace/x/README.md — Documentation for the experimental feature

Related

Co-Author

Joshua MacDonald [email protected]

Add a threshold-based TraceIDRatioBased sampler that conforms to the
OpenTelemetry specification. It uses the least significant 56 bits of
the trace ID for deterministic sampling decisions and propagates the
sampling threshold via the `th` sub-key in the W3C `ot` tracestate
vendor key. When an explicit `rv` (randomness value) is present in
the tracestate, it is used instead of the trace ID.
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 97.14286% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.2%. Comparing base (97a086e) to head (6c0d82c).
⚠️ Report is 23 commits behind head on main.

Files with missing lines Patch % Lines
sdk/trace/x/sampler.go 94.4% 2 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #8123     +/-   ##
=======================================
+ Coverage   81.8%   82.2%   +0.4%     
=======================================
  Files        308     311      +3     
  Lines      23709   24316    +607     
=======================================
+ Hits       19396   20007    +611     
+ Misses      3935    3929      -6     
- Partials     378     380      +2     
Files with missing lines Coverage Δ
sdk/trace/x/tracestate.go 100.0% <100.0%> (ø)
sdk/trace/x/sampler.go 94.4% <94.4%> (ø)

... and 28 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this still. I see we already have a TraceIDRatioBased sampler in the SDK. I also see that the sampler is marked deprecated in the spec: https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceidratiobased, so i'm not sure why we would be adding new features to it. The changes here seem to align with the ProbabilitySampler specification, not the TraceIDRatioBased sampler specification, so i'm not sure why this is named TraceIDRatioBased, and links to that specification.

@jmacd maybe you can clarify? Is the intent that SDKs introduce a ProbabilitySampler artifact along-side the TraceIDRatioBased sampler? Or is the intent that we only keep TraceIDRatioBased, but update it to follow the ProbabilitySampler spec (is that breaking)?


const (
// DefaultSamplingPrecision is the default precision for threshold encoding.
DefaultSamplingPrecision = 4
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be exported? Is it useful for users in some way?

// delegate of a ParentBased sampler.
//
//nolint:revive // XTraceIDRatioBased matches the OpenTelemetry sampler naming convention.
func XTraceIDRatioBased(fraction float64) sdktrace.Sampler {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to prefix this with X, since it is already in an experimental package.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly wanted to make it explicit that this is intended to become part of the TraceIDRatioBased but is now different (I'm aware it is under a different package). I can rename if that's what we want.

return &xTraceIDRatioSampler{
threshold: threshold,
thkv: "th:" + tvalue,
description: fmt.Sprintf("XTraceIDRatioBased{%g}", fraction),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required to be TraceIdRatioBased{%g} by the specification.

@yuanyuanzhao3
Copy link
Copy Markdown
Contributor Author

I'm a bit confused by this still. I see we already have a TraceIDRatioBased sampler in the SDK. I also see that the sampler is marked deprecated in the spec: https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceidratiobased, so i'm not sure why we would be adding new features to it. The changes here seem to align with the ProbabilitySampler specification, not the TraceIDRatioBased sampler specification, so i'm not sure why this is named TraceIDRatioBased, and links to that specification.

@jmacd maybe you can clarify? Is the intent that SDKs introduce a ProbabilitySampler artifact along-side the TraceIDRatioBased sampler? Or is the intent that we only keep TraceIDRatioBased, but update it to follow the ProbabilitySampler spec (is that breaking)?

Not @jmacd, but here's my understanding:

This is intended to be a new API-level compatible replacement for the original TraceIDRatioBased sampler. The difference lies in the underlying sampling algorithm and the information it passes through tracestate, which allows proper extrapolation of span metrics. I don't believe the previous tracestate information was ever used by any of our supported connectors (but others might know a place or two that I'm not aware of).

The new spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md is what this PR is about and it is a kind of probability sampling.

I do not know why the place https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceidratiobased is marking TraceIdRatioBased as deprecated.

hbits = 4
)
if fraction > probabilityOneThreshold {
return sdktrace.AlwaysSample()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't propagate th properly, right? It also doesn't return the correct description.

Same comment for NeverSample below.

Copy link
Copy Markdown
Contributor

@jmacd jmacd Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared with https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/208b7c0565fe51033fa8d8b3d96a3c3dcba79a3f/pkg/sampling/probability.go#L33, which is an ancestor of this. It returns an error in these cases.

I realize this may have happened because in an earlier implementation of this, I had assumed that AlwaysSample() would return a sampler that sets ot=th:0. @yuanyuanzhao3 sadly note that https://opentelemetry.io/docs/specs/otel/trace/sdk/#alwayson does not dictate to set th:0, and I'm not sure myself whether this counts as oversight or just the path of least resistance. It suggests we should have ProbabilitySampler with fraction=1 fix this, later we can discuss modifying the OTel SDK (which is I think what we want).

@dashpole
Copy link
Copy Markdown
Contributor

dashpole commented Apr 8, 2026

The new spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md is what this PR is about and it is a kind of probability sampling.

I do not know why the place https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceidratiobased is marking TraceIdRatioBased as deprecated.

The sdk/trace package (and experimental features in sdk/trace/x) needs to follow the specification in https://opentelemetry.io/docs/specs/otel/trace/sdk. The only place that tracestate-probability-sampling.md is referenced from in the SDK spec is from https://opentelemetry.io/docs/specs/otel/trace/sdk/#probabilitysampler. So if you are implementing https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md, I assume it should follow the ProbabilitySampler specification in the SDK.

If my understanding is correct, please make sure it follows the ProbabilitySampler specification.

Other than naming this ProbabilitySampler, i don't see the compatibility warning described here: https://opentelemetry.io/docs/specs/otel/trace/sdk/#compatibility-warnings-for-probabilitysampler

@yuanyuanzhao3 yuanyuanzhao3 changed the title sdk/trace/x: Add experimental XTraceIDRatioBased sampler sdk/trace/x: Add experimental TraceIDRatioBased sampler Apr 9, 2026
@jmacd
Copy link
Copy Markdown
Contributor

jmacd commented Apr 10, 2026

I can probably explain some confusion -- there was a late-entering change in the sampling specification as it moved from OTEP 235 to the SDK specification, where we decided on the name ProbabilitySampler and that TraceIDRatioBased would have a deprecation period.

@yuanyuanzhao3 see open-telemetry/opentelemetry-specification#4627 for details. Sorry for the confusion! Thank you @dashpole for your guidance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants