[SDK] Implement the ProbabilitySampler#4135
Conversation
7a9d7a0 to
44a42f3
Compare
|
here is a demo against this branch to show the behaviour end to end. it shows the threshold boundary at R >= T, the th encoding, rv taking precedence over the trace id, th replacement, and a foreign tracestate entry surviving untouched. output: demo.cc#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>
#include "opentelemetry/common/key_value_iterable_view.h"
#include "opentelemetry/sdk/trace/samplers/probability.h"
#include "opentelemetry/trace/span_context.h"
#include "opentelemetry/trace/span_context_kv_iterable_view.h"
#include "opentelemetry/trace/trace_state.h"
#include "src/common/random.h"
namespace trace_api = opentelemetry::trace;
using opentelemetry::sdk::common::Random;
using opentelemetry::sdk::trace::Decision;
using opentelemetry::sdk::trace::ProbabilitySampler;
namespace {
trace_api::TraceId TraceIdWithLow56(uint64_t v) {
uint8_t buf[16] = {0};
for (int i = 0; i < 7; ++i)
buf[15 - i] = static_cast<uint8_t>(v >> (8 * i));
return trace_api::TraceId(buf);
}
opentelemetry::sdk::trace::SamplingResult
Sample(ProbabilitySampler &s, const trace_api::SpanContext &ctx,
trace_api::TraceId tid) {
std::map<std::string, int> attrs;
std::vector<
std::pair<trace_api::SpanContext, std::map<std::string, std::string>>>
links;
opentelemetry::common::KeyValueIterableView<decltype(attrs)> attrs_view{
attrs};
trace_api::SpanContextKeyValueIterableView<decltype(links)> links_view{links};
return s.ShouldSample(ctx, tid, "demo", trace_api::SpanKind::kInternal,
attrs_view, links_view);
}
const char *Header(const opentelemetry::sdk::trace::SamplingResult &r) {
static std::string h;
h = r.trace_state ? r.trace_state->ToHeader() : "(parent tracestate kept)";
return h.c_str();
}
} // namespace
int main() {
// ratio 1.0: always sampled, threshold 0 emitted as ot=th:0
ProbabilitySampler all(1.0);
auto r =
Sample(all, trace_api::SpanContext::GetInvalid(), TraceIdWithLow56(0));
printf("ratio 1.0 -> %s, tracestate: %s\n",
r.decision == Decision::RECORD_AND_SAMPLE ? "SAMPLE" : "DROP",
Header(r));
// ratio 0.5: threshold is 2^55. R = 2^55 samples, R = 2^55-1 drops.
ProbabilitySampler half(0.5);
r = Sample(half, trace_api::SpanContext::GetInvalid(),
TraceIdWithLow56(0x80000000000000));
printf("ratio 0.5, R=0x80000000000000 -> %s, tracestate: %s\n",
r.decision == Decision::RECORD_AND_SAMPLE ? "SAMPLE" : "DROP",
Header(r));
r = Sample(half, trace_api::SpanContext::GetInvalid(),
TraceIdWithLow56(0x7fffffffffffff));
printf("ratio 0.5, R=0x7fffffffffffff -> %s\n",
r.decision == Decision::RECORD_AND_SAMPLE ? "SAMPLE" : "DROP");
// 100000 random trace ids at ratio 0.5
int sampled = 0;
for (int i = 0; i < 100000; ++i) {
uint8_t buf[16];
Random::GenerateRandomBuffer(buf);
if (Sample(half, trace_api::SpanContext::GetInvalid(),
trace_api::TraceId(buf))
.decision == Decision::RECORD_AND_SAMPLE)
++sampled;
}
printf("ratio 0.5, 100000 random trace ids -> sampled %d (%.3f)\n", sampled,
sampled / 100000.0);
// explicit rv in the parent tracestate wins over trace id randomness
uint8_t id[16] = {1};
trace_api::SpanContext rv_high(
trace_api::TraceId(id), trace_api::SpanId(), trace_api::TraceFlags{0},
false, trace_api::TraceState::FromHeader("ot=rv:ffffffffffffff"));
r = Sample(half, rv_high, TraceIdWithLow56(0));
printf("ratio 0.5, rv=ff.. trace id low bits 0 -> %s, tracestate: %s\n",
r.decision == Decision::RECORD_AND_SAMPLE ? "SAMPLE" : "DROP",
Header(r));
// existing th is replaced, other entries survive
ProbabilitySampler quarter(0.25);
trace_api::SpanContext with_th(
trace_api::TraceId(id), trace_api::SpanId(), trace_api::TraceFlags{0},
false, trace_api::TraceState::FromHeader("ot=th:8,congo=t61rcWkgMzE"));
r = Sample(quarter, with_th, TraceIdWithLow56(0xffffffffffffff));
printf("ratio 0.25, parent 'ot=th:8,congo=..' -> %s, tracestate: %s\n",
r.decision == Decision::RECORD_AND_SAMPLE ? "SAMPLE" : "DROP",
Header(r));
return 0;
} |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4135 +/- ##
==========================================
+ Coverage 83.00% 83.08% +0.08%
==========================================
Files 406 407 +1
Lines 17256 17361 +105
==========================================
+ Hits 14321 14422 +101
- Misses 2935 2939 +4
🚀 New features to boost your workflow:
|
44a42f3 to
4685f2b
Compare
|
about last force push-
|
| /** | ||
| * Converts a ratio in [0, 1] to a 56-bit rejection threshold. | ||
| */ | ||
| uint64_t CalculateRejectionThreshold(double ratio) noexcept |
There was a problem hiding this comment.
Can we reuse CalculateThreshold, most of the logic seems same?
There was a problem hiding this comment.
Okay. I'm reusing CalculateThreshold and deleting the duplicate.
| return hex; | ||
| } | ||
|
|
||
| uint64_t RandomnessFromTraceId(const trace_api::TraceId &trace_id) noexcept |
There was a problem hiding this comment.
Okay. Reusing GetRandomnessFromTraceId.
| } | ||
|
|
||
| if (randomness < threshold_) | ||
| return {Decision::DROP, nullptr, {}}; |
There was a problem hiding this comment.
If this sampler drops the span, it returns no updated tracestate, so the child span falls back to the parent tracestate and can keep an old ot=th value. Since this is a new independent probability decision, that threshold is no longer valid. Can you check the logic here.
There was a problem hiding this comment.
Fixed it to strip th on the drop path and return the recomputed tracestate, with rv and the other sub keys preserved. Added a test.
…tale th on the drop path
4685f2b to
71a91df
Compare
| { | ||
| if (parent_context.IsValid() && !parent_context.trace_flags().IsRandom()) | ||
| { | ||
| static std::atomic<bool> warned{false}; |
There was a problem hiding this comment.
Non-blocking: probably sample on the warning instead of just showing the first warning.
There was a problem hiding this comment.
Kept it once-per-process to stay off the per-span hot path.
| } | ||
|
|
||
| std::string new_ot_value = SetThresholdSubKey(ot_value, EncodeThreshold(threshold_)); | ||
| if (!trace_api::TraceState::IsValidValue(new_ot_value)) |
There was a problem hiding this comment.
Same stale-th class of issue you fixed for the drop path can still occur on the sample path: when IsValidValue(new_ot_value) is false or the tracestate is already full, we return an empty trace_state, so the child inherits the parent's (possibly stale) ot=th. Should we strip the parent th (reusing the drop-path OtelTraceState logic) before returning RECORD_AND_SAMPLE in those branches?
There was a problem hiding this comment.
Yes. Reused the drop-path strip on the invalid-ot branch. The full branch is !has_ot, so nothing stale there. Added a test.
Signed-off-by: ayush-that <ayush1337@hotmail.com>
Fixes #4127
ref: https://opentelemetry.io/docs/specs/otel/trace/sdk/#probabilitysampler
Changes
adds the ProbabilitySampler from the spec. it turns the configured ratio into a 56 bit threshold, takes the randomness value from the rv key in the ot tracestate if there is one, otherwise from the last 7 bytes of the trace id, and samples the span when the randomness is greater than or equal to the threshold. sampled spans get the threshold written into the tracestate as the th key. the file configuration for the composable probability sampler now uses this sampler instead of TraceIdRatioBasedSampler.
the threshold keeps full 56 bit precision (same as opentelemetry-java-contrib). composable samplers are a separate effort (#4028), and the deprecated TraceIdRatioBased sampler is not changed.
CHANGELOG.mdupdated for non-trivial changes