Sampling Strategies
Configure trace sampling to capture 100% of AI agent runs without sampling all traffic.
Sentry uses head-based sampling, which means the sampling decision happens once at the root span. All child spans, including gen_ai.* operations, inherit that decision. If the root span is dropped, every nested LLM call, tool execution, and agent handoff in the trace is lost with it.
Agent runs can produce many spans per execution, and you either capture the full span tree or lose it entirely. At a 10% sample rate, roughly 90% of agent runs are never recorded, including the ones that fail.
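The all-or-nothing behavior described above can be illustrated with a toy model. This is a minimal sketch of head-based sampling in general, not Sentry's implementation: `sample_trace`, `kept_fraction`, and the span names are hypothetical, chosen only for illustration.

```python
import random

def sample_trace(span_names, sample_rate, rng):
    """Toy head-based sampler: decide once at the root, then copy the decision."""
    root_sampled = rng.random() < sample_rate
    # Child spans never re-roll; they inherit the root's decision.
    return {name: root_sampled for name in span_names}

def kept_fraction(sample_rate, runs, seed=0):
    """Fraction of simulated agent runs whose LLM span survives sampling."""
    rng = random.Random(seed)
    # Illustrative span names only (assumption, not a required naming scheme).
    spans = ["http.server", "gen_ai.invoke_agent", "gen_ai.chat", "gen_ai.execute_tool"]
    kept = 0
    for _ in range(runs):
        decisions = sample_trace(spans, sample_rate, rng)
        # Every span in a trace shares one decision: fully kept or fully dropped.
        assert len(set(decisions.values())) == 1
        kept += decisions["gen_ai.chat"]
    return kept / runs

print(kept_fraction(0.1, 10_000))
```

With a rate of 0.1 the printed fraction lands near one in ten, and no run ever keeps its root span while dropping its LLM spans, or the reverse.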
When agent runs are the root span (cron jobs, queue consumers, CLI scripts), match on the presence of the gen_ai.operation.name attribute:
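import * as Sentry from "@sentry/node";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampler: ({ name, attributes, inheritOrSampleWith }) => {
    // Always keep root spans that are themselves gen_ai operations
    if (attributes?.["gen_ai.operation.name"]) {
      return 1.0;
    }
    // Otherwise follow the parent's decision, or fall back to 20%
    return inheritOrSampleWith(0.2);
  },
});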
import sentry_sdk

def traces_sampler(sampling_context):
    attributes = sampling_context.get("transaction_context", {}).get("data") or {}
    # Always keep root spans that are themselves gen_ai operations
    if attributes.get("gen_ai.operation.name"):
        return 1.0
    # Honor parent decision in distributed traces
    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)
    return 0.2

sentry_sdk.init(dsn="...", traces_sampler=traces_sampler)
When agent runs are nested inside HTTP request handlers (the more common case), the root span is the HTTP transaction. You need to identify the routes that trigger AI work and sample those at 100%:
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampler: ({ name, attributes, inheritOrSampleWith }) => {
    // Standalone gen_ai root spans
    if (attributes?.["gen_ai.operation.name"]) {
      return 1.0;
    }
    // HTTP routes that serve AI features
    if (
      name?.includes("/api/chat") ||
      name?.includes("/api/agent") ||
      name?.includes("/api/generate")
    ) {
      return 1.0;
    }
    return inheritOrSampleWith(0.2);
  },
});
def traces_sampler(sampling_context):
    tx_context = sampling_context.get("transaction_context", {})
    attributes = tx_context.get("data") or {}
    op = tx_context.get("op", "")
    name = tx_context.get("name", "")
    # Standalone gen_ai root spans
    if attributes.get("gen_ai.operation.name"):
        return 1.0
    # HTTP routes that serve AI features
    if op == "http.server" and any(
        p in name for p in ["/api/chat", "/api/agent", "/api/generate"]
    ):
        return 1.0
    # Honor parent decision in distributed traces
    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)
    return 0.2

sentry_sdk.init(dsn="...", traces_sampler=traces_sampler)
Replace /api/chat, /api/agent, and /api/generate with the actual routes in your application that handle AI requests.
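Because a sampler is a plain function, you can sanity-check its rules without sending any data. The sketch below repeats the Python sampler from above and calls it with hand-built contexts. The `ctx` helper and the `AI_ROUTES` constant are hypothetical, and the dictionaries only mimic the keys the sampler reads; they are an assumption, not a full reproduction of what the SDK passes at runtime.

```python
AI_ROUTES = ["/api/chat", "/api/agent", "/api/generate"]

def traces_sampler(sampling_context):
    tx_context = sampling_context.get("transaction_context", {})
    attributes = tx_context.get("data") or {}
    op = tx_context.get("op", "")
    name = tx_context.get("name", "")
    # Standalone gen_ai root spans
    if attributes.get("gen_ai.operation.name"):
        return 1.0
    # HTTP routes that serve AI features
    if op == "http.server" and any(p in name for p in AI_ROUTES):
        return 1.0
    # Honor parent decision in distributed traces
    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)
    return 0.2

def ctx(name="", op="", data=None, parent_sampled=None):
    """Build a stand-in sampling context (hypothetical helper, assumed shape)."""
    return {
        "transaction_context": {"name": name, "op": op, "data": data},
        "parent_sampled": parent_sampled,
    }

cases = [
    ("standalone agent run", ctx(data={"gen_ai.operation.name": "invoke_agent"}), 1.0),
    ("AI route", ctx(name="POST /api/chat", op="http.server"), 1.0),
    ("other route, no parent", ctx(name="GET /health", op="http.server"), 0.2),
    ("other route, parent sampled", ctx(name="GET /health", op="http.server", parent_sampled=True), 1.0),
    ("other route, parent dropped", ctx(name="GET /health", op="http.server", parent_sampled=False), 0.0),
    ("AI route, parent dropped", ctx(name="POST /api/chat", op="http.server", parent_sampled=False), 1.0),
]

for label, context, expected in cases:
    rate = traces_sampler(context)
    assert rate == expected, f"{label}: got {rate}, expected {expected}"
    print(f"{label}: {rate}")
```

The last case is worth noticing: because the route check runs before the parent check, this sampler keeps an AI route even when an upstream service already decided to drop the trace. If you would rather follow the upstream decision, move the parent check above the route check.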
The LLM API calls in an agent run cost far more than the trace events Sentry ingests for that same run, often by orders of magnitude. Dropping AI traces to save on observability rarely makes sense against that background.
If 100% trace sampling isn't feasible at your scale, you can supplement lower trace rates with metrics and structured logs that are emitted on every LLM call, regardless of sampling.
Emit custom Application Metrics on every LLM call to track token usage, latency, and error rates independently of traces:
import * as Sentry from "@sentry/node";

// `result` and `responseTimeMs` come from your own LLM call and timing code
Sentry.metrics.distribution(
  "gen_ai.token_usage",
  result.usage.totalTokens,
  {
    unit: "none",
    attributes: {
      model: "claude-sonnet-4-6",
      endpoint: "/api/chat",
    },
  },
);

Sentry.metrics.distribution("gen_ai.latency", responseTimeMs, {
  unit: "millisecond",
  attributes: { model: "claude-sonnet-4-6" },
});
Use Sentry structured logging to capture per-call details:
Sentry.logger.info("LLM call completed", {
  model: "claude-sonnet-4-6",
  input_tokens: result.usage.promptTokens,
  output_tokens: result.usage.completionTokens,
  latency_ms: responseTimeMs,
  status: "success",
});
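To make sure these values are recorded on every call, including failed ones, it can help to measure them in one place. The following is a minimal sketch of such a wrapper under stated assumptions: `instrumented_call`, `call_llm`, `emit`, and the shape of the `usage` dictionary are all hypothetical. The `emit` callback stands in for whichever metric or log calls you use, and the sketch calls no Sentry API itself.

```python
import time

def instrumented_call(call_llm, prompt, model, emit):
    """Run call_llm(prompt) and always emit one record, even if it raises."""
    start = time.perf_counter()
    status = "success"
    usage = {}
    try:
        result = call_llm(prompt)
        # Assumed response shape: {"usage": {"input_tokens": ..., "output_tokens": ...}}
        usage = result.get("usage", {})
        return result
    except Exception:
        status = "error"
        raise
    finally:
        # Runs on both the success and the error path.
        emit({
            "model": model,
            "input_tokens": usage.get("input_tokens"),
            "output_tokens": usage.get("output_tokens"),
            "latency_ms": (time.perf_counter() - start) * 1000.0,
            "status": status,
        })

# Usage with a fake client and a list as the sink (both are stand-ins).
records = []

def fake_llm(prompt):
    return {"text": "ok", "usage": {"input_tokens": 12, "output_tokens": 34}}

instrumented_call(fake_llm, "Hello", "example-model", records.append)
print(records[0]["status"], records[0]["input_tokens"], records[0]["output_tokens"])
```

Emitting from the `finally` block means failed calls are still counted with `status` set to `"error"`, which is what makes an error rate computable from these records independently of trace sampling.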