Learn how to manually instrument your code to use Sentry's Agents module.
With Sentry AI Agent Monitoring, you can monitor and debug your AI systems with full-stack context. You'll be able to track key insights like token usage, latency, tool usage, and error rates. AI Agent Monitoring data will be fully connected to your other Sentry data like logs, errors, and traces.
Agent-Assisted Setup
Use curl to download, read and follow: https://skills.sentry.dev/sentry-setup-ai-monitoring/SKILL.md
Your agent will set up Sentry in your Python app automatically. Works with Cursor, Claude Code, Codex, and more.
Install the full skills package
Run this in your project to add Sentry agent skills. See the installation docs for more details.
As a prerequisite to setting up AI Agent Monitoring with Python, you'll need to first set up tracing. Once this is done, the Python SDK will automatically instrument AI agents created with supported libraries. If that doesn't fit your use case, you can use custom instrumentation described below.
The Python SDK supports automatic instrumentation for some AI libraries. We recommend adding their integrations to your Sentry configuration to automatically capture spans for AI agents.
For your AI agent data to show up in the AI Agents Dashboards, at least one AI span must be created with a well-defined name and data attributes, as described below.
gen_ai.invoke_agent is the container. gen_ai.chat and gen_ai.execute_tool spans are its children (siblings of each other). A gen_ai.chat span can also appear without an agent parent for standalone LLM calls.
[0]: Span attributes only allow primitive data types. This means you need to use a stringified version of a list of dictionaries. Do NOT set [{"foo": "bar"}] but rather the string '[{"foo": "bar"}]' (must be parsable JSON).
[1]: Messages use the format {role, parts} where parts is an array of typed objects: [{"role": "user", "parts": [{"type": "text", "content": "..."}]}]. The role must be "user", "assistant", "tool", or "system". For backwards compatibility, the legacy format {role, content} is also accepted.
[2]: Cached tokens are a subset of input tokens; gen_ai.usage.input_tokens includes gen_ai.usage.input_tokens.cached.
[3]: Reasoning tokens are a subset of output tokens; gen_ai.usage.output_tokens includes gen_ai.usage.output_tokens.reasoning.
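As a rough sketch of notes [0] and [1], the helpers below build messages in the {role, parts} format and serialize them to a JSON string before they are used as a span attribute value. The helper names `text_message` and `to_span_attribute` are hypothetical and not part of the Sentry SDK.

```python
import json

ALLOWED_ROLES = {"user", "assistant", "tool", "system"}


def text_message(role: str, text: str) -> dict:
    """Build one message in the {role, parts} format with a single text part."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return {"role": role, "parts": [{"type": "text", "content": text}]}


def to_span_attribute(messages: list) -> str:
    """Serialize messages to a JSON string, since span attributes must be primitives."""
    return json.dumps(messages)


messages = [
    text_message("system", "You are a helpful travel assistant."),
    text_message("user", "What's the weather in Paris?"),
]
attribute_value = to_span_attribute(messages)  # a str, not a list of dicts
```

The resulting string is what you would pass when setting a message attribute on a span, rather than the list itself.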
For a complete guide on naming agents across all supported frameworks, see Naming Your Agents.
This span marks the transition of control from one agent to another, typically when the current agent determines another agent is better suited to handle the task.
Handoff span attributes
A span that describes the handoff from one agent to another.
The span's op MUST be "gen_ai.handoff".
The span's name SHOULD be "handoff from {from_agent} to {to_agent}".
import sentry_sdk

with sentry_sdk.start_span(op="gen_ai.handoff", name="handoff from Weather Agent to Travel Agent"):
    pass  # Handoff span just marks the transition

with sentry_sdk.start_span(op="gen_ai.invoke_agent", name="invoke_agent Travel Agent"):
    # Run the target agent here
    pass
Tracking Conversations has alpha stability. Configuration options and behavior may change.
For AI applications that involve multi-turn conversations, you can use sentry_sdk.ai.set_conversation_id() to associate all AI spans from the same conversation. This enables you to track and analyze complete conversation flows within Sentry.
The conversation ID is set as the gen_ai.conversation.id attribute on all AI-related spans in the current scope. To remove the conversation ID, use the remove_conversation_id() method on the Scope.
import sentry_sdk.ai
sentry_sdk.ai.set_conversation_id("conv_abc123")

# All subsequent AI calls will be linked to this conversation
Some integrations, like the OpenAI integration, set the conversation ID for you automatically when you use APIs that expose one.
import sentry_sdk
import openai
sentry_sdk.init(...)

conversation = openai.conversations.create()
response = openai.responses.create(
    model="gpt-4.1",
    input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}],
    conversation=conversation.id,  # this will automatically set `gen_ai.conversation.id` on the span
)
Some attributes are common to all AI Agents spans:
| Data Attribute | Type | Requirement Level | Description | Example |
| --- | --- | --- | --- | --- |
| gen_ai.operation.name | string | required | The name of the operation being performed. [4] | "chat" |
| gen_ai.provider.name | string | optional | The Generative AI product as identified by the client or server instrumentation. | "openai" |
[4]: gen_ai.operation.name is what Sentry uses to classify spans in AI dashboards. Well-defined values include: "chat", "invoke_agent", "execute_tool", "embeddings", "generate_content", "text_completion", "create_agent", "handoff".
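One way to keep these common attributes consistent is a small helper that builds them in one place, sketched below. The names `common_attributes` and `is_well_defined` are hypothetical and not part of the Sentry SDK; the set of values is copied from note [4], which presents it as a list of well-defined values rather than an exhaustive one.

```python
from typing import Optional

# Well-defined gen_ai.operation.name values listed in note [4].
WELL_DEFINED_OPERATIONS = frozenset({
    "chat", "invoke_agent", "execute_tool", "embeddings",
    "generate_content", "text_completion", "create_agent", "handoff",
})


def is_well_defined(operation: str) -> bool:
    """Return True if the operation name is one of the values listed in note [4]."""
    return operation in WELL_DEFINED_OPERATIONS


def common_attributes(operation: str, provider: Optional[str] = None) -> dict:
    """Build the attributes shared by all AI Agents spans."""
    attributes = {"gen_ai.operation.name": operation}  # required
    if provider is not None:
        attributes["gen_ai.provider.name"] = provider  # optional
    return attributes
```

Each key and value in the returned dictionary can then be set on a span, for example with one `set_data` call per entry.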
When manually setting token attributes, be aware of how Sentry uses them to calculate model costs.
Cached and reasoning tokens are subsets, not separate counts. gen_ai.usage.input_tokens is the total input token count that already includes any cached tokens. Similarly, gen_ai.usage.output_tokens already includes reasoning tokens. Sentry subtracts the cached/reasoning counts from the totals to compute the "raw" portion, so reporting them incorrectly can produce wrong or negative costs.
For example, say your LLM call uses 100 input tokens total, 90 of which were served from cache. Using a standard rate of $0.01 per token and a cached rate of $0.001 per token:
Correct — input_tokens is the total (includes cached):
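The arithmetic for this example can be sketched as follows, assuming the cost is computed exactly as described above: the cached count is subtracted from the total, and each portion is billed at its own rate. The function `input_cost` is a hypothetical illustration, not Sentry's actual implementation.

```python
from decimal import Decimal

RATE = Decimal("0.01")          # standard rate per input token (from the example)
CACHED_RATE = Decimal("0.001")  # cached rate per input token (from the example)


def input_cost(input_tokens: int, cached_tokens: int) -> Decimal:
    """Cost of input tokens when cached tokens are treated as a subset of the total."""
    raw_tokens = input_tokens - cached_tokens  # uncached portion
    return raw_tokens * RATE + cached_tokens * CACHED_RATE


# Correct: input_tokens is the total (100), of which 90 were cached.
correct = input_cost(input_tokens=100, cached_tokens=90)   # 10 * 0.01 + 90 * 0.001

# Incorrect: input_tokens reports only the 10 uncached tokens,
# so the subtraction yields -80 "raw" tokens and a negative cost.
incorrect = input_cost(input_tokens=10, cached_tokens=90)  # -80 * 0.01 + 90 * 0.001
```

With the total reported correctly the cost comes to $0.19; reporting only the uncached tokens drives it to -$0.71.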