Instrument AI Agents

Learn how to manually instrument your code to use Sentry's Agents module.

With Sentry AI Agent Monitoring, you can monitor and debug your AI systems with full-stack context. You'll be able to track key insights like token usage, latency, tool usage, and error rates. AI Agent Monitoring data will be fully connected to your other Sentry data like logs, errors, and traces.

As a prerequisite to setting up AI Agent Monitoring with Ruby, you'll need to first set up tracing. Once this is done, you can use the custom instrumentation described below to capture AI agent spans.

For your AI agent data to show up in the AI Agents dashboards, you need to create at least one of the AI spans described below, with well-defined names and data attributes.

Make sure that a transaction is running when you create the spans. If you're using a web framework like Rails, those transactions are created for you automatically.

When instrumenting an agent loop, spans nest like this:

── invoke_agent My Agent          (gen_ai.invoke_agent)
   ├── chat gpt-4o                (gen_ai.chat)         ← 1st LLM call
   ├── execute_tool get_weather   (gen_ai.execute_tool)  ← tool run
   ├── chat gpt-4o                (gen_ai.chat)         ← 2nd LLM call
   └── ...

gen_ai.invoke_agent is the container. gen_ai.chat and gen_ai.execute_tool spans are its children (siblings of each other). A gen_ai.chat span can also appear without an agent parent for standalone LLM calls.

The AI Request span represents a request to an LLM or model service that generates a response based on the input prompt.

AI Request span attributes
  • The span op MUST be "gen_ai.{gen_ai.operation.name}". (e.g. "gen_ai.chat")
  • The span name SHOULD be "{gen_ai.operation.name} {gen_ai.request.model}". (e.g. "chat o3-mini")
  • The gen_ai.request.model attribute MUST be the requested model. (e.g. "o3-mini")
  • The gen_ai.response.model attribute MUST be the concrete model that responded. (e.g. "gpt-4o-2024-08-06")
  • If the request originates from an agent, gen_ai.agent.name SHOULD be set to the agent's name. (e.g. "Weather Agent")
  • If relevant, gen_ai.pipeline.name SHOULD be set to the name of the AI workflow or pipeline. (e.g. "weather-pipeline")
  • All Common Span Attributes SHOULD be set (all required common attributes MUST be set).
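The naming rules above are mechanical, so a small helper can keep the op, span name, and request attributes consistent across call sites. This is a minimal sketch in plain Ruby; `gen_ai_chat_span_options` is a hypothetical helper name, not part of sentry-ruby, and it only builds the values you would pass to `Sentry.with_child_span` and `span.set_data`.

```ruby
# Hypothetical helper (not part of sentry-ruby): builds the op, span name,
# and initial data for an AI Request span following the rules above.
def gen_ai_chat_span_options(request_model:, operation: 'chat', agent_name: nil, pipeline_name: nil)
  data = {
    'gen_ai.operation.name' => operation,
    'gen_ai.request.model' => request_model
  }
  # Optional attributes are only added when they apply
  data['gen_ai.agent.name'] = agent_name if agent_name
  data['gen_ai.pipeline.name'] = pipeline_name if pipeline_name

  {
    op: "gen_ai.#{operation}",                     # e.g. "gen_ai.chat"
    description: "#{operation} #{request_model}",  # e.g. "chat o3-mini"
    data: data
  }
end

opts = gen_ai_chat_span_options(request_model: 'o3-mini', agent_name: 'Weather Agent')
puts opts[:op]          # gen_ai.chat
puts opts[:description] # chat o3-mini
```

With Sentry, you would open the span with `Sentry.with_child_span(op: opts[:op], description: opts[:description])` and call `span.set_data(key, value)` for each entry in `opts[:data]`. gen_ai.response.model is left out on purpose, because it is only known once the response arrives.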

| Data Attribute | Type | Requirement Level | Description | Example |
|---|---|---|---|---|
| gen_ai.input.messages | string | optional | List of message objects sent to the LLM. [0], [1] | '[{"role": "user", "parts": [{"type": "text", "content": "..."}]}]' |
| gen_ai.tool.definitions | string | optional | List of objects describing the available tools. [0] | '[{"name": "random_number", "description": "..."}]' |
| gen_ai.system_instructions | string | optional | The system instructions passed to the model. | "You are a helpful assistant." |
| gen_ai.request.frequency_penalty | float | optional | Model configuration parameter. | 0.5 |
| gen_ai.request.max_tokens | int | optional | Model configuration parameter. | 500 |
| gen_ai.request.seed | string | optional | Seed for reproducible outputs. | "12345" |
| gen_ai.request.temperature | float | optional | Model configuration parameter. | 0.1 |
| gen_ai.request.top_k | int | optional | Limits model to K most likely next tokens. | 40 |
| gen_ai.request.top_p | float | optional | Model configuration parameter. | 0.7 |
| gen_ai.request.presence_penalty | float | optional | Model configuration parameter. | 0.5 |
| gen_ai.request.messages | string | optional | Deprecated. Use gen_ai.input.messages instead. List of message objects sent to the LLM. [0] | '[{"role": "system", "content": "..."}]' |
| gen_ai.request.available_tools | string | optional | Deprecated. Use gen_ai.tool.definitions instead. List of objects describing the available tools. [0] | '[{"name": "random_number", "description": "..."}]' |

| Data Attribute | Type | Requirement Level | Description | Example |
|---|---|---|---|---|
| gen_ai.response.model | string | required | The concrete model that responded (may differ from gen_ai.request.model). | "gpt-4o-2024-08-06" |
| gen_ai.output.messages | string | optional | Stringified array of message objects representing the model's output. [0], [1] | '[{"role": "assistant", "parts": [{"type": "text", "content": "..."}]}]' |
| gen_ai.response.finish_reasons | string | optional | Stringified array of reasons the model stopped generating. [0] | '["stop"]' |
| gen_ai.response.id | string | optional | Unique identifier for the completion. | "chatcmpl-abc123" |
| gen_ai.response.streaming | boolean | optional | Whether the response was streamed. | true |
| gen_ai.response.time_to_first_token | double | optional | Seconds until first response chunk in streaming. | 0.5 |
| gen_ai.response.tokens_per_second | double | optional | Output tokens per second throughput. | 50.0 |
| gen_ai.response.text | string | optional | Deprecated. Use gen_ai.output.messages instead. The text representation of the model's responses. | "The weather in Paris is rainy" |
| gen_ai.response.tool_calls | string | optional | Deprecated. Use gen_ai.output.messages instead. The tool calls in the model's response. [0] | '[{"name": "random_number", "type": "function_call", "arguments": "..."}]' |

| Data Attribute | Type | Requirement Level | Description | Example |
|---|---|---|---|---|
| gen_ai.usage.input_tokens | int | optional | The number of tokens used in the AI input (prompt), including cached tokens. [2] | 60 |
| gen_ai.usage.input_tokens.cached | int | optional | The number of cached tokens used in the AI input (prompt). | 50 |
| gen_ai.usage.input_tokens.cache_write | int | optional | Tokens written to cache when processing input. | 20 |
| gen_ai.usage.output_tokens | int | optional | The number of tokens used in the AI output, including reasoning tokens. [3] | 130 |
| gen_ai.usage.output_tokens.reasoning | int | optional | The number of tokens used for reasoning. | 30 |
| gen_ai.usage.total_tokens | int | optional | The sum of gen_ai.usage.input_tokens and gen_ai.usage.output_tokens. | 190 |

| Data Attribute | Type | Requirement Level | Description | Example |
|---|---|---|---|---|
| gen_ai.cost.input_tokens | double | optional | Cost of input tokens in USD (excluding cached tokens). | 0.005 |
| gen_ai.cost.output_tokens | double | optional | Cost of output tokens in USD (excluding reasoning tokens). | 0.015 |
| gen_ai.cost.total_tokens | double | optional | Total cost in USD for the tokens used. | 0.020 |

  • [0]: Span attributes only allow primitive data types. This means you need to use a stringified version of a list of dictionaries. Do NOT set [{"foo": "bar"}] but rather the string '[{"foo": "bar"}]' (must be parsable JSON).
  • [1]: Messages use the format {role, parts} where parts is an array of typed objects: [{"role": "user", "parts": [{"type": "text", "content": "..."}]}]. The role must be "user", "assistant", "tool", or "system". For backwards compatibility, the legacy format {role, content} is also accepted.
  • [2]: Cached tokens are a subset of input tokens; gen_ai.usage.input_tokens includes gen_ai.usage.input_tokens.cached.
  • [3]: Reasoning tokens are a subset of output tokens; gen_ai.usage.output_tokens includes gen_ai.usage.output_tokens.reasoning.
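Footnotes [0] and [1] work together: the attribute value must be a JSON string, and the messages inside it should use the {role, parts} shape. A minimal sketch of a converter, assuming your messages are Ruby hashes in either the legacy {role, content} shape or already in {role, parts} shape; `to_gen_ai_messages_json` and `ALLOWED_ROLES` are hypothetical names, and only plain-text content is handled. Since the legacy shape is still accepted, this normalizes rather than fixes anything.

```ruby
require 'json'

# Roles accepted in the {role, parts} message format (footnote [1])
ALLOWED_ROLES = %w[user assistant tool system].freeze

# Hypothetical helper (not part of sentry-ruby): converts messages given in
# the legacy {role, content} shape or the {role, parts} shape into the
# stringified JSON that gen_ai.input.messages / gen_ai.output.messages expect.
def to_gen_ai_messages_json(messages)
  converted = messages.map do |msg|
    role = (msg[:role] || msg['role']).to_s
    raise ArgumentError, "unsupported role: #{role.inspect}" unless ALLOWED_ROLES.include?(role)

    # Keep existing parts; otherwise wrap plain text content in a single text part
    parts = msg[:parts] || msg['parts'] ||
            [{ type: 'text', content: (msg[:content] || msg['content']).to_s }]
    { role: role, parts: parts }
  end
  converted.to_json
end

puts to_gen_ai_messages_json([{ role: 'user', content: 'Tell me a joke' }])
# [{"role":"user","parts":[{"type":"text","content":"Tell me a joke"}]}]
```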

require 'json'

# Messages in the provider's own format (OpenAI-style {role, content})
messages = [{ role: 'user', content: 'Tell me a joke' }]

# The same messages in the {role, parts} format used for span attributes
input_messages = messages.map do |m|
  { role: m[:role], parts: [{ type: 'text', content: m[:content] }] }
end

Sentry.with_child_span(op: 'gen_ai.chat', description: 'chat o3-mini') do |span|
  span.set_data('gen_ai.operation.name', 'chat')
  span.set_data('gen_ai.request.model', 'o3-mini')
  span.set_data('gen_ai.provider.name', 'openai')
  span.set_data('gen_ai.input.messages', input_messages.to_json)

  # `client` stands for your LLM client; adapt the call and the response accessors to your SDK
  result = client.chat(model: 'o3-mini', messages: messages)

  span.set_data('gen_ai.response.model', result.model)
  span.set_data('gen_ai.output.messages', [
    { role: 'assistant', parts: [{ type: 'text', content: result.choices[0].message.content }] }
  ].to_json)
  span.set_data('gen_ai.response.finish_reasons', [result.choices[0].finish_reason].to_json)
  span.set_data('gen_ai.usage.input_tokens', result.usage.prompt_tokens)
  span.set_data('gen_ai.usage.output_tokens', result.usage.completion_tokens)
end
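The example above records only prompt and completion counts. If your provider also reports cached and reasoning tokens, you can record those too. A minimal sketch for the OpenAI Chat Completions `usage` shape, assuming the response body has been parsed into a Hash with string keys; `openai_usage_attributes` is a hypothetical helper. In that shape `prompt_tokens` already includes `prompt_tokens_details.cached_tokens` and `completion_tokens` already includes `completion_tokens_details.reasoning_tokens`, which matches the subset semantics in footnotes [2] and [3], so the values map across unchanged.

```ruby
# Hypothetical helper (not part of sentry-ruby): maps an OpenAI-style
# Chat Completions `usage` hash to gen_ai.usage.* attributes. Totals already
# include cached/reasoning tokens, so no arithmetic is needed.
def openai_usage_attributes(usage)
  {
    'gen_ai.usage.input_tokens' => usage['prompt_tokens'],
    'gen_ai.usage.input_tokens.cached' => usage.dig('prompt_tokens_details', 'cached_tokens'),
    'gen_ai.usage.output_tokens' => usage['completion_tokens'],
    'gen_ai.usage.output_tokens.reasoning' => usage.dig('completion_tokens_details', 'reasoning_tokens'),
    'gen_ai.usage.total_tokens' => usage['total_tokens']
  }.compact # drop counts the provider did not report
end

usage = {
  'prompt_tokens' => 60, 'completion_tokens' => 130, 'total_tokens' => 190,
  'prompt_tokens_details' => { 'cached_tokens' => 50 },
  'completion_tokens_details' => { 'reasoning_tokens' => 30 }
}
openai_usage_attributes(usage).each { |key, value| puts "#{key}=#{value}" }
```

Inside the span block you would call `span.set_data(key, value)` for each pair. SDKs that return objects instead of hashes usually expose the same field names as methods; check your client library.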

The Invoke Agent span represents the execution of an AI agent, capturing the full lifecycle from receiving a task to producing a final response.

Invoke Agent span attributes

Describes AI agent invocation.

  • The span op MUST be "gen_ai.invoke_agent".
  • The span name SHOULD be "invoke_agent {gen_ai.agent.name}".
  • The gen_ai.operation.name attribute MUST be "invoke_agent".
  • The gen_ai.agent.name attribute SHOULD be set to the agent's name. (e.g. "Weather Agent")
  • If relevant, gen_ai.pipeline.name SHOULD be set to the name of the AI workflow or pipeline the agent belongs to.
  • All Common Span Attributes SHOULD be set (all required common attributes MUST be set).

Additional attributes on the span:

| Data Attribute | Type | Requirement Level | Description | Example |
|---|---|---|---|---|
| gen_ai.input.messages | string | optional | List of message objects given to the agent. [0], [1] | '[{"role": "user", "parts": [{"type": "text", "content": "..."}]}]' |
| gen_ai.tool.definitions | string | optional | List of objects describing the available tools. [0] | '[{"name": "random_number", "description": "..."}]' |
| gen_ai.system_instructions | string | optional | The system instructions passed to the model. | "You are a helpful assistant." |
| gen_ai.pipeline.name | string | optional | The name of the AI workflow or pipeline the agent belongs to. | "weather-pipeline" |
| gen_ai.request.messages | string | optional | Deprecated. Use gen_ai.input.messages instead. List of message objects given to the agent. [0] | '[{"role": "system", "content": "..."}]' |
| gen_ai.request.available_tools | string | optional | Deprecated. Use gen_ai.tool.definitions instead. List of objects describing the available tools. [0] | '[{"name": "random_number", "description": "..."}]' |

| Data Attribute | Type | Requirement Level | Description | Example |
|---|---|---|---|---|
| gen_ai.output.messages | string | optional | Stringified array of message objects representing the agent's output. [0], [1] | '[{"role": "assistant", "parts": [{"type": "text", "content": "..."}]}]' |
| gen_ai.response.text | string | optional | Deprecated. Use gen_ai.output.messages instead. The text representation of the agent's response. | "The weather in Paris is rainy" |
| gen_ai.response.tool_calls | string | optional | Deprecated. Use gen_ai.output.messages instead. The tool calls in the model's response. [0] | '[{"name": "random_number", "type": "function_call", "arguments": "..."}]' |

| Data Attribute | Type | Requirement Level | Description | Example |
|---|---|---|---|---|
| gen_ai.usage.input_tokens | int | optional | The number of tokens used in the AI input (prompt), including cached tokens. [2] | 60 |
| gen_ai.usage.input_tokens.cached | int | optional | The number of cached tokens used in the AI input (prompt). | 50 |
| gen_ai.usage.input_tokens.cache_write | int | optional | Tokens written to cache when processing input. | 20 |
| gen_ai.usage.output_tokens | int | optional | The number of tokens used in the AI output, including reasoning tokens. [3] | 130 |
| gen_ai.usage.output_tokens.reasoning | int | optional | The number of tokens used for reasoning. | 30 |
| gen_ai.usage.total_tokens | int | optional | The sum of gen_ai.usage.input_tokens and gen_ai.usage.output_tokens. | 190 |

| Data Attribute | Type | Requirement Level | Description | Example |
|---|---|---|---|---|
| gen_ai.cost.input_tokens | double | optional | Cost of input tokens in USD (excluding cached tokens). | 0.005 |
| gen_ai.cost.output_tokens | double | optional | Cost of output tokens in USD (excluding reasoning tokens). | 0.015 |
| gen_ai.cost.total_tokens | double | optional | Total cost in USD for the tokens used. | 0.020 |

  • [0]: Span attributes only allow primitive data types. This means you need to use a stringified version of a list of dictionaries. Do NOT set [{"foo": "bar"}] but rather the string '[{"foo": "bar"}]' (must be parsable JSON).
  • [1]: Messages use the format {role, parts} where parts is an array of typed objects: [{"role": "user", "parts": [{"type": "text", "content": "..."}]}]. The role must be "user", "assistant", "tool", or "system". For backwards compatibility, the legacy format {role, content} is also accepted.
  • [2]: Cached tokens are a subset of input tokens; gen_ai.usage.input_tokens includes gen_ai.usage.input_tokens.cached.
  • [3]: Reasoning tokens are a subset of output tokens; gen_ai.usage.output_tokens includes gen_ai.usage.output_tokens.reasoning.

require 'json'

Sentry.with_child_span(op: 'gen_ai.invoke_agent', description: 'invoke_agent Weather Agent') do |span|
  span.set_data('gen_ai.operation.name', 'invoke_agent')
  span.set_data('gen_ai.request.model', 'o3-mini')
  span.set_data('gen_ai.agent.name', 'Weather Agent')

  # `my_agent` stands for your agent; gen_ai.chat and gen_ai.execute_tool
  # spans started inside this block become children of this span
  result = my_agent.run

  span.set_data('gen_ai.output.messages', [
    { role: 'assistant', parts: [{ type: 'text', content: result.to_s }] }
  ].to_json)
  # Assumes the agent result exposes aggregate token usage
  span.set_data('gen_ai.usage.input_tokens', result.usage.input_tokens)
  span.set_data('gen_ai.usage.output_tokens', result.usage.output_tokens)
end
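If your agent framework does not expose aggregate usage on its result (the example above assumes `result.usage`), one option is to sum the usage of each LLM call made during the run and set the totals on the gen_ai.invoke_agent span. A minimal sketch; `AgentUsageTotals` is a hypothetical class, not part of sentry-ruby, and rolling usage up this way is a design choice rather than something this page requires.

```ruby
# Hypothetical accumulator (not part of sentry-ruby): sums the gen_ai.usage.*
# counts of every LLM call made during one agent run so the totals can be
# set on the gen_ai.invoke_agent span.
class AgentUsageTotals
  KEYS = %w[
    gen_ai.usage.input_tokens
    gen_ai.usage.input_tokens.cached
    gen_ai.usage.output_tokens
    gen_ai.usage.output_tokens.reasoning
  ].freeze

  def initialize
    @totals = Hash.new(0)
  end

  # usage: a Hash of gen_ai.usage.* attribute names to integer counts;
  # missing keys count as zero
  def add(usage)
    KEYS.each { |key| @totals[key] += usage.fetch(key, 0) }
    self
  end

  # Summing each key separately keeps the cached and reasoning counts
  # subsets of the summed input and output totals
  def to_attributes
    attrs = KEYS.to_h { |key| [key, @totals[key]] }
    attrs['gen_ai.usage.total_tokens'] =
      attrs['gen_ai.usage.input_tokens'] + attrs['gen_ai.usage.output_tokens']
    attrs
  end
end

totals = AgentUsageTotals.new
totals.add({ 'gen_ai.usage.input_tokens' => 40, 'gen_ai.usage.output_tokens' => 10 })
totals.add({ 'gen_ai.usage.input_tokens' => 60, 'gen_ai.usage.input_tokens.cached' => 40,
             'gen_ai.usage.output_tokens' => 25 })
puts totals.to_attributes['gen_ai.usage.total_tokens'] # 135
```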

The Execute Tool span represents the execution of a tool or function requested by an AI model, including the input arguments and the resulting output.

Execute Tool span attributes

Describes a tool execution.

  • The span op MUST be "gen_ai.execute_tool".
  • The span name SHOULD be "execute_tool {gen_ai.tool.name}". (e.g. "execute_tool query_database")
  • The gen_ai.tool.name attribute SHOULD be set to the name of the tool. (e.g. "query_database")
  • All Common Span Attributes SHOULD be set (all required common attributes MUST be set).

Additional attributes on the span:

| Data Attribute | Type | Requirement Level | Description | Example |
|---|---|---|---|---|
| gen_ai.tool.name | string | optional | Name of the tool executed. | "random_number" |
| gen_ai.tool.call.arguments | string | optional | Arguments of the tool call (stringified JSON). | "{\"max\":10}" |
| gen_ai.tool.call.result | string | optional | Result of the tool call (stringified). | "7" |
| gen_ai.tool.description | string | optional | Description of the tool executed. | "Tool returning a random number" |
| gen_ai.tool.type | string | optional | The type of the tool. | "function"; "extension"; "datastore" |
| gen_ai.tool.input | string | optional | Deprecated. Use gen_ai.tool.call.arguments instead. Input given to the executed tool as a string. | "{\"max\":10}" |
| gen_ai.tool.output | string | optional | Deprecated. Use gen_ai.tool.call.result instead. The output from the tool. | "7" |

require 'json'

Sentry.with_child_span(op: 'gen_ai.execute_tool', description: 'execute_tool get_weather') do |span|
  span.set_data('gen_ai.operation.name', 'execute_tool')
  span.set_data('gen_ai.tool.name', 'get_weather')
  span.set_data('gen_ai.tool.call.arguments', { location: 'Paris' }.to_json)

  result = get_weather(location: 'Paris')

  span.set_data('gen_ai.tool.call.result', result.to_json)
end
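When the tool call comes from a model response, the arguments often arrive already serialized. In the OpenAI tool-call shape, `function.arguments` is a JSON string, so calling `to_json` on it again would double-encode it. Likewise, `to_json` on a String result wraps it in extra quotes. A minimal sketch, assuming an OpenAI-style tool call parsed into a Hash; `execute_tool_span_options` and `tool_result_string` are hypothetical helpers.

```ruby
require 'json'

# Hypothetical helper (not part of sentry-ruby): builds Execute Tool span
# values from an OpenAI-style tool call, where function.arguments is
# already a JSON string and is therefore passed through unchanged.
def execute_tool_span_options(tool_call, description: nil)
  name = tool_call.dig('function', 'name')
  data = {
    'gen_ai.operation.name' => 'execute_tool',
    'gen_ai.tool.name' => name,
    'gen_ai.tool.type' => 'function',
    'gen_ai.tool.call.arguments' => tool_call.dig('function', 'arguments'),
    'gen_ai.tool.description' => description
  }.compact # omit the description when none is given
  { op: 'gen_ai.execute_tool', description: "execute_tool #{name}", data: data }
end

# Stringifies a tool result: strings pass through, anything else becomes JSON.
def tool_result_string(result)
  result.is_a?(String) ? result : result.to_json
end

call = {
  'id' => 'call_1', 'type' => 'function',
  'function' => { 'name' => 'get_weather', 'arguments' => '{"location":"Paris"}' }
}
opts = execute_tool_span_options(call)
puts opts[:description]          # execute_tool get_weather
puts tool_result_string(7)       # 7
puts tool_result_string('rainy') # rainy
```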

The Handoff span marks the transition of control from one agent to another, typically when the current agent determines that another agent is better suited to handle the task.

Handoff span attributes

A span that describes the handoff from one agent to another.

  • The span's op MUST be "gen_ai.handoff".
  • The span's name SHOULD be "handoff from {from_agent} to {to_agent}". (e.g. "handoff from Weather Agent to Travel Agent")
  • All Common Span Attributes SHOULD be set.

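Sentry.with_child_span(op: 'gen_ai.handoff', description: 'handoff from Weather Agent to Travel Agent') do |span|
  # The handoff span just marks the transition
  span.set_data('gen_ai.operation.name', 'handoff')
end

Sentry.with_child_span(op: 'gen_ai.invoke_agent', description: 'invoke_agent Travel Agent') do |span|
  span.set_data('gen_ai.operation.name', 'invoke_agent')
  span.set_data('gen_ai.agent.name', 'Travel Agent')
  # Run the target agent here
end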

Some attributes are common to all AI agent spans:

| Data Attribute | Type | Requirement Level | Description | Example |
|---|---|---|---|---|
| gen_ai.operation.name | string | required | The name of the operation being performed. [4] | "chat" |
| gen_ai.provider.name | string | optional | The Generative AI product as identified by the client or server instrumentation. | "openai" |
  • [4]: gen_ai.operation.name is what Sentry uses to classify spans in AI dashboards. Well-defined values include: "chat", "invoke_agent", "execute_tool", "embeddings", "generate_content", "text_completion", "create_agent", "handoff".

Well-defined values for gen_ai.provider.name: "anthropic", "aws.bedrock", "azure.ai.inference", "azure.ai.openai", "cohere", "deepseek", "gcp.gemini", "gcp.gen_ai", "gcp.vertex_ai", "groq", "ibm.watsonx.ai", "mistral_ai", "openai", "perplexity", "x_ai".
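Because gen_ai.operation.name drives how spans are classified, a casing or spelling slip such as "Chat" or "open_ai" is easy to miss. A minimal sketch of a development-time check against the well-defined values listed above; the constant and method names are hypothetical, and treating unknown values as warnings rather than errors is an assumption of this sketch.

```ruby
# Well-defined values from the lists above
WELL_DEFINED_OPERATIONS = %w[
  chat invoke_agent execute_tool embeddings generate_content
  text_completion create_agent handoff
].freeze

WELL_DEFINED_PROVIDERS = %w[
  anthropic aws.bedrock azure.ai.inference azure.ai.openai cohere deepseek
  gcp.gemini gcp.gen_ai gcp.vertex_ai groq ibm.watsonx.ai mistral_ai
  openai perplexity x_ai
].freeze

# Hypothetical check (not part of sentry-ruby): returns warnings for common
# attributes that are missing or outside the well-defined values.
def common_attribute_warnings(data)
  warnings = []
  op = data['gen_ai.operation.name']
  provider = data['gen_ai.provider.name']

  if op.nil?
    warnings << 'gen_ai.operation.name is required'
  elsif !WELL_DEFINED_OPERATIONS.include?(op)
    warnings << "unrecognized gen_ai.operation.name: #{op}"
  end

  # gen_ai.provider.name is optional, so only check it when present
  if provider && !WELL_DEFINED_PROVIDERS.include?(provider)
    warnings << "unrecognized gen_ai.provider.name: #{provider}"
  end
  warnings
end

p common_attribute_warnings({ 'gen_ai.operation.name' => 'Chat', 'gen_ai.provider.name' => 'openai' })
```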

When manually setting token attributes, be aware of how Sentry uses them to calculate model costs.

Cached and reasoning tokens are subsets, not separate counts. gen_ai.usage.input_tokens is the total input token count that already includes any cached tokens. Similarly, gen_ai.usage.output_tokens already includes reasoning tokens. Sentry subtracts the cached/reasoning counts from the totals to compute the "raw" portion, so reporting them incorrectly can produce wrong or negative costs.
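The subset rule matters most when a provider reports usage differently. Anthropic's Messages API, for example, reports `input_tokens`, `cache_read_input_tokens`, and `cache_creation_input_tokens` as separate counts, so its `input_tokens` is only the uncached portion, and copying it straight into gen_ai.usage.input_tokens produces the negative-cost problem described in this section. A minimal sketch of the mapping, assuming `usage` is the parsed `usage` hash from the response; `anthropic_usage_attributes` is a hypothetical helper, and counting cache writes as part of the total input is an assumption you should check against how you want them priced.

```ruby
# Hypothetical helper (not part of sentry-ruby): maps an Anthropic-style
# usage hash onto gen_ai.usage.* attributes. In that shape input_tokens
# excludes cache reads and cache writes, so they are added back to get the
# total that gen_ai.usage.input_tokens expects.
def anthropic_usage_attributes(usage)
  uncached = usage['input_tokens'] || 0
  cache_read = usage['cache_read_input_tokens'] || 0
  cache_write = usage['cache_creation_input_tokens'] || 0
  output = usage['output_tokens'] || 0
  # Assumption: cache writes count toward the total input
  total_input = uncached + cache_read + cache_write

  {
    'gen_ai.usage.input_tokens' => total_input,
    'gen_ai.usage.input_tokens.cached' => cache_read,
    'gen_ai.usage.input_tokens.cache_write' => cache_write,
    'gen_ai.usage.output_tokens' => output,
    'gen_ai.usage.total_tokens' => total_input + output
  }
end

attrs = anthropic_usage_attributes({
  'input_tokens' => 10, 'cache_read_input_tokens' => 90,
  'cache_creation_input_tokens' => 0, 'output_tokens' => 20
})
puts attrs['gen_ai.usage.input_tokens'] # 100
```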

For example, say your LLM call uses 100 input tokens total, 90 of which were served from cache. Using a standard rate of $0.01 per token and a cached rate of $0.001 per token:

Correct: input_tokens is the total (includes cached):

  • gen_ai.usage.input_tokens = 100
  • gen_ai.usage.input_tokens.cached = 90
  • Sentry calculates: (100 - 90) × $0.01 + 90 × $0.001 = $0.10 + $0.09 = $0.19

Wrong: input_tokens is set to only the non-cached tokens, making cached larger than the total:

  • gen_ai.usage.input_tokens = 10
  • gen_ai.usage.input_tokens.cached = 90
  • Sentry calculates: (10 - 90) × $0.01 + 90 × $0.001 = −$0.80 + $0.09 = −$0.71

Because input_tokens.cached (90) is larger than input_tokens (10), the subtraction goes negative, resulting in a negative total cost.

The same applies to gen_ai.usage.output_tokens and gen_ai.usage.output_tokens.reasoning.
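The input-side arithmetic from both examples can be sketched in Ruby, with a guard that rejects the wrong case before the numbers leave your code. `input_cost` is a hypothetical helper that mirrors the formula shown above; it is not Sentry's implementation, and real per-token rates come from your provider's pricing. The same guard applies to output and reasoning tokens.

```ruby
# Hypothetical helper mirroring the formula above: uncached input tokens at
# the standard rate plus cached tokens at the cached rate.
def input_cost(input_tokens:, cached_tokens:, rate:, cached_rate:)
  if cached_tokens > input_tokens
    raise ArgumentError,
          "cached (#{cached_tokens}) exceeds input_tokens (#{input_tokens}); " \
          'input_tokens must be the total, including cached tokens'
  end

  (input_tokens - cached_tokens) * rate + cached_tokens * cached_rate
end

# Correct example: (100 - 90) x 0.01 + 90 x 0.001
puts input_cost(input_tokens: 100, cached_tokens: 90, rate: 0.01, cached_rate: 0.001).round(2) # 0.19
```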
