Observability

This document describes how to properly instrument AI agents with OpenTelemetry following the GenAI semantic conventions, with specific guidance for MLflow and Phoenix compatibility.

Overview

Agents emit traces following OpenTelemetry GenAI Semantic Conventions. The OTEL Collector transforms these to target-specific formats for:

MLflow: Experiment tracking and trace analysis
Phoenix: LLM observability and debugging

Agent (gen_ai.*) → OTEL Collector → Transform → MLflow + Phoenix

Span Naming Conventions

Agent Invocation Span

Per GenAI Agent Spans Spec:

Condition	Span Name Format
Agent name available	`invoke_agent {gen_ai.agent.name}`
Agent name unavailable	`invoke_agent`

Example:

# Correct span names:
"invoke_agent weather-assistant"  # When agent name is known
"invoke_agent"                    # Fallback when name unknown

Important: The gen_ai.operation.name attribute MUST be invoke_agent.

Other Operation Spans

Operation	Span Name	`gen_ai.operation.name`
LLM chat completion	`chat {gen_ai.request.model}`	`chat`
Tool execution	`execute_tool {gen_ai.tool.name}`	`execute_tool`
Embeddings	`embeddings {gen_ai.request.model}`	`embeddings`
Agent creation	`create_agent {gen_ai.agent.name}`	`create_agent`

Required Attributes

Agent Span Attributes (invoke_agent)

Attribute	Type	Requirement	Description
`gen_ai.operation.name`	string	Required	Must be `invoke_agent`
`gen_ai.provider.name`	string	Required	Provider: `langchain`, `crewai`, `openai`, etc.
`gen_ai.agent.name`	string	Conditionally Required	Agent identifier
`gen_ai.agent.id`	string	Conditionally Required	Unique agent instance ID
`gen_ai.conversation.id`	string	Conditionally Required	Session/conversation ID (for multi-turn)
`gen_ai.request.model`	string	Conditionally Required	Model name if applicable
`error.type`	string	Conditionally Required	Error class (only if operation failed)

Recommended Attributes

Attribute	Type	Description
`gen_ai.response.finish_reasons`	string[]	Why generation stopped: `["stop"]`, `["tool_calls"]`
`gen_ai.usage.input_tokens`	int	Tokens in prompt
`gen_ai.usage.output_tokens`	int	Tokens in response
`gen_ai.request.temperature`	double	Sampling temperature
`gen_ai.request.max_tokens`	int	Maximum tokens to generate
`gen_ai.response.model`	string	Actual model used (may differ from requested)

Optional (Opt-In, Sensitive)

Attribute	Type	Description
`gen_ai.input.messages`	any	Full conversation history
`gen_ai.output.messages`	any	Model responses
`gen_ai.system_instructions`	any	System prompt
`gen_ai.tool.definitions`	any	Available tools

Span Kind

Scenario	Span Kind
Remote agent service (OpenAI Assistants, AWS Bedrock)	`CLIENT`
In-process agent (LangChain, CrewAI, custom)	`INTERNAL`

MLflow-Specific Requirements

MLflow reads specific attributes for its UI columns. These are in addition to GenAI conventions.

MLflow UI Column Mapping

UI Column	Attribute	Source
Request	`mlflow.spanInputs`	Root span
Response	`mlflow.spanOutputs`	Root span
Trace Name	`mlflow.traceName`	Root span or trace tag
Session	`mlflow.trace.session`	Trace tag (from `gen_ai.conversation.id`)
User	`mlflow.user`	Root span
Tokens	`mlflow.span.chat_usage`	LLM span (JSON: `{"input_tokens": N, "output_tokens": M}`)
Type	`mlflow.spanType`	Span (`AGENT`, `LLM`, `TOOL`, `CHAIN`)

MLflow Span Type Values

`mlflow.spanType`	Use Case
`AGENT`	Agent invocation span
`LLM`	Direct LLM call
`TOOL`	Tool/function execution
`CHAIN`	Chain/workflow step
`RETRIEVER`	RAG retrieval
`EMBEDDING`	Embedding generation

Critical: Root Span Attributes

MLflow reads mlflow.spanInputs and mlflow.spanOutputs from the ROOT span only.

For streaming responses, you must explicitly set these on the root span:

from contextvars import ContextVar

_root_span: ContextVar = ContextVar('root_span', default=None)

def get_root_span():
    return _root_span.get()

# In middleware - store root span
async def middleware(request, call_next):
    with tracer.start_as_current_span("invoke_agent weather-assistant") as span:
        token = _root_span.set(span)
        span.set_attribute("mlflow.spanInputs", user_input)
        try:
            response = await call_next(request)
        finally:
            _root_span.reset(token)

# In agent - set output on root span (NOT current span)
def on_complete(output):
    root_span = get_root_span()
    if root_span:
        root_span.set_attribute("mlflow.spanOutputs", output)

Phoenix (OpenInference) Requirements

Note: Phoenix is an optional component (components.phoenix.enabled). The OTEL Collector’s Phoenix pipeline (traces/phoenix) is only deployed when Phoenix is enabled.

Phoenix uses OpenInference semantic conventions.

Required Attributes

Attribute	Type	Description
`openinference.span.kind`	string	Required: `AGENT`, `LLM`, `TOOL`, etc.
`input.value`	string	Input to the operation
`output.value`	string	Output from the operation

OpenInference Span Kinds

Kind	Use Case
`AGENT`	Agent invocation
`LLM`	LLM calls
`CHAIN`	Chain/workflow
`TOOL`	Tool execution
`RETRIEVER`	RAG retrieval
`EMBEDDING`	Embedding generation
`RERANKER`	Document reranking
`GUARDRAIL`	Safety checks

LLM-Specific Attributes

Attribute	Type	Description
`llm.model_name`	string	Model identifier
`llm.system`	string	AI vendor (`openai`, `anthropic`)
`llm.provider`	string	Hosting provider if different
`llm.token_count.prompt`	int	Input tokens
`llm.token_count.completion`	int	Output tokens
`llm.input_messages`	indexed	Chat messages (flattened)
`llm.output_messages`	indexed	Response messages

Complete Agent Instrumentation Example

"""
Weather Agent with proper GenAI + MLflow + Phoenix instrumentation.
"""
from contextvars import ContextVar
from opentelemetry import trace
from opentelemetry.trace import SpanKind, Status, StatusCode

# Agent metadata
AGENT_NAME = "weather-assistant"
AGENT_VERSION = "1.0.0"
PROVIDER = "langchain"

# ContextVar for root span access
_root_span: ContextVar = ContextVar('root_span', default=None)

def get_root_span():
    return _root_span.get()


def create_tracing_middleware():
    """Middleware that creates properly named root span."""
    tracer = trace.get_tracer("weather-agent")

    async def middleware(request, call_next):
        # Parse user input from request
        user_input = extract_user_input(request)
        session_id = extract_session_id(request)

        # Create root span with correct naming
        span_name = f"invoke_agent {AGENT_NAME}"

        with tracer.start_as_current_span(
            span_name,
            kind=SpanKind.INTERNAL,  # In-process agent
        ) as span:
            # Store for agent code access
            token = _root_span.set(span)

            try:
                # === GenAI Semantic Conventions (Required) ===
                span.set_attribute("gen_ai.operation.name", "invoke_agent")
                span.set_attribute("gen_ai.provider.name", PROVIDER)
                span.set_attribute("gen_ai.agent.name", AGENT_NAME)

                # === GenAI (Conditionally Required) ===
                if session_id:
                    span.set_attribute("gen_ai.conversation.id", session_id)

                # === MLflow-Specific ===
                if user_input:
                    span.set_attribute("mlflow.spanInputs", user_input[:1000])
                span.set_attribute("mlflow.spanType", "AGENT")
                span.set_attribute("mlflow.traceName", AGENT_NAME)
                span.set_attribute("mlflow.version", AGENT_VERSION)

                # === OpenInference (Phoenix) ===
                span.set_attribute("openinference.span.kind", "AGENT")
                if user_input:
                    span.set_attribute("input.value", user_input[:1000])

                # Call agent handler
                response = await call_next(request)

                span.set_status(Status(StatusCode.OK))
                return response

            except Exception as e:
                span.set_status(Status(StatusCode.ERROR, str(e)))
                span.set_attribute("error.type", type(e).__name__)
                span.record_exception(e)
                raise
            finally:
                _root_span.reset(token)

    return middleware


def set_agent_output(output: str):
    """Set output on root span (call after agent completes)."""
    root_span = get_root_span()
    if root_span and root_span.is_recording():
        truncated = output[:1000]

        # GenAI convention (for general OTEL consumers)
        root_span.set_attribute("gen_ai.completion", truncated)

        # MLflow (Response column)
        root_span.set_attribute("mlflow.spanOutputs", truncated)

        # Phoenix/OpenInference
        root_span.set_attribute("output.value", truncated)

OTEL Collector Transforms

The OTEL Collector can transform GenAI attributes to target formats:

processors:
  transform/genai_to_mlflow:
    trace_statements:
      - context: span
        statements:
          # Copy gen_ai.conversation.id to mlflow.trace.session
          - set(attributes["mlflow.trace.session"], attributes["gen_ai.conversation.id"])
            where attributes["gen_ai.conversation.id"] != nil

          # Set mlflow.spanType based on operation
          - set(attributes["mlflow.spanType"], "AGENT")
            where attributes["gen_ai.operation.name"] == "invoke_agent"
          - set(attributes["mlflow.spanType"], "LLM")
            where attributes["gen_ai.operation.name"] == "chat"
          - set(attributes["mlflow.spanType"], "TOOL")
            where attributes["gen_ai.operation.name"] == "execute_tool"

  transform/genai_to_openinference:
    trace_statements:
      - context: span
        statements:
          # Map gen_ai.* to OpenInference llm.*
          - set(attributes["llm.model_name"], attributes["gen_ai.request.model"])
          - set(attributes["llm.system"], attributes["gen_ai.provider.name"])
          - set(attributes["llm.token_count.prompt"], attributes["gen_ai.usage.input_tokens"])
          - set(attributes["llm.token_count.completion"], attributes["gen_ai.usage.output_tokens"])

Attribute Summary Table

Purpose	GenAI Attribute	MLflow Attribute	OpenInference Attribute
Operation type	`gen_ai.operation.name`	`mlflow.spanType`	`openinference.span.kind`
Agent name	`gen_ai.agent.name`	`mlflow.traceName`	-
Model	`gen_ai.request.model`	-	`llm.model_name`
Provider	`gen_ai.provider.name`	-	`llm.system`
Session	`gen_ai.conversation.id`	`mlflow.trace.session`	-
Input	`gen_ai.prompt`	`mlflow.spanInputs`	`input.value`
Output	`gen_ai.completion`	`mlflow.spanOutputs`	`output.value`
Input tokens	`gen_ai.usage.input_tokens`	-	`llm.token_count.prompt`
Output tokens	`gen_ai.usage.output_tokens`	-	`llm.token_count.completion`

Testing Requirements

E2E tests should verify:

Span naming: Root span name matches invoke_agent {agent_name} format
Required attributes: gen_ai.operation.name, gen_ai.provider.name present
MLflow columns: mlflow.spanInputs, mlflow.spanOutputs on root span
Phoenix attributes: openinference.span.kind, input.value, output.value
Session tracking: gen_ai.conversation.id propagated correctly
Token counts: Present when LLM returns usage data

References

Benchmarking

Kagenti Lifecycle Operators