Trace Analytics

Overview

Trace Analytics provides visibility into your AI agent workflows, helping you understand performance and costs across your AI operations. A "trace" represents a complete workflow execution, from the initial request through all agent interactions to the final response. By grouping related AI transactions under a single trace ID, you can analyze the full picture of complex AI workflows.

Key Benefits

  • Anomaly Detection: Automatically identify outlier traces using statistical analysis

  • Performance Optimization: Identify slow operations and bottlenecks in your AI workflows

  • Cost Management: Track and analyze spending across models, providers, and trace types

  • Efficiency Analysis: Identify optimization opportunities and circular patterns, and streamline workflows

  • Multi-Agent Analysis: Understand agent-to-agent communication patterns and costs

  • Workflow Visualization: Visualize transaction dependencies and critical paths with the Dependency Tree

  • Detailed Insights: Drill down into individual transactions for granular analysis


Getting Started

Navigate to Traces from the main navigation menu to access the Trace Analytics dashboard. The page is organized with a tab-based interface:

  1. Cost Tab: Analyze costs across your traces and trace types, with integrated cost anomaly detection

  2. Performance Tab: Monitor duration and performance metrics, with integrated performance anomaly detection

  3. Efficiency Tab: Identify optimization opportunities, inefficient patterns, and circular call patterns, with integrated efficiency anomaly detection

  4. Agent Interaction Tab: Analyze multi-agent communication patterns and costs

Each tab provides metric cards, insight cards, trend charts, anomaly detection sections, data tables, and the ability to drill down into individual trace details with the Trace Visualization.


Anomaly Detection

Each of the Cost, Performance, and Efficiency tabs includes an integrated Anomalies Section that automatically detects outlier traces using percentile-based thresholds:

  • P99 (Critical) - Traces exceeding the 99th percentile (top 1%) - require immediate attention

  • P95 (High) - Traces exceeding the 95th percentile (top 5%) - should be reviewed and optimized

  • P75 (Moderate) - Traces exceeding the 75th percentile (top 25%) - monitor trends closely
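
The percentile tiers above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nearest-rank percentile method, the sample costs, and the rule that a trace is reported at the highest tier it exceeds are not taken from the product and may differ from how Trace Analytics computes thresholds.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def classify(value, thresholds):
    """Return the highest severity tier whose threshold the value exceeds, or None."""
    for label, pct in (("Critical", 99), ("High", 95), ("Moderate", 75)):
        if value > thresholds[pct]:
            return label
    return None

# Hypothetical per-trace costs in dollars: $0.01 through $1.00.
costs = [i / 100 for i in range(1, 101)]
thresholds = {pct: percentile(costs, pct) for pct in (75, 95, 99)}

for cost in (0.50, 0.80, 0.97, 1.00):
    print(cost, classify(cost, thresholds))
```

Because the tiers are nested (anything above P99 is also above P95 and P75), the sketch checks the strictest threshold first so each trace receives a single label.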

Anomaly Summary Banner

The anomaly section displays three clickable cards showing anomaly counts by severity:

  • Critical Anomalies (P99) - Top 1% outliers requiring immediate attention

  • High Anomalies (P95) - Top 5% outliers to review and optimize

  • Moderate Anomalies (P75) - Top 25% outliers to monitor

Each card displays:

  • The count of anomalous traces

  • What's happening (explanation of the severity level)

  • Next steps (actionable guidance)

  • A "Filter & Investigate" button to focus on that severity

Click any card to filter the anomalies table to that specific percentile. Click again to show all anomalies.

Anomaly Inline Indicators

The main metric card in each tab shows an inline anomaly indicator when anomalies are detected. The indicator displays:

  • Total anomaly count

  • Critical count (if any P99 anomalies exist)

Click the indicator to scroll directly to the anomalies section.

Anomalies Table

Each anomaly section includes a detailed table showing detected anomalies:

| Column | Description |
| --- | --- |
| Date/Time | When the anomalous trace occurred |
| Trace ID | Unique identifier (clickable to view trace details) |
| Type | Trace type category |
| Name | Trace name |
| Metric | Which metric triggered the anomaly |
| Actual | The measured value that exceeded the threshold |
| Threshold | The percentile threshold value that was exceeded |
| Percentile | Badge showing which percentile was exceeded (P75/P95/P99) |

Click any row to open the Trace Detail View for investigation.


Cost Tab

The Cost tab helps you understand and manage spending across your AI workflows.

Operation Type Filter

The Cost tab includes an Operation Type filter dropdown that allows you to analyze costs by specific AI operation types (e.g., Chat, Embed, Image, Audio). This filter is exclusive to the Cost tab.

The dropdown dynamically shows only the operation types that have cost data in the selected time period. For example, if your traces during the last 7 days only include Chat and Embed operations, those will be the only options available in the filter.

When you select an operation type, all metrics, charts, and tables on the Cost tab are filtered to show only data for that specific operation type. Select "All Operation Types" to view aggregate data across all operations.

Summary Metrics

At the top of the Cost tab, four metric cards provide an at-a-glance summary:

  • Total Cost: Cumulative spending across all traces in the selected time period

  • Average Cost: Typical cost per trace

  • P95 Cost: The 95th percentile cost; 95% of traces cost less than this amount

  • Trend: Percentage change in cost compared to the previous period (with absolute change and previous period value)
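
As a rough illustration of how these four cards relate to raw data, the sketch below derives them from two lists of per-trace costs. The function name, the dictionary keys, the nearest-rank P95, and the choice to compute the trend on total cost are all assumptions made for this example, not the product's documented behavior.

```python
import math
from statistics import mean

def summarize_costs(current, previous):
    """Summarize per-trace costs for the current period against the previous one."""
    ordered = sorted(current)
    # Nearest-rank P95 (an assumed method): smallest cost with >= 95% of traces at or below it.
    p95 = ordered[max(1, math.ceil(95 * len(ordered) / 100)) - 1]
    total, previous_total = sum(current), sum(previous)
    change = total - previous_total
    return {
        "total_cost": total,
        "average_cost": mean(current),
        "p95_cost": p95,
        "absolute_change": change,
        "previous_total": previous_total,
        # The trend is undefined when the previous period had no spend.
        "trend_pct": change / previous_total * 100 if previous_total else None,
    }

# Hypothetical per-trace costs in dollars for two consecutive periods.
summary = summarize_costs(current=[0.5, 0.25, 1.0, 2.25], previous=[1.0, 1.0])
print(summary)
```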

Insight Cards

Four insight cards provide quick findings about cost patterns:

  • Most Expensive - Trace type with the highest average cost

  • Biggest cost increase - Trace type with the largest cost increase compared to the previous period

  • Cost outliers (P95+) - Trace type with the most traces exceeding normal cost patterns

  • Cost Efficiency - Trace type with the lowest cost per transaction

These insights help prioritize which trace types to optimize for cost savings.

A time series chart visualizes how costs are trending over the selected time period. Each trace type is represented as a separate line, allowing you to:

  • See overall cost trends over time

  • Compare costs across different trace types

  • Identify spikes or anomalies in spending

  • Click on a trace type in the chart legend to filter the table below

Cost Anomalies Section

Below the trends chart, a dedicated Cost Anomalies section identifies traces with unusually high costs. This section filters anomalies specifically by the TOTAL_COST metric, showing only cost-related outliers.

The section includes:

  • Anomaly Summary Banner - Three cards showing P99, P95, and P75 cost anomalies

  • Anomalies Table - Detailed list of traces exceeding cost thresholds

The Total Cost metric card at the top includes an inline anomaly indicator. Click it to scroll directly to the Cost Anomalies section.

Cost by Operation Type Card

When operation type data is available, a dedicated Cost by Operation Type card displays the distribution of costs across different AI operation types:

  • Visual Breakdown: Each operation type shows a colored bar representing its percentage of total cost

  • Cost Values: Displays the actual cost and percentage for each operation type

  • Sorted by Cost: Operation types are automatically sorted from highest to lowest cost

  • Color Coding: Each operation type has a distinct color for easy identification

This breakdown helps you understand which types of AI operations (chat completions, embeddings, image generation, etc.) are driving your costs.

Cost by Trace Type Table

A detailed table shows cost metrics grouped by trace type:

| Column | Description |
| --- | --- |
| Trace Type | The workflow category (e.g., chat-completion, document-analysis) |
| Total Cost | Cumulative cost for this trace type |
| Average Cost | Mean cost per trace |
| P95 Cost | 95th percentile cost |
| P99 Cost | 99th percentile cost |
| Trend | Percentage change from previous period |

Expandable Rows: Click on any trace type row to expand it and see individual traces. Click on a specific trace to open the Trace Detail View.


Performance Tab

The Performance tab helps you monitor execution times and identify slow operations.

Summary Metrics

Four metric cards provide performance highlights:

  • Average Duration: Mean execution time across all traces

  • P95 Duration: 95th percentile duration; 95% of traces complete faster than this

  • P99 Duration: 99th percentile duration; 99% of traces complete faster than this, so only the slowest 1% exceed it

  • Trend: Percentage change in duration compared to the previous period

Insight Cards

Four insight cards provide quick findings about performance patterns:

  • Slowest Trace Type (P95) - Trace type with the highest P95 duration

  • Most Transaction Heavy - Trace type with the highest transaction count

  • Most Inefficient (P99/P50 Ratio) - Trace type with the highest P99/P50 duration ratio

  • Biggest Degradation - Trace type with the largest negative performance trend compared to the previous period

These insights help prioritize which trace types to optimize for performance.
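
The P99/P50 ratio behind the "Most Inefficient" card compares the tail of the duration distribution with its median: a high ratio means a few traces run far slower than the typical one. A minimal sketch, assuming nearest-rank percentiles and made-up durations (the helper names are hypothetical):

```python
import math

def nearest_rank(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(pct * len(ordered) / 100)) - 1]

def tail_ratio(durations_ms):
    """P99 duration divided by P50 (median) duration."""
    return nearest_rank(durations_ms, 99) / nearest_rank(durations_ms, 50)

# Hypothetical trace durations in milliseconds, grouped by trace type.
durations_by_type = {
    "chat-completion": [100, 110, 120, 130, 140],
    "document-analysis": [200, 210, 220, 230, 2200],
}
ratios = {trace_type: tail_ratio(d) for trace_type, d in durations_by_type.items()}
most_inefficient = max(ratios, key=ratios.get)
print(most_inefficient, ratios[most_inefficient])
```

Here document-analysis has a similar median to its other traces but one very slow outlier, so its ratio is much higher than that of chat-completion.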

A time series chart shows how execution times are changing over the selected period. Each trace type appears as a separate line, enabling you to:

  • Track performance trends over time

  • Compare performance across trace types

  • Identify slowdowns or improvements

  • Click on a trace type to filter the table below

Performance Anomalies Section

Below the trends chart, a dedicated Performance Anomalies section identifies traces with unusually long durations. This section filters anomalies specifically by the TRACE_DURATION metric, showing only performance-related outliers.

The section includes:

  • Anomaly Summary Banner - Three cards showing P99, P95, and P75 duration anomalies

  • Anomalies Table - Detailed list of traces exceeding duration thresholds

The Average Duration metric card at the top includes an inline anomaly indicator. Click it to scroll directly to the Performance Anomalies section.

Performance by Trace Type Table

A detailed table shows performance metrics by trace type:

| Column | Description |
| --- | --- |
| Trace Type | The workflow category |
| Average Duration | Mean execution time |
| P95 Duration | 95th percentile duration |
| P99 Duration | 99th percentile duration |

Expandable Rows: Click on any row to expand and view individual traces. Select a trace to open the Trace Detail View.


Efficiency Tab

The Efficiency tab helps you identify optimization opportunities by analyzing transaction patterns, detecting inefficient workflows, and identifying circular call patterns.

Summary Metrics

Four key efficiency indicators:

  • Average Transactions - Mean number of transactions per trace

  • P95 Transactions - 95th percentile transaction count

  • P99 Transactions - 99th percentile transaction count

  • Trend - Percentage change in average transactions compared to the previous period

Insight Cards

Four insight cards provide quick findings:

  • Most Efficient - Trace type with the lowest average transaction count

  • Least Efficient - Trace type with the highest average transaction count

  • Highest Variability - Trace type with the most inconsistent transaction counts

  • Most Outliers - Trace type with the most traces exceeding normal patterns

A time series chart visualizes transaction count trends:

  • Each trace type shown as a separate line

  • Optional P95/P99 percentile lines (toggle with switch)

  • Switch between line and scatter plot views

  • Click a trace type in the legend to filter the table below

Circular Pattern Analysis

A dedicated section that detects and analyzes circular call patterns in your traces: situations where agents call each other in loops or repetitive patterns that may indicate inefficiencies.

Summary Metrics

  • Patterns Detected - Total number of circular patterns found

  • Total Waste - Combined duration and cost wasted by circular patterns

Severity Filtering

  • Filter patterns by severity level (Critical, Major, Minor)

  • Click on severity badges to filter the pattern list

  • Selected severity shows filtered results

Pattern List

  • Shows top patterns ranked by impact

  • Each pattern displays:

    • The call sequence (e.g., Agent A → Agent B → Agent A)

    • Occurrence count

    • Total waste (duration and cost)

    • Severity indicator

    • Hop count (number of calls in the loop)

  • Pagination for browsing through many patterns
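
To make the idea of a circular pattern concrete, here is a simplified sketch that scans one ordered chain of agent calls and counts loops. It is an assumption-laden illustration, not the detection algorithm the product uses: the input format (a list where each agent calls the next), the function names, and the choice to count overlapping repetitions separately are all hypothetical.

```python
from collections import Counter

def canonical(cycle):
    """Rotate a cycle to start at its smallest agent name, so A->B->A and B->A->B match."""
    start = cycle.index(min(cycle))
    return tuple(cycle[start:] + cycle[:start])

def find_loops(call_path):
    """Count loops in an ordered list of agent names, where each agent calls the next one."""
    last_seen = {}
    loops = Counter()
    for i, agent in enumerate(call_path):
        if agent in last_seen:
            # The calls since this agent's previous appearance form a loop back to it.
            loops[canonical(call_path[last_seen[agent]:i])] += 1
        last_seen[agent] = i
    return loops

call_path = ["Planner", "Researcher", "Planner", "Researcher", "Planner", "Writer"]
loops = find_loops(call_path)
for cycle, occurrences in loops.items():
    sequence = " -> ".join(cycle + (cycle[0],))
    print(f"{sequence}: {occurrences} occurrences, {len(cycle)} hops")
```

The hop count here is the number of calls needed to return to the starting agent, which matches the "number of calls in the loop" description above.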

Use Cases

  • Loop Detection - Identify when agents are calling each other in unnecessary loops

  • Cost Reduction - Find patterns that waste resources through redundant calls

  • Workflow Optimization - Understand where to break cycles or add caching

Efficiency Anomalies Section

Below the Circular Pattern Analysis, a dedicated Efficiency Anomalies section identifies traces with unusually high transaction counts. This section filters anomalies specifically by the TRANSACTION_COUNT metric, showing only efficiency-related outliers.

The section includes:

  • Anomaly Summary Banner - Three cards showing P99, P95, and P75 transaction count anomalies

  • Anomalies Table - Detailed list of traces exceeding transaction count thresholds

The Average Transactions metric card at the top includes an inline anomaly indicator. Click it to scroll directly to the Efficiency Anomalies section.

Efficiency Table

Expandable table showing efficiency metrics by trace type:

| Column | Description |
| --- | --- |
| Trace Type | Category of trace |
| Avg Transactions | Mean transaction count |
| P95 Transactions | 95th percentile |
| P99 Transactions | 99th percentile |

Expandable Rows: Click the expand icon (▶) to see individual traces with their transaction counts. Click any trace to open the Trace Detail View.


Agent Interaction Tab

The Agent Interaction tab provides visibility into how AI agents communicate with each other within your traces, helping you understand multi-agent workflows and their associated costs and patterns.

Summary Metrics

Four metric cards provide an at-a-glance summary of agent activity:

  • Agents - Total number of unique agents active in the selected time period

  • Interactions - Total number of agent-to-agent calls

  • Total Cost - Cumulative cost of all agent interactions

  • Avg Interactions/Agent - Mean number of interactions per agent

Agent Activity Matrix

An interactive matrix visualization showing agent-to-agent communication patterns. The matrix provides a comprehensive view of how agents interact with each other.

Filter Options

  • Filter Agents - Multi-select dropdown to show only specific agents in the matrix

  • Metric - Choose which metric to visualize:

    • Call Count - Number of times one agent called another

    • Total Cost - Cumulative cost of interactions between agents

    • Avg Duration - Average duration of interactions between agents

  • View Mode:

    • Absolute - Colors based on raw metric values

    • Relative - Colors based on each agent's proportion of total activity

  • Sort By:

    • Total Activity - Sort by most active agents first

    • Alphabetical - Sort agents alphabetically

Matrix Features

  • Grid Layout - Rows represent "from" agents, columns represent "to" agents

  • Color Intensity - Darker colors indicate higher values (None → Low → Medium → High → Very High → Extreme)

  • Cell Values - Each cell shows the metric value for that agent pair

  • Total Column - Rightmost column shows total activity for each agent

  • Self-Interaction - Diagonal cells (agent calling itself) are dimmed and disabled
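
The matrix layout can be pictured as a simple aggregation over interaction records. The sketch below is illustrative only: the record shape (from agent, to agent, call count), the agent names, and the assumption that the Total column sums each row's outgoing calls are not confirmed by the product.

```python
def build_matrix(interactions):
    """Aggregate (from_agent, to_agent, call_count) records into matrix cells and row totals."""
    cells = {}
    for caller, callee, count in interactions:
        cells[(caller, callee)] = cells.get((caller, callee), 0) + count
    agents = sorted({agent for pair in cells for agent in pair})
    totals = {a: sum(cells.get((a, b), 0) for b in agents) for a in agents}
    return agents, cells, totals

def relative_share(cells, totals, caller, callee):
    """Fraction of the caller's total activity that goes to this callee (0.0 if it made no calls)."""
    total = totals[caller]
    return cells.get((caller, callee), 0) / total if total else 0.0

# Hypothetical interaction records.
interactions = [
    ("planner", "researcher", 8),
    ("planner", "writer", 2),
    ("researcher", "planner", 5),
]
agents, cells, totals = build_matrix(interactions)

# Print the grid: one row per "from" agent, one column per "to" agent, then the row total.
print("from/to".ljust(12) + "".join(a.ljust(12) for a in agents) + "total")
for caller in agents:
    row = "".join(str(cells.get((caller, callee), 0)).ljust(12) for callee in agents)
    print(caller.ljust(12) + row + str(totals[caller]))
```

In this sketch, Absolute mode would color cells by the raw counts in `cells`, while Relative mode would color them by `relative_share`, so an agent with little overall activity can still show which peer dominates its calls.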

Interactive Tooltips

Hover over any cell to see detailed information:

  • The raw metric value

  • Activity classification (Low to Extreme)

  • Comparison vs median (how many times larger than typical)

  • Percentage of agent's total activity

  • Typical range for context

Color Legend

The bottom of the matrix shows a color legend explaining the intensity scale:

  • Absolute Mode: None → Low (blue) → Medium (green) → High (yellow) → Very High (orange) → Extreme (red)

  • Relative Mode: None → Low (blue) → Medium (cyan) → High (teal) → Very High (green) → Extreme (emerald)

Agent Interactions Table

Detailed table showing all agent-to-agent interactions:

| Column | Description |
| --- | --- |
| From Agent | The calling agent |
| To Agent | The agent being called |
| Call Count | Number of interactions (sortable) |
| Total Cost | Cumulative cost of interactions (sortable) |
| Avg Duration | Average duration per interaction in milliseconds (sortable) |

Use Cases

Multi-Agent Workflow Analysis

  • Identify which agents communicate most frequently

  • Understand agent collaboration patterns

  • Detect unexpected or inefficient agent interactions

Cost Optimization

  • Find high-cost agent relationships

  • Identify opportunities to reduce inter-agent calls

  • Optimize agent orchestration to minimize costs

Performance Monitoring

  • Track agent interaction latencies

  • Identify slow agent-to-agent communications

  • Monitor changes in interaction patterns over time


Trace Detail View

When you click on a specific trace, a detailed view opens showing comprehensive information about that workflow execution.

Trace Summary Header

The header displays key identifying information:

  • Trace Type: The workflow category badge

  • Trace ID: Unique identifier for this trace (matches the traceId you pass in your API calls)

  • Task Type: The type of AI task performed

  • Agent: The AI agent that processed the trace

Below the identifiers, metric badges show:

  • Total Cost: Combined cost of all transactions in the trace

  • Duration: Total execution time

  • Time to First Token: Latency before the first response token

  • Total Tokens: Combined input and output tokens

  • Transaction Count: Number of AI operations in the trace

  • Success/Error Count: Number of successful vs failed transactions

Context badges provide organizational information:

  • Subscriber: The end user or API consumer

  • Organization: The customer organization

  • Product: The product being used

  • Environment: Production, staging, etc.

  • Provider: AI provider(s) used

  • Model: AI model(s) used

Transaction Timeline

A visual waterfall chart shows the sequence and timing of all transactions within the trace:

  • Horizontal Bars: Each bar represents a transaction, with length proportional to duration

  • Color Coding: Colors indicate the model/provider used

  • Tooltips: Hover over any bar to see transaction details including model, cost, duration, and token counts

  • Timing Information: The timeline shows start times and durations, helping identify bottlenecks

Dependency Tree

An interactive tree visualization that shows the hierarchical relationships between transactions in the trace, revealing the execution flow and dependencies. The Dependency Tree is displayed as a collapsible accordion section.

Header Information

The accordion header displays key tree metrics:

  • Total transaction count

  • Number of depth levels in the tree

  • Pattern type detected (Linear, Converging Paths, or Multi-Root)

Visual Elements

  • Nodes - Each node represents a transaction, displaying:

    • Agent name

    • Task type

    • Model used

    • Individual duration and cost

    • Cumulative path duration and cost (from root to this node)

  • Edges - Arrows connecting nodes show parent-child relationships based on parentTransactionId

  • Critical Path - Highlighted in primary color, showing the longest execution path that determines overall trace duration

  • Bottleneck Indicators - Nodes with significantly higher duration than average are marked as bottlenecks

Tree Legend

An expandable "Learn More" legend explains the visualization:

  • Critical Path (purple circle) - The longest path through the tree

  • Bottleneck (red circle with !) - Transactions that are significantly slower than average

  • Arrows - Show parent → child flow direction

  • No Arrows - Independent transactions with no dependencies

Lane Summaries

At the bottom of the tree, lane summary boxes show aggregated metrics for each execution path:

  • Total duration for the path

  • Total cost for the path

  • Number of nodes in the path

  • Whether the path is on the critical path

Critical Path Analysis

The tree automatically identifies:

  • Critical Path - The longest execution path that determines overall trace duration

  • Optimization Potential - If there are multiple paths, shows how much time could be saved by optimizing the critical path

  • Bottleneck Detection - Nodes whose duration is 2.5x the average or more are marked as bottlenecks
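
The critical path and bottleneck rules can be sketched for a small tree as follows. This is a minimal illustration, assuming each transaction runs after its parent, that path duration is the sum of durations from root to leaf, and that the average is taken over all transactions in the trace. Only parentTransactionId comes from this page; the duration_ms key, the transaction IDs, and the function names are hypothetical.

```python
from statistics import mean

def critical_path(transactions):
    """Return the root-to-leaf path with the largest cumulative duration."""
    def path_to(tx_id):
        # Walk parent links up to the root, then reverse to get root-first order.
        path = []
        while tx_id is not None:
            path.append(tx_id)
            tx_id = transactions[tx_id]["parentTransactionId"]
        return path[::-1]

    def path_duration(path):
        return sum(transactions[tx_id]["duration_ms"] for tx_id in path)

    parents = {tx["parentTransactionId"] for tx in transactions.values()}
    leaves = [tx_id for tx_id in transactions if tx_id not in parents]
    return max((path_to(leaf) for leaf in leaves), key=path_duration)

def bottlenecks(transactions, factor=2.5):
    """Return transactions whose duration is at least `factor` times the average."""
    average = mean(tx["duration_ms"] for tx in transactions.values())
    return [tx_id for tx_id, tx in transactions.items() if tx["duration_ms"] >= factor * average]

# Hypothetical trace: one root with two branches.
transactions = {
    "root": {"parentTransactionId": None, "duration_ms": 100},
    "a": {"parentTransactionId": "root", "duration_ms": 200},
    "b": {"parentTransactionId": "root", "duration_ms": 50},
    "c": {"parentTransactionId": "a", "duration_ms": 1500},
    "d": {"parentTransactionId": "b", "duration_ms": 150},
}
print(critical_path(transactions))
print(bottlenecks(transactions))
```

In this example the average duration is 400 ms, so only transaction c (1500 ms) crosses the 2.5x mark, and it also sits on the critical path root, a, c.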

Pattern Recognition

The tree identifies workflow patterns:

  • Linear - Sequential execution with no branching (single path)

  • Converging Paths - Multiple parallel branches that may share a common parent

  • Multi-Root - Multiple independent execution trees (traces with multiple root transactions)
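
One way to picture these three patterns is as a classification over parent links. The sketch below is an assumption, not the product's logic: it treats more than one root as Multi-Root, any parent with more than one child as Converging Paths, and everything else as Linear, checked in that order.

```python
from collections import Counter

def classify_pattern(parent_of):
    """parent_of maps each transaction ID to its parent ID (None for a root)."""
    roots = [tx_id for tx_id, parent in parent_of.items() if parent is None]
    if len(roots) > 1:
        return "Multi-Root"
    children_per_parent = Counter(p for p in parent_of.values() if p is not None)
    if any(count > 1 for count in children_per_parent.values()):
        return "Converging Paths"
    return "Linear"

# Hypothetical traces described only by their parent links.
print(classify_pattern({"t1": None, "t2": "t1", "t3": "t2"}))
print(classify_pattern({"t1": None, "t2": "t1", "t3": "t1"}))
print(classify_pattern({"t1": None, "t2": None, "t3": "t1"}))
```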

Understanding the Visualization

  • Nodes are arranged vertically by execution depth (top = root, bottom = leaf)

  • Horizontal positioning groups siblings (transactions with the same parent)

  • Path duration shows cumulative time from root to that node

  • Click any node to open the Transaction Details Drawer

Use Cases

  • Performance Optimization - Identify which execution paths are slowest and where bottlenecks occur

  • Workflow Understanding - Visualize complex multi-agent interactions and dependencies

  • Debugging - Trace execution flow to find unexpected dependencies or missing relationships

  • Cost Analysis - See how costs accumulate along different execution paths

Breakdowns & Analytics

Four breakdown cards provide aggregated views of the trace:

  • Cost by Model: Stacked bar showing how costs are distributed across AI models used in the trace

  • Cost by Provider: Stacked bar showing cost distribution across AI providers

  • Token Breakdown: Stacked bar showing input vs output tokens

  • Duration by Task Type: Bar chart showing time spent on different operation types

Transaction Details Table

A comprehensive table lists all transactions in the trace with complete metadata captured by Revenium for each trace.

Export: Use the export button to download transaction details as a CSV file.


Setting Up Traces

To get the most value from Trace Analytics, ensure you're passing trace metadata in your AI transactions:

  • Trace ID: Use consistent trace IDs to group related transactions

  • Trace Type: Categorize workflows for meaningful aggregation

  • Task Type: Label operations for detailed analysis

  • Agent: Identify which agent or service processed the request

  • Parent Transaction ID: Set parent-child relationships to enable Dependency Tree visualization
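
The sketch below shows how this metadata might be attached to each transaction so that related calls share a trace and form a parent-child chain. The names traceId and parentTransactionId appear elsewhere on this page; every other key (transactionId, traceType, taskType, agent), the helper function, and the sample values are hypothetical placeholders, so check your SDK or API reference for the exact field names and for how the metadata is submitted.

```python
import uuid

def new_transaction(trace_id, trace_type, task_type, agent, parent=None):
    """Build the trace metadata for one AI transaction (field names are illustrative)."""
    return {
        "transactionId": str(uuid.uuid4()),
        "traceId": trace_id,          # same value for every transaction in the workflow
        "traceType": trace_type,      # workflow category used for aggregation
        "taskType": task_type,        # label for this specific operation
        "agent": agent,               # agent or service handling the request
        "parentTransactionId": parent["transactionId"] if parent else None,
    }

# One trace ID per workflow execution, reused across all of its transactions.
trace_id = str(uuid.uuid4())
plan = new_transaction(trace_id, "document-analysis", "planning", "planner")
summarize = new_transaction(trace_id, "document-analysis", "summarization", "summarizer", parent=plan)

for transaction in (plan, summarize):
    print(transaction)
```

Generating the trace ID once at the start of the workflow and passing it through to every downstream call keeps the grouping consistent, and setting the parent on each child call is what lets the Dependency Tree draw its edges.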

