# Trace Analytics

## Overview

Trace Analytics provides visibility into your AI agent workflows, helping you understand performance and costs across your AI operations. A "trace" represents a complete workflow execution—from the initial request through all agent interactions to the final response. By grouping related AI transactions under a single trace ID, you can analyze the full picture of complex AI workflows.

### Key Benefits

* **Anomaly Detection**: Automatically identify outlier traces using statistical analysis
* **Performance Optimization**: Identify slow operations and bottlenecks in your AI workflows
* **Cost Management**: Track and analyze spending across models, providers, and trace types
* **Efficiency Analysis**: Identify optimization opportunities, detect circular patterns, and streamline workflows
* **Multi-Agent Analysis**: Understand agent-to-agent communication patterns and costs
* **Workflow Visualization**: Visualize transaction dependencies and critical paths with the Dependency Tree
* **Detailed Insights**: Drill down into individual transactions for granular analysis

***

## Getting Started

Navigate to **Traces** from the main navigation menu to access the Trace Analytics dashboard. The dashboard is organized into four tabs:

1. **Cost Tab**: Analyze costs across your traces and trace types, with integrated cost anomaly detection
2. **Performance Tab**: Monitor duration and performance metrics, with integrated performance anomaly detection
3. **Efficiency Tab**: Identify optimization opportunities, inefficient patterns, and circular call patterns, with integrated efficiency anomaly detection
4. **Agent Interaction Tab**: Analyze multi-agent communication patterns and costs

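The Cost, Performance, and Efficiency tabs each provide metric cards, insight cards, a trend chart, an anomalies section, and data tables from which you can drill down into individual traces in the Trace Detail View. The Agent Interaction tab provides metric cards, an activity matrix, and an interactions table.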

***

## Anomaly Detection

The Cost, Performance, and Efficiency tabs each include an integrated **Anomalies Section** that automatically detects outlier traces. Anomalies are classified using percentile-based thresholds:

* **P99 (Critical)** - Traces exceeding the 99th percentile (top 1%) - require immediate attention
* **P95 (High)** - Traces exceeding the 95th percentile (top 5%) - should be reviewed and optimized
* **P75 (Moderate)** - Traces exceeding the 75th percentile (top 25%) - monitor trends closely
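
The thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: it assumes a nearest-rank percentile calculation and assumes each trace is reported only at the highest severity it exceeds. The names `percentile` and `detect_anomalies` are hypothetical.

```python
import math

# Percentile thresholds and severity labels, checked from most to least severe.
SEVERITY_LEVELS = [(99, "Critical"), (95, "High"), (75, "Moderate")]


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def detect_anomalies(metric_by_trace):
    """Map each anomalous trace ID to the highest severity threshold it exceeds."""
    values = list(metric_by_trace.values())
    thresholds = {pct: percentile(values, pct) for pct, _ in SEVERITY_LEVELS}
    anomalies = {}
    for trace_id, value in metric_by_trace.items():
        for pct, label in SEVERITY_LEVELS:
            if value > thresholds[pct]:
                anomalies[trace_id] = label
                break  # assumption: report only the highest severity
    return anomalies
```

Because the thresholds are derived from the data itself, some traces will always sit above P75 in any non-uniform data set, so a Moderate anomaly is a prompt to watch a trend rather than proof of a problem.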

### Anomaly Summary Banner

The anomaly section displays three clickable cards showing anomaly counts by severity:

* **Critical Anomalies (P99)** - Top 1% outliers requiring immediate attention
* **High Anomalies (P95)** - Top 5% outliers to review and optimize
* **Moderate Anomalies (P75)** - Top 25% outliers to monitor

Each card displays:

* The count of anomalous traces
* What's happening (explanation of the severity level)
* Next steps (actionable guidance)
* A "Filter & Investigate" button to focus on that severity

Click any card to filter the anomalies table to that specific percentile. Click again to show all anomalies.

### Anomaly Inline Indicators

The main metric card in each tab shows an inline anomaly indicator when anomalies are detected. The indicator displays:

* Total anomaly count
* Critical count (if any P99 anomalies exist)

Click the indicator to scroll directly to the anomalies section.

### Anomalies Table

Each anomaly section includes a detailed table showing detected anomalies:

| Column     | Description                                               |
| ---------- | --------------------------------------------------------- |
| Date/Time  | When the anomalous trace occurred                         |
| Trace ID   | Unique identifier (clickable to view trace details)       |
| Type       | Trace type category                                       |
| Name       | Trace name                                                |
| Metric     | Which metric triggered the anomaly                        |
| Actual     | The measured value that exceeded the threshold            |
| Threshold  | The percentile threshold value that was exceeded          |
| Percentile | Badge showing which percentile was exceeded (P75/P95/P99) |

Click any row to open the Trace Detail View for investigation.

***

## Cost Tab

The Cost tab helps you understand and manage spending across your AI workflows.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-df83797967843689e3de9d47176342e5e78f51b5%2Fnew-cost-tab.png?alt=media" alt=""><figcaption></figcaption></figure>

### Operation Type Filter

The Cost tab includes an **Operation Type** filter dropdown that allows you to analyze costs by specific AI operation types (e.g., Chat, Embed, Image, Audio). This filter is exclusive to the Cost tab.

The dropdown dynamically shows only the operation types that have cost data in the selected time period. For example, if your traces during the last 7 days only include Chat and Embed operations, those will be the only options available in the filter.

When you select an operation type, all metrics, charts, and tables on the Cost tab are filtered to show only data for that specific operation type. Select "All Operation Types" to view aggregate data across all operations.

### Summary Metrics

At the top of the Cost tab, four metric cards provide an at-a-glance summary:

* **Total Cost**: Cumulative spending across all traces in the selected time period
* **Average Cost**: Typical cost per trace
* **P95 Cost**: The 95th percentile cost—95% of traces cost less than this amount
* **Trend**: Percentage change in cost compared to the previous period (with absolute change and previous period value)
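
As a rough sketch of how these four values relate, the Python below derives them from two lists of per-trace costs. It assumes the trend compares total cost between the two periods and uses a nearest-rank P95; the function name and the dictionary keys are hypothetical and are not part of any Revenium API.

```python
import math


def summarize_costs(current_costs, previous_costs):
    """Compute the four summary values for one period, compared with the previous one."""
    if not current_costs:
        raise ValueError("current_costs must not be empty")
    total = sum(current_costs)
    previous_total = sum(previous_costs)
    ordered = sorted(current_costs)
    # Nearest-rank P95: 95% of traces cost this much or less.
    p95 = ordered[max(1, math.ceil(95 * len(ordered) / 100)) - 1]
    change = total - previous_total
    # The percentage is undefined when the previous period had no spend.
    trend_pct = change * 100 / previous_total if previous_total else None
    return {
        "total_cost": total,
        "average_cost": total / len(current_costs),
        "p95_cost": p95,
        "absolute_change": change,
        "previous_total": previous_total,
        "trend_pct": trend_pct,
    }
```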

### Insight Cards

Four insight cards provide quick findings about cost patterns:

* **Most Expensive** - Trace type with the highest average cost
* **Biggest cost increase** - Trace type with the largest cost increase compared to the previous period
* **Cost outliers (P95+)** - Trace type with the most traces exceeding normal cost patterns
* **Cost Efficiency** - Trace type with the lowest cost per transaction

These insights help prioritize which trace types to optimize for cost savings.

### Cost Trends Chart

A time series chart visualizes how costs are trending over the selected time period. Each trace type is represented as a separate line, allowing you to:

* See overall cost trends over time
* Compare costs across different trace types
* Identify spikes or anomalies in spending
* Click on a trace type in the chart legend to filter the table below

### Cost Anomalies Section

Below the trends chart, a dedicated **Cost Anomalies** section identifies traces with unusually high costs. This section filters anomalies specifically by the `TOTAL_COST` metric, showing only cost-related outliers.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-18cde2b1680a9c2f2a6fa555491eaadb40311896%2Fnew-cost-anomalies.png?alt=media" alt=""><figcaption></figcaption></figure>

The section includes:

* **Anomaly Summary Banner** - Three cards showing P99, P95, and P75 cost anomalies
* **Anomalies Table** - Detailed list of traces exceeding cost thresholds

The Total Cost metric card at the top includes an inline anomaly indicator. Click it to scroll directly to the Cost Anomalies section.

### Cost by Operation Type Card

When operation type data is available, a dedicated **Cost by Operation Type** card displays the distribution of costs across different AI operation types:

* **Visual Breakdown**: Each operation type shows a colored bar representing its percentage of total cost
* **Cost Values**: Displays the actual cost and percentage for each operation type
* **Sorted by Cost**: Operation types are automatically sorted from highest to lowest cost
* **Color Coding**: Each operation type has a distinct color for easy identification

This breakdown helps you understand which types of AI operations (chat completions, embeddings, image generation, etc.) are driving your costs.
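
The breakdown can be reproduced offline from exported transaction data. The sketch below assumes each transaction is a dictionary with hypothetical `operation_type` and `cost` keys; it groups by operation type, computes each type's share of total cost, and sorts from highest to lowest cost.

```python
from collections import defaultdict


def cost_by_operation_type(transactions):
    """Return one row per operation type, sorted from highest to lowest cost."""
    totals = defaultdict(float)
    for tx in transactions:
        totals[tx["operation_type"]] += tx["cost"]
    grand_total = sum(totals.values())
    rows = [
        {
            "operation_type": op,
            "cost": cost,
            # Share of total cost; 0.0 when nothing was spent at all.
            "percent": cost * 100 / grand_total if grand_total else 0.0,
        }
        for op, cost in totals.items()
    ]
    return sorted(rows, key=lambda row: row["cost"], reverse=True)
```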

### Cost by Trace Type Table

A detailed table shows cost metrics grouped by trace type:

| Column       | Description                                                          |
| ------------ | -------------------------------------------------------------------- |
| Trace Type   | The workflow category (e.g., `chat-completion`, `document-analysis`) |
| Total Cost   | Cumulative cost for this trace type                                  |
| Average Cost | Mean cost per trace                                                  |
| P95 Cost     | 95th percentile cost                                                 |
| P99 Cost     | 99th percentile cost                                                 |
| Trend        | Percentage change from previous period                               |

**Expandable Rows**: Click on any trace type row to expand it and see individual traces. Click on a specific trace to open the Trace Detail View.

***

## Performance Tab

The Performance tab helps you monitor execution times and identify slow operations.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-ac421202ce754f2a6781ae577679e3c1695b882e%2Fnew-performance-tab.png?alt=media" alt=""><figcaption></figcaption></figure>

### Summary Metrics

Four metric cards provide performance highlights:

* **Average Duration**: Mean execution time across all traces
* **P95 Duration**: 95th percentile duration—95% of traces complete faster than this
* **P99 Duration**: 99th percentile duration; only the slowest 1% of traces take longer than this
* **Trend**: Percentage change in duration compared to the previous period

### Insight Cards

Four insight cards provide quick findings about performance patterns:

* **Slowest Trace Type (P95)** - Trace type with the highest P95 duration
* **Most Transaction Heavy** - Trace type with the highest transaction count
* **Most Inefficient (P99/P50 Ratio)** - Trace type with the highest P99/P50 duration ratio
* **Biggest Degradation** - Trace type whose duration increased the most compared to the previous period

These insights help prioritize which trace types to optimize for performance.
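
The P99/P50 ratio compares tail latency with the median: a ratio near 1 means durations are consistent, while a large ratio means a few traces are far slower than the typical one. The following sketch shows the calculation under the assumption of nearest-rank percentiles and non-empty, positive duration lists; the function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p99_p50_ratio(durations):
    """Ratio of tail duration to median duration; 1.0 means perfectly consistent."""
    return percentile(durations, 99) / percentile(durations, 50)


def most_inefficient(durations_by_type):
    """Return the trace type with the highest P99/P50 duration ratio."""
    return max(durations_by_type, key=lambda t: p99_p50_ratio(durations_by_type[t]))
```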

### Performance Trends Chart

A time series chart shows how execution times are changing over the selected period. Each trace type appears as a separate line, enabling you to:

* Track performance trends over time
* Compare performance across trace types
* Identify slowdowns or improvements
* Click on a trace type in the chart legend to filter the table below

### Performance Anomalies Section

Below the trends chart, a dedicated **Performance Anomalies** section identifies traces with unusually long durations. This section filters anomalies specifically by the `TRACE_DURATION` metric, showing only performance-related outliers.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-b28c3c3f7e516410d201772f1c3abb417d19f83e%2Fnew-performance-anomalies.png?alt=media" alt=""><figcaption></figcaption></figure>

The section includes:

* **Anomaly Summary Banner** - Three cards showing P99, P95, and P75 duration anomalies
* **Anomalies Table** - Detailed list of traces exceeding duration thresholds

The Average Duration metric card at the top includes an inline anomaly indicator. Click it to scroll directly to the Performance Anomalies section.

### Performance by Trace Type Table

A detailed table shows performance metrics by trace type:

| Column           | Description              |
| ---------------- | ------------------------ |
| Trace Type       | The workflow category    |
| Average Duration | Mean execution time      |
| P95 Duration     | 95th percentile duration |
| P99 Duration     | 99th percentile duration |

**Expandable Rows**: Click on any row to expand and view individual traces. Select a trace to open the Trace Detail View.

***

## Efficiency Tab

The Efficiency tab helps you identify optimization opportunities by analyzing transaction patterns, detecting inefficient workflows, and identifying circular call patterns.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-d01e0e6cb2eb4be90f654465669ed06a21505e9d%2Fnew-efficiency-tab.png?alt=media" alt=""><figcaption></figcaption></figure>

### Summary Metrics

Four key efficiency indicators:

* **Average Transactions** - Mean number of transactions per trace
* **P95 Transactions** - 95th percentile transaction count
* **P99 Transactions** - 99th percentile transaction count
* **Trend** - Percentage change in average transactions compared to the previous period

### Insight Cards

Four insight cards provide quick findings:

* **Most Efficient** - Trace type with the lowest average transaction count
* **Least Efficient** - Trace type with the highest average transaction count
* **Highest Variability** - Trace type with the most inconsistent transaction counts
* **Most Outliers** - Trace type with the most traces exceeding normal patterns

### Efficiency Trends Chart

Time series visualization showing transaction count trends:

* Each trace type shown as a separate line
* Optional P95/P99 percentile lines (toggle with switch)
* Switch between line and scatter plot views
* Click a trace type in the legend to filter the table below

### Circular Pattern Analysis

A dedicated section that detects and analyzes circular call patterns in your traces—situations where agents call each other in loops or repetitive patterns that may indicate inefficiencies.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-e5deeb3c1598c9639c84e4c20f07c1c9a21a9c15%2Fnew-circular-dependencies.png?alt=media" alt=""><figcaption></figcaption></figure>

**Summary Metrics**

* **Patterns Detected** - Total number of circular patterns found
* **Total Waste** - Combined duration and cost wasted by circular patterns

**Severity Filtering**

* Filter patterns by severity level (Critical, Major, Minor)
* Click a severity badge to show only patterns of that severity in the pattern list

**Pattern List**

* Shows top patterns ranked by impact
* Each pattern displays:
  * The call sequence (e.g., Agent A → Agent B → Agent A)
  * Occurrence count
  * Total waste (duration and cost)
  * Severity indicator
  * Hop count (number of calls in the loop)
* Pagination for browsing through many patterns
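
To make the idea of a circular pattern concrete, the sketch below finds loops in an ordered list of agent names and counts how often each loop occurs across traces. It is one simple way to detect such patterns, not the detection logic the dashboard uses: the input format and function names are assumptions, and patterns are ranked by occurrence count here rather than by waste.

```python
from collections import Counter


def find_loops(agent_path):
    """Return every segment of the path that starts and ends at the same agent."""
    last_seen = {}
    loops = []
    for index, agent in enumerate(agent_path):
        if agent in last_seen:
            loops.append(tuple(agent_path[last_seen[agent]:index + 1]))
        last_seen[agent] = index
    return loops


def rank_patterns(agent_paths):
    """Count loops across traces and rank them by occurrence count."""
    counts = Counter(loop for path in agent_paths for loop in find_loops(path))
    return [
        # Hop count is the number of calls in the loop, one fewer than the agents listed.
        {"sequence": " → ".join(loop), "occurrences": n, "hops": len(loop) - 1}
        for loop, n in counts.most_common()
    ]
```

For example, passing the paths `["A", "B", "A"]` and `["A", "B", "A", "C"]` yields a single pattern, `A → B → A`, with two occurrences and a hop count of 2.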

**Use Cases**

* **Loop Detection** - Identify when agents are calling each other in unnecessary loops
* **Cost Reduction** - Find patterns that waste resources through redundant calls
* **Workflow Optimization** - Understand where to break cycles or add caching

### Efficiency Anomalies Section

Below the Circular Pattern Analysis, a dedicated **Efficiency Anomalies** section identifies traces with unusually high transaction counts. This section filters anomalies specifically by the `TRANSACTION_COUNT` metric, showing only efficiency-related outliers.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-895246648fab0ccd1dc14bc4bbb6460c2a8eec0a%2Fnew-efficiency-anomalies.png?alt=media" alt=""><figcaption></figcaption></figure>

The section includes:

* **Anomaly Summary Banner** - Three cards showing P99, P95, and P75 transaction count anomalies
* **Anomalies Table** - Detailed list of traces exceeding transaction count thresholds

The Average Transactions metric card at the top includes an inline anomaly indicator. Click it to scroll directly to the Efficiency Anomalies section.

### Efficiency Table

Expandable table showing efficiency metrics by trace type:

| Column           | Description            |
| ---------------- | ---------------------- |
| Trace Type       | Category of trace      |
| Avg Transactions | Mean transaction count |
| P95 Transactions | 95th percentile        |
| P99 Transactions | 99th percentile        |

**Expandable Rows**: Click the expand icon (▶) to see individual traces with their transaction counts. Click any trace to open the Trace Detail View.

***

## Agent Interaction Tab

The Agent Interaction tab provides visibility into how AI agents communicate with each other within your traces, helping you understand multi-agent workflows and their associated costs and patterns.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-83f34dbd164bb4eeaa9336867f312003995a6d25%2Fnew-agent-interaction-tab.png?alt=media" alt=""><figcaption></figcaption></figure>

### Summary Metrics

Four metric cards provide an at-a-glance summary of agent activity:

* **Agents** - Total number of unique agents active in the selected time period
* **Interactions** - Total number of agent-to-agent calls
* **Total Cost** - Cumulative cost of all agent interactions
* **Avg Interactions/Agent** - Mean number of interactions per agent

### Agent Activity Matrix

An interactive matrix visualization that shows how agents communicate with each other, with one cell per agent pair.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-d54df523b1265d8a113e65bfa199f0070ec80872%2Fnew-agent-tab.png?alt=media" alt=""><figcaption></figcaption></figure>

**Filter Options**

* **Filter Agents** - Multi-select dropdown to show only specific agents in the matrix
* **Metric** - Choose which metric to visualize:
  * **Call Count** - Number of times one agent called another
  * **Total Cost** - Cumulative cost of interactions between agents
  * **Avg Duration** - Average duration of interactions between agents
* **View Mode**:
  * **Absolute** - Colors based on raw metric values
  * **Relative** - Colors based on each agent's proportion of total activity
* **Sort By**:
  * **Total Activity** - Sort by most active agents first
  * **Alphabetical** - Sort agents alphabetically

**Matrix Features**

* **Grid Layout** - Rows represent "from" agents, columns represent "to" agents
* **Color Intensity** - Darker colors indicate higher values (None → Low → Medium → High → Very High → Extreme)
* **Cell Values** - Each cell shows the metric value for that agent pair
* **Total Column** - Rightmost column shows total activity for each agent
* **Self-Interaction** - Diagonal cells (agent calling itself) are dimmed and disabled
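
The matrix layout can be approximated from a list of interactions. The sketch below assumes each interaction is a `(from_agent, to_agent, value)` tuple, where the value is whichever metric is selected, and it assumes that Relative mode means each cell's share of the calling agent's row total. Both the input shape and that interpretation are assumptions made for illustration.

```python
def build_matrix(interactions):
    """Build a from-agent by to-agent grid plus a per-row total."""
    agents = sorted({agent for src, dst, _ in interactions for agent in (src, dst)})
    matrix = {src: {dst: 0 for dst in agents} for src in agents}
    for src, dst, value in interactions:
        matrix[src][dst] += value
    totals = {src: sum(row.values()) for src, row in matrix.items()}
    return agents, matrix, totals


def to_relative(matrix, totals):
    """Convert absolute values to each cell's share of its row total."""
    return {
        src: {dst: (value / totals[src] if totals[src] else 0.0) for dst, value in row.items()}
        for src, row in matrix.items()
    }
```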

**Interactive Tooltips**

Hover over any cell to see detailed information:

* The raw metric value
* Activity classification (Low to Extreme)
* Comparison with the median (how many times larger than the typical value)
* Percentage of agent's total activity
* Typical range for context

**Color Legend**

The bottom of the matrix shows a color legend explaining the intensity scale:

* **Absolute Mode**: None → Low (blue) → Medium (green) → High (yellow) → Very High (orange) → Extreme (red)
* **Relative Mode**: None → Low (blue) → Medium (cyan) → High (teal) → Very High (green) → Extreme (emerald)

### Agent Interactions Table

Detailed table showing all agent-to-agent interactions:

| Column       | Description                                                 |
| ------------ | ----------------------------------------------------------- |
| From Agent   | The calling agent                                           |
| To Agent     | The agent being called                                      |
| Call Count   | Number of interactions (sortable)                           |
| Total Cost   | Cumulative cost of interactions (sortable)                  |
| Avg Duration | Average duration per interaction in milliseconds (sortable) |

### Use Cases

**Multi-Agent Workflow Analysis**

* Identify which agents communicate most frequently
* Understand agent collaboration patterns
* Detect unexpected or inefficient agent interactions

**Cost Optimization**

* Find high-cost agent relationships
* Identify opportunities to reduce inter-agent calls
* Optimize agent orchestration to minimize costs

**Performance Monitoring**

* Track agent interaction latencies
* Identify slow agent-to-agent communications
* Monitor changes in interaction patterns over time

***

## Trace Detail View

When you click on a specific trace, a detailed view opens showing comprehensive information about that workflow execution.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-c7dba83aa1a5f3238f7600d878207eb53d132525%2Fnew-trace-visualization.png?alt=media" alt=""><figcaption></figcaption></figure>

### Trace Summary Header

The header displays key identifying information:

* **Trace Type**: The workflow category badge
* **Trace ID**: Unique identifier for this trace (matches the `traceId` you pass in your API calls)
* **Task Type**: The type of AI task performed
* **Agent**: The AI agent that processed the trace

Below the identifiers, metric badges show:

* **Total Cost**: Combined cost of all transactions in the trace
* **Duration**: Total execution time
* **Time to First Token**: Latency before the first response token
* **Total Tokens**: Combined input and output tokens
* **Transaction Count**: Number of AI operations in the trace
* **Success/Error Count**: Number of successful vs failed transactions

**Context badges** provide organizational information:

* **Subscriber**: The end user or API consumer
* **Organization**: The customer organization
* **Product**: The product being used
* **Environment**: Production, staging, etc.
* **Provider**: AI provider(s) used
* **Model**: AI model(s) used

### Transaction Timeline

A visual waterfall chart shows the sequence and timing of all transactions within the trace:

* **Horizontal Bars**: Each bar represents a transaction, with length proportional to duration
* **Color Coding**: Colors indicate the model/provider used
* **Tooltips**: Hover over any bar to see transaction details including model, cost, duration, and token counts
* **Timing Information**: The timeline shows start times and durations, helping identify bottlenecks

### Dependency Tree

An interactive tree visualization that shows the hierarchical relationships between transactions in the trace, revealing the execution flow and dependencies. The Dependency Tree is displayed as a collapsible accordion section.

<figure><img src="https://2470865788-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSUfCzMW8qWeXstipFXEh%2Fuploads%2Fgit-blob-b3dd948206c64a1a49fd943bbb10eda3f4f8de5f%2Fnew-dependency-tree.png?alt=media" alt=""><figcaption></figcaption></figure>

**Header Information**

The accordion header displays key tree metrics:

* Total transaction count
* Number of depth levels in the tree
* Pattern type detected (Linear, Converging Paths, or Multi-Root)

**Visual Elements**

* **Nodes** - Each node represents a transaction, displaying:
  * Agent name
  * Task type
  * Model used
  * Individual duration and cost
  * Cumulative path duration and cost (from root to this node)
* **Edges** - Arrows connecting nodes show parent-child relationships based on `parentTransactionId`
* **Critical Path** - Highlighted in primary color, showing the longest execution path that determines overall trace duration
* **Bottleneck Indicators** - Nodes with significantly higher duration than average are marked as bottlenecks

**Tree Legend**

An expandable "Learn More" legend explains the visualization:

* **Critical Path** (purple circle) - The longest path through the tree
* **Bottleneck** (red circle with !) - Transactions that are significantly slower than average
* **Arrows** - Show parent → child flow direction
* **No Arrows** - Independent transactions with no dependencies

**Lane Summaries**

At the bottom of the tree, lane summary boxes show aggregated metrics for each execution path:

* Total duration for the path
* Total cost for the path
* Number of nodes in the path
* Whether the path is on the critical path

**Critical Path Analysis**

The tree automatically identifies:

* **Critical Path** - The longest execution path that determines overall trace duration
* **Optimization Potential** - If there are multiple paths, shows how much time could be saved by optimizing the critical path
* **Bottleneck Detection** - Nodes whose duration is at least 2.5x the average are marked as bottlenecks
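
A minimal sketch of both calculations follows. It assumes each transaction is a dictionary with hypothetical `id` and `duration_ms` keys alongside `parentTransactionId`, that parent links contain no cycles, and that a path's duration is the sum of the durations from the root to a node. The 2.5x factor is applied to the mean duration of all transactions in the trace.

```python
from statistics import mean


def critical_path(transactions):
    """Return the root-to-node path with the largest cumulative duration."""
    by_id = {tx["id"]: tx for tx in transactions}

    def path_to(tx_id):
        # Walk parent links up to the root, then reverse into root-first order.
        path = []
        while tx_id is not None:
            path.append(tx_id)
            tx_id = by_id[tx_id]["parentTransactionId"]
        return path[::-1]

    paths = [path_to(tx["id"]) for tx in transactions]
    return max(paths, key=lambda p: sum(by_id[i]["duration_ms"] for i in p))


def bottlenecks(transactions, factor=2.5):
    """Return IDs of transactions at least `factor` times the mean duration."""
    average = mean(tx["duration_ms"] for tx in transactions)
    return [tx["id"] for tx in transactions if tx["duration_ms"] >= factor * average]
```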

**Pattern Recognition**

The tree identifies workflow patterns:

* **Linear** - Sequential execution with no branching (single path)
* **Converging Paths** - Multiple parallel branches that may share a common parent
* **Multi-Root** - Multiple independent execution trees (traces with multiple root transactions)
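
One plausible way to tell these patterns apart is to count roots and children. The rules below are an assumption made for illustration and may differ from the dashboard's own logic: more than one root means Multi-Root, any parent with more than one child means Converging Paths, and everything else is Linear.

```python
from collections import Counter


def classify_pattern(transactions):
    """Classify a trace as Linear, Converging Paths, or Multi-Root."""
    parents = [tx["parentTransactionId"] for tx in transactions]
    # A root is a transaction with no parent.
    if parents.count(None) > 1:
        return "Multi-Root"
    children_per_parent = Counter(p for p in parents if p is not None)
    if any(count > 1 for count in children_per_parent.values()):
        return "Converging Paths"
    return "Linear"
```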

**Understanding the Visualization**

* Nodes are arranged vertically by execution depth (top = root, bottom = leaf)
* Horizontal positioning groups siblings (transactions with the same parent)
* Path duration shows cumulative time from root to that node
* Click any node to open the Transaction Details Drawer

**Use Cases**

* **Performance Optimization** - Identify which execution paths are slowest and where bottlenecks occur
* **Workflow Understanding** - Visualize complex multi-agent interactions and dependencies
* **Debugging** - Trace execution flow to find unexpected dependencies or missing relationships
* **Cost Analysis** - See how costs accumulate along different execution paths

### Breakdowns & Analytics

Four breakdown cards provide aggregated views of the trace:

* **Cost by Model**: Stacked bar showing how costs are distributed across AI models used in the trace
* **Cost by Provider**: Stacked bar showing cost distribution across AI providers
* **Token Breakdown**: Stacked bar showing input vs output tokens
* **Duration by Task Type**: Bar chart showing time spent on different operation types

### Transaction Details Table

A comprehensive table lists all transactions in the trace, with the complete metadata captured by Revenium for each transaction.

**Export**: Use the export button to download transaction details as a CSV file.

***

## Setting Up Traces

To get the most value from Trace Analytics, ensure you're passing trace metadata in your AI transactions:

* **Trace ID**: Use consistent trace IDs to group related transactions
* **Trace Type**: Categorize workflows for meaningful aggregation
* **Task Type**: Label operations for detailed analysis
* **Agent**: Identify which agent or service processed the request
* **Parent Transaction ID**: Set parent-child relationships to enable Dependency Tree visualization
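
The sketch below shows how this metadata might be assembled for two related transactions. Only `traceId` and `parentTransactionId` are names used elsewhere on this page; the other keys (`transactionId`, `traceType`, `taskType`, `agent`) and the helper function are hypothetical, so check your integration's documentation for the exact field names it expects.

```python
import uuid


def transaction_metadata(trace_id, trace_type, task_type, agent, parent_transaction_id=None):
    """Build the trace metadata for one AI transaction (field names are illustrative)."""
    return {
        "transactionId": str(uuid.uuid4()),
        "traceId": trace_id,  # same value for every transaction in the workflow
        "traceType": trace_type,
        "taskType": task_type,
        "agent": agent,
        "parentTransactionId": parent_transaction_id,  # None marks a root transaction
    }


# One trace ID is generated per workflow execution and reused for each step.
trace_id = str(uuid.uuid4())
plan = transaction_metadata(trace_id, "document-analysis", "planning", "planner")
summary = transaction_metadata(
    trace_id,
    "document-analysis",
    "summarization",
    "summarizer",
    parent_transaction_id=plan["transactionId"],
)
```

Generating the trace ID once at the start of the workflow and passing it through every step is what lets the transactions be grouped under a single trace.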

***

## Related Documentation

* [AI Analytics](https://github.com/revenium/isotope/blob/main/gitbook-docs/ai-analytics.md) - Aggregate AI usage metrics and trends
* [System & Transaction Logs](https://docs.revenium.io/system-and-transaction-logs) - Detailed transaction logging
* [Cost & Performance Alerts](https://docs.revenium.io/cost-and-performance-alerts) - Set up alerts for thresholds
* [Integration Options for AI Metering](https://docs.revenium.io/integration-options-for-ai-metering) - How to send trace data
