# AI Coding Data Reference

This page documents every data point that Revenium collects from AI coding assistant integrations. Use this reference to understand exactly what telemetry is captured, how it's used, and what privacy guarantees apply.

***

## How Data Is Collected

All AI coding assistant data is collected via **OpenTelemetry (OTLP)** log records. Each coding tool has a dedicated integration that exports usage telemetry to Revenium's OTLP endpoint. No proprietary agents or background processes are involved — data flows through the standard OpenTelemetry protocol.

| Tool               | Integration Method          | Data Flow                                |
| ------------------ | --------------------------- | ---------------------------------------- |
| **Claude Code**    | `@revenium/cli` npm package | Claude Code hooks → OTLP logs → Revenium |
| **Gemini CLI SDK** | `@revenium/cli` npm package | Gemini CLI → OTLP logs → Revenium        |
| **Cursor IDE**     | Admin API sync              | Cursor Admin API → Revenium (periodic)   |

### Agent Identifiers

Each tool is identified by an **agent** value in the telemetry:

| Tool        | Agent Identifier |
| ----------- | ---------------- |
| Claude Code | `claude-code`    |
| Gemini CLI  | `gemini-cli`     |
| Cursor IDE  | `cursor-ide`     |

***

## Data Privacy

{% hint style="success" %}
**Revenium never collects your code, prompts, or conversation content.** Only usage metadata is transmitted — token counts, model names, timestamps, and session identifiers. This applies to all integrations by default.
{% endhint %}

Specifically, the following are **never** sent in the default configuration:

* Source code or file contents
* Prompt text or system prompts
* AI response content
* API keys, credentials, or secrets
* Repository names or git history (diffs, commits, file contents)
* Screen content or clipboard data

***

## GitHub Integration Data (Optional)

The following section applies **only** when the optional GitHub integration is connected. Without it, the AI Coding Dashboard operates entirely from OTLP telemetry data and does not interact with GitHub in any way.

### What We Read from GitHub

When the integration is active, Revenium makes the following read-only API calls to GitHub:

| Data Read                                           | Purpose                                                     |
| --------------------------------------------------- | ----------------------------------------------------------- |
| Organization member list (usernames, public emails) | Auto-map developers to their corporate email                |
| Public user profiles and email search               | Resolve GitHub logins to email addresses for attribution    |
| Repository names in the organization                | Determine which repos to scan for merged PRs                |
| Merged PR metadata (author, merge date)             | Count PRs merged per developer in the selected period       |
| Commit messages and commit author emails            | Detect AI co-author patterns (e.g. Co-Authored-By trailers) |

{% hint style="warning" %}
Revenium reads commit messages to detect AI co-authorship patterns but **does not** store the message content. Only the boolean result (AI-assisted or not) is retained.
{% endhint %}
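
As an illustration of how a commit message can be reduced to a single boolean, here is a minimal sketch. The marker list (`claude`, `gemini`, `cursor`) and the function name `is_ai_assisted` are assumptions made for this example; the patterns Revenium actually matches are not documented on this page.

```python
import re

# Hypothetical marker list, chosen only for illustration.
AI_COAUTHOR_MARKERS = ("claude", "gemini", "cursor")

# A git trailer sits on its own line as "Key: value"; the key is matched case-insensitively.
_TRAILER = re.compile(r"^co-authored-by:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE)


def is_ai_assisted(commit_message: str) -> bool:
    """Return True if any Co-Authored-By trailer names one of the assumed AI markers."""
    for value in _TRAILER.findall(commit_message):
        lowered = value.lower()
        if any(marker in lowered for marker in AI_COAUTHOR_MARKERS):
            return True
    return False
```

Only the returned boolean would need to be kept; the message text itself can be discarded once the check has run.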

### What We Store

| Stored Data                   | Description                                                                     |
| ----------------------------- | ------------------------------------------------------------------------------- |
| Daily PR counts per developer | Number of PRs merged and number of AI-assisted PRs, per day                     |
| GitHub-to-email mappings      | Links each developer's GitHub username to their corporate email for attribution |

### What We Do NOT Access

Even though the GitHub token may have broad permissions (`repo` scope), our implementation only makes the specific API calls listed above. The following are **never** accessed:

* File contents, diffs, or patches
* Pull request descriptions or comments
* Repository source code
* Issues, reviews, or branch data
* GitHub Actions, webhooks, or deployment data
* Private user profile data beyond public email

### Token Permissions

The GitHub integration requires a personal access token with `repo` and `read:org` scopes. The `repo` scope is broader than strictly necessary, but GitHub does not offer a narrower scope that grants access to merged PR metadata across private repositories. Our code only exercises the minimum API calls needed for PR attribution.

For details on setting up and configuring the GitHub integration, see [GitHub Integration](/track-and-control-costs/analyze-ai-tooling-spend/github-integration.md).

***

## Common Data Points

The following data points are collected by **all** AI coding assistant integrations. These form the core telemetry schema that powers the [AI Coding Dashboard](/track-and-control-costs/analyze-ai-tooling-spend.md).

### Token Metrics

| Data Point                | Type    | Description                                                   |
| ------------------------- | ------- | ------------------------------------------------------------- |
| `inputTokenCount`         | Integer | Number of input tokens consumed in the request                |
| `outputTokenCount`        | Integer | Number of output tokens generated by the model                |
| `cacheReadTokenCount`     | Integer | Tokens served from the model's prompt cache (reduces cost)    |
| `cacheCreationTokenCount` | Integer | Tokens written to the model's prompt cache                    |
| `reasoningTokenCount`     | Integer | Extended thinking / chain-of-thought tokens (model-dependent) |
| `totalTokenCount`         | Integer | Sum of all token types for the request                        |

### Cost Metrics

| Data Point    | Type    | Description                                                     |
| ------------- | ------- | --------------------------------------------------------------- |
| `totalCost`   | Decimal | Calculated cost in USD for this request, based on model pricing |
| `cost_source` | String  | Always `coding_assistant` for AI coding tool traffic            |
| `costType`    | String  | Always `AI` for AI coding assistant requests                    |

### Model & Provider Identity

| Data Point | Type   | Description                                                                             |
| ---------- | ------ | --------------------------------------------------------------------------------------- |
| `model`    | String | AI model name (e.g., `claude-opus-4-5-20251101`, `gemini-2.5-pro`)                      |
| `provider` | String | AI provider identifier. Set by backend mappers: `ClaudeCode`, `GeminiCli`, `CursorIde`. |
| `agent`    | String | Coding assistant identifier (`claude-code`, `gemini-cli`, `cursor-ide`)                 |

### Timing

| Data Point        | Type      | Description                                                   |
| ----------------- | --------- | ------------------------------------------------------------- |
| `requestTime`     | Timestamp | When the request was initiated (ISO 8601 / epoch nanoseconds) |
| `requestDuration` | Integer   | Total request duration in milliseconds                        |

### Attribution

| Data Point         | OTLP Attribute                           | Type   | Description                                                                                                                           |
| ------------------ | ---------------------------------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------- |
| `subscriber`       | `user.email`                             | String | Developer email address for usage attribution (optional, user-configured)                                                             |
| `organizationName` | `organization.id` or `organization.name` | String | Organization or company name/ID for cost rollup (optional). The backend prefers `organization.name`; falls back to `organization.id`. |
| `productName`      | `product.id` or `product.name`           | String | Product or project name/ID for cost rollup (optional). The backend prefers `product.name`; falls back to `product.id`.                |
| `traceId`          | `session.id`                             | String | Session identifier — groups requests within a single coding session                                                                   |
| `transactionId`    | `transaction_id`                         | String | Unique identifier for each individual request (used for deduplication)                                                                |

{% hint style="info" %}
The **Data Point** column shows the name as stored in the analytics database. The **OTLP Attribute** column shows the key name in the raw telemetry payload. The backend mapper translates between these formats during ingestion.
{% endhint %}
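
The translation in the Attribution table can be sketched as follows. This is an illustrative model of the mapping, not Revenium's backend code: the function names are hypothetical, and treating an empty string the same as a missing attribute is an assumption.

```python
from typing import Dict, Mapping, Optional


def _prefer_name(attrs: Mapping[str, str], prefix: str) -> Optional[str]:
    """Prefer '<prefix>.name' and fall back to '<prefix>.id' (None if neither is set)."""
    return attrs.get(prefix + ".name") or attrs.get(prefix + ".id")


def map_attribution(attrs: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Translate raw OTLP attribute keys into the stored data point names."""
    return {
        "subscriber": attrs.get("user.email"),
        "organizationName": _prefer_name(attrs, "organization"),
        "productName": _prefer_name(attrs, "product"),
        "traceId": attrs.get("session.id"),
        "transactionId": attrs.get("transaction_id"),
    }
```

For example, a payload carrying both `organization.id` and `organization.name` resolves `organizationName` to the name, while a payload carrying only `product.id` resolves `productName` to the ID.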

### Operational Classification

| Data Point      | Type   | Description                                                                                                                                                                                 |
| --------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `operationType` | String | Request classification (e.g., `CHAT`)                                                                                                                                                       |
| `stopReason`    | String | Why the model stopped generating. Revenium enum values: `END`, `TOKEN_LIMIT`, `ERROR`, `CANCELLED`. See [Gemini Stop Reason Mapping](#stop-reason-mapping) for tool-specific normalization. |
| `errorReason`   | String | Error description if the request failed (empty on success)                                                                                                                                  |

### Coding Assistant Account Linkage

| Data Point                      | Type   | Description                                                                        |
| ------------------------------- | ------ | ---------------------------------------------------------------------------------- |
| `coding_assistant_account_uuid` | String | Links telemetry to a specific coding assistant account for cross-session tracking  |
| `subscription_tier`             | String | Subscription plan identifier (see [Subscription Tiers](#subscription-tiers) below) |

***

## Claude Code Data Points

In addition to the [Common Data Points](#common-data-points) above, Claude Code captures the following:

### Subscription Tiers

Claude Code subscription tiers are optionally tracked when using the Revenium SDKs:

| Tier           | Description                                                   |
| -------------- | ------------------------------------------------------------- |
| `pro`          | Anthropic Pro plan                                            |
| `max_5x`       | Anthropic Max 5x plan                                         |
| `max_20x`      | Anthropic Max 20x plan                                        |
| `team_premium` | Anthropic Team Premium plan                                   |
| `enterprise`   | Anthropic Enterprise plan                                     |
| `api`          | Direct API usage (full API pricing, no subscription discount) |

### Data Collection Modes

Claude Code supports two data collection modes:

<table><thead><tr><th width="126.33984375">Mode</th><th>Description</th></tr></thead><tbody><tr><td><strong>Real-time</strong></td><td>Telemetry is exported automatically during each Claude Code session via OTLP hooks. Captures core token, cost, and timing metrics.</td></tr><tr><td><strong>Backfill</strong></td><td>The <code>revenium-metering backfill</code> command scans local Claude Code session logs (<code>~/.claude/projects/</code>) and sends historical usage data.</td></tr></tbody></table>

Backfill is idempotent — deterministic transaction IDs (SHA-256 hash of session ID, timestamp, model, and token counts) prevent duplicate records.
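
A minimal sketch of how such a deterministic ID can be derived is shown below. The field order, the `|` separator, and the choice of which token counts to include are assumptions for illustration; the CLI's actual canonical form is not documented here.

```python
import hashlib


def backfill_transaction_id(
    session_id: str,
    timestamp: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
) -> str:
    """Hash the identifying fields so the same log entry always yields the same ID."""
    # Assumed canonical form: fields joined with "|" in a fixed order.
    canonical = "|".join(
        [session_id, timestamp, model, str(input_tokens), str(output_tokens)]
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the ID depends only on the record's own fields, running backfill twice over the same session logs produces identical IDs, which lets the backend discard the repeats.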

### Centralized Claude Code Configuration

For team-wide real-time telemetry, configure Claude Code once with managed settings instead of asking every developer to run local setup.

| Approach                      | Best for                                                                                                | Where configured                                                                      |
| ----------------------------- | ------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| **Server-managed settings**   | Claude Teams or Enterprise organizations without MDM, or with unmanaged developer devices               | **Claude.ai → Admin Settings → Claude Code → Managed settings**                       |
| **Endpoint-managed settings** | Organizations with MDM, device-management, registry policy, or system-level managed-settings deployment | macOS managed preferences, Windows policy/registry, or system `managed-settings.json` |
| **Per-developer CLI setup**   | Individual developers, third-party Anthropic providers, one-off setup, or historical backfill           | `@revenium/cli` on each machine                                                       |

#### Organization-Wide Setup

Claude Code supports centrally managed configuration through the Claude admin console. An administrator defines the settings once; Anthropic delivers them to every authenticated user on next startup. No per-developer install is required.

**Requirements:**

* Claude for Teams or Claude for Enterprise plan
* Claude Code version that supports managed settings
* Administrator role of **Primary Owner** or **Owner** in the Claude admin console
* Direct connection to `api.anthropic.com` for server-managed settings

{% hint style="info" %}
**Claude Code evolves quickly.** Anthropic updates environment variables, managed-settings fields, and admin console paths frequently. If something does not match what you see in the Claude admin console, cross-reference Anthropic's Claude Code monitoring and server-managed settings docs. The Revenium-specific values below, including the endpoint URL, API key header, and resource attributes, are the values to keep consistent.
{% endhint %}

**Steps:**

1. Sign in to [Claude.ai](https://claude.ai) as Primary Owner or Owner.
2. Navigate to the Managed settings screen. Depending on plan and admin console version, this appears under **Admin Settings → Claude Code → Managed settings** or **Organization settings → Claude Code → Managed settings**.
3. Paste the following JSON, substituting your Revenium metering key.

Use a Revenium metering key (`rev_mk_*`) for Claude Code OTLP telemetry. Do **not** use an Anthropic/OpenAI provider key here; provider keys authenticate model calls, while the Revenium key authenticates telemetry ingest.

```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://api.revenium.ai/v2/otlp",
    "OTEL_EXPORTER_OTLP_HEADERS": "x-api-key=rev_mk_your_tenant_yourkey",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/json",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_METRICS_EXPORTER": "none",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000"
  }
}
```

4. Click **Add settings** or save the managed settings.
5. Ask developers to fully quit and relaunch Claude Code after the config push.

On first launch after the settings are picked up, each developer may see a one-time security approval dialog listing the managed environment variables. They should approve the administrator-delivered configuration.

#### Verifying Your Configuration

Ask a developer to run a brief Claude Code session. Usage should appear in Revenium after the next telemetry flush. Your Revenium tenant is identified by the metering key, and Claude Code attaches the developer identity it knows for the session. If your organization uses direct API keys, Bedrock, Vertex, Microsoft Foundry, or a custom `ANTHROPIC_BASE_URL`, attach identity yourself with `OTEL_RESOURCE_ATTRIBUTES`.

If no sessions appear after configuring the settings, check:

* The Revenium metering key is active in your Revenium account
* The OTLP endpoint matches your Revenium environment
* The developer fully quit and relaunched Claude Code after the config push

{% hint style="info" %}
**Field reference**

* **`OTEL_EXPORTER_OTLP_ENDPOINT`** — Revenium's OTLP endpoint. For most customers this is `https://api.revenium.ai/v2/otlp`.
* **`OTEL_EXPORTER_OTLP_HEADERS`** — Your Revenium metering key, prefixed with `x-api-key=`.
* **`OTEL_EXPORTER_OTLP_PROTOCOL`** — Must be `http/json`.
* **`OTEL_LOGS_EXPORTER`** — Must be `otlp`. This enables log export, which is how Revenium receives per-call telemetry.
* **`OTEL_METRICS_EXPORTER`** — Must be `none`. Revenium bills from log events only; leaving metrics enabled is unnecessary and increases HTTP traffic without adding data.
* **`OTEL_LOGS_EXPORT_INTERVAL`** — Milliseconds between log-batch flushes. `5000` matches Claude Code's documented/default flush interval.
  {% endhint %}
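
Before pasting the JSON into the admin console, the field constraints above can be checked with a short script. This is a hedged sketch: `check_managed_env` is a hypothetical helper, not part of `@revenium/cli`, and it checks only the constraints stated in the field reference.

```python
from typing import Dict, List

# Values the field reference describes as mandatory.
REQUIRED_VALUES = {
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/json",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_METRICS_EXPORTER": "none",
}


def check_managed_env(settings: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    env = settings.get("env", {})
    problems = []
    for key, expected in REQUIRED_VALUES.items():
        if env.get(key) != expected:
            problems.append(f"{key} should be {expected!r}, found {env.get(key)!r}")
    if not env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT is missing")
    # The header must carry a Revenium metering key, not a provider key.
    if not env.get("OTEL_EXPORTER_OTLP_HEADERS", "").startswith("x-api-key=rev_mk_"):
        problems.append("OTEL_EXPORTER_OTLP_HEADERS should start with 'x-api-key=rev_mk_'")
    return problems
```

Load the settings with `json.load` and pass the resulting dictionary to the function; each returned string names one field to correct.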

#### Optional Internal Attribution

The required configuration above is enough for correct metering and dashboard attribution. If you want to sub-segment your own Claude Code usage within your Revenium tenant, for example to compare spend across business units, teams, or internal product lines, add an `OTEL_RESOURCE_ATTRIBUTES` line to the `env` block with labels of your choosing:

```json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://api.revenium.ai/v2/otlp",
    "OTEL_EXPORTER_OTLP_HEADERS": "x-api-key=rev_mk_your_tenant_yourkey",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/json",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_METRICS_EXPORTER": "none",
    "OTEL_LOGS_EXPORT_INTERVAL": "5000",
    "OTEL_RESOURCE_ATTRIBUTES": "organization.name=Engineering,product.name=internal-claude-code"
  }
}
```

These values are yours to define. They are not matched against any existing list in Revenium. Choose labels that make sense for your internal reporting.

* `organization.name` — a business unit, department, or cost center, such as `Engineering`, `DataPlatform`, or `Marketing`
* `product.name` — the application or use case, such as `internal-claude-code`

If you omit `OTEL_RESOURCE_ATTRIBUTES`, events land under your tenant's default attribution, which is fine for most customers.

{% hint style="warning" %}
**`OTEL_RESOURCE_ATTRIBUTES` formatting constraints:**

* **No spaces in values.** Use underscores, camelCase, or percent-encoding (`%20`) instead, for example `organization.name=Data_Platform` or `organization.name=DataPlatform`, not `organization.name=Data Platform`.
* **Values are case-sensitive.** `Engineering` and `engineering` are treated as separate labels. Pick a canonical form and use it consistently across all configuration and integrations.
  {% endhint %}
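
One way to avoid formatting mistakes is to generate the attribute string rather than typing it by hand. The sketch below percent-encodes each value with Python's standard library; `build_resource_attributes` is a hypothetical helper name used only for this example.

```python
from typing import Mapping
from urllib.parse import quote


def build_resource_attributes(labels: Mapping[str, str]) -> str:
    """Render labels as 'key=value,key=value' with percent-encoded values."""
    # safe="" also encodes "," and "=", which would otherwise break the pair syntax.
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in labels.items()
    )
```

Calling it with `{"organization.name": "Data Platform"}` yields `organization.name=Data%20Platform`. Case is preserved as given, so the canonical spelling still has to be chosen by the caller.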

{% hint style="info" %}
**Third-party provider compatibility.** Server-managed settings are delivered from Anthropic's servers and require a direct connection to `api.anthropic.com`. They are not delivered when Claude Code is routed through Amazon Bedrock, Google Vertex AI, Microsoft Foundry, or a custom endpoint via `ANTHROPIC_BASE_URL` or an LLM gateway. If your organization uses one of these providers, use endpoint-managed settings or the per-developer CLI setup instead.
{% endhint %}

{% hint style="info" %}
**Managed settings override user-level configuration.** Values defined in the managed settings `env` block take precedence over shell-exported environment variables on each user's machine, including values the Revenium CLI may have previously set.
{% endhint %}

***

## Gemini Data Points

Gemini CLI data flows into Revenium via the `@revenium/cli` npm package, which configures Gemini CLI's native OTLP export to send telemetry to Revenium's endpoint.

### Gemini CLI SDK Data Points

The CLI SDK captures the [Common Data Points](#common-data-points) listed above — token metrics, cost, model identity, timing, and attribution.

Gemini CLI operates in **real-time only** — there is no backfill capability. Telemetry is captured and exported as each Gemini CLI request completes.

### Stop Reason Mapping

Gemini CLI normalizes Google's finish reasons to Revenium's supported `StopReason` values:

| Gemini Finish Reason                                                         | Revenium StopReason         | Description                                                     |
| ---------------------------------------------------------------------------- | --------------------------- | --------------------------------------------------------------- |
| `STOP`                                                                       | `END`                       | Normal completion                                               |
| `MAX_TOKENS`                                                                 | `TOKEN_LIMIT`               | Token limit reached                                             |
| `SAFETY`, `BLOCKLIST`, `PROHIBITED_CONTENT`, `SPII`, `MODEL_ARMOR`           | `ERROR`                     | Content safety filter triggered                                 |
| `RECITATION`, `IMAGE_SAFETY`, `IMAGE_PROHIBITED_CONTENT`, `IMAGE_RECITATION` | `ERROR`                     | Recitation or image safety filter                               |
| `MALFORMED_FUNCTION_CALL`, `UNEXPECTED_TOOL_CALL`, `NO_IMAGE`                | `ERROR`                     | Tool call or image error                                        |
| `CANCELLED` / `CANCELED`                                                     | `CANCELLED`                 | Request canceled                                                |
| `FINISH_REASON_UNSPECIFIED`, `OTHER`, `IMAGE_OTHER`                          | *(caller-supplied default)* | Returns the default stop reason provided by the calling context |
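
The table can be expressed as a small lookup function. This is an illustrative sketch rather than the integration's source code; returning the caller's default for finish reasons that do not appear in the table at all is an assumption.

```python
# Finish reasons with a one-to-one Revenium equivalent.
_DIRECT = {
    "STOP": "END",
    "MAX_TOKENS": "TOKEN_LIMIT",
    "CANCELLED": "CANCELLED",
    "CANCELED": "CANCELLED",
}

# Safety, recitation, tool-call, and image failures all collapse to ERROR.
_ERRORS = {
    "SAFETY", "BLOCKLIST", "PROHIBITED_CONTENT", "SPII", "MODEL_ARMOR",
    "RECITATION", "IMAGE_SAFETY", "IMAGE_PROHIBITED_CONTENT", "IMAGE_RECITATION",
    "MALFORMED_FUNCTION_CALL", "UNEXPECTED_TOOL_CALL", "NO_IMAGE",
}


def normalize_stop_reason(finish_reason: str, default: str) -> str:
    """Map a Gemini finish reason to a Revenium StopReason value."""
    if finish_reason in _DIRECT:
        return _DIRECT[finish_reason]
    if finish_reason in _ERRORS:
        return "ERROR"
    # FINISH_REASON_UNSPECIFIED, OTHER, IMAGE_OTHER: use the caller-supplied default.
    return default
```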

***

## Cursor IDE Data Points

In addition to the [Common Data Points](#common-data-points) above, Cursor IDE captures the following through its Admin API sync:

### Billing Classification

| Data Point                      | Type   | Description                                                                                                 |
| ------------------------------- | ------ | ----------------------------------------------------------------------------------------------------------- |
| `billing.kind`                  | String | Cursor billing classification (`Included`, `Premium`, etc.) — determines whether usage counts against quota |
| `operation_type`                | String | Operation type from Cursor (e.g., request classification)                                                   |
| `stop_reason` / `finish_reason` | String | Finish reason from Cursor                                                                                   |

{% hint style="info" %}
When `billing.kind` is `Included`, Revenium sets `billingSkipped = true`, `skipReason = FREE_TIER`, and forces `totalCost` to `null` — indicating the request was covered by the subscription and incurred no additional cost.
{% endhint %}
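
The rule above amounts to a small transformation on each record. The sketch below models it with a plain dictionary; the function name and the record shape are assumptions for illustration.

```python
from typing import Any, Dict


def apply_cursor_billing(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with subscription-covered usage marked as skipped."""
    out = dict(record)
    if record.get("billing.kind") == "Included":
        out["billingSkipped"] = True
        out["skipReason"] = "FREE_TIER"
        out["totalCost"] = None  # covered by the subscription, so no cost is recorded
    return out
```

Records with any other `billing.kind` value pass through unchanged.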

### Data Collection Mode

Cursor IDE usage data is collected periodically from Cursor's Admin API and exported to Revenium via OTLP. Unlike Claude Code and Gemini CLI, data is not captured in real-time during each request — it is synced at regular intervals from Cursor's administrative interface.

***

## Derived Fields

The following fields are **not sent by the SDKs** but are calculated by the Revenium backend during ingestion:

<table><thead><tr><th width="233.3125">Field</th><th>Derivation</th><th>Description</th></tr></thead><tbody><tr><td><code>inputTokenCost</code></td><td><code>inputTokenCount × model_input_cost_per_token</code></td><td>Cost attributed to input tokens</td></tr><tr><td><code>outputTokenCost</code></td><td><code>outputTokenCount × model_output_cost_per_token</code></td><td>Cost attributed to output tokens</td></tr><tr><td><code>cacheCreationTokenCost</code></td><td><code>cacheCreationTokenCount × model_cache_creation_cost</code></td><td>Cost attributed to cache creation</td></tr><tr><td><code>cacheReadTokenCost</code></td><td><code>cacheReadTokenCount × model_cache_read_cost</code></td><td>Cost attributed to cache reads</td></tr><tr><td><code>totalCost</code> (when not provided)</td><td>Sum of all token costs</td><td>Calculated when SDK sends zero or null cost</td></tr><tr><td><code>apiKey</code></td><td>Extracted from <code>x-api-key</code> HTTP header</td><td>Authentication key for tenant identification</td></tr><tr><td><code>credentialId</code></td><td>Extracted from <code>subscriber</code> JSON</td><td>Credential identifier for access control</td></tr></tbody></table>
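
The cost derivations can be reproduced with a few lines of arithmetic. In this sketch the per-token rates are caller-supplied placeholders, not real model prices, and the function name and `rates` keys are hypothetical. `Decimal` is used to avoid floating-point rounding on very small per-token prices.

```python
from decimal import Decimal
from typing import Dict, Optional


def derive_costs(
    counts: Dict[str, int],
    rates: Dict[str, Decimal],
    provided_total: Optional[Decimal] = None,
) -> Dict[str, Decimal]:
    """Compute per-category token costs and, if needed, the total cost."""
    costs = {
        "inputTokenCost": counts["inputTokenCount"] * rates["input"],
        "outputTokenCost": counts["outputTokenCount"] * rates["output"],
        "cacheCreationTokenCost": counts["cacheCreationTokenCount"] * rates["cache_creation"],
        "cacheReadTokenCost": counts["cacheReadTokenCount"] * rates["cache_read"],
    }
    # totalCost is derived only when the SDK sent zero or null.
    if provided_total:
        costs["totalCost"] = provided_total
    else:
        costs["totalCost"] = sum(costs.values(), Decimal(0))
    return costs
```

With the token counts from the example payload on this page (1500 input, 2000 output, 500 cache read, 0 cache creation), each cost is the count multiplied by the matching rate, and `totalCost` is their sum unless a non-zero total was supplied.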

***

## OTLP Transport Details

For teams implementing custom integrations or verifying data flow, here are the OTLP transport details:

### Endpoint

```
POST {base_url}/v1/logs
```

Where `base_url` is typically `https://api.revenium.ai/v2/otlp`.

### Authentication

```
x-api-key: rev_mk_your_key_here
```

This is a metering key (`rev_mk_*`) — sufficient for OTLP telemetry ingest, which is what every AI coding-assistant integration on this page does. For workflows that also report business outcomes or manage Revenium resources, use a write-scope key (`rev_sk_*`) — see [API Key Permissions](/integrations/api-key-permissions.md).

### Payload Format

All integrations use the OTLP/HTTP JSON format (`application/json`):

```json
{
  "resourceLogs": [{
    "resource": {
      "attributes": [
        { "key": "service.name", "value": { "stringValue": "claude-code" } }
      ]
    },
    "scopeLogs": [{
      "scope": { "name": "claude-code", "version": "1.0.0" },
      "logRecords": [{
        "timeUnixNano": "1711324800000000000",
        "body": { "stringValue": "claude_code.api_request" },
        "attributes": [
          { "key": "session.id", "value": { "stringValue": "sess-abc123" } },
          { "key": "model", "value": { "stringValue": "claude-opus-4-5-20251101" } },
          { "key": "input_tokens", "value": { "intValue": 1500 } },
          { "key": "output_tokens", "value": { "intValue": 2000 } },
          { "key": "cache_read_tokens", "value": { "intValue": 500 } },
          { "key": "cache_creation_tokens", "value": { "intValue": 0 } },
          { "key": "total_input_tokens", "value": { "intValue": 2000 } }
        ]
      }]
    }]
  }]
}
```

{% hint style="info" %}
The example above shows a Claude Code backfill payload with the core token attributes. The real-time connectivity-test payload, sent by `revenium-metering test` where the SDK in use provides that command, uses `stringValue` for token fields and additionally sends `cost_usd` and `duration_ms`. Gemini CLI payloads follow the same OTLP structure, with `service.name` set to `gemini-cli` and the scope name set to `gemini_cli`.
{% endhint %}
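
For a custom integration, the endpoint, header, and content type described in this section combine into a single HTTP request. The sketch below only builds the request with Python's standard library and does not send it; `build_logs_request` is a hypothetical helper, and the key shown is the placeholder from this page.

```python
import json
import urllib.request
from typing import Any, Dict


def build_logs_request(
    base_url: str, metering_key: str, payload: Dict[str, Any]
) -> urllib.request.Request:
    """Build a POST to {base_url}/v1/logs carrying an OTLP/HTTP JSON body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/logs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": metering_key,  # Revenium metering key (rev_mk_*)
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would transmit it. The `payload` argument is expected to be a dictionary shaped like the `resourceLogs` example above.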

***

## Related Documentation

* [AI Coding Dashboard](/track-and-control-costs/analyze-ai-tooling-spend.md) — Dashboard views and analysis features
* [Integration Options for AI Metering](/integrations/integrations.md) — Setup instructions for all integrations
* [OpenTelemetry Integration](/integrations/otlp-integration.md) — General OTLP integration guide
* [Set Budgets & Alerts](/track-and-control-costs/set-budgets-and-alerts.md) — Alerting on coding assistant metrics
