# AI Coding Data Reference

This page documents every data point that Revenium collects from AI coding assistant integrations. Use this reference to understand exactly what telemetry is captured, how it's used, and what privacy guarantees apply.

***

## How Data Is Collected

All AI coding assistant data is collected via **OpenTelemetry (OTLP)** log records. Each coding tool has a dedicated integration that exports usage telemetry to Revenium's OTLP endpoint. No proprietary agents or background processes are involved — data flows through the standard OpenTelemetry protocol.

| Tool                     | Integration Method                           | Data Flow                                |
| ------------------------ | -------------------------------------------- | ---------------------------------------- |
| **Claude Code**          | `@revenium/cli` npm package                  | Claude Code hooks → OTLP logs → Revenium |
| **Gemini CLI SDK**       | `@revenium/cli` npm package                  | Gemini CLI → OTLP logs → Revenium        |
| **Gemini Go Middleware** | `github.com/revenium/revenium-go-sdk/google` | Go app → Completions API → Revenium      |
| **Cursor IDE**           | Admin API sync                               | Cursor Admin API → Revenium (periodic)   |

### Agent Identifiers

Each tool is identified by an **agent** value in the telemetry:

| Tool        | Agent Identifier |
| ----------- | ---------------- |
| Claude Code | `claude-code`    |
| Gemini CLI  | `gemini-cli`     |
| Cursor IDE  | `cursor-ide`     |

***

## Privacy Guarantees

{% hint style="success" %}
**Revenium never collects your code, prompts, or conversation content.** Only usage metadata is transmitted — token counts, model names, timestamps, and session identifiers. This applies to all integrations by default.
{% endhint %}

Specifically, the following are **never** sent in the default configuration:

* Source code or file contents
* Prompt text or system prompts
* AI response content
* API keys, credentials, or secrets
* Repository names or git history (diffs, commits, file contents)
* Screen content or clipboard data

{% hint style="info" %}
**Note on session metadata:** When backfilling historical Claude Code data, optional session metadata including the working directory and git branch name may be included if present in the local session logs. These provide context about where AI assistance was used. See [Claude Code > Session Metadata](#session-metadata) for details. No file contents, code, or git history are included.
{% endhint %}

***

## Common Data Points

The following data points are collected by **all** AI coding assistant integrations. These form the core telemetry schema that powers the [AI Coding Dashboard](https://docs.revenium.io/ai-coding-dashboard).

### Token Metrics

| Data Point                | Type    | Description                                                   |
| ------------------------- | ------- | ------------------------------------------------------------- |
| `inputTokenCount`         | Integer | Number of input tokens consumed in the request                |
| `outputTokenCount`        | Integer | Number of output tokens generated by the model                |
| `cacheReadTokenCount`     | Integer | Tokens served from the model's prompt cache (reduces cost)    |
| `cacheCreationTokenCount` | Integer | Tokens written to the model's prompt cache                    |
| `reasoningTokenCount`     | Integer | Extended thinking / chain-of-thought tokens (model-dependent) |
| `totalTokenCount`         | Integer | Sum of all token types for the request                        |

{% hint style="info" %}
Not all models or integrations populate every token type. `reasoningTokenCount` is sent by the Gemini Go middleware for models with extended thinking; the Claude Code SDK does not currently send it (so it may be zero in Claude Code data), and it is always null for Cursor IDE. `cacheCreationTokenCount` is always 0 for Gemini CLI because the Google API does not expose cache creation counts. More generally, `cacheReadTokenCount` and `cacheCreationTokenCount` depend on the model's prompt caching support.
{% endhint %}
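
As a concrete illustration, `totalTokenCount` is the sum of the other token fields. The helper below is a sketch that mirrors this page's field names; it is not Revenium SDK code:

```typescript
// Illustrative only — field names mirror this reference page, not the SDK.
interface TokenMetrics {
  inputTokenCount: number;
  outputTokenCount: number;
  cacheReadTokenCount: number;
  cacheCreationTokenCount: number;
  reasoningTokenCount: number;
}

// totalTokenCount is documented as the sum of all token types for a request.
function totalTokenCount(m: TokenMetrics): number {
  return (
    m.inputTokenCount +
    m.outputTokenCount +
    m.cacheReadTokenCount +
    m.cacheCreationTokenCount +
    m.reasoningTokenCount
  );
}
```

For a request with 1,500 input tokens, 2,000 output tokens, and 500 cache-read tokens, the total is 4,000.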

### Cost Metrics

| Data Point        | Type    | Description                                                                    |
| ----------------- | ------- | ------------------------------------------------------------------------------ |
| `totalCost`       | Decimal | Calculated cost in USD for this request, based on model pricing                |
| `cost_multiplier` | Float   | Subscription tier discount factor (e.g., 0.08 for Max 20x = 8% of API pricing) |
| `cost_source`     | String  | Always `coding_assistant` for AI coding tool traffic                           |
| `costType`        | String  | Always `AI` for AI coding assistant requests                                   |

### Model & Provider Identity

| Data Point         | Type   | Description                                                                                                                                                   |
| ------------------ | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`            | String | AI model name (e.g., `claude-opus-4-5-20251101`, `gemini-2.5-pro`)                                                                                            |
| `provider`         | String | AI provider identifier. Set by backend mappers: `ClaudeCode`, `GeminiCli`, `CursorIde`. The Gemini Go middleware may also send `google-genai` or `vertex-ai`. |
| `agent`            | String | Coding assistant identifier (`claude-code`, `gemini-cli`, `cursor-ide`)                                                                                       |
| `middlewareSource` | String | SDK or middleware version that generated the telemetry                                                                                                        |

### Timing

| Data Point        | Type      | Description                                                   |
| ----------------- | --------- | ------------------------------------------------------------- |
| `requestTime`     | Timestamp | When the request was initiated (ISO 8601 / epoch nanoseconds) |
| `requestDuration` | Integer   | Total request duration in milliseconds                        |

### Attribution

| Data Point         | OTLP Attribute                           | Type   | Description                                                                                                                           |
| ------------------ | ---------------------------------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------- |
| `subscriber`       | `user.email`                             | String | Developer email address for usage attribution (optional, user-configured)                                                             |
| `organizationName` | `organization.id` or `organization.name` | String | Organization or company name/ID for cost rollup (optional). The backend prefers `organization.name`; falls back to `organization.id`. |
| `productName`      | `product.id` or `product.name`           | String | Product or project name/ID for cost rollup (optional). The backend prefers `product.name`; falls back to `product.id`.                |
| `traceId`          | `session.id`                             | String | Session identifier — groups requests within a single coding session                                                                   |
| `transactionId`    | `transaction_id`                         | String | Unique identifier for each individual request (used for deduplication)                                                                |

{% hint style="info" %}
The **Data Point** column shows the name as stored in the analytics database. The **OTLP Attribute** column shows the key name in the raw telemetry payload. The backend mapper translates between these formats during ingestion.
{% endhint %}

### Operational Classification

| Data Point      | Type   | Description                                                                                                                                                                                 |
| --------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `operationType` | String | Request classification (e.g., `CHAT`)                                                                                                                                                       |
| `stopReason`    | String | Why the model stopped generating. Revenium enum values: `END`, `TOKEN_LIMIT`, `ERROR`, `CANCELLED`. See [Gemini Stop Reason Mapping](#stop-reason-mapping) for tool-specific normalization. |
| `errorReason`   | String | Error description if the request failed (empty on success)                                                                                                                                  |

### Coding Assistant Account Linkage

| Data Point                      | Type   | Description                                                                        |
| ------------------------------- | ------ | ---------------------------------------------------------------------------------- |
| `coding_assistant_account_uuid` | String | Links telemetry to a specific coding assistant account for cross-session tracking  |
| `subscription_tier`             | String | Subscription plan identifier (see [Subscription Tiers](#subscription-tiers) below) |

{% hint style="info" %}
These fields are defined in the ClickHouse schema (Migration 15) and populated during data enrichment. The OTLP mappers extract `claude_code.account_uuid` from resource attributes where available. Full persistence is being rolled out incrementally.
{% endhint %}

***

## Claude Code Data Points

In addition to the [Common Data Points](#common-data-points) above, Claude Code captures the following:

### Extended Token Breakdown

| Data Point                 | Type    | Description                                                                                                |
| -------------------------- | ------- | ---------------------------------------------------------------------------------------------------------- |
| `cache_creation_5m_tokens` | Integer | Cache tokens with 5-minute ephemeral expiry                                                                |
| `cache_creation_1h_tokens` | Integer | Cache tokens with 1-hour extended expiry                                                                   |
| `total_input_tokens`       | Integer | Aggregate input tokens (input + cache creation + cache read) — used for context window threshold detection |

{% hint style="info" %}
These granular cache fields are available in backfilled data where Claude Code's session logs contain the breakdown. Real-time telemetry reports the aggregate `cacheCreationTokenCount`.
{% endhint %}

### Session Metadata

| Data Point                 | Type   | Description                                               |
| -------------------------- | ------ | --------------------------------------------------------- |
| `claude_code.version`      | String | Claude Code application version                           |
| `claude_code.cwd`          | String | Working directory during the session                      |
| `claude_code.git_branch`   | String | Git branch name in the working directory                  |
| `claude_code.speed`        | String | Speed/quality setting: `instant`, `normal`, or `thorough` |
| `claude_code.service_tier` | String | Anthropic API service tier used for the request           |

{% hint style="info" %}
Session metadata fields are extracted from Claude Code's local session logs during backfill. They provide context about how and where AI coding assistance was used, without capturing any code or prompt content. These fields are currently extracted and logged by the backend mapper; full ClickHouse persistence is pending a schema migration.
{% endhint %}

### Subscription Tiers

Claude Code subscriptions determine the `cost_multiplier` applied to usage costs:

| Tier           | `cost_multiplier` | Description                                                   |
| -------------- | ----------------- | ------------------------------------------------------------- |
| `pro`          | 0.16              | Anthropic Pro plan (16% of API pricing)                       |
| `max_5x`       | 0.16              | Anthropic Max 5x plan (16% of API pricing)                    |
| `max_20x`      | 0.08              | Anthropic Max 20x plan (8% of API pricing)                    |
| `team_premium` | 0.24              | Anthropic Team Premium plan (24% of API pricing)              |
| `enterprise`   | 0.05              | Anthropic Enterprise plan (5% of API pricing)                 |
| `api`          | 1.0               | Direct API usage (full API pricing, no subscription discount) |
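
To make the multipliers concrete, here is a hedged sketch of the discount arithmetic. The multiplier values come from the table above; the helper function and the fallback for unknown tiers are illustrative assumptions, not the Revenium backend implementation:

```typescript
// Sketch only: applying a subscription tier's cost_multiplier to API list
// pricing. Multiplier values are from the tier table on this page.
const COST_MULTIPLIERS: Record<string, number> = {
  pro: 0.16,
  max_5x: 0.16,
  max_20x: 0.08,
  team_premium: 0.24,
  enterprise: 0.05,
  api: 1.0,
};

function discountedCost(apiListCostUsd: number, tier: string): number {
  // Unknown tiers fall back to full API pricing (an assumption for this sketch).
  const multiplier = COST_MULTIPLIERS[tier] ?? 1.0;
  return apiListCostUsd * multiplier;
}
```

For example, a request that would cost $1.00 at API list pricing is recorded as $0.08 on a Max 20x plan.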

### Data Collection Modes

Claude Code supports two data collection modes:

| Mode          | Description                                                                                                                                                                                                                |
| ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Real-time** | Telemetry is exported automatically during each Claude Code session via OTLP hooks. Captures core token, cost, and timing metrics.                                                                                         |
| **Backfill**  | The `revenium-metering backfill` command scans local Claude Code session logs (`~/.claude/projects/`) and sends historical usage data. Captures extended token breakdown and session metadata in addition to core metrics. |

Backfill is idempotent — deterministic transaction IDs (SHA-256 hash of session ID, timestamp, model, and token counts) prevent duplicate records.
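
A deterministic transaction ID can be sketched as follows. The hashed fields match the description above (session ID, timestamp, model, and token counts), but the exact field order and separator used by the real CLI are assumptions:

```typescript
import { createHash } from "node:crypto";

// Sketch of a deterministic backfill transaction ID: SHA-256 over the
// session ID, timestamp, model, and token counts. The join format here
// is illustrative, not the actual CLI's hashing scheme.
function backfillTransactionId(
  sessionId: string,
  timestamp: string,
  model: string,
  inputTokens: number,
  outputTokens: number
): string {
  const material = [sessionId, timestamp, model, inputTokens, outputTokens].join("|");
  return createHash("sha256").update(material).digest("hex");
}
```

Because identical inputs always hash to the same ID, re-running a backfill emits records with identical transaction IDs, which the backend can deduplicate.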

***

## Gemini Data Points

Gemini data flows into Revenium through two independent integration paths:

|                     | CLI SDK                                                      | Go Middleware                                |
| ------------------- | ------------------------------------------------------------ | -------------------------------------------- |
| **Package**         | `@revenium/cli`                                              | `github.com/revenium/revenium-go-sdk/google` |
| **Use case**        | Metering developer Gemini CLI usage                          | Metering server-side Go applications         |
| **Runs on**         | Developer workstation (one-time setup)                       | Server-side, wraps `genai` Go client         |
| **Fields captured** | 26 common fields ([Common Data Points](#common-data-points)) | \~52 fields (26 common + 26 extended)        |
| **Protocol**        | OTLP/HTTP logs                                               | Revenium Completions API                     |

### Gemini CLI SDK Data Points

The CLI SDK configures Gemini CLI's native OTLP export to send telemetry to Revenium. It captures the [Common Data Points](#common-data-points) listed above — token metrics, cost, model identity, timing, and attribution.

{% hint style="info" %}
**Need extended timing, tracing, vision detection, or prompt capture?** These require the [Go Middleware](#gemini-go-middleware-data-points) integration below.
{% endhint %}

Gemini CLI operates in **real-time only** — there is no backfill capability. Telemetry is captured and exported as each Gemini CLI request completes.

### Gemini Go Middleware Data Points

In addition to the [Common Data Points](#common-data-points) above, the Go middleware captures the following extended fields:

#### Extended Timing

| Data Point            | Type      | Description                                             |
| --------------------- | --------- | ------------------------------------------------------- |
| `responseTime`        | Timestamp | When the response was fully received                    |
| `completionStartTime` | Timestamp | When the model began generating tokens                  |
| `timeToFirstToken`    | Integer   | Time from request start to first token, in milliseconds |

#### Streaming & Model Configuration

| Data Point    | Type    | Description                                                                                 |
| ------------- | ------- | ------------------------------------------------------------------------------------------- |
| `isStreamed`  | Boolean | Whether the response was streamed. (For comparison, this field is hardcoded to `true` in Gemini CLI telemetry and `false` for Cursor IDE.) |
| `temperature` | Float   | Temperature setting from the generation config                                              |

#### Additional Metadata

| Data Point             | Type    | Description                       |
| ---------------------- | ------- | --------------------------------- |
| `taskType`             | String  | Task type classification          |
| `taskId`               | String  | Task identifier                   |
| `subscriptionId`       | String  | Subscription identifier           |
| `modelSource`          | String  | Model source identifier           |
| `mediationLatency`     | Integer | Mediation latency in milliseconds |
| `responseQualityScore` | Float   | Response quality score            |
| `credentialAlias`      | String  | Credential alias for routing      |

#### Distributed Tracing

| Data Point            | Type    | Description                                                 |
| --------------------- | ------- | ----------------------------------------------------------- |
| `traceType`           | String  | Trace type classification (e.g., `completion`, `embedding`) |
| `traceName`           | String  | Human-readable trace name                                   |
| `environment`         | String  | Deployment environment (e.g., `production`, `development`)  |
| `region`              | String  | Cloud region for the request                                |
| `retryNumber`         | Integer | Retry attempt number (0 for first attempt)                  |
| `parentTransactionId` | String  | Parent transaction ID for request chaining                  |

#### Vision Content Detection

| Data Point                           | Type         | Description                                                              |
| ------------------------------------ | ------------ | ------------------------------------------------------------------------ |
| `hasVisionContent`                   | Boolean      | Whether the request contained image content                              |
| `attributes.vision_image_count`      | Integer      | Number of images detected in the request (nested in `attributes` object) |
| `attributes.vision_total_size_bytes` | Integer      | Total size of image data in bytes (nested in `attributes` object)        |
| `attributes.vision_media_types`      | String Array | MIME types of detected images (e.g., `["image/png", "image/jpeg"]`)      |

{% hint style="info" %}
Vision detection metadata is only populated when the Gemini request includes image or multimodal content. The `vision_*` fields are nested inside an `attributes` object in the payload. This helps track the adoption of vision capabilities in coding workflows.
{% endhint %}

#### Optional Prompt Capture

{% hint style="warning" %}
Prompt capture is **disabled by default** and must be explicitly enabled in the middleware configuration. When enabled, the following fields are populated. Organizations should review their data handling policies before enabling this feature.
{% endhint %}

| Data Point         | Type    | Description                                      |
| ------------------ | ------- | ------------------------------------------------ |
| `systemPrompt`     | String  | System prompt content                            |
| `inputMessages`    | String  | Input messages (JSON)                            |
| `outputResponse`   | String  | Model response content                           |
| `promptsTruncated` | Boolean | Whether content was truncated due to size limits |

#### Stop Reason Mapping

Gemini CLI normalizes Google's finish reasons to Revenium's internal `StopReason` enum:

| Gemini Finish Reason                                                         | Revenium StopReason         | Description                                                     |
| ---------------------------------------------------------------------------- | --------------------------- | --------------------------------------------------------------- |
| `STOP`                                                                       | `END`                       | Normal completion                                               |
| `MAX_TOKENS`                                                                 | `TOKEN_LIMIT`               | Token limit reached                                             |
| `SAFETY`, `BLOCKLIST`, `PROHIBITED_CONTENT`, `SPII`, `MODEL_ARMOR`           | `ERROR`                     | Content safety filter triggered                                 |
| `RECITATION`, `IMAGE_SAFETY`, `IMAGE_PROHIBITED_CONTENT`, `IMAGE_RECITATION` | `ERROR`                     | Recitation or image safety filter                               |
| `MALFORMED_FUNCTION_CALL`, `UNEXPECTED_TOOL_CALL`, `NO_IMAGE`                | `ERROR`                     | Tool call or image error                                        |
| `CANCELLED` / `CANCELED`                                                     | `CANCELLED`                 | Request cancelled                                               |
| `FINISH_REASON_UNSPECIFIED`, `OTHER`, `IMAGE_OTHER`                          | *(caller-supplied default)* | Returns the default stop reason provided by the calling context |
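
The normalization in the table can be sketched as a small mapping function. The function shape and name are illustrative; only the finish-reason-to-enum mapping itself comes from this page:

```typescript
type StopReason = "END" | "TOKEN_LIMIT" | "ERROR" | "CANCELLED";

// Sketch of the Gemini finish-reason normalization described above.
function normalizeFinishReason(finishReason: string, fallback: StopReason): StopReason {
  switch (finishReason) {
    case "STOP":
      return "END";
    case "MAX_TOKENS":
      return "TOKEN_LIMIT";
    case "CANCELLED":
    case "CANCELED":
      return "CANCELLED";
    case "FINISH_REASON_UNSPECIFIED":
    case "OTHER":
    case "IMAGE_OTHER":
      return fallback; // caller-supplied default
    default:
      // Safety, recitation, tool-call, and image-error reasons all map to ERROR.
      return "ERROR";
  }
}
```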

***

## Cursor IDE Data Points

In addition to the [Common Data Points](#common-data-points) above, Cursor IDE captures the following through its Admin API sync:

### Billing Classification

| Data Point                      | Type   | Description                                                                                                 |
| ------------------------------- | ------ | ----------------------------------------------------------------------------------------------------------- |
| `billing.kind`                  | String | Cursor billing classification (`Included`, `Premium`, etc.) — determines whether usage counts against quota |
| `operation_type`                | String | Operation type from Cursor (e.g., request classification)                                                   |
| `stop_reason` / `finish_reason` | String | Finish reason from Cursor                                                                                   |

{% hint style="info" %}
When `billing.kind` is `Included`, the backend sets `billingSkipped = true`, `skipReason = FREE_TIER`, and forces `totalCost` to `null` — indicating the request was covered by the subscription and incurred no additional cost.
{% endhint %}
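
The free-tier handling can be sketched as follows. Field names mirror this page, but the function is an illustrative assumption, not the backend implementation:

```typescript
interface BillingOutcome {
  billingSkipped: boolean;
  skipReason: string | null;
  totalCost: number | null;
}

// Sketch of the Included/free-tier handling described above.
function classifyCursorBilling(billingKind: string, cost: number): BillingOutcome {
  if (billingKind === "Included") {
    // Covered by the subscription: no additional cost is recorded.
    return { billingSkipped: true, skipReason: "FREE_TIER", totalCost: null };
  }
  return { billingSkipped: false, skipReason: null, totalCost: cost };
}
```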

{% hint style="warning" %}
**Cursor IDE integration is under active development.** Additional fields such as `cursor.token_fee`, `cursor.requests_costs`, and `cursor.is_token_based` are planned but not yet mapped in the backend. This section will be updated as the integration matures.
{% endhint %}

### Data Collection Mode

Cursor IDE usage data is synced periodically from Cursor's Admin API and exported to Revenium via OTLP. Unlike Claude Code and Gemini CLI, telemetry is not captured in real time during each request — it is pulled at regular intervals after the fact.

***

## Derived Fields

The following fields are **not sent by the SDKs** but are calculated by the Revenium backend during ingestion:

| Field                           | Derivation                                            | Description                                  |
| ------------------------------- | ----------------------------------------------------- | -------------------------------------------- |
| `inputTokenCost`                | `inputTokenCount × model_input_cost_per_token`        | Cost attributed to input tokens              |
| `outputTokenCost`               | `outputTokenCount × model_output_cost_per_token`      | Cost attributed to output tokens             |
| `cacheCreationTokenCost`        | `cacheCreationTokenCount × model_cache_creation_cost` | Cost attributed to cache creation            |
| `cacheReadTokenCost`            | `cacheReadTokenCount × model_cache_read_cost`         | Cost attributed to cache reads               |
| `totalCost` (when not provided) | Sum of all token costs                                | Calculated when SDK sends zero or null cost  |
| `apiKey`                        | Extracted from `x-api-key` HTTP header                | Authentication key for tenant identification |
| `credentialId`                  | Extracted from `subscriber` JSON                      | Credential identifier for access control     |

***

## OTLP Transport Details

For teams implementing custom integrations or verifying data flow, here are the OTLP transport details:

### Endpoint

```
POST {base_url}/v1/logs
```

Where `base_url` is typically `https://api.revenium.ai/meter/v2/otlp`.

### Authentication

```
x-api-key: hak_XXXX_your_key_here
```

### Payload Format

All integrations use the OTLP/HTTP JSON format (`application/json`):

```json
{
  "resourceLogs": [{
    "resource": {
      "attributes": [
        { "key": "service.name", "value": { "stringValue": "claude-code" } },
        { "key": "cost_multiplier", "value": { "doubleValue": 0.08 } }
      ]
    },
    "scopeLogs": [{
      "scope": { "name": "claude-code", "version": "1.0.0" },
      "logRecords": [{
        "timeUnixNano": "1711324800000000000",
        "body": { "stringValue": "claude_code.api_request" },
        "attributes": [
          { "key": "session.id", "value": { "stringValue": "sess-abc123" } },
          { "key": "model", "value": { "stringValue": "claude-opus-4-5-20251101" } },
          { "key": "input_tokens", "value": { "intValue": 1500 } },
          { "key": "output_tokens", "value": { "intValue": 2000 } },
          { "key": "cache_read_tokens", "value": { "intValue": 500 } },
          { "key": "cache_creation_tokens", "value": { "intValue": 0 } },
          { "key": "total_input_tokens", "value": { "intValue": 2000 } }
        ]
      }]
    }]
  }]
}
```

{% hint style="info" %}
The example above shows a Claude Code backfill payload with the core token attributes. The real-time test/connectivity payload (via `revenium-metering test`) uses `stringValue` for token fields and additionally sends `cost_usd` and `duration_ms`. Gemini CLI payloads follow the same OTLP structure with `service.name` set to `gemini-cli` and scope name set to `gemini_cli`.
{% endhint %}

***

## Related Documentation

* [AI Coding Dashboard](https://docs.revenium.io/ai-coding-dashboard) — Dashboard views and analysis features
* [Integration Options for AI Metering](https://docs.revenium.io/integration-options-for-ai-metering) — Setup instructions for all integrations
* [OpenTelemetry Integration](https://docs.revenium.io/opentelemetry-integration) — General OTLP integration guide
* [Cost & Performance Alerts](https://docs.revenium.io/cost-and-performance-alerts) — Alerting on coding assistant metrics
