AI Coding Data Reference
Complete reference of all data points collected from AI coding assistant integrations, including Claude Code, Gemini CLI, and Cursor IDE.
This page documents every data point that Revenium collects from AI coding assistant integrations. Use this reference to understand exactly what telemetry is captured, how it's used, and what privacy guarantees apply.
How Data Is Collected
AI coding assistant data is collected via OpenTelemetry (OTLP) log records, with one exception: the Gemini Go middleware reports through the Revenium Completions API. Each coding tool has a dedicated integration that exports usage telemetry to Revenium. No proprietary agents or background processes are involved; the OTLP-based integrations send data over the standard OpenTelemetry protocol.
| Tool | Integration | Data Flow |
| --- | --- | --- |
| Claude Code | @revenium/cli npm package | Claude Code hooks → OTLP logs → Revenium |
| Gemini CLI SDK | @revenium/cli npm package | Gemini CLI → OTLP logs → Revenium |
| Gemini Go Middleware | github.com/revenium/revenium-middleware-google-go | Go app → Completions API → Revenium |
| Cursor IDE | Admin API sync | Cursor Admin API → Revenium (periodic) |
Agent Identifiers
Each tool is identified by an agent value in the telemetry:
| Tool | Agent Value |
| --- | --- |
| Claude Code | claude-code |
| Gemini CLI | gemini-cli |
| Cursor IDE | cursor-ide |
Privacy Guarantees
Revenium never collects your code, prompts, or conversation content. Only usage metadata is transmitted: token counts, model names, timestamps, and session identifiers. This applies to all integrations by default.
Specifically, the following are never sent in the default configuration:
Source code or file contents
Prompt text or system prompts
AI response content
API keys, credentials, or secrets
Repository names or git history (diffs, commits, file contents)
Screen content or clipboard data
Note on session metadata: When backfilling historical Claude Code data, optional session metadata including the working directory and git branch name may be included if present in the local session logs. These provide context about where AI assistance was used. See Claude Code > Session Metadata for details. No file contents, code, or git history are included.
Common Data Points
The following data points are collected by all AI coding assistant integrations. These form the core telemetry schema that powers the AI Coding Dashboard.
Token Metrics
| Data Point | Type | Description |
| --- | --- | --- |
| inputTokenCount | Integer | Number of input tokens consumed in the request |
| outputTokenCount | Integer | Number of output tokens generated by the model |
| cacheReadTokenCount | Integer | Tokens served from the model's prompt cache (reduces cost) |
| cacheCreationTokenCount | Integer | Tokens written to the model's prompt cache |
| reasoningTokenCount | Integer | Extended thinking / chain-of-thought tokens (model-dependent) |
| totalTokenCount | Integer | Sum of all token types for the request |
Not all models or integrations populate every token type. reasoningTokenCount is populated by the Gemini Go middleware for models with extended thinking; the Claude Code SDK does not currently send this field (it may be zero in Claude Code data). cacheCreationTokenCount is always 0 for Gemini CLI (the Google API does not expose cache creation counts). cacheReadTokenCount and cacheCreationTokenCount depend on the model's prompt caching support. For Cursor IDE, reasoningTokenCount is always null.
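As a rough illustration of how totalTokenCount relates to the individual fields, the sketch below sums the five token types and treats missing or null values as zero. The helper name and the null handling are assumptions made for illustration; this page does not specify how the backend treats absent fields.

```python
# Token fields taken from the Token Metrics table above.
TOKEN_FIELDS = (
    "inputTokenCount",
    "outputTokenCount",
    "cacheReadTokenCount",
    "cacheCreationTokenCount",
    "reasoningTokenCount",
)


def total_token_count(record: dict) -> int:
    """Sum all token types, counting missing or null fields as zero (assumption)."""
    return sum(int(record.get(field) or 0) for field in TOKEN_FIELDS)


# Example: a Cursor IDE style record where reasoningTokenCount is null
# and cacheCreationTokenCount is absent.
record = {
    "inputTokenCount": 1200,
    "outputTokenCount": 300,
    "cacheReadTokenCount": 5000,
    "reasoningTokenCount": None,
}
print(total_token_count(record))
```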
Cost Metrics
| Data Point | Type | Description |
| --- | --- | --- |
| totalCost | Decimal | Calculated cost in USD for this request, based on model pricing |
| cost_multiplier | Float | Subscription tier discount factor (e.g., 0.08 for Max 20x = 8% of API pricing) |
| cost_source | String | Always coding_assistant for AI coding tool traffic |
| costType | String | Always AI for AI coding assistant requests |
Model & Provider Identity
| Data Point | Type | Description |
| --- | --- | --- |
| model | String | AI model name (e.g., claude-opus-4-5-20251101, gemini-2.5-pro) |
| provider | String | AI provider identifier. Set by backend mappers: ClaudeCode, GeminiCli, CursorIde. The Gemini Go middleware may also send google-genai or vertex-ai. |
| agent | String | Coding assistant identifier (claude-code, gemini-cli, cursor-ide) |
| middlewareSource | String | SDK or middleware version that generated the telemetry |
Timing
| Data Point | Type | Description |
| --- | --- | --- |
| requestTime | Timestamp | When the request was initiated (ISO 8601 / epoch nanoseconds) |
| requestDuration | Integer | Total request duration in milliseconds |
Attribution
| Data Point | OTLP Attribute | Type | Description |
| --- | --- | --- | --- |
| subscriber | user.email | String | Developer email address for usage attribution (optional, user-configured) |
| organizationName | organization.id or organization.name | String | Organization or company name/ID for cost rollup (optional). The backend prefers organization.name and falls back to organization.id. |
| productName | product.id or product.name | String | Product or project name/ID for cost rollup (optional). The backend prefers product.name and falls back to product.id. |
| traceId | session.id | String | Session identifier that groups requests within a single coding session |
| transactionId | transaction_id | String | Unique identifier for each individual request (used for deduplication) |
The Data Point column shows the name as stored in the analytics database. The OTLP Attribute column shows the key name in the raw telemetry payload. The backend mapper translates between these formats during ingestion.
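The translation described above can be sketched as a small mapping function. This is only an illustration of the documented preference order (name first, then ID); the function name and the flat dictionary input are assumptions, not the backend mapper's actual interface.

```python
def map_attribution(attrs: dict) -> dict:
    """Translate raw OTLP attribute keys into stored data point names.

    `attrs` is assumed to be a flat dict of OTLP attribute key -> value.
    """
    return {
        "subscriber": attrs.get("user.email"),
        # Prefer the name; fall back to the ID when the name is absent.
        "organizationName": attrs.get("organization.name") or attrs.get("organization.id"),
        "productName": attrs.get("product.name") or attrs.get("product.id"),
        "traceId": attrs.get("session.id"),
        "transactionId": attrs.get("transaction_id"),
    }


mapped = map_attribution({
    "user.email": "dev@example.com",
    "organization.id": "org-123",
    "product.name": "checkout",
    "product.id": "prod-9",
    "session.id": "sess-1",
    "transaction_id": "tx-1",
})
print(mapped)
```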
Operational Classification
| Data Point | Type | Description |
| --- | --- | --- |
| operationType | String | Request classification (e.g., CHAT) |
| stopReason | String | Why the model stopped generating. Revenium enum values: END, TOKEN_LIMIT, ERROR, CANCELLED. See Gemini Stop Reason Mapping for tool-specific normalization. |
| errorReason | String | Error description if the request failed (empty on success) |
Coding Assistant Account Linkage
| Data Point | Type | Description |
| --- | --- | --- |
| coding_assistant_account_uuid | String | Links telemetry to a specific coding assistant account for cross-session tracking |
These fields are defined in the ClickHouse schema (Migration 15) and populated during data enrichment. The OTLP mappers extract claude_code.account_uuid from resource attributes where available. Full persistence is being rolled out incrementally.
Claude Code Data Points
In addition to the Common Data Points above, Claude Code captures the following:
Extended Token Breakdown
| Data Point | Type | Description |
| --- | --- | --- |
| cache_creation_5m_tokens | Integer | Cache tokens with 5-minute ephemeral expiry |
| cache_creation_1h_tokens | Integer | Cache tokens with 1-hour extended expiry |
| total_input_tokens | Integer | Aggregate input tokens (input + cache creation + cache read); used for context window threshold detection |
These granular cache fields are available in backfilled data where Claude Code's session logs contain the breakdown. Real-time telemetry reports the aggregate cacheCreationTokenCount.
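The total_input_tokens aggregate follows directly from the formula in the table (input + cache creation + cache read). The sketch below computes it and applies a simple threshold check; the context window size and the 80% threshold are hypothetical values chosen for illustration, since this page does not state how threshold detection is configured.

```python
def total_input_tokens(input_tokens: int, cache_creation_tokens: int, cache_read_tokens: int) -> int:
    """Aggregate input tokens: input + cache creation + cache read."""
    return input_tokens + cache_creation_tokens + cache_read_tokens


def exceeds_context_threshold(total: int, context_window: int, fraction: float = 0.8) -> bool:
    """Hypothetical check: has the request used more than `fraction` of the window?"""
    return total > context_window * fraction


total = total_input_tokens(input_tokens=2_000, cache_creation_tokens=10_000, cache_read_tokens=160_000)
print(total, exceeds_context_threshold(total, context_window=200_000))
```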
Session Metadata
| Data Point | Type | Description |
| --- | --- | --- |
| claude_code.version | String | Claude Code application version |
| claude_code.cwd | String | Working directory during the session |
| claude_code.git_branch | String | Git branch name in the working directory |
| claude_code.speed | String | Speed/quality setting: instant, normal, or thorough |
| claude_code.service_tier | String | Anthropic API service tier used for the request |
Session metadata fields are extracted from Claude Code's local session logs during backfill. They provide context about how and where AI coding assistance was used, without capturing any code or prompt content. These fields are currently extracted and logged by the backend mapper; full ClickHouse persistence is pending a schema migration.
Subscription Tiers
Claude Code subscriptions determine the cost_multiplier applied to usage costs:
| Tier | cost_multiplier | Description |
| --- | --- | --- |
| pro | 0.16 | Anthropic Pro plan (16% of API pricing) |
| max_5x | 0.16 | Anthropic Max 5x plan (16% of API pricing) |
| max_20x | 0.08 | Anthropic Max 20x plan (8% of API pricing) |
| team_premium | 0.24 | Anthropic Team Premium plan (24% of API pricing) |
| enterprise | 0.05 | Anthropic Enterprise plan (5% of API pricing) |
| api | 1.0 | Direct API usage (full API pricing, no subscription discount) |
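To make the arithmetic concrete, the following sketch applies a tier's cost_multiplier to a cost computed at full API pricing. The multiplier values come from the tier list above; the function name and the use of Decimal are illustrative assumptions rather than the backend's implementation.

```python
from decimal import Decimal

# Multipliers copied from the subscription tier list above.
COST_MULTIPLIERS = {
    "pro": Decimal("0.16"),
    "max_5x": Decimal("0.16"),
    "max_20x": Decimal("0.08"),
    "team_premium": Decimal("0.24"),
    "enterprise": Decimal("0.05"),
    "api": Decimal("1.0"),
}


def effective_cost(api_cost_usd: Decimal, tier: str) -> Decimal:
    """Scale a cost at full API pricing by the tier's discount factor."""
    return api_cost_usd * COST_MULTIPLIERS[tier]


# $10.00 of usage at API pricing on a Max 20x plan is attributed at 8%.
print(effective_cost(Decimal("10.00"), "max_20x"))
```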
Data Collection Modes
Claude Code supports two data collection modes:
| Mode | Description |
| --- | --- |
| Real-time | Telemetry is exported automatically during each Claude Code session via OTLP hooks. Captures core token, cost, and timing metrics. |
| Backfill | The revenium-metering backfill command scans local Claude Code session logs (~/.claude/projects/) and sends historical usage data. Captures extended token breakdown and session metadata in addition to core metrics. |
Backfill is idempotent: deterministic transaction IDs (SHA-256 hash of session ID, timestamp, model, and token counts) prevent duplicate records.
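A deterministic transaction ID of this kind might be derived as sketched below. The hashed inputs match those listed above, but the field order, the "|" separator, and the choice of which token counts to include are assumptions; the actual serialization used by the backfill command is not documented here.

```python
import hashlib


def transaction_id(session_id: str, timestamp: str, model: str,
                   input_tokens: int, output_tokens: int) -> str:
    """Hash the identifying fields so the same record always yields the same ID.

    Field order and separator are illustrative assumptions.
    """
    parts = [session_id, timestamp, model, str(input_tokens), str(output_tokens)]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


first = transaction_id("sess-1", "2025-01-01T12:00:00Z", "claude-opus-4-5-20251101", 1200, 300)
again = transaction_id("sess-1", "2025-01-01T12:00:00Z", "claude-opus-4-5-20251101", 1200, 300)
other = transaction_id("sess-1", "2025-01-01T12:00:01Z", "claude-opus-4-5-20251101", 1200, 300)
print(first == again, first == other)
```

Because re-running the backfill over the same session log reproduces the same IDs, an ingestion layer can drop records whose ID it has already seen.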
Gemini Data Points
Gemini data flows into Revenium through two independent integration paths:
| | Gemini CLI SDK | Gemini Go Middleware |
| --- | --- | --- |
| Package | @revenium/cli | github.com/revenium/revenium-middleware-google-go |
| Use case | Metering developer Gemini CLI usage | Metering server-side Go applications |
| Runs on | Developer workstation (one-time setup) | Server-side, wraps genai Go client |
| Protocol | OTLP/HTTP logs | Revenium Completions API |
Gemini CLI SDK Data Points
The CLI SDK configures Gemini CLI's native OTLP export to send telemetry to Revenium. It captures the Common Data Points listed above: token metrics, cost, model identity, timing, and attribution.
Need extended timing, tracing, vision detection, or prompt capture? These require the Go Middleware integration below.
Gemini CLI operates in real-time only; there is no backfill capability. Telemetry is captured and exported as each Gemini CLI request completes.
Gemini Go Middleware Data Points
In addition to the Common Data Points above, the Go middleware captures the following extended fields:
Extended Timing
| Data Point | Type | Description |
| --- | --- | --- |
| responseTime | Timestamp | When the response was fully received |
| completionStartTime | Timestamp | When the model began generating tokens |
| timeToFirstToken | Integer | Time from request start to first token, in milliseconds |
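For readers relating these fields to one another, the sketch below derives a time-to-first-token value from a request timestamp and a completion start timestamp. Whether the middleware computes timeToFirstToken exactly this way is an assumption; the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def time_to_first_token_ms(request_time: datetime, completion_start_time: datetime) -> int:
    """Whole milliseconds between request start and the first generated token."""
    # Dividing one timedelta by another with // yields an exact integer.
    return (completion_start_time - request_time) // timedelta(milliseconds=1)


request_time = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
completion_start_time = request_time + timedelta(milliseconds=350)
print(time_to_first_token_ms(request_time, completion_start_time))
```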
Streaming & Model Configuration
| Data Point | Type | Description |
| --- | --- | --- |
| isStreamed | Boolean | Whether the response was streamed (hardcoded true for Gemini CLI, false for Cursor IDE) |
| temperature | Float | Temperature setting from the generation config |
Additional Metadata
| Data Point | Type | Description |
| --- | --- | --- |
| taskType | String | Task type classification |
| taskId | String | Task identifier |
| subscriptionId | String | Subscription identifier |
| modelSource | String | Model source identifier |
| mediationLatency | Integer | Mediation latency in milliseconds |
| responseQualityScore | Float | Response quality score |
| credentialAlias | String | Credential alias for routing |
Distributed Tracing
| Data Point | Type | Description |
| --- | --- | --- |
| traceType | String | Trace type classification (e.g., completion, embedding) |
| traceName | String | Human-readable trace name |
| environment | String | Deployment environment (e.g., production, development) |
| region | String | Cloud region for the request |
| retryNumber | Integer | Retry attempt number (0 for first attempt) |
| parentTransactionId | String | Parent transaction ID for request chaining |
Vision Content Detection
| Data Point | Type | Description |
| --- | --- | --- |
| hasVisionContent | Boolean | Whether the request contained image content |
| attributes.vision_image_count | Integer | Number of images detected in the request (nested in attributes object) |
| attributes.vision_total_size_bytes | Integer | Total size of image data in bytes (nested in attributes object) |
| attributes.vision_media_types | String Array | MIME types of detected images (e.g., ["image/png", "image/jpeg"]) |
Vision detection metadata is only populated when the Gemini request includes image or multimodal content. The vision_* fields are nested inside an attributes object in the payload. This helps track the adoption of vision capabilities in coding workflows.
Optional Prompt Capture
Prompt capture is disabled by default and must be explicitly enabled in the middleware configuration. When enabled, the following fields are populated. Organizations should review their data handling policies before enabling this feature.
| Data Point | Type | Description |
| --- | --- | --- |
| systemPrompt | String | System prompt content |
| inputMessages | String | Input messages (JSON) |
| outputResponse | String | Model response content |
| promptsTruncated | Boolean | Whether content was truncated due to size limits |
Stop Reason Mapping
Gemini CLI normalizes Google's finish reasons to Revenium's internal StopReason enum:
| Google Finish Reason | Revenium StopReason | Description |
| --- | --- | --- |
| STOP | END | Normal completion |
| MAX_TOKENS | TOKEN_LIMIT | Token limit reached |
| SAFETY, BLOCKLIST, PROHIBITED_CONTENT, SPII, MODEL_ARMOR | ERROR | Content safety filter triggered |
| RECITATION, IMAGE_SAFETY, IMAGE_PROHIBITED_CONTENT, IMAGE_RECITATION | ERROR | Recitation or image safety filter |
| MALFORMED_FUNCTION_CALL, UNEXPECTED_TOOL_CALL, NO_IMAGE | ERROR | Tool call or image error |
| CANCELLED / CANCELED | CANCELLED | Request cancelled |
| FINISH_REASON_UNSPECIFIED, OTHER, IMAGE_OTHER | (caller-supplied default) | Returns the default stop reason provided by the calling context |
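The mapping above can be expressed as a small lookup function. This sketch mirrors the listed finish reasons; upper-casing the input and returning the caller's default for values not listed at all are assumptions added for illustration.

```python
_DIRECT = {
    "STOP": "END",
    "MAX_TOKENS": "TOKEN_LIMIT",
    "CANCELLED": "CANCELLED",
    "CANCELED": "CANCELLED",
}

_ERROR_REASONS = frozenset({
    "SAFETY", "BLOCKLIST", "PROHIBITED_CONTENT", "SPII", "MODEL_ARMOR",
    "RECITATION", "IMAGE_SAFETY", "IMAGE_PROHIBITED_CONTENT", "IMAGE_RECITATION",
    "MALFORMED_FUNCTION_CALL", "UNEXPECTED_TOOL_CALL", "NO_IMAGE",
})


def normalize_stop_reason(finish_reason: str, default: str) -> str:
    """Map a Google finish reason to a Revenium StopReason value."""
    reason = finish_reason.strip().upper()
    if reason in _DIRECT:
        return _DIRECT[reason]
    if reason in _ERROR_REASONS:
        return "ERROR"
    # FINISH_REASON_UNSPECIFIED, OTHER, IMAGE_OTHER, and (by assumption)
    # any unrecognized value fall back to the caller-supplied default.
    return default


print(normalize_stop_reason("MAX_TOKENS", default="END"))
print(normalize_stop_reason("OTHER", default="END"))
```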
Cursor IDE Data Points
In addition to the Common Data Points above, Cursor IDE captures the following through its Admin API sync:
Billing Classification
| Data Point | Type | Description |
| --- | --- | --- |
| billing.kind | String | Cursor billing classification (Included, Premium, etc.); determines whether usage counts against quota |
| operation_type | String | Operation type from Cursor (e.g., request classification) |
| stop_reason / finish_reason | String | Finish reason from Cursor |
When billing.kind is Included, the backend sets billingSkipped = true, skipReason = FREE_TIER, and forces totalCost to null, indicating that the request was covered by the subscription and incurred no additional cost.
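The rule for Included usage can be sketched as follows. The field names are taken from the paragraph above; the function name and the flat dictionary record shape are assumptions, and other billing.kind values are simply passed through unchanged here.

```python
def apply_billing_kind(record: dict) -> dict:
    """Return a copy of the record with the Included-usage rule applied."""
    out = dict(record)
    if record.get("billing.kind") == "Included":
        out["billingSkipped"] = True
        out["skipReason"] = "FREE_TIER"
        out["totalCost"] = None  # covered by the subscription
    return out


included = apply_billing_kind({"billing.kind": "Included", "totalCost": 0.42})
premium = apply_billing_kind({"billing.kind": "Premium", "totalCost": 0.42})
print(included)
print(premium)
```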
Cursor IDE integration is under active development. Additional fields such as cursor.token_fee, cursor.requests_costs, and cursor.is_token_based are planned but not yet mapped in the backend. This section will be updated as the integration matures.
Data Collection Mode
Cursor IDE usage data is collected periodically from Cursor's Admin API and exported to Revenium via OTLP. Unlike Claude Code and Gemini CLI, data is not captured in real-time during each request; it is synced at regular intervals from Cursor's administrative interface.
Derived Fields
The following fields are not sent by the SDKs but are calculated by the Revenium backend during ingestion:
| Field | Calculation / Source | Description |
| --- | --- | --- |
| inputTokenCost | inputTokenCount × model_input_cost_per_token | Cost attributed to input tokens |
| outputTokenCost | outputTokenCount × model_output_cost_per_token | Cost attributed to output tokens |
| cacheCreationTokenCost | cacheCreationTokenCount × model_cache_creation_cost | Cost attributed to cache creation |
| cacheReadTokenCost | cacheReadTokenCount × model_cache_read_cost | Cost attributed to cache reads |
| totalCost (when not provided) | Sum of all token costs | Calculated when SDK sends zero or null cost |
| apiKey | Extracted from x-api-key HTTP header | Authentication key for tenant identification |
| credentialId | Extracted from subscriber JSON | Credential identifier for access control |
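The cost calculations listed above amount to multiplying each token count by its per-token rate and summing the results when no total was supplied. The sketch below follows those formulas; the per-token rates are made-up placeholder values, and the function name and record shape are assumptions for illustration.

```python
from decimal import Decimal


def derive_costs(record: dict, pricing: dict) -> dict:
    """Compute per-category token costs and fill in totalCost if missing."""
    def tokens(name: str) -> Decimal:
        return Decimal(record.get(name) or 0)

    costs = {
        "inputTokenCost": tokens("inputTokenCount") * pricing["model_input_cost_per_token"],
        "outputTokenCost": tokens("outputTokenCount") * pricing["model_output_cost_per_token"],
        "cacheCreationTokenCost": tokens("cacheCreationTokenCount") * pricing["model_cache_creation_cost"],
        "cacheReadTokenCost": tokens("cacheReadTokenCount") * pricing["model_cache_read_cost"],
    }
    provided = record.get("totalCost")
    # A zero or null totalCost from the SDK is replaced by the sum of token costs.
    costs["totalCost"] = provided if provided else sum(costs.values(), Decimal(0))
    return costs


# Placeholder rates, not real model pricing.
pricing = {
    "model_input_cost_per_token": Decimal("0.000003"),
    "model_output_cost_per_token": Decimal("0.000015"),
    "model_cache_creation_cost": Decimal("0.00000375"),
    "model_cache_read_cost": Decimal("0.0000003"),
}
record = {"inputTokenCount": 1000, "outputTokenCount": 500,
          "cacheCreationTokenCount": 0, "cacheReadTokenCount": 2000, "totalCost": None}
costs = derive_costs(record, pricing)
print(costs["totalCost"])
```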
OTLP Transport Details
For teams implementing custom integrations or verifying data flow, here are the OTLP transport details:
Endpoint
Where base_url is typically https://api.revenium.ai/meter/v2/otlp.
Authentication
Payload Format
All integrations use the OTLP/HTTP JSON format (application/json):
The example above shows a Claude Code backfill payload with the core token attributes. The real-time test/connectivity payload (via revenium-metering test) uses stringValue for token fields and additionally sends cost_usd and duration_ms. Gemini CLI payloads follow the same OTLP structure with service.name set to gemini-cli and scope name set to gemini_cli.
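For teams building a custom integration, a Gemini CLI shaped payload might be assembled as in the sketch below. The overall structure (resourceLogs, scopeLogs, logRecords) is the standard OTLP/HTTP JSON layout, and service.name, the scope name, session.id, transaction_id, and the x-api-key header come from this page. The token and model attribute keys ("input_tokens", "output_tokens", "model") are hypothetical placeholders, since the exact keys are not listed here, and the API key value is a dummy. The sketch only builds the request body and headers; it does not send anything.

```python
import json
import time


def attr(key: str, value) -> dict:
    """Encode one OTLP attribute; integers use intValue (a string in OTLP JSON)."""
    if isinstance(value, bool):
        return {"key": key, "value": {"boolValue": value}}
    if isinstance(value, int):
        return {"key": key, "value": {"intValue": str(value)}}
    return {"key": key, "value": {"stringValue": str(value)}}


def build_payload(session_id: str, transaction_id: str, model: str,
                  input_tokens: int, output_tokens: int) -> dict:
    return {
        "resourceLogs": [{
            "resource": {"attributes": [attr("service.name", "gemini-cli")]},
            "scopeLogs": [{
                "scope": {"name": "gemini_cli"},
                "logRecords": [{
                    "timeUnixNano": str(time.time_ns()),
                    "attributes": [
                        attr("session.id", session_id),
                        attr("transaction_id", transaction_id),
                        attr("model", model),                  # hypothetical key
                        attr("input_tokens", input_tokens),    # hypothetical key
                        attr("output_tokens", output_tokens),  # hypothetical key
                    ],
                }],
            }],
        }]
    }


headers = {"Content-Type": "application/json", "x-api-key": "YOUR_API_KEY"}
payload = build_payload("sess-1", "tx-1", "gemini-2.5-pro", 1200, 300)
body = json.dumps(payload)
print(body[:80])
```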
Related Documentation
AI Coding Dashboard: Dashboard views and analysis features
Integration Options for AI Metering: Setup instructions for all integrations
OpenTelemetry Integration: General OTLP integration guide
Cost & Performance Alerts: Alerting on coding assistant metrics