AI Coding Data Reference
Complete reference of all data points collected from AI coding assistant integrations, including Claude Code, Gemini CLI, and Cursor IDE.
This page documents every data point that Revenium collects from AI coding assistant integrations. Use this reference to understand exactly what telemetry is captured, how it's used, and what privacy guarantees apply.
How Data Is Collected
All AI coding assistant data is collected via OpenTelemetry (OTLP) log records. Each coding tool has a dedicated integration that exports usage telemetry to Revenium's OTLP endpoint. No proprietary agents or background processes are involved — data flows through the standard OpenTelemetry protocol.
Tool | Integration | Data flow
Claude Code | @revenium/cli npm package | Claude Code hooks → OTLP logs → Revenium
Gemini CLI SDK | @revenium/cli npm package | Gemini CLI → OTLP logs → Revenium
Cursor IDE | Admin API sync | Cursor Admin API → Revenium (periodic)
Agent Identifiers
Each tool is identified by an agent value in the telemetry:
Tool | Agent value
Claude Code | claude-code
Gemini CLI | gemini-cli
Cursor IDE | cursor-ide
Data Privacy
Revenium never collects your code, prompts, or conversation content. Only usage metadata is transmitted — token counts, model names, timestamps, and session identifiers. This applies to all integrations by default.
Specifically, the following are never sent in the default configuration:
Source code or file contents
Prompt text or system prompts
AI response content
API keys, credentials, or secrets
Repository names or git history (diffs, commits, file contents)
Screen content or clipboard data
GitHub Integration Data (Optional)
The following section applies only when the optional GitHub integration is connected. Without it, the AI Coding Dashboard operates entirely from OTLP telemetry data and does not interact with GitHub in any way.
What We Read from GitHub
When the integration is active, Revenium makes the following read-only API calls to GitHub:
Data read | Purpose
Organization member list (usernames, public emails) | Auto-map developers to their corporate email
Public user profiles and email search | Resolve GitHub logins to email addresses for attribution
Repository names in the organization | Determine which repos to scan for merged PRs
Merged PR metadata (author, merge date) | Count PRs merged per developer in the selected period
Commit messages and commit author emails | Detect AI co-author patterns (e.g. Co-Authored-By trailers)
Revenium scans commit messages to detect AI co-authorship patterns but does not store the message content; only the boolean result (AI-assisted or not) is retained.
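The detection step described above can be sketched roughly as follows. This is a minimal illustration, not Revenium's implementation: the marker list and the function name are hypothetical, and the real matching rules are not documented on this page. The point it shows is that only a boolean leaves the function, never the message text.

```python
import re

# Hypothetical marker list (assumption): the names Revenium actually
# matches against are not documented on this page.
AI_COAUTHOR_MARKERS = ("claude", "gemini", "cursor")

# Matches "Co-Authored-By: <value>" trailers, one per line, case-insensitively.
TRAILER_RE = re.compile(r"^co-authored-by:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def is_ai_assisted(commit_message: str) -> bool:
    """Return True if any Co-Authored-By trailer names a known AI tool.

    Only the boolean result is returned; the message itself is not kept.
    """
    for trailer_value in TRAILER_RE.findall(commit_message):
        lowered = trailer_value.lower()
        if any(marker in lowered for marker in AI_COAUTHOR_MARKERS):
            return True
    return False
```

A mention of an AI tool in the commit body alone does not count in this sketch; only trailer lines are inspected.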
What We Store
Data stored | Description
Daily PR counts per developer | Number of PRs merged and number of AI-assisted PRs, per day
GitHub-to-email mappings | Links each developer's GitHub username to their corporate email for attribution
What We Do NOT Access
Even though the GitHub token may have broad permissions (repo scope), our implementation only makes the specific API calls listed above. The following are never accessed:
File contents, diffs, or patches
Pull request descriptions or comments
Repository source code
Issues, reviews, or branch data
GitHub Actions, webhooks, or deployment data
Private user profile data beyond public email
Token Permissions
The GitHub integration requires a personal access token with repo and read:org scopes. The repo scope is broader than strictly necessary, but GitHub does not offer a narrower scope that grants access to merged PR metadata across private repositories. Our code only exercises the minimum API calls needed for PR attribution.
For details on setting up and configuring the GitHub integration, see GitHub Integration.
Common Data Points
The following data points are collected by all AI coding assistant integrations. These form the core telemetry schema that powers the AI Coding Dashboard.
Token Metrics
Data Point | Type | Description
inputTokenCount | Integer | Number of input tokens consumed in the request
outputTokenCount | Integer | Number of output tokens generated by the model
cacheReadTokenCount | Integer | Tokens served from the model's prompt cache (reduces cost)
cacheCreationTokenCount | Integer | Tokens written to the model's prompt cache
reasoningTokenCount | Integer | Extended thinking / chain-of-thought tokens (model-dependent)
totalTokenCount | Integer | Sum of all token types for the request
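As a small illustration of how totalTokenCount relates to the other token fields, the sketch below sums the five per-type counts. It assumes the counts are disjoint (for example, that reasoning tokens are not also included in outputTokenCount); the page does not state this either way, and the helper name is hypothetical.

```python
# The five per-type token fields documented above.
TOKEN_FIELDS = (
    "inputTokenCount",
    "outputTokenCount",
    "cacheReadTokenCount",
    "cacheCreationTokenCount",
    "reasoningTokenCount",
)


def total_token_count(record: dict) -> int:
    """Sum all token types, treating missing or null fields as zero.

    Assumption: the per-type counts do not overlap.
    """
    return sum(int(record.get(field) or 0) for field in TOKEN_FIELDS)
```

For a record with 1200 input, 300 output, and 5000 cache-read tokens, the helper returns 6500.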
Cost Metrics
Data Point | Type | Description
totalCost | Decimal | Calculated cost in USD for this request, based on model pricing
cost_source | String | Always coding_assistant for AI coding tool traffic
costType | String | Always AI for AI coding assistant requests
Model & Provider Identity
Data Point | Type | Description
model | String | AI model name (e.g., claude-opus-4-5-20251101, gemini-2.5-pro)
provider | String | AI provider identifier. Set by backend mappers: ClaudeCode, GeminiCli, CursorIde.
agent | String | Coding assistant identifier (claude-code, gemini-cli, cursor-ide)
Timing
Data Point | Type | Description
requestTime | Timestamp | When the request was initiated (ISO 8601 / epoch nanoseconds)
requestDuration | Integer | Total request duration in milliseconds
Attribution
Data Point | OTLP Attribute | Type | Description
subscriber | user.email | String | Developer email address for usage attribution (optional, user-configured)
organizationName | organization.id or organization.name | String | Organization or company name/ID for cost rollup (optional). The backend prefers organization.name; falls back to organization.id.
productName | product.id or product.name | String | Product or project name/ID for cost rollup (optional). The backend prefers product.name; falls back to product.id.
traceId | session.id | String | Session identifier; groups requests within a single coding session
transactionId | transaction_id | String | Unique identifier for each individual request (used for deduplication)
The Data Point column shows the name as stored in the analytics database. The OTLP Attribute column shows the key name in the raw telemetry payload. The backend mapper translates between these formats during ingestion.
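The translation between OTLP attribute keys and stored data points can be sketched as below. The attribute keys and data point names come from the table above; the function names and the flat-dict input shape are assumptions made for illustration, not the backend's actual mapper.

```python
from typing import Optional


def first_present(attrs: dict, *keys: str) -> Optional[str]:
    """Return the first non-empty attribute value among the given keys."""
    for key in keys:
        value = attrs.get(key)
        if value:
            return value
    return None


def map_attribution(attrs: dict) -> dict:
    """Map raw OTLP attribute keys to the stored attribution data points."""
    return {
        "subscriber": attrs.get("user.email"),
        # The name is preferred; the id is used only as a fallback.
        "organizationName": first_present(attrs, "organization.name", "organization.id"),
        "productName": first_present(attrs, "product.name", "product.id"),
        "traceId": attrs.get("session.id"),
        "transactionId": attrs.get("transaction_id"),
    }
```

All attribution attributes other than the session and transaction identifiers are optional, so the sketch returns None for anything that was not configured.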
Operational Classification
Data Point | Type | Description
operationType | String | Request classification (e.g., CHAT)
stopReason | String | Why the model stopped generating. Revenium enum values: END, TOKEN_LIMIT, ERROR, CANCELLED. See Gemini Stop Reason Mapping for tool-specific normalization.
errorReason | String | Error description if the request failed (empty on success)
Coding Assistant Account Linkage
Data Point | Type | Description
coding_assistant_account_uuid | String | Links telemetry to a specific coding assistant account for cross-session tracking
Claude Code Data Points
In addition to the Common Data Points above, Claude Code captures the following:
Subscription Tiers
Claude Code subscription tiers are optionally tracked when using the Revenium SDKs:
Tier value | Plan
pro | Anthropic Pro plan
max_5x | Anthropic Max 5x plan
max_20x | Anthropic Max 20x plan
team_premium | Anthropic Team Premium plan
enterprise | Anthropic Enterprise plan
api | Direct API usage (full API pricing, no subscription discount)
Data Collection Modes
Claude Code supports two data collection modes:
Mode | Description
Real-time | Telemetry is exported automatically during each Claude Code session via OTLP hooks. Captures core token, cost, and timing metrics.
Backfill | The revenium-metering backfill command scans local Claude Code session logs (~/.claude/projects/) and sends historical usage data.
Backfill is idempotent — deterministic transaction IDs (SHA-256 hash of session ID, timestamp, model, and token counts) prevent duplicate records.
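To illustrate why deterministic IDs make backfill idempotent, here is a minimal sketch that hashes the four documented inputs with SHA-256. The field order, the "|" separator, and the function name are assumptions; the exact serialization used by revenium-metering is not documented on this page.

```python
import hashlib
from typing import Sequence


def backfill_transaction_id(
    session_id: str,
    timestamp: str,
    model: str,
    token_counts: Sequence[int],
) -> str:
    """Derive a stable transaction ID from the record's own content.

    Assumed canonical form: fields joined with "|" in a fixed order.
    """
    parts = [session_id, timestamp, model, *(str(count) for count in token_counts)]
    canonical = "|".join(parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the ID depends only on the record's content, re-running a backfill over the same session logs yields the same IDs, and the backend can discard the repeats during deduplication.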
Gemini Data Points
Gemini CLI data flows into Revenium via the @revenium/cli npm package, which configures Gemini CLI's native OTLP export to send telemetry to Revenium's endpoint.
Gemini CLI SDK Data Points
The CLI SDK captures the Common Data Points listed above — token metrics, cost, model identity, timing, and attribution.
Gemini CLI operates in real-time only — there is no backfill capability. Telemetry is captured and exported as each Gemini CLI request completes.
Stop Reason Mapping
Gemini CLI normalizes Google's finish reasons to Revenium's supported StopReason values:
Google finish reason | Revenium StopReason | Description
STOP | END | Normal completion
MAX_TOKENS | TOKEN_LIMIT | Token limit reached
SAFETY, BLOCKLIST, PROHIBITED_CONTENT, SPII, MODEL_ARMOR | ERROR | Content safety filter triggered
RECITATION, IMAGE_SAFETY, IMAGE_PROHIBITED_CONTENT, IMAGE_RECITATION | ERROR | Recitation or image safety filter
MALFORMED_FUNCTION_CALL, UNEXPECTED_TOOL_CALL, NO_IMAGE | ERROR | Tool call or image error
CANCELLED / CANCELED | CANCELLED | Request canceled
FINISH_REASON_UNSPECIFIED, OTHER, IMAGE_OTHER | (caller-supplied default) | Returns the default stop reason provided by the calling context
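The mapping above can be expressed as a simple lookup. This is a sketch under stated assumptions: the enum and function names are hypothetical, and treating values not listed in the table the same way as OTHER (returning the caller's default) is an assumption rather than documented behavior.

```python
from enum import Enum
from typing import Optional


class StopReason(Enum):
    END = "END"
    TOKEN_LIMIT = "TOKEN_LIMIT"
    ERROR = "ERROR"
    CANCELLED = "CANCELLED"


FINISH_REASON_MAP = {
    "STOP": StopReason.END,
    "MAX_TOKENS": StopReason.TOKEN_LIMIT,
    "CANCELLED": StopReason.CANCELLED,
    "CANCELED": StopReason.CANCELLED,
}

# Safety, recitation, and tool/image failures all normalize to ERROR.
for _reason in (
    "SAFETY", "BLOCKLIST", "PROHIBITED_CONTENT", "SPII", "MODEL_ARMOR",
    "RECITATION", "IMAGE_SAFETY", "IMAGE_PROHIBITED_CONTENT", "IMAGE_RECITATION",
    "MALFORMED_FUNCTION_CALL", "UNEXPECTED_TOOL_CALL", "NO_IMAGE",
):
    FINISH_REASON_MAP[_reason] = StopReason.ERROR


def normalize_finish_reason(finish_reason: Optional[str], default: StopReason) -> StopReason:
    """Map a Google finish reason to a StopReason.

    FINISH_REASON_UNSPECIFIED, OTHER, IMAGE_OTHER, and (by assumption) any
    unlisted value fall through to the caller-supplied default.
    """
    return FINISH_REASON_MAP.get(finish_reason, default)
```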
Cursor IDE Data Points
In addition to the Common Data Points above, Cursor IDE captures the following through its Admin API sync:
Billing Classification
Attribute | Type | Description
billing.kind | String | Cursor billing classification (Included, Premium, etc.); determines whether usage counts against quota
operation_type | String | Operation type from Cursor (e.g., request classification)
stop_reason / finish_reason | String | Finish reason from Cursor
When billing.kind is Included, Revenium sets billingSkipped = true, skipReason = FREE_TIER, and forces totalCost to null — indicating the request was covered by the subscription and incurred no additional cost.
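The rule above can be sketched as a small post-processing step. The field names come from this page; the function name, the flat-dict record shape, and the exact-match comparison on "Included" are assumptions for illustration.

```python
def apply_cursor_billing(record: dict) -> dict:
    """Return a copy of the record with subscription-covered usage zero-rated."""
    result = dict(record)
    # Assumption: billing.kind is compared case-sensitively.
    if record.get("billing.kind") == "Included":
        result["billingSkipped"] = True
        result["skipReason"] = "FREE_TIER"
        result["totalCost"] = None  # covered by the subscription, no additional cost
    return result
```

Records with any other billing.kind pass through unchanged in this sketch, so their totalCost is left for normal cost calculation.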
Data Collection Mode
Cursor IDE usage data is collected periodically from Cursor's Admin API and exported to Revenium via OTLP. Unlike Claude Code and Gemini CLI, data is not captured in real-time during each request — it is synced at regular intervals from Cursor's administrative interface.
Derived Fields
The following fields are not sent by the SDKs but are calculated by the Revenium backend during ingestion:
Field | Derivation | Description
inputTokenCost | inputTokenCount × model_input_cost_per_token | Cost attributed to input tokens
outputTokenCost | outputTokenCount × model_output_cost_per_token | Cost attributed to output tokens
cacheCreationTokenCost | cacheCreationTokenCount × model_cache_creation_cost | Cost attributed to cache creation
cacheReadTokenCost | cacheReadTokenCount × model_cache_read_cost | Cost attributed to cache reads
totalCost (when not provided) | Sum of all token costs | Calculated when SDK sends zero or null cost
apiKey | Extracted from x-api-key HTTP header | Authentication key for tenant identification
credentialId | Extracted from subscriber JSON | Credential identifier for access control
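The cost derivations in the table can be worked through with a short sketch. The field and pricing key names are taken from the table; the function name, the dict-based inputs, and the use of Decimal are assumptions, and any prices passed in are placeholders rather than real model pricing.

```python
from decimal import Decimal

# Derived cost field -> (token count field, pricing key), per the table above.
COST_RULES = {
    "inputTokenCost": ("inputTokenCount", "model_input_cost_per_token"),
    "outputTokenCost": ("outputTokenCount", "model_output_cost_per_token"),
    "cacheCreationTokenCost": ("cacheCreationTokenCount", "model_cache_creation_cost"),
    "cacheReadTokenCost": ("cacheReadTokenCount", "model_cache_read_cost"),
}


def derive_costs(record: dict, pricing: dict) -> dict:
    """Return a copy of the record with per-type costs filled in."""
    derived = dict(record)
    component_total = Decimal("0")
    for cost_field, (count_field, price_key) in COST_RULES.items():
        tokens = Decimal(record.get(count_field) or 0)
        price = Decimal(str(pricing.get(price_key, "0")))
        derived[cost_field] = tokens * price
        component_total += derived[cost_field]
    # totalCost is only computed when the SDK sent zero or null.
    if not record.get("totalCost"):
        derived["totalCost"] = component_total
    return derived
```

Decimal is used here to avoid floating-point drift when many small per-token prices are multiplied and summed.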
OTLP Transport Details
For teams implementing custom integrations or verifying data flow, here are the OTLP transport details:
Endpoint
The endpoint is built from base_url, which is typically https://api.revenium.ai/v2/otlp.
Authentication
Requests are authenticated with an API key, which the Revenium backend reads from the x-api-key HTTP header. A metering key (rev_mk_*) is sufficient for OTLP telemetry ingest, which is what every AI coding-assistant integration on this page does. For workflows that also report business outcomes or manage Revenium resources, use a write-scope key (rev_sk_*); see API Key Permissions.
Payload Format
All integrations use the OTLP/HTTP JSON format (application/json):
A Claude Code backfill payload carries the core token attributes. The real-time test/connectivity payload (sent by revenium-metering test, where the SDK provides it) uses stringValue for token fields and additionally sends cost_usd and duration_ms. Gemini CLI payloads follow the same OTLP structure, with service.name set to gemini-cli and the scope name set to gemini_cli.
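For teams building a custom integration, the general shape of an OTLP/HTTP JSON logs payload can be sketched as follows. The envelope (resourceLogs, scopeLogs, logRecords, key/value attributes) follows the standard OTLP JSON encoding; the function names are hypothetical, and the sketch only uses attribute keys documented on this page (session.id, transaction_id, user.email). It builds the payload without sending it, since the request path and token attribute keys are not specified here.

```python
import time
from typing import Optional


def to_any_value(value) -> dict:
    """Encode a Python value as an OTLP JSON AnyValue."""
    if isinstance(value, bool):
        return {"boolValue": value}
    if isinstance(value, int):
        # 64-bit integers are carried as decimal strings in OTLP JSON.
        return {"intValue": str(value)}
    if isinstance(value, float):
        return {"doubleValue": value}
    return {"stringValue": str(value)}


def build_log_payload(
    service_name: str,
    scope_name: str,
    attributes: dict,
    time_unix_nano: Optional[int] = None,
) -> dict:
    """Build a single-record OTLP/HTTP JSON logs payload."""
    timestamp = time.time_ns() if time_unix_nano is None else time_unix_nano
    return {
        "resourceLogs": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": to_any_value(service_name)},
                ],
            },
            "scopeLogs": [{
                "scope": {"name": scope_name},
                "logRecords": [{
                    "timeUnixNano": str(timestamp),
                    "attributes": [
                        {"key": key, "value": to_any_value(value)}
                        for key, value in attributes.items()
                    ],
                }],
            }],
        }],
    }
```

Calling build_log_payload("gemini-cli", "gemini_cli", {"session.id": "...", "transaction_id": "..."}) produces a body that can be serialized with json.dumps and posted with Content-Type application/json.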
Related Documentation
AI Coding Dashboard — Dashboard views and analysis features
Integration Options for AI Metering — Setup instructions for all integrations
OpenTelemetry Integration — General OTLP integration guide
Set Budgets & Alerts — Alerting on coding assistant metrics