AI Coding Data Reference

Complete reference of all data points collected from AI coding assistant integrations, including Claude Code, Gemini CLI, and Cursor IDE.

This page documents every data point that Revenium collects from AI coding assistant integrations. Use this reference to understand exactly what telemetry is captured, how it's used, and what privacy guarantees apply.


How Data Is Collected

All AI coding assistant data is collected via OpenTelemetry (OTLP) log records. Each coding tool has a dedicated integration that exports usage telemetry to Revenium's OTLP endpoint. No proprietary agents or background processes are involved — data flows through the standard OpenTelemetry protocol.

| Tool | Integration Method | Data Flow |
| --- | --- | --- |
| Claude Code | @revenium/cli npm package | Claude Code hooks → OTLP logs → Revenium |
| Gemini CLI SDK | @revenium/cli npm package | Gemini CLI → OTLP logs → Revenium |
| Cursor IDE | Admin API sync | Cursor Admin API → Revenium (periodic) |

Agent Identifiers

Each tool is identified by an agent value in the telemetry:

| Tool | Agent Identifier |
| --- | --- |
| Claude Code | claude-code |
| Gemini CLI | gemini-cli |
| Cursor IDE | cursor-ide |


Data Privacy

The following are never sent in the default configuration:

  • Source code or file contents

  • Prompt text or system prompts

  • AI response content

  • API keys, credentials, or secrets

  • Repository names or git history (diffs, commits, file contents)

  • Screen content or clipboard data


GitHub Integration Data (Optional)

The following section applies only when the optional GitHub integration is connected. Without it, the AI Coding Dashboard operates entirely from OTLP telemetry data and does not interact with GitHub in any way.

What We Read from GitHub

When the integration is active, Revenium makes the following read-only API calls to GitHub:

| Data Read | Purpose |
| --- | --- |
| Organization member list (usernames, public emails) | Auto-map developers to their corporate email |
| Public user profiles and email search | Resolve GitHub logins to email addresses for attribution |
| Repository names in the organization | Determine which repos to scan for merged PRs |
| Merged PR metadata (author, merge date) | Count PRs merged per developer in the selected period |
| Commit messages and commit author emails | Detect AI co-author patterns (e.g. Co-Authored-By trailers) |
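As an illustration of the co-author detection described above, the following minimal sketch scans a commit message for Co-Authored-By trailers. The function name `is_ai_assisted` and the `AI_COAUTHOR_MARKERS` list are hypothetical; the actual patterns Revenium matches are not documented on this page.

```python
import re

# Hypothetical markers; the real list of AI co-author patterns is not documented here.
AI_COAUTHOR_MARKERS = ("claude", "gemini", "cursor")

# Matches git trailers such as "Co-Authored-By: Name <email>" (case-insensitive).
TRAILER_RE = re.compile(r"^co-authored-by:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def is_ai_assisted(commit_message: str) -> bool:
    """Return True if any Co-Authored-By trailer mentions a known AI marker."""
    for trailer in TRAILER_RE.findall(commit_message):
        if any(marker in trailer.lower() for marker in AI_COAUTHOR_MARKERS):
            return True
    return False
```

Only the trailer lines are inspected, so a commit body that merely mentions an assistant by name is not counted as AI-assisted in this sketch.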

What We Store

| Stored Data | Description |
| --- | --- |
| Daily PR counts per developer | Number of PRs merged and number of AI-assisted PRs, per day |
| GitHub-to-email mappings | Links each developer's GitHub username to their corporate email for attribution |

What We Do NOT Access

Even though the GitHub token may have broad permissions (repo scope), our implementation only makes the specific API calls listed above. The following are never accessed:

  • File contents, diffs, or patches

  • Pull request descriptions or comments

  • Repository source code

  • Issues, reviews, or branch data

  • GitHub Actions, webhooks, or deployment data

  • Private user profile data beyond public email

Token Permissions

The GitHub integration requires a personal access token with repo and read:org scopes. The repo scope is broader than strictly necessary, but GitHub does not offer a narrower scope that grants access to merged PR metadata across private repositories. Our code only exercises the minimum API calls needed for PR attribution.

For details on setting up and configuring the GitHub integration, see GitHub Integration.


Common Data Points

The following data points are collected by all AI coding assistant integrations. These form the core telemetry schema that powers the AI Coding Dashboard.

Token Metrics

| Data Point | Type | Description |
| --- | --- | --- |
| inputTokenCount | Integer | Number of input tokens consumed in the request |
| outputTokenCount | Integer | Number of output tokens generated by the model |
| cacheReadTokenCount | Integer | Tokens served from the model's prompt cache (reduces cost) |
| cacheCreationTokenCount | Integer | Tokens written to the model's prompt cache |
| reasoningTokenCount | Integer | Extended thinking / chain-of-thought tokens (model-dependent) |
| totalTokenCount | Integer | Sum of all token types for the request |
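A minimal sketch of how totalTokenCount relates to the other fields, assuming it is the plain sum of the five counts listed above and that missing or null counts are treated as zero. Both points are assumptions; the function name is illustrative only.

```python
# Token count fields from the table above (totalTokenCount excluded).
TOKEN_FIELDS = (
    "inputTokenCount",
    "outputTokenCount",
    "cacheReadTokenCount",
    "cacheCreationTokenCount",
    "reasoningTokenCount",
)


def total_token_count(record: dict) -> int:
    """Sum every token type; absent or null fields count as zero (assumption)."""
    return sum(int(record.get(field) or 0) for field in TOKEN_FIELDS)
```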

Cost Metrics

| Data Point | Type | Description |
| --- | --- | --- |
| totalCost | Decimal | Calculated cost in USD for this request, based on model pricing |
| cost_source | String | Always coding_assistant for AI coding tool traffic |
| costType | String | Always AI for AI coding assistant requests |

Model & Provider Identity

| Data Point | Type | Description |
| --- | --- | --- |
| model | String | AI model name (e.g., claude-opus-4-5-20251101, gemini-2.5-pro) |
| provider | String | AI provider identifier. Set by backend mappers: ClaudeCode, GeminiCli, CursorIde. |
| agent | String | Coding assistant identifier (claude-code, gemini-cli, cursor-ide) |
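The relationship between agent and provider can be pictured as a simple lookup. The pairing below is inferred from the naming of the values listed above and is an assumption, as is the helper name `provider_for`.

```python
from typing import Optional

# Assumed pairing of agent identifiers to provider values, inferred from their names.
AGENT_TO_PROVIDER = {
    "claude-code": "ClaudeCode",
    "gemini-cli": "GeminiCli",
    "cursor-ide": "CursorIde",
}


def provider_for(agent: str) -> Optional[str]:
    """Return the provider value for a known agent identifier, else None."""
    return AGENT_TO_PROVIDER.get(agent)
```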

Timing

| Data Point | Type | Description |
| --- | --- | --- |
| requestTime | Timestamp | When the request was initiated (ISO 8601 / epoch nanoseconds) |
| requestDuration | Integer | Total request duration in milliseconds |
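Because requestTime may appear as ISO 8601 or as epoch nanoseconds, a conversion helper can be useful when comparing records. The sketch below uses integer arithmetic to avoid floating-point rounding; treating timezone-naive input as UTC is an assumption, and both function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_nanos(iso: str) -> int:
    """Convert an ISO 8601 timestamp to epoch nanoseconds (microsecond precision)."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    dt = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    return micros * 1000


def duration_ms(start_nanos: int, end_nanos: int) -> int:
    """Compute a requestDuration-style value in whole milliseconds."""
    return (end_nanos - start_nanos) // 1_000_000
```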

Attribution

| Data Point | OTLP Attribute | Type | Description |
| --- | --- | --- | --- |
| subscriber | user.email | String | Developer email address for usage attribution (optional, user-configured) |
| organizationName | organization.id or organization.name | String | Organization or company name/ID for cost rollup (optional). The backend prefers organization.name; falls back to organization.id. |
| productName | product.id or product.name | String | Product or project name/ID for cost rollup (optional). The backend prefers product.name; falls back to product.id. |
| traceId | session.id | String | Session identifier; groups requests within a single coding session |
| transactionId | transaction_id | String | Unique identifier for each individual request (used for deduplication) |

The Data Point column shows the name as stored in the analytics database. The OTLP Attribute column shows the key name in the raw telemetry payload. The backend mapper translates between these formats during ingestion.
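The translation from OTLP attribute keys to stored data points, including the name-before-ID preference, might look like the sketch below. It assumes the attributes have already been flattened into a plain dictionary; the function names are hypothetical and this is not the backend mapper itself.

```python
from typing import Optional


def first_present(attrs: dict, *keys: str) -> Optional[str]:
    """Return the first non-empty value among the given attribute keys."""
    for key in keys:
        value = attrs.get(key)
        if value:
            return value
    return None


def map_attribution(attrs: dict) -> dict:
    """Map raw OTLP attribute keys to the stored data point names."""
    return {
        "subscriber": attrs.get("user.email"),
        # Prefer the name; fall back to the ID, as described in the table above.
        "organizationName": first_present(attrs, "organization.name", "organization.id"),
        "productName": first_present(attrs, "product.name", "product.id"),
        "traceId": attrs.get("session.id"),
        "transactionId": attrs.get("transaction_id"),
    }
```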

Operational Classification

| Data Point | Type | Description |
| --- | --- | --- |
| operationType | String | Request classification (e.g., CHAT) |
| stopReason | String | Why the model stopped generating. Revenium enum values: END, TOKEN_LIMIT, ERROR, CANCELLED. See Gemini Stop Reason Mapping for tool-specific normalization. |
| errorReason | String | Error description if the request failed (empty on success) |

Coding Assistant Account Linkage

| Data Point | Type | Description |
| --- | --- | --- |
| coding_assistant_account_uuid | String | Links telemetry to a specific coding assistant account for cross-session tracking |
| subscription_tier | String | Subscription plan identifier (see Subscription Tiers below) |


Claude Code Data Points

In addition to the Common Data Points above, Claude Code captures the following:

Subscription Tiers

Claude Code subscription tiers are optionally tracked when using the Revenium SDKs:

| Tier | Description |
| --- | --- |
| pro | Anthropic Pro plan |
| max_5x | Anthropic Max 5x plan |
| max_20x | Anthropic Max 20x plan |
| team_premium | Anthropic Team Premium plan |
| enterprise | Anthropic Enterprise plan |
| api | Direct API usage (full API pricing, no subscription discount) |

Data Collection Modes

Claude Code supports two data collection modes:

| Mode | Description |
| --- | --- |
| Real-time | Telemetry is exported automatically during each Claude Code session via OTLP hooks. Captures core token, cost, and timing metrics. |
| Backfill | The revenium-metering backfill command scans local Claude Code session logs (~/.claude/projects/) and sends historical usage data. |

Backfill is idempotent — deterministic transaction IDs (SHA-256 hash of session ID, timestamp, model, and token counts) prevent duplicate records.
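To show why a deterministic ID makes backfill safe to re-run, here is a minimal sketch of hashing the listed inputs with SHA-256. The field order, the `|` delimiter, and the function name are assumptions; the actual encoding used by the backfill command is not specified on this page.

```python
import hashlib
from typing import Sequence


def backfill_transaction_id(
    session_id: str, timestamp: str, model: str, token_counts: Sequence[int]
) -> str:
    """Derive a stable hex ID from the record's identifying fields."""
    # Assumed layout: fields joined with "|" in a fixed order before hashing.
    parts = [session_id, timestamp, model, *(str(n) for n in token_counts)]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Re-scanning the same session log yields the same ID for each record, so the backend can drop a repeated record instead of counting it twice.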


Gemini Data Points

Gemini CLI data flows into Revenium via the @revenium/cli npm package, which configures Gemini CLI's native OTLP export to send telemetry to Revenium's endpoint.

Gemini CLI SDK Data Points

The CLI SDK captures the Common Data Points listed above — token metrics, cost, model identity, timing, and attribution.

Gemini CLI operates in real-time only — there is no backfill capability. Telemetry is captured and exported as each Gemini CLI request completes.

Stop Reason Mapping

Gemini CLI normalizes Google's finish reasons to Revenium's supported StopReason values:

| Gemini Finish Reason | Revenium StopReason | Description |
| --- | --- | --- |
| STOP | END | Normal completion |
| MAX_TOKENS | TOKEN_LIMIT | Token limit reached |
| SAFETY, BLOCKLIST, PROHIBITED_CONTENT, SPII, MODEL_ARMOR | ERROR | Content safety filter triggered |
| RECITATION, IMAGE_SAFETY, IMAGE_PROHIBITED_CONTENT, IMAGE_RECITATION | ERROR | Recitation or image safety filter |
| MALFORMED_FUNCTION_CALL, UNEXPECTED_TOOL_CALL, NO_IMAGE | ERROR | Tool call or image error |
| CANCELLED / CANCELED | CANCELLED | Request canceled |
| FINISH_REASON_UNSPECIFIED, OTHER, IMAGE_OTHER | (caller-supplied default) | Returns the default stop reason provided by the calling context |
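The mapping above can be expressed as a small lookup function. This is a sketch, not the integration's own code: the function name is hypothetical, and sending values not listed in the table to the caller-supplied default is an assumption.

```python
# Finish reasons that the table above maps to ERROR.
_ERROR_REASONS = frozenset({
    "SAFETY", "BLOCKLIST", "PROHIBITED_CONTENT", "SPII", "MODEL_ARMOR",
    "RECITATION", "IMAGE_SAFETY", "IMAGE_PROHIBITED_CONTENT", "IMAGE_RECITATION",
    "MALFORMED_FUNCTION_CALL", "UNEXPECTED_TOOL_CALL", "NO_IMAGE",
})


def normalize_stop_reason(finish_reason: str, default: str) -> str:
    """Map a Gemini finish reason to a Revenium StopReason value."""
    if finish_reason == "STOP":
        return "END"
    if finish_reason == "MAX_TOKENS":
        return "TOKEN_LIMIT"
    if finish_reason in _ERROR_REASONS:
        return "ERROR"
    if finish_reason in ("CANCELLED", "CANCELED"):
        return "CANCELLED"
    # FINISH_REASON_UNSPECIFIED, OTHER, IMAGE_OTHER, and (by assumption) any
    # unlisted value fall through to the caller-supplied default.
    return default
```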


Cursor IDE Data Points

In addition to the Common Data Points above, Cursor IDE captures the following through its Admin API sync:

Billing Classification

| Data Point | Type | Description |
| --- | --- | --- |
| billing.kind | String | Cursor billing classification (Included, Premium, etc.); determines whether usage counts against quota |
| operation_type | String | Operation type from Cursor (e.g., request classification) |
| stop_reason / finish_reason | String | Finish reason from Cursor |

When billing.kind is Included, Revenium sets billingSkipped = true, skipReason = FREE_TIER, and forces totalCost to null — indicating the request was covered by the subscription and incurred no additional cost.
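The rule above can be sketched as a small transformation over a record. The dictionary-based record shape and the function name are hypothetical, and matching the exact string `Included` (case-sensitive) is an assumption.

```python
def apply_billing_kind(record: dict) -> dict:
    """Return a copy of the record with the Included-usage rule applied."""
    result = dict(record)
    if record.get("billing.kind") == "Included":
        result["billingSkipped"] = True
        result["skipReason"] = "FREE_TIER"
        result["totalCost"] = None  # covered by the subscription, so no extra cost
    return result
```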

Data Collection Mode

Cursor IDE usage data is collected periodically from Cursor's Admin API and exported to Revenium via OTLP. Unlike Claude Code and Gemini CLI, data is not captured in real-time during each request — it is synced at regular intervals from Cursor's administrative interface.


Derived Fields

The following fields are not sent by the SDKs but are calculated by the Revenium backend during ingestion:

| Field | Derivation | Description |
| --- | --- | --- |
| inputTokenCost | inputTokenCount × model_input_cost_per_token | Cost attributed to input tokens |
| outputTokenCost | outputTokenCount × model_output_cost_per_token | Cost attributed to output tokens |
| cacheCreationTokenCost | cacheCreationTokenCount × model_cache_creation_cost | Cost attributed to cache creation |
| cacheReadTokenCost | cacheReadTokenCount × model_cache_read_cost | Cost attributed to cache reads |
| totalCost (when not provided) | Sum of all token costs | Calculated when SDK sends zero or null cost |
| apiKey | Extracted from x-api-key HTTP header | Authentication key for tenant identification |
| credentialId | Extracted from subscriber JSON | Credential identifier for access control |
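The cost derivations in the table above amount to a few multiplications and a conditional sum. The sketch below illustrates them with `Decimal` arithmetic; the per-token prices passed in are made-up inputs, and the function name and dictionary shapes are hypothetical.

```python
from decimal import Decimal

# Derived cost field -> (token count field, pricing key), following the table above.
COST_FIELDS = {
    "inputTokenCost": ("inputTokenCount", "model_input_cost_per_token"),
    "outputTokenCost": ("outputTokenCount", "model_output_cost_per_token"),
    "cacheCreationTokenCost": ("cacheCreationTokenCount", "model_cache_creation_cost"),
    "cacheReadTokenCost": ("cacheReadTokenCount", "model_cache_read_cost"),
}


def derive_costs(record: dict, pricing: dict) -> dict:
    """Compute per-type token costs and fill in totalCost when it is zero or null."""
    costs = {
        field: Decimal(record.get(count_key) or 0) * pricing[price_key]
        for field, (count_key, price_key) in COST_FIELDS.items()
    }
    provided = record.get("totalCost")
    if provided is None or provided == 0:
        # The SDK sent no usable cost, so sum the derived token costs.
        costs["totalCost"] = sum((costs[f] for f in COST_FIELDS), Decimal(0))
    else:
        costs["totalCost"] = Decimal(str(provided))
    return costs
```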


OTLP Transport Details

For teams implementing custom integrations or verifying data flow, here are the OTLP transport details:

Endpoint

Where base_url is typically https://api.revenium.ai/v2/otlp.

Authentication

This is a metering key (rev_mk_*) — sufficient for OTLP telemetry ingest, which is what every AI coding-assistant integration on this page does. For workflows that also report business outcomes or manage Revenium resources, use a write-scope key (rev_sk_*) — see API Key Permissions.

Payload Format

All integrations use the OTLP/HTTP JSON format (application/json):

The example above shows a Claude Code backfill payload with the core token attributes. The real-time test/connectivity payload (via revenium-metering test in each relevant SDK if used) uses stringValue for token fields and additionally sends cost_usd and duration_ms. Gemini CLI payloads follow the same OTLP structure with service.name set to gemini-cli and scope name set to gemini_cli.
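For teams assembling payloads by hand, the following sketch builds an OTLP/HTTP JSON log envelope using the standard OTLP field names (`resourceLogs`, `scopeLogs`, `logRecords`). It uses the Gemini CLI service and scope names mentioned above and only the attribute keys documented in the Attribution table; token attribute keys are omitted because they are not listed on this page. The helper names are hypothetical, and the payload is built but not sent.

```python
import json


def otlp_attr(key: str, value) -> dict:
    """Encode one attribute as an OTLP JSON key/value pair."""
    if isinstance(value, bool):
        return {"key": key, "value": {"boolValue": value}}
    if isinstance(value, int):
        # OTLP JSON follows the proto3 mapping, which writes 64-bit integers as strings.
        return {"key": key, "value": {"intValue": str(value)}}
    return {"key": key, "value": {"stringValue": str(value)}}


def build_logs_payload(
    service_name: str, scope_name: str, attributes: dict, time_unix_nano: int
) -> dict:
    """Wrap a single log record in the OTLP logs envelope."""
    return {
        "resourceLogs": [{
            "resource": {"attributes": [otlp_attr("service.name", service_name)]},
            "scopeLogs": [{
                "scope": {"name": scope_name},
                "logRecords": [{
                    "timeUnixNano": str(time_unix_nano),
                    "attributes": [otlp_attr(k, v) for k, v in attributes.items()],
                }],
            }],
        }]
    }


payload = build_logs_payload(
    "gemini-cli",
    "gemini_cli",
    {"user.email": "dev@example.com", "session.id": "s-1", "transaction_id": "t-1"},
    1704067200000000000,
)
body = json.dumps(payload)  # send with Content-Type: application/json
```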

