
Rate Limits

Revenium applies generous rate limits to keep usage predictable for agents and integrations. Limits are scoped per account and grouped into three buckets, each with its own ceiling and observable response headers.

Buckets and limits

Every authenticated request is mapped to one of three buckets based on its path. Each bucket has its own requests-per-second limit.

| Bucket | Paths | Limit |
| --- | --- | --- |
| metering | `/meter/v2/**` (REST metering: AI completions, events, tool, API) and `/v2/otlp/**` (OTLP signals) | 1,000 req/sec |
| analytics | AI Metrics, AI Traces, billing analytics, chart and cost-attribution reads | 100 req/sec |
| platform | All other `/profitstream/v2/api/**` and `/v2/sdk/**` endpoints (CRUD on subscriptions, tenants, sources, integrations, alerts, models, anomalies, plus SDK auth endpoints) | 50 req/sec |

Limits are per account, not per API key. If your integration uses multiple keys under the same account, they share the bucket budget.
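Because the budget is shared across every key in the account, an integration that runs several workers may want to divide the ceiling among them up front. The following is a minimal sketch of that arithmetic, not a Revenium-provided helper: the function name and the `headroom` safety factor are assumptions made for illustration, while the per-bucket limits come from the table above.

```python
# Per-bucket ceilings in requests per second, taken from the table above.
BUCKET_LIMITS = {"metering": 1000, "analytics": 100, "platform": 50}


def per_worker_budget(bucket: str, workers: int, headroom: float = 1.0) -> int:
    """Split one account-wide bucket budget evenly across workers.

    `headroom` is a hypothetical safety factor (not a Revenium setting):
    0.8 means "only plan to use 80% of the ceiling".
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    usable = BUCKET_LIMITS[bucket] * headroom
    # Floor division so the sum across all workers never exceeds `usable`.
    return int(usable // workers)


if __name__ == "__main__":
    for name in BUCKET_LIMITS:
        print(name, per_worker_budget(name, workers=4))
```

With many workers on a small bucket the result can reach 0, which is a signal to funnel that bucket's calls through fewer workers rather than to round up.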

Response headers

Four headers appear on every authenticated response that resolves to a bucket, regardless of status code (2xx, including 204 responses with no body, as well as 4xx and 5xx).

| Header | Meaning |
| --- | --- |
| X-RateLimit-Limit | Maximum requests allowed in the current window for this bucket |
| X-RateLimit-Remaining | Requests remaining in the current window. Reaches 0 at the limit, never goes negative |
| X-RateLimit-Reset | Unix epoch (seconds) when the current window resets |
| X-RateLimit-Bucket | Which bucket this request mapped to: metering, analytics, or platform |

Use X-RateLimit-Remaining to throttle before you reach 0. Use X-RateLimit-Reset to schedule your next batch. Use X-RateLimit-Bucket to track each bucket's state separately, so that a metering call does not back off because a platform call was busy.

Limits reset each second. Use X-RateLimit-Reset to see exactly when.
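As a sketch of how a client might read these four headers, the snippet below parses them into a small record and computes how long is left until the window resets. The class and function names are hypothetical; only the header names and their meanings come from this page. It accepts any string-to-string mapping, so it is not tied to a particular HTTP library.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    bucket: str        # X-RateLimit-Bucket
    limit: int         # X-RateLimit-Limit
    remaining: int     # X-RateLimit-Remaining
    reset_epoch: int   # X-RateLimit-Reset (Unix epoch seconds)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Time left in the current window, never negative."""
        now = time.time() if now is None else now
        return max(0.0, self.reset_epoch - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Return the parsed state, or None if any header is missing or malformed."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return RateLimitState(
            bucket=lowered["x-ratelimit-bucket"],
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_epoch=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None


if __name__ == "__main__":
    # Illustrative values only, not captured from a real response.
    sample = {
        "X-RateLimit-Limit": "50",
        "X-RateLimit-Remaining": "3",
        "X-RateLimit-Reset": "1700000000",
        "X-RateLimit-Bucket": "platform",
    }
    print(parse_rate_limit_headers(sample))
```

Keeping one `RateLimitState` per bucket name, for example in a dictionary keyed by `bucket`, is one way to keep metering and platform throttling independent.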

When you hit the limit

A request that exceeds the bucket budget returns 429 Too Many Requests with two additional headers and a JSON body.

Additional headers:

| Header | Meaning |
| --- | --- |
| Retry-After | Seconds to wait before retrying. Integer, always present |
| X-RateLimit-Limited-Reason | Why the limit fired. Values: bucket-rate (bucket exhaustion) or error-pattern (repeated 4xx responses) |

Response body:

The JSON body contains type, code, message, and doc_url. The bucket field is also present when X-RateLimit-Limited-Reason is bucket-rate; for error-pattern it is omitted.
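One way a client could fold a 429 into a single record is sketched below. It assumes the fields named above sit at the top level of the JSON object, which this page does not state explicitly. The names `RateLimited` and `parse_429` are hypothetical, and the sample values in the usage section are placeholders rather than real Revenium output.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimited:
    retry_after: int        # Retry-After, in seconds
    reason: str             # "bucket-rate" or "error-pattern"
    bucket: Optional[str]   # only present for bucket-rate
    message: str


def parse_429(headers: Mapping[str, str], body: str) -> RateLimited:
    """Combine the 429 headers and JSON body into one record."""
    lowered = {k.lower(): v for k, v in headers.items()}
    payload = json.loads(body)
    return RateLimited(
        retry_after=int(lowered["retry-after"]),
        reason=lowered["x-ratelimit-limited-reason"],
        # Assumption: fields are top-level keys. `bucket` is absent for
        # error-pattern, so use .get() rather than indexing.
        bucket=payload.get("bucket"),
        message=payload.get("message", ""),
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    headers = {"Retry-After": "1", "X-RateLimit-Limited-Reason": "bucket-rate"}
    body = json.dumps({
        "type": "example-type",
        "code": "example-code",
        "message": "example message",
        "doc_url": "https://example.invalid/docs",
        "bucket": "platform",
    })
    print(parse_429(headers, body))
```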

Error-pattern protection

Beyond per-bucket budgets, Revenium temporarily slows callers that produce sustained 4xx errors. This protects accounts from runaway misconfigured integrations spamming the API with broken requests.

Sustained 4xx responses over a short window trigger a temporary block. Each consecutive violation doubles the cooldown, capped at one hour. Retry-After reflects the remaining cooldown, and X-RateLimit-Limited-Reason is set to error-pattern.

Fix the underlying request (auth, missing fields, invalid IDs) and the cooldown clears on its own.
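To make the doubling rule concrete, here is a small sketch of how the cooldown would grow. The one-hour cap and the doubling come from the description above; the 30-second starting cooldown is an assumption chosen purely for illustration, since this page does not state the initial value. In practice, rely on Retry-After rather than on a local calculation.

```python
MAX_COOLDOWN_SECONDS = 3600  # one-hour cap stated above


def cooldown_seconds(consecutive_violations: int, base_seconds: int = 30) -> int:
    """Cooldown after the n-th consecutive violation.

    `base_seconds` is a hypothetical starting value, not a documented one.
    """
    if consecutive_violations < 1:
        return 0
    doubled = base_seconds * 2 ** (consecutive_violations - 1)
    return min(doubled, MAX_COOLDOWN_SECONDS)


if __name__ == "__main__":
    for n in range(1, 10):
        print(n, cooldown_seconds(n))
```

Under the assumed 30-second base, the cap is reached on the eighth consecutive violation.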

Best practices

  1. Read the headers. Treat X-RateLimit-Remaining and X-RateLimit-Reset as authoritative. If Remaining is low, slow down before you hit 0.

  2. Honor Retry-After. On 429, wait at least that many seconds before retrying. Do not retry sooner.

  3. Use exponential backoff with jitter on repeat 429s. Start at the Retry-After value and add randomized jitter to avoid thundering-herd retries with other clients in your account.

  4. Differentiate by Limited-Reason:

    • bucket-rate: you are sending too fast. Throttle by Remaining.

    • error-pattern: you are sending broken requests. Fix the request shape, not the rate.

  5. Separate metering from platform calls. The metering bucket has 20x the platform budget. A busy platform flow should not slow down metering.
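Steps 2 through 4 can be combined into one retry-delay function, sketched below. It never returns less than Retry-After, doubles the wait on repeated 429s, adds random jitter on top, and declines to retry on error-pattern. The function name, the `max_delay` ceiling, and the `jitter_fraction` parameter are all assumptions for illustration, not part of the Revenium API.

```python
import random
from typing import Optional


def next_delay(
    retry_after: int,
    attempt: int,
    reason: str,
    max_delay: float = 60.0,
    jitter_fraction: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based).

    Returns None when the request should not be retried as-is.
    """
    if attempt < 1:
        raise ValueError("attempt must be at least 1")
    if reason == "error-pattern":
        # Broken requests: fix the request shape instead of retrying.
        return None
    rng = rng if rng is not None else random.Random()
    base = max(1, retry_after)
    # Exponential backoff starting at Retry-After, capped at max_delay,
    # but never below Retry-After itself.
    backoff = max(base, min(max_delay, base * 2 ** (attempt - 1)))
    # Jitter is only ever added, so the floor above still holds.
    return backoff + rng.uniform(0, jitter_fraction * backoff)


if __name__ == "__main__":
    for attempt in range(1, 6):
        print(attempt, next_delay(1, attempt, "bucket-rate"))
```

Adding jitter on top of the backoff, rather than picking a random value below it, is a deliberate choice here: it spreads out clients that share an account while still honoring the "do not retry sooner" rule in step 2.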

If you have a use case that needs sustained higher throughput on a specific bucket, contact support so we can review the integration.
