Meter AI Completion

POST https://api.revenium.ai/meter/v2/ai/completions

Submit AI completion metadata for metering and billing purposes.

Body Params

AI completion metadata including token counts, costs, and timing information

transactionId

string | null

Unique identifier for this specific AI completion transaction. Used for deduplication, correlation with request/response pairs, and transaction lookup in Revenium analytics. If not provided, a UUID will be auto-generated. As a best practice, generate a UUID in your application before making the AI call and use the same ID when submitting to Revenium. You can use either 'spanId' (recommended) or 'transactionId'.

traceId

string | null

Optional trace identifier to group multiple related AI completion calls that belong to the same overall user request or workflow. For example, if a single user query triggers multiple LLM calls (e.g., retrieval + generation), use the same traceId for all calls to analyze them together in Revenium's analytics. Leave null for standalone completions.
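As a rough illustration of how transactionId and traceId relate, the sketch below generates one traceId for a workflow and a separate transactionId for each call in it. The helper name build_ids_for_workflow and the taskType values are hypothetical and chosen only for this example.

```python
import uuid


def build_ids_for_workflow(step_names):
    """Return one partial payload per step, all sharing a single traceId.

    Hypothetical helper: each step gets its own transactionId so calls can be
    deduplicated individually, while the shared traceId groups them together.
    """
    trace_id = str(uuid.uuid4())
    return [
        {
            "traceId": trace_id,
            "transactionId": str(uuid.uuid4()),
            "taskType": name,  # freeform value, illustrative only
        }
        for name in step_names
    ]


# One user query that triggers a retrieval call and a generation call.
calls = build_ids_for_workflow(["retrieval", "generation"])
```

Generating the IDs before the provider call means the same values can be attached to your own logs and to the metering payload.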

model

string

required

The AI model identifier used for this completion. Should match the exact model name from your AI provider (e.g., 'gpt-4', 'claude-3-opus-20240229', 'gemini-pro'). This is used for cost calculation, performance analytics, and model comparison reporting in Revenium. To get accurate cost estimates, verify that the model name is valid in Revenium using the sources/ai/models endpoint.

responseQualityScore

double | null

Optional quality score for the AI response on a 0.0-1.0 scale. Set by your application's evaluation logic (e.g., RAGAS, human feedback, custom scoring). Used in Revenium analytics to correlate quality with cost, model choice, and other metrics. Leave null if not tracking quality scores.

modelSource

string | null

The routing or aggregation layer used to access the AI model. This identifies whether you're calling the AI provider directly or through an intermediary service.

Common values: 'DIRECT', 'LITELLM', 'OPENROUTER', 'PORTKEY', 'AZURE_OPENAI', or provider names ('OPENAI', 'ANTHROPIC', 'GOOGLE', etc.) when calling directly.

Custom values are accepted for specialized routing layers or gateways. This field is used for integration tracking and analytics.

inputTokenCount

int64

required

The count of consumed input tokens

outputTokenCount

int64

required

The count of consumed output tokens

reasoningTokenCount

int64 | null

The number of reasoning tokens used in the completion. Reasoning tokens are extended thinking tokens used by AI models for complex problem-solving. These are sometimes billed separately from regular input/output tokens. Only include this field if your AI provider reports reasoning tokens. Revenium's middleware will always populate this field if reasoning tokens are reported by the AI provider. Leave null for models without reasoning capabilities.

cacheCreationTokenCount

int64 | null

The number of tokens used to create new cache entries (prompt caching). When you send a long prompt for the first time, the AI provider may cache it for faster subsequent requests. Cache creation tokens are typically billed at a higher rate than regular input tokens. Only include if your provider supports prompt caching (e.g., Anthropic Claude, OpenAI with cache-enabled models). Revenium's middleware will always populate this field automatically. Leave null otherwise.

cacheReadTokenCount

int64 | null

The number of tokens read from cache (prompt caching). When reusing a previously cached prompt, these tokens are read from cache instead of being processed as new input tokens. Cache read tokens are typically billed at a lower rate than regular input tokens. Only include if your provider supports prompt caching and reports cache hits. Revenium's middleware will always populate this field automatically. Leave null otherwise.

totalTokenCount

int64

required

The total number of tokens

stopReason

string

enum

required

The reason for stopping the completion

inputTokenCost

double | null

The cost in USD for input tokens in this completion. Typically leave null to let Revenium automatically calculate costs based on the model and provider's current pricing. Only provide a value if you have custom pricing agreements or want to override Revenium's cost calculation. Note: Manual cost override may not be available on all Revenium plans.

outputTokenCost

double | null

The cost in USD for output tokens in this completion. Typically leave null to let Revenium automatically calculate costs based on the model and provider's current pricing. Only provide a value if you have custom pricing agreements or want to override Revenium's cost calculation. If provided, this will override Revenium's automatic calculation. Note: Manual cost override may not be available on all Revenium plans.

cacheCreationTokenCost

double | null

The cost in USD for cache creation tokens in this completion. Typically leave null to let Revenium automatically calculate costs based on the model and provider's caching pricing. Only provide a value if you have custom pricing agreements or want to override Revenium's cost calculation. If provided, this will override Revenium's automatic calculation.

cacheReadTokenCost

double | null

The cost in USD for cache read tokens in this completion. Typically leave null to let Revenium automatically calculate costs based on the model and provider's caching pricing. Only provide a value if you have custom pricing agreements or want to override Revenium's cost calculation. If provided, this will override Revenium's automatic calculation.

totalCost

double | null

The total cost in USD for this completion (sum of all token costs). Typically leave null to let Revenium automatically calculate the total based on token counts and current pricing. Only provide a value if you have custom pricing agreements or want to override Revenium's cost calculation. If provided, this will override Revenium's automatic calculation.

costType

string

enum

The type of cost being tracked. Currently always 'AI' for AI completion costs. This field is used internally by Revenium to categorize different types of metered usage. You typically do not need to set this field as it defaults to 'AI'.

Allowed:

requestTime

string

required

The timestamp when your application sent the request to the AI provider, in ISO 8601 format with UTC timezone (e.g., '2025-03-02T15:04:05Z'). This is used to calculate request duration and analyze usage patterns over time. Set this to the time immediately before calling the AI provider's API.

completionStartTime

string

required

The timestamp when the AI completion started generating output, in ISO 8601 format with UTC timezone. For streaming requests, this is when the first token was received. For non-streaming requests, this is typically the same as or very close to responseTime. Used to calculate time-to-first-token latency for streaming completions.

timeToFirstToken

int64 | null

The latency in milliseconds from request start to first token received. Calculated as (completionStartTime - requestTime). This metric is particularly important for streaming completions to measure perceived responsiveness. For non-streaming completions, this may be null or equal to requestDuration.

responseTime

string

required

The timestamp when the AI completion finished, in ISO 8601 format with UTC timezone. For streaming requests, this is when the last token was received and the stream closed. For non-streaming requests, this is when the complete response was received. Used to calculate total request duration.

requestDuration

int64

required

The total duration of the AI completion request in milliseconds, from request start to completion. Calculated as (responseTime - requestTime). This includes network latency, AI processing time, and any mediation/gateway overhead. Used for performance analytics and SLA monitoring.
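The relationships between the timing fields above can be sketched as follows. This is a minimal example that assumes you capture three timezone-aware datetime values in your application; the helper names iso_utc, ms_between, and timing_fields are hypothetical. Timestamps are formatted at second precision to match the '2025-03-02T15:04:05Z' example; whether fractional seconds are also accepted is not stated here.

```python
from datetime import datetime, timedelta, timezone


def iso_utc(ts: datetime) -> str:
    """Format an aware datetime as ISO 8601 in UTC, e.g. 2025-03-02T15:04:05Z."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def ms_between(start: datetime, end: datetime) -> int:
    """Whole milliseconds elapsed from start to end."""
    return (end - start) // timedelta(milliseconds=1)


def timing_fields(request_time, completion_start_time, response_time):
    """Build the timing portion of the metering payload."""
    return {
        "requestTime": iso_utc(request_time),
        "completionStartTime": iso_utc(completion_start_time),
        "responseTime": iso_utc(response_time),
        # timeToFirstToken = completionStartTime - requestTime
        "timeToFirstToken": ms_between(request_time, completion_start_time),
        # requestDuration = responseTime - requestTime
        "requestDuration": ms_between(request_time, response_time),
    }


t0 = datetime(2025, 3, 2, 15, 4, 5, tzinfo=timezone.utc)
fields = timing_fields(
    t0,
    t0 + timedelta(milliseconds=350),
    t0 + timedelta(milliseconds=1200),
)
```

Computing the durations from the datetime objects rather than from the formatted strings keeps millisecond precision even though the strings are rounded to seconds.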

mediationLatency

int64 | null

The latency in milliseconds introduced by intermediate systems between your application and the AI provider, such as API gateways, proxies, or AI mediation layers. This helps identify performance bottlenecks outside of the AI provider's processing time. Leave null if not using intermediate systems or if latency is not tracked separately.

provider

string

required

The underlying AI provider/vendor whose model is actually processing the request. This identifies which company's AI model is being used, regardless of how you're accessing it (direct API, proxy, or gateway).

Common values: 'OpenAI' (for GPT models), 'Anthropic' (for Claude models), 'Google' (for Gemini models), 'Cohere', 'Mistral', 'Meta' (for Llama models), 'Amazon Bedrock', 'Azure'.

Custom values are accepted but may affect analytics categorization. Revenium looks up model pricing primarily by model name (e.g., 'gpt-4', 'claude-3-opus'), so using non-standard provider names will not break cost calculation. However, using standard provider names ensures proper categorization in analytics and reporting.

If using an aggregation service like LiteLLM or OpenRouter, this should still be the actual provider (e.g., 'Anthropic' not 'LiteLLM'). If using Revenium middleware, this is typically auto-populated from the AI provider's API response. Supported provider models can be verified using the sources/ai/models endpoint which returns both providers and model names.
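To make the distinction between provider and modelSource concrete, here is a small sketch. The helper routing_fields is hypothetical, and falling back to 'DIRECT' when no gateway is given is an assumption based on the common values listed for modelSource.

```python
def routing_fields(provider, gateway=None):
    """Return the provider/modelSource pair for a completion.

    provider: the vendor whose model actually ran (e.g. 'Anthropic').
    gateway:  the aggregation layer used, if any (e.g. 'LITELLM').
    """
    return {
        "provider": provider,
        # Assumption: use 'DIRECT' when no intermediary is involved.
        "modelSource": gateway or "DIRECT",
    }


# A Claude model reached through LiteLLM: provider stays 'Anthropic'.
via_gateway = routing_fields("Anthropic", "LITELLM")
# An OpenAI model called directly.
direct = routing_fields("OpenAI")
```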

taskType

string | null

Optional category to group related AI tasks for cost and performance analysis. Use consistent values to compare metrics across different models or vendors performing the same type of work. Examples: 'chat', 'summarization', 'code-generation', 'translation', 'image-generation', 'embeddings', 'classification', 'sentiment-analysis'. This is freeform text - choose values that match your use cases.

organizationName

string | null

The name of the subscriber's organization from your system (e.g., 'AcmeCorp'), which allows Revenium to track usage and costs by company. If several subscriberIds have the same organizationName, Revenium's reporting will show usage for the entire organization broken down by subscriberId. This field is used for lookup and auto-creation of organizations in Revenium.

subscriptionId

string | null

Unique identifier of the subscription from your own system that you wish to use to correlate usage between Revenium & your application.

productName

string | null

The name of the product from your system that you wish to use to correlate usage between Revenium & your application. This field is used for lookup and auto-creation of products in Revenium.

agent

string | null

The AI agent that is making the request

operationType

string | null

enum

The type of operation performed

systemFingerprint

string | null

A unique identifier provided by the AI provider that represents the statistical signature of the language model that generated this completion. This fingerprint can be used for model attribution, debugging, and monitoring model behavior across requests. Automatically provided by some AI providers (e.g., OpenAI) in their API responses. Leave null if your provider does not supply this value.

temperature

double | null

The temperature parameter used for this completion, controlling randomness in the AI's output. Typically ranges from 0.0 (deterministic) to 2.0 (very random). Track this to correlate temperature settings with response quality, cost, or other metrics. Useful for A/B testing different temperature values.

errorReason

string | null

Error message or reason if the AI completion failed. Include this field when the AI provider returns an error (e.g., rate limit exceeded, invalid API key, model not found, content policy violation). Used for error rate analytics and debugging. Leave null for successful completions.

errorCode

int32 | null

HTTP error code if the completion failed (e.g., 429, 503, 500). Used with billingSkipped to determine no-charge scenarios (e.g., OpenAI flex tier 429 = no charge).

subscriber

middlewareSource

string | null

Identifier of the Revenium middleware package or SDK that captured and submitted this AI completion metadata. This field is AUTOMATICALLY SET by Revenium's middleware packages (e.g., 'revenium-openai-python', 'revenium-anthropic-node'). You typically should NOT manually set this field. It is used for analytics to track which integration methods are being used and for debugging middleware-specific issues.

environment

string | null

Deployment environment where this AI completion was executed. Used for filtering and analyzing usage patterns across different deployment stages. Common values: 'production', 'staging', 'development', 'test'. Leave null if not tracking by environment.

operationSubtype

string | null

Technical classification of the specific operation or tool used within the AI completion. This provides finer-grained categorization than operationType. Examples: 'web_search', 'function_call', 'code_interpreter', 'sql_query', 'http_request', 'file_read', 'ocr'. Used for analyzing which tools or capabilities are being used most frequently. Leave null for standard completions without tool usage.

parentTransactionId

string | null

Parent transaction ID for hierarchical tracing. When an AI completion is part of a larger workflow or spawned by another AI call, this field references the parent's spanId/transactionId. Used to build transaction trees and understand call hierarchies in complex multi-step AI workflows. Leave null for root-level transactions. You can use either 'parentSpanId' (recommended) or 'parentTransactionId'.

transactionName

string | null

Human-readable name for this transaction. Provides context about what this AI completion is doing in business terms. Examples: 'Summarize Application', 'Credit Risk Analysis', 'Customer Support Response'. Used in trace visualization for better readability. Falls back to taskType if not provided.

region

string | null

Cloud region or geographic location where this AI completion was processed. Used for analyzing latency patterns, compliance requirements, and regional cost differences. Examples: 'us-east-1', 'eu-west-1', 'ap-southeast-2'. Leave null if not tracking by region.

retryNumber

int32 | null

Retry attempt number for this AI completion. 0 indicates the first attempt (no retry), 1 indicates first retry, 2 indicates second retry, etc. Used for analyzing failure rates and retry patterns. Each retry attempt should be a separate transaction with incrementing retryNumber and the same traceId. Leave null or set to 0 for first attempts.
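The retry convention described above (a separate transaction per attempt, an incrementing retryNumber, and a shared traceId) might look like this in application code. The wrapper call_with_retries is a hypothetical helper and call stands in for whatever function invokes your AI provider; submitting the resulting records to Revenium is left out.

```python
import uuid


def call_with_retries(call, trace_id, max_attempts=3):
    """Run call() up to max_attempts times and return one record per attempt."""
    records = []
    for retry_number in range(max_attempts):
        record = {
            "traceId": trace_id,                  # same for every attempt
            "transactionId": str(uuid.uuid4()),   # new for every attempt
            "retryNumber": retry_number,          # 0 = first attempt
            "errorReason": None,
        }
        try:
            call()
        except Exception as exc:
            record["errorReason"] = str(exc)
            records.append(record)
            continue  # try again with the next retryNumber
        records.append(record)
        break  # success, stop retrying
    return records


# Illustrative provider call that fails twice, then succeeds.
_state = {"calls": 0}


def flaky_call():
    _state["calls"] += 1
    if _state["calls"] < 3:
        raise RuntimeError("rate limit exceeded")


records = call_with_retries(flaky_call, trace_id="trace-1")
```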

traceType

string | null

Categorical identifier for grouping similar workflows (e.g., 'create_video', 'write_script'). Enables trace-level analytics and anomaly detection within workflow cohorts. Must contain only alphanumeric characters, hyphens, and underscores. Max 128 characters. Defaults to 'uncategorized' if not provided or invalid. All transactions in the same trace must have the same traceType.
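A client-side check that mirrors the traceType rules above can catch invalid values before they are silently replaced on ingest. This is a sketch only: normalize_trace_type is a hypothetical helper, and it assumes "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumption: ASCII letters, digits, hyphens, and underscores; 1 to 128 chars.
_TRACE_TYPE_RE = re.compile(r"[A-Za-z0-9_-]{1,128}")


def normalize_trace_type(value):
    """Return value if it satisfies the documented rules, else 'uncategorized'."""
    if value and _TRACE_TYPE_RE.fullmatch(value):
        return value
    return "uncategorized"
```

For example, 'create_video' passes unchanged, while 'write script' (contains a space) and a 129-character string both fall back to 'uncategorized'.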

traceName

string | null

Human-readable label for individual trace instances (e.g., 'Marketing Video Q4 2025'). Enables finding specific workflow executions. Can contain any UTF-8 string. Max 256 characters. All transactions in the same trace must have the same traceName.

squadId

string | null

Identifier for a team or squad in a multi-agent framework (e.g., ARK, CrewAI). Used to group AI operations by agent team for analytics and cost tracking. Free-form string, max 256 characters.

squadName

string | null

Human-readable name for the squad type (e.g., 'Loan Processing', 'Document Analysis'). Used for display in the UI and analytics grouping. If squadId is provided without squadName, squadId will be used as the display name.

squadRole

string | null

The agent's role within this squad execution (e.g., 'Document Extractor', 'Credit Checker'). Used to identify the purpose of each agent's contribution to the squad workflow.

agenticJobId

string | null

Unique identifier for the agentic job instance. Used to correlate all AI operations within a single job execution. Should be unique per job run (e.g., UUID or meaningful ID like 'loan-123-processing').

agenticJobName

string | null

Human-readable name for the agentic job. Used for display in the UI and analytics grouping. If agenticJobId is provided without agenticJobName, the ID will be used as the display name.

agenticJobType

string | null

Category or type of the agentic job. Normalized to lowercase on ingest to prevent data fragmentation. Used for grouping jobs by type in analytics and ROI calculations.

agenticJobVersion

string | null

Version identifier for the job definition. Used to track job evolution and compare performance across versions.

systemPrompt

string | null

System prompt/instructions sent to the model.

inputMessages

string | null

User input messages as JSON array.

outputResponse

string | null

Assistant output/response content.

promptsTruncated

boolean | null

Indicates if prompts were truncated due to size limits.

billingSkipped

boolean | null

If true, backend returns $0 cost. Set for free tier, rate-limited requests (429 on flex), or other no-charge scenarios.

skipReason

string | null

enum

Reason why billing was skipped.

Allowed:

pricingTier

string | null

enum

Pricing tier for batch discounts. BATCH = 50% discount, STANDARD = normal pricing.
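As a worked example of the discount stated above, the sketch below estimates the effect of each tier on a cost computed at standard pricing. It is only an illustration of the arithmetic; apply_pricing_tier is a hypothetical helper and Revenium's own calculation may involve more than a single multiplier.

```python
# BATCH = 50% discount, STANDARD = normal pricing (from the field description).
TIER_MULTIPLIER = {"STANDARD": 1.0, "BATCH": 0.5}


def apply_pricing_tier(standard_cost_usd, pricing_tier="STANDARD"):
    """Scale a standard-priced cost by the multiplier for the given tier."""
    return standard_cost_usd * TIER_MULTIPLIER[pricing_tier]


standard_cost = apply_pricing_tier(0.02, "STANDARD")
batch_cost = apply_pricing_tier(0.02, "BATCH")
```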

Allowed:

requestedServiceTier

string | null

Service tier requested by user (e.g., 'priority', 'default', 'flex').

actualServiceTier

string | null

Actual service tier used for billing. May differ from requested if downgraded. Backend uses this for cost calculation.

subscriptionTier

string | null

The subscription tier for coding assistant tools (e.g., 'pro', 'free', 'unknown'). Only populated for coding_assistant cost sources.

costMultiplier

double | null

Multiplier applied to raw AI provider costs for this coding assistant event. Used to normalize costs relative to subscription pricing.

codingAssistantAccountUuid

string | null

Account UUID from the coding assistant tool (e.g., CLAUDE_CODE_ACCOUNT_UUID). Used for subscriber upsert to correlate coding assistant usage to a specific account.

cacheCreation5mTokenCount

int64 | null

The portion of cacheCreationTokenCount written with a 5-minute cache TTL (Anthropic ephemeral_5m_input_tokens). When supplied, the 5m and 1h buckets sum to cacheCreationTokenCount. Leave null if the provider does not report a TTL split.

cacheCreation1hTokenCount

int64 | null

The portion of cacheCreationTokenCount written with a 1-hour cache TTL (Anthropic ephemeral_1h_input_tokens). When supplied, the 5m and 1h buckets sum to cacheCreationTokenCount. Leave null if the provider does not report a TTL split.
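The constraint that the 5-minute and 1-hour buckets sum to cacheCreationTokenCount can be checked before submitting. The function below is a hypothetical client-side check, not part of the API; it treats a missing bucket as zero when the other bucket is supplied, which is an assumption.

```python
def cache_split_is_consistent(total, five_min, one_hour):
    """Check that the TTL buckets, when supplied, add up to the total.

    total:    cacheCreationTokenCount
    five_min: cacheCreation5mTokenCount (or None)
    one_hour: cacheCreation1hTokenCount (or None)
    """
    if five_min is None and one_hour is None:
        # Provider did not report a TTL split, so there is nothing to check.
        return True
    return (five_min or 0) + (one_hour or 0) == total
```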

isStreamed

boolean

Indicates whether this completion used streaming (true) or non-streaming/batch mode (false). Streaming completions receive tokens incrementally as they're generated, while non-streaming completions wait for the complete response. This affects how timeToFirstToken and responseTime are interpreted.
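Pulling together the fields marked required above, a minimal request body might look like the sketch below. The values are illustrative; in particular, 'END' is only a placeholder for stopReason, because the allowed enum values are not listed here, and the assumption that totalTokenCount equals input plus output is not stated by the reference. The helper missing_required is hypothetical.

```python
import json

# Fields marked "required" in the Body Params above.
REQUIRED_FIELDS = (
    "model",
    "inputTokenCount",
    "outputTokenCount",
    "totalTokenCount",
    "stopReason",
    "requestTime",
    "completionStartTime",
    "responseTime",
    "requestDuration",
    "provider",
)


def missing_required(payload):
    """Return the required field names that are absent or null."""
    return [name for name in REQUIRED_FIELDS if payload.get(name) is None]


example_payload = {
    "model": "gpt-4",
    "provider": "OpenAI",
    "inputTokenCount": 120,
    "outputTokenCount": 30,
    "totalTokenCount": 150,  # assumed here to be input + output
    "stopReason": "END",     # placeholder: check the allowed enum values
    "requestTime": "2025-03-02T15:04:05Z",
    "completionStartTime": "2025-03-02T15:04:06Z",
    "responseTime": "2025-03-02T15:04:06Z",
    "requestDuration": 1200,
    "isStreamed": False,
}

body = json.dumps(example_payload)
```

Running missing_required before sending makes it easier to catch omissions locally instead of relying on a 400 or 422 response.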

Headers

Idempotency-Key

string

length between 1 and 255

Optional Stripe-style retry-safety key. If present, the response (status + body) is cached keyed by (tenant, key). Identical retries replay the cached response; body mismatch returns 409 idempotency_key_mismatch; a concurrent in-flight call returns 409 idempotency_key_in_progress with Retry-After: 1. Must be 1-255 printable ASCII characters; UUID v4 recommended. See https://docs.revenium.io/integrations/idempotency for full behavior.
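A sketch of building the Idempotency-Key header follows. The helper names are hypothetical. "Printable ASCII" is read conservatively here as the visible characters 0x21 through 0x7E (no spaces), which is an assumption; a UUID v4 satisfies either reading.

```python
import uuid


def is_valid_idempotency_key(key: str) -> bool:
    """1-255 characters, all in the visible ASCII range (assumed 0x21-0x7E)."""
    return 1 <= len(key) <= 255 and all(0x21 <= ord(ch) <= 0x7E for ch in key)


def idempotency_headers(key=None):
    """Return a header dict, generating a UUID v4 key when none is given."""
    key = key or str(uuid.uuid4())
    if not is_valid_idempotency_key(key):
        raise ValueError("Idempotency-Key must be 1-255 printable ASCII characters")
    return {"Idempotency-Key": key}


headers = idempotency_headers()
```

Reuse the same key, with an identical body, when retrying a request; a new key should be generated for each distinct completion.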

Responses

201 AI completion successfully metered

400 Bad request. Either invalid_idempotency_key from the idempotency filter (canonical envelope), or a controller-level body parse / @Valid failure (legacy ApiError envelope). The two will converge to a single schema once the v2 error envelope unification lands.

401 Unauthorized - Authentication required

403 Forbidden. The API key scope is not allowed to post to metering endpoints. Use a key with METERING (rev_mk_), WRITE (rev_sk_), or legacy (hak_) scope.

404 Resource not found

409 Idempotency conflict. idempotency_key_mismatch when the same Idempotency-Key was reused with a different request body. idempotency_key_in_progress when a concurrent request with the same key is still in flight; in that case the response carries a Retry-After header.

413 Request body exceeds the maximum size buffered by the idempotency filter.

422 Unprocessable entity

429 Budget limit exceeded: request rejected by enforcement rule

500 Internal server error

503 Idempotency cache backing store is temporarily unavailable. Retry later.
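One way a client might act on these status codes is sketched below. This is an assumption-laden example, not documented behavior: the function retry_delay_seconds is hypothetical, the caller is expected to extract the error code string from the response body (the envelope shape is not shown here), and the 5-second default for 503 is arbitrary. Only the cases the reference explicitly describes as retryable are retried.

```python
def retry_delay_seconds(status, error_code=None, retry_after=None):
    """Return how long to wait before retrying, or None if a retry is not advised.

    status:      HTTP status code of the metering response
    error_code:  error identifier parsed from the body by the caller, if any
    retry_after: value of the Retry-After header, if present
    """
    if status == 409 and error_code == "idempotency_key_in_progress":
        # A concurrent call with the same key is in flight; honor Retry-After.
        return float(retry_after) if retry_after else 1.0
    if status == 503:
        # Idempotency cache unavailable; "retry later" (default is arbitrary).
        return float(retry_after) if retry_after else 5.0
    # 201 succeeded; 409 mismatch, 4xx, and 429 budget rejections will not
    # succeed by resending the same request unchanged.
    return None
```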