
[Feature] Prometheus / OpenTelemetry metrics export #93

@carlosatta

What problem does this solve?

Routerly has no metrics export. Teams running standard observability stacks (Grafana, Datadog, New Relic, Prometheus Alertmanager) have no way to scrape Routerly's operational state — request throughput, latency percentiles, error rates, token consumption, budget headroom — without manually querying the dashboard API and building their own adapters. This makes Routerly difficult to integrate into an existing monitoring setup.

Proposed solution

1. Prometheus-compatible /metrics endpoint

Expose a /metrics endpoint (scrapeable by Prometheus or any compatible agent) with the following gauge/counter/histogram families:

| Metric | Type | Labels |
| --- | --- | --- |
| routerly_requests_total | counter | project, model, provider, status |
| routerly_request_duration_seconds | histogram | project, model, provider |
| routerly_tokens_total | counter | project, model, type (input/output) |
| routerly_cost_usd_total | counter | project, model |
| routerly_budget_used_ratio | gauge | project, limit_id |
| routerly_provider_errors_total | counter | provider, model, error_type |
| routerly_cache_hits_total | counter | project, cache_type |
| routerly_routing_policy_used_total | counter | policy |
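
For illustration, here is a rough sketch of how two of the families above could be rendered in the Prometheus text exposition format. It is stdlib-only Python chosen for readability, not a statement about Routerly's implementation language. The `Counter` and `Histogram` classes, the bucket bounds, and the label values are all hypothetical, and the histogram omits the project/model/provider labels to keep the example short.

```python
from bisect import bisect_left
from collections import defaultdict


def escape_label(value):
    """Escape a label value as the Prometheus text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class Counter:
    """Hypothetical in-memory counter family keyed by label values."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self.values = defaultdict(int)

    def inc(self, amount=1, **labels):
        key = tuple(str(labels[n]) for n in self.label_names)
        self.values[key] += amount

    def render(self):
        lines = [f"# TYPE {self.name} counter"]
        for key, value in sorted(self.values.items()):
            pairs = ",".join(
                f'{n}="{escape_label(v)}"' for n, v in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return lines


class Histogram:
    """Hypothetical unlabeled histogram with cumulative `le` buckets."""

    def __init__(self, name, bounds):
        self.name = name
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.total = 0.0

    def observe(self, seconds):
        # bisect_left finds the first bound >= seconds, i.e. the smallest
        # bucket whose upper bound (le) still contains the observation.
        self.counts[bisect_left(self.bounds, seconds)] += 1
        self.total += seconds

    def render(self):
        lines = [f"# TYPE {self.name} histogram"]
        running = 0
        for bound, count in zip(self.bounds + [None], self.counts):
            running += count  # buckets are cumulative in the exposition format
            le = "+Inf" if bound is None else repr(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {running}")
        return lines


def render_exposition(families):
    """Join all families into one response body for GET /metrics."""
    return "\n".join(line for f in families for line in f.render()) + "\n"
```

A request handler would call `inc(...)` and `observe(...)` once per completed request, and the /metrics handler would return `render_exposition([...])` with content type `text/plain; version=0.0.4`.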

The endpoint should support optional bearer token auth (configurable) so it can be kept private in production.
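
A minimal sketch of the optional bearer check, assuming that an empty or missing token in the config means auth is disabled (that default is my assumption, not something decided here). The function name `is_authorized` is hypothetical.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True if the request may read /metrics.

    Assumption: when no token is configured, the endpoint is open.
    """
    if not expected_token:
        return True
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

On the scraper side, Prometheus can send the header through the `authorization` block of a scrape config, so no custom client should be needed.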

2. OpenTelemetry (OTEL) trace export

For each request, emit an OTEL span carrying: project, model, provider, routing policy used, latency, token counts, cost, cache hit/miss. Export to a configurable OTLP endpoint (gRPC or HTTP).

{
  "telemetry": {
    "prometheus": { "enabled": true, "path": "/metrics", "authToken": "..." },
    "otel": { "enabled": true, "endpoint": "http://otel-collector:4317" }
  }
}
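
To make the config shape above concrete, here is a rough sketch of loading it with defaults and building the per-request span attributes. Everything beyond the JSON keys shown above is an assumption for illustration: the defaults (both exporters off, path `/metrics`), the validation rule, the `routerly.*` attribute names, and the shape of the `request` dict. Latency is left out of the attributes because it would normally be carried by the span's own start and end timestamps.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class PrometheusConfig:
    enabled: bool = False          # assumed default: off
    path: str = "/metrics"         # assumed default path
    auth_token: Optional[str] = None


@dataclass(frozen=True)
class OtelConfig:
    enabled: bool = False          # assumed default: off
    endpoint: Optional[str] = None


def load_telemetry(raw_json):
    """Parse the "telemetry" section, applying the assumed defaults."""
    telemetry = json.loads(raw_json).get("telemetry", {})
    prom = telemetry.get("prometheus", {})
    otel = telemetry.get("otel", {})

    prom_cfg = PrometheusConfig(
        enabled=bool(prom.get("enabled", False)),
        path=prom.get("path", "/metrics"),
        auth_token=prom.get("authToken"),
    )
    otel_cfg = OtelConfig(
        enabled=bool(otel.get("enabled", False)),
        endpoint=otel.get("endpoint"),
    )
    if otel_cfg.enabled:
        parsed = urlparse(otel_cfg.endpoint or "")
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            raise ValueError("telemetry.otel.endpoint must be an http(s) URL")
    return prom_cfg, otel_cfg


def span_attributes(request):
    """Map one finished request to span attributes (hypothetical names)."""
    return {
        "routerly.project": request["project"],
        "routerly.model": request["model"],
        "routerly.provider": request["provider"],
        "routerly.routing_policy": request["policy"],
        "routerly.tokens.input": request["input_tokens"],
        "routerly.tokens.output": request["output_tokens"],
        "routerly.cost_usd": request["cost_usd"],
        "routerly.cache_hit": request["cache_hit"],
    }
```

One open question the sketch does not settle: 4317 is the conventional OTLP/gRPC port and 4318 the conventional OTLP/HTTP port, so the config may need an explicit protocol field instead of inferring it from the endpoint.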

Alternatives you've considered

Polling the existing /api/usage endpoint from a custom exporter. This works, but it requires maintaining external glue code and does not expose real-time per-request latency.

Who would benefit from this?

Any team running Routerly in production with an existing Prometheus/Grafana or Datadog stack. This is a standard requirement for infrastructure components in enterprise environments.

Additional context

LiteLLM, Bifrost, and Kong all list Prometheus and OpenTelemetry support as first-class features. Bifrost specifically promotes sub-millisecond overhead with full OTEL tracing. A /metrics endpoint is also the most common ask from developers evaluating self-hosted gateways.
