[Feature] Prometheus / OpenTelemetry metrics export #93

## What problem does this solve?

Routerly has no metrics export. Teams running standard observability stacks (Grafana, Datadog, New Relic, Prometheus Alertmanager) have no way to scrape Routerly's operational state — request throughput, latency percentiles, error rates, token consumption, budget headroom — without manually querying the dashboard API and building their own adapters. This makes Routerly difficult to integrate into an existing monitoring setup.

## Proposed solution

**1. Prometheus-compatible `/metrics` endpoint**

Expose a `/metrics` endpoint (scrapeable by Prometheus or any compatible agent) with the following gauge/counter/histogram families:

| Metric | Type | Labels |
|---|---|---|
| `routerly_requests_total` | counter | `project`, `model`, `provider`, `status` |
| `routerly_request_duration_seconds` | histogram | `project`, `model`, `provider` |
| `routerly_tokens_total` | counter | `project`, `model`, `type` (input/output) |
| `routerly_cost_usd_total` | counter | `project`, `model` |
| `routerly_budget_used_ratio` | gauge | `project`, `limit_id` |
| `routerly_provider_errors_total` | counter | `provider`, `model`, `error_type` |
| `routerly_cache_hits_total` | counter | `project`, `cache_type` |
| `routerly_routing_policy_used_total` | counter | `policy` |
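To make the scrape output concrete, here is a minimal sketch of how one counter family from the table could be rendered in the Prometheus text exposition format. It uses only the Python standard library; the `Counter` class, the help string, and the label values (`demo`, `example-model`, `example-provider`) are illustrative assumptions, not Routerly code. A real implementation would more likely use an existing Prometheus client library.

```python
from collections import defaultdict


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class Counter:
    """A labelled, monotonically increasing counter (illustrative, not thread-safe)."""

    def __init__(self, name: str, help_text: str, label_names: tuple):
        self.name = name
        self.help_text = help_text
        self.label_names = label_names
        # Maps a tuple of label values to the running total for that series.
        self._values = defaultdict(float)

    def inc(self, labels: dict, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(labels[name] for name in self.label_names)
        self._values[key] += amount

    def render(self) -> str:
        """Render HELP/TYPE headers followed by one sample line per label set."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(
                f'{n}="{escape_label_value(v)}"'
                for n, v in zip(self.label_names, key)
            )
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return "\n".join(lines) + "\n"


# Hypothetical usage: two successful requests for the same label set.
requests_total = Counter(
    "routerly_requests_total",
    "Requests handled by the gateway.",
    ("project", "model", "provider", "status"),
)
sample_labels = {
    "project": "demo",
    "model": "example-model",
    "provider": "example-provider",
    "status": "200",
}
requests_total.inc(sample_labels)
requests_total.inc(sample_labels)
print(requests_total.render())
```

Each distinct label combination becomes its own time series, so high-cardinality labels (for example a per-request ID) should stay out of these families.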

The endpoint should support optional bearer token auth (configurable) so it can be kept private in production.
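The optional bearer token check could look roughly like the sketch below. The function name `is_authorized` and the "no token configured means open" behaviour are assumptions made for illustration; the request only says that auth should be optional and configurable.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str],
                  expected_token: Optional[str]) -> bool:
    """Return True if a scrape request may read the metrics endpoint."""
    if not expected_token:
        # Assumption: auth is optional, so no configured token means open access.
        return True
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())
```

A handler would call this with the raw `Authorization` header and respond with 401 when it returns `False`.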

**2. OpenTelemetry (OTEL) trace export**

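For each request, emit an OTEL span carrying: project, model, provider, routing policy used, latency, token counts, cost, and cache hit/miss. Export to a configurable OTLP endpoint (gRPC or HTTP; by convention, collectors listen on port 4317 for gRPC and 4318 for HTTP). Example configuration covering both exporters: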

```json
{
  "telemetry": {
    "prometheus": { "enabled": true, "path": "/metrics", "authToken": "..." },
    "otel": { "enabled": true, "endpoint": "http://otel-collector:4317" }
  }
}
```
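The per-request span fields listed in this section map naturally onto flat span attributes, since OpenTelemetry attribute values must be primitives (strings, booleans, numbers) or arrays of them. The sketch below shows one possible mapping; the `RequestRecord` type and the `routerly.*` attribute names are hypothetical and would need to be agreed on before implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """Hypothetical per-request data the gateway already has at response time."""
    project: str
    model: str
    provider: str
    routing_policy: str
    latency_seconds: float
    input_tokens: int
    output_tokens: int
    cost_usd: float
    cache_hit: bool


def span_attributes(record: RequestRecord) -> dict:
    """Flatten a request record into primitive-valued span attributes."""
    return {
        "routerly.project": record.project,
        "routerly.model": record.model,
        "routerly.provider": record.provider,
        "routerly.routing_policy": record.routing_policy,
        "routerly.latency_seconds": record.latency_seconds,
        "routerly.tokens.input": record.input_tokens,
        "routerly.tokens.output": record.output_tokens,
        "routerly.cost_usd": record.cost_usd,
        "routerly.cache.hit": record.cache_hit,
    }


# Illustrative values only.
example = RequestRecord(
    project="demo",
    model="example-model",
    provider="example-provider",
    routing_policy="cheapest",
    latency_seconds=0.42,
    input_tokens=120,
    output_tokens=80,
    cost_usd=0.0031,
    cache_hit=False,
)
attributes = span_attributes(example)
```

The resulting dictionary could then be passed to whichever OTEL SDK is chosen when the span is created or ended.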

## Alternatives you've considered

Polling the existing `/api/usage` endpoint from a custom exporter. This works, but it requires maintaining external glue code and does not expose real-time per-request latency.

## Who would benefit from this?

Any team running Routerly in production with an existing Prometheus/Grafana or Datadog stack. This is a standard requirement for infrastructure components in enterprise environments.

## Additional context

LiteLLM, Bifrost, and Kong all list Prometheus and OpenTelemetry support as first-class features. Bifrost specifically promotes sub-millisecond overhead with full OTEL tracing. A `/metrics` endpoint is also the most common ask from developers evaluating self-hosted gateways.

