# Grafana Mimir
Mimir is the long-term metrics storage backend. It provides 100% Prometheus compatibility (PromQL, remote_write API) while adding horizontal scalability, multi-tenancy, and durable object storage.
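For example, a Prometheus instance ships samples to Mimir with a standard `remote_write` block. A sketch (the gateway URL matches the datasource URL used later in this doc; the tenant name is illustrative):

```yaml
# prometheus.yml (sketch): forward all scraped samples to Mimir's gateway.
remote_write:
  - url: http://mimir-nginx.monitoring.svc.cluster.local:80/api/v1/push
    headers:
      # Mimir routes and isolates data by tenant via this header.
      X-Scope-OrgID: team-a   # illustrative tenant ID
```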
## Role in the Stack
| Function | Details |
|---|---|
| Long-term retention | Stores metrics in Azure Blob Storage for weeks/months |
| Horizontal scalability | Each component scales independently |
| Multi-tenancy | Isolates metrics by tenant via X-Scope-OrgID header |
| Prometheus compatibility | 100% PromQL — same queries work against both Prometheus and Mimir |
| Exemplar storage | Stores exemplars for trace correlation on long-term metrics |
| Caching | Memcached for chunks, query results, and metadata |
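Tenant isolation is visible on the read path too: every query must carry the `X-Scope-OrgID` header. A minimal Python sketch of building a tenant-scoped instant query (the gateway URL and tenant name are assumptions from this setup):

```python
import urllib.parse
import urllib.request

# Assumed in-cluster gateway URL (matches the Grafana datasource URL in this doc).
MIMIR_URL = "http://mimir-nginx.monitoring.svc.cluster.local:80/prometheus"

def build_query_request(promql: str, tenant: str) -> urllib.request.Request:
    """Build a Prometheus-compatible instant query scoped to one Mimir tenant."""
    params = urllib.parse.urlencode({"query": promql})
    req = urllib.request.Request(f"{MIMIR_URL}/api/v1/query?{params}")
    # Every request to Mimir must identify the tenant via X-Scope-OrgID.
    req.add_header("X-Scope-OrgID", tenant)
    return req

if __name__ == "__main__":
    req = build_query_request('up{job="node"}', tenant="team-a")
    print(req.full_url)
```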
## Why Mimir Over Prometheus Alone?
| Challenge with Prometheus | Mimir Solution |
|---|---|
| Limited retention (memory + local disk) | Object storage = unlimited retention |
| Memory constraints on high cardinality | Horizontal scaling of ingesters |
| No multi-tenancy | Built-in tenant isolation |
| Query latency on large datasets | Query splitting, parallelism, Memcached caching |
| Single point of failure | Replicated components, HA by design |
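On the read path, query splitting and result caching are configured on the query-frontend. A hedged sketch (the split interval and Memcached address are assumptions for this setup):

```yaml
# Mimir query-frontend config (sketch).
frontend:
  # Split long-range queries into per-day subqueries executed in parallel.
  split_queries_by_interval: 24h
  results_cache:
    backend: memcached
    memcached:
      # Assumed in-cluster Memcached service address.
      addresses: dns+memcached.monitoring.svc.cluster.local:11211
```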
## Why Not Only Mimir?
If Mimir solves all of Prometheus’s limitations, why keep Prometheus at all? Because they serve different roles:
| Concern | Why Prometheus Stays |
|---|---|
| Query latency | Prometheus serves recent metrics from memory — sub-millisecond response for the last few hours. Mimir must fetch from object storage for anything beyond the ingester buffer window |
| Alerting evaluation | Prometheus evaluates alert rules locally against in-memory data. Using Mimir for alerting adds network hops and dependency on the full Mimir stack being healthy |
| Operational simplicity | Prometheus is a single binary with no external dependencies. If Mimir’s ingesters, store-gateways, or object storage have issues, Prometheus still works |
| Bootstrapping | Prometheus can run without Mimir. Mimir cannot run without something writing metrics to it |
| Cost | Not every metric needs long-term retention. Prometheus handles short-lived, high-churn metrics cheaply without sending them to object storage |
In our setup the split is intentional:
- Prometheus = fast, local, always-available metrics for the last few hours + alerting
- Mimir = durable, scalable storage for long-term queries and dashboards
Both are exposed as separate Grafana datasources — use Prometheus for real-time dashboards and alerts, Mimir for historical analysis and capacity planning.
## Deployment — Microservices Mode
| Component | Replicas | CPU | RAM | Storage | Purpose |
|---|---|---|---|---|---|
| Nginx Gateway | 2 | 10m | 16Mi | — | Entry point, load balancing |
| Distributor | 2 | 100m | 256Mi | — | Receives writes, routes to ingesters |
| Ingester | 3 | 100m | 512Mi | 10Gi PV | Buffers in memory (~4h), flushes to object storage |
| Querier | 2 | 100m | 256Mi | — | Parallel query execution |
| Query Frontend | 2 | 100m | 256Mi | — | Query splitting, caching via Memcached |
| Query Scheduler | 2 | 10m | 32Mi | — | Fair queuing across tenants |
| Store Gateway | 2 | 100m | 256Mi | 10Gi PV | Reads historical blocks from object storage |
| Compactor | 1 | 100m | 256Mi | 10Gi PV | Merges blocks, enforces retention |
| Ruler | 1 | 100m | 256Mi | — | Evaluates recording rules and alerts |
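If the cluster is deployed with the `mimir-distributed` Helm chart (an assumption; a raw-manifest deployment would express the same sizing differently), the table above roughly corresponds to values like:

```yaml
# values.yaml (sketch): replica counts and requests from the sizing table.
distributor:
  replicas: 2
  resources:
    requests: {cpu: 100m, memory: 256Mi}
ingester:
  replicas: 3
  persistentVolume:
    size: 10Gi
  resources:
    requests: {cpu: 100m, memory: 512Mi}
store_gateway:
  replicas: 2
  persistentVolume:
    size: 10Gi
```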
## What Feeds Into Mimir
| Source | Signal | Path |
|---|---|---|
| Prometheus | All scraped metrics | Prometheus remote write → Mimir Nginx Gateway |
| Tempo | Span metrics (RED), service graphs, TraceQL metrics | Tempo metrics generator → Prometheus → Mimir (via remote write) |
## Storage
- Backend: Azure Blob Storage
- Containers: `mimir-storage` (blocks), `mimir-blocks`, `mimir-alertmanager`, `mimir-ruler`
- Format: Prometheus TSDB blocks
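A hedged sketch of the corresponding Mimir storage configuration (the storage account name and credential wiring are placeholders, not values from this deployment):

```yaml
# Mimir storage config (sketch): one Azure container per storage domain.
blocks_storage:
  backend: azure
  azure:
    account_name: <storage-account>        # placeholder
    account_key: ${AZURE_ACCOUNT_KEY}      # assumed to come from a secret
    container_name: mimir-blocks
ruler_storage:
  backend: azure
  azure:
    container_name: mimir-ruler
alertmanager_storage:
  backend: azure
  azure:
    container_name: mimir-alertmanager
```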
## Limits
| Limit | Value |
|---|---|
| Max global series per user | 150,000 |
| Max global series per metric | 20,000 |
| Ingestion rate | 10,000 samples/sec |
| Ingestion burst size | 200,000 |
| Out-of-order time window | 10 minutes |
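The table maps onto Mimir's per-tenant limits block; a sketch using Mimir's limit key names:

```yaml
# Mimir limits config (sketch) matching the table above.
limits:
  max_global_series_per_user: 150000
  max_global_series_per_metric: 20000
  ingestion_rate: 10000            # samples/sec
  ingestion_burst_size: 200000
  out_of_order_time_window: 10m
```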
## Integration with Other Components
Tempo metrics generator → Prometheus → Mimir — Tempo generates RED metrics (rate, errors, duration), service graphs, and TraceQL metrics from trace spans and writes them to Prometheus via remote write. Prometheus then forwards them to Mimir for long-term storage. This is the source data for Traces Drilldown and the Service Map in Grafana.
Grafana exemplars — Mimir stores exemplars (metric → trace ID links) enabling click-through from metric graphs to traces in Tempo.
## Grafana Datasource
- Type: `prometheus` (100% compatible)
- URL: `http://mimir-nginx.monitoring.svc.cluster.local:80/prometheus`
- Exemplars: Enabled — links to Tempo by TraceID
- Use for: Long-term queries, dashboards with wide time ranges, span-derived metrics
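Provisioned declaratively, the datasource might look like the sketch below (the Tempo datasource UID and the tenant-header wiring are assumptions, not values from this deployment):

```yaml
# Grafana datasource provisioning (sketch).
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    url: http://mimir-nginx.monitoring.svc.cluster.local:80/prometheus
    jsonData:
      # Forward the tenant header on every query (assumed single tenant).
      httpHeaderName1: X-Scope-OrgID
      exemplarTraceIdDestinations:
        - name: TraceID
          datasourceUid: tempo   # assumption: UID of the Tempo datasource
    secureJsonData:
      httpHeaderValue1: team-a   # assumed tenant ID
```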