Grafana Mimir

Mimir is the long-term metrics storage backend. It provides 100% Prometheus compatibility (PromQL, remote_write API) while adding horizontal scalability, multi-tenancy, and durable object storage.

Role in the Stack

| Function | Details |
| --- | --- |
| Long-term retention | Stores metrics in Azure Blob Storage for weeks to months |
| Horizontal scalability | Each component scales independently |
| Multi-tenancy | Isolates metrics by tenant via the `X-Scope-OrgID` header |
| Prometheus compatibility | 100% PromQL — same queries work against both Prometheus and Mimir |
| Exemplar storage | Stores exemplars for trace correlation on long-term metrics |
| Caching | Memcached for chunks, query results, and metadata |
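
Tenant isolation and the write path come together in Prometheus's `remote_write` block. The sketch below is illustrative, not our exact config: the gateway URL matches the service named elsewhere in this page, but the tenant ID and queue tuning are assumptions.

```yaml
# prometheus.yml fragment — ship scraped metrics to Mimir's gateway.
# Tenant ID and queue settings below are illustrative assumptions.
remote_write:
  - url: http://mimir-nginx.monitoring.svc.cluster.local:80/api/v1/push
    headers:
      X-Scope-OrgID: anonymous       # tenant ID; Mimir keys all series by this header
    send_exemplars: true             # forward exemplars so trace links survive in Mimir
    queue_config:
      max_samples_per_send: 2000     # batch size per remote-write shard
```

Because Mimir speaks the standard remote-write protocol, this is the only Prometheus-side change needed; scrape configs stay untouched.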

Why Mimir Over Prometheus Alone?

| Challenge with Prometheus | Mimir Solution |
| --- | --- |
| Limited retention (memory + local disk) | Object storage = unlimited retention |
| Memory constraints on high cardinality | Horizontal scaling of ingesters |
| No multi-tenancy | Built-in tenant isolation |
| Query latency on large datasets | Query splitting, parallelism, Memcached caching |
| Single point of failure | Replicated components, HA by design |

Why Not Only Mimir?

If Mimir solves all of Prometheus’s limitations, why keep Prometheus at all? Because they serve different roles:

| Concern | Why Prometheus Stays |
| --- | --- |
| Query latency | Prometheus serves recent metrics from memory — sub-millisecond response for the last few hours. Mimir must fetch from object storage for anything beyond the ingester buffer window |
| Alerting evaluation | Prometheus evaluates alert rules locally against in-memory data. Using Mimir for alerting adds network hops and a dependency on the full Mimir stack being healthy |
| Operational simplicity | Prometheus is a single binary with no external dependencies. If Mimir's ingesters, store-gateways, or object storage have issues, Prometheus still works |
| Bootstrapping | Prometheus can run without Mimir. Mimir cannot run without something writing metrics to it |
| Cost | Not every metric needs long-term retention. Prometheus handles short-lived, high-churn metrics cheaply without sending them to object storage |

In our setup the split is intentional:

  • Prometheus = fast, local, always-available metrics for the last few hours + alerting
  • Mimir = durable, scalable storage for long-term queries and dashboards

Both are exposed as separate Grafana datasources — use Prometheus for real-time dashboards and alerts, Mimir for historical analysis and capacity planning.

Deployment — Microservices Mode

| Component | Replicas | CPU | RAM | Storage | Purpose |
| --- | --- | --- | --- | --- | --- |
| Nginx Gateway | 2 | 10m | 16Mi | - | Entry point, load balancing |
| Distributor | 2 | 100m | 256Mi | - | Receives writes, routes to ingesters |
| Ingester | 3 | 100m | 512Mi | 10Gi PV | Buffers in memory (~4h), flushes to object storage |
| Querier | 2 | 100m | 256Mi | - | Parallel query execution |
| Query Frontend | 2 | 100m | 256Mi | - | Query splitting, caching via Memcached |
| Query Scheduler | 2 | 10m | 32Mi | - | Fair queuing across tenants |
| Store Gateway | 2 | 100m | 256Mi | 10Gi PV | Reads historical blocks from object storage |
| Compactor | 1 | 100m | 256Mi | 10Gi PV | Merges blocks, enforces retention |
| Ruler | 1 | 100m | 256Mi | - | Evaluates recording rules and alerts |

What Feeds Into Mimir

| Source | Signal | Path |
| --- | --- | --- |
| Prometheus | All scraped metrics | Prometheus remote write → Mimir Nginx Gateway |
| Tempo | Span metrics (RED), service graphs, TraceQL metrics | Tempo metrics generator → Prometheus → Mimir (via remote write) |

Storage

  • Backend: Azure Blob Storage
  • Containers: mimir-storage (blocks), mimir-blocks, mimir-alertmanager, mimir-ruler
  • Format: Prometheus TSDB blocks
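
A minimal Mimir configuration for this backend might look like the fragment below. The container name comes from the list above; the storage account, key reference, and local retention value are placeholders, not our actual settings.

```yaml
# mimir config fragment — TSDB blocks in Azure Blob Storage.
# Account name/key and retention_period are illustrative placeholders.
blocks_storage:
  backend: azure
  azure:
    account_name: <storage-account>
    account_key: ${AZURE_STORAGE_KEY}   # injected via secret, never committed
    container_name: mimir-blocks
  tsdb:
    retention_period: 13h               # how long ingesters keep shipped blocks locally
```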

Limits

| Limit | Value |
| --- | --- |
| Max global series per user | 150,000 |
| Max global series per metric | 20,000 |
| Ingestion rate | 10,000 samples/sec |
| Ingestion burst size | 200,000 |
| Out-of-order time window | 10 minutes |
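
These map directly onto Mimir's `limits` configuration block (shown here as defaults for all tenants; per-tenant overrides can be layered on via a runtime config file):

```yaml
# mimir config fragment — the limits from the table above.
limits:
  max_global_series_per_user: 150000
  max_global_series_per_metric: 20000
  ingestion_rate: 10000           # samples/sec per tenant
  ingestion_burst_size: 200000
  out_of_order_time_window: 10m
```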

Integration with Other Components

Tempo metrics generator → Prometheus → Mimir — Tempo generates RED metrics (rate, errors, duration), service graphs, and TraceQL metrics from trace spans and writes them to Prometheus via remote write. Prometheus then forwards them to Mimir for long-term storage. This is the source data for Traces Drilldown and the Service Map in Grafana.

Grafana exemplars — Mimir stores exemplars (metric → trace ID links) enabling click-through from metric graphs to traces in Tempo.

Grafana Datasource

  • Type: prometheus (100% compatible)
  • URL: http://mimir-nginx.monitoring.svc.cluster.local:80/prometheus
  • Exemplars: Enabled — links to Tempo by TraceID
  • Use for: Long-term queries, dashboards with wide time ranges, span-derived metrics
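
Provisioned as code, the datasource could look like this sketch. The datasource name and the Tempo datasource UID (`tempo`) are assumptions; the URL and exemplar behavior match the bullets above.

```yaml
# grafana provisioning fragment — Mimir as a Prometheus-type datasource.
# "Mimir" and datasourceUid "tempo" are illustrative names.
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir-nginx.monitoring.svc.cluster.local:80/prometheus
    jsonData:
      exemplarTraceIdDestinations:
        - name: traceID            # exemplar label carrying the trace ID
          datasourceUid: tempo     # click-through target: the Tempo datasource
```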
