Grafana Mimir
Mimir is the long-term metrics storage backend. It provides 100% Prometheus compatibility (PromQL, remote_write API) while adding horizontal scalability, multi-tenancy, and durable object storage.
Role in the Stack
| Function | Details |
|---|---|
| Long-term retention | Stores metrics in Azure Blob Storage for weeks/months |
| Horizontal scalability | Each component scales independently |
| Multi-tenancy | Isolates metrics by tenant via X-Scope-OrgID header |
| Prometheus compatibility | 100% PromQL — same queries work against both Prometheus and Mimir |
| Exemplar storage | Stores exemplars for trace correlation on long-term metrics |
| Caching | Memcached for chunks, query results, and metadata |
Why Mimir Over Prometheus Alone?
| Challenge with Prometheus | Mimir Solution |
|---|---|
| Limited retention (bounded by local disk) | Object storage = effectively unlimited retention, bounded by storage cost |
| Memory constraints on high cardinality | Horizontal scaling of ingesters |
| No multi-tenancy | Built-in tenant isolation |
| Query latency on large datasets | Query splitting, parallelism, Memcached caching |
| Single point of failure | Replicated components, HA by design |
Why Not Only Mimir?
If Mimir solves all of Prometheus’s limitations, why keep Prometheus at all? Because they serve different roles:
| Concern | Why Prometheus Stays |
|---|---|
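| Query latency | Prometheus answers queries over recent data locally, with the newest samples in its in-memory head block, so dashboards covering the last few hours respond quickly. Mimir adds network hops (gateway, query-frontend, queriers), and anything older than what the ingesters still hold is read from object storage via the store gateways |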
| Alerting evaluation | Prometheus evaluates alert rules locally against in-memory data. Using Mimir for alerting adds network hops and dependency on the full Mimir stack being healthy |
| Operational simplicity | Prometheus is a single binary with no external dependencies. If Mimir’s ingesters, store-gateways, or object storage have issues, Prometheus still works |
| Bootstrapping | Prometheus can run without Mimir. Mimir cannot run without something writing metrics to it |
| Cost | Not every metric needs long-term retention. Prometheus handles short-lived, high-churn metrics cheaply without sending them to object storage |
In our setup the split is intentional:
- Prometheus = fast, local, always-available metrics for the last few hours + alerting
- Mimir = durable, scalable storage for long-term queries and dashboards
Both are exposed as separate Grafana datasources — use Prometheus for real-time dashboards and alerts, Mimir for historical analysis and capacity planning.
Versions
| Component | Version |
|---|---|
| Chart | grafana/mimir-distributed 6.0.6 |
| Mimir | 3.0.4 |
| Strimzi Kafka Operator | strimzi/strimzi-kafka-operator chart 1.0.0 |
| Kafka | Apache Kafka 4.1.0 (quay.io/strimzi/kafka:1.0.0-kafka-4.1.0) |
Deployment — Microservices Mode (Kafka-backed ingest storage)
Mimir 3.0 (chart mimir-distributed 6.0+) introduced a new write-path architecture: distributors no longer push samples directly to ingesters over gRPC. Instead, distributors produce to a Kafka topic, and ingesters consume from it asynchronously. This decouples the write path from the ingester ring — a distributor’s POST /api/v1/push completes as soon as Kafka has accepted the write.
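The following is a rough sketch of how this write path could be switched on through the chart's `mimir.structuredConfig` values. The keys follow the Mimir `ingest_storage` configuration block as we understand it. The address and topic reuse the names given elsewhere on this page. Check the keys against the configuration reference for your Mimir version, because the chart may already set some of them by default.

```yaml
mimir:
  structuredConfig:
    ingest_storage:
      # Distributors produce to Kafka; ingesters consume from it.
      enabled: true
      kafka:
        # Strimzi bootstrap service described in the Kafka paragraph below.
        address: mimir-kafka-kafka-bootstrap.monitoring.svc.cluster.local:9092
        topic: mimir-ingest
```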
| Component | Replicas | CPU | RAM | Storage | Purpose |
|---|---|---|---|---|---|
| Nginx Gateway | 2 | 10m | 16Mi | — | Entry point, load balancing |
| Distributor | 2 | 100m | 256Mi | — | Validates writes, produces to Kafka topic mimir-ingest |
| Kafka (Strimzi) | 3 (KRaft mixed-mode) | 500m | 1Gi | 20Gi PV each | Durable write buffer between distributors and ingesters (100 partitions, RF=3, min.insync.replicas=2) |
| Ingester | 3 (zone-a/b/c) | 100m | 512Mi | 10Gi PV | Consumes its Kafka partitions, builds TSDB blocks, flushes to object storage |
| Querier | 2 | 100m | 256Mi | — | Parallel query execution |
| Query Frontend | 2 | 100m | 256Mi | — | Query splitting, caching via Memcached |
| Query Scheduler | 2 | 10m | 32Mi | — | Fair queuing across tenants |
| Store Gateway | 2 | 100m | 256Mi | 10Gi PV | Reads historical blocks from object storage |
| Compactor | 1 | 100m | 256Mi | 10Gi PV | Merges blocks, enforces retention |
| Ruler | 1 | 100m | 256Mi | — | Evaluates recording rules and alerts |
The Kafka cluster is provisioned by the Strimzi Kafka Operator (a CNCF project; operator chart strimzi/strimzi-kafka-operator 1.0.0). It runs three KRaft mixed-mode brokers. Each broker acts as both controller and broker, so there is no ZooKeeper. The broker defaults are `default.replication.factor: 3` and `min.insync.replicas: 2`. This 3-broker layout is enough to demonstrate the architecture and to survive the loss of one broker. Production deployments typically replace it with a dedicated Kafka cluster (Strimzi at scale, MSK, Confluent Cloud). Mimir distributors and ingesters connect to a single bootstrap service: `mimir-kafka-kafka-bootstrap.monitoring.svc.cluster.local:9092`.
What Feeds Into Mimir
| Source | Signal | Path |
|---|---|---|
| Prometheus | All scraped metrics | Prometheus remote write → mimir-gateway:80/api/v1/push → distributor → Kafka → ingester |
| Tempo | Span metrics (RED), service graphs, TraceQL metrics | Tempo metrics generator → Prometheus → Mimir (same Kafka write path) |
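Below is a minimal Prometheus `remote_write` sketch for the first row. It assumes the gateway service above and a hypothetical tenant ID `workshop`. If Mimir runs with multi-tenancy disabled, you can drop the header, and writes land in the `anonymous` tenant. `send_exemplars` only has an effect when Prometheus itself runs with exemplar storage enabled (`--enable-feature=exemplar-storage`). For the Tempo row, Prometheus must also accept incoming remote writes, which requires the `--web.enable-remote-write-receiver` flag.

```yaml
# prometheus.yml (fragment)
remote_write:
  - url: http://mimir-gateway.monitoring.svc.cluster.local:80/api/v1/push
    headers:
      X-Scope-OrgID: workshop   # hypothetical tenant ID
    send_exemplars: true        # forward exemplars for trace correlation
```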
Storage
- Backend: Azure Blob Storage
- Containers: `mimir-storage` (blocks), `mimir-blocks`, `mimir-alertmanager`, `mimir-ruler`
- Format: Prometheus TSDB blocks
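Here is a sketch of the matching Mimir storage settings. It assumes shared account credentials in `common.storage` and one container per component. The account name and key are placeholders; in practice the key would come from a Kubernetes Secret. The mapping of containers to components is an assumption based on the names listed above.

```yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: azure
        azure:
          account_name: <storage-account-name>   # placeholder
          account_key: <storage-account-key>     # placeholder; source from a Secret
    # Each section inherits the common settings and overrides the container.
    blocks_storage:
      azure:
        container_name: mimir-blocks
    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager
    ruler_storage:
      azure:
        container_name: mimir-ruler
```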
Limits
| Limit | Value |
|---|---|
| Max global series per tenant | 150,000 |
| Max global series per metric | 20,000 |
| Ingestion rate (per tenant) | 10,000 samples/sec |
| Ingestion burst size (per tenant) | 200,000 samples |
| Out-of-order time window | 10 minutes |
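The same limits can be expressed as Mimir configuration, sketched below. They can also be set per tenant through runtime overrides. Mimir does not store exemplars unless `max_global_exemplars_per_user` is above zero, so the exemplar storage listed under Role in the Stack needs that limit too. The exemplar value below is illustrative and is not taken from this deployment.

```yaml
mimir:
  structuredConfig:
    limits:
      max_global_series_per_user: 150000
      max_global_series_per_metric: 20000
      ingestion_rate: 10000                   # samples/sec per tenant
      ingestion_burst_size: 200000            # samples
      out_of_order_time_window: 10m
      max_global_exemplars_per_user: 100000   # illustrative; 0 disables exemplars
```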
Integration with Other Components
Tempo metrics generator → Prometheus → Mimir — Tempo generates RED metrics (rate, errors, duration), service graphs, and TraceQL metrics from trace spans and writes them to Prometheus via remote write. Prometheus then forwards them to Mimir for long-term storage. This is the source data for Traces Drilldown and the Service Map in Grafana.
Grafana exemplars — Mimir stores exemplars (metric → trace ID links) enabling click-through from metric graphs to traces in Tempo.
Grafana Datasource
- Type: `prometheus` (100% compatible)
- URL: `http://mimir-gateway.monitoring.svc.cluster.local:80/prometheus`
- Exemplars: Enabled, linking to Tempo by TraceID
- Use for: Long-term queries, dashboards with wide time ranges, span-derived metrics
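One way to provision this datasource is sketched below in Grafana's datasource provisioning format. The following values are assumptions; match them to what your instrumentation and Tempo datasource actually use:
- the tenant ID `workshop`
- the exemplar label `trace_id`
- the Tempo datasource UID `tempo`

```yaml
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir-gateway.monitoring.svc.cluster.local:80/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID
      exemplarTraceIdDestinations:
        - name: trace_id          # exemplar label holding the trace ID (assumption)
          datasourceUid: tempo    # hypothetical Tempo datasource UID
    secureJsonData:
      httpHeaderValue1: workshop  # hypothetical tenant ID
```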
Auto-Scaling Best Practices
Mimir is built for horizontal scaling in microservices mode. Each component can be independently auto-scaled with Kubernetes HPA.
Which Components to Auto-Scale
| Component | Auto-scalable? | Scale trigger | Notes |
|---|---|---|---|
| Distributor | ✅ Yes | CPU, incoming sample rate | Stateless — scale freely |
| Ingester | ⚠️ With care | Memory, active series | Stateful — ring member, holds data in memory (~4h before flush) |
| Querier | ✅ Yes | CPU, query queue depth | Stateless — more queriers = faster query execution |
| Query Frontend | ⚠️ Rarely needed | — | 2 replicas usually enough — splits queries, doesn’t execute them |
| Query Scheduler | ❌ No | — | Lightweight, 2 replicas fixed |
| Store Gateway | ✅ Yes | CPU, memory | Stateful (caches blocks), but supports ring-based sharding |
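| Compactor | ❌ No | — | Not a singleton: Mimir can shard compaction across replicas via the compactor ring, but the work is periodic and batch-like, so size replicas manually instead of with an HPA |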
| Ruler | ✅ Yes | CPU, number of rules | Uses ring for rule sharding |
HPA Examples
Distributor:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-distributor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mimir-distributor
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Stateless: scale on average CPU utilization across distributor pods.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
Querier:
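```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-querier
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mimir-querier
  minReplicas: 2
  maxReplicas: 15
  metrics:
    # The queue length is exported by the query-scheduler pods, not the queriers,
    # so it must be an External metric (served by a metrics adapter), not a Pods metric.
    # With AverageValue, the HPA divides the total by the querier replica count.
    - type: External
      external:
        metric:
          name: cortex_query_scheduler_queue_length
        target:
          type: AverageValue
          averageValue: "5"
```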
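Whichever metric type the querier HPA uses, `cortex_query_scheduler_queue_length` reaches it only through a metrics adapter in front of Prometheus. Below is a sketch of an external-metrics rule for the prometheus-adapter Helm chart. It assumes the adapter already points at the Prometheus instance that scrapes Mimir. Summing the queue across scheduler pods and tenants per namespace is a design choice; Mimir does not prescribe it.

```yaml
# prometheus-adapter Helm values (fragment)
rules:
  external:
    - seriesQuery: 'cortex_query_scheduler_queue_length{namespace!=""}'
      resources:
        overrides:
          namespace: { resource: "namespace" }
      # Total queued queries across all scheduler pods and tenants in the namespace.
      metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```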
Ingester Scaling — Critical Considerations
Mimir ingesters under the Kafka-backed ingest_storage architecture are still stateful, but their durability story is different from the classic architecture: durability is provided by Kafka, not by the ingester ring. An ingester restart no longer risks data loss for samples already accepted by the distributor — those samples are persisted in the Kafka topic. The ingester’s job is to consume its assigned partitions and turn them into TSDB blocks.
Partition ring
Each ingester registers in an ingester-partitions ring (separate from the legacy ingester ring used for query-time lookups). A partition transitions from PartitionPending → PartitionActive once a minimum number of owners (default 1) have been registered for the minimum waiting time (default 10s). Distributors only produce to active partitions.
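The two thresholds above correspond to partition-ring settings on the ingesters. The sketch below spells out the defaults. It assumes the `ingester.partition_ring` keys from the Mimir configuration reference, so verify them for your version.

```yaml
mimir:
  structuredConfig:
    ingester:
      partition_ring:
        # A partition becomes Active once it has at least this many owners...
        min_partition_owners_count: 1
        # ...registered for at least this long.
        min_partition_owners_duration: 10s
```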
If the partition ring is empty, distributors reject writes with DoBatch: InstancesCount <= 0. This is the symptom of either: no ingesters running, ingesters stuck before partition registration, or stale PVC state from a previous classic-architecture deployment.
Safe scale-up:
- A new ingester joins the partition ring and owns the partition matching its ordinal; in the zone-aware layout, the ingesters with the same ordinal in zone-a/b/c consume the same partition
- No live data migration between ingesters: Kafka holds the source of truth
- The topic must have at least as many partitions as there are ingesters per zone (chart default: 100 partitions, comfortable up to ~100 ingesters per zone)
Safe scale-down:
- Ingester unsubscribes from its Kafka partitions on shutdown
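- When a scale-down removes the last owners of a partition (the same ordinal in every zone), first switch that partition from Active to inactive so distributors stop producing to it. Otherwise writes keep landing in a partition that no ingester consumes. The Mimir rollout-operator can coordinate this during downscales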
- A generous `terminationGracePeriodSeconds` is still recommended (e.g., 300s) so the ingester can flush its current TSDB head block to object storage before exiting. If it cannot, the samples are replayed from Kafka when the partition is consumed again (some duplicate I/O, and no data loss as long as Kafka still retains those records)
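```yaml
# Pod template spec (fragment)
terminationGracePeriodSeconds: 300
containers:
  - name: ingester
    lifecycle:
      preStop:
        # Ask the ingester to flush and leave the ring before the container stops.
        httpGet:
          path: /ingester/shutdown
          port: http-metrics
```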
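Node drains and other voluntary evictions are throttled by a PodDisruptionBudget, not by the grace period. Below is a sketch for the ingesters. It assumes the chart's `app.kubernetes.io/component: ingester` label, so check the labels on your pods. Skip it if the chart already renders an equivalent PDB.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mimir-ingester
  namespace: monitoring
spec:
  # Voluntary evictions (e.g. node drains) take down at most one ingester at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: ingester   # assumed label
```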
Store Gateway Scaling
Store gateways cache block metadata and index data on local disk. Scaling them improves query performance for historical data.
- Uses ring-based sharding — each store gateway is responsible for a subset of blocks
- New replicas join the ring and gradually take ownership of blocks
- Scale based on query latency for wide time-range queries
- Ensure PVC provisioning is fast — slow disk attachment delays scale-up
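Below is a sketch of a store gateway HPA using the memory trigger from the table above. The StatefulSet name is an assumption; zone-aware deployments split it into one StatefulSet per zone. The long scale-down stabilization window is a deliberate choice to avoid repeatedly re-sharding blocks across the ring.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-store-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: mimir-store-gateway      # assumed name
  minReplicas: 2
  maxReplicas: 6
  behavior:
    scaleDown:
      # Use the highest replica recommendation from the last 30 minutes when scaling down.
      stabilizationWindowSeconds: 1800
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```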
Key Metrics for Auto-Scaling
```promql
# Distributor: incoming samples/sec
sum(rate(cortex_distributor_received_samples_total[5m]))

# Ingester: active series (primary memory driver)
cortex_ingester_active_series

# Querier: scheduler queue depth
cortex_query_scheduler_queue_length

# Store Gateway: p99 block load time (the metric is a histogram)
histogram_quantile(0.99, sum by (le) (rate(cortex_bucket_store_block_load_duration_seconds_bucket[5m])))

# Overall write health: push requests/sec by status code
sum by (status_code) (rate(cortex_request_duration_seconds_count{route="api_v1_push"}[5m]))
```
General Guidelines
- Distributors are the easiest win — stateless, scale aggressively on CPU or incoming sample rate
- Queriers are the second priority — scale on queue depth for faster query response
- Ingesters still need a PodDisruptionBudget (`maxUnavailable: 1`) so node drains and other voluntary evictions take out at most one ingester at a time (a PDB does not throttle the StatefulSet's own rolling updates)
- Set `minReplicas` ≥ 1 per zone for zone-aware ingesters; with ingest_storage, durability comes from Kafka, not from the ingester count
- Kafka is the new bottleneck on the write path: monitor ingester consumer lag, Kafka write latency, and broker resource pressure. Scale the partition count or broker resources before scaling ingesters
- Store gateways benefit from more replicas when query latency on historical data is high
- Write path and read path scale independently — ingestion spikes don’t correlate with query load
- Monitor partition-ring health after every scaling event with `cortex_ingester_partition_ring_partitions`; partitions stuck in `PartitionPending` mean ownership isn't being claimed
- Use KEDA for custom metric-based scaling (e.g., consumer lag, active series) when HPA with Prometheus adapter is too complex
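For the KEDA route, a ScaledObject that scales the distributors on the incoming sample rate could look like the sketch below. The Prometheus address and the per-replica threshold are assumptions, not values from this deployment. KEDA creates and manages the underlying HPA itself, so this would replace the distributor HPA shown earlier rather than run alongside it.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: mimir-distributor
  namespace: monitoring
spec:
  scaleTargetRef:
    name: mimir-distributor        # Deployment
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090   # assumed
        # Cluster-wide samples/sec received by all distributors.
        query: sum(rate(cortex_distributor_received_samples_total[5m]))
        # Target samples/sec per distributor replica (illustrative).
        threshold: "5000"
```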