Grafana Mimir

Mimir is the long-term metrics storage backend. It provides 100% Prometheus compatibility (PromQL, remote_write API) while adding horizontal scalability, multi-tenancy, and durable object storage.

Role in the Stack

| Function | Details |
| --- | --- |
| Long-term retention | Stores metrics in Azure Blob Storage for weeks/months |
| Horizontal scalability | Each component scales independently |
| Multi-tenancy | Isolates metrics by tenant via X-Scope-OrgID header |
| Prometheus compatibility | 100% PromQL: the same queries work against both Prometheus and Mimir |
| Exemplar storage | Stores exemplars for trace correlation on long-term metrics |
| Caching | Memcached for chunks, query results, and metadata |

Why Mimir Over Prometheus Alone?

| Challenge with Prometheus | Mimir Solution |
| --- | --- |
| Limited retention (memory + local disk) | Object storage = unlimited retention |
| Memory constraints on high cardinality | Horizontal scaling of ingesters |
| No multi-tenancy | Built-in tenant isolation |
| Query latency on large datasets | Query splitting, parallelism, Memcached caching |
| Single point of failure | Replicated components, HA by design |

Why Not Only Mimir?

If Mimir solves all of Prometheus’s limitations, why keep Prometheus at all? Because they serve different roles:

| Concern | Why Prometheus Stays |
| --- | --- |
| Query latency | Prometheus serves recent metrics from memory, with sub-millisecond response for the last few hours. Mimir must fetch from object storage for anything beyond the ingester buffer window |
| Alerting evaluation | Prometheus evaluates alert rules locally against in-memory data. Using Mimir for alerting adds network hops and a dependency on the full Mimir stack being healthy |
| Operational simplicity | Prometheus is a single binary with no external dependencies. If Mimir’s ingesters, store-gateways, or object storage have issues, Prometheus still works |
| Bootstrapping | Prometheus can run without Mimir. Mimir cannot run without something writing metrics to it |
| Cost | Not every metric needs long-term retention. Prometheus handles short-lived, high-churn metrics cheaply without sending them to object storage |

In our setup the split is intentional:

  • Prometheus = fast, local, always-available metrics for the last few hours + alerting
  • Mimir = durable, scalable storage for long-term queries and dashboards

Both are exposed as separate Grafana datasources — use Prometheus for real-time dashboards and alerts, Mimir for historical analysis and capacity planning.

Versions

   
| Component | Version |
| --- | --- |
| Chart | grafana/mimir-distributed 6.0.6 |
| Mimir | 3.0.4 |
| Strimzi Kafka Operator | strimzi/strimzi-kafka-operator chart 1.0.0 |
| Kafka | Apache Kafka 4.1.0 (quay.io/strimzi/kafka:1.0.0-kafka-4.1.0) |

Deployment — Microservices Mode (Kafka-backed ingest storage)

Mimir 3.0 (chart mimir-distributed 6.0+) introduced a new write-path architecture: distributors no longer push samples directly to ingesters over gRPC. Instead, distributors produce to a Kafka topic, and ingesters consume from it asynchronously. This decouples the write path from the ingester ring — a distributor’s POST /api/v1/push completes as soon as Kafka has accepted the write.

| Component | Replicas | CPU | RAM | Storage | Purpose |
| --- | --- | --- | --- | --- | --- |
| Nginx Gateway | 2 | 10m | 16Mi | | Entry point, load balancing |
| Distributor | 2 | 100m | 256Mi | | Validates writes, produces to Kafka topic mimir-ingest |
| Kafka (Strimzi) | 3 (KRaft mixed-mode) | 500m | 1Gi | 20Gi PV each | Durable write buffer between distributors and ingesters (100 partitions, RF=3, min.insync.replicas=2) |
| Ingester | 3 (zone-a/b/c) | 100m | 512Mi | 10Gi PV | Consumes its Kafka partitions, builds TSDB blocks, flushes to object storage |
| Querier | 2 | 100m | 256Mi | | Parallel query execution |
| Query Frontend | 2 | 100m | 256Mi | | Query splitting, caching via Memcached |
| Query Scheduler | 2 | 10m | 32Mi | | Fair queuing across tenants |
| Store Gateway | 2 | 100m | 256Mi | 10Gi PV | Reads historical blocks from object storage |
| Compactor | 1 | 100m | 256Mi | 10Gi PV | Merges blocks, enforces retention |
| Ruler | 1 | 100m | 256Mi | | Evaluates recording rules and alerts |

The Kafka cluster is provisioned by the Strimzi Kafka Operator (CNCF, operator chart strimzi/strimzi-kafka-operator 1.0.0). Three KRaft mixed-mode brokers (each plays both controller and broker roles — no Zookeeper), default.replication.factor: 3, min.insync.replicas: 2. Production deployments swap this for a fully separate Kafka cluster (Strimzi at scale, MSK, Confluent Cloud); the workshop’s 3-broker layout is enough to demonstrate the architecture and survive one broker loss. The Mimir distributors and ingesters connect to a single bootstrap service: mimir-kafka-kafka-bootstrap.monitoring.svc.cluster.local:9092.
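
As a rough illustration of how the distributors and ingesters could be pointed at this cluster, the fragment below sketches the relevant Mimir settings as Helm values. The bootstrap address and topic name come from this page. Nesting them under `mimir.structuredConfig`, and the `ingest_storage` option names, are assumptions to verify against the configuration reference for the deployed Mimir version.

```yaml
# Helm values fragment for grafana/mimir-distributed (sketch, not the full values file).
mimir:
  structuredConfig:
    ingest_storage:
      enabled: true
      kafka:
        # Strimzi bootstrap service named in the paragraph above.
        address: mimir-kafka-kafka-bootstrap.monitoring.svc.cluster.local:9092
        # Topic the distributors produce to and the ingesters consume from.
        topic: mimir-ingest
```

Because both components read the same block, a wrong address or topic tends to show up on both sides at once: distributors fail to produce and ingesters never receive data.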

What Feeds Into Mimir

| Source | Signal | Path |
| --- | --- | --- |
| Prometheus | All scraped metrics | Prometheus remote write → mimir-gateway:80/api/v1/push → distributor → Kafka → ingester |
| Tempo | Span metrics (RED), service graphs, TraceQL metrics | Tempo metrics generator → Prometheus → Mimir (same Kafka write path) |
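
The Prometheus leg of this path could be configured roughly as follows. This is a minimal sketch: the fully qualified gateway hostname is inferred from the Grafana datasource URL on this page, and the tenant ID `workshop` is a placeholder, not a value taken from the actual deployment.

```yaml
# prometheus.yml fragment (sketch).
remote_write:
  - url: http://mimir-gateway.monitoring.svc.cluster.local:80/api/v1/push
    # Forward exemplars so Mimir can store metric-to-trace links.
    send_exemplars: true
    headers:
      # Tenant ID sent with every write; "workshop" is a placeholder.
      X-Scope-OrgID: workshop
```

If multi-tenancy is disabled in Mimir, the `headers` block can be dropped. `send_exemplars` only has an effect when exemplar storage is enabled in Prometheus itself.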

Storage

  • Backend: Azure Blob Storage
  • Containers: mimir-storage (blocks), mimir-blocks, mimir-alertmanager, mimir-ruler
  • Format: Prometheus TSDB blocks

Limits

| Limit | Value |
| --- | --- |
| Max global series per user | 150,000 |
| Max global series per metric | 20,000 |
| Ingestion rate | 10,000 samples/sec |
| Ingestion burst size | 200,000 |
| Out-of-order time window | 10 minutes |
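
Expressed as Mimir configuration, these limits would look something like the fragment below. The values are the ones listed above. Setting them through `mimir.structuredConfig` in the Helm values is an assumption about how this deployment is wired.

```yaml
# Helm values fragment (sketch): default limits applied to every tenant.
mimir:
  structuredConfig:
    limits:
      max_global_series_per_user: 150000
      max_global_series_per_metric: 20000
      ingestion_rate: 10000          # samples per second
      ingestion_burst_size: 200000   # samples allowed in a single burst
      out_of_order_time_window: 10m
```

Per-tenant exceptions would normally go in runtime overrides rather than in these defaults.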

Integration with Other Components

Tempo metrics generator → Prometheus → Mimir — Tempo generates RED metrics (rate, errors, duration), service graphs, and TraceQL metrics from trace spans and writes them to Prometheus via remote write. Prometheus then forwards them to Mimir for long-term storage. This is the source data for Traces Drilldown and the Service Map in Grafana.
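
The first hop (Tempo to Prometheus) might be configured along these lines. This is a sketch only: the Prometheus service name and the WAL path are hypothetical, and Prometheus has to be started with `--web.enable-remote-write-receiver` for it to accept these writes.

```yaml
# Tempo configuration fragment (sketch).
metrics_generator:
  storage:
    # Local WAL directory for generated metrics; the path is a placeholder.
    path: /var/tempo/generator/wal
    remote_write:
      # Hypothetical Prometheus service name, not taken from this page.
      - url: http://prometheus-server.monitoring.svc.cluster.local:9090/api/v1/write
        # Keep exemplars attached so the trace links survive the hop.
        send_exemplars: true
```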

Grafana exemplars — Mimir stores exemplars (metric → trace ID links) enabling click-through from metric graphs to traces in Tempo.

Grafana Datasource

  • Type: prometheus (100% compatible)
  • URL: http://mimir-gateway.monitoring.svc.cluster.local:80/prometheus
  • Exemplars: Enabled — links to Tempo by TraceID
  • Use for: Long-term queries, dashboards with wide time ranges, span-derived metrics
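
In Grafana provisioning terms, the datasource above could be declared roughly like this. The URL and type come from this page. The datasource name, the Tempo datasource UID (`tempo`), the exemplar label name, and the tenant ID are assumptions for illustration.

```yaml
# Grafana datasource provisioning file (sketch).
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir-gateway.monitoring.svc.cluster.local:80/prometheus
    jsonData:
      # Send the tenant header on every query; its value lives in secureJsonData.
      httpHeaderName1: X-Scope-OrgID
      exemplarTraceIdDestinations:
        # Exemplar label that carries the trace ID (label name assumed).
        - name: TraceID
          # UID of the Tempo datasource (assumed to be "tempo").
          datasourceUid: tempo
    secureJsonData:
      httpHeaderValue1: workshop   # placeholder tenant ID
```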

Auto-Scaling Best Practices

Mimir is built for horizontal scaling in microservices mode. Each component can be independently auto-scaled with Kubernetes HPA.

Which Components to Auto-Scale

| Component | Auto-scalable? | Scale trigger | Notes |
| --- | --- | --- | --- |
| Distributor | ✅ Yes | CPU, incoming sample rate | Stateless: scale freely |
| Ingester | ⚠️ With care | Memory, active series | Stateful: ring member, holds data in memory (~4h before flush) |
| Querier | ✅ Yes | CPU, query queue depth | Stateless: more queriers = faster query execution |
| Query Frontend | ⚠️ Rarely needed | | 2 replicas usually enough; splits queries, doesn’t execute them |
| Query Scheduler | ❌ No | | Lightweight, 2 replicas fixed |
| Store Gateway | ✅ Yes | CPU, memory | Stateful (caches blocks), but supports ring-based sharding |
| Compactor | ❌ No | | Singleton: one instance per tenant shard |
| Ruler | ✅ Yes | CPU, number of rules | Uses ring for rule sharding |

HPA Examples

Distributor:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-distributor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mimir-distributor
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Querier:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-querier
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mimir-querier
  minReplicas: 2
  maxReplicas: 15
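  metrics:
    # The queue length is reported by the query-scheduler pods, not by the
    # queriers, so it is consumed as an External metric. This requires a
    # metrics adapter (e.g., prometheus-adapter) that exposes it.
    - type: External
      external:
        metric:
          name: cortex_query_scheduler_queue_length
        target:
          type: AverageValue
          averageValue: "5"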

Ingester Scaling — Critical Considerations

Mimir ingesters under the Kafka-backed ingest_storage architecture are still stateful, but their durability story is different from the classic architecture: durability is provided by Kafka, not by the ingester ring. An ingester restart no longer risks data loss for samples already accepted by the distributor — those samples are persisted in the Kafka topic. The ingester’s job is to consume its assigned partitions and turn them into TSDB blocks.

Partition ring

Each ingester registers in an ingester-partitions ring (separate from the legacy ingester ring used for query-time lookups). A partition transitions from PartitionPending to PartitionActive once a minimum number of owners (default 1) has been registered for the minimum waiting time (default 10s). Distributors only produce to active partitions.

If the partition ring is empty, distributors reject writes with DoBatch: InstancesCount <= 0. This usually points to one of three causes: no ingesters are running, ingesters are stuck before partition registration, or stale PVC state is left over from a previous classic-architecture deployment.
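
The two activation thresholds described above correspond, as far as the Mimir configuration reference goes, to the `partition_ring` options sketched below. The values shown are the defaults quoted on this page. Treat the exact key names and the Helm nesting as assumptions to confirm for the deployed version.

```yaml
# Helm values fragment (sketch): when a pending partition may become active.
mimir:
  structuredConfig:
    ingester:
      partition_ring:
        # Owners required before a pending partition can switch to active.
        min_partition_owners_count: 1
        # How long that owner count must hold before the switch happens.
        min_partition_owners_duration: 10s
```

Raising either value delays the point at which distributors start producing to a newly created partition.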

Safe scale-up:

  • New ingester joins the partition ring and claims its share of partitions
  • No live data migration between ingesters — Kafka holds the source of truth
  • Topic must have enough partitions for the target ingester count (chart default: 100 partitions, comfortable up to ~100 ingesters)

Safe scale-down:

  • Ingester unsubscribes from its Kafka partitions on shutdown
  • Active partitions are reclaimed by surviving owners after the heartbeat-timeout window
  • A generous terminationGracePeriodSeconds (e.g., 300s) is still recommended so the ingester can flush its current TSDB head block to object storage before exiting. If it cannot, the new owner rebuilds the block from Kafka (some duplicate I/O, no data loss)

Example pod spec settings:

terminationGracePeriodSeconds: 300
lifecycle:
  preStop:
    httpGet:
      path: /ingester/shutdown
      port: http-metrics

Store Gateway Scaling

Store gateways cache block metadata and index data on local disk. Scaling them improves query performance for historical data.

  • Uses ring-based sharding — each store gateway is responsible for a subset of blocks
  • New replicas join the ring and gradually take ownership of blocks
  • Scale based on query latency for wide time-range queries
  • Ensure PVC provisioning is fast — slow disk attachment delays scale-up
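
A store-gateway HPA following the same pattern as the examples above might look like this. It is a sketch under assumptions: the StatefulSet name, the replica bounds, and the memory target are illustrative, and zone-aware installs typically create one StatefulSet per zone, each of which would need its own HPA.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-store-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: mimir-store-gateway   # assumed name
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
  behavior:
    scaleDown:
      # Wait before removing replicas so block ownership does not churn.
      stabilizationWindowSeconds: 600
```

The long scale-down window reflects the ring behavior described above: every membership change reshuffles block ownership, so frequent flapping is more expensive here than for stateless components.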

Key Metrics for Auto-Scaling

# Distributor — incoming samples/sec
rate(cortex_distributor_received_samples_total[5m])

# Ingester — active series (primary memory driver)
cortex_ingester_active_series

# Querier — queue depth
cortex_query_scheduler_queue_length

# Store Gateway — block load time
cortex_bucket_store_block_load_duration_seconds

# Overall write health
rate(cortex_request_duration_seconds_count{route="/api/v1/push"}[5m])
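
To make these signals cheaper to query from an autoscaler, they could be pre-aggregated with recording rules. The rule file below is a sketch: the group name and the recorded series names are made up for illustration, while the expressions reuse the metrics listed above.

```yaml
# Prometheus / Mimir ruler rule file (sketch).
groups:
  - name: mimir-autoscaling-signals
    rules:
      # Cluster-wide incoming samples per second.
      - record: mimir:distributor_received_samples:rate5m
        expr: sum(rate(cortex_distributor_received_samples_total[5m]))
      # Active series summed across all ingesters (replicas are counted too).
      - record: mimir:ingester_active_series:sum
        expr: sum(cortex_ingester_active_series)
      # Total queries waiting in the scheduler queues.
      - record: mimir:query_scheduler_queue_length:sum
        expr: sum(cortex_query_scheduler_queue_length)
      # Push request rate on the write path.
      - record: mimir:push_requests:rate5m
        expr: sum(rate(cortex_request_duration_seconds_count{route="/api/v1/push"}[5m]))
```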

General Guidelines

  1. Distributors are the easiest win — stateless, scale aggressively on CPU or incoming sample rate
  2. Queriers are the second priority — scale on queue depth for faster query response
  3. Ingesters still need a PodDisruptionBudget (maxUnavailable: 1) so partition ownership can rebalance cleanly during rolling updates
  4. Set minReplicas ≥ 1 per zone for zone-aware ingesters. With ingest_storage, durability comes from Kafka, so the ingester count is sized for consumption throughput and query load rather than for data safety
  5. Kafka is the new bottleneck on the write path: monitor ingester consumer lag, produce latency, and broker resource pressure, and scale partition count or broker resources before scaling ingesters
  6. Store gateways benefit from more replicas when query latency on historical data is high
  7. Write path and read path scale independently — ingestion spikes don’t correlate with query load
  8. Monitor partition-ring health after every scaling event: cortex_ingester_partition_ring_partitions — partitions stuck in PartitionPending mean ownership isn’t being claimed
  9. Use KEDA for custom metric-based scaling (e.g., consumer lag, active series) when HPA with Prometheus adapter is too complex
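
Guidelines 3 and 9 can be sketched as the two manifests below. Both are illustrative: the pod label selector, the Prometheus address, and the resource names are assumptions, and the KEDA ScaledObject is meant as an alternative to the querier HPA shown earlier, not something to apply alongside it.

```yaml
# Guideline 3: allow at most one ingester to be voluntarily disrupted at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mimir-ingester
  namespace: monitoring
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: ingester   # assumed label
---
# Guideline 9: scale queriers on scheduler queue depth via KEDA.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: mimir-querier
  namespace: monitoring
spec:
  scaleTargetRef:
    name: mimir-querier        # Deployment is the default target kind
  minReplicaCount: 2
  maxReplicaCount: 15
  triggers:
    - type: prometheus
      metadata:
        # Hypothetical Prometheus address, not taken from this page.
        serverAddress: http://prometheus-server.monitoring.svc.cluster.local:9090
        query: sum(cortex_query_scheduler_queue_length)
        threshold: "5"
```

KEDA creates and manages its own HPA for the target, which is why a hand-written HPA on the same Deployment would conflict with it.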
