On-Premise Deployment

Running the Grafana Stack On-Premise

The entire Grafana observability stack (Alloy, Prometheus, Mimir, Loki, Tempo, Pyroscope, Grafana) is open-source and can run fully on-premise — no cloud dependency required. This section covers what changes compared to a cloud-hosted deployment and the key decisions you need to make.

Cloud vs On-Premise — What Changes

| Concern | Cloud (e.g. AKS + Azure Blob) | On-Premise |
|---|---|---|
| Kubernetes | Managed (AKS, EKS, GKE) | Self-managed (kubeadm, k3s, RKE2, OpenShift) |
| Object storage | Azure Blob, S3, GCS | Ceph RGW, SeaweedFS, or S3-compatible appliance |
| Load balancing | Cloud LB / Ingress controller | MetalLB, HAProxy, or hardware LB |
| Certificate management | Let’s Encrypt + cloud DNS | Internal CA, cert-manager with custom issuer |
| Node provisioning | Auto-scaling node pools | Fixed hardware, manual capacity planning |
| DNS | Cloud DNS / Cloudflare | Internal DNS (CoreDNS, BIND, Active Directory) |

Object Storage — The Critical Decision

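Mimir, Loki, Tempo, and Pyroscope all use object storage for long-term data. All four can talk to it through the S3 API, so any S3-compatible store works. Compared with a cloud deployment, the configuration differs mainly in the endpoint URL and credentials (and in the backend type, if the cloud deployment used Azure Blob or GCS).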

| Component | What It Stores | Storage Impact |
|---|---|---|
| Mimir | Metric blocks (TSDB) | Grows with number of active series × retention |
| Loki | Log chunks + index | Grows with log volume × retention |
| Tempo | Trace spans (Parquet) | Grows with trace volume × retention |
| Pyroscope | Profile data | Grows with number of profiled services × retention |
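
As a rough illustration of the "volume × retention" relationship in the table, the sketch below estimates object-storage needs for metrics and logs. The function names, the 15-second scrape interval, the 2 bytes per sample, the 30-day retention, and the 5x log compression ratio are all assumptions chosen for illustration, not measurements; replace them with figures from your own environment. Replication inside the object store and index overhead are not included.

```python
SECONDS_PER_DAY = 86_400


def estimate_metric_storage_gb(active_series: int,
                               scrape_interval_s: float,
                               retention_days: int,
                               bytes_per_sample: float) -> float:
    """Active series x samples per day x retention x bytes per sample, in GB."""
    samples_per_day = active_series * SECONDS_PER_DAY / scrape_interval_s
    return samples_per_day * retention_days * bytes_per_sample / 1e9


def estimate_log_storage_gb(ingest_gb_per_day: float,
                            retention_days: int,
                            compression_ratio: float) -> float:
    """Daily log volume x retention, reduced by an assumed compression ratio."""
    return ingest_gb_per_day * retention_days / compression_ratio


# Assumed inputs: 100k active series and 10 GB logs/day, kept for 30 days.
metrics_gb = estimate_metric_storage_gb(
    active_series=100_000,
    scrape_interval_s=15,   # assumption
    retention_days=30,      # assumption
    bytes_per_sample=2,     # assumption
)
logs_gb = estimate_log_storage_gb(
    ingest_gb_per_day=10,
    retention_days=30,      # assumption
    compression_ratio=5,    # assumption
)
print(f"metrics: ~{metrics_gb:.1f} GB, logs: ~{logs_gb:.1f} GB")
```

Treat the result as a lower bound for planning: on fixed hardware it is usually safer to provision with headroom and then compare the estimate against measured bucket growth.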

S3-Compatible Storage Options

| Solution | Architecture | Best For | License | Status |
|---|---|---|---|---|
| MinIO | Single binary or distributed cluster | Widest community adoption, most docs/tutorials | AGPLv3 | Archived (archive mode, no new features; existing releases still functional) |
| Ceph RGW | Distributed, runs on top of Ceph RADOS cluster | Enterprises already running Ceph for block/file storage | LGPL 2.1 | Active |
| SeaweedFS | Lightweight, master + volume servers | Simple setup, fast performance, smaller teams | Apache 2.0 | Active |
| NetApp StorageGRID | Hardware/software appliance | Enterprise environments with existing NetApp infrastructure | Commercial | Active |
| Dell ECS | Hardware/software appliance | Enterprise environments with existing Dell infrastructure | Commercial | Active |

Note on MinIO: MinIO has been archived and is no longer actively developed. Existing deployments continue to work, and many tutorials and guides still reference it. For new deployments, consider Ceph RGW (production, battle-tested) or SeaweedFS (simple setup, dev/PoC).

Deployment Modes

Option 1 is Kubernetes with Helm. The charts are the same as in a cloud deployment; only the values change. This is the closest to what our training stack uses.

# Example: Mimir storage config pointing to on-prem S3-compatible storage
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.storage.svc.cluster.local:9000
          bucket_name: mimir
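          # ${VAR} is expanded from the container environment (needs -config.expand-env=true);
          # supply the variables from a Kubernetes Secret rather than hard-coding them.
          access_key_id: ${S3_ACCESS_KEY}
          secret_access_key: ${S3_SECRET_KEY}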
          insecure: true  # if not using TLS internally

Key Helm charts:

| Component | Chart | Notes |
|---|---|---|
| Grafana | grafana/grafana | Same chart, same config |
| Prometheus | prometheus-community/kube-prometheus-stack | Same chart, same config |
| Mimir | grafana/mimir-distributed | Change storage backend to on-prem S3 |
| Loki | grafana/loki | Change storage backend to on-prem S3 |
| Tempo | grafana/tempo-distributed | Change storage backend to on-prem S3 |
| Pyroscope | grafana/pyroscope | Change storage backend to on-prem S3 |
| Alloy | grafana/alloy | No storage dependency, same config |

Option 2 — Docker Compose

Suitable for smaller environments without Kubernetes. Official Docker images are published for all components.

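# docker-compose.yml (simplified; pin image versions instead of :latest in production)
services:
  seaweedfs:
    image: chrislusf/seaweedfs
    command: "server -s3"  # -s3 starts the S3 gateway (port 8333 by default)
    volumes:
      - seaweedfs-data:/data

  mimir:
    image: grafana/mimir:latest
    command: ["-config.file=/etc/mimir/config.yaml"]  # load the mounted config
    volumes:
      - ./mimir-config.yaml:/etc/mimir/config.yaml

  loki:
    image: grafana/loki:latest
    command: ["-config.file=/etc/loki/config.yaml"]  # override the image's default config path
    volumes:
      - ./loki-config.yaml:/etc/loki/config.yaml

  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo/config.yaml"]  # load the mounted config
    volumes:
      - ./tempo-config.yaml:/etc/tempo/config.yaml

  alloy:
    image: grafana/alloy:latest
    volumes:
      # The image's default command reads /etc/alloy/config.alloy
      - ./alloy-config.alloy:/etc/alloy/config.alloy

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/ds.yaml

# Named volumes must be declared at the top level
volumes:
  seaweedfs-data: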

Option 3 — Binary / systemd

For environments without containers. All components ship as standalone binaries for Linux. Install them as systemd services.

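# Example: running Grafana as a systemd service (replace 11.x.x with a real version)
wget https://dl.grafana.com/oss/release/grafana-11.x.x.linux-amd64.tar.gz
tar -xzf grafana-*.tar.gz
sudo cp grafana-*/bin/grafana /usr/local/bin/
# The tarball does not install a systemd unit. Create /etc/systemd/system/grafana.service
# first, with ExecStart running the binary and --homepath set to the extracted directory.
sudo systemctl daemon-reload
sudo systemctl enable --now grafana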

This approach works but requires more manual configuration and lacks the orchestration benefits of Kubernetes.

Network Architecture

flowchart TB
    subgraph on_prem ["On-Premise Network"]
        subgraph k8s ["Kubernetes Cluster"]
            alloy["Alloy\n(DaemonSet)"]
            prometheus["Prometheus"]
            mimir["Mimir"]
            loki["Loki"]
            tempo["Tempo"]
            pyroscope["Pyroscope"]
            grafana["Grafana"]
        end

        subgraph storage ["Storage Layer"]
            s3["S3-compatible Storage\n(Ceph RGW / SeaweedFS)"]
            nfs["NFS / Local PVs\n(Prometheus TSDB)"]
        end

        subgraph infra ["Infrastructure"]
            lb["Load Balancer\n(MetalLB / HAProxy)"]
            dns["Internal DNS"]
            ca["Internal CA\n(cert-manager)"]
        end

        apps["Applications"] --> alloy
        alloy --> prometheus & loki & tempo & pyroscope
        prometheus --> mimir
        mimir & loki & tempo & pyroscope --> s3
        prometheus --> nfs
        lb --> grafana
        dns --> lb
    end

    users["Users"] --> lb

    style storage fill:#f59e0b,stroke:#d97706,color:#000
    style k8s fill:#3b82f6,stroke:#2563eb,color:#fff
    style infra fill:#e5e7eb,stroke:#9ca3af,color:#000

Hardware Sizing Reference

| Component | CPU | Memory | Storage | Replicas |
|---|---|---|---|---|
| Alloy | 0.5–1 core | 512 MB–1 GB | - | 1 per node (DaemonSet) |
| Prometheus | 2–4 cores | 4–16 GB | 50–200 GB SSD (short-term) | 1–2 |
| Mimir (ingesters) | 2–4 cores | 4–8 GB | 50 GB SSD (WAL) | 3+ |
| Loki (ingesters) | 1–2 cores | 2–4 GB | 20 GB SSD (WAL) | 3+ |
| Tempo | 1–2 cores | 2–4 GB | 20 GB SSD (WAL) | 3+ |
| Pyroscope | 1–2 cores | 2–4 GB | 20 GB SSD | 1–2 |
| Grafana | 1 core | 512 MB–1 GB | - | 1–2 |
| S3 storage (Ceph/SeaweedFS) | 2–4 cores | 4–8 GB | Depends on retention | 3+ (distributed) |

These are starting points for a medium workload (~50 services, ~100k active series, ~10 GB logs/day). Scale based on actual usage.
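
To turn the per-replica figures into a cluster-wide budget, the sketch below multiplies each row by its minimum replica count and sums the low and high ends. The names `Sizing`, `sizing_table`, and `totals` are hypothetical, the three-node cluster is an assumption, and 512 MB is treated as 0.5 GB. Only the components listed in the sizing reference are counted, so read-path and other components of Mimir, Loki, and Tempo come on top.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sizing:
    name: str
    cpu: Tuple[float, float]        # (low, high) cores per replica
    memory_gb: Tuple[float, float]  # (low, high) GB per replica
    replicas: int                   # minimum replica count


def sizing_table(node_count: int) -> List[Sizing]:
    """Per-replica figures from the sizing reference; Alloy runs once per node."""
    return [
        Sizing("alloy", (0.5, 1), (0.5, 1), node_count),
        Sizing("prometheus", (2, 4), (4, 16), 1),
        Sizing("mimir-ingester", (2, 4), (4, 8), 3),
        Sizing("loki-ingester", (1, 2), (2, 4), 3),
        Sizing("tempo", (1, 2), (2, 4), 3),
        Sizing("pyroscope", (1, 2), (2, 4), 1),
        Sizing("grafana", (1, 1), (0.5, 1), 1),
        Sizing("s3-storage", (2, 4), (4, 8), 3),
    ]


def totals(rows: List[Sizing]) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Sum per-replica figures times replica count into (cpu, memory) ranges."""
    cpu = (sum(r.cpu[0] * r.replicas for r in rows),
           sum(r.cpu[1] * r.replicas for r in rows))
    mem = (sum(r.memory_gb[0] * r.replicas for r in rows),
           sum(r.memory_gb[1] * r.replicas for r in rows))
    return cpu, mem


cpu, mem = totals(sizing_table(node_count=3))  # node_count is an assumption
print(f"CPU: {cpu[0]}-{cpu[1]} cores, memory: {mem[0]}-{mem[1]} GB")
```

With three nodes and minimum replica counts this comes to 23.5 to 46 cores and 44 to 96 GB of memory, before any headroom for the workloads being observed.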

Key Considerations

What stays the same

  • Helm chart values (except storage backend URLs and credentials)
  • Alloy configuration — collection pipelines don’t change
  • Grafana dashboards and datasources — same provisioning, different URLs
  • PromQL, LogQL, TraceQL — query languages are infrastructure-agnostic
  • OpenTelemetry SDK instrumentation — application code doesn’t change at all

What you need to plan for

  • Storage capacity — no auto-expanding cloud disks; plan storage capacity upfront and monitor usage
  • Backup strategy — S3 bucket replication, or periodic snapshots to external storage
  • Network segmentation — ensure observability components can reach each other and applications can reach Alloy’s OTLP endpoints
  • Upgrades — no managed control plane; you own Kubernetes and component upgrades
  • Monitoring the monitoring — use a separate lightweight Prometheus or Alloy instance to monitor the observability stack itself
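
For the "monitoring the monitoring" point, a minimal sketch of an external readiness probe is shown below. It only illustrates the idea and does not replace a separate Prometheus or Alloy instance. The host names and ports in `READINESS_URLS` are hypothetical and must be adapted; the paths assume the `/ready` endpoints of Mimir, Loki, and Tempo and Grafana's `/api/health` endpoint.

```python
from typing import Callable, Dict
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Hypothetical service URLs; adjust host names and ports to your deployment.
READINESS_URLS: Dict[str, str] = {
    "mimir": "http://mimir:8080/ready",
    "loki": "http://loki:3100/ready",
    "tempo": "http://tempo:3200/ready",
    "grafana": "http://grafana:3000/api/health",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def check_stack(urls: Dict[str, str],
                fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each component name to True if its endpoint answered with HTTP 200."""
    return {name: fetch(url) == 200 for name, url in urls.items()}


def unhealthy(results: Dict[str, bool]) -> list:
    """Return the sorted names of components that did not report ready."""
    return sorted(name for name, ok in results.items() if not ok)
```

Calling `unhealthy(check_stack(READINESS_URLS))` from a host outside the observability cluster returns the components that did not answer with HTTP 200. The `fetch` parameter exists so the check can be exercised without network access.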
