Grafana Alloy

Grafana Alloy is an OpenTelemetry Collector distribution by Grafana Labs that acts as the single collection point for all observability signals in our setup. It replaces the need for separate Prometheus scrapers, Promtail for logs, and standalone OTel Collectors.

Role in the Stack

| Signal | Collection Method | Destination |
|--------|-------------------|-------------|
| Metrics | ServiceMonitor scraping (30s interval) + pod annotation scraping | Prometheus, Mimir |
| Traces | OTLP receiver (gRPC :4317, HTTP :4318) | Tempo |
| Logs | Kubernetes pod log tailing (/var/log/pods/) + OTLP receiver | Loki |
| Profiles | eBPF kernel sampling (97 Hz) + Pyroscope SDK scraping | Pyroscope |

Key decision: Prometheus is configured as a receiver only (no scraping). All metric collection flows through Alloy.

Deployment

  • Type: DaemonSet — one pod per node (required for eBPF access to each node’s kernel)
  • Privileged: Yes — required for eBPF profiling
  • Capabilities: SYS_ADMIN, SYS_PTRACE, SYS_RESOURCE, PERFMON, BPF
  • Resources: 256Mi–1Gi memory, 100m–1000m CPU
  • Init container: Sets perf_event_paranoid=-1 for eBPF
  • UI: Built-in component graph at port 12345

Metrics Collection

Alloy discovers metrics targets through two mechanisms:

1. ServiceMonitor scraping — discovers all ServiceMonitors cluster-wide, resolving endpoints and scraping at 30s intervals. This is the primary mechanism.

2. Pod annotation scraping — fallback for pods without ServiceMonitors. Pods with prometheus.io/scrape: "true" are automatically scraped. Monitoring stack pods (prometheus, alloy, mimir, loki, tempo, pyroscope) are excluded to avoid duplication.

Both paths support native histograms (protobuf scraping).

Remote write targets:

  • Prometheus (short-term): prometheus-and-grafana-kub-prometheus:9090
  • Mimir (long-term): mimir-nginx:80
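Put together, the metrics path above might look like the following Alloy configuration sketch. Component labels and the remote-write URL paths (/api/v1/write for Prometheus, /api/v1/push for Mimir) are assumptions, and the relabel rules that exclude the monitoring-stack pods are omitted for brevity:

```alloy
// Primary path: discover and scrape every ServiceMonitor cluster-wide.
prometheus.operator.servicemonitors "all" {
  forward_to = [prometheus.remote_write.default.receiver]

  scrape {
    default_scrape_interval = "30s"
  }
}

// Fallback path: scrape pods annotated prometheus.io/scrape: "true".
discovery.kubernetes "pods" {
  role = "pod"
}

discovery.relabel "annotated_pods" {
  targets = discovery.kubernetes.pods.targets

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"]
    regex         = "true"
    action        = "keep"
  }
}

prometheus.scrape "annotated_pods" {
  targets    = discovery.relabel.annotated_pods.output
  forward_to = [prometheus.remote_write.default.receiver]
}

// Fan out to both backends: Prometheus (short-term) and Mimir (long-term).
prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus-and-grafana-kub-prometheus:9090/api/v1/write"
  }

  endpoint {
    url = "http://mimir-nginx:80/api/v1/push"
  }
}
```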

Trace Collection

  • Receives OTLP traces via gRPC (:4317) and HTTP (:4318)
  • Adds k8s.namespace.name attribute to all spans for Kubernetes context
  • Forwards to Tempo via OTLP gRPC
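As a sketch, the trace pipeline chains three Alloy components; the Tempo service name and port here are assumptions:

```alloy
// Accept OTLP traces on the standard gRPC and HTTP ports.
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  http {
    endpoint = "0.0.0.0:4318"
  }

  output {
    traces = [otelcol.processor.k8sattributes.default.input]
  }
}

// Attach Kubernetes context (k8s.namespace.name) to every span.
otelcol.processor.k8sattributes "default" {
  extract {
    metadata = ["k8s.namespace.name"]
  }

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

// Forward to Tempo over OTLP gRPC (hypothetical service address).
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"

    tls {
      insecure = true
    }
  }
}
```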

Log Collection

Two parallel log streams:

1. Kubernetes pod logs — tails /var/log/pods/ on each node, parses CRI format, maps container labels to Loki labels.

2. OTLP logs — receives structured logs from applications via OTLP, enriches with Kubernetes metadata (namespace, pod, container), maps OpenTelemetry severity to Loki’s detected_level.

Both streams are sent to Loki via its native push protocol.
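The two streams can be sketched like this; the Loki service address is an assumption, and the label-mapping relabel rules are omitted for brevity:

```alloy
// Stream 1: tail CRI-format pod logs from the node filesystem.
local.file_match "pod_logs" {
  path_targets = [{"__path__" = "/var/log/pods/*/*/*.log"}]
}

loki.source.file "pod_logs" {
  targets    = local.file_match.pod_logs.targets
  forward_to = [loki.process.pod_logs.receiver]
}

loki.process "pod_logs" {
  stage.cri {}
  forward_to = [loki.write.default.receiver]
}

// Stream 2: convert OTLP logs to Loki entries
// (wired from otelcol.receiver.otlp's output.logs in the full pipeline).
otelcol.exporter.loki "default" {
  forward_to = [loki.write.default.receiver]
}

// Both streams push to Loki natively (hypothetical service address).
loki.write "default" {
  endpoint {
    url = "http://loki-gateway:80/loki/api/v1/push"
  }
}
```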

Profiling

eBPF profiling (all processes, no instrumentation needed):

  • Sample rate: 97 Hz
  • Collects both kernel and user-space stacks
  • Python-specific profiling enabled
  • Covers every process on the node — including services that have no SDK instrumentation
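A minimal sketch of the eBPF side, assuming a hypothetical Pyroscope service URL and component labels:

```alloy
// Discover local pods so samples can be attributed to workloads.
discovery.kubernetes "profiling_pods" {
  role = "pod"
}

// Whole-node eBPF profiler: 97 Hz, kernel + user stacks, Python support.
pyroscope.ebpf "node" {
  targets        = discovery.kubernetes.profiling_pods.targets
  sample_rate    = 97
  python_enabled = true
  forward_to     = [pyroscope.write.ebpf.receiver]
}

// Ship profiles to Pyroscope (hypothetical service address).
pyroscope.write "ebpf" {
  endpoint {
    url = "http://pyroscope:4040"
  }
}
```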

Pyroscope SDK scraping (richer data for instrumented services):

  • Discovers pods with annotation profiles.grafana.com/cpu_scrape: "true"
  • Scrapes CPU, memory, mutex, block, and goroutine profiles
  • Provides language-specific profile types (JFR for Java, pprof for Go, etc.)
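The SDK path might look like the following sketch; component labels and the Pyroscope address are assumptions:

```alloy
discovery.kubernetes "profiled_pods" {
  role = "pod"
}

// Keep only pods that opt in via the cpu_scrape annotation.
discovery.relabel "profiled_pods" {
  targets = discovery.kubernetes.profiled_pods.targets

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_cpu_scrape"]
    regex         = "true"
    action        = "keep"
  }
}

// Pull the richer, language-specific profile types from the SDKs.
pyroscope.scrape "sdk" {
  targets    = discovery.relabel.profiled_pods.output
  forward_to = [pyroscope.write.sdk.receiver]

  profiling_config {
    profile.process_cpu { enabled = true }
    profile.memory      { enabled = true }
    profile.mutex       { enabled = true }
    profile.block       { enabled = true }
    profile.goroutine   { enabled = true }
  }
}

pyroscope.write "sdk" {
  endpoint {
    url = "http://pyroscope:4040"
  }
}
```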

Volume Mounts

| Mount | Purpose |
|-------|---------|
| /sys/fs/bpf | BPF filesystem for pinned maps and programs |
| /sys/kernel/debug | Debugfs for kprobes/uprobes |
| /sys/kernel/btf | BTF type information for CO-RE |
| /var/log/pods | Kubernetes pod logs |
| /run/containerd | Container runtime socket for PID-to-pod mapping |

Integration Points

Applications ──OTLP──→ Alloy ──→ Tempo (traces)
                                ──→ Loki (logs)
                                ──→ Prometheus → Mimir (metrics)
                                ──→ Pyroscope (profiles)

K8s components ──scrape──→ Alloy ──→ Prometheus → Mimir

All processes ──eBPF──→ Alloy ──→ Pyroscope

Alloy is the only component that needs to run privileged — all backends run as regular pods.
