## Architecture Overview
Putting it all together, the diagram below shows the full observability architecture:
```mermaid
flowchart TB
subgraph sources ["Data Sources"]
app["Applications\n(instrumented with OTel SDK)"]
subgraph k8s ["Kubernetes"]
ksm["kube-state-metrics"]
node["node-exporter"]
kubelet["kubelet"]
end
subgraph exporters ["Exporters"]
pg["PostgreSQL Exporter\n:9187"]
redis["Redis Exporter\n:9121"]
end
end
subgraph alloy_box ["Grafana Alloy — Unified Collection Layer"]
direction LR
otel_recv["OTLP Receiver\n:4317 gRPC / :4318 HTTP"]
scraper["Prometheus Scraper\nServiceMonitors\nPod Annotations"]
log_scrape["Log Scraper\nPod stdout/stderr"]
ebpf_prof["eBPF Profiler\n97 Hz sampling"]
end
%% Applications → Alloy (OTLP)
app -- "OTLP\n(traces, metrics, logs)" --> otel_recv
%% Kubernetes metrics → Alloy (scrape)
k8s -- "scrape metrics" --> scraper
exporters -- "scrape metrics" --> scraper
%% Alloy scrapes logs from pods
app -. "stdout/stderr\n(pod logs)" .-> log_scrape
%% eBPF profiles all processes
app -. "kernel-level\nstack traces" .-> ebpf_prof
prometheus["Prometheus\n(short-term metrics)\n:9090"]
loki["Loki\n(logs)"]
tempo["Tempo\n(traces)\n:4317"]
pyroscope["Pyroscope\n(profiles)\n:4040"]
mimir["Mimir\n(long-term metrics)"]
%% Alloy → backends
otel_recv -- "metrics" --> prometheus
scraper -- "metrics\n(remote write)" --> prometheus
otel_recv -- "logs (OTLP)" --> loki
log_scrape -- "logs" --> loki
otel_recv -- "traces (OTLP)" --> tempo
ebpf_prof -- "profiles" --> pyroscope
%% Metrics from Traces
tempo -- "Span metrics generator\n(RED metrics, service graphs,\nTraceQL metrics)" --> prometheus
prometheus -- "remote write" --> mimir
grafana["Grafana\n:3000"]
mimir --> grafana
prometheus --> grafana
loki --> grafana
tempo --> grafana
pyroscope --> grafana
%% Cross-signal links in Grafana
grafana -. "Traces → Logs\n(TraceID correlation)" .-> loki
grafana -. "Traces → Profiles\n(service_name mapping)" .-> pyroscope
grafana -. "Traces → Metrics\n(span metrics queries)" .-> prometheus
%% Styling
style alloy_box fill:#f59e0b,stroke:#d97706,color:#000
style sources fill:#e5e7eb,stroke:#9ca3af,color:#000
style grafana fill:#10b981,stroke:#059669,color:#fff
style prometheus fill:#3b82f6,stroke:#2563eb,color:#fff
style loki fill:#3b82f6,stroke:#2563eb,color:#fff
style tempo fill:#3b82f6,stroke:#2563eb,color:#fff
style pyroscope fill:#3b82f6,stroke:#2563eb,color:#fff
style mimir fill:#3b82f6,stroke:#2563eb,color:#fff
```
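Concretely, the collection layer in the diagram maps onto a handful of Alloy components. The sketch below shows the wiring for OTLP ingest and ServiceMonitor scraping; the component names are real Alloy components, but all hostnames, ports, and URLs are placeholder assumptions for this stack:

```alloy
// OTLP receiver for app telemetry (traces, metrics, logs)
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
    logs    = [otelcol.exporter.loki.default.input]
    traces  = [otelcol.exporter.otlp.tempo.input]
  }
}

// Scrape ServiceMonitor-discovered targets, remote-write to Prometheus
prometheus.operator.servicemonitors "default" {
  forward_to = [prometheus.remote_write.local.receiver]
}

prometheus.remote_write "local" {
  endpoint { url = "http://prometheus:9090/api/v1/write" }
}

// Bridge OTLP-received signals to the Prometheus and Loki backends
otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.local.receiver]
}

otelcol.exporter.loki "default" {
  forward_to = [loki.write.default.receiver]
}

// Pass traces through to Tempo's OTLP endpoint
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls { insecure = true }
  }
}

loki.write "default" {
  endpoint { url = "http://loki:3100/loki/api/v1/push" }
}
```

Because every pipeline terminates in a `forward_to`/`output` list, additional backends can be fanned out to later without touching the sources.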
## Signal Flow Summary
| Signal | Source | Collector | Backend | Long-term |
|---|---|---|---|---|
| Metrics | Apps (OTLP), K8s components, Exporters | Alloy (scrape + OTLP) | Prometheus | Mimir |
| Logs | App stdout/stderr, OTLP structured logs | Alloy (file tail + OTLP) | Loki | Loki (Azure Blob) |
| Traces | Apps (OTLP) | Alloy (OTLP passthrough) | Tempo | Tempo (Azure Blob) |
| Profiles | All processes (eBPF), SDK-instrumented apps | Alloy (eBPF + scrape) | Pyroscope | Pyroscope (Azure Blob) |
| Metrics from Traces | Trace spans | Tempo metrics generator | Mimir | Mimir (Azure Blob) |
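The last row of the table, metrics derived from traces, is configured on the Tempo side rather than in Alloy. A sketch of the relevant `tempo.yaml` fragment follows; the Mimir push URL and the exact overrides layout are assumptions that depend on the Tempo version and deployment:

```yaml
# tempo.yaml (excerpt): enable the metrics generator
metrics_generator:
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      # Assumed Mimir push endpoint; adjust host/port to your deployment
      - url: http://mimir:8080/api/v1/push
        send_exemplars: true

overrides:
  defaults:
    metrics_generator:
      # span-metrics -> RED metrics, service-graphs -> service graphs,
      # local-blocks -> TraceQL metrics queries
      processors: [service-graphs, span-metrics, local-blocks]
```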
## Cross-Signal Correlations
The stack is wired so that you can navigate between signals without leaving Grafana:
| Link | Mechanism | Workflow |
|---|---|---|
| Trace → Logs | TraceID derived field in Loki | Click a trace span → see matching logs (±1h time window) |
| Trace → Metrics | Span metrics (RED) generated by Tempo | Click a trace span → see rate/error/duration metrics for that service |
| Trace → Profile | service_name mapping to Pyroscope | Click a trace span → see CPU/memory profile for that service at that time |
| Trace → PostgreSQL | db.statement attribute extraction | Click a trace span with a DB query → run the SQL directly against the PostgreSQL datasource |
| Trace → Redis | db.statement attribute extraction | Click a trace span with a Redis command → run HGET/GET directly against the Redis datasource |
| Metric → Trace | Exemplars on metrics | Click an exemplar point on a metric graph → jump to the corresponding trace |
| Log → Trace | TraceID field in structured logs | Click a TraceID in a log line → open the full trace in Tempo |
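The Log → Trace and Trace → Logs rows above are wired up in Grafana datasource provisioning. The sketch below shows one plausible setup: a derived field on the Loki datasource that extracts the TraceID, and `tracesToLogsV2` on the Tempo datasource with the ±1h window from the table. The UIDs, URLs, and the `matcherRegex` (which must match your log format) are assumptions:

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    uid: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        # Pull the trace ID out of structured log lines, link to Tempo
        - name: TraceID
          matcherRegex: '"trace_id":"(\w+)"'
          url: '$${__value.raw}'
          datasourceUid: tempo
  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki
        spanStartTimeShift: '-1h'
        spanEndTimeShift: '1h'
        filterByTraceID: true
```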
## Key Design Decisions
- **Alloy as the single collection point** — Prometheus does not scrape directly. All metric collection flows through Alloy, which gives a unified configuration point and enables eBPF profiling from the same DaemonSet.
- **Dual metrics path** — Prometheus holds short-term metrics (hours), Mimir holds long-term (weeks/months) in Azure Blob Storage. Both are queryable as Grafana datasources.
- **Tempo metrics generator** — Tempo extracts RED metrics, service graphs, and TraceQL metrics from traces and writes them to Mimir. This means you get metrics-based alerting on traces without manual instrumentation.
- **eBPF profiling by default** — Every process on every node is profiled at 97 Hz with zero code changes. SDK-based profiling adds richer data (goroutines, locks, exceptions) for instrumented services.
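The eBPF decision corresponds to a small fragment in the same Alloy DaemonSet config. A minimal sketch, assuming Kubernetes pod discovery and a placeholder Pyroscope URL:

```alloy
// Discover pods on this node; the eBPF profiler samples their processes
discovery.kubernetes "pods" {
  role = "pod"
}

pyroscope.ebpf "default" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [pyroscope.write.default.receiver]
  // 97 Hz is the component default: a prime rate, so sampling
  // avoids falling into lockstep with periodic workloads
  sample_rate = 97
}

pyroscope.write "default" {
  endpoint { url = "http://pyroscope:4040" }
}
```

Since this runs per node with no application changes, SDK-based profiling only needs to be layered on for the services where goroutine, lock, or exception detail is worth the instrumentation effort.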