# Grafana Tempo
Tempo is the distributed tracing backend. It stores traces in object storage with no indexing requirements — search is powered by the columnar vParquet4 format. Tempo also generates metrics from traces, bridging the gap between tracing and monitoring.
## Role in the Stack
| Function | Details |
|---|---|
| Trace storage | Stores spans in Azure Blob Storage (vParquet4 columnar format) |
| TraceQL engine | Query language for filtering and analyzing traces |
| Metrics generation | Extracts RED metrics, service graphs, and TraceQL metrics from spans |
| Protocol gateway | Accepts traces via OTLP, Jaeger, Zipkin, and OpenCensus protocols |
| MCP server | Exposes trace data to AI assistants via Model Context Protocol |
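The TraceQL engine searches the columnar store directly, so queries can filter on any span or resource attribute without a pre-built index. A sketch query (the service name and attribute values are hypothetical, not from this stack):

```traceql
{ resource.service.name = "checkout" && span.http.status_code >= 500 && duration > 200ms }
```

This finds spans from the `checkout` service that returned a 5xx status and took longer than 200ms.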
## Deployment — Microservices Mode
| Component | Replicas | CPU | RAM | Storage | Purpose |
|---|---|---|---|---|---|
| Distributor | 1 | 200m | 512Mi | — | Entry point, accepts all trace protocols |
| Ingester | 3 | 500m | 2Gi | 10Gi PV | Buffers spans, writes blocks to object storage |
| Querier | 2 | 200m | 512Mi | — | Retrieves traces from ingesters + storage |
| Query Frontend | 1 | 200m | 512Mi | — | Query optimization, streaming, MCP server |
| Gateway | 1 | — | — | — | Nginx reverse proxy |
| Compactor | 1 | 200m | 2Gi | — | Block compaction, retention enforcement |
| Metrics Generator | 1 | 200m | 512Mi | — | Extracts metrics from spans |
## What Feeds Into Tempo
| Source | Protocol | Path |
|---|---|---|
| Alloy | OTLP gRPC | App OTel SDK → Alloy OTLP receiver → Tempo Distributor |
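The Alloy side of this path can be sketched with two components; the distributor address below is an assumption based on the component names in the deployment table:

```alloy
// Receive OTLP from application SDKs and forward traces to Tempo.
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    // Assumed in-cluster service name for the Tempo distributor.
    endpoint = "tempo-distributor.monitoring.svc.cluster.local:4317"
    tls {
      insecure = true
    }
  }
}
```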
Supported protocols (for direct ingestion):
- OTLP (4317/4318)
- Jaeger (14250/14268/6831/6832)
- Zipkin (9411)
- OpenCensus (55678)
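In Tempo's configuration these protocols are enabled on the distributor; a minimal sketch mapping to the ports above (exact keys may vary by Tempo version):

```yaml
distributor:
  receivers:
    otlp:
      protocols:
        grpc:           # 4317
        http:           # 4318
    jaeger:
      protocols:
        grpc:           # 14250
        thrift_http:    # 14268
        thrift_compact: # 6831
        thrift_binary:  # 6832
    zipkin:             # 9411
    opencensus:         # 55678
```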
## Storage
- Backend: Azure Blob Storage
- Container: `tempo-traces`
- Format: vParquet4 (columnar, optimized for TraceQL)
- Block retention: 24 hours
- Compacted block retention: 1 hour
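These settings correspond roughly to the following Tempo configuration fragment (a sketch; credentials and other required Azure fields are omitted):

```yaml
storage:
  trace:
    backend: azure
    azure:
      container_name: tempo-traces
    block:
      version: vParquet4

compactor:
  compaction:
    block_retention: 24h            # delete blocks after 24 hours
    compacted_block_retention: 1h   # keep already-compacted source blocks 1 hour
```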
## Metrics Generator — Traces to Metrics
This is one of Tempo’s most powerful features. The metrics generator processes every ingested span and produces three types of derived data:
### 1. Span Metrics (RED)
Generates Prometheus-compatible metrics from spans:
- `traces_spanmetrics_latency_bucket` — duration histogram per service/operation
- `traces_spanmetrics_calls_total` — request count per service/operation with status
These are the foundation of the Traces Drilldown Rate/Errors/Duration signals.
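For example, the RED signals can be derived with PromQL like the following (label names assume Tempo's default span-metrics dimensions and may differ in this deployment):

```promql
# Error ratio per service and operation
sum by (service, span_name) (rate(traces_spanmetrics_calls_total{status_code="STATUS_CODE_ERROR"}[5m]))
  /
sum by (service, span_name) (rate(traces_spanmetrics_calls_total[5m]))

# p95 latency per service
histogram_quantile(0.95, sum by (le, service) (rate(traces_spanmetrics_latency_bucket[5m])))
```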
### 2. Service Graphs
Generates service-to-service dependency metrics:
- Request rate between services
- Error rate between services
- Duration between services
This powers the Service Map (node graph) visualization in Grafana.
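The generated service-graph series carry `client` and `server` labels identifying each edge of the dependency graph; queries like these (metric names follow Tempo's defaults, hedged) back the Service Map:

```promql
# Request rate from each client service to each server service
sum by (client, server) (rate(traces_service_graph_request_total[5m]))

# Failed-request rate between the same pairs
sum by (client, server) (rate(traces_service_graph_request_failed_total[5m]))
```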
### 3. Local Blocks (TraceQL Metrics)
Generates metrics from TraceQL expressions for the Traces Drilldown breakdown and comparison features. No duration limit on metrics API queries.
All generated metrics are remote-written to Prometheus at `http://prometheus-and-grafana-kub-prometheus.monitoring.svc.cluster.local:9090/api/v1/write`, and Prometheus in turn forwards them to Mimir for long-term storage.
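Putting the three processors together, the metrics-generator configuration looks roughly like this (a sketch; in some Tempo versions the processor list is set per-tenant under `overrides` rather than here):

```yaml
metrics_generator:
  processors: [span-metrics, service-graphs, local-blocks]
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus-and-grafana-kub-prometheus.monitoring.svc.cluster.local:9090/api/v1/write
```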
## Integration with Other Components
### Traces → Logs (Loki)
- Links trace spans to logs by injecting a `TraceID` filter
- Tag mapping: `service.name` (trace attribute) → `service` (Loki label)
- Time shift: ±1 hour around the span timestamp
- Enables: Click a span → see all logs from that service around that time
### Traces → Metrics (Prometheus/Mimir)
- Span metrics generator produces RED metrics per service/operation
- Grafana queries these metrics when you click a span’s “Related metrics”
- Exemplars on metrics link back to specific traces
### Traces → Profiles (Pyroscope)
- Links trace spans to Pyroscope profiles by `service_name`
- Enables: Click a span → see the CPU/memory profile of that service at that time
- Useful for answering “this span was slow — what was the service doing?”
### MCP Server
- Enabled on Query Frontend at `/api/mcp`
- Allows AI assistants to query traces programmatically
- Exposes trace search and TraceQL capabilities
### Grafana Datasource
- Type: `tempo`
- URL: `http://tempo-query-frontend.monitoring.svc.cluster.local:3200`
- Features enabled:
- HTTP streaming (for large trace results)
- Node Graph (service dependency visualization)
- Service Map from Prometheus (uses span metrics)
- Traces to Logs (Loki with tag mapping)
- Traces to Metrics (Prometheus with span metrics queries)
- Traces to Profiles (Pyroscope with service_name mapping)
- TraceQL search
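A provisioning sketch tying these features together (field names follow Grafana's Tempo datasource schema; the datasource UIDs for Loki, Prometheus, and Pyroscope are assumptions, not values from this stack):

```yaml
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    url: http://tempo-query-frontend.monitoring.svc.cluster.local:3200
    jsonData:
      streamingEnabled:
        search: true                 # HTTP streaming for large results
      nodeGraph:
        enabled: true                # service dependency visualization
      serviceMap:
        datasourceUid: prometheus    # assumed UID; reads span metrics
      tracesToLogsV2:
        datasourceUid: loki          # assumed UID
        tags:
          - key: service.name
            value: service           # trace attribute → Loki label
        spanStartTimeShift: "-1h"
        spanEndTimeShift: "1h"
      tracesToMetrics:
        datasourceUid: prometheus    # assumed UID
      tracesToProfiles:
        datasourceUid: pyroscope     # assumed UID
        labels: ["service_name"]
```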