Grafana
Grafana is the visualization and exploration layer. In our stack it connects to all backends (Prometheus, Mimir, Loki, Tempo, Pyroscope) and adds direct database datasources (PostgreSQL, Redis) for a complete observability experience.
Datasources
Observability Backends
| Datasource | Type | URL | Purpose |
|---|---|---|---|
| Prometheus | prometheus | `prometheus-kub-prometheus:9090` | Short-term metrics, exemplars |
| Mimir | prometheus | `mimir-nginx:80/prometheus` | Long-term metrics, span-derived metrics, exemplars |
| Loki | loki | `loki-gateway` | Logs, TraceID derived field |
| Tempo | tempo | `tempo-query-frontend:3200` | Traces, TraceQL, service map |
| Pyroscope | grafana-pyroscope-datasource | `pyroscope:4040` | Continuous profiling, flame graphs |
| Alertmanager | alertmanager | `prometheus-kub-alertmanager:9093` | Alert status and silences |
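To make the table concrete, here is a minimal sketch of how each row could map onto a JSON datasource definition of the kind Grafana's datasource HTTP API accepts. The `http://` scheme and the `proxy` access mode are assumptions, since the table lists only host, port, and path. How the workshop actually provisions these datasources is not shown here.

```python
import json

# Rows copied from the table above: (name, type, address).
BACKENDS = [
    ("Prometheus", "prometheus", "prometheus-kub-prometheus:9090"),
    ("Mimir", "prometheus", "mimir-nginx:80/prometheus"),
    ("Loki", "loki", "loki-gateway"),
    ("Tempo", "tempo", "tempo-query-frontend:3200"),
    ("Pyroscope", "grafana-pyroscope-datasource", "pyroscope:4040"),
    ("Alertmanager", "alertmanager", "prometheus-kub-alertmanager:9093"),
]


def datasource_payload(name: str, ds_type: str, address: str) -> dict:
    """Build one datasource definition from a table row."""
    return {
        "name": name,
        "type": ds_type,
        "url": f"http://{address}",  # assumption: plain HTTP inside the cluster
        "access": "proxy",           # assumption: Grafana's backend proxies the requests
    }


payloads = [datasource_payload(*row) for row in BACKENDS]

if __name__ == "__main__":
    print(json.dumps(payloads, indent=2))
```

Mimir uses the `prometheus` type because it exposes a Prometheus-compatible query API under the `/prometheus` path.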
Application Databases
| Datasource | Type | Connection | Purpose |
|---|---|---|---|
| PostgreSQL | postgres | `postgresql.otel:5432` (db: otel, user: otelu) | Direct SQL queries against the app database |
| Redis | redis-datasource | `valkey-cart.otel:6379` | Direct Redis command execution |
Why Application Database Datasources?
These datasources enable self-service data exploration without involving a database administrator:
PostgreSQL — When a trace shows a slow SQL query (visible in the db.statement span attribute), you can click through from the trace to run that exact query directly against the database. This lets you:
- Verify query results without SSH access to the database
- Run `EXPLAIN ANALYZE` to check query plans
- Check table sizes and row counts
- Investigate data integrity issues found via traces or logs
Redis — When a trace shows a Redis operation (visible in the db.statement span attribute), you can click through to execute the command directly. This lets you:
- Check cart contents for a specific user (`HGET`)
- Verify cache state
- Debug session data
Trace-to-Database Correlations
Grafana is configured with correlations that extract database commands from trace spans and link them to the database datasources:
| Correlation | Source | Extraction | Target |
|---|---|---|---|
| Tempo → PostgreSQL | db.statement or db.query.text span attribute | Regex extracts SQL statement | Runs query in PostgreSQL datasource |
| Tempo → Redis | db.statement span attribute | Regex extracts HGET/GET command | Runs command in Redis datasource |
This means: when viewing a trace span that contains a database query, Grafana shows a link to run that exact query against the live database — no copy-pasting, no requesting admin access.
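The extraction step can be sketched as follows. The regular expressions, function names, and sample values are hypothetical. The patterns configured in the actual correlations are not shown in this document, so treat this as an illustration of the idea rather than the deployed configuration.

```python
import re
from typing import Optional

# Hypothetical patterns, chosen only to illustrate the extraction step.
SQL_PATTERN = re.compile(
    r"^\s*((?:SELECT|INSERT|UPDATE|DELETE|WITH)\b.*)$",
    re.IGNORECASE | re.DOTALL,
)
REDIS_PATTERN = re.compile(r"^\s*((?:HGET|GET)\s+\S.*)$", re.IGNORECASE)


def extract_sql(attributes: dict) -> Optional[str]:
    """Return the SQL text from db.statement or db.query.text, if any."""
    statement = attributes.get("db.statement") or attributes.get("db.query.text")
    if not statement:
        return None
    match = SQL_PATTERN.match(statement)
    return match.group(1).strip() if match else None


def extract_redis_command(attributes: dict) -> Optional[str]:
    """Return an HGET/GET command from db.statement, if any."""
    statement = attributes.get("db.statement")
    if not statement:
        return None
    match = REDIS_PATTERN.match(statement)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    # Made-up span attributes for demonstration.
    print(extract_sql({"db.query.text": "SELECT * FROM orders WHERE id = 1"}))
    print(extract_redis_command({"db.statement": "HGET cart:42 items"}))
```

Each function returns `None` when nothing matches. In this sketch, that corresponds to a span for which no link would be offered.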
Drilldown Apps
Drilldown (formerly Explore Apps) provides query-free exploration of all four observability signals. Access via: Menu → Drilldown → Logs / Metrics / Traces / Profiles.
Logs Drilldown
Explores logs from Loki without writing LogQL.
| Tab | What It Shows |
|---|---|
| Logs | Log lines with filtering controls (level, text search, sorting) |
| Labels | Volume breakdown per label value — quickly spot which label values dominate |
| Fields | Volume breakdown per detected field — find high-value fields for filtering |
| Patterns | Auto-detected log patterns — hide noise, focus on anomalies |
Key feature: Patterns tab uses Loki’s pattern ingester to automatically group similar log lines. You can hide repetitive patterns (e.g., HTTP 200 access logs) to surface errors and anomalies.
Transition: Click Open in Explore to see the generated LogQL query.
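To illustrate what grouping similar log lines means in the Patterns tab, here is a deliberately simplified sketch. It is not Loki's actual pattern-detection algorithm. It only masks tokens that contain digits and counts the resulting templates. The sample log lines are made up.

```python
import re
from collections import Counter

PLACEHOLDER = "<_>"  # stand-in marker for a variable token


def to_pattern(line: str) -> str:
    """Replace every whitespace-separated token that contains a digit."""
    return " ".join(
        PLACEHOLDER if re.search(r"\d", token) else token
        for token in line.split()
    )


def group_by_pattern(lines: list) -> Counter:
    """Count how many lines fall under each template."""
    return Counter(to_pattern(line) for line in lines)


def hide_patterns(lines: list, hidden: set) -> list:
    """Drop lines whose template the user chose to hide."""
    return [line for line in lines if to_pattern(line) not in hidden]


# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    "GET /cart 200 12ms",
    "GET /cart 200 9ms",
    "payment failed: card declined",
]

patterns = group_by_pattern(SAMPLE_LINES)
remaining = hide_patterns(SAMPLE_LINES, {"GET /cart <_> <_>"})

if __name__ == "__main__":
    print(patterns.most_common())
    print(remaining)
```

Hiding the dominant access-log template leaves only the error line, which is the effect the Patterns tab aims for at a much larger scale.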
Metrics Drilldown
Explores metrics from Prometheus without writing PromQL.
| Tab | What It Shows |
|---|---|
| Breakdown | Time series split by each label-value pair — see distribution at a glance |
| Related Metrics | Metrics with similar names and shared prefixes — discover metrics you didn’t know existed |
Filtering options: by labels, text search, prefixes/suffixes, alerting rules.
Transition: Click Open in Explore to see the generated PromQL query.
Traces Drilldown
Explores traces from Tempo without writing TraceQL. First select a signal: Rate, Errors, or Duration.
| Tab | What It Shows |
|---|---|
| Breakdown | Attribute-based breakdown of the selected signal — which services/operations dominate |
| Service structure | Call chain visualization between services — see how requests flow |
| Comparison | Side-by-side comparison of attributes: green = baseline, red = errors — spot correlations |
| Traces | List of matching traces — click to open full trace waterfall |
Key feature: Comparison tab highlights which attributes correlate with errors or high latency. This is the fastest path to root cause analysis without writing queries.
Powered by: Tempo’s span metrics generator and TraceQL metrics (local_blocks processor).
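The idea behind the Comparison tab can be sketched as a difference in attribute-value shares between a baseline set of spans and a selected set, such as the error spans. This is a simplified illustration under that assumption, not the app's actual implementation. The span data below is made up.

```python
from collections import Counter


def value_share(spans: list, attribute: str) -> dict:
    """Fraction of spans carrying each value of the given attribute."""
    counts = Counter(span[attribute] for span in spans if attribute in span)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()} if total else {}


def compare(baseline: list, selection: list, attribute: str) -> list:
    """Rank values by how much more common they are in the selection."""
    base = value_share(baseline, attribute)
    sel = value_share(selection, attribute)
    diffs = {
        value: sel.get(value, 0.0) - base.get(value, 0.0)
        for value in set(base) | set(sel)
    }
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Made-up spans: v2 is rare in the baseline but dominates the error set.
BASELINE = [{"service.version": "v1"}] * 3 + [{"service.version": "v2"}]
ERRORS = [{"service.version": "v2"}] * 2

ranking = compare(BASELINE, ERRORS, "service.version")

if __name__ == "__main__":
    for value, diff in ranking:
        print(f"{value}: {diff:+.2f}")
```

Values at the top of the ranking are over-represented among the errors, which makes them the first candidates to investigate.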
Profiles Drilldown
Explores profiles from Pyroscope without writing queries.
Navigation: All Services → select a service → select a profile type → Flame Graph
Available profile types (language-dependent):
- CPU — where processor time is spent (all services)
- Alloc Objects / Alloc Space — memory allocations (Go, Java, .NET)
- Inuse Objects / Inuse Space — current memory usage (Go, .NET)
- Goroutines — active goroutines (Go)
- Lock Contention — lock competition (.NET)
- Exceptions — exception profiles (.NET)
Key features:
- Explain flame graph — AI-powered analysis of bottlenecks (via Grafana LLM plugin + OpenAI)
- Diff flame graphs — compare two time windows, red = regression, green = improvement
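The diff view can be thought of as comparing each function's share of samples across two time windows. The sketch below assumes a profile is a flat mapping from function name to sample count. That is a simplification of a real flame graph, which is a call tree. The function names and counts are made up.

```python
def diff_profiles(baseline: dict, comparison: dict) -> dict:
    """Label each function by how its share of samples changed."""
    base_total = sum(baseline.values()) or 1
    comp_total = sum(comparison.values()) or 1
    result = {}
    for function in sorted(set(baseline) | set(comparison)):
        delta = (
            comparison.get(function, 0) / comp_total
            - baseline.get(function, 0) / base_total
        )
        if delta > 0:
            label = "regression"   # shown in red: larger share than before
        elif delta < 0:
            label = "improvement"  # shown in green: smaller share than before
        else:
            label = "unchanged"
        result[function] = (delta, label)
    return result


# Made-up sample counts for two time windows.
BEFORE = {"handleCart": 50, "serialize": 50, "log": 0}
AFTER = {"handleCart": 75, "serialize": 25, "log": 0}

diff = diff_profiles(BEFORE, AFTER)

if __name__ == "__main__":
    for function, (delta, label) in diff.items():
        print(f"{function}: {delta:+.2f} ({label})")
```

Comparing shares rather than raw counts keeps the result meaningful when the two windows contain different numbers of samples.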
Plugins Installed
| Plugin | Purpose |
|---|---|
| `grafana-pyroscope-app` | Pyroscope integration for continuous profiling and Profiles Drilldown |
| `grafana-llm-app` | LLM integration for AI-powered flame graph explanations (OpenAI: gpt-4o-mini default, gpt-4o for large) |
| `redis-datasource` | Redis/Valkey datasource for direct command execution |
Features Enabled
| Feature | Setting | Purpose |
|---|---|---|
| Anonymous auth | Enabled (Editor role) | Workshop access without login |
| Viewers can edit | Enabled | All participants can modify dashboards |
| Dark theme | Default | Easier on the eyes |
| Alpha plugins | Enabled | Access to latest Drilldown features |
| Allow unsigned plugins | Enabled | Support for custom/community plugins |
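As a rough guide to where these features live, the sketch below maps them onto `grafana.ini` sections and keys and derives the matching environment variable names using Grafana's `GF_<SECTION>_<KEY>` override convention. Which keys the workshop deployment actually sets, and by what mechanism, is an assumption. The unsigned-plugin list is left as a placeholder because this document does not state it.

```python
def env_name(section: str, key: str) -> str:
    """Derive the GF_<SECTION>_<KEY> environment variable name."""
    normalized = f"{section}_{key}".replace(".", "_").replace("-", "_")
    return f"GF_{normalized.upper()}"


# Assumed mapping of the table rows to grafana.ini (section, key, value).
SETTINGS = [
    ("auth.anonymous", "enabled", "true"),
    ("auth.anonymous", "org_role", "Editor"),
    ("users", "viewers_can_edit", "true"),
    ("users", "default_theme", "dark"),
    ("plugins", "enable_alpha", "true"),
    # Placeholder: the plugin IDs are not given in this document.
    ("plugins", "allow_loading_unsigned_plugins", "<comma-separated plugin IDs>"),
]

env = {env_name(section, key): value for section, key, value in SETTINGS}

if __name__ == "__main__":
    for name, value in env.items():
        print(f"{name}={value}")
```

Anonymous access with the Editor role and unsigned plugins are convenient for a workshop, but both loosen security and are best kept out of production setups.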
Access
- Port: 3000 (NodePort 30080)
- Ingress: `grafana.<domain>`
- Default auth: Anonymous with Editor role