Grafana

Grafana is the visualization and exploration layer. In our stack it connects to every observability backend (Prometheus, Mimir, Loki, Tempo, Pyroscope) and adds direct database datasources (PostgreSQL, Redis), so telemetry and application data can be explored from one place.

Datasources

Observability Backends

| Datasource | Type | URL | Purpose |
|---|---|---|---|
| Prometheus | prometheus | prometheus-kub-prometheus:9090 | Short-term metrics, exemplars |
| Mimir | prometheus | mimir-nginx:80/prometheus | Long-term metrics, span-derived metrics, exemplars |
| Loki | loki | loki-gateway | Logs, TraceID derived field |
| Tempo | tempo | tempo-query-frontend:3200 | Traces, TraceQL, service map |
| Pyroscope | grafana-pyroscope-datasource | pyroscope:4040 | Continuous profiling, flame graphs |
| Alertmanager | alertmanager | prometheus-kub-alertmanager:9093 | Alert status and silences |
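As a rough illustration of how these backends could be declared, the sketch below builds a datasource provisioning document as a Python dict and prints it as JSON. The field names (`apiVersion`, `datasources`, `name`, `type`, `access`, `url`) follow Grafana's datasource provisioning format; the `http://` scheme and the `proxy` access mode are assumptions, since the table lists only host, port, and path. How this stack actually provisions its datasources is not stated here.

```python
import json

# Backends copied from the table above as (name, type, address).
# The scheme is NOT given in the table; "http" below is an assumption.
BACKENDS = [
    ("Prometheus", "prometheus", "prometheus-kub-prometheus:9090"),
    ("Mimir", "prometheus", "mimir-nginx:80/prometheus"),
    ("Loki", "loki", "loki-gateway"),
    ("Tempo", "tempo", "tempo-query-frontend:3200"),
    ("Pyroscope", "grafana-pyroscope-datasource", "pyroscope:4040"),
    ("Alertmanager", "alertmanager", "prometheus-kub-alertmanager:9093"),
]


def provisioning_doc(backends, scheme="http"):
    """Build a datasource provisioning document as a plain dict."""
    return {
        "apiVersion": 1,
        "datasources": [
            {
                "name": name,
                "type": ds_type,
                "access": "proxy",  # assumed: Grafana's backend proxies the requests
                "url": f"{scheme}://{address}",
            }
            for name, ds_type, address in backends
        ],
    }


doc = provisioning_doc(BACKENDS)
print(json.dumps(doc, indent=2))
```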

Application Databases

| Datasource | Type | Connection | Purpose |
|---|---|---|---|
| PostgreSQL | postgres | postgresql.otel:5432 (db: otel, user: otelu) | Direct SQL queries against the app database |
| Redis | redis-datasource | valkey-cart.otel:6379 | Direct Redis command execution |

Why Application Database Datasources?

These datasources enable self-service data exploration without involving a database administrator:

PostgreSQL — When a trace shows a slow SQL query (visible in the db.statement span attribute), you can click through from the trace to run that exact query directly against the database. This lets you:

  • Verify query results without SSH access to the database
  • Run EXPLAIN ANALYZE to check query plans
  • Check table sizes and row counts
  • Investigate data integrity issues found via traces or logs
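One caution when checking query plans this way: `EXPLAIN ANALYZE` actually executes the statement it explains. The sketch below is a hypothetical helper (not part of this stack) that wraps a statement copied from a span in `EXPLAIN` and refuses to add `ANALYZE` to anything that is not a plain `SELECT`. The `orders` table in the usage lines is made up for illustration.

```python
import re

# Only statements that start with SELECT may be analyzed, because
# EXPLAIN ANALYZE really runs the statement against the database.
_SELECT_RE = re.compile(r"^\s*select\b", re.IGNORECASE)


def explain_query(sql: str, analyze: bool = True) -> str:
    """Wrap a single SQL statement in EXPLAIN, optionally with ANALYZE."""
    statement = sql.strip().rstrip(";")
    # Conservative check: also rejects semicolons inside string literals.
    if ";" in statement:
        raise ValueError("expected a single statement")
    if analyze and not _SELECT_RE.match(statement):
        raise ValueError("refusing to ANALYZE a non-SELECT statement")
    options = "ANALYZE, BUFFERS" if analyze else "FORMAT TEXT"
    return f"EXPLAIN ({options}) {statement};"


print(explain_query("SELECT * FROM orders WHERE id = 1;"))
print(explain_query("DELETE FROM orders", analyze=False))
```

The plan-only form (`analyze=False`) is safe for any statement because it never executes it.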

Redis — When a trace shows a Redis operation (visible in the db.statement span attribute), you can click through to execute the command directly. This lets you:

  • Check cart contents for a specific user (HGET)
  • Verify cache state
  • Debug session data

Trace-to-Database Correlations

Grafana is configured with correlations that extract database commands from trace spans and link them to the database datasources:

| Correlation | Source | Extraction | Target |
|---|---|---|---|
| Tempo → PostgreSQL | db.statement or db.query.text span attribute | Regex extracts SQL statement | Runs query in PostgreSQL datasource |
| Tempo → Redis | db.statement span attribute | Regex extracts HGET/GET command | Runs command in Redis datasource |

This means: when viewing a trace span that contains a database query, Grafana shows a link to run that exact query against the live database — no copy-pasting, no requesting admin access.
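The extraction step can be pictured with a small sketch. The regular expressions below are illustrative assumptions, not the patterns configured in this Grafana instance; only the attribute names (`db.statement`, `db.query.text`) and the HGET/GET restriction come from the table above. The sample span attributes are invented.

```python
import re
from typing import Optional

# Hypothetical patterns: accept statements starting with SELECT or WITH,
# and Redis commands that are exactly HGET or GET followed by arguments.
SQL_RE = re.compile(r"^\s*((?:select|with)\b.*)$", re.IGNORECASE | re.DOTALL)
REDIS_RE = re.compile(r"^\s*((?:HGET|GET)\s+\S.*?)\s*$", re.IGNORECASE)


def extract_sql(attributes: dict) -> Optional[str]:
    """Return the SQL text from a span's attributes, or None."""
    raw = attributes.get("db.statement") or attributes.get("db.query.text")
    if not raw:
        return None
    match = SQL_RE.match(raw)
    return match.group(1).strip() if match else None


def extract_redis(attributes: dict) -> Optional[str]:
    """Return an HGET/GET command from a span's attributes, or None."""
    raw = attributes.get("db.statement")
    if not raw:
        return None
    match = REDIS_RE.match(raw)
    return match.group(1) if match else None


print(extract_sql({"db.query.text": "SELECT * FROM products WHERE id = $1"}))
print(extract_redis({"db.statement": "HGET cart:42 items"}))
print(extract_redis({"db.statement": "HGETALL cart:42"}))  # not HGET or GET
```

Restricting the Redis pattern to read-only commands keeps a click-through from ever issuing a write against the live cache.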

Drilldown Apps

Drilldown (formerly Explore Apps) provides query-free exploration of all four observability signals. Access via: Menu → Drilldown → Logs / Metrics / Traces / Profiles.

Logs Drilldown

Explores logs from Loki without writing LogQL.

| Tab | What It Shows |
|---|---|
| Logs | Log lines with filtering controls (level, text search, sorting) |
| Labels | Volume breakdown per label value; quickly spot which label values dominate |
| Fields | Volume breakdown per detected field; find high-value fields for filtering |
| Patterns | Auto-detected log patterns; hide noise, focus on anomalies |

Key feature: The Patterns tab uses Loki's pattern ingester to automatically group similar log lines. You can hide repetitive patterns (e.g., HTTP 200 access logs) to surface errors and anomalies.
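To make the idea of pattern grouping concrete, here is a deliberately simplified sketch. It is not the algorithm Loki's pattern ingester uses; it only replaces tokens that contain a digit with the `<_>` placeholder (borrowed from Loki's pattern syntax) and counts the resulting shapes. The log lines are invented.

```python
from collections import Counter


def to_pattern(line: str) -> str:
    """Replace tokens that contain a digit with the <_> placeholder."""
    return " ".join(
        "<_>" if any(ch.isdigit() for ch in token) else token
        for token in line.split()
    )


def group_patterns(lines):
    """Count how many lines share each pattern."""
    return Counter(to_pattern(line) for line in lines)


def hide(lines, hidden_patterns):
    """Drop lines whose pattern is in the hidden set."""
    return [line for line in lines if to_pattern(line) not in hidden_patterns]


lines = [
    "GET /api/cart 200 12ms",
    "GET /api/cart 200 9ms",
    "payment failed for order 8841: card declined",
]
patterns = group_patterns(lines)
remaining = hide(lines, {"GET /api/cart <_> <_>"})
print(patterns.most_common())
print(remaining)
```

Hiding the dominant access-log pattern leaves only the payment failure, which is the effect the Patterns tab aims for.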

Transition: Click Open in Explore to see the generated LogQL query.

Metrics Drilldown

Explores metrics from Prometheus without writing PromQL.

| Tab | What It Shows |
|---|---|
| Breakdown | Time series split by each label-value pair; see distribution at a glance |
| Related Metrics | Metrics with similar names and shared prefixes; discover metrics you didn't know existed |

Filtering options: by labels, text search, prefixes/suffixes, alerting rules.

Transition: Click Open in Explore to see the generated PromQL query.

Traces Drilldown

Explores traces from Tempo without writing TraceQL. First select a signal: Rate, Errors, or Duration.

| Tab | What It Shows |
|---|---|
| Breakdown | Attribute-based breakdown of the selected signal; which services/operations dominate |
| Service structure | Call chain visualization between services; see how requests flow |
| Comparison | Side-by-side comparison of attributes (green = baseline, red = errors); spot correlations |
| Traces | List of matching traces; click to open full trace waterfall |

Key feature: The Comparison tab highlights which attribute values are over-represented in error or high-latency spans compared with the baseline. This is often the quickest way to narrow down a root cause without writing queries.
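The comparison can be thought of as a difference in how often each attribute value appears in the two groups. The sketch below illustrates that idea under stated assumptions: spans are plain dicts, `service.version` is a made-up attribute for the example, and the scoring (difference in share) is a simplification rather than the exact calculation Grafana performs.

```python
from collections import Counter


def share(spans, attribute):
    """Fraction of spans carrying each value of the given attribute."""
    counts = Counter(span.get(attribute, "<missing>") for span in spans)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def compare(baseline, selection, attribute):
    """Rank values by how over-represented they are in the selection."""
    base = share(baseline, attribute)
    sel = share(selection, attribute)
    diffs = {v: sel.get(v, 0.0) - base.get(v, 0.0) for v in set(base) | set(sel)}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


baseline = [{"service.version": "v1"}] * 3 + [{"service.version": "v2"}]
errors = [{"service.version": "v2"}] * 2
ranking = compare(baseline, errors, "service.version")
print(ranking)
```

A value near the top of the ranking appears far more often among the error spans than in the baseline, which makes it a candidate for closer inspection.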

Powered by: Tempo’s span metrics generator and TraceQL metrics (local_blocks processor).

Profiles Drilldown

Explores profiles from Pyroscope without writing queries.

Navigation: All Services → select a service → select a profile type → Flame Graph

Available profile types (language-dependent):

  • CPU — where processor time is spent (all services)
  • Alloc Objects / Alloc Space — memory allocations (Go, Java, .NET)
  • Inuse Objects / Inuse Space — current memory usage (Go, .NET)
  • Goroutines — active goroutines (Go)
  • Lock Contention — lock competition (.NET)
  • Exceptions — exception profiles (.NET)

Key features:

  • Explain flame graph — AI-powered analysis of bottlenecks (via Grafana LLM plugin + OpenAI)
  • Diff flame graphs — compare two time windows, red = regression, green = improvement
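The red/green colouring of a diff flame graph can be sketched as a comparison of each function's share of samples in the two time windows. This is a simplified model, assuming flat per-function sample counts and normalisation by total samples; the function names `parse` and `render` are invented, and Pyroscope's actual diff works on the full call tree.

```python
def diff_profiles(baseline, comparison):
    """Compare each function's share of samples between two windows.

    Returns (function, delta, color) rows, where delta is the change in
    share and color follows the convention red = regression, green = improvement.
    """
    base_total = sum(baseline.values())
    comp_total = sum(comparison.values())
    if not base_total or not comp_total:
        raise ValueError("both profiles need at least one sample")
    rows = []
    for func in sorted(set(baseline) | set(comparison)):
        delta = comparison.get(func, 0) / comp_total - baseline.get(func, 0) / base_total
        color = "red" if delta > 0 else "green" if delta < 0 else "unchanged"
        rows.append((func, delta, color))
    return rows


before = {"parse": 50, "render": 50}
after = {"parse": 75, "render": 25}
rows = diff_profiles(before, after)
for func, delta, color in rows:
    print(f"{func}: {delta:+.2f} ({color})")
```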

Plugins Installed

| Plugin | Purpose |
|---|---|
| grafana-pyroscope-app | Pyroscope integration for continuous profiling and Profiles Drilldown |
| grafana-llm-app | LLM integration for AI-powered flame graph explanations (OpenAI: gpt-4o-mini default, gpt-4o for large) |
| redis-datasource | Redis/Valkey datasource for direct command execution |

Features Enabled

| Feature | Setting | Purpose |
|---|---|---|
| Anonymous auth | Enabled (Editor role) | Workshop access without login |
| Viewers can edit | Enabled | All participants can modify dashboards |
| Dark theme | Default | Easier on the eyes |
| Alpha plugins | Enabled | Access to latest Drilldown features |
| Allow unsigned plugins | Enabled | Support for custom/community plugins |
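For orientation, the features above correspond to Grafana configuration keys that can also be set through environment variables named `GF_<SECTION>_<KEY>`. The sketch below derives those names for the settings in the table. The section/key pairs are the ones these features are assumed to map to, and availability may vary by Grafana version; how this stack actually applies them (for example via Helm values) is not stated here. The unsigned-plugins setting is left out because `allow_loading_unsigned_plugins` takes a comma-separated list of plugin IDs, which this page does not name.

```python
def gf_env_name(section: str, key: str) -> str:
    """Derive the GF_<SECTION>_<KEY> environment variable name."""
    normalized = section.replace(".", "_").replace("-", "_")
    return f"GF_{normalized}_{key}".upper()


# (section, key, value) triples assumed to match the table above.
SETTINGS = [
    ("auth.anonymous", "enabled", "true"),
    ("auth.anonymous", "org_role", "Editor"),
    ("users", "viewers_can_edit", "true"),
    ("users", "default_theme", "dark"),
    ("plugins", "enable_alpha", "true"),
]

env = {gf_env_name(section, key): value for section, key, value in SETTINGS}
for name, value in env.items():
    print(f"{name}={value}")
```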

Access

  • Port: 3000 (NodePort 30080)
  • Ingress: grafana.<domain>
  • Default auth: Anonymous with Editor role
