Observability on Kubernetes — OpenTelemetry, Grafana Stack
Welcome to the Observability on Kubernetes workshop! This training covers end-to-end observability with OpenTelemetry, Prometheus, Grafana, Loki, Tempo, and Mimir, plus alerting, all running on Azure Kubernetes Service (AKS).
Materials
OpenTelemetry & Instrumentation
- Observability vs Monitoring — Why classic monitoring is no longer enough
- What Is Observability — Observability and OpenTelemetry overview
- Logs — Log formats, best practices, architecture
- Metrics — Metric types, PromQL, naming conventions
- Traces — Distributed tracing, spans, sampling
- Profiles (eBPF) — Continuous profiling, flame graphs
- OpenTelemetry Architecture — Collector, instrumentation, Grafana Stack
- Cost Optimization — Data volume, sampling, retention, storage costs
Prometheus (separate sidebar group)
- Overview — Architecture, data flow, pull model
- Service Discovery — Kubernetes SD, relabeling, annotations
- Federation — Hierarchical monitoring, cross-datacenter
- Remote Write — Long-term storage, queue tuning, HA
- Native Histograms — Sparse buckets, migration, PromQL
- Naming Conventions — Metric names, labels, base units
- Recording Rules — Pre-aggregation, hierarchical rules
- Internal Mechanisms — TSDB, WAL, compaction, security
- Metric Cardinality — High cardinality, refactoring, analysis
- Pushgateway — When to use, encoding, API
- Exporters — Writing exporters, naming, deployment
Other Metrics Tools
- Mimir — Long-term metrics storage with Grafana Mimir
- Postgres Exporter — PostgreSQL metrics exporter
- Redis Exporter — Redis metrics exporter
Loki (separate sidebar group)
- Overview — Components, deployment modes, chunk format, labels
- LogQL — Index filtering, content filtering, exercises
Tempo (separate sidebar group)
- Overview — Architecture, components, protocols
- Integrations & TraceQL — Metrics generator, traces to logs/profiles, MCP
- Configuration & Deployment — Helm values, Grafana, retention
Visualization
- Grafana — Dashboards and visualization
Collection
- Alloy — Grafana Alloy collector
Alerting
- Alerting — General — Alerting concepts and strategies
- Alerting — Alertmanager — Prometheus Alertmanager configuration
Exercises
- Loki — Log querying exercises
- Tempo — Distributed tracing exercises
- Prometheus — Metrics and PromQL exercises
- Grafana Dashboards — Dashboard creation
- Drilldown — Multi-level dashboard navigation
- Grafana Alerting — Alerting rule configuration
- Application Insights — Azure monitoring integration
- Scalability — Scaling exercises